Ch.03
Loss Function (MSE): Measuring Prediction Error
When finding the 'best-fitting line' in linear regression, we need a single number that says how far predictions are from the truth. The Sum of Squared Errors (SSE) is the sum of the squared residuals $(y_i - \hat{y}_i)^2$ over all points. Dividing SSE by the number of data points $n$ gives the Mean Squared Error (MSE). The closer MSE is to zero, the better the model fits the data, and gradient descent is the procedure that minimizes this MSE.
MSE is the average of squared errors between the predictions $\hat{y}_i$ and the actual values $y_i$.
MSE — the smaller the loss, the better the line fits the data.
The ruler for error — We need a loss function that summarizes how wrong the model is. At each point, the difference between actual and prediction is the residual (or error): $e_i = y_i - \hat{y}_i$. Squaring each residual and adding them up gives the Sum of Squared Errors: $\text{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$. Dividing SSE by the number of points $n$ gives the Mean Squared Error: $\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$. The smaller this value, the better the model fits.
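As a sketch, here is that chain (residuals → SSE → MSE) in plain Python; the data values are made up for illustration:

```python
# Residuals, SSE, and MSE for a toy dataset (values are hypothetical).
y_true = [3.0, 5.0, 7.0, 9.0]   # actual values y_i
y_pred = [2.5, 5.5, 6.0, 9.5]   # model predictions y_hat_i

residuals = [y - yh for y, yh in zip(y_true, y_pred)]  # e_i = y_i - y_hat_i
sse = sum(e ** 2 for e in residuals)                   # Sum of Squared Errors
mse = sse / len(y_true)                                # Mean Squared Error

# residuals == [0.5, -0.5, 1.0, -0.5], sse == 1.75, mse == 0.4375
```

A smaller `mse` means the predictions sit closer to the actual values on average.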
Why square? — A residual of $+2$ or $-2$ both mean 'off by 2'. If we summed raw residuals, $+2$ and $-2$ would cancel. Squaring keeps everything positive and penalizes large errors more.
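The cancellation problem is easy to demonstrate with two points that are each off by 2 in opposite directions:

```python
# Raw residuals can cancel; squared residuals cannot.
residuals = [2.0, -2.0]                       # two points, each "off by 2"
raw_sum = sum(residuals)                      # 0.0 -- looks like a perfect fit, but isn't
squared_sum = sum(e ** 2 for e in residuals)  # 8.0 -- reflects the real error
```

The raw sum reports zero error for a model that is wrong at both points, while the squared sum does not.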
Link to linear regression — The line from Ch03 'fits the data best' when MSE (or equivalently SSE) is minimized. Gradient descent updates the slope and intercept in the direction that reduces MSE.
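A minimal gradient-descent loop for a line $y = wx + b$ under MSE might look like the following sketch; the data, learning rate, and iteration count are assumptions chosen so the toy example converges:

```python
# Minimal gradient descent for y = w*x + b under MSE.
# The data follows y = 2x + 1 exactly, so the loop should approach w=2, b=1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b, lr = 0.0, 0.0, 0.05
n = len(xs)
for _ in range(2000):
    # Gradients of MSE = (1/n) * sum((y - (w*x + b))**2) with respect to w and b
    dw = (-2.0 / n) * sum((y - (w * x + b)) * x for x, y in zip(xs, ys))
    db = (-2.0 / n) * sum((y - (w * x + b)) for x, y in zip(xs, ys))
    w -= lr * dw   # step in the direction that reduces MSE
    b -= lr * db
```

Each step moves `w` and `b` opposite to the gradient, which by construction is the direction that lowers MSE.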
It defines the learning goal — Machine learning is often summarized as 'minimize the loss'. For regression that loss is typically MSE: the model updates only in directions that lower it, so the objective is unambiguous.
Differentiation is easy — The square function has a simple derivative, so gradient descent with MSE is tractable. Deep learning also uses squared-error-style losses widely.
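To make "easy to differentiate" concrete, the derivative of a single squared error with respect to the prediction is just $-2(y - \hat{y})$. A quick finite-difference check (function names here are my own) confirms the analytic formula:

```python
# The squared-error derivative is simple: d/dyhat (y - yhat)**2 = -2*(y - yhat).
def sq_loss(y, yhat):
    return (y - yhat) ** 2

def sq_loss_grad(y, yhat):
    return -2.0 * (y - yhat)

# Check the analytic gradient against a central finite-difference estimate.
y, yhat, h = 5.0, 3.0, 1e-6
numeric = (sq_loss(y, yhat + h) - sq_loss(y, yhat - h)) / (2 * h)
analytic = sq_loss_grad(y, yhat)   # -2 * (5 - 3) = -4.0
```

This simplicity is why gradient descent with MSE needs only a couple of lines per update.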
RMSE: back to the original units — Because MSE averages squared errors, its unit is the unit of $y$ squared (e.g. dollars² for price prediction). In practice we often want to say “on average we’re off by so many dollars or degrees.” Taking the square root of MSE gives RMSE (Root Mean Squared Error): $\text{RMSE} = \sqrt{\text{MSE}}$, which has the same units as $y$. Once you understand MSE, RMSE follows naturally.
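A short sketch of the units point, with made-up dollar values where every prediction is off by exactly 10:

```python
import math

# MSE is in squared units; taking the square root returns to the original units.
y_true = [100.0, 200.0, 300.0]   # e.g. prices in dollars (hypothetical values)
y_pred = [110.0, 190.0, 310.0]   # each prediction is off by 10 dollars

mse = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / len(y_true)  # dollars^2
rmse = math.sqrt(mse)                                                    # dollars
# mse == 100.0 (dollars squared), rmse == 10.0 (dollars)
```

Here RMSE recovers the interpretable statement "off by about 10 dollars on average."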
Training regression models — Linear regression, neural network regression, etc. compute MSE on the training data and update parameters to reduce it.
Comparing models — To compare which line (or model) fits the data better, compute MSE for each; the smaller value wins.
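Model comparison then reduces to one function call per candidate; the two candidate lines and the data below are illustrative assumptions:

```python
# Compare two candidate lines on the same data: the lower MSE wins.
def mse(y_true, y_pred):
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / len(y_true)

xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]           # noisy data roughly following y = 2x

line_a = [2 * x for x in xs]    # candidate A: y = 2x
line_b = [x + 1 for x in xs]    # candidate B: y = x + 1

better = "y = 2x" if mse(ys, line_a) < mse(ys, line_b) else "y = x + 1"
```

Since the data was generated near $y = 2x$, candidate A ends up with the smaller MSE.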
Validation and test — After training, computing MSE on unseen data (validation/test set) gives an objective measure of generalization.
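As a sketch of that workflow (the fitted parameters and the train/validation split are hypothetical), the same `mse` helper is simply applied to data the model never saw:

```python
# Evaluate generalization: parameters were fit on training data (elsewhere);
# MSE on a held-out validation set measures how well the model generalizes.
def mse(y_true, y_pred):
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / len(y_true)

def predict(xs, w, b):
    return [w * x + b for x in xs]

w, b = 2.0, 1.0                                    # parameters from training
train_x, train_y = [0.0, 1.0, 2.0], [1.0, 3.1, 4.9]
val_x, val_y = [3.0, 4.0], [7.2, 8.8]              # unseen data

train_mse = mse(train_y, predict(train_x, w, b))
val_mse = mse(val_y, predict(val_x, w, b))         # the number to report
```

A validation MSE much larger than the training MSE is the usual warning sign of overfitting.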
Summary: Loss function (MSE)
① Flow of concepts — The difference between actual and predicted is the residual (error) $e_i = y_i - \hat{y}_i$. Squaring each residual and summing over all points gives the Sum of Squared Errors $\text{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$; dividing SSE by the number of points $n$ gives the Mean Squared Error $\text{MSE} = \text{SSE}/n$. To express error in the same units as $y$, we use $\text{RMSE} = \sqrt{\text{MSE}}$.
② Why square? — A residual of $+3$ or $-3$ both mean "off by 3." Summing raw residuals can cancel out; squaring keeps everything positive and penalizes larger errors more, so the model is encouraged to avoid big mistakes.
③ Role in learning — MSE is the compass: "move in the direction that reduces this value." Gradient descent updates the slope $w$ and intercept $b$ to minimize MSE. The square function is smooth and easy to differentiate, which makes finding the minimum tractable.
④ Where it’s used — Regression (price, temperature, stock prediction, etc.), model comparison (smaller MSE is better), and as the output-layer loss in deep learning. For solution steps and worked examples, see the Explanation for problem solving block below.