Chapter 11

Mean and Variance: The Center and Spread of Distributions

The mean (expected value) is the center of a distribution; variance measures its spread. Both are used in AI for prediction, loss functions, and regularization.

Mean and variance

Bar heights show the probability of each value. The red line is the mean (μ), the center of the distribution. The purple band shows the typical spread (μ ± σ). The tallest bar is the mode, the most frequent value.

What are mean and variance

The mean (expected value) is the center of mass of a distribution. Variance measures how much values spread around the mean. Standard deviation is the square root of variance, so it shows “typical distance from the mean” in the same units as the data.

Mean — e.g. die average (1+…+6)/6=3.5, exam class average, or demand forecast “expected value.” The red line in the figure is the mean $\mu$.

Variance — probability-weighted average of (value−mean)². Large variance ⇒ more spread. Standard deviation $\sigma=\sqrt{\text{variance}}$ brings spread back to the original units (points, kg, etc.): e.g. “mean 70, σ=10” means many scores lie roughly in 60–80.
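The definitions above can be checked on the fair die. This is a minimal sketch using exact fractions; the variable names are chosen here for illustration.

```python
from fractions import Fraction
from math import sqrt

# Fair six-sided die: each face 1..6 has probability 1/6.
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

# Mean: probability-weighted average of the values.
mean = sum(x * p for x, p in zip(values, probs))

# Variance: probability-weighted average of (value - mean)^2.
variance = sum(p * (x - mean) ** 2 for x, p in zip(values, probs))

# Standard deviation: square root of the variance, in the same units as the values.
std = sqrt(variance)

print(mean)           # 7/2
print(variance)       # 35/12
print(round(std, 3))  # 1.708
```

So a die roll has mean 3.5 and a typical distance from the mean of about 1.7.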

Knowing only the mean is risky—e.g. a river may have average depth 1 m but spots deeper than 3 m. Variance is what we need to manage that risk (volatility). In AI we don’t just output a prediction (mean); we also look at how much it can vary (variance) to measure confidence.

Concepts often used in AI — The table below summarizes mode, mean, min/max, and median: what they mean and how they are used in AI.

Concept	Meaning	In AI
Mode	The value with the highest probability; the outcome that appears most often in repeated trials.	Used when choosing the “most likely class” in classification; the argmax of softmax output is the mode.
Mean (expected value)	The center of mass of the distribution; the sum of value×probability. It represents the “expected” value.	Used for regression predictions, loss (e.g. MSE), expected reward in reinforcement learning, and so on.
Min / Max	The interval [min, max] in which the variable can lie; the smallest and largest values that define the range.	Used in loss minimization (gradient descent), value clipping, and setting normalization ranges.
Median	The value in the middle when ordered by size. Unlike the mean, it is less affected by extreme values (outliers).	Used when summarizing data with many outliers or when a robust statistic is needed.
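The four summaries in the table can be computed for a sample with Python's standard `statistics` module. The scores below are made up for illustration, with one deliberate outlier to show how the mean and the median react differently.

```python
import statistics

# Hypothetical sample: nine ordinary scores plus one extreme outlier (300).
scores = [60, 65, 70, 70, 70, 72, 75, 78, 80, 300]

mode_value = statistics.mode(scores)      # most frequent value
mean_value = statistics.mean(scores)      # pulled upward by the outlier
median_value = statistics.median(scores)  # middle of the sorted values
lowest, highest = min(scores), max(scores)

print(mode_value)       # 70
print(mean_value)       # 94
print(median_value)     # 71.0
print(lowest, highest)  # 60 300
```

The single outlier moves the mean to 94, above every ordinary score, while the median stays at 71.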

It is a measure of prediction confidence. The number a model outputs is usually the expected value of a probability distribution. If the variance of that distribution is large, we can interpret this as the model not being confident in its own prediction.

It quantifies uncertainty. In autonomous driving or medical AI, “how certain” matters a lot. Using the standard deviation $\sigma$, we set confidence intervals (e.g. mean ± 2σ) and assess the risk of results falling outside that range, supporting safer decisions.
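As a rough illustration of the mean ± 2σ interval, the sketch below assumes the quantity is normally distributed, which the text does not state, and reuses the exam numbers (mean 70, σ = 10) as example values.

```python
from statistics import NormalDist

# Assumption: outcomes follow a normal distribution with these example parameters.
mu, sigma = 70.0, 10.0
dist = NormalDist(mu, sigma)

# Interval mean ± 2σ.
low, high = mu - 2 * sigma, mu + 2 * sigma

# Probability of landing inside the interval, and the risk of falling outside it.
inside = dist.cdf(high) - dist.cdf(low)
outside = 1 - inside

print(low, high)          # 50.0 90.0
print(round(inside, 4))   # 0.9545
print(round(outside, 4))  # 0.0455
```

Under the normality assumption, about 95% of outcomes fall within two standard deviations of the mean; for other distributions the coverage can differ.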

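It is the design principle of the loss function. In regression, MSE (mean squared error) is the mean of squared differences between target and prediction. It equals the variance of the error plus the squared mean error (bias), so it coincides with the error variance only when the errors average to zero. Minimizing MSE therefore shrinks both the systematic offset and the spread of the errors.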
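A small numeric check of how MSE relates to the variance of the error: MSE equals the error variance plus the squared mean error. The targets and predictions below are hypothetical values chosen so the arithmetic is exact.

```python
from math import isclose
from statistics import fmean, pvariance

# Hypothetical targets and predictions.
targets = [3.0, 5.0, 7.0, 9.0]
preds = [4.0, 5.5, 8.5, 10.0]

errors = [p - t for p, t in zip(preds, targets)]  # [1.0, 0.5, 1.5, 1.0]

mse = fmean([e ** 2 for e in errors])  # mean of squared errors
bias = fmean(errors)                   # mean error
error_var = pvariance(errors)          # population variance of the errors

print(mse, bias, error_var)  # 1.125 1.0 0.125
print(isclose(mse, error_var + bias ** 2))  # True
```

Here most of the MSE comes from the constant offset of 1.0, not from the spread of the errors.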

It is the basis of normalization and regularization. If the variance of the weights grows too large, the model can become oversensitive to small input changes and overfit. Keeping that variance in check helps the model stay stable and generalize better.

Daily life — Exam scores are reported as “mean 70, standard deviation 10” so you see center and spread. Same for height/weight distributions, demand forecasts (expected value ± error range), and quality control (spec ± σ).

Regression — The prediction is usually the conditional expected value: “average output given this input.” We minimize MSE (mean of squared errors), i.e. we minimize a kind of average.
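To see why minimizing MSE leads to the mean, the sketch below searches a grid of constant predictions for one fixed input. The observed outputs and the grid are illustrative assumptions, not part of any particular model.

```python
from statistics import fmean

# Hypothetical outputs observed for one fixed input.
ys = [2.0, 4.0, 9.0]

def mse(c):
    """Mean squared error of always predicting the constant c."""
    return fmean([(y - c) ** 2 for y in ys])

# Try constants 0.00, 0.01, ..., 10.00 and keep the one with the lowest MSE.
candidates = [i / 100 for i in range(0, 1001)]
best = min(candidates, key=mse)

print(best, fmean(ys))  # 5.0 5.0
```

The best constant coincides with the average of the outputs, which is the sense in which an MSE-trained regressor predicts a conditional mean.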

Classification — The model outputs probabilities per class; we take the mode (the class with the highest probability) as the predicted class. The argmax of the softmax output does exactly that.
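A minimal sketch of this step in plain Python, with made-up logits for three classes. The `softmax` helper is written here for illustration and is not taken from any specific library.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for classes 0, 1, 2.
logits = [1.0, 3.0, 0.5]
probs = softmax(logits)

# argmax: index of the largest probability, i.e. the mode of the class distribution.
predicted = max(range(len(probs)), key=probs.__getitem__)

print(predicted)  # 1
```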

Reinforcement learning — Policies are evaluated using the expected reward. We learn to maximize “average future reward” for an action, which is an expectation.
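The idea of comparing actions by expected reward can be sketched with two hypothetical actions whose reward distributions are invented for illustration. The variance is computed as well, since two actions with similar means can carry very different risk.

```python
# Hypothetical reward distributions: action -> list of (reward, probability).
rewards = {
    "safe": [(1.0, 1.0)],
    "risky": [(10.0, 0.2), (-1.0, 0.8)],
}

def expected_reward(outcomes):
    """Sum of reward x probability."""
    return sum(r * p for r, p in outcomes)

def reward_variance(outcomes):
    """Probability-weighted average of (reward - mean)^2."""
    mu = expected_reward(outcomes)
    return sum(p * (r - mu) ** 2 for r, p in outcomes)

# Choose the action with the highest expected reward.
best_action = max(rewards, key=lambda a: expected_reward(rewards[a]))

for action, outcomes in rewards.items():
    print(action, round(expected_reward(outcomes), 2), round(reward_variance(outcomes), 2))
print(best_action)  # risky
```

The "risky" action wins on expectation (about 1.2 versus 1.0) but has a much larger variance (about 19.36 versus 0).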

Discrete case: mean = $\sum \text{value}\times\text{probability}$, i.e. $E[X]=\sum_i x_i p_i$. Variance = $E[X^2]-(E[X])^2$: first sum $\text{value}^2\times\text{probability}$ to get $E[X^2]$, then subtract $(E[X])^2$.
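The shortcut formula can be compared against the definition of variance on a small distribution. This sketch uses values 1, 2, 3 with probabilities 1/6, 2/6, 3/6 and exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

values = [1, 2, 3]
probs = [Fraction(1, 6), Fraction(2, 6), Fraction(3, 6)]

# E[X]: sum of value x probability.
e_x = sum(x * p for x, p in zip(values, probs))

# E[X^2]: sum of value^2 x probability.
e_x2 = sum(x ** 2 * p for x, p in zip(values, probs))

# Shortcut: Var(X) = E[X^2] - (E[X])^2.
var_shortcut = e_x2 - e_x ** 2

# Definition: probability-weighted average of (value - mean)^2.
var_definition = sum(p * (x - e_x) ** 2 for x, p in zip(values, probs))

print(e_x, e_x2, var_shortcut)         # 7/3 6 5/9
print(var_shortcut == var_definition)  # True
```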

When every probability is a fraction with denominator 6 and the values are integers, $6\times$ mean and $36\times$ variance are integers. Mode = value with highest probability; cumulative $P(X\le k)$ = sum of the probabilities for values $\le k$.
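These integer quantities can be computed directly from the numerators. The sketch below assumes probabilities of the form n/6, stored in a dictionary named `counts` (a name chosen here), for values 1, 2, 3 with numerators 1, 2, 3.

```python
# value -> numerator n_i, where p_i = n_i / 6 (example distribution).
counts = {1: 1, 2: 2, 3: 3}
total = sum(counts.values())  # common denominator, 6

# 6 * E[X] = sum of n_i * x_i.
scaled_mean = sum(n * x for x, n in counts.items())

# 36 * Var(X) = 6 * sum of n_i * x_i^2 - (sum of n_i * x_i)^2.
scaled_var = total * sum(n * x * x for x, n in counts.items()) - scaled_mean ** 2

# Mode: the value with the largest numerator, i.e. the highest probability.
mode = max(counts, key=counts.get)

def cumulative_numerator(k):
    """Numerator of P(X <= k) over the common denominator."""
    return sum(n for x, n in counts.items() if x <= k)

print(scaled_mean, scaled_var, mode, cumulative_numerator(2))  # 14 20 3 3
```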

Simplest case: values 1, 2, 3; probabilities $\frac{1}{6},\frac{2}{6},\frac{3}{6}$. Then $6E[X]=1\cdot1+2\cdot2+3\cdot3=14$.

Below are worked examples by type. Follow problem → solution → answer.

Example (6×mean)

Values 1, 2, 3 with probabilities 1/6, 2/6, 3/6. Find $6E[X]$.

Solution

$6E[X]=1\times 1+2\times 2+3\times 3=14$

→ Answer 14

Example (36×variance)

Same distribution, written as $p_i=n_i/6$ with $n_1=1,n_2=2,n_3=3$ and $x_i=1,2,3$. Find $36\times\mathrm{Var}(X)$.

Solution

$36\,\mathrm{Var}(X)=6\sum n_i x_i^2-(\sum n_i x_i)^2=6(1+8+27)-14^2=216-196=20$

→ Answer 20
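The integer formula used in this solution can be derived from the shortcut formula for variance. The sketch below writes the common denominator as $N$ (a symbol introduced here for illustration; $N=6$ in this example) and assumes every probability has the form $p_i=n_i/N$.

```latex
\begin{aligned}
E[X] &= \frac{1}{N}\sum_i n_i x_i, \qquad E[X^2] = \frac{1}{N}\sum_i n_i x_i^2 \\
\mathrm{Var}(X) &= E[X^2]-(E[X])^2
  = \frac{1}{N}\sum_i n_i x_i^2 - \frac{1}{N^2}\Bigl(\sum_i n_i x_i\Bigr)^2 \\
N^2\,\mathrm{Var}(X) &= N\sum_i n_i x_i^2 - \Bigl(\sum_i n_i x_i\Bigr)^2 \\
36\,\mathrm{Var}(X) &= 6\cdot 36 - 14^2 = 216 - 196 = 20 \quad (N=6)
\end{aligned}
```

Dividing by 36 gives $\mathrm{Var}(X)=20/36=5/9$.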

Example (mean as a rational number)

Given $6E[X]=18$, find $E[X]$.

Solution

$E[X]=18/6=3$

→ Answer 3

Example (mode)

Values 1, 2, 3 with probabilities 1/6, 2/6, 3/6. Find the mode.

Solution

Largest probability is at 3.

→ Answer 3

Example (cumulative numerator)

For values 1, 2, 3 with probabilities 1/6, 2/6, 3/6, write $P(X\le 2)$ as $k/6$ and find $k$.

Solution

$P(X\le 2)=1/6+2/6=3/6$, so the numerator is 3.

→ Answer 3
