Chapter 11
Mean and Variance: The Center and Spread of Distributions
The mean (expected value) is the center of a distribution; variance measures spread. Used in AI for prediction, loss, and regularization.
Mean and variance
What are mean and variance
The mean (expected value) is the center of mass of a distribution. Variance measures how much values spread around the mean. Standard deviation is the square root of variance, so it shows “typical distance from the mean” in the same units as the data.
Mean: for example, the average of a fair die, (1+…+6)/6 = 3.5, an exam class average, or the “expected value” in a demand forecast. The red line in the figure marks the mean.
Variance: the probability-weighted average of (value−mean)². Larger variance means more spread. Standard deviation brings spread back to the original units (points, kg, etc.): for example, “mean 70, σ=10” means that many scores lie roughly in 60–80 (about 68% of them if the scores are approximately normal).
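As a quick check of the die example, here is a minimal sketch that computes the mean, variance, and standard deviation of a fair six-sided die directly from the definitions. The helper name `mean_and_variance` is chosen for illustration only; exact fractions are used so the results are not blurred by rounding.

```python
from fractions import Fraction
import math

def mean_and_variance(values, probs):
    """Return (mean, variance) of a discrete distribution.

    mean     = sum of value * probability
    variance = probability-weighted average of (value - mean)^2
    """
    mean = sum(x * p for x, p in zip(values, probs))
    variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
    return mean, variance

faces = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6          # fair die: every face has probability 1/6

die_mean, die_var = mean_and_variance(faces, probs)
die_std = math.sqrt(die_var)          # back in the original units (pips)

print(die_mean, die_var, round(die_std, 3))  # 7/2 35/12 1.708
```

The standard deviation of about 1.7 says that a typical roll lands roughly 1.7 pips away from 3.5.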
Knowing only the mean is risky—e.g. a river may have average depth 1 m but spots deeper than 3 m. Variance is what we need to manage that risk (volatility). In AI we don’t just output a prediction (mean); we also look at how much it can vary (variance) to measure confidence.
Concepts often used in AI — The table below summarizes mode, mean, min/max, and median: what they mean and how they are used in AI.
| Concept | Meaning | In AI |
|---|---|---|
| Mode | The value with the highest probability; the outcome that appears most often in repeated trials. | Used when choosing the “most likely class” in classification; the argmax of softmax output is the mode. |
| Mean (expected value) | The center of mass of the distribution; the sum of value×probability. It represents the “expected” value. | Used for regression predictions, loss (e.g. MSE), expected reward in reinforcement learning, and so on. |
| Min / Max | The interval [min, max] in which the variable can lie; the smallest and largest values that define the range. | Used in loss minimization (gradient descent), value clipping, and setting normalization ranges. |
| Median | The value in the middle when ordered by size. Unlike the mean, it is less affected by extreme values (outliers). | Used when summarizing data with many outliers or when a robust statistic is needed. |
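The four summaries in the table can be computed with Python's standard `statistics` module. The sketch below uses made-up scores with one deliberate outlier to show why the median is called robust: the outlier drags the mean far from the bulk of the data while the median barely notices it.

```python
import statistics

# Hypothetical exam scores; 300 is a deliberate outlier (e.g. a data-entry error).
scores = [55, 60, 60, 65, 70, 75, 300]

score_mode = statistics.mode(scores)      # most frequent value
score_mean = statistics.mean(scores)      # pulled upward by the outlier
score_median = statistics.median(scores)  # middle value after sorting
score_min, score_max = min(scores), max(scores)

print("mode:", score_mode)                # mode: 60
print("mean:", round(score_mean, 2))      # mean: 97.86
print("median:", score_median)            # median: 65
print("range:", (score_min, score_max))   # range: (55, 300)
```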
A measure of prediction accuracy. The number an AI outputs is usually the expected value of its probability distribution. If the variance of that prediction is large, we can interpret it as the model not being confident in its own prediction.
It quantifies uncertainty. In autonomous driving or medical AI, “how certain” matters a lot. Using the standard deviation, we set confidence intervals (e.g. mean ± 2σ, which covers about 95% of outcomes when the distribution is approximately normal) and assess the risk of results falling outside that range, supporting safer decisions.
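A minimal sketch of the mean ± 2σ idea, assuming we have several predictions for the same input (for instance from an ensemble of models). The numbers and the helper `is_outside` are hypothetical, and the "about 95%" reading of the interval only holds when the predictions are roughly normally distributed.

```python
import statistics

# Hypothetical repeated predictions for one input (e.g. from an ensemble).
predictions = [9.8, 10.1, 10.4, 9.9, 10.3]

mu = statistics.mean(predictions)
sigma = statistics.stdev(predictions)   # sample standard deviation

low, high = mu - 2 * sigma, mu + 2 * sigma

def is_outside(value):
    """True if a value falls outside the mean ± 2σ band."""
    return value < low or value > high

print(f"mean={mu:.2f}, sigma={sigma:.2f}, band=({low:.2f}, {high:.2f})")
print(is_outside(10.0), is_outside(11.0))
```

A narrow band signals that the predictions agree with each other; a wide band is a cue to treat the output with more caution.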
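It is the design principle of the loss function. In regression, MSE (mean squared error) is the mean of squared differences between target and prediction. It equals the variance of the error plus the squared mean error (bias²), so minimizing MSE shrinks both the spread of the errors and their systematic offset. When the errors average to zero, minimizing MSE is exactly minimizing the variance of the error.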
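The link between MSE and the variance of the error can be checked numerically. The sketch below uses hypothetical targets and predictions and verifies the identity MSE = variance of the error + (mean error)², where the mean error is the bias.

```python
import statistics

# Hypothetical regression targets and model predictions.
targets = [3.0, 5.0, 7.0, 9.0]
predictions = [2.5, 5.5, 6.0, 8.0]

errors = [p - t for p, t in zip(predictions, targets)]

mse = statistics.fmean([e ** 2 for e in errors])   # mean of squared errors
bias = statistics.fmean(errors)                    # mean error
error_variance = statistics.pvariance(errors)      # population variance of the errors

print(mse, error_variance, bias ** 2)              # 0.625 0.375 0.25
```

Here 0.625 = 0.375 + 0.25: part of the loss comes from the spread of the errors and part from a systematic under-prediction.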
It is the basis of normalization and regularization. If the weights vary too widely in magnitude, the model becomes oversensitive to small changes in the input and tends to overfit. Keeping that variance in check keeps the model stable and helps it generalize.
Daily life — Exam scores are reported as “mean 70, standard deviation 10” so you see center and spread. Same for height/weight distributions, demand forecasts (expected value ± error range), and quality control (spec ± σ).
Regression — The prediction is usually the conditional expected value: “average output given this input.” We minimize MSE (mean of squared errors), i.e. we minimize a kind of average.
Classification — The model outputs probabilities per class; we take the mode (the class with the highest probability) as the predicted class. The argmax of the softmax output does exactly that.
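To make the "argmax of softmax is the mode" statement concrete, here is a small sketch in plain Python. The logits are made up, and `softmax` is written out by hand rather than taken from any particular library.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.0, 3.0, 0.5]                          # hypothetical scores for classes 0, 1, 2
probs = softmax(logits)

# The mode of the predicted distribution: the class index with the highest probability.
predicted_class = max(range(len(probs)), key=probs.__getitem__)

print([round(p, 3) for p in probs], predicted_class)
```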
Reinforcement learning — Policies are evaluated using the expected reward. We learn to maximize “average future reward” for an action, which is an expectation.
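As a toy illustration of "expected reward", the sketch below compares two hypothetical actions, each described by a list of (probability, reward) pairs. Real reinforcement learning estimates these expectations from experience; here they are simply given.

```python
def expected_reward(outcomes):
    """Expectation of the reward: sum of probability * reward."""
    return sum(prob * reward for prob, reward in outcomes)

# Hypothetical actions with made-up outcome distributions.
actions = {
    "a": [(0.8, 1.0), (0.2, -2.0)],   # usually a small win, sometimes a loss
    "b": [(0.5, 3.0), (0.5, -1.0)],   # coin flip between a big win and a small loss
}

values = {name: expected_reward(outcomes) for name, outcomes in actions.items()}
best_action = max(values, key=values.get)

print(values, best_action)
```

Action "b" has the higher expectation (1.0 versus about 0.4), so a policy that maximizes average reward prefers it, even though its individual outcomes vary more.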
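Discrete case: mean $\mu = \sum_i x_i p_i$, variance $\sigma^2 = \sum_i x_i^2 p_i - \mu^2$. With denominator 6 (each $p_i = n_i/6$), $6\mu$ and $36\sigma^2$ are integers.
Mean: add up $x_i p_i$. With denominator 6, $6\mu = \sum_i n_i x_i$ is an integer.
Variance: $E[X^2]$ minus $\mu^2$. $36\sigma^2 = 6\sum_i n_i x_i^2 - \left(\sum_i n_i x_i\right)^2$ is an integer and easy to compute.
Below: compute $6\mu$, $36\sigma^2$, mean (integer), mode, and cumulative numerator.
Example. Values 1, 2, 3 with probs $1/6$, $2/6$, $3/6$ → $6\mu = 1\cdot1 + 2\cdot2 + 3\cdot3 = 14$.
Example. Same distribution: $36\sigma^2 = 6\,(1\cdot1 + 2\cdot4 + 3\cdot9) - 14^2 = 216 - 196 = 20$.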
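The integer shortcuts follow from writing each probability as a numerator over 6. The derivation below is a sketch under that assumption; the symbols $n_i$ (numerator), $x_i$ (value), $\mu$ (mean), and $\sigma^2$ (variance) are notation chosen here for illustration.

```latex
\begin{align*}
p_i &= \frac{n_i}{6}, \qquad \sum_i n_i = 6 \\
6\mu &= 6 \sum_i x_i \frac{n_i}{6} = \sum_i n_i x_i \\
36\sigma^2 &= 36\left(\sum_i x_i^2 \frac{n_i}{6} - \mu^2\right)
            = 6 \sum_i n_i x_i^2 - \Bigl(\sum_i n_i x_i\Bigr)^2 \\[4pt]
\text{Values } 1,2,3 \text{ with } n = 1,2,3:\quad
6\mu &= 1 + 4 + 9 = 14, \qquad \mu = \tfrac{7}{3} \\
36\sigma^2 &= 6\,(1 + 8 + 27) - 14^2 = 216 - 196 = 20, \qquad \sigma^2 = \tfrac{5}{9}
\end{align*}
```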
Problem types and how to solve
| Type | Description | How to get the answer |
|---|---|---|
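| 6×mean | $6\mu = 6\,E[X]$ | $6\mu = \sum_i n_i x_i$. With denominator 6, result is an integer. |
| 36×variance | $36\sigma^2$ | $36\sigma^2 = 6\sum_i n_i x_i^2 - \left(\sum_i n_i x_i\right)^2$. $n_i$ = numerator, $x_i$ = value. |
| Mean (integer) | Expectation as integer | (6×mean)/6 when that is an integer; the problems are set up so that it is. |
| Mode | Value with highest probability | The value $x$ with the tallest bar. |
| Cumulative numerator | Numerator of $P(X \le k)$ | Sum numerators of probabilities for values $\le k$. |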
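The problem types in the table can be sketched as one small function that works entirely with integers. The function name, the dictionary keys, and the threshold parameter `k` are hypothetical, and the cumulative numerator is assumed to mean the numerator of P(X ≤ k).

```python
def solve_problem_types(values, numerators, k):
    """Compute the table's quantities for probabilities n_i / 6.

    values     -- the outcomes x_i
    numerators -- the integers n_i, with p_i = n_i / 6
    k          -- threshold for the cumulative numerator (assumed: P(X <= k))
    """
    if sum(numerators) != 6:
        raise ValueError("numerators must sum to 6")

    six_mean = sum(n * x for x, n in zip(values, numerators))
    sum_n_x2 = sum(n * x * x for x, n in zip(values, numerators))
    thirty_six_var = 6 * sum_n_x2 - six_mean ** 2

    # The mean is reported only when it is a whole number.
    integer_mean = six_mean // 6 if six_mean % 6 == 0 else None

    # Mode: the value whose numerator (bar height) is largest.
    mode = max(zip(values, numerators), key=lambda pair: pair[1])[0]

    cumulative_numerator = sum(n for x, n in zip(values, numerators) if x <= k)

    return {
        "6*mean": six_mean,
        "36*variance": thirty_six_var,
        "mean": integer_mean,
        "mode": mode,
        "cumulative_numerator": cumulative_numerator,
    }

result = solve_problem_types([1, 2, 3], [1, 2, 3], k=2)
print(result)
# {'6*mean': 14, '36*variance': 20, 'mean': None, 'mode': 3, 'cumulative_numerator': 3}
```

Staying in integers until the very end avoids floating-point rounding, which is the point of scaling the mean by 6 and the variance by 36.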
Example (6×mean)
Values 1, 2, 3 with probabilities 1/6, 2/6, 3/6. Find 6×mean.
Solution
$6\mu = 1\cdot1 + 2\cdot2 + 3\cdot3 = 14$. → Answer 14
Example (36×variance)
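Same distribution (numerators $n_i$, values $x_i$) with $\sum_i n_i x_i = 14$, $\sum_i n_i x_i^2 = 36$. $36\sigma^2 = 6\cdot36 - 14^2 = 216 - 196 = 20$. → Answer 20 (numeric example)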