Everyone's AI
Chapter 11

Mean and Variance: The Center and Spread of Distributions

The mean (expected value) is the center of a distribution; variance measures spread. Used in AI for prediction, loss, and regularization.


Mean and variance
[Figure: bar chart of P(x) over the values 1, 2, 3, with the mean μ marked by a red line and the μ±σ band in purple]

Bar heights show the probability of each value. The red line is the mean (μ)—the center of the distribution. The purple band shows the typical spread (μ±σ). The tallest bar is the mode—the most frequent value.

What are mean and variance

The mean (expected value) is the center of mass of a distribution. Variance measures how much values spread around the mean. Standard deviation is the square root of variance, so it shows “typical distance from the mean” in the same units as the data.
Mean — e.g. the die average (1+2+…+6)/6 = 3.5, a class's exam average, or the "expected value" of a demand forecast. The red line in the figure is the mean μ.
Variance — the probability-weighted average of (value − mean)². Large variance ⇒ more spread. The standard deviation σ = √variance brings spread back to the original units (points, kg, etc.): e.g. "mean 70, σ = 10" means many scores lie roughly in 60–80.
Knowing only the mean is risky—e.g. a river may have average depth 1 m but spots deeper than 3 m. Variance is what we need to manage that risk (volatility). In AI we don’t just output a prediction (mean); we also look at how much it can vary (variance) to measure confidence.
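The definitions above can be sketched in a few lines of plain Python. A minimal sketch with a fair die as illustrative data (not from the text):

```python
# Mean, variance, and standard deviation of a discrete distribution.
import math

values = [1, 2, 3, 4, 5, 6]   # fair die
probs = [1 / 6] * 6

mean = sum(x * p for x, p in zip(values, probs))                # E[X]
var = sum((x - mean) ** 2 * p for x, p in zip(values, probs))   # E[(X - mean)^2]
std = math.sqrt(var)                                            # sigma, same units as X

print(mean)  # 3.5
print(var)   # 35/12 ≈ 2.9167
```

Note that `std` is in the same units as the data, which is why "mean 70, σ = 10" reads directly as a score range.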
Concepts often used in AI — The table below summarizes mode, mean, min/max, and median: what they mean and how they are used in AI.
| Concept | Meaning | In AI |
| --- | --- | --- |
| Mode | The value with the highest probability; the outcome that appears most often in repeated trials. | Choosing the "most likely class" in classification; the argmax of softmax output is the mode. |
| Mean (expected value) | The center of mass of the distribution; the sum of value × probability. It represents the "expected" value. | Regression predictions, losses (e.g. MSE), expected reward in reinforcement learning, and so on. |
| Min / Max | The smallest and largest values the variable can take, defining the range [min, max]. | Loss minimization (gradient descent), value clipping, setting normalization ranges. |
| Median | The middle value when ordered by size; unlike the mean, it is less affected by extreme values (outliers). | Summarizing data with many outliers, or whenever a robust statistic is needed. |
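The four summaries in the table can be computed with Python's standard `statistics` module. A small sketch on illustrative data with one outlier, showing why the median is called robust:

```python
# Mode, mean, median, and min/max on a small sample.
import statistics

data = [1, 2, 2, 3, 3, 3, 100]   # 100 is an outlier

print(statistics.mode(data))     # 3  -> most frequent value
print(statistics.mean(data))     # ≈ 16.29 -> dragged up by the outlier
print(statistics.median(data))   # 3  -> barely affected by the outlier
print(min(data), max(data))      # 1 100 -> the range
```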
Why mean and variance matter in AI
They measure prediction accuracy. The number an AI outputs is usually the expected value of its probability distribution. If the variance of that prediction is large, we can interpret it as the model not being confident in its own prediction.
They quantify uncertainty. In autonomous driving or medical AI, "how certain" matters a lot. Using the standard deviation σ, we set confidence intervals (e.g. mean ± 2σ) and assess the risk of results falling outside that range, supporting safer decisions.
They shape the loss function. In regression, MSE (mean squared error) is the mean of squared differences between target and prediction; when the error has zero mean, minimizing MSE is exactly minimizing the variance of the error. So reducing error spread is how the model gets better.
They are the basis of regularization and normalization. If the variance of the weights grows too large, the model becomes oversensitive and overfits. Keeping variance under control keeps the model stable and more general.
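Two of the points above can be made concrete in a few lines. A sketch with illustrative numbers: MSE computed as the mean of squared errors, and a mean ± 2σ interval for a hypothetical model that also outputs a spread estimate:

```python
# MSE as the mean of squared errors, plus a mean ± 2σ confidence band.
targets = [3.0, 5.0, 7.0]
preds = [2.5, 5.5, 6.0]

errors = [t - p for t, p in zip(targets, preds)]
mse = sum(e ** 2 for e in errors) / len(errors)   # mean squared error

# Hypothetical model outputs: a predicted mean and its sigma.
pred_mean, pred_sigma = 6.0, 0.5
low, high = pred_mean - 2 * pred_sigma, pred_mean + 2 * pred_sigma

print(mse)        # 0.5
print(low, high)  # 5.0 7.0  -> rough ~95% interval
```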
Daily life — Exam scores are reported as “mean 70, standard deviation 10” so you see center and spread. Same for height/weight distributions, demand forecasts (expected value ± error range), and quality control (spec ± σ).
Regression — The prediction is usually the conditional expected value: “average output given this input.” We minimize MSE (mean of squared errors), i.e. we minimize a kind of average.
Classification — The model outputs probabilities per class; we take the mode (the class with the highest probability) as the predicted class. The argmax of the softmax output does exactly that.
Reinforcement learning — Policies are evaluated using the expected reward. We learn to maximize “average future reward” for an action, which is an expectation.
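The classification point above ("argmax of softmax = mode") can be sketched directly. The logits below are illustrative raw scores, not from any real model:

```python
# Softmax turns raw scores into a probability distribution over classes;
# taking the argmax picks the mode, i.e. the most likely class.
import math

logits = [2.0, 1.0, 0.1]   # illustrative scores for 3 classes

exps = [math.exp(z) for z in logits]
total = sum(exps)
probs = [e / total for e in exps]          # softmax output, sums to 1

predicted_class = probs.index(max(probs))  # argmax = the mode
print(predicted_class)  # 0
```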
Discrete case: mean = Σ(value × probability), variance = E[X²] − (E[X])². With denominator 6, 6×mean and 36×variance are integers.
Mean — add up value × probability. With denominator 6, 6×mean is an integer.
Variance — E[X²] minus (mean)². 36×variance is an integer and easy to compute.
Below: compute 6×mean, 36×variance, mean (integer), mode, and cumulative numerator.
Example. Values 1, 2, 3 with probabilities 1/6, 2/6, 3/6 → 6×mean = 1×1 + 2×2 + 3×3 = 14.
Example. Same distribution: 36×variance = 6 Σᵢ nᵢxᵢ² − (Σᵢ nᵢxᵢ)², where nᵢ is the numerator of the probability of xᵢ.
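These integer shortcuts are easy to check in code. A sketch using the same distribution as the examples above (numerators over a denominator of 6):

```python
# 6×mean and 36×variance from the probability numerators (denominator 6).
values = [1, 2, 3]       # x_i
numerators = [1, 2, 3]   # n_i, so P(X = x_i) = n_i / 6

six_mean = sum(n * x for n, x in zip(numerators, values))       # 6 E[X]
thirty_six_var = (6 * sum(n * x * x for n, x in zip(numerators, values))
                  - six_mean ** 2)                              # 36 Var(X)

print(six_mean)        # 14
print(thirty_six_var)  # 20
```

Both results are integers, as promised by the denominator-6 setup.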
Problem types and how to solve
| Type | Description | How to get the answer |
| --- | --- | --- |
| 6×mean | 6E[X] | Σ(value × numerator). With denominator 6, the result is an integer. |
| 36×variance | 36 Var(X) | 6 Σ nᵢxᵢ² − (Σ nᵢxᵢ)², where nᵢ = numerator, xᵢ = value. |
| Mean (integer) | The expectation as an integer | (6×mean)/6 when it divides evenly; problems are set so it does. |
| Mode | The value with the highest probability | The xᵢ with the tallest bar. |
| Cumulative numerator | The numerator of P(X ≤ k) | Sum the numerators of the probabilities for values ≤ k. |
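All five problem types in the table can be handled by one small function. A sketch (the function name `solve` and its return tuple are my own choices, not from the text):

```python
# Solve all five denominator-6 problem types for a distribution given as
# values[i] with numerator numerators[i], i.e. P(X = values[i]) = numerators[i] / 6.
def solve(values, numerators, k):
    six_mean = sum(n * x for n, x in zip(numerators, values))           # 6 E[X]
    var36 = 6 * sum(n * x * x for n, x in zip(numerators, values)) - six_mean ** 2
    mean_int = six_mean // 6 if six_mean % 6 == 0 else None             # only when integer
    mode = values[numerators.index(max(numerators))]                    # tallest bar
    cum = sum(n for n, x in zip(numerators, values) if x <= k)          # numerator of P(X <= k)
    return six_mean, var36, mean_int, mode, cum

print(solve([1, 2, 3], [1, 2, 3], k=2))  # (14, 20, None, 3, 3)
```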

Example (6×mean)
Values 1, 2, 3 with probabilities 1/6, 2/6, 3/6. Find 6×mean.
Solution
6E[X] = 1×1 + 2×2 + 3×3 = 14. → Answer: 14

Example (36×variance)
Same distribution with n₁ = 1, n₂ = 2, n₃ = 3 and x₁ = 1, x₂ = 2, x₃ = 3:
36×variance = 6(1·1 + 2·4 + 3·9) − (1 + 4 + 9)² = 6·36 − 196 = 20. → Answer: 20