Chapter 10
Random Variable & Distribution
A random variable assigns numbers to outcomes of an experiment; a probability distribution summarizes how likely each value is. In deep learning, they are used for prediction and for expressing uncertainty.
Figure 1: Normal (continuous curve) · Poisson (skewed bars, event counts) · Binomial (symmetric bars peaked at the center, success counts)
Figure 2: Discrete vs continuous
What are random variables and probability distributions?
A random variable assigns numbers to outcomes of an experiment; a probability distribution summarizes how likely each value is. The figure above shows three distributions often used in AI: normal, Poisson, and binomial.
① Discrete random variable — takes only finite or countable values. It can be shown in a table, as a function, or as a bar graph. The probability for each value is the probability mass function (PMF); the essential condition is 0 ≤ p(x) ≤ 1 for every value x and Σ p(x) = 1.
Discrete examples — zoo visitors per day, number of heads when flipping two coins, number of rolls until a bowling strike: countable outcomes. The Poisson and binomial bars in the figure are discrete random variables.
② Continuous random variable — takes infinitely many values in an interval. We don't assign probability to a single value; we use a probability density function (PDF) for probabilities over intervals. It's expressed by a function and a curve, not a table.
Continuous examples — annual rainfall, light-bulb lifetime, time until the next bus: continuous quantities. The normal distribution (bell curve) in the figure is a classic continuous example.
A probability distribution is the rule for which values occur and how often. The figure shows normal (continuous), Poisson (discrete), and binomial (discrete) — knowing these covers most uses in AI.
The probability mass function (PMF) is the probability for each value of a discrete random variable. In a bar chart, the height of each bar is that probability, and the sum of all bar heights is 1. Figure 1 above shows three common distributions.
Connecting to the figures — Figure 1 (above): the normal (left) is continuous (curve); Poisson and binomial (center, right) are discrete (bars). Figure 2 compares discrete (bars) and continuous (curve) side by side. In AI: normal for noise and regression, Poisson for event counts, binomial for success counts and binary classification.
Distribution condition (discrete) — The PMF p(x) = P(X = x) is the probability of each value x. Essential: 0 ≤ p(x) ≤ 1 and Σ p(x) = 1. (e.g. For a die, p(1) = p(2) = ⋯ = p(6) = 1/6, and the six probabilities sum to 1.)
In plain words: For discrete distributions, all the probabilities of the possible outcomes must add up to 1. Just like a die—the chances of 1 through 6 add up to 1.
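The die check above can be verified in a couple of lines of Python. This is a minimal sketch using only the standard library; the name `pmf` is just an illustrative choice, not an API.

```python
from fractions import Fraction

# PMF of a fair die: each face 1..6 has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Essential condition: the probabilities of all possible outcomes sum to 1.
total = sum(pmf.values())
print(total)  # 1
```

Using `Fraction` keeps the arithmetic exact, so the sum comes out as exactly 1 rather than a floating-point approximation.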
Distribution condition (continuous) — The PDF f(x) gives probability over intervals: P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx, and the total area is ∫ f(x) dx = 1 (taken over the whole real line).
In plain words: For continuous distributions, probability is the area under the curve. The probability that X falls in [a,b] is the area under the curve from a to b, and the total area under the whole curve is 1.
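To see "probability as area" concretely, here is a small Python sketch for the standard normal. It uses the closed form of the normal CDF in terms of the error function `math.erf`; the helper name `std_normal_cdf` is ours, not from any library.

```python
import math

def std_normal_cdf(x):
    # CDF of the standard normal: area under the curve from -infinity to x.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

# Probability that X falls in [-1, 1]: area under the curve between -1 and 1.
p = std_normal_cdf(1.0) - std_normal_cdf(-1.0)
print(round(p, 4))  # 0.6827 (the familiar "about 68% within one sigma")

# Total area under the whole curve is 1 (here approximated over [-50, 50]).
print(std_normal_cdf(50.0) - std_normal_cdf(-50.0))  # 1.0
```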
Expectation (mean) — Discrete: E[X] = Σ x·p(x); continuous: E[X] = ∫ x·f(x) dx. The "average weighted by probability."
In plain words: Expectation is the average value when each outcome is weighted by its probability. For a die, it's (1×1/6)+(2×1/6)+…+(6×1/6)=3.5—the "probability-weighted" average.
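The die calculation above translates directly into code. A minimal sketch, again with exact fractions so 3.5 appears as 7/2:

```python
from fractions import Fraction

# Fair die: values 1..6, each with probability 1/6.
values = range(1, 7)
p = Fraction(1, 6)

# E[X] = sum of (value) x (probability).
expectation = sum(x * p for x in values)
print(expectation)         # 7/2
print(float(expectation))  # 3.5
```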
Variance — Var(X) = E[(X − μ)²] = E[X²] − μ². Standard deviation is σ = √Var(X). Ch11 covers this in detail.
In plain words: Variance measures how spread out the values are from the mean. You take (each value minus the mean), square it, then average by probability; the square root of variance is the standard deviation.
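Continuing the die example, the recipe "(value minus mean), squared, averaged by probability" looks like this in a short Python sketch:

```python
import math
from fractions import Fraction

values = range(1, 7)
p = Fraction(1, 6)

# Mean of a fair die: 7/2.
mean = sum(x * p for x in values)

# Var(X) = E[(X - mean)^2]: squared deviations, weighted by probability.
variance = sum((x - mean) ** 2 * p for x in values)  # 35/12, about 2.9167

# Standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)  # about 1.7078
```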
Normal distribution (continuous) — Density f(x) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)). μ is the mean, σ the standard deviation.
In plain words: A symmetric bell-shaped curve centered at the mean μ. The spread is controlled by σ (standard deviation)—larger σ means a wider, flatter curve. Often used for heights, measurement error, and noise.
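The density formula can be typed in directly. A sketch (the function name `normal_pdf` is our own); it also shows that a larger σ lowers and widens the peak:

```python
import math

def normal_pdf(x, mu, sigma):
    # f(x) = (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)^2 / (2*sigma^2))
    coef = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# The curve peaks at the mean; doubling sigma halves the peak height.
print(round(normal_pdf(0, 0, 1), 4))  # 0.3989 (standard normal peak)
print(round(normal_pdf(0, 0, 2), 4))  # 0.1995 (wider, flatter curve)
```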
Poisson distribution (discrete) — P(X = k) = e^(−λ) λ^k / k! (k = 0, 1, 2, …). λ is the average number of events in a fixed interval.
In plain words: Used when counting how many times an event happens in a fixed time or space. λ is the average count; the formula gives the probability of exactly k events. The bar chart is usually skewed to one side.
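A short sketch of the Poisson formula (the helper name `poisson_pmf` is ours). Printing the first few values for λ = 3 shows the skewed bar shape the figure describes:

```python
import math

def poisson_pmf(k, lam):
    # P(X = k) = e^(-lam) * lam^k / k!
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Average of 3 events per interval: probability of seeing exactly k events.
for k in range(6):
    print(k, round(poisson_pmf(k, 3), 4))
```

The probabilities rise to a peak near k = λ and then tail off, which is the one-sided skew visible in the bar chart.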
Binomial distribution (discrete) — P(X = k) = C(n, k) p^k (1 − p)^(n−k). n = number of trials, p = success probability per trial.
In plain words: You run the same trial n times and count how many successes (k). p is the chance of success on one trial. Like flipping a coin n times and counting heads—often gives a symmetric, peaked bar chart.
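The coin-flip picture can be checked with a few lines of Python (a sketch; `binomial_pmf` is our own helper built on the standard library's `math.comb`):

```python
import math

def binomial_pmf(k, n, p):
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# 10 fair-coin flips: a symmetric bar chart peaked at k = 5.
for k in range(11):
    print(k, round(binomial_pmf(k, 10, 0.5), 4))
```

With p = 0.5 the chart is symmetric and peaks at k = n/2 (here P(X = 5) = 252/1024 ≈ 0.2461), matching the "symmetric, peaked" description above.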
When we predict with "possible values and their probabilities," that's a random variable and distribution. The three distributions in the figure are used in AI to express uncertainty.
AI and the figure — Normal for regression, noise, and latent spaces; Poisson for view counts, clicks, and event counts; Binomial for binary classification and success probability. Softmax, sampling, and cross-entropy loss all tie back to these distributions.
Daily life — zoo visitors (discrete); rainfall, bulb lifetime, bus wait time (continuous). Distinguishing countable vs continuous matches the bars (discrete) and curve (continuous) in the figure.
In AI — The normal in the figure models errors and Gaussian noise; Poisson models count data and word frequency; binomial models class probability and success/failure. Ch11–Ch12 cover mean, variance, and the normal distribution in more detail.
For a discrete random variable: ① list possible values and their probabilities → ② check that probabilities sum to 1 → ③ expectation = sum of (value)×(probability).
Sum of probabilities — p₁ + p₂ + ⋯ + pₙ = 1. With denominator 6, a/6 + b/6 + c/6 = 1 gives a + b + c = 6. Knowing two of a, b, c gives the third.
Expectation — E[X] = Σ x·p(x). When the denominator is 6, 6 × E[X] is an integer, so problems may ask for "6 × expectation".
Examples — Fill the blank so probabilities sum to 1, or find 6×expectation.
Ex 1. Three probabilities a/6, b/6, c/6 sum to 1, so a+b+c=6. If a=1 and b=2, then c=3.
Ex 2. Values 1, 2, 3 with probabilities 1/6, 2/6, 3/6: 6×expectation = 1×1+2×2+3×3 = 14.
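Both worked examples follow the three-step recipe above (list values and probabilities → check the sum is 1 → compute the expectation), and can be verified in a short Python sketch:

```python
from fractions import Fraction

# Ex 1: probabilities a/6, b/6, c/6 must sum to 1, so a + b + c = 6.
a, b = 1, 2
c = 6 - a - b
print(c)  # 3

# Ex 2: values 1, 2, 3 with probabilities 1/6, 2/6, 3/6.
values = [1, 2, 3]
probs = [Fraction(k, 6) for k in (1, 2, 3)]
assert sum(probs) == 1  # step 2: probabilities sum to 1

# Step 3: expectation = sum of (value) x (probability).
expectation = sum(v * p for v, p in zip(values, probs))
print(6 * expectation)  # 14
```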