Chapter 10

Random Variables & Distributions

A random variable assigns numbers to the outcomes of an experiment; a probability distribution summarizes how likely each value is. Both are used throughout deep learning for prediction and for expressing uncertainty.


[Figure 1: Normal, Poisson, and Binomial distributions. Poisson: skewed (event count); Binomial: symmetric, peak at center (success count).]
[Figure 2: Discrete vs continuous.]

What are random variables and probability distributions?

A random variable assigns numbers to outcomes of an experiment; a probability distribution summarizes how likely each value is. The figure above shows three distributions often used in AI: normal, Poisson, and binomial.
① Discrete random variable — takes only finitely or countably many values. It can be shown in a table, as a function, or as a bar graph. The probability P(X = k) for each value k is the probability mass function (PMF); the essential condition is Σₖ P(X = k) = 1.
Discrete examples — zoo visitors per day, number of heads when flipping two coins, number of rolls until a bowling strike: countable outcomes. The Poisson and binomial bars in the figure are discrete random variables.
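The two-coin example above can be written as a tiny PMF check using only Python's standard library (a minimal sketch; the name `pmf` is just illustrative):

```python
from fractions import Fraction

# PMF of X = number of heads when flipping two fair coins (a discrete RV)
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# The essential PMF condition: all probabilities sum to 1
assert sum(pmf.values()) == 1

for k, p in pmf.items():
    print(f"P(X={k}) = {p}")
```

Using exact fractions instead of floats makes the sum-to-1 check exact rather than approximate.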
② Continuous random variable — takes infinitely many values in an interval. We don't assign probability to a single value; we use a probability density function (PDF) for probabilities over intervals. It's expressed by a function and a curve, not a table.
Continuous examples — annual rainfall, light-bulb lifetime, time until the next bus: continuous quantities. The normal distribution (bell curve) in the figure is a classic continuous example.
A probability distribution is the rule for which values occur and how often. The figure shows normal (continuous), Poisson (discrete), and binomial (discrete) — knowing these covers most uses in AI.
The probability mass function (PMF) is the probability P(X = k) for each value k of a discrete random variable. In a bar chart, the height of each bar is that probability, and the sum of all bar heights is 1. The figure below shows three common distributions.
Connecting to the figures — Figure 1 (above): the normal (left) is continuous (curve); Poisson and binomial (center, right) are discrete (bars). Figure 2 compares discrete (bars) and continuous (curve) side by side. In AI: normal for noise and regression, Poisson for event counts, binomial for success counts and binary classification.
Distribution condition (discrete) — The PMF is the probability P(X = k) of each value k. Essential: Σₖ P(X = k) = 1. (e.g. for a die, P(1) + … + P(6) = 1.)
In plain words: For discrete distributions, all the probabilities of the possible outcomes must add up to 1. Just like a die—the chances of 1 through 6 add up to 1.
Distribution condition (continuous) — The PDF f(x) gives probability over intervals: P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx, and the total area ∫ f(x) dx over (−∞, ∞) is 1.
In plain words: For continuous distributions, probability is the area under the curve. The probability that X falls in [a,b] is the area under the curve from a to b, and the total area under the whole curve is 1.
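The area-under-the-curve idea can be checked numerically, here with the standard normal PDF and a simple midpoint Riemann sum (an illustration of the definition, not how libraries actually integrate):

```python
import math

# Standard normal PDF (mu = 0, sigma = 1)
def pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# P(a <= X <= b) as the area under the curve, via a midpoint Riemann sum
def prob(a, b, steps=100_000):
    dx = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * dx) for i in range(steps)) * dx

print(round(prob(-1, 1), 4))    # ≈ 0.6827 (the "68%" of the 68-95-99.7 rule)
print(round(prob(-10, 10), 4))  # ≈ 1.0 — total area under the curve
```

Note that `prob(a, a)` is 0: a single point has no area, which is why continuous variables get probabilities only over intervals.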
Expectation (mean) — Discrete: E[X] = Σₖ xₖ P(X = k); continuous: E[X] = ∫ x f(x) dx. The “average weighted by probability.”
In plain words: Expectation is the average value when each outcome is weighted by its probability. For a die, it's (1×1/6)+(2×1/6)+…+(6×1/6)=3.5—the "probability-weighted" average.
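The die example works out exactly with Python's `fractions` module (a minimal sketch; `pmf` is an illustrative name):

```python
from fractions import Fraction

# Fair die: each value 1..6 has probability 1/6
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

# Expectation: sum of value × probability
expectation = sum(k * p for k, p in pmf.items())
print(expectation)         # 7/2
print(float(expectation))  # 3.5
```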
Variance — Var(X) = E[(X − E[X])²]. Standard deviation is σ = √Var(X). Ch11 covers this in detail.
In plain words: Variance measures how spread out the values are from the mean. You take (each value minus the mean), square it, then average by probability; the square root of variance is the standard deviation.
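Continuing the die example, variance and standard deviation follow the same recipe (a sketch; the variable names are illustrative):

```python
from fractions import Fraction

# Fair die PMF and its mean, as above
pmf = {k: Fraction(1, 6) for k in range(1, 7)}
mean = sum(k * p for k, p in pmf.items())               # 7/2

# Variance: average of squared deviations, weighted by probability
var = sum((k - mean) ** 2 * p for k, p in pmf.items())  # E[(X − E[X])²]
print(var)                # 35/12
print(float(var) ** 0.5)  # standard deviation ≈ 1.708
```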
Normal distribution (continuous) — Density f(x) = (1 / (σ√(2π))) e^(−(x−μ)² / (2σ²)). μ is the mean, σ the standard deviation.
In plain words: A symmetric bell-shaped curve centered at the mean μ. The spread is controlled by σ (standard deviation)—larger σ means a wider, flatter curve. Often used for heights, measurement error, and noise.
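The effect of σ on the curve's peak can be checked directly from the density formula (a sketch using only the standard library; `normal_pdf` is an illustrative name):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Peak height at x = mu is 1/(sigma * sqrt(2*pi)):
# doubling sigma halves the peak, giving a wider, flatter curve
print(round(normal_pdf(0, sigma=1), 4))  # ≈ 0.3989
print(round(normal_pdf(0, sigma=2), 4))  # ≈ 0.1995
```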
Poisson distribution (discrete) — P(X = k) = λᵏ e^(−λ) / k! for k = 0, 1, 2, …. λ is the average number of events in a fixed interval.
In plain words: Used when counting how many times an event happens in a fixed time or space. λ is the average count; the formula gives the probability of exactly k events. The bar chart is usually skewed to one side.
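The Poisson PMF is a one-liner; the sketch below tabulates P(X = k) for an average of λ = 3 events (variable names are illustrative):

```python
import math

# Poisson PMF: P(X = k) = lambda^k e^(-lambda) / k!
def poisson_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Probabilities peak near lam = 3, then tail off (the skewed bar chart)
for k in range(6):
    print(k, round(poisson_pmf(k, 3), 4))
```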
Binomial distribution (discrete) — P(X = k) = C(n, k) pᵏ (1 − p)^(n−k). n = number of trials, p = success probability per trial.
In plain words: You run the same trial n times and count how many successes (k). p is the chance of success on one trial. Like flipping a coin n times and counting heads—often gives a symmetric, peaked bar chart.
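Likewise for the binomial PMF, here with n = 10 fair coin flips (a sketch; `binom_pmf` is an illustrative name):

```python
import math

# Binomial PMF: P(X = k) = C(n, k) p^k (1-p)^(n-k)
def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# 10 fair coin flips: symmetric bar chart, peaked at k = 5
for k in range(11):
    print(k, round(binom_pmf(k, 10, 0.5), 4))
```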
When we predict with "possible values and their probabilities," that's a random variable and distribution. The three distributions in the figure are used in AI to express uncertainty.
AI and the figure — Normal for regression, noise, and latent spaces; Poisson for view counts, clicks, and event counts; binomial for binary classification and success probability. Softmax, sampling, and cross-entropy loss all tie to these distributions.
Daily life — zoo visitors (discrete); rainfall, bulb lifetime, bus wait time (continuous). Distinguishing countable vs continuous matches the bars (discrete) and curve (continuous) in the figure.
In AI — The normal in the figure models errors and Gaussian noise; Poisson models count data and word frequency; binomial models class probability and success/failure. Ch11–Ch12 cover mean, variance, and the normal distribution in more detail.
For a discrete random variable: ① list possible values and their probabilities → ② check that probabilities sum to 1 → ③ expectation = sum of (value)×(probability).
Sum of probabilities — P(X=1) + P(X=2) + P(X=3) = 1. With denominator 6, a/6 + b/6 + c/6 = 1 gives a + b + c = 6. Knowing two of a, b, c gives the third.
Expectation — E[X] = x₁p₁ + x₂p₂ + x₃p₃. When the denominator is 6, 6·E[X] is an integer, so problems may ask for “6 × expectation”.
Examples — Fill the blank so probabilities sum to 1, or find 6×expectation.
Ex 1. Three probabilities a/6, b/6, c/6 sum to 1, so a+b+c=6. If a=1 and b=2, then c=3.
Ex 2. Values 1, 2, 3 with probabilities 1/6, 2/6, 3/6: 6×expectation = 1×1+2×2+3×3 = 14.
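The three-step checklist and both worked examples can be verified with exact fractions (a minimal sketch; variable names are illustrative):

```python
from fractions import Fraction

# ① List possible values and their probabilities (Ex 1: a=1, b=2 forces c=3)
a, b = 1, 2
c = 6 - a - b  # from a + b + c = 6
pmf = {1: Fraction(a, 6), 2: Fraction(b, 6), 3: Fraction(c, 6)}

# ② Check that probabilities sum to 1
assert sum(pmf.values()) == 1

# ③ Expectation = sum of (value) × (probability), then ×6 (Ex 2)
expectation = sum(k * p for k, p in pmf.items())
print(c)                 # 3
print(6 * expectation)   # 14
```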