Chapter 12
Uniform and Normal Distributions: From Initialization to Prediction
Uniform distribution spreads probability evenly over an interval; normal distribution is bell-shaped around the mean. Used in AI for initialization, noise, and priors.
Uniform & Normal distribution
Much of the continuous data in the world follows recognizable patterns. Understanding the two most basic distributions, the uniform and the normal, is a key step toward grasping how AI works on the inside. The two measures from earlier chapters, the mean (μ) and the variance (σ²), are exactly what shape these distributions.
Uniform distribution — Every value in an interval has the same probability. The graph is a flat rectangle. Think of it as extending “each face of a die has equal chance” to a continuous scale. We use it when we want to give every possibility a fair chance with no bias.
The mean of a uniform distribution on [a, b] is the midpoint (a + b)/2. The variance is (b − a)²/12, proportional to the square of the interval length. The wider the interval, the harder the outcome is to predict (uncertainty grows), so the variance grows too.
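These two formulas are easy to check by simulation. A minimal sketch in plain Python, using an illustrative interval [0, 6]:

```python
import random

a, b = 0.0, 6.0
mean_theory = (a + b) / 2          # midpoint (a + b) / 2
var_theory = (b - a) ** 2 / 12     # (b - a)^2 / 12

# Draw many uniform samples and compare empirical moments to theory.
random.seed(0)
samples = [random.uniform(a, b) for _ in range(100_000)]
mean_emp = sum(samples) / len(samples)
var_emp = sum((x - mean_emp) ** 2 for x in samples) / len(samples)

print(mean_theory, var_theory)                 # 3.0 3.0
print(abs(mean_emp - mean_theory) < 0.05)      # True
print(abs(var_emp - var_theory) < 0.1)         # True
```

With 100,000 samples the empirical mean and variance land very close to the formulas, which is a useful sanity check whenever you implement a sampler.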
Normal distribution — A bell-shaped (bell-curve) distribution symmetric about the mean. Heights, test scores, measurement error, and many other natural phenomena follow it, hence the name "normal." It is also called the Gaussian distribution; the mean (μ) marks the peak and the standard deviation (σ) controls the spread.
The power of the normal distribution is the empirical rule (68–95–99.7): about 68% of the data lie within μ ± σ, about 95% within μ ± 2σ, and about 99.7% within μ ± 3σ. With this rule we can quickly see how far a value is from the mean (outlier or not) and assess the confidence of an AI prediction.
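The 68–95–99.7 rule is easy to verify numerically. A small sketch with the standard normal (all fractions are approximate, since they come from random samples):

```python
import random

random.seed(1)
mu, sigma = 0.0, 1.0
xs = [random.gauss(mu, sigma) for _ in range(200_000)]

def frac_within(k):
    """Fraction of samples inside mu +/- k*sigma."""
    return sum(mu - k * sigma <= x <= mu + k * sigma for x in xs) / len(xs)

for k in (1, 2, 3):
    print(k, round(frac_within(k), 3))   # close to 0.683, 0.954, 0.997
```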
The uniform distribution stands for "we know nothing yet, a blank slate"; the normal distribution for "a natural state centered on a mean." AI initializes weights by spreading them out uniformly, then uses the normal distribution to model the errors in the data as it learns toward the answer.
Design of prior information: In Bayesian statistics, the "preconception" an AI holds before learning is called the prior distribution. When we want to start from a perfectly fair position we use a uniform prior; when we have a reasonable guess that a parameter lies near some mean, we use a normal prior centered there to build that belief into the model.
Mathematical modeling of error: All real-world data contains noise. Individual noise sources act independently, and their sum tends toward a normal distribution. When AI removes noise from photos or restores blurry audio, assuming the noise is normally shaped makes the restoration far more accurate.
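One consequence worth seeing in code: averaging independent normal noise cancels it out, which is why repeated noisy measurements sharpen an estimate. A sketch with made-up numbers (true value 5.0, noise std dev 1.0):

```python
import random, statistics

random.seed(2)
true_value, sigma = 5.0, 1.0

def noisy_reading():
    """One measurement corrupted by Gaussian noise."""
    return true_value + random.gauss(0.0, sigma)

# Averaging n independent readings shrinks the noise std dev by sqrt(n).
n = 100
estimates = [sum(noisy_reading() for _ in range(n)) / n
             for _ in range(5_000)]
print(round(statistics.stdev(estimates), 2))   # close to sigma / sqrt(n) = 0.1
```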
Central limit theorem: This is the foundation of statistics. No matter what shape the data has, if we draw many samples and take the mean of each, the distribution of those means surprisingly approaches a normal distribution. Thanks to this, AI can infer the character of a whole population from a small sample by borrowing the normal distribution.
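A quick sketch of the theorem in action: even though individual draws come from a flat uniform distribution, their sample means pile up around the population mean with the predicted spread (interval [0, 6] chosen for illustration):

```python
import random, statistics

random.seed(3)
n = 30  # draws per sample mean
means = [statistics.mean(random.uniform(0, 6) for _ in range(n))
         for _ in range(20_000)]

# Population mean is 3 and population variance is 3, so the sample means
# should cluster around 3 with std dev sqrt(3 / 30) ~= 0.316.
print(round(statistics.mean(means), 2))    # close to 3.0
print(round(statistics.stdev(means), 2))   # close to 0.32
```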
In deep learning, weight initialization can make or break training. Techniques like Xavier and He initialization carefully tune the variance of a uniform or normal distribution so that the data signal propagates to the deep layers of the network without distortion.
Weight initialization — If we set all weights to zero at the start, the network cannot learn. So we fill them with random numbers from a uniform or normal distribution. Using a normal with small variance keeps most weights near zero, so training starts more stably and quickly.
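As a sketch, the two schemes named above could be implemented like this in pure Python (real frameworks such as PyTorch ship their own versions; the layer sizes here are illustrative):

```python
import random, math

def xavier_uniform(fan_in, fan_out, rng):
    """Xavier: uniform on [-limit, limit], limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

def he_normal(fan_in, fan_out, rng):
    """He: normal with mean 0 and variance 2 / fan_in (suits ReLU layers)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]

rng = random.Random(0)
W = he_normal(256, 128, rng)
flat = [w for row in W for w in row]
mean = sum(flat) / len(flat)
var = sum((w - mean) ** 2 for w in flat) / len(flat)
print(round(var, 4))   # close to 2 / 256 ~= 0.0078
```

Note how both schemes keep the mean at zero and only dial the variance up or down with the layer size, exactly as the paragraph above describes.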
Noise — VAE samples the latent vector from a normal; diffusion models add and remove Gaussian noise step by step.
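The VAE sampling step mentioned above is usually written with the reparameterization trick, z = μ + σ·ε with ε ~ N(0, 1). A minimal sketch (the dimension, mean, and variance values are illustrative):

```python
import random, math

def sample_latent(mu, log_var, rng):
    """Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

rng = random.Random(0)
# One latent dimension with mean 1.0 and variance 0.25 (std dev 0.5).
zs = [sample_latent([1.0], [math.log(0.25)], rng)[0] for _ in range(50_000)]
print(round(sum(zs) / len(zs), 2))   # close to 1.0
```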
Regression — Assuming normally distributed errors makes least squares (OLS) equivalent to maximum likelihood. A rough 95% prediction interval is ŷ ± 1.96σ.
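A sketch of that idea in practice: fit a line by least squares on synthetic data with normal noise, then report ŷ ± 1.96σ as a rough 95% prediction interval (all data here is made up):

```python
import random, statistics

random.seed(4)
# Synthetic data: y = 2x + 1 plus normal noise with std dev 0.5.
xs = [i / 10 for i in range(100)]
ys = [2 * x + 1 + random.gauss(0.0, 0.5) for x in xs]

# Closed-form OLS for a single feature.
mx, my = statistics.mean(xs), statistics.mean(ys)
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

# Residual std dev gives a rough 95% prediction interval: y_hat +/- 1.96 * sigma.
resid = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
sigma = statistics.stdev(resid)
x_new = 5.0
y_hat = slope * x_new + intercept
print(round(slope, 1), round(intercept, 1))   # close to 2.0 and 1.0
print(round(y_hat - 1.96 * sigma, 1), round(y_hat + 1.96 * sigma, 1))
```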
Bayesian — Uniform or normal priors are common; after observing data we compute the posterior. Neural network weights can have normal priors.
Math flow — Ch10 random variables and distributions, Ch11 mean and variance, then Ch12 two concrete distributions (uniform and normal). Knowing these helps read 'initialization', 'noise', and 'prior' in AI papers.
Uniform — On [a, b], density 1/(b − a), mean (a + b)/2, variance (b − a)²/12. Normal — Mean μ, variance σ²; interval probabilities come from the standard normal table or a calculator.
Example (uniform). On [0, 6], the mean is 3, the variance is 3, and the standard deviation is √3 ≈ 1.73.
Example (normal). For mean 70 and standard deviation 10, about 68% of values lie in 60–80 and about 95% in 50–90.
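The exact probability behind "about 68%" can be computed from the normal CDF, which Python exposes through the error function. A sketch with this example's numbers (mean 70, std dev 10):

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma = 70, 10
p = normal_cdf(80, mu, sigma) - normal_cdf(60, mu, sigma)
print(round(p, 4))   # 0.6827
```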
Problem types and how to solve
| Type | Description | How to get the answer |
|---|---|---|
| Uniform | On [a, b] | Mean (a + b)/2, variance (b − a)²/12, std dev (b − a)/√12. |
| Normal | Mean μ, std dev σ | Interval probabilities from the standard normal table or the 68–95–99.7 rule; P(μ − σ ≤ X ≤ μ + σ) ≈ 68%. |
Example (uniform)
For the uniform distribution on [0, 6], find the mean and variance.
Solution
Mean (0 + 6)/2 = 3. Variance (6 − 0)²/12 = 36/12 = 3. → Mean 3, variance 3
Example (normal)
Normal with mean 70 and std dev 10. What fraction lies in (60–80)?
Solution
By the empirical rule, about 68%. → About 68%