Chapter 12
Uniform and Normal Distributions: From Initialization to Prediction
Uniform distribution spreads probability evenly over an interval; normal distribution is bell-shaped around the mean. Used in AI for initialization, noise, and priors.
Uniform & Normal distribution
Much of the continuous data in the world follows recognizable patterns. Understanding the two most basic distributions, the uniform and the normal, is a key step toward grasping how AI works on the inside. The two measures from earlier chapters, the mean (μ) and the variance (σ²), are exactly what shape these distributions.
Uniform distribution — Every value in an interval has the same probability. The graph is a flat rectangle. Think of it as extending “each face of a die has equal chance” to a continuous scale. We use it when we want to give every possibility a fair chance with no bias.
The mean of a uniform distribution on [a, b] is the midpoint (a + b)/2. The variance is (b − a)²/12, proportional to the square of the interval length. The wider the interval, the harder the outcome is to predict (uncertainty grows), so the variance grows too.
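These two formulas are easy to check by simulation. A minimal sketch in plain Python, using an illustrative interval [0, 6]:

```python
import random

a, b = 0.0, 6.0
mean_theory = (a + b) / 2          # midpoint (a + b) / 2
var_theory = (b - a) ** 2 / 12     # (b - a)^2 / 12

# Draw many uniform samples and compare empirical moments to theory.
random.seed(0)
samples = [random.uniform(a, b) for _ in range(100_000)]
mean_emp = sum(samples) / len(samples)
var_emp = sum((x - mean_emp) ** 2 for x in samples) / len(samples)

print(mean_theory, var_theory)                 # 3.0 3.0
print(abs(mean_emp - mean_theory) < 0.05)      # True
print(abs(var_emp - var_theory) < 0.1)         # True
```

With 100,000 samples the empirical mean and variance land very close to the formulas, which is a useful sanity check whenever you implement a sampler.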
Normal distribution — A bell-shaped (bell-curve) distribution symmetric about the mean. Heights, test scores, measurement error, and many other natural phenomena follow it, hence the name "normal." It is also called the Gaussian distribution; the mean (μ) marks the peak and the standard deviation (σ) controls the spread.
The power of the normal distribution is the empirical rule (68–95–99.7): about 68% of the data lie within μ ± σ, about 95% within μ ± 2σ, and about 99.7% within μ ± 3σ. With this rule we can quickly see how far a value is from the mean (outlier or not) and assess the confidence of an AI prediction.
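The 68–95–99.7 rule is easy to verify numerically. A small sketch with the standard normal (all fractions are approximate, since they come from random samples):

```python
import random

random.seed(1)
mu, sigma = 0.0, 1.0
xs = [random.gauss(mu, sigma) for _ in range(200_000)]

def frac_within(k):
    """Fraction of samples inside mu +/- k*sigma."""
    return sum(mu - k * sigma <= x <= mu + k * sigma for x in xs) / len(xs)

for k in (1, 2, 3):
    print(k, round(frac_within(k), 3))   # close to 0.683, 0.954, 0.997
```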
The uniform distribution stands for "we know nothing yet, a blank slate"; the normal distribution for "a natural state centered on a mean." AI initializes weights by spreading them out uniformly, then uses the normal distribution to model the errors in the data as it learns toward the answer.
Design of prior information: In Bayesian statistics, the "preconception" an AI holds before learning is called the prior distribution. When we want to start from a perfectly fair position we use a uniform prior; when we have a reasonable guess that a parameter lies near some mean, we use a normal prior centered there to build that belief into the model.
Mathematical modeling of error: All real-world data contains noise. Individual noise sources act independently, and their sum tends toward a normal distribution. When AI removes noise from photos or restores blurry audio, assuming the noise is normally shaped makes the restoration far more accurate.
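One consequence worth seeing in code: averaging independent normal noise cancels it out, which is why repeated noisy measurements sharpen an estimate. A sketch with made-up numbers (true value 5.0, noise std dev 1.0):

```python
import random, statistics

random.seed(2)
true_value, sigma = 5.0, 1.0

def noisy_reading():
    """One measurement corrupted by Gaussian noise."""
    return true_value + random.gauss(0.0, sigma)

# Averaging n independent readings shrinks the noise std dev by sqrt(n).
n = 100
estimates = [sum(noisy_reading() for _ in range(n)) / n
             for _ in range(5_000)]
print(round(statistics.stdev(estimates), 2))   # close to sigma / sqrt(n) = 0.1
```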
Central limit theorem: This is the foundation of statistics. No matter what shape the data has, if we draw many samples and take the mean of each, the distribution of those means surprisingly approaches a normal distribution. Thanks to this, AI can infer the character of a whole population from a small sample by borrowing the normal distribution.
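A quick sketch of the theorem in action: even though individual draws come from a flat uniform distribution, their sample means pile up around the population mean with the predicted spread (interval [0, 6] chosen for illustration):

```python
import random, statistics

random.seed(3)
n = 30  # draws per sample mean
means = [statistics.mean(random.uniform(0, 6) for _ in range(n))
         for _ in range(20_000)]

# Population mean is 3 and population variance is 3, so the sample means
# should cluster around 3 with std dev sqrt(3 / 30) ~= 0.316.
print(round(statistics.mean(means), 2))    # close to 3.0
print(round(statistics.stdev(means), 2))   # close to 0.32
```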
In deep learning, weight initialization can make or break training. Techniques like Xavier and He initialization carefully tune the variance of a uniform or normal distribution so that the data signal propagates to the deep layers of the network without distortion.
Weight initialization — If we set all weights to zero at the start, the network cannot learn. So we fill them with random numbers from a uniform or normal distribution. Using a normal with small variance keeps most weights near zero, so training starts more stably and quickly.
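As a sketch, the two schemes named above could be implemented like this in pure Python (real frameworks such as PyTorch ship their own versions; the layer sizes here are illustrative):

```python
import random, math

def xavier_uniform(fan_in, fan_out, rng):
    """Xavier: uniform on [-limit, limit], limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

def he_normal(fan_in, fan_out, rng):
    """He: normal with mean 0 and variance 2 / fan_in (suits ReLU layers)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]

rng = random.Random(0)
W = he_normal(256, 128, rng)
flat = [w for row in W for w in row]
mean = sum(flat) / len(flat)
var = sum((w - mean) ** 2 for w in flat) / len(flat)
print(round(var, 4))   # close to 2 / 256 ~= 0.0078
```

Note how both schemes keep the mean at zero and only dial the variance up or down with the layer size, exactly as the paragraph above describes.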
Noise — VAE samples the latent vector from a normal; diffusion models add and remove Gaussian noise step by step.
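The VAE sampling step mentioned above is usually written with the reparameterization trick, z = μ + σ·ε with ε ~ N(0, 1). A minimal sketch (the dimension, mean, and variance values are illustrative):

```python
import random, math

def sample_latent(mu, log_var, rng):
    """Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

rng = random.Random(0)
# One latent dimension with mean 1.0 and variance 0.25 (std dev 0.5).
zs = [sample_latent([1.0], [math.log(0.25)], rng)[0] for _ in range(50_000)]
print(round(sum(zs) / len(zs), 2))   # close to 1.0
```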
Regression — Assuming normally distributed errors makes least squares (OLS) equivalent to maximum likelihood. A rough 95% prediction interval is ŷ ± 1.96σ.
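A sketch of that idea in practice: fit a line by least squares on synthetic data with normal noise, then report ŷ ± 1.96σ as a rough 95% prediction interval (all data here is made up):

```python
import random, statistics

random.seed(4)
# Synthetic data: y = 2x + 1 plus normal noise with std dev 0.5.
xs = [i / 10 for i in range(100)]
ys = [2 * x + 1 + random.gauss(0.0, 0.5) for x in xs]

# Closed-form OLS for a single feature.
mx, my = statistics.mean(xs), statistics.mean(ys)
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

# Residual std dev gives a rough 95% prediction interval: y_hat +/- 1.96 * sigma.
resid = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
sigma = statistics.stdev(resid)
x_new = 5.0
y_hat = slope * x_new + intercept
print(round(slope, 1), round(intercept, 1))   # close to 2.0 and 1.0
print(round(y_hat - 1.96 * sigma, 1), round(y_hat + 1.96 * sigma, 1))
```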
Bayesian — Uniform or normal priors are common; after observing data we compute the posterior. Neural network weights can have normal priors.
Math flow — Ch10 random variables and distributions, Ch11 mean and variance, then Ch12 two concrete distributions (uniform and normal). Knowing these helps read 'initialization', 'noise', and 'prior' in AI papers.
Uniform — On [a, b], density 1/(b − a), mean (a + b)/2, variance (b − a)²/12. Normal — Mean μ, variance σ²; interval probabilities come from the standard normal table or a calculator.
Example (uniform). On [0, 6], the mean is 3, the variance is 3, and the standard deviation is √3 ≈ 1.73.
Example (normal). For mean 70 and standard deviation 10, about 68% of values lie in 60–80 and about 95% in 50–90.
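The exact probability behind "about 68%" can be computed from the normal CDF, which Python exposes through the error function. A sketch with this example's numbers (mean 70, std dev 10):

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma = 70, 10
p = normal_cdf(80, mu, sigma) - normal_cdf(60, mu, sigma)
print(round(p, 4))   # 0.6827
```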
Problem types and how to solve
| Type | Description | How to get the answer |
|---|---|---|
| Uniform | On [a, b] | Mean (a + b)/2, variance (b − a)²/12, std dev (b − a)/√12. |
| Normal | Mean μ, std dev σ | Interval probabilities from the standard normal table or the 68–95–99.7 rule; P(μ − σ ≤ X ≤ μ + σ) ≈ 68%. |
Example (uniform)
For the uniform distribution on [0, 6], find the mean and variance.
Solution
Mean (0 + 6)/2 = 3. Variance (6 − 0)²/12 = 36/12 = 3. → Mean 3, variance 3
Example (normal)
Normal with mean 70 and std dev 10. What fraction lies in (60–80)?
Solution
By the empirical rule, about 68%. → About 68%