Ch.00
Intermediate Math and AI: Multivariable Space and Uncertainty
Intermediate math is where the language of AI becomes more precise. Instead of treating data as just numbers, this course views it as vectors and matrices, and studies the rules that map one to another as linear transformations. You’ll also interpret how learning behaves by using Jacobians (how outputs change with many inputs) and Hessians (curvature information), so you can understand why training can be fast, slow, or unstable.
What you learn in Ch01–Ch20
Intermediate math deepens the language you use to understand AI. You learn how data is represented and transformed using vectors, matrices, and linear transformations, then quantify similarity and direction with dot products and projections. Next, you interpret change and curvature using Jacobians and Hessians, which lets you understand the shape of the loss landscape. Finally, you design learning procedures more robustly with Taylor series and convex optimization, and describe uncertainty with Bayes' theorem, covariance, and the multivariate normal distribution.
- Ch.01 Vectors and Vector Space: Magnitude and Direction Beyond Scalars
- Ch.02 Dot Product and Projection: Angle and Similarity Between Data
- Ch.03 Matrices and Data: Structural Representation of Many Vectors
- Ch.04 Matrix Multiplication and Linear Transformation: Math That Manipulates Space
- Ch.05 Inverse and Determinant: Inverse of Transformation and Change in Volume
- Ch.06 Linear Independence and Rank: Redundancy and Effective Dimension
- Ch.07 Eigenvalues and Eigenvectors: Principal Axes Unchanged by Transformation
- Ch.08 Directional Derivative and Gradient: Steepest Ascent in Multidimensional Space
- Ch.09 Jacobian Matrix: First Derivatives of Multivariable Vector Functions
- Ch.10 Hessian Matrix: Second Derivatives and Curvature of Surfaces
- Ch.11 Taylor Series: Approximating Complex Functions with Polynomials
- Ch.12 Convex Optimization: Conditions for Finding the Minimum
- Ch.13 Conditional Probability and Dependence: Probabilistic Relations Between Variables
- Ch.14 Bayes' Theorem: Updating Probability with Observed Data
- Ch.15 Covariance and Correlation: Measuring Linear Association Between Two Variables
- Ch.16 Multivariate Normal Distribution: Joint Probability Model for Many Variables
- Ch.17 Maximum Likelihood Estimation (MLE): Inferring Parameters from Observations
- Ch.18 Entropy: Quantifying Uncertainty via Information Theory
- Ch.19 Cross-Entropy and KL Divergence: Measuring Difference Between Two Distributions
- Ch.20 Intermediate Math Summary: Linear Algebra and Probability Combined
Vectors, matrices, and sensitivity: how intermediate math explains AI
Vector spaces give a framework for describing data by both direction and magnitude. For example, an image can be represented as coordinates of learned features.
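As a minimal sketch of this idea, the NumPy snippet below represents two data points as feature vectors (the values are illustrative, not real features) and separates each into a magnitude and a direction; cosine similarity then compares direction alone.

```python
import numpy as np

# Two data points represented as feature vectors (hypothetical values).
a = np.array([3.0, 4.0])
b = np.array([1.0, 0.0])

# Magnitude (Euclidean norm) and direction (unit vector).
magnitude = np.linalg.norm(a)   # 5.0
direction = a / magnitude       # unit vector pointing the same way

# Cosine similarity compares direction regardless of magnitude.
cos_sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(magnitude, direction, cos_sim)
```

Splitting a vector into norm and unit direction is the same decomposition used later when comparing embeddings by angle rather than length.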
A matrix represents transformations of vectors. In particular, linear transformations provide consistent rules for how coordinates change—this is exactly how each layer in a neural network can be expressed mathematically.
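To make this concrete, here is a toy sketch (the weight values are assumed, not learned) showing a linear layer as nothing more than a matrix applied to an input vector.

```python
import numpy as np

# A linear layer is just a matrix: it maps input coordinates to output
# coordinates by a fixed rule. Weights here are illustrative, not learned.
W = np.array([[2.0, 0.0],
              [0.0, 0.5]])   # stretches axis 0, shrinks axis 1
x = np.array([1.0, 4.0])

y = W @ x                    # linear transformation of the input vector
print(y)                     # [2. 2.]
```

Stacking linear layers composes their matrices: `W2 @ (W1 @ x)` equals `(W2 @ W1) @ x`, which is why a purely linear network collapses to a single transformation.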
Jacobians and Hessians are maps of sensitivity. Jacobians answer “how much the output changes when the inputs change,” while Hessians describe the curvature of the loss landscape. With these maps, you can design learning updates more intelligently.
Training is essentially repeated computation that reduces error. To understand why error decreases, you need multivariable change (gradients and sensitivity), which is the core of intermediate math.
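The loop below sketches this idea on an assumed toy loss, L(w) = w₀² + 10·w₁², rather than a real model: repeatedly stepping against the gradient drives the error down.

```python
import numpy as np

# Minimal gradient descent on a two-variable quadratic loss
# L(w) = w0^2 + 10*w1^2 (a toy example, not a real model).
def grad(w):
    return np.array([2.0 * w[0], 20.0 * w[1]])

w = np.array([5.0, 5.0])
lr = 0.05
for _ in range(200):
    w = w - lr * grad(w)   # step against the gradient to reduce error

print(w)                   # approaches the minimum at [0, 0]
```

Note that the two coordinates shrink at different rates because their curvatures differ; that asymmetry is exactly what the Hessian discussion below quantifies.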
Linear algebra helps interpret representation. Many ideas (like embeddings and component analysis) reduce to “how vectors are rearranged.” Once you know the math, the results become explainable.
Understanding Hessians helps you see why learning is slow near some regions and faster near others. Second-order information also underpins methods such as Newton’s method and trust-region approaches.
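A small sketch of this, using an assumed diagonal Hessian for a quadratic loss: the eigenvalues give the curvature along each principal axis, and their ratio (the condition number) indicates how unevenly gradient descent will progress.

```python
import numpy as np

# For the quadratic loss L(w) = 0.5 * w^T H w, the Hessian H is constant.
# Eigenvalues give the curvature along each principal axis: a large spread
# means some directions are steep and others nearly flat, so a single
# learning rate cannot suit both. Values here are illustrative.
H = np.array([[10.0, 0.0],
              [0.0,  0.1]])

eigenvalues = np.linalg.eigvalsh(H)                      # sorted ascending
condition_number = eigenvalues.max() / eigenvalues.min()
print(eigenvalues, condition_number)                     # large ratio -> ill-conditioned
```

Second-order methods such as Newton's method use exactly this information, rescaling the step by the inverse Hessian so flat and steep directions progress evenly.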
In the forward pass, input vectors are transformed by matrix multiplications and linear rules. This determines which features are emphasized and which are suppressed.
In the backward pass, you need to know how changes propagate, and Jacobians play that role. The chain rule becomes a language for tracking how small changes reach the output, enabling accurate gradient computation.
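As a sketch of the chain rule in Jacobian form (a two-layer toy setup, assumed for illustration): when each layer is linear, its Jacobian is just its weight matrix, and the end-to-end Jacobian is their matrix product.

```python
import numpy as np

# y = f(g(x)) where g and f are linear maps, so their Jacobians are the
# weight matrices themselves, and the end-to-end Jacobian is their product.
W1 = np.array([[1.0, 2.0],
               [0.0, 1.0]])   # Jacobian of the first layer
W2 = np.array([[3.0, 0.0],
               [1.0, 1.0]])   # Jacobian of the second layer

J = W2 @ W1                   # chain rule: Jacobians compose by matrix product

# Verify against applying the composed map directly.
x = np.array([1.0, 1.0])
assert np.allclose(J @ x, W2 @ (W1 @ x))
print(J)
```

Backpropagation is this same composition run in reverse, multiplying Jacobians from the output back toward the inputs.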
During optimization, curvature information (Hessians) can improve stability. Hessians tell you whether the loss surface is flat or steep, shaping the update step.
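The contrast can be sketched on the same assumed quadratic loss: a fixed learning rate treats every direction alike, while a Newton step rescales by the inverse Hessian and, for a quadratic, reaches the minimum in one step.

```python
import numpy as np

# Comparing a plain gradient step with a curvature-scaled (Newton) step on
# the toy quadratic L(w) = 0.5 * w^T H w (values assumed for illustration).
H = np.array([[10.0, 0.0],
              [0.0,  0.1]])
w = np.array([1.0, 1.0])
g = H @ w                                 # gradient of the quadratic at w

gd_step = w - 0.05 * g                    # fixed learning rate ignores curvature
newton_step = w - np.linalg.solve(H, g)   # rescales by the inverse Hessian

print(gd_step)       # uneven progress: fast along the steep axis, slow along the flat one
print(newton_step)   # lands on the minimum [0, 0] in one step for a quadratic
```

In practice the exact Hessian is too expensive for large models, which is why quasi-Newton and adaptive methods approximate this curvature correction instead.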
| Topic | Role in AI | Intermediate concept |
|---|---|---|
| Similarity & direction | Bring similar features closer | Dot product, projection |
| How a layer operates | How one layer transforms vectors | Matrices, linear transformations |
| Sensitivity (change) | How output changes when inputs change | Jacobians, gradients |
| Learning curvature | How fast optimization proceeds | Hessians, eigenvalues |
| Uncertainty language | Describe joint behavior of multiple variables | Covariance, multivariate normal |