Ch.02

Dot Product and Projection: Angle and Similarity Between Data


[Interactive diagram: a plane showing the base vector u, a rotating vector v, v's shadow (projection) onto u, and the residual perpendicular to u, with live readouts for v's direction, u·v, cos θ (direction), and |proj| / |v|.]

As the green vector $\mathbf{v}$ rotates, $\theta$ changes, and the amber shadow (projection) length, the dot product, and $\cos\theta$ move together. Nearer the same direction → larger dot product; orthogonal → $0$; opposite → negative. The small circle shows only $\mathbf{v}$'s direction.
The dot product compresses “how aligned two vectors are” into a single number. An orthogonal projection moves one vector onto the line (or subspace) spanned by another, like a shadow. Building on $\mathbb{R}^n$ from Ch.01, this chapter trains you to read similarity, angles, and distance in the language of dot products, and connects naturally to similarity, attention, and linear layers in ML and deep learning.

Dot Product and Orthogonal Projection: Measuring Similarity with Numbers

The dot product is the “multiply matching entries and add” rule from Ch.01 folded into one number. Geometrically it equals $\|\mathbf{u}\|\|\mathbf{v}\|\cos\theta$. A projection onto a direction is the shadow vector you get by scaling that direction by a dot-product coefficient.
In plain words, the dot product scores how much two arrows point the same way. Same direction → large positive; perpendicular → $0$; opposite → negative. Think of projection as the shadow a flashlight casts on a wall.
Here are the core formulas.
1. Dot product: $\mathbf{u}\cdot\mathbf{v} = \|\mathbf{u}\|\|\mathbf{v}\|\cos\theta$ (uses both lengths and the angle $\theta$ between the vectors)
2. Cosine similarity: $\cos\theta = \dfrac{\mathbf{u}\cdot\mathbf{v}}{\|\mathbf{u}\|\|\mathbf{v}\|}$ (compares pure directional similarity when lengths differ)
3. Orthogonal projection: $\mathrm{proj}_{\mathbf{u}}\mathbf{v}$ (the shadow of $\mathbf{v}$ onto the line in the direction of $\mathbf{u}$)
4. Unit vector: the hat on $\hat{\mathbf{u}}$ usually means “focus on direction.” A unit vector is an arrow with length 1 ($\|\hat{\mathbf{u}}\| = 1$), so length is fixed and only which way it points matters. The shadow of $\mathbf{v}$ onto $\hat{\mathbf{u}}$ can then be written in one step as $(\mathbf{v}\cdot\hat{\mathbf{u}})\,\hat{\mathbf{u}}$. The number $\mathbf{v}\cdot\hat{\mathbf{u}}$ is a single alignment score; the length of the shadow is its magnitude $|\mathbf{v}\cdot\hat{\mathbf{u}}|$. (If the score is negative, the shadow points the opposite way along that line; for length we take the absolute value.)
Here $\|\mathbf{u}\|$ and $\|\mathbf{v}\|$ are norms (lengths). Cosine similarity divides by the product of those lengths, so magnitude cancels out and only direction remains.
These formulas may look dense, but they are just how a computer scores “how similar” two vectors are.
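To make the formulas concrete, here is a minimal NumPy sketch; the vectors are arbitrary examples chosen for illustration:

```python
import numpy as np

u = np.array([3.0, 1.0])
v = np.array([2.0, 4.0])

dot = u @ v                                    # sum of products: 3*2 + 1*4 = 10
cos_theta = dot / (np.linalg.norm(u) * np.linalg.norm(v))  # direction-only score
proj = (dot / (u @ u)) * u                     # shadow of v on the line along u
residual = v - proj                            # leftover component of v

u_hat = u / np.linalg.norm(u)                  # unit vector: same direction, length 1
proj_via_hat = (v @ u_hat) * u_hat             # one-step shadow formula

print(np.allclose(proj, proj_via_hat))         # True: both give the same shadow
print(np.isclose(residual @ u, 0.0))           # True: the residual is orthogonal to u
```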
In deep learning, each linear layer is built from dot products between rows of weights and the input. Attention uses query–key dot products (scores) to decide where to look. Recommendation systems use dot products or cosines between user and item embeddings.
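As an illustration of the first two points, here is a minimal NumPy sketch with toy sizes (not any particular framework's API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear layer: each output entry is the dot product of one weight row with x.
W = rng.standard_normal((3, 4))      # 3 output units, 4 input features
x = rng.standard_normal(4)
z = W @ x
print(np.allclose(z[0], W[0] @ x))   # True: entry 0 is row 0's dot product with x

# Attention scores: one dot product per query-key pair, softmaxed over keys.
Q = rng.standard_normal((2, 4))      # 2 queries of dimension 4
K = rng.standard_normal((3, 4))      # 3 keys of dimension 4
scores = Q @ K.T / np.sqrt(4)        # scaled query-key dot products
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)
print(weights.sum(axis=1))           # each query's weights sum to 1
```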
Summary: the dot product is the “sum of products of components” and couples length and angle; projection is the shadow along a direction; cosine isolates direction; projections pair with orthogonal residuals. Ch.03's matrices bundle many dot products at once.
After Ch.01’s “boxes of numbers,” this chapter is the rule that pairs boxes to make one score. That score becomes the common language for distance, angle, and similarity before matrices, eigenvalues, and optimization.
To make “similar” precise you need a measure. Dot products and cosines separate direction from magnitude in high dimensions and tie directly to preprocessing (e.g., normalization).
Machine learning: similarity for kNN, kernels, and linear/logistic terms $\mathbf{w}\cdot\mathbf{x}$; outliers may show up as small dot products or large angles.
Geometry: least-squares fits as projection onto the column space; PCA and orthogonal bases; Gram–Schmidt subtracts projections to orthogonalize (see the sketch below).
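Here is a minimal classical Gram–Schmidt sketch in NumPy; it ignores the numerical-stability refinements a production routine would need:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize vectors by repeatedly subtracting projections."""
    basis = []
    for v in vectors:
        w = v.astype(float)
        for q in basis:
            w = w - (w @ q) * q      # subtract the shadow of w along q (q has length 1)
        norm = np.linalg.norm(w)
        if norm > 1e-12:             # skip (near-)linearly-dependent vectors
            basis.append(w / norm)
    return basis

B = gram_schmidt([np.array([3.0, 1.0]), np.array([2.0, 2.0])])
print(B[0] @ B[1])  # ≈ 0: the resulting basis vectors are orthogonal
```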
The table below summarizes the formulas and symbol meanings needed for the problems, followed by item-by-item notes on why each definition is set up that way. Worked examples then walk through representative problem types step by step.
| Formula | Meaning |
| --- | --- |
| $\mathbf{u}\cdot\mathbf{v}$ | Sum of products of matching components; the result is a scalar |
| $\|\mathbf{u}\|$ | Euclidean norm (length), $\sqrt{\mathbf{u}\cdot\mathbf{u}}$ |
| $\cos\theta$ | $\dfrac{\mathbf{u}\cdot\mathbf{v}}{\|\mathbf{u}\|\|\mathbf{v}\|}$, the cosine of the angle between the vectors (exclude zero vectors) |
| $\mathrm{proj}_{\mathbf{u}}\mathbf{v}$ | Projection of $\mathbf{v}$ onto the line spanned by $\mathbf{u}$ |
| $\mathbf{v}-\mathrm{proj}_{\mathbf{u}}\mathbf{v}$ | Residual after projection; always orthogonal to $\mathbf{u}$ |
| Unit $\hat{\mathbf{u}}$ | $\|\hat{\mathbf{u}}\| = 1$; shadow length $= \lvert\mathbf{v}\cdot\hat{\mathbf{u}}\rvert$ |
Notes on each row
① $\mathbf{u}\cdot\mathbf{v}$: multiply matching entries and add. The result is a scalar, not another vector. In $\mathbb{R}^2$ this is $u_x v_x + u_y v_y$.
② $\|\mathbf{u}\|$: defined as $\sqrt{\mathbf{u}\cdot\mathbf{u}}$, the Euclidean length.
③ $\cos\theta$: for the angle $\theta$ between the vectors, $\dfrac{\mathbf{u}\cdot\mathbf{v}}{\|\mathbf{u}\|\|\mathbf{v}\|}$. Keep denominators nonzero (avoid zero vectors). Same direction → near $1$; orthogonal → $0$; opposite → negative.
④ $\mathrm{proj}_{\mathbf{u}}\mathbf{v}$: for $\mathbf{u}\neq\mathbf{0}$, equals $\dfrac{\mathbf{u}\cdot\mathbf{v}}{\mathbf{u}\cdot\mathbf{u}}\,\mathbf{u}$. Think of the shadow of $\mathbf{v}$ on the line along $\mathbf{u}$.
⑤ $\mathbf{v}-\mathrm{proj}_{\mathbf{u}}\mathbf{v}$: the residual; always orthogonal to $\mathbf{u}$, and $\mathbf{v} = \mathrm{proj}_{\mathbf{u}}\mathbf{v} + (\mathbf{v}-\mathrm{proj}_{\mathbf{u}}\mathbf{v})$ is an orthogonal decomposition.
⑥ Unit $\hat{\mathbf{u}}$: if $\|\hat{\mathbf{u}}\| = 1$, then $(\mathbf{v}\cdot\hat{\mathbf{u}})\hat{\mathbf{u}}$ is the projection and $|\mathbf{v}\cdot\hat{\mathbf{u}}|$ is the shadow length along $\hat{\mathbf{u}}$.
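A quick worked check of rows ④ and ⑤, with $\mathbf{u} = (2, 0)$ and $\mathbf{v} = (3, 4)$ as example vectors:

$$\mathbf{u}\cdot\mathbf{v} = 2\cdot 3 + 0\cdot 4 = 6,\qquad \mathrm{proj}_{\mathbf{u}}\mathbf{v} = \frac{6}{\mathbf{u}\cdot\mathbf{u}}\,\mathbf{u} = \frac{6}{4}(2, 0) = (3, 0),\qquad \mathbf{v}-\mathrm{proj}_{\mathbf{u}}\mathbf{v} = (0, 4),$$

and $(2, 0)\cdot(0, 4) = 0$ confirms the residual is orthogonal to $\mathbf{u}$.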

Practice problems

Below are 10 problems sampled from a bank of 60 (4 easy, 3 medium, 3 hard; ordered easy → medium → hard). Each item is multiple choice; pick the option number.

In logistic regression with $z = \mathbf{w}\cdot\mathbf{x} + b$, what does $\mathbf{w}\cdot\mathbf{x}$ mainly encode?