Ch.10
Hessian Matrix: Second Derivatives and Curvature of Surfaces
The first derivative tells you "which way is downhill"; the second (the Hessian) tells you how the surface curves: like a bowl around a minimum, or up in one direction and down in another (a saddle point). Follow the animation below.
The Hessian is the matrix of second derivatives, so the "curvature" in the figure below is exactly what the Hessian describes.
[Figure: three surfaces from the original animation] Left: bowl, curving upward in every direction → minimum at the bottom. Middle: dome (inverted bowl), curving downward in every direction → maximum at the top. Right: saddle, up in one direction and down in the other → neither min nor max.
The Hessian matrix is a square matrix of second-order partial derivatives of a scalar function. It encodes how much a surface curves at a point and is used to classify minima, maxima, and saddle points in optimization, and forms the basis of Newton's method and trust-region methods.
Hessian Matrix: Reading the Curvature of Surfaces
What is the Hessian matrix? — Think of it as a table of numbers that describe how much the surface curves in every direction at the point where you stand. It is a square matrix built from second derivatives of the function, and it is symmetric (same on both sides of the diagonal).
Imagine walking downhill with your eyes closed. What you feel under your feet, "this way is steeper down", is the first derivative (gradient). The sense of "if I take one more step, will the ground keep dropping or curve back up?" is the second derivative, i.e. the Hessian. With it you can avoid cliffs and find the true bottom, like the bottom of a bowl.
More precisely, the Hessian is the table whose (i, j) entry is H_ij = ∂²f/∂x_i ∂x_j, the function differentiated twice, once in the x_i direction and once in the x_j direction. The eigenvalues of this matrix are what matter: all positive → local minimum (bowl), all negative → local maximum (dome), mixed signs → saddle point (up in one direction, down in another).
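The eigenvalue test can be checked numerically. A minimal sketch in Python (NumPy assumed; the function f(x, y) = x² − y², a classic saddle, is illustrative):

```python
import numpy as np

# Hessian of f(x, y) = x^2 - y^2, computed analytically:
# d2f/dx2 = 2, d2f/dy2 = -2, mixed partials = 0.
H = np.array([[2.0, 0.0],
              [0.0, -2.0]])

eigvals = np.linalg.eigvalsh(H)  # eigenvalues of a symmetric matrix

if np.all(eigvals > 0):
    kind = "local minimum"       # bowl: curves up everywhere
elif np.all(eigvals < 0):
    kind = "local maximum"       # dome: curves down everywhere
else:
    kind = "saddle point"        # mixed signs: up one way, down another

print(eigvals, kind)
```

Here the eigenvalues are −2 and 2, so the classification is a saddle point.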
In machine learning, training is about finding the "valley" where the error is smallest. Moving only by gradient is slow. Using the Hessian to read curvature lets you take Newton-style jumps toward the bottom and learn much faster.
The Hessian is a symmetric matrix of second partial derivatives of a scalar function and encodes curvature and the nature of critical points. At a point where the gradient is zero, all positive eigenvalues imply a local minimum, all negative a local maximum, and mixed signs a saddle point. In machine learning it underlies second-order optimization such as Newton's method, trust-region, and quasi-Newton methods.
On the way down you may hit a flat spot where the gradient is zero. That does not mean you have reached the true bottom—it could be a saddle (flat in one place but up one way and down another). The eigenvalues of the Hessian tell you whether it is a true minimum or a saddle. When there are many variables (as in AI), avoiding these fake bottoms is crucial.
You want small steps on narrow paths and larger steps on open ground. The Hessian tells you "how steep each direction is," so you can set step size (learning rate) well and descend efficiently without wasted moves.
Newton's method moves a lot in one step with x_new = x − H⁻¹∇f(x). Here x is the current point, ∇f(x) is the gradient there, H is the Hessian at that point, and H⁻¹ is its inverse. So you look at both the gradient and the curvature (Hessian) and jump toward the bottom to x_new. That can reach the answer much faster than small gradient-only steps.
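For a quadratic function the Newton jump is exact: one step lands on the bottom of the bowl. A minimal sketch (NumPy assumed; the matrix A, vector b, and starting point are illustrative):

```python
import numpy as np

# For a quadratic f(x) = 0.5 x^T A x - b^T x, the gradient is Ax - b and
# the Hessian is A, so one Newton step x - H^{-1} grad hits the minimizer.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])   # symmetric positive definite: a bowl
b = np.array([1.0, 1.0])

x = np.array([5.0, -4.0])             # arbitrary starting point
grad = A @ x - b                      # gradient at x
x_new = x - np.linalg.solve(A, grad)  # Newton step (solve, not explicit inverse)

print(x_new)  # equals the exact minimizer, the solution of A x = b
```

Using `np.linalg.solve` instead of forming H⁻¹ explicitly is the standard numerically safer choice.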
When there are many variables, computing the Hessian exactly is costly. In practice, quasi-Newton methods (e.g. BFGS) approximate the Hessian from past gradient information instead of computing it fully, and are used more often.
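In practice this means calling an off-the-shelf quasi-Newton optimizer. A minimal sketch (SciPy assumed; the quadratic objective with its minimum at (1, 2) is illustrative):

```python
import numpy as np
from scipy.optimize import minimize  # assumes SciPy is available

# BFGS builds an approximate Hessian from successive gradients instead of
# computing second derivatives directly.
def f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] - 2.0) ** 2

res = minimize(f, x0=np.zeros(2), method="BFGS")
print(res.x)  # close to the true minimum (1, 2)
```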
The table below lists only formulas and symbol meanings needed for problem-solving. See the worked examples under the table for step-by-step solutions.
| Formula | Symbol meaning |
|---|---|
| H_ij = ∂²f/∂x_i ∂x_j | H_ij is the entry in row i, column j of the table: "differentiate f once in x_i, once in x_j." f is the function; i, j are variable (axis) indices. Order does not matter, so H_ij = H_ji and the matrix is symmetric. |
| n² (total entries) | n = number of variables. With n variables the Hessian is n × n, so there are n² entries. E.g. 2 vars → 4, 3 vars → 9. |
| n(n+1)/2 (independent entries) | n = number of variables. By symmetry you only count the diagonal and one triangle, giving n(n+1)/2. E.g. 2 vars → 3, 3 vars → 6. |
| n (rows/columns) | n = number of variables. The Hessian is n × n, so "how many rows? columns?" both are n. |
| Eigenvalue test (signs of λ) | λ = eigenvalue of the Hessian (curvature in each direction). All positive → bowl, minimum. All negative → dome, maximum. Mixed signs → up one way, down the other, saddle. |
| x_new = x − H⁻¹∇f(x) | x = current point, x_new = next point. H = Hessian at that point, H⁻¹ = its inverse. ∇f(x) = gradient there. The formula "jumps toward the bottom" using both gradient and curvature. |
| x_new = x − f′(x)/f″(x) | x = current position, x_new = next. f′(x) = slope (1st derivative), f″(x) = 2nd derivative (Hessian in 1D). For f(x) = x², f″(x) = 2 (constant). |
| f″(x) = 2a (for f(x) = ax² + bx + c) | f″(x) = second derivative. a is the coefficient of x². For a quadratic, differentiating twice removes x and leaves the constant 2a. |
| ∇f(x) = 0 (critical point) | ∇f = gradient (vector of 1st partials). 0 = zero vector ("no gradient"). Where the gradient is zero is a candidate for min/max/saddle; use Hessian eigenvalues to tell which. |
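A concrete Hessian can also be built symbolically, which is a handy cross-check on the entry-count and symmetry formulas. A minimal sketch (SymPy assumed; the function f = x² + 3xy + y² is illustrative):

```python
import sympy as sp  # assumes SymPy is available

x, y = sp.symbols("x y")
f = x**2 + 3 * x * y + y**2    # illustrative two-variable function

H = sp.hessian(f, (x, y))      # matrix of second partial derivatives
print(H)                       # symmetric: mixed partials are equal
print(H.shape)                 # (2, 2): n^2 = 4 entries, n(n+1)/2 = 3 independent
```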
Worked examples
Example 1 — Entry count
Problem: How many entries does the Hessian of a function of two variables f(x, y) have?
Solution: For 2 variables the Hessian is a 2 × 2 matrix, so there are 2² = 4 entries in total. By symmetry H_xy = H_yx, so the number of independent entries is n(n+1)/2 = 2 · 3 / 2 = 3.
→ Answer 4 for total count, 3 for independent entries.
Example 2 — Minimum
Problem: When the Hessian eigenvalues are 2 and 5, what kind of point is it?
Solution: Both eigenvalues are positive, so the surface curves upward in every direction (bowl). So it is a local minimum.
→ Choose 1 (minimum) among ① min ② max ③ saddle.
Example 3 — Maximum
Problem: When both Hessian eigenvalues are negative, what kind of point is it?
Solution: Both are negative, so the surface curves downward in every direction (dome, an inverted bowl). Local maximum.
→ Choose 2 (maximum).
Example 4 — Saddle
Problem: When the Hessian eigenvalues have opposite signs (one positive, one negative), what kind of point is it?
Solution: The eigenvalues have mixed signs, so the surface goes up in one direction and down in another. Saddle point.
→ Choose 3 (saddle).
Example 5 — Second derivative value
Problem: For a quadratic whose x² coefficient is 3, f(x) = 3x² + bx + c, what is f″(x)?
Solution: For f(x) = ax² + bx + c the second derivative is f″(x) = 2a, a constant independent of x. Here a = 3, so f″(x) = 2 · 3 = 6.
→ Answer 6.
Example 6 — Newton step (1D)
Problem: For f(x) = x², what is the new point after one Newton step from a starting point x₀?
Solution: The 1D Newton step is x_new = x − f′(x)/f″(x). We have f′(x) = 2x and f″(x) = 2, so x_new = x₀ − 2x₀/2 = 0, no matter where you start.
→ Answer 0.
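The 1D Newton step from Example 6 is easy to verify in code. A minimal sketch (the helper `newton_step` is hypothetical, not from the text):

```python
def newton_step(x, fprime, fsecond):
    """One 1D Newton step: x - f'(x) / f''(x)."""
    return x - fprime(x) / fsecond(x)

# f(x) = x^2: f'(x) = 2x, f''(x) = 2, so one step reaches 0 from any start.
for x0 in [3.0, -7.5, 0.25]:
    print(newton_step(x0, lambda x: 2 * x, lambda x: 2))  # 0.0 each time
```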
Example 7 — Definition (T/F)
Problem: "If all eigenvalues of the Hessian are positive, the point is a local minimum." Answer 1 if true, 0 if false.
Solution: The statement is correct. When all eigenvalues are positive, the surface curves upward in every direction (bowl), so it is a local minimum.
→ Answer 1.
Problems
Read the instructions below, find the answer (integer), and enter it in the blank (?).
Choose the option that matches the question and enter one number (1, 2, 3): ① minimum ② maximum ③ saddle. (Hessian eigenvalue / definition question)