Ch.10

Hessian Matrix: Second Derivatives and Curvature of Surfaces


The first derivative tells you "which way is downhill"; the second derivative (the Hessian) tells you how the surface curves there: does it bowl toward a bottom, or go up in one direction and down in another (a saddle point)? The figure below illustrates this.
The Hessian is the matrix of second derivatives, so the "curvature" in the figure is exactly what the Hessian describes.

[Figure: three surfaces illustrating curvature. Left: a bowl, curving upward in every direction → the bottom is a minimum. Middle: an inverted bowl (dome), curving downward in every direction → the top is a maximum. Right: a saddle, going up in one direction (orange) and down in another (green) → neither a minimum nor a maximum.]

The Hessian matrix is a square matrix of the second-order partial derivatives of a scalar function. It encodes how much a surface curves at a point, is used in optimization to classify minima, maxima, and saddle points, and forms the basis of Newton's method and trust-region methods.

Hessian Matrix: Reading the Curvature of Surfaces

What is the Hessian matrix? — Think of it as a table of numbers that describe how much the surface curves in every direction at the point where you stand. It is a square matrix built from second derivatives of the function, and it is symmetric (same on both sides of the diagonal).
Imagine walking downhill with your eyes closed. What you feel under your feet—"this way is steeper down"—is the first derivative (gradient). The sense of "if I take one more step, will the ground bowl down or stay flat?" is the second derivative, i.e. the Hessian. With it you can avoid cliffs and find the true bottom, like the bottom of a bowl.
More precisely, the Hessian \mathbf{H} is the table whose (i,j) entry is H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j} — the function f differentiated twice, once in the x_i direction and once in the x_j direction. The eigenvalues of this matrix are what matter: all positive → local minimum (bowl), all negative → local maximum (dome), mixed signs → saddle point (up in one direction, down in another).
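The eigenvalue test above can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy; the function name `classify_critical_point` is made up here:

```python
import numpy as np

def classify_critical_point(H):
    """Classify a critical point from the eigenvalues of the (symmetric) Hessian H."""
    eigvals = np.linalg.eigvalsh(H)  # eigvalsh exploits symmetry of H
    if np.all(eigvals > 0):
        return "local minimum"
    if np.all(eigvals < 0):
        return "local maximum"
    if np.any(eigvals > 0) and np.any(eigvals < 0):
        return "saddle point"
    return "inconclusive (some eigenvalue is zero)"

# f(x, y) = x^2 - y^2 has Hessian [[2, 0], [0, -2]] everywhere: a saddle at the origin.
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, -2.0]])))  # saddle point
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, 5.0]])))   # local minimum
```

Note the "inconclusive" branch: when some eigenvalue is exactly zero, the second-derivative test alone cannot decide.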
In machine learning, training is about finding the "valley" where the error is smallest. Moving only by gradient is slow. Using the Hessian to read curvature lets you take Newton-style jumps toward the bottom and learn much faster.
The Hessian is a symmetric matrix of second partial derivatives of a scalar function and encodes curvature and the nature of critical points. At a point where the gradient is zero, all positive eigenvalues imply a local minimum, all negative a local maximum, and mixed signs a saddle point. In machine learning it underlies second-order optimization such as Newton's method, trust-region, and quasi-Newton methods.
On the way down you may hit a flat spot where the gradient is zero. That does not mean you have reached the true bottom—it could be a saddle (flat in one place but up one way and down another). The eigenvalues of the Hessian tell you whether it is a true minimum or a saddle. When there are many variables (as in AI), avoiding these fake bottoms is crucial.
You want small steps on narrow paths and larger steps on open ground. The Hessian tells you "how steep each direction is," so you can set step size (learning rate) well and descend efficiently without wasted moves.
Newton's method moves a long way in one step with \mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{H}^{-1} \nabla f(\mathbf{x}_k). Here \mathbf{x}_k is the current point, \nabla f(\mathbf{x}_k) is the gradient there, \mathbf{H} is the Hessian at that point, and \mathbf{H}^{-1} is its inverse. So you look at both the gradient and the curvature (Hessian) and jump toward the bottom to \mathbf{x}_{k+1}. That can reach the answer much faster than small gradient-only steps.
When there are many variables, computing the Hessian exactly is costly. In practice, quasi-Newton methods (e.g. BFGS) approximate the Hessian from past gradient information instead of computing it fully, and are used more often.
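A single Newton step can be sketched as follows. The quadratic f(x) = x_1^2 + 3x_2^2 and the names `grad` and `hessian` are illustrative choices, not from the text:

```python
import numpy as np

# One Newton step on f(x) = x1^2 + 3*x2^2, a bowl with its minimum at the origin.
def grad(x):
    return np.array([2.0 * x[0], 6.0 * x[1]])     # gradient of f

def hessian(x):
    return np.array([[2.0, 0.0], [0.0, 6.0]])     # Hessian of f (constant for a quadratic)

x = np.array([4.0, -2.0])
# Solve H d = grad(x) instead of forming the inverse explicitly (cheaper, more stable).
d = np.linalg.solve(hessian(x), grad(x))
x_next = x - d
print(x_next)  # [0. 0.] — for a quadratic, one Newton step lands exactly on the minimum
```

For many variables, quasi-Newton methods skip the exact Hessian; for instance, SciPy's `scipy.optimize.minimize(..., method="BFGS")` builds its curvature approximation from past gradients.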
The table below lists only formulas and symbol meanings needed for problem-solving. See the worked examples under the table for step-by-step solutions.
  • Formula: H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}
    Meaning: H_{ij} is the (i,j) entry of the table — "differentiate once in x_i, once in x_j." f is the function; x_i, x_j are variable (axis) indices. The order of differentiation does not matter, so H_{ij} = H_{ji} and the matrix is symmetric.
  • Formula: n^2 (total entries)
    Meaning: n = number of variables. With n variables the Hessian is n \times n, so there are n^2 entries. E.g. 2 variables → 4, 3 variables → 9.
  • Formula: \frac{n(n+1)}{2} (independent entries)
    Meaning: n = number of variables. By symmetry you only count the diagonal and upper triangle, giving 1 + 2 + \cdots + n = n(n+1)/2. E.g. 2 variables → 3, 3 variables → 6.
  • Formula: n (rows/columns)
    Meaning: n = number of variables. The Hessian is n \times n, so the answer to "how many rows? how many columns?" is n for both.
  • Formula: eigenvalue test
    Meaning: \lambda = eigenvalue of the Hessian (curvature in each direction). All positive → bowl, minimum. All negative → dome, maximum. Mixed signs → up one way, down another, saddle.
  • Formula: \mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{H}^{-1} \nabla f(\mathbf{x}_k)
    Meaning: \mathbf{x}_k = current point, \mathbf{x}_{k+1} = next point. \mathbf{H} = Hessian at \mathbf{x}_k, \mathbf{H}^{-1} = its inverse, \nabla f(\mathbf{x}_k) = gradient there. The formula "jumps toward the bottom" using both gradient and curvature.
  • Formula: x_1 = x_0 - \frac{f'(x_0)}{f''(x_0)}
    Meaning: x_0 = current position, x_1 = next. f'(x_0) = slope (first derivative), f''(x_0) = second derivative (the 1D Hessian). For f(x) = ax^2 + bx + c, f''(x) = 2a (constant).
  • Formula: f''(x) = 2a (for f(x) = ax^2 + bx + c)
    Meaning: f'' = second derivative; a is the coefficient of x^2. Differentiating a quadratic twice removes x and leaves the constant 2a.
  • Formula: \nabla f = \mathbf{0} (critical point)
    Meaning: \nabla f = gradient (vector of first partials); \mathbf{0} = zero vector ("no gradient"). A point where the gradient is zero is a candidate minimum, maximum, or saddle; use the Hessian eigenvalues to tell which.
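The formulas above can be checked symbolically. A sketch using SymPy's `hessian` helper; the example function f(x_1, x_2) = x_1^2 + 3x_1x_2 + 2x_2^2 is an arbitrary choice for illustration:

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
f = x1**2 + 3*x1*x2 + 2*x2**2

# Build the matrix of second partials H_ij = d^2 f / (dx_i dx_j).
H = sp.hessian(f, (x1, x2))
print(H)        # Matrix([[2, 3], [3, 4]]) — symmetric, as the table predicts
print(H.shape)  # (2, 2): n = 2 variables → n x n = 4 entries
```

The off-diagonal entries agree (H_{12} = H_{21} = 3), confirming the symmetry used to count independent entries.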

Worked examples

Example 1 — Entry count
Problem: How many Hessian entries does f(x_1, x_2) have?
Solution: For 2 variables the Hessian is a 2 \times 2 matrix, so there are 4 entries in total. By symmetry H_{12} = H_{21}, so the independent entries are H_{11}, H_{12}, H_{22} — 3 of them.
→ Answer: 4 total entries, 3 independent entries.
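The two counting formulas from the table can be wrapped in a tiny helper (the name `hessian_counts` is made up here):

```python
def hessian_counts(n):
    """Total and independent entry counts of an n x n symmetric Hessian."""
    return n * n, n * (n + 1) // 2

print(hessian_counts(2))  # (4, 3) — matches Example 1
print(hessian_counts(3))  # (9, 6)
```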

Example 2 — Minimum
Problem: When the Hessian eigenvalues are 2 and 5, what kind of point is it?
Solution: Both eigenvalues are positive, so the surface curves upward in every direction (a bowl). It is a local minimum.
→ Choose 1 (minimum) among ① minimum, ② maximum, ③ saddle.

Example 3 — Maximum
Problem: When the Hessian eigenvalues are -1 and -3, what kind of point is it?
Solution: Both are negative, so the surface curves downward in every direction (an inverted bowl). Local maximum.
→ Choose 2 (maximum).

Example 4 — Saddle
Problem: When the Hessian eigenvalues are 2 and -1, what kind of point is it?
Solution: The eigenvalues have mixed signs, so one direction goes up and another down. Saddle point.
→ Choose 3 (saddle).

Example 5 — Second derivative value
Problem: For f(x) = 3x^2 + 2x + 1, what is f''(x)?
Solution: For a quadratic ax^2 + bx + c, the coefficient of x^2 is a = 3. The second derivative is f''(x) = 2a = 2 \times 3 = 6, a constant independent of x.
→ Answer: 6.
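The answer can be double-checked numerically with a central finite difference, which approximates f''(x) without any symbolic work (step size h and the test point x0 are arbitrary choices):

```python
def f(x):
    return 3*x**2 + 2*x + 1

h = 1e-3
x0 = 1.7  # any point works: the second derivative of a quadratic is constant
# Central difference: f''(x) ≈ (f(x+h) - 2 f(x) + f(x-h)) / h^2
second_deriv = (f(x0 + h) - 2*f(x0) + f(x0 - h)) / h**2
print(second_deriv)  # close to 6.0 (exact up to floating-point rounding)
```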

Example 6 — Newton step (1D)
Problem: For f(x) = x^2 with x_0 = 4, what is x_1 after one Newton step?
Solution: The 1D Newton step is x_1 = x_0 - f'(x_0)/f''(x_0). We have f'(x) = 2x and f''(x) = 2, so f'(4) = 8 and f''(4) = 2. Thus x_1 = 4 - 8/2 = 0.
→ Answer: 0.
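The same arithmetic as a tiny function (the name `newton_step_1d` is made up here):

```python
def newton_step_1d(x, fprime, fsecond):
    """One 1D Newton step: x - f'(x) / f''(x)."""
    return x - fprime(x) / fsecond(x)

# f(x) = x^2: f'(x) = 2x, f''(x) = 2
x1 = newton_step_1d(4.0, lambda x: 2*x, lambda x: 2.0)
print(x1)  # 0.0 — one step reaches the minimum of the quadratic exactly
```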

Example 7 — Definition (T/F)
Problem: "If all eigenvalues of the Hessian are positive, the point is a local minimum." Answer 1 if true, 0 if false.
Solution: The statement is true. At a critical point (zero gradient), all-positive eigenvalues mean the surface curves upward in every direction (a bowl), so the point is a local minimum.
→ Answer: 1.

Problems

Read the instructions below, find the answer (an integer), and enter it in the blank (?).

Choose the option that matches the question. Enter one number (1, 2, or 3) for ① minimum, ② maximum, ③ saddle. (Hessian eigenvalue / definition questions)