Ch.10

Hessian Matrix: Second Derivatives and Curvature of Surfaces


The first derivative tells you "which way is downhill"; the second derivative (the Hessian) tells you how the surface curves there: does it bowl toward a bottom, or go up in one direction and down in another (a saddle point)? The figure below illustrates this.
The Hessian is the matrix of second derivatives, so the "curvature" in the figure is exactly what the Hessian describes.

[Figure: three surfaces illustrating curvature. Left: a bowl, curving upward in every direction → the bottom is a minimum. Middle: an inverted bowl (dome), curving downward in every direction → the top is a maximum. Right: a saddle, going up in one direction (orange) and down in another (green) → neither a minimum nor a maximum.]

The Hessian matrix is a square matrix of the second-order partial derivatives of a scalar function. It encodes how much a surface curves at a point, is used in optimization to classify minima, maxima, and saddle points, and forms the basis of Newton's method and trust-region methods.

Hessian Matrix: Reading the Curvature of Surfaces

What is the Hessian matrix? — Think of it as a table of numbers that describe how much the surface curves in every direction at the point where you stand. It is a square matrix built from second derivatives of the function, and it is symmetric (same on both sides of the diagonal).
Imagine walking downhill with your eyes closed. What you feel under your feet—"this way is steeper down"—is the first derivative (gradient). The sense of "if I take one more step, will the ground bowl down or stay flat?" is the second derivative, i.e. the Hessian. With it you can avoid cliffs and find the true bottom, like the bottom of a bowl.
More precisely, the Hessian \mathbf{H} is the table whose (i,j) entry is H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j} — the function f differentiated twice, once in the x_i direction and once in the x_j direction. The eigenvalues of this matrix are what matter: all positive → local minimum (bowl), all negative → local maximum (dome), mixed signs → saddle point (up in one direction, down in another).
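The eigenvalue test above can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy; the function name `classify_critical_point` is made up here:

```python
import numpy as np

def classify_critical_point(H):
    """Classify a critical point from the eigenvalues of the (symmetric) Hessian H."""
    eigvals = np.linalg.eigvalsh(H)  # eigvalsh exploits symmetry of H
    if np.all(eigvals > 0):
        return "local minimum"
    if np.all(eigvals < 0):
        return "local maximum"
    if np.any(eigvals > 0) and np.any(eigvals < 0):
        return "saddle point"
    return "inconclusive (some eigenvalue is zero)"

# f(x, y) = x^2 - y^2 has Hessian [[2, 0], [0, -2]] everywhere: a saddle at the origin.
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, -2.0]])))  # saddle point
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, 5.0]])))   # local minimum
```

Note the "inconclusive" branch: when some eigenvalue is exactly zero, the second-derivative test alone cannot decide.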
In machine learning, training is about finding the "valley" where the error is smallest. Moving only by gradient is slow. Using the Hessian to read curvature lets you take Newton-style jumps toward the bottom and learn much faster.
The Hessian is a symmetric matrix of second partial derivatives of a scalar function and encodes curvature and the nature of critical points. At a point where the gradient is zero, all positive eigenvalues imply a local minimum, all negative a local maximum, and mixed signs a saddle point. In machine learning it underlies second-order optimization such as Newton's method, trust-region, and quasi-Newton methods.
On the way down you may hit a flat spot where the gradient is zero. That does not mean you have reached the true bottom—it could be a saddle (flat in one place but up one way and down another). The eigenvalues of the Hessian tell you whether it is a true minimum or a saddle. When there are many variables (as in AI), avoiding these fake bottoms is crucial.
You want small steps on narrow paths and larger steps on open ground. The Hessian tells you "how steep each direction is," so you can set step size (learning rate) well and descend efficiently without wasted moves.
Newton's method moves a long way in one step with \mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{H}^{-1} \nabla f(\mathbf{x}_k). Here \mathbf{x}_k is the current point, \nabla f(\mathbf{x}_k) is the gradient there, \mathbf{H} is the Hessian at that point, and \mathbf{H}^{-1} is its inverse. So you look at both the gradient and the curvature (Hessian) and jump toward the bottom to \mathbf{x}_{k+1}. That can reach the answer much faster than small gradient-only steps.
When there are many variables, computing the Hessian exactly is costly. In practice, quasi-Newton methods (e.g. BFGS) approximate the Hessian from past gradient information instead of computing it fully, and are used more often.
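A single Newton step can be sketched as follows. The quadratic f(x) = x_1^2 + 3x_2^2 and the names `grad` and `hessian` are illustrative choices, not from the text:

```python
import numpy as np

# One Newton step on f(x) = x1^2 + 3*x2^2, a bowl with its minimum at the origin.
def grad(x):
    return np.array([2.0 * x[0], 6.0 * x[1]])     # gradient of f

def hessian(x):
    return np.array([[2.0, 0.0], [0.0, 6.0]])     # Hessian of f (constant for a quadratic)

x = np.array([4.0, -2.0])
# Solve H d = grad(x) instead of forming the inverse explicitly (cheaper, more stable).
d = np.linalg.solve(hessian(x), grad(x))
x_next = x - d
print(x_next)  # [0. 0.] — for a quadratic, one Newton step lands exactly on the minimum
```

For many variables, quasi-Newton methods skip the exact Hessian; for instance, SciPy's `scipy.optimize.minimize(..., method="BFGS")` builds its curvature approximation from past gradients.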
The table below lists only formulas and symbol meanings needed for problem-solving. See the worked examples under the table for step-by-step solutions.
  • Formula: H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}
    Meaning: H_{ij} is the (i,j) entry of the table — "differentiate once in x_i, once in x_j." f is the function; x_i, x_j are variable (axis) indices. The order of differentiation does not matter, so H_{ij} = H_{ji} and the matrix is symmetric.
  • Formula: n^2 (total entries)
    Meaning: n = number of variables. With n variables the Hessian is n \times n, so there are n^2 entries. E.g. 2 variables → 4, 3 variables → 9.
  • Formula: \frac{n(n+1)}{2} (independent entries)
    Meaning: n = number of variables. By symmetry you only count the diagonal and upper triangle, giving 1 + 2 + \cdots + n = n(n+1)/2. E.g. 2 variables → 3, 3 variables → 6.
  • Formula: n (rows/columns)
    Meaning: n = number of variables. The Hessian is n \times n, so the answer to "how many rows? how many columns?" is n for both.
  • Formula: eigenvalue test
    Meaning: \lambda = eigenvalue of the Hessian (curvature in each direction). All positive → bowl, minimum. All negative → dome, maximum. Mixed signs → up one way, down another, saddle.
  • Formula: \mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{H}^{-1} \nabla f(\mathbf{x}_k)
    Meaning: \mathbf{x}_k = current point, \mathbf{x}_{k+1} = next point. \mathbf{H} = Hessian at \mathbf{x}_k, \mathbf{H}^{-1} = its inverse, \nabla f(\mathbf{x}_k) = gradient there. The formula "jumps toward the bottom" using both gradient and curvature.
  • Formula: x_1 = x_0 - \frac{f'(x_0)}{f''(x_0)}
    Meaning: x_0 = current position, x_1 = next. f'(x_0) = slope (first derivative), f''(x_0) = second derivative (the 1D Hessian). For f(x) = ax^2 + bx + c, f''(x) = 2a (constant).
  • Formula: f''(x) = 2a (for f(x) = ax^2 + bx + c)
    Meaning: f'' = second derivative; a is the coefficient of x^2. Differentiating a quadratic twice removes x and leaves the constant 2a.
  • Formula: \nabla f = \mathbf{0} (critical point)
    Meaning: \nabla f = gradient (vector of first partials); \mathbf{0} = zero vector ("no gradient"). A point where the gradient is zero is a candidate minimum, maximum, or saddle; use the Hessian eigenvalues to tell which.
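The formulas above can be checked symbolically. A sketch using SymPy's `hessian` helper; the example function f(x_1, x_2) = x_1^2 + 3x_1x_2 + 2x_2^2 is an arbitrary choice for illustration:

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
f = x1**2 + 3*x1*x2 + 2*x2**2

# Build the matrix of second partials H_ij = d^2 f / (dx_i dx_j).
H = sp.hessian(f, (x1, x2))
print(H)        # Matrix([[2, 3], [3, 4]]) — symmetric, as the table predicts
print(H.shape)  # (2, 2): n = 2 variables → n x n = 4 entries
```

The off-diagonal entries agree (H_{12} = H_{21} = 3), confirming the symmetry used to count independent entries.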

Worked examples

Example 1 — Entry count
Problem: How many Hessian entries does f(x_1, x_2) have?
Solution: For 2 variables the Hessian is a 2 \times 2 matrix, so there are 4 entries in total. By symmetry H_{12} = H_{21}, so the independent entries are H_{11}, H_{12}, H_{22} — 3 of them.
→ Answer: 4 total entries, 3 independent entries.
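The two counting formulas from the table can be wrapped in a tiny helper (the name `hessian_counts` is made up here):

```python
def hessian_counts(n):
    """Total and independent entry counts of an n x n symmetric Hessian."""
    return n * n, n * (n + 1) // 2

print(hessian_counts(2))  # (4, 3) — matches Example 1
print(hessian_counts(3))  # (9, 6)
```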

Example 2 — Minimum
Problem: When the Hessian eigenvalues are 2 and 5, what kind of point is it?
Solution: Both eigenvalues are positive, so the surface curves upward in every direction (a bowl). It is a local minimum.
→ Choose 1 (minimum) among ① minimum, ② maximum, ③ saddle.

Example 3 — Maximum
Problem: When the Hessian eigenvalues are -1 and -3, what kind of point is it?
Solution: Both are negative, so the surface curves downward in every direction (an inverted bowl). Local maximum.
→ Choose 2 (maximum).

Example 4 — Saddle
Problem: When the Hessian eigenvalues are 2 and -1, what kind of point is it?
Solution: The eigenvalues have mixed signs, so one direction goes up and another down. Saddle point.
→ Choose 3 (saddle).

Example 5 — Second derivative value
Problem: For f(x) = 3x^2 + 2x + 1, what is f''(x)?
Solution: For a quadratic ax^2 + bx + c, the coefficient of x^2 is a = 3. The second derivative is f''(x) = 2a = 2 \times 3 = 6, a constant independent of x.
→ Answer: 6.
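The answer can be double-checked numerically with a central finite difference, which approximates f''(x) without any symbolic work (step size h and the test point x0 are arbitrary choices):

```python
def f(x):
    return 3*x**2 + 2*x + 1

h = 1e-3
x0 = 1.7  # any point works: the second derivative of a quadratic is constant
# Central difference: f''(x) ≈ (f(x+h) - 2 f(x) + f(x-h)) / h^2
second_deriv = (f(x0 + h) - 2*f(x0) + f(x0 - h)) / h**2
print(second_deriv)  # close to 6.0 (exact up to floating-point rounding)
```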

Example 6 — Newton step (1D)
Problem: For f(x) = x^2 with x_0 = 4, what is x_1 after one Newton step?
Solution: The 1D Newton step is x_1 = x_0 - f'(x_0)/f''(x_0). We have f'(x) = 2x and f''(x) = 2, so f'(4) = 8 and f''(4) = 2. Thus x_1 = 4 - 8/2 = 0.
→ Answer: 0.
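The same arithmetic as a tiny function (the name `newton_step_1d` is made up here):

```python
def newton_step_1d(x, fprime, fsecond):
    """One 1D Newton step: x - f'(x) / f''(x)."""
    return x - fprime(x) / fsecond(x)

# f(x) = x^2: f'(x) = 2x, f''(x) = 2
x1 = newton_step_1d(4.0, lambda x: 2*x, lambda x: 2.0)
print(x1)  # 0.0 — one step reaches the minimum of the quadratic exactly
```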

Example 7 — Definition (T/F)
Problem: "If all eigenvalues of the Hessian are positive, the point is a local minimum." Answer 1 if true, 0 if false.
Solution: The statement is true. At a critical point (zero gradient), all-positive eigenvalues mean the surface curves upward in every direction (a bowl), so the point is a local minimum.
→ Answer: 1.

Problems

Read the instructions below, find the answer (an integer), and enter it in the blank (?).

Choose the option that matches the question. Enter one number (1, 2, or 3) for ① minimum, ② maximum, ③ saddle. (Hessian eigenvalue / definition questions)