Chapter 06

Derivative

Differentiation gives the instantaneous rate of change (the slope) at a point. Collected over all points, the derivative becomes a function, and that function is the basis for gradient descent and backpropagation in deep learning.


Left: the tangent line, which just touches the curve at the chosen point; its slope is the derivative there. Right: a line through two points on the curve. As the two points move closer together, this line approaches the tangent, and that limiting slope is the derivative.

Pick the point (2, 4) on the curve y = x^2. We want to measure the slope at this point.

The derivative is the slope of the tangent line. As the two points move together, the slopes of the lines through them approach the tangent slope; that limit is the derivative.

What are differentiation and the derivative?

In one sentence: The derivative is how steep the curve is at a single point—the slope of the tangent line there. On the graph, pick a point and draw the tangent (the line that just touches the curve). The slope of that line is the derivative. It tells you how quickly the height changes when you move a tiny bit sideways.
In symbols: that slope is the limit of the average slope between two very close points. When this limit exists, we say the function is differentiable there and write the value as f'(a) or \frac{df}{dx}(a), the derivative at a. In the visual above, the tangent line's slope is exactly f'(a).
The derivative (function) is the function that gives the tangent slope at every point. Each point has one slope; the derivative collects them. Finding it is called differentiating.
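The limit of secant slopes can be checked numerically. A minimal sketch on y = x^2 at the point (2, 4) from above (the step sizes are chosen arbitrarily for illustration):

```python
# Secant slopes on y = x^2 near the point (2, 4).
# As h shrinks, (f(2+h) - f(2)) / h approaches the tangent slope f'(2) = 4.

def f(x):
    return x ** 2

for h in [1.0, 0.1, 0.01, 0.001]:
    secant_slope = (f(2 + h) - f(2)) / h
    print(f"h = {h:>6}: secant slope = {secant_slope}")
```

Each printed slope is closer to 4 than the last, which is exactly the limit process the text describes.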
The table below lists the common differentiation formulas. The prime mark (') means the derivative.
Formula: derivative (and how to get it)
Constant: 0
Power x^n: n x^{n-1} (bring down the n, lower the exponent by 1)
Exponential e^x: e^x (unchanged)
Exponential a^x: a^x \ln a
Natural log \ln x: 1/x
Log (base a): 1/(x \ln a)
\sin x: \cos x
\cos x: -\sin x
\tan x: 1/\cos^2 x
Sum/difference: differentiate each part and add or subtract
Constant multiple: keep the constant, differentiate the rest
Product rule: (first)' × second + first × (second)'
Quotient rule: ((top)' × bottom - top × (bottom)') / bottom^2
Intuition (product rule): in the limit, the change in the first factor and the change in the second factor each contribute one term, so the two terms add up.
Intuition (quotient rule): write the quotient as top × (1/bottom) and apply the product rule; differentiating 1/bottom yields the usual formula.
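The product and quotient rules can be sanity-checked against a small central difference. A sketch (the test point x = 1.5 and the step size are arbitrary choices for illustration):

```python
import math

# Compare the product- and quotient-rule formulas from the table above
# against a numeric derivative (central difference).

def num_deriv(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.5

# Product rule: (x * e^x)' = 1 * e^x + x * e^x = e^x * (1 + x)
product = lambda t: t * math.exp(t)
assert abs(num_deriv(product, x) - math.exp(x) * (1 + x)) < 1e-6

# Quotient rule: (x / (x^2 + 1))' = (1*(x^2+1) - x*2x) / (x^2+1)^2
quotient = lambda t: t / (t ** 2 + 1)
expected = ((x ** 2 + 1) - x * 2 * x) / (x ** 2 + 1) ** 2
assert abs(num_deriv(quotient, x) - expected) < 1e-6

print("product and quotient rules match the numeric derivative")
```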
Formula: numeric example (solution step)
Constant: (5)' = 0, (-3)' = 0 (a constant's derivative is 0)
x^n: (x^3)' = 3x^2, so at x = 2 the slope is 12 (bring down the n, lower the exponent by 1)
e^x: at x = 0 the derivative is 1; at x = 1 it is e (unchanged by differentiation)
a^x: (2^x)' = 2^x \ln 2 (multiply by \ln a)
\ln x: at x = 5 the derivative is 1/5 (the derivative is 1/x)
\sin x: at x = 0 the derivative is 1 (\sin becomes \cos)
\cos x: at x = 0 the derivative is 0 (\cos becomes -\sin)
Sum: (x^2 + x^3)' = 2x + 3x^2 (differentiate each term and add)
Constant multiple: (5x^2)' = 10x (keep the constant, differentiate the rest)
Product: (x e^x)' = e^x(1 + x) ((first)' × second + first × (second)')
Quotient: x/(x^2 + 1), at x = 1 the derivative is 0 (((top)' × bottom - top × (bottom)') / bottom^2)
In everyday life, "slope" tells you which way something is changing and how fast. For example, when you go down a hill, the steepest way down is the direction where the slope is largest. At the bottom of a valley (minimum) or the top of a hill (maximum), you are neither going up nor down, so the slope there is zero. So finding the lowest or highest point is the same as finding where the slope is zero. The derivative is how we write that "slope" precisely in math. Once you know derivatives, you can not only find where the minimum or maximum is, but also "which way to go to go down the fastest."
In deep learning, we first define a score (loss) — for example how far the prediction is from the right answer — and we want to lower it. The lower this score, the better the model is doing. Then we change many numbers inside the model (weights) a little at a time to reduce that score. But there are so many numbers that we cannot try "increase this one, decrease that one" by hand. So we use the gradient. The gradient tells us, for each number, whether increasing it a little or decreasing it a little makes the score go down — it is the same idea as the derivative. We then decide "to lower the score, increase this weight and decrease that one." Knowing derivatives helps explain why the AI updates numbers the way it does.
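The idea of repeatedly nudging a number against its slope can be sketched in one dimension. The loss function and learning rate below are made up for the example; real models have many weights, but each one is updated the same way:

```python
# Minimal gradient-descent sketch: loss(w) = (w - 3)^2 has its minimum
# at w = 3, where the slope is zero.

def loss(w):
    return (w - 3) ** 2

def grad(w):           # derivative of the loss: d/dw (w - 3)^2 = 2(w - 3)
    return 2 * (w - 3)

w = 0.0                # start far from the minimum
lr = 0.1               # learning rate (step size), an arbitrary choice
for step in range(50):
    w -= lr * grad(w)  # move against the slope to lower the loss
print(w)               # ends up close to 3, where the derivative is 0
```

Each step moves w downhill, and the updates shrink as the slope approaches zero, which is why the process settles at the minimum.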
Backpropagation is the method that goes backward from the output toward the input, step by step, and computes for each weight "how much does the loss change?" (the derivative). The model has many layers (input → first layer → … → output), so when you change one weight, the next steps change in turn, and in the end the loss changes. To follow this "chain" of changes with derivatives, we need differentiation of composite functions (chain rule). The derivative and formulas in this chapter are used as-is, so learning them here pays off later.
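The chain rule that backpropagation relies on can be shown on a tiny two-step "network". The functions here (g(x) = x^2 feeding into f(u) = sin u) and the input value are arbitrary illustrations:

```python
import math

# Chain rule: for f(g(x)), the derivative is f'(g(x)) * g'(x).
# Here (sin(x^2))' = cos(x^2) * 2x. Backprop applies this layer by layer,
# multiplying local derivatives from the output backward.

x = 0.7
u = x ** 2              # forward pass through the "first layer"
y = math.sin(u)         # forward pass through the "second layer"

dy_du = math.cos(u)     # local derivative of the outer function
du_dx = 2 * x           # local derivative of the inner function
dy_dx = dy_du * du_dx   # chain rule: multiply along the chain

# cross-check with a small central difference
h = 1e-6
numeric = (math.sin((x + h) ** 2) - math.sin((x - h) ** 2)) / (2 * h)
assert abs(dy_dx - numeric) < 1e-6
print(dy_dx)
```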
In general, the derivative measures how much, and in which direction (up or down), the result changes if you change one value a little. So we use it whenever we decide whether to increase or decrease a number. For example: do sales go up or down when the temperature rises? If we raise the price, how much does demand drop? If we spend more on ads, how much does revenue go up? All such questions ("if I change one thing a little, how much and which way does something else change?") can be measured with the derivative.
In AI training, "how much and in which direction to update each weight" is given by differentiating the loss with respect to that weight. We want to lower the loss (the score), but there are thousands or millions of weights, so we cannot try each one by experiment. Instead we use the derivative: "if I increase this weight a little, does the loss go down or up?" So knowing the differentiation rules in this chapter (sum, product, quotient, chain) lets you read backprop formulas and later extend to partial derivatives and the gradient (Ch08).
To find the derivative, identify which rules apply (power, exponential, log, trig, product, quotient, chain), then apply them and simplify.
Example problems and solutions are in the table below.
Problem: solution
Ex 1. f(x) = x^3 - 2x. Power and sum rules: f'(x) = 3x^2 - 2
Ex 2. g(x) = e^x \ln x. Product rule: g'(x) = e^x \ln x + e^x \cdot \frac{1}{x}
Ex 3. h(x) = \frac{\sin x}{x}. Quotient rule: h'(x) = \frac{x \cos x - \sin x}{x^2}
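The three worked answers can be cross-checked numerically. A sketch using a central difference at an arbitrary test point:

```python
import math

# Numeric check of the three example derivatives above.

def num_deriv(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

x = 2.0

# Ex 1: f(x) = x^3 - 2x  ->  f'(x) = 3x^2 - 2
assert abs(num_deriv(lambda t: t ** 3 - 2 * t, x) - (3 * x ** 2 - 2)) < 1e-5

# Ex 2: g(x) = e^x ln x  ->  g'(x) = e^x ln x + e^x / x
assert abs(num_deriv(lambda t: math.exp(t) * math.log(t), x)
           - (math.exp(x) * math.log(x) + math.exp(x) / x)) < 1e-5

# Ex 3: h(x) = sin x / x  ->  h'(x) = (x cos x - sin x) / x^2
assert abs(num_deriv(lambda t: math.sin(t) / t, x)
           - (x * math.cos(x) - math.sin(x)) / x ** 2) < 1e-6

print("all three example derivatives check out")
```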