Chapter 06

Derivative

Differentiation gives the instantaneous rate of change (the slope) at a point. Collected over all points, the derivative becomes a function, and that function is the basis for gradient descent and backpropagation in deep learning.


Left: the tangent line, which just touches the curve at the chosen point; its slope is the derivative there. Right: a line through two points on the curve. As the two points move closer together, this line approaches the tangent, and that limiting slope is the derivative.

Pick the point (2, 4) on the curve y = x^2. We want to measure the slope at this point.

The derivative is the slope of the tangent line. As the two points move together, the slopes of the lines through them approach the tangent slope; that limit is the derivative.

What are differentiation and the derivative?

In one sentence: The derivative is how steep the curve is at a single point—the slope of the tangent line there. On the graph, pick a point and draw the tangent (the line that just touches the curve). The slope of that line is the derivative. It tells you how quickly the height changes when you move a tiny bit sideways.
In symbols: that slope is the limit of the average slope between two very close points. When this limit exists, we say the function is differentiable there and write the value as f'(a) or \frac{df}{dx}(a), the derivative at a. In the visual above, the tangent line's slope is exactly f'(a).
The derivative (function) is the function that gives the tangent slope at every point. Each point has one slope; the derivative collects them. Finding it is called differentiating.
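The limit of secant slopes can be checked numerically. A minimal sketch on y = x^2 at the point (2, 4) from above (the step sizes are chosen arbitrarily for illustration):

```python
# Secant slopes on y = x^2 near the point (2, 4).
# As h shrinks, (f(2+h) - f(2)) / h approaches the tangent slope f'(2) = 4.

def f(x):
    return x ** 2

for h in [1.0, 0.1, 0.01, 0.001]:
    secant_slope = (f(2 + h) - f(2)) / h
    print(f"h = {h:>6}: secant slope = {secant_slope}")
```

Each printed slope is closer to 4 than the last, which is exactly the limit process the text describes.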
The table below lists the common differentiation formulas. The prime mark (') means the derivative.
Formula: derivative (and how to get it)
Constant: 0
Power x^n: n x^{n-1} (bring down the n, lower the exponent by 1)
Exponential e^x: e^x (unchanged)
Exponential a^x: a^x \ln a
Natural log \ln x: 1/x
Log (base a): 1/(x \ln a)
\sin x: \cos x
\cos x: -\sin x
\tan x: 1/\cos^2 x
Sum/difference: differentiate each part and add or subtract
Constant multiple: keep the constant, differentiate the rest
Product rule: (first)' × second + first × (second)'
Quotient rule: ((top)' × bottom - top × (bottom)') / bottom^2
Intuition (product rule): in the limit, the change in the first factor and the change in the second factor each contribute one term, so the two terms add up.
Intuition (quotient rule): write the quotient as top × (1/bottom) and apply the product rule; differentiating 1/bottom yields the usual formula.
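The product and quotient rules can be sanity-checked against a small central difference. A sketch (the test point x = 1.5 and the step size are arbitrary choices for illustration):

```python
import math

# Compare the product- and quotient-rule formulas from the table above
# against a numeric derivative (central difference).

def num_deriv(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.5

# Product rule: (x * e^x)' = 1 * e^x + x * e^x = e^x * (1 + x)
product = lambda t: t * math.exp(t)
assert abs(num_deriv(product, x) - math.exp(x) * (1 + x)) < 1e-6

# Quotient rule: (x / (x^2 + 1))' = (1*(x^2+1) - x*2x) / (x^2+1)^2
quotient = lambda t: t / (t ** 2 + 1)
expected = ((x ** 2 + 1) - x * 2 * x) / (x ** 2 + 1) ** 2
assert abs(num_deriv(quotient, x) - expected) < 1e-6

print("product and quotient rules match the numeric derivative")
```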
Formula: numeric example (solution step)
Constant: (5)' = 0, (-3)' = 0 (a constant's derivative is 0)
x^n: (x^3)' = 3x^2, so at x = 2 the slope is 12 (bring down the n, lower the exponent by 1)
e^x: at x = 0 the derivative is 1; at x = 1 it is e (unchanged by differentiation)
a^x: (2^x)' = 2^x \ln 2 (multiply by \ln a)
\ln x: at x = 5 the derivative is 1/5 (the derivative is 1/x)
\sin x: at x = 0 the derivative is 1 (\sin becomes \cos)
\cos x: at x = 0 the derivative is 0 (\cos becomes -\sin)
Sum: (x^2 + x^3)' = 2x + 3x^2 (differentiate each term and add)
Constant multiple: (5x^2)' = 10x (keep the constant, differentiate the rest)
Product: (x e^x)' = e^x(1 + x) ((first)' × second + first × (second)')
Quotient: x/(x^2 + 1), at x = 1 the derivative is 0 (((top)' × bottom - top × (bottom)') / bottom^2)
In everyday life, "slope" tells you which way something is changing and how fast. For example, when you go down a hill, the steepest way down is the direction where the slope is largest. At the bottom of a valley (minimum) or the top of a hill (maximum), you are neither going up nor down, so the slope there is zero. So finding the lowest or highest point is the same as finding where the slope is zero. The derivative is how we write that "slope" precisely in math. Once you know derivatives, you can not only find where the minimum or maximum is, but also "which way to go to go down the fastest."
In deep learning, we first define a score (loss) — for example how far the prediction is from the right answer — and we want to lower it. The lower this score, the better the model is doing. Then we change many numbers inside the model (weights) a little at a time to reduce that score. But there are so many numbers that we cannot try "increase this one, decrease that one" by hand. So we use the gradient. The gradient tells us, for each number, whether increasing it a little or decreasing it a little makes the score go down — it is the same idea as the derivative. We then decide "to lower the score, increase this weight and decrease that one." Knowing derivatives helps explain why the AI updates numbers the way it does.
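The idea of repeatedly nudging a number against its slope can be sketched in one dimension. The loss function and learning rate below are made up for the example; real models have many weights, but each one is updated the same way:

```python
# Minimal gradient-descent sketch: loss(w) = (w - 3)^2 has its minimum
# at w = 3, where the slope is zero.

def loss(w):
    return (w - 3) ** 2

def grad(w):           # derivative of the loss: d/dw (w - 3)^2 = 2(w - 3)
    return 2 * (w - 3)

w = 0.0                # start far from the minimum
lr = 0.1               # learning rate (step size), an arbitrary choice
for step in range(50):
    w -= lr * grad(w)  # move against the slope to lower the loss
print(w)               # ends up close to 3, where the derivative is 0
```

Each step moves w downhill, and the updates shrink as the slope approaches zero, which is why the process settles at the minimum.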
Backpropagation is the method that goes backward from the output toward the input, step by step, and computes for each weight "how much does the loss change?" (the derivative). The model has many layers (input → first layer → … → output), so when you change one weight, the next steps change in turn, and in the end the loss changes. To follow this "chain" of changes with derivatives, we need differentiation of composite functions (chain rule). The derivative and formulas in this chapter are used as-is, so learning them here pays off later.
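The chain rule that backpropagation relies on can be shown on a tiny two-step "network". The functions here (g(x) = x^2 feeding into f(u) = sin u) and the input value are arbitrary illustrations:

```python
import math

# Chain rule: for f(g(x)), the derivative is f'(g(x)) * g'(x).
# Here (sin(x^2))' = cos(x^2) * 2x. Backprop applies this layer by layer,
# multiplying local derivatives from the output backward.

x = 0.7
u = x ** 2              # forward pass through the "first layer"
y = math.sin(u)         # forward pass through the "second layer"

dy_du = math.cos(u)     # local derivative of the outer function
du_dx = 2 * x           # local derivative of the inner function
dy_dx = dy_du * du_dx   # chain rule: multiply along the chain

# cross-check with a small central difference
h = 1e-6
numeric = (math.sin((x + h) ** 2) - math.sin((x - h) ** 2)) / (2 * h)
assert abs(dy_dx - numeric) < 1e-6
print(dy_dx)
```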
In general, the derivative measures how much, and in which direction (up or down), the result changes if you change one value a little. So we use it whenever we decide whether to increase or decrease a number. For example: do sales go up or down when the temperature rises? If we raise the price, how much does demand drop? If we spend more on ads, how much does revenue go up? All such questions ("if I change one thing a little, how much and which way does something else change?") can be measured with the derivative.
In AI training, "how much and in which direction to update each weight" is given by differentiating the loss with respect to that weight. We want to lower the loss (the score), but there are thousands or millions of weights, so we cannot try each one by experiment. Instead we use the derivative: "if I increase this weight a little, does the loss go down or up?" So knowing the differentiation rules in this chapter (sum, product, quotient, chain) lets you read backprop formulas and later extend to partial derivatives and the gradient (Ch08).
To find the derivative, identify which rules apply (power, exponential, log, trig, product, quotient, chain), then apply them and simplify.
Example problems and solutions are in the table below.
Problem: solution
Ex 1. f(x) = x^3 - 2x. Power and sum rules: f'(x) = 3x^2 - 2
Ex 2. g(x) = e^x \ln x. Product rule: g'(x) = e^x \ln x + e^x \cdot \frac{1}{x}
Ex 3. h(x) = \frac{\sin x}{x}. Quotient rule: h'(x) = \frac{x \cos x - \sin x}{x^2}
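The three worked answers can be cross-checked numerically. A sketch using a central difference at an arbitrary test point:

```python
import math

# Numeric check of the three example derivatives above.

def num_deriv(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

x = 2.0

# Ex 1: f(x) = x^3 - 2x  ->  f'(x) = 3x^2 - 2
assert abs(num_deriv(lambda t: t ** 3 - 2 * t, x) - (3 * x ** 2 - 2)) < 1e-5

# Ex 2: g(x) = e^x ln x  ->  g'(x) = e^x ln x + e^x / x
assert abs(num_deriv(lambda t: math.exp(t) * math.log(t), x)
           - (math.exp(x) * math.log(x) + math.exp(x) / x)) < 1e-5

# Ex 3: h(x) = sin x / x  ->  h'(x) = (x cos x - sin x) / x^2
assert abs(num_deriv(lambda t: math.sin(t) / t, x)
           - (x * math.cos(x) - math.sin(x)) / x ** 2) < 1e-6

print("all three example derivatives check out")
```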