Chapter 07
Chain Rule: Unraveling Composite Functions, the Heart of Backprop
When you differentiate a function inside another, multiply outer derivative × inner derivative. That's the core of backprop.
A nested function is a chain: $x$ → inner $g$ → outer $f$ → $y$. Multiply outer derivative × inner derivative to get the total derivative.
Example: calculation order (one step highlighted at a time)
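1. Example: as in the graphs above, $u = g(x)$ and $y = f(u)$, so $y = f(g(x))$. Differentiate $y$ with respect to $x$.
2. ① Inner derivative (left graph): $u = g(x)$ → derivative w.r.t. $x$ is $g'(x)$
3. ② Outer derivative (right graph): $y = f(u)$ → derivative w.r.t. $u$ is $f'(u)$
4. ③ Multiply: $f'(g(x)) \cdot g'(x)$ → answer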
As the dot moves along the chain, rates multiply along the way. Backprop is the same: multiply at each step.
What is the chain rule?
The chain rule is the rule for differentiating composite functions, that is, functions inside other functions. Like peeling an onion: differentiate the outer function ($f$) and multiply by the derivative of the inner ($g$). In symbols: $(f(g(x)))' = f'(g(x)) \cdot g'(x)$. It is like finding the gear ratio of meshed gears.
Intuition: You ($x$) push a friend ($u$), and the friend pushes a cart ($y$). If you push the friend 2× harder and the friend pushes the cart 3× harder, the cart moves 2×3=6× as much as your push. The chain rule is this multiplication of rates along the chain.
One-line summary: To differentiate a composite function w.r.t. $x$, multiply the outer derivative and the inner derivative. See the table below for the steps.
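| Step | Task | Example: $y = (2x+1)^2$ |
|---|---|---|
| 1 | Identify inner and outer | Inner $u = 2x+1$, outer $y = u^2$ |
| 2 | Outer derivative: differentiate outer (keep $u$ as is) | $y = u^2$ → $\frac{dy}{du} = 2u$ |
| 3 | Inner derivative: differentiate inner w.r.t. $x$ | $u = 2x+1$ → $\frac{du}{dx} = 2$ |
| 4 | Multiply | $2u \cdot 2 = 4(2x+1)$ |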
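Main formula: $(f(g(x)))' = f'(g(x)) \cdot g'(x)$ or $\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$. As in the visual, $x$ → inner $g$ → outer $f$ → $y$, so multiply the derivative on each segment. If the inner is itself composite, apply the same rule again: outer derivative × inner derivative, and repeat.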
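The "repeat" step for deeper nesting can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the helper name `chain_derivative` and the three-stage example $y = \sin((2x+1)^2)$ are assumptions chosen for the demonstration.

```python
import math


def chain_derivative(stages, x):
    """Evaluate a composition and its derivative at x.

    stages is a list of (function, derivative) pairs, innermost first.
    Hypothetical helper, written only to illustrate the chain rule.
    """
    value = x
    rate = 1.0
    for func, dfunc in stages:
        rate *= dfunc(value)  # local rate of this stage at its current input
        value = func(value)   # output becomes the input of the next stage
    return value, rate


# Assumed example: y = sin((2x + 1)^2), three stages from inner to outer.
stages = [
    (lambda x: 2 * x + 1, lambda x: 2.0),  # u = 2x + 1, du/dx = 2
    (lambda u: u ** 2, lambda u: 2 * u),   # v = u^2,    dv/du = 2u
    (math.sin, math.cos),                  # y = sin(v), dy/dv = cos(v)
]

value, rate = chain_derivative(stages, 1.0)
print(value, rate)  # rate equals 2 * 6 * cos(9)
```

Each pass through the loop multiplies in one more segment of the chain, which is the same bookkeeping the formula describes.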
Why multiply instead of add? Because these are rates. A car's speed of 100 km/h and an exchange rate of 1300 won per dollar have different units, so they cannot be added meaningfully. To compute how a change is amplified or damped along a chain, you must multiply.
Check with numbers: For $y = (2x+1)^2$ at $x = 1$, the formula gives $2(2x+1) \cdot 2 = 12$. If $x$ goes from 1 to 1.01 (change 0.01), $y$ goes from 9 to about 9.1204 (change about 0.12). So the rate is about 12, which matches.
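The same check can be run in Python. The sketch below assumes the function is $y = (2x+1)^2$, which is consistent with the values 9 and 9.1204 quoted above; the names `y`, `dy_dx`, and `h` are illustrative.

```python
def y(x):
    return (2 * x + 1) ** 2


def dy_dx(x):
    # Chain rule: outer derivative 2u times inner derivative 2, with u = 2x + 1.
    return 2 * (2 * x + 1) * 2


x0 = 1.0
h = 0.01
change = y(x0 + h) - y(x0)      # about 0.12
approx_rate = change / h        # finite-difference estimate of the slope
finer_rate = (y(x0 + 1e-6) - y(x0)) / 1e-6

print(dy_dx(x0))                # 12.0
print(round(change, 4), round(approx_rate, 2), finer_rate)
```

With a step of 0.01 the estimate is slightly above 12 because the curve bends upward; shrinking the step brings it closer to the exact value from the formula.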
Deep learning models are huge composite functions: dozens or hundreds of functions stacked ($f_n(\cdots f_2(f_1(x)))$). We need to know how the final loss ($L$) changes when we change the initial input or a weight ($w$) in the middle. That requires the chain rule.
Backpropagation is exactly the chain rule in action. When we propagate the error from the output layer backward to the input, we multiply the derivative at each layer. Without this, training deep networks would be impossible.
So AI learning is passing derivative values along by multiplying them (the chain rule). The deeper the network, the more this multiplication repeats. Multiplying numbers less than 1 (e.g. 0.5) many times drives the product toward 0. This vanishing gradient was one reason deep networks were hard to train. Techniques like ReLU and skip connections help mitigate it.
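A quick numerical sketch shows how fast the product shrinks. It makes the simplifying assumption that every layer contributes the same local derivative, which real networks do not; the function name `product_of_local_gradients` is hypothetical.

```python
def product_of_local_gradients(local_gradient, depth):
    """Multiply the same local derivative once per layer (simplified model)."""
    product = 1.0
    for _ in range(depth):
        product *= local_gradient
    return product


for depth in (1, 5, 10, 50):
    shrinking = product_of_local_gradients(0.5, depth)
    preserved = product_of_local_gradients(1.0, depth)
    print(depth, shrinking, preserved)
```

With a local derivative of 0.5 the product is already below 0.001 at depth 10, while a local derivative of 1 (as ReLU gives for positive inputs) leaves it unchanged.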
Beyond deep learning, the chain rule is used to analyze complex cause-and-effect chains. If A affects B and B affects C, the effect of A on C is found by multiplying the effect at each step.
| Situation | What we find | Chain rule (total rate) |
|---|---|---|
| Cost → output → time | Effect of time on cost | (cost/output) × (output/time) |
| Volume → radius → time | How fast volume changes when blowing up a balloon | (volume/radius) × (radius/time) |
| Error → output → weight | AI learning: how much to update the weight | (error/output) × (output/weight) |
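The balloon row can be worked through numerically. The sketch below assumes a spherical balloon, so $V = \frac{4}{3}\pi r^3$ and $\frac{dV}{dr} = 4\pi r^2$; the radius of 10 cm and growth rate of 0.5 cm/s are made-up values for illustration.

```python
import math


def dV_dr(r):
    # Derivative of the sphere volume (4/3) * pi * r^3 with respect to r.
    return 4 * math.pi * r ** 2


radius_cm = 10.0   # assumed current radius
dr_dt = 0.5        # assumed radius growth in cm per second

# Chain rule: (volume/radius) x (radius/time)
dV_dt = dV_dr(radius_cm) * dr_dt
print(dV_dt)       # cubic centimeters per second, equal to 200 * pi
```

The units also multiply along the chain: (cm³ per cm) × (cm per s) gives cm³ per s.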
Automatic differentiation: Frameworks like PyTorch or TensorFlow compute derivatives when we call `loss.backward()`. Inside, they build a computation graph and apply the chain rule at each node to compute and multiply gradients in an instant.
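To make the idea of a computation graph concrete, here is a toy reverse-mode sketch in plain Python. It is not how PyTorch or TensorFlow are implemented; the `Value` class, its attributes, and the numbers are all hypothetical and support only addition and multiplication of scalars.

```python
class Value:
    """A scalar that remembers how it was computed (toy autodiff node)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents          # nodes this value was computed from
        self.local_grads = local_grads  # d(self)/d(parent), one per parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        # Build a topological order so a node is handled after all its users.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node.parents, node.local_grads):
                # Chain rule: upstream gradient times local derivative.
                parent.grad += node.grad * local


# Assumed example: loss = (w * x + b)^2
w, x, b = Value(3.0), Value(2.0), Value(1.0)
z = w * x + b
loss = z * z
loss.backward()
print(w.grad, x.grad, b.grad)  # 28.0 42.0 14.0
```

Here $z = 7$, so the gradient reaching $z$ is $2z = 14$; multiplying by the local derivatives $x = 2$, $w = 3$, and $1$ gives the three printed values.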
For a nested function, treat the inner part as one block, then multiply the derivative of the outer (with respect to that block) by the derivative of the inner. If the inner is itself nested, repeat. Tip: Set inner = $u$, differentiate the outer with respect to $u$ only, then multiply by the derivative of the inner w.r.t. $x$.
Simplest example: . Inner → derivative . Outer → derivative . Product: . At the slope is .
Examples ranging from easy to more varied are in the table below. In each row, multiply the inner and outer derivatives to get the answer.
| Problem | Solution |
|---|---|
| Easy | Inner → inner deriv , outer → outer deriv ; product |
| Easy | Inner → inner deriv , outer → outer deriv ; product |
| Ex. | Inner deriv , outer deriv → product |
| Ex. | Inner deriv , outer deriv → product |
| Ex. | Inner → inner deriv , outer → outer deriv ; product |
| Ex. | Inner deriv , outer deriv → product |
| Ex. | Inner deriv , outer deriv → product |
Problem types and how to solve
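| Type | Form | How to get $y'$ |
|---|---|---|
| Power | $(g(x))^n$ | Outer deriv $n(g(x))^{n-1}$ × inner deriv $g'(x)$. |
| Exponential | $e^{g(x)}$ | Outer deriv $e^{g(x)}$ × inner deriv → $e^{g(x)} \cdot g'(x)$. |
| Trig | $\sin(g(x))$, $\cos(g(x))$ | Outer deriv (cos or −sin) × inner deriv. |
| Root | $\sqrt{g(x)}$ | Outer deriv $\frac{1}{2\sqrt{g(x)}}$ × inner deriv. |
| Log | $\ln(g(x))$ | Outer deriv $\frac{1}{g(x)}$ × inner deriv → $\frac{g'(x)}{g(x)}$. |
| Quadratic inside | $f(ax^2+bx+c)$ etc. | Inner deriv $2ax+b$; multiply by outer deriv. |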
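Each rule type in the table can be sanity-checked against a numerical slope. The sketch below uses an assumed inner function $g(x) = 2x + 1$ and an assumed exponent of 3 for the power case; `numeric_derivative` is a hypothetical helper based on a central difference.

```python
import math


def numeric_derivative(f, x, h=1e-6):
    # Central difference: slope estimated from points on both sides of x.
    return (f(x + h) - f(x - h)) / (2 * h)


def g(x):
    return 2 * x + 1  # assumed inner function


def dg(x):
    return 2.0        # its derivative


# name: (composite function, chain-rule derivative)
rules = {
    "power": (lambda x: g(x) ** 3, lambda x: 3 * g(x) ** 2 * dg(x)),
    "exponential": (lambda x: math.exp(g(x)), lambda x: math.exp(g(x)) * dg(x)),
    "sin": (lambda x: math.sin(g(x)), lambda x: math.cos(g(x)) * dg(x)),
    "cos": (lambda x: math.cos(g(x)), lambda x: -math.sin(g(x)) * dg(x)),
    "root": (lambda x: math.sqrt(g(x)), lambda x: dg(x) / (2 * math.sqrt(g(x)))),
    "log": (lambda x: math.log(g(x)), lambda x: dg(x) / g(x)),
}

x0 = 0.5  # g(x0) = 2, so the root and log are defined
for name, (f, df) in rules.items():
    print(name, df(x0), numeric_derivative(f, x0))
```

For every row the two printed numbers should agree to several decimal places, which is a practical way to catch a forgotten inner derivative.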
Example (power)
For , find the derivative at .
Solution
. At → . → Answer 36
Example (exponential)
For , find the derivative at .
Solution
. At → . → Answer 3
Example (trig)
For , find the derivative at .
Solution
. At → . → Answer 2
Example (log)
For , find the derivative at .
Solution
. At , so . → Answer 0