Chapter 07

Chain Rule: Unraveling Composite Functions, the Heart of Backprop

To differentiate a function nested inside another, multiply the outer derivative by the inner derivative. That is the core of backprop.

A nested function is a chain $x$ → inner → outer → $y$. Multiply the outer derivative by the inner derivative to get the total derivative.

Example: calculation order

1. As in the graphs above, $u = g(x) = 2x+1$ and $y = f(u) = u^2$, so $y = (2x+1)^2$. Differentiate with respect to $x$.
2. ① Inner derivative (left graph): $u = g(x) = 2x+1$ → derivative w.r.t. $x$ is $2$.
3. ② Outer derivative (right graph): $y = f(u) = u^2$ → derivative w.r.t. $u$ is $2u = 2(2x+1)$.
4. ③ Multiply: $2 \times 2(2x+1) = 4(2x+1)$ → answer.

As the dot moves along the chain, rates multiply along the way. Backprop is the same: multiply at each step.

What is the chain rule?

The chain rule is the rule for differentiating composite functions, that is, functions inside other functions. Like peeling an onion: differentiate the outer function ($f'$) and multiply by the derivative of the inner function ($g'$). In symbols: $\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$. It is like finding the gear ratio of meshed gears.

Intuition: You ($x$) push a friend ($u$), and the friend pushes a cart ($y$). If you push the friend 2× harder and the friend pushes the cart 3× harder, the cart moves 2×3=6× as much as your push. The chain rule is this multiplication of rates along the chain.

One-line summary: To differentiate a composite function w.r.t. $x$, multiply the outer derivative and the inner derivative. See the table below for the steps.

Step	Task	Example: $y=(2x+1)^2$
1	Identify inner and outer	Inner $u=2x+1$ , outer $y=u^2$
2	Outer derivative — differentiate outer (keep $u$ as is)	$u^2$ → $2u$
3	Inner derivative — differentiate inner w.r.t. $x$	$2x+1$ → $2$
4	Multiply	$2u \times 2 = 2(2x+1) \times 2 = 4(2x+1)$
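The four steps in the table can be traced in a few lines of Python. This is only a minimal sketch: the function names `inner`, `outer`, `d_inner`, `d_outer`, and `dy_dx` are illustrative choices and do not come from the original text.

```python
# Chain rule for y = (2x + 1)^2, following the four steps in the table.
# All function names are illustrative.

def inner(x):
    """Step 1: inner function u = g(x) = 2x + 1."""
    return 2 * x + 1

def outer(u):
    """Step 1: outer function y = f(u) = u^2."""
    return u ** 2

def d_outer(u):
    """Step 2: outer derivative dy/du = 2u (u is kept as is)."""
    return 2 * u

def d_inner(x):
    """Step 3: inner derivative du/dx = 2."""
    return 2

def dy_dx(x):
    """Step 4: multiply the outer derivative by the inner derivative."""
    return d_outer(inner(x)) * d_inner(x)

slope_at_1 = dy_dx(1)  # 4 * (2*1 + 1) = 12
```

Note that the outer derivative is evaluated at `inner(x)`, not at `x`; that is the "keep $u$ as is" instruction in step 2.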

Main formula: $\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$, or equivalently $(f \circ g)'(x) = f'(g(x)) \cdot g'(x)$. As in the visual, $x$ → inner → outer → $y$, so multiply the derivative on each segment. If the inner function is itself composite, apply the same rule again: outer derivative × inner derivative, and repeat.
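The "repeat for each nested layer" idea can be sketched as a loop that multiplies one local derivative per layer. The helper name `chain_derivative` and the three-layer example $y=\sin((2x+1)^2)$ are assumptions made for illustration; they are not part of the original text.

```python
import math

def chain_derivative(layers, x):
    """Return (value, derivative) of a composition evaluated at x.

    `layers` is a list of (function, derivative) pairs ordered from
    innermost to outermost.
    """
    value = x
    total = 1.0
    for f, df in layers:
        total *= df(value)  # local derivative at this layer's input
        value = f(value)    # pass the output on to the next layer
    return value, total

# y = sin((2x + 1)^2): inner 2x + 1, middle u^2, outer sin
layers = [
    (lambda x: 2 * x + 1, lambda x: 2.0),
    (lambda u: u ** 2, lambda u: 2 * u),
    (math.sin, math.cos),
]

value, slope = chain_derivative(layers, 0.5)
# By hand: 2 * 2(2x + 1) * cos((2x + 1)^2) = 8 * cos(4) at x = 0.5
```

Using only the first two layers reproduces the running example: `chain_derivative(layers[:2], 1.0)` gives the value 9 and the slope 12.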

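Why multiply instead of add? Because these are rates, each with its own units. A car's speed of 100 km/h ($v$) and an exchange rate of 1300 won per dollar ($r$) cannot be added meaningfully. Each link in the chain scales the change handed to it by the previous link, so to compute how a change is amplified or damped overall, you must multiply.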

Check with numbers: For $y=(2x+1)^2$ at $x=1$, the formula gives $4(2(1)+1)=12$. If $x$ goes from 1 to 1.01 (change 0.01), $y$ goes from 9 to 9.1204 (change 0.1204). The average rate over that step is $0.1204/0.01 = 12.04$, which approaches 12 as the step shrinks, so the formula matches.
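The same numeric check can be repeated with smaller and smaller steps. The sketch below uses a plain forward difference; the helper name `forward_difference` and the chosen step sizes are illustrative assumptions.

```python
def y(x):
    """The running example y = (2x + 1)^2."""
    return (2 * x + 1) ** 2

def forward_difference(f, x, h):
    """Average rate of change of f between x and x + h."""
    return (f(x + h) - f(x)) / h

exact = 4 * (2 * 1 + 1)  # chain-rule result at x = 1

# For this function the forward difference works out to 12 + 4h,
# so the estimates close in on 12 as h shrinks.
estimates = [forward_difference(y, 1.0, h) for h in (0.01, 0.001, 0.0001)]
```

The first estimate is the 12.04 from the paragraph above; the later ones are closer to 12, up to floating-point rounding.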

Deep learning models are huge composite functions: dozens or hundreds of functions stacked ($y = f_n(\dots f_2(f_1(x))\dots)$). We need to know how the final loss ($L$) changes when we change the initial input or a weight ($w$) in the middle. That requires the chain rule.

Backpropagation is exactly the chain rule in action. When we propagate the error from the output layer backward to the input, we multiply the derivative at each layer. Without this, training deep networks would be impossible.
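To make "multiply the derivative at each layer" concrete, here is a hand-written backward pass for a tiny chain of two scalar weights with a squared-error loss. The network shape, the variable names, and the input values are all assumptions chosen for easy arithmetic; real networks use vectors, matrices, and nonlinear activations.

```python
def forward_backward(x, w1, w2, target):
    """Forward pass, then backward pass, through a two-layer scalar chain."""
    # Forward: x -> h -> y_hat -> loss
    h = w1 * x
    y_hat = w2 * h
    loss = (y_hat - target) ** 2

    # Backward: start at the loss and multiply local derivatives,
    # moving from the output toward the input.
    dloss_dyhat = 2 * (y_hat - target)
    dloss_dw2 = dloss_dyhat * h   # d(y_hat)/d(w2) = h
    dloss_dh = dloss_dyhat * w2   # d(y_hat)/d(h)  = w2
    dloss_dw1 = dloss_dh * x      # d(h)/d(w1)     = x
    return loss, dloss_dw1, dloss_dw2

loss, grad_w1, grad_w2 = forward_backward(x=2.0, w1=0.5, w2=3.0, target=1.0)
# loss = 4.0, grad_w1 = 24.0, grad_w2 = 4.0
```

The gradient for `w1` reuses `dloss_dh`, which was already computed on the way back. Reusing these intermediate products is what makes the backward pass efficient.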

So AI learning is passing derivative values along by multiplying them (the chain rule). The deeper the network, the more this multiplication repeats. Multiplying numbers less than 1 (e.g. 0.5) many times drives the product toward 0. This vanishing gradient was one reason deep networks were hard to train. Techniques like ReLU and skip connections help mitigate it.
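A quick sketch shows how fast the product shrinks. It assumes, for illustration only, that every layer contributes the same local derivative; the function name `gradient_product` is hypothetical.

```python
def gradient_product(local_derivative, depth):
    """Multiply the same local derivative once per layer."""
    product = 1.0
    for _ in range(depth):
        product *= local_derivative
    return product

shallow = gradient_product(0.5, 5)     # 0.5^5  = 0.03125
deep = gradient_product(0.5, 50)       # 0.5^50, roughly 8.9e-16
preserved = gradient_product(1.0, 50)  # a local derivative of 1 does not shrink
```

The last line hints at why ReLU helps: its derivative is exactly 1 for positive inputs, so those layers pass the gradient through without shrinking it.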

The chain rule is also used to analyze complex cause-and-effect chains. If A affects B and B affects C, the effect of A on C is found by multiplying the effect at each step.

Situation	What we find	Chain rule (total rate)
Cost → output → time	Effect of time on cost	(cost/output) $\times$ (output/time)
Volume → radius → time	How fast volume changes when blowing up a balloon	(volume/radius) $\times$ (radius/time)
Error → output → weight	AI learning: how much to update the weight	(error/output) $\times$ (output/weight)
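The balloon row can be worked through numerically. The sketch assumes a spherical balloon, so $V = \frac{4}{3}\pi r^3$ and $dV/dr = 4\pi r^2$; the radius of 10 cm and the growth rate of 0.5 cm/s are made-up values for illustration.

```python
import math

def dV_dr(r):
    """Rate of change of a sphere's volume with respect to its radius."""
    return 4 * math.pi * r ** 2

def dV_dt(r, dr_dt):
    """Chain rule: (volume/radius) x (radius/time)."""
    return dV_dr(r) * dr_dt

# Radius 10 cm, growing at 0.5 cm/s (illustrative numbers)
rate = dV_dt(10.0, 0.5)  # 200 * pi, about 628.3 cubic cm per second
```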

Automatic differentiation: Frameworks like PyTorch or TensorFlow compute derivatives when we call `loss.backward()`. Inside, they build a computation graph and apply the chain rule at each node to compute and multiply gradients in an instant.
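The idea of a computation graph with the chain rule applied at each node can be illustrated with a toy scalar class. This is not how PyTorch or TensorFlow are implemented; the `Value` class below is a hypothetical, heavily simplified sketch that supports only addition and multiplication.

```python
class Value:
    """A scalar that remembers how it was computed."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent), one per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        # Order the graph so each node is handled only after
        # every node that depends on it.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                # Chain rule: local derivative times the gradient from above
                parent.grad += local * node.grad

# y = (2x + 1)^2 at x = 1
x = Value(1.0)
u = x * 2 + 1
y = u * u
y.backward()
# x.grad is 12.0, matching 4(2x + 1) at x = 1
```

Gradients are accumulated with `+=` because `u` feeds into `y` twice (as both factors of `u * u`), and each path contributes its own product of local derivatives.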

For a nested function, treat the inner part as one block, then multiply the derivative of the outer (with respect to that block) by the derivative of the inner. If the inner is itself nested, repeat. Tip: Name the inner part $u$, differentiate the outer with respect to $u$ only, then multiply by the derivative of $u$ w.r.t. $x$.

Simplest example: $y=(3x)^2$. Inner $u=3x$ → derivative $3$. Outer $u^2$ → derivative $2u=2\cdot 3x$. Product: $3 \times 2\cdot 3x = 18x$. At $x=2$ the slope is $36$.

The table below gives examples ranging from easy to more varied. In each row, multiply the inner and outer derivatives to get the answer.

Problem	Solution
Easy $y=(3x)^2$	Inner $u=3x$ → inner deriv $3$ , outer $u^2$ → outer deriv $2u$ ; product $2\cdot 3x\cdot 3=18x$
Easy $y=\sqrt{x+1}$	Inner $u=x+1$ → inner deriv $1$ , outer $\sqrt{u}$ → outer deriv $1/(2\sqrt{u})$ ; product $1/(2\sqrt{x+1})$
Ex. $y=(2x+1)^5$	Inner deriv $2$ , outer deriv $5(2x+1)^4$ → product $10(2x+1)^4$
Ex. $y=e^{x^2}$	Inner deriv $2x$ , outer deriv $e^{x^2}$ → product $2x\,e^{x^2}$
Ex. $y=\sin(2x)$	Inner $u=2x$ → inner deriv $2$ , outer $\sin u$ → outer deriv $\cos u$ ; product $2\cos(2x)$
Ex. $y=e^{3x}$	Inner deriv $3$ , outer deriv $e^{3x}$ → product $3e^{3x}$
Ex. $y=\ln(\sin x)$	Inner deriv $\cos x$ , outer deriv $1/\sin x$ → product $\cos x/\sin x=\cot x$
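Each row of the table can be sanity-checked by comparing the chain-rule answer with a numerical slope. The sketch below uses a central difference; the helper name, the step size, and the test points are arbitrary choices made for illustration.

```python
import math

def central_difference(f, x, h=1e-6):
    """Numerical slope of f at x from two nearby points."""
    return (f(x + h) - f(x - h)) / (2 * h)

# (function, chain-rule derivative from the table, test point)
rows = [
    (lambda x: (3 * x) ** 2, lambda x: 18 * x, 2.0),
    (lambda x: math.sqrt(x + 1), lambda x: 1 / (2 * math.sqrt(x + 1)), 3.0),
    (lambda x: (2 * x + 1) ** 5, lambda x: 10 * (2 * x + 1) ** 4, 0.5),
    (lambda x: math.exp(x ** 2), lambda x: 2 * x * math.exp(x ** 2), 1.0),
    (lambda x: math.sin(2 * x), lambda x: 2 * math.cos(2 * x), 0.3),
    (lambda x: math.exp(3 * x), lambda x: 3 * math.exp(3 * x), 0.0),
    (lambda x: math.log(math.sin(x)), lambda x: math.cos(x) / math.sin(x), 1.0),
]

# Largest gap between the numerical slope and the table's formula
max_error = max(abs(central_difference(f, x) - df(x)) for f, df, x in rows)
```

A small `max_error` means every formula in the table agrees with the measured slope at its test point. A forgotten inner derivative would show up as a large gap.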

Problem types and how to solve

Type	Form	How to get $f^{\prime}(x)$
Power	$(g(x))^n$	Outer deriv $n u^{n-1}$ × inner deriv $g'(x)$ .
Exponential	$e^{g(x)}$	Outer deriv $e^u$ × inner deriv → $e^{g(x)} \cdot g'(x)$ .
Trig	$\sin(g(x))$ , $\cos(g(x))$	Outer deriv (cos or −sin) × inner deriv.
Root	$\sqrt{g(x)}$	Outer deriv $1/(2\sqrt{u})$ × inner deriv.
Log	$\ln(g(x))$	Outer deriv $1/u$ × inner deriv → $g'(x)/g(x)$ .
Quadratic inside	$(ax^2+bx+c)^n$ etc.	Inner deriv $2ax+b$ ; multiply by outer deriv.

Example (power)

For $y=(3x)^2$, find the derivative at $x=2$.

Solution: $y'=2\cdot 3x \cdot 3=18x$. At $x=2$, $y'=36$. → Answer 36

Example (exponential)

For $y=e^{3x}$, find the derivative at $x=0$.

Solution: $y'=3e^{3x}$. At $x=0$, $y'=3e^0=3$. → Answer 3

Example (trig)

For $y=\sin(2x)$, find the derivative at $x=0$.

Solution: $y'=2\cos(2x)$. At $x=0$, $y'=2\cos 0=2$. → Answer 2

Example (log)

For $y=\ln(\sin x)$, find the derivative at $x=\pi/2$.

Solution: $y'=\frac{\cos x}{\sin x}=\cot x$. At $x=\pi/2$, $\cos(\pi/2)=0$, so $y'=0$. → Answer 0
