Chapter 04
Activation (Nonlinear)
A function that makes a neuron's output nonlinear.
Deep learning diagram by chapter
As you complete each chapter, the diagram below fills in. This is the structure so far.
Representative activation functions where output Y changes nonlinearly with input X. (3-level quantized version)
Node values change nonlinearly as they pass through ReLU or σ (Sigmoid); the last-layer outputs Y1, Y2, Y3 result from these transformations.
Activation in deep learning
An activation function transforms a neuron's raw output (weighted sum) into a specific range or shape. The most common ones are ReLU (negative → 0, positive → unchanged), Sigmoid (compresses to 0–1), and Tanh (compresses to −1 to 1).
Think of it like a faucet: when water (signal) comes in, it either 'only lets through above a threshold (ReLU)' or 'reduces the flow if it's too strong (Sigmoid, Tanh).' This transformation makes the output suitable for the next layer.
ReLU is the most popular because it's simple to compute (keep if positive, zero if negative) and trains fast. Sigmoid is used when you need probability-like outputs, and Tanh when you want values centered around zero.
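The three functions above can be sketched in a few lines of plain Python (a minimal sketch; real frameworks apply these element-wise to whole tensors):

```python
import math

def relu(x):
    # negative → 0, positive → unchanged
    return max(0.0, x)

def sigmoid(x):
    # compresses any real number into the range 0–1
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # compresses any real number into the range −1 to 1, centered at 0
    return math.tanh(x)
```

For example, `relu(-3.0)` gives `0.0`, `sigmoid(0.0)` gives exactly `0.5`, and `tanh(0.0)` gives `0.0`, matching the descriptions above.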
No matter how many multiply-and-add (linear) operations you stack, the result is the same as one multiply-and-add. Just as connecting straight lines only gives you a straight line, linear operations alone can never represent curves or complex patterns.
Activation functions add bends (nonlinearity). These bends allow stacked layers to create curves and complex boundaries, enabling the model to learn patterns in images, speech, and text.
Without activation functions, no matter how deep the network, it can only do what a single line could do. Activations are the essential ingredient that makes deep learning 'deep.'
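The collapse of stacked linear layers can be checked directly. The weights below are arbitrary numbers chosen for illustration; the point is that two linear layers compose into a single linear layer with weight W2·W1 and bias W2·b1 + b2:

```python
# Two stacked linear layers with no activation between them.
W1, b1 = 2.0, 1.0
W2, b2 = 3.0, -4.0

def layer1(x):
    return W1 * x + b1

def layer2(h):
    return W2 * h + b2

# Algebraically: W2*(W1*x + b1) + b2 = (W2*W1)*x + (W2*b1 + b2),
# i.e. one multiply-and-add with these collapsed parameters:
W, b = W2 * W1, W2 * b1 + b2

for x in [-1.0, 0.0, 2.5]:
    assert layer2(layer1(x)) == W * x + b
```

Inserting `relu` between the two layers breaks this identity, which is exactly the "bend" that lets depth add expressive power.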
Image recognition: After computing W·X + b at each layer, ReLU clips irrelevant features (negatives to zero) and passes relevant ones (positives) to the next layer, progressively extracting 'eyes,' 'ears,' 'wheels,' etc.
Chatbots & translators: Hidden layers use ReLU or GELU (a smoother version) for nonlinearity; the final layer uses Sigmoid (yes/no decisions) or Softmax (choosing among multiple candidates) to produce the answer.
Speech recognition & self-driving: Sound waves or camera images are converted to numbers, then passed through many linear + activation layers to determine 'what word is this' or 'what object is that.' Without activation, such complex decisions would be impossible.
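Putting the pieces together, here is a toy forward pass in the pattern described above: a linear layer, ReLU on the hidden values, then Softmax to choose among candidates. All weights are made up for illustration:

```python
import math

def relu_vec(vec):
    # negatives clipped to zero, positives pass through
    return [max(0.0, v) for v in vec]

def softmax(vec):
    # turn raw scores into probabilities that sum to 1
    exps = [math.exp(v) for v in vec]
    total = sum(exps)
    return [e / total for e in exps]

def linear(W, b, x):
    # W·x + b for a small dense layer (W is a list of rows)
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

# Hypothetical toy weights, just to show the shape of the computation.
x = [1.0, -2.0]
h = relu_vec(linear([[0.5, -1.0], [1.5, 0.5]], [0.0, 0.0], x))
y = softmax(linear([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], [0.0, 0.0, 0.0], h))
```

Here `h` holds nonnegative hidden features and `y` is a probability distribution over three candidates, summing to 1.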
Find the interval that contains X in the table; the matching rule gives Y.
| Function | Rule |
|---|---|
| ReLU | 0 or less → 0; positive → same as X |
| Sigmoid | Small → 0, middle → 0.5, large → 1 |
| Tanh₃ | Small → -1, middle → 0, large → 1 |
| Note | Check the problem's table for boundaries. |
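The table's rules can be written as simple lookup functions. The interval boundaries below (`lo`, `hi`) are placeholders, since the actual boundaries come from the problem's table:

```python
def relu_q(x):
    # 0 or less → 0; positive → same as X
    return 0.0 if x <= 0 else x

def sigmoid3(x, lo=-1.0, hi=1.0):
    # 3-level quantized Sigmoid: small → 0, middle → 0.5, large → 1
    # lo/hi are assumed values; substitute the problem's boundaries
    if x < lo:
        return 0.0
    if x > hi:
        return 1.0
    return 0.5

def tanh3(x, lo=-1.0, hi=1.0):
    # 3-level quantized Tanh: small → -1, middle → 0, large → 1
    # lo/hi are assumed values; substitute the problem's boundaries
    if x < lo:
        return -1.0
    if x > hi:
        return 1.0
    return 0.0
```

With the assumed boundaries, `relu_q(-5)` gives `0.0`, `sigmoid3(0)` gives `0.5`, and `tanh3(5)` gives `1.0`.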
Problem
Given the activation function (Sigmoid, ReLU, Tanh₃), find Y for each X and fill in the blank (?).