Chapter 08
Hidden Layers (Invisible Layers)
Layers between the input and output layers.
Deep learning diagram by chapter
As you complete each chapter, the diagram below fills in. This is the structure so far.
We only see the input (X) and the output (Y). The layer in between is used only inside the network, so it is called the hidden layer.
X (visible) → H (hidden) → Y (visible)
Values flow input → hidden → output. The hidden layer is an internal representation we never observe directly.
Hidden layers in deep learning
A hidden layer is an intermediate stage between input and output. Users only see the input (e.g., a photo) and output (e.g., 'dog'), but in between, hidden layers create 'hidden features.'
The flow is: X → Linear(W₁·X + b₁) → ReLU → H (hidden representation) → Linear(W₂·H + b₂) → ReLU → Y (output). H is the hidden layer's result: a compressed set of the input's key features. Note that each layer has its own weights and bias.
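The flow above can be sketched in a few lines of NumPy. The layer sizes and random weights here are illustrative assumptions, not values from the text; the point is the shape of the computation: two linear transforms, each followed by ReLU.

```python
import numpy as np

def relu(z):
    """Element-wise ReLU: negative values become 0."""
    return np.maximum(0, z)

# Illustrative sizes: 3 input features, 4 hidden neurons, 2 outputs.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # layer 1 parameters
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # layer 2 parameters

X = np.array([1.0, -2.0, 0.5])

H = relu(W1 @ X + b1)   # hidden representation (invisible to the user)
Y = relu(W2 @ H + b2)   # final output

print(H.shape, Y.shape)  # (4,) (2,)
```

The user only ever sees X and Y; H exists solely inside the computation, which is exactly what "hidden" means.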
Analogy: When you see a photo and say 'dog,' your brain goes through 'colors → edges → eyes/nose/ears → dog!' These intermediate thinking steps are the hidden layers. The number of neurons (width) in the hidden layer determines how many different features it can capture.
Hidden layers progressively summarize and transform input data. Early layers capture simple features (brightness, edges), later layers capture complex features (eyes, wheels, letters).
Without hidden layers, the model maps input directly to output, only expressing very simple (linear) relationships. With hidden layers, it can learn complex relationships (curves, multi-condition combinations).
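A classic illustration of this limit is XOR: no single linear layer can reproduce it, but one small hidden layer can. The weights below are hand-picked for illustration (not learned), so the example stays fully traceable by hand.

```python
import numpy as np

def relu(z):
    """Element-wise ReLU: negative values become 0."""
    return np.maximum(0, z)

# XOR truth table: output is 1 when exactly one input is 1.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hand-picked parameters (illustrative, not learned):
W1 = np.array([[1.0, 1.0],    # neuron 1 ≈ OR of the inputs
               [1.0, 1.0]])   # neuron 2 ≈ AND of the inputs
b1 = np.array([0.0, -1.0])
W2 = np.array([1.0, -2.0])
b2 = 0.0

H = relu(X @ W1.T + b1)   # hidden features: an OR-like and an AND-like signal
Y = relu(H @ W2 + b2)     # combine them: OR minus 2·AND = XOR
print(Y)                  # [0. 1. 1. 0.]
```

The hidden layer first builds two simple features (OR, AND), and the output layer combines them into a relationship no straight line through the inputs could express.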
The number of neurons (width) and number of layers (depth) determine the model's representational power. Too small = information bottleneck and poor performance; too large = overfitting (memorizing instead of learning).
Image recognition: The stages 'pixels → edges → textures → object parts (eyes, wheels) → whole objects (dog, car)' are all hidden layers. Deeper layers extract more abstract features.
Chatbots & translators: After converting text to numbers, multiple hidden layers progressively refine 'word meaning → sentence context → answer direction.' ChatGPT passes through dozens of hidden layers (Transformer blocks) to generate responses.
Speech recognition: The transformation 'sound wave → frequency features → phonemes → words → sentences' goes through hidden layers at each stage.
Compute in order: X → (W₁·X + b₁) → ReLU → H → (W₂·H + b₂) → ReLU → Y, one step at a time. If the blank is in H, compute only the first linear + ReLU stage. If it is in Y, compute H first, then the second stage.
ReLU caution: When the linear result (W·input+b) is negative, ReLU turns it to 0. In the next layer, that value is 0, so that term contributes nothing—you can ignore it entirely. This is a frequent key point in hidden layer problems.
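A tiny numeric sketch of this caution (the weights and input are made-up numbers for illustration):

```python
import numpy as np

def relu(z):
    """Element-wise ReLU: negative values become 0."""
    return np.maximum(0, z)

# Suppose a hidden neuron's linear result comes out negative:
W, b = np.array([2.0, -3.0]), 1.0
x = np.array([1.0, 2.0])

z = W @ x + b        # 2·1 + (-3)·2 + 1 = -3
h = relu(z)          # negative → 0

# In the next layer, this neuron contributes nothing to any weighted sum:
w_next = 5.0
print(z, h, w_next * h)   # -3.0 0.0 0.0
```

Once ReLU zeroes a neuron, every downstream term multiplying it is 0, so you can drop it from the calculation entirely.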
Blank in W or b: Hidden-layer problems have two stages (two linear + ReLU steps). First identify which stage the blank is in; if you know that stage's input and output, you can solve for the blank using that stage's equation alone.
A hidden-layer network takes the input X, applies a linear transform (W₁·X + b₁) and ReLU to produce an intermediate representation H, then applies a second linear transform (W₂·H + b₂) and ReLU to produce the final output Y.
Layer 1: H = ReLU(W₁·X + b₁)
Layer 2: Y = ReLU(W₂·H + b₂)
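A worked example of the two layer equations with small, hand-checkable numbers (all values chosen for illustration). Note how one hidden neuron is zeroed by ReLU while the other survives:

```python
import numpy as np

def relu(z):
    """Element-wise ReLU: negative values become 0."""
    return np.maximum(0, z)

X  = np.array([1.0, 2.0])
W1 = np.array([[1.0, -1.0],
               [0.0,  1.0]])
b1 = np.array([0.0, -1.0])
W2 = np.array([[2.0, 1.0]])
b2 = np.array([0.5])

# Layer 1: H = ReLU(W1·X + b1)
Z1 = W1 @ X + b1          # [1·1 + (-1)·2 + 0,  0·1 + 1·2 - 1] = [-1, 1]
H  = relu(Z1)             # [0, 1]  (the negative entry is zeroed)

# Layer 2: Y = ReLU(W2·H + b2)
Z2 = W2 @ H + b2          # [2·0 + 1·1 + 0.5] = [1.5]
Y  = relu(Z2)             # [1.5]
print(H, Y)
```

Tracing the arithmetic in the comments by hand is exactly the procedure the blank-filling problems below ask for.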
Problem
In the forward pass with a hidden layer (X → (W₁·X + b₁) → ReLU → H → (W₂·H + b₂) → ReLU → Y), fill in the blank (?).