Chapter 03

Linear Layer (Weights and Bias)

A layer that multiplies the input by a weight matrix and adds a bias vector.

Deep learning diagram by chapter

As you complete each chapter, the diagram below fills in. This is the structure so far.

[Diagram: inputs X1, X2, X3 → linear layer (weight·input + bias) → ReLU → outputs Y1, Y2, Y3 → result]

The highlighted block is the linear layer: it maps the input to the next layer in one step as Y = W·X + b.

Linear layer in deep learning

A linear layer multiplies the input by weights (W) and adds a bias (b) to produce output: Y = W·X + b. The W·X part is matrix multiplication, and b shifts the baseline up or down.
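The formula Y = W·X + b can be sketched in a few lines of NumPy. This is a minimal illustration, not any framework's API; the shapes and numbers are made up for the example.

```python
import numpy as np

# Minimal sketch of a linear layer: Y = W·X + b.
# Shapes: W is (out_features, in_features), X is (in_features,), b is (out_features,).
def linear(X, W, b):
    return W @ X + b  # matrix-vector product, then a bias shift

X = np.array([1.0, 2.0, 3.0])            # 3 inputs
W = np.array([[1.0, 0.0, 2.0],           # 2 outputs, each with its own weight row
              [0.5, 1.0, 0.0]])
b = np.array([0.5, -1.0])
Y = linear(X, W, b)
print(Y)  # [7.5 1.5]
```

Each output entry is one row of W dotted with X, plus the matching entry of b.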

Think of it like a grading formula: 'math×0.3 + science×0.5 + English×0.2 + 10'. Here 0.3, 0.5, 0.2 are weights (W), 10 is bias (b), and the subject scores are input (X).
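The grading formula is itself a tiny linear layer with one output. A quick sketch (the subject scores are example inputs, not from the text):

```python
import numpy as np

# The grading formula as a linear layer: the multipliers are W, the +10 is b.
scores = np.array([80.0, 90.0, 70.0])   # math, science, English (example inputs)
W = np.array([0.3, 0.5, 0.2])           # weights
b = 10.0                                # bias
grade = W @ scores + b                  # 0.3*80 + 0.5*90 + 0.2*70 + 10
print(grade)  # 93.0
```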

A single linear layer decides how much to scale each input and how much to add. With multiple outputs, each output has its own row of weights and its own bias entry, so many scores are computed at once.

Almost every deep learning model uses linear layers as basic building blocks. ChatGPT, translators, and image classifiers all repeat 'W·X + b' hundreds to thousands of times. It's the brick of deep learning.

Model size (parameter count) is determined by 'how many inputs × how many outputs' for each linear layer. This size controls how complex things the model can learn (capacity) vs. the risk of overfitting (just memorizing training data).
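The parameter count follows directly from the shapes: inputs × outputs weights, plus one bias per output. A small helper to make this concrete (the layer sizes below are illustrative):

```python
# Parameter count of one linear layer: in*out weights + out biases.
def param_count(in_features, out_features):
    return in_features * out_features + out_features

print(param_count(3, 2))      # 3*2 weights + 2 biases = 8
print(param_count(768, 768))  # a transformer-sized layer: 590592 parameters
```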

However, stacking linear layers alone is equivalent to one linear operation (only straight lines). That's why an activation function (a bending function) is always added after each linear layer to enable curves and complex patterns.
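The collapse of stacked linear layers can be checked numerically: W2·(W1·X + b1) + b2 equals (W2·W1)·X + (W2·b1 + b2), a single linear layer. A sketch with random matrices:

```python
import numpy as np

# Two stacked linear layers with no activation collapse into one linear layer.
rng = np.random.default_rng(0)
X = rng.standard_normal(4)
W1, b1 = rng.standard_normal((3, 4)), rng.standard_normal(3)
W2, b2 = rng.standard_normal((2, 3)), rng.standard_normal(2)

stacked = W2 @ (W1 @ X + b1) + b2            # layer 1 then layer 2
collapsed = (W2 @ W1) @ X + (W2 @ b1 + b2)   # one equivalent linear layer
print(np.allclose(stacked, collapsed))  # True: stacking gains nothing without a bend
```

This is why an activation between layers is essential: without one, depth adds no expressive power.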

ChatGPT & translators: Sentences are converted to number vectors, then passed through dozens to hundreds of linear layers, each computing W·X + b followed by an activation, to understand context and generate answers.

Image recognition: Feature vectors from photos are fed into linear layers to compute 'dog score,' 'cat score,' 'bird score' simultaneously. The final linear layer's outputs become per-class scores.

Recommendation systems: User info and product info are combined into a vector, fed through linear layers to get a 'how much this user would like this product' score. More layers allow finer recommendations.

One formula: Multiply input X by weight matrix W and add bias b to get output Y. So Y = W·X + b. Linear layer problems give you X, W, b and ask for Y, as in the worked example below.

Numeric example: With X = [2, 1], W = [[1,0],[1,1]], b = [1, -1], we get W·X = (2, 3). Adding bias b gives Y = (2+1, 3-1) = [3, 2]. The bias shifts each output up or down. Each entry of Y is the dot product of the corresponding row of W with X, plus the corresponding entry of b.
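The worked example can be verified directly in NumPy:

```python
import numpy as np

# Verify the worked example: each Y entry is (row of W)·X plus the matching bias entry.
X = np.array([2.0, 1.0])
W = np.array([[1.0, 0.0],
              [1.0, 1.0]])
b = np.array([1.0, -1.0])

WX = W @ X            # [1*2 + 0*1, 1*2 + 1*1] = [2, 3]
Y = WX + b            # [2+1, 3-1] = [3, 2]
print(WX, Y)
```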

Blank strategy: If the blank is in Y, compute that row's W·X + b. If the blank is in W or b, use the known Y and X and rearrange the equation. Then verify by plugging back into Y = W·X + b.
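When the blank is in b, the rearrangement is b = Y − W·X. A sketch using the example's numbers, pretending b is unknown:

```python
import numpy as np

# If the blank is in b, rearrange Y = W·X + b into b = Y - W·X.
X = np.array([2.0, 1.0])
W = np.array([[1.0, 0.0],
              [1.0, 1.0]])
Y = np.array([3.0, 2.0])   # the known output

b = Y - W @ X              # recover the bias from known X, W, Y
print(b)                   # [ 1. -1.]
assert np.allclose(W @ X + b, Y)  # verify by plugging back in
```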

Multiply input X by weight matrix W and add bias b to get output Y: Y = W·X + b.

W = [[1, 0], [1, 1]], X = [2, 1], b = [1, -1] → Y = [3, 2]

W row 1·X + b[0] → Y[0]; W row 2·X + b[1] → Y[1]

Problem

Find the value that goes in the blank (?) in the linear layer Y = W·X + b below.

X = [1, 2], W = [0, -2], b = [0] → Y = [?]