Chapter 07

Weight Connections

The weighted links between layers and neurons.

Deep learning diagram by chapter

As you complete each chapter, the diagram below fills in. This is the structure so far.

Each line between layers is a weight (w). Multiply the inputs by their weights, sum the products, then add the bias (b) to get the next layer Y.

[Diagram: inputs X1, X2, X3 connected by weight lines (w) to outputs Y1, Y2, Y3, with bias (b) added.]

Circles are values, lines are weights (w). Add bias (b) to the weighted sum to get the next layer Y.

Connection in deep learning

A connection describes how neurons in one layer link to neurons in the next layer. Each connection has a weight (number) that determines 'how much this input affects this output.'

Fully connected: Every neuron in the previous layer connects to every neuron in the next. The linear layer (Y = W·X + b) we've learned is exactly a fully connected layer—every entry in W has a number.

Partially connected: Some entries in W are zero, meaning 'no connection.' That input has no effect on that output. CNNs, which connect only nearby pixels, are a classic example of partial connections.
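The difference can be seen directly in W. Below is a minimal NumPy sketch (the numbers are made up for illustration): a fully connected W where every entry is non-zero, and a partially connected W where zeros mean "no connection."

```python
import numpy as np

# Fully connected: every entry of W is non-zero,
# so every input can affect every output. (Example values.)
W_full = np.array([[0.5, -0.2, 0.1],
                   [0.3,  0.8, -0.4]])

# Partially connected: zero entries mean "no connection".
# Here each output sees only the middle input.
W_partial = np.array([[0.0, 1.0, 0.0],
                      [0.0, 1.0, 0.0]])

X = np.array([1.0, 2.0, 1.0])
b = np.array([1.0, -1.0])

print(W_full @ X + b)     # all three inputs contribute
print(W_partial @ X + b)  # only X[1] contributes: [3. 1.]
```

Both are the same linear layer Y = W·X + b; only the pattern of zeros in W differs.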

Connection structure defines the model's character. Fully connected considers all inputs (more information but more parameters), while partial connections only look at what's needed (efficient and fast but may miss some information).

AI training is the process of adjusting connection strengths (weights). 'Make this connection stronger, that one weaker'—gradually adjusting to produce outputs closer to the correct answer. Large models have billions of such connections.

Looking at where W is zero reveals what the model ignores. After training, connections with near-zero weights indicate 'unimportant information.' This is used in pruning to make models lighter.
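A rough sketch of that pruning idea (the trained weights and the 0.1 threshold here are hypothetical, chosen only for illustration): zero out near-zero weights so they become explicit "no connection" entries.

```python
import numpy as np

# Hypothetical weights after training; small values = "unimportant".
W_trained = np.array([[0.91, 0.02, -0.55],
                      [0.01, 0.73,  0.04]])

threshold = 0.1                        # assumed cutoff, not from the text
mask = np.abs(W_trained) >= threshold  # True where the connection matters
W_pruned = W_trained * mask

print(W_pruned)  # entries below the threshold are now exactly 0
```

The pruned W is sparser: those zeroed entries can be skipped entirely at inference time, which is what makes the model lighter.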

Image recognition (CNN): Uses partial connections where only nearby pixels connect. Distant pixels are less relevant, so this reduces parameters and is faster and more efficient.

Chatbots & translators (Transformer): Attention determines 'which words relate to which other words'—it learns which connections to strengthen dynamically from the data.

Recommendation & speech recognition: The weights connecting user features to product features directly become recommendation scores. In speech recognition, the model learns how each sound frequency connects to the next layer's features.

W = 0 means no connection: For example, if W(2,1) = 0, the 1st input has zero effect on the 2nd output. You can skip it entirely in the calculation.

Finding one output: Find which inputs are connected (W ≠ 0) to that output, multiply W · X for those positions only, sum them, and add b. Zero entries multiply to zero, so skipping them gives the same result.
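The steps above can be checked numerically: computing one output over all positions gives the same answer as computing it over only the non-zero (connected) positions. The row and input values here are illustrative.

```python
import numpy as np

W_row = np.array([0.0, 1.0, 0.0])  # one row of W (one output's connections)
X = np.array([1.0, 2.0, 1.0])
b_i = 1.0

# Full dot product: zero entries contribute nothing.
y_full = W_row @ X + b_i

# Same result using only the connected (non-zero) positions.
idx = np.nonzero(W_row)[0]
y_sparse = W_row[idx] @ X[idx] + b_i

print(y_full, y_sparse)  # 3.0 3.0
```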

Blank strategy: First, identify the zero entries in W. Then set up equations using only the non-zero connections. If the blank is in W, use Y and X to reverse-calculate; if it's in Y, compute forward from W and X.
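As a small sketch of the reverse-calculation case (hypothetical numbers): suppose a row of W is [0, w, 0] with one unknown w, and we know X, b, and Y for that output. Only the non-zero connection enters the equation, so w can be solved directly.

```python
# Y1 = w * X[1] + b1, since the other two connections are zero.
# Solve for the blank: w = (Y1 - b1) / X[1]
X1, b1, Y1 = 2.0, 1.0, 3.0  # hypothetical known values; X[1] = 2
w = (Y1 - b1) / X1
print(w)  # 1.0
```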

Connections describe how neurons in one layer link to the next. Only non-zero weights are actual links; the graph below shows those partial connections.

W · X + b = Y, where

W = [[0, 1, 0],
     [0, 1, 0]],   X = [1, 2, 1]ᵀ,   b = [1, -1]ᵀ,   Y = [3, 1]ᵀ

Each output: dot the corresponding row of W with X, then add that row's bias b to get Y.

Y₁ = (W row 1·X) + b₁ = (0×1 + 1×2 + 0×1) + 1 = 2 + 1 = 3
Y₂ = (W row 2·X) + b₂ = (0×1 + 1×2 + 0×1) + (-1) = 2 + (-1) = 1
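The same worked example can be verified in a few lines of NumPy:

```python
import numpy as np

W = np.array([[0, 1, 0],
              [0, 1, 0]])
X = np.array([1, 2, 1])
b = np.array([1, -1])

Y = W @ X + b
print(Y)  # [3 1]
```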


Problem

In the connection Y = W · X + b, find the value for the blank (?). Inputs with W = 0 are not connected to that output.

W = [-1, 0],   X = [2, 1]ᵀ,   b = [-1],   Y = [?]