Chapter 02
Matrix Multiplication
The product of two matrices is a new matrix whose entries are dot products of rows of the first and columns of the second.
In the diagram, the left side is one row of matrix A; on the right, Y1–Y3 are the dot products of that row with the columns of B. Collected over every row of A, these dot products form the matrix product A·B.
Matrix multiplication in deep learning
Matrix multiplication combines two number tables (matrices) into a new one. Take one row of the first matrix and one column of the second, compute their dot product, and that fills one entry in the result.
Repeat this for every row-column combination and the result matrix is complete. For example, a 2×3 matrix times a 3×2 matrix gives a 2×2 result.
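The row-by-column procedure can be sketched in plain Python. This is a minimal illustration, not an optimized implementation: the function name `matmul` and the example values are assumptions made for this sketch, and matrices are assumed to be lists of equal-length rows.

```python
def matmul(a, b):
    """Multiply matrix a (m x n) by matrix b (n x p), both given as lists of rows."""
    n = len(b)       # number of rows of b (must equal the number of columns of a)
    p = len(b[0])    # number of columns of b
    result = []
    for row in a:
        new_row = []
        for j in range(p):
            # Dot product of one row of a with column j of b fills one entry.
            new_row.append(sum(row[k] * b[k][j] for k in range(n)))
        result.append(new_row)
    return result


# Illustrative values: a 2x3 matrix times a 3x2 matrix.
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]

C = matmul(A, B)
print(C)  # [[58, 64], [139, 154]]
```

The result has 2 rows and 2 columns, matching the 2×3 times 3×2 example. The top-left entry, for instance, is 1·7 + 2·9 + 3·11 = 58.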
Matrix multiplication is only defined when the number of columns of the first matrix equals the number of rows of the second. Remember this rule, and you can always tell whether two matrices can be multiplied.
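The compatibility rule can be written as a small check. This is a sketch under the assumption that a matrix's shape is given as a (rows, columns) tuple; the helper names `can_multiply` and `product_shape` are hypothetical.

```python
def can_multiply(shape_a, shape_b):
    """Return True if a matrix of shape_a can be multiplied by one of shape_b.

    Shapes are (rows, columns) tuples.
    """
    # Columns of the first matrix must equal rows of the second.
    return shape_a[1] == shape_b[0]


def product_shape(shape_a, shape_b):
    """Return the shape of the product, or raise if the shapes are incompatible."""
    if not can_multiply(shape_a, shape_b):
        raise ValueError(f"cannot multiply {shape_a} by {shape_b}")
    # The result has the rows of the first matrix and the columns of the second.
    return (shape_a[0], shape_b[1])


print(can_multiply((2, 3), (3, 2)))   # True
print(can_multiply((2, 3), (2, 3)))   # False
print(product_shape((2, 3), (3, 2)))  # (2, 2)
```

Note that order matters: a 2×3 matrix can be multiplied by a 3×2 matrix, but not by another 2×3 matrix.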
A linear layer in deep learning multiplies the input by a weight matrix, which is exactly a matrix multiplication. A layer with 10 neurons needs 10 dot products, one per neuron; a single matrix multiplication computes all 10 at once.
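To make the link between neurons and matrix multiplication concrete, here is a small sketch with 4 neurons instead of 10. The input values, the weights, and the convention that each column of `W` holds one neuron's weights are all assumptions chosen for illustration; real layers usually also add a bias term, which is omitted here.

```python
# Hypothetical input with 3 features.
x = [1.0, 2.0, 3.0]

# Hypothetical weight matrix: 3 rows (features) x 4 columns (one column per neuron).
W = [[0.5, -1.0, 0.0,  2.0],
     [1.0,  0.0, 1.0, -1.0],
     [0.0,  1.0, 1.0,  0.5]]


def neuron_output(x, W, j):
    """Dot product of the input with the weights of neuron j (column j of W)."""
    return sum(x[k] * W[k][j] for k in range(len(x)))


def linear_layer(x, W):
    """Treat x as a 1 x n matrix and multiply it by the n x m matrix W."""
    return [sum(x[k] * W[k][j] for k in range(len(x))) for j in range(len(W[0]))]


one_by_one = [neuron_output(x, W, j) for j in range(4)]
all_at_once = linear_layer(x, W)

print(one_by_one)   # [2.5, 2.0, 5.0, 1.5]
print(all_at_once)  # [2.5, 2.0, 5.0, 1.5]
```

Both approaches give the same four numbers: the matrix product is simply all the per-neuron dot products collected into one row.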
GPUs are designed to run thousands of multiply-and-add operations in parallel, which is exactly the work a matrix multiplication consists of. This is why matrix products involving millions of multiplications finish in a fraction of a second, making real-time image recognition and chatbots practical.
Most of the heavy computation in deep learning, including attention, convolution, and recurrent networks, boils down to matrix multiplication. Understanding matrix multiplication means understanding the backbone of deep learning.
Image recognition: Pixel values are arranged in a matrix and multiplied by weight matrices to extract features that help answer questions like 'is there a dog or a cat?' This repeats across many layers.
Chatbots & translators: ChatGPT and Google Translate convert sentences into number matrices, then multiply by huge weight matrices dozens to hundreds of times to generate answers. Matrix multiplication accounts for most of the computation.
Recommendations & self-driving: When Netflix computes recommendation scores for thousands of users at once, or a self-driving car recognizes obstacles in camera frames, both rely on large-scale matrix multiplication under the hood.
Finding one entry: Entry (i, j) of the result = dot product of row i of A and column j of B. Multiply same-position elements and sum.
Blank strategy: If the blank is in the result, just compute the dot product for that row and column. If the blank is in A or B, use the known result and other values to work backwards.
Check dimensions: Before multiplying, verify that A's column count equals B's row count. The result size is (A's rows) × (B's columns).
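The tips above can be sketched in Python. The matrices, the known result entry, and the helper names `entry` and `solve_blank_in_a` are hypothetical values chosen for illustration. Indices in the code are 0-based, so "row 2, column 2" in the text corresponds to index [1][1].

```python
def entry(a, b, i, j):
    """Entry (i, j) of the product a·b: dot product of row i of a and column j of b."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))


def solve_blank_in_a(a, b, i, k_blank, j, c_ij):
    """Find the missing value a[i][k_blank], given the known result entry c_ij.

    Works backwards from c_ij = (sum of the known products) + blank * b[k_blank][j].
    """
    known_part = sum(a[i][k] * b[k][j] for k in range(len(b)) if k != k_blank)
    coefficient = b[k_blank][j]
    if coefficient == 0:
        raise ValueError("this result entry does not depend on the blank")
    return (c_ij - known_part) / coefficient


# Case 1: the blank is in the result. Just compute the dot product.
A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]
print(entry(A, B, 1, 1))  # 3*6 + 4*8 = 50

# Case 2: the blank is in A (row 2, column 2), and the result entry (2, 2) is known to be 50.
A_with_blank = [[1, 2],
                [3, None]]
blank = solve_blank_in_a(A_with_blank, B, 1, 1, 1, 50)
print(blank)  # (50 - 3*6) / 8 = 4.0
```

Working backwards only succeeds when the chosen result entry actually depends on the blank, which is why the sketch refuses to divide when the matching entry of B is zero.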
One row of A · one column of B (dot product) → one entry of the result matrix
This entry: row 2 of A · column 2 of B (one dot product)
Problem
Find the value that goes in the blank (?) in the matrix product below.