Chapter 06

Batch (Compute All at Once)

A group of samples processed together in one forward pass.


[Diagram: input table X (rows X1–X3, one column per sample: Sample 1–3) → same W, b applied at once → output table Y (rows Y1–Y3, one column per sample) ← result from the same W, b at once]

So when we merge inputs into one table, output Y also comes out as one table at once.

Batch in deep learning

A batch means grouping multiple inputs (samples) into one table (matrix) and computing them all at once with the same weights. Each column = one sample in the table.

Imagine a teacher grading tests one by one vs. feeding 30 tests into a grading machine at once—the machine is much faster. Batching works the same way: the GPU processes many inputs simultaneously.

Key idea: the same W (weights) and b (bias) are applied to all samples. The only thing that differs per sample is the input X. That's why one matrix multiplication can compute results for many samples at once.
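This claim is easy to check numerically. A minimal NumPy sketch (shapes chosen only for illustration): applying the same W and b to each column one at a time gives exactly the same result as a single batched matrix multiplication.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 3))   # weights, shared by all samples
b = rng.standard_normal((2, 1))   # bias, shared by all samples
X = rng.standard_normal((3, 5))   # batch of 5 samples, one per column

# One-by-one: apply the same W and b to each column separately
Y_loop = np.hstack([W @ X[:, [i]] + b for i in range(X.shape[1])])

# Batched: one matrix multiplication handles all 5 columns at once
Y_batch = W @ X + b

print(np.allclose(Y_loop, Y_batch))  # True: the two methods agree
```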

Speed: GPUs are optimized for processing thousands of numbers simultaneously rather than one at a time. Batching lets you use the GPU's full power, computing tens to hundreds of times faster than one-by-one.

Training stability: Updating weights based on just 1 sample is noisy. Using a mini-batch (e.g., 32 or 64 samples) averages the gradients for much more stable learning. Batch size is a critical training setting.
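The averaging effect can be simulated. In this sketch the per-sample "gradients" are stand-in random numbers (not real gradients from a network): averaging them in groups of 32 shrinks the noise markedly compared with using them one at a time.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in per-sample gradient estimates: noisy values around a true gradient of 0
grads = rng.standard_normal(10_000)

noise_single = grads.std()                                   # noise of 1-sample updates
noise_batch = grads[:9984].reshape(-1, 32).mean(axis=1).std()  # noise after averaging 32

print(noise_single > noise_batch)  # True: mini-batch averaging is less noisy
```

The reduction follows the familiar 1/sqrt(batch size) pattern, which is one reason 32 or 64 is already much smoother than 1.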

Memory management: With 1 million data points, you can't fit them all at once (GPU memory!). So you split into mini-batches (e.g., 64 at a time), process each batch, update weights, and repeat.
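The splitting step can be sketched as a simple slicing loop. This is an illustrative skeleton (placeholder data, no real model), showing only how 1 million columns get consumed 64 at a time:

```python
import numpy as np

n_samples, batch_size = 1_000_000, 64    # illustrative sizes
X = np.zeros((8, n_samples))             # placeholder data: 8 features, one sample per column

n_batches = 0
for start in range(0, n_samples, batch_size):
    X_batch = X[:, start:start + batch_size]  # at most 64 columns at a time
    # ... forward pass, loss, and weight update would go here ...
    n_batches += 1

print(n_batches)  # 1_000_000 / 64 = 15625 mini-batches per pass over the data
```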

Netflix & YouTube recommendations: Instead of computing for one user at a time, thousands of users' data are batched for simultaneous scoring. This enables real-time service.

ChatGPT & translators: When many users ask questions at the same time, their queries are batched together for one GPU pass. That's how millions of users get fast responses simultaneously.

Image training: When training on 100,000 images, they're split into mini-batches of 32, running 3,125 iterations. Each mini-batch computes Z = W·X + b, measures error (loss), and slightly adjusts weights.
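One such per-batch step can be sketched for a toy linear model (random stand-in data, mean-squared-error loss, plain gradient descent — not the book's full training setup): compute Z = W·X + b for the whole mini-batch, measure the error, and nudge W and b.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 32))     # one mini-batch: 32 samples as columns
T = rng.standard_normal((1, 32))     # stand-in targets for this mini-batch
W = np.zeros((1, 4))                 # toy weights
b = np.zeros((1, 1))                 # toy bias

lr = 0.1
loss_before = np.mean((W @ X + b - T) ** 2)
for _ in range(200):                 # repeat the per-batch step
    Z = W @ X + b                    # forward pass for the whole mini-batch at once
    dZ = 2 * (Z - T) / Z.shape[1]    # gradient of the mean squared error w.r.t. Z
    W -= lr * dZ @ X.T               # slightly adjust the weights...
    b -= lr * dZ.sum(axis=1, keepdims=True)  # ...and the bias
loss_after = np.mean((W @ X + b - T) ** 2)

print(loss_after < loss_before)  # True: the loss went down
```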

X has multiple columns: each column is one sample, and the same W and b apply to every column. To fill a blank, locate its row and column, then compute using only that column's inputs (with the matching row of W and entry of b).

Add/subtract/multiply/mean operations: these apply element-wise, at matching positions (same row, same column). For a mean (e.g., zero-centering), compute the average along the axis the problem specifies (per row or per column), and use only those values for the blank.

Verification tip: Each column is independent—one column's result doesn't affect another. Check each column separately to catch mistakes easily.

A batch stacks multiple samples as columns of a matrix, and the same W and b are applied to all of them at once: Y = W·X + b.

Batch X (one column per sample):

        col 1   col 2   col 3
  X1      3       1      -1
  X2      1      -2       3

W:
  1   0
  0   1

b:
  1
  0

Z = W·X + b:

        col 1   col 2   col 3
  Z1      4       2       0
  Z2      1      -2       3

One column = one sample. Same W and b applied to all columns at once.

Example: calculation for one column (sample) — column 1: X₁ = 3, X₂ = 1

Z1 = (W row 1 · column 1) + b₁ = (1×3 + 0×1) + 1 = 4
Z2 = (W row 2 · column 1) + b₂ = (0×3 + 1×1) + 0 = 1
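The full example can be reproduced in NumPy: with b stored as a column vector, broadcasting applies the same bias to every sample, and the result matches the Z table above.

```python
import numpy as np

X = np.array([[3, 1, -1],
              [1, -2, 3]])   # batch: 3 samples as columns
W = np.array([[1, 0],
              [0, 1]])       # weights from the example
b = np.array([[1],
              [0]])          # bias as a column, broadcast across all samples

Z = W @ X + b
print(Z)
# [[ 4  2  0]
#  [ 1 -2  3]]
```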

Problem

Fill in the blank (?) in the batch operation below. The operation may be weight-times-input-plus-bias (W·X + b), addition, subtraction, multiplication, mean subtraction, sum, or mean.

Subtract each row's mean from that row and fill the blank.

Y (one column per sample):

        col 1   col 2   col 3
  Y1      3      -1      -2
  Y2     -1      -3      -2

Row means: μ(Y1) = 0, μ(Y2) = -2

Y − μ (each row minus its mean):

        col 1   col 2   col 3
  Y1      3      -1       ?
  Y2      1      -1       0

(row mean → 0)
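As a general check, row-mean subtraction of this kind can be done in NumPy with `axis=1` and `keepdims=True` (a generic example with different numbers, so the blank above stays yours to fill):

```python
import numpy as np

Y = np.array([[2., 4., 0.],
              [1., -1., 3.]])
mu = Y.mean(axis=1, keepdims=True)  # one mean per row, kept as a column for broadcasting
Y_centered = Y - mu                 # subtract each row's mean from that row

print(mu.ravel())                 # [2. 1.]
print(Y_centered)
# [[ 0.  2. -2.]
#  [ 0. -2.  2.]]
print(Y_centered.mean(axis=1))    # [0. 0.] -> each row's mean is now 0
```

The `keepdims=True` keeps μ as a (2, 1) column so that broadcasting lines each mean up with its own row.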