Ch.21
GAN Basics: Generator vs Discriminator
GAN: make and tell apart
Like a counterfeiter and an expert keeping each other sharp.
Real photos and fakes generated from noise both enter the discriminator, which must label each as real or fake. First work out who makes (the generator G) and who judges (the discriminator D).
Training flow
- ① Real images x: Take one real example x from the training data.
- ② Random noise z: Sample a noise vector z that decides what to generate.
- ③ Generator: Turn z into a fake sample G(z).
- ④ Discriminator: Look at x and G(z) and tell real vs fake.
- ⑤ Take turns: Update G and D one after another, a little at a time.
A GAN (Generative Adversarial Network) is an innovative setup where the Generator (G), which creates new content, and the Discriminator (D), which judges real vs fake, keep competing and improving. Think of a breathless mind game between a genius counterfeiter and a veteran forensic detective: the counterfeiter keeps refining fakes, and the detective keeps raising detection skills. In this tense minimax tug of war, the counterfeiter may eventually produce outputs humans cannot tell from real data. This chapter explores the math behind GANs, the minimax game, and mode collapse, where the generator falls into a rut, with rich examples.
Reading the formulas (GAN)
In one line: G makes fakes; D tries to tell real from fake.
G(z) (generator): maps noise z to a new fake sample.
D(x) (discriminator): outputs how likely the input x is real, between 0 and 1.
min_G max_D: G and D pull the score in opposite directions (G minimizes to avoid a bad score, D maximizes to seek a good one), so they are trained in turns.
V(D, G): the objective they compete over: the left term plus the right term below.
E_{x~p_data(x)}[log D(x)] (left term): draw real samples x many times and average log D(x). This is the real-data side.
log D(x): grows when D(x) is near 1. D wants to call real data real.
E_{z~p_z(z)}[log(1 − D(G(z)))] (right term): draw noise z, build fakes G(z), and average log(1 − D(G(z))). This is the fake-data side.
G(z): one fake sample from the noise z you drew.
log(1 − D(G(z))): grows when D calls the fake fake (D(G(z)) near 0). G wins by fooling D so this term shrinks.
D(x) recap: for any input, the probability it is real (near 0 = fake, near 1 = real).
GAN: Generator vs. Discriminator
1. Core GAN architecture: generator vs discriminator
A GAN is a structure where two networks fight endlessly and grow stronger. The Generator (G) tries to make fake data look real, while the Discriminator (D) sharply judges real vs fake.
* Analogy: A forger (generator) brings a fake painting, and an appraiser (discriminator) uses a magnifying glass to tell originals from fakes. Each side keeps sharpening its craft.
2. The minimax objective
The core GAN objective is:

min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))]

* Discriminator (D), maximize: on real x, push D(x) toward 1; on fake G(z), push D(G(z)) toward 0.
* Generator (G), minimize: make the discriminator treat G(z) as real (D(G(z)) → 1) so the second term shrinks.
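The two expectations can be checked numerically. Below is a minimal numpy sketch; the particular D outputs are made-up numbers for illustration, not learned values:

```python
import numpy as np

# Illustrative discriminator outputs (probabilities in (0, 1)):
d_real = np.array([0.9, 0.8, 0.95])   # D(x) on real samples (D wants these near 1)
d_fake = np.array([0.1, 0.2, 0.05])   # D(G(z)) on fakes (D wants these near 0)

left = np.mean(np.log(d_real))          # E[log D(x)], the real-data term
right = np.mean(np.log(1.0 - d_fake))   # E[log(1 - D(G(z)))], the fake-data term
v = left + right                        # value of V(D, G) at these outputs

# Both log terms are at most 0, so a confident, correct D keeps V near 0;
# a G that fools D (d_fake near 1) drives the right term strongly negative.
print(round(v, 3))
```

Plugging in other numbers shows the tug of war: raising d_fake toward 1 (G winning) lowers V, while sharpening d_real toward 1 and d_fake toward 0 (D winning) raises it.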
3. Latent noise
Latent noise z is the random vector fed to the generator as a starting point.
* Analogy: Like a lump of clay handed to a sculptor: small changes in z can change expression, color, or style in the finished image.
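A small sketch of sampling and walking between latent vectors; the 100-dimensional size is a common but assumed choice, and no actual generator network is shown (each interpolated z would only become an image after passing through a trained G):

```python
import numpy as np

rng = np.random.default_rng(0)
z_dim = 100                       # typical latent size (an assumption)
z_a = rng.standard_normal(z_dim)  # one "lump of clay"
z_b = rng.standard_normal(z_dim)  # another

# Linear interpolation in latent space: with a trained G, each step would
# change the generated image gradually (pose, color, style).
steps = [(1 - t) * z_a + t * z_b for t in np.linspace(0.0, 1.0, 5)]
print(len(steps), steps[0].shape)
```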
4. Mode collapse
A notorious failure mode: the generator stops exploring diversity and keeps copying one sample that already fooled the discriminator.
* Analogy: A restaurant that earns a perfect score for kimchi stew, then serves only kimchi stew to every guest all year.
5. Conditional GAN (cGAN)
Add a condition y, such as a class label or text, alongside z to steer generation, e.g. "draw a cat" or "colorize this sketch".
Why it matters
1. A true starting point for generative AI
Where classifiers answer "this is a dog," GANs paint dogs that never existed—a backbone of modern generative AI across images, audio, and voice.
2. Sharp, vivid detail
Unlike blurry average-seeking models, GANs must pass a harsh critic, so hair strands and skin texture can look razor-sharp.
3. Data augmentation
Train on a few snowy-night driving photos and synthesize thousands more; rare medical or defect images can be multiplied for downstream models.
How it is used
Step 1: Normalize inputs (tanh)
Scale pixels (often 0–255) to [−1, 1]. If the generator ends with tanh, match real images to the same range so the discriminator compares fairly.
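A minimal sketch of the rescaling; the helper name to_tanh_range is made up for illustration:

```python
import numpy as np

def to_tanh_range(img_uint8):
    """Scale pixels from [0, 255] to [-1, 1], matching a tanh generator output."""
    return img_uint8.astype(np.float32) / 127.5 - 1.0

img = np.array([[0, 128, 255]], dtype=np.uint8)
scaled = to_tanh_range(img)
print(scaled.min(), scaled.max())   # values now span [-1, 1]
```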
Step 2: BCE loss and label smoothing
Use binary cross-entropy (BCE) for real vs fake. Label smoothing (e.g. targets of 0.9 instead of 1.0) can curb an overconfident discriminator.
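A minimal numpy sketch of BCE with and without label smoothing; the bce helper and the sample D outputs are illustrative assumptions:

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy, averaged over the batch."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

d_real = np.array([0.95, 0.99, 0.90])       # D's outputs on real images

hard = bce(d_real, np.full(3, 1.0))         # hard targets of 1.0
smooth = bce(d_real, np.full(3, 0.9))       # smoothed targets of 0.9
print(hard < smooth)
```

With smoothed targets the loss is minimized at 0.9 rather than 1.0, so pushing outputs toward certainty costs extra loss, which is what tempers an overconfident discriminator.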
Step 3: Alternate training
Do not update G and D in lockstep. Often train D for k steps, then G once. If D dominates, G's gradients may vanish; balance learning rates and update ratios.
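The k:1 schedule can be shown end to end on a toy problem. Everything specific below is an assumption for the sketch, not the chapter's prescribed setup: a linear generator, a logistic discriminator, real data drawn from N(3, 0.5), hand-derived gradients, and the common non-saturating generator loss (maximize log D(G(z)) instead of minimizing log(1 − D(G(z)))):

```python
import numpy as np

rng = np.random.default_rng(42)
w_g, b_g = 1.0, 0.0         # generator G(z) = w_g * z + b_g
w_d, b_d = 0.1, 0.0         # discriminator D(x) = sigmoid(w_d * x + b_d)
lr, k, batch = 0.05, 2, 64  # train D for k steps per single G step

sig = lambda t: 1.0 / (1.0 + np.exp(-t))

for step in range(2000):
    for _ in range(k):                          # --- k discriminator updates ---
        real = rng.normal(3.0, 0.5, batch)
        fake = w_g * rng.standard_normal(batch) + b_g
        dr, df = sig(w_d * real + b_d), sig(w_d * fake + b_d)
        # dL_D/dlogit is (dr - 1) on reals (target 1) and df on fakes (target 0)
        gw = np.mean((dr - 1) * real) + np.mean(df * fake)
        gb = np.mean(dr - 1) + np.mean(df)
        w_d, b_d = w_d - lr * gw, b_d - lr * gb
    z = rng.standard_normal(batch)              # --- one generator update ---
    fake = w_g * z + b_g
    df = sig(w_d * fake + b_d)
    # Non-saturating loss -log D(G(z)): dL/dlogit = df - 1, chained through w_d
    gwg = np.mean((df - 1) * w_d * z)
    gbg = np.mean((df - 1) * w_d)
    w_g, b_g = w_g - lr * gwg, b_g - lr * gbg

fake_mean = float(np.mean(w_g * rng.standard_normal(10_000) + b_g))
print(round(fake_mean, 2))   # should drift toward the real mean of 3.0
```

Even in one dimension the balance matters: with k too large, D saturates and G's gradient shrinks toward zero; with k = 0, D never gives G a useful signal.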
Step 4: Stability checks and FID
Watch for mode collapse visually. FID (Fréchet Inception Distance) compares real vs fake feature distributions; lower FID usually means a closer match to real data.
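Real FID fits multivariate Gaussians to Inception features and needs a matrix square root; as a simplified univariate analog (an assumption for illustration, not the actual FID pipeline), the Fréchet distance between two 1-D Gaussians reduces to a closed form:

```python
import numpy as np

def frechet_1d(mu1, s1, mu2, s2):
    """Frechet distance between 1-D Gaussians N(mu1, s1) and N(mu2, s2):
    (mu1 - mu2)^2 + s1^2 + s2^2 - 2*s1*s2."""
    return (mu1 - mu2) ** 2 + s1 ** 2 + s2 ** 2 - 2 * s1 * s2

rng = np.random.default_rng(1)
real = rng.normal(0.0, 1.0, 50_000)    # stand-in for real feature values
good = rng.normal(0.05, 1.0, 50_000)   # generator close to the real distribution
bad = rng.normal(2.0, 0.3, 50_000)     # generator far off (e.g. collapsed)

fid_good = frechet_1d(real.mean(), real.std(), good.mean(), good.std())
fid_bad = frechet_1d(real.mean(), real.std(), bad.mean(), bad.std())
print(fid_good < fid_bad)   # lower distance = closer to the real distribution
```

Note the distance penalizes both a shifted mean and a collapsed spread, which is why FID can flag mode collapse that per-sample visual checks miss.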
Summary
One-line summary: A GAN is a generator–discriminator game that learns to produce realistic samples from noise z.
Key point: stability, balance, and diversity are the main things to watch.
Next: conditional and more stable variants extend the same idea.
Problem-solving notes
Start with one line: the generator G turns noise z into fakes; the discriminator D judges real vs fake. First decide who makes and who judges, then add minimax, alternating updates, and mode collapse when needed.
When numbers appear: flattened length is (height)×(width) (×3 for RGB); patch grids without a CLS token use (H/P)×(W/P) patches; one fully connected layer has roughly (inputs)×(outputs) weights.
Example (flatten) — GAN grayscale 28×28 flattened? → 784
Example (patch grid) — 224×224 image, 16×16 patches, no CLS → 196
Example (concept) — Generator role in a GAN?
② Turn noise into fakes → 2
Example (calculation) — 32×32 RGB with 3 channels flattened? → 3072
Example (application) — Discriminator too strong?
① Rebalance G/D updates
Definition — Mode collapse means repeating nearly the same sample. → pick that description
True/false — A conditional GAN can use labels or conditions. → 1
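The numeric examples above are pure arithmetic and can be verified in a few lines:

```python
# Quick checks for the numeric examples above.
h, w = 28, 28
assert h * w == 784                  # grayscale flatten

img, patch = 224, 16
assert (img // patch) ** 2 == 196    # 14x14 patch grid, no CLS token

h, w, c = 32, 32, 3
assert h * w * c == 3072             # RGB flatten
print("all checks pass")
```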