Everyone's AI
Ch.21

GAN Basics: Generator vs Discriminator

GAN: make and tell apart

Like a counterfeiter and an expert keeping each other sharp.

Real photos and fake images generated from noise both enter the discriminator, which must label each as real or fake. First, get clear on who makes (G) and who judges (D).

Training flow

  1. ① Real images: Take one real example x from the training data.
  2. ② Random noise: Sample noise z that decides what to generate.
  3. ③ Generator: Turn z into a fake sample x̂.
  4. ④ Discriminator: Look at x and x̂ and tell real vs fake.
  5. ⑤ Take turns: Update G and D one after another, a little at a time.
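The five steps above can be sketched as a tiny 1-D GAN in plain NumPy. This is an illustration, not a real architecture: the "networks" are single affine maps, the real data is just N(4, 1), and the generator uses the common non-saturating trick of maximizing log D(G(z)) instead of minimizing log(1 − D(G(z))).

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))

def G(z, theta):
    """Toy generator: fake = theta[0] * z + theta[1]."""
    return theta[0] * z + theta[1]

def D(x, phi):
    """Toy discriminator: P(real) = sigmoid(phi[0] * x + phi[1])."""
    return sigmoid(phi[0] * x + phi[1])

theta = np.array([1.0, 0.0])   # generator parameters
phi = np.array([0.5, 0.0])     # discriminator parameters
lr = 0.05

for step in range(500):
    x = rng.normal(4.0, 1.0, size=64)      # ① real samples ~ N(4, 1)
    z = rng.normal(0.0, 1.0, size=64)      # ② noise
    x_hat = G(z, theta)                    # ③ fakes

    # ④ discriminator step: gradient ascent on E[log D(x)] + E[log(1 - D(G(z)))]
    d_real, d_fake = D(x, phi), D(x_hat, phi)
    phi += lr * np.array([
        np.mean((1.0 - d_real) * x) - np.mean(d_fake * x_hat),
        np.mean(1.0 - d_real) - np.mean(d_fake),
    ])

    # ⑤ generator step (non-saturating trick): gradient ascent on E[log D(G(z))]
    d_fake = D(G(z, theta), phi)
    theta += lr * np.array([
        np.mean((1.0 - d_fake) * phi[0] * z),
        np.mean((1.0 - d_fake) * phi[0]),
    ])
```

After a few hundred alternating updates, the mean of the fakes should drift toward the real mean of 4, with D hovering near 0.5 at equilibrium.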
A GAN (Generative Adversarial Network) is an innovative setup where the Generator (G) that creates new content and the Discriminator (D) that judges real vs fake keep competing and improving. Think of a breathless mind game between a genius counterfeiter and a veteran forensic detective: the counterfeiter keeps refining fakes, and the detective keeps raising detection skills. In this tense minimax tug of war, the counterfeiter may eventually produce outputs humans cannot tell from real data. This chapter explores the math behind GANs, the minimax game, and mode collapse, where the generator falls into a rut, with rich examples.

Reading the formulas (GAN)

In one line: G makes fakes; D tries to tell real from fake.

$$\min_G \max_D V(D,G)=\mathbb{E}_{x\sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z\sim p(z)}[\log(1-D(G(z)))]$$

* $G$ (generator): maps noise $z$ to a new fake sample.
* $D$ (discriminator): outputs how likely the input is real, between 0 and 1.
* $\min_G \max_D$: $G$ and $D$ pull the score in opposite directions ($G$ avoids a bad score, $D$ seeks a good one), so they are trained in turns.
* $V(D,G)$: the objective they compete over: the sum of the two terms below.
* $\mathbb{E}_{x\sim p_{\text{data}}}[\cdot]$ (left term): draw real $x$ many times and average $\log D(x)$. This is the real-data side.
* $\log D(x)$: grows when $D(x)$ is near 1. $D$ wants to call real $x$ real.
* $\mathbb{E}_{z\sim p(z)}[\cdot]$ (right term): draw noise $z$, build fakes $G(z)$, and average $\log(1-D(G(z)))$. This is the fake-data side.
* $G(z)$: one fake sample from the noise $z$ you drew.
* $\log(1-D(G(z)))$: grows when $D$ calls the fake fake ($D(G(z))$ near 0). $G$ wins by fooling $D$.
* $D(x)$ recap: for any input, the probability it is real (near 0 = fake, near 1 = real).
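To feel how the two terms of V(D, G) move, here is a quick numeric check on single samples. The values 0.9 and 0.1 are hypothetical discriminator outputs standing in for D(x) and D(G(z)).

```python
import math

def v_terms(d_real, d_fake):
    """The two per-sample terms of V(D, G): log D(x) and log(1 - D(G(z)))."""
    return math.log(d_real), math.log(1.0 - d_fake)

# A sharp discriminator: real rated 0.9, fake rated 0.1 -> both terms near 0 (high V).
left, right = v_terms(0.9, 0.1)

# A fooled discriminator: the fake is also rated 0.9 -> the right term
# plunges toward -infinity (low V), which is exactly what G wants.
left2, right2 = v_terms(0.9, 0.9)
```

Both sides read the same scoreboard: D pushes both terms up toward 0, G drags the right term down.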
[Diagram: real sample x and random noise z; z feeds the generator G, which outputs x̂; both x and x̂ feed the discriminator D, which predicts Real or Fake. Labels: competition · real/fake prediction · adversarial loss]
In one line: noise z goes into the generator, which creates fake samples, and the discriminator competes to tell real from fake.

GAN: Generator vs. Discriminator

1. Core GAN architecture: generator vs discriminator
A GAN is a structure where two networks fight endlessly and grow stronger. The Generator (G) tries to make fake data look real, while the Discriminator (D) sharply judges real vs fake.
* Analogy: A forger (generator) brings a fake painting, and an appraiser (discriminator) uses a magnifying glass to tell originals from fakes. Each side keeps sharpening its craft.
2. The minimax objective
The core GAN objective is:
min⁡Gmax⁡DV(D,G)=Ex[log⁡D(x)]+Ez[log⁡(1−D(G(z)))]\min_G \max_D V(D, G) = \mathbb{E}_{x}[\log D(x)] + \mathbb{E}_{z}[\log(1 - D(G(z)))]minG​maxD​V(D,G)=Ex​[logD(x)]+Ez​[log(1−D(G(z)))]
* Discriminator ($D$), maximize: on real $x$, push $D(x)$ toward 1; on fake $G(z)$, push $D(G(z))$ toward 0.
* Generator ($G$), minimize: make the discriminator treat $G(z)$ as real ($D(G(z)) \to 1$) so the second term shrinks.
3. Latent noise z
Latent noise z is the random vector fed to the generator as a starting point.
* Analogy: Like a lump of clay handed to a sculptor: small changes in z can change expression, color, or style in the finished image.
4. Mode collapse
A notorious failure mode: the generator stops exploring diversity and keeps copying one sample that already fooled the discriminator.
* Analogy: A restaurant that earns a perfect score for kimchi stew, then serves only kimchi stew to every guest all year.
5. Conditional GAN (cGAN)
Add a condition y, such as a class label or text, alongside z to steer generation, e.g. "draw a cat" or "colorize this sketch".
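A minimal sketch of the cGAN conditioning trick, assuming one-hot class labels and simple concatenation (the function names are illustrative, and real cGANs may also feed y to the discriminator):

```python
import numpy as np

def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector y."""
    v = np.zeros(num_classes, dtype=np.float32)
    v[label] = 1.0
    return v

def cgan_generator_input(z, label, num_classes=10):
    """cGAN conditioning: the generator sees [z ; y], so y steers what gets drawn."""
    return np.concatenate([z, one_hot(label, num_classes)])

z = np.random.default_rng(0).normal(size=100).astype(np.float32)
g_in = cgan_generator_input(z, label=3)   # "draw class 3"
```

The same z with a different label lands in a different region of the input space, so the generator can learn one label-steered mapping instead of ten separate ones.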

Why it matters

1. A true starting point for generative AI
Where classifiers answer "this is a dog," GANs paint dogs that never existed—a backbone of modern generative AI across images, audio, and voice.
2. Sharp, vivid detail
Unlike blurry average-seeking models, GANs must pass a harsh critic, so hair strands and skin texture can look razor-sharp.
3. Data augmentation
Train on a few snowy-night driving photos and synthesize thousands more; rare medical or defect images can be multiplied for downstream models.

How it is used

Step 1: Normalize inputs (tanh)
Scale pixels (often 0–255) to $[-1, 1]$. If the generator ends with $\tanh$, match real images to the same range so the discriminator compares fairly.
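A minimal sketch of Step 1, assuming uint8 images and a tanh-ended generator (the helper names are ours):

```python
import numpy as np

def to_tanh_range(img_uint8):
    """Map 0-255 pixels to [-1, 1], the output range of a tanh generator."""
    return img_uint8.astype(np.float32) / 127.5 - 1.0

def to_uint8(img):
    """Inverse map, used when displaying generated images."""
    return np.clip(np.rint((img + 1.0) * 127.5), 0, 255).astype(np.uint8)

img = np.array([0, 128, 255], dtype=np.uint8)
scaled = to_tanh_range(img)   # endpoints land exactly on -1 and 1
```

Dividing by 127.5 and subtracting 1 is equivalent to `2 * (x / 255) - 1`; the point is only that real and fake images share one range.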
Step 2: BCE loss and label smoothing
Use binary cross-entropy (BCE) for real vs fake. Label smoothing (e.g. targets 0.9 instead of 1.0) can curb an overconfident discriminator.
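Step 2 in code, as a sketch: a hand-rolled BCE (a real project would use a framework's loss) applied to some hypothetical overconfident discriminator outputs, with hard vs smoothed targets.

```python
import numpy as np

def bce(p, target):
    """Binary cross-entropy between discriminator outputs p in (0, 1) and targets."""
    p = np.clip(p, 1e-7, 1.0 - 1e-7)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))

d_real = np.array([0.95, 0.99, 0.999])   # an overconfident discriminator
hard = bce(d_real, np.ones(3))           # targets 1.0: near-zero loss
smooth = bce(d_real, np.full(3, 0.9))    # targets 0.9: loss stays up near p = 1
```

With smoothing, the loss minimum sits at p = 0.9 rather than p = 1.0, so pushing outputs all the way to 1 is no longer rewarded.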
Step 3: Alternate training
Do not update G and D in lockstep. A common schedule trains D for k steps, then G once. If D dominates, G's gradients can vanish; balance learning rates and update ratios.
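The k-to-1 schedule from Step 3 is just an interleaving; a sketch (k = 2 is an arbitrary example, and in practice k is a tuning knob):

```python
def training_schedule(num_rounds, k=2):
    """Return the update order: k discriminator steps, then one generator step."""
    order = []
    for _ in range(num_rounds):
        order.extend(["D"] * k)   # let D catch up on the current fakes
        order.append("G")         # then give G one chance to adapt
    return order

schedule = training_schedule(2, k=2)   # ['D', 'D', 'G', 'D', 'D', 'G']
```

Shrinking k (or raising G's learning rate) is one lever when D races ahead.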
Step 4: Stability checks and FID
Watch for mode collapse visually. FID (Fréchet Inception Distance) compares real vs fake feature distributions; lower FID usually means a closer match to real data.
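Real FID runs a Fréchet distance on Inception-v3 features with full covariance matrices. As a sketch of the idea only, here is the same distance for 1-D Gaussians fit to samples, where it reduces to $(\mu_1-\mu_2)^2 + (\sigma_1-\sigma_2)^2$:

```python
import numpy as np

def frechet_1d(a, b):
    """Frechet distance between 1-D Gaussians fit to two sample sets.
    Simplified stand-in for FID: (mu1 - mu2)^2 + (sigma1 - sigma2)^2."""
    return (a.mean() - b.mean()) ** 2 + (a.std() - b.std()) ** 2

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, 10_000)
good_fake = rng.normal(0.1, 1.0, 10_000)   # close in mean and spread
collapsed = np.full(10_000, 0.0)           # mode collapse: one repeated sample
```

Note how the collapsed generator matches the real mean perfectly yet still scores badly, because its spread is zero; that is why distribution-level metrics catch mode collapse that per-sample realism checks miss.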

Summary

One-line summary: A GAN is a generator–discriminator game that learns to produce realistic samples from noise z.
Key point: stability, balance, and diversity are the main things to watch.
Next: conditional and more stable variants extend the same idea.

Problem-solving notes

Start with one line: Generator G turns noise z into fakes; discriminator D judges real vs fake. First decide who makes and who judges, then add minimax, alternating updates, and mode collapse when needed.
When numbers appear: flattened length is height × width (× 3 for RGB); patch grids without a CLS token use $(H/p)\times(W/p)$; one fully connected layer has roughly $d_{\mathrm{in}}\times d_{\mathrm{out}}$ weights.
Example (flatten) — GAN grayscale 28×28 image, flattened d? → 784

Example (patch grid) — 224×224 image, 16×16 patches, no CLS → 14² = 196
Example (concept) — Generator role in a GAN?
② Turn noise z into fakes → 2

Example (calculation) — RGB 32×32 with 3 channels, flattened d? → 3072
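The counting rules behind these worked numbers can be checked in a few lines (the helper names are ours):

```python
def flatten_dim(h, w, channels=1):
    """d for a flattened image: height x width x channels."""
    return h * w * channels

def patch_count(h, w, p):
    """Patch-grid size without a CLS token: (h/p) x (w/p)."""
    return (h // p) * (w // p)

print(flatten_dim(28, 28))        # 784
print(patch_count(224, 224, 16))  # 196
print(flatten_dim(32, 32, 3))     # 3072
```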

Example (application) — Discriminator too strong?
① Rebalance G/D updates
Definition — Mode collapse means repeating nearly the same sample. → pick that description

True/false — A conditional GAN can use labels or conditions. → 1