Ch.00

Advanced DL: Large Models and Generative AI Paradigm

Advanced Deep Learning (Ch.00) is the entry point that connects “why models got so large” with “how generative AI systems actually work.” We go beyond learning representations from data to cover how large Transformers build contextual understanding and predict the next token, and how we then align, control, and deploy those models for real users.

An advanced roadmap toward large generative models

This roadmap fills in gradually from Ch.01 onward, showing how each chapter contributes to the full system.

What you will learn in Ch01–Ch29

  • Ch.01
    Transformer 1: Self-Attention and Parallelization
  • Ch.02
    Transformer 2: Positional Encoding and Feed-Forward
  • Ch.03
    Transformer Lineage: Encoder (BERT) vs Decoder (GPT)
  • Ch.04
    Attention Optimization: FlashAttention and Sparse Attention
  • Ch.05
    Vision Transformer (ViT) and Image Patches
  • Ch.06
    Swin Transformer: Hierarchical Windows and Global Context
  • Ch.07
    Vision Models: Local CNN vs Global ViT
  • Ch.08
    PEFT 1: PEFT and LoRA
  • Ch.09
    QLoRA and Quantization: Tuning When Smaller
  • Ch.10
    Value Alignment and RLHF: Matching Human Preferences
  • Ch.11
    DPO: Aligning with Preferences without Reinforcement Learning
  • Ch.12
    RAG: Reducing Hallucinations with Retrieval
  • Ch.13
    LLM Agents: Models That Use Tools
  • Ch.14
    Master CNNs: Kernels, Stride, Padding & Backbone Evolution
  • Ch.15
    Object Detection: R-CNN Family vs YOLO (Bounding Boxes)
  • Ch.16
    Image Segmentation: U-Net and DeepLab (Pixel-Level Understanding)
  • Ch.17
    Grad-CAM and XAI: Where CNNs Look
  • Ch.18
    Graph Neural Networks (GNN): Message Passing to Neighbors
  • Ch.19
    Autoencoder: Compress and Reconstruct
  • Ch.20
    VAE: A Generative Space in Probability
  • Ch.21
    GAN Basics: Generator vs Discriminator
  • Ch.22
    Conditional GAN: Generate on Condition
  • Ch.23
    Diffusion 1: Add Noise, Then Denoise
  • Ch.24
    Diffusion 2: Diffusing in Latent Space
  • Ch.25
    Vision-Language Models and CLIP: Images and Text Together (CNN Meets LLM)
  • Ch.26
    Speech Recognition and Audio: Sound to Text
  • Ch.27
    Model Compression and Knowledge Distillation
  • Ch.28
    Inference Optimization and Deployment: From Servers to Browser Runtimes
  • Ch.29
    Advanced DL Wrap-Up: Architecture and Future

What is Advanced DL? (Generative AI system view)

Foundation models / LLMs are trained with the objective of predicting the next token. In other words, they maximize `p(x_t | x_{<t})`, learning language flow and patterns that go beyond simple grammar.
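To make the objective concrete, here is a minimal NumPy sketch (all names and the toy logits are illustrative, not from any real model): the training loss for one position is the negative log-probability the model assigns to the actual next token.

```python
import numpy as np

def next_token_nll(logits, target_id):
    """Negative log-likelihood of the true next token under softmax(logits)."""
    z = logits - logits.max()               # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum()) # log softmax over the vocabulary
    return -log_probs[target_id]

# Toy scores over a 4-word vocabulary; the "true" next token has id 0.
vocab_logits = np.array([2.0, 0.5, -1.0, 0.1])
loss = next_token_nll(vocab_logits, target_id=0)
```

Training minimizes the average of this loss over every position in the corpus, which is exactly maximizing `p(x_t | x_{<t})`.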
A practical way to understand generative AI is to split it into stages: pretraining (broad knowledge), instruction / SFT (follow user intent), and alignment (preference, safety, and reduced hallucinations).
The backbone is mostly Transformers. Self-attention creates “token-to-token” context, and feed-forward + normalization layers refine it so the model stays consistent even with long contexts.
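The “token-to-token” mixing that self-attention performs can be sketched in a few lines of NumPy. This is a single head with random toy weights (the shapes and names are illustrative), showing how each output row is a softmax-weighted blend of the value vectors of all tokens:

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.
    x: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_head)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])          # token-to-token affinities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # softmax per query token
    return w @ v                                     # context-mixed values

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                          # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)                  # shape (4, 8)
```

Because every score is a matrix product over the whole sequence, all positions are processed in parallel, which is the parallelization Ch.01 covers.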
Bigger models can improve capability, but they also make training less stable and dramatically increase cost. Advanced DL therefore focuses on more than accuracy: training stability, efficiency (compute/memory), and reproducibility.
In the real world, generative AI is judged by trust: truthfulness, safety, and reliability. Achieving that requires alignment, evaluation, and control mechanisms.
Finally, deployment constraints (latency, cost, server limits) matter. So advanced DL continues from training to inference optimization, compression, and serving strategies.
In production, systems usually follow a pipeline like `text/image -> tokenization -> context window -> Transformer -> decoding (greedy/beam/sample)`. Decoding strategy and prompt design strongly affect output quality.
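The decoding step at the end of that pipeline can be sketched as follows, assuming we already have logits for the next token (the toy values below are made up). Greedy decoding always takes the argmax; temperature sampling flattens or sharpens the distribution before drawing:

```python
import numpy as np

def greedy(logits):
    """Deterministic decoding: pick the highest-scoring token."""
    return int(np.argmax(logits))

def sample(logits, temperature=1.0, rng=None):
    """Stochastic decoding: draw from softmax(logits / temperature)."""
    rng = rng or np.random.default_rng()
    z = logits / temperature
    p = np.exp(z - z.max())
    p /= p.sum()
    return int(rng.choice(len(logits), p=p))

logits = np.array([3.0, 1.0, 0.2])
g = greedy(logits)                                  # always token 0
s = sample(logits, temperature=0.7, rng=np.random.default_rng(0))
```

Lower temperature pushes sampling toward greedy behavior; higher temperature increases diversity (and the risk of incoherent output), which is one reason decoding strategy strongly affects perceived quality.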
Alignment and control can be done in multiple ways. For example, RLHF / DPO uses preferences to improve the model, and RAG retrieves external knowledge to ground answers.
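The retrieval step of RAG can be sketched with a hypothetical mini-retriever (the documents and 2-d “embeddings” below are toy placeholders for a real embedding model): rank documents by cosine similarity to the query vector, then prepend the top hit as grounding context.

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=1):
    """Indices of the k documents most cosine-similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q
    return np.argsort(sims)[::-1][:k]

docs = ["Tokyo is the capital of Japan.", "LoRA adds low-rank adapters."]
doc_vecs = np.array([[0.9, 0.1], [0.1, 0.9]])   # pretend embeddings
query_vec = np.array([0.95, 0.05])              # a query about Japan
idx = int(top_k(query_vec, doc_vecs, k=1)[0])
prompt = f"Context: {docs[idx]}\nQuestion: What is the capital of Japan?"
```

The model then answers conditioned on the retrieved context, which is what grounds the output and reduces hallucination.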
From a product perspective, tool use, caching/batching, and optimization such as quantization or knowledge distillation are part of the whole stack. The same base model can feel very different depending on how you run it.
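As one concrete example of such serving-side optimization, here is a minimal sketch of symmetric int8 weight quantization (a simplified per-tensor scheme, not any specific library's implementation): weights are stored as int8 plus a single float scale, and reconstructed at inference time with a small, bounded error.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization to int8 with one float scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate float weights from int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=256).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
err = np.abs(w - w_hat).max()   # rounding error, at most about scale / 2
```

Storing `q` instead of `w` cuts memory roughly 4x versus float32, which is why the same base model can feel very different depending on how it is run.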
This section ties the whole Advanced DL track to how you might reason about it in exam-style questions. Next-token prediction in pretraining builds broad language ability and connects to probabilistic generation and representation learning. Instruction tuning and SFT shape how models follow user intent, which brings in data formatting and fine-tuning.
Alignment addresses preferences, safety, and truthfulness through ideas like preference learning and reward modeling. RAG and grounded generation lean on retrieval, embeddings, and assembling context to reduce ungrounded answers. Inference optimization targets latency and cost with quantization, caching, distillation, and similar serving-side tools.