Ch.01

Supervised, Unsupervised, and Self-Supervised Learning

Machine learning is often divided into supervised, unsupervised, and self-supervised learning depending on how data is used. Supervised learning is like studying with an answer key; unsupervised learning is like finding patterns and grouping similar items without labels; self-supervised learning is like masking part of the data and learning by predicting the missing part. This chapter summarizes the core ideas, math, and real-world use of these three paradigms so you can build a solid base for the algorithms covered later.


Three learning paradigms: supervised (input–label pairs), unsupervised (no label), self-supervised (self-created target).

Supervised: input x and label y come in pairs, e.g. (x₁, y₁), (x₂, y₂), (x₃, y₃). When such pairs are given, the model learns the rule that maps x to y.

Unsupervised: only input x (no label y), e.g. x₁, x₂, …, x₆. There is no y at all, yet the model still finds structure and clusters.

Self-supervised: mask part of the data and predict the gap, e.g. the sequence 1, 2, [mask], 4. The model masks, predicts, and fills the missing value; this fill-in-the-blank objective yields representation learning (BERT, etc.).

Three Ways of Learning: Supervised, Unsupervised, Self-Supervised

Supervised Learning: Learning from input–label pairs
The model is given input x and the corresponding label (target) y as pairs. The goal is to approximate a function y = f(x). Formally, we have a training set D = {(x₁, y₁), (x₂, y₂), …} and find f by minimizing a loss (e.g. MSE, cross-entropy). Ch02 KNN, Ch03 Linear Regression, and Ch04 Logistic Regression are all supervised.
* Example 1 (classification): Spam filter: email content (x) → spam or not (y).
* Example 2 (regression): House price: area, location (x) → price (y).
* Example 3 (medical): Patient test values (x) and diagnosis (y) for decision support.
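The supervised setup can be sketched in a few lines: given (x, y) pairs, fit f by minimizing MSE. The data-generating rule (y = 2x + 1) and the closed-form least-squares fit below are illustrative assumptions, not the chapter's method:

```python
import numpy as np

# Supervised sketch: (x, y) pairs drawn from an assumed rule y = 2x + 1
# plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, size=50)

# Fit y ≈ w*x + b by minimizing mean squared error (MSE).
X = np.column_stack([x, np.ones_like(x)])    # add a bias column
w, b = np.linalg.lstsq(X, y, rcond=None)[0]  # closed-form MSE minimizer

# w and b should land near the true rule's 2 and 1.
```

Ch03 derives this least-squares fit properly; the point here is only that the answer key (y) drives the learning.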
Unsupervised Learning: Discovering hidden structure
Only input x is given; there is no label y. Think of it as "only questions, no answer key." The goal is to find structure, patterns, or clusters using distance and similarity between the inputs x: group similar points (clustering), compress to fewer dimensions (dimensionality reduction), or flag anomalies that fall outside the normal pattern.
* Example 1 (clustering): Customer age and purchase history (x) → segment similar customers.
* Example 2 (anomaly detection): Learn normal payment patterns (x), then flag unusual transactions.
* Example 3 (dimension reduction): Reduce many features to 2–3 numbers for visualization or denoising. (You’ll learn concrete methods later.)
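The clustering idea above can be sketched with a minimal K-Means loop, assuming two well-separated 2-D blobs and k = 2; the data and the deterministic initialization are illustrative, and Ch08 develops K-Means properly:

```python
import numpy as np

# Unsupervised sketch: synthetic 2-D data with no labels, only x.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(30, 2))  # points near (0, 0)
blob_b = rng.normal(loc=5.0, scale=0.5, size=(30, 2))  # points near (5, 5)
X = np.vstack([blob_a, blob_b])

centers = np.vstack([X[0], X[-1]])  # init: one point from each end of X
for _ in range(10):                 # alternate assign / update steps
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)       # assign each point to its nearest center
    centers = np.array([X[labels == k].mean(axis=0) for k in range(2)])

# The recovered centers should sit near (0, 0) and (5, 5),
# even though no label ever said which point belongs where.
```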
Self-Supervised Learning: Creating targets from data
Instead of human labels, the model creates pseudo-labels from the data. Typical flow:
(1) Mask part of the input (e.g. a word, an image patch).
(2) Predict the masked part from the rest.
(3) Use the learned representation for downstream tasks with a small amount of supervised data. This is how BERT, GPT, and many vision models are pre-trained on large unlabeled corpora.
* Example 1 (language): "I ate [ MASK ]" → predict the masked word from context (LLMs).
* Example 2 (vision): Mask a region of an image and reconstruct it from the rest.
* Example 3 (contrastive): Treat two augmented views of the same image as "same" and different images as "different" to learn representations.
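The mask-and-predict flow can be sketched on a toy 1-D signal: the targets are carved out of the data itself, so no human labels are needed. The random-walk signal and the linear predictor are illustrative assumptions, vastly simpler than BERT-style masking:

```python
import numpy as np

# Self-supervised sketch: mask the middle value of each 3-wide window
# and predict it from its two neighbors.
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=500))  # unlabeled 1-D signal

context = np.column_stack([series[:-2], series[2:]])  # neighbors of step t
target = series[1:-1]                                 # the "masked" values

# Fit a linear fill-in-the-blank predictor on the pseudo-labels.
A = np.column_stack([context, np.ones(len(target))])
w, *_ = np.linalg.lstsq(A, target, rcond=None)

pred = A @ w
mse = np.mean((pred - target) ** 2)
# mse should be far below the raw variance of the signal: the model
# has learned structure without any human-provided label.
```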
Data nature and cost — Building labels for all data is expensive. When labels are sufficient, supervised is effective; when they are scarce, unsupervised or self-supervised use unlabeled data, then a small supervised fine-tuning step. Interpretability also differs: supervised allows some explanation via loss and decision path; unsupervised/self-supervised require separate interpretation (e.g. cluster names, visualization).
Pre-training and fine-tuning — Modern pipelines often use self-supervised pre-training on large unlabeled data, then supervised fine-tuning on a small labeled set. Unsupervised is common in preprocessing and exploration—e.g. cluster customers with K-Means, assign human meanings to clusters (e.g. "loyal", "churn risk"), then build a supervised churn model. Choosing the right paradigm makes the pipeline clear and realistic given data size and label cost.
Supervised — Ch02 KNN, Ch03 Linear Regression, Ch04 Logistic Regression learn from (input, label) pairs. Classification: spam filter, disease prediction, image classification. Regression: house price, sales, temperature—Ch03/Ch04 cover the math and optimization.
Unsupervised — Ch08 K-Means clusters data without labels; dimension reduction (reducing many features to 2–3 numbers) is another key tool. Clustering: customer segmentation, topic grouping. Anomaly detection: learn a "normal" region, flag points outside it.
Self-supervised — BERT (masked word prediction), GPT (next-token prediction), and contrastive learning in vision are widely used. After pre-training, a small amount of labeled data is used for QA, summarization, or classification.
Summary
(1) Supervised: learn y = f(x) from (x, y) pairs.
(2) Unsupervised: find structure/clusters from x only.
(3) Self-supervised: learn from pseudo-labels (e.g. masked tokens), then use small supervised data for downstream tasks.
|                 | Label               | Goal                                          | Examples                        |
|-----------------|---------------------|-----------------------------------------------|---------------------------------|
| Supervised      | Yes (y)             | Predict y (classification/regression)         | KNN, linear/logistic regression |
| Unsupervised    | No                  | Structure, clusters, dimensionality reduction | K-Means, dimension reduction    |
| Self-supervised | Self-created target | Representation learning                       | BERT, contrastive learning      |
By problem type. Definition: supervised = (x, y) pairs; unsupervised = no label; self-supervised = self-created target. Task: human-provided labels? → supervised. No labels, only grouping/reduction? → unsupervised. Labels derived from the data (e.g. a masked word)? → self-supervised. Scenarios: spam classification (supervised), customer clustering (unsupervised), predicting a masked word (self-supervised).
One-line comparison — Supervised: "Learn from (question, answer) pairs." Unsupervised: "No answers; only group or reduce the data." Self-supervised: "Mask part of the data and predict the gap to learn representations." To classify a problem, check whether labels exist and, if so, whether they are human-provided or derived from the data.
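The checklist above can be written as a tiny helper; the two flags and the function name are purely illustrative, not part of the chapter:

```python
def paradigm(has_labels: bool, labels_from_data: bool = False) -> str:
    """Map the label-availability checklist to a paradigm name."""
    if has_labels and labels_from_data:
        return "self-supervised"  # targets derived from the data itself
    if has_labels:
        return "supervised"       # human-provided labels
    return "unsupervised"         # no labels: grouping / reduction
```

For the three scenarios: spam classification is `paradigm(True)`, customer clustering is `paradigm(False)`, and masked-word prediction is `paradigm(True, labels_from_data=True)`.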