Ch.01
Supervised, Unsupervised, and Self-Supervised Learning
Machine learning is often divided into supervised, unsupervised, and self-supervised learning depending on how data is used. Supervised learning is like studying with an answer key; unsupervised learning is like finding patterns and grouping similar items without labels; self-supervised learning is like masking part of the data and learning by predicting the missing part. This chapter summarizes the core ideas, math, and real-world use of these three paradigms so you can build a solid base for the algorithms covered later.
[Figure: three learning paradigms. Supervised: input–label pairs (x₁,y₁), (x₂,y₂), (x₃,y₃) are given in order and the model learns the rule. Unsupervised: only inputs x₁…x₆, no label y; the model still finds structure and clusters. Self-supervised: ① mask part of the data, ② predict the gap, ③ fill it in (fill-in-the-blank representation learning, as in BERT).]
Three Ways of Learning: Supervised, Unsupervised, Self-Supervised
Supervised Learning: Learning from input–label pairs
The model is given input x and the corresponding label (target) y as pairs. The goal is to approximate a function f with y ≈ f(x). Formally, we have a training set {(x₁,y₁), …, (xₙ,yₙ)} and find f by minimizing a loss L(f(x), y) (e.g. MSE, cross-entropy). Ch02 KNN, Ch03 Linear Regression, and Ch04 Logistic Regression are all supervised.
* Example 1 (classification): Spam filter—email content (x) → spam or not (y).
* Example 2 (regression): House price—area, location (x) → price (y).
* Example 3 (medical): Patient test values (x) and diagnosis (y) for decision support.
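As a concrete sketch of the loss-minimization step, the least-squares fit below finds the line that minimizes MSE on a tiny house-price dataset (all numbers are invented for illustration):

```python
import numpy as np

# Toy supervised dataset: x = house area (m^2), y = price.
# Here y happens to equal 3 * area, so the fit is exact.
X = np.array([[30.0], [45.0], [60.0], [80.0], [100.0]])
y = np.array([90.0, 135.0, 180.0, 240.0, 300.0])

# Add a bias column and solve the least-squares problem:
# w = argmin_w ||Xb @ w - y||^2, i.e. the MSE-minimizing linear fit.
Xb = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

slope, intercept = w
pred = slope * 70.0 + intercept  # predict the price of a 70 m^2 house
```

With exact linear data the solver recovers slope 3 and intercept 0, so the 70 m² prediction is 210; on real data the same call returns the best fit in the MSE sense.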
Unsupervised Learning: Discovering hidden structure
Only the input x is given; there is no label y. Think of it as "only questions, no answer key." The goal is to find structure, patterns, or clusters using distance and similarity between the xᵢ: group similar points (clustering), compress to fewer dimensions (dimensionality reduction), or flag anomalies that fall outside the normal pattern.
* Example 1 (clustering): Customer age and purchase history (x) → segment similar customers.
* Example 2 (anomaly detection): Learn normal payment patterns (x), then flag unusual transactions.
* Example 3 (dimension reduction): Reduce many features to 2–3 numbers for visualization or denoising. (You’ll learn concrete methods later.)
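The clustering idea can be sketched with a minimal K-Means loop in plain NumPy (a toy stand-in for the Ch08 algorithm; the customer numbers below are invented):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal K-Means: alternate nearest-centroid assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]  # random initial centroids
    for _ in range(iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points
        # (keep the old centroid if a cluster ends up empty).
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

# Hypothetical customer data: (age, purchase count), with two obvious groups.
X = np.array([[25, 5], [27, 6], [24, 4],
              [61, 41], [59, 39], [62, 40]], dtype=float)
labels, centers = kmeans(X, k=2)
```

No label y appears anywhere: the two segments emerge purely from distances between the inputs.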
Self-Supervised Learning: Creating targets from data
Instead of human labels, the model creates pseudo-labels from the data. Typical flow:
(1) Mask part of the input (e.g. a word, an image patch).
(2) Predict the masked part from the rest.
(3) Use the learned representation for downstream tasks with a small amount of supervised data. This is how BERT, GPT, and many vision models are pre-trained on large unlabeled corpora.
* Example 1 (language): "I ate [MASK]" → predict the masked word from context (LLMs).
* Example 2 (vision): Mask a region of an image and reconstruct it from the rest.
* Example 3 (contrastive): Treat two augmented views of the same image as "same" and different images as "different" to learn representations.
Data nature and cost — Building labels for all data is expensive. When labels are sufficient, supervised is effective; when they are scarce, unsupervised or self-supervised use unlabeled data, then a small supervised fine-tuning step. Interpretability also differs: supervised allows some explanation via loss and decision path; unsupervised/self-supervised require separate interpretation (e.g. cluster names, visualization).
Pre-training and fine-tuning — Modern pipelines often use self-supervised pre-training on large unlabeled data, then supervised fine-tuning on a small labeled set. Unsupervised is common in preprocessing and exploration—e.g. cluster customers with K-Means, assign human meanings to clusters (e.g. "loyal", "churn risk"), then build a supervised churn model. Choosing the right paradigm makes the pipeline clear and realistic given data size and label cost.
Supervised — Ch02 KNN, Ch03 Linear Regression, Ch04 Logistic Regression learn from (input, label) pairs. Classification: spam filter, disease prediction, image classification. Regression: house price, sales, temperature—Ch03/Ch04 cover the math and optimization.
Unsupervised — Ch08 K-Means clusters data without labels; dimension reduction (reducing many features to 2–3 numbers) is another key tool. Clustering: customer segmentation, topic grouping. Anomaly detection: learn a "normal" region, flag points outside it.
Self-supervised — BERT (masked word prediction), GPT (next-token prediction), and contrastive learning in vision are widely used. After pre-training, a small amount of labeled data is used for QA, summarization, or classification.
Summary —
(1) Supervised: learn from (x, y) pairs.
(2) Unsupervised: find structure/clusters from x only.
(3) Self-supervised: learn from pseudo-labels (e.g. masked tokens), then use small supervised data for downstream tasks.
| | Supervised | Unsupervised | Self-Supervised |
|---|---|---|---|
| Label | Yes (y) | No | Self-created target |
| Goal | Predict y (classification/regression) | Structure, clusters, dimensionality reduction | Representation learning |
| Examples | KNN, linear/logistic regression | K-Means, dimension reduction | BERT, contrastive learning |
By problem type — Definition: supervised = (x,y) pairs; unsupervised = no label; self-supervised = self-created target. Task: Human-provided labels? → Supervised. No labels, only grouping/reduction? → Unsupervised. Labels derived from data (e.g. masked word)? → Self-supervised. Scenarios: spam classification (supervised), customer clustering (unsupervised), predict masked word (self-supervised).
One-line comparison — Supervised: "Learn from (question, answer) pairs." Unsupervised: "No answers—only group or reduce the data." Self-supervised: "Mask part of the data and predict the gap to learn representations." In problems, check whether labels exist and whether they are human-provided or data-derived to choose the type.
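The label-checking rule above can be restated as a small helper function (a hypothetical utility, written only to make the decision checklist explicit):

```python
def choose_paradigm(has_labels: bool, labels_from_data: bool = False) -> str:
    """Pick a learning paradigm from two questions:
    do labels exist, and are they derived from the data itself?"""
    if has_labels and labels_from_data:
        return "self-supervised"   # e.g. masked-word prediction
    if has_labels:
        return "supervised"        # e.g. spam classification
    return "unsupervised"          # e.g. customer clustering
```

Applying it to the three scenarios in the text: human-labeled spam data is supervised, unlabeled customer data is unsupervised, and masked-word targets derived from raw text are self-supervised.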