Ch.03
K-Nearest Neighbors (KNN): Birds of a Feather
Birds of a feather flock together — KNN finds the K nearest stored examples and uses their labels (majority vote) to predict the label of the new one. No fancy training; just distance and neighbors.
[Diagram: training data as points in feature space (labels 1 or 2); dashed circles mark distance order from the new point. With K=3, the nearest neighbors (purple) have labels 1, 2, 2 → majority 2.]
What is KNN? — For a new data point, we pick the K closest points among labeled data and assign the majority label. Example: if 4 of the 5 nearest emails are spam, the new email is classified as spam.
'Closest' means distance in feature space — Usually Euclidean distance: d(a, b) = √((a₁ − b₁)² + (a₂ − b₂)² + … + (aₙ − bₙ)²). With two features, this is the straight-line distance on the plane.
K is a hyperparameter — K=1 uses only the single nearest neighbor; larger K smooths the decision but can blur boundaries. Odd K is often used to avoid ties.
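To make the vote concrete, here is a minimal from-scratch sketch of a KNN classifier. It is a toy illustration, not a library implementation; the points and labels mirror the diagram above and the function names are our own:

```python
import math
from collections import Counter

def euclidean(a, b):
    # Straight-line distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_points, train_labels, query, k=3):
    # Sort stored examples by distance to the query point.
    neighbors = sorted(zip(train_points, train_labels),
                       key=lambda pl: euclidean(pl[0], query))
    # Take the labels of the K nearest and vote.
    k_labels = [label for _, label in neighbors[:k]]
    return Counter(k_labels).most_common(1)[0][0]

# Toy data matching the diagram: labels 1 or 2 in 2-D feature space.
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0), (3.5, 5.0)]
labels = [1, 1, 2, 2, 2]
print(knn_predict(points, labels, query=(3.0, 3.5), k=3))  # → 2 (neighbor labels 2, 2, 1)
```

Note that all the work happens at prediction time: every stored point is scanned, which is exactly the lazy-learning trade-off described below.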
Why it matters
No explicit training (lazy learning) — KNN does not learn a compact model; at prediction time it computes distances to all stored points. Training cost is low; prediction cost can be high.
Interpretable — We can explain a prediction by showing the K neighbors (e.g. "spam because 4 of 5 similar emails were spam"), which supports explainable AI.
Useful as a baseline — Before trying complex models, KNN gives a quick sense of how well the data can be classified.
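For the baseline use in practice, a minimal sketch with scikit-learn, assuming it is installed; the Iris dataset and K=5 are illustrative choices, not prescribed by the text:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# A 5-nearest-neighbors baseline: no tuning, just distances and votes.
baseline = KNeighborsClassifier(n_neighbors=5)
scores = cross_val_score(baseline, X, y, cv=5)
print(f"baseline accuracy: {scores.mean():.3f}")
```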
How it is used
Classification — Majority vote among the K neighbors' labels. Used in image classification, spam detection, risk bands, etc.
Regression — Predict the average of the K neighbors' target values (e.g. house price from nearby sales; see the sketch after this list).
Distance and scale — If features are on very different scales, the distance is dominated by the largest-scale feature. Normalize or standardize features before computing distances, as shown below.
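The following sketch ties the last two points together: regression by neighbor averaging, with standardization applied first so no feature dominates the distance. It assumes scikit-learn, and the toy housing numbers are invented for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy housing data: area in square meters, age in years (very different scales).
X = np.array([[50, 30], [60, 5], [80, 20], [100, 2], [120, 15]], dtype=float)
y = np.array([200, 320, 380, 550, 600])  # prices in thousands

# Standardize first so neither feature dominates the distance,
# then predict the average target of the K nearest sales.
model = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=3))
model.fit(X, y)
print(model.predict([[90, 10]]))  # average price of the 3 most similar homes
```

Without the StandardScaler step, the area column (tens to hundreds) would swamp the age column in every distance computation, so the "nearest" sales would effectively be chosen by area alone.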