Ch.03

K-Nearest Neighbors (KNN): Birds of a Feather

Birds of a feather flock together — KNN finds the K nearest stored examples and uses their labels (majority vote) to predict the new one. No fancy training; just distance and neighbors.

[Diagram] Training data — points in feature space (axes x₁, x₂) with labels 1 or 2. Dashed circles mark distance order from the new point; the K=3 nearest neighbors (purple) have labels 1, 2, 2 → majority label 2.

What is KNN? — For a new data point, we pick the K closest points among labeled data and assign the majority label. Example: if 4 of the 5 nearest emails are spam, the new email is classified as spam.
'Closest' means distance in feature space — Usually Euclidean distance: $d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i}(x_i - y_i)^2}$. With two features, this is the straight-line distance on the plane.
K is a hyperparameter — K=1 uses only the single nearest neighbor; larger K smooths the decision but can blur boundaries. Odd K is often used to avoid ties.
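The idea above fits in a few lines. Here is a minimal from-scratch sketch (the toy points and query are made up for illustration): compute the Euclidean distance from the query to every stored example, take the K nearest, and return the majority label.

```python
import math
from collections import Counter

def euclidean(x, y):
    # straight-line distance between two feature vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_predict(train_points, train_labels, query, k=3):
    # indices of stored examples, sorted by distance to the query
    order = sorted(range(len(train_points)),
                   key=lambda i: euclidean(train_points[i], query))
    # majority vote among the K nearest labels
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# toy data: two clusters, labels 1 and 2
points = [(1, 1), (1, 2), (2, 1), (5, 5), (6, 5), (5, 6)]
labels = [1, 1, 1, 2, 2, 2]
print(knn_predict(points, labels, (5.5, 5.0), k=3))  # → 2
```

With K=1 this reduces to copying the single nearest neighbor's label; using an odd K, as here, avoids vote ties in two-class problems.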

Why it matters

No explicit training (lazy learning) — KNN does not learn a compact model; at prediction time it computes distances to all stored points. Training cost is low; prediction cost can be high.
Interpretable — We can explain a prediction by showing the K neighbors (e.g. "spam because 4 of 5 similar emails were spam"), which supports explainable AI.
Useful as a baseline — Before trying complex models, KNN gives a quick sense of how well the data can be classified.

How it is used

Classification — Majority vote among the K neighbors' labels. Used in image classification, spam detection, risk bands, etc.
Regression — Predict the average of the K neighbors' target values (e.g. house price from nearby sales).
Distance and scale — If features are on very different scales, the distance is dominated by the largest-scale feature. Normalize or standardize the features before computing distances.
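The last two points can be combined in one sketch: KNN regression (averaging the K nearest target values) over standardized features. The house data below is invented for illustration; without standardization, the size-in-m² feature would swamp the room count in every distance.

```python
import math
import statistics

def standardize_columns(points):
    # scale each feature to zero mean, unit variance so no feature dominates
    cols = list(zip(*points))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    scaled = [tuple((v - m) / s for v, m, s in zip(p, means, stds))
              for p in points]
    return scaled, means, stds

def knn_regress(train_points, targets, query, k=3):
    # prediction = average target value among the K nearest neighbors
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    return sum(targets[i] for i in order[:k]) / k

# toy data: (size in m², rooms) → price in thousands
homes  = [(50, 2), (60, 2), (80, 3), (120, 4), (130, 4)]
prices = [200, 230, 300, 450, 470]

scaled, means, stds = standardize_columns(homes)
# the query must be standardized with the SAME means/stds as the training data
query = tuple((v - m) / s for v, m, s in zip((75, 3), means, stds))
print(knn_regress(scaled, prices, query, k=3))  # average of the 3 nearest prices
```

Standardizing the query with the training-set statistics (rather than its own) is the detail that most often trips people up in practice.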