Ch.02

K-Nearest Neighbors (KNN): Birds of a Feather Flock Together

Birds of a feather flock together — KNN finds the K nearest stored examples and uses their labels (majority vote) to predict the label of a new input. No fancy training; just distance and neighbors.


[Figure: training data as points in feature space (labels 1 or 2); dashed circles mark the distance order from the query point; the K=3 nearest neighbors (purple) have labels 1, 2, 2 → majority label 2]

K-Nearest Neighbors (KNN): Birds of a Feather

What is KNN? — For a new data point, we pick the K closest points among labeled data and assign the majority label. Example: if 4 of the 5 nearest emails are spam, the new email is classified as spam.
'Closest' means distance in feature space — Usually Euclidean distance: $d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_i (x_i - y_i)^2}$. With two features, this is the straight-line distance on the plane.
K is a hyperparameter — K=1 uses only the single nearest neighbor; larger K smooths the decision but can blur boundaries. Odd K is often used to avoid ties.
No explicit training (lazy learning) — KNN does not learn a compact model; at prediction time it computes distances to all stored points. Training cost is low; prediction cost can be high.
Interpretable — We can explain a prediction by showing the K neighbors (e.g. "spam because 4 of 5 similar emails were spam"), which supports explainable AI.
Useful as a baseline — Before trying complex models, KNN gives a quick sense of how well the data can be classified.
Classification — Majority vote among the K neighbors' labels. Used in image classification, spam detection, risk bands, etc.
Regression — Predict the average of the K neighbors' target values (e.g. house price from nearby sales).
Distance and scale — If features have different scales, distance is dominated by one feature. Normalization or standardization is recommended before computing distances.
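The scaling point above is easy to demonstrate. The sketch below uses hypothetical two-feature data (income and age are assumed example features, not from the chapter): with raw values, the large-scale income feature alone decides which stored point is nearest, while standardizing each feature first lets both features contribute.

```python
import numpy as np

# Hypothetical data: feature 0 is income, feature 1 is age.
# Raw Euclidean distance is dominated by the large-scale income feature.
X = np.array([[50100.0, 20.0],    # similar income, very different age
              [60000.0, 59.0]])   # different income, nearly the same age
query = np.array([50000.0, 60.0])

raw = np.linalg.norm(X - query, axis=1)

# Standardize every feature to zero mean / unit variance before measuring.
mu, sigma = X.mean(axis=0), X.std(axis=0)
scaled = np.linalg.norm((X - mu) / sigma - (query - mu) / sigma, axis=1)

print("raw nearest:", raw.argmin())       # → 0 (income decides alone)
print("scaled nearest:", scaled.argmin()) # → 1 (both features count)
```

After standardization the nearest neighbor flips from the similar-income point to the similar-age point, which is why normalization or standardization is recommended before computing distances.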
KNN works by selecting the K closest stored examples to a new input, then using majority vote of their labels for classification or the average of their values for regression. There is no separate training step—only distance computation—so it is intuitive, but normalization (scaling) is important so that no single feature dominates the distance.
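The whole classification procedure fits in a few lines. A minimal sketch (the function name and toy points are illustrative, chosen to mirror the chapter figure's two label groups):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to every stored point
    nearest = np.argsort(dists)[:k]              # indices of the k smallest
    votes = Counter(y_train[nearest])            # count the k neighbor labels
    return votes.most_common(1)[0][0]

# Toy data echoing the figure: labels 1 and 2 in a 2-D feature space.
X_train = np.array([[1.0, 1.0], [1.5, 1.8], [5.0, 5.0], [5.5, 4.8], [5.2, 5.5]])
y_train = np.array([1, 1, 2, 2, 2])

print(knn_predict(X_train, y_train, np.array([5.1, 5.0]), k=3))  # → 2
```

Note that all the work happens at prediction time: the "model" is just the stored training set, which is exactly the lazy-learning trade-off described above.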
| Step | Description |
|------|-------------|
| Input | New feature vector $\mathbf{x}$ |
| Stored | Labeled examples $(\mathbf{x}_i, y_i)$ |
| 1 | Compute distance $d(\mathbf{x}, \mathbf{x}_i)$ to each $\mathbf{x}_i$ |
| 2 | Select the K smallest distances |
| 3 (classification) | Predict $\hat{y}$ by majority vote of the K labels |
| 3 (regression) | Predict $\hat{y}$ as the average of the K $y_i$ values |
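For the regression row of the table, only the final step changes: average the neighbors' target values instead of voting. A minimal sketch using the house-price example from the text (the specific areas and prices are made-up illustration data):

```python
import numpy as np

def knn_regress(X_train, y_train, x, k=3):
    """Predict the average target value of the k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)  # step 1: distances
    nearest = np.argsort(dists)[:k]              # step 2: k smallest
    return y_train[nearest].mean()               # step 3: average, not vote

# Hypothetical house sales: (area in m^2, rooms) -> price (millions).
X_train = np.array([[60.0, 2], [65.0, 2], [70.0, 3], [120.0, 4]])
prices = np.array([300.0, 320.0, 340.0, 700.0])

print(knn_regress(X_train, prices, np.array([66.0, 2]), k=3))  # → 320.0
```

The three nearest sales are the 60, 65, and 70 m² houses, so the prediction is the mean of their prices; the distant 120 m² outlier is ignored.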