Ch.08

K-Means Clustering: Grouping Without Labels

K-Means is a classic unsupervised learning algorithm that groups data into K clusters using distance—no labels. You will see how the 'unsupervised' idea from Ch01 works in practice: concept → intuition → math → application. It reuses the distance formula from Ch02 (KNN) and shows how repeating 'assign to nearest center' and 'update centers' yields clear clusters.


Assign each point to the nearest center, then move centers to the mean of assigned points; repeat.

① Data — unlabeled points in feature space


What is K-Means? — With no labels $y$, only data $\mathbf{x}_1, \mathbf{x}_2, \ldots$, K-Means partitions points into K groups by nearest centroid. Distance is Euclidean, $d(\mathbf{x}, \boldsymbol{\mu}) = \sqrt{\sum_j (x_j - \mu_j)^2}$ (as in Ch02). Each group has one centroid $\boldsymbol{\mu}_k$. The algorithm alternates: assign each point to the nearest center → set each center to the mean of its assigned points, until convergence.
K is the number of clusters — The user chooses K (e.g. K=2 → two groups). There are no 'correct' labels, only a partition. In practice, K is chosen by domain knowledge, the elbow method, or silhouette scores.
Objective: minimize SSE (distortion) — K-Means minimizes $J = \sum_{k=1}^K \sum_{i \in C_k} \lVert\mathbf{x}_i - \boldsymbol{\mu}_k\rVert^2$. The update $\boldsymbol{\mu}_k = \frac{1}{|C_k|}\sum_{i \in C_k} \mathbf{x}_i$ (mean of assigned points) reduces each cluster's SSE.
If the formulas feel heavy — The distance formula is just 'length between a point and a center.' SSE $J$ is a single number for 'how tightly points sit around their center'; the algorithm moves centers to make $J$ smaller. The centroid update is literally 'average of the coordinates of points in that cluster.' The Formula guide below spells out each symbol step by step.
Ch01 unsupervised learning in action — K-Means is the go-to when you have no labels and want structure (e.g. customer segmentation, clustering documents or images, preprocessing for anomaly detection).
Customer segmentation — With only purchase history and no segment labels, K-Means groups similar customers; people then attach meaning (e.g. VIP, churn risk) to each cluster and use it for downstream tasks (Ch09, Ch12).
Simple and interpretable — Assign (nearest center) and update (mean) are easy to implement and visualize in 2D.
Clustering — Customer segmentation, topic/document grouping, image color compression, gene expression groups.
Preprocessing — Use cluster index as a new feature for supervised models, or keep only centroids to reduce data size.
Choosing K — The user sets K; compare SSE or silhouette across K to pick a value (e.g. elbow).
Summary
(1) Input: Unlabeled points and cluster count $K$.
(2) Initialize: Place $K$ centroids at random or by heuristic.
(3) Assign: Each point goes to the nearest center's cluster.
(4) Update: Set each center to the mean of its assigned points.
(5) Repeat: Steps 3–4 until assignments and centers no longer change.
Objective: Minimize SSE (distortion) $J = \sum_{k}\sum_{i \in C_k} \lVert\mathbf{x}_i - \boldsymbol{\mu}_k\rVert^2$.
Centroid update: $\boldsymbol{\mu}_k = \frac{1}{|C_k|}\sum_{i \in C_k} \mathbf{x}_i$.
See the table and examples below.
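Steps (1)–(5) can be sketched in plain Python. This is a minimal 2D version under stated assumptions: the toy dataset is made up, and initialization uses the first $K$ points as centers (one simple deterministic heuristic; random initialization is also common):

```python
def sq_dist(p, c):
    # Squared Euclidean distance (no square root needed for comparisons)
    return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2

def kmeans(points, k, max_iter=100):
    # (2) Initialize: here, the first k points serve as initial centers
    centers = [list(points[i]) for i in range(k)]
    assign = None
    for _ in range(max_iter):
        # (3) Assign: each point goes to the nearest center's cluster
        new_assign = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_assign == assign:  # (5) Stop once assignments no longer change
            break
        assign = new_assign
        # (4) Update: each center moves to the mean of its assigned points
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old center if a cluster goes empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    # Final SSE (distortion) J for the resulting partition
    sse = sum(sq_dist(p, centers[a]) for p, a in zip(points, assign))
    return centers, assign, sse

pts = [(1, 2), (2, 1), (1, 1), (8, 9), (9, 8), (8, 8)]
centers, assign, sse = kmeans(pts, k=2)
print(assign)   # [0, 0, 0, 1, 1, 1]
print(centers)  # two centers, one inside each blob
```

On this tiny dataset the loop converges in a few iterations; the two blobs around (1, 1) and (8, 8) each get their own center.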
Terminology
  • Distance squared — For two points $(x_1,y_1)$, $(x_2,y_2)$: $(x_2-x_1)^2+(y_2-y_1)^2$. No need for the square root when comparing.
  • Assign — For a point and $K$ centers, compute the distance (or distance²) to each; the index (1-based) of the closest center is that point's cluster.
  • Center update — New center = (mean of $x$, mean of $y$) of the points in that cluster; round if needed.
  • SSE — Within one cluster $k$: $\sum_{i \in C_k} \lVert\mathbf{x}_i - \boldsymbol{\mu}_k\rVert^2$ (sum of squared distances to the center; that cluster's contribution to $J$).

Example (assign)
Centers $\boldsymbol{\mu}_1=(0,0)$, $\boldsymbol{\mu}_2=(4,0)$; point $(2,0)$. Distance²: $d_1^2=4$, $d_2^2=4$; tie → cluster 1 (ties break toward the lower index). Answer: 1
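Checked in a few lines of Python (a sketch of just the assign step; Python's `min` returns the first index on a tie, which reproduces the tie → cluster 1 behavior):

```python
centers = [(0, 0), (4, 0)]
point = (2, 0)

# Squared distance from the point to each center (square root not needed for comparison)
d2 = [(point[0] - cx) ** 2 + (point[1] - cy) ** 2 for cx, cy in centers]
print(d2)  # [4, 4] (a tie)

# min over indices returns the lowest index on a tie; +1 makes it 1-based
cluster = min(range(len(centers)), key=lambda j: d2[j]) + 1
print(cluster)  # 1
```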

Example (center update)
Points $(1,2)$, $(3,4)$ in cluster 1 → new center $\bar{x}=2$, $\bar{y}=3$. Answer: (2, 3)
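The same update in code (a quick sketch; `zip(*members)` groups the x-coordinates and the y-coordinates so each can be averaged):

```python
members = [(1, 2), (3, 4)]  # points currently assigned to cluster 1

# New center = coordinate-wise mean of the member points
center = tuple(sum(coord) / len(members) for coord in zip(*members))
print(center)  # (2.0, 3.0)
```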
Formula guide
① Distance $d(\mathbf{x}, \boldsymbol{\mu}) = \sqrt{\sum_j (x_j - \mu_j)^2}$
What it means — This formula measures the straight-line distance (ruler length) between one data point $\mathbf{x}$ and one center $\boldsymbol{\mu}$.
Reading the symbols — $x_j$ is the point's $j$-th feature value, $\mu_j$ is the center's $j$-th coordinate. For example, with two features (e.g. height, weight), $x_1$ and $x_2$ are those values; you take $(x_1-\mu_1)^2 + (x_2-\mu_2)^2$, then the square root, which in 2D is the same as the Pythagorean "straight line" length.
Numeric example — Point $(1,2)$, center $(4,6)$. Differences: $4-1=3$, $6-2=4$. Squared distance $= 3^2+4^2 = 25$, distance $= \sqrt{25}=5$.
In K-Means — We only need to compare which center is closer, so we can skip the square root and use squared distance $(x_1-\mu_1)^2+(x_2-\mu_2)^2$; the smaller value is the closer center.
One line: Formula for (squared) straight-line distance between a point and a center.
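The 3-4-5 example above, verified in Python (squared distance is enough for comparisons; take the square root only when the true length is needed):

```python
import math

point, center = (1, 2), (4, 6)

# Squared distance: sum of squared coordinate differences
sq = sum((x - m) ** 2 for x, m in zip(point, center))
print(sq)             # 25
print(math.sqrt(sq))  # 5.0
```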

② Objective (SSE) $J = \sum_{k}\sum_{i \in C_k} \lVert\mathbf{x}_i - \boldsymbol{\mu}_k\rVert^2$
What it means — The sum over all clusters of "squared distance from each point to its cluster center." This total is called SSE (or distortion).
Reading the symbols — $k$ = cluster index (1, 2, …). $C_k$ = set of points in cluster $k$. $\mathbf{x}_i$ = coordinates of the $i$-th point. $\boldsymbol{\mu}_k$ = center of cluster $k$. $\lVert \mathbf{x}_i - \boldsymbol{\mu}_k \rVert^2$ is the squared distance from point $i$ to that center.
Why squared? — Squaring makes "far" points count more and keeps everything positive, so $J$ clearly measures how spread out the clusters are. Smaller $J$ means points sit tighter around their centers; the algorithm moves centers to reduce $J$.
Example — If cluster 1 has three points whose squared distances to its center are 1, 4, and 9, that cluster's contribution is $1+4+9=14$. $J$ is the sum of such contributions over all clusters.
One line: A number that says how tight the clusters are; smaller is better.
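The contribution arithmetic as a sketch (the three points and the center here are made up so that the squared distances come out to 1, 4, and 9, matching the example):

```python
center = (0, 0)                      # hypothetical cluster center
cluster = [(1, 0), (0, 2), (3, 0)]   # squared distances to the center: 1, 4, 9

# This cluster's contribution to J: sum of squared distances to its center
contribution = sum((x - center[0]) ** 2 + (y - center[1]) ** 2 for x, y in cluster)
print(contribution)  # 14
```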

③ Centroid update $\boldsymbol{\mu}_k = \frac{1}{|C_k|}\sum_{i \in C_k} \mathbf{x}_i$
What it means — Take the average of the coordinates of all points in cluster $k$ and use that as the new center.
Reading the symbols — $|C_k|$ = number of points in cluster $k$. $\sum_{i \in C_k} \mathbf{x}_i$ = sum of the $x$ and $y$ coordinates of those points. Multiplying by $\frac{1}{|C_k|}$ gives the mean.
Numeric example — Cluster 1 has points $(1,2)$, $(3,4)$, $(5,0)$, so $|C_k|=3$. Sum of $x$: $1+3+5=9$; sum of $y$: $2+4+0=6$. New center $\bar{x}=9/3=3$, $\bar{y}=6/3=2$ → (3, 2).
Why the mean? — Moving the center to the mean of its points reduces the sum of squared distances (SSE) for that cluster; that's why K-Means uses this update.
One line: New center = (mean $x$, mean $y$) of the points in that cluster.
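The "why the mean?" claim can be spot-checked numerically: for the three points in the numeric example, no candidate center on a small grid beats the mean (3, 2) on SSE (a brute-force sketch, not a proof):

```python
points = [(1, 2), (3, 4), (5, 0)]

def sse(center):
    # Sum of squared distances from each point to the candidate center
    return sum((x - center[0]) ** 2 + (y - center[1]) ** 2 for x, y in points)

mean = tuple(sum(coord) / len(points) for coord in zip(*points))
print(mean, sse(mean))  # (3.0, 2.0) 16.0

# Every center on a 0.5-spaced grid does at least as badly as the mean
grid = [(x / 2, y / 2) for x in range(13) for y in range(9)]
best = min(grid, key=sse)
print(best)  # (3.0, 2.0)
```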