Everyone's AI
Ch.08

K-Means Clustering: Grouping Without Labels

K-Means is a classic unsupervised learning algorithm that groups data into K clusters using distance—no labels. You will see how the 'unsupervised' idea from Ch01 works in practice: concept → intuition → math → application. It reuses the distance formula from Ch02 (KNN) and shows how repeating 'assign to nearest center' and 'update centers' yields clear clusters.

Diagram: assign each point to the nearest center, then move centers to the mean of assigned points; repeat. (① Data: unlabeled points in feature space.)

What is K-Means? — With no labels $y$, only data $\mathbf{x}_1, \mathbf{x}_2, \ldots$, K-Means partitions points into K groups by nearest centroid. Distance is Euclidean, $d(\mathbf{x}, \boldsymbol{\mu}) = \sqrt{\sum_j (x_j - \mu_j)^2}$ (as in Ch02). Each group has one centroid $\boldsymbol{\mu}_k$. The algorithm alternates: assign each point to the nearest center → set each center to the mean of its assigned points, until convergence.
K is the number of clusters — The user chooses K (e.g. K=2 → two groups). There are no 'correct' labels, only a partition. In practice, K is chosen by domain knowledge, the elbow method, or silhouette scores.
Objective: minimize SSE (distortion) — K-Means minimizes $J = \sum_{k=1}^K \sum_{i \in C_k} \lVert\mathbf{x}_i - \boldsymbol{\mu}_k\rVert^2$. The update $\boldsymbol{\mu}_k = \frac{1}{|C_k|}\sum_{i \in C_k} \mathbf{x}_i$ (mean of assigned points) reduces each cluster's SSE.
If the formulas feel heavy — The distance formula is just 'length between a point and a center.' SSE $J$ is a single number for 'how tightly points sit around their center'; the algorithm moves centers to make $J$ smaller. The centroid update is literally 'average of the coordinates of points in that cluster.' The Formula guide below spells out each symbol step by step.
Ch01 unsupervised learning in action — K-Means is the go-to when you have no labels and want structure (e.g. customer segmentation, clustering documents or images, preprocessing for anomaly detection).
Customer segmentation — With only purchase history and no segment labels, K-Means groups similar customers; people then attach meaning (e.g. VIP, churn risk) to each cluster and use it for downstream tasks (Ch09, Ch12).
Simple and interpretable — Assign (nearest center) and update (mean) are easy to implement and visualize in 2D.
Clustering — Customer segmentation, topic/document grouping, image color compression, gene expression groups.
Preprocessing — Use cluster index as a new feature for supervised models, or keep only centroids to reduce data size.
Choosing K — The user sets K; compare SSE or silhouette across K to pick a value (e.g. elbow).
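The elbow idea above can be sketched in plain Python: run a basic K-Means for several candidate K values and compare the resulting SSE. This is an illustrative toy, not a library implementation; the data, the first-K-points initialization, and the function name `kmeans_sse` are all made up for the example (real libraries use random or k-means++ initialization).

```python
# Sketch: pick K by comparing SSE across candidate values (elbow method).
def kmeans_sse(points, k, iters=20):
    """Run a basic K-Means (first-k-points init) and return the final SSE."""
    centers = [points[i] for i in range(k)]
    for _ in range(iters):
        # Assign: index of the nearest center (squared distance suffices).
        clusters = [[] for _ in range(k)]
        for p in points:
            d2 = [(p[0] - c[0])**2 + (p[1] - c[1])**2 for c in centers]
            clusters[d2.index(min(d2))].append(p)
        # Update: move each center to the mean of its assigned points;
        # keep the old center if a cluster goes empty.
        centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    # SSE: total squared distance from each point to its nearest center.
    return sum(min((p[0] - c[0])**2 + (p[1] - c[1])**2 for c in centers)
               for p in points)

points = [(0, 0), (0, 1), (1, 0), (8, 8), (8, 9), (9, 8)]
for k in (1, 2, 3):
    print(k, round(kmeans_sse(points, k), 2))
```

On this toy data the SSE drops sharply from K=1 to K=2 and barely changes afterward, so the "elbow" is at K=2.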
Summary
(1) Input: Unlabeled points and cluster count $K$.
(2) Initialize: Place $K$ centroids at random or by heuristic.
(3) Assign: Each point goes to the nearest center's cluster.
(4) Update: Set each center to the mean of its assigned points.
(5) Repeat: Steps 3–4 until assignment and centers no longer change.
Objective: Minimize SSE (distortion) $J = \sum_{k}\sum_{i \in C_k} \lVert\mathbf{x}_i - \boldsymbol{\mu}_k\rVert^2$.
Centroid update: $\boldsymbol{\mu}_k = \frac{1}{|C_k|}\sum_{i \in C_k} \mathbf{x}_i$.
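Steps (1)–(5) can be sketched as a short plain-Python function on 2D points. The initialization (first K points, for reproducibility), the toy data, and the name `kmeans` are illustrative assumptions, not a reference implementation.

```python
# Minimal K-Means following steps (1)-(5): init, assign, update, repeat.
def kmeans(points, k, max_iters=100):
    centers = [points[i] for i in range(k)]          # (2) initialize
    labels = [0] * len(points)
    for _ in range(max_iters):
        # (3) assign each point to the nearest center (squared distance)
        new_labels = []
        for p in points:
            d2 = [(p[0] - c[0])**2 + (p[1] - c[1])**2 for c in centers]
            new_labels.append(d2.index(min(d2)))
        # (5) stop when assignments no longer change
        if new_labels == labels:
            break
        labels = new_labels
        # (4) move each center to the mean of its assigned points
        for j in range(k):
            cl = [p for p, lab in zip(points, labels) if lab == j]
            if cl:  # keep the old center if a cluster goes empty
                centers[j] = (sum(p[0] for p in cl) / len(cl),
                              sum(p[1] for p in cl) / len(cl))
    return labels, centers

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centers = kmeans(points, 2)
print(labels)    # two groups: first three points vs last three
print(centers)
```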
See the table and examples below.
Terminology
  • Distance squared: For two points $(x_1,y_1)$, $(x_2,y_2)$: $(x_2-x_1)^2+(y_2-y_1)^2$. No need for the square root when comparing.
  • Assign: For a point and $K$ centers, compute distance (or distance²) to each; the center with the smallest value (1-based index) is that point's cluster.
  • Center update: New center = (mean of $x$, mean of $y$) of points in that cluster; round if needed.
  • SSE: In one cluster: $J = \sum_{i \in C_k} \lVert\mathbf{x}_i - \boldsymbol{\mu}_k\rVert^2$ (sum of squared distances to center).

Example (assign)
Centers $\mu_1=(0,0)$, $\mu_2=(4,0)$; point $(2,0)$. Distance²: $d_1^2=4$, $d_2^2=4$; tie → cluster 1. Answer: 1
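This assign step is a one-liner in Python; ties go to the lower index, matching the example:

```python
# Assign step: squared distance from the point to each center; the first
# minimum wins on ties (cluster 1 here, shown 1-based as in the text).
centers = [(0, 0), (4, 0)]
point = (2, 0)
d2 = [(point[0] - cx)**2 + (point[1] - cy)**2 for cx, cy in centers]
cluster = d2.index(min(d2)) + 1   # list.index returns the first minimum
print(d2, cluster)                # [4, 4] 1
```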

Example (center update)
Points $(1,2)$, $(3,4)$ in cluster 1 → new center $\bar{x}=2$, $\bar{y}=3$. Answer: (2, 3)
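The same center update, computed directly as the per-coordinate mean:

```python
# Center update: new center = (mean x, mean y) of the cluster's points.
cluster_points = [(1, 2), (3, 4)]
n = len(cluster_points)
new_center = (sum(x for x, _ in cluster_points) / n,
              sum(y for _, y in cluster_points) / n)
print(new_center)   # (2.0, 3.0)
```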
Formula guide
① Distance $d(\mathbf{x}, \boldsymbol{\mu}) = \sqrt{\sum_j (x_j - \mu_j)^2}$
What it means — This formula measures the straight-line distance (ruler length) between one data point $\mathbf{x}$ and one center $\boldsymbol{\mu}$.
Reading the symbols — $x_j$ is the point's $j$-th feature value, $\mu_j$ is the center's $j$-th coordinate. For example, with two features (e.g. height, weight), $x_1$ and $x_2$ are those values; you take $(x_1-\mu_1)^2 + (x_2-\mu_2)^2$, then the square root, which in 2D is the same as the Pythagorean "straight line" length.
Numeric example — Point $(1,2)$, center $(4,6)$. Differences: $4-1=3$, $6-2=4$. Squared distance $= 3^2+4^2 = 25$, distance $= \sqrt{25} = 5$.
In K-Means — We only need to compare which center is closer, so we can skip the square root and use squared distance $(x_1-\mu_1)^2+(x_2-\mu_2)^2$; the smaller value is the closer center.
One line: Formula for (squared) straight-line distance between a point and a center.
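The numeric example above, traced in Python:

```python
# Distance formula ①: point (1, 2), center (4, 6).
import math

x, mu = (1, 2), (4, 6)
sq = sum((xj - mj)**2 for xj, mj in zip(x, mu))   # squared distance
d = math.sqrt(sq)
print(sq, d)   # 25 5.0
```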

② Objective (SSE) $J = \sum_{k}\sum_{i \in C_k} \lVert\mathbf{x}_i - \boldsymbol{\mu}_k\rVert^2$
What it means — The sum over all clusters of "squared distance from each point to its cluster center." This total is called SSE (or distortion).
Reading the symbols — $k$ = cluster index (1, 2, …). $C_k$ = set of points in cluster $k$. $\mathbf{x}_i$ = coordinates of the $i$-th point. $\boldsymbol{\mu}_k$ = center of cluster $k$. $\lVert \mathbf{x}_i - \boldsymbol{\mu}_k \rVert^2$ is the squared distance from point $i$ to that center.
Why squared? — Squaring makes "far" points count more and keeps everything positive, so $J$ clearly measures how spread out the clusters are. Smaller $J$ means points are tighter around their centers; the algorithm moves centers to reduce $J$.
Example — If cluster 1 has three points whose squared distances to its center are 1, 4, and 9, that cluster's contribution is $1+4+9=14$. $J$ is the sum of such contributions over all clusters.
One line: A number that says how tight the clusters are; smaller is better.
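SSE can be computed directly from the definition. The clusters below are made-up illustration data; the first cluster reproduces the 1 + 4 + 9 example above.

```python
# Objective ②: sum squared point-to-center distances over all clusters.
clusters = {
    (0.0, 0.0): [(1, 0), (0, 2), (-3, 0)],   # squared distances 1, 4, 9
    (5.0, 5.0): [(5, 6), (6, 5)],            # squared distances 1, 1
}
J = 0.0
for (cx, cy), pts in clusters.items():
    J += sum((px - cx)**2 + (py - cy)**2 for px, py in pts)
print(J)   # 16.0  (= 14 from cluster 1 + 2 from cluster 2)
```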

③ Centroid update $\boldsymbol{\mu}_k = \frac{1}{|C_k|}\sum_{i \in C_k} \mathbf{x}_i$
What it means — Take the average of the coordinates of all points in cluster $k$ and use that as the new center.
Reading the symbols — $|C_k|$ = number of points in cluster $k$. $\sum_{i \in C_k} \mathbf{x}_i$ = sum of the $x$ and $y$ coordinates of those points. Multiplying by $\frac{1}{|C_k|}$ gives the mean.
Numeric example — Cluster 1 has points $(1,2)$, $(3,4)$, $(5,0)$, so $|C_k|=3$. Sum of $x$: $1+3+5=9$, sum of $y$: $2+4+0=6$. New center $\bar{x}=9/3=3$, $\bar{y}=6/3=2$ → (3, 2).
Why the mean? — Moving the center to the mean of its points reduces the sum of squared distances (SSE) for that cluster; that’s why K-Means uses this update.
One line: New center = (mean $x$, mean $y$) of the points in that cluster.
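The numeric example from ③, traced in Python:

```python
# Centroid update ③: average each coordinate over the cluster's points.
pts = [(1, 2), (3, 4), (5, 0)]
# zip(*pts) groups the x-coordinates and y-coordinates together.
mu = tuple(sum(coords) / len(pts) for coords in zip(*pts))
print(mu)   # (3.0, 2.0)
```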