Ch.06

Decision Tree: Twenty Questions to the Answer

A decision tree works like the game of Twenty Questions: ask yes/no questions, follow branches, and reach a prediction at a leaf. It is easy to interpret (you can see exactly why it made each decision) and is the building block for random forests and other ensemble methods.


From the root, follow branches by answering yes/no to each question; the leaf gives the prediction.

(Diagram: a root question node splits into Yes(1)/No(0) branches, passes through two further question nodes, and ends at Leaf 0 and Leaf 1.)


Basic structure — Picture an upside-down tree. At the top is the root node (first question). From there you ask a condition (e.g. “Is feature $x_1 \le 3$?”); yes and no lead to internal nodes. When you can’t split further, you reach a leaf node and output the prediction (class or value).
Same as Twenty Questions — Just like guessing an animal by asking “Does it have four legs?” → “Is it a herbivore?” → “Tiger!”, the tree narrows down the answer step by step. Each question splits the data into two groups.
Good questions: reducing impurity — Impurity measures how mixed the classes are at a node. We want splits that make nodes purer. Two common formulas: Gini $G = 1 - \sum_i p_i^2$ and Entropy $H = -\sum_i p_i \log_2 p_i$. When one class has 100% ($p_i = 1$), both are 0 (pure). When classes are half-and-half, impurity is at its maximum.
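Both formulas can be computed straight from class counts; a minimal sketch in plain Python (function names are illustrative, not from any library):

```python
from math import log2

def gini(counts):
    """Gini impurity G = 1 - sum(p_i^2), from raw class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    """Entropy H = -sum(p_i * log2(p_i)); the 0 * log2(0) term is taken as 0."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

print(gini([5, 0]))     # pure node -> 0.0
print(gini([5, 5]))     # half-and-half -> 0.5 (maximum for two classes)
print(entropy([5, 5]))  # half-and-half -> 1.0 bit
```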
Information gain — Information gain = impurity before the split minus (weighted) impurity after. It measures how much a question “cleans up” the data. The tree chooses the question with the highest information gain at each step.
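A short sketch of this definition (the `gini` helper is repeated so the snippet stands alone; names are my own):

```python
def gini(counts):
    """Gini impurity from raw class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def information_gain(parent, children, impurity=gini):
    """Impurity before the split minus the size-weighted impurity after."""
    n = sum(parent)
    return impurity(parent) - sum(sum(ch) / n * impurity(ch) for ch in children)

# A perfect split recovers all of the parent's impurity:
print(information_gain([5, 5], [[5, 0], [0, 5]]))  # 0.5
# A useless split (children as mixed as the parent) gains nothing:
print(information_gain([5, 5], [[3, 3], [2, 2]]))  # 0.0
```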
Prediction at the leaf — At a leaf, we output: for classification, the majority class of the samples there; for regression, the average of their target values. For new data, we just follow the path and read off the leaf’s prediction.
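Following the path is just a loop from the root to a leaf; a sketch using a hypothetical nested-dict tree layout (not any library's format):

```python
def predict(tree, x):
    """Walk from the root: internal nodes are dicts, leaves are bare values."""
    node = tree
    while isinstance(node, dict):
        branch = "yes" if x[node["feature"]] <= node["threshold"] else "no"
        node = node[branch]
    return node

# Root asks "x[0] <= 3?"; its "no" child asks "x[1] <= 1.5?".
tree = {
    "feature": 0, "threshold": 3.0,
    "yes": "class A",
    "no": {"feature": 1, "threshold": 1.5, "yes": "class B", "no": "class C"},
}
print(predict(tree, [2.0, 9.9]))  # class A
print(predict(tree, [5.0, 1.0]))  # class B
```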
Pruning — A tree that is too deep overfits (memorizes the training set). Pruning cuts branches to limit depth and improve generalization. These pruned trees are the base models used in random forest and other ensembles.
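In scikit-learn (assuming that is the library in use), depth limits and cost-complexity pruning are constructor parameters; a sketch on toy data where an unconstrained tree memorizes:

```python
from sklearn.tree import DecisionTreeClassifier

# Alternating labels force an unconstrained tree to grow very deep.
X = [[i] for i in range(16)]
y = [i % 2 for i in range(16)]

full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
# Post-pruning alternative: DecisionTreeClassifier(ccp_alpha=0.01)

print(full.get_depth(), pruned.get_depth())  # the pruned tree is much shallower
```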
Explainable AI — Unlike many black-box models, a decision tree shows the exact path of questions that led to each prediction (e.g. “age < 30 and income ≥ 30M → approve loan”). This is valued in finance and healthcare.
Nonlinear boundaries — Linear models cut the space with a single line; a tree can approximate step-like boundaries by repeated splits, capturing more complex patterns.
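To make this concrete: XOR-labeled points cannot be separated by any single straight line, but a tree handles them with two nested splits (scikit-learn assumed here, as above):

```python
from sklearn.tree import DecisionTreeClassifier

# XOR: the two classes sit on opposite diagonals of the unit square.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(tree.score(X, y))  # the tree carves the plane into quadrants and fits XOR
```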
Foundation for ensembles — A single tree can be unstable, but hundreds of trees (e.g. random forest) form a strong, robust model. Ch06 is the basis for Ch07 Ensemble.
Credit and loans — Questions like “Income ≥ 50M?” and “Any default in the last year?” form a path to approve or deny.
Medical decision support — Patient data (blood pressure, cholesterol, etc.) is used in a sequence of questions to predict disease risk and support diagnosis.
Marketing (churn, purchase) — “Registered > 6 months?”, “Logins in the last month ≤ 3?” help find at-risk customers for targeted campaigns.
Decision tree — solving guide
(1) Follow path: Start at root; 0 = no/left, 1 = yes/right; the leaf’s prediction is the answer.
(2) Gini: Get the proportions $p_i$ from the class counts, compute $G = 1 - \sum_i p_i^2$, then round $100 \times G$.

(3) Entropy: Compute $H = -\sum_i p_i \log_2 p_i$, then round $100 \times H$.

(4) Leaf majority: If class 0 has $a$ samples and class 1 has $b$, predict 0 if $a \ge b$, else 1. For node count, leaf count, and depth, use the numbers given in the problem. See the Explanation table below for worked solutions.
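The arithmetic in steps (2)–(4) can be checked mechanically; a worked example with hypothetical counts (2 samples of class 0, 6 of class 1):

```python
from math import log2

counts = [2, 6]                   # hypothetical node: 2 of class 0, 6 of class 1
n = sum(counts)
p = [c / n for c in counts]       # proportions: 0.25 and 0.75

gini = 1 - sum(pi ** 2 for pi in p)               # 1 - (0.0625 + 0.5625) = 0.375
entropy = -sum(pi * log2(pi) for pi in p if pi > 0)
majority = 0 if counts[0] >= counts[1] else 1     # step (4): ties go to class 0

print(round(100 * gini))     # 38
print(round(100 * entropy))  # 81
print(majority)              # 1
```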