Ch.06

Decision Tree: Twenty Questions to the Answer

A decision tree works like the game of Twenty Questions: ask yes/no questions, follow branches, and reach a prediction at a leaf. It is easy to interpret (you can see exactly why it made each decision) and is the building block for random forests and other ensemble methods.


From the root, follow branches by answering yes/no to each question; the leaf gives the prediction.

(Diagram: a root question node splits into Yes(1)/No(0) branches, passes through two further question nodes, and ends at Leaf 0 and Leaf 1.)


Basic structure — Picture an upside-down tree. At the top is the root node (first question). From there you ask a condition (e.g. “Is feature $x_1 \le 3$?”); yes and no lead to internal nodes. When you can’t split further, you reach a leaf node and output the prediction (class or value).
Same as Twenty Questions — Just like guessing an animal by asking “Does it have four legs?” → “Is it a herbivore?” → “Tiger!”, the tree narrows down the answer step by step. Each question splits the data into two groups.
Good questions: reducing impurity — Impurity measures how mixed the classes are at a node. We want splits that make nodes purer. Two common formulas: Gini $G = 1 - \sum_i p_i^2$ and Entropy $H = -\sum_i p_i \log_2 p_i$. When one class has 100% ($p_i = 1$), both are 0 (pure). When classes are half-and-half, impurity is at its maximum.
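Both formulas can be computed straight from class counts; a minimal sketch in plain Python (function names are illustrative, not from any library):

```python
from math import log2

def gini(counts):
    """Gini impurity G = 1 - sum(p_i^2), from raw class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    """Entropy H = -sum(p_i * log2(p_i)); the 0 * log2(0) term is taken as 0."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

print(gini([5, 0]))     # pure node -> 0.0
print(gini([5, 5]))     # half-and-half -> 0.5 (maximum for two classes)
print(entropy([5, 5]))  # half-and-half -> 1.0 bit
```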
Information gain — Information gain = impurity before the split minus (weighted) impurity after. It measures how much a question “cleans up” the data. The tree chooses the question with the highest information gain at each step.
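A short sketch of this definition (the `gini` helper is repeated so the snippet stands alone; names are my own):

```python
def gini(counts):
    """Gini impurity from raw class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def information_gain(parent, children, impurity=gini):
    """Impurity before the split minus the size-weighted impurity after."""
    n = sum(parent)
    return impurity(parent) - sum(sum(ch) / n * impurity(ch) for ch in children)

# A perfect split recovers all of the parent's impurity:
print(information_gain([5, 5], [[5, 0], [0, 5]]))  # 0.5
# A useless split (children as mixed as the parent) gains nothing:
print(information_gain([5, 5], [[3, 3], [2, 2]]))  # 0.0
```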
Prediction at the leaf — At a leaf, we output: for classification, the majority class of the samples there; for regression, the average of their target values. For new data, we just follow the path and read off the leaf’s prediction.
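Following the path is just a loop from the root to a leaf; a sketch using a hypothetical nested-dict tree layout (not any library's format):

```python
def predict(tree, x):
    """Walk from the root: internal nodes are dicts, leaves are bare values."""
    node = tree
    while isinstance(node, dict):
        branch = "yes" if x[node["feature"]] <= node["threshold"] else "no"
        node = node[branch]
    return node

# Root asks "x[0] <= 3?"; its "no" child asks "x[1] <= 1.5?".
tree = {
    "feature": 0, "threshold": 3.0,
    "yes": "class A",
    "no": {"feature": 1, "threshold": 1.5, "yes": "class B", "no": "class C"},
}
print(predict(tree, [2.0, 9.9]))  # class A
print(predict(tree, [5.0, 1.0]))  # class B
```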
Pruning — A tree that is too deep overfits (memorizes the training set). Pruning cuts branches to limit depth and improve generalization. These pruned trees are the base models used in random forest and other ensembles.
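In scikit-learn (assuming that is the library in use), depth limits and cost-complexity pruning are constructor parameters; a sketch on toy data where an unconstrained tree memorizes:

```python
from sklearn.tree import DecisionTreeClassifier

# Alternating labels force an unconstrained tree to grow very deep.
X = [[i] for i in range(16)]
y = [i % 2 for i in range(16)]

full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
# Post-pruning alternative: DecisionTreeClassifier(ccp_alpha=0.01)

print(full.get_depth(), pruned.get_depth())  # the pruned tree is much shallower
```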
Explainable AI — Unlike many black-box models, a decision tree shows the exact path of questions that led to each prediction (e.g. “age < 30 and income ≥ 30M → approve loan”). This is valued in finance and healthcare.
Nonlinear boundaries — Linear models cut the space with a single line; a tree can approximate step-like boundaries by repeated splits, capturing more complex patterns.
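To make this concrete: XOR-labeled points cannot be separated by any single straight line, but a tree handles them with two nested splits (scikit-learn assumed here, as above):

```python
from sklearn.tree import DecisionTreeClassifier

# XOR: the two classes sit on opposite diagonals of the unit square.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(tree.score(X, y))  # the tree carves the plane into quadrants and fits XOR
```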
Foundation for ensembles — A single tree can be unstable, but hundreds of trees (e.g. random forest) form a strong, robust model. Ch06 is the basis for Ch07 Ensemble.
Credit and loans — Questions like “Income ≥ 50M?” and “Any default in the last year?” form a path to approve or deny.
Medical decision support — Patient data (blood pressure, cholesterol, etc.) is used in a sequence of questions to predict disease risk and support diagnosis.
Marketing (churn, purchase) — “Registered > 6 months?”, “Logins in the last month ≤ 3?” help find at-risk customers for targeted campaigns.
Decision tree — solving guide
(1) Follow path: Start at root; 0 = no/left, 1 = yes/right; the leaf’s prediction is the answer.
(2) Gini: Get the proportions $p_i$ from the class counts, compute $G = 1 - \sum_i p_i^2$, then round $100 \times G$.

(3) Entropy: Compute $H = -\sum_i p_i \log_2 p_i$, then round $100 \times H$.

(4) Leaf majority: If class 0 has $a$ samples and class 1 has $b$, predict 0 if $a \ge b$, else 1. For node count, leaf count, and depth, use the numbers given in the problem. See the Explanation table below for worked solutions.
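The arithmetic in steps (2)–(4) can be checked mechanically; a worked example with hypothetical counts (2 samples of class 0, 6 of class 1):

```python
from math import log2

counts = [2, 6]                   # hypothetical node: 2 of class 0, 6 of class 1
n = sum(counts)
p = [c / n for c in counts]       # proportions: 0.25 and 0.75

gini = 1 - sum(pi ** 2 for pi in p)               # 1 - (0.0625 + 0.5625) = 0.375
entropy = -sum(pi * log2(pi) for pi in p if pi > 0)
majority = 0 if counts[0] >= counts[1] else 1     # step (4): ties go to class 0

print(round(100 * gini))     # 38
print(round(100 * entropy))  # 81
print(majority)              # 1
```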