Ch.07
Ensemble and Random Forest: The Wisdom of the Crowd
Ensemble methods combine predictions from multiple models to produce a single, often better prediction. This chapter explains bagging, boosting, stacking, and random forest—where many decision trees vote or average—so beginners can follow the idea of collective intelligence.
Combine predictions from multiple models (trees) by voting or averaging to get the final prediction.
- ① Draw bootstrap samples from training data and train multiple trees
- ② Each tree predicts independently
- ③ Classification: majority vote; Regression: average → final prediction
- ④ The final prediction is determined
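The four steps above can be sketched in a few lines of plain Python. The "trees" here are stubs (fixed toy outputs rather than trained models), so only the bootstrap-then-combine flow is real:

```python
import random
from collections import Counter
from statistics import mean

random.seed(42)
data = list(range(10))  # toy training set (indices stand in for samples)

# Step 1: draw bootstrap samples (sampling with replacement, same size as data).
n_trees = 5
bootstraps = [random.choices(data, k=len(data)) for _ in range(n_trees)]

# Step 2: each "tree" predicts independently (stub outputs for illustration).
class_votes = [1, 0, 1, 1, 0]                     # classification outputs per tree
reg_preds = [100.0, 150.0, 200.0, 120.0, 180.0]   # regression outputs per tree

# Steps 3-4: classification takes the majority vote, regression the average.
final_class = Counter(class_votes).most_common(1)[0][0]
final_value = mean(reg_preds)
print(final_class, final_value)  # 1 150.0
```

Class 1 wins 3 votes to 2, and the regression outputs average to 150.0.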
The core idea of ensemble: many hands make light work — An ensemble builds a team of multiple models and combines their predictions to reach a final answer. Like a jury voting on a verdict, using many models instead of one reduces the variance of the prediction: a few wrong individual answers rarely change the outcome, so the result is more stable. For classification we use a majority vote; for regression we use the average of the predictions.
Why are many better than one? (Wisdom of the crowd) — If you ask 100 people to guess a cow's weight, individual guesses may be off, but the average of 100 guesses is often surprisingly close to the true weight. When models independently judge and we combine results, their random errors tend to cancel out and the shared signal remains.
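The cow-weight story can be simulated directly. A minimal sketch, assuming 100 guesses scattered around a toy true weight of 550 kg with Gaussian noise: the average guess lands far closer to the truth than the worst individual guess, because the random errors cancel.

```python
import random
from statistics import mean

random.seed(7)
true_weight = 550  # kg, toy "true" cow weight
guesses = [true_weight + random.gauss(0, 50) for _ in range(100)]

worst_err = max(abs(g - true_weight) for g in guesses)  # worst single guess
avg_err = abs(mean(guesses) - true_weight)              # error of the crowd average
print(f"worst guess off by {worst_err:.1f} kg, average off by {avg_err:.1f} kg")
```

The same cancellation is what makes combining many independently trained models work.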
Three main ensemble methods: Bagging, Boosting, Stacking —
(1) Bagging: Each model gets a different random subset of data (like different practice tests); then they vote.
(2) Boosting: The next model focuses on what the previous one got wrong, learning sequentially from mistakes.
(3) Stacking: A meta-model takes the reports of base models and makes the final decision.
Random Forest: a forest of diverse trees — Bagging applied to decision trees: grow hundreds of trees on bootstrap samples. To keep them diverse, each tree considers only a random subset of the features at each split. As a result, some trees end up relying mostly on "age", others on "income", maximizing diversity.
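The per-split feature sampling can be sketched as follows. The feature names are made up for illustration; the subset size k follows the common heuristic of roughly the square root of the total feature count:

```python
import math
import random

random.seed(0)
features = ["age", "income", "tenure", "balance", "region", "score"]
k = round(math.sqrt(len(features)))  # common heuristic: ~sqrt of feature count

# At each split, a tree may only choose among a random subset of features,
# so different trees end up relying on different variables.
subsets = [random.sample(features, k) for _ in range(3)]
for tree, subset in enumerate(subsets):
    print(f"tree {tree} considers at this split: {subset}")
```

Because each split sees a different candidate pool, no single strong feature can dominate every tree.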
Voting and averaging in a formula — For classification, majority vote means "the class that most trees chose". For regression (e.g. house price), average all tree predictions: ŷ = (1/N) × (ŷ₁ + ŷ₂ + … + ŷ_N), where N is the number of trees and ŷᵢ is the i-th tree's prediction. (e.g. three trees predict 100, 150, 200 → final prediction (100 + 150 + 200) / 3 = 150)
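The three-tree example works out in code exactly as in the formula:

```python
from statistics import mean

tree_preds = [100, 150, 200]               # three trees' house-price predictions
final = sum(tree_preds) / len(tree_preds)  # (1/N) x sum of predictions
print(final)                               # 150.0
```

`statistics.mean(tree_preds)` gives the same result with one call.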
OOB (Out-of-Bag) evaluation — In bagging/random forest, each tree is trained on a bootstrap sample of the data. The samples a tree never saw (its out-of-bag samples) can be used to evaluate that tree—a built-in validation set, with no need to hold out separate test data.
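A quick sketch of how out-of-bag samples arise: drawing a bootstrap sample with replacement leaves some indices out, and those leftovers are the OOB set for that tree (on average about 37% of the data, since (1 − 1/n)ⁿ ≈ 1/e):

```python
import random

random.seed(1)
n = 10
bootstrap = random.choices(range(n), k=n)           # sampling with replacement
oob = [i for i in range(n) if i not in bootstrap]   # samples this tree never saw
print("in-bag:", sorted(set(bootstrap)), "out-of-bag:", oob)
```

Each tree has its own OOB set, so every training sample gets evaluated by the trees that skipped it.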
A stable forest that doesn't sway — A single decision tree can change a lot when data changes slightly. A forest of hundreds of trees stays stable; a few wrong trees don't change the overall vote. This leads to strong, reliable performance in practice.
Natural extension of Ch06 Decision Tree — The same tree structure (impurity, information gain) is reused. You're not learning new rules—just how to combine many trees with voting, so the previous chapter's knowledge is fully used.
The go-to model in industry and competitions — Random forest often works very well with little tuning, so it's many practitioners' first choice. It also provides feature importance, which helps explain which variables matter most.
General-purpose for business (classification and regression) — From "Is this email spam?" to "What will tomorrow's stock price be?", ensembles are used across almost every business problem.
Finding what matters (feature importance) — If trees in a loan model rely most on "income", that variable is the most important for the decision. This helps filter out unnecessary data.
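As a toy illustration of the idea (not how libraries actually compute it—real random forests weight each split by how much it reduces impurity), one simplified proxy is to count how often each feature is used across the trees' splits. The split logs below are made-up data:

```python
from collections import Counter

# Hypothetical log of which feature each tree split on (toy data):
tree_splits = [
    ["income", "age"],
    ["income"],
    ["age", "income", "income"],
]
counts = Counter(f for splits in tree_splits for f in splits)
total = sum(counts.values())
importance = {f: round(counts[f] / total, 2) for f in sorted(counts)}
print(importance)  # income used most often -> highest importance
```

Here "income" appears in 4 of 6 splits, so it comes out as the most important feature—matching the loan-model intuition above.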
Wide real-world use — Fraud detection, recommendation systems (e.g. Netflix, YouTube), equipment failure prediction—wherever accuracy and stability matter.
Ensemble / Random forest — solving guide —
(1) Majority vote: Compare votes for class 0 vs class 1; the majority is the final prediction (0 or 1).
(2) Vote count: The number of votes for the winning class.
(3) Regression mean: Sum of tree predictions divided by number of trees; round if needed.
(4) OOB: Number of trees whose bootstrap sample did not include this sample.
(5) Formula: ŷ = (1/N) × (ŷ₁ + ŷ₂ + … + ŷ_N) — N is the number of trees, ŷᵢ the i-th tree's prediction; divide the sum of predictions by N for the mean. See the problem-solving explanation table below.
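The guide's steps (1)–(4) can be checked with a small worked example; the votes, predictions, and bootstrap samples below are toy values:

```python
from collections import Counter
from statistics import mean

# (1)-(2) Majority vote and vote count for five trees' class votes.
votes = [1, 1, 0, 1, 0]
winner, n_votes = Counter(votes).most_common(1)[0]
print(winner, n_votes)   # class 1 wins with 3 votes

# (3) Regression mean of three trees' predictions.
preds = [100, 150, 200]
print(mean(preds))       # 150

# (4) OOB count: trees whose bootstrap sample never included sample 0.
bootstraps = [[0, 1, 1], [2, 2, 0], [1, 2, 2]]
oob_trees = sum(1 for b in bootstraps if 0 not in b)
print(oob_trees)         # only the third tree never saw sample 0
```

Each printed value corresponds to one numbered step in the guide above.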