Ch.01
The Start of Machine Learning: Data and Features
Machine learning starts with data. We turn images, text, and numbers into features—numeric representations that let the model learn patterns. The world of numbers and functions from Basic Math Ch00 becomes concrete here.
What are Data and Features?
Data is the raw material of machine learning — as we learned in Basic Math Ch00, deep learning and machine learning turn images, text, and sound into numbers. These numeric inputs, paired with labels (correct answers), form data. For example, 'cat image + cat' is one data point, and thousands of such pairs become the material the model learns from.
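The idea of a dataset as (input, label) pairs can be sketched in a few lines. This is a minimal illustration, assuming a toy three-number feature vector per example; the values and labels are made up:

```python
# A dataset as a list of (input, label) pairs.
# Each input is already a numeric vector; the values are invented
# stand-ins for features like ear shape, eye size, and fur color.
dataset = [
    ([0.9, 0.2, 0.7], "cat"),
    ([0.1, 0.8, 0.3], "dog"),
    ([0.8, 0.3, 0.6], "cat"),
]

inputs = [x for x, _ in dataset]   # what the model sees
labels = [y for _, y in dataset]   # the correct answers
print(len(dataset), labels)
```

Every supervised-learning algorithm in the later chapters consumes data shaped like this: numeric inputs on one side, labels on the other.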
Features are the numeric essence of data — A photo we see is just a pile of tens of thousands of pixel numbers to a computer. Features are the useful information—like ear shape, eye size, fur color—extracted and expressed as numbers. Mathematically they are vectors, extracted from raw data through functions. The 'functions that define input-output rules' from Ch00 handle this transformation.
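A feature extractor is exactly the 'function that defines input-output rules' from Ch00: raw pixel numbers in, a small numeric vector out. Here is a sketch; the two features chosen (mean brightness and brightness range) are illustrative assumptions, not prescribed by the text:

```python
def extract_features(pixels):
    """Turn a flat list of pixel intensities (0-255) into a feature vector."""
    mean_brightness = sum(pixels) / len(pixels)   # overall lightness
    brightness_range = max(pixels) - min(pixels)  # crude contrast measure
    return [mean_brightness, brightness_range]

raw_image = [12, 200, 34, 180, 90, 150]  # a toy "image" of 6 pixels
print(extract_features(raw_image))       # [111.0, 188]
```

Note the compression: six raw numbers became a two-dimensional feature vector that summarizes what matters.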
In short — Data is a collection of (input, label) pairs; features are the result of turning that input into numeric vectors the model can understand. Good features lead to better learning; bad features hurt performance even with lots of data. The start of machine learning is deciding what data to use and what features to extract.
Without data, learning is impossible — every decision a model makes is the result of numbers and functions. As in Ch00, to trace an AI's computation we need data expressed as numbers. If data is scarce or labels are wrong, the model learns the wrong patterns.
Feature design sets the model's limits — Deciding which information to turn into numbers is called feature engineering. Using only 'yesterday's closing price' vs. adding 'moving average, volume, volatility' for stock prediction leads to very different results. Vectors and matrices bundle many features for batch computation—a core part of the Ch00 roadmap—and the quality of features drives model performance.
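The stock example above can be sketched as a small feature-engineering function. This is a toy illustration, assuming we have a list of daily closing prices; the 3-day window and the specific features (last close, moving average, volatility as standard deviation) are arbitrary choices for demonstration:

```python
def make_features(closes, window=3):
    """Build a richer feature vector than 'yesterday's closing price' alone."""
    recent = closes[-window:]
    moving_avg = sum(recent) / window
    # Volatility as the (population) standard deviation over the window.
    volatility = (sum((c - moving_avg) ** 2 for c in recent) / window) ** 0.5
    return [closes[-1], moving_avg, volatility]

closes = [100.0, 102.0, 101.0, 105.0]  # invented daily closing prices
print(make_features(closes))
```

Swapping this function for one that returns only `[closes[-1]]` changes nothing about the model code, but gives the model far less to work with, which is the point of feature engineering.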
Bridge to the next chapters — Ch02 KNN, Ch03 Linear Regression, Ch05 Logistic Regression, and all ML algorithms take feature vectors as input. Understanding data and features is needed to interpret why a model made a given prediction, and the later chapters on differentiation and probability build on this foundation.
Input → feature extraction → model → prediction — The ML pipeline matches the input → numeric conversion → repeated functions → output structure from Ch00. Feature extraction is the 'numeric conversion' step; models (linear regression, KNN, etc.) are sets of functions. Differentiation is used to reduce error during training; probability expresses uncertainty in predictions like '90% chance this image is a cat'.
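The whole pipeline can be sketched end to end. This is a minimal illustration, assuming one brightness feature and hand-picked weights (the weights, bias, and cat/not-cat framing are all invented, not a trained model):

```python
import math

def extract_features(pixels):
    """Numeric conversion: raw pixels -> one normalized brightness feature."""
    return [sum(pixels) / len(pixels) / 255.0]

def model(features, weights=(4.0,), bias=-1.0):
    """A linear function followed by a sigmoid: outputs a probability."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 / (1 + math.exp(-z))  # squashes z into (0, 1)

pixels = [200, 180, 220, 210]        # input
features = extract_features(pixels)  # feature extraction
p_cat = model(features)              # model -> prediction
print(f"P(cat) = {p_cat:.2f}")       # P(cat) = 0.90
```

With these made-up weights the output happens to be about 0.90, matching the '90% chance this image is a cat' style of prediction mentioned above; training (Ch06–08) is the process of finding such weights automatically using differentiation.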
This chapter summarized the role of data and features in machine learning and how they are used in practice. Data is a collection of (input, label) pairs; features are the result of turning that input into numeric vectors the model can use. Feature engineering—choosing and designing good features—strongly affects performance, so it helps to solidify these ideas before moving on to the next chapters (KNN, linear regression, etc.).
| Concept | Role in data/features | Basic math link |
|---|---|---|
| Data | Collection of (input, label) pairs, expressed as numbers | Domain and codomain of functions (Ch01) |
| Features | Input converted to vectors; model input | Vectors, matrices (Ch00 roadmap) |
| Training | Adjusting model parameters from data | Differentiation, gradient (Ch06–08) |
| Prediction | Feature vector → model → prediction or probability | Probability, distributions (Ch10–12) |