Chapter 03
Logarithm: From Multiplication to Addition, the Language of Loss Design
A logarithm answers the question "how many times do we multiply the base to get this number?" It is the inverse of exponentiation, and in deep learning it appears alongside exponentials in loss functions and probability.
The logarithm is the inverse of the exponent: $\log_b x = y$ means $b^y = x$. Below are the graphs of $y = 2^x$ and its inverse $y = \log_2 x$.
Example: $2^3 = 8$, so $\log_2 8 = 3$ (when $2^y = x$, $\log_2 x$ is $y$).
Purple: $y = 2^x$, Teal: $y = \log_2 x$
What is the logarithm?
The logarithm is like running exponentiation backward. In $2^3 = 8$, when you see the result 8 and ask "how many times did we multiply 2 to get 8?", that count (3) is the logarithm: $\log_2 8 = 3$. Here 2 is the base and 8 is the argument.
Think of it as counting digits. $10^2 = 100$, so $\log_{10} 100 = 2$; $10^3 = 1000$, so $\log_{10} 1000 = 3$. When the number grows 10×, the log value only goes up by 1. So log acts as a filter that turns explosively large numbers into much gentler ones. Basic properties: $\log_b 1 = 0$ (base to the 0th power is 1), $\log_b b = 1$ (base to the 1st power is itself).
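The digit-counting behavior and the two basic properties are easy to check directly; here is a quick sketch using only Python's standard library:

```python
import math

# log10 roughly counts decimal digits: 10^2 = 100, 10^3 = 1000
assert abs(math.log10(100) - 2) < 1e-12
assert abs(math.log10(1000) - 3) < 1e-12

# When a number grows 10x, its log10 only goes up by 1:
assert abs(math.log10(50_000) - math.log10(5_000) - 1) < 1e-12

# Basic properties: log_b(1) = 0 and log_b(b) = 1 for any base b
for b in (2, 10, math.e):
    assert math.log(1, b) == 0.0
    assert abs(math.log(b, b) - 1) < 1e-12
```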
The magic of log is that it turns multiplication into addition: $\log(x \times y) = \log x + \log y$. For computers, multiplication is costlier than addition and can overflow or underflow; taking the log turns that multiplication into a safer, simpler addition.
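A minimal sketch of the product-to-sum identity with Python's `math` module; the values are arbitrary illustrative numbers:

```python
import math

x, y = 3.7, 41.2

# log turns a product into a sum: log(x*y) == log(x) + log(y)
lhs = math.log(x * y)
rhs = math.log(x) + math.log(y)
assert abs(lhs - rhs) < 1e-12

# The same trick works for long products: sum the logs instead
probs = [0.2, 0.5, 0.1, 0.9]
log_prod = sum(math.log(p) for p in probs)
assert abs(math.exp(log_prod) - 0.2 * 0.5 * 0.1 * 0.9) < 1e-12
```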
The argument condition ($x > 0$) matters: the log of 0 or of a negative number is undefined. So in AI code we often add a tiny constant $\varepsilon$ (epsilon) inside the log to avoid errors. The natural log ($\ln x$, base $e \approx 2.718$) keeps differentiation tidy and is the standard in deep learning.
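A sketch of the epsilon trick; `1e-8` below is an illustrative value, not a universal constant (frameworks choose their own):

```python
import math

EPS = 1e-8  # illustrative epsilon; real frameworks pick their own value

p = 0.0
try:
    math.log(p)            # log(0) is undefined
except ValueError:
    pass                   # CPython raises ValueError: math domain error

safe = math.log(p + EPS)   # clamp the argument away from 0
print(safe)                # a large negative but finite number, usable in a loss
```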
Avoiding underflow is essential. If an AI multiplies a probability of 0.01 by itself a hundred times ($0.01^{100} = 10^{-200}$), the computer may treat the result as zero. Taking the log instead gives $\log_{10} 10^{-200} = -200$, a meaningful number the computer can still handle.
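A sketch of the underflow problem. The loop uses 200 factors of 0.01 so the product underflows even in Python's 64-bit floats (in the 32-bit floats common in deep learning it happens much sooner):

```python
import math

p = 0.01
n = 200  # enough factors to underflow a 64-bit float

prod = 1.0
for _ in range(n):
    prod *= p
print(prod)       # 0.0 -- the true value 1e-400 underflows to zero

log_prod = n * math.log(p)
print(log_prod)   # ~ -921.03, still a perfectly usable number
```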
It is the ruler for information (entropy). The rarer an event (the smaller its probability $p$), the larger its information content $-\log p$. A rare event (e.g. "the sun rises in the west") carries high information; an obvious one ("morning comes") carries almost none. AI uses this log-based measure to gauge how much surprising information was learned.
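A small sketch of the information-content idea, measuring surprise as $-\log_2 p$ in bits (the function name is ours, chosen for illustration):

```python
import math

def information_bits(p: float) -> float:
    """Shannon information content -log2(p), in bits."""
    return -math.log2(p)

print(information_bits(0.5))     # 1.0 bit  -- a fair coin flip
print(information_bits(1/1024))  # 10.0 bits -- a rare event is very informative
assert information_bits(1.0) == 0.0  # a certain event carries no information
```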
It penalizes mistakes harshly. For $\log p$ with $0 < p \le 1$, as $p$ approaches 0, $\log p$ plunges toward $-\infty$. If the model predicts probability 0.9 for the correct answer, the loss $-\log 0.9 \approx 0.1$ is small; if it predicts 0.01, the loss $-\log 0.01 \approx 4.6$ is a huge penalty. So the model is pushed to correct wrong answers decisively.
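The asymmetry of the $-\log p$ penalty is easy to tabulate:

```python
import math

# -log(p) as a penalty on the probability given to the correct answer
for p in (0.9, 0.5, 0.1, 0.01):
    print(f"p = {p:5}  loss = {-math.log(p):.3f}")

# The penalty at p = 0.01 is more than 40x the penalty at p = 0.9
assert -math.log(0.01) > 40 * -math.log(0.9)
```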
The cross-entropy loss is a prime example. In classification we minimize $-\log p$ for the correct class. That is the mathematical way to say "make the probability of the correct answer as close to 1 as possible (and its log as close to 0 as possible)."
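A toy sketch of the loss for a single example, assuming the model already outputs a probability distribution (real framework losses typically work on raw logits and batches, which this deliberately skips):

```python
import math

def cross_entropy(pred_probs: list[float], correct_class: int) -> float:
    """-log of the probability assigned to the correct class."""
    return -math.log(pred_probs[correct_class])

# Model A is confident and right; Model B barely considers the answer.
a = cross_entropy([0.05, 0.90, 0.05], correct_class=1)
b = cross_entropy([0.49, 0.02, 0.49], correct_class=1)
print(a)  # ~0.105 -- small loss
print(b)  # ~3.912 -- large penalty
```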
Maximum likelihood estimation (MLE) uses it: "maximize the probability of observing this data" means maximizing a product of probabilities $\prod_i p_i$. Taking the log turns that into maximizing the sum $\sum_i \log p_i$, which is easier to differentiate and numerically more stable.
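A sketch of MLE on a made-up coin-flip dataset, maximizing the log-likelihood over a simple grid (an illustration of the log-sum trick, not a general-purpose optimizer):

```python
import math

# Observed coin flips: 7 heads, 3 tails. Which head-probability q
# maximizes the (log-)likelihood of this data?
heads, tails = 7, 3

def log_likelihood(q: float) -> float:
    # log of q^heads * (1-q)^tails, turned into a sum by the log
    return heads * math.log(q) + tails * math.log(1 - q)

best = max((q / 100 for q in range(1, 100)), key=log_likelihood)
print(best)  # 0.7 -- the MLE matches the observed frequency 7/10
```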
| Example | Value |
|---|---|
| $\log_2 8$ | 3 ($2^3 = 8$) |
| $\log_{10} 100$ | 2 |
| $\log_3 9$ | 2 |
The log is an integer only when the argument is a power of the base.
Operations frequently used with logarithms (often in AI loss and probability):
| Operation | Formula | Note |
|---|---|---|
| Log sum | $\log_b(x \times y) = \log_b x + \log_b y$ | product → sum |
| Log difference | $\log_b(x / y) = \log_b x - \log_b y$ | quotient → difference |
| Power | $\log_b(x^n) = n \log_b x$ | exponent out front |
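The three rules above can be verified numerically; the sample values are arbitrary:

```python
import math

b, x, y, n = 2, 8.0, 4.0, 3

# Log sum: product -> sum
assert abs(math.log(x * y, b) - (math.log(x, b) + math.log(y, b))) < 1e-9
# Log difference: quotient -> difference
assert abs(math.log(x / y, b) - (math.log(x, b) - math.log(y, b))) < 1e-9
# Power: the exponent comes out front
assert abs(math.log(x ** n, b) - n * math.log(x, b)) < 1e-9
print("all three log rules check out numerically")
```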
| Example | Calculation |
|---|---|
| Log sum | $\log_2 4 + \log_2 8 = \log_2 32 = 5$ |
| Log difference | $\log_2 32 - \log_2 4 = \log_2 8 = 3$ |
In the problems below, find log values, arguments, log sums, or log differences.