Everyone's AI
Chapter 03

Logarithm: From Multiplication to Addition, the Language of Loss Design

A logarithm answers the question "how many times must we multiply the base to get this number?" It is the inverse of exponentiation, and in deep learning it appears alongside exponentials in loss functions and probability calculations.


The logarithm is the inverse of the exponential: $y = \log_2 x$ means $2^y = x$. Below are the graphs of $y = \log_2 x$ and its inverse $y = 2^x$.

[Graph: $y = \log_2 x$ and $y = 2^x$ on axes from 0 to 8; the log curve passes through (1, 0) and the exponential curve through (0, 1).]

Example: $\log_2 1 = 0$, $\log_2 2 = 1$, $\log_2 4 = 2$, $\log_2 8 = 3$ (when $2^y = x$, $y$ is $\log_2 x$).

Purple: $y = \log_2 x$; teal: $y = 2^x$.

What is the logarithm?

The logarithm is exponentiation run backward. In $2^3 = 8$, when you see the result 8 and ask "how many times did we multiply 2 to get 8?", that count (3) is the logarithm: $\log_2 8 = 3$. Here 2 is the base and 8 is the argument.
Think of it as counting digits: $100 = 10^2$, so $\log_{10} 100 = 2$; $1000 = 10^3$, so $\log_{10} 1000 = 3$. When the number grows 10×, the log value only goes up by 1, so the log acts as a filter that turns explosively large numbers into much gentler ones. Basic properties: $\log_a 1 = 0$ (the base to the 0th power is 1) and $\log_a a = 1$ (the base to the 1st power is itself).
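The digit-counting idea can be checked in a couple of lines, a minimal sketch using Python's standard `math` module:

```python
import math

# log base 10 counts powers of ten: 100 = 10^2, 1000 = 10^3
two = math.log10(100)
three = math.log10(1000)

# a 10x jump in the argument only adds 1 to the log
step = three - two
```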
The magic of the log is that it turns multiplication into addition: $\log_a(b \times c) = \log_a b + \log_a c$. For computers, multiplication is costlier than addition and can overflow or underflow; taking the log turns a product into a safer, simpler sum.
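The product-to-sum identity can be verified numerically; the values 6 and 7 below are arbitrary illustrative choices:

```python
import math

b, c = 6.0, 7.0

# log(b * c) equals log(b) + log(c), up to floating-point rounding
lhs = math.log(b * c)
rhs = math.log(b) + math.log(c)
```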
The argument condition ($x > 0$) matters: the log of 0 or of a negative number is undefined. In AI code we therefore often add a tiny constant ($\epsilon$, epsilon) to the argument to avoid $\log(0)$ errors. The natural log ($\ln$, base $e$) keeps differentiation tidy and is the standard in deep learning.
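The epsilon trick looks like this in practice; the name `safe_log` and the value `1e-12` are illustrative conventions, not a fixed standard:

```python
import math

EPS = 1e-12  # tiny constant; the exact value is a common but arbitrary choice

def safe_log(p, eps=EPS):
    """Shift the argument away from 0 so log never sees 0 exactly."""
    return math.log(p + eps)

result = safe_log(0.0)  # finite, instead of a math domain error
```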
Avoiding underflow is essential. If AI multiplies the probability $0.1$ a hundred times ($0.1^{100}$), the computer may treat the result as zero. Taking the log (base 10) gives $\log(0.1^{100}) = 100 \times \log(0.1) = -100$, a meaningful number the computer can still handle.
It is also the ruler for information (entropy). The rarer an event, the larger (in absolute value) its log: a rare event (e.g. "the sun rises in the west") carries high information, while an obvious one ("morning comes") carries almost none. AI uses this log-based measure to quantify how much surprising information was learned.
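This "surprise" is usually measured as $-\log_2 p$, the self-information in bits; a minimal sketch, with the probabilities chosen only for illustration:

```python
import math

def surprisal_bits(p):
    """Self-information of an event with probability p, in bits."""
    return -math.log2(p)

# a rare event carries far more information than a near-certain one
rare = surprisal_bits(1 / 1024)   # 10 bits
obvious = surprisal_bits(0.99)    # close to 0 bits
coin = surprisal_bits(0.5)        # 1 bit: one fair coin flip of information
```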
It penalizes mistakes harshly. For $y = \ln x$ with $0 < x < 1$, as $x$ approaches 0, $y$ plunges toward $-\infty$. If the model predicts probability 0.9 for the correct answer, the loss is small; if it predicts 0.01, the log gives a huge penalty. The model is thus pushed to fix confident wrong answers.
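The asymmetry of the penalty is easy to see numerically, using the two probabilities from the paragraph above:

```python
import math

confident_right = -math.log(0.9)   # about 0.105: small loss
confident_wrong = -math.log(0.01)  # about 4.6: a huge penalty
```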
Cross-entropy loss is a prime example. In classification we minimize $-\log p$ for the correct class: the mathematical way to say "make the probability of the correct answer as close to 1 as possible (so that its log is close to 0)."
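A minimal sketch of this loss for a single example; the 3-class probability vector is hypothetical, and real frameworks compute this from logits in a batched, vectorized form:

```python
import math

def cross_entropy(probs, true_class):
    """Negative log probability assigned to the correct class."""
    return -math.log(probs[true_class])

# hypothetical prediction over 3 classes; class 0 is the correct one
probs = [0.7, 0.2, 0.1]
loss = cross_entropy(probs, 0)  # -log(0.7), about 0.357
```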
Maximum likelihood estimation (MLE) uses it too: "maximize the probability of observing this data" means maximizing a product of probabilities. Taking the log turns that product into a sum, which is easier to differentiate and numerically more stable.
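A toy MLE sketch, assuming a hypothetical coin dataset (7 heads, 3 tails) and a simple grid search rather than calculus; the log-likelihood peaks at the sample mean, 0.7:

```python
import math

# hypothetical coin flips: 1 = heads, 0 = tails (7 heads, 3 tails)
data = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

def log_likelihood(theta):
    """Sum of log probabilities; the raw product form would risk underflow."""
    return sum(math.log(theta if x == 1 else 1 - theta) for x in data)

# grid search over candidate values of P(heads)
candidates = [i / 100 for i in range(1, 100)]
best = max(candidates, key=log_likelihood)
```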
| Example | Value |
| --- | --- |
| $\log_2 8$ | 3 (since $2^3 = 8$) |
| $\log_2 4$ | 2 |
| $\log_3 9$ | 2 |
The log is an integer only when the argument is a power of the base.
Operations frequently used with logarithms (they appear constantly in AI loss functions and probability):
| Operation | Formula | Note |
| --- | --- | --- |
| Log sum | $\log_a b + \log_a c = \log_a(b \cdot c)$ | product → sum |
| Log difference | $\log_a b - \log_a c = \log_a(b/c)$ | quotient → difference |
| Power | $\log_a(b^n) = n \cdot \log_a b$ | exponent out front |
| Example | Calculation |
| --- | --- |
| Log sum | $\log_2 2 + \log_2 4 = 1 + 2 = 3$ |
| Log difference | $\log_2 8 - \log_2 2 = 3 - 1 = 2$ |
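All three rules from the tables above can be checked numerically in one short sketch (`math.log2` is exact for powers of 2, so the comparisons here hold cleanly):

```python
import math

log2 = math.log2

sum_rule = log2(2) + log2(4)          # 1 + 2 = 3, same as log2(8)
diff_rule = log2(8) - log2(2)         # 3 - 1 = 2, same as log2(4)
power_rule = 5 * log2(8)              # 5 * 3 = 15, same as log2(8 ** 5)
```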
In the problems below, find log values, arguments, log sums, or log differences.