Updating What You Believe
In the last lesson, we saw how new information changes probability. Conditional probability answered the question: "Given that something happened, what are the new chances?" Bayes' Theorem takes this one step further by giving us a systematic way to update our beliefs when we get new evidence.
Named after Reverend Thomas Bayes, an 18th-century minister and mathematician, this theorem is one of the most powerful ideas in all of statistics. And despite its reputation, the core idea is surprisingly straightforward.
The Big Idea in Plain Language
Here is Bayes' Theorem in everyday terms:
Your updated belief = Your original belief × How well the evidence fits ÷ How common the evidence is overall
Or more precisely:
- Start with what you believed before (the "prior" probability).
- Look at the new evidence and ask: "How likely would this evidence be if my belief were true?"
- Also consider: "How likely is this evidence in general - whether my belief is true or not?"
- Combine these to get your updated belief (the "posterior" probability).
The Formula
For those who like formulas, here it is:
P(A | B) = P(B | A) × P(A) ÷ P(B)
Where:
- P(A | B) = the probability of A after seeing evidence B (what you want to find)
- P(B | A) = the probability of seeing evidence B if A is true
- P(A) = the probability of A before any new evidence (your prior belief)
- P(B) = the overall probability of seeing evidence B
Do not worry if the formula feels abstract right now. The examples below will make it concrete.
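The formula translates almost directly into code. Here is a small sketch in Python (the function name and argument names are ours); P(B) is expanded using the law of total probability, P(B) = P(B | A) × P(A) + P(B | not A) × P(not A), which is exactly how the examples below compute it:

```python
def bayes(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior P(A | B) from Bayes' Theorem.

    P(B) is expanded via the law of total probability:
    P(B) = P(B|A) * P(A) + P(B|not A) * P(not A).
    """
    p_evidence = (p_evidence_if_true * prior
                  + p_evidence_if_false * (1 - prior))
    return p_evidence_if_true * prior / p_evidence

# Sanity check: a 50% prior with evidence that is 9x more likely
# when A is true should update to 90%.
print(round(bayes(0.5, 0.9, 0.1), 3))  # 0.9
```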
Example: Disease Testing Revisited
This is the classic Bayes' Theorem example, and it picks up exactly where our conditional probability lesson left off.
A disease affects 1 in 200 people (0.5%). A blood test detects the disease 95% of the time when someone has it (this is called "sensitivity"). But 3% of healthy people also get a false positive result (put another way, the test's "specificity" is 97%).
You test positive. What is the probability you actually have the disease?
Step 1: Write down what you know.
- P(disease) = 0.005 (your prior - 1 in 200)
- P(positive | disease) = 0.95 (test catches 95% of sick people)
- P(positive | no disease) = 0.03 (3% false positive rate)
Step 2: Find P(positive) - the overall chance of testing positive.
P(positive) = P(positive | disease) × P(disease) + P(positive | no disease) × P(no disease)
= (0.95 × 0.005) + (0.03 × 0.995)
= 0.00475 + 0.02985 = 0.03460
Step 3: Apply Bayes' Theorem.
P(disease | positive) = P(positive | disease) × P(disease) ÷ P(positive)
= (0.95 × 0.005) ÷ 0.03460
= 0.00475 ÷ 0.03460 = 0.137, or about 13.7%.
Even with a test that catches 95% of real cases, a positive result means only about a 14% chance of actually having the disease. The low base rate (only 0.5% of people are sick) means most positive results are false alarms.
This is why doctors often order a second test after a positive result. If the second test is also positive, the probability jumps dramatically - because now your "prior" is 13.7% instead of 0.5%.
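The three steps above, including the follow-up test, can be run in a few lines of Python. This sketch assumes the two tests are independent given the disease status, which real repeat tests may not be:

```python
sensitivity = 0.95  # P(positive | disease)
false_pos = 0.03    # P(positive | no disease)

def update(prior):
    """One Bayesian update after a positive test result."""
    return (sensitivity * prior
            / (sensitivity * prior + false_pos * (1 - prior)))

after_first = update(0.005)         # prior is the 0.5% base rate
after_second = update(after_first)  # first posterior becomes the new prior
print(round(after_first, 3), round(after_second, 3))  # 0.137 0.834
```

A second positive result lifts the probability from about 14% to about 83%, which is why repeat testing is so informative.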
Thinking in Terms of People (The Natural Frequency Approach)
Many people find Bayes' Theorem easier to understand using actual numbers of people instead of probabilities. Let us redo the example above with 10,000 people:
Out of 10,000 people:
- 50 have the disease (0.5% of 10,000).
- 9,950 do NOT have the disease.
Test the 50 sick people: 95% test positive → 47.5 (about 48) positive results.
Test the 9,950 healthy people: 3% test positive → 298.5 (about 299) false positives.
Total positive results: 48 + 299 = 347.
Of those 347 positive results, only 48 actually have the disease.
48 ÷ 347 = 0.138, or about 13.8%. (The tiny difference from 13.7% is just rounding.)
This approach - counting actual people - often feels more intuitive than plugging numbers into a formula.
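The head count above can be reproduced directly. Keeping the fractional "people" (47.5 and 298.5) instead of rounding recovers exactly the 13.7% from the formula:

```python
population = 10_000
sick = population * 0.005         # 50 people with the disease
healthy = population - sick       # 9,950 without it

true_positives = sick * 0.95      # 47.5 sick people test positive
false_positives = healthy * 0.03  # 298.5 healthy false positives

share = true_positives / (true_positives + false_positives)
print(round(share, 3))  # 0.137
```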
Example: Is That Email Spam?
Email spam filters are one of the most common real-world applications of Bayes' Theorem.
Suppose 40% of emails you receive are spam. The word "FREE" appears in 80% of spam emails but only 5% of legitimate emails. An email arrives containing the word "FREE." What is the probability it is spam?
What we know:
- P(spam) = 0.40
- P("FREE" | spam) = 0.80
- P("FREE" | not spam) = 0.05
P("FREE") overall:
= (0.80 × 0.40) + (0.05 × 0.60) = 0.32 + 0.03 = 0.35
Apply Bayes' Theorem:
P(spam | "FREE") = (0.80 × 0.40) ÷ 0.35 = 0.32 ÷ 0.35 = 0.914, or about 91.4%.
An email with "FREE" has about a 91% chance of being spam. Real spam filters use this exact logic, but with thousands of words and features instead of just one.
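That last point can be sketched as a toy filter. The word-rate table below is hypothetical except for "free," which uses the numbers above, and the multiplication step makes the classic "naive" assumption that words appear independently of one another:

```python
# (P(word | spam), P(word | legit)) - "free" from the example above;
# "meeting" and "invoice" are made-up illustrative rates.
WORD_RATES = {
    "free": (0.80, 0.05),
    "meeting": (0.02, 0.30),
    "invoice": (0.10, 0.20),
}

def spam_probability(words, p_spam=0.40):
    spam_score = p_spam
    legit_score = 1 - p_spam
    for word in words:
        if word in WORD_RATES:
            p_if_spam, p_if_legit = WORD_RATES[word]
            spam_score *= p_if_spam
            legit_score *= p_if_legit
    return spam_score / (spam_score + legit_score)

print(round(spam_probability(["free"]), 3))             # 0.914
print(round(spam_probability(["free", "meeting"]), 3))  # 0.416
```

Notice how adding a word that is common in legitimate mail ("meeting") pulls the probability back down: each word is one more piece of evidence, and the updates compound.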
Example: Cooking and Smoke Alarms
Your smoke alarm goes off. You know from experience:
- There is a real fire in your home about once every 10 years, or roughly a 0.03% chance on any given day.
- When there IS a fire, the alarm goes off 99% of the time.
- When there is NO fire, the alarm still goes off about 2% of the time (burnt toast, steam from a shower, etc.).
The alarm just went off. Is there a fire?
P(fire | alarm) = (0.99 × 0.0003) ÷ [(0.99 × 0.0003) + (0.02 × 0.9997)]
= 0.000297 ÷ (0.000297 + 0.019994) = 0.000297 ÷ 0.020291 = 0.0146, or about 1.5%.
There is about a 1.5% chance of an actual fire. That is low - but not zero. You should still check! The point is that most alarms are false alarms, because real fires are very rare compared to burnt toast.
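A quick check of that arithmetic in Python:

```python
p_fire = 0.0003          # daily base rate: roughly one fire per decade
p_alarm_if_fire = 0.99   # alarm sounds during a real fire
p_alarm_if_none = 0.02   # false-alarm rate (toast, steam, etc.)

# P(alarm) via total probability, then Bayes' Theorem
p_alarm = p_alarm_if_fire * p_fire + p_alarm_if_none * (1 - p_fire)
p_fire_given_alarm = p_alarm_if_fire * p_fire / p_alarm
print(round(p_fire_given_alarm, 4))  # 0.0146
```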
Why the Base Rate Matters So Much
Across all these examples, one theme keeps appearing: the base rate - how common something is before any evidence - has an enormous impact on the final answer.
When the thing you are testing for is rare (a disease that affects 0.5% of people, or a fire that happens once a decade), even very accurate tests produce mostly false positives. This is not a failure of the test; it is basic math.
Conversely, when something is common (spam making up 40% of email), even moderate evidence can push the probability very high.
The practical lesson: always ask "how common is this in the first place?" before interpreting any test result, warning, or indicator.
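To see the base rate's effect directly, here is a small sweep that holds the disease test's quality fixed (95% sensitivity, 3% false positives) and varies only how common the condition is:

```python
def posterior(base_rate, sensitivity=0.95, false_pos=0.03):
    """P(condition | positive result) for a given base rate."""
    hit = sensitivity * base_rate
    return hit / (hit + false_pos * (1 - base_rate))

for base in (0.000001, 0.005, 0.05, 0.40):
    print(f"base rate {base:>9.6f} -> posterior {posterior(base):.3f}")
```

With these numbers, a one-in-a-million condition yields a posterior near zero, the 0.5% disease gives the familiar 13.7%, and a 40% base rate (like the spam prior) pushes the posterior above 95% - same test, wildly different conclusions.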
Bayes' Theorem as a Way of Thinking
Beyond the math, Bayes' Theorem offers a valuable way of thinking about the world:
- Start with what you know. Before getting new evidence, what is your best estimate? This is your prior.
- Weigh new evidence carefully. How strongly does this evidence point one way or the other?
- Update proportionally. Strong evidence should shift your beliefs a lot. Weak evidence should shift them a little.
- Be willing to update again. Every new piece of evidence is a chance to refine your beliefs further.
This approach - starting with a belief, gathering evidence, and updating - is the backbone of scientific thinking, medical diagnosis, criminal investigation, and good decision-making in general.
Common Pitfalls
- Ignoring the prior. A "99% accurate" test sounds impressive, but if the condition is one-in-a-million, most positives will still be false.
- Confusing P(B | A) with P(A | B). The probability the test is positive given you are sick is NOT the same as the probability you are sick given a positive test.
- Not updating enough - or too much. One piece of weak evidence should not completely overturn a strong prior. But very strong evidence should cause a big update, even if it contradicts your initial belief.
Bayes' Theorem provides a logical way to update your beliefs when new evidence arrives. The formula P(A | B) = P(B | A) × P(A) ÷ P(B) combines your prior belief with the strength of the evidence to produce an updated probability. The base rate - how common something is to begin with - is crucial and often overlooked. Whether you are interpreting medical test results, filtering spam, or making everyday decisions, thinking like Bayes means starting with what you know, weighing the evidence, and updating your beliefs step by step.