P-Values Explained

Difficulty: Intermediate · Reading Time: 12 minutes

The Most Misunderstood Number in Statistics

If you've ever read a science article, you've probably seen phrases like "p < 0.05" or "the result was statistically significant." Behind those phrases is a single number called a p-value. It's one of the most widely used - and most widely misunderstood - concepts in all of statistics.


Let's clear things up with plain language and a simple experiment.

Start with a Question

Imagine your friend says they can predict coin flips. You're skeptical. So you design a test: flip a coin 20 times and let them call each one. If they're just guessing, they should get about 10 out of 20 right - roughly 50%.

They get 14 out of 20 correct. Is that impressive, or could it easily happen by luck?

That's exactly the kind of question a p-value answers.

What a P-Value Actually Is

A p-value answers this specific question: If nothing special is actually happening, how likely is it that we'd see results at least this extreme?


In the coin flip example: if your friend has no real ability (the null hypothesis), what's the probability of getting 14 or more correct out of 20 just by luck?

The answer turns out to be about 0.058 - roughly a 6% chance. That's the p-value.
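That 0.058 figure comes straight from the binomial distribution: add up the probability of getting exactly 14, 15, ..., 20 correct out of 20 fair coin flips. A minimal sketch of that calculation (function name is my own choice, not from the article):

```python
from math import comb

def binom_tail(n, k, p=0.5):
    """Probability of k or more successes in n trials with success chance p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Probability of 14 or more correct calls out of 20 by pure guessing
p_value = binom_tail(20, 14)
print(round(p_value, 3))  # 0.058
```

Note the "or more" in the tail sum: the p-value counts every outcome at least as extreme as the one observed, not just the exact result.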

Example

Your friend guesses 14 out of 20 coin flips correctly.

Null hypothesis: They're just guessing (50% chance each time).

P-value: About 0.058 - meaning there's roughly a 6% chance of getting 14 or more right by pure luck.

Is 6% low enough to convince you they have real ability? That depends on your threshold. At the common 5% cutoff, you'd say "not quite enough evidence." If they got 15 right (p ≈ 0.02), you might be more convinced.

The 0.05 Threshold

In most fields of research, a p-value below 0.05 (5%) is considered "statistically significant." This means results at least this extreme would happen less than 5% of the time by pure chance, which is considered unlikely enough to take seriously.

Why 0.05? Honestly, it's somewhat arbitrary. The statistician Ronald Fisher suggested it in the 1920s as a convenient benchmark. It stuck, and now it's used almost everywhere. Some fields use stricter thresholds - particle physics uses 0.0000003 (about 1 in 3.5 million) to claim a discovery.

The key idea: a smaller p-value means stronger evidence against the null hypothesis. A p-value of 0.001 is much more convincing than 0.04.

What a P-Value Does NOT Mean

This is where most confusion lives. Here are the most common mistakes:


Mistake 1: "The p-value is the probability the null hypothesis is true."

No. A p-value of 0.03 does NOT mean there's a 3% chance that nothing is happening. The p-value assumes the null hypothesis is true and asks how surprising the data would be. It doesn't tell you the probability of any hypothesis being true or false.

Mistake 2: "A small p-value means the effect is large or important."

No. You can get a tiny p-value for a very small, practically meaningless effect - especially with a large sample size. If you survey a million people, even a trivial difference between two groups can produce a p-value of 0.0001. The effect might be real but too small to care about.
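This sample-size effect is easy to see with a back-of-the-envelope z-test. The sketch below uses hypothetical numbers (a 0.05-point difference between groups on a scale with standard deviation 10) and a standard two-sample z formula; it is an illustration, not a recommended analysis:

```python
from math import erfc, sqrt

def two_sample_z_pvalue(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means (known sd, equal groups)."""
    se = sd * sqrt(2 / n_per_group)   # standard error of the difference
    z = mean_diff / se
    return erfc(abs(z) / sqrt(2))     # two-sided normal tail probability

# The same trivial 0.05-point difference, two very different sample sizes:
small_sample = two_sample_z_pvalue(0.05, 10, n_per_group=100)
huge_sample = two_sample_z_pvalue(0.05, 10, n_per_group=1_000_000)

print(round(small_sample, 3))  # nowhere near significant
print(huge_sample)             # far below 0.001, despite the tiny effect
```

The effect never changed; only the sample size did. That is why reporting the effect size alongside the p-value matters.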

Mistake 3: "A p-value above 0.05 means there's no effect."

No. It means you didn't find strong enough evidence of an effect. That's different from proving there is none. Maybe you didn't have enough data. Maybe the effect is real but small. Absence of evidence is not evidence of absence.

Putting It in Everyday Terms

Think of the p-value as a "surprise meter." You start by assuming the boring explanation is true (nothing special is happening). Then you look at your data and ask: how surprised should I be?

  • P-value near 1.0: Not surprised at all. Your data is completely consistent with the boring explanation.
  • P-value around 0.5: Your data is unremarkable. Could easily happen by chance.
  • P-value around 0.05: Getting interesting. This would only happen about 1 in 20 times by chance.
  • P-value around 0.001: Very surprising. Only about 1 in 1,000 times by chance. Strong evidence something real is happening.

Why the 0.05 Cutoff Causes Problems

Treating 0.05 as a hard line creates odd situations. A study with p = 0.049 gets published as a "significant finding." A study with p = 0.051 gets treated as if nothing was found. But those two results are practically identical - the tiny difference could come down to one extra person in the study.

Many statisticians now argue we should stop treating 0.05 as a magic threshold. Instead, they suggest reporting the actual p-value and letting readers judge the strength of evidence for themselves.

Example

Two researchers study whether a certain exercise routine lowers blood pressure.

Researcher A finds p = 0.048 and writes: "The exercise significantly lowered blood pressure."

Researcher B finds p = 0.052 and writes: "The exercise had no significant effect on blood pressure."

Their results are almost identical! But because one crossed the 0.05 line and the other didn't, the conclusions sound completely different. This is why looking at the actual numbers - not just "significant or not" - matters so much.

P-Values in the Real World

P-values appear in medical studies, business experiments, social science research, and news headlines. When you see them, ask yourself:

  • How small is the p-value? (Smaller = stronger evidence)
  • How big is the actual effect? (A real but tiny effect might not matter)
  • How large was the sample? (Huge samples can make tiny effects "significant")
  • Was the study well designed? (A p-value from a poorly designed study means little)

Key Takeaway

A p-value tells you how surprising your data would be if nothing special were happening. A small p-value (typically below 0.05) suggests the data is unlikely to have occurred by chance alone. But a p-value is NOT the probability that a hypothesis is true, and a "significant" result doesn't automatically mean the finding is important or large. Always look at the size of the effect and the quality of the study alongside the p-value.