Statistical vs Practical Significance

Difficulty: Beginner | Reading Time: 10 minutes

What Does "Significant" Really Mean?

When a researcher says a result is "statistically significant," they mean that the observed effect is unlikely to have occurred by pure chance. Specifically, the probability of seeing a result at least as extreme as the one observed, if there were truly no effect, is small, typically less than 5%. That probability is the p-value.
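
To make that definition concrete, here is a minimal simulation sketch in Python (using NumPy, with made-up numbers, not data from any real study): pretend there is truly no effect, and count how often chance alone produces a difference at least as large as the one we happened to observe.

```python
import numpy as np

rng = np.random.default_rng(42)

observed_diff = 3.0   # hypothetical difference we happened to observe, in points
n, trials = 30, 10_000

extreme = 0
for _ in range(trials):
    # Both groups are drawn from the SAME distribution: a world with no real effect.
    a = rng.normal(loc=50, scale=5, size=n)
    b = rng.normal(loc=50, scale=5, size=n)
    if abs(a.mean() - b.mean()) >= observed_diff:
        extreme += 1

print(f"simulated p-value: about {extreme / trials:.3f}")
# Typically prints roughly 0.02, i.e. below the usual 5% cutoff:
# a difference this large arises by chance only about 2% of the time.
```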

But here is the catch: statistically significant does not mean important, meaningful, or useful. A result can be statistically significant while being so small that nobody would care about it in practice. Understanding this distinction is one of the most valuable skills you can develop as a consumer of research.

When Tiny Effects Look Impressive

Imagine a company tests a new website layout and finds that it increases the average time users spend on the site by 0.8 seconds. With a sample of 500,000 visitors, this difference produces a p-value of 0.001, which is highly statistically significant. But does an extra 0.8 seconds of browsing actually matter for the business? Probably not. The effect is real in the statistical sense, but it has no practical value.

[Chart: average time on site. Old layout: 47.2 seconds. New layout: 48 seconds.]

This happens because statistical significance depends heavily on sample size. With a large enough sample, even the tiniest difference between two groups will produce a small p-value. The test becomes so sensitive that it picks up on noise-level effects that would be invisible and irrelevant in the real world.
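
A minimal Python sketch (using NumPy and SciPy, with invented numbers loosely matching the layout example) shows the effect of sample size directly: the same true 0.8-second difference is nowhere near significant with 500 visitors per group, but becomes highly significant with 500,000.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

for n in (500, 500_000):
    # Simulated time-on-site in seconds: a true 0.8-second difference,
    # buried in large person-to-person variation (standard deviation of 60 s).
    old_layout = rng.normal(loc=47.2, scale=60, size=n)
    new_layout = rng.normal(loc=48.0, scale=60, size=n)
    t_stat, p_value = stats.ttest_ind(new_layout, old_layout)
    print(f"n = {n:>7,}: observed difference = "
          f"{new_layout.mean() - old_layout.mean():+.2f} s, p = {p_value:.4f}")
# With 500 per group the p-value is large; with 500,000 it is tiny,
# even though the underlying difference never changed.
```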

Example

A pharmaceutical company tests a new blood pressure drug on 50,000 patients. The drug lowers systolic blood pressure by 1.2 mmHg compared to a placebo, and the result is statistically significant (p = 0.003). However, doctors consider a reduction of at least 5-10 mmHg to be clinically meaningful. A 1.2 mmHg drop would not change any treatment decision. The drug "works" statistically, but it is practically useless.

Practical Significance: Does It Actually Matter?

Practical significance asks a different question: is the effect large enough to matter in the real world? This depends on context, not just math. A 2% improvement in fuel efficiency might be practically significant for an airline that burns millions of gallons per year, but meaningless for someone who drives to the grocery store once a week.

Researchers use a concept called "effect size" to measure how large a difference actually is, independent of sample size. Common effect size measures include Cohen's d (for comparing two group means) and correlation coefficients. A small effect size paired with a small p-value should make you cautious: the effect is probably real, but it may not be worth acting on.
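
If you want to see how an effect size is computed, here is a minimal Python sketch (NumPy only, with simulated numbers) of Cohen's d for the blood pressure example: the difference in group means divided by the pooled standard deviation.

```python
import numpy as np

def cohens_d(group_a, group_b):
    """Standardized difference between two group means (pooled-SD version)."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Illustrative numbers: a 1.2 mmHg average drop against roughly 15 mmHg
# of patient-to-patient spread, as in the blood pressure example above.
rng = np.random.default_rng(1)
placebo = rng.normal(loc=130.0, scale=15, size=25_000)
drug = rng.normal(loc=128.8, scale=15, size=25_000)

print(f"Cohen's d = {cohens_d(placebo, drug):.2f}")
# Prints roughly 0.08 -- far below the conventional "small" threshold of 0.2,
# no matter how impressive the p-value looks.
```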

[Chart: 95% confidence intervals. Drug effect: 0.4 to 2 mmHg. Clinically meaningful range: 6.1 to 10.9 mmHg.]

Notice in the confidence intervals above how the drug's effect and its entire range of plausible values fall well below what doctors would consider a meaningful change. Even though we are confident the effect is not zero, it is still too small to matter.
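
The same point can be checked numerically. Below is a minimal Python sketch (NumPy only, with simulated data and an assumed 5 mmHg threshold) that builds an approximate 95% confidence interval for the difference in means and compares it with the clinical threshold rather than with zero.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated systolic blood pressure (mmHg) for two large groups,
# with a true drop of 1.2 mmHg in the drug group.
placebo = rng.normal(loc=130.0, scale=15, size=10_000)
drug = rng.normal(loc=128.8, scale=15, size=10_000)

diff = placebo.mean() - drug.mean()
se = np.sqrt(placebo.var(ddof=1) / len(placebo) + drug.var(ddof=1) / len(drug))
ci_low, ci_high = diff - 1.96 * se, diff + 1.96 * se   # approximate 95% interval

MEANINGFUL = 5.0   # assumed smallest clinically relevant reduction, in mmHg
print(f"difference = {diff:.2f} mmHg, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
print("excludes zero:", ci_low > 0)             # True: statistically significant
print("clinically meaningful:", ci_low >= MEANINGFUL)  # False: whole interval is below 5 mmHg
```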

How Sample Size Creates Confusion

Small samples have the opposite problem. With too few participants, a study may fail to detect a real and important effect simply because the sample was not large enough to produce a significant p-value. This is called low statistical power. A study of 20 people might find a large, practically meaningful difference but report it as "not statistically significant" because the sample was too small for the test to rule out chance.
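
A quick simulation makes low power visible. The sketch below (Python with NumPy and SciPy, invented numbers) repeatedly runs a tiny study of a genuinely large effect and counts how often the test reaches significance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

n_per_group, trials, alpha = 10, 2_000, 0.05
true_effect = 0.8   # a large effect in standard-deviation units (Cohen's d = 0.8)

significant = 0
for _ in range(trials):
    control = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    treated = rng.normal(loc=true_effect, scale=1.0, size=n_per_group)
    _, p = stats.ttest_ind(treated, control)
    if p < alpha:
        significant += 1

print(f"power with {n_per_group} per group: about {significant / trials:.0%}")
# Typically prints around 40%: the large, real effect is missed more often than not.
```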

This means you can be misled in both directions. Large samples can make trivial effects look significant, and small samples can make important effects look insignificant. Neither the p-value nor the sample size alone tells you whether a result matters. You need to look at the actual size of the effect and judge it against the context.

When Statistics Mislead: Real-World Traps

Headlines love to report statistically significant findings without mentioning effect size. "Study finds that eating chocolate is linked to lower stress!" may be based on a study where chocolate eaters scored 0.3 points lower on a 100-point stress scale. Technically true, practically meaningless.

Marketing teams exploit this too. "Clinically proven to improve skin hydration" might mean a moisturizer increased hydration by 2% compared to using nothing at all, tested on thousands of people. The claim is technically supported by a significant p-value, but the effect is invisible to anyone using the product.

To protect yourself, always ask: how big is the effect? Is it expressed in units you can understand? Would this difference change your behavior or decisions? If the study only reports a p-value without telling you the size of the effect, that is a red flag.

Key Takeaway

Statistical significance tells you whether an effect is likely real. Practical significance tells you whether it actually matters. A result can be statistically significant but too small to care about, especially with large samples. Always look at the size of the effect, not just the p-value, and ask yourself whether the difference would change any real-world decision.