A Trend That Flips
Imagine you are comparing two hospitals. Hospital A has a higher survival rate than Hospital B for heart surgery patients. Hospital A also has a higher survival rate for general surgery patients. So Hospital A must be better overall, right? Not necessarily. When you combine the data, Hospital B can actually have the higher overall survival rate. This is Simpson's Paradox: a trend that appears in separate groups reverses or disappears when the groups are combined.
It sounds impossible, but it happens all the time in real data. The paradox arises because of an imbalance in how cases are distributed across groups. Understanding it is critical for anyone who works with data or reads research, because aggregated numbers can tell a completely misleading story.
The Berkeley Admissions Case
The most famous example of Simpson's Paradox comes from the University of California, Berkeley. In 1973, overall graduate admissions data showed that 44% of male applicants were admitted compared to only 35% of female applicants. This looked like clear evidence of gender discrimination against women.
But when researchers examined each department individually, they found something startling. In most departments, women were admitted at equal or even higher rates than men. There was no department-level bias against women. So how could the overall numbers show such a gap?
The answer was that women disproportionately applied to the most competitive departments, ones with low admission rates for everyone. Men tended to apply to less competitive departments with higher admission rates. When you combined all departments together, the differences in where men and women applied created the illusion of a bias that did not exist at the department level.
In short, women had comparable or better admission rates within individual departments. The overall gap was driven entirely by the composition of who applied where.
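A tiny script makes the mechanism concrete. The counts below are invented for illustration (they are not the real Berkeley figures): within each department the admission rate is identical for men and women, yet the overall rates differ sharply because the two groups apply to different departments.

```python
# Hypothetical (admitted, applied) counts per department.
# Rates are equal within each department: 80% in "easy", 20% in "hard".
applications = {
    "men":   {"easy": (80, 100), "hard": (4, 20)},
    "women": {"easy": (16, 20),  "hard": (20, 100)},
}

for group, depts in applications.items():
    admitted = sum(a for a, _ in depts.values())
    applied = sum(n for _, n in depts.values())
    by_dept = {d: f"{a / n:.0%}" for d, (a, n) in depts.items()}
    print(group, by_dept, f"overall: {admitted / applied:.0%}")
# men   -> 80% easy, 20% hard, overall: 70%
# women -> 80% easy, 20% hard, overall: 30%
```

The 40-point overall gap appears even though no department treats the groups differently; it comes purely from where each group's applications are concentrated.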
Why It Happens: Lurking Variables
Simpson's Paradox occurs because of a lurking variable, also called a confounding variable, that changes the mix of data between groups. In the Berkeley example, the lurking variable was department choice. It was related to both gender (women chose different departments) and the outcome (some departments were harder to get into).
Think of it this way: if you mix together data from very different situations, the proportions of each situation in each group can dominate the results. A small group with a high rate and a large group with a low rate will produce a combined rate that is pulled toward the larger group. If two groups have different proportions of "easy" and "hard" cases, their combined rates can flip.
A concrete example: a company runs a training program in two divisions. In Division X, the harder setting, the program improves performance for 80% of participants (40 out of 50). In Division Y, it improves performance for 90% (9 out of 10). The overall improvement rate is 49 out of 60, about 82%. A competitor's program improves 75% in Division X (15 out of 20) and 87.5% in Division Y (35 out of 40), for an overall rate of 50 out of 60, about 83%. The competitor looks better overall, yet the first company's program had the higher rate in both divisions. The flip occurs because the first company put most of its participants through the harder division, which dragged its combined rate down.
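A short script verifies a two-division version of this flip. The counts are illustrative, chosen so that one program wins in every division but loses in the aggregate:

```python
# (improved, participants) per division for each company's program.
company_a = {"X": (40, 50), "Y": (9, 10)}    # most of A's people go through harder division X
company_b = {"X": (15, 20), "Y": (35, 40)}   # most of B's people go through easier division Y

def rates(divisions):
    """Return (per-division rates, overall pooled rate)."""
    per_division = {d: imp / n for d, (imp, n) in divisions.items()}
    overall = (sum(i for i, _ in divisions.values())
               / sum(n for _, n in divisions.values()))
    return per_division, overall

a_div, a_all = rates(company_a)
b_div, b_all = rates(company_b)
assert all(a_div[d] > b_div[d] for d in a_div)  # A wins every division...
assert a_all < b_all                             # ...but loses overall
```

The pooled rate is a weighted average of the division rates, with weights set by headcount, which is exactly where the reversal sneaks in.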
Simpson's Paradox in Medicine and Business
In medicine, Simpson's Paradox can affect treatment comparisons. A study might show that Treatment A has better outcomes than Treatment B overall, but when you separate patients by severity, Treatment B is actually better for both mild and severe cases. This can happen if Treatment B is disproportionately given to the most severe patients, pulling down its overall average.
In business, you might see it in conversion rates. A marketing channel might have a lower overall conversion rate but outperform in every customer segment. The difference arises because that channel brings in more customers from hard-to-convert segments. Making decisions based on the aggregated number could lead you to cut your best-performing channel.
Batting averages in baseball have also famously demonstrated the paradox. A player can have a higher batting average than another player in each individual year but a lower average when the years are combined, because the number of at-bats in each year differs dramatically.
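The batting-average version is easy to reproduce with made-up numbers (the hits and at-bats below are illustrative, not any real player's record):

```python
# (hits, at_bats) per season. Player A out-hits player B in each season,
# but the seasons with many at-bats dominate the combined average.
player_a = {"year1": (4, 10),   "year2": (25, 100)}
player_b = {"year1": (35, 100), "year2": (2, 10)}

def combined_average(seasons):
    hits = sum(h for h, _ in seasons.values())
    at_bats = sum(ab for _, ab in seasons.values())
    return hits / at_bats

for year in player_a:
    ha, na = player_a[year]
    hb, nb = player_b[year]
    assert ha / na > hb / nb            # A's average is higher each year...
assert combined_average(player_a) < combined_average(player_b)  # ...yet lower combined
```

Here A hits .400 and .250 against B's .350 and .200, but A's combined average (29/110, about .264) trails B's (37/110, about .336) because A's weaker season carries far more at-bats.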
How to Avoid Being Fooled
The key defense against Simpson's Paradox is to always consider whether subgroups exist that might tell a different story. When you see aggregated data, ask yourself: are there meaningful categories within this data? Could the mix of those categories differ between the groups being compared?
This does not mean you should always prefer the subgroup results. Sometimes the aggregated view is the right one. The correct approach depends on your specific question and what is causing the difference. If the lurking variable is a confounder you need to control for, then the subgroup analysis is more trustworthy. If the lurking variable reflects a genuine aspect of the comparison, the aggregate may be appropriate.
Whenever possible, look at the data both ways. If the aggregated and subgroup analyses agree, you can be more confident. If they disagree, dig deeper before drawing conclusions. The paradox is a powerful reminder that data summaries can hide as much as they reveal.
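One practical safeguard is a small routine that compares both views and flags disagreement. The sketch below is my own construction, not a library function; it takes per-subgroup (successes, total) counts for two cohorts and reports subgroups whose winner disagrees with the overall winner. The "mild"/"severe" counts are illustrative:

```python
def simpson_check(groups_a, groups_b):
    """Return the subgroups where the per-subgroup winner disagrees
    with the pooled (overall) winner. Each argument maps
    subgroup -> (successes, total)."""
    def overall(groups):
        return (sum(s for s, _ in groups.values())
                / sum(t for _, t in groups.values()))

    a_wins_overall = overall(groups_a) > overall(groups_b)
    conflicts = []
    for key in groups_a:
        sa, ta = groups_a[key]
        sb, tb = groups_b[key]
        if (sa / ta > sb / tb) != a_wins_overall:
            conflicts.append(key)
    return conflicts

# Illustrative counts: the treatment wins in both severity subgroups
# but loses on the pooled rate, because it was given mostly to severe cases.
treatment = {"mild": (81, 87),   "severe": (192, 263)}
control   = {"mild": (234, 270), "severe": (55, 80)}
print(simpson_check(treatment, control))  # -> ['mild', 'severe']
```

An empty result does not prove the comparison is safe, but a non-empty one is a clear signal to investigate the lurking variable before trusting the aggregate.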
Simpson's Paradox occurs when a trend that holds within every subgroup reverses when the groups are combined. It happens because a lurking variable changes the composition of data across groups. The antidote is to look at your data at multiple levels and always ask whether hidden subgroups could be driving the overall pattern. Aggregated data can tell a completely different story from the detailed view.