Non-Parametric Tests

Difficulty: Intermediate · Reading time: 12 minutes

When Normal Assumptions Fail

Many of the most popular statistical tests, like the t-test and ANOVA, assume that your data comes from a normal (bell-shaped) distribution. They also assume that the data is measured on an interval or ratio scale and that variances are roughly equal across groups. These assumptions work well much of the time, but what happens when they do not hold?

Real-world data is often skewed, has outliers, or comes in the form of ranks or ordinal categories. Satisfaction ratings on a 1-to-5 scale, income data with extreme high earners, or response times with a long right tail all violate normality assumptions. Applying a t-test to heavily skewed data can give you misleading p-values and unreliable conclusions. Non-parametric tests provide a robust alternative.

[Dot plot: right-skewed sample with a few extreme values in the tail]

Look at the dot plot above. This data has a clear right skew with a few extreme values pulling the tail out. A t-test on this kind of data could be unreliable. Non-parametric methods handle this gracefully because they work with ranks rather than raw values, making them resistant to outliers and skew.

The Rank-Based Approach

The central idea behind most non-parametric tests is simple: instead of analyzing the actual data values, you convert them to ranks. The smallest value gets rank 1, the next smallest gets rank 2, and so on. Then you perform your analysis on the ranks.

Why does this work? Ranks preserve the order of your data without being affected by how far apart the values are. Whether your highest value is 50 or 5,000, it still gets the highest rank. This makes rank-based tests insensitive to outliers and distributional assumptions. The trade-off is that you lose some information by discarding the actual distances between values, which is why non-parametric tests are generally less powerful than their parametric counterparts when the assumptions of the parametric test are actually met.
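The ranking step itself is easy to see in code. Here is a minimal sketch using SciPy's `rankdata`, with made-up values chosen to include one extreme outlier:

```python
from scipy.stats import rankdata

# One extreme outlier (5000) dominates the raw values...
values = [12, 15, 18, 22, 5000]

# ...but after ranking, it is simply "the largest" and nothing more
ranks = rankdata(values)
print(ranks)  # [1. 2. 3. 4. 5.]

# Tied values receive the average of the ranks they would occupy
print(rankdata([7, 7, 10]))  # [1.5 1.5 3. ]
```

Note that replacing 5000 with 50 would produce exactly the same ranks, which is precisely why rank-based tests are robust to outliers.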

Mann-Whitney U Test

The Mann-Whitney U test (also called the Wilcoxon rank-sum test) is the non-parametric alternative to the independent samples t-test. Use it when you want to compare two independent groups but your data is not normally distributed, your sample is small, or your data is ordinal.

Example

A restaurant wants to compare customer satisfaction ratings (on a 1-to-10 scale) between its lunch and dinner service. The ratings are not normally distributed and the scale is arguably ordinal. A Mann-Whitney U test ranks all ratings together regardless of group, then checks whether one group's ranks tend to be higher. If lunch customers consistently get higher ranks than dinner customers, the test will show a significant difference.

[Chart: lunch median = 8, dinner median = 6]

The Mann-Whitney test actually tests whether one group tends to produce larger values than the other. It is often described as comparing medians, which is a useful simplification, though technically it compares the entire distributions. It is one of the most commonly used non-parametric tests in medical and social science research.
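In SciPy this is `scipy.stats.mannwhitneyu`. The ratings below are invented to mirror the restaurant example, with lunch scores tending higher:

```python
from scipy.stats import mannwhitneyu

# Hypothetical satisfaction ratings on a 1-to-10 scale
lunch  = [8, 9, 7, 8, 9, 6, 8, 7]
dinner = [5, 6, 7, 5, 6, 4, 6, 5]

# Ranks both groups together and asks whether one group's
# ranks tend to be systematically higher
stat, p = mannwhitneyu(lunch, dinner, alternative="two-sided")
print(f"U = {stat}, p = {p:.4f}")
```

A p-value below 0.05 here would indicate that one service's ratings tend to rank higher than the other's, without assuming anything about the shape of the rating distributions.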

Wilcoxon Signed-Rank Test

The Wilcoxon signed-rank test is the non-parametric alternative to the paired samples t-test. Use it when you have two related measurements from the same subjects, like before-and-after scores, but the differences are not normally distributed.

The test works by computing the difference for each pair, ranking the absolute differences, and then comparing the sum of ranks for positive differences against the sum for negative differences. If a treatment truly has an effect, you would expect the positive (or negative) differences to have systematically higher ranks.

For example, if you measure pain levels in 20 patients before and after a new therapy, and the improvements are not symmetrically distributed, the Wilcoxon signed-rank test will give you a more reliable answer than a paired t-test. It is particularly common in clinical studies with small samples where normality cannot be verified.
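A sketch of that scenario with SciPy's `wilcoxon`, using invented pain scores for ten patients (the document's example uses 20; ten keeps the sketch short):

```python
from scipy.stats import wilcoxon

# Hypothetical pain scores (0-10) before and after therapy
before = [7, 6, 8, 5, 7, 9, 6, 8, 7, 5]
after  = [4, 5, 5, 3, 6, 6, 5, 5, 4, 4]

# Internally: computes the paired differences, ranks their absolute
# values, and compares positive vs. negative rank sums
stat, p = wilcoxon(before, after)
print(f"W = {stat}, p = {p:.4f}")
```

By default SciPy discards pairs whose difference is exactly zero before ranking, which is worth knowing when many patients show no change.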

Kruskal-Wallis Test

The Kruskal-Wallis test extends the Mann-Whitney approach to three or more independent groups. It is the non-parametric alternative to one-way ANOVA. All observations from all groups are ranked together, and the test checks whether the average ranks differ significantly across groups.


Like ANOVA, a significant Kruskal-Wallis result tells you that at least one group differs from the others, but it does not tell you which one. You would then use a post-hoc test (such as Dunn's test) to make pairwise comparisons.

Example

A company tests three different website designs and collects user engagement scores. The scores are heavily skewed because a few users spend much more time than others. A Kruskal-Wallis test compares the three designs without requiring the engagement scores to follow a normal distribution. If the result is significant, the company follows up with pairwise comparisons to identify which design outperformed the others.
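The website-design comparison might look like this with SciPy's `kruskal`, using invented engagement scores skewed by a few heavy users:

```python
from scipy.stats import kruskal

# Hypothetical engagement scores (minutes), right-skewed by a few heavy users
design_a = [2, 3, 3, 4, 5, 30]
design_b = [5, 6, 7, 8, 9, 45]
design_c = [1, 2, 2, 3, 3, 20]

# Ranks all 18 observations together, then compares average ranks
stat, p = kruskal(design_a, design_b, design_c)
print(f"H = {stat:.2f}, p = {p:.4f}")
```

A significant result here only says the designs differ somewhere; the pairwise follow-up (such as Dunn's test) is not in SciPy itself and would come from a separate package.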

When to Go Non-Parametric

Use non-parametric tests when your data is ordinal (like Likert scale ratings), when your sample size is very small (under 20-30 per group), when your data is clearly skewed or contains influential outliers, or when the assumptions of the parametric equivalent cannot be satisfied. They are also the right choice when you are analyzing ranks directly, such as preferences or rankings given by judges.

Do not use non-parametric tests simply because they seem safer. When your data reasonably meets parametric assumptions, parametric tests are more powerful, meaning they are better at detecting real effects. The ideal approach is to check your assumptions first (using histograms, normality tests, or Q-Q plots) and then choose the appropriate test.
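The assumption check itself can be as simple as a Shapiro-Wilk normality test, sketched here on an invented sample with a long right tail:

```python
from scipy.stats import shapiro

# Hypothetical response-time-like data with a long right tail
sample = [1, 1, 2, 2, 3, 3, 4, 5, 7, 10, 15, 25, 50, 100]

stat, p = shapiro(sample)
if p < 0.05:
    print("Normality rejected -- prefer a non-parametric test")
else:
    print("No evidence against normality -- a parametric test is reasonable")
```

A formal test like this is best read alongside a histogram or Q-Q plot, since with very small samples it may fail to detect real non-normality, and with very large samples it flags even trivial departures.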

In practice, many researchers report both parametric and non-parametric results when assumptions are borderline. If both tests lead to the same conclusion, you can be more confident in the finding. If they disagree, the non-parametric result is generally considered more trustworthy because it makes fewer assumptions.

Key Takeaway

Non-parametric tests are your safety net when data does not follow a normal distribution, contains outliers, or is measured on an ordinal scale. The Mann-Whitney U compares two independent groups, the Wilcoxon signed-rank compares paired measurements, and the Kruskal-Wallis compares three or more groups. They work by analyzing ranks instead of raw values, making them robust but slightly less powerful than parametric tests when normality holds.