Percentiles & Box Plots

Difficulty: Beginner
Reading Time: 10 minutes

What Are Percentiles?

A percentile tells you what percentage of values in a dataset fall below a given point. If your test score is at the 85th percentile, it means you scored higher than 85% of test-takers. It does not mean you got 85% of the questions right -- percentiles describe your rank relative to everyone else, not your absolute performance.
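The definition above translates directly into code. This is a minimal sketch that assumes the "strictly below" convention used in this lesson; some textbooks also count half of any tied values, which changes the result slightly when values repeat. The function name `percentile_rank` and the test scores are hypothetical, chosen only for illustration.

```python
def percentile_rank(data, x):
    """Percentage of values in `data` that fall strictly below `x`."""
    if not data:
        raise ValueError("data must not be empty")
    below = sum(1 for v in data if v < x)  # count values ranked below x
    return 100 * below / len(data)


# Hypothetical test scores for 20 students (illustrative only).
scores = [55, 61, 64, 68, 70, 72, 73, 75, 77, 78,
          80, 81, 83, 84, 86, 88, 90, 92, 95, 98]

print(percentile_rank(scores, 92))  # 17 of 20 scores are below 92 -> 85.0
```

A score of 92 here sits at the 85th percentile because 17 of the 20 scores fall below it, regardless of how many questions that student answered correctly.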

Percentiles are used everywhere. Pediatricians track children's height and weight using percentile charts. Standardized tests like the SAT and GRE report scores as percentiles. Salary surveys describe compensation in percentiles so companies can see where they stand relative to the market.

The most commonly referenced percentiles are the quartiles, which split sorted data into four parts, each holding about a quarter of the values. The 25th percentile is called Q1 (the first quartile), the 50th percentile is Q2 (the median), and the 75th percentile is Q3 (the third quartile). Together with the minimum and maximum, these five values form the five-number summary, a compact snapshot of an entire dataset.
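Python's standard library (3.8 and later) provides `statistics.quantiles`, which returns the cut points that divide data into n groups; with `n=4` it returns Q1, the median, and Q3. A quick sketch on the arbitrary values 1 through 8 shows a detail worth knowing: there is more than one accepted way to interpolate between data points, so different tools can report slightly different quartiles for the same data.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8]

# n=4 asks for the three cut points that form four groups: Q1, Q2 (median), Q3.
exclusive = statistics.quantiles(data, n=4)                      # default method
inclusive = statistics.quantiles(data, n=4, method="inclusive")

print(exclusive)  # [2.25, 4.5, 6.75]
print(inclusive)  # [2.75, 4.5, 6.25]
```

Both methods agree on the median but not on Q1 and Q3. The common hand method of taking the median of each half gives yet another pair, 2.5 and 6.5, for these eight values. None of these is wrong; they are different conventions, and the differences usually matter little for large datasets.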

[Dot plot: the example data values on an axis running from 12 to 55]

In the dot plot above, you can see how most values cluster in the 20s and 30s, with a few lower values and one high outlier at 55. Percentiles help us describe this distribution concisely without needing to list every data point.

The Five-Number Summary

The five-number summary consists of five values: the minimum, Q1, median, Q3, and maximum. These five numbers tell you where the data starts, where the middle 50% sits, and where the data ends.

Example

Consider the daily tips earned by a waiter over 20 shifts: $12, $15, $17, $19, $21, $22, $23, $24, $25, $26, $27, $28, $29, $30, $31, $33, $35, $38, $42, $55. With 20 values, the median is the average of the 10th and 11th values ($26 and $27), or $26.50. Q1 is the median of the lower ten values, the average of $21 and $22, and Q3 is the median of the upper ten, the average of $31 and $33. The five-number summary is therefore: Minimum = $12, Q1 = $21.50, Median = $26.50, Q3 = $32, Maximum = $55. At a glance, you can see that the middle 50% of tips falls between $21.50 and $32, the typical tip is around $26-$27, and there is one unusually large tip day at $55.

The Interquartile Range (IQR)

The interquartile range is simply Q3 minus Q1. It measures the spread of the middle 50% of your data, ignoring the extremes. In the waiter example, IQR = $32 - $21.50 = $10.50.

The IQR is a more robust measure of spread than the range (maximum minus minimum) because it ignores the lowest and highest quarters of the data, so a few extreme values barely move it. The waiter's range is $55 - $12 = $43, which is heavily influenced by that one great tip day. The IQR of $10.50 gives a more accurate picture of typical day-to-day variation.

The IQR is also used to identify outliers. A common rule of thumb says that any value below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR is a potential outlier. In the waiter example, 1.5 * $10.50 = $15.75, so the lower fence is $21.50 - $15.75 = $5.75 and the upper fence is $32 + $15.75 = $47.75. No tip falls below $5.75, but the $55 tip day exceeds the upper fence, so the rule flags it as an outlier.
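The waiter calculations can be reproduced in a few lines of Python. This is a sketch that assumes the median-of-halves method for quartiles and an even number of values, so the list splits cleanly into two halves of ten; software that interpolates quartiles differently (such as `statistics.quantiles`) will report slightly different numbers.

```python
from statistics import median

# The waiter's daily tips (in dollars) over 20 shifts.
tips = [12, 15, 17, 19, 21, 22, 23, 24, 25, 26,
        27, 28, 29, 30, 31, 33, 35, 38, 42, 55]

values = sorted(tips)
half = len(values) // 2            # assumes an even count: two halves of 10 values
q1 = median(values[:half])         # median of the lower half
q3 = median(values[half:])         # median of the upper half
summary = (values[0], q1, median(values), q3, values[-1])

iqr = q3 - q1                      # spread of the middle 50%
data_range = values[-1] - values[0]

# Rule of thumb: flag values more than 1.5 * IQR below Q1 or above Q3.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(summary)                     # (12, 21.5, 26.5, 32.0, 55)
print(iqr, data_range)             # 10.5 43
print(lower_fence, upper_fence)    # 5.75 47.75
print(outliers)                    # [55]
```

Comparing `iqr` (10.5) with `data_range` (43) makes the robustness point concrete: the single $55 day more than triples the range but has no effect on the IQR.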

Reading a Box Plot

A box plot (also called a box-and-whisker plot) is the visual representation of the five-number summary. The box stretches from Q1 to Q3, with a line inside marking the median. "Whiskers" extend from the box to the smallest and largest non-outlier values. Any outliers appear as individual dots beyond the whiskers.
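Where the whiskers stop can also be sketched in code. This assumes the common convention described here (whiskers end at the most extreme values still inside the 1.5 * IQR fences) together with median-of-halves quartiles; `box_plot_elements` is a hypothetical helper name, and plotting libraries may compute quartiles slightly differently.

```python
from statistics import median


def box_plot_elements(data, k=1.5):
    """Return the pieces a box plot draws: box, whisker ends, and outliers.

    Quartiles use the median-of-halves method; whiskers stop at the most
    extreme values still inside the k * IQR fences.
    """
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    q1 = median(values[:half])
    q3 = median(values[half + n % 2:])   # skip the middle value when n is odd
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "box": (q1, median(values), q3),
        "whiskers": (inside[0], inside[-1]),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


tips = [12, 15, 17, 19, 21, 22, 23, 24, 25, 26,
        27, 28, 29, 30, 31, 33, 35, 38, 42, 55]
print(box_plot_elements(tips))
# {'box': (21.5, 26.5, 32.0), 'whiskers': (12, 42), 'outliers': [55]}
```

For the tips, the upper whisker stops at $42, the largest tip inside the fences, while the $55 maximum is drawn as a separate outlier dot.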

Box plots are especially useful for comparing multiple groups side by side. If you wanted to compare tips across three different restaurants, three box plots placed next to each other would instantly show which restaurant has higher typical tips, which has more variation, and which has more outliers.

[Bar chart: five-number summary of the tips data, with bars labeled Min, Q1, Median, Q3, Max]

The bar chart above shows the five-number summary values as bars so you can compare their relative positions. Notice that the gap between Q3 and the maximum is much wider than the gap between the minimum and Q1. This asymmetry suggests the data is right-skewed, with a long tail toward higher values.

What Box Plots Reveal About Shape

Box plots can hint at the skewness of a distribution. If the median line sits near the center of the box and the whiskers are roughly equal in length, the data is roughly symmetric. If the median is closer to Q1 and the upper whisker is longer, the data is right-skewed (a long tail of high values). If the median is closer to Q3 and the lower whisker is longer, the data is left-skewed.
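The shape cues above can be turned into a rough check. In this sketch, `skew_signals` is a hypothetical helper, and the 1.25 tolerance for calling one side "longer" is an arbitrary assumption, not a standard cutoff; reading a real box plot remains a judgment call.

```python
def skew_signals(low, q1, med, q3, high, tolerance=1.25):
    """Compare the two halves of the box and the two tails.

    `low` and `high` are the whisker ends (or the min and max). `tolerance`
    is an arbitrary cutoff: one side must exceed the other by more than
    this factor to count as longer.
    """
    def compare(lower_len, upper_len):
        if upper_len > tolerance * lower_len:
            return "upper longer (right-skew cue)"
        if lower_len > tolerance * upper_len:
            return "lower longer (left-skew cue)"
        return "balanced"

    return {
        "box halves": compare(med - q1, q3 - med),
        "tails": compare(q1 - low, high - q3),
    }


# Tips example: min 12, Q1 21.5, median 26.5, Q3 32, max 55 (median-of-halves quartiles).
print(skew_signals(12, 21.5, 26.5, 32, 55))
# {'box halves': 'balanced', 'tails': 'upper longer (right-skew cue)'}
```

For the tips, the median sits almost in the middle of the box, so the right skew shows up only in the long stretch from Q3 to the $55 maximum.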

For instance, income data almost always produces a right-skewed box plot: the median is low in the box, the upper whisker is long, and there are many outliers on the high end. Exam scores in a well-designed course often produce a left-skewed box plot: most students do well, but a few stragglers pull the lower whisker down.

Box plots sacrifice some detail compared to histograms -- you cannot see the exact shape of the distribution or identify multiple peaks. But they excel at compact comparison and outlier detection, which is why they are a staple in exploratory data analysis.

Key Takeaway

Percentiles rank values relative to the rest of the data, with the quartiles (Q1, median, Q3) as the most important landmarks. The five-number summary gives a concise snapshot of any dataset, and the IQR measures spread in a way that resists outliers. Box plots turn this summary into a visual that reveals center, spread, skewness, and outliers at a glance, making them ideal for quick comparison across multiple groups.