You Can't Measure Everything
Imagine you want to know the average height of every adult in your country. To get a perfect answer, you'd need to measure every single adult. That could be tens or hundreds of millions of people. You'd need an army of helpers, years of time, and a mountain of money. By the time you finished, people would have grown, shrunk, or passed away. Your data would already be outdated.
This is why we use samples. Instead of measuring everyone, we measure a smaller group and use the results to draw conclusions about the larger group. This idea is one of the most powerful concepts in statistics.
Populations vs. Samples
A population is the entire group you're interested in studying. A sample is a smaller subset of that population that you actually collect data from.
Suppose you're making a big pot of soup. To check if it's seasoned properly, you stir it well and taste one spoonful. The entire pot is the population. The spoonful is the sample. You don't need to drink the whole pot to know if it needs more salt. One representative spoonful tells you what you need to know, but only if you stirred the pot first. If all the salt settled at the bottom, a spoonful from the top would be misleading.
The population doesn't always mean "all the people in a country." It means whatever complete group you're studying:
- If you want to know what students at your school think about lunch options, the population is all students at your school.
- If a factory wants to test the durability of its light bulbs, the population is every light bulb produced.
- If a doctor studies the effect of a medication on adults with high blood pressure, the population is all adults with high blood pressure.
Why We Sample
There are several practical reasons why studying the entire population is often impossible or impractical:
- Cost: Surveying millions of people is expensive. A well-designed sample of 1,000 people can give you remarkably accurate results for a fraction of the cost.
- Time: Collecting data from everyone takes too long. By the time you finish, the information may no longer be relevant.
- Impossibility: Some testing destroys the item being tested. A light bulb company can't test every bulb until it burns out and still have products to sell.
- Accessibility: You simply can't reach every member of some populations. You can't interview every fish in the ocean to study their feeding habits.
For example, suppose a company makes 100,000 batteries per month. Testing drains a battery completely, so the company can't test every one. Instead, it randomly selects 500 batteries and tests them. If 98% of the tested batteries meet quality standards, the company can be reasonably confident that about 98% of all 100,000 batteries do too.
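The battery example can be tried out in a few lines of Python. This is only a sketch: the production run is simulated, and the 98% "true" quality rate is an assumption built into the simulation (a real company would not know it in advance). The names `batteries` and `tested` are made up for this illustration.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

POPULATION_SIZE = 100_000
SAMPLE_SIZE = 500
TRUE_GOOD_RATE = 0.98  # assumed for the simulation; unknown in real life

# Simulated month of production: True means the battery meets the standard.
batteries = [random.random() < TRUE_GOOD_RATE for _ in range(POPULATION_SIZE)]

# Randomly choose 500 batteries to test (no battery is picked twice).
tested = random.sample(batteries, SAMPLE_SIZE)

sample_good_rate = sum(tested) / SAMPLE_SIZE
population_good_rate = sum(batteries) / POPULATION_SIZE

print(f"Good in sample:     {sample_good_rate:.1%}")
print(f"Good in population: {population_good_rate:.1%}")
```

The two percentages should land close together even though only 500 of the 100,000 batteries were tested.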
What Makes a Good Sample?
Not all samples are created equal. A bad sample gives you misleading results, no matter how large it is. The key quality of a good sample is that it's representative, meaning it reflects the characteristics of the population as a whole.
Representative Samples
A representative sample looks like a miniature version of the population. If 60% of the population is female, roughly 60% of your sample should be female. If the population includes people of all ages, your sample should too.
Suppose a polling company wants to predict how a country will vote. If it only surveys people in wealthy urban neighborhoods, its results will be skewed, because those people may have very different political views from people in rural areas or lower-income communities. A good election poll makes sure the sample includes people from different regions, income levels, age groups, and backgrounds, reflecting the actual voting population.
Bias in Sampling
Bias happens when your sample systematically differs from the population. Here are common ways this occurs:
- Convenience sampling: You survey whoever is easiest to reach. If you ask only your friends about a product, their answers won't be representative of all customers.
- Voluntary response: You put out a survey and wait for people to respond. People with strong opinions (very happy or very angry) are more likely to respond, skewing the results.
- Undercoverage: Part of the population has no chance of being selected. If you survey people by calling landlines, you'll miss everyone who only uses a mobile phone, a group that tends to be younger.
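To see how undercoverage can distort a result, here is a small simulation of a landline-only survey. Everything in it is invented for illustration: the population of 50,000 adults, the age range, and especially the assumption (in the hypothetical `landline_chance` function) that the chance of owning a landline rises steadily with age.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

def landline_chance(age):
    """Assumed chance of owning a landline: 10% at age 18, rising to 90% at 89."""
    return 0.1 + 0.8 * (age - 18) / 71

# Hypothetical population of 50,000 adults stored as (age, has_landline).
population = []
for _ in range(50_000):
    age = random.randint(18, 89)
    population.append((age, random.random() < landline_chance(age)))

def mean_age(people):
    """Average age of a list of (age, has_landline) pairs."""
    return sum(age for age, _ in people) / len(people)

# Undercoverage: only landline owners can end up in this sample.
landline_owners = [person for person in population if person[1]]
landline_sample = random.sample(landline_owners, 1_000)

# Random sampling: every adult can end up in this sample.
random_sample = random.sample(population, 1_000)

print(f"Population mean age:      {mean_age(population):.1f}")
print(f"Random sample mean age:   {mean_age(random_sample):.1f}")
print(f"Landline sample mean age: {mean_age(landline_sample):.1f}")
```

Under these assumptions the landline sample comes out noticeably older than the population, while the random sample stays close to it. Both samples have 1,000 people, so the difference comes from how they were chosen, not from their size.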
Random Sampling
The best way to get a representative sample is through random sampling. In its simplest form, every member of the population has an equal chance of being selected. Random doesn't mean haphazard or careless. It means deliberately using a process that removes human bias from the selection.
Think of it like a lottery. If every ticket has an equal chance of being drawn, the winning numbers aren't influenced by anyone's preferences or habits.
Types of Random Sampling
- Simple random sampling: Every individual has the same chance of being picked. It's like pulling names from a hat that contains everyone.
- Stratified sampling: You divide the population into groups (strata) based on a key characteristic (like age or income), then randomly sample from each group. This ensures each group is represented.
- Systematic sampling: You choose a random starting point and then pick every nth person from a list. For example, you might survey every 10th customer who enters a store.
- Cluster sampling: You divide the population into clusters (like neighborhoods or schools), randomly select some clusters, and then survey everyone within those chosen clusters.
Suppose a school wants to know if students are satisfied with lunch options. Using simple random sampling, the school assigns each of the 800 students a number and uses a random number generator to select 80 students. Using stratified sampling, it would randomly select students from every grade level in proportion to that grade's size. Using cluster sampling, it might randomly select 4 classrooms out of 30 and survey everyone in those rooms.
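The four sampling methods can be sketched with Python's built-in `random` module. The roster below is hypothetical: it assumes 800 students listed in grade order, four grade levels of 200 students each, and 30 classrooms of 26 or 27 students. None of those details come from a real school.

```python
import random
from collections import defaultdict

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical roster: 800 students, 4 grades of 200, 30 classrooms of 26-27.
students = [
    {"id": i, "grade": 9 + i // 200, "classroom": i * 30 // 800}
    for i in range(800)
]

# Simple random sampling: 80 students, each equally likely to be picked.
simple = random.sample(students, 80)

# Stratified sampling: randomly pick 10% of each grade level.
by_grade = defaultdict(list)
for student in students:
    by_grade[student["grade"]].append(student)
stratified = []
for group in by_grade.values():
    stratified.extend(random.sample(group, len(group) // 10))

# Systematic sampling: random starting point, then every 10th student.
start = random.randrange(10)
systematic = students[start::10]

# Cluster sampling: pick 4 of the 30 classrooms and survey everyone in them.
chosen_rooms = set(random.sample(range(30), 4))
cluster = [s for s in students if s["classroom"] in chosen_rooms]

for name, sample in [("simple", simple), ("stratified", stratified),
                     ("systematic", systematic), ("cluster", cluster)]:
    print(f"{name:>10}: {len(sample)} students")
```

Note that the cluster sample's size depends on which classrooms are drawn, while the other three methods return exactly 80 students here. Systematic sampling also depends on the order of the list: if the list repeats in a pattern that lines up with the step size, the sample can miss whole groups.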
Sample Size: How Big Is Big Enough?
A common question is "how many people do I need to survey?" The answer depends on several factors, but here are the basics:
- Bigger is generally better. Larger samples tend to produce more accurate results because random fluctuations balance out.
- But there are diminishing returns. Going from 100 to 1,000 people dramatically improves accuracy. Going from 10,000 to 11,000 barely makes a difference.
- The population size matters less than you think. A well-chosen sample of 1,000 people can accurately represent a city of 500,000 or a country of 50 million. What matters is how the sample is selected, not just its size relative to the population.
This might seem counterintuitive. How can 1,000 people represent millions? Think back to the soup example. Whether you have a small pot or a giant cauldron, one well-stirred spoonful tells you about the seasoning. The key isn't how much soup you taste; it's whether the soup was stirred properly.
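One way to put numbers on this is the standard margin-of-error approximation for a proportion, adjusted with the finite population correction. The formula is not derived in this lesson, so treat the sketch as an illustration under stated assumptions: simple random sampling, a 95% confidence level (z = 1.96), and the worst-case proportion p = 0.5. The function name `margin_of_error` is made up for this example.

```python
import math

def margin_of_error(sample_size, population_size, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion, with the
    finite population correction applied."""
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    correction = math.sqrt(
        (population_size - sample_size) / (population_size - 1)
    )
    return z * standard_error * correction

# Same sample of 1,000 people, three very different population sizes.
for population_size in (5_000, 500_000, 50_000_000):
    moe = margin_of_error(1_000, population_size)
    print(f"Population {population_size:>10,}: +/- {moe:.2%}")
```

For the city of 500,000 and the country of 50 million, the margin of error comes out at roughly 3.1 percentage points in both cases. The population size only starts to matter when the sample is a large fraction of it, as with the population of 5,000.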
Major national polls in the United States often survey around 1,000 to 1,500 people to predict the behavior of over 150 million voters. When done well with proper random sampling, these polls typically report a margin of error of about 3 to 4 percentage points. The secret isn't only the number of people surveyed; it's the method used to select them.
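A rough way to check figures like these is to simulate many polls and see how far they typically land from the truth. The sketch below assumes a hypothetical candidate with 52% true support and treats each respondent as an independent random draw, which is reasonable when the population is far larger than the sample. It ignores real-world problems such as nonresponse, so actual polls can be less accurate than this.

```python
import random

random.seed(2024)  # fixed seed so the sketch is repeatable

TRUE_SUPPORT = 0.52  # hypothetical share of voters backing a candidate
NUM_POLLS = 200      # simulated polls per sample size

def run_poll(n):
    """Share of n randomly chosen voters who back the candidate."""
    return sum(random.random() < TRUE_SUPPORT for _ in range(n)) / n

def typical_error(n):
    """Error size that 95% of the simulated polls stay within."""
    errors = sorted(abs(run_poll(n) - TRUE_SUPPORT) for _ in range(NUM_POLLS))
    return errors[NUM_POLLS * 95 // 100 - 1]

for n in (100, 1_000, 1_500, 10_000):
    print(f"n = {n:>6,}: 95% of polls within {typical_error(n):.1%}")
```

The output also shows the diminishing returns described earlier: the jump from 100 to 1,000 respondents shrinks the typical error far more than any later increase does.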
Parameters vs. Statistics
Here's a quick vocabulary note that will help in future lessons. A number that describes a population is called a parameter. A number that describes a sample is called a statistic.
For instance, the true average income of everyone in a city is a parameter (you'd need data from every person). The average income calculated from a survey of 500 residents is a statistic (calculated from a sample). We use the statistic to estimate the parameter.
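The income example can be sketched in code to make the distinction concrete. The city here is simulated: the 200,000 residents and the shape of the income distribution are assumptions chosen only for illustration. In real life the parameter is usually unknown, which is exactly why the statistic is needed.

```python
import random

random.seed(3)  # fixed seed so the sketch is repeatable

# Hypothetical city: 200,000 residents with right-skewed incomes.
incomes = [random.lognormvariate(10.8, 0.6) for _ in range(200_000)]

# Parameter: describes the whole population (needs data from everyone).
parameter = sum(incomes) / len(incomes)

# Statistic: describes a sample (here, a random survey of 500 residents).
survey = random.sample(incomes, 500)
statistic = sum(survey) / len(survey)

print(f"Parameter (population mean): {parameter:,.0f}")
print(f"Statistic (sample mean):     {statistic:,.0f}")
```

The statistic will not match the parameter exactly, and a different random survey would give a slightly different statistic. That variation from sample to sample is what the margin of error describes.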
Summary
A population is the complete group you want to study. A sample is a manageable subset of that group. We use samples because studying the entire population is usually too expensive, time-consuming, or impossible. The most important quality of a sample is that it's representative of the population, and random sampling is the best way to achieve that. Sample size matters, but how you select the sample matters even more. A small, well-chosen sample usually beats a large, biased one.