What is a chi-square test used for?

A chi-square test checks whether two categorical variables are associated, or whether observed frequencies match the frequencies you expected.

When should you use a chi-square test?

Use it when comparing frequencies or proportions of categorical data, such as survey responses or demographic distributions.

What are the assumptions of a chi-square test?

Observations must be independent, data must be categorical, and expected frequencies in each cell should generally be 5 or more.

What is the difference between chi-square goodness of fit and independence?

The goodness-of-fit test checks whether one categorical variable follows an expected distribution. The independence test checks whether two categorical variables are related.

Chi-Square Test

When Your Data Isn't Numbers

Not all data consists of measurements like height, weight, or test scores. Sometimes your data is about categories - things like yes or no, brand preferences, colors chosen, or types of food ordered. When you want to know whether two categorical variables are related, you need a different tool. That tool is the chi-square test (pronounced "kai-square").

The Core Question

The chi-square test answers a simple question: are two categorical variables related, or are they independent?

For instance: does someone's age group affect which streaming service they prefer? Do men and women choose different college majors at different rates? Is there a relationship between the region someone lives in and how they vote?

These questions all involve counting how many people fall into different combinations of categories - and then checking whether the pattern you see could have happened by chance.

Observed vs. Expected

The chi-square test works by comparing two things:

Observed counts: What you actually found in your data.
Expected counts: What you would expect to find if the two variables had absolutely no relationship.

If the observed counts are very different from the expected counts, that's evidence the variables are related. If they're close, the data give no evidence of a relationship and are consistent with the variables being independent.

Example

A phone retailer surveys 400 customers and records their gender and phone brand preference:

	Apple	Samsung	Other	Total
Women	120	55	25	200
Men	90	80	30	200
Total	210	135	55	400

If gender and brand preference were completely independent, you'd expect each gender to prefer brands at the same rates. Since 210 out of 400 total customers prefer Apple (52.5%), you'd expect about 52.5% of women (105) and 52.5% of men (105) to prefer Apple.

But the actual numbers are 120 women and 90 men. That's noticeably different from the expected 105 each. The chi-square test measures whether differences like this are large enough to be meaningful or could happen by chance.
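The expected counts for every cell, not just the Apple column, follow the same rule: row total times column total, divided by the grand total. Here is a minimal sketch of that arithmetic for the table above. The function name `expected_counts` and the dictionary layout are illustrative choices, not part of any library.

```python
# Observed counts from the phone retailer table.
observed = {
    "Women": {"Apple": 120, "Samsung": 55, "Other": 25},
    "Men": {"Apple": 90, "Samsung": 80, "Other": 30},
}


def expected_counts(table):
    """Expected count per cell if the row and column variables were independent."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count
    grand_total = sum(row_totals.values())
    # expected = row total * column total / grand total
    return {
        row: {col: row_totals[row] * col_totals[col] / grand_total for col in col_totals}
        for row in table
    }


expected = expected_counts(observed)
for row, cells in expected.items():
    print(row, cells)
# Women {'Apple': 105.0, 'Samsung': 67.5, 'Other': 27.5}
# Men {'Apple': 105.0, 'Samsung': 67.5, 'Other': 27.5}
```

Because the survey has exactly 200 women and 200 men, both rows get the same expected counts. With unequal group sizes the rows would differ.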

How It Works (Without the Math)

The chi-square test follows these steps:

Count what you observed. Tally up how many people or things fall into each combination of categories.
Calculate what you'd expect. Figure out what the counts would look like if the two variables were completely unrelated.
Compare observed to expected. For each cell in your table, measure how far off the observed count is from the expected count.
Combine the differences. Square each difference, divide it by that cell's expected count, and add up the results to get a single number - the chi-square statistic.
Get a p-value. Use the chi-square statistic, together with the size of the table (its degrees of freedom), to determine how likely it is you'd see differences this large by pure chance if the variables were truly independent.

A large chi-square statistic (and small p-value) is evidence that the variables are related. A small chi-square statistic means the data are consistent with independence; it does not prove that the variables are unrelated.
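For readers who do want to see the arithmetic, the steps above can be sketched in a few lines using the phone retailer table. This is an illustration under stated assumptions, not a general-purpose implementation: the function names are made up for this example, and the p-value shortcut relies on a closed-form formula that only works when the degrees of freedom are even (here they are 2). In practice you would normally use a statistics library such as SciPy's `chi2_contingency`.

```python
import math

# Observed counts: rows are women and men; columns are Apple, Samsung, Other.
observed = [
    [120, 55, 25],
    [90, 80, 30],
]


def chi_square_independence(table):
    """Return the chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected count under independence
            statistic += (obs - exp) ** 2 / exp      # squared difference, scaled by expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


def p_value_even_df(statistic, df):
    """Upper-tail chi-square probability. Assumption: df is a positive even number."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles positive even degrees of freedom")
    half = statistic / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


statistic, df = chi_square_independence(observed)
p_value = p_value_even_df(statistic, df)
print(f"chi-square = {statistic:.2f}, df = {df}, p = {p_value:.4f}")
# chi-square = 9.37, df = 2, p = 0.0092
```

A p-value of about 0.009 means differences this large would be unusual if gender and brand preference were independent.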

Another Common Use: Goodness of Fit

There's a second type of chi-square test called the goodness-of-fit test. Instead of asking whether two variables are related, it asks whether the counts for a single variable match a specific distribution you expected.

Example

A candy company claims their bags contain equal proportions of five colors: red, blue, green, yellow, and orange (20% each). You buy a bag and count 100 candies:

Red: 28, Blue: 15, Green: 22, Yellow: 18, Orange: 17

If the company's claim is true, you'd expect about 20 of each color. Your bag has noticeably more red and fewer blue. A chi-square goodness-of-fit test checks whether these differences are large enough to doubt the company's claim, or if they're within the range of normal random variation.
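As a rough sketch, the statistic for the candy bag can be computed directly from the counts above. Two things here are assumptions not stated in the text: the 5% significance level, and the comparison value 9.488, which is the standard tabulated chi-square critical value for 4 degrees of freedom (five colors minus one) at that level.

```python
# Candy counts from the example; the claimed share is 20% per color.
observed = {"Red": 28, "Blue": 15, "Green": 22, "Yellow": 18, "Orange": 17}

total = sum(observed.values())
expected_per_color = total / len(observed)  # 100 candies / 5 colors = 20

# Sum of (observed - expected)^2 / expected over all colors.
statistic = sum(
    (count - expected_per_color) ** 2 / expected_per_color
    for count in observed.values()
)
df = len(observed) - 1

# Assumption: 5% significance level; 9.488 is the tabulated critical value for df = 4.
CRITICAL_VALUE_DF4_ALPHA_05 = 9.488
reject_claim = statistic > CRITICAL_VALUE_DF4_ALPHA_05

print(f"chi-square = {statistic:.2f}, df = {df}, reject claim: {reject_claim}")
# chi-square = 5.30, df = 4, reject claim: False
```

The statistic falls well below the critical value, so at that level this one bag is not enough to doubt the claim of equal proportions, even though red looks over-represented.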

When to Use a Chi-Square Test

The chi-square test is the right choice when:

Your data consists of counts or frequencies in categories (not measurements like heights or scores).
Each observation falls into exactly one category per variable.
You have a reasonably large sample - generally, each expected cell count should be at least 5.
The observations are independent - each person or item is counted only once.
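The sample-size condition is easy to check before running the test. The sketch below flags any cell whose expected count falls under 5, using the phone retailer table. The helper name `small_expected_cells` and the threshold constant are illustrative, and the threshold of 5 is the rule of thumb from the list above rather than a hard limit.

```python
MIN_EXPECTED = 5  # rule-of-thumb threshold

# Observed counts: rows are women and men; columns are Apple, Samsung, Other.
observed = [
    [120, 55, 25],
    [90, 80, 30],
]


def small_expected_cells(table, minimum=MIN_EXPECTED):
    """Return (row index, column index, expected count) for cells below the minimum."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    flagged = []
    for i, row_total in enumerate(row_totals):
        for j, col_total in enumerate(col_totals):
            expected = row_total * col_total / n
            if expected < minimum:
                flagged.append((i, j, expected))
    return flagged


problems = small_expected_cells(observed)
print("cells below threshold:", problems)
# cells below threshold: []
```

An empty list means every expected count is at least 5. The smallest one in this table is 27.5.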

Limitations

The chi-square test tells you whether there is evidence of a relationship between categorical variables, but not how strong that relationship is. A very large sample can produce a significant result even for a trivially small relationship. For measuring the strength of association, statisticians use additional measures like Cramér's V alongside the chi-square test.
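Cramér's V rescales the chi-square statistic by the sample size and the table dimensions, giving a value between 0 (no association) and 1 (perfect association). Here is a minimal sketch for the phone retailer table. The function name is illustrative, and the formula used is the standard uncorrected one: V = sqrt(chi-square / (n * min(rows - 1, columns - 1))).

```python
import math

# Observed counts: rows are women and men; columns are Apple, Samsung, Other.
observed = [
    [120, 55, 25],
    [90, 80, 30],
]


def cramers_v(table):
    """Cramér's V (uncorrected) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            chi_square += (obs - exp) ** 2 / exp
    smaller_dim = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(chi_square / (n * smaller_dim))


v = cramers_v(observed)
print(f"Cramér's V = {v:.3f}")
# Cramér's V = 0.153
```

A value near 0.15 is usually read as a weak association, even though a chi-square test on the same table gives a small p-value. That gap is exactly the limitation described above.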

Also, as with any statistical test, finding a relationship doesn't prove causation. If men and women prefer different phone brands, the test doesn't tell you why - it could be marketing, peer influence, feature preferences, or many other factors.

Key Takeaway

The chi-square test is used when your data consists of counts in categories rather than numerical measurements. It compares what you actually observed to what you'd expect if two variables were unrelated. A large difference between observed and expected counts (resulting in a small p-value) suggests the variables are connected. It's widely used in surveys, market research, and social science - any time you're asking whether group membership is related to the choices people make.