The statistics Module
The statistics module is Python's built-in toolkit for calculating averages, spread, and quantiles directly on ordinary lists of numbers, with zero external dependencies.
Part of the free Python course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
Before you reach for numpy or pandas, this lightweight standard-library module handles the everyday math — means, medians, modes, and standard deviation — in a few clear function calls.
The three classic "averages" answer different questions. The mean is the arithmetic average, the median is the middle value when sorted, and the mode is the most common value. Each is a single function call:
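As a minimal sketch of those three calls, assuming a small made-up list of scores (the variable names are illustrative only):

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
scores = [12, 15, 15, 18, 22, 30]

average = statistics.mean(scores)    # sum of the values divided by the count
middle = statistics.median(scores)   # even count: average of the two middle values
common = statistics.mode(scores)     # the most frequent value

print(round(average, 2))  # 18.67
print(middle)             # 16.5
print(common)             # 15
```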
The median is robust to extreme outliers. If one billionaire walks into a room, the mean income skyrockets but the median barely moves. For skewed data like incomes or house prices, the median is usually the more honest "typical" value.
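The billionaire scenario can be checked with a quick sketch. The income figures below are invented purely for illustration:

```python
import statistics

# Hypothetical incomes for five people in a room (made-up numbers).
incomes = [40_000, 45_000, 50_000, 55_000, 60_000]

# The same room after one billionaire walks in.
with_billionaire = incomes + [1_000_000_000]

print(statistics.mean(incomes))             # 50000
print(statistics.median(incomes))           # 50000

print(statistics.mean(with_billionaire))    # roughly 166.7 million
print(statistics.median(with_billionaire))  # 52500.0
```

The mean jumps by several orders of magnitude, while the median shifts only from 50000 to 52500.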
Center tells you where the data sits; spread tells you how scattered it is. Variance is the average squared distance from the mean, and standard deviation is its square root (back in the original units). Python gives you both a sample version (variance and stdev, which divide by n-1) and a population version (pvariance and pstdev, which divide by n):
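Here is a small sketch comparing the sample and population versions on the same made-up list. The data is an assumption chosen so that the population standard deviation comes out to a round number:

```python
import statistics

# Hypothetical data with mean 5; the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_var = statistics.variance(data)   # 32 / (8 - 1)
sample_sd = statistics.stdev(data)       # square root of the sample variance
pop_var = statistics.pvariance(data)     # 32 / 8
pop_sd = statistics.pstdev(data)         # square root of the population variance

print(round(sample_var, 3))  # 4.571
print(round(sample_sd, 3))   # 2.138
print(pop_var)               # 4
print(pop_sd)                # 2.0
```

The sample versions come out slightly larger because they divide by n-1 rather than n.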
Quantiles cut sorted data into equal-sized buckets. Pass n=4 to quantiles to get the three quartile cut-points (25%, 50%, 75%), or n=100 for the 99 percentile cut-points. Python also offers faster and specialized averages.
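A minimal sketch of the quartile and percentile cut-points, assuming seven made-up scores and the default interpolation method:

```python
import statistics

# Hypothetical scores; quantiles sorts the data internally.
scores = [85, 55, 70, 60, 80, 65, 75]

# n=4 returns three cut-points: the 25%, 50%, and 75% marks.
quartiles = statistics.quantiles(scores, n=4)
print(quartiles)  # [60.0, 70.0, 80.0]

# n=100 returns 99 cut-points, one per percentile boundary.
percentiles = statistics.quantiles(scores, n=100)
print(len(percentiles))  # 99
```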
Use fmean when you just want a fast average of floats, geometric_mean for things that multiply (like yearly growth), and harmonic_mean when averaging rates (like average speed over equal distances).
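The following sketch illustrates each of the three specialized means. The growth factors and measurements are invented for the example; the speeds 60 and 40 are the same pair used in the quiz further down.

```python
import statistics

# fmean: a fast float average of plain measurements (hypothetical values).
readings = [1.5, 2.5, 5.0]
print(statistics.fmean(readings))  # 3.0

# geometric_mean: yearly growth factors that multiply together.
# Up 25% one year, down 20% the next: 1.25 * 0.80 is 1.0, so no net growth.
growth = [1.25, 0.80]
typical_growth = statistics.geometric_mean(growth)  # approximately 1.0
naive_growth = statistics.fmean(growth)             # about 1.025, which overstates growth

# harmonic_mean: average speed over two equal distances.
speeds = [60, 40]
average_speed = statistics.harmonic_mean(speeds)
print(average_speed)  # 48.0
```

The arithmetic mean of the two speeds would be 50, but the slower leg takes longer, so the true average over equal distances is 48.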
Replace each ___ with the correct statistics function so the output matches. Think about which "average" each comment describes.
✅ Use stdev for samples, pstdev only for full populations
Build a small grade summary that reports the class average, the median, how spread out the scores are, and the top-quartile cutoff.
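One possible shape for such a summary is sketched below. The function name grade_summary, the dictionary keys, and the scores are all hypothetical choices rather than part of the exercise; it also assumes the scores are a sample, so it uses stdev.

```python
import statistics

def grade_summary(scores):
    """Return the average, median, spread, and top-quartile cutoff (hypothetical helper)."""
    quartiles = statistics.quantiles(scores, n=4)
    return {
        "average": statistics.mean(scores),
        "median": statistics.median(scores),
        "spread": statistics.stdev(scores),   # sample standard deviation
        "top_quartile_cutoff": quartiles[2],  # the 75% cut-point
    }

# Made-up class scores for illustration.
summary = grade_summary([72, 85, 90, 66, 78, 95, 88])

print(summary["average"])              # 82
print(summary["median"])               # 85
print(round(summary["spread"], 2))     # 10.41
print(summary["top_quartile_cutoff"])  # 90.0
```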
Lesson complete — you can summarize data with pure Python!
You can now find the mean, median, and mode, measure spread with sample (stdev) and population (pstdev) standard deviation, split data with quantiles, and choose specialized means like fmean and geometric_mean, all without any external libraries.
🚀 Up next: the operator module, with clean, fast function versions of +, *, indexing, and attribute access.
Practice quiz
What does statistics.mean compute?
- The middle value when sorted
- The most common value
- The arithmetic average
- The largest value
Answer: The arithmetic average. mean is the arithmetic average — the sum of values divided by how many there are.
What is statistics.median([12, 15, 15, 18, 22, 30])?
- 15
- 16.5
- 18
- 18.67
Answer: 16.5. With an even count, the median is the average of the two middle values: (15 + 18) / 2 = 16.5.
What does statistics.mode return when several values tie for most common?
- A list of all the most-common values
- The single first most-common value it encounters
- It always raises an error on ties
- The average of the tied values
Answer: The single first most-common value it encounters. mode returns only the first most-common value; use multimode to get a list of all of them.
What does statistics.multimode([1, 1, 2, 2, 3]) return?
Answer: [1, 2]. multimode returns a list of all values sharing the top frequency; here both 1 and 2 appear twice.
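The tie behavior of mode and multimode can be compared side by side in a short sketch, assuming Python 3.8 or later (earlier versions raised StatisticsError from mode on a tie and did not have multimode):

```python
import statistics

values = [1, 1, 2, 2, 3]

# mode returns only the first of the tied values it encounters.
first_mode = statistics.mode(values)

# multimode returns every value that shares the top frequency.
all_modes = statistics.multimode(values)

print(first_mode)  # 1
print(all_modes)   # [1, 2]
```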
What is the difference between stdev and pstdev?
- stdev divides by n-1 (sample); pstdev divides by n (population)
- stdev is for integers, pstdev for floats
- They are identical
- pstdev divides by n-1; stdev divides by n
Answer: stdev divides by n-1 (sample); pstdev divides by n (population). stdev/variance are the sample versions (divide by n-1); pstdev/pvariance are population versions (divide by n).
Which should you use when your data is a SAMPLE of a larger population?
- pstdev
- pvariance
- stdev
- harmonic_mean
Answer: stdev. Use the sample versions (stdev, variance) unless your list truly contains the entire population.
What does statistics.quantiles(data, n=4) return?
- Four equal buckets of the data
- The three quartile cut-points (at 25%, 50%, 75%)
- The single median value
- The four largest values
Answer: The three quartile cut-points (at 25%, 50%, 75%). n=4 returns the three cut-points that divide sorted data into four equal-sized groups.
Which mean is correct for averaging rates like speeds over equal distances?
- mean
- geometric_mean
- harmonic_mean
- fmean
Answer: harmonic_mean. harmonic_mean is the right average for rates; harmonic_mean([60, 40]) gives 48.0.
What does statistics.mean([]) do on an empty list?
- Returns 0
- Returns None
- Raises StatisticsError
- Returns an empty list
Answer: Raises StatisticsError. mean raises StatisticsError on empty input — it requires at least one data point.
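If empty input is a realistic possibility, one way to guard against it is to catch the exception. The helper safe_mean below is a hypothetical wrapper written for this illustration, not part of the statistics module:

```python
import statistics

def safe_mean(values, default=None):
    """Return the mean, or a fallback when there is no data (hypothetical helper)."""
    try:
        return statistics.mean(values)
    except statistics.StatisticsError:
        # Raised by mean when the input is empty.
        return default

print(safe_mean([2, 4]))  # 3
print(safe_mean([]))      # None
```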
When should you prefer statistics over numpy or pandas?
- For huge arrays needing vectorized speed
- For small datasets with zero dependencies
- Only when working with DataFrames
- Never — they are always slower
Answer: For small datasets with zero dependencies. statistics is ideal for small datasets and zero-dependency scripts; numpy/pandas win on large, vectorized data.