Statistics: percentile, median, std, var
Descriptive statistics summarize a dataset's center and spread, and NumPy gives you fast built-ins (mean, median, std, var, and percentile) that work on whole arrays or along any axis.
Part of the free NumPy course at LearnCodingFast: hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
You'll compute centers and spread, control the divisor with ddof, read cut points with percentile and quantile, and measure relationships with corrcoef and cov.
np.mean gives the average, np.median the middle value (robust to outliers), np.std the standard deviation, and np.var the variance. By default NumPy uses the population formula (divide by n). Pass ddof=1 for the sample formula (divide by n-1), which most statistics tools use.
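As a minimal sketch of these four functions, the snippet below uses a small made-up array (the values are an assumption, chosen so the results are round numbers) and then appends one extreme value to show why the median is called robust.

```python
import numpy as np

# Hypothetical data chosen so the results come out as round numbers.
a = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = np.mean(a)      # 5.0
median = np.median(a)  # 4.5, the average of the two middle values
var_pop = np.var(a)    # 4.0, population variance (divides by n)
std_pop = np.std(a)    # 2.0, square root of the variance

# Add one extreme value: the mean jumps, the median barely moves.
with_outlier = np.append(a, 100.0)
mean_out = np.mean(with_outlier)      # about 15.56
median_out = np.median(with_outlier)  # 5.0

print(mean, median, var_pop, std_pop)
print(mean_out, median_out)
```

A single outlier roughly triples the mean here while shifting the median by only 0.5.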
A percentile is the value below which a given share of the data falls. np.percentile(a, 50) is the median; np.percentile(a, [25, 50, 75]) returns the quartiles in one call. np.quantile is identical but takes fractions from 0 to 1. Every statistic here accepts an axis argument to work down columns or across rows.
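The calls above might look like the following sketch. The arrays are illustrative assumptions, and the results rely on NumPy's default linear interpolation between data points.

```python
import numpy as np

# Hypothetical 1D data, already sorted for readability.
a = np.array([1, 3, 5, 7, 9, 11, 13, 15, 17])

median = np.percentile(a, 50)                  # 9.0
quartiles = np.percentile(a, [25, 50, 75])     # [ 5.  9. 13.]
same_cuts = np.quantile(a, [0.25, 0.5, 0.75])  # same values, q given as fractions

# Hypothetical 2D data: 2 rows, 3 columns.
table = np.array([[1, 2, 3],
                  [4, 5, 6]])
per_column = np.median(table, axis=0)  # [2.5 3.5 4.5], one value per column
per_row = np.median(table, axis=1)     # [2. 5.], one value per row

print(quartiles, same_cuts)
print(per_column, per_row)
```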
np.corrcoef measures how strongly two variables move together on a scale from -1 to 1; np.cov gives the unscaled covariance. Both return a matrix: the diagonal is each variable with itself and the off-diagonal is the relationship between them. A correlation of exactly 1 means a perfect increasing straight-line relationship, and exactly -1 means a perfect decreasing one.
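The lesson's original example is not reproduced here, so the following is a sketch under assumed values: x is a made-up array and y is defined as 2x.

```python
import numpy as np

# Assumed data: y is an exact linear function of x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x

corr = np.corrcoef(x, y)  # 2x2 matrix, every entry approximately 1
cov = np.cov(x, y)        # [[ 2.5  5. ] [ 5.  10. ]]

# Note: np.cov divides by n-1 by default, unlike np.var and np.std.
print(corr)
print(cov)
```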
The off-diagonal 1 in the correlation matrix confirms that y is a perfect linear function of x, exactly what you expect when y = 2x.
Replace each ___ so the program prints the mean and the median (the 50th percentile).
Expected output: 30.0 then 30.0. (Answers: mean, percentile.)
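The exercise's starter code is not shown here. One possible completed version, assuming the data is the array below (an assumption that happens to produce the expected output), is:

```python
import numpy as np

# Assumed data; the original exercise's array is not shown.
data = np.array([10, 20, 30, 40, 50])

avg = np.mean(data)            # first blank: mean
mid = np.percentile(data, 50)  # second blank: percentile

print(avg)  # 30.0
print(mid)  # 30.0
```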
NumPy defaults to the population formula, so its std/var differ from pandas or Excel.
✅ Fix: pass ddof=1 when your data is a sample of a larger population.
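To see what ddof changes, this sketch compares the two divisors on a made-up array and checks the sample result against the formula written out by hand.

```python
import numpy as np

# Hypothetical data: 8 values with mean 5.
a = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = len(a)

std_population = np.std(a)      # divides by n     -> 2.0
std_sample = np.std(a, ddof=1)  # divides by n - 1 -> about 2.138

# The same sample formula by hand: sqrt(sum of squared deviations / (n - 1)).
by_hand = np.sqrt(np.sum((a - a.mean()) ** 2) / (n - 1))

print(std_population, std_sample, by_hand)
```

The sample value is always the larger of the two, and the gap shrinks as n grows.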
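percentile wants 0–100, quantile wants 0–1. Swapping them either raises an error (np.quantile(a, 90) is out of range) or silently returns the wrong cut point (np.percentile(a, 0.9) is the 0.9th percentile, not the 90th).
✅ Fix: use np.percentile(a, 90) or np.quantile(a, 0.9), not quantile(a, 90).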
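A short sketch of both failure modes, using the integers 1 to 100 as assumed data:

```python
import numpy as np

# Assumed data: the integers 1 through 100.
a = np.arange(1, 101)

right_p = np.percentile(a, 90)  # about 90.1
right_q = np.quantile(a, 0.9)   # same value, q given as a fraction

# Wrong scale, no error: this is the 0.9th percentile, close to the minimum.
silent_mistake = np.percentile(a, 0.9)  # about 1.89

# Wrong scale, loud error: q must lie between 0 and 1.
try:
    np.quantile(a, 90)
    raised = False
except ValueError:
    raised = True

print(right_p, right_q, silent_mistake, raised)
```

The silent case is the more dangerous one, since the code runs and simply reports a value near the bottom of the data.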
For a 3-student by 3-subject score table, compute each subject's mean and sample standard deviation (down the columns), plus the overall quartiles.
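One way to approach this, sketched with an invented score table (rows are students, columns are subjects; the numbers are assumptions for illustration):

```python
import numpy as np

# Hypothetical scores: 3 students (rows) by 3 subjects (columns).
scores = np.array([[70, 80, 90],
                   [80, 85, 60],
                   [90, 75, 75]])

# axis=0 collapses the rows, giving one statistic per subject.
subject_means = np.mean(scores, axis=0)        # [80. 80. 75.]
subject_stds = np.std(scores, axis=0, ddof=1)  # [10.  5. 15.]

# With no axis, percentile treats the table as one flat list of 9 scores.
overall_quartiles = np.percentile(scores, [25, 50, 75])  # [75. 80. 85.]

print(subject_means)
print(subject_stds)
print(overall_quartiles)
```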
Lesson complete — statistics unlocked!
You can now describe data with mean, median, std, and var, choose population vs sample with ddof, read cut points with percentile / quantile, and measure relationships with corrcoef and cov.
🚀 Up next: Linear Algebra — solve systems, invert matrices, and compute eigenvalues and the SVD.
Practice quiz
What does np.std compute by default (ddof=0)?
- The sample standard deviation
- The population standard deviation
- The variance
- The median
Answer: The population standard deviation. By default NumPy divides by n, giving the population standard deviation.
Which call gives the sample (n-1) standard deviation?
- np.std(a, ddof=1)
- np.std(a)
- np.std(a, ddof=0)
- np.var(a)
Answer: np.std(a, ddof=1). ddof=1 divides by n-1, the sample (Bessel-corrected) formula.
How is the median computed in NumPy?
- np.mean(a)
- np.std(a)
- np.median(a)
- np.var(a)
Answer: np.median(a). np.median returns the middle value, which is robust to outliers.
Which expression returns the median using percentile?
- np.percentile(a, 100)
- np.percentile(a, 25)
- np.percentile(a, 0)
- np.percentile(a, 50)
Answer: np.percentile(a, 50). The 50th percentile is the median.
What scale does np.quantile take for its q argument?
- A fraction from 0 to 1
- A number from 0 to 100
- A percentage string
- Any integer
Answer: A fraction from 0 to 1. np.quantile takes 0 to 1, so the median is np.quantile(a, 0.5).
On a 2D array, which axis gives one statistic per column?
- axis=1
- axis=0
- axis=-1
- axis=2
Answer: axis=0. axis=0 collapses the rows, producing one result per column.
What does np.var measure?
- The square root of the std
- The middle value
- The average squared deviation from the mean
- The maximum
Answer: The average squared deviation from the mean. Variance is the mean of squared differences from the mean; std is its square root.
What range does a Pearson correlation from np.corrcoef fall in?
- -1 to 1
- 0 to 1
- 0 to 100
- -100 to 100
Answer: -1 to 1. np.corrcoef returns correlations on a scale from -1 to 1.
np.percentile(a, [25, 50, 75]) returns what?
- A single number
- The mean
- The three quartiles in one call
- An error
Answer: The three quartiles in one call. Passing a list returns several cut points at once: the quartiles.
Why might NumPy's std differ from pandas or Excel by default?
- NumPy rounds differently
- NumPy uses ddof=0 (population) while they use ddof=1
- NumPy ignores some values
- NumPy sorts first
Answer: NumPy uses ddof=0 (population) while they use ddof=1. NumPy defaults to the population formula; pass ddof=1 to match sample-based tools.