Statistics: percentile, median, std, var

Descriptive statistics summarize a dataset's center and spread. NumPy gives you fast built-ins (mean, median, std, var, and percentile) that work on whole arrays or along any axis.


Part of the free NumPy course at LearnCodingFast: hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

You'll compute centers and spread, control the divisor with ddof, read cut points with percentile and quantile, and measure relationships with corrcoef and cov.

np.mean gives the average, np.median the middle value (robust to outliers), np.std the standard deviation, and np.var the variance. By default NumPy uses the population formula (divide by n). Pass ddof=1 for the sample formula (divide by n-1), which most statistics tools use by default.
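As a minimal sketch of these four functions and the effect of ddof, the example below uses a small made-up array (the values are an assumption chosen so the arithmetic is easy to check by hand):

```python
import numpy as np

# Hypothetical data: 8 values whose mean is exactly 5.
a = np.array([2, 4, 4, 4, 5, 5, 7, 9])

mean = np.mean(a)      # 5.0
median = np.median(a)  # 4.5, the average of the two middle values 4 and 5

# Population formulas (default ddof=0): divide by n = 8.
pop_var = np.var(a)    # 32 / 8 = 4.0
pop_std = np.std(a)    # sqrt(4.0) = 2.0

# Sample formulas (ddof=1): divide by n - 1 = 7.
sample_var = np.var(a, ddof=1)  # 32 / 7
sample_std = np.std(a, ddof=1)  # sqrt(32 / 7)

print(mean, median, pop_var, pop_std)
print(sample_var, sample_std)
```

The sum of squared deviations from the mean is 32 in both cases; only the divisor changes, so the sample values are always slightly larger than the population ones.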

A percentile is the value below which a given share of the data falls. np.percentile(a, 50) is the median; np.percentile(a, [25, 50, 75]) returns the quartiles in one call. np.quantile is identical but takes fractions from 0 to 1. Every statistic here accepts an axis argument to work down columns or across rows.
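The following sketch illustrates percentile, quantile, and the axis argument. The arrays a and table are hypothetical, picked so the cut points land exactly on data values under NumPy's default linear interpolation:

```python
import numpy as np

# Hypothetical 1D data.
a = np.array([10, 20, 30, 40, 50])

p50 = np.percentile(a, 50)                  # 30.0, same as np.median(a)
quartiles = np.percentile(a, [25, 50, 75])  # [20., 30., 40.]

# np.quantile takes fractions (0 to 1) instead of percentages (0 to 100).
same_quartiles = np.quantile(a, [0.25, 0.5, 0.75])

# Hypothetical 2D data: 2 rows, 3 columns.
table = np.array([[1, 2, 3],
                  [4, 5, 6]])

col_medians = np.median(table, axis=0)  # one value per column: [2.5, 3.5, 4.5]
row_medians = np.median(table, axis=1)  # one value per row: [2., 5.]

print(p50, quartiles, same_quartiles)
print(col_medians, row_medians)
```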

np.corrcoef measures how strongly two variables move together on a scale from -1 to 1; np.cov gives the unscaled covariance. Both return a matrix: the diagonal is each variable with itself and the off-diagonal is the relationship between them. A correlation of exactly 1 means a perfect increasing straight-line relationship, and exactly -1 means a perfect decreasing one.
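A small sketch of both functions, assuming a hypothetical x and a y built as y = 2x so the relationship is perfectly linear:

```python
import numpy as np

# Hypothetical data: y is exactly twice x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x

# 2x2 correlation matrix: diagonal is each variable with itself,
# off-diagonal is the correlation between x and y.
r = np.corrcoef(x, y)

# 2x2 covariance matrix. Unlike np.var, np.cov divides by n - 1 by default.
# Here: var(x) = 2.5, cov(x, y) = 5.0, var(y) = 10.0.
c = np.cov(x, y)

print(r)
print(c)
```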

The off-diagonal 1 in the correlation matrix confirms that y is a perfect linear function of x, exactly what you expect when y = 2x.

Replace each ___ so the program prints the mean and the median (the 50th percentile).

Expected output: 30.0 then 30.0. (Answers: mean, percentile.)
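A solved version of this exercise might look like the sketch below. The array data is an assumption, chosen only because it produces the expected output; the lesson's own starter data may differ.

```python
import numpy as np

# Hypothetical data (an assumption): mean and median are both 30.
data = np.array([10, 20, 30, 40, 50])

mean_value = np.mean(data)              # first blank: mean
median_value = np.percentile(data, 50)  # second blank: percentile

print(mean_value)    # 30.0
print(median_value)  # 30.0
```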

NumPy defaults to the population formula, so its std/var differ from pandas or Excel.

✅ Fix: pass ddof=1 when your data is a sample of a larger population.

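percentile wants 0–100, quantile wants 0–1. Swapping the scales either raises a ValueError (np.quantile(a, 90) is out of range) or silently returns the wrong cut point (np.percentile(a, 0.9) is the 0.9th percentile, not the 90th).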

✅ Fix: use np.percentile(a, 90) or np.quantile(a, 0.9), not quantile(a, 90).
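The sketch below shows both failure modes next to the correct calls, using a hypothetical array of the integers 1 to 100:

```python
import numpy as np

# Hypothetical data: the integers 1..100.
a = np.arange(1, 101)

# Correct: both calls ask for the same cut point.
p90 = np.percentile(a, 90)
q90 = np.quantile(a, 0.9)

# Silent mistake: this is the 0.9th percentile, a value near the minimum.
wrong = np.percentile(a, 0.9)

# Loud mistake: 90 is outside quantile's 0-to-1 range, so NumPy raises.
raised = False
try:
    np.quantile(a, 90)
except ValueError:
    raised = True

print(p90, q90, wrong, raised)
```

The silent case is the more dangerous of the two, since the result is a plausible-looking number rather than an error.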

For a 3-student by 3-subject score table, compute each subject's mean and sample standard deviation (down the columns), plus the overall quartiles.
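One way to approach this exercise is sketched below. The score table is made up for illustration (rows are students, columns are subjects); any 3x3 array works the same way.

```python
import numpy as np

# Hypothetical scores: rows = students, columns = subjects.
scores = np.array([[60, 70, 80],
                   [70, 80, 90],
                   [80, 90, 100]])

# axis=0 collapses the rows, giving one statistic per subject (column).
subject_means = np.mean(scores, axis=0)         # [70., 80., 90.]
subject_stds = np.std(scores, axis=0, ddof=1)   # sample std: [10., 10., 10.]

# With no axis, percentile works on the flattened table of all 9 scores.
overall_quartiles = np.percentile(scores, [25, 50, 75])  # [70., 80., 90.]

print(subject_means)
print(subject_stds)
print(overall_quartiles)
```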

Lesson complete — statistics unlocked!

You can now describe data with mean, median, std, and var, choose population vs sample with ddof, read cut points with percentile / quantile, and measure relationships with corrcoef and cov.

🚀 Up next: Linear Algebra — solve systems, invert matrices, and compute eigenvalues and the SVD.

Practice quiz

What does np.std compute by default (ddof=0)?

The sample standard deviation
The population standard deviation
The variance
The median

Answer: The population standard deviation. By default NumPy divides by n, giving the population standard deviation.

Which call gives the sample (n-1) standard deviation?

np.std(a, ddof=1)
np.std(a)
np.std(a, ddof=0)
np.var(a)

Answer: np.std(a, ddof=1). ddof=1 divides by n-1, the sample (Bessel-corrected) formula.

How is the median computed in NumPy?

np.mean(a)
np.std(a)
np.median(a)
np.var(a)

Answer: np.median(a). np.median returns the middle value, which is robust to outliers.

Which expression returns the median using percentile?

np.percentile(a, 100)
np.percentile(a, 25)
np.percentile(a, 0)
np.percentile(a, 50)

Answer: np.percentile(a, 50). The 50th percentile is the median.

What scale does np.quantile take for its q argument?

A fraction from 0 to 1
A number from 0 to 100
A percentage string
Any integer

Answer: A fraction from 0 to 1. np.quantile takes 0 to 1, so the median is np.quantile(a, 0.5).

On a 2D array, which axis gives one statistic per column?

axis=1
axis=0
axis=-1
axis=2

Answer: axis=0. axis=0 collapses the rows, producing one result per column.

What does np.var measure?

The square root of the std
The middle value
The average squared deviation from the mean
The maximum

Answer: The average squared deviation from the mean. Variance is the mean of squared differences from the mean; std is its square root.

What range does a Pearson correlation from np.corrcoef fall in?

-1 to 1
0 to 1
0 to 100
-100 to 100

Answer: -1 to 1. np.corrcoef returns correlations on a scale from -1 to 1.

np.percentile(a, [25, 50, 75]) returns what?

A single number
The mean
The three quartiles in one call
An error

Answer: The three quartiles in one call. Passing a list returns several cut points at once: the quartiles.

Why might NumPy's std differ from pandas or Excel by default?

NumPy rounds differently
NumPy uses ddof=0 (population) while they use ddof=1
NumPy ignores some values
NumPy sorts first

Answer: NumPy uses ddof=0 (population) while they use ddof=1. NumPy defaults to the population formula; pass ddof=1 to match sample-based tools.