Statistics: percentile, median, std, var

Descriptive statistics summarize a dataset's center and spread, and NumPy gives you fast built-ins (mean, median, std, var, and percentile) that work on whole arrays or along any axis.


Part of the free NumPy course at LearnCodingFast: hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

You'll compute centers and spread, control the divisor with ddof, read cut points with percentile and quantile, and measure relationships with corrcoef and cov.

np.mean gives the average, np.median the middle value (robust to outliers), np.std the standard deviation, and np.var the variance. By default NumPy uses the population formula (divide by n). Pass ddof=1 for the sample formula (divide by n-1), which most statistics tools use.
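As a minimal sketch of these four functions and the effect of ddof, the snippet below uses a small made-up array (the values and variable names are illustrative, not taken from the lesson):

```python
import numpy as np

# Illustrative data: the mean is 5 and the squared deviations sum to 32.
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

mean = np.mean(data)              # 5.0
median = np.median(data)          # 4.5 (average of the two middle values)

pop_var = np.var(data)            # 32 / 8 = 4.0  (divide by n)
pop_std = np.std(data)            # sqrt(4.0) = 2.0

samp_var = np.var(data, ddof=1)   # 32 / 7, roughly 4.571 (divide by n-1)
samp_std = np.std(data, ddof=1)   # roughly 2.138

# The median barely moves when one extreme value is present; the mean does.
with_outlier = np.array([1, 2, 3, 4, 100])
outlier_mean = np.mean(with_outlier)      # 22.0
outlier_median = np.median(with_outlier)  # 3.0

print(mean, median, pop_var, pop_std, samp_var, samp_std)
print(outlier_mean, outlier_median)
```

The sample versions are always at least as large as the population versions, and the gap shrinks as n grows.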

A percentile is the value below which a given share of the data falls. np.percentile(a, 50) is the median; np.percentile(a, [25, 50, 75]) returns the quartiles in one call. np.quantile is identical but takes fractions from 0 to 1. Every statistic here accepts an axis argument to work down columns or across rows.
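A short sketch of percentile, quantile, and the axis argument follows. The arrays are made up for illustration, and the results rely on NumPy's default linear interpolation between data points:

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

median_p = np.percentile(a, 50)                     # 30.0, same as np.median(a)
quartiles_p = np.percentile(a, [25, 50, 75])        # [20. 30. 40.]
quartiles_q = np.quantile(a, [0.25, 0.5, 0.75])     # same values, 0-1 scale

# A small 2 x 3 table to show the axis argument.
table = np.array([[1, 2, 3],
                  [4, 5, 6]])

col_means = np.mean(table, axis=0)      # one value per column: [2.5 3.5 4.5]
row_medians = np.median(table, axis=1)  # one value per row:    [2. 5.]

print(median_p, quartiles_p, quartiles_q)
print(col_means, row_medians)
```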

np.corrcoef measures how strongly two variables move together on a scale from -1 to 1; np.cov gives the unscaled covariance. Both return a matrix: the diagonal is each variable with itself (always 1 for corrcoef, the variance for cov) and the off-diagonal is the relationship between the two. A correlation of exactly 1 means a perfect increasing straight-line relationship, and exactly -1 a perfect decreasing one.
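Here is a minimal sketch with two illustrative arrays where y = 2x (the specific values are an assumption chosen for this example). Note that np.cov divides by n-1 by default, unlike np.std and np.var:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x  # y is an exact linear function of x

corr = np.corrcoef(x, y)  # 2 x 2 matrix; every entry is 1 up to floating-point rounding
cov = np.cov(x, y)        # 2 x 2 matrix: [[var(x), cov(x, y)], [cov(x, y), var(y)]]

print(corr)
print(cov)  # sample (n-1) values: var(x) = 2.5, cov(x, y) = 5.0, var(y) = 10.0
```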

The off-diagonal 1 in the correlation matrix confirms y is a perfect linear function of x, exactly what you expect when y = 2x.

Replace each ___ so the program prints the mean and the median (the 50th percentile).

Expected output: 30.0 then 30.0. (Answers: mean, percentile.)
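The exercise's starter code is not reproduced here, so the following is only a sketch of what a completed version might look like. The array is an assumption, chosen because it yields the expected 30.0 for both statistics:

```python
import numpy as np

# Assumed data: the lesson's actual array is not shown in this text.
data = np.array([10, 20, 30, 40, 50])

mean_value = np.mean(data)              # first blank: mean
median_value = np.percentile(data, 50)  # second blank: percentile

print(mean_value)    # 30.0
print(median_value)  # 30.0
```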

NumPy defaults to the population formula, so its std/var differ from pandas or Excel.

✅ Fix: pass ddof=1 when your data is a sample of a larger population.
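To see the mismatch concretely, this sketch compares NumPy with Python's standard-library statistics module, used here as a stand-in for other sample-based tools because statistics.stdev also divides by n-1. The data is illustrative:

```python
import statistics

import numpy as np

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data

numpy_default = np.std(sample)          # population formula (ddof=0): 2.0
numpy_sample = np.std(sample, ddof=1)   # sample formula, roughly 2.138
stdlib_sample = statistics.stdev(sample)  # also n-1, so it matches ddof=1

print(numpy_default, numpy_sample, stdlib_sample)
```

Only the ddof=1 result agrees with the n-1 tool; the default is smaller.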

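percentile wants 0–100, quantile wants 0–1. Swapping them fails in two different ways: np.percentile(a, 0.9) silently returns the 0.9th percentile (a value near the minimum), while np.quantile(a, 90) raises a ValueError.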

✅ Fix: use np.percentile(a, 90) or np.quantile(a, 0.9), not np.quantile(a, 90).
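The sketch below shows both correct calls and both failure modes on an illustrative array of the integers 1 to 100:

```python
import numpy as np

a = np.arange(1, 101)  # 1, 2, ..., 100

p90 = np.percentile(a, 90)   # about 90.1
q90 = np.quantile(a, 0.9)    # same value, fraction scale

# Mistake 1: a fraction passed to percentile runs without error
# but asks for the 0.9th percentile, which sits near the minimum.
silently_wrong = np.percentile(a, 0.9)  # about 1.89

# Mistake 2: a 0-100 value passed to quantile is rejected outright.
try:
    np.quantile(a, 90)
    raised = False
except ValueError:
    raised = True

print(p90, q90, silently_wrong, raised)
```

The first mistake is the more dangerous of the two because nothing signals that the result is wrong.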

For a 3-student by 3-subject score table, compute each subject's mean and sample standard deviation (down the columns), plus the overall quartiles.
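The lesson's score table is not shown in this text, so here is a sketch with hypothetical scores (rows are students, columns are subjects). It uses axis=0 to work down the columns and ddof=1 for the sample formula:

```python
import numpy as np

# Hypothetical scores: 3 students (rows) x 3 subjects (columns).
scores = np.array([[ 80, 90, 70],
                   [ 60, 75, 85],
                   [100, 60, 55]])

subject_means = np.mean(scores, axis=0)         # [80. 75. 70.]
subject_stds = np.std(scores, axis=0, ddof=1)   # [20. 15. 15.]

# With no axis argument, percentile treats the table as one flat list of 9 scores.
overall_quartiles = np.percentile(scores, [25, 50, 75])  # [60. 75. 85.]

print(subject_means)
print(subject_stds)
print(overall_quartiles)
```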

Lesson complete — statistics unlocked!

You can now describe data with mean, median, std, and var, choose population vs sample with ddof, read cut points with percentile/quantile, and measure relationships with corrcoef and cov.

🚀 Up next: Linear Algebra — solve systems, invert matrices, and compute eigenvalues and the SVD.

Practice quiz

What does np.std compute by default (ddof=0)?

  • The sample standard deviation
  • The population standard deviation
  • The variance
  • The median

Answer: The population standard deviation. By default NumPy divides by n, giving the population standard deviation.

Which call gives the sample (n-1) standard deviation?

  • np.std(a, ddof=1)
  • np.std(a)
  • np.std(a, ddof=0)
  • np.var(a)

Answer: np.std(a, ddof=1). ddof=1 divides by n-1, the sample (Bessel-corrected) formula.

How is the median computed in NumPy?

  • np.mean(a)
  • np.std(a)
  • np.median(a)
  • np.var(a)

Answer: np.median(a). np.median returns the middle value, which is robust to outliers.

Which expression returns the median using percentile?

  • np.percentile(a, 100)
  • np.percentile(a, 25)
  • np.percentile(a, 0)
  • np.percentile(a, 50)

Answer: np.percentile(a, 50). The 50th percentile is the median.

What scale does np.quantile take for its q argument?

  • A fraction from 0 to 1
  • A number from 0 to 100
  • A percentage string
  • Any integer

Answer: A fraction from 0 to 1. np.quantile takes 0 to 1, so the median is np.quantile(a, 0.5).

On a 2D array, which axis gives one statistic per column?

  • axis=1
  • axis=0
  • axis=-1
  • axis=2

Answer: axis=0. axis=0 collapses the rows, producing one result per column.

What does np.var measure?

  • The square root of the std
  • The middle value
  • The average squared deviation from the mean
  • The maximum

Answer: The average squared deviation from the mean. Variance is the mean of squared differences from the mean; std is its square root.

What range does a Pearson correlation from np.corrcoef fall in?

  • -1 to 1
  • 0 to 1
  • 0 to 100
  • -100 to 100

Answer: -1 to 1. np.corrcoef returns correlations on a scale from -1 to 1.

np.percentile(a, [25, 50, 75]) returns what?

  • A single number
  • The mean
  • The three quartiles in one call
  • An error

Answer: The three quartiles in one call. Passing a list returns several cut points at once: the quartiles.

Why might NumPy's std differ from pandas or Excel by default?

  • NumPy rounds differently
  • NumPy uses ddof=0 (population) while they use ddof=1
  • NumPy ignores some values
  • NumPy sorts first

Answer: NumPy uses ddof=0 (population) while they use ddof=1. NumPy defaults to the population formula; pass ddof=1 to match sample-based tools.