Histograms & Binning

A histogram groups numbers into ranges called bins and counts how many values fall into each one, and np.histogram computes those counts and bin edges for you in a single call.

Part of the free NumPy course at LearnCodingFast: hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

You'll set the number of bins and the range, normalize counts into a density, label each value's bin with np.digitize, and tally integer labels fast with np.bincount.

np.histogram(data, bins, range) returns two arrays: the counts per bin and the bin_edges. There is always one more edge than count, because n bins need n+1 boundaries. Set range whenever you want predictable bins; otherwise NumPy uses the data's min and max.
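A minimal sketch of that call, using the small data set that also appears in the practice quiz. The variable names are illustrative only.

```python
import numpy as np

# Small sample with values running from 1 to 5.
data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5])

# Four equal-width bins over an explicit range.
counts, bin_edges = np.histogram(data, bins=4, range=(1, 5))

print(counts)     # [1 2 3 3]
print(bin_edges)  # [1. 2. 3. 4. 5.]

# n bins need n + 1 boundaries.
print(len(bin_edges) == len(counts) + 1)  # True
```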

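Every bin except the last is half-open, [left, right). The last bin is closed on both ends, so a value equal to the top edge, such as 5 with range=(1, 5), is counted in the final bin. Here is the same idea on seeded random data so the numbers are reproducible.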
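The lesson's original snippet is not shown here, so the following is only a sketch of what a seeded example could look like. The seed, sample size, and value range are assumptions chosen for illustration.

```python
import numpy as np

# Assumed setup: a fixed seed makes the "random" data identical on every run.
rng = np.random.default_rng(42)
samples = rng.integers(0, 100, size=50)  # 50 integers in [0, 100)

# Five equal-width bins with edges at 0, 20, 40, 60, 80 and 100.
counts, bin_edges = np.histogram(samples, bins=5, range=(0, 100))

print(counts)        # per-bin counts, the same on every run with this seed
print(bin_edges)
print(counts.sum())  # 50, because every sample lies inside the range
```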

Raw counts depend on how many samples you have. Passing density=True normalizes the histogram so the total area equals 1, turning counts into a probability density. That lets you compare datasets of different sizes on the same axes.
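To make the normalization concrete, this sketch compares the output of density=True with the value computed by hand as count / (total * bin_width). The data set is the small quiz example; the variable names are illustrative.

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5])

counts, edges = np.histogram(data, bins=4, range=(1, 5))
density, _ = np.histogram(data, bins=4, range=(1, 5), density=True)

# Manual normalization: divide each count by (total samples * bin width).
widths = np.diff(edges)
manual = counts / (counts.sum() * widths)

matches_manual = bool(np.allclose(density, manual))
total_area = float((density * widths).sum())

print(matches_manual)  # True
print(np.isclose(total_area, 1.0))  # True
```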

When you need a label per value instead of totals, use np.digitize(values, bins): it returns the index of the bin each value falls into. For non-negative integer labels, np.bincount tallies how many times each label appears, which is faster than a histogram for that case.
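A short sketch of both functions. The bin edges and values are made up for illustration; the bincount input is the one used in the practice quiz.

```python
import numpy as np

# Illustrative edges: bin 1 is [0, 10), bin 2 is [10, 20), bin 3 is [20, 30).
bins = np.array([0, 10, 20, 30])
values = np.array([5, 15, 25, 12])

# One label per input value.
labels = np.digitize(values, bins)
print(labels)  # [1 2 3 2]

# Tally how often each non-negative integer label occurs.
tally = np.bincount(np.array([0, 1, 1, 2, 2, 2, 3, 0]))
print(tally)   # [2 2 3 1]
```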

With the default right=False and increasing bins, digitize returns 0 for values below the first edge and len(bins) for values at or above the last edge, which is handy for spotting out-of-range data.
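The out-of-range behaviour can be checked with a few boundary values. This sketch uses invented edges and relies on the default right=False.

```python
import numpy as np

bins = np.array([0, 10, 20, 30])
values = np.array([-5, 0, 29.9, 30, 45])

idx = np.digitize(values, bins)
print(idx)  # [0 1 3 4 4]

# Index 0 means "below the first edge"; len(bins) means "at or above the last edge".
out_of_range = (idx == 0) | (idx == len(bins))
flagged = values[out_of_range].tolist()
print(flagged)  # [-5.0, 30.0, 45.0]
```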

Replace each ___ so the program bins exam scores into four equal ranges from 50 to 100.

Expected output: counts [2 2 2 2] and edges [50. 62.5 75. 87.5 100.]. (Answers: histogram, range.)
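For reference, one possible completed version of the exercise is sketched below. The exercise's original score list is not shown, so the scores here are an assumption, picked so that the output matches the expected counts.

```python
import numpy as np

# Hypothetical scores (not the lesson's original data).
scores = np.array([52, 60, 65, 70, 78, 85, 90, 100])

# The two blanks filled in: the function name and the range keyword.
counts, edges = np.histogram(scores, bins=4, range=(50, 100))

print(counts)          # [2 2 2 2]
print(edges.tolist())  # [50.0, 62.5, 75.0, 87.5, 100.0]
```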

❌ Expecting counts and edges to be the same length

There is always one more edge than count, so zipping them directly silently discards the last edge, and code that expects two arrays of equal length fails with a shape mismatch.

✅ Fix: remember len(edges) == len(counts) + 1; use edges[:-1] as bin starts.
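One way to apply that fix, sketched on the small quiz data set: slice the edges into left and right boundaries so each count is paired with its own bin.

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5])
counts, edges = np.histogram(data, bins=4, range=(1, 5))

# edges[:-1] are the bin starts, edges[1:] the bin ends; both match len(counts).
rows = [
    (float(left), float(right), int(count))
    for left, right, count in zip(edges[:-1], edges[1:], counts)
]

for left, right, count in rows:
    print(f"{left} to {right}: {count}")
```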

❌ Leaving out range and expecting stable bins

Without range, edges are derived from the data's min and max, so they shift whenever the data changes.

✅ Fix: pass an explicit range=(low, high) so bins are reproducible and comparable.
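A small sketch of the difference, using two invented data sets with different minima and maxima. Without range the edges disagree; with a shared range they line up.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([0, 5, 10])

# Edges derived from each data set's own min and max.
auto_a = np.histogram(a, bins=2)[1]
auto_b = np.histogram(b, bins=2)[1]
print(auto_a.tolist())  # [1.0, 2.5, 4.0]
print(auto_b.tolist())  # [0.0, 5.0, 10.0]

# The same explicit range gives identical edges for both.
counts_a, fixed_a = np.histogram(a, bins=2, range=(0, 10))
counts_b, fixed_b = np.histogram(b, bins=2, range=(0, 10))
print(np.array_equal(fixed_a, fixed_b))  # True
print(counts_a, counts_b)                # [4 0] [1 2]
```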

Take 30 reproducible exam scores, bucket them into grade bands at 60/70/80/90 with digitize, then count how many fall in each band with bincount.
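One possible approach to this challenge is sketched below. The seed and the score range (40 to 100) are assumptions, since the lesson does not specify how the scores are generated.

```python
import numpy as np

# Assumed data source: 30 seeded integer scores between 40 and 100 inclusive.
rng = np.random.default_rng(0)
scores = rng.integers(40, 101, size=30)

# Band 0: below 60, 1: 60-69, 2: 70-79, 3: 80-89, 4: 90 and above.
cutoffs = np.array([60, 70, 80, 90])
bands = np.digitize(scores, cutoffs)

# minlength keeps all five bands in the result even if one is empty.
band_counts = np.bincount(bands, minlength=5)

print(band_counts)
print(band_counts.sum())  # 30
```

Passing minlength=len(cutoffs) + 1 is a design choice here: without it, bincount stops at the highest label that actually occurs, so an empty top band would be missing from the output.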

Lesson complete — histograms unlocked!

You can now bin data with np.histogram, read counts against n+1 edges, normalize with density=True, label each value with np.digitize, and tally fast with np.bincount.

🚀 Up next: Memory Layout & Strides — how arrays really sit in memory and why it affects speed.

Practice quiz

What two arrays does np.histogram return?

  • counts and bin_edges
  • min and max
  • mean and std
  • values and labels

Answer: counts and bin_edges. It returns the per-bin counts and the bin_edges array.

For n bins, how many bin edges are there?

  • n
  • n - 1
  • n + 1
  • 2n

Answer: n + 1. n bins need n+1 boundaries, so there is always one more edge than count.

For data [1,2,2,3,3,3,4,4,5], what does np.histogram(data, bins=4, range=(1,5)) give as counts?

Answer: [1 2 3 3]. The four equal bins over 1..5 hold 1, 2, 3, and 3 values respectively.

What does density=True do in np.histogram?

  • Doubles the bin count
  • Sorts the data first
  • Removes empty bins
  • Normalizes so the area integrates to 1

Answer: Normalizes so the area integrates to 1. With density=True each bar is count/(total*bin_width) and the area sums to 1.

What does np.digitize return?

  • The bin index each value falls into
  • The total count per bin
  • The bin edges
  • A normalized density

Answer: The bin index each value falls into. digitize labels every value with the index of the bin it lands in.

Which function quickly tallies non-negative integer labels?

  • np.histogram
  • np.bincount
  • np.digitize
  • np.unique

Answer: np.bincount. np.bincount counts how many times each non-negative integer appears.

Why should you pass an explicit range to np.histogram?

  • It is required or it errors
  • Otherwise bins depend on the data's min and max and shift whenever the data changes
  • It speeds up sorting
  • It changes the dtype

Answer: Otherwise bins depend on the data's min and max and shift whenever the data changes. Without range, edges come from the data min/max, so bins are not comparable across datasets.

What does np.bincount([0,1,1,2,2,2,3,0]) return?

Answer: [2 2 3 1]. Label 0 appears twice, 1 twice, 2 three times, and 3 once.

What does np.digitize returning len(bins) for a value indicate?

  • The value is exactly at the first edge
  • The value is below the first edge
  • The value is at or above the last edge
  • An error occurred

Answer: The value is at or above the last edge. With the default right=False, digitize returns len(bins) for any value at or above the last edge, which marks it as out of range on the high side.

How does np.histogram differ from np.digitize?

  • histogram counts per bin; digitize labels each value's bin
  • They are identical
  • digitize returns counts; histogram returns labels
  • Only digitize works on floats

Answer: histogram counts per bin; digitize labels each value's bin. histogram gives a distribution summary; digitize tags every input element.