Histograms & Binning
A histogram groups numbers into ranges called bins and counts how many values fall into each one, and np.histogram computes those counts and bin edges for you in a single call.
Part of the free NumPy course at LearnCodingFast: hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
You'll set the number of bins and the range, normalize counts into a density, label each value's bin with np.digitize, and tally integer labels fast with np.bincount.
np.histogram(data, bins, range) returns two arrays: the counts per bin and the bin_edges. There is always one more edge than count, because n bins need n+1 boundaries. Set range explicitly when you want predictable bins; otherwise NumPy uses the data's min and max.
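A minimal sketch of the call on a small hand-picked sample (the values are illustrative only). It shows the two returned arrays and the n+1 relationship between edges and counts.

```python
import numpy as np

# Illustrative sample data.
data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5])

# Four equal-width bins over an explicit range of 1 to 5.
counts, bin_edges = np.histogram(data, bins=4, range=(1, 5))

print(counts)     # [1 2 3 3]
print(bin_edges)  # [1. 2. 3. 4. 5.]
print(len(bin_edges) == len(counts) + 1)  # True
```

The value 5 sits exactly on the upper edge and is still counted in the last bin, which is why that bin holds 4, 4 and 5.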
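Each bin includes its left edge and excludes its right edge, except the last bin, which is closed on both ends: a value equal to the upper edge (such as 5 when range=(1, 5)) is included in the final bin. The same approach works on seeded random data, which keeps the numbers reproducible.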
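A sketch of the seeded version, assuming NumPy's Generator API (np.random.default_rng). The seed 42, the normal distribution and the bin settings are arbitrary choices for illustration.

```python
import numpy as np

# Seeded generator: 42 is an arbitrary seed.
rng = np.random.default_rng(42)
data = rng.normal(loc=0.0, scale=1.0, size=1_000)

# Explicit range keeps the edges fixed; values outside it are ignored.
counts, edges = np.histogram(data, bins=8, range=(-4, 4))

# Re-creating the generator with the same seed reproduces the exact counts.
rng_again = np.random.default_rng(42)
data_again = rng_again.normal(loc=0.0, scale=1.0, size=1_000)
counts_again, _ = np.histogram(data_again, bins=8, range=(-4, 4))

print(edges)
print(counts)
print(np.array_equal(counts, counts_again))  # True
```

Because values outside range are dropped, counts.sum() can be smaller than the sample size if the range is too narrow.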
Raw counts depend on how many samples you have. Passing density=True normalizes the histogram so the total area equals 1, turning counts into a probability density. That lets you compare datasets of different sizes on the same axes.
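A sketch comparing two sample sizes on the same bins. The seed, sizes and bin settings are arbitrary; the point is that each density integrates to 1 once you multiply the heights by the bin widths.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
small = rng.normal(0, 1, 200)
large = rng.normal(0, 1, 20_000)

# Same bins for both so the densities are directly comparable.
small_density, edges = np.histogram(small, bins=10, range=(-4, 4), density=True)
large_density, _ = np.histogram(large, bins=10, range=(-4, 4), density=True)

widths = np.diff(edges)
# Area under each histogram: sum of height * width.
print(np.sum(small_density * widths))
print(np.sum(large_density * widths))
```

Both sums come out at 1 (up to floating-point rounding), even though the raw counts differ by roughly a factor of 100. The heights are not probabilities themselves unless every bin has width 1.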
When you need a label per value instead of totals, use np.digitize(values, bins): it returns the index of the bin each value falls into. For non-negative integer labels, np.bincount tallies how many times each label appears, which is faster than a histogram for that case.
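A small sketch of both calls on hand-picked values, with bins given as a plain array of increasing edges.

```python
import numpy as np

values = np.array([0.5, 1.5, 2.5, 1.2, 0.1])
bins = np.array([0, 1, 2, 3])  # three bins: [0,1), [1,2), [2,3)

# One label per value: the index of the bin it lands in.
labels = np.digitize(values, bins)
print(labels)  # [1 2 3 2 1]

# Tally each label; index 0 would hold values below the first edge.
tally = np.bincount(labels)
print(tally)   # [0 2 2 1]

# For these in-range values, tally[1:] matches np.histogram's counts.
hist_counts, _ = np.histogram(values, bins=bins)
print(np.array_equal(tally[1:], hist_counts))  # True
```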
With the default right=False and increasing bins, np.digitize returns 0 for values below the first edge and len(bins) for values at or above the last edge, which is handy for spotting out-of-range data.
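A sketch of using those two sentinel labels as an out-of-range mask; the values are chosen to hit both ends.

```python
import numpy as np

bins = np.array([0, 1, 2, 3])
values = np.array([-1.0, 0.0, 2.9, 3.0, 7.0])

idx = np.digitize(values, bins)
print(idx)  # [0 1 3 4 4]

# 0 means below the first edge; len(bins) means at or above the last edge.
out_of_range = (idx == 0) | (idx == len(bins))
print(values[out_of_range])
```

Note that 3.0, exactly on the last edge, is flagged here even though np.histogram with the same edges would count it in its closed last bin.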
Replace each ___ so the program bins exam scores into four equal ranges from 50 to 100.
Expected output: counts [2 2 2 2] and edges [50. 62.5 75. 87.5 100.]. (Answers: histogram, range.)
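One possible completed version, as a sketch. The exercise's own score list is not shown here, so the scores below are hypothetical, picked so each band gets two values.

```python
import numpy as np

# Hypothetical scores: two per band.
scores = np.array([55, 60, 65, 70, 80, 85, 90, 100])

# Four equal ranges from 50 to 100; 100 lands in the closed last bin.
counts, edges = np.histogram(scores, bins=4, range=(50, 100))
print(counts)  # [2 2 2 2]
print(edges)
```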
❌ Expecting counts and edges to be the same length
There is always one more edge than count, so zip(counts, edges) silently drops the final edge and the last bin loses its upper boundary.
✅ Fix: remember len(edges) == len(counts) + 1; use edges[:-1] as bin starts and edges[1:] as bin ends.
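A sketch of pairing each count with both of its edges; the data values are illustrative.

```python
import numpy as np

data = np.array([0.2, 0.4, 1.1, 2.5, 2.9])
counts, edges = np.histogram(data, bins=3, range=(0, 3))

# Starts are edges[:-1], ends are edges[1:]; both have len(counts) items.
rows = list(zip(edges[:-1], edges[1:], counts))
for start, end, count in rows:
    print(f"{start:g} to {end:g}: {count}")
```

This prints one line per bin (0 to 1, 1 to 2, 2 to 3) with counts 2, 1 and 2.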
❌ Leaving out range
Without range, edges depend on the data's min and max and change from run to run.
✅ Fix: pass an explicit range=(low, high) so bins are reproducible and comparable.
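A sketch contrasting data-dependent edges with fixed ones. The seed and the uniform 0 to 100 batches are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed
batch_a = rng.uniform(0, 100, 50)
batch_b = rng.uniform(0, 100, 50)

# Without range: edges run from each batch's own min to its own max.
_, edges_a = np.histogram(batch_a, bins=5)
_, edges_b = np.histogram(batch_b, bins=5)
print(edges_a[0] == batch_a.min(), edges_a[-1] == batch_a.max())  # True True
print(np.array_equal(edges_a, edges_b))  # differs unless min and max match

# With an explicit range: both batches share identical edges.
_, fixed_a = np.histogram(batch_a, bins=5, range=(0, 100))
_, fixed_b = np.histogram(batch_b, bins=5, range=(0, 100))
print(np.array_equal(fixed_a, fixed_b))  # True
```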
Take 30 reproducible exam scores, bucket them into grade bands at 60/70/80/90 with digitize, then count how many fall in each band with bincount.
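A sketch of one way to do it. The seed 0 and the choice of integer scores drawn uniformly from 40 to 100 are hypothetical; the exercise does not specify them.

```python
import numpy as np

# Hypothetical setup: seed 0, integer scores from 40 to 100 inclusive.
rng = np.random.default_rng(0)
scores = rng.integers(40, 101, size=30)

# Edges at 60/70/80/90 give labels 0 (below 60) through 4 (90 and up).
bands = np.digitize(scores, [60, 70, 80, 90])

# minlength=5 keeps a slot for every band, even an empty one.
band_counts = np.bincount(bands, minlength=5)

for label, count in zip(["<60", "60s", "70s", "80s", "90+"], band_counts):
    print(f"{label}: {count}")
```

Passing minlength matters here: without it, bincount returns a shorter array when the top band happens to be empty.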
Lesson complete — histograms unlocked!
You can now bin data with np.histogram, read counts against n+1 edges, normalize with density=True, label each value with np.digitize, and tally fast with np.bincount.
Practice quiz
What two arrays does np.histogram return?
- counts and bin_edges
- min and max
- mean and std
- values and labels
Answer: counts and bin_edges. It returns the per-bin counts and the bin_edges array.
For n bins, how many bin edges are there?
- n
- n - 1
- n + 1
- 2n
Answer: n + 1. n bins need n+1 boundaries, so there is always one more edge than count.
For data [1,2,2,3,3,3,4,4,5], what does np.histogram(data, bins=4, range=(1,5)) give as counts?
Answer: [1, 2, 3, 3]. The edges are 1, 2, 3, 4, 5, so the four bins hold 1, 2, 3, and 3 values respectively; the closed last bin [4, 5] includes the 5.
What does density=True do in np.histogram?
- Doubles the bin count
- Sorts the data first
- Removes empty bins
- Normalizes so the area integrates to 1
Answer: Normalizes so the area integrates to 1. With density=True each bar is count/(total*bin_width) and the area sums to 1.
What does np.digitize return?
- The bin index each value falls into
- The total count per bin
- The bin edges
- A normalized density
Answer: The bin index each value falls into. digitize labels every value with the index of the bin it lands in.
Which function quickly tallies non-negative integer labels?
- np.histogram
- np.bincount
- np.digitize
- np.unique
Answer: np.bincount. np.bincount counts how many times each non-negative integer appears.
Why should you pass an explicit range to np.histogram?
- It is required or it errors
- Otherwise bins depend on the data's min and max and shift run to run
- It speeds up sorting
- It changes the dtype
Answer: Otherwise bins depend on the data's min and max and shift run to run. Without range, edges come from the data min/max, so bins are not reproducible.
What does np.bincount([0,1,1,2,2,2,3,0]) return?
Answer: [2, 2, 3, 1]. Label 0 appears twice, 1 twice, 2 three times, and 3 once.
What does np.digitize returning len(bins) for a value indicate?
- The value is exactly at the first edge
- The value is below the first edge
- The value is at or above the last edge
- An error occurred
Answer: The value is at or above the last edge. digitize returns len(bins) for values at or above the last edge (out of range high).
How does np.histogram differ from np.digitize?
- histogram counts per bin; digitize labels each value's bin
- They are identical
- digitize returns counts; histogram returns labels
- Only digitize works on floats
Answer: histogram counts per bin; digitize labels each value's bin. histogram gives a distribution summary; digitize tags every input element.