Binning Continuous Data: cut & qcut
Binning is the act of converting a continuous numeric column into a small set of labelled categories — turning raw ages into "child", "adult", and "senior" buckets so the values become easier to group, count, and chart.
Part of the free Pandas course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
You will learn pd.cut() for fixed value ranges, pd.qcut() for equal-frequency quantiles, and the boundary options right= and include_lowest= that decide exactly where each value lands.
pd.cut() slices a numeric column at edges you specify. You hand it a list of boundaries, and every value is placed into the interval it falls inside. With four edges you get three bins. By default each interval is closed on the right, written like (18, 35], which means 18 is excluded and 35 is included.
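A minimal sketch of that default behaviour, assuming a made-up `ages` Series and the illustrative edges 0, 18, 35 and 100 (none of these values come from the lesson itself):

```python
import pandas as pd

# Hypothetical sample data, chosen so some values sit exactly on an edge.
ages = pd.Series([5, 18, 19, 35, 36, 70])

# Four edges define three intervals: (0, 18], (18, 35], (35, 100].
edges = [0, 18, 35, 100]

# With no labels, each value is replaced by the interval it falls inside.
binned = pd.cut(ages, bins=edges)
print(binned)

# The three bins are stored as the categories of the result.
print(len(binned.cat.categories))
```

Because the intervals are closed on the right, the value 18 stays in the first bin and 35 stays in the second.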
Pass labels=[...] to replace the interval notation with friendly names — there must be exactly one label per bin. To see how many rows landed in each bin, call .value_counts(sort=False) so the bins stay in numeric order.
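As a rough illustration of labels plus counting, the sketch below reuses the "child", "adult" and "senior" names from the introduction. The `ages` values and the edges 0, 18, 65 and 120 are assumptions picked for the example.

```python
import pandas as pd

# Hypothetical ages; 18 and 65 sit exactly on an edge.
ages = pd.Series([5, 12, 18, 25, 40, 64, 65, 80])

# Three bins, so exactly three labels.
groups = pd.cut(
    ages,
    bins=[0, 18, 65, 120],
    labels=["child", "adult", "senior"],
)

# sort=False keeps the bins in their defined order instead of
# ordering them by how many rows each one holds.
counts = groups.value_counts(sort=False)
print(counts)
```

Without sort=False the result would be ordered by count, which here would put "adult" first.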
The right= argument flips which edge is inclusive. With right=False the intervals become [0, 18), so a value of exactly 18 moves up into the next bin instead of staying in the lower one.
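The sketch below compares the boundary options side by side, including include_lowest= from the lesson overview. The three sample values are assumptions chosen to sit exactly on the edges, and .cat.codes is used only to show which bin number each value received (-1 means no bin).

```python
import pandas as pd

# Hypothetical values that sit exactly on the bin edges.
ages = pd.Series([0, 18, 35])
edges = [0, 18, 35, 100]

# Default: intervals are (0, 18], (18, 35], (35, 100].
right_closed = pd.cut(ages, bins=edges)

# right=False: intervals are [0, 18), [18, 35), [35, 100).
left_closed = pd.cut(ages, bins=edges, right=False)

# include_lowest=True: the first interval is widened so it also contains 0.
with_lowest = pd.cut(ages, bins=edges, include_lowest=True)

print(right_closed.cat.codes.tolist())  # 0 has no bin, 18 is in the first bin
print(left_closed.cat.codes.tolist())   # 18 has moved up into the second bin
print(with_lowest.cat.codes.tolist())   # 0 now joins the first bin
```

Note that with the default settings a value equal to the lowest edge falls outside every interval and comes back as NaN.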
Where cut fixes the edges, qcut fixes the counts. You ask for q=4 and pandas finds the edges so each bin holds roughly a quarter of the rows: the classic quartile split. This is ideal when you want "top 25%" style buckets and you do not care about the exact value thresholds.
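A small sketch of the quartile split, assuming eight made-up, evenly spaced `income` values and the hypothetical labels Q1 to Q4:

```python
import pandas as pd

# Hypothetical incomes: eight distinct values, so quartiles split evenly.
income = pd.Series([10, 20, 30, 40, 50, 60, 70, 80])

# q=4 asks pandas to choose edges that give four equal-frequency bins.
quartiles = pd.qcut(income, q=4, labels=["Q1", "Q2", "Q3", "Q4"])

# Each bin should hold a quarter of the rows.
counts = quartiles.value_counts(sort=False)
print(counts)
```

With real data that contains ties or a row count not divisible by four, the bins are only roughly equal in size.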
With 4 edges you get 3 bins, so you need exactly 3 labels:
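One way to see the label-count rule in action is sketched here. The sample ages and the deliberately wrong two-label list are assumptions made up for the illustration.

```python
import pandas as pd

# Hypothetical ages, one per bin.
ages = pd.Series([10, 30, 70])
edges = [0, 18, 65, 120]  # 4 edges -> 3 bins

# Correct: three labels for three bins.
ok = pd.cut(ages, bins=edges, labels=["child", "adult", "senior"])
print(ok.tolist())

# Incorrect: only two labels for three bins, so pandas raises ValueError.
try:
    pd.cut(ages, bins=edges, labels=["young", "old"])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(mismatch_error)
```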
Take a small spend table and produce both a fixed-edge tier and an equal-frequency tier.
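One possible approach to this exercise is sketched below. The customer names, spend amounts, edge values and tier labels are all invented for illustration, so treat them as placeholders rather than the lesson's own data.

```python
import pandas as pd

# Hypothetical spend table.
df = pd.DataFrame({
    "customer": list("ABCDEFGH"),
    "spend": [12, 45, 80, 150, 220, 310, 480, 900],
})

# Fixed-edge tier: thresholds chosen by hand (assumed values).
df["fixed_tier"] = pd.cut(
    df["spend"],
    bins=[0, 100, 300, 1000],
    labels=["low", "mid", "high"],
)

# Equal-frequency tier: pandas picks the edges so each bin is the same size.
df["freq_tier"] = pd.qcut(df["spend"], q=4, labels=["Q1", "Q2", "Q3", "Q4"])

print(df)
print(df["fixed_tier"].value_counts(sort=False))
print(df["freq_tier"].value_counts(sort=False))
```

Comparing the two count tables shows the trade-off: the fixed tiers have uneven sizes but meaningful thresholds, while the quartile tiers are balanced but their edges depend on the data.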
Lesson complete — your numbers are now categories!
You can carve a continuous column into labelled groups with pd.cut(), balance the counts with pd.qcut(), and steer the boundaries with right= and include_lowest=.
🚀 Up next: One-Hot Encoding, where you turn those categories into model-ready 0/1 columns with get_dummies.
Practice quiz
What is the key difference between pd.cut and pd.qcut?
- They are identical
- qcut only works on dates
- cut fixes the edges; qcut fixes the count per bin
- cut returns numbers, qcut returns strings
Answer: cut fixes the edges; qcut fixes the count per bin. cut uses value-range edges you choose; qcut chooses edges so each bin holds roughly equal counts.
With bins=[0, 18, 65, 120] passed to pd.cut, how many bins do you get?
- 3
- 4
- 2
- 5
Answer: 3. Four edges create three intervals, so you need three labels.
By default (right=True), which edge of an interval like (18, 65] is inclusive?
- The left edge
- Both edges
- Neither edge
- The right edge
Answer: The right edge. right=True closes the interval on the right, so 65 lands in (18, 65].
With pd.cut and right=False, where does a value of exactly 18 land given bins [0,18,60,100]?
Answer: In the [18, 60) bin. right=False makes intervals like [18, 60), so 18 moves up into the next bin.
Why might the smallest value become NaN after pd.cut?
- The lowest edge is exclusive by default
- cut always drops the minimum
- qcut was used instead
- Labels were missing
Answer: The lowest edge is exclusive by default. With right=True the first interval excludes its lower edge, so a value exactly equal to that edge falls outside every bin and becomes NaN; pass include_lowest=True to include it.
For pd.qcut(income, q=4) on 8 evenly-binnable values, how many rows fall in each bin?
- 1
- 4
- 2
- 8
Answer: 2. q=4 splits the 8 rows into 4 equal-frequency quartiles of 2 rows each.
How do you count how many rows landed in each bin while keeping bin order?
- .sum()
- .value_counts(sort=False)
- .mean()
- .describe()
Answer: .value_counts(sort=False). value_counts(sort=False) tallies each bin and keeps the natural numeric bin order.
If pd.cut has 4 edges but you supply only 2 labels, what happens?
- The extra bins are dropped silently
- Labels repeat
- It uses interval notation
- A ValueError is raised
Answer: A ValueError is raised. You must supply exactly one label per bin (edges minus one), or pandas raises a ValueError.
What kind of object does pd.cut return?
- A float Series
- An ordered Categorical
- A list of ints
- A DatetimeIndex
Answer: An ordered Categorical. cut returns an ordered Categorical (as a Series with categorical dtype when you pass in a Series), so you can group, sort, and count by it.
When should you reach for cut rather than qcut?
- When you want balanced equal-count groups
- When the data is text
- When the thresholds carry real meaning, like a legal age or pass mark
- When you want quartiles
Answer: When the thresholds carry real meaning, like a legal age or pass mark. Use cut for meaningful fixed thresholds; use qcut for balanced equal-frequency buckets.