GroupBy: Split-Apply-Combine
GroupBy in pandas is the operation that splits a DataFrame into groups by the values in one or more key columns, applies a summary function such as mean or sum to each group, and combines the answers into one tidy result.
Part of the free Pandas course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
Learn the split-apply-combine pattern, group by single and multiple keys, run common aggregations, iterate groups, and reach into a specific group with get_group.
Every groupby follows three steps. Split the rows into groups by a key, apply a function to each group, then combine the results. The pattern df.groupby("key")["value"].mean() reads exactly like that sentence.
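The three steps above can be sketched as follows. This is a minimal illustration, assuming a small DataFrame with `region` and `amount` columns; the values are the ones used in the practice quiz, while the row order and the variable names are assumptions.

```python
import pandas as pd

# Sample data: amounts match the quiz (North 100, 50, 120; South 80, 90).
# The row order and the names df / mean_by_region are illustrative only.
df = pd.DataFrame({
    "region": ["North", "South", "North", "South", "North"],
    "amount": [100, 80, 50, 90, 120],
})

# Split rows by region, apply mean to the amount column,
# and combine the per-group answers into one Series indexed by region.
mean_by_region = df.groupby("region")["amount"].mean()
print(mean_by_region)
# North -> 90.0, South -> 85.0
```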
Group by several columns by passing a list, which forms a group for every unique combination and produces a MultiIndex result. The .agg() method lets you run more than one function at once, returning a column for each.
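As a rough sketch of both ideas, the example below groups by two keys and then runs three functions at once. The `product` values are made up for illustration and are not taken from the lesson's dataset.

```python
import pandas as pd

# Hypothetical data: the product values are invented for this sketch.
df = pd.DataFrame({
    "region": ["North", "South", "North", "South", "North"],
    "product": ["A", "A", "B", "B", "A"],
    "amount": [100, 80, 50, 90, 120],
})

# A list of keys forms one group per unique (region, product) combination.
# The result is indexed by a MultiIndex with one level per key.
by_pair = df.groupby(["region", "product"])["amount"].sum()
print(by_pair)

# .agg() with a list of function names returns one column per function.
stats = df.groupby("region")["amount"].agg(["sum", "mean", "max"])
print(stats)
```

Here `by_pair.loc[("North", "A")]` looks up a single combination by passing a tuple with one value per index level.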
A groupby object is iterable: each loop gives you the group key and its DataFrame slice. get_group() jumps straight to one group, and as_index=False keeps the grouping keys as ordinary columns instead of moving them into the index.
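The snippet below is one way to try these three features together. It assumes the same small `region`/`amount` data as the quiz; the variable names (`grouped`, `north`, `flat`) are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["North", "South", "North", "South", "North"],
    "amount": [100, 80, 50, 90, 120],
})

grouped = df.groupby("region")

# Iterating yields (key, sub-DataFrame) pairs, one per group.
keys = []
for key, group in grouped:
    keys.append(key)
    print(key, len(group))  # North 3, then South 2

# get_group() returns only the rows belonging to one named group.
north = grouped.get_group("North")
print(north)

# as_index=False keeps region as an ordinary column in the result.
flat = df.groupby("region", as_index=False)["amount"].sum()
print(flat)
```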
Replace the blank so you total the amount for each region. The expected output is North 270 and South 170.
Build a per-store report: total sales, average sale, and number of transactions, with the store kept as a column.
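One possible shape for such a report is sketched below. The exercise's actual data is not shown here, so the `sales` frame, its `store` and `sale` columns, and all values are hypothetical. The sketch uses `.agg()` followed by `reset_index()` to keep the store as a column; other approaches are possible.

```python
import pandas as pd

# Hypothetical transactions: column names and values are assumptions.
sales = pd.DataFrame({
    "store": ["A", "B", "A", "A", "B"],
    "sale": [10, 5, 20, 30, 15],
})

# Total, average and count per store, then move the store key
# out of the index and back into an ordinary column.
report = (
    sales.groupby("store")["sale"]
    .agg(["sum", "mean", "count"])
    .reset_index()
)
print(report)
```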
Lesson complete — you can summarise by category!
You now understand split-apply-combine, can group by one or many keys, run mean, sum and count, apply several functions with agg, iterate groups, reach into one with get_group, and flatten results with as_index=False.
🚀 Up next: Aggregation & Transformation — named aggregations and group-aware transforms.
Practice quiz
What are the three steps of the split-apply-combine pattern?
- Load, clean, save
- Sort, filter, join
- Split into groups, apply a function, combine results
- Map, reduce, shuffle
Answer: Split into groups, apply a function, combine results. groupby splits rows into groups, applies a function to each, then combines the results.
What does df.groupby('region')['amount'].mean() compute?
- The overall mean of amount
- The mean amount within each region
- A count per region
- The maximum amount
Answer: The mean amount within each region. It splits by region and averages amount within each group.
How do you group by more than one column?
- Call groupby twice
- Use groupby('a+b')
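- Pass a list of columns, e.g. groupby(['region', 'product'])
Answer: Pass a list of columns, e.g. groupby(['region', 'product']). A list of column names forms a group for every unique combination of values.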
What does grouping by multiple keys produce in the result?
- A flat single index
- A MultiIndex, one level per key
- A new column
- An error
Answer: A MultiIndex, one level per key. Multiple grouping keys give a MultiIndex with one level per key.
Which method runs several aggregation functions at once?
- .agg(['sum', 'mean', 'max'])
Answer: .agg(['sum', 'mean', 'max']). .agg([...]) returns a column for each function you pass.
What does as_index=False do?
- Drops the grouping keys
- Sorts the groups
- Reverses the order
- Keeps the grouping keys as ordinary columns
Answer: Keeps the grouping keys as ordinary columns. as_index=False keeps the keys as columns instead of moving them into the index.
What does grouped.get_group('North') return?
- The mean of North
- Just the North sub-DataFrame
- A boolean mask
- The number of groups
Answer: Just the North sub-DataFrame. get_group jumps straight to the rows of a single named group.
For amounts North [100,50,120] and South [80,90], what is the sum for North?
- 170
- 85.0
- 270
- 90.0
Answer: 270. 100 + 50 + 120 = 270 for North.
When you iterate a groupby object, what does each loop give you?
- The group key and its sub-DataFrame
- Only the key
- Only the values
- A single number
Answer: The group key and its sub-DataFrame. Iterating yields a (key, sub-DataFrame) pair for each group.
Using as_index=False is essentially equivalent to calling which method on the result afterwards?
- sort_values()
- drop_duplicates()
- set_index()
- reset_index()
Answer: reset_index(). Both flatten the grouping keys back into columns; reset_index does it after the fact.
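To compare the two routes side by side, here is a small sketch using the quiz's `region`/`amount` values. The names `flat_up_front` and `flat_afterwards` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["North", "South", "North", "South", "North"],
    "amount": [100, 80, 50, 90, 120],
})

# Route 1: ask groupby to keep the key as a column from the start.
flat_up_front = df.groupby("region", as_index=False)["amount"].sum()

# Route 2: let the key become the index, then flatten it afterwards.
flat_afterwards = df.groupby("region")["amount"].sum().reset_index()

print(flat_up_front)
print(flat_afterwards)
```

Both results carry `region` and `amount` as ordinary columns, with one row per region.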