GroupBy transform & filter
GroupBy transform broadcasts a per-group statistic back to every original row so you can add it as a column, while GroupBy filter keeps or discards whole groups based on a test — two tools that go beyond the usual collapse-to-a-summary aggregation.
Part of the free Pandas course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
You will use groupby().transform() to add group means and z-scores, groupby().filter() to drop small or low-volume groups, and see exactly how both differ from .agg().
A normal .agg() collapses each group to one row. .transform() does the opposite: it computes the group statistic, then spreads that one value back across every row in the group. The result has the same length and index as the input, so it slots straight into a new column.
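As a minimal sketch of that contrast, the snippet below computes a group mean both ways. The frame and its team and pts column names are illustrative assumptions, not data from the lesson.

```python
import pandas as pd

# Hypothetical data: column names and values are illustrative only.
df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "pts": [10, 20, 30, 40, 50],
})

# agg-style mean: one value per group, indexed by the team label.
team_mean = df.groupby("team")["pts"].mean()

# transform: the group mean repeated on every row, indexed like df.
df["team_mean"] = df.groupby("team")["pts"].transform("mean")

print(team_mean)  # two rows, one per team
print(df)         # five rows, each carrying its team's mean
```

The aggregated Series has one entry per team, while the transformed column has one entry per original row, which is why it can be assigned directly.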
Because transform keeps shapes aligned, you can build per-row standardised scores. A z-score measures how far a value sits from its group's mean, in units of the group's standard deviation. To compute it, pass transform a small custom function that receives each group's values and returns a Series of the same length.
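A within-group z-score might look like the sketch below. The data and the zscore helper name are assumptions made for illustration. Note that pandas' std() defaults to the sample standard deviation (ddof=1), so a group with a single row would produce NaN.

```python
import pandas as pd

# Hypothetical data: column names and values are illustrative only.
df = pd.DataFrame({
    "team": ["A", "A", "A", "B", "B", "B"],
    "pts": [10, 20, 30, 40, 50, 60],
})

def zscore(x):
    # x is one group's pts values as a Series; the result has the same length.
    return (x - x.mean()) / x.std()

# Each value is standardised against its own team's mean and std.
df["pts_z"] = df.groupby("team")["pts"].transform(zscore)

print(df)
```

Both teams here have the same spread around different means, so their z-scores match row for row even though the raw points differ.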
groupby().filter(func) hands each group to your function as a sub-DataFrame. If the function returns True, every row of that group survives; if False, the whole group is dropped. It is row selection at the group level, useful for removing rare categories or low-volume segments.
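The sketch below shows two such tests, one on group size and one on group total. The region and sales columns, the values, and the thresholds are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical data: column names and values are illustrative only.
df = pd.DataFrame({
    "region": ["north", "north", "north", "south", "east", "east"],
    "sales": [120, 90, 60, 40, 150, 30],
})

# Drop rare categories: keep regions with at least two rows.
big = df.groupby("region").filter(lambda g: len(g) >= 2)

# Drop low-volume segments: keep regions whose total sales exceed 200.
high = df.groupby("region").filter(lambda g: g["sales"].sum() > 200)

print(big)   # north and east rows survive; the single south row is dropped
print(high)  # only north (total 270) survives
```

In both cases the surviving rows keep their original index labels and order; only whole groups are removed.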
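Common mistake: agg shrinks the frame to one row per group, so its result misaligns when assigned back as a column, as in df['m'] = df.groupby('team')['pts'].mean(). Use transform('mean') instead.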
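A minimal sketch of that pitfall, assuming a small hypothetical frame with a default integer index: the aggregated result is indexed by team label, so none of its labels match the frame's row labels and the new column fills with NaN rather than raising an error.

```python
import pandas as pd

# Hypothetical data: column names and values are illustrative only.
df = pd.DataFrame({
    "team": ["A", "A", "B"],
    "pts": [10, 20, 30],
})

# Misaligned: the mean is indexed by "A" and "B", but df is indexed 0, 1, 2.
df["m"] = df.groupby("team")["pts"].mean()

# Aligned: transform returns one value per original row.
df["m_ok"] = df.groupby("team")["pts"].transform("mean")

print(df)
```

Here the m column is entirely NaN, while m_ok holds each row's team mean.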
Lesson complete — group stats now ride along!
You can broadcast a group statistic onto every row with transform, build within-group z-scores, and keep or drop whole groups with filter, all without collapsing your data the way agg does.
🚀 Up next: Time Series Ops (lag, lead, and percentage change with shift, diff, and pct_change).
Practice quiz
How does transform differ from agg?
- transform sorts the groups
- transform deletes groups
- transform keeps the original shape, agg collapses to one row per group
- They are identical
Answer: transform keeps the original shape, agg collapses to one row per group. transform broadcasts a group stat back to every row; agg shrinks.
What does df.groupby('team')['pts'].transform('mean') return?
- One value per ORIGINAL row, the group mean broadcast back
- One row per group
- A single number
- A boolean Series
Answer: One value per ORIGINAL row, the group mean broadcast back. transform returns the group mean repeated across each row of the group.
Why can you assign a transform result directly as a new column?
- It sorts the index
- It drops duplicates
- It returns a scalar
- It has the same length and index as the input
Answer: It has the same length and index as the input. transform preserves shape and index, so it aligns for assignment.
What does groupby().filter(func) do?
- Modifies individual values
- Keeps or drops whole groups based on a True/False test
- Collapses each group to a summary
- Sorts within groups
Answer: Keeps or drops whole groups based on a True/False test. filter keeps every row of groups whose predicate returns True.
What must the function passed to filter return?
- A single boolean per group
- A new DataFrame
- A list of indices
- A scalar mean
Answer: A single boolean per group. filter expects one True/False per group to keep or drop it.
What does the function passed to filter receive?
- A single row
- A column name
- Each group as a sub-DataFrame
- The group's mean only
Answer: Each group as a sub-DataFrame. Each group is handed in as a sub-DataFrame to test.
What does a within-group z-score with transform measure?
- The group's total
- The number of rows
- The maximum value
- How far each value is from its group mean in std-dev units
Answer: How far each value is from its group mean in std-dev units. (x - x.mean()) / x.std() standardises each value within its group.
Why does df['m'] = df.groupby('team')['pts'].mean() misalign?
- mean is deprecated
- agg-style mean returns one row per group, not per original row
- It returns strings
- It sorts descending
Answer: agg-style mean returns one row per group, not per original row. The aggregated result is shorter, so assigning it back misaligns.
Which expression keeps groups whose total sales exceed 200?
- sales
filter with a lambda testing the group sum keeps qualifying groups.
Which best captures the mental model of transform vs filter?
- transform adds a per-row column; filter chooses which rows to keep
- Both collapse the data
- transform drops groups; filter adds columns
- Both return a single number
Answer: transform adds a per-row column; filter chooses which rows to keep. transform enriches columns; filter selects whole groups.