GroupBy transform & filter

GroupBy transform broadcasts a per-group statistic back to every original row so you can add it as a column, while GroupBy filter keeps or discards whole groups based on a test — two tools that go beyond the usual collapse-to-a-summary aggregation.


Part of the free Pandas course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

You will use groupby().transform() to add group means and z-scores, groupby().filter() to drop small or low-volume groups, and see exactly how both differ from .agg().

A normal .agg() collapses each group to one row. .transform() keeps every row instead: it computes the group statistic, then spreads that one value back across every row in the group. The result has the same length and index as the input, so it slots straight into a new column.
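As a minimal sketch of that contrast, the snippet below builds a small made-up frame (the team and pts column names and all values are illustrative assumptions) and compares the shape of an aggregated mean with a transformed one.

```python
import pandas as pd

# Hypothetical data: column names and values are illustrative only.
df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "pts": [10, 20, 30, 40, 50],
})

# Aggregation: one value per group, indexed by the team label.
per_group = df.groupby("team")["pts"].mean()

# Transform: the group mean repeated on every row, sharing df's index,
# so it can be assigned straight into a new column.
df["team_mean"] = df.groupby("team")["pts"].transform("mean")

print(len(per_group), len(df["team_mean"]))  # 2 5
print(df)
```

Here per_group has two entries (one per team) while team_mean has five, one for each original row.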

Because transform keeps shapes aligned, you can build per-row standardised scores. A z-score measures how far a value sits from its group's mean, in units of the group's standard deviation. Pass a small custom function that returns a Series the same length as each group.
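A within-group z-score might look like the sketch below. The data and the helper name zscore are assumptions made for illustration. Note that pandas' std() uses the sample standard deviation (ddof=1) by default, so a group with a single row would produce NaN.

```python
import pandas as pd

# Hypothetical data: column names and values are illustrative only.
df = pd.DataFrame({
    "team": ["A", "A", "A", "B", "B", "B"],
    "pts": [10, 20, 30, 40, 50, 90],
})

def zscore(x):
    """Standardise one group's values: (value - group mean) / group std."""
    return (x - x.mean()) / x.std()

# transform calls zscore once per group and stitches the results back
# together in the original row order.
df["pts_z"] = df.groupby("team")["pts"].transform(zscore)

print(df)
```

Team A has mean 20 and sample standard deviation 10, so its three rows score -1, 0 and 1.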

groupby().filter(func) hands each group to your function as a sub-DataFrame. If the function returns True, every row of that group survives; if False, the whole group is dropped. It is row selection at the group level — useful for removing rare categories or low-volume segments.
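The two uses mentioned above, dropping small groups and dropping low-volume groups, can be sketched as follows. The region and sales columns and the thresholds are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical data: column names, values and thresholds are illustrative only.
df = pd.DataFrame({
    "region": ["N", "N", "S", "E", "E", "E"],
    "sales": [120, 150, 90, 30, 40, 50],
})

# Keep only groups with at least 2 rows: region S (1 row) is dropped.
big_enough = df.groupby("region").filter(lambda g: len(g) >= 2)

# Keep only groups whose total sales exceed 200:
# N totals 270 (kept); S totals 90 and E totals 120 (both dropped).
high_volume = df.groupby("region").filter(lambda g: g["sales"].sum() > 200)

print(big_enough)
print(high_volume)
```

In both cases the lambda returns a single True or False per group, and the surviving rows keep their original index and order.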

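agg shrinks the frame, so assigning its result back misaligns: df['m'] = df.groupby('team')['pts'].mean() yields one value per team, indexed by team label, not one value per original row.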
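The sketch below shows what that misalignment can look like in practice, using made-up data. Because pandas aligns on index labels when a Series is assigned to a column, the team-labelled means find no matching row labels and the new column fills with NaN rather than raising an error.

```python
import pandas as pd

# Hypothetical data: column names and values are illustrative only.
df = pd.DataFrame({
    "team": ["A", "A", "B"],
    "pts": [10, 20, 30],
})

# One value per team, indexed by 'A' and 'B'.
means = df.groupby("team")["pts"].mean()

# Assignment aligns on index: labels 'A'/'B' do not match rows 0, 1, 2,
# so every entry becomes NaN.
df["m_wrong"] = means

# transform returns one value per original row, so it lines up.
df["m_right"] = df.groupby("team")["pts"].transform("mean")

print(df)
```

If you already hold the aggregated Series, mapping it through the key column (df["team"].map(means)) is another way to broadcast it back.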

Lesson complete — group stats now ride along!

You can broadcast a group statistic onto every row with transform, build within-group z-scores, and keep or drop whole groups with filter, all without collapsing your data the way agg does.

🚀 Up next: Time Series Ops — lag, lead, and percentage change with shift, diff, and pct_change.

Practice quiz

How does transform differ from agg?

  • transform sorts the groups
  • transform deletes groups
  • transform keeps the original shape, agg collapses to one row per group
  • They are identical

Answer: transform keeps the original shape, agg collapses to one row per group. transform broadcasts a group stat back to every row; agg shrinks.

What does df.groupby('team')['pts'].transform('mean') return?

  • One value per ORIGINAL row, the group mean broadcast back
  • One row per group
  • A single number
  • A boolean Series

Answer: One value per ORIGINAL row, the group mean broadcast back. transform returns the group mean repeated across each row of the group.

Why can you assign a transform result directly as a new column?

  • It sorts the index
  • It drops duplicates
  • It returns a scalar
  • It has the same length and index as the input

Answer: It has the same length and index as the input. transform preserves shape and index, so it aligns for assignment.

What does groupby().filter(func) do?

  • Modifies individual values
  • Keeps or drops whole groups based on a True/False test
  • Collapses each group to a summary
  • Sorts within groups

Answer: Keeps or drops whole groups based on a True/False test. filter keeps every row of groups whose predicate returns True.

What must the function passed to filter return?

  • A single boolean per group
  • A new DataFrame
  • A list of indices
  • A scalar mean

Answer: A single boolean per group. filter expects one True/False per group to keep or drop it.

What does the function passed to filter receive?

  • A single row
  • A column name
  • Each group as a sub-DataFrame
  • The group's mean only

Answer: Each group as a sub-DataFrame. Each group is handed in as a sub-DataFrame to test.

What does a within-group z-score with transform measure?

  • The group's total
  • The number of rows
  • The maximum value
  • How far each value is from its group mean in std-dev units

Answer: How far each value is from its group mean in std-dev units. (x - x.mean()) / x.std() standardises each value within its group.

Why does df['m'] = df.groupby('team')['pts'].mean() misalign?

  • mean is deprecated
  • agg-style mean returns one row per group, not per original row
  • It returns strings
  • It sorts descending

Answer: agg-style mean returns one row per group, not per original row. The aggregated result has one entry per team, indexed by team label, so it does not line up with the original rows when assigned back.

Which expression keeps groups whose total sales exceed 200?

  • sales

filter with a lambda testing the group sum keeps qualifying groups.

Which best captures the mental model of transform vs filter?

  • transform adds a per-row column; filter chooses which rows to keep
  • Both collapse the data
  • transform drops groups; filter adds columns
  • Both return a single number

Answer: transform adds a per-row column; filter chooses which rows to keep. transform enriches columns; filter selects whole groups.