Aggregation & Transformation
Aggregation collapses each group of rows into a single summary value, while transformation computes a per-group statistic and broadcasts it back so every original row keeps its place — two complementary tools that turn grouped data into insight.
Part of the free Pandas course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
Learn to pass lists and dictionaries of functions to .agg(), give output columns clean names with named aggregation, and use transform to attach group statistics back onto each row.
Pass a list of functions to .agg() to compute several summaries of the same column. Pass a dictionary to apply different functions to different columns, mapping each column name to the function(s) you want.
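As a minimal sketch of both call styles, the snippet below uses a small made-up DataFrame; the column names team, points and minutes and all the values are assumptions chosen for illustration, not data from the lesson.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "points": [10, 20, 5, 15, 25],
    "minutes": [30, 34, 12, 20, 28],
})

# A list of functions on one column: one result column per function.
points_summary = df.groupby("team")["points"].agg(["sum", "mean", "max"])

# A dictionary: each column name maps to the function applied to it.
per_column = df.groupby("team").agg({"points": "sum", "minutes": "mean"})

print(points_summary)
print(per_column)
```

Both results have one row per team; only the choice of columns and functions differs.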
Named aggregation gives every output a tidy column name. Each keyword you pass to .agg() becomes a column, and its value is a tuple of ("column", "function"). This avoids the awkward MultiIndex column names you get when you apply a list of functions to several columns at once.
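Here is a short sketch of named aggregation next to the list style it replaces. The data and the output names total_points, avg_minutes and games are hypothetical, picked only to show the pattern.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "points": [10, 20, 5, 15, 25],
    "minutes": [30, 34, 12, 20, 28],
})

# Named aggregation: keyword = output column, value = (source column, function).
summary = df.groupby("team").agg(
    total_points=("points", "sum"),
    avg_minutes=("minutes", "mean"),
    games=("points", "count"),
)

# For comparison: a list of functions over several columns
# produces two-level (MultiIndex) column labels.
multi = df.groupby("team")[["points", "minutes"]].agg(["sum", "mean"])

print(summary.columns.tolist())
print(multi.columns.tolist())
```

The first result has flat labels you can use directly, while the second needs tuple labels such as ("points", "sum") to select a column.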
The key difference: agg reduces each group to one row, while transform preserves shape and returns a value for every original row. That makes transform perfect for attaching a group statistic, like the group mean, back onto each row so you can compare individuals to their group.
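The difference in shape is easiest to see side by side. This sketch assumes a small invented table of teams and points; the column names team_mean and vs_team are illustrative choices.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "points": [10, 20, 5, 15, 25],
})

# agg: one value per group, so the result has 2 rows (one per team).
team_mean = df.groupby("team")["points"].agg("mean")

# transform: one value per original row, so the result has 5 rows
# and can be assigned straight back as a new column.
df["team_mean"] = df.groupby("team")["points"].transform("mean")

# Compare each individual to their group.
df["vs_team"] = df["points"] - df["team_mean"]

print(len(team_mean), len(df))
print(df)
```

Because the transform result is aligned to the original index, no merge is needed to bring the group mean back onto each row.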
Replace the blank so each row gets its team's total points. The expected team_total column is 30, 30, 45, 45, 45.
Use transform to compute each sale's percentage share of its region total, then build a named-aggregation summary.
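One possible approach to this kind of task is sketched below. The sales table, its region and amount columns, and the output names are all assumptions made up for the example, since the exercise data is not shown here.

```python
import pandas as pd

# Hypothetical sales data, invented for illustration.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "amount": [100, 300, 50, 150, 300],
})

# transform('sum') returns the region total on every row,
# so the division lines up row by row.
region_total = sales.groupby("region")["amount"].transform("sum")
sales["pct_of_region"] = sales["amount"] / region_total * 100

# Named-aggregation summary: one row per region with readable column names.
summary = sales.groupby("region").agg(
    total=("amount", "sum"),
    largest=("amount", "max"),
    n_sales=("amount", "count"),
)

print(sales)
print(summary)
```

Within each region the percentages add up to 100, which is a quick sanity check that the totals were broadcast correctly.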
Lesson complete — you summarise like a pro!
You can now pass lists and dicts of functions to agg, give outputs clean names with named aggregation, and use transform to broadcast group statistics back onto every row. You know exactly when to reduce and when to preserve shape.
🚀 Up next: Merging & Joining DataFrames — combine data from multiple tables.
Practice quiz
What does .agg(['sum', 'mean', 'max']) on a grouped column return?
- One summary row per group with three result columns
- A single number
- The original rows unchanged
- An error
Answer: One summary row per group with three result columns. A list of functions produces one column per function, summarised per group.
How do you apply a different function to each column in one call?
- Pass a list of functions
- Pass a dictionary mapping column to function
- Call .agg() with no arguments
- Use .transform()
Answer: Pass a dictionary mapping column to function. A dict like {'points':'sum','minutes':'mean'} maps each column to its own function.
In named aggregation, what is the form of each keyword's value?
- A single function name
- A tuple of (column, function)
- A list of columns
- A boolean
Answer: A tuple of (column, function). Each keyword maps to a (source_column, function) tuple, and the keyword becomes the output column name.
What is the main difference between agg and transform?
- agg reduces each group to one row; transform keeps the original shape
- They are identical
- transform only works on numbers
- agg cannot use mean
Answer: agg reduces each group to one row; transform keeps the original shape. agg collapses each group to a summary; transform broadcasts a result back to every original row.
For df.groupby('team')['points'].transform('sum') with teams A,A,B,B,B and points 10,20,5,15,25, what is the result?
Answer: 30, 30, 45, 45, 45. transform broadcasts each team's total (A=30, B=45) onto every row, keeping the original length.
Which method gives a smaller summary table rather than a column the same length as the DataFrame?
- transform
- agg
- map
- apply with axis=1
Answer: agg. agg reduces to a summary; use transform when you want a same-length column.
Why use named aggregation instead of a list of functions?
- It runs faster
- It avoids awkward MultiIndex column names by giving clean labels
- It is the only way to use sum
- It changes the index
Answer: It avoids awkward MultiIndex column names by giving clean labels. Named aggregation lets you set predictable, readable output column names directly.
To compute each row's share of its group total, which approach fits best?
- Divide the value by a transform('sum') of the group
- Use agg('sum') alone
- Use sort_values
- Use drop_duplicates
Answer: Divide the value by a transform('sum') of the group. transform('sum') keeps the same shape so you can divide each value by its group total.
What does .agg(games=('points', 'count')) produce for a group of 3 rows?
- The value 3 in a column named games
- The sum of points
- The mean of points
- An error because count needs a list
Answer: The value 3 in a column named games. count counts the non-missing points values in the group (3 here, assuming none are NaN); the keyword games names the output column.
Which statement about transform is true?
- It returns fewer rows than the input
- It returns a value for every original row
- It only works after agg
- It removes duplicate groups
Answer: It returns a value for every original row. transform preserves shape, returning one value per original row.