group_by & summarise
group_by() and summarise() are dplyr's tools for grouped statistics — the "split-apply-combine" pattern that answers questions like average sales per region or counts per category in a couple of readable lines.
Part of the free R course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
By the end of this lesson you'll group data by one or more columns, compute several summaries at once, count rows with n(), and sort the results into a tidy report.
What You'll Learn in This Lesson
1️⃣ group_by() + summarise()
group_by() tags rows by a column; the following summarise() computes its result once per group. Together they turn raw rows into a per-category summary.
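As a minimal sketch of that pairing, the example below computes an average per region. The `sales` data frame and its column names are made up for illustration and are not part of the lesson's dataset.

```r
library(dplyr)

# Hypothetical data, invented for this example
sales <- data.frame(
  region = c("North", "South", "North", "South", "East"),
  amount = c(100, 200, 300, 400, 500)
)

# group_by() records the groups; summarise() then returns one row per group
avg_by_region <- sales %>%
  group_by(region) %>%
  summarise(avg_amount = mean(amount))

print(avg_by_region)
```

Five input rows become three output rows, one for each distinct region.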
2️⃣ Multiple Summaries and n()
Inside one summarise() you can compute many statistics — sum, mean, min, max — and use n() to count the rows in each group.
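A short sketch of several statistics in one call might look like this. The data and the output column names (`n_orders`, `total`, and so on) are assumptions chosen for the example.

```r
library(dplyr)

# Hypothetical data, invented for this example
sales <- data.frame(
  region = c("North", "South", "North", "South", "North"),
  amount = c(100, 200, 300, 400, 500)
)

region_stats <- sales %>%
  group_by(region) %>%
  summarise(
    n_orders = n(),           # rows in the current group
    total    = sum(amount),
    average  = mean(amount),
    smallest = min(amount),
    largest  = max(amount)
  )

print(region_stats)
```

Each named argument to summarise() becomes one column in the result, so the report grows sideways while staying at one row per group.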
3️⃣ Grouping by Several Columns
Group by two or more columns for a cross-tab style breakdown, for instance sales by region and channel. By default the result stays grouped by every column except the last, so pass .groups = "drop" to summarise() to return an ungrouped table.
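The sketch below groups by two columns and compares the result with and without `.groups = "drop"`. The data is hypothetical; `group_vars()` is dplyr's helper for listing the grouping columns still attached to a table.

```r
library(dplyr)

# Hypothetical data, invented for this example
sales <- data.frame(
  region  = c("North", "North", "North", "South", "South"),
  channel = c("Online", "Store", "Online", "Online", "Store"),
  amount  = c(100, 200, 300, 400, 500)
)

# One row per region/channel combination, returned ungrouped
by_region_channel <- sales %>%
  group_by(region, channel) %>%
  summarise(total = sum(amount), .groups = "drop")

# Without .groups, only the last grouping column (channel) is removed,
# so the result is still grouped by region (dplyr prints a message about it)
still_grouped <- sales %>%
  group_by(region, channel) %>%
  summarise(total = sum(amount))

group_vars(by_region_channel)  # character(0)
group_vars(still_grouped)      # "region"
```

Leftover grouping is easy to miss, because a later mutate() or filter() on `still_grouped` would silently run per region instead of over the whole table.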
Your turn. Fill in the # TODO blank, run it, and compare with the expected output.
Write the full pipeline from the outline, run it, and check it against the example output. The classic reporting flow is mutate(), then group_by(), then summarise(), then arrange().
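The exercise's own outline and data are not reproduced here, so the following is only one possible shape for such a pipeline. The `orders` data frame and every column name in it are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical data, invented for this example
orders <- data.frame(
  region = c("North", "South", "North", "East", "South"),
  units  = c(2, 5, 1, 4, 3),
  price  = c(10, 20, 30, 5, 10)
)

report <- orders %>%
  mutate(revenue = units * price) %>%      # 1. derive a new column per row
  group_by(region) %>%                     # 2. mark the groups
  summarise(
    n_orders      = n(),
    total_revenue = sum(revenue)
  ) %>%                                    # 3. collapse to one row per group
  arrange(desc(total_revenue))             # 4. sort, largest total first

print(report)
```

The mutate() step comes first because `revenue` has to exist on every row before it can be summed within each region.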
📋 Quick Reference — group_by & summarise
Practice quiz
What does group_by() do on its own?
- Collapses the table to one row
- Sorts the data
- Tags rows so the NEXT operation runs per group
- Deletes duplicate rows
Answer: Tags rows so the NEXT operation runs per group. group_by() tags rows by category; the following verb then runs within each group.
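A quick way to see this is to inspect a table right after group_by(), before any other verb runs. The sketch uses made-up data; `group_vars()` and `n_groups()` are dplyr helpers that report the grouping metadata.

```r
library(dplyr)

# Hypothetical data, invented for this example
sales <- data.frame(
  region = c("North", "South", "North"),
  amount = c(100, 200, 300)
)

grouped <- sales %>% group_by(region)

nrow(grouped)        # 3: no rows were removed or collapsed
group_vars(grouped)  # "region": the grouping is stored as metadata
n_groups(grouped)    # 2: North and South
```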
What does summarise() do to each group?
- Adds a column to every row
- Collapses each group to a single summary row
- Keeps all rows unchanged
- Removes the grouping column
Answer: Collapses each group to a single summary row. summarise() reduces each group to one row of summary values.
Which function counts the number of rows in the current group?
- n()
- count()
- nrow()
- length()
Answer: n(). n() returns the number of rows in the current group inside dplyr verbs.
Which pattern does group_by() + summarise() implement?
- Map-reduce-only
- Filter-sort-join
- Read-eval-print
- Split-apply-combine
Answer: Split-apply-combine. Grouping then summarising is the classic split-apply-combine workflow.
How do summarise() and mutate() differ?
- summarise() collapses to one row per group; mutate() keeps every row
- They are identical
- mutate() collapses rows; summarise() keeps them
- Both delete columns
Answer: summarise() collapses to one row per group; mutate() keeps every row. mutate() keeps the table height; summarise() aggregates each group to one row.
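The difference is easiest to see by running the same grouped sum through both verbs. This is a sketch on hypothetical data, with `region_total` as an invented column name.

```r
library(dplyr)

# Hypothetical data, invented for this example
sales <- data.frame(
  region = c("North", "North", "South"),
  amount = c(100, 300, 200)
)

# summarise(): one row per region
collapsed <- sales %>%
  group_by(region) %>%
  summarise(region_total = sum(amount))

# mutate(): every original row kept, with the group total repeated on each
kept <- sales %>%
  group_by(region) %>%
  mutate(region_total = sum(amount)) %>%
  ungroup()

nrow(collapsed)  # 2
nrow(kept)       # 3
```

The mutate() form is handy when each row needs to be compared with its group, for example to compute a share such as `amount / region_total`.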
What does summarise(.groups = "drop") do?
- Drops the data frame
- Removes all grouping after summarising
- Deletes the summary columns
- Drops missing values
Answer: Removes all grouping after summarising. .groups = "drop" clears the grouping so later steps act on the whole result.
To get sales per region AND channel, you should:
- Use filter() twice
- group_by(region, channel)
- Only group_by(region)
- Skip group_by entirely
Answer: group_by(region, channel). Group by several columns for a cross-tab style summary.
Why might summarise(total = sum(x)) return NA for a group?
- x is a character
- The group is too large
- The group contains missing values (NA)
- summarise never returns NA
Answer: The group contains missing values (NA). Missing values propagate; add na.rm = TRUE inside sum()/mean() to ignore them.
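The following sketch puts one missing value into a group and computes the total both ways. The data and column names are made up for the example.

```r
library(dplyr)

# Hypothetical data with one missing amount in the North group
sales <- data.frame(
  region = c("North", "North", "South"),
  amount = c(100, NA, 200)
)

totals <- sales %>%
  group_by(region) %>%
  summarise(
    total       = sum(amount),               # NA for North: the NA propagates
    total_no_na = sum(amount, na.rm = TRUE)  # 100 for North: the NA is skipped
  )

print(totals)
```

The South group has no missing values, so both columns give 200 there.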
Which is a quick group-and-tally shortcut in dplyr?
- tally_all(region)
- group(region)
- rollup(region)
- count(region)
Answer: count(region). count(region) groups and counts in a single verb.
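To illustrate the shortcut, the sketch below tallies rows both ways on made-up data. count() names its result column `n` by default, so the longer form uses the same name for comparison.

```r
library(dplyr)

# Hypothetical data, invented for this example
sales <- data.frame(
  region = c("North", "South", "North", "East", "North")
)

# Shortcut: group and count in one verb
tally_short <- sales %>% count(region)

# Long form: the same tally spelled out
tally_long <- sales %>%
  group_by(region) %>%
  summarise(n = n())

print(tally_short)
```

Both tables list East, North and South with counts of 1, 3 and 1.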
Where does n() work?
- Inside dplyr verbs like summarise() or mutate()
- Anywhere in base R
- Only inside print()
- Only on vectors
Answer: Inside dplyr verbs like summarise() or mutate(). n() only works inside dplyr verbs, not in plain base R calls.
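A small sketch of that restriction: the call to n() on its own is wrapped in tryCatch() so the script keeps running, and the replacement message string is invented for this example. Outside dplyr, base R's nrow() gives the row count of a whole data frame.

```r
library(dplyr)

# Hypothetical data, invented for this example
sales <- data.frame(region = c("North", "South", "North"))

# Inside a dplyr verb, n() knows which group it is in
inside <- sales %>%
  group_by(region) %>%
  summarise(rows = n())

# Called on its own, n() has no group context and raises an error
outside <- tryCatch(
  n(),
  error = function(e) "error: n() called outside a dplyr verb"
)

# Base R row count for the whole table
whole_table <- nrow(sales)

print(inside)
print(outside)
print(whole_table)
```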