Joining Data
A join combines two tables by matching rows on a shared key column — the way you bring related data, split across separate tables, back together for analysis.
Learn Joining Data in our free R course — a beginner-friendly interactive lesson with worked examples, a practice exercise and a quick reference.
Part of the free R course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
By the end of this lesson you'll use inner_join() and left_join() to combine tables, understand full_join() and anti_join(), and chain a join into a grouped summary.
What You'll Learn in This Lesson
1️⃣ inner_join(): Only Matches
inner_join(a, b, by = "key") keeps only rows whose key appears in both tables. Rows without a partner on either side are dropped.
2️⃣ left_join(): Keep the Left Table
left_join() keeps every row of the left table and attaches matching data from the right, filling NA where there's no match. It's the most common join because it never loses your main records.
3️⃣ full_join() and anti_join()
full_join() keeps all rows from both tables; anti_join() is a filtering join that returns rows of the first table with no match in the second — ideal for "what's missing" questions.
Your turn. Fill in the # TODO blank, run it, and compare with the expected output.
Write the full pipeline from the outline, run it, and check it against the example output. Join then group then summarise is one of the most common real analyses.
📋 Quick Reference — Joins
Practice quiz
Which join keeps only rows with a match in BOTH tables?
- full_join()
- inner_join()
- left_join()
- anti_join()
Answer: inner_join(). inner_join() drops any row without a partner on either side.
What does left_join() do with a left-table row that has no match?
- Drops the row
- Duplicates it
- Keeps it and fills NA for right columns
- Raises an error
Answer: Keeps it and fills NA for right columns. left_join() keeps every left row, filling NA where the right table has no match.
Which join keeps every row from both tables?
- full_join()
- inner_join()
- semi_join()
- anti_join()
Answer: full_join(). full_join() keeps everything from both, filling gaps with NA.
What does anti_join(a, b) return?
- Rows matched in both
- All rows of b
- Rows of a with no match in b
- The intersection of columns
Answer: Rows of a with no match in b. anti_join is a filtering join returning a's rows that have NO match in b.
How do you specify the key column for a join?
- key = "id"
- on = "id"
- using("id")
- by = "id"
Answer: by = "id". The by argument names the key, e.g. by = "id".
How do you join when key columns have different names?
- by = c("left" = "right")
- by = "left, right"
- on = left == right
- key = both
Answer: by = c("left" = "right"). Use a named vector: by = c("left_name" = "right_name").
Which filtering join keeps only rows of a that DO have a match in b?
- anti_join()
- right_join()
- semi_join()
- full_join()
Answer: semi_join(). semi_join() keeps matching rows of a without adding columns from b.
Why might a join cause the row count to explode?
- Missing by argument
- Duplicate keys on both sides (many-to-many)
- Using left_join
- NA values present
Answer: Duplicate keys on both sides (many-to-many). Duplicate keys on both sides multiply rows in a many-to-many join.
Which join is most common because it never loses your main records?
- inner_join()
- anti_join()
- left_join()
- full_join()
Answer: left_join(). left_join() preserves all rows of the primary (left) table.
What suffixes appear when both tables share a non-key column name?
- .x and .y
- .a and .b
- .1 and .2
- .left and .right
Answer: .x and .y. dplyr disambiguates clashing columns with .x and .y by default.