Merging & Joining DataFrames
Merging in pandas is the operation of combining two DataFrames into one by matching rows on shared key columns or indexes — the same idea as a SQL JOIN, letting you stitch related tables (like customers and their orders) into a single analysis-ready table.
Learn Merging & Joining DataFrames in our free Pandas course — a beginner-friendly interactive lesson with worked examples, a practice exercise and a quick…
Part of the free Pandas course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
Real data almost never lives in one table. You might have a customers table and a separate orders table that share a customer_id . pd.merge() lines them up by that shared key.
The most common merge is a many-to-one join: many orders point to one customer. Each order row gets the matching customer details attached.
The how argument decides which keys survive the merge:
Only keys present in BOTH frames — the intersection.
All keys from the left frame; non-matches get NaN.
All keys from the right frame; non-matches get NaN.
When the key columns have different names , use left_on and right_on . When both frames have a column with the same name (other than the key), pandas appends suffixes so nothing is overwritten. And df.join() is a shortcut for joining on the index.
Fill in the blank so every product appears, even ones that were never sold. Which how keeps all rows from the left frame?
Join three pieces of information — customers, orders, and a region lookup — into one report, then total the amount spent per city.
Lesson complete — you can now combine tables like a pro!
You can merge on columns or indexes, choose the right join type, handle mismatched key names and overlapping columns, and audit which rows matched. This is the backbone of every multi-table analysis.
🚀 Up next: Concatenating & Appending — stack DataFrames on top of or beside each other.
Practice quiz
Which join keeps only keys present in BOTH frames?
- how='outer'
- how='inner'
- how='left'
- how='right'
Answer: how='inner'. An inner join keeps the intersection — only keys found in both frames.
Which join keeps the union of all keys from both frames?
- how='inner'
- how='left'
- how='outer'
- how='cross'
Answer: how='outer'. An outer join keeps every key from both sides, filling gaps with NaN.
To keep ALL rows from the left frame, even unmatched ones, which how do you use?
- how='left'
- how='right'
- how='inner'
- how='outer'
Answer: how='left'. how='left' keeps every left-frame key; non-matches get NaN.
When key columns have different names, which arguments do you use?
- on and by
- suffixes
- left_index and right_index
- left_on and right_on
Answer: left_on and right_on. left_on/right_on tell merge which differently-named columns to match.
What does suffixes=('_l','_r') control?
- The join type
- Renaming overlapping non-key columns
- The sort order
- Which keys survive
Answer: Renaming overlapping non-key columns. suffixes disambiguate columns that share a name across both frames.
What does indicator=True add to the result?
- A row total
- A _merge column tagging each row's source
- An index
- A sort key
Answer: A _merge column tagging each row's source. indicator=True adds a _merge column of 'both', 'left_only', or 'right_only'.
What does df.join() join on by default?
- The first column
- A random key
- The index of the other frame
- All columns
Answer: The index of the other frame. join is a shortcut that aligns on the index and defaults to a left join.
What is the default value of the how argument in pd.merge?
- 'inner'
- 'left'
- 'outer'
- 'right'
Answer: 'inner'. pd.merge defaults to how='inner'.
If a key is duplicated on BOTH sides, what happens to row count?
- It stays the same
- It is halved
- Rows multiply (one per combination)
- An error is raised
Answer: Rows multiply (one per combination). A many-to-many match produces a row for every combination, multiplying rows.
What fills non-matching cells in a left join?
- 0
- An empty string
- The previous value
- NaN
Answer: NaN. Unmatched right-frame columns become NaN in a left join.