The Index: set_index, reset_index & reindex
The index is a pandas DataFrame's row-label axis — the column of labels on the left that pandas uses to align, look up, and join rows quickly.
Part of the free Pandas course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
Learn to promote a column to the index with set_index, push it back with reset_index, and conform to new labels with reindex, and see why a meaningful index lets .loc find rows without scanning a whole column.
When you build a DataFrame, pandas hands you a default RangeIndex: the plain integers 0, 1, 2, 3... down the left side. That auto-number rarely means anything. If one of your columns is a natural identifier (a product code, an employee ID, a date), you can promote it to become the index with df.set_index("id").
Once a column is the index it stops being a regular column (it moves to the label axis), and you can address rows by that label using .loc. This is the difference between "row number 2" and "the row for product P-100."
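A minimal sketch of that promotion, using a made-up three-row catalogue (the column names id, name, and price and all values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical catalogue; the column names and values are illustrative only.
df = pd.DataFrame({
    "id": ["P-100", "P-101", "P-102"],
    "name": ["Widget", "Gadget", "Doohickey"],
    "price": [9.99, 24.50, 4.25],
})

# A fresh DataFrame gets a default RangeIndex: 0, 1, 2.
print(df.index)

# Promote "id" to the index; set_index returns a new DataFrame.
catalogue = df.set_index("id")

# "id" has moved to the label axis, so it is no longer a regular column.
print(list(catalogue.columns))

# Look up a row by its label rather than by its position.
row = catalogue.loc["P-100"]
print(row["name"], row["price"])
```

The original df is left untouched here, which is why the result is stored under a separate name.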
reset_index() is the inverse of set_index. It pops the current index back out into an ordinary column and gives you a fresh default RangeIndex (0, 1, 2...). You will reach for this constantly: after a groupby, after filtering rows away (which leaves gaps in the integer index), or whenever you want the index labels to live in a real column you can save to CSV.
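The two most common cases, filtering and groupby, might look like this. The sales frame and its region and amount columns are hypothetical:

```python
import pandas as pd

# Hypothetical sales data; names and values are illustrative only.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East"],
    "amount": [100, 250, 75, 300],
})

# Filtering keeps the original integer labels, leaving gaps (here 1 and 3).
big = sales[sales["amount"] > 200]
print(list(big.index))

# drop=True discards the old labels and renumbers from 0.
big = big.reset_index(drop=True)
print(list(big.index))

# After a groupby, the group keys become the index...
totals = sales.groupby("region")["amount"].sum()

# ...and reset_index() turns them back into an ordinary column.
flat = totals.reset_index()
print(list(flat.columns))
```

Use drop=True when the old labels carry no information, and plain reset_index() when you want to keep them as data.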
reindex() is different from both. You hand it the exact list of labels you want the result to have. Pandas keeps and reorders the rows whose labels you asked for, drops any you left out, and creates NaN-filled rows for labels that do not yet exist. It is the tool for forcing data onto a known shape — for example, making sure every day of the week appears even if some had no sales.
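As a sketch of the weekday case, assume a small frame with a units column and sales recorded on only three days (all names and numbers are invented for illustration):

```python
import pandas as pd

# Hypothetical daily sales; only three days have data.
sales = pd.DataFrame(
    {"units": [5, 3, 8]},
    index=["Mon", "Wed", "Sat"],
)

# Ask for exactly these labels, in this order.
weekdays = ["Mon", "Tue", "Wed", "Thu", "Fri"]

# Mon and Wed are kept, Sat is dropped, Tue/Thu/Fri appear as NaN rows.
conformed = sales.reindex(weekdays)
print(conformed)

# fill_value substitutes a value for the labels that were missing.
filled = sales.reindex(weekdays, fill_value=0)
print(filled)
```

Note that reindex matches on existing labels only; it does not rename anything, so a misspelled label simply produces another NaN row.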
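You called set_index but the original frame looks unchanged. set_index returns a new DataFrame instead of modifying df, so assign the result back with df = df.set_index("id").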
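A short sketch of the mistake and the fix, on a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical frame for illustration.
df = pd.DataFrame({"id": ["P-100", "P-101"], "price": [9.99, 24.50]})

# The result is discarded, so df still has "id" as a regular column.
df.set_index("id")
unchanged = list(df.columns)
print(unchanged)

# Assigning the result back keeps the change.
df = df.set_index("id")
changed = list(df.columns)
print(changed)
```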
❌ KeyError when using .loc on a default index
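You tried df.loc["P-101"] but never set an index, so the labels are still 0, 1, 2 and "P-101" is not among them.
✅ Fix: set the index first with df = df.set_index("id"), or keep the default index and filter with a boolean mask such as df[df["id"] == "P-101"].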
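Both the error and the two fixes can be reproduced in a few lines. The catalogue below is hypothetical:

```python
import pandas as pd

# Hypothetical catalogue for illustration.
df = pd.DataFrame({
    "id": ["P-100", "P-101", "P-102"],
    "price": [9.99, 24.50, 4.25],
})

# With the default index the labels are 0, 1, 2, so "P-101" is not a label.
try:
    df.loc["P-101"]
except KeyError as err:
    print("KeyError:", err)

# Fix 1: set the index, then look up by label.
by_id = df.set_index("id")
price_from_label = by_id.loc["P-101", "price"]

# Fix 2: keep the default index and filter with a boolean mask.
matches = df[df["id"] == "P-101"]
price_from_mask = matches["price"].iloc[0]

print(price_from_label, price_from_mask)
```

The mask is handy for a one-off lookup; setting the index pays off when you look rows up by the same key repeatedly.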
Take a small catalogue through the full lifecycle of the index.
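One way to walk through that lifecycle, sketched on a hypothetical three-product catalogue (the column names, product codes, and the target label list are all assumptions):

```python
import pandas as pd

# Hypothetical catalogue; names and values are illustrative only.
catalogue = pd.DataFrame({
    "id": ["P-100", "P-101", "P-102"],
    "name": ["Widget", "Gadget", "Doohickey"],
    "stock": [12, 0, 7],
})

# 1. Promote the identifier to the index.
indexed = catalogue.set_index("id")

# 2. Confirm the labels are unique, so each label maps to exactly one row.
print(indexed.index.is_unique)

# 3. Look up one value by row label and column name.
print(indexed.loc["P-102", "stock"])

# 4. Conform to a target list of labels: P-101 is left out, P-103 is new.
target = ["P-100", "P-102", "P-103"]
conformed = indexed.reindex(target)

# 5. Flatten the index back into a column, ready for to_csv(index=False).
flat = conformed.reset_index()
print(flat)
```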
Lesson complete — you own the index!
You can promote a column with set_index, flatten it with reset_index, force data onto target labels with reindex, and you understand why a meaningful, unique index makes .loc lookups fast.
🚀 Up next: Data Types & astype — control how pandas stores each column and convert between int, float, object, and datetime.
Practice quiz
What is the index of a DataFrame?
- A hidden cache
- The first data column
- The row-label axis on the left that pandas uses to align rows
- The list of column names
Answer: The row-label axis on the left that pandas uses to align rows. The index is the row-label axis used for alignment, lookups, and joins.
What does df.set_index('id') do?
- Sorts by id
- Deletes the id column
- Counts unique ids
- Promotes the id column to become the row-label index
Answer: Promotes the id column to become the row-label index. set_index promotes a column to the index so you can look rows up by that label.
What default index does a new DataFrame get?
- A RangeIndex of 0, 1, 2, ...
- The first column
- An empty index
- Random IDs
Answer: A RangeIndex of 0, 1, 2, .... Pandas gives a default RangeIndex (0, 1, 2, ...).
What does reset_index() do?
- Sorts the index
- Deletes all rows
- Pops the index back into a column and restores a default RangeIndex
- Renames the index
Answer: Pops the index back into a column and restores a default RangeIndex. reset_index moves the index labels back into an ordinary column with a fresh RangeIndex.
After df = df.set_index('id'), how do you look up the 'P-101' row?
Answer: df.loc['P-101']. With a label index, df.loc['P-101'] selects that row by label.
What does reset_index(drop=True) do with the old index?
- Keeps it as a new column
- Throws it away entirely
- Moves it to the end
- Sorts it
Answer: Throws it away entirely. drop=True discards the old index rather than keeping it as a column.
What does reindex(['Mon','Tue','Wed','Thu']) do for a label that does not exist?
- Raises a KeyError
- Drops the request
- Fills it with 0
- Creates a row filled with NaN
Answer: Creates a row filled with NaN. Missing labels get NaN-filled rows (unless you pass fill_value).
How do you make reindex fill missing labels with 0 instead of NaN?
- Pass fill_value=0
- Pass na=0
- Call fillna first
- It is impossible
Answer: Pass fill_value=0. reindex(..., fill_value=0) substitutes 0 for missing labels.
Why is a meaningful, unique index faster for lookups?
- It compresses the data
- It runs on the GPU
- It can use a hash or binary search instead of scanning every row
- It caches the whole table
Answer: It can use a hash or binary search instead of scanning every row. A unique index can be looked up through a hash table and a sorted index through binary search, so pandas does not have to compare the label against every row.
Why might set_index appear to 'not work'?
- It only works on numbers
- The result was not assigned back to df
- It needs inplace=True always
- The column must be sorted first
Answer: The result was not assigned back to df. set_index returns a new DataFrame; reassign it (df = df.set_index('id')).