Handling Missing Data (NaN, fillna, dropna)
Missing data in pandas is represented by NaN ("Not a Number"), and handling it means detecting those gaps and deciding whether to remove the affected rows or fill them with a sensible value — a step almost every real dataset requires before analysis.
Part of the free Pandas course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
Learn to detect NaN with isna()/notna(), count gaps, drop incomplete rows with dropna(), and fill holes with fillna() — including filling with a column's mean.
Pandas marks missing values with NaN. To create one in code, use numpy.nan. Because NaN == NaN is False, you can't find gaps with ==. Instead, use df.isna() (identical to df.isnull()), which returns a True/False mask, and df.notna() for the opposite.
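To make the detection step concrete, here is a minimal sketch. The DataFrame and its column names (name, age, score) are made up for illustration and are not part of the lesson's own dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical survey data; np.nan marks the gaps.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],
    "age": [31.0, np.nan, 45.0],
    "score": [np.nan, np.nan, 7.5],
})

# Equality never matches NaN, so this mask is False in every row.
eq_mask = df["age"] == np.nan

# isna() flags each missing cell; notna() is its exact inverse.
na_mask = df.isna()
present_mask = df.notna()

# Summing the boolean mask counts the gaps in each column.
gaps_per_column = df.isna().sum()
print(gaps_per_column)
```

In this example the age column has one gap and the score column has two, while the == comparison finds none.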
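df.dropna() deletes rows that contain any NaN and returns a new DataFrame. Arguments fine-tune it, including how='all' (drop only rows where every value is NaN) and subset=['score'] (check only the listed columns).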
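The sketch below compares the default behavior with two common variations, the how and subset arguments. The data is hypothetical and chosen so that each call keeps a different set of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 lacks a score, row 1 lacks an age,
# row 2 is complete, row 3 is entirely missing.
df = pd.DataFrame({
    "age": [31.0, np.nan, 45.0, np.nan],
    "score": [np.nan, 6.0, 7.5, np.nan],
})

# Default (how="any"): drop a row if any value is NaN.
any_dropped = df.dropna()

# how="all": drop a row only if every value is NaN.
all_dropped = df.dropna(how="all")

# subset: only look at the listed columns when checking for NaN.
score_only = df.dropna(subset=["score"])

print(list(any_dropped.index))  # row labels kept by the default call
print(list(all_dropped.index))
print(list(score_only.index))
```

Each call returns a new DataFrame, so df itself still has all four rows afterwards.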
Often you want to keep every row and replace gaps instead. df.fillna(value) substitutes a constant; df.ffill() forward-fills the last valid value down (the older spelling fillna(method="ffill") is deprecated in recent pandas releases); and a very common technique is filling a numeric column with its own mean.
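As a rough illustration of the three filling strategies, the sketch below applies each one to the same hypothetical score column. It uses Series.ffill() for the forward fill, which should behave the same as the method="ffill" form without triggering a deprecation warning on newer pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical column with two gaps.
df = pd.DataFrame({"score": [4.0, np.nan, 8.0, np.nan]})

# Constant fill: every NaN becomes 0.
zero_filled = df["score"].fillna(0)

# Mean fill: .mean() skips NaN, so the mean here is (4 + 8) / 2 = 6.
mean_filled = df["score"].fillna(df["score"].mean())

# Forward fill: each gap takes the last valid value above it.
forward_filled = df["score"].ffill()

print(zero_filled.tolist())
print(mean_filled.tolist())
print(forward_filled.tolist())
```

Note that a forward fill cannot repair a gap in the very first row, because there is no earlier value to carry down.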
❌ Forgetting that dropna/fillna return copies
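A short sketch of this mistake and its fix, using a hypothetical one-column DataFrame: calling fillna() without keeping the result leaves the original untouched, so the result has to be assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one gap.
df = pd.DataFrame({"score": [4.0, np.nan, 8.0]})

# Mistake: the filled result is discarded, so df still contains the NaN.
df.fillna(0)
still_missing = int(df["score"].isna().sum())

# Fix: assign the returned DataFrame back to the name.
df = df.fillna(0)
remaining = int(df["score"].isna().sum())

print(still_missing, remaining)
```

The same pattern applies to dropna(): write df = df.dropna() if you want the change to stick.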
A survey DataFrame has gaps. Clean it step by step.
Lesson complete — your data is clean and analysis-ready!
You can detect NaN with isna(), tally gaps with isna().sum(), remove incomplete rows with dropna(), and fill holes with fillna(), including with a column's mean and forward-fill.
🚀 Up next: Sorting and Ranking — order your rows and assign ranks.
Practice quiz
How does pandas mark a missing value?
- As an empty string
- As NaN (Not a Number)
- As the integer 0
- As the text 'null'
Answer: As NaN (Not a Number). Pandas uses NaN, a special float, to mark missing values.
Why does df['x'] == np.nan never find missing values?
- == is not supported on Series
- NaN equals only itself
- NaN is never equal to anything, even itself
- It needs quotes
Answer: NaN is never equal to anything, even itself. NaN compares unequal to everything, so == always returns False.
Which method returns a True/False mask of missing values?
- df.isna()
- df.dropna()
- df.fillna()
- df.mean()
Answer: df.isna(). df.isna() (same as isnull()) is True where a value is missing.
What is the most useful one-liner to count gaps per column?
- df.notna()
- df.dropna().sum()
- df.count()
- df.isna().sum()
Answer: df.isna().sum(). df.isna().sum() tallies missing values column by column.
What does df.dropna() do by default (how='any')?
- Drops a row if ANY value is NaN
- Drops only fully-empty rows
- Fills NaN with 0
- Drops all columns
Answer: Drops a row if ANY value is NaN. The default how='any' removes any row containing at least one NaN.
What does dropna(how='all') drop?
- Every row
- Only rows where EVERY value is NaN
- Columns with NaN
- Nothing
Answer: Only rows where EVERY value is NaN. how='all' removes only rows that are entirely missing.
How do you only drop rows missing the 'score' column?
- dropna(axis=1)
- dropna(how='all')
- dropna(subset=['score'])
Answer: dropna(subset=['score']). subset=['score'] limits the NaN check to that one column.
What does df.fillna(0) do?
- Drops NaN rows
- Replaces every NaN with 0
- Counts NaN
- Renames columns
Answer: Replaces every NaN with 0. fillna(value) substitutes the given value for every NaN.
When filling a numeric column with its own mean, what does .mean() do with NaN?
- Treats NaN as 0
- Raises an error
- Returns NaN
- Ignores NaN automatically
Answer: Ignores NaN automatically. .mean() skips NaN, so the average isn't skewed by the gaps.
What does method='ffill' do?
- Carries the previous valid value forward
- Fills with the column max
- Drops the row
- Fills with zeros
Answer: Carries the previous valid value forward. Forward-fill copies the last valid value down into the gaps.