Handling Missing Data (NaN, fillna, dropna)

Missing data in pandas is represented by NaN ("Not a Number"), and handling it means detecting those gaps and deciding whether to remove the affected rows or fill them with a sensible value — a step almost every real dataset requires before analysis.


Part of the free Pandas course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

Learn to detect NaN with isna()/notna(), count gaps, drop incomplete rows with dropna(), and fill holes with fillna() — including filling with a column's mean.

Pandas marks missing values with NaN. To create one in code you use numpy.nan. Because NaN == NaN is False, you can't find gaps with ==. Instead use df.isna() (identical to df.isnull()), which returns a True/False mask, and df.notna() for the opposite.
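As a minimal sketch of the detection step, the example below builds a small DataFrame and compares == against isna(). The column names and values are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; names and values are illustrative only.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],
    "score": [90.0, np.nan, 75.0],
    "city": ["Oslo", None, np.nan],
})

nan_equals_itself = np.nan == np.nan   # False: NaN never equals anything
eq_mask = df["score"] == np.nan        # all False, so it finds no gaps
na_mask = df.isna()                    # True where a value is missing
present_mask = df.notna()              # the exact opposite of isna()
gaps_per_column = df.isna().sum()      # number of missing values per column

print(gaps_per_column)
```

Note that None in the "city" column is also reported as missing by isna(), so the mask covers both spellings of a gap.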

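df.dropna() deletes rows that contain any NaN and returns a new DataFrame. Arguments such as how (drop on 'any' or 'all' missing values) and subset (check only the listed columns) fine-tune which rows are removed.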
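A short sketch of dropna() on hypothetical data follows. It shows the default behaviour plus the how and subset arguments that the quiz further down also covers; the data and variable names are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 2 is entirely missing, rows 1 and 3 only partly.
df = pd.DataFrame({
    "name": ["Ana", "Ben", None, None],
    "score": [90.0, np.nan, np.nan, 60.0],
})

any_dropped = df.dropna()                   # default how="any": only row 0 survives
all_dropped = df.dropna(how="all")          # drops only row 2, where every value is missing
score_only = df.dropna(subset=["score"])    # checks "score" only: keeps rows 0 and 3

print(any_dropped, all_dropped, score_only, sep="\n\n")
```

Each call returns a new DataFrame, so df itself still has all four rows afterwards.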

Often you want to keep every row and replace gaps instead. df.fillna(value) substitutes a constant; method="ffill" forward-fills the last valid value down (recent pandas versions deprecate this argument in favor of df.ffill()); and a very common technique is filling a numeric column with its own mean.
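The three filling strategies can be sketched as follows, using a hypothetical temperature column. The forward-fill here is written with the ffill() method, which should behave the same as the method="ffill" argument while avoiding the deprecation warning that newer pandas releases raise for it.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with two gaps in the middle.
df = pd.DataFrame({
    "day": [1, 2, 3, 4],
    "temp": [20.0, np.nan, np.nan, 26.0],
})

zero_filled = df["temp"].fillna(0)                  # constant: 20, 0, 0, 26
forward_filled = df["temp"].ffill()                 # last valid value: 20, 20, 20, 26
mean_filled = df["temp"].fillna(df["temp"].mean())  # mean of 20 and 26 is 23

print(zero_filled.tolist(), forward_filled.tolist(), mean_filled.tolist())
```

The mean is computed from the two valid readings only, because mean() skips NaN by default.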

❌ Forgetting that dropna/fillna return copies
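Both methods return a new object and leave the original untouched, so a call whose result is not assigned has no lasting effect. A minimal sketch of the mistake and the fix, on hypothetical data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"score": [1.0, np.nan, 3.0]})

df.fillna(0)                                     # result is discarded; df is unchanged
still_missing = int(df["score"].isna().sum())    # the gap is still there

df = df.fillna(0)                                # assign the result back to keep it
now_missing = int(df["score"].isna().sum())      # the gap is gone

print(still_missing, now_missing)
```

The same applies to dropna(): write df = df.dropna() rather than calling it on its own line.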

A survey DataFrame has gaps. Clean it step by step.
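One possible walkthrough is sketched below. The survey data, the column names, and the order of the steps are all assumptions made for this example, not the lesson's own exercise.

```python
import numpy as np
import pandas as pd

# Hypothetical survey responses with gaps in both answer columns.
survey = pd.DataFrame({
    "respondent": ["r1", "r2", "r3", "r4", "r5"],
    "age": [25.0, np.nan, 40.0, np.nan, 31.0],
    "rating": [4.0, 5.0, np.nan, np.nan, 3.0],
})

# Step 1: count the gaps per column.
gaps = survey.isna().sum()

# Step 2: drop respondents who answered neither question (r4).
answered = survey.dropna(subset=["age", "rating"], how="all")

# Step 3: fill the remaining missing ages with the column mean.
mean_age = answered["age"].mean()
cleaned = answered.assign(age=answered["age"].fillna(mean_age))

# Step 4: drop rows that still lack a rating (r3).
cleaned = cleaned.dropna(subset=["rating"])

print(gaps)
print(cleaned)
```

Whether to fill or drop a given column is a judgment call; here ages are filled because a mean is a plausible stand-in, while a missing rating is treated as unusable.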


You can detect NaN with isna(), tally gaps with isna().sum(), remove incomplete rows with dropna(), and fill holes with fillna(), including with a column's mean and forward-fill.

🚀 Up next: Sorting and Ranking — order your rows and assign ranks.

Practice quiz

How does pandas mark a missing value?

As an empty string
As NaN (Not a Number)
As the integer 0
As the text 'null'

Answer: As NaN (Not a Number). Pandas uses NaN, a special float, to mark missing values.

Why does df['x'] == np.nan never find missing values?

== is not supported on Series
NaN equals only itself
NaN is never equal to anything, even itself
It needs quotes

Answer: NaN is never equal to anything, even itself. NaN compares unequal to everything, so == always returns False.

Which method returns a True/False mask of missing values?

df.isna()
df.dropna()
df.fillna()
df.mean()

Answer: df.isna(). df.isna() (same as isnull()) is True where a value is missing.

What is the most useful one-liner to count gaps per column?

df.notna()
df.dropna().sum()
df.count()
df.isna().sum()

Answer: df.isna().sum(). df.isna().sum() tallies missing values column by column.

What does df.dropna() do by default (how='any')?

Drops a row if ANY value is NaN
Drops only fully-empty rows
Fills NaN with 0
Drops all columns

Answer: Drops a row if ANY value is NaN. The default how='any' removes any row containing at least one NaN.

What does dropna(how='all') drop?

Every row
Only rows where EVERY value is NaN
Columns with NaN
Nothing

Answer: Only rows where EVERY value is NaN. how='all' removes only rows that are entirely missing.

How do you drop only the rows that are missing a value in the 'score' column?

dropna(axis=1)
dropna(how='all')
dropna(subset=['score'])

Answer: dropna(subset=['score']). subset=['score'] limits the NaN check to that one column.

What does df.fillna(0) do?

Drops NaN rows
Replaces every NaN with 0
Counts NaN
Renames columns

Answer: Replaces every NaN with 0. fillna(value) substitutes the given value for every NaN.

When filling a numeric column with its own mean, what does .mean() do with NaN?

Treats NaN as 0
Raises an error
Returns NaN
Ignores NaN automatically

Answer: Ignores NaN automatically. .mean() skips NaN, so the average isn't skewed by the gaps.

What does method='ffill' do?

Carries the previous valid value forward
Fills with the column max
Drops the row
Fills with zeros

Answer: Carries the previous valid value forward. Forward-fill copies the last valid value down into the gaps.