Missing Data (NA)

NA is R's special marker for a missing or unknown value, and because it propagates through calculations you must handle it explicitly with tools like is.na() and na.rm = TRUE.


Part of the free R course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

This lesson shows how NA spreads through arithmetic, why you must test it with is.na() rather than ==, how na.rm and na.omit() let you compute and clean around it, and how NaN and Inf fit into the picture.

What You'll Learn in This Lesson

1️⃣ How NA Propagates

Nearly every calculation that touches an NA produces NA (the rare exceptions are cases where the answer cannot depend on the unknown value, such as NA & FALSE). This is intentional, so you never accidentally report a total that quietly ignored missing data. The big trap: you cannot test for NA with ==, because NA == NA is itself NA. Always use is.na().
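A minimal sketch of the propagation rule and the == trap described above. The vector x and the variable names are made up for illustration.

```r
# Hypothetical vector with one missing reading
x <- c(10, NA, 30)

total   <- sum(x)      # NA: the missing value propagates into the total
shifted <- x + 1       # 11 NA 31: only the missing slot stays NA

eq_test <- x == NA     # NA NA NA: comparing with an unknown is unknown
na_test <- is.na(x)    # FALSE TRUE FALSE: the reliable test

print(total)
print(shifted)
print(eq_test)
print(na_test)
```

Because x == NA never yields TRUE or FALSE, using it inside if() or as a filter will not select the missing entries; is.na(x) does.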

2️⃣ Computing Around NA with na.rm

Most summary functions accept na.rm = TRUE, which strips the NAs before computing. Note that mean(x, na.rm = TRUE) divides by the count of present values, not the original length. To count how many are missing, sum a logical: sum(is.na(x)).
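The divisor point is easy to verify by hand. The sketch below uses an invented five-element vector and recomputes the mean manually to show that the two missing slots are not counted.

```r
# Hypothetical data: three present values, two missing
x <- c(4, NA, 8, NA, 6)

plain_avg <- mean(x)                  # NA, because na.rm defaults to FALSE
avg       <- mean(x, na.rm = TRUE)    # (4 + 8 + 6) / 3 = 6

n_missing <- sum(is.na(x))            # 2
n_present <- sum(!is.na(x))           # 3

# Same result computed by hand: divide by the present count, not length(x)
manual <- sum(x, na.rm = TRUE) / n_present

print(avg)
print(n_missing)
print(manual)
```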

3️⃣ Removing & Flagging Missing Values

To produce a cleaned vector, filter with x[!is.na(x)] or call na.omit(x). To flag which entries are complete (no NA), complete.cases() returns a logical vector; the same idea scales to data-frame rows.
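The following sketch puts the three tools side by side. The vector x and the small data frame df (with columns id and score) are hypothetical examples, not data from the lesson.

```r
# Hypothetical vector with two missing entries
x <- c(5, NA, 7, NA)

kept    <- x[!is.na(x)]        # 5 7
omitted <- na.omit(x)          # 5 7, plus attributes recording what was dropped
flags   <- complete.cases(x)   # TRUE FALSE TRUE FALSE

# The same idea applied to data-frame rows
df <- data.frame(id = 1:3, score = c(90, NA, 75))
clean_df <- df[complete.cases(df), ]   # keeps rows 1 and 3

print(kept)
print(as.numeric(omitted))   # as.numeric() drops the extra attributes
print(flags)
print(clean_df)
```

One practical difference: na.omit() attaches bookkeeping attributes to its result, so x[!is.na(x)] is often the simpler choice when a plain vector is all that is needed.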

4️⃣ NaN and Inf

NaN ("Not a Number") comes from undefined math like 0/0, and is.na() treats it as missing. Inf and -Inf represent infinity from operations like 1/0; they are valid numeric values, not missing, so is.na(Inf) is FALSE.
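A short sketch comparing the special values. Beyond is.na(), it also calls is.nan() and is.finite(), two base R helpers that this lesson does not cover but that make the distinction visible.

```r
# One ordinary number followed by NaN, Inf, -Inf and NA
vals <- c(2, 0 / 0, 1 / 0, -1 / 0, NA)

missing_flag <- is.na(vals)       # FALSE TRUE FALSE FALSE TRUE
nan_flag     <- is.nan(vals)      # FALSE TRUE FALSE FALSE FALSE
finite_flag  <- is.finite(vals)   # TRUE FALSE FALSE FALSE FALSE

print(vals)
print(missing_flag)
print(nan_flag)
print(finite_flag)
```

Note that is.na() flags both NA and NaN, is.nan() flags only NaN, and is.finite() is TRUE only for the ordinary number.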


A realistic cleaning task: missing readings plus an undefined 0/0. Notice that is.na() catches both the NA and the NaN, which is exactly the behaviour you want when scrubbing data.
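The exercise's own code is not reproduced here, so the following is only a sketch of a task along those lines. The readings vector and its values are invented for illustration.

```r
# Hypothetical sensor readings: two NAs and one undefined 0/0 (NaN)
readings <- c(21.5, NA, 19.0, 0 / 0, 22.5, NA)

bad   <- is.na(readings)              # TRUE for both the NAs and the NaN
n_bad <- sum(bad)                     # 3
clean <- readings[!bad]               # 21.5 19.0 22.5
avg   <- mean(readings, na.rm = TRUE) # (21.5 + 19.0 + 22.5) / 3 = 21

print(n_bad)
print(clean)
print(avg)
```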

📋 Quick Reference — Missing Data

Practice quiz

What does sum(c(10, NA, 30)) return?

  • 40
  • NA
  • 0
  • An error

Answer: NA. A single NA poisons the whole sum, so the result is NA.

Why can't you test for missing with x == NA?

  • It is too slow
  • It only checks the first value
  • It deletes the value
  • It returns NA, never TRUE/FALSE

Answer: It returns NA, never TRUE/FALSE. Comparing to an unknown is itself unknown, so == NA returns NA.

Which function correctly tests for missing values?

  • is.na(x)
  • missing(x)
  • x == NA
  • isNA(x)

Answer: is.na(x). is.na(x) returns a proper logical vector marking each missing slot.

What does mean(x, na.rm = TRUE) do?

  • Replaces NAs with 0
  • Returns NA
  • Counts the NAs
  • Averages only the present values

Answer: Averages only the present values. na.rm = TRUE strips NAs first, then divides by the count of present values.

How do you count how many values are missing?

  • sum(is.na(x))
  • length(is.na(x))
  • count(NA)
  • na.count(x)

Answer: sum(is.na(x)). is.na gives TRUE/FALSE and sum() counts the TRUEs.

Which expression keeps only the non-missing values?

Answer: x[!is.na(x)]. The logical index !is.na(x) keeps only the elements that are not missing.

What does 0/0 produce in R?

  • NaN
  • 0
  • Inf
  • NA

Answer: NaN. 0/0 is undefined and yields NaN ('Not a Number').

Is is.na(Inf) TRUE or FALSE?

  • TRUE
  • FALSE
  • NA
  • It errors

Answer: FALSE. Inf is a valid (extreme) number, not missing, so is.na(Inf) is FALSE.

Is is.na(NaN) TRUE or FALSE?

  • FALSE
  • NA
  • It errors
  • TRUE

Answer: TRUE. is.na() treats NaN as missing, so is.na(NaN) is TRUE (the reverse does not hold: is.nan(NA) is FALSE).

What does mean(is.na(x)) give you?

  • The count of missing
  • The proportion of values that are missing
  • The mean ignoring NA
  • Always 0

Answer: The proportion of values that are missing. TRUE counts as 1, so the mean of is.na(x) is the fraction missing.
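As a quick check of that last answer, here is a sketch with an invented eight-element vector in which three values are missing.

```r
# Hypothetical vector: 8 values, 3 of them missing
x <- c(3, NA, 9, NA, NA, 12, 15, 18)

n_missing    <- sum(is.na(x))    # 3
prop_missing <- mean(is.na(x))   # 3 / 8 = 0.375
pct_missing  <- 100 * prop_missing

print(n_missing)
print(prop_missing)
print(pct_missing)
```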