Data Frames

A data frame is R's spreadsheet-like table: rows of observations and columns of variables, where each column can hold a different type. It is the central structure for nearly all data analysis in R.


Part of the free R course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

By the end of this lesson you'll create data frames, select columns and filter rows, add computed columns, and explore a new dataset with str(), head(), and summary().

What You'll Learn in This Lesson

1️⃣ Creating a Data Frame

Build one with data.frame(), passing each column as a named vector. All columns must have the same length, but they can be of different types. Each column is simply one of the vectors you've already learned.
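A minimal sketch of building a data frame. The people table and its values are hypothetical, and the comments on column types assume R 4.0 or later, where strings stay character rather than becoming factors by default. The last part shows that data.frame() refuses columns that cannot line up into rows.

```r
# Hypothetical example data: each named vector becomes one column
people <- data.frame(
  name   = c("Ana", "Ben", "Cara", "Dev"),  # character (R >= 4.0)
  age    = c(28, 35, 42, 23),               # numeric
  member = c(TRUE, FALSE, TRUE, TRUE)       # logical
)
print(people)
print(sapply(people, class))  # one type per column

# 3 values vs 2 values cannot be lined up into rows, so data.frame() stops
bad <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) conditionMessage(e)
)
print(bad)  # the error message, captured as a string
```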

2️⃣ Selecting and Filtering

Grab a column with $, or index with df[rows, cols] as you would a matrix. The most important move in analysis is filtering rows: put a logical test in the row slot.
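A short sketch of the selection forms just described, using a small hypothetical people table. A blank slot in df[rows, cols] means "all", and asking for a single column by name returns a plain vector.

```r
# Hypothetical example data
people <- data.frame(
  name = c("Ana", "Ben", "Cara", "Dev"),
  age  = c(28, 35, 42, 23)
)

ages      <- people$age                    # one column, as a plain vector
first_row <- people[1, ]                   # row 1, all columns (blank column slot)
name_col  <- people[, "name"]              # all rows of one column (a vector)
corner    <- people[1:2, c("name", "age")] # rows 1-2, both columns

print(ages)
print(first_row)
print(corner)
```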

The pattern df[df$age > 30, ] keeps only the rows where the test is TRUE. This is the everyday verb of data analysis. (dplyr, two lessons from now, gives an even cleaner way.)
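A sketch of row filtering on the same kind of hypothetical table. The comparison produces one TRUE or FALSE per row, and that logical vector goes in the row slot; the trailing comma leaves the column slot blank so every column is kept.

```r
# Hypothetical example data
people <- data.frame(
  name = c("Ana", "Ben", "Cara", "Dev"),
  age  = c(28, 35, 42, 23)
)

keep    <- people$age > 30   # FALSE TRUE TRUE FALSE
over_30 <- people[keep, ]    # TRUE rows kept, all columns
print(over_30)

# Both slots at once: filter rows, then take only the name column
print(people[people$age > 30, "name"])
```

The filtered result keeps the original row names (2 and 3), which is a quick way to see which rows survived.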

3️⃣ Exploring and Extending

When you meet a new dataset, run str(), head(), and summary() first. Adding a computed column takes a single assignment, and because the arithmetic is vectorized it fills every row at once.
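A sketch of the first-look routine and of computed columns. The sales table is hypothetical; print() is used explicitly so the output appears whether the code is run at the console or sourced from a file.

```r
# Hypothetical sales data, for illustration only
sales <- data.frame(
  item  = c("pen", "notebook", "bag"),
  price = c(1.5, 4, 25),
  qty   = c(10L, 3L, 1L)
)

# First look at a new dataset
str(sales)             # type and a preview of every column
print(head(sales))     # first rows (up to 6 by default)
print(summary(sales))  # min, quartiles, mean and max for numeric columns
print(c(rows = nrow(sales), cols = ncol(sales)))

# Computed columns: one vectorized assignment fills every row
sales$total <- sales$price * sales$qty
sales$tax   <- sales$price * 0.2
print(sales)
```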

Your turn. Fill in the # TODO blank, run it, and compare with the expected output.

Write it from the outline, run it, and check it against the example output. which.max() returns the position of the largest value, which is exactly what you need to pick out "the top row".
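A sketch of the which.max() idea on a hypothetical scores table (the exercise's own dataset is not reproduced here). The position it returns goes straight into the row slot. If several rows tie for the maximum, which.max() returns the first of them.

```r
# Hypothetical data; not the exercise's dataset
scores <- data.frame(
  player = c("Ana", "Ben", "Cara"),
  points = c(12, 30, 21)
)

top  <- which.max(scores$points)  # position of the largest value
best <- scores[top, ]             # that row, all columns
print(best)
```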

📋 Quick Reference — Data Frames

Practice quiz

What is true of the columns of a data frame?

  • All columns must be numeric
  • Columns may have different lengths
  • Each column can be a different type, but all have the same length
  • Columns are always factors

Answer: Each column can be a different type, but all have the same length. A data frame is a list of equal-length vectors that can differ in type.

Which function builds a data frame from named vectors?

  • data.frame()
  • as.matrix()
  • list()
  • table()

Answer: data.frame(). data.frame(name = ..., age = ...) constructs a data frame.

How do you extract the age column from a data frame named people?

  • people@age
  • people->age
  • people::age
  • people$age

Answer: people$age. The $ operator returns a column as a vector.

What does people[1, ] return?

  • The first column, all rows
  • The first row, all columns
  • The element at row 1 column 1
  • The column named 1

Answer: The first row, all columns. With df[rows, cols], a blank column slot means all columns.

Which expression keeps only rows where age is over 30?

Answer: df[df$age > 30, ]. Put the logical test in the row slot, and don't forget the comma.

How do you add a computed tax column equal to price times 0.2?

  • df + tax = price * 0.2
  • add(df, tax)
  • df$tax <- df$price * 0.2

Answer: df$tax <- df$price * 0.2. Assigning to a new $ name creates the column; the math is vectorized.

Which function shows column types and a compact preview?

  • dim(df)
  • str(df)
  • print(df)
  • rev(df)

Answer: str(df). str() reports the structure: types and a preview of each column.

What does nrow(df) return?

  • The number of columns
  • The column names
  • The first row
  • The number of rows

Answer: The number of rows. nrow() gives rows; ncol() gives columns.

Why is df[df$age > 30] (no comma) usually wrong for filtering rows?

  • It always errors
  • Without the comma it selects columns, not rows
  • It deletes the data frame
  • It rounds the values

Answer: Without the comma it selects columns, not rows. The comma separates the row slot from the column slot; omit it and you index columns.

What does df$Age return if the column is actually named age?

  • The age column
  • An error
  • NULL
  • An empty string

Answer: NULL. Column lookup is case-sensitive; a wrong name yields NULL.
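A small sketch of the case-sensitivity trap, using a hypothetical two-column table. Checking names() first shows the exact spelling, and a mistyped name comes back as NULL rather than raising an error.

```r
# Hypothetical example data
people <- data.frame(name = c("Ana", "Ben"), age = c(28, 35))

print(names(people))          # exact column names: "name" "age"
print(people$age)             # the column, as a vector
print(is.null(people$Age))    # "Age" does not match "age", so $ gives NULL
```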