Creating DataFrames

Creating a DataFrame means turning raw Python data — dictionaries, lists, or NumPy arrays — into a labelled table using the pd.DataFrame(...) constructor, choosing the input shape that best matches your data.


Part of the free Pandas course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

Real data arrives in many shapes. In this lesson you'll learn every common way to hand data to Pandas and get a clean table back.

The two most common inputs map onto how your data is organised:

Dict of lists: each key is a column, and each list holds that column's values.
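As a minimal sketch of the column-oriented input, the example below builds a small table from a dict of lists. The names and ages are made-up sample data, and the variable names data and df_cols are chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data: each key is a column name and each list
# holds that column's values. All lists must have the same length.
data = {
    "name": ["Ana", "Ben", "Cleo"],
    "age": [28, 35, 42],
}

df_cols = pd.DataFrame(data)
print(df_cols)
```

This shape tends to suit data you already hold as whole columns, for example one list of names and one list of ages.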

List of dicts: each dict is one row, and its keys become the column names.
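The row-oriented input can be sketched the same way. Here each dictionary is one made-up record; the variable names records and df_rows are illustrative only.

```python
import pandas as pd

# Hypothetical sample data: one dict per row, keys become column names.
records = [
    {"name": "Ana", "age": 28},
    {"name": "Ben", "age": 35},
    {"name": "Cleo", "age": 42},
]

df_rows = pd.DataFrame(records)
print(df_rows)
```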

If your data is a plain grid of values (a list where each inner list is one row), pass it directly and name the columns with columns=:
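A short sketch of that pattern, using made-up values. The column names "name" and "age" are assumptions picked for the example, and the list passed to columns= must have one label per inner-list position.

```python
import pandas as pd

# Hypothetical grid of values: each inner list is one row.
rows = [
    ["Ana", 28],
    ["Ben", 35],
    ["Cleo", 42],
]

# columns= supplies one label per position in the inner lists.
df_grid = pd.DataFrame(rows, columns=["name", "age"])
print(df_grid)
```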

Numerical data often starts life as a NumPy array. Pass the 2-D array as the first argument, then label the rows and columns with index= and columns=:
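One way this might look, assuming a small 3x2 array of made-up test scores. The row and column labels are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical 2-D array: 3 rows (students) by 2 columns (subjects).
scores = np.array([
    [90, 85],
    [72, 88],
    [65, 70],
])

df_scores = pd.DataFrame(
    scores,
    index=["ana", "ben", "cleo"],    # one label per row
    columns=["maths", "science"],    # one label per column
)
print(df_scores)
```

The number of labels has to match the array's shape: three index labels for three rows, two column labels for two columns.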

The index= argument sets custom row labels instead of the default 0, 1, 2, and so on.
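To see the difference, the sketch below builds the same made-up column twice, once with the default labels and once with index=. The label values are illustrative.

```python
import pandas as pd

ages = {"age": [28, 35, 42]}  # hypothetical sample data

default_idx = pd.DataFrame(ages)
custom_idx = pd.DataFrame(ages, index=["ana", "ben", "cleo"])

print(list(default_idx.index))       # [0, 1, 2]
print(list(custom_idx.index))        # ['ana', 'ben', 'cleo']

# Custom labels let you look rows up by name with .loc
print(custom_idx.loc["ben", "age"])  # 35
```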

Imagine survey responses arriving one record at a time as dictionaries:
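A sketch of that scenario, with invented field names ("respondent", "rating") and values. The third record deliberately leaves out a key, to show that pandas fills the gap with NaN rather than raising an error.

```python
import pandas as pd

# Hypothetical survey responses collected one record at a time.
responses = [
    {"respondent": "r1", "rating": 4},
    {"respondent": "r2", "rating": 5},
    {"respondent": "r3"},  # no "rating" key in this record
]

survey = pd.DataFrame(responses)
print(survey)
print(survey.shape)                   # (3, 2)
print(survey["rating"].isna().sum())  # 1
```

Because the rating column now contains a missing value, pandas stores it as floating-point numbers rather than integers.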

Lesson 4 complete — you can build any DataFrame!

You've created DataFrames from dicts of lists, lists of dicts, lists of lists with named columns, and NumPy arrays — and you can set a custom index too.

🚀 Up next: Reading & Writing Data — load real datasets from CSV, Excel, and JSON files, and save your results back out.

Practice quiz

What constructor builds every kind of DataFrame in this lesson?

  • pd.make_frame(...)
  • pd.create(...)
  • pd.DataFrame(...)
  • pd.table(...)

Answer: pd.DataFrame(...). All four approaches use the same pd.DataFrame(...) constructor.

In a dict of lists like {'name':[...], 'age':[...]}, what does each key become?

  • A column
  • A row
  • An index label
  • A data type

Answer: A column. A dict of lists is column-oriented: each key is a column and its list holds that column's values.

In a list of dicts, what does each dictionary become?

  • A column
  • The index
  • A dtype
  • One row, with keys as column names

Answer: One row, with keys as column names. A list of dicts is row-oriented: each dict is one row and its keys become columns.

When is a list of dicts the most natural input?

  • When you already have whole columns ready
  • When records arrive row by row, like from an API or database
  • When the data is a NumPy array
  • Never

Answer: When records arrive row by row, like from an API or database. Row-by-row records map cleanly onto a list of dicts; missing keys become NaN.

If you pass a list of lists without columns=, what are the column labels?

  • Integers 0, 1, 2, ...
  • The first row's values
  • Letters A, B, C
  • An error

Answer: Integers 0, 1, 2, .... Without columns=, pandas labels the columns 0, 1, 2, ... — almost never what you want.

How do you give a NumPy-array-backed DataFrame meaningful column names?

  • It happens automatically
  • Use df.rename only

Answer: Pass columns= (and optionally index=) so the otherwise unnamed columns get labels.

Which argument sets custom row labels instead of the default 0,1,2...?

  • columns=
  • index=
  • labels=
  • rows=

Answer: index=. index= sets the row labels; you can also promote a column with set_index.

Do a dict-of-lists and an equivalent list-of-dicts produce the same table?

  • No, they always differ
  • Only for numeric data
  • Only with index= set
  • Yes, the resulting DataFrames are identical

Answer: Yes, the resulting DataFrames are identical. Both inputs build the same DataFrame; .equals() confirms they are identical.
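You can check this yourself with a quick sketch. It assumes both inputs hold the same made-up values with the keys in the same order.

```python
import pandas as pd

# Same hypothetical data, expressed column-wise and row-wise.
from_columns = pd.DataFrame({"name": ["Ana", "Ben"], "age": [28, 35]})
from_rows = pd.DataFrame([
    {"name": "Ana", "age": 28},
    {"name": "Ben", "age": 35},
])

print(from_columns.equals(from_rows))  # True
```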

What does pd.DataFrame(records).shape return for 3 records of 2 keys each?

  • (2, 3)
  • (3, 2)
  • (3, 3)
  • (2, 2)

Answer: (3, 2). shape is (rows, columns): 3 rows and 2 columns gives (3, 2).

How do you promote an existing column to be the index?

  • df.index('column')
  • df.reindex()
  • df.set_index('column')
  • df.columns = 'column'

Answer: df.set_index('column'). df.set_index('column') makes that column the row index.
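A minimal sketch of set_index on made-up data. Note that it returns a new DataFrame by default, so the result is assigned to a new name here (by_name, chosen for illustration).

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"name": ["Ana", "Ben", "Cleo"], "age": [28, 35, 42]})

# Promote the "name" column to the row index.
by_name = df.set_index("name")

print(list(by_name.columns))       # ['age']
print(by_name.loc["Ben", "age"])   # 35
```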