Reading CSV & Excel Files

Spreadsheets run the business world — sales exports, customer lists, financial reports. Python lets you read, transform, and write that tabular data automatically, turning hours of manual spreadsheet work into a script that runs in seconds.

Learn Reading CSV & Excel Files in our free Python course — a beginner-friendly interactive lesson with runnable examples, a practice exercise and a quick…

Part of the free Python course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

You'll start with the built-in csv module, then meet openpyxl and pandas for real Excel files.

A CSV is just a text file. If you open employees.csv in a plain editor, you'll see:

The first line is usually the header (column names). Each following line is a row , with commas separating fields . That's the whole format.

csv.reader gives each row as a list . csv.DictReader uses the header row to give each row as a dict — usually much clearer:

Writing mirrors reading. Use writer for lists or DictWriter for dicts. Remember newline="" here too:

writeheader() writes the column names; writerow() writes one row; writerows() writes a whole list at once.

The csv module can't read .xlsx files — those are zipped XML, not text. The openpyxl library handles them:

An Excel workbook contains one or more worksheets , each a grid of cells addressed like A1 , B2 . iter_rows(values_only=True) is the cleanest way to loop.

When you need filtering, grouping, or statistics, pandas reads CSV and Excel into a DataFrame — a spreadsheet you can program against:

A classic task: read a raw export, fix the data, write a clean version. This runs with the built-in csv module — no extra installs:

Real data is messy: stray spaces, blank cells, numbers stored as text. The cleaning loop here (strip, convert, default missing values) is the pattern you'll use again and again.

These lines write a CSV file with a header and one data row, but they're scrambled. Find the order:

Import, open the file (note newline="" ), create the writer with field names, write the header, then write the row.

TypeError — CSV values are strings, so "5" + 3 mixes str and int. You must convert first: int(row["qty"]) + 3 gives 8 .

95 — zip pairs the header with the row and dict() builds a mapping. This is exactly what DictReader does for you under the hood.

20.0 — the average of 10, 20, 30. In pandas this is just df["price"].mean() , which is why it's so handy for analysis.

You can automate the spreadsheet world now!

You can read and write CSV with the built-in module, parse real Excel files with openpyxl, and unleash pandas for analysis. Cleaning a messy export by hand is now a script you run in seconds.

🚀 Up next: Web Scraping with BeautifulSoup — pull structured data out of web pages.

Practice quiz

What does CSV stand for?

Computed Spreadsheet Values
Column Sorted Vector
Comma-Separated Values
Common Storage View

Answer: Comma-Separated Values. CSV stands for Comma-Separated Values — a plain-text table format.

What is the difference between csv.reader and csv.DictReader?

reader gives lists; DictReader uses the header row to give dicts
DictReader gives lists; reader gives dicts
They are identical
reader only works on Excel files

Answer: reader gives lists; DictReader uses the header row to give dicts. csv.reader yields each row as a list; csv.DictReader uses the header to yield each row as a dict.

Why should you open a CSV file with newline=""?

It makes the file smaller
It is required for UTF-8
It speeds up reading
It prevents the csv module from inserting blank rows on Windows

Answer: It prevents the csv module from inserting blank rows on Windows. newline="" stops double newline translation that would otherwise add blank rows on Windows.

What type is every value you read from a CSV, e.g. row["salary"] = "95000"?

int
str (a string)
float
bool

Answer: str (a string). Every CSV value is a string; you must convert with int() or float() before doing math.

Why should you NOT split CSV lines yourself with line.split(",")?

Fields can contain commas inside quotes, which naive splitting corrupts
It's too slow
split doesn't exist in Python
It only works on Excel

Answer: Fields can contain commas inside quotes, which naive splitting corrupts. Quoted fields may contain commas; the csv module handles quoting and escaping that split breaks.

Which csv.DictWriter call writes the column names as the first row?

writer.writerow()
writer.header()
writer.writeheader()
writer.writerows()

Answer: writer.writeheader(). writeheader() writes the field names; writerow/writerows write data rows.

Which library is commonly used to read real .xlsx Excel files?

The built-in csv module
openpyxl
json
sqlite3

Answer: openpyxl. The csv module can't read .xlsx; openpyxl (or pandas) reads Excel files.

In openpyxl, what does sheet.iter_rows(values_only=True) yield?

Cell objects
Column headers only
A single string per sheet
Plain cell values rather than Cell objects

Answer: Plain cell values rather than Cell objects. values_only=True yields plain values, which is the cleanest way to loop over rows.

What does pandas read CSV and Excel data into?

A list of tuples
A DataFrame
A Counter
A plain dict

Answer: A DataFrame. pandas reads tabular data into a DataFrame, a programmable spreadsheet-like structure.

When is the built-in csv module preferable to pandas?

When you need grouping and statistics
When reading .xlsx files
For simple, dependency-free, row-by-row reading and writing
When joining multiple tables

Answer: For simple, dependency-free, row-by-row reading and writing. Use csv for simple dependency-free row work; reach for pandas when you need analysis or Excel support.