Reading CSV & Excel Files

Spreadsheets run the business world — sales exports, customer lists, financial reports. Python lets you read, transform, and write that tabular data automatically, turning hours of manual spreadsheet work into a script that runs in seconds.

Learn Reading CSV & Excel Files in our free Python course — a beginner-friendly interactive lesson with runnable examples, a practice exercise and a quick…

Part of the free Python course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

You'll start with the built-in csv module, then meet openpyxl and pandas for real Excel files.

A CSV is just a text file. If you open employees.csv in a plain editor, you'll see:

The first line is usually the header (column names). Each following line is a row , with commas separating fields . That's the whole format.

csv.reader gives each row as a list . csv.DictReader uses the header row to give each row as a dict — usually much clearer:

Writing mirrors reading. Use writer for lists or DictWriter for dicts. Remember newline="" here too:

writeheader() writes the column names; writerow() writes one row; writerows() writes a whole list at once.

The csv module can't read .xlsx files — those are zipped XML, not text. The openpyxl library handles them:

An Excel workbook contains one or more worksheets , each a grid of cells addressed like A1 , B2 . iter_rows(values_only=True) is the cleanest way to loop.

When you need filtering, grouping, or statistics, pandas reads CSV and Excel into a DataFrame — a spreadsheet you can program against:

A classic task: read a raw export, fix the data, write a clean version. This runs with the built-in csv module — no extra installs:

Real data is messy: stray spaces, blank cells, numbers stored as text. The cleaning loop here (strip, convert, default missing values) is the pattern you'll use again and again.

These lines write a CSV file with a header and one data row, but they're scrambled. Find the order:

Import, open the file (note newline="" ), create the writer with field names, write the header, then write the row.

TypeError — CSV values are strings, so "5" + 3 mixes str and int. You must convert first: int(row["qty"]) + 3 gives 8 .

95 — zip pairs the header with the row and dict() builds a mapping. This is exactly what DictReader does for you under the hood.

20.0 — the average of 10, 20, 30. In pandas this is just df["price"].mean() , which is why it's so handy for analysis.

You can automate the spreadsheet world now!

You can read and write CSV with the built-in module, parse real Excel files with openpyxl, and unleash pandas for analysis. Cleaning a messy export by hand is now a script you run in seconds.

🚀 Up next: Web Scraping with BeautifulSoup — pull structured data out of web pages.

Practice quiz

What does CSV stand for?

  • Computed Spreadsheet Values
  • Column Sorted Vector
  • Comma-Separated Values
  • Common Storage View

Answer: Comma-Separated Values. CSV stands for Comma-Separated Values — a plain-text table format.

What is the difference between csv.reader and csv.DictReader?

  • reader gives lists; DictReader uses the header row to give dicts
  • DictReader gives lists; reader gives dicts
  • They are identical
  • reader only works on Excel files

Answer: reader gives lists; DictReader uses the header row to give dicts. csv.reader yields each row as a list; csv.DictReader uses the header to yield each row as a dict.

Why should you open a CSV file with newline=""?

  • It makes the file smaller
  • It is required for UTF-8
  • It speeds up reading
  • It prevents the csv module from inserting blank rows on Windows

Answer: It prevents the csv module from inserting blank rows on Windows. newline="" stops double newline translation that would otherwise add blank rows on Windows.

What type is every value you read from a CSV, e.g. row["salary"] = "95000"?

  • int
  • str (a string)
  • float
  • bool

Answer: str (a string). Every CSV value is a string; you must convert with int() or float() before doing math.

Why should you NOT split CSV lines yourself with line.split(",")?

  • Fields can contain commas inside quotes, which naive splitting corrupts
  • It's too slow
  • split doesn't exist in Python
  • It only works on Excel

Answer: Fields can contain commas inside quotes, which naive splitting corrupts. Quoted fields may contain commas; the csv module handles quoting and escaping that split breaks.

Which csv.DictWriter call writes the column names as the first row?

  • writer.writerow()
  • writer.header()
  • writer.writeheader()
  • writer.writerows()

Answer: writer.writeheader(). writeheader() writes the field names; writerow/writerows write data rows.

Which library is commonly used to read real .xlsx Excel files?

  • The built-in csv module
  • openpyxl
  • json
  • sqlite3

Answer: openpyxl. The csv module can't read .xlsx; openpyxl (or pandas) reads Excel files.

In openpyxl, what does sheet.iter_rows(values_only=True) yield?

  • Cell objects
  • Column headers only
  • A single string per sheet
  • Plain cell values rather than Cell objects

Answer: Plain cell values rather than Cell objects. values_only=True yields plain values, which is the cleanest way to loop over rows.

What does pandas read CSV and Excel data into?

  • A list of tuples
  • A DataFrame
  • A Counter
  • A plain dict

Answer: A DataFrame. pandas reads tabular data into a DataFrame, a programmable spreadsheet-like structure.

When is the built-in csv module preferable to pandas?

  • When you need grouping and statistics
  • When reading .xlsx files
  • For simple, dependency-free, row-by-row reading and writing
  • When joining multiple tables

Answer: For simple, dependency-free, row-by-row reading and writing. Use csv for simple dependency-free row work; reach for pandas when you need analysis or Excel support.