Reading CSV & Excel Files
Spreadsheets run the business world — sales exports, customer lists, financial reports. Python lets you read, transform, and write that tabular data automatically, turning hours of manual spreadsheet work into a script that runs in seconds.
Learn Reading CSV & Excel Files in our free Python course — a beginner-friendly interactive lesson with runnable examples, a practice exercise and a quick…
Part of the free Python course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
You'll start with the built-in csv module, then meet openpyxl and pandas for real Excel files.
A CSV is just a text file. If you open employees.csv in a plain editor, you'll see:
The first line is usually the header (column names). Each following line is a row , with commas separating fields . That's the whole format.
csv.reader gives each row as a list . csv.DictReader uses the header row to give each row as a dict — usually much clearer:
Writing mirrors reading. Use writer for lists or DictWriter for dicts. Remember newline="" here too:
writeheader() writes the column names; writerow() writes one row; writerows() writes a whole list at once.
The csv module can't read .xlsx files — those are zipped XML, not text. The openpyxl library handles them:
An Excel workbook contains one or more worksheets , each a grid of cells addressed like A1 , B2 . iter_rows(values_only=True) is the cleanest way to loop.
When you need filtering, grouping, or statistics, pandas reads CSV and Excel into a DataFrame — a spreadsheet you can program against:
A classic task: read a raw export, fix the data, write a clean version. This runs with the built-in csv module — no extra installs:
Real data is messy: stray spaces, blank cells, numbers stored as text. The cleaning loop here (strip, convert, default missing values) is the pattern you'll use again and again.
These lines write a CSV file with a header and one data row, but they're scrambled. Find the order:
Import, open the file (note newline="" ), create the writer with field names, write the header, then write the row.
TypeError — CSV values are strings, so "5" + 3 mixes str and int. You must convert first: int(row["qty"]) + 3 gives 8 .
95 — zip pairs the header with the row and dict() builds a mapping. This is exactly what DictReader does for you under the hood.
20.0 — the average of 10, 20, 30. In pandas this is just df["price"].mean() , which is why it's so handy for analysis.
You can automate the spreadsheet world now!
You can read and write CSV with the built-in module, parse real Excel files with openpyxl, and unleash pandas for analysis. Cleaning a messy export by hand is now a script you run in seconds.
🚀 Up next: Web Scraping with BeautifulSoup — pull structured data out of web pages.
Practice quiz
What does CSV stand for?
- Computed Spreadsheet Values
- Column Sorted Vector
- Comma-Separated Values
- Common Storage View
Answer: Comma-Separated Values. CSV stands for Comma-Separated Values — a plain-text table format.
What is the difference between csv.reader and csv.DictReader?
- reader gives lists; DictReader uses the header row to give dicts
- DictReader gives lists; reader gives dicts
- They are identical
- reader only works on Excel files
Answer: reader gives lists; DictReader uses the header row to give dicts. csv.reader yields each row as a list; csv.DictReader uses the header to yield each row as a dict.
Why should you open a CSV file with newline=""?
- It makes the file smaller
- It is required for UTF-8
- It speeds up reading
- It prevents the csv module from inserting blank rows on Windows
Answer: It prevents the csv module from inserting blank rows on Windows. newline="" stops double newline translation that would otherwise add blank rows on Windows.
What type is every value you read from a CSV, e.g. row["salary"] = "95000"?
- int
- str (a string)
- float
- bool
Answer: str (a string). Every CSV value is a string; you must convert with int() or float() before doing math.
Why should you NOT split CSV lines yourself with line.split(",")?
- Fields can contain commas inside quotes, which naive splitting corrupts
- It's too slow
- split doesn't exist in Python
- It only works on Excel
Answer: Fields can contain commas inside quotes, which naive splitting corrupts. Quoted fields may contain commas; the csv module handles quoting and escaping that split breaks.
Which csv.DictWriter call writes the column names as the first row?
- writer.writerow()
- writer.header()
- writer.writeheader()
- writer.writerows()
Answer: writer.writeheader(). writeheader() writes the field names; writerow/writerows write data rows.
Which library is commonly used to read real .xlsx Excel files?
- The built-in csv module
- openpyxl
- json
- sqlite3
Answer: openpyxl. The csv module can't read .xlsx; openpyxl (or pandas) reads Excel files.
In openpyxl, what does sheet.iter_rows(values_only=True) yield?
- Cell objects
- Column headers only
- A single string per sheet
- Plain cell values rather than Cell objects
Answer: Plain cell values rather than Cell objects. values_only=True yields plain values, which is the cleanest way to loop over rows.
What does pandas read CSV and Excel data into?
- A list of tuples
- A DataFrame
- A Counter
- A plain dict
Answer: A DataFrame. pandas reads tabular data into a DataFrame, a programmable spreadsheet-like structure.
When is the built-in csv module preferable to pandas?
- When you need grouping and statistics
- When reading .xlsx files
- For simple, dependency-free, row-by-row reading and writing
- When joining multiple tables
Answer: For simple, dependency-free, row-by-row reading and writing. Use csv for simple dependency-free row work; reach for pandas when you need analysis or Excel support.