Data Types & astype
A dtype is the internal storage type pandas assigns to each column — int64, float64, object, bool, or datetime64 — and it decides how fast and how correctly your operations run.
Part of the free Pandas course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
Learn to read the .dtypes attribute, convert columns with astype(), handle messy values with pd.to_numeric(errors=), and understand why object columns are the slow ones.
Every column has exactly one dtype, and df.dtypes lists them all at once. The names you will meet most often are int64 (whole numbers), float64 (decimals, and the home of NaN), bool (True/False), datetime64[ns] (timestamps), and object: the catch-all that usually means text, but is also where pandas puts any column it could not interpret as something cleaner.
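As a rough illustration of reading dtypes, the sketch below builds a small DataFrame with one column per common type. The column names and values are made up for this example, and the exact datetime resolution or text dtype that pandas reports can vary between versions.

```python
import pandas as pd

# Hypothetical sample data: one column per common dtype.
df = pd.DataFrame({
    "units": [3, 5, 2],                # whole numbers -> int64
    "price": [1.5, 3.0, 0.75],         # decimals -> float64
    "in_stock": [True, False, True],   # True/False -> bool
    "ordered": pd.to_datetime(["2024-01-05", "2024-01-06", "2024-01-07"]),
    "notes": [1, "two", 3.0],          # mixed values fall back to object
})

# df.dtypes is a Series: column names as the index, dtypes as the values.
print(df.dtypes)

# A single column reports its own dtype through .dtype.
print(df["price"].dtype)
```

The mixed "notes" column shows the catch-all behaviour: pandas cannot pick a cleaner type for it, so it is stored as object.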
astype() is the direct way to change a column's type. Call it on a Series and pass the target type: "int64", "float64", "str", or "bool". It is strict: if any value cannot be converted, it raises an error rather than guessing. That strictness is a feature, because it stops bad data from silently slipping through.
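A minimal sketch of astype() on clean data, assuming a made-up Series of digits stored as text (the variable names are illustrative only):

```python
import pandas as pd

# Hypothetical quantities that arrived as text.
qty_text = pd.Series(["4", "10", "7"])

qty = qty_text.astype("int64")      # text -> integers
qty_float = qty.astype("float64")   # integers -> decimals
qty_back = qty.astype("str")        # integers -> text again

print(qty.dtype, qty_float.dtype)
print(qty.sum())  # arithmetic now works on real numbers: 21
```

astype() returns a new Series and leaves the original untouched, so assign the result back if you want to keep the conversion.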
Real spreadsheets are dirty: a price column might hold "1.50", "3.00", and a rogue "N/A". Calling astype("float") on that column raises a ValueError. pd.to_numeric() is built for this: its errors= argument lets you decide what happens to values that will not parse. errors="coerce" turns each bad value into NaN so the rest of the column converts successfully.
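The coercion described above can be sketched as follows, using the same example values as the text (the variable names are assumptions for illustration):

```python
import pandas as pd

# Price column with one value that cannot be parsed as a number.
prices = pd.Series(["1.50", "3.00", "N/A", "0.75"])

# errors="coerce" replaces anything unparseable with NaN.
clean = pd.to_numeric(prices, errors="coerce")

print(clean)
print(clean.isna().sum())  # how many values were coerced to NaN
```

Counting the NaN values afterwards is a cheap way to see how much data the coercion discarded before you decide whether to fill or drop those rows.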
A stray non-numeric value blows up a strict astype:
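A minimal sketch of that failure, assuming a made-up price column held as object dtype. The try/except is only there so the example keeps running after the error.

```python
import pandas as pd

# Hypothetical text column with one rogue value.
prices = pd.Series(["1.50", "3.00", "N/A"], dtype="object")

try:
    prices.astype("float")
    failed = False
except ValueError as exc:
    # astype stops at the first value it cannot convert.
    failed = True
    print("astype failed:", exc)
```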
astype("int") refuses a column that contains NaN:
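A small sketch of that refusal, assuming an illustrative scores column with one missing value. The error pandas raises here is a subclass of ValueError, so that is what the example catches.

```python
import numpy as np
import pandas as pd

# Hypothetical scores with a gap; the NaN forces float64 storage.
scores = pd.Series([88.0, np.nan, 75.0])
print(scores.dtype)

try:
    scores.astype("int")
    failed = False
except ValueError as exc:
    # Classic integer dtypes have no way to represent a missing value.
    failed = True
    print("astype failed:", exc)
```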
✅ Fix: fill the gaps first, or use the nullable Int64:
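Both fixes can be sketched like this, again on an illustrative scores column. Filling with 0 is an assumption made for the example; the right fill value depends on what a missing entry means in your data.

```python
import numpy as np
import pandas as pd

# Hypothetical scores with one missing value.
scores = pd.Series([88.0, np.nan, 75.0])

# Option 1: fill the gap, then convert to a classic integer dtype.
filled = scores.fillna(0).astype("int64")
print(filled.tolist())  # [88, 0, 75]

# Option 2: nullable Int64 (capital I) keeps the gap as a missing value.
nullable = scores.astype("Int64")
print(nullable)
```

Option 1 invents a value for the gap, while option 2 preserves the fact that the value was missing. Note that the conversion to Int64 expects the non-missing floats to be whole numbers.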
A small import came in with everything as text. Repair the dtypes.
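One possible approach to an exercise like this is sketched below. The DataFrame, its column names, and its values are hypothetical stand-ins, since the lesson's own dataset is not reproduced here.

```python
import pandas as pd

# Hypothetical import where every column arrived as text.
raw = pd.DataFrame({
    "product": ["pen", "ink", "pad"],
    "qty": ["4", "10", "7"],
    "price": ["1.50", "N/A", "0.75"],
    "ordered": ["2024-01-05", "2024-01-06", "2024-01-07"],
})

fixed = raw.assign(
    qty=raw["qty"].astype("int64"),                      # clean digits: strict astype is fine
    price=pd.to_numeric(raw["price"], errors="coerce"),  # messy values: coerce to NaN
    ordered=pd.to_datetime(raw["ordered"]),              # date strings: real datetimes
)

print(fixed.dtypes)
```

The pattern is to use strict astype where the data is known to be clean, and the more forgiving pd.to_numeric and pd.to_datetime where it is not.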
Lesson complete — your columns have the right types!
You can read df.dtypes, convert cleanly with astype, rescue messy data with pd.to_numeric(errors="coerce") and pd.to_datetime, and you know why object columns drag performance down.
🚀 Up next: Categorical Data — the special dtype that shrinks repeated-text columns and unlocks ordered categories.
Practice quiz
What does df.dtypes show you?
- The data type of every column
- The number of rows
- The column names only
- The missing values
Answer: The data type of every column. df.dtypes lists each column's storage type at once.
Which dtype is the home of NaN and decimal numbers?
- int64
- bool
- float64
- datetime64
Answer: float64. float64 holds decimals and is where NaN lives.
What happens to an int column the moment a single NaN appears?
- It stays int64
- It becomes bool
- It is deleted
- It becomes float64
Answer: It becomes float64. Classic int64 cannot hold NaN, so the column is promoted to float64.
What does pd.Series([88.7, 91.2, 75.9]).astype('int') produce?
Answer: 88, 91, 75. astype('int') truncates toward zero rather than rounding.
How is astype() described compared with to_numeric()?
- Strict, raising an error on bad values
- It silently drops bad rows
- It only works on dates
- It always returns strings
Answer: Strict, raising an error on bad values. astype is strict and raises if any value cannot be converted.
What does errors='coerce' do in pd.to_numeric()?
- Raises a ValueError
- Skips the whole column
- Rounds the values
- Turns unparseable values into NaN
Answer: Turns unparseable values into NaN. errors='coerce' replaces values that cannot be parsed with NaN.
After pd.to_numeric(pd.Series(['1.50','3.00','N/A','0.75']), errors='coerce'), what does .isna().sum() return?
- 1
- 0
- 2
- 3
Answer: 1. Only 'N/A' fails to parse, so exactly one NaN appears.
Which function parses a column of date strings into real datetimes?
- pd.to_int
- pd.parse
- pd.to_datetime
- pd.astype_date
Answer: pd.to_datetime. pd.to_datetime parses common formats into a datetime64 column.
Why are object columns slow?
- They are always sorted
- They store pointers to scattered Python objects instead of a contiguous block
- They are stored on disk
- They are encrypted
Answer: They store pointers to scattered Python objects instead of a contiguous block. Object columns hold pointers to separate Python objects, so operations loop in slow Python.
astype('int') fails on a column containing NaN. A safe fix is:
- Delete the column
- Convert to bool first
- Use a for loop
- fillna(0) first, or use the nullable 'Int64' dtype
Answer: fillna(0) first, or use the nullable 'Int64' dtype. Clean the NaNs with fillna, or use the nullable Int64 dtype that allows missing values.