Performance: vectorization & memory

Performance tuning in pandas is the practice of making your code faster and leaner by replacing slow Python row-loops with vectorized operations that run in compiled C, and by shrinking memory with smaller dtypes and the category type — so the same analysis handles far larger datasets.

Learn Performance: vectorization & memory in our free Pandas course — a beginner-friendly interactive lesson with worked examples, a practice exercise and a…

Part of the free Pandas course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

The single biggest pandas speed lesson: operate on whole columns at once , not row by row. A vectorized expression like df["a"] * df["b"] runs in compiled C over the entire array; a Python loop or iterrows() runs interpreted Python for every row and is typically 10–100× slower.

.apply is still a per-row Python call. For conditional logic, np.where (or .clip , .map , boolean masks) does the same job vectorized.

pandas defaults to wide dtypes (int64, float64, object strings). Downcasting numbers to int32 / int16 and converting low-cardinality text columns to category can slash memory. Measure with df.memory_usage(deep=True) .

Replace the slow loop with a single vectorized expression that applies a 10% discount. Fill in the blank.

Vectorize a calculation and shrink the frame's memory footprint, then confirm the savings.

Lesson complete — your pandas is now fast and lean!

You know to vectorize instead of loop, to skip .apply when a vectorized op exists, and to cut memory with smaller dtypes and category. These habits let you scale from toy data to millions of rows.

🚀 Up next: the Capstone — an end-to-end data cleaning pipeline tying everything together.

Practice quiz

Why is a vectorized expression faster than a Python row-loop?

It uses more memory
It runs as compiled C over whole arrays
It prints less
It skips rows

Answer: It runs as compiled C over whole arrays. Vectorized ops run in compiled C on entire arrays, avoiding per-row Python overhead.

Roughly how much slower can iterrows be than a vectorized op?

About the same
Slightly faster
10-100x slower
Always 2x slower

Answer: 10-100x slower. Looping with iterrows runs interpreted Python per row, typically 10-100x slower.

What is a good vectorized replacement for an if/else .apply?

np.where(cond, x, y)
a for loop
df.iterrows()
print()

Answer: np.where(cond, x, y). np.where applies conditional logic vectorized across the whole column.

When is .apply acceptable?

Always, it's fastest
For every column
Never use it
Only when no vectorized equivalent exists

Answer: Only when no vectorized equivalent exists. .apply is essentially a loop; reserve it for cases with no vectorized alternative.

Which dtype best stores a low-cardinality repeated string column?

category
object
float64
int64

Answer: category. category stores each unique value once plus tiny integer codes, saving lots of memory.

How do you measure a DataFrame's true memory use including object strings?

len(df)
df.memory_usage(deep=True)
df.size
df.count()

Answer: df.memory_usage(deep=True). deep=True accounts for the actual size of object (string) columns.

When does the category dtype HURT rather than help?

When values repeat a lot
When the column is numeric only
When almost every value is unique (high cardinality)
Never

Answer: When almost every value is unique (high cardinality). With mostly-unique values, category adds overhead instead of saving memory.

Which downcasts a column to a smaller integer type automatically?

pd.to_numeric(s, downcast='integer')
s.astype('object')
s.apply(int)
s.round()

Answer: pd.to_numeric(s, downcast='integer'). pd.to_numeric with downcast='integer' picks the smallest integer dtype that fits.

Is %timeit plain Python or a Jupyter magic?

A pandas method
A built-in function
A Jupyter/IPython magic command
A NumPy function

Answer: A Jupyter/IPython magic command. %timeit is an IPython magic that times a line over many runs; it isn't plain Python.

Which of these string cleanups is vectorized (no apply needed)?

A for loop over names
df.iterrows()
json parsing
names.str.strip().str.title()

Answer: names.str.strip().str.title(). The .str accessor runs vectorized string operations across the Series.