Performance & Vectorization vs Loops
Vectorization is the practice of expressing a computation as operations on whole arrays instead of element-by-element Python loops — letting NumPy run the work as a single compiled C loop over contiguous memory rather than slow, interpreted Python. In this lesson you'll see why Python loops are slow, rewrite a loop as a one-line vectorized expression, preallocate instead of appending, lean on built-in aggregations, and understand why np.vectorize is convenience rather than speed.
Learn Performance & Vectorization vs Loops in our free NumPy course — a beginner-friendly interactive lesson with worked examples, a practice exercise and a…
Part of the free Numpy course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
A Python for loop over numbers pays a tax on every iteration: it interprets bytecode and treats each number as a full Python object that must be unboxed, operated on, and reboxed. A NumPy array keeps raw numbers in contiguous memory and runs the whole operation as one compiled C loop , skipping all that overhead.
Expected output: both produce 1, 4, 9, 16, 25 — identical results, but the vectorized version is far faster on big data.
Most element-wise loops collapse into a single expression. Whenever you see a loop that does the same arithmetic to each element, look for the array version: a + b , a * 2 , or a combined formula applied to whole arrays at once.
Expected output: both give roughly [22, 22, 99] , and np.allclose confirms they match with True .
Calling np.append inside a loop is a trap: each call builds a brand-new array and copies everything, which is quadratic and painfully slow. If you truly need a loop, preallocate with np.zeros(n) once and fill slots by index. Better still, vectorize and skip the loop.
Expected output: both print [ 0 1 4 9 16] — same result, but the preallocated version avoids the costly per-step copies that np.append makes.
Don't hand-roll a running total or maximum in Python. NumPy's built-in aggregations — .sum() , .mean() , .max() , .min() — run in C and are both shorter and dramatically faster than a manual loop.
Expected output: 5050 from both the loop and .sum() , then mean 50.5 .
Beware of np.vectorize : it wraps a scalar function so it accepts arrays — handy for readability. But it still loops in Python internally, so it is not a performance tool. For real speed, express the logic with true NumPy operations (arithmetic, np.where , masks).
Expected output: both print [0 0 0 1 2] , but np.where runs in C while np.vectorize only looks fast.
Replace each ___ so the slow loop becomes a fast, vectorized one-liner that doubles every value and sums the result.
Expected output: [2 4 6 8] then 20 . (Answers: 2 , sum .)
❌ Looping over a huge array element by element
A Python for loop over millions of elements can be 50-100x slower than the array version.
✅ Fix: replace the loop with a single vectorized expression or a built-in aggregation.
It only changes how the function is called; the work still happens in a Python-level loop.
✅ Fix: rewrite the logic with real NumPy ops like np.where , masks, or arithmetic.
Compute the Euclidean distance of each point from the origin, first with a loop, then with one vectorized expression, and confirm they match.
Lesson 21 complete — you write fast NumPy now!
You know why Python loops are slow, how to rewrite them as vectorized expressions, when to preallocate, how to reach for built-in aggregations, and why np.vectorize is convenience rather than speed.
🚀 Up next: the Capstone — put it all together by treating an image as a NumPy array.
Practice quiz
Why is a NumPy vectorized operation faster than a Python for loop?
- It runs as one compiled C loop over contiguous memory
- It uses more RAM
- It skips some elements
- It runs on the GPU automatically
Answer: It runs as one compiled C loop over contiguous memory. NumPy runs the whole operation in compiled C over a contiguous block, avoiding per-element Python overhead.
What does 'vectorization' mean?
- Drawing vectors
- Expressing a calculation as operations on whole arrays
- Using a for loop
- Converting to a list
Answer: Expressing a calculation as operations on whole arrays. Vectorization replaces element-by-element loops with whole-array expressions.
Does np.vectorize make code run faster?
- Yes, it compiles to C
- Yes, it uses threads
- No, it still loops in Python internally
- Yes, always
Answer: No, it still loops in Python internally. np.vectorize is a convenience wrapper; it still loops in Python and is not a speed tool.
Why is calling np.append inside a loop slow?
- It uses the GPU
- It sorts each time
- It compresses data
- It copies the whole array on every call, giving quadratic cost
Answer: It copies the whole array on every call, giving quadratic cost. Each np.append builds a new array and copies all existing data, which is quadratic.
Which is the recommended fast alternative to a loop building squares?
- data ** 2
- a Python for loop with append
- np.vectorize(pow)
- map(lambda x: x*x, data)
Answer: data ** 2. data ** 2 is one vectorized expression that runs in C with no Python loop.
What is the fast way to total an array instead of a manual loop?
- A while loop
- arr.sum()
- np.vectorize(add)
- reduce in pure Python
Answer: arr.sum(). Built-in aggregations like arr.sum() run in C and are far faster than a manual loop.
Which truly fast operation replaces a per-element if/else?
- np.vectorize
- a list comprehension
- np.where(cond, a, b)
- a for loop
Answer: np.where(cond, a, b). np.where applies the conditional element-wise in C, the genuinely fast replacement.
If you must use a loop to fill values, what should you do first?
- Append each value
- Use np.vectorize
- Convert to a list
- Preallocate with np.zeros(n) and fill by index
Answer: Preallocate with np.zeros(n) and fill by index. Preallocating once and filling by index avoids the repeated copies that np.append causes.
How does a Python loop treat each number compared to a NumPy array element?
- As a boxed Python object that must be unboxed each iteration
- As raw C memory
- As a string
- As a GPU value
Answer: As a boxed Python object that must be unboxed each iteration. Python boxes each number as a full object, adding overhead that NumPy's raw contiguous storage avoids.
Both a loop and a vectorized version of the same math should produce...
- Different results
- The same results, but vectorized is faster
- An error
- Only integer results
Answer: The same results, but vectorized is faster. They compute identical results; vectorization just runs much faster on large data.