Performance & Vectorization vs Loops

Vectorization is the practice of expressing a computation as operations on whole arrays instead of element-by-element Python loops — letting NumPy run the work as a single compiled C loop over contiguous memory rather than slow, interpreted Python. In this lesson you'll see why Python loops are slow, rewrite a loop as a one-line vectorized expression, preallocate instead of appending, lean on built-in aggregations, and understand why np.vectorize is convenience rather than speed.

Learn Performance & Vectorization vs Loops in our free NumPy course — a beginner-friendly interactive lesson with worked examples, a practice exercise and a…

Part of the free Numpy course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

A Python for loop over numbers pays a tax on every iteration: it interprets bytecode and treats each number as a full Python object that must be unboxed, operated on, and reboxed. A NumPy array keeps raw numbers in contiguous memory and runs the whole operation as one compiled C loop , skipping all that overhead.

Expected output: both produce 1, 4, 9, 16, 25 — identical results, but the vectorized version is far faster on big data.

Most element-wise loops collapse into a single expression. Whenever you see a loop that does the same arithmetic to each element, look for the array version: a + b , a * 2 , or a combined formula applied to whole arrays at once.

Expected output: both give roughly [22, 22, 99] , and np.allclose confirms they match with True .

Calling np.append inside a loop is a trap: each call builds a brand-new array and copies everything, which is quadratic and painfully slow. If you truly need a loop, preallocate with np.zeros(n) once and fill slots by index. Better still, vectorize and skip the loop.

Expected output: both print [ 0 1 4 9 16] — same result, but the preallocated version avoids the costly per-step copies that np.append makes.

Don't hand-roll a running total or maximum in Python. NumPy's built-in aggregations — .sum() , .mean() , .max() , .min() — run in C and are both shorter and dramatically faster than a manual loop.

Expected output: 5050 from both the loop and .sum() , then mean 50.5 .

Beware of np.vectorize : it wraps a scalar function so it accepts arrays — handy for readability. But it still loops in Python internally, so it is not a performance tool. For real speed, express the logic with true NumPy operations (arithmetic, np.where , masks).

Expected output: both print [0 0 0 1 2] , but np.where runs in C while np.vectorize only looks fast.

Replace each ___ so the slow loop becomes a fast, vectorized one-liner that doubles every value and sums the result.

Expected output: [2 4 6 8] then 20 . (Answers: 2 , sum .)

❌ Looping over a huge array element by element

A Python for loop over millions of elements can be 50-100x slower than the array version.

✅ Fix: replace the loop with a single vectorized expression or a built-in aggregation.

It only changes how the function is called; the work still happens in a Python-level loop.

✅ Fix: rewrite the logic with real NumPy ops like np.where , masks, or arithmetic.

Compute the Euclidean distance of each point from the origin, first with a loop, then with one vectorized expression, and confirm they match.

Lesson 21 complete — you write fast NumPy now!

You know why Python loops are slow, how to rewrite them as vectorized expressions, when to preallocate, how to reach for built-in aggregations, and why np.vectorize is convenience rather than speed.

🚀 Up next: the Capstone — put it all together by treating an image as a NumPy array.

Practice quiz

Why is a NumPy vectorized operation faster than a Python for loop?

  • It runs as one compiled C loop over contiguous memory
  • It uses more RAM
  • It skips some elements
  • It runs on the GPU automatically

Answer: It runs as one compiled C loop over contiguous memory. NumPy runs the whole operation in compiled C over a contiguous block, avoiding per-element Python overhead.

What does 'vectorization' mean?

  • Drawing vectors
  • Expressing a calculation as operations on whole arrays
  • Using a for loop
  • Converting to a list

Answer: Expressing a calculation as operations on whole arrays. Vectorization replaces element-by-element loops with whole-array expressions.

Does np.vectorize make code run faster?

  • Yes, it compiles to C
  • Yes, it uses threads
  • No, it still loops in Python internally
  • Yes, always

Answer: No, it still loops in Python internally. np.vectorize is a convenience wrapper; it still loops in Python and is not a speed tool.

Why is calling np.append inside a loop slow?

  • It uses the GPU
  • It sorts each time
  • It compresses data
  • It copies the whole array on every call, giving quadratic cost

Answer: It copies the whole array on every call, giving quadratic cost. Each np.append builds a new array and copies all existing data, which is quadratic.

Which is the recommended fast alternative to a loop building squares?

  • data ** 2
  • a Python for loop with append
  • np.vectorize(pow)
  • map(lambda x: x*x, data)

Answer: data ** 2. data ** 2 is one vectorized expression that runs in C with no Python loop.

What is the fast way to total an array instead of a manual loop?

  • A while loop
  • arr.sum()
  • np.vectorize(add)
  • reduce in pure Python

Answer: arr.sum(). Built-in aggregations like arr.sum() run in C and are far faster than a manual loop.

Which truly fast operation replaces a per-element if/else?

  • np.vectorize
  • a list comprehension
  • np.where(cond, a, b)
  • a for loop

Answer: np.where(cond, a, b). np.where applies the conditional element-wise in C, the genuinely fast replacement.

If you must use a loop to fill values, what should you do first?

  • Append each value
  • Use np.vectorize
  • Convert to a list
  • Preallocate with np.zeros(n) and fill by index

Answer: Preallocate with np.zeros(n) and fill by index. Preallocating once and filling by index avoids the repeated copies that np.append causes.

How does a Python loop treat each number compared to a NumPy array element?

  • As a boxed Python object that must be unboxed each iteration
  • As raw C memory
  • As a string
  • As a GPU value

Answer: As a boxed Python object that must be unboxed each iteration. Python boxes each number as a full object, adding overhead that NumPy's raw contiguous storage avoids.

Both a loop and a vectorized version of the same math should produce...

  • Different results
  • The same results, but vectorized is faster
  • An error
  • Only integer results

Answer: The same results, but vectorized is faster. They compute identical results; vectorization just runs much faster on large data.