Rate Limiting (Flask-Limiter)

Rate limiting caps how many requests a single client may make in a time window, rejecting the excess with a 429 Too Many Requests response to protect your API from abuse.


Part of the free Flask course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

In this lesson you'll build a fixed-window limiter and a token bucket from scratch, return proper 429 responses, key limits on IP or user, and see how Flask-Limiter does it declaratively.

A public API is a target. A buggy client, a scraper, or an attacker can hammer an endpoint thousands of times a second and exhaust your database or bandwidth. Rate limiting caps how many requests one client may make in a window and rejects the rest.

The simplest strategy is a fixed window: allow N requests per window, then reset the counter when the window expires. When a client goes over the limit, you return HTTP 429 Too Many Requests, the standard "slow down" status.

The runnable example below allows 3 requests, then returns 429 for the 4th and 5th. The limit is keyed on the client's IP, so different clients get independent allowances.
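A minimal, framework-free sketch of such a fixed-window limiter follows. The names handle_ping, LIMIT, WINDOW and the 60-second window length are assumptions made for illustration, and the current time is passed in explicitly so the behaviour is easy to follow without waiting. In a Flask view you would typically pass request.remote_addr as the ip argument.

```python
import time

LIMIT = 3     # requests allowed per window (matches the lesson text)
WINDOW = 60   # window length in seconds (assumed for this sketch)

# Maps client IP -> (window_start, count). In-memory and per-process only.
_windows = {}


def handle_ping(ip, now=None):
    """Return (status, body) for one request from `ip`."""
    now = time.monotonic() if now is None else now
    start, count = _windows.get(ip, (now, 0))
    if now - start >= WINDOW:
        # The window expired: start a fresh one and reset the counter.
        start, count = now, 0
    if count >= LIMIT:
        # Over the limit: reject without counting the request.
        return 429, {"error": "Too Many Requests"}
    _windows[ip] = (start, count + 1)
    return 200, {"message": "pong"}


# Five requests from one client inside the same window.
statuses = [handle_ping("203.0.113.5", now=0.0)[0] for _ in range(5)]
print(statuses)  # [200, 200, 200, 429, 429]

# A different IP has its own, independent counter.
other_status = handle_ping("198.51.100.7", now=0.0)[0]
print(other_status)  # 200

# Once the window has expired, the first client is allowed again.
reset_status = handle_ping("203.0.113.5", now=60.0)[0]
print(reset_status)  # 200
```

The dictionary never forgets old clients, so a longer-lived version of this sketch would also need to evict expired entries.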

Requests 1-3 return 200 with pong; requests 4 and 5 return 429 with an error. The window holds the line.

A fixed window has a weakness: a client can fire its whole quota at the very end of one window and again at the start of the next, so up to twice the limit arrives in a short burst. A token bucket bounds this. The bucket holds up to capacity tokens, refills at a steady rate, and each request must take one token, so a burst can never exceed capacity and the sustained request rate can never exceed the refill rate.

The runnable class below starts with 2 tokens and refills 1 per second. The first two requests succeed, the next two fail (the bucket is empty), and after waiting just over a second one token has refilled, so the next request succeeds again.
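One way such a class might look is sketched here. The names TokenBucket, allow and FakeClock are illustrative assumptions, and the fake clock stands in for actually sleeping "just over a second" so the run is deterministic; with the default clock (time.monotonic) you would call time.sleep(1.1) instead.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens and refills `refill_rate` tokens per second."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = float(capacity)   # start with a full bucket
        self.last = clock()

    def allow(self):
        """Take one token if available; return True when the request may proceed."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Add the tokens earned since the last call, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class FakeClock:
    """Manually advanced clock, used only to make this demo deterministic."""

    def __init__(self):
        self.t = 0.0

    def __call__(self):
        return self.t

    def advance(self, seconds):
        self.t += seconds


clock = FakeClock()
bucket = TokenBucket(capacity=2, refill_rate=1.0, clock=clock)

results = [bucket.allow() for _ in range(4)]
print(results)      # [True, True, False, False]

clock.advance(1.1)  # just over a second passes, so one token has refilled
after_wait = bucket.allow()
print(after_wait)   # True
```

Refilling lazily inside allow() avoids a background timer: the bucket only does arithmetic when a request actually arrives.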

The key decides who a limit applies to. Keying on IP protects anonymous endpoints, but for an authenticated API you usually key on the user or API key so each account gets a fair, independent allowance. A well-behaved limiter also sets a Retry-After header telling the client how long to wait.

The runnable example keys on an X-User-Id header when present. alice uses up her allowance and gets a 429 with Retry-After, while bob, who has a different key, is still allowed.
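A hedged sketch of that behaviour, without Flask, might look like the following. The helper names client_key and check, the limit of 2 requests, and the 60-second window are assumptions for illustration; headers is a plain dict standing in for request.headers. A real API would derive the user from verified credentials rather than trusting a raw header.

```python
import math

LIMIT = 2     # requests per window (assumed for this sketch)
WINDOW = 60   # window length in seconds (assumed)

_counters = {}  # key -> (window_start, count)


def client_key(headers, remote_addr):
    """Prefer the user id when present, otherwise fall back to the IP."""
    user_id = headers.get("X-User-Id")
    return f"user:{user_id}" if user_id else f"ip:{remote_addr}"


def check(headers, remote_addr, now):
    """Return (status, response_headers) for one request."""
    key = client_key(headers, remote_addr)
    start, count = _counters.get(key, (now, 0))
    if now - start >= WINDOW:
        start, count = now, 0
    if count >= LIMIT:
        # Tell the client how many whole seconds remain in this window.
        retry_after = math.ceil(WINDOW - (now - start))
        return 429, {"Retry-After": str(retry_after)}
    _counters[key] = (start, count + 1)
    return 200, {}


alice = {"X-User-Id": "alice"}
bob = {"X-User-Id": "bob"}

print(check(alice, "203.0.113.5", now=0.0))   # (200, {})
print(check(alice, "203.0.113.5", now=10.0))  # (200, {})
print(check(alice, "203.0.113.5", now=20.0))  # (429, {'Retry-After': '40'})
print(check(bob, "203.0.113.5", now=20.0))    # (200, {})
```

Both users share one IP here, which is exactly the case where keying on the user matters: bob is not penalised for alice's traffic.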

In Flask-Limiter you swap the key_func — for example lambda: current_user.id — to limit per user instead of per IP.

Complete the limiter below. Replace each ___ so the route allows 2 requests per IP and returns the right status when exceeded.

An in-memory dict (or Flask-Limiter's default memory storage) is per-process, so 4 workers each keep their own count and together allow up to 4× the limit. Use a shared storage_uri="redis://..." so every worker counts against one store.
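The effect can be seen in a small simulation. This is only a sketch: the Worker class, the limit of 5, and the round-robin dispatch are assumptions, windows are ignored for brevity, and a shared Python dict stands in for a shared store such as Redis (which would also need atomic increments across real processes).

```python
LIMIT = 5  # allowed requests per client (assumed; window expiry omitted)


class Worker:
    """One server process with its own counter store, unless one is shared."""

    def __init__(self, store=None):
        self.store = {} if store is None else store

    def allow(self, key):
        count = self.store.get(key, 0)
        if count >= LIMIT:
            return False
        self.store[key] = count + 1
        return True


def simulate(workers, total_requests, key="203.0.113.5"):
    """Send requests round-robin across workers; return how many were allowed."""
    return sum(
        workers[i % len(workers)].allow(key) for i in range(total_requests)
    )


# Four workers, each with a private in-memory dict.
private_total = simulate([Worker() for _ in range(4)], 40)
print(private_total)  # 20, i.e. 4 x LIMIT

# Four workers counting against one shared store.
shared_store = {}
shared_total = simulate([Worker(shared_store) for _ in range(4)], 40)
print(shared_total)   # 5, i.e. exactly LIMIT
```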

Behind a proxy, request.remote_addr is the proxy's IP, so all users share one bucket. Configure ProxyFix / trusted proxies so the real client IP (from X-Forwarded-For) is used as the key.
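To illustrate the idea, here is a minimal sketch of picking the client IP from X-Forwarded-For when a known number of trusted proxies sit in front of the app. The function client_ip and its trusted_proxies parameter are hypothetical names for this example; it assumes each trusted proxy appends the address it received the connection from, so the entry to trust is counted from the right. In a real Flask app you would configure Werkzeug's ProxyFix middleware rather than parse the header by hand.

```python
def client_ip(remote_addr, forwarded_for, trusted_proxies=1):
    """Return the address to use as the rate-limit key.

    remote_addr      -- address of the direct peer (the proxy, when behind one)
    forwarded_for    -- raw X-Forwarded-For header value, or None
    trusted_proxies  -- how many proxies in front of the app you control
    """
    if not forwarded_for or trusted_proxies < 1:
        return remote_addr
    hops = [hop.strip() for hop in forwarded_for.split(",") if hop.strip()]
    if len(hops) < trusted_proxies:
        # Fewer entries than expected: do not trust the header.
        return remote_addr
    # Entries to the left of this one were supplied by the client and can be forged.
    return hops[-trusted_proxies]


# One trusted proxy (10.0.0.2) forwarding for a real client.
print(client_ip("10.0.0.2", "203.0.113.5"))                # 203.0.113.5

# The client forged a leading entry; only the proxy-added one is used.
print(client_ip("10.0.0.2", "198.51.100.9, 203.0.113.5"))  # 203.0.113.5

# No proxy configured: the header is ignored entirely.
print(client_ip("203.0.113.5", "198.51.100.9", trusted_proxies=0))  # 203.0.113.5
```

Taking the leftmost entry instead would let any client choose its own rate-limit key by sending a forged header.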

Build one limit helper and apply different limits to two routes.

Lesson complete — your API can defend itself!

You built a fixed-window limiter and a token bucket, returned 429 with Retry-After, keyed limits on IP and user, and saw the declarative Flask-Limiter equivalent.

🚀 Up next: Background Tasks — move slow work off the request thread with Celery and RQ.

Practice quiz

Which HTTP status means the client has sent too many requests?

429 Too Many Requests
403 Forbidden
401 Unauthorized
404 Not Found

Answer: 429 Too Many Requests. 429 Too Many Requests is the standard rate-limit rejection status.

Which response header tells the client how long to wait before retrying?

Cache-Control
Retry-After
X-Rate-Remaining
Location

Answer: Retry-After. Retry-After tells the client when it may retry, either as a number of seconds to wait or as an HTTP date.

What weakness does a fixed-window limiter have?

It needs Redis
It cannot return 429
Bursts can sneak through at window edges
It blocks all requests

Answer: Bursts can sneak through at window edges. A client can spend its quota at the end of one window and again at the start of the next.

How does a token bucket smooth bursts?

It blocks every other request
It resets nightly
It caches responses
Tokens refill at a steady rate and each request takes one

Answer: Tokens refill at a steady rate and each request takes one. Requests are smoothed to the refill rate; a request passes only if a token is available.

Which Flask-Limiter decorator applies a per-route limit?

@limiter.limit("5/minute")
@app.rate(5)
@limit_route(5)
@throttle(5)

Answer: @limiter.limit("5/minute"). @limiter.limit("5/minute") declares the allowance for that route.

What does key_func=get_remote_address do in Flask-Limiter?

Caches the response
Keys limits on the client IP address
Sets the storage backend
Returns the 429 body

Answer: Keys limits on the client IP address. get_remote_address makes the limit count per client IP.

Why use Redis (storage_uri) instead of in-memory counts?

It is required by Flask
It encrypts requests
So all worker processes share one count
To speed up the database

Answer: So all worker processes share one count. In-memory counts are per-process, so multiple workers would each allow the full limit.

For an authenticated API, what is usually the best key for limits?

The request path
The User-Agent
A random value
The user or API key

Answer: The user or API key. Keying on the user or API key gives each account its own fair allowance.

Why might every user appear rate-limited as one client behind a proxy?

request.remote_addr is the proxy's IP
Redis is offline
The limit is too high
429 is disabled

Answer: request.remote_addr is the proxy's IP. Behind a proxy remote_addr is the proxy IP unless ProxyFix / X-Forwarded-For is configured.

Which Flask-Limiter decorator removes limiting from a route?

@limiter.skip
@limiter.exempt
@limiter.off
@limiter.free

Answer: @limiter.exempt. @limiter.exempt exempts a route from rate limiting.