Regular Expressions (the re module)

A regular expression is a tiny pattern language for describing text — "any digit", "an email", "a word at the end of a line". Python's re module turns those patterns into powerful search-and-replace tools.


Part of the free Python course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

Once you can read and write regex, validating phone numbers, pulling dates out of log files, or cleaning messy data becomes a few lines instead of a hundred.

A regular expression (regex) is a string that describes a pattern of text. Instead of asking "does this string contain the exact word cat?", regex lets you ask "does this string contain three digits, then a dash, then four digits?".

Everything starts with importing the module — it ships with Python, nothing to install:
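A minimal sketch of that first step. The sample sentence and phone number are made up for illustration:

```python
import re  # part of the standard library, nothing to install

# Hypothetical sample text; the phone number is invented.
text = "Call me at 555-123-4567 tomorrow."

# Three digits, a dash, three digits, a dash, four digits.
match = re.search(r"\d{3}-\d{3}-\d{4}", text)

if match:
    print(match.group())  # 555-123-4567
```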

The pattern \d{3}-\d{3}-\d{4} reads as: "three digits, a dash, three digits, a dash, four digits". You just described a US phone number in 17 characters.

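Regex patterns are full of backslashes (\d, \w, \b). But in a normal Python string, \ is the escape character: "\n" means newline, "\t" means tab. That clash causes endless confusion. The fix is to write patterns as raw strings such as r"\d+", so the backslashes are treated literally and reach re unchanged.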
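Here is a small sketch of the clash, using made-up strings. A raw string (note the r prefix) leaves backslashes alone, so the pattern that reaches re is the one you typed:

```python
import re

# In a normal string, "\b" is the backspace character (a single character).
normal = "\bcat\b"
# In a raw string, r"\b" stays as backslash + b, which re reads as a word boundary.
raw = r"\bcat\b"

print(len(normal))  # 5
print(len(raw))     # 7

# The normal string looks for literal backspace characters and finds none.
print(re.search(normal, "the cat sat"))       # None
# The raw string finds the whole word "cat".
print(re.search(raw, "the cat sat").group())  # cat
```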

The re module has many functions, but most everyday work uses just five:
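The list itself is not reproduced here, so the sketch below assumes the five are re.search, re.match, re.findall, re.sub and re.split, a commonly used core set. The sample text is made up:

```python
import re

# Hypothetical sample text for illustration.
text = "Order 66 shipped on 2024-01-15, order 67 on 2024-02-03"

# search: first match anywhere in the string (or None)
first = re.search(r"\d+", text).group()

# match: succeeds only if the pattern matches at the very start
at_start = re.match(r"Order", text) is not None

# findall: every non-overlapping match, as a list of strings
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)

# sub: replace every match with something else
masked = re.sub(r"\d", "#", text)

# split: cut the string wherever the pattern matches
parts = re.split(r",\s*", text)

print(first)     # 66
print(at_start)  # True
print(dates)     # ['2024-01-15', '2024-02-03']
print(masked)    # Order ## shipped on ####-##-##, order ## on ####-##-##
print(parts)     # ['Order 66 shipped on 2024-01-15', 'order 67 on 2024-02-03']
```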

Character classes — shortcuts for common groups
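A short sketch of the shortcuts you will meet most often, assuming the usual set: \d (digit), \w (word character), \s (whitespace), \b (word boundary), plus a custom set in square brackets. The sample string is invented:

```python
import re

# Hypothetical sample text for illustration.
text = "Room 42, floor 3: call_me at 9am!"

digits = re.findall(r"\d", text)          # single digits
words = re.findall(r"\w+", text)          # runs of letters, digits and underscores
spaces = re.findall(r"\s", text)          # each whitespace character
vowels = re.findall(r"[aeiou]", text)     # any one character from the set
whole_word = re.findall(r"\bat\b", text)  # "at" only as a standalone word

print(digits)       # ['4', '2', '3', '9']
print(words)        # ['Room', '42', 'floor', '3', 'call_me', 'at', '9am']
print(len(spaces))  # 6
print(whole_word)   # ['at']
```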

By default, quantifiers are greedy: they grab as much as possible. Add a ? after the quantifier to make it lazy, so it grabs as little as possible. This trips up everyone the first time:
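A quick sketch of the difference, using a made-up HTML snippet:

```python
import re

# Hypothetical sample text for illustration.
html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the LAST > in the string, so there is one big match.
greedy = re.findall(r"<.*>", html)

# Lazy: .*? stops at the FIRST > it can, so each tag is its own match.
lazy = re.findall(r"<.*?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```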

Parentheses (...) create a capture group. After a match you can pull out each group individually, which is perfect for parsing structured text.
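For instance, a date can be split into its parts with three groups. The sample sentence is made up:

```python
import re

# Three capture groups: year, month, day.
match = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released on 2024-03-15.")

if match:
    whole = match.group(0)             # group 0 is always the entire match
    year, month, day = match.groups()  # groups 1..3 as a tuple
    print(whole)             # 2024-03-15
    print(year, month, day)  # 2024 03 15
```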

Named groups, written (?P<name>...), make patterns self-documenting and let you reference fields by name instead of counting parentheses.
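A minimal sketch with two named groups. The group names user and domain and the sample address are chosen here for illustration:

```python
import re

# Hypothetical pattern: "user" and "domain" are illustrative group names.
pattern = r"(?P<user>[\w.]+)@(?P<domain>[\w.]+)"
match = re.search(pattern, "Contact: ada@example.com")

user = match.group("user")  # look a group up by name
domain = match["domain"]    # index syntax works too
fields = match.groupdict()  # all named groups as a dict

print(user)    # ada
print(domain)  # example.com
print(fields)  # {'user': 'ada', 'domain': 'example.com'}
```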

Here's the kind of validation code that runs behind many registration forms. Notice how each rule is a small, readable pattern:
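The lesson's own validation code is not reproduced here, so treat the following as a sketch under assumptions: the three rules (username, a deliberately simple email, a US-style phone number) and the validate helper are hypothetical. It uses re.fullmatch so the whole input must fit the pattern, not just part of it:

```python
import re

# Hypothetical rules, chosen for illustration only.
USERNAME = r"[A-Za-z_]\w{2,15}"             # 3 to 16 characters, no leading digit
EMAIL = r"[\w.+-]+@[\w-]+(\.[\w-]+)+"       # deliberately simple, not RFC-complete
PHONE = r"\d{3}-\d{3}-\d{4}"                # e.g. 555-123-4567

def validate(username, email, phone):
    """Return a list of error messages; an empty list means every rule passed."""
    errors = []
    if not re.fullmatch(USERNAME, username):
        errors.append("bad username")
    if not re.fullmatch(EMAIL, email):
        errors.append("bad email")
    if not re.fullmatch(PHONE, phone):
        errors.append("bad phone")
    return errors

print(validate("ada_l", "ada@example.com", "555-123-4567"))  # []
print(validate("1ada", "ada@example", "5551234567"))
# ['bad username', 'bad email', 'bad phone']
```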

The "perfect" email regex is famously gigantic. In real apps, a simple pattern like the one above plus sending a confirmation email is the practical standard. Don't try to validate every RFC edge case with regex alone.

If you reuse a pattern, compile it once into a Pattern object. It can be a little faster when the same pattern runs many times, and it reads better in loops:
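A small sketch of the idea. The log lines are invented for illustration:

```python
import re

# Compile once, outside the loop.
date_re = re.compile(r"\d{4}-\d{2}-\d{2}")

# Hypothetical log lines.
log_lines = [
    "2024-01-15 server started",
    "no date on this line",
    "backup finished 2024-01-16",
]

found = []
for line in log_lines:
    m = date_re.search(line)  # Pattern objects offer search, match, findall, sub...
    if m:                     # search returns None when nothing matches
        found.append(m.group())

print(found)  # ['2024-01-15', '2024-01-16']
```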

Common flags: re.IGNORECASE (case-insensitive matching), re.MULTILINE (^ and $ match at the start and end of each line, not just of the whole string), re.DOTALL (. also matches newlines).
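Each flag is easiest to see next to the same pattern without it. A short sketch on a made-up three-line string:

```python
import re

# Hypothetical sample text with three lines.
text = "Error: disk full\nerror: retrying\nOK"

# IGNORECASE: "error" matches both capitalisations.
any_case = re.findall(r"error", text, re.IGNORECASE)

# Without MULTILINE, ^ matches only at the start of the whole string.
first_only = re.findall(r"^\w+", text)
# With MULTILINE, ^ matches at the start of every line.
each_line = re.findall(r"^\w+", text, re.MULTILINE)

# Without DOTALL, . cannot cross a newline, so this finds nothing.
no_dotall = re.search(r"disk.*OK", text)
# With DOTALL, . matches newlines too.
with_dotall = re.search(r"disk.*OK", text, re.DOTALL)

print(any_case)    # ['Error', 'error']
print(first_only)  # ['Error']
print(each_line)   # ['Error', 'error', 'OK']
print(no_dotall)   # None
print(with_dotall is not None)  # True
```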

These lines extract all hashtags from a tweet and print them lowercased — but they're scrambled. Put them in the correct order:

Import first, define the data, run findall to collect the captures, then loop. The capture group (\w+) means findall returns just the word, without the #.
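One ordering that fits that description is sketched below. The exercise's original lines are not reproduced here, so the tweet text and variable names are assumptions:

```python
import re

# Hypothetical tweet for illustration.
tweet = "Loving #Python and #Regex today! #100DaysOfCode"

# The group (\w+) captures only the word after each #.
tags = re.findall(r"#(\w+)", tweet)

for tag in tags:
    print(tag.lower())
# python
# regex
# 100daysofcode
```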

Read each snippet and predict what prints before revealing the answer.
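The three snippets, reconstructed from the answers in this section and the matching questions in the practice quiz. The three spaces between the words in the second input are an assumption, based on the "even multiple spaces" hint:

```python
import re

# Snippet 1
result_1 = re.findall(r"\d+", "a1b22c333")
print(result_1)

# Snippet 2 (input spacing is assumed)
result_2 = re.sub(r"\s+", "_", "hello   big   world")
print(result_2)

# Snippet 3
result_3 = bool(re.match(r"cat", "the cat"))
print(result_3)
```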

['1', '22', '333'] — \d+ grabs each run of one-or-more digits as a separate match.

hello_big_world — \s+ matches each run of whitespace (even multiple spaces) and replaces the whole run with a single underscore.

False — re.match only checks the START of the string, and "the cat" starts with "the". Use re.search to find it anywhere.

You can now describe text patterns like a pro!

You've learned the five core re functions, character classes, quantifiers, greedy vs lazy matching, and capture groups. Regex is a skill you sharpen by using it — keep a cheat sheet handy and test patterns on small strings first.

🚀 Up next: Dates & Times — work with the datetime module to parse, format, and do arithmetic with dates.

Practice quiz

What is the difference between re.match and re.search?

match scans the whole string; search only the start
They are identical
match checks only the START of the string; search scans the whole string
search only works on numbers

Answer: match checks only the START of the string; search scans the whole string. re.match only matches at the very start of the string, while re.search scans the whole string for the first match anywhere.

Why write regex patterns as raw strings like r'\d+'?

So backslashes are treated literally and reach re unchanged
They run faster
Raw strings allow Unicode
It is required by Python syntax

Answer: So backslashes are treated literally and reach re unchanged. Regex uses many backslashes; a raw string (r'...') tells Python to treat them literally instead of as escape sequences like \n.

What does re.findall(r'\d+', 'a1b22c333') return?

Answer: ['1', '22', '333']. \d+ grabs each run of one-or-more digits as a separate match, giving a list of strings.

What does bool(re.match(r'cat', 'the cat')) evaluate to?

True
False
None
It raises an error

Answer: False. re.match only checks the START of the string, and 'the cat' starts with 'the', so the result is False. Use re.search to find it anywhere.

By default, quantifiers like .* are:

Greedy (match as much as possible)
Lazy (match as little as possible)
Disabled
Case-insensitive

Answer: Greedy (match as much as possible). Quantifiers are greedy by default, grabbing as much as possible. Add ? (as in .*?) to make them lazy.

What does the lazy pattern r'<.*?>' match in '<b>bold</b>'?

The whole string at once
Only the first letter
Each tag separately: '<b>', '</b>'
Nothing

Answer: Each tag separately: '<b>', '</b>'. The lazy .*? stops at the first > it can, so it matches each tag individually rather than spanning from the first < to the last >.

How do you create a capture group in a regex?

Answer: With parentheses. (...) creates a capture group; you then retrieve each piece with .group(1), .group(2), etc., or via re.findall.

What does re.sub(r'\s+', '_', 'hello   big   world') return?

'hello___big___world'
'hello big world'
'_hello_big_world_'
'hello_big_world'

Answer: 'hello_big_world'. \s+ matches each run of whitespace (even multiple spaces) and replaces the whole run with one underscore, giving 'hello_big_world'.

When re.search finds no match, what does it return?

An empty string
None
An empty Match object
It raises ValueError

Answer: None. re.search (and re.match) return None when nothing matches, so always check before calling .group() to avoid an AttributeError.

What is the benefit of re.compile(pattern)?

It validates the pattern only
It makes the regex case-insensitive
It builds a reusable Pattern object that is faster for repeated use
It converts the pattern to a string

Answer: It builds a reusable Pattern object that is faster for repeated use. re.compile turns a pattern into a reusable Pattern object — faster and clearer when you apply the same pattern many times.