Regular Expressions (Pattern & Matcher)
A regular expression is a tiny pattern language for describing text — "three letters then four digits", "anything that looks like an email". Java exposes it through Pattern and Matcher (plus handy String shortcuts). Learn it once and you'll validate, search, and extract text for the rest of your career.
Part of the free Java course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
You'll want a solid grip on Strings (especially escape sequences, since regex needs double backslashes) and while loops for stepping through matches with find().
💡 Analogy: A regex is a stencil you hold over text. The stencil has fixed parts ("a dash here") and wildcard slots ("any digit here, four of them"). Slide it across a page and it highlights every spot where the text fits the shape. Three runs of \\d{n} joined by dashes (n being the digit count for each group) make a stencil for a phone number; the engine finds every place the page matches that outline.
You can describe most everyday patterns with a small vocabulary:
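As a rough illustration of that vocabulary, the sketch below tries a few tokens that appear in this lesson: \\d, \\w, \\S, a character class such as [A-Z], and the quantifiers + and {n}. The class and method names are invented for the example, and each check uses String.matches, which tests the whole string.

```java
class TokenDemo {
    // \\d+ : one or more digits
    static boolean isAllDigits(String s) {
        return s.matches("\\d+");
    }

    // \\w+ : one or more word characters (letters, digits, underscore)
    static boolean isOneWord(String s) {
        return s.matches("\\w+");
    }

    // [A-Z]{3} : exactly three uppercase letters
    static boolean isThreeCaps(String s) {
        return s.matches("[A-Z]{3}");
    }

    // \\S+ : one or more non-whitespace characters
    static boolean hasNoWhitespace(String s) {
        return s.matches("\\S+");
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("2024"));     // true
        System.out.println(isOneWord("user_42"));    // true
        System.out.println(isThreeCaps("ABCD"));     // false: four letters, not three
        System.out.println(hasNoWhitespace("a b"));  // false: contains a space
    }
}
```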
The quickest way to use a regex is String.matches(regex), which returns true only if the entire string fits the pattern (it's anchored at both ends). It's perfect for validation: "is this all digits?", "does this look like an email?". Note that it does not search for the pattern inside a longer string; for that you need find() (next section).
For anything beyond a single yes/no, use the two-step API: Pattern.compile(regex) compiles the pattern once into a reusable object, and pattern.matcher(text) gives you a Matcher bound to a specific input. Then find() searches for the next occurrence anywhere in the text. Call it in a loop to walk through them all, reading each with group(), start(), and end().
Wrap parts of a pattern in parentheses to capture them. After a match, group(0) is the whole match and group(1), group(2), … are the captured pieces, numbered left-to-right by opening parenthesis. This is how you turn raw text into structured data, such as splitting a log line into date, level, and message.
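To make the whole-string rule concrete, here is a small sketch. The isPin helper and its four-digit rule are assumptions made up for this example, not part of any library.

```java
class MatchesDemo {
    // Hypothetical validator: true only when the input is exactly four digits.
    static boolean isPin(String input) {
        return input.matches("\\d{4}");
    }

    public static void main(String[] args) {
        System.out.println(isPin("1234"));       // true
        System.out.println(isPin("12345"));      // false: one digit too many
        System.out.println(isPin("pin 1234"));   // false: extra text around the digits

        // matches() never searches inside a longer string...
        System.out.println("the cat".matches("cat"));      // false
        // ...unless the pattern itself allows the surrounding text.
        System.out.println("the cat".matches(".*cat.*"));  // true
    }
}
```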
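A minimal sketch of that two-step flow might look like the following. The sample sentence, the class name, and the "text@start-end" output format are all invented for illustration; note that end() is the index just past the match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindDemo {
    // Compiled once and reused for every call.
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Collects each run of digits together with its start and end index.
    static List<String> describeNumbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = NUMBER.matcher(text);
        while (matcher.find()) {   // advances to the next occurrence, if any
            found.add(matcher.group() + "@" + matcher.start() + "-" + matcher.end());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [66@6-8, 3@17-18]
        System.out.println(describeNumbers("Order 66 shipped 3 boxes"));
    }
}
```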
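The log-line idea can be sketched as below. The "yyyy-MM-dd LEVEL message" format, the class name, and the choice to return an empty array on failure are assumptions for the example rather than a standard.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogLineDemo {
    // Assumed log format: date, a space, level, a space, then the message.
    private static final Pattern LOG_LINE =
            Pattern.compile("(\\d{4}-\\d{2}-\\d{2}) (\\w+) (.*)");

    // Returns {date, level, message}, or an empty array when the line does not fit.
    static String[] parse(String line) {
        Matcher matcher = LOG_LINE.matcher(line);
        if (!matcher.matches()) {   // the whole line must fit the pattern
            return new String[0];
        }
        return new String[] { matcher.group(1), matcher.group(2), matcher.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parse("2024-01-15 ERROR Disk full");
        // prints 2024-01-15 | ERROR | Disk full
        System.out.println(String.join(" | ", parts));
    }
}
```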
Two more String methods take a regex. replaceAll(regex, replacement) substitutes every match — great for tidying whitespace or masking data. split(regex) breaks a string wherever the pattern matches — far more flexible than splitting on a fixed character, since the separator can itself be a pattern (e.g. "one or more commas with optional spaces").
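Here is one way those two methods might be used. The helper names and sample strings are invented for this sketch; the split pattern reads as "optional spaces, one or more commas, optional spaces".

```java
class TidyDemo {
    // Trim the ends, then squeeze each run of whitespace down to a single space.
    static String collapseSpaces(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    // Replace every digit with '#' to mask it.
    static String maskDigits(String s) {
        return s.replaceAll("\\d", "#");
    }

    // Split on one or more commas, ignoring any spaces around them.
    static String[] splitOnCommas(String s) {
        return s.split("\\s*,+\\s*");
    }

    public static void main(String[] args) {
        System.out.println(collapseSpaces("  too   many   spaces "));          // too many spaces
        System.out.println(maskDigits("card 1234"));                           // card ####
        System.out.println(String.join("|", splitOnCommas("a, b,,c ,d")));     // a|b|c|d
    }
}
```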
These lines find and print every word that starts with a capital letter. Put them in order.
Declare the text, compile the pattern, bind a matcher to the text, then loop with find() printing each match. The pattern [A-Z]\\w+ means "a capital letter followed by one or more word characters".
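Put together, the ordered steps might look like this sketch. The sample sentence is an assumption (the exercise's own text is not reproduced here), and the matches are collected into a list so they can also be checked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CapitalWordsDemo {
    // Returns every word that starts with a capital letter, in order of appearance.
    static List<String> capitalWords(String text) {
        Pattern pattern = Pattern.compile("[A-Z]\\w+");   // compile the pattern
        Matcher matcher = pattern.matcher(text);          // bind a matcher to the text
        List<String> words = new ArrayList<>();
        while (matcher.find()) {                          // loop over each match
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "Ada met Grace in London";          // assumed sample text
        for (String word : capitalWords(text)) {
            System.out.println(word);                     // Ada, Grace, London on separate lines
        }
    }
}
```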
Predict the output before opening each answer.
Answer: false. matches requires the whole string to fit, and \\w+ matches only word characters. The space in "hello world" isn't a word character, so the whole-string match fails. (find() with the same pattern would instead locate "hello" and "world" separately.)
Answer: ada | example. The two parenthesised groups capture the parts before and after the @. group(1) is the first group ("ada"), group(2) the second ("example"). group(0) would be the whole match "ada@example".
Answer: 3. The array is ["a", "", "b"]. The two commas create an empty string between them. (Note: trailing empty strings are dropped by default, but an empty string in the middle is kept.)
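If you want to check the three answers yourself, a sketch along these lines should reproduce them. The email pattern (\\w+)@(\\w+) and the input "ada@example.com" are assumptions chosen to be consistent with the second answer; the class and method names are likewise invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PredictDemo {
    // Whole-string check: the space is not a word character.
    static boolean wholeMatch() {
        return "hello world".matches("\\w+");
    }

    // Assumed pattern and input; the second group stops at the dot.
    static String emailParts() {
        Matcher matcher = Pattern.compile("(\\w+)@(\\w+)").matcher("ada@example.com");
        return matcher.find() ? matcher.group(1) + " | " + matcher.group(2) : "no match";
    }

    // The empty string between the two commas counts as an element.
    static int pieceCount() {
        return "a,,b".split(",").length;
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch());   // false
        System.out.println(emailParts());   // ada | example
        System.out.println(pieceCount());   // 3
    }
}
```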
🎯 Your Turn #1 — Validate a code
Write a regex for "3 uppercase letters then 4 digits" and pass it to matches() .
🎯 Your Turn #2 — Extract hashtags
Build a Pattern for hashtags and loop with find() to print each one.
🧩 Mini-Challenge — Parse key=value config
Use a two-group pattern and find() to pull every key=value pair from a config line.
You now know the core regex tokens, the crucial difference between matches() (whole string) and find() (search), how to compile reusable Patterns, extract data with capturing groups, and clean or slice text with replaceAll and split. And you'll never forget the double backslash again.
Next up: a Checkpoint that combines switch, wrappers, access modifiers, packages, dates, comparators, and records into one build.
Practice quiz
How do you write the regex token for a digit inside a Java String literal?
- "\d"
- "d"
- "\\d"
- "//d"
Answer: "\\d". A single backslash is a Java escape, so you double it: "\\d" in source becomes the regex \d that the engine sees.
What does String.matches(regex) require?
- The ENTIRE string to match the pattern (anchored at both ends)
- The pattern to appear anywhere in the string
- At least one match
- A compiled Pattern
Answer: The ENTIRE string to match the pattern (anchored at both ends). matches() is implicitly anchored — the whole string must fit the pattern. Use find() to search within a string.
What does "the cat".matches("cat") return?
- true
- It throws
- "cat"
- false
Answer: false. matches() needs the whole string to match, and "the cat" is not exactly "cat", so it returns false.
Which method searches for the pattern anywhere and can step through every occurrence?
- matches()
- find()
- equals()
- contains()
Answer: find(). Matcher.find() locates the next occurrence anywhere in the input; call it in a loop to walk through all matches.
Why prefer Pattern.compile(...) over String.matches in a loop?
- compile() builds the regex once into a reusable object; matches recompiles every call
- matches() is broken
- compile() runs at compile time
- There is no difference
Answer: compile() builds the regex once into a reusable object; matches recompiles every call. Compile a Pattern once (ideally as a static final field) and reuse it, rather than paying the compilation cost on every String.matches call.
After a match, what does group(0) return?
- The first capturing group
- Always an empty string
- The entire match
- The last group
Answer: The entire match. group(0) is the whole match; group(1), group(2)... are the parenthesised capturing groups, numbered left to right.
For the pattern "(\\w+)=(\\S+)" matching "host=localhost", what is group(1)?
- host=localhost
- host
- localhost
- =
Answer: host. The first capturing group (\w+) captures 'host'; group(2) captures 'localhost'; group(0) is the whole match.
What does "a,,b".split(",") produce?
Answer: ["a", "", "b"]. The two commas create an empty string between them, and an empty element in the middle is kept.
What does the quantifier {3} mean, as in [A-Z]{3}?
- Three or more
- Exactly three occurrences
- Zero to three
- At most three
Answer: Exactly three occurrences. {n} means exactly n occurrences, so [A-Z]{3} matches exactly three uppercase letters.
What happens if you call group() before a successful find() or matches()?
- It returns an empty string
- It returns null
- It throws IllegalStateException
- It returns the whole input
Answer: It throws IllegalStateException. Reading a group before a successful match throws IllegalStateException — only read groups after a match returns true.
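The behaviour in that last answer can be seen in a short sketch. The class and helper names are hypothetical, and returning an empty string when nothing is found is just one possible convention.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupStateDemo {
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Calls group() on a fresh matcher, before any find() or matches().
    static boolean throwsBeforeMatch() {
        Matcher matcher = NUMBER.matcher("abc 42");
        try {
            matcher.group();
            return false;
        } catch (IllegalStateException e) {
            return true;   // no match has been attempted yet
        }
    }

    // Safe pattern: read the group only after find() has returned true.
    static String firstNumber(String text) {
        Matcher matcher = NUMBER.matcher(text);
        return matcher.find() ? matcher.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(throwsBeforeMatch());       // true
        System.out.println(firstNumber("abc 42"));     // 42
        System.out.println(firstNumber("no digits").isEmpty());  // true
    }
}
```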