Regular Expressions
A regular expression is a compact pattern language for describing text, and in R you use it with functions like grepl() and gsub() to detect, extract, and replace pieces of strings.
Part of the free R course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
This lesson covers detecting matches with grepl/grep, replacing with sub/gsub, extracting with regmatches, the core anchors, classes and quantifiers, and R's two-backslash escaping along with fixed= and perl= modes.
1️⃣ Detecting Matches: grepl() & grep()
grepl() returns TRUE / FALSE for each string, which makes it ideal for filtering. grep() returns the matching positions, or the matching strings with value = TRUE. The anchors ^ (start) and $ (end) pin the pattern to the edges.
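A minimal sketch of the difference between the two functions, using a made-up fruits vector (the data and variable names are assumptions for illustration, not part of the original lesson):

```r
# Hypothetical sample data, chosen only for illustration
fruits <- c("apple", "banana", "cherry", "grape")

# grepl(): one TRUE/FALSE per element
has_an <- grepl("an", fruits)
print(has_an)      # FALSE  TRUE FALSE FALSE

# grep(): integer positions of the elements that match
# "e$" means "an e at the very end of the string"
ends_e <- grep("e$", fruits)
print(ends_e)      # 1 4

# grep() with value = TRUE: the matching strings themselves
# "^b" means "a b at the very start of the string"
starts_b <- grep("^b", fruits, value = TRUE)
print(starts_b)    # "banana"

# Because grepl() returns a logical vector, it can index directly
kept <- fruits[grepl("e$", fruits)]
print(kept)        # "apple" "grape"
```

The logical vector from grepl() is the same length as its input, which is why it works well for subsetting vectors and data frame rows.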
2️⃣ Replacing: sub() & gsub()
sub() changes the first match; gsub() changes them all (g for "global"). Character classes like [0-9] match any digit and the quantifier + means "one or more". Note the whitespace shorthand \s: in an R string you write it with two backslashes, as "\\s".
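A short sketch of first-match versus global replacement. The input strings here are invented for the example:

```r
# Hypothetical inputs for illustration
phone <- "555-123-4567"

# sub() replaces only the first dash
first_only <- sub("-", ".", phone)
print(first_only)   # "555.123-4567"

# gsub() replaces every dash
every_one <- gsub("-", ".", phone)
print(every_one)    # "555.123.4567"

# [0-9]+ matches a run of one or more digits as a single unit
masked <- gsub("[0-9]+", "#", "order 12 of 345")
print(masked)       # "order # of #"

# \s+ (written "\\s+" in an R string) matches a run of whitespace
tidy <- gsub("\\s+", " ", "too    many   spaces")
print(tidy)         # "too many spaces"
```

Without the + quantifier, gsub("[0-9]", "#", ...) would replace each digit separately, so "345" would become "###" rather than "#".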
3️⃣ Extracting Matches
To pull the matched text out (rather than replace it), pair a locator with regmatches(). regexpr() finds the first match in each string; gregexpr() finds them all, and regmatches() then returns them as a list.
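The locator-plus-regmatches() pairing might look like this. The sentence being searched is a made-up example:

```r
# Hypothetical input for illustration
text <- "Order 66 shipped on day 12 of 2024"

# regexpr() locates the first match; regmatches() pulls it out
first_num <- regmatches(text, regexpr("[0-9]+", text))
print(first_num)       # "66"

# gregexpr() locates every match; the result is a list with
# one character vector per input string
all_nums <- regmatches(text, gregexpr("[0-9]+", text))
print(all_nums[[1]])   # "66" "12" "2024"

# Convert the extracted strings to numbers if needed
total <- sum(as.numeric(all_nums[[1]]))
print(total)           # 2102
```

Because the gregexpr() result is a list, [[1]] is needed to reach the matches for the first (here, only) input string.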
4️⃣ Escaping, fixed=, and perl=
Regex metacharacters like . need escaping to match literally, and in an R string the backslash itself is doubled, so the regex \. becomes "\\.". To skip regex entirely, pass fixed = TRUE; for advanced features, pass perl = TRUE.
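The three modes can be compared side by side. This is a sketch with invented file names and an invented price string; the lookahead is just one example of a feature that needs perl = TRUE:

```r
# Hypothetical inputs for illustration
files <- c("report.csv", "reportXcsv", "notes.txt")

# Unescaped . matches ANY character, so "reportXcsv" also matches
unescaped <- grepl("report.csv", files)
print(unescaped)   # TRUE  TRUE FALSE

# Escaped: the regex \. is written "\\." in an R string
escaped <- grepl("report\\.csv", files)
print(escaped)     # TRUE FALSE FALSE

# The R string "\\." holds just two characters: a backslash and a dot
print(nchar("\\."))  # 2

# fixed = TRUE: the pattern is plain text, no escaping needed
literal <- grepl("report.csv", files, fixed = TRUE)
print(literal)     # TRUE FALSE FALSE

# perl = TRUE: enables lookarounds such as (?=...)
# Here: digits only if they are immediately followed by "USD"
prices <- "cost 40USD, weight 12kg"
usd <- regmatches(prices, regexpr("[0-9]+(?=USD)", prices, perl = TRUE))
print(usd)         # "40"
```

When the thing being searched for is ordinary text with no pattern logic, fixed = TRUE is usually the simpler choice, since no character in the pattern needs escaping.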
Your turn. Fill in the ___ blanks, run it, and compare with the expected output.
The key trick here is the negated class [^0-9-] — "anything that is NOT a digit or a dash" — which is the cleanest way to strip a string down to just the characters you want to keep.
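As an illustration of the negated class on its own (this is not the lesson's original exercise, and the input string is an assumption):

```r
# Hypothetical messy input for illustration
raw <- "Call (555) 123-4567 now!"

# [^0-9-] matches any single character that is NOT a digit or a dash.
# Replacing every such character with "" keeps only digits and dashes.
cleaned <- gsub("[^0-9-]", "", raw)
print(cleaned)   # "555123-4567"
```

The dash is placed last inside the brackets so that it is read as a literal dash rather than as part of a range like 0-9.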
📋 Quick Reference — Regex in R
Practice quiz
What does grepl() return?
- The matching strings
- A logical TRUE/FALSE for each element
- The match positions
- The replaced text
Answer: A logical TRUE/FALSE for each element. grepl() returns a logical vector; the 'l' stands for logical.
What does grep('e$', x) return?
- The integer positions that match
- TRUE/FALSE per element
- Only the first match
- A list of matches
Answer: The integer positions that match. grep() returns the positions; add value = TRUE to get the strings.
Which function replaces ALL matches in each string?
- sub()
- replace()
- gsub()
- regmatches()
Answer: gsub(). gsub() replaces every match; the g is for global. sub() replaces only the first.
In an R string, how do you write the regex for a digit, \d?
- "d"
- "\d"
- "//d"
- "\\d"
Answer: "\\d". R's parser needs the backslash doubled, so \d is written "\\d".
Which anchor matches the END of a string?
- $
- ^
- .
- *
Answer: $. $ anchors to the end; ^ anchors to the start.
What does the quantifier + mean?
- Zero or more
- One or more
- Exactly one
- Optional
Answer: One or more. + means one or more of the preceding element.
What does fixed = TRUE do?
- Enables Perl features
- Makes matching case-insensitive
- Turns regex off so the pattern is literal
- Anchors the pattern
Answer: Turns regex off so the pattern is literal. fixed = TRUE matches the pattern literally, so . means a real dot.
Which pair extracts the matched text rather than replacing it?
- regexpr() + regmatches()
- sub() + gsub()
- grepl() + grep()
- paste() + cat()
Answer: regexpr() + regmatches(). regexpr() locates the match and regmatches() pulls it out.
What does the character class [0-9] match?
- Any letter
- Any single digit
- Whitespace
- Any character
Answer: Any single digit. [0-9] matches a single digit character.
What does perl = TRUE enable?
- Literal matching
- Case folding
- Slower matching only
- Perl-style features like lookarounds and \d
Answer: Perl-style features like lookarounds and \d. perl = TRUE switches to the PCRE engine, which adds features such as lookarounds. (The \d shorthand also works in R's default engine.)