Regular Expressions

A regular expression is a compact pattern language for describing text, and in R you use it with functions like grepl() and gsub() to detect, extract, and replace pieces of strings.


Part of the free R course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

This lesson covers detecting matches with grepl/grep, replacing with sub/gsub, extracting with regmatches, the core anchors, classes and quantifiers, and R's two-backslash escaping along with fixed= and perl= modes.

What You'll Learn in This Lesson

1️⃣ Detecting Matches: grepl() & grep()

grepl() returns TRUE or FALSE for each string, which makes it ideal for filtering. grep() returns the integer positions of the matching elements, or the matching strings themselves with value = TRUE. The anchors ^ (start) and $ (end) pin the pattern to the edges of the string.
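As a minimal sketch of the difference between the two functions, the snippet below runs both on a small made-up vector. The `fruits` data is hypothetical and only there for illustration.

```r
# Hypothetical sample data (not from the lesson)
fruits <- c("apple", "banana", "grape", "kiwi")

# grepl(): one TRUE/FALSE per element
has_ap <- grepl("ap", fruits)
print(has_ap)             # TRUE FALSE TRUE FALSE

# A logical vector can be used directly to filter
print(fruits[has_ap])     # "apple" "grape"

# grep(): integer positions of the elements that match
ends_in_e <- grep("e$", fruits)
print(ends_in_e)          # 1 3

# grep() with value = TRUE: the matching strings themselves
starts_with_b <- grep("^b", fruits, value = TRUE)
print(starts_with_b)      # "banana"
```

Here "e$" only matches strings that end in e, and "^b" only matches strings that start with b.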

2️⃣ Replacing: sub() & gsub()

sub() changes only the first match in each string; gsub() changes them all (g for "global"). Character classes like [0-9] match any single digit, and the quantifier + means "one or more". Note the whitespace shorthand \s: in an R string you write it with two backslashes, as "\\s".
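The following sketch contrasts sub() and gsub() on one made-up string. The `messy` value and the replacement text "N" are assumptions chosen for illustration.

```r
# Hypothetical input with two numbers and a run of extra spaces
messy <- "Room 12,   floor 3"

# sub(): only the first run of digits is replaced
first_only <- sub("[0-9]+", "N", messy)
print(first_only)     # "Room N,   floor 3"

# gsub(): every run of digits is replaced
all_digits <- gsub("[0-9]+", "N", messy)
print(all_digits)     # "Room N,   floor N"

# "\\s+" is the regex \s+ (one or more whitespace characters);
# here each run of whitespace collapses to a single space
tidy <- gsub("\\s+", " ", messy)
print(tidy)           # "Room 12, floor 3"
```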

3️⃣ Extracting Matches

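To pull the matched text out (rather than replace it), pair a locator with regmatches(). regexpr() finds the first match in each string, and regmatches() then returns those matches as a character vector. gregexpr() finds every match, and regmatches() returns them as a list with one element per input string.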
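A short sketch of both locator functions paired with regmatches(), using a hypothetical `notes` vector. Note how the two results differ in shape.

```r
# Hypothetical sample data (not from the lesson)
notes <- c("call 555-1234 or 555-9876", "no number here", "fax 555-0000")
pattern <- "[0-9]+-[0-9]+"

# regexpr(): locates the first match in each string.
# regmatches() returns a character vector and drops strings with no match.
first_hits <- regmatches(notes, regexpr(pattern, notes))
print(first_hits)     # "555-1234" "555-0000"

# gregexpr(): locates every match in each string.
# regmatches() returns a list with one element per input string;
# a string with no match yields character(0).
all_hits <- regmatches(notes, gregexpr(pattern, notes))
print(all_hits)
```

Because the regexpr() result drops non-matching strings, it can be shorter than the input, while the gregexpr() result always has the same length as the input.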

4️⃣ Escaping, fixed=, and perl=

Regex metacharacters like . need escaping to match literally, and in an R string the backslash itself is doubled, so the regex \. becomes "\\.". To skip regex entirely, pass fixed = TRUE; for advanced features such as lookarounds, pass perl = TRUE.
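The sketch below shows the three modes side by side. The `files` vector and the `price` string are made up for illustration, and the lookahead is just one example of a feature that needs perl = TRUE.

```r
# Hypothetical sample data (not from the lesson)
files <- c("report.csv", "report_csv", "a.b.c")

# Unescaped "." matches any character, so "_csv" also matches
loose <- grepl(".csv", files)
print(loose)      # TRUE TRUE FALSE

# Escaped dot: the regex \. written as "\\." in an R string
strict <- grepl("\\.csv", files)
print(strict)     # TRUE FALSE FALSE

# fixed = TRUE: the pattern is a plain literal, no escaping needed
literal <- grepl(".csv", files, fixed = TRUE)
print(literal)    # TRUE FALSE FALSE

# perl = TRUE: lookahead (?= ...) replaces digits only when " USD" follows
price <- "pay 100 USD and 200 EUR"
masked <- gsub("[0-9]+(?= USD)", "X", price, perl = TRUE)
print(masked)     # "pay X USD and 200 EUR"
```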

Your turn. Fill in the ___ blanks, run it, and compare with the expected output.

The key trick here is the negated class [^0-9-], which matches "anything that is NOT a digit or a dash". Replacing every such character with an empty string is the cleanest way to strip a string down to just the characters you want to keep.
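A minimal sketch of that trick, assuming a made-up input string. This is not the lesson's own exercise code, only an illustration of the negated class.

```r
# Hypothetical input (not the lesson's exercise data)
raw <- "Date: 2024-01-15 (UTC)"

# [^0-9-] matches any character that is neither a digit nor a dash.
# The dash comes last inside the brackets so it is read literally.
kept <- gsub("[^0-9-]", "", raw)
print(kept)   # "2024-01-15"
```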

📋 Quick Reference — Regex in R

Practice quiz

What does grepl() return?

  • The matching strings
  • A logical TRUE/FALSE for each element
  • The match positions
  • The replaced text

Answer: A logical TRUE/FALSE for each element. grepl() returns a logical vector; the 'l' stands for logical.

What does grep('e$', x) return?

  • The integer positions that match
  • TRUE/FALSE per element
  • Only the first match
  • A list of matches

Answer: The integer positions that match. grep() returns the positions; add value = TRUE to get the strings.

Which function replaces ALL matches in each string?

  • sub()
  • replace()
  • gsub()
  • regmatches()

Answer: gsub(). gsub() replaces every match; the g is for global. sub() replaces only the first.

In an R string, how do you write the regex for a digit, \d?

  • "d"
  • "\d"
  • "//d"
  • "\\d"

Answer: "\\d". R's parser needs the backslash doubled, so \d is written "\\d".

Which anchor matches the END of a string?

  • $
  • ^
  • .
  • *

Answer: $. $ anchors to the end; ^ anchors to the start.

What does the quantifier + mean?

  • Zero or more
  • One or more
  • Exactly one
  • Optional

Answer: One or more. + means one or more of the preceding element.

What does fixed = TRUE do?

  • Enables Perl features
  • Makes matching case-insensitive
  • Turns regex off so the pattern is literal
  • Anchors the pattern

Answer: Turns regex off so the pattern is literal. fixed = TRUE matches the pattern literally, so . means a real dot.

Which pair extracts the matched text rather than replacing it?

  • regexpr() + regmatches()
  • sub() + gsub()
  • grepl() + grep()
  • paste() + cat()

Answer: regexpr() + regmatches(). regexpr() locates the match and regmatches() pulls it out.

What does the character class [0-9] match?

  • Any letter
  • Any single digit
  • Whitespace
  • Any character

Answer: Any single digit. [0-9] matches a single digit character.

What does perl = TRUE enable?

  • Literal matching
  • Case folding
  • Slower matching only
  • Perl-style features like lookarounds

Answer: Perl-style features like lookarounds. perl = TRUE switches to the PCRE engine, which adds features such as lookahead and lookbehind. Shorthands like \d also work in R's default engine.