Text Wrangling with stringr

stringr is the tidyverse package for working with text. Every function shares the str_ prefix and takes the string as its first argument, so detecting, replacing, extracting, and splitting text becomes predictable and pipe-friendly.

Part of the free R course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

By the end of this lesson you'll detect patterns, replace and extract text, split and pad strings, and use regular expressions to clean messy real-world data.

What You'll Learn in This Lesson

1️⃣ Detecting and Measuring Text

Start by asking questions about each string. str_detect() returns a logical vector you can use to filter; str_length() counts characters; and str_to_upper() / str_to_lower() normalise case. In every function the string comes first; in functions that match a pattern, the pattern comes second.
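A minimal sketch of these four functions in action. The emails vector is made-up sample data, not part of the lesson.

```r
library(stringr)

# Hypothetical sample data for illustration
emails <- c("Ana@Example.com", "no-at-sign", "bob@test.org")

has_at <- str_detect(emails, "@")  # TRUE FALSE TRUE
valid  <- emails[has_at]           # keep only the elements containing "@"
lens   <- str_length(emails)       # 15 10 12
lower  <- str_to_lower(valid)      # "ana@example.com" "bob@test.org"
upper  <- str_to_upper("sku")      # "SKU"
```

Because str_detect() returns one TRUE/FALSE per element, it can be used directly as a subsetting index, as in emails[has_at].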

2️⃣ Replacing, Slicing, and Padding

str_replace() swaps the first match in each string (use str_replace_all() for every match). str_sub() pulls out a substring by start and end position, and str_pad() pads a string to a fixed width, on the left by default, which is perfect for zero-padded IDs.
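The difference between the two replace functions is easiest to see side by side. This is a small sketch using an invented date string:

```r
library(stringr)

# Hypothetical input for illustration
x <- "2024-01-15"

first_only <- str_replace(x, "-", "/")      # "2024/01-15"  (first match only)
every      <- str_replace_all(x, "-", "/")  # "2024/01/15"  (all matches)
year       <- str_sub(x, 1, 4)              # "2024"        (characters 1 to 4)

# Default side is "left", so the zeros go in front
ids <- str_pad(c("7", "42"), width = 4, pad = "0")  # "0007" "0042"
```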

3️⃣ Extracting and Splitting with Regex

Regular expressions describe patterns, not exact text. str_extract() pulls the first match from each string (e.g. "[0-9]+" grabs the first run of digits), str_split() breaks each string into pieces and returns a list with one character vector per input, and str_count() tallies matches per string.
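Here is a short sketch of all three functions with the "[0-9]+" pattern. The orders vector is invented for the example.

```r
library(stringr)

# Hypothetical sample data for illustration
orders <- c("order 123 of 45", "no digits", "a1b22c333")

first_num <- str_extract(orders, "[0-9]+")  # "123" NA "1"
n_runs    <- str_count(orders, "[0-9]+")    # 2 0 3
pieces    <- str_split("a,b,c", ",")        # a list holding c("a", "b", "c")
```

Note that str_extract() returns NA where nothing matches, so the result always has the same length as the input.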

Your turn. Fill in the # TODO blank, run it, and compare with the expected output.

Chain a few stringr calls to standardise messy SKUs. Remember the string is always the first argument, so these read naturally left to right.
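One possible shape for such a chain is sketched below. The SKU values and the two-letter prefix format are assumptions made up for illustration, and the native pipe |> assumes R 4.1 or later.

```r
library(stringr)

# Hypothetical messy SKUs: stray spaces, mixed case, unpadded numbers
skus <- c("  ab-7 ", "Ab-42", "AB-123  ")

# Letters: trim, upper-case, then take the first two characters
prefix <- skus |>
  str_trim() |>
  str_to_upper() |>
  str_sub(1, 2)

# Numbers: grab the first run of digits and zero-pad to width 4
digits <- skus |>
  str_extract("[0-9]+") |>
  str_pad(width = 4, pad = "0")

clean <- str_c(prefix, "-", digits)  # "AB-0007" "AB-0042" "AB-0123"
```

Each step takes the string as its first argument, which is why the pipe can pass the result straight into the next call.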

📋 Quick Reference — stringr

Practice quiz

What does str_detect(x, '@') return?

  • The matched text
  • A logical TRUE/FALSE for each element
  • The first position
  • A list of matches

Answer: A logical TRUE/FALSE for each element. str_detect() returns a logical vector, ideal for filtering.

In stringr functions, which argument comes first?

  • The string being operated on
  • The pattern
  • The replacement
  • The width

Answer: The string being operated on. The string is always the first argument, making functions pipe-friendly.

What is the difference between str_replace() and str_replace_all()?

  • They are identical
  • str_replace_all() changes only the first match
  • str_replace() changes the first match; _all changes every match
  • str_replace() works only on numbers

Answer: str_replace() changes the first match; _all changes every match. Both work on each string separately: str_replace() swaps only the first match in a string, while str_replace_all() swaps all of them.

Which function extracts a substring by position, e.g. characters 1 to 5?

  • str_extract()
  • str_split()
  • str_count()
  • str_sub()

Answer: str_sub(). str_sub(x, 1, 5) slices by start and end position.

What does str_pad(c('7','42'), width = 4, pad = '0') produce?

  • "0007" "0042"
  • "7000" "4200"
  • "7" "42"
  • "0070" "0420"

Answer: "0007" "0042". str_pad() left-pads each string with zeros to width 4.

Which function pulls the FIRST regex match from each element?

  • str_detect()
  • str_extract()
  • str_replace()
  • str_length()

Answer: str_extract(). str_extract(x, "[0-9]+") returns the first matching substring.

Why does str_split() return a list instead of a vector?

  • It is a bug
  • It is faster
  • Each input can split into a different number of pieces
  • Lists are always used in stringr

Answer: Each input can split into a different number of pieces. Because piece counts vary per input, a list (one element per input) is the natural container.
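A quick sketch shows why a flat vector would not work here. The tags vector is invented for the example.

```r
library(stringr)

# Hypothetical inputs that split into 3, 1, and 2 pieces
tags <- c("a,b,c", "d", "e,f")

pieces <- str_split(tags, ",")
counts <- lengths(pieces)  # 3 1 2
third  <- pieces[[3]]      # "e" "f"
```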

Which function counts how many times a pattern appears in each string?

  • str_count()
  • str_length()
  • str_detect()
  • str_sub()

Answer: str_count(). str_count(x, "a") tallies matches per element.

Which function removes surrounding whitespace?

  • str_pad()
  • str_squish()
  • str_trim()
  • str_wrap()

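Answer: str_trim(). str_trim() strips leading and trailing whitespace and leaves the rest of the string alone. str_squish() goes further and also collapses repeated whitespace inside the string.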
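To see how str_trim() compares with the str_squish() distractor, here is a small sketch on an invented string:

```r
library(stringr)

# Hypothetical input with extra spaces outside and inside
x <- "  too   many   spaces  "

trimmed  <- str_trim(x)    # "too   many   spaces"  (inner gaps kept)
squished <- str_squish(x)  # "too many spaces"      (inner gaps collapsed)
```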

How do you make a stringr pattern case-insensitive?

  • Use fixed('...')
  • Add ignore = TRUE as an argument
  • Only by calling str_to_lower() first
  • Wrap it in regex('...', ignore_case = TRUE)

Answer: Wrap it in regex('...', ignore_case = TRUE). regex("err", ignore_case = TRUE) enables case-insensitive matching.
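A brief sketch of the difference, using made-up log lines:

```r
library(stringr)

# Hypothetical log lines for illustration
logs <- c("ERROR: disk full", "Error: timeout", "all good")

strict  <- str_detect(logs, "err")                            # FALSE FALSE FALSE
relaxed <- str_detect(logs, regex("err", ignore_case = TRUE)) # TRUE TRUE FALSE
```

The plain pattern "err" is case-sensitive, so it misses both "ERROR" and "Error"; wrapping it in regex() with ignore_case = TRUE catches them.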