Text Wrangling with stringr
stringr is the tidyverse package for working with text, giving every string operation a consistent str_ name with the string always first — so detecting, replacing, extracting, and splitting text becomes predictable and pipe-friendly.
Part of the free R course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
By the end of this lesson you'll detect patterns, replace and extract text, split and pad strings, and use regular expressions to clean messy real-world data.
What You'll Learn in This Lesson
1️⃣ Detecting and Measuring Text
Start by asking questions about each string. str_detect() returns a logical vector you can use to filter; str_length() counts the characters in each string; and str_to_upper() / str_to_lower() normalise case. In every function the string comes first, and where a function takes a pattern, the pattern comes second.
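As a minimal sketch of the functions just described, the snippet below runs them on a small vector. The `emails` values are made up for illustration and are not part of the lesson's own data.

```r
library(stringr)

# Hypothetical sample data, invented for this illustration
emails <- c("ana@example.com", "no-at-sign", "Bo@Example.ORG")

# str_detect(): one TRUE/FALSE per element
has_at <- str_detect(emails, "@")
has_at                       # TRUE FALSE TRUE

# Use the logical vector to filter the original strings
valid <- emails[has_at]
valid                        # "ana@example.com" "Bo@Example.ORG"

# str_length(): number of characters in each string
lengths <- str_length(emails)
lengths                      # 15 10 14

# Normalise case so later comparisons are consistent
lower <- str_to_lower(valid)
lower                        # "ana@example.com" "bo@example.org"
```

Note that str_length() and the case functions take no pattern at all; only the string argument is needed.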
2️⃣ Replacing, Slicing, and Padding
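str_replace() swaps the first match in each string (use str_replace_all() for every match). str_sub() pulls out a substring by start and end position, and str_pad() pads a string to a minimum width, on the left by default, which makes it a good fit for zero-padded IDs. Strings already at or beyond that width are left unchanged.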
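The following sketch shows these three operations side by side. The date string and IDs are illustrative values chosen for this example.

```r
library(stringr)

# Hypothetical input, invented for this illustration
x <- "2024-01-15"

# First match only vs. every match
first_only <- str_replace(x, "-", "/")
first_only                   # "2024/01-15"
every <- str_replace_all(x, "-", "/")
every                        # "2024/01/15"

# Slice by position: positions are 1-based and inclusive
year <- str_sub(x, 1, 4)
year                         # "2024"

# Negative positions count from the end of the string
day <- str_sub(x, -2, -1)
day                          # "15"

# Pad on the left (the default side) with zeros to width 4
ids <- str_pad(c("7", "42"), width = 4, pad = "0")
ids                          # "0007" "0042"

# A string longer than the width is returned unchanged
long_id <- str_pad("12345", width = 4, pad = "0")
long_id                      # "12345"
```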
3️⃣ Extracting and Splitting with Regex
Regular expressions describe patterns, not exact text. str_extract() pulls the first match from each string (e.g. "[0-9]+" grabs the first run of digits) and returns NA where nothing matches, str_split() breaks each string into pieces and returns a list, and str_count() tallies matches per string.
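A short sketch of the regex helpers above, using made-up order strings (the `orders` and `csv` vectors are assumptions for illustration only):

```r
library(stringr)

# Hypothetical sample data, invented for this illustration
orders <- c("order 66: 3 items", "order 7", "no numbers")

# First run of digits per element; NA where there is no match
first_num <- str_extract(orders, "[0-9]+")
first_num                    # "66" "7" NA

# Number of separate digit runs in each string
n_runs <- str_count(orders, "[0-9]+")
n_runs                       # 2 1 0

# str_split() returns a list: one character vector per input
csv <- c("a,b,c", "d,e")
pieces <- str_split(csv, ",")
pieces[[1]]                  # "a" "b" "c"
pieces[[2]]                  # "d" "e"
```

The two inputs in `csv` split into three and two pieces respectively, which is why the result is a list rather than a single flat vector.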
Your turn. Fill in the # TODO blank, run it, and compare with the expected output.
Chain a few stringr calls to standardise messy SKUs. Remember the string is always the first argument, so these read naturally left to right.
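One possible shape for such a chain is sketched below. The SKU values and the target format (two uppercase letters, a hyphen, a four-digit zero-padded number) are assumptions made for this illustration, not the exercise's actual data or expected output. The native pipe `|>` assumes R 4.1 or later.

```r
library(stringr)

# Hypothetical messy SKUs, invented for this illustration
skus <- c("  ab-7 ", "Cd-42", "ef-123 ")

# Step 1: strip surrounding whitespace, then normalise case
clean <- skus |>
  str_trim() |>
  str_to_upper()

# Step 2: take the two-letter prefix by position
prefix <- str_sub(clean, 1, 2)

# Step 3: pull out the digits and zero-pad them to width 4
number <- clean |>
  str_extract("[0-9]+") |>
  str_pad(width = 4, pad = "0")

# Step 4: join the parts back together
result <- str_c(prefix, "-", number)
result                       # "AB-0007" "CD-0042" "EF-0123"
```

Because the string is the first argument of every call, each step's output feeds directly into the next step without any argument reshuffling.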
📋 Quick Reference — stringr
Practice quiz
What does str_detect(x, '@') return?
- The matched text
- A logical TRUE/FALSE for each element
- The first position
- A list of matches
Answer: A logical TRUE/FALSE for each element. str_detect() returns a logical vector, ideal for filtering.
In stringr functions, which argument comes first?
- The string being operated on
- The pattern
- The replacement
- The width
Answer: The string being operated on. The string is always the first argument, making functions pipe-friendly.
What is the difference between str_replace() and str_replace_all()?
- They are identical
- str_replace_all() changes only the first match
- str_replace() changes the first match; _all changes every match
- str_replace() works only on numbers
Answer: str_replace() changes the first match; _all changes every match. Both work element by element: str_replace() swaps only the first match in each string, while str_replace_all() swaps every match.
Which function extracts a substring by position, e.g. characters 1 to 5?
- str_extract()
- str_split()
- str_count()
- str_sub()
Answer: str_sub(). str_sub(x, 1, 5) slices by start and end position.
What does str_pad(c('7','42'), width = 4, pad = '0') produce?
- "0007" "0042"
- "7000" "4200"
- "7" "42"
- "0070" "0420"
Answer: "0007" "0042". str_pad() left-pads each string with zeros to width 4.
Which function pulls the FIRST regex match from each element?
- str_detect()
- str_extract()
- str_replace()
- str_length()
Answer: str_extract(). str_extract(x, "[0-9]+") returns the first matching substring.
Why does str_split() return a list instead of a vector?
- It is a bug
- It is faster
- Each input can split into a different number of pieces
- Lists are always used in stringr
Answer: Each input can split into a different number of pieces. Because the number of pieces can differ for each input, str_split() returns a list with one character vector per input string.
Which function counts how many times a pattern appears in each string?
- str_count()
- str_length()
- str_detect()
- str_sub()
Answer: str_count(). str_count(x, "a") tallies matches per element.
Which function removes only leading and trailing whitespace, leaving internal spacing untouched?
- str_pad()
- str_squish()
- str_trim()
- str_wrap()
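Answer: str_trim(). str_trim() strips leading and trailing whitespace only; str_squish() does that too, but it also collapses repeated internal whitespace into single spaces.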
How do you make a stringr pattern case-insensitive?
- Use fixed('...')
- Add ignore = TRUE as an argument
- Use str_to_lower first only
- Wrap it in regex('...', ignore_case = TRUE)
Answer: Wrap it in regex('...', ignore_case = TRUE). regex("err", ignore_case = TRUE) enables case-insensitive matching.
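To make the last answer concrete, here is a small sketch comparing a plain pattern, a case-insensitive regex(), and a fixed() literal. The `logs` vector is made up for illustration.

```r
library(stringr)

# Hypothetical log lines, invented for this illustration
logs <- c("ERROR: disk full", "error: timeout", "all good")

# A plain pattern is case-sensitive
plain <- str_detect(logs, "err")
plain                        # FALSE TRUE FALSE

# regex(..., ignore_case = TRUE) matches either case
any_case <- str_detect(logs, regex("err", ignore_case = TRUE))
any_case                     # TRUE TRUE FALSE

# fixed() matches the literal text exactly as written
literal <- str_detect(logs, fixed("ERR"))
literal                      # TRUE FALSE FALSE
```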