If you’ve ever needed to extract a phone number or pull out specific text in R, you’ve probably run into something like this:
stringr::str_extract_all("Call us at 800-555-1234", "\\d+")
That "\\d+" is a regular expression. It works, but what does it mean? If you don’t have regex syntax memorized, it is a wall: you have to stop what you’re doing, look up the syntax, and hope you get the escaping right.
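The escaping is where most people trip up. In an R string literal, \\ stands for a single backslash, so the regex engine actually receives \d+. Writing "\d+" instead is a parse error, because \d is not a valid escape sequence inside an R string. A quick base R check (no regexpert involved):

```r
# In R source code, "\\d+" is a 3-character string: backslash, "d", "+"
pattern <- "\\d+"
writeLines(pattern)  # prints the string as the regex engine sees it: \d+
nchar(pattern)       # 3
```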
This is the problem regexpert solves.
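regexpert breaks pattern matching into a plain-English pipeline:
1. Describe what to match with the xp_build_* functions.
2. Modify the pattern with operators such as xp_op_repeat().
3. Act on the finished pattern, for example with xp_action_find() to return matches.
These steps are connected with the pipe (%>% or |>), so your code reads like a sentence: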
xp_build_digits() %>%
xp_op_repeat(1, Inf) %>%
xp_action_find("Call us at 800-555-1234")
#> [1] "800" "555" "1234"
No regex required. You described what you wanted, and regexpert handled the escaping and logic.
xp_build_digits()
Matches numeric characters (0–9). In regexpert, builders match a single character by default. To match a run of digits, follow the builder with xp_op_repeat().
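In plain regex terms, a single-character digit builder corresponds to \d, and repetition is what turns it into \d+. The base R sketch below shows the difference, assuming (as the xp_action_view() output later in this vignette suggests) that xp_build_digits() compiles to \d:

```r
x <- "Order #88412"

# \d alone: every digit is a separate match
single <- regmatches(x, gregexpr("\\d", x, perl = TRUE))[[1]]
single   # "8" "8" "4" "1" "2"

# \d+ : consecutive digits are grouped into one match
grouped <- regmatches(x, gregexpr("\\d+", x, perl = TRUE))[[1]]
grouped  # "88412"
```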
xp_build_digits() %>%
xp_op_repeat(1, Inf) %>%
xp_action_find("Order #88412 placed on 2024-01-15")
#> [1] "88412" "2024" "01" "15"
xp_build_letters()
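Matches alphabetic characters (a–z, A–Z). Use the case argument to control which letters count: case = "both" matches either case, while case = "upper" restricts matches to capitals.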
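Case handling comes down to which character class ends up in the pattern. A base R sketch, assuming case = "both" corresponds to [A-Za-z] and case = "upper" to [A-Z] (the latter is consistent with the xp_action_view() output shown later):

```r
x <- "Order #88412 placed"

# [A-Za-z]+ : runs of letters in either case
both <- regmatches(x, gregexpr("[A-Za-z]+", x, perl = TRUE))[[1]]
both   # "Order" "placed"

# [A-Z]+ : runs of uppercase letters only
upper <- regmatches(x, gregexpr("[A-Z]+", x, perl = TRUE))[[1]]
upper  # "O"
```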
xp_build_letters(case = "both") %>%
xp_op_repeat(1, Inf) %>%
xp_action_find("Order #88412 placed")
#> [1] "Order" "placed"
xp_build_whitespace()
Matches spaces, tabs, and newlines.
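In most regex flavors the whitespace class is \s, which covers tabs and newlines as well as spaces. A base R check, assuming xp_build_whitespace() maps to \s:

```r
# A tab, a newline, and a space, each between letters
x <- "a\tb\nc d"

ws <- regmatches(x, gregexpr("\\s+", x, perl = TRUE))[[1]]
ws  # "\t" "\n" " "
```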
xp_build_whitespace() %>%
xp_op_repeat(1, Inf) %>%
xp_action_find("hello world")
#> [1] " "
xp_build_literal()
Sometimes you need to match symbols such as $ or . that have a special meaning in regex ($ anchors the end of a string, and . matches any character). xp_build_literal() escapes them automatically, so they are matched as plain text.
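Here is why the escaping matters, along with a minimal sketch of the kind of escaping a literal builder has to perform. The escape_literal() helper is hypothetical, written for illustration only; it is not regexpert’s implementation:

```r
x <- "The price is $100"

# Unescaped, "$" is an end-of-string anchor, so "$\d+" can never match here
regmatches(x, gregexpr("$\\d+", x, perl = TRUE))[[1]]  # character(0)

# Hypothetical helper: put a backslash in front of every regex metacharacter
escape_literal <- function(s) {
  gsub("([.\\\\+*?\\[^\\]$(){}|\\-])", "\\\\\\1", s, perl = TRUE)
}

pattern <- paste0(escape_literal("$"), "\\d+")
writeLines(pattern)                                    # \$\d+
regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]  # "$100"
```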
xp_build_literal("$") %>%
xp_build_digits() %>%
xp_op_repeat(1, Inf) %>%
xp_action_find("The price is $100")
#> [1] "$100"
The real power of regexpert comes from chaining multiple builders into a single pipeline. Each builder appends to the pattern from left to right, so you can describe complex matches in plain-English steps.
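One way to picture left-to-right chaining is that each step appends a fragment to a growing pattern string. The base R sketch below assembles the price pattern from the example that follows; the fragment for each step is an assumption about what regexpert generates, not its verified output:

```r
# One regex fragment per pipeline step, in pipeline order
fragments <- c(
  "\\$",    # literal "$"
  "\\d+",   # digits, repeated one or more times
  "\\.",    # literal "."
  "\\d{2}"  # digits, repeated exactly two times
)
pattern <- paste0(fragments, collapse = "")
writeLines(pattern)  # \$\d+\.\d{2}

x <- "Items: $4.99, $12.50, and $120.00"
prices <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
prices  # "$4.99" "$12.50" "$120.00"
```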
For example, to extract prices from a messy string, describe each price as a $ sign, one or more digits, a dot, and exactly two decimal digits:
xp_build_literal("$") %>%
xp_build_digits() %>%
xp_op_repeat(1, Inf) %>%
xp_build_literal(".") %>%
xp_build_digits() %>%
xp_op_repeat(2, 2) %>%
xp_action_find("Items: $4.99, $12.50, and $120.00")
#> [1] "$4.99" "$12.50" "$120.00"
Or say you want to pull out hyphenated product codes: three uppercase letters, a dash, then four digits:
xp_build_letters(case = "upper") %>%
xp_op_repeat(3, 3) %>%
xp_build_literal("-") %>%
xp_build_digits() %>%
xp_op_repeat(4, 4) %>%
xp_action_find("Products: ABC-1234, XYZ-5678, and DEF-9999")
#> [1] "ABC-1234" "XYZ-5678" "DEF-9999"
You can call xp_action_view() at any point in the pipeline to peek at the regex being built, which is useful for debugging or learning:
xp_build_letters(case = "upper") %>%
xp_op_repeat(3, 3) %>%
xp_build_literal("-") %>%
xp_build_digits() %>%
xp_op_repeat(4, 4) %>%
xp_action_view()
#> Current Regex Pattern:
#> [A-Z]{3}\-\d{4}
For common real-world data types, regexpert includes a library of pre-built patterns, available through xp_build_standard(). There is no need to remember complex regex; just name what you want.
# Extract email addresses
xp_build_standard("email") %>%
xp_action_find("Contact us at support@regexpert.com or sales@example.org")
#> [1] "support@regexpert.com" "sales@example.org"
# Extract phone numbers
xp_build_standard("phone") %>%
xp_action_find("Call us at (208) 555-1234 or 208.555.5678")
#> [1] "(208) 555-1234" "208.555.5678"
# Extract ZIP codes
xp_build_standard("zip") %>%
xp_action_find("My zip code is 83440")
#> [1] "83440"
# Extract ISO dates
xp_build_standard("date_iso") %>%
xp_action_find("The event is on 2026-03-17")
#> [1] "2026-03-17"
The full list of available types is: email, phone, zip, date_iso, date_us, credit_card, ipv4, hex_color, time_24, ssn, and mac_address.
Need a pattern that isn’t in the library? Use
xp_register() to add your own custom patterns to the
session library and access them the same way via
xp_build_standard().
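A session library can be pictured as a named lookup table that lives for the duration of the R session. The sketch below uses an environment for that; register_pattern() and standard_pattern() are hypothetical stand-ins, not regexpert’s implementation, and the zip pattern is a simplified assumption (five digits between word boundaries):

```r
# Session-level pattern library: an environment persists for the R session
pattern_library <- new.env()

# Hypothetical stand-in for registering a named pattern
register_pattern <- function(name, regex) {
  assign(name, regex, envir = pattern_library)
  invisible(regex)
}

# Hypothetical stand-in for looking up a registered pattern by name
standard_pattern <- function(name) {
  if (!exists(name, envir = pattern_library, inherits = FALSE)) {
    stop("Unknown pattern type: ", name)
  }
  get(name, envir = pattern_library, inherits = FALSE)
}

# A simplified "built-in" entry and a user-defined custom entry
register_pattern("zip", "\\b\\d{5}\\b")
register_pattern("order_id", "#\\d+")

x <- "Order #884 ships to 83440"
zips <- regmatches(x, gregexpr(standard_pattern("zip"), x, perl = TRUE))[[1]]
ids  <- regmatches(x, gregexpr(standard_pattern("order_id"), x, perl = TRUE))[[1]]
zips  # "83440"
ids   # "#884"
```

Looking up an unregistered name fails loudly rather than silently returning an empty pattern, which is usually what you want from a named library.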
If you ever want to see the actual regex string being generated under
the hood, use xp_action_view().
xp_build_digits() %>%
xp_op_repeat(3) %>%
xp_build_literal("-") %>%
xp_action_view()
#> Current Regex Pattern:
#> (?:\d){3}\-
Now that you’ve seen how regexpert works, here are a few
challenges to test your understanding. Each one can be solved by
chaining builders and operators together.
Challenge 1 — Extract the time
Given the string
"Meeting at 09:30 and follow-up at 14:45", write a pipeline
that extracts both times in HH:MM format.
Hint: a time is two digits, a literal colon, then two more digits.
Challenge 2 — Pull out the serial numbers
Given "Units: SN-00123, SN-00456, SN-00789", write a
pipeline that extracts each serial number in SN-XXXXX
format.
Hint: think about what xp_build_literal() and
xp_op_repeat() can do together.
Challenge 3 — Find the hex color codes
Given "Theme colors: #ff5733, #c70039, and #900c3f", try
extracting the hex codes using xp_build_standard() first,
then try building the same pattern manually from scratch using
xp_build_literal() and xp_build_alnum().
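Hint: a hex color is a # followed by exactly six hexadecimal digits (0–9 and a–f). Matching six alphanumeric characters is a looser approximation, but it is enough for this string.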
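To see why the alphanumeric approach is looser than a true hex match, compare the two character classes in base R on a couple of made-up strings:

```r
candidates <- c("#1a2b3c", "#zzzzzz")

# Strict: "#" plus exactly six hexadecimal digits (either case)
strict <- grepl("^#[0-9A-Fa-f]{6}$", candidates, perl = TRUE)
strict  # TRUE FALSE

# Loose: "#" plus any six alphanumeric characters
loose <- grepl("^#[[:alnum:]]{6}$", candidates, perl = TRUE)
loose   # TRUE TRUE
```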
Once you’re comfortable with the basics, explore the function reference, starting with ?xp_build_standard and ?xp_op_repeat, to see everything regexpert can do.