The Problem with Regular Expressions

If you’ve ever needed to extract a phone number or pull out specific text in R, you’ve probably run into something like this:

stringr::str_extract_all("Call us at 800-555-1234", "\\d+")

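That "\\d+" is a regular expression. It works, but what does it mean? Here it means "one or more digits", and the backslash is doubled because R string literals need their own escaping. For anyone who doesn’t have regex memorized, this is a wall: you have to stop what you’re doing, look up the syntax, and hope you get the escaping right.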
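To see why the escaping is easy to get wrong, it helps to look at what R actually stores. The sketch below uses only base R (raw strings need R 4.0 or later), and the variable names are purely illustrative.

```r
# The pattern as usually typed in R: the backslash must be doubled
# inside an ordinary string literal.
pattern <- "\\d+"

# R stores a single backslash, so the regex engine sees \d+ (3 characters).
nchar(pattern)
writeLines(pattern)

# A raw string (R >= 4.0) avoids the doubling and yields the same pattern.
raw_pattern <- r"(\d+)"
identical(pattern, raw_pattern)

# Base R extraction with the same pattern.
text <- "Call us at 800-555-1234"
digits <- regmatches(text, gregexpr(pattern, text))[[1]]
digits
```

The doubled backslash belongs to R's string syntax, not to the regex itself, which is one reason hand-written patterns are hard to read.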

This is the problem regexpert solves.

The Build → Operate → Act Workflow

regexpert breaks pattern matching into a plain-English pipeline:

  1. Build your pattern using descriptive xp_build_* functions.
  2. Operate on the pattern to specify repetitions or boundaries.
  3. Act on it, for example with xp_action_find() to return matches or xp_action_view() to print the pattern.

These steps are connected with a pipe (magrittr’s %>% or the native |>), making your code read like a sentence:

xp_build_digits() %>%
  xp_op_repeat(1, Inf) %>%
  xp_action_find("Call us at 800-555-1234")
#> [1] "800"  "555"  "1234"

No regex required. You described what you wanted, and regexpert handled the escaping and logic.
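To make the three stages concrete, here is a toy version of the same idea in base R. The functions build_digits(), op_repeat(), and action_find() are hypothetical stand-ins written for this illustration. They are not regexpert's implementation and only mimic the shape of the pipeline. The native pipe requires R 4.1 or later.

```r
# Hypothetical stand-ins, NOT regexpert's actual code.

# Build: return a pattern fragment as a plain string.
build_digits <- function() "\\d"

# Operate: wrap the pattern in a non-capturing group and add a quantifier.
# (This toy version repeats the whole pattern passed to it.)
op_repeat <- function(pattern, min, max = min) {
  quantifier <- if (is.infinite(max)) {
    sprintf("{%d,}", min)          # no upper bound
  } else if (max == min) {
    sprintf("{%d}", min)           # exact count
  } else {
    sprintf("{%d,%d}", min, max)   # bounded range
  }
  paste0("(?:", pattern, ")", quantifier)
}

# Act: run the finished pattern against some text.
action_find <- function(pattern, text) {
  regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
}

matches <- build_digits() |>
  op_repeat(1, Inf) |>
  action_find("Call us at 800-555-1234")
matches
```

Each stage only passes a string to the next one, which is what lets the steps be piped together.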

The Builders

xp_build_digits()

Matches numeric characters (0–9). In regexpert, a builder matches a single character by default. To match a run of digits, follow the builder with xp_op_repeat().

xp_build_digits() %>%
  xp_op_repeat(1, Inf) %>%
  xp_action_find("Order #88412 placed on 2024-01-15")
#> [1] "88412" "2024"  "01"    "15"
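The difference between a single-character match and a repeated one is easy to check with plain regex. This base R sketch assumes the digit builder corresponds to \d, which is what the xp_action_view() output elsewhere in this guide suggests.

```r
text <- "Order #88412 placed on 2024-01-15"

# Without a repeat, every digit is its own match.
single <- regmatches(text, gregexpr("\\d", text))[[1]]

# With "one or more", adjacent digits are grouped into runs.
runs <- regmatches(text, gregexpr("\\d+", text))[[1]]

length(single)
runs
```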

xp_build_letters()

Matches alphabetic characters (a–z, A–Z). Use the case argument to restrict the match; this guide uses case = "both" and case = "upper".

xp_build_letters(case = "both") %>%
  xp_op_repeat(1, Inf) %>%
  xp_action_find("Order #88412 placed")
#> [1] "Order"  "placed"
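Restricting the case changes what counts as a run of letters. The base R sketch below uses ordinary character classes to show the effect. The mapping from the case argument to these classes is an assumption, and the lowercase class is included only for comparison.

```r
text <- "Order #88412 placed"

# Small helper for this example; perl = TRUE keeps ranges locale-independent.
find_all <- function(pattern, text) {
  regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
}

both  <- find_all("[A-Za-z]+", text)  # any letters
upper <- find_all("[A-Z]+", text)     # uppercase only
lower <- find_all("[a-z]+", text)     # lowercase only

both
upper
lower
```

Note how an uppercase-only class splits "Order" after its first letter.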

xp_build_whitespace()

Matches spaces, tabs, and newlines.

xp_build_whitespace() %>%
  xp_op_repeat(1, Inf) %>%
  xp_action_find("hello   world")
#> [1] "   "
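As a quick check that tabs and newlines count as whitespace too, here is a base R sketch using \s, which is assumed to be the class behind the whitespace builder.

```r
find_all <- function(pattern, text) {
  regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
}

# Three spaces form one run.
gaps <- find_all("\\s+", "hello   world")

# A space, a tab, and a newline also form a single run.
mixed_gaps <- find_all("\\s+", "a \t\nb")

nchar(gaps)
nchar(mixed_gaps)
```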

xp_build_literal()

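Sometimes you need to match specific symbols such as $ or a period (.). These characters have special meanings in regex ($ anchors the end of the string and . matches any character), so they normally have to be escaped by hand. xp_build_literal() escapes them for you.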

xp_build_literal("$") %>%
  xp_build_digits() %>%
  xp_op_repeat(1, Inf) %>%
  xp_action_find("The price is $100")
#> [1] "$100"
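To see what automatic escaping saves you from, compare an unescaped and an escaped dollar sign in base R. The escape_literal() helper below is a hypothetical illustration of the general technique, not the code regexpert uses.

```r
find_all <- function(pattern, text) {
  regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
}

text <- "The price is $100"

# Unescaped: $ means "end of string", so digits can never follow it.
unescaped <- find_all("$\\d+", text)

# Escaped: \$ is a literal dollar sign.
escaped <- find_all("\\$\\d+", text)

# Hypothetical helper: put a backslash in front of every regex metacharacter.
escape_literal <- function(x) {
  gsub("([.\\\\|()\\[\\]{}^$*+?])", "\\\\\\1", x, perl = TRUE)
}

built <- find_all(paste0(escape_literal("$"), "\\d+"), text)

unescaped
escaped
built
```

The unescaped pattern fails silently by returning no matches, which is what makes this kind of mistake hard to spot.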

Chaining Builders Together

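The real power of regexpert comes from chaining multiple builders into a single pipeline. Each builder adds to the pattern left to right, letting you describe complex matches in plain-English steps. An operator such as xp_op_repeat() applies only to the builder immediately before it, so a literal earlier in the chain is still matched exactly once.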

For example, say you want to extract prices from a messy string — a $ sign followed by one or more digits, a dot, then exactly two decimal digits:

xp_build_literal("$") %>%
  xp_build_digits() %>%
  xp_op_repeat(1, Inf) %>%
  xp_build_literal(".") %>%
  xp_build_digits() %>%
  xp_op_repeat(2, 2) %>%
  xp_action_find("Items: $4.99, $12.50, and $120.00")
#> [1] "$4.99"   "$12.50"  "$120.00"
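For comparison, the same price pattern can be assembled piece by piece in base R. The fragments below are an assumed translation of each pipeline step, kept in the same left-to-right order.

```r
# One fragment per pipeline step.
parts <- c(
  "\\$",    # literal dollar sign
  "\\d+",   # one or more digits
  "\\.",    # literal dot
  "\\d{2}"  # exactly two digits
)
pattern <- paste0(parts, collapse = "")

text <- "Items: $4.99, $12.50, and $120.00"
prices <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]

# Strip the dollar sign to get numeric amounts.
amounts <- as.numeric(sub("\\$", "", prices))

pattern
prices
amounts
```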

Or say you want to pull out hyphenated product codes: three uppercase letters, a dash, then four digits:

xp_build_letters(case = "upper") %>%
  xp_op_repeat(3, 3) %>%
  xp_build_literal("-") %>%
  xp_build_digits() %>%
  xp_op_repeat(4, 4) %>%
  xp_action_find("Products: ABC-1234, XYZ-5678, and DEF-9999")
#> [1] "ABC-1234" "XYZ-5678" "DEF-9999"

You can always use xp_action_view() to peek at the regex being built at any point in the pipeline — useful for debugging or learning:

xp_build_letters(case = "upper") %>%
  xp_op_repeat(3, 3) %>%
  xp_build_literal("-") %>%
  xp_build_digits() %>%
  xp_op_repeat(4, 4) %>%
  xp_action_view()
#> Current Regex Pattern:
#> [A-Z]{3}\-\d{4}
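The printed pattern is ordinary regex, so you can sanity-check it outside regexpert. This base R sketch pastes the pattern shown above into a raw string (R 4.0 or later) and runs it with the PCRE engine.

```r
# The pattern exactly as printed; a raw string avoids doubling backslashes.
viewed <- r"([A-Z]{3}\-\d{4})"

text <- "Products: ABC-1234, XYZ-5678, and DEF-9999"
codes <- regmatches(text, gregexpr(viewed, text, perl = TRUE))[[1]]
codes
```

In an ordinary string literal the same pattern would be written "[A-Z]{3}\\-\\d{4}".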

Standard Patterns

For common real-world data types, regexpert includes a library of pre-built patterns via xp_build_standard(). No need to remember complex regex — just name what you want.

# Extract email addresses
xp_build_standard("email") %>%
  xp_action_find("Contact us at support@regexpert.com or sales@example.org")
#> [1] "support@regexpert.com" "sales@example.org"

# Extract phone numbers
xp_build_standard("phone") %>%
  xp_action_find("Call us at (208) 555-1234 or 208.555.5678")
#> [1] "(208) 555-1234" "208.555.5678"

# Extract ZIP codes
xp_build_standard("zip") %>%
  xp_action_find("My zip code is 83440")
#> [1] "83440"

# Extract ISO dates
xp_build_standard("date_iso") %>%
  xp_action_find("The event is on 2026-03-17")
#> [1] "2026-03-17"

The available types are: email, phone, zip, date_iso, date_us, credit_card, ipv4, hex_color, time_24, ssn, and mac_address.

Need a pattern that isn’t in the library? Use xp_register() to add your own custom patterns to the session library and access them the same way via xp_build_standard().
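Conceptually, a pattern library is a lookup table from a name to a regex, and registering a pattern adds one more entry. The base R sketch below illustrates that idea only. The names pattern_library and find_standard() are hypothetical, and the regexes are deliberately simplified rather than the ones regexpert ships.

```r
# Hypothetical, simplified library: name -> regex.
pattern_library <- list(
  zip      = "\\b\\d{5}\\b",
  date_iso = "\\b\\d{4}-\\d{2}-\\d{2}\\b"
)

# Look up a pattern by name and return all matches in the text.
find_standard <- function(type, text, lib = pattern_library) {
  pattern <- lib[[type]]
  if (is.null(pattern)) stop("Unknown pattern type: ", type)
  regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
}

zip_found  <- find_standard("zip", "My zip code is 83440")
date_found <- find_standard("date_iso", "The event is on 2026-03-17")

# "Registering" a custom pattern is just adding an entry.
pattern_library[["hashtag"]] <- "#\\w+"
tags_found <- find_standard("hashtag", "Loving #rstats and #regex")

zip_found
date_found
tags_found
```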

Inspecting Your Work

xp_action_view() also works on an unfinished pattern: end any partial pipeline with it to print the regex string generated so far.

xp_build_digits() %>%
  xp_op_repeat(3) %>%
  xp_build_literal("-") %>%
  xp_action_view()
#> Current Regex Pattern:
#> (?:\d){3}\-
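The (?:...) wrapper in that output is a non-capturing group. It groups the digit class so the repeat applies to it, but it does not change what is matched. A quick base R check, using the PCRE engine, compares it with the shorter spelling.

```r
find_all <- function(pattern, text) {
  regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
}

text <- "800-555-1234"

# The pattern as printed by xp_action_view(), with backslashes doubled for R.
grouped <- find_all("(?:\\d){3}\\-", text)

# The same pattern without the group or the escaped dash.
plain <- find_all("\\d{3}-", text)

grouped
identical(grouped, plain)
```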

Try It Yourself

Now that you’ve seen how regexpert works, here are a few challenges to test your understanding. Each one can be solved by chaining builders and operators together.

Challenge 1 — Extract the time

Given the string "Meeting at 09:30 and follow-up at 14:45", write a pipeline that extracts both times in HH:MM format.

Hint: a time is two digits, a literal colon, then two more digits.

Challenge 2 — Pull out the serial numbers

Given "Units: SN-00123, SN-00456, SN-00789", write a pipeline that extracts each serial number in SN-XXXXX format.

Hint: every serial number starts with the fixed text SN- and ends with exactly five digits. Think about what xp_build_literal() and xp_op_repeat() can do together.

Challenge 3 — Find the hex color codes

Given "Theme colors: #ff5733, #c70039, and #900c3f", try extracting the hex codes using xp_build_standard() first, then try building the same pattern manually from scratch using xp_build_literal() and xp_build_alnum().

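Hint: a hex color is a # followed by exactly six hexadecimal digits (0–9, a–f). Matching six alphanumeric characters is enough for this string, though it would also accept letters outside a–f.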


Once you’re comfortable with the basics, explore the full function reference at ?xp_build_standard and ?xp_op_repeat to see everything regexpert can do.