stringr: 10 Examples on How to Do Efficient String Processing in R


Working with strings in R can be surprisingly complex and challenging. Text that mixes letters, numbers, and language-specific characters adds further complexity. It's even worse if you're collecting string data through a website form. Good luck processing that.

Truth be told, R's built-in functions for working with strings leave a lot to be desired. That's where the stringr package comes in. It's part of the tidyverse, so it's likely you already have it installed. Don't worry if you don't, as we'll walk you through the installation steps.

This article will demonstrate 10 useful stringr functions that you should know in order to work efficiently with string data and avoid wasting time reinventing the wheel. Before diving into the examples, we will cover some basics about the R stringr package.


What Is the stringr Package and How to Install It

The stringr package provides a collection of functions for working with strings. It was developed by Hadley Wickham, Chief Scientist at Posit and a well-known figure in the R programming community.

This package is designed to be user-friendly, easy to learn, and easy to use, which makes it an essential tool for those who want to work with string data effectively.

The stringr package has a lot going for it. Its function naming is consistent, which isn't always a given in other packages: most stringr functions start with the prefix str_, followed by a short name describing what they do (str_detect(), str_replace(), and so on).

You'll find functions for pretty much any task you can imagine, from simple string operations to pattern matching, substitution, trimming, splitting, and much more. Under the hood, stringr is built on the stringi package: it wraps the most commonly used stringi functions in a simpler, tidyverse-style interface. Unless your use case is fairly complex, stringr lets you avoid working with stringi directly.
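To make the consistency point concrete, here's a minimal sketch comparing base R's sub() with stringr's str_replace(). It assumes stringr is already installed (installation is covered below), and the last call uses the native pipe, which requires R 4.1.0 or later.

```r
library(stringr)

x <- c("house", "car", "plant")

# Base R puts the pattern first and the data last
base_result <- sub("a", "A", x)

# stringr always takes the string(s) first, then the pattern
stringr_result <- str_replace(x, "a", "A")

identical(base_result, stringr_result)
# TRUE

# Because the data comes first, stringr calls chain naturally with a pipe
piped <- x |> str_replace("a", "A") |> str_to_upper()
# c("HOUSE", "CAR", "PLANT")
```

Putting the string first is what lets every stringr function slot into a pipe without extra arguments or anonymous functions.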

But before you can use the package, you’ll have to install it. The recommended method is to install the entire tidyverse, as stringr is a part of it.

You can do so by running the following command from the R console:

install.packages("tidyverse")

Alternatively, you can install only stringr by running the following command:

install.packages("stringr")

Either way, you now have stringr installed, which means we can go over the top 10 functions next.

stringr in Action – 10 Functions to Preprocess Textual Data

This section walks you through 10 stringr functions that come in handy when preprocessing textual data.

As for the data, we’ll declare a vector x that contains five strings:

library(stringr)

x <- c("house", "car", "plant", "telephone", "arm chair")
print(x)

Here’s what it looks like:

Image 1 – Vector of strings

We can now apply a whole collection of stringr functions to this vector. Let’s start with a simple one.

1. str_length()

This function is used when you want to return the number of characters in a given string. When applied to a vector, it returns a vector where each item represents the number of characters in a corresponding string.

The str_length() function takes a string or a vector of strings as its parameter and returns an integer vector with one element per input string (a single string gives a vector of length one).

Take a look at the following example – we’re using the function on the entire vector at once:

str_length(x)

And this is the output:

Image 2 – stringr str_length() function

The returned vector of integers matches the input vector of strings and informs you how long each string is.
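Because the result is an ordinary integer vector, you can use it directly for filtering. The sketch below redefines the article's x so it runs on its own; the threshold of five characters is an arbitrary choice for illustration.

```r
library(stringr)

x <- c("house", "car", "plant", "telephone", "arm chair")

n_chars <- str_length(x)
# c(5L, 3L, 5L, 9L, 9L): the space in "arm chair" counts as a character

# Keep only the strings longer than five characters
long_words <- x[str_length(x) > 5]
# c("telephone", "arm chair")

# Missing values stay missing instead of being counted
with_na <- str_length(c("car", NA))
# c(3L, NA)
```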

2. str_sub()

The str_sub() function returns a substring of a given string. It takes three parameters:

  1. The string (or a vector of strings)
  2. The starting index of the substring
  3. The ending index of the substring (inclusive)

For example, if you pass in 2 and 5 for the last two parameters, you get characters 2 through 5 of each string, with both ends included.

This function is much easier to understand in practice, so let’s apply it to our vector of strings:

str_sub(x, start = 2, end = 5)

And here is the result:

Image 3 – stringr str_sub() function

It’s useful when you want to limit the number of characters or trim the start/end of a string.
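Two str_sub() features help with trimming the end of a string: negative positions count back from the last character, and str_sub() can also be used on the left-hand side of an assignment. A short sketch, reusing the article's vector x:

```r
library(stringr)

x <- c("house", "car", "plant", "telephone", "arm chair")

# Positions are inclusive: characters 2 through 5
middle <- str_sub(x, start = 2, end = 5)
# c("ouse", "ar", "lant", "elep", "rm c")

# Negative positions count from the end: here, the last three characters
last_three <- str_sub(x, start = -3)
# c("use", "car", "ant", "one", "air")

# str_sub() also works as a replacement function
word <- "telephone"
str_sub(word, 1, 1) <- "T"
# word is now "Telephone"
```

Note that "car" only has three characters, so asking for characters 2 through 5 simply returns what's available instead of raising an error.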

3. str_detect()

This function returns a logical (boolean) vector with one element per input string. Each value depends on whether the given pattern exists in the corresponding string.

The str_detect() function takes two parameters – your string (or vector of strings) and a pattern to search for. If the pattern is found, the function returns TRUE; otherwise, it returns FALSE.

Let's take a look at it in code. We'll search for the pattern "ar" in our vector of strings:

str_detect(x, "ar")

Here’s the resulting vector of booleans:

Image 4 – stringr str_detect() function

It’s a boolean vector, which means you can use it to select only those input strings that satisfy the condition:

x[str_detect(x, "ar")]

We now get a vector of strings back:

Image 5 – stringr str_detect() function (2)

There’s a more convenient function for doing so, and we’ll explore it later, but it doesn’t hurt to be a bit creative.
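One detail worth knowing: str_detect() treats the pattern as a regular expression by default. The sketch below shows an anchored pattern, the negate argument (available in recent stringr releases), and fixed() for matching special characters literally. The files vector is a hypothetical example.

```r
library(stringr)

x <- c("house", "car", "plant", "telephone", "arm chair")

# "^c" is a regular expression meaning "starts with c"
starts_with_c <- str_detect(x, "^c")
# c(FALSE, TRUE, FALSE, FALSE, FALSE)

# negate = TRUE flips the result
no_ar <- str_detect(x, "ar", negate = TRUE)
# c(TRUE, FALSE, TRUE, TRUE, FALSE)

# In a regex, "." matches any character; fixed() matches a literal dot
files <- c("report.csv", "readme")
any_char <- str_detect(files, ".")
# c(TRUE, TRUE)
literal_dot <- str_detect(files, fixed("."))
# c(TRUE, FALSE)
```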

4. str_replace()

The str_replace() function is useful when you want to replace the first occurrence of a pattern in a string with a specified replacement string. It takes three parameters:

  1. The string (or a vector of strings) to search
  2. The pattern to search
  3. The replacement string

The function returns a modified string in which the pattern to search is replaced with the replacement string, but only at the first occurrence.

Let's give it a shot and replace the letter e with the string ***:

str_replace(x, "e", "***")

Here’s what it returns:

Image 6 – stringr str_replace() function

The function does what was advertised, which is replacing only the first occurrence of the search pattern. Just take a look at the telephone string and you’ll see that only the first e was replaced.
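Since the pattern is a regular expression, str_replace() can do more than swap fixed letters. This sketch replaces the first vowel of each string and uses capture groups, where \\1 and \\2 in the replacement refer to the text matched by each pair of parentheses:

```r
library(stringr)

x <- c("house", "car", "plant", "telephone", "arm chair")

# A character class matches any vowel; only the first one is replaced
first_vowel <- str_replace(x, "[aeiou]", "_")
# c("h_use", "c_r", "pl_nt", "t_lephone", "_rm chair")

# Capture two words and swap their order
swapped <- str_replace("arm chair", "(\\w+) (\\w+)", "\\2 \\1")
# "chair arm"
```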

If you want to replace all occurrences, use the function covered next.

5. str_replace_all()

This function is almost identical to the previous one, but it replaces all occurrences of the search pattern with the provided replacement string. It takes the same parameters, so there's no need to go over them again.

This time, we'll replace every letter e with the string ***. Here's the code:

str_replace_all(x, "e", "***")

And these are the results:

Image 7 – stringr str_replace_all() function

Take a look at the telephone string and you'll immediately see that all e's were successfully replaced.

In practice, you'll use str_replace_all() much more frequently than str_replace().
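str_replace_all() also accepts a named vector, where each name is a pattern and each value its replacement, so several substitutions fit in one call. Replacing with an empty string deletes matches, and str_remove_all() is a shortcut for exactly that. A small sketch:

```r
library(stringr)

x <- c("house", "car", "plant", "telephone", "arm chair")

# Several pattern = replacement pairs in one call
leet <- str_replace_all("telephone", c("e" = "3", "o" = "0"))
# "t3l3ph0n3"

# Delete every vowel; the same as str_replace_all(x, "[aeiou]", "")
no_vowels <- str_remove_all(x, "[aeiou]")
# c("hs", "cr", "plnt", "tlphn", "rm chr")
```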

6. str_count()

The str_count() function counts the number of times a search pattern appears in a string. It takes two parameters: the string (or a vector of strings) to search, and the search pattern, which can also be a regular expression.

This function will return an integer (or a vector of integers) representing the number of times the search pattern was found.

Let’s declare the letter a as a search pattern and perform the search on our vector of words:

str_count(x, "a")

Here’s what the function returns:

Image 8 – stringr str_count() function

That's the number of times the letter a appears in each of the input strings. Easy!
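Because the pattern can be a regular expression, str_count() works for more than single letters. This sketch counts vowels with a character class, then counts words by matching runs of non-space characters (a simple approximation that treats anything between spaces as a word):

```r
library(stringr)

x <- c("house", "car", "plant", "telephone", "arm chair")

# Count any of the five vowels
vowels <- str_count(x, "[aeiou]")
# c(3L, 1L, 1L, 4L, 3L)

# "\\S+" matches a run of non-whitespace characters
words <- str_count(c("office chair", "brown laptop case"), "\\S+")
# c(2L, 3L)
```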

7. str_subset()

Remember when we said there's an easier way to get the strings that satisfy a condition than indexing with a boolean vector? Well, this is the function for the job.

The str_subset() function returns a subset of a vector of strings that match a certain search pattern. It takes in two parameters: the vector of strings to search and the search pattern itself.

Let's take a look at this function in code and return all words that contain the letter a:

str_subset(x, "a")

We get a vector of three strings back:

Image 9 – stringr str_subset() function

Neat! No need to reinvent the wheel.
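To confirm that str_subset() matches the logical-indexing approach from the str_detect() section, and to show two related options, here's a short sketch. negate is available in recent stringr releases, and str_which() returns positions rather than the strings themselves.

```r
library(stringr)

x <- c("house", "car", "plant", "telephone", "arm chair")

# Same result as indexing with str_detect()
same <- identical(str_subset(x, "a"), x[str_detect(x, "a")])
# TRUE

# Keep the strings that do NOT contain an "a"
without_a <- str_subset(x, "a", negate = TRUE)
# c("house", "telephone")

# Positions of the matching strings
positions <- str_which(x, "a")
# c(2L, 3L, 5L)
```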

8. str_trim()

The str_trim() function is useful when you have messy strings with leading and trailing whitespace. It removes that whitespace from a single string or from every string in a vector, while leaving the whitespace between words untouched.

Since our vector x doesn’t contain any elements with leading or trailing whitespaces, we’ll declare a new one that does:

y <- c("  hello ", "from  ", " R   ")
print(y)

Here’s what it looks like:

Image 10 – String vector with leading and trailing whitespaces

From here, just pass this vector into the str_trim() function and you’ll be good to go:

str_trim(y)

This is the result:

Image 11 – stringr str_trim() function

This function is particularly useful when processing form data, as it removes any leading or trailing whitespace entered by mistake.
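Two related options are handy for form data: the side argument trims only one end, and str_squish() goes a step further by also collapsing repeated spaces inside the string. A quick sketch, where messy is a made-up input:

```r
library(stringr)

y <- c("  hello ", "from  ", " R   ")

# Trim only the left side
left_trimmed <- str_trim(y, side = "left")
# c("hello ", "from  ", "R   ")

messy <- "  brown   laptop  case "

# str_trim() keeps the internal runs of spaces
trimmed <- str_trim(messy)
# "brown   laptop  case"

# str_squish() trims both ends and collapses internal whitespace
squished <- str_squish(messy)
# "brown laptop case"
```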

9. str_split()

This function splits a string or a vector of strings on a delimiter you specify. It takes two required parameters, the string(s) and the delimiter, and returns a list with one character vector of substrings per input string, even when you pass in just a single string.

Now, there’s only one string with two words in our x vector, so we’ll declare a new one where strings are a bit wordier:

z <- c("office chair", "front desk", "brown laptop case")
print(z)

This is what it looks like:

Image 12 – A vector of lengthy strings

We can now call str_split() on z and pass in a single space as the delimiter:

str_split(z, " ")

The function returns a list in which each child element is a vector of strings:

Image 13 – stringr str_split() function
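A few str_split() options make the list output easier to work with. The sketch below extracts the vector from a single-string result with [[1]], limits the number of pieces with n, and uses simplify = TRUE to get a character matrix with one row per input string (shorter rows are padded to fill the matrix):

```r
library(stringr)

z <- c("office chair", "front desk", "brown laptop case")

# A single string still comes back as a list of length one
pieces <- str_split("brown laptop case", " ")[[1]]
# c("brown", "laptop", "case")

# n = 2 stops after the first split; the rest stays in the last piece
two_pieces <- str_split("brown laptop case", " ", n = 2)[[1]]
# c("brown", "laptop case")

# simplify = TRUE returns a character matrix instead of a list
m <- str_split(z, " ", simplify = TRUE)
first_words <- m[, 1]
# c("office", "front", "brown")
```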

Let’s take a look at another function before wrapping up.

10. str_to_xyz()

There's actually no function named str_to_xyz(), but there's a set of functions for changing the case of a string or a vector of strings. You can use one of the following functions:

  • str_to_title() – To capitalize the first letter of each word in a string
  • str_to_sentence() – To capitalize the first letter of a string
  • str_to_upper() – To uppercase the entire string
  • str_to_lower() – To lowercase the entire string

We’ll show you two of these in action. First, let’s use str_to_title() on the entire vector x:

str_to_title(x)

Here are the results:

Image 14 – stringr str_to_title() function

Each word now has the first letter capitalized. Up next, let’s take a look at str_to_upper(). Here’s the code:

str_to_upper(x)

And these are the results:

Image 15 – stringr str_to_upper() function

All letters of each vector item are now uppercased.
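The remaining two case functions follow the same pattern. The sketch also shows the locale argument these functions accept: case rules depend on the language, and in Turkish a lowercase "i" becomes a dotted capital "İ".

```r
library(stringr)

x <- c("house", "car", "plant", "telephone", "arm chair")

# Sentence case: only the first letter of each string is capitalized
sentence <- str_to_sentence(x)
# c("House", "Car", "Plant", "Telephone", "Arm chair")

lower <- str_to_lower("ARM CHAIR")
# "arm chair"

# Locale-aware case conversion
english_i <- str_to_upper("i")
# "I"
turkish_i <- str_to_upper("i", locale = "tr")
# "\u0130", the dotted capital I
```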

And these are the top 10 stringr functions you must know. Let’s make a brief recap next.


Summing Up Strings in R with stringr

To conclude, the R stringr package packs a powerful set of functions for working with text data. We’ve explored 10 of them in this article, and we hope they’ll help you in your job.

The main benefit of using the stringr package is its simplicity. The functions are intuitive and easy to use, even for newcomers to R. In addition, the package offers consistent syntax across functions, making it easy to learn and apply these tools to different text analysis projects.
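As a small illustration of how these functions compose, here's a sketch that cleans up names as they might arrive from a web form. The raw input is hypothetical, and the pipe syntax requires R 4.1.0 or later.

```r
library(stringr)

# Hypothetical messy input from a web form
raw <- c("  john SMITH ", "jane   doe", " ALICE  cooper")

clean <- raw |>
  str_squish() |>   # trim both ends and collapse repeated spaces
  str_to_title()    # capitalize each word, lowercase the rest
# c("John Smith", "Jane Doe", "Alice Cooper")

# Split into first and last name, then keep the last name
last_names <- str_split(clean, " ", simplify = TRUE)[, 2]
# c("Smith", "Doe", "Cooper")
```

This only works because every name here has exactly two parts; real form data would need a more careful split.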

What’s your favorite stringr/stringi function? Or a set of functions? Make sure to share in the comment section below, or reach out on Twitter – @appsilon. We’d love to hear your thoughts.
