Hi. I’m Sharon Machlis at IDG Communications, here with Episode 60 of Do More With R: Built-in Pipes in R version 4.1.
One of the most interesting new features in the latest version of R is: There’s now a built-in pipe operator! Let’s take a look!
This is the R pipe that most people know. It’s from the magrittr package. And, by the way, if you’re wondering about the package name, it comes from Belgian artist René Magritte and this painting of his (the text on it says “This is not a pipe”).
Here’s a somewhat trivial example using the magrittr pipe with the mtcars data set and a couple of dplyr functions. I’m taking the data set, filtering it for rows with more than 25 mpg, and arranging it by descending miles per gallon. Not everyone likes the pipe syntax. But especially when using tidyverse functions, the advantages are: not creating new copies of a data set and not repeating the data frame name, like here; and, I’d argue, code that’s easier to read than it would be without the pipe if you have a lot of steps.
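The filter-and-arrange example just described might look roughly like the sketch below. It assumes the dplyr package is installed (dplyr makes the magrittr pipe available), and the object names `piped`, `step1`, and `stepwise` are my own placeholders, not necessarily what appears on screen.

```r
# Assumes dplyr is installed; loading it also makes the magrittr pipe %>% available
library(dplyr)

# With the pipe: start from mtcars, keep rows over 25 mpg, sort by descending mpg
piped <- mtcars %>%
  filter(mpg > 25) %>%
  arrange(desc(mpg))

# Without the pipe: an intermediate copy, and the data frame names are repeated
step1 <- filter(mtcars, mpg > 25)
stepwise <- arrange(step1, desc(mpg))

# Both approaches should give the same result
identical(piped, stepwise)
```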
So let’s take a look at R 4.1 and its built-in pipe. If you’re not yet ready to install R 4.1 on your system, one easy way to see it locally is by running it inside a Docker container. You can see full general instructions on how to run R in Docker at the InfoWorld link here. Basically, download and install Docker, run it, and then run the code here in a terminal window (not the R console, a terminal). I’ll do that now. To get to R and RStudio, I’ll open port 8787 on localhost.
In that Docker code I just ran, I created a volume connecting my Docker container to files on my local system, so I can use those now. First let me bump up the font size: go to Tools > Global Options > Appearance, and change Editor font size to 14. OK, with no libraries loaded, my usual pipe won’t work. But the new built-in pipe, which is 2 characters (a pipe and a greater-than sign), does work.
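As a minimal sketch of the built-in pipe with nothing loaded, assuming R 4.1 or later: since dplyr isn’t available here, I’m using base R’s subset() as a stand-in for filter(), and `high_mpg` is just a placeholder name.

```r
# No library() calls needed: |> is part of base R starting with version 4.1
high_mpg <- mtcars |>
  subset(mpg > 25)

# Pipe the result into another base function to count the rows
high_mpg |> nrow()
```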
Why a new pipe? Making it available without an external dependency is appealing to some developers. Also, it looks like the built-in pipe is faster. Michael Barrowman did some tests. No pipe and the new built-in pipe were about the same speed. You can see an old implementation of the magrittr pipe (the 2nd row) is quite slow. The more recent one is a big improvement, but still not as fast in this test as the new base R pipe.
The magrittr and base R pipes work mostly the same, but there’s at least one important difference if you’re using a function that doesn’t have pipe-friendly syntax.
What do I mean by pipe-friendly? Pipes assume that the first argument in code right after a pipe is whatever the previous code returned and sent along. Here’s an example:
The string detect function in the stringr package uses the string to be searched as its first argument and the pattern to search for as the second argument. That’s pipe friendly, because the string to be searched is likely to come from a previous line of code.
Let’s say I want just rows where the car model name starts with the letter F. This is the syntax with str_detect and a pipe: filter where the model column starts with F.
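Here’s a rough sketch of that str_detect syntax, assuming the dplyr and stringr packages are installed. mtcars keeps its model names as row names, so I’m assuming a `model` column has to be created first; the names `cars` and `f_cars` are placeholders of mine.

```r
# Assumes dplyr and stringr are installed
library(dplyr)
library(stringr)

# mtcars stores model names as row names, so copy them into a regular column
cars <- mtcars
cars$model <- rownames(mtcars)

# str_detect(string, pattern): the string comes first, which is pipe-friendly
f_cars <- cars %>%
  filter(str_detect(model, "^F"))

# The same function with the string itself piped in as the first argument
is_f <- cars$model %>% str_detect("^F")
```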
But grepl in base R has the opposite syntax. Its first argument is the pattern and the second argument is the string to search, which is what a pipe would send along. That causes problems for a pipe . . . but the magrittr pipe has a solution. You can use the dot character to represent the value being piped in. You can see that in the last group of code.
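A small sketch of the dot placeholder, assuming the magrittr package is installed. To keep it short I’m piping a plain character vector of model names rather than a data frame, and `models` and `starts_with_f` are placeholder names.

```r
# Assumes magrittr is installed
library(magrittr)

models <- rownames(mtcars)

# grepl(pattern, x) wants the pattern first, so the dot marks where
# the piped-in value should go instead of the first position
starts_with_f <- models %>% grepl("^F", .)

# Use the logical vector to pull out the matching names
models[starts_with_f]
```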
The new pipe runs the stringr code just fine (run the 1st code group). However, this pipe doesn’t use a dot to represent what’s being piped; so the second group of code won’t work. And at least as of now, there is no special character to represent the value being piped.
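One way to see why the dot doesn’t help with the built-in pipe, assuming R 4.1 or later: the new pipe simply rewrites the code so the left side becomes the first argument of the call on the right. Quoting the expression instead of running it shows the rewritten call. `models` is only a symbol here, not a real object, and `native_call` is a placeholder name.

```r
# quote() captures the code without running it, so we can inspect
# what the built-in pipe turns the expression into
native_call <- quote(models |> grepl("^F", .))

# The piped value lands in the first position (grepl's pattern argument),
# and the dot is left behind as an ordinary, undefined name
print(native_call)

# The first argument of the rewritten call is the piped-in symbol
native_call[[2]]
```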
In this example it hardly matters, since you don’t need a pipe to do something this simple. But for more complex calculations where there isn’t an existing function with pipe-friendly syntax, can you still use the new pipe?
This usually isn’t the most efficient option, but you could create your own function using the original function and just switch the arguments around. I did that here with my own new version of grepl. Again, I know this is kind of a silly example, but try to imagine something more substantial.
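Something like this sketch is what I mean by switching the arguments around, assuming R 4.1 or later. The function name `mygrepl` is hypothetical, chosen here for illustration.

```r
# Hypothetical wrapper: same as grepl(), but the string to search comes first
mygrepl <- function(x, pattern, ...) {
  grepl(pattern, x, ...)
}

# Now the piped-in value lands in the right place with the built-in pipe
models <- rownames(mtcars)
starts_with_f <- models |> mygrepl("^F")

models[starts_with_f]
```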
Speaking of functions, there’s something else of possible interest in R 4.1: You can use the backslash character as a shorthand for “function”. I think this was done mostly for so-called anonymous functions – functions you create within code that don’t have their own names. But it works for all functions.
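A quick sketch of the backslash shorthand, assuming R 4.1 or later; `square`, `plus_one`, and `starts_with_f` are placeholder names. The last part shows one way an anonymous function can be used with the built-in pipe: wrap it in parentheses and call it.

```r
# \(x) is shorthand for function(x); it works for named functions too
square <- \(x) x^2
square(3)

# The more typical use: an anonymous function passed to another function
plus_one <- sapply(1:3, \(x) x + 1)

# With the built-in pipe, wrap the anonymous function in parentheses
# and add () so the right-hand side is a function call
starts_with_f <- rownames(mtcars) |> (\(x) grepl("^F", x))()
```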
Finally, one last point about the new built-in pipe: If you’re piping into a function without any arguments, parentheses are optional with the magrittr pipe. These first two both work. But with the base R pipe, parentheses are required.
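Here’s a sketch of the parentheses rule using only base R, assuming R 4.1 or later. Because leaving off the parentheses is a syntax error, I’m parsing that version from a string inside try() so the rest of the script can still run; `n_rows` and `no_parens` are placeholder names.

```r
# With parentheses, the built-in pipe works
n_rows <- mtcars |> nrow()

# Without parentheses, the code fails to parse; parsing it from a string
# inside try() captures the error instead of stopping the script
no_parens <- try(parse(text = "mtcars |> nrow"), silent = TRUE)

# TRUE means the version without parentheses was rejected
inherits(no_parens, "try-error")
```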
That’s it for this episode, thanks for watching! For more R tips, head to the Do More With R page at bit-dot-l-y slash do more with R, all lowercase except for the R.
You can also find the Do More With R playlist on YouTube’s IDG Tech Talk channel -- where you can subscribe so you never miss an episode. Hope to see you next time. Stay healthy and safe, everyone!