R tip: Iterate with purrr's map_df function

InfoWorld | Jul 12, 2018

In this sixth episode of Do More with R, learn how to apply a function to a vector of values and return a data frame

Hi, I’m Sharon Machlis, Director of Editorial Data & Analytics at IDG Communications. I’m here with episode 6 of Do More With R: Iterate with purrr’s map_df() function.
Applying a function to a lot of different values is one of the most common tasks in programming. In most languages, you’d typically use a for loop for that. You can code for loops in R. But most R programmers use some kind of iterating function instead.
Base R has a family of apply functions for that: apply, lapply, sapply, vapply. They work, but can be confusing to use.
The current tidyverse way of iterating is with the purrr package’s map functions. What’s handy about map is that it’s really intuitive to specify what kind of result you want. map() produces a list. map_df() gives you a data frame. map_int() creates a vector of integers. And so on. Today, I’d like to show you map_df().
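To make the naming pattern concrete, here is a minimal sketch that runs the same doubling step through three of the map variants. The nums vector and the column names are invented for illustration, and map_df() assumes dplyr is installed.

```r
library(purrr)

nums <- 1:3

# map() always returns a list
as_list <- map(nums, ~ .x * 2L)

# map_int() returns an integer vector
as_int <- map_int(nums, ~ .x * 2L)

# map_df() builds one data frame per element and row-binds them
# (it relies on dplyr being installed)
as_df <- map_df(nums, ~ data.frame(n = .x, doubled = .x * 2L))
```

The suffix after the underscore is the only thing that changes between the three calls; the data and the operation stay the same.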
I’ve got three CSV files, each with information from a New York City airport about delays. I’d like to create one data frame from importing all three files.
The format for any of the map functions is map(data, function, additional arguments), where the additional arguments are anything else you want to pass to the function.
I’ll load the purrr package, and dplyr because I always load dplyr; then get the names of all the files in my data directory with list.files(). All the file names are now in the myfiles variable.
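As a small sketch of that call format, the example below passes na.rm = TRUE through map() to mean(). The readings list is made-up data, not something from the episode.

```r
library(purrr)

# Hypothetical data: two numeric vectors that each contain a missing value
readings <- list(c(1, NA, 3), c(4, 5, NA))

# map(data, function, additional argument):
# na.rm = TRUE is handed on to mean() for every element of readings
averages <- map(readings, mean, na.rm = TRUE)
```

Each element of averages is the mean of the corresponding vector with the NA dropped.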
Here’s the map_df() call: map_df(myfiles, read.csv, stringsAsFactors = FALSE). myfiles is the data (the vector of file names), read.csv is the function, and stringsAsFactors = FALSE is the additional argument I’m passing.
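The import step just described can be sketched end to end as follows. Since the original CSV files aren't available here, the sketch first writes three tiny stand-in files to a temporary folder; the folder name, file names, columns, values, and the delays variable are all assumptions made for the example.

```r
library(purrr)
library(dplyr)  # map_df() needs dplyr installed; loading it mirrors the episode

# Stand-in for the data directory: three tiny CSV files in a fresh temp folder.
# File names, columns, and values are invented for this example.
datadir <- tempfile("airport_delays_")
dir.create(datadir)
for (airport in c("EWR", "JFK", "LGA")) {
  write.csv(
    data.frame(airport = airport, month = 1:2, avg_delay = c(10.5, 12.25)),
    file.path(datadir, paste0(airport, ".csv")),
    row.names = FALSE
  )
}

# Collect the full path of every CSV file in the directory
myfiles <- list.files(datadir, pattern = "\\.csv$", full.names = TRUE)

# data = myfiles, function = read.csv, extra argument = stringsAsFactors
delays <- map_df(myfiles, read.csv, stringsAsFactors = FALSE)
```

Recent purrr releases mark map_df() as superseded in favor of map() followed by list_rbind(), though map_df() still works as shown.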
That’s it, I’ve got all the data from the files in a single data frame.
This is a pretty simple example. For more complex operations, the map functions also accept a formula syntax. Here’s what the same thing looks like in that syntax: map_df(myfiles, ~ read.csv(., stringsAsFactors = FALSE)).
Subtle difference. Here you’ve got map_df, the data, a comma, then a tilde – that says “what follows is a formula”. Then you can write code to act on each item in your data, using a dot to represent the item.
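A sketch of the formula form next to the function-name form, again using invented stand-in CSV files in a temporary folder. The last call is an assumed example of a "more complex operation": it tags each row with the file it came from, which would be awkward without the formula.

```r
library(purrr)

# Invented stand-in data: three small CSV files in a fresh temp folder
datadir <- tempfile("airport_delays_")
dir.create(datadir)
for (airport in c("EWR", "JFK", "LGA")) {
  write.csv(data.frame(airport = airport, avg_delay = c(10.5, 12.25)),
            file.path(datadir, paste0(airport, ".csv")), row.names = FALSE)
}
myfiles <- list.files(datadir, pattern = "\\.csv$", full.names = TRUE)

# Function-name form
delays_fn <- map_df(myfiles, read.csv, stringsAsFactors = FALSE)

# Formula form: the tilde starts the formula, the dot is the current file name
delays_formula <- map_df(myfiles, ~ read.csv(., stringsAsFactors = FALSE))

# A formula can hold several steps, e.g. tagging rows with their source file
delays_tagged <- map_df(myfiles, ~ {
  df <- read.csv(., stringsAsFactors = FALSE)
  df$source_file <- basename(.)
  df
})
```

The first two calls should produce the same data frame; the third adds a source_file column.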
There’s a load more you can do with purrr. map2 functions iterate over two same-sized lists or vectors at a time. walk functions work like map, but you call them for their side effects rather than for a returned value, for things like saving data to disk or printing results. If you want to learn more, there’s a recording of Charlotte Wickham’s in-depth tutorial at rstudio.com; use the search there and look for purrr tutorial. The purrr Web site is at purrr.tidyverse.org. Remember, purrr as in the purrr package has THREE Rs.
That’s it for this episode, thanks for watching! For more R tips, head to the More With R video page at bit.ly/morewithR. That’s https://bit.ly/morewithR, all lowercase except for the R. So long, and hope to see you next episode!
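For a rough idea of how map2 and walk behave, here is a small sketch. The vectors, the output folder, and the report text are all invented for illustration.

```r
library(purrr)

# map2_dbl() steps through two same-length vectors in parallel:
# .x is the current element of the first, .y of the second
minutes_late <- c(10, 25, 40)
flights <- c(2, 5, 8)
avg_per_flight <- map2_dbl(minutes_late, flights, ~ .x / .y)

# walk() is called for its side effect (here, writing one text file per
# airport code) rather than for a result
outdir <- tempfile("walk_demo_")
dir.create(outdir)
walk(
  c("EWR", "JFK", "LGA"),
  ~ writeLines(paste("report for", .x), file.path(outdir, paste0(.x, ".txt")))
)
```

After this runs, outdir holds three small text files, and avg_per_flight holds one number per pair of inputs.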