With dozens of RStudio Conference videos now available online, it’s hard to know where to begin. I hope this look at some of my favorites will help get you started!
Error messages in R
I could probably watch Jenny Bryan teach data entry (oh, that looks interesting, maybe I want to try typing in a thousand rows . . . . ). But in this keynote, she tackles a much more compelling topic: dealing with errors in R. There’s a lot of useful advice here, which she shares in an engaging, relatable way. One takeaway: Try the equivalent of a reboot—restart your R session! (I’ve been doing that much more often since returning from the conference.) Video: Object of type ‘closure’ is not subsettable.
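For anyone who hasn’t run into the error in the talk’s title, here is a minimal sketch of one common way to trigger it. This is my own illustration, not an example taken from the keynote. In a fresh session, the name df refers to base R’s F-distribution density function, so treating it as a data frame before you’ve created one fails.

```r
# In a fresh R session, `df` is stats::df (a function), not a data frame.
# Subsetting a function is what raises the "closure" error.
msg <- tryCatch(
  df$x,
  error = function(e) conditionMessage(e)
)
print(msg)

# Checking what the name actually refers to usually explains the problem.
is.function(df)
```

Restarting the session clears out leftover objects, which makes it easier to tell whether a name points at your data or at a built-in function.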
New features in RStudio
Wondering what features are coming to the next version of RStudio desktop? RStudio’s Jonathan McPherson outlined several, including modern-era spell check (at last), better cloud usability on iOS, and more screen-reader accessibility for visually impaired users—something that also improves keyboard navigation for all users. Video: RStudio 1.3 Sneak Preview.
State of the tidyverse
RStudio Chief Scientist Hadley Wickham reviewed last year’s highlights from the tidyverse and this year’s plans for further development, but he was also fairly forthright in discussing some recent missteps.
In particular, he acknowledged that the initial rollout of “tidy evaluation” launched with a somewhat difficult-to-master syntax and an unreasonable expectation that users would want to learn the detailed computing theory behind it. It turned out that many users didn’t care about the mechanics behind incorporating the tidyverse into their own custom functions; they just wanted to write their code. Since then, tidy eval syntax has been changed to more understandable {{}} double braces.
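As a rough sketch of what the double-brace syntax looks like in practice, here is a small custom function that passes unquoted column names through to dplyr. The function name mean_by and its arguments are made up for this example, and it assumes a reasonably recent dplyr (1.0 or later, for the .groups argument).

```r
library(dplyr)

# Hypothetical helper: average one column within groups of another.
# {{ }} forwards the unquoted column names the caller supplies.
mean_by <- function(data, group, value) {
  data %>%
    group_by({{ group }}) %>%
    summarise(mean_value = mean({{ value }}), .groups = "drop")
}

result <- mean_by(mtcars, cyl, mpg)
print(result)
```

The caller writes mean_by(mtcars, cyl, mpg) with bare column names, just as with built-in dplyr verbs.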
Wickham also outlined how tidyverse package authors will help users better understand the lifecycle of older functions and if/how some functions may be deprecated. Video: State of the tidyverse.
Styled text with ggtext
Claus Wilke gave an overview of the ggtext package in this fast-paced presentation, showing how to customize ggplot visualizations with colored text, images on axes, and more. He also explained the package’s current limits. Video: Spruce up your ggplot2 visualizations with formatted text.
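Here is a minimal sketch of the colored-text idea, assuming ggplot2 and ggtext are installed. The data set, colors, and title text are my own choices, not taken from the talk. The title contains a bit of inline HTML, and element_markdown() tells ggplot2 to render it instead of printing the tags.

```r
library(ggplot2)
library(ggtext)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "steelblue") +
  labs(
    # Inline HTML span colors part of the title to match the points.
    title = "Heavier cars get <span style='color:steelblue;'>fewer miles per gallon</span>"
  ) +
  # Without element_markdown(), the raw HTML would appear in the title.
  theme(plot.title = element_markdown())

print(p)
```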
You can also watch my Do More With R tutorial on one way to use ggtext: adding color to ggplot text. (Or read the companion article.)
What you didn’t know about R’s scales package
I’ve used scales package functions such as comma() or dollar() to add commas or dollar signs to a vector of numbers, but I never really explored the package further. Turns out that was my loss. At this presentation, data scientist Dana Seidel showed that scales does a lot more than format numbers. One tip: The show_col() function lets you easily see how various colors and palettes look. Video: The little package that could: Taking visualizations to the next level with the scales package.
Result from running scales::show_col(viridis_pal()(4)).
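A quick sketch of the functions mentioned above, assuming the scales package is installed. The numbers are arbitrary sample values.

```r
library(scales)

# Format numbers with thousands separators or as currency.
formatted <- comma(1234567)
print(formatted)
print(dollar(c(5, 2500, 1000000)))

# Generate four colors from the viridis palette and display them as swatches.
pal <- viridis_pal()(4)
show_col(pal)
```

viridis_pal() returns a function, which is why the call has two sets of parentheses: the first builds the palette, the second asks it for four colors.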
Customizing Shiny apps and R Markdown
Shiny creator and RStudio CTO Joe Cheng demo’d bootstraplib, a new package for customizing the look of Shiny apps without having to hunt through and tweak complex CSS. The bootstraplib package lets you change Bootstrap defaults within an R script, without having to write HTML and CSS. (Bootstrap is the open source HTML/CSS/JavaScript framework used by Shiny and many other Web projects.) You can also use bootstraplib in non-Shiny R Markdown documents. Video: Styling Shiny apps with Sass and Bootstrap 4.
Better spaghetti plots using brolgar in R
A spaghetti plot that makes sense, using the brolgar R package.
What do you get when you have a load of items plotted over time? That data type is known as longitudinal, and visualizing it can often end up looking like a pile of spaghetti. To help solve this problem, Nicholas Tierney at Monash University created the brolgar package (watch the presentation if you’re wondering why that name) to summarize, visualize, and otherwise understand such data. Video: Making better spaghetti (plots): Exploring the individuals in longitudinal data with the brolgar package.
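To give a feel for the approach, here is a minimal sketch that assumes brolgar and ggplot2 are installed. It uses the heights data set that ships with brolgar and plots a random handful of countries instead of all of them at once. The sample size and seed are arbitrary choices of mine.

```r
library(brolgar)
library(ggplot2)

set.seed(2020)

# `heights` is keyed by country; keep all observations for 5 random countries.
sampled <- sample_n_keys(heights, size = 5)

p <- ggplot(sampled, aes(x = year, y = height_cm, group = country, colour = country)) +
  geom_line()

print(p)
```

Sampling whole individuals, not individual rows, keeps each line intact while cutting down on the tangle.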
Dataviz best (and worst) practices
This wasn’t R-specific, but University of Pennsylvania dataviz specialist Will Chase gave an engaging, opinionated talk on how to “take your charts from drab to fab.” One tip: “White space is like garlic — take as much as you think you need and triple it.” Video: The Glamour of Graphics.
From Will Chase’s The Glamour of Graphics presentation.
R Markdown to its limits
There is a lot more one can do with R Markdown than I thought. And the fun-as-well-as-educational Teacup Giraffe site pushes the limits. Besides offering a look at the Teacup Giraffe website, this presentation by neuroscience Ph.D. student Desiree De Leon includes some basic advice for improving your own R Markdown documents. Video: Of Teacups, Giraffes, & R Markdown.
And speaking of getting more out of markdown, RStudio’s Yihui Xie had a separate talk showing how to generate many more file types than just HTML or PDF from an R Markdown document. Video: One R Markdown Document, Fourteen Demos.
3D visualizations in R
I’d been resisting the razzle-dazzle around the rayshader package for a while. Did I really need to turn ggplots into 3D visualizations and animate them? But I’m glad I went to author Tyler Morgan-Wall’s presentation, because the package is pretty cool, even if I’m not sure yet how I’d use it in my own work.
3D raytracer version of a 2D ggplot.
Morgan-Wall showed how to turn a conventional graphic into a 3D visualization and animation with very little code. He also showed some recent package improvements that make some graphics more visually striking. In this case, seeing the animated examples is a lot better than trying to read about them. If you’re at all interested in this package, it’s worth watching the presentation. Video: 3D ggplots with rayshader.
Accelerating analytics in R
The Apache Arrow project is a multi-language standard for in-memory data aimed at interoperability and high performance. Arrow has been implemented in R with the arrow package. Ursa Labs Engineering Director Neal Richardson outlined the status of Arrow in R, including the ability to query a directory of files using dplyr syntax without having to load that data into memory, as well as some upcoming features. Video: Accelerating analytics with Apache Arrow.
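Here is a minimal sketch of querying a directory of files with dplyr syntax, assuming the arrow and dplyr packages are installed. To keep it self-contained, it first writes mtcars to a temporary directory as a partitioned data set; in real use you would point open_dataset() at your own directory of files.

```r
library(arrow)
library(dplyr)

# Write a small example data set to disk, split into one folder per cyl value.
data_dir <- file.path(tempdir(), "mtcars_dataset")
write_dataset(mtcars, data_dir, partitioning = "cyl")

# Open the directory as a data set; nothing is read into memory yet.
ds <- open_dataset(data_dir)

# The filter and select are applied while the files are scanned;
# only the matching rows and columns come back on collect().
result <- ds %>%
  filter(mpg > 25) %>%
  select(mpg, hp, cyl) %>%
  collect()

print(result)
```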
list-columns in data.table
If you’ve seen the heated discussions on social media, you might think that tidyverse and data.table are in two opposing camps. But while each has its fans, a growing number of people use both. Utah State Research Assistant Professor Tyson S. Barrett is one, and he brought data.table to RStudio Conference with a talk on using complex list-columns with data.table and tidyverse functions. One interesting tip: If you’re joining a complex data set, nesting all of the columns you’re not joining on can help prevent errors.
Barrett also mentioned his tidyfast package, which has a streamlined “translation” of data.table code to tidyverse-like functions. (It’s similar to dtplyr but doesn’t use “lazy” data sets.) Unfortunately, this session video occasionally obscures some of the code and graphs being shown. If you watch this one, I suggest looking at the slides separately; they’re available at Barrett’s website. Video: List-columns in data.table: Reducing the cognitive & computational burden of complex data.
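The nesting tip can be sketched roughly as follows, assuming the data.table package is installed. This is my own small example, not code from the talk, and the lookup table is made up for illustration.

```r
library(data.table)

cars_dt <- as.data.table(mtcars)

# Nest: one row per cyl, with every other column packed into a list-column.
nested <- cars_dt[, .(data = list(.SD)), by = cyl]

# Each element of the list-column is itself a data.table.
nested[, n_rows := vapply(data, nrow, integer(1))]

# Hypothetical lookup table, invented for this example.
labels <- data.table(cyl = c(4, 6, 8), label = c("four", "six", "eight"))

# Join on the single key column; the nested data rides along untouched.
joined <- labels[nested, on = "cyl"]
print(joined)
```

Because only the key column sits at the top level, the join can’t accidentally match or duplicate on any of the nested columns.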
Not familiar with data.table? Check out my Do More With R 5-minute intro.
Bonus: RStudio Conference 2020 lightning talks
There were a lot of interesting lightning talks, but a few stood out in part for the cool websites and packages being demo’d as well as the presentations themselves.
RStudio intern Maya Gans showed a drag-and-drop interface for tidyverse tasks such as transforming, summarizing, and plotting data. It’s an interesting way to teach tidyverse concepts before students have to learn actual code. Video: TidyBlocks: using the language of the tidyverse in a blocks-based interface. Website: TidyBlocks.tech.
The still-experimental livecode package lets you live code a demo and have it appear on attendees’ own systems in near real time. University of Edinburgh lecturer Colin Rundel explains why you’d want to do that and how it works. Video: `livecode`: Broadcast your live coding sessions from and to RStudio.
Data science for software engineers: Busting software myths with R featured a website designed to teach statistics to software engineering students with relevant problems like: Does test-driven software improve quality? Does sleep deprivation make programmers more or less effective? When will that project be finished? Yim Register showed a bit of the site, but you can also check out all the lessons at Data Science for Software Engineers.
Bonus: RStudio Conference 2020 keynotes
RStudio founder and CEO J.J. Allaire discussed the state of open source software, how it’s possible to fund open source efforts, and the company’s move to become a certified benefit corporation. Video (presentation only, not subsequent Q&A): Open Source Software for Data Science.
Every time I see Martin Wattenberg and Fernanda Viegas speak, I leave feeling grateful that I’ve got a job that lets me peek into the work and thoughts of some supersmart people. Co-leaders of Google Brain’s PAIR (People+AI Research), the two discussed a number of their projects on topics like understanding algorithm bias. Several of the projects they discussed are available to the public. Video: Data, visualization, and designing.
A final note: With multiple tracks going on at once, I missed a lot of excellent talks. I also attended other good ones that didn’t make the list because I didn’t want this article to get too long. Examples include how the Associated Press uses R (Larry Fenn), hitting R a million times a day at T-Mobile (Heather Nolis and Jacqueline Nolis), and tuning models with the tune and workflows packages (Max Kuhn). You can find all of the available videos here: https://resources.rstudio.com/rstudio-conf-2020.
Want more R tips? Check out InfoWorld’s Do More With R video tutorials.