One of the most appealing things about R is its ability to create data visualizations with just a couple of lines of code.
For example, it takes just one line of code -- and a short one at that -- to plot two variables in a scatterplot. Let's use as an example the mtcars data set installed with R by default. To plot the engine displacement column disp on the x axis and MPG on y:
plot(mtcars$disp, mtcars$mpg)
[This story is part of Computerworld's "Beginner's guide to R." To read from the beginning, check out the introduction; there are links on that page to the other pieces in the series.]
You really can't get much easier than that.
Of course that's a pretty no-frills graphic. If you'd like to label your x and y axes, use the parameters xlab and ylab. To add a main headline, such as "Page views by time of day," use the parameter main:
plot(mtcars$disp, mtcars$mpg, xlab="Engine displacement", ylab="mpg", main="MPG compared with engine displacement")
If you find having the y-axis labels rotated 90 degrees annoying (as I do), you can position them for easier reading with the las=1 argument:
plot(mtcars$disp, mtcars$mpg, xlab="Engine displacement", ylab="mpg", main="MPG vs engine displacement", las=1)
What's las and why is it 1? las refers to "label style," and it has four options: 0 is the default, with text always parallel to its axis; 1 is always horizontal; 2 is always perpendicular to the axis; and 3 is always vertical. For much more on plot parameters, run the help command on par like so:
?par
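To see the four las settings side by side, the following sketch draws the same scatterplot once per value. The 2x2 grid set up with par(mfrow = ...) and the loop variable name style are illustrative choices, not part of the original example.

```r
# Draw the same scatterplot four times, once per las value (0 through 3).
old_par <- par(mfrow = c(2, 2))   # 2x2 grid; par() returns the previous settings

for (style in 0:3) {
  plot(mtcars$disp, mtcars$mpg,
       xlab = "Engine displacement", ylab = "mpg",
       main = paste("las =", style), las = style)
}

par(old_par)                      # restore the original single-plot layout
```

Restoring the saved settings at the end keeps later plots from landing in the same grid.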
In addition to the basic dataviz functionality included with standard R, there are numerous add-on packages to expand R's visualization capabilities. Some packages are for specific disciplines such as biostatistics or finance; others add general visualization features.
Why use an add-on package if you don't need something discipline-specific? If you're doing more complex dataviz or want to pretty up your graphics for presentations, some packages have more robust options. Another reason: The organization and syntax of an add-on package might appeal to you more than do the R defaults.
Using ggplot2
In particular, the ggplot2 package is quite popular and worth a look for robust visualizations. The package requires a bit of time to learn its "Grammar of Graphics" approach.
But once you have that down, you have a tool to create many different types of visualizations using the same basic structure.
If ggplot2 isn't installed on your system yet, install it with the command:
install.packages("ggplot2")
You only need to do this once.
To use its functions, load the ggplot2 package into your current R session -- you only need to do this once per R session -- with the library() function:
library(ggplot2)
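If you rerun the same script often, one possible pattern is to install the package only when it's missing. This is a sketch that assumes a CRAN mirror is configured and reachable; requireNamespace() checks whether the package is available without attaching it.

```r
# Install ggplot2 only if it is not already available on this system.
# Assumption: a CRAN mirror is configured and the machine is online.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Attach the package for the current session.
library(ggplot2)
```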
On to some ggplot2 examples.
ggplot2 has a "quick plot" function called qplot() that is similar to R's basic plot() function but adds some options. The basic quick plot code generates a scatterplot:
qplot(disp, mpg, data=mtcars)
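Note that qplot() was marked as deprecated in ggplot2 3.4.0, so newer installations may show a warning when you call it. As a rough equivalent, the same scatterplot can be built with the full ggplot() syntax. The variable name p below is just an illustrative choice.

```r
library(ggplot2)

# Map disp to the x axis and mpg to the y axis, then draw one point per car.
p <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point()

print(p)  # in an interactive session, typing p alone also displays the plot
```

This is the "Grammar of Graphics" structure in miniature: a data set, a mapping of columns to axes, and a layer that says how to draw them.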
The qplot default starts the y axis at a value that makes sense to R. However, you might want your y axis to start at 0 so that you can better see whether changes are truly meaningful (starting a graph's y axis at your first value instead of 0 can sometimes exaggerate changes).
Use the ylim argument to manually set your lower and upper y axis limits:
qplot(disp, mpg, ylim=c(0,35), data=mtcars)
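The ylim argument works the same way in base R's plot(). As a small sketch, you could also derive the upper limit from the data instead of typing it in. Rounding up to the next multiple of 5 and the variable name y_max are assumptions made for illustration.

```r
# Largest mpg value in the data, rounded up to the next multiple of 5.
y_max <- ceiling(max(mtcars$mpg) / 5) * 5

# Start the y axis at 0 and end it at the computed upper limit.
plot(mtcars$disp, mtcars$mpg, ylim = c(0, y_max), las = 1)
```

For mtcars this gives an upper limit of 35, the same value used in the qplot() call.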