Beginner's guide to R: Painless data visualization

Learn how to paint a picture with data in R, using just a couple of lines of code

Bonus intermediate tip: Sometimes on a scatterplot you may not be sure if a point represents just one observation or multiple ones, especially if you've got data points that repeat -- such as in this example that ggplot2 creator Hadley Wickham generated with the command:

qplot(cty, hwy, data=mpg)

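Some scatterplots don't show the full picture because one point represents more than one entry in your data.

Adding geom="jitter" nudges each point by a small random amount, so observations that sit on top of one another become visible:

qplot(cty, hwy, data=mpg, geom="jitter")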
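To check how much overlap there actually is, and to see the same jittered plot written in full ggplot() syntax, here is a minimal sketch. It assumes ggplot2 is installed; the variable names (n_rows, n_spots, p_jitter) are made up for illustration. geom_jitter() is the layer that corresponds to geom="jitter", and it is worth knowing because recent ggplot2 releases mark qplot() as deprecated.

```r
library(ggplot2)

# How many rows are in mpg, and how many distinct (cty, hwy) positions?
n_rows  <- nrow(mpg)
n_spots <- nrow(unique(mpg[, c("cty", "hwy")]))

# If n_spots is smaller than n_rows, some points hide several observations
cat("Rows:", n_rows, " Distinct positions:", n_spots, "\n")

# Same plot as the qplot() call, with each point nudged by a little random noise
p_jitter <- ggplot(mpg, aes(x = cty, y = hwy)) + geom_jitter()
print(p_jitter)
```

Because the nudges are random, the picture will look slightly different each time you run it.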

As you might have guessed, if there's a "quick plot" function in ggplot2 there's also a more robust, full-featured plotting function. That's called ggplot() -- yes, while the add-on package is called ggplot2, the function is ggplot() and not ggplot2().

The code structure for a basic graph with ggplot() is a bit more complicated than in either plot() or qplot(); it goes as follows:

ggplot(mtcars, aes(x=disp, y=mpg)) + geom_point()

The first argument in the ggplot() function, mtcars, is fairly easy to understand -- that's the data set you're plotting. But what's with aes() and geom_point()?

aes stands for aesthetics -- what are considered visual properties of the graph. Those are things like position in space, color, and shape.

geom is the graphing geometry you're using, such as lines, bars, or the shapes of your points.

Now if "line" and "bar" also seem like aesthetic properties to you, similar to shape, well, you can either accept that's how it works or do some deep reading into the fundamentals behind the Grammar of Graphics. (Personally, I just take Wickham's word for it.)
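One way to see the split between aesthetics and geoms is to build the plot in two steps. This is only a sketch, assuming ggplot2 is loaded; base_plot and scatter are illustrative names, not anything ggplot2 requires.

```r
library(ggplot2)

# Step 1: data plus aesthetics. This says which columns map to x and y,
# but nothing about how to draw them, so printing it shows empty axes.
base_plot <- ggplot(mtcars, aes(x = disp, y = mpg))
print(base_plot)

# Step 2: add a geom. Now ggplot2 knows to draw each row as a point.
scatter <- base_plot + geom_point()
print(scatter)
```

The first print() should give you labeled axes with nothing plotted; the points appear only once a geom is added.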

Want a line graph instead? Simply swap out geom_point() and replace it with geom_line(), as in this example that plots temperature vs pressure in R's sample pressure data set:

ggplot(pressure, aes(x=temperature, y=pressure)) + geom_line()

Creating a line graph with ggplot2.

It may be a little confusing here since both the data set and one of its columns are called the same thing: pressure. That first "pressure" represents the name of the data frame; the second, "y=pressure," represents the column named pressure.
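If you want to confirm which is which, a quick look at the data frame helps. The pressure data set ships with base R, so no packages are needed for this sketch; col_names is just an illustrative variable name.

```r
# pressure is a data frame...
is.data.frame(pressure)

# ...with two columns, one of which is also named pressure
col_names <- names(pressure)
print(col_names)

# The column on its own: data frame name, then $, then column name
head(pressure$pressure)
```

Inside aes(), ggplot2 looks up names as columns of the data frame you passed in, which is why y=pressure refers to the column rather than to the whole data set.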

In these examples, I set only x and y aesthetics. But there are lots more aesthetics we could add, such as color, size, and shape.
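As a small sketch of one extra aesthetic, the mtcars scatterplot can color each point by the car's number of cylinders. This assumes ggplot2 is loaded; wrapping cyl in factor() is my choice here so that ggplot2 treats 4, 6, and 8 as separate categories rather than as a continuous scale.

```r
library(ggplot2)

# color is mapped inside aes(), so it varies with the data;
# ggplot2 picks the colors and adds a legend automatically
p_color <- ggplot(mtcars, aes(x = disp, y = mpg, color = factor(cyl))) +
  geom_point()
print(p_color)
```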

You can also use ylim() with ggplot to change where the y axis starts. Note that ylim is not an argument of ggplot() itself; it is a separate function that you add to the plot with +. If mydata is the name of your data frame, xcol is the name of the column you want on the x axis, and ycol is the name of the column you want on the y axis, use ylim() like this (the NA tells ggplot2 to work out the upper limit from the data):

ggplot(mydata, aes(x=xcol, y=ycol)) + geom_line() + ylim(0, NA)
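For a version you can run as is, here is a sketch using the built-in mtcars data, where the smallest mpg value is well above zero. It assumes ggplot2 is loaded and sets the axis limit by adding ylim() to the plot with +; p_default and p_zero are illustrative names.

```r
library(ggplot2)

# By default the y axis starts near the smallest mpg value
p_default <- ggplot(mtcars, aes(x = disp, y = mpg)) + geom_line()
print(p_default)

# Start the y axis at 0; NA leaves the upper limit to be computed from the data
p_zero <- p_default + ylim(0, NA)
print(p_zero)

# An alternative that only widens the axis to include 0:
# p_default + expand_limits(y = 0)
```

Be aware that ylim() drops any data that falls outside the limits you give it, so it suits cases like this one where you are widening the axis, not cropping it.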

Perhaps you'd like both lines and points on that temperature vs. pressure graph?

ggplot(pressure, aes(x=temperature, y=pressure)) + geom_line() + geom_point()

The point here (pun sort of intended) is that you can start off with a simple graphic and then add all sorts of customizations: Set the size, shape, and color of the points; plot multiple lines with different colors; add labels; and a ton more. See Bar and line graphs (ggplot2) for a few examples, or the R Graphics Cookbook by Winston Chang for many more.
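As a taste of those customizations, here is a sketch that dresses up the temperature vs. pressure graph. It assumes ggplot2 is loaded; the specific colors, size, shape number, and label wording are arbitrary choices of mine, and the units come from R's documentation for the pressure data set.

```r
library(ggplot2)

p_custom <- ggplot(pressure, aes(x = temperature, y = pressure)) +
  # Values set outside aes() are fixed, not mapped to data
  geom_line(color = "steelblue") +
  # size, shape, and color of the points (shape 17 is a filled triangle)
  geom_point(size = 3, shape = 17, color = "darkred") +
  # Title and axis labels
  labs(
    title = "Vapor pressure of mercury",
    x = "Temperature (deg C)",
    y = "Pressure (mm of mercury)"
  )
print(p_custom)
```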

Bar chart with R's barplot() function.