Bonus intermediate tip: Sometimes on a scatterplot you may not be sure if a point represents just one observation or multiple ones, especially if you've got data points that repeat -- such as in this example that ggplot2 creator Hadley Wickham generated with the command:
qplot(cty, hwy, data=mpg)
Setting geom="jitter" adds a small amount of random noise to each point's position, so points that were stacked on top of each other spread apart and become visible:
qplot(cty, hwy, data=mpg, geom="jitter")
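To check that overlap really is hiding observations, the sketch below counts how many distinct (cty, hwy) pairs the mpg data contains, then draws a jittered scatterplot with ggplot(), geom_point() and position_jitter(). The jitter width and height of 0.3 and the seed value are arbitrary choices for illustration, not settings from the original example.

```r
library(ggplot2)  # also supplies the mpg sample data set

# Compare the number of cars with the number of distinct (cty, hwy) positions.
# Fewer positions than cars means some points sit exactly on top of others.
total_cars   <- nrow(mpg)
unique_pairs <- nrow(unique(mpg[, c("cty", "hwy")]))
cat("Cars:", total_cars, "| distinct (cty, hwy) points:", unique_pairs, "\n")

# position_jitter() nudges each point by a small random amount.
# width and height cap the nudge (in data units); seed makes the result repeatable.
p_jitter <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point(position = position_jitter(width = 0.3, height = 0.3, seed = 42))
print(p_jitter)
```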
As you might have guessed, if there's a "quick plot" function in ggplot2 there's also a more robust, full-featured plotting function. That's called ggplot() -- yes, while the add-on package is called ggplot2, the function is ggplot() and not ggplot2().
The code structure for a basic graph with ggplot() is a bit more complicated than with qplot(); it goes as follows:
ggplot(mtcars, aes(x=disp, y=mpg)) + geom_point()
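Because the pieces are joined with +, a ggplot() call can also be built up in stages. This sketch, using the hypothetical variable names base and scatter, stores the data-plus-aesthetics part on its own and then adds the points layer; layer_data() is a ggplot2 helper that returns the data a layer will draw.

```r
library(ggplot2)

# Step 1: the data set and the aesthetic mappings. No layers yet, so nothing is drawn.
base <- ggplot(mtcars, aes(x = disp, y = mpg))

# Step 2: add a geom layer that draws one point per row of mtcars.
scatter <- base + geom_point()
print(scatter)

# The points layer holds one row per car in mtcars.
print(head(layer_data(scatter)))
```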
The first argument in the ggplot() function, mtcars, is fairly easy to understand -- that's the data set you're plotting. But what's with aes() and geom_point()?
aes stands for aesthetics -- the visual properties of the graph, such as position in space, color, and shape.
geom is the graphing geometry you're using, such as points, lines, or bars.
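One way to see what aes() is for is to compare mapping a property to a column with setting it to a fixed value. In this sketch (the cyl column and the color "steelblue" are illustrative choices), color inside aes() varies with the data and gets a legend, while color passed directly to geom_point() applies to every point.

```r
library(ggplot2)

# Mapping: color is an aesthetic tied to a column, so each cylinder count
# (4, 6, 8) gets its own color and ggplot2 adds a legend.
p_mapped <- ggplot(mtcars, aes(x = disp, y = mpg, color = factor(cyl))) +
  geom_point()

# Setting: color is handed to the geom as a constant, outside aes(),
# so every point is drawn in the same color and there is no legend.
p_set <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point(color = "steelblue")

print(p_mapped)
print(p_set)
```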
Now if "line" and "bar" also seem like aesthetic properties to you, similar to shape, well, you can either accept that's how it works or do some deep reading into the fundamentals behind the Grammar of Graphics. (Personally, I just take Wickham's word for it.)
Want a line graph instead? Simply replace geom_point() with geom_line(), as in this example that plots temperature vs. pressure in R's sample pressure data set:
ggplot(pressure, aes(x=temperature, y=pressure)) + geom_line()
It may be a little confusing here since both the data set and one of its columns are called the same thing: pressure. That first "pressure" represents the name of the data frame; the second, "y=pressure," represents the column named pressure.
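If the shared name is hard to keep straight, ggplot2 also accepts the .data pronoun inside aes(), which states explicitly that a name refers to a column of the data frame passed to ggplot(). A minimal sketch:

```r
library(ggplot2)

# pressure (the data frame) has two columns: temperature and pressure.
str(pressure)

# .data$pressure means "the pressure column of the data frame given to ggplot()".
p_pressure <- ggplot(pressure, aes(x = .data$temperature, y = .data$pressure)) +
  geom_line()
print(p_pressure)
```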
In these examples, I set only x and y aesthetics. But there are lots more aesthetics we could add, such as color, size, and shape.
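As an illustration of adding more aesthetics, this sketch maps horsepower (hp) to point size and transmission type (am) to point shape in the mtcars scatterplot; both column choices are just examples.

```r
library(ggplot2)

# size varies continuously with horsepower; shape needs a discrete variable,
# so the 0/1 transmission code in am is converted to a factor first.
p_more <- ggplot(mtcars, aes(x = disp, y = mpg, size = hp, shape = factor(am))) +
  geom_point()
print(p_more)
```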
You can also use ggplot2's ylim() function to change where the y axis starts. If mydata is the name of your data frame, xcol is the name of the column you want on the x axis, and ycol is the name of the column you want on the y axis, add ylim() to the plot like this (NA leaves the upper limit to be set by the data):
ggplot(mydata, aes(x=xcol, y=ycol)) + geom_line() + ylim(0, NA)
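Here is a sketch of two ggplot2 options for starting the temperature vs. pressure graph's y axis at zero. ylim(0, NA) fixes the lower limit at 0 and leaves the upper limit to the data; because it sets hard scale limits, any values outside them would be dropped with a warning. expand_limits(y = 0) only widens the range so that 0 is included and never removes data. In both cases ggplot2's default padding still leaves a little space below 0.

```r
library(ggplot2)

base <- ggplot(pressure, aes(x = temperature, y = pressure)) + geom_line()

# Option 1: hard limits. 0 is the lower limit; NA means "use the data's maximum".
p_ylim <- base + ylim(0, NA)

# Option 2: widen the range to include 0 without removing any data.
p_expand <- base + expand_limits(y = 0)

print(p_ylim)
print(p_expand)
```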
Perhaps you'd like both lines and points on that temperature vs. pressure graph?
ggplot(pressure, aes(x=temperature, y=pressure)) + geom_line() + geom_point()
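Each + adds another layer, and layers are drawn in the order they are added, so in this version the points land on top of the line. The gray line color, red point color, and point size are arbitrary example settings.

```r
library(ggplot2)

p_both <- ggplot(pressure, aes(x = temperature, y = pressure)) +
  geom_line(color = "gray50") +              # layer 1: drawn first, underneath
  geom_point(size = 2, color = "firebrick")   # layer 2: drawn on top of the line
print(p_both)
```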
The point here (pun sort of intended) is that you can start off with a simple graphic and then add all sorts of customizations: set the size, shape, and color of the points; plot multiple lines with different colors; add labels; and a ton more. See Bar and line graphs (ggplot2) for a few examples, or the R Graphics Cookbook by Winston Chang for many more.
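As one sketch of multiple colored lines plus labels, R's built-in Orange data set (trunk circumference of five orange trees, each measured at seven ages) can be drawn with one line per tree by mapping the Tree column to color, then labeled with labs(). The title and axis text are illustrative.

```r
library(ggplot2)

# Mapping Tree to color draws a separate, differently colored line for each tree.
p_trees <- ggplot(Orange, aes(x = age, y = circumference, color = Tree)) +
  geom_line() +
  geom_point() +
  labs(
    title = "Orange tree growth",
    x = "Age (days)",
    y = "Trunk circumference (mm)",
    color = "Tree"
  )
print(p_trees)
```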