# Beginner's guide to R: Painless data visualization

### Learn how to paint a picture with data in R, using just a couple of lines of code


Bonus intermediate tip: Sometimes on a scatterplot you may not be sure if a point represents just one observation or multiple ones, especially if you've got data points that repeat -- such as in this example that `ggplot2` creator Hadley Wickham generated with the command:

`qplot(cty, hwy, data=mpg)`

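Setting `geom="jitter"` nudges each point by a small random amount, so observations that would otherwise sit exactly on top of one another become visible:

`qplot(cty, hwy, data=mpg, geom="jitter")`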
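To get a feel for how much overlap there is, here is a minimal sketch that counts the repeated `cty`/`hwy` pairs in `mpg` and then draws the jittered plot with `ggplot()` and `geom_jitter()`. The variable names `overlapping` and `p_jitter` are mine, and I'm assuming `geom_jitter()` is an acceptable stand-in for the `geom="jitter"` quick plot.

```r
library(ggplot2)

# Rows whose (cty, hwy) pair already appeared in an earlier row;
# each of these would be drawn exactly on top of another point.
overlapping <- sum(duplicated(mpg[, c("cty", "hwy")]))
overlapping

# Jittered scatterplot built with the full ggplot() function
p_jitter <- ggplot(mpg, aes(x = cty, y = hwy)) + geom_jitter()
p_jitter
```

Because the jitter is random, the plot looks slightly different each time you run it.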

As you might have guessed, if there's a "quick plot" function in `ggplot2` there's also a more robust, full-featured plotting function. That's called `ggplot()` -- yes, while the add-on package is called `ggplot2`, the function is `ggplot()` and not `ggplot2()`.

The code structure for a basic graph with `ggplot()` is a bit more complicated than in either `plot()` or `qplot()`; it goes as follows:

`ggplot(mtcars, aes(x=disp, y=mpg)) + geom_point()`

The first argument in the `ggplot()` function, `mtcars`, is fairly easy to understand -- that's the data set you're plotting. But what's with `aes()` and `geom_point()`?

`aes` stands for aesthetics -- the visual properties of the graph, such as position in space, color, and shape. Inside `aes()` you say which column of your data drives each of those properties.

`geom` is the graphing geometry you're using, such as lines, bars, or the shapes of your points.

Now if "line" and "bar" also seem like aesthetic properties to you, similar to shape, well, you can either accept that's how it works or do some deep reading into the fundamentals behind the Grammar of Graphics. (Personally, I just take Wickham's word for it.)
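If it helps to see the two pieces separately, the following sketch stores the plot in a variable (here called `p`, my own name) and looks inside it. Poking at `p$mapping` and `p$layers` is only meant as an illustration of the idea; it isn't something you need for everyday plotting.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = disp, y = mpg)) + geom_point()

# The aesthetics: which visual properties have a column mapped to them
aes_names <- names(p$mapping)
aes_names

# The geom: the kind of shape used by the plot's first (and only) layer
geom_class <- class(p$layers[[1]]$geom)[1]
geom_class
```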

Want a line graph instead? Simply replace `geom_point()` with `geom_line()`, as in this example that plots temperature vs. pressure in R's sample `pressure` data set:

`ggplot(pressure, aes(x=temperature, y=pressure)) + geom_line()`

It may be a little confusing here since both the data set and one of its columns are called the same thing: pressure. That first "pressure" represents the name of the data frame; the second, "y=pressure," represents the column named pressure.
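A quick way to see the difference for yourself is to inspect the data frame in the console. This sketch uses only base R; `col_names` and `pressure_values` are names I made up for the example.

```r
# pressure is a sample data frame that ships with R
col_names <- names(pressure)
col_names

# First few rows of the data frame named pressure
head(pressure, 3)

# The column named pressure, pulled out of the data frame named pressure
pressure_values <- pressure$pressure
pressure_values
```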

In these examples, I set only the x and y aesthetics. But there are many more aesthetics we could set, such as color, size, and shape.
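As one possible illustration, this sketch adds a color aesthetic to the earlier `mtcars` scatterplot, coloring each point by the car's number of cylinders. Wrapping `cyl` in `factor()` is my choice, so that the cylinder counts are treated as distinct groups rather than a continuous scale.

```r
library(ggplot2)

# Same x and y as before, plus a color aesthetic driven by the cyl column
p_color <- ggplot(mtcars, aes(x = disp, y = mpg, color = factor(cyl))) +
  geom_point()
p_color
```

`ggplot2` adds the legend for you, because color now carries information from the data.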

You can also change where the y axis starts. `ylim` isn't an argument to `ggplot()` itself; instead, you add `ylim()` to the plot with `+`, the same way you add a geom. If `mydata` is the name of your data frame, `xcol` is the name of the column you want on the x axis, and `ycol` is the name of the column you want on the y axis, this starts the y axis at 0 and lets `ggplot2` pick the upper limit from the data:

`ggplot(mydata, aes(x=xcol, y=ycol)) + geom_line() + ylim(0, NA)`
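Here is a runnable sketch of setting the y axis start on the `mtcars` data, where the lowest `mpg` value is well above zero. It adds `ylim(0, NA)` to the plot with `+`, where `NA` leaves the upper limit to be worked out from the data. The variable names and the `layer_scales()` check at the end are my additions, included only to confirm the lower limit.

```r
library(ggplot2)

# Start the y axis at 0; NA leaves the upper limit up to the data
p_zero <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point() +
  ylim(0, NA)
p_zero

# Look up the limits of the y scale for the first layer
y_limits <- layer_scales(p_zero)$y$get_limits()
y_limits
```

If you would rather not fix the limits outright, `expand_limits(y = 0)` is another option that just makes sure 0 is included in the axis range.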

Perhaps you'd like both lines and points on that temperature vs. pressure graph?

`ggplot(pressure, aes(x=temperature, y=pressure)) + geom_line() + geom_point()`

The point here (pun sort of intended) is that you can start off with a simple graphic and then add all sorts of customizations: Set the size, shape, and color of the points; plot multiple lines with different colors; add labels; and a ton more. See Bar and line graphs (ggplot2) for a few examples, or the R Graphics Cookbook by Winston Chang for many more.
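As a small taste of those customizations, this sketch dresses up the temperature vs. pressure graph. The specific colors, point size, point shape, and label text are arbitrary choices of mine, not recommendations.

```r
library(ggplot2)

p_custom <- ggplot(pressure, aes(x = temperature, y = pressure)) +
  geom_line(color = "steelblue") +                       # colored line
  geom_point(size = 3, shape = 17, color = "darkred") +  # larger triangular points
  labs(title = "Vapor pressure of mercury",
       x = "Temperature (deg C)",
       y = "Pressure (mm Hg)")                           # title and axis labels
p_custom
```

Each `+` adds one more piece to the plot, so you can build up a graphic a step at a time and rerun it after every change.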
