Beginner's guide to R: Painless data visualization

Learn how to paint a picture with data in R, using just a couple of lines of code


Bar graphs

The basic R function for making a bar graph is barplot(). To plot the demand column of the sample BOD data frame included with R, you can use the command:

barplot(BOD$demand)

Add main="Graph of demand" if you want a main headline on your graph:

barplot(BOD$demand, main="Graph of demand")

To label the bars on the x axis, use the names.arg argument and set it to the column you want to use for labels:

barplot(BOD$demand, main="Graph of demand", names.arg = BOD$Time)
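The three calls above can be combined into one fuller sketch. The axis labels and the variable name bar_mids are additions for illustration, not part of the original example; the units shown are the ones given in R's help page for BOD.

```r
# BOD ships with base R; it has two columns, Time and demand.
# barplot() draws the chart and invisibly returns the x position
# of each bar's midpoint, which is handy for adding annotations.
bar_mids <- barplot(
  BOD$demand,
  names.arg = BOD$Time,       # labels under each bar
  main = "Graph of demand",   # headline
  xlab = "Time (days)",       # illustrative axis labels
  ylab = "Demand (mg/l)"
)

# Write each value just above its bar (xpd = TRUE lets text
# spill past the plot region if a bar reaches the top).
text(bar_mids, BOD$demand, labels = BOD$demand, pos = 3, xpd = TRUE)
```

Capturing the return value is optional; if you only want the picture, call barplot() without assigning it.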

Sometimes you'd like to graph the counts of a particular variable but you have only raw data, not a table of frequencies. R's table() function is a quick way to count how many times each distinct value appears in your data.

The R Graphics Cookbook uses an example of a bar graph for the number of 4-, 6- and 8-cylinder vehicles in the mtcars data set. Cylinders are listed in the cyl column, which you can access in R using mtcars$cyl.

Here's code to get the count of how many entries there are by cylinder with the table() function; it stores results in a variable called cylcount:

cylcount <- table(mtcars$cyl)

That creates a table called cylcount containing:

 4  6  8
11  7 14

Now you can create a bar graph of the cylinder count:

barplot(cylcount)
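As a slightly fuller sketch of the same steps, the snippet below builds the table, looks up one count by its label, and adds a headline and axis labels to the graph. The label text is an illustrative choice.

```r
# Count how many cars fall into each cylinder category.
cylcount <- table(mtcars$cyl)

# The category labels are stored as names (character strings).
names(cylcount)

# Look up a single count by its label.
cylcount[["8"]]

# barplot() uses the table's names as bar labels automatically.
barplot(
  cylcount,
  main = "Cars by number of cylinders",
  xlab = "Cylinders",
  ylab = "Number of cars"
)
```

Because the table carries its own names, there is no need for names.arg here.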

The qplot() quick plotting function from the ggplot2 package can also create bar graphs (load the package first with library(ggplot2)):

qplot(mtcars$cyl)

Figure: Creating a bar plot.

However, because cyl is stored as a number, qplot() treats it as a continuous variable that could take any value from 4 through 8, so it leaves blank spaces for 5 and 7.

To treat cylinders as distinct groups -- that is, you have a group with 4 cylinders, a group with 6, and a group with 8, not the possibility of entries anywhere between 4 and 8 -- you want cylinders to be treated as a statistical factor:

qplot(factor(mtcars$cyl))
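To see what factor() changes, you can inspect the column before and after conversion using base R alone. This is a small sketch; the variable names cyl and cyl_factor are illustrative.

```r
cyl <- mtcars$cyl

# The raw column is numeric, so plotting functions may treat it
# as a continuous scale.
is.numeric(cyl)          # TRUE

# factor() turns it into a set of distinct groups (levels).
cyl_factor <- factor(cyl)
levels(cyl_factor)       # "4" "6" "8"

# Only the values that actually occur become levels, so there
# is no slot for 5 or 7.
nlevels(cyl_factor)      # 3
```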

To create a bar graph with the more robust ggplot() function, you can use syntax such as:

ggplot(mtcars, aes(factor(cyl))) + geom_bar()

Histograms

Histograms work pretty much the same, except you want to specify how many buckets or bins you want your data to be separated into. For base R graphics, use:

hist(mydata$columnName, breaks = n)

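In this example, columnName is the name of the column in a mydata data frame that you want to visualize, and n is the number of bins you want. Note that hist() treats a single number given to breaks as a suggestion: it rounds to "pretty" break points, so you may get slightly more or fewer bins than you asked for.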
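Here is a sketch of hist() using the mpg column of mtcars as a stand-in for mydata$columnName (that substitution is an assumption for illustration). It shows how to check which break points R actually chose, and how to pass an explicit vector of break points when you need exact control.

```r
x <- mtcars$mpg   # stand-in for mydata$columnName

# A single number is only a suggestion; plot = FALSE returns the
# computed histogram without drawing it, so you can inspect it.
h_suggested <- hist(x, breaks = 5, plot = FALSE)
h_suggested$breaks   # the break points R settled on

# A vector of break points is used exactly as given. It must
# cover the full range of the data (mpg runs from 10.4 to 33.9).
h_exact <- hist(
  x,
  breaks = seq(10, 35, by = 2.5),
  main = "Miles per gallon",
  xlab = "mpg"
)
length(h_exact$counts)   # number of bins drawn
```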

Figure: What happens to your bar chart when you don't tell R to stop treating the variable as continuous.

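The ggplot2 commands are, for quick plots:

qplot(columnName, data=mydata, binwidth=w)

and for the more robust ggplot():

ggplot(mydata, aes(x=columnName)) + geom_histogram(binwidth=w)

Here w is the width of each bin in the units of your data, not the number of bins. To ask geom_histogram() for a number of bins instead, use bins=n in place of binwidth.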
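The following sketch contrasts the two ways of controlling bins in geom_histogram(). It assumes the ggplot2 package is installed and uses mtcars and its mpg column as stand-ins for mydata and columnName; the values 2.5 and 10 are arbitrary illustrative choices.

```r
library(ggplot2)  # assumes ggplot2 is installed

# binwidth: every bin spans 2.5 units of the x variable (mpg).
p_width <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2.5)

# bins: ask for 10 bins and let ggplot2 work out the width.
p_bins <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)

# Plots stored in variables are drawn when printed.
print(p_width)
print(p_bins)
```

Storing the plot in a variable before printing makes it easy to add further layers later with +.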

You may be starting to see strong similarities in syntax for various ggplot() examples. While the ggplot() function is somewhat less intuitive, once you wrap your head around its general principles, you can do other types of graphics in a similar way.
