Beginner's guide to R: Painless data visualization

Learn how to paint a picture with data in R, using just a couple of lines of code


Chances are, you'll want to use color to show certain characteristics of your data, as opposed to simply assigning random colors in a graphic. That goes a bit beyond beginning R, but to give one example, say you have a vector of test scores:

testscores <- c(96, 71, 85, 92, 82, 78, 72, 81, 68, 61, 78, 86, 90)

You can do a simple barplot of those scores like this:

barplot(testscores)

And you can make all the bars blue like this:

barplot(testscores, col="blue")

But what if you want the scores 80 and above to be blue and the lower scores to be red? To do this, create a vector of colors of the same length and in the same order as your data, adding a color to the vector based on the data. In other words, since the first test score is 96, the first color in your color vector should be blue; since the second score is 71, the second color in your color vector should be red; and so on.

Of course, you don't want to create that color vector manually! Here's a statement that will do so:

testcolors <- ifelse(testscores >= 80, "blue", "red")

If you have any programming experience, you can read this as a compact loop over the testscores data that runs the conditional statement: 'If this entry in testscores is greater than or equal to 80, add "blue" to the testcolors vector; otherwise add "red" to the testcolors vector.' In fact, ifelse() is vectorized: it tests every element at once and returns a vector the same length as testscores, so you never write the loop yourself.
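For readers coming from other languages, here is a sketch of the explicit loop that ifelse() saves you from writing, followed by a check that both approaches produce the same color vector. The variable name testcolors_loop is purely illustrative.

```r
testscores <- c(96, 71, 85, 92, 82, 78, 72, 81, 68, 61, 78, 86, 90)

# Vectorized version: ifelse() tests every score in one call
testcolors <- ifelse(testscores >= 80, "blue", "red")

# Explicit loop version, for comparison only (testcolors_loop is a hypothetical name)
testcolors_loop <- character(length(testscores))
for (i in seq_along(testscores)) {
  if (testscores[i] >= 80) {
    testcolors_loop[i] <- "blue"
  } else {
    testcolors_loop[i] <- "red"
  }
}

# Both approaches give the same result
identical(testcolors, testcolors_loop)  # TRUE
```

The loop works, but the one-line ifelse() version is the idiomatic choice in R.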

[Image: A color-coded bar graph.]

Now that you have a vector of colors matched, element by element, to your scores, just add the testcolors vector as your desired color scheme:

barplot(testscores, col=testcolors)

Note that the name of a color must be in quotation marks, but a variable name that holds a vector of colors must not be; otherwise R treats it as a (nonexistent) color name.

Add a graph headline:

barplot(testscores, col=testcolors, main="Test scores")

And have the y axis go from 0 to 100:

barplot(testscores, col=testcolors, main="Test scores", ylim=c(0,100))

Then use las=1 to make the axis labels horizontal instead of rotated 90 degrees:

barplot(testscores, col=testcolors, main="Test scores", ylim=c(0,100), las=1)

And you have a color-coded bar graph.

By the way, if you wanted the scores sorted from highest to lowest, you could have set your original testscores variable to the following:

testscores <- sort(c(96, 71, 85, 92, 82, 78, 72, 81, 68, 61, 78, 86, 90), decreasing = TRUE)

The sort() function defaults to ascending sort; for descending sort you need the additional argument: decreasing = TRUE.
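A quick way to see the difference is to run both forms in the console:

```r
testscores <- c(96, 71, 85, 92, 82, 78, 72, 81, 68, 61, 78, 86, 90)

# Default: ascending order
sort(testscores)
# 61 68 71 72 78 78 81 82 85 86 90 92 96

# With decreasing = TRUE: descending order
sort(testscores, decreasing = TRUE)
# 96 92 90 86 85 82 81 78 78 72 71 68 61
```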

If that code above is starting to seem unwieldy to you as a beginner, break it into two lines for easier reading, and perhaps set a new variable for the sorted version:

testscores <- c(96, 71, 85, 92, 82, 78, 72, 81, 68, 61, 78, 86, 90)

testscores_sorted <- sort(testscores, decreasing = TRUE)
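If you plot the sorted version, build the color vector from the sorted scores too; colors computed from the original order would no longer line up with the bars. A minimal sketch, where testcolors_sorted is just an illustrative name:

```r
testscores <- c(96, 71, 85, 92, 82, 78, 72, 81, 68, 61, 78, 86, 90)
testscores_sorted <- sort(testscores, decreasing = TRUE)

# Compute colors from the sorted vector so each color matches its bar
# (testcolors_sorted is a hypothetical variable name)
testcolors_sorted <- ifelse(testscores_sorted >= 80, "blue", "red")

barplot(testscores_sorted, col = testcolors_sorted, main = "Test scores",
        ylim = c(0, 100), las = 1)
```

With descending order, all the blue bars end up on the left and the red ones on the right.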

If you had scores in a data frame called results with one column of student names called students and another column of scores called testscores, you could use the ggplot2 package's ggplot() function as well:

library(ggplot2)

ggplot(results, aes(x=students, y=testscores)) + geom_bar(fill=testcolors, stat = "identity")
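The article doesn't show how results is built. As a sketch, assuming made-up student labels S01 through S13 (hypothetical, standing in for real names) in the same order as the scores, you could create the data frame and plot it like this:

```r
library(ggplot2)

testscores <- c(96, 71, 85, 92, 82, 78, 72, 81, 68, 61, 78, 86, 90)
testcolors <- ifelse(testscores >= 80, "blue", "red")

# Hypothetical student labels; zero-padded so alphabetical order matches row order
results <- data.frame(
  students = sprintf("S%02d", seq_along(testscores)),
  testscores = testscores
)

# One fill color per row, in the same order as the data frame
p <- ggplot(results, aes(x = students, y = testscores)) +
  geom_bar(fill = testcolors, stat = "identity")
print(p)
```

Zero-padding the labels matters because ggplot2 orders character x values alphabetically; with labels like "S10" and "S2", the bars would be reordered.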

Why stat = "identity"? By default, geom_bar() counts how many rows fall at each x value (stat = "count"). Setting stat = "identity" tells ggplot2 to use the y values, here the test scores, as the bar heights instead.
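To see the difference, compare the default count behavior with stat = "identity" on a small hypothetical data frame. ggplot2 also provides geom_col(), which is geom_bar() with stat = "identity" built in.

```r
library(ggplot2)

# Hypothetical data frame: made-up student labels plus the article's scores
results <- data.frame(
  students = sprintf("S%02d", 1:13),
  testscores = c(96, 71, 85, 92, 82, 78, 72, 81, 68, 61, 78, 86, 90)
)

# Default stat = "count": each bar is the number of rows per student (1 here)
counted <- ggplot(results, aes(x = students)) + geom_bar()

# stat = "identity": bar heights are the testscores values themselves
identity_bars <- ggplot(results, aes(x = students, y = testscores)) +
  geom_bar(stat = "identity")

# geom_col() is shorthand for geom_bar(stat = "identity")
col_bars <- ggplot(results, aes(x = students, y = testscores)) + geom_col()
```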

[Image: Coloring bars by factor.]

In ggplot2, qplot() also makes it easy to color bars by a factor, such as number of cylinders, and then automatically generate a legend. Here's an example of a graph counting the number of 4-, 6- and 8-cylinder cars in the mtcars data set:

qplot(factor(cyl), data=mtcars, geom="bar", fill=factor(cyl))
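qplot() is deprecated in recent ggplot2 releases (3.4.0 and later), so in newer code you may prefer to spell the same chart out with ggplot(). A sketch of what should be an equivalent call, using the built-in mtcars data:

```r
library(ggplot2)

# One bar per cylinder count; fill = factor(cyl) colors the bars and adds a legend
p <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(cyl))) +
  geom_bar()
print(p)
```

Because geom_bar() uses its default count statistic here, the bar heights are the number of cars with 4, 6 and 8 cylinders (11, 7 and 14).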

But as I said, we're getting somewhat beyond a beginner's overview of R when coloring by factor. For a few more examples and details on many of the themes covered here, you might want to see the online tutorial Producing Simple Graphs with R. For more on graphing with color, check out a source such as the R Graphics Cookbook. The ggplot2 documentation also has a lot of examples, such as its reference page for bar geometry (geom_bar).
