Beginner's guide to R: Syntax quirks you'll want to know

Part 5 of our hands-on guide covers some R mysteries you'll need to understand.


A single number or character string is also a vector -- a vector of length 1. When you access the value of a variable that has just one value, such as 73 or "Learn more about R at Computerworld.com," you'll also see this in your console before the value:

[1]

That's telling you that your screen printout is starting at vector item number one. If you have a vector with lots of values so the printout runs across multiple lines, each line will start with a number in brackets, telling you which vector item number that particular line is starting with. (See the screenshot below.)

Screenshot: a long vector printed across several console lines, each line starting with the bracketed index of its first item.
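To see the bracketed index for yourself, a minimal sketch like the following may help. The variable names x and big are just illustrative, and where the lines wrap depends on your console width, so your printout may break at different positions.

```r
# A single value is a vector of length 1, so its printout starts with [1]
x <- 73
length(x)      # 1
is.vector(x)   # TRUE
x

# A longer vector wraps across several console lines; each line begins
# with the bracketed index of the first item shown on that line
big <- 101:150
print(big)
```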

If you want to mix numbers and strings or numbers and TRUE/FALSE types, you need a list. If you don't create a list, you may be unpleasantly surprised that your variable containing (3, 8, "small") was turned into a vector of characters ("3", "8", "small").
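Here is a short sketch of that surprise, using c() to build the vector and list() to build the list. The names mixed_vector and mixed_list are made up for this example.

```r
# c() builds a vector, which can hold only one data type,
# so the numbers are silently converted to character strings
mixed_vector <- c(3, 8, "small")
class(mixed_vector)   # "character"
mixed_vector          # "3" "8" "small"

# list() lets each item keep its own type
mixed_list <- list(3, 8, "small")
sapply(mixed_list, class)   # "numeric" "numeric" "character"

# Mixing numbers with TRUE/FALSE in a vector converts TRUE to 1
class(c(3, 8, TRUE))   # "numeric"
```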

By the way, R assumes that 3 is the same class as 3.0 -- numeric (that is, with a decimal point). If you want the integer 3, you need to signify it as 3L or with the as.integer() function. In a situation where this matters to you, you can check what type of number you've got by using the class() function:

class(3)              # "numeric"

class(3.0)            # "numeric"

class(3L)             # "integer"

class(as.integer(3))  # "integer"

There are several as.*() functions for converting one data type to another, including as.character(), as.list(), and as.data.frame().
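A few of these conversions in action, as a rough sketch. The scores variable is hypothetical, and as.numeric() is one more member of the same family that isn't listed above.

```r
as.character(3)       # "3"
as.integer("8")       # 8
as.numeric("3.5")     # 3.5

# A three-item vector becomes a list of three one-item vectors
as.list(c(1, 2, 3))

# A matrix becomes a data frame; R names the columns V1 and V2
scores <- as.data.frame(matrix(1:6, nrow = 3))
names(scores)         # "V1" "V2"

# Text that can't be read as a number becomes NA (normally with a warning)
suppressWarnings(as.numeric("small"))   # NA
```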

R also has special vector and list types that are of special interest when analyzing data, such as matrices and data frames. A matrix has rows and columns; you can find a matrix's dimensions (number of rows, then number of columns) with dim(), for example:

dim(my_matrix)

Every value in a matrix needs to be the same data type -- all numbers, for example. You can't have numbers in one column and text in another.
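As a quick illustration, here is one way to build a small matrix and check its dimensions. The values and the name mixed_matrix are invented for this sketch; my_matrix matches the name used above.

```r
# matrix() fills column by column unless you say otherwise
my_matrix <- matrix(1:6, nrow = 2, ncol = 3)
dim(my_matrix)    # 2 3  (rows, then columns)
nrow(my_matrix)   # 2
ncol(my_matrix)   # 3

# Slip one string into a matrix and every value becomes a character string
mixed_matrix <- matrix(c(1, 2, "three", 4), nrow = 2)
typeof(mixed_matrix)   # "character"
mixed_matrix[1, 1]     # "1"
```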

Data frames are like matrices except one column can have a different data type from another column, and each column must have a name. If you've got data in a format that might work well as a database table (or well-formed spreadsheet table), it will also probably work well as an R data frame.

In a data frame, you can think of each row as similar to a database record and each column like a database field. There are lots of useful functions you can apply to data frames, some of which I've gone over in earlier sections, such as summary() and the psych package's describe().
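Here's a minimal sketch of a data frame with columns of different types. The staff data is entirely made up, and stringsAsFactors = FALSE is set explicitly so the text columns stay as character strings regardless of your R version.

```r
# Each argument becomes a named column; columns can differ in type
staff <- data.frame(
  employee   = c("Ana", "Ben", "Chen"),
  department = c("Sales", "IT", "Sales"),
  salary     = c(52000, 61000, 58000),
  stringsAsFactors = FALSE
)

dim(staff)             # 3 3
sapply(staff, class)   # "character" "character" "numeric"

staff[2, ]             # one row, like a database record
staff$salary           # one column, like a database field
mean(staff$salary)     # 57000

summary(staff)         # per-column summaries
```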

And speaking of quirks: There are several ways to find an object's underlying data type, but not all of them report the same thing. For example, class() returns "data.frame" on a data frame object and str() prints 'data.frame' at the top of its output, but mode() returns the more generic "list".
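You can compare the answers side by side with a sketch like this one. The data frame df is a throwaway example, and typeof() is an additional base R function not mentioned above.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

class(df)    # "data.frame"
mode(df)     # "list"
typeof(df)   # "list"
str(df)      # prints a summary that begins with 'data.frame'

# The same kind of disagreement shows up with a plain integer
class(3L)    # "integer"
mode(3L)     # "numeric"
typeof(3L)   # "integer"
```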

If you'd like to learn more details about data types in R, you can watch this video lecture by Roger Peng, associate professor of biostatistics at the Johns Hopkins Bloomberg School of Public Health:

One more useful concept to wrap up this section -- hang in there, we're almost done: factors. These represent categories in your data. If you have a data frame with employees, their department, and their salaries, salaries would be numerical data and employees would be characters (strings in many other languages); but you'd likely want department to be a factor -- in other words, a category you may want to group or model your data by. Factors can be unordered, such as department, or ordered, such as "poor", "fair", "good" and "excellent."
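A short sketch of both kinds of factor follows. The department, salary and rating values are invented for illustration.

```r
# An unordered factor: departments are categories with no ranking
department <- factor(c("Sales", "IT", "Sales"))
salary     <- c(52000, 61000, 58000)

levels(department)   # "IT" "Sales"  (sorted alphabetically by default)
table(department)    # counts per category

# Group salaries by department and average them
by_dept <- tapply(salary, department, mean)
by_dept              # IT 61000, Sales 55000

# An ordered factor: spell out the levels from lowest to highest
rating <- factor(
  c("good", "poor", "excellent", "fair"),
  levels  = c("poor", "fair", "good", "excellent"),
  ordered = TRUE
)
rating < "good"      # FALSE TRUE FALSE TRUE
```

Because the levels are ordered, comparisons such as rating < "good" work on the category ranking rather than on alphabetical order.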
