Now that I've stressed the importance of that c() function, I (reluctantly) will tell you that there's a case when you can leave it out -- if you're referring to consecutive values in a range with a colon between minimum and maximum, like this:
my_vector <- (1:10)
I bring up this exception because I've run into that style quite a bit in R tutorials and texts, and it can be confusing to see c() required for some multiple values but not others. Note that it won't hurt anything to use c() with a colon-separated range, though, even if it's not required, such as:
my_vector <- c(1:10)
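As a quick check of that claim, here is a minimal sketch comparing the two forms. The variable names range_only, with_c and mixed are mine, made up for illustration:

```r
# Both forms build the same ten-element integer vector.
range_only <- 1:10
with_c <- c(1:10)

# identical() reports whether the two objects are exactly the same.
identical(range_only, with_c)

# c() is still needed once the values are not one consecutive range.
mixed <- c(1:3, 7, 10)
length(mixed)
```

The last line is the case where the colon alone won't do: a range plus a couple of stray values has to be combined with c().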
One more very important point about the c() function: It assumes that everything in your vector is of the same data type -- that is, all numbers or all characters. If you create a vector such as:
my_vector <- c(1, 4, "hello", TRUE)
You will not have a vector with two numeric objects, one character object, and one logical object. Instead, c() will do what it can to convert them all into the same object type, in this case all character objects. So my_vector will contain "1", "4", "hello" and "TRUE". In other words, the c in c() might as well also stand for "convert" or "coerce."
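You can see the coercion for yourself with class(). This is just a small sketch; the second vector, numbers_and_logical, is a hypothetical name I've added to show what happens when no character value is present:

```r
my_vector <- c(1, 4, "hello", TRUE)

# Every element has been converted to a character string.
class(my_vector)
my_vector

# With no character value present, the logical is coerced to a number instead.
numbers_and_logical <- c(1, 4, TRUE)
class(numbers_and_logical)
numbers_and_logical
```

In the second case TRUE becomes 1, because R picks the most general type that can hold all the values.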
To create a collection with multiple object types, you need a list, not a vector. You create a list with the list() function, not c(), such as:
my_list <- list(1, 4, "hello", TRUE)
Now you have a variable that holds the number 1, the number 4, the character object "hello", and the logical object TRUE.
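To confirm that a list keeps each item's original type, here's a minimal sketch using sapply() and class(). The element_classes name is my own, not something from a package:

```r
my_list <- list(1, 4, "hello", TRUE)

# Ask for the class of each element; nothing has been coerced.
element_classes <- sapply(my_list, class)
element_classes

# Double square brackets pull out a single element with its type intact.
my_list[[3]]
is.logical(my_list[[4]])
```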
Iterating through a collection of data with loops like for and while is a cornerstone of many programming languages. That's not the R way, though. While R does have for, while and repeat loops, you'll more likely see operations applied to a data collection using apply() functions or by using the plyr add-on package functions.
But first, some basics.
Say you have a vector of numbers such as:
my_vector <- c(7,9,23,5)
and you want to multiply each by 0.01 to turn them into percentages. How would you do that? You don't need a while loop. Instead, you can create a new vector called my_pct_vector like this:
my_pct_vector <- my_vector * 0.01
Performing a mathematical operation on a vector variable will automatically loop through each item in the vector.
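To make the comparison concrete, here's a sketch that sets the vectorized version next to the explicit loop it replaces. The looped_pct variable is a name I've made up for the example:

```r
my_vector <- c(7, 9, 23, 5)

# Vectorized: one line, with no explicit loop.
my_pct_vector <- my_vector * 0.01

# The same result written as an explicit loop, for comparison only.
looped_pct <- numeric(length(my_vector))
for (i in seq_along(my_vector)) {
  looped_pct[i] <- my_vector[i] * 0.01
}

# all.equal() compares the two results within numeric tolerance.
all.equal(my_pct_vector, looped_pct)
```

Both approaches get you to the same place, but the vectorized form is shorter and is the one you'll see most often in R code.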
Typically in data analysis, though, you want to apply functions to subsets of data: finding the mean salary by job title or the standard deviation of property values by community. The apply() function group and plyr add-on package are designed for that.
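As one possible illustration of the "mean salary by job title" case, here's a sketch using tapply() from base R. The staff data frame and its values are invented purely for this example:

```r
# Hypothetical data, made up for illustration only.
staff <- data.frame(
  job_title = c("Analyst", "Analyst", "Engineer", "Engineer", "Manager"),
  salary = c(50000, 54000, 70000, 74000, 90000)
)

# tapply(values, groups, function) applies the function within each group.
mean_by_title <- tapply(staff$salary, staff$job_title, mean)
mean_by_title
```

Swapping mean for sd would give you the standard deviation for each group instead, although a group with a single row (Manager here) would come back as NA.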
There are more than half a dozen functions in the apply family, depending on what type of data object is being acted upon and what sort of data object is returned. "These functions can sometimes be frustratingly difficult to get working exactly as you intended, especially for newcomers to R," says a blog post at Revolution Analytics, which focuses on enterprise-class R.
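Part of what trips people up is that similar-looking functions in the family hand back different kinds of objects. Here's a small sketch with three of them; the values list is a made-up example:

```r
values <- list(a = 1:3, b = 4:6)

# lapply() always returns a list, one element per input element.
as_list <- lapply(values, sum)

# sapply() tries to simplify the result, here to a named integer vector.
as_vector <- sapply(values, sum)

# vapply() works like sapply() but makes you state the expected result type.
checked <- vapply(values, sum, integer(1))

class(as_list)
class(as_vector)
```

If you're not sure which one you need, vapply() is the strictest of the three, since it stops with an error when a result doesn't match the type you declared.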