Not all R functions need a robust data set to be useful for statistical work. For example, how many ways can you select a committee of 4 people from a group of 15? You can pull out your calculator and find 15! divided by the product of 4! and 11! ... or you can use the R choose() function, as in choose(15, 4).
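As a minimal sketch of the committee count, the snippet below uses base R's choose() and checks it against the factorial formula. The variable names committees and by_hand are illustrative only.

```r
# Number of ways to choose 4 people from 15, ignoring order
committees <- choose(15, 4)
committees   # 1365

# The same count from the factorial formula: 15! / (4! * 11!)
by_hand <- factorial(15) / (factorial(4) * factorial(11))

# all.equal() compares with a small tolerance, which is safer than ==
# for results built from large floating-point factorials
all.equal(committees, by_hand)
```

For larger groups, choose() is the safer route, since the intermediate factorials grow very quickly.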
Or perhaps you want to see all of the possible pair combinations of a group of 5 people, not simply count them. You can create a vector with the people's names and store it in a variable called mypeople:
mypeople <- c("Bob", "Joanne", "Sally", "Tim", "Neal")
In the example above, c() is the combine function.
Then run the combn() function, which takes two arguments -- your entire set first and then the number you want to have in each group:
combn(mypeople, 2)
Probably most experienced R users would combine these two steps into one like this:
combn(c("Bob", "Joanne", "Sally", "Tim", "Neal"),2)
But separating the two can be more readable for beginners.
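A short sketch of the two-step version, which also confirms that the number of pairs matches what choose() would report. The name people_pairs is an illustrative choice, not something from the text above.

```r
# Step 1: store the names in a vector
mypeople <- c("Bob", "Joanne", "Sally", "Tim", "Neal")

# Step 2: generate every 2-person combination
# combn() returns a matrix with one combination per column
people_pairs <- combn(mypeople, 2)

dim(people_pairs)    # 2 rows (group size) by 10 columns (pairs)
people_pairs[, 1]    # the first pair: "Bob" "Joanne"

# The column count should agree with the combinations formula
ncol(people_pairs) == choose(length(mypeople), 2)
```

Because each combination is a column, people_pairs[, i] pulls out the i-th pair.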
Get slices or subsets of your data
Maybe you don't need correlations for every column in your data frame and you just want to work with a couple of columns, not 15. Perhaps you want to see only the data that meets a certain condition, such as falling within 3 standard deviations of the mean. R lets you slice your data sets in various ways, depending on the data type.
To select just certain columns from a data frame, you can either refer to the columns by name or by their location (column 1, 2, 3).
For example, the mtcars sample data frame has these column names:
mpg cyl disp hp drat wt qsec vs am gear carb
Can't remember the names of all the columns in your data frame? If you just want to see the column names and nothing else, instead of using a function such as head(mtcars), you can type:
names(mtcars)
That's handy if you want to store the names in a variable, perhaps called mtcars.colnames (or anything else you'd like to call it):
mtcars.colnames <- names(mtcars)
But back to the task at hand. To access only the data in the mpg column in mtcars, you can use R's dollar sign notation:
mtcars$mpg
More broadly, then, the format for accessing a column by name would be:
dataframename$columnname
That will give you a 1-dimensional vector of numbers like this:
 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8
[12] 16.4 17.3 15.2 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5
[23] 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4
The numbers in brackets are not part of your data, by the way. They indicate what item number each line is starting with. If you have only one line of data, you'll just see [1]. If there's more than one line of data and only the first 11 entries can fit on the first line, your second line will start with [12], and so on.
Sometimes a vector of numbers is exactly what you want -- if, for example, you want to quickly plot mtcars$mpg and don't need item labels, or you're looking for statistical info such as variance and mean.
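As a rough sketch of that last point, the snippet below pulls the mpg column out as a plain vector and passes it to a couple of base R summary functions. The variable name mpg is just a convenience for this example.

```r
# mtcars ships with base R (the datasets package), so nothing needs loading
mpg <- mtcars$mpg

is.vector(mpg)   # TRUE: a plain numeric vector, with no row labels
length(mpg)      # 32: one value per car

mean(mpg)        # average miles per gallon
var(mpg)         # sample variance of miles per gallon
```

The same vector can be handed straight to plot(mpg) for a quick look at the values in row order.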