If you're finding that your selection statement is starting to get unwieldy, you can put your row and column selections into variables first, such as:
mpg20 <- mtcars$mpg > 20
cols <- c("mpg", "hp")
Then you can select the rows and columns with those variables:
mtcars[mpg20, cols]
making for a more compact select statement but more lines of code.
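As a rough sketch of the variable-based approach described above, the following uses R's built-in mtcars data. The names efficient and same are made up for illustration and are not part of the original example.

```r
# Store the row condition and the column names in variables first.
mpg20 <- mtcars$mpg > 20      # logical vector, one TRUE/FALSE per row
cols <- c("mpg", "hp")        # names of the columns to keep

# Use the variables inside the brackets: rows first, then columns.
efficient <- mtcars[mpg20, cols]   # 'efficient' is a hypothetical name

# Check that the compact form matches the all-in-one statement.
same <- identical(efficient, mtcars[mtcars$mpg > 20, c("mpg", "hp")])

head(efficient)
```

The trade-off is the one noted above: each bracket expression is shorter, at the cost of a couple of extra assignment lines.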
Getting tired of including the name of the data set multiple times per command? If you're using only one data set and you are not making any changes to the data that need to be saved, you can attach and detach a copy of the data set temporarily.
The attach() function works like this:
attach(mtcars)
So, instead of having to type:
mpg20 <- mtcars$mpg > 20
You can leave out the data set reference and type this instead:
mpg20 <- mpg > 20
If you do use attach(), remember to use the detach() function when you're finished:
detach(mtcars)
Some R users advise avoiding attach() because it can be easy to forget to detach(). If you don't detach() the copy, your variables could end up referencing the wrong data set.
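To make the attach-and-detach cycle concrete, here is a minimal sketch that assumes the built-in mtcars data. It uses search() to check whether the copy is currently on R's search path. The variable names attached_now, still_attached and efficient are illustrative only.

```r
# Put a copy of mtcars on the search path so its columns can be
# referenced by name alone.
attach(mtcars)
attached_now <- "mtcars" %in% search()     # is the copy on the search path?

# No mtcars$ prefix needed while the copy is attached.
mpg20 <- mpg > 20

# Remove the copy again as soon as you are finished with it.
detach(mtcars)
still_attached <- "mtcars" %in% search()   # should no longer be listed

# mpg20 is an ordinary variable, so it survives the detach.
efficient <- mtcars[mpg20, c("mpg", "hp")]
```

Pairing every attach() with a detach() in the same block of code, as here, is one way to reduce the risk of forgetting it.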
Alternative to bracket notation
Bracket syntax is pretty common in R code, but it's not your only option. If you dislike that format, you might prefer the subset() function instead, which works with vectors and matrices as well as data frames. The format:
subset(your data object, logical condition for the rows you want to return, select statement for the columns you want to return)
Using the mtcars example, to find all rows where MPG is greater than 20 and return only those rows with their MPG and HP data, the subset() statement would look like:
subset(mtcars, mpg>20, c("mpg", "hp"))
What if you wanted to find the row with the highest MPG?
subset(mtcars, mpg==max(mpg))
If you just wanted to see the MPG information for the highest MPG:
subset(mtcars, mpg==max(mpg), mpg)
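One practical difference from bracket notation shows up when you ask for a single column: subset() hands back a data frame, while brackets simplify the result to a plain vector unless you add drop = FALSE. The sketch below illustrates this with mtcars. The variable names are hypothetical.

```r
# subset() keeps the result as a data frame, row name included.
top_subset <- subset(mtcars, mpg == max(mpg), mpg)

# Brackets with one column simplify to a bare numeric vector ...
top_bracket <- mtcars[mtcars$mpg == max(mtcars$mpg), "mpg"]

# ... unless drop = FALSE asks R to keep the data frame structure.
top_bracket_df <- mtcars[mtcars$mpg == max(mtcars$mpg), "mpg", drop = FALSE]

# Compare the subset() result with the drop = FALSE bracket version.
matches <- isTRUE(all.equal(top_subset, top_bracket_df))
```

Which form is preferable depends on whether you want to see which car the value belongs to or just need the number.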
If you just want to use subset() to extract some columns and display all rows, you can either leave the row conditional spot blank with a comma, similar to bracket notation:
subset(mtcars, , c("mpg", "hp"))
Or, indicate your second argument is for columns with select= like this:
subset(mtcars, select=c("mpg", "hp"))
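A quick way to convince yourself that the two column-only forms are interchangeable is to compare their results, along with the bracket equivalent. This is a small sketch using mtcars. The names by_blank, by_select, by_bracket and all_same are made up for the example.

```r
# Column selection with the row condition left blank.
by_blank <- subset(mtcars, , c("mpg", "hp"))

# Column selection using the named select argument.
by_select <- subset(mtcars, select = c("mpg", "hp"))

# The bracket-notation version, for comparison.
by_bracket <- mtcars[, c("mpg", "hp")]

# Check that all three approaches return the same rows and columns.
all_same <- isTRUE(all.equal(by_blank, by_select)) &&
  isTRUE(all.equal(by_select, by_bracket))
```

The select= form tends to be easier to read later, since the argument name states what the vector is for.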
To tally up counts by factor, try the table command. For the diamonds data set, to see how many diamonds of each category of cut are in the data, you can use:
table(diamonds$cut)
This will return how many diamonds of each factor -- fair, good, very good, premium, and ideal -- exist in the data. Want to see a cross-tab by cut and color?
table(diamonds$cut, diamonds$color)
The table() function returns a count of each factor in your data.
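The diamonds data comes with the ggplot2 package, so to keep things self-contained, the sketch below applies the same table() idea to the built-in mtcars data instead, treating the cyl (cylinders) and gear columns as categories. That substitution and the variable names are assumptions for illustration only.

```r
# One-way tally: how many cars have each number of cylinders?
cyl_counts <- table(mtcars$cyl)

# Two-way cross-tab: cylinders (rows) by number of gears (columns).
cyl_by_gear <- table(mtcars$cyl, mtcars$gear)

cyl_counts
cyl_by_gear
```

With ggplot2 loaded, the same pattern should carry over by swapping in diamonds$cut and diamonds$color.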
If you are interested in learning more about statistical functions in R and how to slice and dice your data, there are a number of free academic downloads with many more details. These include Learning statistics with R by Daniel Navarro at the University of Adelaide in Australia (500+ page PDF download, may take a little while). And although not free, books such as "The R Cookbook" and "R in a Nutshell" have a lot of good examples and well-written explanations.
See the entire beginner's guide to R:
• Part 1: Introduction to R
• Part 2: Getting your data into R
• Part 3: Easy ways to do basic data analysis with R
• Part 4: Painless data visualization using R
• Part 5: Syntax quirks you'll want to know about R
• Part 6: Useful resources for R