Chances are, though, you'll want to subset your data by more than one column at a time. That's when you'll want to use bracket notation, what I think of as rows-comma-columns. Basically, you take the name of your data frame and follow it by [rows,columns]. The rows you want come first, followed by a comma, followed by the columns you want. So, if you want all rows but just columns 2 through 4 of mtcars, you can use:
mtcars[,2:4]
Do you see that comma before the 2:4? That's leaving a blank space where the "which rows do you want?" portion of the bracket notation goes, and it means "I'm not asking for any subset, so return all." Although it's not always required, it's not a bad practice to get into the habit of using a comma in bracket notation so that you remember whether you were slicing by columns or rows.
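Here's a minimal sketch of that blank-row-slot idea, using the mtcars data set that ships with base R. The variable names (cols_2_to_4, no_comma) are made up for illustration.

```r
# mtcars is built into R, so there is nothing to load.
# Blank row slot before the comma = "give me all rows".
cols_2_to_4 <- mtcars[, 2:4]

names(cols_2_to_4)   # "cyl" "disp" "hp"
dim(cols_2_to_4)     # 32 rows, 3 columns

# Leaving the comma out also picks columns on a data frame,
# which is why the comma is "not always required".
no_comma <- mtcars[2:4]
names(no_comma)      # "cyl" "disp" "hp"
```

Both forms pick the same three columns here, but the version with the comma makes it obvious that 2:4 refers to columns and not rows.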
If you want multiple columns that aren't contiguous, such as columns 2 and 4 but not 3, you can use the notation:
mtcars[,c(2,4)]
A couple of syntax notes here:
- R indexes from 1, not 0. So your first column is at [1] and not [0].
- R is case sensitive everywhere. mtcars$mpg is not the same as mtcars$MPG.
- mtcars[,-1] will not get you the last column of a data frame, the way negative indexing works in many other languages. Instead, negative indexing in R means exclude that item. So, mtcars[,-1] will return every column except the first one.
- To create a vector of items that are not contiguous, you need to use the combine function c(). Typing mtcars[,(2,4)] without the c will not work. You need that c in there: mtcars[,c(2,4)]
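The syntax notes above can be tried out with a short sketch like the following. The variable names are hypothetical. The last lines show one way to get the final column, since negative indexing won't do it.

```r
# R counts from 1: the first column of mtcars is mpg.
names(mtcars)[1]                  # "mpg"

# Non-contiguous columns need the combine function c().
cols_2_and_4 <- mtcars[, c(2, 4)]
names(cols_2_and_4)               # "cyl" "hp"

# A negative index excludes: everything except the first column.
all_but_first <- mtcars[, -1]
ncol(mtcars)                      # 11
ncol(all_but_first)               # 10
"mpg" %in% names(all_but_first)   # FALSE

# Case matters: there is no column called MPG, so this is NULL.
is.null(mtcars$MPG)               # TRUE

# To get the last column, ask for column number ncol(mtcars).
last_col_name <- names(mtcars)[ncol(mtcars)]
last_col_name                     # "carb"
```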
What if you want to select your data by data characteristic, such as "all cars with mpg > 20", and not by column or row location? If you use the column name notation and add a condition like:
mtcars$mpg>20
you don't end up with a list of all rows where mpg is greater than 20. Instead, you get a vector showing whether each row meets the condition, such as:
 TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE
 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
 TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE
 TRUE FALSE FALSE FALSE TRUE
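If you'd like to check what that TRUE/FALSE result is, a quick sketch along these lines may help. The name is_over_20 is just an illustrative choice.

```r
# Comparing a column to a value gives one TRUE/FALSE per row.
is_over_20 <- mtcars$mpg > 20

class(is_over_20)    # "logical"
length(is_over_20)   # 32, one entry for each car
sum(is_over_20)      # 14, because TRUE counts as 1
```

So the condition on its own is a logical vector with one answer per row, not the rows themselves.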
To turn that into a listing of the data you want, use that logical test condition and rows-comma-columns bracket notation. Remember that this time you want to select rows by condition, not columns. This:
mtcars[mtcars$mpg>20,]
tells R to get all rows from mtcars where mpg > 20, and then to return all the columns.
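As a rough sketch of that row filter in action (over_20 is a made-up variable name):

```r
# Condition goes in the row slot; the blank column slot means all columns.
over_20 <- mtcars[mtcars$mpg > 20, ]

nrow(over_20)       # 14 cars pass the test
ncol(over_20)       # 11, every column is kept
min(over_20$mpg)    # 21, so nothing at or below 20 slipped through
```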
If you don't want to see all the column data for the selected rows but are just interested in displaying, say, MPG and horsepower for cars with an MPG greater than 20, you could use the notation:
mtcars[mtcars$mpg>20,c(1,4)]
using column locations, or:
mtcars[mtcars$mpg>20,c("mpg","hp")]
using the column names.
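A small sketch comparing the two approaches, assuming the standard mtcars column order in which mpg is column 1 and hp is column 4 (by_position and by_name are illustrative names):

```r
# Same row condition, columns chosen two different ways.
by_position <- mtcars[mtcars$mpg > 20, c(1, 4)]
by_name     <- mtcars[mtcars$mpg > 20, c("mpg", "hp")]

names(by_position)   # "mpg" "hp"
names(by_name)       # "mpg" "hp"
dim(by_name)         # 14 rows, 2 columns

# Check that the two results hold the same values.
all.equal(by_position, by_name)
```

Column names tend to be the safer choice, since they keep working if the column order ever changes.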
Why do you need to specify mtcars$mpg in the row spot but "mpg" in the column spot? Just another R syntax quirk is the best answer I can give you.
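One way to look at the difference, offered as a sketch and not an official explanation: the row spot is being handed a TRUE/FALSE answer for every row, which R can only work out from the actual values in mtcars$mpg, while the column spot is just being handed labels to look up. The names row_filter and col_labels are hypothetical.

```r
# Row spot: a logical vector computed from the column's values.
row_filter <- mtcars$mpg > 20
class(row_filter)    # "logical"

# Column spot: plain text labels naming the columns to keep.
col_labels <- c("mpg", "hp")
class(col_labels)    # "character"

result <- mtcars[row_filter, col_labels]
dim(result)          # 14 rows, 2 columns
```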