So you've read your data into an R object. Now what?
Examine your data object
Before you start analyzing, it's worth taking a look at your data object's structure and a few row entries. If it's a two-dimensional table of data stored in an R data frame object with rows and columns -- one of the more common structures you're likely to encounter -- here are some ideas. Many of these work on one-dimensional vectors as well.
Many of the commands below assume that your data are stored in a variable called mydata (and not that mydata is somehow part of these functions' names).
[This story is part of Computerworld's "Beginner's guide to R." To read from the beginning, check out the introduction; there are links on that page to the other pieces in the series.]
If you type head(mydata), R will display mydata's column headers and first six rows by default. Want to see, oh, the first 10 rows instead of six? Type head(mydata, n = 10) or, more briefly, head(mydata, 10).
Note: If your object is just a 1-dimensional vector of numbers, such as (1, 1, 2, 3, 5, 8, 13, 21, 34), head(mydata) will give you the first six items in the vector.
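Here is a minimal sketch of head() in action. It assumes that mtcars, a sample data set bundled with R, stands in for your own data; the vector name fib is made up for illustration.

```r
# Assumption: mtcars (bundled with R) plays the role of your own data frame.
mydata <- mtcars

head(mydata)          # column headers plus the first six rows
head(mydata, n = 10)  # the first 10 rows instead
head(mydata, 10)      # same thing; n is the second argument

# head() also works on a plain one-dimensional vector.
fib <- c(1, 1, 2, 3, 5, 8, 13, 21, 34)  # hypothetical example vector
head(fib)             # the first six items: 1 1 2 3 5 8
```

Swap in your own object for mtcars and the calls stay the same.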
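To see the last few rows of your data, use the tail() function: tail(mydata) shows the last six rows by default, and tail(mydata, 10) shows the last 10.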
Tail can be useful when you've read in data from an external source, helping to see if anything got garbled (or there was some footnote row at the end you didn't notice).
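A quick sketch of checking the end of a data set, again assuming the bundled mtcars data stands in for whatever you imported:

```r
# Assumption: mtcars (bundled with R) plays the role of your imported data.
mydata <- mtcars

tail(mydata)      # the last six rows, with their row labels
tail(mydata, 3)   # just the last three rows

# A stray footnote row from an external file often shows up as NAs or
# leftover text in these final rows, so this is a cheap sanity check.
last_rows <- tail(mydata, 3)
```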
To quickly see how your R object is structured, you can use the str() function, as in str(mydata).
This will tell you the type of object you have; in the case of a data frame, it will also tell you how many rows (observations in statistical R-speak) and columns (variables to R) it contains, along with the type of data in each column and the first few entries in each column.
For a vector, str() tells you how many items there are -- for 8 items, it'll display as [1:8] -- along with the type of item (number, character, and so on) and the first few entries.
Various other data types return slightly different results.
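As a rough illustration, here is str() applied to a data frame and to a vector. The bundled mtcars data set and the vector name fib are stand-ins chosen for this sketch, not part of the original example.

```r
# Assumption: mtcars (bundled with R) plays the role of your own data frame.
mydata <- mtcars

# For a data frame: object type, number of observations and variables,
# then each column's type and first few values.
str(mydata)

# For a vector: the type, the length shown as [1:8], and the first few entries.
fib <- c(1, 1, 2, 3, 5, 8, 13, 21)  # hypothetical 8-item vector
str(fib)

# str() prints its report rather than returning it; capture.output()
# grabs the printed text if you want to inspect it programmatically.
df_report  <- capture.output(str(mydata))
vec_report <- capture.output(str(fib))
```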
If you want to see just the column names in the data frame called mydata, you can use the command colnames(mydata).
Likewise, if you're interested in the row names -- the labels R shows at the far left of each row, which are simply row numbers unless the data frame was given names -- use rownames(mydata).
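A short sketch of both calls, assuming the bundled mtcars data set (whose rows are labeled with car models) stands in for your data; the small data frame called plain is invented here to show the default labels.

```r
# Assumption: mtcars (bundled with R) plays the role of your own data frame.
mydata <- mtcars

colnames(mydata)  # character vector of column names, starting with "mpg"
rownames(mydata)  # character vector of row labels, starting with "Mazda RX4"

# A data frame built without explicit row names gets "1", "2", "3", ...
plain <- data.frame(x = c(10, 20, 30))  # hypothetical example
rownames(plain)   # "1" "2" "3"
```

Both functions return ordinary character vectors, so you can count them with length() or index into them like any other vector.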