Beginner's guide to R: Syntax quirks you'll want to know

Part 5 of our hands-on guide covers some R mysteries you'll need to understand.


Plain old apply() runs a function on either every row or every column of a two-dimensional matrix in which all the values are the same data type. You also need to tell the function whether you're applying it by row or by column, using its second argument (named MARGIN): use 1 to apply by row or 2 to apply by column. For example:

apply(my_matrix, 1, median)

This returns the median of every row in my_matrix.

apply(my_matrix, 2, median)

And this version calculates the median of every column.
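To see both calls side by side, here is a minimal, self-contained sketch. The matrix is made-up example data (the numbers 1 through 12 arranged in 3 rows and 4 columns), and the names row_medians and col_medians are illustrative only.

```r
# Made-up 3 x 4 numeric matrix; matrix() fills it column by column:
#      [,1] [,2] [,3] [,4]
# [1,]    1    4    7   10
# [2,]    2    5    8   11
# [3,]    3    6    9   12
my_matrix <- matrix(1:12, nrow = 3)

# MARGIN = 1: run median() on each row, giving one result per row
row_medians <- apply(my_matrix, 1, median)

# MARGIN = 2: run median() on each column, giving one result per column
col_medians <- apply(my_matrix, 2, median)

row_medians   # 5.5 6.5 7.5
col_medians   # 2 5 8 11
```

Note that the result has one value per row (three values) or one per column (four values), which is a quick way to check that you picked the margin you meant.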

Other functions in the apply() family work with different input and output data types. For example, lapply() runs a function on every element of a list or vector and always returns a list, while tapply() runs a function on subsets of a vector grouped by a factor, such as a category column. Australian statistical bioinformatician Neal F.W. Saunders has a nice, brief introduction to apply in R in a blog post if you'd like to find out more and see some examples. (In case you're wondering, bioinformatics involves issues around storing, retrieving, and organizing biological data, not just analyzing it.)
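As a rough illustration of how lapply() and tapply() differ from apply(), here is a short sketch using base R only. The scores list and the heights/group vectors are hypothetical example data invented for this demo.

```r
# lapply(): apply a function to each element of a list (or vector).
# The result is always a list, with the same names as the input.
scores <- list(math = c(90, 75, 88), art = c(60, 95))  # made-up data
mean_scores <- lapply(scores, mean)
mean_scores$math   # 84.33333
mean_scores$art    # 77.5

# tapply(): apply a function to groups of a vector, where the groups
# are defined by a second vector (converted to a factor behind the scenes)
heights <- c(150, 160, 170, 180)   # made-up data
group   <- c("a", "a", "b", "b")
group_means <- tapply(heights, group, mean)
group_means        # a: 155, b: 175
```

The main thing to notice is the shape of what comes back: a list from lapply(), and a vector labeled by group from tapply().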

Many R users who dislike the apply functions don't turn to for loops but instead install the plyr package created by Hadley Wickham. He uses what he calls the "split-apply-combine" model of dealing with data: split a collection of data into the groups you want to operate on, apply whatever function you want to each group, and combine the results back together again.
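To make the split-apply-combine idea concrete without installing anything, here is a minimal sketch of the same three steps written in base R with split(), lapply(), and unlist(). This is not plyr's code; plyr wraps the pattern in functions such as ddply(). The sales data frame and its region/amount columns are hypothetical.

```r
# Hypothetical sales data
sales <- data.frame(
  region = c("East", "West", "East", "West", "East"),
  amount = c(100, 200, 150, 50, 250)
)

# Split: one smaller data frame per region
pieces <- split(sales, sales$region)

# Apply: run a summary function on each piece
totals <- lapply(pieces, function(df) sum(df$amount))

# Combine: gather the per-region results into one data frame
result <- data.frame(
  region = names(totals),
  total  = unlist(totals),
  row.names = NULL
)
result   # East 500, West 250
```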

The plyr package is probably a step beyond this basic beginner's guide, but if you'd like to find out more, you can head to Wickham's plyr website. There's also a useful slide presentation in PDF format from Cosma Shalizi, an associate professor of statistics at Carnegie Mellon University, and Vincent Vu. Another PDF presentation is from an introduction to R workshop at Iowa State University.

R data types in brief (very brief)

Should you learn about all of R's data types and how they behave right off the bat, as a beginner? If your goal is to be an R ninja, then yes, you have to know the ins and outs of data types. But I'm assuming you're here to try generating quick plots and stats before diving in to create complex code.

To start off with the basics, here's what I'd suggest you keep in mind for now: R has multiple data types. Some of them are especially important when doing basic data work. And some functions that are quite useful for that work require your data to be in a particular type and structure.
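Here is a small sketch of what "requires your data to be in a particular type" looks like in practice, assuming some numbers have ended up stored as text (character strings). The vector values_as_text is a made-up example.

```r
# Numbers stored as text are of type character, not numeric
values_as_text <- c("3", "5", "10")   # made-up example
class(values_as_text)                 # "character"

# mean() expects numbers: on a character vector it issues a
# warning and returns NA instead of an average
mean(values_as_text)

# Converting to numeric first gives the expected result
values <- as.numeric(values_as_text)
mean(values)                          # 6
```

Checking class() on a column that behaves oddly is often the quickest way to spot this kind of type mismatch.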

More specifically, R has basic building-block data types that answer the question "Is it an integer or character or true/false?" These include integer, numeric, character, and logical. A missing or unavailable value is represented by NA, while NaN ("not a number") is the result of a mathematical operation that can't produce a valid number, such as 0/0.
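A quick way to get a feel for these types is to ask R with class(), and to test for missing values with is.na() and is.nan(). The sketch below uses only base R; the ages vector is made-up example data.

```r
# The basic types and how class() reports them
class(42L)      # "integer"   (the L suffix creates an integer)
class(42)       # "numeric"   (a double-precision number)
class("Bill")   # "character"
class(TRUE)     # "logical"

# NA marks a missing or unavailable value
ages <- c(34, NA, 51)          # made-up data
is.na(ages)                    # FALSE TRUE FALSE
mean(ages)                     # NA, because missing values propagate
mean(ages, na.rm = TRUE)       # 42.5, after dropping the NA

# NaN is what an undefined calculation produces
0 / 0                          # NaN
is.nan(0 / 0)                  # TRUE
is.na(0 / 0)                   # TRUE: is.na() flags NaN as well
```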

As mentioned in the prior section, you can have a vector with multiple elements of the same type, such as 1, 5, 7 or "Bill", "Bob", "Sue".
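Here is a minimal sketch of building those two example vectors with c() ("combine"), plus what happens if you try to mix types. The variable names numbers, people, and mixed are illustrative only.

```r
# c() builds a vector; every element shares a single type
numbers <- c(1, 5, 7)
people  <- c("Bill", "Bob", "Sue")
class(numbers)   # "numeric"
class(people)    # "character"

# Mixing types doesn't raise an error: R silently converts
# every element to the most flexible type, here character
mixed <- c(1, "Bob", TRUE)
mixed            # "1" "Bob" "TRUE"
```

That silent conversion is one reason a column you expected to be numbers can turn out to be character.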
