R is hot. Whether measured by more than 4,400 add-on packages, the 18,000-plus members of LinkedIn's R group, or the close to 80 R Meetup groups currently in existence, there can be little doubt that interest in the R statistics language, especially for data analysis, is soaring.
Why R? It's free, open source, powerful, and highly extensible. "You have a lot of prepackaged stuff that's already available, so you're standing on the shoulders of giants," Google's chief economist told the New York Times back in 2009.
Learn to use R: Your hands-on guide
Because R is a programmable environment that uses command-line scripting, you can store a series of complex data-analysis steps as an R script. That lets you reuse your analysis work on similar data more easily than if you were using a point-and-click interface, notes Hadley Wickham, author of several popular R packages and chief scientist with RStudio.
That also makes it easier for others to validate research results and check your work for errors -- an issue that made the news recently when an Excel coding error turned out to be among several flaws in an influential economics analysis known as Reinhart/Rogoff.
The error itself wasn't a surprise, blogs Christopher Gandrud, who earned a doctorate in quantitative research methodology from the London School of Economics. "Despite our best efforts we always will" make errors, he notes. "The problem is that we often use tools and practices that make it difficult to find and correct our mistakes."
Sure, you can easily examine complex formulas on a spreadsheet. But it's not nearly as easy to run multiple data sets through spreadsheet formulas to check results as it is to put several data sets through a script, he explains.
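To make the point concrete, here is a minimal sketch of running the same analysis steps over several data sets from one script. The function name `summarize_growth`, the column name `growth`, and the sample values are all made up for illustration; they are not taken from any of the analyses mentioned above.

```r
# Hypothetical analysis step: summarize one numeric column of a data frame.
summarize_growth <- function(df) {
  c(n = nrow(df), mean = mean(df$growth), median = median(df$growth))
}

# Two small, invented data sets standing in for "similar data".
datasets <- list(
  sample_a = data.frame(growth = c(1.0, 2.0, 3.0)),
  sample_b = data.frame(growth = c(2.0, 4.0, 6.0, 8.0))
)

# Apply the identical steps to every data set in one call.
# The result is a matrix with one column per data set.
results <- sapply(datasets, summarize_growth)
print(results)
```

Because the steps live in a function rather than in spreadsheet cells, checking a new data set is a matter of adding it to the list and rerunning the script, and anyone reviewing the work can read exactly what was computed.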
Indeed, the mantra of "Make sure your work is reproducible!" is a common theme among R enthusiasts.
Who uses R? Relatively high-profile users of R include:
|Used by some within the company for tasks such as analyzing user behavior.|
|Google||There are more than 500 R users at Google, according to David Smith at Revolution Analytics, doing tasks such as making online advertising more effective.|
|National Weather Service||Flood forecasts.|
|Orbitz||Statistical analysis to suggest best hotels to promote to its users.|
|Source: Revolution Analytics|
Why not R? Well, R can appear daunting at first. That's often because R syntax is different from that of many other languages, not necessarily because it's any more difficult than others.
"I have written software professionally in perhaps a dozen programming languages, and the hardest language for me to learn has been R," writes consultant John D. Cook in a Web post about R programming for those coming from other languages. "The language is actually fairly simple, but it is unconventional."
And so, this guide. Our aim here isn't R mastery, but to give you a path to start using R for basic data work: extracting key statistics from a data set, exploring a data set with basic graphics, and reshaping data to make it easier to analyze.