The R programming language is a key tool for data scientists. It is not, however, easy to learn or use. While some suggest that R and data science in general are inherently complex, there's clearly an opportunity for them to be democratized, at least to the point that business analysts can take advantage of them -- which is critical, given how important data has become to running an enterprise effectively.
Embedded in Microsoft’s acquisition last week of Revolution Analytics is the possibility that in the future you may not need to be a propeller-head to use R effectively. Just as Microsoft lowered the bar to becoming an effective system administrator and developer, so, too, may its ownership of Revolution Analytics, a leading commercial backer of R, help to close the data science skills gap that plagues the industry.
Geeks to inherit R
In some ways, Microsoft had no choice but to acquire Revolution Analytics. As much as Microsoft, Oracle, or other tech giants may wish it otherwise, big data is a big deal, and nearly all of the best big data technology is open source. This is why Microsoft has embraced Hadoop, MongoDB, and other leading big data technologies, both for internal use and within the products it licenses to others.
Buying into R, the default programming language for data scientists, makes sense.
While Python has proven popular with an increasing number of would-be data scientists, it’s still the case, as a KDNuggets poll of data science professionals reveals, that R dominates data science (used by 61 percent of respondents), compared to Python (39 percent) or SQL (37 percent).
As Gartner research director Alexander Linden finds, “A lot of innovative data scientists really favor open source components (especially Python and R) in their advanced analytics stack.” The statement is true, but it also implies R’s Achilles' heel: It’s hard to use (Bob Muenchen offers a few reasons why), a tool for the “innovative” and “advanced.”
Many have been willing to forgive R for this sin because, as Tal Yarkoni speculates, “even people who hate the way R chokes on large data sets, and its general clunkiness as a language, often [can’t] help running back to R as soon as any kind of serious data manipulation was required.”
It’s hard, but it’s powerful.
Microsoft to the rescue
But what if it could be easy and powerful? Companies like Datameer promise to democratize data science, but arguably no other company has more potential to do this than Microsoft.
Microsoft has a long history of making complicated technology simple to use. Love it or hate it, Microsoft has done more to democratize technology than any other vendor.
Could Microsoft do the same for R? Definitely maybe.
A certain amount of complexity is inherent in R, of course. Red Hat’s Dave Neary argues, “R is for statistics and numerical analysis,” requiring an “understand[ing of] the math to some degree.” He goes on to suggest, “Saying it's too hard for mere mortals is like saying a saw is too hard. [You n]eed to learn the tools.”
The promise behind the deal, however, is that Microsoft can significantly improve those tools, such that a tech-savvy nonprogrammer can do “data science.”
Or, as Henri Yandell humorously responded to my exchange with Neary, it’s like “asking if Microsoft are going to make a power saw for those too lazy to learn how to use a hand saw.” It's not a perfect analogy, of course, but I suspect many will be very happy to be given a power saw for data.
Let’s be clear: That “power saw” is very much what Microsoft seems to have in mind. While few details were offered, Joseph Sirosh, Microsoft’s corporate vice president of Machine Learning, insists that Microsoft plans to improve access to the power of R:
As their volumes of data continually grow, organizations of all kinds around the world – financial, manufacturing, health care, retail, research – need powerful analytical models to make data-driven decisions. This requires high performance computation that is “close” to the data, and scales with the business’ needs over time. At the same time, companies need to reduce the data science and analytics skills gap inside their organizations, so more employees can use and benefit from R. This acquisition is part of our effort to address these customer needs.
The plan, then, is to “empower enterprises, R developers and data scientists to more easily and cost effectively build applications and analytics solutions at scale” -- not only data scientists, not only R developers, but also the more pedestrian enterprise customers that Microsoft has sold into for decades.
Analyst Ben Kepes believes there’s promise in such “applied analytics,” and I agree. He writes, “‘Analytics’ is, for the vast majority of people, merely a concept they have no access to. Everyone has heard of ‘delivering insights,’ but few have the ability to do so. Analytics, when applied to core applications and delivered to end users, changes that.”
Assuming Microsoft can deliver, he concludes, this has the potential to result in “analytics democratized.”
Cause for concern in open source land?
While Microsoft’s ability to democratize R remains an open question, its commitment to R’s open source community is not. Not many years ago, Microsoft buying an open source company would have been impossible. The culture simply couldn’t support it.
But such has been the progress under CEO Satya Nadella that no one even smirks when David Smith, chief community officer at Revolution Analytics, declares:
For our users and customers, nothing much will change with the acquisition. We’ll continue to support and develop the Revolution R family of products — including non-Windows platforms like Mac and Linux. The free Revolution R Open project will continue to enhance open source R.
Of course they will. Not only will Microsoft tolerate it, but Microsoft will actually encourage it. It’s a new Microsoft, helping to create a new R. The two need each other, making this an exceptionally interesting development in big data.