Hey Microsoft, a rewrite of the R language is a silly idea

Acquiring Revolution Analytics gave Microsoft a distribution of the open source R language. Let's hope it doesn't go through with its rumored plans for a GPL-free rewrite of R.

R language graffiti
Credit: flickr/David Goehring

Correction: An earlier version of this article incorrectly assumed that Microsoft acquired the rights to the R Project from Revolution Analytics. As Revolution Analytics chief community officer David Smith has since stressed, "R is owned by the R Foundation," not Microsoft. The author apologizes for the inaccuracy, though he continues to believe it would be foolish of Microsoft to reimplement R to get around the GPL, which remains the article's main argument.

Microsoft has become an open source advocate, with CEO Satya Nadella declaring that Microsoft must go "open source internally" even as it engages more with external open source communities.

But not all open source, apparently.

According to sources, Microsoft is evaluating a rewrite of GPL-licensed R, the open-source language and programming environment for statistical analysis. Because Microsoft wants to embed R in a host of proprietary products, the thinking goes, it will need to ditch the GPL in favor of an alternative license.

Whether or not Microsoft is actually planning such a rewrite -- my inquiries to the company went unanswered -- it's worth asking whether doing so would make much sense.

Actually, it's not worth asking that question. The answer is no, for a host of reasons.

Freedom to be proprietary

Not that everyone agrees. Tibco, for example, rewrote R to get around the GPL and build in scalability. It feels Microsoft needs to do the same:

Microsoft should re-implement the R language much like TIBCO Software did when they created the TIBCO Enterprise Runtime for R (TERR). Otherwise, the open-source GPL license that protects the R language may limit what Microsoft can do as it keeps R from being widely embedded into commercial products.

Though Microsoft acquired Revolution Analytics, Revolution Analytics' chief community officer David Smith is quick to insist that this did not give Microsoft ownership of R. The intellectual property associated with R is owned by the R Foundation. (It's not clear how Revolution Analytics was able to distribute proprietary extensions to R without having a license thereto, and requests for clarification have gone unanswered as of this writing. But we'll leave that issue alone for now.)

Because R is licensed under the GPL and none of Microsoft's products (such as SQL Server) are, Microsoft can only benefit from R at arm's length. As Tibco points out, a recent preview shown at Microsoft Ignite had R running in a separate sandbox process alongside SQL Server, "an example of the sort of loose integration that might be required to respect the GPL terms and to address concerns over open source R's scalability, at the potential cost of performance and convenience for the end user."

It's not clear why Microsoft can't integrate proprietary code with R in the same way that Revolution Analytics did, but it is pretty clear that its attorneys don't relish the idea of trying.

Let's assume Microsoft's attorneys aren't horrified by the GPL (no, really!), and that the only way to rid itself of R's GPL license is through a rewrite. It's still a bad idea. Here's why.

Stripping open source of its value

Though many enterprises continue to mistakenly associate open source's value with "free as in beer" software, the reality is different. With any open source project, the value of the code pales in comparison to the community associated with it.

For example, when Matt Cwalinski, Sruti Cheedalla, and Kat Styons, Web developers at The Washington Post, were asked why they opted for MongoDB over Couchbase for a project, they said features factored in, but the deciding factor was community.

Community, not code.

This emphasis on community heavily factors into Redmonk analyst Stephen O'Grady's contention that the "costs (hard and soft) [associated with an R rewrite] would seem to far outweigh the benefits, all of which [have] workarounds (license, performance)." He goes further by questioning "whether embeddable R is worth [the] production costs (current and going forward), PR hit, integration issues w/ standard packages, etc."

Community, in O'Grady's reasoning, isn't just the developers contributing to the R project, though this is a big component. It's also the third parties that build around it, whose packages wouldn't work with a forked version of R.

All this seems a big price to pay to maintain Microsoft's preferred licensing regime. Especially now.

While Microsoft has billions of dollars tied up in that licensing regime, the reality is that the world is changing. As O'Grady also notes, the industry "trend is toward services-based monetization, which is not an issue for the GPL."

That is, though the GPL requires that source code, including any modifications, be made available under the same license when software is distributed in the traditional manner (e.g., on a CD or in some other tangible medium), those obligations don't kick in when the software is merely offered as a service over a network.

Given that Microsoft is helping to lead this shift, it's unclear just how long a GPL-free R would even be relevant for Microsoft. BBVA's José María San José Juarez told the Red Hat Summit audience this week that "everything will be cloud in the future, even transactional systems."

When even financial services companies see a cloudy future, we're clearly moving to a world that is safe for otherwise GPL-averse companies.

Build a real community

Even if we disregard all this and come to the conclusion that there are still good reasons for Microsoft to rewrite R to escape the GPL, there are better and worse ways to go about it.

On the "worse" side of the ledger, Microsoft doing anything on its own is a bad idea. But the company is now savvy enough with open source to grok this.

Former Pivotal chief scientist (and current CEO of a stealth startup) Milind Bhandarkar offers an ideal formula for success:

Create R foundation. Donate R to them. Foundation creates dual license. Licenses it to Microsoft to embed it in SQL server. Done.

In this way, Microsoft could have its GPL-free cake and eat it, too. Of course, questions will remain. For example, Michael Bauer, data wrangler for the Open Knowledge Foundation, wonders why Microsoft would waste time rewriting the R runtime instead of just contributing to the project.

Why subtract when you can add?

It's possible that Microsoft could form a foundation that garners even more community than R currently has, but this is a big risk, and arguably not worth it, for all the reasons already stated.

Let's be clear: R is not popular despite its license, but partly because of it. That license has helped to corral a community around the project. If Microsoft just needed the code, it could have started its own R project (let's call it the "M" programming language).

But it didn't. It bought Revolution Analytics, a company with deep expertise in R and a business devoted to extending it with proprietary code. Now it's time for Microsoft to stick with R and contribute to its success. That success won't begin by stripping out the GPL, though it could end with such an act.
