Ever since Yahoo released its then-internal Hadoop project to the Apache Software Foundation in 2007, several commercial vendors have been fighting for mindshare in the market, each trying to be viewed as the first/best/most open source/least proprietary/most enterprise-ready/etc. provider of a Hadoop distribution. In The Twelve Days of Christmas (a Data Carol), we listed nine vendors of Hadoop distros.
When I worked at Talend, we partnered with most of these players while steadfastly remaining distribution-agnostic. Of course, this meant we had to adjust our big data integration stack to support these various distributions, but thankfully the adjustments were minimal because all distributions are based on the same "commons": the Apache Hadoop project(s). (Obviously, since I left Talend I no longer have insight into their product or partnership strategy, but based on the publications I see from them, I don't believe this has changed.)
Sitting on the sidelines of this fight, it is interesting to analyze the respective messages of each of the key players.
- Cloudera: We were first on the market. And we have Doug Cutting, the inventor of Hadoop, on our team. And we have more contributors/lines of code than the others.
- Hortonworks: We are funded by Yahoo, where Hadoop was invented. And we have more contributors/lines of code than the others.
- All others: We are happy to watch you two fight over this while we do business.
- Hortonworks: We are the only pure open source player in Hadoop. We have zero proprietary extensions; everything we build is contributed back to the Apache project.
- Cloudera: Sure, we are open source, but we also have extensions that are open, although not exactly open source. We don't like the term "open core," though....
- MapR: Some of that open source stuff is not exactly failure-proof or high performance, so we rewrote it (and yes, it's proprietary). Our customers don't really care; all they want is stuff that works.
- All others: We sell Hadoop distributions alongside our closed source technologies, so it does not really matter for us. Please continue to contribute good stuff to the project so that we can use it too. We may also contribute every now and then, but it's not our core focus.
- MapR: By rewriting some of the core parts, we were able to improve reliability, performance and ease of use. We have the best Hadoop distro, even if it's not 100% Pure Hadoop.
- Cloudera: We focus on the enterprise, and every open core part we've built on top of Pure Hadoop is aimed at enterprise usability.
- Hortonworks: We keep pushing improvements to the Hadoop open source core. Everybody can use them, but they don't always do so because they have invested in competing proprietary technology. We've just announced an even better way to get everyone to use our cool stuff: the Open Data Platform.
- All others: We focus on interoperability with our proprietary stuff; that's what our clients want.
- All: We have raised gazillions of dollars of funding (numbers keep changing so I won't risk this piece being obsolete before publication by attempting to track them). We are/will be the first/second/next to be publicly traded. And we are spending a lot more money than we make (but the market seems to like that).
Obviously, this is only the beginning. Consolidation of distros is already happening, with Intel and Pivotal throwing in the towel. Will vendor consolidation be next? In any case, this is an interesting fight to watch.
This article is published as part of the IDG Contributor Network.