Hadoop runs out of gas

As big data customers flee complexity and embrace the cloud, the Hadoop vendors are sputtering

Big data remains a big deal, but that fact is somewhat obscured by the recent stumbling of its former poster children: Cloudera, Hortonworks, and MapR. Once the darlings of data, able to raise gargantuan piles of cash—Intel pumped $766 million into Cloudera in just one investment round!—the heavyweights have been forced to skinny down, whether by merging (Cloudera and Hortonworks) or cutting heads (MapR).

Meanwhile, other open source big data vendors like Elastic and MongoDB are soaring. What gives? There is, of course, a variety of reasons, among them the fact that the erstwhile Hadoop vendors bet big on the wrong audience, namely architects bound to the data center, while the market shifted to developers seeking freedom in the cloud.

Big is relative

MapR is the latest casualty of the vendors that grew fat on Hadoop’s riches. Once valued at over $1 billion, MapR recently revealed that it must lay off 122 employees (roughly 25 percent of its employee base) including its CEO, John Schroeder, other senior executives, and many engineers, while also shutting down its headquarters location, unless it can find an investor.

That investor must sign on by June 14 or MapR’s future looks dismal.

But then, so does its recent past. Over the last two years, according to LinkedIn data, MapR has shrunk 29 percent. Nor is it alone. After combining with Hortonworks (presumably because the two companies couldn’t subsist solo), Cloudera just announced calamitous earnings, forecasting $69 million to $89 million less in revenue than analysts had expected. At the same time, CEO Tom Reilly and CSO and co-founder Mike Olson both announced their resignations.

The stock promptly took a 40 percent nosedive.

It would be easy to ascribe these results to reality returning to an overhyped big data world, except that other vendors have thrived even as the Hadoop bellwethers have collapsed. The MongoDB database, for example, keeps growing in popularity, now roughly one-third as popular as Oracle and MySQL (measured across a variety of indices), up from one-tenth just five years ago. This popularity, in turn, keeps driving revenue growth for the eponymous company, which most recently saw revenue jump 78 percent.

Similarly, Elastic, the company behind the Elasticsearch distributed search and analytics engine, has doubled its workforce in the last year while seeing revenue climb 70 percent in its latest quarter. Companies have been turning to Elastic for traditional text search and much more, like Stansted Airport using Elastic’s tools to track and visualize people and baggage traffic through the airport, offering real-time analysis.

This wasn’t how the script was supposed to read. Technologies like MongoDB and Elasticsearch, and the companies behind them, were never supposed to be able to challenge Hadoop and its offspring. Yet they have. Why?

A very cloudy forecast

Well, cloud is one answer, but it’s part of a multi-faceted response. As Anaconda senior vice president Mathew Lodge has written, although Cloudera, Hortonworks, and MapR tried desperately to evolve from on-premises offerings, cloud-native options from AWS, Microsoft Azure, and Google Cloud all conspired to provide “fully integrated offerings that have a lower cost of acquisition and are cheaper to scale.” Enterprises noticed. Again, the Hadoop vendors moved as quickly as they could to build out cloud services, but they simply haven’t matched the pace of their cloud-heavy competitors.

The cloud has a cost advantage, too: Hadoop, while revolutionary for its time, is absurdly expensive compared to cloud alternatives. As Clint Sharp notes, “The main primary use case for Hadoop has always been cheap storage. [With the cloud] storage both got cheaper and the UX of S3+EMR and other services is 1000x better.” Hadoop might have been a great alternative to traditional, proprietary data warehouses, for example, but it is nowhere near as good as more modern approaches like cloud-based Snowflake.

At the same time, the cloud heralded different, new ways to deal with data. These weren’t like-for-like replacements, per se. But alternatives such as MongoDB and Elasticsearch tackled the same sorts of problems as Hadoop without the mind-numbing difficulty. As MongoDB’s Joe Drumgoole puts it, “Writing effective distributed map-reduce algorithms is hard, really hard.” Making this worse, the Hadoop vendors scrambled to add a wide array of open source add-ons (Impala! Pig! Hive! Flume!) to their Hadoop products, inventing ever more cumbersome “solution stacks” until, finally, “Nobody knows what the %&*# these Hadoop companies do,” according to one observer.
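
To give a rough sense of what Drumgoole means, here is a minimal single-process sketch of the map-reduce pattern, counting words in plain Python. It is an illustration only, not Hadoop's API: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical, and everything runs in memory on one machine. On a real cluster the shuffle step moves data between nodes, which is where much of the difficulty comes from.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    # Emit a (word, 1) pair for every word in one input split.
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    # Group values by key. In this sketch it is a dictionary;
    # on a cluster this is the step that moves data across the network.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    # Collapse each key's list of values into a single count.
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    # Chain the three phases: map every document, shuffle, then reduce.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


counts = word_count(["big data big deal", "Big cloud"])
print(counts)
```

Even this toy problem has to be recast as keys, values, and three separate phases. Anything less trivial, such as a join or an iterative algorithm, requires chaining several such jobs together.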

For some enterprises, the cost in time and attention was worth the pain. Developers tasked with “getting stuff done,” however, have increasingly opted for more straightforward alternatives.

Convenience trumps all

The out-of-the-box experience for users of Hadoop and its progeny is ugly. Contrast this with MongoDB. Former MongoDB executive Kelly Stirman identifies the MongoDB user experience as a key differentiator. How so? Tom Barber explains:

[With] MongoDB you can apt install on one server with ease and not have to mess around with a terrible VM to get going. In production, you can run it on one server. You can hook it up to a bunch of stuff without writing a bunch of code. People want databases…. MongoDB is easy to get data into, it’s also easy to get data out of.

TimescaleDB CEO Ajay Kulkarni, nodding in agreement, adds:

Developer love [is the reason MongoDB trumped Hadoop]. Mongo focused on the first-time user experience. Hadoop is notoriously hard to run. [Hadoop vendors] had a good sales pitch for enterprises but without dev love growth stalled and the market evaporated.

While it would be an overstatement to claim developer love completely accounts for the success of MongoDB and Elastic over Cloudera and MapR, it is a real factor.

Developers, Jake Kaldenbaugh reasons, started “baking” MongoDB into their modern applications. Over time, developers who had been pushing MongoDB into less-critical applications moved it into business-critical ones, with MongoDB adding functionality (like multi-document transactions) to enable more demanding use cases without making them tremendously more complicated.

So where does that leave the former giants of big data? Lodge offers the eulogy:

[A]fter a good 10 years of Cloudera and Hortonworks [and MapR] being the center of the Big Data universe, the center of gravity has moved elsewhere. The leading cloud companies don’t run large Hadoop/Spark clusters from Cloudera and Hortonworks — they run distributed cloud-scale databases and applications on top of container infrastructure. They do their machine learning in Python, R, and other languages that are not Java. Increasingly, enterprises are shifting to similar approaches because they want to reap the same speed and scale benefits. It’s time for the Hadoop and Spark world to move with the times.

This is one of the blessings and curses of open source data infrastructure innovation. It’s happening at breakneck speed, and some vendors will be broken in the process.