Enterprises don't seem to be getting any better at figuring out Hadoop, but that hasn't stopped them from dumping ever-increasing mountains of cash into it.
By Gartner's preliminary estimates, 2016 spend on Hadoop distributions reached $800 million, a 40 percent spike from 2015. Unfortunately, for all that spending, only 14 percent of enterprises actually report Hadoop deployments, barely up from 10 percent in 2015.
One bright spot: Hadoop deployments are increasingly moving to the cloud, where they may have a better chance of success.
Everybody's doing the Hadoop thing
Before you take exception to the word "Hadoop" and argue that it has been displaced by Apache Spark or other big data infrastructure, know that you're right. And wrong.
That is, in this case Gartner includes all "commercially packaged and supported editions of the open source Apache Hadoop-related projects" in its definition of "Hadoop." In other words, while the old-school HDFS and MapReduce are included in Gartner's definition, so are YARN, Pig, Hive, HBase, ZooKeeper, Avro, Flume, Kafka, Oozie, Parquet, Solr, Spark, and Sqoop.
Indeed, as Gartner analyst Merv Adrian explains, "The survey is about big data projects." Given how prominent big data has become in mainstream media, it would be tempting to think big data Hadoop projects had gone mainstream in adoption. It would also be wrong.
According to Gartner's survey data, enterprises seem stuck in a constant state of experimentation with Hadoop, never quite able to move into production.
Not only did 2016 see just a small increase in Hadoop deployments, but the pipeline leading into deployment also fell across the board. Even if we assume that somehow the word "Hadoop" is skewing the results and we need to delve into a more general big data definition, the historic numbers aren't much better.
In sum, big data has yielded big hype, but not yet big success.
To the cloud!
Well, that's not quite true. Hortonworks, for example, recently had a strong quarter, growing revenue 39 percent year over year. In 2016, the company did nearly $200 million in revenue, $126 million of it derived from subscriptions to its Hadoop platform.
Part of this success for Hortonworks, however, probably comes down to its increasing embrace of the cloud. As noted on its earnings call, roughly 25 percent of Hortonworks customers now run its software in the public cloud, up from approximately 0 percent two years ago. This is where developers want to run their software, and appeasing them is smart business.
While this shift to the cloud likely favors Amazon Web Services and Microsoft Azure far more than it helps Hortonworks, Cloudera, or MapR, it's a rising tide that will tend to lift all boats. It also may save them from leaking.
One of the big drivers for Hadoop deployments moving to the cloud is the sheer complexity of making Hadoop work. Every day there's a new Apache project to complement and accelerate innovation in Hadoop, and it's next to impossible for mainstream enterprises to keep up. For enterprises that aren't Google, keeping up with the latest and greatest in streaming analytics, for example, will "often require the use of immature, unsupported software," as Gartner notes.
In response, Gartner says, "Cloud-based delivery models also allow organizations to better absorb the constant stream of changes to the components (typically Apache projects) in the Hadoop ecosystem." The heavy lifting of upgrading a constant stream of Hadoop components is left to the cloud provider, and running in the cloud also makes it easier to manage the separation of storage and compute.
Frankly, this is where big data projects belong. As AWS product strategy chief Matt Wood told me, "Those that go out and buy expensive infrastructure find that the problem scope and domain shift really quickly. By the time they get around to answering the original question, the business has moved on." In other words, the cloud not only makes big data manageable, but it also makes it productive.
What it may not do, as mentioned, is enrich the traditional Hadoop vendors over the long term. Given that data will increasingly live on public clouds from Amazon, Microsoft, and Google, it's very possible that so-called data gravity will push enterprises to use the Hadoop services native to those platforms.