There are lies, damn lies ... and technology industry studies.
Back in May, Gartner released a survey that crapped all over the Hadoop industry: Of the 284 CIOs polled by Gartner, only 26 percent claimed to be "either deploying, piloting, or experimenting with Hadoop." Even with the small sample size and large margin of error, these were depressing numbers -- ones that contradicted the level of adoption people like me are seeing in the real world.
Well, Gartner has been wrong before. Released today, a new survey commissioned by AtScale that surveyed more than 2,100 people offers results a whole lot closer to what those of us in the field encounter. Most dramatically, 76 percent of respondents said they plan to use Hadoop -- or are already using it and plan to use it more.
Naturally, the AtScale numbers need to be taken with a grain of salt: AtScale is a Hadoop solutions provider and the survey was conducted on its behalf. But anecdotally, at least, I can tell you that the results paint a picture a closer to reality than Gartner's doom and gloom.
Hadoop's killer app
Some of you might find this shocking, but the AtScale survey indicates that Hadoop's killer app is business intelligence -- for 69 percent of those planning to use Hadoop and 65 percent of those already using it. If that's a surprise, then you didn't read even the top item in my article "The 7 most common Hadoop and Spark projects."
Most companies don't have "big data" -- only many new unstructured or semistructured data sources -- and they'd like to gain insight by aggregating them and hooking up a visualization tool. In fact, according to the study, most want to gain insight using Tableau or Excel. If they're already using Hadoop, they are probably also working with Tableau (51 percent). If they aren't, they'd like to use Excel (60 percent).
This matches exactly what I see in the field. My company's bread and butter is building data lakes (or enterprise data hubs, if you prefer). As the survey confirms, these new Hadoop-based systems generally don't replace Teradata or Netezza. Instead, customers either want to augment their existing MPP to handle new types of data or they don't have MPP in place at all. In my experience, companies find their MPP systems can't scale the way they hoped -- and discover they can shove Hadoop on regular hardware (or on Amazon) and add more nodes as they grow.
According to the study, the lower cost of Hadoop solutions isn't the main attraction for most companies. But cost and scale are always connected. If you're looking to do BI and analytics today, you probably don't think "let me buy this weird proprietary hardware columnar database thing" as a first step. In fact, if you draw the architecture of a Netezza next to the architecture of Hive or maybe HBase plus Phoenix, you'll see a very similar structure.
What does Netezza's owner IBM think? Well, it thinks about Spark.
Self-service is the goal
Most companies want to achieve a level of self-service from Hadoop. According to the study, the companies that have achieved significant business value have already reached some level of self-service.
Self-service has multiple meanings. On the one hand, you need fewer people involved in managing Hadoop. On the other hand, you need a sufficient amount of data in the lake so that a new feed isn't needed for each new report or dashboard. You also need views and general structure around it to make sure a mere mortal can query it with SQL. Yes, the main way people practice self-service is with SQL tools.
According to the study, most people haven't achieved self-service and thus haven't achieved the tangible value they were looking for.
Anemic 10-node clusters are losers
"Hello world" in Hadoop 2 takes 12 nodes. Anything smaller and you get a really slow version of what you already had in SQL Server. According to the study, the people who have larger clusters have achieved more value.
This isn't shocking. I've mentioned more than a few times that Hive is slow but scales well, and that can be said of other Hadoop technologies. If you have a 10-node cluster, it's probably barely functional. Of course you didn't achieve value.
Secondly, if revenue generation (14 percent) or scale-out (37 percent) are your main business drivers rather than cost, but you don't actually scale out -- then of course you don't achieve value. I'd say this connects to another finding of the survey, which I've covered many times before: If you have an executive mandate, not sponsorship alone, you have a 20 percent better chance of achieving value. In my experience, having an executive mandate usually results in a larger cluster.
Exploring the verticals
I made my bones as an open source developer, but today, I do a lot of what could be called sales or sales engineering. Doing this requires fun exercises, such as: "Which industries do we want to focus on?" I came up with financial services, health care, retail, and manufacturing. This was mainly a function of what we'd done before, where we are, and who calls us the most. According to the study, retail didn't make the short list of industries using Hadoop, despite producing many early success stories.
Manufacturing, consulting, telecommunications, financial services and health care all made the list. In my opinion, the opportunities are growing quickest in financial services and health care.
You'd expect financial services companies to be relatively mature users of Hadoop, but that seems to hold true only for the top-tier banks. The next level of clients are barely kicking the tires, but want what the big boys have. Meanwhile, the Affordable Care Act has driven the need for "meaningful use" of electronic medical records. This involves data from disparate systems -- for infection control, population health management, and so on.
Actually deriving meaningful information means consolidating data from multiple data sources. That sounds a lot like data consolidation, data lakes, and BI, doesn't it?
The talent gap
If Gartner had been right about Hadoop stalling, it wouldn't be such a problem to find people with Hadoop skills. I can tell you recruiting experienced Hadoop people is not easy, and once you train them, you have to pay them well and keep them engaged as 1,000 recruiters descend upon them.
According to AtScale's survey, 61 percent of respondents view the talent pool as the biggest challenge to adoption. This doesn't change once customers adopt Hadoop, although they find that management, security, performance, governance, and accessibility are bigger issues than they realized.
A brighter picture
In the AtScale study, 49 percent of respondents have already achieved value and 45 percent are optimistic about achieving it. Only 6 percent are pessimistic, and 3 percent plan to use Hadoop less in the future. Are these numbers too rosy to be believed? Probably.
Nonetheless, I'll bask in the glow of a survey that reflects what I've seen in the field: Brisk and constant adoption -- mainly for consolidation -- and a rapidly maturing technology that people get value from.
AtScale plans to do a follow-up survey. I hope to see a deeper view of companies that never bought Teradata and Netezza, how companies are using Spark, and whether they are buying hardware for Hadoop. Meanwhile, I'd love to hear from others in the Hadoop community about adoption levels, challenges, and opportunities going forward.