When it comes to big data, vendors are saying one thing, and organizations are doing something very different.
According to the first edition of Dresner Advisory Services' Big Data Analytics Market Study, the big data industry may talk up the importance of big data, but customers don't have a lot of plans to adopt it -- at least, not in the form that Hadoop currently presents.
Hadoop? Maybe someday
Around 3,000 organizations were polled, and only 17 percent "actively use big data in their organization today," with big data meaning Hadoop; 47 percent "may adopt big data in the future."
That leaves 36 percent who have no professed plans for big data -- and 69 percent of potential future adopters don't expect to do anything about it until after 2016.
In fact, big data in the abstract ranked relatively low when compared to other common business intelligence technologies. The ones that ranked highest were all about accessing and visualizing data: dashboards, end-user data self-service, discovery, and data warehousing. Big data ranked in the bottom third of the list, and big data search technologies like Apache Solr or Elasticsearch were uniformly unimportant to those customers.
Organizations that used Hadoop ranked it as most relevant for optimizing data warehouses and analyzing customer behavior; fraud detection and clickstream analytics ranked far lower. That pattern held true across industries, departments, sizes of organizations, and so on.
"This reinforces our view," stated the report, "that big data is presently a large-organization pursuit meant to lower cost, complexity, and time to benefit rather than innovate with new data sources."
No single big data distribution stood out for customers, either. Cloudera, Amazon, Hortonworks, and MapR all ranked about the same in terms of their importance -- 40 percent or more of the respondents ranked them as "not important."
This led into the biggest contrast described by the report: The difference between customer preferences and vendor perceptions. Vendors ranked big data as massively important (59 percent "critically" important, 23 percent "very" important), way out of phase with self-reported customer perceptions and usage.
"We do not believe this incongruity is entirely unreasonable in the current market," stated the report, pointing out that big data solutions are still moving targets. "While untold volumes of data are already coming under management today with implications for business, our respondent base tells us that mainstream big data business support has not yet arrived and is not a high or immediate priority."
Among the technologies under Hadoop's umbrella, the two that mattered most came as little surprise. Venerable MapReduce led the pack, followed almost immediately by Spark. Almost everything else, from YARN down through to Mesos and Tachyon, trailed considerably.
Spark's high placement corresponds with the high position of Spark MLlib in the list of machine-learning/big data analytics technologies ranked by the respondents. It also explains the high degree of current and near-future support for Spark in commercial distributions of Hadoop.
All this squares with the possibility that interest in big data is manifesting more as an interest in specific technologies within Hadoop -- Spark, in particular -- and not Hadoop generally. The interest in Spark may also correlate with the high ranking of, say, self-service approaches to data. Spark enables that, albeit in a developer, rather than a user, context.
With machine learning, more than half the respondents dismissed Spark MLlib and other related technologies (RHIPE, Mahout, Oryx, Myrrix) as "not important." This does not indicate a general indifference to machine learning, but rather hints that the use of machine learning with big data is still in its relative infancy -- much like the rest of big data.