Devs will lead us to the big data payoff at last

Enterprises have gotten little satisfaction from their early adventures in big data, so developers are charting their own course in the cloud


In 2011, McKinsey & Co. published a study trumpeting that "the use of big data will underpin new waves of productivity growth and consumer surplus" and called out five areas ripe for a big data bonanza. In personal location data, for example, McKinsey projected a $600 billion increase in economic surplus for consumers. In health care, $300 billion in additional annual value was waiting for that next Hadoop batch process to run.

Five years later, according to a follow-up McKinsey report, we're still waiting for the hype to be fulfilled. A big part of the problem, the report intones, is, well, us: "Developing the right business processes and building capabilities, including both data infrastructure and talent" is hard and mostly unrealized. All that work with Hadoop, Spark, Hive, Kafka, and so on has produced less benefit than we thought it would.

In part, that's because keeping up with all that open source software and stitching it together is a full-time job in itself. But you can also blame the bugbear that stalks every enterprise: institutional inertia. Not to worry, though: The same developers who made open source the lingua franca of enterprise development are now making big data a reality through the public cloud.

Paltry big data progress

On the surface, the numbers look pretty good. According to a recent Syncsort survey, a majority of respondents (62 percent) are looking to Hadoop for advanced/predictive analytics, with data discovery and visualization (57 percent) also commanding attention.

Yet when you examine this investment more closely, a comparatively modest return emerges in the real world. By McKinsey's estimates, we're still falling short for a variety of reasons:

  • Location-based data has seen 50 to 60 percent of potential value captured, mainly because not everyone can afford a GPS-enabled smartphone
  • In U.S. retail, we're seeing 30 to 40 percent, due to a lack of analytical talent and an abundance of still-siloed data
  • Manufacturing comes in at 20 to 30 percent, again because data remains siloed in legacy IT systems and because management remains unconvinced that big data will drive big returns
  • U.S. health care limps along at a dismal 10 to 20 percent, beset by poor interoperability and data sharing, along with a paucity of proof that clinical utility will result
  • The E.U. public sector also lags at 10 to 20 percent, thanks to an analytics talent shortage and data siloed in various government agencies

These aren't the only areas measured by McKinsey, but they provide a good sampling of big data's impact across a range of industries. To date, that impact has been muted. This brings us to the biggest obstacle to big data's progress: culture. As the report's authors put it:

Adapting to an era of data-driven decision making is not always a simple proposition. Some companies have invested heavily in technology but have not yet changed their organizations so they can make the most of these investments. Many are struggling to develop the talent, business processes, and organizational muscle to capture real value from analytics.

Given that people are the primary problem holding up big data's progress, you could be forgiven for abandoning all hope.

Big data's cloudy future

Nonetheless, things may be getting better. For example, in a recent AtScale survey of more than 2,500 data professionals across 1,400 companies and 77 countries, roughly 20 percent of respondents reported clusters of more than 100 nodes, a full 74 percent of which are in production. This represents double-digit year-over-year growth.

It's even more encouraging to see where these nodes are running, which probably accounts for the increase in success rates. According to the same survey, more than half of respondents run their big data workloads in the cloud today, and 72 percent plan to do so going forward. This aligns with anecdotal data from Gartner that interest in data lakes has mushroomed, along with a propensity to build those lakes in public clouds.

This makes sense. Data science is, by its very nature, a matter of asking questions of our data to glean insight, and that requires a flexible approach, so the infrastructure powering our big data workloads needs to enable this flexibility. In an interview, AWS product chief Matt Wood made it clear that because "your resource mix is continually evolving, if you buy infrastructure it's almost immediately irrelevant to your business because it's frozen in time."

Infrastructure elasticity is imperative to successful big data projects. Apparently, more and more enterprises have gotten this memo and are building accordingly. Perhaps not surprisingly, this shift in culture isn't happening top-down; rather, it's a bottom-up, developer-driven phenomenon.

What should enterprises do? Ironically, it's more a matter of what they shouldn't do: obstruct developers. In short, the best way to ensure an enterprise gets the most from its data is to get out of the way of its developers. They're already taking advantage of the latest and greatest big data technologies in the cloud.