Big data is dead -- long live big data

Soon, we'll see 'prepackaged' applications that incorporate the distributed processing, machine learning, and analytics of today's overhyped, custom-made solutions

For the last few years we've talked endlessly about big data, led by Hadoop and now Spark. The next round of hype is all about applying machine learning to big data, but that's merely a way to sell AI and analytics to people without using those dirty words.

In truth, the big data era is rapidly coming to a close. You've probably seen media reports of big data pullbacks -- which, I suppose, puts us in the trough of disillusionment in Gartner's famous hype cycle.

Now is the point where big data "ends" and actual application of the technology begins.

For the industry, this means there will be fewer "let's roll out the platform and see what happens" projects. The decision makers are going to take a more rational approach, as they should, and start with a business problem first. This means even the platform companies are talking about "solutions."

Standard solutions for actual problems

The next big step is analyzing problems, finding patterns, and creating packaged solutions to those problems.

We already see this in the finance industry with the latest generation of distributed fraud detection packages wrapped up and ready to go. Fraud detection software isn't new, but distributing it at Hadoop and/or cloud scale is pretty fresh. Not only is finance happening faster, but so is fraud. For years, there has been a missile gap -- and the industry was losing. Now they're fighting back, and Hadoop, Spark, and other modern tools are the firepower behind a new arsenal.

Custom-built solutions using the next wave of technology aren't enough. Fraud detection for credit cards isn't that different from fraud detection for invoicing, insurance, or other common business applications. The next big wave isn't to write superspecialized apps for very specific industries, but to identify the "distributed big data patterns" for solving common problems that exist across lines of business.

Sure, building custom solutions where everyone solves similar problems in different ways will persist for a while. But the future is finding commonality, developing patterns, and spreading them across lines of business -- that is, applying this new technology of massive distribution and cost-effective scale without blinders on. In the end, we customize it, use the right terms, and add the twists, but designing pluggable algorithms in software that don't have to be written over and over again is what we're supposed to be good at, right?
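
To make "pluggable" concrete, here is a minimal sketch in Python of one detection routine reused across two lines of business. Everything in it is hypothetical and chosen for illustration: the `Event` record, the `from_card_txn` and `from_invoice` adapters, the field names, and the simple z-score rule, which stands in for whatever a real fraud package would use. It also runs in a single process; a production version would sit on a distributed engine.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Event:
    """Line-of-business-neutral record (hypothetical): who acted, and for how much."""
    entity: str
    amount: float


def zscore_outliers(events: Iterable[Event], threshold: float = 2.0) -> List[Event]:
    """Generic pattern: flag amounts far from the same entity's own average.

    A stand-in rule for illustration, not a real fraud model.
    """
    by_entity: dict = {}
    for event in events:
        by_entity.setdefault(event.entity, []).append(event)

    flagged: List[Event] = []
    for group in by_entity.values():
        amounts = [e.amount for e in group]
        mu, sigma = mean(amounts), pstdev(amounts)
        if sigma == 0:  # all amounts identical: nothing stands out
            continue
        flagged.extend(e for e in group if abs(e.amount - mu) / sigma > threshold)
    return flagged


# The only business-specific code: thin adapters onto the shared record.
def from_card_txn(row: dict) -> Event:
    return Event(entity=row["card_id"], amount=row["charge"])


def from_invoice(row: dict) -> Event:
    return Event(entity=row["vendor"], amount=row["total"])


def detect(rows: Iterable[dict],
           adapter: Callable[[dict], Event],
           detector: Callable[[Iterable[Event]], List[Event]] = zscore_outliers) -> List[Event]:
    """Run any detector over any line of business, given an adapter."""
    return detector([adapter(row) for row in rows])


if __name__ == "__main__":
    cards = [{"card_id": "c1", "charge": c} for c in (10, 12, 11, 9, 10, 500)]
    invoices = [{"vendor": "acme", "total": t} for t in (100, 105, 95, 102, 98, 5000)]
    print(detect(cards, from_card_txn))
    print(detect(invoices, from_invoice))
```

The detection logic is written once; moving from credit cards to invoicing means writing a few-line adapter, and swapping in a better algorithm means passing a different `detector`.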

We've seen this before. Decades ago, accounting software was a hot topic. While you can still occasionally find specialized accounting software for specific businesses, most big companies use a prepackaged solution that's customized to some degree or has a plug-in specific to the industry in question. It seldom occurs to a skilled CIO or CTO to write an accounting package for a line of business, let alone one specific to the company. They buy off the shelf, even though there are no more shelves of software.

The next big leap is going "data driven" and using "machine learning" through a series of software package acquisitions and trivial integration. It might be driven by big data in the back end, but "big data" will be like Ethernet cards: a given, but not a hot topic of conversation.
