4 reasons big data projects fail—and 4 ways to succeed

Nearly all big data projects end up in failure, despite all the mature technology available. Here's how to make big data efforts actually succeed


Big data projects are, well, big in size and scope, often very ambitious, and all too often, complete failures. In 2016, Gartner estimated that 60 percent of big data projects failed. A year later, Gartner analyst Nick Heudecker‏ said his company was "too conservative" with its 60 percent estimate and put the failure rate at closer to 85 percent. Today, he says nothing has changed.

Gartner isn’t alone in that assessment. Long-time Microsoft executive and (until recently) Snowflake Computing CEO Bob Muglia told the analytics site Datanami, “I can’t find a happy Hadoop customer. It’s sort of as simple as that. … The number of customers who have actually successfully tamed Hadoop is probably fewer than 20 and it might be fewer than ten. That’s just nuts given how long that product, that technology has been in the market, and how much general industry energy has gone into it.” Hadoop, of course, is the engine that launched the big data mania.

Other people familiar with big data also say the problem remains real, severe, and not entirely one of technology. In fact, technology is a minor cause of failure relative to the real culprits. Here are the four key reasons that big data projects fail—and four key ways in which you can succeed.

Big data problem No. 1: Poor integration

Heudecker said there is one major technological problem behind big data failures, and that is integrating siloed data from multiple sources to get the insights companies want. Building connections to siloed, legacy systems is simply not easy. Integration costs are five to ten times the cost of the software itself, he said. “The biggest problem is simple integration: How do you link multiple data sources together to get some sort of outcome? A lot go the data lake route and think if I link everything to something, magic will happen. That’s not the case,” he said.
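The integration Heudecker describes boils down to linking records from different silos on a shared key. A minimal sketch, using invented CRM and billing data (all names here are hypothetical, not from any product):

```python
# Two siloed sources that happen to share a customer-id key.
crm = {"c1": {"name": "Acme"}, "c2": {"name": "Globex"}}
billing = {"c1": {"balance": 120.0}, "c3": {"balance": 7.5}}

# An explicit join on the shared key keeps the linkage between sources;
# only customers present in both silos produce an integrated record.
joined = {
    cid: {**crm[cid], **billing[cid]}
    for cid in crm.keys() & billing.keys()
}
print(joined)  # only c1 appears in both silos
```

Dumping both sources into a lake without that explicit join is exactly how the linkage gets lost.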

Siloed data is part of the problem. Clients have told him they pulled data from systems of record into a common environment like a data lake and couldn’t figure out what the values meant. “When you pull data into a data lake, how do you know what that number 3 means?” Heudecker asked.

Because they're working in silos or creating data lakes that are just data swamps, they're just scratching the surface of what they could accomplish, said Alan Morrison, a senior research fellow with PwC. “They don't understand all the relationships in data that need to be mined or inferred and made explicit so machines can adequately interpret that data. They need to create a knowledge graph layer so that machines can interpret all the instance data that's mapped underneath. Otherwise, you've just got a data lake that's a data swamp,” he said.
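Morrison’s knowledge-graph layer can be pictured as explicit subject–predicate–object triples sitting over the raw instance data. A minimal sketch, with hypothetical identifiers (nothing here comes from the article):

```python
# Minimal sketch of a knowledge-graph layer: (subject, predicate, object)
# triples make relationships explicit so a machine can interpret a bare
# value like 3. All identifiers here are hypothetical.
triples = {
    ("order-17", "hasQuantity", 3),
    ("order-17", "hasProduct", "sku-991"),
    ("hasQuantity", "unit", "items"),
}

def describe(subject):
    """Collect everything the graph asserts about one subject."""
    return {p: o for s, p, o in triples if s == subject}

# In a raw lake, 3 is just a number; in the graph it is the quantity
# of sku-991 on order-17.
print(describe("order-17"))
```

This is the difference Heudecker points at with “what does that number 3 mean?”: the graph carries the relationships that let a machine answer.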

Big data problem No. 2: Undefined goals

You would think most people undertaking a big data project would actually have a goal in mind, but a surprising number don’t. They just launch the project with the goal as an afterthought.

“You have to scope the problem well. People think they can connect structured and unstructured data and get the insight you need. You have to define the problem well up front. What’s the insight you want to get? It’s having a clear definition of the problem and defining it well up front,” said Ray Christopher, product marketing manager with Talend, a data-integration software company.

Joshua Greenbaum, a principal analyst at Enterprise Application Consulting, said part of what has bedeviled both big data and data warehousing projects is that the main guiding criterion is typically the accumulation of large amounts of data, not the solving of a discrete business problem.

“If you pull together large amounts of data you get a data dump. I call it a sanitary landfill. Dumps are not a good place to find solutions,” Greenbaum said. “I always tell clients decide what discrete business problem needs to be solved first and go with that, and then look at quality of data available and solve the data problem once the business problem has been identified.”

“Why do most big data projects fail? For starters, most big data project leaders lack vision,” said PwC’s Morrison. “Enterprises are confused about big data. Most just think about numerical data, or black-box NLP and recognition engines that do simple text mining and other kinds of pattern recognition.”

Big data problem No. 3: The skills gap

Too often, companies think the in-house skills they have built for data warehousing will translate to big data, when that is clearly not the case. For starters, data warehousing and big data handle data in opposite fashions: Data warehousing uses schema on write, which means the data is cleaned, processed, structured, and organized before it ever goes into the data warehouse.

In big data, data is accumulated first and schema on read is applied, meaning the data is processed as it is read. So if data processing runs in the opposite direction from one methodology to the other, you can bet the skills and tools do as well. And that’s just one example.
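The schema-on-write versus schema-on-read contrast can be sketched in a few lines. This is an illustrative toy, not any particular product’s behavior, and the sample rows are invented:

```python
import json

# Hypothetical raw events as they might arrive from a source system.
raw_rows = [
    {"user": "alice", "amount": "19.99"},
    {"user": "bob", "amount": "bad-value"},
]

# Schema on write (data-warehouse style): validate and coerce types
# BEFORE storage; malformed rows are rejected at load time.
def write_path(rows):
    clean = []
    for row in rows:
        try:
            clean.append({"user": row["user"], "amount": float(row["amount"])})
        except ValueError:
            pass  # bad data never enters the warehouse
    return clean

# Schema on read (big data style): store the raw payload untouched and
# impose structure only when a query runs.
raw_store = [json.dumps(r) for r in raw_rows]

def read_path(store):
    for blob in store:
        row = json.loads(blob)
        try:
            yield row["user"], float(row["amount"])
        except ValueError:
            continue  # interpretation errors surface at query time

print(write_path(raw_rows))
print(list(read_path(raw_store)))
```

Both paths end up skipping the malformed row, but at different moments: the write path rejects it before storage, while the read path stores it raw and only trips over it at query time. The skills to debug each are correspondingly different.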

“Skills are always going to be a challenge. If we’re talking about big data 30 years from now, there will still be a challenge,” Heudecker said. “A lot of people hang their hat on Hadoop. My clients are challenged finding Hadoop resources. Spark is a little better because that stack is smaller and easier to train up. Hadoop is dozens of software components.”

Big data problem No. 4: The tech generation gap

Big data projects frequently take data from older data silos and try to merge it with new data sources, like sensors or web traffic or social media. That’s not entirely the fault of the enterprise, which collected that data in an era before big data analytics was conceived of, but it is a problem nonetheless.

“Perhaps the biggest skill missing is the skill to understand how to blend these two stakeholders and get them to work together to solve complex problems,” consultant Greenbaum said. “Data silos can be a barrier to big data projects because there is no standard anything. So when they start to look at planning, they find these systems have not been implemented in any fashion that anticipated this data would be reused,” he said.

“With different architectures you need to do processing differently,” said Talend’s Christopher. “Tech skills and architecture differences are a common reason why you can’t take current tools for an on-premises data warehouse and integrate them with a big data project—because those technologies will become too costly to process new data. So you need Hadoop and Spark, and you need to learn new languages.”

Big data solution No. 1: Plan ahead

It’s an old cliché but applicable here: If you fail to plan, plan to fail. “Successful companies are the ones who have an outcome,” Gartner’s Heudecker said. “Pick something small and achievable and new. Don’t take a legacy use case, because you get limitations.”

“They need to think about the data first, and model their organizations in a machine-readable way so the data serves that organization,” PwC’s Morrison said.

Big data solution No. 2: Work together

All too often, stakeholders are left out of big data projects—the very people who would use the results. If all of the stakeholders collaborate, they can overcome many roadblocks, Heudecker said. “If the skilled people are working together and working with the business side to deliver actionable outcome, that can help,” he said.

Heudecker noted that the companies succeeding in big data invest heavily in the necessary skills. He sees this the most in data-driven companies, like financial services, Uber, Lyft, and Netflix, where the company’s fortune is based on having good, actionable data.

“Make it a team sport to help curate and collect data and cleanse it. Doing that can increase the integrity of the data as well,” Talend’s Christopher said.

Big data solution No. 3: Focus

People seem to have the mindset that a big data project needs to be massive and ambitious. But as with anything you are learning for the first time, the best way to succeed is to start small, then gradually expand in ambition and scope.

“They should very narrowly define what they are doing,” Heudecker said. “They should pick a problem domain and own it, like fraud detection, microsegmenting customers, or figuring out what new product to introduce in a Millennial marketplace.”

“At the end of the day, you have to ask what insight you want or what business process is to be digitized,” said Christopher. “You don’t just throw technology at a business problem; you have to define it up front. The data lake is a necessity, but you don’t want to collect data if it’s not going to be used by anyone in the business.”

In many cases, that also means not overestimating the complexity of your own company. “In every company I’ve ever studied, there are only a few hundred key concepts and relationships that the entire business runs on. Once you understand that, you realize all of these millions of distinctions are just slight variations of those few hundred important things,” PwC’s Morrison said. “In fact, you discover that many of the slight variations aren’t variations at all. They’re really the same things with different names, different structures, or different labels,” he added.

Big data solution No. 4: Jettison the legacy

While you may want to use those terabytes of data collected and stored in your data warehouse, the fact is you might be better served just focusing on newly gathered data in storage systems designed for big data and designed to be unsiloed.

“I would definitely advise not necessarily being beholden to an existing technology infrastructure just because your company has a license for it,” consultant Greenbaum said. “Often, new complex problems may require new complex solutions. Falling back on old tools that have been around the corporation for a decade isn’t the right way to go. Many companies use old tools, and it kills the project.”

Morrison noted, “Enterprises need to stop getting their feet tangled in their own underwear and just jettison the legacy architecture that creates more silos.” He also said they need to stop expecting vendors to solve their complex system problems for them. “For decades, many seem to assume they can buy their way out of a big data problem. Any big data problem is a systemic problem. When it comes to any complex systems change, you have to build your way out,” he said.

Copyright © 2019 IDG Communications, Inc.