How to avoid big data analytics failures

Follow these six best practices to blow past the competition, generate new revenue sources, and better serve customers

Big data and analytics initiatives can be game-changing, giving you insights to help blow past the competition, generate new revenue sources, and better serve customers.

Big data and analytics initiatives can also be colossal failures, resulting in lots of wasted money and time—not to mention the loss of talented technology professionals who become fed up at frustrating management blunders.

How can you avoid big data failures? Some of the best practices are the obvious ones from a basic business management standpoint: be sure to have executive buy-in from the most senior levels of the company, ensure adequate funding for all the technology investments that will be needed, and bring in the needed expertise and/or have good training in place. If you don’t address these basics first, nothing else really matters.

But assuming that you have done the basics, what separates success from failure in big data analytics is how you deal with the technical issues and challenges of big data analytics. Here’s what you can do to stay on the success side of the equation.

Carefully choose your big data analytics tools

Many technology failures stem from the fact that companies buy and implement products that prove to be an awful fit for what they are trying to accomplish. Any vendor can slap the words “big data” or “advanced analytics” onto their product descriptions to try to take advantage of the high level of hype around these terms.

But products differ considerably not only in quality and effectiveness but also in focus. Thus, even if you choose a technically strong product, it may not be good at what you actually need done.

There are some basic capabilities that nearly all big data analytics efforts require, such as data transformation and the storage architecture (think Hadoop and Apache Spark). But there are also multiple niches in big data analytics, and you have to get products for the niches your technology strategy actually involves. These niches include process mining, predictive analytics, real-time solutions, artificial intelligence, and business intelligence dashboards.

Before deciding to purchase any big data analytics product or storage platform, you need to figure out what the real business needs and problems are, then select products designed to effectively address those specific issues.

For example, you would opt for cognitive big data products, such as analytics that use artificial intelligence, when analyzing unstructured data, because of the complexity of compiling such huge data sets. But you would not use cognitive tools for structured and standardized data; for that, you can deploy one of many analytics products that generate quality insights in real time at a more reasonable price, says Israel Exposito, global process lead for big data at telecommunications company Vodafone.

It’s wise to run proofs of concept using at least two products before you choose the one for your production environment, Exposito says. The product also should be able to interface with your relevant enterprise platforms.

Every big data analytics tool requires developing a data model in the back-end system. This is the most important part of the project. So, you need to make sure that system integrators and business subject matter experts work hand in hand in this effort. Take your time and do it right the first time.

It’s important to remember that the right data should always be available and translated into business language, so users will fully understand the output and thus can use it to drive opportunities or process improvements.
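As a minimal sketch of what that back-end data model work can look like, assuming a Spark-based stack, the snippet below defines an explicit schema that system integrators and business subject matter experts can review together, with business-friendly column names. The table, field names, and path are hypothetical examples, not from any vendor or company cited here.

from pyspark.sql import SparkSession
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)

spark = SparkSession.builder.appName("data-model-sketch").getOrCreate()

# Agreeing on names, types, and nullability up front is the "data model" work
# described above; changing it after users depend on it is far more expensive.
order_schema = StructType([
    StructField("order_id",     StringType(),    nullable=False),
    StructField("customer_id",  StringType(),    nullable=False),
    StructField("order_total",  DoubleType(),    nullable=True),
    StructField("order_placed", TimestampType(), nullable=True),
])

# Enforcing the agreed schema at load time surfaces mismatches immediately,
# instead of letting bad records drift into reports. (Hypothetical path.)
orders = spark.read.schema(order_schema).json("/landing/orders/")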

Make sure the tools are easy to use

Big data and advanced analytics are complex, but the products that business users rely on to access and make sense of the data don’t need to be.

Provide simple, effective tools for business analytics teams to use for data discovery as well as analytics and visualizations.

Finding the right combination of tools was difficult for domain registrar GoDaddy, says Sharon Graves, the company’s business intelligence tools evangelist for enterprise data. It had to be simple for quick visualizations, yet capable enough for deep-dive analytics. GoDaddy was able to find products that let business users easily find the appropriate data and then generate visualizations on their own. That freed up the analytics teams to perform more advanced analytics.

Above all, don’t provide programmer-level tools to nontechnical business users. They’ll become frustrated and might resort to using their previous tools, which aren’t really up to the job (otherwise, you wouldn’t have a big data analytics project).

Align the project—and the data—with the actual business need

Another reason why big data analytics efforts might fail is that they end up being a solution in search of a problem that doesn’t really exist. That’s why you must frame the business challenges and needs you’re looking to address as the right analytical problem, says Shanji Xiong, chief scientist in the Global Data Labs at information services provider Experian.

A key is to involve subject matter experts with strong analytical backgrounds early in the project to work with data scientists to define the problem. 

Here’s an example from Experian’s own big data analytics initiative. When developing analytics solutions to combat identity fraud, the challenge could be to assess if a combination of personal identification information (PII) such as names, addresses, and Social Security numbers is legitimate. Or the challenge might be to assess whether a customer applying for a loan using a set of identities is the legitimate owner of the identities. Or both challenges might exist.

The first challenge is a “synthetic identity” problem, and it needs an analytics model that assesses the risk of synthetic identity, developed at the consumer or PII level, Xiong says. The second challenge is an application fraud problem, and the scores to assess the risk of fraud need to be developed at the application level. Experian had to understand that these were different problems, even though they may have initially been seen as the same problem stated differently, and then create the right models and analyses to address each.

When a set of PII is presented to two financial institutions to apply for loans, a usual requirement is to return the same score for synthetic risk, but that is not usually a required feature for application fraud scores, Xiong says.

The right algorithms must be applied to the right data to extract business intelligence and to make accurate predictions. Collecting and including relevant data sets in the modeling process is almost always more important than fine-tuning machine learning algorithms, and so the data effort should be treated as a top priority.
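As a rough illustration of that priority, here is a minimal sketch in Python: most of the work goes into joining the relevant data sets onto the modeling unit, while the model itself stays deliberately simple. The file names, columns, and fraud label are hypothetical and stand in for whatever sources your problem actually requires; this is not Experian’s method.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

applications = pd.read_csv("applications.csv")       # one row per loan application
identity_hist = pd.read_csv("identity_history.csv")  # prior PII usage per applicant
bureau = pd.read_csv("bureau_attributes.csv")         # external bureau attributes

# The "data effort": pulling every relevant source onto the modeling unit
# (here, the application) usually moves accuracy more than algorithm tuning does.
model_table = (applications
               .merge(identity_hist, on="applicant_id", how="left")
               .merge(bureau, on="applicant_id", how="left")
               .fillna(0))

X = model_table.drop(columns=["applicant_id", "is_fraud"])
y = model_table["is_fraud"]
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# A plain, untuned classifier is often good enough once the right data is assembled.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("holdout accuracy:", clf.score(X_test, y_test))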

Build a data lake, and don’t skimp on bandwidth

As the term implies, big data involves tremendous amounts of data. In the past, very few organizations could store so much data, much less organize and analyze it. But today, high-performance storage technologies and large-scale parallel processing are widely available, both in the cloud and via on-premises systems.

However, storage itself is not enough. You need a way to handle disparate types of data that feed into your big data analytics. That was the genius of Apache Hadoop, which allowed the storage and mapping of huge, disparate data sets. Such repositories are often called data lakes. An actual lake is typically fed by multiple streams and contains many species of plants, fish, and other animals; a data lake is typically fed by multiple data sources and contains many types of data.

But a data lake should not be a dumping ground for data. You need to be thoughtful about how you aggregate data, extending attributes in a meaningful way, says Jay Etchings, director of research computing at Arizona State University. The data can be disparate, but how it is transformed for your analytics using tools like MapReduce and Apache Spark should be done with a solid data architecture in place.

Create a data lake where ingestion, indexing, and normalization are well-planned components of the big data strategy. Without a clearly understood and articulated blueprint, most data-intensive initiatives are doomed to fail, Etchings says.
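To make that blueprint concrete, here is a minimal PySpark sketch of planned ingestion: raw data lands as-is, is normalized before it reaches the curated zone, and is written out partitioned so downstream queries can prune what they scan. The paths and column names are assumptions for illustration only.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("lake-ingest-sketch").getOrCreate()

# Raw zone: data arrives from disparate sources in whatever shape it came in.
raw_events = spark.read.json("s3a://lake/raw/clickstream/2017/")

# Normalization step: consistent timestamps, consistent keys, no duplicates.
curated = (raw_events
           .withColumn("event_time", F.to_timestamp("event_time"))
           .withColumn("user_id", F.lower(F.col("user_id")))
           .dropDuplicates(["event_id"]))

# Curated zone: partitioning by date acts as a coarse index into the lake,
# so analytics jobs read only the slices they need.
(curated
 .withColumn("event_date", F.to_date("event_time"))
 .write.mode("append")
 .partitionBy("event_date")
 .parquet("s3a://lake/curated/clickstream/"))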

Likewise, having sufficient bandwidth is vital; otherwise the data won’t move from various sources to the data lake and business users fast enough to be useful. To deliver on the promise of having massive data resources requires not only fast disks capable of millions of I/Os per second (IOPS), Etchings says, but also interconnected nodes and processing engines that can readily access data as it’s generated.

Speed is particularly important for real-time analytics, from social media trends to traffic routing. So build your data lake on the fastest interconnect available.

Design security into every facet of big data

The high degree of heterogeneity in computational infrastructure components has substantially sped up organizations’ ability to glean meaningful insights from data. But there’s a downside: The systems are much more complex to manage and secure, Etchings says. With the huge amounts of data involved and the mission-criticality of most big data analytics systems, failing to take adequate precautions in protecting systems and data is asking for trouble on a major scale.

Much of the data that companies are gathering, storing, analyzing and sharing is customer information—some of it personal and identifiable. If that data gets into the wrong hands, the results are predictable: monetary losses from lawsuits and possibly regulatory fines, damaged brand and reputation, and unhappy customers.

Your security measures should include deploying the basic enterprise tools: data encryption whenever practical, identity and access management, and network security. But your security measures should also include policy enforcement and training about the proper access and use of data.
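One small, hedged example of what “encryption whenever practical” can mean at the data level: the sketch below encrypts a sensitive field before a record is written to shared storage, using the Python cryptography package’s Fernet API. In practice the key would come from a key management service rather than being generated inline, and the record layout here is purely illustrative.

from cryptography.fernet import Fernet

key = Fernet.generate_key()   # placeholder; fetch from your key management service
cipher = Fernet(key)

record = {"customer_id": "C-1042", "ssn": "123-45-6789", "order_total": 87.50}

# Encrypt only the sensitive field; analysts can still aggregate on the rest.
record["ssn"] = cipher.encrypt(record["ssn"].encode()).decode()

# Services authorized to hold the key can reverse it when genuinely needed.
original_ssn = cipher.decrypt(record["ssn"].encode()).decode()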

Make data management and quality a top priority

Ensuring good data management and quality should be a hallmark of all big data analytics projects—otherwise the chances of failure are much greater.

You need to put controls in place to ensure that data is up to date, accurate, and delivered in a timely manner. As part of its big data initiative, GoDaddy implemented alerting that informs managers if a data update has failed or is running late. In addition, GoDaddy has implemented data-quality checks on key metrics, sending alerts when these metrics are not aligned with expectations.
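The pattern behind those controls is straightforward. Below is a minimal sketch, not GoDaddy’s actual system: check that a feed is fresh and that a key metric stays within its expected band, and raise an alert when either check fails. The thresholds and the alert hook are placeholders.

from datetime import datetime, timedelta

def check_feed(last_loaded_at: datetime, row_count: int,
               expected_min_rows: int = 10_000,
               max_age: timedelta = timedelta(hours=6)) -> list:
    """Return a list of human-readable problems; an empty list means the feed looks healthy."""
    problems = []
    if datetime.utcnow() - last_loaded_at > max_age:
        problems.append(f"data update is late: last load was {last_loaded_at:%Y-%m-%d %H:%M} UTC")
    if row_count < expected_min_rows:
        problems.append(f"row count {row_count} is below the expected minimum of {expected_min_rows}")
    return problems

def alert(problems: list) -> None:
    # Placeholder: wire this to email, chat, or an on-call paging tool.
    for p in problems:
        print("ALERT:", p)

issues = check_feed(last_loaded_at=datetime.utcnow() - timedelta(hours=9), row_count=8_400)
if issues:
    alert(issues)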

A big part of ensuring data quality and governance is hiring skilled data-management professionals, including a director of data management or another executive to oversee these areas. Given the strategic importance of these initiatives, enterprises need clear ownership of data stewardship, management, governance, and policy.

