You can't build enterprise AI if you suck at data & analytics

Executives are eager to jump on the AI bandwagon, but many organizations lack the data & analytics infrastructure, or the executive culture, needed to build production machine learning systems.

Saying you use “AI” at your company may give you bragging rights at industry meetups and even fool the media, but actually implementing enterprise-wide transformation is much harder than just claiming you do.

Before you beg management for an eye-popping AI budget, be aware: not all companies are ready for artificial intelligence.

Unlike flash drives and mobile apps, enterprise-scale AI is not a standalone, plug-and-play technology. The quality of your data and analytics infrastructure, along with your organization’s engineering and business culture, is a critical foundation for any AI initiative. Even tech-savvy companies like Google have committed embarrassing faux pas, such as mistakenly auto-labeling Black people as gorillas.

If you don’t want to end up on lists of top AI fails, keep these implementation principles and practices in mind:

Practice 1: Start With Goals And Hypotheses, Not With Solutions

Shiny new technologies aren’t necessarily better for your specific organization. If you’re working with less than a few terabytes of data, you don’t have big data and don’t need to incur the operational headache of implementing a Hadoop architecture. If you’re doing offline, batch analysis and don’t need real-time predictions from your machine learning algorithms, you probably don’t need the speed advantages of Spark. Deep learning may be in vogue, but many enterprises find that ensemble approaches combining older machine learning and statistical methods actually outperform modern neural networks on specific problems and tasks.

Start with the problem, not with the technology. Clearly articulate business goals and metrics of success, then apply the scientific method of testing various hypotheses to gain knowledge and expertise about the limitations of different solutions. Understanding where technologies fail is as important as understanding where they succeed.  
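The "hypotheses, not solutions" approach can be sketched in a few lines of code. This is a deliberately toy illustration, not a real evaluation pipeline: the candidate "models" and the holdout data are invented, and the point is simply that you fix the success metric first and then treat each candidate technique as a hypothesis to be tested against it.

```python
# Hypothetical sketch: define the success metric up front, then score each
# candidate approach against it on the same holdout data.

def evaluate(predict, examples):
    """Fraction of examples predicted correctly -- the agreed success metric."""
    correct = sum(1 for x, y in examples if predict(x) == y)
    return correct / len(examples)

# Toy holdout set and two candidate "models" (trivially simple rules).
holdout = [(1, "spam"), (2, "ham"), (3, "spam"), (4, "ham")]
always_spam = lambda x: "spam"
odd_is_spam = lambda x: "spam" if x % 2 == 1 else "ham"

candidates = {"always_spam": always_spam, "odd_is_spam": odd_is_spam}
scores = {name: evaluate(fn, holdout) for name, fn in candidates.items()}
best = max(scores, key=scores.get)  # the hypothesis that survived testing
```

Note that the losing candidate is as informative as the winner: knowing where a technique fails tells you where not to spend budget.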

Practice 2: Get The Right Data, Not Just More Data

There’s a common myth that “more data is better” and that “AI needs vast amounts of data to work”. The truth is more nuanced. When asked about the biggest limitation to AI systems today, executives at leading technology firms unanimously agreed that data quality is the key bottleneck.

“AI is like a human, incorrect or bad data will produce bad results,” states Sanjeev Katariya, Chief Architect at eBay. “If you’re not careful and you select data and features that don’t really help with learning, you can wind up with some really erratic behaviors.” Serving more than 160 million active users, eBay reportedly employs thousands of data analysts to ensure the company’s machine learning algorithms are fed the right data with the right quality, a process which takes expertise, time, and patience.
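Katariya's point about bad data producing bad results is why many teams put a validation gate in front of training. The sketch below is illustrative, not eBay's actual pipeline: the field names and the 5% missing-data threshold are assumptions chosen for the example.

```python
# Hypothetical data-quality gate: reject a training batch if too many
# records are missing required fields, rather than letting bad data
# silently skew the model.

def validate_batch(records, required_fields, max_missing_ratio=0.05):
    """Return (ok, report) for a list of dict records."""
    if not records:
        return False, {"reason": "empty batch"}
    missing = sum(
        1 for rec in records
        if any(rec.get(f) is None for f in required_fields)
    )
    ratio = missing / len(records)
    ok = ratio <= max_missing_ratio
    return ok, {"missing_ratio": round(ratio, 3), "checked": len(records)}

# Example: 1 of 4 listings lacks a price, so the batch fails a 5% threshold.
batch = [
    {"price": 9.99, "category": "shoes"},
    {"price": 12.50, "category": "shoes"},
    {"price": None, "category": "bags"},
    {"price": 7.25, "category": "bags"},
]
ok, report = validate_batch(batch, ["price", "category"])
```

A gate like this catches missingness; checks for bias, duplication, or out-of-range values would follow the same pattern.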

Deepak Agarwal, VP of Engineering and Head of Relevance and AI at LinkedIn, corrects a common misconception: “We see a lot of media misreporting about AI taking over human roles. Most AI actually requires more human time, either in engineering or in reviewing, to ensure that the information the algorithms come up with is unbiased, accurate, and neither offensive nor misleading.”

Practice 3: Enforce Data Standards & Enable Broad Access

Business leaders often think they have good data when in fact their data is inconsistent, incomplete, erroneous, sparse, biased, and spread across distributed systems that only specialized engineers know how to operate. Massimo Mascaro, Director of Data Engineering and Data Science at Intuit, explains why his company mandates data cleanliness and organization: “As a financial services firm, we work in a complex, regulatory context that is constantly in flux, resulting in long tails of data across a variety of different use cases.”

Even if you’re not in finance, creating and enforcing company-wide standards for data and access to it is essential to streamline analytics and machine learning. Establish a single source of truth, apply clear labeling and metadata, document religiously to mitigate employee confusion, and build the requisite tools and technologies to enable enterprise-wide access.
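One lightweight way to enforce such standards is to refuse to register any dataset that arrives without the mandated metadata. The sketch below is a hypothetical illustration, not a specific vendor's API: the class, the required fields, and the example dataset are all invented for this example.

```python
# Hypothetical dataset registry: every dataset must declare an owner,
# a description, a schema, and whether it is the source of truth
# before anyone in the company can look it up.

REQUIRED_METADATA = {"owner", "description", "schema", "source_of_truth"}

class DatasetRegistry:
    def __init__(self):
        self._datasets = {}

    def register(self, name, metadata):
        missing = REQUIRED_METADATA - metadata.keys()
        if missing:
            raise ValueError(f"{name}: missing metadata {sorted(missing)}")
        self._datasets[name] = metadata

    def lookup(self, name):
        return self._datasets[name]

registry = DatasetRegistry()
registry.register("sales_daily", {
    "owner": "finance-team",
    "description": "Daily sales rollup, one row per region per day",
    "schema": {"region": "str", "date": "date", "revenue": "float"},
    "source_of_truth": True,
})
```

The design choice is that registration is the only way in: an undocumented dataset simply cannot be published, so documentation stops being optional.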

A lack of access to critical data causes political challenges and unnecessary delays in large enterprises. The companies we work with that are most successful at adopting new AI techniques have almost all encapsulated core business functionality in well-documented internal APIs and interfaces that make cross-department and cross-team technical collaboration easy. Many also deploy business intelligence (BI), analytics dashboards, and visualization tools to help non-technical leaders and employees engage with important insights.

Practice 4: Make Sure Data Owners (Business) Talk To Data Stewards (Tech)

Rare is the enterprise where data owners (the business leaders responsible for information and insights) sit next to data stewards (the engineers who implement and manage data capture, storage, and handling) and actually learn from each other. “For non-technical leaders, there is a misconception that AI can be applied generally,” says Chris Curran, Chief Technologist at PwC. “One skill that remains to be scarcely found is the ability to map a particular business problem to a specific AI technique; there are many techniques and they don't all work equally well.”

The lack of technical literacy among business leaders negatively impacts AI projects. “We often close a sale with a business person who says they have all the requisite data ready to go,” says Robbie Allen, Founder of Automated Insights, an enterprise natural language processing & generation (NLP/NLG) platform. “Then we find out they have no idea how the data is stored, that it’s distributed across a number of systems, and the technical talent required to get the data out properly is booked on other projects for months. That’s how a 3-month project easily blows out to 6 months or more.”

On the flip side, the engineers and data scientists working directly with a company’s data might not be domain experts on the business use cases. Automated Insights, which enables a company to turn quantitative data such as sports scores and earnings reports into computer-generated articles, requires domain expertise to produce the best results. Allen explains: “Text output is not the hard part of NLG. The hard part is determining what is interesting and worth talking about in a story. Sports reporting may seem easy, for example, but many concepts such as leagues, teams, players, coaches, and playoffs don’t exist in other industries. The relationships between all the entities is also constantly in flux.”  

Practice 5: Improve Your Executive Culture & Data Literacy

HiPPO is an acronym for “highest paid person’s opinion” and reflects the antithesis of what you want at a data-driven organization. None of us are perfectly free from bias, but an executive culture driven by HiPPOs rather than data, objective experimentation, and collaborative thinking will invariably thwart rational thinking and AI implementation across your organization. If your company culture isn’t open to challenging assumptions and adapting to new learnings, you might fall prey to the common practice of cherry-picking data sets to force-engineer insights that jibe with your tilted assumptions and desired conclusions. Refusing to recognize reality typically takes you further from your goals, not closer to them.

“Using data as a basis for all decisions, rather than opinions, is a primary reason why we have been successful integrating AI throughout the company,” claims Agarwal of LinkedIn. 

Copyright © 2017 IDG Communications, Inc.
