The high cost of data science toil

Data science toil saps agility and prevents organizations from scaling data science efforts efficiently and sustainably. Here’s how to avoid it.

Data science will scale in your enterprise. AI solutions will be built. It’s not a matter of if, only how. And scaling in the wrong way can introduce significant inefficiencies leading to delayed ROI or outright failure of transformation initiatives.

While leaders figure out how to scale data science, the gap between successful scalers and laggards continues to widen. According to the McKinsey & Company Global AI Survey, “A small share of companies—from a variety of sectors—are generating outsized business results from AI, potentially widening the gap between AI power users and adoption laggards.”

Why does this gap exist? What are the barriers to scale? Clearly, there are many valid answers. However, one that continues to plague organizations sits squarely in the purview of IT leaders: ungoverned data science. If you let data science scale organically rather than purposefully, the result, every time, is “data science toil.”

What is data science toil?

Data science toil is the collection of inefficiencies, custom workarounds, tribal communication, non-interchangeable MLOps flows, redundant efforts, and shadow IT that naturally emerge as enterprises expand beyond a few models in production. It creates angst for IT leaders and it increases risk, cost, and resource demands at the expense of higher-value, innovative activities. Data science toil saps agility and keeps leaders from achieving things their competitors might do first or their board expects of them.

Removing and preventing data science toil is not easy. But if we look at top organizations that have achieved strong results with data science, certain patterns emerge, and the path forward becomes clear. Organizations looking to scale data science repeatedly run into three sources of toil, which, if not avoided or removed, cause data science to expand inefficiently and unsustainably.

A lack of ownership from IT leadership

Too often in today’s enterprises, data science is treated like a technical discipline when it should be treated like an enterprise capability. CIOs, this is your opportunity. No one else can do it. Data science needs security, reproducibility, and governance. It needs to avoid shadow IT. It needs to remove redundant work. It needs systems to optimize expensive compute resources and maintain cost controls.

For example, the CIO of a Fortune 50 medical devices, pharmaceuticals, and consumer packaged goods company noted upon his arrival that dispersed teams had 28 different ways of forecasting the same critical insight. Everyone told him theirs was different, but as he investigated, he found them to be the same with a few slight variations. This kind of redundant work not only hinders productivity but can lead to inaccurate decisions across groups.

A lack of standardized MLOps processes

ROI lies in data scientists’ research and data products, not in their ability to manage MLOps. IT leaders should bring standardized flexibility to MLOps. Standardized means there is a clear, easy-to-use best practice for things like data pipelines, research, paths to production, and maintenance of assets. Flexibility lies in an open tooling approach. Without open tooling, you won’t be able to recruit or retain talented professionals.

The key is to bring self-service to data scientists and their research in a manner that is blessed by IT.
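One way to picture standardized flexibility is a single, shared path to production that checks governance requirements but never asks which tool built the model. The following is a minimal sketch under stated assumptions, not a description of any particular platform: the `ProductionPath` class, the `default_path` helper, the check names, and the project fields (`owner`, `environment_id`, `tests_passed`, `framework`) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ProductionPath:
    """A shared path to production: named checks every project must pass.

    Hypothetical illustration; the checks inspect governance metadata only
    and never look at which framework or language the project uses.
    """

    checks: Dict[str, Callable[[dict], bool]] = field(default_factory=dict)

    def require(self, name: str, check: Callable[[dict], bool]) -> "ProductionPath":
        """Register a named check and return self so calls can be chained."""
        self.checks[name] = check
        return self

    def review(self, project: dict) -> List[str]:
        """Return the names of the checks this project fails, in order."""
        return [name for name, check in self.checks.items() if not check(project)]


def default_path() -> ProductionPath:
    """Assumed baseline checks: an owner, a pinned environment, passing tests."""
    return (
        ProductionPath()
        .require("has_owner", lambda p: bool(p.get("owner")))
        .require("environment_pinned", lambda p: bool(p.get("environment_id")))
        .require("tests_passed", lambda p: p.get("tests_passed") is True)
    )


if __name__ == "__main__":
    project = {"framework": "pytorch", "owner": "pricing-team", "tests_passed": True}
    print(default_path().review(project))
```

Because no check reads the `framework` field, a data scientist can self-serve with any tool and still follow the same IT-approved route to production.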

A lack of shared, collaborative research

IT leaders may not be aware that knowledge management in data science is a pervasive problem. Most data science teams struggle to collaborate effectively within their team. Sharing knowledge across teams is even rarer. The solution is integrating strong project management capabilities with data science tools. It entails experiment management. It involves a system to carefully trace the lineage of software environments for full reproducibility.
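To make the idea of environment lineage concrete, here is a minimal sketch, assuming a team simply stores a snapshot of the software environment alongside each experiment's parameters and metrics. The function names (`capture_environment`, `environment_fingerprint`, `record_experiment`) and the record layout are hypothetical; dedicated experiment-management tools track far more than this.

```python
import hashlib
import json
import platform
import sys
from importlib import metadata


def capture_environment() -> dict:
    """Snapshot the interpreter version, platform, and installed packages."""
    packages = sorted(
        f"{dist.metadata.get('Name')}=={dist.version}"
        for dist in metadata.distributions()
        if dist.metadata.get("Name")
    )
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": packages,
    }


def environment_fingerprint(env: dict) -> str:
    """Hash the snapshot so two runs can be compared at a glance."""
    canonical = json.dumps(env, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def record_experiment(name: str, params: dict, metrics: dict, env: dict = None) -> dict:
    """Bundle an experiment's inputs and results with its environment lineage.

    Hypothetical record layout; `env` defaults to the live environment.
    """
    env = env if env is not None else capture_environment()
    return {
        "name": name,
        "params": params,
        "metrics": metrics,
        "environment": env,
        "environment_id": environment_fingerprint(env),
    }


if __name__ == "__main__":
    run = record_experiment("demand-forecast", {"horizon_days": 30}, {"mape": 0.12})
    print(run["name"], run["environment_id"][:12])
```

Two runs that share an `environment_id` were produced with identical package versions, so a colleague on another team can tell whether a result is reproducible before trying to rerun it.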

Steps to take to avoid data science toil

The cure for data science toil starts and ends with senior technical and IT leaders. If nothing is done, analytical transformations may fail. But where do you start? First, take ownership of data science as an enterprise capability rather than letting it scale organically. Second, ensure standardized MLOps flows exist so that data scientists can self-serve in a way that doesn’t restrict experimentation but is still governed, structured, and blessed by IT. Third, account for the collaborative nature of data science research, because proper knowledge management becomes vital when scaling beyond a few data scientists on one team.

IT leaders who can execute against these three pillars will enable analytical transformations and realize the ROI promised to those who invest in data science at scale.

Josh Poduska is chief data scientist at Domino Data Lab.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.

Copyright © 2021 IDG Communications, Inc.
