Dataops: agile infrastructure for data-driven organizations

While still emerging as an enterprise practice, dataops is increasingly driving teams to collaborate and organize in new ways to build, manage, deploy, and monitor data-intensive applications


About a decade ago, the software engineering industry reinvented itself with the development and codification of so-called devops practices. Devops, a compound of “development” and “operations,” refers to a set of core practices and processes that aim to decrease time to market by tightly integrating the work of software developers and IT operations, emphasizing reuse, monitoring, and automation. In the years since its introduction, devops has taken the enterprise software community by storm, garnering respect and almost religious reverence from practitioners and devotees.

Today, at the dawn of 2018, we are seeing a subtle but profound shift that warrants a reexamination of established software development practices. In particular, there is a growing emphasis on using data to drive digital transformation and create disruptive business models, a trend that has accompanied the growth of data science and machine learning practices in the enterprise. As big data computing platforms and commodity storage become more widespread, using large data sets in enterprise applications is becoming economically feasible. We’re seeing massive growth in investment in data science applications, including deep learning, machine learning, and artificial intelligence, that involve large volumes of raw training data. The insights and efficiencies gained through data science make these applications some of the most disruptive in the enterprise.

However, the goals and challenges of building a robust and productive data science practice in the enterprise are distinct from those of building traditional, lightweight applications that do not rely on large volumes of persistent data. These new challenges have motivated the need to go beyond devops to a more data-centric approach to building and deploying data-intensive applications, one that includes a holistic data strategy. While the principal goals of devops (namely agility, efficiency, and automation) remain as important today as ever, the requirements of leveraging massive volumes of persistent data for new applications have spawned a cadre of practices that extend devops in important ways to support data-intensive applications. Hence, dataops.

Dataops in the enterprise is a cross-functional process that requires the close collaboration of multiple groups to build, deploy, secure, and monitor data-intensive applications. A dataops process brings together teams from Development (to build the application logic and architecture), Operations (to deploy and monitor applications), Security & Governance (to define the data access policies for both production and historical data sets), Data Science (to build data science and machine learning models that become part of larger applications), and Data Engineering (to prepare training data sets for the data science team).

Consider the development process for a prototypical data-intensive application today. First, data-intensive applications often embed data science or machine learning functions as part of the application logic. Data scientists build these models through an iterative training process that typically relies on large volumes of training data.

Once the models have been trained, they can be deployed or embedded into a larger application that a software developer would implement. This paradigm of leveraging data to build the application logic itself is a major shift; before the data science and machine learning renaissance of the past few years, application logic was designed wholly by the developer without needing to run large experiments and therefore without relying on large volumes of persistent data.

After the application is deployed to a production environment, the embedded data science or machine learning models can be rescored and thereby improved over time. As a result, a data science model might be redeployed independently of any other changes to the overall application logic. Whereas devops practices promote agility by allowing application logic to be continuously deployed as new features and fixes are added, data-intensive applications extend this philosophy by also emphasizing continuous model deployment: pushing newly trained or rescored data science models to existing production applications. Underlying this whole process, of course, is the need to ensure that the data used to train and rescore the models, as well as the production data streams, are properly governed and secured.
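To make continuous model deployment more concrete, here is a minimal Python sketch under a deliberately simple, in-process assumption: the application logic asks a model registry for the live model on every request, so a newly trained model can be published (or an earlier one restored) without changing the application code. The names Model, ModelRegistry, Application, publish, activate, and handle are hypothetical and do not refer to any particular dataops product; a real registry would be backed by persistent, access-controlled storage.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical sketch of continuous model deployment; none of these names
# come from a real dataops product or library.


@dataclass(frozen=True)
class Model:
    """A trained model: a version tag plus a scoring function."""
    version: str
    score: Callable[[List[float]], float]


class ModelRegistry:
    """In-memory stand-in for a governed, persistent model store."""

    def __init__(self) -> None:
        self._models: Dict[str, Model] = {}
        self._current: Optional[str] = None
        self._lock = threading.Lock()  # guard swaps while requests are served

    def publish(self, model: Model) -> None:
        # Register a newly trained or rescored model and make it live.
        with self._lock:
            self._models[model.version] = model
            self._current = model.version

    def activate(self, version: str) -> None:
        # Switch back to a previously published version (e.g. a rollback).
        with self._lock:
            if version not in self._models:
                raise KeyError(f"unknown model version: {version}")
            self._current = version

    def current(self) -> Model:
        with self._lock:
            if self._current is None:
                raise RuntimeError("no model has been published")
            return self._models[self._current]


class Application:
    """Application logic that does not change when the model is redeployed."""

    def __init__(self, registry: ModelRegistry, threshold: float) -> None:
        self._registry = registry
        self._threshold = threshold

    def handle(self, features: List[float]) -> dict:
        # Look up the live model per request instead of hard-coding one.
        model = self._registry.current()
        flagged = model.score(features) >= self._threshold
        return {"model_version": model.version, "flagged": flagged}


if __name__ == "__main__":
    registry = ModelRegistry()
    app = Application(registry, threshold=0.5)

    registry.publish(Model("v1", lambda xs: sum(xs) / len(xs)))
    print(app.handle([0.2, 0.6]))  # v1 scores the mean (about 0.4): not flagged

    registry.publish(Model("v2", lambda xs: max(xs)))
    print(app.handle([0.2, 0.6]))  # v2 scores the max (0.6): flagged
```

The key design choice is indirection: because the application depends only on the registry, retraining ends with a publish call and a rollback is an activate call, while the surrounding application logic is redeployed only when its own features or fixes change.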

Dataops focuses on the business value of data science and machine learning by improving the time to market for intelligent, data-intensive applications. While still emerging as an enterprise practice, dataops is increasingly driving teams to collaborate and organize in new ways to build, manage, deploy, and monitor data-intensive applications. Fundamentally, dataops puts data squarely at the heart of application development considerations and turns conventional application-centric thinking on its head.

This article is published as part of the IDG Contributor Network.