Modern IT and data science in an era of analytic deployment

How data scientists and IT build and deploy their analytics models


How data scientists and IT build and deploy their models into production today reflects the way big data and data science have grown and evolved over the years.

According to a McKinsey report, 40,000 exabytes of data will have been collected by 2020. The massive amount of data that data scientists work with every day has increased the demand for them to build models from it. As that demand has grown, the way data scientists build their models and hand them off to IT has changed to keep up.

You may be wondering what has changed in the way data scientists build their models and why it matters. To answer this, let’s discuss a very common structure that many data scientists started with and still use: a monolithic approach. A monolithic architecture is the traditional programming model, in which the elements of a software program are interwoven and interdependent. Because everything is tightly coupled, data scientists often have to build trade-offs into their analytics so that a model fits the solution around it. While this may not be the most convenient approach, it is one that data scientists and IT know well and trust.

However, a newer approach to building models has become better known and more widely used across the tech industry: modular architecture. In contrast to a monolithic approach, a modular approach produces models that can be configured to fit the needs of both data scientists and IT. The model is not locked into a single, fixed solution, so it can change and adapt more freely. This approach has raised questions about its ease of use, but it has shown better outcomes overall.
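To make the contrast concrete, here is a minimal Python sketch. It is an illustration only: the debt-ratio "model", the field names, and every function name (`monolithic_score`, `clean`, `add_debt_ratio`, `add_score`, `build_pipeline`) are hypothetical and not taken from any particular tool. The monolithic version interweaves cleaning, feature engineering, and scoring in one function; the modular version splits them into steps that can be swapped or reordered independently.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
Step = Callable[[Record], Record]


def monolithic_score(record: Record) -> float:
    """Cleaning, feature engineering, and scoring interwoven in one function."""
    income = max(record.get("income", 0.0), 0.0)
    debt = max(record.get("debt", 0.0), 0.0)
    ratio = debt / income if income > 0 else 1.0
    return 1.0 - min(ratio, 1.0)


def clean(record: Record) -> Record:
    """Step 1: clip negative or missing inputs to zero."""
    return {
        "income": max(record.get("income", 0.0), 0.0),
        "debt": max(record.get("debt", 0.0), 0.0),
    }


def add_debt_ratio(record: Record) -> Record:
    """Step 2: derive a feature from the cleaned inputs."""
    income = record["income"]
    ratio = record["debt"] / income if income > 0 else 1.0
    return {**record, "debt_ratio": ratio}


def add_score(record: Record) -> Record:
    """Step 3: turn the feature into a score between 0 and 1."""
    return {**record, "score": 1.0 - min(record["debt_ratio"], 1.0)}


def build_pipeline(steps: List[Step]) -> Step:
    """Compose independent steps into one callable."""
    def run(record: Record) -> Record:
        for step in steps:
            record = step(record)
        return record
    return run


pipeline = build_pipeline([clean, add_debt_ratio, add_score])
applicant = {"income": 50000.0, "debt": 25000.0}
print(monolithic_score(applicant))
print(pipeline(applicant)["score"])
```

Both versions give the same score for the same input. The difference is that in the modular version a single step, such as `add_debt_ratio`, can be replaced without touching the others.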

How models are developed is not the only thing that has changed; how they are passed along to IT and deployed into production has changed as well.

When a model is passed from data scientists to IT, it must be recoded to fit the language and configured for the tools that IT uses. To ease this process, the two teams have used different model interchange formats, each one an attempt to improve on the last:

  • PMML (Predictive Model Markup Language): an XML-based format that helps analytic applications interpret and exchange predictive models, and that supports the safety and scalability of scoring engines.
  • PFA (Portable Format for Analytics): a common language, developed to address challenges with PMML, that helps move models from development to production.
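As a rough illustration of what an interchange format buys you, the sketch below scores a linear regression described in a simplified PMML-style fragment. The fragment is deliberately stripped down (no namespace, header, data dictionary, or mining schema), so it is not a complete or validated PMML document. The helper name `score_linear_pmml` and the coefficients are made up for the example.

```python
import xml.etree.ElementTree as ET

# Simplified PMML-style description of a linear model (illustrative values).
# A real PMML file also carries a namespace, Header, DataDictionary, etc.
PMML_FRAGMENT = """
<PMML version="4.4">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def score_linear_pmml(pmml_text: str, row: dict) -> float:
    """Compute intercept + sum(coefficient * value) from the XML description."""
    root = ET.fromstring(pmml_text.strip())
    table = root.find("./RegressionModel/RegressionTable")
    if table is None:
        raise ValueError("no RegressionTable found in document")
    score = float(table.get("intercept", "0"))
    for predictor in table.findall("NumericPredictor"):
        score += float(predictor.get("coefficient")) * row[predictor.get("name")]
    return score


print(score_linear_pmml(PMML_FRAGMENT, {"x1": 3.0, "x2": 4.0}))
```

The point is that the scoring side reads only the XML description, so it does not need to know which language or library trained the model.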

While each new solution, whether an architecture or a model interchange format, has come with complications, each has been a step toward a more innovative approach to deploying models.

Currently, many data scientists and engineers are taking a different approach to deploying models into production: they have adopted an “agnostic” engine. An agnostic engine can take essentially any model, no matter what language it was written in, and produce a score without language-related restrictions. This solution grew out of the desire to run any model free of the constraints imposed by differing languages and tools, something earlier solutions did not offer.

Using an agnostic engine removes the restrictions that might otherwise be placed on a model as it moves from data scientists to IT, which is a strong selling point for this route to deployment. With an agnostic engine, data scientists and engineers can:

  • Place any model into the engine, no matter what language it is written in.
  • Run the model in its native language without trade-offs.
  • Score and scale data faster and more easily.
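One way to picture a language-agnostic engine is as a thin layer that treats every model as "a record goes in, a score comes out" and leaves the model to run in its own runtime. The sketch below is a toy under stated assumptions, not a description of any actual product: the class names (`InProcessModel`, `SubprocessModel`, `AgnosticEngine`), the JSON-over-stdin/stdout convention, and the example models are all hypothetical. A short Python one-liner run as a separate process stands in for a model written in another language.

```python
import json
import subprocess
import sys
from typing import Callable, Dict, List, Protocol

Record = Dict[str, float]


class Model(Protocol):
    def score(self, record: Record) -> float: ...


class InProcessModel:
    """Wraps a model that is already a Python callable."""

    def __init__(self, fn: Callable[[Record], float]) -> None:
        self._fn = fn

    def score(self, record: Record) -> float:
        return float(self._fn(record))


class SubprocessModel:
    """Wraps any executable that reads a JSON record on stdin and
    writes {"score": ...} as JSON on stdout (an assumed convention)."""

    def __init__(self, command: List[str]) -> None:
        self._command = command

    def score(self, record: Record) -> float:
        completed = subprocess.run(
            self._command,
            input=json.dumps(record),
            capture_output=True,
            text=True,
            check=True,
        )
        return float(json.loads(completed.stdout)["score"])


class AgnosticEngine:
    """Routes scoring requests to models without caring how they are implemented."""

    def __init__(self) -> None:
        self._models: Dict[str, Model] = {}

    def deploy(self, name: str, model: Model) -> None:
        self._models[name] = model

    def score(self, name: str, record: Record) -> float:
        return self._models[name].score(record)


# Stand-in for a model written in another language: a separate process.
EXTERNAL_SCRIPT = (
    "import json, sys; r = json.load(sys.stdin); "
    "print(json.dumps({'score': r['x'] * 2}))"
)

engine = AgnosticEngine()
engine.deploy("python_model", InProcessModel(lambda r: r["x"] + 1.0))
engine.deploy("external_model", SubprocessModel([sys.executable, "-c", EXTERNAL_SCRIPT]))

print(engine.score("python_model", {"x": 2.0}))
print(engine.score("external_model", {"x": 2.0}))
```

In this sketch, replacing the command list with, say, an R or Java executable that follows the same JSON convention would require no change to the engine itself. That is the property the bullets above describe.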

This new and innovative way of deploying models into production is the future of achieving faster and easier analytics with your models.

Copyright © 2017 IDG Communications, Inc.
