Where are you in your analytical journey?

5 stages of the analytic process for data scientists and IT, plus a few best practices for tackling those stages.


The analytic journey that many data scientists and IT teams go through is filled with questions and unexpected turns. From capturing data for your model to generating analytics, there is always something new to look forward to.

In this article, I walk through the analytic journey from start to finish and, along the way, share a few best practices that can make the task a bit easier.

Phase 1: Define the problem

The first stage in your analytic journey is to define the problem you are trying to solve. You must first understand the pain point your organization is facing and how you intend to resolve it. Tackling this first step sets you up for the rest of the journey.

After understanding your pain point and possible solution, you can then answer these next two questions: “What kind of model does your organization need?” and “Will you build or buy your model?”

The type of model you require will determine which path you take in the next stage. You will need to know whether you require a basic model or something more custom-made. Once you have determined your desired model, you can decide whether to build or buy it.

Understanding when to buy and when to build a model for your company is an important decision because of the benefits and risks involved. If you choose to buy a model, your analytic journey is close to finished (at least the labor-intensive part, if not the waiting). However, if you decide to build your own model, you have quite the journey in store.

Phase 2: Data collection

After making the decision to build your model, you will need to ask yourself, “Do I have enough of the right data, and if so, how will I organize it?” When collecting data for your model, you can either gather new data or sift through existing data.

If you decide to collect new data, you’ll want to organize it into a useful, well-structured collection. This will help you further along in your journey by making sure you can find any specific piece of data when you need it.

Most data scientists will instead sift through existing data. This task can be time-consuming. A few best practices that ease this part of the process and keep the rest of your journey running smoothly: organize your data with clear labels, keep what is relevant while discarding what your model will not need, and set up features that help organize your data.
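Those three practices can be sketched in a few lines of pandas. This is a minimal, hypothetical example: the column names and values are illustrative only, not from any real dataset.

```python
import pandas as pd

# Hypothetical raw data; the column names and values are illustrative only.
raw = pd.DataFrame({
    "cust_id": [101, 102, 103],
    "signup_dt": ["2021-01-05", "2021-02-11", "2021-03-20"],
    "monthly_spend": [42.0, 87.5, 19.9],
    "legacy_flag": ["x", "y", "x"],  # not needed for the model
})

# 1. Organize with labels: give columns clear, descriptive names.
df = raw.rename(columns={"cust_id": "customer_id", "signup_dt": "signup_date"})

# 2. Keep what's relevant; discard what the model won't need.
df = df.drop(columns=["legacy_flag"])

# 3. Set up features that help organize the data.
df["signup_date"] = pd.to_datetime(df["signup_date"])
df["signup_month"] = df["signup_date"].dt.month

print(df.columns.tolist())
```

The point is less the specific calls than the habit: every column either earns a clear label and a purpose, or it goes.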

Phase 3: Building your model

Now that you have all of the data you need, you can begin building your model. This is the fun, and occasionally stressful, part of the journey. Before you jump into creating a stellar model, you will have to decide which tools and features you will need, as well as what kind of platform you want to build your model on. Will you use a monolithic or modular platform?

If you decide to use a monolithic platform, there are certain things to keep in mind. For instance, you may not be able to configure the platform to fit your exact needs. Instead, you get more of a one-stop shop: essentially everything you require is built into the platform, but your model will carry trade-offs to adhere to the constraints that this lack of configurability places on it.

If a monolithic platform does not work for you, there is a more modern path: a modular approach. With a modular approach, you can swap features in and out as your model requires. Being able to configure your model this way makes it easier to adapt the model later on, once it is in use.
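The modular idea can be sketched with scikit-learn's Pipeline, one possible toolkit among many (the article does not prescribe a specific library): each stage is a named component that can be replaced without rebuilding the rest.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.linear_model import LogisticRegression

# Each stage is a named, replaceable component.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])

# Later, swap one component out without touching the others.
model.set_params(scale=MinMaxScaler())
print(type(model.named_steps["scale"]).__name__)  # the scaler has been swapped
```

Swapping the scaler here changes nothing about the classifier or the surrounding code, which is exactly the adaptability a modular platform buys you.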

Phase 4: Training and testing your model  

As your model is being developed, you must continuously test it for errors. Have you properly trained and tested your model? Ensuring that your model is free of errors and runs as planned can speed up deployment. Reviewing your code before sending your model over to IT is also key to producing a well-written model. Is your code an unreadable jumble, or is it well organized? Descriptive, well-documented code spares IT the task of having to decode and then recode your model.

Phase 5: Deploying your model

Once your model is free of bugs, it can be sent over to IT for deployment. Brace yourself: this is often a tedious part of the journey, but it is necessary for getting a well-written model into production.

While preparing to send your model to IT, ask yourself a few questions: How will your model work with IT’s systems? Have you used tools and languages they are not familiar with? Often, the longest part of the analytic journey is handing models off from data scientists to IT. This delay has various causes, the most prominent being that data scientists and IT use different tools and systems that do not integrate with one another.

One of the best practices for a smooth deployment into production is to use an external engine that can encode your model. This spares IT from spending a large amount of time trying to encode the model themselves or, worse, sending it back to you because it is unreadable.
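One common form of this handoff is serializing the trained model into a portable artifact that IT can load without reconstructing your environment by hand. The sketch below assumes joblib; portable formats such as ONNX or PMML are alternatives, and which one fits depends on what IT's stack can consume.

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train on synthetic stand-in data.
X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Data scientist side: write the artifact IT will receive.
joblib.dump(model, "model.joblib")

# IT / production side: load it and serve predictions.
restored = joblib.load("model.joblib")
assert (restored.predict(X) == model.predict(X)).all()
```

The assertion at the end is the handoff contract in miniature: the restored model must reproduce the original's predictions exactly.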

Once you have successfully completed all of these steps, you should now have a well-performing model that is ready to tackle any problem it comes across.

This article is published as part of the IDG Contributor Network.