3 tips for a successful data strategy

Most modern businesses will survive only if they use data to acquire and retain customers, and if they use data to automate their supply chain or protect their valuable assets

When we witness the Cambridge Analytica story, in which Facebook’s user information was used to interpret and influence electoral votes in the US and other countries, and when we see targeted ads and product recommendations every time we browse the internet, one thing is clear: Data is power. Hopefully this power is in the right hands, and the data is used at the right time and in the right context. Data can be a potent weapon if misused and a crucial asset if used strategically and mindfully.

In fact, most modern businesses will survive only if they use data to acquire and retain customers, and if they use data to automate their supply chain or protect their valuable assets.

Here are the top three tips for the deployment of a successful data strategy.

1. Act on data before it loses its value

The diagram below of the time value of data demonstrates how data quickly loses its value, and how traditional batch processing (big data), BI, and data warehousing technologies are becoming irrelevant in today’s AI-driven and always-connected world.

Figure 1: The time value of data (Yaron Haviv)

Many companies are transitioning to interactive analytics that let data scientists and business owners view trends and anomalies and conclude what works or doesn’t shortly after the data is captured. This is not a proactive strategy: if it takes an hour to conclude that smoke detectors are reporting a fire, any action taken by then will only be to clean up the ashes.

What defines the digital and online services provided by web companies is that the loop from data capture to action is automated using the fastest data analysis and AI technologies. No one is sitting at a dashboard deciding which advertisement to present to us, which song should play next, or which driving directions to give us. There is no human in the middle, only machines and AI that interpret the data and prompt us to act on the resulting insights.

To enjoy better and faster intelligence, the cloud is often too far away, so you must look to the edge, which is closer to the source of the data or event. Edge computing complements the cloud and can take place in our smartphones, a nearby datacenter, a factory, or a hospital.

2. Know the context for the action

An actionable analytics process comprises four main steps:

  1. Capture: Ingest the data or events
  2. Contextualize: Enrich the event with historical, environmental, and operational context
  3. Infer (AI): Analyze the enriched data to predict or classify an outcome
  4. Act: Serve the results to control systems or dashboards
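
The four steps above can be sketched as a chain of small functions. This is only an illustrative outline: the `Event` type, the function names, the temperature fields, and the fixed 20-degree rule that stands in for an AI model are all assumptions made for the example, not part of any particular product.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Event:
    source: str
    payload: Dict[str, Any]
    context: Dict[str, Any] = field(default_factory=dict)


def capture(raw: Dict[str, Any]) -> Event:
    """Step 1: ingest a raw record and wrap it as an event."""
    return Event(source=raw["source"], payload={"temperature": raw["temperature"]})


def contextualize(event: Event, history: Dict[str, float]) -> Event:
    """Step 2: enrich the event with context (here, a stored average per source)."""
    current = event.payload["temperature"]
    event.context["avg_temperature"] = history.get(event.source, current)
    return event


def infer(event: Event) -> str:
    """Step 3: classify the enriched event. A fixed rule stands in for an AI model."""
    delta = event.payload["temperature"] - event.context["avg_temperature"]
    return "alert" if delta > 20.0 else "normal"


def act(event: Event, outcome: str) -> str:
    """Step 4: serve the result. A real system would call a control system or dashboard."""
    return f"{event.source}: {outcome}"


def process(raw: Dict[str, Any], history: Dict[str, float]) -> str:
    event = contextualize(capture(raw), history)
    return act(event, infer(event))


history = {"sensor-1": 22.0}  # hypothetical stored average for one sensor
print(process({"source": "sensor-1", "temperature": 55.0}, history))  # sensor-1: alert
print(process({"source": "sensor-1", "temperature": 23.5}, history))  # sensor-1: normal
```

Keeping each step as a separate function makes it easier to later swap the fixed rule for a trained model, or the in-memory dictionary for a real data service, without touching the other steps.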

By now everyone understands that up-to-date interactive dashboards that drive real-time actions are necessary. The market is flooded with “real-time/streaming analytics” tools that can observe and aggregate events flowing through them (e.g., watch the temperature trend of a sensor or the market trend of a stock), compare the results with a pretrained algorithmic or AI model, and act immediately.

Figure 2: Traditional stream processing architecture (Yaron Haviv)

Unfortunately, this approach is limited at best. It cannot enrich events with up-to-date external or shared context (weather information, user record data, etc.) because that data didn’t flow through it. Furthermore, to maintain low latency, streaming tools keep data in memory and can maintain only a short historical view.
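
To make the short-history limitation concrete, here is a minimal sketch of in-memory windowed aggregation. The class name and the window size of three readings are assumptions chosen for illustration; real streaming tools use far larger windows, but the trade-off has the same shape.

```python
from collections import deque


class SlidingWindowAverage:
    """Keeps only the last `size` readings in memory, like a streaming window."""

    def __init__(self, size: int) -> None:
        self.window = deque(maxlen=size)

    def add(self, value: float) -> float:
        # Once the window is full, appending silently drops the oldest reading.
        self.window.append(value)
        return sum(self.window) / len(self.window)


avg = SlidingWindowAverage(size=3)
for reading in [20.0, 21.0, 22.0, 40.0]:
    current = avg.add(reading)

print(list(avg.window))  # [21.0, 22.0, 40.0]: the first reading is gone
print(current)           # average over the retained readings only
```

Anything older than the window, and anything that never flowed through the stream at all, is simply unavailable to the analysis.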

It is essential that a business’s analytics or AI solution use as much context as possible to maximize the accuracy of its insights. This entails incorporating:

  • Event context: Information extracted from the event such as web request details, sensor data, or voice command.
  • Historical context: Information aggregated over time such as the hourly average of a stock price or sensor temperature, or the previous geolocation of a car.
  • Operational context: Information from operational data systems such as user transaction balance, home address, machine servicing time, or company market cap.
  • Environmental context: Information about conditions that may affect the event, such as weather, traffic conditions, market trends, or social media sentiment.

Adding broader context can be accomplished by integrating the streaming tools with a real-time database that stores the aggregated data or caches enough external environmental or operational data. The real-time database must be able to serve many concurrent operations and must not limit the data to in-memory capacity.
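
As a rough sketch of this enrichment step, the example below uses a plain in-process class as a stand-in for the real-time database. The `ContextStore` class, its method names, the key layout (`machine:<id>`, `weather`), and the sample values are all hypothetical; a production system would replace them with calls to an actual shared data service.

```python
from typing import Any, Dict


class ContextStore:
    """In-process stand-in for a real-time database (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.aggregates: Dict[str, Dict[str, float]] = {}  # historical context
        self.records: Dict[str, Dict[str, Any]] = {}       # operational and environmental context

    def update_aggregate(self, key: str, value: float) -> None:
        # Running count and total, so history is not bounded by a window size.
        agg = self.aggregates.setdefault(key, {"count": 0, "total": 0.0})
        agg["count"] += 1
        agg["total"] += value

    def average(self, key: str) -> float:
        agg = self.aggregates[key]
        return agg["total"] / agg["count"]


def enrich(event: Dict[str, Any], store: ContextStore) -> Dict[str, Any]:
    """Attach historical, operational, and environmental context to an event."""
    sensor = event["sensor_id"]
    store.update_aggregate(sensor, event["temperature"])
    return {
        "event": event,
        "historical": {"avg_temperature": store.average(sensor)},
        "operational": store.records.get(f"machine:{sensor}", {}),
        "environmental": store.records.get("weather", {}),
    }


store = ContextStore()
store.records["machine:m1"] = {"last_serviced_days_ago": 200}
store.records["weather"] = {"outside_temperature": 35.0}
store.update_aggregate("m1", 60.0)
store.update_aggregate("m1", 62.0)

enriched = enrich({"sensor_id": "m1", "temperature": 70.0}, store)
print(enriched["historical"])     # {'avg_temperature': 64.0}
print(enriched["operational"])    # {'last_serviced_days_ago': 200}
print(enriched["environmental"])  # {'outside_temperature': 35.0}
```

Because the store is shared rather than private to one stream, context written by other services (a weather feed, a maintenance system) becomes visible to every event that passes through.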

3. Build a continuous online service

Many companies still think about development as a project with a clear long-term goal and concrete start and end dates. Does that make sense? Ask which features and services Facebook, LinkedIn, or Google are planning to release, and there is probably no fixed answer, because new features are added daily, sometimes as an immediate reaction to a competing offering or a new trend.

Modern online applications must not be designed using traditional or monolithic approaches. Instead, they should adopt an agile and cloud-native methodology of continuous development and integration. You partition your service into smaller microservices that can evolve, autoscale, or be upgraded independently, and you break development into short sprints with a few simple and well-defined goals or features. The testing and staging of your solution can be automated using one of the many CI/CD frameworks (such as Jenkins or Travis).

You can form sets of microservices based on existing open source and commercial tools to handle the four analytics steps: capture, contextualize, analyze, and act. For example:

  • API microservices to capture web transactions, sensor data, voice commands or chatbot data, preprocess or contextualize them, and push them into a stream.
  • Serverless functions, stream processing microservices (like Apache Spark), or AI inferencing logic (using tools like TensorFlow) that analyze the data.
  • Microservices that act immediately on the results by sending an alert, controlling a device, or interacting directly with the user.
  • Microservices for data visualization or data movement/transformation.

Figure 3: Continuous analytics service architecture (Yaron Haviv)

This loosely coupled and agile architecture lets you add and modify functionality frequently. Microservices can be automatically deployed, scaled, and upgraded by using Kubernetes, a widely adopted orchestration framework that is available as a service from all major cloud and edge providers. Another great way to quickly develop and productize microservices is to use serverless platforms (such as AWS Lambda or Azure Functions) or cloud-independent open source alternatives like Nuclio, OpenFaaS, or OpenWhisk (these also have native integration with Kubernetes).
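
A single "analyze" microservice written as a serverless function might look roughly like the sketch below. It follows the `(event, context)` handler shape used by AWS Lambda's Python runtime, but the event layout (a JSON string under `body`), the field names, and the temperature threshold are assumptions for illustration; each platform defines its own event format and handler conventions.

```python
import json
from typing import Any, Dict, Optional

TEMPERATURE_LIMIT = 80.0  # hypothetical threshold, for illustration only


def handler(event: Dict[str, Any], context: Optional[Any] = None) -> Dict[str, Any]:
    """Analyze one sensor reading and return the action to take."""
    reading = json.loads(event["body"])
    too_hot = reading["temperature"] > TEMPERATURE_LIMIT
    action = "send_alert" if too_hot else "none"
    return {
        "statusCode": 200,
        "body": json.dumps({"sensor_id": reading["sensor_id"], "action": action}),
    }


# Local invocation with a hand-built event; a platform would supply both arguments.
response = handler({"body": json.dumps({"sensor_id": "s-7", "temperature": 91.5})})
print(response["body"])  # {"sensor_id": "s-7", "action": "send_alert"}
```

The function holds no state of its own, which is what lets a platform start, stop, and scale copies of it independently of the other microservices.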

As for storing the data, there are many options you can use or deploy from scratch. A wise move is to focus your energy on building a differentiated application and to consume pre-integrated data services and databases from cloud providers, or from faster or better independent offerings, rather than implementing all the data availability, security, servicing, and glue-related features yourself.

The only way to succeed with a data strategy is to adopt the three golden rules:

  1. Focus on time to action (not on collecting big data).
  2. Deliver optimal results based on a broad data context.
  3. Design a continuous, agile, and online service—not a one-time application or model.

This article is published as part of the IDG Contributor Network.