How to do real-time analytics across historical and live data

5 in-memory computing platform capabilities that support analytical processing of both data lake data and operational streams

Today’s analytical requirements are putting unprecedented pressure on existing data infrastructures. Performing real-time analytics across operational and stored data is often critical to success, yet difficult to implement.

Consider an airline that wants to collect and analyze a continuous stream of data from its jet engines to enable predictive maintenance and faster issue resolution. Each engine has hundreds of sensors that monitor conditions such as temperature, speed, and vibration and continuously send this information to an internet of things (IoT) platform. After the IoT platform ingests, processes, and analyzes the data, the data is stored in a data lake, and only the most recent data is retained in the operational database.

Now, whenever an anomalous reading in the live data triggers an alert for a particular engine, the airline needs to run real-time analytics across both the live operational data and the stored historical data for that engine. However, the airline may find that its current infrastructure cannot deliver these real-time analytics.

Today, companies pursuing big data initiatives typically use Hadoop to store a copy of their operational data in a data lake, where data scientists can access it for various analyses. When a use case requires running real-time analytics across the incoming operational data as well as a subset of the data stored in the data lake, this traditional infrastructure becomes a stumbling block. Accessing data stored in a data lake involves inherent delays, and running federated queries across the combined data lake and operational data is challenging.
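To make the airline scenario concrete, here is a minimal in-memory sketch of the query an alert might trigger: combine one engine's historical readings with its live readings, then score the newest live value against the historical baseline. It is illustrative only and assumes a great deal. Plain Python lists stand in for the data lake and the operational database. The names `Reading`, `combined_view`, and `anomaly_score` are hypothetical. The z-score check is a simple stand-in chosen for illustration, not the analytics the airline or any particular platform uses. A real system would run this across the two stores rather than in application memory.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class Reading:
    """One sensor sample from one engine (hypothetical schema)."""
    engine_id: str
    sensor: str
    timestamp: int  # e.g., seconds since epoch
    value: float


def _matching(readings, engine_id, sensor):
    # Keep only the engine and sensor named in the alert.
    return [r for r in readings if r.engine_id == engine_id and r.sensor == sensor]


def combined_view(engine_id, sensor, historical, live):
    """Merge data-lake and operational readings into one time-ordered series.

    Recent samples may exist in both stores, so rows are keyed by timestamp;
    live rows are applied last and therefore win on conflict.
    """
    by_time = {}
    for r in _matching(historical, engine_id, sensor) + _matching(live, engine_id, sensor):
        by_time[r.timestamp] = r
    return [by_time[t] for t in sorted(by_time)]


def anomaly_score(engine_id, sensor, historical, live):
    """z-score of the newest live reading against the historical baseline."""
    baseline = [r.value for r in _matching(historical, engine_id, sensor)]
    if len(baseline) < 2:
        raise ValueError("need at least two historical readings for a baseline")
    recent = _matching(live, engine_id, sensor)
    if not recent:
        raise ValueError("no live readings for this engine and sensor")
    latest = max(recent, key=lambda r: r.timestamp)
    sigma = stdev(baseline)
    if sigma == 0:
        raise ValueError("historical baseline has zero variance")
    return (latest.value - mean(baseline)) / sigma


# Hypothetical sample data: the "data lake" and the "operational database".
historical = [
    Reading("E1", "temp", 1, 498.0),
    Reading("E1", "temp", 2, 500.0),
    Reading("E1", "temp", 3, 502.0),
    Reading("E2", "temp", 1, 900.0),      # different engine, ignored
]
live = [
    Reading("E1", "temp", 3, 502.0),      # also present in the data lake
    Reading("E1", "temp", 4, 510.0),
    Reading("E1", "vibration", 4, 0.7),   # different sensor, ignored
]

print([r.timestamp for r in combined_view("E1", "temp", historical, live)])  # [1, 2, 3, 4]
print(anomaly_score("E1", "temp", historical, live))  # 5.0
```

Even in this toy form, the two stores hold overlapping rows and must be de-duplicated. Both must also be read before an answer exists. At production scale, that is where the data lake's access latency and the difficulty of federated queries show up.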
