How to choose a streaming data platform

Many of the best platforms for managing, storing, and analyzing streaming data are Apache open source projects, along with commercial and cloud implementations of those projects.

Streaming data is generated continuously, often by thousands of data sources, such as sensors or server logs. Streaming data records are often small, perhaps a few kilobytes each, but there are many of them, and in many cases the stream goes on and on without ever stopping. In this article, we will provide some background and discuss how to choose a streaming data platform.

How do streaming data platforms work?

Ingestion and data export. In general, both data ingestion and data export are handled by data connectors specialized for the external systems. In some cases there is an ETL (extract, transform, and load) or ELT (extract, load, and transform) process to reorder, clean, and condition the data for its destination.
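As a minimal sketch of that transform step, consider cleaning and conditioning one ingested sensor record before loading it. The field names here (`id`, `temp`, `timestamp`) are hypothetical, not from any particular connector:

```python
# Hypothetical ETL transform for one streaming record: trim identifiers,
# coerce types, and normalize the timestamp before loading downstream.
def transform(record: dict) -> dict:
    """Reorder, clean, and condition one raw ingested record."""
    return {
        "sensor_id": str(record["id"]).strip(),
        "temperature_c": round(float(record["temp"]), 2),
        "ts": int(record["timestamp"]),
    }

raw = {"id": " s-17 ", "temp": "21.456", "timestamp": "1700000000"}
clean = transform(raw)
# clean is {"sensor_id": "s-17", "temperature_c": 21.46, "ts": 1700000000}
```

In a real pipeline this function would sit between the source connector and the sink, applied to each record as it flows through.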

Ingestion for streaming data often reads data generated by multiple sources, sometimes thousands of them, such as in the case of IoT (internet of things) devices. Data export is sometimes to a data warehouse or data lake for deep analysis and machine learning.

Pub/sub and topics. Many streaming data platforms, including Apache Kafka and Apache Pulsar, implement a publish and subscribe model, with data organized into topics. Ingested data may be tagged with one or more topics, so that clients subscribed to any of those topics can receive the data. For example, in an online news publishing use case, an article about a politician’s speech might be tagged as Breaking News, US News, and Politics, so that it could be included in each of those sections by the page layout software under the supervision of the (human) section editor.
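The publishing example above can be sketched with a toy in-memory broker. This is an illustration of the pub/sub-with-topics model only, not the actual Kafka or Pulsar API; the `Broker` class and its methods are invented for the sketch:

```python
from collections import defaultdict

# Toy in-memory pub/sub broker: a record tagged with several topics is
# delivered once to every client subscribed to any of those topics.
class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of client queues

    def subscribe(self, topic):
        """Register a new client queue for a topic and return it."""
        queue = []
        self.subscribers[topic].append(queue)
        return queue

    def publish(self, record, topics):
        """Deliver the record to all subscribers of the given topics."""
        delivered = set()  # track queues so no client gets duplicates
        for topic in topics:
            for queue in self.subscribers[topic]:
                if id(queue) not in delivered:
                    queue.append(record)
                    delivered.add(id(queue))

broker = Broker()
us_news = broker.subscribe("US News")
politics = broker.subscribe("Politics")
sports = broker.subscribe("Sports")

# The speech article is tagged with three topics, so the US News and
# Politics subscribers receive it; the Sports subscriber does not.
broker.publish("politician-speech-article",
               ["Breaking News", "US News", "Politics"])
```

Production brokers add partitioning, persistence, and delivery guarantees on top of this basic fan-out, but the topic-routing idea is the same.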
