Real-time data processing with data streaming: new tools for a new era

Real-time data streaming is still early in its adoption, but over the next few years organizations with successful rollouts will gain a competitive advantage

Today, many data sources—such as IoT devices, user interaction events from mobile applications, financial services transactions, and health monitoring systems—broadcast critical information in real time. Developers working with these data sources need to think about the architecture required to capture real-time streaming data at varying scales and complexities.

It used to be that processing real-time information at significant scale was hard to implement. Hardware architectures had to be engineered for low latency, and software needed advanced programming techniques that combined receiving data, processing it, and shipping it efficiently.

The paradigm shift in data processing

I recently attended the Strata Data Conference and discovered a paradigm shift: There are now multiple frameworks (both open source and commercial) that let developers handle data streaming and real-time data-processing workloads. There are also commercial tools that simplify the programming, scaling, monitoring, and data management of data streams.

The world isn’t batch anymore, and the tools to process data streams are a lot more accessible today than they were just two or three years ago.
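
As a rough sketch of how approachable this has become, the short Python program below consumes a stream using the open source kafka-python client. The topic name, broker address, and message fields here are hypothetical placeholders, not a prescription for any particular framework.

    import json

    from kafka import KafkaConsumer

    # Subscribe to a hypothetical stream of sensor events; each message is
    # deserialized from JSON as it arrives.
    consumer = KafkaConsumer(
        "sensor-readings",                   # hypothetical topic name
        bootstrap_servers="localhost:9092",  # hypothetical broker address
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )

    # Records are handled the moment they arrive, not on a schedule.
    for message in consumer:
        event = message.value
        print(f"device={event['device_id']} reading={event['value']}")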

Today’s tools, architectures, and approaches are all very different from those used historically for data integration and data warehousing, which grew up during an era of batch processing. You developed scripts or jobs that extracted data, mostly from flat files, transformed it into a usable structure, and loaded it into a database or other data-management system. These ETL (extract, transform, load) scripts were deployed directly to servers and scheduled to run with tools like Unix cron, or they ran as services triggered when new data was available, or they were engineered in an ETL platform from Informatica, Talend, IBM, Microsoft, or another provider.
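
For contrast, here is a minimal sketch of the kind of batch ETL job described above, using only the Python standard library; the file name, column names, and table schema are hypothetical.

    import csv
    import sqlite3

    # Extract: read rows from a flat file (hypothetical name and columns).
    with open("events.csv", newline="") as f:
        rows = list(csv.DictReader(f))

    # Transform: reshape each row into a usable structure.
    records = [(r["device_id"], float(r["value"]), r["timestamp"]) for r in rows]

    # Load: write the transformed records into a database.
    conn = sqlite3.connect("warehouse.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (device_id TEXT, value REAL, ts TEXT)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", records)
    conn.commit()
    conn.close()

A job like this sees new data only when the scheduled batch fires; the streaming sketch earlier handles each record the moment it is produced.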
