How to choose a cloud data warehouse

Modern data warehouses can query structured data and semi-structured data simultaneously, and even combine historical data and streaming live data for analysis.

How to choose a cloud data warehouse
Thinkstock

Enterprise data warehouses, or EDWs, are unified databases for all historical data across an enterprise, optimized for analytics. These days, organizations implementing data warehouses often consider creating the data warehouse in the cloud rather than on premises. Many also consider using data lakes that support queries instead of traditional data warehouses. A third question is whether you want to combine historical data with streaming live data.

A data warehouse is an analytic, usually relational, database created from two or more data sources, typically to store historical data, which may have a scale of petabytes. Data warehouses often have significant compute and memory resources for running complicated queries and generating reports, and are often the data sources for business intelligence (BI) systems and machine learning.

The write throughput requirements of transactional operational databases limit the number and kind of indexes you can create (more indexes mean more writes and updates per record added, and more possible contention). This in turn slows down analytic queries against the operation database. Once you have exported your data into a data warehouse, you can index everything you care about in the data warehouse for good analytic query performance, without affecting the write performance of the separate OLTP (online transaction processing) database.

Data marts contain data oriented toward a specific business line. Data marts may be dependent on the data warehouse, independent of the data warehouse (i.e., drawn from an operational database or external source), or a hybrid of the two.

Data lakes, which store files of data in its native format, are essentially “schema on read,” meaning that any application that reads data from the lake will need to impose its own types and relationships on the data. Traditional data warehouses, on the other hand, are “schema on write,” meaning that data types, indexes, and relationships are imposed on the data as it is stored in the data warehouse.

To continue reading this article register now

How to choose a low-code development platform