How to choose a cloud data warehouse

Modern data warehouses can query structured data and semi-structured data simultaneously, and even combine historical data and streaming live data for analysis.

Vertica provides a unified analytics warehouse across major public clouds and on-premises data centers and integrates data in cloud object storage and HDFS without forcing you to move any of your data. Vertica offers two deployment options. Vertica in Enterprise Mode runs on industry-standard servers with tightly coupled storage, delivering the highest performance for use cases that demand consistent compute capacity. Vertica in Eon Mode has a cloud-native architecture that separates compute from storage, enabling simplified management for variable workloads with the flexibility to apply specific compute resources to shared storage for different business use cases. Vertica in Eon Mode is available on Amazon Web Services and Google Cloud Platform but is not limited to public cloud deployments.

Yandex ClickHouse

Yandex ClickHouse is an open source, column-oriented OLAP database management system that manages extremely large volumes of data, including non-aggregated data, and generates custom reports online in real time. The system scales linearly and can store and process trillions of rows and petabytes of data. ClickHouse is designed to work on regular hard drives, which keeps the cost per GB of storage low, but it also takes full advantage of SSDs and additional RAM when they are available.

In ClickHouse, data can reside on different shards. Each shard can be a group of replicas that provide fault tolerance. A query is processed on all of the shards in parallel. ClickHouse uses asynchronous multi-master replication: after data is written to any available replica, it is distributed to all of the remaining replicas in the background. ClickHouse is available as a cloud service from Yandex, Altinity (on AWS), Alibaba, SberCloud, and Tencent.
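
To make the shard-and-replica model concrete, here is a toy Python simulation. It is a sketch of the idea only, not ClickHouse code or its API: the Shard class, the distributed_sum function, and the explicit replicate() step (standing in for the background copy) are all hypothetical names invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class Shard:
    """Toy shard: a group of replicas that should each end up with the same rows."""

    def __init__(self, replica_count):
        self.replicas = [[] for _ in range(replica_count)]
        self.pending = []  # (target replica index, row) pairs not yet copied

    def write(self, row, replica_index=0):
        # Multi-master: any replica may accept the write.
        self.replicas[replica_index].append(row)
        for i in range(len(self.replicas)):
            if i != replica_index:
                self.pending.append((i, row))

    def replicate(self):
        # Stand-in for the asynchronous background copy to the other replicas.
        for i, row in self.pending:
            self.replicas[i].append(row)
        self.pending.clear()

    def partial_sum(self, replica_index=0):
        # Each shard answers the query from a single replica.
        return sum(self.replicas[replica_index])


def distributed_sum(shards):
    # Fan the query out to every shard in parallel, then merge the partial results.
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(lambda shard: shard.partial_sum(), shards))


# Three shards with two replicas each; writes land on alternating replicas.
shards = [Shard(replica_count=2) for _ in range(3)]
for n in range(1, 7):
    shards[n % 3].write(n, replica_index=n % 2)

stale_total = distributed_sum(shards)  # read before the background copy runs
for shard in shards:
    shard.replicate()
fresh_total = distributed_sum(shards)  # read after every replica has caught up
```

In this toy model, the first total misses rows that were written to a replica other than the one being read, which is the practical consequence of asynchronous replication: a replica can briefly lag behind until the background copy completes.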

Yellowbrick Data Warehouse

Yellowbrick Data Warehouse is a modern, massively parallel processing (MPP) analytic database designed for the most demanding batch, real-time, interactive, and mixed workloads. Yellowbrick lets you provision data warehouses wherever you need them: in private data centers, in multiple public clouds, and at the network edge. Yellowbrick promotes its use for data lake augmentation and data warehouse modernization.

As you evaluate cloud data warehouses, look for administrative simplicity, high scalability, high performance, good integrations, and reasonable cost. Ask for customer references, especially for large deployments, and do your own proof of concept. Look explicitly at the vendor’s current and planned machine learning capabilities, since much of the business value of data warehouses comes from obtaining and applying predictive analytics.
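
One simple way to compare the results of a proof of concept against these criteria is a weighted scoring matrix. The Python sketch below is only an illustration: the weights, the 1-to-5 ratings, and the candidate names warehouse_a and warehouse_b are made-up placeholders, not measurements of any real product.

```python
def weighted_score(ratings, weights):
    """Return the weighted average of per-criterion ratings."""
    total_weight = sum(weights.values())
    return sum(ratings[name] * weight for name, weight in weights.items()) / total_weight


# Hypothetical weights: how much each criterion matters to your organization.
weights = {
    "administrative_simplicity": 2,
    "scalability": 3,
    "performance": 3,
    "integrations": 1,
    "cost": 1,
}

# Hypothetical 1-to-5 ratings recorded during a proof of concept.
candidates = {
    "warehouse_a": {
        "administrative_simplicity": 4,
        "scalability": 5,
        "performance": 4,
        "integrations": 3,
        "cost": 2,
    },
    "warehouse_b": {
        "administrative_simplicity": 5,
        "scalability": 3,
        "performance": 3,
        "integrations": 4,
        "cost": 5,
    },
}

# Rank the candidates from highest to lowest weighted score.
ranked = sorted(
    ((name, weighted_score(ratings, weights)) for name, ratings in candidates.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

The weights are the part worth arguing about: a team that values low cost over raw performance would reverse this ranking by changing two numbers.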

