Qubole review: Self-service big data analytics

Cloud-native data platform puts Spark, Presto, Hive, and Airflow at your fingertips, while controlling your cloud spending

Billed as a cloud-native data platform for analytics, AI, and machine learning, Qubole offers solutions for customer engagement, digital transformation, data-driven products, digital marketing, modernization, and security intelligence. It claims fast time to value, multi-cloud support, 10x administrator productivity, a 1:200 operator-to-user ratio, and lower cloud costs.

Based on my brief experience with the platform, what Qubole actually does is integrate a number of open-source tools and a few proprietary ones to create a cloud-based, self-service big data experience for data analysts, data engineers, and data scientists.

Qubole takes you from ETL through exploratory data analysis and model building to deploying models at production scale. Along the way, it automates a number of cloud operations, such as provisioning and scaling resources, that can otherwise consume significant amounts of administrator time. Whether that automation will actually deliver a 10x increase in administrator productivity or a 1:200 operator-to-user ratio for any specific company or use case is not clear.

Qubole tends to pound on the concept of “active data.” Most data lakes (essentially file stores holding data from many sources, all in one place but not in one database) have only a small percentage of data that is actively used for analysis. Qubole estimates that the typical data lake is 10% active and 90% inactive, and predicts that its platform can reverse that ratio.