Billed as a cloud-native data platform for analytics, AI, and machine learning, Qubole offers solutions for customer engagement, digital transformation, data-driven products, digital marketing, modernization, and security intelligence. It claims fast time to value, multi-cloud support, 10x administrator productivity, a 1:200 operator-to-user ratio, and lower cloud costs.
What Qubole actually does, based on my brief experience with the platform, is integrate a number of open-source tools and a few proprietary tools to create a cloud-based, self-service big data experience for data analysts, data engineers, and data scientists.
Qubole takes you from ETL through exploratory data analysis and model building to deploying models at production scale. Along the way, it automates a number of cloud operations, such as provisioning and scaling resources, that can otherwise require significant amounts of administrator time. Whether that automation will actually deliver a 10x increase in administrator productivity or a 1:200 operator-to-user ratio for any specific company or use case is not clear.
Qubole tends to pound on the concept of “active data.” The idea is that most data lakes (essentially file stores filled with data from many sources, all in one place but not in one database) have a low percentage of data that is actively used for analysis. Qubole estimates that most data lakes are 10% active and 90% inactive, and predicts that it can reverse that ratio, to 90% active and 10% inactive.
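To make the active/inactive ratio concrete, here is a minimal Python sketch of how you might estimate it for your own data lake. It assumes that "active" means "read by some query within the last 90 days" and that the ratio is measured by data volume; both are my illustrative assumptions, not Qubole's definition. The function name `active_fraction` and the sample inventory are hypothetical.

```python
from datetime import date, timedelta


def active_fraction(datasets, today, window_days=90):
    """Return the share of total data volume read within the last window_days.

    datasets is a list of (size, last_access_date) pairs. The 90-day window
    is an assumed cutoff for "active", not a figure from Qubole.
    """
    cutoff = today - timedelta(days=window_days)
    total = sum(size for size, _ in datasets)
    if total == 0:
        return 0.0
    active = sum(size for size, last_access in datasets if last_access >= cutoff)
    return active / total


# Hypothetical inventory: (size in GB, date the data was last read).
inventory = [
    (100, date(2019, 6, 1)),   # read recently, counts as active
    (400, date(2018, 1, 15)),  # not read in over a year
    (500, date(2017, 9, 30)),  # not read in over a year
]

share = active_fraction(inventory, today=date(2019, 7, 1))
print(f"{share:.0%} active, {1 - share:.0%} inactive")  # prints "10% active, 90% inactive"
```

In practice the last-access dates would have to come from query logs or storage access logs, and the result is sensitive to the window you choose.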