Greenplum goes open source -- and a new cloud analytics star is born

An open source Greenplum presents a golden opportunity for cloud vendors of big data solutions


Greenplum Database, Pivotal's data warehouse solution, has come full circle. Once derived from the open source PostgreSQL, Greenplum is open source once again.

Greenplum could be used to yank the rug out from under the stagnant legacy players in data warehousing and analytic RDBMSes, but Oracle, Impala, and Teradata aren't the only ones exposed. Cloud leaders are also at risk.

Cloud vendors would be foolish to pass up the opportunity around this development. Greenplum is a name-brand data warehouse solution, and now most everyone with a cloud platform will have a chance to capture a segment of its customer base.

Pivoting to free

Greenplum's move to open source fulfills a promise Pivotal made earlier this year: that all the components of its Big Data Suite -- Greenplum, the HAWQ SQL query engine, and the GemFire in-memory database technology -- would be made open source, with paid enterprise components and support subscriptions.

When this plan was first announced, InfoWorld's Andy Oliver viewed it as the company pivoting (pun intended) toward Cloud Foundry as a business model, noting that "an open source Greenplum isn't great news for Teradata." Likewise, database industry analyst Curt Monash took note. To him, Greenplum was a genuine contender as an analytic RDBMS; for it to go open source meant another low-cost-of-entry alternative to existing legacy products.

For users, Greenplum immediately becomes a prime candidate to displace existing on-prem data warehouse products -- costly stalwarts like Teradata, HP Vertica, IBM Netezza, and Oracle Exadata. Alternatively, Pivotal or another party could offer Greenplum as a service in the cloud.

The hard part is the actual service

The movement to offer cloud data warehousing solutions at the scale normally associated with conventional databases hosted in the cloud is already under way. Amazon has Redshift, and Microsoft recently unveiled Azure SQL Data Warehouse. Meanwhile, Google has BigQuery, but is also talking up the highly scalable Mesa. Startups, too, are nosing their way into the tent: Snowflake, headed by former Microsoftie Bob Muglia, boasts high elasticity and simplicity of use as two big benefits.

Greenplum as a service seems a no-brainer, and not only because Greenplum is a familiar name with an existing customer base. Databases like MySQL and PostgreSQL (Greenplum's technological ancestor) are already offered in the cloud in much the same way. If Pivotal's work to speed up and enrich Greenplum turns out to be more than marketing smoke, Greenplum should benefit even more from the support of cloud-scale deployments and engineering.

However, Greenplum will have to play up its benefits, particularly two factors: ease of migration for existing Greenplum installations, and paths to actionable insights, whether by enrichment with data from other cloud-hosted products or integration with emerging analytics technologies like Spark. If the industry has been looking for a novel opportunity to show what can be done with big data in the cloud, this is it.

Copyright © 2015 IDG Communications, Inc.
