An open source software company is something of a paradox. On the one hand, it has to convince customers that software is becoming a commodity, that proprietary software is limiting and expensive, and that standards-based, community-developed and -supported open source software is the way to go. On the other hand, an open source company has to persuade those same customers that they should pay it for the use of that same software.
It stands to reason that not every open source company will survive to become a successful business over the long haul. Likewise, the business models around open source must continue to evolve to match the changing nature of the market.
When that next wave of open source companies hits, Scott Yara, president and cofounder of Greenplum, wants to be riding the crest. Of course, it helps when your commercial open source solution is considerably less expensive than its best-known competitor.
Greenplum, Yara says, has a clear market focus: delivering a database product based on open source code that's tailored specifically for BI and data warehousing. That means it's gunning for companies like Teradata, a division of NCR that Yara says charges in the neighborhood of $1 million per terabyte of storage on its high-end data warehousing hardware. By building on a core of open source, Yara plans to pull the rug out from under that business model.
Greenplum's core code base is a fork of the open source database PostgreSQL. Called Bizgres, it includes tweaks and code patches designed to meet the needs of BI, which Yara says has very different requirements from traditional OLTP (online transaction processing) applications. In a sense, you can think of it as a distribution of PostgreSQL, in the same way that Ubuntu is a distribution of Linux.
Building on that foundation, Greenplum's business hinges on an even more sophisticated product, Bizgres MPP, which adds the capability to do "shared nothing" database clustering. This unusual form of clustering -- among the mainstream commercial database vendors, only IBM DB2 can do this -- can dramatically improve the performance of a data warehouse by dividing up the work for complex database queries among multiple nodes on a cluster.
By comparison, Oracle RAC (Real Application Clusters) uses a "shared everything" approach, in which every database node works against the same data on shared storage. That improves availability, because a surviving node can take over when another fails, but it does little to speed up a single complex query. With a shared-nothing design, by contrast, each node owns its own slice of the data, so a complex query that might have taken days to complete on a single database instance can be resolved in hours or less.
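To make the shared-nothing idea concrete, here is a minimal Python sketch of how a query might be split across nodes. It is an illustration only, not Greenplum's implementation: the function names, the `order_id` distribution key, and the sample columns are all hypothetical, and threads in one process stand in for separate machines. Each row lives on exactly one node, each node aggregates only its own rows, and a coordinator merges the partial results.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_rows(rows, num_nodes, key):
    """Assign each row to exactly one node by hashing its distribution key."""
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[hash(row[key]) % num_nodes].append(row)
    return partitions


def local_aggregate(partition):
    """Run on a single node: total the amounts per region using only local rows."""
    totals = defaultdict(int)
    for row in partition:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def merge_partials(partials):
    """Run on the coordinator: combine the per-node partial totals."""
    totals = defaultdict(int)
    for partial in partials:
        for region, amount in partial.items():
            totals[region] += amount
    return dict(totals)


def run_query(rows, num_nodes=4):
    """Simulate SELECT region, SUM(amount) ... GROUP BY region on a cluster."""
    partitions = partition_rows(rows, num_nodes, key="order_id")
    # Threads stand in for cluster nodes here (an assumption of this sketch);
    # in a real cluster each partition would sit on a different machine.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_aggregate, partitions))
    return merge_partials(partials)
```

Calling `run_query(rows, num_nodes=3)` on a list of dictionaries with `order_id`, `region`, and `amount` keys returns the same totals a single node would compute. The difference is that no node ever scans more than its own share of the rows, which is where the speedup on large warehouses would come from.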
The catch? While it's built on open source, Bizgres MPP is essentially proprietary software. It ships as a binary-only distribution, with no source code. The clustering facility is Greenplum's intellectual property.
Is that a deal-breaker? Greenplum isn't the first company to try this model. Another company, EnterpriseDB, offers an Oracle-compatible distribution of PostgreSQL under similar terms. In the case of Greenplum's potential customers, whether they have access to all the code or not, the cost savings are clearly there.