EMC blends Greenplum analytics with backup and recovery

EMC Greenplum Data Computing Appliance combines deep-data queries, dedupe, and rapid data movement in a hybrid data warehousing system

EMC's recent acquisition of Greenplum has already borne fruit in the form of a new hybrid data warehousing system, as predicted by industry analysts. The company today announced the EMC Greenplum Data Computing Appliance -- note the use of "computing" in place of "warehousing" -- designed to bring order, quick transportability, and resilience to organizations' ever-growing pools of widespread, unstructured data.

With the announcement, EMC brazenly calls out competitors such as Oracle, which, thanks to its acquisition of Sun, offers a hybrid hardware-software bundle of its own, the Exadata line, combining seemingly incompatible data-warehousing capabilities with OLTP. Other players in the space include Teradata and Netezza; IBM is in the process of acquiring the latter in a similar effort to take on Oracle.

The EMC Greenplum DCA comes loaded with the latest version of Greenplum's database software, now at the respectable age of 4.0. Version 4.0, according to EMC, brings advanced workload management, fault tolerance, and analytics to the database's existing resume of complex query optimization, embedded languages and analytics, and monitoring and administration features.

The database continues to run on third-party hardware from companies such as HP and IBM. Not surprisingly, EMC is declaring its homegrown hardware to be fully optimized to reap top performance from Greenplum Database 4.0, yielding, for example, fast query performance and data loading.

In addition, the EMC Greenplum DCA can serve as a data backup and recovery system, drawing on EMC's Data Domain storage deduplication technology. Among the appliance's tricks, its capacity can be effectively doubled, according to EMC, by using free space on an organization's SAN. Backup data can be pushed to alternative storage media, such as tape, thus freeing up space on the machine while potentially saving on the expense of using more spinning disks than necessary for storage.

The appliance also works with existing EMC storage systems' RecoverPoint feature, through which data can be rapidly moved from site to site for the purpose of, among other things, data recovery.

In its announcement, EMC emphasizes speed, capacity, and scalability as major differentiators, compared to Oracle's Exadata X2-8, the Netezza TwinFin 12, and the Teradata 2580. For example, EMC claims its system scans uncompressed data at a rate of 24GB per second and loads at a speed of more than 10TB per hour. The Oracle system, EMC will tell you, scans at 25GB per second (but only under very specific circumstances) while loading at around 5TB per hour. Netezza and Teradata both scan at 10GB per second, according to EMC. Netezza loads at 2TB per hour and Teradata's numbers are "to be determined."

In terms of capacity and scalability, EMC reports that the Greenplum appliance can expand to 4,608 database cores spread over 24 racks. A single rack can hold 36TB of uncompressed data or 144TB of compressed data. That, EMC claims, represents up to three times more scalability and up to four times as many cores as competitive systems.
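For readers who want to check the arithmetic behind those capacity figures, here is a minimal sketch in Python. It uses only the numbers EMC quotes above. The variable names are illustrative, and the totals assume that cores and capacity scale linearly across all 24 racks, which the announcement does not state outright.

```python
# Figures as quoted by EMC in the announcement.
MAX_RACKS = 24
MAX_CORES = 4608
UNCOMPRESSED_TB_PER_RACK = 36
COMPRESSED_TB_PER_RACK = 144

# Assumption: cores are spread evenly across the racks.
cores_per_rack = MAX_CORES // MAX_RACKS

# Compression ratio implied by the two per-rack figures.
compression_ratio = COMPRESSED_TB_PER_RACK / UNCOMPRESSED_TB_PER_RACK

# Assumption: capacity scales linearly with the number of racks.
max_uncompressed_tb = MAX_RACKS * UNCOMPRESSED_TB_PER_RACK
max_compressed_tb = MAX_RACKS * COMPRESSED_TB_PER_RACK

print(f"Cores per rack: {cores_per_rack}")
print(f"Implied compression ratio: {compression_ratio:.0f}:1")
print(f"Full 24-rack system, uncompressed: {max_uncompressed_tb} TB")
print(f"Full 24-rack system, compressed: {max_compressed_tb} TB")
```

Under those assumptions, the quoted figures work out to 192 cores per rack, an implied 4:1 compression ratio, and a fully built-out system holding 864TB of uncompressed data or 3,456TB compressed.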

This article, "EMC blends Greenplum analytics with backup and recovery," was originally published at InfoWorld.com.