Greenplum entices developers with fully loaded Hadoop sandbox

Greenplum's massive 1,000-node Analytics Workbench lets organizations test the limits of their cloud code

In a move to garner valuable support from the developer community while honing its own big data chops, EMC-offshoot Greenplum has unveiled a cloud computing cluster called Analytics Workbench, built on Hadoop, on which developers can freely experiment with their code in the cloud.

Greenplum's end goal is to see Hadoop widely deployed in the enterprise -- no doubt with the company's own big data analytics tool in the mix as it takes on rivals like IBM and Oracle, as well as other organizations jumping on the Hadoop bandwagon.

The 1,000-node cluster boasts 24PB of physical storage and 48TB of memory, and it comes loaded with the entire Hadoop stack, including the Hadoop Distributed File System, MapReduce, Pig, Hive, HBase, and Mahout. Also in the grab bag is Greenplum Database, intended to augment the workbench's SQL capabilities. Additionally, Greenplum is serving up open, freely available data -- both structured and unstructured -- from such sources as social media, sensor devices, and call centers. The cluster is connected via a 56Gbps InfiniBand interconnect.

Greenplum's vision is to let developers test the scalability of mixed-mode applications in a large-scale cloud-computing environment while also giving them a chance to see Greenplum's big data analytics software in action. Working with the Apache Software Foundation, the company will share the useful metrics it gleans from Analytics Workbench with the open source community.

Several vendors have contributed hardware to the Analytics Workbench, including Super Micro, which supplied 1,000 2U Greenplum Hadoop OEM Servers; Micron, which offered 6,000 DDR3 RDIMM memory sticks; Intel, which contributed 2,000 Westmere processors; and Seagate, which added 12,000 2TB drives.
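The vendor contributions line up with the cluster totals quoted earlier, and a quick back-of-the-envelope check makes the per-node picture concrete. The Python sketch below uses only the counts reported in this story. It assumes decimal units (1PB = 1,000TB, 1TB = 1,000GB) and an even spread of parts across the 1,000 nodes; the 8GB-per-stick figure is inferred from the totals, not stated by Greenplum or Micron.

```python
# Back-of-the-envelope check of the reported Analytics Workbench hardware.
# All counts come from the article; per-node and per-DIMM values are derived
# under the assumption that parts are spread evenly across nodes.

NODES = 1_000            # 2U servers
DRIVES = 12_000          # Seagate drives
DRIVE_TB = 2             # capacity per drive, in TB
DIMMS = 6_000            # Micron DDR3 RDIMMs
CPUS = 2_000             # Intel Westmere processors
TOTAL_MEMORY_TB = 48     # total memory as reported


def cluster_summary():
    """Return derived totals and per-node figures (decimal units assumed)."""
    return {
        "total_storage_pb": DRIVES * DRIVE_TB / 1_000,
        "drives_per_node": DRIVES // NODES,
        "storage_per_node_tb": DRIVES * DRIVE_TB / NODES,
        "dimms_per_node": DIMMS // NODES,
        # Inferred, not reported: total memory divided by stick count.
        "inferred_dimm_gb": TOTAL_MEMORY_TB * 1_000 / DIMMS,
        "memory_per_node_gb": TOTAL_MEMORY_TB * 1_000 / NODES,
        "cpus_per_node": CPUS // NODES,
    }


if __name__ == "__main__":
    for name, value in cluster_summary().items():
        print(f"{name}: {value}")
```

Under those assumptions, the 12,000 drives account for the full 24PB of physical storage, and each node works out to two processors, six memory sticks, and a dozen drives.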

Mellanox, meanwhile, has brought to the table ConnectX-3 VPI network cards, SwitchX VPI switches, FDR cables, and a software plug-in called Mellanox UDA (Unstructured Data Accelerator), designed to accelerate Hadoop networking and improve the scaling of Hadoop clusters running analytics-intensive applications. VMware's Rubicon team is providing Tier-1 and Tier-2 support for the cluster, monitoring the network and systems using Zabbix coupled with homegrown plug-ins and a dashboard.

In addition to offering Workbench access to big data application developers, Greenplum is opening it up to select academic institutions, such as Stanford and MIT. Not just anyone can use the cluster, however; organizations must apply and pass a vetting process. Alternatively, anyone who passes the company's forthcoming training and certification classes in Hadoop will receive access.

This story, "Greenplum entices developers with fully loaded Hadoop sandbox," was originally published at InfoWorld. Get the first word on what the important tech news really means with the InfoWorld Tech Watch blog.