Enterprise IT has long trended toward generic, white-box hardware and away from dedicated systems, with the real magic happening in the software. Hadoop is no exception -- one of its most appealing characteristics is that it can be run on most any hardware around.
But there's a difference between merely running Hadoop and running it well at scale, which a number of hardware vendors have kept in mind as they've assembled dedicated Hadoop solutions. The five outlined here sell their iron to businesses that want to run their own in-house Hadoop solution, but each one tilts toward a different segment (read: price tag) and markets a different Hadoop distribution, or set of distributions, as part of the package.
HP offers Hadoop via two grades of its ProLiant-branded servers: the DL series, optimized for storage density, and the SL series, optimized for scale-out. The two form factors serve two separate classes of Hadoop users: those who want conventional rack-mount hardware (DL) and those who want hardware optimized for scale-out over time (SL). HP also supports three of the biggest Hadoop distributions: Cloudera, Hortonworks (featuring strong connections back to Microsoft and Azure), and MapR.
Additionally, HP provides bidirectional integration with its Vertica database system -- useful for those either already invested in Vertica or looking to migrate from it.
If you plan to build a Hadoop cluster with Dell's hardware, it's Cloudera or nothing. Maybe that's not the worst approach, since Cloudera consistently positions itself as an enterprise-geared offering. Plus, Dell's Hadoop appliance is built from the inside out to run Cloudera with an emphasis on Apache Spark, the in-memory Hadoop processing framework, and offers the ability to scale up to 48 nodes.
For those seeking to get a leg up with Hadoop, Dell also offers a low-end QuickStart Hadoop solution, using Dell PowerEdge servers and a scaled-down version of Cloudera's enterprise product.
Best known for its line of ultra-high-end supercomputing systems, Cray has set about applying the lessons learned from that space to its Hadoop appliances. The Urika-XA appliance, announced for a December release, doesn't skimp on features: 38TB of SSD storage, 1,500 processor cores, InfiniBand interconnects, and Cray's own Sonexion storage system, all in a 42U rack. (The edition of Hadoop preloaded is not a commercial distribution, however -- it's apparently the generic Apache variety, with some of Cray's management software to add value.)
There's little question all this comes at a premium cost. Not only does Cray's announcement for the Urika-XA omit pricing (if you have to ask, you can't afford it), but one of the listed customers is none other than the Department of Energy's Oak Ridge National Laboratory. It's for the highest of high-end customers only, it seems.
Oracle's big-data solutions are as costly as you can imagine -- $525,000 for its chief Hadoop offering, the Big Data Appliance. Predictably, the focus is as much on Oracle as it is on Hadoop, since the other main ingredients include Oracle Linux and Java, as well as Oracle's NoSQL Database product. As for Hadoop itself, the only distribution you can get with it is Cloudera.
Cisco is not a name you'd associate with Hadoop, is it? Yet the networking giant (also a stealth giant in server sales) does indeed sell Hadoop hardware. It is built on the C240 M3 system from Cisco's Unified Computing System architecture and emphasizes storage density, offering anywhere from 115TB to 576TB in a single rack.
Major names have gone missing from this list in the last year or so. IBM, for instance, no longer seems to be offering its PureData System for Hadoop, most likely because it divested its x86 server business to Lenovo and is concentrating more on the software side of Hadoop.