When you hear the expression "big data," the next word you will often hear is "Hadoop." That's because the underlying technology that has made massive amounts of data accessible is based on the open source Apache Hadoop project.
From the outside looking in, you would rightly assume then that Hadoop is big data and vice versa; that without one the other cannot be. But there is a Hadoop competitor that in many ways is more mature and enterprise-ready: High Performance Computing Cluster.
[ Harness the power of Hadoop with InfoWorld's 7 top tools for taming big data. | Discover what's new in business applications with InfoWorld's Technology: Applications newsletter. | Explore the current trends and solutions in BI with InfoWorld's interactive Business Intelligence iGuide. | Discover what's new in business applications with InfoWorld's Technology: Applications newsletter. ]
Like Hadoop, HPCC is open-sourced under the Apache 2.0 license and is free to use. Both likewise leverage commodity hardware and local storage interconnected through IP networks, allowing for parallel data processing and/or querying across the architectures. This is where most of the similarities end, says Flavio Villanustre, vice president of information security and the lead for the HPCC Systems initiative within LexisNexis.
HPCC older, wiser than Hadoop?
HPCC has been in production use for more than 12 years, though the HPCC open source version has been available for only a little more than a year. Hadoop, on the other hand, was originally part of the Nutch project that Google put together to parse and analyze log files and wasn't even its own Apache project until 2006. Since that time, though, it has become the de facto standard for big data projects, far outpacing HPCC's 60 or so enterprise users. Hadoop is also supported by an open source community in the millions and an entire ecosystem of start-ups springing up to take advantage of this leadership position.
That said, HPCC is a more mature enterprise-ready package that uses a higher-level programming language called enterprise control language ( ECL) based on C++, as opposed to Hadoop's Java. This, says Villanustre, gives HPCC advantages in terms of ease of use as well as backup and recovery of production. Speed is enhanced in HPCC because C++ runs natively on top of the operating system, while Java requires a Java virtual machine (JVM) to execute.