HPCC also possesses more mission-critical functionality, says Boris Evelson, vice president and principal analyst for Application Development and Delivery at Forrester Research. Because it's been in use for much longer, HPCC has layers-security, recovery, audit and compliance, for example-that Hadoop lacks. Lose data during a search and it's not gone forever, Evelson says. It can be recovered like a traditional data warehouse such as Teradata.
How-to: Secure Big Data in Hadoop
Rags Srinivasan, senior manager for big data products at Symantec, wrote about this shortcoming in a May 2012 blog post on issues with enterprise Hadoop: "No reliable backup solution for Hadoop cluster exists. Hadoops way of storing three copies of data is not the same as backup. It does not provide archiving or point in time recovery."
Although Hadoop is less mature in these areas, it's not intended to be used in a production environment, so these distinctions may not be that important at the moment, says Jeff Kelly, big data analyst at Wikibon. What it's being used for is analyzing massive amounts of data to find correlations between heretofore hard-to-connect data points. Once these points are uncovered, the data is often moved to a more traditional business intelligence solution and data warehouse for further analysis.
"Currently, the most common use case for Hadoop is as a large-scale staging area," Kelly says. "Essentially [it is] a platform for adding structure to large volumes of multi-unstructured data so that it can then be analyzed by relational-style database technology."
ECL: A high-level query language with a drag-and-drop interface
Another key benefit of ECL, Villanustre says, is that it's very much like high-level query languages such as SQL. If you're a Microsoft Excel maven, then, you should have no trouble picking up ECL.
Developing queries is further simplified by the work HPCC has done with analytics provider Pentaho and its open source Kettle project, which lets users create ECL queries in a drag and drop interface. This isn't possible with Hadoop's Pig or Hive query languages yet.
HPCC is also designed to answer real-world questions. Hadoop requires users to put together separate queries for each variable they seek; HPCC does not.
"ECL is a little bit like SQL...in that it is declarative, so you tell the computer what you want rather than how to do it," Villanustre says. Pig and Hive, on the other hand, are quite primitive. "They are hard to program, they are hard to maintain and they are hard to extend and reuse the code-which are the key elements for any computer language to be successful."
Hadoop's advantages? It's scalable, flexible, inexpensive
Charles Zedlewski, vice president of products at Cloudera, disagrees with this perspective. Cloudera, after all, is among the best-known and most successful Hadoop start-ups, providing turnkey Hadoop implementations to companies as diverse as eBay, Chevron and Nokia.