For data interoperability, the package includes a metadata catalog intended to make it easier for business intelligence and other data analysis products to query Hadoop datasets. Based on Apache HCatalog, this metadata repository provides pointers to Hadoop data in a set of tables that can be easily queried by tools commonly used with relational databases, enterprise data warehouses and other structured data systems.
The package also includes a copy of Talend Open Studio, which provides a GUI (graphical user interface) for exploring, querying and applying logical workflows to Hadoop datasets.
Created in 2005 to analyze large amounts of Web traffic logs, Hadoop is increasingly being used to analyze swaths of unstructured data too large and unwieldy to be crammed into a relational database or enterprise data warehouse -- data often referred to as big data. In survey results released Tuesday by IT consulting company Capgemini, 58 percent of 600 senior business and IT executives said they plan to invest in big data systems, such as Hadoop, over the next three years.
In addition to announcing this release, Hortonworks also announced that it has teamed with VMware to provide a set of tools to run HDP in high-availability (HA) mode. VMware's vSphere can monitor Hadoop NameNode and JobTracker services. Should one of these services fail, vSphere can redirect operations to live backup services and keep the cluster running.
HDP itself will be available as a free download. Using a payment model similar to Red Hat's, Hortonworks will offer support subscriptions. Pricing is per cluster, starting at US$12,500 per year for 10 nodes.