Review: Apache Hive brings real-time queries to Hadoop
Hive's SQL-like query language and vastly improved speed on huge data sets make it the perfect partner for an enterprise data warehouse
Organizations or departments without a data warehouse can start with Hive to get a feel for the value of data analytics while keeping startup costs to a minimum. Although Hive doesn't offer a complete data warehouse solution, it does make a great, low-cost, large-scale operational data store with a fair set of analytics tools. If your analytics needs outgrow what Hive can satisfy, many traditional data warehouse vendors offer connectors and tools to bring the data into the warehouse, preserving your investment.
Until the playing field levels, companies making the best decisions -- decisions based on data and analytics -- will have a competitive advantage. Hive offers near-linear scalability in query processing, an order of magnitude better price/performance ratio than traditional enterprise data warehouses, and a low barrier to entry. With 10TB enterprise data warehouse solutions costing around $1 million, managing large unstructured data sets with Hive makes a lot of sense.
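To make the cost comparison concrete, here is a rough back-of-the-envelope sketch. The $1 million and 10TB figures come from the text above; everything else is an assumption for illustration. In particular, the sketch reads "an order of magnitude" as a flat factor of 10 and treats price/performance as cost per terabyte at equal performance, which is a simplification rather than a measured result.

```python
def cost_per_tb(total_cost_usd: float, capacity_tb: float) -> float:
    """Return the cost per terabyte for a system of the given size."""
    if capacity_tb <= 0:
        raise ValueError("capacity_tb must be positive")
    return total_cost_usd / capacity_tb


# Figures from the text: roughly $1 million for a 10TB enterprise data warehouse.
EDW_TOTAL_COST_USD = 1_000_000
EDW_CAPACITY_TB = 10

# Assumption (not from the text): "an order of magnitude" taken as exactly 10x,
# applied to cost per TB at comparable performance.
ASSUMED_IMPROVEMENT_FACTOR = 10

edw_cost_per_tb = cost_per_tb(EDW_TOTAL_COST_USD, EDW_CAPACITY_TB)
hive_cost_per_tb_estimate = edw_cost_per_tb / ASSUMED_IMPROVEMENT_FACTOR

print(f"Traditional warehouse: ~${edw_cost_per_tb:,.0f} per TB")
print(f"Hive (illustrative estimate): ~${hive_cost_per_tb_estimate:,.0f} per TB")
```

Under these assumptions the traditional warehouse works out to about $100,000 per terabyte, and the illustrative Hive figure to about $10,000 per terabyte. Actual costs depend on hardware, staffing, and workload, so treat the second number as an order-of-magnitude guide only.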
Apache Hive at a glance
Pros |
|
Cons |
|
Platforms | Works with Hadoop 0.20.x, 0.23.x.y, 1.x.y, 2.x.y |
Cost | Free open source under the Apache License, Version 2.0 |
This article, "Review: Apache Hive brings real-time queries to Hadoop," was originally published at InfoWorld.com.
Copyright © 2014 IDG Communications, Inc.