Review: Apache Hive brings real-time queries to Hadoop

Hive's SQL-like query language and vastly improved speed on huge data sets make it the perfect partner for an enterprise data warehouse

Page 2 of 5

To address these shortcomings, the Hive community began a program (sometimes called the "Stinger" initiative) to improve query speed, with the goal of making Hive suitable for real-time, interactive queries and exploration. These improvements were delivered in three phases, in Hive versions 0.11, 0.12, and 0.13.

Finally, although HiveQL, the query language, is based on SQL-92, it differs from SQL in some important ways because it runs on top of Hadoop. For instance, DDL (Data Definition Language) commands need to account for the fact that tables exist in a multi-user file system that supports multiple storage formats. Nevertheless, SQL users will find HiveQL familiar and should have little trouble adapting to it.
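To make the DDL point concrete, here is a minimal sketch that assembles a HiveQL CREATE EXTERNAL TABLE statement. The EXTERNAL, STORED AS, and LOCATION clauses are HiveQL; the helper function, the table name, the columns, and the path are hypothetical and chosen only for illustration.

```python
def create_external_table(name, columns, storage_format, location):
    """Build a HiveQL CREATE EXTERNAL TABLE statement as a string.

    The argument values are illustrative; only the clause keywords are HiveQL.
    """
    column_list = ", ".join(f"{col} {hive_type}" for col, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {name} ({column_list}) "
        f"STORED AS {storage_format} "   # storage format of the underlying files
        f"LOCATION '{location}'"         # directory in the Hadoop file system
    )


# Hypothetical table over files that already live in the Hadoop file system.
ddl = create_external_table(
    name="page_views",
    columns=[("user_id", "BIGINT"), ("url", "STRING"), ("view_time", "TIMESTAMP")],
    storage_format="ORC",
    location="/data/page_views",
)
print(ddl)
```

Unlike a typical SQL-92 CREATE TABLE, the statement has to say where the data files sit and how they are stored, which is the kind of difference a SQL user will need to get used to.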

Hive platform architecture
From the top down, Hive looks much like any other relational database. Users write SQL queries and submit them for processing, either through a command-line tool that interacts directly with the database engine or through third-party tools that communicate with the database via JDBC or ODBC. The picture looks something like this:

Figure: The Hive architecture.

By using the JDBC and ODBC drivers, available for Mac and Windows, data workers can connect their favorite SQL client to Hive to browse, query, and create tables. For power users, there is still the original, thick client CLI that interacts directly with the Hive driver. This client is the most robust, but it requires direct access to Hadoop and therefore is most suitable for local network operations where firewalls, DNS, and network topology aren't an issue.
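As a rough illustration of what a JDBC-based SQL client needs in order to reach Hive, the sketch below composes the connection settings. It assumes the cluster exposes HiveServer2 (which the article does not spell out) on its default port of 10000, reached through the `jdbc:hive2://` URL scheme and the `org.apache.hive.jdbc.HiveDriver` class. The host name and the helper function are hypothetical.

```python
def hive_jdbc_url(host, port=10000, database="default"):
    """Compose a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/database."""
    return f"jdbc:hive2://{host}:{port}/{database}"


# Settings a JDBC-based SQL client would typically ask for.
# The host name is a placeholder, not a real server.
client_settings = {
    "driver_class": "org.apache.hive.jdbc.HiveDriver",
    "url": hive_jdbc_url("hive.example.com"),
}
print(client_settings["url"])
```

Because the client only needs a reachable host and port, this route avoids the direct Hadoop access that the thick-client CLI requires.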

HCatalog, the table-management layer built on the Hive metastore and previously a separate Hadoop project, has been rolled up into the Hive distribution. Backed by the metastore's own relational database, it saves the schemas that are defined in Hive, which simplifies new queries and makes those schemas available to other tools in the Hadoop tool chain, such as Pig.
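The idea of a schema store backed by a relational database can be pictured with a toy model. The sketch below is not Hive's actual metastore schema or API: an in-memory SQLite database stands in for the backing database, and the table layout and function names are invented for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the metastore's backing relational database.
metastore = sqlite3.connect(":memory:")
metastore.execute(
    "CREATE TABLE table_columns "
    "(table_name TEXT, ordinal INTEGER, col_name TEXT, col_type TEXT)"
)


def register_table(table_name, columns):
    """Save a table's schema once, as defining the table in Hive would."""
    rows = [
        (table_name, i, name, col_type)
        for i, (name, col_type) in enumerate(columns)
    ]
    metastore.executemany("INSERT INTO table_columns VALUES (?, ?, ?, ?)", rows)


def lookup_schema(table_name):
    """Read a schema back, as Hive or another tool such as Pig might."""
    cursor = metastore.execute(
        "SELECT col_name, col_type FROM table_columns "
        "WHERE table_name = ? ORDER BY ordinal",
        (table_name,),
    )
    return cursor.fetchall()


register_table("page_views", [("user_id", "BIGINT"), ("url", "STRING")])
print(lookup_schema("page_views"))
```

The point of the model is that the schema is written once and then looked up by name, so later queries and other tools do not have to redeclare the column layout.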
