What MapReduce does for batch processing, Cloudera Impala does for real-time SQL queries. The Impala engine sits on all the data nodes in your Hadoop cluster, listening for queries. After parsing each query and optimizing an execution plan, it coordinates parallel processing among the worker nodes in the cluster. The result is low-latency SQL queries across Hadoop with near-real-time insight into big data.
Because Impala uses your native Hadoop infrastructure (HDFS, HBase, Hive metadata), you get a unified platform where you can analyze all of your data without connector complexities, ETL, or expensive data warehousing. And because Impala can be tapped from any ODBC/JDBC source, it makes a great companion for BI packages like Pentaho.
-- James R. Borck