Though conventional relational database management technologies have supported interactive querying for years, Dremel offers far greater scalability and speed, Google contends. Thousands of users at Google use Dremel for a variety of applications, such as analyzing crawled web documents, tracking installation data for Android applications, reporting crashes and maintaining disk I/O statistics for hundreds of thousands of disks.
Dremel, though, isn't a replacement for MapReduce and Hadoop, said Ju-kay Kwek, product manager of Google's recently launched BigQuery hosted big data analytics service based on Dremel. Google uses Dremel in conjunction with MapReduce, he said. Hadoop MapReduce is used to prepare, clean, transform and stage massive amounts of server log data, and then Dremel is used to analyze the data.
Hadoop and Dremel are distributed computing technologies, but each was built to address very different problems, Kwek said. For example, if Google were trying to troubleshoot a problem with its Gmail service, it would need to look through massive volumes of log data to pinpoint the issue quickly.
"Gmail has 450 million users. If every user had several hundred interactions with Gmail think of the number of events and interaction we would have to log," Kwek said. "Dremel allows us to go into the system and start to interrogate those logs with speculative queries," Kwek said. A Google engineer could say, "show me all the response times that were above 10 seconds. Now show it to me by region," Kwek said. Dremel allows engineers to very quickly pinpoint where the slowdown was occurring, Kwek said.
"Dremel distributes data across many, many machines and it distributes the query to all of the servers and asks each one 'do you have my answer?' It then aggregates it and gets back the answer in literally seconds."
Using Hadoop and MapReduce for the same task would take longer because it requires writing a job, launching it and waiting for it to spread across the cluster before the information can be sent back to a user. "You can do it, but it's messy. It's like trying to use a cup to slice bread," Kwek said.
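The query pattern Kwek describes, in which the query goes to every server and the partial answers are merged, can be illustrated with a small sketch. It is only a toy model under stated assumptions, not Dremel's actual implementation: the shard contents, the field names `region` and `response_s`, and the functions `query_shard` and `slow_responses_by_region` are all hypothetical, and threads on one machine stand in for servers in a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical log records, partitioned across three "servers" (shards).
SHARDS = [
    [{"region": "us", "response_s": 12.5}, {"region": "eu", "response_s": 0.4}],
    [{"region": "eu", "response_s": 15.0}, {"region": "us", "response_s": 11.0}],
    [{"region": "asia", "response_s": 2.1}, {"region": "asia", "response_s": 30.2}],
]


def query_shard(records, threshold_s):
    """Filter one shard locally and return a partial count of slow responses per region."""
    return Counter(r["region"] for r in records if r["response_s"] > threshold_s)


def slow_responses_by_region(shards, threshold_s=10.0):
    """Scatter the same query to every shard, then merge the partial counts."""
    total = Counter()
    with ThreadPoolExecutor() as pool:
        # Each shard answers independently; only small partial results come back.
        for partial in pool.map(lambda s: query_shard(s, threshold_s), shards):
            total.update(partial)
    return dict(total)


if __name__ == "__main__":
    # "Show me all the response times above 10 seconds. Now show it to me by region."
    print(slow_responses_by_region(SHARDS))
```

Because each shard returns only a small per-region tally, the merge step stays cheap no matter how many records each shard holds. That is the property that makes this style of query suited to interactive use.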
The same kind of data volumes that pushed Google to Dremel years ago have started emerging in some mainstream enterprise organizations as well, Kwek said.
Companies in the automobile, pharmaceutical, logistics and financial services industries are constantly inundated with data and are looking for tools to help them quickly query and analyze it.
Google's hosted BigQuery analytics service is being positioned to take advantage of the need for new big data technologies. In fact, said Gartner analyst Rita Sallam, the Dremel-based hosted service could be a game changer for big data analytics.
The service allows enterprises to interactively query massive data sets without having to buy expensive underlying analytics technologies, Sallam said. Businesses can explore and experiment with different data types and different data volumes at a fraction of what it would cost to buy an enterprise data analytics platform, she said.
The truly noteworthy aspect of BigQuery is not its underlying technology but its potential to cut IT costs at large companies, she said. "It offers a much more cost-effective way to analyze large sets of data," compared to traditional enterprise data platforms. "It really has a potential to change the cost equation and allow companies to experiment with their big data," Sallam said.
Major vendors of business intelligence products, including SAS Institute, SAP, Oracle, Teradata and Hewlett-Packard Co., have been rushing to deliver tools with improved data analytics capabilities. Like Google, most of these vendors see the Hadoop platform mainly as a massive data store for preparing and staging multi-structured data for analysis by other tools.