Deploying an updated version of its DB2 database, IBM is pitching its SmartCloud IaaS (infrastructure as a service) offering for data reporting and analysis.
"We're the only player in the marketplace that has [a cloud service] for data-in-motion -- being able to analyze data in real time," said Bob Picciano, IBM's general manager for Information Management.
In the second half of this year, IBM SmartCloud IaaS will begin running version 10.5 of IBM's DB2 database, which should be generally available by early June. A new set of technologies that comes with this release, collectively called BLU Acceleration, can make data analysis 25 times faster or more, IBM claimed.
IBM also announced that SmartCloud can now run copies of SAP's HANA in-memory database, initially for test and development jobs.
BLU (a code name that stands for Big data, Lightning fast, Ultra easy) is a set of technologies that speed up DB2 queries against in-memory data. It offers columnar processing, in which only the columns of a table that a query needs are read, which improves performance. It skips over unneeded data, such as duplicate entries. It can spread a single query across multiple processor cores using parallel vector processing techniques. It also includes a compression technology that minimizes the amount of space a data set needs while keeping the data readily accessible for quick analysis.
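To make the columnar, data-skipping and compression ideas more concrete, here is a toy sketch in Python. It is not IBM's implementation and uses no DB2 API; every name in it (`encode_column`, `sum_where_at_least`, the sample columns) is hypothetical. It assumes a simple design: each column is dictionary-encoded (each distinct value stored once, rows hold small integer codes), and each fixed-size block of rows keeps a minimum and maximum so a query can skip blocks that cannot match. Because the dictionary is sorted, the filter can be evaluated directly on the compressed codes.

```python
from bisect import bisect_left
from dataclasses import dataclass

BLOCK_SIZE = 4  # rows per block; deliberately tiny, for illustration only


@dataclass
class EncodedColumn:
    dictionary: list  # sorted distinct values; position in the list = code
    codes: list       # one integer code per row (the "compressed" column)
    block_min: list   # per-block minimum value, used for data skipping
    block_max: list   # per-block maximum value, used for data skipping


def encode_column(values, block_size=BLOCK_SIZE):
    """Dictionary-encode one column and record min/max per block (hypothetical helper)."""
    dictionary = sorted(set(values))  # sorted, so codes preserve value order
    index = {v: i for i, v in enumerate(dictionary)}
    codes = [index[v] for v in values]
    mins, maxs = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        mins.append(min(block))
        maxs.append(max(block))
    return EncodedColumn(dictionary, codes, mins, maxs)


def sum_where_at_least(table, filter_col, threshold, sum_col, block_size=BLOCK_SIZE):
    """Roughly: SELECT SUM(sum_col) FROM table WHERE filter_col >= threshold.

    Only the two named columns are touched; any other column is never read.
    Returns (total, number_of_blocks_skipped).
    """
    f, s = table[filter_col], table[sum_col]
    # Smallest code whose value is >= threshold; valid because the dictionary is sorted.
    min_code = bisect_left(f.dictionary, threshold)
    total, skipped = 0, 0
    for b, block_max in enumerate(f.block_max):
        if block_max < threshold:
            skipped += 1  # no row in this block can satisfy the predicate
            continue
        start = b * block_size
        stop = min(start + block_size, len(f.codes))
        for row in range(start, stop):
            if f.codes[row] >= min_code:  # predicate checked on compressed codes
                total += s.dictionary[s.codes[row]]  # decode only matching values
    return total, skipped


table = {
    "year": encode_column([2010, 2010, 2011, 2011, 2012, 2013, 2012, 2013]),
    "amount": encode_column([5, 7, 3, 2, 10, 20, 30, 40]),
    "region": encode_column(["EU", "US", "EU", "US", "EU", "US", "EU", "US"]),
}
print(sum_where_at_least(table, "year", 2012, "amount"))  # (100, 1)
```

In this example the first block holds only 2010 and 2011 rows, so its stored maximum lets the query skip it without reading its rows, and the `region` column is never touched. A production engine would add the vectorized, multicore execution the article mentions, which this single-threaded sketch omits.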
A customer can keep using its preferred BI (business intelligence) software and simply redirect the SQL and OLAP (online analytical processing) queries from that software to the IBM service. Each DB2 instance on SmartCloud can run on up to 16 processor cores, which collectively could manage a terabyte or more of memory.
"You don't need a lot of cores to be able to take advantage of this kind of parallelism, because we're so efficient in how we're using the processors and memory," Picciano said.
With this technology, a customer could set up a system that analyzes data as it comes off the wire, Picciano said. Typically, organizations store their data in a data warehouse or, if they collect it in sufficient quantities, in a Hadoop cluster. They then use various BI tools to extract patterns and other intelligence from the data set. IBM proposes keeping the data in memory and analyzing it on the fly, using the same BI tools. The in-memory approach can make queries a thousand times faster, Picciano said.
"Queries that used to run in seven minutes on a well-known data warehouse system can run in 8 milliseconds on our systems," Picciano claimed.
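Taking the two quoted timings at face value, the implied speedup for that one vendor-supplied example is far larger than the thousandfold figure cited for the in-memory approach in general:

$$
\frac{7\ \text{min}}{8\ \text{ms}} = \frac{7 \times 60{,}000\ \text{ms}}{8\ \text{ms}} = \frac{420{,}000}{8} = 52{,}500
$$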
IBM joins a number of companies that are pitching cloud computing as an ideal platform for large-scale data analysis. Google, for instance, offers its BigQuery service for crunching data online. VMware, along with parent company EMC, recently spun out a cloud service subsidiary called Pivotal that will focus on providing data analysis services.