When you absolutely, positively need to crunch numbers as quickly as possible, you turn to a GPU. Small wonder that most math-intensive applications, including some machine learning frameworks, draw on GPUs to parallelize and accelerate their calculations.
But GPUs are now souping up databases as well. The two pair up nicely: GPUs are adept at whipping through calculations at scale, and databases often have exactly those kinds of demands -- for instance, when performing complex joins or row-by-row math.
Here are five products, three commercial and two open source, that include GPU acceleration as part of the package.
A startup that recently whipped back the drapes on its premiere offering, MapD compiles SQL queries to native GPU code with the LLVM compiler framework. It can also use the CPU as a fallback if needed.
Another big source of acceleration, according to the company, is each GPU's local memory, which serves as a data cache that the company says operates many times faster than the CPU cache or main memory. MapD claims its GPU-powered setup is orders of magnitude faster than in-memory databases and Hadoop setups alike. Take those numbers with a pinch of salt, though: the timings are based on ultra-high-end, ultra-expensive Nvidia Tesla K80 GPUs, which are built on the Kepler architecture.
Kinetica, formerly known as GPUdb, makes GPU-powered database solutions, as its old name hints. The latest version of its database product, also named Kinetica, not only uses GPU acceleration generally but also exploits acceleration features specific to Nvidia's GPU stack, such as NVLink, which accelerates data transfers between GPUs (and between GPUs and CPUs) to avoid bottlenecks on the PCIe bus.
But Kinetica is also trying to make sure enterprises see this as a modern enterprise database product, not only a showcase for cutting-edge tech. Hence, there's support for standard commercial database features like SQL-92 queries, clustering, failover, and one-click installation.
BlazingDB is a GPU-powered database aimed specifically at companies using PostgreSQL, MySQL, or Amazon Redshift. BlazingDB's creators claim massive speed improvements over all of those products.
Another key difference is that BlazingDB offers both local and cloud-hosted instances of its product. If you already have data in Amazon or Azure, you can spin up a BlazingDB instance next to it, pipe in your data, and compare query performance yourself.
The company began offering a commercial version of its product back in June, as well as a free community edition. Note that you need the Nvidia CUDA driver for Linux, and the only supported platform right now is Ubuntu 14.04.
Not all databases are general-purpose SQL systems; some are optimized for specific kinds of data-manipulation jobs. Graph databases, for instance, analyze relationships between objects and report back on them.
Those kinds of databases are also amenable to GPU speedups. Behold Blazegraph, an open source graph database written in Java, with two methods of GPU acceleration. The most basic one is to simply apply GPU acceleration to existing graph-analysis jobs, which Blazegraph's creators claim will provide a speed boost of 200 to 300 times over a CPU-bound job.
Option No. 2 is to rewrite the job in Blazegraph DASL, a language designed for parallel execution on GPUs. "By combining the ease of Spark with the speed of CUDA and GPUs," Blazegraph's creators claim, "their applications can operate up to 1,000x faster than Spark without GPUs."
Popular open source database PostgreSQL has a lot of selling points: It's highly scalable, sports NoSQL/JSON-style document storage functions, and has stayed current with state-of-the-art additions to database technology.
One feature it doesn't have out of the box is GPU acceleration. However, it's possible to add GPU acceleration via a side project named PG-Strom. When a given query is optimized, PG-Strom determines if it can be offloaded to the GPU; if so, it builds a GPU-optimized version of that query with a just-in-time compiler. The resulting query is then offloaded to the GPU and parallelized.
Setting up PG-Strom takes some work, as it requires Nvidia's CUDA toolkit and needs to be compiled from source. But once integrated into PostgreSQL as a custom scan provider, it works with queries as-is; they don't need to be rewritten to be GPU-accelerated.
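To make the offload decision concrete, here is a toy Python sketch of the general idea: inspect a query plan, send it to the GPU if it qualifies, and fall back to the CPU otherwise. This is not PG-Strom's actual logic or API. The `QueryPlan` type, the set of supported operations, and the row-count threshold are all hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical: operations this toy "GPU path" knows how to run.
GPU_SUPPORTED_OPS = {"scan", "filter", "join", "aggregate"}


@dataclass
class QueryPlan:
    ops: List[str]   # operation names appearing in the plan
    est_rows: int    # planner's estimate of rows to process


def can_offload(plan: QueryPlan, min_rows: int = 100_000) -> bool:
    """Assumed rule: offload only when every operation is supported
    and the input is large enough to be worth moving to the GPU."""
    all_supported = all(op in GPU_SUPPORTED_OPS for op in plan.ops)
    return all_supported and plan.est_rows >= min_rows


def choose_executor(plan: QueryPlan) -> str:
    """Return "gpu" when the plan qualifies, otherwise fall back to "cpu"."""
    return "gpu" if can_offload(plan) else "cpu"


big_join = QueryPlan(ops=["scan", "join", "aggregate"], est_rows=5_000_000)
small_scan = QueryPlan(ops=["scan", "filter"], est_rows=200)
odd_query = QueryPlan(ops=["scan", "recursive_cte"], est_rows=5_000_000)

for plan in (big_join, small_scan, odd_query):
    print(plan.ops, "->", choose_executor(plan))
```

The point of the sketch is that the decision happens at planning time, so the SQL itself never changes. That is why queries can run as-is once the extension is installed.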
[An earlier version of this article incorrectly identified MapD as MapDB.]