BlazingSQL review: Fast ETL for GPU-based data science

BlazingSQL builds on RAPIDS to distribute SQL query execution across GPU clusters, delivering the ETL for an all-GPU data science workflow.

At a Glance
  • BlazingSQL V0.17

BlazingSQL is a GPU-accelerated SQL engine built on top of the RAPIDS ecosystem. BlazingSQL allows standard SQL queries to be distributed across GPU clusters, and the results to be fed directly into GPU-accelerated visualization and machine learning libraries. In short, BlazingSQL provides the ETL (extract, transform, load) portion of an all-GPU data science workflow.
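To make the ETL role concrete, here is a CPU-only sketch of the pattern: register raw records as a table, clean and aggregate them with SQL, and hand the result to downstream code as columns. It uses Python's built-in `sqlite3` purely as a stand-in (an assumption for illustration; it is not BlazingSQL and runs on the CPU), and the table and column names are hypothetical. In BlazingSQL itself, the corresponding steps go through a `BlazingContext`, whose `create_table` and `sql` methods register data and return query results as a cuDF DataFrame in GPU memory.

```python
import sqlite3

# CPU-only stand-in: sqlite3 plays the role of BlazingSQL's GPU SQL engine.
# Table "trips" and its columns are hypothetical example data.

def run_etl(rows):
    """Load raw rows into a table, clean and aggregate with SQL, return columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trips (vendor TEXT, fare REAL)")
    conn.executemany("INSERT INTO trips VALUES (?, ?)", rows)
    # The "transform" step is plain SQL: drop bad records, then aggregate.
    cur = conn.execute(
        "SELECT vendor, AVG(fare) AS avg_fare, COUNT(*) AS n "
        "FROM trips WHERE fare > 0 GROUP BY vendor ORDER BY vendor"
    )
    names = [d[0] for d in cur.description]
    result = cur.fetchall()
    conn.close()
    # Return a column-oriented dict, the shape a DataFrame library consumes.
    return {name: [row[i] for row in result] for i, name in enumerate(names)}

raw = [("A", 10.0), ("A", 20.0), ("B", 5.0), ("B", -1.0)]
features = run_etl(raw)
print(features)
```

In the GPU version the appeal is that the query result never leaves device memory, so the "load" step is simply passing the DataFrame to the next library.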

RAPIDS is a suite of open source software libraries and APIs, incubated by Nvidia, that uses CUDA and is based on the Apache Arrow columnar memory format. cuDF, part of RAPIDS, is a Pandas-like DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data on GPUs.
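The columnar idea behind Arrow can be illustrated without a GPU. The sketch below is a simplified model, not Arrow or cuDF: it stores each field in its own contiguous typed buffer (Python's `array` module stands in for Arrow buffers, which also carry validity bitmaps and other metadata), so a filter-and-aggregate touches only the columns it needs. The field names and `filter_and_sum` helper are hypothetical.

```python
from array import array

# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "price": 9.5, "qty": 3},
    {"id": 2, "price": 4.0, "qty": 10},
    {"id": 3, "price": 12.0, "qty": 1},
]

# Column-oriented layout (the Arrow idea): one contiguous typed buffer per field.
columns = {
    "id": array("q", [r["id"] for r in rows]),        # 64-bit signed ints
    "price": array("d", [r["price"] for r in rows]),  # 64-bit floats
    "qty": array("q", [r["qty"] for r in rows]),
}

def filter_and_sum(cols, min_price):
    """Sum price * qty where price >= min_price, reading only two columns."""
    mask = [p >= min_price for p in cols["price"]]
    return sum(p * q for p, q, keep in zip(cols["price"], cols["qty"], mask) if keep)

print(filter_and_sum(columns, 5.0))  # 9.5*3 + 12.0*1
```

In cuDF the same operation reads much like Pandas boolean-mask indexing, with each column held in GPU memory so that the mask and the arithmetic run as GPU kernels.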

For distributed SQL query execution, BlazingSQL draws on Dask, which is an open source tool that can scale Python packages to multiple machines. Dask can distribute data and computation over multiple GPUs, either in the same system or in a multi-node cluster. Dask integrates with RAPIDS cuDF, XGBoost, and RAPIDS cuML for GPU-accelerated data analytics and machine learning.
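The core pattern Dask relies on is partition, compute locally, then combine. The following is a minimal sketch of that pattern for a group-by sum, using a thread pool as a stand-in for GPU workers; it is an illustration under that assumption and does not use Dask's API, whose scheduler, data movement, and fault handling are far more involved. The helper names are hypothetical. In BlazingSQL's documented multi-GPU setup, the workers instead come from a `dask_cuda.LocalCUDACluster` (or a multi-node cluster) connected through a `dask.distributed.Client`, which is passed to `BlazingContext` via its `dask_client` argument.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partition(records, n_parts):
    """Split records round-robin into n_parts chunks, one per stand-in worker."""
    return [records[i::n_parts] for i in range(n_parts)]

def partial_sum(chunk):
    """Per-worker step: group-by-key sum over the local partition only."""
    acc = Counter()
    for key, value in chunk:
        acc[key] += value
    return acc

def distributed_groupby_sum(records, n_workers=2):
    parts = partition(records, n_workers)
    # Each partition is processed independently, as a separate GPU would do.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, parts))
    # Combine step: merge the small per-partition results into the final answer.
    total = Counter()
    for p in partials:
        total.update(p)  # Counter.update adds counts rather than replacing them
    return dict(total)

data = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
print(distributed_groupby_sum(data, n_workers=3))
```

Because only the per-partition partial sums are exchanged, the combine step moves far less data than the inputs, which is what makes this shape of query scale across GPUs.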

BlazingSQL is a SQL interface for cuDF, with various features to support large-scale data science workflows and enterprise datasets, including support for the dask-cudf library maintained by the RAPIDS project. BlazingSQL allows you to query data stored externally (such as in Amazon S3, Google Cloud Storage, or HDFS) using simple SQL; the results of your SQL queries are GPU DataFrames (GDFs), which are immediately accessible to any RAPIDS library for data science workloads.
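BlazingSQL's `BlazingContext` exposes methods such as `s3`, `gs`, and `hdfs` for registering external storage, after which `create_table` can point at paths in that storage. The sketch below models only the bookkeeping half of that idea under stated assumptions: it keeps a hypothetical registry mapping a storage name to an S3 bucket and resolves a table URI to a bucket and object key. The names `REGISTRY`, `register_s3`, and `resolve`, and the assumption that the registered name appears as the URI host, are illustrative and are not BlazingSQL's implementation.

```python
from urllib.parse import urlparse

# Hypothetical registry: storage name -> backend settings.
REGISTRY = {}

def register_s3(name, bucket_name):
    """Record that URIs of the form s3://<name>/... refer to bucket_name."""
    REGISTRY[name] = {"scheme": "s3", "bucket": bucket_name}

def resolve(uri):
    """Map a table path such as s3://name/dir/file.parquet to (bucket, key)."""
    parsed = urlparse(uri)
    backend = REGISTRY.get(parsed.netloc)
    if backend is None or backend["scheme"] != parsed.scheme:
        raise KeyError(f"no storage registered for {parsed.scheme}://{parsed.netloc}")
    # The object key is the URI path without its leading slash.
    return backend["bucket"], parsed.path.lstrip("/")

register_s3("mydata", bucket_name="example-bucket")
print(resolve("s3://mydata/taxi/2020/part-0.parquet"))
```

Separating registration from table creation means credentials and bucket details are supplied once, while SQL queries refer only to table names.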

The BlazingSQL code is an open source project released under the Apache 2.0 License. The BlazingSQL Notebooks site is a service using BlazingSQL, RAPIDS, and JupyterLab, built on AWS. It currently uses g4dn.xlarge instances and Nvidia T4 GPUs. There are plans to upgrade some of the larger BlazingSQL Notebooks cluster sizes to A100 GPUs in the future.

