Direct Data Accelerator: The F1 that powers Yellowbrick’s Cloud Data Warehouse

Yellowbrick’s Direct Data Accelerator technology shrinks and removes the bottlenecks in the data flow from storage through the CPU, across the network, and back to the SQL client.


Growing up, I was a big Formula 1 fan. Whenever I saw the Ferrari duo of Schumacher and Barrichello, I was glued to my TV. But it’s not just the sheer speed that enthralls me. What impresses me most is the number of hours put in by the drivers, and even more by the crew, driven by a passion for going half a second faster and an absolute need for perfection.

Check out the moves on this F1 pit stop–if this video doesn’t blow your spark plug, you probably need a service:

Like an F1 car, the Yellowbrick Cloud Data Warehouse is engineered for extreme efficiency and speed while delivering extreme cost savings. Yellowbrick’s data warehouse technology is highly differentiated from that of our competitors. We provide everything you would expect from a modern, elastic, SQL-based data warehouse, combining the simplicity of the cloud with performance perfected through years of delivering the highest ROI to customers on-prem.

But how do we do it? First, we challenged some assumptions about data warehouse architecture and optimized the entire data path and OS process management. We call this Direct Data Accelerator technology.

Direct Data Accelerator: The high-octane fuel of Yellowbrick Cloud Data Warehouse

Today’s servers are routinely available with over a terabyte of memory, over 100 CPU cores, and data acceleration capabilities. Yet, the algorithms used in data warehouses are still built with several assumptions around slower storage, network, and general Linux management.

Yellowbrick’s Direct Data Accelerator technology shrinks and removes the bottlenecks in the data flow from storage through the CPU, across the network, and back to the SQL client. This requires optimizing operations at a significantly lower level of the technology stack than most cloud data warehouse providers would dare to tread.

Direct Data Accelerator technology consists of three key components:

  1. A purpose-built OS for a cloud data warehouse.
  2. An optimized storage stack that uses Intel® Direct IO for faster data loading into the warehouse vCPUs from storage.
  3. An ultra-low latency network using Intel® DPDK for faster performance of expensive queries.

The result: Yellowbrick customers deliver a differentiated experience for their users at a significantly lower cost. For example, after moving from AWS Redshift to Yellowbrick, a B2B martech company could show its customers in real time which marketing campaigns and channels were working. It shortened its ETL process by 31x, from 9 hours to 17.5 minutes, and improved ad hoc query performance by 400x, from over 6 minutes to less than 1 second, all at one-sixth of its previous cloud cost, going from 8 RA3.4xlarge nodes to 1 Small Yellowbrick node. 1
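The quoted ratios can be sanity-checked with simple arithmetic. The sketch below uses only the figures stated above; the helper name `speedup` is made up for illustration, and because the query times are given as "over 6 minutes" and "less than 1 second", the second value is only a lower bound.

```python
def speedup(before_seconds: float, after_seconds: float) -> float:
    """Return how many times faster the 'after' run is than the 'before' run."""
    return before_seconds / after_seconds


# ETL: 9 hours down to 17.5 minutes, both converted to seconds.
etl_speedup = speedup(9 * 60 * 60, 17.5 * 60)

# Ad hoc queries: exactly 6 minutes vs. exactly 1 second gives a floor,
# since the real figures are "over" and "less than" those values.
query_speedup_floor = speedup(6 * 60, 1)

print(f"ETL speedup: about {etl_speedup:.1f}x")
print(f"Ad hoc query speedup: at least {query_speedup_floor:.0f}x")
```

The ETL ratio rounds to the quoted 31x, and the query floor of 360x is consistent with the quoted 400x.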

Re-envisioning the Cloud Data Warehouse Operating System

Most database platforms run on a general-purpose OS built to support various workloads together. In a traditional OS, a process comprises threads that execute. Yellowbrick has re-envisioned a single-purpose OS optimized for database workload efficiency, bypassing the general-purpose OS for task scheduling, device interfaces, and memory management. Cooperative multitasking within and across distributed compute nodes ensures queries are answered faster.

Yellowbrick has a new threading model based on reactive concepts such as futures and co-routines. As a result, small, individual tasks–which do not have any stack associated with them–are scheduled and run to completion without preemptive context switching.

A collection of tasks is called a work, which executes in a fully asynchronous, reactive manner. Each work has its own memory arena, and all resource consumption by a work is bounded and isolated by the kernel.

Finally, the scheduler is aware of works and tasks, and to avoid cache displacement, it will never try to intermix the execution of tasks from different works. For example, when database queries exchange large volumes of data (such as during the redistribution of data for a large join), the scheduler synchronizes the same work to run on peers in the cluster, guaranteeing that the received data is processed immediately.
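To make the idea of works and run-to-completion tasks more concrete, here is a minimal Python sketch. It is not Yellowbrick’s implementation: the `Work` and `Scheduler` classes, the `make_task` helper, and the policy of draining one work entirely before starting the next are all assumptions chosen to keep the example small. It only illustrates stackless tasks that run without preemption and a scheduler that does not interleave tasks from different works.

```python
from collections import deque
from typing import Callable, List, Optional

# A task is a plain callable that runs to completion and may return
# follow-up tasks. It keeps no stack between invocations.
Task = Callable[[], Optional[list]]


class Work:
    """Hypothetical 'work': a named collection of tasks with its own queue."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tasks: deque = deque()

    def submit(self, task: Task) -> None:
        self.tasks.append(task)


class Scheduler:
    """Runs one work at a time so tasks from different works never interleave."""

    def __init__(self) -> None:
        self.works: deque = deque()
        self.trace: List[str] = []  # which work each executed task belonged to

    def add(self, work: Work) -> None:
        self.works.append(work)

    def run(self) -> None:
        while self.works:
            work = self.works.popleft()
            # Assumption: drain the whole work before touching the next one.
            while work.tasks:
                task = work.tasks.popleft()
                follow_ups = task() or []  # runs to completion, no preemption
                work.tasks.extend(follow_ups)
                self.trace.append(work.name)


def make_task(label: str, log: List[str], follow_ups: Optional[list] = None) -> Task:
    """Build a task that records its label and optionally spawns more tasks."""
    def task() -> Optional[list]:
        log.append(label)
        return follow_ups
    return task


log: List[str] = []
q1, q2 = Work("q1"), Work("q2")
q1.submit(make_task("q1:scan", log, [make_task("q1:join", log)]))
q2.submit(make_task("q2:scan", log))
q1.submit(make_task("q1:agg", log))

scheduler = Scheduler()
scheduler.add(q1)
scheduler.add(q2)
scheduler.run()
print(log)
```

Even though the q2 task was submitted before the last q1 task, every q1 task runs before any q2 task, which is the non-interleaving property described above.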

Yellowbrick was conceived with the goal of optimizing price/performance. It uses a hybrid column and row store. The row store is optimized for low-commit-latency operations such as real-time streaming ingest from Kafka or CDC tools. The column store is where most data in Yellowbrick resides.

Columnar databases are nothing new. What’s different with Yellowbrick is that it uses Intel’s AVX instructions to process columnar data. This leads to faster results, especially with large analytics queries.
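The difference between the two layouts can be shown with a small sketch. The table, column names, and values below are invented for illustration, and plain Python does not issue AVX instructions; the point is only that a column store keeps each attribute in one contiguous, typed array, which is the shape of data that vector instructions operate on.

```python
from array import array

# Hypothetical sales rows: (order_id, region, amount)
rows = [
    (1, "EU", 120.0),
    (2, "US", 75.5),
    (3, "EU", 30.0),
    (4, "APAC", 210.25),
]

# Row layout: each record is stored together, which makes single-row
# appends (for example, streaming ingest) cheap.
row_store = list(rows)

# Column layout: each attribute is stored contiguously in its own array.
column_store = {
    "order_id": array("q", (r[0] for r in rows)),
    "region": [r[1] for r in rows],
    "amount": array("d", (r[2] for r in rows)),
}

# An aggregate over one attribute walks every record in the row layout...
total_from_rows = sum(r[2] for r in row_store)

# ...but reads a single contiguous array in the column layout.
total_from_columns = sum(column_store["amount"])

print(total_from_rows, total_from_columns)
```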

Accelerating load times from database storage to data warehouse

In a cloud data warehouse architecture, local storage is ephemeral. The only way to reliably persist data is to write it to cost-effective object storage such as S3, GCS, or ADLS Gen2. Data is typically moved from object storage to the local storage of the compute instance (the data warehouse), from local storage to main memory, and finally from main memory, through the cache, to the CPU before a query can be processed.

An obvious downside to this approach is the terrible latency between compute and storage instances, which can dramatically impact a data warehouse’s performance. Large IO queue depths across many targets must be correctly pipelined to maximize bandwidth and IOPS. The client libraries from the cloud vendors are incompatible. All third-party libraries are incredibly inefficient, performing gratuitous data copying and coping poorly with pipelining the massive number of outstanding operations needed to drive high bandwidth.
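The pipelining point can be illustrated with a short asyncio sketch that keeps a fixed number of requests in flight. Everything here is a stand-in: `fetch_block` simulates an object store GET with a short sleep, and the names and the queue depth of 16 are arbitrary assumptions, not values from any real client library.

```python
import asyncio


async def fetch_block(block_id: int) -> bytes:
    """Stand-in for an object store GET; the sleep simulates network latency."""
    await asyncio.sleep(0.01)
    return f"block-{block_id}".encode()


async def fetch_all(block_ids, queue_depth: int):
    """Fetch blocks while keeping at most `queue_depth` requests outstanding."""
    limiter = asyncio.Semaphore(queue_depth)

    async def bounded(block_id: int) -> bytes:
        async with limiter:
            return await fetch_block(block_id)

    # gather returns results in input order even though requests overlap.
    return await asyncio.gather(*(bounded(b) for b in block_ids))


blocks = asyncio.run(fetch_all(range(32), queue_depth=16))
print(len(blocks), blocks[0], blocks[-1])
```

With a queue depth of 1 the 32 simulated requests would run back to back; with 16 outstanding they overlap, which is why deep, well-managed queues matter when each request carries high latency.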

Additionally, traditional database platforms move data from disk storage to main memory and only then start operating on it, based on the outdated assumption that this improves performance. This wastes CPU resources on switching data in and out of the memory cache, resources that could support critical calculations rather than serve database internals. The problem becomes much worse with flash-based storage: data is transferred to memory at a higher speed, but the transfer also uses more system resources, leaving less for query processing.


Yellowbrick’s architecture optimizes this entire path and accelerates the load time. First, we developed a custom asynchronous user-space HTTP stack and object store library that reduces CPU consumption by 97% compared to Amazon’s library.1 A local NVMe drive is used as a cache for blocks held in object storage to increase processing efficiency.
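As a rough illustration of using fast local storage as a cache for object storage blocks, the sketch below puts a small block cache in front of a slow fetch function. The `BlockCache` class, its least-recently-used eviction policy, and the in-memory dictionary standing in for the object store are all assumptions; the article does not describe how Yellowbrick’s cache chooses what to evict.

```python
from collections import OrderedDict


class BlockCache:
    """LRU cache standing in for local NVMe in front of a slow object store."""

    def __init__(self, fetch_from_object_store, capacity_blocks: int) -> None:
        self._fetch = fetch_from_object_store
        self._capacity = capacity_blocks
        self._blocks: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_id: str) -> bytes:
        if block_id in self._blocks:
            self._blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return self._blocks[block_id]
        self.misses += 1
        data = self._fetch(block_id)            # slow path: object storage
        self._blocks[block_id] = data
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)    # evict least recently used
        return data


# A dictionary plays the role of the object store in this sketch.
object_store = {f"blk{i}": f"data{i}".encode() for i in range(4)}
cache = BlockCache(object_store.__getitem__, capacity_blocks=2)

for block_id in ["blk0", "blk1", "blk0", "blk2", "blk1"]:
    cache.read(block_id)

print(cache.hits, cache.misses)
```

Only the repeated read of `blk0` is served from the cache here; `blk1` has already been evicted by the time it is requested again, which shows how cache capacity and access pattern together determine how often the slow path is taken.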

Second, we architected our query engine to bypass main memory, reading directly from local NVMe storage and performing random reads at memory transfer speed using Intel’s Direct IO technology. This yields large efficiency gains, leaving more memory and CPU resources available for actual query data.

Reimagining the network for expensive database queries

A TCP/IP stack designed for general-purpose networking relies heavily on the Linux kernel, consuming expensive CPU resources for context switches and interrupts. For maximum efficiency, Yellowbrick includes a highly efficient communication framework called ybRPC, optimized for modern microservice-based software stacks.

ybRPC uses kernel bypass: by leveraging Intel’s DPDK library, it removes the legacy network stack from the data path and moves data across cloud compute instances without the CPU overhead of that stack. This allows the expensive parts of database queries, such as redistribution of data for joins, aggregates (GROUP BY), and sorting, to run 10x more efficiently than competing databases, using a fraction of the resources.1
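To show why joins put so much pressure on the network, here is a minimal sketch of redistributing rows by join key so that matching rows land on the same node. Hash partitioning with CRC32 is a common textbook approach and is used here purely as an assumption; the article does not say how Yellowbrick partitions data, and the tables, function names, and node count are invented.

```python
import zlib
from collections import defaultdict


def node_for(key, node_count: int) -> int:
    """Map a join key to a node with a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(str(key).encode()) % node_count


def redistribute(rows, key_index: int, node_count: int) -> dict:
    """Group rows by the node that owns their join key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[node_for(row[key_index], node_count)].append(row)
    return dict(partitions)


# Hypothetical tables: customers(id, name) and orders(id, customer_id, amount)
customers = [(101, "Acme"), (102, "Globex"), (103, "Initech")]
orders = [(1, 101, 250.0), (2, 103, 40.0), (3, 101, 15.5), (4, 102, 99.0)]

node_count = 3
customer_parts = redistribute(customers, key_index=0, node_count=node_count)
order_parts = redistribute(orders, key_index=1, node_count=node_count)

# After redistribution, each node can join its own partitions locally.
joined = []
for node in range(node_count):
    names = {cid: name for cid, name in customer_parts.get(node, [])}
    for order_id, cid, amount in order_parts.get(node, []):
        joined.append((order_id, names[cid], amount))

print(sorted(joined))
```

In a real cluster, every row that hashes to a different node than the one holding it has to cross the network, so the cost of this step is dominated by how efficiently nodes can exchange data.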

Reduce data warehouse costs with Direct Data Accelerator

Cloud data warehouses such as Snowflake and Redshift have become crucial for modern enterprise analytics and applications by delivering an easy-button approach to data warehousing, one that favors simple consumption over cost stability. Yellowbrick provides the same simplicity but brings performance perfected jointly with Intel over years of delivering the highest ROI to customers on-prem.

Yellowbrick’s Direct Data Accelerator technology enables organizations to achieve dramatic performance gains and run more concurrent queries and workloads in a smaller cloud footprint, thereby reducing cost and meeting sustainability goals.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

1 Tests and benchmarks were performed by an end customer; results in your own environment may vary.

Copyright © 2023 IDG Communications, Inc.