Reduce Time to Decision With the Databricks Lakehouse Platform and Latest Intel 3rd Gen Xeon Scalable Processors

With the Databricks Lakehouse Platform and the latest Intel 3rd Gen Xeon Scalable processors, you can reduce time to decision with up to 3.0x better price/performance and a 6.7x speedup.[1]

The Databricks Lakehouse Platform unifies the best of data lakes (openness, scalability, and flexibility) with the best of data warehouses (reliability, governance, and performance). In this blog post, we look at performance using Databricks Photon, which applies the latest techniques in vectorized query processing, together with the latest Intel 3rd Gen Xeon Scalable processors, which include Intel® Advanced Vector Extensions 512 (Intel® AVX-512).

Before we dive into the numbers and the price/performance improvements, let’s consider why these improvements matter. As the volume of your data grows, and delivering insights and making decisions quickly becomes a competitive advantage, the need to process your data quickly grows even faster.

Optimizing and refactoring queries or code can speed up workloads, but analysts should be able to focus on functional intent and business questions rather than on query tuning. So how do you ensure that performance keeps improving over time?

When you choose the Databricks Lakehouse Platform, you are choosing a platform that, together with our partners, continually delivers improvements that bring the best value to our customers.

To examine these benefits in action, we ran a test derived from the industry-standard TPC-DS power test.[2] We compared the results[3] before and after enabling Photon, and again after switching to the latest Intel 3rd Gen Xeon Scalable processors.

Photon is the native vectorized query engine on Databricks, written to be directly compatible with Apache Spark APIs so that it works with your existing code. When you enable Photon, your existing code and queries can take advantage of the latest techniques in vectorized query processing to exploit data- and instruction-level parallelism in CPUs. This gives Photon customers a lower TCO and helps them meet tighter SLAs for ETL and interactive queries.

Intel 3rd Gen Xeon Scalable processors include Intel’s latest generation of Single Instruction Multiple Data (SIMD) instructions, Intel® AVX-512, which boosts performance and throughput for the most demanding computational tasks, such as data analytics and machine learning.
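All of the timings that follow come from a power-test style run: all 99 TPC-DS queries executed one after another in a single stream, with the total elapsed time reported. The post does not describe the harness Databricks used, so the following is only a minimal, hypothetical sketch of that timing pattern. The function name `run_power_test` and the `execute` callable are assumptions for illustration; on a real cluster, `execute` might wrap something like `spark.sql(text).collect()`.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def run_power_test(
    queries: Iterable[Tuple[str, str]],
    execute: Callable[[str], object],
) -> Tuple[float, Dict[str, float]]:
    """Hypothetical single-stream power-test timer (not the harness used in this post).

    `queries` yields (query_id, sql_text) pairs in the order they should run.
    `execute` is an assumed callable that runs one statement to completion.
    Returns the total elapsed seconds and the elapsed seconds per query.
    """
    per_query: Dict[str, float] = {}
    start = time.perf_counter()
    for query_id, sql_text in queries:
        q_start = time.perf_counter()
        execute(sql_text)  # queries run strictly one at a time, no concurrency
        per_query[query_id] = time.perf_counter() - q_start
    total = time.perf_counter() - start
    return total, per_query


# Illustration only: a stand-in executor that just records the statements it receives.
executed = []
total, timings = run_power_test(
    [(f"q{i}", f"SELECT {i}") for i in range(1, 100)],
    executed.append,
)
print(len(timings))  # 99
```

Keeping the executor as a parameter separates the timing logic from how queries are submitted, so the same loop could time the baseline, the Photon run, and the E8ds_v5 run in exactly the same way.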

Establishing a baseline

For the baseline, we used Azure’s E8ds_v3 virtual machines, which have Intel 1st Gen Xeon Scalable processors, running Databricks Runtime (DBR) 10.3 without Photon enabled. We ran the TPC-DS benchmarks in March 2022 at both 1TB and 10TB scale on clusters of 20 workers.

20 x E8ds_v3 (Intel 1st Gen Xeon Scalable processors) workers, DBR 10.3 without Photon enabled.

| | TPC-DS at 1TB | TPC-DS at 10TB |
|---|---|---|
| Time (s) | 2,265 | 15,324 |
| Total cost (Databricks Premium + VM costs) | $14 | $98 |

The Photon effect

We then ran the same workload without any code changes on the same machines with Photon enabled.
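Because only the runtime engine changes between the baseline and this run, the switch can be expressed as a single field in a cluster definition. The sketch below is a hypothetical Python helper that builds a cluster specification as a dictionary. The field names (`spark_version`, `node_type_id`, `num_workers`, `runtime_engine`) are assumed to match the Databricks Clusters API create request, and the values `"10.3.x-scala2.12"` and `"Standard_E8ds_v3"` are illustrative assumptions rather than the exact configuration used for these runs; check the Clusters API reference before relying on them. For the third run, only `node_type_id` would change, to the E8ds_v5 size.

```python
import json


def cluster_spec(node_type_id: str, photon: bool) -> dict:
    """Hypothetical helper: build a cluster definition for one benchmark run.

    Field names are assumed to follow the Databricks Clusters API; the
    values are illustrative assumptions, not the post's exact settings.
    """
    engine = "PHOTON" if photon else "STANDARD"
    return {
        "cluster_name": f"tpcds-{node_type_id}-{engine.lower()}",
        "spark_version": "10.3.x-scala2.12",  # assumed version key for DBR 10.3
        "node_type_id": node_type_id,         # assumed Azure node type ID
        "num_workers": 20,                    # 20 workers, as in every run
        "runtime_engine": engine,             # the only functional change
    }


baseline = cluster_spec("Standard_E8ds_v3", photon=False)
photon_run = cluster_spec("Standard_E8ds_v3", photon=True)

# Only the engine (and the cosmetic cluster name) differs between the two runs.
changed = sorted(k for k in baseline if baseline[k] != photon_run[k])
print(changed)  # ['cluster_name', 'runtime_engine']
print(json.dumps(photon_run, indent=2))
```

No query text appears anywhere in the specification, which mirrors the point of this run: the workload itself is unchanged.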

20 x E8ds_v3 (Intel 1st Gen Xeon Scalable processors) workers, DBR 10.3 with Photon enabled.

| | TPC-DS at 1TB | TPC-DS at 10TB |
|---|---|---|
| Time (s) | 645 | 4,482 |
| Total cost (Databricks Premium + VM costs) | $7 | $52 |

At 10TB, that alone yields a 1.9x price/performance improvement and a 3.4x speedup compared with the baseline.
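Both figures can be reproduced from the 10TB columns of the two tables. Because each run executes the same fixed workload, one reading consistent with the published numbers is that speedup is the ratio of run times and the price/performance gain is the ratio of total costs (the same work for fewer dollars). A minimal sketch under that assumption; the `Run` type and function names are illustrative, not from the original post:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    seconds: float   # total power-test time
    cost_usd: float  # Databricks Premium + VM cost


def speedup(baseline: Run, candidate: Run) -> float:
    """How many times faster the candidate finished the same workload."""
    return baseline.seconds / candidate.seconds


def price_performance_gain(baseline: Run, candidate: Run) -> float:
    """For a fixed workload, work per dollar improves by the cost ratio."""
    return baseline.cost_usd / candidate.cost_usd


# 10TB results from the tables: baseline vs. Photon on the same E8ds_v3 workers.
baseline_10tb = Run(seconds=15_324, cost_usd=98.0)
photon_10tb = Run(seconds=4_482, cost_usd=52.0)

print(f"{speedup(baseline_10tb, photon_10tb):.1f}x faster")  # 3.4x faster
print(f"{price_performance_gain(baseline_10tb, photon_10tb):.1f}x price/performance")  # 1.9x price/performance
```

Applying the same functions to the 1TB columns gives slightly different ratios (about 3.5x faster and 2.0x price/performance), which is why the headline figures are quoted at 10TB.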

Unleashing the full potential with Photon and Intel 3rd Gen Xeon Scalable processors

Next, we ran the same workload, again without any code changes, but this time on Azure’s E8ds_v5 virtual machines, which have Intel 3rd Gen Xeon Scalable processors, with Photon enabled.

20 x E8ds_v5 (Intel 3rd Gen Xeon Scalable processors) workers, DBR 10.3 with Photon enabled.

| | TPC-DS at 1TB | TPC-DS at 10TB |
|---|---|---|
| Time (s) | 334 | 2,271 |
| Total cost (Databricks Premium + VM costs) | $4.78 | $32.47 |

At 10TB, that’s a 3.0x price/performance improvement and a 6.7x speedup compared with our baseline.

Time for some graphs

[Figure: data charts]

Putting it all together

By enabling Databricks Photon and using Intel’s 3rd Gen Xeon Scalable processors, without making any code changes, we cut the cost of our 10TB TPC-DS benchmark by about two-thirds ($98 to $32.47) and ran it 6.7 times faster. This means not only lower costs but also a shorter time to insight.
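The "two-thirds" saving and the 3.0x price/performance figure are two views of the same 10TB cost numbers: for a fixed workload, the fraction of cost saved equals 1 minus the reciprocal of the price/performance gain. A small arithmetic check using only the totals from the tables (the variable names are illustrative):

```python
# Total cost (Databricks Premium + VM) of the 10TB power test, from the tables.
baseline_cost = 98.00   # E8ds_v3 workers, DBR 10.3, Photon disabled
photon_v5_cost = 32.47  # E8ds_v5 workers, DBR 10.3, Photon enabled

price_perf_gain = baseline_cost / photon_v5_cost  # same workload, fewer dollars
savings = 1 - photon_v5_cost / baseline_cost      # fraction of the cost removed

# The two views are linked: savings = 1 - 1 / price_perf_gain.
print(f"{price_perf_gain:.1f}x -> {savings:.0%} saved")  # 3.0x -> 67% saved
```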

Learn more at

databricks.com/lakehouse
databricks.com/photon
intel.com/xeonscalable
intel.com/avx512

Footnotes

[1] 3.0x price/performance benefit and 6.7x speedup, compared with the same TPC-DS 10TB benchmark run on Intel 1st Gen Xeon processors with DBR 10.3 and without Photon enabled.

[2] Derived from the power test, in which all 99 TPC-DS queries are run sequentially within a single stream.

[3] The results shown are not comparable to an official, audited TPC benchmark.


Copyright © 2022 IDG Communications, Inc.