A first look at Azure Synapse

Microsoft’s new Azure data analytics platform is an evolution of Azure SQL Data Warehouse

Hyperscale cloud services such as Azure are designed to work with large amounts of data, taking advantage of their economies of scale when purchasing storage hardware. Their close relationship with search engines like Bing and Google allows them to build on the algorithms and tools developed to analyze the public internet. It’s a combination that makes them an ideal platform for building applications that need to process massive data sets, at a scale that would be prohibitive in your own data center.

Microsoft has offered a range of data and analytics services on Azure since its early days, starting with its own SQL database (which quickly became a cloud-hosted version of the familiar SQL Server), adding HDInsight for Hadoop and other Apache data services, and offering a large-scale data lake that lets you mix structured and unstructured data. Until recently most of these services have been stand-alone, and if you wanted to bring them together, you’d need to build your own analytics tooling. At Ignite 2019, Microsoft relaunched Azure’s existing SQL Data Warehouse as Azure Synapse, rearchitecting and rebranding it and adding support for Apache Spark along with its own Studio development and analytics tools.

Introducing Azure Synapse

Azure Synapse is more than a rebranding of an existing product; its focus is on integrating many of Azure’s data analysis capabilities into a single service. Unlike traditional data warehouses, it supports mixed relational and unstructured data, while still allowing you to use existing SQL skills to build and test analytical models, building on Azure SQL’s PolyBase big data query engine. Because it uses in-memory column stores, it’s fast and efficient, an important feature when you’re paying for a cloud service on a consumption model.

Where Synapse differs from other data warehouse products is in its roots in Azure SQL’s hyperscale option. Instead of a single compute node handling all your queries, it uses a cluster of what Microsoft is calling “data warehouse units.” These separate query compute from the underlying storage and let Synapse take a massively parallel approach to working with your queries. Each data warehouse unit has compute and a custom application, the Data Movement Service, that works across nodes and with Azure Storage to ensure that the right data is available in the right node. It’s certainly fast; a demo at Ignite compared it with Google’s BigQuery on a 30-petabyte data set, and showed Synapse to be 75 times faster.
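
To make the idea of splitting one query across many nodes concrete, here is a minimal Python sketch of the general pattern: rows are routed to nodes by hashing a key, each node aggregates its own slice, and the partial results are merged. It is a toy illustration only, not Synapse’s implementation. The node count, the function names, and the sample sales rows are all assumptions made for the example, and threads stand in for separate compute nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib

NODE_COUNT = 4  # hypothetical number of compute nodes, chosen for the example


def node_for(key, node_count=NODE_COUNT):
    """Pick a node deterministically by hashing the distribution key."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(rows, node_count=NODE_COUNT):
    """Toy stand-in for data movement: send each row to the node owning its key."""
    shards = [[] for _ in range(node_count)]
    for region, amount in rows:
        shards[node_for(region, node_count)].append((region, amount))
    return shards


def local_sum(shard):
    """The work one node does on its own slice: a partial sum per region."""
    totals = defaultdict(int)
    for region, amount in shard:
        totals[region] += amount
    return dict(totals)


def parallel_sum_by_region(rows, node_count=NODE_COUNT):
    """Run the per-node aggregation concurrently, then merge the partial results."""
    shards = distribute(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_sum, shards))
    merged = defaultdict(int)
    for partial in partials:
        for region, total in partial.items():
            merged[region] += total
    return dict(merged)


if __name__ == "__main__":
    # Invented sample data: (region, sale amount)
    sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
    print(parallel_sum_by_region(sales))
```

Because every row for a given region lands on the same node, each node can finish its part of the aggregation without talking to the others; only the small per-node results need to be combined at the end. That is the property that lets this style of query scale out as you add nodes.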
