Get started with PostgreSQL on Azure

Microsoft’s recent acquisition of Citus Data is Azure’s path to further enhancing its PostgreSQL database implementation

Get started with PostgreSQL on Azure
Thinkstock

Azures collection of open source data tools includes the popular PostgreSQL database. Postgres began its life in the mid-1980s as a follow-up to the University of California at Berkeley’s Ingres system. Now developed by an international consortium, it’s grown a lot over the decades, moving from its own query language to one based on SQL. That change led to its renaming as PostgreSQL in 1996. It has maintained a large academic following, with a lot of support in the machine learning community, as well as powering many popular consumer-facing services.

Its enduring popularity has meant that it’s become one of the standard on-premises databases for large applications, with integration with most common development platforms and with web content management tools like WordPress and Drupal. A wide set of use cases and a massive user base makes PostgreSQL an essential component of any public cloud service, enabling quick lift-and-shift of existing applications and content.

Managed PostgreSQL on Azure

Given that history, it’s not surprising that there’s a managed PostgreSQL among Azure’s many services. Like Azure’s MariaDB, it’s a way to use a familiar database without needing to worry about provisioning and managing your database servers; all you need to do is set up an instance, load up your data, and start writing code. If you’re using WordPress or a similar content management tool, you can install it from the Azure Marketplace, upload your data, and go.

There’s a lot of focus on lift-and-shift with tools like this, as organizations migrate from on-premises data centers to the cloud. While that’s still an important market for Microsoft, it’s one that’s going to decline as more and more applications either migrate to the cloud or are left running on-premises for regulatory and compliance reasons. By adding high-availability tools and scaling, Microsoft is looking to support new and existing born-in-the-cloud applications. PostgreSQL’s history as a tool for software-as-a-service (SaaS) providers lets Azure support them as well, either as additional capacity where necessary or as a complete replacement for existing data centers.

Creating PostgreSQL databases

Azure’s PostgreSQL is based on the Community Edition, so you’ll find it familiar. You can create a new database using the Azure Portal or the Azure CLI, because it’s a standard database option. Like all Azure services, you need to add your database to a resource group and define login usernames and passwords before Azure sets it up. You can choose a version if you need a specific set of features; otherwise it defaults to the latest major release.

Like many Azure services built on its infrastructure-as-a-service (IaaS) environment Azure’s PostgreSQL needs a server size. You can choose from a basic tier, a general-purpose tier, and a tier intended for in-memory operations. Each option lets you pick either fourth- or fifth-generation servers where possible, because some regions now only support fifth-generation systems.

Basic workloads are best used for test and development, or for small applications. Most applications instead will run off general purpose instances, with between two and 32 cores. As it’s a managed service, you’re able to add Azure-specific services including scaling with red replicas and using Azure Advanced Threat Protection to reduce security risks. Azure PostgreSQL is self-patching, so once you’re up and running you can leave Azure to manage the underlying virtual server and the PostgreSQL instance.

Creating a database takes a few minutes, ready for use with familiar tools and SDKs. Developers running local copies of PostgreSQL on Linux or WSL systems will be able to connect to Azure with psql; using it to create and populate tables, or to try out queries. Apps can be built using any language with PostgreSQL support, either through applications like Django or via SQL.

The next step for Azure: adding Citus Data

Microsoft recently acquired Citus Data, which develops a PostgreSQL extension that simplifies scaling out databases. It’s a way of building distributed databases across multiple nodes and sites, working through a coordinator that handles access to the different worker nodes.

Where things get interesting is how Citus Data works: as the coordinator node stores metadata about the data that’s then sharded across the various instances. You make queries against the coordinator, which then directs its own queries to workers before delivering responses to the querying application. Because the coordinator holds the worker data, it can produce optimized queries. So, a query for product data that’s been entered between 2005 and 2007 is only directed to the nodes that hold that data. By transforming queries automatically, Citus Data can manage load across shards, improving performance while still supporting standard SQL queries.

Microsoft plans to add Citus Data to its own managed PostgreSQL instances, using it to scale across Azure instances and using multiple databases in a single region. The current open source Citus release includes tools for migrating from existing single node databases to its multinode system, including an option that keeps your existing data in sync with a new Citus Data-based system while you migrate your application and data. Once you’ve finished migrating data, you can test against your new Citus-enabled PostgreSQL service before switching to using it live.

Scaling PostgreSQL in the public cloud

Because the coordinator node handles query translation, your code should work with a scale-out PostgreSQL system without needing any changes, Citus Data supports most popular programming languages, and it includes Ruby on Rails integration for building and managing multitenant systems, so you can use it to run your own SaaS applications. There’s no need to switch from relational to NoSQL, because Citus’s features hide the complexity of running and managing a distributed database behind a set of familiar APIs and queries.

Combining Azure’s managed PostgreSQL with Citus Data makes a lot of sense, especially if it can be automated as part of a managed service. Modifying Citus Data’s existing migration tools to work with live data should be possible, especially with Microsoft’s resources. Instead of having to manually add nodes and reconfigure tables, a future Azure PostgreSQL should be able to add new shards as required. All you’d need to do is decide if you want to optimize your database for performance or for scale.

Having PostgreSQL support in a public cloud is, to be honest, table stakes. Azure has to compete with Google Cloud Platform and Amazon Web Services, as well as services like Heroku. By adding Citus Data to its own expertise, Azure now has the opportunity to significantly differentiate its offering from its competitors.

Copyright © 2019 IDG Communications, Inc.