Postgres everywhere

SQLite is embedded everywhere. Will Postgres follow suit?

SQLite is the world's most widely-deployed database engine. It’s in your phone, it’s in your browser, and if you search your computer you’ll find its .db files there too. SQLite was inspired by Postgres. Its author Richard Hipp has called SQLite a “conceptual fork” of Postgres. There’s no shared code, but Postgres was the north star to which he aligned SQLite. The two are complementary, he said, in the following ways.

Postgres SQLite
Enterprise data depot Application file format
Client/server Serverless
Scale up Scale down

Nowadays those distinctions have begun to blur. For example, SQLite is considered to be an embedded database. But Postgres is becoming one too. For example, we say that Steampipe embeds Postgres. That’s not technically true. You can’t link Postgres into a binary application, but you can (as Steampipe does) deliver a binary that installs, runs, and cooperates with Postgres. Or consider Yugabyte, which bolts the Postgres query layer onto a distributed storage layer. Not technically an embedding of Postgres, perhaps, but arguably the moral equivalent.

Steampipe and Yugabyte aren’t just compatible with Postgres; they actually are Postgres with extra capability (Steampipe’s foreign data wrappers for APIs, Yugabyte’s distributed storage). Users can connect to these products with psql, the interactive terminal for Postgres; they can write the same kinds of queries; they can use bundled or third-party extensions.

We may not see billions of Postgres deployments anytime soon, as is astonishingly true for SQLite, but your devices are more than capable of running Postgres and increasingly, for one reason or another, they will. What might all those instances of Postgres be doing?

Enhanced file systems

From a 2003 InfoWorld column, A tale of two Cairos:

Microsoft’s 2003 Professional Developers Conference (PDC) reminded some observers of the same event in 1993, when the hot topics were the Win32 APIs, a rough draft of Windows 95 code-named Chicago, and a preview of a futuristic object-file-system-based NT successor code-named Cairo. The hot topics this year were the WinFX managed APIs, a rough draft of a future version of NT code-named Longhorn, and ... Cairo. Now called WinFS, this vision of metadata-enriched storage and query-driven retrieval was, and is, compelling.

No prediction of the file system’s demise has stood the test of time, and I’m not making one now, but I will observe that the SQL databases of today are far better equipped than their ancestors to enrich the file system more comprehensively than SQLite does on a per-application basis. The once-futuristic idea of objects as first-class citizens of the database, for example, is now made real in the form of JSON columns. And Postgres in particular, thanks to its radically open extension APIs, not only unites relational data with JSON objects but also embraces key-value, fulltext, hierarchical, geospatial, time-series, and columnar data.

Between Richard Hipp’s two extremes — client/server database and application file format — there lies a middle ground: a local database. When you install Steampipe on your Windows or Mac computer, you’re also installing Postgres. For me, this has been an eye-opening experience. Steampipe bundles Postgres in order to query APIs, run compliance benchmarks, and visualize API-accessible data. But a local instance of the database can also handle all kinds of data management chores in a system-wide versus merely application-specific way.

The file system isn’t going away, but a local database can powerfully complement it. To do so ubiquitously, such a database has to be an open-source product. SQLite has shown the way. Were Postgres to follow suit it would be a fitting tribute to its protégé.

Data synchronization

In a world where Postgres is everywhere, instances will need to synchronize with other instances in many different ways. Postgres offers a wealth of mechanisms for doing that. When using the built-in streaming replication feature, a primary server transfers data synchronously to one or more standby receivers. Another built-in feature, log shipping, asynchronously transfers batches of log records from a primary to a standby.

As always, Postgres’s robust extension ecosystem augments the built-in capabilities. One third-party extension, pglogical, implements logical replication for non-Postgres publishers and subscribers such as Kafka and RabbitMQ. You can find a number of other solutions in this expanding category.

Meanwhile the bundled postgres_fdw extension leverages Postgres’s foreign data wrapper mechanism to connect local and remote tables for both read and write operations. One way or another a Postgres instance running on your devices, or in your personal and team clouds, will be able to sync with instances running elsewhere.

A language-neutral application runtime

The two dominant examples of language-neutral runtimes are the JVM (Java Virtual Machine) and the .NET runtime. Both enable primary programming languages — Java and C#, respectively — but both also enable other languages that use the common type systems and runtime services of these engines. For the JVM, notable languages include Groovy, Kotlin, Scala, and Clojure. For .NET, they include F# and VB.NET.

While not widely known or appreciated, Postgres also provides a common type system available to many languages. For example, here is a native Postgres function that returns a boolean result if its string argument matches one of an enumerated list of regular-expression patterns.

create function matches_humanities(course_name text) returns boolean as $$
  select
    string ~* any(array[
      'psych',
      'religio',
      'soci'
    ]);
$$ language sql;

=> select matches_humanities('Religion 202');

 matches_humanities
--------------------
 t
(1 row)

Here is that same function written in PL/Python, a Postgres extension that makes Python another way to implement Postgres functions.

create function matches_humanities(course_name text) returns boolean as $$
  import re
  regexes = [
    'psych',
    'religio',
    'soci'
  ]
  matches = [r for r in regexes if re.search(r, course_name, re.I)]
  return len(matches)
$$ language plpython3u;

The author of a SELECT statement that calls this function doesn’t know, or care, that the function is implemented in Postgres’s native language, or in PL/Python, or another of the available procedural languages.

Database programmers of a certain age are familiar with the use of stored procedures that run inside the database, at database speed, with direct access to data. With Postgres’s support for procedures and functions written in modern languages, this now-unfashionable technique deserves a second look. SQL has always been a hybrid language: a relational core augmented with a library of functions. When you can expand that library of functions using your language of choice — Python, JavaScript, R — Postgres starts to look like a different kind of application server.

The functions shown above, for example, form the set of regular expressions directly in code. A more robust version would source them from a database table. In the now-conventional approach that would happen in a separate process: connect to the database, ship a SQL query to it, unpack the result. When Postgres is the application server, you can simply rewrite the function to use a pair of tables.

select 
  *
from 
  course_names c
join
  humanities_patterns h
on c.course_name ~* h.pattern;

As before, the author of the query select matches_humanities('Religion 202') won’t know the difference.

Note too that functions needn’t only return simple values like the boolean in these examples. They can also be set-returning functions that yield tables you can query and join just like conventional tables, and that are optionally governed by language-neutral type definitions. This approach enables SQL to blend with other languages in ways that are otherwise impossible.

Data science

The data science literature is full of guides that show how to reimplement core features of SQL in languages, such as Python and R, that are popular among data scientists. A typical recipe for Python begins with importing data from a SQL table to a Pandas dataframe, the core construct at the heart of Python-flavored data science, then shows how to translate SQL idioms into corresponding Pandas idioms.

SQL Pandas
select * from airports airports
select * from airports limit 3 airports.head(3)
select id from airports where ident = 'KLAX' airports[airports.ident == 'KLAX'].id
select distinct type from airport airports.type.unique()
select * from airports where iso_region = 'US-CA' and type = 'seaplane_base' airports[(airports.iso_region == 'US-CA') & (airports.type == 'seaplane_base')]
select ident, name, municipality from airports where iso_region = 'US-CA' and type = 'large_airport' airports[(airports.iso_region == 'US-CA') & (airports.type == 'large_airport')][['ident', 'name', 'municipality']]

The same pattern appears often in the R literature: use an adapter to pull data from a SQL database into a language-specific table-like construct, then use language-specific idioms to replicate basic SQL idioms for querying it.

Why not use SQL data and standard SQL query idioms in situ? If Postgres lives on your local machine and in your personal and team clouds, and hosts Python or R, you can query with SQL and use those languages (and their extensive libraries!) for the special machine-learning and statistical powers they bring to the table.

A ubiquitous platform for working with data

I’ve shown elsewhere how Steampipe uses Postgres foreign data wrappers to unify access to APIs. It’s also a nice example of an embedded Postgres that runs on your Windows, Mac, or Linux computer, in a container, or in the cloud. The Steampipe instance of Postgres loads an extension that implements foreign data wrappers for APIs, provides a plugin manager to handle the growing suite of adapters that talk to APIs, and includes a dashboard server for visualizing queries of those APIs. And as an instance of Postgres it can also load other extensions that expand the powers of the database in all dimensions: data types, languages, synchronization, storage.

Steampipe is just one the signposts pointing to a world in which Postgres runs everywhere. We call Postgres a database, and of course it is one, but it’s also becoming a platform that delivers access to all kinds of data and brings modern styles of computation directly to that data.

Copyright © 2022 IDG Communications, Inc.

How to choose a low-code development platform