PostgreSQL is an open-source, object-relational (also called extended relational) database management system. Modern relational database features in PostgreSQL include complex queries, foreign keys, triggers, updatable views, transactional integrity, and multi-version concurrency control. Users can extend PostgreSQL with new data types, functions, operators, aggregate functions, index methods, and procedural languages.
With more than 20 years of development and deployment behind it, PostgreSQL is a solid open-source database that rivals even commercial relational databases in many respects. You can install it on Linux (all recent distributions), Windows (Windows 2000 SP4 and later), FreeBSD, OpenBSD, NetBSD, MacOS, AIX, HP/UX, and Solaris. You can also find a hosted high-performance version of PostgreSQL in Amazon Aurora, and a wire-compatible distributed implementation in CockroachDB.
From here on in, I will use the names PostgreSQL and Postgres (not all-caps) interchangeably, as is the general practice. I can pronounce Postgres; I never know whether to say Post-gres-Q-L or Post-gres-sequel.
In this tutorial, we will step through installing PostgreSQL, connecting to the database, loading some data, and running a variety of SQL queries. To cap it all off, we’ll execute a query against two related tables by using SQL join clauses. Along the way, I will suggest some GUI clients for Postgres you might want to use, and provide a brief overview of useful PostgreSQL extensions. Finally, I’ll leave you with a few good references to help you take your next steps with Postgres and SQL.
Install PostgreSQL
My InfoWorld colleague Serdar Yegulalp wrote a strong tutorial on installing and configuring PostgreSQL, “Get started with PostgreSQL 10.” I used that to inform my installation on an iMac, and wound up downloading and running the Mac-only Postgres.App shown in the screenshot below.
In addition to installing the latest production database version (10.5) with default configuration settings, I added the Postgres command-line utilities to my computer's path and installed both the cross-platform pgAdmin 4 and the Mac-only Postico GUI tools, as shown below.
It’s useful to add the command-line utilities for Postgres to your path.
pgAdmin 4 displays its UI in a web browser. The transactions being monitored in the postgres database are for maintenance purposes.
Postico is a commercial Postgres client product for MacOS. The free trial has limits, but it is usable and does not expire.
Since I have an active JetBrains subscription, I also installed DataGrip, a multi-database client tool. If I were doing this exercise on Windows, I might well use the visual SQL query builder and SQL command line in Alpha Anywhere.
Connect to a PostgreSQL database
The key things to know to connect to your Postgres database are its host, its assigned port, the database that you’d like to use initially, and your username and password. Several of these have well-known default values.
If you install Postgres on the same computer as the client, the host will be localhost. By default, there will be two visible databases, one named postgres and one using your user name. The port will be 5432 unless it was changed, typically at database startup.
The information above is independent of the client you use. For most purposes, the psql command line works just fine at the beginning. Later on, when you're trying to construct queries with complicated joins, you might want to use a graphical client, primarily to save yourself a lot of time and error-prone typing.
To connect with psql if your database uses the defaults and is on the computer you are using, make sure your database is running, and type psql <enter>
at the command line:
Martins-iMac:~ mheller$ psql
psql (10.5)
Type "help" for help.
If you instead see an error message to the effect that the command was not found, you may either add the Postgres command-line utilities to your path and restart your shell, or fully qualify the location of the executable.
Once you have psql running, if you type help <enter>
you’ll see something like this:
mheller=# help
You are using psql, the command-line interface to PostgreSQL.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit
mheller=#
The prompt tells you what database you are using. You can find out more with \conninfo
, and change the connection with \c
:
mheller=# \conninfo
You are connected to database "mheller" as user "mheller" via socket in "/tmp" at port "5432".
From the command help (\?):
\c[onnect] {[DBNAME|- USER|- HOST|- PORT|-] | conninfo}
connect to new database (currently "mheller")
Let’s try changing databases:
mheller=# \c postgres
You are now connected to database "postgres" as user "mheller".
postgres=#
And now let’s switch back, in my case to database mheller:
postgres=# \c mheller
You are now connected to database "mheller" as user "mheller".
mheller=#
Your default database and user names will differ from mine.
To execute SQL commands and queries, add a semicolon (;
) when the query is complete and then press <enter>. Here we create a new database, connect to it, connect back to our default database, and finally drop the new database:
mheller=# create database test;
CREATE DATABASE
mheller=# \c test
You are now connected to database "test" as user "mheller".
test=# \c mheller
You are now connected to database "mheller" as user "mheller".
mheller=# drop database test
mheller-# ;
DROP DATABASE
mheller=#
Database creates and drops can also be accomplished from the system shell. The relevant command-line utilities are createdb
and dropdb
.
Load data into a PostgreSQL database
Database design is an art, not a science, and takes considerable experience to get right. Even expert database designers often have to modify their designs over time to match the application data usage pattern, for example by adding indexes for common queries, or by adding foreign key constraints to enforce referential integrity.
Rather than taking weeks (or months) to learn database design, let’s load a stock database that’s already in Postgres format. I wanted to use one of the PgFoundry Sample Databases, but alas, that site is no longer active.
One alternative would be to use the three-table database from the PostgreSQL Exercises site; it’s simple enough, but I’ll leave that for you to work through later, since you can do all the exercises online, and I want you to use a database locally now. The SportsDB and DVD Rental (Pagila) samples are interesting, but a bit too complicated for a first go-round.
Instead, I want you to open https://github.com/pthom/northwind_psql and either clone or download and unpack the repository. Northwind was originally a Microsoft sample, but GitHub user pthom has adapted it for Postgres. I like Northwind mostly because I used to use it to teach SQL to developers, and became familiar with the data. The database diagram is as below:
Entity-relationship diagrams like this one come out of tools; this image looks a lot like it was generated by SQL Server Management Studio. Essentially, the rectangles are tables, the columns are shown within the rectangles, and the primary and foreign keys are marked with key icons. The connections between tables are the foreign key contraints.
You can load the database into PostgreSQL using the create_db
script, assuming that you are on a machine with a Bash shell. If you are running on Windows, try renaming create_db.sh
to create_db.cmd
and deleting the first line of the script.
Open a shell in the directory where you cloned or unpacked northwind_psql and run the script. A successful database load will look something like this:
Martins-iMac:northwind_psql mheller$ ./create_db.sh
NOTICE: database "northwind" does not exist, skipping
NOTICE: role "northwind_user" does not exist, skipping
SET
SET
SET
SET
SET
SET
SET
SET
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
…
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
CREATE ROLE
ALTER ROLE
GRANT
GRANT
Now we can try querying the database.
Run SQL queries in PostgreSQL
To begin with, let’s try looking at a single table. From a command line run psql
against the northwind
database. Your default username will probably work; if not, use the credentials user northwind_user
and password thewindisblowing
.
What are the regions defined in the database? The select
query should look like the following. I got the table name from the diagram above.
northwind=# select * from region;
region_id | region_description
-----------+--------------------
1 | Eastern
2 | Western
3 | Northern
4 | Southern
(4 rows)
That was easy. The asterisk notation says to return all fields in the table. Suppose we just wanted to see the region names in alphabetical order. That requires specifying the desired field name, the name of which we learned from the previous query, and adding an order by
clause. Ascending order is the default.
northwind=# select region_description from region order by region_description;
region_description
--------------------
Eastern
Northern
Southern
Western
(4 rows)
Now let’s look at the product categories:
northwind=# select * from categories;
category_id | category_name | description | picture
-------------+----------------+------------------------------------------------------------+---------
1 | Beverages | Soft drinks, coffees, teas, beers, and ales | \x
2 | Condiments | Sweet and savory sauces, relishes, spreads, and seasonings | \x
3 | Confections | Desserts, candies, and sweet breads | \x
4 | Dairy Products | Cheeses | \x
5 | Grains/Cereals | Breads, crackers, pasta, and cereal | \x
6 | Meat/Poultry | Prepared meats | \x
7 | Produce | Dried fruit and bean curd | \x
8 | Seafood | Seaweed and fish | \x
(8 rows)
And let’s look at the beverage products, which we just saw have category_id=1
. That requires a where
clause.
northwind=# select * from products where category_id=1;
product_id | product_name | supplier_id | category_id | quantity_per_unit | unit_price | units_in_stock | units_on_order | reorder_level | discontinued
------------+---------------------------+-------------+-------------+---------------------+------------+----------------+----------------+---------------+--------------
1 | Chai | 8 | 1 | 10 boxes x 30 bags | 18 | 39 | 0 | 10 | 1
2 | Chang | 1 | 1 | 24 - 12 oz bottles | 19 | 17 | 40 | 25 | 1
24 | Guaraná Fantástica | 10 | 1 | 12 - 355 ml cans | 4.5 | 20 | 0 | 0 | 1
34 | Sasquatch Ale | 16 | 1 | 24 - 12 oz bottles | 14 | 111 | 0 | 15 | 0
35 | Steeleye Stout | 16 | 1 | 24 - 12 oz bottles | 18 | 20 | 0 | 15 | 0
38 | Côte de Blaye | 18 | 1 | 12 - 75 cl bottles | 263.5 | 17 | 0 | 15 | 0
39 | Chartreuse verte | 18 | 1 | 750 cc per bottle | 18 | 69 | 0 | 5 | 0
43 | Ipoh Coffee | 20 | 1 | 16 - 500 g tins | 46 | 17 | 10 | 25 | 0
67 | Laughing Lumberjack Lager | 16 | 1 | 24 - 12 oz bottles | 14 | 52 | 0 | 10 | 0
70 | Outback Lager | 7 | 1 | 24 - 355 ml bottles | 15 | 15 | 10 | 30 | 0
75 | Rhönbräu Klosterbier | 12 | 1 | 24 - 0.5 l bottles | 7.75 | 125 | 0 | 25 | 0
76 | Lakkalikööri | 23 | 1 | 500 ml | 18 | 57 | 0 | 20 | 0
(12 rows)
There are a lot of fields in employees
, so let’s just look at a few of them. Here we are ordering by two fields, title
and last_name
.