SQL lessons for NoSQL developers

Because no software developer is complete without at least a passing knowledge of the world’s most popular query language

1 2 3 Page 2
Page 2 of 3

SQL conditions

In order to do some inventory planning, let’s get the product_name, number of units in stock, and units on order from the products table where the product is discontinued. Try

select product_name, units_in_stock, units_on_order from products where discontinued=1;
         product_name          | units_in_stock | units_on_order
-------------------------------+----------------+----------------
 Thüringer Rostbratwurst       |              0 |              0
 Chai                          |             39 |              0
 Chang                         |             17 |             40
 Chef Anton's Gumbo Mix        |              0 |              0
 Guaraná Fantástica            |             20 |              0
 Perth Pasties                 |              0 |              0
 Rössle Sauerkraut             |             26 |              0
 Mishi Kobe Niku               |             29 |              0
 Singaporean Hokkien Fried Mee |             26 |              0
 Alice Mutton                  |              0 |              0
(10 rows)

Next, let’s look at all the products that are discontinued but still have units on order. Why order a product that will no longer be carried? Unfortunately, Northwind Traders has discontinued carrying their Thai beer!

select product_name, units_in_stock, units_on_order from products where discontinued=1 and units_on_order >0;
 product_name | units_in_stock | units_on_order
--------------+----------------+----------------
 Chang        |             17 |             40
select product_name, units_in_stock, units_on_order from products where discontinued=1 and units_on_order !=0;

This last query has two conditions now; they both must be true. The discontinued field must be equal to 1 and units_on_order must be greater than 0. 

There are other operators that could be used for the same result such as not equals and greater and less than. For example select product_name, units_in_stock, units_on_order from products where discontinued>0 and units_on_order !=0;

Broken down this query is:

select product_name, units_in_stock, units_on_order from products where discontinued=1 and units_on_order >0;
  • discontinued=1 
  • units_on_order >0 
  • and – both conditions must be true

Conditions can also use or instead of and. So, the query below will pull back any discontinued items as well as any items that are on order (regardless of whether they are discontinued):

select product_name, units_in_stock, units_on_order, discontinued from products where discontinued=1 or units_on_order>0;
         product_name          | units_in_stock | units_on_order | discontinued
-------------------------------+----------------+----------------+--------------
 Thüringer Rostbratwurst       |              0 |              0 |            1
 Outback Lager                 |             15 |             10 |            0
 Scottish Longbreads           |              6 |             10 |            0
 Louisiana Hot Spiced Okra     |              4 |            100 |            0
 Chai                          |             39 |              0 |            1
 Gravad lax                    |             11 |             50 |            0
 Mascarpone Fabioli            |              9 |             40 |            0
 Longlife Tofu                 |              4 |             20 |            0
 Rogede sild                   |              5 |             70 |            0
 Chang                         |             17 |             40 |            1
 Queso Cabrales                |             22 |             30 |            0
 Aniseed Syrup                 |             13 |             70 |            0
 Wimmers gute Semmelknödel     |             22 |             80 |            0
 Chef Anton's Gumbo Mix        |              0 |              0 |            1
 Guaraná Fantástica            |             20 |              0 |            1

(26 rows)

This is roughly the same as the and query but now there are more rows because the query includes both discontinued products and anything on order regardless of whether it is discontinued or not.

Note that this is the same as doing two queries and joining them together (using the union operator) from a data perspective (though it might be less efficient). Broken down this query is:

select product_name, units_in_stock, units_on_order, discontinued from products where discontinued=1 OR units_on_order>0;
  • discontinued=1
  • units_on_order >0
  • or – either condition must be true

This query produces the same result as

select product_name from products where discontinued=1
union
select product_name from products where units_on_order>0;

The orders and order_details tables

To go further look at the orders table:

\d orders
                           Table "public.orders"
      Column      |         Type          | Collation | Nullable | Default
------------------+-----------------------+-----------+----------+---------
 order_id         | smallint              |           | not null |
 customer_id      | bpchar                |           |          |
 employee_id      | smallint              |           |          |
 order_date       | date                  |           |          |
 required_date    | date                  |           |          |
 shipped_date     | date                  |           |          |
 ship_via         | smallint              |           |          |
 freight          | real                  |           |          |
 ship_name        | character varying(40) |           |          |
 ship_address     | character varying(60) |           |          |
 ship_city        | character varying(15) |           |          |
 ship_region      | character varying(15) |           |          |
 ship_postal_code | character varying(10) |           |          |
 ship_country     | character varying(15) |           |          |
Indexes:
    "orders_pkey" PRIMARY KEY, lsm (order_id HASH)
Foreign-key constraints:
    "orders_customer_id_fkey" FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
    "orders_employee_id_fkey" FOREIGN KEY (employee_id) REFERENCES employees(employee_id)
    "orders_ship_via_fkey" FOREIGN KEY (ship_via) REFERENCES shippers(shipper_id)
Referenced by:
    TABLE "order_details" CONSTRAINT "order_details_order_id_fkey" FOREIGN KEY (order_id) REFERENCES orders(order_id)

Also look at its related child table, the order_details table:

\d order_details
              Table "public.order_details"
   Column   |   Type   | Collation | Nullable | Default
------------+----------+-----------+----------+---------
 order_id   | smallint |           | not null |
 product_id | smallint |           | not null |
 unit_price | real     |           | not null |
 quantity   | smallint |           | not null |
 discount   | real     |           | not null |
Indexes:
    "order_details_pkey" PRIMARY KEY, lsm (order_id HASH, product_id)
Foreign-key constraints:
    "order_details_order_id_fkey" FOREIGN KEY (order_id) REFERENCES orders(order_id)
    "order_details_product_id_fkey" FOREIGN KEY (product_id) REFERENCES products(product_id)

SQL inner join

For a summary query, take some of the columns from orders, some from order_details, and make a projection, then join them together. However, there are a lot of rows in this result so include a condition specifying only orders shipped to Portland.

select o.order_id, o.customer_id, o.order_date, d.product_id, d.unit_price, d.quantity from orders o inner join order_details d on o.order_id=d.order_id where ship_city='Portland';
 order_id | customer_id | order_date | product_id | unit_price | quantity
----------+-------------+------------+------------+------------+----------
    10708 | THEBI       | 1997-10-17 |          5 |      21.35 |        4
    10708 | THEBI       | 1997-10-17 |         36 |         19 |        5
    10307 | LONEP       | 1996-09-17 |         62 |       39.4 |       10
    10307 | LONEP       | 1996-09-17 |         68 |         10 |        3
    10544 | LONEP       | 1997-05-21 |         28 |       45.6 |        7
    10544 | LONEP       | 1997-05-21 |         67 |         14 |        7
    10992 | THEBI       | 1998-04-01 |         72 |       34.8 |        2
    10805 | THEBI       | 1997-12-30 |         34 |         14 |       10
    10805 | THEBI       | 1997-12-30 |         38 |      263.5 |       10
    11018 | LONEP       | 1998-04-13 |         12 |         38 |       20
    11018 | LONEP       | 1998-04-13 |         18 |       62.5 |       10
    11018 | LONEP       | 1998-04-13 |         56 |         38 |        5
    10662 | LONEP       | 1997-09-09 |         68 |       12.5 |       10
    10665 | LONEP       | 1997-09-11 |         51 |         53 |       20
    10665 | LONEP       | 1997-09-11 |         59 |         55 |        1
    10665 | LONEP       | 1997-09-11 |         76 |         18 |       10
    10883 | LONEP       | 1998-02-12 |         24 |        4.5 |        8
    10317 | LONEP       | 1996-09-30 |          1 |       14.4 |       20
    10310 | THEBI       | 1996-09-20 |         16 |       13.9 |       10
    10310 | THEBI       | 1996-09-20 |         62 |       39.4 |        5
    10867 | LONEP       | 1998-02-03 |         53 |       32.8 |        3
(21 rows)

The inner join matches on the order_id column so every order ID that has one or more matching order IDs in order_details is included in the result set. Additionally, the query specifies “aliases” for the tables, so in the from clause the order table is aliased to o. In the inner join clause, the order_id columns from both tables are matched using the on subclause. Broken down this is:

select o.order_id, o.customer_id, o.order_date, d.product_id, d.unit_price, d.quantity from orders o inner join order_details d on o.order_id=d.order_id where ship_city='Portland';
  • o.order_id, d.product_id – order_id comes from the orders table, product_id comes from order_details. o and d are aliases.
  • from orders o – in the from clause alias orders to o
  • inner join order_details d – join order_details and alias it as d
  • on o.order_id=d.order_id – the field used to join on is the order_id “foreign key”

In the orders table, the order_id is guaranteed unique and it is called a primary key. In the order_details table it is a foreign key. When the same field in each table has the same value to match the records up it is called a “foreign key.”

Essentially, an inner join is an intersection or that part in the middle of the Venn diagram.

This query explicitly used an inner join clause, which I strongly recommend. However, some queries have two tables in the from clause and the where clause and they match the columns as an “implicit” join. It is not a big problem if there are two tables, but where clauses can get long. This can make finding the joins vs. conditions difficult, and having joins just randomly mixed with conditions is bad hygiene. However, when I learned SQL this was how you did an inner join.

select o.order_id, o.customer_id, o.order_date, d.product_id, d.unit_price, d.quantity from orders o, order_details d where o.order_id=d.order_id and ship_city='Portland';

This query returns the same result as the previous query.

SQL group by and sum

Next, we’ll do a group by — joining the orders table with the order_details table, but grouping by the order_id from the orders table. Here, we’re just trying to find how much revenue Northwind will receive per order!

select o.order_id, sum(d.unit_price * d.quantity) from orders o inner join order_details d on o.order_id=d.order_id where ship_city='Portland' group by o.order_id;
 order_id |       sum       
----------+------------------
    10307 | 424.000015258789
    10708 | 180.400001525879
    10992 | 69.5999984741211
    11018 |             1575
    10665 |             1295
    10867 | 98.3999977111816
    10317 | 287.999992370605
    10544 | 417.199989318848
    10662 |              125
    10310 | 336.000003814697
    10805 |             2775
    10883 |               36
(12 rows)

The sum function adds up all the columns for a “group.” In this case, the query also uses a mathematical operator to multiply price and quantity. The group by clause basically says, for each order, treat all of its line items in the order_details table as a group. Broken down this is:

select o.order_id, sum(d.unit_price * d.quantity) from orders o or inner join order_details d on o.order_id=d.order_id where ship_city='Portland' group by o.order_id;
  • sum(d.unit_price * d.quantity) – do unit_price * quantity on each row, and sum the results for the “group”
  • group by o.order_id; – for every unique o.order_id, group them together for calculations like sum
  • In this case for every order treat its line items as a “group,” so sum the totals for each order

SQL group by and count

In addition to sum there are other functions. For instance, you can count the line items. This query counts the order_details side of the join.

select o.order_id, count(d.order_id) from orders o inner join order_details d on o.order_id=d.order_id where ship_city='Portland' group by o.order_id;
 order_id | count
----------+-------
    10307 |     2
    10708 |     2
    10992 |     1
    11018 |     3
    10665 |     3
    10867 |     1
    10317 |     1
    10544 |     2
    10662 |     1
    10310 |     2
    10805 |     2
    10883 |     1
(12 rows)

You can validate the number by doing a select on one of the rows. So if you run a select query on the first order 10307 you will see that there are indeed two order_details records for the order.

northwind=# select * from order_details where order_id=10307;
 order_id | product_id | unit_price | quantity | discount
----------+------------+------------+----------+----------
    10307 |         62 |       39.4 |       10 |        0
    10307 |         68 |         10 |        3 |        0
(2 rows)

In addition to using count for a group by, you can count all of the rows in a table by just doing select count(1) or select count(*) from a table. Broken down this query is:

select o.order_id, count(d.order_id) from orders o inner join order_details d on o.order_id=d.order_id where ship_city='Portland' group by o.order_id;
  • count(d.order_id) – count the d.order_ids in the o.order_id groups (in essence count the line items)
  • Could have used a different field
  • Could also do select count(*) from orders; to get row count for the table
  • Could also do select count(1) from orders; (which will not be null)
  • Does not count nulls
1 2 3 Page 2
Page 2 of 3
How to choose a low-code development platform