If you were raised on MongoDB or learned full stack development in a coding bootcamp, then you might not know SQL. You might have even been told that SQL was bad. However, NoSQL databases are a bit like Hadoop — they had early promise but it fizzled. After 10 years of the NoSQL “revolution,” SQL databases remain the bulk of the database market.
There are several reasons for this. First, many applications require real transactional integrity, which NoSQL databases (despite their claims) do not offer. Second, the relational model is an incredibly useful way to represent data. Third, SQL is still the best-thought-out and most capable query language. Fourth, GraphQL and object-relational mapping (ORM) technologies made developer challenges with relational databases largely moot. And finally, we have the emergence of distributed SQL databases, sometimes called NewSQL databases. (Full disclosure: I work for Yugabyte, provider of an open source distributed SQL database.)
With the COVID-19 pandemic, transactional applications that never would have been trusted to a NoSQL database are rapidly moving to the cloud as they require more scalability and resilience. These are typically SQL database applications. So many developers who learned on document databases or other NoSQL databases now need to know SQL. Get started!
Choose a SQL database
Pick your favorite SQL database. If you pick a PostgreSQL-compatible database such as YugabyteDB the code samples should work without modification. If you pick MariaDB or another MySQL derivative, then you will probably have to change the data types and make minor modifications. The same can be said for Oracle Database or SQL Server. While SQL is a standard there are differences between the underlying database implementations and minor dialect choices that can be non-ecumenical. Regardless of your RDBMS choice, install it and get it running!
A note on file handles
This is not related to SQL, but really if you install any database you need to increase the file handle limits on your OS. For MacOS you can follow the instructions below. For Linux please consult your distribution’s documentation.
Run
launchctl limit maxfiles
And check to see if the limit is a high enough number. I recommend 1048576 for most databases.
If not then edit /etc/sysctl.conf and include the following:
kern.maxfiles=1048576
kern.maxproc=2500
kern.maxprocperuid=2500
kern.maxfilesperproc=1048576
Create or edit /Library/LaunchDaemons/limit.maxfiles.plist and insert the following:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>limit.maxfiles</string>
<key>ProgramArguments</key>
<array>
<string>launchctl</string>
<string>limit</string>
<string>maxfiles</string>
<string>1048576</string>
<string>1048576</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>ServiceIPC</key>
<false/>
</dict>
</plist>
Run
sudo launchctl unload -w /Library/LaunchDaemons/limit.maxfiles.plist
sudo launchctl load -w /Library/LaunchDaemons/limit.maxfiles.plist
In whatever terminal session is used to run the database this command needs to be run:
ulimit -n 1048576
Northwind sample database
In order to run some queries you need a database schema and some data. Grab the Northwind sample database:
- DDL script – defines the schema
- DML script – loads the data
This is basically the same kind of demo database that came with Microsoft Access back in the day, and that Microsoft still provides for SQL Server:
The Northwind database contains products, customers, orders, and order details among other things. It should be intuitive to anyone who has seen a business run.
To create and load the database on YugabyteDB type ./bin/ysqlsh
from your install directory. On PostgreSQL type ./bin/psql
. Then from this shell type:
CREATE DATABASE northwind;
\c northwind
\i northwind_ddl.sql
\i northwind_data.sql
This enters the shell, creates the database, and loads the data. Note the slash commands are specific to the shell of PostgreSQL and compatible database systems. They are not SQL.
If you are not a command-line junkie like me you can also use a tool like pgAdmin for either database.
The us_states table
In PostgreSQL-compatible databases, from the shell you type \dt
to list the tables in the database. The output should look be something like this:
List of relations
Schema | Name | Type | Owner
--------+------------------------+-------+----------
public | categories | table | yugabyte
public | customer_customer_demo | table | yugabyte
public | customer_demographics | table | yugabyte
public | customers | table | yugabyte
public | employee_territories | table | yugabyte
public | employees | table | yugabyte
public | order_details | table | yugabyte
public | orders | table | yugabyte
public | products | table | yugabyte
public | region | table | yugabyte
public | shippers | table | yugabyte
public | suppliers | table | yugabyte
public | territories | table | yugabyte
public | us_states | table | yugabyte
To view a table definition you type \d
and the name of the table, e.g. \d us_states
. In MySQL derivatives the comand is show columns from us_states
and in Oracle Database it is describe us_states
. Regardless of the database your output should be something like this:
Table "public.us_states"
Column | Type | Collation | Nullable | Default
--------------+------------------------+-----------+----------+---------
state_id | smallint | | not null |
state_name | character varying(100) | | |
state_abbr | character varying(2) | | |
state_region | character varying(50) | | |
Indexes:
"us_states_pkey" PRIMARY KEY, lsm (state_id HASH)
SQL select statement
From the shell type select * from us_states
. You will see something like:
select * from us_states;
state_id | state_name | state_abbr | state_region
----------+----------------------+------------+--------------
4 | Arkansas | AR | south
46 | Vermont | VT | east
29 | Nevada | NV | west
25 | Mississippi | MS | south
14 | Illinois | IL | midwest
23 | Michigan | MI | north
1 | Alabama | AL | south
47 | Virginia | VA | east
37 | Oklahoma | OK | midwest
13 | Idaho | ID | midwest
20 | Maine | ME | north
27 | Montana | MT | west
51 | Wyoming | WY | west
39 | Pennsylvania | PA | east
32 | New Mexico | NM | west
45 | Utah | UT | west
2 | Alaska | AK | north
7 | Connecticut | CT | east
11 | Georgia | GA | south
select
is what tells the database that we’re retrieving records. The *
means all columns. The from
clause specifies which tables should be used to retrieve records. In this case only one table is used, us_states. All SQL statements end with a semicolon.
Here is the command dissected into its constituent parts:
select * from us_states;
select
– retrieve records*
– all columnsfrom
– list of tablesus_states
– the table selected;
– end of statement
SQL projection with a where clause
Try a projection and a condition. This query retrieves all of the state abbreviations from the us_states table where the region is south — my favorite part of the US.
From the shell run:
select state_abbr from us_states where state_region='south';
state_abbr
------------
AR
MS
AL
GA
WV
LA
KY
FL
MO
(9 rows)
This time instead of signifying all columns with *
, the statement uses what is called a projection, which is just a list of columns. This query includes only the state_abbr field in the list. where
is how a list of conditions is specified — only records matching these conditions will be included. In this query, the condition is state_region equals the string literal south
, which is delineated by the single quotes.
Here are the constituent parts:
select state_abbr from us_states where state_region='south';
state_abbr
– a field from the tablewhere
– query conditionsstate_region='south'
– a condition, return all rows where state_region has the string literal valuesouth
.
Make sure the query has 'real' single quotes. "Double quotes" are used for something else and the stylized or ‘smart quotes’ are not interpreted as the same character.
The products table
For some of the next steps, a different table is required. Take a look at the products table:
\d products
Table "public.products"
Column | Type | Collation | Nullable | Default
-------------------+-----------------------+-----------+----------+---------
product_id | smallint | | not null |
product_name | character varying(40) | | not null |
supplier_id | smallint | | |
category_id | smallint | | |
quantity_per_unit | character varying(20) | | |
unit_price | real | | |
units_in_stock | smallint | | |
units_on_order | smallint | | |
reorder_level | smallint | | |
discontinued | integer | | not null |
Indexes:
"products_pkey" PRIMARY KEY, lsm (product_id HASH)
Foreign-key constraints:
"products_category_id_fkey" FOREIGN KEY (category_id) REFERENCES categories(category_id)
"products_supplier_id_fkey" FOREIGN KEY (supplier_id) REFERENCES suppliers(supplier_id)
Referenced by:
TABLE "order_details" CONSTRAINT "order_details_product_id_fkey" FOREIGN KEY (product_id) REFERENCES products(product_id)