SQL illiteracy considered harmful (to your career)

SQL is more than just a database language

In the 1960s, Turing award winner Edsger Dijkstra wrote the heartfelt, "Go To Statement Considered Harmful" that inspired many other "XYZ Considered Harmful" imitators. The title was intended to challenge orthodox views on a topic. The following four posts is a lighthearted series of posts in that vein.

As more and more people in enterprises work with data directly in their day-to-day jobs, companies are wrestling with the right model for the appropriate level of training to achieve results. Does it suffice for people to understand how to interact with spreadsheets? Does it suffice for them to use visual analytics applications like Tableau? An important bar in data literacy is being able to read and write a precise question about data:

SELECT customer_name, count(purchase) AS num_purchases
FROM PurchasesTable
WHERE purchase_date >= DATE ‘2014-01-01’
GROUP BY customer_name
HAVING num_purchases < 10

This is a valid SQL statement that you can run on most any SQL-capable system like your database. The key element here is that it reads like structured English, yet it is precise enough for a computer to execute and produce a result like the table below:

If you understand SQL at this level, you already know the major concepts you will need for many analyses. For example, "PurchasesTable" is a table, i.e., a collection of records each of which has attributes like "customer_name" and "purchase". The SELECT and GROUP BY clauses tell the system what data to calculate, the WHERE and HAVING clauses say what records to exclude. Even fro this basic example, you can see that you might want "Gray" instead of "Jim Gray" or you might want purchases over a certain price, etc. All of those things can be readily expressed. As with any language, there are subtleties and important contextual elements, but the learning happens in the dialectic interaction. Armed with these concepts, you're already at the 70 percent point that will help you ask questions independently and reducing your dependence on intermediaries.

There are many excellent tools online for learning SQL (even YouTube videos!), but it's honestly about trying things out, getting your hands dirty, and finding value. At Google, the most effective tool I had in showing colleagues the potential of the language was showing them results using their own data, quickly. So I would recommend starting with a data set of interest to you, perhaps in an EDW or a departmental database. Of course, "there's no data like more data" -- therefore, you'll ultimately want to combine that data with more data that provides additional context or finer distinctions. This is an area of the technology that is evolving quickly that I will talk about in future posts.

Is SQL better than building a great Excel spreadsheet or a beautiful Tableau dashboard? It's not better, it's complementary. When concepts are read and written in prose, they are better understood, iterated upon, and improved. And that's the value of at least reading SQL, if not writing it. If your business logic is locked in a proprietary application controlled by a vendor, there are limitations to what you can do. If, on the other hand, you use a high level, standard language that's been around for 40 years, you have options for adapting your computation to the rapidly changing environment of today's data systems. 

Of course, you might ask why you haven't heard all of this before. It's really because most people think of SQL exclusively as a specialized database language. In fact, SQL can do a lot more, in broader contexts, which is tarting to be recognized. More on this as I discuss more topics in the data supply chain.

As always, I'd love to hear your feedback. Please feel free to leave comments below.

Related:

Copyright © 2015 IDG Communications, Inc.