Micromanaging computation considered harmful

SQL isn't just a database language; it's also management by objective

In people management, it's generally accepted that micromanaging a person is often a bad idea. Micromanagement removes the opportunity for the individual to find a better path than originally conceived. It also often focuses on the specific steps of a process, but unfortunately removes the context that would help someone understand how to improve the process.

So, here's an analogy that highlights the key difference between what "imperative" languages like Java or Python and "declarative" languages like SQL do to your computation. In Python, say, you specify step-by-step what the computer should do: open the file; read the first line; if the line doesn't match some requirement, skip it; update the counter; read the next line; update the counter again; if the counter exceeds some value, stop; if the end of file is reached, close the file; return the counter. Code often accumulates like this and builds up into complex business rules that are usually poorly understood. 
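To make the step-by-step style concrete, here is a minimal Python sketch of the kind of loop described above. The function name, the "starts with a prefix" requirement, and the sample data are all assumptions made up for illustration; an in-memory buffer stands in for the file.

```python
import io

def count_matching_lines(lines, required_prefix, limit):
    """Count lines that start with required_prefix, stopping once limit is exceeded."""
    counter = 0
    for line in lines:                        # read the next line
        if not line.startswith(required_prefix):
            continue                          # line fails the requirement: skip it
        counter += 1                          # update the counter
        if counter > limit:                   # counter exceeds some value: stop
            break
    return counter

# Stand-in for an opened file; with a real file you would use `with open(path) as f:`.
sample = io.StringIO("buy,alice\nview,bob\nbuy,carol\nbuy,alice\n")
print(count_matching_lines(sample, "buy", limit=100))
```

Every decision here (the order of the checks, when to stop, how the count is kept) is spelled out by the author, so none of it is left for the system to improve.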

In SQL (see my post on SQL illiteracy), authors express their objective in English-like form:

SELECT customer, count(purchases) AS num_purchases
FROM PurchasesTable
WHERE purchase_date >= DATE '2014-01-01'
GROUP BY customer
HAVING count(purchases) > 10

Notice that the SQL statement does not reference details like opening the file, updating counters, etc. It just states the author's intent of counting items by customer. More technical people often really enjoy the fine level of control. A much wider range of people prefer expressing their intent and leaving implementation to the system. The database or other system that runs the query takes care of the details under the covers. 
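If you want to try this yourself, the following sketch runs a close variant of the query against an in-memory SQLite database from Python. The table contents are invented for illustration. Two adaptations are assumptions of the sketch rather than part of the query above: dates are stored as ISO-formatted text, because SQLite has no DATE literal, and COUNT(*) stands in for count(purchases), because the sample table has no purchases column. The aggregate is repeated in HAVING since not every engine accepts the alias there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PurchasesTable (customer TEXT, purchase_date TEXT)")

rows = (
    [("alice", "2014-03-01")] * 12      # 12 purchases in range: should be reported
    + [("bob", "2014-05-20")] * 3       # in range, but too few purchases
    + [("carol", "2013-11-02")] * 15    # many purchases, but before the cutoff
)
conn.executemany("INSERT INTO PurchasesTable VALUES (?, ?)", rows)

query = """
    SELECT customer, COUNT(*) AS num_purchases
    FROM PurchasesTable
    WHERE purchase_date >= '2014-01-01'
    GROUP BY customer
    HAVING COUNT(*) > 10
"""
result = conn.execute(query).fetchall()
print(result)  # [('alice', 12)]
```

Nothing in the Python code says how to scan, filter, or count. It only hands the statement of intent to the engine and collects the answer.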

Is this analogy accurate? Can computers really find better ways to do this? Not always, but they often can, provided you express what you intend to achieve at a high enough level. The system searches through many equivalent ways of asking the same question and picks one it estimates to be cheap. Trying all the different ways to say something is tedious for humans but well suited to computers, and it is exactly what a database query optimizer is charged with. Skilled humans can often find better rewrites, but it's a slow and laborious endeavor. In addition, once they've done that, they have baked the specifics of the intent into the code. If the intent changes, the manual optimization must start again.
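One way to watch an optimizer at work is to ask it for its plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN from Python, with a made-up purchases table and a hypothetical index name. The query text never changes; only the physical design does. The exact plan wording depends on the SQLite version, so the sketch prints the plans rather than promising a particular output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, purchase_date TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(f"cust{i % 50}", f"20{10 + i % 6}-01-15") for i in range(5000)],
)

QUERY = """
    SELECT customer, COUNT(*) AS num_purchases
    FROM purchases
    WHERE purchase_date >= '2014-01-01'
    GROUP BY customer
"""

def plan(connection, sql):
    """Return the human-readable steps of the plan SQLite chose for sql."""
    return [row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]

plan_before = plan(conn, QUERY)
result_before = sorted(conn.execute(QUERY).fetchall())

# Change the physical design only; the statement of intent stays untouched.
conn.execute("CREATE INDEX idx_purchase_date ON purchases (purchase_date)")

plan_after = plan(conn, QUERY)
result_after = sorted(conn.execute(QUERY).fetchall())

print(plan_before)
print(plan_after)
print(result_before == result_after)
```

On many SQLite builds the second plan mentions the new index, but whether it does is the optimizer's call. The answer is the same either way, which is the point: the author stated the objective once, and the system is free to revisit how to reach it.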

In a business context, taking the step to at least read and maybe even write SQL is an empowering moment. It requires some effort, but as with most things, there is no royal road to understanding. It's like learning to read what's really happening in your enterprise. If other people tell you, then you can have an impression; if you can ask for yourself, then you really understand. 

In a world of distributed computing and evolving storage systems, expressing your business objective in high-level, English-like SQL also makes it possible for distributed compute engines to use more than one computer to answer your query. With more traditional imperative programs, that step often requires some adaptation of the program. Similarly, if you change your underlying storage systems, a lower-level expression of the computation will often have to change as well. This all translates to dollars and cents. The more complex an operation is to change, the more expensive it is to make changes to how a business operates. This reduces agility in obvious ways.
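Here is a rough sketch of why a declarative count-by-customer spreads across machines so readily. Each partition is counted independently and the partial counts are merged before the final filter. The function names and data are hypothetical, and threads merely stand in for separate computers; a real engine does considerably more than this.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_partition(rows, cutoff="2014-01-01"):
    """Partial result for one partition: purchases per customer on or after cutoff."""
    return Counter(customer for customer, date in rows if date >= cutoff)

def count_purchases(partitions, min_purchases=10):
    """Merge per-partition counts, then apply the HAVING-style filter once."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(count_partition, partitions))
    total = sum(partials, Counter())
    return {c: n for c, n in total.items() if n > min_purchases}

partitions = [
    [("alice", "2014-02-01")] * 6 + [("bob", "2014-02-03")] * 2,
    [("alice", "2014-06-11")] * 7 + [("carol", "2013-12-30")] * 20,
]
print(count_purchases(partitions))  # {'alice': 13}
```

Note that alice clears the threshold only after the partial counts are merged. An engine can work out that ordering from the query itself, whereas a hand-written loop would have to be restructured by its author.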

In a world of regulation and increasing interdependencies between organizations, expressing intent independently of implementation means that you can avoid a class of unintended consequences of systems building. For one, it's much easier to understand why certain pieces of data were touched and what computation was done with them. For another, that checking can be done automatically rather than through a manual audit process. The world of continuous, computational audit is just around the corner.
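As a small taste of what automatic auditing can look like, the sketch below uses the authorizer hook in Python's sqlite3 module to record which columns a query reads. The table, the card_number column, and the callback name are invented for illustration, and this is one possible mechanism under those assumptions rather than a full audit system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer TEXT, purchase_date TEXT, card_number TEXT)"
)

touched = set()

def record_reads(action, arg1, arg2, db_name, source):
    """Log every (table, column) the statement reads, then allow the access."""
    if action == sqlite3.SQLITE_READ:
        touched.add((arg1, arg2))
    return sqlite3.SQLITE_OK

# The hook fires while SQLite compiles the statement, before any row is read.
conn.set_authorizer(record_reads)
conn.execute(
    "SELECT customer, COUNT(*) FROM purchases "
    "WHERE purchase_date >= '2014-01-01' GROUP BY customer"
).fetchall()

print(sorted(touched))
```

Because the query states what it needs, the system can report that customer and purchase_date were read and that card_number was not. Extracting the same fact from a hand-written loop would mean reading the code line by line.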

You might ask why you haven't heard all of this before. It's because most people think of SQL as a database language only. In fact, it can do a lot more, and perceptions are changing rapidly as SQL is used to access newer big data systems such as Hadoop, NoSQL stores, and others. Of course, there is such a thing as pushing the abstraction too high, where it starts to hide the ability to reason about the computation (see my post on visual ETL).

As always, we'd love to hear your feedback on this. Please feel free to ping us by email or leave comments below.

In 1968, Edsger Dijkstra wrote the heartfelt "Go To Statement Considered Harmful," which inspired many other "XYZ Considered Harmful" pieces. The title was intended to challenge orthodox views on a topic. This is a lighthearted series of posts in that vein.


Copyright © 2015 IDG Communications, Inc.