Take this scenario, for instance: You're inserting data into a temp table and need it to display a certain value if another value exists. Maybe you're pulling from the Customer table and you want anyone with more than $100,000 in orders to be labeled as "Preferred." Thus, you insert the data into the table and run an
UPDATE statement to set the CustomerRank column to "Preferred" for anyone who has more than $100,000 in orders. The problem is that the
UPDATE statement is logged, which means it has to write twice for every single write to the table. The way around this, of course, is to use an inline
CASE statement in the SQL query itself. This tests every row for the order amount condition and sets the "Preferred" label before it's written to the table. The performance increase can be staggering.
2. Don't blindly reuse code
This issue is also very common. It's very easy to copy someone else's code because you know it pulls the data you need. The problem is that quite often it pulls much more data than you need, and developers rarely bother trimming it down, so they end up with a huge superset of data. This usually comes in the form of an extra outer join or an extra condition in the
WHERE clause. You can get huge performance gains if you trim reused code to your exact needs.
3. Do pull only the number of columns you need
This issue is similar to issue No. 2, but it's specific to columns. It's all too easy to code all your queries with
SELECT * instead of listing the columns individually. The problem again is that it pulls more data than you need. I've seen this error dozens and dozens of times. A developer does a
SELECT * query against a table with 120 columns and millions of rows, but winds up using only three to five of them. At that point, you're processing so much more data than you need it's a wonder the query returns at all. You're not only processing more data than you need, but you're also taking resources away from other processes.
4. Don't double-dip
Here's another one I've seen more times than I should have: A stored procedure is written to pull data from a table with hundreds of millions of rows. The developer needs customers who live in California and have incomes of more than $40,000. So he queries for customers that live in California and puts the results into a temp table; then he queries for customers with incomes above $40,000 and puts those results into another temp table. Finally, he joins both tables to get the final product.
Are you kidding me? This should be done in a single query; instead, you're double-dipping a superlarge table. Don't be a moron: Query large tables only once whenever possible -- you'll find how much better your procedures perform.
A slightly different scenario is when a subset of a large table is needed by several steps in a process, which causes the large table to be queried each time. Avoid this by querying for the subset and persisting it elsewhere, then pointing the subsequent steps to your smaller data set.
5. Do know when to use temp tables
This issue is a bit harder to get a handle on, but it can yield impressive gains. You can use temp tables in a number of situations, such as keeping you from double-dipping into large tables. You can also use them to greatly decrease the processing power required to join large tables. If you must join a table to a large table and there's a condition on that large table, you can improve performance by pulling out the subset of data you need from the large table into a temp table and joining with that instead. This is also helpful (again) if you have several queries in the procedure that have to make similar joins to the same table.