Stored procedures have a number of advantages. For starters, you’re pushing much less data across the network. If you have a long query, then it could take three or four round trips across the network to get the entire query to the database server. That’s not including the time it takes the server to put the query back together and run it, or considering that the query may run several—or several hundred—times a second.
Using a stored procedure will greatly reduce that traffic because the stored procedure call will always be much shorter. Also, stored procedures are easier to trace in Profiler or any other tool. A stored procedure is an actual object in your database. That means it’s much easier to get performance statistics on a stored procedure than on an ad-hoc query and, in turn, find performance issues and draw out anomalies.
In addition, stored procedures parameterize more consistently. This means you’re more likely to reuse your execution plans and even deal with caching issues, which can be difficult to pin down with ad-hoc queries. Stored procedures also make it much easier to deal with edge cases and even add auditing or change-locking behavior. A stored procedure can handle many tasks that trouble ad-hoc queries. My wife unraveled a two-page query from Entity Framework a couple of years ago. It took 25 minutes to run. When she boiled it down to its essence, she rewrote that huge query as SELECT COUNT(*) FROM T1. No kidding.
OK, I kept it as short as I could. Those are the high-level points. I know many .NET coders believe that business logic doesn’t belong in the database, but what can I say other than you’re outright wrong. By putting the business logic on the front end of the application, you have to bring all of the data across the wire merely to compare it. That’s not good performance. I had a client that kept all of the logic out of the database and did everything on the front end. The company was shipping hundreds of thousands of rows of data to the front end so it could apply the business logic and present the data it needed. It took 40 minutes to do that. I put a stored procedure on the back end and called it from the front end; the page loaded in three seconds.
Of course, the truth is that sometimes the logic belongs on the front end and sometimes it belongs in the database. But ORMs always get me ranting.
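To make the traffic and parameterization points concrete, here is a minimal sketch. The table, columns, and procedure name (dbo.Orders, dbo.GetCustomerOrders) are hypothetical, but the pattern applies to any lookup you’d otherwise send as a big ad-hoc query:

CREATE PROCEDURE dbo.GetCustomerOrders
    @CustomerID INT
AS
BEGIN
    SET NOCOUNT ON;
    -- All of the query text lives on the server; only the short call crosses the network.
    SELECT OrderID, OrderDate, TotalDue
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID;
END

-- At runtime the application sends only this:
EXEC dbo.GetCustomerOrders @CustomerID = 42;

Because every call is identical except for the parameter value, SQL Server can compile the plan once and reuse it, and the procedure shows up by name in Profiler or any other tracing tool when you need to hunt down a problem.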
13. Don’t do large ops on many tables in the same batch
This one seems obvious, but apparently it’s not. I’ll use another live example because it will drive home the point much better. I had a system that suffered tons of blocking. Dozens of operations were at a standstill. As it turned out, a delete routine that ran several times a day was deleting data out of 14 tables in an explicit transaction. Handling all 14 tables in one transaction meant that the locks were held on every single table until all of the deletes were finished. The solution was to break up each table’s deletes into separate transactions so that each delete transaction held locks on only one table. This freed up the other tables, reduced the blocking, and allowed other operations to continue working. You always want to split up large transactions like this into separate smaller ones to prevent blocking.
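As a rough sketch of that fix (the table names and the purge criterion here are made up), each table’s delete gets its own short transaction instead of all of them sharing one:

DECLARE @PurgeDate DATETIME = DATEADD(DAY, -90, GETDATE());  -- hypothetical retention cutoff

-- Each delete commits on its own, so locks on one table are released
-- before the next table is touched.
BEGIN TRAN;
    DELETE FROM dbo.OrderDetails WHERE OrderDate < @PurgeDate;
COMMIT;

BEGIN TRAN;
    DELETE FROM dbo.OrderHeaders WHERE OrderDate < @PurgeDate;
COMMIT;

-- ...and so on for the remaining tables, in an order that respects any foreign keys.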
14. Don’t use triggers
This one is largely the same as the previous one, but it bears mentioning.
The problem with triggers: Whatever it is you want them to do will be done in the same transaction as the original operation. If you write a trigger to insert data into another table when you update a row in the Orders table, the lock will be held on both tables until the trigger is done. If you need to insert data into another table after the update, then put the update and the insert into a stored procedure and do them in separate transactions. If you need to roll back, you can do so easily without having to hold locks on both tables. As always, keep transactions as short as possible and don’t hold locks on more than one resource at a time if you can help it.
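A minimal sketch of that approach, with hypothetical table names, might look like the procedure below: the update and the follow-up insert each run in their own short transaction rather than inside one trigger-extended transaction:

CREATE PROCEDURE dbo.UpdateOrderStatus
    @OrderID INT,
    @NewStatus TINYINT
AS
BEGIN
    SET NOCOUNT ON;

    -- First transaction: the update itself. Its locks are released at COMMIT.
    BEGIN TRAN;
        UPDATE dbo.Orders
        SET StatusID = @NewStatus
        WHERE OrderID = @OrderID;
    COMMIT;

    -- Second transaction: the insert a trigger would have done, now on its own.
    BEGIN TRAN;
        INSERT INTO dbo.OrderStatusHistory (OrderID, StatusID, ChangedAt)
        VALUES (@OrderID, @NewStatus, GETDATE());
    COMMIT;
END

The trade-off is that the two steps are no longer atomic, which is the point: if the second step fails, you can handle it without holding locks on both tables.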
15. Don’t cluster on GUID
After all these years, I can’t believe we’re still fighting this issue. But I still run into clustered GUIDs at least twice a year.
A GUID (globally unique identifier) is a 16-byte randomly generated number. Ordering your table’s data on this column will cause your table to fragment much faster than using a steadily increasing value like DATE or IDENTITY. I did a benchmark a few years ago where I inserted a bunch of data into one table with a clustered GUID and into another table with an IDENTITY column. The GUID table fragmented so severely that the performance degraded by several thousand percent in a mere 15 minutes. The IDENTITY table lost only a few percent off performance after five hours. This applies to more than GUIDs; it goes toward any volatile column.
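If you want to reproduce that kind of benchmark yourself, the setup is only two tables. This is a sketch with made-up names and columns, but it captures the contrast:

-- Clustered on a random GUID: each insert lands on a random page, so the table
-- fragments quickly and page splits pile up.
CREATE TABLE dbo.OrdersGuid
(
    OrderID   UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),
    OrderDate DATETIME NOT NULL,
    Amount    MONEY NOT NULL,
    CONSTRAINT PK_OrdersGuid PRIMARY KEY CLUSTERED (OrderID)
);

-- Clustered on a steadily increasing IDENTITY: inserts append to the end of the
-- table, so fragmentation stays low.
CREATE TABLE dbo.OrdersIdentity
(
    OrderID   INT IDENTITY(1,1) NOT NULL,
    OrderDate DATETIME NOT NULL,
    Amount    MONEY NOT NULL,
    CONSTRAINT PK_OrdersIdentity PRIMARY KEY CLUSTERED (OrderID)
);

Load both with the same data and you can watch the difference grow in sys.dm_db_index_physical_stats.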
16. Don’t count rows if you only need to see if data exists
It’s a common situation. You need to see if data exists in a table or for a customer, and based on the results of that check, you’re going to perform some action. I can’t tell you how often I’ve seen someone do a SELECT COUNT(*) FROM dbo.T1
to check for the existence of that data:
SET @CT = (SELECT COUNT(*) FROM dbo.T1);
IF @CT > 0
BEGIN
 <Do something>
END
It’s completely unnecessary. If you want to check for existence, do this:
IF EXISTS (SELECT 1 FROM dbo.T1)
BEGIN
 <Do something>
END
Don’t count everything in the table. Just get back the first row you find. SQL Server is smart enough to use EXISTS
properly, and the second block of code returns superfast. The larger the table, the bigger difference this will make. Do the smart thing now before your data gets too big. It’s never too early to tune your database.
In fact, I just ran this example on one of my production databases against a table with 270 million rows. The first query took 15 seconds, and included 456,197 logical reads, while the second one returned in less than one second and included only five logical reads.
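If you want to run the same comparison against one of your own tables, SET STATISTICS IO and SET STATISTICS TIME will report the logical reads and elapsed time for each approach (dbo.T1 is just the placeholder table name from the examples above):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- The count-everything version
DECLARE @CT INT;
SET @CT = (SELECT COUNT(*) FROM dbo.T1);

-- The existence-check version
IF EXISTS (SELECT 1 FROM dbo.T1)
    PRINT 'Data exists';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;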
However, if you really do need a row count on the table, and it’s really big, another technique is to pull it from the sysindexes system table. SELECT rows FROM sysindexes will get you the row counts for all of the indexes. And because the clustered index represents the data itself, you can get the table rows by adding WHERE indid = 1. Then simply include the table name and you’re golden. So, the final query is
SELECT rows FROM sysindexes WHERE object_name(id) = 'T1' AND indid = 1
In my 270-million-row table, this returned subsecond and had only six logical reads. Now that’s performance.
17. Don’t do negative searches
Take the simple query SELECT * FROM Customers WHERE RegionID <> 3. You can’t use an index with this query because it’s a negative search that has to be compared row by row with a table scan. If you need to do something like this, you may find it performs much better if you rewrite the query to use the index.
This query can easily be rewritten like this:
SELECT * FROM Customers WHERE RegionID < 3
UNION ALL
SELECT * FROM Customers WHERE RegionID > 3
This query will use an index, so if your data set is large it could greatly outperform the table scan version. Of course, nothing is ever that easy, right? It could also perform worse, so test this before you implement it. There are too many factors involved for me to tell you that it will work 100 percent of the time. Finally, I realize this query breaks the No. 4 rule, “No Double Dipping,” but that goes to show there are no hard-and-fast rules. Although we’re double dipping here, we’re doing it to avoid a costly table scan.
OK, there you go. You won’t be able to apply all of these tips all of the time, but if you keep them in mind you’ll find yourself using them as solutions to some of your biggest issues. The most important thing to remember is not to take anything I say as the gospel and implement it because I said so. Test everything in your environment, then test it again. The same solutions won’t work in every situation. But these are tactics I use all the time when addressing poor performance, and they have all served me well time and again.