Microsoft SQL Server 2008 is the best SQL Server yet
Microsoft's 'Katmai' gives both big shops and small shops reasons to make the upgradeFollow @infoworld
SQL Server's first attempt at Data Compression actually looks pretty good. As I noted in my beta preview (see "Microsoft's 'Katmai' is filled to the brim"), SQL Server 2008 provides two types of compression: row and page. Row compression is true compression, in which unused spaces at the ends of columns are removed to save storage. Page compression, aka dictionary compression, normalizes the data on each page and keeps a lookup pointer. In SQL Server 2008, page compression includes row compression. If you have page compression turned on, you get row compression in the bargain.
Microsoft provides a handy compression calculation wizard that will give you a good estimate of the benefits you can expect. The wizard runs a test compression scenario against your data for each compression type (row and page) and tells you what the new size of the table should be. I tested the compression calculator against a number of data sets, and on average the calculation deviated from my final results by only 1 or 2 percent. That's pretty good, considering that the calculation is based on a relatively small amount of data.
How much will a table compress? That depends on both the type of data and the type of compression. If your data is something like SAP data, which has a lot of trailing spaces, then row-level compression will serve you well. If you have a lot of repeating data, then table-level page compression will be your best friend.
The order of your data matters also. Because SQL Server compresses at the page level, repeating data that is clustered together will be de-duplicated. So it pays to be aware of both the nature and the order of your data.
All that said, just as with Advanced Compression in Oracle Database 11g (see the review and compression test results), my results with Data Compression in SQL Server 2008 were all over the map. I got as little as 17 percent compression on some data sets, and as much as 76 percent on another. The high of 76 percent was reached on denormalized order line data in a data warehouse.
When it comes to Data Compression, SQL Server has one advantage over Oracle: SQL Server will compress existing tables in a single statement. In Oracle Database 11g, you have to create a new table, insert the data, and then delete the old table. Not only does SQL Server spare you from creating an additional copy of the data, but you can still perform operations on the table during the compression operation, no downtime required.
An excellent new feature called Sparse Columns allows you to store null values without taking up any physical space. But here Microsoft made a big mistake: Sparse Columns aren't compatible with Data Compression. If you define Sparse Columns on a table, you can't also apply Data Compression. As I mentioned in my preview, this one may be worthy of a Darwin Award. Sparse Columns and Data Compression would be a perfect match. Let's hope Microsoft fixes this soon.