Katmai, the code name for Microsoft's imminent SQL Server 2008 release, comes from an Alaskan territory know for volcanoes, which may not be the best symbol for a database. So far, however, Katmai hasn't blown up on me. And the lower-profile Katmai seems like a good follow-on to Yukon, the code name for the gigantic SQL Server 2005 release.
Building on the sweeping, enterprise-oriented improvements in SQL Server 2005 (see review), Katmai sports very nice new features for large deployments. Among the more touted attributes in the database engine are data and backup compression, sparse columns, and compressed and filtered indexes, all of which are geared to saving storage space, as well as Change Data Capture, which captures changes to production data in tables that can be used to update a data warehouse.
[ Compare Katmai to Oracle Database 11g. See "Lab test: Oracle Database 11g shoots the moon." ]
These are just the tip of the iceberg, or volcano, and there are of course many other new features – such as policy-based management – that will appeal to large and small shops alike. Every aspect of the product has been touched significantly.
More data, less storage
For starters, there are two types of data compression: row and page. They do, in fact, compress data in different ways, so it's important to understand the benefits of each, as well as how they work. Row compression is true compression, whereby the engine removes unused spaces at the ends of columns and, thus, saves space. This is the same technique SQL Server already uses for vardecimal compression; Microsoft has just expanded the use to other data types.
Page compression does what's known as dictionary compression, in that it normalizes the data on each page and keeps a lookup pointer. This is essentially the same trick used in Oracle Database 11g, which Oracle calls Oracle Advanced Compression. Without getting too much into the pros and cons of each, it's worth noting that SQL Server's page compression includes the lower-level row compression. In other words, if you have page compression turned on, you automatically get row compression.
Microsoft has included a couple of stored procedures to help you estimate both the level of savings you'll get with each method before you compress, and how much expansion will result if you uncompress the database later. This is an important and really thoughtful feature because you need to know not only if compression will be worth your time, but also if your disk can handle the uncompressed data should you need to revert. Just keep in mind that the procedures work on a small yet statistically significant random sampling of the data. You could get some bad estimates if the query happens to hit a poor representation of your data.
Plus, the way Microsoft implements compression spares more than storage resources. The data stays compressed in memory and only gets decompressed when read, meaning that you can fit more data pages into memory. This should save disk fetches, and the CPU it takes to decompress will be far less expensive than the disk seek would have been.
The sparse columns feature allows you to store null values without taking up any physical space. If you have a large table with a lot of null values in a column, you can waste ample disk space keeping track of those nulls. Storing nulls in sparse columns takes zero space, so your storage requirement goes way down.
One big caveat with sparse columns is that they don't work with compression. Frankly, this is a big mistake by Microsoft, and I hope the company does right by its users, pushing this fix into a service pack instead of waiting for the next release. In the meantime, if you have sparse columns defined on a table, don't expect to compress your data on it as well. I honestly don't know what Microsoft was thinking with this one, but this should never have gotten out the door. Sparse columns and compression are a perfect match; this one may deserve a Darwin Award.
Compressed indexes are just what they sound like: another opportunity to save storage space. Filtered indexes allow you to put a where clause (just like a query) on your index so that only a portion of your table is indexed. It may seem counterproductive, but there are several instances where you would want to filter an index. The perfect example is with sparse columns. Instead of keeping an index that contains mostly nulls, you would put an index on the sparse column where the value does not equal null. This way, only the rows with actual values will be indexed and the size of your index decreases significantly.
Good workload, bad workload
The Resource Governor is Microsoft's first real attempt at a resource governor in SQL Server. And honestly, it doesn't hold a candle to Oracle's. SQL Server 2008 allows you to define resource limits on memory and CPU, which is a good start, but as I said in my review of Oracle Database 11g, those metrics quite often aren't adequate to define a rogue workload.
In Microsoft's defense, the aim of Resource Governor isn't to define rogue queries just yet. In this first version, the goal is simply capping those resources for workloads to help keep them from becoming rogue processes. Of course, that still doesn't solve the problem of excessive disk usage or processing time. And there's no way to automatically move a process to a defined Resource Governor if it starts using too many resources. A process either belongs to a Resource Governor and has its resources capped, or it doesn't.
I think the biggest boon for this feature may be on OLTP (online transaction processing) systems where some light reporting may be necessary and you don't want it to take up too much of your server's resources. You can put the query processes into their own Resource Governor to cap their resource usage and keep the bulk of the server's power for the OLTP load that actually makes money.
Change Data Capture (CDC) is a very nice feature that I think will be very popular among DBAs grappling with ETL (Extract, Transformation, and Load) processes. CDC allows SQL Server to capture which rows and columns have changed on the defined columns and put the changes into a separate table that can be queried by ETL. The benefit is knowing -- without having to perform extensive queries -- which rows have been inserted, deleted, or updated. Currently, finding these operations in a table isn't easy, and you quite often have to write code into your process to mark these activities. But with CDC, you can define these audit policies at the database level and not have to make those drastic changes to your application code.
This release also brings Policy-Based Management (PBM), which is a way to define policies for any number of boxes that will be either enforced or alerted on when the server is out of policy. You can define almost anything for your policies, so even something like making sure that no tables begin with the prefix "tbl" would be one policy that you could enforce. You can make another policy that says all databases should be backed up every day, in order to be alerted if one of your servers misses a backup. PBM is going to be a very powerful tool for SQL Server going forward, and so far, I really like what I see.
Jewels of Katmai
There are far too many new features in Katmai to discuss in one place. I didn't even get a chance to touch the almost complete rewrite of SQL Server Reporting Services or all the work that was done in SQL Server Integration Services or SQL Server Analysis Services. And then there's the management data warehouse, interactive Dundas drill-down reports, IntelliSense, the new activity monitor, PowerShell integration, and much more.
For most SQL Server shops, I think the big news in this release is going to be data compression and the CDC, because both are going to affect shops where it counts: their budgets. The Resource Governor is a nice feature, but I think it's still too young and has too many limitations to make the splash Microsoft is hoping for. It will likely take another couple of releases before Microsoft turns it into something that can make a real difference in a lot of shops.
Compressed and filtered indexes are going to make a difference right away, and while caveats exist with filtered indexes, if you use them right, you'll see the benefits you expect. The tools have also seen some significant improvements, but there will still be disappointments for DBAs who don't want to be treated like developers. But as long as it took to release SQL Server 2005, this version is shaping up to be what SQL Server 2005 should have been from the beginning.
Having trouble installing and setting up Win10? You aren’t alone. Here are many of the most common...
Picking an Android phone can be difficult, but we're here to help. These are the top Android phones you...
Confidence in our power over machines also makes us guilty of hoping to bend reality to our code
Sponsored by Intel
Sponsored by Puppet
All Microsoft’s next-gen development stack needed was adequate tooling, APIs, libraries, and...
Microsoft expands R support and adds Python for developers who aren’t also data scientists
IT security professionals are in high demand in most job markets, but some metropolitan areas are...
The next version of the open source cloud-native monitoring system will feature a totally rewritten...