Inside IBM DB2 Viper

A technological marvel, IBM's new XML-powered server aims to change the face of database storage

The viper has struck.

IBM’s newly released DB2 9.1 (previously code-named “Viper”) sheds many of the limitations of DB2 8, boosting performance, scalability, and security. But one feature in particular, the hybrid XML/relational engine, gives this Big Blue serpent its distinctive shape. For customers plunging into the new era of XML data management, Viper’s innovations are tempting indeed.

Native XML databases have been around for a while, but they require special libraries and aren’t compatible with relational data. On the other hand, traditional relational databases have trouble dealing with hierarchical models and have only limited functionality in this area. So the major database vendors have been busy bolting XML capabilities onto their relational database products. IBM is no exception.

IBM’s technology outdoes its competitors, however, by preserving the native format of XML data. Five years in development, DB2’s brand-new storage engine, dubbed pureXML, has one foot planted squarely in the world of relational databases and the other in that of XML databases. Instead of storing the XML as a BLOB (binary large object) or parsing it into relational key/value pairs, pureXML stores the XML file itself, with all its properties and hierarchical structure preserved.

IBM is characterizing this revamped DB2 as something entirely new, a “hybrid data server” that could change the face of data storage as we know it. The exact implementation details are tightly under wraps. It’s up to you whether you prefer to think of DB2 with pureXML as a single database engine in two parts or as two separate engines that just work really closely together (see “Inside IBM’s Hybrid Database” graphic below). What is certain is this release does provide some interesting capabilities.

For starters, it gives you the ability to access XML data using SQL queries, just like ordinary relational tables. You can also use XQuery to access relational tables, in addition to XML. You can even use relational SQL to limit the range of data pulled back from XQuery expressions. DB2 allows almost continuous intermixing of the two languages.

The pureXML engine also provides more efficient indexing, because individual XML nodes aren’t stored merely as strings. According to IBM, customers who have already adopted the new engine have reported performance increases of approximately 5 to 7 times over what they were getting from Microsoft or Oracle.

In keeping with this focus on XML, IBM has supplied a number of new developer tools. The new Developer Workbench (which replaces the Development Center) offers a new XQuery builder as well as Visual Studio 2005 add-in enhancements.

Is this trip necessary?

The big question, of course, is How many customers will DB2’s hybrid capabilities entice? Analyst opinions are divided. At this point, I’m not sure even IBM knows what the exact implications of this new technology are, and if it does, it isn’t telling.

It’s certainly possible to imagine applications that take good advantage of a hybrid XML/relational data store. A clinical database, for example, might contain a relational patient table with all of the relevant information about a patient, plus a list of allergies stored as XML. This kind of record could be modeled relationally, but using XML is a good way to reduce the number of joins and ease development effort, because you no longer have to maintain relationships between patients and allergies. You could do something similar with orders and order details, where each order stores the line items as XML instead of the classic line-item table.

Despite these possibilities, the issue of mixing SQL and XQuery expressions is something else again. Switching back and forth between the two query styles becomes overly complex and difficult to read quickly. I suspect most developers will avoid writing these types of hybrid applications at all costs.

33FEdb2_in2.jpg
Click for larger view.

Regarding IBM’s optimizations for XML data, as with any performance increase, you have to ask yourself what it will mean to you and your shop. For tasks such as loading millions of rows into a database, a 7x improvement is a big deal, but for the casual insert statement it just isn't significant. Customers will most likely see improvements in two scenarios: when the database is being pounded by thousands upon thousands of XML inserts, and when the database is loading enormous XML files.

One very interesting feature of the pureXML engine is that it will preserve digital signatures of signed XML files. If you receive a digitally signed XML file, you can load it into the database, retrieve it at any time in the future, and the digital signature will still be intact. Microsoft and Oracle can’t do that; but then again, it isn’t a widespread requirement.

Thus, as cool as it may be, I can’t see pureXML significantly reducing TCO (total cost of ownership). So far, its coolness seems to be mostly technology for technology’s sake. Just because DB2 has some functionality doesn’t necessarily make it the best strategy.

Scaling new heights

Fortunately, DB2’s XML capabilities aren’t the only improvements in the new release. Far from it. Scalability is another area that IBM has given special attention.

For starters, by using a larger record identifier, DB2 9.1 allows admins to create temporary work tables for system and user queries that are much larger than was previously possible. The size of a single table has also been increased to a whopping 1.1 trillion rows or 16TB, whichever comes first. Of course, both of these are quite dangerous. Should you actually create objects this large you’re going to have severe performance problems. Still, if it’s a choice between doing it slow and not doing it at all, you’re better off with what DB2 gives you.

It’s like DB2’s query limit. DB2 allows queries up to 2MB long. So I decided to do an experiment. I pasted a query in Word until it reached 2MB, and the result was somewhere in the neighborhood of 64 pages. While I can’t imagine a single query that long, I suppose it’s useful to somebody. Likewise, if you foresee having more than a trillion rows in your tables, you’re in luck with DB2.

Statistical views are another scalability feature in DB2 9.1. Ordinarily, the query optimizer uses statistics to estimate cardinality of the data stored in tables, and this is one of the major factors used to determine an execution plan for a given query. Statistical views extend this capability to views as well, which means that now views are not only considered to be derived tables, but they’re actually treated more like tables. IBM isn’t the only one to consider blurring the line between views and tables, but it is the first to implement it.

Label-based access control

Oracle has had advanced, row-level access control since Version 8.1. DB2 9.1 is catching up, albeit with a slightly different method. (Unfortunately, SQL Server doesn’t have similar functionality at all, and I hear that it’s not even being considered for the Katmai release next year.) Although the traditional methods of controlling access to row-level data (views and stored procedures) are still in play, DB2’s LBAC (label-based access control) gives you much greater reassurance by preventing users from circumventing your security measures.

For instance, using traditional access control, you could create a view that specifies that a certain user can only see customers in a certain region, or only a certain column in the customers table. A user who knows the name of the base table, however, can just query the table instead of using the view. With LBAC, on the other hand, you can specify a security policy to control access to specific columns or even specific ranges of rows.

Other databases can achieve something like this level of security using the traditional methods. The difference is in the level of administration involved. With traditional methods, for example, you can create a stored procedure to give a group of users access to data. If another group comes along you can write another stored procedure to give them the access they need as well, and so on. You have to keep up with all of the procedures you’ve written and track which groups get access to each one, but the advantage is that one stored procedure doesn’t affect the other ones -- it doesn’t take much planning.

With LBAC, you really have to plan how you want the security policy to play out. If a group comes along that needs different access, you might need to redesign your whole policy to make it run well. But for the long haul, the LBAC approach requires much less ongoing administration, because all the work is done up front. You don’t have to worry about keeping up with stored procedure versions and the like. Each approach has its place, and it’s my guess that shops employing LBAC will find themselves using a combination of both methods.

Reduce your data

DB2’s new row-level compression is one of my favorite features. It’s actually table-level compression and it can result in a direct storage savings of 45 to 75 percent.

I performed an initial benchmark just to get an idea of the performance difference between two identical tables, one that was compressed and the other that was not. I found that the performance between the two was close, but the compressed table often actually performed better than the uncompressed table (presumably because more of its data fit in memory).

For my next test, I put my database on a server that had only 1GB of RAM, to see how a smaller business might take advantage of the compression. I found that the compressed table doesn’t perform nearly as well in this scenario as it did on my server with 4GB of RAM. Because DB2 keeps the compression dictionaries in memory, my guess is that the lower memory of the server is causing the compression dictionaries to be paged to disk when the server is busy.

Whatever the actual cause for the slowdown, if you’re going to use row-level compression, make sure you thoroughly benchmark it before you put it in your production environment. Even if you have a lot of RAM you might be surprised by poor performance due to any number of conditions, especially if you don’t have a dedicated database server. All the same, if I had to pick one feature that puts DB2 ahead of any of the other databases, this would definitely be the one, because it’s going to be far more useful to the largest portion of the client base. I would imagine that Oracle and Microsoft are both scrambling to be the next to bring this to market.

Time to jump ship?

The new DB2 is a technically impressive release. It’s loaded with features that are sure to please DB2 admins and developers alike. Whether those features will be compelling enough to convince a die-hard Oracle DBA to switch platforms, however, is unclear.

In the XML department, the pureXML engine hits a home run, but its significance in the business landscape remains to be seen. The scalability features, including larger temporary work areas and statistical views, make DB2 more attractive to the high-end market, but won’t necessarily appeal to smaller customers.

Viper’s groundbreaking row-level compression is where most customers will see the greatest returns. But although this feature will definitely reduce TCO for current DB2 shops, it simply isn’t enough to justify porting to a new platform. Likewise for the myriad other features, ranging from XML query enhancements to disaster recovery improvements -- they’re compelling but not revolutionary.

All in all, this is an excellent release for current DB2 customers, but in the highly competitive relational database market it takes a lot to win new converts. DB2’s new features certainly show off IBM’s engineering know-how, and they may be laying the groundwork for something yet to come, but as of now, the worth of many of these capabilities hasn’t hit the industry yet. Maybe over the next couple releases, as IBM and its customers start to build on these technologies and do things nobody else can do, the true payoff of the DB2 vision will emerge.

33FEdb2viperWkbnch_sc.gif
Click for larger view.
33FEdb2viperXMLvwr_sc1.gif
Click for larger view.

Correction:

In this article, we originally misreported IBM's XML performance claims. According to IBM, DB2 9 adopters report performance increases of approximately 5 to 7 times more than what they experienced with Microsoft SQL Server or Oracle Database. The errors have been corrected.

InfoWorld Scorecard
Value (10.0%)
Manageability (25.0%)
Performance (25.0%)
Scalability (20.0%)
Availability (20.0%)
Overall Score (100%)
IBM DB2 9.1 ("Viper") 8.0 8.0 9.0 9.0 9.0 8.7
From CIO: 8 Free Online Courses to Grow Your Tech Skills
Join the discussion
Be the first to comment on this article. Our Commenting Policies