The viper has struck.
IBM’s newly released DB2 9.1 (previously code-named “Viper”) sheds many of the limitations of DB2 8, boosting performance, scalability, and security. But one feature in particular, the hybrid XML/relational engine, gives this Big Blue serpent its distinctive shape. For customers plunging into the new era of XML data management, Viper’s innovations are tempting indeed.
Native XML databases have been around for a while, but they require special libraries and aren’t compatible with relational data. On the other hand, traditional relational databases have trouble dealing with hierarchical models and have only limited functionality in this area. So the major database vendors have been busy bolting XML capabilities onto their relational database products. IBM is no exception.
IBM’s technology outdoes its competitors, however, by preserving the native format of XML data. Five years in development, DB2’s brand-new storage engine, dubbed pureXML, has one foot planted squarely in the world of relational databases and the other in that of XML databases. Instead of storing the XML as a BLOB (binary large object) or parsing it into relational key/value pairs, pureXML stores the XML file itself, with all its properties and hierarchical structure preserved.
IBM is characterizing this revamped DB2 as something entirely new, a “hybrid data server” that could change the face of data storage as we know it. The exact implementation details are being kept tightly under wraps. It’s up to you whether you prefer to think of DB2 with pureXML as a single database engine in two parts or as two separate engines that work very closely together (see “Inside IBM’s Hybrid Database” graphic below). What is certain is that this release provides some interesting capabilities.
For starters, it gives you the ability to access XML data using SQL queries, just as you would query ordinary relational tables. You can also use XQuery to access relational tables, in addition to XML. You can even use relational SQL to limit the range of data pulled back from XQuery expressions. DB2 allows the two languages to be intermixed almost freely.
The pureXML engine also provides more efficient indexing, because individual XML nodes aren’t stored merely as strings. According to IBM, customers who have already adopted the new engine have reported performance increases of approximately 5 to 7 times over what they were getting from Microsoft or Oracle.
In keeping with this focus on XML, IBM has supplied a number of new developer tools. The new Developer Workbench (which replaces the Development Center) offers a new XQuery builder as well as Visual Studio 2005 add-in enhancements.
Is this trip necessary?
The big question, of course, is how many customers DB2’s hybrid capabilities will entice. Analyst opinions are divided. At this point, I’m not sure even IBM knows what the exact implications of this new technology are, and if it does, it isn’t telling.
It’s certainly possible to imagine applications that take good advantage of a hybrid XML/relational data store. A clinical database, for example, might contain a relational patient table with all of the relevant information about a patient, plus a list of allergies stored as XML. This kind of record could be modeled relationally, but using XML is a good way to reduce the number of joins and ease development effort, because you no longer have to maintain relationships between patients and allergies. You could do something similar with orders and order details, where each order stores the line items as XML instead of the classic line-item table.
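To make the patient-and-allergies model concrete, here is a minimal sketch in Python. It is not DB2 and does not use pureXML: SQLite stands in for the relational side, and the XML sits in a plain text column that is parsed in application code, which is precisely the kind of string storage pureXML is meant to avoid. The table name, column names, and sample patients are all invented for illustration. What it shows is the shape of the data: one row per patient, no separate allergy table, and a relational predicate narrowing the rows before the XML is examined.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_demo_db():
    """Create an in-memory table with one XML allergy list per patient.

    SQLite and the TEXT column are stand-ins; this only illustrates the
    data model, not DB2's storage engine or query syntax.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, allergies TEXT)"
    )
    rows = [
        (1, "Ann", "<allergies><allergy>penicillin</allergy>"
                   "<allergy>latex</allergy></allergies>"),
        (2, "Bob", "<allergies/>"),
        (3, "Cy", "<allergies><allergy>latex</allergy></allergies>"),
    ]
    conn.executemany("INSERT INTO patient VALUES (?, ?, ?)", rows)
    return conn


def patients_allergic_to(conn, substance, max_id=None):
    """Return names of patients whose XML allergy list contains `substance`.

    The optional `max_id` filter is applied in SQL, so the relational side
    limits which XML documents are inspected at all.
    """
    sql = "SELECT id, name, allergies FROM patient"
    params = ()
    if max_id is not None:
        sql += " WHERE id <= ?"
        params = (max_id,)
    sql += " ORDER BY id"

    matches = []
    for _pid, name, xml_text in conn.execute(sql, params):
        root = ET.fromstring(xml_text)
        # No join is needed: the allergies travel with the patient row.
        if any(a.text == substance for a in root.findall("allergy")):
            matches.append(name)
    return matches
```

Calling `patients_allergic_to(build_demo_db(), "latex")` walks every patient row, while passing `max_id=2` lets the SQL predicate trim the set first. In a hybrid engine both steps would happen inside the database rather than in application code.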
Regarding IBM’s optimizations for XML data, as with any performance increase, you have to ask yourself what it will mean to you and your shop. For tasks such as loading millions of rows into a database, a 7x improvement is a big deal, but for the casual insert statement it just isn’t significant. Customers will most likely see improvements in two scenarios: when the database is being pounded by thousands upon thousands of XML inserts, and when the database is loading enormous XML files.
One very interesting feature of the pureXML engine is that it will preserve digital signatures of signed XML files. If you receive a digitally signed XML file, you can load it into the database, retrieve it at any time in the future, and the digital signature will still be intact. Microsoft and Oracle can’t do that; but then again, it isn’t a widespread requirement.
Thus, as cool as it may be, I can’t see pureXML significantly reducing TCO (total cost of ownership). So far, its coolness seems to be mostly technology for technology’s sake. Just because DB2 offers a capability doesn’t necessarily make using it the best strategy.
Scaling new heights
Fortunately, DB2’s XML capabilities aren’t the only improvements in the new release. Far from it. Scalability is another area that IBM has given special attention.
For starters, by using a larger record identifier, DB2 9.1 allows admins to create temporary work tables for system and user queries that are much larger than was previously possible. The maximum size of a single table has also been increased to a whopping 1.1 trillion rows or 16TB, whichever comes first. Of course, both of these are quite dangerous. Should you actually create objects this large, you’re going to have severe performance problems. Still, if it’s a choice between doing it slowly and not doing it at all, you’re better off with what DB2 gives you.
It’s like DB2’s query limit. DB2 allows queries up to 2MB long. So I decided to do an experiment. I pasted a query in Word until it reached 2MB, and the result was somewhere in the neighborhood of 64 pages. While I can’t imagine a single query that long, I suppose it’s useful to somebody. Likewise, if you foresee having more than a trillion rows in your tables, you’re in luck with DB2.
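As a back-of-the-envelope check on the table limit of 1.1 trillion rows or 16TB, whichever comes first, the sketch below works out which ceiling a table reaches first for a given average row width. It assumes that 16TB means binary terabytes and it ignores page, index, and row-header overhead entirely, so treat the numbers as rough orders of magnitude rather than DB2 sizing guidance. The function name is invented for illustration.

```python
MAX_ROWS = 1_100_000_000_000   # 1.1 trillion rows, the figure quoted above
MAX_BYTES = 16 * 2**40         # 16TB, assumed here to mean binary terabytes


def rows_before_limit(avg_row_bytes):
    """Return (row_count, limiting_factor) for a table of fixed-width rows.

    Ignores all storage overhead; `avg_row_bytes` is raw payload only.
    """
    rows_by_size = MAX_BYTES // avg_row_bytes
    if rows_by_size < MAX_ROWS:
        return rows_by_size, "size"   # the 16TB ceiling is reached first
    return MAX_ROWS, "rows"           # the row-count ceiling is reached first
```

Under these assumptions the two limits cross at roughly 16 bytes per row. Anything wider than that, which is to say nearly any real table, runs out of space long before it runs out of row identifiers: at 100 bytes per row the size ceiling arrives at about 176 billion rows.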
Sean McCown is a contributing editor of the InfoWorld Test Center.