Databases flex their XML

IBM, Microsoft, Oracle, and Sybase compete in our data management gymnastics

If you could do one thing to improve integration and automate processes with customers and business partners, it would be to implement XML, which has become the standard for exchanging information between disparate systems because it is easily transformed into any format. With very little effort, the same file can be sent to several different customers with their own specific needs. XML eases the development effort for the transmitting company and gives recipients a safety net for altering the way they use the data without having to alter how they receive it.

Being able to merge, query, and transform transmitted data with relational data is becoming as essential to businesses as data warehouses themselves. The good news is that the four leading relational databases, namely Oracle Database, IBM DB2, Sybase ASE (Adaptive Server Enterprise), and Microsoft SQL Server, not only can store XML data, but they hide much of the complexity of working with XML. Depending on which of these relational databases you use, however, the XML features you will have to work with may be extremely rich or limited in important ways.

What does a fashionable XML database provide? Four basic functions: the ability to consume, store, search, and generate XML. The extent to which the database supports these functions and the methods it uses to accomplish them are what make for a successful implementation of XML in a database.

I examined these four areas in Oracle Database 10g, IBM DB2 Universal Database V8.1, Sybase ASE 12.5.1, and Microsoft SQL Server 2000. I tested how they imported and read XML files, their options for saving the data, their indexing and query capabilities, and their options for creating XML and graded them based on the ease, flexibility, and speed with which they handled the most common XML operations.

Of course, these products have many other capabilities beyond handling XML. My grades should not be interpreted as complete evaluations.

Documents and Tables

Relational databases and XML documents are both powerful ways to represent relationships among data, but they're powerful in different ways. For example, querying on a patient ID number in a relational database may allow you to quickly find the dates a certain patient visited the hospital, the conditions he was diagnosed with, and the treatments he was given. But it likely won't help you determine which treatments were provided for which conditions or what times the treatments took place, nor will it give you other useful information that XML versions of these records could provide.

But whether or not you can combine the benefits of relational and XML data depends on how you store the XML. There are three methods for physically storing XML data in a relational database: shredded, unstructured, and structured. Shredded and unstructured are useful methods but limited. The structured method allows you to leverage the power of both relational data and XML hierarchies.

Shredding puts XML data into relational columns but strips it of its XMLness, meaning the hierarchical relationships among the data in the original XML document are lost. Shredding is useful when you're not concerned about keeping the data in XML format. For example, let's say you have a Web site that allows customers to place orders, and the order needs to go to a number of different database systems. Producing an XML file and having the different systems pick it up -- that is, shred it -- from a network share may be the most efficient and error-free way to get the data where you want it to go.

The unstructured method uses a data type called a CLOB (Character Large Object) to store an entire XML document as a single unit. Databases have been doing this for years with different types of documents, so this is nothing new. The unstructured method provides limited search capabilities, but it is still quite useful. You can't base queries on it, but the structure of the original data is preserved. A good use for unstructured XML storage would be in keeping original documents to comply with government regulations. For example, if a financial institution were to receive original loan documents in XML, this would allow them to have a relational record of each loan application, and also to store the original application with that record.

The structured method allows you to store XML data inside the database and preserve the hierarchy of the data. Structured storage, also known as "native XML" storage, is what every vendor is trying to achieve. The most obvious benefit of preserving the hierarchical relationships of XML data is being able to receive an XML document, combine it or manipulate it with relational data, and produce XML as a result. It isn't possible to produce such result sets with a relational query language alone.

Only Microsoft SQL Server 2000, among the four databases tested, does not support structured XML storage. And while all four, including Microsoft, allow you to reap the benefits of the shredded and unstructured methods, the approaches they take and flexibility they provide can differ considerably.

Oracle Sets the Curve

Oracle Database 10g breaks new ground in support for XML technology, offering very rich features for importing, storing, querying, and generating XML data. Providing native, structured XML storage as well as support for unstructured document storage and shredding, Oracle Database 10g allows you to pull XML data from files and merge it with relational data in views. But before jumping into an upgrade for the enhanced XML capabilities, Oracle shops should note that most of the functionality is available in Database 9i.

Oracle Database can read and write to WebDAV (Web Distributed Authoring and Versioning) repositories, giving users a nice Web folder view of the XML data they can request. The WebDAV support allows administrators to set up access to thousands upon thousands of XML files without requiring much disk space. This is because these WebDAV files, though they appear to be regular XML files, are just shortcuts, queries that point back to the database. The data that comprise these files doesn't materialize until it's requested by opening the file or copying it to a different location. Along with WebDAV, Oracle also supports transferring XML documents across FTP and HTTP.

Perhaps the most important improvement in Database 10g is XML schema evolution. Schema evolution allows you to implement changes to XML schema by mapping the existing data to the new schema. Instead of having to export and re-import all of your XML data, you just create the XSLT (XSL Transformation) style sheet to transform your old documents to the new schema, and the database takes care of the rest. Schema evolution greatly eases the management of XML data, because one thing administrators can count on is constantly changing requirements. 

Oracle Database 10g offers some superb options for storing XML data. The XMLType data type obscures much of the complexity of storing and querying XML data, allowing admins to manipulate the data using familiar SQL tools and concepts. For instance, you can create an XMLType table or an XMLType column in a relational table. Creating an XMLType table is how you implement the structured storage method and preserve your XML hierarchy. Creating an XMLType column in a relational table is how you implement the unstructured storage method, which is useful for storing entire XML documents for attachment to relational records.

And what can be done with a table can be done with a view. You can also use these tables and views in stored procedures and process data based on them, so there are some very advanced possibilities here.

How easy is managing XML data in Oracle? Assuming you already understand all things XML, including SQL/XML, XQuery, and DTD (Document Type Definition), XML processes in Oracle Database are very easy to manage. In fact, as in schema evolution, the database handles many data management processes on its own, with a minimum of admin intervention.

In my speed tests, which involved importing thousands of files, creating thousands of files, and importing very large documents, Database 10g proved very fast, beating all comers at imports and finishing a close second to IBM in the file-creation test. 

Big Blue, B Student

IBM is on the ball, too. DB2 implements XML as a collection of options called the XML Extender. Another key piece, the Information Integrator, is licensed separately from DB2. Although the XML Extender may sound like a separate program that sits outside the database and interacts with the relational engine, it is in fact just a collective name for the stored procedures and functions that provide the XML functionality inside DB2. These stored procedures and functions sit inside the relational engine like any other objects in the database and are accessible to any user who needs them. It's only slightly annoying that there's a separate install to get the Information Integrator functionality.

The Information Integrator sits outside the relational engine and extends the functionality of XML in DB2 by adding the XML Wrapper, which allows you to treat XML documents as relational sources. The XML Wrapper is a very powerful tool that could be useful in receiving transmitted XML files that you want to be able to query and possibly report on, but which you don't want to import into the database.

Like Oracle Database, DB2 allows you to store structured XML, unstructured documents, or shredded data. And like Oracle's XMLType, DB2's XML datatype allows you to obscure the details of XML data storage from both users and admins alike. Views can be created to merge relational data with XML data in a single result set. This same result set can be merged with XML documents on the file system or with other XML or relational data inside the database.

DB2 uses SQL/XML as the main language for creating XML documents from relational data; you use SQL/XML functions to create XML tags, attributes, and the like. SQL/XML also allows you to perform concatenations and aggregations while creating the document. DB2's SQL/XML support isn't complete, but it covers the major functions. If you are using DB2 today, you'll probably be able to do anything you want when creating XML documents.

XQuery, on the other hand, is completely unsupported by DB2. No doubt IBM is waiting for the W3C to solidify the specification before it implements the unfinished standard. Nevertheless, the absence of XQuery imposes significant limitations on DB2's XML search capabilities. I would much prefer to see a partial implementation than for it to be completely left out.

XQuery, which is a language used for crawling through structured data paths, offers significant query functionality beyond what SQL can do. Because XQuery is an identity-based query language, whereas SQL is value-based, you can use XQuery to find out whether a certain element exists; SQL can tell you whether an element that's known to exist holds a certain value. Unlike SQL, XQuery also has knowledge of attributes inside element tags, and it has stronger data types than SQL, providing you greater control over the type of data being written or queried.

XQuery notwithstanding, IBM's commitment to XML is quite apparent when you look at the improvements it has made in DB2 V8.1. Not only does V8.1 support XSLT for on-the-fly translation of XML data, there are more than 1,000 changes to SQL syntax for handling XML alone.

How easy is DB2's XML to manage? Very. The views manage the dynamic data underneath them and provide a layer of abstraction that clients need in order to query the database effectively. However, updating your XML schema will be a hassle. Because DB2 doesn't support schema evolution, you will have to physically map your old data to any new schema produced by changing requirements.

After a slow start, due to confusion over what I could and couldn't do without Information Integrator, my tests of DB2 went smoothly. DB2's development interface is very intuitive, stronger than Oracle's and even Sybase's. And despite the absence of XQuery, I found DB2 to have very powerful XML capabilities. It did everything I asked it to do and did it very quickly. 

Sybase Pulls a B, Too

I very much like the way Sybase has implemented XML in Sybase ASE. Native XML support is fully integrated with the relational database, and no external programs are necessary to get the functionality.

Sybase ASE has a native XML database that allows storage of shredded, structured, and unstructured XML data. Admins can import the XML data into the database or query XML files directly from the file system. Unfortunately, ASE doesn't support XQuery, so it suffers from the same limitations as DB2 when it comes to searching XML.

Like Oracle and DB2, ASE allows very rich indexing on XML data, whether shredded or structured. A self-defining indexing mechanism in ASE handles evolving XML formats and structures without requiring any user input. Element values are automatically indexed to speed up predicate searches (the XML equivalent of "select phone from customers where lastname = smith"), and different types of indexes (XML indexes or relational indexes) are chosen by the database depending on the type of query predicate. And queries can be predicated using relational elements, XML elements, or both.

1 2 Page 1
Page 1 of 2