If you could do one thing to improve integration and automate processes with customers and business partners, it would be to implement XML, which has become the standard for exchanging information between disparate systems because it is easily transformed into any format. With very little effort, the same file can be sent to several different customers with their own specific needs. XML eases the development effort for the transmitting company and gives recipients a safety net for altering the way they use the data without having to alter how they receive it.
Being able to merge, query, and transform transmitted data with relational data is becoming as essential to businesses as data warehouses themselves. The good news is that the four leading relational databases, namely Oracle Database, IBM DB2, Sybase ASE (Adaptive Server Enterprise), and Microsoft SQL Server, not only can store XML data, but they hide much of the complexity of working with XML. Depending on which of these relational databases you use, however, the XML features you will have to work with may be extremely rich or limited in important ways.
What does a fashionable XML database provide? Four basic functions: the ability to consume, store, search, and generate XML. The extent to which the database supports these functions and the methods it uses to accomplish them are what make for a successful implementation of XML in a database.
I examined these four areas in Oracle Database 10g, IBM DB2 Universal Database V8.1, Sybase ASE 12.5.1, and Microsoft SQL Server 2000. I tested how they imported and read XML files, their options for saving the data, their indexing and query capabilities, and their options for creating XML and graded them based on the ease, flexibility, and speed with which they handled the most common XML operations.
Of course, these products have many other capabilities beyond handling XML. My grades should not be interpreted as complete evaluations.
Documents and Tables
Relational databases and XML documents are both powerful ways to represent relationships among data, but they're powerful in different ways. For example, querying on a patient ID number in a relational database may allow you to quickly find the dates a certain patient visited the hospital, the conditions he was diagnosed with, and the treatments he was given. But it likely won't help you determine which treatments were provided for which conditions or what times the treatments took place, nor will it give you other useful information that XML versions of these records could provide.
But whether or not you can combine the benefits of relational and XML data depends on how you store the XML. There are three methods for physically storing XML data in a relational database: shredded, unstructured, and structured. Shredded and unstructured are useful methods but limited. The structured method allows you to leverage the power of both relational data and XML hierarchies.
Shredding puts XML data into relational columns but strips it of its XMLness, meaning the hierarchical relationships among the data in the original XML document are lost. Shredding is useful when you're not concerned about keeping the data in XML format. For example, let's say you have a Web site that allows customers to place orders, and the order needs to go to a number of different database systems. Producing an XML file and having the different systems pick it up -- that is, shred it -- from a network share may be the most efficient and error-free way to get the data where you want it to go.
The unstructured method uses a data type called a CLOB (Character Large Object) to store an entire XML document as a single unit. Databases have been doing this for years with different types of documents, so this is nothing new. The unstructured method provides limited search capabilities, but it is still quite useful. You can't base queries on it, but the structure of the original data is preserved. A good use for unstructured XML storage would be in keeping original documents to comply with government regulations. For example, if a financial institution were to receive original loan documents in XML, this would allow them to have a relational record of each loan application, and also to store the original application with that record.
The structured method allows you to store XML data inside the database and preserve the hierarchy of the data. Structured storage, also known as "native XML" storage, is what every vendor is trying to achieve. The most obvious benefit of preserving the hierarchical relationships of XML data is being able to receive an XML document, combine it or manipulate it with relational data, and produce XML as a result. It isn't possible to produce such result sets with a relational query language alone.
Oracle Sets the Curve
Oracle Database 10g breaks new ground in support for XML technology, offering very rich features for importing, storing, querying, and generating XML data. Providing native, structured XML storage as well as support for unstructured document storage and shredding, Oracle Database 10g allows you to pull XML data from files and merge it with relational data in views. But before jumping into an upgrade for the enhanced XML capabilities, Oracle shops should note that most of the functionality is available in Database 9i.
Oracle Database can read and write to WebDAV (Web Distributed Authoring and Versioning) repositories, giving users a nice Web folder view of the XML data they can request. The WebDAV support allows administrators to set up access to thousands upon thousands of XML files without requiring much disk space. This is because these WebDAV files, though they appear to be regular XML files, are just shortcuts, queries that point back to the database. The data that comprise these files doesn't materialize until it's requested by opening the file or copying it to a different location. Along with WebDAV, Oracle also supports transferring XML documents across FTP and HTTP.
Perhaps the most important improvement in Database 10g is XML schema evolution. Schema evolution allows you to implement changes to XML schema by mapping the existing data to the new schema. Instead of having to export and re-import all of your XML data, you just create the XSLT (XSL Transformation) style sheet to transform your old documents to the new schema, and the database takes care of the rest. Schema evolution greatly eases the management of XML data, because one thing administrators can count on is constantly changing requirements.
Oracle Database 10g offers some superb options for storing XML data. The XMLType data type obscures much of the complexity of storing and querying XML data, allowing admins to manipulate the data using familiar SQL tools and concepts. For instance, you can create an XMLType table or an XMLType column in a relational table. Creating an XMLType table is how you implement the structured storage method and preserve your XML hierarchy. Creating an XMLType column in a relational table is how you implement the unstructured storage method, which is useful for storing entire XML documents for attachment to relational records.
And what can be done with a table can be done with a view. You can also use these tables and views in stored procedures and process data based on them, so there are some very advanced possibilities here.
How easy is managing XML data in Oracle? Assuming you already understand all things XML, including SQL/XML, XQuery, and DTD (Document Type Definition), XML processes in Oracle Database are very easy to manage. In fact, as in schema evolution, the database handles many data management processes on its own, with a minimum of admin intervention.
In my speed tests, which involved importing thousands of files, creating thousands of files, and importing very large documents, Database 10g proved very fast, beating all comers at imports and finishing a close second to IBM in the file-creation test.
Big Blue, B Student
IBM is on the ball, too. DB2 implements XML as a collection of options called the XML Extender. Another key piece, the Information Integrator, is licensed separately from DB2. Although the XML Extender may sound like a separate program that sits outside the database and interacts with the relational engine, it is in fact just a collective name for the stored procedures and functions that provide the XML functionality inside DB2. These stored procedures and functions sit inside the relational engine like any other objects in the database and are accessible to any user who needs them. It's only slightly annoying that there's a separate install to get the Information Integrator functionality.
The Information Integrator sits outside the relational engine and extends the functionality of XML in DB2 by adding the XML Wrapper, which allows you to treat XML documents as relational sources. The XML Wrapper is a very powerful tool that could be useful in receiving transmitted XML files that you want to be able to query and possibly report on, but which you don't want to import into the database.
Like Oracle Database, DB2 allows you to store structured XML, unstructured documents, or shredded data. And like Oracle's XMLType, DB2's XML datatype allows you to obscure the details of XML data storage from both users and admins alike. Views can be created to merge relational data with XML data in a single result set. This same result set can be merged with XML documents on the file system or with other XML or relational data inside the database.
DB2 uses SQL/XML as the main language for creating XML documents from relational data; you use SQL/XML functions to create XML tags, attributes, and the like. SQL/XML also allows you to perform concatenations and aggregations while creating the document. DB2's SQL/XML support isn't complete, but it covers the major functions. If you are using DB2 today, you'll probably be able to do anything you want when creating XML documents.
XQuery, on the other hand, is completely unsupported by DB2. No doubt IBM is waiting for the W3C to solidify the specification before it implements the unfinished standard. Nevertheless, the absence of XQuery imposes significant limitations on DB2's XML search capabilities. I would much prefer to see a partial implementation than for it to be completely left out.
XQuery, which is a language used for crawling through structured data paths, offers significant query functionality beyond what SQL can do. Because XQuery is an identity-based query language, whereas SQL is value-based, you can use XQuery to find out whether a certain element exists; SQL can tell you whether an element that's known to exist holds a certain value. Unlike SQL, XQuery also has knowledge of attributes inside element tags, and it has stronger data types than SQL, providing you greater control over the type of data being written or queried.
XQuery notwithstanding, IBM's commitment to XML is quite apparent when you look at the improvements it has made in DB2 V8.1. Not only does V8.1 support XSLT for on-the-fly translation of XML data, there are more than 1,000 changes to SQL syntax for handling XML alone.
How easy is DB2's XML to manage? Very. The views manage the dynamic data underneath them and provide a layer of abstraction that clients need in order to query the database effectively. However, updating your XML schema will be a hassle. Because DB2 doesn't support schema evolution, you will have to physically map your old data to any new schema produced by changing requirements.
After a slow start, due to confusion over what I could and couldn't do without Information Integrator, my tests of DB2 went smoothly. DB2's development interface is very intuitive, stronger than Oracle's and even Sybase's. And despite the absence of XQuery, I found DB2 to have very powerful XML capabilities. It did everything I asked it to do and did it very quickly.
Sybase Pulls a B, Too
I very much like the way Sybase has implemented XML in Sybase ASE. Native XML support is fully integrated with the relational database, and no external programs are necessary to get the functionality.
Sybase ASE has a native XML database that allows storage of shredded, structured, and unstructured XML data. Admins can import the XML data into the database or query XML files directly from the file system. Unfortunately, ASE doesn't support XQuery, so it suffers from the same limitations as DB2 when it comes to searching XML.
Like Oracle and DB2, ASE allows very rich indexing on XML data, whether shredded or structured. A self-defining indexing mechanism in ASE handles evolving XML formats and structures without requiring any user input. Element values are automatically indexed to speed up predicate searches (the XML equivalent of "select phone from customers where lastname = smith"), and different types of indexes (XML indexes or relational indexes) are chosen by the database depending on the type of query predicate. And queries can be predicated using relational elements, XML elements, or both.
ASE does most of the XML data management for you. Building views, stored procedures, and functions based on XML data is not only easy to code, it's easy to manage as well. However, the lack of schema evolution will hinder growing projects. Lack of schema evolution isn't the only shortcoming in ASE. When creating XML files, ASE cannot perform on-the-fly XSLT translation. The end-user is expected to run the newly created XML files through an XSLT processor afterwards, which hinders any efforts to create XML files in a timely fashion. Any organization that needs to generate XML files on a regular basis should find another solution. (I find Microsoft .Net to be very good at this task.)
If it weren't for ASE's lack of XSLT and XQuery support, its XML implementation would be in the same class as IBM or even Oracle. Development was extremely easy, and the tools were very intuitive. Sybase has put together an excellent product that should be exploited by any company currently using ASE.
In the speed tests, ASE held its own except when it came to creating XML files, where performance was affected by my having to script XSLT processing. Had ASE included a method for using XSLT, it would have performed well in comparison with the others.
Microsoft Takes a Lesson
Microsoft SQL Server 2000 falls glaringly short of Oracle, IBM, and Sybase when handling XML. As an XML-enabled database, SQL Server only allows you to store XML documents as unstructured objects in the database or to shred the XML data into relational tables.
You might think that the absence of structured XML support would leave SQL Server dead in the water, but that's not entirely so. SQL Server has plenty of functionality that makes it possible to parse and write XML documents. Foremost, you can easily generate XML result sets in SQL Server using the For XML clause in a T-SQL query. This in itself is useful, but is made more so by the ability to scrub the data through an XSLT style sheet along the way.
SQL Server has other rich options for creating XML documents, allowing you to slice and dice views and present XML data in almost any form users require. You can also use SQL Server as a sort of bidirectional XML repository by mapping XML documents directly to relational tables through the use of Updategrams, which sit between the XML document and the database and allow you to specify what elements look like before and after a change. This allows you to send updates, inserts, and deletes to SQL Server in the form of an XML document. You receive the results in XML as well.
When reading XML documents, SQL Server gets even more sophisticated. For example, SQL Server can pass XML documents into stored procedures as parameters for shredding into the database. And SQL Server's OpenXML function allows you to generate rowsets from XML documents and insert them into the relational database. In fact, using OpenXML, you have access to some nicely featured methods for manipulating XML data. You can retrieve attributes and elements using flags, and you can define rowset schemas by using column patterns or table names.
However, because SQL Server treats the entire XML document as a single piece of data, the database is not aware that XML elements exist until you start parsing through them in the stored procedure. This isn't the end of the world. You can still build queries and views that leverage the hierarchical relationships of the data in XML files; it just takes a lot of extra work (read: coding) to do so. The next version of SQL Server, due in 2005, will support structured XML storage (see "Will Yukon strike XML gold?").
In my tests, I found that SQL Server speedily imported XML files, but it was the slowest of the group at creating XML files. It was also slowest at reading large XML files but not by a significant margin.
Relating to XML
All four of these relational databases provide useful XML capabilities. Even SQL Server, which lacks support for structured XML storage, allows you to store, manipulate, and create XML in useful ways.
While Microsoft has a lot of catching up to do, IBM DB2 and Sybase ASE suffer from less significant limitations. Their lack of support for XQuery means less granular XML search capabilities than the others provide. Sybase ASE's inability to perform XSLT impromptu transformations means extra time and administrator effort will be necessary to generate XML documents or evolve schema.
Although IBM, Sybase, and even Microsoft offer significant XML functionality, they trail Oracle by a wide margin. Oracle not only does the best job of hiding the complexity of managing XML data, it also offers the richest set of query capabilities and tops it off with extras such as support for schema evolution and WebDAV repositories. If you're using DB2, Sybase ASE, or SQL Server 2000, you'll be able to make good use of XML data.
Jennifer Nelson, database administrator at Info Integration, contributed to this article.