If you could do one thing to improve integration and automate processes with customers and business partners, it would be
to implement XML, which has become the standard for exchanging information between disparate systems because it is easily
transformed into any format. With very little effort, the same file can be sent to several different customers with their
own specific needs. XML eases the development effort for the transmitting company and gives recipients a safety net for altering
the way they use the data without having to alter how they receive it.
Being able to merge, query, and transform transmitted data with relational data is becoming as essential to businesses as
data warehouses themselves. The good news is that the four leading relational databases, namely Oracle Database, IBM DB2,
Sybase ASE (Adaptive Server Enterprise), and Microsoft SQL Server, not only can store XML data, but they hide much of the
complexity of working with XML. Depending on which of these relational databases you use, however, the XML features you will
have to work with may be extremely rich or limited in important ways.
What does a fashionable XML database provide? Four basic functions: the ability to consume, store, search, and generate XML.
The extent to which the database supports these functions and the methods it uses to accomplish them are what make for a successful
implementation of XML in a database.
I examined these four areas in Oracle Database 10g, IBM DB2 Universal Database V8.1, Sybase ASE 12.5.1, and Microsoft SQL
Server 2000. I tested how they imported and read XML files, their options for saving the data, their indexing and query capabilities,
and their options for creating XML and graded them based on the ease, flexibility, and speed with which they handled the most
common XML operations.
Of course, these products have many other capabilities beyond handling XML. My grades should not be interpreted as complete
evaluations.
Documents and Tables
Relational databases and XML documents are both powerful ways to represent relationships among data, but they're powerful
in different ways. For example, querying on a patient ID number in a relational database may allow you to quickly find the
dates a certain patient visited the hospital, the conditions he was diagnosed with, and the treatments he was given. But it
likely won't help you determine which treatments were provided for which conditions or what times the treatments took place,
nor will it give you other useful information that XML versions of these records could provide.
But whether or not you can combine the benefits of relational and XML data depends on how you store the XML. There are three
methods for physically storing XML data in a relational database: shredded, unstructured, and structured. Shredded and unstructured
are useful methods but limited. The structured method allows you to leverage the power of both relational data and XML hierarchies.
Shredding puts XML data into relational columns but strips it of its XMLness, meaning the hierarchical relationships among
the data in the original XML document are lost. Shredding is useful when you're not concerned about keeping the data in XML
format. For example, let's say you have a Web site that allows customers to place orders, and the order needs to go to a number
of different database systems. Producing an XML file and having the different systems pick it up -- that is, shred it -- from
a network share may be the most efficient and error-free way to get the data where you want it to go.
The unstructured method uses a data type called a CLOB (Character Large Object) to store an entire XML document as a single
unit. Databases have been doing this for years with different types of documents, so this is nothing new. The unstructured
method provides limited search capabilities, but it is still quite useful. You can't base queries on it, but the structure
of the original data is preserved. A good use for unstructured XML storage would be in keeping original documents to comply
with government regulations. For example, if a financial institution were to receive original loan documents in XML, this
would allow them to have a relational record of each loan application, and also to store the original application with that
record.
The structured method allows you to store XML data inside the database and preserve the hierarchy of the data. Structured
storage, also known as "native XML" storage, is what every vendor is trying to achieve. The most obvious benefit of preserving
the hierarchical relationships of XML data is being able to receive an XML document, combine it or manipulate it with relational
data, and produce XML as a result. It isn't possible to produce such result sets with a relational query language alone.
Only Microsoft SQL Server 2000, among the four databases tested, does not support structured XML storage. And while all four,
including Microsoft, allow you to reap the benefits of the shredded and unstructured methods, the approaches they take and
flexibility they provide can differ considerably.
Oracle Sets the Curve
Oracle Database 10g breaks new ground in support for XML technology, offering very rich features for importing, storing, querying,
and generating XML data. Providing native, structured XML storage as well as support for unstructured document storage and
shredding, Oracle Database 10g allows you to pull XML data from files and merge it with relational data in views. But before
jumping into an upgrade for the enhanced XML capabilities, Oracle shops should note that most of the functionality is available
in Database 9i.