When XML came along five years ago, promising to rewrite the rules of data management, vendors of relational databases took
note, but they didn't panic. They'd already seen this movie a decade before, when the object database had been cast in the
role of paradigm shifter. This new software species did emerge and did popularize the notion of persistence -- that is, the
capability of storing and retrieving programming-language objects without arduous translation to and from relational tables.
As it turned out, however, the old RDBMS dog could learn new tricks. Relational databases figured out how to store complex
types using the SQL:1999 object model. Implementations of JDO (Java Data Objects) exist for relational as well as for object
databases. And according to Microsoft, the forthcoming Yukon edition of SQL Server will be capable of persisting .Net objects.
Having absorbed objects, the RDBMS vendors are now working hard to absorb XML documents. Don't expect a simple rerun of the
last movie, though. We've always known that most of the information that runs our businesses resides in the documents we create
and exchange, and those documents have rarely been kept in our enterprise databases. Now that XML can represent both the documents
that we see and touch -- such as purchase orders -- and the messages that exchange those documents on networks of Web services,
it's more critical than ever that our databases can store and manage XML documents. A real summer blockbuster is in the making.
No one knows exactly how it will turn out, but we can analyze the story so far and make some educated guesses.
Consensual Hallucination
The first step in the long journey of SQL/XML hybridization was to publish relational data as XML. BEA Chief Architect Adam
Bosworth, who worked on the idea's SQL Server implementation, calls it "the consensual-hallucination approach -- we all agree
to pretend there is a document."
XML publishing was the logical place to start because it's easy to represent a SQL result set in XML and because so many dynamic
Web pages are fed by SQL queries. The traditional approach required programmatic access to the result set and programmatic
construction of the Web page. The new approach materializes that dynamic Web page in a fully declarative way, using a SQL-to-XML
query to produce an XML representation of the data and XSLT (eXtensible Stylesheet Language Transformation) to massage the
XML into the HTML delivered to the browser.
Originally these virtual documents were created using proprietary SQL extensions such as SQL Server's FOR XML clause. There's
now an emerging ISO/ANSI standard called SQL/XML, which defines a common approach. SQL/XML is supported today by Oracle and
DB2. It defines XML-oriented operators that work with the native XML data types available in these products. SQL Server does
not yet support an XML data type or the SQL/XML extensions, but Tom Rizzo, SQL Server group product manager at Redmond, Wash.-based
Microsoft, says that Yukon, due in 2004, will.
Storing Documents
Most of the information in an enterprise lives in documents kept in file systems, not in relational databases. There have
always been reasons to move those documents into databases -- centralized administration, full-text search -- but in the absence
of a way to relate the data in the documents to the data in the database, those reasons weren't compelling. XML cinches the
argument.
As business documents morph from existing formats to XML -- admittedly a long, slow process that has only just begun -- it
becomes possible to correlate the two flavors of data. Consider an insurance application that stores claims data in a relational
table and claims documents in XML. A hybrid SQL/XML database enables the application to extract fragments of XML from a subset
of the documents. And that subset can be created by joining XML elements in the document with column values in relational
tables.
These wildly powerful effects are currently achieved using a few different kinds of storage and query strategies. On the storage
side, there are two general approaches. You can put whole documents into columns of the database, or you can "shred" the document
into a collection of relational tables. The latter approach makes best use of the database's query engine and atomic update
capabilities, but mapping from irregular XML data to SQL is much harder than mapping from SQL to XML. It helps if your XML
documents are governed by XML Schema descriptions. These provide hints to the XML-to-SQL mapper and can be annotated to more
precisely control the mapping. It also helps if your database supports objects that can receive irregularly shaped XML data.
"We extended the relational base technology to include objects as part of SQL:1999," says Sandeepan Banerjee, director of
product management at Oracle Server Technologies. "And this has matured with 8i and 9i to the point where the typing system
of XML Schema can be fully represented in the object/relational typing."