Query Strategies
The foundation of all XML-oriented query strategies is XPath, a syntax built to descend treelike structures and to lop off
branches. When an XSLT stylesheet transforms an XML document, it uses XPath to isolate fragments of the document. Relational
databases that support XML queries -- including stalwarts Oracle, DB2, and SQL Server, newcomers such as OpenLink Software's
Virtuoso, but not yet MySQL -- use XPath in the same way. At first, this XPath support was delivered in the form of proprietary
extensions. More recently, the SQL/XML standard has defined a common set of XPath-aware SQL extensions. XPath is also used
in the W3C's forthcoming XQuery standard, an ambitious effort to adapt the data-joining power of SQL to the world of semistructured
XML data. "We're working heavily with XQuery in order to enable manipulation of XML content in ways familiar to SQL developers,"
says Jeff Jones, director of strategy for the information management group at Armonk, N.Y.-based IBM.
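To make the "descend and lop off branches" idea concrete, here is a minimal sketch using the limited XPath subset in Python's standard xml.etree.ElementTree module. The purchase-order document, its element names, and both helper functions are assumptions invented for illustration; they are not taken from any vendor's product.

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase order; element and attribute names are illustrative only.
ORDER_XML = """
<order id="PO-1001">
  <department>ENG</department>
  <items>
    <item number="A-17" qty="2"/>
    <item number="B-42" qty="5"/>
  </items>
  <comments>
    <comment role="requester">Needed for the lab build-out.</comment>
    <comment role="approver">Approved.</comment>
  </comments>
</order>
"""


def select_item_numbers(xml_text):
    """Descend to every <item> element and return its number attribute."""
    root = ET.fromstring(xml_text)
    # ".//item" matches all descendant <item> elements and ignores the rest.
    return [item.get("number") for item in root.findall(".//item")]


def select_comment_by_role(xml_text, role):
    """Isolate a single <comment> by attribute value, or return None."""
    root = ET.fromstring(xml_text)
    # The [@role='...'] predicate keeps only the branch we asked for.
    node = root.find(f".//comment[@role='{role}']")
    return node.text if node is not None else None


if __name__ == "__main__":
    print(select_item_numbers(ORDER_XML))
    print(select_comment_by_role(ORDER_XML, "approver"))
```

An XSLT processor or an XML-aware database evaluates path expressions of the same kind, though typically with a fuller XPath implementation than this module provides.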
Although vendors are champing at the bit for XQuery 1.0 to be finalized, their implementations of it will be less powerful,
in some ways, than their current SQL/XML implementations. Most notably, XQuery does not define a syntax for updating elements
within XML documents. Although SQL/XML's update mechanism is not yet approved, it has been defined and is already implemented
in Oracle and DB2.
Has SQL/XML stolen XQuery's thunder? In the short term, XQuery may appear to be just an alternative way to do things that
can be done equally well in SQL and XPath. But Redwood Shores, Calif.-based Oracle's Banerjee thinks that in the long run,
it's possible that developers "will want to stay within an XML abstraction for all their data sources." In that case XQuery,
a rich and complete programming language built to manipulate complex data, could emerge as a major paradigm.
The Future of Documents
Imagine a purchase order flowing through a business process some time in 2005. It's an XML document, created with a tool such
as InfoPath, carrying a mixture of core data and contextual metadata. The core data, including the item number and department
code, will wind up in the columns of a relational table. The contextual metadata, which might include a threaded discussion
made from comments injected by the requester, the reviewer, and the approver, will remain in document form. "This human context
is never stored in the RDBMS today," says Kingsley Idehen, CEO of Burlington, Mass.-based OpenLink. Yet it's the key to understanding
how the data got there and what it means.
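The split described above can be sketched in a few lines. This is only an illustration under stated assumptions: the document shape, the purchase_order table, and the function names are hypothetical, and SQLite stands in for a hybrid SQL/XML database, storing the discussion as plain XML text rather than as a native XML type.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical purchase order mixing core data with a threaded discussion.
PO_XML = """<purchaseOrder>
  <itemNumber>A-17</itemNumber>
  <departmentCode>ENG</departmentCode>
  <discussion>
    <comment by="requester">Needed for the lab build-out.</comment>
    <comment by="reviewer">Budget confirmed.</comment>
    <comment by="approver">Approved.</comment>
  </discussion>
</purchaseOrder>"""


def store_order(conn, xml_text):
    """Put core data in relational columns; keep the discussion as XML."""
    root = ET.fromstring(xml_text)
    item_number = root.findtext("itemNumber")
    department = root.findtext("departmentCode")
    discussion = root.find("discussion")
    # The contextual metadata stays in document form, serialized as text.
    discussion_xml = (
        ET.tostring(discussion, encoding="unicode")
        if discussion is not None
        else None
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchase_order "
        "(item_number TEXT, department_code TEXT, discussion_xml TEXT)"
    )
    conn.execute(
        "INSERT INTO purchase_order VALUES (?, ?, ?)",
        (item_number, department, discussion_xml),
    )


def comments_for_department(conn, department):
    """Select rows with SQL, then query the stored XML for comment text."""
    rows = conn.execute(
        "SELECT discussion_xml FROM purchase_order WHERE department_code = ?",
        (department,),
    ).fetchall()
    comments = []
    for (doc,) in rows:
        if doc:
            comments.extend(c.text for c in ET.fromstring(doc).findall("comment"))
    return comments


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_order(conn, PO_XML)
    print(comments_for_department(conn, "ENG"))
```

In a database with real XML support, the second function would collapse into a single query that filters on the column and applies the path expression in the same statement.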
Once written, the purchase order is injected into a workflow orchestrated on top of a Web services network. A security service
may enforce authorization policy by updating a SOAP header; a choreography service may search for sets of documents that have
SOAP headers that contain the same correlation ID. These active intermediaries will need some kind of database technology
to manage the XML that lives transiently in their queues, but it probably won't be a job for Oracle or DB2. Here a specialized
XML database, such as Software AG's Tamino or Sleepycat Software's Berkeley DB XML, may be better suited to the task. They're
fast and, as Mike Champion, senior R&D advisor at Software AG in Darmstadt, Germany, notes, they're built to work well with
dynamic XML documents even when those documents lack the schemas the RDBMS SQL/XML mappers rely on.
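The choreography step, finding queued documents whose SOAP headers share a correlation ID, might look something like the following sketch. The SOAP 1.1 envelope namespace is real; the wf namespace, the CorrelationID header element, and both functions are assumptions made up for the example. Note that nothing here depends on a schema.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

NS = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",  # SOAP 1.1 envelope
    "wf": "urn:example:workflow",  # hypothetical workflow namespace
}


def make_envelope(correlation_id, note):
    """Build a small SOAP message carrying a hypothetical correlation header."""
    return (
        '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" '
        'xmlns:wf="urn:example:workflow">'
        f"<soap:Header><wf:CorrelationID>{correlation_id}</wf:CorrelationID></soap:Header>"
        f"<soap:Body><wf:note>{note}</wf:note></soap:Body>"
        "</soap:Envelope>"
    )


def group_by_correlation_id(envelopes):
    """Group queued messages by the correlation ID found in each SOAP header."""
    groups = defaultdict(list)
    for xml_text in envelopes:
        root = ET.fromstring(xml_text)
        # Path is relative to the Envelope root; prefixes resolve through NS.
        cid = root.findtext("soap:Header/wf:CorrelationID", namespaces=NS)
        if cid is not None:
            groups[cid].append(xml_text)
    return dict(groups)


if __name__ == "__main__":
    queue = [
        make_envelope("c-1", "requested"),
        make_envelope("c-2", "unrelated"),
        make_envelope("c-1", "approved"),
    ]
    for cid, messages in group_by_correlation_id(queue).items():
        print(cid, len(messages))
```

A native XML database would run the equivalent path query against its own indexes instead of reparsing each message, which is where the speed advantage would be expected to come from.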
During the workflow and after it has been completed, the document will be accessible to interested parties via a certain URL.
That URL might resolve to a projection of the document from a hybrid SQL/XML RDBMS onto an intranet Web server or a WebDAV
repository such as Oracle's. Alternatively, the URL might resolve to the underlying instance of the document stored natively
in the RDBMS. Either way, the state of the business process -- both core data and contextual metadata -- will be visible at
all times to anyone who's interested in looking at it and is authorized to do so. What's more, both flavors of data carried
in the document will be accessible to queries that reach across the enterprise, joining SQL and XML sources to create consolidated
views.
Did I say that this will come to pass by 2005? Make that 2006 or maybe 2007. A major shift in the style of enterprise data
management is under way, and there are huge architectural issues yet to be resolved. Oracle, not surprisingly, wants you to
store everything in a centralized hybrid DBMS. IBM says it would rather enable you to federate data across a range of sources.
Each strategy has merit, and most enterprises will wind up pursuing both -- in different ways, for various reasons. Despite
these differences, we are witnessing a sacred union. SQL and XML have been pronounced man and wife, and the honeymoon has
begun.