Will Yukon strike XML gold?

Son of Microsoft SQL Server 2000 is going native

Expected to ship by mid-2005, Yukon represents a major step forward for Microsoft in the XML arena. Striving to match the native XML storage capabilities in IBM’s, Oracle’s, and Sybase’s relational databases, Yukon will finally store XML data in a structured fashion, while continuing to support the shredded and unstructured storage methods already available in Microsoft SQL Server 2000.
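For readers unfamiliar with the terms, the difference between unstructured and shredded storage can be sketched in a few lines. This is only an illustration using Python's standard library and an in-memory SQLite database, not Yukon or SQL Server 2000 code; the sample document, table names, and function names are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are made up.
ORDER_XML = """<order id="42">
  <item sku="A1" qty="2"/>
  <item sku="B7" qty="1"/>
</order>"""


def store_unstructured(conn, xml_text):
    """Unstructured storage: keep the whole document as one opaque text value."""
    conn.execute("CREATE TABLE IF NOT EXISTS order_docs (doc TEXT)")
    conn.execute("INSERT INTO order_docs (doc) VALUES (?)", (xml_text,))


def store_shredded(conn, xml_text):
    """Shredded storage: decompose the document into one relational row per <item>."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS order_items "
        "(order_id INTEGER, sku TEXT, qty INTEGER)"
    )
    root = ET.fromstring(xml_text)
    order_id = int(root.get("id"))
    rows = [
        (order_id, item.get("sku"), int(item.get("qty")))
        for item in root.findall("item")
    ]
    conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)", rows)
    return len(rows)


def demo():
    conn = sqlite3.connect(":memory:")
    store_unstructured(conn, ORDER_XML)
    shredded_count = store_shredded(conn, ORDER_XML)

    # Shredded rows answer relational queries directly.
    total_qty = conn.execute(
        "SELECT SUM(qty) FROM order_items WHERE order_id = 42"
    ).fetchone()[0]

    # The unstructured copy has to be parsed again before it can be queried.
    doc = conn.execute("SELECT doc FROM order_docs").fetchone()[0]
    reparsed_qty = sum(int(i.get("qty")) for i in ET.fromstring(doc).findall("item"))

    conn.close()
    return shredded_count, total_qty, reparsed_qty
```

The trade-off the sketch hints at is the usual one: shredding makes the data easy to query but discards the document's original shape, while unstructured storage preserves the document but hides its contents from the query engine. Native storage is meant to keep the hierarchy and still make it queryable.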

Yukon’s approach to preserving the hierarchical data relationships in XML documents differs from that of Oracle, IBM, and Sybase. Instead of using nested tables for storing structured XML, Yukon features what I refer to as a managed BLOB (Binary Large Object) data type. This means that, although the data is being mapped to a regular BLOB data type, the XML data type assigned to the column keeps track of the hierarchy of the XML data. One consequence of the BLOB approach is a 2GB limit on the size of XML files you can store in Yukon.

Will this matter? Some think the 2GB limit will be a showstopper for Microsoft in large XML implementations, but I’ll tell you something: I’ve never seen an XML document even half that size. XML is about finding records in a file, not dumping an entire database to a file and trying to run an XQuery search against it.

Native storage isn’t the only way Yukon improves XML data management. Yukon also offers a simple version of schema evolution. Taking a different route from Oracle, it will allow you to update existing documents by changing the namespace instead of attaching the new schema and scrubbing the existing data through XSLT (XSL Transformation). Microsoft’s method ignores validation for existing data, while Oracle’s method allows you to fill in missing data based on the new schema. In that respect, at least, Oracle’s implementation is more flexible.

You can also validate XML against a schema in a number of ways. You can explicitly create an XML schema collection, attach a schema during table creation, or use the cast function to convert data types inside a query. So far, from what I can tell using the Yukon beta, there’s no way to turn off schema validation once you’ve turned it on. Always-on validation may seem like a good idea, but complete schema validation is expensive. In the interest of conserving time and computing resources, you may not always want to use it, especially against trusted data sources. I’ll certainly be interested in how this shakes out in the final release.

In short, Microsoft seems to be on the right track, but it’s much too early to see how Yukon will pan out. Any Microsoft shop trafficking in XML should be hoping there’s gold in those hills.
