Modeling biz docs in XML
Learning XML Schema won't be easy, but don't let that stop you
The good news is that Office 11 supports XML Schema. The bad news is that XML Schema has been described even by XML experts as "confusing," "impenetrable," "fuzzy," and "as user-friendly as a stick in the eye." A successor to the SGML/XML DTD (Standard Generalized Markup Language/XML document type definition), XML Schema is a language for writing rules that constrain the kinds of elements that can appear in documents and the ways in which they can be sequenced, grouped, and nested.
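To make that description concrete, here is a minimal sketch of a schema and a validator. It assumes a hypothetical purchase-order vocabulary (`order`, `customer`, `item`, `sku`, and `qty` are illustrative names, not taken from any real standard) and uses Java's built-in `javax.xml.validation` package, which implements W3C XML Schema. It says nothing about how Office 11 itself applies schemas. The `xs:sequence` fixes the order of child elements, `maxOccurs` governs repetition, and the nested `complexType` shows one element containing others.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical purchase-order schema: an <order> holds exactly one <customer>
// followed by one or more <item> elements, each nesting a <sku> and a <qty>.
final class OrderSchemaDemo {
    static final String ORDER_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "  <xs:element name='order'>"
      + "    <xs:complexType>"
      + "      <xs:sequence>"                                   // children must appear in this order
      + "        <xs:element name='customer' type='xs:string'/>"
      + "        <xs:element name='item' maxOccurs='unbounded'>" // one or more items
      + "          <xs:complexType>"
      + "            <xs:sequence>"
      + "              <xs:element name='sku' type='xs:string'/>"
      + "              <xs:element name='qty' type='xs:positiveInteger'/>"
      + "            </xs:sequence>"
      + "          </xs:complexType>"
      + "        </xs:element>"
      + "      </xs:sequence>"
      + "    </xs:complexType>"
      + "  </xs:element>"
      + "</xs:schema>";

    // Compile schema text with the JDK's W3C XML Schema implementation.
    static Schema compile(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
    }

    // True if the document satisfies the schema. With no ErrorHandler set,
    // the validator throws SAXException on the first violation it finds.
    static boolean conforms(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = compile(ORDER_XSD);
        String inOrder = "<order><customer>ACME</customer>"
            + "<item><sku>A-100</sku><qty>3</qty></item></order>";
        String outOfOrder = "<order><item><sku>A-100</sku><qty>3</qty></item>"
            + "<customer>ACME</customer></order>";
        String noItems = "<order><customer>ACME</customer></order>";
        System.out.println(conforms(schema, inOrder));    // true
        System.out.println(conforms(schema, outOfOrder)); // false: <customer> must come first
        System.out.println(conforms(schema, noItems));    // false: at least one <item> is required
    }
}
```

Running `main` should print `true` followed by `false` twice. The second document puts `<customer>` after `<item>`, and the third omits the required `<item>` entirely.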
XML Schema is still a relatively new specification. The W3C Recommendation for XML Schema was published in May 2001. XML parsers have supported it only briefly, and there is not yet much practical experience with it. Most people who are adept at defining document structure learned how to do so by writing DTDs. Some of the allergic reaction to XML Schema can, therefore, be chalked up to normal reluctance to learn new skills.
Of course, it's hard to work up a lot of nostalgia for the DTD legacy. Adjectives such as "confusing" and "impenetrable" were also flung at SGML DTDs. Back in the day, more than a few large document management projects -- like too many modern ERP systems -- produced a lot of sound and fury, signifying nothing. The fact is that, although sets of documents do exhibit databaselike properties that we can usefully formalize and exploit, this kind of information management is still in its infancy.
Boeing, one notable exception, has always understood that documentation is integral to its business. The company likes to joke that a jet is "five million parts flying in formation." The documents that describe that inventory are themselves part of the inventory, and are engineered accordingly. Applying that same discipline to routine business documents such as résumés, expense reports, and purchase orders, though, was never a serious option. Sure, it would be nice to tag all this stuff for intelligent search, aggregation, and data mining. But there were no general-purpose tools for tagging documents that are individually low-value (albeit collectively high-value), and no business case could be made for creating special-purpose tools to do that instrumentation. Office 11, which aims to bring special-purpose capability to general-purpose tools, is arguably one of the most disruptive technologies in the pipeline.
"Got a question?" writes Phil Windley, CIO of the State of Utah, on his Weblog. "Somewhere, on some government computer, the information you need is probably available. Information you paid for and the government would gladly share with you -- if only they could find it." Upgrading the word processors and spreadsheets on those government computers to versions that not only can read and write XML, but, more crucially, can enforce rules about datatypes and structures, is part of the solution. Assuming, of course, that such rules can be written, deployed, and unobtrusively applied and maintained over time. "Therein," observes Windley, "lies the rub."
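Structure is only half of what such rules cover; the other half is datatypes. The sketch below again uses Java's `javax.xml.validation` API and assumes a hypothetical expense-report vocabulary (`expense`, `date`, `category`, `amount`, `moneyType`, and `categoryType` are invented for illustration). It shows a schema rejecting a date that isn't a date, a category outside a fixed list, and an amount that is negative or carries too many decimal places.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical expense-report record whose schema constrains datatypes, not just structure.
final class ExpenseSchemaDemo {
    static final String EXPENSE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "  <xs:simpleType name='moneyType'>"      // non-negative, at most two decimal places
      + "    <xs:restriction base='xs:decimal'>"
      + "      <xs:minInclusive value='0'/>"
      + "      <xs:fractionDigits value='2'/>"
      + "    </xs:restriction>"
      + "  </xs:simpleType>"
      + "  <xs:simpleType name='categoryType'>"   // closed list of allowed categories
      + "    <xs:restriction base='xs:string'>"
      + "      <xs:enumeration value='travel'/>"
      + "      <xs:enumeration value='meals'/>"
      + "      <xs:enumeration value='lodging'/>"
      + "    </xs:restriction>"
      + "  </xs:simpleType>"
      + "  <xs:element name='expense'>"
      + "    <xs:complexType>"
      + "      <xs:sequence>"
      + "        <xs:element name='date' type='xs:date'/>"
      + "        <xs:element name='category' type='categoryType'/>"
      + "        <xs:element name='amount' type='moneyType'/>"
      + "      </xs:sequence>"
      + "    </xs:complexType>"
      + "  </xs:element>"
      + "</xs:schema>";

    static final Schema SCHEMA = compile(EXPENSE_XSD);

    // Compile schema text with the JDK's W3C XML Schema implementation.
    static Schema compile(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
    }

    // True if the document satisfies every structural and datatype rule.
    static boolean conforms(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds one <expense> record from three field values.
    static String expense(String date, String category, String amount) {
        return "<expense><date>" + date + "</date><category>" + category
            + "</category><amount>" + amount + "</amount></expense>";
    }

    public static void main(String[] args) {
        System.out.println(conforms(expense("2003-02-14", "meals", "42.50")));   // true
        System.out.println(conforms(expense("last Tuesday", "meals", "42.50"))); // false: not an xs:date
        System.out.println(conforms(expense("2003-02-14", "golf", "42.50")));    // false: not in the enumeration
        System.out.println(conforms(expense("2003-02-14", "meals", "-42.50")));  // false: below minInclusive
        System.out.println(conforms(expense("2003-02-14", "meals", "42.505")));  // false: three fraction digits
    }
}
```

A schema like this catches only what its author anticipated. Deciding which rules belong in it, and keeping them current as the business changes, is the maintenance burden described above.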