This year's upcoming debut of Microsoft Office 11 will mark the start of a long process of education and adaptation.
Our previous look at the Office 11 beta (see "XML for the rest of us") painted the big picture. We described how and why the pillars of Office — Word and Excel — can make use of XML. But the devil's in the details. So here we'll explore how existing Office documents can benefit from the new features, how developers will prepare XML-aware Office templates, and how users will apply them to create and analyze XML data.
Microsoft's Jean Paoli, the architect of Office 11's XML support, was co-editor of the XML 1.0 specification with Tim Bray. The first thing Paoli showed Bray was that any existing .doc file can be saved as XML, specifically as WordML, which expresses both the style and the content of the document in pure XML. "When I showed that to Tim," Paoli remarked, "he was jumping for joy." In a separate interview, Bray, an Internet search pioneer and founder of data-visualization provider Antarctica Systems, echoed that enthusiasm. Although it's true that Google can index Word, PDF, and other formats, .doc files use a binary format that is inherently opaque to general-purpose tools. WordML is a bridge from the .doc format to the world of XML and its associated technologies of transformation, indexing, and search. In Word 11, you need only Save As XML to enter that world.
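Because WordML is plain XML, ordinary XML tools can read it without any Word-specific code. The following minimal sketch, in Python, pulls the text of each paragraph out of a WordML document, the kind of step an indexer might take. It assumes the WordML namespace used by the shipping Word 2003 release (`http://schemas.microsoft.com/office/word/2003/wordml`) and its `w:p` (paragraph), `w:r` (run), and `w:t` (text) elements; the Office 11 beta may differ, and the sample is hand-written for illustration rather than saved from Word.

```python
import xml.etree.ElementTree as ET

# Assumed WordML namespace (Word 2003 release); the beta may use a different URI.
W = "http://schemas.microsoft.com/office/word/2003/wordml"

# A tiny hand-written WordML-style document (illustrative, not actual Word output).
SAMPLE = f"""<?xml version="1.0"?>
<w:wordDocument xmlns:w="{W}">
  <w:body>
    <w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Status report</w:t></w:r></w:p>
    <w:p><w:r><w:t>Shipped the </w:t></w:r><w:r><w:t>beta.</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""


def paragraph_texts(wordml: str) -> list[str]:
    """Return the text of each w:p paragraph, ignoring formatting such as w:rPr."""
    root = ET.fromstring(wordml)
    texts = []
    for para in root.iter(f"{{{W}}}p"):
        # A paragraph holds runs (w:r); the visible text lives in their w:t children.
        texts.append("".join(t.text or "" for t in para.iter(f"{{{W}}}t")))
    return texts


print(paragraph_texts(SAMPLE))
```

Note that the bold formatting (`w:b`) is simply skipped: style and content sit side by side in the markup, so a tool can take whichever it needs.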
Word 11's Save As XML feature presents a check-box labeled "Save as data only." Here, "data" means elements tagged according to an XML Schema. A preexisting .doc file, such as a status report or a book chapter, contains no such elements. If you check "Save as data only," Word warns that you'll lose your document formatting. In this case, you'll lose more than that: the output will be an empty file, because the document has no data in the XML sense. Let's conjure up some.
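One rough way to picture why "Save as data only" produces nothing for an ordinary document is to treat "data" as only those elements that belong to an attached schema's namespace, with all WordML markup discarded. The sketch below is a conceptual model under that assumption, not Word's actual algorithm. The WordML namespace is the assumed Word 2003 one, and the status-report namespace `urn:example:status-report` and its element names are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"  # assumed WordML namespace
DATA_NS = "urn:example:status-report"  # hypothetical custom schema namespace


def data_elements(wordml: str, data_ns: str = DATA_NS) -> list[tuple[str, str]]:
    """List (local name, text) for each leaf element in data_ns; WordML tags are dropped."""
    root = ET.fromstring(wordml)
    prefix = "{" + data_ns + "}"
    found = []
    for elem in root.iter():
        if not elem.tag.startswith(prefix):
            continue  # formatting and layout markup is not "data"
        # Skip container elements that wrap other data elements; report leaves only.
        if any(d is not elem and d.tag.startswith(prefix) for d in elem.iter()):
            continue
        # Gather the visible text from the WordML w:t elements nested inside.
        text = "".join(t.text or "" for t in elem.iter("{" + W + "}t"))
        found.append((elem.tag[len(prefix):], text))
    return found


# An ordinary document: paragraphs and runs, but no schema elements.
PLAIN = f"""<w:wordDocument xmlns:w="{W}"><w:body>
<w:p><w:r><w:t>Shipped the beta.</w:t></w:r></w:p>
</w:body></w:wordDocument>"""

# The same content wrapped in hypothetical schema elements.
TAGGED = f"""<w:wordDocument xmlns:w="{W}" xmlns:s="{DATA_NS}"><w:body>
<s:report><s:project><w:p><w:r><w:t>Office 11</w:t></w:r></w:p></s:project>
<s:status><w:p><w:r><w:t>Shipped the beta.</w:t></w:r></w:p></s:status></s:report>
</w:body></w:wordDocument>"""

print(data_elements(PLAIN))  # -> []
print(data_elements(TAGGED))
```

The plain document yields an empty list, mirroring the empty file Word produces; only once schema elements exist is there any data to keep.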
The example that Paoli offered began with a standard .dot file: an existing Word template, just like those you already use. To make that template a launchpad for a family of documents that store valid XML data, the first step is to acquire, or create, an XSD (XML Schema Definition) file. And that step is a doozy. As we discussed in "Modeling Biz Docs in XML," few IT professionals have experience modeling data with XML Schema's predecessor, DTD (Document Type Definition), which has been around for more than 15 years. Even fewer have XML Schema experience. Once Office 11 ships, we'll face a classic chicken-and-egg scenario. Developers can't really learn the art of modeling data in business documents without user feedback, but users can't provide that feedback until they start actually working with XML-enriched documents. Office 11's XML support isn't a final solution. Rather, it allows for a long, difficult, and absolutely vital bootstrapping process.
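To make that first step more concrete, here is a minimal sketch of what a hand-built XSD for a hypothetical status-report template might look like, embedded in Python so the standard library can list the elements it declares. The target namespace and element names are invented for illustration. Listing declarations is not validation; Python's standard library has no XML Schema validator, so checking documents against the schema would need another tool.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"  # the W3C XML Schema namespace

# Hypothetical schema for a weekly status report; namespace and names are illustrative.
STATUS_XSD = f"""<xs:schema xmlns:xs="{XS}"
           targetNamespace="urn:example:status-report"
           xmlns="urn:example:status-report"
           elementFormDefault="qualified">
  <xs:element name="report">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="project" type="xs:string"/>
        <xs:element name="weekEnding" type="xs:date"/>
        <xs:element name="status" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def declared_elements(xsd: str) -> list[str]:
    """Return the name of every xs:element declaration, in document order."""
    root = ET.fromstring(xsd)
    return [e.get("name") for e in root.iter(f"{{{XS}}}element")]


print(declared_elements(STATUS_XSD))
```

Even this toy schema forces modeling decisions (is `weekEnding` a date or free text? may a report carry several `status` entries?), which is exactly the kind of judgment that needs user feedback to get right.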