February 21, 2003

Exploring XML in Office 11

XML capabilities in store for Word and Excel pack a learning curve

The debut of Microsoft Office 11 later this year will mark the start of a long process of education and adaptation.

Our previous look at the Office 11 beta (see "XML for the rest of us") painted the big picture. We described how and why the pillars of Office — Word and Excel — can make use of XML. But the devil's in the details. So here we'll explore how existing Office documents can benefit from the new features, how developers will prepare XML-aware Office templates, and how users will apply them to create and analyze XML data.

Microsoft's Jean Paoli, the architect of Office 11's XML support, was co-editor of the XML 1.0 specification with Tim Bray. The first thing Paoli showed Bray was that any existing .doc file can be saved as XML — specifically, as WordML, which expresses both the style and the content of the document in pure XML. "When I showed that to Tim," Paoli remarked, "he was jumping for joy." In a separate interview, Bray — an Internet search pioneer and founder of data-visualization provider Antarctica Systems — said the same thing. Although it's true that Google can index Word, PDF, and other formats, .doc files are inherently opaque. WordML is a bridge from the .doc format to the world of XML and its associated technologies of transformation, indexing, and search. In Word 11, you need only Save As XML to enter that world.
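To make the "bridge" concrete, here is a minimal sketch of what an indexer could do with a document saved as WordML: pull out the plain text of each paragraph. The namespace URI and the `w:p`/`w:r`/`w:t` element names are those of the WordML format as it later shipped in Word 2003; treat them as assumptions where the beta is concerned. The sample document and the `paragraphs` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the WordML namespace used by the shipped Word 2003 format.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"

# Hypothetical, heavily trimmed WordML: two paragraphs, one with bold formatting.
SAMPLE = f"""<w:wordDocument xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Status report</w:t></w:r></w:p>
    <w:p><w:r><w:t>All milestones </w:t></w:r><w:r><w:t>on track.</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""


def paragraphs(wordml):
    """Return the text of each w:p paragraph, ignoring all formatting markup."""
    root = ET.fromstring(wordml)
    result = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread across runs (w:r); join every w:t inside it.
        result.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return result


print(paragraphs(SAMPLE))  # ['Status report', 'All milestones on track.']
```

Nothing here depends on Word itself. Any XML parser can read the file, which is the point of the format.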

Word 11's Save As XML feature presents a check box labeled "Save as data only." Here, data means tagged elements belonging to an XML Schema. A preexisting .doc file (a status report, say, or a book chapter) contains no such elements. If you check "Save as data only," Word warns that you'll lose your document formatting. In this case, you'll lose more than that: the output will be an empty file, because the document has no data in the XML sense. Let's conjure up some.
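The effect of "Save as data only" can be approximated in a few lines. The sketch below is not Word's implementation; it assumes that schema-tagged elements sit inside the WordML markup in their own namespace (here a hypothetical `urn:example:status`), keeps only those elements and their text, and returns nothing when the document was never tagged.

```python
import xml.etree.ElementTree as ET

# Assumption: the WordML namespace used by the shipped Word 2003 format.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"

# Hypothetical tagged document: elements from urn:example:status wrap the content.
TAGGED = f"""<w:wordDocument xmlns:w="{W_NS}" xmlns:s="urn:example:status">
  <w:body>
    <s:report>
      <w:p><s:author><w:r><w:t>Pat</w:t></w:r></s:author></w:p>
      <w:p><s:summary><w:r><w:t>On track</w:t></w:r></s:summary></w:p>
    </s:report>
  </w:body>
</w:wordDocument>"""

# Hypothetical untagged document: formatting and text, but no schema elements.
UNTAGGED = f"""<w:wordDocument xmlns:w="{W_NS}">
  <w:body><w:p><w:r><w:t>Just prose.</w:t></w:r></w:p></w:body>
</w:wordDocument>"""


def is_custom(el):
    """True for any element outside the WordML namespace."""
    return not el.tag.startswith("{" + W_NS + "}")


def custom_children(el):
    """Nearest custom descendants of el, looking through WordML wrappers."""
    found = []
    for child in el:
        if is_custom(child):
            found.append(child)
        else:
            found.extend(custom_children(child))
    return found


def convert(el):
    """Copy a custom element, keeping nested custom elements or, failing that, its text."""
    out = ET.Element(el.tag)
    children = custom_children(el)
    if children:
        for child in children:
            out.append(convert(child))
    else:
        out.text = "".join(t.text or "" for t in el.iter(f"{{{W_NS}}}t"))
    return out


def data_only(wordml):
    """Return the first schema-tagged tree stripped of WordML, or None if there is none."""
    tops = custom_children(ET.fromstring(wordml))
    return convert(tops[0]) if tops else None


report = data_only(TAGGED)
print([(child.tag, child.text) for child in report])
print(data_only(UNTAGGED))  # None: no data in the XML sense
```

The second call mirrors the untagged .doc case: with no schema elements in the document, there is nothing left to save.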

The example that Paoli offered began with a standard .dot file — that is, an existing Word template, just like those you already use. To make that template a launchpad for a family of documents that store valid XML data, the first step is to acquire, or create, an XSD (XML Schema Definition) file. And that step is a doozy. As we discussed in "Modeling Biz Docs in XML," few IT professionals have experience modeling data with XML Schema's predecessor, DTD (Document Type Definition), which has been around for more than 15 years. Even fewer have XML Schema experience. After Office 11 ships, we face a classic chicken-and-egg scenario. Developers can't really learn the art of modeling data in business documents without user feedback. But users can't provide that feedback until they start actually working with XML-enriched documents. Office 11's XML support isn't a final solution. Rather, it allows for a long, difficult, and absolutely vital bootstrapping process.
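For readers who have never seen one, here is roughly what a small XSD for a status report might look like, embedded in a short Python script that lists the elements it declares. The schema is invented for illustration (the `urn:example:status` namespace and the `report`, `author`, `period`, and `summary` names are all assumptions, not anything Paoli showed). Python's standard library can parse an XSD as ordinary XML but cannot validate documents against it, so the script only inspects the schema.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"  # the W3C XML Schema namespace

# Hypothetical schema for a status report; every name here is illustrative.
STATUS_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:status"
           xmlns="urn:example:status"
           elementFormDefault="qualified">
  <xs:element name="report">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="author" type="xs:string"/>
        <xs:element name="period" type="xs:date"/>
        <xs:element name="summary" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def declared_elements(xsd):
    """List (name, type) for every xs:element declaration, in document order."""
    root = ET.fromstring(xsd)
    return [
        (el.get("name"), el.get("type", "(complex)"))
        for el in root.iter(f"{{{XS}}}element")
    ]


for name, type_name in declared_elements(STATUS_XSD):
    print(f"{name}: {type_name}")
```

Even this toy schema forces modeling decisions (is the reporting period one date or a range? is the summary free text?), which is the kind of question developers will need user feedback to answer.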

©1994-2009 Infoworld, Inc.