JEAN PAOLI, XML architect at Microsoft, is a man on a mission. A former developer of SGML tools, he joined Microsoft in
1996 and co-edited the first XML specification in 1998. All along, he has dreamed of building software that would make it
easy for ordinary folks to create, edit, and analyze structured and semistructured data. Now, finally, his vision is coming
into focus.

Microsoft Office 11 and XML
Executive Summary: In Office 11, Word and Excel can display, edit, and save XML documents. Using XML Schema definitions bound to these documents,
enterprise architects can for the first time ensure that users of common desktop applications will create and maintain high-quality,
integration-ready data.
Test Center Perspective: In a dramatic breakthrough, Office 11's XML features target end-users with no knowledge of XML. Users of Word and Excel will
be most productive when supported by developers who can fluently define data models, using XML Schema, and write XML transformations,
using XSLT.
The first public beta of Microsoft Office 11 demonstrates, as promised, that XML has become a native Office file format.
What's more, Word 11 and Excel 11 can associate documents with data definitions written in XML Schema, and they can interactively
validate documents against schemas. These are transforming achievements. Previous Office upgrades have been yawners, but version
11 should rivet the attention of IT planners.
We've known for many years that most of our vital information lives in documents, not databases. XML was supposed to help
us capture the implicit structure of ordinary business documents (memos, expense reports) and make it explicit. Sets of such
documents would then form a kind of virtual database. The cost to search, correlate, and recombine the XML-ized data would
fall dramatically, and its value would soar. It was a great idea, but until the tools used to create memos and expense reports
became deeply XML-aware, it was stillborn. XML did, of course, thrive in another and equally important way. It became the
exchange format of enterprise databases and the lingua franca of Web services. Now Office 11 wants to erase the differences
between XML documents written and read by people using desktop applications, and XML documents produced and consumed by databases
and Web services. This is a really big deal.
The first beta of Office 11 doesn't include any demonstrations of the new XML features, but the Office team put together
some examples for us, and Jean Paoli talked us through them. We started with a résumé template written in Word 11. Today we
use such templates mainly to control the appearance of documents. If we also want to control their content, we can ask developers
to write macros that enforce business rules. In principle, a company could publish a résumé template that would, for example,
require job seekers to describe past experience in terms of a controlled vocabulary. In practice, that rarely happens. Procedural
code to enforce such constraints is hard to write and even harder to reuse. With Word 11, you can attack this problem by
defining a schema and mapping its elements to a résumé template.
In the résumé example, we associated a schema with a sample résumé, using the Templates and Add-ins dialog. A new task pane
called XML Structure then appeared, displaying a single root element named Résumé. We selected it, and chose the option Apply
to Whole Document. Now subelements named Objective, Experience, and Education appeared in the task pane. Mapping these to
regions of the sample résumé revealed deeper structure until the entire schema was finally mapped.
Another example illustrated the same scenario for Excel. Here, the fields defining an expense report were captured in a
schema, then mapped to an expense report. Once we saw how it worked, we were able to apply the same concept to our existing
InfoWorld spreadsheet. After writing a simple schema, we dragged elements from the XML Structure pane onto the spreadsheet
to bind named schema elements to numbered cells.
Office 11 doesn't help you write your schemas. That is both a science and an art, and something that few outside the XML
development community have attempted. But once you have a schema, no programming skill is needed to bind it to a document
or to enforce the constraints expressed by the schema. In the résumé example, those constraints were trivial: A user of the
document who typed nondigits into the YearFrom or YearTo elements would be alerted and could not save the document until these
elements were written as the integers required by the schema. But this humble example has profound implications. Consider
the InfoWorld story shown in the screen shot. It's written in Word but backed by a schema that enumerates the set of allowable
author names, limits the length of headlines and of the main story, and disallows Greek symbols. The story as shown violates
two of those constraints: It includes a Greek letter, and the author's name is misspelled and so fails to match the enumerated set
of allowed names. Word 11 reports the infractions as they occur and stops complaining as soon as they are corrected.
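
To make those constraints concrete, here is a minimal sketch of the same kinds of checks, written by hand in Python rather than expressed in XML Schema as Word 11 requires. Everything specific in it is an assumption made for illustration: the element names (Story, Author, Headline, Body), the list of allowed authors, and the length limits do not come from the actual InfoWorld schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical values; the real schema's names and limits are not given here.
ALLOWED_AUTHORS = {"A. Writer", "B. Editor"}
MAX_HEADLINE_CHARS = 60
MAX_BODY_CHARS = 5000


def contains_greek(text):
    """True if any character falls in the basic Greek block (U+0370 to U+03FF)."""
    return any("\u0370" <= ch <= "\u03ff" for ch in text)


def validate_story(xml_text):
    """Return a list of human-readable constraint violations (empty if valid)."""
    root = ET.fromstring(xml_text)
    author = root.findtext("Author", default="").strip()
    headline = root.findtext("Headline", default="")
    body = root.findtext("Body", default="")

    problems = []
    if author not in ALLOWED_AUTHORS:
        problems.append(f"Author {author!r} is not in the allowed list")
    if len(headline) > MAX_HEADLINE_CHARS:
        problems.append(f"Headline exceeds {MAX_HEADLINE_CHARS} characters")
    if len(body) > MAX_BODY_CHARS:
        problems.append(f"Body exceeds {MAX_BODY_CHARS} characters")
    if contains_greek(headline) or contains_greek(body):
        problems.append("Greek characters are not allowed")
    return problems


# A story with a misspelled author name and a Greek letter in the body.
SAMPLE = """<Story>
  <Author>A. Writter</Author>
  <Headline>Office 11 and XML</Headline>
  <Body>The \u03b1 release of the story is ready.</Body>
</Story>"""

for problem in validate_story(SAMPLE):
    print(problem)
```

The point of the schema-based approach is that none of this procedural code has to be written: the enumeration, the length limits, and the character restriction are declared once in the schema, and the application enforces them.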
Once valid, the document can be saved as XML in two ways. The default is to create WordML, which preserves Word's styles
and formatting in an XML namespace that's separate from the one bound to the schema-controlled data. You can optionally save
through an XSLT transformation which, in a publish-to-the-Web scenario, could translate WordML formatting into HTML/CSS formatting.
Alternatively, if you tick the Save as Data option, you can instead save just the raw XML data. In that case, you can bind
one or more XSLT stylesheets to the document, each of which can generate WordML styles and formatting.
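
The separation of formatting and data into distinct namespaces is what makes a data-only save possible. The sketch below illustrates the idea in Python; it is not WordML and not how Word 11 is implemented. Both namespace URIs and the formatting elements (doc, para, run) are hypothetical stand-ins, the data elements borrow names from the résumé example (with Resume as an ASCII stand-in for the root), and mixed content is handled only crudely.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, chosen for illustration only.
FMT_NS = "urn:example:formatting"
DATA_NS = "urn:example:resume"

# A document in which formatting wrappers and schema-bound data interleave.
MIXED = f"""<f:doc xmlns:f="{FMT_NS}" xmlns:d="{DATA_NS}">
  <f:para style="Heading1">
    <d:Resume>
      <f:para style="Normal"><d:Objective><f:run bold="true">Editor</f:run></d:Objective></f:para>
      <f:para style="Normal"><d:YearFrom><f:run>1996</f:run></d:YearFrom></f:para>
    </d:Resume>
  </f:para>
</f:doc>"""


def copy_data(src, dest):
    """Copy data-namespace descendants of src under dest.

    Formatting wrappers are dropped; any text they carry is appended
    to the nearest enclosing data element.
    """
    if src.text and src.text.strip():
        dest.text = (dest.text or "") + src.text.strip()
    for child in src:
        if child.tag.startswith("{" + DATA_NS + "}"):
            copy_data(child, ET.SubElement(dest, child.tag))
        else:
            copy_data(child, dest)


def save_as_data(mixed_xml):
    """Return the data-only tree, assuming a single data root element."""
    holder = ET.Element("holder")
    copy_data(ET.fromstring(mixed_xml), holder)
    return holder[0]


data = save_as_data(MIXED)
ET.register_namespace("d", DATA_NS)
print(ET.tostring(data, encoding="unicode"))
```

Going the other way, from raw data back to a formatted document, is the job the article assigns to XSLT stylesheets bound to the document.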
The XML expertise needed to create schemas and XSLT transformations is scarce today. Once Office 11 hits the streets, its
mainstream applications could arguably commoditize those XML skills more quickly and broadly than have Web services technologies.
What's more, Office is positioned as a bridge between the worlds of desktop applications and Web services. In the emerging
architecture of the business Web, XML-wrapped remote procedure calls are giving way to XML documents. SOAP, we'll soon see,
isn't just a way for services to talk to one another. A purchase order acquired from a Web service by means of a SOAP call
will sometimes need to be modified by a person. The application used to edit that purchase order will have to be a familiar
tool. It will also have to guarantee that the document it passes along contains well-structured, valid, and thus enterprise-ready
data.
Office 11 appears to meet both of these requirements. And it does so in ways that respect the inherent strengths of the
applications. Displayed in Word, an electronic purchase order can reflect its paper-based legacy by exploiting Word's formatting
power. Instances of that same document, brought into Excel, can feed the analytical functions that are Excel's specialty.
When XML data has a regular structure that maps naturally to a grid, Excel 11 can make that data immediately available for
columnwise sorting, charts, and pivot tables. Here, in fact, is a case where Microsoft has put XSLT's basic XML-shredding
capability into the hands of a nonprogrammer. Absent a schema, Excel 11 can still infer structure from raw XML data. When
we pointed it at an XML data dump taken from a back-office system, it automatically proposed a structure. We were then able
to populate a spreadsheet template with selected elements, reorder them at will, and define a mapped region into which a subset
of our data could be imported. We previously had to write XPath expressions to target elements and XSLT code to rearrange
them. Excel 11 makes that an interactive task that any user can perform.
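
For a sense of the work Excel 11 takes over, here is a rough Python sketch of the same steps done by hand: guess the columns from regularly structured XML, pick and reorder a subset of them, and sort the resulting grid by one column. The data dump and its element names (Orders, Order, Id, Customer, Total) are invented for illustration, and the inference rule is a naive assumption of ours, not a description of how Excel 11 actually proposes a structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical back-office dump with a regular, grid-like structure.
DUMP = """<Orders>
  <Order><Id>3</Id><Customer>Acme</Customer><Total>250.00</Total></Order>
  <Order><Id>1</Id><Customer>Zenith</Customer><Total>75.50</Total></Order>
  <Order><Id>2</Id><Customer>Acme</Customer><Total>120.25</Total></Order>
</Orders>"""


def infer_columns(root):
    """Collect the field names used by the repeating rows, in first-seen order."""
    columns = []
    for row in root:
        for field in row:
            if field.tag not in columns:
                columns.append(field.tag)
    return columns


def to_grid(root, columns):
    """Build one list per row, holding the text of each requested column."""
    return [[row.findtext(col, default="") for col in columns] for row in root]


root = ET.fromstring(DUMP)
columns = infer_columns(root)

# Select a subset of the inferred columns, reorder them, and sort by Total.
grid = to_grid(root, ["Customer", "Total"])
grid.sort(key=lambda r: float(r[1]))

print(columns)
for r in grid:
    print(r)
```

Each call to findtext here plays the role of a small path expression targeting one element, which is the part a spreadsheet user now does by dragging elements onto cells.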
Jean Paoli is wildly enthusiastic about what all this will mean. We share his excitement. Empowering ordinary users to create
and interact with XML data is a huge step forward. It's too bad that Outlook hasn't been given the same treatment as Word
and Excel. Most of us do a lot more communicating than document processing or number crunching. We'd like to see e-mail become
a natively structured and manageable data type, too. Meanwhile, we'll have our hands full just exploring the new vistas opened
up by the XML features of the new versions of Word and Excel.