InfoWorld
XML for the rest of us

Microsoft Office 11's XML capabilities contain the seeds of a revolution in enterprise content management

By Jon Udell  
November 15, 2002
 

JEAN PAOLI, XML architect at Microsoft, is a man on a mission. A former developer of SGML tools, he joined Microsoft in 1996 and co-edited the first XML specification in 1998. All along, he has dreamed of building software that would make it easy for ordinary folks to create, edit, and analyze structured and semistructured data. Now, finally, his vision is coming into focus.


Microsoft Office 11 and XML


Executive Summary:
In Office 11, Word and Excel can display, edit, and save XML documents. Using XML Schema definitions bound to these documents, enterprise architects can for the first time ensure that users of common desktop applications will create and maintain high-quality, integration-ready data.

Test Center Perspective:
In a dramatic breakthrough, Office 11's XML features target end-users with no knowledge of XML. Users of Word and Excel will be most productive when supported by developers who can fluently define data models, using XML Schema, and write XML transformations, using XSLT.

The first public beta of Microsoft Office 11 demonstrates, as promised, that XML has become a native Office file format. What's more, Word 11 and Excel 11 can associate documents with data definitions written in XML Schema, and they can interactively validate documents against schemas. These are transforming achievements. Previous Office upgrades have been yawners, but version 11 should rivet the attention of IT planners.

We've known for many years that most of our vital information lives in documents, not databases. XML was supposed to help us capture the implicit structure of ordinary business documents (memos, expense reports) and make it explicit. Sets of such documents would then form a kind of virtual database. The cost to search, correlate, and recombine the XML-ized data would fall dramatically, and its value would soar. It was a great idea, but until the tools used to create memos and expense reports became deeply XML-aware, it was stillborn. XML did, of course, thrive in another and equally important way. It became the exchange format of enterprise databases and the lingua franca of Web services. Now Office 11 wants to erase the differences between XML documents written and read by people using desktop applications, and XML documents produced and consumed by databases and Web services. This is a really big deal.

The first beta of Office 11 doesn't include any demonstrations of the new XML features, but the Office team put together some examples for us, and Jean Paoli talked us through them. We started with a résumé template written in Word 11. Today we use such templates mainly to control the appearance of documents. If we also want to control their content, we can ask developers to write macros that enforce business rules. In principle, a company could publish a résumé template that would, for example, require job seekers to describe past experience in terms of a controlled vocabulary. In practice, that rarely happens. Procedural code to enforce such constraints is hard to write and even harder to reuse. With Word 11, you can attack this problem by defining a schema and mapping its elements to a résumé template.

In the résumé example, we associated a schema with a sample résumé, using the Templates and Add-ins dialog. A new task pane called XML Structure then appeared, displaying a single root element named Résumé. We selected it, and chose the option Apply to Whole Document. Now subelements named Objective, Experience, and Education appeared in the task pane. Mapping these to regions of the sample résumé revealed deeper structure until the entire schema was finally mapped.

Another example illustrated the same scenario for Excel. Here, the fields defining an expense report were captured in a schema, then mapped to an expense report. Once we saw how it worked, we were able to apply the same concept to our existing InfoWorld spreadsheet. After writing a simple schema, we dragged elements from the XML Structure pane onto the spreadsheet to bind named schema elements to numbered cells.

Office 11 doesn't help you write your schemas. That is both a science and an art, and something that few outside the XML development community have attempted. But once you have a schema, no programming skill is needed to bind it to a document or to enforce the constraints expressed by the schema. In the résumé example, those constraints were trivial: A user of the document who typed nondigits into the YearFrom or YearTo elements would be alerted and could not save the document until these elements were written as the integers required by the schema. But this humble example has profound implications. Consider the InfoWorld story shown in the screen shot. It's written in Word but backed by a schema that enumerates the set of allowable author names, limits the length of headlines and of the main story, and disallows Greek symbols. The story as shown violates two of those constraints: It includes a Greek letter, and the author's name, misspelled, fails to match the enumerated set of allowed names. Word 11 reports the infractions as they occur and stops complaining as soon as they are corrected.
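To make the story example concrete, here is a minimal sketch in Python of the kinds of checks such a schema expresses: an enumerated author list, a headline length limit, and a ban on Greek letters. It is not XML Schema and not how Word 11 performs validation. The element names (Story, Author, Headline, Body), the author list, and the length limit are all assumptions made up for illustration, since the article does not give the real ones.

```python
import xml.etree.ElementTree as ET

# Hypothetical constraints; the real schema's values are not given in the article.
ALLOWED_AUTHORS = {"Jon Udell"}
MAX_HEADLINE_CHARS = 60


def has_greek(text):
    """Return True if text contains a character from the Greek and Coptic block."""
    return any("\u0370" <= ch <= "\u03ff" for ch in text)


def validate_story(xml_text):
    """Return a list of constraint violations; an empty list means the story is valid."""
    root = ET.fromstring(xml_text)
    problems = []

    # Enumeration constraint: the author must be one of a fixed set of names.
    author = (root.findtext("Author") or "").strip()
    if author not in ALLOWED_AUTHORS:
        problems.append(f"Author {author!r} is not in the allowed set")

    # Length constraint: the headline may not exceed a maximum length.
    headline = (root.findtext("Headline") or "").strip()
    if len(headline) > MAX_HEADLINE_CHARS:
        problems.append(f"Headline exceeds {MAX_HEADLINE_CHARS} characters")

    # Character constraint: no Greek letters anywhere in the document text.
    if has_greek("".join(root.itertext())):
        problems.append("Document contains a Greek letter")

    return problems


# A sample story with a misspelled author name and a Greek alpha in the body.
STORY = """<Story>
  <Author>Jon Udel</Author>
  <Headline>XML for the rest of us</Headline>
  <Body>Office 11 is the \u03b1 release of a big idea.</Body>
</Story>"""

for problem in validate_story(STORY):
    print(problem)
```

The sample story breaks the same two rules as the one in the screen shot, so two problems are reported. Correct the author's name and remove the Greek letter, and the function returns an empty list. The point of a declarative schema is that none of this procedural code has to be written or maintained by hand.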

Once valid, the document can be saved as XML in two ways. The default is to create WordML, which preserves Word's styles and formatting in an XML namespace that's separate from the one bound to the schema-controlled data. You can optionally save through an XSLT transformation which, in a publish-to-the-Web scenario, could translate WordML formatting into HTML/CSS formatting. Alternatively, if you tick the Save as Data option, you can instead save just the raw XML data. In that case, you can bind one or more XSLT stylesheets to the document, each of which can generate WordML styles and formatting.
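The idea of keeping formatting and data in separate namespaces can be sketched in a few lines of Python. The sample below is not WordML, and the code is not how Word 11 implements Save as Data. Both namespace URIs, the formatting element names, and the function names are invented for illustration. The sketch only shows why the separation matters: if data elements live in their own namespace, everything else can be stripped away mechanically.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs are made up for this sketch.
FMT_NS = "urn:example:formatting"
DATA_NS = "urn:example:resume"

# A mixed document: formatting markup interleaved with schema-bound data elements.
MIXED = f"""<fmt:document xmlns:fmt="{FMT_NS}" xmlns:r="{DATA_NS}">
  <r:Resume>
    <fmt:para style="Heading1"><r:Objective><fmt:run bold="true">Lead analyst</fmt:run></r:Objective></fmt:para>
    <fmt:para><r:YearFrom><fmt:run>1999</fmt:run></r:YearFrom></fmt:para>
  </r:Resume>
</fmt:document>"""


def data_only(elem):
    """Return the data-namespace elements at or under elem, with formatting removed."""
    kept_children = []
    for child in elem:
        kept_children.extend(data_only(child))

    if elem.tag.startswith("{" + DATA_NS + "}"):
        # Rebuild the data element under its local name (namespace dropped for brevity).
        clean = ET.Element(elem.tag.split("}", 1)[1])
        if kept_children:
            clean.extend(kept_children)
        else:
            # A leaf data element keeps the text found inside its formatting wrappers.
            clean.text = "".join(elem.itertext()).strip()
        return [clean]

    # A formatting element disappears; any data elements inside it move up a level.
    return kept_children


def save_as_data(xml_text):
    """Serialize only the data portion of a mixed formatting-plus-data document."""
    results = data_only(ET.fromstring(xml_text))
    return ET.tostring(results[0], encoding="unicode")


print(save_as_data(MIXED))
```

Going the other way, from raw data back to a formatted document, is the job the article assigns to XSLT stylesheets bound to the document.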

The XML expertise needed to create schemas and XSLT transformations is scarce today. Once Office 11 hits the streets, its mainstream applications could arguably commoditize those XML skills more quickly and broadly than have Web services technologies. What's more, Office is positioned as a bridge between the worlds of desktop applications and Web services. In the emerging architecture of the business Web, XML-wrapped remote procedure calls are giving way to XML documents. SOAP, we'll soon see, isn't just a way for services to talk to one another. A purchase order acquired from a Web service by means of a SOAP call will sometimes need to be modified by a person. The application used to edit that purchase order will have to be a familiar tool. It will also have to guarantee that the document it passes along contains well-structured, valid, and thus enterprise-ready data.

Office 11 appears to meet both of these requirements. And it does so in ways that respect the inherent strengths of the applications. Displayed in Word, an electronic purchase order can reflect its paper-based legacy by exploiting Word's formatting power. Instances of that same document, brought into Excel, can feed the analytical functions that are Excel's specialty. When XML data has a regular structure that maps naturally to a grid, Excel 11 can make that data immediately available for columnwise sorting, charts, and pivot tables. Here, in fact, is a case where Microsoft has put XSLT's basic XML-shredding capability into the hands of a nonprogrammer. Absent a schema, Excel 11 can still infer structure from raw XML data. When we pointed it at an XML data dump taken from a back-office system, it automatically proposed a structure. We were then able to populate a spreadsheet template with selected elements, reorder them at will, and define a mapped region into which a subset of our data could be imported. We previously had to write XPath expressions to target elements and XSLT code to rearrange them. Excel 11 makes that an interactive task that any user can perform.
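For a sense of what Excel 11 automates here, the following Python sketch does by hand what we previously did with XPath and XSLT: it infers columns from a regularly structured XML dump, flattens the records into rows, and then sorts and selects them. The sample data and every element name in it (Expenses, Item, Date, Category, Amount, Note) are invented for illustration, and the inference rule is a deliberately simple assumption. It says nothing about how Excel 11 actually infers structure.

```python
import xml.etree.ElementTree as ET

# A made-up data dump with a regular, grid-like structure.
DUMP = """<Expenses>
  <Item><Date>2002-11-04</Date><Category>Travel</Category><Amount>412.50</Amount></Item>
  <Item><Date>2002-11-02</Date><Category>Meals</Category><Amount>38.20</Amount></Item>
  <Item><Date>2002-11-03</Date><Category>Travel</Category><Amount>96.00</Amount><Note>Taxi</Note></Item>
</Expenses>"""


def infer_columns(root):
    """Treat each child of the root as a record; columns are field tags in first-seen order."""
    columns = []
    for record in root:
        for field in record:
            if field.tag not in columns:
                columns.append(field.tag)
    return columns


def to_rows(root, columns):
    """Build one dict per record; fields absent from a record become empty strings."""
    return [{col: (record.findtext(col) or "") for col in columns} for record in root]


root = ET.fromstring(DUMP)
columns = infer_columns(root)
rows = to_rows(root, columns)

# Columnwise sort, then pick and reorder a subset of the columns.
by_amount = sorted(rows, key=lambda row: float(row["Amount"]))
selected = [(row["Category"], row["Amount"]) for row in by_amount]

print(columns)
for pair in selected:
    print(pair)
```

Even this small script assumes its user can read a tree structure and write a sort key. Dragging elements onto a spreadsheet asks for neither.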

Jean Paoli is wildly enthusiastic about what all this will mean. We share his excitement. Empowering ordinary users to create and interact with XML data is a huge step forward. It's too bad that Outlook hasn't been given the same treatment as Word and Excel. Most of us do a lot more communicating than document processing or number crunching. We'd like to see e-mail become a natively structured and manageable data type, too. Meanwhile, we'll have our hands full just exploring the new vistas opened up by the XML features of the new versions of Word and Excel.





 


 
Jon Udell is lead analyst and blogger in chief at the InfoWorld Test Center.





 
