Ten things to know about XDocs

InfoPath a huge step in the right direction

Jean Paoli, the architect of Microsoft Office's XML capabilities, recently spent several hours showing me Microsoft's newest Office family member, InfoPath (formerly XDocs, originally NetDocs). Here are 10 things you should know about this revolutionary piece of software.

1. You use it to gather and view semi-structured information.

The most obvious example of such data-gathering is the business form. While acknowledging the marketing need to brand InfoPath as a forms application, Paoli insists -- rightly -- that there is more to the story. To the user, InfoPath is a general-purpose viewer and editor of business information. To the developer, it's a power tool for building applications that view, edit, and transform XML data.

2. Users create and maintain high-quality data.

Like Word 11 and Excel 11, InfoPath can bind an XML Schema to a document, interactively validate the document against the schema, and prevent the user from saving the document in an invalid state. In addition to schema constraints, you can attach extra validation rules. For this purpose, the InfoPath design mode includes an XPath-aware expression builder.
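To make the idea of rules layered on top of a schema concrete, here is a minimal sketch using Python's standard library. The element names (expenseReport, item, amount, total) and both rules are hypothetical, and the path expressions use the limited XPath subset of xml.etree.ElementTree. It illustrates the concept only and is not how InfoPath itself evaluates rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element names are invented for illustration.
REPORT = """
<expenseReport>
  <item><description>Taxi</description><amount>23.50</amount></item>
  <item><description>Hotel</description><amount>-180.00</amount></item>
  <total>203.50</total>
</expenseReport>
"""


def check_rules(xml_text):
    """Return a list of human-readable rule violations (empty if none)."""
    root = ET.fromstring(xml_text)
    errors = []
    amounts = [float(node.text) for node in root.findall("./item/amount")]

    # Rule 1 (assumed): every line item must have a positive amount.
    for position, amount in enumerate(amounts, start=1):
        if amount <= 0:
            errors.append(f"item {position}: amount must be positive")

    # Rule 2 (assumed): the stated total must equal the sum of the items.
    total = float(root.findtext("./total"))
    if abs(total - sum(amounts)) > 0.005:
        errors.append("total does not match the sum of the items")

    return errors


errors = check_rules(REPORT)
for message in errors:
    print(message)
```

An editor built this way could refuse to save while the list of violations is non-empty, which is the behavior described above.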

3. It is aggressively standards-based.

Word 11 can save formatted documents in an XML format called WordML, or it can save schematized data without formatting as generic XML. Although these two modes are both standard in their use of XML, they are nevertheless quite distinct from one another. In the latter case you can use XSLT to apply WordML styling to a core of pure structured data, but doing so is optional.

With InfoPath, XSLT isn't an option. The document's core of structured data is always expressed through one or more views, and those views are XSLT transformations. InfoPath's more unified model does not derive from Word, but rather from Paoli's former project, Internet Explorer. In an InfoPath document, formatted text is expressed as XHTML (the schema for which must be bound to the document), and all styling is accomplished by means of standard CSS (Cascading Style Sheets).

InfoPath also provides a DOM (Document Object Model) accessible to scripting languages such as VBScript and JavaScript.
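The flavor of DOM scripting can be sketched outside InfoPath as well. The example below uses Python's xml.dom.minidom rather than InfoPath's own object model, and the timesheet document and the helper names get_field and set_field are hypothetical. It shows only the general pattern of reading a field, computing a new value, and writing it back.

```python
from xml.dom import minidom

# Hypothetical form data; the element names are invented for illustration.
doc = minidom.parseString(
    "<timesheet><employee>Ann</employee><hours>38</hours></timesheet>"
)


def get_field(document, name):
    """Return the text of the first element with the given tag name."""
    element = document.getElementsByTagName(name)[0]
    return element.firstChild.data


def set_field(document, name, value):
    """Replace the text of the first element with the given tag name."""
    element = document.getElementsByTagName(name)[0]
    element.firstChild.data = str(value)


# Read a field, adjust it, and write it back through the DOM.
set_field(doc, "hours", int(get_field(doc, "hours")) + 2)
updated_xml = doc.documentElement.toxml()
print(updated_xml)
```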

4. It connects people to business processes.

As SOAP packets wend their way through business workflows, people need to open them up, look at them, think about them, interact with them, and inject them back into the workflows. InfoPath aims to support that interaction in a way that's completely natural for the user, but sacrifices none of the fidelity of the XML data on which Web services depend.

5. It embraces both centralized and peer-to-peer workflow.

In a server-centric model, you might fetch an InfoPath document from a server, add to or modify the data, and then post it back to the server. But since the document is a fully self-contained package of data, schema, and layout, you could as easily e-mail your changed version to a colleague for review and for injection back into the workflow. This fluid blend of centralized and peer-to-peer styles nicely accommodates business reality.

6. You can use it online or offline.

When you're online, you can request an InfoPath document from a Web server and post changes back to that server. When offline, your InfoPath document (i.e., a package of XML data, schema, and layout information) lives in a .CAB (Cabinet) file that you can use locally.

7. It helps you visualize your XML data.

Paoli worries that as we start to produce more and more XML data, there's a danger that we'll lose sight of it. One pile of angle-bracketed text looks pretty much like another. He notes, however, that the same XSLT stylesheets used to view data in InfoPath can also be used elsewhere. For example, when InfoPath-created XML data accumulates on a server, a process running on that server -- using only off-the-shelf XML parsing and XSLT transformation -- can render views of the data.
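The separation of data from views can be sketched in a few lines. Python's standard library has no XSLT processor, so the two render functions below are plain Python stand-ins for two stylesheets. The orders document and the function names are hypothetical. The point is only that one untouched XML source can feed several renderings on a server.

```python
import xml.etree.ElementTree as ET

# Hypothetical data accumulated on a server; names are invented.
ORDERS = """
<orders>
  <order><customer>Acme</customer><amount>120</amount></order>
  <order><customer>Globex</customer><amount>75</amount></order>
</orders>
"""


def render_table(xml_text):
    """View 1: render each <order> as a row of an HTML table."""
    root = ET.fromstring(xml_text)
    table = ET.Element("table")
    for order in root.findall("order"):
        row = ET.SubElement(table, "tr")
        for field in ("customer", "amount"):
            cell = ET.SubElement(row, "td")
            cell.text = order.findtext(field)
    return ET.tostring(table, encoding="unicode")


def render_summary(xml_text):
    """View 2: render a one-line summary of the same data."""
    root = ET.fromstring(xml_text)
    orders = root.findall("order")
    total = sum(int(order.findtext("amount")) for order in orders)
    paragraph = ET.Element("p")
    paragraph.text = f"{len(orders)} orders totalling {total}"
    return ET.tostring(paragraph, encoding="unicode")


table_html = render_table(ORDERS)
summary_html = render_summary(ORDERS)
print(table_html)
print(summary_html)
```

With a real XSLT processor the two functions would be replaced by two stylesheets applied to the same input.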

8. It breaks the XSLT bottleneck.

Even if you've drunk the XSLT Kool-Aid and know how powerful a language it is, you'll probably admit that XSLT programming is no walk in the park. The InfoPath designer can, crucially, generate the XSLT code needed to map between complex XML data and useful views of that data. Like all visual tools it has limits, which you can escape by defining regions within the generated code for handwritten extensions. That said, the designer works very hard to make intelligent mappings between XML structures and user-interface controls. Such automation should help prevent XML transformation from becoming the IT bottleneck of the Web services era.

9. Users and IT can jointly prototype new data structures.

In the real world, people don't start with formal schema design tools. They write some XML, look at it, think about it, pass it around, talk about it, and incrementally refine it. InfoPath does not aspire to be a schema design tool. But it will be a great environment for informal prototyping. If you start with some raw XML data that is well-formed but lacks a schema, InfoPath will generate one. You can even start from a blank slate. Either way, you can create views, test drive them by gathering real data, and use what you learn to refine your understanding of what data to collect and how to structure it.
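Inferring structure from schema-less XML can be illustrated with a deliberately crude sketch. The code below walks a well-formed sample and records, for each element, which children appear and whether any of them repeat. The sample document and the function name infer_structure are hypothetical, and the result is a simple dictionary rather than a real XML Schema. It says nothing about how InfoPath actually generates its schemas.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical raw XML with no schema attached.
SAMPLE = """
<contacts>
  <contact><name>Ann</name><phone>555-0100</phone></contact>
  <contact><name>Raj</name><phone>555-0101</phone><phone>555-0102</phone></contact>
</contacts>
"""


def infer_structure(xml_text):
    """Map each parent element name to {child name: child may repeat}."""
    root = ET.fromstring(xml_text)
    structure = defaultdict(dict)
    for parent in root.iter():
        # Count how often each child tag occurs under this one parent.
        counts = defaultdict(int)
        for child in parent:
            counts[child.tag] += 1
        # A child is "repeating" if any parent instance holds it twice.
        for tag, count in counts.items():
            seen_repeating = structure[parent.tag].get(tag, False)
            structure[parent.tag][tag] = seen_repeating or count > 1
    return dict(structure)


structure = infer_structure(SAMPLE)
for parent, children in structure.items():
    print(parent, children)
```

Feeding in more sample documents and merging the results is one informal way to refine such a draft structure before committing to a formal schema.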

10. It represents a paradigm shift.

I try to avoid the "P" phrase. But there is no other way to describe InfoPath. At the dawn of the .Net era, Bill Gates introduced the notion of a "universal canvas" -- a viewing and editing surface for anything that can be represented in XML. InfoPath isn't that yet, but it's a huge step in the right direction.