Next-generation e-forms

Paper will never die. Instead, it’s going digital and providing a better, XML-enabled way to enter critical data

The transition from paper to electronic forms seems like a no-brainer. Who wouldn’t want to abolish the anachronism of paper forms in capturing and relaying business-critical information? Of course, centuries of bureaucracy yield habits that are hard to break. In December, for example, the federal government ignored a $17 million grant application from New Hampshire because, according to a state official, “some pages had margins narrower than one inch.”

E-forms can’t do anything about boneheaded business rules. But they provide a more accurate, intuitive replacement for paper forms than plain HTML forms or antiseptic data entry screens — and in the latest e-forms software, they wrap captured data in XML format. These products also provide design tools that allow you to build attractive XML-enabled forms quickly and easily.

Microsoft’s XML-oriented InfoPath, which shipped with Office 2003 in October, is now deployed and in use. Adobe plans to ship a beta version of its PDF- and XML-oriented forms designer in the first quarter of this year. And e-forms veterans such as PureEdge and Cardiff, whose offerings are built on an XML core, are lining up behind XForms, an e-forms specification that became an official W3C recommendation in October 2003.

The XML Denominator

Common to all these vendors’ approaches is the use of XML as the bridge between applications that gather data from end-users and the back-office systems that absorb that data. Three factors contribute to the unanimous choice of XML:

Universal data exchange: Web services are only the tip of the XML iceberg. InfoPath, for example, can talk to SOAP end points, but it can also post raw XML data to an ordinary Web server or send it as e-mail. In general, if you want to move a package of structured information from point A to point B, you’d be crazy not to leverage the XML machinery that’s freely available and widely deployed on all platforms.

Declarative validation: Enterprise applications depend on clean data; you’ve got to scrub it before it enters your systems. There’s no avoiding the use of procedural code for scrubbing, but the more validation you can handle declaratively, the better. All the emerging solutions rely on W3C XML Schema for that purpose. Support for XML Schema in e-forms software is a watershed event.

Document orientation: Forms can present flat lists of name/value pairs, hierarchical and irregularly shaped structures, or, typically, a combination of these styles. XML’s roots in publishing make it a good fit for modeling the documentlike qualities of forms as well as their databaselike qualities.
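
To make these factors a little more concrete, here is a minimal Python sketch that is not taken from any of the products discussed. The expense-report instance, its element names, and the rule table are all hypothetical. The instance mixes flat fields with a repeating, nested structure, and the rules are stated as data rather than buried in procedural code. The rule table is only a stand-in for what a W3C XML Schema would express; it is not XML Schema itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical form instance: two flat fields plus a repeating <item> group.
FORM_XML = """
<expenseReport>
  <employee>Pat Lee</employee>
  <date>2004-01-15</date>
  <items>
    <item><description>Taxi</description><amount>23.50</amount></item>
    <item><description>Hotel</description><amount>189.00</amount></item>
  </items>
</expenseReport>
"""


def is_date(text):
    """Loose YYYY-MM-DD shape check (illustrative only)."""
    parts = text.split("-")
    return len(parts) == 3 and all(p.isdigit() for p in parts)


def is_amount(text):
    """True when the text parses as a non-negative number."""
    try:
        return float(text) >= 0
    except ValueError:
        return False


# Declarative rules: element path -> checker. Every path is required.
# This table is an assumption standing in for a real XML Schema.
RULES = {
    "employee": lambda t: bool(t.strip()),
    "date": is_date,
    "items/item/amount": is_amount,
}


def validate(root):
    """Return a list of error strings; an empty list means the form is clean."""
    errors = []
    for path, check in RULES.items():
        nodes = root.findall(path)
        if not nodes:
            errors.append(f"missing: {path}")
        for node in nodes:
            if not check(node.text or ""):
                errors.append(f"invalid {path}: {node.text!r}")
    return errors


root = ET.fromstring(FORM_XML)
errors = validate(root)
total = sum(float(a.text) for a in root.findall("items/item/amount"))
```

Adding a rule here means adding one entry to the table, which is the appeal of the declarative approach; a schema-aware product would get the same effect from the schema itself.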

The Paper Legacy

Within this broad XML consensus, there are differences that reflect the legacies of Microsoft, Adobe, the e-forms vendors, and the customers they serve (see “E-forms Line-by-Line,” page 54). The relationship of e-forms solutions to printed forms, and to the processes that surround them, is a major source of differentiation. For all their inefficiency as data-gathering instruments, printed forms are highly engineered information displays. People who scan and process forms often rely on their layout and typography, which is why some industries — insurance, for example — standardize the look and feel of forms as well as their content.

Yet even a pixel-perfect rendering of a form on a screen can’t replace the printed original. You can’t arrange piles of Tablet PCs on your desk the way you can arrange piles of paper. In a 2002 New Yorker article entitled “The Social Life of Paper,” Malcolm Gladwell argued convincingly that the paperless office has failed to materialize for the very good reason that piles of paper are not simply messes but rather critical mechanisms for thinking and planning.

Printed forms have many other uses. Although all the e-forms solutions support digital signatures, few users have acquired the digital certificates they need to sign forms. So we’ll be signing printed documents for many years to come. And we’ll be archiving them, too: Paper’s proven longevity far exceeds that of any digital medium.

Adobe vs. Microsoft

For all these reasons, Adobe’s forthcoming solution is certain to attract attention. It builds on a capability that Adobe has quietly embedded into the free Acrobat Reader. Version 6 of that product can display a form backed by XML data that is governed by any XML Schema definition. According to Adobe Senior Product Manager Chuck Myers, no licensed extensions are required in order to interact with that data and post it back to a Web server or to transmit it by e-mail. An enterprise that needs its users to save forms locally for offline use and to digitally sign, annotate, or connect them to Web services end points will be able to unlock these capabilities using Adobe’s Document Server for Reader Extensions.

Given this infrastructure, all Adobe needs to compete head-to-head with Microsoft’s InfoPath is an XML-aware forms designer. And that’s just what the company demonstrated at XML 2003 in December. The Adobe Forms Designer supports two approaches to creating forms. As with InfoPath, you can start with a blank canvas plus a schema, paint the canvas with user-interface widgets, and then bind schema elements to those widgets by dragging and dropping. Or you can start with an existing PDF form and bind schema elements to regions of the form. Wizards that guide users through complex data-entry chores and implement procedural validation can be added as scripted extensions.
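
The binding step can be pictured with a short Python sketch. It does not reflect how InfoPath or the Adobe designer store bindings internally; the widget ids, element paths, and the `build_instance` helper are hypothetical, chosen only to show the idea that each user-interface widget is tied to one element of the XML instance.

```python
import xml.etree.ElementTree as ET

# Hypothetical bindings: widget id -> path of the element that widget fills.
BINDINGS = {
    "txtName": "applicant/name",
    "txtCity": "applicant/address/city",
    "txtAmount": "request/amount",
}


def build_instance(root_tag, bindings, widget_values):
    """Write each widget's value into the element it is bound to."""
    root = ET.Element(root_tag)
    for widget_id, path in bindings.items():
        node = root
        for step in path.split("/"):
            child = node.find(step)
            if child is None:
                # Create intermediate elements the first time a path needs them.
                child = ET.SubElement(node, step)
            node = child
        node.text = widget_values.get(widget_id, "")
    return root


values = {"txtName": "Pat Lee", "txtCity": "Concord", "txtAmount": "2500"}
instance = build_instance("grantApplication", BINDINGS, values)
xml_text = ET.tostring(instance, encoding="unicode")
```

In a real designer the paths would come from the schema rather than from a hand-written table, so a widget could not be bound to an element the schema does not define.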

Because Adobe’s solution leverages features that already exist in the free and widely deployed Acrobat Reader, its reach will exceed that of InfoPath, a product that is available only for Windows and is bundled only with the enterprise edition of Office 2003.

Adobe’s PDF-oriented solution also trumps Microsoft’s in terms of its fidelity to printed forms — but a form’s appearance and its capability of capturing data are really two issues that need to be teased apart. A printed form is often the best solution for reviewing, signing, and archiving, but the digital-paper version of that form, onscreen, is not clearly the best solution for interactive data-gathering. Users of tax preparation programs such as Intuit’s TurboTax or H&R Block’s TaxCut, for example, want to be able to print pixel-perfect tax forms. But few would prefer to interact with literal representations of those forms. We value the dynamic, spreadsheet-like, wizard-assisted experience that the tax programs provide.

Digital paper is not a medium that easily supports implementation of these dynamic behaviors. This will be a major consideration for developers who pursue Adobe’s solution. It’s fundamental to InfoPath, for example, that documents can have multiple views, that regions of documents can grow on demand, and that repeating elements can appear and multiply. The Acrobat display engine wasn’t built to do these things. Adobe’s Myers admits that some capabilities available in the Forms Designer — notably, repeating elements — won’t initially be supported in Acrobat. “On the PDF side, we have a ways to go still,” he says. “But when you look at future versions of Acrobat, that’s the direction we’re moving.”

Introducing XForms

Any e-forms application that can absorb an XML schema and emit schema-valid XML data is guaranteed to be able to exchange data with XML-aware back-end systems and with other XML-aware clients. But whereas portability of data is rapidly becoming a given, portability of business logic and user-interface behavior is not.

XForms, which can be thought of as HTML forms on steroids, specifies a processing model and set of user-interface controls that are device-neutral and platform-independent. So a form’s interactive behavior and to some extent its business logic can be made portable, too. A key aspect of that portability is the relationship of XForms to its so-called host markup language. In one implementation, the XML syntax defining an XForms form might be embedded in a Web page, using HTML as its host language, and a list of choices would be rendered as an HTML pick list. In another implementation, the same form definition might be embedded in a smartphone application, using VoiceXML as its host language, and the same list of choices would be rendered as a voice menu.
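
The one-definition, many-renderings idea can be sketched in a few lines of Python. The fragment below uses the XForms `select1` control, with its `label`, `item`, and `value` children, in the XForms 1.0 namespace. Everything else is an assumption made for illustration: the two renderer functions are hypothetical, the voice output is plain prompt text rather than real VoiceXML, and an actual XForms processor would also handle the model, bindings, and events.

```python
import xml.etree.ElementTree as ET
from html import escape

XF = "{http://www.w3.org/2002/xforms}"  # XForms 1.0 namespace

# A device-neutral list of choices, written once.
CONTROL = """
<xf:select1 xmlns:xf="http://www.w3.org/2002/xforms" ref="shipping">
  <xf:label>Shipping method</xf:label>
  <xf:item><xf:label>Ground</xf:label><xf:value>ground</xf:value></xf:item>
  <xf:item><xf:label>Overnight</xf:label><xf:value>overnight</xf:value></xf:item>
</xf:select1>
"""


def parse_choices(xml_text):
    """Extract (ref, label, [(item label, item value), ...]) from a select1."""
    control = ET.fromstring(xml_text)
    label = control.findtext(f"{XF}label")
    items = [
        (item.findtext(f"{XF}label"), item.findtext(f"{XF}value"))
        for item in control.findall(f"{XF}item")
    ]
    return control.get("ref"), label, items


def render_html(ref, label, items):
    """Hypothetical renderer for an HTML host: a pick list."""
    options = "".join(
        f'<option value="{escape(value)}">{escape(text)}</option>'
        for text, value in items
    )
    return f'<label>{escape(label)} <select name="{escape(ref)}">{options}</select></label>'


def render_voice(ref, label, items):
    """Hypothetical renderer for a voice host: a spoken menu as plain text."""
    spoken = " ".join(
        f"For {text}, press {number}."
        for number, (text, _value) in enumerate(items, 1)
    )
    return f"{label}. {spoken}"


ref, label, items = parse_choices(CONTROL)
html_form = render_html(ref, label, items)
voice_menu = render_voice(ref, label, items)
```

The form author writes the control once; which renderer runs is a decision made by the host environment, not by the form.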

Another host language is XFDL (Extensible Forms Definition Language), which predated and inspired XForms and is now the foundation of PureEdge Solutions’ e-forms products. According to John Boyer, the PureEdge senior product architect who co-wrote the XFDL specification, it has two features that make it especially appealing to enterprise developers. First, its built-in compute engine supports a highly declarative approach to business logic, minimizing the need for procedural scripting. Second, it supports flexible and granular security policies. Parts of documents can be signed separately by different people, a key requirement for complex workflow that is currently met by neither Acrobat nor InfoPath. The only implementation of XFDL is PureEdge’s, but Boyer is a leading contributor to XForms 1.1 and hopes to infuse it with these ideas.
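
Why per-section signing matters to workflow can be shown with a rough Python sketch. This is not XFDL and not how PureEdge implements signatures. It substitutes keyed hashes (HMAC) with made-up shared secrets for certificate-based digital signatures, and it skips XML canonicalization, which real XML signatures require. The form, section names, and helper functions are hypothetical.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

# Hypothetical two-party form: a requester section and an approval section.
FORM = ET.fromstring(
    "<purchaseOrder>"
    "<request><item>Laptop</item><cost>1800</cost></request>"
    "<approval><decision>approved</decision></approval>"
    "</purchaseOrder>"
)

# Assumed per-signer secrets; a real product would use digital certificates.
KEYS = {"requester": b"requester-secret", "manager": b"manager-secret"}


def sign_section(form, section, signer):
    """Return a keyed hash over the serialized bytes of one section only."""
    data = ET.tostring(form.find(section))
    return hmac.new(KEYS[signer], data, hashlib.sha256).hexdigest()


def verify_section(form, section, signer, signature):
    """Recompute the section's keyed hash and compare in constant time."""
    return hmac.compare_digest(sign_section(form, section, signer), signature)


# Each party signs only the part of the document it is responsible for.
request_sig = sign_section(FORM, "request", "requester")
approval_sig = sign_section(FORM, "approval", "manager")
```

Because each signature covers a single section, a later change to the approval section invalidates the manager's signature without disturbing the requester's, which is the property a multi-step approval workflow depends on.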

Cardiff is another leading e-forms vendor with a strong hand in the XForms process. Micah Dubinko, principal software engineer at Cardiff, co-edited the XForms 1.0 specification. Cardiff’s XForms implementation, he explains, is “a generator, not an engine.” CTO Mark Seamans elaborates on the point: The company’s LiquidOffice e-forms offering, he says, includes “a universal design environment that’s publish-agnostic.” A form designed in that environment can currently run in InfoPath or Acrobat and will also be able to run in an XForms-based engine.

One advantage of this approach is portability of design effort. “An enterprise could invest three man-years in mouse clicks and drag-drop in InfoPath or the Adobe designer,” Dubinko says. “Cardiff wants to get you off that lock-in path.” Another advantage, specific to XForms, is the ability to target small-format devices — although that has not yet been proved.

Here’s the bottom line if you’re starting a new e-forms project in early 2004. InfoPath is a fine solution when users run Windows, can be expected to install InfoPath, and don’t require interactive digital paper. If you need to leverage Acrobat for one or more of these reasons, you’ll have to wait a while to try out Adobe’s Forms Designer. While waiting, you might want to evaluate Cardiff’s LiquidOffice as a means of designing forms that play in Acrobat. If your requirements include complex digital signature scenarios supported by neither InfoPath nor Acrobat, have a look at the PureEdge suite. Finally, if your burning need is to build forms that adapt to a range of devices, you’re out of luck. The industry is still figuring out how to do that.

Copyright © 2004 IDG Communications, Inc.