XML everywhere

As platform vendors incorporate XML, will content management problems go away?

THE REVOLUTION WROUGHT by Web-based standards and distributed computing environments is having a tremendous impact, including the democratization of many specialized enterprise software functions. Business intelligence applications, for example, are embedded under the hood of many general-purpose platforms and are thereby moving closer to their end-users.

CM (content management) is the latest beneficiary, or victim, of this trend. Thanks to XML's proliferation, the concept of a dedicated CM application is challenged by the notion that ubiquitous platforms from companies such as Microsoft, Oracle, and IBM will play a greater role in helping enterprises manage business documents, e-mail, and other unstructured content.

Each of these companies is pointing the way to a holistic approach to managing unstructured content -- as indicated by Microsoft's recent XDocs and Jupiter initiatives. This is an appealing notion to those of us who create mounds of text each day and can't remember where we put our damn notes about that important customer. But the devil will be in the details of these XML-based implementations, which confront a host of unresolved issues, such as openness, integration, and control of the business logic, workflow, and user interfaces.

Incumbents in the content management space -- companies such as Interwoven, Vignette, Documentum, and FileNet -- clearly see the threat from XML and the horizontal platform players. "I think you'll see more of these content management capabilities out of the box," says Howard Shao, CTO of Pleasanton, Calif.-based Documentum. "XML makes configuring a generalized system to a unique need a lot easier." The incumbents are therefore scrambling to broaden their offerings (for example, Documentum's recent merger with collaboration software vendor eRoom), add more application-specific functionality, and move up the value chain with more robust business logic management tools.

Traditional CM solutions center on capabilities such as indexing and tagging, versioning, syndicating, querying, archiving (for compliance), and solving the workflow problems of multiple authors using multiple tools and data types to work collaboratively on multiple documents. The underlying technology is an object store or repository. Any asset or content fragment becomes an object, which can be projected to the contributors and then published dynamically.
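To make the repository idea concrete, here is a minimal sketch of an object store that tags and versions content. It is an illustration only, not any vendor's design: the class names (ContentObject, Repository), the method names, and the sample document are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ContentObject:
    """One managed asset: an ID, a set of tags, and every checked-in version."""
    object_id: str
    tags: set = field(default_factory=set)
    versions: list = field(default_factory=list)


class Repository:
    """Hypothetical in-memory object store (assumption: no persistence, no locking)."""

    def __init__(self):
        self._objects = {}

    def put(self, object_id, body, tags=()):
        # Create the object on first check-in, then append a new version.
        obj = self._objects.setdefault(object_id, ContentObject(object_id))
        obj.tags.update(tags)
        obj.versions.append(body)
        return len(obj.versions)  # 1-based version number

    def get(self, object_id, version=None):
        # Return the latest version unless a specific one is requested.
        obj = self._objects[object_id]
        return obj.versions[-1] if version is None else obj.versions[version - 1]

    def query(self, tag):
        # Find every object carrying the given tag.
        return sorted(o.object_id for o in self._objects.values() if tag in o.tags)


repo = Repository()
repo.put("contract-7", "first draft", tags={"legal"})
repo.put("contract-7", "second draft")
```

A real repository adds access control, locking for concurrent authors, and archiving, which is where much of the vendors' differentiation lies.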

In the pre-XML world, each vendor had its own proprietary tagging and indexing schemes, workflow engines, and data repositories for managing content. The content would work well within its own system, but when it interacted with other enterprise systems, it would get bogged down in the muck of metadata and other integration issues. With the rise of XML and Web services, more work is going into standards for modeling content and performing services on content such as versioning, workflow, and search.

A new set of rules

These standards include XML Schema, a standard data model for expression of content such as documents and e-mail; XQuery for querying XML documents; XSL (Extensible Stylesheet Language) for converting XML to other formats such as PDF or HTML; XPath, a navigational query standard for URL-like navigation of XML documents; and WebDAV (Web Distributed Authoring and Versioning), an Internet Engineering Task Force (IETF) standard for Web-based collaboration. Furthermore, efforts are under way within the Prism Group, the Organization for the Advancement of Structured Information Standards (OASIS), and the W3C to develop metadata-related standards and bridge well-known taxonomies using XML and Web services.
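As a rough illustration of the URL-like navigation that XPath provides, the sketch below uses Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath. The sample document and its element and attribute names (release, author, role, and so on) are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical press-release document; the vocabulary is an assumption.
DOC = """
<release id="pr-001" status="draft">
  <title>Quarterly results</title>
  <author role="writer">A. Writer</author>
  <author role="reviewer">B. Reviewer</author>
  <body>
    <para>First paragraph.</para>
    <para>Second paragraph.</para>
  </body>
</release>
""".strip()

root = ET.fromstring(DOC)

# Path steps read much like a URL: child element, then grandchild.
title = root.findtext("title")
paragraphs = [p.text for p in root.findall("body/para")]

# A predicate in square brackets filters on an attribute value.
reviewers = [a.text for a in root.findall("author[@role='reviewer']")]

print(title, paragraphs, reviewers)
```

The same path expressions would work against any tool that understands the standard, which is the portability argument the platform vendors are making.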

However, incumbent CM vendors say these standards haven't yielded deployable results, except for a few ad hoc efforts in specific vertical areas. "People don't know the problem well enough yet," says Documentum's Shao. "Or the people who know it well enough are not ready to generalize it yet."

Nonsense, say the horizontal platform players. The issue is simply how far beyond a general-purpose solution you need to go for any particular application. "If you're not Los Alamos Labs, you don't need to lock into some exquisite versioning model that only Vignette does," says Sandeepan Banerjee, director of product management for Redwood Shores, Calif.-based Oracle's database product, which supports a variety of XML content management standards. Aside from the high end of the market, Banerjee claims, standard tools will suffice to support broad, common-denominator functionality. "The broad prize is the content management requirements of the mainstream enterprise," he explains.

Oracle and IBM have made clear their intentions to address CM by building on the XML capabilities of their database products. But the incumbent CM vendors say the database technologies are too rigid to effectively handle the job. "Relational databases do not solve unstructured content problems," claims Jack Jia, CTO of Interwoven in Sunnyvale, Calif. "It's a square solution ... you've got rows and columns." Unstructured content, he asserts, has a "random shape ... you cannot easily model that in a relational database."

"You end up creating these compound tables, table after table ... it becomes extremely slow," Jia says. "XML is trying to bridge the two worlds ... it's semistructured." But by its nature XML creates similar kinds of challenges. You start by defining one structure with a Document Type Definition and tags, but then you inevitably have to expand it. "There are so many business rules and decisions you can make that it really behaves like unstructured content," Jia says.

Indeed, business rules and workflow management are becoming key differentiators in content management as vendors try to move up the value chain. Rather than hard-coding business semantics, such as the review cycle for a press release or a standard contract, into the application, the programs deal with a higher level of business logic abstraction. "Content management really goes beyond the version data table," says Documentum's Shao. "We have business logic about the whole content life cycle."
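One way to picture business logic that is not hard-coded is to hold each review cycle as data and let a single generic function walk it. The sketch below assumes a simple linear cycle; the document types, state names, and the next_state helper are all hypothetical and do not describe any vendor's workflow engine.

```python
# Review cycles expressed as data, so a business user could change them
# without touching application code. All names here are illustrative.
REVIEW_CYCLES = {
    "press_release": ["draft", "legal_review", "approved", "published"],
    "contract": ["draft", "legal_review", "executive_signoff", "approved"],
}


def next_state(doc_type, current):
    """Return the state that follows `current`, or None at the end of the cycle."""
    cycle = REVIEW_CYCLES[doc_type]
    position = cycle.index(current)  # raises ValueError for an unknown state
    if position == len(cycle) - 1:
        return None
    return cycle[position + 1]


print(next_state("press_release", "draft"))
print(next_state("contract", "legal_review"))
```

Adding a new document type or an extra approval step then means editing the table, not the program.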

While Oracle and IBM focus on the server side, Microsoft has gone a step further, indicating that it will build XML support and CM-related intelligence into its key client applications, such as Word and Excel, as well as its enterprise servers. The advantage of handling CM on the client is the ability to enhance and control the tagging of data as it is created.

Microsoft, which last year purchased a small Canadian CM software company called NCompass Labs, recently announced that its Content Management server would be combined with its BizTalk business process management server and its Commerce server into a single offering, code-named Jupiter. Simultaneously, the company announced a new client offering, called XDocs, which will be partly an end-user application and partly a development tool. XDocs, which will launch next year, will enable the creation of XML forms and documents that can be dynamically linked to other data throughout the enterprise, as well as to business rules for workflow and CM-like services.

End-to-end XML

Will Microsoft seize the high ground in the enterprise CM battle by capturing and tagging unstructured content at the highest point upstream, when it's entered into XDocs? It will try.

And why did it choose to create a separate application for this, rather than simply build more robust structured data and enterprise integration capabilities into Word and Excel? It will do this as well, starting in Office 11.

"The notion of tagging data when you're working with it is of key importance," explains Neil Charney, a .Net platform strategy group director at Microsoft in Redmond, Wash. "It becomes very interesting to think about enabling your own documents and the data within these documents to be available to others, should you so choose. I think what you're going to see is distributed data stores."

Despite its goal of tagging and managing unstructured content, XDocs is not really a competitive strike at Interwoven and Documentum, but a slap at application vendors such as Siebel. "Right now people are doing this -- they're doing it with Siebel," explains Scott Bishop, Microsoft Office product manager. "But the problem is it requires everybody in the organization to know the Siebel UI, to know that Siebel exists. The competitors really are custom solutions that are being built for products such as Siebel."

So will Microsoft be doing Siebel and the rest of us a favor by validating, indexing, tagging, and XML-izing every keystroke we enter into a Microsoft client application, including the new XDocs? It depends on how open Microsoft will be about passing that data along to whatever applications want to consume it in an industry-standard format. Right now the company seems to think more data will flow in than out because everybody will prefer to consume their data in a Microsoft client. And just as importantly, how open will Microsoft be in passing along the business and workflow rules that accompany that content?

Did you think XML would put an end to all the fun in the CM space? It's certainly changing the rules of the game, but some of the same old questions remain.

Copyright © 2002 IDG Communications, Inc.