May 23, 2005

XML databases evolve

Open source Apache Xindice, Berkeley DB XML set solid base for content management

In the wide universe of databases and content management, attention has lately turned to XML and, specifically, to XML databases. A good indication of the state of XML-based content management technology comes from examining developments at the ground floor: the XML database libraries that form a base for larger content management applications.

Two such libraries are the targets of this review: the Apache Software Foundation’s Xindice and Sleepycat’s Berkeley DB XML. Both are open source, both are free (although the nature of “free” differs between them), and both provide standards-compliant XML document manipulation. In addition, both are powerful developer tools that place eye-opening XML document storage, query, and retrieval capabilities into the hands of eager programmers.

Apache Xindice 1.0

Apache Xindice began as the dbXML Core project, but the fruit of that labor transferred to the Xindice group sometime after 2001. Xindice’s documentation makes no bones about its intended audience: It will be of interest only to developers in need of a solution for storing and manipulating XML data.

Likewise, the Xindice Web site is clear about the package’s limitations; unlike Berkeley DB XML, Xindice does not deal well with large XML documents. Small-to-moderate documents are best for Xindice, although there’s no precise definition of a “small-to-moderate” XML document -- a megabyte or smaller is probably in the ballpark.

Installation is simple and deposits on your system the Xindice server executable, a command-line tool, documentation, source, and a number of examples. Xindice is written entirely in Java, so you’ll need JDK 1.3 or later installed to run the Xindice JAR (Java Archive) file.

The programming interface -- the XML:DB API -- is Java as well, but Xindice does not limit itself to the Java language. It is built on a client-server architecture and supports the XML-RPC API, so remote Java clients can access the server, as can clients written in other programming languages.
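The reason clients in other languages can talk to the server is that an XML-RPC call is just a small XML document posted over HTTP. The following is a minimal sketch of building such a request body, not Xindice client code: the class name and the method name `example.getDocument` are placeholders invented for illustration, and the method names the Xindice server actually exposes are not covered here.

```java
final class XmlRpcRequestSketch {
    /** Escapes the characters that are special inside XML text content. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds an XML-RPC methodCall body whose parameters are all strings. */
    static String methodCall(String methodName, String... params) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\"?>");
        xml.append("<methodCall><methodName>")
           .append(escape(methodName))
           .append("</methodName><params>");
        for (String param : params) {
            xml.append("<param><value><string>")
               .append(escape(param))
               .append("</string></value></param>");
        }
        return xml.append("</params></methodCall>").toString();
    }

    public static void main(String[] args) {
        // "example.getDocument" is a placeholder, not a documented Xindice method.
        System.out.println(methodCall("example.getDocument", "/db/addressbook", "contact1"));
    }
}
```

Any language that can produce this text and send it in an HTTP POST can act as a client, which is the practical meaning of the language independence described above.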

Xindice arranges its storage in the form of “collections,” and all collections exist within a root instance, “/db.” Think of collections as subfolders in a file system; collections can contain “subcollections” to an arbitrary depth. The “files” in this analogy are the actual XML documents. Queries and updates are typically applied across an entire collection, although you can adjust the granularity to manipulate individual documents.
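The folder analogy can be made concrete with a small in-memory model. This is only a sketch of the idea, assuming a tree of named collections that hold keyed documents; the class `CollectionSketch` and its methods are hypothetical and are not part of Xindice or its API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CollectionSketch {
    final String name;
    // Subcollections by name, like subfolders in a file system.
    final Map<String, CollectionSketch> subcollections = new LinkedHashMap<>();
    // Documents by key, holding the XML text; these are the "files".
    final Map<String, String> documents = new LinkedHashMap<>();

    CollectionSketch(String name) {
        this.name = name;
    }

    /** Returns the named subcollection, creating it if it does not exist yet. */
    CollectionSketch child(String childName) {
        return subcollections.computeIfAbsent(childName, CollectionSketch::new);
    }

    /** Resolves a path such as "/db/addressbook/friends"; returns null if it is absent. */
    static CollectionSketch resolve(CollectionSketch root, String path) {
        String[] parts = path.split("/"); // "/db/a" becomes ["", "db", "a"]
        if (parts.length < 2 || !parts[1].equals(root.name)) {
            return null;
        }
        CollectionSketch current = root;
        for (int i = 2; i < parts.length && current != null; i++) {
            current = current.subcollections.get(parts[i]);
        }
        return current;
    }

    public static void main(String[] args) {
        CollectionSketch root = new CollectionSketch("db");
        root.child("addressbook").child("friends")
            .documents.put("alice", "<person><name>Alice</name></person>");
        System.out.println(resolve(root, "/db/addressbook/friends").documents.keySet());
    }
}
```

In this model a collectionwide query would visit every entry in `documents`, while a single-document operation would look up one key.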

Command-line control

Xindice’s command-line tool is a godsend for new users. Experimenting with it provides an excellent introduction to Xindice’s capabilities and will give you a good feel for the programming API when it’s time to turn your attention to development. The command-line tool is also useful for jump-starting your database. The tool creates new collections, feeds XML documents into the collections, and even feeds whole subdirectory hierarchies into Xindice (in which case the subfolders appear in the database as subcollections).

Xindice uses XPath for querying collections and XUpdate for updating them. It would be nice if XQuery were supported, as it provides for much richer querying, but for now XQuery support is an entry on the Xindice team’s to-do list. The command-line tool is a great way to test out XPath and XUpdate expressions, but as of this writing the documentation for it is incomplete and leads one to erroneously conclude that XUpdate is not supported.
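For readers who have not used XPath, here is a minimal sketch of the kind of expression involved. It uses the XPath engine in the standard Java library (`javax.xml.xpath`) against a single in-memory document rather than Xindice itself; the sample document, the class name, and the helper method are made up for illustration. Xindice would evaluate a comparable expression against the documents in a collection.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathSketch {
    /** Parses one XML document and returns the text of each node the expression selects. */
    static String[] select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            String[] result = new String[nodes.getLength()];
            for (int i = 0; i < result.length; i++) {
                result[i] = nodes.item(i).getTextContent();
            }
            return result;
        } catch (Exception e) {
            // Malformed XML or an invalid expression ends up here.
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<people>"
                + "<person><name>Alice</name><city>Boston</city></person>"
                + "<person><name>Bob</name><city>Austin</city></person>"
                + "</people>";
        // Select the names of all people whose city is Boston.
        for (String name : select(xml, "/people/person[city='Boston']/name")) {
            System.out.println(name);
        }
    }
}
```

The predicate in square brackets filters on element content, which is about as far as XPath goes; joins, sorting, and constructing new result documents are the sort of thing XQuery adds.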

Test Center Scorecard

Criteria weights: 20%, 20%, 20%, 20%, 10%, 10%

Apache Xindice 1.0: 9, 8, 9, 9, 7, 9. Score: 8.6 (Very Good)
Sleepycat Berkeley DB XML 2.0: 10, 9, 9, 9, 9, 10. Score: 9.3 (Excellent)
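The overall scores appear to be weighted averages of six criterion scores. As a quick check, the sketch below assumes the criterion scores read 9, 8, 9, 9, 7, 9 for Xindice and 10, 9, 9, 9, 9, 10 for Berkeley DB XML, with weights of 20%, 20%, 20%, 20%, 10%, and 10%. That reading is an inference from the scorecard, and the class and method names are illustrative only.

```java
final class ScorecardSketch {
    // Criterion weights in percent, in the order the scorecard lists them.
    static final int[] WEIGHTS = {20, 20, 20, 20, 10, 10};

    /** Returns the weighted average of six criterion scores on a 0-10 scale. */
    static double weightedScore(int[] scores) {
        int total = 0;
        for (int i = 0; i < WEIGHTS.length; i++) {
            total += WEIGHTS[i] * scores[i];
        }
        return total / 100.0; // weights sum to 100 percent
    }

    public static void main(String[] args) {
        System.out.println(weightedScore(new int[] {9, 8, 9, 9, 7, 9}));    // Apache Xindice 1.0
        System.out.println(weightedScore(new int[] {10, 9, 9, 9, 9, 10}));  // Berkeley DB XML 2.0
    }
}
```

Under those assumptions the results come to 8.6 and 9.3, matching the published overall scores.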

©1994-2009 Infoworld, Inc.