In the wide universe of databases and content management, attention has lately turned to XML and, specifically, to XML databases.
You’ll get a good indication of the state of XML-based content management technology by examining developments at the ground
floor: the XML database libraries that form a base for larger content management applications.
Two such libraries are the targets of this review: the Apache Software Foundation’s Xindice and Sleepycat’s Berkeley DB XML.
Both are open source, both are free (although the nature of “free” differs between them), and both provide standards-compliant
XML document manipulation. In addition, both are powerful developer tools that place eye-opening XML document storage, query,
and retrieval capabilities into the hands of eager programmers.
Apache Xindice 1.0
Apache Xindice began as the dbXML Core project, but the fruit of that labor transferred to the Xindice group sometime after
2001. Xindice’s documentation makes no bones about its intended audience: It will be of interest only to developers in need
of a solution for storing and manipulating XML data.
Likewise, the Xindice Web site is clear about the package’s limitations; unlike Berkeley DB XML, Xindice does not deal well with large XML documents. Small-to-moderate
documents are best for Xindice, although there’s no precise definition of a “small-to-moderate” XML document -- a megabyte
or smaller is probably in the ballpark.
Installation is simple and deposits on your system the Xindice server executable, a command-line tool, documentation, source,
and a number of examples. Xindice is written entirely in Java, so you’ll need JDK 1.3 or later installed to run the Xindice
JAR (Java Archive) file.
The programming interface -- the XML:DB API -- is Java as well, but Xindice does not limit itself to the Java language. It
is built on a client-server architecture and supports XML-RPC, so remote Java clients can access the server, as can
clients written in other programming languages.
Xindice arranges its storage in the form of “collections,” and all collections exist within a root instance named “/db”. Think
of collections as folders in a file system; collections can contain “subcollections” to an arbitrary depth. The “files” in this
analogy are the actual XML documents. Queries and updates are typically applied collection-wide, although you can adjust
the granularity to manipulate individual documents.
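To make the folder analogy concrete, here is a minimal Java sketch of a collection tree. It is a toy in-memory model, not Xindice code: the class name CollectionSketch, its methods, and the sample paths are all hypothetical and use nothing beyond the standard JDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy in-memory model of the storage layout described above.
// It is NOT Xindice code and uses none of Xindice's classes.
class CollectionSketch {
    final String name;
    // Subcollections keyed by name, like subfolders.
    final Map<String, CollectionSketch> subcollections = new LinkedHashMap<>();
    // XML documents keyed by name, like files in a folder.
    final Map<String, String> documents = new LinkedHashMap<>();

    CollectionSketch(String name) {
        this.name = name;
    }

    // Returns the named subcollection, creating it on first use.
    CollectionSketch subcollection(String childName) {
        return subcollections.computeIfAbsent(childName, CollectionSketch::new);
    }

    // Walks a slash-separated path such as "/db/addressbook/friends" down from the root.
    static CollectionSketch resolve(CollectionSketch root, String path) {
        String[] parts = path.split("/");
        // parts[0] is the empty string before the leading slash; parts[1] must name the root.
        if (parts.length < 2 || !parts[0].isEmpty() || !parts[1].equals(root.name)) {
            throw new IllegalArgumentException("Path must start with /" + root.name);
        }
        CollectionSketch current = root;
        for (int i = 2; i < parts.length; i++) {
            current = current.subcollections.get(parts[i]);
            if (current == null) {
                throw new IllegalArgumentException("No such collection: " + path);
            }
        }
        return current;
    }

    // Counts documents in this collection and in every subcollection beneath it.
    int countDocuments() {
        int total = documents.size();
        for (CollectionSketch child : subcollections.values()) {
            total += child.countDocuments();
        }
        return total;
    }
}
```

Building new CollectionSketch("db") and calling subcollection("addressbook").subcollection("friends") yields a tree in which resolve(root, "/db/addressbook/friends") returns the innermost collection, much as a file path picks out a nested folder.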
Command-line control
Xindice’s command-line tool is a godsend for new users. Experimenting with it provides an excellent introduction to Xindice’s
capabilities and will give you a good feel for the programming API when it’s time to turn your attention to development. The
command-line tool is also useful for jump-starting your database. The tool creates new collections, feeds XML documents into
the collections, and even feeds whole subdirectory hierarchies into Xindice (in which case the subfolders appear in the database
as subcollections).
Xindice uses XPath for querying collections and XUpdate for updating them. It would be nice if XQuery were supported, as it
provides for much richer querying, but for now XQuery support is an entry on the Xindice team’s to-do list. The command-line
tool is a great way to test out XPath and XUpdate expressions, but as of this writing the documentation for it is incomplete
and leads one to erroneously conclude that XUpdate is not supported.
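For readers who have not seen the two languages, the following Java sketch shows roughly what an XPath query and an XUpdate modification look like. It relies only on the JDK's built-in XML classes, not on Xindice or its command-line tool. The class XPathXUpdateSketch, its method names, and the sample person document are hypothetical, and the applyUpdates method is a deliberately tiny stand-in that understands only the xupdate:update instruction; a real XUpdate processor handles many more.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: uses the JDK's XML classes, not Xindice's API.
class XPathXUpdateSketch {
    // Namespace used by XUpdate modification documents.
    static final String XUPDATE_NS = "http://www.xmldb.org/xupdate";

    // Parses an XML string into a namespace-aware DOM document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Evaluates an XPath expression and returns the text of each matching node.
    static List<String> query(Document doc, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath expression: " + expression, e);
        }
    }

    // Applies every <xupdate:update select="..."> instruction by replacing
    // the text content of each node that the select expression matches.
    static void applyUpdates(Document target, String xupdateXml) {
        Document modifications = parse(xupdateXml);
        NodeList updates = modifications.getElementsByTagNameNS(XUPDATE_NS, "update");
        XPath xpath = XPathFactory.newInstance().newXPath();
        for (int i = 0; i < updates.getLength(); i++) {
            Element update = (Element) updates.item(i);
            String select = update.getAttribute("select");
            try {
                NodeList hits = (NodeList) xpath.evaluate(select, target, XPathConstants.NODESET);
                for (int j = 0; j < hits.getLength(); j++) {
                    hits.item(j).setTextContent(update.getTextContent());
                }
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("Bad select expression: " + select, e);
            }
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<person><fname>John</fname>"
                + "<phone type=\"work\">563-456-7890</phone></person>");
        // XPath: select the phone element of the person whose fname is John.
        System.out.println(query(doc, "/person[fname='John']/phone"));
        // XUpdate: replace the text of the work phone.
        applyUpdates(doc, "<xupdate:modifications version=\"1.0\" xmlns:xupdate=\"" + XUPDATE_NS + "\">"
                + "<xupdate:update select=\"/person/phone[@type='work']\">555-0100</xupdate:update>"
                + "</xupdate:modifications>");
        System.out.println(query(doc, "/person[fname='John']/phone"));
    }
}
```

The point of the sketch is the division of labor: XPath expressions only select nodes, while an XUpdate document wraps such expressions in instructions that say what to do with the nodes selected.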
A number of sample Java programs are buried in an examples subfolder, with run scripts thoughtfully provided. A rather large
Addressbook Web application is also included, although you must have an installation of Tomcat to run it. Here, as with the
Xindice documentation, everything is a bit rough around the edges, and you must be willing to work your way through some mazes
to avoid the occasional blind alley.
On the security front, you can password-protect a Xindice database. Xindice is also thread safe, so multiple clients can connect
without worry. However, there is no transaction support built into Xindice; transaction support is an optional package in the
XML:DB API and may be added to the server in the future.
Xindice is an Apache project, so it progresses at a speed governed by the enthusiasm of its participants. In some cases this
is remarkably prompt. But the process is inherently somewhat stochastic, so there are no guarantees concerning when important
modifications or additions (such as handling larger XML files) will be made. What I’ve seen so far, however, will have me
keeping a hopeful eye on the project.
Sleepycat Berkeley DB XML 2.0
Sleepycat recently released Version 2.0 of its DB XML database (see our review of an earlier edition at infoworld.com/1529). Berkeley DB XML sits on top of the venerable Berkeley DB database and inherits Berkeley DB’s transaction
support, crash recovery, deadlock detection, encryption, and other features. In fact, you can freely intermix DB XML databases
and “ordinary” Berkeley DB databases in the same application without having to link additional libraries into that application.
Berkeley DB XML is an open source tool, although there are licensing restrictions that vary depending on how you use and distribute
applications built from the tool (details available at sleepycat.com).