Unlike Xindice, DB XML is not a client/server system; it is a library that you link into -- and that runs in the process space
of -- your application. Bindings are available for numerous languages, including Java, C++, Perl, Python, Tcl (Tool Command
Language), and PHP (PHP: Hypertext Preprocessor). Several third-party bindings are also available for other languages.
Much of what’s in the new 2.0 release is the direct result of user feedback. The preceding release handled documents as single
entities, imposing an upper limit on the size of the document that DB XML could handle (that upper limit was typically set
by available memory, and any XML document exceeding that limit was probably a good candidate for factoring). In this release,
DB XML allows you to store documents either wholesale (as before) or per node -- carved up, if you will.
When you choose per-node storage, documents are taken apart and their individual nodes are stored in separate records in the
database. Consequently, available disk space is the only real upper limit on document size. The Berkeley DB system can
handle databases as large as 256TB, a ceiling that few users will ever hit.
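To make the per-node idea concrete, here is a minimal Python sketch of the general technique. It does not use the DB XML API; the in-memory dict, the path-style keys, and the function names store_per_node and fetch_node are all assumptions made for illustration. The point is only that once each node has its own record, a single node can be fetched without reading the rest of the document.

```python
import xml.etree.ElementTree as ET


def store_per_node(store, doc_id, xml_text):
    """Split a document into one record per element, keyed by (doc_id, path).

    Hypothetical illustration only: `store` is a plain dict standing in for
    a database, and the path keys are an invented addressing scheme.
    """
    root = ET.fromstring(xml_text)

    def walk(elem, path):
        # Build a positional path for each child, e.g. /catalog[1]/book[2].
        counts = {}
        child_paths = []
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            child_paths.append(f"{path}/{child.tag}[{counts[child.tag]}]")
        # The record holds this element's own data plus links to its children.
        store[(doc_id, path)] = {
            "tag": elem.tag,
            "attrib": dict(elem.attrib),
            "text": (elem.text or "").strip(),
            "children": child_paths,
        }
        for child, child_path in zip(elem, child_paths):
            walk(child, child_path)

    walk(root, f"/{root.tag}[1]")


def fetch_node(store, doc_id, path):
    """Read exactly one node's record; no other part of the document is touched."""
    return store[(doc_id, path)]


store = {}
store_per_node(
    store,
    "catalog.xml",
    "<catalog><book id='1'><title>Dune</title></book>"
    "<book id='2'><title>Emma</title></book></catalog>",
)
record = fetch_node(store, "catalog.xml", "/catalog[1]/book[2]/title[1]")
print(record["text"])
```

A real store would keep these records on disk rather than in a dict, which is why disk space, not memory, becomes the limiting factor.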
Document options
As with Xindice, DB XML’s storage uses a collections paradigm. You choose whole-document or per-node storage for a given
collection; all documents in that collection are stored the same way.
Whole-document storage is best if your documents are reasonably small (measuring 1MB or less), and you must process each document
intact. Also, documents retrieved from whole-document storage are byte-for-byte identical to the document that was placed
in storage. That’s important if you want to be able to verify that the content of the document has not been meddled with --
for example, if you’ve added a digital signature to the document.
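Why byte-for-byte identity matters for verification can be shown with a short Python sketch. It uses only the standard library, not DB XML; a SHA-256 digest stands in for a real digital signature, and the re-serialization step stands in for any store that rebuilds a document from parsed pieces. It is not a claim about how DB XML's per-node storage serializes documents.

```python
import hashlib
import xml.etree.ElementTree as ET

# A document as originally submitted, with a digest recorded at signing time.
original = b"<?xml version='1.0'?>\n<order  id='7'><item>widget</item></order>"
signed_digest = hashlib.sha256(original).hexdigest()

# Whole-document style: the bytes go in and come out untouched
# (a dict stands in for the database here).
whole_store = {"order.xml": original}
retrieved = whole_store["order.xml"]
whole_ok = hashlib.sha256(retrieved).hexdigest() == signed_digest

# Rebuilt-from-parts style: parse, then serialize again. The content is
# equivalent, but quoting, spacing, and the XML declaration can change.
reserialized = ET.tostring(ET.fromstring(original))
rebuilt_ok = hashlib.sha256(reserialized).hexdigest() == signed_digest

print("whole-document copy verifies:", whole_ok)
print("re-serialized copy verifies:", rebuilt_ok)
```

The re-serialized copy still parses to the same elements and attributes; it fails only the byte-level check, which is exactly the check a signature over the raw document depends on.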
Per-node storage provides faster queries and updates, because the entire document need not be read in to be processed. And,
as already stated, it allows you to manage extremely large XML documents.
DB XML 2.0 also has a new command-line tool. Like Xindice’s command-line tool, it’s the perfect way to familiarize yourself
with the database’s capabilities. The commands accepted by the tool have a one-to-one correspondence with the product’s API.
Sleepycat was in the process of finishing a tutorial for the command-line tool at the time of my review. I saw an early version
that was already polished enough to be useful, and can say that the tutorial promises to be a worthy guide to the neophyte
DB XML user.
DB XML 2.0 supports XPath and XUpdate as well as the more robust XQuery. As with Xindice, you can use the command-line tool
to familiarize yourself with the syntax of these query and update dialects. And, like Xindice, DB XML 2.0 provides numerous
examples to work through and explore.
Quite a pair
Both Apache Xindice and Sleepycat Berkeley DB XML allow you to attach indexes to your databases to speed up queries.
DB XML, however, gives you greater control over the index type, and thereby allows you to fine-tune an index for
the sorts of queries likely to take place.
In addition, the DB XML command-line tool will return the amount of time taken by a query, so you can experiment with different
index types and query strategies to optimize performance.
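The experiment described here, running the same query with and without an index and comparing elapsed times, can be sketched in plain Python. Nothing below is DB XML's API or its index types; the record layout, the dict-based index, and the helper names scan_query, indexed_query, and timed are assumptions chosen to illustrate the general idea.

```python
import time

# Hypothetical data set: 200,000 records spread over 1,000 authors.
records = [{"id": i, "author": f"author{i % 1000}"} for i in range(200_000)]


def scan_query(author):
    """No index: examine every record."""
    return [r["id"] for r in records if r["author"] == author]


# Build an equality index on the "author" field: value -> list of record ids.
index = {}
for r in records:
    index.setdefault(r["author"], []).append(r["id"])


def indexed_query(author):
    """With the index: a single dictionary lookup."""
    return index.get(author, [])


def timed(fn, *args):
    """Run fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


scan_result, scan_secs = timed(scan_query, "author42")
idx_result, idx_secs = timed(indexed_query, "author42")

print(f"scan:    {len(scan_result)} hits in {scan_secs:.6f}s")
print(f"indexed: {len(idx_result)} hits in {idx_secs:.6f}s")
```

Both queries return the same hits; only the elapsed time differs, and the actual figures depend on the machine. An index like this helps equality lookups on one field and nothing else, which is why control over the index type matters for matching the index to the expected queries.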
Xindice and DB XML 2.0 are top-notch database libraries, although DB XML provides a greater range of features and is polished
to a more impressive sheen. Nevertheless, I expect to see the Xindice project’s feature list lengthen over time. Improvements
in Xindice will only benefit the wide and growing XML database community.