Berkeley DB XML is a database library built on the venerable Berkeley DB engine. Sleepycat engineers erected a layer atop
Berkeley DB, extending that engine and creating a new one that provides XML document storage, management, and querying.

Berkeley DB XML
Sleepycat Software, sleepycat.com
Very Good: 8.5

| Criteria       | Score | Weight |
|----------------|-------|--------|
| Ease-of-use    | 8     | 20%    |
| Performance    | 9     | 20%    |
| Scalability    | 8     | 20%    |
| Implementation | 9     | 15%    |
| Setup          | 8     | 15%    |
| Value          | 9     | 10%    |

Cost: Free download for single-site use; licenses required for multiple-site use
Platforms: Windows, Linux, Solaris
Bottom Line: Berkeley DB XML is not an end-user tool. It’s a set of libraries for application builders, and those application builders
won’t be disappointed by its sturdy, clever performance and open source characteristics. It’s hard to knock Berkeley DB off
its feet, and Berkeley DB XML rides nicely on its shoulders.
So, Berkeley DB XML inherits transaction protection, multiple database access, deadlock detection, encryption, database sizes
up to 256TB, and more from Berkeley DB. In addition, a single application employing the Berkeley DB XML engine can simultaneously
access and freely mix XML databases and “normal” Berkeley DB databases.
I downloaded version 1.2 of Berkeley DB XML and explored the package’s Java personality. Berkeley DB XML provides bindings
for a number of popular languages: C/C++, Java, Perl, Tcl, and Python (as of Python 2.3). It comes with all the necessary
JAR (Java Archive) files and DLL native libraries to build a complete DB XML application, plus API documentation.
Finally — did I forget to mention? — Berkeley DB XML is an open-source product, so you get all the source code for the engine.
Containers and Queries
Berkeley DB XML stores everything in an abstract entity called a “container,” which is analogous to an RDBMS’s database. “Everything”
in DB XML’s case is synonymous with “XML documents” because a document is the engine’s atom of persistence; you cannot store
or otherwise manipulate pieces of documents. Behind the scenes, Berkeley DB XML converts each document to a string and stores
each string as an individual record in the underlying Berkeley DB database.
Defining containers and adding documents is reasonably simple. From a Java programming perspective, you need only surmount
the small learning curve of deducing which classes map to what entities within the engine and coding the proper initialization
steps before you’re doing serious work.
Berkeley DB XML queries use the XPath 1.0 XML query language standard. The call into the query subsystem takes not only the
XPath query itself, but the query’s context, which consists of the namespace, result type, query variables, and a flag indicating
whether the query is “eager” or “lazy.” Eager queries assemble the entire result set before returning. Lazy queries don’t
complete the query processing until code steps through the result set. Lazy queries are useful when the result set is large
and the caller is unlikely to examine all of it.
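The difference between eager and lazy evaluation, and the role of a query variable in the context, can be sketched with the JDK's own XPath 1.0 classes. This is only an illustration under stated assumptions: it uses javax.xml.xpath rather than the Berkeley DB XML API, it treats a list of strings as a stand-in for a container, and the class name EagerLazySketch, the evaluations counter, and the $author variable are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Illustration only: plain JDK classes, not the Berkeley DB XML API.
class EagerLazySketch {
    // Counts how many documents have actually been queried so far.
    static int evaluations = 0;

    // Run one XPath 1.0 query against one document held as a string.
    static String queryOne(String xml, String xpath, String author) {
        evaluations++;
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            // Query variable supplied through the context: $author
            xp.setXPathVariableResolver(name ->
                    "author".equals(name.getLocalPart()) ? author : null);
            return (String) xp.evaluate(xpath,
                    new InputSource(new StringReader(xml)), XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Eager: every document is queried before anything is returned.
    static List<String> eager(List<String> docs, String xpath, String author) {
        List<String> results = new ArrayList<>();
        for (String doc : docs) {
            results.add(queryOne(doc, xpath, author));
        }
        return results;
    }

    // Lazy: a document is queried only when the caller asks for the next result.
    static Iterator<String> lazy(List<String> docs, String xpath, String author) {
        Iterator<String> source = docs.iterator();
        return new Iterator<String>() {
            @Override public boolean hasNext() { return source.hasNext(); }
            @Override public String next() { return queryOne(source.next(), xpath, author); }
        };
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "<book><author>Knuth</author><title>TAOCP</title></book>",
                "<book><author>Date</author><title>Database Systems</title></book>");
        Iterator<String> results = lazy(docs, "/book[author=$author]/title", "Knuth");
        System.out.println(results.next() + " after " + evaluations + " evaluation(s)");
    }
}
```

In this sketch a caller that stops after the first lazy result pays for one evaluation, while the eager version always pays for the whole list.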
Although database storage is completely document-based, queries return either whole documents or pieces of documents. The
latter query result can be difficult to untangle if the structure of your XML documents and the nature of the query return
multiple pieces from within multiple documents.
Luckily, the Berkeley DB XML documentation suggests an iterative query tactic to avoid this: Program the first query to return
a set of matching documents, then iterate through that set, re-issuing the query on individual documents, and examine the
returned elements.
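The two-pass tactic described above might look roughly like the following. It is a sketch, not the documented Berkeley DB XML code: a Map from document name to XML string stands in for a container, the queries run through the JDK's javax.xml.xpath classes, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: a Map stands in for a container; this is not the Berkeley DB XML API.
class IterativeQuerySketch {
    private static InputSource src(String xml) {
        return new InputSource(new StringReader(xml));
    }

    // Returns the matching pieces grouped by the document they came from.
    static Map<String, List<String>> run(Map<String, String> container, String xpath) {
        XPath xp = XPathFactory.newInstance().newXPath();
        try {
            // Pass 1: find which whole documents match the query at all.
            List<String> matching = new ArrayList<>();
            for (Map.Entry<String, String> entry : container.entrySet()) {
                boolean hit = (Boolean) xp.evaluate(
                        xpath, src(entry.getValue()), XPathConstants.BOOLEAN);
                if (hit) {
                    matching.add(entry.getKey());
                }
            }
            // Pass 2: re-issue the query on each matching document, one at a time,
            // so every returned element is known to belong to that document.
            Map<String, List<String>> byDocument = new LinkedHashMap<>();
            for (String name : matching) {
                NodeList nodes = (NodeList) xp.evaluate(
                        xpath, src(container.get(name)), XPathConstants.NODESET);
                List<String> values = new ArrayList<>();
                for (int i = 0; i < nodes.getLength(); i++) {
                    values.add(nodes.item(i).getTextContent());
                }
                byDocument.put(name, values);
            }
            return byDocument;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Grouping the results by document name is the point of the exercise: the caller never has to guess which document a returned element came from.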
You can accelerate queries by defining indexes, and Berkeley DB XML has a flexible indexing scheme that lets you create indexes
for elements (or “edges,” which are paths to elements, rather than the elements themselves) and define the index structure
so that it’s optimal for the expected queries.
The engine’s query system maintains index statistics and performs cost-based analysis for query optimization. Often-repeated
queries can be precompiled for even greater performance.
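The idea of precompiling a repeated query has a close analog in the JDK's XPath API, which makes for a compact illustration. The sketch below assumes nothing about Berkeley DB XML's own precompilation calls; it uses javax.xml.xpath.XPathExpression, and the class name and the /book/title query are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Illustration only: JDK XPath precompilation, not the Berkeley DB XML API.
class PrecompiledQuerySketch {
    static List<String> titles(List<String> docs) {
        try {
            // Parse and compile the expression once...
            XPathExpression titleQuery =
                    XPathFactory.newInstance().newXPath().compile("/book/title");
            // ...then reuse the compiled form for every document.
            List<String> titles = new ArrayList<>();
            for (String xml : docs) {
                titles.add(titleQuery.evaluate(new InputSource(new StringReader(xml))));
            }
            return titles;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```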
Meta Features
Because Berkeley DB XML manages documents, you would expect it to allow you to attach information that isn’t in the document’s
content. Berkeley DB XML meets those expectations by allowing you to attach metadata in a clever way that leaves the document’s
content unmolested.
When you define metadata for a document, the engine “reflects” that metadata as attributes into the document’s root element:
From the query’s perspective, the metadata name/value pairs are simply XPath-queryable tag attributes.
But that perspective is an illusion — the document’s contents are unchanged. The attributes have actually been “snuck in”
by the engine for the benefit of the query, so you can search for attributes without having to use a special syntax.
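One way to picture this reflection is a query-time view: a copy of the stored document whose root element carries the metadata as attributes, while the stored document itself is left alone. The sketch below is an assumption about the concept, not a description of how the engine implements it; it uses JDK DOM and XPath classes, and the class, method, and attribute names are hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustration only: models the "reflected metadata" idea with JDK classes.
class MetadataReflectionSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Build the view the query sees: a deep copy of the stored document with
    // each metadata name/value pair added as an attribute on the root element.
    static Document queryView(Document stored, Map<String, String> metadata) {
        try {
            Document view = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            view.appendChild(view.importNode(stored.getDocumentElement(), true));
            metadata.forEach((name, value) ->
                    view.getDocumentElement().setAttribute(name, value));
            return view;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the XPath expression selects at least one node in the document.
    static boolean matches(Document doc, String xpath) {
        try {
            return (Boolean) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A query such as /memo[@owner='rick'] matches the view but not the stored document, which is the illusion the review describes.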
Berkeley DB XML’s processing of a whole XML document as a single unit does create some side effects in the way documents are
accessed. As you might imagine, you cannot delete portions of a document; you can only delete the whole thing. Consequently,
modifying or deleting part of an XML document is really an update operation, and an update can only be done by reading the
old document, modifying it in memory, deleting its image from the container, then re-storing the updated version.
Happily, Berkeley DB XML provides an update method that does all this dirty work for you invisibly. But if your application
employs transactions or locks, you have to keep in mind that lock granularity is at the document level. It’s not possible,
for instance, to lock an element within an XML document. This could affect performance if you craft an app such that a lot
of locking is going on and users hold each other up.
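The read, modify, delete, and re-store cycle that the update method hides can be sketched as follows. This is a toy model under stated assumptions, not the Berkeley DB XML update call: a HashMap of serialized strings stands in for the container's records, and the class name and the setElementText helper are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Illustration only: a HashMap of strings stands in for a container.
class WholeDocumentUpdateSketch {
    // One record per document, each stored as a serialized string.
    final Map<String, String> records = new HashMap<>();

    void put(String name, String xml) { records.put(name, xml); }

    String get(String name) { return records.get(name); }

    // Change the text of the first element with the given tag name.
    // The whole document is replaced, because it is the smallest stored unit.
    void setElementText(String name, String tag, String newText) {
        String old = records.get(name);
        if (old == null) {
            throw new IllegalArgumentException("no such document: " + name);
        }
        try {
            // 1. Read the old document.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(old)));
            // 2. Modify it in memory.
            Node target = doc.getElementsByTagName(tag).item(0);
            if (target == null) {
                throw new IllegalArgumentException("no <" + tag + "> in " + name);
            }
            target.setTextContent(newText);
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            // 3. Delete the old image, then 4. re-store the updated version.
            records.remove(name);
            records.put(name, out.toString());
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the record is replaced as a whole, any lock protecting this operation would naturally cover the entire document, which matches the granularity the review describes.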
Unsung Magic
Possibly the greatest benefit of Berkeley DB XML is its masking of the Berkeley DB system complexity so that a programmer
can easily add XML database capabilities to an application. Berkeley DB XML “pre-tweaks” Berkeley DB parameters for you, so
you can go straight on to programming your app.
But while it hides these details, it does not make them unreachable. If you want to crawl under the hood and retune some of
Berkeley DB’s parameters, you can. In fact, because all the source code is provided, if you want to crawl under the hood and
reverse-engineer the entire engine, you can do that, too.
Berkeley DB XML’s real power is its foundation: The Berkeley DB system is fast and rock-solid. Even better, all the extensions
available to Berkeley DB are instantly available to a Berkeley DB XML application. With free availability on a single-site
installation, plenty of examples, and source code, how can you go wrong?
The only thing I missed in Berkeley DB XML was some sort of query console so that I could easily experiment with XPath queries
and view the results. A Sleepycat engineer told me that the next release will include a sample program that
incorporates many of the features of a query console. I can’t wait.