InfoWorld

Berkeley DB adds XML smarts

Sleepycat’s Berkeley DB XML database library combines the sturdy Berkeley DB engine with XML document management

By Rick Grehan
April 09, 2004

Berkeley DB XML is a database library built on the venerable Berkeley DB engine. Sleepycat engineers built a layer atop Berkeley DB, extending that engine into a new one that provides XML document storage, management, and querying.

Berkeley DB XML

Sleepycat Software, sleepycat.com

Overall score: Very Good (8.5)

Criterion        Score   Weight
Ease of use      8       20%
Performance      9       20%
Scalability      8       20%
Implementation   9       15%
Setup            8       15%
Value            9       10%
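Assuming the overall score is the weighted average of the six criterion scores, the arithmetic checks out:

```latex
\begin{aligned}
\text{Overall} &= 0.20(8) + 0.20(9) + 0.20(8) + 0.15(9) + 0.15(8) + 0.10(9) \\
               &= 1.60 + 1.80 + 1.60 + 1.35 + 1.20 + 0.90 = 8.45 \approx 8.5
\end{aligned}
```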

Cost:
Free download for single-site use; licenses required for multiple-site use

Platforms:
Windows, Linux, Solaris

Bottom Line:
Berkeley DB XML is not an end-user tool. It’s a set of libraries for application builders, and those application builders won’t be disappointed by its sturdy, clever performance and open source characteristics. It’s hard to knock Berkeley DB off its feet, and Berkeley DB XML rides nicely on its shoulders.

So, Berkeley DB XML inherits transaction protection, multiple database access, deadlock detection, encryption, database sizes up to 25TB, and more from Berkeley DB. In addition, a single application employing the Berkeley DB XML engine can simultaneously access and freely mix XML databases and “normal” Berkeley DB databases.

I downloaded Version 1.2 of Berkeley DB XML, and explored the package’s Java personality. Berkeley DB XML provides bindings for a number of popular languages: C/C++, Java, Perl, Tcl, and Python (as of Python 2.3). It comes with all the necessary JAR (Java Archive) files and DLL native libraries to build a complete DB XML application, plus API documentation.

Finally — did I forget to mention? — Berkeley DB XML is an open-source product, so you get all the source code for the engine.

Containers and Queries

Berkeley DB XML stores everything in an abstract entity called a “container,” which is analogous to an RDBMS’s database. “Everything” in DB XML’s case is synonymous with “XML documents” because a document is the engine’s atom of persistence; you cannot store or otherwise manipulate pieces of documents. Behind the scenes, Berkeley DB XML converts each document to a string and stores each string as an individual record in the underlying Berkeley DB database.
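To make the "whole document as one record" model concrete, here is a minimal sketch in plain Java. It does not use Berkeley DB XML's own classes: `ToyContainer`, `putDocument`, and `getDocument` are hypothetical names, and a `LinkedHashMap` stands in for the underlying Berkeley DB database. The point is simply that the only unit you can put in or take out is a complete, serialized document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical toy container: each whole document is stored as one string record. */
final class ToyContainer {
    // Stand-in for the underlying key/value database: document name -> serialized XML.
    private final Map<String, String> records = new LinkedHashMap<>();

    /** Serializes the entire document and stores it as a single record. */
    void putDocument(String name, Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            records.put(name, out.toString());
        } catch (Exception e) {
            throw new IllegalStateException("cannot serialize " + name, e);
        }
    }

    /** Fetches one record and reparses it; this model has no way to fetch a fragment. */
    Document getDocument(String name) {
        String xml = records.get(name);
        if (xml == null) throw new IllegalArgumentException("no such document: " + name);
        return parse(xml);
    }

    int size() { return records.size(); }

    /** Parses an XML string into a DOM tree, wrapping checked parser exceptions. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }
}
```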

Defining containers and adding documents is reasonably simple. From a Java programming perspective, the learning curve is small: you need only work out which classes map to which entities within the engine and code the proper initialization steps. After that, you’re doing serious work.

Berkeley DB XML queries use the XPath 1.0 XML query language standard. The call into the query subsystem takes not only the XPath query itself, but the query’s context, which consists of the namespace, result type, query variables, and a flag indicating whether the query is “eager” or “lazy.” Eager queries assemble the entire result set before returning. Lazy queries don’t complete the query processing until code steps through the result set. Lazy queries are useful when the result set is large and the caller is unlikely to examine all of it.
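The eager/lazy distinction can be sketched in plain Java. This is a hypothetical illustration, not Berkeley DB XML's API: `ResultSets`, `eager`, and `lazy` are invented names, documents are modeled as a list, and the XPath query is reduced to a `Predicate`. The eager version tests every document before returning; the lazy version tests documents only as the caller asks for the next result.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Hypothetical sketch of eager vs. lazy result sets. */
final class ResultSets {
    /** Eager: evaluate the query against every document before returning anything. */
    static <T> List<T> eager(List<T> docs, Predicate<T> matches) {
        List<T> results = new ArrayList<>();
        for (T doc : docs) {
            if (matches.test(doc)) results.add(doc);
        }
        return results;
    }

    /** Lazy: evaluate only as far as needed to produce the next match. */
    static <T> Iterator<T> lazy(List<T> docs, Predicate<T> matches) {
        return new Iterator<T>() {
            private int pos = 0;              // next document to examine
            private T pending;                // match found by hasNext(), not yet returned
            private boolean hasPending = false;

            @Override public boolean hasNext() {
                while (!hasPending && pos < docs.size()) {
                    T candidate = docs.get(pos++);
                    if (matches.test(candidate)) {
                        pending = candidate;
                        hasPending = true;
                    }
                }
                return hasPending;
            }

            @Override public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                hasPending = false;
                return pending;
            }
        };
    }
}
```

If a caller reads only the first match, the lazy iterator never touches the remaining documents, which is where the savings on large result sets come from.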

Although database storage is completely document-based, queries return either whole documents or pieces of documents. Partial-document results can be difficult to untangle when the structure of your XML documents and the nature of the query cause it to return multiple pieces from within multiple documents.

Luckily, the Berkeley DB XML documentation suggests an iterative query tactic to avoid this: Program the first query to return a set of matching documents, then iterate through that set, re-issuing the query on individual documents, and examine the returned elements.
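Here is a minimal sketch of that two-pass tactic, using the JDK's standard `javax.xml.xpath` package as a stand-in for the engine's query call. The class name `IterativeQuery` and the map of name-to-XML-string "documents" are hypothetical. Pass one asks only whether each document matches; pass two re-issues the query against each matching document on its own, so every returned fragment stays tied to the document it came from.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical two-pass query: find matching documents, then query each one separately. */
final class IterativeQuery {
    /** Returns the text of matching nodes, grouped by the document they came from. */
    static Map<String, List<String>> run(Map<String, String> docs, String xpathExpr) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, List<String>> byDocument = new LinkedHashMap<>();
        try {
            // Pass 1: which documents contain at least one match?
            List<String> matchingDocs = new ArrayList<>();
            for (Map.Entry<String, String> e : docs.entrySet()) {
                Boolean hit = (Boolean) xpath.evaluate("boolean(" + xpathExpr + ")",
                        new InputSource(new StringReader(e.getValue())), XPathConstants.BOOLEAN);
                if (hit) matchingDocs.add(e.getKey());
            }
            // Pass 2: re-issue the query against each matching document on its own.
            for (String name : matchingDocs) {
                NodeList nodes = (NodeList) xpath.evaluate(xpathExpr,
                        new InputSource(new StringReader(docs.get(name))), XPathConstants.NODESET);
                List<String> values = new ArrayList<>();
                for (int i = 0; i < nodes.getLength(); i++) {
                    values.add(nodes.item(i).getTextContent());
                }
                byDocument.put(name, values);
            }
        } catch (XPathExpressionException ex) {
            throw new IllegalArgumentException("query failed: " + xpathExpr, ex);
        }
        return byDocument;
    }
}
```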

You can accelerate queries by defining indexes, and Berkeley DB XML has a flexible indexing scheme that lets you create indexes for elements (or “edges”, which are paths to elements, rather than the elements themselves) and define the index structure so that it’s optimal for the expected queries.
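As a rough illustration of indexing on paths rather than on element contents, the sketch below maps each element path (such as `/book/title`) to the set of documents that contain it. `PathIndex` is a hypothetical class, and this is a simplified model in the spirit of the paragraph above; the engine's actual index structures and index types are not modeled here.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical path index: element path -> names of documents containing that path. */
final class PathIndex {
    private final Map<String, Set<String>> index = new HashMap<>();

    /** Parses a document and records every element path it contains. */
    void addDocument(String name, String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed: " + name, e);
        }
        walk(root, "", name);
    }

    // Depth-first walk that builds paths like "/book/title" from the root down.
    private void walk(Element el, String parentPath, String docName) {
        String path = parentPath + "/" + el.getTagName();
        index.computeIfAbsent(path, k -> new TreeSet<>()).add(docName);
        for (Node child = el.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) walk((Element) child, path, docName);
        }
    }

    /** Answers "which documents contain this path?" without scanning any document. */
    Set<String> lookup(String path) {
        return Collections.unmodifiableSet(index.getOrDefault(path, Collections.emptySet()));
    }
}
```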

The engine’s query system maintains index statistics and performs cost-based analysis for query optimization. Often-repeated queries can be precompiled for even greater performance.
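The idea behind precompiling is to parse and prepare the query once, then reuse it. The JDK's `XPath.compile`, which returns a reusable `XPathExpression`, is a convenient analogy; Berkeley DB XML has its own API for this, which the sketch does not attempt to reproduce. `PrecompiledQuery` and `filter` are hypothetical names, and the cost-based optimizer is not modeled.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical wrapper: compile an XPath expression once, evaluate it many times. */
final class PrecompiledQuery {
    // Compiled once in the constructor. XPathExpression is not thread-safe,
    // so this sketch assumes it is used from a single thread.
    private final XPathExpression compiled;

    PrecompiledQuery(String expr) {
        try {
            compiled = XPathFactory.newInstance().newXPath().compile(expr);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expr, e);
        }
    }

    /** Returns the documents for which the precompiled expression is true. */
    List<String> filter(List<String> docs) {
        List<String> hits = new ArrayList<>();
        for (String xml : docs) {
            try {
                Boolean match = (Boolean) compiled.evaluate(
                        new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
                if (match) hits.add(xml);
            } catch (XPathExpressionException e) {
                throw new IllegalStateException("evaluation failed", e);
            }
        }
        return hits;
    }
}
```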

Meta Features

Because Berkeley DB XML manages documents, you would expect it to allow you to attach information that isn’t in the document’s content. Berkeley DB XML meets those expectations by allowing you to attach metadata in a clever way that leaves the document’s content unmolested.

When you define metadata for a document, the engine “reflects” that metadata as attributes into the document’s root element: From the query’s perspective, the metadata name/value pairs are simply XPath-queryable tag attributes.

But that perspective is an illusion — the document’s contents are unchanged. The attributes have actually been “snuck in” by the engine for the benefit of the query, so you can search for attributes without having to use a special syntax.
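The "reflection" trick can be sketched as follows: keep the stored content untouched, and at query time build a throwaway DOM of it with the metadata copied onto the root element as attributes. An ordinary XPath attribute test then finds metadata with no special syntax. `MetadataView` and its members are hypothetical, and letting metadata overwrite a same-named real attribute in the view is an arbitrary choice made for this sketch, not documented engine behavior.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical document with metadata kept apart from its content. */
final class MetadataView {
    final String content;                                  // stored content, never modified
    final Map<String, String> metadata = new LinkedHashMap<>();

    MetadataView(String content) { this.content = content; }

    /** Builds a fresh DOM of the content and reflects metadata onto its root element. */
    Document queryView() {
        try {
            Document view = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(content)));
            Element root = view.getDocumentElement();
            for (Map.Entry<String, String> m : metadata.entrySet()) {
                root.setAttribute(m.getKey(), m.getValue());   // only the view changes
            }
            return view;
        } catch (Exception e) {
            throw new IllegalStateException("cannot build query view", e);
        }
    }

    /** Evaluates a plain XPath expression against the view, as a boolean. */
    boolean matches(String xpathExpr) {
        Document view = queryView();
        try {
            return (Boolean) XPathFactory.newInstance().newXPath()
                    .evaluate(xpathExpr, view, XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + xpathExpr, e);
        }
    }
}
```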

Berkeley DB XML’s processing of a whole XML document as a single unit does create some side effects in the way documents are accessed. As you might imagine, you cannot delete portions of a document; you can only delete the whole thing. Consequently, modifying or deleting part of an XML document is really an update operation, and an update can only be done by reading the old document, modifying it in memory, deleting its image from the container, then re-storing the updated version.

Happily, Berkeley DB XML provides an update method that does all this dirty work for you invisibly. But if your application employs transactions or locks, you have to keep in mind that lock granularity is at the document level. It’s not possible, for instance, to lock an element within an XML document. This could affect performance if you craft an app such that a lot of locking is going on and users hold each other up.
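A minimal sketch of what a document-level update implies, assuming a simple in-memory store: changing any part of a document means read, modify in memory, delete, and re-store, all under a lock on the whole document. `DocumentStore`, `update`, and the per-name `ReentrantLock` map are hypothetical; they model the lock granularity described above, not Berkeley DB's actual locking subsystem.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.UnaryOperator;

/** Hypothetical store whose only unit of storage and of locking is a whole document. */
final class DocumentStore {
    private final Map<String, String> records = new ConcurrentHashMap<>();
    // One lock per document name; there is nothing finer-grained to lock.
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String name) {
        return locks.computeIfAbsent(name, k -> new ReentrantLock());
    }

    void put(String name, String xml) {
        ReentrantLock lock = lockFor(name);
        lock.lock();
        try { records.put(name, xml); } finally { lock.unlock(); }
    }

    String get(String name) {
        ReentrantLock lock = lockFor(name);
        lock.lock();
        try { return records.get(name); } finally { lock.unlock(); }
    }

    /**
     * Read the old version, modify it in memory, delete the old image, and store
     * the new one, all while holding the whole document's lock. Any other thread
     * touching the same document waits, even if it wants a different element.
     */
    void update(String name, UnaryOperator<String> edit) {
        ReentrantLock lock = lockFor(name);
        lock.lock();
        try {
            String old = records.get(name);
            if (old == null) throw new IllegalArgumentException("no such document: " + name);
            String updated = edit.apply(old);   // modify in memory
            records.remove(name);               // delete the old image
            records.put(name, updated);         // re-store the updated version
        } finally {
            lock.unlock();
        }
    }
}
```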

Unsung Magic

Possibly the greatest benefit of Berkeley DB XML is its masking of the Berkeley DB system complexity so that a programmer can easily add XML database capabilities to an application. Berkeley DB XML “pre-tweaks” Berkeley DB parameters for you, so you can go straight on to programming your app.

But while it hides these details, it does not make them unreachable. If you want to crawl under the hood and retune some of Berkeley DB’s parameters, you can. In fact, because all the source code is provided, if you want to crawl under the hood and reverse-engineer the entire engine, you can do that, too.

Berkeley DB XML’s real power is its foundation: The Berkeley DB system is fast and rock-solid. Even better, all the extensions available to Berkeley DB are instantly available to a Berkeley DB XML application. With free availability on a single-site installation, plenty of examples, and source code, how can you go wrong?

The only thing I missed in Berkeley DB XML was some sort of query console so that I could easily experiment with XPath queries and view the results. A Sleepycat engineer told me that the next release will include a sample program that incorporates many of the features of a query console. I can’t wait.

Rick Grehan is a contributing editor at InfoWorld. Contact him at rick_grehan@infoworld.com.