InfoWorld

XML databases evolve

Open source Apache Xindice, Berkeley DB XML set solid base for content management

By Rick Grehan
May 23, 2005
 

XML, and XML databases in particular, have lately drawn attention across the wide universe of databases and content management. A good way to gauge the state of XML-based content management technology is to examine developments at the ground floor: the XML database libraries that form a base for larger content management applications.

Two such libraries are the targets of this review: the Apache Software Foundation’s Xindice and Sleepycat’s Berkeley DB XML. Both are open source, both are free (although the nature of “free” differs between them), and both provide standards-compliant XML document manipulation. In addition, both are powerful developer tools that place eye-opening XML document storage, query, and retrieval capabilities into the hands of eager programmers.

Apache Xindice 1.0
Apache Xindice began as the dbXML Core project, but the fruit of that labor transferred to the Xindice group sometime after 2001. Xindice’s documentation makes no bones about its intended audience: It will be of interest only to developers in need of a solution for storing and manipulating XML data.

Likewise, the Xindice Web site is clear about the package’s limitations; unlike Berkeley DB XML, Xindice does not deal well with large XML documents. Small-to-moderate documents are best for Xindice, although there’s no precise definition of a “small-to-moderate” XML document -- a megabyte or smaller is probably in the ballpark.

Installation is simple and deposits on your system the Xindice server executable, a command-line tool, documentation, source, and a number of examples. Xindice is written entirely in Java, so you’ll need JDK 1.3 or later installed to run the Xindice JAR (Java Archive) file.

The programming interface -- the XML:DB API -- is Java as well, but Xindice does not limit itself to the Java language. It is built on a client-server architecture and supports the XML-RPC API, so remote Java clients can access the server, as can clients written in other programming languages.
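
To make the language-neutral point concrete, here is a minimal sketch of what an XML-RPC request looks like on the wire, using Python's standard xmlrpc.client module. The method name example.queryCollection and its two parameters are hypothetical placeholders, not Xindice's actual XML-RPC methods; consult the Xindice documentation for the real ones.

```python
import xmlrpc.client

# Hypothetical call: a collection path plus an XPath expression.
# "example.queryCollection" is an invented method name for illustration only.
params = ("/db/addressbook", "//person[name='Ann']")

# Encode the call as an XML-RPC <methodCall> document. Any language with an
# XML-RPC library can produce the same payload and send it over HTTP.
payload = xmlrpc.client.dumps(params, methodname="example.queryCollection")

# Decode it again, as a server would, to confirm the round trip is lossless.
decoded_params, decoded_method = xmlrpc.client.loads(payload)

print(payload)
```

Because the request is plain XML over HTTP, nothing in it ties the client to Java, which is why clients in other languages can reach the same server.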

Xindice arranges its storage in the form of “collections,” and all collections exist within a root instance, “/db.” Think of collections as subfolders in file systems; collections contain “subcollections” to an arbitrary depth. The “files” in this analogy are the actual XML documents. Querying and updating are typically applied collectionwide, although you can adjust the granularity to manipulate individual documents.
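
The folder analogy can be sketched in a few lines of Python. This is a toy model of the idea only, not Xindice's storage engine or API; the Collection class, the resolve helper, and the addressbook names are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Toy stand-in for a collection: XML documents plus nested subcollections."""
    name: str
    documents: dict = field(default_factory=dict)       # document key -> XML text
    subcollections: dict = field(default_factory=dict)  # name -> Collection

    def child(self, name):
        """Return the named subcollection, creating it on first use."""
        return self.subcollections.setdefault(name, Collection(name))


def resolve(root, path):
    """Walk a path such as '/db/addressbook/friends' down from the root."""
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] != root.name:
        raise KeyError(path)
    node = root
    for part in parts[1:]:
        node = node.subcollections[part]  # KeyError if the collection is missing
    return node


# Everything hangs off a single root, mirroring "/db" in the text.
db = Collection("db")
friends = db.child("addressbook").child("friends")
friends.documents["ann"] = "<person><name>Ann</name></person>"
db.child("addressbook").documents["index"] = "<index/>"
```

Here resolve(db, "/db/addressbook/friends") returns the same object as friends, just as a file-system path names one folder.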

Command-line control
Xindice’s command-line tool is a godsend for new users. Experimenting with it provides an excellent introduction to Xindice’s capabilities and will give you a good feel for the programming API when it’s time to turn your attention to development. The command-line tool is also useful for jump-starting your database. The tool creates new collections, feeds XML documents into the collections, and even feeds whole subdirectory hierarchies into Xindice (in which case the subfolders appear in the database as subcollections).

Xindice uses XPath for querying collections and XUpdate for updating them. It would be nice if XQuery were supported, as it provides for much richer querying, but for now XQuery support is an entry on the Xindice team’s to-do list. The command-line tool is a great way to test out XPath and XUpdate expressions, but as of this writing the documentation for it is incomplete and leads one to erroneously conclude that XUpdate is not supported.
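
As a rough illustration of a collectionwide XPath query, the sketch below runs one expression against every document in a small in-memory "collection." It uses Python's standard xml.etree.ElementTree, which supports only a subset of XPath and has nothing to do with Xindice itself; the sample documents and the query_collection helper are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# A stand-in for a collection: document key -> XML text (sample data).
collection = {
    "ann": "<person><name>Ann</name><phone type='home'>555-0100</phone></person>",
    "bob": "<person><name>Bob</name><phone type='work'>555-0199</phone></person>",
}


def query_collection(docs, xpath):
    """Apply one XPath expression to every document; keep only documents that match."""
    results = {}
    for key, text in docs.items():
        root = ET.fromstring(text)
        matches = [element.text for element in root.findall(xpath)]
        if matches:
            results[key] = matches
    return results


# Collectionwide query: which documents contain a home phone number?
home_phones = query_collection(collection, "./phone[@type='home']")

# Narrower granularity: query a single document by passing only that entry.
bob_name = query_collection({"bob": collection["bob"]}, "./name")
```

The same expression is evaluated per document and the hits are grouped by document key, which is the general shape of a collectionwide query result.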

A number of sample Java programs are buried in an examples subfolder, with run scripts thoughtfully provided. A rather large Addressbook Web application is also included, although you must have an installation of Tomcat to run it. Here, as with the Xindice documentation, everything is a bit rough around the edges, and you must be willing to work your way through some mazes to avoid the occasional blind alley.

On the security front, you can password-protect a Xindice database, and it’s also thread safe, so multiple clients can connect without worry. However, there is no transaction support built into Xindice; it is an optional package in the XML:DB API and may be added to the server in the future.

Xindice is an Apache project, so it progresses at a speed governed by the enthusiasm of its participants. In some cases this is remarkably prompt. But the process is inherently somewhat stochastic, so there are no guarantees concerning when important modifications or additions (such as handling larger XML files) will be made. What I’ve seen so far, however, will have me keeping a hopeful eye on the project.

Sleepycat Berkeley DB XML 2.0
Sleepycat recently released Version 2.0 of its DB XML database (see our review of an earlier edition at infoworld.com/1529). Berkeley DB XML sits on top of the venerable Berkeley DB database and inherits Berkeley DB’s transaction support, crash recovery, deadlock detection, encryption, and other features. In fact, you can freely intermix DB XML databases and “ordinary” Berkeley DB databases in the same application without having to link additional libraries into that application.

Berkeley DB XML is an open source tool, although there are licensing restrictions that vary depending on how you use and distribute applications built from the tool (details available at sleepycat.com).


Apache Xindice 1.0

Apache Software Foundation, apache.org/xindice

Very Good: 8.6

Criteria          Score  Weight
Interoperability  9      20%
Performance       8      20%
Scalability       9      20%
Setup             9      20%
Documentation     7      10%
Value             9      10%

Cost:
Free

Platforms:
Linux, Windows (NT or better), Solaris, Mac OS X

Bottom Line:
Another fine product from Apache. Xindice is most easily approached by Java programmers, though other languages can be used with some work. Not as versatile as Berkeley DB XML, since it doesn't yet support XQuery, and the documentation needs work, but Xindice should improve with time.

About our Reviews and Scoring Methodology



Sleepycat Berkeley DB XML 2.0

Sleepycat Software, sleepycat.com

Excellent: 9.3

Criteria          Score  Weight
Interoperability  10     20%
Performance       9      20%
Scalability       9      20%
Setup             9      20%
Documentation     9      10%
Value             10     10%

Cost:
Free

Platforms:
Supports all major OSes

Bottom Line:
Berkeley DB XML is a killer XML database library running a layer above the bulletproof Berkeley DB. Sleepycat has improved this latest version greatly, adding XQuery support, the ability to manage large files with per-node storage, and new documentation to flatten the learning curve.
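
The overall ratings in both scorecards are consistent with a simple weighted average of the criteria scores. The sketch below checks that arithmetic; it assumes the weighted-average formula, which the published numbers match but which is not spelled out in the scorecards themselves.

```python
# Weights as whole percentages, copied from the scorecards.
WEIGHTS = {
    "Interoperability": 20,
    "Performance": 20,
    "Scalability": 20,
    "Setup": 20,
    "Documentation": 10,
    "Value": 10,
}

XINDICE = {
    "Interoperability": 9, "Performance": 8, "Scalability": 9,
    "Setup": 9, "Documentation": 7, "Value": 9,
}

BERKELEY_DB_XML = {
    "Interoperability": 10, "Performance": 9, "Scalability": 9,
    "Setup": 9, "Documentation": 9, "Value": 10,
}


def weighted_score(scores, weights=WEIGHTS):
    """Weighted average of the criteria scores (assumed formula)."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must total 100%")
    # Sum in integers first, then divide once, to avoid float drift.
    return sum(scores[name] * weight for name, weight in weights.items()) / 100


xindice_total = weighted_score(XINDICE)           # 8.6
berkeley_total = weighted_score(BERKELEY_DB_XML)  # 9.3
```

The 0.7-point gap comes from four criteria: interoperability and performance contribute 0.2 each, documentation 0.2, and value 0.1.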

Rick Grehan is a contributing editor at InfoWorld. Contact him at rick_grehan@infoworld.com.
 
