Playing with data using XML

ECMAScript for XML provides the latest method of using XML to mold data

Last December at XML 2003, I met with Jonathan Robie, one of my XML heroes. You’ll find his name on such core XML specifications as DOM, XML Schema, and XQuery. Jonathan is both a co-inventor of XQuery and, as XML program manager at DataDirect, a leading implementer of that standard. So he gets to combine theory with practice — nice work, if you can get it.

When we sat down to lunch, Jonathan’s comment on the keynote talk I’d given the previous day was: “You like to play with data.” Guilty as charged. My slides, for example, were XHTML rather than PowerPoint. Now that’s not unusual at an XML conference, but I did score some points when, on one of the slides, I clicked a link to reveal a JavaScript-based local search engine that performed a structured search of the contents of the slideshow. Using XPath expressions, I was able to search for images, external links, and quotes from Bill Gates and Tim Bray. Try it for yourself.
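For readers who want a feel for that kind of structured search, here is a minimal sketch in Python, not the JavaScript the slideshow actually used. The slide markup, the use of a `cite` attribute to hold a speaker's name, the placeholder quote text, and the `search` helper are all assumptions made up for illustration. It relies on the limited XPath subset supported by the standard library's ElementTree.

```python
import xml.etree.ElementTree as ET

# Namespace map so XPath expressions can address XHTML elements.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical two-slide deck; the quote text is a placeholder, not a real quote.
SLIDES = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="slide">
      <h1>Fingerpainting on the universal canvas</h1>
      <blockquote cite="Bill Gates">Placeholder quote text one.</blockquote>
      <img src="canvas.png" alt="canvas"/>
    </div>
    <div class="slide">
      <h1>Links</h1>
      <a href="http://example.org/">external</a>
      <a href="#top">internal</a>
      <blockquote cite="Tim Bray">Placeholder quote text two.</blockquote>
    </div>
  </body>
</html>"""


def search(root, path):
    """Return all elements matching an XPath expression (hypothetical helper)."""
    return root.findall(path, XHTML_NS)


root = ET.fromstring(SLIDES)

# Every image anywhere in the deck.
images = search(root, ".//x:img")

# Quotes attributed to one speaker, selected with an attribute predicate.
quotes_by_gates = search(root, ".//x:blockquote[@cite='Bill Gates']")

# ElementTree's XPath subset has no starts-with(), so external links
# are filtered in plain Python after selecting all anchors with an href.
external_links = [
    a for a in search(root, ".//x:a[@href]")
    if a.get("href").startswith("http")
]

print(len(images), len(quotes_by_gates), len(external_links))
```

The point of the sketch is only that one query mechanism covers images, links, and quotes alike, because the slides are structured data and not opaque pages.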

The title of the slide that contained my little search engine was “Fingerpainting on the universal canvas.” And not coincidentally, the two Gates quotes that it found were expressions of that noble concept, one from a 1990 Comdex keynote and one from the 2000 .Net announcement. It’s actually Bill’s fault that I’m obsessed with the notion of data as Play-Doh: a tangible substance that we can squeeze, stretch, and explore directly. Microsoft hasn’t consistently followed through on that vision, but I’m sure it’s correct, and I’m always on the lookout for technologies that can help make it real.

A universal canvas requires a universal way to represent data. Other solutions are conceivable, but let’s accept for now that XML is a reasonable one, and that it’s here to stay. How does XML become Play-Doh? The first answer, for me, was Perl with its XML::Parser module. Then came XSLT (XSL Transformations), which traded away procedural idioms to gain declarative transformational power. Then came Python with its libxml/libxslt modules, which married the procedural and declarative styles in a highly interactive way. That’s been my weapon of choice lately, but now there’s a new contender: E4X (ECMAScript for XML).

With E4X, XML becomes a native programming-language data type. The language is ECMAScript, popularly known as JavaScript (in browsers) and ActionScript (in Macromedia’s Flash player). I first heard about E4X from Adam Bosworth, who used it to manage the intelligent browser cache that was part of his Alchemy project. (Now that Bosworth has moved from BEA to Google, by the way, the future of that nascent open source project is uncertain.)

Meanwhile, the first generally available implementation of E4X has been built into the latest version of Rhino, the Mozilla project’s Java-based JavaScript engine. It was demonstrated to me recently by John Schneider, the chief technologist at AgileDelta and lead editor of the E4X specification. I was immediately inspired to try it out, and you can see some sample experiments on my Weblog. I’ll spare you the geeky details, but here’s the gist: More than any other technology, E4X makes XML data feel like Play-Doh in the hands of a programmer. That’s not the endgame (I won’t be satisfied until users can reach into their XML data and make new things with it), but it’s a step in the right direction.

Copyright © 2004 IDG Communications, Inc.