The two-way data web

Atom and RSS are evolving into tools for creating loosely coupled databases at Internet scale

Two years ago, I gave the keynote address on the opening day of XML 2003. The next day, Adam Bosworth delivered a weirdly complementary keynote, in which he began to lay out an idea he’s been developing ever since, first at BEA and now at Google. The idea, in a nutshell, is that the truly scalable databases of the future will be more like the Web than like Oracle, DB2, or SQL Server.

In last month’s ACM Queue, Bosworth elaborated on some of the lessons the Web has taught us about simplicity, human accessibility, “sloppily extensible” formats, the social dimension of software, and loose coupling. But he also introduced a key technical point about RSS and Atom, the feed formats powering the blog revolution. These formats represent sets of items. Typically, the items contain Weblog postings, but they can also contain XML fragments that represent anything under the sun. What’s more, items can link to other items or collections. Bosworth argues that this architecture lends itself to aggressive scale-out, decentralized caching, and grassroots schema evolution, all of which tend to elude conventional databases.
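To make the "sets of items" idea concrete, here is a minimal sketch in Python of a feed whose entries carry a non-blog XML fragment and a link to another collection. The `inv` namespace, the SKU data, and the `example.org` URLs are invented for illustration; only the Atom namespace and the standard library's `xml.etree.ElementTree` calls are real.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
INV = "http://example.org/inventory"  # hypothetical payload namespace

# A feed treated as a small table: each entry is a row, and the
# inv:item fragment is the row's structured payload.
FEED = f"""<feed xmlns="{ATOM}" xmlns:inv="{INV}">
  <title>Warehouse items</title>
  <entry>
    <id>urn:example:item:1</id>
    <title>Widget</title>
    <link rel="related" href="http://example.org/feeds/suppliers"/>
    <inv:item sku="W-100"><inv:qty>12</inv:qty></inv:item>
  </entry>
  <entry>
    <id>urn:example:item:2</id>
    <title>Gadget</title>
    <inv:item sku="G-200"><inv:qty>3</inv:qty></inv:item>
  </entry>
</feed>"""

def read_items(feed_xml):
    """Return one dict per entry, pulling fields out of the embedded fragment."""
    ns = {"a": ATOM, "inv": INV}
    root = ET.fromstring(feed_xml)
    rows = []
    for entry in root.findall("a:entry", ns):
        item = entry.find("inv:item", ns)
        rows.append({
            "id": entry.findtext("a:id", namespaces=ns),
            "sku": item.get("sku"),
            "qty": int(item.findtext("inv:qty", namespaces=ns)),
            # Links are how one item points at other items or collections.
            "links": [link.get("href") for link in entry.findall("a:link", ns)],
        })
    return rows

rows = read_items(FEED)
```

Nothing in the feed format itself knows about inventory; the payload schema rides along inside each entry, which is roughly what makes grassroots schema evolution plausible.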

There’s no free lunch, of course. When you query this RSS/Atom data web, you should expect more structural precision than full-text search affords, but you shouldn’t plan on fast execution of complex nested queries.

We’ve yet to colonize the middle ground between these extremes, and I don’t think anyone really knows what the sweet spot will turn out to be. I’ve gotten plenty of mileage out of XPath and XQuery, and my dream is that these XML-oriented query disciplines can be federated at large scale. But first things first: We need to create the data web. And recently, two leading figures have dropped major hints about how that’s going to happen.
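As a rough illustration of what federated structural querying might look like at toy scale, the sketch below applies one path expression to several feeds and merges the hits. It is only a sketch under stated assumptions: the feeds are in-memory strings standing in for fetched documents, the `inv` namespace and its `status` attribute are hypothetical, and Python's `ElementTree` implements only a small subset of XPath, not full XPath or XQuery.

```python
import xml.etree.ElementTree as ET

NS = {
    "a": "http://www.w3.org/2005/Atom",
    "inv": "http://example.org/inventory",  # hypothetical payload namespace
}

def make_feed(*items):
    """Build a tiny Atom feed from (title, sku, status) tuples."""
    entries = "".join(
        f'<entry><title>{title}</title>'
        f'<inv:item sku="{sku}" status="{status}"/></entry>'
        for title, sku, status in items
    )
    return f'<feed xmlns="{NS["a"]}" xmlns:inv="{NS["inv"]}">{entries}</feed>'

def query_feeds(feeds, path):
    """Run the same path expression against every feed and merge the results."""
    hits = []
    for source, xml_text in feeds.items():
        root = ET.fromstring(xml_text)
        for item in root.findall(path, NS):
            hits.append((source, item.get("sku")))
    return hits

# Two independently published feeds that happen to share a payload shape.
feeds = {
    "east": make_feed(("Widget", "W-100", "ok"), ("Gadget", "G-200", "low")),
    "west": make_feed(("Sprocket", "S-300", "low")),
}

# Structural match on an attribute, something full-text search cannot express.
low_stock = query_feeds(feeds, "a:entry/inv:item[@status='low']")
```

The query is more precise than a text search for the word "low", yet it makes no attempt at joins or nested subqueries across feeds, which is the trade-off described above.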

The first was Bill Gates, who, in a September interview, told me, “the RSS data web is a natural development coming out of the acceptance of XML ... and we’ve got some ideas internally ... about making RSS work two-way.”

Historically, RSS has been a read-mostly affair. There are APIs through which blogging tools can inject content into publishing systems, which then reflect it back out as XML feeds. But while the blogosphere has at last realized the vision of a two-way Web, RSS as a data transport remains largely asymmetric. Microsoft evidently wants to change that.

The second and much more explicit hint appeared a month later in Adam Bosworth’s ACM Queue article. Atom is both a feed format and a publishing protocol. The latter, Bosworth noted, is “a simple HTTP-based way to INSERT, DELETE, and REPLACE” entries within a feed.
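The INSERT, DELETE, and REPLACE operations line up with ordinary HTTP verbs: POST to a collection, DELETE on an entry's URI, and PUT to an entry's URI. The sketch below only constructs the requests with Python's standard `urllib.request.Request` and never sends them. The collection URI, entry URI, and helper names are assumptions made up for illustration, not any particular server's API.

```python
from urllib.request import Request
from xml.sax.saxutils import escape

COLLECTION = "http://example.org/feeds/items"  # hypothetical collection URI
ENTRY_TYPE = "application/atom+xml;type=entry"

def entry_xml(title, body):
    """Serialize a minimal Atom entry, escaping text for XML."""
    return (
        '<entry xmlns="http://www.w3.org/2005/Atom">'
        f"<title>{escape(title)}</title>"
        f"<content>{escape(body)}</content>"
        "</entry>"
    ).encode("utf-8")

def insert(collection_uri, title, body):
    """INSERT: POST a new entry to the collection."""
    return Request(collection_uri, data=entry_xml(title, body),
                   headers={"Content-Type": ENTRY_TYPE}, method="POST")

def replace(entry_uri, title, body):
    """REPLACE: PUT a full new representation to the entry's own URI."""
    return Request(entry_uri, data=entry_xml(title, body),
                   headers={"Content-Type": ENTRY_TYPE}, method="PUT")

def delete(entry_uri):
    """DELETE: remove the entry at its URI."""
    return Request(entry_uri, method="DELETE")

# Requests are built but not sent; passing one to urllib.request.urlopen
# would perform the actual write against a real server.
ops = [
    insert(COLLECTION, "Widget", "12 in stock"),
    replace(COLLECTION + "/1", "Widget", "11 in stock"),
    delete(COLLECTION + "/1"),
]
```

Because each operation is a plain HTTP request on a URI, the same caches, proxies, and authentication schemes that serve the read side of the Web can, in principle, serve the write side too.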

Microsoft developer/blogger Dare Obasanjo responded with a question. “Perhaps,” he asked, “this Atom store, accessible via Atom feeds and the Atom API, is [the rumored] Google Base?” I’d say that’s a good guess.

We’ll surely see more squabbling within the already fragmented world of lightweight XML syndication. But while the RSS feed format won the first round, and I suspect the Atom API will win the next one, don’t take your eye off the ball. This game isn’t about formats and APIs; it’s about the emergence of a data web made of loosely coupled sets of XML fragments that people and processes can easily read and write. Bring it on!