August 27, 2004

The human information filter

Sites like del.icio.us lead the way in the Internet’s grand experiment in information routing

In last week’s column, I mentioned del.icio.us, Joshua Schachter’s “social bookmarking” service. Since then, I’ve explored the service more deeply in a series of blog entries. Using del.icio.us, I’m now able to process information in dramatically more efficient ways. Let’s look at some of the reasons why.

For starters, del.icio.us is a machine-independent way to store bookmarks. From any Web page, you can use a del.icio.us bookmarklet to post the page’s URL, title, description, and a set of keywords or tags. From any computer, you can then recover the page by searching for text in the title or description or by navigating to it using one of its tags.

Dumping your own information into a service is always a concern. What if the service goes belly-up? You need an exit strategy, and del.icio.us provides exactly the right kind. A simple URL retrieves all your posts as an XML file. I now run a scheduled daily fetch of that URL, so that everything I add to del.icio.us is backed up locally.
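The daily backup can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the export is assumed to be a `posts` element containing `post` elements with `href`, `description`, and `tag` attributes, and the sample data, function names, and file-naming scheme are all hypothetical. The network fetch is left out; the sketch starts from XML text that a scheduled job has already retrieved.

```python
import datetime
import pathlib
import xml.etree.ElementTree as ET

# Hypothetical sample of an exported bookmark file (assumed layout).
SAMPLE_EXPORT = """<posts user="example">
  <post href="http://example.com/a" description="First page" tag="rss python" />
  <post href="http://example.com/b" description="Second page" tag="rss" />
</posts>"""


def parse_posts(xml_text):
    """Return one dict per <post> element; tags are split on whitespace."""
    root = ET.fromstring(xml_text)
    return [
        {
            "href": post.get("href"),
            "description": post.get("description", ""),
            "tags": post.get("tag", "").split(),
        }
        for post in root.iter("post")
    ]


def save_backup(xml_text, directory, day):
    """Write the export to a dated file, refusing XML that does not parse."""
    parse_posts(xml_text)  # raises xml.etree.ElementTree.ParseError if malformed
    directory = pathlib.Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"posts-{day.isoformat()}.xml"
    path.write_text(xml_text, encoding="utf-8")
    return path
```

Parsing before writing means a truncated or garbled download never overwrites the idea of a good backup with a bad one, and one file per day keeps a simple history.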

A clean exit strategy is obviously desirable. Less obvious but equally crucial is a robust entry strategy. How easily can you import your own data into the service? The test case here was an XML file with hundreds of my blog entries. Thanks to the simplicity of del.icio.us’ API, which is similar to REST (representational state transfer), the service passed the test with flying colors. After tagging the entries with keywords, I transformed the file into the set of URLs needed to populate my slice of the del.icio.us namespace. Suddenly, my blog entries and InfoWorld columns became navigable in a new and powerful way.
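The import step, turning a file of tagged entries into one URL per entry, might look something like the following. Everything specific here is an assumption made for illustration: the `entries`/`entry` layout of the source file, the `url`, `description`, and `tags` parameter names, and the endpoint, which is a placeholder and not the service's actual address.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Placeholder endpoint (hypothetical); substitute the real "add a post" URL.
ADD_ENDPOINT = "https://bookmarks.example.invalid/api/posts/add"


def entries_to_add_urls(xml_text, endpoint=ADD_ENDPOINT):
    """Build one 'add' URL per <entry>, skipping entries that have no url."""
    root = ET.fromstring(xml_text)
    urls = []
    for entry in root.iter("entry"):
        link = entry.get("url")
        if not link:
            continue  # nothing to bookmark without a target URL
        params = {
            "url": link,
            "description": entry.get("title", ""),
            "tags": entry.get("tags", ""),  # space-separated keywords
        }
        # urlencode percent-escapes each value so it is safe in a query string.
        urls.append(endpoint + "?" + urlencode(params))
    return urls
```

Each resulting URL could then be requested in turn, with whatever authentication and pacing the service requires.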

Of course, most blogging systems support categorized browsing. But I quit using my blog that way because I wasn’t interested in building a private taxonomy. A tag in del.icio.us is really a topic in a publish/subscribe network. When I assign a tag to an item, I’m routing the item to a topic. Anyone who subscribes to that topic using its RSS feed can monitor the items flowing to it.
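To make the publish/subscribe reading of tags concrete, here is a toy in-memory router. It is a sketch of the idea only, not a description of how del.icio.us is built; the class and method names are invented for the example, and a callback stands in for an RSS subscriber.

```python
from collections import defaultdict


class TagRouter:
    """Treat each tag as a topic: publishing an item routes it to subscribers."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, tag, callback):
        """Register callback(tag, item) for every item routed to this tag."""
        self._subscribers[tag].append(callback)

    def publish(self, item, tags):
        """Route item to each distinct tag; return the number of deliveries."""
        delivered = 0
        for tag in dict.fromkeys(tags):  # drop duplicate tags, keep their order
            for callback in self._subscribers.get(tag, ()):
                callback(tag, item)
                delivered += 1
        return delivered
```

Note that the publisher needs to know nothing about who is listening. Assigning the tag is the whole act of routing.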

If anyone can publish to a topic, won’t the signal-to-noise ratio degrade? Yes, but del.icio.us has another ace up its sleeve. For a given topic, you could subscribe to all items, but you might rather subscribe to postings only from people whose views on that topic you trust. On the topic of social software, for example, Clay Shirky and Sébastien Paquet are two observers who would make excellent filters.

In a March 2003 column, I wrote about the challenges of doing publish/subscribe at Internet scale. David Rosenblum, who was then CTO of messaging startup PreCache, had described to me an optimization procedure he called “filter merging.” The architecture of del.icio.us lends itself to just that kind of optimization. The combination of several trusted human filters, with respect to some topic of interest, yields a powerful merged filter.
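A merged human filter can be illustrated with a short function. This is not the filter-merging optimization Rosenblum described, whose details are not given here; it simply takes the union of what a set of trusted people posted under a topic and ranks items by how many of them agree. The post layout (`user`, `href`, `tags`) and the function name are assumptions.

```python
def merged_filter(posts, topic, trusted):
    """Combine trusted users' postings on one topic into a single ranked list.

    posts   -- iterable of dicts with "user", "href", and "tags" keys (assumed)
    topic   -- the tag of interest
    trusted -- set of user names whose judgment on this topic you trust
    Returns (href, [users]) pairs, most-endorsed first, ties broken by href.
    """
    endorsers = {}
    for post in posts:
        if post["user"] in trusted and topic in post["tags"]:
            endorsers.setdefault(post["href"], set()).add(post["user"])
    ranked = sorted(endorsers.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(href, sorted(users)) for href, users in ranked]
```

Items that several trusted filters pass independently rise to the top, which is one simple way the combination can be stronger than any single filter.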

Nothing about del.icio.us is rocket science. A competent developer could re-create the service in short order. And that’s one of its greatest strengths. We’re all becoming information routers, but we’re still discovering how the process needs to work. To run the experiment, we’ll need flexible and lightweight systems that are easy to implement, join, use, and build on. Joshua Schachter has shown how to build the right kind of laboratory.


©1994-2009 Infoworld, Inc.