May 04, 2005

Paving the information footpaths

Information systems should adapt to our usage patterns, but making that happen is no easy task

I’m sure there are dozens of versions of this story, but I heard it from Larry Wall, the father of Perl, and it goes like this: Instead of laying down sidewalks, the builders of a new university campus waited for footpaths to emerge on the lawns. Then they paved the footpaths. Larry designed Perl around this idea of structure emerging from use, but that was an unusual case. We typically lay down the sidewalks first, and when footpaths emerge we profess surprise or try to ignore them.

I recently learned, for example, that developers have found an unexpected use for the new XML data type in SQL Server 2005 (code-named Yukon). Although Yukon embeds the .NET CLR (Common Language Runtime) and stores CLR objects in database columns, people have also been serializing CLR object data and storing it as XML. It’s true that there’s an 8KB limit on stored CLR objects, but that’s not the only reason folks are coloring outside the lines. They’re also escaping constraints that make it hard to evolve data structures in response to patterns of use.

Another example comes from a recent conversation with John Schneider of AgileDelta, who analyzed message traffic flowing through military systems during a four-year period. Although the messages were in theory governed by schemas, in practice nearly all of them extended or deviated from those schemas. Of course that didn’t prevent soldiers from calling in air strikes. A system that simply failed on receipt of an invalid message could not survive in the fog of war.
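One way to picture a receiver that survives schema drift is the "tolerant reader" sketched below. It is only an illustration: the message layout, the field names in KNOWN_FIELDS, and the read_message helper are hypothetical and are not drawn from the systems Schneider analyzed.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the fields this receiver knows how to act on.
KNOWN_FIELDS = {"target", "priority"}

def read_message(xml_text):
    """Tolerant reader: use recognized fields, keep the rest, report gaps."""
    root = ET.fromstring(xml_text)
    known, extras = {}, {}
    for child in root:
        # Unrecognized elements are preserved rather than treated as errors.
        bucket = known if child.tag in KNOWN_FIELDS else extras
        bucket[child.tag] = (child.text or "").strip()
    # Fields the schema expects but the sender left out.
    missing = KNOWN_FIELDS - set(known)
    return known, extras, missing

if __name__ == "__main__":
    msg = "<message><target>grid 12</target><callsign>Alpha</callsign></message>"
    print(read_message(msg))
```

The receiver acts on what it understands and records both the extensions and the omissions, which is also the raw material for noticing where the schema ought to change.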

We see this kind of example everywhere on the Web. Many if not most Web pages are malformed. If browsers had required correct HTML, the Web would have been stillborn. Similarly, RSS, arguably the most popular application of XML, has no schema. If XML parsers had required schema validation in addition to well-formedness, the blogosphere never would have emerged.
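The difference between a strict parser and a forgiving one is easy to see in a few lines. The sketch below uses two parsers from the Python standard library as stand-ins; the malformed sample page and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical malformed page: the <p> and <b> tags are never closed.
MALFORMED = "<html><body><p>First paragraph<p>Second <b>bold</body></html>"

class TextCollector(HTMLParser):
    """Lenient parser: gathers text and does not raise on unclosed tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

def strict_parse_ok(markup):
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

def lenient_text(markup):
    """Extract whatever text the forgiving parser can find."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return parser.chunks

if __name__ == "__main__":
    print(strict_parse_ok(MALFORMED))  # the XML parser rejects the page
    print(lenient_text(MALFORMED))     # the HTML parser still yields its text
```

Note that even the strict path here checks only well-formedness. Nothing in it validates against a schema, which is the same bargain that let RSS spread.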

We’re learning to tolerate these sloppy practices, and even to appreciate them, but we haven’t really begun to work shoulder-to-shoulder with them. Here are two strategies for doing that: opportunistic enhancement and statistical classification.

My recent XQuery experiments illustrate the idea of opportunistic enhancement. Virtually all the blogs I read can be converted from HTML to XHTML. In that form they can be pumped into an XML database and be queried in a structured way. If the structure implicit in content is revealed and made useful, we might kick off a virtuous cycle.
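To make "queried in a structured way" concrete, here is a rough sketch. It swaps in Python's ElementTree and its limited XPath support for the XML database and XQuery used in my experiments, and the sample entry, URLs, and function names are invented for the example.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace; "x" is an arbitrary local prefix.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical blog entry, already converted from HTML to well-formed XHTML.
ENTRY = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <h1>Paving the footpaths</h1>
    <p>See <a href="http://example.org/perl">Perl</a> and
       <a href="http://example.org/xquery">XQuery</a>.</p>
    <p>No links here.</p>
  </body>
</html>"""

def links_in(xhtml):
    """Return (href, link text) pairs for every anchor in the document."""
    root = ET.fromstring(xhtml)
    return [(a.get("href"), a.text) for a in root.iterfind(".//x:a", XHTML_NS)]

def paragraphs_with_links(xhtml):
    """Count paragraphs that directly contain at least one anchor."""
    root = ET.fromstring(xhtml)
    return sum(
        1
        for p in root.iterfind(".//x:p", XHTML_NS)
        if p.find("x:a", XHTML_NS) is not None
    )

if __name__ == "__main__":
    print(links_in(ENTRY))
    print(paragraphs_with_links(ENTRY))
```

Once the content is well-formed, questions like "which entries link to this site?" become queries rather than text searches, and that is the payoff that might encourage authors to expose more structure.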

The social tagging systems are a laboratory in which techniques of statistical classification will be explored. As Clay Shirky has pointed out, the terms “movies,” “film,” and “cinema” are not just synonyms; they encode real cultural differences. A taxonomy that stamps out those differences won’t serve the various constituencies. We can still build systems around taxonomies, but we have to let the footpaths emerge, and in this realm they’re just fuzzy statistical traces.
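A fuzzy statistical trace can be as simple as measuring how much two tags overlap in the items they label. The sketch below uses Jaccard similarity as one possible measure; the bookmark data is fabricated for illustration and the function names are hypothetical, not taken from any tagging service.

```python
from collections import defaultdict

# Hypothetical bookmarks: item id -> the set of tags users applied to it.
BOOKMARKS = {
    "item1": {"movies", "film"},
    "item2": {"movies", "blockbuster"},
    "item3": {"film", "cinema"},
    "item4": {"cinema", "arthouse"},
    "item5": {"movies", "film"},
}

def items_by_tag(bookmarks):
    """Invert the data: tag -> set of items carrying that tag."""
    index = defaultdict(set)
    for item, tags in bookmarks.items():
        for tag in tags:
            index[tag].add(item)
    return index

def jaccard(a, b):
    """Share of items the two sets have in common, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def tag_similarity(bookmarks, tag1, tag2):
    """Overlap between the items labeled tag1 and those labeled tag2."""
    index = items_by_tag(bookmarks)
    return jaccard(index[tag1], index[tag2])

if __name__ == "__main__":
    for pair in [("movies", "film"), ("film", "cinema"), ("movies", "cinema")]:
        print(pair, tag_similarity(BOOKMARKS, *pair))
```

In this toy data the three terms overlap to different degrees instead of collapsing into one category. A system built on such scores can suggest related tags without declaring them identical.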

It’s easy to criticize information systems that fail to embrace sloppiness. It’s much harder to explain how they should embrace it. Sloppiness is only a means to an end. In order to make things work and get things done, we need to codify patterns of use. It’s a catch-22, though. The right patterns don’t emerge from systems that people won’t use. How we reconcile specification with emergence isn’t an engineering discipline, but it probably should be.

©1994-2009 Infoworld, Inc.