December 13, 2006

Social context for data analysis

Online tools are unlocking the inherently social nature of public data analysis

I’m a huge fan of the CAPStat (formerly DCStat) program. At InfoWorld’s SOA Executive Forum this fall, I taped a video interview with Dan Thomas, whose innovative efforts led to the Web release of a set of data feeds from the office of Washington, D.C.’s CTO. The feeds cover such areas as real estate, reported crime, licensing, and service requests. Earlier I published a podcast and a column on this topic. But despite my cheerleading, the hoped-for citizen-led mashups haven’t yet materialized in a big way.

In principle, the data is there for the taking, and there’s an open invitation for anyone to scoop it up and do useful analysis. In practice, only half the battle is won. That half we owe to the immediate availability of data represented as RSS, Atom, and the district’s own, richer flavor of XML. It’s great to lay your hands on the data, but as Bob Glushko rightly insists on reminding me, XML only seems to be a self-describing format. What do tags or field names really mean? Which elements or fields are or are not comparable? We can only answer these questions by pointing to instances of data (records, documents), discussing them, and coming to agreements.

Lately, I’m seeing some intriguing glimpses of how that process could work productively on the Web. One stunning example is the short clip that I extracted from October’s Dabble DB screencast. The clip shows how Dabble DB enables you to pluck data right from the surface of a Web page and inject it into a shareable Web database. Once it’s there, the whole panoply of Web-2.0-style techniques — linking, tagging, blogging — can support a loosely coupled conversation about the provenance and the semantics of the data.

Today I found another piece of the puzzle — a new site called Swivel. It’s done in the standard Web 2.0 style, complete with regulation Flickr-blue search buttons and Ruby on Rails URL syntax. To tell you the truth, I’m not sure how useful it’ll turn out to be. But the idea at the core of Swivel — inviting people to publish, annotate, and share datasets — is spot on.

As a first experiment, I grabbed the CAPStat reported-crime feed for November, sucked it into Excel 2003, consolidated incidents by day, pivoted them on type of offense (homicide, burglary), and exported them back out as a CSV (comma-separated values) file that Swivel could import. The service immediately produced a chart for each of the nine crime types in my data set. Eventually the site will “swivel” my data, a process of further analysis that it assures me will be “worth the wait.” I dunno, maybe, but I’m not holding my breath. Poking around, I haven’t found any breathtaking examples of mechanical insight.
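
For anyone who would rather script the consolidate-and-pivot step than do it in Excel, here is a minimal Python sketch of the same idea. It is not what I actually ran. It assumes the feed has already been reduced to (date, offense) pairs; the function names and the sample records are made up for illustration and do not reflect the real CAPStat schema or data.

```python
import csv
import io
from collections import Counter


def pivot_by_day(incidents):
    """Count incidents per day and offense type.

    `incidents` is an iterable of (date, offense) pairs (an assumed shape,
    not the actual CAPStat feed layout). Returns a header row and one row
    per day, with one column per offense type.
    """
    counts = Counter((day, offense) for day, offense in incidents)
    days = sorted({day for day, _ in counts})
    offenses = sorted({offense for _, offense in counts})
    header = ["date"] + offenses
    # A Counter returns 0 for day/offense combinations that never occurred.
    rows = [[day] + [counts[(day, o)] for o in offenses] for day in days]
    return header, rows


def to_csv(header, rows):
    """Serialize the pivoted table as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample records, for illustration only.
sample = [
    ("2006-11-01", "burglary"),
    ("2006-11-01", "burglary"),
    ("2006-11-01", "homicide"),
    ("2006-11-02", "burglary"),
]

header, rows = pivot_by_day(sample)
csv_text = to_csv(header, rows)
print(csv_text)
```

The result has one row per day and one column per offense type, which is the same shape I exported from Excel for import into Swivel.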

But there’s something a lot simpler, yet I think also a lot more useful, going on here. The charts are fun to look at, but it’s the data (and the source attributions) that really matter. When it’s parked in the cloud, other people can find it by way of search terms (‘washington,’ ‘burglary,’ ‘arson,’ ‘dcstat’). And whether Dabble DB massages it online or Excel does so locally, they can gather around a common URL to discuss how to use and interpret the data.

Data analysis is an inherently social act. Until now it has lacked an appropriate social context. But that’s going to change — and soon, I hope.

©1994-2009 Infoworld, Inc.