November 08, 2006

Web apps, just give me the data

Too many Web apps present data passively, instead of providing real access to it

If you search the Web for “fortune500.xml,” you’ll find an ordered list of the Fortune 500 companies. It’s just what you’d want if you were writing a custom portfolio application. But it didn’t exist until last week, when Doug Purdy, a Microsoft program manager, created it while writing his own personal portfolio application. Because he also blogged the list, you can use it, too.

There are plenty of Fortune 500 lists on the Web. But none of the ones that Doug (or I) could easily find presented the data in a reusable format. At the canonical Web address for the Fortune 500, CNNMoney.com offers the typical Web fare. The master list is chopped up into HTML tables of 100 entries each, for the convenience of advertisers (er, readers). Then there’s the Custom Ranking, which “gives users the chance to sort the Fortune 500 according to the company data they find most interesting.” You can, for example, view just the companies with revenue above or below $4 billion.

What if you’re interested in a $3 billion cutoff? You’d need to get hold of that data and query it yourself. That should be a routine and trivial operation, but as Doug Purdy found out, it’s anything but. Most Web presentations of data are designed for passive viewing, not active analysis.
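
Once the list exists in a reusable format, the $3 billion query really is trivial. Here’s a minimal Python sketch. It assumes a hypothetical XML layout in which each `<company>` element carries `rank`, `name`, and `revenue` (in millions of dollars) attributes; the actual structure of fortune500.xml may differ, and the company names and figures below are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout and made-up figures: the real fortune500.xml may differ.
SAMPLE_XML = """<companies>
  <company rank="1" name="Example Corp" revenue="5200" />
  <company rank="2" name="Sample Inc" revenue="3400" />
  <company rank="3" name="Placeholder Co" revenue="2900" />
</companies>"""


def companies_above(xml_text, cutoff_millions):
    """Return (rank, name, revenue) for every company whose revenue,
    assumed to be in millions of dollars, exceeds the cutoff."""
    root = ET.fromstring(xml_text)
    matches = []
    for company in root.iter("company"):
        revenue = float(company.get("revenue"))
        if revenue > cutoff_millions:
            matches.append((int(company.get("rank")), company.get("name"), revenue))
    return matches


# The $3 billion cutoff that a fixed "above or below $4 billion" view can't give you.
for rank, name, revenue in companies_above(SAMPLE_XML, 3000):
    print(rank, name, revenue)
```

Changing the cutoff is a one-argument change, which is exactly the kind of ad hoc question a page of pre-chopped HTML tables can’t answer.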

For an example of what things could and should be like, check out episode 10 of The Screening Room. At the six-minute mark in that screencast about Dabble DB, a Web database, Smallthought Systems’ Avi Bryant, who is analyzing a set of data about investments, wants to look at investments by U.S. state as a function of population. The current data set includes states but not their populations. To add population data, Avi visits a Web site that lists states and populations, activates a JavaScript bookmarklet, and imports two columns from the HTML table on that Web page.
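
What the bookmarklet does amounts to scraping. The following is a rough Python equivalent using the standard library’s `html.parser`, not a description of how Dabble DB actually works. The page markup, the placeholder state names, and the population figures are all hypothetical, and the parser assumes a single, well-formed, non-nested table.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each table cell, row by row.
    Assumes one simple, well-formed table with no nesting."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def two_columns(html, first, second):
    """Map the values in column `first` to those in column `second`,
    using the first row as the header."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    i, j = header.index(first), header.index(second)
    return {row[i]: row[j] for row in body if len(row) > max(i, j)}


# Hypothetical page: placeholder state names and made-up populations.
PAGE = """<table>
<tr><th>State</th><th>Capital</th><th>Population</th></tr>
<tr><td>State A</td><td>City A</td><td>1,000,000</td></tr>
<tr><td>State B</td><td>City B</td><td>2,500,000</td></tr>
</table>"""

populations = {
    state: int(pop.replace(",", ""))
    for state, pop in two_columns(PAGE, "State", "Population").items()
}
print(populations)
```

Note how much the code has to assume about the page: header text, cell order, number formatting. Any change to the publisher’s layout breaks it, which is why scraping is a workaround rather than a substitute for published data.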

Scraping data off Web pages can be effective, but it’s far from ideal. Although we think of the Web as a rich trove of data, the pickings are depressingly slim if you want to transform or recombine that data. And there’s no good reason why that should be so. It’s easy to make data available for reuse by human analysts or automatic services.

InfoWorld.com’s Power Search and Metadata Explorer features, for example, present every HTML view accompanied by an alternate XML/RSS view. It required very little effort on my part to make these services mashup-ready. Until recently, I’d have said there was little reward for that effort. Then it paid off last week, when InfoWorld’s Web team needed to republish a slice of the data set. You never know how people might reuse the data you publish. But if you hope they will and then fail to make it usefully available, you pretty much guarantee that they won’t.
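
The pattern of pairing every HTML view with an XML/RSS view is cheap when both views are rendered from the same records. Here’s a minimal Python sketch of the idea. It is not InfoWorld’s actual code; the record fields, titles, and example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical search results; the field names are assumptions for illustration.
RESULTS = [
    {"title": "First story", "link": "http://example.com/stories/1",
     "summary": "Summary of the first story"},
    {"title": "Second story", "link": "http://example.com/stories/2",
     "summary": "Summary of the second story"},
]


def render_html(results):
    """The passive view: a list of links for human readers."""
    items = "".join(
        f'<li><a href="{escape(r["link"])}">{escape(r["title"])}</a></li>'
        for r in results
    )
    return f"<ul>{items}</ul>"


def render_rss(results, title, link):
    """The reusable view: the same records as an RSS 2.0 feed."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link, and description on the channel.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for r in results:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = r["title"]
        ET.SubElement(item, "link").text = r["link"]
        ET.SubElement(item, "description").text = r["summary"]
    # ElementTree escapes text content when serializing.
    return ET.tostring(rss, encoding="unicode")


print(render_html(RESULTS))
print(render_rss(RESULTS, "Search results", "http://example.com/search"))
```

Because both functions consume the same list, the machine-readable view costs one extra function rather than a separate data pipeline.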

Mere access to data does not, of course, yield meaningful interpretation. That’s an art, and a science, that Edward Tufte has been developing for 15 years. In his new book, Beautiful Evidence, he elaborates on methods familiar to longtime readers but still too rarely applied. On page 176 he explores a different way to visualize survival rates for various cancers over time. I blogged a Web treatment of that chart. Don’t like it? Scoop up the data and show me a better way.

©1994-2009 Infoworld, Inc.