STRATEGIC DEVELOPER  

Web apps, just give me the data

Too many Web apps present data passively, instead of providing real access to it

By Jon Udell  
November 08, 2006
 

If you search the Web for “fortune500.xml,” you’ll find an ordered list of the Fortune 500 companies. It’s just what you’d want if you were writing a custom portfolio application. But it didn’t exist until last week when Doug Purdy, a Microsoft program manager, created it while writing his own personal portfolio application. Because he also blogged the list, you can use it, too.


There are plenty of Fortune 500 lists on the Web. But none of the ones that Doug (or I) could easily find presented the data in a reusable format. At the canonical Web address for the Fortune 500, CNNMoney.com offers the typical Web fare. The master list is chopped up into HTML tables of 100 entries each, for the convenience of advertisers, er, readers. Then there’s the Custom Ranking, which “gives users the chance to sort the Fortune 500 according to the company data they find most interesting.” You can, for example, view just the companies with revenue above or below $4 billion.

What if you’re interested in a $3 billion cutoff? You’d need to get hold of that data and query it yourself. That should be a routine and trivial operation, but as Doug Purdy found out, it’s anything but. Most Web presentations of data are designed for passive viewing, not active analysis.
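Once the list exists as XML, the arbitrary-cutoff query really is a few lines. The sketch below is illustrative only: the element names (`company`, `name`, `revenue`), the revenue unit (millions of dollars), and the sample records are assumptions made up for this example, not the actual layout or contents of Doug Purdy’s fortune500.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a reusable Fortune 500 feed. The structure
# and the placeholder companies are invented for illustration.
SAMPLE_XML = """
<companies>
  <company rank="1"><name>Alpha Corp</name><revenue>5200</revenue></company>
  <company rank="2"><name>Beta Inc</name><revenue>3400</revenue></company>
  <company rank="3"><name>Gamma LLC</name><revenue>2900</revenue></company>
</companies>
"""

def companies_above(xml_text, cutoff_millions):
    """Return (name, revenue) pairs with revenue at or above the cutoff."""
    root = ET.fromstring(xml_text.strip())
    matches = []
    for company in root.findall("company"):
        revenue = float(company.findtext("revenue"))
        if revenue >= cutoff_millions:
            matches.append((company.findtext("name"), revenue))
    return matches

# A $3 billion cutoff, expressed in millions of dollars.
print(companies_above(SAMPLE_XML, 3000))
```

Changing the cutoff, or filtering on a different field, means changing one argument rather than waiting for the publisher to add another canned view.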

For an example of what things could and should be like, check out episode 10 of The Screening Room. At the six-minute mark in that screencast about Dabble DB, a Web database, Smallthought Systems’ Avi Bryant -- who is analyzing a set of data about investments -- wants to look at investments by U.S. state as a function of population. The current data set includes states but not their populations. To add population data, Avi visits a Web site that lists states and populations, activates a JavaScript bookmarklet, and imports two columns from the HTML table on that Web page.
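The same kind of import, pulling two columns out of an HTML table, can be roughed out with Python’s standard library. This is not how the Dabble DB bookmarklet works; it is a minimal sketch of the idea, and the table, its column order, and the placeholder values are all assumptions invented for the example.

```python
from html.parser import HTMLParser

# Placeholder page fragment; the names and numbers are not real data.
SAMPLE_HTML = """
<table>
  <tr><th>State</th><th>Capital</th><th>Population</th></tr>
  <tr><td>State A</td><td>City A</td><td>1,000</td></tr>
  <tr><td>State B</td><td>City B</td><td>2,500</td></tr>
</table>
"""

class TableRows(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, if any
        self._cell = None   # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def import_columns(html, column_indexes):
    """Return the chosen columns from each data row, skipping the header."""
    parser = TableRows()
    parser.feed(html)
    header, *body = parser.rows
    return [tuple(row[i] for i in column_indexes) for row in body]

# Keep the first and third columns: state and population.
print(import_columns(SAMPLE_HTML, (0, 2)))
```

Even in this toy form, the code depends on the page’s markup and column order staying put, which is why scraping is a fallback rather than a substitute for published data.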

Scraping data off Web pages can be effective, but it’s far from ideal. Although we think of the Web as a rich trove of data, the pickings are depressingly slim if you want to transform or recombine that data. And there’s no good reason why that should be so. It’s easy to make data available for reuse by human analysts or automatic services.

InfoWorld.com’s Power Search and Metadata Explorer features, for example, present every HTML view accompanied by an alternate XML/RSS view. It required very little effort on my part to make these services mashup-ready. Until recently, I’d have said there was little reward for that effort. Then it paid off last week when InfoWorld’s Web team needed to republish a slice of the data set. You never know how people might reuse the data you publish. But if you hope they will and fail to make the data usefully available, you pretty much guarantee that they won’t.
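To give a sense of how little effort an alternate view can take, here is a minimal sketch that serializes the records behind a page as an RSS 2.0 feed. It is not the code behind Power Search or Metadata Explorer; the `to_rss` helper, the record fields, and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

def to_rss(title, link, items):
    """Serialize a list of {'title', 'link'} records as an RSS 2.0 string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Machine-readable view of {title}"
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
    return ET.tostring(rss, encoding="unicode")

# The same records that feed the HTML template can feed this function.
results = [
    {"title": "First result", "link": "https://example.com/first"},
    {"title": "Second result", "link": "https://example.com/second"},
]
print(to_rss("Search results", "https://example.com/search", results))
```

If the HTML page and the feed are both rendered from one list of records, the second view costs almost nothing to maintain.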

Mere access to data does not, of course, yield meaningful interpretation. That’s an art, and a science, that Edward Tufte has been developing for 15 years. In his new book, Beautiful Evidence, he elaborates on methods familiar to longtime readers but still too rarely applied. On page 176 he explores a different way to visualize survival rates for various cancers over time. I blogged a Web treatment of that chart. Don’t like it? Scoop up the data and show me a better way.





 


 
Jon Udell is lead analyst and blogger in chief at the InfoWorld Test Center.





 
