InfoWorld
STRATEGIC DEVELOPER  

Information trailblazing

The game is over for proprietary data pumps

By Jon Udell  
February 28, 2003
 

Last week Matt McAlister, InfoWorld's director of online product development, forwarded me a list of the week's 10 most-read stories. I was tickled to see a couple of my stories among them. "You can log in to the reporting system for more detail," Matt said, so I did. Twice. First I logged in to the Web interface that selects report intervals. (No luck doing that on Mac OS X, by the way, where Mozilla, Safari, and MSIE all failed the browser check.) Then I logged in to the Java applet that delivers the reports. Once you burrow into the inner sanctum, you can see the data sliced and diced in every way that the system's designers thought you might need. But there are two huge problems: You can't link to those views, and you can't link to the data that supports them.

I won't take potshots by naming the vendor because, in truth, this system is state-of-the-art. Web analytics has been one of my passions for almost a decade, so I know firsthand the challenge of reducing vast quantities of log data into views that make sense to the business sponsors of a Web site. You've got to boil the stuff down in ways that are instantly accessible to those folks, and this system meets that expectation. But we're at an inflection point, I believe, in terms of what regular folks will expect these systems to do.

Consider librarians. As I mentioned on my Weblog, these are non-technical users who have nonetheless begun to describe their OPACs (online public access catalogs) as being "the wrong kind of software" when they can't adapt to the LibraryLookup style of hyperlink-driven integration. It's becoming apparent to everybody that deep linking isn't some obscure geekism, but rather a vital property of information systems. When an OPAC supports deep linking, integration with other systems is trivial. When an OPAC doesn't support deep linking -- for example, because it delivers only a Java interface, or because it encodes session IDs in URLs -- such integration is much, much harder. Users are starting to notice the difference.
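To make the contrast concrete, here is a minimal Python sketch of what a deep link looks like from the integrator's side. The catalog address, the `index` and `term` parameter names, and the session parameter names are all hypothetical, chosen only for illustration; a real OPAC defines its own.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical catalog endpoint and parameter names, for illustration only.
CATALOG_SEARCH = "https://catalog.example.org/search"


def opac_deep_link(isbn: str) -> str:
    """Build a stable, shareable catalog link for one ISBN."""
    digits = isbn.replace("-", "").strip()
    return CATALOG_SEARCH + "?" + urlencode({"index": "isbn", "term": digits})


def strip_session(url: str, session_keys=("sessionid", "sid")) -> str:
    """Drop (assumed) session parameters so the rest of the URL can be shared."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k.lower() not in session_keys]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    print(opac_deep_link("0-00-000000-0"))
    print(strip_session(CATALOG_SEARCH + "?sid=abc123&index=isbn&term=0000000000"))
```

When the catalog answers a URL like the first one, any other system can construct it from an ISBN alone. The second function only helps when the session ID is the sole obstacle; a catalog that exposes nothing but a Java interface offers no URL to clean up in the first place.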

I'm not discounting the value that client-side Java can bring to the table. Rich clients are an increasingly important part of the emerging picture. But please, pretty please, don't force me to use the rich client to get to the data. Use it, instead, to enhance the presentation of XML data that is also highly accessible by way of hyperlinks and (where appropriate) Web services. The heavy lifting done by a Web analytics engine aggregates the raw log data along many dimensions: page views, referrers, paths (sequences of page views), you know the drill. Once that hard work is done, make sure it can be leveraged. If you want to use Java or Flash or another rich-client technology to visualize the data, then great, but make sure that users can share those views by passing around easily discovered links.
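As a rough illustration of the idea, the following Python sketch aggregates a handful of log hits along two of those dimensions and writes the result as an XML file that a Web server could expose at a stable URL. The element and attribute names, the `(path, referrer)` input shape, and the period label are assumptions made for this example, not the format of any particular analytics product.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def build_report(hits, period):
    """Aggregate (path, referrer) hits into an XML report element."""
    pages = Counter(path for path, _ in hits)
    referrers = Counter(ref for _, ref in hits if ref)

    report = ET.Element("report", {"period": period})
    pages_el = ET.SubElement(report, "pageviews")
    for path, count in pages.most_common():
        ET.SubElement(pages_el, "page", {"path": path, "views": str(count)})
    refs_el = ET.SubElement(report, "referrers")
    for ref, count in referrers.most_common():
        ET.SubElement(refs_el, "referrer", {"url": ref, "hits": str(count)})
    return report


def write_report(report, filename):
    """Save the report where a Web server can serve it as a static file."""
    ET.ElementTree(report).write(filename, encoding="utf-8",
                                 xml_declaration=True)


if __name__ == "__main__":
    sample_hits = [
        ("/article/a", "http://example.org/blog"),
        ("/article/a", ""),
        ("/article/b", "http://example.org/blog"),
    ]
    write_report(build_report(sample_hits, "sample-week"), "report.xml")
```

A rich client can still draw charts from `report.xml`, but so can a script, a stylesheet, or anything else that can follow a link.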

For extra credit, cache the views as XML files. It costs little to do this, and you open up worlds of possibility. It's great if you can pair those files with XSLT transformations that render views of them, but just caching the work of analysis in URL-accessible XML files creates a terrific resource. That's how AllConsuming.net, a different kind of analysis engine, enables reuse of the book discussion data it harvests from Weblogs. You can get fancy and make SOAP calls to retrieve this cached data; but hey, it's just data, and you can navigate to a directory and scoop it up directly if you like. Here's what DJ Adams, author of Programming Jabber, did with the data:

"While AllConsuming.net can send you book reading recommendations (by email) based on what your friends are reading and commenting about, I thought it might be useful to be able to read any comments that were made on books that you had in your collection. 'I've got book X. Let me know when someone says something about book X.'

"So I whipped up a little script ... to grab a user's currently reading and favorite books lists, and then look at the hourly list of latest books mentioned. Any intersections are pushed onto the top of a list of items in an RSS file, which represents a sort of 'commentary alert' feed for that user and his books," he says in his Weblog.
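The shape of such a script can be sketched in a few lines of Python. This is not DJ Adams's code, and it does not use AllConsuming.net's actual data format: the dictionary keys (`isbn`, `title`, `weblog_url`), the function names, and the feed address are hypothetical stand-ins. It also builds a fresh feed on each run rather than pushing new items onto the top of an existing one.

```python
import xml.etree.ElementTree as ET


def commentary_alerts(my_books, latest_mentions):
    """Return the latest mentions whose ISBN is in the user's own lists."""
    wanted = set(my_books)
    return [m for m in latest_mentions if m["isbn"] in wanted]


def alerts_to_rss(user, alerts):
    """Render the matching mentions as a small RSS 2.0 document."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Commentary alerts for {user}"
    # Hypothetical feed address, for illustration only.
    ET.SubElement(channel, "link").text = "http://example.org/alerts/" + user
    ET.SubElement(channel, "description").text = (
        "New Weblog comments on books in your lists")
    for mention in alerts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = mention["title"]
        ET.SubElement(item, "link").text = mention["weblog_url"]
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    books = ["1111111111", "2222222222"]  # currently reading plus favorites
    hourly = [
        {"isbn": "2222222222", "title": "Book Two",
         "weblog_url": "http://example.org/post/1"},
        {"isbn": "3333333333", "title": "Book Three",
         "weblog_url": "http://example.org/post/2"},
    ]
    print(alerts_to_rss("dj", commentary_alerts(books, hourly)))
```

The point is less the code than how little of it is needed once the harvested data sits in plain files at known URLs.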

That's all well and good for scripting wonks like DJ (and me, and maybe you too), you're probably thinking, but what about civilians who use off-the-shelf software like Microsoft Office? Funny you should ask. The log analyzer I mentioned does, in fact, have back-door access to the report data. You can download a special client that will suck the data out of the server and feed it into Word or Excel files for display and analysis. But I won't. And soon nobody else will either. Now that Office 2003 can directly consume XML, it's game over for proprietary data pumps. It's a whole new game for systems that blaze information trails for others to follow.

Jon Udell is lead analyst and blogger in chief at the InfoWorld Test Center.
