May 21, 2004

The artful logger

Smart logging can capture a wealth of compelling data. The trick is in deciding what to log.

I confess to a deep fascination with the seemingly mundane topic of logging. Software crashes, shopping cart abandonment, and security breaches are among the many situations in which you’ll find yourself poring over logs trying to figure out what went wrong. Like many a developer and network administrator, I honed my Perl programming chops doing the kinds of data reduction and analysis for which that language is ideally suited.

Yet no amount of Perl magic can save the day if your logs capture too little or wrongly focused data. And that’s a bit of a catch-22. To do good sleuthing you’ve got to have deployed the right kinds and levels of instrumentation. But as the data begins to tell its tale, it suggests the need for more or different instrumentation. Because the feedback loop is often attenuated, it’s a real challenge to strike the right balance.

Why not just log everything? Even today’s capacious disks fill up quickly when you turn your loggers’ dials to 10. So adaptive logging is becoming a hot research topic, especially in the field of security. The idea is to let your loggers idle until something suspicious happens, then crank them up. Of course, defining what’s suspicious is the essence of the challenge. Network forensics experts say that it takes, on average, 40 hours of analysis to unravel a half-hour of attack activity — and that’s after the fact. Will autonomic systems someday be able to generate and test hypotheses in real time, while adjusting instrumentation on the fly? I hope so, but I’ll believe it when I see it.
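As a rough illustration of the idle-then-crank-up idea, here is a minimal Python sketch built on the standard `logging` module. The class name `AdaptiveLogger`, the `record_login` method, and the rule that "three failed logins is suspicious" are all hypothetical stand-ins; choosing a real trigger is exactly the hard part described above.

```python
import logging


class AdaptiveLogger:
    """Idle at WARNING; switch to DEBUG once a (hypothetical) trigger fires."""

    def __init__(self, name, suspicious_threshold=3):
        self.logger = logging.getLogger(name)
        self.logger.setLevel(logging.WARNING)  # idle: routine detail is dropped
        self.suspicious_threshold = suspicious_threshold  # assumed trigger rule
        self.failed_logins = 0

    def record_login(self, user, success):
        if success:
            # Only emitted after the logger has been cranked up to DEBUG.
            self.logger.debug("login ok: %s", user)
            return
        self.failed_logins += 1
        self.logger.warning("login failed: %s", user)
        if self.failed_logins >= self.suspicious_threshold:
            # Something looks suspicious: start capturing everything.
            self.logger.setLevel(logging.DEBUG)
```

The sketch never turns the dial back down. A real system would also need a rule for returning to idle, or the disk fills up anyway.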

In the field of Web analytics, it’s been fairly straightforward to correlate user interaction with the clickstream recorded in a Web server’s log, but the changing architecture of Web software now threatens old assumptions. When I gave a talk describing how rich Internet applications can converse with Web services, a Web developer in the audience asked, “Where are the logs?” That’s a good question. Local interaction with a Java or .NET or Flash application won’t automatically show up in the clickstream, nor will SOAP calls issued from the rich client. You have to make special provisions to capture these events. That’s eminently doable, but I worry that if logging isn’t always on by default, vital information will often go unrecorded. On the other hand, clickstreams don’t necessarily correlate well to behaviors you’d like to understand. The XML message patterns of a services-based application may enable higher-level and more meaningful analysis.

It’s fun to speculate, but meanwhile our systems keep accumulating logs. How can we deal with them more effectively? Over the years I’ve developed some simple strategies. In the security realm, for example, I like to watch the size of my logs day by day. That’s an easily obtained baseline; deviation from it tells me to look under the hood.
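The log-size baseline can be sketched in a few lines of Python. This is one possible approach, not a prescribed one: it assumes you already have a list of daily log sizes in bytes, and it flags any day that strays from the mean of the preceding days by more than a few standard deviations. The function name, the seven-day window, and the threshold of three are illustrative choices.

```python
from statistics import mean, stdev


def flag_unusual_days(daily_sizes, window=7, threshold=3.0):
    """Return indices of days whose size deviates sharply from the trailing baseline."""
    flagged = []
    for i in range(window, len(daily_sizes)):
        baseline = daily_sizes[i - window:i]  # the preceding `window` days
        mu = mean(baseline)
        sigma = stdev(baseline)
        # Guard against a perfectly flat baseline, where sigma would be 0.
        spread = max(sigma, 0.01 * mu, 1.0)
        if abs(daily_sizes[i] - mu) > threshold * spread:
            flagged.append(i)
    return flagged
```

Called with a week of sizes near 100 followed by a day of 400, the function would flag that last day. A sudden drop is flagged too, which matters because a log that goes quiet can be as telling as one that balloons.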

When you want to do Web analytics, here’s a tip: Intelligent namespace design can dramatically simplify the chore. If you consistently embed categories, dates, or other selectors into your URLs, it’s easy to view your logs along those dimensions. I steer clear of content management systems and log analysis tools that don’t offer such flexibility.
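To make the namespace tip concrete, here is a small Python sketch. It assumes a hypothetical URL scheme of the form `/<category>/<yyyy>/<mm>/<dd>/<slug>` and log lines in the Common Log Format; neither comes from any particular site or tool. With selectors embedded in the URL, slicing the log by category or by month reduces to one regular expression and a pair of counters.

```python
import re
from collections import Counter

# Hypothetical URL scheme: /<category>/<yyyy>/<mm>/<dd>/<slug>
URL_PATTERN = re.compile(
    r"^/(?P<category>[a-z-]+)/(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/"
)
# Request field of a Common Log Format line, e.g. "GET /path HTTP/1.0"
REQUEST_PATTERN = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*"')


def count_hits(log_lines):
    """Count hits per category and per year-month for URLs that follow the scheme."""
    by_category = Counter()
    by_month = Counter()
    for line in log_lines:
        request = REQUEST_PATTERN.search(line)
        if not request:
            continue  # not a recognizable request line
        match = URL_PATTERN.match(request.group("path"))
        if not match:
            continue  # URL falls outside the designed namespace
        by_category[match.group("category")] += 1
        by_month[f'{match.group("year")}-{match.group("month")}'] += 1
    return by_category, by_month
```

URLs that ignore the scheme simply drop out of the counts, which is the point: the more consistently the namespace is applied, the more of the log this kind of analysis can see.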

Logs can flood us with information, or they can tell us compelling stories. We can influence the outcome by artful and iterative refinement of the data we collect.

©1994-2009 Infoworld, Inc.