February 27, 2004

Structured change detection

Everything could use a little version control, even this column. XML makes it easier.

Andy Hunt and Dave Thomas are apostles of common sense. Their bestselling book, The Pragmatic Programmer, is a thoughtful guide to the craft of programming. Its tenets are closely aligned with those of the Agile Manifesto, which Hunt and Thomas co-wrote. Now they're self-publishing a three-volume "prequel" to The Pragmatic Programmer called The Pragmatic Starter Kit, which focuses on three core sets of skills: version control, unit testing, and automation.

Two of the three volumes are available, and I've just read the first of them: Pragmatic Version Control Using CVS (Concurrent Versions System). It is a spectacularly lucid and useful book that brings CVS novices up to speed in a flash and offers CVS experts new tricks and broader perspectives.

Confession: I'm not (yet) the CVS expert that I should be. One of my excuses doesn't stand up to scrutiny: It's been a long while since I was part of a team programming effort. Working solo, I've told myself that formal version control is overkill for the simple coding projects I undertake. But Hunt and Thomas aren't buying that excuse. They understand that friction is the enemy of version control, and they present recipes and scenarios that make the process nearly as frictionless as it can be.

Version control isn't only for code, of course. Any evolving set of documents can benefit from an infinite undo stack and a change narrative. In fact, the Hunt/Thomas book has prompted me to move my InfoWorld columns into a CVS repository — yes, I'm writing this column under version control.

Admittedly, CVS or any source-code control system is a dubious way to manage prose. Deeply wired into source code — and the tools that work with it — is the notion of the 80-character line. The ubiquitous change detector, diff, sees all content as a sequence of lines. Historically, that's worked remarkably well for code and not so well for other content types. A Word document, for example, is structured in terms of sections, subsections, and paragraphs, not lines. So when you're managing a Word document in CVS — as often happens because software projects typically include prose "artifacts" — the recommended strategy is to check it in as a binary file that's exempt from line-by-line change detection.
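
To see why line-based change detection fits prose poorly, here is a minimal Python sketch. It assumes the prose is hard-wrapped to a fixed width before check-in. The function name changed_line_count and the 40-column width are illustrative choices, and Python's difflib stands in for the diff utility.

```python
import difflib
import textwrap


def changed_line_count(old_text: str, new_text: str, width: int = 40) -> int:
    """Wrap both texts to `width` columns, then count the lines that a
    line-based diff marks as removed or added."""
    old_lines = textwrap.wrap(old_text, width)
    new_lines = textwrap.wrap(new_text, width)
    diff = difflib.ndiff(old_lines, new_lines)
    # ndiff prefixes removed lines with "- " and added lines with "+ ";
    # "? " hint lines and "  " unchanged lines are ignored here.
    return sum(1 for line in diff if line.startswith(("- ", "+ ")))


old = ("Version control is not only for code. Any evolving set of documents "
       "can benefit from an infinite undo stack and a change narrative.")
# A one-word edit at the start of the paragraph.
new = old.replace("Version control", "Formal version control")

print(changed_line_count(old, new))
```

A single inserted word moves the wrap points of the lines that follow it, so the diff reports several changed lines even though only one word is different.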

XML, however, creates a middle ground. Consider two versions of a Word document saved as XML. There are "structured diff" tools that can map the changes at an intermediate level, in terms of XML elements. For example, IBM's alphaWorks site offers the XML Diff and Merge Tool for Java, while Microsoft's GotDotNet site offers XML Diff and Patch for .Net. Both of these free tools can track element-level change. To get a sense of what's possible, check out Monsell EDM's online demo of its DeltaXML technology. The demo compares two subtly different versions of a complex graphic, the standard SVG (Scalable Vector Graphics) "tiger" benchmark, and animates the differences between the two. It's stunningly cool.
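
As a rough illustration of element-level change tracking, here is a naive Python sketch built on the standard xml.etree.ElementTree module. It is not how any of the tools named above work. The function element_changes, its positional matching of child elements, and the sample documents are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def element_changes(old, new, path=None):
    """Return (path, description) pairs for element-level differences.

    Naive assumptions: children are matched by position, tail text is
    ignored, and the bracketed index is the position among all siblings.
    """
    path = path or f"/{old.tag}"
    if old.tag != new.tag:
        return [(path, "replaced")]
    changes = []
    if old.attrib != new.attrib:
        changes.append((path, "attributes changed"))
    if (old.text or "").strip() != (new.text or "").strip():
        changes.append((path, "text changed"))
    old_kids, new_kids = list(old), list(new)
    # Compare the children that exist in both versions.
    for i, (a, b) in enumerate(zip(old_kids, new_kids), start=1):
        changes.extend(element_changes(a, b, f"{path}/{a.tag}[{i}]"))
    # Report children that exist on only one side.
    for i in range(len(new_kids), len(old_kids)):
        changes.append((f"{path}/{old_kids[i].tag}[{i + 1}]", "removed"))
    for i in range(len(old_kids), len(new_kids)):
        changes.append((f"{path}/{new_kids[i].tag}[{i + 1}]", "added"))
    return changes


old_doc = ET.fromstring(
    "<doc><section><para>Diff sees lines.</para>"
    "<para>XML has elements.</para></section></doc>")
new_doc = ET.fromstring(
    "<doc><section><para>Diff sees lines.</para>"
    "<para>XML has nested elements.</para>"
    "<para>So can code.</para></section></doc>")

for change in element_changes(old_doc, new_doc):
    print(change)
```

The report names the paragraph that changed and the paragraph that was added, regardless of how the XML happens to be broken into lines.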

As XML becomes the standard way to represent prose, graphics, and other content, we should expect such change visualization to become routine. What about code? It has sections, subsections, and paragraphs, too. XML isn't — and probably shouldn't be — the primary way we read and write code. But the underlying abstract syntax tree has structure that can — and arguably should — help us see and comprehend the code's evolution.
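
The same idea can be sketched for code with Python's standard ast module, used here only as a convenient example of an abstract syntax tree. The helpers same_structure and changed_functions and the sample sources are hypothetical, and the comparison covers top-level functions only.

```python
import ast


def same_structure(old_source: str, new_source: str) -> bool:
    """True when both sources parse to the same syntax tree.

    ast.dump omits line and column positions by default, and the parser
    drops comments, so layout-only edits compare as equal.
    """
    return ast.dump(ast.parse(old_source)) == ast.dump(ast.parse(new_source))


def changed_functions(old_source: str, new_source: str) -> list:
    """Names of top-level functions whose syntax trees differ."""
    def by_name(source):
        return {node.name: ast.dump(node)
                for node in ast.parse(source).body
                if isinstance(node, ast.FunctionDef)}
    old, new = by_name(old_source), by_name(new_source)
    return sorted(name for name in old.keys() | new.keys()
                  if old.get(name) != new.get(name))


original = (
    "def area(w, h):\n"
    "    return w * h\n"
    "\n"
    "def perimeter(w, h):\n"
    "    return 2 * (w + h)\n")
# Same code with different spacing and an added comment.
reformatted = (
    "def area(w,h):\n"
    "    # width times height\n"
    "    return w*h\n"
    "\n"
    "\n"
    "def perimeter(w, h):\n"
    "    return 2 * (w + h)\n")
# A real change to one function body.
edited = original.replace("2 * (w + h)", "2 * w + 2 * h")

print(same_structure(original, reformatted))
print(changed_functions(original, edited))
```

A line-based diff would flag the reformatted version as heavily changed, while the tree comparison treats it as identical and attributes the real edit to a single function.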

©1994-2009 Infoworld, Inc.