February 27, 2004

Structured change detection

Everything could use a little version control, even this column. XML makes it easier.

Andy Hunt and Dave Thomas are apostles of common sense. Their bestselling book, The Pragmatic Programmer, is a thoughtful guide to the craft of programming. Its tenets are closely aligned with those of the Agile Manifesto, which Hunt and Thomas co-wrote. Now they're self-publishing a three-volume "prequel" to The Pragmatic Programmer called The Pragmatic Starter Kit, which focuses on three core sets of skills: version control, unit testing, and automation.

Two of the three volumes are available, and I've just read the first of them: Pragmatic Version Control Using CVS (Concurrent Versions System). It is a spectacularly lucid and useful book that brings CVS novices up to speed in a flash and offers CVS experts new tricks and broader perspectives.

Confession: I'm not (yet) the CVS expert that I should be. One of my excuses doesn't stand up to scrutiny: It's been a long while since I was part of a team programming effort. Working solo, I've rationalized that formal version control is overkill for the simple coding projects I undertake. But Hunt and Thomas aren't buying that excuse. They understand that friction is the enemy of version control, and they present recipes and scenarios that make the process nearly as frictionless as it can be.

Version control isn't only for code, of course. Any evolving set of documents can benefit from an infinite undo stack and a change narrative. In fact, the Hunt/Thomas book has prompted me to move my InfoWorld columns into a CVS repository — yes, I'm writing this column under version control.

Admittedly, CVS or any source-code control system is a dubious way to manage prose. Deeply wired into source code — and the tools that work with it — is the notion of the 80-character line. The ubiquitous change detector, diff, sees all content as a sequence of lines. Historically, that's worked remarkably well for code and not so well for other content types. A Word document, for example, is structured in terms of sections, subsections, and paragraphs, not lines. So when you're managing a Word document in CVS — as often happens because software projects typically include prose "artifacts" — the recommended strategy is to check it in as a binary file that's exempt from line-by-line change detection.
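To see why line-oriented change detection struggles with prose, here is a minimal sketch using Python's standard difflib and textwrap modules. The sample paragraph, the 40-column wrap width, and the changed_lines helper are all assumptions made for illustration; none of this is part of CVS or of the diff utility itself.

```python
import difflib
import textwrap

# Hypothetical sample paragraph; a single word is added in the new version.
old_text = ("Version control is not only for code. Any evolving set of "
            "documents can benefit from an undo stack and a change narrative.")
new_text = old_text.replace("Any evolving", "Any steadily evolving")


def changed_lines(old, new, width=40):
    """Wrap both versions to `width` columns and return the diff's -/+ lines."""
    old_lines = textwrap.wrap(old, width)
    new_lines = textwrap.wrap(new, width)
    diff = list(difflib.unified_diff(old_lines, new_lines, lineterm=""))
    # The first two entries are the "---" and "+++" file headers; skip them,
    # then keep only the lines marked as removed or added.
    return [line for line in diff[2:] if line.startswith(("-", "+"))]


if __name__ == "__main__":
    for line in changed_lines(old_text, new_text):
        print(line)
```

One added word pushes the rest of the paragraph onto different lines, so a line-based diff reports most of the paragraph as removed and re-added even though almost nothing changed.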

XML, however, creates a middle ground. Consider two versions of a Word document saved as XML. There are "structured diff" tools that can map the changes at an intermediate level, in terms of XML elements. For example, IBM's AlphaWorks site offers the XML Diff and Merge Tool for Java, while Microsoft's GotDotNet site offers XML Diff and Patch for .Net. Both of these free tools can track element-level change. To get a sense of what's possible, check out Monsell EDM's online demo of its Delta XML technology. The demo compares two subtly different versions of a complex graphic, the standard SVG (Scalable Vector Graphics) "tiger" benchmark, and animates the differences between the two. It's stunningly cool.
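The idea of element-level change can be sketched in a few lines with Python's standard xml.etree.ElementTree module. This is not how the IBM, Microsoft, or Monsell EDM tools work internally; the element_changes function, the sample documents, and the path notation are hypothetical, and the sketch pairs child elements by position only, a shortcut that real structured-diff tools avoid.

```python
import xml.etree.ElementTree as ET


def element_changes(old, new, path=None):
    """Yield (path, description) for each element-level difference.

    Simplifying assumptions: children are paired by position, so moved or
    inserted elements are not matched up, and tail text is ignored.
    """
    path = path or f"/{old.tag}"
    if old.tag != new.tag:
        yield path, f"tag changed to {new.tag}"
        return
    if old.attrib != new.attrib:
        yield path, "attributes changed"
    if (old.text or "").strip() != (new.text or "").strip():
        yield path, "text changed"
    old_kids, new_kids = list(old), list(new)
    for index, (a, b) in enumerate(zip(old_kids, new_kids), start=1):
        # The index is the child's position among all of its siblings.
        yield from element_changes(a, b, f"{path}/{a.tag}[{index}]")
    if len(old_kids) != len(new_kids):
        yield path, f"child count {len(old_kids)} -> {len(new_kids)}"


# Hypothetical documents: only the second paragraph differs.
old_xml = ("<doc><section><para>Friction is the enemy.</para>"
           "<para>Diff sees lines.</para></section></doc>")
new_xml = ("<doc><section><para>Friction is the enemy.</para>"
           "<para>Diff sees elements.</para></section></doc>")

changes = list(element_changes(ET.fromstring(old_xml), ET.fromstring(new_xml)))

if __name__ == "__main__":
    for where, what in changes:
        print(where, what)
```

The report names the one paragraph element that changed rather than a line number, which is the "intermediate level" that structured diff is after.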

As XML becomes the standard way to represent prose, graphics, and other content, we should expect such change visualization to become routine. What about code? It has sections, subsections, and paragraphs, too. XML isn't — and probably shouldn't be — the primary way we read and write code. But the underlying abstract syntax tree has structure that can — and arguably should — help us see and comprehend the code's evolution.
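As a small illustration of what the syntax tree offers, the following sketch uses Python's standard ast module to tell a cosmetic edit from a substantive one. Python is used here only as an example; the same_structure helper and the three source snippets are hypothetical.

```python
import ast


def same_structure(old_source, new_source):
    """Return True when two Python sources parse to identical syntax trees."""
    # ast.dump omits line and column attributes by default, and comments
    # never reach the tree, so layout changes do not affect the comparison.
    return ast.dump(ast.parse(old_source)) == ast.dump(ast.parse(new_source))


original = "def area(w, h):\n    return w * h\n"
# Same logic with different line breaks, a comment, and extra parentheses.
reformatted = ("def area(w,\n         h):\n"
               "    # width times height\n"
               "    return (w * h)\n")
# A real change: multiplication became addition.
edited = "def area(w, h):\n    return w + h\n"

if __name__ == "__main__":
    print(same_structure(original, reformatted))
    print(same_structure(original, edited))
```

A line-based diff would flag every line of the reformatted version, while the tree comparison treats it as unchanged and still catches the edit that alters behavior.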

©1994-2010 Infoworld, Inc.