February 10, 2003

Heaven or XM-hell?

XML isn't a panacea, especially if the semantic integrity of data hasn't been maintained properly

Over the past few weeks, InfoWorld has been engaged in an epic IT battle against the forces of business evil: a mountain of data combined with mutant business processes that were the result of staff molding their work habits to inflexible systems that boxed them in. In our case, it was the implementation of a content management system, but it could have just as easily been an electronic trading system at a financial services company or a new fulfillment system for an online retailer.

To a large degree, IT is about a few very simple things: moving data around and giving people systems that help them act on data, make sense of data, and ultimately add meaning to data before passing it down the chain. So the ability to move data around quickly is key. Like many of you, we depend on “legacy” systems to do it, and that means any new system must interface with the old ones in the proper ways to be effective.

It has been five years since the XML 1.0 spec was released in February 1998, so anyone who has been “doing XML” during that time is in the pleasing position of having beautifully clean XML to migrate from their legacy systems into their new ones. At InfoWorld, we had thousands of XML documents from the past three years that made it a snap to migrate content into our new system.

If only that were true.

If you look at an XML FAQ (http://www.ucc.ie/xml/faq.xml), one question is, “Why is XML such an important development?” Part of the answer is that it removes constraints that Web developers previously dealt with, one of which was the “dependence on a single, inflexible document type (HTML) which was being much abused for tasks it was never designed for.” This is unquestionably true, but I’ve observed an interesting phenomenon as we approach XML’s five-year anniversary. As XML has infiltrated the enterprise, it too has been abused, neglected, and misunderstood.

At InfoWorld, we started our data migration project with high hopes, approaching our mother lode of XML data with the tools that any self-respecting 21st-century developer would use: Java and XSL. It was all “in XML”; how could we lose? In the end, we shuffled away from the XML scrap heap with heavy hearts and a mountain of one-off Perl scripts that got the data migration job done. We prevailed, but ultimately it was what you hear some football coaches call “winning ugly.”

If XML holds such promise, how could something like this happen at a place such as InfoWorld, where we’ve had a front-row seat for the emergence of XML-based standards? No one intended for our XML data to grow unwieldy over the past few years, but it did. It takes a lot of hard work and attention to maintain the semantic integrity of the data represented in your XML as your business morphs and changes and new people come along to touch and manipulate the data in different ways. It’s particularly difficult when you’re converting data created by people, ensconced in the daily ebb and flow of messy human life, into a machine-readable format intended for the ages. Data validation is important and should be encouraged and practiced, but like security, only insofar as it allows people reasonable freedom to do their jobs.
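To make the validation point concrete, here is a minimal sketch in Python of the kind of routine audit that might catch drift before a migration does. It is an illustration only, not what InfoWorld ran: the element names (headline, byline, body) and the function audit_document are hypothetical, and a real archive would check its own vocabulary, ideally against a DTD or schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; a real archive would have its own vocabulary.
REQUIRED_ELEMENTS = ("headline", "byline", "body")


def audit_document(xml_text, required=REQUIRED_ELEMENTS):
    """Return a list of problems found in one XML document (empty if none)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Not even well-formed, so no further checks are possible.
        return [f"not well-formed: {err}"]

    problems = []
    for name in required:
        # Look for the element anywhere below the root element.
        element = root.find(f".//{name}")
        if element is None:
            problems.append(f"missing <{name}>")
        elif not "".join(element.itertext()).strip():
            # Present but containing only whitespace: "in XML", yet meaningless.
            problems.append(f"empty <{name}>")
    return problems
```

Run over every document as it is saved rather than once at migration time, a check like this reports problems while the person who created them is still around to fix them, without blocking anyone from doing their job.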

The problem goes back to the simple adage: garbage in, garbage out. XML is only meaningful if you insist on it from the beginning and throughout the life of your data. If you allow the fact that your data is “in XML” to lull you to sleep, be prepared for a rude awakening (and a lot of Perl hacking) later.

 

©1994-2009 Infoworld, Inc.