InfoWorld
CTO CONNECTION  

Heaven or XM-hell?

XML isn't a panacea, especially if the semantic integrity of data hasn't been maintained properly

By Chad Dickerson  
February 10, 2003
 

Over the past few weeks, InfoWorld has been engaged in an epic IT battle against the forces of business evil: a mountain of data combined with mutant business processes that were the result of staff molding their work habits to inflexible systems that boxed them in. In our case, it was the implementation of a content management system, but it could have just as easily been an electronic trading system at a financial services company or a new fulfillment system for an online retailer.

To a large degree, IT is about a few very simple things: moving data around and giving people systems that help them act on data, make sense of data, and ultimately add meaning to data before passing it down the chain. So the ability to move data around quickly is key. Like many of you, we depend on “legacy” systems to do it, and that means any new system must interface with the old ones in the proper ways to be effective.

It has been five years since the XML 1.0 spec was released in February 1998, so anyone who has been “doing XML” during that time is in the pleasing position of having beautifully clean XML to migrate from their legacy systems into their new ones. At InfoWorld, we had thousands of XML documents from the past three years that made it a snap to migrate content into our new system.

If only that were true.

If you look at an XML FAQ ( http://www.ucc.ie/xml/faq.xml ), one question is, “Why is XML such an important development?” Part of the answer is that it removes constraints that Web developers previously dealt with, one of which was the “dependence on a single, inflexible document type (HTML) which was being much abused for tasks it was never designed for.” This is unquestionably true, but I’ve observed an interesting phenomenon as we approach XML’s five-year anniversary. As XML has infiltrated the enterprise, it too has been abused, neglected, and misunderstood.

At InfoWorld, we started our data migration project with high hopes, approaching our mother lode of XML data with the tools that any self-respecting 21st century developer would use: Java and XSL. It was all “in XML.” How could we lose? In the end, we shuffled away from the XML scrap heap with heavy hearts and a mountain of one-off Perl scripts that got the data migration job done. We prevailed, but ultimately it was what you hear some football coaches call “winning ugly.”

If XML holds such promise, how could something like this happen at a place such as InfoWorld, where we’ve had a front-row seat for the emergence of XML-based standards? No one intended for our XML data to grow unwieldy over the past few years, but it did. It takes a lot of hard work and attention to maintain the semantic integrity of the data represented in your XML, as your business morphs and changes and new people come along to touch and manipulate the data in different ways. It’s particularly difficult when you’re converting data created by people, ensconced in the daily ebb and flow of messy human life, into a machine-readable format intended for the ages. Data validation is important and should be encouraged and practiced, but like security, only insofar as it allows people reasonable freedom to do their jobs.
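To make the validation point concrete, here is a minimal sketch in Python of the kind of routine audit that can catch drift before a migration does. It is not what InfoWorld ran; the element names (`headline`, `byline`, `pubdate`) and both function names are hypothetical, chosen only for illustration. It checks that each document is well-formed and that a few required fields are present and non-empty, and it reports problems rather than rejecting documents outright.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the column does not describe InfoWorld's schema.
REQUIRED_FIELDS = ("headline", "byline", "pubdate")


def audit_document(xml_text, required_fields=REQUIRED_FIELDS):
    """Return a list of problems found in one XML document (empty if none)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Not even well-formed, so no further checks are possible.
        return [f"not well-formed: {exc}"]

    problems = []
    for name in required_fields:
        element = root.find(name)  # looks at direct children of the root only
        if element is None:
            problems.append(f"missing <{name}>")
        elif not (element.text or "").strip():
            problems.append(f"empty <{name}>")
    return problems


def audit_batch(documents, required_fields=REQUIRED_FIELDS):
    """Map each document id to its problems, keeping only documents that have any."""
    report = {}
    for doc_id, xml_text in documents.items():
        problems = audit_document(xml_text, required_fields)
        if problems:
            report[doc_id] = problems
    return report
```

Run periodically over an archive, a report like this shows where the data has wandered while leaving the decision about what to fix, and when, to the people who own it.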

The problem goes back to the simple adage: garbage in, garbage out. XML is only meaningful if you insist on it from the beginning and throughout the life of your data. If you allow the fact that your data is “in XML” to lull you to sleep, be prepared for a rude awakening (and a lot of Perl hacking) later.

Chad Dickerson is CTO of InfoWorld.
