STRATEGIC DEVELOPER  

XML's quirky namespaces

You may not understand namespaces now, but soon you may have to

By Jon Udell  
July 13, 2005
 

Last month, Microsoft announced that the forthcoming Office 12 will save to XML by default and that earlier versions will be retrofitted to work with XML. This week Apple released its podcast-aware version of iTunes and defined an extension to RSS 2.0 for use with its online music store. Over the next year or so, these initiatives will create millions of new users of XML. They'll also expose thousands of developers to a feature of XML that's caused more than its fair share of headaches: namespaces.

You can perform all kinds of useful XML processing without ever touching a namespace, and many developers do. Most flavors of RSS don't use namespaces, for example. The tag names -- title, link, description -- are only implicitly associated with RSS feeds, and that's fine for many purposes. But what happens when you extract an RSS item from a feed, mix it with a chunk of XML from some other source, and produce an HTML page? Now you need to be able to distinguish the title of the RSS item from, say, the title of the HTML page.
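
To make the collision concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It is a stand-in for illustration, not one of the toolkits discussed in this column. The mixed document is made up: an XHTML page that embeds an RSS 1.0 item, with each vocabulary bound to its own namespace URI. Because the parser expands every element name into a namespace URI plus a local name, the two title elements stay distinct even though both are spelled "title".

```python
import xml.etree.ElementTree as ET

# Two vocabularies that both define an element named "title".
XHTML = "http://www.w3.org/1999/xhtml"
RSS = "http://purl.org/rss/1.0/"

# Hypothetical mixed document: an XHTML page embedding an RSS 1.0 item.
# Each default-namespace declaration binds unprefixed names to a URI.
doc = f"""<html xmlns="{XHTML}">
  <head><title>My aggregator page</title></head>
  <body>
    <item xmlns="{RSS}"><title>An RSS item</title></item>
  </body>
</html>"""

root = ET.fromstring(doc)

# Query prefixes are local aliases for the URIs, chosen by the caller.
ns = {"h": XHTML, "rss": RSS}
page_title = root.find("h:head/h:title", ns).text
item_title = root.find(".//rss:item/rss:title", ns).text

# Internally each tag is an expanded name: "{namespace-uri}local-name".
titles = [el.tag for el in root.iter() if el.tag.endswith("}title")]

print(page_title)  # My aggregator page
print(item_title)  # An RSS item
print(titles)      # ['{http://www.w3.org/1999/xhtml}title', '{http://purl.org/rss/1.0/}title']
```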

Modular namespaces are a familiar concept in many realms. Area codes disambiguate phone numbers; domain names qualify URLs; package names scope identifiers in programs. Partitioning XML vocabularies in the same way seems like a natural thing to do, and it is. But for a variety of reasons explained in Ronald Bourret's "Namespace Myths Exploded" -- an essay written way back in 2000 that still resonates today -- XML namespaces cause a lot of confusion.

Recently, for example, I needed to process some RSS 1.0 feeds. An RSS 1.0 feed is actually rooted in the RDF (Resource Description Framework) namespace, though its items live in the RSS 1.0 namespace. Such feeds typically also weave in elements from other namespaces -- for example, Dublin Core metadata. My task was simple: parse the feed, use XPath queries to carve out items, and unpack the elements contained within those items.
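
The shape of that task can be sketched in Python with the standard library's ElementTree, which supports a limited subset of XPath. This is not the author's libxml2 code. The feed is a hypothetical sample, and the prefixes in the NS map (rdf, rss, dc) are arbitrary labels chosen for the queries; what matters is that they are bound to the RDF, RSS 1.0 and Dublin Core namespace URIs.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by RSS 1.0 feeds; the prefixes are our own choice.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A tiny, made-up RSS 1.0 feed: the root is rdf:RDF, items use the
# RSS 1.0 default namespace, and items carry Dublin Core metadata.
FEED = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://example.org/">
    <title>Example channel</title>
  </channel>
  <item rdf:about="http://example.org/1">
    <title>First item</title>
    <link>http://example.org/1</link>
    <dc:creator>Alice</dc:creator>
    <dc:date>2005-07-13</dc:date>
  </item>
  <item rdf:about="http://example.org/2">
    <title>Second item</title>
    <link>http://example.org/2</link>
    <dc:creator>Bob</dc:creator>
  </item>
</rdf:RDF>"""


def items(feed_xml):
    """Yield one dict per RSS 1.0 item, unpacking RSS and Dublin Core fields."""
    root = ET.fromstring(feed_xml)
    # Every path step must carry a prefix that the NS map resolves to a URI.
    for item in root.findall("rss:item", NS):
        yield {
            # Namespaced attributes are keyed by their expanded name.
            "about": item.get(f"{{{NS['rdf']}}}about"),
            "title": item.findtext("rss:title", namespaces=NS),
            "link": item.findtext("rss:link", namespaces=NS),
            "creator": item.findtext("dc:creator", namespaces=NS),
            "date": item.findtext("dc:date", namespaces=NS),  # None if absent
        }


for entry in items(FEED):
    print(entry["title"], entry["creator"], entry["date"])
```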

This proved surprisingly hard to do with my regular XML parser and toolkit, libxml2, which deals strictly with namespaces. I then repeated the exercise using three other toolkits -- Python's minidom module, E4X (ECMAScript for XML) as implemented in Rhino, and Mark Logic's XQuery-based Content Interaction Server. Each made the task simpler, though arguably not in a laudable way in the case of minidom and E4X, neither of which requires namespace prefixes to resolve to Uniform Resource Identifiers. But what's most striking when you point a variety of XML toolkits at documents that use namespaces is how differently each of them approaches the problem.
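
The contrast between strict and loose handling can be reproduced with ElementTree, again as a stand-in sketch on a made-up fragment. A query for a plain item finds nothing, because the items live in the RSS 1.0 namespace. A query succeeds once its own prefixes are bound to the right URIs, whatever prefixes the document's author happened to use. Python 3.8 and later also accept a {*} wildcard that matches a local name in any namespace, which is closer in spirit to the looser behavior described above.

```python
import xml.etree.ElementTree as ET

# Made-up fragment: items use the RSS 1.0 default namespace,
# metadata uses the "dc" prefix for Dublin Core.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item><title>Hello</title><dc:creator>Alice</dc:creator></item>
</rdf:RDF>"""

root = ET.fromstring(DOC)

# Strict behavior: "item" means an item in *no* namespace, so nothing matches.
naive = root.findall("item")

# The query's prefixes need not match the document's ("dublin" vs "dc");
# only the URIs they are bound to matter.
ns = {"r": "http://purl.org/rss/1.0/", "dublin": "http://purl.org/dc/elements/1.1/"}
creators = [c.text for c in root.findall("r:item/dublin:creator", ns)]

# Looser behavior (Python 3.8+): "{*}" matches a local name in any namespace.
loose = root.findall("{*}item")

# What identifies each child is its expanded name, not its prefix.
tags = [child.tag for child in root.find("r:item", ns)]

print(len(naive))  # 0
print(creators)    # ['Alice']
print(len(loose))  # 1
print(tags)        # ['{http://purl.org/rss/1.0/}title', '{http://purl.org/dc/elements/1.1/}creator']
```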

That's understandable, given that namespaces were always -- and still are -- optional. But thanks to Microsoft and Apple, what was the exception may soon become the rule.

That's good news in the long run. We'll increasingly want to mix and remix XML data, and to do so we'll need to master namespaces. In the short run, though, I expect more of the turbulence we ran into this week. Sam Ruby and Mark Pilgrim, co-developers of the RSS/Atom Feed Validator and contributors to the Atom specification, found problems with Apple's specification of an iTunes namespace, and with the way Apple and other podcast publishers use that namespace. Apple and those publishers should have known better. But they weren't the first to be bitten by the quirkiness of XML namespaces, and they won't be the last.
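
One quirk that can bite anyone extending a feed this way is that a namespace is identified by its exact URI string. The W3C Namespaces in XML recommendation compares namespace names character by character, so a URI that differs only in letter case names a different namespace. The sketch below assumes Apple's documented iTunes namespace URI, http://www.itunes.com/dtds/podcast-1.0.dtd, and uses a hypothetical feed and a hypothetical helper, itunes_author. The case-altered variant is invented for illustration and is not a claim about what Apple or any publisher actually shipped.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by Apple's iTunes podcast extension to RSS 2.0.
ITUNES = "http://www.itunes.com/dtds/podcast-1.0.dtd"


def itunes_author(rss_xml, uri=ITUNES):
    """Hypothetical helper: return the channel's itunes:author text, or None."""
    root = ET.fromstring(rss_xml)
    # "channel" has no prefix, so it matches the no-namespace RSS 2.0 element.
    return root.findtext("channel/it:author", namespaces={"it": uri})


# Hypothetical RSS 2.0 feed: core elements are in no namespace,
# the extension element is in the iTunes namespace.
good = f"""<rss version="2.0" xmlns:itunes="{ITUNES}">
  <channel><title>Show</title><itunes:author>Jane</itunes:author></channel>
</rss>"""

# Same feed with the namespace URI's letter case changed. Namespace names
# are compared as exact strings, so this is a *different* namespace.
variant = good.replace("dtds/podcast", "DTDs/Podcast")

print(itunes_author(good))     # Jane
print(itunes_author(variant))  # None
```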

Jon Udell is lead analyst and blogger in chief at the InfoWorld Test Center.
