Rebooting HTML for the Semantic Web

Seeking better standards compliance, the Web's creator faces an uphill battle

"Making standards is hard work," writes Tim Berners-Lee in a recent blog post. And he should know. The creator of the World Wide Web, Berners-Lee is responsible for developing and popularizing some of the most significant open standards in computing history.

His current project, the Semantic Web, is an attempt to carry Web standards well beyond anything we've known so far. Its goal is to transform today's Web into a semi-intelligent network of information resources, where machines will be able to analyze and understand the meaning of information, much as humans do today. If it succeeds, it will revolutionize information retrieval. And the key to its success is the rigorous application of standards.

But there's a catch: It's hard enough to get people to comply with the standards we have already.

HTML, in particular, has a troubled past. The headaches began in the bad old days of the browser wars, when competing browser makers would implement the specifications in dubious ways and add nonstandard features to their software. Confounded by conflicting results, developers got into the habit of writing whatever code worked, no matter what sins against the standards it required.

Some years ago, the engineers at the W3C reasoned that the best way to get back on track would be to start with Web developers. Get developers to write HTML that adheres to the published standards, rather than relying on the behavior of any one browser, and end-users would naturally gravitate toward browsers that did a better job of implementing the standards. In turn, this would create incentives for browser vendors to make standards compliance a top priority.

It was a logical enough plan. The folly lay in its execution. Because the way the W3C chose to reach developers was with -- you guessed it -- another standard.

Enter XHTML. A successor to the original browser markup language, XHTML combined the vocabulary of HTML with the syntax of XML, and in the process it stripped away many of the inconsistencies and bad coding practices that HTML developers had accumulated through the years.

XHTML actually has a lot going for it. Because of its strict syntax, it encourages more rigorous coding. It is also easy to check with automated tools, so Web developers can find out when they've made errors, much as programmers learn of mistakes from a compiler. What's more, it encourages the use of CSS (Cascading Style Sheets), which helps keep Web content separate from the details of how it is presented onscreen.
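To make the difference between strict and forgiving syntax concrete, here is a minimal sketch in Python. The language, the sample markup, and the helper names `is_well_formed` and `TagCounter` are illustrative assumptions, not anything from the W3C. It uses the standard library's XML parser as a stand-in for an automated checker, which rejects markup that isn't well-formed, and contrasts it with the standard library's lenient HTML parser, which, roughly the way browsers do, accepts the same sloppy markup without complaint. Note that this checks XML well-formedness only; a full XHTML validator would also check the document against the XHTML DTD.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML.

    This is only a well-formedness check (closed tags, quoted attributes,
    a single root element); it does not validate against the XHTML DTD.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


class TagCounter(HTMLParser):
    """A forgiving HTML parser that just counts the start tags it sees.

    Like a browser, html.parser does not reject malformed input.
    """

    def __init__(self) -> None:
        super().__init__()
        self.start_tags = 0

    def handle_starttag(self, tag, attrs):
        self.start_tags += 1


# Hypothetical "tag soup": uppercase tags, unclosed <P> and <BR>,
# and an unquoted attribute value.
sloppy = "<P>First line<BR>Second line<p class=note>Another paragraph"

# The same content written as well-formed XHTML-style markup.
strict = (
    "<div><p>First line<br />Second line</p>"
    '<p class="note">Another paragraph</p></div>'
)

if __name__ == "__main__":
    print("sloppy well-formed?", is_well_formed(sloppy))  # False
    print("strict well-formed?", is_well_formed(strict))  # True

    counter = TagCounter()
    counter.feed(sloppy)
    counter.close()
    print("lenient parser accepted", counter.start_tags, "start tags")  # 3
```

The contrast mirrors the debate in this column: the strict parser tells the author exactly when the markup is broken, while the lenient one silently accepts it, which is convenient for publishers but gives them no reason to fix their code.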

The problem? "The attempt to get the world to switch to XML ... all at once," writes Berners-Lee, "didn't work." In other words, very few Web developers use XHTML. Or if they do, they don't use it properly.

Berners-Lee blames the browsers for not requiring well-formed code, but his colleague Håkon Wium Lie, CTO of Opera Software and inventor of CSS, believes there's more to it than that. Lie suspects that XHTML is unpopular because it tends to "punish the good guys" by being too rigid and unforgiving in its syntax. Writing good XHTML is laborious, a pursuit better suited to engineers or library science majors than Web designers. What Web publishers care about is producing exciting content, not standards compliance.

So it's back to the drawing board. In his post, Berners-Lee announced a brand-new working group within the W3C that would once again try to address the challenges and shortcomings of HTML, while working on the XHTML standards in parallel. The new group will take input from engineers, browser vendors, and Web developers, and make incremental improvements to the standards, taking into account the needs of diverse audiences.

It's a good step. But it does make me wonder about the future of Berners-Lee's vision of the Semantic Web. The lesson learned from XHTML is that, when it comes to standards, just because you build it doesn't mean they will come. And yet, XHTML is only the beginning of the standards compliance that the Semantic Web would require. If the Semantic Web is to succeed, it will have to find ways to accommodate human nature, and not just good engineering -- or I suspect the work Mr. Berners-Lee has ahead of him will be very hard, indeed.

Copyright © 2006 IDG Communications, Inc.
