This frustrated many developers who saw greater potential in the Web as an application platform. In 2004, representatives of Apple, the Mozilla Foundation, and Opera Software founded the Web Hypertext Application Technology Working Group (WHATWG), an independent Web standards consortium. Working outside the W3C, WHATWG began a parallel effort to revamp HTML for a more application-centric view of the Web.
In 2007, with its XHTML 2 work mired in seemingly endless debate, the W3C voted to adopt WHATWG's work as the starting point for a new HTML5 standard. By this time, even Berners-Lee had come around to the notion of an application-centric Web. "Some things are clearer with hindsight of several years," he wrote in 2006. "It is necessary to evolve HTML incrementally. The attempt to get the world to switch to XML ... all at once didn't work."
That's not to say the concept of a pure-XML Web markup language is dead. Although HTML has retaken the lead role in the standards effort, an XML formulation of HTML5, to be known as XHTML5, is being developed at the same time. The difference is that while XHTML5 will be available for those who have already made the switch, developers will no longer be required to observe the rigorous syntax of XHTML to take advantage of Web markup's latest features.
HTML5: Markup gets a makeover
Be that as it may, HTML5 has inherited many additions originally proposed for XHTML 2, including a number of features designed to improve document structure. For example, new HTML tags such as <figure> allow content authors to specify common document elements in a consistent way. Previously, developers had to mark such elements using <div> tags with custom class attributes, an arbitrary method that made HTML documents difficult to parse.
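To illustrate why consistent element names help automated parsers, here is a minimal Python sketch that pulls figure captions out of a page using only the standard library's html.parser module. The sample markup, the FigureCaptionParser class, and the extract_captions helper are hypothetical names invented for this illustration, and the <figcaption> element is assumed as the caption tag. With <div>-based markup, the same parser would have to know each site's private class names.

```python
from html.parser import HTMLParser

# Hypothetical sample page: one figure marked up with HTML5 elements.
HTML5_PAGE = """
<article>
  <h1>Release notes</h1>
  <figure>
    <img src="chart.png" alt="Adoption chart">
    <figcaption>Browser adoption over time</figcaption>
  </figure>
  <p>Body text.</p>
</article>
"""


class FigureCaptionParser(HTMLParser):
    """Collects the text of every <figcaption> found inside a <figure>."""

    def __init__(self):
        super().__init__()
        self.figure_depth = 0    # greater than 0 while inside a <figure>
        self.in_caption = False  # True while inside a <figcaption>
        self.captions = []

    def handle_starttag(self, tag, attrs):
        if tag == "figure":
            self.figure_depth += 1
        elif tag == "figcaption" and self.figure_depth > 0:
            self.in_caption = True
            self.captions.append("")

    def handle_endtag(self, tag):
        if tag == "figure" and self.figure_depth > 0:
            self.figure_depth -= 1
        elif tag == "figcaption":
            self.in_caption = False

    def handle_data(self, data):
        # Only text inside a caption is kept; everything else is ignored.
        if self.in_caption:
            self.captions[-1] += data


def extract_captions(html):
    """Return the stripped caption text of each figure in the document."""
    parser = FigureCaptionParser()
    parser.feed(html)
    parser.close()
    return [caption.strip() for caption in parser.captions]


print(extract_captions(HTML5_PAGE))  # ['Browser adoption over time']
```

Because the parser keys on tag names rather than on author-chosen class attributes, the same code should work on any page that follows the convention.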
HTML5 also continues the effort to separate Web content from presentation. Developers might be surprised to see the <b> and <i> elements available in the new standard, for example, but these elements are now used to offset portions of text in generic ways, without implying any specific typographic treatment. Where the <i> element once implied italic type, in HTML5 it merely means "a span of text in an alternate voice or mood." Similarly, the <b> element does not imply specifically boldfaced text, but text that is stylistically offset without having any additional importance.
By comparison, the <u> tag, which referred specifically to underlined text, has been dropped from HTML5, along with other presentation-specific elements, including <strike>. Such stylistic attributes are now considered the exclusive domain of CSS.
The new standard introduces additional data types for form input elements, including dates, URLs, and email addresses. Still other elements improve support for non-Latin character sets, including tags for specifying the "ruby text" that appears in some Asian languages. HTML5 also introduces the concept of microdata, a method of annotating HTML content with machine-readable tags, making it easier to process for the Semantic Web. Together, these structural enhancements allow content authors to build cleaner, more manageable Web pages that play nicely with search engines, screen readers, and other automated content parsers.
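As a rough sketch of what "machine-readable" means in practice, the following Python example reads microdata properties from a fragment of markup using the standard library's html.parser module. The sample markup, the example.com vocabulary URL, and the MicrodataParser and read_properties names are all hypothetical, and the parser is deliberately simplified: it assumes a single flat item with no nested properties, and it takes the value of a link property from its href attribute.

```python
from html.parser import HTMLParser

# Hypothetical fragment annotated with microdata attributes.
SAMPLE = """
<div itemscope itemtype="https://example.com/vocab/Person">
  <span itemprop="name">Ada Example</span>
  <a itemprop="url" href="https://example.com/ada">Home page</a>
  <span itemprop="jobTitle">Engineer</span>
</div>
"""


class MicrodataParser(HTMLParser):
    """Simplified reader: one flat item, no nested itemprop elements."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None  # itemprop name whose text is being collected

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        name = attr_map.get("itemprop")
        if name is None:
            return
        if tag == "a" and "href" in attr_map:
            # For links, the property value is the URL, not the link text.
            self.properties[name] = attr_map["href"]
        else:
            self._current = name
            self.properties[name] = ""

    def handle_endtag(self, tag):
        # Flat-item assumption: any closing tag ends the current property.
        self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.properties[self._current] += data


def read_properties(html):
    """Return a dict mapping each itemprop name to its extracted value."""
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.properties


print(read_properties(SAMPLE))
```

A search engine or other consumer could use output of this kind without guessing which visible text is a name and which is a job title.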
Enabling a richer, standards-based Web
But the most eagerly anticipated additions to HTML5 are the new elements and APIs that enable content authors to create rich media using nothing more than standards-based HTML. Modern Web pages increasingly incorporate scalable graphics, animation, and multimedia, but so far these capabilities have required proprietary plug-ins such as Flash, RealMedia, and QuickTime. Such plug-ins not only introduce new security risks, but they also narrow the audiences of the resulting pages.