HTML5 in the Web browser: HTML5 forms

The newest specs for HTML forms give programmers more control over data input and validation, while offloading much of the work to the browsers

The changes and enhancements to the form tags are some of the most extensive amendments to the HTML5 standard, offering a wide variety of options that once required add-on libraries and a fair amount of tweaking. All of the hard work that went into building self-checking widgets and the libraries that ensure the data is of the correct format is now being poured into the browser itself. The libraries won't be necessary -- in theory -- because the work will be done seamlessly by all browsers that follow the standard. In practice, we'll probably continue to use small libraries that smooth over slight inconsistencies.

The new HTML specifications include input types that offer a number of new options for requesting just the right amount of data -- say, a form element that requests the time in different levels of granularity, such as month, week, or minute. Other new input types insist that the user type in only valid URLs or email addresses. All of these input fields will be tested to ensure that the text in them is valid, and the user's progress toward satisfying the data integrity police will be tracked by a series of events. The spec even defines a value sanitization algorithm for each type, which checks the information and cleans it up -- stripping stray line breaks from a URL, for example.


Compliance with these options is gradually appearing in the browsers. At the time of this writing, for instance, Chrome lets you pin the min and max for some dates, but you can't install a value sanitization function. The minimum and maximum values are, of course, the simplest controls to create. It's much harder to offer the deeper hooks.

Holes like this are sprinkled throughout the new options. Firefox, Safari, Opera, and Internet Explorer are all slowly rolling out the new form features, and they're pretty much done with the most important ones. Alas, not all of them support the new features in exactly the same way, so it's still a bit complicated to create content that uses them. But as these gaps close, the new form elements will make it much easier for Web developers to gather information and enforce a few rules that keep the users in line.
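One common way to cope with the uneven support is to ask the browser directly. The sketch below is a minimal example, not a complete detection library. It relies on the fact that a browser that doesn't recognize an input type reports it back as plain text. The function name supportsInputType and its doc parameter are my own inventions for illustration.

```javascript
// Hypothetical helper: returns true if the browser keeps the requested
// input type instead of silently falling back to "text".
// `doc` is assumed to be a DOM Document (window.document in a page).
function supportsInputType(doc, type) {
  const input = doc.createElement("input");
  input.setAttribute("type", type);
  // Unrecognized types are reported back as "text" by the type property.
  return input.type === type;
}

// Usage in a page (skipped when no DOM is available):
if (typeof document !== "undefined") {
  for (const type of ["date", "week", "email", "url", "color"]) {
    console.log(type, supportsInputType(document, type));
  }
}
```

A check like this only says that the type is recognized. It says nothing about how polished the widget is, so a fallback library may still be worth keeping around.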

To find out if your browser supports the new input data types and controls, try my experimental HTML5 table at

HTML5 forms: Input element type
In the old days, there were only a few types of input widgets in the forms: radio buttons, check boxes, and catchall boxes that accepted text. Even the color wheel choosers in JavaScript libraries would simply place the RGB values for selected hues in a text input box. If you wanted to do any value checking, it was up to you to implement it with JavaScript. In time, data validation became relatively easy to do with the various libraries, but it was still up to the programmer to handle.

The new options take on some of these chores. The compliant browser will now make a distinction between a wide range of data types, including dates, email addresses, numbers, and URLs. Each of these types has several more specific options. The date field may ask for a full date, a year and week alone, a year and month alone, or just the time of day. If you want to be very specific, you can mix together a date and time with the option of including or leaving off a time zone.
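Each level of granularity submits its value as text in a fixed shape, which matters when the data arrives at the server or when a browser falls back to a plain text box. The following sketch shows those shapes as I understand them from the spec. The names VALUE_FORMATS and looksLikeValueFor are hypothetical, and the patterns check the shape only (they would accept February 31, for instance).

```javascript
// Assumed submitted-value shapes for four of the date and time types.
const VALUE_FORMATS = new Map([
  ["date", /^\d{4,}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/], // 2011-02-14
  ["month", /^\d{4,}-(0[1-9]|1[0-2])$/],                      // 2011-02
  ["week", /^\d{4,}-W(0[1-9]|[1-4]\d|5[0-3])$/],              // 2011-W07
  ["time", /^([01]\d|2[0-3]):[0-5]\d(:[0-5]\d(\.\d{1,3})?)?$/], // 09:30
]);

// Hypothetical helper: true if `value` has the shape expected for `type`.
// Types that are not in the table always return false.
function looksLikeValueFor(type, value) {
  const pattern = VALUE_FORMATS.get(type);
  return pattern !== undefined && pattern.test(value);
}
```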

Some of these types seem like invitations to trouble. I'm happy I'm not responsible for implementing the code that will validate all of the different kinds of telephone numbers around the world. In America, it's a hassle because some folks will punctuate the number in odd ways, like wrapping the area code in parentheses. Freezing these rules in the browser standard will be problematic if the phone companies dream up new ways of using the numbers. Of course, if that day arrives we can always switch the checking off, because the novalidate attribute on a form and the formnovalidate attribute on a submit button tell the browser to skip validation. Or we can just forget about the extra features, flip the input type back to pure text, and use JavaScript the way we've always done it.

HTML5 forms: Input element type attributes
Choosing the type is just the beginning of the fun when creating these new form elements. Each type may or may not have additional features that can be specified with additional attributes. Many of these attributes are straightforward. For example, min and max can only be used with dates, times, and numbers, and not with unlikely items like email addresses, even though they're technically sortable.

By my quick count, there are 37 attributes and 14 different types. The current version of the HTML5 input element specs includes a table that shows which attributes are allowed (limiting the max value of a number, for example) and which are ignored (limiting the max value of an email address) for which types. I'm still a bit confused by why you can only specify a placeholder for some types. This short suggestion (for example, "your email address") isn't available for times or colors. Most of the other pairs that are allowed or forbidden are easy to understand, but I think most will find one or two combinations that they wish were there.

The new mechanisms are meant to extend the status quo, and that means not changing some of the old patterns. To me, it might make sense to allow each type of input to be hidden with an attribute, but the new standard continues the old approach of making "hidden" a type that accepts generic text. That's the price of backward compatibility.

HTML5 forms: Client-side form validation
Specifying the type and attributes is just the beginning, because the validation process is fairly transparent. While the form will handle most of the work for you, it will also allow a number of hooks for interrupting the process or replacing it.

When something seems incorrect, the validation will set up a data structure that can be queried. The property validity.patternMismatch, for instance, will be true if a pattern is specified but the data doesn't fit it.
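To make the idea of a queryable data structure concrete, here is a small sketch that reads the flags off a field's validity object. The flag names are the standard ones. The helper failedChecks is hypothetical, and it accepts anything that carries a validity property, which in a page would be an input element.

```javascript
// Standard boolean flags exposed on a field's `validity` object.
const VALIDITY_FLAGS = [
  "valueMissing", "typeMismatch", "patternMismatch", "tooLong",
  "rangeUnderflow", "rangeOverflow", "stepMismatch", "customError",
];

// Hypothetical helper: lists the flags that are currently raised on a field.
function failedChecks(field) {
  return VALIDITY_FLAGS.filter((flag) => field.validity[flag] === true);
}

// In a page, assuming an input with the made-up id "zip":
// console.log(failedChecks(document.getElementById("zip")));
```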

If you want to specify your own validation, you can call setCustomValidity to attach a custom message indicating why the data might not be acceptable. You can fire off this routine from an oninput or onchange handler.
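A typical use for a custom rule is a check the built-in types can't express, such as two password boxes that must agree. This is only a sketch. The function checkPasswordsMatch and the message text are my own, while setCustomValidity is the standard method.

```javascript
// Hypothetical rule: two password fields must hold the same text.
// `first` and `second` are assumed to be input elements, or any objects
// exposing `value` and the standard setCustomValidity method.
function checkPasswordsMatch(first, second) {
  const message =
    first.value === second.value ? "" : "The passwords do not match.";
  // An empty string clears the custom error; any other string marks
  // the field as invalid and becomes its validation message.
  second.setCustomValidity(message);
  return message === "";
}

// Wiring in a page, with made-up variable names:
// confirmBox.oninput = () => checkPasswordsMatch(passwordBox, confirmBox);
```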

Problematic input data also triggers an invalid event that you can trap. Data checks can be set off by calling the checkValidity method.
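The two pieces fit together roughly as follows: calling checkValidity returns false for a bad field and fires an invalid event on it along the way. The helper names watchInvalid and invalidFieldNames are hypothetical, and the fields are assumed to be ordinary form controls.

```javascript
// Hypothetical helper: report a field's name and the browser's own
// message whenever the field fails a validity check.
function watchInvalid(field, onProblem) {
  field.addEventListener("invalid", () => {
    onProblem(field.name, field.validationMessage);
  });
}

// Hypothetical helper: run the built-in check on each field and
// return the names of the ones that fail.
function invalidFieldNames(fields) {
  return fields.filter((field) => !field.checkValidity()).map((field) => field.name);
}
```

In a page, the list of fields could come from Array.from(form.elements), although that collection also includes buttons and fieldsets.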

It's all pretty flexible and built in a way that will be familiar to everyone used to the traditional mechanism of attaching functions that listen for particular events. There are probably three or four different ways to check each form field.

The standard also includes a good reminder that the clients can't be trusted to enforce these rules. Although testing the data locally will save time and energy, it won't be a perfect solution because older browsers may not implement the validity checks. It's also possible that clever users may override some of the methods and block checking. For this reason, any serious data validation rules must be re-evaluated at the server. The browser can't be trusted.
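What re-evaluating at the server looks like depends entirely on the application, but the shape is usually the same: treat every submitted field as untrusted text and apply the rules again. The sketch below assumes a form with an email field and a quantity between 1 and 10. Both field names and the function revalidate are made up for illustration, and the email pattern is deliberately loose.

```javascript
// Hypothetical server-side check for one submitted form.
// `body` is assumed to be an object of raw string fields, such as a
// form parser on the server might produce.
function revalidate(body) {
  const errors = [];

  const email = String(body.email ?? "").trim();
  // A loose shape check only; this is not a full address grammar.
  if (!/^[^\s@]+@[^\s@]+$/.test(email)) errors.push("email");

  const quantity = Number(body.quantity);
  // Mirrors min="1" max="10" step="1" on the client, enforced again here.
  if (!Number.isInteger(quantity) || quantity < 1 || quantity > 10) {
    errors.push("quantity");
  }

  return errors; // an empty array means the submission passed
}
```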

HTML5 forms: Customizable options
Simply validating the data as acceptable or not acceptable is not the only option anymore. HTML5 includes several attributes that let you offer help and suggestions to the visitor.

The simplest option lets you turn on spell-check for any input element that's marked as editable. This will normally apply to form elements like textarea but may also include any part of the document that's marked contenteditable. (Editable content is discussed below.) The attribute spellcheck='true' determines when it applies.

I'm guessing that the spellcheck attribute also toggles the grammar checker, but it's not immediately apparent to me. The title of the section of the spec is "Spelling and grammar checking," but the text only mentions one attribute called spellcheck. If I were designing the spec, I would make them independent, if only because I've found that one feature is much more accurate than the other.

The datalist element lets you add a list of strings that can automatically complete a form element. The structure is like the option tags used in select elements. At this point, only Opera seems to support the feature, and some feel it makes the HTML that much grungier by larding it up with suggested answers. I'm also a bit annoyed by the idea that each potential option comes with a label that is displayed and a value that actually fills up the form element. It seems like a dangerous way to hide functionality from the user and perhaps trick them into thinking that one thing is going in the form (the label), while filling it with another (the value).
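For readers who haven't seen the markup, the input points at the list through its list attribute, which must match the id of the datalist. The sketch below generates that markup from an array of suggestions. The functions escapeHtml and datalistMarkup are hypothetical helpers, and I've used only the value attribute on each option, which sidesteps the label-versus-value mismatch.

```javascript
// Minimal escaping for text placed inside a double-quoted attribute.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: builds an input wired to a datalist of suggestions.
function datalistMarkup(id, suggestions) {
  const options = suggestions
    .map((s) => `  <option value="${escapeHtml(s)}">`)
    .join("\n");
  return (
    `<input list="${escapeHtml(id)}">\n` +
    `<datalist id="${escapeHtml(id)}">\n${options}\n</datalist>`
  );
}

console.log(datalistMarkup("browsers", ["Chrome", "Firefox", "Opera"]));
```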

I was also confused by the possibility of having an external list of data options stored in an XML file independent of the current HTML form. This would not only simplify the HTML but also make the data reusable in different pages. It seems like a good idea, but the spec doesn't mention it yet. I've found only secondary references to this option.

HTML5 forms: Authentication
One of the most tempting options brings authentication or certification to the form information, but it is still rather unformed and not very well implemented. The so-called keygen element adds some form of cryptography using public-key encryption, but it is only partially implemented on Chrome, Firefox, and Opera, despite dating from the time of Netscape. The potential power is huge, but I think it will take several more iterations to find a good set of features that work the way that people expect.

The idea is to get the browser to offer a way to generate pairs of public and private keys automatically. Many programmers who've tried to use keygen say it's confusing for the average person because it requires too much understanding of such details as the length of keys. There are also deeper issues about how users might move the certificates from computer to computer or how malware might target them.

In the future, the option might include a better way to automatically use a key pair to sign all data in the form, not just the challenge attribute attached to the keygen item. This, of course, requires a more standard mechanism for creating the signature over all possible forms of data. The standard hash functions and message digests are probably a good place to begin. This will have to wait until the feature is more fully formed.

HTML5 drag and drop
The ability to drag HTML elements around and drop them somewhere else is an old option for Web designers who are willing to use their own libraries, but it's always been mired in some confusion. After Microsoft included drag-and-drop support in what was called DHTML in 1999, developers had to struggle with cross-browser problems. A number of good cross-browser scripts appeared over the years, and many sites use them, even though they seem to confuse the public, who tend to expect the items on Web pages to be somewhat fixed in place. I've often expected companies like Netflix to implement drag and drop to maintain lists, but they never seem to choose that path.

In any case, the HTML5 drag-and-drop spec smooths away many of the browser differences. In theory, the cross-browser scripts won't be necessary as long as all browsers follow the standard in exactly the same way. All that you need to do is add the attribute draggable='true' and the element can be picked up and moved.

Well, that's not quite all. If you want to do something with the dragged element, you must be able to handle at least seven different events that fire as it moves around the page. Struggling to deal with all possible options has driven some people to write long complaints about the complexity. (A "disaster" and "far from complete" are two early gripes.)
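In practice a simple drop target can get by with three of the seven events: dragstart, dragover, and drop. Here is a minimal sketch under that assumption. The function wireDragAndDrop and its onDropped callback are hypothetical, while the event names and the dataTransfer calls are the standard ones.

```javascript
// Hypothetical wiring: `item` is an element marked draggable='true',
// `target` is the drop zone, and `onDropped` receives the dragged id.
function wireDragAndDrop(item, target, onDropped) {
  item.addEventListener("dragstart", (event) => {
    // Record what is being dragged so the drop handler can read it back.
    event.dataTransfer.setData("text/plain", item.id);
  });

  target.addEventListener("dragover", (event) => {
    // Drops are refused by default; cancelling dragover allows them.
    event.preventDefault();
  });

  target.addEventListener("drop", (event) => {
    // Stop the browser from handling the drop itself (navigating, for instance).
    event.preventDefault();
    onDropped(event.dataTransfer.getData("text/plain"));
  });
}
```

The remaining events (drag, dragenter, dragleave, and dragend) are mostly useful for visual feedback, such as highlighting the target while something hovers over it.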

There are also some compatibility issues. Safari, for instance, requires a separate CSS entry to turn on dragging even after you add the draggable='true' attribute. All of these issues point to the fact that someone is going to write a simpler drag-and-drop library that abstracts away much of this complexity and makes it as easy as adding the draggable='true' attribute.

HTML5 forms: Self-calculating form fields
One of the traditional jobs of JavaScript has been to perform calculations for the user who is adding data to the form. The traditional way was to set up some text input elements and let the JavaScript change the other elements whenever an onchange event fires.

The new idea is to create a new output element that will work in concert with the input element. Its for attribute names the fields that feed the result, and in the current spec the formula itself lives in a short script, typically an oninput handler on the form, that writes to the output field whenever the form changes. I have tried to use this on several browsers without success. It just seems easier to use good old input fields instead.
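For what it's worth, the pattern shown in the spec's own examples is a script that runs whenever the form receives input and writes the result into the output element. The sketch below follows that pattern. The field names a, b, and total and the function recalculate are assumptions of mine.

```javascript
// Hypothetical handler: `form.elements` is assumed to expose two number
// inputs named "a" and "b" and an output element named "total".
function recalculate(form) {
  const { a, b, total } = form.elements;
  const sum = Number(a.value) + Number(b.value);
  // An output element shows whatever string is assigned to its value.
  total.value = String(sum);
  return sum;
}

// Wiring in a page, with a made-up variable name:
// orderForm.oninput = () => recalculate(orderForm);
```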

The output can also be represented graphically using the progress and meter tags. Both essentially represent some fraction between zero and one as a thermometer-like rectangle that fills up with color -- but there are differences. The progress element has an "indeterminate" setting that indicates the software has no clue what the value really is. This is usually displayed as wavy lines.
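The fraction the progress bar draws comes from its value and max attributes, and leaving the value off is what puts it in the indeterminate state. The sketch below mirrors that arithmetic as I understand it. The function progressPosition is hypothetical, and returning -1 for the indeterminate case follows my reading of the element's position property.

```javascript
// Hypothetical helper: the filled fraction of a progress bar.
// Returns -1 when no value is given, meaning "indeterminate".
function progressPosition(value, max = 1) {
  if (value === undefined || value === null) return -1;
  // Clamp the value into the range 0..max before dividing.
  const clamped = Math.min(Math.max(value, 0), max);
  return clamped / max;
}
```

The meter element is the more elaborate of the two, since it also accepts min, low, high, and optimum attributes that let the browser color the bar according to whether the reading is good or bad.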
