The first is the W3C's VoiceXML (VXML) standard, which is an XML specification for voice and telephone keypad interactions with the Web. VXML pages are interpreted by a voice browser, much as HTML is interpreted by a Web browser. The forthcoming third version will permit similar interactions using SMS and instant messaging.
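To make the voice-browser analogy concrete, here is a minimal sketch that uses Python's standard library to generate a small VoiceXML 2.1 page containing one touch-tone menu. The `menu`, `prompt`, and `choice` elements and the `dtmf` and `next` attributes come from the VoiceXML specification; the function name, prompt wording, and target URLs are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML namespace


def build_menu(prompt_text, choices):
    """Return a VoiceXML 2.1 document, as a string, with one keypad menu.

    `choices` maps a keypad digit to (spoken label, target URL).
    All values passed in are hypothetical examples.
    """
    root = ET.Element("vxml", {"version": "2.1", "xmlns": VXML_NS})
    menu = ET.SubElement(root, "menu", {"id": "main"})
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = prompt_text
    for digit, (label, target) in choices.items():
        # Each <choice> ties one keypad digit to the page to load next.
        choice = ET.SubElement(menu, "choice", {"dtmf": digit, "next": target})
        choice.text = label
    return ET.tostring(root, encoding="unicode")


# Hypothetical airline menu
page = build_menu(
    "Press 1 for flight status. Press 2 to rebook.",
    {"1": ("flight status", "status.vxml"), "2": ("rebook", "rebook.vxml")},
)
```

A voice browser that fetched `page` would read the prompt aloud and then load whichever page matches the key the caller presses, much as a Web browser follows a link.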
The second, State Chart XML (SCXML), looks to be a key enabler of synchronized processing of multimodal voice and Web interactions, so customers can talk to your call center on their smartphones and simultaneously use the keypad to choose from a list of options in the phone's Web browser. For example, if a customer calls an airline and finds out his flight is cancelled, the voice portal can send a Web page of new flights to his smartphone, so he can book one without a live agent.
The third key standard is Call Control XML (CCXML), which handles call processing, including how to accept, transfer, and record a call, as well as how to set up a conference.
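SCXML describes an interaction as a set of states and the events that move between them. The sketch below is not SCXML itself; it is a plain finite-state machine in Python that walks through the cancelled-flight scenario, with every state and event name invented for illustration.

```python
# Hypothetical states and events for the cancelled-flight scenario.
# Each key is (current state, event); each value is the next state.
TRANSITIONS = {
    ("on_call", "flight_cancelled"): "offering_rebooking",
    ("offering_rebooking", "web_page_sent"): "awaiting_choice",
    ("awaiting_choice", "flight_selected"): "booked",
    ("awaiting_choice", "agent_requested"): "with_agent",
}


def run(events, state="on_call"):
    """Apply events in order; an event with no matching transition is ignored."""
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state


final_state = run(["flight_cancelled", "web_page_sent", "flight_selected"])
```

Because the voice channel and the Web channel both feed events into the same machine, either one can advance the interaction, which is the kind of synchronization the standard is meant to support.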
It's not necessary to understand these standards in depth, as most vendors offer drag-and-drop graphical development tools that hide all that XML code and provide reusable code for many common VoiceXML interactions. Many of these tools take advantage of the open source Eclipse development environment, which is familiar to most Web developers. There are several open source IVR efforts as well.
Implementing voice interfaces and recognition can be tricky
The one area that does require specialized expertise is the interface for touch-tone interactions and speech recognition, known collectively as the voice user interface. Voice UIs follow a different paradigm from Web page interactions and so require specialized training. However, a team of voice portal developers probably needs only one or two voice UI experts; your telecom group may already have such specialists, and voice portal vendors are happy to do the consulting work for you. Once the voice UI is designed and perfected, the IVR project can be completed by typical Web developers.
Implementing speech recognition is particularly difficult. "The skills around developing speech applications are pretty stringent," says Brian Bischoff, global vice president of subscriber solutions at Genesys, a Web conferencing vendor. "A speech application can perform pretty poorly if you don't know what you're doing." Speech recognition vendors include Nuance, Tuvox, and Voxify.
If you implement speech recognition, note that your per-port enablement fees will rise significantly. It typically costs $1,000 to $2,000 per port to enable a voice portal. With speech recognition, that expense climbs to $1,800 to $3,600 per port, says Bischoff.
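To see what those per-port figures mean for a whole system, here is a rough sketch of the arithmetic. The per-port ranges are the ones Bischoff cites; the 48-port system size and the function name are assumptions chosen purely for illustration.

```python
def enablement_cost(ports, per_port_low, per_port_high):
    """Return the (low, high) total enablement cost in dollars."""
    return ports * per_port_low, ports * per_port_high


PORTS = 48  # hypothetical system size, not a figure from the article

basic = enablement_cost(PORTS, 1_000, 2_000)   # voice portal only
speech = enablement_cost(PORTS, 1_800, 3_600)  # with speech recognition

# Extra cost attributable to speech recognition, low and high estimates.
premium = (speech[0] - basic[0], speech[1] - basic[1])
```

For this hypothetical 48-port system, enablement runs $48,000 to $96,000 without speech recognition and $86,400 to $172,800 with it, an 80 percent increase at both ends of the range.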
Many larger companies have multiple IVR systems in place. If yours does, it makes sense to look for tools that can create a common voice UI over several different voice portal systems. Telecom provider T-Mobile International faced that challenge, ultimately using Voxeo's tools to develop a single voice UI across the mishmash of voice portals it had accumulated in many European countries through several acquisitions. "We can also make a single change across them all, rather than having to make separate changes to each of them," says Jan Safka, T-Mobile's senior head of voice and mobile services.