The why and how of voice portals

Beyond saving money for business, IT can leverage voice portals for better functionality

For years, IT and business have heard the sexy promise of "IP convergence," which would allow all sorts of voice- and video-enabled applications to appear in business. However, for most organizations, this Jetsons-like vision has yet to occur.

But many IT organizations are in the midst of one major "IP convergence" transition -- from traditional proprietary interactive voice response (IVR) systems, the voice-prompt phone interfaces so common for automated customer service and other call-center activities, to standards-based voice portals. According to Datamonitor, sales of voice portal technology surpassed those of traditional IVR systems sometime in 2007 and have continued their upward trend ever since. And with the current recession, IT may find IVR moving up its priority list.

A voice portal replaces the once-proprietary vendor IVR box in a legacy contact center system with software developed in Java, PHP, or .Net that runs on a standards-based application server. That software accepts requests from a special server-based voice browser, which in turn receives pages back from the application server in a standard format called VoiceXML.
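
To make that request-and-response loop concrete, here is a minimal sketch of the application-server side. It is written in Python for brevity rather than the Java, PHP, or .Net named above, and the function names and greeting text are hypothetical. It assumes only that the voice browser fetches pages over HTTP the way a Web browser would.

```python
from xml.sax.saxutils import escape

# A bare-bones VoiceXML page: one form that speaks one prompt.
VXML_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="greeting">
    <block>
      <prompt>{message}</prompt>
    </block>
  </form>
</vxml>
"""


def render_greeting(message):
    """Return a VoiceXML page that speaks one message to the caller."""
    return VXML_TEMPLATE.format(message=escape(message))


def voice_app(environ, start_response):
    """WSGI entry point: the voice browser requests this like any Web page."""
    # Hypothetical message; a real portal would pull this from back-end data.
    body = render_greeting("Welcome. Your order shipped today.").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/voicexml+xml"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Any WSGI server can host `voice_app`; for a quick local experiment, `wsgiref.simple_server.make_server("", 8000, voice_app).serve_forever()` from the standard library will do.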

Standardization has a lot of advantages over the old proprietary IVR systems. You no longer need an Avaya, Nortel, or Intervoice expert to integrate your IVR with your back-end systems using screen scrapes and proprietary APIs. Instead, your voice portal can simply act as another Web service linked into existing Web-enabled infrastructure, including live sales data; personalization information; and back-end inventory order management, shipping, and accounting systems. "Since you're using the same logic as your Web interface, your customers get the same menus and the same answer to the same questions, regardless of how they come in," says Joe Outlaw, a principal analyst for Frost and Sullivan. All this can be accomplished without having to change any of your existing phone systems, be they TDM- or IP-based.

For business, the rationale for voice portals is clear: reducing the labor costs of answering calls. Well-developed voice portals can resolve many more calls without the need for a live agent than legacy IVR systems can. "A live-agent call can cost anywhere from $6 to $15," says Michael Perry, director of product management at Avaya's interactive solutions group, "but a typical voice portal call resolved without an agent runs between 25 and 50 cents, depending on whether speech recognition is used."
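
As a rough illustration of what those per-call figures imply, the sketch below multiplies the cost gap by a call volume. The 10,000 calls per month is a made-up number for illustration only; the per-call costs are the ranges Perry cites.

```python
def monthly_savings(calls_resolved, agent_cost, portal_cost):
    """Savings from calls the portal resolves without a live agent."""
    return calls_resolved * (agent_cost - portal_cost)


# Hypothetical volume: 10,000 calls per month resolved without an agent.
calls = 10_000

# Least favorable pairing: cheapest agent call, priciest portal call.
low = monthly_savings(calls, agent_cost=6.00, portal_cost=0.50)
# Most favorable pairing: priciest agent call, cheapest portal call.
high = monthly_savings(calls, agent_cost=15.00, portal_cost=0.25)

print(f"${low:,.0f} to ${high:,.0f} per month")
```

The savings apply only to calls that would otherwise have reached an agent, so the volume fed in should be the number of additional calls the portal resolves, not total call volume.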

IT can get more business bang for its technology buck
But there are benefits to IT as well. One is the ability to respond more quickly to changing business needs than with traditional IVR systems. Another is the ability to leverage the IVR effort to provide more functionality than the business group realized it could gain, which helps make IT's case as a business enabler. And the third is lower costs for supporting IVR systems.

For example, you might help your company improve its customer experience by adding a speech recognition engine. Many airlines have already implemented this technology, letting customers book reservations simply by speaking the names of the origin and destination cities. In financial services deployments, a broker's clients can check or purchase a stock simply by saying its name or symbol, something that would be difficult, if not impossible, with traditional touch-tone-response IVR systems.

Or, by linking your voice portal with your existing personalization engine, your voice application can identify the caller and immediately ask if he wants to make a payment to the account ending in 1224, if that's what he has often done in the past, or if the call relates to an e-mail or invoice that was recently sent.
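
The logic behind that kind of greeting can be quite small. The following is a sketch only: the intent labels, the "at least half of past calls" threshold, and the function name are assumptions for illustration, not any vendor's personalization API.

```python
from collections import Counter

DEFAULT_PROMPT = "How can I help you today?"


def opening_prompt(call_history, account_suffix):
    """Pick the first question from what this caller usually does.

    call_history is a list of past call intents, e.g. ["payment", "balance"].
    """
    if not call_history:
        return DEFAULT_PROMPT
    intent, count = Counter(call_history).most_common(1)[0]
    # Offer the shortcut only when payments make up at least half of past calls.
    if intent == "payment" and count * 2 >= len(call_history):
        return (
            "Would you like to make a payment to the account "
            f"ending in {account_suffix}?"
        )
    return DEFAULT_PROMPT
```

For instance, `opening_prompt(["payment", "payment", "balance"], "1224")` returns the payment question, while a first-time caller with an empty history gets the generic greeting.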

Perhaps most important, voice/data integration makes it easier to send data along with a call transfer so that if an agent is needed, she doesn't have to ask for the same personal information the caller just entered.

Your IT budget can benefit as well from easier ongoing development and maintenance, as developers no longer need specialized expertise in Nortel or Avaya APIs to make simple changes. In fact, in many cases employees can make minor changes themselves, which can be important in industries such as airlines where conditions can change daily or even hourly.

Travelocity is a case in point: "We had to respond better to the volatility of airlines with all their bankruptcies, bad weather, and policy changes. It was such an arduous task to change that information in our legacy IVRs," says Rob Mabry, senior manager of software development for agent systems.

Longer term, voice portals are a nice entry point into the world of convergence, as they demonstrate many of the potential benefits of combining voice and data without the major infrastructure upgrade required for full-fledged IP telephony. It's a way to get your feet wet and see just how voice/data convergence can benefit your organization.

But as attractive as voice portals are, don't underestimate the upfront effort. At Travelocity, the transition to a voice portal was "a forklift in terms of hardware and architecture," Mabry says. That's because there was no way to port existing IVR applications to the new voice portal system. However, that reality gave Travelocity the opportunity to reengineer and improve its call flows and add better speech recognition.

"Over five years, our legacy system had become spaghetti code and was very difficult to maintain," Mabry says. With the new voice portal, that's changed. "Our Web application developers can get features out much more quickly than we could before," he notes.

Major voice portal providers include Avaya, Cisco, Convergys' Intervoice unit, and Nortel. Travelocity built its voice portal application, which runs on a Tomcat application server, with Intervoice InVision Studio tools and uses a Nuance Speechworks media server for speech recognition.

IT can also roll its own voice portals using a combination of industry standards and specialized development tools offered by vendors such as OpenMethods, Syntellect, and Voxeo. Many savvy IT departments simply write their own in Java.

IVR development doesn't require many highly specialized skills
Most seasoned Web developers can start building voice portal applications fairly quickly. That's because voice portals rely heavily on a trio of standards.

The first is the W3C's VoiceXML (VXML) standard, which is an XML specification for voice and telephone keypad interactions with the Web. VXML pages are interpreted by a voice browser, much as HTML is interpreted by a Web browser. The forthcoming third version will permit similar interactions using SMS and instant messaging.
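
To give a feel for what a VXML page looks like, the sketch below builds a simple menu that accepts either a keypress or a spoken choice. It is generated from Python for illustration; the menu text and URLs are hypothetical, while the `menu`, `prompt`, `choice`, `noinput`, `nomatch`, and `reprompt` elements come from the VoiceXML 2.x specification.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_menu(prompt_text, choices):
    """Build a VoiceXML menu page.

    choices is a list of (dtmf_key, spoken_label, next_url) tuples.
    """
    vxml = ET.Element("vxml", {"version": "2.1", "xmlns": VXML_NS})
    menu = ET.SubElement(vxml, "menu", {"id": "main"})
    ET.SubElement(menu, "prompt").text = prompt_text
    for key, label, url in choices:
        # Each choice matches a keypress (dtmf) or the spoken label.
        choice = ET.SubElement(menu, "choice", {"dtmf": key, "next": url})
        choice.text = label
    # Handlers for silence and for unrecognized input replay the prompt.
    for event, apology in (("noinput", "Sorry, I did not hear anything."),
                           ("nomatch", "Sorry, I did not understand.")):
        handler = ET.SubElement(menu, event)
        handler.text = apology
        ET.SubElement(handler, "reprompt")
    return ET.tostring(vxml, encoding="unicode")


page = build_menu(
    "Say balance or press 1. Say payment or press 2.",
    [("1", "balance", "/balance.vxml"), ("2", "payment", "/payment.vxml")],
)
print(page)
```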

The second is the State Chart XML (SCXML) standard, which looks to be a key enabler of synchronized processing of multimodal voice and Web interactions, so customers can talk to your call center on their smartphones and simultaneously use their keypad to choose from a list of options in the phone's Web browser. For example, if a customer calls an airline and finds out his flight is cancelled, the voice portal can send a Web page of new flights to his smartphone, so he can book one without a live agent.
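
The idea underneath a state chart is a table of states and the events that move a call between them. The sketch below walks the cancelled-flight example through such a table. It is plain Python, not SCXML syntax, and every state and event name is hypothetical.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("greeting", "flight_cancelled"): "offer_rebooking",
    ("offer_rebooking", "page_sent"): "await_choice",
    # The caller can answer in either mode; both lead to the same state.
    ("await_choice", "web_selection"): "confirm_booking",
    ("await_choice", "voice_selection"): "confirm_booking",
    ("await_choice", "agent_requested"): "transfer_to_agent",
}


def run(events, state="greeting"):
    """Feed events through the table and return the final state."""
    for event in events:
        # Events that do not apply in the current state are ignored.
        state = TRANSITIONS.get((state, event), state)
    return state


final = run(["flight_cancelled", "page_sent", "web_selection"])
print(final)
```

Because the voice and Web events feed one shared state, the two channels stay synchronized: a tap in the phone's browser advances the same call flow that a spoken answer would.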

The third key standard is Call Control XML (CCXML), which handles call processing, including how to accept, transfer, and record a call, as well as how to set up a conference.

It's not necessary to understand these standards in depth, as most vendors offer drag-and-drop graphical development tools that hide all that XML code and provide reusable code for many common VoiceXML interactions. Many of these tools take advantage of the open source Eclipse development environment, which is familiar to most Web developers. There are several open source IVR efforts as well.

Implementing voice interfaces and recognition can be tricky
The one area that does require specialized expertise is the interface for touch-tone interactions and speech recognition, known collectively as the voice user interface. Voice UIs follow a different paradigm from Web page interactions and so require specialized training. However, a team of voice portal developers probably needs only one or two voice UI experts; your telecom group may already have such specialists, and voice portal vendors are happy to do the consulting work for you. Once the voice UI is designed and perfected, the IVR project can be completed by typical Web developers.

Implementing speech recognition is particularly difficult. "The skills around developing speech applications are pretty stringent," says Brian Bischoff, global vice president of subscriber solutions at Genesys, a Web conferencing vendor. "A speech application can perform pretty poorly if you don't know what you're doing." Speech recognition vendors include Nuance, Tuvox, and Voxify.

If you implement speech recognition, do note that your enablement fees per port will rise significantly. It typically costs $1,000 to $2,000 per port to enable a voice portal. With speech recognition, that expense climbs to $1,800 to $3,600 per port, says Bischoff.
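
To see how those per-port figures scale, the sketch below applies both ranges to a deployment size. The 48-port count is a hypothetical example; the per-port fees are the ones Bischoff cites.

```python
def enablement_cost(ports, per_port_low, per_port_high):
    """Return the (low, high) enablement cost range for a deployment."""
    return ports * per_port_low, ports * per_port_high


ports = 48  # hypothetical deployment size

basic = enablement_cost(ports, 1_000, 2_000)
speech = enablement_cost(ports, 1_800, 3_600)

print(f"Touch-tone only: ${basic[0]:,} to ${basic[1]:,}")
print(f"With speech:     ${speech[0]:,} to ${speech[1]:,}")
```

At both ends of the range, speech recognition adds 80 percent to the enablement fee, so the premium grows in step with the number of ports.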

Many larger companies have multiple IVR systems in place. If so, it makes sense to look for tools that can create a common voice UI over several different voice portal systems. Telecom provider T-Mobile International faced that challenge, ultimately using Voxeo's tools to develop a single voice UI across its mishmash of voice portals spread across many European countries, the result of several acquisitions. "We can also make a single change across them all, rather than having to make separate changes to each of them," says Jan Safka, T-Mobile's senior head of voice and mobile services.