Microsoft Corp. is about to stir the speech recognition market with the launch of its Speech Server products next week. The vendor promises speech recognition for the masses, but analysts warn that speech-enabling applications is not easy.
Microsoft Chairman and Chief Software Architect Bill Gates is scheduled to formally launch Speech Server 2004 Standard Edition and Enterprise Edition at the SpeechTEK conference in San Francisco next week. The launch marks the Redmond, Washington-based company's entry into the server-based speech recognition market, where it will compete with vendors including Nuance Communications Inc., ScanSoft Inc. and IBM Corp.
"Our goal is to make speech recognition technologies mainstream," said James Mastan, director of marketing for Microsoft's Speech Server group. Microsoft aims to do that by offering speech recognition at lower cost and making it easier to deploy, manage, develop and maintain than competing products, he said.
The pitch is simple. Using Visual Studio .Net, developers can add speech capabilities to existing Web applications based on Microsoft's ASP application framework by adding code based on XML (Extensible Markup Language) and SALT (Speech Application Language Tags) technologies. Speech Server takes calls, communicates with the Web server through XML and SALT, and makes applications offered online available over the phone, Mastan said.
Speech Server runs on Windows Server 2003. The Enterprise Edition needs to run on a separate physical server while Standard Edition, designed for small and medium-sized installations, can be placed on the same hardware as the Web server. Microsoft will recommend configurations and resellers will offer fully configured systems, Mastan said.
Users will like Speech Server because it is familiar, Mastan said. Developers can use Visual Studio, and Speech Server runs just like any other Microsoft server product. "It is not some black box in a call center that you have to program for in some weird language and you can't maintain yourself because you don't know how it works," he said.
Microsoft's entry will stir the speech recognition market, according to Yankee Group and Gartner Inc. analysts. However, Microsoft has to prove itself in the market and users need to be aware that creating a speech recognition system is more complex than Microsoft makes it sound in its marketing messages, they said.
"Speech applications and a voice user interface are pretty tricky to do. That may well get lost in the first version of the Microsoft marketing hype that will go out there," said Steve Cramoysan, a principal analyst at Gartner. "If you're going to use Microsoft Speech Server, use professional services people who know exactly what they're doing."
In a research note last year, Yankee Group Senior Analyst Art Schoeller issued the same warning to potential Speech Server users. "It is dangerous to imply that any Web developer will speech-enable applications, because not all have proper training in the best practices for dialog design," he wrote.
Still, Microsoft's entry into the speech recognition market is a significant event, Cramoysan said. "Microsoft will certainly shake up this market, but I think we're going to be looking at the second and third version of this product when they will become much more competitive than with this first release of the product," he said.