
TEST CENTER

 
The power of voice

By Jon Udell
December 13, 2002


CHEAP STORAGE MAKES it feasible to save voice recordings of many of our meetings, teleconferences, interviews, and other conversations. In some environments -- call centers and certain sectors of finance and government -- that already happens. But audio surveillance isn't yet routine, and the thorny legal, social, and cultural issues it raises haven't yet been widely debated. That's because, until now, there was no practical way to mine voice data.

As with other forms of practical obscurity, this artificial barrier was bound to topple, and now it has. Fast-Talk Communications' revolutionary phonetic indexing and search technology brings the magic of full-text search to the formerly opaque realms of audio recordings and video soundtracks. If you consider the way in which Google has already become everyone's indispensable "outboard brain," and extrapolate that to all the voice data that exists -- and to the vast quantities that soon will exist -- it's hard to avoid the conclusion that Fast-Talk is one of the most disruptive technologies in the pipeline.

A phonetic search engine

What Fast-Talk sells is an engine and a software development kit, not an end-user product. The kit includes a "technology demo," however, which is a fully functional tool that has changed how I work in a dramatic way. Though I've been a journalist on and off for many years, I had never integrated audio recording into my routine. Finding quotes in those recordings was a painful process, and sending them out for transcription (as my InfoWorld colleagues routinely do) incurred delay and expense. So, being a fast typist, I just captured what I needed live. That technique was stressful, not always accurate, and obviously not appropriate for most people. So when I interviewed Antarctica Systems CTO Tim Bray recently for InfoWorld's CTO Zone (see "Mapping the future"), I used Fast-Talk to record, index, and then search the conversation.

The Fast-Talk engine can work with multiple audio formats, using pluggable "media accessors" to encapsulate them. The technology demo supports only WAV files, which it indexes to create PAT (phonetic audio track) indexes. If you want to search video, Fast-Talk recommends using VirtualDub, an open-source program, to extract the audio track as a WAV file. You can use Fast-Talk's demo to index pre-existing WAV files or, as I did, to index a WAV file while recording. This near-real-time indexing meant I was able to begin searching the index as soon as the 45-minute conversation ended. That was true because Fast-Talk's phonetic technology is orders of magnitude faster than the conventional alternative: speech-to-text translation followed by text indexing.
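The "pluggable media accessor" idea can be pictured as a registry that maps a file type to a reader. The sketch below is an assumption for illustration only: the names `ACCESSORS`, `accessor`, and `load_audio` are hypothetical, not part of Fast-Talk's kit, and only a WAV reader (built on Python's standard `wave` module) is registered.

```python
import io
import os
import wave
from typing import BinaryIO, Callable, Dict, Tuple

# Hypothetical registry: file extension -> reader returning (sample_rate, raw_pcm_bytes).
ACCESSORS: Dict[str, Callable[[BinaryIO], Tuple[int, bytes]]] = {}

def accessor(extension: str):
    """Decorator that registers a reader for one file extension."""
    def register(reader):
        ACCESSORS[extension] = reader
        return reader
    return register

@accessor(".wav")
def read_wav(stream: BinaryIO) -> Tuple[int, bytes]:
    # The standard-library wave module parses the WAV header for us.
    with wave.open(stream, "rb") as wav:
        return wav.getframerate(), wav.readframes(wav.getnframes())

def load_audio(filename: str, stream: BinaryIO) -> Tuple[int, bytes]:
    """Pick a reader by extension; unknown formats are rejected."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ACCESSORS:
        raise ValueError(f"no media accessor registered for {extension!r}")
    return ACCESSORS[extension](stream)

if __name__ == "__main__":
    # Build a tiny silent WAV in memory so the demo needs no file on disk.
    buffer = io.BytesIO()
    writer = wave.open(buffer, "wb")
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(8000)
    writer.writeframes(b"\x00\x00" * 800)
    writer.close()
    buffer.seek(0)
    rate, pcm = load_audio("interview.wav", buffer)
    print(rate, len(pcm))
```

Supporting another format would mean registering one more reader under its extension; the indexing code that consumes the samples would not need to change.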

Like many great innovations, Fast-Talk is simple to describe. Phonemes are the basic units of sound in a language, and North American English has 39 of them. You can look up a word's phonetic spelling in the Carnegie Mellon dictionary (see Kevin Lenzo's Web site at www.speech.cs.cmu.edu/cgi-bin/cmudict). "Dictionary," for example, works out to "D IH K SH AH N EH R IY." Fast-Talk's indexer recognizes phonemes and notes the time of their occurrence. The searcher converts text input to phoneme strings, looks for them, and returns their time-codes. It's as simple -- and brilliant -- as that.
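To make the index-then-search idea concrete, here is a toy sketch. It is not Fast-Talk's implementation: the pronunciation table is hand-entered in CMU style with stress marks dropped, the index times are made up, and matching is exact, whereas a real engine working from noisy audio would presumably have to score approximate matches.

```python
from typing import Dict, List, Tuple

# Hand-entered, CMU-style pronunciations (stress marks dropped); illustrative only.
PRONUNCIATIONS: Dict[str, List[str]] = {
    "dictionary": ["D", "IH", "K", "SH", "AH", "N", "EH", "R", "IY"],
    "search": ["S", "ER", "CH"],
}

# A toy phonetic index: (phoneme, start time in seconds), in time order.
PhoneticIndex = List[Tuple[str, float]]

def text_to_phonemes(text: str) -> List[str]:
    """Convert query text to one flat phoneme list (KeyError on unknown words)."""
    phonemes: List[str] = []
    for word in text.lower().split():
        phonemes.extend(PRONUNCIATIONS[word])
    return phonemes

def search(index: PhoneticIndex, query: str) -> List[float]:
    """Return the time code of every exact occurrence of the query's phonemes."""
    target = text_to_phonemes(query)
    if not target:
        return []
    sounds = [phoneme for phoneme, _ in index]
    hits = []
    for i in range(len(sounds) - len(target) + 1):
        if sounds[i:i + len(target)] == target:
            hits.append(index[i][1])
    return hits

# Made-up index for a recording in which someone says "search dictionary".
index: PhoneticIndex = [
    ("S", 12.0), ("ER", 12.1), ("CH", 12.2),
    ("D", 12.5), ("IH", 12.6), ("K", 12.7), ("SH", 12.8), ("AH", 12.9),
    ("N", 13.0), ("EH", 13.1), ("R", 13.2), ("IY", 13.3),
]

print(search(index, "dictionary"))  # [12.5]
```

The point of the sketch is that no word-level transcript is ever built: the recording is reduced to timed phonemes once, and each query is reduced to phonemes at search time.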

Fast-Talk in action

When my interview with Tim Bray was done, the first segment I looked for was the one where Bray said, "Jean Paoli spent four hours showing me XDocs." The name "Jean Paoli" was, not surprisingly, ineffective as a search term. But "four hours" found the segment instantly, as did "fore ours" -- which of course resolves to the same string of phonemes. "Zhawn Powli" also worked, illustrating what will soon become a new strategy for users of voice-aware search engines: When in doubt, spell it out phonetically. In practice, I find myself resorting to this strategy less often than I'd have expected. And it was fairly obvious when to do so. I guessed correctly that "MySQL" would not work, for example, but that "my sequel" would.
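Why "four hours" and "fore ours" behave identically is easy to show with a small lookup. The table below is hand-entered in CMU style (stress marks dropped) and is an assumption for illustration; how Fast-Talk actually handles words outside its dictionary is not something this sketch claims to reproduce.

```python
from typing import Dict, Optional, Tuple

# Hand-entered, CMU-style pronunciations (stress marks dropped); illustrative only.
PRONUNCIATIONS: Dict[str, Tuple[str, ...]] = {
    "four": ("F", "AO", "R"),
    "fore": ("F", "AO", "R"),
    "hours": ("AW", "ER", "Z"),
    "ours": ("AW", "ER", "Z"),
    "my": ("M", "AY"),
    "sequel": ("S", "IY", "K", "W", "AH", "L"),
}

def to_phonemes(query: str) -> Optional[Tuple[str, ...]]:
    """Return the query's phoneme string, or None if any word is unknown."""
    phonemes: Tuple[str, ...] = ()
    for word in query.lower().split():
        if word not in PRONUNCIATIONS:
            return None
        phonemes += PRONUNCIATIONS[word]
    return phonemes

# Different spellings, same sounds: the searcher sees one and the same query.
print(to_phonemes("four hours") == to_phonemes("fore ours"))

# A term missing from the table yields nothing to search for,
# while its phonetic respelling does.
print(to_phonemes("MySQL"))
print(to_phonemes("my sequel"))
```

In this toy version an unknown word simply fails, which is the situation where spelling the term out phonetically pays off.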

The query language is dead simple, but there's an interesting twist on proximity. In a conventional search engine, proximity means "find a word within so many words of another word." In Fast-Talk's engine, it means "find a string of phonemes within so many seconds of another string of phonemes."
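Time-based proximity can be sketched as a comparison of two lists of hit times. The function name `within_seconds` and the time codes below are made up for illustration; this is not Fast-Talk's query syntax, only the underlying idea.

```python
from typing import List, Tuple

def within_seconds(first_hits: List[float],
                   second_hits: List[float],
                   max_gap: float) -> List[Tuple[float, float]]:
    """Pair up hits from two phoneme-string searches that fall within max_gap seconds."""
    pairs = []
    for t1 in sorted(first_hits):
        for t2 in sorted(second_hits):
            if abs(t1 - t2) <= max_gap:
                pairs.append((t1, t2))
    return pairs

# Made-up time codes (in seconds) for two separate searches of one recording.
name_hits = [751.9, 2000.5]
phrase_hits = [754.2, 1310.0]

print(within_seconds(name_hits, phrase_hits, max_gap=5.0))  # [(751.9, 754.2)]
```

Because the index stores times rather than word positions, the gap is naturally measured in seconds instead of a word count.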

I was unable to find any variant of "XDocs," but I chalk that up to the recording's poor quality -- I was testing an IP phone at the time. There were some dropouts, and "XDocs" came during one of them. The marginal recording quality was, in fact, an excellent test. Like most people, I have no special audio engineering skill and no special recording equipment. To succeed in the real world, Fast-Talk will have to work well with whatever raw material it can get -- and it does. Although it is tuned for North American English, the international nature of our industry made it inevitable that I would push those limits. Sure enough, the accents I threw at it included Ximian CTO Miguel de Icaza's (Mexican), OpenLink Software CEO Kingsley Idehen's (Nigerian/British), and Systinet CEO Roman Stanek's (Czech), with usable results in each case. It's preferable, of course, to have a high-quality recording of a native speaker of North American English. When I indexed a well-modulated phone conversation that Test Center Director Steve Gillmor had with Microsoft's Mark Lucovsky, the results were simply uncanny.

Developers will find Fast-Talk to be a clean, well-documented toolkit. The engine is packaged as a static link library for use in Microsoft's C++ environment, and from other languages by way of a COM (Component Object Model) wrapper. (There's not yet a managed interface for .Net, but C# or Visual Basic .Net programmers can use the COM API.) The API supports multithreading so that indexing and search tasks can be parceled out to a set of processors. Non-Windows packaging of the engine, when needed, will be straightforward to produce.
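Parceling search work out across workers might look roughly like the sketch below. It uses Python's standard `concurrent.futures` and a stand-in `search_one` function with exact matching; none of these names come from Fast-Talk's C++ or COM API, and the recordings and times are invented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

PhoneticIndex = List[Tuple[str, float]]  # (phoneme, start time in seconds)

def search_one(index: PhoneticIndex, target: List[str]) -> List[float]:
    """Stand-in for an engine call: exact phoneme-sequence match in one index."""
    sounds = [phoneme for phoneme, _ in index]
    return [index[i][1]
            for i in range(len(sounds) - len(target) + 1)
            if sounds[i:i + len(target)] == target]

def search_all(indexes: Dict[str, PhoneticIndex],
               target: List[str],
               workers: int = 4) -> Dict[str, List[float]]:
    """Search every recording's index as a separate task and gather hits by name."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(search_one, index, target)
                   for name, index in indexes.items()}
        return {name: future.result() for name, future in futures.items()}

# Two made-up recordings; only the first contains the phonemes for "four".
indexes = {
    "interview": [("F", 3.0), ("AO", 3.1), ("R", 3.2)],
    "meeting": [("M", 7.0), ("AY", 7.1)],
}

print(search_all(indexes, ["F", "AO", "R"]))
```

In CPython, threads running pure-Python code like this do not execute in parallel on multiple processors; the sketch shows only the task-per-index structure, which a native multithreaded engine could spread across CPUs.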

Call centers are obvious first candidates for the Fast-Talk treatment. "Think about running a support center," says Patrick Taylor, Atlanta-based Fast-Talk's vice president of sales and marketing. In theory, answers to hard questions are written down in a knowledge base. In practice, that rarely happens. "It's compelling to just index everything that's said by the best experts," suggests Taylor, "so you can instantly find where they mention, say, NT kernel error 304."

Clearly, that's just the tip of the iceberg. The implications are both exhilarating and frightening. "This business of recording everything scares the bejesus out of me," says Ray Ozzie, CEO of Groove Networks in Beverly, Mass. With entry-level deployment of Fast-Talk starting at $10,000, routine meetings and phone calls won't be indexed anytime soon. But it's coming, and it is scary. As always, great power brings great responsibility. The genie's out of the lamp, though, so we'll just have to learn to use this new power well.




  BOTTOM LINE
Fast-Talk's phonetic searching
EXECUTIVE SUMMARY
With Fast-Talk Communications' revolutionary phonetic indexing and search engine, you can instantly find words and phrases buried in many hours of spoken recordings. It's a major breakthrough that will forever transform voice data.

TEST CENTER PERSPECTIVE
Google has become the "outboard brain" that we increasingly cannot function without. However, while Google is a voracious reader, it can't hear a thing. Fast-Talk's technology promises to remedy that handicap someday soon. It's a dizzying, if sobering, prospect.


RELATED ARTICLES

http://www.infoworld.com/features/fetelephony.html


