The power of voice
Fast-Talk Communications brings full-text search to audio recordings
Cheap storage makes it feasible to save voice recordings of many of our meetings, teleconferences, interviews, and other conversations. In some environments -- call centers and certain sectors of finance and government -- that already happens. But audio surveillance isn't yet routine, and the thorny legal, social, and cultural issues it raises haven't yet been widely debated. That's because, until now, there was no practical way to mine voice data.
As with other forms of practical obscurity, this artificial barrier was bound to topple, and now it has. Fast-Talk Communications' revolutionary phonetic indexing and search technology brings the magic of full-text search to the formerly opaque realms of audio recordings and video soundtracks. If you consider the way in which Google has already become everyone's indispensable "outboard brain," and extrapolate that to all the voice data that exists -- and to the vast quantities that soon will exist -- it's hard to avoid the conclusion that Fast-Talk is one of the most disruptive technologies in the pipeline.
A phonetic search engine
What Fast-Talk sells is an engine and a software development kit, not an end-user product. The kit does include a "technology demo," however: a fully functional tool that has dramatically changed how I work. Although I've been a journalist on and off for many years, I had never made audio recording part of my routine. Finding quotes in recordings was a painful process, and sending them out for transcription (as my InfoWorld colleagues routinely do) meant delay and expense. So, being a fast typist, I just captured what I needed live. That technique was stressful, not always accurate, and obviously not practical for most people. When I recently interviewed Antarctica Systems CTO Tim Bray for InfoWorld's CTO Zone (see "Mapping the future"), I instead used Fast-Talk to record, index, and then search the conversation.
The Fast-Talk engine can work with multiple audio formats, using pluggable "media accessors" to encapsulate them. The technology demo supports only WAV files, from which it builds PAT (phonetic audio track) indexes. To search video, Fast-Talk recommends extracting the audio track as a WAV file with VirtualDub, an open-source program. You can use the demo to index pre-existing WAV files or, as I did, to index a WAV file while recording it. Thanks to this near-real-time indexing, I could begin searching as soon as the 45-minute conversation ended. That was possible because Fast-Talk's phonetic technology is orders of magnitude faster than the conventional alternative: speech-to-text translation followed by text indexing.
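Why can an index be searched before the recording ends? A phonetic track only ever grows at the end, so each newly recognized phoneme can simply be appended. The sketch below is a minimal illustration under stated assumptions. It is not Fast-Talk's API or its PAT file format. The class `IncrementalPhoneticIndex` and the `(phoneme, seconds)` stream are hypothetical, and the sketch assumes a perfect recognizer that emits one phoneme at a time. A real engine would have to cope with misrecognized sounds.

```python
from collections import defaultdict


class IncrementalPhoneticIndex:
    """Hypothetical append-only phonetic index (not Fast-Talk's PAT format).

    Phonemes are appended as a recognizer emits them, so the index can be
    searched at any moment, including while recording is still under way.
    """

    def __init__(self):
        self.phonemes = []                 # phoneme sequence in arrival order
        self.times = []                    # start time in seconds of each phoneme
        self.postings = defaultdict(list)  # phoneme -> positions where it occurs

    def append(self, phoneme, seconds):
        """Record one recognized phoneme and the time it began."""
        # Positions arrive in increasing order, so each postings list stays sorted.
        self.postings[phoneme].append(len(self.phonemes))
        self.phonemes.append(phoneme)
        self.times.append(seconds)

    def search(self, query):
        """Return start times of every exact occurrence of the phoneme list."""
        if not query:
            return []
        n = len(query)
        # Only positions holding the first query phoneme can start a match.
        return [
            self.times[pos]
            for pos in self.postings.get(query[0], [])
            if self.phonemes[pos:pos + n] == query
        ]


if __name__ == "__main__":
    index = IncrementalPhoneticIndex()
    # Simulated recognizer output; phonemes and times are invented for illustration.
    stream = [("HH", 0.0), ("AY", 0.1), ("D", 0.4), ("IH", 0.5), ("K", 0.6)]
    for phoneme, t in stream:
        index.append(phoneme, t)
    print(index.search(["D", "IH", "K"]))  # searchable mid-recording
```

Appending never disturbs earlier postings. A query issued halfway through a conversation therefore sees everything recognized so far, and a later query also sees whatever has arrived since. Indexing by first phoneme is only one simple design choice. It avoids rescanning the whole track for every query.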
Like many great innovations, Fast-Talk is simple to describe. Phonemes are the basic units of sound in a language, and North American English has 39 of them. You can look up a word's phonetic spelling in the Carnegie Mellon dictionary (see Kevin Lenzo's Web site at www.speech.cs.cmu.edu/cgi-bin/cmudict). "Dictionary," for example, works out to "D IH K SH AH N EH R IY." Fast-Talk's indexer recognizes phonemes and notes the time of their occurrence. The searcher converts text input to phoneme strings, looks for them, and returns their time-codes. It's as simple -- and brilliant -- as that.
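The searcher's half of that description can be sketched in a few lines. This version looks up each query word in a pronouncing dictionary, strips the stress digits the CMU dictionary attaches to vowels, and scans a time-coded phoneme track for the resulting string. The three dictionary lines were typed in by hand in CMU format for illustration, and the phoneme track and its times are invented. The function names are hypothetical, and Fast-Talk's actual matching is not documented here. Because the sketch uses exact matching, it assumes error-free recognition.

```python
import re

# Three entries in CMU Pronouncing Dictionary format (WORD, then phonemes with
# stress digits 0/1/2 on vowels). Typed in for illustration; a real system
# would load the full dictionary file instead.
CMUDICT_SAMPLE = """\
DICTIONARY  D IH1 K SH AH0 N EH2 R IY0
SEARCH  S ER1 CH
VOICE  V OY1 S
"""


def load_pronunciations(text):
    """Parse CMU-style lines into {word: [phonemes]}, dropping stress digits."""
    lexicon = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(";;;"):
            continue  # skip blank lines and CMU dictionary comment lines
        word, *phones = line.split()
        # IH1 -> IH, AH0 -> AH: the 39-phoneme set ignores stress.
        lexicon.setdefault(word.upper(), [re.sub(r"\d", "", p) for p in phones])
    return lexicon


def text_to_phonemes(query, lexicon):
    """Convert query text to one phoneme list; raises KeyError on unknown words."""
    phonemes = []
    for word in query.upper().split():
        phonemes.extend(lexicon[word])
    return phonemes


def find_time_codes(track, query_phonemes):
    """Scan a track of (phoneme, seconds) pairs; return start times of matches."""
    if not query_phonemes:
        return []
    symbols = [p for p, _ in track]
    n = len(query_phonemes)
    return [
        track[i][1]
        for i in range(len(symbols) - n + 1)
        if symbols[i:i + n] == query_phonemes
    ]


if __name__ == "__main__":
    lexicon = load_pronunciations(CMUDICT_SAMPLE)
    print(" ".join(text_to_phonemes("dictionary", lexicon)))  # D IH K SH AH N EH R IY
    # Invented track for the spoken phrase "voice search".
    track = [("V", 1.0), ("OY", 1.1), ("S", 1.3), ("S", 1.5), ("ER", 1.6), ("CH", 1.8)]
    print(find_time_codes(track, text_to_phonemes("search", lexicon)))
```

The track carries no word boundaries. In "voice search," the final S of "voice" and the initial S of "search" sit side by side, so the scan has to try every starting position. Here it finds "search" at 1.5 seconds.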
Fast-Talk in action









