"Practical" is a relative term when discussing speech recognition technology. Applications that serve very large operations are simply too expensive and impractical for small centers that deal with far fewer calls. In this audio lecture from the 2009 Emerging Communications Conference (eComm), speech technology expert, author and Disaggregate President Moshe Yudkowsky explains just what is required for both large operations and small centers.
While there have been significant advances in speech recognition, open source efforts have yet to provide a workable free alternative to expensive enterprise software. This is primarily due to the complex engines and the vast amounts of information that are required for such applications.
Yudkowsky breaks speech recognition down into three categories: engines, analytics and biometrics.
The engine that runs a speech application is what does the actual work, and there is a big difference between speaker-dependent and speaker-independent applications and hardware.
Analyzing audio information is a critical step in the recognition process and can be quite difficult to master, especially when dictation and highly personal speech are involved. Biometrics such as voice recognition and emotion detection are available but not yet widely used, especially in privacy-conscious countries like the U.S.
Complete with lesson exercises, this brief audio lecture is all you need to get a decent grasp on what is currently practical, regardless of your call volume.
Dr. Moshe Yudkowsky has twenty years of experience in high-technology product development.
Dr. Yudkowsky is president of Disaggregate, a consulting company that helps companies create, understand, and apply revolutionary technology. He is the author of "The Pebble and the Avalanche: How Taking Things Apart Creates Revolutions" (Berrett-Koehler, 2005). He specializes in consulting and education for a particular high-tech industry: speech technology, which includes speech recognition, text-to-speech, and biometrics.
Yudkowsky received his Ph.D. in Physics from Northwestern University; he joined Bell Laboratories in 1987. At Bell Labs, Dr. Yudkowsky worked on several large-scale deployments of speech recognition applications, with responsibilities ranging from architecture to DSP development to application design.
He joined Dialogic Corp. (later Intel) in 1996 as a Senior System Architect to nurture speech development. Moshe founded Disaggregate in 2002 to implement the principles outlined in his book.
Moshe Yudkowsky speaks and writes extensively on various topics. He led the ECTF's Automatic Speech Recognition Task Group for over a decade, and served as Technical Chair of the ECTF in 2001. He was a board member of AVIOS, an organization that promotes speech technology. In 2002, he co-founded and became the first Chair of the Midwest Speech Technology Association, a US-based organization of speech technology professionals.
This free podcast is from our Emerging Communications series.