Kevin Lenzo has a unique background spanning academia, the open source community, and now industry as the founder of Cepstral, a text-to-speech (TTS) company that seeks to work with the open source community to build a commercial product. This gives him a panoramic view of both the potential and the problems involved in implementing voice technology effectively.
Lenzo asserts that the key speech technology is not speech recognition, but text-to-speech. Speech output is of paramount importance, not speech input. He uses the example of a car radio to illustrate that buttons can be just as effective in controlling what you hear, with greater privacy, and without the inevitable occasional failures associated with speech recognition.
Lenzo presents a long list of possible applications of TTS, including hands-free in-car navigation systems, location-based weather reporting, remote network monitoring, and just-in-time broadcasting. He contrasts the latter with packaged podcasts that can end up relaying stale information. In all his examples, he sees it as crucial that devices are driven by user needs rather than the needs of the service provider, so that such applications can evolve into what he terms "an external brain" that, in a sense, controls the user. This may sound almost threatening on first hearing, but Lenzo welcomes devices, such as location-aware warehouse systems, that can guide and inform as you perform other tasks.
There are other areas in which TTS can be extremely useful. Lenzo is involved in a project in Kenya to provide speech services via phone in areas where computers are extremely rare. Where literacy and language barriers exist, TTS can provide access to information that traditional computing cannot.
After bemoaning the problems involved in porting VoiceXML across different platforms, Lenzo ends his presentation with a plea for a vendor-independent cross-platform API for speech components.
Kevin A. Lenzo is on the faculty at the Institute for Software Research International at Carnegie Mellon University, where he is currently steward of the Sphinx open source initiative. He is the co-founder of Cepstral, LLC, a company offering voice building services on top of free software synthesis engines.
Lenzo has more than a decade of experience with speech research in academia and industry. He is the author of the Infobot interactive agent and co-author of the forthcoming O'Reilly book "Building Synthetic Voices." He is president and founder of Yet Another Society, a nonprofit organization for the advancement of collaborative efforts in computer and information science and the parent organization of the Yet Another Perl Conferences.
This free podcast is from our Emerging Telephony Conference series.