Why synthesize laughter?
A current goal of researchers in speech synthesis is to include expressive and emotional content in machine-synthesized speech to make it sound more natural, which includes incorporating non-verbal cues appropriate to the context. One main motivation comes from the development of interactive applications in entertainment and games, education, and even business services. Synthesis of laughter can be viewed as part of expressive communication: for instance, synthesized laughter can be used on its own, or together with "happy" speech to better express the positive emotion of happiness.
(As an aside: since analysis and synthesis go hand in hand, a valuable side benefit of this work is the use of laughter as a cue for detecting a person's emotional state and identity.)
Some challenges in synthesizing human-like laughter:
The challenge here is to come up with a model that meets the following constraints:
- It should be able to capture the whole spectrum of the highly variable physiological process of laughter. It should also be able to bring out the individual traits of each person's laughter.
- It should be able to generate different types of laughter (e.g. short bursts or a long train of laughter) depending on the immediate context. Parametric control over the generated laughter is preferred from a synthesis point of view.
- The model should be convenient to use: it should be able to generate laughter from simple, readily available information. If a model requires too much information derived from complex analysis, it becomes inconvenient for many applications.
The model used in this research to generate laughter is similar to the one that describes the oscillatory behaviour of a pendulum, or of a body attached to the end of a spring (simple harmonic motion). Because of this, it addresses the main engineering challenges listed above from a synthesis point of view.
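The equations of the laughter model are not given here. For reference only, the textbook equation of motion of a damped mass-spring oscillator and its underdamped solution are shown below (a pendulum follows the same form for small swings). Whether the laughter model includes the damping term c, and which laughter quantities x(t), A, ω_d and ζ correspond to, are assumptions of this sketch rather than statements from the source.

```latex
m\,\ddot{x}(t) + c\,\dot{x}(t) + k\,x(t) = 0,
\qquad \omega_0 = \sqrt{k/m}, \quad \zeta = \frac{c}{2\sqrt{mk}}

x(t) = A\, e^{-\zeta \omega_0 t} \cos(\omega_d t + \phi),
\qquad \omega_d = \omega_0 \sqrt{1 - \zeta^2}, \quad 0 \le \zeta < 1
```

With c = 0 (so ζ = 0) this reduces to undamped simple harmonic motion, x(t) = A cos(ω_0 t + φ).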
How is synthesis of laughter different from speech synthesis?
Human speech is a more "constant", controlled and well-understood process that is governed by the rules of a language's grammar. For example, to synthesize the phrase "How are you?", the sequence of sounds (phonemes) and the expected intonation are well defined. Speech data are also abundantly available for analysis. Thus the inputs required to generate any word in synthesized speech are well defined.
Laughter, on the other hand, is highly variable, and the acoustics of its production are not completely understood. Since synthesized laughter is judged primarily on its subjective expressive quality rather than on intelligibility, evaluating it is more difficult than evaluating synthesized speech. From an R&D point of view, it is also quite difficult to collect data for analyzing the various types of laughter. If you wish to record "natural" laughter as data for scientific analysis, how would you do it? By contrast, a person can easily say the phrase "How are you?" in a laboratory environment almost exactly the way he or she would say it in everyday use. Because of these constraints on analysis, the inputs required for synthesizing a laughter instance are not well defined.
While the waveform of each event in laughter is synthesized with the same techniques used in speech synthesis, the laughter synthesis model specifies how the individual events should be synthesized. Just as a speech synthesis problem deals with the duration, pitch and stress of the individual units, the laughter synthesis problem deals with the duration, pitch and stress of each laughter call. To illustrate, in the caricature of laughter HA!HA!HA!ho!hee!heee!hee!hee!, each HA, ho or hee is termed a laughter call, and the overall sequence is a laughter bout. Using the proposed simple harmonic motion (SHM) model, one can calculate the duration and stress of each laughter call.
As another example, if you imagine a laughter bout as a physical chain, the model proposed in this research simply tells you how long and how thick each link should be so that the overall chain sounds like laughter. To actually make each link and connect the links, techniques from speech synthesis are adopted. The model's inputs are not the same as those required for speech synthesis.
The required inputs are much simpler and also provide flexibility to generate various types of laughter. Thus it is possible to work around the limitations of data availability.
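The model's equations and parameters are not published in this text, so the following is only a minimal Python sketch, under stated assumptions, of the planning step described above: turning a bout such as HA!HA!HA!ho!hee! into calls, each with a duration, pitch and stress value. It assumes an underdamped oscillator whose decaying envelope sets each call's stress, places one call per half-cycle, and uses a made-up linear mapping from stress to duration and pitch. `plan_bout`, `LaughterCall` and every parameter default are hypothetical illustrations, not the authors' method. Generating the waveform of each call (making the "links") is left to a speech synthesizer and is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class LaughterCall:
    syllable: str      # e.g. "HA", "ho", "hee"
    onset_s: float     # start time of the call within the bout (seconds)
    duration_s: float  # voiced duration of the call (seconds)
    pitch_hz: float    # target mean pitch (F0) of the call
    stress: float      # relative intensity in (0, 1]


def plan_bout(syllables, osc_hz=4.0, zeta=0.1, base_pitch_hz=220.0,
              min_duty=0.4, max_duty=0.8, pitch_range=0.5):
    """Hypothetical planner: one laughter call per half-cycle of an
    underdamped oscillator, with stress read from its decaying envelope."""
    if not 0.0 <= zeta < 1.0:
        raise ValueError("zeta must be in [0, 1) for an underdamped oscillator")
    w0 = 2.0 * math.pi * osc_hz            # natural angular frequency
    wd = w0 * math.sqrt(1.0 - zeta ** 2)   # damped angular frequency
    half_period = math.pi / wd             # spacing between successive calls
    calls = []
    for i, syllable in enumerate(syllables):
        onset = i * half_period
        # Envelope exp(-zeta * w0 * t) of the oscillator, used as call stress.
        stress = math.exp(-zeta * w0 * onset)
        # Assumed mapping: stronger calls last longer and are pitched higher.
        # duty <= max_duty < 1, so a call never overlaps the next one.
        duty = min_duty + (max_duty - min_duty) * stress
        calls.append(LaughterCall(
            syllable=syllable,
            onset_s=onset,
            duration_s=duty * half_period,
            pitch_hz=base_pitch_hz * (1.0 + pitch_range * stress),
            stress=stress,
        ))
    return calls


if __name__ == "__main__":
    # Split the caricature from the text into individual calls.
    bout = "HA!HA!HA!ho!hee!heee!hee!hee!"
    syllables = [s for s in bout.split("!") if s]
    for call in plan_bout(syllables):
        print(f"{call.syllable:>5}  onset={call.onset_s:.3f}s  "
              f"dur={call.duration_s:.3f}s  F0={call.pitch_hz:.0f}Hz  "
              f"stress={call.stress:.2f}")
```

In this sketch, `zeta` controls how quickly stress fades across the bout: a small value gives a long, evenly stressed train, while a larger one makes the calls fade quickly, closer to a short burst.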