Department of Linguistics

TTS: Prosody - An Overview

Robert Mannell

TTS systems have taken many approaches to the allocation of prosody. These range from simple algorithms that, for example, apply a steadily declining F0 (fundamental frequency) contour to a declarative sentence, to complex theory-driven approaches that first attempt to discern the phrase structure of a sentence before allocating the parameters that control intonation. TTS prosody modules have also varied greatly in the extent to which they attempt to apply natural rhythmic patterns to synthesised speech.
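
As an illustration of the simplest kind of approach mentioned above, the following is a minimal sketch of a declining F0 contour for a declarative sentence. It is not taken from any particular TTS system: the function name, the one-target-per-syllable representation, and the start and end values in Hz are all assumptions chosen only for illustration.

```python
def declining_f0_contour(n_syllables, start_hz=130.0, end_hz=90.0):
    """Return one F0 target (in Hz) per syllable, falling linearly.

    The default start and end values are illustrative assumptions,
    not measurements or settings from any real TTS system.
    """
    if n_syllables < 1:
        raise ValueError("n_syllables must be at least 1")
    if n_syllables == 1:
        return [start_hz]
    # Equal-sized fall between consecutive syllables.
    step = (end_hz - start_hz) / (n_syllables - 1)
    return [start_hz + i * step for i in range(n_syllables)]


# Example: F0 targets for a five-syllable declarative sentence.
contour = declining_f0_contour(5)
print(contour)
```

A rule like this ignores phrase structure, stress and rhythm entirely, which is what the more complex theory-driven approaches attempt to address.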

Vocal affect, or emotion in speech, has occasionally been considered by TTS system designers, but more often it is researchers interested in vocal affect who use available TTS systems to experiment with emotion in the voice. Most research in this area suggests that manipulating prosody, and the paralinguistic aspects of its acoustic correlates (i.e. intensity, duration and F0), is the approach most likely to produce a good approximation of vocal affect.

  • For information on how MU-Talk processes prosody, click here.
  • For more information on TTS prosody modules and vocal affect, click here.