Department of Linguistics

TTS: Text Preprocessors

Robert Mannell

Some TTS modules process individual words, while others work on longer sequences of text arranged into sentences or phrases. One of the jobs of a TTS text preprocessor is to remove all formatting codes from the input text and to present plain text to the rest of the TTS system. Such a module also needs to handle abbreviations, number strings, and any other character sequence that is not a word.
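As a rough illustration of these two jobs, the sketch below strips formatting codes and expands a few abbreviations. It assumes the codes are angle-bracket tags, and the abbreviation table and the function name `to_plain_text` are hypothetical; a real preprocessor would need a much larger lexicon and some way of resolving ambiguous abbreviations (for instance, "St." as "Street" or "Saint").

```python
import re

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "etc.": "et cetera"}

# Assumption: formatting codes look like angle-bracket tags.
TAG_PATTERN = re.compile(r"<[^>]+>")

def to_plain_text(raw):
    """Remove formatting codes, then expand known abbreviations word by word."""
    text = TAG_PATTERN.sub(" ", raw)
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return " ".join(words)

print(to_plain_text("<b>Dr.</b> Smith lives on Main St."))
```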

A preprocessor may, however, retain in some form any information that is meant to be passed on to particular TTS modules. For example, a text preprocessor may capture information encoded in a TTS mark-up language and transmit it to the appropriate module. Note, however, that the handling of mark-up code is optional and depends upon the design goals of a particular TTS system. A text preprocessor might also capture punctuation data for use in other modules.
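One possible way to retain this information is to record each mark-up tag and punctuation mark alongside the index of the word it precedes or follows. The sketch below assumes angle-bracket tags; the `Preprocessed` container, the `preprocess` function and the `<emph>` tag are all hypothetical and are not taken from any particular TTS mark-up language.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Preprocessed:
    words: list = field(default_factory=list)
    # Each entry is (number of words seen so far, tag or punctuation mark).
    markup: list = field(default_factory=list)
    punctuation: list = field(default_factory=list)

# A token is a tag, a word (letters and apostrophes) or one punctuation mark.
TOKEN = re.compile(r"<[^>]+>|[A-Za-z']+|[.,;:!?]")

def preprocess(raw):
    """Separate plain words from mark-up and punctuation, keeping positions."""
    result = Preprocessed()
    for token in TOKEN.findall(raw):
        if token.startswith("<"):
            result.markup.append((len(result.words), token))
        elif token in ".,;:!?":
            result.punctuation.append((len(result.words), token))
        else:
            result.words.append(token)
    return result

out = preprocess("<emph>Stop</emph>, now!")
print(out.words, out.markup, out.punctuation)
```

The word list can then go to the word-level modules, while the positions allow a later module (for example, one assigning emphasis or phrase breaks) to line the tags and punctuation up with the words again.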

The goal of most Text-to-Speech systems is to be able to handle "unrestricted text". Unrestricted text is text that has not been modified or simplified in any way to make it easier for a TTS system to handle. Such unrestricted text may be in simple ASCII format, or it may be in a word processor format and so be embedded in a great deal of formatting code. Unrestricted text can consist of simple word sequences broken up by simple punctuation into paragraphs, sentences and phrases, or it might be much more complex. Unrestricted text may have some (or all) of the following features:

  • The punctuation may be irregular. Some punctuation may be missing or there may be additional, possibly spurious, punctuation. These irregularities may make it very difficult to determine paragraph, sentence and phrase boundaries.
  • The text may include abbreviations, acronyms, etc.
  • The text may include number strings of both simple and complex nature (dates, times, ordinals, cardinal numbers with and without commas, decimal numbers, fractions, etc.).
  • The text may include letter and symbol strings that can't be interpreted and pronounced as words (e.g. "That's &*%!!*! terrible!").
  • The text may include special codes that are intended for the synthesiser. These may include speech mark-up language tags, or phonetic script.
  • The text may include non-ASCII or extended ASCII characters.
  • The text may include Unicode characters outside the ASCII character set.
  • The text may include mathematical or chemical formulae.
  • The text may include words from other languages, such as foreign names.
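Before any of these items can be expanded into pronounceable words, the preprocessor has to recognise what kind of item each token is. The following is a minimal sketch of a pattern-based classifier. The category labels, the patterns and the function name `classify` are illustrative assumptions; real text would require many more patterns, and contextual information to settle ambiguous cases such as "3/4" (a fraction or a date).

```python
import re

# Ordered (label, pattern) pairs: the first pattern matching the whole
# token wins. Labels and patterns are illustrative assumptions.
PATTERNS = [
    ("date", re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}")),
    ("time", re.compile(r"\d{1,2}:\d{2}")),
    ("ordinal", re.compile(r"\d+(st|nd|rd|th)")),
    ("fraction", re.compile(r"\d+/\d+")),
    ("decimal", re.compile(r"\d+\.\d+")),
    ("cardinal", re.compile(r"\d{1,3}(,\d{3})+|\d+")),
    ("acronym", re.compile(r"[A-Z]{2,}")),
    ("word", re.compile(r"[A-Za-z]+('[A-Za-z]+)?")),
]

def classify(token):
    """Return the label of the first pattern that matches the entire token."""
    for label, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return label
    return "unknown"

for token in ["12/03/1997", "10:30", "3rd", "1,234", "TTS", "That's", "&*%!!*!"]:
    print(token, classify(token))
```

Tokens labelled "unknown" are the ones that need a fall-back treatment rather than a dedicated expansion rule.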

A text preprocessor for a TTS system needs to handle all of these types of non-lexical items. A TTS system should not "crash" when it encounters unexpected or difficult strings, and it should have a worst-case fallback in which it simply spells a string by naming each character (e.g. "&*!!*!" will be parsed as "ampersand, asterisk, exclamation, exclamation, asterisk, exclamation" and these character names will be pronounced by the synthesiser).
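The spelling fallback can be sketched as a simple character-by-character lookup. The table of character names and the function name `spell_out` are hypothetical. As an added assumption, characters missing from the table are named using Python's standard `unicodedata` module, so that the function never fails on an unexpected symbol.

```python
import unicodedata

# Hypothetical table of spoken names for common symbols.
CHARACTER_NAMES = {
    "&": "ampersand",
    "*": "asterisk",
    "!": "exclamation",
    "%": "percent",
    "@": "at",
}

def spell_out(token):
    """Name every character in the token, separated by commas."""
    names = []
    for ch in token:
        if ch in CHARACTER_NAMES:
            names.append(CHARACTER_NAMES[ch])
        elif ch.isascii() and ch.isalnum():
            names.append(ch)  # plain letters and digits are read as themselves
        else:
            # Fall back to the Unicode character name, in lower case.
            names.append(unicodedata.name(ch, "unknown character").lower())
    return ", ".join(names)

print(spell_out("&*!!*!"))
# ampersand, asterisk, exclamation, exclamation, asterisk, exclamation
```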

Grammatical parsing of text does not have to be exhaustive in a TTS system, but it should be able to provide a reasonably accurate indication of phrase boundaries and perhaps also the part of speech of syntactic homographs (words such as "record", whose pronunciation differs between noun and verb). MU-TALK currently relies entirely upon punctuation for phrase-boundary parsing. This simplistic approach needs to be greatly improved, especially for complex sentences. MU-TALK currently has no facility for determining word parts-of-speech.
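To show what purely punctuation-based phrase-boundary parsing amounts to, here is a minimal sketch. It is not MU-TALK's actual code; the boundary set and the function name `split_phrases` are assumptions made for illustration.

```python
import re

# Assumption: any of these marks closes a phrase.
BOUNDARY = re.compile(r"([.!?;:,])")

def split_phrases(text):
    """Split text at punctuation, returning (phrase, closing mark) pairs."""
    # With a capturing group, re.split alternates text and delimiters.
    pieces = BOUNDARY.split(text)
    phrases = []
    for i in range(0, len(pieces) - 1, 2):
        chunk = pieces[i].strip()
        if chunk:
            phrases.append((chunk, pieces[i + 1]))
    tail = pieces[-1].strip()
    if tail:
        phrases.append((tail, ""))  # trailing text with no closing mark
    return phrases

print(split_phrases("When it rains, we stay in. Otherwise we walk"))
```

The weakness of the approach is easy to demonstrate: `split_phrases("Dr. Smith left.")` yields `[("Dr", "."), ("Smith left", ".")]`, placing a spurious boundary after the abbreviation, and a long sentence without internal punctuation is not divided at all.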

