Department of Linguistics

MU-Talk: Connected Speech Processing

Robert Mannell

MU-Talk currently has a fairly simple connected speech processor. It is applied after grapheme-to-phoneme (GTP) conversion but before the context-sensitive rules (CSR), and is implemented as an add-on component of the GTP module.

Connected speech transformations of the type handled by this processor affect the stream of phonemes by assimilation (where one phoneme changes into another phoneme), insertion (where an additional phoneme is inserted), deletion (where a phoneme is deleted) and reduction (particularly of vowels, where a phoneme is converted into another phoneme that is considered to be its reduced form; typically this is the reduction of a full vowel to schwa). For an overview of connected speech processes in Australian English, please read these pages. (Important: To make sense of these linked pages, you will need to ensure that you can view the phonetic fonts.)

These types of transformations occur at word and morpheme boundaries. In MU-Talk, morpheme boundary processes have already been handled by the morphology processor (i.e. the affix processor within the GTP module). The GTP module outputs the pronunciations of a series of words, each in its "citation-form" pronunciation (i.e. the dictionary pronunciation of the word spoken in isolation). The connected speech processor therefore needs to handle word boundary effects.

Because the system does not yet have a model of the phrase structure of each sentence at this stage of processing, it is not possible to carry out all function word reduction, but a start is made on it here.

Currently, the connected speech processor carries out the following transformations:

  1. Reduction of certain function words
  2. Assimilation of word-final alveolar stops and nasals to the place of articulation of certain following consonants
  3. Insertion of "linking-r" for words ending in certain vowels, but only when the next word starts with a vowel and there is no intervening pause (such as a sentence break).
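
The three transformations above can be illustrated with a minimal Python sketch. This is not MU-Talk's actual code: the phoneme symbols (a SAMPA-like ASCII notation), the table of reduced function words, the choice of bilabials and velars as the triggering consonants, the set of vowels that take linking-r, and all names such as connected_speech and PAUSE are assumptions made purely for illustration.

```python
# Hypothetical sketch of word-boundary processing; all tables are illustrative
# assumptions, not MU-Talk's real rule set or phoneme inventory.

PAUSE = None  # sentinel marking a pause, such as a sentence break

# 1. Assumed reduced forms for a few function words (keyed by spelling).
REDUCED_FORMS = {
    "and": ["@", "n"],
    "of": ["@", "v"],
    "can": ["k", "@", "n"],
}

# 2. Assumed place classes for the consonants that trigger assimilation.
BILABIALS = {"p", "b", "m"}
VELARS = {"k", "g"}

# Word-final alveolar stop or nasal -> assimilated phoneme, by following place.
ASSIMILATION = {
    ("t", "bilabial"): "p", ("t", "velar"): "k",
    ("d", "bilabial"): "b", ("d", "velar"): "g",
    ("n", "bilabial"): "m", ("n", "velar"): "N",
}

# 3. Assumed vowel inventory and the subset that licenses linking-r.
VOWELS = {"@", "I", "e", "{", "6", "O", "U", "i:", "3:", "a:", "o:"}
LINKING_R_VOWELS = {"@", "3:", "a:", "o:"}


def place_of(phoneme):
    """Return the assumed place class of a phoneme, or None if irrelevant."""
    if phoneme in BILABIALS:
        return "bilabial"
    if phoneme in VELARS:
        return "velar"
    return None


def connected_speech(words):
    """Apply the three transformations to citation-form pronunciations.

    words is a list whose items are either PAUSE or a (spelling, phonemes)
    tuple. Returns a new list of phoneme lists, with PAUSE items preserved.
    """
    # Step 1: replace listed function words with their reduced forms.
    out = []
    for word in words:
        if word is PAUSE:
            out.append(PAUSE)
        else:
            spelling, phonemes = word
            out.append(list(REDUCED_FORMS.get(spelling.lower(), phonemes)))

    # Steps 2 and 3: inspect each word boundary; a pause blocks both rules.
    for cur, nxt in zip(out, out[1:]):
        if cur is PAUSE or nxt is PAUSE or not cur or not nxt:
            continue
        final, initial = cur[-1], nxt[0]
        assimilated = ASSIMILATION.get((final, place_of(initial)))
        if assimilated is not None:
            cur[-1] = assimilated          # e.g. /n/ -> /m/ before /p/
        elif final in LINKING_R_VOWELS and initial in VOWELS:
            cur.append("r")                # linking-r before a vowel
    return out
```

For example, under these assumptions "ten pence" comes out with a word-final /m/ in the first word, and "far away" gains an /r/ at the end of "far" unless a PAUSE item separates the two words.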

Plans for the Future

  1. Re-location of the connected speech processor to a position immediately following the phrase-determining code in the prosody module. This will enable more accurate processing of function word reductions. Note that the processor will still need to precede the component that syllabifies the phoneme stream and allocates phoneme durations.
  2. Extension of the connected speech processor to include a more exhaustive set of connected speech transformations.
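
The ordering constraint in the first plan can be expressed as a small check. The sketch below is only an illustration: the stage names and the two example orderings are hypothetical labels, not MU-Talk's real module names, and the position of other stages (such as the CSR) is deliberately left out because it is not specified here.

```python
# Hypothetical stage names; only the relative-order constraints from the plan
# are modelled here.

def satisfies_planned_constraints(stages):
    """Check that connected speech processing follows phrase determination
    and precedes syllabification and duration allocation."""
    position = {name: index for index, name in enumerate(stages)}
    connected = position["connected_speech"]
    return (position["phrase_determination"] < connected
            and connected < position["syllabify_and_durations"])


# Assumed example orderings: connected speech inside GTP (as now) versus
# after phrase determination (as planned).
current_like = ["gtp", "connected_speech", "phrase_determination",
                "syllabify_and_durations"]
planned = ["gtp", "phrase_determination", "connected_speech",
           "syllabify_and_durations"]
```

Under these assumed orderings, only the planned sequence passes the check, because in the current arrangement the processor runs before any phrase structure is available.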