Department of Linguistics

MU-Talk: Context Sensitive Rule Module

Robert Mannell

The CSR module of MU-Talk has the task of determining the appropriate allophone for each phoneme as determined by its context.

The CSR module examines the context of each phoneme and applies modifications to the non-contrastive features of each phoneme. A non-contrastive feature modification is a feature modification that does not turn a phoneme into another phoneme. Such non-contrastive features might include:-

  1. voicing in approximants
  2. different levels of stress in vowels
  3. variations in the relative durations of sub-phonemic features (eg. targets, transitions, stop occlusions, stop bursts and aspirations, etc.)
  4. non-contrastive shifts in place of articulation (eg. velar to palatal shifts in /k/ and /g/)

The location of the CSR module on the MU-Talk diagram is currently in error. The position shown is actually the intended location for only part of this module (see below).

At present, the CSR module is effectively a preprocessor to the synthesis-by-rule branch of MU-Talk. It provides very detailed, and SBR-specific, information regarding the precise quality of each of the allophones. It does, however, also provide information on the relative timing of sub-phonemic features which could be of use in a concatenation module that would be able to take into account the positions of these features during time-warping and concatenation.

Included here are a few sample lines of CSR code that are intended to illustrate these two aspects of the CSR module (ie. general rules and SBR-specific rules). Both pieces of code come from the function "Phoneme_Default", which sets default values of certain variables for each phoneme. Note that "..." indicates that code has been omitted.

This CSR rule scripting language (modelled after the C programming language) and its related interpreter were designed by Mannell in 1997. The version of the rules shown here is dated November 1997. No further development of the CSR module has occurred since that time.

CSR Data

CSR Input Data

The following data would represent the input to the CSR module for the text "live long and prosper".

##  700
#     0
l   120
I    90  0
v   120
#     0
l   110
O   150  0
N    90
#     0
A    90  1
n    50
d    40
#     0
p   110
r    50
O   100  0
.     0
s   130
p    90
@    60  0
#     0
##  700

The leftmost column contains the phoneme codes and the codes for word ("#") and sentence ("##") boundaries. The next column gives the unit durations in milliseconds. Note that sentence boundaries are given a duration of 700 ms, which represents a fixed inter-sentence pause duration. If this sentence contained an intonational phrase boundary then that boundary would also be given a pause duration. This sentence has an intermediate phrase boundary between "long" and "and", which is not indicated here. Intermediate phrase boundaries are given zero pause durations (ie. no pause).

The third column indicates vowel stress levels. A value of "1" for /A/ in "and" indicates that the vowel is to be reduced to a schwa. It is intended that such degrees of reduction, that is to schwa, will in future be dealt with entirely by the connected speech module (acting upon data supplied by the prosody module).
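
The column layout described above can be read with a few lines of code. The following Python sketch is illustrative only and is not part of MU-Talk. The names "Unit" and "parse_csr_input" are hypothetical, and the sketch assumes that columns are separated by whitespace and that the third column appears only on vowel lines, as in the sample. Only the first word of the sample ("live") is used.

```python
from typing import List, NamedTuple, Optional


class Unit(NamedTuple):
    """One line of CSR input (hypothetical structure, for illustration)."""
    symbol: str            # phoneme code or boundary code such as "#" or "##"
    duration_ms: int       # second column: unit duration in milliseconds
    stress: Optional[int]  # third column: stress level, None when absent


def parse_csr_input(text: str) -> List[Unit]:
    """Parse whitespace-separated CSR input lines into Unit records."""
    units = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) not in (2, 3):
            raise ValueError(f"unexpected line: {line!r}")
        stress = int(fields[2]) if len(fields) == 3 else None
        units.append(Unit(fields[0], int(fields[1]), stress))
    return units


# First word of the sample input shown above
SAMPLE = """\
##  700
#     0
l   120
I    90  0
v   120
#     0
"""

units = parse_csr_input(SAMPLE)
```

In this sketch "units[3]" holds the vowel /I/ with a duration of 90 ms and a stress value of 0, while boundary and consonant lines have no stress value.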

CSR Output Data

At present, the output data for the CSR module is identical to the input data to the SBR module.

SBR-specific Rules

Here are a few lines of CSR code that determine exact vowel quality for the SBR module (note that this cardinal vowel based approach comes originally from Clark (1979, 1981)).

/* ------------------------------------------------------------------------- */
~ rules Phoneme_Default
/* ------------------------------------------------------------------------- */
/* Context sensitive rules for allocating the default phonetic feature values*/
{
    ...

    /* VOWELS */
    else if (|<vowl>|)
    {
        ...

        else    /* Fully stressed vowels */
        {
            if      (|<ei>|) CARD1=3,SLV1=50,H1=-2,F1=-3,GL1=110,
                             CARD2=2,SLV2=10,H2=1,F2=-2;
               ...

            else if (|<i:>|) CARD1=2,SLV1=10,H1=3,F1=3,C1=1,GL1=60,
                             CARD2=1,SLV2=120,H2=-3,F2=-3;
            else if (|<I>|)  CARD1=2,SLV1=80,H1=3,F1=-5;
            else if (|<E>|)  CARD1=3,SLV1=90,H1=4,F1=-1;

               ...
        }
    }
}
END Phoneme_Default

Note that these rules are for "fully stressed" vowels. Separate rules are also applied for fully reduced and partially reduced vowels.

In the above code "CARD1" and "CARD2" give the reference cardinal vowel (numbered 1-16) for targets 1 and 2, respectively. For a monophthong, only "CARD1" is used. Note that /i:/ is modelled as a diphthong.

Raising and fronting are handled by "H1" (or "H2") and "F1" (or "F2") respectively. The raising and fronting values are on a ten point scale (ie. ten steps between cardinals). Negative values for raising and fronting represent lowering and retracting, respectively.

"C1" (and "C2") refer to a centralised version of each cardinal vowel, and have the default value of "0" (false), in which case the variable can be omitted. When "C1" (or "C2") has the value of "1" (true) it means that the reference cardinal vowel is a centred version of the true cardinal vowel.

Note that the logic of the SBR module takes into account vowel co-articulation with adjacent consonants, so this does not need to be addressed in the CSR module.

"SLV1" and "SLV2" represent the durations of the first and second targets of the vowel, respectively. "GL1" represents the duration of the glide between the two targets of a diphthong. These are not absolute duration values, but are later converted into a percentage of the prosody module's allocated phoneme duration. Initial and final vowel transition durations are dealt with in separate functions.

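The conversion from relative to absolute vowel durations might be sketched as follows. This Python example is not the MU-Talk implementation. The function name is hypothetical, and the sketch assumes, for simplicity, that the two targets and the glide share the whole of the allocated phoneme duration in proportion to their relative values. The initial and final transitions, which are handled in separate functions, are ignored here.

```python
def vowel_part_durations(phoneme_ms, slv1, gl1=0, slv2=0):
    """Split an allocated phoneme duration (ms) among target 1, glide and target 2.

    ASSUMPTION: the relative values are normalised so that the three parts
    together fill the whole phoneme duration. Vowel transitions are ignored.
    """
    relative = {"SLV1": slv1, "GL1": gl1, "SLV2": slv2}
    total = sum(relative.values())
    if total <= 0:
        raise ValueError("relative durations must sum to a positive value")
    # Multiply before dividing to limit floating point rounding
    return {name: value * phoneme_ms / total for name, value in relative.items()}


# Relative values for /ei/ from the rules above; 340 ms is an arbitrary
# illustrative duration, not a value produced by the prosody module.
ei = vowel_part_durations(340, slv1=50, gl1=110, slv2=10)

# A monophthong such as /I/ uses only the first target.
short_i = vowel_part_durations(90, slv1=80)
```

Under these assumptions the glide of /ei/ receives 110/170 of the phoneme duration, and the single target of /I/ receives all of it.
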
General Sub-phonemic Feature Timing Rules

Here are a few lines of CSR code that set the relative timing for certain sub-phonemic features. Note that actual phoneme duration has already been set by part of the prosody module.

/* ------------------------------------------------------------------------- */
~ rules Phoneme_Default
/* ------------------------------------------------------------------------- */
/* Context sensitive rules for allocating the default phonetic feature values*/
{
    /* CONSONANTS */
    if (|<cons>|)
    {
        if (|<vstop>|)    /* Voiced stops */
        {
            OL=70;
            if      (|<b!d>|) SLC=10,TT1=40,TT2=40;
            else if (|<g>|)   SLC=20,TT1=40,TT2=50;
        }

        else if (|<uvstop>|)    /* Voiceless stops */
            SLC=40,OL=70,TT1=40,TT2=40;

        else if (|<approx>|)    /* Approximants */
        {
            SLC=60;
            if      (|<w!r>|) TT1=80,TT2=80;
            else if (|<j>|)   TT1=70,TT2=70;
            else if (|<l>|)   TT1=50,TT2=50;
        }

        else if 
           ...
    }
    ...
}
END Phoneme_Default

In the above function the durations are relative only. They are converted into true durations within this module, after the CSR rules have been applied.

"SLC" represents the consonant "target" duration. This has a special meaning in stop consonants as it refers to the duration of the aspiration. "OL" controls the stop and affricate occlusion duration. "TT1" and "TT2" control the transition lengths from the preceding vowel and to the following vowel, respectively. These transition values are altered in other functions in certain contexts, such as consonant-to-consonant transitions.

In the above script only durations are set, but these durations partially determine stop voice onset time (VOT) by controlling the duration of the aspiration. The control of the actual point of voice onset in stops and approximants is handled by other functions.
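
For readers unfamiliar with the CSR rule syntax, the consonant defaults shown above can be restated as an ordinary lookup function. The Python sketch below is a paraphrase for illustration, not MU-Talk code. It assumes that "|<b!d>|" matches either /b/ or /d/, that the voiceless stop class contains /p/, /t/ and /k/, and that the voiced stop and approximant classes contain exactly the phonemes tested in the rules. The function name is hypothetical, and consonant classes omitted from the excerpt are not covered.

```python
# ASSUMED class membership; only the phonemes tested in the excerpt are
# certain, and the voiceless stop set is a guess.
VOICED_STOPS = {"b", "d", "g"}
VOICELESS_STOPS = {"p", "t", "k"}
APPROXIMANTS = {"w", "r", "j", "l"}


def consonant_timing_defaults(phoneme):
    """Return the relative timing defaults (SLC, OL, TT1, TT2) for a consonant."""
    if phoneme in VOICED_STOPS:
        values = {"OL": 70}
        if phoneme in ("b", "d"):      # assumed reading of |<b!d>|
            values.update(SLC=10, TT1=40, TT2=40)
        else:                          # /g/
            values.update(SLC=20, TT1=40, TT2=50)
        return values
    if phoneme in VOICELESS_STOPS:
        return {"SLC": 40, "OL": 70, "TT1": 40, "TT2": 40}
    if phoneme in APPROXIMANTS:
        values = {"SLC": 60}           # no occlusion value is set for approximants
        if phoneme in ("w", "r"):      # assumed reading of |<w!r>|
            values.update(TT1=80, TT2=80)
        elif phoneme == "j":
            values.update(TT1=70, TT2=70)
        else:                          # /l/
            values.update(TT1=50, TT2=50)
        return values
    # Other consonant classes are omitted ("...") in the excerpt above
    raise ValueError(f"no defaults shown in the excerpt for {phoneme!r}")


g_defaults = consonant_timing_defaults("g")
l_defaults = consonant_timing_defaults("l")
```

As in the CSR rules, these values are relative durations only and would still need to be converted into true durations.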

Plans for the Future

The CSR module needs to be split into three parts. These parts are:-

  1. A module that produces a high-level specification of allophones (probably something like a narrow phonetic transcription). This would be placed where CSR currently is on the MU-Talk diagram, but it would also need to utilise the decisions of the syllabification procedure.
  2. A module that determines the durations of selected sub-phonemic features. This module would require considerable testing to ensure that the data it provides is useful to both concatenation and SBR branches. If this proves to be unsatisfactory for use in concatenation time-warping then this module would be combined with the next, SBR-specific, module.
  3. An SBR-specific module that would effectively be a preprocessor to the SBR module and which would provide data only of relevance to the SBR module (such as cardinal vowel specifications).

Bibliography

  1. Clark, J.E. (1979) Synthesis by Rule of Australian English Speech, PhD Thesis, Macquarie University.
  2. Clark, J.E. (1981) A low-level speech synthesis by rule system, Journal of Phonetics, 9, 451-476.