Department of Linguistics
PHONETICS AND PHONOLOGY
Coarticulation and Assimilation
Active and Passive Articulators
Active articulators are those moveable parts of the vocal tract that participate in the production of speech. They do this by moving towards or away from passive articulators (or other active articulators).
Passive articulators are relatively stationary parts of the vocal tract.
Gesture and Constriction
The goal of active articulator gestures is to produce certain degrees of stricture (opening / closure) at certain points in the vocal tract.
A speech gesture is the coordinated movement of one or more articulators to achieve a desired constriction.
Constriction involves the following degrees of stricture:-
|Approximant||medium degree of stricture|
|Fricative||open but high degree of stricture|
Articulatory Phonology (Browman and Goldstein, 1992) is a phonological theory that regards gestures as the basic units of phonological contrast. This system of gestures is based on constrictions involving the lips, tongue tip, tongue body, velum and glottis.
Articulatory Phonology describes 8 tract variables:-
|1||LP||lip protrusion|
|2||LA||lip aperture|
|3||TTCL||tongue tip constriction location|
|4||TTCD||tongue tip constriction degree|
|5||TBCL||tongue body constriction location|
|6||TBCD||tongue body constriction degree|
|7||VEL||velic aperture|
|8||GLO||glottal aperture|
Lip protrusion (LP) accounts for rounding and spreading whilst lip aperture (LA) accounts for differences in degree of constriction (eg. vowel rounding [u] or spreading [i], approximant [w], fricative [v] and stop [b] articulations). Three articulators are involved in LP and LA: the upper lip (mostly for protrusion), the lower lip (protrusion and stricture) and the jaw (assists stricture).
The tongue tip can move partly independently of the tongue body. Three articulators are involved in tongue tip constriction location (TTCL) and degree (TTCD). They are the tongue tip, tongue body (tongue body movement can assist tongue tip placement), and jaw (it's easier to reach the roof of the mouth if the jaw isn't too low).
Tongue body constriction degree (TBCD) and location (TBCL) are important for all vowels, and for palatal, velar, uvular and pharyngeal consonant places of articulation (for approximants, fricatives and stops). Articulators involved in TBCD and TBCL are the tongue body and the jaw (which assists in raising the tongue for some articulations).
Velic aperture (VEL) must not be confused with tongue body articulations (TBCL and TBCD) in which the velum acts as a passive articulator for a tongue gesture. VEL refers to velum opening and closing. The velum must be open for nasal stops, must be closed for oral stops and fricatives, and its opening for other sounds is language dependent.
Glottal aperture (GLO) relates to voicing. GLO is closed for normal voicing, and partly open (slightly open at the back of the glottis) for breathy voicing. GLO is wide open for voiceless stops and voiceless fricatives (excluding the glottal stop closure and the glottal /h/ fricative). GLO opening and closing is subject to complex language specific timing patterns in oral stops. This relates to differences in voice onset time (VOT). GLO is closed for ejectives and implosives and is also subject to important timing constraints in these sounds.
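The eight tract variables and the articulators that contribute to each can be written down as a small lookup table. The following is only a sketch of that bookkeeping: the names `TractVariable`, `ARTICULATORS` and `variables_using` are invented for this illustration, and listing the velum and glottis as the sole articulators for VEL and GLO is an assumption based on the list of constriction sites given above.

```python
from enum import Enum


class TractVariable(Enum):
    """The eight tract variables described in the text."""
    LP = "lip protrusion"
    LA = "lip aperture"
    TTCL = "tongue tip constriction location"
    TTCD = "tongue tip constriction degree"
    TBCL = "tongue body constriction location"
    TBCD = "tongue body constriction degree"
    VEL = "velic aperture"
    GLO = "glottal aperture"


# Articulators contributing to each tract variable, following the text.
# The VEL and GLO entries are an assumption (one articulator each).
ARTICULATORS = {
    TractVariable.LP: ("upper lip", "lower lip", "jaw"),
    TractVariable.LA: ("upper lip", "lower lip", "jaw"),
    TractVariable.TTCL: ("tongue tip", "tongue body", "jaw"),
    TractVariable.TTCD: ("tongue tip", "tongue body", "jaw"),
    TractVariable.TBCL: ("tongue body", "jaw"),
    TractVariable.TBCD: ("tongue body", "jaw"),
    TractVariable.VEL: ("velum",),
    TractVariable.GLO: ("glottis",),
}


def variables_using(articulator):
    """Return the tract variables that the given articulator contributes to."""
    return [tv for tv, parts in ARTICULATORS.items() if articulator in parts]
```

For example, `variables_using("jaw")` returns the six lip and tongue variables, which reflects the point that the jaw assists many different constrictions and so is shared between gestures.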
Gestures overlap in an utterance. The relative timing of gestures can be extremely important.
Differing patterns of overlapping gestures distinguish phonemes, syllables and larger speech units (eg. polysyllabic words, phrases).
Two major theories of speech perception regard the syllable as the minimal unit of speech perception. In these theories, speech perception is the perception of gestures. These theories are:-
- Motor theory of speech perception (Liberman, Mattingly and Turvey, 1967)
- Direct-realist theory of speech perception (eg. Fowler, 1986, Best, 1995)
There is a very large, and growing, body of research that suggests that the syllable is the most basic unit of articulatory planning in the brain.
Gestures interact with each other to a greater extent within syllable boundaries than they do across syllable boundaries.
Phonemes in Connected Speech
Phonemes rarely occur in isolation. In English this only occurs in the citation (isolated word) utterance of the words "I"/"eye"/"aye", "owe"/"oh" and "a" (also "air", "ear", "are", "oar"/"or"/"ore", "err" in non-rhotic dialects such as Australian English) plus the interjections "eh" and "oi"/"oy" (note that they are all long vowels or diphthongs).
Phonemes are articulated as part of a syllable and each syllable is normally part of a longer sequence of speech. The majority of syllables consist of more than one phoneme and most syllables are made up of one vowel and one or more consonants. See the topic "The syllable and the foot" for more information on syllables.
In simple models of articulatory planning, each phoneme has a single ideal articulatory target for each contrastive articulator. At the most abstract level of motor planning each articulator might be thought of as having a target position that it must try to achieve for each phoneme. That is, the idealised target for a phoneme is invariant. Differences in actual articulations occur as a consequence of physics and timing.
In the following diagram we can see two ideal (and entirely abstract) targets for a particular articulator in two adjacent phonemes. The blue lines represent idealised phoneme boundaries.
|Figure 1: Two ideal articulatory targets for a single articulator and two adjacent phonemes. PB1, PB2 and PB3 are the idealised phoneme boundaries for Phoneme 1 and Phoneme 2.|
There are no readily definable articulatory or acoustic boundaries between phonemes in continuous speech, except at certain phrase boundaries which are characterised by pauses.
When we talk about boundaries between phonemes we are only referring to an approximate boundary at some point in the transition between the two phonemes. True acoustic and articulatory boundaries between phonemes don't exist. What we are referring to when we talk about (or plot) phoneme boundaries is a point between two phoneme targets where the two phonemes contribute approximately equally to the articulatory or acoustic pattern.
The cortical centres that plan articulatory movements during speech production can be understood as producing a series of increasingly concrete specifications for articulator muscle movement. At the most abstract level of motor planning each articulator might be thought of as having a target position that it must attempt to achieve for each phoneme. Such a target would be the ideal target for each phoneme and would depend upon an individual's physiology and linguistic experience. This model of articulatory planning assumes a fixed ideal articulatory target for each contrastive articulator.
Phonemes are best described in terms of target positions with articulatory (and resulting acoustic) transitions between the two targets. These transitions share the characteristics of the two targets.
A "transition" in speech is caused by the movement of articulators between phoneme targets. Phonemes can be defined in terms of one or more idealised articulatory targets. For example, a monophthong vowel is defined as having one target whilst a diphthong vowel is defined as having two targets. There are transitions between the targets of two adjacent phonemes as well as between the multiple targets within a single phoneme.
|Figure 2: An idealised articulatory transition from the target of one phoneme (T1) to the target of another phoneme (T2). PB represents the approximate position of the phoneme "boundary".|
The transition between two phonemes shares the articulatory and acoustic characteristics of both phonemes but gradually changes from being predominantly like the first phoneme target to predominantly like the second phoneme target.
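The idea that a transition is a gradually changing mixture of two targets, and that the phoneme "boundary" is the point of equal contribution, can be sketched numerically. The raised-cosine shape used here is purely an assumption chosen for smoothness; real articulator trajectories are not claimed to follow it, and the function names are hypothetical.

```python
import math


def blend_weight(t, start, end):
    """Contribution of the second target at time t (0 = none, 1 = all).

    Assumes a raised-cosine transition between `start` and `end`.
    """
    if t <= start:
        return 0.0
    if t >= end:
        return 1.0
    phase = (t - start) / (end - start)
    return 0.5 * (1.0 - math.cos(math.pi * phase))


def position(t, target1, target2, start, end):
    """Articulator position as a weighted mix of the two targets."""
    w = blend_weight(t, start, end)
    return (1.0 - w) * target1 + w * target2


def boundary_time(start, end):
    """Time at which both targets contribute equally.

    The raised cosine is symmetric, so this is the midpoint of the transition.
    """
    return 0.5 * (start + end)
```

With a transition running from 100 ms to 160 ms, `boundary_time(100, 160)` gives 130 ms, and `position` at that time lies halfway between the two targets. Before that point the articulation is predominantly like the first target, after it predominantly like the second.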
A contrastive articulator is an articulator whose configuration (shape, constriction) or position (place) is important to the accurate transmission of the intended phoneme to a listener. The listener ideally shares the same language or dialect and therefore the same expectations for the articulatory, acoustic and auditory characteristics of each phoneme.
Which articulators are contrastive may vary from phoneme to phoneme. An articulator that is contrastive in one language for a certain class of sounds may not be contrastive for another language.
Velum opening is always contrastive for oral stops (closed) and nasal stops (open), in all languages that have these classes of sound.
Velum opening is contrastive for certain (but not all) vowels in French (distinct nasal and non-nasal vowel phonemes), but is not contrastive for English vowels (nasality does not change vowel phonemes).
For the vast majority of languages (possibly with only one exception), the position of the tongue body in the front-back and the high-low dimensions is contrastive for vowels and so the tongue body constriction location (TBCL) is contrastive for vowels.
In English the configuration of the lips (rounded, neutral, spread) is not contrastive for vowels (no pair of vowels is contrasted solely on the basis of lip shape) and so the lips are not contrastive articulators for English vowels (but are for certain pairs of French vowels). This lack of contrastiveness does not mean that there are not characteristic lip shapes for English vowels but such lip shapes are redundant as they can be predicted from tongue position. That is, in English high front vowels are always lip spread, high back vowels are always rounded and low vowels tend to have a neutral lip shape.
When an articulator is not contrastive in a phoneme that is surrounded by two other phonemes for which that articulator is contrastive then the articulator is free to move slowly (or not at all) within that central phoneme. A good example in English is nasality, or velum opening. In English velum opening is contrastive for both oral stops (must be closed) and for nasal stops (must be open) but it is not contrastive for vowels. Vowels are identified as the same phoneme regardless of whether the velum is open or closed (but recall that this is not true for languages such as French). A vowel between two oral stops has a closed velum whilst a vowel between two nasal stops has a significantly open velum and the vowel is said to be nasalised.
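One simple way to picture this is to give the velum a target only in phonemes where it is contrastive and to let it drift between its neighbouring targets elsewhere. The sketch below does that with straight-line interpolation. The numeric scale (0.0 = closed, 1.0 = open), the linear interpolation and the function name `fill_unspecified` are all assumptions made for illustration.

```python
def fill_unspecified(targets):
    """Fill None entries by linear interpolation between specified neighbours.

    `targets` holds one value per phoneme for a single articulator;
    None marks a phoneme in which that articulator is not contrastive.
    """
    known = [i for i, value in enumerate(targets) if value is not None]
    if not known:
        raise ValueError("at least one target must be specified")
    filled = list(targets)
    for i, value in enumerate(targets):
        if value is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = targets[right]   # copy the following target
        elif right is None:
            filled[i] = targets[left]    # copy the preceding target
        else:
            fraction = (i - left) / (right - left)
            filled[i] = targets[left] + fraction * (targets[right] - targets[left])
    return filled


# Velum opening for three stop-vowel-stop sequences (0.0 = closed, 1.0 = open).
nasal_context = fill_unspecified([1.0, None, 1.0])  # vowel between nasal stops
oral_context = fill_unspecified([0.0, None, 0.0])   # vowel between oral stops
mixed_context = fill_unspecified([0.0, None, 1.0])  # oral stop, vowel, nasal stop
```

Under these assumptions the vowel between two nasal stops inherits an open velum (nasalised), the vowel between two oral stops stays closed, and the mixed case comes out partly open. The mixed case is an extrapolation of the sketch rather than something stated in the text.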
Inertia is the tendency of a stationary body to resist movement or for a moving body to resist changes in its rate or direction of movement. The more massive (ie. bigger and heavier) an articulator is the greater its inertia or resistance to movement. The tongue body is more massive than the tongue tip and so it moves more slowly than the tongue tip. The jaw is even more massive and so it moves even more slowly.
|Figure 3: Two articulators with different degrees of inertia will take different times to move the same distance. Articulator 1 (A1) has less inertia than articulator 2 (A2). Therefore, articulator 1 (A1) can move from its first target (T1) to its second target (T2,1) in much less time than articulator 2 (A2) can move the same distance to its target 2 (T2,2).|
The maximum distance that an articulator can move in a given time depends upon its inertia.
|Figure 4: Two articulators with different degrees of inertia can move different maximum distances in the same time. Articulator 1 (A1) has less inertia than articulator 2 (A2). Therefore, articulator 1 (A1) can move a greater distance from its first target (T1) to its second target (T2) than can articulator 2 (A2).|
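The relationships in Figures 3 and 4 can be illustrated with the simplest possible physical model: a mass starting from rest and pushed by a constant force, so that distance = force × time² / (2 × mass). This is a toy sketch only. It assumes the same force for every articulator and ignores braking at the target, muscle properties and elasticity, and the mass values used in the example are made up.

```python
import math


def time_to_move(distance, mass, force=1.0):
    """Time needed to cover `distance` from rest under a constant force."""
    return math.sqrt(2.0 * mass * distance / force)


def distance_in_time(duration, mass, force=1.0):
    """Distance covered from rest in `duration` under a constant force."""
    return force * duration ** 2 / (2.0 * mass)


# Hypothetical relative masses: articulator 2 is four times as massive.
light_mass, heavy_mass = 1.0, 4.0

# Figure 3: same distance, different times.
time_light = time_to_move(1.0, light_mass)
time_heavy = time_to_move(1.0, heavy_mass)

# Figure 4: same time, different distances.
reach_light = distance_in_time(1.0, light_mass)
reach_heavy = distance_in_time(1.0, heavy_mass)
```

In this toy model an articulator with four times the mass takes twice as long to cover the same distance and covers only a quarter of the distance in the same time.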
Articulator Agility - Motor units and density of innervation
Articulators with low inertia, such as the tongue tip, can be manoeuvred rapidly. Another variable that affects an articulator's agility is the extent to which it is controlled neurally. An articulator with a larger proportion of nerve fibres per unit mass has fewer muscle fibres per nerve fibre. A nerve fibre and its associated muscle fibres are together known as a motor unit and each motor unit can potentially be controlled individually. The larger the number of motor units in an articulator the more controllable its movement is. The tongue tip and the velum have similar mass but the tongue tip has smaller and more numerous motor units. This means that the tongue tip is more manoeuvrable than the velum in spite of their similar mass.
The high degree of innervation and the larger number of motor units in the tongue tip doesn't mean that as a whole it moves faster. This is limited by inertia and therefore by its overall mass. The greater degree of innervation does, however, make it possible to semi-independently move sub-structures in the tongue tip (eg. to permit such things as tongue tip grooving). These sub-structures are limited in their independent movement by their elastic linkage to the rest of the tongue tip, but being sub-structures (rather than the whole structure) of the tongue tip they have lower mass and therefore they have less inertia than the whole structure.
Target undershoot occurs when there is insufficient time for an articulator to reach its target position. The articulator has too much inertia to move between two targets in the available time so it doesn't travel as far as it would if there were more time. Target undershoot can occur in both vowels and consonants.
|Figure 5: This figure illustrates target undershoot. In this example the ideal target of phoneme 2 (T2) is represented by the articulator position value IT whilst UT represents the undershot target value. This would be typical of an unaccented short low vowel between two alveolar or velar stops, for example.|
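Undershoot can be sketched with a first-order lag, in which the articulator closes a fixed proportion of the remaining distance to its ideal target per unit time. This is one common simplification, not a claim about how articulators actually behave, and the time constant (standing in for inertia) and durations below are hypothetical values.

```python
import math


def reached_position(start, ideal_target, duration, time_constant):
    """Position reached after `duration`, approaching the target exponentially.

    A larger `time_constant` stands in for a more sluggish articulator.
    """
    progress = 1.0 - math.exp(-duration / time_constant)
    return start + (ideal_target - start) * progress


def undershoot(start, ideal_target, duration, time_constant):
    """Distance by which the articulator falls short of the ideal target (IT)."""
    return abs(ideal_target - reached_position(start, ideal_target, duration, time_constant))


# Hypothetical values: positions in arbitrary units, times in seconds.
short_vowel = undershoot(start=0.0, ideal_target=1.0, duration=0.05, time_constant=0.05)
long_vowel = undershoot(start=0.0, ideal_target=1.0, duration=0.15, time_constant=0.05)
far_target = undershoot(start=0.0, ideal_target=2.0, duration=0.05, time_constant=0.05)
```

In this sketch the shortfall shrinks as the available time grows and grows with the distance to be travelled, so `long_vowel` is smaller than `short_vowel`, and `far_target` is twice `short_vowel`.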
A "gesture" is the movement of one articulator, such as the lower lip, the tongue tip or the tongue body, from an articulatory position characteristic of one speech sound to an articulatory position characteristic of the next speech sound.
Articulatory gestures overlap. This means that the movement of each articulator occurs at the same time as the movements of all the other articulators. These overlapping gestures may not be synchronised because of sequencing requirements, articulator contrastiveness and articulator inertia.
|Figure 6: The gestures of three articulators (A1, A2, A3) overlap in time as they pass through three idealised phoneme targets (T1, T2, T3).|
Articulatory overlap is the basis of coarticulation.
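Overlap of this kind can be represented by giving each gesture a start and end time and measuring how long any two gestures of different articulators are active together. The sketch below does this for an invented example. The `Gesture` class, the function names and all of the timings (in milliseconds) are hypothetical and are not measurements.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Gesture:
    articulator: str
    start: float  # ms
    end: float    # ms


def overlap(a, b):
    """Duration for which two gestures are active at the same time."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def overlapping_pairs(gestures):
    """List (articulator, articulator, overlap) for gestures that co-occur."""
    return [
        (a.articulator, b.articulator, overlap(a, b))
        for a, b in combinations(gestures, 2)
        if a.articulator != b.articulator and overlap(a, b) > 0
    ]


# Invented timings for three articulators passing through three targets.
score = [
    Gesture("A1", 0.0, 80.0),
    Gesture("A2", 40.0, 220.0),
    Gesture("A3", 150.0, 300.0),
]
pairs = overlapping_pairs(score)
```

Here A1 and A2 are active together for 40 ms and A2 and A3 for 70 ms, while A1 and A3 do not overlap at all. The unequal overlaps reflect the point that overlapping gestures need not be synchronised.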
Coarticulation and Syllables
Coarticulation tends to be stronger within syllables than across syllable boundaries. This greater coarticulation within syllables is evidence for the cognitive existence of the syllable as a fundamental unit of articulatory organisation.
Greater degrees of coarticulation between the phonemes in a syllable increase the perceptual integration of syllables. That is, greater degrees of coarticulation increase the perception that the phonemes in a syllable are connected.
We perceive speech by recognising the (auditorily-transformed) acoustic patterns of syllables. These patterns consist of targets (or rather the effect of coarticulation on the realisation of those targets) and the transitions between targets.
Coarticulation between Consonants and Vowels
Vowels affect the articulation of adjacent consonants (and adjacent vowels). Consonants affect the articulation of adjacent vowels (and other adjacent consonants).
Some sounds are more resistant to coarticulation than other sounds. This may be due to differences in phoneme duration, differences in the inertia of contrastive articulators, differences in articulator movement distances or it may be due to the phoneme inventory effect (see below).
Coarticulation is greatest (for a given phoneme duration) when there is the greatest articulator movement (ie. greatest distance) between phonemes.
Many consonants have a high tongue position (eg. [k]) and so they are more likely to cause target undershoot in low vowels than in high vowels. This is because there is a longer distance to travel from a high consonant tongue position to a low vowel tongue position than there is to a high vowel tongue position.
Long vowels are more resistant to target undershoot than are short vowels as there is more time for the articulator to reach its target. Accented and stressed vowels are more resistant to undershoot than are unstressed vowels as they are even longer.
In other words (when adjacent to consonants with high tongue body articulations):-
- Accented high long monophthongal vowels are the most resistant vowels to target undershoot.
- Unstressed low vowels and particularly unaccented short low vowels are most affected by target undershoot.
Schwa is not realised by a defined target. It is usually very short and is therefore greatly affected by coarticulation with adjacent consonants but since it doesn't have a defined target it can't be said to have target undershoot.
Phoneme Inventory Effects
Coarticulation is resisted (reduced in degree) when it would result in perceptual confusion. The chance of perceptual confusion is greatest in languages that have a large number of phonemes of a particular class. The greater the number of members of a class of speech sounds the greater the chance that adjacent phonemes will be close together.
Vowels coarticulate most in languages with a small number of vowels. Some languages have three (or even fewer) phonological vowels. Vowel fronting and height are free to vary considerably in such languages without a vowel phoneme being confused with an adjacent phoneme.
Consonants coarticulate most in languages with a small number of places of articulation. For example, in English there are only three places of oral stop articulation and only one place of articulation that uses the tongue body. Vowel articulations are also tongue body articulations. The velar stop consonants /k, g, ŋ/ in English can vary in place of articulation (when adjacent to front, central and back vowels) between palatal, velar and uvular without being confused with another stop phoneme. Australian Aboriginal languages, with both a palatal and a velar oral stop, can't afford to have velar stop phonemes moving too far forward as they would be confused with the palatal stop phoneme. Arabic, with both a uvular and a velar oral stop, can't afford to have velar stop phonemes moving too far back as they would be confused with the uvular stop phoneme.
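The inventory effect for tongue body stops can be sketched by placing the palatal, velar and uvular places on a single front-to-back scale and letting a velar stop vary only as far as the halfway point to its nearest contrasting neighbour. The numeric scale, the halfway criterion, the physical limits and the function name are all assumptions for illustration, and the inventories are reduced to the tongue body places mentioned in the text.

```python
# Front-to-back position of each place, in arbitrary units (an assumption).
PLACE = {"palatal": 0.0, "velar": 1.0, "uvular": 2.0}


def permitted_range(place, inventory, front_limit=-0.5, back_limit=2.5):
    """Range over which a stop at `place` may vary without risking confusion.

    The range extends to the midpoint with each contrasting neighbour, or to
    the (assumed) physical limit when there is no neighbour on that side.
    """
    centre = PLACE[place]
    others = [PLACE[p] for p in inventory if p != place]
    front = max(((centre + o) / 2 for o in others if o < centre), default=front_limit)
    back = min(((centre + o) / 2 for o in others if o > centre), default=back_limit)
    return front, back


english_like = permitted_range("velar", {"velar"})
palatal_and_velar = permitted_range("velar", {"palatal", "velar"})
velar_and_uvular = permitted_range("velar", {"velar", "uvular"})
```

With only a velar place the permitted range spans the whole palatal-to-uvular region. Adding a palatal stop cuts off the front of the range and adding a uvular stop cuts off the back, which is the pattern described for Australian Aboriginal languages and Arabic respectively.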
Coarticulation and Assimilation
Coarticulation is the way in which the movements of different articulators affect each other and the ways in which preceding and following articulations of an individual articulator affect its current articulation. Articulatory planning in the brain takes these effects into account when producing a sequence of speech sounds. Some of these effects are very slight whilst some are quite strong.
Coarticulation ALWAYS occurs in ALL languages for ALL sequences of sounds not separated by pauses. Speakers have no choice - they MUST coarticulate adjacent sounds. Without appropriate coarticulation (eg. in poor synthetic speech) the resulting speech sounds unnatural and is hard to understand.
Two opposing forces are at work in speech production and perception (eg. Boersma, 1998).
- There is a tendency to simplify speech patterns to increase ease of articulation as simpler speech is easier to produce.
- This is opposed by the competing need ("constraint") to maintain phonological distinctiveness in speech perception.
One of the reasons why we accent certain words (eg. words containing new information) is so that we can increase their duration, and therefore can avoid undershoot (ie. more time to reach targets).
- Accented words are the words for which we most need to maintain maximum perceptual distinctiveness as we can't rely on context to perceive them. (constraints maximised)
- We tend to not accent words representing given information. We don't require maximum perceptual distinctiveness as word identification is assisted by context. (constraints reduced)
- Function words carry very little of the semantic load and so they are often reduced as there is no need to maintain their perceptual distinctiveness. (constraints almost absent)
These patterns of accenting certain words (new information) and reducing the other words (given information) in a sentence provide a pattern of timing that results in the greatest time to reach articulatory targets in accented words and a much greater chance of undershoot for other words (especially function words).
If we always attempted to maximise phonological distinctiveness in speech perception (without the opposing tendency towards ease of production) then we could perhaps predict the degree to which targets are achieved or undershot from articulator inertia and the timing of each syllable in each accented or unaccented word. However, this isn't always possible.
We vary in the extent to which we maximise distinctiveness from one phonetic context to another. This variation may be language, dialect, local speech community and individual specific. That is, there are different constraints on the extent to which we can relax perceptual distinctiveness and increase ease of articulation. Assimilation is a language (or speech community) specific and also a phonetic context specific relaxation of these constraints. Assimilation constraints may allow some allophones in certain contexts but disallow other allophones in other contexts.
In some cases it may even be possible to increase ease of articulation to a point where a sound assimilates to such a great extent that it becomes more like another phoneme.
- In Australian English, alveolar consonants (especially alveolar oral and nasal stops) readily assimilate to the place of articulation of a following consonant.
- Sometimes this results in an allophonic change (eg. to dental) and sometimes in a phonemic change (eg. to velar or bilabial).
This change in Australian English alveolar stops (particularly the phonemic assimilations) also happens in other, but not all, English dialects. It isn't permitted in many other languages.
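The phonemic version of this assimilation can be written as a simple rewrite rule over a phoneme sequence: an alveolar stop takes on the place of a following bilabial or velar consonant. The sketch below applies the rule in a single pass. It treats the process as obligatory, which real speech does not (it is optional and varies with speaker and context), and the table and function names are hypothetical.

```python
# Place classes of the triggering consonants.
BILABIAL = {"p", "b", "m"}
VELAR = {"k", "g"}

# Alveolar stop -> assimilated phoneme, by place of the following consonant.
ASSIMILATED = {
    ("t", "bilabial"): "p",
    ("d", "bilabial"): "b",
    ("n", "bilabial"): "m",
    ("t", "velar"): "k",
    ("d", "velar"): "g",
    ("n", "velar"): "ŋ",
}


def place_of(phoneme):
    """Return 'bilabial', 'velar' or None for any other phoneme."""
    if phoneme in BILABIAL:
        return "bilabial"
    if phoneme in VELAR:
        return "velar"
    return None


def assimilate(phonemes):
    """Assimilate each alveolar stop to the place of the following consonant.

    The following consonant is read from the original sequence, so this is
    a single, non-iterative pass.
    """
    result = list(phonemes)
    for i in range(len(phonemes) - 1):
        key = (phonemes[i], place_of(phonemes[i + 1]))
        result[i] = ASSIMILATED.get(key, phonemes[i])
    return result
```

For instance, `assimilate(["t", "e", "n", "p", "e", "n", "s"])` changes only the /n/ that precedes /p/, giving /m/, and leaves the /n/ before /s/ untouched. The symbols in that example are an illustrative broad transcription only.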
The extent or degree of such assimilations may vary between discourse contexts (eg. formal vs. informal).
Even where such a relaxation of constraints is permitted, we often find different degrees of assimilation for different speakers or different groups of speakers. (eg. some speakers of Australian English exhibit greater degrees of contextual nasality than do others).
Careful physiological analysis of alveolar to bilabial or velar phonemic assimilations appears to indicate that some aspect of the alveolar articulation is still present. For example, in alveolar to bilabial assimilation there is a clear primary bilabial articulation, but a secondary tongue tip gesture can often also be detected (so the alveolar feature is still present).
Assimilation is a language (or dialect or speech community) specific relaxation in constraints that maintain a certain degree of balance between perceptual distinctiveness and ease of articulation. In other words, assimilation could be regarded as a language (or dialect) specific enhancement of the degree of coarticulation.
In the topic "Phonemic (Broad) Transcription of Australian English" there are numerous examples of assimilation where the identity of the phoneme is changed by adopting an articulatory strategy that simplifies articulatory effort. Most of these examples involve the conversion of an alveolar consonant (oral stop or nasal stop or fricative) into the same place of articulation as the following consonant (in these cases alveolar oral and nasal stops become bilabial or velar and alveolar fricatives become palato-alveolar fricatives). There is not a clear case for considering this to be driven strongly by coarticulation as the tongue tip is the most rapidly moveable of all articulators. The reverse seems to be the case. In order to prevent the adjacent sound from being affected by coarticulation (ie. slower lip or velar movements) the intervening alveolar phoneme is replaced by a homorganic phoneme. This seems to be an articulatory or phonological choice made in either the higher levels of articulatory planning or at the even more abstract level of phonological specification.
On the other hand, in the topic "Phonetic (Narrow) Transcription of Australian English" oral and nasal stops are assimilated to the place of articulation of the following labiodental, dental or postalveolar consonants. This can't occur at the level of phonological specification as there is no change of phoneme involved so this suggests a habitual choice made at the higher levels of articulatory planning. It seems best to treat these two types of assimilation (those that change phoneme and those that don't) as being equivalent and therefore not specified at a phonological level but at an articulatory planning level. The reason that this is possible in English without confusion is that there are very few contexts where it would create ambiguity. Such processes are prohibited in Australian Aboriginal languages which have up to six places of stop articulation. This process tends to be language-specific.
Clark, J., Yallop, C. & Fletcher, J. (2007), An introduction to phonetics and phonology, 3rd. edition, Blackwell, Oxford (pages 84-90 and Section 7.17)
Note: Unfortunately, this reference (and many other references on coarticulation) is expressed in terms of speech acoustics. If you skip over the references to acoustic properties you should still be able to make sense of much of what is written here.
Referred to in the notes, but not required reading:-
Best, C. (1995), "A direct realist view of cross-language speech perception", in Strange, W. (ed.) Speech Perception and Linguistic Experience: Issues in cross-language research, Maryland: York Press.
Boersma, P. (1998), Functional Phonology: Formalizing the interactions between articulatory and perceptual drives, PhD dissertation, University of Amsterdam.
Browman, C., and Goldstein, L. (1992), "Articulatory Phonology: An overview", Phonetica, 49, 155-180.
Fowler, C. (1986), "An event approach to the study of speech perception from a direct-realist perspective", Journal of Phonetics, 14:3-28.
Liberman, A.M., Mattingly, I.G., & Turvey, M.T. (1967), "Language codes and memory codes", in Coding processes in Human Memory, eds. Melton, A.W., & Martin, E., Washington: V.H. Winston.
Liberman, A.M. and Mattingly, I.G. (1985), "The motor theory of speech perception revised", Cognition, 21:1-36.