
Department of Linguistics

SPEECH ACOUSTICS

Consonant Acoustics

The Acoustic Characteristics of Stops

Felicity Cox

The phonetic symbols used on these pages are known as the ANDOSL machine-readable symbols.

Introduction

The classes of speech sounds that we have discussed to date have all been periodic resonant sounds. The vowels, diphthongs, approximants and nasals are all produced in this way.

In contrast, the stops, fricatives and affricates make use of the vocal tract as the sound source.

In the case of voiced stops, fricatives and affricates there are two sound sources: the periodic laryngeal source combined with the aperiodic vocal tract source.

Aperiodic sound is produced by two different types of disturbance:

  1. Sudden release of air pressure built up behind a closure, e.g. stops
  2. Turbulence as air rushes through a narrow constriction, e.g. fricatives

Manner of Articulation

From an articulatory point of view, stops are produced with a closure within the oral cavity, a build up of pressure behind this closure and a release of the closure allowing the air to be rapidly expelled.

Acoustically these events can be divided into five components:

  1. Occlusion
  2. Transient
  3. Frication
  4. Aspiration
  5. Transition

The occlusion is the period during which the airflow is stopped and the pressure behind the closure increases. It is characterised by silence, i.e. the absence of energy. Voiced stops may have low frequency (0-500 Hz) periodic energy during this phase.

The transient corresponds to the release of the closure. It is characterised by a spike of intense energy on the spectrogram with a duration of about 10 msec.

The frication component results from the high intra-oral pressure being released through the narrow opening at the point of release.

The aspiration phase is the result of the vocal tract opening even further with turbulence through the glottis rather than the oral constriction. Formants are often present during this phase.

The transition is the component where formants are present and the oral tract is moving to the position for the following vowel target.

In practice it is difficult to differentiate the transient from the frication so this complex is generally referred to as the burst.
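As an illustration of how the five components relate to the burst, the following minimal sketch represents a labelled stop as a list of intervals and collapses the transient and frication into a single burst interval. The `Segment` tuple, the `merge_burst` function and all of the times are hypothetical and are not taken from any labelling tool.

```python
from typing import List, Tuple

# (label, start_ms, end_ms); the times used below are hypothetical.
Segment = Tuple[str, float, float]


def merge_burst(segments: List[Segment]) -> List[Segment]:
    """Collapse adjacent transient and frication intervals into one 'burst'."""
    merged: List[Segment] = []
    for label, start, end in segments:
        if label in ("transient", "frication"):
            if merged and merged[-1][0] == "burst":
                # Extend the burst interval that is already open.
                merged[-1] = ("burst", merged[-1][1], end)
            else:
                merged.append(("burst", start, end))
        else:
            merged.append((label, start, end))
    return merged


# A hypothetical voiceless aspirated stop followed by a vowel.
stop = [
    ("occlusion", 0.0, 80.0),
    ("transient", 80.0, 90.0),
    ("frication", 90.0, 105.0),
    ("aspiration", 105.0, 140.0),
    ("transition", 140.0, 180.0),
]

for segment in merge_burst(stop):
    print(segment)
```

With these made-up times the transient (80-90 msec) and frication (90-105 msec) are reported as one burst from 80 to 105 msec, and the other three components are unchanged.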

Acoustic Cues to the Voiced/Voiceless Distinction

1. VOT

Voiced and voiceless stops differ in the co-ordination between supralaryngeal and laryngeal events. This difference is measured as Voice Onset Time (VOT).

Voice onset time is the time at which voicing begins relative to consonant release.

In English, voiceless stops have large VOT values and voiced stops have small or negative VOT values. Negative VOT occurs when the periodicity begins before stop release i.e. during closure.

English speakers will hear a consonant as voiceless if the VOT is over 25 msec for bilabials, over 35 msec for alveolars and over 40 msec for velars.
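The definition of VOT and the perceptual boundaries above can be sketched in a few lines of Python. This is only an illustration: the function names and the example times are hypothetical, and treating each boundary as a hard cut-off is a simplifying assumption.

```python
# Perceptual VOT boundaries (msec) for English listeners, as quoted in the text.
VOT_BOUNDARY_MS = {"bilabial": 25.0, "alveolar": 35.0, "velar": 40.0}


def voice_onset_time_ms(release_ms: float, voicing_onset_ms: float) -> float:
    """VOT is voicing onset minus release; negative if voicing starts during closure."""
    return voicing_onset_ms - release_ms


def perceived_voicing(place: str, vot_ms: float) -> str:
    """Label a stop 'voiceless' if its VOT exceeds the boundary for its place."""
    return "voiceless" if vot_ms > VOT_BOUNDARY_MS[place] else "voiced"


# Hypothetical token: release at 100 msec, voicing onset at 130 msec.
vot = voice_onset_time_ms(100.0, 130.0)
for place in ("bilabial", "alveolar", "velar"):
    print(place, vot, perceived_voicing(place, vot))
```

A VOT of 30 msec falls above the bilabial boundary but below the alveolar and velar boundaries, so the same VOT value is labelled voiceless for a bilabial and voiced for the other two places.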

VOT values separating voiced from voiceless stops are language specific.

Spanish and French make use of prevoiced stops (negative VOT) and contrast these with positive VOT stops. English does not recognise a difference between prevoiced and voiceless unaspirated stops.

Thai speakers make a three way distinction for bilabials and alveolars: voiced, voiceless unaspirated and voiceless aspirated.

Values also change in context.

VOT separation decreases for stops produced in sentences compared with initial stops produced in isolated words.

Voiceless stops in stressed syllables are produced with greater VOT values than those in unstressed syllables.

VOT increases when stops occur in stop-approximant sequences.

VOT for unaspirated stops (/sC/ clusters) is close to VOT for voiced stops in CV syllables.

2. F1

The first formant provides important acoustic information about the voicing characteristics of the stop.

F1 is very low during complete closure.

For voiced stops F1 rises very quickly from the burst to the vowel target formant position. The rise is steepest in open vowels where F1 is high, and flattest in close vowels (low F1).

For voiceless stops, periodicity begins at least 30 msec later than for voiced stops, so less of the formant transition will be pulse excited. By the time pulse excitation begins, the formant has almost reached the vowel target.

On spectrograms, voiced stops are characterised by a voiced, rising F1 transition which is not present in voiceless stops due to the fact that

  1. pulse excitation begins later in the transition for voiceless stops
  2. aspiration requires the glottis to be open which (due to the large resonating sub laryngeal chamber) causes an attenuation of F1.
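To make the first of these points concrete, the sketch below estimates how much of the F1 rise is still to come when voicing begins. It assumes, purely for illustration, that F1 rises linearly from its value at release to the vowel target. The function names and all of the frequencies and durations are hypothetical.

```python
def f1_at_voicing_onset(f1_release_hz: float, f1_target_hz: float,
                        transition_ms: float, vot_ms: float) -> float:
    """F1 when voicing starts, assuming a linear rise from release to target."""
    # Fraction of the transition already completed when pulse excitation begins.
    progress = min(max(vot_ms / transition_ms, 0.0), 1.0)
    return f1_release_hz + (f1_target_hz - f1_release_hz) * progress


def voiced_f1_rise_hz(f1_release_hz: float, f1_target_hz: float,
                      transition_ms: float, vot_ms: float) -> float:
    """Size of the F1 rise (Hz) that remains once pulse excitation has begun."""
    onset_hz = f1_at_voicing_onset(f1_release_hz, f1_target_hz,
                                   transition_ms, vot_ms)
    return f1_target_hz - onset_hz


# Hypothetical open-vowel token: F1 rises from 200 Hz to 700 Hz over 50 msec.
for label, vot_ms in (("voiced", 5.0), ("voiceless", 45.0)):
    rise = voiced_f1_rise_hz(200.0, 700.0, 50.0, vot_ms)
    print(f"{label}: {rise:.0f} Hz of the F1 rise is pulse excited")
```

With these made-up values the voiced stop shows about 450 Hz of voiced F1 rise, while the voiceless stop shows only about 50 Hz because the formant has nearly reached its target before voicing starts.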

For VC syllables F1 should fall sharply into the closure for voiced stops. The offset frequency should therefore be lower for voiced than voiceless stops.

3. Preceding Vowel Duration

Duration of vowels before voiceless stops is shorter than before voiced stops.

Vowel duration is 52-69% shorter before voiceless than before voiced stops.
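The percentage here is a relative difference, which can be worked through as follows. The function name and the two durations are hypothetical and are chosen only to show the arithmetic.

```python
def percent_shorter(before_voiced_ms: float, before_voiceless_ms: float) -> float:
    """Vowel shortening before a voiceless stop, as a percentage of the
    duration of the same vowel before a voiced stop."""
    return 100.0 * (before_voiced_ms - before_voiceless_ms) / before_voiced_ms


# Hypothetical pair: 250 msec before the voiced stop, 100 msec before the voiceless stop.
print(percent_shorter(250.0, 100.0))
```

For this made-up pair the vowel before the voiceless stop is 60% shorter.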

4. Other Cues

Voiced stops have voicing/periodicity during closure when in intervocalic or postvocalic position.

The duration of the intervocalic closure provides an additional cue to voicing.

Closure duration is greater for voiceless than voiced stops, e.g. rapid vs rabid.

The onset frequency of F0 is higher following voiceless than voiced stops.

The intensity of the burst of voiceless stops is greater than that of voiced.

Characteristics of English Stops in Context

Aspiration

  1. When /p,t,k/ are followed by /r,l,w,j/ the aspiration manifests itself in the devoicing of the approximants, e.g. "please", "try", "clean", "pew".
  2. In final position and in unstressed syllables aspiration is weak.
  3. When /s/ precedes /p,t,k/ initially, there is no aspiration.

Closure

  1. /b,d,g/ are only fully voiced during closure when they occur intervocalically.

Release

Generally, stops have a release stage in the form of aspiration or as a following vowel. However, there are instances where the release does not occur.

  1. No audible release in final position: e.g. rope/robe
  2. No audible release in stop clusters: e.g. dropped, locked, good boy
  3. Glottal reinforcement of final voiceless stops:
  4. Nasal release: If a stop is followed by a homorganic nasal in the following syllable, the release of air is usually via the nasal cavity. e.g. topmost, submerge, cotton, not now, red nose.
  5. Lateral release: When the homorganic stops /t,d/ occur before /l/ they are released laterally. The tip remains in contact with the alveolar ridge but one or both of the sides is lowered allowing the air to escape. e.g. cattle, medal, atlas.

Place of Articulation

Place of articulation for stops is determined by the characteristics of

  1. the burst
  2. the transitions

The Burst

The burst is the combination of the transient and frication phases. It provides important information about place of articulation. The frequency spectrum for the alveolars and velars results from resonance of the cavity in front of the tongue constriction.

For the alveolars, the front cavity is small and place of articulation doesn't alter greatly under the influence of different vowels.

For velars, the front cavity shape varies greatly with different vowels.

There are three important parameters of the burst that allow us to differentiate the place of articulation of stops:

  1. Energy level
  2. Spectral centre of gravity (frequency location of the main energy concentration)
  3. Spectral variance (whether the spectrum lacks peaks or has multiple peaks)
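One possible way to quantify these three parameters is sketched below in plain Python. It assumes a burst power spectrum is already available as paired lists of bin frequencies and power values. Taking spectral variance to be the power-weighted second moment about the centre of gravity is an assumption made for illustration, as are the function names, the arbitrary dB reference and the toy spectra.

```python
import math
from typing import Sequence, Tuple


def burst_energy_db(power: Sequence[float]) -> float:
    """Overall burst energy in dB relative to an arbitrary reference of 1."""
    return 10.0 * math.log10(sum(power))


def spectral_moments(freqs_hz: Sequence[float],
                     power: Sequence[float]) -> Tuple[float, float]:
    """Return (centre of gravity in Hz, variance in Hz^2) of a power spectrum."""
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    # Power-weighted mean frequency.
    centre = sum(f * p for f, p in zip(freqs_hz, power)) / total
    # Power-weighted spread of energy around that mean.
    variance = sum(p * (f - centre) ** 2 for f, p in zip(freqs_hz, power)) / total
    return centre, variance


# Toy spectra on five frequency bins (hypothetical values).
freqs = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
compact = [0.0, 0.0, 1.0, 0.0, 0.0]   # a single mid-frequency peak
diffuse = [1.0, 1.0, 1.0, 1.0, 1.0]   # energy spread evenly

print(spectral_moments(freqs, compact))  # (3000.0, 0.0)
print(spectral_moments(freqs, diffuse))  # (3000.0, 2000000.0)
print(burst_energy_db(compact))          # 0.0
```

The two toy spectra share the same centre of gravity but differ in variance, which is why the centre of gravity alone does not separate a compact spectrum from a diffuse one.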

1. Energy Level

Alveolar stops have the most intense bursts and bilabials have the weakest bursts, because bilabials have no front cavity to amplify the sound and therefore lack resonance. There is little difference in burst energy between alveolars and velars.

2. Energy Distribution: (Centre of gravity and spectral variance)

Bilabials lack any main resonance in the 0-10kHz range as there is no front cavity so they are characterised by a gradually falling distribution of energy throughout the frequency range.

Alveolars - broad distribution of energy in the burst, characterised by a prominence at about 1.8 kHz and another rise between 2.5 and 4.5 kHz.

Velar - compact concentration of energy in the middle of the spectrum which varies according to F2 and F3 of the following vowel.

The frequency position of the energy for velars derives from the cavity in front of the tongue constriction.

Prevelar (before front vowels, e.g. /kip/, /gis/): compact energy is distributed around a centre frequency of about 3 kHz.

Postvelar (before back vowels, e.g. /ko:t/, /go:d/): compact energy is distributed around a centre frequency of about 1 kHz.

high frequency bursts = alveolar (3 kHz to 4 kHz)

low frequency bursts = bilabial (350 Hz, but higher for front vowels)

bursts with energy slightly above the F2 of the following vowel = velar, e.g. back vowels with low F2: 700 Hz; front vowels with high F2: 3 kHz

Formant Locus and Transitions

The locus theory proposes that the place of articulatory closure for each of the three places of articulation is relatively fixed regardless of following vowel and that this articulatory invariance has its acoustic correlate in the starting frequency of the second formant. Even though the formants may not reach the actual locus position they will still point to it.

Once we know the locus frequency we should be able to predict the slope of the second formant transition if we know the following vowel formant frequencies.

Therefore, since the locus for /b/ is low (720 Hz) and most vowels have an F2 value greater than that, the transition will be rising in /bV/ syllables.

The locus for /d/ being at 1800Hz means that for central and back vowels F2 will fall in /dV/ syllables but will be level or slightly rising in /di,dI,de/.
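The prediction described here can be sketched as a comparison between the locus and the F2 of the following vowel. The locus values are those quoted in the text. The function name, the 200 Hz tolerance used to call a transition "level" and the example vowel F2 values are hypothetical. The sketch also treats the locus as fixed, which is an idealisation.

```python
# F2 locus values (Hz) quoted in the text.
F2_LOCUS_HZ = {"b": 720.0, "d": 1800.0}

# Hypothetical tolerance within which a transition is treated as level.
LEVEL_TOLERANCE_HZ = 200.0


def f2_transition(stop: str, vowel_f2_hz: float,
                  tolerance_hz: float = LEVEL_TOLERANCE_HZ) -> str:
    """Predict the direction of the F2 transition from the locus to the vowel."""
    difference = vowel_f2_hz - F2_LOCUS_HZ[stop]
    if abs(difference) <= tolerance_hz:
        return "level"
    return "rising" if difference > 0 else "falling"


# Hypothetical vowel F2 targets.
print(f2_transition("b", 1500.0))  # rising
print(f2_transition("d", 1000.0))  # falling
print(f2_transition("d", 1900.0))  # level
print(f2_transition("d", 2300.0))  # rising
```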

Only the alveolars can be considered to have a relatively stable locus at around 1800 Hz. Cassidy and Harrington (1994) found that the variability in F2 onset frequency is least for /d/ followed by /b/ then /g/.

It seems clear that for bilabials and velars there is not an invariant locus value as modifying the following vowel will produce large changes in the formant frequency values.

For instance, for bilabials F2 and F3 will have rising transitions before front vowels but F2 will be falling before back vowels.

When F3 information is included we get a better picture of how the stops cluster. F2/F3 plots tend to show three clusters corresponding to bilabial, alveolar and velar. However, there are examples of bilabials which are potentially confusable with velars preceding back vowels. If we examine the change in F2 relative to the change in F3 (the difference between the formant value at onset and the value at the vowel target) then these bilabials are well separated.
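The change measure mentioned here can be computed as in the minimal sketch below. Taking the difference as onset minus target is an assumed sign convention, and the function names and formant values are hypothetical.

```python
from typing import Tuple


def formant_change_hz(onset_hz: float, target_hz: float) -> float:
    """Onset minus vowel target; negative means the formant rises into the vowel."""
    return onset_hz - target_hz


def f2_f3_change(f2_onset_hz: float, f2_target_hz: float,
                 f3_onset_hz: float, f3_target_hz: float) -> Tuple[float, float]:
    """Return the (F2 change, F3 change) pair for one stop-vowel token."""
    return (formant_change_hz(f2_onset_hz, f2_target_hz),
            formant_change_hz(f3_onset_hz, f3_target_hz))


# Hypothetical token: F2 moves from 900 Hz to 1100 Hz, F3 from 2300 Hz to 2500 Hz.
print(f2_f3_change(900.0, 1100.0, 2300.0, 2500.0))  # (-200.0, -200.0)
```

Plotting these pairs for each token, rather than the raw onset values, gives the change-based view of the data described above.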

It seems clear that we cannot separate place of articulation on just one dimension such as F2 locus. Several variables are required to give the whole picture.

It is also clear that locus is not invariant as it will change substantially as a result of coarticulation.