Department of Linguistics

Robert Mannell

Resolving Vowel Formants in Australian English

Please Note: For an explanation of the phonetic symbols used to represent the Australian English vowel phonemes, go to the Australian English Vowel Symbols page

Table 1 displays the mean formant centre frequencies of the 11 Australian English monophthongs as produced by more than 200 adult male speakers (Bernard, 1970; Bernard & Mannell, 1986). The Hertz values are taken from Bernard & Mannell (1986) and represent the mean formant values in citation and sentence context for all speakers. The Bark values have been calculated using the following formula (Zwicker & Fastl, 1990):

$$z = 13\arctan(0.76F) + 3.5\arctan\left[(F/7.5)^{2}\right]$$

ERB-rate values have been calculated using the following formula (Moore & Glasberg, 1986):

$$E = 11.17\,\ln\left(\frac{F + 0.312}{F + 14.675}\right) + 43.0$$

In both cases, F is frequency in kHz.
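The Bark and ERB-rate columns of Table 1 can be checked with a short script. The sketch below assumes that the two formulas are the common arctangent Bark approximation and the logarithmic ERB-rate expression, both taking frequency in kHz. The constants and the function names `hz_to_bark` and `hz_to_erb_rate` are assumptions, verified here only against the tabulated values.

```python
import math


def hz_to_bark(f_hz):
    """Bark value for a frequency in Hz (arctangent approximation, assumed)."""
    f_khz = f_hz / 1000.0
    return 13.0 * math.atan(0.76 * f_khz) + 3.5 * math.atan((f_khz / 7.5) ** 2)


def hz_to_erb_rate(f_hz):
    """ERB-rate value for a frequency in Hz (logarithmic expression, assumed)."""
    f_khz = f_hz / 1000.0
    return 11.17 * math.log((f_khz + 0.312) / (f_khz + 14.675)) + 43.0


if __name__ == "__main__":
    # F1 and F2 of the first row of Table 1 (300 Hz and 2280 Hz).
    for f_hz in (300, 2280):
        print(f_hz, round(hz_to_bark(f_hz), 1), round(hz_to_erb_rate(f_hz), 1))
```

For 300 Hz these expressions give about 2.9 Bark and 7.3 ERB, and for 2280 Hz about 13.9 Bark and 22.0 ERB, which agrees with the first row of Table 1.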

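| Vowel | F1 (Hz) | F1 (Bark) | F1 (ERB) | F2 (Hz) | F2 (Bark) | F2 (ERB) | F3 (Hz) | F3 (Bark) | F3 (ERB) |
|---|---|---|---|---|---|---|---|---|---|
| iː | 300 | 2.9 | 7.3 | 2280 | 13.9 | 22.0 | 2800 | 15.2 | 23.7 |
| ɪ | 370 | 3.6 | 8.4 | 2210 | 13.7 | 21.8 | 2740 | 15.0 | 23.6 |
| e | 460 | 4.4 | 9.8 | 2040 | 13.2 | 21.1 | 2650 | 14.8 | 23.3 |
| æ | 640 | 5.9 | 12.0 | 1870 | 12.7 | 20.4 | 2600 | 14.7 | 23.1 |
| ɐː | 740 | 6.7 | 13.0 | 1350 | 10.5 | 17.7 | 2490 | 14.5 | 22.8 |
| ɐ | 750 | 6.8 | 13.1 | 1400 | 10.7 | 18.0 | 2520 | 14.5 | 22.8 |
| ɔ | 630 | 5.8 | 11.9 | 1060 | 8.9 | 15.8 | 2420 | 14.3 | 22.5 |
| oː | 440 | 4.2 | 9.5 | 840 | 7.4 | 14.0 | 2390 | 14.2 | 22.4 |
| ʊ | 400 | 3.8 | 8.9 | 910 | 7.9 | 14.6 | 2360 | 14.1 | 22.3 |
| ʉː | 350 | 3.4 | 8.1 | 1600 | 11.6 | 19.1 | 2350 | 14.1 | 22.3 |
| ɜː | 480 | 4.6 | 10.0 | 1510 | 11.2 | 18.6 | 2520 | 14.5 | 22.8 |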

Table 1 : Mean formant centre frequencies in Hz, Bark, and ERB-rate for the monophthong productions of > 200 male speakers of Australian English. The Hz values are from Bernard & Mannell (1986) whilst the Bark and ERB-rate values have been calculated by formula (see text).

It is evident from Table 1 that several pairs of vowels are separated from each other by less than 1 Bark or 1 ERB in either F1 or F2. One pair of short vowels, /ɪ/ and /e/, is separated by less than 1 Bark in both F1 and F2. Further, there is very little separation between any of the vowels in terms of their F3 (range: 14.1-15.2 Bark or 22.3-23.7 ERB). Clearly, if formant discrimination were of the same order as auditory frequency selectivity, then differences in F3 would be too small to be relevant and /ɪ/ and /e/ could not be discriminated.

As discussed in Mannell (1994, section 2.2.1.1), frequency discrimination is considerably finer than frequency selectivity. This holds even for formant frequency discrimination, which is about 4 times coarser than pure tone discrimination. Flanagan (1972) found formant frequency just noticeable differences (jnds) of 3-5% for F1 and F2. This would predict F1 jnds of about 0.1-0.2 Bark and F2 jnds of about 0.15-0.3 Bark. If F3 jnds are assumed to be of the same order (3-5%), then that would predict F3 jnds of about 0.2-0.35 Bark. These levels of discrimination would even allow some pairs of vowels to be discriminated on the basis of F3, especially /ʉː/ and /ɜː/, where F3 differences may be phonetically important in Australian English.

Note also that /ɐː/ and /ɐ/ are probably not sufficiently separated in F1, F2 or F3 to be discriminated on the basis of formant values, and so their duration distinction will be all that separates them perceptually. /iː/ and /ɪ/ are separated by more than 1 jnd for F1, but they will also be discriminated because of the /iː/ onglide and the duration distinction. /oː/ and /ɔ/ are separated by more than 1 jnd in both F1 and F2 and so will probably be discriminated on the grounds of both their formant values and their duration. /oː/ and /ʊ/, whilst being separated by less than 1 Bark in both F1 and F2, are nevertheless separated by more than 1 jnd in both F1 and F2 and so will probably also be discriminated on the grounds of both their formant values and their duration.
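The comparison between a percentage jnd and a formant difference can be sketched in code. The example below is a minimal illustration, assuming the arctangent Bark approximation and treating the jnd as a fixed fraction of the lower formant frequency. The function names `jnd_in_bark` and `exceeds_jnd` are hypothetical and are not taken from the cited sources.

```python
import math


def hz_to_bark(f_hz):
    """Bark value for a frequency in Hz (arctangent approximation, assumed)."""
    f_khz = f_hz / 1000.0
    return 13.0 * math.atan(0.76 * f_khz) + 3.5 * math.atan((f_khz / 7.5) ** 2)


def jnd_in_bark(f_hz, jnd_fraction):
    """Size in Bark of a jnd given as a fraction of the frequency f_hz."""
    return hz_to_bark(f_hz * (1.0 + jnd_fraction)) - hz_to_bark(f_hz)


def exceeds_jnd(f_a_hz, f_b_hz, jnd_fraction):
    """True if two formant frequencies differ by more than one jnd."""
    low, high = sorted((f_a_hz, f_b_hz))
    return hz_to_bark(high) - hz_to_bark(low) > jnd_in_bark(low, jnd_fraction)


if __name__ == "__main__":
    # F3 of /ʉː/ (2350 Hz) against F3 of /ɜː/ (2520 Hz), with a 5% jnd.
    print(exceeds_jnd(2350, 2520, 0.05))
    # F1 of /ɐː/ (740 Hz) against F1 of /ɐ/ (750 Hz), with a 3% jnd.
    print(exceeds_jnd(740, 750, 0.03))
```

Because the Bark mapping is monotonic, the Bark comparison is equivalent to comparing the Hertz difference with the chosen percentage of the lower frequency. The Bark form is used only so that the result can be read against Table 1.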

When the frequency resolution of a speech transmission system (e.g. a vocoder or the auditory system) is degraded (by increasing bandwidth), a vowel may be affected in one or both of the following two ways:

i) Closely spaced F1/F2 or F2/F3 peaks will gradually merge together into a single peak (i.e. they will no longer be resolved into two peaks).

ii) The formant bandwidths will increase, making the formant centre frequency harder for the ear to determine.
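The merging effect in (i) can be illustrated with a toy model. The sketch below is not the vocoder or the auditory model discussed in this text. It assumes two equal-height Gaussian peaks on a Bark axis, with a single width parameter `sigma_bark` standing in for the combined formant and filter bandwidth. The function name `count_peaks` is hypothetical.

```python
import math


def count_peaks(centres_bark, sigma_bark, step=0.01, lo=0.0, hi=24.0):
    """Count local maxima of a sum of equal-height Gaussian peaks."""
    n = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(n)]
    spectrum = [
        sum(math.exp(-0.5 * ((z - c) / sigma_bark) ** 2) for c in centres_bark)
        for z in grid
    ]
    # A grid point is a peak if it is strictly higher than both neighbours.
    return sum(
        1
        for i in range(1, n - 1)
        if spectrum[i - 1] < spectrum[i] > spectrum[i + 1]
    )


if __name__ == "__main__":
    # F2 and F3 of /iː/ in Bark (13.9 and 15.2 from Table 1).
    for sigma in (0.3, 1.0):
        print(sigma, count_peaks([13.9, 15.2], sigma))
```

In this model two equal peaks stop being resolved once their separation falls to about twice the width parameter. The 1.3 Bark gap between F2 and F3 of /iː/ is therefore resolved with a width of 0.3 Bark but merged with a width of 1.0 Bark.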

For formant merging to occur, the separation of the two formants concerned should be of the same order as (or less than) the bandwidth of the filter system. Table 2 shows the F1/F2 and the F2/F3 monophthong formant differences for average male speakers of Australian English. The Hertz values are derived from the Bernard & Mannell (1986) values for all speakers (broad, general and cultivated, pooled together) and the Bark values are derived from Table 1. Firstly, no formant pairs have their centres separated by less than 1 Bark. It must be remembered, however, that the formants have characteristic bandwidths that will decrease their separation below the values derived from their centre frequencies. This may mean that centre frequencies will need to be separated by about 1.5 Bark before they can be resolved. It is likely, therefore, that only /iː/, /ɪ/, and /e/ will have unresolved F2 and F3 for a transmission system with a 1 Bark bandwidth.

If the large-scale (3-3.5 Bark) spectral integration hypothesis of Chistovich and colleagues (e.g. Chistovich & Lublinskaya, 1979) is correct, then one would expect F1/F2 merging for /ɔ/ and /oː/ and F2/F3 merging for /iː/, /ɪ/, /e/, /æ/, /ʉː/ and /ɜː/. Because, according to this hypothesis, their F1 and F2 have already been integrated together, one would expect that /ɔ/ and /oː/ would be resistant to fronting or height confusions caused by increasing the bandwidth of a transmission system such as a vocoder. For these two vowels, a very broad composite F1/F2 peak would result from integrating the two formants together. Uncertainty as to formant centre frequency, caused by broad transmission channel bandwidths, would be a secondary concern, for the absence of an effective F2 peak would be a much stronger cue to their identities. If large-scale 3-3.5 Bark spectral integration is the sole representation involved in vowel phonetic processing, then a transmission system with filter bandwidths up to 3 Bark should not affect vowel intelligibility. On the other hand, if 1 Bark integration is the basis of vowel phonetic processing, then /ɔ/ and /oː/ would be expected to be the pair of vowels most affected by a transmission system with a 3 Bark bandwidth, as they would be the first vowels to lose their F1/F2 separation.
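The merging predictions above amount to comparing each formant gap with an assumed integration bandwidth. The sketch below uses the Bark distances listed in Table 2. The simple "gap smaller than bandwidth" rule and the function name `predicted_mergers` are illustrative assumptions, not a model taken from the cited work.

```python
# Bark distances (F2-F1, F3-F2) for each monophthong, as listed in Table 2.
BARK_GAPS = {
    "iː": (11.0, 1.3),
    "ɪ": (10.1, 1.3),
    "e": (8.8, 1.6),
    "æ": (6.8, 2.0),
    "ɐː": (3.8, 4.0),
    "ɐ": (3.9, 3.8),
    "ɔ": (3.1, 5.4),
    "oː": (3.2, 6.8),
    "ʊ": (4.1, 6.2),
    "ʉː": (8.2, 2.5),
    "ɜː": (6.6, 3.3),
}


def predicted_mergers(integration_bark):
    """Vowels whose F1/F2 or F2/F3 gap is below the integration bandwidth."""
    f1_f2 = [v for v, (gap, _) in BARK_GAPS.items() if gap < integration_bark]
    f2_f3 = [v for v, (_, gap) in BARK_GAPS.items() if gap < integration_bark]
    return f1_f2, f2_f3


if __name__ == "__main__":
    for bandwidth in (1.5, 3.5):
        print(bandwidth, predicted_mergers(bandwidth))
```

With a 3.5 Bark bandwidth this rule returns /ɔ/ and /oː/ for F1/F2 and /iː/, /ɪ/, /e/, /æ/, /ʉː/ and /ɜː/ for F2/F3, matching the lists in the text. With a strict 1.5 Bark cut-off only /iː/ and /ɪ/ are returned, since the /e/ gap of 1.6 Bark lies just above it.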

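| Vowel (Monophthong) | F2-F1 (Hertz) | F2-F1 (Bark) | F3-F2 (Hertz) | F3-F2 (Bark) |
|---|---|---|---|---|
| iː | 1980 | 11.0 | 520 | 1.3 |
| ɪ | 1840 | 10.1 | 530 | 1.3 |
| e | 1580 | 8.8 | 610 | 1.6 |
| æ | 1230 | 6.8 | 730 | 2.0 |
| ɐː | 610 | 3.8 | 1140 | 4.0 |
| ɐ | 650 | 3.9 | 1120 | 3.8 |
| ɔ | 430 | 3.1 | 1360 | 5.4 |
| oː | 400 | 3.2 | 1550 | 6.8 |
| ʊ | 510 | 4.1 | 1450 | 6.2 |
| ʉː | 1250 | 8.2 | 750 | 2.5 |
| ɜː | 1030 | 6.6 | 1010 | 3.3 |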

Table 2 : Distances between F1/F2 and F2/F3 for the 11 monophthongs of Australian English (values derived from Bernard & Mannell, 1986)
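The distances in Table 2 follow directly from Table 1 by subtraction. The sketch below recomputes them from the Hz and the rounded Bark values of Table 1. The data layout and the function name `formant_gaps` are choices made for this illustration only.

```python
# (Hz, Bark) for F1, F2 and F3 of each monophthong, copied from Table 1.
TABLE_1 = {
    "iː": ((300, 2.9), (2280, 13.9), (2800, 15.2)),
    "ɪ": ((370, 3.6), (2210, 13.7), (2740, 15.0)),
    "e": ((460, 4.4), (2040, 13.2), (2650, 14.8)),
    "æ": ((640, 5.9), (1870, 12.7), (2600, 14.7)),
    "ɐː": ((740, 6.7), (1350, 10.5), (2490, 14.5)),
    "ɐ": ((750, 6.8), (1400, 10.7), (2520, 14.5)),
    "ɔ": ((630, 5.8), (1060, 8.9), (2420, 14.3)),
    "oː": ((440, 4.2), (840, 7.4), (2390, 14.2)),
    "ʊ": ((400, 3.8), (910, 7.9), (2360, 14.1)),
    "ʉː": ((350, 3.4), (1600, 11.6), (2350, 14.1)),
    "ɜː": ((480, 4.6), (1510, 11.2), (2520, 14.5)),
}


def formant_gaps(vowel):
    """Return (F2-F1 Hz, F2-F1 Bark, F3-F2 Hz, F3-F2 Bark) for one vowel."""
    (f1_hz, f1_bark), (f2_hz, f2_bark), (f3_hz, f3_bark) = TABLE_1[vowel]
    return (
        f2_hz - f1_hz,
        round(f2_bark - f1_bark, 1),  # rounding hides float noise
        f3_hz - f2_hz,
        round(f3_bark - f2_bark, 1),
    )


if __name__ == "__main__":
    for vowel in TABLE_1:
        print(vowel, *formant_gaps(vowel))
```

Working from the rounded Bark values, rather than recomputing Bark from Hz, reproduces every row of Table 2.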

It is clear from Table 2 that poor frequency resolution on a uniform Hertz scale will affect /oː/ or /ɔ/ F1/F2 separation before it will affect /iː/ or /ɪ/ F2/F3 separation, but when resolution is degraded uniformly on the Bark scale then /iː/, /ɪ/, /e/, /æ/, /ʉː/ and possibly /ɜː/ F2/F3 separation will be more vulnerable than F1/F2 separation for any vowel. In a normal auditory system, the F2 and F3 of /iː/ and /ɪ/ are already close enough together not to be well resolved as two peaks. A doubling of auditory bandwidth (not unusual in the hearing impaired) would result in failure to resolve the F2 and F3 of about half of the monophthongs but would have little effect on F1/F2 separation.

The confusion between vowel pairs with similar values for F1 and F2 would affect the pair /ʉː, ɜː/ first, as the average difference between them on the F1/F2 plane is less than about 130 Hz. A filter system with a frequency resolution of 200 Hz could be expected to blur the distinction between several Australian English vowel pairs. A similar effect would occur with a filter system that has a resolution of 2 Bark. In general, it is expected that broadening the bandwidths of vocoded vowels will reduce the listener's accuracy in determining the centre frequency and would cause confusion with adjacent vowels on the F1/F2 plane.

The pairs /ɪ, iː/, /ɐ, ɐː/, and /ʊ, oː/ are each very close in F1, F2 and F3 (<100 Hz) and, if it were not for a [short] vs [long] temporal opposition, they would be confused even in unsynthesised speech. The pairs that are most likely to suffer from the frequency resolution of the vocoder system used in this study are those with differences between F1, F2 and F3 of less than 225 Hz that also share the same length feature. They are /ɪ, e/, /e, æ/ and /ʉː, ɜː/.
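The vowel pairs named above can be recovered from Table 1 by a simple search. The sketch below assumes that vowel length can be read off the length mark "ː" in the symbol and that "close" means all three formants differ by less than a given number of Hertz. The function name `close_pairs` is hypothetical.

```python
from itertools import combinations

# (F1, F2, F3) in Hz for each monophthong, copied from Table 1.
FORMANTS_HZ = {
    "iː": (300, 2280, 2800),
    "ɪ": (370, 2210, 2740),
    "e": (460, 2040, 2650),
    "æ": (640, 1870, 2600),
    "ɐː": (740, 1350, 2490),
    "ɐ": (750, 1400, 2520),
    "ɔ": (630, 1060, 2420),
    "oː": (440, 840, 2390),
    "ʊ": (400, 910, 2360),
    "ʉː": (350, 1600, 2350),
    "ɜː": (480, 1510, 2520),
}


def is_long(vowel):
    """Assumption: a vowel is long if its symbol carries the length mark."""
    return "ː" in vowel


def close_pairs(max_diff_hz, same_length):
    """Pairs whose F1, F2 and F3 all differ by less than max_diff_hz."""
    pairs = []
    for a, b in combinations(FORMANTS_HZ, 2):
        if (is_long(a) == is_long(b)) != same_length:
            continue
        diffs = (abs(x - y) for x, y in zip(FORMANTS_HZ[a], FORMANTS_HZ[b]))
        if all(d < max_diff_hz for d in diffs):
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    print(close_pairs(100, same_length=False))
    print(close_pairs(225, same_length=True))
```

With a 100 Hz limit and differing length, the search returns the three short/long pairs. With a 225 Hz limit and matching length, it returns /ɪ, e/, /e, æ/ and /ʉː, ɜː/.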

References

Bernard, J.R. (1970) "Toward the acoustic specification of Australian English", Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung, Band 23, Heft 2/3

Bernard, J.R. & Mannell, R.H. (1986) "A study of /h_d/ words in Australian English", Working Papers of the Speech, Hearing and Language Research Centre, Macquarie University

Chistovich, L.A. & Lublinskaya, V.V. (1979) "The 'center of gravity' effect in vowel spectra and critical distance between the formants: psychoacoustical study of the perception of vowel-like stimuli", Hearing Research 1, 185-195

Flanagan, J.L. (1972) Speech Analysis, Synthesis and Perception, Springer-Verlag, Berlin

Mannell, R.H. (1994) The Perceptual and Auditory Implications of Parametric Scaling in Synthetic Speech, Ph.D. dissertation, Macquarie University

Zwicker, E. & Fastl, H. (1990) Psychoacoustics: Facts and Models, Springer-Verlag: Berlin