Department of Linguistics
SPEECH ACOUSTICS
Vowels
This article is an extract (chapter 3) from: Cox, F. M. (1996) An Acoustic Study of Vowel Variation in Australian English, Ph.D. dissertation, Macquarie University.
3.1 Vowel Acoustics
In vowel production, the vocal tract is excited by a quasi-periodic series of air pulses that pass through the laryngeal valve during the open phases of the vocal fold vibratory cycle. The specific characteristics of each vowel sound are a direct consequence of the momentary size and shape of the vocal tract. This process can be described in terms of the source-filter theory of vowel production.
3.1.1 The source filter theory
The vocal tract in the production of a neutral vowel can be approximated by a tube of constant cross-sectional area which is closed at one end (the glottis) and open at the other (the lips) (Chiba and Kajiyama, 1941). A uniform tube of this type has predictable resonance properties: it supports standing waves formed by acoustic disturbances that propagate in both directions, owing to reflection at the closed end and to the impedance mismatch at the open end. Chiba and Kajiyama (1941) specify that, when the length of the tube is greater than its diameter, resonance occurs when there is minimum pressure (maximum volume velocity) at the open end and maximum pressure (minimum volume velocity) at the closed end. These boundary conditions are met when the tube length equals one quarter of the wavelength or an odd multiple of a quarter wavelength. The resonant frequencies are calculated using the formula:
$$R_n = \frac{(2n-1)\,c}{4L}$$
where R_n is the nth resonance (n = 1, 2, 3, ...), c is the speed of sound in air and L is the total length of the tube (Fant, 1960).
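For a typical male vocal tract of length 17.6 cm (Fant, 1960), and taking the speed of sound in air as 35200 cm/s, the first resonance can be calculated as:
$$R_1 = \frac{35200\ \text{cm/s}}{4 \times 17.6\ \text{cm}} = 500\ \text{Hz}$$
The successive resonances occur at odd multiples of the lowest resonance, that is, at 1500 Hz, 2500 Hz and so on.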
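The quarter-wave calculation, R_n = (2n - 1)c / 4L, can be sketched in a few lines of Python. This is a minimal illustration assuming an ideal, lossless uniform tube closed at the glottis and open at the lips; the function name `quarter_wave_resonances` and the 14.5 cm comparison length are illustrative assumptions, not values taken from the text.

```python
def quarter_wave_resonances(length_cm, c_cm_per_s=35200.0, n_resonances=3):
    """Resonances R_n = (2n - 1) c / (4L) of a uniform tube closed at one end."""
    return [(2 * n - 1) * c_cm_per_s / (4.0 * length_cm)
            for n in range(1, n_resonances + 1)]

# Typical male vocal tract length from the text.
male = quarter_wave_resonances(17.6)
print([round(r) for r in male])  # [500, 1500, 2500]

# A shorter tube (14.5 cm, a hypothetical value) resonates at higher frequencies.
shorter = quarter_wave_resonances(14.5)
print([round(r) for r in shorter])
```

Because every resonance is inversely proportional to L, shortening the tube raises all resonances by the same proportion, which is the simple length-scaling idea discussed in the next paragraph.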
Human vocal tracts vary in length depending on the size, age, and sex of the individual and vocal tract resonant frequencies vary as a consequence of these parameters. Longer vocal tracts resonate at lower frequencies than shorter vocal tracts; hence the resonant frequencies for females are higher than for males and those for children are higher than those for adults (Chiba and Kajiyama, 1941).
The natural resonances of the tube can be specified by a resonance curve detailing the frequencies at which the air vibrates maximally (the resonance centre frequencies) and the range of frequencies on either side of each centre frequency that lie within 3 dB of the peak amplitude (the resonance bandwidth). The resonance curve is also the spectrum of the impulse response of the vocal tract filter, and it shows which components of the excitation source (the glottal source) are amplified and which are attenuated. When the glottal source is passed through the vocal tract filter, the spectrum of the output sound has characteristics of both the filter and the source. The frequencies of the output components are determined by the source, but their relative amplitudes result from the combination of the source slope and the filter. The peaks present in the output spectrum are known as the formants, and their exact specification is a consequence of the spectral properties of both the source and the filter. As the filter and the source make relatively independent contributions to the output signal (but see Klatt and Klatt, 1990), a change to one has a minimal effect on the other. For instance, the vocal tract filter characteristics can be modified by changing the configuration of the vocal tract with very little effect on the laryngeal source. Conversely, the same vowel type can be produced by maintaining vocal tract shape while modifying the source characteristics.
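The resonance curve, its 3 dB bandwidth, and the way source and filter combine can be illustrated with the sketch below. It assumes that a single resonance can be modelled as a complex pole pair whose real part sets the bandwidth, a common textbook idealisation rather than the specific model used in the studies cited; the function names, the 100 Hz fundamental and the 1500 Hz resonance with a 50 Hz bandwidth are hypothetical choices. Levels in dB add, so the output level at each harmonic is the source level plus the filter gain.

```python
import math

def resonance_gain(f, fc, bw):
    """Gain of one resonance modelled as a complex pole pair.

    fc is the centre frequency and bw the bandwidth, both in Hz;
    the gain is normalised to 1 at 0 Hz.
    """
    s = complex(0.0, 2.0 * math.pi * f)
    pole = complex(-math.pi * bw, 2.0 * math.pi * fc)
    return abs(pole) ** 2 / abs((s - pole) * (s - pole.conjugate()))

def to_db(x):
    return 20.0 * math.log10(x)

# Resonance curve: the level is about 3 dB down half a bandwidth from the centre.
fc, bw = 1500.0, 50.0
peak = to_db(resonance_gain(fc, fc, bw))
edge = to_db(resonance_gain(fc + bw / 2.0, fc, bw))
print(f"drop at fc + bw/2: {peak - edge:.2f} dB")

# Source + filter: harmonic frequencies come from the source (f0 = 100 Hz);
# output levels are the source level plus the filter gain, in dB.
f0 = 100.0
for k in range(13, 18):
    f = k * f0
    source_db = -12.0 * math.log2(k)   # idealised -12 dB/octave glottal source
    output_db = source_db + to_db(resonance_gain(f, fc, bw))
    print(f"{f:6.0f} Hz  source {source_db:6.1f} dB  output {output_db:6.1f} dB")
```

The printed harmonic frequencies are unchanged by the filter; only their levels are reshaped, with the harmonic nearest the resonance boosted most.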
3.1.1.1 The glottal source
Vocal fold vibration for voicing is achieved by the combined effects of muscular tension, tissue elasticity and aerodynamic forces. The vocal folds are initially drawn together by the activity of the laryngeal adductor muscles. As the folds come together, the velocity of air passing through the glottis increases, which results in a pressure drop between the medial edges of the folds (the Bernoulli effect) that causes them to be sucked together. Pressure then builds up below the closed glottis until the folds are forced apart, and the cycle repeats (Van den Berg, 1958; 1968). One necessary condition for voicing is that subglottal pressure exceeds supraglottal pressure, that is, that there is a positive transglottal pressure difference (Ohala, 1983; Sawashima and Hirose, 1983).
The activity of the larynx during phonation causes the airstream flowing out of the lungs to be broken up into a rapid series of puffs by the opening and closing of the vocal folds. Each burst of compressed air escapes through the glottis at high speed and collides with the column of air inside the vocal tract, producing an acoustic shock wave which propagates to the outside.
The spectrum of the periodic glottal waveform is a line spectrum comprising harmonics at integer multiples of the fundamental frequency. According to theoretical calculations (Fant, 1960; Rosenberg, 1971), the glottal tone for normal phonation has a spectrum that falls off at about 12 dB per octave. Other phonation types, as described by Laver (1980), display different glottal tone characteristics.
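The idealised glottal line spectrum described above can be generated as follows. This sketch assumes a perfectly constant -12 dB per octave roll-off relative to the fundamental; real glottal spectra deviate from this, and the 120 Hz fundamental and the function name `glottal_line_spectrum` are illustrative only.

```python
import math

def glottal_line_spectrum(f0, n_harmonics, slope_db_per_octave=-12.0):
    """Idealised glottal source: harmonics at k * f0, with levels (dB re the
    fundamental) falling by slope_db_per_octave for each doubling of frequency."""
    return [(k * f0, slope_db_per_octave * math.log2(k))
            for k in range(1, n_harmonics + 1)]

for freq, level in glottal_line_spectrum(120.0, 8):
    print(f"{freq:6.0f} Hz  {level:6.1f} dB")
```

Each doubling of harmonic number (1, 2, 4, 8) is one octave, so those harmonics sit at 0, -12, -24 and -36 dB.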
3.1.1.2 The vocal tract filter
As we have seen, the vocal tract for the production of neutral vowels, such as [ɜ], can be likened to a tube of uniform cross-section along its length. Although this model works well for neutral vowels, it fails to account for the perturbations in cross-sectional area that are characteristic of most other vowel articulations. Researchers have attempted to account for more complex vocal tract shapes by reducing the vocal tract configuration to a series of interconnected cylindrical tubes with specified lengths and areas (Lindblom and Sundberg, 1971; Stevens and House, 1955). The resonance curve can then be derived from the transfer function of this equivalent representation as a series of lossless cylindrical tubes. To simplify the process of estimating the resonances, several assumptions are made about the shape of the tract and about the propagation of the air stream, including the following:
- The bend in the vocal tract between the pharynx and the oral cavity is ignored.
- Differences in cross sectional shape are considered less important than cross sectional area.
- Diameter changes within each component part are considered negligible, hence the assumption that the tract can be modelled as a series of cylinders.
- Wave propagation is assumed to be one-dimensional.
- The tubes are considered to be lossless.
Transfer functions can therefore be successfully estimated for vowels from the area function, with adjustments made to account for predictable effects such as lip radiation (Fant, 1960; Flanagan, 1972). Clark and Yallop (1990) state that the head functions as a "spherical baffle of about 9cm radius" which favours the propagation of high-frequency sound, so that radiation raises the spectrum of the output sound by approximately 6 dB per octave (Flanagan, 1972). As a result, the -12 dB per octave slope associated with the glottal tone is reduced to approximately -6 dB per octave in the radiated sound. All vowel sounds therefore reflect source, transfer and radiation effects.
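Because slopes expressed in dB per octave simply add, the effect of lip radiation on the glottal slope can be checked with a little arithmetic. The sketch below ignores the vocal tract transfer function (which would superimpose formant peaks on this trend) and assumes both slopes are constant across frequency; the 100 Hz reference frequency and the helper name `level_re_f0` are arbitrary choices.

```python
import math

GLOTTAL_SLOPE = -12.0   # dB per octave, normal phonation (from the text)
RADIATION_SLOPE = 6.0   # dB per octave, lip radiation (from the text)

def level_re_f0(f, f0, slope_db_per_octave):
    """Level in dB at frequency f relative to f0 for a constant dB/octave slope."""
    return slope_db_per_octave * math.log2(f / f0)

f0 = 100.0
for f in (100.0, 200.0, 400.0, 800.0, 1600.0):
    src = level_re_f0(f, f0, GLOTTAL_SLOPE)
    rad = level_re_f0(f, f0, RADIATION_SLOPE)
    # The radiated trend falls at -12 + 6 = -6 dB per octave.
    print(f"{f:6.0f} Hz  source {src:6.1f} dB  radiated {src + rad:6.1f} dB")
```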
Fant (1960) has shown that the vocal tract transfer function for specific vowel articulations can be calculated quite accurately from a four-tube, three-parameter model. This model successfully accounts for vowels produced with a narrowing of the vocal tract by the tongue such that the constriction itself creates a cavity. The four-tube model comprises a cavity behind the constriction (from the glottis to the constriction), a cavity in front of the constriction (from the constriction to the lips), the constriction cavity, and the lip constriction (the size of the mouth aperture). The three parameters that are important in specifying the vowel are the distance from the glottis to the constriction, the area of the constriction, and the degree of lip opening. Nomograms (Fant, 1960) are used to display the acoustic consequences of modifying the constriction position and the degree of lip opening. The nomograms clearly illustrate that the first two resonances vary systematically with constriction position, constriction size and degree of lip opening. When the constriction is at the front of the oral cavity, and hence a large horizontal distance from the glottis, the first and second resonances are widely separated in frequency, whereas when the constriction is near the glottis, the first two resonances are close together and low in frequency. The outputs of vocal tract models such as those documented by Stevens and House (1955) and Lindblom and Sundberg (1971) support these theoretical articulatory effects. Analysis of the nomograms suggests that the first resonance is inversely related to vowel height and that the second resonance is related to vowel fronting. All else being equal, both resonances are lowered as lip aperture decreases; however, the relationship between lip rounding and the formants depends on constriction location.
Fant's four-tube, three-parameter model supports the findings of Chiba and Kajiyama (1941) that the resonant frequencies change according to the position of the constriction relative to the pressure maxima and minima in the vocal tract. It is also clear, however, from the work of Stevens and House (1955) and others involved in articulatory modelling that there is a many-to-one relationship between articulation and acoustics: different vocal tract shapes can produce the same acoustic output. Hence caution must be exercised when making articulatory inferences from acoustic information.
Stevens (1972, 1989) suggested a quantal relationship between speech acoustics and articulation. The quantal theory was developed in response to observations that, for certain vocal tract configurations, variations in place of constriction produce minimal acoustic change. The three quantal positions for vowels were identified as the high front, represented by the vowel [i]; the high back, represented by the vowel [u]; and the low central, represented by the vowel [a]. Stevens proposed that human language exploits these quantal positions so that articulatory imprecision does not hamper communication, thereby ensuring maximal efficiency. Researchers have had difficulty testing the assumptions of the quantal theory, and the evidence of its merits is contradictory. However, Syrdal and Gopal (1986) report data which support some aspects of it; specifically, the close approximation of two formants, which is characteristic of the quantal vowels, displays reduced variability on an auditory scale. Beckman, Jung, Lee, de Jong, Krishnamurthy, Ahalt, Cohen and Collins (1995) also support the quantal theory by showing that the vowels [i, u] and [a] exhibit greater variability in constriction location than in constriction degree, and Mrayati, Carré and Guérin (1988) present a theory of speech production based on perturbation theory which also lends some support to the quantal theory.
3.1.2 Acoustic Specification Of Vowels
Vowel sounds are most frequently described with reference to their formant characteristics, which provide an indication of the resonance positions and hence of the articulatory shape of the vowel.
Early speech perception studies (Delattre, Liberman, Cooper and Gerstman, 1952; Miller, 1953) showed that the frequencies of the first three formants were the most important cues to vowel identification. These findings have been supported by several subsequent analyses (Fox, 1985; Kewley-Port and Atal, 1989; Klein, Plomp and Pols, 1970; Rackerd and Verbrugge, 1985; Shepard, 1972; Terbeek, 1977). The first formant has been shown to be associated with the auditory quality of height, and the second formant with the auditory impression of the front/back dimension or, more correctly, with degree of constriction and point of maximal constriction. Ladefoged, De Clerk, Lindau and Papçun (1972) remind us that degree of lip opening or protrusion, pharyngeal width and larynx height also contribute to modifications of acoustic output. Lindblom and Sundberg (1971) found that all formants were lowered by lip rounding but that, for palatal configurations, F3 was particularly affected. Högberg (1995) also found that lip area was an important factor in the determination of F3 for the front vowels. When the first two formants are plotted on axes with appropriate orientation and scaling (for example, F1 increasing downwards and F2 increasing to the left), the vowel relationships closely resemble the traditional auditory vowel map. Such F1/F2 vowel spaces rely on the concept of the vowel target, which is the part of the vowel least influenced by its surrounding phonetic context. The vowel target is where the articulators, and therefore the formants, are moving least, and is referred to as the steady-state component of the vowel. The target is considered to be either a point in the time course of the vowel or a stretch of time during which the vowel position remains stable. A single point is often used to provide an estimate of the target position, and for most vowels this can be assumed to lie approximately midway through the nucleus.
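Two ways of estimating a single-point target from a sampled formant track are sketched below: taking the temporal midpoint, and taking the frame at which F1 and F2 change least. The second criterion (smallest average Euclidean distance in Hz to the neighbouring frames) is an assumed operationalisation of "where the formants are moving least", not a procedure from the studies cited, and the formant track is invented for illustration. The function names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math

def midpoint_target(track):
    """(F1, F2) at the temporal midpoint of the vowel."""
    return track[len(track) // 2]

def steady_state_target(track):
    """(F1, F2) at the frame with the smallest average change to its
    neighbouring frames; the track needs at least three frames."""
    best_i, best_change = None, math.inf
    for i in range(1, len(track) - 1):
        change = (math.dist(track[i - 1], track[i]) +
                  math.dist(track[i], track[i + 1])) / 2.0
        if change < best_change:
            best_i, best_change = i, change
    return track[best_i]

# Hypothetical (F1, F2) values in Hz, sampled at equal time steps.
track = [(350, 1500), (450, 1700), (520, 1820), (540, 1850),
         (545, 1860), (540, 1855), (500, 1780), (420, 1650)]
print(midpoint_target(track), steady_state_target(track))
```

For this smooth track both estimates select the same frame; for vowels with asymmetric or gradual transitions the two can diverge, which is one reason steady states can be hard to establish.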
Several authors have noted the problems inherent in the target theory of vowels, citing the difficulties often encountered in establishing steady-state components by eye or by automatic extraction procedures (Benguerel and McFadden, 1989; Nearey and Assmann, 1986). Van Son and Pols (1990), however, examined five different methods of identifying vowel targets and found that the choice of method made little difference to the results of their experiments.
The conventional F1/F2 plot does not adequately represent the multi-dimensional nature of vowel quality. Delattre et al. (1952) showed that the third formant influenced listeners' judgements of vowel quality, and more recent experiments have determined that the higher formants have a combined influence on vowel perception. The combined upper formant is referred to as F2 prime (F2') (Bladon, 1983; Bladon and Fant, 1978; Carlson, Fant and Granström, 1975; Paliwal, Lindsay and Ainsworth, 1983). Delattre et al. (1952) suggested that the ear averages formants that are close together. Carlson, Granström and Fant (1970) tested this hypothesis for Swedish vowels, concluding that all vowels could be effectively synthesised using two-formant approximations. Chistovich and colleagues found that formant averaging, or integration, occurred only if two formants were situated within a critical distance of 3 to 3.5 bark (Chistovich and Lublinskaya, 1979; Chistovich, Sheikin and Lublinskaya, 1979). More recent studies have examined global spectral features, suggesting that the F3 - F2 difference is a more accurate way of identifying vowel frontedness. Syrdal and Gopal (1986) have shown that the separation between back and front vowels is more closely linked to the F3 - F2 difference than to the F2 - F1 difference. It is important to recognise, however, that F3 and F4 vary more than F1 and F2 as a function of speaker characteristics, while remaining relatively stable across vowel categories; F1 and F2, in contrast, vary greatly with vowel quality. The higher formants are therefore less effective carriers of phonetic information than the lower formants (Harrington and Cassidy, 1999).
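The critical-distance idea can be made concrete by converting formant frequencies to the Bark scale. The sketch below uses Traunmüller's (1990) analytic approximation, chosen here for convenience; it is not necessarily the conversion used by Chistovich and colleagues or by Syrdal and Gopal (1986). The 3.5 bark threshold is the upper value quoted above, while the function names and formant values are hypothetical.

```python
def hz_to_bark(f_hz):
    """Traunmüller (1990) approximation of the Bark scale (no end corrections)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def bark_separation(f_a, f_b):
    """Absolute distance between two frequencies in Bark."""
    return abs(hz_to_bark(f_b) - hz_to_bark(f_a))

def predicts_integration(f_a, f_b, critical_bark=3.5):
    """True if two formants lie within the critical distance, so that
    spectral integration (an F2'-like percept) would be expected."""
    return bark_separation(f_a, f_b) <= critical_bark

# Hypothetical F2/F3 pairs (Hz) for a front and a back vowel.
front_f2, front_f3 = 2200.0, 2900.0
back_f2, back_f3 = 1000.0, 2500.0
print(round(bark_separation(front_f2, front_f3), 2), predicts_integration(front_f2, front_f3))
print(round(bark_separation(back_f2, back_f3), 2), predicts_integration(back_f2, back_f3))
```

The much smaller F3 - F2 separation for the front vowel is consistent with using this difference as a cue to frontness.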
3.1.3 Vowel length
Vowels may be distinguished not only by their resonances but also by their duration, because vowel length is phonemic in many languages (Ainsworth, 1972, 1981; Klatt, 1976; Ladefoged and Maddieson, 1990; Lehiste and Peterson, 1961; Lindau, 1975; Peterson and Lehiste, 1960). Classification experiments have demonstrated increased accuracy when durational information is included in the model (Harrington and Cassidy, 1994; Hillenbrand et al., 1995).
In Australian English, the vowels /ɪ, e, æ, ɐ, ɔ/ and /ʊ/ are considered to be short vowels, while all the others (except schwa) are long vowels (Bernard, 1967b; Clark, 1989; Mitchell and Delbridge, 1965b). Bernard (1967b) and Cochrane (1967) have shown that duration is the primary cue for discriminating between the vowels /ɐː/ and /ɐ/ in Australian English. The major difference between the long and the short vowels is simply one of total vowel duration. However, the difference is relative rather than absolute, as contextual and prosodic factors affect the ultimate length of the vowel. There is also some evidence to suggest that short vowels have proportionately longer offglide components than long vowels (Huang, 1986; Lehiste and Peterson, 1961; Rackerd and Verbrugge, 1985; Strange, 1989). Lehiste and Peterson (1961:274) state that for the short "lax" vowels there is "a short target position and a slow relaxation of the hold", whereas for long "tense" vowels "the target position is maintained for a longer time, and the (articulatory) movement away from the target position is relatively rapid". It has been clearly documented that openness is positively correlated with length, so open vowels tend to be longer than close vowels (Lindblom, 1967; Lisker, 1974). Lindblom (1967) suggests that this is a phonetic universal resulting from the increased biomechanical effort required to produce low vowels. Cochrane (1967:248) states, however, that this universal principle is "subject to partial or total suppression through phonological or other causes."
Klatt (1976) presents a summary of the factors that affect segmental duration. He lists speaking rate, semantic emphasis, word-final lengthening, phonological/phonetic influences such as inherent segmental duration, linguistic stress, and the effect of a postvocalic consonant as important determinants of the durational characteristics of vowels.
3.2 Monophthongs versus Diphthongs
Under the traditional target theory of speech production, vowels can be classified as either monophthongs or diphthongs. The monophthongs are sometimes referred to as single-target vowels, which can be adequately described by a single spectral slice, commonly called the target. Diphthongs, however, require two target specifications because of a glide in the formant pattern resulting from inherent articulatory change. Indeed, traditional transcription systems have recognised the changing nature of diphthongs by symbolising them as a sequence of two monophthongs. This characterisation implies that diphthongs can be described with reference to the acoustic structure of the component monophthongs (Collier, Bell-Berti and Raphael, 1982; Clark, 1989). It is generally agreed, however, that a diphthong is not a sequence of two monophthongs and does not have components which are identifiable as particular monophthongs (Ladefoged, 1982). There are several reasons for accepting the monophonematic interpretation of diphthongs, primarily that diphthongs behave phonologically as single units, like monophthongs. Clark (1989) proposes a revised transcription system for Australian English in which diphthong symbols are based on correspondences found in Bernard (1967a) between monophthongs and diphthong components. This system therefore retains the assumption that diphthongs are systematically related to monophthongs.
Most vowel research has involved analysis of monophthongs, with diphthongs comparatively neglected as a class of sound. This is possibly the result of traditional phonological theory in which diphthongs were classified as forms derived from underlying long monophthongs (Chomsky and Halle, 1968). Some early studies, however, did compare the formant values of the diphthong components with those of their associated monophthongs (Holbrook and Fairbanks, 1962; Lehiste, 1964). The results suggest a large degree of variability between individuals, contexts, and dialects and no firm universal conclusions are possible based on this research. These results do, however, accord with Ladefoged's view that diphthongs do not have components which are identifiable as particular monophthongs (Ladefoged, 1982). Lindau, Norlin and Svantesson (1985) also found a great deal of intradiphthongal as well as interlanguage variability in their cross linguistic examination of durational components of diphthongs. Based on this small amount of evidence, it seems that the actual phonetic representation of the diphthong and its relationship to other vowels in the phonetic space is dialect specific and extremely variable.
Gottfried, Miller and Meyer (1993) provide a brief history of the examination of American English diphthongs, concluding that there are three general ways in which diphthongs can be described based on different combinations of acoustic parameters. The three descriptive systems involve specification of 1) two targets, 2) onset plus slope, and 3) onset plus direction. Gottfried et al. (1993) report that all three systems provide information resulting in over 90% correct classification scores, with the two-target condition giving slightly better results than the other two. Bladon (1985) also found that the perceived identity of the diphthongs was determined by the endpoint steady states. Bladon questioned Gay's (1970) theory that diphthongs are primarily identified from the first target combined with information about the direction of the trajectory. Gay's conclusions are based on the observation that diphthong second targets display considerable undershoot at fast speaking rates. Bladon suggests that the second targets of diphthongs encompass a large target area, as there is little competition for these second elements, and hence undershoot does not compromise the integrity of the vowel.
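The three descriptive systems can be illustrated by computing each from the same pair of (F1, F2) measurements. The definitions here (slope as formant change per millisecond, direction as the angle of the glide in the F2/F1 plane) are assumed operationalisations for illustration and may differ from those used by Gottfried et al. (1993); the function name, formant values and 200 ms duration are hypothetical.

```python
import math

def describe_diphthong(onset, offset, duration_ms):
    """Describe a glide from onset (F1, F2) to offset (F1, F2) in three ways."""
    d_f1 = offset[0] - onset[0]
    d_f2 = offset[1] - onset[1]
    return {
        # 1) two targets: the onset and offset formant values themselves
        "two_targets": (onset, offset),
        # 2) onset plus slope: rate of change of each formant in Hz/ms
        "onset_slope": (onset, (d_f1 / duration_ms, d_f2 / duration_ms)),
        # 3) onset plus direction: glide angle in degrees, with the F2 change
        #    on the x axis and the F1 change on the y axis
        "onset_direction": (onset, math.degrees(math.atan2(d_f1, d_f2))),
    }

glide = describe_diphthong(onset=(750.0, 1300.0), offset=(450.0, 2000.0),
                           duration_ms=200.0)
for name, value in glide.items():
    print(name, value)
```

Only the two-target description retains where the glide ends; the slope and direction descriptions leave the extent of the glide implicit, which is relevant to the disagreement between Gay and Bladon about undershoot of second targets.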
Diphthongs can be categorised as either closing or centring (perhaps more accurately, outgliding or ingliding), depending on whether the glide moves towards a peripheral or a central position in the vowel space. Australian English diphthongs are sometimes called two-target vowels, with the two targets used to specify both the direction and the range of the glide (Clark and Yallop, 1990).
Bernard (1967a) found that speakers of the three different accent types for Australian English displayed different durational relationships between their diphthongal component parts. Cultivated speakers exploited longer transitions whereas Broad speakers used longer first targets.
3.3 Target vs Dynamic Theories of Vowels
There are two major theories of vowel identification that relate to the type of information listeners require for accurate perception: the target theory and the dynamic theory. The target theory is based on the traditional view that each vowel contains a relatively steady articulatory, and hence acoustic, component which is the principal cue to its identification. Numerous automatic classification studies based on pattern recognition techniques have demonstrated accurate separation of vowels from static spectral characteristics, and therefore support the view that, at least for monophthongs, the vowel target is a robust cue to vowel identification (Harrington and Cassidy, 1994; Hillenbrand and Gayvert, 1993; Syrdal and Gopal, 1986; Zahorian and Jagharghi, 1993). In contrast, the dynamic theory considers spectral change to be part of the essential information for the identification of vowels. The dynamic theory was first developed in response to observations by Strange and colleagues that listeners identified vowels more accurately in context than in isolation (Strange, Verbrugge, Shankweiler and Edman, 1976). Several subsequent studies have used modified silent-centre syllables, in which the target components have been removed from the signal, to demonstrate that vowels are well identified in the absence of a target (Benguerel and McFadden, 1989; Fox, 1989; Strange, 1989). Researchers have also observed that, despite the adequate performance of models based on target information, their results were still inferior to those obtained from human listeners. Hillenbrand and Gayvert (1993) hypothesised that this discrepancy arises because human listeners have the benefit of dynamic information such as duration and spectral change.
Some proponents of the dynamic theory suggest that all vowels are inherently dynamic, independently of context, and that spectral change contributes to the identification of monophthongs in much the same way as it does for diphthongs (Andruski and Nearey, 1992; Nearey, 1989; Nearey and Assmann, 1986). Harrington and Cassidy (1994) used classification tasks to assess the dynamic nature of Australian English vowels produced by 10 speakers. Using a combination of Gaussian classification techniques and neural networks, they found that spectral information from multiple time points combined with duration was required only for the classification of diphthongs. Monophthongs were successfully classified from information at the midpoint combined with vowel duration. This result contrasts with those of Hillenbrand et al. (1995), Huang (1992) and Zahorian and Jagharghi (1993), who demonstrated better vowel separation when transitional information was included in their classification experiments. The different methodologies employed in these studies, as well as the possibility of dialect-specific effects, must not be overlooked as explanations for the different results obtained. For instance, Hillenbrand et al. (1995) acknowledge that some of the monophthongs in their study are typically realised as diphthongs, which casts some doubt on their conclusions. Despite some evidence for improved performance when dynamic information is included in the classification models, Zahorian and Jagharghi (1993) acknowledge that the steady-state component of the vowel carries the most important cues to vowel identity, followed by the final transition and then the initial transition, and that transitional information supplies additional vowel information. Hillenbrand and Gayvert (1993) also comment that the phonetic information is largely preserved in static spectral cross sections.
3.4 Phonetic Aspects of Vowel Variation
3.4.1 Situational Effects
The articulation of a particular vowel is influenced by its surrounding segmental and suprasegmental context. This process is referred to as phonetic vowel reduction (Joos, 1948) and is the direct result of target undershoot. Target undershoot refers to the articulatory situation in which the ideal vowel target (canonical configuration) cannot be reached because of either mechanical limitations or constraints on neuromuscular coordination (Clark and Yallop, 1990). Two different types of undershoot increase the variability in the speech signal: centralisation and contextual assimilation (Harrington and Cassidy, 1999).
Centralisation is the tendency for peripheral vowels to shift to a more neutral position when produced in destressed positions or at faster speaking rates, resulting in a shrinkage of the vowel space (Miller, 1981). At fast speaking rates, vowel duration decreases, leaving the articulators less time to achieve optimal placement, so target undershoot occurs (Gay, 1968; 1978; Gay, Ushijima, Hirose and Cooper, 1974; Lindblom, 1963; 1964; 1983; Lindblom and Moon, 1988). A similar situation applies in unstressed syllables, which are shorter in duration than stressed syllables (Engstrand, 1988; van Summers, 1987).
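One simple way to quantify shrinkage of the vowel space is to compare the area of the polygon formed by the peripheral vowels before and after centralisation. The sketch assumes that centralisation can be idealised as moving every vowel the same proportion of the way towards the centroid of the space, in which case the area shrinks by the square of the remaining proportion. The formant values are hypothetical, the function names are illustrative, and this area measure is an illustrative choice rather than one used in the studies cited.

```python
def polygon_area(points):
    """Shoelace formula for a polygon whose (F2, F1) vertices are in order."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def centralise(points, proportion):
    """Move each vowel `proportion` of the way towards the centroid."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return [(x + proportion * (cx - x), y + proportion * (cy - y))
            for x, y in points]

# Hypothetical (F2, F1) values in Hz for [i], [a] and [u].
corners = [(2300.0, 300.0), (1300.0, 750.0), (900.0, 320.0)]
full = polygon_area(corners)
reduced = polygon_area(centralise(corners, 0.2))
print(round(reduced / full, 2))  # 0.64, i.e. (1 - 0.2) ** 2
```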
Fourakis (1991) found that context was a more important parameter than tempo and stress in determining phonetic vowel reduction. Contextual assimilation occurs when a vowel is modified by its surrounding context, taking on some characteristic associated with that context. Speech gestures overlap in time, and in most instances contextual effects cause the vowel to become centralised. Lindblom and Studdert-Kennedy (1967) showed that listeners' perception of vowels was influenced by the surrounding consonantal context, implying that listeners compensate for the articulatory effects of consonants on vowels. Much research has been conducted to examine the effects of context on vowel formant frequencies (Fourakis, 1991; Lindblom, 1963; Moon and Lindblom, 1994; Stevens and House, 1963; Stevens, House and Paul, 1966; van Bergem, 1993), showing that many consonantal effects on vowel realisation are the predictable consequences of articulatory accommodation.
Researchers investigating clear speech processes and the effects of speaking under difficult conditions have observed that vowel reduction is minimised in such situations (Moon and Lindblom, 1994). The acoustic consequences of clear speech include more peripheral vowel formants, slower tempo, a greater difference between the lengths of long and short vowels, greater velocity of second formant transitions, increased durations of all segments, a greater degree of diphthongisation, and complete release of final stops (Clark, Lubker and Hunnicutt, 1987; Lindblom and Lindgren, 1985; Moon and Lindblom, 1994; Palethorpe, 1992; Picheny, Durlach and Braida, 1986).
3.4.2 Speaker Effects
The acoustic characteristics of an individual's vowel sounds are the result of a combination of both physiological and phonetic factors. Phonetically equivalent vowels vary from speaker to speaker as a consequence of individual physiology which affects the resonance characteristics and hence acoustic output. Listeners are able to ignore these differences when decoding the speech signal probably through the actions of a set of transformations which standardise the acoustic output. Researchers have attempted to replicate these transformations and have developed numerous methods for reducing the impact of the speaker specific physiological effects by applying algorithms to the speech signals. Such methods are referred to as normalisation procedures and their aim is to reduce the variance in the data such that the speaker specific effects are minimised, while preserving phonetic information and hence rendering the data suitable for comparison with other normalised data. The technique of normalisation is particularly valuable when speaker/population characteristics are substantially different such as between males and females or children and adults. Normalisation, theoretically, allows researchers to make comparisons between such disparate populations.
Joos (1948) suggested that listeners' accurate perception of vowels is dependent on information contained in the point vowels [i, a, u]. Joos' theory was that once listeners had received point vowel information, they could scale all other vowels according to this maximal space and therefore accurately identify intermediate vowels. Ladefoged and Broadbent (1957) were able to confirm that listeners did indeed use extrinsic information in the perception of vowels by showing that variations in the formant range of a carrier sentence affected the perception of a test word in predictable ways. More recent work has shown that extrinsic cues can influence vowel perception, but it is still unclear whether such information is essential to accurate perception or merely supplementary. Some studies suggest that vowels can be accurately identified without the benefit of extrinsic vowel information (Assmann, Nearey and Hogan, 1982; Verbrugge, Strange, Shankweiler and Edman, 1976).
Disner (1980) provides an evaluation of various normalisation procedures, including those of Nearey (1977), Lobanov (1971) and Gerstman (1968). Gerstman's procedure transforms each speaker's vowels to a fixed vowel space specified by the maximum and minimum F1 and F2 values, so that spaces are standardised according to the point vowels. Lobanov's procedure, on the other hand, standardises the spaces according to their centres by equalising the means and standard deviations. Lobanov's procedure therefore, in theory, interferes less with the phonetic distribution of the peripheral vowels, whereas Gerstman's assumes that the point vowels do not vary phonetically from speaker to speaker.
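Both procedures are applied separately to each formant of each speaker. The sketch below follows the descriptions above. The 0 to 999 output range in `gerstman` follows the range commonly attributed to Gerstman (1968) but is exposed as a parameter; the use of the population standard deviation in `lobanov` is an assumption; and the F1 values are hypothetical.

```python
import statistics

def gerstman(values, scale=999.0):
    """Range-normalise one speaker's values for one formant: the speaker's
    minimum maps to 0 and maximum to `scale` (set by the point vowels)."""
    lo, hi = min(values), max(values)
    return [scale * (v - lo) / (hi - lo) for v in values]

def lobanov(values):
    """z-score one speaker's values for one formant (mean 0, sd 1),
    standardising the space around its centre."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical F1 values (Hz) for one speaker's vowels.
f1 = [300.0, 400.0, 550.0, 700.0, 850.0]
print([round(x) for x in gerstman(f1)])
print([round(x, 2) for x in lobanov(f1)])
```

The contrast noted in the text is visible in the code: `gerstman` depends only on the two extreme values, so any phonetic variation in the point vowels shifts every normalised value, whereas `lobanov` uses all of the speaker's vowels.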
Nearey (1977) uses a log-mean normalisation procedure in which a speaker's mean provides a correction factor for scaling that speaker's vowels. The procedure involves taking the logarithms of a speaker's vowel formants and averaging them separately for each formant. The average of the F1 and F2 means is then taken, and this single value is subtracted from the original logarithm of each vowel formant.
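The log-mean procedure as described above can be sketched in a few lines of Python. This is a minimal illustration, not Nearey's own implementation: it assumes natural logarithms (the base only changes the scale of the output), represents one speaker's data as a dictionary of hypothetical (F1, F2) values, and uses an illustrative function name. Multiplying all of a speaker's formants by a constant adds the same amount to every log value and to the correction factor, so the two hypothetical speakers below end up with identical normalised values.

```python
import math

def nearey_log_mean(formants):
    """formants: dict mapping vowel label -> (F1, F2) in Hz for ONE speaker.

    Returns vowel -> (log F1 - G, log F2 - G), where G is the average of the
    speaker's mean log F1 and mean log F2 (a single correction factor)."""
    log_f = {v: (math.log(f1), math.log(f2)) for v, (f1, f2) in formants.items()}
    n = len(log_f)
    mean_log_f1 = sum(l1 for l1, _ in log_f.values()) / n
    mean_log_f2 = sum(l2 for _, l2 in log_f.values()) / n
    g = (mean_log_f1 + mean_log_f2) / 2  # speaker's correction factor
    return {v: (l1 - g, l2 - g) for v, (l1, l2) in log_f.items()}

# Hypothetical (F1, F2) values for one speaker; speaker B is A scaled by 1.15.
speaker_a = {"i": (280.0, 2250.0), "a": (750.0, 1300.0), "u": (310.0, 850.0)}
speaker_b = {v: (f1 * 1.15, f2 * 1.15) for v, (f1, f2) in speaker_a.items()}

norm_a = nearey_log_mean(speaker_a)
norm_b = nearey_log_mean(speaker_b)
for v in speaker_a:
    print(v, norm_a[v], norm_b[v])
```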
Disner (1980) found that Nearey's log-mean normalisation procedure was the most effective in reducing variability, with Lobanov's procedure also effective. She also found that different languages responded differently to the procedures, and she cautions against assuming that normalised data from different languages correspond directly and can be compared. Disner (1980:253) points out that care "should be exercised to ensure that the trends which remain in the normalised data are truly linguistic trends and not artifacts of the normalisation technique itself".
Other researchers have proposed speaker-independent strategies for normalising vowels based on auditory theories, although these strategies have yet to be fully examined (Bladon, Henton and Pickering, 1984; Syrdal and Gopal, 1986).
3.5 Sex Differences
There are major acoustic differences between phonetically equivalent vowels produced by males and females. These differences have two possible sources: they may be physiological, and therefore universal, or they may result from sex-specific articulations, and therefore be culturally determined.
Chiba and Kajiyama (1941) suggested that differences in vocal tract length were the major source of variation in vocal tract transfer characteristics between the sexes. This hypothesis is tenable given that the standing wave characteristics of the resonant frequencies of the vocal tract tube are dictated by length to a greater degree than by cross-sectional dimensions (Fant, 1960). Fant (1966), however, found that a simple scaling factor inversely proportional to vocal tract length did not adequately account for the observed differences in formant frequencies between the sexes. The magnitude of the difference depended on the vowel itself: rounded back vowels required minimal correction for F1 and F2, very open unrounded vowels required maximal correction for F1, and high front vowels required minimal correction for F1. Fant offered two explanations for these findings. Firstly, women and children have relatively shorter pharyngeal cavities and smaller laryngeal cavities than men, and this would have a differential influence on formant frequencies. Secondly, not all formants have standing wave characteristics in which total length is the primary determinant of resonance. For instance, in double Helmholtz resonators, such as those characteristic of back rounded vowels, overall length does not have the same magnitude of effect on F1 and F2. Fant (1966) concluded that female and male formant patterns are not related in a simple linear fashion. Fant (1975) later attempted to assess the universality of male-to-female vowel formant differences in order to establish an anatomical basis for departures from simple uniform scaling. He examined eight different languages and concluded that uniform normalisation with a single scale factor could substantially reduce the sex differences in the data, but that non-uniform techniques were required to account for trends specific to vowel category and formant number.
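The uniform scaling hypothesis that Fant (1966) tested can be made concrete with the quarter-wave formula for a uniform tube closed at one end, Rn = (2n - 1)c/4L, using c = 35200 cm/s and the 17.6 cm male tract length given earlier. This is a minimal sketch: the function name is illustrative, and the shorter 14.7 cm tract length is a hypothetical value chosen only to show the arithmetic, not a measured female value. Under this model every formant of every vowel is scaled by the same factor, the ratio of the two lengths, which is precisely the single factor that Fant found could not account for vowel- and formant-specific differences.

```python
def quarter_wave_resonances(length_cm, n_formants=3, c=35200.0):
    """Resonances R_n = (2n - 1) c / (4 L) of a uniform tube closed at one
    end and open at the other; c in cm/s, L in cm, results in Hz."""
    return [(2 * n - 1) * c / (4 * length_cm) for n in range(1, n_formants + 1)]

male = quarter_wave_resonances(17.6)    # male tract length quoted from Fant (1960)
female = quarter_wave_resonances(14.7)  # hypothetical shorter tract, illustration only

ratios = [f / m for f, m in zip(female, male)]
print(male)
print(female)
print(ratios)  # each ratio is 17.6 / 14.7 (about 1.197), whatever the formant
```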
He concludes that "the female-male formant-frequency relations are in part determined by anatomical constraints" (Fant, 1975:17), but also that "we cannot quite exclude the possibility of universal 'feministic' preference in vowel qualities which might have influenced the average data" (Fant, 1975:18).
Nordström (1977) attempted to simulate female area functions from male data, taking into account well-known anatomical differences such as overall length and the pharyngeal/oral ratio. The resulting acoustic interpolations did not produce formant patterns in agreement with observations. Nordström concludes, as does Fant (1975), that anatomical differences explain only part of the formant differences.
Goldstein (1980, cited by Rosner and Pickering, 1994 and Högberg, 1995) examined the growing vocal tract in children but also discussed male/female differences. She used an anatomical model to predict male-to-female formant ratios, but without success, and concluded that sex differences result from a combination of anatomical and culturally based factors. Goldstein found that women tended to produce more peripheral vowels than men and hence have a wider vowel space. She hypothesised that women need to spread their formants further apart because their high fundamental frequency, and hence widely spaced harmonics, leaves the vocal tract transfer characteristics underrepresented in the spectrum. She also suggested that women make narrower constrictions than men and that this has a differential effect on the vowels. Labov (1972a) suggests that the more peripheral vowels of women result from women's tendency to produce "clearer" speech.
Traunmüller (1984) disputes the claim that observed formant frequency differences between the sexes are the result of learned, sex-specific articulatory habits. His thesis is that the descent of the larynx during puberty in males is the direct cause of the differences, and that previous anatomical models have inaccurately accounted for growth effects and their physiological consequences. Traunmüller hypothesised that the lowering of the larynx would also affect constriction size, because the back part of the tongue is pulled down. The model suggests that tongue growth affects the first formant such that maximum constrictions are more open, which raises F1 of high front and back vowels and lowers F1 of low front and back vowels. F2 of front vowels would be affected more than F2 of back vowels because of different cavity affiliations: F2 of front vowels is said to be affiliated with the back cavity, and F2 of back vowels with the front cavity. In Traunmüller's model these factors account for the more centralised nature of male vowel spaces. Despite numerous modifications to the model to account for known physiological effects, Traunmüller was still unable to make predictions that matched actual formant observations.
As we have seen, there is some disagreement in the literature over the possibility of sex-specific articulations that are not the result of anatomical differences. An examination of children's vowel productions may help to shed some light on this question. Several researchers (Bennett and Weinberg, 1979b; Bennett, 1981; Busby and Plant, 1995) have examined the formant structure of children's vowels and have found significant differences between the sexes despite presumed anatomical congruence. Others have established that listeners are able to use formant information to ascertain the sex of children when fundamental frequency information is equivocal (Bennett and Weinberg, 1979a; Ingrisano, Weismer and Schuckers, 1980; Sachs, Lieberman and Erickson, 1973). These studies strongly suggest that learned patterns of sex-specific articulatory gestures are present in the speech of preadolescent children.
3.6 Summary
Vowels can be described in terms of the centre frequencies of the first three formants at the vowel target (or targets, for diphthongs). Vowel duration and other dynamic spectral information contribute to a more complete description, but the extent of this contribution remains unclear. Contextual environment and suprasegmental factors play an important role in the ultimate realisation of the vowel phoneme, so such characteristics must be carefully controlled in phonetic research.
Physiological differences between speakers also affect vowel characteristics; such effects must be accounted for in phonetic research and minimised where necessary. One way of minimising physiological effects is to apply one of the many available normalisation procedures to reduce variance, but care must always be taken when manipulating data to ensure that phonetic accuracy is preserved. The question of sex-specific articulations remains open, as researchers have been unable to model adequately the relationship between male and female vowel behaviour.
Acoustic data provide an accessible means for hypothesising about articulatory behaviour, and it is customary in phonetic discussions of vowel characteristics to use articulatory labels to refer to auditory and acoustic properties (Ladefoged and Maddieson, 1990). Articulatory terms provide convenient global labels for describing acoustic effects; however, specific articulatory detail should not be ascribed to acoustic vowel data.
Bibliography
Please note: The references below were referred to in the text above. They are not required reading for this unit.
Ainsworth, W. A. (1972) Duration as a cue in the recognition of synthetic vowels. Journal of the Acoustical Society of America, 51, 648-651.
Ainsworth, W. A. (1981) Duration as a factor in the recognition of synthetic vowels. Journal of Phonetics, 9, 333-342.
Andruski, J. E. and Nearey, T. M. (1992) On the sufficiency of target specification of isolated vowels in /bVb/ syllables. Journal of the Acoustical Society of America, 91, 390-410.
Assmann, P., Nearey, T. and Hogan, J. (1982) Vowel identification: orthographic, perceptual, and acoustic aspects. Journal of the Acoustical Society of America, 71, 975-989.
Beckman, M. E., Jung, T-P., Lee, S-h., de Jong, K., Krishnamurthy, A. K., Ahalt, S. C., Cohen, K. B. and Collins, M. J. (1995) Variability in the production of quantal vowels revisited. Journal of the Acoustical Society of America, 97, 471-490.
Benguerel, A.-P. and McFadden, T. (1989) The effect of coarticulation on the role of transitions in vowel perception. Phonetica, 46, 80-96.
Bennett, S. (1981) Vowel formant frequency characteristics of preadolescent males and females. Journal of the Acoustical Society of America, 69, 231-238.
Bennett, S. and Weinberg, B. (1979a) Sexual characteristics of preadolescent children's voices. Journal of the Acoustical Society of America, 65, 179-189.
Bennett, S. and Weinberg, B. (1979b) Acoustic correlates of perceived sexual identity in preadolescent children's voices. Journal of the Acoustical Society of America, 66, 989-1000.
Bernard, J. R. L. (1967a) Some Measurements of Some Sounds of Australian English. Unpublished Doctoral Dissertation. Sydney University.
Bernard, J. R. L. (1967b) Length and identification of Australian English vowels. AUMLA: Journal of the Australasian Universities Language and Literature Association, 27, 37-58.
Bladon, A. (1983) Two-formant models of vowel perception: shortcomings and enhancements. Speech Communication, 2, 305-313.
Bladon, A. (1985) Diphthongs: a case study of dynamic auditory processing. Speech Communication, 4, 145-154.
Bladon, A. and Fant, G. (1978) A two-formant model and the cardinal vowels. Speech Transmission Laboratory, Quarterly Progress Status Report, 1, 1-8.
Bladon, A., Henton, C. and Pickering, J. (1984) Towards an auditory theory of speaker normalisation. Language and Communication, 4, 59-69.
Busby, P. A. and Plant, G. L. (1995) Formant frequency values of vowels produced by preadolescent boys and girls. Journal of the Acoustical Society of America, 97, 2603-2606.
Carlson, R., Fant, G. and Granström, B. (1975) Two-formant models, pitch, and vowel perception. In Fant, G. and Tatham, M. (eds.) Auditory Analysis and Perception of Speech. pp 55-82, Academic Press, London.
Carlson, R., Granström, B. and Fant, G. (1970) Some studies concerning perception of isolated vowels. Speech Transmission Laboratory, Quarterly Progress Status Report, 3-4, 84-104.
Chiba, T. and Kajiyama, M. (1941) The Vowel: its Nature and Structure. Tokyo Publishing Company, Tokyo.
Chistovich, L. and Lublinskaya, V. (1979) The 'centre of gravity' effect in vowel spectra and critical distance between the formants: psychoacoustical study of the perception of vowel-like stimuli. Hearing Research, 1, 185-195.
Chistovich, L. A., Sheikin, R. L. and Lublinskaya, V. V. (1979) 'Centres of gravity' and spectral peaks as the determinants of vowel quality. In Lindblom B. and Öhman, S. (eds.) Frontiers of Speech Communication Research, pp 143-158, Academic Press, London.
Chomsky, N. and Halle, M. (1968) The Sound Pattern of English. Harper and Row, New York.
Clark, J. E. (1989) Some proposals for a revised phonetic transcription of Australian English. In Collins, P. and Blair, D. (eds.) Australian English: The Language of a New Society. pp 205-213, University of Queensland Press, St Lucia.
Clark, J. E., Lubker, J. F. and Hunnicutt, S. (1987) Some preliminary evidence for phonetic adjustment strategies in communication difficulty. In Steele, R. and Threadgold, T. (eds.) Language Topics: Essays in Honour of Michael Halliday. Volume 2. pp 161-181, John Benjamins Publishing Company, Amsterdam.
Clark, J. E. and Yallop, C. (1990) An Introduction to Phonetics and Phonology. Basil Blackwell, Oxford.
Cochrane, G. R. (1967) The perception of short segments from some Australian English vowels. Zeitschrift für Phonetik, 20, 81-88.
Collier, R., Bell-Berti, F. and Raphael, L. J. (1982) Some acoustic and physiological observations on diphthongs. Language and Speech, 25, 305-323.
Delattre, P., Liberman, A. M., Cooper, F. S. and Gerstman, F. J. (1952) An experimental study of the acoustic determinants of vowel colour: observations on one- and two-formant vowels synthesised from spectrographic patterns. Word, 8, 195-210.
Disner, S. F. (1980) Evaluation of vowel normalisation procedures. Journal of the Acoustical Society of America, 67, 253-261.
Disner, S. F. (1986) On describing vowel quality. In Ohala, J. J. and Jaeger, J. J. (eds.) Experimental Phonology. pp 69-79, Academic Press, Orlando.
Engstrand, O. (1988) Articulatory correlates of stress and speaking rate in Swedish VCV utterances. Journal of the Acoustical Society of America, 83, 1863-1875.
Fant, G. (1960) Acoustic Theory of Speech Production. Mouton, The Hague.
Fant, G. (1966) A note on vocal tract size factors and non-uniform F-pattern scalings. Speech Transmission Laboratory, Quarterly Progress Status Report, 4, 22-30.
Fant, G. (1975) Non-uniform vowel normalisation. Speech Transmission Laboratory, Quarterly Progress Status Report, 2-3, 1-19.
Flanagan, J. L. (1972) Speech Analysis, Synthesis and Perception. Springer-Verlag, New York.
Fourakis, M. (1991) Tempo, stress and vowel reduction in American English. Journal of the Acoustical Society of America, 90, 1816-1827.
Fox, R. (1985) Multidimensional scaling and perceptual features: evidence of stimulus processing or memory prototypes? Journal of Phonetics, 13, 205-217.
Gay, T. (1968) Effect of speaking rate on diphthong formant movements. Journal of the Acoustical Society of America, 44, 1570-1573.
Gay, T. (1970) A perceptual study of American English diphthongs. Language and Speech, 13, 65-88.
Gay, T. (1978) Effect of speaking rate on vowel formant movements. Journal of the Acoustical Society of America, 63, 223-230.
Gay, T., Ushijima, T., Hirose, H. and Cooper, F. S. (1974) Effect of speaking rate on labial consonant-vowel articulation. Journal of Phonetics, 2, 47-63.
Gerstman, L. (1968) Classification of self-normalised vowels. IEEE Transactions on Audio and Electroacoustics, 16, 78-80.
Goldstein, U. (1980) An Articulatory Model of the Vocal Tracts of Growing Children. Unpublished Doctoral Dissertation, MIT, Cambridge, Massachusetts.
Gottfried, T., Miller, J. and Meyer, D. (1993) Three approaches to the classification of American English diphthongs. Journal of Phonetics, 21, 205-229.
Harrington, J. and Cassidy, S. (1994) Dynamic and target theories of vowel classification: Evidence from monophthongs and diphthongs in Australian English. Language and Speech, 37, 357-373.
Harrington, J. and Cassidy, S. (1999) Techniques in Speech Acoustics. Kluwer, Dordrecht.
Hillenbrand, J. and Gayvert, R. T. (1993) Identification of steady-state vowels synthesised from the Peterson and Barney measurements. Journal of the Acoustical Society of America, 94, 668-673.
Hillenbrand, J., Getty, L. A., Clark, M. J. and Wheeler, K. (1995) Acoustic characteristics of American English vowels. Journal of the Acoustical Society of America, 97, 3099-3111.
Högberg, J. (1995) From sagittal distance to area function and male to female scaling of the vocal tract. Speech Transmission Laboratory, Quarterly Progress Status Report, 4, 11-48.
Holbrook, A. and Fairbanks, G. (1962) Diphthong formants and their movements. Journal of Speech and Hearing Research, 5, 38-58.
Huang, C. (1986) The effect of formant trajectory and spectral shape on the tense/lax distinction in American vowels. In Proceedings IEEE Conference on Acoustics, Speech and Signal Processing. pp 893-896, Tokyo, Japan.
Huang, C. (1992) Modeling human vowel identification using aspects of formant trajectory and context. In Tohkura, Y., Vatikiotis-Bateson, E. and Sagisaka, Y. (eds.) Speech Perception, Production and Linguistic Structure. pp 43-61, IOS Press, Amsterdam.
Ingrisano, D., Weismer, G. and Schuckers, G. H. (1980) Sex identification in preschool children's voices. Folia Phoniatrica, 32, 61-69.
Jones, A. I. (1964-5) Sydney English - a seven vowel system. Studies in Linguistics, 18, 29-35.
Joos, M. (1948) Acoustic phonetics. Language, 24, 1-136.
Kewley-Port, D. and Atal, B. (1989) Perceptual differences between vowels located in a limited phonetic space. Journal of the Acoustical Society of America, 85, 1726-1740.
Klatt, D. H. (1976) Linguistic uses of segmental duration in English: Acoustic and perceptual evidence. Journal of the Acoustical Society of America, 59, 1208-1221.
Klatt, D. and Klatt, L. (1990) Analysis, synthesis and perception of voice quality variations among male and female talkers. Journal of the Acoustical Society of America, 87, 820-857.
Klein, W., Plomp, R. and Pols, L. (1970) Vowel spectra, vowel spaces and vowel identification. Journal of the Acoustical Society of America, 48, 999-1009.
Labov, W. (1972a) Sociolinguistic Patterns. University of Pennsylvania Press, Philadelphia.
Ladefoged, P. (1982) A Course in Phonetics. Harcourt Brace Jovanovich, New York.
Ladefoged, P. and Broadbent, D. E. (1957) Information conveyed by vowels. Journal of the Acoustical Society of America, 29, 98-104.
Ladefoged, P., De Clerk, J., Lindau, M. and Papçun, G. (1972) An auditory-motor theory of speech production. UCLA Working Papers in Phonetics, 22, 48-75.
Ladefoged, P. and Maddieson, I. (1990) Vowels of the world's languages. Journal of Phonetics, 18, 93-122.
Lass, R. (1984) Phonology. Cambridge University Press, Cambridge.
Laver, J. (1980) The Phonetic Description of Voice Quality. Cambridge University Press, Cambridge.
Lehiste, I. (1964) Acoustical Characteristics of Selected English Consonants. Indiana University, Bloomington.
Lehiste, I. and Peterson, G. (1961) Transitions, glides and diphthongs. Journal of the Acoustical Society of America, 33, 268-277.
Lindau, M. (1975) Features for Vowels, UCLA Working Papers in Phonetics, 30, 1-155.
Lindau, M., Norlin, K. and Svantesson, J-O. (1985) Cross-linguistic differences in diphthongs. UCLA Working Papers in Phonetics, 61, 40-44.
Lindblom, B. (1963) Spectrographic study of vowel reduction. Journal of the Acoustical Society of America, 35, 1773-1781.
Lindblom, B. (1964) Articulatory activity in vowels. Speech Transmission Laboratory, Quarterly Progress Status Report, 2, 1-5.
Lindblom, B. (1967) Vowel duration and a model of lip-mandible coordination. Speech Transmission Laboratory, Quarterly Progress Status Report, 4, 1-29.
Lindblom, B. (1983) Economy of speech gestures. In MacNeilage, P. F. (ed.) The Production of Speech. pp 217-245, Springer-Verlag, New York.
Lindblom, B. and Lindgren, R. (1985) Speaker-listener interaction and phonetic variation. Phonetic Experimental Research at the Institute of Linguistics, University of Stockholm (Perilus), IV, 77-85. Institute of Linguistics, University of Stockholm.
Lindblom, B. and Moon, S.-J. (1988) Formant undershoot in clear- and citation-form speech. Phonetic Experimental Research at the Institute of Linguistics, University of Stockholm (Perilus), VIII, 21-33.
Lindblom, B. E. F. and Sundberg, J. E. F. (1971) Acoustical consequences of lip, tongue, jaw and larynx movement. Journal of the Acoustical Society of America, 50, 1166-1179.
Lisker, L. (1974) On "explaining" vowel duration. Glossa, 2, 233-246.
Lobanov, B. M. (1971) Classification of Russian vowels spoken by different speakers. Journal of the Acoustical Society of America, 49, 606-608.
Miller, R. L. (1953) Auditory tests with synthetic vowels. Journal of the Acoustical Society of America, 25, 114-121.
Miller, J. (1981) Effects of speaking rate on segmental distinctions. In Eimas, P. and Miller, J. (eds.), Perspectives on the Study of Speech. pp 39-74, Lawrence Erlbaum, New Jersey.
Mitchell, A. G. and Delbridge, A. (1965b) The Pronunciation of English in Australia. Angus and Robertson, Sydney.
Moon, S.-J. and Lindblom, B. (1994) Interaction between duration, context, and speaking style in English stressed vowels. Journal of the Acoustical Society of America, 96, 40-55.
Mrayati, M., Carré, R. and Guérin, B. (1988) Distinctive regions and modes: A new theory of speech production. Speech Communication, 7, 257-286.
Nearey, T. (1977) Phonetic Feature Systems for Vowels. Unpublished Doctoral Dissertation, University of Connecticut, Storrs, CT.
Nearey, T. (1989) Static, dynamic, and relational properties in vowel perception. Journal of the Acoustical Society of America, 85, 2088-2113.
Nearey, T. M. and Assmann P. F. (1986) Modeling the role of inherent spectral change in vowel identification. Journal of the Acoustical Society of America, 80, 1297-1308.
Nordström, P.-E. (1977) Female and infant vocal tracts simulated from male area functions. Journal of Phonetics, 5, 81-92.
Ohala, J. J. (1983) The origin of sound patterns in vocal tract constraints. In MacNeilage, P. F. (ed.) The Production of Speech, pp 189-216, Springer-Verlag, New York.
Paliwal, K., Lindsay, D. and Ainsworth, W. (1983) A study of two-formant models for vowel identification. Speech Communication, 2, 295-303.
Peterson, G. E., and Lehiste, I. (1960) Duration of syllable nuclei in English. Journal of the Acoustical Society of America, 32, 693-703.
Picheny, M. A., Durlach, N. I., and Braida, L. D. (1986) Speaking clearly for the hard of hearing II: Acoustic characteristics of clear and conversational speech. Journal of Speech and Hearing Research, 29, 434-446.
Rakerd, B. and Verbrugge, R. (1985) Linguistic and acoustic correlates of the perceptual structure found in an individual differences scaling study of vowels. Journal of the Acoustical Society of America, 71, 296-301.
Rosenberg, A. (1971) Effect of glottal pulse shape on the quality of natural vowels. Journal of the Acoustical Society of America, 49, 583-590.
Rosner, B. S. and Pickering, J. B. (1994) Vowel Perception and Production. Oxford University Press, Oxford.
Sachs, J., Lieberman, P. and Erickson, D. (1973) Anatomical and cultural determinants of male and female speech. In Shuy, R. W. and Fasold, R. W. (eds.) Language Attitudes: Current Trends and Prospects. Georgetown University Press, Washington, D.C.
Sawashima, M. and Hirose, H. (1983) Laryngeal gestures in speech production. In MacNeilage, P. F. (ed.) The Production of Speech. pp 189-212, Springer-Verlag, New York.
Shepard, R. (1972) Psychological representation of speech sounds. In David, E. and Denes, P. (eds.) Human Communication: A Unified View. pp 67-113, McGraw-Hill, New York.
Stevens, K. (1972) The quantal nature of speech: evidence from articulatory-acoustic data. In David, E. E. and Denes, P. B. (eds.) Human Communication: A Unified View. pp 51-66, McGraw-Hill, New York.
Stevens, K. N. (1989) On the quantal nature of speech. Journal of Phonetics, 17, 3-45.
Stevens, K. N. and House, A.S. (1955) Development of a quantitative description of vowel articulation. Journal of the Acoustical Society of America, 27, 484-493.
Stevens, K. N. and House, A. S. (1963) Perturbation of vowel articulations by consonantal context: An acoustical study. Journal of Speech and Hearing Research, 6, 111-128.
Stevens, K. N., House, A. S. and Paul, A. (1966) Acoustical description of syllabic nuclei: an interpretation in terms of a dynamic model of articulation. Journal of the Acoustical Society of America, 40, 123-132.
Strange, W. (1989) Dynamic specification of coarticulated vowels spoken in sentence context. Journal of the Acoustical Society of America, 85, 2135-2153.
Strange, W. (1989) Evolving theories of vowel perception. Journal of the Acoustical Society of America, 85, 2081-2087.
Strange, W., Verbrugge, R., Shankweiler, D. and Edman, T. (1976) Consonant environment specifies vowel identity. Journal of the Acoustical Society of America, 60, 213-224.
Syrdal, A. and Gopal, H. (1986) A perceptual model of vowel recognition based on the auditory representation of American English vowels. Journal of the Acoustical Society of America, 79, 1086-1100.
Terbeek, D. (1977) Cross-language multidimensional scaling study of vowel perception. UCLA Working Papers in Phonetics, 37, University of California, Los Angeles.
Traunmüller, H. (1984) Articulatory and perceptual factors controlling the age- and sex-conditioned variability in formant frequencies of vowels. Speech Communication, 3, 49-61.
Trubetskoy, N. S. (1939) Grundzüge der Phonologie. Travaux du Cercle Linguistique de Prague 7. Reprinted 1958, Göttingen: Vandenhoeck & Ruprecht. Translated into French by J. Cantineau 1949 as Principes de Phonologie, Paris: Librairie Klincksieck. Translated into English by C. A. M. Baltaxe 1969 as Principles of Phonology, Berkeley: University of California Press.
van den Berg, J. (1958) Myoelastic-aerodynamic theory of voice production. Journal of Speech and Hearing Research, 1, 227-244.
van den Berg, J. (1968) Mechanisms of the larynx and laryngeal vibrations. In Malmberg, B. (ed.) Manual of Phonetics, pp 278-308, North-Holland, Amsterdam.
van Son, R. J. J. H. and Pols, L. C. W. (1990) Formant frequencies of Dutch vowels in a text, read at normal and fast rate. Journal of the Acoustical Society of America, 88, 1683-1693.
van Son, R. J. J. H. and Pols, L. C. W. (1992) Formant movements of Dutch vowels in a text, read at normal and fast rate. Journal of the Acoustical Society of America, 92, 121-127.
van Summers, W. (1987) Effects of stress and final-consonant voicing on vowel production: articulatory and acoustic analyses. Journal of the Acoustical Society of America, 82, 847-863.
van Bergem, D. (1993) Acoustic vowel reduction as a function of sentence accent, word stress, and word class. Speech Communication, 12, 1-23.
Verbrugge, R., Strange, W., Shankweiler, D. and Edman, T. (1976) What information enables a listener to map a talker's vowel space? Journal of the Acoustical Society of America, 60, 198-212.
Zahorian, S. and Jagharghi, A. (1993) Spectral-shape features versus formants as acoustic correlates for vowels. Journal of the Acoustical Society of America, 94, 1966-1982.

