Department of Linguistics
Acoustic Representations of Speech
- FFT and LPC Spectra
- Fundamental Frequency Plots
- Intensity Plots
- Speech Spectra and Spectrograms
In speech analysis, FFT (fast Fourier transform) and LPC (linear predictive coding) spectra are used to identify accurately the frequencies and relative intensities of the various components of the speech spectrum. For example, FFTs allow a close examination of the interaction between harmonic frequencies and formant frequencies. LPCs provide a convenient method for identifying the formants of vowels and vowel-like consonants.
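As a rough illustration of the difference between the two analyses, the following Python sketch builds a synthetic vowel-like signal and computes both a magnitude spectrum and an LPC envelope from the same frame. Everything here is an assumption made for the example rather than part of the course materials: the sampling rate, the F0 of 100 Hz, the single formant at 700 Hz, and all function names. The DFT is written out naively using only the standard library (an FFT routine returns the same values, only faster), and the LPC order of 2 is chosen to match the single synthetic formant; real speech normally needs a higher order.

```python
import cmath
import math

FS = 8000          # sampling rate in Hz (assumed)
F0 = 100           # fundamental frequency of the synthetic source, Hz (assumed)
FORMANT = 700      # centre frequency of the single synthetic formant, Hz (assumed)
BANDWIDTH = 100    # formant bandwidth, Hz (assumed)
N = 800            # frame length: 100 ms, giving 10 Hz between DFT bins


def synth_vowel():
    """Impulse train at F0 passed through one two-pole resonator."""
    r = math.exp(-math.pi * BANDWIDTH / FS)
    theta = 2 * math.pi * FORMANT / FS
    b1, b2 = 2 * r * math.cos(theta), -r * r
    y = []
    for n in range(N):
        x = 1.0 if n % (FS // F0) == 0 else 0.0
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(x + b1 * y1 + b2 * y2)
    return y


def hamming(frame):
    """Apply a Hamming window to reduce spectral leakage."""
    n = len(frame)
    return [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, s in enumerate(frame)]


def dft_magnitude(frame):
    """Magnitude spectrum for bins 0..N/2 (naive DFT)."""
    n = len(frame)
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(frame)))
            for k in range(n // 2 + 1)]


def lpc(frame, order):
    """LPC coefficients a[0..order], a[0] = 1, by the autocorrelation
    method with the Levinson-Durbin recursion."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1 - k * k                    # updated prediction error
    return a


def lpc_envelope(a, freqs):
    """Magnitude of the all-pole model 1/A(z) at the given frequencies (Hz)."""
    env = []
    for f in freqs:
        w = 2 * math.pi * f / FS
        denom = sum(c * cmath.exp(-1j * w * j) for j, c in enumerate(a))
        env.append(1.0 / abs(denom))
    return env


frame = hamming(synth_vowel())
freqs = [k * FS / N for k in range(N // 2 + 1)]
spectrum = dft_magnitude(frame)
envelope = lpc_envelope(lpc(frame, order=2), freqs)

strongest_harmonic = freqs[spectrum.index(max(spectrum))]
formant_estimate = freqs[envelope.index(max(envelope))]
print(f"strongest harmonic in the DFT spectrum: {strongest_harmonic:.0f} Hz")
print(f"peak of the LPC envelope:               {formant_estimate:.0f} Hz")
```

The magnitude spectrum has separate peaks at the harmonics of F0, with their heights shaped by the formant, whereas the LPC envelope is a smooth curve whose peak approximates the formant itself.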
Spectrograms permit the examination of dynamic changes in a speech spectrum. This is particularly useful for rapidly changing consonants (e.g. stop bursts) and for transitions (between vowels and consonants, and between the targets in diphthongs). Spectrograms, usually in conjunction with waveforms, are essential when segmenting and labeling speech, as they usually provide the clearest visual cues to the boundaries between phonemes. Spectrograms do not, however, provide accurate measurements of vowel formants. Broad band spectrograms have poor frequency resolution (about 300 Hz), so formant measurements taken visually from them carry a high degree of intrinsic error. That is why we tend to use FFTs and LPCs for the accurate measurement of formant frequencies.
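The 300 Hz figure can be related to the length of the analysis window with a small Python sketch. It relies on the rule of thumb that window duration is roughly the reciprocal of the analysis bandwidth; the exact constant depends on the window shape, so the numbers are approximate. The function names, the 45 Hz narrow band value (a conventional figure, not taken from the text above) and the 120 Hz speaker F0 are assumptions made for illustration.

```python
def window_duration_ms(bandwidth_hz):
    """Approximate analysis-window duration for a given bandwidth.

    Rule of thumb only: duration is about 1 / bandwidth.
    """
    return 1000.0 / bandwidth_hz


def resolves_harmonics(bandwidth_hz, f0_hz):
    """Harmonics are spaced F0 apart, so they appear as separate
    components only if the analysis bandwidth is narrower than F0."""
    return bandwidth_hz < f0_hz


BROAD_BAND_HZ = 300   # figure quoted in the text
NARROW_BAND_HZ = 45   # conventional narrow band value (assumption)
F0_HZ = 120           # hypothetical speaker F0

for name, bw in (("broad band", BROAD_BAND_HZ), ("narrow band", NARROW_BAND_HZ)):
    print(name, "-", round(window_duration_ms(bw), 1), "ms window,",
          "harmonics resolved:", resolves_harmonics(bw, F0_HZ))
```

Under these assumptions a broad band analysis uses a window of only a few milliseconds, which gives good time resolution for events such as stop bursts but smears each formant across a wide frequency band.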
Automatic formant tracking is a commonly available tool in speech acoustics packages. Such tools generally superimpose formant tracks (often colour coded) over the spectrogram, which makes it much easier for the user to identify the formants. Further, automatic formant tracking usually provides a set of formant values that can be analysed statistically in work on large speech databases.
Figure 1: A broad band spectrogram of the word "hide" with the formant tracks for formants 1 to 5 superimposed over it.
In Figure 1, the formant tracks provide continuous plots of formant frequencies even over those parts of the spectrogram for which there is no displayed spectral energy (such as the stop occlusion above 1000 Hz). Unfortunately, most formant trackers are very error prone in voiceless fricatives and in oral stops, and do not provide as tidy a set of formant tracks as those that appear here. Such formant trackers are quite accurate in vowels, but their accuracy decreases as we go from the most vowel-like consonants (i.e. semivowels) to the least vowel-like consonants (i.e. oral stops and voiceless fricatives).
Fundamental frequency (F0) plots are essential when working with prosody, and particularly with intonation. We will use F0 plots extensively in this course when we examine the analysis of speech intonation. Until then, see Figure 7.19.3 (panel c) on page 298 of Clark and Yallop.
Intensity plots are often useful in speech analysis. They can sometimes help to identify phoneme boundaries and can also be useful in the analysis of the intensity correlates of prosody. Figure 7.19.3 (panel b) of Clark and Yallop shows a dB-scaled "short-term average" intensity plot for the word "Woolloomooloo". This root mean square (RMS) average was taken using a contiguous series of short overlapping windows. Such overlapping windows are usually set so that each window is longer than the pitch period of the waveform. This permits the examination of the intensity profile without interference from the fluctuations in intensity that occur within each glottal cycle.
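The effect of window length on a short-term RMS intensity contour can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce the Clark and Yallop figure: the function name, the window and hop sizes, the dB reference of 1.0, and the use of a 100 Hz sine wave as a stand-in for a voiced signal with a 10 ms pitch period are all hypothetical choices.

```python
import math


def rms_intensity_db(samples, fs, window_s, hop_s, ref=1.0):
    """Short-term RMS intensity in dB from overlapping windows.

    window_s and hop_s are in seconds; windows overlap when hop_s < window_s.
    """
    win = int(round(window_s * fs))
    hop = int(round(hop_s * fs))
    floor = 1e-12  # avoids log10(0) in silent frames
    contour = []
    for start in range(0, len(samples) - win + 1, hop):
        frame = samples[start:start + win]
        rms = math.sqrt(sum(s * s for s in frame) / win)
        contour.append(20 * math.log10(max(rms, floor) / ref))
    return contour


FS = 8000  # sampling rate in Hz (assumed)
# One second of a 100 Hz tone: a stand-in for voicing with a 10 ms pitch period.
tone = [math.sin(2 * math.pi * 100 * n / FS) for n in range(FS)]

# 20 ms windows span two pitch periods; 2.5 ms windows span a quarter of one.
steady = rms_intensity_db(tone, FS, window_s=0.020, hop_s=0.001)
rippled = rms_intensity_db(tone, FS, window_s=0.0025, hop_s=0.001)

print("range with 20 ms windows: ", round(max(steady) - min(steady), 3), "dB")
print("range with 2.5 ms windows:", round(max(rippled) - min(rippled), 3), "dB")
```

In this example the longer window happens to cover exactly two pitch periods, so the cycle-by-cycle ripple disappears entirely. With real speech, where the pitch period varies, a window longer than the pitch period reduces the ripple rather than removing it.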