
Department of Linguistics

SPEECH ACOUSTICS

Intensity Transformations of Speech

Robert Mannell

Gustav Fechner (1860) formulated a "law" of psychophysics, "Fechner's law", which states a general relationship between the magnitude of a physical stimulus (S) and the magnitude of the perceptual response (R), where R is proportional to the log of S. When this general principle is applied to the relationship between the physical intensity (I) of a sound and our perception of intensity, or loudness (L), it implies a logarithmic relationship between the intensity of a sound and its perceived loudness. This assumption is the basis for the use of the decibel (dB) in the measurement of intensity. The dB is a logarithmic representation of physical intensity and so it is a fairly simple way of producing an intensity scale that approximates our perception of intensity.
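As a rough illustration of the logarithmic scale described above, the following sketch converts a physical intensity to decibels. The function name and the reference intensity of 1e-12 W/m^2 (the conventional reference for sound intensity level in air) are assumptions added for this example; the text itself does not specify a reference value.

```python
import math

# Assumed reference intensity (W/m^2): the conventional value for
# sound intensity level in air. Not specified in the text above.
I_REF = 1e-12


def intensity_to_db(intensity, reference=I_REF):
    """Return the level in dB of `intensity` relative to `reference`."""
    if intensity <= 0 or reference <= 0:
        raise ValueError("intensities must be positive")
    return 10.0 * math.log10(intensity / reference)


# Each tenfold increase in intensity adds the same 10 dB step.
for intensity in (1e-12, 1e-11, 1e-10, 1e-6):
    print(f"{intensity:.0e} W/m^2 -> {intensity_to_db(intensity):.1f} dB")
```

Equal ratios of intensity map to equal differences in dB, which is the sense in which the dB scale follows Fechner's logarithmic relationship.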

Numerous psychoacoustic tests of our perception of the intensity dimension of sound have been carried out over the years. These tests are mostly analogous to tests of frequency perception. For example, there have been tests of just noticeable differences (jnd) in intensity (at different frequencies) just as there have been tests of jnd for frequency. What seems most relevant, however, to our discussion of the perception of complex sounds such as speech are tests that examine the relative loudness of sounds at both the same and different frequencies.

Loudness level tests attempt to determine the intensity at one frequency that sounds as loud as a particular intensity at another frequency. Figure 1 illustrates these relationships.

Figure 1: This diagram illustrates lines of equal loudness level across the range of audible frequencies and intensities. These contours define the phon scale.

The values for each of the contours in figure 1 can be determined by examining the intensity at the point where a curve intersects the 1000 Hz line. That is, sounds that sound as loud as 40 dB at 1000 Hz are found on the contour that passes through the 40 dB point at 1000 Hz, and all of these sounds are said to have a loudness level of 40 phons. This means that a sound with an intensity of 52 dB and a frequency of 100 Hz has a loudness level of 40 phons, as it lies on the contour that passes through 40 dB at 1000 Hz. It should be noted that the baseline is the threshold of hearing and is set at 0 phons. The threshold was once thought to be at 0 dB at 1000 Hz, which is why 1000 Hz was chosen as the reference frequency, and it has been retained by convention.

Note that the phon scale tells us which sounds have the same loudness and which sounds have greater or lesser loudness, but it does not tell us anything about relative loudness. For example, 40 phons is NOT twice as loud as 20 phons, nor is it half as loud as 80 phons. A large series of experiments, which examined the relative perception of loudness for a large number of normal-hearing subjects across a range of frequencies, determined a scale of relative loudness known as the sone scale. On the sone scale a sound with a loudness of 2 sones is twice as loud as a sound with a loudness of 1 sone and half as loud as a sound with a loudness of 4 sones. The sone scale is illustrated in figure 2.

Figure 2: The sone scale of relative loudness across the range of audible frequencies and intensities. Adjacent contours are 1 sone apart.

It is immediately apparent from the sone scale contours that the contours get increasingly closer together as intensity increases. Note that the 1 sone contour is identical to the 40 phon contour. This is the arbitrarily set reference loudness level from which the sone scale was developed, through a large series of tests in which subjects were asked to adjust one sound so that it sounded twice as loud or half as loud as a reference sound. Once 40 phons was set to equal 1 sone, all other loudness values could be determined in relation to that loudness level.
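The relationship between the two scales can be sketched numerically. The example below assumes the widely used rule of thumb that loudness doubles for every 10 phon increase above 40 phons; that rule is not stated in the text above and is only an approximation (it is poor below about 40 phons). The function names are hypothetical, chosen for this illustration; only the anchor point of 1 sone = 40 phons comes from the text.

```python
import math


def phon_to_sone(phons):
    """Approximate loudness (sones) for a loudness level (phons).

    Assumes loudness doubles per 10 phons, anchored at 40 phons = 1 sone.
    Intended for levels of about 40 phons and above.
    """
    return 2.0 ** ((phons - 40.0) / 10.0)


def sone_to_phon(sones):
    """Inverse of phon_to_sone under the same assumption."""
    if sones <= 0:
        raise ValueError("loudness in sones must be positive")
    return 40.0 + 10.0 * math.log2(sones)


# 80 phons is far more than twice as loud as 40 phons.
print(phon_to_sone(40), phon_to_sone(80))

# Contours spaced 1 sone apart sit at ever closer phon values.
levels = [sone_to_phon(s) for s in range(1, 6)]
gaps = [b - a for a, b in zip(levels, levels[1:])]
print([round(g, 2) for g in gaps])
```

Under this assumption the gap in phons between successive 1 sone steps shrinks steadily, which is consistent with the contours crowding together at higher intensities.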

A series of experiments carried out by Mannell (1994) suggests that when we perceive speech we don't use the sone scale, but rather the logarithm of the sone scale (logsone) to encode the intensity dimension of speech sounds. Such a scale is almost identical to the phon scale, but with different numerical values applied. Further, across most frequencies relevant to the perception of speech (say 100 to 4000 Hz) the logsone and phon scales are very closely approximated by decibels above threshold. For this reason, I have used the dB scale in this part of the course to represent the intensity dimension of speech sounds in spectrograms and spectra even when using auditory scales to represent the frequency dimension.
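One way to see why a log-sone scale would closely track the phon scale is the following short derivation. It assumes the common rule of thumb that loudness S (in sones) doubles for every 10 phon increase in loudness level P above 40 phons. This rule is an assumption brought in for illustration and is not necessarily the exact definition of logsone used in Mannell (1994).

```latex
S = 2^{(P - 40)/10}
\quad\Longrightarrow\quad
\log_{10} S = \frac{\log_{10} 2}{10}\,(P - 40) \approx 0.0301\,(P - 40)
```

Under that assumption the logarithm of loudness in sones is simply a linear rescaling of loudness level in phons: the same scale with different numerical values.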

For further information see these pages from Mannell (1994).

Mannell, R.H. (1994). The perceptual and auditory implications of parametric scaling in synthetic speech. Unpublished Ph.D. dissertation, Macquarie University.