Speech Directivity And Vowel Sounds
When we are behind someone who is speaking, we hear them less clearly, and it is more difficult to understand what they are saying. This is one of the most direct experiences we can have of speech directivity: the amplitude and frequency content of the sound produced by a speaker vary with direction.
Measurements with microphones regularly spaced around subjects showed that the amplitude of sound decreases progressively from front to back. Moreover, this amplitude variation is greater for the high frequencies of the speech spectrum: it reaches up to 30 dB at 16 kHz.
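To get a feel for what a 30 dB difference means, the short sketch below converts a level difference in decibels into a ratio of sound-pressure amplitudes, using the standard convention of 20 times the base-10 logarithm of the amplitude ratio. The function names are hypothetical and chosen only for this illustration.

```python
import math


def db_to_amplitude_ratio(level_db: float) -> float:
    """Convert a level difference in dB to a sound-pressure amplitude ratio."""
    return 10 ** (level_db / 20)


def amplitude_ratio_to_db(ratio: float) -> float:
    """Convert a sound-pressure amplitude ratio to a level difference in dB."""
    return 20 * math.log10(ratio)


front_back_difference_db = 30  # figure quoted in the text for 16 kHz
ratio = db_to_amplitude_ratio(front_back_difference_db)
print(f"{front_back_difference_db} dB -> amplitude ratio of about {ratio:.1f}")
```

In other words, a 30 dB front-back difference means the sound-pressure amplitude in front is roughly 32 times larger than behind.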
The frequency dependence of directivity explains why it can be more difficult to understand someone when we are located behind them. There, we mostly receive the low-frequency part of speech sounds, and consonants such as /s/, /t/, or /f/, which generally contain a lot of high-frequency energy, are altered and more difficult to recognize.
Quite intuitively, speech directivity has long been attributed to the obstacle that the head and body present to the propagation of the sound emitted by the mouth. The phenomenon responsible for this attenuation, diffraction, becomes more significant when the wavelength of the sound is comparable to the dimensions of the obstacle. Thus, low frequencies, whose wavelengths are much larger than the human body, are barely affected and only slightly damped.
In contrast, higher frequencies, which have shorter wavelengths, are more affected, and the body acts as a screen. However, this does not explain everything. The precise shape of the variation of amplitude with direction, called the directivity pattern, varies from one speech sound to another.
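The wavelength argument above can be made concrete with a rough calculation. The sketch below computes the wavelength (speed of sound divided by frequency) at a few frequencies and compares it with the size of the head. The speed of sound in room-temperature air and the head width of 15 cm are assumptions made for illustration, not values taken from the study.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed: air at about 20 degrees Celsius
HEAD_WIDTH = 0.15       # m, assumed rough head width, for illustration only


def wavelength(frequency_hz: float, speed: float = SPEED_OF_SOUND) -> float:
    """Wavelength in metres of a sound wave at the given frequency."""
    return speed / frequency_hz


for f in (100, 1000, 4000, 16000):
    lam = wavelength(f)
    print(f"{f:>6} Hz: wavelength {lam:.3f} m, {lam / HEAD_WIDTH:.2f} x head width")
```

Under these assumptions, a 100 Hz tone has a wavelength of about 3.4 m, far larger than the head, whereas at 16 kHz the wavelength is only about 2 cm, far smaller than the head.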
Among the vowels, /a/, /e/, and /i/ tend to show more front-back differences than /o/ and /u/. This is explained by the differences in mouth opening involved for these two groups of vowels. A large mouth opening will favor the concentration of the acoustic energy in front of the speaker. And, as a matter of fact, /a/, /e/, and /i/ correspond to a larger mouth opening than /o/ and /u/.
But we can look even closer. We can create images of the directivity phenomenon. We can represent the variations of amplitude with colors as a function of the frequency and the direction. These images, which can help to find what would be the amplitude variation in a given direction at a given frequency, can be called directivity maps.
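As a minimal sketch of how the numbers behind such a map could be organised, the code below takes amplitudes measured at several angles for each frequency and expresses them in decibels relative to the frontal direction. The function name, the data layout, and the amplitude values are all hypothetical: the values are made up for illustration and are not measurements from the study. Turning the resulting table into an image is then a matter of assigning a colour to each level.

```python
import math


def directivity_map(amplitudes, angles_deg):
    """Return {frequency: {angle: level in dB relative to the front (0 degrees)}}.

    amplitudes: {frequency_hz: [linear amplitude at each angle in angles_deg]}
    angles_deg: list of angles in degrees; must contain 0 (the front).
    """
    front = angles_deg.index(0)
    result = {}
    for freq, row in amplitudes.items():
        reference = row[front]
        result[freq] = {
            angle: 20 * math.log10(value / reference)
            for angle, value in zip(angles_deg, row)
        }
    return result


# Made-up amplitudes in arbitrary linear units, for illustration only.
angles = [0, 90, 180]
toy_amplitudes = {
    500: [1.0, 0.9, 0.8],
    8000: [1.0, 0.5, 0.1],
}

dmap = directivity_map(toy_amplitudes, angles)
for freq in sorted(dmap):
    row = "  ".join(f"{a:>3} deg: {lvl:6.1f} dB" for a, lvl in dmap[freq].items())
    print(f"{freq:>5} Hz | {row}")
```

Looking up `dmap[8000][180]` then answers the question "what is the amplitude variation behind the speaker at 8 kHz?" for this toy data set.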
Looking at this kind of map, we can see the global front-back difference, which increases with frequency. But we also notice that a remarkable richness and complexity of amplitude variation shapes can appear at high frequencies. The amplitude of sound appears to be very weak in some particular directions. The maximum of amplitude is no longer necessarily located in front but can shift considerably toward the sides within a few hundred Hz. The consequence is small variations in voice quality with direction. This is particularly obvious if we listen to a recording of someone moving while speaking into a microphone.
This richness and complexity originate inside the vocal tract. When the frequency is high enough, the wavelength is smaller than the width of the vocal tract in some parts. In this case, the amplitude of the sound varies across the cross-section of the vocal tract. If these transverse variations reach the mouth, they affect the way the sound is radiated from the mouth, creating complex radiation patterns.
The frequency above which these transverse variations can occur depends on the width of the vocal tract. In the widest parts of the vocal tract, they can appear from about 3.5 kHz, and in the narrowest, only well above 20 kHz (the upper limit of audible frequencies). As a consequence, the vowel /a/, which is produced by forming a wide cavity just before the mouth, generates very rich and complex directivity patterns from 3.5 kHz upward. In contrast, the vowel /u/, which is produced by forming a narrow constriction before the mouth, exhibits almost no complex directivity patterns. Whether and how these differences in directivity affect the way we perceive these vowels remains an open research question.
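The link between width and frequency can be sketched with a textbook idealisation that is not necessarily the model used in the study: in a rigid-walled duct, the first transverse variation can propagate once half a wavelength fits across the width, which gives a threshold frequency equal to the speed of sound divided by twice the width. The speed of sound and the two widths below are assumptions chosen for illustration. A real vocal tract has a more complicated shape, so these numbers are only orders of magnitude.

```python
SPEED_OF_SOUND = 350.0  # m/s, assumed: warm, humid air inside the vocal tract


def cut_on_frequency(width_m: float, speed: float = SPEED_OF_SOUND) -> float:
    """Threshold frequency (Hz) of the first transverse mode in an idealised
    rigid-walled duct: half a wavelength fits across the given width."""
    return speed / (2 * width_m)


# Illustrative widths, not anatomical measurements from the study.
examples = (("wide cavity", 0.05), ("narrow constriction", 0.008))
for label, width in examples:
    f_khz = cut_on_frequency(width) / 1000
    print(f"{label} ({width * 100:.1f} cm wide): about {f_khz:.1f} kHz")
```

With these assumed values, a 5 cm wide cavity gives a threshold of 3.5 kHz, while an 8 mm constriction gives about 21.9 kHz, above the audible range.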
Understanding the mechanisms behind the variation of speech sound amplitude with direction and frequency can help create realistic and natural-sounding virtual voices. In addition, since transverse variations depend on the vocal tract width, it may help extract anatomical information about speakers using several microphones.
These findings are described in the article entitled "The effect on vowel directivity patterns of higher order propagation modes", recently published in the Journal of Sound and Vibration. This work was conducted by Rémi Blandin, Annemie Van Hirtum, Xavier Pelorson, and Rafael Laboissière from Grenoble Alpes University.