MUSICAL ACOUSTICS
Donald E. Hall
California State University, Sacramento
BROOKS/COLE THOMSON LEARNING
Australia • Canada • Mexico • Singapore • Spain • United Kingdom • United States

CHAPTER 14 THE HUMAN VOICE

In a contest for versatility among musical instruments, the human voice wins easily. (Coming from a pipe organist, that is no mean compliment!) This is all the more remarkable because the same apparatus serves other purposes as well, such as when we eat and breathe. After a brief introduction to the relevant anatomy, we will take up the mechanism of sound production in speech and singing, two activities that are both covered by the term phonation. This requires consideration of both transients and steady sounds, which correspond only roughly to the linguistic categories of consonants and vowels. We will take this opportunity to explain the role of a fluid-dynamical force called the Bernoulli effect in the operation of vocal cords and lip reeds. We will find ourselves particularly concerned with the concept of formants, or frequency ranges in which harmonic components are especially strong. These are central to our recognition of vowels in speech and song. The chapter will close with brief remarks on miscellaneous aspects of singing, such as vibrato and carrying power.

14.1 THE VOCAL APPARATUS

In common with all wind instruments, the human voice has (1) an air reservoir with a means of maintaining pressure above atmospheric, (2) an outlet channel with a narrow constriction (or in this case several) where airflow can be interrupted or modulated, and (3) a resonant cavity to strengthen some aspects of the resulting sound waves.

The air reservoir is located in the lungs, where you normally hold approximately 3 to 4 liters of air, with half a liter moving in and out with each breath. Air is drawn in by raising the rib cage to expand the lungs, and expelled by contracting abdominal muscles, which force the abdominal contents upward against the diaphragm.
The lungs of an adult taking a very deep breath may hold as much as 5 to 6 liters of air, and still will contain 1 to 2 liters after maximum exhalation. The difference of 3 to 5 liters represents the maximum available air for singing from a single breath, although most people will not use more than 1 to 2 liters without some vocal training.

A tube called the trachea leads from the lungs up to the vocal tract, a term that includes the throat, mouth, and nose (Figure 14.1). At the top of the trachea, serving as a switchyard to join it with the esophagus and the vocal tract, is a hollow boxlike structure of cartilage called the larynx (Figure 14.2). The epiglottis is a flaplike valve on top of the larynx that closes down during swallowing to prevent food from entering the trachea, although it is open for phonation. The airway also can be blocked by the vocal cords (or vocal folds), a pair of ridges of soft layered tissue on the inside walls of the larynx, whose shape and rigidity can be changed by several small muscles. The opening between the vocal cords is called the glottis and is V-shaped because the vocal cords stay together in front while moving apart in back. The glottis is approximately 2 cm long and 1 cm across when open wide.

FIGURE 14.1 The human vocal apparatus. See Figure 14.2 for more detail on the larynx.

FIGURE 14.2 The larynx, which is approximately 7 cm high and 5 cm across. (a) Cutaway view, looking from back to front. (b) View from above, with front at top, showing arytenoid cartilage whose movements open and close the vocal cords. (Labeled parts: false vocal cords, true vocal cords, thyroid cartilage, arytenoid cartilage.)

The vocal cords close for swallowing, as a backup in case anything gets past the epiglottis, and they are open during normal breathing.
For phonation the vocal cords close, or nearly so, and the lungs apply a pressure equivalent to a column of at least 5 cm of water (0.005 atmosphere), quite similar to that mentioned for organ pipes in Chapter 12. The maximum pressure may be as high as 0.03 atm during speech, and 0.10 atm for singing. This excess pressure forces the cords to open and admit bursts of air into the vocal tract. The cords vibrate at a frequency controlled by the tension applied in their muscles. For normal speech these frequencies typically extend over a range from 70 to 200 Hz for a man's voice or 140 to 400 Hz for a woman's; the difference is due to the longer and more massive vocal cords of adult males. These ranges may be extended upward another octave or more when singing.

Immediately above the larynx is the pharynx or throat cavity. This opens to the outside through the mouth, with the tongue, teeth, and lips providing additional means of restricting or blocking the airflow. Depending on the position of the soft palate, the throat may or may not open also into the nose. The size and shape of the vocal tract can vary greatly and thereby produce widely differing sounds. This occurs largely because of the tongue's ability to change both its position and its shape in several ways.

14.2 SOUND PRODUCTION

Let us now describe in more detail how this mechanism works to produce audible sound. We must in fact describe several mechanisms, because speech includes several different types of sounds. Each distinct elemental speech sound is called a phoneme. The usual distinction between consonants and vowels suggests that we might explain them as corresponding to transient and steady sounds, respectively. But that is overly naive; for acoustical purposes it is more informative to classify the phonemes into five groups: plosives, fricatives, other consonants, pure vowels, and diphthongs.
The plosive consonants are those produced by completely blocking the vocal tract and then suddenly opening it to let a single burst of air through. This can be done at several different points; when the blocking and release occur at the lips we have p, at the front of the tongue t, and at the soft palate k. These are clearly transient sounds (Figure 14.3a); there is no way you can “hold” a plosive. Acoustically, the plosive is a simple sudden pulse of higher pressure followed by a brief interval of whatever damped vibration this sets up in the vocal tract, together with the breathy sound of air continuing through the opening until the excess pressure is fully relieved. This nonsteady, nonperiodic sound has a continuous spectrum of frequencies and thus no definite pitch.

FIGURE 14.3 Oscilloscope traces for (a) the plosive k and (b) the fricative sh.

Each of the unvoiced plosives described above has a voiced version as well: b, d, and g. In these, the vocal cords are set to begin steady vibration with some vowel sound immediately (meaning within approximately 30 ms) after the air is released, instead of leaving a larger gap between (compare coal and goal). They also may be voiced very briefly when occurring at the end of a word (compare lack and lag). The transient nature of plosives means they carry little total sound energy, so singers are taught to exaggerate them.

The fricatives also come in unvoiced/voiced pairs: f, v; th (as in thin), th (as in them); s, z; sh, zh (measure). The last four are sometimes called the sibilants. To this list we add one more, unvoiced only: h (hat). These are quite unlike the plosives in that they can be sustained steadily for any length of time (even though in normal speech they are not).
Yet their unvoiced versions have no identifiable pitch; they are steady only in the sense that white noise is steady; their waveforms are nonperiodic (Figure 14.3b) and their spectra include a continuous range of frequencies, not just a harmonic series. The same statements can be made about that part of the complex sound of the voiced fricatives that distinguishes them from vowels. (To hear for yourself that a voiced fricative is a mixture of two distinct sounds, only one of which has pitch, try gradually opening your mouth while saying “zzzzz”; then say it again while gradually closing your lips. In the first case you are left with a vowel, in the second with the unvoiced s.)

The “frying” sound of the fricatives is merely the turbulence in a fluid flowing through a small opening at greater than critical speed. As with the plosives, this opening may be formed at several different points along the vocal tract. These places of articulation are the lips and teeth (f/v, th), tongue and palate (forward for s/z, farther back for sh/zh), and glottis (h). The different vocal tract shapes help emphasize certain frequency ranges (around 4 to 6 kHz for s versus 2 to 3 kHz for sh, for example) but no particular individual frequency within those ranges. There is insufficient positive feedback (resonance) to control the flow and force it to be periodic instead of random.

Under the heading “other consonants” we group several kinds that accomplish transitions whose acoustical properties are not fundamentally different from those described elsewhere. These include the semivowels (or glides) w and y and the liquids l and r (both loosely related to diphthongs) and the nasals m, n, and ng, which encompass a vowel-consonant transition. As a nasal begins, the vocal cords already are vibrating. Only the nose is open, but that allows a steady voiced sound, the sound of humming. That sound can be held indefinitely before articulating the consonant, and it has definite pitch and a harmonic-series spectrum. It could be called “the nasal vowel” and added to the others below as far as acoustics is concerned; it is practically the same for all three nasal phonemes. The consonant ending of the nasal results from the opening of the mouth passage to prepare the way for some following phoneme. The nasals differ from each other only in place of articulation of their final consonant: lips for m, tongue for n, and soft palate for ng.

The vowels are steady, voiced sounds with definite pitch; their waveforms are periodic (aside from small imperfections such as unsteadiness in muscle control or superimposed hissing noise from airstream turbulence) and their spectra are harmonic series. So it is appropriate to try to characterize the vowels according to the relative strengths of these harmonics, which we shall do at length in the following section. Here we will concentrate on how voiced sounds in general are produced by the vocal cords.

FIGURE 14.4 One cycle of vocal cord vibration in a young female singing a C; these movie frames are at approximately 0.5-millisecond intervals. (Courtesy of Dale Metz and JASA.)

When a stream of air is sent between nearly closed vocal cords, there is a critical speed (or critical pressure driving it) above which the flow cannot be steady. Fluid-dynamic instability makes the cords vibrate back and forth (Figure 14.4) so that the flow becomes intermittent. The resemblance to a trumpet player's lips (Figure 13.7) is obvious. We can classify the voice as another reed instrument and apply to it some of the concepts developed in the last chapter. But there are certain features of the flow instability that deserve further explanation, and these are presented in Box 14.1.

BOX 14.1 The Bernoulli Effect

Understanding the vibration of vocal cords and brass players' lips requires consideration of the Bernoulli effect.
This concerns the change in pressure at different points along a stream of flowing fluid, and has many other applications in such devices as sailboats, airplane wings, aspirator pumps, and airspeed indicators.

Consider the airstream in Figure 14.5. For the same amount of air to flow steadily through the entire channel, its speed must be greater in the constriction B than at A or C. But if the air speeds up in going from A to B, something must be pushing it toward the right; that is, the fluid pressure at A must be greater than at B. Similarly, to slow it down again, the pressure at C must be greater than at B. In general, the pressure along any fluid streamline is reduced wherever the speed is increased; that is the Bernoulli effect.

FIGURE 14.5 The mass M is a physicist's analogy to a vocal cord; it is merely a minor detail that it is not a pair. For sufficiently large airflow, M will undergo steady oscillations, thus periodically changing the flow, that is, creating sound waves. (From Fundamentals of Musical Acoustics by Arthur H. Benade. Copyright © 1976 by Oxford University Press, Inc. Reprinted by permission.)

Now consider hypothetical oscillations of a lip or vocal cord (Figure 14.6a). The springlike force of tension and elasticity in the tissues tends to return the mass to its equilibrium position (Figure 14.6b). The Bernoulli force, coming from a reduction of pressure at B below atmospheric, pulls the mass downward; but its strength changes during the motion. For a narrower constriction the velocity and resulting downward force are reduced; if the channel widens, they both increase (Figure 14.6c). The average Bernoulli force moves the equilibrium position down a bit; it is the changes above and below average that pull effectively sometimes downward and other times upward. As described thus far, the Bernoulli effect has the same result as if the spring were made stiffer.
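The pressure-speed relation described in the box so far can be put into numbers. The following is a minimal sketch, assuming incompressible, frictionless flow, for which the Bernoulli effect takes the standard form p + ½ρv² = constant along a streamline. The channel areas, the inlet speed, the air density, and the function names are illustrative assumptions, not values from the text.

```python
AIR_DENSITY = 1.2  # kg/m^3, a typical room-temperature value (assumption)


def constriction_speed(speed_a, area_a, area_b):
    """Continuity: the same volume flows through A and B each second,
    so the speed at B is larger by the ratio of the areas."""
    return speed_a * area_a / area_b


def pressure_drop(speed_a, speed_b, density=AIR_DENSITY):
    """Bernoulli: p_A - p_B = 0.5 * rho * (v_B**2 - v_A**2), in pascals."""
    return 0.5 * density * (speed_b ** 2 - speed_a ** 2)


if __name__ == "__main__":
    v_a = 2.0        # m/s in the wide part of the channel (hypothetical)
    area_a = 2.0e-4  # m^2, i.e. 2 cm^2 at A (hypothetical)
    area_b = 0.2e-4  # m^2, i.e. 0.2 cm^2 at the constriction B (hypothetical)
    v_b = constriction_speed(v_a, area_a, area_b)
    dp = pressure_drop(v_a, v_b)
    print(f"speed at B: {v_b:.1f} m/s, pressure at B lower by {dp:.1f} Pa")
```

With these made-up numbers the air moves ten times faster at B than at A, and the pressure at B is lower than at A by a few hundred pascals, the same order of magnitude as the lung pressures quoted in Section 14.1.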
The mass would oscillate only if “kicked,” and that oscillation would gradually die away as energy is lost to friction. How can the oscillation spontaneously grow to large amplitude and continue indefinitely? Only if it arranges to receive a continual supply of energy. This comes about because the inertia of the air means that it takes a little while after a change in pressure is applied before the flow attains its new speed. So the least and greatest flow speeds, and the accompanying least and greatest Bernoulli forces, occur a little later than the times of narrowest and widest openings, respectively (Figure 14.6d). This is precisely what is needed to make the oscillating part of the Bernoulli force act downward during the greater part of the downward motion of the mass and similarly upward for the upward motion, thus delivering more energy to the motion than it takes away. This extra energy helps make up for frictional losses so that the oscillation can be sustained at large amplitudes. (See also Box 11.1.)

FIGURE 14.6 (a) Width of opening in a channel during vibration. All dashed lines show time-average values. (b) Restoring force exerted by the spring; note this is downward when the position of the mass is high and upward when it is low. (c) Bernoulli force, always down but stronger downward than average when mass is high and weaker when it is low. The alternating part of this force (difference from the average value) is negative for mass high and positive for mass low. (d) Bernoulli force with a hypothetical one-eighth-cycle lag of flow in response to pressure. Bernoulli force is in same direction as motion (delivers energy to the system) three-fourths of the time in this case (horizontal bars) instead of only half.

Lips and vocal cords also are subject to forces that are parallel to the flow and in most cases stronger than the Bernoulli forces; these produce rolling motions so that the total picture is somewhat more complicated. But again the inertial time lag between applied force and resulting motion is an essential ingredient in making self-sustained oscillation possible.

The Bernoulli component must be relatively minor for organ or woodwind single reeds, for which most of the reed area feels only a quasi-static pressure difference; only a narrow strip around the edge of the reed is close enough to the shallot or mouthpiece facing to experience a significant reduction in pressure. Benade reports an estimate by Worman that the Bernoulli effect contributes only a few percent to the forces on a clarinet reed. This still can be important, however, in determining the exact way the reed finally closes against the mouthpiece. The Bernoulli effect plays a somewhat greater role for the oboe family's double reeds.

(End of Box 14.1)

There is also a critical difference in the relation between reed and resonator. In all the brass and woodwind instruments, the feedback from the tube is strong enough to have a major influence on the reed frequency, so bore and bell shape are especially important in determining the sounded pitch. The vocal tract differs in having soft, yielding walls that absorb much of the vibration energy. Whatever resonances occur in the vocal tract are relatively weak, like the broad, low curve in Figure 11.16 (page 217) rather than the extremely high, sharp peaks of Figures 13.15 or 13.17 (pages 277, 279). So the feedback from vocal tract to vocal cords is much too weak to influence them. Even though made of soft tissue, the vocal cords acoustically must be viewed as hard reeds whose vibration frequency and waveform are determined almost entirely by their own tension, mass, and separation, with some slight influence from the lung pressure. So regardless of what vowel may be involved, the pitch of the voice is determined by muscular control in the larynx.
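The energy argument of Box 14.1 and Figure 14.6d can be checked numerically. The following is a minimal sketch, assuming a sinusoidal displacement of unit amplitude and an alternating Bernoulli force of unit amplitude that opposes the displacement and simply lags it by a chosen fraction of a cycle. The function name `energy_check` and the unit amplitudes are illustrative assumptions.

```python
import math


def energy_check(lag_fraction, steps=100_000):
    """Return (fraction of the cycle in which the alternating force points
    along the motion, net work done on the mass over one cycle).

    Displacement is sin(theta), so velocity is proportional to cos(theta).
    The alternating Bernoulli force is -sin(theta - lag): negative (downward)
    when the mass is high, delayed by the lag of the airflow.
    """
    lag = 2.0 * math.pi * lag_fraction
    d_theta = 2.0 * math.pi / steps
    along_motion = 0
    work = 0.0
    for i in range(steps):
        theta = (i + 0.5) * d_theta       # midpoint of each small step
        velocity = math.cos(theta)
        force = -math.sin(theta - lag)
        if force * velocity > 0:
            along_motion += 1
        work += force * velocity * d_theta  # force times displacement step
    return along_motion / steps, work


if __name__ == "__main__":
    for lag_fraction in (0.0, 1.0 / 8.0):
        fraction, work = energy_check(lag_fraction)
        print(f"lag {lag_fraction:.3f} cycle: force along motion "
              f"{fraction:.2f} of the time, net work {work:+.3f}")
```

With no lag the force helps the motion exactly half the time and the net work over a cycle is zero, which is the "stiffer spring" case. With a one-eighth-cycle lag the force helps three-fourths of the time and the net work is positive, in agreement with the caption of Figure 14.6d.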
Learning to sing pitches in a musical scale accurately on demand is a matter of training the mind by repetition to control these muscles precisely. And vowel identity is determined by an entirely separate set of muscles controlling vocal tract configuration, as we shall see in Section 14.3.

The diphthongs are quick transitions, starting as one vowel but ending as another because the tongue changes shape. There is no long i sound (as in might), for instance, that can be sustained. The essence of the diphthong is in the transition, in this case ah-ee. Others in English are eh-ee (mate), aw-oo (moat), a-ou (mount), and aw-ee (oil). Diphthongs are a veritable minefield for singers, because on long notes they raise the problem of how much to lean toward the transition and how soon to finally go through with it. Trying to hold the diphthong at the halfway point may change the meaning altogether (for instance, mate to mitt).

14.3 FORMANTS

Of all the phonemes, the pure vowels are of special concern to musicians, because it is these that are sustained for the assigned length of each musical note. We should like to understand how different vowels are produced and what acoustical properties make each one distinguishable from the others.

We already have stated that vocal tract resonance is too weak to control the oscillations of the vocal cords. We may carry that line of reasoning one step further: Not only is the frequency of cord vibrations determined almost exclusively by the larynx, so also is their waveform. That is, for a given lung pressure and vocal cord opening and tension, practically the same cord vibration takes place regardless of vocal tract shape (as long as it is at least open). Producing different vowels by moving the tongue around must mean producing different filtering actions on one and the same sound from the vocal cords.

What is the sound input to the vocal tract?
If you could hear it unaltered, you would discover it is a buzzing sound rather like that from a trumpet player's lips separated from the trumpet. It is a series of puffs of air, whose exact nature depends on how forcefully the air is being sent through the larynx. For gentle sounds the vocal cords may never close completely and the waveform may be fairly smooth (Figure 14.7a). More commonly, the flow is shut off during some portion of the cycle, which may be as much as a third for high breath pressure and close initial cord spacing (Figure 14.7b,c). For purposes of understanding how these sounds are modified by the vocal tract, it is most helpful to translate each one into its recipe of Fourier components. For subsequent discussion let us take Figure 14.7b as typical of moderate intensity.

FIGURE 14.7 Approximate waveforms created in the larynx (left) and the corresponding harmonic spectra (right). Q is the total flow rate through the vocal cords in cm³/sec; dashed lines represent average flow. Note that vertical scales differ for the three graphs. (a) A very soft sound for which the glottis never completely closes; the harmonic content is extremely poor. (b) An intermediate case, with a smoothly decreasing spectrum. (c) A very loud sound, with the glottis staying closed about one-third of each cycle; the lowest half-dozen harmonics have comparable amplitudes, and the spectrum drops off beyond them.

How does the vocal tract modify the sound spectrum? It is roughly in the form of a tube about 17 cm long, closed at the inner end (as for all reed instruments) and open at the mouth. Suppose as a first rough approximation we pretend that it is a uniform cylinder. Then Figure 12.3 (page 235) reminds us that the natural frequencies are odd multiples of v/4L, or 500, 1500, 2500, 3500, ... Hz.
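The numbers just quoted follow directly from the v/4L rule. The following is a minimal sketch that reproduces them; the sound speed of 340 m/s is an assumption chosen because it gives the round values in the text for a 17 cm tube, and the function name is illustrative.

```python
def closed_open_tube_resonances(length_m, count=4, sound_speed=340.0):
    """Natural frequencies (Hz) of a uniform tube closed at one end and open
    at the other: the odd multiples 1, 3, 5, ... of v / (4 * L)."""
    base = sound_speed / (4.0 * length_m)
    return [(2 * n - 1) * base for n in range(1, count + 1)]


if __name__ == "__main__":
    # Idealized cylindrical vocal tract, 17 cm long.
    for frequency in closed_open_tube_resonances(0.17):
        print(f"{frequency:.0f} Hz")
```

A shorter tract raises every resonance in proportion; for example, a hypothetical 14 cm tube under the same assumptions has its lowest resonance near 607 Hz rather than 500 Hz.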
Suppose further (to make the numbers easy) that we have a male voice singing a pitch a little above G2, so that the frequency of vocal cord vibrations is 100 Hz and their spectrum includes all multiples of 100 Hz.

FIGURE 14.8 (a) Resonant response (input impedance versus frequency) of the hypothetical perfectly cylindrical vocal tract. (b) Strength of harmonic components (pressure amplitude versus frequency) of a spectrum like that of Figure 14.7b (smooth trends only, ignoring individual even-odd differences) after transmission through the cylindrical vocal tract for fundamental frequency 100 Hz. (c) The same for 200 Hz fundamental.

Naive application of the vocal tract resonance idea would suggest that only 500, 1500, 2500, ... Hz come out of the mouth with appreciable strength while all others are suppressed. But we must remember that these are only weak, broad resonances because of the softness of the tube walls (Figure 14.8a). So all spectral components remain in the radiated sound, and there is merely a mild boost of all frequencies within the general vicinity of each resonance (Figure 14.8b). Each such frequency range, in which amplitudes of spectral components are enhanced, is called a formant.

Suppose our subject sings an octave higher while keeping the same uniform cylindrical vocal tract shape. Then the harmonic series contains all multiples of a 200 Hz fundamental, but the formants remain the same (Figure 14.8c). The ear somehow recognizes the locations of formant regions almost independently of whatever individual frequencies make up those formants, so that the sounds represented in Figure 14.8b,c are perceived as having approximately the same vowel quality. (Specifically, it would be a relatively neutral sound such as ea in heard.)

FIGURE 14.9 Formant-defining resonance curves (frequency axis in kHz) of (a) a cone of length 17 cm, (b) a cylindrical bottle of length 9 cm and cross-sectional area 8 cm² with a neck of length 6 cm and area 1 cm², and (c) a narrow tube (length 8 cm, area 1 cm²) opening into a wider one (length 9 cm, area 8 cm²). (Based on data from Strong and Plitnik, Chapter 5.)

What about other vowels? There are a few other idealized shapes for which resonant frequencies are only mildly difficult to calculate; the one that is simple enough for us to give the answer immediately is a cone (Figure 14.9a). For the same 17-cm length, this gives formants around 1, 2, 3, ... kHz. Although these formants give a sound that would probably be identified as the short a in had, we must not jump to the conclusion that the vocal tract is really cone-shaped for that vowel; there may be other more complicated shapes whose first few formants happen to fall at similar frequencies.

Strong and Plitnik present calculated results for two cases where two cylindrical tubes of different diameter are joined (Figure 14.9b,c). These roughly approximate the vocal tract shapes, and resulting formants, for ee (heed, tongue up and forward) and aw (bought, tongue down and back), as illustrated in Figure 14.10.

FIGURE 14.10 Vocal tract configurations for vowels in see, put, let, and bought. Tongue forward for ee and e, back for u and aw; tongue high for ee and u, low for e and aw. Compare ee and aw with Figures 14.9b,c, respectively. (From The Speech Chain, by Peter B. Denes and Elliot N. Pinson. Copyright © 1963 by Bell Telephone Laboratories, Inc. Reprinted by permission of Doubleday & Company, Inc.)

Additional examples of formant spectra and corresponding waveforms are shown in Figure 14.11. The first formant frequency is especially affected by the jaw opening, the second by the body of the tongue, and the third by the placement of the tip of the tongue.

FIGURE 14.11 Waveforms (versus time) and spectra (versus frequency) for (a) ah at 150 Hz, (b) ah at 90 Hz, and (c) uh at 90 Hz. Compare (a) and (b) for same formants but different fundamental, (b) and (c) for same fundamental but different formants. (From The Speech Chain, by Peter B. Denes and Elliot N. Pinson. Copyright © 1963 by Bell Telephone Laboratories, Inc. Reprinted by permission of Doubleday & Company, Inc.)

We could proceed to show formant graphs for a long list of different vowels or a large table of their formant peak frequencies. But it is perhaps more informative to take a limited portion of this information and work it into a single picture (Figure 14.12). Although we actually are aided in vowel recognition by third and fourth formants, it is possible to distinguish vowels largely on the basis of the first two formants alone. If we choose to make a graph with horizontal axis representing first formant frequency and vertical axis representing second formant frequency, then every point on this graph represents a unique pair of formants. Ideally, each vowel would correspond to such a point, but of course in real life there is considerable variability not only from one speaker to another but also from time to time for each person. So there is a whole range of formant-frequency pairs (a whole region in this graph) that may be used to convey the same vowel information. A first formant in the vicinity of 800 Hz and second formant near 1500 Hz, for example, give uh as in sun.

FIGURE 14.12 Regions of vowel recognition in terms of the first two formant frequencies (horizontal axis: first formant frequency in Hz; vertical axis: second formant frequency in Hz), for standard spoken American English. Values for men tend to be toward the lower left of each region (both frequencies lower), for children toward the upper right, and for women intermediate.

These ranges overlap, so that in some cases precisely the same sound may be perceived in two different ways. For instance, with formants at 500 and 2100 Hz you may think you hear either hid or head. Usually only one of the two makes sense in context, and we automatically pick that one without consciously realizing the ambiguity. This flexibility in vowel perception is helpful in music: Because some vowels sound shrill when sustained, singers can be taught to “cover” their tone, shading short e (head) toward ea (heard), for example, without losing intelligibility. Thus a modified version of Figure 14.12 for sung vowels would have several of the uppermost blobs moved downward.

One of the reasons for the rather elongated area for each vowel is that men, women, and children do not all have the same size vocal tracts. They could not reasonably be asked to produce precisely the same formant frequencies. Listeners seem to readily make allowance for this: a formant pair at 700 and 1000 Hz might be perceived as aw (hawed) in a child's speech but ah (hod) in a man's, because of its position relative to other vowels heard from the same speaker.

The “long o” (moat) provides a nice illustration of the pitfalls awaiting those who want everything to fit neatly into standard pigeonholes. People who study running speech generally classify this as a diphthong, aw-oo, as we did at the end of Section 14.2. But singers will insist that they can sustain a long o, and regard it as a pure vowel. That is, they would be inclined to add another blob (sone) in the lower part of Figure 14.12, occupying the empty space among soon, soot, and sought, and partially overlapping them.

An important modern tool for analyzing the rapid succession of phonemes in running speech is the speech spectrogram (Figure 14.13).
This uses darkness of shading (or colors) to represent strength of signal so that a whole series of spectra can be displayed, with frequency of spectral components represented on the vertical axis and time elapsed on the horizontal axis. Then the acoustic features of speech production can be studied, and especially the changes in formants resulting from adjustments in vocal tract shape. Formant frequencies for pure vowels, which in singing would remain constant for a long time, seldom remain the same in running speech for even a tenth of a second. Even these vowels are modified by quick upward or downward formant shifts at the transitions to and from adjoining consonants, which generally require a different vocal tract configuration. It is the formant shifts themselves that characterize the semivowels and diphthongs.

FIGURE 14.13 A speech spectrogram (frequency in kHz versus time in seconds) for a male voice saying “say bite again” (a) with short pauses separating the words and (b) in normal running speech. The broad, dark bands running horizontally represent formants. Ignore the narrow vertical stripes, which are only an artifact of the method for producing the picture. (Courtesy of Kay Elemetrics Corporation.)

There are two factors, dynamic level and voice range, that sometimes greatly reduce the distinctness of vowel formation. Reexamine Figure 14.7, and ask what happens to each of those spectra as they pass through the formant filtering process. The results are qualitatively as shown in Figure 14.14, which suggests that an extremely soft sound may provide so little high-harmonic input that the higher formants are hard to recognize. You can easily verify, by doing a little singing, that softly sung vowels are rather colorless, while loud ones are much more distinct from one another.

FIGURE 14.14 Application of the filtering action of a cylindrical vocal tract (Figure 14.8a) to each of the signals of Figure 14.7 for fundamental frequency 100 Hz. (a) If some harmonics are extremely weak in the original spectrum, the filter will not compensate; so higher formants are sparsely populated for a soft sound. (b) An intermediate case; this is the same as Figure 14.8b. (c) The richer input spectrum of a loud sound makes its higher formants more prominent in the output.

The other problem occurs when higher pitches are sung. Higher fundamental frequencies spread out the harmonic series of the vocal cord input, even to the point where a given formant simply may not have any candidates for resonance within its range (Figure 14.15a). This means that the best sopranos, especially on their higher notes, are prepared to shift formants, deliberately sacrificing vowel accuracy to have strong enough low harmonics to make a strong and musical tone (Figure 14.15b). Several vowels all may merge into approximately the same sound on the high notes.

To close this section, we add remarks about the comparison between voice and other instruments. The discussion of vowel recognition from formants suggests returning to the question of instrument recognition (Section 6.7) and asking whether a similar mechanism may operate there. Some studies (see references on page 314) have reported identification of characteristic formant frequencies for several wind instruments. More recent work, however, led Benade to believe that they are better characterized by a single “cutoff frequency.” Below this frequency the spectra are relatively flat, while above it they fall off steeply. These cutoff frequencies typically are approximately 400 Hz for bassoons, for instance, but 1400-1600 Hz for oboes and clarinets.

Finally, a reminder: transients also provide extremely important recognition clues to supplement those available in the steady sound.
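The high-pitch problem described in this section can be illustrated by counting how many harmonics of a sung note land near each formant. The following is a rough sketch only: it reuses the idealized cylinder formants of 500, 1500, and 2500 Hz rather than the formants of any real vowel, and it treats each formant as a band of ±100 Hz with sharp edges. Both the half-width and the function name are hypothetical simplifications, since real formants are broad and have no definite boundary.

```python
def harmonics_in_formants(fundamental_hz, formants_hz,
                          half_width_hz=100, max_hz=3000):
    """For each formant center, list the harmonics of the fundamental that
    fall within half_width_hz of it (searching up to max_hz)."""
    highest = int(max_hz // fundamental_hz)
    harmonics = [n * fundamental_hz for n in range(1, highest + 1)]
    return {
        center: [f for f in harmonics if half_width_hz >= abs(f - center)]
        for center in formants_hz
    }


if __name__ == "__main__":
    cylinder_formants = [500, 1500, 2500]  # idealized values from the text
    for fundamental in (100, 200, 660):
        print(fundamental, "Hz:",
              harmonics_in_formants(fundamental, cylinder_formants))
```

Under these assumptions a 100 Hz fundamental puts three harmonics in each band and a 200 Hz fundamental puts two, but a 660 Hz soprano note puts none in any of them, which is the situation pictured in Figure 14.15a.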
FIGURE 14.15 (a) Normal formant placement for oo (cool) gives little support to the first several harmonics of 660 Hz (E5). (b) If the soprano is willing to shift formants toward could or even cull, the first two harmonics can be strengthened to give a more solid tone.

14.4 SPECIAL CHARACTERISTICS OF THE SINGING VOICE

There are significant differences in the way the vocal apparatus produces steady tones in different pitch ranges. An important part of voice training lies in learning to control and utilize transitions between "chest voice" and "head voice." This is most obvious for male singers in the change from normal voice to falsetto on high notes.

In falsetto singing, the vocal folds are stretched longer and thinner, becoming effectively stiffer and thus making it possible to vibrate at higher frequencies than could be reached otherwise. The glottis generally does not close in falsetto singing, so the waveform is relatively smooth and the tone color more pure or bland (refer to Figure 14.7a). The wider glottal opening also accounts for the singer's air supply running out more quickly in falsetto singing.

Vocal teachers sometimes use words such as resonance and projection in ways that seem vague and hard to understand to an acoustician. Remember that resonance means an especially strong vibration that occurs because some system is driven at a frequency close to that of its own free vibrations. It is reasonable enough to speak of carefully controlling jaw, tongue, and soft-palate positions to adjust vocal tract resonances to produce better tone quality. But resonance in the chest is a false issue. It may, of course, be helpful psychologically for voice students to think about their chests in a way that leads to producing a steady, strong, and well-controlled pressure at the larynx.
But any literal resonance in the chest is quite out of the question, even if we ignore the way the glottis practically isolates the lungs from the vocal tract. The spongy lung tissue is a prime example of a region that will greedily absorb any vibrations that enter it and never return any strong reflected waves.

Similarly, talk of projection may become associated in the singer's mind with muscular controls that produce louder, steadier sound at the vocal cords or a wider mouth opening. But intensity will always fall off with distance in quite the same way, and directional spreading will be controlled entirely by diffraction. Our statements in Chapter 4 make it clear that audible wavelengths from approximately 15 m down to perhaps 0.5 m (frequencies up to approximately 700 Hz) are radiated almost equally in all directions, being well able to get around the barrier formed by the head. For higher frequencies the sound goes largely to the front half-space rather than the back but still spreads well throughout that half-space. Only at extremely high frequencies (say, 6 kHz or more, corresponding to wavelengths below 6 cm) could a singer reasonably expect to literally project a beam of sound in one particular direction by using a very wide mouth opening.
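The wavelength and frequency figures quoted above are related by wavelength = (speed of sound) / frequency. The sketch below checks them under one stated assumption: a speed of sound of 343 m/s (roughly room-temperature air), a value that is not given in this section. The function names are hypothetical.

```python
# Assumed speed of sound in air at about 20 degrees Celsius (not from the text).
SPEED_OF_SOUND_M_PER_S = 343.0


def wavelength_m(frequency_hz):
    """Wavelength in meters of a sound wave of the given frequency."""
    return SPEED_OF_SOUND_M_PER_S / frequency_hz


def frequency_hz(wavelength_in_m):
    """Frequency in hertz of a sound wave of the given wavelength."""
    return SPEED_OF_SOUND_M_PER_S / wavelength_in_m


# 700 Hz corresponds to a wavelength of about half a meter,
# comparable to or larger than the head, so it diffracts around it.
print(f"700 Hz  -> {wavelength_m(700.0):.2f} m")

# 6 kHz corresponds to a wavelength a little under 6 cm,
# comparable to a wide mouth opening.
print(f"6000 Hz -> {wavelength_m(6000.0) * 100:.1f} cm")

# A 15 m wavelength sits near the bottom of the audible range.
print(f"15 m    -> {frequency_hz(15.0):.0f} Hz")
```

The comparison that matters for directionality is between the wavelength and the size of the obstacle or opening, which is why only the shortest wavelengths can be beamed by the mouth.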
Such frequencies, of course, would have to be higher harmonics; singing a fundamental anywhere near that high is entirely out of the question.

Sundberg presents an interesting explanation of how a singer (say, an operatic tenor) can make himself heard above an orchestral accompaniment. Even though the total sound output of the singer can hardly match that of the orchestra, the singer still can draw attention to his part by concentrating much of his acoustic power in a part of the spectrum where the orchestra is not so strong (Figure 14.16); the result may be perceived by listeners as "good projection."

FIGURE 14.16 Long-term average sound output (relative sound level in dB) at different frequencies for (a) typical orchestral music and (b) an operatic tenor. The tenor is heard above the orchestra partly because of his strength in the singer's formant at 2500-3000 Hz. (After Sundberg.)

This apparently is accomplished by a lowering of the larynx and accompanying expansion of the throat immediately above it, which makes a discontinuity in the cross section of the vocal tract. This enables the larynx to have some standing waves of its own, nearly independently of the rest of the vocal tract, and the first of these has a frequency of approximately 2500 to 3000 Hz. This resonance provides the singer's formant, enabling the singer to be heard well at the expense of some distortion of vowel production.

A different strategy called belting is a characteristic part of Broadway musical theater style and achieves loud sound with vowel colors more similar to speech than to operatic singing. It is produced with a narrow pharynx, elevated larynx, and high lung pressure. This keeps the vocal cords closed for a larger portion of the vibration cycle, giving relatively shorter glottal air pulses and correspondingly greater content of high overtones. Careless belting can be harmful to the voice.
A more unusual way of using the voice is the overtone singing (or "throat singing") that is esteemed in several central Asian cultures, especially in Tuva and Tibet. This too involves short glottal pulses and strong overtone content. But forward placement of the jaw, narrowed lip opening, and tongue raised in the back are used to raise and sharpen the second formant so that one of the higher harmonics becomes sufficiently prominent to be heard as a separate tone. By carefully tuning this formant to various harmonics (between the sixth and twelfth), the singer produces a high-pitched melody over a sustained constant bass note.

Vibrato (frequency modulation) is a voice characteristic that is sometimes cultivated; fashions often change. Vibrato is produced by pulsations in the cricothyroid, one of the tiny muscles in the larynx. Singers can also use variation of subglottal air pressure to achieve tremolo (amplitude modulation, sometimes confusingly called "amplitude vibrato"; refer to Section 8.3 for this distinction in terminology). The best training for a singer probably is to learn to sing both with and without vibrato, so that it can be used deliberately for special effect rather than being insistently and remorselessly present all the time.

How often should the modulation occur? Modulation frequencies of approximately 5 to 7 Hz usually are judged to be most pleasing; at 4 Hz or less the pitch will seem unclear, and 8-Hz vibrato begins to give an impression of nervousness.

How much should the audio frequency be changed by the modulation? Excursions of half a semitone on either side of the central pitch (3% in frequency) are not uncommon. Much more than that is distracting; much less than 1% will hardly be noticed; around 2% seems to be musically pleasing.

Why is vibrato considered desirable not only for the voice but for many other instruments as well?
A cynical answer from evolutionary biology would be that many human voices do waver when they sing; therefore we have come to believe it is a good thing. And because the voice does it, perhaps we think other musical sources also should. But we also could say that vibrato lends warmth to the tone, as well as drawing attention to the solo lines, where it is often strongest.

There also may be good acoustical reason to have more modest amounts of vibrato in ensemble music, where it blends in with the chorus effect discussed at the end of Chapter 5. If two or more voices sing together, a little vibrato can camouflage defects in the ability of the singers to stay perfectly in tune with one another. Similarly, the sound of a string section in an orchestra may blend better when there is some vibrato. The problem encountered in the opposite case of rigidly fixed frequencies becomes quite clear if you ever hear two organs (or to a lesser extent two pianos) together; modest amounts of mistuning between them become prominent and distracting.

Finally, why is 6 Hz more desirable than other values for the modulation frequency? At least two theories have proposed that modulation is most easily produced at this frequency, but there is no strong proof for either one. Certain natural brain rhythms occur at similar frequencies, so perhaps it is at these frequencies that it is easiest for the brain to send the series of electrical control signals to whatever muscles are used to produce the vibrato or tremolo. Alternatively, crude estimates of the abdominal mass and the springiness of the air in the lungs suggest that the natural vibration frequency for the body contents is approximately 5 or 6 Hz; this makes it easier to maintain a good tremolo at such frequencies, and so we have come to prefer it. Finally, there may be a perceptual reason, too: Any sounds within one- or two-tenths of a second tend to become merged during processing by the ear and brain.
Thus, modulation frequencies much above 5 Hz will become harder to perceive as such; there will seem to be a homogenized rough sound instead of a rhythmically varying simple sound.

SUMMARY

The human voice produces three distinct types of sounds. The plosive consonants are intrinsically transient and are made by blocking and then suddenly opening the vocal tract. The fricatives depend on turbulence in air forced through a narrow opening to generate a continuous noise. Only the vowels have periodic waveform and definite pitch, and they depend on vocal cord vibration. Those vibrations are of the lip reed type and are aided by the pressure reduction that occurs where an airstream passes through a constriction (the Bernoulli effect).

The pitch of a sustained vowel is determined entirely by the vocal cords. But the vowel identity is determined by the shape of the vocal tract, which determines the location in frequency of the broad resonance bands called formants. A formant enhances those components of the harmonic series of the vocal cord vibration that happen to fall within the formant region. Distinct vowel formation is more difficult for soft sounds or high pitches, in both cases because of a dearth of harmonics to populate the formants and make them recognizable.
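As a check on the vibrato depths quoted in the discussion of vibrato earlier in this chapter (half a semitone being about 3% in frequency), the conversion between semitones, percent, and cents can be sketched as follows. This assumes equal temperament, in which one semitone is a frequency ratio of 2^(1/12); that assumption and the function names are illustrative and do not come from the text.

```python
import math


def semitones_to_percent(semitones):
    """Percent change in frequency for a pitch shift in equal-tempered semitones."""
    return (2.0 ** (semitones / 12.0) - 1.0) * 100.0


def percent_to_cents(percent):
    """Pitch shift in cents (hundredths of a semitone) for a percent frequency change."""
    return 1200.0 * math.log2(1.0 + percent / 100.0)


# Half a semitone either side of the central pitch: a little under 3%.
half_semitone = semitones_to_percent(0.5)
print(f"half a semitone = {half_semitone:.2f}% in frequency")

# The depths mentioned in the text, expressed in cents.
for depth_percent in (1.0, 2.0, 3.0):
    cents = percent_to_cents(depth_percent)
    print(f"{depth_percent:.0f}% excursion = {cents:.0f} cents")
```

On this scale the 2% excursion described as musically pleasing is roughly a third of a semitone on either side of the central pitch.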
