THE EAR

A sound source produces sound waves by alternately compressing and rarefying the air between it and the listener. The compression causes an increase of pressure above the normal atmospheric pressure on the ear, corresponding to the positive half-cycles of the waveform. The rarefaction causes a decrease of pressure below the normal atmospheric pressure on the ear, corresponding to the negative half-cycles of the waveform. The ear responds to these variations in pressure by producing the sensation of hearing.

The frequency of the wave reaching the ear determines the pitch the listener hears. The ear perceives frequencies that differ by factors of two to be especially related, and this relationship is the basis of the musical octave. For example, since concert A is 440 Hz, the ear hears 880 Hz as having a special relationship to concert A; namely, that it is the tone higher than concert A that sounds most like concert A. The next note above 880 Hz that sounds most like 440 Hz is 1760 Hz. Therefore, 880 Hz is said to be one octave above 440 Hz, and 1760 Hz is said to be two octaves above 440 Hz.

The human ear does not respond to pressure waves of all frequencies. It responds to a frequency range of about 15 Hz to 20 kHz, a range of roughly ten and one-half octaves. Some young people may hear as high as 23 kHz, but the high-frequency response of the ear drops off with age, and many people over 60 years cannot hear above 8 kHz.

The ear operates over an energy range of more than $10^{12}$:1 and compresses the perceived intensity level to protect itself. The loudness of a sound is perceived by the ear as varying approximately in proportion to the logarithm of its energy. As a result, increasing the power output of an amplifier by 10 watts, from 10 to 20 W, gives a significantly greater volume increase than increasing the power output from 60 to 70 W. The ear is sensitive to the ratio of the two power levels, and 20 W is two times 10 W, a ratio of 2:1. To get the same increase in loudness, the 60-W power output would also have to be doubled, to 120 W.
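The octave and power-ratio arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `octaves_between` and `power_ratio` are made up for this example and do not come from the text.

```python
import math

def octaves_between(f_low_hz: float, f_high_hz: float) -> float:
    """Number of octaves (frequency doublings) from f_low_hz up to f_high_hz."""
    return math.log2(f_high_hz / f_low_hz)

def power_ratio(p_from_w: float, p_to_w: float) -> float:
    """Ratio of two amplifier power levels; the ear tracks this ratio, not the difference."""
    return p_to_w / p_from_w

# Concert A and its octaves.
print(octaves_between(440, 880))    # one octave
print(octaves_between(440, 1760))   # two octaves

# Width of the audible range quoted in the text (15 Hz to 20 kHz).
print(round(octaves_between(15, 20_000), 2))

# The same 10-W step is a very different ratio at different starting powers.
print(power_ratio(10, 20))   # 2:1
print(power_ratio(60, 70))   # much closer to 1:1
print(power_ratio(60, 120))  # 2:1 again, so the same loudness step as 10 W to 20 W
```

The audible range works out to a little under ten and a half octaves, which agrees with the rounded figure in the text.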
The ear has its greatest sensitivity in the range of 1 to 4 kHz. This means that a 1-kHz sine wave that produces a given sound pressure will sound louder than a 10-kHz sine wave that produces the same sound pressure. In addition, the nature of the ear causes it to produce harmonic distortion of sound waves above a certain volume level. Harmonic distortion is the production of harmonics of a waveform that do not exist in the original signal. Thus, the ear can cause a loud 1-kHz wave to be heard as a combination of 1-kHz, 2-kHz, 3-kHz, etc., tones.

Harmonics are very important with respect to musical instruments because their presence and relative intensities in the sound waves produced enable the ear to differentiate between instruments playing the same fundamental tone. For example, a violin has a set of harmonics differing in degree and intensity from those of a viola. This overtone structure is called the timbre of an instrument.

SOUND-PRESSURE LEVEL

The Fletcher-Munson equal-loudness contours indicate the average ear response to different frequencies at different levels. Each contour indicates the sound-pressure level (spl) required at different frequencies to produce the same perceived loudness. Thus, to equal the loudness of a 1.5-kHz tone at a level of 110-dB spl, which is the level typically created by a trumpet-type car horn at a distance of three feet, a 40-Hz tone has to be 2 dB greater in sound-pressure level, while a 10-kHz tone must be 8 dB greater than the 1.5-kHz tone to be perceived as being as loud. At 50-dB spl, the noise level present in the average private business office, the level of a 30-Hz tone must be 30 dB greater, and a 10-kHz tone must be 14 dB greater

than a 1.5-kHz tone to be perceived as being at the same volume. Thus, if a piece of music is monitored at a sound-pressure level of 110 dB and sounds well balanced, it will sound deficient in both bass and treble when played at a level of 50-dB spl. The loudness level of a tone can also affect the pitch the ear perceives. For example, if the intensity of a 100-Hz tone is increased from 40- to 100-dB spl, the pitch will decrease by about 10%. At 500 Hz, the pitch changes by about 2% for the same increase in sound-pressure level.

As a result of the nonlinearity of the ear, tones can interact with each other within it, rather than being perceived separately. Three types of interaction effects occur: beats, combination tones, and masking.

PERCEPTION OF DIRECTION

One ear cannot discern the direction from which a sound comes, but two ears can. If there is no difference between what the left and right ears hear, the source appears to be the same distance from each ear, and the only place a single source could fulfill this requirement is directly in front of the listener. This phenomenon allows the recording engineer to place sound not only in the left and right speakers of a stereo system but also between the speakers. When the same signal is fed to both speakers, the listener hears the sound identically in both ears, and the source appears to be directly in front of the listener. By changing the proportions fed to the two speakers, the engineer can create the illusion that the sound source is anywhere between the two

speakers, and he can make the source appear to move. This technique is called panning. The other localization cues can also be used by the engineer to assign the source to locations between the two speakers through the use of electronic time delays, phase shifters, and filters. The ear can distinguish between a source in front of the listener and a source behind, above, or below him through small movements of the head, which provide slightly different perspectives of the sound due to the interference of the outer ear and the features of the face and body with the sound waves.

LOUDNESS LEVELS

Because the ear hears over such a large energy range, a logarithmic scale has been adopted to compress the measurements of sound into more workable figures. The system used is the decibel (dB). The decibel has no units attached to it; rather, it expresses the ratio of two powers according to the formula $\mathrm{dB} = 10 \log (P_1/P_2)$… We can say, for example, that the 60-W amplifier is 1.76 dB more powerful than the 40-W amplifier. Since the ear responds proportionally to the power level in dB, the 60-W amp would be 1.76 dB louder than the 40-W amp if they were alternately connected to the same speaker. The ear requires equal ratios of power change to produce equal changes in loudness (increase or decrease). Thus, a 1.76-dB power increase from 40 W to 60 W increases the loudness by the same amount as an increase from 60 W to 90 W, because this is also an increase of 1.76 dB.

Under average conditions, the minimum level change that the ear can perceive is about 1.0 dB, although under laboratory or studio conditions of switching sound levels with no appreciable time delay between them, the sensitivity to change can be as fine as 0.25 dB at certain frequencies. Thus, an increase from 40 W to 60 W of power does not bring about much of a loudness increase. The ear does not hear power directly; the power must first be converted into sound waves.
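The amplifier comparison above follows directly from the decibel formula. The sketch below assumes base-10 logarithms and uses a hypothetical helper name, `power_db`, chosen for this example.

```python
import math

def power_db(p1_w: float, p2_w: float) -> float:
    """Level difference in dB between power p1_w and reference power p2_w."""
    return 10 * math.log10(p1_w / p2_w)

# 60 W versus 40 W, and 90 W versus 60 W: the same 1.5:1 ratio, so the same dB step.
print(round(power_db(60, 40), 2))
print(round(power_db(90, 60), 2))

# Doubling the power (10 W to 20 W) is a step of about 3 dB.
print(round(power_db(20, 10), 2))
```

Both the 40-to-60-W and the 60-to-90-W steps come out at about 1.76 dB, only a little above the roughly 1-dB change the text cites as the smallest the ear notices under average conditions.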
The intensity of the sound wave produced is directly proportional to the power which produced it, and therefore the ratio of two intensities is equal to the ratio of the two powers which produced them. The intensity of sound is usually measured indirectly, through the measurement of sound-pressure levels (spl). Since the intensity of a sound wave is proportional to the square of the sound pressure, the power in the wave is also proportional to the square of the sound pressure. Or, $I_1/I_2 = (\mathrm{spl}_1)^2/(\mathrm{spl}_2)^2 = P_1/P_2$. Hence, $\mathrm{dB} = 10 \log (P_1/P_2) = 20 \log (\mathrm{spl}_1/\mathrm{spl}_2)$.

THRESHOLD OF HEARING

In the case of spl, a convenient reference-pressure level is that of the threshold of hearing, which is the minimum sound pressure that produces the phenomenon of hearing in most people, and is equal to 0.0002 microbar. One microbar is equal to one-millionth of normal atmospheric pressure, so it is apparent that the ear is extremely sensitive. In fact, if the ear were any more sensitive, the thermal motion of the molecules of the air would be audible. The use of 0.0002 microbar as the reference level ($\mathrm{spl}_2$) is indicated by expressing a value as a certain number of dB spl. The reference pressure level is called 0-dB spl. The threshold of hearing is defined as the spl for a specific frequency at which the average person can hear only 50% of the time.

THRESHOLD OF FEELING

The spl that will cause discomfort in a listener 50% of the time is called the threshold of feeling and occurs at a level of about 118 dB spl between 200 Hz and 10 kHz.
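The 20 log relation and the 0.0002-microbar reference can be turned into a small conversion sketch. The function names `spl_db` and `pressure_from_spl` are hypothetical, and pressures are kept in microbars to match the text.

```python
import math

# Reference pressure from the text: the threshold of hearing, in microbars.
P_REF_MICROBAR = 0.0002

def spl_db(pressure_microbar: float) -> float:
    """Sound-pressure level in dB spl for a pressure given in microbars."""
    return 20 * math.log10(pressure_microbar / P_REF_MICROBAR)

def pressure_from_spl(level_db_spl: float) -> float:
    """Inverse conversion: pressure in microbars for a level in dB spl."""
    return P_REF_MICROBAR * 10 ** (level_db_spl / 20)

# The reference pressure itself sits at 0 dB spl.
print(spl_db(P_REF_MICROBAR))

# Pressure at the 118-dB threshold of feeling quoted in the text.
print(round(pressure_from_spl(118), 1))

# Every tenfold increase in pressure adds 20 dB.
print(round(spl_db(10 * P_REF_MICROBAR), 1))
```

Because pressure enters squared, a factor of ten in pressure is 20 dB, whereas a factor of ten in power is only 10 dB.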

THRESHOLD OF PAIN

The spl that causes pain in a listener 50% of the time is called the threshold of pain and corresponds to 140 dB spl in the range between 200 Hz and 10 kHz.

Microphone (MIC)

Dynamic Mics

The diaphragm is attached to a coil of wire located close to a permanent magnet. As the diaphragm vibrates according to the sound waves which reach it, the “voice coil” moves back and forth inside the magnetic field. This creates an electrical current, which is directly proportional to the movement of the diaphragm. Since dynamic microphones utilize an internal coil, they are sometimes called “moving coil” microphones. Dynamic mics are capable of producing excellent sound fidelity, and their rugged construction makes them relatively insensitive to the harsh handling that production mics are subject to in daily operation.

Ribbon Mics

Also called a “velocity” microphone, the ribbon mic uses a permanent magnet to provide a magnetic field and a thin metal strip or ribbon, which serves both as the diaphragm, to receive the sound waves, and as the generating element, to produce the electrical current. When the ribbon moves, it cuts through the lines of flux generated by the magnet, and a voltage is induced in the ribbon. The motion of the ribbon is determined by the difference in pressure between its front and back sides. This difference is proportional to the velocity of the air molecules which make up the wave.

Condenser Mics

Condenser mics use a capacitor as the generating element. A capacitor is a device consisting of two plates which can hold an electrical charge when current is supplied. (Hence, a power supply is required.) The charge the capacitor can hold depends on the physical distance between the two plates, and varies as the space between them grows larger or smaller. The diaphragm of a condenser mic is actually one of the capacitor plates, which moves relative to the other, stationary plate.
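How plate spacing affects a capacitor can be sketched with the ideal parallel-plate formula, C = ε0·A/d. This is a textbook idealization rather than a model of any particular microphone: the plate area, gap, and charge below are invented for illustration, and the charge on the plates is assumed to stay constant while the diaphragm moves.

```python
# Permittivity of free space in farads per metre (air is treated as vacuum here).
EPSILON_0 = 8.854e-12

def plate_capacitance(area_m2: float, gap_m: float) -> float:
    """Capacitance in farads of an ideal parallel-plate capacitor."""
    return EPSILON_0 * area_m2 / gap_m

def plate_voltage(charge_c: float, area_m2: float, gap_m: float) -> float:
    """Voltage across the plates for a fixed charge (V = Q / C)."""
    return charge_c / plate_capacitance(area_m2, gap_m)

# Hypothetical capsule: 2 cm^2 diaphragm, 25-micrometre resting gap, fixed charge.
AREA_M2 = 2e-4
REST_GAP_M = 25e-6
CHARGE_C = 4e-9

for gap_m in (24e-6, 25e-6, 26e-6):
    c_pf = plate_capacitance(AREA_M2, gap_m) * 1e12
    volts = plate_voltage(CHARGE_C, AREA_M2, gap_m)
    print(f"gap {gap_m * 1e6:.0f} um: {c_pf:.1f} pF, {volts:.2f} V")
```

Under these assumptions the voltage rises and falls in step with the gap, which is the sense in which diaphragm motion becomes an electrical signal.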
When air-pressure variations from the sound source move the diaphragm, the space between the two capacitor plates varies, and the resulting changes in capacitance modulate the voltage of the supply, producing an electrical signal corresponding to the sound wave. Electret condenser mics employ a capacitor with a permanently charged, metallized plastic membrane. For powering, a low dc voltage supplied by a small silver-oxide or mercury battery is sufficient. Some electret mics can also take external powering. Another type is the RF condenser mic, in which the capacitor is powered by an 8-MHz signal derived from an external power supply.

Directional Mics

1) Pressure-gradient principle

The front of the sensing diaphragm has unimpeded access to the sound pressure. The back of the diaphragm has limited access to the sound pressure. This difference creates a pressure gradient, causing the diaphragm to react differently to sound from various directions.

2) Interference-tube principle

The tube, attached in front of the mic’s diaphragm, contains a large number of sound inlets distributed over the length of the tube. Each inlet is damped in a specific way to partially cancel the sound within the tube, depending on the angle of sound incidence.

3) Combination

Signals arriving from the rear of the mic are acoustically phase shifted in the body of the mic and applied to the back of the diaphragm, so that a signal arriving from the rear of the mic is applied equally to both the front and back of the diaphragm and generates no output. Signals arriving from the front reach the diaphragm without any phase shifting. A signal from the front which enters the ports is phase shifted twice: once by the time it takes for the wave to travel the external distance from the diaphragm to the ports, and a second time by the phase-shift network inside the case. When this wave reaches the back of the diaphragm, it is in phase with the wave at the front and reinforces it.

Directional mics have the property that their bass response increases as the signal source gets closer to the mic; this is called the proximity effect. (Higher frequencies have shorter wavelengths, experience a greater phase change, and cancellation takes place.)

Mic Impedance (low: 150-600 Ω; high: 20-50 kΩ)

Important things to remember:
1) Mic and recorder (or mixer) should be compatible with respect to impedance.
2) It is better to have the mic lower in impedance than the recorder (or mixer) than vice versa.
3) Low impedance is more desirable than high impedance for professional gear:
- it is less susceptible to electrostatic pickup (noise), e.g., from lamps and motors
- long cables may be used (cable capacitance increases with cable length, shorting out high-frequency information)

EQUALIZERS

One of the most important signal-processing devices used in the multitrack studio is the frequency equalizer. This device gives the engineer control over the harmonic balance or timbre of instruments heard by the listener and can be used to compensate somewhat for deficiencies in microphone frequency response or for deficiencies in the sound of an instrument. The frequency equalizer has several uses:
• to make the sounds from several mics or several tape tracks blend better than they would otherwise
• to match the sound of an over-dubbed instrument to the same instrument recorded with a different mic or in a different place
• to make an instrument sound completely different from the way it normally does for a particular effect
• to increase the separation between instruments by rolling off the leakage frequencies.

Equalization (EQ) refers to altering the frequency response of an amplifier so that certain frequencies are more or less pronounced than others. It is specified as plus or minus a certain number of dB at a certain frequency. Although only one frequency is specified at a time, the frequency set on equalizers used in the studio actually refers to a curve, so adding +4 dB at 5 kHz also boosts signals at 4 kHz and 6 kHz somewhat.

LOUDSPEAKERS

In the recording process, judgments and adjustments of the sound quality are based entirely on what is heard through the monitor system; thus, it is extremely important that monitors be set up and used properly. Speakers are the weakest link in the audio chain because their response is the most difficult to make flat. In addition, the acoustics of a room can create large peaks and valleys in the frequency response at the listening location. The only place a speaker can truly be designed to have flat response is in an anechoic chamber, i.e., a room that absorbs all the speaker output and reflects none of it back: what is heard is the direct output of the speaker. Unless the rooms are of identical dimensions and furnishings, a speaker will sound different (it will have a different frequency-response curve) in every room in which it is placed. Although equalizers can flatten speaker response, the output of many speaker systems and amplifiers falls off at the low end and becomes distorted if too much low-frequency signal is applied. Thus, flattening the response of a speaker system may lower its undistorted volume-producing capability, making it unusable if high monitor levels are desired.

Crossover Networks

Because individual speaker elements (called drivers) are cleaner and more efficient in some frequency ranges than in others (i.e., have more undistorted output for the same level of input signal), different drivers are often used in conjunction with one another to obtain the desired output.
Large-diameter drivers such as 15-inch units produce low-frequency information more efficiently than high-frequency information; medium-size speakers such as 4- and 5-inch units produce the midrange better than highs or lows; and small speakers (1/2 or 1 inch) produce highs better than the other ranges. These speakers are connected by crossover networks, which prevent any signals outside a certain frequency range from being applied to the speaker. The networks usually have one input and two outputs. Input signals above the crossover frequency are fed to one output, while signals below the crossover frequency are fed to the other output. The crossover network uses inductors and capacitors and is designed so that a signal at the crossover frequency will be sent equally to both outputs to provide a smooth transition from speaker to speaker. If a speaker system has only one crossover frequency, it is called a two-way system because it divides the signal into two bands. If the system has two crossover frequencies, it is called a three-way system. As many crossover frequencies as desired can be used, but most manufacturers use either two- or three-way systems.

Electronic Crossover

Better crossover networks, called electronic crossovers, have been designed in the past few years. Instead of being connected between a single power amp and several drivers, they are used between a preamp and several power amps. Each driver is fed directly by its own power amp (a three-way system would need three power amps per channel). There are several advantages to this approach.

(1) Since the signals are at low levels within the electronic crossover, active filters without inductors can be used, thus removing a source of intermodulation distortion.
(2) Power losses due to the resistance of inductors in the passive crossover network are eliminated.
(3) Since each frequency range has its own power amp, the full power of the amplifier is available to it regardless of the power requirements of the other ranges.

SOUND IN ENCLOSED ROOMS

Good Acoustics – Governing Factors

Reverberation Time or Amount of Reverberation: This varies with frequency and is measured by the time required for a sound, when suddenly interrupted, to die away or decay to a level 60 decibels below the original sound. The reverberation time and the shape of the reverberation-time/frequency curve can be controlled by selecting the proper amounts and varieties of sound-absorbent materials and by the methods of application. Room occupants must be considered, inasmuch as each person present contributes a fairly definite amount of sound absorption.

Room Sizes and Proportions for Good Acoustics

The frequency of standing waves is dependent on the room size: the frequency decreases as the distance between walls, and between floor and ceiling, increases. In rooms with two equal dimensions, the two sets of standing waves occur at the same frequency, with a resultant increase of reverberation time at the resonant frequency. In a room with walls and ceilings of cubical contour this effect is tripled, and elimination of standing waves is practically impossible. The most advantageous ratio for height : width : length is in the proportion of $1 : 2^{1/3} : 2^{2/3}$, that is, with the dimensions separated by 1/3 or 2/3 of an octave. In properly proportioned rooms, resonant conditions can be effectively reduced and standing waves practically eliminated by introducing numerous surfaces disposed obliquely.
Thus, large-order reflections can be avoided by breaking them up into numerous smaller reflections. The object is to prevent sound reflection back to the point of origin until after several reflections.

Optimum Reverberation Time

Optimum, or most desirable, reverberation time varies with (a) room size, and (b) use, such as music, speech, etc. The desirable reverberation time for any frequency between 60 and 8000 hertz may be found by multiplying the reverberation time at 512 Hz by the desirable ratio corresponding to the frequency chosen. The reverberation time affects the intelligibility of speech unless suitable speech cadences are developed.

The Sabine Equation

At the turn of the twentieth century, W. C. Sabine, a professor of physics at Harvard University, experimented with the correction of a poor acoustic environment by the introduction of seat cushions taken from an acceptable acoustic environment. As a result of the experiments, he wrote the first usable reverberation-time equation:
$$RT_{60} = \frac{0.049\,V}{S\bar{a}}$$

where RT60 is the time in seconds required for a sound to decay 60 dB, V is the volume of the room in cubic feet, S is the boundary surface area in square feet, and ā is the average absorption coefficient. The value of ā is:

$$\bar{a} = \frac{s_1 a_1 + s_2 a_2 + \cdots + s_n a_n}{S}$$

where $s_1$, $s_2$, etc., are boundary surface areas; $a_1$, $a_2$, etc., are the absorption values for the boundary areas with which they are associated; and $s_n a_n$ is the total absorption of the people, furniture, etc., present in the room. For metric use, the constant 0.049 becomes 0.161, V is in cubic meters, and S is in square meters. The total sound absorption is calculated from a = αS, where α is the absorption coefficient (fraction of incident sound absorbed) for each surface area S.

As an example, consider (at 500 Hz) a living room 20 ft long, 13 ft wide, and 8 ft high, with a plaster ceiling (0.02), a carpeted floor (0.30), a wood-paneled side wall (0.12), an opposite glass wall (0.03), an end wall of medium drapery (0.40), and a brick fireplace (0.02) for the other end wall. With no additional furnishings or occupants, the total sound absorption would be (0.02 + 0.30)260 + (0.12 + 0.03)160 + (0.40 + 0.02)104 = 151 sabins, which equals 0.144 × 1048, where 1048 sq ft is the total boundary surface area. The average absorption coefficient is therefore 0.14. The reverberation time at 500 Hz would be approximately 0.68 s.

The optimum reverberation time for a room depends upon room volume, sound frequency, and the type of sound which is most important to the room function, e.g., conversation, recorded music, or instrumental music. Larger rooms need greater reverberation to reinforce the loudness of sound at typically greater distances from the sound source. Low-frequency sounds need a longer 60-dB reverberation time than medium- or high-frequency sounds to have an equivalent audible duration of reverberation, because of the higher threshold of audibility at low frequencies. Speech intelligibility can be degraded somewhat by the same amount of reverberation needed for maximum appreciation of some types of music, such as classical organ. Conductors and musicians prefer a crowded rehearsal studio to be less reverberant than an equivalent stage space in a large concert hall for the same type of music.
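The living-room example can be reproduced step by step. The sketch below uses the dimensions and absorption coefficients quoted in the text with the Sabine equation in its imperial form; the variable and function names are chosen only for this illustration.

```python
# Room dimensions from the worked example, in feet.
LENGTH_FT, WIDTH_FT, HEIGHT_FT = 20, 13, 8

# (surface, area in sq ft, absorption coefficient at 500 Hz) as quoted in the text.
surfaces = [
    ("plaster ceiling",       LENGTH_FT * WIDTH_FT,  0.02),
    ("carpeted floor",        LENGTH_FT * WIDTH_FT,  0.30),
    ("wood-paneled side wall", LENGTH_FT * HEIGHT_FT, 0.12),
    ("glass side wall",       LENGTH_FT * HEIGHT_FT, 0.03),
    ("drapery end wall",      WIDTH_FT * HEIGHT_FT,  0.40),
    ("brick fireplace wall",  WIDTH_FT * HEIGHT_FT,  0.02),
]

def sabine_rt60(volume_ft3: float, total_absorption_sabins: float) -> float:
    """Sabine reverberation time in seconds (imperial constant 0.049)."""
    return 0.049 * volume_ft3 / total_absorption_sabins

volume_ft3 = LENGTH_FT * WIDTH_FT * HEIGHT_FT
total_area_ft2 = sum(area for _, area, _ in surfaces)
total_absorption = sum(area * coeff for _, area, coeff in surfaces)
average_coefficient = total_absorption / total_area_ft2
rt60_s = sabine_rt60(volume_ft3, total_absorption)

print(f"Total absorption: {total_absorption:.0f} sabins")
print(f"Average absorption coefficient: {average_coefficient:.2f}")
print(f"RT60 at 500 Hz: {rt60_s:.2f} s")
```

Since S multiplied by the average coefficient is just the total absorption, the function takes the total directly; the figures printed match the 151 sabins, 0.14, and 0.68 s given in the example.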
Acoustical requirements for sound distribution also depend upon the room function. For example, a lecture room needs outstanding one-way speech distribution from a rostrum (usually near one end of the room) to an audience seated toward the other end of the room. (This calls for reflective surfaces near the lecturer and absorption at the opposite end to minimize reverberation.) By contrast, a conference room has many interchangeable source and receiver locations. A concert hall is the musical equivalent of combining the conference (a musical ensemble) with chiefly one-way communication from the performing group to the audience. A courtroom is an example having several scattered but well-defined source locations (judge, witness, attorneys) and seated groups of listeners (jury, public). All these situations require a combination of beneficially shaped sound-reflecting surfaces near the sound sources, acoustically absorptive audience areas, and acoustically absorptive or diffusing surfaces beyond the audience areas.

Room echoes are discrete, separately heard sound reflections occurring too late to provide beneficial reinforcement to the direct sound. Beneficial early reflections arrive within

about 20 ms of direct-sound arrival. A concentrated echo arriving more than 50 ms late is a serious acoustical defect. A flutter echo is a rapid (usually regular) succession of reflected pulses resulting from a single initial pulse.

Basic Considerations in the Measurement, Calculation, and Application of Acoustic Treatment

While the measurements and/or calculations of RT60, ā, etc., can be made today with acceptable accuracy, these techniques and equations supply only a few hints of the variations in application of the material itself to achieve the optimum results with the minimum cost. A few of the more basic rules are listed:
1. Diffusion is highly desirable, and both absorption and room geometry should be employed to enhance it.
2. Every effort should be made to preserve useful reflecting surfaces (those within 30 to 50 feet of a sound source).
3. Rarely should absorption be placed on ceilings. Preferred choices include the floor (carpets lower noise levels at the source as well as providing absorption), rear walls, etc.
4. It should be considered that too high an RT60 will detrimentally affect intelligibility, and an RT60 that is too low requires much higher power output to overcome the excessive absorption.
5. Low-frequency absorption is usually controlled by diaphragmatic action and high-frequency absorption by soft, fuzzy materials.
6. Materials useful as absorbers are almost never useful as isolators. Absorbers are intended to control the reverberant field within an acoustic environment. Isolators are intended to keep sound inside a given environment or to keep sounds in other environments outside of the given environment. Good isolators are characterized by mass and rigidity. Good absorbers are characterized by porousness and nonrigidity.

Successful speech and music reinforcement systems require a threefold design solution:
1. The reconciliation of the reverberation time, the directivity factor of the loudspeaker, and the distance from the sound source to the farthest listener so that an acceptable articulation loss for consonants in speech is obtained.
2. The adjustment of the system parameters, within the limits set forth as necessary to achieve good articulation, to ensure the required acoustic gain.
3. The determination from the first two steps of the electrical power required at the input of the transducers to produce the acoustic power needed at the listener’s ear.

“The complex theories of physical acoustics, the uncertainties in predicting variations in the properties of both natural and manufactured building materials, and the difficulty in controlling installation procedures such as painting, sealing, bracing, furring, and draping all keep the specification and prediction of room acoustics from being an exact science. The frequent need to evaluate the final result by listening rather than by purely quantitative methods gives acoustical planning a somewhat subjective aspect. This is not unique. The solutions to many engineering problems involve human experience and statistical factors.”