Basics of Acoustics:
Sound
Sound is a form of energy, similar to heat and light. It is generated by vibrating objects and can flow through a material medium from one place to another. During generation, the kinetic energy of the vibrating body is converted to sound energy. Acoustic energy flowing outwards from its point of generation can be compared to a wave spreading over the surface of water. When an object starts vibrating or oscillating rapidly, part of its kinetic energy is imparted to the layer of the medium in contact with the object, e.g. the air surrounding a bell. The particles of the medium, on receiving the energy, start vibrating themselves and in turn impart a portion of their energy to the next layer of air particles, which also start vibrating. This process continues, thereby propagating the acoustic energy throughout the medium. When it reaches our ears it sets the ear-drums into a similar kind of vibration, and our brain recognizes this as sound.
Acoustics is the branch of science dealing with the study of sound and is concerned with the generation, transmission and reception of sound waves. The application of acoustics in technology is called acoustical engineering. The main sub-disciplines of acoustics are: aero-acoustics, bio-acoustics, biomedical acoustics, psycho-acoustics, physical acoustics, speech communication, ultrasonics and musical acoustics. Psycho-acoustics is concerned with the hearing, perception and localization of sound by human beings.
Psycho-Acoustics [taken from e-material]
In harnessing sound for various musical instruments as well as for multimedia applications, the effects of sound on human hearing and the various factors involved need to be analyzed. Psycho-acoustics is the branch of acoustics which deals with these effects.
Nature of Sound Waves
As the sound energy flows through the material medium, it sets the layers of the medium into oscillatory motion. This creates alternate regions of compression and expansion. This is pictorially represented as a wave, the upper part (i.e., the crest or positive peak) denoting a compression and the lower part (i.e., the trough or negative peak) denoting a rarefaction. Since a sound wave actually represents a disturbance of the medium particles from their original position (i.e., before the wave started), it cannot exist in a vacuum. Sound waves have two characteristic properties. Firstly, they are longitudinal waves, which means that the direction of propagation of sound is the same as the direction along which the medium particles oscillate. Secondly, sound waves are mechanical waves. This means that they are capable of being compressed and expanded like springs. When they are compressed, the peaks come closer together, while on expansion the peaks move further apart. On compression the frequency of the sound increases and it appears more high-pitched, while on expansion the frequency decreases, making it appear more dull and flat.
Spatial and Temporal Waves
Waves can be of two types: spatial and temporal waves.
Spatial waves represent the vibrating states of all particles in the path of a wave at an instant of time. The horizontal axis represents the distance of the particles. The particles at points O and D have the same state of motion at that instant and are said to be in the same phase. The distance of separation between two such points in the same phase, i.e. the length of the wave between O and D, is called the Wavelength.
Temporal waves represent the state of a single particle in the path of a wave over a period of time. The horizontal axis represents the time over which the wave flows. The state of the particle is the same at instants O and D, and the particle is said to have undergone one complete cycle or oscillation. The time that elapses between two instants at which the particle is in the same phase, i.e. the interval between O and D, is called the Time Period of the wave.
Fundamental Characteristics
A sound wave has three fundamental characteristics: amplitude, frequency and waveform. The Amplitude of a wave is the maximum displacement of a particle in the path of the wave from its mean position, and is the peak height of the wave. The physical manifestation of amplitude is the intensity of energy of the wave. For sound waves this corresponds to the loudness of the sound. Loudness is measured in a unit called the decibel, denoted by dB.
The second characteristic is Frequency. This measures the number of vibrations of a particle in the path of a wave in one second. The higher the frequency of the wave, the larger the number of oscillations per second. The physical manifestation of the frequency of a sound wave is the pitch of the sound. As the frequency of the sound increases, the pitch becomes higher and the sound becomes shriller. Frequency is measured in a unit called the Hertz, denoted by Hz. A sound of 1 Hz is produced by an object vibrating at the rate of 1 vibration per second. The total range of human hearing lies between 20 Hz at the lower end and 20,000 Hz (or 20 kHz) at the higher end. The Time Period of a wave is the time taken to complete one oscillation. The time period is inversely proportional to the frequency of the sound wave.
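The inverse relation between time period and frequency can be illustrated with a short Python sketch. The function name is hypothetical and chosen only for this example; the two frequencies are the limits of human hearing quoted above.

```python
def period_from_frequency(frequency_hz: float) -> float:
    """Return the time period (in seconds) of a wave of the given frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency_hz


# Limits of human hearing quoted in the text: 20 Hz and 20 kHz.
for f in (20.0, 20_000.0):
    period_ms = period_from_frequency(f) * 1000.0
    print(f"{f:>8.0f} Hz -> period {period_ms:.3f} ms")
```

A 20 Hz sound therefore completes one oscillation in 50 ms, while a 20 kHz sound needs only 0.05 ms.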
The third characteristic is the Waveform. This is the actual shape of the wave when represented pictorially. The physical manifestation is the quality or timbre of sound. This helps us to distinguish between sounds coming from different instruments like guitar and violin.
Dynamic Range
The term dynamic range is used to mean the ratio of the maximum amplitude of undistorted sound in a piece of audio equipment, such as a microphone or loudspeaker, to the amplitude of the quietest sound possible, which is often determined by the inherent noise characteristics of the device. The term is often used to indicate the ratio of the maximum level of power, current or voltage to the minimum detectable value. In music, dynamic range is used to mean the difference between the quietest and loudest volume of an instrument. For digital audio, the dynamic range is synonymous with the signal to noise ratio (SNR) and is expressed in dB. It can be shown that increasing the bit-depth of the digital audio by 1 bit increases its dynamic range by approximately 6 dB.
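The 6 dB per bit rule can be checked numerically. The following is a minimal sketch, assuming the dynamic range of digital audio is taken as the ratio between the number of available levels (2 raised to the bit-depth) and a single level, expressed as an amplitude ratio in decibels. The function name and the chosen bit-depths are illustrative assumptions.

```python
import math


def dynamic_range_db(bit_depth: int) -> float:
    """Amplitude ratio of 2**bit_depth levels to one level, in decibels."""
    levels = 2 ** bit_depth
    return 20.0 * math.log10(levels)


for bits in (8, 16, 24):
    print(f"{bits:2d} bits -> {dynamic_range_db(bits):6.2f} dB")

# Gain obtained from one extra bit (about 6 dB).
print(f"per bit: {dynamic_range_db(9) - dynamic_range_db(8):.2f} dB")
```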
Musical Sound and Noise
Sounds pleasant to hear are called Musical and those unpleasant to our ears are called Noise. Though the distinction is quite subjective, musical sounds normally originate from periodic or regular vibrations, while noise generally originates from irregular or non-periodic vibrations. Musical sounds most commonly originate from vibrating strings, as in guitars and violins, vibrating plates, as in drums and tabla, and vibrating air columns, as in pipes and horns. In all these cases periodic vibration is responsible for the musical sensation.
Types of Noise
White noise: white noise is a signal that has the same energy or power at every frequency, i.e., constant power density. Since a signal physically cannot have power at all frequencies (which would mean it has infinite energy content), a signal can only be white noise over a defined frequency range. Other colors of noise are: pink noise, red noise, green noise, blue noise, brown noise and black noise.
Tone and Note
A Tone is a sound having a single frequency. A tone can be represented pictorially by a wavy curve called a Sinusoidal wave. A tone is produced when a tuning fork is struck with a padded hammer. The kind of vibration associated with the generation of a tone is called Simple Harmonic Motion. This is executed by a spring-loaded weight tied by a flexible string to a point moving around the circumference of a circle at constant speed.
In daily life we do not hear single-frequency tones. The sounds we normally hear are a composite mixture of various tones of varying amplitudes and frequencies. Such a composite sound is called a Note. The tone parameters determine the resultant note, i.e., the waveform of a note can be derived from the resultant or sum of all its tonal components. The lowest frequency of a note is called the fundamental frequency. All other frequencies are called overtones. The frequencies of some overtones may be integral multiples of the fundamental frequency; these are called harmonics. In the usual convention the fundamental itself is counted as the first harmonic, so the tone having double the fundamental frequency is the second harmonic (the first overtone), the tone having three times the fundamental frequency is the third harmonic (the second overtone), and so on. It has been observed that the presence of more harmonic content adds to the richness of the sound, which is then referred to as harmonious sound. Two sinusoidal tones can combine into a composite sound whose waveform varies depending upon the phase difference between the component tones.
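How a note arises as the sum of its tonal components, and how the phase difference between components changes the resulting waveform, can be sketched as follows. This is an illustrative sketch only: the function name, the 100 Hz fundamental and the amplitude values are assumptions made for the example, not values from the text.

```python
import math


def note_sample(t, fundamental_hz, amplitudes, phases=None):
    """Value at time t (seconds) of a note built from harmonically related tones.

    amplitudes[k] is the amplitude of the tone at (k + 1) * fundamental_hz,
    phases[k] is its phase offset in radians (zero if phases is omitted).
    """
    if phases is None:
        phases = [0.0] * len(amplitudes)
    return sum(
        a * math.sin(2.0 * math.pi * (k + 1) * fundamental_hz * t + p)
        for k, (a, p) in enumerate(zip(amplitudes, phases))
    )


# A fundamental at 100 Hz plus a weaker tone at 200 Hz,
# evaluated a quarter of a period after the start.
t = 1.0 / 400.0
in_phase = note_sample(t, 100.0, [1.0, 0.5])
shifted = note_sample(t, 100.0, [1.0, 0.5], [0.0, math.pi / 2.0])
print(round(in_phase, 3), round(shifted, 3))
```

The two calls use the same tones and amplitudes; only the phase of the second tone differs, yet the value of the composite wave at the same instant changes.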
Decibels
The Decibel is a unit for measuring the loudness of sound as perceived by the human ear. It involves comparing the intensity of a sound with the faintest sound audible to the human ear and expressing the ratio as a logarithmic value. The full range of human hearing is 120 decibels. Logarithms are designed for talking about numbers of greatly different magnitude, such as 56 vs. 7.2 billion. The most difficult problem is getting the number of zeros right. We can use scientific notation like 5.6 x 10^1 and 7.2 x 10^9, but these are awkward to deal with. For convenience we find the ratio between the two numbers and convert it to a logarithm. This gives us a number like 8.1. To avoid the decimal we multiply the number by 10. If we measured one value as 56 HP (horse power, a measure of power) and another as 7.2 billion HP, we say that one is 81 dB greater than the other.
Power in dB = 10 log10 (power A / power B)
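The horse power example above can be reproduced in a few lines of Python. This is a minimal sketch of the formula as stated; the function name is a hypothetical one chosen for illustration.

```python
import math


def power_ratio_db(power_a: float, power_b: float) -> float:
    """Power in dB = 10 log10 (power A / power B)."""
    return 10.0 * math.log10(power_a / power_b)


# 7.2 billion HP compared with 56 HP, as in the text.
print(f"{power_ratio_db(7.2e9, 56.0):.1f} dB")
```

The result is close to 81 dB, matching the value obtained by taking the logarithm of the ratio (about 8.1) and multiplying by 10.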
When speaking in the context of sound waves, we can use this relation to compare the energy content of the waves. Since the power or intensity of sound energy is proportional to the square of the amplitude of the sound wave,
Power in dB = 20 log10 (amplitude A / amplitude B)
The usefulness of this becomes apparent when we think about how our ear perceives loudness. The softest audible sound has a power of about 10^-12 watt/sq. meter and the loudest sound we can hear (also known as the threshold of pain) is about 1 watt/sq. meter, giving a total range of 120 dB. Thus when we speak of a 60 dB sound we actually mean:
60 dB = 10 * log10 (Energy content of the measured sound / Energy content of the softest audible sound)
Thus, Energy content of measured sound = 10^6 * (Energy content of softest audible sound)
Secondly, our judgement of relative levels of loudness is somewhat logarithmic. If a sound has 10 times more power than another, we hear it as roughly twice as loud. (Logarithm of 10^2 is equal to 2.) Most studies of psycho-acoustics deal with the sensitivity and accuracy of human hearing. The human ear can respond to a wide range of amplitudes. People's ability to judge pitch is quite variable. Most subjects studied could match pitches to within 3%. Recognition in terms of timbre is not very well studied, but once we have learned to identify a particular timbre, recognition is possible even if loudness and pitch are varied. We are able to perceive the direction of a sound source with some accuracy. Left, right and height information is determined by the difference of the sound in each ear. We can also tell whether the sound source is moving away from or towards us.
Threshold of Audibility: The faintest sound that can be heard by a normal human ear. The energy content of this sound wave is 10^-12 watt/sq. meter.
Threshold of Pain: The loudest sound that can be tolerated by a normal human ear. The energy content of this sound is 1 watt/sq. meter.
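The relation between a sound level in decibels and the two thresholds can be sketched in Python. The constant and function names below are hypothetical; the threshold values are the ones quoted in the text.

```python
import math

THRESHOLD_OF_AUDIBILITY = 1e-12  # watt/sq. meter, as quoted in the text
THRESHOLD_OF_PAIN = 1.0          # watt/sq. meter, as quoted in the text


def level_db(intensity: float) -> float:
    """Level of a sound in dB relative to the faintest audible sound."""
    return 10.0 * math.log10(intensity / THRESHOLD_OF_AUDIBILITY)


def intensity_from_db(db: float) -> float:
    """Inverse relation: intensity in watt/sq. meter for a given dB level."""
    return THRESHOLD_OF_AUDIBILITY * 10.0 ** (db / 10.0)


print(f"range of hearing: {level_db(THRESHOLD_OF_PAIN):.0f} dB")
ratio = intensity_from_db(60.0) / THRESHOLD_OF_AUDIBILITY
print(f"a 60 dB sound carries {ratio:.0e} times the threshold energy")
```

The first line reproduces the 120 dB range, and the second the factor of 10^6 for a 60 dB sound.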
Power difference in decibel = 10 * log10 (power of the measured sound in watt/sq. meter / power of the faintest audible sound in watt/sq. meter)
Masking
One of the most important findings from the study of psycho-acoustics is a phenomenon called Masking, which has had a profound influence on the later digital processing of sound. Masking occurs due to the limitations of the human ear in perceiving multiple sources of sound simultaneously. When a large number of sound waves of similar frequencies are present in the air at the same time, the higher-volume or higher-intensity sounds predominate over the lower-intensity sounds and 'mask' the latter out, making them inaudible. Thus even though the masked-out sound actually exists, we are unable to perceive it as a separate source of sound. The higher-intensity sound is called the Masker and the lower-intensity sound is called the Masked. This phenomenon is effective over a limited range of frequencies, beyond which masking may not be perceptible. This range of frequencies is called the Critical Band. Though masking occurs as a result of the limitations of the human ear, modern sound engineers have turned it to advantage in designing digital sound compressors. Software for compressing sound files utilizes the masking phenomenon to throw away irrelevant information from a sound file in order to reduce its size and storage space.
Temporal Masking
Another related phenomenon, called temporal masking, occurs when tones are sounded close in time but not simultaneously. A louder tone occurring just before a softer tone masks the latter, making it inaudible. Temporal masking increases as the time difference is reduced. Temporal masking suggests that the brain probably integrates sound over a period of time and processes the information in bursts.
Elementary Sound Systems
BASIC SOUND SYSTEMS An elementary sound system consists of 3 main components : microphone, amplifier and loudspeaker. A microphone is a device for converting sound energy to electrical energy. An amplifier is a device which boosts the electrical signals leaving the microphone in order to drive the loudspeakers. A loudspeaker is a device which converts electrical energy back into sound energy.
A microphone records sound by converting the acoustic energy to electrical energy. Sound pressure exists as patterns of air pressure. The microphone changes this information into patterns of electrical current. There are several characteristics that classify microphones :
One classification is based on how the microphone responds to the physical properties of a sound wave (like pressure, gradient etc.). Another classification is based on the directional properties of the microphone. A third classification is based on the mechanism by which the microphone creates an electrical signal. [as per e-material]
Based on their constructional features, microphones may be of two types: moving coil type and condenser type.
Moving Coil Microphones
In a moving-coil or dynamic microphone, sound waves cause movement of a thin metallic diaphragm and an attached coil of wire. A magnet produces a magnetic field which surrounds the coil. As sound impinges on the diaphragm attached to the coil, it causes movement of the coil within the magnetic field. A current is therefore produced, proportional to the intensity of the sound hitting the diaphragm. Example: Shure Beta 57A dynamic microphone, Shure SM58 dynamic microphone
Condenser Microphone
Often called the capacitor or condenser microphone, here the diaphragm is actually one plate of a capacitor. The incident sound on the diaphragm moves the plate, thereby changing the capacitance and generating a voltage. In a condenser microphone the diaphragm is mounted close to, but not touching, a rigid back plate. A battery is connected to both pieces of metal, which produces an electrical potential or charge between them. The amount of charge is determined by the voltage of the battery, the area of the diaphragm and back plate, and the distance between the two. This distance changes as the diaphragm moves in response to sound. When the distance changes, current flows in the wire as the battery maintains the correct charge. The amount of current is proportional to the displacement of the diaphragm. Example: Marshall Instrument MXL-600 condenser microphone, Shure SM87A condenser microphone
A common variant of this design uses a material, usually a kind of plastic, with a permanent charge on it. This is called an electret microphone.
Based on their directional properties, microphones may be classified into three types: omni-directional, bi-directional and uni-directional.
Omni-directional Microphone [Pressure Microphones]
It consists of a pressure-sensitive element contained in an enclosure open to air on one side. Sound waves create a pressure at the opening regardless of their direction of origin, and the pressure causes the diaphragm to vibrate. This vibration is translated into an electrical signal through either of the mechanisms [moving coil or condenser]. The polar plot of a microphone graphs the output of the microphone when equal sound levels are input at various angles around it. The polar plot for a pressure microphone is a circle, so desired sound and noise are picked up equally from all directions. Thus it is also called an omni-directional microphone. These are used to record sound coming from multiple sources, e.g., environmental sounds in a wildlife video clip.
Bi-directional Microphone [Gradient Microphones]
The diaphragm is open to air on both sides, so that the net force on it is proportional to the pressure difference. A sound impinging upon the front of the microphone creates a pressure at the front opening. A short time later the sound travels to the back of the microphone and enters through the rear opening (180°), striking the diaphragm from the opposite side. However, since the sound had to travel a longer distance to reach the rear opening, it has dissipated more energy and strikes the diaphragm with less force. The diaphragm therefore vibrates with the differential force. Sounds from the sides (90° and 270°) create identical pressure on both sides of the diaphragm and produce no resultant displacement. The polar response resembles the figure 8. It has maximum response for sound from the openings and minimum response for sound incident from the sides. It is also known as a bi-directional microphone. Ex: Microtech Gefell UMT800 microphone. A bi-directional microphone is sensitive to sounds coming from two directions: the front and the rear. It is used to record two sources of sound simultaneously, e.g. a conversation between two persons on opposite sides of a table.
Uni-directional Microphone [Cardioid Microphones]
A uni-directional microphone is designed to record sound from a single source, e.g. a single individual speaking. Its construction is similar to that of the bi-directional one, with a single exception. On the rear side of the microphone there is a resistive material, like foam or cloth, near the diaphragm. This tends to absorb some of the energy of sound entering through the rear opening. Sound produced at the front of the microphone strikes the diaphragm directly from the front, while a part of the energy travels to the back, gets reduced by the resistive material and strikes the diaphragm with a smaller force from the opposite direction. The diaphragm vibrates with the differential force and the microphone responds to the sound. When sound is produced at the back of the microphone, the direct energy wave gets reduced by the resistive material before striking the diaphragm from the back. A part of the original sound energy, traveling a longer distance to the front, also gets reduced before striking the diaphragm from the front in the opposite direction. The resistive material is designed in such a way that these two reductions are almost equal, with the net effect that two equal and opposing forces striking the diaphragm produce no vibration. The microphone therefore does not respond to any sound coming from the rear. Sounds from the sides are partly cancelled out. Ex: Roland DR-20 Cardioid Microphone.
The polar plot is a graph plotting the output level of the microphone against the angle at which the incident sound is produced. By definition the omni-directional microphone produces equal outputs for all angles of incidence. Hence, its polar plot is a circle. For a bi-directional microphone, the outputs are maximum for sounds coming from the front (0°) and rear (180°). The output gradually decreases as the incident sound shifts from the front (and rear) to the sides (90° and 270°). The polar plot therefore resembles the figure '8'. For a uni-directional microphone, the output is maximum at the front and minimum at the rear, and decreases gradually from the front to the rear, resulting in a decreased but non-zero value at the sides. The polar plot is heart shaped, due to which these microphones are also called 'cardioid' microphones.
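The three polar plots can be approximated numerically. The sketch below uses the commonly quoted idealized patterns (a constant for the circle, |cos θ| for the figure '8' and (1 + cos θ) / 2 for the cardioid); these formulas are textbook idealizations assumed for illustration and are not taken from the text, and the function names are hypothetical.

```python
import math


def omnidirectional(angle_deg: float) -> float:
    """Circle: equal relative output for every angle of incidence."""
    return 1.0


def bidirectional(angle_deg: float) -> float:
    """Figure '8': maximum at 0 and 180 degrees, zero at the sides."""
    return abs(math.cos(math.radians(angle_deg)))


def cardioid(angle_deg: float) -> float:
    """Heart shape: maximum at the front, zero at the rear."""
    return 0.5 * (1.0 + math.cos(math.radians(angle_deg)))


print("angle  omni  bi     cardioid")
for angle in (0, 90, 180, 270):
    print(f"{angle:5d}  {omnidirectional(angle):.1f}   "
          f"{bidirectional(angle):.3f}  {cardioid(angle):.3f}")
```

At 90° and 270° the idealized cardioid gives half of the frontal output, which matches the "decreased but non-zero value at the sides" described above.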
The most important factor in choosing a particular type of microphone is how it picks up sound for the required application. In this respect the following issues should be considered:
2. Overload characteristics
4. Frequency Response
6. Condenser Vs Dynamic
Comparison of Microphone
Amplifiers and Loudspeakers
An amplifier is a device in which a varying input signal controls a flow of energy to produce an output signal that varies in the same way but has a larger amplitude. The input signal may be a current, a voltage, a mechanical motion, or any other signal, and the output signal is usually of the same nature. The ratio of the output voltage to the input voltage is called the voltage gain. The most common types of amplifiers are electronic and use a series of transistors as their principal components. In most cases, the transistors are incorporated into integrated circuit chips. Amplifier circuits are designated as class A, B, AB and C for analogue designs and class D and E for switching designs (often loosely called digital). Ex: Kenwood KA-5090R Stereo Integrated Amplifier
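The voltage gain defined above can be computed directly, and expressed in decibels using the amplitude form of the decibel relation. This is a minimal sketch; the function names and the example voltages (10 mV in, 1 V out) are hypothetical values chosen for illustration.

```python
import math


def voltage_gain(v_out: float, v_in: float) -> float:
    """Ratio of the output voltage to the input voltage."""
    if v_in == 0:
        raise ValueError("input voltage must be non-zero")
    return v_out / v_in


def voltage_gain_db(v_out: float, v_in: float) -> float:
    """Voltage gain in decibels (20 log10, since voltage is an amplitude)."""
    return 20.0 * math.log10(abs(voltage_gain(v_out, v_in)))


# Hypothetical example: a 10 mV microphone signal amplified to 1 V.
print(f"gain = {voltage_gain(1.0, 0.01):.0f}x, "
      f"{voltage_gain_db(1.0, 0.01):.0f} dB")
```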
A loudspeaker converts electrical energy back to acoustic energy. A cone made of paper or fiber, known as the diaphragm, is attached to a coil of wire kept near a permanent magnet. When current from the source system is passed through the coil, a magnetic field is generated around the coil. This field interacts with the magnetic field of the permanent magnet, generating vibrating forces which oscillate the diaphragm. The diaphragm oscillates at the same frequency as the original electrical signal and therefore reproduces the same sounds which had been used to encode the signal in the first place. All these components are enclosed in a container, which additionally includes a suspension system that provides lateral stability to the vibrating components. An important criterion is an even response for all frequencies. However, the requirements for good high-frequency and good low-frequency response conflict with each other.
Thus there are separate units called woofer, midrange and tweeter for reproducing sound of different frequencies.
Woofer: 20 Hz to 400 Hz
Midrange: 400 Hz to 4 kHz
Tweeter: 4 kHz to 20 kHz
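The division of the audible range between the three units can be written as a simple lookup. This is only an illustrative sketch: the function name is hypothetical, and assigning the boundary frequencies (400 Hz and 4 kHz) to the higher unit is an assumption, since the text does not say which unit handles them.

```python
def speaker_unit(frequency_hz: float):
    """Return the unit that reproduces the given frequency, or None."""
    if 20 <= frequency_hz < 400:
        return "woofer"
    if 400 <= frequency_hz < 4000:
        return "midrange"
    if 4000 <= frequency_hz <= 20000:
        return "tweeter"
    return None  # outside the range handled by the three units


for f in (50, 1000, 10000, 30000):
    print(f, "Hz ->", speaker_unit(f))
```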
In a studio, stereo sound is produced artificially by placing individual microphones for individual instruments. Each of the signals generated is called a Track. A device called an audio mixer is used to record these individual tracks and edit them separately. An audio mixer consists of a number of controls for adjusting the volume, tempo (speed of playback), mute, etc. for each individual track. Using these controls, each separate track of sound, e.g., guitar track, piano track, voice track, etc., can be edited to adjust the overall volume and tempo of the audio, as well as to provide special effects like chorus, echo, reverb (multiple echo) and panning. Finally all these tracks are combined into two channels (for stereo sound) or multiple channels (for surround sound).
Digitisation of sound
Analog Representations An analog quantity is a physical value that varies continuously over space and/or time. It can be described by mathematical functions of the type s=f(t), s=f(x,y,z) or s=f(x,y,z,t). Physical phenomena that stimulate human senses like light and sound can be thought of as continuous waves of energy in space. Continuity implies that there is no gap in the energy stream at any point. These phenomena can be measured by instruments which transform the captured physical variable into another space/time dependent quantity called a signal. If the signal is also continuous we say that it is analogous to the measured variable. The instruments are
called sensors and the signals usually take the form of electrical signals. For example, a microphone converts the environmental sound energy into electrical signals and a solar cell converts the radiant energy (light and heat) from the sun into electrical signals.
Analog signals have two essential properties:
• The signal delivered by the capturing instrument may take any possible value within the limits of the instrument. Thus the value can be expressed by any real number in the available range. Analog signals are thus said to be amplitude continuous.
• The value of the analog signal can be determined for any possible value of the time or space variable. Analog signals are therefore also said to be time or space continuous.
Digital Representations
In contrast to analog signals, digital signals are not continuous over space or time. They are discrete in nature, which means that they exist or have values only at certain points in space or instants in time, but not at other points or instants. To use a personal computer to create multimedia presentations, all media components have to be converted to digital form, because that is the form the computer recognizes and can work with.
Analog to Digital Conversions
The transformation from analog to digital form requires three successive steps: Sampling, Quantization and Code-word generation.
Sampling
Sampling involves examining the values of the continuous analog signal at certain points in time, thereby isolating a discrete set of values from the continuous suite of values. Sampling is usually done at periodic time or space intervals. For time-dependent quantities like sound, sampling is done at specific intervals of time and is said to create time-discretization of the signal. For time-independent quantities like a static image, sampling is done at regular space intervals (i.e. along the length and breadth of the image) and is said to create space-discretization of the signal.
The figure illustrates the sampling process. For every clock pulse the instantaneous value of the analog waveform is read, thus yielding a series of sampled values. The sampling clock frequency is referred to as the sampling rate. For a static image, the sampling rate would be measured in the spatial domain, i.e. along the length and width of the image area, and would actually denote the pixel resolution, while for a time-varying medium like sound it denotes how many times per second the analog wave is sampled and is measured in Hertz. Since the input analog signal is continuous, its value changes over space or time. The A/D conversion process takes a finite time to complete, hence the input analog signal must be held constant during the conversion process to avoid conversion problems. This is done by a sample-and-hold circuit.
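The sampling step can be sketched in Python by reading the value of a continuous function once per clock period. The function names, the 1 kHz test tone and the 8 kHz sampling rate are assumptions made for this example only.

```python
import math


def sample_signal(signal, sampling_rate_hz: float, duration_s: float):
    """Read the instantaneous value of signal(t) once per sampling interval."""
    count = int(round(sampling_rate_hz * duration_s))
    interval = 1.0 / sampling_rate_hz  # time between two clock pulses
    return [signal(i * interval) for i in range(count)]


def tone_1khz(t: float) -> float:
    """A continuous 1 kHz sinusoidal tone of unit amplitude."""
    return math.sin(2.0 * math.pi * 1000.0 * t)


# One full cycle (1 ms) of the tone sampled at 8 kHz gives 8 values.
samples = sample_signal(tone_1khz, 8000.0, 0.001)
print([round(s, 3) for s in samples])
```

Between two consecutive samples the value of the tone is simply not recorded, which is the time-discretization described above.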
Quantization
This process consists of converting a sampled signal into a signal which can take only a limited number of values. Quantization is also called amplitude-discretization. To illustrate this process, consider an analog electrical signal whose value varies in a continuous way between 0 mV and +255 mV. Sampling of the signal creates a set of discrete values, which can have any value within the specified range, say a thousand different values. For quantizing the signal we need to fix the total number of values permissible. Suppose we decide that we will consider only 256 of the thousand sampled values, chosen so that they adequately represent the total range of sampled values, i.e. from the minimum to the maximum. This enables us to create a binary representation of each of the considered values. We can now assign a fixed number of bits to represent the 256 values considered. Since we know that n binary digits can give rise to 2^n numbers, a total of 8 bits would be sufficient to represent the 256 values. The number of bits is referred to as the bit-depth of the quantized signal. (Incidentally, we could have considered all the thousand values, but that would have required a larger number of bits and correspondingly more computing resources; more on that later.)
Code-word Generation
This process consists of associating a group of binary digits, called a code-word, with every quantized value. In the above example, the 256 permissible values will be allocated values from 00000000 for the minimum value to 11111111 for the maximum value. Each binary value actually represents the amplitude of the original analog signal at a particular point or instant, but between two such points the amplitude value is lost. This explains how a continuous signal is converted into a discrete signal. The whole process of sampling followed by quantization and code-word generation is called digitization. The result is a sequence of values coded in binary format. Physically an analog signal is digitized by passing it through an electronic chip called an Analog-to-Digital Converter (ADC).
Digital to Analog Conversion
The digital form of representation is useful inside a computer for storage and manipulation. Since humans only react to physical sensory stimuli, playback of the stored media requires a conversion back to the analog form. Our eyes and ears can only sense physical light and sound energies, which are analog in nature, not the digital quantities stored inside a computer. Hence the discrete set of binary values needs to be converted back to the analog form during playback. For example, a digital audio file needs to be converted to the analog form and played back using a speaker for it to be perceived by the human ear. The reverse of the process explained above is followed for this conversion. Physically this is done by passing the digital signal through another electronic chip called a Digital-to-Analog Converter (DAC).
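The quantization and code-word steps, and the reverse conversion, can be sketched for the 0 mV to 255 mV example used earlier. This is an illustrative sketch and not a description of how a real ADC or DAC chip works: the function names are hypothetical, and rounding to the nearest level is an assumed choice.

```python
def quantize(value_mv: float, v_min: float = 0.0, v_max: float = 255.0,
             bit_depth: int = 8) -> int:
    """Map an analog value to the index of the nearest permissible level."""
    levels = 2 ** bit_depth
    step = (v_max - v_min) / (levels - 1)
    index = round((value_mv - v_min) / step)
    return max(0, min(levels - 1, index))  # clamp to the available range


def code_word(index: int, bit_depth: int = 8) -> str:
    """Binary code-word associated with a quantized level."""
    return format(index, f"0{bit_depth}b")


def reconstruct(word: str, v_min: float = 0.0, v_max: float = 255.0) -> float:
    """Reverse step: convert a code-word back to an analog value in mV."""
    levels = 2 ** len(word)
    step = (v_max - v_min) / (levels - 1)
    return v_min + int(word, 2) * step


for mv in (0.0, 100.4, 255.0):
    word = code_word(quantize(mv))
    print(f"{mv:6.1f} mV -> {word} -> {reconstruct(word):6.1f} mV")
```

The value 100.4 mV comes back as 100.0 mV: the amplitude between two permissible levels is lost, as described above.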
Relation between Sampling Rate and Bit Depth
As we increase the sampling rate we get more information about the analog wave, so the resultant digital wave is a more accurate representation of it. However, increasing the sampling rate also means we have more data to store and thus require more space. In terms of resources this implies more disk space and RAM, and hence a greater cost. Each of those samples must also be represented accurately, which requires an adequate bit depth. If we use a lower bit depth than is required, we will not be able to represent the sample values accurately, and the advantage of using a higher sampling rate will be lost. On the other hand, if we use a lower sampling rate we get less information about the analog wave, so the digital sound will not be an accurate representation of it. If we then use a high bit depth, each sample is stored with many bits and the sound file becomes larger, but because of the low number of samples the quality will still be degraded compared to the original analog wave.
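The effect of both parameters on storage can be estimated with a short calculation. This sketch assumes uncompressed audio in which every sample of every channel occupies exactly bit-depth bits; the function name and the example settings (44.1 kHz, 16-bit, stereo versus 22.05 kHz, 8-bit, mono) are illustrative assumptions, not values from the text.

```python
def audio_size_bytes(sampling_rate_hz: int, bit_depth: int,
                     channels: int, duration_s: int) -> int:
    """Storage needed for uncompressed audio, in bytes."""
    total_bits = sampling_rate_hz * bit_depth * channels * duration_s
    return total_bits // 8


# One minute of audio at two different settings.
high = audio_size_bytes(44100, 16, 2, 60)
low = audio_size_bytes(22050, 8, 1, 60)
print(f"high quality: {high / 1e6:.2f} MB, low quality: {low / 1e6:.2f} MB")
```

Doubling either the sampling rate or the bit depth doubles the storage requirement.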
Quantization Error
No matter what the choice of bit depth, digitization can never perfectly encode a continuous analog signal. An analog waveform has an infinite number of amplitude values, but a quantizer has a finite number of intervals. All the analog values between two intervals can only be represented by the single number assigned to that interval. Thus the quantized value is only an approximation of the actual value. For example, suppose the binary number 101000 corresponds to the analog value of 1.4 V, 101001 corresponds to 1.5 V, and the analog value at sample time is 1.45 V. Because there is no code-word halfway between 101000 and 101001, the quantizer must round up to 101001 or down to 101000. Either way there will be an error with a magnitude of one-half of an interval. Quantization error (e) is the difference between the actual analog value at sample time and the quantized value, as shown below. Let us consider an analog waveform which is sampled at a, b and c, and let the corresponding sample values be A, B and C. Considering the portion between A and B, the actual value of the signal at some point x after a is xX, but the value of the digital output is fixed at xm. Thus there is an error equal to the length mX. Similarly at point y, the actual value of the analog signal is yY but the digital output is fixed at yn. Thus the error increases to nY. This continues for all points between a and b until, just before b, for an actual value of almost bB, we get a sampled value still fixed at bp. The error is maximum at this point and equals pB, which is also almost equal to the height of one step. This maximum error is the quantization error, denoted by e, and is equal to one step size of the digital output.
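Both error bounds mentioned above, half an interval when the quantizer rounds to the nearest level and almost one full step when the output is held at the previous level, can be checked numerically. This is a minimal sketch using the 1.4 V and 1.5 V levels of the example, expressed in whole millivolts to keep the arithmetic exact; the function names are hypothetical.

```python
def quantize_round(value_mv: int, step_mv: int) -> int:
    """Round to the nearest level (a tie goes to the upper level)."""
    return ((value_mv + step_mv // 2) // step_mv) * step_mv


def quantize_hold(value_mv: int, step_mv: int) -> int:
    """Keep the output fixed at the level just below the actual value."""
    return (value_mv // step_mv) * step_mv


STEP_MV = 100  # levels at 1400 mV and 1500 mV, as in the example
values = range(1400, 1500)
max_round_error = max(abs(v - quantize_round(v, STEP_MV)) for v in values)
max_hold_error = max(abs(v - quantize_hold(v, STEP_MV)) for v in values)

print("1450 mV rounds to", quantize_round(1450, STEP_MV), "mV")
print("largest error when rounding:", max_round_error, "mV")
print("largest error when holding: ", max_hold_error, "mV")
```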
Because of quantization error there is always a distortion of the wave when represented digitally. This distortion effect is physically manifested as noise. Noise is any unwanted signal that creeps in along with the required signal. To eliminate quantization noise fully during digitization we would need infinitely fine quantization steps, which is practically impossible. Hence we must find other ways to reduce the effects of noise. Other than quantization error, noise may also creep in from the environment, as well as from the electrical equipment used for digitization. In characterizing digital hardware performance we can determine the ratio of the maximum expressible signal amplitude to the maximum noise amplitude. This determines the S/N (signal to noise) ratio of the system. It can be shown that the S/N ratio expressed in decibels is approximately 6 times the bit depth. Thus increasing the bit depth during digitization leads to a reduction of noise. To remove environmental noise we need to use good quality microphones and soundproof recording studios. Noise generated from electrical wires may be reduced by proper shielding and earthing of the cables. After digitization, noise can also be removed by using sound editing software. Such software employs noise filters to identify and selectively remove noise from the digital audio file.
Each sample value needs to be held constant by a hold circuit until the next sample value is obtained. Thus the maximum difference between the sample value and the actual value of the analog wave is equal to the height of one step. If $A$ is the peak-to-peak height of the wave and $n$ is the bit depth, then the number of steps is $2^n$. The height of each step is $A/2^n$, which is equal to the quantization error $e$. Thus we have the relation: $e = A/2^n$.
Signal to noise ratio $\mathrm{SNR} = A/e = 2^n$. Expressed in decibels the SNR is seen to be directly proportional to the bit depth: $\mathrm{SNR_{dB}} = 20\log_{10}(2^n) = 20\,n\log_{10}2 \approx 6n$ dB. This implies that if the bit depth is increased by 1 during digitization, the signal to noise ratio increases by about 6 dB.
Importance of Digital Representation The key advantage of the digital representation lies in the universality of representation. Since any medium, be it text, image or sound, is coded in a form which ultimately results in a sequence of bits, all kinds of information can be handled in the same way. The following advantages are also evident:
Storage : The same digital storage device, like memory chips, hard disks, floppies and CD-ROMs, can be used for all media.
Transmission : Any single communication network capable of supporting digital transmission has the potential to transmit any multimedia information. Digital signals are less sensitive to noise than analog signals. Attenuation of digital signals is lower. Error detection and correction can be implemented. The information can be encrypted to maintain confidentiality.
Processing : Powerful software programs can be used to analyze, modify, alter and manipulate multimedia data in a variety of ways. This is probably where the potential is the highest. The quality of the information may also be improved by removal of noise and errors. This capability enables us to digitally restore old photographs or noisy audio recordings.
Drawbacks of Digital Representation The major drawback lies in the coding distortion. The process of first sampling and then quantizing and coding the sampled values introduces distortions. Also, since a continuous signal is broken into a discrete form, a part of the signal is actually lost and cannot be recovered. As a result the signal generated after digital to analog conversion and presented to the end user has little chance of being completely identical to the original signal.
Another consequence is the large digital storage capacity required for storing image, sound and video. Each minute of CD-quality stereo sound requires about 10 MB of data and each minute of full screen digital video fills up over 1 GB of storage space. Fortunately compression algorithms have been developed to alleviate the problem to a certain extent.
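Returning briefly to the signal to noise discussion in this section, the "6 dB per bit" rule can be checked numerically. The sketch below assumes the simple model used there, in which the ratio of maximum signal amplitude to maximum noise amplitude is 2 raised to the bit depth; the function name `snr_db` is a hypothetical choice for illustration.

```python
import math


def snr_db(bit_depth):
    """Signal to noise ratio in decibels, assuming SNR = 2**bit_depth."""
    return 20 * math.log10(2 ** bit_depth)


if __name__ == "__main__":
    for bits in (8, 16, 24):
        print(f"{bits:2d} bits -> {snr_db(bits):6.1f} dB")
    # Gain from one extra bit under this model:
    print(f"per bit: {snr_db(17) - snr_db(16):.2f} dB")
```

Under this assumption each extra bit adds 20·log10(2), roughly 6.02 dB, so 16-bit audio comes out near 96 dB.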
NOTES TAKEN FROM CD Early Sound Storage
A/D and D/A Converter
Analog Vs Digital Format
PCM - Pulse Code Modulation
The process of converting an analog signal into a digital signal is called Pulse Code Modulation or PCM and involves sampling. We use electronic circuits to store sampled values as an electrical signal and then hold the signal constant until the next sampled value. What is PCM
Effects of Sampling Parameters As the sampling rate is increased we obtain more data about the input signal and the output signal becomes a closer approximation of the input signal. To benefit fully from the increased rate, the resolution should also be adequate, which means using a sufficient number of bits per sample.
Nyquist's Sampling Theorem
The theorem states that the sampling rate must be at least twice the highest frequency present in the signal. When sampling is done at a much higher rate than that prescribed by the theorem, it is called oversampling. Although oversampling can generate a high quality digital signal, it can unnecessarily increase the file size.
Practical sampling Frequencies
To handle the full 20 kHz range of human hearing, practical sampling systems use frequencies of 44-48 kHz. However, depending on the audio content, sampling may also be done at a lower rate, e.g. human speech can be reproduced adequately when sampled at about 11 kHz.
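The link between the 20 kHz hearing range and sampling rates above 40 kHz can be sketched as follows. The code assumes the usual statement of the sampling theorem (the rate must be at least twice the highest frequency in the signal) and the standard frequency-folding model of aliasing for a pure tone; both function names are hypothetical.

```python
def min_sampling_rate(max_frequency_hz):
    """Lowest sampling rate that satisfies the sampling theorem."""
    return 2 * max_frequency_hz


def alias_frequency(tone_hz, sampling_rate_hz):
    """Apparent frequency of a pure tone after sampling.

    Tones at or below half the sampling rate are unchanged; higher tones
    fold back into the range 0..sampling_rate_hz/2.
    """
    folded = tone_hz % sampling_rate_hz
    return min(folded, sampling_rate_hz - folded)


if __name__ == "__main__":
    print(min_sampling_rate(20_000))        # rate needed for 20 kHz audio
    print(alias_frequency(5_000, 44_100))   # within range: unchanged
    print(alias_frequency(30_000, 44_100))  # above 22.05 kHz: folds back
```

In this model a 30 kHz tone sampled at 44.1 kHz would be heard at 14.1 kHz, which is why the input is normally band-limited before sampling.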
Here we take a look at some case studies of the digital output waves obtained by using various sampling rates and sampling resolutions. The actual values should be chosen keeping in mind the compromise between cost and quality.
Low rate, low resolution
Low rate, High resolution
High rate, low resolution
High rate, High resolution
Bit Rate & File Size File size calculation : file size in bits = (sampling rate in Hz) x (sampling resolution in bits) x (No. of channels) x (duration of clip in seconds). This is divided by (8 x 1024) for conversion to KB (kilobytes).
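The file size formula above translates directly into code. The sketch below is a minimal illustration; the function names are hypothetical, and the example values (44.1 kHz, 16 bits, 2 channels, 60 seconds) are the commonly quoted CD-quality settings, used here as an assumption.

```python
def bit_rate(sampling_rate_hz, resolution_bits, channels):
    """Bits produced per second of uncompressed audio."""
    return sampling_rate_hz * resolution_bits * channels


def file_size_bits(sampling_rate_hz, resolution_bits, channels, duration_s):
    """Uncompressed size in bits, following the formula in the text."""
    return bit_rate(sampling_rate_hz, resolution_bits, channels) * duration_s


def bits_to_kb(size_bits):
    """Convert bits to kilobytes by dividing by 8 x 1024."""
    return size_bits / (8 * 1024)


if __name__ == "__main__":
    size = file_size_bits(44_100, 16, 2, 60)
    print(bit_rate(44_100, 16, 2), "bits per second")
    print(bits_to_kb(size), "KB")
    print(round(bits_to_kb(size) / 1024, 2), "MB")
```

One minute at these settings comes to a little over 10 MB, in line with the figure quoted earlier for CD-quality stereo sound.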
Benefits of digital representation of Sound
1. Usable in multimedia applications
2. Easier data manipulation
3. Possibility of compressing data
4. Copies without generation loss
5. Greater durability of data
6. Possibility of synthetic sound
7. Possibility of upgrading
Electronic Music & Synthesizer SYNTHESIZERS
Synthesizers are electronic instruments which allow us to generate digital samples of sounds of various instruments synthetically, i.e. without the actual instrument being present. The core of a synthesizer is a special purpose chip or IC which is capable of generating the appropriate signals for producing sound. The sound may be a recording of actual sounds or a simulation of actual sounds through mathematical techniques. The sound produced can be modified by additional hardware components for changing its loudness, pitch etc.
Synthesizer Basics Synthesizers can be broadly classified into two categories. FM Synthesizers generate sound by combining elementary sinusoidal tones to build up a note having the desired waveform. Earlier generation synthesizers were generally of the FM type, the sounds of which lacked the depth of real-world sounds. Wavetable Synthesizers, created later on, produce sound by retrieving high-quality digital recordings of actual instruments from memory and playing them on demand. Modern synthesizers are generally of the wavetable type. The sounds associated with synthesizers are called patches, and the collection of all patches is called the Patch Map. Each sound in a patch map must have a unique ID number to identify it during playback. The audio channel of a synthesizer is divided into 16 logical channels, each of which is capable of playing a separate instrument.
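As a rough illustration of how an FM synthesizer builds a note from elementary sinusoidal tones, the sketch below uses the classic two-tone form in which one sine wave (the modulator) varies the phase of another (the carrier). This particular formula, the function name `fm_tone` and all parameter values are assumptions made for illustration; the text does not specify how any given FM chip works.

```python
import math


def fm_tone(carrier_hz, modulator_hz, mod_index, duration_s, sample_rate=44_100):
    """Return a list of samples in -1..1 for a simple two-tone FM note.

    mod_index controls how strongly the modulator bends the carrier;
    a value of 0 gives a plain sine wave at carrier_hz.
    """
    n_samples = round(duration_s * sample_rate)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        modulation = mod_index * math.sin(2 * math.pi * modulator_hz * t)
        samples.append(math.sin(2 * math.pi * carrier_hz * t + modulation))
    return samples


if __name__ == "__main__":
    note = fm_tone(carrier_hz=440, modulator_hz=220, mod_index=2.0, duration_s=0.01)
    print(len(note), "samples, peak", max(abs(s) for s in note))
```

Changing `mod_index` and the ratio between the two frequencies changes the timbre, which is how a small number of parameters can imitate different instruments.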
Wave Table Synthesis
Characteristics of a Synthesizer
• Polyphony : A synthesizer is said to be polyphonic if it has the ability to play more than one note at a time. Polyphony is generally specified as a number of notes or voices.
• Multitimbral : A synthesizer is said to be multitimbral if it is capable of producing two or more different instrument sounds simultaneously.
Each physical channel of the synthesizer is divided into 16 logical channels. Omni mode signifies that all 16 channels are capable of receiving data simultaneously. Polyphony means several notes can play simultaneously in each logical channel.
MUSICAL INSTRUMENT DIGITAL INTERFACE (MIDI)
What is MIDI
The Musical Instrument Digital Interface (MIDI) is a protocol, or set of rules, for connecting digital synthesizers to each other or to computers. In much the same way that two computers communicate via modems, two synthesizers communicate via MIDI. The information exchanged between two MIDI devices is musical in nature. MIDI information tells a synthesizer, in its most basic mode, when to start and stop playing a specific note, if any. MIDI information can also be more hardware specific. It can tell a synthesizer to change sounds, master volume, modulation devices, and even how to receive information. In more advanced uses, MIDI information can be used to indicate the starting and end points of a song. More recent applications include using the interface between computers and synthesizers residing on the computer. The MIDI standard defines a protocol in which the keys, instead of producing sound directly, produce data in the form of instructions which can be stored and edited in a personal computer before being played as sound.
MIDI Specification The MIDI specification/protocol has three portions: the hardware standards, which define rules for connecting a musical instrument to a computer; the message standards, which define the format for exchanging data between the instruments and the computer; and the file format, in which this data can be stored in a computer and played back. Hardware The MIDI hardware has 3 basic components: the keyboard, which is played as an instrument and translates key notes into MIDI data; the sequencer, which allows MIDI data to be captured, edited and replayed; and the sound module, which translates the MIDI data to sound.
Channel Messages are instructions for a specific channel and contain data for the actual key notes. The status byte contains the channel number and the function, and is followed by one or two data bytes with additional parameters like note number and velocity.
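The layout of a channel message can be sketched in code. In the MIDI 1.0 message format the upper four bits of the status byte carry the function (for example 0x9 for Note On and 0x8 for Note Off) and the lower four bits carry the channel number 0 to 15; data bytes are limited to 0 to 127. The helper names below are hypothetical and exist only for this illustration.

```python
NOTE_OFF = 0x8   # function code for "Note Off"
NOTE_ON = 0x9    # function code for "Note On"


def channel_message(function, channel, *data):
    """Build a channel message: one status byte plus one or two data bytes."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0..15")
    if not 1 <= len(data) <= 2 or any(not 0 <= d <= 127 for d in data):
        raise ValueError("expected one or two data bytes in 0..127")
    status = (function << 4) | channel     # function in high nibble, channel in low
    return bytes([status, *data])


def parse_status(status):
    """Split a status byte into (function, channel)."""
    return status >> 4, status & 0x0F


if __name__ == "__main__":
    # Note On, first channel, note number 60 (middle C), velocity 100.
    msg = channel_message(NOTE_ON, 0, 60, 100)
    print(msg.hex(" "))
    print(parse_status(msg[0]))
```

The three bytes printed for the example are 90 3c 64 in hexadecimal, which shows how compact a single note instruction is compared with sampled audio.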
File Format The MIDI specifications made provisions to save synthesizer audio in a separate file format called MIDI files. MIDI files are totally different from normal digital audio files (like WAV files) in that they do not contain the audio data at all, but rather the instructions on how to play the sound. These instructions act on the synthesizer chips to produce the actual sound. Because of this, MIDI files are extremely compact as compared to WAV files. They also have another advantage that the music in a MIDI file can easily be changed by modifying the instructions using appropriate software.
General MIDI (GM) Specification
SOUND CARD ARCHITECTURE
The sound card is an expansion board in your multimedia PC which interfaces with the CPU via slots on the mother-board. Externally it is connected to speakers for playback of sound. Other than playback the sound card is also responsible for digitizing, recording and compressing the sound files.
The basic internal components of the sound card include :
SIMM Banks : Local memory of the sound card for storing audio data during digitization and playback of sound files.
DSP : The digital signal processor, which is the main processor of the sound card and coordinates the activities of all other components. It also compresses the data so that it takes up less space.
DAC/ADC : The digital-to-analog and analog-to-digital converters for digitizing analog sound and reconverting digital sound files to analog form for playback.
WaveTable/FM Synthesizers : For generating sound on instructions from MIDI messages. The wavetable chip has a set of pre-recorded digital sounds while the FM chip generates the sound by combining elementary tones.
CD Interface : Internal connection between the CD drive of the PC and the sound card.
16-bit ISA connector : Interface for exchanging audio data between the CPU and sound card.
Amplifier : For amplification of the analog signals from the DAC before being sent to the speakers for playback.
The external ports of the sound card include :
Line Out : Output port for connecting to external recording devices like a cassette player or an external amplifier.
MIC : Input port for feeding audio data to the sound card through a microphone connected to it.
Line In : Input port for feeding audio data from external CD/cassette players for recording or playback.
Speaker Out : Output port for attaching speakers for playback of sound files.
MIDI : Input port for interfacing with an external synthesizer.
Source : www.pctechguide.com
Processing Audio Files
WAV files From the microphone or audio CD player a sound card receives a sound as an analog signal. The signal goes to an ADC chip which converts the analog signal to digital data. The ADC sends the binary data to the DSP, which typically compresses the data so that it takes up less space. The DSP then sends the data to the PC's main processor, which in turn sends the data to the hard drive to be stored. To play a recorded sound the CPU fetches the file containing the compressed data and sends the data to the DSP. The DSP decompresses the data and sends it to the DAC chip, which converts the data to a time varying electrical signal. The analog signal is amplified and fed to the speakers for playback.
MIDI files The MIDI instruments connected to the sound card via the external MIDI port, or the MIDI files on the hard disk retrieved by the CPU, instruct the DSP which sounds to play and how to play them, using the standard MIDI instruction set. The DSP then either fetches the actual sound from a wavetable synthesizer chip or instructs an FM synthesizer chip to generate the sound by combining elementary sinusoidal tones. The digital sound is then sent to the DAC to be converted to analog form and routed to the speakers for playback.
File Formats
Wave (Microsoft) File (.WAV) : This is the format for sampled sounds defined by Microsoft for use with Windows. It is an expandable format which supports multiple data formats and compression schemes.
Macintosh AIFF (.AIF/.SND) : This format is used on the Apple Macintosh to save sound data files. An AIFF file is best when transferring files between the PC and the Mac using a network.
RealMedia (.RM/.RA) : These are compressed formats designed for real-time audio and video streaming over the Internet.
MIDI (.MID) : Files containing instructions on how to generate music rather than the audio itself. The actual music is generated by digital synthesizer chips.
Sun JAVA Audio (.AU) : Only audio format supported by JAVA Applets on the Internet.
MPEG-1 Layer 3 (.MP3) : Highly compressed audio files providing almost CD-quality sound.
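To make the idea of a sampled-sound file concrete, the following sketch generates a short sine tone and stores it as uncompressed 16-bit mono PCM in WAV form, using the `wave` module from the Python standard library. The function name `sine_wav_bytes` and the chosen tone, duration and sampling rate are assumptions for illustration, and no compression step (such as the DSP compression described above) is modelled.

```python
import io
import math
import struct
import wave


def sine_wav_bytes(freq_hz=440.0, duration_s=0.5, sample_rate=44_100):
    """Return the bytes of a mono, 16-bit PCM WAV file holding a sine tone."""
    n_frames = round(duration_s * sample_rate)
    frames = bytearray()
    for i in range(n_frames):
        sample = math.sin(2 * math.pi * freq_hz * i / sample_rate)
        frames += struct.pack("<h", int(sample * 32767))   # signed 16-bit, little-endian

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes = 16-bit resolution
        wav.setframerate(sample_rate)  # samples per second
        wav.writeframes(bytes(frames))
    return buffer.getvalue()


if __name__ == "__main__":
    data = sine_wav_bytes()
    print(len(data), "bytes, header starts with", data[:4])
    # To save to disk instead: open("tone.wav", "wb").write(data)
```

The sampling rate, sample width and channel count written into the header are the same three parameters that determine the file size of uncompressed audio.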