
RANJAN PAREKH Principles of MULTIMEDIA, 2E © 2013 Tata McGraw-Hill Education

INSTRUCTOR’S MATERIALS
PowerPoint Slides

Chapter - 5

Audio

PROPRIETARY MATERIAL. © 2013 The McGraw-Hill Companies, Inc. All rights reserved. No part of this PowerPoint slide may be displayed, reproduced or distributed in any form or by
any means, without the prior written permission of the publisher, or used beyond the limited distribution to teachers and educators permitted by McGraw-Hill for their
individual course preparation. If you are a student using this PowerPoint slide, you are using it without permission.

Contents
• Acoustics and Sound Waves
• Types and Properties of Sound
• Psycho-acoustics
• Components of an Audio System
• Digital Audio
• Synthesizers and MIDI
• Digital Audio Processing
• Speech
• Sound Card

INSTRUCTOR'S MATERIALS Chapter – 5 : 2



Contents
• Audio Transmission
• Audio File Formats
• Surround Sound Systems
• Digital Audio Broadcasting
• Audio Processing Software


Acoustics & Sound Waves


• Acoustics
– Sound is a form of energy similar to heat and light
– Sound is generated by vibrating objects and propagates through a material medium
– Acoustics deals with generation, transmission and reception of sound

• Sound waves
– As sound energy propagates it sets up alternate regions of compression and rarefaction
– Pictorially represented as waves: a +ve peak denotes compression and a -ve peak denotes rarefaction
– Sound waves are longitudinal waves, i.e. the particles of the medium oscillate along the direction in which the wave travels
– Sound waves are mechanical waves, so the medium carrying them is alternately compressed and expanded like a spring


Acoustics & Sound Waves


• Sound waves
– Amplitude is the maximum displacement of a particle in the path of a wave
– Indicates the energy content of the wave, and hence the loudness of the sound
– Measured in a unit called the ‘decibel’ (dB)

– Frequency is the number of vibrations per second of a particle in the path of a wave
– Indicates the pitch of the sound
– Measured in a unit called the ‘hertz’ (Hz)
– The range of human hearing is approximately 20 Hz to 20 kHz

– Waveform is the pictorial shape of the wave
– Indicates the quality or timbre of the sound
– Determined by the elementary component tones that make up the wave
– May be sinusoidal, square, triangular, irregular etc.
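A minimal Python sketch of these three properties: each function returns the instantaneous value of a periodic wave with peak amplitude A and frequency f (cycles per second) at time t, and the three shapes differ only in waveform. The function names and the phase conventions (the square and triangular waves start their cycle in step with the sine wave) are illustrative assumptions.

```python
import math

def sine_wave(A, f, t):
    """Sinusoidal wave: peak amplitude A, frequency f (Hz), time t (s)."""
    return A * math.sin(2 * math.pi * f * t)

def square_wave(A, f, t):
    """Square wave (assumed convention): +A in the first half of each cycle, -A in the second."""
    phase = (f * t) % 1.0                  # fraction of the current cycle, in [0, 1)
    return A if phase < 0.5 else -A

def triangle_wave(A, f, t):
    """Triangular wave (assumed convention): 0 -> +A -> 0 -> -A -> 0 over one cycle."""
    phase = (f * t) % 1.0
    if phase < 0.25:
        return 4 * A * phase
    if phase < 0.75:
        return A * (2 - 4 * phase)
    return A * (4 * phase - 4)

# Same peak amplitude and frequency, different waveform
A, f = 1.0, 440.0
period = 1 / f
times = [period * (k + 0.5) / 4 for k in range(4)]   # mid-points of the four quarter-cycles
for wave_fn in (sine_wave, square_wave, triangle_wave):
    print(wave_fn.__name__, [round(wave_fn(A, f, t), 3) for t in times])
```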


Types and Properties of Sound


• Music and noise
– Sounds that are pleasant to hear are called music, and those unpleasant to hear are called noise
– Musical sounds originate from regular, periodic vibrations; noise originates from irregular vibrations
– A note is a unit of fixed pitch; notes are denoted by the letters A, B, C, D, E, F, G
– The pitch (f) of a vibrating string can be changed by changing its length L, diameter d, tension T or density ρ: f = (1/2L)√(T/μ), where μ = πd²ρ/4 is the mass per unit length, so that f = (1/(Ld))√(T/(πρ))

• Tone and note
– Tone is a sound having a single frequency; a note is a combination of multiple tones
– The lowest-frequency tone of a note is called the ‘fundamental’; all other tones are called ‘overtones’
– Overtones whose frequencies are integral multiples of the fundamental are called ‘harmonics’
– The waveform of a note depends on the amplitudes, frequencies and phase differences of its elementary tones
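The pitch formula for a vibrating string, f = (1/2L)√(T/μ), where μ = πd²ρ/4 is the mass per unit length of a string of diameter d made of material of density ρ, can be evaluated directly. A minimal Python sketch, assuming SI units (L and d in metres, T in newtons, ρ in kg/m³) and an ideal, perfectly flexible string; the string dimensions in the usage lines are hypothetical. Halving L or quadrupling T doubles the pitch, while doubling d halves it, and the harmonics of the note are integral multiples of this fundamental.

```python
import math

def string_frequency(L, d, T, rho):
    """Fundamental frequency (Hz) of an ideal string.

    L: length (m), d: diameter (m), T: tension (N), rho: material density (kg/m^3).
    f = (1/2L) * sqrt(T / mu), with mu = pi * d**2 * rho / 4 (mass per unit length).
    """
    mu = math.pi * d ** 2 * rho / 4
    return math.sqrt(T / mu) / (2 * L)

def harmonics(f0, count):
    """First `count` harmonics: integral multiples of the fundamental f0."""
    return [k * f0 for k in range(1, count + 1)]

# Hypothetical steel-like string: 0.65 m long, 0.3 mm diameter, 60 N tension
f = string_frequency(0.65, 0.0003, 60.0, 7850.0)
print(round(f, 1), "Hz")
print("half the length:", round(string_frequency(0.325, 0.0003, 60.0, 7850.0), 1), "Hz")
print("four times the tension:", round(string_frequency(0.65, 0.0003, 240.0, 7850.0), 1), "Hz")
print("first harmonics:", [round(h, 1) for h in harmonics(f, 4)])
```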

Types and Properties of Sound


• Dynamic range
– Ratio of the largest to the smallest value of a variable quantity
– In acoustics, the ratio of the largest undistorted sound amplitude to the quietest sound possible
– Dynamic range is often used synonymously with the signal-to-noise ratio (SNR) and is expressed in decibels
– The human ear has a dynamic range of about 120 dB, i.e. the loudest sounds have 10^12 times the energy of the quietest sounds
– Electronic equipment uses dynamic range compression (DRC) to reduce the dynamic range of a signal

• Colors of noise
– White noise : a signal which has the same power at every frequency, i.e. a constant (flat) power density
– Pink noise : a signal whose power density decreases at a rate of 3 dB per octave
– Brown noise : a signal whose power density decreases at a rate of 6 dB per octave
– Red noise : oceanic ambient noises from distant sources
– Green noise : background noise of the world
– Blue noise : a signal whose power density increases at a rate of 3 dB per octave
– Purple noise : a signal whose power density increases at a rate of 6 dB per octave
– Black noise : a noise capable of canceling other noises and producing silence
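Decibels compare two quantities on a logarithmic scale: 10·log10 of a power (energy) ratio, or 20·log10 of an amplitude ratio, since power is proportional to the square of amplitude. The short Python sketch below checks that a 120 dB dynamic range corresponds to an energy ratio of 10^12; the function names are illustrative.

```python
import math

def power_ratio_to_db(ratio):
    """Level difference in dB for a ratio of powers (energies)."""
    return 10 * math.log10(ratio)

def amplitude_ratio_to_db(ratio):
    """Level difference in dB for a ratio of amplitudes (power ~ amplitude squared)."""
    return 20 * math.log10(ratio)

def db_to_power_ratio(db):
    """Power (energy) ratio corresponding to a level difference in dB."""
    return 10 ** (db / 10)

print(db_to_power_ratio(120))            # energy ratio for a 120 dB dynamic range
print(amplitude_ratio_to_db(10 ** 6))    # the same range expressed as an amplitude ratio
print(power_ratio_to_db(2))              # doubling the power raises the level by about 3 dB
```

The last line also connects to the noise colours: a power density that halves from one octave to the next (pink noise) falls by about 3 dB per octave, and one that drops to a quarter (brown noise) falls by about 6 dB per octave.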


Components of Audio System


• Microphone
– Microphone converts environmental sound into electrical signals
– Structurally microphones may be divided into moving-coil type and condenser type
– Moving coil microphone consists of a diaphragm attached to a wire coil, placed in a magnetic field
– Sound waves hitting the diaphragm cause it and the attached coil to vibrate within the magnetic field
– This induces a current in the coil, which is the microphone output
– In a condenser microphone, the diaphragm is attached to one plate of a capacitor
– Sound waves hitting the diaphragm cause it and the capacitor plate to vibrate
– This changes the capacitance and hence produces a current in the external circuit


Components of Audio System


• Microphone
– Functionally microphones may be divided into omni-directional, bi-directional, uni-directional types
– Omni-directional microphones respond to sound coming from all directions
– Sound from any direction enters through the opening and hits the diaphragm causing it to vibrate
– Bi-directional microphones respond to sound coming from front and back but not from the sides
– Sounds from the sides enter through two openings and hit the diaphragm with equal and opposite forces, so they cancel out
– Uni-directional microphones respond to sound coming only from the front
– A resistive material placed behind the diaphragm absorbs the energy of sound arriving from the back
– Polar plots indicate the response of a microphone to sounds coming from different angles
– Omni-directional microphones respond equally in all directions, hence their polar plot is a circle
– Bi-directional microphones respond only from the front and back, hence the plot looks like the figure ‘8’
– Uni-directional microphones respond only from the front, hence the plot is heart-shaped (cardioid)
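The three polar plots can be described by idealized formulas for the relative response r at an angle θ from the front of the microphone: r = 1 (circle), r = |cos θ| (figure 8) and r = (1 + cos θ)/2 (cardioid). These textbook idealizations are assumed in the Python sketch below; real microphones deviate from them, especially at high frequencies.

```python
import math

def omni(theta):
    """Omni-directional (idealized): equal response from every direction, a circle."""
    return 1.0

def figure_eight(theta):
    """Bi-directional (idealized): full response front and back, none from the sides."""
    return abs(math.cos(theta))

def cardioid(theta):
    """Uni-directional (idealized cardioid): full response in front, none from the back."""
    return (1 + math.cos(theta)) / 2

for deg in (0, 90, 180):
    th = math.radians(deg)
    print(f"{deg:3d} deg  omni={omni(th):.2f}  figure-8={figure_eight(th):.2f}  cardioid={cardioid(th):.2f}")
```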


Components of Audio System


• Amplifier
– Amplifiers boost the amplitude levels of electrical signals produced from microphones
– Class-A amplifiers use 100% of the input cycle for generating output
– Class-A amplifiers are not generally used as audio power amplifiers because they generate a large amount of heat
– Class-B amplifiers use 50% of the input cycle for amplification
– They are mostly used in pairs in a push-pull configuration, each amplifier working on a separate half of the input
– Class-C amplifiers use less than half of the input cycle for amplification
– The distortion produced can be reduced by using negative feedback
– Class-D digital amplifiers use a set of transistors as switches
– Class-E amplifiers use PWM to produce output pulses whose widths are proportional to the desired amplitudes
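The pulse-width modulation (PWM) idea can be sketched in a few lines: each audio sample between -1 and +1 becomes one switching period made of a fixed number of time slots, and the fraction of slots for which the output is high grows linearly with the sample value. The slot count and the linear mapping are assumptions for illustration; averaging each period (the job of the output low-pass filter in a real switching amplifier) recovers the samples to within the slot resolution.

```python
def pwm_encode(samples, slots=10):
    """Encode samples in [-1, 1] as pulses: each sample gets `slots` time slots,
    of which round((s + 1) / 2 * slots) are high (+1) and the rest low (-1).
    The slot count and linear mapping are assumptions of this sketch."""
    out = []
    for s in samples:
        s = max(-1.0, min(1.0, s))                 # clip to the valid range
        high = round((s + 1) / 2 * slots)          # pulse width in slots
        out.extend([1] * high + [-1] * (slots - high))
    return out

def pwm_decode(pulses, slots=10):
    """Average each period (a crude low-pass filter) to recover the sample values."""
    return [sum(pulses[i:i + slots]) / slots for i in range(0, len(pulses), slots)]

samples = [0.0, 0.4, 1.0, -0.6]
pulses = pwm_encode(samples)
print(pulses[10:20])        # the 0.4 sample: 7 high slots followed by 3 low slots
print(pwm_decode(pulses))
```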


Components of Audio System


• Loudspeaker
– Converts electrical signals back to environmental sounds
– Current is passed through a coil of wire attached to a diaphragm, kept in a magnetic field
– The interaction of this current with the magnetic field produces a varying force, which vibrates the diaphragm and generates sound
– To get a good response across the audible range, loudspeakers are divided into a number of units
– The physical properties of each unit are designed to make it suitable for a specific frequency range
– Woofers are thick and heavy, and handle low frequency ranges from 20 Hz to 400 Hz
– Mid-range units are designed to handle middle frequency ranges from 400 Hz to 4 kHz
– Tweeters are thin and light, and handle high-frequency ranges from 4 kHz to 20 kHz
– Modern speaker systems also have a sub-woofer for handling very low frequencies below 100 Hz

• Audio mixer
– Used to record individual tracks of audio and store them in editable format
– Each track is typically recorded via a separate microphone, in a multi-sound scenario such as an orchestra
– Sounds in the recorded tracks can subsequently be modified w.r.t. loudness, pitch, tempo etc.
– Controls may also be provided for special effects like chorus, echo, reverb, panning etc.
– Finally, the multiple tracks are mixed down into two or five channels for different playback systems
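The final mix-down step can be sketched as a weighted sum: each track is scaled by its own gain (loudness control) and split between the left and right channels by a pan setting, and all tracks are then added per channel. The linear pan law and the parameter names are assumptions for illustration; real mixers often use a constant-power pan law instead.

```python
def mixdown_stereo(tracks, gains, pans):
    """Mix mono tracks (equal-length lists of samples) into (left, right).

    gains: per-track loudness multipliers.
    pans:  per-track pan position in [-1, 1]; -1 = hard left, 0 = centre, +1 = hard right.
    Linear pan law (assumed): left gain = (1 - p) / 2, right gain = (1 + p) / 2.
    """
    n = len(tracks[0])
    left, right = [0.0] * n, [0.0] * n
    for track, g, p in zip(tracks, gains, pans):
        gl, gr = g * (1 - p) / 2, g * (1 + p) / 2
        for i, s in enumerate(track):
            left[i] += gl * s
            right[i] += gr * s
    return left, right

vocals = [0.2, 0.4, 0.2]
drums = [0.8, -0.8, 0.8]
left, right = mixdown_stereo([vocals, drums], gains=[1.0, 0.5], pans=[0.0, 1.0])
print(left)   # vocals only, at half level because they are centre-panned
print(right)  # vocals at half level plus the drums, panned hard right at gain 0.5
```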


Digital Audio
• Overview
– An analog audio signal whose highest frequency is f needs to be sampled at a rate of at least F = 2f, as per the Nyquist sampling theorem
– If b is the bit depth and c the number of channels, then the data rate is D = F.b.c bits per second
– If T is the duration in seconds, then the number of samples per channel is N = F.T
– File size S = D.T bits, i.e. D.T/8 bytes
– For audio-CD quality, sampling is done at 44.1 kHz, 16-bit, stereo mode
– One minute of uncompressed CD-quality audio takes up about 10 MB of space (44100 × 16 × 2 × 60 / 8 = 10,584,000 bytes)
– For recording human speech, the sampling rate need not exceed 11 kHz, as sounds produced by the human voice box do not exceed 5.5 kHz in frequency
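The sampling-rate, data-rate and file-size formulas above translate directly into Python. A minimal sketch; the function names are illustrative, and file headers (such as the WAV header) and compression are ignored.

```python
def nyquist_rate(f_max):
    """Minimum sampling rate F = 2f for a signal whose highest frequency is f_max (Hz)."""
    return 2 * f_max

def data_rate_bps(F, b, c):
    """Data rate D = F.b.c in bits per second (F: sampling rate in Hz, b: bit depth, c: channels)."""
    return F * b * c

def num_samples(F, T):
    """Number of samples per channel N = F.T for a duration of T seconds."""
    return F * T

def file_size_bytes(F, b, c, T):
    """Uncompressed size S = D.T bits, divided by 8 to give bytes (headers ignored)."""
    return data_rate_bps(F, b, c) * T / 8

print(data_rate_bps(44100, 16, 2))          # 1411200 bits per second for CD quality
print(file_size_bytes(44100, 16, 2, 60))    # 10584000.0 bytes, i.e. about 10 MB per minute
print(nyquist_rate(5500))                   # 11000 Hz is enough for speech up to 5.5 kHz
```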


Synthesizers and MIDI


• Overview
– Synthesizers are electronic instruments that generate instrument sounds synthetically
– FM synthesizers generate sounds by combining elementary sinusoidal tones through frequency modulation, i.e. one sinusoid modulates the frequency of another
– Wavetable synthesizers generate sounds by retrieving high-quality digital recordings of instruments
– Polyphony refers to the ability of a synthesizer to play more than one note at a time
– A synthesizer is multi-timbral if it can produce two or more instrument sounds simultaneously
– MIDI (Musical Instrument Digital Interface) is a protocol for connecting synthesizers with computers

• MIDI hardware
– MIDI hardware uses cables with 5-pin DIN connectors for interfacing
– An interface adapter with a 25-pin serial connector and MIDI connectors is usually used for the purpose
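A minimal sketch of the FM idea in Python: a carrier sinusoid of frequency fc has its phase driven by a second sinusoid of frequency fm, y(t) = A·sin(2π·fc·t + I·sin(2π·fm·t)), where the modulation index I controls how strongly the extra components at fc ± n·fm appear. This is the classic two-operator form; commercial FM chips combine several such operators, and the 8 kHz sampling rate and parameter values here are hypothetical.

```python
import math

def fm_tone(fc, fm, index, duration, rate=8000, A=1.0):
    """Two-operator FM: y(t) = A*sin(2*pi*fc*t + index*sin(2*pi*fm*t)).

    fc: carrier frequency (Hz), fm: modulating frequency (Hz),
    index: modulation index (0 gives a pure sinusoid), rate: samples per second (assumed 8 kHz).
    """
    n = round(duration * rate)
    return [A * math.sin(2 * math.pi * fc * k / rate
                         + index * math.sin(2 * math.pi * fm * k / rate))
            for k in range(n)]

pure = fm_tone(440, 220, index=0.0, duration=0.01)    # plain 440 Hz sinusoid
bright = fm_tone(440, 220, index=3.0, duration=0.01)  # adds components at 440 +/- n*220 Hz
print(len(pure), len(bright))
```

With fc = 2·fm, as here, the extra components fall on multiples of 220 Hz, so the result is a harmonic note rather than a clangorous one.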


Synthesizers and MIDI


• MIDI connections
– A MIDI controller, typically a keyboard, generates digital instructions
– These instructions are transmitted to a sound module, which interprets them and produces sound
– The MIDI stream flows out of the controller through its MIDI-OUT port and into the sound module through its MIDI-IN port
– A MIDI sequencer allows MIDI data sequences to be captured, stored, edited, combined and replayed
– In a PC-based MIDI system, music composition is done using software
– A MIDI interface card in the PC converts this to MIDI data which is sent to sound modules


Synthesizers and MIDI


• MIDI messages
– MIDI based instructions are called messages
– Messages typically use a 3-byte format: a status byte followed by two data bytes
– The status byte specifies the function or operation to be performed; the data bytes contain additional parameters
– A 4-bit channel number in the status byte provides for 16 logical channels
– Channel messages contain musical information to be played on a specific channel
– System messages are not channel specific, and no channel number is included

• MIDI file format
– MIDI files do not contain sampled audio data like WAV files
– They contain instructions on how to play the sound
– These instructions are interpreted by a MIDI chip to generate the actual sounds
– Instructions are stored as compact binary event codes rather than as audio samples, which makes MIDI files very small
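The 3-byte channel message layout can be illustrated with a Note On message. In the MIDI 1.0 specification the status byte has its top bit set, its upper 4 bits give the message type (0x9 for Note On, 0x8 for Note Off) and its lower 4 bits give the channel (0 to 15, usually shown to users as 1 to 16); data bytes have their top bit clear, so they range from 0 to 127. A minimal Python sketch; the helper names are illustrative and only two message types are decoded.

```python
NOTE_OFF, NOTE_ON = 0x8, 0x9
MESSAGE_NAMES = {0x8: "Note Off", 0x9: "Note On"}   # only two message types handled here

def channel_message(kind, channel, data1, data2):
    """Build a 3-byte channel message: status byte followed by two data bytes."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0..15 (16 logical channels)")
    if not (0 <= data1 <= 127 and 0 <= data2 <= 127):
        raise ValueError("data bytes must be 0..127")
    status = (kind << 4) | channel        # upper 4 bits: message type, lower 4 bits: channel
    return bytes([status, data1, data2])

def parse_status(status):
    """Split a status byte into (message name, channel number)."""
    if status < 0x80:
        raise ValueError("not a status byte (top bit is clear)")
    kind, channel = status >> 4, status & 0x0F
    return MESSAGE_NAMES.get(kind, hex(kind)), channel

msg = channel_message(NOTE_ON, channel=3, data1=60, data2=100)  # note 60 (middle C), velocity 100
print(msg.hex())             # 933c64
print(parse_status(msg[0]))  # ('Note On', 3)
```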

Synthesizers and MIDI


• General MIDI specifications
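– The GM specification defines a standard patch map, i.e. a standard assignment of instrument sounds to program numbers
– The GM-1 specification contains 128 preset instruments
– Instrument sounds are grouped into 16 families of 8 instruments each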
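Because the GM-1 patch map places 8 consecutive program numbers in each of 16 families, the family of a program number can be computed arithmetically. A minimal Python sketch; the family names follow the GM-1 patch map, and numbering programs from 1 (as in the published map, rather than the 0-based value carried in a Program Change message) is the convention assumed here.

```python
GM1_FAMILIES = [
    "Piano", "Chromatic Percussion", "Organ", "Guitar",
    "Bass", "Strings", "Ensemble", "Brass",
    "Reed", "Pipe", "Synth Lead", "Synth Pad",
    "Synth Effects", "Ethnic", "Percussive", "Sound Effects",
]

def gm_family(program):
    """Family of a GM-1 program number, counted 1..128 (assumed convention, as in the patch map)."""
    if not 1 <= program <= 128:
        raise ValueError("GM-1 program numbers run from 1 to 128")
    return GM1_FAMILIES[(program - 1) // 8]   # 8 consecutive programs per family

print(gm_family(1), gm_family(41), gm_family(128))  # Piano Strings Sound Effects
```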


Digital Audio Processing


• Basic operations
– Reading and playing an audio file
– Visual display of waveform : plot of amplitude vs. time
– Normalization and re-sampling the audio signal
– Writing audio data to a file
– Generating sinusoidal waveforms of specific amplitude and frequency
– Adding multiple tonal waveforms to form a composite note
– Adding a noise to a signal
– Associated MATLAB commands : wavread, wavwrite, wavplay, sound, plot
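Several of these basic operations (generating sinusoids, adding tones into a composite note, adding noise, normalizing and writing to a file) can be sketched with the Python standard library alone, as a rough counterpart to the MATLAB commands listed. The 8 kHz sampling rate, the harmonic amplitudes and the noise level are arbitrary assumptions; the `wave` module writes a 16-bit mono WAV stream, here into an in-memory buffer.

```python
import io
import math
import random
import struct
import wave

RATE = 8000  # samples per second (assumed)

def tone(amplitude, freq, duration, rate=RATE):
    """Sinusoid of the given amplitude and frequency (Hz), lasting `duration` seconds."""
    return [amplitude * math.sin(2 * math.pi * freq * k / rate) for k in range(round(duration * rate))]

def add(*signals):
    """Sample-by-sample sum of equal-length signals."""
    return [sum(vals) for vals in zip(*signals)]

def normalize(x):
    """Scale so that the largest absolute sample becomes 1."""
    peak = max(abs(v) for v in x)
    return [v / peak for v in x] if peak else list(x)

def add_noise(x, level, seed=0):
    """Add Gaussian noise with standard deviation `level` (seeded for repeatability)."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, level) for v in x]

def write_wav(x, f, rate=RATE):
    """Write samples in [-1, 1] as 16-bit mono PCM to the file-like object f."""
    clipped = (max(-1.0, min(1.0, v)) for v in x)
    frames = b"".join(struct.pack("<h", int(c * 32767)) for c in clipped)
    with wave.open(f, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(frames)

# Composite note: a 220 Hz fundamental plus two weaker harmonics
note = normalize(add(tone(1.0, 220, 0.5), tone(0.5, 440, 0.5), tone(0.25, 660, 0.5)))
noisy = normalize(add_noise(note, level=0.05))
buf = io.BytesIO()
write_wav(noisy, buf)
print(len(note), "samples,", len(buf.getvalue()), "bytes of WAV data")
```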


Digital Audio Processing


• Temporal domain filtering (uniform filter)
– Uniform temporal domain filtering is expressed in the form y = H(b, 1, x)
– Here x = [x(1), x(2), …, x(n)] is an n-element input audio signal
– H is the filter with an m-element coefficient vector b = [b(1), b(2), …, b(m)]
– Then y = [y(1), y(2), …, y(n)] is the filtered output, expressed as a linear combination of the current and past input samples weighted by the filter coefficients
– The i-th output element is given by : y(i) = b(1).x(i) + b(2).x(i-1) + … + b(m).x(i-m+1)
– The effect of the filter is to uniformly scale the amplitude of the input audio signal
– As an example : for x = [1, 2, 3, 4, 5], b = [1, -0.5], we get y = [1, 1.5, 2, 2.5, 3]
– Associated MATLAB commands : filter
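The difference equation can be coded directly. This pure-Python sketch mirrors what MATLAB's filter(b, 1, x) computes, with 0-based list indices standing in for the 1-based indices on the slide and input values before the start of the signal taken as 0; it reproduces the worked example.

```python
def fir_filter(b, x):
    """Uniform (FIR) filtering, like y = filter(b, 1, x):
    y(i) = b(1).x(i) + b(2).x(i-1) + ... + b(m).x(i-m+1), with x taken as 0 before the start."""
    y = []
    for i in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):          # k = 0 corresponds to b(1)
            if i - k >= 0:
                acc += bk * x[i - k]
        y.append(acc)
    return y

print(fir_filter([1, -0.5], [1, 2, 3, 4, 5]))  # [1.0, 1.5, 2.0, 2.5, 3.0]
```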


Digital Audio Processing


• Temporal domain filtering (non-uniform filter)
– Non-uniform temporal domain filtering is expressed in the form y = H(b, a, x)
– Here x = [x(1), x(2), …, x(n)] is an n-element input audio signal
– H is the filter with m-element coefficient vectors a = [a(1), a(2), …, a(m)] and b = [b(1), b(2), …, b(m)]
– Then y = [y(1), y(2), …, y(n)] is the filtered output, which depends on past output values as well as on the input signal and filter coefficients (a recursive filter)
– The i-th output element is given by : a(1).y(i) + a(2).y(i-1) + … + a(m).y(i-m+1) = b(1).x(i) + b(2).x(i-1) + … + b(m).x(i-m+1)
– The effect of the filter is to non-uniformly scale the amplitude of the input audio signal
– As an example : for x = [1, 2, 3, 4, 5], a = [1, -1], b = [1, -0.5], we get y = [1, 2.5, 4.5, 7, 10]
– Associated MATLAB commands : filter
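Solving the difference equation for the newest output gives the recursion y(i) = (b(1).x(i) + b(2).x(i-1) + … - a(2).y(i-1) - …)/a(1), which is what MATLAB's filter(b, a, x) evaluates. A pure-Python sketch with 0-based indices and zero initial conditions that reproduces the example above:

```python
def iir_filter(b, a, x):
    """Non-uniform (recursive) filtering, like y = filter(b, a, x):
    a(1).y(i) + a(2).y(i-1) + ... = b(1).x(i) + b(2).x(i-1) + ...
    solved for y(i); values before the start of the signal are taken as 0."""
    if a[0] == 0:
        raise ValueError("a(1) must be non-zero")
    y = []
    for i in range(len(x)):
        acc = sum(bk * x[i - k] for k, bk in enumerate(b) if i - k >= 0)      # feed-forward part
        acc -= sum(a[k] * y[i - k] for k in range(1, len(a)) if i - k >= 0)  # feedback part
        y.append(acc / a[0])
    return y

print(iir_filter([1, -0.5], [1, -1], [1, 2, 3, 4, 5]))  # [1.0, 2.5, 4.5, 7.0, 10.0]
```

With a = [1] the feedback part vanishes and the function reduces to the uniform filter of the previous slide.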


Digital Audio Processing


• Window functions
– Rectangular window : has value 1 over a specific interval, 0 elsewhere

– Bartlett window : has a triangular form

– Gaussian window : has a bell shaped form


Digital Audio Processing


• Window functions
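– Hamming window : a raised-cosine form, w(n) = 0.54 - 0.46 cos(2πn/(N-1)), n = 0, 1, …, N-1

– Blackman window : w(n) = 0.42 - 0.5 cos(2πn/(N-1)) + 0.08 cos(4πn/(N-1)), n = 0, 1, …, N-1

– Results of applying the windowing functions to a sinusoidal signal are shown in the slide figure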


– Associated MATLAB commands : window
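The five windows can be written as sample-by-sample formulas over n = 0, 1, …, N-1, and windowing a signal simply multiplies it by the window sample by sample. The Hamming and Blackman expressions are the standard ones; the Bartlett form used here is the triangle that is 0 at both ends, and the Gaussian form with width parameter alpha = 2.5 follows the convention of MATLAB's gausswin, both assumptions of this Python sketch.

```python
import math

def rectangular(N):
    return [1.0] * N

def bartlett(N):
    """Triangular window, 0 at both ends and 1 at the centre."""
    return [1 - abs(2 * n / (N - 1) - 1) for n in range(N)]

def gaussian(N, alpha=2.5):
    """Bell-shaped window; alpha (assumed default 2.5) controls the width."""
    half = (N - 1) / 2
    return [math.exp(-0.5 * (alpha * (n - half) / half) ** 2) for n in range(N)]

def hamming(N):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def blackman(N):
    return [0.42 - 0.5 * math.cos(2 * math.pi * n / (N - 1))
            + 0.08 * math.cos(4 * math.pi * n / (N - 1)) for n in range(N)]

N = 64
signal = [math.sin(2 * math.pi * 5 * n / N) for n in range(N)]   # 5 cycles of a sinusoid
for win in (rectangular, bartlett, gaussian, hamming, blackman):
    w = win(N)
    windowed = [s * v for s, v in zip(signal, w)]               # sample-by-sample product
    print(f"{win.__name__:11s} first={w[0]:.3f} peak of windowed signal={max(windowed):.3f}")
```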


Digital Audio Processing


• Frequency domain conversions
– An audio signal is converted to frequency domain to determine its frequency components
– The frequency plot of a 60-Hz sinusoidal signal shows a peak at 60 Hz
– The DFT of the signal, usually computed with the FFT algorithm, is used for the frequency domain conversion
– A spectrogram plots the DFTs of successive short segments of the signal, with frequency against time and the Z-axis (or colour) representing amplitude
– Associated MATLAB commands : fft, fftshift, specgram
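To see the 60 Hz peak numerically, a direct DFT is enough for a short signal. A pure-Python sketch; the 1 kHz sampling rate and 0.2 s duration are assumptions chosen so that 60 Hz falls exactly on a frequency bin (bin spacing fs/N = 5 Hz). In practice the FFT computes the same values far more efficiently.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) DFT: X(k) = sum_n x(n) * exp(-2j*pi*k*n/N). An FFT gives the same result faster."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]

fs = 1000                 # sampling rate in Hz (assumed)
N = 200                   # 0.2 s of signal, so the bin spacing is fs / N = 5 Hz
x = [math.sin(2 * math.pi * 60 * n / fs) for n in range(N)]
X = dft(x)

# Magnitudes of the non-negative frequency bins 0 .. N/2
mags = [abs(X[k]) for k in range(N // 2 + 1)]
peak_bin = max(range(len(mags)), key=mags.__getitem__)
print(peak_bin * fs / N, "Hz")   # 60.0 Hz
```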


Digital Audio Processing


• Cross correlation
– A measure of similarity of two waveforms
– Computed by sliding one waveform over another and summing the products of the overlapping samples at each shift
– Similar waveforms generate high correlation values
– For two m-element signals, the correlation signal has a total of 2m-1 elements
– Auto correlation is the cross correlation of a signal with itself
– Auto correlation is used to detect inherent periodicities within a signal, as shown below right
– Associated MATLAB commands : xcorr
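A pure-Python sketch of full cross correlation and of using autocorrelation to find a period. The lag convention r(lag) = Σ x(n+lag)·y(n) matches MATLAB's xcorr(x, y); taking the period as the positive lag with the largest autocorrelation is a simple heuristic assumed here, adequate for a clean periodic test signal.

```python
import math

def xcorr(x, y):
    """Full cross correlation of two m-element signals: 2m-1 values.
    Entry j corresponds to lag = j - (m-1), with r(lag) = sum_n x(n+lag) * y(n)."""
    m = len(x)
    return [sum(x[n + lag] * y[n] for n in range(m) if 0 <= n + lag < m)
            for lag in range(-(m - 1), m)]

def estimate_period(x):
    """Period in samples, taken (heuristically) as the positive lag with the largest autocorrelation."""
    r = xcorr(x, x)                       # autocorrelation: the signal correlated with itself
    positive = r[len(x):]                 # lags 1 .. m-1
    return 1 + max(range(len(positive)), key=positive.__getitem__)

print(xcorr([1, 2, 3], [1, 2, 3]))        # [3, 8, 14, 8, 3]
periodic = [math.sin(2 * math.pi * n / 8) for n in range(64)]
print(estimate_period(periodic))          # 8
```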


Digital Audio Processing


• Waveforms with variable frequencies
– A linearly increasing frequency (a linear chirp) is generated by specifying two frequency values at two different times
– A concave or convex curve in the spectrogram (a non-linear frequency sweep) leads to fluctuating pitches
– Associated MATLAB commands : chirp, specgram
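For a linear sweep, the phase is the integral of the instantaneous frequency, so a frequency rising linearly from f0 at t = 0 to f1 at t = t1 gives a quadratic phase term. A minimal Python sketch that follows the default (linear) form of MATLAB's chirp, assuming zero initial phase and an 8 kHz sampling rate.

```python
import math

def linear_chirp(t, f0, t1, f1):
    """Linear chirp like chirp(t, f0, t1, f1) with zero initial phase (assumed):
    cos(2*pi*(f0*t + k*t**2/2)), where k = (f1 - f0) / t1."""
    k = (f1 - f0) / t1
    return math.cos(2 * math.pi * (f0 * t + 0.5 * k * t * t))

def instantaneous_frequency(t, f0, t1, f1):
    """Frequency (Hz) of the linear chirp at time t: starts at f0 and reaches f1 at t1."""
    return f0 + (f1 - f0) * t / t1

rate = 8000                                                              # samples per second (assumed)
sweep = [linear_chirp(n / rate, 100, 1.0, 1000) for n in range(rate)]  # 100 Hz to 1 kHz in 1 s
for t in (0.0, 0.5, 1.0):
    print(t, instantaneous_frequency(t, 100, 1.0, 1000))               # 100, 550 and 1000 Hz
```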


Digital Audio Processing


• Masking
– Masking can be demonstrated by playing two tones having small differences in frequency
– The two tones in such a case cannot be clearly distinguished, as they lie within the same critical band
– When the frequencies are far apart, two distinct sounds can be heard

• Cepstral analysis
– Cepstrum of a signal is inverse FFT of logarithm of the FFT of the signal
– Let x be the time domain signal and X its frequency domain representation
– Then if X’ represents log(X) and x’ is the inverse DFT of X’, x’ is the cepstrum of x
– The cepstrum is useful for detecting echoes and reverberation, and for finding the fundamental frequency of a signal
– Associated MATLAB commands : cceps
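A minimal sketch of echo detection with the real cepstrum (inverse DFT of the logarithm of the DFT magnitude), written in plain Python. Note that this is the real cepstrum, a simpler relative of the complex cepstrum computed by MATLAB's cceps. The test signal, random noise plus a copy delayed by 20 samples at 0.6 of the level (applied circularly so the DFT relation is exact), is a hypothetical example: the echo multiplies the spectrum by (1 + α·e^(-jωd)), and the logarithm turns that product into a sum, which appears as a peak at quefrency d.

```python
import cmath
import math
import random

def dft(x, inverse=False):
    """Direct O(N^2) DFT, or inverse DFT when inverse=True (fine for short signals)."""
    N = len(x)
    sign = 1 if inverse else -1
    X = [sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / N) for n in range(N))
         for k in range(N)]
    return [v / N for v in X] if inverse else X

def real_cepstrum(x):
    """Real cepstrum: inverse DFT of the logarithm of the DFT magnitude."""
    log_mag = [math.log(abs(v)) for v in dft(x)]
    return [v.real for v in dft(log_mag, inverse=True)]

# Hypothetical test signal: noise plus an echo delayed by 20 samples at 0.6 of the level,
# applied circularly so that the DFT relation is exact
N, delay, alpha = 256, 20, 0.6
rng = random.Random(1)
s = [rng.gauss(0.0, 1.0) for _ in range(N)]
x = [s[n] + alpha * s[(n - delay) % N] for n in range(N)]

c = real_cepstrum(x)
# Ignore the lowest quefrencies (overall spectral shape) and search the first half
peak = max(range(5, N // 2), key=lambda q: c[q])
print("echo delay estimate:", peak, "samples")
```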


Digital Audio Processing


• Wavelet analysis (single resolution)
– Wavelet analysis is equivalent to passing the signal through a set of low-pass and high-pass filters
– Let x be the input signal having n elements
– Output w is generated by passing x through a low-pass filter with coefficients (a, b)
– Output z is generated by passing x through a high-pass filter with coefficients (-b, a)
– The number of elements in each output is halved to produce signals wd and zd, a process called down-sampling
– This entire step is called analysis
– To reconstruct the signal, wd and zd are up-sampled to produce wu and zu
– Output wu is passed through a synthesis filter having coefficients (b, a) to produce wf
– Output zu is passed through a synthesis filter having coefficients (a, -b) to produce zf
– This entire step is called synthesis
– The signals wf and zf are added up to reconstruct the original signal


Digital Audio Processing


• Wavelet analysis (single resolution)
– Analysis filtering : w(n) = a.x(n) + b.x(n-1), z(n) = -b.x(n) + a.x(n-1)
– Down-sampling : wd(n) = w(2n), zd(n) = z(2n), i.e. only the outputs at even n are kept and those at odd n are discarded
– Synthesis up-sampling : wu(n) = wd(n/2), zu(n) = zd(n/2), for n even, wu(n) = 0, zu(n) = 0 for n odd
– Synthesis filtering : wf(n) = b.wu(n) + a.wu(n-1), zf(n) = a.zu(n) - b.zu(n-1)
– Reconstruction : y(n) = wf(n) + zf(n)
– For a Haar wavelet, filter coefficients are equal and given by : a = b = 1/√2
– Example : if x = [9, 7, 3, 5], then wd = (1/√2)[16, 8], zd = (1/√2)[2, -2], wf = [8, 8, 4, 4], zf = [1, -1, -1, 1]
– Associated MATLAB commands : wavedec, appcoef, detcoef, wrcoef
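The single-resolution steps can be sketched in plain Python, with each filter written as a 2-tap convolution and the down- and up-sampling done by list slicing. This follows the slide's equations rather than any library. One detail the equations leave implicit is that the analysis and synthesis filters together delay the signal by one sample; removing that delay at the end is a choice of this sketch. With a = b = 1/√2 it reproduces the Haar example for x = [9, 7, 3, 5].

```python
import math

A = B = 1 / math.sqrt(2)   # Haar coefficients a = b = 1/sqrt(2)

def convolve(x, h):
    """Full convolution: y(n) = h(1).x(n) + h(2).x(n-1) + ..., of length len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += hk * xn
    return y

def analysis(x, a=A, b=B):
    w = convolve(x, [a, b])     # low-pass:  w(n) = a.x(n) + b.x(n-1)
    z = convolve(x, [-b, a])    # high-pass: z(n) = -b.x(n) + a.x(n-1)
    # Down-sampling: keep the outputs at even n (1-based), i.e. list indices 1, 3, 5, ...
    return w[1:len(x):2], z[1:len(x):2]

def upsample(v):
    """wu(n) = v(n/2) for n even, 0 for n odd (1-based): [0, v1, 0, v2, ...]."""
    u = []
    for s in v:
        u += [0.0, s]
    return u

def synthesis(wd, zd, a=A, b=B):
    wf = convolve(upsample(wd), [b, a])    # wf(n) = b.wu(n) + a.wu(n-1)
    zf = convolve(upsample(zd), [a, -b])   # zf(n) = a.zu(n) - b.zu(n-1)
    wf, zf = wf[1:], zf[1:]                # drop the one-sample delay (choice of this sketch)
    return [p + q for p, q in zip(wf, zf)], wf, zf

wd, zd = analysis([9, 7, 3, 5])
y, wf, zf = synthesis(wd, zd)
print([round(v * math.sqrt(2), 6) for v in wd], [round(v * math.sqrt(2), 6) for v in zd])  # [16.0, 8.0] [2.0, -2.0]
print([round(v, 6) for v in wf], [round(v, 6) for v in zf])  # [8.0, 8.0, 4.0, 4.0] [1.0, -1.0, -1.0, 1.0]
print([round(v, 6) for v in y])                              # [9.0, 7.0, 3.0, 5.0]
```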


Digital Audio Processing


• Wavelet analysis (multi resolution)
– Analysis filtering (level-1): w1(n) = a.x(n) + b.x(n-1), z1(n) = -b.x(n) + a.x(n-1)
– Down-sampling (level-1) :
– wd1(n) = w1(2n), zd1(n) = z1(2n), i.e. only values at even n are kept; values at odd n are discarded
– Analysis filtering (level-2) :
– w2(n) = a.wd1(n) + b.wd1(n-1), z2(n) = -b.wd1(n) + a.wd1(n-1), w3(n) = a.zd1(n) + b.zd1(n-1), z3(n) = -b.zd1(n) + a.zd1(n-1)
– Down-sampling (level-2) :
– wd2(n) = w2(2n), zd2(n) = z2(2n)
– wd3(n) = w3(2n), zd3(n) = z3(2n)
– Values at odd n are discarded


Digital Audio Processing


• Wavelet analysis (multi resolution)
– Synthesis up-sampling (level-2) : wu2(n) = wd2(n/2), zu2(n) = zd2(n/2), wu3(n) = wd3(n/2), zu3(n) = zd3(n/2) for n even; wu2(n) = zu2(n) = wu3(n) = zu3(n) = 0 for n odd
– Synthesis filtering (level-2) : wf2(n) = b.wu2(n) + a.wu2(n-1), zf2(n) = a.zu2(n) - b.zu2(n-1), wf3(n) = b.wu3(n) + a.wu3(n-1), zf3(n) = a.zu3(n) - b.zu3(n-1)
– Reconstruction (level-2) : y2 = wf2 + zf2, y3 = wf3 + zf3
– Up-sampling (level-1) : wu4(n) = y2(n/2), zu4(n) = y3(n/2) for n even; wu4(n) = zu4(n) = 0 for n odd
– Synthesis filtering (level-1) : wf4(n) = b.wu4(n) + a.wu4(n-1), zf4(n) = a.zu4(n) - b.zu4(n-1)
– Reconstruction (level-1) : y(n) = wf4(n) + zf4(n)


Digital Audio Processing


• Wavelet analysis (multi resolution)
– For example, if x = [9, 7, 3, 5], then cA1 = (1/√2)[16, 8], cD1 = (1/√2)[2, -2], cA2 = [12], cD2 = [4], D1 = [1, -1, -1, 1], A1 = [8, 8, 4, 4]
– A1 can further be split into A2 and D2 : A2 = [6, 6, 6, 6], D2 = [2, 2, -2, -2]
– Thus y = A2 + D1 + D2 = [6, 6, 6, 6] + [1, -1, -1, 1] + [2, 2, -2, -2] = [9, 7, 3, 5]
– Associated MATLAB commands : wavedec, appcoef, detcoef, wrcoef
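The two-level decomposition in the example can be reproduced by splitting x once, splitting only the approximation cA1 again (as the example and MATLAB's wavedec do), and then rebuilding each component A2, D2 and D1 at full length by synthesizing with all other coefficients set to zero. A plain-Python sketch assuming Haar coefficients a = b = 1/√2; removing the one-sample delay of each analysis/synthesis pair is a choice of this sketch. The full tree in the equations above, which also splits zd1, would apply `analysis` to cD1 as well.

```python
import math

A = B = 1 / math.sqrt(2)   # Haar coefficients a = b = 1/sqrt(2)

def convolve(x, h):
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += hk * xn
    return y

def analysis(x):
    """One level: low-pass (approximation) and high-pass (detail), each down-sampled by 2."""
    w, z = convolve(x, [A, B]), convolve(x, [-B, A])
    return w[1:len(x):2], z[1:len(x):2]

def synthesis(wd, zd):
    """One level of up-sampling, synthesis filtering and addition (one-sample delay removed)."""
    def up(v):
        return [s for val in v for s in (0.0, val)]
    wf, zf = convolve(up(wd), [B, A]), convolve(up(zd), [A, -B])
    return [p + q for p, q in zip(wf[1:], zf[1:])]

def zeros_like(v):
    return [0.0] * len(v)

x = [9, 7, 3, 5]
cA1, cD1 = analysis(x)          # level 1
cA2, cD2 = analysis(cA1)        # level 2: only the approximation is split further

A2 = synthesis(synthesis(cA2, zeros_like(cD2)), zeros_like(cD1))   # approximation at level 2
D2 = synthesis(synthesis(zeros_like(cA2), cD2), zeros_like(cD1))   # detail at level 2
D1 = synthesis(zeros_like(cA1), cD1)                               # detail at level 1

print([round(v, 6) for v in A2], [round(v, 6) for v in D2], [round(v, 6) for v in D1])
print([round(p + q + r, 6) for p, q, r in zip(A2, D2, D1)])        # [9.0, 7.0, 3.0, 5.0]
```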
