Module 4
Introduction
An audio (analog) signal is digitized using the PCM process,
which involves SAMPLING.
Sampling rate ≥ 2 × (highest frequency component), i.e. the Nyquist rate.
Band-limited signal: when the bandwidth of the communication channel to be used
cannot support the minimum sampling rate, the signal needs to be band-limited first.
Speech signal (15 Hz to 10 kHz):
Max. frequency component: 10 kHz
Minimum sampling rate: 2 × 10 = 20 ksps
Bits per sample: 12
Bit rate used: sampling rate × bits per sample = 20 ksps × 12 = 240 kbps

Audio signal (50 Hz to 20 kHz):
Max. frequency component: 20 kHz
Minimum sampling rate: 2 × 20 = 40 ksps
Bits per sample: 16
Bit rate used: 40 ksps × 16 = 640 kbps per channel, i.e. 1.28 Mbps for stereo
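The figures above follow directly from the Nyquist rate. A quick sketch (note the 1.28 Mbps figure assumes two stereo channels, which the slide does not state explicitly; the function name is illustrative):

```python
# Sketch: PCM bit-rate figures quoted above.
def pcm_bit_rate(max_freq_hz, bits_per_sample, channels=1):
    """Minimum (Nyquist) sampling rate times bits per sample times channels."""
    sampling_rate = 2 * max_freq_hz          # Nyquist: >= 2 x highest frequency
    return sampling_rate * bits_per_sample * channels

speech = pcm_bit_rate(10_000, 12)              # 20 ksps x 12 bits
audio = pcm_bit_rate(20_000, 16, channels=2)   # 40 ksps x 16 bits, stereo

print(speech)  # 240000  -> 240 kbps
print(audio)   # 1280000 -> 1.28 Mbps
```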
How Does the Concept of Audio Compression Arise?
In most multimedia applications, the bandwidth of the available communication
channel does not support bit rates as high as 240 kbps or 1.28 Mbps;
it offers lower bit rates.
So what is the solution? There are two solutions:
Solution 1: Sample the audio signal at a lower rate (the bad one).
Merit: simple to implement.
Demerits: 1. The quality of the decoded signal is reduced, with loss of the
HF components of the original signal.
2. Using fewer bits per sample results in high quantization noise (QN).
Solution 2: Use a compression algorithm (the good one).
Gives good perceptual quality.
Reduces the bandwidth requirement.
Further discussion is on Audio Compression Methods……
1. Differential Pulse Code Modulation (DPCM)
Differential pulse code modulation is a derivative of standard PCM.
It uses the fact that the range of differences in amplitude between
successive samples of the audio waveform is smaller than the range
of the actual sample amplitudes.
Hence fewer bits are required to represent the difference signals
than are needed by PCM at the same sampling rate.
It reduces the bit rate requirements from 64kbps to 56kbps.
DPCM Principles
Operation of DPCM:
Encoder
The previously digitized sample is held in a register (R).
The DPCM signal is computed by subtracting the current register contents (R0)
from the new output of the ADC (PCM).
The register value is then updated before transmission.
DPCM = PCM - R0
Decoder
The decoder simply adds the previous register contents (R0) to
the received DPCM value:
R1 = R0 + DPCM
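The register operations above can be sketched as follows (function names are illustrative, not from any standard API):

```python
# Minimal DPCM encoder/decoder following the register description above.
def encode_dpcm(samples):
    r = 0                       # register R: previously digitized sample
    diffs = []
    for pcm in samples:
        diffs.append(pcm - r)   # DPCM = PCM - R0
        r = pcm                 # update the register before the next sample
    return diffs

def decode_dpcm(diffs):
    r = 0
    samples = []
    for d in diffs:
        r = r + d               # R1 = R0 + DPCM
        samples.append(r)
    return samples

samples = [50, 52, 55, 53, 54]
print(encode_dpcm(samples))     # [50, 2, 3, -2, 1]
assert decode_dpcm(encode_dpcm(samples)) == samples
```

The differences (2, 3, -2, 1) span a much smaller range than the samples themselves, which is exactly why fewer bits suffice.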
Limitation of DPCM:
The ADC operation introduces a quantization error each time, and these
errors accumulate in the value stored in the register (R).
So the previous value (R) is only an approximation! We really need a
more accurate version of the previous signal, and that is what we get in…
2. Third-Order Predictive DPCM
To eliminate this noise effect, predictive methods are used to predict a
more accurate version of the previous signal (using not only the current
signal but also varying proportions of a number of the preceding
estimated signals).
The proportions used are known as predictor coefficients.
The difference signal is computed by subtracting varying proportions of the
last three predicted values from the current output of the ADC.
It reduces the bit-rate requirement from 64 kbps to 32 kbps.
Operation of Third-Order Predictive DPCM
Varying proportions of the last three register values R1, R2, and R3
(weighted by the predictor coefficients) are subtracted from the PCM output.
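A sketch of the third-order predictor described above; the coefficient values and function names are illustrative assumptions (real coders derive the coefficients from the signal statistics):

```python
# Third-order predictive DPCM sketch: the previous value is predicted as a
# weighted sum of the last three reconstructed values.
C1, C2, C3 = 0.5, 0.3, 0.2        # predictor coefficients (assumed values)

def encode_3rd_order(samples):
    r1 = r2 = r3 = 0.0             # last three reconstructed values
    diffs = []
    for pcm in samples:
        prediction = C1 * r1 + C2 * r2 + C3 * r3
        d = pcm - prediction       # difference signal that gets transmitted
        diffs.append(d)
        r1, r2, r3 = prediction + d, r1, r2   # shift the history
    return diffs

def decode_3rd_order(diffs):
    r1 = r2 = r3 = 0.0
    samples = []
    for d in diffs:
        prediction = C1 * r1 + C2 * r2 + C3 * r3
        s = prediction + d
        samples.append(s)
        r1, r2, r3 = s, r1, r2
    return samples

original = [10.0, 12.0, 11.0, 13.0]
decoded = decode_3rd_order(encode_3rd_order(original))
assert all(abs(a - b) < 1e-9 for a, b in zip(original, decoded))
```

Because both encoder and decoder run the same prediction on the same reconstructed history, the accumulated-error problem of plain DPCM is avoided.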
APPLICATION
The sound generated at this low bit rate is very synthetic, so LPC encoders
are used primarily in military applications, where bandwidth is all-important.
Linear predictive coding (LPC) signal encoder and decoder
6. Code-Excited LPC (CELP)
The synthesizer used in most LPC decoders is based on a very basic
model of the vocal tract.
CELP coders are intended for applications in which the available bandwidth
is limited but the perceived quality of the speech must be of an acceptable
standard for use in various multimedia applications.
In the CELP model, instead of treating each digitized segment
independently for encoding purposes, only a limited set of segments
is used, each known as a waveform template.
A precomputed set of templates is held by the encoder and the
decoder in what is known as the template codebook.
Each of the individual digitized samples that make up a particular
template in the codebook is differentially encoded.
All coders of this type have a delay associated with them, incurred
while each block of digitized samples is analysed by the encoder
and the speech is reconstructed at the decoder.
The combined delay value is known as the coder's processing delay.
In addition, before the speech samples can be analysed it is necessary to
buffer the block of samples; the time taken to accumulate the block is
known as the algorithmic delay.
The coder's delay is an important parameter: in a conventional telephony
application a low-delay coder is required, whereas in an interactive
application a delay of several seconds before the speech starts is acceptable.
Perceptual Coding (PC)
LPC and CELP are used only for telephony applications, and hence for
compression of speech signals.
Perceptual coders are designed for compression of general audio, such as
that associated with a digital television broadcast.
They use a psychoacoustic model, which exploits a number of limitations
of the human ear.
Using this approach, sampled segments of the source audio waveform
are analysed, but only those features that are perceptible to the ear are
transmitted.
For example, although the human ear is sensitive to signals in the range
15 Hz to 20 kHz, the level of sensitivity to each signal is non-linear;
that is, the ear is more sensitive to some signals than to others.
What is that limitation of the human ear? The MASKING effect.
Frequency Masking: When multiple signals are present in audio, a
strong signal may reduce the level of sensitivity of the ear to other
signals which are near to it in frequency.
Temporal masking: when the ear hears a loud sound, a short but finite
time must pass before it can hear a quieter sound.
The psychoacoustic model is used to identify the signals that are
affected by masking; these are then eliminated from the transmitted
signal, and hence compression is achieved.
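As a toy illustration of how a psychoacoustic model can eliminate masked components: the sketch below drops components that fall below a hearing threshold or are frequency-masked by a nearby stronger tone. The threshold, masking bandwidth, and margin values are invented for illustration and are not from any real codec:

```python
# Toy psychoacoustic filter: keep only the perceptible frequency components.
HEARING_THRESHOLD_DB = 20      # assumed flat hearing threshold
MASK_BANDWIDTH_HZ = 200        # a strong tone masks neighbours within this band
MASK_MARGIN_DB = 15            # ...that are at least this much quieter than it

def audible(components):
    """components: list of (frequency_hz, level_db); returns the audible ones."""
    kept = []
    for freq, level in components:
        if level < HEARING_THRESHOLD_DB:
            continue                       # below the hearing threshold
        masked = any(
            other_level - level >= MASK_MARGIN_DB
            and abs(other_freq - freq) <= MASK_BANDWIDTH_HZ
            for other_freq, other_level in components
            if (other_freq, other_level) != (freq, level)
        )
        if not masked:
            kept.append((freq, level))
    return kept

tones = [(1000, 60), (1100, 30), (3000, 10)]
print(audible(tones))   # [(1000, 60)]: 1100 Hz is masked, 3000 Hz is inaudible
```

Only the surviving components would be quantized and transmitted; the dropped ones are the compression gain.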
Sensitivity of the ear:
The dynamic range of the ear is defined as the ratio of the loudest sound
it can hear to the quietest sound.
Sensitivity of the ear varies with the frequency of the signal as
shown....in next slide.
The ear is most sensitive to signals in the range 2-5kHz hence the signals
in this band are the quietest the ear is sensitive to.
Vertical axis gives all the other signal amplitudes relative to this signal
(2-5 kHz).
In the figure, although signals A and B have the same relative amplitude,
only signal A would be heard, because it is above the hearing threshold
while B is below it.
Sensitivity of the ear varies with the frequency as…
Frequency Masking
When an audio sound consisting of multiple frequency signals is present,
the sensitivity of the ear changes and varies with the relative amplitude
of the signals.
Conclusions from the diagram:
Signal B is larger than signal A; this causes the basic sensitivity curve of
the ear to be distorted in the region of signal B.
Signal A will no longer be heard, as it falls within the distortion band.
APPLICATION:
Interpersonal: Video Telephony & Video Conferencing
Interactive: access to stored video in various forms
Entertainment: Digital TV & MOD/VOD
[Diagram: quantization (major reduction; controls 'quality'), zig-zag scan, run-length coding]
Predictive Frame (P-frame)
The encoding of the P-frame is relative to the contents of either a
preceding I-frame or a preceding P-frame
P-frames are encoded using a combination of motion estimation and motion
compensation
The accuracy of the prediction operation is determined by how well any
movement between successive frames is estimated. This is known as the
motion estimation
Since the estimation is not exact, additional information must also be sent to
indicate any small differences between the predicted and actual positions
of the moving segments involved. This is known as the motion
compensation
The number of P-frames between I-frames is limited to avoid error propagation
(since any error present in the first P-frame will be propagated to the next).
The number of frames between a P-frame and the immediately preceding I- or
P-frame is called the prediction span (M).
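The frame pattern implied by the prediction span can be sketched as follows. The GOP length N (frames between successive I-frames) and the helper name are assumptions for illustration, not from the text:

```python
# Generate an I/P/B frame pattern for prediction span M and GOP length N.
def frame_pattern(n_frames, M=3, N=12):
    pattern = []
    for i in range(n_frames):
        if i % N == 0:
            pattern.append("I")   # independent frame
        elif i % M == 0:
            pattern.append("P")   # predicted from the preceding I- or P-frame
        else:
            pattern.append("B")   # bi-directionally predicted
    return "".join(pattern)

print(frame_pattern(13))  # IBBPBBPBBPBBI
```

With M = 3 there are two B-frames between each pair of anchor (I or P) frames, the arrangement discussed in the B-frame section below.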
Frame Sequences: I-, P- and B-frames
Bi-directional Frame (B-frame)
For fast-moving video, e.g. movies, B-frames (bi-directional) are used.
Their contents are predicted using both past and future frames.
A B-frame is encoded relative to the preceding as well as the succeeding
I- or P-frame.
B-frames introduce an encoding delay because of the time needed to wait
for the next I- or P-frame in the sequence.
B-frames provide the highest level of compression, and because they are
not involved in the coding of other frames they do not propagate errors.
PB-Frames
PB-frame: this does not refer to a new frame type as such, but rather to
the way two neighbouring P- and B-frames are encoded as if they were a
single frame.
D-frame
This is application-specific, used in MOD/VOD applications.
In these applications the user may wish to fast-forward or rewind through
the movie, which requires the compressed video to be decompressed at a
much higher rate. To support this function, the encoded bit stream also
contains D-frames.
Motion Estimation & Motion Compensation (Encoding of P- and B-frames)
Motion estimation involves comparing small segments of two consecutive
frames for differences; where a difference is detected, a search is carried
out to determine to which neighbouring segment the original segment has
moved.
To limit the search time, the comparison is limited to a few segments.
P-frame: we estimate the motion that has taken place between the frame
being encoded and the preceding I- or P-frame.
B-frame: we estimate the motion that has taken place between the frame
being encoded and both the preceding and the succeeding I- or P-frame.
P-frame encoding
The digitized contents of the Y matrix associated with each frame are first
divided into two-dimensional blocks of 16 × 16 pixels, each known as a
MACROBLOCK (MB).
Each MB consists of:
4 DCT blocks (8 × 8) for the luminance signal
1 DCT block each for the two chrominance signals (Cb and Cr)
Each MB has an address associated with it.
To encode a P-frame, the contents of each macroblock in the frame being
encoded (the target frame) are compared on a pixel-by-pixel basis with the
contents of the preceding I- or P-frame (the reference frame).
[Figure: reference frame (I or P) and target frame (P)]
SEARCH… the output may be:
If a close match is found, only the address of the MB is encoded.
If a match is not found, the search is extended to cover an area around the
MB in the reference frame.
All possible MBs in the selected search area of the reference frame
are searched for a match.
Case 1: if a close match is found, two parameters are encoded:
Motion vector (V): indicates the (x, y) offset of the MB being encoded.
It is further encoded by differential encoding.
Prediction error: consists of three matrices (one each for Y, Cb, Cr),
each of which contains the difference values between the target MB and the
set of pixels in the search area of the reference frame that produced the
closest match. It is encoded by the same method as used for I-frames.
Case 2: if a match is not found, e.g. if the moving object has moved out of
the extended search area,
the MB is encoded independently in the same way as MBs in an I-frame.
A match is said to be found if the mean of the absolute errors over all the
pixel positions in the difference MB is less than a given threshold.
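A minimal sketch of the search just described: exhaustive block matching with a mean-absolute-error threshold. Real coders use 16 × 16 macroblocks and much larger search areas; the 4 × 4 blocks, names, and threshold here are illustrative only:

```python
# Exhaustive block-matching sketch. Frames are 2-D lists of luminance values.
def mean_abs_error(ref, tgt, rx, ry, tx, ty, size):
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(ref[ry + j][rx + i] - tgt[ty + j][tx + i])
    return total / (size * size)

def find_motion_vector(ref, target, tx, ty, size=4, search=2, threshold=5.0):
    """Return the (dx, dy) offset of the best close match, or None."""
    best, best_err = None, threshold
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = tx + dx, ty + dy
            if 0 <= rx and 0 <= ry and rx + size <= len(ref[0]) and ry + size <= len(ref):
                err = mean_abs_error(ref, target, rx, ry, tx, ty, size)
                if err < best_err:          # close match: MAE below threshold
                    best, best_err = (dx, dy), err
    return best

# A bright object sits at x = 3 in the reference frame and has moved to
# x = 2 in the target frame, so the block at (2, 2) matches offset (1, 0).
ref = [[0] * 8 for _ in range(8)]
target = [[0] * 8 for _ in range(8)]
for j in range(4):
    for i in range(4):
        ref[2 + j][3 + i] = 50 + 10 * i + j
        target[2 + j][2 + i] = 50 + 10 * i + j
print(find_motion_vector(ref, target, 2, 2))   # (1, 0)
```

When the function returns None, the block would be encoded independently, as in Case 2 above.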
B-frame encoding
If the two contents are the same, only the address of the macroblock
in the reference frame is encoded
If the two contents are very close, both the motion vector and the
difference matrices associated with the macroblock in the reference
frame are encoded
If no close match is found, then the target macroblock is encoded in
the same way as a macroblock in an I-frame
In order to carry out its role, the motion estimation unit, which contains
the search logic, utilizes a copy of the (uncoded) reference frame.
Implementation schematic – B-frames