Introduction
Audio (analog) signals are digitized using the PCM process, which involves SAMPLING.
Sampling rate >= 2 x (highest frequency component).
Band-limited signal: when the bandwidth of the communication channel to be used cannot support the minimum sampling rate, the signal must first be band-limited.
Speech signal (50 Hz - 10 kHz):
Maximum frequency component: 10 kHz
Minimum sampling rate: 2 x 10 = 20 ksps
Bits per sample: 12
Bit rate used: sampling rate x bits per sample = 20 ksps x 12 = 240 kbps
General audio signal (15 Hz - 20 kHz):
Maximum frequency component: 20 kHz
Minimum sampling rate: 2 x 20 = 40 ksps
Bits per sample: 16
Bit rate used: 40 ksps x 16 x 2 channels (stereo) = 1.28 Mbps
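These figures can be checked in a few lines of Python (a minimal sketch; note that the factor of two channels for stereo is the assumption stated above):

# PCM bit rate = sampling rate x bits per sample x channels
def pcm_bit_rate(max_freq_hz, bits_per_sample, channels=1):
    sampling_rate = 2 * max_freq_hz          # Nyquist: twice the highest frequency
    return sampling_rate * bits_per_sample * channels

print(pcm_bit_rate(10_000, 12))              # speech: 240_000 bps = 240 kbps
print(pcm_bit_rate(20_000, 16, channels=2))  # stereo audio: 1_280_000 bps = 1.28 Mbps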
How does the concept of audio compression arise?
In most multimedia applications, the bandwidth of the available communication channels does not support such high bit rates as 240 kbps and 1.28 Mbps; only lower bit rates are offered.
So what is the solution? There are two solutions:
Solution 1: sample the audio signal at a lower rate (the bad one).
Merit: simple to implement.
Demerits:
1. The quality of the decoded signal is reduced, with loss of the high-frequency components of the original signal.
2. Using fewer bits per sample results in high quantization noise.
Solution 2: use a compression algorithm (the good one).
Gives good perceptual quality.
Reduces the bandwidth requirement.
The sections below discuss the main audio compression methods.
1. Differential Pulse Code Modulation (DPCM)
Differential pulse code modulation (DPCM) is a derivative of standard PCM.
It exploits the fact that the range of differences in amplitude between successive samples of the audio waveform is smaller than the range of the actual sample amplitudes.
Hence fewer bits are required to represent the difference signals than are needed by PCM at the same sampling rate.
This reduces the bit rate requirement from 64 kbps to 56 kbps.
DPCM Principles
Operation of DPCM:
Encoder
The previously digitized sample is held in a register (R).
The DPCM signal is computed by subtracting the current register contents (R0) from the new output of the ADC (PCM).
The register value is then updated before transmission:
DPCM = PCM - R0
Decoder
The decoder simply adds the previous register contents (R0) to the received DPCM value to reconstruct the PCM sample:
R1 = R0 + DPCM
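A minimal Python sketch of this encoder/decoder pair (illustrative only; it assumes ideal, error-free digitization, so the round trip is exact):

def dpcm_encode(pcm_samples):
    r = 0                        # register R holds the previously digitized sample
    out = []
    for pcm in pcm_samples:
        out.append(pcm - r)      # DPCM = PCM - R0
        r = pcm                  # update the register before transmission
    return out

def dpcm_decode(dpcm_values):
    r = 0                        # decoder register, kept in step with the encoder
    out = []
    for d in dpcm_values:
        r = r + d                # R1 = R0 + DPCM reconstructs the PCM sample
        out.append(r)
    return out

samples = [50, 52, 55, 53]
assert dpcm_decode(dpcm_encode(samples)) == samples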
Limitation of DPCM:
The ADC operation introduces quantization errors each time, and these accumulate in the value stored in the register (R).
So the previous value (R) is only an approximation. A more accurate version of the previous signal is needed, and that is what the next method provides.
2. Third Order Predictive DPCM
To eliminate this noise effect, predictive methods are used to predict a more accurate version of the previous signal (using not only the current signal but also varying proportions of a number of the preceding estimated signals).
The proportions used are known as predictor coefficients.
The difference signal is computed by subtracting varying proportions of the last three predicted values from the current output of the ADC.
This reduces the bit rate requirement from 64 kbps to 32 kbps.
Third-order predictive DPCM signal encoder and decoder
Operation of Third Order Predictive DPCM
Proportions (the predictor coefficients) of the registers R1, R2 and R3 are subtracted from the PCM value.
The value in register R1 is then transferred to R2, and R2 to R3, and the new predicted value goes into R1.
The decoder operates in a similar way, adding the same proportions of the last three computed PCM signals to the received DPCM signal (see the sketch below).
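A toy Python version of the third-order scheme; the predictor coefficients C1, C2, C3 below are illustrative values chosen for the sketch, not taken from any standard:

C1, C2, C3 = 0.5, 0.25, 0.25     # predictor coefficients (illustrative)

def predictive_encode(pcm_samples):
    r1 = r2 = r3 = 0.0
    out = []
    for pcm in pcm_samples:
        prediction = C1 * r1 + C2 * r2 + C3 * r3
        out.append(pcm - prediction)   # difference signal sent to the decoder
        r3, r2 = r2, r1                # shift the register contents R1->R2->R3
        r1 = prediction + out[-1]      # the new predicted value goes into R1
    return out

def predictive_decode(diffs):
    r1 = r2 = r3 = 0.0
    out = []
    for d in diffs:
        prediction = C1 * r1 + C2 * r2 + C3 * r3
        sample = prediction + d        # add the same proportions back
        out.append(sample)
        r3, r2 = r2, r1
        r1 = sample
    return out

print(predictive_decode(predictive_encode([50, 52, 55, 53])))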
3. Adaptive Differential PCM (ADPCM)
The first ADPCM international standard is defined in ITU-T Recommendation G.721.
A saving in bandwidth is possible by varying the number of bits used for the difference signal depending on its amplitude (fewer bits to encode smaller difference signals).
It is based on the same principle as DPCM, except that an eighth-order predictor is used and the number of bits used to quantize each difference is varied.
This can be either 6 bits, producing 32 kbps, to obtain a better quality output than with third-order DPCM, or 5 bits, producing 16 kbps, if lower bandwidth is more important (a toy illustration follows).
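A toy sketch of the adaptive idea; the thresholds and bit counts here are invented for illustration, and the actual adaptive quantizer in G.721 is considerably more sophisticated:

def bits_for_difference(diff):
    magnitude = abs(diff)
    if magnitude < 16:
        return 4        # small difference: few quantization levels suffice
    elif magnitude < 64:
        return 5
    else:
        return 6        # large difference: more quantization levels needed

for d in (3, -40, 200):
    print(d, "->", bits_for_difference(d), "bits")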
5. Linear Predictive Coding (LPC)
APPLICATION:
The sound generated at LPC's very low bit rates is quite synthetic, so LPC encoders are used primarily in military applications where bandwidth is all-important.
Linear predictive coding (LPC) signal encoder and decoder
6. Code-excited LPC (CELPC)
The synthesisers used in most LPC decoders are based on a very basic model of the vocal tract.
CELP coders are intended for applications in which the amount of bandwidth available is limited but the perceived quality of the speech must be of an acceptable standard for use in various multimedia applications.
In the CELP model, instead of treating each digitized segment independently for encoding purposes, only a limited set of segments is used, each known as a waveform template.
A precomputed set of templates is held by both the encoder and the decoder in what is known as the template codebook.
Each of the individual digitized samples that make up a particular template in the codebook is differentially encoded.
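A toy Python sketch of codebook-based encoding in the spirit of CELP: each segment is transmitted as the index of its closest template. The codebook contents and segment length are invented for the example:

codebook = [
    [0, 0, 0, 0],        # pre-computed templates held by encoder and decoder
    [10, 20, 10, 0],
    [-5, -10, -5, 0],
]

def encode_segment(segment):
    # pick the template with the smallest squared error against the segment
    def sq_error(template):
        return sum((s - t) ** 2 for s, t in zip(segment, template))
    return min(range(len(codebook)), key=lambda i: sq_error(codebook[i]))

def decode_segment(index):
    return codebook[index]   # the decoder holds the same template codebook

print(encode_segment([9, 18, 11, 1]))   # -> 1, the closest template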
All coders of this type have a delay associated with them, incurred while each block of digitized samples is analysed by the encoder and the speech is reconstructed at the decoder.
The combined delay value is known as the coder's processing delay.
In addition, before the speech samples can be analysed, it is necessary to buffer the block of samples.
The time taken to accumulate the block of samples is known as the algorithmic delay.
The coder's delay is an important parameter: in conventional telephony applications a low-delay coder is required, whereas in an interactive application a delay of several seconds before the speech starts is acceptable.
Perceptual Coding (PC)
LPC and CELP are used only for telephony applications, and hence for the compression of speech signals.
Perceptual coders are designed for the compression of general audio, such as that associated with a digital television broadcast.
They use a psychoacoustic model, which exploits a number of limitations of the human ear.
Using this approach, sampled segments of the source audio waveform are analysed, but only those features that are perceptible to the ear are transmitted.
For example, although the human ear is sensitive to signals in the range 15 Hz to 20 kHz, the level of sensitivity to each signal is non-linear; that is, the ear is more sensitive to some signals than to others.
What is that limitation of the human ear? The MASKING EFFECT.
Frequency masking: when multiple signals are present in audio, a strong signal may reduce the level of sensitivity of the ear to other signals that are near to it in frequency.
Temporal masking: when the ear hears a loud sound, a short but finite time passes before it can hear a quieter sound.
The psychoacoustic model is used to identify the signals that are influenced by masking; these are then eliminated from the transmitted signal, and hence compression is achieved.
Sensitivity of the ear:
The dynamic range of the ear is defined as the ratio of the loudest sound it can hear to the quietest sound.
The sensitivity of the ear varies with the frequency of the signal, as shown in the next figure.
The ear is most sensitive to signals in the range 2-5 kHz; the signals in this band are therefore the quietest the ear can detect.
The vertical axis gives all other signal amplitudes relative to this 2-5 kHz region.
In the figure, although signals A and B have the same relative amplitude, only signal A would be heard, because it is above the hearing threshold while B is below it.
Sensitivity of the ear as a function of frequency:
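As a rough quantitative companion to the figure, the absolute threshold of hearing is often approximated by Terhardt's formula (values in dB SPL); the sketch below uses it to test whether a tone of a given level is audible at all. The two example levels are invented:

import math

def hearing_threshold_db(f_hz):
    # widely used approximation of the threshold-in-quiet curve (Terhardt)
    khz = f_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

def is_audible(level_db, f_hz):
    return level_db > hearing_threshold_db(f_hz)

# Two tones of the same level: audible in the ear's sensitive region,
# inaudible near the edge of the hearing range (cf. signals A and B).
print(is_audible(10, 3000))    # True: the threshold is lowest around 2-5 kHz
print(is_audible(10, 18000))   # False: the threshold rises steeply at the edges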
Frequency Masking
When an audio sound consisting of multiple frequency signals is present, the sensitivity of the ear changes and varies with the relative amplitudes of the signals.
Conclusions from the diagram:
Signal B is larger than signal A; this causes the basic sensitivity curve of the ear to be distorted in the region of signal B.
Signal A will no longer be heard, as it falls within the distortion band.
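A toy masking check in Python; the triangular spreading of 10 dB per kHz is an invented simplification of the real distortion of the sensitivity curve, and the signal levels and frequencies are made up for the example:

def masked_threshold_db(masker_level_db, masker_f_hz, f_hz, slope_db_per_khz=10.0):
    # a strong masker raises the local threshold, falling off with distance
    distance_khz = abs(f_hz - masker_f_hz) / 1000.0
    return masker_level_db - slope_db_per_khz * distance_khz

# Signal B (60 dB at 1 kHz) distorts the curve around it, so signal A
# (30 dB at 1.5 kHz) falls under the raised threshold and need not be sent.
print(30 > masked_threshold_db(60, 1000, 1500))   # False: signal A is masked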
APPLICATIONS:
Interpersonal: video telephony and video conferencing
Interactive: access to stored video in various forms
Entertainment: digital TV and movie/video-on-demand (MOD/VOD)
[Figure: encoder stages - quantization (the major reduction, controls 'quality'), zig-zag scan, run-length coding]
Predictive Frame (P-frame)
The encoding of a P-frame is relative to the contents of either a preceding I-frame or a preceding P-frame.
P-frames are encoded using a combination of motion estimation and motion compensation.
The accuracy of the prediction operation is determined by how well any movement between successive frames is estimated; this is known as motion estimation.
Since the estimation is not exact, additional information must also be sent to indicate any small differences between the predicted and actual positions of the moving segments involved; this is known as motion compensation.
The number of P-frames between I-frames is limited to avoid error propagation (any error present in the first P-frame would be propagated to the next).
The number of frames between a P-frame and the immediately preceding I- or P-frame is called the prediction span (M).
Frame sequences: I-, P- and B-frames
Bi-directional Frame (B-frame)
For fast-moving video, e.g. movies, B-frames (bi-directional) are used; their contents are predicted using both past and future frames.
A B-frame is encoded relative to the preceding as well as the succeeding I- or P-frame.
B-frames introduce an encoding delay because of the time needed to wait for the next I- or P-frame in the sequence.
B-frames provide the highest level of compression, and because they are not involved in the coding of other frames, they do not propagate errors.
PB-Frames
PB-frame: this does not refer to a new frame type as such, but rather to the way two neighbouring P- and B-frames are encoded as if they were a single frame.
D-frame
This is application specific, used in MOD/VOD applications.
In these applications the user may wish to fast-forward or rewind through the movie, which requires the compressed video to be decompressed at a much higher rate. To support this function, the encoded bit stream also contains D-frames.
Motion Estimation and Motion Compensation (Encoding of P- and B-frames)
Motion estimation involves comparing small segments of two consecutive frames for differences; when a difference is detected, a search is carried out to determine to which neighbouring segment the original segment has moved.
To limit the time taken by the search, the comparison is limited to a few segments.
P-frame: the motion is estimated between the frame being encoded and the preceding I- or P-frame.
B-frame: the motion is estimated between the frame being encoded and both the preceding and the succeeding I- or P-frame.
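A minimal exhaustive block-matching search in Python, using the sum of absolute differences (SAD) as the comparison measure; the block size, search window and frame contents are illustrative choices:

def sad(ref, tgt, rx, ry, tx, ty, n=4):
    # sum of absolute differences between an n x n target block and a
    # candidate block in the reference frame
    return sum(abs(ref[ry + j][rx + i] - tgt[ty + j][tx + i])
               for j in range(n) for i in range(n))

def motion_search(ref, tgt, tx, ty, n=4, window=2):
    # compare against nearby positions only, to limit the search time
    best = None
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            rx, ry = tx + dx, ty + dy
            if 0 <= rx <= len(ref[0]) - n and 0 <= ry <= len(ref) - n:
                cost = sad(ref, tgt, rx, ry, tx, ty, n)
                if best is None or cost < best[0]:
                    best = (cost, (dx, dy))
    cost, vector = best
    return vector, cost          # motion vector and residual matching cost

ref = [[x + 10 * y for x in range(8)] for y in range(8)]
tgt = [row[1:] + [row[-1]] for row in ref]   # contents shifted left one pixel
print(motion_search(ref, tgt, 2, 2))         # -> ((1, 0), 0)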
P-frame encoding
The digitized contents of the Y matrix associated with each frame are first divided into two-dimensional blocks of 16 x 16 pixels, each known as a MACROBLOCK.
A macroblock consists of:
4 DCT blocks (8 x 8) for the luminance signal
1 DCT block each for the two chrominance signals (Cb and Cr)
Each macroblock has an address associated with it.
To encode a P-frame, the contents of each macroblock in the frame, known as the target frame, are compared on a pixel-by-pixel basis with the contents of the preceding I- or P-frame (the reference frame).
[Figure: reference frame (I or P) followed by target frame (P)]
If the two contents are the same, only the address of the macroblock in the reference frame is encoded.
If the two contents are very close, both the motion vector and the difference matrices associated with the macroblock in the reference frame are encoded.
If no close match is found, the target macroblock is encoded in the same way as a macroblock in an I-frame.
In order to carry out its role, the motion estimation unit, which contains the search logic, utilizes a copy of the (uncoded) reference frame (a sketch of the three-way decision follows).
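The three-way decision can be sketched as follows; the threshold value and the returned tuples stand in for real encoded data and are purely illustrative:

def encode_macroblock(address, vector, cost, close_match_threshold=100):
    # 'cost' is the residual matching cost returned by the motion search
    if cost == 0:
        return ("address_only", address)          # identical contents
    elif cost < close_match_threshold:
        # close match: send the motion vector plus the difference matrices
        return ("motion_compensated", address, vector)
    else:
        return ("intra", address)                 # encode as in an I-frame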
Implementation schematic – B-frames
For each macroblock it is necessary to identify the type of encoding that has been used; this is the role of the formatter.
Type: indicates the type of frame encoded (I, P or B)
Address: identifies the location of the macroblock in the frame
Quantization value: the value used to quantize all the DCT coefficients in the macroblock
Motion vector: the encoded vector
Block representation: indicates which of the six 8 x 8 blocks that make up the macroblock are present
B1, B2, ... B6: JPEG-encoded DCT coefficients for the blocks present
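Putting these fields together, one illustrative layout of a formatted macroblock record (the field names are mine, not from the standard):

from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class MacroblockRecord:
    frame_type: str                      # 'I', 'P' or 'B'
    address: int                         # location of the macroblock in the frame
    quantization_value: int              # quantizer for all DCT coefficients
    motion_vector: Optional[Tuple[int, int]]   # encoded vector, if any
    blocks_present: List[bool]           # which of the six 8 x 8 blocks follow
    blocks: List[list]                   # B1..B6: JPEG-encoded DCT coefficients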
MPEG (Moving Picture Experts Group)
A committee of experts that has developed video encoding standards since around 1990.
Until recently it was the only game in town (and it is still by far the most popular).
Suitable for a wide range of video:
low resolution to high resolution
slow movement to fast action
Can be implemented in either software or hardware.
MPEG standards:
MPEG-1 (ISO Recommendation 11172)
The source intermediate digitization format (SIF) is used: a resolution of 352 x 288 pixels, giving VHS-quality audio and video on CD-ROM at a bit rate of 1.5 Mbps.
MPEG-2 (ISO Recommendation 13818)
Used for the recording and transmission of studio-quality audio and video. Different levels of video resolution are possible:
Low: 352 x 288 pixels, comparable with MPEG-1
Main: 720 x 576 pixels, studio-quality video and audio at bit rates up to 15 Mbps
High: 1920 x 1152 pixels, used for widescreen HDTV at bit rates of up to 80 Mbps
MPEG-4: used for interactive multimedia applications over the Internet and over various entertainment networks.
The MPEG-4 standard contains features that enable a user not only to access a video sequence passively (using, for example, start/stop) but also to manipulate the individual elements that make up a scene within the video.
In MPEG-4, each video frame is segmented into a number of video object planes (VOPs), each of which corresponds to an AVO (audio-visual object) of interest.
MPEG-1