Audio and Video Compression

4. Audio and video compression

4.1 Introduction
• Unlike text and images, both audio and most video signals are continuously varying analog signals.
• Compression algorithms associated with digitized audio and video are different from those associated with text and images.

4.2 Audio compression
• Speech and non-speech signals are encoded using different approaches.

4.2.1 Speech coding
• Differential pulse code modulation (DPCM) is a derivative of standard PCM and exploits the fact that, for most audio signals, the range of the differences in amplitude between successive samples of the audio waveform is less than the range of the actual sample amplitudes. (G.711)
• In adaptive differential PCM (ADPCM), fewer bits are used to encode smaller difference values than for larger values. (G.721, G.722 & G.726)
• DPCM and ADPCM can also be used to encode non-speech signals.
• In linear predictive coding (LPC), a speech signal is analyzed to extract its perceptual features, including pitch and formant frequencies, and these features are then encoded. (LPC-10, G.728, G.723 & G.729)
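The DPCM idea described above can be illustrated with a minimal Python sketch. It is not taken from any of the listed standards: it sends the first sample followed by the successive differences, and it leaves out the quantizer that a real DPCM coder applies to each difference, so the round trip here is lossless. The function names and the sample values are assumptions chosen for illustration.

```python
def dpcm_encode(samples):
    """Return the differences between successive samples (first sample vs. 0)."""
    diffs = []
    previous = 0
    for sample in samples:
        diffs.append(sample - previous)
        previous = sample
    return diffs


def dpcm_decode(diffs):
    """Rebuild the samples by accumulating the differences."""
    samples = []
    previous = 0
    for diff in diffs:
        previous += diff
        samples.append(previous)
    return samples


# Hypothetical slowly varying waveform (illustrative values only).
samples = [0, 300, 580, 820, 1000, 1100, 1120, 1050, 900]
diffs = dpcm_encode(samples)

sample_range = max(samples) - min(samples)      # spread of the raw amplitudes
diff_range = max(diffs[1:]) - min(diffs[1:])    # spread of the differences
print(diffs)
print(sample_range, diff_range)
```

For this input the differences span a much smaller range than the samples themselves, which is why they can be coded with fewer bits.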
CYH/MMT/CmpAV/p.1 CYH/MMT/CmpAV/p.2
• Summary of speech compression standards and their applications:

Standard   Compression technique       Compressed bit rate (kbps)   Quality     Example applications
G.711      PCM + companding            64                           Good        PSTN/ISDN telephony
G.721      ADPCM                       32                           Good        Telephony at reduced bit rates
                                       16                           Fair
G.722      ADPCM with subband coding   64                           Excellent   Audio conferencing
                                       56/48                        Good
G.726      ADPCM with subband coding   40/32                        Good        General telephony at reduced bit rates
                                       24/16                        Fair
LPC-10     LPC                         2.4/1.2                      Poor        Telephony in military networks
G.728      Code-excited LPC (CELP)     16                           Good        Low delay/low bit rate telephony
G.729      CELP                        8                            Good        Telephony in cellular networks
G.729(A)   CELP                        8                            Good        Simultaneous telephony and data (fax)
G.723.1    CELP                        6.3                          Good        Video and internet telephony
                                       5.3                          Fair

4.2.2 Perceptual coding
• The audio signal is coded based on a psychoacoustic model which describes the limitations of the human ear.
• The ear is more sensitive to some signals than others.
• Frequency masking: A strong signal may reduce the level of sensitivity of the ear to other signals which are near to it in frequency.
• Temporal masking: When the ear hears a loud sound, it takes a short but finite time before it can hear a quieter sound.
MPEG audio coders
• An international standard based on this approach is defined in ISO Recommendation 11172-3.
• Summary of MPEG layer 1, 2 and 3 perceptual encoders:

Layer   Application                                    Compressed bit rate   Quality                                   Example input-to-output delay
1       Digital audio cassette                         32-448 kbps           Hi-fi quality at 192 kbps per channel     20 ms
2       Digital audio and digital video broadcasting   32-192 kbps           Near CD-quality at 128 kbps per channel   40 ms
3       CD-quality audio over low bit rate channel     64 kbps               CD-quality at 64 kbps per channel         60 ms
• A higher layer makes better use of the psychoacoustic model and hence a higher compression ratio can be achieved.
• The 3 layers require increasing levels of complexity (and hence cost) to achieve a particular perceived quality. The choice of layer and bit rate is therefore often a compromise between the desired perceived quality and the available bit rate.
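To put the layer bit rates in perspective, the short Python calculation below compares them with uncompressed PCM. It assumes CD audio sampled at 44.1 kHz with 16 bits per sample per channel; these figures are not given in the notes above and are stated here as an assumption. The per-channel rates for the three layers are the ones quoted in the summary.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample):
    """Bit rate of one uncompressed PCM channel in kbps."""
    return sample_rate_hz * bits_per_sample / 1000


# Assumption: CD audio uses 44.1 kHz sampling and 16 bits per sample.
cd_rate_kbps = pcm_bit_rate_kbps(44_100, 16)

# Per-channel bit rates at which each layer reaches its quoted quality.
layer_rates_kbps = {1: 192, 2: 128, 3: 64}

compression_ratios = {
    layer: cd_rate_kbps / rate for layer, rate in layer_rates_kbps.items()
}

for layer, ratio in compression_ratios.items():
    print(f"Layer {layer}: about {ratio:.1f}:1 relative to CD-quality PCM")
```

Under these assumptions the ratio grows from roughly 4:1 for layer 1 to roughly 11:1 for layer 3, in line with the remark that higher layers achieve more compression.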
Dolby audio coders
• In AC-1, the bit allocation information of the quantized subband samples is directly encoded and embedded in the bit-stream.
• In AC-2, this information is indirectly encoded and has to be estimated at the decoder.
• In AC-3, additional information is transmitted to compensate for the estimation error.
• The acoustic quality of the MPEG and Dolby audio coders was found to be comparable.
• Summary of compression standards for general audio:

Standard                       Compressed bit rate   Quality                     Example applications
MPEG audio           Layer 1   32-448 kbps           Hi-fi quality at 192 kbps   Digital audio cassettes
                     Layer 2   32-192 kbps           Near CD at 128 kbps         Digital audio and digital video broadcasting
                     Layer 3   64 kbps               CD quality                  CD-quality over low bit rate channels
Dolby audio coders   AC-1      512 kbps              Hi-fi quality               Radio and television satellite relays
                     AC-2      256 kbps              Hi-fi quality               PC sound cards
                     AC-3      192 kbps              Near CD quality             Digital video broadcasting
4.3 Video compression
• There is not just a single standard associated with video
but rather a range of standards, each targeted at a
particular application domain.
4.3.1 Video compression principles

• Video is simply a sequence of digitized pictures and it is
also referred to as moving pictures.
• A video sequence can be encoded with the JPEG algorithm frame by frame and this approach is known as motion JPEG.
• In addition to the spatial redundancy present in each frame, considerable redundancy is often present between successive frames.
• Frames are classified as 1 of 3 basic frame types (I-, P- and B-frames) and encoded differently.
• I-frames:
  • I-frames are encoded independently using the JPEG algorithm.
  • I-frames are inserted into the output stream relatively frequently.
  • I-frames are used as access points for random access and FF/FR functionality in the bit stream.
• P-frames:
  • Frames are partitioned into blocks of size 16x16 (macroblocks).
  • To encode a P-frame, the contents of each macroblock in the target frame are compared on a pixel-by-pixel basis with the contents of the reference frame to find a best-matched block of equal size.
  • The reference frame can be a P- or I-frame.
  • The (x,y) offset between the macroblock being encoded and the best-matched block is known as the motion vector.
  • This motion-vector-searching process is known as motion estimation.
  • A prediction of the target frame is made from the reference frame based on the motion vectors obtained.
  • The difference between the predicted frame and the actual target frame is known as the prediction error.
  • Motion compensation: Additional bits are required to encode the prediction error so as to compensate for the difference if necessary.
• B-frames:
  • To encode a B-frame, any motion is estimated with reference to both the immediately preceding I- or P-frame and the immediately succeeding P- or I-frame.
  • B-frames provide the highest level of compression.
  • B-frames are not involved in the coding of other frames and hence they do not propagate errors.
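The motion estimation step described for P-frames can be sketched as an exhaustive block-matching search. The Python example below is a minimal illustration under several assumptions that do not come from the notes: it uses the sum of absolute differences (SAD) as the matching measure, searches every offset within a fixed window, and represents frames as lists of rows of luminance values. The function names, the 4x4 block size, and the synthetic test frames are all hypothetical; the macroblocks described above are 16x16.

```python
def sad(target, reference, tx, ty, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(target[ty + j][tx + i] - reference[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def estimate_motion(target, reference, tx, ty, size=16, search=7):
    """Full search: return ((mx, my), cost) for the best-matched block."""
    height, width = len(reference), len(reference[0])
    best_vector, best_cost = None, None
    for my in range(-search, search + 1):
        for mx in range(-search, search + 1):
            rx, ry = tx + mx, ty + my
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block falls outside the reference frame
            cost = sad(target, reference, tx, ty, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (mx, my), cost
    return best_vector, best_cost


def prediction_error(target, reference, tx, ty, vector, size=16):
    """Target block minus the motion-compensated reference block."""
    mx, my = vector
    return [
        [target[ty + j][tx + i] - reference[ty + my + j][tx + mx + i]
         for i in range(size)]
        for j in range(size)
    ]


# Synthetic frames: the target is the reference displaced by (2, 1) pixels.
SIZE = 16
reference = [[(3 * x * x + 5 * y * y + 7 * x * y) % 256 for x in range(SIZE)]
             for y in range(SIZE)]
target = [[reference[min(y + 1, SIZE - 1)][min(x + 2, SIZE - 1)]
           for x in range(SIZE)]
          for y in range(SIZE)]

vector, cost = estimate_motion(target, reference, tx=6, ty=6, size=4, search=3)
residual = prediction_error(target, reference, 6, 6, vector, size=4)
print(vector, cost)
```

Here the vector is measured from the position of the block in the target frame to the position of its best match in the reference frame, which is one possible sign convention. Because the synthetic target is an exact displaced copy, the search recovers the offset (2, 1) with zero cost and an all-zero prediction error; with real video the residual is non-zero and is what motion compensation has to encode.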
• The number of frames between successive I-frames is known as a group of pictures (GOP).
• The number of frames between a P-frame and the immediately preceding I- or P-frame is called the prediction span.
• The order of encoding and transmission of the frames is changed to minimize the time required to decode the frames.
• A 4th type of frame known as a PB-frame has also been defined. Two neighboring P- and B-frames are encoded as if they were a single frame.
• A 5th type of frame known as a D-frame has been defined for use in movie/video-on-demand applications.
• Basic bitstream format:
  • Type: type of frame (I, P or B)
  • Address: identifies the location of the macroblock in the frame
  • Quantization value: the threshold value used to quantize all DCT coefficients in the macroblock
  • Motion vector: encoded vector
  • Block present: indicates which blocks in the macroblock are present
• Typical compression ratios:
  • I-frames: 10:1 to 20:1
  • P-frames: 20:1 to 30:1
  • B-frames: 30:1 to 50:1
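The reordering of frames for transmission mentioned above can be sketched as follows. A B-frame depends on the I- or P-frame that follows it in display order, so that later frame has to reach the decoder first. The Python example below is a simplified illustration rather than the procedure of any particular standard; the frame labels (type letter plus display index) and the function name are assumptions made for the example.

```python
def transmission_order(display_order):
    """Move each I- or P-frame ahead of the B-frames that depend on it."""
    reordered = []
    pending_b = []
    for frame in display_order:
        if frame.startswith("B"):
            # Hold B-frames until their succeeding reference frame is sent.
            pending_b.append(frame)
        else:
            reordered.append(frame)
            reordered.extend(pending_b)
            pending_b = []
    # Trailing B-frames with no later reference in this list are sent last.
    reordered.extend(pending_b)
    return reordered


# Hypothetical display-order sequence: frame type followed by display index.
display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7"]
sent = transmission_order(display)
print(sent)
```

With this input, P4 is sent directly after I1 so that B2 and B3 can be decoded as soon as they arrive, and P7 likewise precedes B5 and B6.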
4.3.2 H.261
• H.261 has been defined by the ITU-T for the provision of video telephony and videoconferencing services over an ISDN.
• Supports I- and P-frames only.
• Encoding format:
  • Type: indicates if the macroblock is intracoded or intercoded
  • Address: identifies the location of the macroblock in the frame
  • Quantization value: the threshold value used to quantize all DCT coefficients in the macroblock
  • Motion vector: encoded vector
  • Coded block pattern: indicates which blocks in the macroblock are present
  • Picture start code: indicates the start of a new frame
  • Temporal reference: a timestamp for the decoder to synchronize the video information with the audio information
  • Picture type: indicates if the frame is encoded as an I- or P-frame
  • GOB start code: a marker used for resynchronization in case of error
• A group of (macro)blocks (GOB) is a structure consisting of 3x11 macroblocks.
4.3.3 H.263
• H.263 has been defined by the ITU-T for use in a range of real-time video applications over wireless networks and PSTNs.
• The applications include video telephony, videoconferencing, security surveillance, interactive game playing and so on.
• The H.263 standard has a number of advanced coding options compared with H.261:
  • Progressive scanning with a refresh rate of either 15 or 7.5 fps.
  • Supports I-, P-, B- and PB-frames.
  • Motion vectors, if necessary, are allowed to point outside of the frame area.
  • Schemes such as error tracking, independent segment decoding and reference picture selection are included in the standard; they aim at minimizing the effects of errors on neighboring GOBs.
  • An error concealment scheme is incorporated into the decoder to mask errors from the viewer.
4.3.4 MPEG
• The Moving Picture Experts Group (MPEG) was formed by the ISO to formulate a set of standards relating to a range of multimedia applications that involve the use of video with sound.

MPEG-1: ISO Recommendation 11172
• Uses a video compression technique similar to that of H.261.
• Progressive scanning with a refresh rate of 30 Hz (for NTSC) and 25 Hz (for PAL).
• Supports I-, P- and B-frames.
• I-frames must be used for the various random-access functions associated with VCRs.
• Improvements with respect to H.261:
  1. A new layer called a slice is added to the structure of the stream so that the decoder can resynchronize more quickly in case of error.
  2. Support for B-frames.
  3. A larger search window for motion vectors and a finer resolution of their representation.
• Typical compression ratios:
  • I-frames: 10:1
  • P-frames: 20:1
  • B-frames: 50:1
• Bitstream format:
  • Sequence start code: indicates the start of a sequence
  • Video parameters: specify the screen size and aspect ratio
  • Bitstream parameters: indicate the bit rate and the size of the memory/frame buffers that are required
  • Quantization parameters: contain the contents of the quantization tables that are to be used

  • GOP start code: indicates the start of a GOP
  • Time stamp: used for synchronization purposes
  • Parameters: define the particular sequence of frame types that are used in each GOP (e.g. IPPBPP)

  • Picture start code: indicates the start of a frame
  • Type: indicates if it is an I-, P- or B-frame
  • Buffer parameters: indicate how full the buffer should be before the decoding operation starts
  • Encode parameters: indicate the resolution of a motion vector

  • Slice start code: indicates the start of a slice
  • Vertical position: indicates the scan line in which the slice is located
  • Quantization parameters: indicate the scaling factor that applies to this slice

MPEG-2: ISO Recommendation 13818
• It supports four levels (low, main, high 1440 and high), each targeted at a particular application domain.
• There are 5 profiles associated with each level: simple, main, spatial resolution, quantization accuracy and high.
• The different combinations of levels and profiles form a framework for all standards activities associated with MPEG-2.
• One of the most popular settings is the MP@ML standard, which is for digital television broadcasting.
• There are 3 standards associated with HDTV: advanced television (ATV) in North America, digital video broadcast (DVB) in Europe, and multiple sub-Nyquist sampling encoding (MUSE) in Japan.

                      ATV                 DVB                    MUSE
Aspect ratio          16/9                4/3                    16/9
Resolution            1280x720            1440x1152              1920x1035
Compression (video)   MP@HL of MPEG-2     SSP@H1440 of MPEG-2    Similar to MP@HL
Compression (audio)   Dolby AC-3          MP2
• Summary of video compression standards:

Standard             Digitization format   Compressed bit rate        Example applications
H.261                CIF/QCIF              p x 64 kbps                Video telephony/conferencing over ISDN and LANs
H.263                S-QCIF/QCIF           <64 kbps                   Video telephony/conferencing and security surveillance over low bit rate channels
MPEG-1/ISO 11172     SIF                   <1.5 Mbps                  Storage of VHS-quality video on CD-ROMs
MPEG-2/ISO 13818
  Low                SIF                   <4 Mbps                    Recording of VHS-quality video
  Main               4:2:0                 <15 Mbps                   Digital video broadcasting
                     4:2:2                 <20 Mbps
  High 1440          4:2:0                 <60 Mbps                   HDTV (4/3 aspect ratio)
                     4:2:2                 <80 Mbps
  High               4:2:0                 <80 Mbps                   HDTV (16/9 aspect ratio)
                     4:2:2                 <100 Mbps
MPEG-4               Various               5 kbps to tens of Mbps     Versatile multimedia coding standard
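The compressed bit rates in the summary are easier to appreciate next to the uncompressed rates of the digitization formats. The Python calculation below is a rough sketch under assumptions that are not stated in the notes: CIF is taken as 352x288 and QCIF as 176x144 luminance samples, chrominance is subsampled 4:2:0 with 8 bits per sample (12 bits per pixel on average), and the frame rate is taken as 30 frames per second.

```python
def raw_bit_rate_bps(width, height, frames_per_second, bits_per_pixel=12):
    """Uncompressed bit rate; 12 bits/pixel assumes 8-bit 4:2:0 sampling."""
    return width * height * bits_per_pixel * frames_per_second


# Assumed frame sizes and frame rate (not given in the notes above).
cif_bps = raw_bit_rate_bps(352, 288, 30)
qcif_bps = raw_bit_rate_bps(176, 144, 30)

# Ratio needed to fit QCIF video into a single 64 kbps channel.
qcif_ratio_at_64kbps = qcif_bps / 64_000

print(f"CIF raw:  {cif_bps / 1e6:.1f} Mbps")
print(f"QCIF raw: {qcif_bps / 1e6:.1f} Mbps")
print(f"QCIF over 64 kbps needs about {qcif_ratio_at_64kbps:.0f}:1")
```

Under these assumptions, even QCIF video would need a reduction of well over 100:1 to fit a single 64 kbps channel, which suggests why low bit rate video telephony typically also relies on reduced frame rates.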