Multimedia Technology

Introduction (2)

Overview
- Introduction
- Chapter 1: Background of compression techniques
- Chapter 2: Multimedia technologies
  - JPEG
  - MPEG-1/MPEG-2 Audio & Video
  - MPEG-4
  - MPEG-7 (brief introduction)
  - HDTV (brief introduction)
  - H.261/H.263 (brief introduction)
  - Model-based coding (MBC) (brief introduction)
- Chapter 3: Some real-world systems
  - CATV systems
  - DVB systems
- Chapter 4: Multimedia Network

4/2/2003
Nguyen Chan Hung, Hanoi University of Technology

Introduction
- The importance of multimedia technologies: multimedia is everywhere!
  - On PCs:
    - Real Player, QuickTime, Windows Media.
    - Music and video are freely available on the Internet (mp2, mp3, mp4, asf, mpeg, mov, ra, ram, mid, DivX, etc.).
    - Video/audio conferencing.
    - Webcast / streaming applications.
    - Distance learning (tele-education).
    - Tele-medicine.
    - Tele-anything (let's imagine!).
  - On TVs and other home electronic devices:
    - DVB-T/DVB-C/DVB-S (Digital Video Broadcasting: Terrestrial/Cable/Satellite) shows MPEG-2's superior quality over traditional analog TV.
    - Interactive TV: Internet applications (mail, web, e-commerce) on a TV, with no need to wait for a PC to start up and shut down.
    - CD/VCD/DVD/MP3 players.
  - Also appearing in handheld devices (3G mobile phones, wireless PDAs).

Multimedia network

- The Internet was designed in the 1960s for low-speed internetworks running simple textual applications: high delay, high jitter.
- Multimedia applications require drastic modifications of the Internet infrastructure.
- Many frameworks have been investigated and deployed to support the next-generation multimedia Internet (e.g. IntServ, DiffServ).
- In the future, all TVs (and PCs) will be connected to the Internet and freely tuned to any of millions of broadcast stations all over the world.
- At present, multimedia networks run over ATM (almost obsolete) and IPv4; in the future, IPv6 should help guarantee QoS (Quality of Service).

Chapter 1: Background of compression techniques

Why compression?
- For communication: reduce bandwidth in multimedia network applications such as streaming media, Video-on-Demand (VOD), and Internet telephony.
- For digital storage (VCD, DVD, tape, etc.): reduce size and cost, increase media capacity and quality.

Compression factor (compression ratio)
- The ratio between the size of the source data and the compressed data (e.g. 10:1).

Two types of compression:
- Lossless compression
- Lossy compression

Information content and redundancy
- Information rate:
  - Entropy is the measure of information content, expressed in bits per source output unit (such as bits/pixel).
  - The more information in the signal, the higher the entropy.
  - Lossy compression reduces entropy, while lossless compression does not.
- Redundancy:
  - The difference between the information rate and the bit rate.
  - Usually the information rate is much less than the bit rate.
  - Compression works by eliminating the redundancy.

Lossless Compression
- The data from the decoder is identical to the source data.
- Example: archives produced by utilities such as pkzip or gzip.
- The compression factor is around 2:1.
- Cannot guarantee a fixed compression ratio: the output data rate is variable, which causes problems for recording mechanisms or the communication channel.

Lossy Compression
- The data from the expander is not identical to the source data, but the difference cannot be distinguished auditorily or visually.
- Based on an understanding of psychoacoustic and psychovisual perception.
- Can be forced to operate at a fixed compression factor.
- Suitable for audio and video compression; the compression factor is much higher than that of lossless compression (up to 100:1).

Process of Compression
- Communication (reduce the cost of the data link):
  Data -> Compressor (coder) -> transmission channel -> Expander (decoder) -> Data'
- Recording (extend playing time, in proportion to the compression factor):
  Data -> Compressor (coder) -> storage device (tape, disk, RAM, etc.) -> Expander (decoder) -> Data'

4/2/2003
Nguyen Chan Hung, Hanoi University of Technology
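The entropy and redundancy figures above can be made concrete with a short sketch. This is a minimal Python illustration (the slides contain no code; the function name and the sample data are mine), computing the zeroth-order entropy of a pixel sequence and the redundancy left over relative to its raw bit rate.

```python
from collections import Counter
from math import log2

def entropy_bits_per_symbol(data):
    """Zeroth-order (memoryless) entropy of a symbol sequence, in bits/symbol."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# A nearly flat image row: one dominant value -> low entropy, high redundancy.
pixels = [200] * 60 + [201] * 3 + [199] * 1
h = entropy_bits_per_symbol(pixels)   # information rate, in bits/pixel
bit_rate = 8                          # the pixels are stored as 8-bit values
redundancy = bit_rate - h             # bits/pixel that lossless compression can remove
```

For this sequence the information rate is well under one bit per pixel, so almost all of the 8 bits per pixel is redundancy.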

Sampling and quantization
- Why sampling?
  - Computers cannot process analog signals directly.
- PCM:
  - Sample the analog signal at a constant rate and use a fixed number of bits (usually 8 or 16) to represent each sample.
  - bit rate = sampling rate * number of bits per sample
- Quantization:
  - Map the sampled analog signal (generally of infinite precision) to discrete levels (finite precision).
  - Represent each discrete level with a number.
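The PCM bit-rate formula above is simple enough to show as a one-line calculation. A minimal Python sketch (function name and examples are mine, not from the slides), using CD audio and G.711 telephony as well-known reference points:

```python
def pcm_bit_rate(sampling_rate_hz, bits_per_sample, channels=1):
    """bit rate = sampling rate * number of bits per sample (per channel)."""
    return sampling_rate_hz * bits_per_sample * channels

# CD audio: 44.1 kHz sampling, 16 bits/sample, stereo.
cd_rate = pcm_bit_rate(44_100, 16, channels=2)      # 1,411,200 bits/s

# Telephony (G.711): 8 kHz sampling, 8 bits/sample, mono.
phone_rate = pcm_bit_rate(8_000, 8)                 # 64,000 bits/s
```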

Predictive coding
- Use previous sample(s) to estimate the current sample.
- For most signals, the difference between the predicted and actual values is small, so we can use a smaller number of bits to code the difference while maintaining the same accuracy.
- Noise is completely unpredictable:
  - Most codecs require the data to be preprocessed; otherwise they may perform badly when the data contains noise.
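The previous-sample predictor described above can be sketched in a few lines. This is an illustrative Python example (names are mine; real codecs use more elaborate predictors), showing that the residuals are small even though the samples themselves are large:

```python
def predict_encode(samples):
    """Code each sample as its difference from the previous sample
    (a simple previous-sample predictor).  The residuals are typically
    small, so they need fewer bits than the raw samples."""
    out, prev = [], 0
    for s in samples:
        out.append(s - prev)   # prediction error (residual)
        prev = s
    return out

def predict_decode(residuals):
    """Invert the predictor: reconstruct each sample as prediction + error."""
    out, prev = [], 0
    for r in residuals:
        prev += r
        out.append(prev)
    return out

signal = [100, 102, 103, 103, 101, 99]
residuals = predict_encode(signal)   # [100, 2, 1, 0, -2, -2]
```

After the first sample, every residual fits in a couple of bits, while the raw samples would each need at least 7.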

Statistical coding: the Huffman code
- Assign short codes to the most probable data patterns and longer codes to the less frequent data patterns.
- Bit assignment is based on the statistics of the source data.
- The statistics of the data must be known prior to the bit assignment.

Drawbacks of compression

- Sensitive to data errors:
  - Compression eliminates the redundancy that is essential to making data resistant to errors.
  - An error correction code is therefore required, which adds redundancy back to the compressed data.
  - Concealment is required for real-time applications.
- Artifacts:
  - Artifacts appear when the coder eliminates part of the entropy.
  - The higher the compression factor, the more artifacts.

A coding example: clustering color pixels
- In an image, pixel values are clustered in several peaks, each cluster representing the color range of one object in the image (e.g. blue sky).
- Coding process:
  1. Separate the pixel values into a limited number of data clusters (e.g. clustered pixels of sky blue or grass green).
  2. Send the average color of each cluster and an identifying number for each cluster as side information.
  3. Transmit, for each pixel:
     - The number of the average cluster color that it is closest to.
     - Its difference from that average cluster color (this difference can itself be coded to reduce redundancy, since the differences are often similar: prediction).

Frame-Differential Coding
- Frame-Differential Coding (FDC) = prediction from a previous video frame.
- A video frame is stored in the encoder for comparison with the present frame, which causes an encoding latency of one frame time.
- For still images:
  - Data can be sent only for the first instance of a frame; all subsequent prediction error values are zero.
  - The frame is retransmitted occasionally to allow receivers that have just been turned on to have a starting point.
- FDC reduces the information for still images, but leaves significant data for moving images (e.g. a movement of the camera).

Unpredictable Information
- Information that cannot be predicted from the previous frame:
  1. Scene change (e.g. the background landscape changes).
  2. Newly uncovered information due to object motion across a background, or at the edges of a panned scene (e.g. a soccer player's face uncovered by a flying ball).

Motion Compensated Prediction
- More data can be eliminated than with Frame-Differential Coding by comparing the present pixel to the location of the same object in the previous frame (not to the same spatial location in the previous frame).
- The encoder estimates the motion in the image to find the corresponding area in a previous frame: it searches for a portion of a previous frame which is similar to the part of the new frame to be transmitted.
- It then sends (as side information) a motion vector telling the decoder what portion of the previous frame it will use to predict the new frame.
- It also sends the prediction error so that the exact new frame may be reconstituted.
- (Figures: top, prediction without motion compensation; bottom, with motion compensation.)
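The motion search described above can be sketched as an exhaustive block-matching loop. This is a toy Python illustration (all names and the tiny frames are mine; real encoders use much faster search strategies), in which a bright 2x2 object moves one pixel to the right between frames:

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-size blocks."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b)
                          for a, b in zip(ra, rb))

def block(frame, top, left, size):
    return [row[left:left + size] for row in frame[top:top + size]]

def motion_search(prev_frame, cur_frame, top, left, size, radius):
    """Exhaustive search: find the motion vector (dy, dx) whose block in the
    previous frame best predicts the current block, and return it together
    with its SAD (the prediction error the encoder would still send)."""
    target = block(cur_frame, top, left, size)
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ty, tx = top + dy, left + dx
            if 0 <= ty and 0 <= tx and ty + size <= len(prev_frame) \
               and tx + size <= len(prev_frame[0]):
                cost = sad(block(prev_frame, ty, tx, size), target)
                if best is None or cost < best[0]:
                    best = (cost, (dy, dx))
    return best[1], best[0]

prev = [[0] * 6 for _ in range(6)]
cur  = [[0] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (2, 3):
        prev[y][x] = 9        # object in the previous frame
    for x in (3, 4):
        cur[y][x] = 9         # same object, shifted one pixel right
mv, err = motion_search(prev, cur, top=2, left=3, size=2, radius=1)
```

The search points one pixel to the left in the previous frame (the object moved right), and the prediction error drops to zero, so only the motion vector needs to be sent.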

Dealing with unpredictable information
- Scene change:
  - An intra-coded picture (an MPEG I picture) must be sent as a starting point; it requires more data than a predicted picture (P picture).
  - I pictures are sent about twice per second; their timing and sending frequency may be adjusted to accommodate scene changes.
- Uncovered information:
  - Handled by the bi-directionally coded type of picture, the B picture.
  - There must be enough frame storage in the system to wait for the later picture that has the desired information.
  - To limit the amount of decoder memory, the encoder stores pictures and sends the required reference pictures before sending the B picture.

Transform Coding
- Types of picture transform coding:
  - Discrete Fourier (DFT)
  - Karhunen-Loeve
  - Walsh-Hadamard
  - Lapped orthogonal
  - Discrete Cosine (DCT), used in MPEG-2
  - Wavelets (new)
- The differences between transform coding methods:
  - The degree of concentration of energy in a few coefficients.
  - The region of influence of each coefficient in the reconstructed picture.
  - The appearance and visibility of coding noise due to coarse quantization of the coefficients.

DCT Lossy Coding
- Convert spatial image pixel values to transform coefficient values; the number of coefficients produced is equal to the number of pixels transformed.
- The transform process concentrates the energy into particular coefficients (generally the low-frequency coefficients).
- Few coefficients contain most of the energy in a picture, so the coefficients may be further coded by lossless entropy coding.

Types of picture coding
- Lossless coding cannot obtain a high compression ratio (4:1 or less).
- Lossy coding = discarding selected information so that the reproduction is visually or aurally indistinguishable from the source, or has the fewest artifacts.
- Lossy coding can be achieved by:
  - Eliminating some DCT coefficients.
  - Adjusting the quantizing coarseness of the coefficients (the better option).

Masking
- Masking makes certain types of coding noise invisible or inaudible due to psycho-visual or psycho-acoustical effects:
  - In audio, a pure tone masks energy at higher frequencies and also at lower frequencies (with a weaker effect).
  - In video, high-contrast edges mask random noise.
- Noise introduced at low bit rates should fall in the frequency, spatial, or temporal regions where it is masked.

Run-Level coding
- "Run-Level" coding = coding a run-length of zeros followed by a nonzero level.
- Instead of sending all the zero values individually, the length of the run is sent.
- Useful for any data with long runs of zeros.
- Run lengths are easily encoded with a Huffman code.

Variable quantization
- Variable quantization is the main technique of lossy coding; it greatly reduces the bit rate.
- It coarsely quantizes the less significant coefficients in a transform (those that are less noticeable: low energy, less visible or audible).
- It can be applied to a complete signal or to individual frequency components of a transformed signal.
- Variable quantization also controls the instantaneous bit rate in order to:
  - Match the average bit rate to a constant channel bit rate.
  - Prevent buffer overflow or underflow.

Key points
- Compression process:
  - Quantization & sampling
  - Coding:
    - Lossless & lossy coding
    - Frame-Differential Coding
    - Motion Compensated Prediction
    - Variable quantization
    - Run-Level coding
  - Masking
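Run-Level coding as described above is easy to sketch. A minimal Python example (function name and sample data are mine; real MPEG tables map the pairs onto variable-length codes), encoding a coefficient sequence as (zero-run, level) pairs with a trailing End-Of-Block marker:

```python
def run_level_encode(coeffs):
    """Encode quantized coefficients as (zero_run, level) pairs, ending
    with an 'EOB' marker once only zeros remain."""
    last_nonzero = -1
    for i, c in enumerate(coeffs):
        if c != 0:
            last_nonzero = i
    pairs, run = [], 0
    for c in coeffs[:last_nonzero + 1]:
        if c == 0:
            run += 1          # extend the current run of zeros
        else:
            pairs.append((run, c))
            run = 0
    pairs.append("EOB")       # everything after the last nonzero level
    return pairs

seq = [78, -1, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0]
encoded = run_level_encode(seq)   # [(0, 78), (0, -1), (0, 1), (3, 2), 'EOB']
```

The eight trailing zeros vanish into the single EOB symbol, which is where most of the gain comes from.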

Chapter 2: Multimedia technologies

Roadmap
- JPEG
- MPEG-1/MPEG-2 Video
- MPEG-1 Layer 3 Audio (mp3)
- MPEG-4
- MPEG-7 (brief introduction)
- HDTV (brief introduction)
- H.261/H.263 (brief introduction)
- Model-based coding (MBC) (brief introduction)

JPEG (Joint Photographic Experts Group)
- JPEG encoder:
  - Partitions the image into blocks of 8 x 8 pixels.
  - Calculates the Discrete Cosine Transform (DCT) of each block.
  - A quantizer rounds off the DCT coefficients according to the quantization matrix: lossy, but allows for large compression ratios.
  - Produces a series of DCT coefficients using zig-zag scanning.
  - Uses a variable-length code (VLC) on these DCT coefficients.
  - Writes the compressed data stream to an output file (*.jpg or *.jpeg).
- JPEG decoder:
  - File input data stream -> variable-length decoder -> IDCT (inverse DCT) -> image.

JPEG - DCT
- The DCT is similar to the Discrete Fourier Transform: it transforms a signal or image from the spatial domain to the frequency domain.
- The DCT requires fewer multiplications than the DFT.
- Input image A:
  - The input image A is N2 pixels wide by N1 pixels high.
  - A(i,j) is the intensity of the pixel in row i and column j.
- Output image B:
  - B(k1,k2) is the DCT coefficient in row k1 and column k2 of the DCT matrix.
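The 2-D DCT that JPEG applies to each 8 x 8 block can be written directly from its definition. This is a pure-Python sketch for clarity (names are mine; it is O(N^4), whereas real codecs use fast factorized algorithms), verified on a flat block, whose energy should all land in the DC coefficient B(0,0):

```python
from math import cos, pi, sqrt

def dct2(block):
    """2-D DCT-II of an N x N block, straight from the textbook formula:
    F(u,v) = (2/N) C(u) C(v) sum f(x,y) cos((2x+1)u*pi/2N) cos((2y+1)v*pi/2N)."""
    n = len(block)
    def c(k):
        return 1 / sqrt(2) if k == 0 else 1.0
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[x][y]
                    * cos((2 * x + 1) * u * pi / (2 * n))
                    * cos((2 * y + 1) * v * pi / (2 * n))
                    for x in range(n) for y in range(n))
            out[u][v] = (2 / n) * c(u) * c(v) * s
    return out

# A flat (constant) 8x8 block: all energy concentrates in the DC coefficient.
flat = [[128] * 8 for _ in range(8)]
coeffs = dct2(flat)
```

For the flat block, B(0,0) = 1024 and every AC coefficient is (numerically) zero, illustrating the energy concentration the slides describe.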

JPEG - Quantization Matrix
- The quantization matrix is an 8-by-8 matrix of step sizes (sometimes called quantums): one element for each DCT coefficient.
- It is usually symmetric.
- Step sizes are:
  - Small in the upper left (low frequencies),
  - Large toward the lower right (high frequencies).
  - A step size of 1 is the most precise.
- The quantizer divides each DCT coefficient by its corresponding quantum, then rounds to the nearest integer.
- Large quantums drive small coefficients down to zero.
- The result:
  - Many high-frequency coefficients become zero and are easily removed.
  - The low-frequency coefficients undergo only minor adjustment.
  - The output is easily coded by run-length Huffman coding.

JPEG Coding process illustrated
- (Figure: an example 8x8 DCT coefficient matrix, with DC value 1255, and its quantization result.)
- Zigzag scan result: 78 -1 1 -4 -5 4 4 6 3 2 -1 -3 -5 -4 -1 0 -1 0 1 1 0 0 0 ... 0 EOB

MPEG (Moving Picture Expert Group)
- MPEG is the heart of:
  - Digital television set-top boxes
  - HDTV decoders
  - DVD players
  - Video conferencing
  - Internet video, etc.
- MPEG standards: MPEG-1, MPEG-2, MPEG-4, MPEG-7.
  (The MPEG-3 standard was abandoned and became an extension of MPEG-2.)

MPEG standards
- MPEG-1 (obsolete):
  - A standard for storage and retrieval of moving pictures and audio on storage media.
  - Application: VCD (video compact disc).
- MPEG-2 (widely implemented):
  - A standard for digital television.
  - Applications: DVD (digital versatile disc), HDTV (high-definition TV), DVB (European Digital Video Broadcasting Group), etc.
- MPEG-4 (newly implemented, still being researched):
  - A standard for multimedia applications.
  - Applications: Internet, cable TV, virtual studio, etc.
- MPEG-7 (future work, ongoing research):
  - A content representation standard for information search ("Multimedia Content Description Interface").
  - Applications: Internet, video search engines, digital libraries.
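The quantization step described above (divide by the step size, round to the nearest integer) can be shown on a toy matrix. A minimal Python sketch (names and the 2x2 example are mine; the DC value 1255 and step size 16 are chosen so the result matches the 78 that opens the zigzag sequence in the illustration):

```python
def quantize(dct_block, q_matrix):
    """Divide each DCT coefficient by its quantization step size and round
    to the nearest integer; large steps drive small coefficients to zero."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(dct_block, q_matrix)]

# Toy 2x2 corner of a block: a fine step for the DC term,
# coarse steps for the higher-frequency terms.
dct   = [[1255, -15],
         [  58,  -6]]
steps = [[16, 40],
         [40, 80]]
q = quantize(dct, steps)   # [[78, 0], [1, 0]]
```

The large DC coefficient survives almost intact (1255/16 rounds to 78), while the small high-frequency coefficients are driven to zero, exactly the behavior the slides describe.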

MPEG-2 formal standards
- The international standard ISO/IEC 13818-2, "Generic Coding of Moving Pictures and Associated Audio Information".
- ATSC (Advanced Television Systems Committee) document A/54, "Guide to the Use of the ATSC Digital Television Standard".

Pixel & Block
- Pixel = "picture element":
  - A discrete spatial point sample of an image.
  - A color pixel may be represented digitally as a number of bits for each of three primary color values.
- Block:
  - An 8 x 8 array of pixels.
  - The block is the fundamental unit for DCT (discrete cosine transform) coding.

MPEG video data structure
- The MPEG-2 video data stream is constructed in layers, from lowest to highest:
  - PIXEL is the fundamental unit.
  - BLOCK is an 8 x 8 array of pixels.
  - MACROBLOCK consists of 4 luma blocks and 2 chroma blocks (field DCT coding and frame DCT coding).
  - SLICE consists of a variable number of macroblocks.
  - PICTURE consists of a frame (or field) of slices.
  - GROUP OF PICTURES (GOP) consists of a variable number of pictures.
  - SEQUENCE consists of a variable number of GOPs.
  - PACKETIZED ELEMENTARY STREAM (optional).

Macroblock
- A macroblock = a 16 x 16 array of luma (Y) pixels (= 4 blocks, in a 2 x 2 block array).
- The number of chroma pixels (Cr, Cb) varies depending on the chroma pixel structure indicated in the sequence header (e.g. 4:2:0).
- The macroblock is the fundamental unit for motion compensation and will have motion vector(s) associated with it if it is predictively coded.
- A macroblock is classified as field coded or frame coded (an interlaced frame consists of 2 fields), depending on how the four blocks are extracted from the macroblock.

Slice
- Pictures are divided into slices.
- A slice consists of an arbitrary number of successive macroblocks (going left to right), but is typically an entire row of macroblocks.
- A slice does not extend beyond one row.
- The slice header carries address information that allows the Huffman decoder to resynchronize at slice boundaries.

I, P, B Pictures
- Encoded pictures are classified into 3 types: I, P, and B.
- I Pictures = Intra-Coded Pictures:
  - All macroblocks are coded without prediction.
  - Needed to give the receiver a "starting point" for prediction after a channel change and to recover from errors.
- P Pictures = Predicted Pictures:
  - Forward motion compensation.
  - Macroblocks may be coded with forward prediction from references made from previous I and P pictures, or may be intra coded.
- B Pictures = Bi-directionally predicted pictures:
  - Bidirectional motion compensation.
  - Macroblocks may be coded with forward prediction from previous I or P references.
  - Macroblocks may be coded with backward prediction from the next I or P reference.
  - Macroblocks may be coded with interpolated prediction from past and future I or P references.
  - Macroblocks may be intra coded (no prediction).

Picture
- A source picture is a contiguous rectangular array of pixels.
- A picture may be a complete frame of video ("frame picture") or one of the interlaced fields from an interlaced source ("field picture").
- A field picture does not have any blank lines between its active lines of pixels.
- A frame picture consists of:
  - a frame of a progressive source, or
  - a frame (2 spatially interlaced fields) of an interlaced source.
- A coded picture (also called a video access unit) begins with a start code and a header. The header consists of:
  - picture type (I, B, P)
  - temporal reference information
  - motion vector search range
  - optional user data

Group of pictures (GOP)
- The group of pictures layer is optional in MPEG-2.
- A GOP begins with a start code and a header. The header carries:
  - time code information
  - editing information
  - optional user data
- The first encoded picture in a GOP is always an I picture.
- Typical length is 15 pictures with the following structure (in display order):
  I B B P B B P B B P B B P B B
- This provides an I picture with sufficient frequency to allow a decoder to decode correctly.

Sequence
- A sequence begins with a unique 32-bit start code followed by a header.
- The header carries:
  - picture size
  - aspect ratio
  - frame rate and bit rate
  - optional quantizer matrices
  - required decoder buffer size
  - chroma pixel structure
  - optional user data
- The sequence information is needed for channel changing.
- The sequence length depends on the acceptable channel-change delay.

Packetized Elementary Stream (PES)
- A Video Elementary Stream (video ES) consists of all the video data for a sequence, including the sequence header and all the subparts of a sequence.
- An ES carries only one type of data (video or audio) from a single video or audio encoder.
- A PES consists of a single ES which has been split into packets, each starting with an added packet header.
- A PES stream contains only one type of data from one source, e.g. from one video or audio encoder.
- PES packets have variable length, not corresponding to the fixed packet length of transport packets, and may be much longer than a transport packet.
- Each PES packet header includes:
  - An 8-bit stream ID identifying the source of the payload.
  - Timing references: PTS (presentation time stamp), the time at which a decoded audio or video access unit is to be presented by the decoder.
  - DTS (decoding time stamp), the time at which an access unit is decoded by the decoder.
  - ESCR (elementary stream clock reference).

Transport stream
- Transport packets (fixed length) are formed from a PES stream:
  - The PES header follows the transport packet header.
  - Successive transport packet payloads are filled with the remaining PES packet content until the PES packet is all used.
  - The final transport packet is filled to a fixed length by stuffing with 0xFF bytes (all ones).
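The PES-to-transport packetization above can be sketched as a splitting loop. This is a simplified Python illustration (names are mine; a real TS header carries more fields such as flags and a continuity counter, which are omitted here), keeping only the parts the slides describe: a fixed 188-byte packet, a 0x47 sync byte, a 13-bit PID, and 0xFF stuffing in the final packet:

```python
TS_PACKET_SIZE = 188  # MPEG-2 transport packets are always 188 bytes

def packetize(pes_packet, pid, header_len=4):
    """Split one PES packet into fixed-length transport packets.  Each
    packet starts with the sync byte 0x47 and a header carrying the
    13-bit PID; the last packet's payload is padded with 0xFF stuffing."""
    payload_size = TS_PACKET_SIZE - header_len
    packets = []
    for i in range(0, len(pes_packet), payload_size):
        chunk = pes_packet[i:i + payload_size]
        chunk = chunk + b"\xff" * (payload_size - len(chunk))  # stuffing
        # Simplified 4-byte header: sync, PID high 5 bits, PID low 8 bits, misc.
        header = bytes([0x47, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
        packets.append(header + chunk)
    return packets

pes = bytes(range(256)) * 2          # a 512-byte PES packet
ts = packetize(pes, pid=0x1FF1)      # 3 transport packets, last one stuffed
```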

Intra Frame Coding
- Intra coding is concerned only with information within the current frame (not relative to any other frame in the video sequence).
- It is similar to JPEG (recall the JPEG coding mechanism).
- Basic blocks of the intra-frame coder (see the block diagram):
  - Video filter
  - Discrete cosine transform (DCT)
  - DCT coefficient quantizer
  - Run-length amplitude / variable-length coder (VLC)

Video Filter
- The Human Visual System (HVS) is:
  - Most sensitive to changes in luminance,
  - Less sensitive to variations in chrominance.
- MPEG therefore uses the YCbCr color space to represent the data values instead of RGB, where:
  - Y is the luminance signal,
  - Cb is the blue color difference signal,
  - Cr is the red color difference signal.
- What is a 4:4:4, 4:2:0, etc. video format?
  - 4:4:4 is full-bandwidth YCbCr video: each macroblock consists of 4 Y blocks, 4 Cb blocks, and 4 Cr blocks (a waste of bandwidth).
  - 4:2:0 is most commonly used in MPEG-2. (Figure: the 4:2:0 chroma sampling format.)

Applications of chroma formats

chroma_format     Multiplex order (time) within macroblock   Application
4:2:0 (6 blocks)   YYYYCbCr                                  Mainstream television, consumer entertainment
4:2:2 (8 blocks)   YYYYCbCrCbCr                              Studio production environments, professional editing equipment
4:4:4 (12 blocks)  YYYYCbCrCbCrCbCrCbCr                      Computer graphics

MPEG Profiles & levels
- MPEG-2 is classified into several profiles. Main Profile features:
  - I, P, and B pictures
  - Non-scalable
  - 4:2:0 chroma sampling format
- Main Profile is subdivided into levels:
  - MP@ML (Main Profile at Main Level):
    - Designed around the CCIR 601 standard for interlaced standard digital video.
    - 720 x 576 (PAL) or 720 x 483 (NTSC).
    - 30 Hz progressive, 60 Hz interlaced.
    - Maximum bit rate is 15 Mbit/s.
  - MP@HL (Main Profile at High Level), upper bounds:
    - 1152 x 1920, 60 Hz progressive.
    - 80 Mbit/s.

MPEG encoder/decoder (block diagram figure)
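The block counts in the chroma-format table follow from a small piece of arithmetic: every macroblock has 4 luma blocks, and the chroma format determines how many chroma blocks join them. A tiny Python sketch of that bookkeeping (names are mine):

```python
# Chroma (Cb + Cr) blocks per macroblock for each sampling format.
CHROMA_BLOCKS = {"4:2:0": 2, "4:2:2": 4, "4:4:4": 8}

def blocks_per_macroblock(chroma_format):
    """A macroblock always has 4 luma (Y) blocks, plus the chroma blocks
    implied by the sampling format."""
    return 4 + CHROMA_BLOCKS[chroma_format]
```

This reproduces the 6, 8, and 12 blocks listed in the table for 4:2:0, 4:2:2, and 4:4:4 respectively.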

Prediction
- The encoder can decide to use:
  - forward prediction from a previous picture,
  - backward prediction from a following picture,
  - or interpolated prediction,
  so as to minimize the prediction error.
- Backward prediction is done by storing pictures until the desired anchor picture is available before encoding the currently stored frames.
- The encoder must transmit pictures in an order different from that of the source pictures, so that the decoder has the anchor pictures before decoding predicted pictures. (See the next slide.)
- The decoder must have two frames stored.

DCT and IDCT formulas
- DCT: Eq 1, normal form; Eq 2, matrix form.
- IDCT: Eq 3, normal form; Eq 4, matrix form.
- In the standard normal form, for an N x N block:
  F(u,v) = (2/N) C(u) C(v) * sum over x,y of f(x,y) cos[(2x+1)u*pi/(2N)] cos[(2y+1)v*pi/(2N)]
- Where:
  - F(u,v) = the two-dimensional N x N DCT.
  - u, v, x, y = 0, 1, 2, ..., N-1.
  - x, y are spatial coordinates in the sample domain.
  - u, v are frequency coordinates in the transform domain.
  - C(u), C(v) = 1/sqrt(2) for u, v = 0.
  - C(u), C(v) = 1 otherwise.

I P B Picture Reordering
- Pictures are coded and decoded in a different order than they are displayed, due to the bidirectional prediction for B pictures.
- For example, with a 12-picture GOP:
  - Source order and encoder input order:
    I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)
  - Encoding order and order in the coded bitstream:
    I(1) P(4) B(2) B(3) P(7) B(5) B(6) P(10) B(8) B(9) I(13) B(11) B(12)
  - Decoder output order and display order (same as the input):
    I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)

DCT versus DFT
- The DCT is conceptually similar to the DFT, except:
  - The DCT concentrates energy into lower-order coefficients better than the DFT.
  - The DCT is purely real; the DFT is complex (magnitude and phase).
  - A DCT operation on a block of pixels produces coefficients that are similar to the frequency-domain coefficients produced by a DFT operation.
  - An N-point DCT has the same frequency resolution as a 2N-point DFT: the N frequencies of a 2N-point DFT correspond to N points on the upper half of the unit circle in the complex frequency plane.
  - Assuming a periodic input, the magnitude of the DFT coefficients is spatially invariant (the phase of the input does not matter); this is not true for the DCT.
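The display-to-bitstream reordering shown above follows a simple rule: each anchor (I or P) picture jumps ahead of the B pictures that precede it in display order. A minimal Python sketch of that rule (names are mine), reproducing the 13-picture example from the slide:

```python
def coding_order(display_order):
    """Reorder a GOP from display order to coding/bitstream order: every
    anchor (I or P) picture is moved ahead of the B pictures that precede
    it in display order, so the decoder has both anchors before the Bs."""
    out, pending_b = [], []
    for pic in display_order:
        if pic[0] in "IP":          # anchor picture
            out.append(pic)
            out.extend(pending_b)   # the Bs that were waiting for it
            pending_b = []
        else:                       # B picture: wait for its later anchor
            pending_b.append(pic)
    return out + pending_b

display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7", "B8", "B9",
           "P10", "B11", "B12", "I13"]
coded = coding_order(display)
# ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6', 'P10', 'B8', 'B9',
#  'I13', 'B11', 'B12']
```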

Quantization matrix
- Note that the quantization matrix entries (step sizes) are:
  - Small in the upper left (low frequencies),
  - Large toward the lower right (high frequencies).
  - Recall the JPEG mechanism.
- The HVS is less sensitive to errors in high-frequency coefficients than in lower-frequency ones, so the higher frequencies should be more coarsely quantized.

MPEG scanning
- Left figure: zigzag scanning (as in JPEG).
- Right figure: alternate scanning, which is better for interlaced frames.

Result DCT matrix (example)
- After adaptive quantization, the result is a matrix containing many zeros.

Huffman/Run-Level Coding
- Huffman coding, in combination with Run-Level coding and zig-zag scanning, is applied to the quantized DCT coefficients.
- "Run-Level" = a run-length of zeros followed by a non-zero level.
- Huffman coding is also applied to various types of side information.
- A Huffman code is an entropy code that optimally achieves the shortest possible average code word length for a source.
- This average code word length is >= the entropy of the source.
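The Huffman construction itself can be sketched in a few lines. This is an illustrative Python version (names and sample data are mine; MPEG uses fixed, pre-computed code tables rather than building codes on the fly), using the classic merge-the-two-least-probable-subtrees algorithm:

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code from symbol statistics: repeatedly merge the
    two least probable subtrees, so frequent symbols get short codes."""
    counts = Counter(symbols)
    # Heap entries: [weight, tie-break index, {symbol: partial codeword}].
    heap = [[w, i, {s: ""}] for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        for s in lo[2]:
            lo[2][s] = "0" + lo[2][s]   # left branch
        for s in hi[2]:
            hi[2][s] = "1" + hi[2][s]   # right branch
        heapq.heappush(heap, [lo[0] + hi[0], lo[1], {**lo[2], **hi[2]}])
    return heap[0][2]

data = "aaaaaaabbbccd"   # 'a' is most probable, 'd' least probable
code = huffman_code(data)
```

As the slides state, the most probable symbol ends up with the shortest codeword and the least probable with the longest.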

Huffman/Run-Level coding illustrated
- Using the DCT output matrix from the previous slide, after zigzag scanning the output is the sequence: 4, 4, 2, 2, 2, 1, 1, 1, 1, 0 (12 zeros), 1, 0 (41 zeros).
- These values are looked up in a fixed table of variable-length codes:
  - The most probable occurrence is given a relatively short code.
  - The least probable occurrence is given a relatively long code.
- (Table: zero run-length, amplitude, and MPEG code value for each entry; the final EOB indicator is coded as "10".)
- The first run of 12 zeroes is efficiently coded in only 9 bits.
- The last run of 41 zeroes is entirely eliminated, represented only by the 2-bit End Of Block (EOB) indicator.
- The quantized DCT coefficients are now represented by a sequence of 61 binary bits (see the table).
- Considering that the original 8x8 block of 8-bit pixels required 512 bits for full representation, the compression ratio is approximately 8.4:1.

MPEG Data Transport
- MPEG packages all data into fixed-size 188-byte packets for transport.
- Video or audio payload data, previously placed in PES packets, is broken up into fixed-length transport packet payloads.
- A PES packet may be much longer than a transport packet, so segmentation is required:
  - The PES header is placed immediately following a transport header.
  - Successive portions of the PES packet are then placed in the payloads of transport packets.
  - Remaining space in the final transport packet payload is filled with stuffing bytes = 0xFF (all ones).

MPEG Transport packet
- Each transport packet starts with a sync byte = 0x47.
- In the ATSC US terrestrial DTV VSB transmission system, the sync byte is not processed but is replaced by a different sync symbol especially suited to RF transmission.
- The transport packet header contains a 13-bit PID (packet ID), which corresponds to a particular elementary stream of video, audio, or another program element.
- PID 0x0000 is reserved for transport packets carrying a program association table (PAT).
- The PAT points to a Program Map Table (PMT), which in turn points to the particular elements of a program.

Adaptation Field
- 8 bits specifying the length of the adaptation field.
- The first group of flags consists of eight 1-bit flags:
  - discontinuity_indicator
  - random_access_indicator
  - elementary_stream_priority_indicator
  - PCR_flag
  - OPCR_flag
  - splicing_point_flag
  - transport_private_data_flag
  - adaptation_field_extension_flag
- The optional fields are present if indicated by one of the preceding flags.
- The remainder of the adaptation field is filled with stuffing bytes (0xFF, all ones).
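Pulling the PID out of a transport packet header is a small bit-manipulation exercise. A minimal Python sketch (function name and the sample packets are mine), using the layout described above: the sync byte 0x47 first, then a 13-bit PID spanning the low 5 bits of the next byte and all 8 bits of the one after:

```python
def parse_ts_pid(packet):
    """Validate the sync byte and extract the 13-bit PID from a
    188-byte MPEG-2 transport packet."""
    if len(packet) != 188 or packet[0] != 0x47:
        raise ValueError("not a valid 188-byte transport packet")
    return ((packet[1] & 0x1F) << 8) | packet[2]

# Hypothetical packets: a PAT packet (PID 0x0000) and a video packet.
pat_packet   = bytes([0x47, 0x00, 0x00, 0x10]) + b"\xff" * 184
video_packet = bytes([0x47, 0x10, 0x31, 0x10]) + b"\xff" * 184
```

A demultiplexer applies exactly this extraction to every packet, routing PID 0x0000 to the PAT parser and the PIDs listed in a PMT to the audio/video decoders.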

Demultiplexing a Transport Stream (TS)

- Demultiplexing a transport stream involves:
  1. Finding the PAT by selecting packets with PID = 0x0000
  2. Reading the PIDs of the PMTs from the PAT
  3. Reading the PIDs for the elements of a desired program from its PMT (for example, a basic program will have a PID for audio and a PID for video)
  4. Detecting packets with the desired PIDs and routing them to the decoders
- An MPEG-2 transport stream can carry:
  - Video streams
  - Audio streams
  - Any type of data
- MPEG-2 TS is also the packet format for CATV downstream data communication.

Timing - Synchronization

- The decoder is synchronized with the encoder by time stamps.
- The encoder contains a master oscillator and counter, called the System Time Clock (STC). (See previous block diagram.)
- The STC belongs to a particular program and is the master clock of the video and audio encoders for that program.
- Multiple programs, each with its own STC, can also be multiplexed into a single stream.
- At the encoder input (Point A), the time of occurrence of an input video picture or audio block is noted by sampling the STC.
- The total (constant) delay of the encoder and decoder buffers is added to the STC value, creating a Presentation Time Stamp (PTS); the PTS is then inserted in the first of the packet(s) representing that picture or audio block, at Point B.
- A program component can even have no time stamps, but then it cannot be synchronized with other components.

Timing & buffer control

- Rates at each point of the chain (from the block diagram):
  - Point A (encoder input): constant/specified rate
  - Point B (encoder output): variable rate
  - Point C (encoder buffer output): constant rate
  - Point D (communication channel + decoder buffer): constant rate
  - Point E (decoder input): variable rate
  - Point F (decoder output): constant/specified rate
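The four demultiplexing steps above can be sketched as a small routing function. The shape of the inputs is assumed for illustration: packets are already parsed into (PID, payload) tuples, and the PAT and PMTs are given as plain dicts, whereas real PSI tables arrive as sectioned, CRC-protected payloads inside the stream itself.

```python
# Sketch of TS demultiplexing: PAT -> PMT PID -> elementary-stream PIDs ->
# route matching packets to the corresponding decoder queues.

def demux(packets, pat, pmts, wanted_program):
    """Route packets of one program's elementary streams to 'decoders'."""
    # Steps 1-2: the PAT (carried on PID 0x0000) maps program -> PMT PID.
    pmt_pid = pat[wanted_program]
    # Step 3: the program's PMT maps stream name -> elementary-stream PID.
    es_pids = pmts[pmt_pid]            # e.g. {"video": 0x100, "audio": 0x101}
    routed = {name: [] for name in es_pids}
    # Step 4: detect packets with the desired PIDs and route them.
    for pid, payload in packets:
        for name, es_pid in es_pids.items():
            if pid == es_pid:
                routed[name].append(payload)
    return routed

pat = {1: 0x20}                        # program 1 -> PMT on PID 0x20
pmts = {0x20: {"video": 0x100, "audio": 0x101}}
packets = [(0x100, b"v0"), (0x101, b"a0"), (0x300, b"x"), (0x100, b"v1")]
print(demux(packets, pat, pmts, wanted_program=1))
```

Packets on PIDs not listed in the chosen PMT (PID 0x300 here) are simply ignored, which is exactly how a set-top box extracts one program from a multi-program multiplex.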

Timing Synchronization (2)

- A Decode Time Stamp (DTS) can optionally be coded into the bit stream; it represents the time at which the data should be taken instantaneously from the decoder buffer and decoded.
  - DTS and PTS are identical except in the case of picture reordering for B pictures.
  - The DTS is only used where it is needed because of reordering.
  - Whenever DTS is used, PTS is also coded.
  - PTS (or DTS) insertion interval: at most 700 ms.
  - In ATSC, PTS (or DTS) must be inserted at the beginning of each coded picture (access unit).
- In addition, the output of the encoder buffer (Point C) is time stamped with System Time Clock (STC) values, called:
  - System Clock Reference (SCR) in a Program Stream.
  - Program Clock Reference (PCR) in a Transport Stream.
- PCR time stamp interval: at most 100 ms; SCR time stamp interval: at most 700 ms.
- The PCR and/or the SCR are used to synchronize the decoder STC with the encoder STC.
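Why DTS and PTS differ only for reordered pictures can be made concrete with a toy example. Time is counted in frame periods, and the one-period presentation offset (a modelling choice, not a normative rule) stands in for the reordering delay that B pictures introduce.

```python
# Illustration of DTS/PTS assignment under B-picture reordering:
# DTS follows the bit-stream (decode) order, PTS follows the display order.

def stamp(decode_order, display_order):
    """Assign (DTS, PTS) per picture, in frame periods."""
    stamps = {}
    for dts, pic in enumerate(decode_order):
        pts = display_order.index(pic) + 1   # +1 models the reordering delay
        stamps[pic] = (dts, pts)
    return stamps

display = ["I0", "B1", "B2", "P3"]           # order shown to the viewer
decode  = ["I0", "P3", "B1", "B2"]           # order in the bit stream
s = stamp(decode, display)
print(s)
```

In the result, the B pictures come out with PTS equal to DTS (they are displayed as soon as they are decoded), while the I and P pictures carry a distinct DTS because their decoding is pulled forward by the reordering — matching the rule stated on the slide.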

Timing Synchronization (3)

- All video and audio streams included in a program must get their time stamps from a common STC, so that the video and audio decoders can be synchronized with each other.
- The data rate and packet rate on the channel (at the multiplexer output) can be completely asynchronous with the System Time Clock (STC).
- PCR time stamps allow different multiplexed programs with different STCs to be carried together, while still allowing STC recovery for each program.
- If there is no buffer underflow or overflow, the delays in the buffers and transmission channel for both video and audio are constant.
- The encoder input and decoder output run at equal and constant rates -> fixed end-to-end delay from encoder input to decoder output.
- If exact synchronization is not required, the decoder clock can be free-running -> video frames can be repeated or skipped as necessary to prevent buffer underflow or overflow, respectively.
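The free-running-decoder case in the last bullet can be reduced to a tiny control rule: watch the buffer occupancy and repeat or skip frames near the limits. The watermark values below are arbitrary illustrations, not figures from the slides.

```python
# Toy buffer-control policy for a free-running decoder (no STC lock):
# repeat the last frame when the buffer is nearly empty, skip a frame
# when it is nearly full, otherwise play normally.

def control(buffer_frames, low=2, high=8):
    """Decide the action for the current frame period."""
    if buffer_frames < low:
        return "repeat"   # nearly underflowing: re-show last frame, consume nothing
    if buffer_frames > high:
        return "skip"     # nearly overflowing: drop a frame to catch up
    return "play"

print([control(n) for n in (0, 1, 5, 9, 12)])
```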

HDTV (High definition television)

- High definition television (HDTV) first came to public attention in 1981, when NHK, the Japanese broadcasting authority, first demonstrated it in the United States.
- HDTV is defined by the ITU-R as:
  'A system designed to allow viewing at about three times the picture height, such that the system is virtually, or nearly, transparent to the quality or portrayal that would have been perceived in the original scene ... by a discerning viewer with normal visual acuity.'

HDTV (2)

- HDTV proposals are for a screen about 33% wider than the conventional TV image. It is generally agreed that the HDTV aspect ratio will be 16:9, as opposed to the 4:3 ratio of conventional TV systems. This ratio has been chosen because psychological tests have shown that it best matches the human visual field.
- It also enables use of existing cinema film formats as additional source material, since this is the same aspect ratio used in normal 35 mm film.
- Figure 16.6(a) shows how the aspect ratio of HDTV compares with that of conventional television, using the same resolution, or the same surface area, as the comparison metric.
- To achieve the improved resolution, the video image used in HDTV must contain over 1000 lines, as opposed to the 525 and 625 provided by the existing NTSC and PAL systems. This gives a much improved vertical resolution. The exact value is chosen to be a simple multiple of one or both of the vertical resolutions used in conventional TV.
- However, due to the higher scan rates, the bandwidth requirement for analogue HDTV is approximately 12 MHz, compared to the nominal 6 MHz of conventional TV.

HDTV (3)

- The introduction of a non-compatible TV transmission format for HDTV would require the viewer either to buy a new receiver, or to buy a converter to receive the picture on their old set.
- The initial thrust in Japan was towards an HDTV format which is compatible with conventional TV standards, and which can be received by conventional receivers with conventional quality. However, to get the full benefit of HDTV, a new wide-screen, high-resolution receiver has to be purchased.
- One of the principal reasons that HDTV is not already common is that a general standard has not yet been agreed. The 26th CCIR plenary assembly recommended the adoption of a single, worldwide standard for high definition television.
- Unfortunately, Japan, Europe and North America are all investing significant time and money in their own systems, based on their own current conventional TV standards and other national considerations.

H261 - H263

- The H.261 algorithm was developed for the purpose of image transmission rather than image storage.
- It is designed to produce a constant output of p x 64 kbit/s, where p is an integer in the range 1 to 30.
  - This allows transmission over a digital network or data link of varying capacity.
  - It also allows transmission over a single 64 kbit/s digital telephone channel for low-quality video-telephony, or at higher bit rates for improved picture quality.

H261 - H263 (2)

- H.261 is widely used on 176 x 144 pixel images.
- The ability to select a range of output rates for the algorithm allows it to be used in different applications.
- Low output rates (p = 1 or 2) are only suitable for face-to-face (videophone) communication. H.261 is thus the standard used in many commercial videophone systems, such as the UK BT/Marconi Relate 2000 and the US AT&T 2500 products.
- Video-conferencing requires a greater output data rate (p > 6) and might go as high as 2 Mbit/s for high-quality transmission with larger image sizes.
- A further development of H.261 is H.263, for lower fixed transmission rates.
- H.263 deploys arithmetic coding in place of the variable-length coding (see the H261 diagram); with other modifications, the data rate is reduced to only 20 kbit/s.

H261 - H263 (3)

- The basic coding algorithm is similar to that of MPEG in that it is a hybrid of motion compensation, DCT and straightforward DPCM (intra-frame coding mode), but without the MPEG I, P, B frames.
- The DCT operation is performed on 8 x 8 blocks of error samples from the predicted luminance pixel values, with sub-sampled blocks of chrominance data.
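The p x 64 kbit/s structure can be written down directly. The helper names are mine, and the application classification merely encodes the slide's rules of thumb (p = 1 or 2 for videophone, p > 6 for video-conferencing), which are guidance rather than normative limits.

```python
# H.261 output rate is a multiple of the 64 kbit/s digital telephone channel:
# rate = p * 64 kbit/s, with p an integer from 1 to 30 (30 * 64 = 1920 kbit/s,
# i.e. close to the 2 Mbit/s upper figure quoted for conferencing).

def h261_rate_kbps(p):
    """Output bit rate in kbit/s for channel multiplier p."""
    if not 1 <= p <= 30:
        raise ValueError("p must be an integer in 1..30")
    return p * 64

def suggested_use(p):
    """Rule-of-thumb application class for a given p (per the slide)."""
    if p <= 2:
        return "videophone"
    if p > 6:
        return "video-conferencing"
    return "intermediate quality"

print(h261_rate_kbps(30), suggested_use(2), suggested_use(30))
```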

Model Based Coding (MBC)

- At the very low bit rates (20 kbit/s or less) associated with video telephony, the requirements for image transmission stretch the compression techniques described earlier to their limits.
- In order to achieve the necessary degree of compression, they often require a reduction in spatial resolution, or even the elimination of frames from the sequence.
- Model based coding (MBC) attempts to exploit a greater degree of redundancy in images than current techniques, in order to achieve significant image compression without adversely degrading the image content information.
- It relies upon the fact that image quality is largely subjective. Provided that the appearance of scenes within an observed image is kept at a visually acceptable level, it may not matter that the observed image is not a precise reproduction of reality.

Model Based Coding (2)

- One MBC method for producing an artificial image of a head sequence utilizes a feature codebook: a range of facial expressions, sufficient to create an animation, is generated from sub-images or templates which are joined together to form a complete face.
- The most important areas of a face for conveying an expression are the eyes and mouth; hence the objective is to create an image in which the movement of the eyes and mouth is a convincing approximation to the movements of the original subject.
- When forming the synthetic image, the feature template vectors which form the closest match to those of the original moving sequence are selected from the codebook, and then transmitted as low bit rate coded addresses.
- By using only 10 eye and 10 mouth templates, for instance, a total of 100 combinations exists, implying that only a 7-bit codebook address need be transmitted.
- It has been found that there are only 13 visually distinct mouth shapes for vowel and consonant formation during speech.
- However, the number of mouth sub-images is usually increased, to include intermediate expressions and hence avoid step changes in the image.
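The codebook-address arithmetic above can be made explicit. The helper name is mine; the numbers (10 x 10 templates, a 128-entry mouth codebook) come from the text.

```python
# Bits needed to address a codebook: ceil(log2(number of entries)).
import math

def address_bits(n_entries):
    """Bits needed to address a codebook with n_entries entries."""
    return math.ceil(math.log2(n_entries))

print(address_bits(10 * 10))   # 10 eye x 10 mouth templates
print(address_bits(128))       # a 128-entry mouth codebook
```

Note that 100 combinations need 7 bits (2^6 = 64 is not enough), which also matches the 7-bit, 128-entry mouth codebook used in the bit-rate calculation on the next slide.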

Model Based Coding (3)

- Another common way of representing objects in three-dimensional computer graphics is by a net of interconnecting polygons.
- A model is stored as a set of linked arrays which specify the coordinates of each polygon vertex, with the lines connecting the vertices together forming each side of a polygon.
- To make realistic models, the polygon net can be shaded to reflect the presence of light sources.
- The wire-frame model [Welch 1991] can be modified to fit the shape of a person's head and shoulders. The wire-frame, composed of over 100 interconnecting triangles, can produce subjectively acceptable synthetic images, provided that the frame is not rotated by more than 30 degrees from the full-face position.
- The model (see the Figure) uses smaller triangles in areas associated with high degrees of curvature, where significant movement is required.
- Large flat areas, such as the forehead, contain fewer triangles.
- A second wire-frame is used to model the mouth interior.

Model Based Coding (4)

- A synthetic image is created by texture mapping detail from an initial full-face source image over the wire-frame. Facial movement can be achieved by manipulation of the vertices of the wire-frame.
- Head rotation requires the use of simple matrix operations upon the coordinate array. Facial expression requires the manipulation of the features controlling the vertices.
- This model based feature codebook approach suffers from the drawback of codebook formation.
- This has to be done off-line and, consequently, the image is required to be pre-recorded, with a consequent delay.
- However, the actual image sequence can be sent at a very low data rate. For a codebook with 128 entries, where 7 bits are required to code each mouth, a 25 frame/s sequence requires less than 200 bit/s to code the mouth movements.
- When it is finally implemented, rates as low as 1 kbit/s are confidently expected from MBC systems, but they can only transmit image sequences which match the stored model, e.g. head and shoulders displays.
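The "simple matrix operations upon the coordinate array" used for head rotation can be sketched as a rotation of the wire-frame vertices about the vertical axis. This is a pure-Python illustration with names of my choosing; a real MBC renderer would also apply perspective projection and texture mapping over the rotated frame.

```python
# Rotate a list of (x, y, z) wire-frame vertices about the y (vertical) axis
# by applying a 3x3 rotation matrix to each coordinate triple.
import math

def rotate_y(vertices, degrees):
    """Return the vertices rotated by the given angle about the y axis."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [(c * x + s * z, y, -s * x + c * z) for (x, y, z) in vertices]

# A 30-degree turn: the slide's limit for subjectively acceptable images.
head = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(rotate_y(head, 30))
```

As a cross-check of the bit-rate claim above: 7 bits per mouth at 25 frame/s is 7 x 25 = 175 bit/s, consistent with the "less than 200 bit/s" figure.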

Key points:

- JPEG coding mechanism: DCT / zigzag scanning / adaptive quantization / VLC
- MPEG layered structure:
  - Pixel, Block, Macroblock, Field DCT Coding / Frame DCT Coding, Slice, Picture, Group of Pictures (GOP), Sequence, Packetized Elementary Stream (PES)
- MPEG compression mechanism:
  - Prediction
  - Motion compensation
  - Scanning
  - YCbCr formats (4:4:4, 4:2:0, etc.)
  - Profiles @ Levels
  - I, P, B pictures & reordering
  - Encoder/decoder process & block diagram
- MPEG data transport
- MPEG timing & buffer control:
  - STC/SCR/DTS
  - PCR/PTS

Technical terms

- Macroblocks
- HVS = Human Visual System
- GOP = Group of Pictures
- VLC = Variable Length Coding/Coder
- IDCT/DCT = (Inverse) Discrete Cosine Transform
- PES = Packetized Elementary Stream
- MP@ML = Main Profile @ Main Level
- PCR = Program Clock Reference
- SCR = System Clock Reference
- STC = System Time Clock
- PTS = Presentation Time Stamp
- DTS = Decode Time Stamp
- PAT = Program Association Table
- PMT = Program Map Table

Chapter 3. CATV systems

Overview:
- A brief history
- Modern CATV networks
- CATV systems and equipments

A Brief History:

- CATV appeared in the 60s in the US, where high buildings are great obstacles to the propagation of TV signals.
- Old CATV networks:
  - Coaxial only
  - Tree-and-branch only
  - TV only
  - No return path (high-pass filters are installed in customers' houses to block low-frequency return-path noise)

Modern CATV networks

- Key elements (from the network diagram): CO or master headend, headends/hubs, server complex, CMTS, TV content provider, optical nodes, taps, amplifiers (GNA/TNA/LE).

Modern CATV networks (2)

- Based on a Hybrid Fiber-Coaxial architecture, also referred to as HFC networks.
- The optical section is based on modern optical communication technologies:
  - Star / ring / mesh, etc. topologies
  - SDH/SONET for digital fibers
  - Various architectures: digital, analog or mixed fiber cabling systems
- Part of the forward path spectrum is used for high-speed Internet access.
- The return path is exploited for digital data communication (the root of new problems !!)
- Spectrum allocation (FDM):
  - 5-60 MHz band for upstream
  - 88-860 MHz band for downstream:
    - 88-450 MHz for analog/digital TV channels
    - 450-860 MHz for Internet access

CATV systems and equipments

Spectrum allocation of CATV networks
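The FDM spectrum plan above can be expressed as a simple lookup. The band edges are exactly the ones quoted on the slide (5-60 MHz upstream, 88-860 MHz downstream, split at 450 MHz); deployed HFC systems vary by region and operator, so treat this as an illustration of the slide's plan, not a universal rule.

```python
# Classify a carrier frequency according to the slide's HFC spectrum plan.

def band(freq_mhz):
    """Return the role of a frequency in the FDM spectrum allocation."""
    if 5 <= freq_mhz <= 60:
        return "upstream"
    if 88 <= freq_mhz < 450:
        return "downstream: analog/digital TV"
    if 450 <= freq_mhz <= 860:
        return "downstream: Internet access"
    return "guard band / out of plan"

print([band(f) for f in (30, 70, 200, 600)])
```

The gap between 60 and 88 MHz acts as a guard band between the noisy return path and the forward path, which is one reason the upstream is "the root of new problems" for data services.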

Vocabulary

- Perception = sự nhận thức
- Lap = phủ lên (to cover / overlap)
