Multimedia Technology

Introduction (2)

Overview
- Introduction
- Chapter 1: Background of compression techniques
- Chapter 2: Multimedia technologies
  - JPEG
  - MPEG-1/MPEG-2 Audio & Video
  - MPEG-4
  - MPEG-7 (brief introduction)
  - HDTV (brief introduction)
  - H.261/H.263 (brief introduction)
  - Model-based coding (MBC) (brief introduction)
- Chapter 3: Some real-world systems
  - CATV systems
  - DVB systems
- Chapter 4: Multimedia Network

4/2/2003
Nguyen Chan Hung, Hanoi University of Technology

Introduction
- The importance of multimedia technologies: multimedia is everywhere!
  - On PCs:
    - Real Player, QuickTime, Windows Media.
    - Music and video are freely available on the Internet (mp2, mp3, mp4, asf, mpeg, mov, ra, ram, mid, DivX, etc.).
    - Video/audio conferencing.
    - Webcast / streaming applications.
    - Distance learning (tele-education).
    - Tele-medicine.
    - Tele-anything (let's imagine!).
  - On TVs and other home electronic devices:
    - DVB-T/DVB-C/DVB-S (Digital Video Broadcasting: Terrestrial/Cable/Satellite) shows MPEG-2's superior quality over traditional analog TV.
    - Interactive TV: Internet applications (mail, web, e-commerce) on a TV, with no need to wait for a PC to start up and shut down.
    - CD/VCD/DVD/MP3 players.
  - Also appearing in handheld devices (3G mobile phones, wireless PDAs).

Multimedia network

- The Internet was designed in the 1960s for low-speed internetworks running simple textual applications: high delay, high jitter.
- Multimedia applications require drastic modifications of the Internet infrastructure.
- Many frameworks have been investigated and deployed to support the next-generation multimedia Internet (e.g. IntServ, DiffServ).
- In the future, all TVs (and PCs) will be connected to the Internet and freely tuned to any of millions of broadcast stations all over the world.
- At present, multimedia networks run over ATM (almost obsolete) and IPv4; in the future, IPv6 should help guarantee QoS (Quality of Service).

Chapter 1: Background of compression techniques

Why compression?
- For communication: reduce bandwidth in multimedia network applications such as streaming media, Video-on-Demand (VOD), and Internet telephony.
- For digital storage (VCD, DVD, tape, etc.): reduce size and cost, increase media capacity and quality.

Compression factor (compression ratio)
- The ratio between the size of the source data and the compressed data (e.g. 10:1).

Two types of compression:
- Lossless compression
- Lossy compression

Information content and redundancy
- Information rate:
  - Entropy is the measure of information content, expressed in bits per source output unit (such as bits/pixel).
  - The more information in the signal, the higher the entropy.
  - Lossy compression reduces entropy, while lossless compression does not.
- Redundancy:
  - The difference between the information rate and the bit rate.
  - Usually the information rate is much less than the bit rate.
  - Compression works by eliminating the redundancy.

Lossless Compression
- The data from the decoder is identical to the source data.
- Example: archives produced by utilities such as pkzip or gzip.
- The compression factor is around 2:1.
- Cannot guarantee a fixed compression ratio: the output data rate is variable, which causes problems for recording mechanisms or the communication channel.

Lossy Compression
- The data from the expander is not identical to the source data, but the difference cannot be distinguished auditorily or visually.
- Based on an understanding of psychoacoustic and psychovisual perception.
- Can be forced to operate at a fixed compression factor.
- Suitable for audio and video compression; the compression factor is much higher than that of lossless compression (up to 100:1).

Process of Compression
- Communication (reduce the cost of the data link):
  Data -> Compressor (coder) -> transmission channel -> Expander (decoder) -> Data'
- Recording (extend playing time, in proportion to the compression factor):
  Data -> Compressor (coder) -> storage device (tape, disk, RAM, etc.) -> Expander (decoder) -> Data'

4/2/2003
Nguyen Chan Hung, Hanoi University of Technology
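The entropy and redundancy figures above can be made concrete with a short sketch. This is a minimal Python illustration (the slides contain no code; the function name and the sample data are mine), computing the zeroth-order entropy of a pixel sequence and the redundancy left over relative to its raw bit rate.

```python
from collections import Counter
from math import log2

def entropy_bits_per_symbol(data):
    """Zeroth-order (memoryless) entropy of a symbol sequence, in bits/symbol."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# A nearly flat image row: one dominant value -> low entropy, high redundancy.
pixels = [200] * 60 + [201] * 3 + [199] * 1
h = entropy_bits_per_symbol(pixels)   # information rate, in bits/pixel
bit_rate = 8                          # the pixels are stored as 8-bit values
redundancy = bit_rate - h             # bits/pixel that lossless compression can remove
```

For this sequence the information rate is well under one bit per pixel, so almost all of the 8 bits per pixel is redundancy.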

Sampling and quantization
- Why sampling?
  - Computers cannot process analog signals directly.
- PCM:
  - Sample the analog signal at a constant rate and use a fixed number of bits (usually 8 or 16) to represent each sample.
  - bit rate = sampling rate * number of bits per sample
- Quantization:
  - Map the sampled analog signal (generally of infinite precision) to discrete levels (finite precision).
  - Represent each discrete level with a number.
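The PCM bit-rate formula above is simple enough to show as a one-line calculation. A minimal Python sketch (function name and examples are mine, not from the slides), using CD audio and G.711 telephony as well-known reference points:

```python
def pcm_bit_rate(sampling_rate_hz, bits_per_sample, channels=1):
    """bit rate = sampling rate * number of bits per sample (per channel)."""
    return sampling_rate_hz * bits_per_sample * channels

# CD audio: 44.1 kHz sampling, 16 bits/sample, stereo.
cd_rate = pcm_bit_rate(44_100, 16, channels=2)      # 1,411,200 bits/s

# Telephony (G.711): 8 kHz sampling, 8 bits/sample, mono.
phone_rate = pcm_bit_rate(8_000, 8)                 # 64,000 bits/s
```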

Predictive coding
- Use previous sample(s) to estimate the current sample.
- For most signals, the difference between the predicted and actual values is small, so we can use a smaller number of bits to code the difference while maintaining the same accuracy.
- Noise is completely unpredictable:
  - Most codecs require the data to be preprocessed; otherwise they may perform badly when the data contains noise.
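The previous-sample predictor described above can be sketched in a few lines. This is an illustrative Python example (names are mine; real codecs use more elaborate predictors), showing that the residuals are small even though the samples themselves are large:

```python
def predict_encode(samples):
    """Code each sample as its difference from the previous sample
    (a simple previous-sample predictor).  The residuals are typically
    small, so they need fewer bits than the raw samples."""
    out, prev = [], 0
    for s in samples:
        out.append(s - prev)   # prediction error (residual)
        prev = s
    return out

def predict_decode(residuals):
    """Invert the predictor: reconstruct each sample as prediction + error."""
    out, prev = [], 0
    for r in residuals:
        prev += r
        out.append(prev)
    return out

signal = [100, 102, 103, 103, 101, 99]
residuals = predict_encode(signal)   # [100, 2, 1, 0, -2, -2]
```

After the first sample, every residual fits in a couple of bits, while the raw samples would each need at least 7.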

Statistical coding: the Huffman code
- Assign short codes to the most probable data patterns and longer codes to the less frequent data patterns.
- Bit assignment is based on the statistics of the source data.
- The statistics of the data must be known prior to the bit assignment.

Drawbacks of compression

- Sensitive to data errors:
  - Compression eliminates the redundancy that is essential to making data resistant to errors.
  - An error correction code is therefore required, which adds redundancy back to the compressed data.
  - Concealment is required for real-time applications.
- Artifacts:
  - Artifacts appear when the coder eliminates part of the entropy.
  - The higher the compression factor, the more artifacts.

A coding example: clustering color pixels
- In an image, pixel values are clustered in several peaks, each cluster representing the color range of one object in the image (e.g. blue sky).
- Coding process:
  1. Separate the pixel values into a limited number of data clusters (e.g. clustered pixels of sky blue or grass green).
  2. Send the average color of each cluster and an identifying number for each cluster as side information.
  3. Transmit, for each pixel:
     - The number of the average cluster color that it is closest to.
     - Its difference from that average cluster color (this difference can itself be coded to reduce redundancy, since the differences are often similar: prediction).

Frame-Differential Coding
- Frame-Differential Coding (FDC) = prediction from a previous video frame.
- A video frame is stored in the encoder for comparison with the present frame, which causes an encoding latency of one frame time.
- For still images:
  - Data can be sent only for the first instance of a frame; all subsequent prediction error values are zero.
  - The frame is retransmitted occasionally to allow receivers that have just been turned on to have a starting point.
- FDC reduces the information for still images, but leaves significant data for moving images (e.g. a movement of the camera).

Unpredictable Information
- Information that cannot be predicted from the previous frame:
  1. Scene change (e.g. the background landscape changes).
  2. Newly uncovered information due to object motion across a background, or at the edges of a panned scene (e.g. a soccer player's face uncovered by a flying ball).

Motion Compensated Prediction
- More data can be eliminated than with Frame-Differential Coding by comparing the present pixel to the location of the same object in the previous frame (not to the same spatial location in the previous frame).
- The encoder estimates the motion in the image to find the corresponding area in a previous frame: it searches for a portion of a previous frame which is similar to the part of the new frame to be transmitted.
- It then sends (as side information) a motion vector telling the decoder what portion of the previous frame it will use to predict the new frame.
- It also sends the prediction error so that the exact new frame may be reconstituted.
- (Figures: top, prediction without motion compensation; bottom, with motion compensation.)
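The motion search described above can be sketched as an exhaustive block-matching loop. This is a toy Python illustration (all names and the tiny frames are mine; real encoders use much faster search strategies), in which a bright 2x2 object moves one pixel to the right between frames:

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-size blocks."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b)
                          for a, b in zip(ra, rb))

def block(frame, top, left, size):
    return [row[left:left + size] for row in frame[top:top + size]]

def motion_search(prev_frame, cur_frame, top, left, size, radius):
    """Exhaustive search: find the motion vector (dy, dx) whose block in the
    previous frame best predicts the current block, and return it together
    with its SAD (the prediction error the encoder would still send)."""
    target = block(cur_frame, top, left, size)
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ty, tx = top + dy, left + dx
            if 0 <= ty and 0 <= tx and ty + size <= len(prev_frame) \
               and tx + size <= len(prev_frame[0]):
                cost = sad(block(prev_frame, ty, tx, size), target)
                if best is None or cost < best[0]:
                    best = (cost, (dy, dx))
    return best[1], best[0]

prev = [[0] * 6 for _ in range(6)]
cur  = [[0] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (2, 3):
        prev[y][x] = 9        # object in the previous frame
    for x in (3, 4):
        cur[y][x] = 9         # same object, shifted one pixel right
mv, err = motion_search(prev, cur, top=2, left=3, size=2, radius=1)
```

The search points one pixel to the left in the previous frame (the object moved right), and the prediction error drops to zero, so only the motion vector needs to be sent.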

Dealing with unpredictable information
- Scene change:
  - An intra-coded picture (an MPEG I picture) must be sent as a starting point; it requires more data than a predicted picture (P picture).
  - I pictures are sent about twice per second; their timing and sending frequency may be adjusted to accommodate scene changes.
- Uncovered information:
  - Handled by the bi-directionally coded type of picture, the B picture.
  - There must be enough frame storage in the system to wait for the later picture that has the desired information.
  - To limit the amount of decoder memory, the encoder stores pictures and sends the required reference pictures before sending the B picture.

Transform Coding
- Types of picture transform coding:
  - Discrete Fourier (DFT)
  - Karhunen-Loeve
  - Walsh-Hadamard
  - Lapped orthogonal
  - Discrete Cosine (DCT), used in MPEG-2
  - Wavelets (new)
- The differences between transform coding methods:
  - The degree of concentration of energy in a few coefficients.
  - The region of influence of each coefficient in the reconstructed picture.
  - The appearance and visibility of coding noise due to coarse quantization of the coefficients.

DCT Lossy Coding
- Convert spatial image pixel values to transform coefficient values; the number of coefficients produced is equal to the number of pixels transformed.
- The transform process concentrates the energy into particular coefficients (generally the low-frequency coefficients).
- Few coefficients contain most of the energy in a picture, so the coefficients may be further coded by lossless entropy coding.

Types of picture coding
- Lossless coding cannot obtain a high compression ratio (4:1 or less).
- Lossy coding = discarding selected information so that the reproduction is visually or aurally indistinguishable from the source, or has the fewest artifacts.
- Lossy coding can be achieved by:
  - Eliminating some DCT coefficients.
  - Adjusting the quantizing coarseness of the coefficients (the better option).

Masking
- Masking makes certain types of coding noise invisible or inaudible due to psycho-visual or psycho-acoustical effects:
  - In audio, a pure tone masks energy at higher frequencies and also at lower frequencies (with a weaker effect).
  - In video, high-contrast edges mask random noise.
- Noise introduced at low bit rates should fall in the frequency, spatial, or temporal regions where it is masked.

Run-Level coding
- "Run-Level" coding = coding a run-length of zeros followed by a nonzero level.
- Instead of sending all the zero values individually, the length of the run is sent.
- Useful for any data with long runs of zeros.
- Run lengths are easily encoded with a Huffman code.

Variable quantization
- Variable quantization is the main technique of lossy coding; it greatly reduces the bit rate.
- It coarsely quantizes the less significant coefficients in a transform (those that are less noticeable: low energy, less visible or audible).
- It can be applied to a complete signal or to individual frequency components of a transformed signal.
- Variable quantization also controls the instantaneous bit rate in order to:
  - Match the average bit rate to a constant channel bit rate.
  - Prevent buffer overflow or underflow.

Key points
- Compression process:
  - Quantization & sampling
  - Coding:
    - Lossless & lossy coding
    - Frame-Differential Coding
    - Motion Compensated Prediction
    - Variable quantization
    - Run-Level coding
  - Masking
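Run-Level coding as described above is easy to sketch. A minimal Python example (function name and sample data are mine; real MPEG tables map the pairs onto variable-length codes), encoding a coefficient sequence as (zero-run, level) pairs with a trailing End-Of-Block marker:

```python
def run_level_encode(coeffs):
    """Encode quantized coefficients as (zero_run, level) pairs, ending
    with an 'EOB' marker once only zeros remain."""
    last_nonzero = -1
    for i, c in enumerate(coeffs):
        if c != 0:
            last_nonzero = i
    pairs, run = [], 0
    for c in coeffs[:last_nonzero + 1]:
        if c == 0:
            run += 1          # extend the current run of zeros
        else:
            pairs.append((run, c))
            run = 0
    pairs.append("EOB")       # everything after the last nonzero level
    return pairs

seq = [78, -1, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0]
encoded = run_level_encode(seq)   # [(0, 78), (0, -1), (0, 1), (3, 2), 'EOB']
```

The eight trailing zeros vanish into the single EOB symbol, which is where most of the gain comes from.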

Chapter 2: Multimedia technologies

Roadmap
- JPEG
- MPEG-1/MPEG-2 Video
- MPEG-1 Layer 3 Audio (mp3)
- MPEG-4
- MPEG-7 (brief introduction)
- HDTV (brief introduction)
- H.261/H.263 (brief introduction)
- Model-based coding (MBC) (brief introduction)

JPEG (Joint Photographic Experts Group)
- JPEG encoder:
  - Partitions the image into blocks of 8 x 8 pixels.
  - Calculates the Discrete Cosine Transform (DCT) of each block.
  - A quantizer rounds off the DCT coefficients according to the quantization matrix: lossy, but allows for large compression ratios.
  - Produces a series of DCT coefficients using zig-zag scanning.
  - Uses a variable-length code (VLC) on these DCT coefficients.
  - Writes the compressed data stream to an output file (*.jpg or *.jpeg).
- JPEG decoder:
  - File input data stream -> variable-length decoder -> IDCT (inverse DCT) -> image.

JPEG - DCT
- The DCT is similar to the Discrete Fourier Transform: it transforms a signal or image from the spatial domain to the frequency domain.
- The DCT requires fewer multiplications than the DFT.
- Input image A:
  - The input image A is N2 pixels wide by N1 pixels high.
  - A(i,j) is the intensity of the pixel in row i and column j.
- Output image B:
  - B(k1,k2) is the DCT coefficient in row k1 and column k2 of the DCT matrix.
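The 2-D DCT that JPEG applies to each 8 x 8 block can be written directly from its definition. This is a pure-Python sketch for clarity (names are mine; it is O(N^4), whereas real codecs use fast factorized algorithms), verified on a flat block, whose energy should all land in the DC coefficient B(0,0):

```python
from math import cos, pi, sqrt

def dct2(block):
    """2-D DCT-II of an N x N block, straight from the textbook formula:
    F(u,v) = (2/N) C(u) C(v) sum f(x,y) cos((2x+1)u*pi/2N) cos((2y+1)v*pi/2N)."""
    n = len(block)
    def c(k):
        return 1 / sqrt(2) if k == 0 else 1.0
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[x][y]
                    * cos((2 * x + 1) * u * pi / (2 * n))
                    * cos((2 * y + 1) * v * pi / (2 * n))
                    for x in range(n) for y in range(n))
            out[u][v] = (2 / n) * c(u) * c(v) * s
    return out

# A flat (constant) 8x8 block: all energy concentrates in the DC coefficient.
flat = [[128] * 8 for _ in range(8)]
coeffs = dct2(flat)
```

For the flat block, B(0,0) = 1024 and every AC coefficient is (numerically) zero, illustrating the energy concentration the slides describe.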

JPEG - Quantization Matrix
- The quantization matrix is an 8-by-8 matrix of step sizes (sometimes called quantums): one element for each DCT coefficient.
- It is usually symmetric.
- Step sizes are:
  - Small in the upper left (low frequencies),
  - Large toward the lower right (high frequencies).
  - A step size of 1 is the most precise.
- The quantizer divides each DCT coefficient by its corresponding quantum, then rounds to the nearest integer.
- Large quantums drive small coefficients down to zero.
- The result:
  - Many high-frequency coefficients become zero and are easily removed.
  - The low-frequency coefficients undergo only minor adjustment.
  - The output is easily coded by run-length Huffman coding.

JPEG Coding process illustrated
- (Figure: an example 8x8 DCT coefficient matrix, with DC value 1255, and its quantization result.)
- Zigzag scan result: 78 -1 1 -4 -5 4 4 6 3 2 -1 -3 -5 -4 -1 0 -1 0 1 1 0 0 0 ... 0 EOB

MPEG (Moving Picture Expert Group)
- MPEG is the heart of:
  - Digital television set-top boxes
  - HDTV decoders
  - DVD players
  - Video conferencing
  - Internet video, etc.
- MPEG standards: MPEG-1, MPEG-2, MPEG-4, MPEG-7.
  (The MPEG-3 standard was abandoned and became an extension of MPEG-2.)

MPEG standards
- MPEG-1 (obsolete):
  - A standard for storage and retrieval of moving pictures and audio on storage media.
  - Application: VCD (video compact disc).
- MPEG-2 (widely implemented):
  - A standard for digital television.
  - Applications: DVD (digital versatile disc), HDTV (high-definition TV), DVB (European Digital Video Broadcasting Group), etc.
- MPEG-4 (newly implemented, still being researched):
  - A standard for multimedia applications.
  - Applications: Internet, cable TV, virtual studio, etc.
- MPEG-7 (future work, ongoing research):
  - A content representation standard for information search ("Multimedia Content Description Interface").
  - Applications: Internet, video search engines, digital libraries.
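The quantization step described above (divide by the step size, round to the nearest integer) can be shown on a toy matrix. A minimal Python sketch (names and the 2x2 example are mine; the DC value 1255 and step size 16 are chosen so the result matches the 78 that opens the zigzag sequence in the illustration):

```python
def quantize(dct_block, q_matrix):
    """Divide each DCT coefficient by its quantization step size and round
    to the nearest integer; large steps drive small coefficients to zero."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(dct_block, q_matrix)]

# Toy 2x2 corner of a block: a fine step for the DC term,
# coarse steps for the higher-frequency terms.
dct   = [[1255, -15],
         [  58,  -6]]
steps = [[16, 40],
         [40, 80]]
q = quantize(dct, steps)   # [[78, 0], [1, 0]]
```

The large DC coefficient survives almost intact (1255/16 rounds to 78), while the small high-frequency coefficients are driven to zero, exactly the behavior the slides describe.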

MPEG-2 formal standards
- The international standard ISO/IEC 13818-2, "Generic Coding of Moving Pictures and Associated Audio Information".
- ATSC (Advanced Television Systems Committee) document A/54, "Guide to the Use of the ATSC Digital Television Standard".

Pixel & Block
- Pixel = "picture element":
  - A discrete spatial point sample of an image.
  - A color pixel may be represented digitally as a number of bits for each of three primary color values.
- Block:
  - An 8 x 8 array of pixels.
  - The block is the fundamental unit for DCT (discrete cosine transform) coding.

MPEG video data structure
- The MPEG-2 video data stream is constructed in layers, from lowest to highest:
  - PIXEL is the fundamental unit.
  - BLOCK is an 8 x 8 array of pixels.
  - MACROBLOCK consists of 4 luma blocks and 2 chroma blocks (field DCT coding and frame DCT coding).
  - SLICE consists of a variable number of macroblocks.
  - PICTURE consists of a frame (or field) of slices.
  - GROUP OF PICTURES (GOP) consists of a variable number of pictures.
  - SEQUENCE consists of a variable number of GOPs.
  - PACKETIZED ELEMENTARY STREAM (optional).

Macroblock
- A macroblock = a 16 x 16 array of luma (Y) pixels (= 4 blocks, in a 2 x 2 block array).
- The number of chroma pixels (Cr, Cb) varies depending on the chroma pixel structure indicated in the sequence header (e.g. 4:2:0).
- The macroblock is the fundamental unit for motion compensation and will have motion vector(s) associated with it if it is predictively coded.
- A macroblock is classified as field coded or frame coded (an interlaced frame consists of 2 fields), depending on how the four blocks are extracted from the macroblock.

Slice
- Pictures are divided into slices.
- A slice consists of an arbitrary number of successive macroblocks (going left to right), but is typically an entire row of macroblocks.
- A slice does not extend beyond one row.
- The slice header carries address information that allows the Huffman decoder to resynchronize at slice boundaries.

I, P, B Pictures
- Encoded pictures are classified into 3 types: I, P, and B.
- I Pictures = Intra-Coded Pictures:
  - All macroblocks are coded without prediction.
  - Needed to give the receiver a "starting point" for prediction after a channel change and to recover from errors.
- P Pictures = Predicted Pictures:
  - Forward motion compensation.
  - Macroblocks may be coded with forward prediction from references made from previous I and P pictures, or may be intra coded.
- B Pictures = Bi-directionally predicted pictures:
  - Bidirectional motion compensation.
  - Macroblocks may be coded with forward prediction from previous I or P references.
  - Macroblocks may be coded with backward prediction from the next I or P reference.
  - Macroblocks may be coded with interpolated prediction from past and future I or P references.
  - Macroblocks may be intra coded (no prediction).

Picture
- A source picture is a contiguous rectangular array of pixels.
- A picture may be a complete frame of video ("frame picture") or one of the interlaced fields from an interlaced source ("field picture").
- A field picture does not have any blank lines between its active lines of pixels.
- A frame picture consists of:
  - a frame of a progressive source, or
  - a frame (2 spatially interlaced fields) of an interlaced source.
- A coded picture (also called a video access unit) begins with a start code and a header. The header consists of:
  - picture type (I, B, P)
  - temporal reference information
  - motion vector search range
  - optional user data

Group of pictures (GOP)
- The group of pictures layer is optional in MPEG-2.
- A GOP begins with a start code and a header. The header carries:
  - time code information
  - editing information
  - optional user data
- The first encoded picture in a GOP is always an I picture.
- Typical length is 15 pictures with the following structure (in display order):
  I B B P B B P B B P B B P B B
- This provides an I picture with sufficient frequency to allow a decoder to decode correctly.

Sequence
- A sequence begins with a unique 32-bit start code followed by a header.
- The header carries:
  - picture size
  - aspect ratio
  - frame rate and bit rate
  - optional quantizer matrices
  - required decoder buffer size
  - chroma pixel structure
  - optional user data
- The sequence information is needed for channel changing.
- The sequence length depends on the acceptable channel-change delay.

Packetized Elementary Stream (PES)
- A Video Elementary Stream (video ES) consists of all the video data for a sequence, including the sequence header and all the subparts of a sequence.
- An ES carries only one type of data (video or audio) from a single video or audio encoder.
- A PES consists of a single ES which has been split into packets, each starting with an added packet header.
- A PES stream contains only one type of data from one source, e.g. from one video or audio encoder.
- PES packets have variable length, not corresponding to the fixed packet length of transport packets, and may be much longer than a transport packet.
- Each PES packet header includes:
  - An 8-bit stream ID identifying the source of the payload.
  - Timing references: PTS (presentation time stamp), the time at which a decoded audio or video access unit is to be presented by the decoder.
  - DTS (decoding time stamp), the time at which an access unit is decoded by the decoder.
  - ESCR (elementary stream clock reference).

Transport stream
- Transport packets (fixed length) are formed from a PES stream:
  - The PES header follows the transport packet header.
  - Successive transport packet payloads are filled with the remaining PES packet content until the PES packet is all used.
  - The final transport packet is filled to a fixed length by stuffing with 0xFF bytes (all ones).
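The PES-to-transport packetization above can be sketched as a splitting loop. This is a simplified Python illustration (names are mine; a real TS header carries more fields such as flags and a continuity counter, which are omitted here), keeping only the parts the slides describe: a fixed 188-byte packet, a 0x47 sync byte, a 13-bit PID, and 0xFF stuffing in the final packet:

```python
TS_PACKET_SIZE = 188  # MPEG-2 transport packets are always 188 bytes

def packetize(pes_packet, pid, header_len=4):
    """Split one PES packet into fixed-length transport packets.  Each
    packet starts with the sync byte 0x47 and a header carrying the
    13-bit PID; the last packet's payload is padded with 0xFF stuffing."""
    payload_size = TS_PACKET_SIZE - header_len
    packets = []
    for i in range(0, len(pes_packet), payload_size):
        chunk = pes_packet[i:i + payload_size]
        chunk = chunk + b"\xff" * (payload_size - len(chunk))  # stuffing
        # Simplified 4-byte header: sync, PID high 5 bits, PID low 8 bits, misc.
        header = bytes([0x47, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
        packets.append(header + chunk)
    return packets

pes = bytes(range(256)) * 2          # a 512-byte PES packet
ts = packetize(pes, pid=0x1FF1)      # 3 transport packets, last one stuffed
```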

Intra Frame Coding
- Intra coding is concerned only with information within the current frame (not relative to any other frame in the video sequence).
- It is similar to JPEG (recall the JPEG coding mechanism).
- Basic blocks of the intra-frame coder (see the block diagram):
  - Video filter
  - Discrete cosine transform (DCT)
  - DCT coefficient quantizer
  - Run-length amplitude / variable-length coder (VLC)

Video Filter
- The Human Visual System (HVS) is:
  - Most sensitive to changes in luminance,
  - Less sensitive to variations in chrominance.
- MPEG therefore uses the YCbCr color space to represent the data values instead of RGB, where:
  - Y is the luminance signal,
  - Cb is the blue color difference signal,
  - Cr is the red color difference signal.
- What is a 4:4:4, 4:2:0, etc. video format?
  - 4:4:4 is full-bandwidth YCbCr video: each macroblock consists of 4 Y blocks, 4 Cb blocks, and 4 Cr blocks (a waste of bandwidth).
  - 4:2:0 is most commonly used in MPEG-2. (Figure: the 4:2:0 chroma sampling format.)

Applications of chroma formats

chroma_format     Multiplex order (time) within macroblock   Application
4:2:0 (6 blocks)   YYYYCbCr                                  Mainstream television, consumer entertainment
4:2:2 (8 blocks)   YYYYCbCrCbCr                              Studio production environments, professional editing equipment
4:4:4 (12 blocks)  YYYYCbCrCbCrCbCrCbCr                      Computer graphics

MPEG Profiles & levels
- MPEG-2 is classified into several profiles. Main Profile features:
  - I, P, and B pictures
  - Non-scalable
  - 4:2:0 chroma sampling format
- Main Profile is subdivided into levels:
  - MP@ML (Main Profile at Main Level):
    - Designed around the CCIR 601 standard for interlaced standard digital video.
    - 720 x 576 (PAL) or 720 x 483 (NTSC).
    - 30 Hz progressive, 60 Hz interlaced.
    - Maximum bit rate is 15 Mbit/s.
  - MP@HL (Main Profile at High Level), upper bounds:
    - 1152 x 1920, 60 Hz progressive.
    - 80 Mbit/s.

MPEG encoder/decoder (block diagram figure)
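The block counts in the chroma-format table follow from a small piece of arithmetic: every macroblock has 4 luma blocks, and the chroma format determines how many chroma blocks join them. A tiny Python sketch of that bookkeeping (names are mine):

```python
# Chroma (Cb + Cr) blocks per macroblock for each sampling format.
CHROMA_BLOCKS = {"4:2:0": 2, "4:2:2": 4, "4:4:4": 8}

def blocks_per_macroblock(chroma_format):
    """A macroblock always has 4 luma (Y) blocks, plus the chroma blocks
    implied by the sampling format."""
    return 4 + CHROMA_BLOCKS[chroma_format]
```

This reproduces the 6, 8, and 12 blocks listed in the table for 4:2:0, 4:2:2, and 4:4:4 respectively.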

Prediction
- The encoder can decide to use:
  - forward prediction from a previous picture,
  - backward prediction from a following picture,
  - or interpolated prediction,
  so as to minimize the prediction error.
- Backward prediction is done by storing pictures until the desired anchor picture is available before encoding the currently stored frames.
- The encoder must transmit pictures in an order different from that of the source pictures, so that the decoder has the anchor pictures before decoding predicted pictures. (See the next slide.)
- The decoder must have two frames stored.

DCT and IDCT formulas
- DCT: Eq 1, normal form; Eq 2, matrix form.
- IDCT: Eq 3, normal form; Eq 4, matrix form.
- In the standard normal form, for an N x N block:
  F(u,v) = (2/N) C(u) C(v) * sum over x,y of f(x,y) cos[(2x+1)u*pi/(2N)] cos[(2y+1)v*pi/(2N)]
- Where:
  - F(u,v) = the two-dimensional N x N DCT.
  - u, v, x, y = 0, 1, 2, ..., N-1.
  - x, y are spatial coordinates in the sample domain.
  - u, v are frequency coordinates in the transform domain.
  - C(u), C(v) = 1/sqrt(2) for u, v = 0.
  - C(u), C(v) = 1 otherwise.

I P B Picture Reordering
- Pictures are coded and decoded in a different order than they are displayed, due to the bidirectional prediction for B pictures.
- For example, with a 12-picture GOP:
  - Source order and encoder input order:
    I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)
  - Encoding order and order in the coded bitstream:
    I(1) P(4) B(2) B(3) P(7) B(5) B(6) P(10) B(8) B(9) I(13) B(11) B(12)
  - Decoder output order and display order (same as the input):
    I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)

DCT versus DFT
- The DCT is conceptually similar to the DFT, except:
  - The DCT concentrates energy into lower-order coefficients better than the DFT.
  - The DCT is purely real; the DFT is complex (magnitude and phase).
  - A DCT operation on a block of pixels produces coefficients that are similar to the frequency-domain coefficients produced by a DFT operation.
  - An N-point DCT has the same frequency resolution as a 2N-point DFT: the N frequencies of a 2N-point DFT correspond to N points on the upper half of the unit circle in the complex frequency plane.
  - Assuming a periodic input, the magnitude of the DFT coefficients is spatially invariant (the phase of the input does not matter); this is not true for the DCT.
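The display-to-bitstream reordering shown above follows a simple rule: each anchor (I or P) picture jumps ahead of the B pictures that precede it in display order. A minimal Python sketch of that rule (names are mine), reproducing the 13-picture example from the slide:

```python
def coding_order(display_order):
    """Reorder a GOP from display order to coding/bitstream order: every
    anchor (I or P) picture is moved ahead of the B pictures that precede
    it in display order, so the decoder has both anchors before the Bs."""
    out, pending_b = [], []
    for pic in display_order:
        if pic[0] in "IP":          # anchor picture
            out.append(pic)
            out.extend(pending_b)   # the Bs that were waiting for it
            pending_b = []
        else:                       # B picture: wait for its later anchor
            pending_b.append(pic)
    return out + pending_b

display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7", "B8", "B9",
           "P10", "B11", "B12", "I13"]
coded = coding_order(display)
# ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6', 'P10', 'B8', 'B9',
#  'I13', 'B11', 'B12']
```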

Quantization matrix
- Note that the quantization matrix entries (step sizes) are:
  - Small in the upper left (low frequencies),
  - Large toward the lower right (high frequencies).
  - Recall the JPEG mechanism.
- The HVS is less sensitive to errors in high-frequency coefficients than in lower-frequency ones, so the higher frequencies should be more coarsely quantized.

MPEG scanning
- Left figure: zigzag scanning (as in JPEG).
- Right figure: alternate scanning, which is better for interlaced frames.

Result DCT matrix (example)
- After adaptive quantization, the result is a matrix containing many zeros.

Huffman/Run-Level Coding
- Huffman coding, in combination with Run-Level coding and zig-zag scanning, is applied to the quantized DCT coefficients.
- "Run-Level" = a run-length of zeros followed by a non-zero level.
- Huffman coding is also applied to various types of side information.
- A Huffman code is an entropy code that optimally achieves the shortest possible average code word length for a source.
- This average code word length is >= the entropy of the source.
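The Huffman construction itself can be sketched in a few lines. This is an illustrative Python version (names and sample data are mine; MPEG uses fixed, pre-computed code tables rather than building codes on the fly), using the classic merge-the-two-least-probable-subtrees algorithm:

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code from symbol statistics: repeatedly merge the
    two least probable subtrees, so frequent symbols get short codes."""
    counts = Counter(symbols)
    # Heap entries: [weight, tie-break index, {symbol: partial codeword}].
    heap = [[w, i, {s: ""}] for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        for s in lo[2]:
            lo[2][s] = "0" + lo[2][s]   # left branch
        for s in hi[2]:
            hi[2][s] = "1" + hi[2][s]   # right branch
        heapq.heappush(heap, [lo[0] + hi[0], lo[1], {**lo[2], **hi[2]}])
    return heap[0][2]

data = "aaaaaaabbbccd"   # 'a' is most probable, 'd' least probable
code = huffman_code(data)
```

As the slides state, the most probable symbol ends up with the shortest codeword and the least probable with the longest.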

Huffman/Run-Level coding illustrated
- Using the DCT output matrix from the previous slide, after zigzag scanning the output is the sequence: 4, 4, 2, 2, 2, 1, 1, 1, 1, 0 (12 zeros), 1, 0 (41 zeros).
- These values are looked up in a fixed table of variable-length codes:
  - The most probable occurrence is given a relatively short code.
  - The least probable occurrence is given a relatively long code.
- (Table: zero run-length, amplitude, and MPEG code value for each entry; the final EOB indicator is coded as "10".)
- The first run of 12 zeroes is efficiently coded in only 9 bits.
- The last run of 41 zeroes is entirely eliminated, represented only by the 2-bit End Of Block (EOB) indicator.
- The quantized DCT coefficients are now represented by a sequence of 61 binary bits (see the table).
- Considering that the original 8x8 block of 8-bit pixels required 512 bits for full representation, the compression ratio is approximately 8.4:1.

MPEG Data Transport
- MPEG packages all data into fixed-size 188-byte packets for transport.
- Video or audio payload data, previously placed in PES packets, is broken up into fixed-length transport packet payloads.
- A PES packet may be much longer than a transport packet, so segmentation is required:
  - The PES header is placed immediately following a transport header.
  - Successive portions of the PES packet are then placed in the payloads of transport packets.
  - Remaining space in the final transport packet payload is filled with stuffing bytes = 0xFF (all ones).

MPEG Transport packet
- Each transport packet starts with a sync byte = 0x47.
- In the ATSC US terrestrial DTV VSB transmission system, the sync byte is not processed but is replaced by a different sync symbol especially suited to RF transmission.
- The transport packet header contains a 13-bit PID (packet ID), which corresponds to a particular elementary stream of video, audio, or another program element.
- PID 0x0000 is reserved for transport packets carrying a program association table (PAT).
- The PAT points to a Program Map Table (PMT), which in turn points to the particular elements of a program.

Adaptation Field
- 8 bits specifying the length of the adaptation field.
- The first group of flags consists of eight 1-bit flags:
  - discontinuity_indicator
  - random_access_indicator
  - elementary_stream_priority_indicator
  - PCR_flag
  - OPCR_flag
  - splicing_point_flag
  - transport_private_data_flag
  - adaptation_field_extension_flag
- The optional fields are present if indicated by one of the preceding flags.
- The remainder of the adaptation field is filled with stuffing bytes (0xFF, all ones).
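Pulling the PID out of a transport packet header is a small bit-manipulation exercise. A minimal Python sketch (function name and the sample packets are mine), using the layout described above: the sync byte 0x47 first, then a 13-bit PID spanning the low 5 bits of the next byte and all 8 bits of the one after:

```python
def parse_ts_pid(packet):
    """Validate the sync byte and extract the 13-bit PID from a
    188-byte MPEG-2 transport packet."""
    if len(packet) != 188 or packet[0] != 0x47:
        raise ValueError("not a valid 188-byte transport packet")
    return ((packet[1] & 0x1F) << 8) | packet[2]

# Hypothetical packets: a PAT packet (PID 0x0000) and a video packet.
pat_packet   = bytes([0x47, 0x00, 0x00, 0x10]) + b"\xff" * 184
video_packet = bytes([0x47, 0x10, 0x31, 0x10]) + b"\xff" * 184
```

A demultiplexer applies exactly this extraction to every packet, routing PID 0x0000 to the PAT parser and the PIDs listed in a PMT to the audio/video decoders.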

Demultiplexing a Transport Stream (TS)

- Demultiplexing a transport stream involves:
  1. Finding the PAT by selecting packets with PID = 0x0000
  2. Reading the PIDs of the PMTs from the PAT
  3. Reading the PIDs for the elements of a desired program from its PMT (for example, a basic program will have a PID for audio and a PID for video)
  4. Detecting packets with the desired PIDs and routing them to the decoders
- An MPEG-2 transport stream can carry:
  - Video streams
  - Audio streams
  - Any type of data
- MPEG-2 TS is also the packet format for CATV downstream data communication.

Timing - Synchronization

- The decoder is synchronized with the encoder by time stamps.
- The encoder contains a master oscillator and counter, called the System Time Clock (STC). (See previous block diagram.)
- The STC belongs to a particular program and is the master clock of the video and audio encoders for that program.
- Multiple programs, each with its own STC, can also be multiplexed into a single stream.
- At the encoder input (Point A), the time of occurrence of an input video picture or audio block is noted by sampling the STC.
- The total (constant) delay of the encoder and decoder buffers is added to the STC value, creating a Presentation Time Stamp (PTS); the PTS is then inserted in the first of the packet(s) representing that picture or audio block, at Point B.
- A program component can even have no time stamps, but then it cannot be synchronized with other components.

Timing & buffer control

- Rates at each point of the chain (from the block diagram):
  - Point A (encoder input): constant/specified rate
  - Point B (encoder output): variable rate
  - Point C (encoder buffer output): constant rate
  - Point D (communication channel + decoder buffer): constant rate
  - Point E (decoder input): variable rate
  - Point F (decoder output): constant/specified rate
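The four demultiplexing steps above can be sketched as a small routing function. The shape of the inputs is assumed for illustration: packets are already parsed into (PID, payload) tuples, and the PAT and PMTs are given as plain dicts, whereas real PSI tables arrive as sectioned, CRC-protected payloads inside the stream itself.

```python
# Sketch of TS demultiplexing: PAT -> PMT PID -> elementary-stream PIDs ->
# route matching packets to the corresponding decoder queues.

def demux(packets, pat, pmts, wanted_program):
    """Route packets of one program's elementary streams to 'decoders'."""
    # Steps 1-2: the PAT (carried on PID 0x0000) maps program -> PMT PID.
    pmt_pid = pat[wanted_program]
    # Step 3: the program's PMT maps stream name -> elementary-stream PID.
    es_pids = pmts[pmt_pid]            # e.g. {"video": 0x100, "audio": 0x101}
    routed = {name: [] for name in es_pids}
    # Step 4: detect packets with the desired PIDs and route them.
    for pid, payload in packets:
        for name, es_pid in es_pids.items():
            if pid == es_pid:
                routed[name].append(payload)
    return routed

pat = {1: 0x20}                        # program 1 -> PMT on PID 0x20
pmts = {0x20: {"video": 0x100, "audio": 0x101}}
packets = [(0x100, b"v0"), (0x101, b"a0"), (0x300, b"x"), (0x100, b"v1")]
print(demux(packets, pat, pmts, wanted_program=1))
```

Packets on PIDs not listed in the chosen PMT (PID 0x300 here) are simply ignored, which is exactly how a set-top box extracts one program from a multi-program multiplex.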

Timing Synchronization (2)

- A Decode Time Stamp (DTS) can optionally be coded into the bit stream; it represents the time at which the data should be taken instantaneously from the decoder buffer and decoded.
  - DTS and PTS are identical except in the case of picture reordering for B pictures.
  - The DTS is only used where it is needed because of reordering.
  - Whenever DTS is used, PTS is also coded.
  - PTS (or DTS) insertion interval: at most 700 ms.
  - In ATSC, PTS (or DTS) must be inserted at the beginning of each coded picture (access unit).
- In addition, the output of the encoder buffer (Point C) is time stamped with System Time Clock (STC) values, called:
  - System Clock Reference (SCR) in a Program Stream.
  - Program Clock Reference (PCR) in a Transport Stream.
- PCR time stamp interval: at most 100 ms; SCR time stamp interval: at most 700 ms.
- The PCR and/or the SCR are used to synchronize the decoder STC with the encoder STC.
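Why DTS and PTS differ only for reordered pictures can be made concrete with a toy example. Time is counted in frame periods, and the one-period presentation offset (a modelling choice, not a normative rule) stands in for the reordering delay that B pictures introduce.

```python
# Illustration of DTS/PTS assignment under B-picture reordering:
# DTS follows the bit-stream (decode) order, PTS follows the display order.

def stamp(decode_order, display_order):
    """Assign (DTS, PTS) per picture, in frame periods."""
    stamps = {}
    for dts, pic in enumerate(decode_order):
        pts = display_order.index(pic) + 1   # +1 models the reordering delay
        stamps[pic] = (dts, pts)
    return stamps

display = ["I0", "B1", "B2", "P3"]           # order shown to the viewer
decode  = ["I0", "P3", "B1", "B2"]           # order in the bit stream
s = stamp(decode, display)
print(s)
```

In the result, the B pictures come out with PTS equal to DTS (they are displayed as soon as they are decoded), while the I and P pictures carry a distinct DTS because their decoding is pulled forward by the reordering — matching the rule stated on the slide.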

Timing Synchronization (3)

- All video and audio streams included in a program must get their time stamps from a common STC, so that the video and audio decoders can be synchronized with each other.
- The data rate and packet rate on the channel (at the multiplexer output) can be completely asynchronous with the System Time Clock (STC).
- PCR time stamps allow different multiplexed programs with different STCs to be carried together, while still allowing STC recovery for each program.
- If there is no buffer underflow or overflow, the delays in the buffers and transmission channel for both video and audio are constant.
- The encoder input and decoder output run at equal and constant rates -> fixed end-to-end delay from encoder input to decoder output.
- If exact synchronization is not required, the decoder clock can be free-running -> video frames can be repeated or skipped as necessary to prevent buffer underflow or overflow, respectively.
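The free-running-decoder case in the last bullet can be reduced to a tiny control rule: watch the buffer occupancy and repeat or skip frames near the limits. The watermark values below are arbitrary illustrations, not figures from the slides.

```python
# Toy buffer-control policy for a free-running decoder (no STC lock):
# repeat the last frame when the buffer is nearly empty, skip a frame
# when it is nearly full, otherwise play normally.

def control(buffer_frames, low=2, high=8):
    """Decide the action for the current frame period."""
    if buffer_frames < low:
        return "repeat"   # nearly underflowing: re-show last frame, consume nothing
    if buffer_frames > high:
        return "skip"     # nearly overflowing: drop a frame to catch up
    return "play"

print([control(n) for n in (0, 1, 5, 9, 12)])
```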

HDTV (High definition television)

- High definition television (HDTV) first came to public attention in 1981, when NHK, the Japanese broadcasting authority, first demonstrated it in the United States.
- HDTV is defined by the ITU-R as:
  'A system designed to allow viewing at about three times the picture height, such that the system is virtually, or nearly, transparent to the quality or portrayal that would have been perceived in the original scene ... by a discerning viewer with normal visual acuity.'

HDTV (2)

- HDTV proposals are for a screen about 33% wider than the conventional TV image. It is generally agreed that the HDTV aspect ratio will be 16:9, as opposed to the 4:3 ratio of conventional TV systems. This ratio has been chosen because psychological tests have shown that it best matches the human visual field.
- It also enables use of existing cinema film formats as additional source material, since this is the same aspect ratio used in normal 35 mm film.
- Figure 16.6(a) shows how the aspect ratio of HDTV compares with that of conventional television, using the same resolution, or the same surface area, as the comparison metric.
- To achieve the improved resolution, the video image used in HDTV must contain over 1000 lines, as opposed to the 525 and 625 provided by the existing NTSC and PAL systems. This gives a much improved vertical resolution. The exact value is chosen to be a simple multiple of one or both of the vertical resolutions used in conventional TV.
- However, due to the higher scan rates, the bandwidth requirement for analogue HDTV is approximately 12 MHz, compared to the nominal 6 MHz of conventional TV.

HDTV (3)

- The introduction of a non-compatible TV transmission format for HDTV would require the viewer either to buy a new receiver, or to buy a converter to receive the picture on their old set.
- The initial thrust in Japan was towards an HDTV format which is compatible with conventional TV standards, and which can be received by conventional receivers with conventional quality. However, to get the full benefit of HDTV, a new wide-screen, high-resolution receiver has to be purchased.
- One of the principal reasons that HDTV is not already common is that a general standard has not yet been agreed. The 26th CCIR plenary assembly recommended the adoption of a single, worldwide standard for high definition television.
- Unfortunately, Japan, Europe and North America are all investing significant time and money in their own systems, based on their own current conventional TV standards and other national considerations.

H261 - H263

- The H.261 algorithm was developed for the purpose of image transmission rather than image storage.
- It is designed to produce a constant output of p x 64 kbit/s, where p is an integer in the range 1 to 30.
  - This allows transmission over a digital network or data link of varying capacity.
  - It also allows transmission over a single 64 kbit/s digital telephone channel for low-quality video-telephony, or at higher bit rates for improved picture quality.

H261 - H263 (2)

- H.261 is widely used on 176 x 144 pixel images.
- The ability to select a range of output rates for the algorithm allows it to be used in different applications.
- Low output rates (p = 1 or 2) are only suitable for face-to-face (videophone) communication. H.261 is thus the standard used in many commercial videophone systems, such as the UK BT/Marconi Relate 2000 and the US AT&T 2500 products.
- Video-conferencing requires a greater output data rate (p > 6) and might go as high as 2 Mbit/s for high-quality transmission with larger image sizes.
- A further development of H.261 is H.263, for lower fixed transmission rates.
- H.263 deploys arithmetic coding in place of the variable-length coding (see the H261 diagram); with other modifications, the data rate is reduced to only 20 kbit/s.

H261 - H263 (3)

- The basic coding algorithm is similar to that of MPEG in that it is a hybrid of motion compensation, DCT and straightforward DPCM (intra-frame coding mode), but without the MPEG I, P, B frames.
- The DCT operation is performed on 8 x 8 blocks of error samples from the predicted luminance pixel values, with sub-sampled blocks of chrominance data.
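The p x 64 kbit/s structure can be written down directly. The helper names are mine, and the application classification merely encodes the slide's rules of thumb (p = 1 or 2 for videophone, p > 6 for video-conferencing), which are guidance rather than normative limits.

```python
# H.261 output rate is a multiple of the 64 kbit/s digital telephone channel:
# rate = p * 64 kbit/s, with p an integer from 1 to 30 (30 * 64 = 1920 kbit/s,
# i.e. close to the 2 Mbit/s upper figure quoted for conferencing).

def h261_rate_kbps(p):
    """Output bit rate in kbit/s for channel multiplier p."""
    if not 1 <= p <= 30:
        raise ValueError("p must be an integer in 1..30")
    return p * 64

def suggested_use(p):
    """Rule-of-thumb application class for a given p (per the slide)."""
    if p <= 2:
        return "videophone"
    if p > 6:
        return "video-conferencing"
    return "intermediate quality"

print(h261_rate_kbps(30), suggested_use(2), suggested_use(30))
```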

Model Based Coding (MBC)

- At the very low bit rates (20 kbit/s or less) associated with video telephony, the requirements for image transmission stretch the compression techniques described earlier to their limits.
- In order to achieve the necessary degree of compression, they often require a reduction in spatial resolution, or even the elimination of frames from the sequence.
- Model based coding (MBC) attempts to exploit a greater degree of redundancy in images than current techniques, in order to achieve significant image compression without adversely degrading the image content information.
- It relies upon the fact that image quality is largely subjective. Provided that the appearance of scenes within an observed image is kept at a visually acceptable level, it may not matter that the observed image is not a precise reproduction of reality.

Model Based Coding (2)

- One MBC method for producing an artificial image of a head sequence utilizes a feature codebook: a range of facial expressions, sufficient to create an animation, is generated from sub-images or templates which are joined together to form a complete face.
- The most important areas of a face for conveying an expression are the eyes and mouth; hence the objective is to create an image in which the movement of the eyes and mouth is a convincing approximation to the movements of the original subject.
- When forming the synthetic image, the feature template vectors which form the closest match to those of the original moving sequence are selected from the codebook, and then transmitted as low bit rate coded addresses.
- By using only 10 eye and 10 mouth templates, for instance, a total of 100 combinations exists, implying that only a 7-bit codebook address need be transmitted.
- It has been found that there are only 13 visually distinct mouth shapes for vowel and consonant formation during speech.
- However, the number of mouth sub-images is usually increased, to include intermediate expressions and hence avoid step changes in the image.
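The codebook-address arithmetic above can be made explicit. The helper name is mine; the numbers (10 x 10 templates, a 128-entry mouth codebook) come from the text.

```python
# Bits needed to address a codebook: ceil(log2(number of entries)).
import math

def address_bits(n_entries):
    """Bits needed to address a codebook with n_entries entries."""
    return math.ceil(math.log2(n_entries))

print(address_bits(10 * 10))   # 10 eye x 10 mouth templates
print(address_bits(128))       # a 128-entry mouth codebook
```

Note that 100 combinations need 7 bits (2^6 = 64 is not enough), which also matches the 7-bit, 128-entry mouth codebook used in the bit-rate calculation on the next slide.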

Model Based Coding (3)

- Another common way of representing objects in three-dimensional computer graphics is by a net of interconnecting polygons.
- A model is stored as a set of linked arrays which specify the coordinates of each polygon vertex, with the lines connecting the vertices together forming each side of a polygon.
- To make realistic models, the polygon net can be shaded to reflect the presence of light sources.
- The wire-frame model [Welch 1991] can be modified to fit the shape of a person's head and shoulders. The wire-frame, composed of over 100 interconnecting triangles, can produce subjectively acceptable synthetic images, provided that the frame is not rotated by more than 30 degrees from the full-face position.
- The model (see the Figure) uses smaller triangles in areas associated with high degrees of curvature, where significant movement is required.
- Large flat areas, such as the forehead, contain fewer triangles.
- A second wire-frame is used to model the mouth interior.

Model Based Coding (4)

- A synthetic image is created by texture mapping detail from an initial full-face source image over the wire-frame. Facial movement can be achieved by manipulation of the vertices of the wire-frame.
- Head rotation requires the use of simple matrix operations upon the coordinate array. Facial expression requires the manipulation of the features controlling the vertices.
- This model based feature codebook approach suffers from the drawback of codebook formation.
- This has to be done off-line and, consequently, the image is required to be pre-recorded, with a consequent delay.
- However, the actual image sequence can be sent at a very low data rate. For a codebook with 128 entries, where 7 bits are required to code each mouth, a 25 frame/s sequence requires less than 200 bit/s to code the mouth movements.
- When it is finally implemented, rates as low as 1 kbit/s are confidently expected from MBC systems, but they can only transmit image sequences which match the stored model, e.g. head and shoulders displays.
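The "simple matrix operations upon the coordinate array" used for head rotation can be sketched as a rotation of the wire-frame vertices about the vertical axis. This is a pure-Python illustration with names of my choosing; a real MBC renderer would also apply perspective projection and texture mapping over the rotated frame.

```python
# Rotate a list of (x, y, z) wire-frame vertices about the y (vertical) axis
# by applying a 3x3 rotation matrix to each coordinate triple.
import math

def rotate_y(vertices, degrees):
    """Return the vertices rotated by the given angle about the y axis."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [(c * x + s * z, y, -s * x + c * z) for (x, y, z) in vertices]

# A 30-degree turn: the slide's limit for subjectively acceptable images.
head = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(rotate_y(head, 30))
```

As a cross-check of the bit-rate claim above: 7 bits per mouth at 25 frame/s is 7 x 25 = 175 bit/s, consistent with the "less than 200 bit/s" figure.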

Key points:

- JPEG coding mechanism: DCT / zigzag scanning / adaptive quantization / VLC
- MPEG layered structure:
  - Pixel, Block, Macroblock, Field DCT Coding / Frame DCT Coding, Slice, Picture, Group of Pictures (GOP), Sequence, Packetized Elementary Stream (PES)
- MPEG compression mechanism:
  - Prediction
  - Motion compensation
  - Scanning
  - YCbCr formats (4:4:4, 4:2:0, etc.)
  - Profiles @ Levels
  - I, P, B pictures & reordering
  - Encoder/decoder process & block diagram
- MPEG data transport
- MPEG timing & buffer control:
  - STC/SCR/DTS
  - PCR/PTS

Technical terms

- Macroblocks
- HVS = Human Visual System
- GOP = Group of Pictures
- VLC = Variable Length Coding/Coder
- IDCT/DCT = (Inverse) Discrete Cosine Transform
- PES = Packetized Elementary Stream
- MP@ML = Main Profile @ Main Level
- PCR = Program Clock Reference
- SCR = System Clock Reference
- STC = System Time Clock
- PTS = Presentation Time Stamp
- DTS = Decode Time Stamp
- PAT = Program Association Table
- PMT = Program Map Table

Chapter 3. CATV systems

Overview:
- A brief history
- Modern CATV networks
- CATV systems and equipments

A Brief History:

- CATV appeared in the 60s in the US, where high buildings are great obstacles to the propagation of TV signals.
- Old CATV networks:
  - Coaxial only
  - Tree-and-branch only
  - TV only
  - No return path (high-pass filters are installed in customers' houses to block low-frequency return-path noise)

Modern CATV networks

- Key elements (from the network diagram): CO or master headend, headends/hubs, server complex, CMTS, TV content provider, optical nodes, taps, amplifiers (GNA/TNA/LE).

Modern CATV networks (2)

- Based on a Hybrid Fiber-Coaxial architecture, also referred to as HFC networks.
- The optical section is based on modern optical communication technologies:
  - Star / ring / mesh, etc. topologies
  - SDH/SONET for digital fibers
  - Various architectures: digital, analog or mixed fiber cabling systems
- Part of the forward path spectrum is used for high-speed Internet access.
- The return path is exploited for digital data communication (the root of new problems !!)
- Spectrum allocation (FDM):
  - 5-60 MHz band for upstream
  - 88-860 MHz band for downstream:
    - 88-450 MHz for analog/digital TV channels
    - 450-860 MHz for Internet access

CATV systems and equipments

Spectrum allocation of CATV networks
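The FDM spectrum plan above can be expressed as a simple lookup. The band edges are exactly the ones quoted on the slide (5-60 MHz upstream, 88-860 MHz downstream, split at 450 MHz); deployed HFC systems vary by region and operator, so treat this as an illustration of the slide's plan, not a universal rule.

```python
# Classify a carrier frequency according to the slide's HFC spectrum plan.

def band(freq_mhz):
    """Return the role of a frequency in the FDM spectrum allocation."""
    if 5 <= freq_mhz <= 60:
        return "upstream"
    if 88 <= freq_mhz < 450:
        return "downstream: analog/digital TV"
    if 450 <= freq_mhz <= 860:
        return "downstream: Internet access"
    return "guard band / out of plan"

print([band(f) for f in (30, 70, 200, 600)])
```

The gap between 60 and 88 MHz acts as a guard band between the noisy return path and the forward path, which is one reason the upstream is "the root of new problems" for data services.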

Vocabulary

- Perception = sự nhận thức
- Lap = phủ lên (to cover / overlap)
