

9/14/2006

Nguyen Chan Hung Hanoi University of Technology

Multimedia Technology

Overview

Introduction
Chapter 1: Background of compression techniques
Chapter 2: Multimedia technologies
  JPEG
  MPEG-1/MPEG-2 Audio & Video
  MPEG-4
  MPEG-7 (brief introduction)
  HDTV (brief introduction)
  H.261/H.263 (brief introduction)
  Model-based coding (MBC) (brief introduction)
Chapter 3: Multimedia networks

Introduction

The importance of multimedia technologies: multimedia is everywhere.

On PCs:
  Real Player, QuickTime, Windows Media.
  Free music and video on the Internet (mp2, mp3, mp4, asf, mpeg, mov, ra, ram, mid, DivX, etc.).
  Video and audio conferencing.
  Webcast and streaming applications, web advertising, data transmission.
  Distance learning (tele-education).
  Telemedicine.
  Tele-xxx (let's imagine!).

On TVs and other home electronic devices:
  DVB-T/DVB-C/DVB-S (Digital Video Broadcasting, Terrestrial/Cable/Satellite) -> shows the superior quality of MPEG-2 over traditional analog TV.
  Interactive TV -> Internet applications (mail, web, e-commerce) on a TV -> no need to wait for a PC to start up and shut down.
  CD/VCD/DVD/MP3 players.

Also appearing in handheld devices (3G mobile phones, wireless PDAs).

Introduction (2)

Multimedia networks

The Internet was designed in the 1960s for low-speed internetworks with boring text-based applications -> high delay, high jitter.
-> Multimedia applications require drastic modifications of the Internet infrastructure.
Many frameworks have been investigated and deployed to support the next-generation multimedia Internet (e.g. IntServ, DiffServ).
In the future, all TVs (and PCs) will be connected to the Internet and will tune freely to any of millions of broadcast stations all over the world.
At present, multimedia networks run over ATM (almost obsolete) and IPv4; in the future they will run over IPv6 -> which should guarantee QoS (Quality of Service).

Chapter 1: Background of compression techniques

Why compression?
  For communication: reduce the bandwidth needed by multimedia network applications such as streaming media, Video-on-Demand (VOD), and Internet phone.
  For digital storage (VCD, DVD, tape, etc.) -> reduce size and cost, increase media capacity and quality.

Compression factor or compression ratio
  The ratio between the size of the source data and the size of the compressed data (e.g. 10:1).

Two types of compression:
  Lossless compression
  Lossy compression

2.1. Information content and redundancy

Information rate
  Entropy is the measure of information content. It sets the lower limit on the bit rate of the data stream.
    -> Expressed in bits per source output unit (such as bits/pixel).
  The more information in the signal, the higher the entropy.
  Lossy compression reduces the entropy, while lossless compression does not.

Redundancy
  The difference between the bit rate and the information rate.
  Usually the information rate is much lower than the bit rate.
  Compression eliminates the redundancy.

2.2. Entropy (supplement 1)

For a discrete source $X$ with a finite alphabet of $N$ symbols $(x_0, \ldots, x_{N-1})$ and a probability mass function $p(x)$, the entropy of the source in bits/symbol is given by

$$H(X) = -\sum_{i=0}^{N-1} p(x_i) \log_2 p(x_i)$$

and measures the average number of bits/symbol required to describe the source.
Such a discrete source is encountered in image compression, in which the acquired digital image pixels can take on only a finite number of values, as determined by the number of bits used to represent each pixel.
It is easy to show (using the method of Lagrange multipliers) that the uniform distribution achieves maximum entropy, given by $H(X) = \log_2 N$.
A uniformly distributed source can be considered to have maximum randomness when compared with sources having other distributions.
Combining this with the intuitive English text example mentioned previously, it is apparent that entropy provides a measure of the compressibility of a source. High entropy indicates more randomness; hence the source requires more bits on average to describe a symbol.

Entropy (supplement 2)

Calculating entropy: an example

An example illustrates the computation of entropy and the difficulty of determining the entropy of a fixed-length signal. Consider the four-point signal [3/4 1/4 0 0].
There are three distinct values (or symbols) in this signal, with probabilities 1/4, 1/4, and 1/2 for the symbols 3/4, 1/4, and 0, respectively. The entropy of the signal is then computed as

$$H = -\left(\tfrac{1}{4}\log_2\tfrac{1}{4} + \tfrac{1}{4}\log_2\tfrac{1}{4} + \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = 1.5 \text{ bits/symbol}$$

This indicates that a variable-length code requires 1.5 bits/symbol on average to represent this source.
In fact, the variable-length code [10 11 0] for the symbols [3/4 1/4 0] achieves this entropy.
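The entropy formula and the four-point example can be checked numerically. The following is a minimal sketch; the helper names `entropy_bits` and `empirical_entropy` are illustrative and not part of the original slides.

```python
import math
from collections import Counter

def entropy_bits(probabilities):
    """Shannon entropy H = -sum(p * log2(p)) in bits/symbol.
    Terms with p == 0 are skipped because they contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def empirical_entropy(samples):
    """Estimate the entropy from the relative frequency of each value."""
    counts = Counter(samples)
    total = len(samples)
    return entropy_bits(c / total for c in counts.values())

# The four-point signal from the example: symbols 3/4, 1/4 and 0.
signal = [0.75, 0.25, 0.0, 0.0]
h_signal = empirical_entropy(signal)

# Average length of the variable-length code [10 11 0] over the same signal.
code = {0.75: "10", 0.25: "11", 0.0: "0"}
avg_len = sum(len(code[s]) for s in signal) / len(signal)

# A uniform source with N = 8 symbols reaches the maximum log2(N).
h_uniform = entropy_bits([1 / 8] * 8)

print(h_signal, avg_len, h_uniform)
```

Both the entropy and the average code length come out at 1.5 bits/symbol, and the uniform 8-symbol source gives 3 bits/symbol.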

2.3. Lossless compression

The data from the decoder is identical to the source data.
  Example: archives produced by utilities such as pkzip or gzip.
  The compression factor is around 2:1 to 5:1, depending on the redundancy of the data.
A fixed compression ratio cannot be guaranteed.
  -> The output data rate is variable -> problems for recording mechanisms or communication channels.
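A short experiment shows both properties: the decoded data is identical to the source, and the ratio depends on how redundant the input is. This is a sketch using Python's standard `zlib` module (the DEFLATE method also used by gzip); the two test inputs are made up for illustration.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Compress, verify the lossless round trip, and return source size / compressed size."""
    compressed = zlib.compress(data, level=9)
    if zlib.decompress(compressed) != data:
        raise ValueError("round trip did not reproduce the source data")
    return len(data) / len(compressed)

redundant = b"ABCD" * 4096          # highly repetitive input
random_like = os.urandom(16384)     # almost no redundancy to remove

ratio_redundant = compression_ratio(redundant)
ratio_random = compression_ratio(random_like)

print(f"redundant input: {ratio_redundant:.1f}:1")
print(f"random input:    {ratio_random:.2f}:1")
```

The repetitive input compresses by a large factor while the random input does not shrink at all, which is why a lossless coder cannot promise a fixed output rate.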

2.4. Lossy compression

The data from the expander is not identical to the source data, but the difference cannot be clearly distinguished by ear or by eye.
  Suitable for audio and video compression.
  The compression factor is much higher than that of lossless compression (up to 100:1).
Based on an understanding of psychoacoustic and psychovisual perception.
Can be forced to operate at a fixed compression factor.

2.5. Process of compression

Communication (reduces the cost of the data link):
  Data -> Compressor (coder) -> Transmission channel -> Expander (decoder) -> Data'
Recording (extends playing time in proportion to the compression factor):
  Data -> Compressor (coder) -> Storage device (tape, disk, RAM, etc.) -> Expander (decoder) -> Data'

2.6. Sampling and quantization

Why sampling?
  Computers cannot process analog signals directly.
PCM (Pulse Code Modulation)
  Sample the analog signal at a constant rate and use a fixed number of bits (usually 8 or 16) to represent each sample.
  Bit rate = sampling rate * number of bits per sample
Quantization
  Map the sampled analog signal (in general, infinite precision) to discrete levels (finite precision).
  Represent each discrete level with a number.
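The bit-rate formula and the mapping to discrete levels can be sketched as follows. The `channels` parameter, the uniform quantizer over [-1, 1], and the sample rates used (8 kHz/8-bit telephone audio, 44.1 kHz/16-bit stereo CD audio) are assumptions added for illustration; the slides only give the formula.

```python
import math

def pcm_bit_rate(sampling_rate_hz, bits_per_sample, channels=1):
    """Bit rate = sampling rate * bits per sample (times the number of channels)."""
    return sampling_rate_hz * bits_per_sample * channels

def quantize(x, bits, x_min=-1.0, x_max=1.0):
    """Uniform quantizer: map x in [x_min, x_max] to a level number 0 .. 2**bits - 1."""
    levels = 2 ** bits
    step = (x_max - x_min) / levels
    index = int((x - x_min) / step)
    return min(max(index, 0), levels - 1)   # clamp values on the upper edge

def dequantize(index, bits, x_min=-1.0, x_max=1.0):
    """Return the midpoint of the quantization interval for a level number."""
    step = (x_max - x_min) / 2 ** bits
    return x_min + (index + 0.5) * step

telephone = pcm_bit_rate(8000, 8)        # 64000 bit/s
cd_audio = pcm_bit_rate(44100, 16, 2)    # 1411200 bit/s

# One period of a sine wave, sampled 16 times and quantized to 8 bits.
samples = [math.sin(2 * math.pi * n / 16) for n in range(16)]
codes = [quantize(s, 8) for s in samples]
rebuilt = [dequantize(c, 8) for c in codes]
max_error = max(abs(a - b) for a, b in zip(samples, rebuilt))

print(telephone, cd_audio, max_error)
```

The reconstruction error never exceeds half a quantization step, which is the finite precision that quantization trades for a fixed number of bits.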

2.7. Predictive coding

Prediction
  Use previous sample(s) to estimate the current sample.
  For most signals, the difference between the predicted and actual values is small -> the difference can be coded with fewer bits while maintaining the same accuracy.
  Only the difference between each sample and its prediction, formed from previous samples, is sent.
Noise is completely unpredictable.
  Most codecs require the data to be preprocessed; otherwise they may perform badly when the data contains noise.

Predictive coding (supplement)

In predictive coding, rather than coding the data directly, the coded data consists of a difference signal formed by subtracting a prediction of the data from the data itself.
The prediction for the current sample is usually formed using past data. A predictive encoder and decoder are shown in the figure, with the difference signal denoted d. If the internal loop states are initialized to the same values at the beginning of the signal, then y = x.
If the predictor is ideal at removing redundancy, then the difference signal contains only the new information at each time instant that is unrelated to previous data.
This new information is sometimes referred to as the innovation, and d is called the innovations process. If predictive coding is used, an appropriate predictor must be determined.
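A minimal sketch of the idea, assuming the simplest possible predictor (the previous sample) and no quantizer in the loop, so the scheme stays lossless. The function names and sample values are illustrative.

```python
def dpcm_encode(samples):
    """Send the difference d between each sample and its prediction.
    The prediction here is simply the previous sample (0 before the first one)."""
    prediction = 0
    residuals = []
    for x in samples:
        residuals.append(x - prediction)
        prediction = x
    return residuals

def dpcm_decode(residuals):
    """Rebuild the samples by adding each difference to the running prediction."""
    prediction = 0
    samples = []
    for d in residuals:
        y = prediction + d
        samples.append(y)
        prediction = y
    return samples

x = [100, 102, 105, 105, 104, 101, 99, 100]
d = dpcm_encode(x)
y = dpcm_decode(d)

print(d)        # [100, 2, 3, 0, -1, -3, -2, 1]
print(y == x)   # True
```

After the first value the differences stay within a few units of zero, so they need far fewer bits than the samples themselves. Because encoder and decoder start from the same state, the output y equals the input x.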

2.8. Statistical coding: the Huffman code

Assign short codes to the most probable data patterns and long codes to the less frequent data patterns.
Bit assignment is based on the statistics of the source data.
The statistics of the data should be known prior to the bit assignment.
Also called VLC (Variable Length Coding).
A familiar example of the same idea: Morse code.

2.9. Drawbacks of compression

Sensitive to data errors
  Compression eliminates the redundancy that is essential to making data resistant to errors.
  Concealment is required for real-time applications.
  Error correction codes are required, which adds redundancy back to the compressed data.
Artifacts
  Artifacts appear when the coder eliminates part of the entropy.
  The higher the compression factor, the more artifacts.

2.10. A coding example: clustering color pixels

In an image, pixel values are clustered around several peaks.
Each cluster represents the color range of one object in the image (e.g. blue sky).
Coding process:
  1. Separate the pixel values into a limited number of data clusters (e.g. clustered pixels of sky blue or grass green).
  2. Send the average color of each cluster and an identifying number for each cluster as side information.
  3. Transmit, for each pixel:
       The number of the cluster whose average color it is closest to.
       Its difference from that average cluster color (-> can be coded to reduce redundancy, since the differences are often similar) -> prediction.
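Step 3 of the coding process can be sketched in a few lines. This assumes single-channel pixel values and cluster averages that are already known (step 1, finding the clusters, is not shown); the values 40 and 200 are hypothetical.

```python
def encode_pixels(pixels, centers):
    """For each pixel, send the number of the nearest cluster and the difference from its average."""
    coded = []
    for p in pixels:
        k = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
        coded.append((k, p - centers[k]))
    return coded

def decode_pixels(coded, centers):
    """Rebuild each pixel from its cluster average plus the transmitted difference."""
    return [centers[k] + diff for k, diff in coded]

centers = [40, 200]                      # side information: hypothetical cluster averages
pixels = [38, 41, 43, 198, 203, 201, 39, 199]

coded = encode_pixels(pixels, centers)
restored = decode_pixels(coded, centers)

print(coded)
print(restored == pixels)
```

The transmitted differences all lie between -2 and 3, a much narrower range than the original pixel values, which is what makes them cheap to code.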

2.11. Frame-differential coding

Frame-differential coding (FDC) = prediction from a previous video frame.
A video frame is stored in the encoder for comparison with the present frame -> causes an encoding latency of one frame time.
For still images:
  Data needs to be sent only for the first instance of a frame.
  All subsequent prediction error values are zero.
  Retransmit the frame occasionally to allow receivers that have just been turned on to have a starting point.
-> FDC reduces the information for still images, but leaves significant data for moving images (e.g. a movement of the camera).
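The behaviour on still and moving content can be seen on a toy example. This sketch uses small hypothetical 4 x 4 frames stored as nested lists; the function names are illustrative.

```python
def frame_difference(previous, current):
    """Prediction error: current frame minus the stored previous frame, pixel by pixel."""
    return [[c - p for p, c in zip(prow, crow)] for prow, crow in zip(previous, current)]

def reconstruct(previous, difference):
    """Decoder side: add the received difference to the previous frame."""
    return [[p + d for p, d in zip(prow, drow)] for prow, drow in zip(previous, difference)]

def nonzero_count(difference):
    """Number of difference values that still carry information."""
    return sum(1 for row in difference for d in row if d != 0)

frame0 = [[10, 10, 10, 10],
          [10, 50, 50, 10],
          [10, 50, 50, 10],
          [10, 10, 10, 10]]
still = [row[:] for row in frame0]       # nothing changed
shifted = [[10, 10, 10, 10],             # the bright object moved one pixel to the right
           [10, 10, 50, 50],
           [10, 10, 50, 50],
           [10, 10, 10, 10]]

diff_still = frame_difference(frame0, still)
diff_shifted = frame_difference(frame0, shifted)

print(nonzero_count(diff_still))     # 0
print(nonzero_count(diff_shifted))   # 4
```

The still frame produces an all-zero difference, while a one-pixel movement already leaves nonzero values along the edges of the object.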

2.12. Motion-compensated prediction

More data in frame-differential coding can be eliminated by comparing the present pixel to the location of the same object in the previous frame (-> not to the same spatial location in the previous frame).
The encoder estimates the motion in the image to find the corresponding area in a previous frame.
The encoder searches for a portion of a previous frame that is similar to the part of the new frame to be transmitted.
It then sends (as side information) a motion vector telling the decoder which portion of the previous frame to use to predict the new frame.
It also sends the prediction error so that the exact new frame may be reconstructed.
Figures: top -> without motion compensation; bottom -> with motion compensation.

Motion compensation (supplement)

Actions:
  1. Compute the motion vector.
  2. Shift data from picture N using the vector to make predicted picture N+1.
  3. Compare the actual picture with the predicted picture.
  4. Send the vector and the prediction error.
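The four actions can be sketched with an exhaustive block search. This is only an illustration under stated assumptions: a sum-of-absolute-differences (SAD) cost, a small square search window, and a vector that points from the block in the current picture to its match in the previous picture. None of these choices is specified in the slides, and the frame contents are hypothetical.

```python
def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame stored as nested lists."""
    return [row[left:left + size] for row in frame[top:top + size]]

def sad(block_a, block_b):
    """Sum of absolute differences between two blocks."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b) for a, b in zip(ra, rb))

def find_motion_vector(previous, current, top, left, size, search):
    """Search the previous frame for the best match of one block of the current frame.
    Returns the motion vector (dy, dx) and the prediction error block."""
    target = get_block(current, top, left, size)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > len(previous) or l + size > len(previous[0]):
                continue                      # candidate falls outside the picture
            cost = sad(target, get_block(previous, t, l, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    _, dy, dx = best
    predicted = get_block(previous, top + dy, left + dx, size)
    error = [[c - p for c, p in zip(crow, prow)] for crow, prow in zip(target, predicted)]
    return (dy, dx), error

background = [10] * 6
previous = [background[:], background[:],
            [10, 50, 60, 10, 10, 10],
            [10, 70, 80, 10, 10, 10],
            background[:], background[:]]
current = [background[:], background[:],
           [10, 10, 10, 50, 60, 10],          # the object moved two pixels to the right
           [10, 10, 10, 70, 80, 10],
           background[:], background[:]]

vector, error = find_motion_vector(previous, current, top=2, left=3, size=2, search=2)
print(vector)   # (0, -2)
print(error)    # [[0, 0], [0, 0]]
```

With the right vector the prediction error for this block is zero, whereas plain frame differencing at the same spatial location would have to send the whole object.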

2.12.1. Unpredictable information

Information that cannot be predicted from the previous frame:
  1. Scene change (e.g. the background landscape changes).
  2. Newly uncovered information due to object motion across a background, or at the edges of a panned scene (e.g. a soccer player's face uncovered by a flying ball).

2.12.2. Dealing with unpredictable information

Scene change
  An intra-coded picture (MPEG I picture) must be sent as a starting point -> requires more data than a predicted picture (P picture).
  I pictures are sent about twice per second -> their timing and sending frequency may be adjusted to accommodate scene changes.
Uncovered information
  Bi-directionally coded type of picture, or B picture.
  There must be enough frame storage in the system to wait for the later picture that has the desired information.
  To limit the amount of decoder memory, the encoder stores pictures and sends the required reference pictures before sending the B picture.
In MPEG:
  Pictures that are intra-coded only are termed I pictures.
  Pictures that are encoded using only backward references are termed P pictures (for "predictive").
  Pictures that are encoded from interpolation of both a backward reference and a forward reference are termed B pictures.
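The rule that reference pictures are sent before the B pictures that depend on them changes the order of the stream. The sketch below illustrates that reordering on picture labels only; the label format ("I0", "B1", ...) and the handling of trailing B pictures are assumptions made for the example.

```python
def transmission_order(display_order):
    """Reorder pictures so that each B picture follows both of its reference pictures.
    B pictures are held back until the next I or P picture has been sent."""
    out, pending_b = [], []
    for picture in display_order:
        if picture.startswith("B"):
            pending_b.append(picture)      # wait for the later reference picture
        else:
            out.append(picture)            # I or P picture: send it first
            out.extend(pending_b)          # then the B pictures that needed it
            pending_b = []
    out.extend(pending_b)                  # trailing B pictures (illustrative handling only)
    return out

display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
sent = transmission_order(display)
print(sent)   # ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

The decoder therefore only has to hold the two reference pictures while it decodes the B pictures between them.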

2.13. Transform coding

Convert spatial image pixel values to transform coefficient values in the frequency domain.
-> The number of coefficients produced is equal to the number of pixels transformed.
A few coefficients contain most of the energy in a picture -> the coefficients may be further coded by lossless entropy coding.
The transform process concentrates the energy into particular coefficients (generally the low-frequency coefficients).

Transform coding (2)

The concept of a histogram.

2.13.1. Types of picture transform coding

Types of picture coding:
  Discrete Fourier (DFT)
  Karhunen-Loève
  Walsh-Hadamard
  Lapped orthogonal
  Discrete Cosine (DCT) -> used in MPEG-2
  Wavelets -> new
The differences between transform coding methods:
  The degree of concentration of energy in a few coefficients.
  The region of influence of each coefficient in the reconstructed picture.
  The appearance and visibility of coding noise due to coarse quantization of the coefficients.

2.13.2. DCT lossy coding

Lossless coding cannot achieve a high compression ratio (4:1 or less).
Lossy coding = selectively discarding information so that the reproduction is visually or aurally indistinguishable from the source, or has the fewest artifacts.
Lossy coding can be achieved by:
  Eliminating some DCT coefficients.
  Adjusting the quantizing coarseness of the coefficients -> the better approach.

2.14. Masking

Masking makes certain types of coding noise invisible or inaudible, due to psychovisual or psychoacoustic effects.
  In audio, a pure tone masks energy at higher frequencies and also at lower frequencies (with a weaker effect).
  In video, high-contrast edges mask random noise.
Noise introduced at low bit rates falls into one of the frequency, spatial, or temporal regions.
Example of audio masking: the sound of an explosion drowns out birdsong.

2.15. Variable quantization

Variable quantization (VQ) is the main technique of lossy coding -> it greatly reduces the bit rate.
Coarsely quantize the less significant coefficients in a transform (-> less noticeable, low energy, less visible or audible).
Can be applied to a complete signal or to individual frequency components of a transformed signal.
VQ also controls the instantaneous bit rate in order to:
  Match the average bit rate to a constant channel bit rate.
  Prevent buffer overflow or underflow.

2.16. Run-level coding

"Run-level" coding = coding a run-length of zeros followed by a nonzero level.
  Instead of sending all the zero values individually, the length of the run is sent.
  Useful for any data with long runs of zeros.
  Run lengths are easily encoded by a Huffman code.
Example: a herdsman counting bulls and cows.

Run-level coding (supplement)

Let an event represent the pair (run, level), where run is the number of zeros and level is the magnitude of the nonzero coefficient that follows them. This coding process is sometimes called run-length coding.
Then a table is built to represent each event by a specific codeword (i.e., a sequence of bits).
Events that occur more often are represented by shorter codewords, and less frequent events are represented by longer codewords. This entropy coding process is therefore called VLC or Huffman coding.
The table shows part of a sample VLC table. In this table, the last bit "s" of each codeword denotes the sign of the level: 0 for positive and 1 for negative.
It can be seen that more likely events (i.e., short runs and low levels) are represented with short codewords, and vice versa.
At the decoder, all the above steps are reversed one by one.
All the steps can be exactly reversed except for the quantization step, which is where loss of information arises. This is known as lossy compression.
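The (run, level) events can be produced with a single pass over the coefficients. This sketch keeps the sign inside the level rather than in a separate sign bit, and returns the count of trailing zeros explicitly; a real codec would signal the end of the block with a dedicated codeword instead. The coefficient values are made up.

```python
def run_level_encode(coefficients):
    """Turn a coefficient sequence into (run, level) events.
    run = number of zeros before a nonzero value, level = that nonzero value."""
    events, run = [], 0
    for c in coefficients:
        if c == 0:
            run += 1
        else:
            events.append((run, c))
            run = 0
    return events, run          # `run` now holds the number of trailing zeros

def run_level_decode(events, trailing_zeros):
    """Expand the events back into the original coefficient sequence."""
    out = []
    for run, level in events:
        out.extend([0] * run)
        out.append(level)
    out.extend([0] * trailing_zeros)
    return out

coeffs = [12, 0, 0, -3, 5, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
events, tail = run_level_encode(coeffs)

print(events)   # [(0, 12), (2, -3), (0, 5), (3, 1)]
print(tail)     # 7
print(run_level_decode(events, tail) == coeffs)   # True
```

Sixteen coefficients shrink to four events, and each event is then mapped to a codeword from the VLC table.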

Sample VLC table

Relationship among the techniques covered

MPEG compression process:
  Motion-compensated prediction (motion estimation)
  Transform coding (Discrete Cosine Transform, DCT)
  Variable quantization (quantization)
  Zig-zag scan
  Run-level coding (RLC)
  Statistical coding, Huffman (Variable Length Coding, VLC)

Relationship among the compression techniques

[Diagram: compression methods, divided into lossless compression and lossy compression, covering transform coding, VLC (Huffman), RLC, variable quantization, and predictive coding.]

2.17. Summary: key points

Compression process
Sampling and quantization
Coding:
  Lossless and lossy coding
  Frame-differential coding
  Motion-compensated prediction
  Variable quantization
  Run-level coding
Masking

Huffman coding (supplement): worked exercise

As a simple example of the use of Huffman codes for images, consider an image in which the pixels (or the difference values) can have one of 8 brightness values.
This would require 3 bits per pixel (2^3 = 8) for conventional representation. From a histogram of the image, the frequency of occurrence of each value can be determined and, as an example, might show the following results (Table 1), in which the brightness values have been ranked in order of frequency. Huffman coding provides a straightforward way to assign codes from this frequency table, and the code values for this example are shown.
Note that each code is unique and no sequence of codes can be mistaken for any other value, which is a characteristic of this type of coding.

Table 1. Example of Huffman codes assigned to brightness values

  Brightness value   Frequency   Huffman code
  4                  0.45        1
  5                  0.21        01
  3                  0.12        0011
  6                  0.09        0010
  2                  0.06        0001
  7                  0.04        00001
  1                  0.02        000000
  0                  0.01        000001

Notice that the most commonly found pixel brightness value requires only a single bit, but some of the less common values require 5 or 6 bits, more than the three that a simple representation would need. Multiplying the frequency of occurrence of each value by the length of its code gives an overall average of

$$0.45 \cdot 1 + 0.21 \cdot 2 + 0.12 \cdot 4 + 0.09 \cdot 4 + 0.06 \cdot 4 + 0.04 \cdot 5 + 0.02 \cdot 6 + 0.01 \cdot 6 = 2.33 \text{ bits/pixel}$$
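The code assignment in Table 1 can be reproduced with the usual Huffman construction, which repeatedly merges the two least frequent entries. This is a minimal sketch using Python's `heapq`; the function name and the tie-breaking counter are illustrative choices. The 0/1 labels, and the lengths of individual codes when two frequencies tie, may differ from Table 1, but the average length is the same.

```python
import heapq
from itertools import count

def huffman_code(frequencies):
    """Build a prefix code by repeatedly merging the two least frequent groups.
    Returns a dict mapping each symbol to its bit string."""
    tiebreak = count()          # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), {symbol: ""}) for symbol, f in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, codes0 = heapq.heappop(heap)
        f1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (f0 + f1, next(tiebreak), merged))
    return heap[0][2]

# Brightness value -> frequency, taken from Table 1.
table1 = {4: 0.45, 5: 0.21, 3: 0.12, 6: 0.09, 2: 0.06, 7: 0.04, 1: 0.02, 0: 0.01}

codes = huffman_code(table1)
average_length = sum(table1[s] * len(codes[s]) for s in table1)

for symbol in table1:
    print(symbol, table1[symbol], codes[symbol])
print(round(average_length, 2))   # 2.33
```

The most frequent brightness value receives a one-bit code and the two rarest values receive six-bit codes, giving 2.33 bits/pixel on average instead of 3.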

Chapter 1 exercises

Exercise 1: Given Table 1 (without the Huffman code column).
  Question: (worked as the sample exercise)
Exercise 2: (with the Huffman code table) Questions (review):
  What is the entropy of the image above?
  How many bits are needed with ordinary fixed-length binary coding?
  How many bits are needed with Huffman coding? -> Comment on the efficiency of the Huffman code.
  What can be observed about the Huffman code table (codeword lengths)?
Exercise 3: (worked sample and review)
  Given two figures showing two images, compute the number of bits needed to code them.

Exercise 3:

Compute the minimum number of bits needed to code the following two images:
  Left image: 63 zeros and one 1.
  Right image: 32 zeros and 32 ones.

[Figure: the two 8 x 8 binary images]

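Exercise 3 (solution)

Left image:

$$H(X) = -\tfrac{63}{64}\log_2\tfrac{63}{64} - \tfrac{1}{64}\log_2\tfrac{1}{64} \approx 0.116 \text{ bits/pixel}$$

For the 64 pixels this gives a minimum of about 7.4 bits.

Right image:

$$H(X) = -\tfrac{32}{64}\log_2\tfrac{32}{64} - \tfrac{32}{64}\log_2\tfrac{32}{64} = 1 \text{ bit/pixel}$$

For the 64 pixels this gives a minimum of 64 bits.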

Chapter 2: Multimedia technologies

Roadmap
  JPEG
  MPEG-1/MPEG-2 Video
  MPEG-1 Layer 3 Audio (mp3)
  MPEG-4
  MPEG-7 (brief introduction)
  HDTV (brief introduction)
  H.261/H.263 (brief introduction)
  Model-based coding (MBC) (brief introduction)

JPEG (Joint Photographic Experts Group)

JPEG encoder
  Partitions the image into blocks of 8 * 8 pixels.
  Calculates the Discrete Cosine Transform (DCT) of each block.
  A quantizer rounds off the DCT coefficients according to the quantization matrix -> lossy, but allows for large compression ratios.
  Produces a series of DCT coefficients using zig-zag scanning.
  Uses a variable length code (VLC) on these DCT coefficients.
  Writes the compressed data stream to an output file (*.jpg or *.jpeg).
JPEG decoder
  File -> input data stream -> variable length decoder -> IDCT (Inverse DCT) -> image

JPEG zig-zag scanning

[Figure: zig-zag scanning order over an 8 x 8 block of DCT coefficients]
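The zig-zag order itself is easy to generate. This is a minimal sketch, assuming the usual JPEG convention that the scan starts at the top-left (DC) coefficient and first moves right; the function names are illustrative.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        diagonal = [(r, s - r) for r in rows]  # listed with the row increasing
        # Even diagonals are walked from bottom-left up to top-right,
        # odd diagonals from top-right down to bottom-left.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order

def zigzag_scan(block):
    """Flatten a square block (list of rows) into a zig-zag ordered list."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

print(zigzag_order(8)[:6])
```

Low-frequency coefficients come first, so the zeros produced by quantization tend to cluster at the end of the scanned list.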

JPEG - DCT

- The DCT is similar to the Discrete Fourier Transform (DFT): it transforms a signal or image from the spatial domain to the frequency domain.
- The DCT requires fewer multiplications than the DFT.
- Input image A:
  - A is N2 pixels wide by N1 pixels high.
  - A(i,j) is the intensity of the pixel in row i and column j.
- Output image B:
  - B(k1,k2) is the DCT coefficient in row k1 and column k2 of the DCT matrix.
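A direct (slow) implementation shows how B is obtained from A. This is a sketch under two assumptions that the slide does not state: the block is square (N1 = N2 = N), and the scaling is the common JPEG/MPEG one, 2/N with C(0) = 1/sqrt(2). Real codecs use fast factorizations instead of the four nested loops.

```python
import math

def dct2(A):
    """2-D DCT-II of a square block A (list of rows): returns B with B[k1][k2]."""
    N = len(A)

    def C(k):
        # Scale factor for the zero-frequency row/column.
        return 1 / math.sqrt(2) if k == 0 else 1.0

    B = [[0.0] * N for _ in range(N)]
    for k1 in range(N):
        for k2 in range(N):
            total = 0.0
            for i in range(N):
                for j in range(N):
                    total += (A[i][j]
                              * math.cos((2 * i + 1) * k1 * math.pi / (2 * N))
                              * math.cos((2 * j + 1) * k2 * math.pi / (2 * N)))
            B[k1][k2] = (2 / N) * C(k1) * C(k2) * total
    return B

# A flat block has all of its energy in the DC coefficient B[0][0].
flat = [[100] * 8 for _ in range(8)]
B = dct2(flat)
print(round(B[0][0], 6))
```

With this scaling the DC coefficient of a flat 8 x 8 block is 8 times the pixel value, and every other coefficient is zero up to rounding error.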

JPEG - Quantization matrix

- The quantization matrix is the 8 x 8 matrix of step sizes (sometimes called quantums), one element for each DCT coefficient.
- It is usually symmetric.
- Step sizes will be:
  - small in the upper left (low frequencies),
  - large in the lower right (high frequencies).
- A step size of 1 is the most precise.
- The quantizer divides each DCT coefficient by its corresponding quantum, then rounds to the nearest integer.
- Large quantums drive small coefficients down to zero.
- The result:
  - Many high-frequency coefficients become zero and are therefore easy to remove.
  - The low-frequency coefficients undergo only minor adjustment.
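The divide-and-round step can be sketched in a few lines. The 2 x 2 matrices below are purely illustrative (real JPEG uses 8 x 8), and the step sizes are assumed values, not taken from the slides. Note that Python's `round` sends exact halves to the nearest even integer, which differs slightly from the "round half away from zero" used by some codecs.

```python
def quantize(coefficients, step_sizes):
    """Divide each DCT coefficient by its step size and round to an integer level."""
    return [[round(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coefficients, step_sizes)]

def dequantize(levels, step_sizes):
    """Decoder side: multiply each level by its step size (the rounding loss remains)."""
    return [[level * q for level, q in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, step_sizes)]

coefficients = [[1255, -15], [-73, 4]]   # hypothetical DCT coefficients
step_sizes = [[16, 11], [12, 40]]        # hypothetical step sizes, larger at high frequency

levels = quantize(coefficients, step_sizes)
restored = dequantize(levels, step_sizes)
print(levels, restored)
```

The small high-frequency coefficient (4 with step 40) is driven to zero, while the large low-frequency coefficient is only slightly changed (1255 becomes 1248 after reconstruction).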

JPEG - Coding process illustrated

[Figure: 8 x 8 matrix of DCT coefficients (DC value 1255) and the quantization result (DC value 78); the remaining cell layout was lost in extraction]

Zigzag scan result: 78 -1 1 -4 -5 4 4 6 3 2 -1 -3 -5 -4 -1 0 -1 0 1 1, then only zeros up to the end of the block (EOB)

-> Easily coded by run-length and Huffman coding
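To see why a zig-zag scanned block is easy to code, the scan result from the illustration can be turned into (zero run, level) pairs. This is a simplified sketch: the function name is illustrative, the list is padded with zeros to 64 entries, and details of the real JPEG entropy coder (DC prediction, the limit of 15 zeros per run, Huffman tables) are left out.

```python
def run_length_pairs(scanned):
    """Split a zig-zag scanned block into its DC value and (zero_run, level) pairs."""
    dc, ac = scanned[0], list(scanned[1:])
    # Trailing zeros are not coded one by one: EOB means "the rest is zero".
    while ac and ac[-1] == 0:
        ac.pop()
    pairs, run = [], 0
    for level in ac:
        if level == 0:
            run += 1          # count zeros preceding the next non-zero level
        else:
            pairs.append((run, level))
            run = 0
    return dc, pairs + ["EOB"]

head = [78, -1, 1, -4, -5, 4, 4, 6, 3, 2, -1, -3, -5, -4, -1, 0, -1, 0, 1, 1]
scanned = head + [0] * (64 - len(head))

dc, pairs = run_length_pairs(scanned)
print(dc, pairs)
```

The 64 coefficients collapse into one DC value, 17 pairs and an EOB marker, which a Huffman coder can then map to short code words.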

MPEG (Moving Picture Experts Group)

- MPEG is the heart of:
  - Digital television set-top boxes
  - HDTV decoders
  - DVD players
  - Video conferencing
  - Internet video, etc.
- MPEG standards:
  - MPEG-1, MPEG-2, MPEG-4, MPEG-7
  - (The MPEG-3 standard was abandoned and became an extension of MPEG-2.)

MPEG standards

- MPEG-1 (obsolete)
  - A standard for storage and retrieval of moving pictures and audio on storage media
  - Application: VCD (video compact disk)
- MPEG-2 (widely implemented)
  - A standard for digital television
  - Applications: DVD (digital versatile disk), HDTV (high definition TV), DVB (European Digital Video Broadcasting Group), etc.
- MPEG-4 (newly implemented, still being researched)
  - A standard for multimedia applications
  - Applications: Internet, cable TV, virtual studio, etc.
- MPEG-7 (future work, ongoing research)
  - A content representation standard for information search (Multimedia Content Description Interface)
  - Applications: Internet, video search engines, digital libraries

MPEG-2 formal standards

- The international standard ISO/IEC 13818-2, "Generic Coding of Moving Pictures and Associated Audio Information"
- ATSC (Advanced Television Systems Committee) document A/54, "Guide to the Use of the ATSC Digital Television Standard"

MPEG video data structure

The MPEG-2 video data stream is constructed in layers, from lowest to highest, as follows:
- PIXEL is the fundamental unit.
- BLOCK is an 8 x 8 array of pixels.
- MACROBLOCK consists of 4 luma blocks and 2 chroma blocks (used for motion compensation and quantization).
- SLICE consists of a variable number of macroblocks (used to recover from transmission errors).
- PICTURE consists of a frame (or field) of slices.
- GROUP OF PICTURES (GOP) consists of a variable number of pictures.
- SEQUENCE consists of a variable number of GOPs (used to set the video parameters).
- PACKETIZED ELEMENTARY STREAM (optional).

MPEG layers

[Figure: the MPEG layer hierarchy]

Pixel & Block

- Pixel = "picture element"
  - A discrete spatial point sample of an image.
  - A color pixel may be represented digitally as a number of bits for each of three primary color values.
- Block
  - A block is an 8 x 8 array of pixels.
  - A block is the fundamental unit for DCT (discrete cosine transform) coding.

Macroblock

- A macroblock = 16 x 16 array of luma (Y) pixels (= 4 blocks = 2 x 2 block array).
- The number of chroma pixels (Cr, Cb) varies depending on the chroma pixel structure indicated in the sequence header (e.g. 4:2:0).
- The macroblock is the fundamental unit for motion compensation and will have motion vector(s) associated with it if it is predictively coded.
- A macroblock is classified as one of the following, depending on how the four blocks are extracted from the macroblock:
  - Field coded (an interlaced frame consists of 2 fields)
  - Frame coded

Slice

- Pictures are divided into slices.
- A slice consists of an arbitrary number of successive macroblocks (going left to right), but is typically an entire row of macroblocks.
- A slice does not extend beyond one row.
- The slice header carries address information that allows the Huffman decoder to resynchronize at slice boundaries.

Picture

- A source picture is a contiguous rectangular array of pixels.
- A picture may be a complete frame of video ("frame picture") or one of the interlaced fields from an interlaced source ("field picture").
- A field picture does not have any blank lines between its active lines of pixels.
- A coded picture (also called a video access unit) begins with a start code and a header. The header consists of:
  - picture type (I, B, P)
  - temporal reference information
  - motion vector search range
  - optional user data
- A frame picture consists of:
  - a frame of a progressive source, or
  - a frame (2 spatially interlaced fields) of an interlaced source

I, P, B Pictures

Encoded pictures are classified into 3 types: I, P, and B.
- I pictures = intra coded pictures
  - All macroblocks are coded without prediction.
  - Needed to give the receiver a "starting point" for prediction after a channel change and to recover from errors.
- P pictures = predicted pictures
  - Macroblocks may be coded with forward prediction from references made from previous I and P pictures, or may be intra coded.
- B pictures = bi-directionally predicted pictures
  - Macroblocks may be coded with forward prediction from previous I or P references.
  - Macroblocks may be coded with backward prediction from the next I or P reference.
  - Macroblocks may be coded with interpolated prediction from past and future I or P references.
  - Macroblocks may be intra coded (no prediction).

Group of pictures (GOP)

- The group of pictures layer is optional in MPEG-2.
- A GOP begins with a start code and a header.
- The header carries:
  - time code information
  - editing information
  - optional user data
- The first encoded picture in a GOP is always an I picture.
- A typical length is 15 pictures, with the following structure (in display order):
  I B B P B B P B B P B B P B B
  This provides an I picture with sufficient frequency to allow a decoder to decode correctly.

[Figure: forward motion compensation and bidirectional motion compensation between the pictures of a GOP, plotted against time]

Sequence

- A sequence begins with a unique 32-bit start code followed by a header.
- The header carries:
  - picture size
  - aspect ratio
  - frame rate and bit rate
  - optional quantizer matrices
  - required decoder buffer size
  - chroma pixel structure
  - optional user data
- The sequence information is needed for channel changing.
- The sequence length depends on the acceptable channel change delay.

Packetized Elementary Stream (PES)

- The output of a single MPEG audio or video coder is called an Elementary Stream (ES).
- An Elementary Stream is an endless, near real-time signal.
- For convenience, it can be broken into convenient-sized data blocks, forming a Packetized Elementary Stream (PES).
  - These data blocks need header information to identify the start of the packets, and must include time stamps because the packetizing process disrupts the time axis.
- A Video Elementary Stream (video ES) consists of all the video data for a sequence, including the sequence header and all the subparts of a sequence.
- An ES carries only one type of data (video or audio) from a single video or audio encoder.
- PES packets have variable length, unlike the fixed-length transport packets, and may be much longer than a transport packet.

MPEG Packetized Elementary Stream (PES) (BS)

The figure shows that one video PES and a number of audio
PES can be combined to form a Program Stream, provided
that all of the coders are locked to a common clock.
Time stamps in each PES ensure lip-sync between the
video and audio.


Intra Frame Coding

- Intra coding is concerned only with information within the current frame (not relative to any other frame in the video sequence).
- MPEG intra-frame coding block diagram (see figure): similar to JPEG (review the JPEG coding mechanism).
- Basic blocks of the intra-frame coder:
  - Video filter (optional)
  - Discrete cosine transform (DCT)
  - DCT coefficient quantizer
  - Run-length amplitude/variable length coder (VLC)

[Figure: MPEG intra-frame coding block diagram]

Video Filter

- The Human Visual System (HVS) is:
  - most sensitive to changes in luminance,
  - less sensitive to variations in chrominance.
- MPEG uses the YCbCr color space to represent the data values instead of RGB, where:
  - Y is the luminance signal,
  - Cb is the blue color difference signal,
  - Cr is the red color difference signal.
- What do the video formats 4:4:4, 4:2:0, etc. mean?
  - 4:4:4 is full-bandwidth YCbCr video: each macroblock consists of 4 Y blocks, 4 Cb blocks, and 4 Cr blocks -> a waste of bandwidth.
  - 4:2:0 is the format most commonly used in MPEG-2.

Color Subsampling formats (BS)

[Figure: positions of the Y, Cb and Cr samples in the 4:4:4, 4:2:2, 4:1:1 and 4:2:0 formats]

For a PAL system (720 x 576 lines, 8 bits per sample, 25 frames per second):
- 4:4:4 format: bit rate = (720 + 720 + 720) * 576 * 8 * 25 = 249 Mbps
- 4:2:2 format: bit rate = (720 + 360 + 360) * 576 * 8 * 25 = 166 Mbps
- 4:2:0 format: bit rate = (720 + 360) * 576 * 8 * 25 = 124.4 Mbps
- 4:1:1 format: bit rate = (720 + 180 + 180) * 576 * 8 * 25 = 124.4 Mbps
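The bit rates above follow from one formula: samples per line, times lines, times bits per sample, times frames per second. A minimal sketch that reproduces the four figures (the function and dictionary names are illustrative):

```python
def raw_bit_rate(samples_per_line, lines=576, bits_per_sample=8, frames_per_second=25):
    """Uncompressed bit rate in bit/s for the given average samples per line."""
    return sum(samples_per_line) * lines * bits_per_sample * frames_per_second

# Average number of (Y, Cb, Cr) samples per line for a 720-pixel-wide PAL picture.
# For 4:2:0 the chroma is also halved vertically, so Cb and Cr together
# average 360 samples per line, as in the slide's formula.
formats = {
    "4:4:4": (720, 720, 720),
    "4:2:2": (720, 360, 360),
    "4:2:0": (720, 360),
    "4:1:1": (720, 180, 180),
}

rates = {name: raw_bit_rate(samples) for name, samples in formats.items()}
for name, rate in rates.items():
    print(name, round(rate / 1e6, 1), "Mbps")
```

4:2:0 and 4:1:1 carry the same amount of data; they differ only in where the chroma samples are taken.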

Applications of chroma formats

chroma_format        Multiplex order (time) within macroblock   Application
4:2:0 (6 blocks)     YYYYCbCr                                   Mainstream television, consumer entertainment
4:2:2 (8 blocks)     YYYYCbCrCbCr                               Studio production environments, professional editing equipment
4:4:4 (12 blocks)    YYYYCbCrCbCrCbCrCbCr                       Computer graphics

MPEG Profiles & levels

- MPEG-2 is classified into several profiles.
- Main profile features:
  - 4:2:0 chroma sampling format
  - I, P, and B pictures
  - Non-scalable
- The Main Profile is subdivided into levels.
  - MP@ML (Main Profile, Main Level):
    - Designed with the CCIR 601 standard for interlaced standard digital video.
    - 720 x 576 (PAL) or 720 x 483 (NTSC)
    - 30 Hz progressive, 60 Hz interlaced
    - Maximum bit rate is 15 Mbit/s
  - MP@HL (Main Profile, High Level):
    - Upper bounds: 1152 x 1920, 60 Hz progressive
    - 80 Mbit/s

MPEG encoder/decoder

[Figure: block diagrams of the MPEG encoder and decoder]

Prediction

- Backward prediction is done by storing pictures until the desired anchor picture is available, before encoding the currently stored frames.
- To minimize the prediction error, the encoder can decide to use:
  - forward prediction from a previous picture,
  - backward prediction from a following picture,
  - or interpolated prediction.
- The encoder must transmit pictures in an order that differs from that of the source pictures, so that the decoder has the anchor pictures before decoding the predicted pictures.
- The decoder must keep two frames stored.

I P B Picture Reordering

- Pictures are coded and decoded in a different order than they are displayed.
- This is due to the bidirectional prediction used for B pictures.
- For example, take a 12-picture GOP (I(13) is the first picture of the next GOP):
  - Source order and encoder input order:
    I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)
  - Encoding order and order in the coded bitstream:
    I(1) P(4) B(2) B(3) P(7) B(5) B(6) P(10) B(8) B(9) I(13) B(11) B(12)
  - Decoder output order and display order (same as input):
    I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)
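The reordering rule can be stated compactly: each B picture is held back until the next anchor (I or P) picture has been sent. A minimal sketch, with an illustrative function name, that reproduces the encoding order of the example:

```python
def coding_order(display_order):
    """Reorder (type, number) pictures from display order to bitstream order."""
    out, pending_b = [], []
    for picture in display_order:
        if picture[0] == "B":
            pending_b.append(picture)   # B pictures wait for their future anchor
        else:
            out.append(picture)         # anchor (I or P) goes out first...
            out.extend(pending_b)       # ...followed by the B pictures it closes
            pending_b = []
    # Any B pictures left over have no following anchor in this input.
    return out + pending_b

types = "IBBPBBPBBPBBI"
display = [(t, n) for n, t in enumerate(types, start=1)]
coded = coding_order(display)
print(" ".join(f"{t}({n})" for t, n in coded))
```

The decoder applies the inverse: it holds each decoded anchor picture until the B pictures that precede it in display order have been output.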

DCT and IDCT formulas

- DCT: Eq 1 (normal form), Eq 2 (matrix form)
- IDCT: Eq 3 (normal form), Eq 4 (matrix form)

[Equations 1 to 4: images, not recoverable from the extracted text]

Where:
- F(u,v) = two-dimensional N x N DCT
- u, v, x, y = 0, 1, 2, ..., N-1
- x, y are spatial coordinates in the sample domain
- u, v are frequency coordinates in the transform domain
- C(u), C(v) = 1/sqrt(2) for u, v = 0
- C(u), C(v) = 1 otherwise
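For reference, the standard N x N DCT pair that is consistent with these definitions of F(u,v), C(u) and C(v) is written out below. This is the common MPEG/JPEG form, not a transcription of the slide's own equations; the symbol f(x,y) for the input samples and the matrix T are assumed names.

```latex
% Normal form (forward and inverse)
F(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1}
         f(x,y)\, \cos\frac{(2x+1)u\pi}{2N}\, \cos\frac{(2y+1)v\pi}{2N}

f(x,y) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1}
         C(u)\, C(v)\, F(u,v)\, \cos\frac{(2x+1)u\pi}{2N}\, \cos\frac{(2y+1)v\pi}{2N}

% Matrix form, with T_{u,x} = \sqrt{2/N}\; C(u)\, \cos\frac{(2x+1)u\pi}{2N}
F = T\, f\, T^{\mathsf{T}}, \qquad f = T^{\mathsf{T}} F\, T
```

Because T is orthogonal, the inverse transform only needs the transpose of T, which is why the forward and inverse forms look so similar.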

DCT versus DFT

- The DCT is conceptually similar to the DFT, except:
  - The DCT concentrates energy into the lower-order coefficients better than the DFT.
  - The DCT is purely real; the DFT is complex (magnitude and phase).
- A DCT operation on a block of pixels produces coefficients that are similar to the frequency domain coefficients produced by a DFT operation.
  - An N-point DCT has the same frequency resolution as a 2N-point DFT.
  - The N frequencies of a 2N-point DFT correspond to N points on the upper half of the unit circle in the complex frequency plane.
- Assuming a periodic input, the magnitude of the DFT coefficients is spatially invariant (the phase of the input does not matter). This is not true for the DCT.

The weighting process (BS)

[Figure: the weighting process]

Quantization matrix

[Figure: example quantization matrix]

- Note that the quantization step sizes applied to the DCT coefficients are:
  - small in the upper left (low frequencies),
  - large in the lower right (high frequencies).
  - Recall the JPEG mechanism.
- Why?
  - The HVS is less sensitive to errors in high-frequency coefficients than to errors in lower frequencies.
  - Higher frequencies should therefore be more coarsely quantized.

Result DCT matrix (example)

[Figure: example DCT matrix after quantization]

- After adaptive quantization, the result is a matrix containing many zeros.

MPEG scanning

[Figure: two coefficient scan patterns]

- Left: zig-zag scanning (like JPEG)
- Right: alternate scanning, which is better for interlaced frames

Huffman/Run-Level Coding

- Huffman coding, in combination with run-level coding and zig-zag scanning, is applied to the quantized DCT coefficients.
- "Run-level" = a run-length of zeros followed by a non-zero level.
- Huffman coding is also applied to various types of side information.
- A Huffman code is an entropy code that optimally achieves the shortest possible average code word length for a source.
- This average code word length is >= the entropy of the source.

Huffman/Run-Level coding illustrated

Zero run-length   Amplitude      MPEG code value
N/A               8 (DC value)   110 1000
0                 4              0000 1100
0                 4              0000 1100
0                 2              0100 0
0                 2              0100 0
0                 2              0100 0
0                 1              110
0                 1              110
0                 1              110
0                 1              110
12                1              0010 0010 0
EOB               EOB            10

- Using the DCT output matrix from the previous slide, after zig-zag scanning the output (following the DC value) is the sequence of numbers: 4, 4, 2, 2, 2, 1, 1, 1, 1, 0 (12 zeros), 1, 0 (41 zeros).
- These values are looked up in a fixed table of variable length codes:
  - The most probable occurrence is given a relatively short code.
  - The least probable occurrence is given a relatively long code.

Huffman/Run-Level coding illustrated (2)

- The first run of 12 zeros has been efficiently coded by only 9 bits.
- The last run of 41 zeros has been entirely eliminated, represented only by a 2-bit End Of Block (EOB) indicator.
- The quantized DCT coefficients are now represented by a sequence of 61 binary bits (see the table).
- Considering that the original 8 x 8 block of 8-bit pixels required 512 bits for full representation, the compression ratio is approximately 8.4:1.
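The 61-bit total can be verified by encoding the example block with the code words listed in the illustration. This is only a sketch: the dictionary holds just the handful of code words that appear in that example (with the sign bit already included), not the full MPEG variable length code table, and the names are illustrative.

```python
# Code words as listed in the example table, keyed by (zero_run, level).
VLC = {
    (0, 4): "00001100",
    (0, 2): "01000",
    (0, 1): "110",
    (12, 1): "001000100",
}
DC_CODE = "1101000"   # code listed for the DC value 8
EOB = "10"            # End Of Block marker

def encode_block(ac):
    """Encode the AC coefficients (in zig-zag order) after the fixed DC code."""
    ac = list(ac)
    while ac and ac[-1] == 0:   # the trailing run of zeros is replaced by EOB
        ac.pop()
    bits, run = DC_CODE, 0
    for level in ac:
        if level == 0:
            run += 1
        else:
            bits += VLC[(run, level)]
            run = 0
    return bits + EOB

ac = [4, 4, 2, 2, 2, 1, 1, 1, 1] + [0] * 12 + [1] + [0] * 41
bits = encode_block(ac)
ratio = (8 * 8 * 8) / len(bits)
print(len(bits), round(ratio, 1))
```

The 63 AC coefficients plus the DC value fit in 61 bits, against 512 bits for the raw block.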

MPEG Data Transport

- MPEG packages all data into fixed-size 188-byte packets for transport.
- Video or audio payload data is placed in PES packets before being broken up into fixed-length transport packet payloads.
- A PES packet may be much longer than a transport packet, so segmentation is required:
  - The PES header is placed immediately following a transport header.
  - Successive portions of the PES packet are then placed in the payloads of transport packets.
  - Remaining space in the final transport packet payload is filled with stuffing bytes = 0xFF (all ones).
- Each transport packet starts with a sync byte = 0x47.
- In the ATSC US terrestrial DTV VSB transmission system, the sync byte is not processed, but is replaced by a different sync symbol especially suited to RF transmission.
- The transport packet header contains a 13-bit PID (packet ID), which corresponds to a particular elementary stream of video, audio, or other program element.
- PID 0x0000 is reserved for transport packets carrying a program association table (PAT).
- The PAT points to a Program Map Table (PMT), which in turn points to the particular elements of a program.
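Reading the sync byte and the PID from a transport packet takes only a few lines. This is a minimal sketch, assuming the standard MPEG-2 transport header layout, in which the 13-bit PID occupies the low 5 bits of the second byte and all of the third byte; the function name and the returned dictionary are illustrative, and the payload_unit_start flag is a header field not discussed on the slide.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
PAT_PID = 0x0000

def parse_ts_header(packet: bytes) -> dict:
    """Extract the PID and the payload-unit-start flag from one transport packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("a transport packet is exactly 188 bytes long")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing sync byte 0x47")
    pid = ((packet[1] & 0x1F) << 8) | packet[2]      # 13-bit packet ID
    return {
        "pid": pid,
        "payload_unit_start": bool(packet[1] & 0x40),  # set when a new PES packet or table begins
        "is_pat": pid == PAT_PID,
    }

# A hand-made packet: sync byte, PID 0x0000 with payload start, then stuffing bytes.
packet = bytes([0x47, 0x40, 0x00, 0x10]) + bytes([0xFF]) * 184
header = parse_ts_header(packet)
print(header)
```

A demultiplexer would first collect packets with PID 0x0000 to read the PAT, then follow it to the PMT, and only then select the PIDs of the wanted audio and video streams.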

PAT & PMT (BS)

[Figure: PAT and PMT]

MPEG Program Stream (PS) (BS)

Program Streams have variable-length packets with headers.
They are used in data transfers to and from optical and hard disks, which are error free and in which files of arbitrary sizes are expected.
VCD and DVD use Program Streams.

MPEG Transport Stream (vs. Program stream) (BS)

For transmission and digital broadcasting, several programs and their associated PES can be multiplexed into a single Transport Stream.
A Transport Stream differs from a Program Stream in that:
PES packets are further subdivided into short fixed-size packets.
Multiple programs encoded with different clocks can be carried.
How? The Transport Stream has a Program Clock Reference (PCR) mechanism, which allows multiple clocks to be transmitted.
One of these clocks is selected and regenerated at the decoder.
A Single Program Transport Stream (SPTS) is also possible; it may be found between a coder and a multiplexer.

MPEG transport packet:

Adaptation field:
8 bits specify the length of the adaptation field.
The first group of flags consists of eight 1-bit flags: the discontinuity indicator, random access indicator, elementary stream priority indicator, PCR_flag, OPCR_flag, splicing_point_flag, transport_private_data_flag and adaptation_field_extension_flag.
The optional fields are present if indicated by one of the preceding flags.
The remainder of the adaptation field is filled with 0xFF stuffing bytes.

MPEG Transport packet

Adaptation Field:
8 bits specifying the length of the adaptation field.
The first group of flags consists of eight 1-bit flags:
discontinuity_indicator
random_access_indicator
elementary_stream_priority_indicator
PCR_flag
OPCR_flag
splicing_point_flag
transport_private_data_flag
adaptation_field_extension_flag
The optional fields are present if indicated by one of the preceding flags.
The remainder of the adaptation field is filled with stuffing bytes (0xFF, all ones).
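As a rough illustration of the adaptation field layout listed above, the sketch below reads the length byte and unpacks the eight 1-bit flags. The function name `parse_adaptation_field` is hypothetical, and the sketch assumes the flags occupy one byte in the order listed, most significant bit first; the optional fields and stuffing bytes that follow are not decoded.

```python
# Flag names in transmission order, most significant bit first.
FLAG_NAMES = (
    "discontinuity_indicator",
    "random_access_indicator",
    "elementary_stream_priority_indicator",
    "PCR_flag",
    "OPCR_flag",
    "splicing_point_flag",
    "transport_private_data_flag",
    "adaptation_field_extension_flag",
)


def parse_adaptation_field(field: bytes) -> dict:
    """Return the field length and the eight flags as booleans."""
    length = field[0]                      # 8-bit adaptation field length
    flags = {name: False for name in FLAG_NAMES}
    if length > 0:                         # a zero-length field has no flag byte
        flag_byte = field[1]
        for bit, name in zip(range(7, -1, -1), FLAG_NAMES):
            flags[name] = bool(flag_byte & (1 << bit))
    return {"length": length, "flags": flags}


# Illustrative field: length 7, random access + PCR flags set, 6 bytes of PCR.
sample = bytes([7, 0x50]) + bytes(6)
info = parse_adaptation_field(sample)
print([name for name, is_set in info["flags"].items() if is_set])
```

A decoder would go on to read the PCR only when `PCR_flag` is set, which is the sense in which the optional fields are "indicated by one of the preceding flags".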

Demultiplexing an MPEG Transport Stream (TS)

Demultiplexing an MPEG transport stream involves:
1. Finding the PAT by selecting packets with PID = 0x0000
2. Reading the PIDs of the PMTs
3. Reading the PIDs for the elements of the desired program from its PMT (for example, a basic program will have a PID for audio and a PID for video)
4. Detecting packets with the desired PIDs and routing them to the decoders

An MPEG-2 transport stream can carry:
Video streams
Audio streams
Other data
The MPEG-2 transport stream is the packet format for downstream data communication on CATV networks.

Demultiplexing a Transport Stream (TS)


Demultiplexing a transport stream involves:
1. Finding the PAT by selecting packets with PID = 0x0000
2. Reading the PIDs for the PMTs
3. Reading the PIDs for the elements of a desired program from its PMT (for example, a basic program will have a PID for audio and a PID for video)
4. Detecting packets with the desired PIDs and routing them to the decoders

An MPEG-2 transport stream can carry:
Video stream
Audio stream
Any type of data

MPEG-2 TS is the packet format for CATV downstream data communication.
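The four demultiplexing steps can be sketched as follows. This is a simplified illustration under several assumptions: `parse_pat_section` and `select_packets` are hypothetical names, the PAT section layout is taken from the MPEG-2 Systems syntax rather than from the slide, the CRC is not checked, and step 3 (parsing the PMT) is replaced by a hand-written dictionary because PMT parsing involves descriptors that are outside the scope of the slide.

```python
def parse_pat_section(section: bytes) -> dict:
    """Steps 1-2: map program_number -> PMT PID from one PAT section."""
    if section[0] != 0x00:                       # table_id of the PAT
        raise ValueError("not a PAT section")
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    end = 3 + section_length - 4                 # stop before the 4-byte CRC_32
    programs = {}
    for i in range(8, end, 4):                   # one 4-byte entry per program
        program_number = (section[i] << 8) | section[i + 1]
        pid = ((section[i + 2] & 0x1F) << 8) | section[i + 3]
        if program_number != 0:                  # entry 0 is the network PID
            programs[program_number] = pid
    return programs


def select_packets(packets, wanted_pids: dict):
    """Step 4: yield (decoder name, packet) for packets with a wanted PID."""
    for packet in packets:
        pid = ((packet[1] & 0x1F) << 8) | packet[2]
        if pid in wanted_pids:
            yield wanted_pids[pid], packet


# Illustrative PAT section: program 1 has its PMT on PID 0x0100 (CRC zeroed).
pat = bytes([0x00, 0xB0, 0x0D, 0x00, 0x01, 0xC1, 0x00, 0x00,
             0x00, 0x01, 0xE1, 0x00,
             0x00, 0x00, 0x00, 0x00])
pmt_pids = parse_pat_section(pat)

# Step 3 stand-in: PIDs that a parsed PMT might list for the program.
wanted = {0x0101: "video", 0x0102: "audio"}
```

Routing by dictionary lookup keeps step 4 cheap, which matters because every 188-byte packet in the multiplex has to be inspected.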

Timing and buffer control:

Point A: Encoder input, constant rate
Point B: Encoder output, variable rate
Point C: Encoder buffer output, constant rate
Point D: Communication channel + decoder buffer, constant rate
Point E: Decoder input, variable rate
Point F: Decoder output, constant rate

Timing & buffer control

Point A: Encoder input, constant/specified rate
Point B: Encoder output, variable rate
Point C: Encoder buffer output, constant rate
Point D: Communication channel + decoder buffer, constant rate
Point E: Decoder input, variable rate
Point F: Decoder output, constant/specified rate

Timing - Synchronization

The decoder is synchronized with the encoder by time stamps.
The encoder contains a master oscillator and counter, called the System Time Clock (STC) (see the block diagram above).
The STC belongs to one particular program and is the master clock of the video and audio encoders for that program.
Multiple programs, each with its own STC, can be multiplexed into one stream.
A program component may even have no time stamps, but it then cannot be synchronized with other components.
At the encoder input (Point A), the time of occurrence of an input video picture or audio block is marked by sampling the STC.
The total delay of the encoder and decoder buffers is added to the STC, creating the Presentation Time Stamp (PTS).
The PTS is then inserted in the first packet representing the picture or audio block, at Point B.

Timing - Synchronization

The decoder is synchronized with the encoder by time stamps.
The encoder contains a master oscillator and counter, called the System Time Clock (STC). (See previous block diagram.)
The STC belongs to a particular program and is the master clock of the video and audio encoders for that program.
Multiple programs, each with its own STC, can also be multiplexed into a single stream.
A program component can even have no time stamps, but it then cannot be synchronized with other components.
At the encoder input (Point A), the time of occurrence of an input video picture or audio block is noted by sampling the STC.
A constant equal to the total delay of the encoder and decoder buffers is added to the sampled STC value, creating a Presentation Time Stamp (PTS).
The PTS is then inserted in the first of the packet(s) representing that picture or audio block, at Point B.
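The PTS rule above (sampled STC plus a constant buffer delay) amounts to one addition. The sketch below assumes, beyond what the slide states, that time stamps are counted in 90 kHz ticks and wrap at 33 bits, as in MPEG-2 Systems; the function name `make_pts` and the 0.5 s delay are purely illustrative.

```python
PTS_CLOCK_HZ = 90_000      # assumed tick rate of PTS values
PTS_MODULUS = 1 << 33      # assumed 33-bit wrap-around


def make_pts(stc_sample_ticks: int, buffer_delay_seconds: float) -> int:
    """Add the constant encoder + decoder buffer delay to an STC sample."""
    delay_ticks = round(buffer_delay_seconds * PTS_CLOCK_HZ)
    return (stc_sample_ticks + delay_ticks) % PTS_MODULUS


# A picture sampled at STC = 1_000_000 ticks with a 0.5 s total buffer delay.
print(make_pts(1_000_000, 0.5))
```

Because the delay is the same constant for every picture and audio block of the program, the decoder presents them with the same relative timing they had at Point A.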

Timing Synchronization (2)

A Decode Time Stamp (DTS) can optionally be combined into the bit stream; it represents the time at which the data must be taken instantaneously from the decoder buffer and decoded.
DTS and PTS are identical except when B pictures are reordered.
The DTS is used only where reordering is needed.
PTS or DTS is inserted at intervals of at most 700 ms.
In ATSC, a PTS or DTS must be inserted at the start of each coded picture.
In addition, the output of the encoder buffer (Point C) is time stamped with STC values, called:
System Clock Reference (SCR) in a Program Stream.
Program Clock Reference (PCR) in a Transport Stream.
PCR insertion interval: at most 100 ms.
SCR insertion interval: at most 700 ms.
The PCR and/or SCR are used to synchronize the decoder STC with the encoder STC.

Timing Synchronization (2)

A Decode Time Stamp (DTS) can optionally be combined into the bit stream; it represents the time at which the data should be taken instantaneously from the decoder buffer and decoded.
DTS and PTS are identical except in the case of picture reordering for B pictures.
The DTS is only used where it is needed because of reordering.
Whenever DTS is used, PTS is also coded.
PTS (or DTS) must be inserted at intervals of at most 700 ms.
In ATSC, PTS (or DTS) must be inserted at the beginning of each coded picture (access unit).
In addition, the output of the encoder buffer (Point C) is time stamped with System Time Clock (STC) values, called:
System Clock Reference (SCR) in a Program Stream.
Program Clock Reference (PCR) in a Transport Stream.
PCR time stamp interval: at most 100 ms.
SCR time stamp interval: at most 700 ms.
The PCR and/or the SCR are used to synchronize the decoder STC with the encoder STC.
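To make the DTS/PTS distinction concrete, the sketch below reorders a short display sequence into coded order and assigns both stamps. It is a toy model under stated assumptions: one picture is decoded per frame period, presentation lags by one frame period, the frame period is 3600 ticks (25 frame/s at an assumed 90 kHz), and the function names are hypothetical. Under this model B pictures get equal DTS and PTS, while the I and P pictures that were moved ahead of them get a PTS later than their DTS.

```python
FRAME_TICKS = 3600   # one frame period at 25 frame/s in 90 kHz ticks (assumed)


def coded_order(display_types):
    """Move each I/P picture ahead of the B pictures that precede it."""
    coded, waiting_b = [], []
    for index, kind in enumerate(display_types):
        if kind == "B":
            waiting_b.append((index, kind))   # B needs the next anchor first
        else:
            coded.append((index, kind))
            coded.extend(waiting_b)
            waiting_b = []
    return coded + waiting_b


def time_stamps(display_types):
    """Return (kind, display_index, dts, pts) tuples in coded order."""
    stamped = []
    for slot, (index, kind) in enumerate(coded_order(display_types)):
        dts = slot * FRAME_TICKS            # decoded one per frame period
        pts = (index + 1) * FRAME_TICKS     # shown in display order, one period later
        stamped.append((kind, index, dts, pts))
    return stamped


for kind, index, dts, pts in time_stamps("IBBPBBP"):
    print(kind, index, dts, pts)
```

Only the pictures whose two stamps differ would need an explicit DTS, which matches the statement that the DTS is used only where reordering requires it.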

Timing Synchronization (3)

All video and audio streams in the same program must take their time stamps from a common STC so that the video and audio decoders can be synchronized with each other.
The data rate and packet rate on the channel (at the multiplexer output) can be completely asynchronous with the System Time Clock (STC).
PCR time stamps allow programs with different STCs to be multiplexed together while still allowing the STC of each program to be recovered.
If no buffer overflow or underflow occurs, the delay in the buffers and transmission channel for both video and audio is constant.
The encoder input and decoder output run at equal and constant rates.
The delay from encoder input to decoder output is fixed.
If exact synchronization is not needed, the decoder clock can run freely; video frames can be repeated or skipped when necessary to prevent buffer underflow or overflow.

Timing Synchronization (3)

All video and audio streams included in a program must get their time stamps from a common STC so that the video and audio decoders can be synchronized with each other.
The data rate and packet rate on the channel (at the multiplexer output) can be completely asynchronous with the System Time Clock (STC).
PCR time stamps allow synchronization of different multiplexed programs having different STCs while allowing STC recovery for each program.
If there is no buffer underflow or overflow, the delays in the buffers and transmission channel for both video and audio are constant.
The encoder input and decoder output run at equal and constant rates.
The end-to-end delay from encoder input to decoder output is fixed.
If exact synchronization is not required, the decoder clock can be free running; video frames can then be repeated or skipped as necessary to prevent buffer underflow or overflow, respectively.

HDTV (High definition television)

High definition television (HDTV) first came to public attention in 1981, when NHK, the Japanese broadcasting authority, first demonstrated it in the United States.
HDTV is defined by the ITU-R as:
'A system designed to allow viewing at about three times the picture height, such that the system is virtually, or nearly, transparent to the quality or portrayal that would have been perceived in the original scene ... by a discerning viewer with normal visual acuity.'

HDTV (High definition television)

HDTV first came to public attention in 1981, when NHK, the Japanese broadcaster, first demonstrated it in the United States.
HDTV is defined by the ITU-R as:
A system designed so that a viewer with normal vision, at a distance of three times the picture height, perceives the scene with a quality close to that of the original scene.

HDTV (2)

HDTV proposals are for a screen which is wider than the conventional
TV image by about 33%. It is generally agreed that the HDTV aspect
ratio will be 16:9, as opposed to the 4:3 ratio of conventional TV
systems. This ratio has been chosen because psychological tests have
shown that it best matches the human visual field.
It also enables use of existing cinema film formats as additional source
material, since this is the same aspect ratio used in normal 35 mm film.
Figure 16.6(a) shows how the aspect ratio of HDTV compares with that
of conventional television, using the same resolution, or the same
surface area as the comparison metric.
To achieve the improved resolution the video image used in HDTV
must contain over 1000 lines, as opposed to the 525 and 625 provided
by the existing NTSC and PAL systems. This gives a much improved
vertical resolution. The exact value is chosen to be a simple multiple of
one or both of the vertical resolutions used in conventional TV.
However, due to the higher scan rates the bandwidth requirement for
analogue HDTV is approximately 12 MHz, compared to the nominal 6
MHz of conventional TV
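The "about 33%" figure follows directly from the two aspect ratios when the picture height is held equal. The short check below uses exact fractions; the variable names are illustrative only.

```python
from fractions import Fraction

hdtv = Fraction(16, 9)          # HDTV aspect ratio (width / height)
conventional = Fraction(4, 3)   # conventional TV aspect ratio

# At equal picture height, the width ratio is the ratio of the aspect ratios.
width_gain = hdtv / conventional - 1
print(f"extra width at equal height: {float(width_gain):.0%}")
```

The result is exactly one third, which is where the rounded 33% comes from.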


HDTV (2)

HDTV requires a screen about 30% wider than a conventional TV screen. The aspect ratio will be 16:9, unlike the 4:3 ratio of conventional TV systems.
This ratio was chosen because psychological tests showed that it best matches human vision.
It also allows existing cinema film formats to be used, since this is also the aspect ratio used for normal 35 mm film.
To obtain higher resolution, HDTV pictures must contain more than 1000 lines, unlike the current NTSC and PAL systems with only 525 or 625 lines.
This gives higher vertical resolution. The exact value is chosen as a multiple of one of the resolutions of conventional TV.
However, because of the higher scan rate, the bandwidth required for analogue HDTV is approximately 12 MHz, compared with 6 MHz for conventional TV.

HDTV (3)

The introduction of a non-compatible TV transmission format for HDTV would require the viewer either to buy a new receiver, or to buy a converter to receive the picture on their old set.
The initial thrust in Japan was towards an HDTV format which is compatible with conventional TV standards, and which can be received by conventional receivers, with conventional quality.
However, to get the full benefit of HDTV, a new wide screen, high resolution receiver has to be purchased.
One of the principal reasons that HDTV is not already common is that a general standard has not yet been agreed. The 26th CCIR plenary assembly recommended the adoption of a single, worldwide standard for high definition television.
Unfortunately, Japan, Europe and North America are all investing significant time and money in their own systems based on their own, current, conventional TV standards and other national considerations.

HDTV (3)

Introducing a non-compatible TV transmission format for HDTV would require viewers either to buy a new receiver or to buy a converter to receive the picture on their old TV.
The trend in Japan was towards an HDTV format compatible with existing TV systems, which can be received by conventional TVs with conventional quality.
However, to get the full benefit of HDTV, a wide screen and a high definition receiver have to be purchased.
One of the main reasons HDTV is not yet common is that a common standard has not yet been agreed.
The 26th CCIR assembly recommended a single worldwide standard for high definition TV.
Nevertheless, Japan, Europe and North America have been investing money and time in developing their own systems based on their own conventional TV standards.

H261- H263

The H.261 algorithm was developed for the purpose of image transmission rather than image storage.
It is designed to produce a constant output rate of p x 64 kbit/s, where p is an integer in the range 1 to 30.
This allows transmission over a digital network or data link of varying capacity.
It also allows transmission over a single 64 kbit/s digital telephone channel for low quality video-telephony, or at higher bit rates for improved picture quality.
The basic coding algorithm is similar to that of MPEG in that it is a hybrid of motion compensation, DCT and straightforward DPCM (intra-frame coding mode), without the MPEG I, P, B frames.
The DCT operation is performed at a low level on 8 x 8 blocks of error samples from the predicted luminance pixel values, with sub-sampled blocks of chrominance data.
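The p x 64 kbit/s rule can be written as a one-line function. The name `h261_rate_kbits` is hypothetical; the range check simply enforces the 1 to 30 limit stated above.

```python
def h261_rate_kbits(p: int) -> int:
    """Output rate in kbit/s for an H.261 multiplier p in the range 1..30."""
    if not 1 <= p <= 30:
        raise ValueError("p must be an integer from 1 to 30")
    return p * 64


for p in (1, 2, 6, 30):
    print(p, h261_rate_kbits(p), "kbit/s")
```

With p = 30 the rate is 1920 kbit/s, so the upper end of the range sits just under 2 Mbit/s.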

H261- H263

The H.261 algorithm was developed for image transmission rather than image storage.
It is designed to produce a constant output rate of p x 64 kbit/s, where p is an integer from 1 to 30.
This allows transmission over a digital network or data link of varying capacity.
It also allows transmission over a single 64 kbit/s digital telephone channel for low quality videophone, or at higher bit rates for higher picture quality.
The basic coding algorithm is similar to MPEG: a hybrid of motion compensation, DCT and simple DPCM, without the MPEG I, P, B frame structure.
The DCT is performed at a low level on 8 x 8 blocks of prediction errors from the predicted luminance pixel values, with sub-sampled blocks of chrominance data.

H261-H263 (2)


H261-H263 (3)

H.261 is widely used on 176 x 144 pixel images.
The ability to select a range of output rates for the algorithm allows it to be used in different applications.
Low output rates (p = 1 or 2) are only suitable for face-to-face (videophone) communication. H.261 is thus the standard used in many commercial videophone systems such as the UK BT/Marconi Relate 2000 and the US ATT 2500 products.
Video-conferencing would require a greater output data rate (p > 6) and might go as high as 2 Mbit/s for high quality transmission with larger image sizes.
A further development of H.261 is H.263, for lower fixed transmission rates.
This deploys arithmetic coding in place of the variable length coding (see the H.261 diagram); with other modifications, the data rate is reduced to only 20 kbit/s.

H261-H263 (3)

H.261 is widely used with 176 x 144 pixel images.
The ability to choose from a wide range of output rates allows it to be used in many different applications.
Low output rates (p = 1 or 2) are only suitable for face-to-face communication. H.261 is therefore used in commercial videophone systems such as the UK BT/Marconi Relate 2000 and the US ATT 2500 products.
Video-conferencing requires a higher output data rate (p > 6) and may run as high as 2 Mbit/s for high quality transmission with larger image sizes.
A further development of H.261 is H.263, for lower transmission rates.
H.263 uses arithmetic coding in place of VLC (see the H.261 diagram) and, with some other improvements, reduces the data rate to 20 kbit/s.

Model Based Coding (MBC)

At the very low bit rates (20 kbit/s or less) associated with video telephony, the requirements for image transmission stretch the compression techniques described earlier to their limits.
In order to achieve the necessary degree of compression they often require reduction in spatial resolution or even the elimination of frames from the sequence.
Model based coding (MBC) attempts to exploit a greater degree of redundancy in images than current techniques, in order to achieve significant image compression but without adversely degrading the image content information.
It relies upon the fact that image quality is largely subjective.
Provided that the appearance of scenes within an observed image is kept at a visually acceptable level, it may not matter that the observed image is not a precise reproduction of reality.

Model Based Coding (MBC)

At very low bit rates, 20 kbit/s or less, in videophone applications, the compression techniques described earlier are pushed to their limits.
To reach the necessary degree of compression, the resolution has to be reduced or frames even dropped from the image sequence.
Model based coding (MBC) tries to exploit redundancy in images to a greater degree than current techniques, to achieve a high compression ratio without losing too much of the image information.
It relies on the fact that image quality depends on subjective factors.
Provided that the appearance of the scene in an observed image has acceptable quality, it is hard to notice that the observed image is not an exact reproduction of the real scene.

Model Based Coding (2)

One MBC method for producing an artificial image of a head sequence utilizes a feature codebook where a range of facial expressions, sufficient to create an animation, are generated from sub-images or templates which are joined together to form a complete face.
The most important areas of a face, for conveying an expression, are the eyes and mouth, hence the objective is to create an image in which the movement of the eyes and mouth is a convincing approximation to the movements of the original subject.
When forming the synthetic image, the feature template vectors which form the closest match to those of the original moving sequence are selected from the codebook and then transmitted as low bit rate coded addresses.
By using only 10 eye and 10 mouth templates, for instance, a total of 100 combinations exists, implying that only a 7-bit codebook address need be transmitted (2^6 = 64 is too few for 100 combinations; 2^7 = 128 suffices).
It has been found that there are only 13 visually distinct mouth shapes for vowel and consonant formation during speech.
However, the number of mouth sub-images is usually increased, to include intermediate expressions and hence avoid step changes in the image.
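The size of a codebook address follows from the number of entries it has to distinguish. The sketch below computes the smallest whole number of bits for a given entry count; `address_bits` is a hypothetical helper name, and a fixed-length binary address is assumed.

```python
def address_bits(n_entries: int) -> int:
    """Smallest number of bits that can index n_entries distinct items."""
    if n_entries < 1:
        raise ValueError("a codebook needs at least one entry")
    # (n - 1).bit_length() is the width of the largest index, n - 1.
    return max(1, (n_entries - 1).bit_length())


eye_templates, mouth_templates = 10, 10
combinations = eye_templates * mouth_templates
print(combinations, "combinations need", address_bits(combinations), "bits")
```

For 100 eye and mouth combinations this gives 7 bits, since 64 addresses would not cover them all.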

Model Based Coding (2)

One MBC method for producing an artificial image of a head uses a codebook holding a range of facial features to create an animation, generated from sub-images or ready-made templates joined together to form a complete face.
The most important areas of an expressive face are the eyes and mouth, so for the generated image to be convincing, the movement of the eyes and mouth must closely resemble that of the real person.
When creating a synthetic image, the feature vectors closest to the original moving sequence are selected from the codebook and transmitted as coded addresses at a very low rate.
By using only 10 eye templates and 10 mouth templates, a total of 100 combinations exists, so only a 7-bit codebook address needs to be transmitted.
It has been found that there are only 13 mouth shapes for pronouncing vowels and consonants during speech.
However, the number of mouth sub-images is usually increased to cover intermediate expressions and so avoid abrupt changes in the image.

Model Based Coding (3)

Another common way of representing objects in three-dimensional computer graphics is by a net of interconnecting polygons.
A model is stored as a set of linked arrays which specify the coordinates of each polygon vertex, with the lines connecting the vertices together forming each side of a polygon.
To make realistic models, the polygon net can be shaded to reflect the presence of light sources.
The wire-frame model [Welch 1991] can be modified to fit the shape of a person's head and shoulders. The wire-frame, composed of over 100 interconnecting triangles, can produce subjectively acceptable synthetic images, provided that the frame is not rotated by more than 30° from the full-face position.
The model (see the Figure) uses smaller triangles in areas associated with high degrees of curvature where significant movement is required.
Large flat areas, such as the forehead, contain fewer triangles.
A second wire-frame is used to model the mouth interior.
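The "linked arrays" representation can be sketched as one array of vertex coordinates and one array of triangles that refer to vertices by index. Everything below is a toy illustration: the four-vertex mesh, the axis convention (y vertical) and the function name `rotate_about_vertical` are assumptions, not the wire-frame model of the slide, which has over 100 triangles.

```python
import math

# Vertex array: (x, y, z) coordinates, y pointing up (assumed convention).
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]

# Triangle array: each entry links three vertices by their index.
triangles = [(0, 1, 2), (1, 3, 2)]


def rotate_about_vertical(points, degrees):
    """Rotate every vertex about the vertical (y) axis; triangles are untouched."""
    angle = math.radians(degrees)
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]


turned = rotate_about_vertical(vertices, 30)   # the stated limit for this model
for a, b, c in triangles:
    print(turned[a], turned[b], turned[c])
```

Because the triangles store indices rather than coordinates, moving or rotating the head only rewrites the vertex array.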

Model Based Coding (3)

Another way of representing objects in three-dimensional computer graphics is by a net of interconnecting polygons.
A model is stored as a set of linked arrays that specify the coordinates of the polygon vertices, with the lines joining the vertices forming the sides of the polygons.
To make realistic models, the polygon net can be shaded to show the presence of light sources.
The wire-frame model [Welch 1991] can be modified to fit the shape of a person's head and shoulders. The wire-frame, made up of more than 100 interconnected triangles, can produce subjectively acceptable synthetic images, provided that the frame is not rotated by more than 30° from the full-face position.
The model in the figure uses smaller triangles in areas of high curvature, where significant movement occurs.
Large flat areas, such as the forehead, have few triangles.
A second wire-frame is used to model the inside of the mouth.

Model based coding (4)

A synthetic image is created by texture mapping detail from an initial full-face source image over the wire-frame. Facial movement can be achieved by manipulation of the vertices of the wire-frame.
Head rotation requires the use of simple matrix operations upon the coordinate array. Facial expression requires the manipulation of the features controlling the vertices.
This model based feature codebook approach suffers from the drawback of codebook formation.
This has to be done off-line and, consequently, the image is required to be prerecorded, with a consequent delay.
However, the actual image sequence can be sent at a very low data rate. For a codebook with 128 entries where 7 bits are required to code each mouth, a 25 frame/s sequence requires less than 200 bit/s to code the mouth movements.
When it is finally implemented, rates as low as 1 kbit/s are confidently expected from MBC systems, but they can only transmit image sequences which match the stored model, e.g. head and shoulders displays.
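The "less than 200 bit/s" figure for the mouth can be checked with a short calculation: one fixed-length codebook address per frame. The function name `mouth_bit_rate` is hypothetical, and the sketch assumes one address is sent for every frame with no further entropy coding.

```python
def mouth_bit_rate(codebook_entries: int, frames_per_second: int) -> int:
    """Bits per second needed to send one codebook address per frame."""
    bits_per_address = max(1, (codebook_entries - 1).bit_length())
    return bits_per_address * frames_per_second


rate = mouth_bit_rate(codebook_entries=128, frames_per_second=25)
print(rate, "bit/s")
```

Seven bits at 25 frame/s gives 175 bit/s, consistent with the bound quoted above.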

Model based coding (4)

A synthetic image is created by mapping the detail (texture) from an initial full-face source image onto the wire-frame. Facial movement can be produced by moving the vertices of the frame.
Head rotation requires simple matrix operations on the coordinate array. Facial expression requires moving the vertices that control the features.
This codebook-based modelling approach has a drawback arising from the codebook formation.
It has to be done off-line, requires the image to be prerecorded, and therefore causes delay.
However, the actual image sequence can be sent at a very low data rate. With a 128-entry codebook in which each mouth is coded with 7 bits, a 25 frame/s sequence requires less than 200 bit/s to code the mouth movements.
When fully developed, MBC systems can reach rates as low as 1 kbit/s, but they can only transmit image sequences that match the stored models, for example head and shoulders displays.

Key points:

JPEG coding mechanism: DCT / Zigzag Scanning / Adaptive Quantization / VLC
MPEG layered structure:
Pixel, Block, Macroblock, Field DCT Coding / Frame DCT Coding, Slice, Picture, Group of Pictures (GOP), Sequence, Packetized Elementary Stream (PES)
MPEG compression mechanism:
Prediction
Motion compensation
Scanning
YCbCr formats (4:4:4, 4:2:0, etc.)
Profiles @ Level
I, P, B pictures & reordering
Encoder/Decoder process & block diagram
MPEG Data transport
MPEG Timing & Buffer control:
STC/SCR/DTS
PCR/PTS

Key points

JPEG coding mechanism: DCT, zigzag scanning, adaptive quantization, VLC
MPEG layered structure:
Pixel, Block, Macroblock, Field DCT coding / Frame DCT coding, Slice, Picture, GOP, Sequence, PES
MPEG compression mechanism:
Prediction
Motion compensation
Scanning
YCbCr formats (4:4:4, 4:2:0, etc.)
Profiles @ Level
I, P, B pictures and reordering
Encoding/decoding process and block diagram
MPEG data transport
Timing and buffer control:
STC/SCR/DTS
PCR/PTS

Technical terms

Macro blocks
HVS = Human Visual System
GOP = Group of Pictures
VLC = Variable Length Coding/Coder
IDCT/DCT = (Inverse) Discrete Cosine Transform
PES = Packetized Elementary Stream
MP@ML = Main profile @ Main Level
PCR = Program Clock Reference
SCR = System Clock Reference
STC = System Time Clock
PTS = Presentation Time Stamp
DTS = Decode Time Stamp
PAT = Program Association Table
PMT = Program Map Table


Technical terms

Macroblock
HVS = Human Visual System
GOP = Group of Pictures
VLC = Variable Length Coding/Coder
IDCT/DCT = (Inverse) Discrete Cosine Transform
PES = Packetized Elementary Stream
MP@ML = Main Profile @ Main Level
PCR = Program Clock Reference
SCR = System Clock Reference
STC = System Time Clock
PTS = Presentation Time Stamp
DTS = Decode Time Stamp
PAT = Program Association Table
PMT = Program Map Table
