Ban Dich MultiMedia

Cng ngh Multimedia
Khi qut
Gii thiu
Chng 1: Nn tng k thut nn
Chng 2: Cc k thut multimedia
Jpeg
Mpeg-1/Mpeg-2 Audio&Video
Mpeg-4
Mpeg-7 (Gii thiu vn tt)
HDTV (Gii thiu vn tt)
H261/H263 (Gii thiu vn tt)
Model-Based coding (Gii thiu vn tt)
Chng 3: Mng multimedia
9/14/2006
Nguyen Chan Hung Hanoi University of Technology
Multimedia Technology
Overview
Introduction
Chapter 1: Background of compression
techniques
Chapter 2: Multimedia technologies
JPEG
MPEG-1/MPEG-2 Audio & Video
MPEG-4
MPEG-7 (brief introduction)
HDTV (brief introduction)
H261/H263 (brief introduction)
Model-based coding (MBC) (brief introduction)
Chapter 3: Multimedia Network

Gii thiu
Tm quan trng ca cc k thut Multimedia: -> Multimedia c

khp ni
Trong PC:
Trong truyn hnh v cc thit b in t dn dng:
Real player, Quicktime, Media

m nhc, hnh nh min ph trn internet (mp2, mp3, mp4, asf, ra, ram, mid,
DIVX, v..v...)
Hi tho trc tuyn m thanh, hnh nh
Dch v qung co trn web, truyn s liu
Gio dc t xa.
Y hc t xa
........
DVB-T/DVB-C/DVB-S (Digital Video Broadcastsing-Terrestrial/Cable/Satellite _
Truyn hnh s mt t/cp/v tinh) -> biu din MPEG-2 cht lng cao hn
hn truyn hnh tng t truyn thng.
Truyn hnh tng tc -> Cc ng dng internet trn truyn hnh (Mail,Web, Ecommerce_thng mi in t) -> khng cn i PC khi ng, tt my.
Cc u c CD/VCD/DVD/Mp3
ng thi xut hin trn cc thit b cm tay ( TD th h 3G, PDA

khng dy)
Introduction
The importance of multimedia technologies: multimedia is everywhere!
On PCs:
Real Player, QuickTime, Windows Media.
Music and video are free on the Internet (mp2, mp3, mp4, asf, mpeg, mov, ra, ram, mid, DivX, etc.)
Video/audio conferencing.
Webcast and streaming applications.
Distance learning (or tele-education).
Tele-medicine.
Tele-xxx (let's imagine!)
On TVs and other home electronic devices:
DVB-T/DVB-C/DVB-S (Digital Video Broadcasting - Terrestrial/Cable/Satellite) -> shows the superior quality of MPEG-2 over traditional analog TV.
Interactive TV -> Internet applications (mail, Web, e-commerce) on a TV -> no need to wait for a PC to start up and shut down.
CD/VCD/DVD/MP3 players.
Also appearing in handheld devices (3G mobile phones, wireless PDAs).
Gii thiu (2)
Mng Multimedia
Internet c thit k vo nhng nm 60 cho cc

mng tc thp vi nhng ng dng vn bn
nhm chn. -> tr cao, jitter cao.
-> Nhng ng dng multimedia yu cu c s bin
i mnh m ca c s h tng internet.
Nhiu c cu t chc c nghin cu v trin khai
h tr cho th h multimedia internet tip theo.
(VD: intServ, DiffServ)
Trong tng lai, tt c mi tivi (v PC) s kt ni
internet v bt sng min ph vi hng triu trm
pht sng trn ton th gii.
Hin ti, mng multimedia chy trn ATM ( c),
IPv4, v tng lai l IPv6 -> nn s bo m c
cht lng dch v QoS (Quality of Service)
Introduction (2)
Multimedia network
The Internet was designed in the 1960s for low-speed internetworks with boring text applications -> high delay, high jitter.
-> Multimedia applications require drastic modifications of the Internet infrastructure.
Many frameworks are being investigated and deployed to support the next-generation multimedia Internet (e.g. IntServ, DiffServ).
In the future, all TVs (and PCs) will be connected to the Internet and freely tuned to any of millions of broadcast stations all over the world.
At present, multimedia networks run over ATM (almost obsolete) and IPv4, and in the future over IPv6 -> this should guarantee QoS (Quality of Service).
Chng 1: Nn tng k thut nn
Ti sao phi nn ?
H s nn hay t l nn
Trong truyn thng: thu hp di thng trong cc ng

dng mng multimedia nh streaming, video theo yu cu
VOD (video on demand), internet phone.
Cc vt cha k thut s (VCD, DVD, bng v..v..) -> gim
kch c, gim gi c, tng dung lng v cht lng ct
gi m thanh, hnh nh.
T l gia d liu ngun v d liu nn (VD: 10:1)
2 loi nn:
Nn khng tn hao
Nn tn hao
Chapter 1: Background of compression techniques
Why compression?
For communication: reduce bandwidth in multimedia network applications such as streaming media, Video-on-Demand (VOD), Internet phone.
Digital storage (VCD, DVD, tape, etc.) -> reduce size and cost, increase media capacity and quality.
Compression factor or compression ratio
Ratio between the source data and the compressed data (e.g. 10:1).
2 types of compression:
Lossless compression
Lossy compression
2.1. Ni dung thng tin v d tha
Ni dung thng tin:
Entropy l i lng o ca ni dung thng tin. Entropy

quy nh gii hn di ca tc bit hay dng d liu.
Tn hiu cng nhiu thng tin th entropy cng cao

Nn tn hao th lm gim entropy cn nn khng tn hao
th khng
D tha thng tin:
-> Biu din bi bits/n v ngun u ra (nh bits/pixel)
L s khc nhau gia tc thng tin v tc bit

Thng thng tc thng tin thp hn tc bit rt nhiu
Nn l loi b s d tha
Information content and redundancy
Information rate
Entropy is the measure of information content.
Expressed in bits/source output unit (such as bits/pixel).
The more information in the signal, the higher the entropy.
Lossy compression reduces entropy while lossless compression does not.
Redundancy
The difference between the information rate and the bit rate.
Usually the information rate is much less than the bit rate.
Compression eliminates the redundancy.
2.2. Entropy (supplement 1)
For a discrete source $X$ with a finite alphabet of $N$ symbols $(x_0, \ldots, x_{N-1})$ and a probability mass function $p(x)$, the entropy of the source in bits/symbol is given by

$$H(X) = -\sum_{i=0}^{N-1} p(x_i) \log_2 p(x_i)$$

and measures the average number of bits/symbol required to describe the source.
Such a discrete source is encountered in image compression, in which the
acquired digital image pixels can take on only a finite number of values as
determined by the number of bits used to represent each pixel.
It is easy to show (using the method of Lagrange multipliers) that the
uniform distribution achieves maximum entropy, given by H(X) = log2 N.
A uniformly distributed source can be considered to have maximum
randomness when compared with sources having other distributions
Combining this with the intuitive English text example mentioned previously,
it is apparent that entropy provides a measure of the compressibility of a
source. High entropy indicates more randomness; hence the source
requires more bits on average to describe a symbol.
Entropy (supplement 2)
Calculating Entropy: An Example
An example illustrates the computation of entropy and the difficulty in determining the entropy of a fixed-length signal. Consider the four-point signal [3/4 1/4 0 0].
There are three distinct values (or symbols) in this signal, with probabilities 1/4, 1/4, and 1/2 for the symbols 3/4, 1/4, and 0, respectively. The entropy of the signal is then computed as

$$H = -\left(\tfrac{1}{4}\log_2\tfrac{1}{4} + \tfrac{1}{4}\log_2\tfrac{1}{4} + \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = 1.5 \text{ bits/symbol}$$

This indicates that a variable-length code requires 1.5 bits/symbol on average to represent this source.
In fact, a variable-length code that achieves this entropy is [10 11 0] for the symbols [3/4 1/4 0].
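The entropy calculation for the four-point signal can be checked numerically. The following is a minimal sketch, not part of the original slides; the function names `entropy` and `average_code_length` are chosen here for illustration only.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits/symbol; zero-probability symbols contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def average_code_length(probabilities, codewords):
    """Expected codeword length in bits/symbol."""
    return sum(p * len(c) for p, c in zip(probabilities, codewords))

# Symbols 3/4, 1/4 and 0 of the four-point signal [3/4 1/4 0 0]
probs = [0.25, 0.25, 0.5]
codes = ["10", "11", "0"]

print(entropy(probs))                     # 1.5
print(average_code_length(probs, codes))  # 1.5
print(entropy([1 / 8] * 8))               # 3.0, i.e. log2(N) for a uniform source with N = 8
```

The variable-length code [10 11 0] reaches the entropy exactly here because every probability is a power of 1/2.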
2.3. Nn khng tn hao
D liu gii m ging ht d liu ngun
VD: Cc file u ra ca cc chng trnh tin ch

nh pkzip hay Gzip
H s nn khong 2:1 5:1 (ty theo d tha
thng tin)
Khng th bo m 1 t l truyn c nh -> v tc

d liu u ra bin i -> ny sinh cc vn
cho c cu ghi v truyn thng.
Lossless Compression
The data from the decoder is identical to the source data.
Example: archives resulting from utilities such as pkzip or gzip.
Compression factor is around 2:1 to 5:1, depending on the redundancy in the data.
Cannot guarantee a fixed compression ratio -> the output data rate is variable -> problems for recording mechanisms or communication channels.
2.4. Nn tn hao:
D liu gii nn khc dliu ngun nhng s khc

bit khng th phn bit c r rng bng tai
hoc mt thng.
Ph hp vi m thanh, hnh nh nn.

H s nn cao hn so vi nn khng tn hao (ln ti
100:1)
Da trn nhng kin thc v s nhn thc v th

gic v thnh gic
C thn nh 1 h s nn c nh
Lossy Compression
The data from the expander is not identical to the source data, but the difference cannot be distinguished by ear or by eye.
Suitable for audio and video compression.
Compression factor is much higher than that of lossless compression (up to 100:1).
Based on an understanding of psychoacoustic and psychovisual perception.
Can be forced to operate at a fixed compression factor.
2.5. Qu trnh nn:
Truyn thng (gim chi ph kt ni d liu)
D liu -> B nn (m ho) -> knh truyn dn -> b

gin (gii m) -> d liu
C cu ghi (tng thi gian pht li: t l vi h s

nn)
D liu -> nn (m ho) -> thit b cha (bng, a,

Ram ...) -> b gin (gii m) -> D liu
Process of Compression
Communication (reduce the cost of the data link):
Data -> Compressor (coder) -> transmission channel -> Expander (decoder) -> Data'
Recording (extend playing time in proportion to the compression factor):
Data -> Compressor (coder) -> Storage device (tape, disk, RAM, etc.) -> Expander (decoder) -> Data'
2.6. Ly mu v lng t ho:
Ti sao ly mu?
PCM (Pulse code modulation) - iu xung m:
My tnh khng th x l trc tip tn hiu tng t

Ly mu tn hiu tng t tc khng i v s dng mt s bit
khng i (thng l 8 hay 16) biu din cc mu.
Tc bit = tc ly mu * s bit/mu
Lng t ho:
nh x cc tn hiu tng t ly mu (c chnh xc v

hn) sang cc mc ri rc ( chnh xc hu hn)
Biu din mi mc ri rc bng 1 s.
Sampling and quantization
Why sampling?
Computers cannot process analog signals directly.
PCM (pulse code modulation)
Sample the analog signal at a constant rate and use a fixed number of bits (usually 8 or 16) to represent each sample.
bit rate = sampling rate * number of bits per sample
Quantization
Map the sampled analog signal (generally of infinite precision) to discrete levels (finite precision).
Represent each discrete level with a number.
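Sampling, quantization and the bit-rate formula can be sketched in a few lines. This is an illustrative sketch only: the uniform quantizer over the range [-1, 1], the 1 kHz test tone and the function names are assumptions made here, not taken from the slides.

```python
import math

def bit_rate(sampling_rate_hz, bits_per_sample):
    """bit rate = sampling rate * number of bits per sample"""
    return sampling_rate_hz * bits_per_sample

def quantize(sample, bits):
    """Map a sample in [-1.0, 1.0] to one of 2**bits integer levels."""
    levels = 2 ** bits
    clipped = max(-1.0, min(1.0, sample))
    # Scale to [0, levels - 1] and round to the nearest level.
    return min(levels - 1, int((clipped + 1.0) / 2.0 * (levels - 1) + 0.5))

def pcm_encode(signal, sampling_rate_hz, duration_s, bits):
    """Sample signal(t) at a constant rate and quantize every sample."""
    sample_count = round(sampling_rate_hz * duration_s)
    return [quantize(signal(n / sampling_rate_hz), bits) for n in range(sample_count)]

def tone(t):
    """A 1 kHz sine wave standing in for the analog input."""
    return math.sin(2 * math.pi * 1000 * t)

codes = pcm_encode(tone, 8000, 0.001, 8)  # 1 ms of signal at 8 kHz, 8 bits/sample
print(codes)
print(bit_rate(8000, 8))  # 64000 bits per second
```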
2.7. M ho d on:
D on:
Dng cc mu trc c lng mu hin thi.

i vi hu ht tn hiu, s khc nhau ca gi tr d on vi gi
tr thc t l nh -> ta c th dng s bit nh hn m ho s
sai khc trong khi vn duy tr c cng 1 chnh xc.
Gi i sai khc ca mu vi gi tr don c to ra t cc
mu trc.
Nhiu l hon ton khng th d on c
Hu ht cc Codec yu cu d liu phi c x l trc, nu

khng Codec s hot ng km khi c nhiu.
Predictive Coding (supplement)
In predictive coding, rather than directly coding the data itself, the coded data consists of
a difference signal formed by subtracting a prediction of the data from the data
itself.
The prediction for the current sample is usually formed using past data. A predictive
encoder and decoder are shown in Figure, with the difference signal given by d. If the
internal loop states are initialized to the same values at the beginning of the signal, then y
= x.
If the predictor is ideal at removing redundancy, then the difference signal contains
only the new information at each time instant that is unrelated to previous data.
This new information is sometimes referred to as the innovation, and d is called the
innovations process. If predictive coding is used, an appropriate predictor must be
determined.
Predictive coding
Prediction
Use previous sample(s) to estimate the current sample.
For most signals, the difference between the predicted and actual values is small -> a smaller number of bits can be used to code the difference while maintaining the same accuracy.
Noise is completely unpredictable
Most codecs require the data to be preprocessed; otherwise they may perform badly when the data contains noise.
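As a rough illustration of prediction from previous samples, the sketch below uses the simplest possible predictor, "the current sample equals the previous one", and codes only the difference. The predictor choice and the function names are assumptions made for this example; real codecs use more elaborate predictors.

```python
def dpcm_encode(samples):
    """Replace each sample by its difference from the predicted value."""
    differences = []
    prediction = 0  # encoder and decoder start from the same state
    for sample in samples:
        differences.append(sample - prediction)
        prediction = sample  # predictor: next sample equals this one
    return differences

def dpcm_decode(differences):
    """Rebuild the samples by adding each difference to the prediction."""
    samples = []
    prediction = 0
    for difference in differences:
        sample = prediction + difference
        samples.append(sample)
        prediction = sample
    return samples

samples = [100, 102, 105, 105, 104, 101]
differences = dpcm_encode(samples)
print(differences)               # [100, 2, 3, 0, -1, -3]
print(dpcm_decode(differences))  # [100, 102, 105, 105, 104, 101]
```

After the first sample, the differences are small numbers that need fewer bits than the original values.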
2.8. M ho thng k: M Huffman
Gn m ngn cho mu c xc sut xut hin cao

v gn m di cho mu t xut hin hn
Sgn bit da trn s thng k ca d liu
ngun.
Thng k d liu ngun c thc hin trc qu
trnh gn bit.
Cn gi l VLC Variable Length Coding
(Mt v d v Huffman code) M Morse..
Statistical coding: the Huffman code
Assign short codes to the most probable data patterns and long codes to the less frequent data patterns.
Bit assignment is based on the statistics of the source data.
The statistics of the data should be known prior to the bit assignment.
Also called VLC (Variable Length Coding).
(A familiar example of a variable-length code: Morse code.)
2.9. Nhc im ca nn:
D gy li d liu
i hi yu cu che giu i vi cc ng dng thi

gian thc
Nn loi b phn d tha tuy nhin nhng phn ny

li l yu t cn thit ngn cho d liu khng b li.
Cn thm m sa li, do cng thm phn d tha

vo d liu nn.
Mo nhn to (Artifact):
Xut hin khi m ho loi b 1 phn entropy

H s nn cng cao cng c nhiu mo nhn to.
Drawbacks of compression
Sensitive to data errors
Concealment is required for real-time applications.
Compression eliminates the redundancy that is essential to making data resistant to errors.
Error correction codes are required, which adds redundancy back to the compressed data.
Artifacts
Artifacts appear when the coder eliminates part of the entropy.
The higher the compression factor, the more artifacts.
2.10. Mt v d v m ho: Tp hp cc im
mu.
Trong 1 tm nh, gi tr im nh c tp hp trong
vi cc i.
Mi tp hp i din cho 1 vng mu ca 1 i tng
trong nh (v d: bu tri xanh)
Qu trnh m ho:
Chia gi tr im nh thnh 1 s lng gii hn ca cc tp hp

d liu. (VD: tp hp cc im nh ca bu tri xanh hay ng
c xanh)
Gi thng tin ca tm nh bao gm mu chnh ca mi tp hp
v 1 con s nhn dng cho mi tp hp.
Vi mi im nh, truyn i:
Mu trung bnh ca vng mu m n gn nht

S khc nhau ca n so vi tp hp mu trung bnh ( -> c th
c m ho gim d tha khi m cc s sai khc gn nh
nhau) -> c th d on
A coding example: Clustering color pixels
In an image, pixel values are clustered in several peaks.
Each cluster represents the color range of one object in the image (e.g. blue sky).
Coding process:
1. Separate the pixel values into a limited number of data clusters (e.g. clustered pixels of sky blue or grass green).
2. Send the average color of each cluster and an identifying number for each cluster as side information.
3. Transmit, for each pixel:
The number of the average cluster color that it is closest to.
Its difference from that average cluster color (-> can be coded to reduce redundancy, since the differences are often similar) -> prediction.
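Step 3 of the coding process can be sketched as follows. For simplicity the example treats pixel values as single numbers rather than color triples and takes the average cluster colors as given; both simplifications, and the function names, are assumptions made for this illustration.

```python
def encode_pixels(pixels, cluster_colors):
    """Return one (cluster number, difference) pair per pixel value."""
    coded = []
    for value in pixels:
        # Number of the average cluster color that this pixel is closest to.
        number = min(range(len(cluster_colors)),
                     key=lambda k: abs(value - cluster_colors[k]))
        coded.append((number, value - cluster_colors[number]))
    return coded

def decode_pixels(coded, cluster_colors):
    """Rebuild each pixel from its cluster color plus the difference."""
    return [cluster_colors[number] + difference for number, difference in coded]

cluster_colors = [40, 200]         # side information: average color of each cluster
pixels = [38, 41, 203, 199, 40]
coded = encode_pixels(pixels, cluster_colors)
print(coded)                       # [(0, -2), (0, 1), (1, 3), (1, -1), (0, 0)]
print(decode_pixels(coded, cluster_colors) == pixels)  # True
```

The differences stay close to zero, which is what makes them cheap to code afterwards.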
2.11. M ho vi sai khung:
M ho vi sai khung = d on t khung hnh

trc .
1 khung hnh c cha trong b m ho so
snh vi khung hin ti -> gy ra tr 1 khung
Vi nh tnh:
Ch cn gi d liu ca 1 khung u tin

Ton b sai s d on sau c gi tr 0
Thnh thong truyn li khung cho php bn nhn (nu
mi c bt) c c im khi u
-> FDC gim thng tin ca nh tnh nhng li

st li kh nhiu d liu cho nh ng (VD: mt
chuyn ng ca camera)
Frame-Differential Coding
Frame-Differential Coding (FDC) = prediction from a previous video frame.
A video frame is stored in the encoder for comparison with the present frame -> causes an encoding latency of one frame time.
For still images:
Data needs to be sent only for the first instance of a frame.
All subsequent prediction error values are zero.
Retransmit the frame occasionally to allow receivers that have just been turned on to have a starting point.
-> FDC reduces the information for still images, but leaves significant data for moving images (e.g. a movement of the camera).
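The behaviour described for still and moving images can be seen on a toy example. The sketch below is illustrative only; the tiny 2 x 4 frames and the function names are made up for this purpose.

```python
def frame_difference(previous, current):
    """Prediction error when the previous frame is used as the prediction."""
    return [[c - p for p, c in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(previous, current)]

def reconstruct(previous, difference):
    """Decoder side: previous frame plus prediction error gives the new frame."""
    return [[p + d for p, d in zip(prev_row, diff_row)]
            for prev_row, diff_row in zip(previous, difference)]

def nonzero_count(difference):
    return sum(1 for row in difference for value in row if value != 0)

still = [[0, 0, 9, 0], [0, 0, 9, 0]]
panned = [[0, 0, 0, 9], [0, 0, 0, 9]]  # same content shifted one pixel to the right

print(nonzero_count(frame_difference(still, still)))   # 0: a still image costs nothing
print(nonzero_count(frame_difference(still, panned)))  # 4: motion leaves data to send
```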
2.12. D bo b chuyn ng
D liu trong FDC c th b loi b bng

cch so snh im nh hin ti vi v tr
ca i tng tng ng trong khung
hnh trc (-> ch khng phi v tr
khng gian tng ng trong khung trc
)
B m ho c lng s chuyn ng
trong nh tm vng tng ng trong
khung hnh trc
B m ho tm phn ging ca khung
trc vi khung mi sp truyn i.
Sau n gi 1 Vct chuyn ng,
vct ny s cho b gii m bit phn
no ca khung trc s c dng
d on khung mi.
ng thi n cng gi sai s d on
khi phc khung mi .
S trn -> khng c b chuyn ng.
S di -> c b chuyn ng.
Motion Compensated Prediction
More data in Frame-Differential Coding can be eliminated by comparing the present pixel to the location of the same object in the previous frame (-> not to the same spatial location in the previous frame).
The encoder estimates the motion in the image to find the corresponding area in a previous frame.
The encoder searches for a portion of a previous frame which is similar to the part of the new frame to be transmitted.
It then sends (as side information) a motion vector telling the decoder what portion of the previous frame it will use to predict the new frame.
It also sends the prediction error so that the exact new frame may be reconstituted.
See top figure -> without motion compensation. Bottom figure -> with motion compensation.
Motion compensation (supplement)
Actions:
1. Compute Motion Vector
2. Shift Data from Picture N Using Vector to Make Predicted Picture N+1
3. Compare Actual Picture with Predicted Picture
4. Send Vector and Prediction Error
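The four actions can be sketched with an exhaustive block-matching search. This is one common way to compute a motion vector, used here as an assumption: the slides do not say which search method or matching criterion is meant. The sum of absolute differences (SAD), the search range and all names below are illustrative.

```python
def sad(current, previous, cx, cy, px, py, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(current[cy + dy][cx + dx] - previous[py + dy][px + dx])
               for dy in range(size) for dx in range(size))

def find_motion_vector(previous, current, x, y, size, search):
    """Return (dx, dy, error) for the block of `current` at (x, y).

    (dx, dy) points to the best-matching block in `previous`;
    `error` is the remaining SAD, i.e. the size of the prediction error.
    """
    height, width = len(previous), len(previous[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            px, py = x + dx, y + dy
            if 0 <= px <= width - size and 0 <= py <= height - size:
                cost = sad(current, previous, x, y, px, py, size)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best[1], best[2], best[0]

def make_frame(width, height, obj_x, obj_y):
    """A dark frame containing one bright 2 x 2 object."""
    frame = [[0] * width for _ in range(height)]
    for dy in range(2):
        for dx in range(2):
            frame[obj_y + dy][obj_x + dx] = 9
    return frame

previous = make_frame(6, 6, 1, 1)  # picture N: object at column 1, row 1
current = make_frame(6, 6, 3, 2)   # picture N+1: object moved to column 3, row 2

print(find_motion_vector(previous, current, 3, 2, 2, 2))  # (-2, -1, 0)
```

With the vector (-2, -1) the predicted block matches exactly, so the prediction error for this block is zero and only the vector needs to be sent.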
2.12.1. Thng tin khng th d bo

Thng tin khng th d bo t khung trc
:
1.
2.
S thay i ca phng nn (VD: phong cnh nn

thay i)
Thng tin mi ca vt th b che ph mi l ra
do chuyn ng ca vt th ngang qua nn,
hoc ra ca khung phong cnh (VD: khun mt
ca cu th b che bi tri bng ang bay)
Unpredictable Information
Unpredictable information from the previous frame:
1. Scene change (e.g. background landscape change)
2. Newly uncovered information due to object motion across a background, or at the edges of a panned scene (e.g. a soccer player's face uncovered by a flying ball)
2.12.2. X l thng tin khng th d

bo trc (b sung)
Phng thay i
Thng tin b che khut:
nh m ho trong phi c gi u tin ->yu cu nhiu d liu hn

nh d on (P picture)
nh m ha trong c gi 2 ln/s -> Thi gian v tn s gi c th c
iu chnh ph hp vi s thay i phng.
nh m ho d on hai chiu Bi-directionally
Trong h thng phi c ch cha khung ch nh pha sau c c
thng tin mong mun.
gii hn b nh ca b gii m, b m ha cha cc nh v gi cc nh
tham kho c yu cu trc khi gi nh d on hai chiu
Trong k thut nn MPEG:
Cc nh c nn trong c gi l nh loi I (I picture)

Cc nh c m ha ch s dng cc nh tham chiu ngc gi l nh P
hay nh d on (P picture)
Cc nh c m ha t vic ni suy c cc nh tham chiu ngc v tham
chiu thun gi l nh B (B picture)
Dealing with unpredictable Information
Scene change
An intra-coded picture (MPEG I picture) must be sent as a starting point -> requires more data than a predicted picture (P picture).
I pictures are sent about twice per second -> their timing and sending frequency may be adjusted to accommodate scene changes.
Uncovered information
Bi-directionally coded type of picture, or B picture.
There must be enough frame storage in the system to wait for the later picture that has the desired information.
To limit the amount of decoder memory, the encoder stores pictures and sends the required reference pictures before sending the B picture.
In MPEG:
Pictures which are intra-coded only are termed I pictures.
Pictures which are encoded using only backward references are termed P pictures, for "Predictive".
Pictures which are encoded from interpolation of both a backward reference and a forward reference are termed B pictures.
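The rule that the encoder sends the required reference pictures before the B pictures means that the transmission order differs from the display order. A minimal sketch of that reordering follows; the picture pattern `IBBPBBP` and the function name are assumptions chosen for illustration.

```python
def transmission_order(display_order):
    """Reorder pictures so every B picture follows both of its references.

    `display_order` is a sequence of picture types ("I", "P" or "B").
    Returns labels such as "P3" (type plus display index) in sending order.
    """
    sent = []
    waiting_b = []  # B pictures held back until the next reference is sent
    for index, picture_type in enumerate(display_order):
        label = f"{picture_type}{index}"
        if picture_type == "B":
            waiting_b.append(label)
        else:
            sent.append(label)      # the I or P reference goes first
            sent.extend(waiting_b)  # then the B pictures that depend on it
            waiting_b = []
    sent.extend(waiting_b)  # trailing B pictures with no later reference
    return sent

print(transmission_order("IBBPBBP"))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

The decoder then only has to keep the two most recent reference pictures in memory while it decodes the B pictures between them.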
2.13. M ho bin i (Transform Coding)
Bin i gi tr khng gian ca im nh thnh cc

gi tr ca cc h s bin i trong min tn s
S h s to ra bng vi s im nh c bin
i
Ch mt s t h s cha hu ht ni dung (nng
lng) ca nh cc h s ny c th c m
ho tip bi m ho entropy khng tn hao
Qu trnh bin i tp trung nng lng vo cc h
s c bit (ch yu l cc h s c tn s thp)
Transform Coding
Convert spatial image pixel values to transform coefficient values.
-> The number of coefficients produced is equal to the number of pixels transformed.
Few coefficients contain most of the energy in a picture -> the coefficients may be further coded by lossless entropy coding.
The transform process concentrates the energy into particular coefficients (generally the low-frequency coefficients).
Transform Coding (2)
The concept of a histogram...
2.13.1. Cc loi m bin i nh:
Cc loi m ho nh:
Fourier ri rc (DFT)
Karhonen-Loeve
Walsh-Hadamard
Lapped orthogonal
Cosine ri rc (DCT) -> dng trong MPEG 2
Wavelet -> Mi
Nhng s khc bit gia cc phng php m ho
bin i:
Kh nng tp trung nng lng vo mt s t h s
Vng nh hng ca mi h s trong nh khi phc
S xut hin v kh nng nhn thy cc nhiu m ha sinh

ra do s lng t ho cc h s bin i
Types of picture transform coding
Types of picture coding:
Discrete Fourier (DFT)
Karhunen-Loève
Walsh-Hadamard
Lapped orthogonal
Discrete Cosine (DCT) -> used in MPEG-2
Wavelets -> new
The differences between transform coding methods:
The degree of concentration of energy in a few coefficients
The region of influence of each coefficient in the reconstructed picture
The appearance and visibility of coding noise due to coarse quantization of the coefficients
2.13.2. M ho DCT c tn hao
M ho khng tn hao khng th t c

h s nn cao (khong 4:1 hoc t hn)
M ho tn hao = loi b thng tin 1 cch
chn lc sao cho kh phn bit gia sn
phm ngun v sn phm c ti to bng
th gic v thnh gic hoc gy ra t s mo
dng nht.
M ho tn hao c th c thc hin bi:
Loi b mt s h s DCT
iu chnh th ca qu trnh lng t ha cc
h s -> bin php tt hn.
DCT Lossy Coding
Lossless coding cannot obtain a high compression ratio (4:1 or less).
Lossy coding = selectively discard information so that the reproduction is visually or aurally indistinguishable from the source, or has the fewest artifacts.
Lossy coding can be achieved by:
Eliminating some DCT coefficients
Adjusting the quantizing coarseness of the coefficients -> better
2.14. Hin tng mt n
Hin tng mt n lm cho mt s loi nhiu m

ha tr nn khng nhn thy hoc khng nghe thy
c.
Trong audio, 1 m thun nht s che du nng lng

c tn s cao hn v thp hn (vi nh hng yu hn)
Trong video, nhng l tng phn cao che du nhiu
ngu nhin
Nhiu sinh ra vi tc bit thp v thuc mt

trong cc loi tn s, khng gian, hoc thi gian.
V d v mt n m thanh: ting bom n t ting
chim ht..
Masking
Masking makes certain types of coding noise invisible or inaudible due to some psycho-visual/acoustical effect.
In audio, a pure tone will mask energy of higher frequency and also of lower frequency (with weaker effect).
In video, high-contrast edges mask random noise.
Noise introduced at low bit rates falls in the frequency, spatial, or temporal regions.
Example of audio masking: the sound of a bomb exploding drowns out birdsong.
2.15. Lng t ho bin i:
Lng t ho bin i l k thut chnh trong m ho tn hao

lm gim ng k tc bit
Trong mt bin i, lng t ho th cc h s khng quan

trng ( t c ch , c nng lng thp, kh nhn thy hoc
nghe c)
C th p dng cho ton b mt tn hiu hay cho cc thnh phn

tn s ring l ca mt tn hiu c m ha bin i.
Lng t ho bin i cng ng thi iu khin tc

bit :
Bin mt dng bt thnh mt knh tc bit khng i
Ngn cn hin tng b m trn hoc rng.
Variable quantization
Variable quantization (VQ) is the main technique of lossy coding -> greatly reduces the bit rate.
Coarsely quantize the less significant coefficients in a transform (-> less noticeable / low energy / less visible or audible).
Can be applied to a complete signal or to individual frequency components of a transformed signal.
VQ also controls the instantaneous bit rate in order to:
Match the average bit rate to a constant channel bit rate.
Prevent buffer overflow or underflow.
2.16. M ho Run-level
M ho Run-level = m ho mt dng zero

theo sau bi mt gi tr khc zero
Thay v gi tt c cc gi tr zero 1 cch ring bit

th ch gi chiu di ca dng d liu.
Hu ch cho cc d liu c dng Zero di
Cc dng ny d m ho bi m Huffman
V d (V d 1 ngi chn b m b
c v b ci)
Run-Level coding
"Run-Level" coding = coding a run-length of zeros followed by a nonzero level.
-> Instead of sending all the zero values individually, the length of the run is sent.
Useful for any data with long runs of zeros.
Run lengths are easily encoded by a Huffman code.
Run-level coding (supplement)
Let an event represent the pair (run, level), where run represents the number of zeros and level represents the magnitude of the nonzero coefficient.
This coding process is sometimes called run-length coding. Then, a table is built to represent each event by a specific codeword (i.e., a sequence of bits).
Events that occur more often are represented by shorter codewords,

and less frequent events are represented by longer codewords.
This entropy coding process is therefore called VLC or Huffman
coding.
Table shows part of a sample VLC table. In this table, the last bit s of
each codeword denotes the sign of the level, 0 for positive and 1 for
negative.
It can be seen that more likely events (i.e., short runs and low levels), are
represented with short codewords, and vice versa.
At the decoder, all the above steps are reversed one by one.
All the steps can be exactly reversed except for the quantization step, which is where loss of information arises. This is known as lossy compression.
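The (run, level) events can be produced with a short loop. The sketch below is a simplification made for illustration: it stores the signed coefficient as the level instead of a magnitude plus sign bit, and it closes the block with an end-of-block marker written as the string "EOB". It stops before the VLC table lookup.

```python
def run_level_encode(coefficients):
    """Turn a coefficient sequence into (run, level) events plus an EOB marker."""
    events = []
    run = 0
    for value in coefficients:
        if value == 0:
            run += 1  # extend the current run of zeros
        else:
            events.append((run, value))
            run = 0
    events.append("EOB")  # trailing zeros are implied by the end-of-block marker
    return events

def run_level_decode(events, length):
    """Rebuild the coefficient sequence; `length` is the block size."""
    coefficients = []
    for event in events:
        if event == "EOB":
            break
        run, level = event
        coefficients.extend([0] * run)
        coefficients.append(level)
    coefficients.extend([0] * (length - len(coefficients)))
    return coefficients

scan = [5, 0, 0, -3, 1, 0, 0, 0]
events = run_level_encode(scan)
print(events)                                            # [(0, 5), (2, -3), (0, 1), 'EOB']
print(run_level_decode(events, len(scan)) == scan)       # True
```

This step is exactly reversible; the loss of information happens earlier, in the quantizer.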
Sample VLC table
Relationship between the techniques studied
The MPEG compression process
Motion-compensated prediction (MOTION ESTIMATION)
Transform coding (DISCRETE COSINE TRANSFORM - DCT)
Variable quantization (QUANTIZATION)
ZIG-ZAG SCAN
RUN-LEVEL CODING (RLC)
Statistical coding - Huffman (VARIABLE LENGTH CODING - VLC)
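Relationship between the compression techniques
[Diagram: compression methods, divided into lossless compression and lossy compression, with boxes for VLC (Huffman), RLC, transform coding, variable quantization and predictive coding]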
2.17. Tng kt:
Qu trnh nn
Ly mu v lng t ho
M ho:
M ho tn hao v khng tn hao

M ho vi sai khung
D bo b chuyn ng
Lng t ho bin i
M ho Run-level
Hin tng mt n
Key points:
Compression process
Quantization & Sampling
Coding:
Lossless & lossy coding

Frame-Differential Coding
Motion Compensated Prediction
Variable quantization
Run level coding
Masking
Huffman coding (supplement): sample exercise
As a simple example of the use of Huffman codes for images, consider an image in which
the pixels (or the difference values) can have one of 8 brightness values.
This would require 3 bits per pixel (2^3=8) for conventional representation. From a
histogram of the image, the frequency of occurrence of each value can be determined and
as an example might show the following results (Table 1), in which the various brightness
values have been ranked in order of frequency. Huffman coding provides a straightforward
way to assign codes from this frequency table, and the code values for this example are
shown.
Note that each code is unique and no sequence of codes can be mistaken for any other
value, which is a characteristic of this type of coding.
Table 1. Example of Huffman codes assigned to brightness values
Brightness Value   Frequency   Huffman Code
4                  0.45        1
5                  0.21        01
3                  0.12        0011
6                  0.09        0010
2                  0.06        0001
7                  0.04        00001
1                  0.02        000000
0                  0.01        000001
Notice that the most commonly found pixel brightness value requires only a single bit, but some of the less common values require 5 or 6 bits, more than the three that a simple representation would need. Multiplying the frequency of occurrence of each value by the length of its code gives an overall average of
0.45*1 + 0.21*2 + 0.12*4 + 0.09*4 + 0.06*4 + 0.04*5 + 0.02*6 + 0.01*6 = 2.33 bits/pixel
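The average of 2.33 bits/pixel can be reproduced by building a Huffman tree from the frequencies in Table 1. The sketch below uses the standard "merge the two least frequent nodes" construction with Python's `heapq`; the function name is chosen here for illustration. Depending on how ties are broken, individual code lengths may differ from those in Table 1, but the average length is the same.

```python
import heapq
from itertools import count

def huffman_code_lengths(frequencies):
    """Return {symbol: codeword length} for a dict of symbol frequencies."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(weight, next(tiebreak), {symbol: 0})
            for symbol, weight in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent nodes; every symbol below them
        # moves one level deeper, i.e. gains one bit.
        w1, _, lengths1 = heapq.heappop(heap)
        w2, _, lengths2 = heapq.heappop(heap)
        merged = {s: n + 1 for s, n in {**lengths1, **lengths2}.items()}
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]

# Brightness value -> frequency, as in Table 1
table1 = {4: 0.45, 5: 0.21, 3: 0.12, 6: 0.09, 2: 0.06, 7: 0.04, 1: 0.02, 0: 0.01}

lengths = huffman_code_lengths(table1)
average = sum(table1[s] * lengths[s] for s in table1)
print(round(average, 2))  # 2.33 bits/pixel, against 3 bits/pixel for a fixed-length code
```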
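Chapter 1 exercises
Exercise 1: Given Table 1 (without the Huffman code column)
Question: (worked through in the sample exercise)
Exercise 2: (with the Huffman code table) Questions: (review)
What is the entropy of the image above?
How many bits are needed with ordinary binary coding?
How many bits are needed with Huffman coding? Comment on the efficiency of the Huffman code.
What can be observed about the Huffman code table (codeword lengths)?
Exercise 3: (worked example and review)
Given two drawings of 2 images, compute the number of bits needed to encode them.. (TH s)
Exercise 3:
Compute the minimum number of bits needed to encode the following 2 images:
Left image: 63 zeros and one 1
Right image: 32 zeros and 32 ones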
[Figure: two 8 x 8 binary images; the left image contains 63 zeros and one 1, the right image contains 32 zeros and 32 ones]
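Exercise 3 (solution)
Left image:
$$H(X) = -\tfrac{63}{64}\log_2\tfrac{63}{64} - \tfrac{1}{64}\log_2\tfrac{1}{64} \approx 0.116 \text{ bits/pixel}$$
Right image:
$$H(X) = -\tfrac{32}{64}\log_2\tfrac{32}{64} - \tfrac{32}{64}\log_2\tfrac{32}{64} = 1 \text{ bit/pixel}$$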
Chng 2: cc k thut multimedia
Ni dung
JPEG
MPEG-1/MPEG-2 Video
MPEG-1 Layer 3 Audio (mp3)
MPEG-4
MPEG-7 (gii thiu)
HDTV (gii thiu)
H261/H263 (gii thiu)
M ho da trn m hnh ha (model base coding
- MBC) (gii thiu)
Chapter 2: Multimedia technologies
Roadmap
JPEG
MPEG-1/MPEG-2 Video
MPEG-1 Layer 3 Audio (mp3)
MPEG-4
MPEG-7 (brief introduction)
HDTV (brief introduction)
H261/H263 (brief introduction)
Model-based coding (MBC) (brief introduction)
JPEG (Joint Photographic Experts Group

nhm chuyn gia nghin cu nh)
B m ho JPEG
Chia nh thnh cc khi 8*8 pixels

Tnh ton bin i cosine ri rc cho mi khi
B lng t ha lm trn h s DCT da theo ma trn lng t tn
hao nhng li cho t l nn ln
To ra 1 chui cc h s DCT bng cch qut ziczac
Dng 1 m di bin i (Variable Length Code VLC) m ha cc h
s DCT
Ghi dng d liu nn ra file ( *.jpeg hay *.jpg)
B gii m JPEG
File dng d liu vo IDCT (Inverse DCT bin i DCT ngc)

nh
JPEG (Joint Photographic Experts Group)
JPEG encoder
Partitions the image into blocks of 8 * 8 pixels.
Calculates the Discrete Cosine Transform (DCT) of each block.
A quantizer rounds off the DCT coefficients according to the quantization matrix -> lossy, but allows for large compression ratios.
Produces a series of DCT coefficients using zig-zag scanning.
Uses a variable length code (VLC) on these DCT coefficients.
Writes the compressed data stream to an output file (*.jpg or *.jpeg).
JPEG decoder
File -> input data stream -> Variable length decoder -> IDCT (Inverse DCT) -> Image
JPEG qut Zig-zag
JPEG Zig-zag scanning
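Since the figure for this slide is not available in the text, the scan order can be sketched in code instead. The sketch assumes the usual JPEG convention: start at the top-left coefficient and walk the anti-diagonals in alternating directions. The function names are illustrative.

```python
def zigzag_order(n=8):
    """Return the (row, column) positions of an n x n block in zig-zag order."""
    positions = [(i, j) for i in range(n) for j in range(n)]
    # Positions on the same anti-diagonal share i + j. Odd diagonals are
    # walked with the row increasing, even diagonals with the column increasing.
    return sorted(positions,
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def zigzag_scan(block):
    """Read a square block of coefficients in zig-zag order."""
    return [block[i][j] for i, j in zigzag_order(len(block))]

block = [[1, 2, 6],
         [3, 5, 7],
         [4, 8, 9]]
print(zigzag_scan(block))   # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(zigzag_order(8)[:6])  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

Scanning this way places the low-frequency coefficients first and groups the high-frequency ones, which are mostly zero after quantization, into long runs at the end.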
JPEG - DCT
DCT ging DFT -> Bin i tn hiu hoc nh t min

khng gian sang min tn s
DCT i hi t php nhn hn DFT
nh u vo A:
nh A l ma trn im nh c kch thc N2 (rng) * N1

(cao)
A(i,j) l chi ca im nh hng i ct j
nh u ra B:
B(k1,k2) l h s DCT hng k1 v ct k2 ca ma trn

DCT
JPEG - DCT
DCT is similar to the Discrete Fourier Transform (DFT) -> transforms a signal or image from the spatial domain to the frequency domain.
DCT requires fewer multiplications than DFT.
Input image A:
The input image A is N2 pixels wide by N1 pixels high;
A(i,j) is the intensity of the pixel in row i and column j;
Output image B:
B(k1,k2) is the DCT coefficient in row k1 and column k2 of the DCT matrix
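The slide names the input A and output B but the transform formula itself is not present in the text. The sketch below assumes the commonly used orthonormal two-dimensional DCT-II and computes it directly from the definition; it is deliberately slow and meant only to show what B(k1,k2) contains, not how a real codec implements it.

```python
import math

def dct_2d(A):
    """Orthonormal 2-D DCT-II of A, given as N1 rows of N2 values each."""
    n1, n2 = len(A), len(A[0])

    def scale(k, n):
        # Normalization assumed here: sqrt(1/n) for k = 0, sqrt(2/n) otherwise.
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    B = [[0.0] * n2 for _ in range(n1)]
    for k1 in range(n1):
        for k2 in range(n2):
            total = 0.0
            for i in range(n1):
                for j in range(n2):
                    total += (A[i][j]
                              * math.cos(math.pi * (2 * i + 1) * k1 / (2 * n1))
                              * math.cos(math.pi * (2 * j + 1) * k2 / (2 * n2)))
            B[k1][k2] = scale(k1, n1) * scale(k2, n2) * total
    return B

flat = [[100] * 8 for _ in range(8)]  # a block of uniform intensity
B = dct_2d(flat)
print(round(B[0][0], 6))  # 800.0: all the energy lands in the top-left coefficient
print(max(abs(B[k1][k2]) for k1 in range(8) for k2 in range(8) if (k1, k2) != (0, 0)) < 1e-9)  # True
```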
JPEG Ma trn lng t ho
Ma trn lng t ha l ma trn 8*8 ca cc bc lng t mi

phn t ng vi mt h s DCT
Thng l i xng
Cc bc lng t s l:
Nh pha trn bn tri (tn s thp)
Ln pha di bn phi (tn s cao)
Bc lng t = 1 l chnh xc nht
B lng t chia h s DCT cho bc lng t tng ng ca n,
sau lm trn ti s nguyn gn nht
Cc bc lng t ln s lm cho cc h s nh gim xung bng 0
Kt qu l:
Nhiu h s tn s cao bin thnh zero -> loi b d dng
Cc h s tn s thp ch chu s iu chnh nh.
JPEG - Quantization Matrix
The quantization matrix is the 8 by 8 matrix of step sizes

(sometimes called quantums) - one element for each DCT
coefficient.
Usually symmetric.
Step sizes will be:
Small in the upper left (low frequencies),
Large in the lower right (high frequencies)
A step size of 1 is the most precise.
The quantizer divides the DCT coefficient by its corresponding
quantum, then rounds to the nearest integer.
Large quantums drive small coefficients down to zero.
The result:
Many high-frequency coefficients become zero -> easy to remove.
The low-frequency coefficients undergo only minor adjustment.
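The divide-and-round step can be sketched as below. The 2 x 2 coefficient block and the step sizes are hypothetical values made up for this example (real JPEG tables are 8 x 8), and rounding halves upward is an assumption about how "nearest integer" is resolved.

```python
import math

def quantize(coefficients, step_sizes):
    """Divide each DCT coefficient by its step size and round to the nearest integer."""
    return [[math.floor(c / q + 0.5) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coefficients, step_sizes)]

def dequantize(levels, step_sizes):
    """Decoder side: multiply back; the rounding error is not recoverable."""
    return [[level * q for level, q in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, step_sizes)]

coefficients = [[1255, -15],
                [3, 40]]
step_sizes = [[16, 11],   # small steps for low frequencies (upper left)
              [12, 99]]   # large steps for high frequencies (lower right)

levels = quantize(coefficients, step_sizes)
print(levels)                          # [[78, -1], [0, 0]]
print(dequantize(levels, step_sizes))  # [[1248, -11], [0, 0]]
```

The large step size drives the coefficient 40 down to zero, while the low-frequency value 1255 comes back as 1248, a minor adjustment.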
Minh ho qu trnh m ho JPEG

[Figure: an 8 x 8 block of DCT coefficients and the corresponding quantization result]
Kt qu scan Zigzag : 78 -1 1 -4 -5 4 4 6 3 2 -1 -3 -5 -4 -1 0 -1 0 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 EOB
d dng m ho bng Run-length Huffman

JPEG Coding process illustrated

[Figure: an 8 x 8 block of DCT coefficients and the corresponding quantization result]
Zigzag scan result: 78 -1 1 -4 -5 4 4 6 3 2 -1 -3 -5 -4 -1 0 -1 0 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 EOB
Easily coded by Run-length Huffman coding

MPEG (Moving pic expert group nhm

chuyn gia nghin cu nh ng)
MPEG l tri tim ca:
u thu TV k thut s
B gii m HDTV
u c DVD
Hi tho truyn hnh
Internet video. v.v..
Cc chun MPEG:
MPEG 1; MPEG 2; MPEG - 4; MPEG 7

MPEG 3 b b qua v tr thnh dng m rng
ca MPEG2
MPEG (Moving Picture Experts Group)
MPEG is the heart of:
Digital television set-top boxes
HDTV decoders
DVD players
Video conferencing
Internet video, etc.
MPEG standards:
MPEG-1, MPEG-2, MPEG-4, MPEG-7
(The MPEG-3 standard was abandoned and became an extension of MPEG-2)
Cc chun MPEG:
MPEG 1 ( lc hu)
MPEG 2 (ng dng rng ri)
1 chun cho tivi s

ng dng: DVD (digital versatile disk), HDTV(high definition TV), DVB
(European Digital Video Broadcasting Group), v.v.
MPEG 4 (ming dng vn cn ang nghin cu)
1 chun lu tr v phc hi hnh nh m thanh trn cc vt liu cha

media (digital media)
ng dng: VCD (video compact disk)
1 chun cho cc ng dng multimedia vi nn cao

ng dng: Internet, TV cp, studio o, v.v.
MPEG 7 (vn ang nghin cu pht trin)
9/14/2006
L 1 chun h trcho tm kim thng tin (gi l Giao din m t ni dung

Multimedia - MCDI)
ng dng: Internet, H thng tm kim Video, th vin s..
76
MPEG standards
MPEG-1 (Obsolete)
MPEG-2 (Widely implemented)
A standard for digital television

Applications: DVD (digital versatile disk), HDTV (high definition
TV), DVB (European Digital Video Broadcasting Group), etc.
MPEG-4 (Newly implemented still being

researched)
A standard for storage and retrieval of moving pictures and audio

on storage media
application: VCD (video compact disk)
A standard for multimedia applications

Applications: Internet, cable TV, virtual studio, etc.
MPEG-7 (Future work ongoing research)
9/14/2006
Content representation standard for information search

( Multimedia Content Description Interface)
Applications: Internet, video search engine, digital library
MPEG-2 formal standards
The international standard ISO/IEC 13818-2, "Generic Coding of Moving Pictures and Associated Audio Information"
ATSC (Advanced Television Systems Committee) document A/54, "Guide to the Use of the ATSC Digital Television Standard"
MPEG video data structure
The MPEG-2 video data stream is constructed in layers, from lowest to highest, as follows:
  PIXEL is the fundamental unit
  BLOCK is an 8 x 8 array of pixels
  MACROBLOCK consists of 4 luma blocks and 2 chroma blocks (used for motion compensation and quantization)
  SLICE consists of a variable number of macroblocks (used to limit the effect of transmission errors)
  PICTURE consists of a frame (or field) of slices
  GROUP OF PICTURES (GOP) consists of a variable number of pictures
  SEQUENCE consists of a variable number of GOPs (used to set the video parameters)
  PACKETIZED ELEMENTARY STREAM (optional)
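To give a feel for the lower layers, the following Python sketch counts the macroblocks and blocks in one picture. The 720 x 576 picture size and the function name `layer_counts` are assumptions chosen for illustration; the count of 6 blocks per macroblock follows the 4 luma + 2 chroma layout listed above.

```python
BLOCK_SIZE = 8             # a block is 8 x 8 pixels
MACROBLOCK_SIZE = 16       # a macroblock covers 16 x 16 luma pixels
BLOCKS_PER_MACROBLOCK = 6  # 4 luma blocks + 2 chroma blocks

def layer_counts(width, height):
    """Count macroblocks and blocks in one picture (dimensions assumed multiples of 16)."""
    mb_cols = width // MACROBLOCK_SIZE
    mb_rows = height // MACROBLOCK_SIZE
    macroblocks = mb_cols * mb_rows
    return {
        "macroblocks_per_row": mb_cols,
        "macroblock_rows": mb_rows,
        "macroblocks": macroblocks,
        "blocks": macroblocks * BLOCKS_PER_MACROBLOCK,
    }

print(layer_counts(720, 576))
```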
[Figure: MPEG layers]
Pixel & Block
Pixel = "picture element"
  A discrete spatial point sample of an image.
  A color pixel may be represented digitally as a number of bits for each of three primary color values.
Block = 8 x 8 array of pixels
  A block is the fundamental unit for DCT (discrete cosine transform) coding.
Macroblock
A macroblock = 16 x 16 array of luma (Y) pixels (= 4 blocks = 2 x 2 block array).
The number of chroma pixels (Cr, Cb) varies depending on the chroma pixel structure indicated in the sequence header (e.g. 4:2:0).
The macroblock is the fundamental unit for motion compensation and has motion vector(s) associated with it if it is predictively coded.
A macroblock is classified as field coded or frame coded, depending on how the four blocks are extracted from the macroblock (an interlaced frame consists of 2 fields).
Slice
Pictures are divided into slices.
A slice consists of an arbitrary number of successive macroblocks (going left to right), but is typically an entire row of macroblocks.
A slice does not extend beyond one row.
The slice header carries address information that allows the Huffman decoder to resynchronize at slice boundaries.
Picture
A source picture is a contiguous rectangular array of pixels.
A picture may be a complete frame of video ("frame picture") or one of the interlaced fields from an interlaced source ("field picture").
A field picture does not have any blank lines between its active lines of pixels.
A coded picture (also called a video access unit) begins with a start code and a header. The header consists of:
  picture type (I, B, P)
  temporal reference information
  motion vector search range
  optional user data
A frame picture consists of:
  a frame of a progressive source, or
  a frame (2 spatially interlaced fields) of an interlaced source
I, P, B Pictures
Encoded pictures are classified into 3 types: I, P, and B.
I pictures = intra-coded pictures
  All macroblocks are coded without prediction.
  Needed to give the receiver a "starting point" for prediction after a channel change, and to recover from errors.
P pictures = predicted pictures
  Macroblocks may be coded with forward prediction from references made from previous I and P pictures, or may be intra coded.
B pictures = bi-directionally predicted pictures
  Macroblocks may be coded with forward prediction from previous I or P references.
  Macroblocks may be coded with backward prediction from the next I or P reference.
  Macroblocks may be coded with interpolated prediction from past and future I or P references.
  Macroblocks may be intra coded (no prediction).
Group of pictures (GOP)
The group of pictures layer is optional in MPEG-2.
A GOP begins with a start code and a header.
The header carries:
  time code information
  editing information
  optional user data
The first encoded picture in a GOP is always an I picture.
Typical length is 15 pictures, with the following structure (in display order):
  I B B P B B P B B P B B P B B
This provides an I picture with sufficient frequency to allow a decoder to decode correctly.
[Figure: forward motion compensation and bidirectional motion compensation between pictures along the time axis]

Sequence
A sequence begins with a unique 32-bit start code followed by a header.
The header carries:
  picture size
  aspect ratio
  frame rate and bit rate
  optional quantizer matrices
  required decoder buffer size
  chroma pixel structure
  optional user data
The sequence information is needed for channel changing.
The sequence length depends on the acceptable channel-change delay.
Packetized Elementary Stream (PES)
The output of a single MPEG audio or video coder is called an Elementary Stream (ES).
  An Elementary Stream is an endless near real-time signal.
For convenience, it can be broken into convenient-sized data blocks in a Packetized Elementary Stream (PES).
  These data blocks need header information to identify the start of the packets, and must include time stamps because the packetizing process disrupts the time axis.
A Video Elementary Stream (video ES) consists of all the video data for a sequence, including the sequence header and all the subparts of a sequence.
An ES carries only one type of data (video or audio) from a single video or audio encoder.
PES packets have variable length, not corresponding to the fixed packet length of transport packets, and may be much longer than a transport packet.
MPEG Packetized Elementary Stream (PES) (BS)
The figure shows that one video PES and a number of audio
PES can be combined to form a Program Stream, provided
that all of the coders are locked to a common clock.
Time stamps in each PES ensure lip-sync between the
video and audio.
Intra Frame Coding
Intra coding is concerned only with information within the current frame (not relative to any other frame in the video sequence).
[Figure: MPEG intra-frame coding block diagram]
Similar to JPEG (recall the JPEG coding mechanism).
Basic blocks of an intra-frame coder:
  Video filter (optional)
  Discrete cosine transform (DCT)
  DCT coefficient quantizer
  Run-length amplitude / variable-length coder (VLC)
Video Filter
The human visual system (HVS) is:
  most sensitive to changes in luminance,
  less sensitive to variations in chrominance.
MPEG uses the YCbCr color space to represent the data values instead of RGB, where:
  Y is the luminance signal,
  Cb is the blue color-difference signal,
  Cr is the red color-difference signal.
What are the 4:4:4, 4:2:0, etc. video formats?
  4:4:4 is full-bandwidth YCbCr video: each macroblock consists of 4 Y blocks, 4 Cb blocks, and 4 Cr blocks, which wastes bandwidth.
  4:2:0 is the format most commonly used in MPEG-2.
Color subsampling formats (BS)
[Figure: sampling grids for the 4:4:4, 4:2:2, 4:1:1 and 4:2:0 formats, with a legend for the Cr and Cb samples]
For the PAL system (720 x 576 lines, 8 bits per sample, 25 frames per second):
  4:4:4 format: bit rate = (720 + 720 + 720) * 576 * 8 * 25 = 249 Mbps
  4:2:2 format: bit rate = (720 + 360 + 360) * 576 * 8 * 25 = 166 Mbps
  4:2:0 format: bit rate = (720 + 360) * 576 * 8 * 25 = 124.4 Mbps
  4:1:1 format: bit rate = (720 + 180 + 180) * 576 * 8 * 25 = 124.4 Mbps
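The bit-rate arithmetic above can be checked with a short Python sketch. The table `CHROMA_PER_LUMA` and the function `bit_rate_mbps` are names introduced here for illustration; each table entry is the average number of chroma samples (Cb and Cr together) per luma sample implied by the sums on the slide.

```python
# Average chroma samples (Cb + Cr together) per luma sample for each format.
CHROMA_PER_LUMA = {"4:4:4": 2.0, "4:2:2": 1.0, "4:2:0": 0.5, "4:1:1": 0.5}

def bit_rate_mbps(chroma_format, width=720, lines=576, bits=8, fps=25):
    """Uncompressed bit rate in Mbps for the given chroma format (PAL defaults)."""
    samples_per_line = width * (1 + CHROMA_PER_LUMA[chroma_format])
    return samples_per_line * lines * bits * fps / 1e6

for fmt in CHROMA_PER_LUMA:
    print(fmt, round(bit_rate_mbps(fmt), 1))
```

Both 4:2:0 and 4:1:1 halve the total sample count relative to 4:4:4, which is why they produce the same bit rate.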
Applications of chroma formats
chroma_format       Multiplex order (time) within macroblock   Application
4:2:0 (6 blocks)    YYYYCbCr                                   Mainstream television, consumer entertainment
4:2:2 (8 blocks)    YYYYCbCrCbCr                               Studio production environments, professional editing equipment
4:4:4 (12 blocks)   YYYYCbCrCbCrCbCrCbCr                       Computer graphics
MPEG Profiles & levels
MPEG-2 is classified into several profiles.
Main Profile features:
  4:2:0 chroma sampling format
  I, P, and B pictures
  Non-scalable
Main Profile is subdivided into levels.
MP@ML (Main Profile at Main Level):
  Designed with the CCIR 601 standard for interlaced standard digital video.
  720 x 576 (PAL) or 720 x 483 (NTSC)
  30 Hz progressive, 60 Hz interlaced
  Maximum bit rate is 15 Mbit/s
MP@HL (Main Profile at High Level):
  Upper bounds: 1920 x 1152, 60 Hz progressive
  80 Mbit/s
[Figure: MPEG encoder/decoder]
Prediction
Backward prediction is done by storing pictures until the desired anchor picture is available, and only then encoding the stored frames.
The encoder can decide to use:
  forward prediction from a previous picture,
  backward prediction from a following picture,
  or interpolated prediction,
to minimize the prediction error.
The encoder must transmit pictures in an order different from that of the source pictures, so that the decoder has the anchor pictures before decoding the predicted pictures. (See next slide.)
The decoder must have two frames stored.
I P B Picture Reordering
Pictures are coded and decoded in a different order than they are displayed, due to bidirectional prediction for B pictures.
For example, take a 12-picture GOP (picture 13 is the I picture that starts the next GOP):
Source order and encoder input order:
  I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)
Encoding order and order in the coded bitstream:
  I(1) P(4) B(2) B(3) P(7) B(5) B(6) P(10) B(8) B(9) I(13) B(11) B(12)
Decoder output order and display order (same as input):
  I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)
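The reordering rule in this example can be sketched in Python: every anchor picture (I or P) is moved ahead of the B pictures that precede it in display order. The function name `coding_order` and the tuple representation of pictures are assumptions made for this illustration.

```python
def coding_order(display_order):
    """Reorder (type, number) pictures so each anchor precedes the B pictures that need it."""
    coded, waiting_b = [], []
    for picture in display_order:
        if picture[0] == "B":
            waiting_b.append(picture)  # hold until the following anchor arrives
        else:
            coded.append(picture)      # the I or P anchor is sent first
            coded.extend(waiting_b)    # then the B pictures that depend on it
            waiting_b = []
    return coded + waiting_b           # any trailing B pictures still await an anchor

# The 13 pictures from the example, numbered in display order.
display = [(t, n) for n, t in enumerate("IBBPBBPBBPBBI", start=1)]
print([n for _, n in coding_order(display)])
```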
DCT and IDCT formulas
[Equations: Eq. 1 (normal form) and Eq. 2 (matrix form) for the DCT; Eq. 3 (normal form) and Eq. 4 (matrix form) for the IDCT]
Where:
  F(u,v) = two-dimensional N x N DCT.
  u, v, x, y = 0, 1, 2, ..., N-1
  x, y are spatial coordinates in the sample domain.
  u, v are frequency coordinates in the transform domain.
  C(u), C(v) = 1/sqrt(2) for u, v = 0.
  C(u), C(v) = 1 otherwise.
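For reference, the standard two-dimensional DCT pair that is consistent with the definitions above can be written as follows. This is a sketch under stated assumptions: the symbol f(x,y) for the sample value and the matrix A are introduced here and do not appear in the slide text.

```latex
% Normal (summation) form of the forward and inverse transforms
F(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1}
         f(x,y)\, \cos\frac{(2x+1)u\pi}{2N}\, \cos\frac{(2y+1)v\pi}{2N}

f(x,y) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1}
         C(u)\, C(v)\, F(u,v)\, \cos\frac{(2x+1)u\pi}{2N}\, \cos\frac{(2y+1)v\pi}{2N}

% Matrix form, with the orthogonal matrix A defined element-wise
A_{u,x} = \sqrt{\frac{2}{N}}\, C(u)\, \cos\frac{(2x+1)u\pi}{2N},
\qquad F = A\, f\, A^{T},
\qquad f = A^{T} F\, A
```

With N = 8 the leading factor 2/N becomes 1/4, the form normally quoted for 8 x 8 blocks.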
DCT versus DFT
The DCT is conceptually similar to the DFT, except:
  The DCT concentrates energy into lower-order coefficients better than the DFT.
  The DCT is purely real; the DFT is complex (magnitude and phase).
A DCT operation on a block of pixels produces coefficients that are similar to the frequency-domain coefficients produced by a DFT operation.
  An N-point DCT has the same frequency resolution as a 2N-point DFT.
  The N frequencies of a 2N-point DFT correspond to N points on the upper half of the unit circle in the complex frequency plane.
Assuming a periodic input, the magnitude of the DFT coefficients is spatially invariant (the phase of the input does not matter). This is not true for the DCT.
[Figure: The weighting process (BS)]
Quantization matrix
Note that the quantization values applied to the DCT coefficients are:
  small in the upper left (low frequencies),
  large in the lower right (high frequencies).
Recall the JPEG mechanism.
Why?
  The HVS is less sensitive to errors in high-frequency coefficients than to errors in lower frequencies,
  so higher frequencies should be more coarsely quantized.
Resulting DCT matrix (example)
[Figure: example DCT matrix after quantization]
After adaptive quantization, the result is a matrix containing many zeros.
MPEG scanning
[Figure: two scan patterns over an 8 x 8 block]
Left: zigzag scanning (like JPEG)
Right: alternate scanning, which is better for interlaced frames
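The zigzag pattern can be generated rather than stored as a table. The Python sketch below covers only the zigzag scan; the alternate scan pattern is not reproduced here. `zigzag_order` and `zigzag_scan` are hypothetical helper names.

```python
def zigzag_order(n=8):
    """Return the (row, column) positions of an n x n block in zigzag scan order."""
    def key(position):
        row, col = position
        diagonal = row + col
        # Odd anti-diagonals are walked downwards (row increasing),
        # even ones upwards (row decreasing).
        return (diagonal, row if diagonal % 2 else -row)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def zigzag_scan(block):
    """Read a square block (list of rows) in zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

print(zigzag_order()[:10])
```

Because low frequencies sit in the upper left, this ordering tends to place the non-zero coefficients first and the zeros last.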
Huffman/Run-Level Coding
Huffman coding, in combination with run-level coding and zigzag scanning, is applied to the quantized DCT coefficients.
"Run-level" = a run-length of zeros followed by a non-zero level.
Huffman coding is also applied to various types of side information.
A Huffman code is an entropy code that optimally achieves the shortest possible average code word length for a source.
This average code word length is >= the entropy of the source.
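The relation between the average Huffman code word length and the source entropy can be checked numerically. The sketch below builds Huffman code lengths with a priority queue; the probability values used are assumed examples, and MPEG itself uses fixed, predefined code tables rather than building a code per source.

```python
import heapq
import math

def huffman_code_lengths(probabilities):
    """Return the Huffman code word length (in bits) for each symbol."""
    heap = [(p, i, [i]) for i, p in enumerate(probabilities)]
    heapq.heapify(heap)
    lengths = [0] * len(probabilities)
    tie_breaker = len(probabilities)
    while len(heap) > 1:
        p1, _, symbols1 = heapq.heappop(heap)
        p2, _, symbols2 = heapq.heappop(heap)
        for s in symbols1 + symbols2:
            lengths[s] += 1  # each merge adds one bit to every symbol below it
        heapq.heappush(heap, (p1 + p2, tie_breaker, symbols1 + symbols2))
        tie_breaker += 1
    return lengths

def average_length(probabilities):
    lengths = huffman_code_lengths(probabilities)
    return sum(p * n for p, n in zip(probabilities, lengths))

def entropy(probabilities):
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

for source in ([0.5, 0.25, 0.125, 0.125], [0.7, 0.2, 0.1]):
    print(source, average_length(source), entropy(source))
```

For the first source the probabilities are powers of 1/2, so the average length equals the entropy; for the second it is strictly larger.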
Huffman/Run-Level coding illustrated
Zero run-length   Amplitude      MPEG code value
N/A               8 (DC value)   110 1000
0                 4              0000 1100
0                 4              0000 1100
0                 2              0100 0
0                 2              0100 0
0                 2              0100 0
0                 1              110
0                 1              110
0                 1              110
0                 1              110
12                1              0010 0010 0
EOB               EOB            10
Using the DCT output matrix from the previous slide, after zigzag scanning the output is the DC value 8 followed by the sequence of numbers: 4, 4, 2, 2, 2, 1, 1, 1, 1, 0 (12 zeros), 1, 0 (41 zeros).
These values are looked up in a fixed table of variable-length codes:
  The most probable occurrence is given a relatively short code.
  The least probable occurrence is given a relatively long code.
Huffman/Run-Level coding illustrated (2)
The first run of 12 zeros has been efficiently coded with only 9 bits.
The last run of 41 zeros has been entirely eliminated, represented only by a 2-bit End Of Block (EOB) indicator.
The quantized DCT coefficients are now represented by a sequence of 61 binary bits (see the table).
Considering that the original 8 x 8 block of 8-bit pixels required 512 bits for full representation, the compression ratio is approx. 8.4:1.
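The 61-bit total and the compression ratio can be verified by adding up the code lengths from the table. In this Python sketch the code strings are copied from the table with the spaces removed; the variable names are illustrative only.

```python
# Code words from the table: DC value, two 4s, three 2s, four 1s,
# the (run of 12 zeros, level 1) symbol, and the end-of-block marker.
codes = (
    ["110 1000"]
    + ["0000 1100"] * 2
    + ["0100 0"] * 3
    + ["110"] * 4
    + ["0010 0010 0", "10"]
)

total_bits = sum(len(code.replace(" ", "")) for code in codes)
original_bits = 8 * 8 * 8  # 8 x 8 block of 8-bit pixels
print(total_bits, original_bits / total_bits)
```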
MPEG Data Transport
MPEG packages all data into fixed-size 188-byte packets for transport.
Video or audio payload data is placed in PES packets before being broken up into fixed-length transport packet payloads.
A PES packet may be much longer than a transport packet, so segmentation is required:
  The PES header is placed immediately following a transport header.
  Successive portions of the PES packet are then placed in the payloads of transport packets.
  Remaining space in the final transport packet payload is filled with stuffing bytes = 0xFF (all ones).
Each transport packet starts with a sync byte = 0x47.
  In the ATSC US terrestrial DTV VSB transmission system, the sync byte is not processed, but is replaced by a different sync symbol especially suited to RF transmission.
The transport packet header contains a 13-bit PID (packet ID), which corresponds to a particular elementary stream of video, audio, or other program element.
PID 0x0000 is reserved for transport packets carrying a program association table (PAT).
The PAT points to a Program Map Table (PMT), which in turn points to the particular elements of a program.
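A minimal Python sketch of reading the transport packet header follows. The packet size, sync byte and 13-bit PID width come from the text above; the exact bit position of the PID (low 5 bits of the second byte plus the whole third byte) is taken from the MPEG-2 Systems layout rather than from the slide, and `packet_pid` is a hypothetical helper name.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
PAT_PID = 0x0000

def packet_pid(packet):
    """Check the sync byte and return the 13-bit PID of one transport packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("transport packets are exactly 188 bytes long")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing sync byte 0x47")
    # PID = low 5 bits of byte 1 followed by all 8 bits of byte 2.
    return ((packet[1] & 0x1F) << 8) | packet[2]

# Assumed sample packet: a 4-byte header followed by 184 stuffing bytes.
sample = bytes([0x47, 0x40, 0x00, 0x10]) + bytes([0xFF] * 184)
print(packet_pid(sample) == PAT_PID)
```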
[Figure: PAT & PMT (BS)]
MPEG Program Stream (PS) (BS)
Program Streams have variable length

packets with headers.
They are used in data transfers to and from
optical and hard disks, which are error free
and in which files of arbitrary sizes are
expected.
VCD and DVD use Program Streams.
MPEG Transport Stream (vs. Program stream) (BS)
For transmission and digital broadcasting, several programs and

their associated PES can be multiplexed into a single
Transport Stream.
A Transport Stream differs from a Program Stream in that:
PES packets are further subdivided into short fixed-size
packets
Multiple programs encoded with different clocks can be
carried.
How? A Transport Stream has a program clock reference (PCR) mechanism, which allows transmission of multiple clocks.
One of these clocks is selected and regenerated at the decoder.
A Single Program Transport Stream (SPTS) is also possible, and this may be found between a coder and a multiplexer.
MPEG Transport packet
Adaptation field:
  8 bits specifying the length of the adaptation field.
  The first group of flags consists of eight 1-bit flags:
    discontinuity_indicator
    random_access_indicator
    elementary_stream_priority_indicator
    PCR_flag
    OPCR_flag
    splicing_point_flag
    transport_private_data_flag
    adaptation_field_extension_flag
  The optional fields are present if indicated by one of the preceding flags.
  The remainder of the adaptation field is filled with stuffing bytes (0xFF, all ones).
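The eight 1-bit flags can be unpacked from a single byte as sketched below. The sketch assumes the flags are packed most-significant bit first in the order listed above, which matches the MPEG-2 Systems layout; `parse_adaptation_flags` is a hypothetical helper name.

```python
FLAG_NAMES = [
    "discontinuity_indicator",
    "random_access_indicator",
    "elementary_stream_priority_indicator",
    "PCR_flag",
    "OPCR_flag",
    "splicing_point_flag",
    "transport_private_data_flag",
    "adaptation_field_extension_flag",
]

def parse_adaptation_flags(flags_byte):
    """Map each flag name to True/False, reading bits from the most significant down."""
    return {name: bool(flags_byte & (0x80 >> i)) for i, name in enumerate(FLAG_NAMES)}

# Assumed sample byte 0x50: random access point with a PCR present.
flags = parse_adaptation_flags(0x50)
print([name for name, is_set in flags.items() if is_set])
```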
Demultiplexing a Transport Stream (TS)
Demultiplexing a transport stream involves:
  1. Finding the PAT by selecting packets with PID = 0x0000
  2. Reading the PIDs for the PMTs
  3. Reading the PIDs for the elements of a desired program from its PMT (for example, a basic program will have a PID for audio and a PID for video)
  4. Detecting packets with the desired PIDs and routing them to the decoders
An MPEG-2 transport stream can carry:
  Video streams
  Audio streams
  Any type of data
MPEG-2 TS is the packet format for CATV downstream data communication.
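The four demultiplexing steps can be sketched in Python under strong simplifying assumptions: packets are represented as (PID, payload) tuples, and the PAT and PMT contents are supplied as already-decoded dictionaries (step 1 and the table parsing are not shown). All names and PID values here are hypothetical.

```python
def demultiplex(packets, pat, pmts, program_number):
    """Route payloads of one program to per-element lists.

    packets: iterable of (pid, payload) tuples
    pat:     {program_number: pmt_pid}, as decoded from packets with PID 0x0000
    pmts:    {pmt_pid: {element_name: element_pid}}
    """
    pmt_pid = pat[program_number]    # step 2: PID of the program's PMT
    element_pids = pmts[pmt_pid]     # step 3: PIDs of the program elements
    wanted = {pid: name for name, pid in element_pids.items()}
    routed = {name: [] for name in element_pids}
    for pid, payload in packets:     # step 4: select and route packets
        if pid in wanted:
            routed[wanted[pid]].append(payload)
    return routed

# Assumed sample stream with one program carrying one video and one audio element.
packets = [(0x0000, b"pat"), (0x0100, b"pmt"), (0x0101, b"v1"),
           (0x0102, b"a1"), (0x0201, b"other"), (0x0101, b"v2")]
pat = {1: 0x0100}
pmts = {0x0100: {"video": 0x0101, "audio": 0x0102}}
print(demultiplex(packets, pat, pmts, program_number=1))
```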
Timing & buffer control
[Figure: timing and buffer control block diagram with points A to F]
Point A: encoder input, constant/specified rate
Point B: encoder output, variable rate
Point C: encoder buffer output, constant rate
Point D: communication channel + decoder buffer, constant rate
Point E: decoder input, variable rate
Point F: decoder output, constant/specified rate
Timing - Synchronization
The decoder is synchronized with the encoder by time stamps.
The encoder contains a master oscillator and counter, called the System Time Clock (STC). (See previous block diagram.)
  The STC belongs to a particular program and is the master clock of the video and audio encoders for that program.
  Multiple programs, each with its own STC, can also be multiplexed into a single stream.
  A program component can even have no time stamps, but then it cannot be synchronized with other components.
At the encoder input (Point A), the time of occurrence of an input video picture or audio block is noted by sampling the STC.
A constant total delay of the encoder and decoder buffers is added to the STC value, creating a Presentation Time Stamp (PTS).
The PTS is then inserted in the first of the packet(s) representing that picture or audio block, at Point B.
Timing - Synchronization (2)
A Decode Time Stamp (DTS) can optionally be combined into the bit stream; it represents the time at which the data should be taken instantaneously from the decoder buffer and decoded.
  DTS and PTS are identical except in the case of picture reordering for B pictures.
  The DTS is only used where it is needed because of reordering. Whenever DTS is used, PTS is also coded.
  PTS (or DTS) insertion interval <= 700 ms.
  In ATSC, PTS (or DTS) must be inserted at the beginning of each coded picture (access unit).
In addition, the output of the encoder buffer (Point C) is time stamped with System Time Clock (STC) values, called:
  System Clock Reference (SCR) in a Program Stream. SCR time stamp interval <= 700 ms.
  Program Clock Reference (PCR) in a Transport Stream. PCR time stamp interval <= 100 ms.
The PCR and/or the SCR are used to synchronize the decoder STC with the encoder STC.
9/14/2006
146
ng b thi gian (3)
Tt c cc dng video audio nm trong cng 1 chng trnh phI ly nhn thi
gian ca chng t 1 STC chung c th ng b cc b gii m video v
audio vi nhau
Tc d liu v tc gi trn knh ( u ra b ghp knh) c th hon
ton khng ng b vi ng h thi gian h thng STC
Cc nhn thi gian PCR cho php s ng b ca cc chng trnh khc
nhau vi STC khc nhau ghp knh vi nhau trong khi vn cho php ti to
li STC ca mi chng trnh
Nu khng xy ra hin tng trn hoc rng b m th tr trong b m
v knh dn ca c video v audio l khng i
u vo b m ho v u ra b gii m chy vi tc bng nhau v khng
i
Tr t u vo b m ho v u ra b gii m l c nh
Nu khng cn s ng b chnh xc, th ng h gii m c th chy t
do cc khung video c th lp li hoc b qua khi cn thit ngn cn
vic rng hoc trn b m.
9/14/2006
147
Timing Synchronization (3)
All video and audio streams included in a program must get their
time stamps from a common STC so that synchronization of the
video and audio decoders with each other may be accomplished.
The data rate and packet rate on the channel (at the multiplexer
output) can be completely asynchronous with the System Time
Clock (STC).
PCR time stamps allow synchronization of different
multiplexed programs having different STCs while allowing STC
recovery for each program.
If there is no buffer underflow or overflow, delays in the buffers
and transmission channel for both video and audio are
constant.
The encoder input and decoder output run at equal and constant
rates.
The end-to-end delay from encoder input to decoder output is fixed.
If exact synchronization is not required, the decoder clock can be
free running; video frames can be repeated or skipped as
necessary to prevent buffer underflow or overflow, respectively.
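The repeat/skip behaviour of a free-running decoder can be sketched with a toy buffer model. This is an assumption-laden illustration, not a real decoder: frames are plain integers, the buffer capacity of three frames is hypothetical, and `free_running_display` is a name invented for the example.

```python
from collections import deque


def free_running_display(arrivals, capacity=3):
    """Return the frame shown at each display tick.

    arrivals: one list of frame ids per display tick (the frames that
              reached the decoder buffer during that tick).
    capacity: hypothetical maximum number of frames the buffer may hold.
    """
    buffer = deque()
    shown = []
    last_frame = None  # nothing to show until the first frame arrives
    for frames in arrivals:
        buffer.extend(frames)
        # Overflow threatened: skip (drop) the oldest frames.
        while len(buffer) > capacity:
            buffer.popleft()
        if buffer:
            last_frame = buffer.popleft()
        # Otherwise the buffer is empty (underflow), so the previous
        # frame is repeated.
        shown.append(last_frame)
    return shown


if __name__ == "__main__":
    ticks = [[0], [1], [], [2], [3, 4, 5, 6, 7], [8]]
    print(free_running_display(ticks))
```

With the sample input, frame 1 is repeated when nothing arrives, and frames 3 and 4 are skipped when five frames arrive at once.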
HDTV (High definition television)
High definition television (HDTV) first came to public attention
in 1981, when NHK, the Japanese broadcasting authority, first
demonstrated it in the United States.
HDTV is defined by the ITU-R as:
'A system designed to allow viewing at about three times the
picture height, such that the system is virtually, or nearly,
transparent to the quality or portrayal that would have been
perceived in the original scene ... by a discerning viewer with
normal visual acuity.'
HDTV (2)
HDTV proposals are for a screen which is wider than the conventional
TV image by about 33%. It is generally agreed that the HDTV aspect
ratio will be 16:9, as opposed to the 4:3 ratio of conventional TV
systems. This ratio has been chosen because psychological tests have
shown that it best matches the human visual field.
It also enables use of existing cinema film formats as additional source
material, since this is the same aspect ratio used in normal 35 mm film.
Figure 16.6(a) shows how the aspect ratio of HDTV compares with that
of conventional television, using the same resolution, or the same
surface area as the comparison metric.
To achieve the improved resolution, the video image used in HDTV
must contain over 1000 lines, as opposed to the 525 and 625 lines
provided by the existing NTSC and PAL systems, respectively. This
gives a much improved vertical resolution. The exact value is chosen
to be a simple multiple of one or both of the vertical resolutions
used in conventional TV.
However, due to the higher scan rates, the bandwidth requirement for
analogue HDTV is approximately 12 MHz, compared to the nominal 6
MHz of conventional TV.
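The aspect-ratio and line-count figures on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are invented, and the doubled line counts are shown as examples of a "simple multiple", not as the values of any particular HDTV standard.

```python
from fractions import Fraction

HDTV_ASPECT = Fraction(16, 9)
CONVENTIONAL_ASPECT = Fraction(4, 3)


def extra_width_at_same_height(new_aspect, old_aspect):
    """Fractional increase in picture width when the height is unchanged."""
    return new_aspect / old_aspect - 1


def size_at_same_area(aspect, area=1.0):
    """Width and height of a picture with the given aspect ratio and area."""
    height = (area / float(aspect)) ** 0.5
    width = float(aspect) * height
    return width, height


if __name__ == "__main__":
    extra = extra_width_at_same_height(HDTV_ASPECT, CONVENTIONAL_ASPECT)
    print(f"16:9 is {float(extra):.1%} wider than 4:3 at equal height")
    for name, aspect in (("16:9", HDTV_ASPECT), ("4:3", CONVENTIONAL_ASPECT)):
        width, height = size_at_same_area(aspect)
        print(f"{name} at unit area: {width:.3f} wide x {height:.3f} high")
    # Doubling the conventional line counts gives values above 1000 lines.
    print([2 * lines for lines in (525, 625)])
```

At equal picture height the 16:9 image is exactly one third wider than the 4:3 image, which matches the "about 33%" quoted above.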
HDTV (3)
The introduction of a non-compatible TV transmission format for
HDTV would require the viewer either to buy a new receiver, or to
buy a converter to receive the picture on their old set.
The initial thrust in Japan was towards an HDTV format which is
compatible with conventional TV standards, and which can be
received by conventional receivers, with conventional quality.
However, to get the full benefit of HDTV, a new wide screen, high
resolution receiver has to be purchased.
One of the principal reasons that HDTV is not already common is
that a general standard has not yet been agreed. The 26th CCIR
plenary assembly recommended the adoption of a single, worldwide
standard for high definition television.
Unfortunately, Japan, Europe and North America are all investing
significant time and money in their own systems based on their own,
current, conventional TV standards and other national
considerations.
H261- H263
The H.261 algorithm was developed for the purpose of image
transmission rather than image storage.
It is designed to produce a constant output of p x 64 kbit/s, where
p is an integer in the range 1 to 30.
This allows transmission over a digital network or data link of
varying capacity.
It also allows transmission over a single 64 kbit/s digital
telephone channel for low quality video-telephony, or at higher bit
rates for improved picture quality.
The basic coding algorithm is similar to that of MPEG in that it is
a hybrid of motion compensation, DCT and straightforward
DPCM (intra-frame coding mode), without the MPEG I, P, B
frames.
The DCT operation is performed at a low level on 8 x 8 blocks of
error samples from the predicted luminance pixel values, with
sub-sampled blocks of chrominance data.
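The p x 64 kbit/s rate rule can be written down directly. The following is a minimal sketch. The helper names are made up for illustration, and choosing the largest p that fits a channel is one plausible way to use the rule, not something the slides prescribe.

```python
BASE_RATE_KBIT_S = 64
P_MIN, P_MAX = 1, 30


def h261_rate_kbit_s(p):
    """Constant output rate for a given multiplier p (1 to 30)."""
    if not P_MIN <= p <= P_MAX:
        raise ValueError("p must be an integer in the range 1 to 30")
    return p * BASE_RATE_KBIT_S


def largest_p_for_channel(capacity_kbit_s):
    """Largest valid p whose rate does not exceed the channel capacity."""
    p = min(P_MAX, int(capacity_kbit_s // BASE_RATE_KBIT_S))
    if p < P_MIN:
        raise ValueError("channel is slower than 64 kbit/s")
    return p


if __name__ == "__main__":
    print(h261_rate_kbit_s(1), h261_rate_kbit_s(30))  # lowest and highest rate
    print(largest_p_for_channel(384))
```

The highest rate, 30 x 64 = 1920 kbit/s, is just under 2 Mbit/s.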
H261-H263 (2)
H261-H263 (3)
H.261 is widely used on 176 x 144 pixel images.
The ability to select a range of output rates for the algorithm
allows it to be used in different applications.
Low output rates (p = 1 or 2) are only suitable for face-to-face
(videophone) communication. H.261 is thus the standard used in
many commercial videophone systems such as the UK
BT/Marconi Relate 2000 and the US ATT 2500 products.
Video-conferencing would require a greater output data rate (p >
6) and might go as high as 2 Mbit/s for high quality transmission
with larger image sizes.
A further development of H.261 is H.263, intended for lower fixed
transmission rates.
This deploys arithmetic coding in place of the variable length
coding (see the H261 diagram); with other modifications, the data
rate is reduced to only 20 kbit/s.
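To give a feel for how much compression these rates imply for a 176 x 144 image, the sketch below estimates the uncompressed bit rate and the ratio needed to reach p x 64 kbit/s. Several values are assumptions that do not come from the slides: 8 bits per sample, chrominance sub-sampled by two in both directions, and a frame rate supplied by the caller. The function names are hypothetical.

```python
def raw_bits_per_frame(width, height, bits_per_sample=8):
    """Uncompressed size of one frame, assuming 2:1 chroma sub-sampling
    horizontally and vertically (two quarter-size chrominance planes)."""
    luma_samples = width * height
    chroma_samples = 2 * (width // 2) * (height // 2)
    return (luma_samples + chroma_samples) * bits_per_sample


def required_compression_ratio(width, height, frames_per_s, p):
    """Ratio of the raw bit rate to the p x 64 kbit/s channel rate."""
    raw_bit_s = raw_bits_per_frame(width, height) * frames_per_s
    return raw_bit_s / (p * 64_000)


if __name__ == "__main__":
    for p in (1, 2, 6):
        ratio = required_compression_ratio(176, 144, frames_per_s=30, p=p)
        print(f"p = {p}: about {ratio:.0f}:1 at an assumed 30 frame/s")
```

Under these assumptions a single 64 kbit/s channel needs a ratio of roughly 140:1, which is consistent with low rates being suitable only for videophone-quality pictures.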
Model Based Coding (MBC)
At the very low bit rates (20 kbit/s or less) associated with video
telephony, the requirements for image transmission stretch the
compression techniques described earlier to their limits.
In order to achieve the necessary degree of compression they
often require reduction in spatial resolution or even the
elimination of frames from the sequence.
Model based coding (MBC) attempts to exploit a greater degree
of redundancy in images than current techniques, in order to
achieve significant image compression but without adversely
degrading the image content information.
It relies upon the fact that the image quality is largely subjective.
Providing that the appearance of scenes within an observed
image is kept at a visually acceptable level, it may not matter that
the observed image is not a precise reproduction of reality.
Model Based Coding (2)
One MBC method for producing an artificial image of a head sequence
utilizes a feature codebook where a range of facial expressions,
sufficient to create an animation, are generated from sub-images or
templates which are joined together to form a complete face.
The most important areas of a face, for conveying an expression, are
the eyes and mouth, hence the objective is to create an image in which
the movement of the eyes and mouth is a convincing approximation to
the movements of the original subject.
When forming the synthetic image, the feature template vectors which
form the closest match to those of the original moving sequence are
selected from the codebook and then transmitted as low bit rate coded
addresses.
By using only 10 eye and 10 mouth templates, for instance, a total of
100 combinations exists, implying that only a 7-bit codebook address
(2^7 = 128 >= 100) need be transmitted.
It has been found that there are only 13 visually distinct mouth shapes
for vowel and consonant formation during speech.
However, the number of mouth sub-images is usually increased, to
include intermediate expressions and hence avoid step changes in the
image.
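The codebook lookup described on this slide can be sketched as a nearest-match search followed by transmission of an index. Everything concrete here is an assumption made for illustration: the two-element feature vectors, the squared Euclidean distance used as the matching criterion, and the function names. Note that addressing 100 eye/mouth combinations jointly needs 7 bits, since 2^6 = 64 is too few.

```python
def address_bits(num_entries):
    """Number of bits needed to address num_entries codebook entries."""
    return (num_entries - 1).bit_length()


def closest_template(feature, codebook):
    """Index of the codebook vector closest to the measured feature vector."""
    def squared_distance(template):
        return sum((a - b) ** 2 for a, b in zip(feature, template))
    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i]))


def joint_address(eye_index, mouth_index, num_mouths):
    """Combine an eye index and a mouth index into a single address."""
    return eye_index * num_mouths + mouth_index


if __name__ == "__main__":
    # Hypothetical mouth templates described by (openness, width).
    mouths = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    index = closest_template((0.9, 0.2), mouths)
    print("selected mouth template:", index)
    print("bits for 10 x 10 combinations:", address_bits(10 * 10))
```

Only the address is sent; the receiver looks up the same template in its own copy of the codebook.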
Model Based Coding (3)
Another common way of representing objects in three-dimensional
computer graphics is by a net of interconnecting polygons.
A model is stored as a set of linked arrays which specify
the coordinates of each polygon vertex, with the lines
connecting the vertices together forming each side of a
polygon.
To make realistic models, the polygon net can be
shaded to reflect the presence of light sources.
The wire-frame model [Welch 1991] can be modified to
fit the shape of a person's head and shoulders. The
wire-frame, composed of over 100 interconnecting
triangles, can produce subjectively acceptable synthetic
images, providing that the frame is not rotated by more
than 30° from the full-face position.
The model (see the Figure) uses smaller triangles in
areas associated with high degrees of curvature where
significant movement is required.
Large flat areas, such as the forehead, contain fewer
triangles.
A second wire-frame is used to model the mouth
interior.
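A polygon net stored as linked arrays might look like the sketch below. It is an assumed layout, not the structure used in [Welch 1991]: one array of vertex coordinates, one array of triangles that index into it, and a simple Lambertian (diffuse) flat-shading rule chosen here only as an example of shading for a light source.

```python
from dataclasses import dataclass


@dataclass
class WireFrame:
    vertices: list   # [(x, y, z), ...] coordinates of each vertex
    triangles: list  # [(i, j, k), ...] indices into vertices

    def edges(self):
        """Unique lines connecting the vertices, as sorted index pairs."""
        found = set()
        for i, j, k in self.triangles:
            for a, b in ((i, j), (j, k), (k, i)):
                found.add((min(a, b), max(a, b)))
        return sorted(found)

    def flat_shade(self, triangle_index, light_direction):
        """Diffuse intensity (0 to 1) of one triangle for a unit light vector."""
        i, j, k = self.triangles[triangle_index]
        p0, p1, p2 = self.vertices[i], self.vertices[j], self.vertices[k]
        u = [p1[n] - p0[n] for n in range(3)]
        v = [p2[n] - p0[n] for n in range(3)]
        # Surface normal from the cross product of two triangle sides.
        normal = (u[1] * v[2] - u[2] * v[1],
                  u[2] * v[0] - u[0] * v[2],
                  u[0] * v[1] - u[1] * v[0])
        length = sum(c * c for c in normal) ** 0.5
        cosine = sum(n * l for n, l in zip(normal, light_direction)) / length
        return max(0.0, cosine)


if __name__ == "__main__":
    mesh = WireFrame(
        vertices=[(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)],
        triangles=[(0, 1, 2), (1, 3, 2)],
    )
    print(mesh.edges())
    print(mesh.flat_shade(0, (0, 0, 1)))
```

Storing vertices once and referring to them by index means that moving a single vertex automatically reshapes every triangle that shares it.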
Model based coding (4)
A synthetic image is created by texture mapping detail from an
initial full-face source image over the wire-frame. Facial
movement can be achieved by manipulation of the vertices of the
wire-frame.
Head rotation requires the use of simple matrix operations upon
the coordinate array. Facial expression requires the manipulation
of the features controlling the vertices.
This model based feature codebook approach suffers from the
drawback of codebook formation.
This has to be done off-line and, consequently, the image is
required to be prerecorded, with a consequent delay.
However, the actual image sequence can be sent at a very low
data rate. For a codebook with 128 entries where 7 bits are
required to code each mouth, a 25 frame/s sequence requires
less than 200 bit/s (7 x 25 = 175 bit/s) to code the mouth movements.
When it is finally implemented, rates as low as 1 kbit/s are
confidently expected from MBC systems, but they can only
transmit image sequences which match the stored model, e.g.
head and shoulders displays.
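The "simple matrix operations upon the coordinate array" used for head rotation can be sketched as a 3 x 3 rotation matrix applied to every vertex. The choice of the vertical (y) axis and the function names are assumptions for illustration only.

```python
import math


def rotation_about_y(angle_deg):
    """3 x 3 matrix for a rotation about the vertical (y) axis."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def transform(matrix, vertices):
    """Apply a 3 x 3 matrix to every (x, y, z) vertex in the array."""
    return [tuple(sum(row[k] * v[k] for k in range(3)) for row in matrix)
            for v in vertices]


if __name__ == "__main__":
    # Hypothetical vertex at the tip of the nose, facing the viewer (+z).
    nose_tip = [(0.0, 0.0, 1.0)]
    turned = transform(rotation_about_y(20), nose_tip)
    print([tuple(round(c, 3) for c in v) for v in turned])
```

Because only the vertex coordinates change, the texture mapped onto each triangle follows the rotation without any new image data being transmitted.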
Key points:
JPEG coding mechanism: DCT / Zigzag Scanning / Adaptive
Quantization / VLC
MPEG layered structure:
Pixel, Block, Macroblock, Field DCT Coding / Frame DCT Coding, Slice,
Picture, Group of Pictures (GOP), Sequence, Packetized Elementary Stream
(PES)
MPEG compression mechanism:
Prediction
Motion compensation
Scanning
YCbCr formats (4:4:4, 4:2:0, etc.)
Profiles @ Level
I, P, B pictures & reordering
Encoder / Decoder process & block diagram
MPEG Data transport
MPEG Timing & Buffer control:
STC/SCR/DTS
PCR/PTS
Technical terms
Macro blocks
HVS = Human Visual System
GOP = Group of Pictures
VLC = Variable Length Coding/Coder
IDCT/DCT = (Inverse) Discrete Cosine Transform
PES = Packetized Elementary Stream
MP@ML = Main profile @ Main Level
PCR = Program Clock Reference
SCR = System Clock Reference
STC = System Time Clock
PTS = Presentation Time Stamp
DTS = Decode Time Stamp
PAT = Program Association Table
PMT = Program Map Table
