MPEG-4 standard
O. Le Meur
olemeur@irisa.fr
Univ. of Rennes 1
http://www.irisa.fr/temics/staff/lemeur/
October 1, 2012
Table of Contents
MPEG-4 standard
Hierarchical syntax
Transform
Entropy coding
Profiles
Amendments
Conclusion
MPEG-4
History
[Figure: standardization process — CfP, first solutions, assessment of proposals, iterations, Verification Model (VM), VM evolution]
Terminology
H.26L (has become outdated);
JVT (Joint Video Team) or JVT codec;
JM2.x, JM3.x, JM4.x (reference software versions);
AVC or Advanced Video Coding.
For this part, most of the figures have been extracted from B. Girod's course (EE398 Image and Video Compression).
A big toolbox
Hierarchical syntax
Slice and macroblock
Macroblock
Basic syntax and processing unit;
1 MB is composed of 16x16 pixels of luminance and two 8x8 blocks of chrominance (4:2:0);
MBs within a slice depend on each other;
A MB can be further partitioned.
Slice group
A slice group is a subset of the MBs in a coded picture and may contain one or more slices.
The goal is to divide the picture into different scan patterns of macroblocks. Different types can be used:
Type 0: interleaved;
Type 1: dispersed;
Type 2: foreground and background. All but the last slice group are defined as
rectangular regions within the picture. The last slice group contains all MBs not
contained in any other slice group (background);
Type 3: box-out. A box is created starting from the center of the frame;
Type 4: raster scan;
Type 5: wipe (vertical scan order);
Type 6: explicit (a MB map is entirely user-defined).
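As an illustration, the interleaved map (type 0) can be sketched by assigning macroblocks to slice groups in round-robin runs; the picture size and run lengths below are illustrative, not taken from the standard.

```python
# Sketch of an FMO type-0 (interleaved) slice-group map: MBs in raster order
# are assigned to slice groups in cyclic runs. Run lengths are illustrative.
def interleaved_slice_group_map(num_mbs, run_lengths):
    """Return a list mapping each MB (raster order) to a slice-group id."""
    mb_map = []
    group = 0
    while len(mb_map) < num_mbs:
        run = run_lengths[group]
        mb_map.extend([group] * min(run, num_mbs - len(mb_map)))
        group = (group + 1) % len(run_lengths)
    return mb_map

# A QCIF picture (11x9 = 99 MBs) split into two interleaved slice groups:
mb_map = interleaved_slice_group_map(99, [3, 2])
```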
Example extracted from "Unequal Error Protection Technique for ROI Based H.264 Video Coding", H. K. Arachchi, CCECE 2006.
The H.264 network abstraction layer (NAL) encodes each slice into a separate data packet.
In spatial interpolation, the values of missing pixels are estimated from the surrounding
pixels of the same frame, without using the temporal information.
Temporal interpolation is based on the corresponding regions of the reference frames. If a
motion vector is missing, it can be estimated based on the motion vectors of the
surrounding regions.
Combination schemes use an adaptive mechanism to choose the best concealment method.
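The spatial scheme can be sketched as follows; the 4-neighbour averaging and the fallback value are illustrative simplifications, not the exact concealment algorithm.

```python
# Sketch of spatial error concealment: a missing pixel is estimated as the
# average of its available (correctly received) 4-neighbours in the same frame.
def conceal_pixel(frame, y, x, lost):
    """frame: 2-D list of pixel values; lost: set of (y, x) lost positions."""
    h, w = len(frame), len(frame[0])
    neighbours = [frame[ny][nx]
                  for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                  if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in lost]
    # Fallback to mid-grey if no neighbour survived (illustrative choice).
    return sum(neighbours) // len(neighbours) if neighbours else 128

frame = [[10, 20, 30],
         [40,  0, 60],
         [70, 80, 90]]
value = conceal_pixel(frame, 1, 1, lost={(1, 1)})  # (20+80+40+60)//4 = 50
```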
Slice
I-Slice
An I slice contains only intra-coded MBs (predicted from previously coded samples in the same
slice). I=Intra.
P-Slice
A P slice can contain inter-coded MBs (predicted from samples in previously coded pictures), intra-coded MBs or skipped MBs. P=Predictive.
B-Slice
A B slice can contain inter-coded MBs (each MB partition can be predicted from samples of one or two reference pictures before and after the current picture). B=Bi-predictive (≠ bi-directional as in MPEG-2!).
Macroblock
MB-AFF=MB-adaptive frame/field
The concept of macroblock frame/field coding decision originated from the MPEG-2 standard. Instead of splitting a 16x16 MB into two 16x8 blocks, a super MB is defined as the decision unit (16x32).
The frame is scanned as MB pairs. For each MB pair (16x32), the frame/field coding type is decided. A super MB can be coded as:
two frame MBs of 16x16;
one top-field MB of 16x16 and one bottom-field MB of 16x16.
Coding a MB pair in field mode requires modifications to a number of the encoding and decoding steps.
Motion compensation
Multiple Reference Frames
Motion vector prediction
Motion Vectors
Each partition or sub-MB partition is predicted from an area of the same size and shape in a reference picture.
The motion vector has a quarter-sample resolution for the luma component and
one-eighth-sample resolution for the chroma components.
Sub-pixel motion compensation can provide significantly better compression
performance than integer-pixel compensation, at the expense of increased complexity.
Quarter-pixel accuracy outperforms half-pixel accuracy.
Half-sample positions are obtained by applying a 6-tap filter with tap values (1/32, -5/32, 20/32, 20/32, -5/32, 1/32):
Sample b: b = floor((E - 5F + 20G + 20H - 5I + J)/32)
Sample h: h = floor((A - 5C + 20G + 20M - 5R + T)/32)
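A sketch of this filter, following the floor form above (the standard additionally adds a rounding offset before the division and clips the result to the sample range):

```python
# 6-tap half-pel filter with taps (1, -5, 20, 20, -5, 1)/32, applied to six
# consecutive integer-position samples (E..J for sample b, A..T for sample h).
def half_pel(s):
    """s: six integer-position samples; returns the clipped half-pel value."""
    val = (s[0] - 5 * s[1] + 20 * s[2] + 20 * s[3] - 5 * s[4] + s[5]) // 32
    return max(0, min(255, val))  # clip to the 8-bit sample range

half_pel([100, 100, 100, 100, 100, 100])  # a flat area is preserved: 100
```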
Once all the half-pel samples are available, the samples at quarter-pel positions are produced by linear interpolation (average of samples at integer and half-pel positions).
H.264/AVC allows unrestricted motion vectors (motion vectors can point outside the image area; in this case, the reference frame is extended beyond the image boundaries by repeating the edge pixels before interpolation).
From [Richardson,03].
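A minimal sketch of this edge extension: instead of physically padding the reference frame, out-of-picture coordinates can simply be clamped to the nearest edge pixel.

```python
# Edge-pixel repetition for unrestricted motion vectors: a reference sample
# outside the picture is replaced by the nearest edge pixel (coordinate clamp).
def sample_padded(ref, y, x):
    h, w = len(ref), len(ref[0])
    y = max(0, min(h - 1, y))  # clamp row to the picture
    x = max(0, min(w - 1, x))  # clamp column to the picture
    return ref[y][x]

ref = [[1, 2],
       [3, 4]]
sample_padded(ref, -5, 1)  # row clamped to 0 -> 2
```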
Direct prediction mode
Weighted prediction
Weighted prediction is a method of modifying the samples of motion-compensated
prediction data in a P or B slice MB:
Explicit weighted prediction for P and B slice MB: the weighting factors are
determined by the encoder and transmitted in the slice header;
Implicit weighted prediction for B slice MB: the weighting factors are calculated
based on the relative temporal positions of the list 0 and list 1 reference pictures.
Weighted prediction may be effective in coding of fade transitions.
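A sketch of the explicit case: the motion-compensated sample is scaled and offset as below. The rounding form and the name logWD (weight-denominator exponent) follow common descriptions of H.264 weighted prediction; treat the exact details as an assumption here.

```python
# Sketch of explicit weighted prediction: the motion-compensated sample is
# scaled by a weight w and shifted by an offset signalled in the slice header.
def weighted_pred(sample, w, offset, logWD):
    # Rounded scaling: (sample*w + 2^(logWD-1)) >> logWD, then add the offset.
    val = ((sample * w + (1 << (logWD - 1))) >> logWD) + offset
    return max(0, min(255, val))  # clip to the 8-bit sample range

# A fade-to-black modelled by halving the prediction (w=16 with logWD=5):
weighted_pred(200, 16, 0, 5)  # -> 100
```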
Motion vector prediction
For 8x16 partitions, MVp for the left 8x16 partition is predicted from A and MVp for the right 8x16 partition is predicted from C.
For 16x8 partitions, MVp for the upper 16x8 partition is predicted from B and MVp for the lower 16x8 partition is predicted from A.
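These partition-dependent choices can be sketched as below; the component-wise median of A, B and C used as the general fallback is the standard's 16x16 rule, included here for completeness.

```python
# Sketch of the MV predictor (MVp) choices for the partitions stated above.
# mv_A, mv_B, mv_C are the motion vectors of the left, top and top-right
# neighbouring blocks, as (x, y) tuples.
def mv_predictor(partition, half, mv_A, mv_B, mv_C):
    """partition: '8x16', '16x8' or '16x16'; half: 'left'/'right', 'top'/'bottom'."""
    if partition == '8x16':
        return mv_A if half == 'left' else mv_C
    if partition == '16x8':
        return mv_B if half == 'top' else mv_A
    # General case (e.g. 16x16): component-wise median of the neighbours.
    return tuple(sorted(c)[1] for c in zip(mv_A, mv_B, mv_C))

mv_predictor('16x8', 'top', (1, 1), (4, 0), (2, 5))  # -> (4, 0), taken from B
```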
Principle
16x16 Intra prediction modes
4x4 Intra prediction modes
Intra prediction
16x16 luma DC prediction
If the TOP and LEFT predictors are available:
mean = (sum(H) + sum(V) + 16) / 32
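A one-line sketch of this DC mode (H = the 16 top neighbours, V = the 16 left neighbours):

```python
# 16x16 luma DC intra prediction: every pixel of the block is predicted by the
# rounded mean of the 32 reconstructed neighbours above and to the left.
def dc_prediction(top, left):
    """top, left: the 16 reconstructed neighbours above / to the left."""
    return (sum(top) + sum(left) + 16) >> 5  # (sum + 16) / 32, rounded

dc_prediction([100] * 16, [120] * 16)  # (1600 + 1920 + 16) >> 5 = 110
```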
16x16 Plane mode
Given the top predictors (T0...T15), the left predictors (L0...L15) and the top-left corner predictor (LT) arranged as follows:
Deblocking filter
Due to coarse quantization at low bit rates, block-based coding typically results in
visually noticeable discontinuities along the block boundaries.
Idea
To remove such blocking artifacts, a deblocking filter operating within the predictive
coding loop is proposed:
As the coder and the decoder must do the same operation, this filter also
constitutes a required component of the decoding process;
Adaptivity on different levels (slice, edge...);
To improve the appearance of the decoded pictures;
Significantly superior to post-filtering (the filter typically reduces bit rate by 5-10 percent).
Algorithm
Filter decision
The set of eight pixels across a 4x4 block horizontal or vertical boundary is denoted as shown below, with the actual boundary between p0 and q0.
Filtering condition: a group of samples is filtered only if:
Bs ≠ 0 and
|p0 - q0| < α and |p1 - p0| < β and |q1 - q0| < β.
α and β are defined in the standard; they increase with the average quantiser parameter Qp of the two blocks:
filtering is disabled when there is a high gradient across the block boundary, since it is then likely a real edge in the original image;
Qp is small: α and β are small (the probability of blocking artefacts is very low);
Qp is high: α and β are high.
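The filtering condition can be sketched directly; the sample values and thresholds below are illustrative.

```python
# Deblocking filter decision across a block edge: filter only if the boundary
# strength Bs is nonzero and all local gradients are below the QP-dependent
# thresholds alpha and beta.
def should_filter(bs, p1, p0, q0, q1, alpha, beta):
    return (bs != 0
            and abs(p0 - q0) < alpha   # step across the edge
            and abs(p1 - p0) < beta    # gradient inside block P
            and abs(q1 - q0) < beta)   # gradient inside block Q

# A small blocking step (|p0-q0|=4) is filtered; a true edge (|p0-q0|=40,
# exceeding alpha) is left untouched:
should_filter(2, 80, 82, 86, 88, alpha=10, beta=4)
```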
Examples
From [Richardson,03].
Transform
Introduction
4x4 residual transform
A DCT-based integer transform:

Tv = Th = [ 1  1  1  1 ]
          [ 2  1 -1 -2 ]
          [ 1 -1 -1  1 ]
          [ 1 -2  2 -1 ]
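A sketch applying this forward core transform as Y = T X Tᵗ with plain integer arithmetic (the real codec also folds scaling factors into quantisation, which is omitted here):

```python
# The 4x4 forward core transform above, Y = T X T^t. In practice this is
# implemented with only additions, subtractions and shifts.
T = [[1,  1,  1,  1],
     [2,  1, -1, -2],
     [1, -1, -1,  1],
     [1, -2,  2, -1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def forward_transform(X):
    Tt = [list(row) for row in zip(*T)]  # transpose of T
    return matmul(matmul(T, X), Tt)

# A constant residual block transforms to a single DC coefficient:
Y = forward_transform([[1] * 4 for _ in range(4)])  # Y[0][0] = 16, rest 0
```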
4x4 Hadamard transform
Intra 16x16 MB type: the 16 luma DC coefficients are further transformed with a 4x4 Hadamard transform H1, and the chroma DC coefficients with a 2x2 Hadamard transform H2:

H1 = [ 1  1  1  1 ]        H2 = 1/2 [ 1  1 ]
     [ 1  1 -1 -1 ]                 [ 1 -1 ]
     [ 1 -1 -1  1 ]
     [ 1 -1  1 -1 ]
[Figure: (a) Luma 4x4 DC blocks, (b) Chroma 2x2 DC blocks]
Zig-zag scan
[Figure: (a) zig-zag scan order (frame block), (b) zig-zag scan order (field block)]
Entropy coding
Introduction
Run-length coding
Huffman coding
Basic arithmetic coding
CABAC (Context Adaptive Binary Arithmetic Coding)
CAVLC (Context Adaptive VLC)
CA-VLC
The highest-frequency nonzero coefficients after the zig-zag scan are often sequences of ±1 ("trailing ones").
Four LUTs are available (three VLC tables and one fixed-length code table). The choice of table depends on the number of nonzero coefficients in previously coded blocks (= context-adaptive).
Encode the sign of each TrailingOne: for each TrailingOne signalled by
coeff_token, the sign is encoded with a single bit in reverse order, starting with
the highest-frequency TrailingOne;
Encode the levels of the remaining nonzero coefficients:
Level of non zero coeffs = sign + magnitude;
Encoded in reverse order;
Code for each level = level_prefix + level_suffix. This last value is adapted
depending on the magnitude of each successive coded level (difference of
magnitude, threshold and LUT) (= Context-adaptive).
Encode the total number of zeros before the last coefficient: the sum of all zeros preceding the highest nonzero coefficient in the reordered array is coded with a VLC;
Encode each run of zeros: the number of zeros preceding each nonzero coefficient is encoded in reverse order.
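The quantities signalled by these steps can be sketched for a zig-zag-scanned block; the coefficient values below are illustrative.

```python
# Sketch of the quantities CAVLC signals for a zig-zag-scanned 4x4 block:
# total nonzero coefficients, trailing +/-1s (at most 3), and total_zeros
# (zeros preceding the highest-frequency nonzero coefficient).
def cavlc_stats(coeffs):
    nz = [i for i, c in enumerate(coeffs) if c != 0]
    total_coeffs = len(nz)
    trailing_ones = 0
    for i in reversed(nz):                 # scan from the highest frequency
        if abs(coeffs[i]) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break
    total_zeros = nz[-1] + 1 - total_coeffs if nz else 0
    return total_coeffs, trailing_ones, total_zeros

# Example block after the zig-zag scan:
cavlc_stats([0, 3, 0, 1, -1, -1, 0, 1] + [0] * 8)  # -> (5, 3, 3)
```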
Example bitstream: 000000011010001001000010111001100
CABAC
The context here is the previous letter; 4 different context models are employed in CABAC.
Profiles
Profile
Baseline Profile:
I/P slices;
Multiple reference frames;
In-loop deblocking;
CAVLC entropy coding.
Main Profile
High Profile:
Main Profile features mentioned above;
8x8 transform option;
Custom quantisation matrices.
Amendments
Scalable Video Coding
Introduction
Use-cases: video telephony and video conferencing over mobile TV, wireless and
Internet video streaming, standard- and high-definition TV broadcasting, storage...
Compress once, decompress many ways!
Types of scalability
Performances
Multiview Video Coding
[Figure: array of 16 cameras]
In July 2008, MPEG officially approved an amendment of the ITU-T Rec. H.264 & ISO/IEC 14496-10 Advanced Video Coding (AVC) standard on Multiview Video Coding. This new standard enables an efficient compressed representation of stereo and multiview video by exploiting the correlation among neighboring camera views, to support 3D and free-viewpoint video applications.
[Figure: coding configurations, from [Dufaux,07]]
The compression algorithm strongly depends on the data representation and on the
targeted applications.
Conclusion