
CSE412

SELECTED TOPICS IN
COMPUTER ENGINEERING

DIGITAL VIDEO STANDARDS


MPEG: Moving Picture Experts Group
• MPEG-1 (1992)
• Compression for Storage
• 1.5 Mbps
• Frame-based Compression
• MPEG-2 (1994)
• Digital TV
• 6.0 Mbps
• Frame-based Compression
• MPEG-4 (1998)
• Multimedia Applications, digital TV, synthetic graphics
• Lower bit rate
• Object based compression
• MPEG-7
• Multimedia Content Description Interface
• MPEG-21
• Digital identification, Intellectual Property (IP) rights management
Basics of MPEG
Types of pictures
– I (intra) frame
• compressed using only intraframe coding
• Moderate compression but faster random access
– P (predicted) frame
• Coded with motion compensation using past I frames or P
frames
• Can be used as reference pictures for additional motion
compensation
– B (bidirectional) frame
• Coded by motion compensation by either past or future I or P
frames
– D (DC) frame
• Limited use: encodes only DC components of intraframe coding
MPEG Frame Types
• Intra (I) pictures: coded by themselves, as still
images. No temporal coding. No motion
vectors.
MPEG Frame Types
• Forward Motion Compensated predicted (P)
pictures – forward motion compensated from
the previous I or P frame
MPEG Frame Types
• Motion Compensated interpolated (B) pictures –
forward, backward, or interpolatively (average of
forward and backward) motion compensated from
previous and next I/P frames
MPEG Frame Structure Terminology
• A slice is a collection of macroblocks, tracing in
a raster scan from upper left to lower right
– The resynchronization unit
• A Group of Pictures (GOP) contains ≥ 1 I frame.
– The unit for random access into the
sequence
MPEG GOP Structure
• A Group of Pictures (GOP) may contain
– All I pictures
– I & P pictures only
– I, P, & B Pictures

I B B P B B P B B P B B P B B I
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Frame Ordering

– Display order (encoder input order):

B B I B B P B B P B B P B B P B B I
-1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

– But consider coding dependencies:


• Frame 2 (B) needs frame 4 (P) to be decoded
first, etc.
• So better transmit frame 4 before frame 2
I B B P B B P B B P B B P B B I B B
1 -1 0 4 2 3 7 5 6 10 8 9 13 11 12 16 14 15
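The reordering above can be sketched in a few lines of Python (an illustrative helper, not part of any standard): each B frame is held back until the anchor (I or P) frame it depends on has been transmitted.

```python
def coding_order(display):
    """Reorder (index, type) frames from display order to coding
    (transmission) order: each I/P anchor is sent before the B frames
    that precede it in display order, since they are predicted from it."""
    out, pending_b = [], []
    for idx, ftype in display:
        if ftype == "B":
            pending_b.append((idx, ftype))   # hold until the next anchor
        else:
            out.append((idx, ftype))         # emit the anchor first
            out.extend(pending_b)            # then the waiting B frames
            pending_b.clear()
    return out + pending_b

# Display order B(-1) B(0) I(1) B(2) B(3) P(4) becomes
# coding order I(1) B(-1) B(0) P(4) B(2) B(3), as on the slide.
```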
Coding Mode I (Inter-Coding)
Inter coding refers to coding with motion vectors

[Diagram: a macroblock in the current frame is matched to the
best-fitting block in the previous frame; the displacement between
them is the motion vector.]
Coding Mode II (Intra-Coding)

INTRA coding refers to coding without motion vectors


The Macro Block (MB) is coded all by itself, in a manner
similar to JPEG

[Diagram: the macroblock in the current frame is coded on its own,
without reference to the previous frame.]

I-Picture Coding

Use of macroblocks modifies block-scan order:


[Diagram: a 16×16 macroblock is split into four 8×8 blocks, which are
scanned in macroblock order rather than plain raster order.]
P-Picture Coding: many coding modes
– Motion compensated coding: Motion Vector (MV) only
– Motion compensated coding: MV plus difference macroblock
– Motion compensated coding: MV plus difference MB with modified
quantization scaling
B-Picture Coding
B pictures have even more possible modes:
– Forward prediction MV, no difference block
– Forward prediction MV, plus difference block
– Backward prediction MV, no difference block
– Backward prediction MV, plus difference block
– Interpolative prediction MV, no difference block
– Interpolative prediction MV, plus difference block
– Some of above with modified Quantization parameters
Group of Pictures
IIIII…: Every picture is intra-coded.
– Fully decodable without reference to any other picture
– Editing is straightforward
– Requires about 2.5× the bit rate of bidirectional coding

IBBPBBPB…: Forward and bidirectional


– Best compression factor
– Needs large decoder memory
– Hard to edit
– Most useful for final delivery of post-produced material
(e.g., broadcast) because no editing requirement
Group of Pictures
IPPPPIPP…: Forward predicted only.
– Needs less decoder memory
IBIBIB…: bidirectional compromise
– Some of the bit rate advantage of bidirectional coding
– Editable with moderate processing.
For example, if the video after a B picture is deleted, that B frame
is no longer decodable, since its backward reference is gone.
The solution is to decode the B frame first and re-encode it using
forward prediction only, at some quality loss.
MPEG: Video Encoding
[Block diagram: input → preprocessing → frame memory → subtract
motion-compensated prediction → DCT → quantizer (Q) → VLC encoder →
buffer → output. A feedback path (Q⁻¹ → IDCT → add prediction → frame
memory) reconstructs the reference frame; motion estimation supplies
motion vectors to the motion compensation stage, and a regulator
adjusts the quantizer from buffer fullness.]
MPEG: Video Encoding
– Interframe predictive coding (P-pictures)
• For each macroblock the motion estimator produces the best
matching macroblock
• The two macroblocks are subtracted and the difference is DCT
coded
– Interframe interpolative coding (B-pictures)
• The motion vector estimation is performed twice
• The encoder forms a prediction error macroblock from either of
them or from their average
• The prediction error is encoded using a block-based DCT
– The encoder needs to reorder pictures because B-frames
depend on anchor frames that come later in display order
MPEG-1 Video Layer
• A coded representation that can be used for compressing video
sequences (both 625-line and 525-line) to bitrates around 1.5
Mbit/s.

• Developed to operate from storage media offering a continuous
transfer rate of about 1.5 Mbit/s.

• Different techniques for video compression:
• Select an appropriate spatial resolution for the signal. Use block-based
motion compensation to reduce the temporal redundancy. Motion
compensation is used for causal prediction of the current picture from a
previous picture, for non-causal prediction of the current picture from a
future picture, or for interpolative prediction from past and future pictures.
• The difference signal, the prediction error, is further compressed using the
discrete cosine transform (DCT) to remove spatial correlation and is then
quantized.
• Finally, the motion vectors are combined with the DCT information, and
coded using variable length codes.
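The DCT-plus-quantization step above can be illustrated with a naive (unoptimized) 8×8 2-D DCT. The single quantizer step size is a simplification: MPEG-1 actually uses per-coefficient quantizer matrices.

```python
import math

def dct2(block):
    """Naive 8x8 2-D DCT-II over a pixel (or prediction-error) block."""
    n = 8
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            cu = math.sqrt(0.5) if u == 0 else 1.0
            cv = math.sqrt(0.5) if v == 0 else 1.0
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    for x in range(n) for y in range(n))
            out[u][v] = 0.25 * cu * cv * s
    return out

def quantize(coeffs, step):
    """Uniform quantization; a per-coefficient quantizer matrix is what
    the standard really uses, so one step size is illustrative only."""
    return [[round(c / step) for c in row] for row in coeffs]

flat = [[128] * 8 for _ in range(8)]   # a flat 8x8 block
coeffs = quantize(dct2(flat), 16)
# a flat block carries only a DC coefficient; every AC term quantizes to 0
```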
MPEG-1 Systems Layer
• Combines one or more data streams from the video and audio parts with
timing information to form a single stream suited to digital storage or
transmission.
MPEG-1
• I,B,P Frames
• Picture size, bitrate is variable
• No closed-captions

• Group of Pictures
• one I frame in every group
• 10-15 frames per group
• P frames depend on the preceding I or P frame; B frames
depend on the surrounding I/P frames
• The placement of B and P frames within a GoP is flexible
MPEG Video Filtering

I B B P B B P B B P B B P B B I

I B P B P B P B P B I

I P P P P I

I P P P I

I P P I

I I
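The thinned sequences above can be derived by filtering on frame type. Dropping B frames is always safe since no frame is predicted from a B frame; thinning P frames additionally requires that the remaining frames' prediction chains stay intact. A minimal sketch:

```python
def drop_frames(gop, keep):
    """Temporal filtering: keep only frames whose type is in `keep`.
    Dropping B frames is safe because no other frame is predicted
    from a B frame; the result plays at a lower frame rate."""
    return [f for f in gop if f in keep]

gop = list("IBBPBBPBBPBBPBBI")
# keeping I and P gives the "I P P P P I" sequence above;
# keeping only I gives "I I"
```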
MPEG-2
– Digital Television (4 - 9 Mb/s)
– Satellite dishes, digital cable video
– Larger data size
– includes closed-captions
– More complex encoding (longer encoding time)
– Support higher bit rates for HDTV instead of the 1.5Mbps
– Support a larger number of applications
– Different color subsampling modes e.g., 4:2:2, 4:2:0, 4:4:4
MPEG-2: Profiles and Levels
Profiles (chroma format): SNR (4:2:0), Spatial (4:2:0),
High (4:2:0 or 4:2:2), Multiview (4:2:0)

Levels (enhancement-layer resolutions and maximum bitrates, Mbit/s):
– High: 1920×1152/60 (High, Multiview profiles); bitrates up to
roughly 100–130
– High-1440: 1440×1152/60 (Spatial, High), 1920×1152/60 (Multiview);
up to roughly 60–100
– Main: 720×576/30 (SNR, High, Multiview); up to roughly 15–25
– Low: 352×288/30 (SNR, Multiview); up to roughly 4–8
MPEG-2 Applications
Digital Betacam: 90 Mbits/s video
MPEG-2
– Main Profile, Main Level, 4:2:0: 15 Mbits/s
– High Profile, High Level, 4:2:0: adequate, expensive
MPEG-4
• Similar to MPEG-2 but it includes the following features
• Interactive Graphics Applications
• Interactive multimedia (WWW), networked distribution
MPEG-4
• Bitrates from 5kb/s to 10Mb/s
• Several extension “profiles”
• Very high quality video
• Better compression than MPEG-1
• Low delay audio and error resilience
• Support for “objects”, e.g., Face Animation
• Support for efficient streaming
MPEG-4
Objective
– Standardize algorithms for audiovisual coding in
multimedia applications allowing for
• Interactivity
• High compression
• Scalability of audio and video content
• Support for natural and synthetic audio and video
The Idea
– An audiovisual scene is a coded representation of
audiovisual objects related in space and time
MPEG-4: Scenario

A/V object
– A video object within a scene
– The background
– An instrument or voice
– Coded independently
A/V scene
– Mixture of natural or synthetic objects
– Individual bitstreams multiplexed and transmitted
– One or more channels
– Each channel may have its own quality of service
MPEG-4: Video Object Plane (VOP)
• Video frame = sum of segmented regions with
arbitrary shape (VOP)
• Shape, motion, and texture information of VOPs
belonging to the same video object is encoded
into a video object layer (VOL)
• Encode
– VOL identifiers
– Composition information
• Overlapping configuration of VOPs
MPEG-4: Coding

Shape coding
– Shape information in alpha planes
– Transparency of shape encoded
– Inter and intra shape coding functions
– After shape coding each VOP in a VO is
partitioned into non-overlapping macroblocks
Motion coding
– Shift parameter with respect to reference window
– Standard macroblock
– Contour macroblock
MPEG-4: Coding
Texture coding
– For intra-VOPs and for residual errors from motion compensation,
texture is DCT coded as in MPEG-1
– P-VOPs (prediction error blocks) may not conform to VOP
boundary
• Pixels outside the active area are set to a constant value
• Standard compression
• Efficient prediction of DC and AC components from intra and
inter coded blocks
– Multiplexing
• Shape → motion → texture coded data
• Motion and DCT coefficients can be jointly or individually
coded
Composition of Audiovisual Objects
(AVOs)
• MPEG-4 provides a standardized way to describe a scene, allowing
the user to:
– place AVOs anywhere in a given coordinate system;
– apply transforms to change the geometrical or acoustical appearance
of an AVO;
– group primitive AVOs in order to form compound media objects;
– apply streamed data to AVOs, in order to modify their attributes;
– change interactively the user’s viewing and listening points anywhere
in the scene.

• With reference to the figure, for example, one can replace the
person with a different person; change her dress or hairstyle;
group the desk and the globe into a compound AVO, since they
are static; or change the background.
[Figure: an MPEG-4 audiovisual scene]
Video Objects
• MPEG-4 treats a video sequence as a collection of
video objects.
• A video object (VO) is an area of video scene that may
occupy an arbitrary-shaped region and may exist for an
arbitrary length of time.
• An instance of a VO at a particular point in time is a
video object plane (VOP).
• In the traditional video coding sense, a rectangular
video frame is a VOP and a video sequence is a VO.
MPEG-4 Encoder

[Block diagram: MPEG-4 encoder. The input minus a prediction is DCT
transformed, quantized (Q), and texture coded, then multiplexed with
the motion and shape data into the video bitstream. A feedback path
(Q⁻¹ → IDCT → add prediction → frame store) reconstructs the
reference; motion estimation drives a switch selecting among candidate
predictions (pred. 1/2/3), and a separate shape coder feeds the
multiplexer.]
VOP Prediction

[Diagram: VOP prediction. A P-VOP is forward predicted from the
preceding I- or P-VOP; a B-VOP is forward, backward, or
bidirectionally predicted from the surrounding I-/P-VOPs.]
MPEG-4 Profiles

• Simple Profile (coding of rectangular video frames):
I-VOP, P-VOP, MV, Intra prediction, video packets, data partitioning
• Core Profile (coding of arbitrary-shaped video objects):
Simple Profile tools plus B-VOP
• Scalable Profile (scalable coding of rectangular video frames or
video objects): Simple Profile tools plus temporal scalability,
spatial scalability, and fine granular scalability
Simple Profile (SP): Basic Coding Tools
I-VOP
• An I-VOP is a rectangular video frame encoded in Intra mode.

Encoder: Source frame → DCT → Q → Reorder → RLC → VLC
Decoder: VLD → RLD → Reorder → Q⁻¹ → IDCT → Decoded frame

DCT – Discrete cosine transform; IDCT – Inverse discrete cosine transform
Q – Quantization; Q⁻¹ – Inverse quantization
RLC – Run-length coding; RLD – Run-length decoding
VLC – Variable length coding; VLD – Variable length decoding
SP: Basic Coding Tools
• A coded I-VOP consists of a VOP header, optional video packet
headers and coded macroblocks.
• Each macroblock (MB) is coded with a header (defining the
macroblock type, identifying which blocks in the MB contain coded
coefficients, signalling changes in quantization parameter, etc.)
followed by coded coefficients for each 8×8 block.
• In the decoder, the sequence of VLCs is decoded to extract the
quantized transform coefficients, which are re-scaled and inverse
transformed to reconstruct the decoded I-VOP.
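The reorder and run-length steps can be sketched as follows. A 4×4 block keeps the example compact (MPEG-4 uses 8×8 blocks), and the "EOB" string stands in for the real end-of-block code; actual VLC tables are omitted.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n-by-n block in zig-zag scan order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def run_length_encode(block):
    """Scan a quantized block in zig-zag order and emit (run, level)
    pairs: a run of zeros followed by a nonzero level, terminated by
    an end-of-block marker."""
    scan = [block[r][c] for r, c in zigzag_order(len(block))]
    pairs, run = [], 0
    for v in scan:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")
    return pairs
```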
SP: Basic Coding Tools
P-VOP
• A P-VOP is coded with Inter prediction from a previously encoded
I- or P-VOP (a reference VOP).

Encoder: Source frame → (subtract MCP) → DCT → Q → Reorder → RLC →
VLC, with ME supplying the motion vectors
Decoder: VLD → RLD → Reorder → Q⁻¹ → IDCT → (add MCR) → Decoded frame

ME – Motion estimation; MCP – Motion compensated prediction
MCR – Motion compensated reconstruction
SP: Basic Coding Tools
Motion Estimation and Compensation
• The basic motion compensation (MC) scheme is the block-based
compensation of 16×16 pixel blocks.
• The motion vector (MV) may have half-pixel resolution, where the
half-pixel positions are calculated by interpolating between pixels
at integer-pixel positions. The motion estimation (ME) method is not
defined by the standard and is left to the implementer.
• The residual MB is formed by subtracting the motion-compensated
MB (prediction) in the reference frame from the current MB.

• The residual MB is transformed with the DCT, quantized, zig-zag
scanned, and run-length coded.
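Since the ME method is left to the implementer, one simple choice is exhaustive block matching on the sum of absolute differences (SAD). The sketch below shrinks the block size from the standard 16×16 to 4×4 to keep the example small.

```python
def sad(cur, ref, bx, by, dx, dy, n=4):
    """Sum of absolute differences between the n-by-n block of the
    current frame at column bx / row by and the candidate block
    displaced by (dx, dy) in the reference frame."""
    return sum(abs(cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c])
               for r in range(n) for c in range(n))

def full_search(cur, ref, bx, by, search=2, n=4):
    """Exhaustive block matching: test every integer displacement
    within +/-search pixels and return the MV minimizing SAD."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if 0 <= by + dy <= h - n and 0 <= bx + dx <= w - n:
                cost = sad(cur, ref, bx, by, dx, dy, n)
                if cost < best_cost:
                    best_cost, best = cost, (dx, dy)
    return best, best_cost
```

Real encoders rarely use full search at 16×16; hierarchical or diamond searches trade a little accuracy for far less computation.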
SP: Coding Efficiency Tools
Four Motion Vectors per Macroblock
• The default block size for ME is 16×16 for luma pixels and 8×8 for
chroma pixels. This tool allows the encoder to choose a smaller ME
block size of 8×8 for luma and 4×4 for chroma pixels, giving 4 MVs
per MB.
• The mode can minimize the energy of the MC residual, particularly
in areas of complex motion or near the boundaries of moving
objects.
• There is an increase in overhead in sending the 4 MVs, and so the
encoder may choose to send one or four MVs on a MB-by-MB basis.
MPEG-4: Core Profile (CP)
• Simple Profile coding tools
• B-VOP (bidirectionally predicted Inter-coded VOP)
• Object-based coding (with Binary Shape)
Core Profile Coding Tools
B-VOP
• The block or macroblock (MB) may be predicted using (a) forward
prediction from the previous I- or P-VOP, (b) backward prediction
from the next I- or P-VOP, or (c) an average of forward and
backward predictions.
• This mode generally gives better coding efficiency than basic
forward prediction; however, the encoder must store multiple
frames prior to coding each B-VOP which increases the memory
requirements and the encoding delay.
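Mode (c), the average of the forward and backward predictions, is a per-pixel integer average. The round-half-up rounding shown here is a common choice but is illustrative; actual rounding rules are codec-specific.

```python
def bidirectional_prediction(fwd, bwd):
    """Per-pixel average of the forward and backward motion-compensated
    blocks. The +1 rounds ties upward (illustrative, not normative)."""
    return [[(f + b + 1) // 2 for f, b in zip(frow, brow)]
            for frow, brow in zip(fwd, bwd)]
```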
Core Profile Coding Tools
Example of direct mode prediction

[Diagram: direct mode prediction. B-VOPs B5 and B6 lie between anchors
I4 and P7; their macroblocks may use forward (MVF), backward (MVB), or
bidirectional prediction, and direct mode derives both vectors by
scaling the MV of the co-located macroblock in P7.]
Object-based Coding
• The most important functionality in the Core Profile (CP) is its
support for coding of arbitrary-shaped objects.
• Each MB position in the picture is classified as:
(1) opaque (fully ‘inside’ the VOP),
(2) transparent (not part of the VOP), or
(3) on the boundary of the VOP.
• In order to indicate the shape of the VOP to the decoder, alpha
mask information is sent for every MB.
• In the Core Profile, only binary alpha information is allowed, and
each pixel position in the VOP is defined either as opaque or
transparent.
• CP supports coding of binary alpha information, and provides
tools to encode texture within boundary MBs.
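The three-way macroblock classification above follows directly from the binary alpha mask; a minimal sketch:

```python
OPAQUE, TRANSPARENT, BOUNDARY = 1, 2, 3

def classify_mb(alpha):
    """Classify a macroblock from its binary alpha block:
    all ones  -> opaque (fully inside the VOP),
    all zeros -> transparent (outside the VOP),
    mixed     -> boundary."""
    flat = [v for row in alpha for v in row]
    if all(flat):
        return OPAQUE
    if not any(flat):
        return TRANSPARENT
    return BOUNDARY
```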
Object-based Coding

Binary Shape Coding

• The binary alpha mask indicates which pixels are part of the VOP
and which pixels are outside the VOP.
• The binary alpha mask for each macroblock is called a Binary Alpha
Block (BAB).

[Figure: binary alpha mask]
MPEG-4: Scalable Profile
• Scalable coding of video data enables a decoder to decode
selectively only part of the coded bitstream.
• The coded bitstream is arranged in a number of layers, including a
base layer and one or more enhancement layers.
• The base layer decodes a video with basic quality, while the
enhancement layer(s) together with the base layer delivers a high
quality video.
• MPEG-4 Scalable Profile supports:
1. Spatial scalability
2. Temporal scalability
3. Fine grain scalability
Scalable Video Coding

[Diagram: scalable video coding. The encoder produces a base layer
and enhancement layers 1…N. Decoder A decodes the base layer alone
into a basic-quality sequence; Decoder B decodes the base layer plus
enhancement layers into a high-quality sequence.]
Spatial Scalability
• The base layer contains a reduced-resolution version of each
frame. Decoding the base layer alone produces a low-
resolution output sequence, and decoding the base layer with
enhancement layer(s) produces a higher-resolution output.
• The procedure to encode a video sequence into two spatial
layers:
1. Subsample each input video frame (or video object)
horizontally and vertically.
2. Encode the reduced-resolution frame to form the base layer.
3. Decode the base layer and upsample to the original
resolution to form the prediction frame.
4. Subtract the full-resolution frame from this prediction
frame to form the residual.
5. Encode the residual to form the enhancement layer.
Spatial Scalability
• A single-layer decoder decodes only the base layer to produce a
reduced-resolution output sequence.
• To reconstruct the full-resolution sequence:
1. Decode the base layer and upsample to the original resolution.
2. Decode the enhancement layer to obtain the residual.
3. Add the decoded residual to the decoded base layer to form the
output frame.
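The two-layer encode/decode procedure above can be sketched end to end. Plain decimation and pixel repetition stand in for the standard's filters, and the encode/decode of each layer is elided, so the round trip here is lossless; a real codec quantizes both layers.

```python
def downsample(frame):
    """Step 1: subsample 2:1 horizontally and vertically (real encoders
    low-pass filter first; plain decimation keeps the sketch short)."""
    return [row[::2] for row in frame[::2]]

def upsample(frame):
    """Step 3: upsample back to full resolution (pixel repetition is an
    illustrative stand-in for the standard's interpolation filter)."""
    wide = [[v for v in row for _ in (0, 1)] for row in frame]
    return [row for row in wide for _ in (0, 1)]

def spatial_layers(frame):
    """Steps 1-5: split a frame into a base layer and a residual
    enhancement layer (per-layer encode/decode elided)."""
    base = downsample(frame)
    prediction = upsample(base)
    residual = [[o - p for o, p in zip(orow, prow)]
                for orow, prow in zip(frame, prediction)]
    return base, residual

def reconstruct(base, residual):
    """Decoder side: upsampled base plus residual restores the
    full-resolution frame."""
    prediction = upsample(base)
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]
```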
Temporal Scalability
• The basic idea of temporal scalability is to split the sequence into
two layers. The base layer is encoded at a lower frame rate; an
enhancement layer of I-, P- and/or B-VOPs can be decoded together
with the base layer to provide an increased video frame rate.
• The enhancement VOPs are predicted using motion-compensated
prediction according to the following rules as illustrated in
the following figures.
• An enhancement I-VOP is encoded without any prediction.
• An enhancement P-VOP is predicted from:
(i) the previous enhancement P-VOP/I-VOP; or
(ii) the previous base layer P-VOP/I-VOP; or
(iii) the next base layer P-VOP/I-VOP.
Temporal Scalability
• An enhancement B-VOP is predicted from:
(i) the previous enhancement and previous base layer P-VOP/I-VOP; or
(ii) the previous enhancement and next base layer P-VOP/I-VOP; or
(iii) the previous and next base layer P-VOP/I-VOP.
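The layer split itself is just a partition of the frame sequence by index (the prediction rules above then apply per enhancement VOP); a minimal sketch:

```python
def temporal_split(frames, factor=2):
    """Split a sequence into a low-frame-rate base layer (every
    factor-th frame) and an enhancement layer holding the rest."""
    base = frames[::factor]
    enhancement = [f for i, f in enumerate(frames) if i % factor]
    return base, enhancement

def temporal_merge(base, enhancement, factor=2):
    """Decoder side: interleave both layers back into display order,
    restoring the full frame rate."""
    out, b, e = [], iter(base), iter(enhancement)
    for i in range(len(base) + len(enhancement)):
        out.append(next(b) if i % factor == 0 else next(e))
    return out
```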
Temporal Scalability

[Diagram: temporal enhancement P-VOP prediction options. Enhancement
VOPs 0 and 2 are predicted (i) from the previous enhancement VOP,
(ii) from the previous base-layer VOP, or (iii) from the next
base-layer VOP (base-layer VOPs 1 and 3).]
Temporal Scalability

[Diagram: temporal enhancement B-VOP prediction options (i)-(iii),
with enhancement VOPs 0 and 2 and base-layer VOPs 1 and 3.]
Fine Granular Scalability
• Fine Granular Scalability (FGS) is a method of encoding a
sequence as a base layer and enhancement layer such that the
enhancement layer can be truncated during or after encoding to
give a highly flexible control over the transmitted bitrate.
• FGS is very useful in video streaming applications where the
channel bandwidth may change. When that happens, the
streaming server transmits the base layer and a truncated version
of the enhancement layer to match the available bandwidth,
hence maximizing the decoded video quality without the need to
re-encode the video sequence.

MPEG-7
• Data + Multimedia Content Description Scheme
• Description Definition Language (e.g., XML-based)
• Does not deal with data, but meta-data transmission
• Description Scheme + Content Description, e.g.:
• Table of content
• Still Images
• Summaries
• links
• etc.
• Focuses mainly on how descriptions of data are generated and how
they are used