
Analysis of MPEG

MPEG: the Organization

• Moving Picture Experts Group
• Established in 1988
• Standards published under the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC)
• Official designation: ISO/IEC JTC1/SC29/WG11


MPEG vs. Competitors
• Generally produces better quality than other formats such as:
• Video for Windows
• Indeo and QuickTime
• MPEG audio/video compression can be used in many applications:
• DVD players
• HDTV recorders
• Internet video
• Video conferencing
• Others
MPEG Overview
• MPEG-1: a standard for storage and retrieval of moving pictures and audio on storage media
• MPEG-2: a standard for digital television
• MPEG-4: a standard for multimedia applications
• MPEG-7: a content representation standard for information search
• MPEG-21: a multimedia framework standard, including metadata for audio and video files
MPEG-1

• First standard to be published by the MPEG organization (in 1992)
• A standard for storage and retrieval of moving pictures and audio on storage media
• Example formats: Video CD (VCD), MP3, MP2


5 Parts of MPEG-1
• Part 1: Combining video and audio inputs into a single/multiple data stream
• Part 2: Video compression
• Part 3: Audio compression
• Part 4: Requirements verification
• Part 5: Technical report on the software implementation of Parts 1-3
Basic Structure of Audio Encoder

Note: A decoder basically works in just the opposite manner


Processes of an Audio Encoder
• Mapping Block – divides the audio input into 32 equal-width frequency subbands (samples)
• Psychoacoustic Block – calculates the masking threshold for each subband
• Bit-Allocation Block – allocates bits using the outputs of the Mapping and Psychoacoustic blocks (a toy allocation loop is sketched below)
• Quantizer & Coding Block – scales and quantizes (reduces) the samples
• Frame Packing Block – formats the samples with headers into an encoded stream
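To make the interplay of the Psychoacoustic and Bit-Allocation blocks concrete, here is a minimal Python sketch. It is not the normative Layer I/II algorithm (which uses standardized allocation tables); the 6 dB-per-bit rule, the random inputs, and the greedy loop are simplifying assumptions.

import numpy as np

def allocate_bits(signal_energy_db, masking_threshold_db, bit_pool=256):
    """Greedy bit-allocation sketch for 32 subbands (hypothetical,
    simplified): repeatedly give one bit to the subband whose
    noise-to-mask ratio is currently worst."""
    n_subbands = 32
    bits = np.zeros(n_subbands, dtype=int)
    # Signal-to-mask ratio: how far the signal sits above the mask.
    smr = signal_energy_db - masking_threshold_db
    while bit_pool > 0:
        # Each allocated bit buys roughly 6 dB of quantization SNR.
        nmr = smr - 6.0 * bits          # noise-to-mask ratio per subband
        worst = int(np.argmax(nmr))
        if nmr[worst] <= 0:             # all quantization noise is masked
            break
        bits[worst] += 1
        bit_pool -= 1
    return bits

# Example: random subband energies and masking thresholds.
rng = np.random.default_rng(0)
energy = rng.uniform(20, 80, 32)        # dB
mask = energy - rng.uniform(5, 30, 32)  # dB
print(allocate_bits(energy, mask))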
MPEG-1 Layers I, II, III

• The layers differ in processing complexity and resulting audio quality
• Layer I (MP1) – little processing needed, lowest quality
• Layer II (MP2) – moderate processing, "okay" quality
• Layer III (MP3) – heavy processing, near-"CD" quality
MPEG-2 Overview

• Extends the video & audio compression of MPEG-1
- Substantially reduces the bandwidth required for high-quality transmissions
- Optimizes the balance between resolution (quality) and bandwidth (speed)
10 Parts of MPEG-2
• Part 1: Combines video and audio data into single/multiple streams
• Part 2: Offers more advanced video compression tools
• Part 3: A multi-channel extension of the MPEG-1 Audio standard
• Parts 4/5: Correspond to and build on Parts 4/5 of MPEG-1
• Part 6: Specifies protocols for managing MPEG-1 & MPEG-2 bitstreams
• Part 7: Specifies a multi-channel audio coding algorithm
• Part 8: (discontinued because of obsolescence)
• Part 9: Specifies the Real-time Interface (RTI) to Transport Stream decoders
• Part 10: The conformance part of Digital Storage Media Command and Control (currently under development)
MPEG-2 Video Compression Overview
VIDEO STREAM DATA HIERARCHY
MPEG-2 Video Compression Overview
• Video stream
• Group of Pictures (GOP)
• I-frames: can be reconstructed without any reference to other frames
• P-frames: forward predicted from the last I-frame or P-frame
• B-frames: predicted both forward and backward, from the surrounding I/P-frames (see the reordering sketch below)
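Because B-frames reference frames that come later in display order, the coded (transmission) order differs from display order. A small sketch, assuming a typical IBBP structure; real encoders may choose other GOP patterns:

def coded_order(display_gop):
    """Reorder a GOP from display order to a typical coded order:
    each anchor (I or P) is sent before the B-frames that are
    bi-directionally predicted from it."""
    out, pending_b = [], []
    for frame in display_gop:
        if frame[0] in "IP":          # anchor frame
            out.append(frame)         # send the anchor first...
            out.extend(pending_b)     # ...then the Bs that needed it
            pending_b = []
        else:                         # B-frame
            pending_b.append(frame)
    return out + pending_b

print(coded_order(["I0", "B1", "B2", "P3", "B4", "B5", "P6"]))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']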
MPEG-2 Video Compression Overview
Compression: Eliminating Redundancies
• Spatial redundancy
• Pixels are replicated within a single frame of video
• Temporal redundancy
• Consecutive frames of video display images of the same scene
MPEG-2 Video Compression
Overview
Four Video Compression Techniques:
1. Pre-processing
2. Temporal Prediction
3. Motion Compensation
4. Quantization
MPEG-2 Video Compression Overview
• Pre-processing
• Filters out unnecessary information: detail that is difficult to encode and is not an important component of human visual perception
MPEG-2 Video Compression Overview
• Temporal prediction:
• Uses the Discrete Cosine Transform (DCT) to:
• Divide each frame into 8x8 blocks of pixels
• Reorganize the residual differences between frames
• Encode each block separately (a DCT sketch follows below)
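The following sketch computes the 2-D DCT of an 8x8 block with SciPy; it only illustrates the energy-compaction property the standard exploits, not MPEG-2's normative transform stages:

import numpy as np
from scipy.fftpack import dct

def dct2(block):
    """2-D DCT-II of an 8x8 block (applied along rows, then columns)."""
    return dct(dct(block.T, norm='ortho').T, norm='ortho')

# A smooth 8x8 gradient: most of its energy ends up in a few
# low-frequency coefficients (the top-left corner).
block = np.add.outer(np.arange(8), np.arange(8)).astype(float)
coeffs = dct2(block)
print(np.round(coeffs, 1))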
MPEG-2 Video Compression Overview

• Quantization:
• Applies to the DCT coefficients
• Removes subjective redundancy
• Controls the compression factor
• Maps coefficients onto smaller-magnitude values, many of which become zero (sketched below)
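A toy quantizer sketch; the matrix below is hypothetical, not an MPEG-2 quantization table, but it shows how coarser step sizes at high frequencies drive many coefficients to zero:

import numpy as np

# A hypothetical quantization matrix: coarser steps at higher
# frequencies, where the eye is less sensitive.
quant = 8 + 4 * np.add.outer(np.arange(8), np.arange(8))

def quantize(coeffs, scale=1.0):
    """Divide DCT coefficients by the (scaled) step sizes and round.
    The scale factor is the encoder's compression-control knob."""
    return np.round(coeffs / (quant * scale)).astype(int)

def dequantize(levels, scale=1.0):
    return levels * quant * scale

# High-frequency levels mostly become 0, which run-length
# coding then exploits.
coeffs = np.full((8, 8), 50.0)
print(quantize(coeffs, scale=2.0))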
MPEG-2 Video Compression Overview
Where It Is Used:
• Multimedia Communications
• Webcasting
• Broadcasting
• Video on Demand
• Interactive Digital Media
• Telecommunications
• Mobile communications
MPEG-2 Transmission Overview

• Building the MPEG Bit Stream:

Elementary Stream (ES)
- Digital control data
- Digital audio
- Digital video
- Digital data

Packetized Elementary Stream (PES)
- Each ES is packetized into a stream of PES packets.
- A PES packet may be a fixed- or variable-sized block.
- Each block carries up to 65,536 bytes plus a 6-byte protocol header (a sketch follows below).
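A minimal sketch of the 6-byte PES header described above (3-byte start-code prefix, 1-byte stream id, 2-byte length). Real PES packets usually add optional header fields such as PTS/DTS, which this sketch omits:

import struct

def make_pes_packet(stream_id, payload):
    """Build a minimal PES packet: start-code prefix 0x000001,
    stream id, 16-bit length, then the payload."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too large for the 16-bit length field")
    header = b"\x00\x00\x01" + struct.pack(">BH", stream_id, len(payload))
    return header + payload

pkt = make_pes_packet(0xE0, b"video data...")  # 0xE0: a video stream id
print(pkt[:6].hex(), len(pkt))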
MPEG-2 Transmission Cont.

• MPEG-2 Multiplexing

MPEG Program Stream
- Tightly coupled PES packets
- Used for video playback and network applications

MPEG Transport Stream
- Each PES packet is broken into fixed-size, 188-byte transport packets (sketched below)
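A sketch of this packetization step, assuming the fixed 188-byte packet size with a 4-byte header. Real Transport Stream headers also signal payload-unit starts and use adaptation fields (rather than 0xFF padding) for stuffing:

def packetize(pes, pid):
    """Split a PES packet into 188-byte transport packets with a
    minimal 4-byte header: sync byte 0x47, 13-bit PID, and a
    4-bit continuity counter."""
    packets, cc = [], 0
    for off in range(0, len(pes), 184):
        chunk = pes[off:off + 184]
        header = bytes([
            0x47,
            (pid >> 8) & 0x1F,        # PID high bits (flags left at 0)
            pid & 0xFF,               # PID low byte
            0x10 | (cc & 0x0F),       # payload only + continuity counter
        ])
        packets.append(header + chunk.ljust(184, b"\xff"))
        cc = (cc + 1) % 16
    return packets

ts = packetize(b"\x00" * 500, pid=0x100)
print(len(ts), all(len(p) == 188 for p in ts))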
MPEG Transport Streams
Combining ES from Encoders into a Transport Stream
Single & Multiple Program Transport Streams
Format of a Transport Stream Packet
MPEG-2 Encoders
Types of MPEG-2 Decoders

1. MPEG-2 Software Decoder & PC-Based Accelerator

2. MPEG-2 Computer Decoder

3. MPEG-2 Network Computers/Thin Clients

4. MPEG-2 Set-Top Box

5. MPEG-2 Consumer Equipment


MPEG-4 Overview

• Emergence
• Handles specific requirements from rapidly developing multimedia applications

• Advantages over MPEG-1 and MPEG-2
• Object-oriented coding
MPEG-4 Standard: 6 Parts Overview
• Part 1: Systems - specifies scene description, multiplexing, synchronization, buffer management, and management and protection of intellectual property.
• Part 2: Visual - specifies the coded representation of natural and synthetic visual objects.
• Part 3: Audio - specifies the coded representation of natural and synthetic audio objects.
• Part 4: Conformance Testing - defines conformance conditions for bit streams and devices; this part is used to test MPEG-4 implementations.
• Part 5: Reference Software - includes software corresponding to most parts of MPEG-4; it can be used for implementing compliant products, as ISO waives the copyright of the code.
• Part 6: Delivery Multimedia Integration Framework (DMIF) - defines a session protocol for the management of multimedia streaming over generic delivery technologies.
Features & Functionalities
• Object-oriented
• Primitive audiovisual objects are coded

• Low data rate
• Allows for high-quality video at lower data rates and smaller file sizes

• Interoperability
• Standardized ways of composing and interacting with audiovisual scenes
MPEG-4 Object-Based Coding Architecture
MPEG-4 Scene
Targeted Applications
• Digital TV
• TV logos, customized advertising, multi-window screens
• Mobile multimedia
• Cell phones and palm computers
• TV production
• Targeted viewers
• Games
• Personalized games
• Streaming video
• News updates and live music shows over the Internet
MPEG-4
• MPEG-4, or ISO/IEC 14496, is an international standard describing the coding of audio-visual objects
• The 1st version of MPEG-4 became an international standard in 1999 and the 2nd version in 2000 (6 parts); since then many parts have been added and some are under development today
• MPEG-4 includes object-based audio-video coding for Internet streaming and television broadcasting, but also digital storage
• MPEG-4 includes interactivity and VRML support for 3D rendering
• Has profiles and levels like MPEG-2
• Has 27 parts
MPEG-4 parts
• Part 1, Systems – synchronizing and multiplexing audio and video
• Part 2, Visual – coding visual data
• Part 3, Audio – coding audio data, enhancements to Advanced Audio Coding and new techniques
• Part 4, Conformance testing
• Part 5, Reference software
• Part 6, DMIF (Delivery Multimedia Integration Framework)
• Part 7, optimized reference software for coding audio-video objects
• Part 8, carriage of MPEG-4 content on IP networks
MPEG-4 parts (2)
• Part 9, reference hardware implementation
• Part 10, Advanced Video Coding (AVC)
• Part 11, Scene description and application engine; BIFS (Binary Format for Scenes) and XMT (Extensible MPEG-4 Textual format)
• Part 12, ISO base media file format
• Part 13, IPMP extensions
• Part 14, MP4 file format, version 2
• Part 15, AVC (Advanced Video Coding) file format
• Part 16, Animation Framework eXtension (AFX)
• Part 17, timed text subtitle format
• Part 18, font compression and streaming
• Part 19, synthesized texture stream
MPEG-4 parts (3)
• Part 20, Lightweight Application Scene Representation (LASeR) and Simple Aggregation Format (SAF)
• Part 21, MPEG-J Graphics Framework eXtension (GFX)
• Part 22, Open Font Format
• Part 23, Symbolic Music Representation
• Part 24, audio and systems interaction
• Part 25, 3D Graphics Compression Model
• Part 26, audio conformance
• Part 27, 3D graphics conformance
Motivations for MPEG-4
• Broad support for multimedia facilities is available
• 2D and 3D graphics, audio and video – but:
• Incompatible content formats
• 3D graphics formats such as VRML are poorly integrated with 2D formats such as Flash or HTML
• Broadcast formats (MHEG) are not well suited for the Internet
• Some formats have a binary representation – not all
• SMIL, HTML+, etc. solve only a part of the problems
• Both authoring and delivery are cumbersome
• Poor support for multiple formats
MPEG-4: Audio/Visual (A/V) Objects
• Simple video coding (MPEG-1 and -2)
• A/V information is represented as a sequence of rectangular frames: the television paradigm
• Future: Web paradigm, game paradigm ... ?
• Object-based video coding (MPEG-4)
• A/V information: a set of related stream objects
• Individual objects are encoded as needed
• Temporal and spatial composition into complex scenes
• Integration of text, "natural" and synthetic A/V
• A step towards semantic representation of A/V
• Communication + Computing + Film (TV ...)
Main parts of MPEG-4
1. Systems
– Scene description, multiplexing, synchronization, buffer management,
intellectual property and protection management
2. Visual
– Coded representation of natural and synthetic visual objects
3. Audio
– Coded representation of natural and synthetic audio objects
4. Conformance Testing
– Conformance conditions for bit streams and devices
5. Reference Software
– Normative and non-normative tools to validate the standard
6. Delivery Multimedia Integration Framework (DMIF)
– Generic session protocol for multimedia streaming
Main objectives – rich data
• Efficient representation for many data types
• Video from very low bit rates to very high quality
• 24 kbit/s to several Mbit/s (HDTV)
• Music and speech data over a very wide bit rate range
• Very low bit rate speech (1.2-2 kbit/s)
• Music (6-64 kbit/s)
• Stereo broadcast quality (128 kbit/s)
• Synthetic objects
• Generic dynamic 2D and 3D objects
• Specific 2D and 3D objects, e.g. human faces and bodies
• Speech and music can be synthesized by the decoder
• Text
• Graphics
Main objectives – robust + pervasive
• Resilience to residual errors
• Provided by the encoding layer
• Even under difficult channel conditions – e.g. mobile
• Platform independence
• Transport independence
• MPEG-2 Transport Stream for digital TV
• RTP for Internet applications
• DAB (Digital Audio Broadcast) ...
• However, tight synchronization of media
• Intellectual property management + protection
• For both A/V contents and algorithms
Main objectives – scalability
• Scalability
• Enables partial decoding
• Audio – scalable sound rendering quality
• Video – progressive transmission of different quality levels (spatial and temporal resolution)
• Profiling
• Enables partial implementation
• Solutions for different settings
• Applications may use a small portion of the standard
• "Specify minimum for maximum usability"
Main objectives – genericity
• Independent representation of objects in a scene
• Independent access for their manipulation and re-use
• Composition of natural and synthetic A/V objects into one audiovisual scene
• Description of the objects and the events in a scene
• Capabilities for interaction and hyperlinking
• Delivery-media-independent representation format
• Transparent communication between different delivery environments
Object-based architecture
MPEG-4 as a tool box
• MPEG-4 is a tool box (not a monolithic standard)
• The main issue is not better compression
• No "killer" application (as DTV was for MPEG-2)
• Many new, different applications are possible
• Enriched broadcasting, remote surveillance, games, mobile multimedia, virtual environments etc.
• Profiles
• Binary Format for Scenes (BIFS)
• Based on VRML 2.0 for 3D objects
• "Programmable" scenes
• Efficient communication format
MPEG-4 Systems part
MPEG-4 scene, VRML-like model
Logical scene structure
MPEG-4 Terminal Components
Digital Terminal Architecture
BIFS tools – scene features
• 3D, 2D scene graph (hierarchical structure)
• 3D, 2D objects (meshes, spheres, cones etc.)
• 3D and 2D composition, mixing 2D and 3D
• Sound composition – e.g. mixing, "new instruments", special effects
• Scalability and scene control
• Terminal capabilities (TermCap)
• MPEG-J for terminal control
• Face and body animation
• XMT – textual format; a bridge to the Web world
BIFS tools – command protocol
• Replace a scene with a new scene
• A replace command is an entry point, like an I-frame
• The whole context is set to the new value
• Insert a node in a grouping node
• Instead of replacing a whole scene, just adds a node
• Enables progressive download of a scene
• Delete a node – deletion of an element costs a few bytes
• Change a field value, e.g. color, position, switching an object on/off (a toy model of these commands is sketched below)
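A hypothetical in-memory model of these four command types; the real protocol is a compact binary encoding operating on nodes of the BIFS scene graph, not Python dictionaries:

# Toy scene graph: node id -> {children, fields}
scene = {"root": {"children": [], "fields": {}}}

def replace_scene(new_scene):
    """Entry point, like an I-frame: the whole context is reset."""
    global scene
    scene = new_scene

def insert_node(parent_id, node_id, node):
    scene[parent_id]["children"].append(node_id)
    scene[node_id] = node

def delete_node(node_id):
    scene.pop(node_id)
    for n in scene.values():
        if node_id in n["children"]:
            n["children"].remove(node_id)

def change_field(node_id, field, value):
    scene[node_id]["fields"][field] = value

insert_node("root", "box1", {"children": [], "fields": {"color": "red"}})
change_field("box1", "color", "blue")   # costs only a few bytes on the wire
print(scene)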
BIFS tools – animation protocol
• The BIFS command protocol is synchronized, but not a streaming medium
• BIFS-Anim is for continuous animation of scenes
• Modification of any value in the scene
– Viewpoints, transforms, colors, lights
• The animation stream contains only the animation values
• Differential coding – extremely efficient
Elementary stream management
• Object description
• Relations between streams and to the scene
• Auxiliary streams:
• IPMP – Intellectual Property Management and Protection
• OCI – Object Content Information
• Synchronization + packetization
– Time stamps, access unit identification, …
• System Decoder Model
• File format - a way to exchange MPEG-4 presentations
An example MPEG-4 scene
Object-based compression and
delivery
Linking streams into the scene (1)
Linking streams into the scene (2)
Linking streams into the scene (3)
Linking streams into the scene (4)
Linking streams into the scene (5)
Linking streams into the scene (6)
• An object descriptor contains ES descriptors pointing to:
• Scalable coded content streams
• Alternate-quality content streams
(so the terminal may select suitable streams)
• Object content information
• IPMP information
• ES descriptors have subdescriptors for:
• Decoder configuration (stream type, header)
• Sync layer configuration (for flexible SL syntax)
• Quality of service information (for heterogeneous nets)
• Future / private extensions
Describing scalable content
Describing alternate content versions
Decoder configuration info in older standards

cfg = configuration information ("stream headers")

Decoder configuration information in MPEG-4

• The OD (ESD) must be retrieved first
• For broadcast, ODs must be repeated periodically
The Initial Object Descriptor
• Derived from the generic object descriptor
– Contains additional elements to signal profile and level (P&L)
• P&L indications are the default way of content selection
– The terminal reads the P&L indications and knows whether it
has the capability to process the presentation
• Profiles are signaled in multiple separate dimensions
• Scene description
• Graphics
• Object descriptors
• Audio
• Visual
• The “first” object descriptor for an MPEG-4 presentation is
always an initial object descriptor
Transport of object descriptors
• Object descriptors are encapsulated in OD commands
– ObjectDescriptorUpdate / ObjectDescriptorRemove
– ES_DescriptorUpdate / ES_DescriptorRemove
• OD commands are conveyed in their own object descriptor stream
in a synchronized manner with time stamps
– Objects / streams may be announced during a presentation
• There may be multiple OD & scene description streams
– A partitioning of a large scene becomes possible
• Name scopes for identifiers (OD_ID, ES_ID) are defined
– Resource management for sub scenes can be distributed
• Resource management aspect
- If the location of streams is changed, only the ODs need modification, not the scene description
Initial OD pointing to scene and OD stream
Initial OD pointing to a scalable scene
Auxiliary streams
• IPMP streams
• Information for Intellectual Property Management and Protection
• Structured in (time stamped) messages
• Content is defined by proprietary IPMP systems
• Complemented by IPMP descriptors
• OCI (Object Content Information) streams
• Metadata for an object ("poor man's MPEG-7")
• Structured descriptors conveyed in (time stamped) messages
• Content author, date, keywords, description, language, ...
• Some OCI descriptors may be directly in ODs or ESDs
• ES_Descriptors pointing to such streams may be attached to any object
descriptor – scopes the IPMP or OCI stream
• An IPMP stream attached to the object descriptor stream is valid for all streams
Adding an OCI stream to an audio stream
Adding OCI descriptors to audio streams
Linking streams to a scene – including "upstreams"
MPEG-4 streams
Synchronization of multiple elementary streams
• Based on two well-known concepts
• Clock references
– Convey the speed of the encoder clock

• Time stamps
– Convey the time at which an event should happen

• Time stamps and clock references are:
• defined in the system decoder model
• conveyed on the sync layer (a scheduling sketch follows below)
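A sketch of how the two concepts combine: a clock reference anchors the sender's time base to the local clock, and time stamps then map to local deadlines. The 90 kHz resolution is just an example value; drift correction between successive references is omitted:

import time

class ObjectClock:
    """Recover the sender's time base from a clock reference (OCR),
    then turn composition time stamps (CTS) into local wall-clock
    deadlines. A simplified sketch."""
    def __init__(self, resolution_hz):
        self.res = resolution_hz
        self.ocr = None
        self.local_at_ocr = None

    def on_clock_reference(self, ocr_ticks):
        self.ocr = ocr_ticks
        self.local_at_ocr = time.monotonic()

    def deadline(self, cts_ticks):
        """Local time at which a composition unit should appear."""
        return self.local_at_ocr + (cts_ticks - self.ocr) / self.res

clk = ObjectClock(resolution_hz=90_000)   # an example tick rate
clk.on_clock_reference(0)
print(clk.deadline(45_000))               # 0.5 s after the OCR arrived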
System Decoder Model (1)
System Decoder Model (2)
• An ideal model of decoder behavior
– Instantaneous decoding – delay is the implementation's problem
• Incorporates the timing model
– Decoding & composition time
• Manages decoder buffer resources (a toy buffer check is sketched below)
• Useful for the encoder
• Ignores delivery jitter
• Designed for a rate-controlled "push" scenario
– Also applicable to a flow-controlled "pull" scenario
• Defines composition memory (CM) behavior
• Random-access memory holding the current composition unit
• CM resource management is not implemented
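A toy check of the buffer bookkeeping in such an idealized model, assuming bits arrive as scheduled and whole access units leave instantaneously at their decoding times:

def simulate_buffer(arrivals, removals, capacity):
    """Return True if the decoding buffer never under- or overflows.
    arrivals/removals: lists of (time, size-in-bits) events."""
    fullness = 0
    events = sorted([(t, +size) for t, size in arrivals] +
                    [(t, -size) for t, size in removals])
    for _, delta in events:
        fullness += delta
        if fullness < 0 or fullness > capacity:
            return False
    return True

# Two access units of 4000 bits, each arriving before its decode time:
ok = simulate_buffer(arrivals=[(0.0, 4000), (0.1, 4000)],
                     removals=[(0.2, 4000), (0.3, 4000)],
                     capacity=10_000)
print(ok)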
Synchronization of elementary streams with time events in the scene description

• How are time events handled in the scene description?
• How is this related to time in the elementary streams?
• Which time base is valid for the scene description?
Cooperating entities in synchronization
• Time line (“object time base”) for the scene
• Scene description stream with time stamped BIFS access
units
• Object descriptor stream with pointers to all other streams
• Video stream with (decoding & composition) time stamps
• Audio stream with (decoding & composition) time stamps
• Alternate time line for audio and video
A/V scene with time bases and stamps
Hide the video at time T1
Hide the video on frame boundary
The Synchronization Layer (SL)
• Synchronization layer (short: sync layer or SL)
• SL packet = one packet of data
• Consists of header and payload
• Defines a "wrapper syntax" for the atomic data unit: the access unit
• Indicates boundaries of access units
• AccessUnitStartFlag, AccessUnitEndFlag, AULength
• Provides consistency checking for lost packets
• Carries object clock reference (OCR) stamps
• Carries decoding and composition time stamps (DTS, CTS)
Elementary Stream Interface (1)
Elementary Stream Interface (2)
Elementary Stream Interface (3)
Elementary Stream Interface (4)
The sync layer design
• Access units are conveyed in SL packets
• Access units may use more than one SL packet
• SL packets have a header to encode the information conveyed through the ESI
• SL packets that don't start an AU have a smaller header
How is the sync layer designed?
• As flexible as possible, to be suitable for
• a wide range of data rates
• a wide range of different media streams
• Time stamps have
• variable length
• variable resolution
• The same holds for clock reference (OCR) values
• The OCR may come via another stream
• An alternative to time stamps exists for lower bit rates
• Indication of the start time and duration of units (accessUnitDuration, compositionUnitDuration)
SLConfigDescriptor syntax example (SDL – Syntax Description Language)

class SLConfigDescriptor {
  uint(8) predefined;
  if (predefined==0) {
    bit(1) useAccessUnitStartFlag;
    bit(1) useAccessUnitEndFlag;
    bit(1) useRandomAccessPointFlag;
    bit(1) usePaddingFlag;
    bit(1) useTimeStampsFlag;
    uint(32) timeStampResolution;
    uint(32) OCRResolution;
    uint(6) timeStampLength;
    uint(6) OCRLength;
    if (!useTimeStampsFlag) {
      ................
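To illustrate how this configuration drives parsing, here is a hypothetical Python sketch: every SL header field is optional and its length configurable, so the header can shrink to almost nothing for low-bit-rate streams. The field order is simplified relative to the actual specification:

class BitReader:
    def __init__(self, data):
        self.bits = "".join(f"{b:08b}" for b in data)
        self.pos = 0
    def read(self, n):
        v = int(self.bits[self.pos:self.pos + n], 2)
        self.pos += n
        return v

def parse_sl_header(reader, cfg):
    """Parse a (simplified) SL packet header under a given config."""
    hdr = {}
    if cfg["useAccessUnitStartFlag"]:
        hdr["au_start"] = reader.read(1)
    if cfg["useAccessUnitEndFlag"]:
        hdr["au_end"] = reader.read(1)
    if cfg["useTimeStampsFlag"] and hdr.get("au_start"):
        hdr["dts"] = reader.read(cfg["timeStampLength"])
        hdr["cts"] = reader.read(cfg["timeStampLength"])
    return hdr

cfg = dict(useAccessUnitStartFlag=1, useAccessUnitEndFlag=1,
           useTimeStampsFlag=1, timeStampLength=32)
print(parse_sl_header(BitReader(b"\xc0" + b"\x00" * 9), cfg))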
Wrapping SL packets in a suitable layer
MPEG-4 Delivery Framework (DMIF)
The MPEG-4 Layers and DMIF
• DMIF hides the delivery technology
• Adopts QoS metrics
• Compression Layer
• Media aware
• Delivery unaware
• Sync Layer
• Media unaware
• Delivery unaware
• Delivery Layer
• Media unaware
• Delivery aware
DMIF communication architecture
Multiplex of elementary streams
• Not a core MPEG task
• Just responds to specific needs of MPEG-4 content transmission
• Low delay
• Low overhead
• Low complexity
• This prompted the design of the "FlexMux" tool (its simple mode is sketched below)
• A single file format is desirable
• This led to the design of the MPEG-4 file format
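A sketch of FlexMux's simple mode, assuming the 2-byte header layout (8-bit channel index, 8-bit length) in front of each SL packet; MuxCode mode, which interleaves several channels per packet, is not shown:

def flexmux_simple(channel_index, sl_packet):
    """Prefix one SL packet with a 2-byte FlexMux header so that
    several low-rate streams can share a channel with minimal
    overhead."""
    if len(sl_packet) > 255:
        raise ValueError("simple mode carries at most 255 bytes")
    return bytes([channel_index, len(sl_packet)]) + sl_packet

mux = flexmux_simple(3, b"audio SL packet") + flexmux_simple(7, b"bifs")
print(mux.hex())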
Modes of FlexMux
How to configure MuxCode mode?
A multiplex example
Multiplexing audio channels in FlexMux
Multiplexing all channels to MPEG-2 TS
MPEG-2 Transport Stream
MPEG-4 content access procedure
• Locate an MPEG-4 content item (e.g. by URL) and connect to it
– via the DMIF Application Interface (DAI)
• Retrieve the Initial Object Descriptor
• This Object Descriptor points to a BIFS + OD stream
– Open these streams via the DAI
• The scene description points to other streams through Object Descriptors
– Open the required streams via the DAI
• Start playing! (a toy walk-through is sketched below)
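A runnable toy walk-through of these steps; the Session class and its methods are invented for illustration and do not reflect the actual DAI primitive names:

class Session:
    def __init__(self, url):
        self.url = url                      # 1. locate and connect
    def initial_object_descriptor(self):
        # 2. the IOD points at the scene (BIFS) and OD streams
        return {"scene_es": 1, "od_es": 2}
    def open_stream(self, es_id):
        print(f"opening elementary stream {es_id} via DAI")
        return es_id

def play(url):
    s = Session(url)
    iod = s.initial_object_descriptor()
    s.open_stream(iod["scene_es"])          # 3. open BIFS + OD streams
    s.open_stream(iod["od_es"])
    for es_id in (3, 4):                    # 4. streams named by the ODs
        s.open_stream(es_id)
    print("start playing!")                 # 5.

play("mpeg4://example/content")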
MPEG-4 content access example
