
Introduction to Multimedia Synchronization
Klara Nahrstedt
cs598KN
Content
 Notion of Synchronization
 Intra-object and inter-object synchronization
 Live and Synthetic Synchronization
 Synchronization Requirements
 Reference Model for Synchronization
 Synchronization in Distributed Environments
 Synchronization Specification
Notion of Synchronization
 Multimedia synchronization is understood in terms of three relations: the content relation, the spatial relation, and the temporal relation
 Content Relation: defines a data dependency between media objects
– Example: dependency between a filled spreadsheet and a graphic that represents the data listed in the spreadsheet

Spatial Relation
 The spatial relation is represented through the layout relation; it defines the space used for the presentation of a media object on an output device at a certain point in time in a multimedia document
– Example: desktop publishing
– Layout frames are placed on an output device and content is assigned to these frames
 Positioning of layout frames:
– Fixed to a position in the document
– Fixed to a position on a page
– Relative to the position of another frame
– The concept of frames is also used for positioning time-dependent objects
 Example: in window-based systems, layout frames correspond to windows, and video can be positioned in a window
Temporal Relation
 The temporal relation defines temporal dependencies between media objects
– Example: lip synchronization
 This relation is the focus of the papers we read; we will not discuss the content or spatial relations further
 Time-dependent objects represent a media stream, because temporal relations exist between consecutive units of the stream
 Time-independent objects are traditional media such as images or text

Temporal Relations (2)
 Temporal synchronization is supported by
many system components:
– OS (CPU scheduling, semaphores during IPC)
– Communication systems (traffic shaping, network
scheduling)
– Databases
– Document handling
 Synchronization is needed at several levels of a multimedia system
Temporal Relations (3)
 1. level: OS and lower communication layers handle single streams
– Objective: avoid jitter at the presentation time of one stream
 2. level: on top of this sits the run-time support for synchronization of multiple multimedia streams (schedulers)
– Objective: bounded skew between the various streams
 3. level: the next level holds the run-time support for synchronization between time-dependent and time-independent media, together with the handling of user interaction
– Objective: bounded skew between time-dependent and time-independent media

Specification of Synchronization
 Implicit Specification
– The temporal relation may be specified implicitly during the capturing of the media objects; the goal of the presentation is to present the media in the same way as they were originally captured
 Audio/video recording and playback / VOD applications
 Explicit Specification
– The temporal relation may be specified explicitly in the case of presentations that are composed of independently captured or otherwise created objects
 Example: a slide show, where the presentation designer
– Selects the appropriate slides
– Creates the accompanying audio
– Defines the units of the audio presentation stream
– Defines the units of the audio stream at which the slides have to be presented
Inter-object and Intra-Object
Synchronization
 Intra-object synchronization refers to the time
relation between various presentation units of one
time-dependent media object


 Inter-object synchronization refers to the
synchronization between media objects

[Figure: intra-object synchronization (consecutive LDUs of a video stream, 40 ms apart) and inter-object synchronization (temporal alignment of Audio 1, Video, Slides, Animation, and Audio 2)]
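
A minimal sketch (in Python; the frame rate and timestamps are illustrative assumptions) of the two notions: intra-object synchronization fixes the presentation deadline of each LDU within one stream, while inter-object synchronization measures the skew between corresponding LDUs of different streams:

VIDEO_LDU_MS = 40  # 25 frames/s, i.e., one video LDU every 40 ms (as in the figure)

def intra_deadline(start_ms, frame_index):
    # Intra-object sync: presentation deadline of one LDU within a single stream
    return start_ms + frame_index * VIDEO_LDU_MS

def inter_skew(audio_ts_ms, video_ts_ms):
    # Inter-object sync: skew between corresponding audio and video LDUs
    return audio_ts_ms - video_ts_ms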
Classification of Synchronization Units
 Logical Data Units (LDU)
– Samples or pixels
– Notes or frames
– Movements or scenes
– Symphony or movie
 Fixed LDU vs Variable LDU
 LDU specification during recording vs LDU
defined by user
 Open LDU vs Closed LDU
Live Synchronization
 The goal is to reproduce at presentation exactly the temporal relations as they existed during the capturing process
 Temporal relation information must therefore be captured during capturing (see the sketch below)
 Live sync is needed in conversational services
– Video conferencing, video phone
– Recording and later retrieval of such a conversation is considered a retrieval service, i.e., a presentation with delay
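
A minimal sketch (Python; source and present are hypothetical callbacks) of the live-sync idea: capture timestamps are recorded alongside the media units, and playback reproduces the same temporal offsets:

import time

def capture(source):
    # Record each unit together with its capture timestamp (the sync information)
    return [(time.monotonic(), unit) for unit in source]

def replay(recording, present):
    # Reproduce the captured temporal relations at presentation time
    t0_capture = recording[0][0]
    t0_play = time.monotonic()
    for ts, unit in recording:
        delay = (ts - t0_capture) - (time.monotonic() - t0_play)
        if delay > 0:
            time.sleep(delay)  # wait until the unit's original offset has elapsed
        present(unit)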
Synthetic Synchronization
 Temporal relations are artificially specified
 Often used in presentation and retrieval-based
systems with stored data objects that are arranged
to provide new combined multimedia objects
– Authoring and tutoring systems
 Need synchronization editors to support flexible
synchronization relations between media
 Two phases: (1) the specification phase defines the temporal relations, (2) the presentation phase presents the data in a synchronized mode
– Example: four recorded audio messages relate to parts of an engine in an animation; the animation sequence shows a slow 360-degree rotation of the engine
Synchronization Requirements
 For intra-object synchronization:
– Accuracy concerning jitter and end-to-end delays in the presentation of LDUs
 For inter-object synchronization:
– Accuracy in the parallel presentation of media objects
 Implications of the blocking method:
– Fine for time-independent media
– Gap problem for time-dependent media
 What does blocking a stream mean for the output device?
 Should previous parts be repeated in the case of speech or music?
 Should the last picture of a stream be shown?
 How long may such a gap exist?
Synchronization Requirements (2)
 Solutions to the gap problem (see the sketch below)
– Restricted blocking method
 Switch to an alternative presentation if the gap between late video and audio exceeds a predefined threshold
 For example, show the last picture as a still image
– Re-sampling of a stream
 Speed up or slow down streams for the purpose of synchronization
– Off-line re-sampling: used after capturing media streams with independent devices
 Example: a concert captured with two independent audio and video devices
– Online re-sampling: used during a presentation when a gap between media streams occurs at run time
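
A minimal sketch (Python; the threshold value and the show/freeze_last callbacks are illustrative assumptions) of the restricted blocking method:

GAP_THRESHOLD_MS = 160  # assumed threshold for switching to the alternative

def present_video_unit(unit, audio_clock_ms, video_clock_ms, show, freeze_last):
    # Restricted blocking: tolerate a small gap, but switch to an alternative
    # presentation (last picture as a still image) once the gap between
    # late video and audio exceeds the predefined threshold
    gap_ms = audio_clock_ms - video_clock_ms  # > 0 means video is late
    if gap_ms > GAP_THRESHOLD_MS:
        freeze_last()
    else:
        show(unit)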
Synchronization Requirements (3)
 Lip synchronization requirements refer to the temporal relation between the audio and video streams of a speaking person
 The time difference between related audio and video LDUs is called the synchronization skew
 Streams are in sync if skew = 0 or |skew| ≤ bound
 Streams are out of sync if |skew| > bound
 Bounds:
– Audio/video in sync means -80 ms ≤ skew ≤ 80 ms
– Audio/video out of sync means skew < -160 ms or skew > 160 ms
– Transient means -160 ms ≤ skew < -80 ms or 80 ms < skew ≤ 160 ms
Synchronization Requirements (4)
 Pointer synchronization requirements are very important in computer-supported cooperative work (CSCW)
 We need synchronization between graphics, pointers, and audio
 Comparison
– A lip sync error corresponds to a skew of 40 to 60 ms
– A pointer sync error corresponds to a skew of 250 to 1500 ms
 Bounds (see the sketch below):
– Pointer/audio/graphics in sync means -500 ms ≤ skew ≤ 750 ms
– Out of sync means skew < -1000 ms or skew > 1250 ms
– Transient means -1000 ms ≤ skew < -500 ms or 750 ms < skew ≤ 1250 ms
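
Both bound tables follow one pattern; a minimal sketch (Python) that classifies a measured skew against the in-sync and out-of-sync limits quoted above:

def classify_skew(skew_ms, in_low, in_high, out_low, out_high):
    # In sync inside [in_low, in_high], out of sync beyond [out_low, out_high],
    # transient in between (limits taken from the bounds above)
    if in_low <= skew_ms <= in_high:
        return "in sync"
    if skew_ms < out_low or skew_ms > out_high:
        return "out of sync"
    return "transient"

classify_skew(90, -80, 80, -160, 160)       # lip sync      -> "transient"
classify_skew(100, -500, 750, -1000, 1250)  # pointer sync  -> "in sync"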
Synchronization Requirements (5)
 Digital audio on CD-ROM:
– Maximum allowable jitter delay in perception experiments is 5-10 ns; other experiments suggest 2 ms
 The combination of audio and animation is not as stringent as lip synchronization
– Maximum allowable skew is +/- 80 ms
 Stereo audio is tightly coupled
– Maximum allowable skew is 20 ms; because of listening errors, the suggested skew is +/- 11 ms
 Loosely coupled audio channels: speaker and background music
– Maximum allowable skew is 500 ms
Synchronization Requirements (6)
 Production-level synchronization should be guaranteed prior to the presentation of the data at the user interface
– For example, in the case of recording synchronized data for subsequent playback
 Stored data should be captured and recorded with no skew
 For playback, the defined lip sync boundaries are 80 ms
 For simultaneous playback at a local and a remote workstation, the sync skew should be between -160 ms and 0 ms (video should be ahead of audio for the remote station due to pre-fetching)
Synchronization Requirements (7)
 Presentation-level synchronization should be defined at the user interface
 This synchronization focuses on human perception
– Examples
 Video and image, overlay: +/- 240 ms
 Video and image, non-overlay: +/- 500 ms
 Audio and image (music with notes): +/- 5 ms
 Audio and slide show (loosely coupled image): +/- 500 ms
 Audio and text (text annotation): +/- 240 ms
Reference Model for Synchronization
 Synchronization of multimedia objects is classified with respect to a four-level system:
– Specification level: an open layer; includes applications and tools that allow the creation of synchronization specifications (e.g., sync editors); editing and formatting; mapping of user QoS to abstractions at the object level
– Object/service level: operates on all types of media and hides the differences between discrete and continuous media; plans, coordinates, and initiates presentations
– Stream level: operates on multiple media streams and provides inter-stream synchronization; resource reservation and scheduling
– Media level: operates on a single stream, treated as a sequence of LDUs; provides intra-stream synchronization; file and device access
Synchronization in Distributed
Environments
 Synchronization information must be transmitted along with the audio and video streams so that the receiver side can synchronize the streams
 Delivery of the complete sync information can be done before the start of the presentation
– This is used in synthetic synchronization
 Advantage: simple implementation
 Disadvantage: presentation delay
 Delivery of sync information can use out-of-band communication via a separate sync channel
– This is used in live synchronization
 Advantage: no additional presentation delay
 Disadvantage: an additional channel is needed, and additional errors can occur

Synchronization in Distributed
Environments (2)
 Delivery of complete synchronization information can be done using in-band communication via multiplexed data streams, i.e., the synchronization information is carried in the headers of the multimedia PDUs (see the sketch below)
– Advantage: related sync information is delivered together with the media units
– Disadvantage: difficult to use for multiple sources
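
A minimal sketch (Python; the field layout is an illustrative assumption, not any specific protocol format) of in-band delivery, with the sync information carried in each PDU header:

import struct

# Illustrative header: stream id (2 bytes), sequence number (4 bytes),
# media timestamp in ms (8 bytes), followed by the media payload
HEADER = struct.Struct("!HIQ")

def pack_pdu(stream_id, seq, timestamp_ms, payload):
    # The sync information travels together with the media unit
    return HEADER.pack(stream_id, seq, timestamp_ms) + payload

def unpack_pdu(pdu):
    stream_id, seq, timestamp_ms = HEADER.unpack_from(pdu)
    return stream_id, seq, timestamp_ms, pdu[HEADER.size:]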
Synchronization in Distributed
Environments (3)
Location of Synchronization Operations
 It is possible to synchronize media objects by recording objects together and leaving them together as one object, i.e., combining objects into a new media object during creation; the synchronization operation then happens at the recording site
 The synchronization operation can be placed at the sink; in this case the demand on bandwidth is larger, because additional sync information must be transported
 The synchronization operation can be placed at the source; in this case the demand on bandwidth is smaller, because the streams are multiplexed according to the synchronization requirements


Synchronization in Distributed
Environments (4)
Clock Synchronization
 Consider the synchronization accuracy between the clocks at the source and the destination
 Global time-based synchronization needs clock synchronization
 In order to re-synchronize, we can allocate buffers at the sink and start the transmission of audio and video in advance, or use NTP (Network Time Protocol) to bound the maximum clock offset (see the sketch below)
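
A minimal sketch (Python; the offset bound and pre-buffering delay are illustrative assumptions) of mapping source timestamps onto the sink clock:

MAX_CLOCK_OFFSET_MS = 10  # assumed bound on the source/sink offset, e.g., via NTP
PREBUFFER_MS = 200        # playout delay; must absorb jitter plus the clock offset

def playout_time(source_ts_ms, stream_start_source_ms, stream_start_sink_ms):
    # Schedule each unit a fixed delay after its offset within the stream;
    # the pre-buffer hides network jitter and the residual clock offset
    return stream_start_sink_ms + (source_ts_ms - stream_start_source_ms) + PREBUFFER_MS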
Synchronization in Distributed
Environments (5)
Other Synchronization Issues
 Synchronization in a distributed environment is a multi-step process
– Sync must be considered during object acquisition (during
video digitization)
– Sync must be considered during retrieval (synchronize
access to frames of a stored video)
– Sync must be considered during delivery of LDUs to the
network (traffic shaping)
– Sync must be considered during transport (use isochronous
protocols if possible)
– Sync must be considered at the sink (sync delivery to the
output devices)

Synchronization Specification Methods
 Interval-based Specification
– The presentation duration of an object is considered as an interval
 Examples of operations: A before(0) B, A overlaps B, A starts B, A equals B, A during B, A while(0,0) B (see the sketch below)
– Advantage: easy to handle open LDUs and therefore user interactions
– Disadvantage: the model does not include skew specifications
[Figure: interval-based arrangement of Audio 1, Video 1, Slides, Animation, and Audio 2 on the presentation time axis]
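
A minimal sketch (Python; intervals are (start, end) pairs on the presentation time axis) of some of the operations listed above:

def before(a, b, delay=0):   # A before(delay) B: B starts `delay` after A ends
    return a[1] + delay == b[0]

def overlaps(a, b):          # A overlaps B
    return a[0] < b[0] < a[1] < b[1]

def starts(a, b):            # A starts B
    return a[0] == b[0] and a[1] < b[1]

def equals(a, b):            # A equals B
    return a == b

def during(a, b):            # A during B
    return b[0] < a[0] and a[1] < b[1]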
Synchronization Specification (2)
 Control Flow-based Specification – Hierarchical Approach
– The flow of concurrent presentation threads is synchronized at predefined points of the presentation
– Basic hierarchical specification: (1) serial synchronization and (2) parallel synchronization of actions
– An action can be atomic or compound
– An atomic action handles the presentation of a single media object, user input, or a delay
– Compound actions are combinations of synchronization operators and atomic actions
– Delay as an atomic action allows modeling of further synchronization (e.g., a delay in a serial presentation); see the sketch below
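
A minimal sketch (Python; audio1, video, etc. stand for hypothetical atomic presentation actions) of serial and parallel composition:

import threading

def serial(*actions):
    # Compound action: run sub-actions one after another
    def run():
        for act in actions:
            act()
    return run

def parallel(*actions):
    # Compound action: run sub-actions concurrently and wait at the sync point
    def run():
        threads = [threading.Thread(target=act) for act in actions]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return run

# e.g., serial(audio1, parallel(video, slides), parallel(animation, audio2))()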
Synchronization Specification (3)
 Control Flow-based Specification – Hierarchical Approach
[Figure: hierarchical specification tree combining Audio 1, Video, Slides, Animation, and Audio 2 through serial and parallel operators]
 Advantages: easy to understand; natural support of hierarchy; easy integration of interactive objects
 Disadvantages: additional description of skews and QoS is necessary; presentation durations must be added


Synchronization Specification (4)
 Control Flow-based Synchronization Specification – Timed Petri Nets
[Figure: timed Petri net element with a transition, an input place holding a token, and an output place]
 Advantage: timed Petri nets allow all kinds of synchronization specifications
 Disadvantage: complex specifications are difficult to handle, and the model offers insufficient abstraction of media object content because the media objects must be split into sub-objects; see the sketch below
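
A minimal sketch (Python; attaching presentation durations to places is one common timing convention) of the Petri net elements shown in the figure:

class Place:
    # A place holds tokens; its duration models the presentation time of an LDU
    def __init__(self, duration_ms=0):
        self.duration_ms = duration_ms
        self.tokens = 0

class Transition:
    # A transition fires when every input place holds a token, consuming one
    # token per input place and producing one token per output place
    def __init__(self, inputs, outputs):
        self.inputs, self.outputs = inputs, outputs

    def enabled(self):
        return all(p.tokens > 0 for p in self.inputs)

    def fire(self):
        if not self.enabled():
            return False
        for p in self.inputs:
            p.tokens -= 1
        for p in self.outputs:
            p.tokens += 1
        return True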
Summary
 Different synchronization frameworks
– Little’s synchronization framework (Boston University)
 Goal – support retrieval and delivery of multimedia
– Firefly System – Buchanan and Zellweger
 Goal – automatically generate consistent presentation schedules for interactive documents
– HyTime – a standard hypermedia/time-based structuring language
 Goal – a standard for the structured representation of hypermedia information
 HyTime is an application of the Standard Generalized Markup Language (SGML)