

Marcus Lynch
Multimedia Systems: Contexts and Applications

Persistence of Vision
As with animation, film and video rely on the
phenomenon of persistence of vision to
create an illusion of continuous visual
Fusion frequency is the rate of change
needed to achieve this illusion - typically 40
images per second. Below this rate, a
flickering effect is perceived

24 frames per second?

Film is shot at 24 frames per second, but film
projectors interrupt the projection, effectively
displaying each frame twice to give 48
frames per second
NTSC TV & video = 30 fps with each frame
in two interlaced halves (fields), giving 60 fields per second
PAL & SECAM systems use 25 interlaced frames per second, giving 50 fields per second

Video as fast Image

For live-action video it is necessary to
Record images fast enough to achieve
convincing representation of real-time
Interface to equipment conforming to
requirements and standards defined for
broadcast TV - even though these
standards are largely irrelevant to
computer-based playback
Several different standards around the

Video demands

Video places considerable strain on current
processing, storage and data transmission
capabilities of computer systems
High consumer demand, but expectations
shaped by broadcast TV
Typically video targeted at consumer
equipment plays back at reduced frame
rates in small windows with compression
To accommodate low-end PC limitations,
considerable compromises over quality
must be made, resulting in dancing
postage stamps.

Digitizing Video
Need to keep size constantly in mind
Video equipment digitizes time-varying signals
from sensors to generate bitmapped images
Using 24-bit colour, each frame of NTSC video
occupies 640x480x3 bytes = 900 KB. So, one
second of uncompressed NTSC video at ~30 fps
occupies 26 MB, one minute 1.6 GB
Storing such large amounts of data on CD-ROMs
or DVD not currently feasible, nor is transmission
over any but the fastest network - NOT Internet!
Disk arrays using SCSI standards can store entire
films - only film & TV studios can afford at present
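
The storage figures above follow from simple multiplication. A minimal sketch, assuming the frame size, colour depth and ~30 fps rate quoted on this slide (the function name is illustrative only):

```python
# Uncompressed video size, using the figures quoted on this slide.
WIDTH, HEIGHT = 640, 480      # NTSC frame size in pixels
BYTES_PER_PIXEL = 3           # 24-bit colour
FPS = 30                      # approximate NTSC frame rate

def uncompressed_bytes(seconds, width=WIDTH, height=HEIGHT,
                       bytes_per_pixel=BYTES_PER_PIXEL, fps=FPS):
    """Return the number of bytes needed for `seconds` of raw video."""
    return width * height * bytes_per_pixel * fps * seconds

frame = WIDTH * HEIGHT * BYTES_PER_PIXEL
print(f"one frame : {frame / 1024:.0f} KB")
print(f"one second: {uncompressed_bytes(1) / 1024**2:.1f} MB")
print(f"one minute: {uncompressed_bytes(60) / 1024**3:.2f} GB")
```

The sketch uses binary units (1 KB = 1024 bytes); with decimal units the one-minute figure comes out slightly higher.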

Compressed Video

Compression of digital video images is effective
approach for multimedia & lower-end equipment
Other approaches such as limiting frame size
reduce quality, but often impossible to avoid
For real-time video capture this compression must
be carried out so fast that dedicated hardware is
needed
Digital video may be captured directly from a
camera (digitizing and compressing video signal
inside camera then sending to computer via
Firewire) or indirectly from a VTR or broadcast
signal via video capture card (analogue to digital
converter), usually compressing the data before
storage or transmission

Avoiding signal degradation

Digitizing in camera has one big advantage
Analogue signals transmitted over cable will inevitably
suffer some corruption by noise, even over short distances
Noise corrupts analogue signals stored on magnetic tape
Composite video signals (domestic) suffer distortion
because of interference between colour & brightness
information (especially on VHS)

All these can affect effectiveness of compression
approaches, both inter-frame and intra-frame,
since random noise distortion will disrupt
efficiency of process

Digitization in Camera

Produces only digital signals, which are resistant to
corruption by noise and interference. Hence
transmission down cables and storage on magnetic
tape can be achieved without loss of quality
Disadvantage of in-camera digitization is lack of
user control over process. Data stream must
conform to appropriate standard, e.g. DV, which
stipulates data rate for the data stream, and
therefore the amount of compression applied
Analogue video capture boards and associated
software allow user control over compression
parameters, allowing trade-offs between picture
quality, data rate and file sizes


Devices which compress and decompress signals
(i.e. compressor/decompressors) are known as
codecs (cf. modulator/demodulators, i.e. modems)
Hardware codecs can capture video signals, store
them on computer and play them back at full motion
to video monitor. Most hardware codecs cannot
provide full screen motion to computer monitors
Software codecs needed since audiences may not
have hardware. They are slower, but allow
computer monitor playback albeit with lower quality
Suitable algorithms for software codecs are different
to those for hardware codecs, so often have to
recompress and lose quality for multimedia

Analogue Broadcast Standards

NTSC, named after National Television System
Committee. Used in N America, Japan, Taiwan,
parts of Caribbean & S America
PAL (Phase Alternating Line - referring to way
signal encoded) in most of W Europe, Antipodes &
SECAM (Séquentiel Couleur Avec Mémoire) used
in France, E Europe and former USSR
Africa follows pattern of former colonisers
Each has different frame rate (originally matching
local AC line frequency) and number of lines per

Analogue standards

All frames divided into two interlaced fields, one
consisting of the odd-numbered lines of each
frame, the other of the even lines. Transmitted
one after the other, so frame built up by
interlacing fields
NTSC 29.97 fps (originally 30 fps, changed when
colour interfered with sound). 525 lines (480 picture)
PAL 25 fps 625 lines (576 picture)
SECAM 25 fps 625 lines (576 picture)
Film footage 24fps, so mismatch with all video
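
The division of a frame into two interlaced fields can be sketched in a few lines. This is an illustration only, assuming a frame is just a list of scan lines numbered from 1; the function names are hypothetical:

```python
def split_fields(frame):
    """Split a frame (a list of scan lines) into its two fields.

    Lines are numbered from 1, so list index 0 holds line 1 (odd).
    """
    odd_field = frame[0::2]    # lines 1, 3, 5, ...
    even_field = frame[1::2]   # lines 2, 4, 6, ...
    return odd_field, even_field

def interlace(odd_field, even_field):
    """Rebuild the frame by alternating lines from the two fields."""
    frame = []
    for i in range(len(odd_field)):
        frame.append(odd_field[i])
        if i < len(even_field):      # an odd line count leaves one field shorter
            frame.append(even_field[i])
    return frame

lines = [f"line{n}" for n in range(1, 8)]
odd, even = split_fields(lines)
print(odd)
print(even)
print(interlace(odd, even) == lines)
```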

Digital Video Standards

Just as complicated as analogue
Inevitable because of need for
backward compatibility
Orthogonal to scanning formats
and field rates, so must represent
both 625/50 and 525/59.94 as well
as emerging standards
Rec. ITU-R BT.601 defines
sampling of digital video
More complex than we will discuss
on this module, but read Chapman
& Chapman if interested

Video Compression I

Video data usually compressed when digitised
Compression usually optimised for playback
through same devices
For multimedia cannot assume any particular
device - assume computer processor and video
data delivered from hard disk, CD-ROM/DVD, or
network connection
Usually prepare video material using a software
codec. Typically video data will be compressed

Video Compression II

All video compression algorithms
operate on sequence of bit-mapped images
Can compress each image in isolation,
as for still images. This is called spatial
compression or intra-frame compression
Can compress sub-sequences of frames
by only storing the differences between
them. This is called temporal
compression or inter-frame compression
Can be used together

Spatial Compression

Spatial compression is image compression
applied to a sequence of images. Thus lossy
and lossless methods, with same trade-offs as
for still images
Generally lossless methods do not compress
enough to be practical
Compression and recompression reduce image
quality
Post-production work (special effects &
corrections) carried out on uncompressed
footage so that adjustments can be made to
individual pixels of original frame
Exact copies suffer no generational loss

Temporal Compression
Certain frames in sequence designated key
frames (often at regular intervals, e.g. every 6th
frame)
Key frames only spatially compressed & frames
between key frames are replaced by difference
frames - record only differences between current
frame and key frame
Differences typically affect only small portion of
frame (e.g. talking head shots) & therefore take
up less data
Can also apply to interlaced fields of video as
well as frames since conceptually similar
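
The key frame and difference frame scheme described above can be sketched as follows. This is a toy illustration, not any real codec: frames are assumed to be flat lists of pixel values, key frames are stored whole (the spatial compression step is omitted), and the interval of 6 and all names are assumptions:

```python
KEY_INTERVAL = 6  # assumed: every 6th frame is a key frame

def encode(frames, key_interval=KEY_INTERVAL):
    """Replace non-key frames by their differences from the last key frame."""
    encoded = []
    key = None
    for i, frame in enumerate(frames):
        if i % key_interval == 0:
            key = frame
            encoded.append(("key", list(frame)))
        else:
            # Record only the pixels that differ from the key frame.
            diff = {pos: value
                    for pos, (value, old) in enumerate(zip(frame, key))
                    if value != old}
            encoded.append(("diff", diff))
    return encoded

def decode(encoded):
    """Rebuild every frame from its key frame plus the stored differences."""
    frames = []
    key = None
    for kind, data in encoded:
        if kind == "key":
            key = list(data)
            frames.append(list(key))
        else:
            frame = list(key)
            for pos, value in data.items():
                frame[pos] = value
            frames.append(frame)
    return frames

# A static "talking head" clip in which one pixel changes in the second frame.
clip = [[0] * 8 for _ in range(3)]
clip[1][3] = 9
print(encode(clip))
```

When little changes between frames, each difference frame holds only a handful of entries, which is where the saving comes from.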

Symmetry of Compression
Compression and decompression may take the
same time (a symmetrical codec) or not
(asymmetrical codec)
Generally compression takes longer than
decompression. If decompression takes too long,
the codec is unusable
Can measure amount of data loss but not the
effect on quality of compression. Terms to
describe video quality are so vague as to be
Broadcast quality means good enough to be
broadcast (better than received!)

Motion JPEG and DV

Motion JPEG most popular approach to
compressing analogue video during capture =
apply JPEG compression frame by frame with no
temporal compression. Sometimes known as
MJPEG
Usually specialised hardware, with different
implementations of algorithms
Recently MJPEG-A agreed by consortium of
digital video companies
Motion JPEG allows user to specify quality setting
- video codecs often allow maximum data rate to
be specified from which quality setting deduced

Motion JPEG & DV II

Low to mid range capture cards achieve ~3 MB
per second (about 7:1 compression ratio)
Multimedia use invariably means further
compression so starting with highest quality input
data may not be worthwhile if final delivery is via
CD-ROM or Internet
If data can be delivered fast enough, MJPEG
video can be played back at full frame size &
standard data rates on special-purpose hardware
DV similar to MJPEG, but different approach to
inter-frame compression (only between two fields
of each frame). Uncompressed video is best to
archive, followed by DV then MJPEG

Software Codecs 4 Multimedia

Four considered especially suitable for compressing
video for delivery via CD-ROM or over Internet
(make a cup of coffee!)
MPEG-1, Cinepak, Intel Indeo, Sorenson

The last three all use vector quantisation, which
works on blocks of pixels (vectors) and a code book
(a pattern recognition library) with a closest-fit
algorithm. This gives smaller list of code book
indices - decompression thus quick, but
compression typically 150 times as long (heavy
computation). Augmented by temporal
compression using key and difference frames
Usually small frames at reduced playback rate
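
The closest-fit lookup at the heart of vector quantisation can be sketched as below. This is a toy illustration, not the algorithm of Cinepak, Indeo or Sorenson: the code book is given rather than learned, blocks are short tuples of grey values, and all names are hypothetical:

```python
def distance(a, b):
    """Squared Euclidean distance between two pixel blocks (vectors)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def closest_index(block, code_book):
    """Index of the code book entry that fits the block most closely."""
    return min(range(len(code_book)),
               key=lambda i: distance(block, code_book[i]))

def compress(blocks, code_book):
    """Slow step: search the whole code book for every block."""
    return [closest_index(block, code_book) for block in blocks]

def decompress(indices, code_book):
    """Fast step: a simple table lookup per block."""
    return [code_book[i] for i in indices]

code_book = [(0, 0, 0, 0), (255, 255, 255, 255), (0, 255, 0, 255)]
blocks = [(10, 5, 0, 3), (250, 255, 240, 255), (5, 250, 10, 245)]
indices = compress(blocks, code_book)
print(indices)
print(decompress(indices, code_book))
```

Compression has to compare each block against every code book entry, while decompression is one lookup per block, which matches the asymmetry noted above.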

MPEG video

MPEG-1 relevant standard for video incorporated in
multimedia, as it is designed for lower data rates
than MPEG-2 (broadcast digital video standard and
DVDs) and it allows progressive rendering
Rather than define a compression algorithm,
MPEG-1 defines a data stream syntax and a
decompressor
Implied compressor combines temporal
compression based on motion compensation with
spatial compression similar to JPEG
Motion compensation attempts to identify coherent
areas of movement within sequences of frames

Motion Compression
MPEG key frames called I-pictures (I for intra), purely spatially compressed
Difference frames using previous pictures are
called P-pictures
Difference frames predicted from later as well as earlier pictures are B-pictures
Can encode video clip as sequence of I-, P- and B-pictures. B = 1.5 x P = 3 x I compression, but B more
computationally intensive to reconstruct & random
access to P- and B-pictures is difficult
For decoding, pictures are rearranged from display order into
bitstream order, so each picture arrives after the pictures it depends on
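
The rearrangement from display order into bitstream order can be sketched as follows. This is a simplified illustration, assuming each B-picture depends on the next I- or P-picture in display order and that pictures are labelled by strings starting with their type letter; the function name is hypothetical:

```python
def bitstream_order(display):
    """Move each I- or P-picture ahead of the B-pictures that precede it."""
    out = []
    pending_b = []                 # B-pictures waiting for their later reference
    for picture in display:
        if picture[0] == "B":
            pending_b.append(picture)
        else:                      # I- or P-picture: emit it, then the waiting Bs
            out.append(picture)
            out.extend(pending_b)
            pending_b = []
    out.extend(pending_b)          # trailing B-pictures with no later reference
    return out

display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7"]
print(bitstream_order(display))
# → ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6']
```

A decoder reading this order always has the later reference picture in hand before it meets a B-picture, and restores display order before showing the frames.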

Because of diversity of codecs and data formats,
little hope of a universal video file format
Base standardisation on architectural framework, at
level abstract enough to allow many instantiations
Quicktime has established itself as a de facto
standard
Introduced by Apple in 1991, Quicktime
manipulates movies as objects. Here movie can be
seen as an abstraction of a video sequence but can
contain other types of media - really a framework for
organising, accessing and manipulating the data
that represent the video frames, which may be
stored separately from the movie itself

Quicktime II
Originally Quicktime focus was on temporal
aspects of time-based media - records rate at
which it should be played back & current
position plus synchronisation to allow correct
playback speed on any system.
If cannot maintain frame rate, some frames
are dropped to achieve correct duration and
synchronisation with other elements (e.g.
Time co-ordinate system also allows non-linear access to (and therefore editing of)
individual frames

Quicktime III
Quicktime success is because it is a component-based architecture, and can plug components into
structure to deal with new formats
Standard components include compressors,
sequence grabbers, movie controller, transcoders
Quicktime has its own file format - flexible way of
storing video and other media but also allows
treatment of other file formats as if native QT (e.g.
AVI, MPEG-1 and DV)
Developers of new formats & codecs do not need
to write higher level functionality as well
Wide range of platforms

Digital Video Editing & Post


Shooting and recording video provides raw
material only. Finished piece of video needs more
Editing = process of making a constructed whole
from a collection of parts by selecting, trimming &
organising raw footage & combining with sound as
well as applying transitions
Post-production = making changes or adding to
the material. Many changes are generalised
image manipulations but also include compositing
e.g. figures inserted into separately shot
background scenes, elements may be animated,
animations combined with live action etc

Filmic Conventions
100+ years of filmmaking have generated
elaborate set of conventions about how film (and
thus video at first) edited
Action sequences cut on the action
Flashback sequences introduced by long dissolve with
narration overlapping the two scenes
Horror film music tracks are very predictable!

Overturning conventions results in avant garde

Jump cuts
Cut straight to flashbacks

All film & video is constructed, e.g. two-person
conversation is constructed in editing suite. Live
webcams are framed in certain ways. Always
need to shoot footage and edit it

Traditional video editing

Advantages of non-linear video editing which
digitisation brings are too compelling to resist
Editing is a physical process of actually cutting film
and splicing together bits of film and needs a lot of
Creating transitions between clips not
straightforward and needs special equipment and
allows little room for experimentation
Traditional video editing same in principle, but in
practice cannot cut or splice video tape accurately,
so copy onto another tape in the desired order and
fade in effects etc. (best with 3-machine setup)
Need precise machines, and use SMPTE timecode
Generational loss of copying videotape

Digital video editing

Generational loss (VHS = 2 copies) and need to
construct video linearly make for inflexible way of working
Digitisation allows different way of working that
brings video editing closer to film editing - random
access and change of data as in WP. Non-destructive and allows endless footling!
Instantaneous playback possible on high-end kit
Time not saved because of increased set of options
(compare DTP's effect on office productivity!),
dreams of perfectionism & more experimentation
carried out before final editing decisions made (cf.
bands and mixes!)

Digital Video post production

Video editing concerned with arrangement of
picture through time and its synchronisation with
sound. Image manipulation on individual frames
Image correction, e.g. over- or underexposed, out
of focus, colour cast, unacceptable digitisation
artefacts. Each has remedy - Photoshop-like
applications that allow application to sequences
of images. Can use key-frame and interpolation
Blue screening used for inserting isolated images
in shots traditionally. Can use chroma keying for
any colour to be transparent. Mattes and
travelling mattes (or track mattes) = sequences of
frame mattes. Titling facilities

Preparing Video 4 MM delivery

Extra step needed for multimedia - must cope with

limitations of final delivery medium and playback
Compromises to decide what must be sacrificed - different material suggests different choices.
Can sacrifice frame size, frame rate, colour depth,
image quality. Frame size x 0.25, frame rate 15
fps, 8 bit (not 24) colour/256 greyscale (not all
codecs support) can reduce file size by factor of
4x2x3 = 24. If not enough, need to compress - all
codecs introduce visible artefacts at high
compression ratios
Platform considerations - QT self-contained
movies, network connection speeds
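
The combined saving from these three sacrifices is the product of the individual factors, which for the figures on this slide is 4 x 2 x 3 = 24. A minimal sketch, assuming a starting point of 30 fps and 24-bit colour (the function and parameter names are illustrative):

```python
def reduction_factor(size_scale, fps_from, fps_to, bits_from, bits_to):
    """Factor by which uncompressed data shrinks after the three sacrifices.

    size_scale is the fraction of the original frame area that is kept.
    """
    size_factor = 1 / size_scale
    rate_factor = fps_from / fps_to
    depth_factor = bits_from / bits_to
    return size_factor * rate_factor * depth_factor

# Quarter-size frame, 30 -> 15 fps, 24-bit -> 8-bit colour.
print(reduction_factor(0.25, 30, 15, 24, 8))   # → 24.0
```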

Streamed Video and Video

Streamed video = delivering a video data stream
from a remote server, to be displayed as it arrives.
Possibility of live video and video conferencing
Bandwidth fundamental obstacle, so decent quality
streamed video restricted to LANs & T1 lines,
ADSL or cable modems. Dial-up Internet
connections cannot handle required data rate
without loss of quality
Even with bandwidth must deliver with minimum
delay and jitter which loses synchronisation
between video & audio tracks

HTTP Streaming
Progressive download, or HTTP streaming is more
commonly seen on WWW. Not true streaming, but
a refinement of embedded video, where whole file
is transferred before playing back from disk. With
HTTP streaming, movie starts playing when
enough of it has been delivered i.e. when time
needed to download remainder of data = duration
of entire movie
Usually appreciable delay before playback
because of available bandwidth. Entire movie is
downloaded onto disk so need space, cannot be
used for live video. And does not allow you to skip
over parts of the file without downloading them
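
The start condition for progressive download (playback begins once the time needed to fetch the remainder equals the movie's duration) gives a simple estimate of the initial delay. A sketch under the assumption of a constant download rate; the function name and example figures are hypothetical:

```python
def startup_delay(file_bytes, bandwidth_bytes_per_s, duration_s):
    """Seconds to wait before playback can start without stalling.

    Playback may begin when remaining download time == movie duration,
    i.e. after (total download time - duration), never less than zero.
    """
    download_time = file_bytes / bandwidth_bytes_per_s
    return max(0.0, download_time - duration_s)

# A 30 MB, 60-second movie over a 100 KB/s connection.
print(startup_delay(30_000_000, 100_000, 60))   # → 240.0
```

If the connection can deliver the file faster than it plays, the delay is zero and playback can begin as soon as data arrives.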

True Streaming Video

Never stored on user's disk. Small buffer may be
used to smooth out jitter but essentially each frame
played as it arrives
Can be open-ended, so can be used for live video
Length of recorded movie limited by server storage
not viewer's machine
Random access to specific points in a stream is
possible (except live video)
These characteristics make it suitable for video on
demand applications, but network must be able to
deliver data rate fast enough for playback - fastest
Internet connection not quality of broadcast TV
Democratic access & interactivity 2 reasons to use

Streaming QT & RealVideo

Two leading architectures are Streaming QuickTime and RealNetworks' RealVideo

Both based on Internet standard protocols especially RTSP (Real Time
Streaming Protocol), used to control playback of video stream carried over
network using Real-time Transport Protocol, RTP.
Both can be embedded in Web pages - can embed Streaming QT in any
QT-supporting application
Both provide ways to provide several different versions of a movie
compressed for different connection speeds. Both can support live video
streaming and progressive download.
Little to choose between them - depends on purpose & audience

Codecs for Streamed Video & Video conferencing

Codecs designed especially for video conferencing are
designed for streaming video at low bit rates
H.261 designed for video conferencing over ISDN (=
64 kbps). Also called px64 - p = 1 to 30. As p increases frame
rate and video quality increase. Maximum delay of 150 ms
specified for codec because video conference participants
cannot tolerate longer interruptions in visual feedback
Common Intermediate Format, CIF = 352x288 pixels; QCIF
(Quarter CIF) = 176x144 pixels used when p<6 i.e.
bandwidth < 384 kbps
H.263 successor aims for 27 kbps bandwidth - better than
nothing quality
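
The px64 relationship and the choice between CIF and QCIF can be written out directly. A minimal sketch based only on the figures on this slide (64 kbps per ISDN channel, p from 1 to 30, QCIF when p < 6); the function name is hypothetical:

```python
CHANNEL_KBPS = 64   # one ISDN channel, as quoted above

def h261_settings(p):
    """Return (bandwidth in kbps, (format name, width, height)) for a given p."""
    if not 1 <= p <= 30:
        raise ValueError("p must be between 1 and 30")
    bandwidth = p * CHANNEL_KBPS
    if p < 6:                       # below 384 kbps the quarter-size format is used
        picture_format = ("QCIF", 176, 144)
    else:
        picture_format = ("CIF", 352, 288)
    return bandwidth, picture_format

for p in (1, 5, 6, 30):
    print(p, h261_settings(p))
```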