
WACHEMO UNIVERSITY

COLLEGE OF ENGINEERING AND TECHNOLOGY

SCHOOL OF COMPUTING AND INFORMATICS

DEPARTMENT OF INFORMATION SYSTEMS

MULTIMEDIA INFORMATION SYSTEMS MODULE

BY

WEGDERES TARIKU

REVIEWED BY

1. SAMUEL MULATU (MSC)

2. MOHAMMAD MOQUEEN KHAN (MSC)

MAY 2022

HOSSANA, ETHIOPIA

Table of Contents
Chapter 1. Introduction to Multimedia Systems
1.1. What is Multimedia?
1.2. Characteristics of a Multimedia System
1.3. Multimedia Applications (where it is applied)
1.4. History of Multimedia Systems
Chapter 2. Multimedia Basics and Representation
2.1. Digital representation of Multimedia contents
2.2. Digital Audio and MIDI
2.3. Colors in Image and Video
2.4. Color Models in Images
2.5. Color Models in Video
Chapter 3. Multimedia Data Compression
3.1. Why compression?
3.2. Compression with Loss and Lossless
3.3. Huffman Coding
3.4. Dictionary-based coding
Chapter 4. Storage of Multimedia
4.1. Basics of optical storage technologies
4.1.1. Random Access Memory (RAM)
4.1.2. Read-Only Memory (ROM)
4.1.3. Floppy and Hard Disks
4.1.4. Zip, Jaz, SyQuest and optical storage devices
4.1.5. Digital Versatile Disc (DVD)
4.1.6. CD-ROM
4.1.7. CD-ROM players
4.1.8. CD recorders
4.1.9. Videodisc players
Chapter 5. Multimedia Database Management System (MMDBMS)
5.1. Multimedia Database
5.2. Usage of multimedia databases at different levels
5.3. Design and Architecture of a multimedia database
5.3.1. MMDB Architectural considerations
5.3.2. Distributed MMDBMS Architecture
5.3.3. MMDBMS Server Components
5.3.4. Implementation Considerations
5.4. Indexing Mechanism in Organizing a Multimedia Database
Chapter 6. Multimedia Data Retrieval
6.1. Multimedia content representation
6.2. Similarity measures during searching
6.3. Retrieval performance measures
6.4. Query languages
6.5. Query expansion mechanisms
6.5.1. External Resource Based Query Expansion
6.5.2. Query Logs Based Expansion
6.5.3. Relevance Feedback Based Expansion
6.5.4. Query Expansion Using Wikipedia
Reference Materials

Chapter 1

Introduction to Multimedia Systems


Objective:
✓ Explain the term Multimedia, Definition of Multimedia, Multimedia Information
Systems, Classification of Multimedia Contents
✓ Describe the need for Multimedia, different types of media, Brief history of
Multimedia evolution
✓ Describe the application areas and current trends in Multimedia usage

1.1. What is Multimedia?


Before we go on, it is important to define multimedia. Let us define it from two
perspectives:
1) In terms of what multimedia is all about:
It refers to the storage, transmission, interchange, presentation and perception of different information types (data types) such as text, graphics, voice, audio and video, where:
Storage - refers to the type of physical means used to store data:
• Magnetic tape
• Hard disk
• Optical disk
• DVDs
• CD-ROMs, etc.
Presentation- refers to the type of physical means to reproduce information to the
user.
✓ Speakers
✓ Video windows, etc.
Representation- related to how information is described in an abstract form for use
within an electronic system. E.g. to present text to the user, the text can be coded in
raster graphics, primitive graphics, or simple ASCII characters. The same presentation,
different representation.
Perception- describes the nature of information as perceived by the user

✓ Speech
✓ Music
✓ Film
2) Based on the word "Multimedia"
It is composed of two words:
Multi- multiple/many
Media- source
Source refers to different kind of information that we use in multimedia. This includes:
✓ text
✓ graphics
✓ audio
✓ video
✓ images
Multimedia refers to multiple sources of information. It is a system which integrates
all the above types.

Definitions:

1) Multimedia means computer information can be represented in audio, video and animated formats, in addition to the traditional formats of text and graphics.
2) Multimedia application is an application which uses a collection of multiple
media sources e.g. text, graphics, images, sound/audio, animation and/or
video.
3) Multimedia is the field concerned with the computer controlled integration of
text, graphics, drawings, still and moving images (video), animation, and any
other media where every type of information can be represented, stored,
transmitted, and processed digitally.

What is Multimedia Application?
A Multimedia Application is an application which uses a collection of multiple media
sources e.g. text, graphics, images, sound/audio, animation and/or video.
What is Multimedia system?
A Multimedia System is a system capable of processing multimedia data. A Multimedia
System is characterized by the processing, storage, generation, manipulation and
rendition of multimedia information.

1.2. Characteristics of a Multimedia System


A Multimedia system has four basic characteristics:
✓ Multimedia systems must be computer controlled
✓ Multimedia systems are integrated
✓ The information they handle must be represented digitally
✓ The interface to the final presentation of media is usually interactive

1.3. Multimedia Applications (where it is applied)


✓ Digital video editing and production systems
✓ Home shopping
✓ Interactive movies, and TV
✓ Multimedia courseware
✓ Video conferencing
✓ Virtual reality (the creation of an artificial environment that you can explore, e.g. 3-D images, etc.)
✓ Distributed lectures for higher education
✓ Tele-medicine
✓ Digital libraries
✓ World Wide Web
✓ On-line reference works e.g. encyclopedias, games, etc.
✓ Electronic Newspapers/Magazines
✓ Games
✓ Groupware (enabling groups of people to collaborate on projects and share information)

World Wide Web (WWW)

Multimedia is closely tied to the World Wide Web (WWW). Without networks,
multimedia is limited to simply displaying images, videos, and sounds on your local
machine. The true power of multimedia is the ability to deliver this rich content to a
large audience.

Features of Multimedia: multimedia has three aspects:

✓ Content: movie, production, etc.


✓ Creative Design: creativity is important in designing the presentation
✓ Enabling Technologies: Network and software tools that allow creative designs
to be presented.

1.4. History of Multimedia Systems


Newspaper was perhaps the first mass communication medium, which used mostly
text, graphics, and images.

In 1895, Guglielmo Marconi sent his first wireless radio transmission at Pontecchio, Italy. A few years later (in 1901), he detected radio waves beamed across the Atlantic. Initially invented for telegraphy, radio is now a major medium for audio broadcasting.
Television was the new medium of the 20th century. It brought video and has since changed the world of mass communications.

On computers, the following are some of the important events:

1945 - Vannevar Bush (1890-1974) wrote about Memex.

MEMEX stands for MEMory EXtension. A memex is a device in which an individual


stores all his books, records, and communications, and which is mechanized so that it
may be consulted with exceeding speed and flexibility. It is an enlarged intimate
supplement to his memory.

1960s - Ted Nelson started the Xanadu project (Xanadu - a kind of deep hypertext).

Project Xanadu was the explicit inspiration for the World Wide Web, for Lotus

Notes and for HyperCard, as well as less-well-known systems.


1967 - Nicholas Negroponte formed the Architecture Machine Group at MIT. A
combination lab and think tank responsible for many radically new approaches to the
human-computer interface. Nicholas Negroponte is the Wiesner Professor of Media
Technology at the Massachusetts Institute of Technology.

1968 - Douglas Engelbart demonstrated the NLS (oN-Line System) at SRI.

Shared-screen collaboration involving two persons at different sites communicating


over a network with audio and video interface is one of the many innovations
presented at the demonstration.

1969 - Nelson & Van Dam hypertext editor at Brown

1976 - Architecture Machine Group proposal to DARPA: Multiple Media

1985 - Negroponte and Wiesner opened the MIT Media Lab. Research at the Media Lab comprises interconnected developments in an unusual range of disciplines, such as software agents; machine understanding; how children learn; human and machine vision; audition; speech interfaces; wearable computers; affective computing; advanced interface design; tangible media; object-oriented video; interactive cinema; and digital expression from text, to graphics, to sound.

1989 - Tim Berners-Lee proposed the World Wide Web to CERN (the European Organization for Nuclear Research)

1990 - K. Hooper Woolsey, Apple Multimedia Lab gave education to 100 people

1992 - The first M-Bone audio multicast on the net (MBONE- Multicast Backbone)

1993 - The University of Illinois National Center for Supercomputing Applications introduced NCSA Mosaic (a web browser)

1994 - Jim Clark and Marc Andreessen introduced Netscape Navigator (a web browser)

1995 - Java for platform-independent application development

Hypermedia/Multimedia

What is Hypertext and Hypermedia?


Hypertext is a text, which contains links to other texts. The term was invented by Ted
Nelson around 1965. Hypertext is usually non-linear (as indicated below).
Hypermedia is not constrained to be text-based. It can include other media, e.g.,
graphics, images, and especially the continuous media -- sound and video. Apparently,
Ted Nelson was also the first to use this term.
The World Wide Web (www) is the best example of hypermedia applications.
Figure: hypertext's non-linear link structure

Hypermedia

Hypermedia is the application of hypertext principles to a wider variety of


media, including audio, animations, video, and images.
Examples of Hypermedia Applications:
✓ The World Wide Web (WWW) is the best example of hypermedia applications.
✓ PowerPoint
✓ Adobe Acrobat
✓ Macromedia Director
Desirable Features for a Multimedia System
Given the challenges described in this chapter, the following features are desirable for a multimedia system:
1. Very high processing power. Why? Because there are large amounts of data to be processed. Multimedia systems deal with large data, and to process it in real time the hardware must have high processing capacity.
2. It should support different file formats. Why? Because we deal with
different data types (media types).
3. Efficient and High Input-output: input and output to the file subsystem
needs to be efficient and fast. It has to allow for real-time recording as well as
playback of data. e.g. Direct to Disk recording systems.
4. Special Operating System: to allow access to file system and process data
efficiently and quickly. It has to support direct transfers to disk, real-time
scheduling, fast interrupt processing, I/O streaming, etc.
5. Storage and Memory: large storage units and large memory are required. Large caches are also required.
6. Network Support: client-server and distributed systems are common.
7. Software Tools: user-friendly tools are needed to handle media, design and develop applications, and deliver media.

Challenges of Multimedia Systems


1. Synchronization issue: in a multimedia application, a variety of media are used at the same instance. In addition, there should be some relationship between the media, e.g. between movie (video) and sound. There arises the issue of synchronization.

2. Data conversion: in MM application, data is represented digitally. Because of
this, we have to convert analog data into digital data.
3. Compression and decompression: Why? Because multimedia deals with large
amount of data (e.g. Movie, sound, etc) which takes a lot of storage space.
4. Rendering different data at the same time (continuous media).

Multimedia System Requirements

1) Software requirements
2) Hardware requirements

1) Software Requirements

Figure multimedia software tools

3-D and Animation Tools:


These software packages provide 3D clip-art objects such as people, furniture, buildings, cars, airplanes, trees, etc. You can easily use these objects in your project.
A good 3D modelling tool should include the following features:
✓ Ability to drag and drop primitive shape into screen
✓ Ability to create objects from scratch
✓ Ability to add realistic effects such as transparency, shadowing, fog, etc.
✓ Multiple windows that allow the user to view the model in each dimension
✓ Color and texture mapping
Examples:
✓ 3ds Max
✓ LogoMotion
✓ Discreet
Text editing and word processing tools:
Word processors are used for writing letters, invoices, project content, etc. They
include features like:
• spell check
• table formatting
• thesaurus
• templates ( e.g. letters, resumes, & other common documents)
Examples:
• Microsoft Word,
• Word perfect,
• Notepad
In word processors, we can actually embed multimedia elements such as sound,
image and video.

Sound Editing Tools


They are used to edit sound (music, speech, etc.).
The user can see the representation of the sound in fine increments, as a score or waveform. Users can cut, copy, and paste any portion of the sound to edit it. You can also add effects such as distortion, echo, pitch shifting, etc.
Examples: Sound Forge

Multimedia authoring tools:


Multimedia authoring tools provide important framework that is needed for
organizing and editing objects included in the multimedia project (e.g graphics,
animation, sound, video, etc). They provide editing capability to limited extent. (More
on this in chapter 2).
Examples:
• Macromedia Flash
• Macromedia Director
• Macromedia Authorware

OCR software
This software converts printed documents into electronically recognizable ASCII characters. It is used with scanners. Scanners convert a printed document into a bitmap. Then the OCR software breaks the bitmap into pieces according to whether each piece contains text or graphics. This is done by examining the texture and density of the bitmap and by detecting edges.
Text areas become ASCII text; bitmap areas remain bitmap images.

To do the above, this software uses probability and expert systems.
Use:
• To include printed documents in our project without typing from keyboard
• To include documents in their original format e.g signatures, drawings, etc
Examples:
✓ OmniPage Pro
✓ Perceive

Painting and Drawing Tools


To create graphics for web and other purposes, painting and editing tools are crucial.
Painting Tools: are also called image-editing tools. They are used to edit images of different formats. They help us retouch and enhance bitmap images. Some painting tools allow editing vector-based graphics too.
Some of the editing activities include:
✓ blurring the picture
✓ removing part of the picture
✓ adding text to the picture
✓ merging two or more pictures together, etc.
Examples:
✓ Macromedia Fireworks
✓ Adobe Photoshop

Drawing Tool: used to create vector based graphics.

Examples:
o Macromedia Freehand
o CorelDraw
o Illustrator

Drawing and painting tools should have the following features:

➢ Scalable dimensions, with the ability to restore, stretch, and distort images/graphics
➢ Customizable pen and brush shapes and sizes
➢ Multiple undo capabilities
➢ Capacity to import and export files in different formats
➢ Ability to create geometric shapes such as circles, rectangles, lines, etc.
➢ Zooming for magnified image editing
➢ Support for third-party plug-ins

Video Editing
Animation and digital video movies are sequences of bitmapped graphic frames rapidly played back.
Some of the tools used to edit video include:
o Adobe Premiere
o Deskshare Video Edit Magic
o Videoshop
These applications display time references (the relationship between time and the video), frame counts, audio, transparency level, etc.

2) Hardware Requirements
Three groups of hardware for multimedia:
1) Memory and storage devices
2) Input and output devices
3) Network devices

1) Memory and Storage Devices


Multimedia products require higher storage capacity than text-based data. Huge drives are essential for the enormous files used in multimedia and audio-visual creation.
I) RAM: is the primary requirement for a multimedia system. Why?
Reasons:
• You have to store the authoring software itself, e.g. Flash takes about 20 MB of memory, Photoshop 16-20 MB, etc.
• Digitized audio and video are stored in memory.
• Animated files, etc.
To store all of this at the same time, you need a large amount of memory.
II) Storage Devices: large capacity storage devices are necessary to store multimedia
data.
Floppy Disk: not sufficient to store multimedia data. Because of this, they are not used
to store multimedia data.
Hard Disk: the capacity of hard disk should be high to store large data.
CD: is important for multimedia because CDs are used to deliver multimedia data to users. They carry a wide variety of data, like:
▪ Music(sound, & video)
▪ Multimedia Games
▪ Educational materials
▪ Tutorials that include multimedia
▪ Utility graphics, etc
DVD: has higher capacity than CDs. Similarly, DVDs are also used to distribute multimedia data to users. Some characteristics of DVDs:
o High storage capacity: 4.7-17 GB
o Narrower tracks than CDs, hence higher storage capacity
o High data transfer rate: 4.6 MB/sec
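A rough calculation shows why these capacities matter for multimedia delivery. The sketch below (Python) uses the uncompressed CD-quality stereo audio rate of 176,400 bytes per second (44,100 samples/s × 2 bytes × 2 channels) and ignores filesystem and error-correction overhead, so the figures are only illustrative:

```python
def seconds_of_cd_audio(capacity_bytes):
    """How much uncompressed CD-quality stereo audio fits in a given capacity.
    CD audio rate: 44,100 samples/s x 2 bytes x 2 channels = 176,400 bytes/s."""
    return capacity_bytes // 176_400

print(seconds_of_cd_audio(700_000_000))    # 3968 s, about 66 minutes on a 700 MB CD
print(seconds_of_cd_audio(4_700_000_000))  # 26643 s, over 7 hours on a 4.7 GB DVD
```

Video at the same quality would fill either disc far faster, which is why DVDs rely on compression.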

2) Input-Output Devices

I) Interacting with the system: to interact with a multimedia system, we use a keyboard, mouse, trackball, touch screen, etc.
Mouse: a multimedia project is typically designed to be used with a mouse as the input pointing device. Other devices like trackballs and touch screens can be used in place of a mouse. A trackball is similar to a mouse in many ways.
Wireless mouse: important when the presenter has to move around during
presentation

Touch Screen: we use fingers instead of mouse to interact with touch screen
computers. There are three technologies used in touch screens:
i) Infrared light: such touch screens use invisible infrared light beams projected across the surface of the screen. A finger touching the screen interrupts the beams, generating an electronic signal. The system then identifies the x-y coordinates of the point where the touch occurred and sends signals to the operating system for processing.
ii) Texture-coated: such monitors are coated with a textured material that is sensitive to pressure. When the user presses the monitor, the textured material extracts the x-y coordinates of the location and sends signals to the operating system.
iii) Touch mate:
Use: touch screens are used to display/provide information in public areas such as:
o airports
o museums
o transport service areas
o hotels, etc.
Advantages:
o user friendly
o easy to use even for non-technical people
o easy to learn how to use
II) Information Entry Devices: the purpose of these devices is to enter information to
be included in our multimedia project into our computer.

OCR: enables us to use OCR software to convert printed documents into ASCII files.
Graphical Tablets/ Digitizer: both are used to convert points, lines, and curves from
sketch into digital format. They use a movable device called stylus.

Scanners: enable us to convert printed images into digital format. Two types of
scanners:
▪ Flatbed scanners
▪ Portable scanners
Microphones: they are important because they enable us to record speech, music, etc.
The microphone is designed to pick up and amplify incoming acoustic waves or
harmonics precisely and correctly and convert them to electrical signals. You have to
purchase a superior, high-quality microphone because your recordings will depend on
its quality.

Digital Camera and Video Camera (VCR): are important to record and include image
and video in MMS respectively. Digital video cameras store images as digital data, and
they do not record on film. You can edit the video taken using video camera and VCR
using video editing tools.
Remark: video takes large memory space.

Output Devices
Depending on the content of the project and how the information is presented, you need different output devices. Some output hardware includes:
Speaker: if your project includes speech meant to convey a message to the audience, or background music, using speakers is obligatory.
Projector: when to use projector:
▪ if you are presenting on meeting or group discussion,
▪ if you are presenting to large number of audience
Types of projector:
➢ LCD projector
➢ CRT projector
Plotter/Printer: when the situation arises to present using paper, you use printers and/or plotters. In such cases, the print quality of the device should be taken into consideration.
Impact printers: poor-quality graphics; not used.
Non-impact printers: good-quality graphics.

3) Network Devices

Why do we require network devices?


The following network devices are required for multimedia presentation:
i) Modem: which stands for modulator-demodulator, is used to convert a digital signal into an analog signal for communication of data over a telephone line, which can carry only analog signals. At the receiving end, it does the reverse, i.e. converts analog data back to digital.
Currently, the standard modem is called V.90, which has a speed of 56 kbps (kilobits per second). Older standards include V.34, which has a speed of 28.8 kbps.
Types:
➢ External
➢ Internal
Data is transferred through modem in compressed format to save time and cost.
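To see why compression matters at modem speeds, here is a rough calculation (Python; the 1 MB file size is only an example, V.34's 28.8 kbps is assumed, and protocol overhead is ignored):

```python
def transfer_seconds(file_bytes, line_kbps):
    """Time to move a file over a modem line, ignoring protocol overhead."""
    return file_bytes * 8 / (line_kbps * 1000)

one_mb = 1_000_000
print(round(transfer_seconds(one_mb, 56), 1))    # 142.9 s on a 56 kbps V.90 modem
print(round(transfer_seconds(one_mb, 28.8), 1))  # 277.8 s on a 28.8 kbps V.34 modem
```

Compressing the file before sending shortens both times proportionally, which is why modem transfers use compressed formats.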

ii) ISDN: stands for Integrated Services Digital Network. It is a circuit-switched telephone network system designed to allow digital transmission of voice and data over ordinary telephone copper wires. This has the advantage of better quality and higher speeds than analog systems.
✓ It has a higher transmission speed, i.e. a faster data transfer rate.
✓ It uses additional hardware, hence it is more expensive.

iii) Cable modem: uses the existing cables installed for television broadcast reception. The data transfer rate of such devices is very fast, i.e. they provide high bandwidth.
They are primarily used to deliver broadband internet access, taking advantage of
unused bandwidth on a cable television network.

iv) DSL: provides digital data transmission over the wires of the local telephone network. DSL is faster than a telephone line with a modem. How? It carries a digital signal over the unused frequency spectrum (analog voice transmission uses only a limited range of the spectrum) available on the twisted-pair cables running between the telephone company's central office and the customer premises.

Summary

Multimedia Information Flow

Fig Generalized multimedia information flow

Chapter 2

2. Multimedia Basics and Representation


Objective:
✓ Explain the characteristics of Digital Multimedia contents
✓ Describe digital audio coding, MIDI coding and audio file formats
✓ Describe digital image representation with color models, image file formats
✓ Describe digital video representation with color models, video file formats

2.1. Digital representation of Multimedia contents:

Text or data: a block of characters, with each character represented by a fixed number of binary digits known as a codeword (ASCII, Unicode). The size of a text or data file is usually small.
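The codeword idea can be illustrated by counting the bytes a string occupies under different encodings (Python; the sample strings are chosen only for this sketch):

```python
# ASCII uses a fixed 7-bit codeword per character (stored as one byte),
# while Unicode text encoded as UTF-8 uses 1-4 bytes per character.
text = "multimedia"
ascii_size = len(text.encode("ascii"))  # one byte per character
utf8_size = len("ሀ".encode("utf-8"))    # one Ethiopic character: 3 bytes in UTF-8

print(ascii_size)  # 10
print(utf8_size)   # 3
```

Even a long document stays small: 100,000 ASCII characters occupy only about 100 KB, far less than a typical image or audio file.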
Graphics: can be constructed by the composition of primitive objects such as lines, arcs, polygons, curves and splines.
Each object is stored as an equation and contains a number of parameters (or attributes) such as start and end coordinates, shape, size, fill color, border color, shadow, etc. Graphics files take up less space than image files.

Images: continuous-tone images can be digitized into bitmaps. A digitized image can be represented as a two-dimensional matrix of individual picture elements (pixels). Each pixel is represented by a fixed number of bits (N-bit quantization). 24-bit quantization provides 16,777,216 colors per pixel.
Resolution for print is measured in dots per inch (dpi); resolution for screens is measured in pixels per inch (ppi). Dots are digitized with 1-bit quantization.
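The storage cost of a bitmap follows directly from the pixel-matrix description above. A small sketch (Python; the 640 × 480 dimensions are just an example):

```python
def bitmap_size_bytes(width_px, height_px, bits_per_pixel):
    """Uncompressed size of a bitmap: one N-bit value per pixel in a 2-D matrix."""
    return width_px * height_px * bits_per_pixel // 8

# 24-bit quantization gives 2**24 = 16,777,216 colors per pixel:
print(2 ** 24)                          # 16777216
# A 640 x 480 image at 24 bits per pixel:
print(bitmap_size_bytes(640, 480, 24))  # 921600 bytes, i.e. 900 KB uncompressed
# The same image as 1-bit dots (as used for print):
print(bitmap_size_bytes(640, 480, 1))   # 38400 bytes
```

This is why bitmap image files are much larger than the equivalent vector graphics description.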

Speech/Audio Signal: one-dimensional, but needs a high dynamic range and produces large files.

For telephone lines, the channel bandwidth is 200 Hz - 3.4 kHz, sampled at 8 bits per channel.
Video: Composed of a series of still image frames and produces the illusion of
movement by quickly displaying one frame after another.

The Human Visual System (HVS) accepts anything more than 20 frames per second
(fps) as smooth motion.
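Because video is just a sequence of bitmap frames, its raw data rate is the per-frame size times the frame rate. A sketch (Python; the frame dimensions and rate are illustrative):

```python
def video_rate_bytes_per_sec(width_px, height_px, bits_per_pixel, fps):
    """Uncompressed video rate: one full bitmap per frame, fps frames per second."""
    frame_bytes = width_px * height_px * bits_per_pixel // 8
    return frame_bytes * fps

# 640 x 480 frames, 24-bit color, 25 fps (above the ~20 fps the HVS
# needs to perceive smooth motion):
print(video_rate_bytes_per_sec(640, 480, 24, 25))  # 23040000 bytes/s, ~22 MB/s
```

At roughly 22 MB per second, uncompressed video is rarely stored or transmitted directly; compression (Chapter 3) is essential.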
Digital Audio Coding:
Codecs: (Co + Dec) means Coding & Decoding, or
Encoder & Decoder, or
Compress & Decompress.

Codecs determine how information is represented:
✓ Required sending speed
✓ Amount of loss allowed
✓ Buffers required
Formats:
Determine how data is stored.
✓ Where is the data?
✓ Where is the data about the data (metadata)?

Audio Coding:
Analog sound waves (voice or natural sound) are received through a microphone (as an analog voltage, based on air pressure). The analog voltage (amplitude) is then sampled at a sampling frequency f, and the sampled signal is quantized into discrete values.

Analog-to-Digital conversion:

Sample and Hold: taking measurements of the sound wave (voltage) at given intervals of time.

Sample rate:
The number of samples per second.
Telephone : 8000 samples/sec
CD: 44,100 samples/sec
DVD: 48,000 samples/sec

Sampling is based on the Nyquist-Shannon sampling theorem:

For no loss of information, sampling frequency ≥ 2 × maximum signal frequency.

Ex: the upper limit of human hearing is ~20 kHz, so by Nyquist the sample rate must be ≥ 40,000 samples/sec.
For telephone, 4 kHz is the maximum frequency, so 8 kHz sampling is used.
AM radio - 11 kHz
FM radio - 22 kHz
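The Nyquist rule above can be written as a one-line check (Python; the frequencies used are the ones quoted in the text):

```python
def nyquist_min_rate(max_signal_freq_hz):
    """Nyquist-Shannon: sampling frequency must be >= 2 x max signal frequency."""
    return 2 * max_signal_freq_hz

# Human hearing tops out near 20 kHz:
print(nyquist_min_rate(20_000))  # 40000 samples/sec; CD's 44,100 satisfies this
# Telephone speech is band-limited to about 4 kHz:
print(nyquist_min_rate(4_000))   # 8000 samples/sec, as used on telephone lines
```

Sampling below this rate loses information: frequencies above half the sample rate alias into lower ones and cannot be reconstructed.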

dB – decibels (used to measure sound pressure level)
35 dB – quiet
120 dB – discomfort

Quantization:
The sampled analog signal needs to be quantized (digitized):
✓ How many discrete digital values?
✓ What analog value does each digital value correspond to?

The simplest quantization is the linear method of encoding:
8-bit linear; 16-bit linear.
CD uses 16-bit linear encoding (65,536 levels).
Telephones use 8-bit logarithmic encoding.
Mono – a single channel of audio.
Stereo – two channels of audio (left and right).
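A minimal sketch of linear quantization (Python; the function name and the clamping behavior are choices made for this illustration, not a standard API):

```python
def quantize_linear(sample, bits):
    """Map an analog sample in [-1.0, 1.0] onto one of 2**bits equal-width levels."""
    levels = 2 ** bits
    s = max(-1.0, min(1.0, sample))  # clamp to the valid analog range
    return min(int((s + 1.0) / 2.0 * levels), levels - 1)

print(2 ** 16)                   # 65536 levels in CD's 16-bit linear encoding
print(quantize_linear(-1.0, 8))  # 0   (lowest of the 256 8-bit levels)
print(quantize_linear(0.0, 8))   # 128 (mid-scale)
print(quantize_linear(1.0, 8))   # 255 (highest level)

# Stereo CD data rate: 44,100 samples/s x 16 bits x 2 channels:
print(44_100 * 16 * 2)           # 1411200 bits/s (~1.4 Mbps)
```

More bits per sample means finer levels and less quantization noise, at the cost of a proportionally higher data rate.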

Structured Audio Coding: Using MIDI

MIDI: Musical Instrument Digital Interface

MIDI is a scripting language (A music description language in binary form).

It forms a protocol, adopted by the electronic music industry, that enables computers, synthesizers, keyboards and other musical devices to communicate with each other.
✓ A description format made up of semantic information about the sounds it represents, and that makes use of high-level (algorithmic) models.

Normal music digitization performs waveform coding: sample the music signal and then try to reconstruct it exactly.

24
MIDI only records musical actions, such as:
the key depressed,
the time when the key is depressed,
the duration for which the key remains depressed and
how hard the key is struck (pressure)

MIDI is an example of a parameter (or event-list) representation


✓ An event list is a sequence of control parameters (MIDI messages) that, taken
alone, do not define the quality of a sound but instead specify the ordering and
characterization of parts of a sound with regard to some external model.
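As a rough illustration (not real MIDI message syntax), an event list can be modeled as a sequence of timestamped note parameters; the tuple layout here is hypothetical:

```python
# Each event records which key was pressed, when, how long it was held, and
# how hard it was struck -- the parameters MIDI captures instead of samples.
events = [
    # (start_time_sec, note, duration_sec, velocity 0-127)
    (0.0, "C4", 0.5, 100),
    (0.5, "E4", 0.5, 90),
    (1.0, "G4", 1.0, 110),
]

# A parameter list this small replaces what waveform coding would store as
# tens of thousands of samples per second.
for start, note, duration, velocity in events:
    print(f"t={start}s: play {note} for {duration}s at velocity {velocity}")
```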

Components of MIDI system:


Synthesizer – a sound generator
Sequencer – a music editor
Track – used to organize the recordings
Channel – to separate information in MIDI system (Channel 1 Piano, Channel 10
Drum)
Timbre – the quality of the sound
Pitch – musical note that the instrument plays.
Voice – the portion of the synthesizer that produces sound
Patch – the control settings that define a particular timbre.

Structured audio synthesis:


Sampling synthesis:
✓ Individual instrument sounds are digitally recorded and stored in memory.
✓ When the instrument is played, the note recordings are reproduced and mixed
(added together) to produce the output sound.

Advantage:
✓ This can be very effective and realistic but requires a lot of memory.

✓ Good for playing music but not realistic for speech synthesis.
✓ Good for creating special sound effects from sample libraries.

Synthesizer:
(Music) an electronic instrument (usually played with a keyboard) that generates
and modifies sounds electronically and can imitate a variety of other musical
instruments. If a musical instrument satisfies both components of the MIDI standard,
then it is a MIDI-compatible device.

MIDI Interface has two components:


1. Hardware – connects to equipment
2. Data format – encodes the information travelling through the hardware. Data
are grouped into MIDI messages.

Applications of structured audio:


✓ Sound generation from process models – the sound is not created from an
event list but rather is dynamically generated in response to evolving,
non-sound-oriented environments such as video games,
✓ Content-based retrieval,
✓ Virtual reality,
✓ Music applications,

Digital Audio and MIDI being used together:


✓ Modern recording studio: hard disk recording and MIDI
Analog sounds (live vocals, guitar, sax, etc.)
Keyboards, drums, samples, loops, effects – MIDI

✓ Sound generation: uses a mix of
Synthesis and
Samples
✓ Samplers – Digitize (Sample) sound then
Play back,
Loop (beats),
Simulate musical instruments,

Common Audio file formats:


Name (Company) Extension
1. Mulaw - µ-law (Sun, NeXT) .au
2. RIFF Wave (MS WAV) .wav
3. MPEG Audio Layer (MPEG) .mp2, .mp3, .mp4
4. AIFF (Apple, SGI) .aiff, .aif
5. HCOM (Mac) .hcom
6. SND (Sun, NeXT) .snd
7. VOC (Sound Blaster card proprietary standard) .voc

AAC – Advanced Audio Coding – designed for high compression as well as high
audio quality
MP3 – MPEG-1 Audio Layer 3
MP3 VBR – MP3 with variable bit rate
Audible 2, 3, 4 – (.aa) used for audio books or other voice recordings
Apple Lossless – (.m4a)
AIFF – Audio Interchange File Format, similar to WAV
WAV – from IBM and Microsoft
WMA – Windows Media Audio
FLAC – Free Lossless Audio Codec
ATRAC (Sony) – Adaptive Transform Acoustic Coding
Ogg Vorbis – an open and free audio compression format
RealAudio – .rm, .ram

Graphic/Image Data Representation
An image can be described as a two-dimensional array of points, where every
point is allocated its own color. Each such point is called a pixel, short for
picture element. An image is a collection of these points, colored in such a way
that they produce meaningful information.

Pixel (picture element) contains the color or hue and relative brightness of that
point in the image. The number of pixels in the image determines the resolution of the
image.

• A digital image consists of many picture elements, called pixels.


• The number of pixels determines the quality of the image (its resolution).
• Higher resolution generally yields better quality.
• Bitmap resolution: most graphics applications let you create bitmaps of up to
300 dots per inch (dpi). Such high resolution is useful for print media, but
on the screen most of the information is lost, since monitors usually display
around 72 to 96 dpi.
• A bit-map representation stores the graphic/image data in the same
manner that the computer monitor contents are stored in video memory.
• Most graphic/image formats incorporate compression because of the large size
of the data.

Fig 1 pixels

Types of images
There are two basic forms of computer graphics: bit-maps and vector graphics. The
kind you use determines the tools you choose. Bitmap formats are the ones used for
digital photographs. Vector formats are used only for line drawings.

Bit-map images (also called Raster Graphics)


They are formed from pixels: a matrix of dots with different colors. Bitmap images are
defined by their dimension in pixels as well as by the number of colors they represent.
For example, a 640 x 480 image contains 640 pixels in the horizontal direction and 480
pixels in the vertical direction. If you enlarge a small area of a bit-mapped image, you
can clearly see the pixels that are used to create it (to check this, open a picture in
Flash and change the magnification to 800 by going into View -> Magnification -> 800).

Each of the small pixels can be a shade of gray or a color. Using 24-bit color, each pixel
can be set to any one of 16 million colors. All digital photographs and paintings are
bitmapped, and any other kind of image can be saved or exported into a bitmap
format. In fact, when you print any kind of image on a laser or ink-jet printer, it is first
converted by either the computer or printer into a bitmap form so it can be printed
with the dots the printer uses.

To edit or modify bitmapped images you use a paint program. Bitmap images are
widely used but they suffer from a few unavoidable problems. They must be printed or
displayed at a size determined by the number of pixels in the image. Bitmap images
also have large file sizes that are determined by the image's dimensions in pixels and
its color depth. To reduce this problem, some graphic formats such as GIF and JPEG
are used to store images in compressed format.

Bitmap Image: An image is a collection of an n x m array of picture elements or pixels.


A pixel representation can be bi-level, gray-scale level or color.

Image coding:

Sampling:
✓ Taking measurement (of color) at discrete locations within the image.
o Image sensor in camera measures the color at each pixel.
o 16 samples per inch means 16 pixels per inch.
Quantization:
✓ Represent measured value (i.e. voltage) at the sampled point by an integer:
Pixel dimension:
The number of pixels horizontally (i.e. width, w) and vertically (i.e. height h),
denoted as w x h
Resolution:
The number of pixels in an image file per unit of spatial measure.
Resolution can be measured in pixels per inch (ppi).
Resolution by a printer is a matter of how many dots of color per inch (dpi) it
can print.
Image size:
The physical dimension of an image when it is printed out or displayed on a
computer.
Relation of image size, pixel dimension and resolution is
a = w/r, b = h/r
r – resolution
w x h – pixel dimension
Image size – a x b
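The relation a = w/r, b = h/r can be applied directly; a small sketch:

```python
def image_size_inches(pixel_w, pixel_h, resolution_ppi):
    """Physical image size a x b from pixel dimension w x h and resolution r:
    a = w / r, b = h / r."""
    return pixel_w / resolution_ppi, pixel_h / resolution_ppi

# A 3000 x 2400 pixel image printed at 300 ppi measures 10 x 8 inches;
# the same pixels at 100 ppi spread over 30 x 24 inches (lower resolution).
print(image_size_inches(3000, 2400, 300))  # (10.0, 8.0)
print(image_size_inches(3000, 2400, 100))  # (30.0, 24.0)
```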
Color depth:
The amount of information per pixel is called color depth.
1 bit => white/black (monochrome)
8 bits => gray scale
16 bits => color (RGB)
24 or 32 bits => true color

Color models: A method by which we can specify, create, and visualize color

Vector graphics
They are really just a list of graphical objects such as lines, rectangles, ellipses,
arcs, or curves, called primitives. Draw programs, also called vector graphics programs, are
used to create and edit these vector graphics. These programs store the primitives as a
set of numerical coordinates and mathematical formulas that specify their shape and
position in the image. This format is widely used by computer-aided design programs
to create detailed engineering and design drawings. It is also used in multimedia when
3D animation is desired. Draw programs have a number of advantages over paint-type
programs.
These include:
▪ Precise control over lines and colors.
▪ Ability to skew and rotate objects to see them from different angles or add
perspective.
▪ Ability to scale objects to any size to fit the available space. Vector graphics
always print at the best resolution of the printer you use, no matter what size
you make them.
▪ Color blends and shadings can be easily changed.
▪ Text can be wrapped around objects.

Monochrome/Bit-Map Images
▪ Each pixel is stored as a single bit (0 or 1)
▪ The value of the bit indicates whether it is light or dark
▪ A 640 x 480 monochrome image requires 37.5 KB of storage.
▪ Dithering is often used for displaying monochrome images

Fig 3 Monochrome bit-map image

Gray-scale Images
▪ Each pixel is usually stored as a byte (value between 0 and 255)
▪ This value indicates the degree of brightness of that point. This brightness goes
from black to white
▪ A 640 x 480 grayscale image requires over 300 KB of storage.

Fig 4 Gray-scale bit-map image

8-bit Color Images


• One byte for each pixel
• Supports 256 out of the millions possible, acceptable color quality
• Requires Color Look-Up Tables (LUTs)
• A 640 x 480 8-bit color image requires 307.2 KB of storage (the same as 8-bit
greyscale)
• Examples: GIF

Fig 5 8-bit Color Image

24-bit Color Images


• Each pixel is represented by three bytes (e.g., RGB)
• Supports 256 x 256 x 256 possible combined colors (16,777,216)
• A 640 x 480 24-bit color image would require 921.6 KB of storage
• Most 24-bit images are 32-bit images,
o the extra byte of data for each pixel is used to store an alpha value
representing special effect information
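The storage figures quoted for the image types above all follow from pixels x bits per pixel. This sketch reproduces them (note the module mixes binary KB for the 37.5 KB figure and decimal KB for the others):

```python
def image_bytes(width, height, bits_per_pixel):
    """Uncompressed image storage: total pixels x bits per pixel, in bytes."""
    return width * height * bits_per_pixel // 8

print(image_bytes(640, 480, 1) / 1024)   # 37.5  (KB, binary)  monochrome
print(image_bytes(640, 480, 8) / 1000)   # 307.2 (KB, decimal) 8-bit color/grayscale
print(image_bytes(640, 480, 24) / 1000)  # 921.6 (KB, decimal) 24-bit color
```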

Image Resolution
Image resolution refers to the spacing of pixels in an image and is measured in pixels per
inch, ppi, sometimes called dots per inch, dpi. The higher the resolution, the more pixels in
the image. A printed image that has a low resolution may look pixelated or made up of
small squares, with jagged edges and without smoothness.
Image size refers to the physical dimensions of an image. Because the number of pixels in
an image is fixed, increasing the size of an image decreases its resolution and decreasing its
size increases its resolution.

Popular File Formats


Choosing the right file type to save your image in is of vital importance. If you are, for
example, creating images for web pages, they should load fast, so such images should be
small in size. Another criterion for choosing a file type is the quality of
the image that is possible using that file type. You should also be concerned about
the portability of the image.
To choose file type:
• Resulting size of the image: large or small file size
• Quality of image possible by the file type
• Portability of file across different platforms
The most common formats used on internet are the GIF, JPG, and PNG.

Standard System Independent Formats

GIF
✓ Graphics Interchange Format (GIF), devised by CompuServe, initially for
transmitting graphical images over phone lines via modems.
✓ Uses the Lempel-Ziv-Welch (LZW) algorithm, a dictionary-based compression
method, modified slightly for image scan-line packets (line grouping of pixels).
✓ LZW compression was patented technology of the UNISYS Corp.
✓ Limited to only 8-bit (256) color images, suitable for images with few
distinctive colors (e.g., graphics, drawings)
✓ Supports one-dimensional interlacing (downloading gradually in web browsers):
interlaced images appear gradually while they are downloading, displaying at a low,
blurry resolution first and then transitioning to full resolution by the time the
download is complete.
✓ Supports animation: multiple pictures per file (animated GIF)
✓ GIF format has long been the most popular on the Internet, mainly because of
its small size
✓ GIFs allow single-bit transparency, which means when you are creating your image,
you can specify one colour to be transparent. This allows background colours to
show through the image.
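The dictionary-building idea behind GIF's LZW compression can be sketched with a toy encoder (operating on plain strings rather than GIF's scan-line packets; this is an illustration of the principle, not the GIF bitstream format):

```python
def lzw_encode(data):
    """Toy LZW: grow a dictionary of seen substrings, emit dictionary codes."""
    dictionary = {chr(i): i for i in range(256)}  # initial single-byte codes
    next_code = 256
    current = ""
    output = []
    for ch in data:
        if current + ch in dictionary:
            current += ch                          # extend the current match
        else:
            output.append(dictionary[current])     # emit code for longest match
            dictionary[current + ch] = next_code   # learn the new substring
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output

codes = lzw_encode("ABABABA")
print(codes)  # [65, 66, 256, 258] -- repeated "AB" patterns collapse into new codes
```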

PNG
✓ stands for Portable Network Graphics
✓ It is intended as a replacement for GIF in the WWW and image editing tools.
✓ GIF uses LZW compression which is patented by Unisys. All use of GIF may have
to pay royalties to Unisys due to the patent.
✓ PNG uses unpatented zip technology for compression
✓ One version of PNG, PNG-8, is similar to the GIF format. It can be saved with a
maximum of 256 colours and supports 1-bit transparency. File sizes, when saved in a
capable image editor like Fireworks, will be noticeably smaller than the GIF
counterpart, as PNGs save their colour data more efficiently.
✓ PNG-24 is another version of PNG, with 24-bit colour support, allowing ranges of
colour comparable to a high-colour JPG. However, PNG-24 is in no way a replacement
format for JPG, because it is a lossless compression format, which results in larger
file sizes.
✓ Provides transparency using alpha value
✓ Supports interlacing
✓ PNG can be animated through the MNG extension of the format, but browser
support is less for this format.

JPEG/JPG
✓ A standard for photographic image compression
✓ created by the Joint Photographic Experts Group

✓ Intended for encoding and compression of photographs and similar images
✓ Takes advantage of limitations in the human vision system to achieve high rates
of compression
✓ Uses complex lossy compression which allows the user to set the desired level of
quality (compression). A compression setting of about 60% usually results in
the optimum balance of quality and file size.
✓ Though JPGs can be interlaced, they do not support animation or transparency,
unlike GIF

TIFF
✓ Tagged Image File Format (TIFF) stores many different types of images
(e.g., monochrome, grayscale, 8-bit & 24-bit RGB, etc.)
✓ Uses tags, keywords defining the characteristics of the image that are included in
the file. For example, a picture 320 by 240 pixels would include a 'width' tag
followed by the number '320' and a 'depth' tag followed by the number '240'.
✓ Developed by the Aldus Corp. in the 1980s and later supported by Microsoft
✓ TIFF is a lossless format (when not utilizing the new JPEG tag, which allows for
JPEG compression)
✓ It does not provide any major advantages over JPEG and is not as user-controllable.
✓ Do not use TIFF for web images. They produce big files, and more importantly,
most web browsers will not display TIFFs.


In the realm of digital imaging, frequency refers to the rate at which color values
change.

Image decimation (subsampling) is the reduction in dimension (or resolution) of the
image.
Ex: decimation by 2 results in half the size of the image.
The simplest method is skipping pixels.
Image interpolation (upsampling) is the increase in dimension (or resolution) of the
image.
Ex: interpolation by 2 results in double the size of the image.
The simplest method is duplication of pixels.
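Both operations, in their simplest pixel-skipping and pixel-duplicating forms, can be sketched on a nested-list image:

```python
def decimate(image, factor):
    """Sub-sample: keep every `factor`-th row and every `factor`-th column."""
    return [row[::factor] for row in image[::factor]]

def interpolate(image, factor):
    """Up-sample by nearest-neighbour pixel duplication."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(factor)]  # duplicate columns
        out.extend(list(wide) for _ in range(factor))   # duplicate rows
    return out

img = [[1, 2],
       [3, 4]]
print(interpolate(img, 2))               # each pixel becomes a 2x2 block
print(decimate(interpolate(img, 2), 2))  # skipping undoes the duplication here
```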

Color Models:

What is color? (Visible light)


Light is the visible portion of the electromagnetic spectrum. Color is how we
perceive different wavelengths of light.
AM radio waves are about 100 meters long
FM radio waves are about 1 meter long
Light waves are about 0.0000005 meters (500 nm) long

Color perception:

✓ Color is how we perceive the frequencies of light that reach the retinas of our
eyes.
✓ The human retina has three types of color photoreceptor cone cells that
correspond to the colors of red, green and blue

Three characteristics of color:
Hue (essential color): dominant wavelength
Saturation (color purity)
Brightness (Luminance - Intensity of light)

Different Color models:


1. Red Green Blue (RGB) color model
Used for displays; based on human eye perception.
2. Cyan Magenta Yellow (CMY) color model
Used for printers
3. Hue Saturation Brightness (HSB) color model
4. YUV (luminance and chrominance) color model
5. CIE XYZ color model

Procedural modeling:
Creating a digital image by writing a computer program based on some mathematical
computation or unique type of algorithm. Procedural models are often defined by a small set of
data that describes the overall properties of the model.

For example, a tree might be defined by some branching properties and leaf shapes. The actual
model is constructed by a procedure that often makes use of randomness to add variety. In this
way, a single tree pattern can be used to model an entire forest.

Video formats:
AVI, WMF, MPEG, QuickTime, Real video, shockwave (Flash)

AVI – Audio Video Interleaved – is a Windows movie file format with high video quality but a
large file size (approx. 25 GB is required for 60 minutes of video)

2.2. Digital Audio and MIDI


What is Sound?
Sound is produced by a rapid variation in the average density or pressure of air molecules
above and below the current atmospheric pressure. We perceive sound as these pressure
fluctuations cause our eardrums to vibrate. These usually minute changes in atmospheric
pressure are referred to as sound pressure and the fluctuations in pressure as sound waves.
Sound waves are produced by a vibrating body, be it a guitar string, loud speaker cone or jet
engine. The vibrating sound source causes a disturbance to the surrounding air molecules,
causing them to bounce off each other with a force proportional to the disturbance. The back
and forth oscillation of pressure produces a sound wave.
Source — Generates Sound
• Air Pressure changes
• Electrical — Microphone produces an electric signal
• Acoustic (sound) — Direct pressure variations
Destination — Receives Sound
• Electrical — Loud Speaker
• Ears — respond to pressure and hear sound
How to Record and Play Digital Audio

In order to play digital audio (i.e. a WAVE file), you need a card with Digital to Analog
Converter (DAC) circuitry on it. Most sound cards have both an ADC (Analog to Digital
Converter) and a DAC so that the card can both record and play digital audio. The DAC is
attached to the Line Out jack of your audio card and converts the digital audio values back
into the original analog audio. This analog audio can then be routed to a mixer, speakers, or
headphones so that you can hear a recreation of what was originally recorded.
The playback process is almost an exact reverse of the recording process. To record digital
audio, you need a card which has Analog to Digital Converter (ADC) circuitry. The ADC is
attached to the Line In (and Mic In) jack of your audio card and converts the incoming analog
audio to a digital signal. Your computer software can store the digitized audio on your hard
drive, display it visually on the computer's monitor, or mathematically manipulate it to add
effects or otherwise process the sound. While the incoming analog audio is being recorded,
the ADC creates many digital values in its conversion to a digital audio representation of
what is being recorded. These values must be stored for later playback.
Digitizing Sound
• Microphone produces analog signal
• Computers understand only discrete (digital) entities
This creates a need to convert analog audio to digital audio using specialized hardware. This
is also known as sampling.
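The cost of sampling is easy to quantify. This sketch computes the uncompressed PCM data rate for CD-quality audio, using the sample rates and bit depths given earlier:

```python
def pcm_bytes_per_second(sample_rate, bits_per_sample, channels):
    """Uncompressed PCM data rate: samples/sec x bytes/sample x channels."""
    return sample_rate * (bits_per_sample // 8) * channels

# CD audio: 44,100 samples/sec, 16-bit, stereo.
rate = pcm_bytes_per_second(44_100, 16, 2)
print(rate)       # 176400 bytes per second
print(rate * 60)  # 10584000 bytes (~10.6 MB) per minute -- why compression matters
```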
Common Audio Formats
There are two basic types of audio files:
1. The Traditional Discrete Audio File:
A traditional audio file can be saved to a hard drive or other digital storage medium.
WAV: The WAV format is the standard audio file format for Microsoft Windows applications, and is
the default file type produced when conducting digital recording within Windows. It supports a
variety of bit resolutions, sample rates, and channels of audio. This format is very popular upon
IBM PC (clone) platforms, and is widely used as a basic format for saving and modifying digital
audio data.
AIF: The Audio Interchange File Format (AIFF) is the standard audio format employed by
computers using the Apple Macintosh operating system. Like the WAV format, it supports a

variety of bit resolutions, sample rates, and channels of audio, and is widely used in
software programs used to create and modify digital audio.
AU: The AU file format is a compressed audio file format developed by Sun Microsystems and
popular in the Unix world. It is also the standard audio file format for the Java programming
language. It only supports 8-bit depth and thus cannot provide CD-quality sound.
MP3: MP3 stands for MPEG (Moving Picture Experts Group) Audio Layer 3 compression. MP3 files
provide near-CD-quality sound but are only about 1/10th the size of a standard audio CD file.
Because MP3 files are small, they can easily be transferred across the Internet and played on
any multimedia computer with MP3 player software.
MIDI/MID: MIDI (Musical Instrument Digital Interface) is not a file format for storing or
transmitting recorded sounds, but rather a set of instructions used to play electronic music on
devices such as synthesizers. MIDI files are very small compared to recorded audio file formats.
However, the quality and range of MIDI tones is limited.
2. Streaming Audio File Formats
Streaming is a network technique for transferring data from a server to client in a format that
can be continuously read and processed by the client computer. Using this method, the client
computer can start playing the initial elements of large time-based audio or video files before
the entire file is downloaded. As the Internet grows, streaming technologies are becoming an
increasingly important way to deliver time-based audio and video data.
For streaming to work, the client side has to receive the data and continuously feed it to the
player application. If the client receives the data more quickly than required, it has to
temporarily store or buffer the excess for later play. On the other hand, if the data doesn't
arrive quickly enough, the audio or video presentation will be interrupted. There are three
primary streaming formats that support audio files: Real Network's Real Audio (RA, RM),
Microsoft’s Advanced Streaming Format (ASF) and its audio subset called Windows Media
Audio 7 (WMA), and Apple's QuickTime 4.0+ (MOV).

RA/RM
For audio data on the Internet, the de facto standard is RealNetworks' RealAudio
(.RA) compressed streaming audio format. These files require a RealPlayer program or browser
plug-in. The latest versions of Real Networks server and player software can handle multiple
encodings of a single file, allowing the quality of transmission to vary with the available
bandwidth. Webcast radio broadcast of both talk and music frequently uses RealAudio.

Streaming audio can also be provided in conjunction with video as a combined Real Media
(RM) file.
ASF
Microsoft's Advanced Streaming Format (ASF) is similar in design to RealNetworks' Real
Media format, in that it provides a common definition for internet streaming media and can
accommodate not only synchronized audio, but also video and other multimedia elements, all
while supporting multiple bandwidths within a single media file. Also like Real Network's Real
Media format, Microsoft's ASF requires a program or browser plug-in. The pure audio file
format used in Windows Media Technologies is Windows Media Audio 7 (WMA files). Like MP3
files, WMA audio files use sophisticated audio compression to reduce file size. Unlike MP3 files,
however, WMA files can function as either discrete or streaming data and can provide a
security mechanism to prevent unauthorized use.
MOV
Apple QuickTime movies (MOV files) can be created without a video channel and used as a
sound-only format. Since version 4.0, QuickTime provides true streaming capability.
QuickTime also accepts different audio sample rates, bit depths, and offers full functionality in
both Windows as well as the Mac OS. Popular audio file formats are:

• au (Unix)
• aiff (MAC)
• wav (PC)
• mp3

MIDI
MIDI stands for Musical Instrument Digital Interface.
Definition of MIDI: MIDI is a protocol that enables computers, synthesizers, keyboards, and
other musical devices to communicate with each other. This protocol is a language that allows
interworking between instruments from different manufacturers by providing a link that is
capable of transmitting and receiving digital data. MIDI transmits only commands; it does not
transmit an audio signal. It was created in 1982.
Components of a MIDI System
1. Synthesizer: It is a sound generator (various pitch, loudness, tone color). A good
(musician’s) synthesizer often has a microprocessor, keyboard, control panels, memory, etc.

2. Sequencer: It can be a stand-alone unit or a software program for a personal computer. (It
used to be a storage server for MIDI data; nowadays it is more a software music editor on the
computer.) It has one or more MIDI INs and MIDI OUTs.

Basic MIDI Concepts


Track: Track in sequencer is used to organize the recordings. Tracks can be turned on or off on
recording or playing back.
Channel: MIDI channels are used to separate information in a MIDI system. There are 16 MIDI
channels in one cable. Channel numbers are coded into each MIDI message.
Timbre: The quality of the sound, e.g., flute sound, cello sound, etc.
Multi-timbral: capable of playing many different sounds at the same time (e.g., piano, brass,
drums,..)
Pitch: The Musical note that the instrument plays
Voice: Voice is the portion of the synthesizer that produces sound. Synthesizers can have many
(12, 20, 24, 36, etc.) voices. Each voice works independently and simultaneously to produce
sounds of different timbre and pitch.
Patch: The control settings that define a particular timbre.

2.3. Colors in Image and Video


Light and Spectra
In 1672, Isaac Newton discovered that white light could be split into many colors by a prism.
The colors produced by light passing through a prism are arranged in a precise array, or
spectrum. A color's spectral signature is identified by its wavelength.

Visible light is an electromagnetic wave in the 400 nm - 700 nm range (blue ~400 nm, green
~500 nm, red ~700 nm). Most light we see is not one wavelength; it is a combination of many
wavelengths. For example, purple is a mixture of red and violet. (1 nm = 10^-9 m.)

The Color of Objects


Here we consider the color of an object illuminated by white light. Color is produced by the
absorption of selected wavelengths of light by an object. Objects can be thought of as absorbing
all colors except the colors of their appearance which are reflected back. A blue object
illuminated by white light absorbs most of the wavelengths except those corresponding to blue
light. These blue wavelengths are reflected by the object.

Color Spaces
Color space specifies how color information is represented. It is also called color model. Any
color could be described in a three dimensional graph, called a color space. Mathematically the
axes can be tilted (sloped) or moved in different directions to change the way the space is
described, without changing the actual colors. The values along an axis can be linear or
nonlinear.
This gives a variety of ways to describe colors, which has an impact on the way we process

a color image. There are different ways of representing color. Some of these are:
• RGB color space
• YUV color space (luma (brightness) Y, blue projection U, red projection V)
• YIQ color space (luma (brightness) Y; I and Q represent the chrominance
information. Chrominance = color - luma.)
• CMY/CMYK color space
• YCbCr color space (Y' is the luma component, and Cb and Cr are the blue-difference
and red-difference chroma components. Y' (with prime) is distinguished from Y,
which is luminance, meaning that light intensity is nonlinearly encoded based on
gamma-corrected RGB primaries.)

RGB Color Space


RGB stands for Red, Green, Blue. RGB color space expresses/defines color as a mixture of three
primary colors:
• Red
• Green
• Blue
The absence of all three colors creates black, and the presence of all three forms white.
These colors are called additive colors. Pure black is (0,0,0) and pure white is
(255,255,255); all other colors are produced by varying the intensity of these three
primaries and mixing the colors.

Fig: RGB color


These colors are called additive colors since they add together the way light adds to make

colors; RGB is a natural color space to use with video displays.

CMY and CMYK color space


The subtractive color system reproduces colors by subtracting some wavelengths of light from
white. The three subtractive color primaries are cyan, magenta, and yellow (CMY). If none of
these colors is present, the color produced is white because nothing has been subtracted from
the white light. If all the colors are present at their maximum amounts, the color produced is
black because all of the light has been subtracted from the white light.
A color model used with printers and other peripherals. Three primary colors, cyan
(C),magenta (M), and yellow (Y), are used to reproduce all colors.

The three colors together absorb all the light that strikes it, appearing black (as contrasted to
RGB where the three colors together made white). "Nothing" on the paper is white (as
contrasted to RGB where nothing was black). These are called the subtractive or "paint" colors.
Cyan, Magenta, and Yellow (CMY) are complementary colors of RGB. CMY model is mostly used
in printing devices where the color pigments on the paper absorb certain colors (e.g., no red
light reflected from cyan ink) and in painting.
In practice, it is difficult to have the exact mix of the three colors to perfectly absorb all light
and thus produce a black color. Expensive inks are required to produce the exact color, and the
paper must absorb each color in exactly the same way. To avoid these problems, a fourth
color, black, is often added, creating the CMYK color space (K stands for black), even
though black is mathematically not required. The CMYK model is used in color printing
(e.g., to produce a darker black than simply mixing CMY).

2.4. Color Models in Images
Color models and spaces used for storing, displaying, and printing images.
RGB Color Model (Additive Model)
Color images are encoded as integer triplet(R,G,B) values. These triplets encode how much the
corresponding phosphor should be excited in devices such as a monitor. For images produced
from computer graphics, we store integers proportional to intensity in the frame buffer.
CMY Color Model (Subtractive Color)
Additive color: when two light beams impinge on a target, their colors add; when two
phosphors on a CRT screen are turned on, their colors add. But for ink deposited on paper,
the opposite situation holds: yellow ink subtracts blue from white illumination, but reflects
red and green, so it appears yellow. Instead of red, green, and blue primaries, we need
primaries that amount to -red, -green, and -blue; i.e. we need to subtract R, G, or B. These
subtractive color primaries are Cyan (C), Magenta (M) and Yellow (Y) inks.

Transformation from RGB to CMY


The simplest model we can invent to specify what ink density to lay down on paper to make a
certain desired RGB color (with all values normalized to [0, 1]) is:
C = 1 - R, M = 1 - G, Y = 1 - B
Then the inverse transform is:
R = 1 - C, G = 1 - M, B = 1 - Y

Undercolor Removal: CMYK System
Undercolor removal gives sharper and cheaper printed colors: calculate the part of the CMY
mix that would be black, remove it from the color proportions, and add it back as real black,
K. The new specification of inks is thus:
K = min(C, M, Y); C' = C - K, M' = M - K, Y' = Y - K
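Undercolor removal as described, taking the common grey component out of the mix as K, can be sketched directly; values are assumed normalized to [0, 1]:

```python
def cmy_to_cmyk(c, m, y):
    """Undercolor removal: K is the part of the CMY mix that would print as
    black; subtract it from each ink and add it back as real black."""
    k = min(c, m, y)
    return c - k, m - k, y - k, k

# The grey component (0.25 of each ink) moves into the K channel.
print(cmy_to_cmyk(0.75, 0.25, 0.5))  # (0.5, 0.0, 0.25, 0.25)
```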

Color combinations result from combining the primary colors available in the two situations,
additive and subtractive color.

Fig: Additive and subtractive color. (a): RGB is used to specify additive color. (b): CMY is used
to specify subtractive color

2.5. Color Models in Video


Video Color Transforms
Video color transforms largely derive from older analog methods of coding color for TV, in
which luminance is separated from color information. YIQ is used to transmit TV signals in
North America and Japan. In Europe, video tape uses the PAL (Phase Alternating Line) or
SECAM (Sequential Color with Memory) codings, which are based on TV that uses a matrix
transform called YUV. Finally, digital video mostly uses a matrix transform called YCbCr that
is closely related to YUV.

YUV Color Model
Established in 1982 as part of building a digital video standard. Video is represented by a
sequence of fields (odd and even lines); two fields make a frame. Works in PAL (50 fields/sec)
or NTSC (National Television System Committee, 60 fields/sec). The luminance (brightness), Y,
is retained separately from the chrominance (color). The Y component determines the
brightness of the color (referred to as luminance or luma), while the U and V components
determine the color itself (called chroma). U is the axis from blue to yellow and V is the axis
from magenta to cyan. Y ranges from 0 to 1 (or 0 to 255 in digital formats), while U and V range
from -0.5 to 0.5 (or -128 to 127 in signed digital form, or 0 to 255 in unsigned form). Luminance
is the component of a television signal that carries information on the brightness of the image:
Y is the intensity of light emitted from a surface per unit area in a given direction. Chrominance
refers to the difference between a color and a reference white at the same luminance.
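The luminance/chrominance split can be sketched as follows. The luma weights and chroma scale factors below are the commonly cited BT.601-style coefficients; the text does not list them, so treat them as an assumption:

```python
def rgb_to_yuv(r, g, b):
    # Luma: weighted sum of R, G, B (the eye is most sensitive to green).
    y = 0.299 * r + 0.587 * g + 0.114 * b
    # Chroma: scaled color differences from the luma.
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v

# White carries full luminance but no chrominance: U and V are 0.
print(rgb_to_yuv(1.0, 1.0, 1.0))
```

For a gray input (R = G = B) the color differences vanish, which is exactly why a black-and-white receiver can use Y alone.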

Fig: YUV decomposition of a color image. Top image (a) is the original color image; (b) is Y;
(c, d) are (U, V).

YIQ Color Model


YIQ is used in color TV broadcasting; it is downward compatible with black-and-white TV. The
YIQ color space is commonly used in North American television systems. Note that if the
chrominance is ignored, the result is a "black and white" picture. I and Q are a rotated version
of U and V: Y in YIQ is the same as in YUV, while U and V are rotated by 33 degrees. I is the
red-orange axis, and Q is roughly orthogonal to I. The eye is most sensitive to Y (luminance),
next to I, and then to Q. YIQ is intended to take advantage of human color response
characteristics: since the eye is more sensitive to I than to Q, less bandwidth is required for Q
than for I. NTSC limits I to 1.5 MHz and Q to 0.6 MHz; Y is assigned the highest bandwidth,
4 MHz.
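The 33-degree rotation of (U, V) into (I, Q) can be sketched as below. Sign conventions vary between references, so this is one common formulation rather than a definitive one:

```python
import math

def yuv_to_yiq(y, u, v):
    # I and Q are the U and V chroma axes rotated by 33 degrees;
    # luma Y passes through unchanged.
    t = math.radians(33)
    i = -u * math.sin(t) + v * math.cos(t)
    q = u * math.cos(t) + v * math.sin(t)
    return y, i, q
```

Because this is a pure rotation, the chroma magnitude is preserved: sqrt(I^2 + Q^2) equals sqrt(U^2 + V^2).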

YCbCr
This color space is closely related to the YUV space, but with the coordinates shifted to allow
all-positive valued coefficients. The luminance (brightness), Y, is retained separately from the
chrominance (color). During the development and testing of JPEG it became apparent that
chrominance subsampling in this space allowed much better compression than simply
compressing RGB or CMY. Subsampling means that only one half or one quarter as much detail
is retained for the color as for the brightness. YCbCr is used in MPEG and JPEG compression.
Y - luma component
Cb, Cr - chrominance components (blue-difference and red-difference)
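A sketch of the shifted transform for 8-bit full-range samples; the constants are the usual JPEG/BT.601 ones, an assumption since the text does not give them:

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range 8-bit RGB (0-255) to YCbCr (0-255)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    # Shifting by 128 makes the chroma coefficients all-positive valued.
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

# A neutral gray sits at the "no color" chroma midpoint: Cb = Cr = 128.
print(rgb_to_ycbcr(128, 128, 128))
```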

Digital Video Representation: can be thought of as a sequence of moving images (frames).
Important parameters in video:
Digital image resolution (n × m pixels)
Quantization (k bits per pixel)
Frame rate (p frames per second, fps)

Continuity of motion is achieved at a minimum of 15 fps; 30 fps is good; HDTV uses 60 fps.

Types of video signals:
a- Component video
b- Composite video
c- S-video

Component video: for high-end video systems, such as studios.

✓ Three separate video signals - the Red, Green, and Blue image planes
✓ Three wires connecting the camera or other device to the TV or monitor
✓ Gives the best color reproduction
✓ No crosstalk between the different channels
✓ Requires more bandwidth and good synchronization
✓ Besides RGB, YUV, YIQ and other models can be used

Composite video:
✓ Chrominance and luminance signals mixed into a single carrier wave
Y-luminance
I,Q or U,V – chrominance
✓ Only one wire required – some interference
✓ Used by broadcast color TV, downward compatible with black and white TV
✓ In NTSC TV, I and Q combined into chrominance signal, at the higher frequency end of
the channel.

S-video: separated video (or super-video)

✓ Two wires: one for luminance and another for a composite chrominance signal
✓ Less crosstalk than composite video
Video standards:
1. NTSC (National Television System Committee)
✓ Set the standard for transmission of analog color pictures back in 1953
✓ Used in US and Japan
✓ 525 lines (480 visible)
✓ 30 fps (i.e., delay between frames = 33.3 ms)
✓ Video aspect ratio of 4:3 (e.g., 12 inches wide, 9 inches high)

Downward compatible with the black-and-white TV system; color model YUV.

Other standards:
PAL (Phase Alternating Line): used in parts of western Europe
SECAM : French standard

Digital video standard:


✓ CCIR (Consultative Committee for International Radio)
✓ NTSC standard:
525 lines, 858 pixels per line (720 visible)
4:2:2 chroma subsampling
One pixel - two bytes
Uses the YCbCr color model

Digital video standard specification:

CIF standard:
CIF - Common Intermediate Format
The idea of CIF: a format for lower bit rates
QCIF - Quarter CIF, for even lower bit rates
The CIF/QCIF dimensions are divisible by 8 and by 16, which suits block-based coding.

Video bit rate calculation: width × height × depth × fps (bits per second)

Ex: resolution 640 × 480, 24 bits per pixel color depth and 30 fps gives
640 × 480 × 24 × 30 = 221,184,000 bits/sec (about 221 Mbit/s)
Normally one frame consists of three component pictures (Y, Cb, Cr).
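Plugging the example numbers into the formula above:

```python
def video_bit_rate(width, height, depth, fps):
    # bits per second = pixels per frame x bits per pixel x frames per second
    return width * height * depth * fps

rate = video_bit_rate(640, 480, 24, 30)
print(rate)        # 221184000 bits/s
print(rate / 1e6)  # about 221 Mbit/s for uncompressed video
```

This is why uncompressed video is impractical to store or transmit, which motivates the compression techniques of the next chapter.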

Advantages of Digital video representation:


Storing video in a digital medium or in memory makes it ready to be processed (noise removal,
cut and paste, etc.),
Direct access, which makes non-linear video editing simple,
Repeated recording without degradation of image quality,
Ease of encryption and better tolerance to channel noise.

Chapter 3

3. Multimedia Data Compression

Compression is the process of reducing the number of bits used to represent an image or video.


Compression reduces the number of bits required to represent the data. Compression is useful
because it helps in reducing the consumption of expensive resources, such as disk space and
transmission bandwidth. Compression is built into a broad range of technologies like storage
systems, databases, operating systems, and software applications.
Multimedia data, comprising audio, video, and still images, now makes up the majority of
traffic on the Internet. Part of what has made the widespread transmission of multimedia
across networks possible is advances in compression technology. Because multimedia data is
consumed mostly by humans using their senses—vision and hearing—and processed by the
human brain, there are unique challenges to compressing it. You want to try to keep the
information that is most important to a human, while getting rid of anything that doesn’t
improve the human’s perception of the visual or auditory experience. Hence, both computer
science and the study of human perception come into play. In this section, we’ll look at some of
the major efforts in representing and compressing multimedia data.

3.1. Why compression?


Compression technology is employed to use storage space efficiently and to save transmission
capacity and transmission time. Basically, it is all about saving resources and money. Despite
the overwhelming advances in the areas of storage media and transmission networks, it may
seem surprising that compression technology is still required. One important reason is that the
resolution and amount of digital data have also increased (e.g. HD-TV resolution,
ever-increasing sensor sizes in consumer cameras), and that there are still application areas
where resources are limited, e.g. wireless networks. Apart from the aim of simply reducing the
amount of data, standards like MPEG-4, MPEG-7, and MPEG-21 offer additional
functionalities.

Compression is enabled by statistical and other properties of most data types; however, data
types exist which cannot be compressed, e.g. various kinds of noise or encrypted data.

High-resolution image: e.g. 1024×768 pixels, 24-bit color depth requires

1024 × 768 × 24 = 18,874,368 bits

Text: 1 page with 80 characters/line, 64 lines/page and 1 byte/character results in

80 × 64 × 1 × 8 = 40,960 bits/page (about 40 kbit/page)

Audio: CD quality, sampling rate of 44.1 kHz, 16 bits per sample results in
44,100 × 16 = 705,600 bit/s (about 706 kbit/s)
Stereo: about 1,411 kbit/s

Video: full-size frame of 1024×768 pixels/frame, 24 bits/pixel, 30 frames/s results in

1024 × 768 × 24 × 30 = 566,231,040 bit/s (about 566 Mbit/s)

Therefore, multimedia data requires considerable storage space and bandwidth to transfer.
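The figures above can be checked with a few lines of arithmetic:

```python
# Image: 1024 x 768 pixels, 24-bit color depth
image_bits = 1024 * 768 * 24
print(image_bits)        # 18874368 bits

# Text: 80 characters/line, 64 lines/page, 8 bits/character
text_bits = 80 * 64 * 1 * 8
print(text_bits)         # 40960 bits/page

# Audio: CD quality, 44.1 kHz sampling rate, 16 bits/sample (mono)
audio_bps = 44_100 * 16
print(audio_bps)         # 705600 bit/s; double it for stereo

# Video: 1024 x 768 frame, 24 bits/pixel, 30 frames/s
video_bps = 1024 * 768 * 24 * 30
print(video_bps)         # 566231040 bit/s
```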

3.2. Compression with Loss and Lossless


Types of Compression:-Lossless Compression and Lossy Compression

Lossless compression:

In lossless compression (as the name suggests) data are reconstructed after compression
without errors, i.e. no information is lost. Typical application domains where you do not want
to lose information are the compression of text, files, and faxes. In the case of image data, for
medical imaging or the compression of maps in the context of a land registry, no information
loss can be tolerated. A further reason to stick to lossless coding schemes instead of lossy ones
is their lower computational demand. For all lossless compression techniques there is a
well-known trade-off: Compression Ratio - Coder Complexity - Coder Delay

The original content of the data is not lost or changed when it is compressed (encoded). It is
used mainly for compressing symbolic data such as database records, spreadsheets, texts,
executable programs, etc. Lossless compression recovers the exact original data after
decompression; it is used where exact replication of the original is essential and changing even
a single bit cannot be tolerated. Examples: Run-Length Encoding (RLE), Lempel-Ziv (LZ),
Huffman coding.

Lossless Compression typically is a process with three stages:

✓ The model: the data to be compressed is analysed with respect to its structure and the
relative frequency of the occurring symbols.
✓ The encoder: produces a compressed bitstream/file using the information provided by
the model.
✓ The adaptor: uses information extracted from the data (usually during encoding) in
order to adapt the model (more or less) continuously to the data.

Lossless compression (entropy encoding):

Reduces the data amount while keeping image quality.
Ex: GIF, 2:1 to 50:1 compression rates

Lossy compression:
Lossy compression is generally used for images, audio, and video. In this compression
technique, the compression process discards some less important data, and an exact replica of
the original file cannot be retrieved from the compressed file; decompressing the compressed
data yields a close approximation of the original file. The original content of the data is lost to
a certain degree when compressed. For visual and audio data, some loss of quality can be
tolerated without losing the essential nature of the data. Lossy compression is used for image
compression in digital cameras (JPEG), audio compression (MP3), and video compression in
DVDs (MPEG format).

Lossy compression removes some image details to achieve a higher compression rate, e.g. by
suppressing higher frequencies.
Ex: JPEG, 100:1 compression rate
Often combined with lossless techniques
Trade-off between file size and quality

Compression – dependence on application type:


✓ Dialogue mode, for example, WhatsApp
✓ Retrieval mode, for example, YouTube
Mode dependent requirements:
Dialogue and Retrieval mode requirements
✓ Synchronization of audio, video , and other media
Dialogue mode requirements:
✓ End-to-end delay < 150 ms
✓ Compression and decompression in real-time
✓ Symmetric
Retrieval mode requirements:
✓ Fast forward and backward data retrieval
✓ Random access within ½ s
✓ Asymmetric
We look mainly at retrieval mode.

Compression categories
1. Entropy coding: (Run-length coding, Huffman coding, Arithmetic coding)
✓ Ignoring semantics of the data
✓ Lossless
2. Source coding: (Prediction: DPCM, DM; Transformation: FFT, DCT)
✓ Based on semantic of the data
✓ Often lossy
3. Channel coding:
✓ Adaptation to communication channel
✓ Introduction of redundancy
4. Hybrid coding:

✓ Combination of entropy and source coding; used in proprietary and standardized
schemes (JPEG, MPEG, H.261, DVI RTV, DVI PLV, ...)

Simple lossless Algorithms:


A. Pattern substitution.

B. Run length coding:


Principle: replace each run of repetitions of the same symbol in the text by a repetition counter
and the symbol.

As we can see, we can only expect a good compression rate when long runs occur frequently.
Examples of long runs: blank spaces in text documents,
leading zeros in numbers,
strings of white pixels in grey-scale images.

C. Run length coding for binary files:


When dealing with binary files we are sure that a run of 1's is always followed by a run of 0's
and vice versa. It is then sufficient to store the repetition counts only.
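Both variants can be sketched as follows (the function names are illustrative, not from the text):

```python
def rle_encode(text):
    # Replace each run of a repeated symbol by (count, symbol).
    runs = []
    for ch in text:
        if runs and runs[-1][1] == ch:
            runs[-1][0] += 1            # extend the current run
        else:
            runs.append([1, ch])        # start a new run
    return [(count, ch) for count, ch in runs]

def rle_encode_binary(bits, first="0"):
    # For binary data, runs of 0s and 1s must alternate, so storing
    # the run lengths alone (plus the starting symbol) is enough.
    counts, current, run = [], first, 0
    for b in bits:
        if b == current:
            run += 1
        else:
            counts.append(run)
            current, run = b, 1
    counts.append(run)
    return counts

print(rle_encode("AAAABBC"))            # [(4, 'A'), (2, 'B'), (1, 'C')]
print(rle_encode_binary("0001111011"))  # [3, 4, 1, 2]
```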

D. Variable length coding:
Classical character codes use the same number of bits for each character. When the frequency
of occurrence differs between characters, we can use fewer bits for frequent characters and
more bits for rare characters.

Basics of Information Theory: (entropy and code length)


Entropy: The entropy η of an information source with alphabet S = {s1, s2, ..., sn} is:

η = Σ (i = 1..n) pi · log2(1/pi)

pi - probability that symbol si will occur in S.
log2(1/pi) - the amount of information contained in si, which corresponds to the number of
bits (code length) needed to encode si.

Code length:
The entropy η specifies the lower bound for the average number of bits needed to code each
symbol in S, i.e. η ≤ L, where L is the average length (measured in bits) of the codewords
produced by the encoder.
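The entropy formula can be checked numerically; for instance, for a five-symbol source with probabilities 0.3, 0.3, 0.15, 0.15, 0.1:

```python
import math

def entropy(probs):
    # Entropy = sum over i of p_i * log2(1 / p_i)
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

h = entropy([0.3, 0.3, 0.15, 0.15, 0.1])
print(round(h, 3))  # about 2.2 bits/symbol: no code can average fewer bits
```

A fair coin gives the familiar sanity check: entropy([0.5, 0.5]) is exactly 1 bit per symbol.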

3.2. Huffman Coding


Huffman Coding: invented by David Huffman in 1952.
It finds the best variable-length code for given character frequencies (or probabilities).
Algorithm:
Determine the frequencies of the characters and mark the leaf nodes of a binary tree (to
be built) with them.
(1) Out of the tree nodes not yet marked as DONE, take the two with the smallest
frequencies and compute their sum.
(2) Create a parent node for them and mark it with the sum. Mark the branch to the left son
with 0, the one to the right son with 1.
(3) Mark the two son nodes as DONE. When there is only one node not yet marked as DONE,
stop (the tree is complete); otherwise, continue with step (1).

Example:
Probabilities of the characters:
p(A) = 0.3, p(B) = 0.3, p(C) = 0.1, p(D) = 0.15, p(E) = 0.15

Characters with higher probabilities are closer to the root of the tree and thus have shorter
codeword lengths; thus it is an optimal code.
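The tree construction above can be sketched with a priority queue (a minimal sketch; tie-breaking may assign different codes than a hand-drawn tree, but the code lengths remain optimal):

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build Huffman codes for a {symbol: probability} table."""
    tick = count()  # tie-breaker so heapq never compares dicts
    heap = [(p, next(tick), {sym: ""}) for sym, p in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Take the two subtrees with the smallest total frequency...
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        # ...prefix their codes with 0 (left) and 1 (right), then merge.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p1 + p2, next(tick), merged))
    return heap[0][2]

probs = {"A": 0.3, "B": 0.3, "C": 0.1, "D": 0.15, "E": 0.15}
codes = huffman_codes(probs)
avg = sum(probs[s] * len(c) for s, c in codes.items())
print(codes)
print(avg)  # 2.25 bits/symbol on average, just above the entropy bound
```

For this source the lengths come out as two 3-bit codes (the rarest symbols) and three 2-bit codes, giving an average of 2.25 bits/symbol against an entropy of about 2.2.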

Properties of Huffman coding:


1. Unique prefix property: No Huffman code is a prefix of any other Huffman code.
2. Optimality: Minimum redundancy code – proved optimal for a given data model (i.e
given, accurate, probability distribution)
❖ The two least frequent symbols will have the same length for their Huffman codes,
differing only in the last bit.
❖ Symbols that occur more frequently will have shorter Huffman codes than symbols that
occur less frequently.

3.3. Dictionary-based coding:


LZW (Lempel-Ziv-Welch) coding is an example of the large group of dictionary-based
codes.

Dictionary: A table of character strings which is used in the encoding process.


Example: The word “Lecture” is found on page X4, line Y4 of the dictionary.
It can thus be encoded as (X4, Y4).

Static techniques: the dictionary exists before a string is encoded; it is changed neither in the
coding nor in the decoding process.
Dynamic techniques: the dictionary is created "on the fly" during the encoding and decoding
process.

Lempel and Ziv proposed an especially brilliant dynamic, dictionary-based technique in 1977.

Variants of this technique (LZW) are very widely used today for lossless compression.
TIFF (Tagged Image File Format) is based on LZ coding.

LZW coding principle: The current piece of the message can be encoded as a reference to an
earlier (identical) piece of the message. This reference will usually be shorter than the piece
itself. As the message is processed, the dictionary is created dynamically.

LZW Algorithm:
initialize code table;
S ← the given string;
W ← the empty string;
for each character K in S {
    if (W + K) is in the code table {
        W ← W + K    /* string concatenation */
    } else {
        write code(codeTable(W));
        add table entry (W + K);
        W ← K
    }
}
write code(codeTable(W));
Example:
Our alphabet is {A, B, C, D} we encode the string ABACABA.
In the first step we initialize the string table

Dictionary:
Code   String
1      A
2      B
3      C
4      D

We read A from the input. We find A in the table and keep A in the buffer.
We read B from the input into the buffer and now consider AB.
AB is not in the table, so we add AB with index 5, write 1 (for A) to the output, and remove A
from the buffer. The buffer now contains only B.
Next, we read A, consider BA, add BA as entry 6 to the table, write 2 (for B) to the output,
and so on.
At the end the code table is
1=A
2=B
3=C
4=D
5=AB
6=BA
7=AC
8=CA
9=ABA

The output data stream is 121351
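The walk-through above corresponds to this runnable sketch of the LZW pseudocode (codes start at 1 to match the example):

```python
def lzw_encode(s, alphabet):
    # Initialize the code table with the single-character strings.
    table = {ch: i + 1 for i, ch in enumerate(alphabet)}
    w, out = "", []
    for k in s:
        if w + k in table:
            w += k                         # grow the current match
        else:
            out.append(table[w])           # emit code for longest match
            table[w + k] = len(table) + 1  # add the new entry
            w = k
    if w:
        out.append(table[w])               # flush the final match
    return out

print(lzw_encode("ABACABA", "ABCD"))  # [1, 2, 1, 3, 5, 1]
```

Running it reproduces the example exactly: the code table ends with AB=5, BA=6, AC=7, CA=8, ABA=9, and the output stream is 1 2 1 3 5 1.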

Adaptive Huffman coding:


In this coding technique, statistics (frequencies of occurrence) are gathered and updated
dynamically as the data stream arrives.

Algorithm:
Initial_code();
while not EOF
{
    get(c);
    encode(c);        // identify the symbol or character (c)
    update_tree(c);
}
Initial_code(): assigns symbols some initially agreed-upon codes, without any prior
knowledge of the frequency counts.
Update_tree(): constructs an adaptive Huffman tree.

It basically does two things:


a. Increments the frequency counts for the symbols (including any new ones)
b. Updates the configuration of the tree:
❖ Nodes are numbered in order from left to right, bottom to top. The numbers in
parentheses indicate the count.
❖ The tree must always maintain its sibling property, (i.e.) all nodes (internal and
leaf) are arranged in the order of increasing counts.
❖ If the sibling property is about to be violated, a swap procedure is invoked to
update the tree by rearranging the nodes.
❖ When a swap is necessary, the farthest node with count N is swapped with the
node whose count has just been increased to N+1.

Note:
If any character/symbol is to be sent for the first time, it must be preceded by a special
symbol, NEW.
The initial code for NEW is 0. The count for NEW is always kept at 0 (the count is never
increased).

Example: AADCCDD – given string


Initial code:
NEW : 0
A : 00001
B : 00010
C : 00011
D : 00100

It is important to emphasize that the code for a particular symbol changes during the adaptive
Huffman coding process.

Symbol: NEW  A      A  NEW  D      NEW  C      C   D    D
Code:   0    00001  1  0    00100  00   00011  11  101  0

Chapter 4

4. Storage of Multimedia

The development of computerized multimedia applications and the technical requirements


resulting from the particular nature of these applications call for a document architecture
adapted to the characteristics and the types of documents to be integrated. In the case of
distributed applications, such architecture is meant to structure, represent and manage the
exchange of data between two users, or two workstations. These functions are also meant to
take place with centralized applications that involve an exchange of data between a server and
a user/client. In the current computerized environment, there is an urgent need for better
storage and presentation models of multimedia documents integrating a large amount of data
of different types. Indeed, current multimedia systems integrate a variety of data such as voice,
graphics, texts, video and different types of other images into a single document.
Multimedia storage servers provide access to multimedia objects including text, images, audio,
and video. Due to the real-time storage and retrieval requirements and the large storage space
and data transfer rate requirements of digital multimedia, however, the design of such servers
fundamentally differs from conventional storage servers.

4.1. Basics of optical storage technologies


Optical storage, electronic storage medium that uses low-power laser beams to record and
retrieve digital (binary) data. In optical-storage technology, a laser beam encodes digital data
onto an optical, or laser, disk in the form of tiny pits arranged in a spiral track on the disk’s
surface. A low-power laser scanner is used to “read” these pits, with variations in the intensity
of reflected light from the pits being converted into electric signals. This technology is used in
the compact disc, which records sound; in the CD-ROM (compact disc read-only memory),
which can store text and images as well as sound; in WORM (write-once read-many), a type of
disk that can be written on once and read any number of times; and in newer disks that are
totally rewritable. Optical storage provides greater memory capacity than magnetic storage
because laser beams can be controlled and focused much more precisely than can tiny
magnetic heads, thereby enabling the condensation of data into a much smaller space. An
entire set of encyclopedias, for example, can be stored on a standard 12-centimetre (4.72-inch)

optical disk. Besides higher capacity, optical-storage technology also delivers more authentic
duplication of sounds and images.
Optical disks are also inexpensive to make: the plastic disks are simply molds pressed from a
master, as phonograph records are. The data on them cannot be destroyed by power outages
or magnetic disturbances, the disks themselves are relatively impervious to physical damage,
and unlike magnetic disks and tapes, they need not be kept in tightly sealed containers to
protect them from contaminants. Optical-scanning equipment is similarly durable because it
has relatively few moving parts.
Early optical disks were not erasable—i.e., data encoded onto their surfaces could be read but
not erased or rewritten. This problem was solved in the 1990s with the development of WORM
and of writable/rewritable disks. The chief remaining drawback to optical equipment is a
slower rate of information retrieval compared with conventional magnetic-storage media.
Despite its slowness, its superior capacity and recording characteristics make optical storage
ideally suited to memory-intensive applications, especially those that incorporate still or
animated graphics, sound, and large quantities of text. Multimedia encyclopedias, video games,
training programs, and directories are commonly stored on optical media.

Data Storage:
A data storage device is a device for recording (storing) information (data). Recording can
be done using virtually any form of energy. A storage device may hold information, process
information, or both. A device that only holds information is a recording medium. Devices that
process information (data storage equipment) may both access a separate portable
(removable) recording medium or a permanent component to store and retrieve information.
Electronic data storage is storage which requires electrical power to store and retrieve that
data.
Most storage devices that do not require visual optics to read data fall into this category.
Electronic data may be stored in either an analogue or digital signal format. This type of data is
considered to be electronically encoded data, whether or not it is electronically stored. Most
electronic data storage media (including some forms of computer storage) are considered
permanent (non-volatile) storage, that is, the data will remain stored when power is removed
from the device. In contrast, electronically stored information is considered volatile memory.

As more memory and storage space are added to a computer, computing needs and habits tend
to keep pace, filling the new capacity.

To estimate the memory requirements of a multimedia project (the space required on a floppy
disk, hard disk or CD-ROM, not the random access memory used while the computer is
running) you need a sense of the project's content and scope. Color images, sound bites, video
clips and the programming code that glues it all together require memory; if there are many of
these elements, you will need even more. If you are making multimedia, you will also need to
allocate memory for storing and archiving working files used during production: original audio
and video clips, edited pieces, final mixed pieces, production paperwork and correspondence,
and at least one backup of your project files, with a second backup stored at another location.

4.1.1. Random Access Memory (RAM)


The RAM is the main memory where the operating system is initially loaded and the
application programs are loaded at a later stage. RAM is volatile in nature, and every program
that is quit or exited is removed from RAM. The greater the RAM capacity, the higher the
processing speed.
If there is a budget constraint, it may be necessary to produce a multimedia project on a slower
or limited-memory computer. On the other hand, it is profoundly frustrating to face memory
(RAM) shortages time after time when you are attempting to keep multiple applications and
files open simultaneously. It is also frustrating to wait the extra seconds required for each
editing step when working with multimedia material on a slow processor.

On the Macintosh, the minimum RAM configuration for serious multimedia production is about
32 MB; but even 64 MB and 256 MB systems are becoming common, because while digitizing
audio or video, you can store more data much more quickly in RAM. And when you are using some
software, you can quickly chew up available RAM—for example, Photoshop (16 MB minimum,
20 MB recommended); After Effects (32 MB required), Director (8 MB minimum, 20 MB
better); Page maker (24 MB recommended); Illustrator (16 MB recommended); Microsoft
Office (12 MB recommended).

In spite of all the marketing hype about processor speed, this speed is ineffective if not
accompanied by sufficient RAM. A fast processor without enough RAM may waste processor
cycles while it swaps needed portions of program code into and out of memory.
In some cases, increasing available RAM may show more performance improvement on your
system than upgrading the processor chip. On an MPC platform, multimedia authoring can also
consume a great deal of memory. It may be needed to open many large graphics and audio files,
as well as your authoring system, all at the same time to facilitate faster copying/pasting and
then testing in your authoring software. Although 8 MB is the minimum under the MPC
standard, much more is required as of now.

4.1.2. Read-Only Memory (ROM)

Read-only memory is not volatile: unlike RAM, when you turn off the power to a ROM chip,
it will not forget, or lose, its memory. ROM is typically used in computers to hold the small
BIOS program that initially boots up the computer, and it is used in printers to hold built-in
fonts. Programmable ROMs (called PROMs) allow changes to be made that are not forgotten.
A new and inexpensive technology, optical read-only memory (OROM), is provided in
proprietary data cards using patented holographic storage. Typically, OROMs offer 128 MB of
storage, have no moving parts, and use only about 200 milliwatts of power, making them ideal
for handheld, battery-operated devices.

4.1.3. Floppy and Hard Disks

Adequate storage space for the production environment can be provided by large capacity hard
disks; a server-mounted disk on a network; Zip, Jaz or SyQuest removable cartridges; optical
media; CD-R (compact disc-recordable) discs; tape; floppy disks; banks of special memory
devices or any combination of the above.
Removable media (floppy disks, compact or optical discs and cartridges) typically fit into a
letter sized mailer for overnight courier service. One or many disks may be required for
storage and archiving each project, and it is necessary to plan for backups kept off-site.
Floppy disks and hard disks are mass-storage devices for binary data that can be easily read by
a computer. Hard disks can contain much more information than floppy disks and can operate

at far greater data transfer rates. In the scale of things, floppies are, however, no longer “mass
storage” devices. A floppy disk is made of flexible Mylar plastic coated with a very thin layer of
special magnetic material.
A hard disk is actually a stack of hard metal platters coated with magnetically sensitive
material, with a series of recording heads or sensors that hover a hairbreadth above the fast
spinning surface, magnetizing or demagnetizing spots along formatted tracks using technology
similar to that used by floppy disks and audio and video tape recording. Hard disks are the
most common mass-storage device used on computers, and for making multimedia, it is
necessary to have one or more large-capacity hard disk drives.
As multimedia has reached consumer desktops, makers of hard disks have been challenged to
build smaller profile, larger-capacity, faster, and less-expensive hard disks.
In 1994, hard disk manufacturers sold nearly 70 million units; in 1995, more than 80 million
units. And prices have dropped a full order of magnitude in a matter of months. As network
and Internet servers increase the demand for centralized data storage requiring terabytes
(1 terabyte = 1 trillion bytes), hard disks will be configured into fail-proof redundant arrays
offering built-in protection against crashes.

4.1.4 Zip, Jaz, SyQuest and Optical Storage Devices

SyQuest’s 44 MB removable cartridges have been the most widely used portable medium
among multimedia developers and professionals, but Iomega’s inexpensive Zip drives with
their likewise inexpensive 100 MB cartridges have significantly penetrated SyQuest’s market
share for removable media. Iomega’s Jaz cartridges provide a gigabyte of removable storage
media and have fast enough transfer rates for audio and video development. Pinnacle Micro,
Yamaha, Sony, Philips and others offer CD-R “burners” for making write-once CDs, and some
double as quad-speed players. As blank CD-R discs become available for less than a dollar each,
this write-once medium competes as a distribution vehicle.

Magneto-optical (MO) drives use a high-power laser to heat tiny spots on the metal oxide
coating of the disk. While the spot is hot, a magnet aligns the oxides to provide a 0 or 1 (on or
off) orientation. Like SyQuests and other Winchester hard disks, this is rewritable technology,
because the spots can be repeatedly heated and aligned. Moreover, this media is normally not

affected by stray magnetism (it needs both heat and magnetism to make changes), so these
disks are particularly suitable for archiving data. The data transfer rate is, however, slow
compared to Zip, Jaz and SyQuest technologies.

One of the most popular formats uses a 128 MB-capacity disk, about the size of a 3.5-inch
floppy. Larger-format MO drives with 5.25-inch cartridges offering 650 MB to 1.3 GB of storage
are also available.

4.1.5 Digital Versatile Disc (DVD)


With this new medium (DVD), capable not only of gigabyte storage capacity but also
full-motion video (MPEG-2) and high-quality audio in surround sound, the bar has again risen
for multimedia developers. Commercial multimedia projects will become more expensive to
produce as consumers' performance expectations rise. There are two types of DVD: DVD-Video
and DVD-ROM; these reflect marketing channels, not the technology.
The DVD can provide 720 pixels per horizontal line, whereas current television (NTSC)
provides 240, so television pictures will be sharper and more detailed. With Dolby AC-3 Digital
Surround Sound as part of the specification, six discrete audio channels can be programmed
for digital surround sound, and with a separate subwoofer channel, developers can program
the low-frequency doom-and-gloom music popular with Hollywood. Users can randomly
access any section of the disc and use the slow-motion and freeze-frame features during
movies. Audio tracks can be programmed for as many as eight different languages, with
graphic subtitles in 32 languages.

4.1.6 CD-ROM
A Compact Disc or CD is an optical disc used to store digital data, originally developed for
storing digital audio. The CD, available on the market since late 1982, remains the standard
playback medium for commercial audio recordings to the present day, though it has lost
ground in recent years to MP3 players.

An audio CD consists of one or more stereo tracks stored using 16-bit PCM coding at a
sampling rate of 44.1 kHz. Standard CDs have a diameter of 120 mm and can hold
approximately 80 minutes of audio. There are also 80 mm discs, sometimes used for CD singles,
which hold approximately 20 minutes of audio. The technology was later adapted for use as a
data storage device, known as a CD-ROM, and to include record-once and re-writable media
(CD-R and CD-RW, respectively).
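From the Red Book parameters given above (16-bit PCM, stereo, 44.1 kHz), the raw uncompressed data rate and the space needed for 80 minutes of audio follow directly. A minimal sketch in Python:

```python
# Uncompressed CD audio data rate, from the Red Book parameters above:
# 16-bit PCM samples, 2 channels (stereo), 44.1 kHz sampling rate.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

bytes_per_second = SAMPLE_RATE_HZ * (BITS_PER_SAMPLE // 8) * CHANNELS
print(bytes_per_second)  # 176400 bytes/s of raw PCM

# Raw PCM needed for an 80-minute disc, in MB:
mb_for_80_min = bytes_per_second * 80 * 60 / 1_000_000
print(round(mb_for_80_min))  # 847
```

The ~847 MB figure is raw audio data only; the disc's physical capacity also carries error-correction overhead, which is why a data CD-ROM holds less.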

CD-ROMs and CD-Rs remain widely used technologies in the computer industry as of 2007. The
CD and its extensions have been extremely successful: in 2004, worldwide sales of CD audio,
CD-ROM and CD-R discs reached about 30 billion, and by 2007, 200 billion CDs had been sold
worldwide.

4.1.7 CD-ROM players


Compact disc read-only memory (CD-ROM) players have become an integral part of the
multimedia development workstation and are an important delivery vehicle for large,
mass-produced projects.
A wide variety of developer utilities, graphic backgrounds, stock photography and sounds,
applications, games, reference texts, and educational software are available only on this
medium.
CD-ROM players have typically been very slow to access and transmit data (150 KB per
second, which is the speed required of consumer Red Book Audio CDs), but new developments
have led to double-, triple-, quadruple-speed and even 24× drives designed specifically for
computer (not Red Book Audio) use. These faster drives spool up like washing machines on the
spin cycle and can be somewhat noisy, especially if the inserted CD is not evenly balanced.
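The speed multipliers above scale the 150 KB/s base rate linearly. A quick sketch; the 650 MB read-time figure assumes an idealized, constant sustained rate:

```python
# The 1x transfer rate is the 150 KB/s Red Book audio rate given above;
# an Nx drive moves data N times as fast (idealized, sustained rate).
BASE_KB_PER_S = 150  # 1x = 150 KB/s

for mult in (1, 2, 4, 24):
    print(f"{mult}x drive: {mult * BASE_KB_PER_S} KB/s")

# Idealized time to read a full 650 MB disc at a sustained 24x:
seconds = 650_000 / (24 * BASE_KB_PER_S)
print(round(seconds))  # 181 seconds, about 3 minutes
```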

4.1.8 CD recorders
With a CD recorder, you can make your own CDs, using special CD-recordable (CD-R) blank
optical discs to create a CD in most formats of CD-ROM and CD-Audio. The machines are made
by Sony, Philips, Ricoh, Kodak, JVC, Yamaha and Pinnacle. Software, such as Adaptec's Toast
for Macintosh or Easy CD Creator for Windows, lets you organize files on your hard disk(s)
into a "virtual" structure, then writes them to the CD in that order. CD-R discs are made
differently than normal CDs but can play in any CD-Audio or CD-ROM player. They are available
in either "63-minute" or "74-minute" capacity; the former holds about 560 MB and the latter
about 650 MB. These write-once CDs make excellent high-capacity file archives and are used
extensively by multimedia developers for premastering and testing CD-ROM projects and titles.

4.1.9 Videodisc players

Videodisc players (commercial, not consumer quality) can be used in conjunction with the
computer to deliver multimedia applications. You can control the videodisc player from your
authoring software with X-Commands (XCMDs) on the Macintosh and with MCI commands in
Windows.

Chapter 5

5. Multimedia Database Management System (MMDBMS)


At the heart of multimedia information systems lies the multimedia database management
system. Traditionally, a database consists of a controlled collection of data related to a given
entity, while a database management system, or DBMS, is a collection of interrelated data with
the set of programs used to define, create, store, access, manage, and query the database.
Similarly, we can view a multimedia database as a controlled collection of multimedia data
items, such as text, images, graphic objects, sketches, video, and audio.

A multimedia DBMS provides support for multimedia data types, plus facilities for the creation,
storage, access, query, and control of the multimedia database. The different data types
involved in multimedia databases might require special methods for optimal storage, access,
indexing, and retrieval. The multimedia DBMS should accommodate these special
requirements by providing high-level abstractions to manage the different data types, along
with a suitable interface for their presentation.

A multimedia database management system provides a suitable environment for using and
managing multimedia database information. Therefore, it must support the various multimedia
data types, in addition to providing facilities for traditional DBMS functions like database
definition and creation, data retrieval, data access and organization, data independence,
privacy, integration, integrity control, version control, and concurrency support. The functions
of a multimedia DBMS basically resemble those of a traditional DBMS.

We can describe the purposes of a multimedia DBMS as follows:

➢ Integration: Ensures that data items need not be duplicated during different program
invocations requiring the data.
➢ Data independence: Separation of the database and the management functions from
the application programs.
➢ Concurrency control: Ensures multimedia database consistency through rules, which
usually impose some form of execution order on concurrent transactions.
➢ Persistence: The ability of data objects to persist (survive) through different
transactions and program invocations.
➢ Privacy: Restricts unauthorized access and modification of stored data.

➢ Integrity control: Ensures consistency of the database state from one transaction to
another through constraints imposed on transactions.
➢ Recovery: Methods needed to ensure that results of transactions that fail do not affect
the persistent data storage.
➢ Query support: Ensures that the query mechanisms are suited for multimedia data.
➢ Version control: Organization and management of different versions of persistent
objects, which might be required by applications.

5.1. Multimedia Database


A multimedia database is a collection of interrelated multimedia data that includes text,
graphics (sketches, drawings), images, animations, video, audio, etc., often in vast amounts
and from multiple sources. The framework that manages the different types of multimedia data,
which can be stored, delivered and utilized in different ways, is known as a multimedia
database management system. There are three classes of multimedia data: static media,
dynamic media and dimensional media.

Content of Multimedia Database management system:


1. Media data – The actual data representing an object.
2. Media format data – Information about the format of the media data after it goes
through the acquisition, processing and encoding phases, such as sampling rate,
resolution and encoding scheme.
3. Media keyword data – Keyword descriptions relating to the generation of the data,
also known as content-descriptive data. Example: date, time and place of recording.
4. Media feature data – Content-dependent data, such as the distribution of colors, kinds
of texture and different shapes present in the data.
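The four content categories above can be sketched as a single record. A minimal sketch; the class and field names below are illustrative assumptions, not a standard schema:

```python
from dataclasses import dataclass, field

@dataclass
class MediaObject:
    """Illustrative record mirroring the four content categories above."""
    media_data: bytes                                 # 1. the actual object (e.g. image bytes)
    format_data: dict = field(default_factory=dict)   # 2. sampling rate, resolution, encoding
    keyword_data: dict = field(default_factory=dict)  # 3. date, time, place of recording
    feature_data: dict = field(default_factory=dict)  # 4. colors, texture, shapes

photo = MediaObject(
    media_data=b"...jpeg bytes...",
    format_data={"encoding": "JPEG", "resolution": (1920, 1080)},
    keyword_data={"date": "2022-05-01", "place": "Hossana"},
    feature_data={"dominant_color": "green"},
)
print(photo.keyword_data["place"])  # Hossana
```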

Types of multimedia applications based on data management characteristics are:


1. Repository applications – A large amount of multimedia data as well as metadata
(media format data, media keyword data, media feature data) is stored for retrieval
purposes, e.g., repositories of satellite images, engineering drawings, radiology
scanned pictures.
2. Presentation applications – They involve delivery of multimedia data subject to
temporal constraints. Optimal viewing or listening requires the DBMS to deliver data at
a certain rate, offering quality of service above a certain threshold. Here data is
processed as it is delivered. Example: annotation of video and audio data, real-time
editing analysis.
3. Collaborative work using multimedia information – It involves executing a complex
task by merging drawings and change notifications. Example: an intelligent healthcare
network.
There are still many challenges to multimedia databases, some of which are:
✓ Modelling – Work in this area spans both database and information-retrieval
techniques; documents constitute a specialized area and deserve special
consideration.
✓ Design – The conceptual, logical and physical design of multimedia databases has not
yet been addressed fully, as performance and tuning issues at each level are far more
complex: multimedia data comes in a variety of formats (JPEG, GIF, PNG, MPEG) that are
not easy to convert from one to another.
✓ Storage – Storage of a multimedia database on any standard disk presents problems
of representation, compression, mapping to device hierarchies, archiving and buffering
during input-output operations. In a DBMS, a "BLOB" (Binary Large Object) facility allows
untyped bitmaps to be stored and retrieved.
✓ Performance – For an application involving video playback or audio-video
synchronization, physical limitations dominate. The use of parallel processing may
alleviate some problems, but such techniques are not yet fully developed. Apart from
this, multimedia databases consume a lot of processing time as well as bandwidth.
✓ Queries and retrieval – For multimedia data like images, video and audio, accessing
data through queries opens up many issues, such as efficient query formulation, query
execution and optimization, which need to be worked upon.
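The BLOB facility mentioned under Storage can be sketched with SQLite, whose BLOB column type stores untyped byte strings exactly as described. The table and column names are illustrative assumptions:

```python
import sqlite3

# Minimal sketch of a "BLOB" facility: an untyped bitmap stored and
# retrieved through SQLite without the DBMS interpreting its content.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, name TEXT, data BLOB)")

bitmap = bytes(range(16)) * 4  # stand-in for real image bytes
conn.execute("INSERT INTO images (name, data) VALUES (?, ?)", ("scan1", bitmap))

row = conn.execute("SELECT data FROM images WHERE name = ?", ("scan1",)).fetchone()
assert row[0] == bitmap  # the bytes round-trip unchanged, uninterpreted
conn.close()
```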

5.2. Usage of multimedia databases in different level


(1) MM files and archives:
• Simple browsing and retrieval
• No queries
• Supporting software: e.g. web/ media server, browser, player

(2) Annotated & indexed archives:
• Search by keyword; see e.g.
– Image archive: Gimp-Savvy
– Audio archive: Spotify
– Spatial db: Google maps
– Video archive: YouTube
(3) MM archive as part of a wider application
• Browsing/search by keyword, plus related actions, e.g.
– Web shop: Amazon
(4) ’True’ MM databases:
• General queries by media content, e.g.
– Melody search: Musipedia

An MMDB should achieve the following:


(1) Must support the main types of MM data
(2) Can handle a very large number of MM objects
(3) Supports high-performance, high-capacity storage management:
Hierarchical storage
(4) Offers DB capabilities:
Persistence, transactions, concurrency control, recovery from failures, querying with
high-level declarative constructs, versioning, integrity constraints, security.
(5) Information-retrieval capabilities:
Exact-match retrieval, probabilistic (best-match) retrieval, content-based
retrieval, ranking of results.

5.3. Design and Architecture of a multimedia database


5.3.1. MMDB Architectural considerations
Traditional approach:
• Relational or extended relational DBMS, with support for large objects (BLOBs and
CLOBs)
• Information retrieval module (content-based access of objects)
Emerging approach:

• ’NoSQL’ databases (’Not only SQL’)
• Improved retrieval speed for very large quantities of data.
• Restricted update types (mainly append), restricted transaction support (relaxed
consistency requirements)
Ideal for MMDB
• Extensible database system with OO capabilities
• Support for queries and transactions involving MM objects
• Support for complex objects with MM subobjects

5.3.2. Distributed MMDBMS Architecture

Figure 1 Architecture of a Typical Distributed Multimedia Database Management System

5.3.3. MMDBMS Server Components


A typical multimedia database management system has the following components:

Storage Manager: handles the storage and retrieval of the different media objects composing a
database. The metadata as well as the index information related to media objects are also
handled by this module.
Metadata Manager: deals with creating and updating metadata associated with multimedia
objects. The metadata manager provides relevant information to the query processor in order
to process a query.
Index Manager: supports formulation and maintenance of faster access structures for
multimedia information.
Data Manager: helps in the creation and modification of multimedia objects. It also helps in
handling temporal and spatial characteristics of media objects. The metadata manager, index
manager, and data manager access the required information through the storage manager.
Query Processor: receives and processes user queries. This module utilizes information
provided by metadata manager, index manager, and data manager for processing a query. In
the case where objects are to be presented to the user as part of the query response, the query
processor identifies the sequence of the media objects to be presented. This sequence along
with the temporal information for object presentation is passed back to the user.

Communication Manager: handles the interface to the computer network for server-client
interaction. This module, depending on the services offered by the network service provider,
reserves the necessary bandwidth for server-client communication, and transfers queries and
responses between the server and the client.

MMDBMS Client Components


A typical client in the multimedia database management system has the following components:

❖ Communication Manager: manages the communication requirements of a multimedia


database client. Its functionalities are the same as those of the server communication
manager.
❖ Retrieval Schedule Generator: determines the schedule for retrieving media objects.
For the retrieval schedule generator, the response handler provides information
regarding the objects composing the response and the associated temporal information.
The retrieval schedule generator, based on the available buffer and network throughput,
determines the schedule for object retrieval. This retrieval schedule is used
by the communication manager to download media objects in the specified sequence.
❖ Response Handler: interacts with the client communication manager in order to
identify the type of response generated by the server. If the response comprises
information regarding the objects to be presented, then the information is passed to the
retrieval schedule generator for determining the retrieval sequence. If the response
comprises information regarding modifying the posed query, or a null answer, the
response is passed to the user.
❖ Interactive Query Formulator: helps a user to frame an appropriate query to be
communicated to a database server. This module takes the response from the server
(through the response handler module) in order to reformulate the query, if necessary.

5.3.4. Implementation Considerations


The implementation of different modules composing a distributed MMDBMS depends on the
hardware resources, operating systems, and the services offered by computer networks.

Hardware resources: The available hardware resources influence both client and server
design. On the server side, the available disk space limits the size of a multimedia database that
can be handled. The number of queries that can be handled and the query response time
depends, to a certain extent, on the speed of the system. On the client side, buffer availability
for media objects retrieval is influenced by available hardware resources. User interface for
multimedia presentation is also influenced by hardware characteristics such as monitor
resolution, width and height.

Operating System: Multimedia databases are composed of continuous media objects. The
retrieval and presentation of these media objects have associated time constraints. Hence,
real-time features might be needed as part of the operating system. Also, the file system
needs to handle large media objects.
Computer Network: Services offered by computer networks influence the retrieval of media
objects by a client system. If guaranteed throughput is not offered by the network service
provider, the client may not be able to retrieve objects at the required time. This may lead
to an object presentation sequence that differs from the temporal requirements specified in
the database. Also, the buffer requirements in the client system depend on the throughput
offered by the network service provider.

Sample application areas of MMDBs


(a) Educational multimedia services:
• Distance learning
• Teaching material
• Educational audio/video document archives
• Preview possibility
(b) Video-on-demand:
• Selection of movie, possibly using queries
• Preview possibility; wind/rewind
• Requires high bandwidth
• Method of payment must be simple
(c) Audio-on-demand:
• Less bandwidth-consuming than video
• Recorded programs, music, and live net radio stations
(d) Electronic commerce:
• Online info about products: pictures, explanations, availability, etc.
• Possibility to make queries
• Online ordering systems with credit card / net bank payment.
• Examples: bookstore, travel agency
(e) Intelligent systems (‘expert systems’):
• Machine repair: Automatic assistants of different repair jobs.
Manuals may be hard to read; demonstrative videos tell it better
• Medical care: Standard surgery operations
• Crime investigations: combination of surveillance & other info
(f) Digital libraries
• Organized collections of digital information
• Both documents and their metadata in digital form
• Versatile metadata- and content-based retrieval opportunities
• Usually accessible through the web

• The web itself & search engines may be considered some kind of (poorly organized)
digital library
(g) Medical information systems
• Patient data, including X-rays, ECG curves, MRI images, ...
• Strict confidentiality
• Used for diagnosis, monitoring and research
• Automated tools: image/signal processing, pattern recognition, ...

Application: Image search engines – Google!

http://www.google.com/mobile/goggles/#landmark

Application: Fingerprint Matching and retrieval

Application: real-time skin detection for human recognition

Application: real-time object recognition and tracking

5.4. Indexing Mechanism in Organizing Multimedia Database

Indexing and Searching Methods
Traditional database indexing and searching methods:
– Object identifiers
– Attributes
– Keywords …
– Can be used in an MDBMS, but this is not enough!
Content-based retrieval is computationally intensive. Hence, there is a need for the design
of powerful indexing and searching schemes.

Contents of MMDB
• Media data – original content
• Metadata:
– Media format data, e.g. size, compression details
– Media keyword data, e.g. name, author, date
– Media feature data, e.g. scene, background, other contextual details
• Indexing can be done on all the above (format, keyword, feature) metadata
Example of Metadata (Feature)
• Image data:
– Content features
• Color, shape, texture, spatial information
• Video data:
– A video sequence can be viewed as a set of continuous video frames - some are
representative (key frames)
• Audio data:
– Acoustic characteristics of audio data
So while indexing, we need to consider different levels:
– High-level
– Mid-level
– Low-level

Traditional DBMSs are designed and implemented to find exact matches between the stored
data and the query parameters of precise queries. An MDBMS needs to handle incomplete
and fuzzy queries as well.
A multimedia object is amenable to different interpretations by different users, and the
representation of a multimedia object in an MDBMS is always less than complete. Therefore
there is a need for interactive retrieval.
Feature extraction and indexing of images
Extraction of descriptive attributes from images
• Manually, automatically, or using a hybrid scheme
(automatic segmentation & manual assignment of properties).

Manual indexing:
Performed by a 'knowledge worker' trained on the patterns and vocabulary of the image
database application. Multiple indexers require strict consistency rules and a common
glossary. Each interesting object is presented manually to the system for indexing, equipped
with descriptive attributes. Time-consuming and costly; see http://gimp-savvy.com/PHOTO-ARCHIVE

Automatic indexing
Specialized for various application domains (document recognition, optical character
recognition (OCR), engineering drawings, x-rays, ...). The system must first ‘learn’ and
categorize domain element objects. A certain amount of uncertainty (fuzziness) must be
tolerated.

Important area of automatic image analysis and object recognition: Transformation of paper
documents into digital form, and indexing those documents appropriately (so called
document imaging → digital libraries).

Image database structures


The storage of the pixel matrix (possibly compressed) is usually sequential, because it spans
several disk blocks. Each image can be considered as a file of its own.

(a) Relational representation


• Image relation: Image id and image-level (global) properties.

• Object relation: Objects (segments, rectangles) within images; extracted manually or
automatically.
– Attributes include: image id, object id, MBR coordinates, features.
• Queries: Apply 'normal' database techniques using feature values in query conditions.
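The two-relation scheme above can be sketched in SQL (via Python's sqlite3): an image relation with global properties, and an object relation keyed by image id and object id that holds the MBR coordinates. Relation names, columns, and the sample data are illustrative assumptions:

```python
import sqlite3

# Image relation: global, image-level properties.
# Object relation: objects within images, with MBR corner coordinates.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE image (img_id INTEGER PRIMARY KEY, title TEXT, width INT, height INT);
CREATE TABLE object (
    img_id INT, obj_id INT,
    x1 REAL, y1 REAL, x2 REAL, y2 REAL,  -- MBR corners
    label TEXT,
    PRIMARY KEY (img_id, obj_id)
);
""")
db.execute("INSERT INTO image VALUES (1, 'street scene', 640, 480)")
db.execute("INSERT INTO object VALUES (1, 1, 10, 10, 120, 200, 'car')")
db.execute("INSERT INTO object VALUES (1, 2, 300, 50, 400, 180, 'tree')")

# 'Normal' database technique: a feature value in the query condition.
rows = db.execute(
    "SELECT i.title FROM image i JOIN object o ON i.img_id = o.img_id "
    "WHERE o.label = ?", ("car",)).fetchall()
print(rows)  # [('street scene',)]
```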

(b) Spatial representation


• E.g. using R-trees, R*-trees, etc.
• Build a single R-tree for all images in the database.
• A leaf page contains a set of closely-located objects (their MBRs), with a list of pointers
to source images.
• Separate indexes can be built for other than spatial properties.

Figure: A spatial object represented by its Minimum Bounding Rectangle (MBR)

General observations:
In non-spatial respects, images can usually be treated as documents, and retrieved using
techniques developed for general information retrieval. Combined usage of spatial and
non-spatial criteria in retrieval is achieved simply by combining (union, intersection, etc.)
the pointer lists from the related indexes.
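Combining pointer lists as just described can be sketched with set operations. The toy dictionaries below stand in for real index structures (an R-tree for the spatial index, an inverted file for keywords), and the brute-force MBR overlap test replaces a real tree traversal:

```python
def mbr_intersects(a, b):
    """True if two MBRs given as (x1, y1, x2, y2) overlap."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2

# Spatial index: MBR -> image ids (brute-force stand-in for an R-tree)
spatial_index = {(0, 0, 50, 50): {1, 2}, (100, 100, 200, 200): {3}}
# Keyword index: term -> image ids
keyword_index = {"car": {2, 3}, "tree": {1}}

query_mbr = (40, 40, 90, 90)
spatial_hits = set().union(
    *(ids for mbr, ids in spatial_index.items() if mbr_intersects(mbr, query_mbr)))

# Intersection of the two pointer lists = spatial AND keyword criteria.
result = spatial_hits & keyword_index["car"]
print(sorted(result))  # [2]
```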

5.4.1. Data Models: Data Model Requirements
The role of a data model in DBMSs is to provide a framework (or language) to express the
properties of the data items that are to be stored and retrieved using the system. The
framework should allow designers and users to define, insert, delete, modify, and search
database items and properties. In MIRSs and MMDBMSs, the data model assumes the

89
additional role of specifying and computing different levels of abstraction from multimedia
data.
Multimedia data models capture the static and dynamic properties of the database items, and
thus provide a formal basis for developing the appropriate tools needed in using the
multimedia data. The static properties could include the objects that make up the multimedia
data, the relationships between the objects, and the object attributes. Examples of dynamic
properties include those related to interaction between objects, operations on objects, user
interaction, and so forth.
The richness of the data model plays a key role in the usability of the MIRS. Although the basic
multimedia data types must be supported, they only provide the foundation on which to build
additional features.

Multidimensional feature spaces are a characteristic of multimedia indexing.


A data model should support the representation of these multidimensional spaces, especially
the measures of distance in such a space.
A General Multimedia Data Model
There is a general consensus that the MIRS data model should be based on OO principles and a
multilayer hierarchy. OO design encapsulates code and data into a single unit called an
object. The code defines the operations that can be performed on the data. The encapsulation
promotes modularity and hides the details of particular media and processing. More
importantly, the OO approach provides extensibility by offering mechanisms for enhancing and
extending existing objects.
Object Layer
An object consists of one or more media items with specified spatial(space) and
temporal(time) relationships. It is normally about only one central topic. An example of a
multimedia object is a slide show (say introducing a tourist attraction) consisting of a number
of images and accompanying audio.

The key issues are how to specify spatial and temporal relationships. Spatial relationships
are specified by the displaying window size and position for each item. The common method for
temporal specification is timeline-based specification, in which the starting time and
duration of each item are specified based on a common clock. Other methods include scripts
and event-driven models.

During presentation, the semantics of an object is maintained only when spatial and temporal
relationships are maintained.

Figure 2 A General multimedia data model
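The timeline-based specification described above can be sketched as follows, using the slide-show example: each item carries a display window (spatial) and a start time and duration on the common clock (temporal). The item class and the sample contents are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class MediaItem:
    name: str
    window: tuple    # (x, y, width, height) of the display window
    start: float     # start time in seconds on the common clock
    duration: float  # presentation duration in seconds

slide_show = [
    MediaItem("image1.jpg", (0, 0, 640, 480), start=0.0, duration=5.0),
    MediaItem("image2.jpg", (0, 0, 640, 480), start=5.0, duration=5.0),
    MediaItem("narration.wav", (0, 0, 0, 0), start=0.0, duration=10.0),
]

def active_at(t):
    """Items to present at clock time t; the object's semantics hold
    only if these spatial and temporal relationships are maintained."""
    return [m.name for m in slide_show if m.start <= t < m.start + m.duration]

print(active_at(6.0))  # ['image2.jpg', 'narration.wav']
```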

Media Type Layer

The media type layer contains common media types such as text, graphics, image, audio, and
video. These media types are derived from a common abstract media class. At this level,
features or attributes are specified. Taking media type image as an example, the image size,
color histogram, main objects contained, and so on are specified. These features are used
directly for searching and distance calculations.

Media Format Layer

The media format layer specifies the media formats in which the data is stored. A media type
normally has many possible formats. For example, an image can be in raw bitmapped or
compressed formats. There are also many different compression techniques and standards.
The information contained in this layer is used for proper decoding, analysis and presentation.

Remaining Issues

Note that different applications may need different data models, because different features
and objects are used in different applications. But many applications can share a common base
model, if it is properly designed, and new features and objects can be added to or derived
from this base model to meet application requirements.
At the moment, none of the above layers of the data model is completely designed. There is
no common standard in this area, partly because no large-scale MIRS has been developed. Most
MIRSs are application-specific, focusing on a limited number of features of a limited number
of media types. The data models, if any, are designed in an ad hoc fashion. More work is
needed in multimedia data modeling to develop consistent large-scale MIRSs and MMDBMSs.

VIMSYS Data Model

The VIMSYS model is proposed specifically to manage visual information (images and video).
It consists of four layers: the representation layer, the image object layer, the domain
object layer and the domain event layer. All objects in each layer have a set of attributes
and methods.

Figure 3 The VIMSYS data model

Image Representation Layer

The representation layer contains image data and any transformation that results in an
alternate image representation. Examples of transformations are compression, color space
conversion, and image enhancement. This layer is not very powerful at processing user
queries; it can only handle queries that request pixel-based information. However, this layer
provides the raw data for upper layers to extract and define higher level features.

Image Object Layer

The image object layer has two sub-layers: the segmentation sub-layer and the feature
sub-layer.

The segmentation sub-layer condenses information about an entire image or video into spatial
or temporal clusters of summarized properties. These localized properties can be searched for
directly. This sub-layer relies heavily on image and video segmentation techniques.
The feature sub-layer contains features that are efficient to compute, organizable in a data
structure, and amenable to a numeric distance computation to produce a ranking score. The
common features are color distribution (histogram), object shape, and texture.
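The feature sub-layer can be illustrated with the color histogram, one of the common features named above: cheap to compute, a fixed-length vector that fits a data structure, and amenable to a numeric distance that yields a ranking score. A minimal sketch using a gray-level histogram and an L1 distance (the binning and distance choice are assumptions):

```python
def histogram(pixels, bins=4):
    """Normalized gray-level histogram of pixel values in 0-255."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]

def l1_distance(h1, h2):
    """Smaller distance = more similar images; usable as a ranking score."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

dark = histogram([10, 20, 30, 40])
light = histogram([200, 210, 220, 230])
print(l1_distance(dark, dark))   # 0.0  (identical distributions)
print(l1_distance(dark, light))  # 2.0  (maximally different)
```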

Domain Object Layer

A domain object is a user-defined entity representing a physical entity or a concept that is
derived from one or more features in the lower layers. Examples of domain objects are
"sunset" (a concept) and "heart" (a physical object). Domain knowledge is needed to derive
domain objects.

Domain Event Layer

The domain event layer defines events that can be queried by the user. Events are defined
based on motion (velocity of object movement), spatial and/or temporal relationships between
objects, appearance and disappearance of an object, and so on. Event detection and
organization mechanisms are needed to implement this layer.

Chapter 6

Multimedia Data Retrieval


Multimedia data is becoming ubiquitous. Ranging from photos to videos, multimedia data is
becoming part of all applications. Most emerging applications now have some kind of
multimedia data that must be considered an integral part of their databases. Moreover,
emerging applications in all application areas, ranging from homeland security to healthcare,
contain rich multimedia data. Based on current trends, it is safe to assume that in the very
near future, much of the data managed in databases will be multimedia. Some particular
application domains where multimedia is natural and will continue to dominate are
entertainment, news, healthcare, and homeland security.
The central concern of multimedia information retrieval (MIR) is easily stated: given a
collection of multimedia documents, find those that are relevant to a user's information
need. A multimedia document is a complex information object, with components of different
kinds, such as text, images, video and sound, all in digital form. An ever-growing number of
people access collections of such documents every day in order to satisfy the most disparate
information needs. Indeed, the number and types of multimedia information providers are
steadily increasing: they currently include publishing houses; cultural, scientific and
academic institutions; cinema and TV studios; as well as commercial enterprises offering
their catalogs on-line.
The potential users of the provided information range from highly skilled specialists to
laymen. All these users share the same expectation: to be able to retrieve the documents that
are relevant to their information needs.

6.1 Multimedia content representation

A Multimedia system has four basic characteristics:


• Multimedia systems must be computer controlled.

• Multimedia systems are integrated.
• The information they handle must be represented digitally.
• The interface to the final presentation of media is usually interactive.
Computer Controlled
• Producing the content of the information – e.g. by using authoring tools, image editors,
sound and video editors.
• Storing the information – providing large and shared capacity for multimedia information.
• Transmitting the information – through the network.
• Presenting the information to the end user – making direct use of computer peripherals
such as a display device (monitor) or sound generator (speaker).

Integrated
All multimedia components (audio, video, text, graphics) used in the system must be somehow
integrated. Every device, such as a microphone or camera, is connected to and controlled by a
single computer. A single type of digital storage is used for all media types. Video
sequences are shown on the computer screen instead of a TV monitor.

Interactivity
• Level 1: Interactivity strictly on information delivery. Users select the time at which
the presentation starts, the order, the speed and the form of the presentation itself.
• Level 2: Users can modify or enrich the content of the information, and this modification
is recorded.
• Level 3: Actual processing of the user's input, with the computer generating a genuine
result based on that input.
Digitally Represented
Digitization: the process of transforming an analog signal into a digital signal.
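Digitization combines sampling (measuring the analog signal at fixed time intervals) and quantization (mapping each sample to one of a fixed number of levels). A minimal sketch over a sine wave; the toy sampling rate is an assumption chosen for readability:

```python
import math

SAMPLE_RATE = 8     # samples per second (toy rate for readability)
BITS = 8
LEVELS = 2 ** BITS  # 256 quantization levels for an 8-bit signal

def digitize(freq_hz=1.0, seconds=1.0):
    """Sample a sine wave at SAMPLE_RATE, quantize to 8-bit integers."""
    n = int(SAMPLE_RATE * seconds)
    samples = []
    for i in range(n):
        t = i / SAMPLE_RATE
        analog = math.sin(2 * math.pi * freq_hz * t)        # in [-1, 1]
        quantized = round((analog + 1) / 2 * (LEVELS - 1))  # in [0, 255]
        samples.append(quantized)
    return samples

print(digitize())  # [128, 218, 255, 218, 128, 37, 0, 37]
```

Raising the sampling rate captures the waveform more faithfully in time; raising the bit depth reduces quantization error per sample.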

Multimedia Data Access


Access to multimedia information must be quick so that retrieval time is minimal. Data access
is based on metadata generated for different media composing a database. Metadata must be
stored using appropriate index structures to provide efficient access. Index structures to be
used depend on the media, the metadata, and the type of queries that are to be supported as
part of a database application.

Access To Text Data

Text metadata consists of index features that occur in a document as well as descriptions about
the document. For providing fast text access, appropriate access structures have to be used for
storing the metadata. Also, the choice of index features for text access should be such that it
helps in selecting the appropriate document for a user query. In this section, we discuss the
factors influencing the choice of the index features for text data and the methodologies for
storing them.
Selection of Index Features
The index features should be chosen so that they describe the documents as uniquely as possible. The definitions of document frequency and inverse document frequency characterize index features.
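To make these definitions concrete, here is a minimal Python sketch (with an illustrative toy corpus) of how document frequency and inverse document frequency can be computed for a candidate index feature:

```python
import math

def inverse_document_frequency(feature, docs):
    """df = number of documents containing the feature;
    idf = log(N / df). Features with a high idf occur in few
    documents and therefore describe them more uniquely."""
    df = sum(1 for d in docs if feature in d.lower().split())
    return math.log(len(docs) / df) if df else 0.0

docs = ["multimedia database systems",
        "relational database design",
        "multimedia content retrieval"]
print(inverse_document_frequency("database", docs))   # df = 2, lower idf
print(inverse_document_frequency("retrieval", docs))  # df = 1, higher idf
```

A feature like "retrieval", which occurs in only one document, is a better discriminator than "database", which occurs in two.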
Methodologies for Text Access: Once the indexing features for a set of text documents are determined, appropriate techniques must be designed for storing and searching them. The efficiency of these techniques directly influences the response time of a search. Here, we discuss the following techniques:

• Full Text Scanning: The easiest approach is to search the entire set of documents for the
queried index feature(s). This method, called full text scanning, has the advantage that the
index features do not have to be identified and stored separately. The obvious disadvantage is
the need to scan the whole document(s) for every query.

• Inverted Files: Another approach is to store the index features separately and check the stored features for every query. A popular technique, termed inverted files, is used for this purpose.

• Document Clustering: Documents can be grouped into clusters, with the documents in each
cluster having common indexing features.

Full Text Scanning


In full text scanning, as the name implies, the query feature is searched for in the entire set of documents. For boolean queries (where occurrences of multiple features are to be tested), this might involve multiple searches for different features. A simple algorithm for feature searching in full text is to compare the characters of the search feature with those occurring in the document. In the case of a mismatch, the search position in the document is shifted right by one, and the search continues until either the feature is found or the end of the document is reached. Though the algorithm is very simple, it suffers from the number of comparisons needed to locate the feature.
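The shift-by-one comparison described above can be sketched as follows (a naive Python illustration; real systems use faster algorithms such as Knuth-Morris-Pratt or Boyer-Moore):

```python
def naive_scan(document, feature):
    """Compare the search feature against the document character by
    character; on a mismatch, shift the search position right by one.
    Returns the index of the first match, or -1 if not found."""
    n, m = len(document), len(feature)
    for start in range(n - m + 1):
        if document[start:start + m] == feature:
            return start
    return -1

print(naive_scan("multimedia database systems", "database"))  # → 11
print(naive_scan("multimedia database systems", "video"))     # → -1
```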

Inverted Files
Inverted files store search information about a document or a set of documents. The search information includes the index feature and a set of postings, which point to the documents where the index feature occurs. Access to an inverted file is based on a single key, so efficient access to the index features should be supported: the index features can be sorted alphabetically, stored in a hash table, or stored using a more sophisticated mechanism such as a B-tree.
Text Retrieval Using Inverted Files

Index features in user queries are searched by comparing them with the ones stored in the inverted file, using B-tree search or hashing depending on the technique used by the inverted file. The advantage of inverted files is that they provide fast access to the features and reduce the response time for user queries. The disadvantage is that inverted files can become very large when the number of documents and index features grows. Also, the cost of maintaining inverted files (updating and reorganizing the index files) can be very high.
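A minimal Python sketch of building and querying an inverted file (using a plain dictionary; a production system would use hashing or a B-tree, as noted above):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map every index feature (here simply each word) to its
    postings: the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for feature in text.lower().split():
            index[feature].add(doc_id)
    return {feature: sorted(ids) for feature, ids in index.items()}

docs = ["multimedia database systems",
        "relational database design",
        "multimedia retrieval"]
index = build_inverted_index(docs)
print(index["database"])    # → [0, 1]
print(index["multimedia"])  # → [0, 2]
```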

Multi-attribute Retrieval
When a query for searching text documents consists of more than one feature, different techniques must be used to search the information. Consider a query for a book titled 'Multimedia database management systems'. Here, four key words (or attribute values) are specified: 'multimedia', 'database', 'management', and 'systems'. Each attribute is hashed to give a bit pattern of fixed length, and the bit patterns of all the attributes are superimposed (boolean OR operation) to derive the signature value of the query.
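A sketch of this superimposed coding in Python (the 16-bit width and the CRC-based hash are illustrative choices, not a prescribed scheme):

```python
import zlib

def signature(attributes, bits=16):
    """Hash each attribute to a single set bit in a fixed-length
    pattern and superimpose the patterns with boolean OR."""
    sig = 0
    for attr in attributes:
        sig |= 1 << (zlib.crc32(attr.encode()) % bits)
    return sig

query_sig = signature(["multimedia", "database", "management", "systems"])
record_sig = signature(["multimedia", "database", "management",
                        "systems", "principles"])
# A record may match the query if its signature contains every bit
# of the query signature (false drops are possible, so candidates
# must still be verified against the actual text).
print(query_sig & record_sig == query_sig)  # → True
```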
Searching from an image database
1. Using a hierarchical classification of images: the user follows paths in the hierarchy, e.g.
Art works > Paintings > France > 18th century
2. Search using keywords in metadata: images can be considered similar to documents with index terms.
3. Search by content features (given a sample): pattern matching based on similarity with a query image (shape, color distribution, etc.).

How do we define image content?


Color feature extraction
Usually based on color histograms, i.e. the number of pixels of each color (or color component). Separate histograms can be built for various subregions of the image (e.g. top-left, top-right, middle, ...).
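A minimal Python sketch of a color histogram, quantizing each RGB channel into four bins (the bin count and the pixel representation are illustrative assumptions):

```python
def color_histogram(pixels, bins_per_channel=4):
    """Count pixels per quantized RGB color; with 4 bins per
    channel the histogram has 4 * 4 * 4 = 64 entries."""
    step = 256 // bins_per_channel
    hist = [0] * bins_per_channel ** 3
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + g // step) \
              * bins_per_channel + b // step
        hist[idx] += 1
    return hist

# Two reddish pixels fall into the same bin; the blue one does not.
pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255)]
hist = color_histogram(pixels)
print(max(hist))  # → 2 (the bin holding both reddish pixels)
```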

Image segmentation
Detection of interesting regions within images. A segment is a connected region that satisfies a homogeneity predicate. Segmentation is the basis for subsequent search, and one of the most difficult tasks in image processing.

Connected region:
◼ For each pair of pixels (x1, y1), (xn, yn) in the region, there exists a chain of pixels {(x1, y1), ..., (xn, yn)} within the region such that (xi, yi) and (xi+1, yi+1) are adjacent for all i.
Homogeneity predicate:
◼ e.g. p % of the pixels of the connected region have the same color.
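Growing a connected region can be sketched as a flood fill. This Python illustration uses 4-adjacency, represents the image as a dictionary from coordinates to colors, and takes "same color as the start pixel" as a simple homogeneity predicate (all of these are illustrative assumptions):

```python
from collections import deque

def connected_region(pixels, start):
    """Collect the connected region containing `start` by flood
    fill with 4-adjacency; `pixels` maps (x, y) -> color."""
    target = pixels[start]
    region, queue = {start}, deque([start])
    while queue:
        x, y = queue.popleft()
        for nbr in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nbr in pixels and nbr not in region and pixels[nbr] == target:
                region.add(nbr)
                queue.append(nbr)
    return region

img = {(0, 0): "red", (1, 0): "red", (2, 0): "blue", (0, 1): "red"}
print(sorted(connected_region(img, (0, 0))))
# → [(0, 0), (0, 1), (1, 0)]
```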

Content-based retrieval from image databases


General property:
◼ Retrieval is not 100 % precise.
Query types:
◼ Find images having certain features, e.g. color, texture, shape, etc.
◼ Find images containing certain types of objects.
◼ Find image objects having certain attributes, such as shape (circle, triangle, arc, ...), size,
color, etc.
◼ Find images where an object of type 1 is located to the left of an object of type 2
(a spatial relationship)

Similarity search: Find images (segments) similar to a given query image (segment).
Applications: recognition of persons from photographs / fingerprints, recognition of military
airplanes, ships, etc.

Process flow example:

6.2. Similarity measures during searching

Approaches to similarity-based retrieval


(a) Direct metric approach
◼ A distance function is defined for images (segments).
◼ Task: find the nearest neighbor (or the k nearest neighbors) of the query image (segment).
◼ Naive distance functions for m × n color images:
- L1-metric: sum over pixels of the Euclidean distances between corresponding RGB pixels.
- L2-metric: Euclidean distance in (m × n × 3)-dimensional space.
◼ Plenty of computation needed.
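The two naive metrics can be sketched in Python, representing an image as a flat list of RGB triples (an illustrative layout):

```python
import math

def l1_distance(img_a, img_b):
    """Sum over pixels of the Euclidean distance between
    corresponding RGB triples."""
    return sum(math.dist(p, q) for p, q in zip(img_a, img_b))

def l2_distance(img_a, img_b):
    """Euclidean distance in the flattened (m * n * 3)-dimensional space."""
    return math.sqrt(sum((a - b) ** 2
                         for p, q in zip(img_a, img_b)
                         for a, b in zip(p, q)))

a = [(255, 0, 0), (0, 255, 0)]  # a 1 x 2 "image"
b = [(250, 0, 0), (0, 255, 0)]
print(l1_distance(a, b))  # → 5.0
print(l2_distance(a, b))  # → 5.0
```

Both metrics touch every pixel of both images, which is why "plenty of computation" is needed and why feature extraction (next) reduces dimensionality first.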

(b) Feature-based metrics

◼ Use feature extraction to reduce dimensionality, e.g.
- shapes of segments in the image
- color, texture, ... features of segments or of whole images
- different types of features, often combined.
◼ The 'true' distance function d of images/segments and the distance function d' of the extracted feature vectors should approximately satisfy
d(a, b) < d(a, c) ⟺ d'(a, b) < d'(a, c)
◼ Use indexing to accelerate similarity retrieval.
Query By Example (QBE)
Graphically expressed queries based on:
 example images
 user-constructed sketches and drawings (shape parameters)
 color (principal color or color histogram)
 texture patterns (coarseness, contrast, directionality)
 camera and object motion (in videos)

The image DB user should be able to show the system a sample image, paint one interactively on the screen, or just sketch the outline of an object. The system should then be able to return similar images or images containing similar objects.

A sample query:

SELECT CONTENTS(image),
       QbScoreFromStr('averageColor=<255,0,0>', image) AS score
FROM signs
ORDER BY score

Search Using Sketch


Sketch entry: the user gives a rough draft of an image.

Results of search: the system retrieves all images related to the draft image (sketch).

Relevance Feedback in CBIR
Motivation:
Human perception of image similarity is subjective, semantic, and task-dependent. CBIR based on the similarity of pure visual features is not necessarily perceptually or semantically meaningful. Each type of visual feature tends to capture only one aspect of an image's properties, and it is usually hard for a user to specify clearly how different aspects should be combined.

Relevance Feedback (RF) is introduced to address these problems: it makes it possible to establish the link between high-level concepts and low-level features. RF is a supervised active learning technique used to improve the effectiveness of information systems.
Main idea: use positive and negative examples from the user to improve system performance.
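One classic way to use positive and negative examples is Rocchio-style feedback, sketched here on plain feature vectors (the weights alpha, beta, gamma are conventional but illustrative, and the tiny 2-dimensional vectors are made up):

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward the centroid of the positive
    examples and away from the centroid of the negative examples."""
    def centroid(vectors):
        if not vectors:
            return [0.0] * len(query)
        return [sum(col) / len(vectors) for col in zip(*vectors)]
    pos, neg = centroid(relevant), centroid(nonrelevant)
    return [alpha * q + beta * p - gamma * n
            for q, p, n in zip(query, pos, neg)]

query = [1.0, 0.0]                              # initial feature vector
updated = rocchio(query, [[0.0, 1.0]], [[1.0, 0.0]])
print(updated)  # → [0.85, 0.75]
```

After feedback, the query vector has moved toward the feature the user marked relevant and away from the one marked non-relevant.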

6.3. Retrieval performance measures:

Precision and Recall


Given a query:
 Are all retrieved documents relevant?
 Have all the relevant documents been retrieved?
The first question is about the precision of the search, the second is about the completeness
(recall) of the search.
Precision p: the proportion of retrieved material that is relevant. Given a retrieval list of n items,

p = g(n) / n

where g(n) is the number of items in the list relevant to the query.

Recall r: the proportion of relevant documents retrieved from the DB.

r = (number of retrieved documents that are relevant) / (total number of relevant documents in the database)
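The two measures can be computed directly from a result list and the set of relevant documents (a small Python illustration with made-up document ids):

```python
def precision_recall(retrieved, relevant):
    """p = g(n) / n, with g(n) the number of retrieved items that
    are relevant; r = relevant retrieved / total relevant in DB."""
    hits = len(set(retrieved) & set(relevant))
    return hits / len(retrieved), hits / len(relevant)

retrieved = ["d1", "d2", "d3", "d4"]  # n = 4 items returned
relevant = {"d1", "d3", "d7"}         # 3 relevant docs exist in the DB
p, r = precision_recall(retrieved, relevant)
print(p)  # → 0.5 (2 of the 4 retrieved items are relevant)
print(r)  # → 2/3 (2 of the 3 relevant docs were retrieved)
```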

Querying mechanism Example
A Sample Multimedia Scenario
Consider a police investigation of a large-scale drug operation. This investigation may generate
the following types of data:
– Video data captured by surveillance cameras that record the activities taking
place at various locations.
– Audio data captured by legally authorized telephone wiretaps.
– Image data consisting of still photographs taken by investigators.
– Document data seized by the police when raiding one or more places.
– Structured relational data containing background information, back records, etc.,
of the suspects involved.
– Geographic information system data containing geographic data relevant to the
drug investigation being conducted.

6.4. Query languages


The conditions specified as part of a user query that are to be evaluated for selecting an object are termed query predicates. These predicates can be combined with boolean operators such as AND, OR, and NOT. Query languages are used to describe query predicates. For multimedia database applications, query languages require features for describing the following predicates.

Possible Queries
Image Query (by example):
◼ Police officer Rocky has a photograph in front of him.
◼ He wants to find the identity of the person in the picture.
◼ Query: “Retrieve all images from the image library in which the person appearing in the
(currently displayed) photograph appears”

Image Query (by keywords):


◼ Police officer Rocky wants to examine pictures of “Big Spender”.
◼ Query: "Retrieve all images from the image library in which “Big Spender” appears."

Video Query:
◼ Police officer Rocky is examining a surveillance video of a particular person being fatally
assaulted by an assailant. However, the assailant's face is occluded and image
processing algorithms return very poor matches. Rocky thinks the assault was by
someone known to the victim.
◼ Query: “Find all video segments in which the victim of the assault appears.”
◼ By examining the answer of the above query, Rocky hopes to find other people who
have previously interacted with the victim.

Heterogeneous Multimedia Query:


◼ Find all individuals who have been photographed with “Big Spender” and who have
been convicted of attempted murder in South China and who have recently had
electronic fund transfers made into their bank accounts from ABC Corp.

MM Database Architectures
1. Based on the Principle of Autonomy
Each media type is organized in a media-specific manner suitable for it, so joins must be computed across different data structures. Query processing is relatively fast thanks to the specialized structures, and this is the only choice for legacy data banks.

2. Based on the Principle of Uniformity
A single abstract structure indexes all media types. The common part of the different media types is abstracted out (difficult!) as metadata, but a single structure is easy to implement.

3. Based on the Principle of Hybrid Organization

A hybrid of the first two: certain media types use their own indexes, while others use the "unified" index. This is an attempt to capture the advantages of the first two approaches, with joins across multiple data sources using their native indexes.

QUERY PROCESSING
Query on the content of the media information (Example Query: Show the details of the movie where a cartoon character says: 'Somebody poisoned the water hole'). The content of media information is described by the metadata associated with media objects. Hence, these queries have to be processed by directly accessing the metadata and then the media objects.
Query by example (QBE): (Example Query: Show me the movie which contains this song.)

QBEs have to be processed by finding a similar object that matches the one in the example. The query processor has to identify exactly which characteristics of the example object the user wants to match. Consider the following query: Get me the images similar to this one. The similarity matching required by the user can be on texture, colour, spatial characteristics (position of objects within the example image) or the shape of the objects present in the image. Also, the matching can be exact or partial. For partial matching, the query processor has to identify the degree of mismatch that can be tolerated. Then, the query processor has to apply the cluster generation function to the example media object; these cluster generating functions map the example object into an m-dimensional feature space. The query processor then has to identify the objects that are mapped within a distance d in the m-dimensional feature space. Objects within this distance d are retrieved with a certain measure of confidence and presented as an ordered list. Here, the distance d is proportional to the degree of mismatch that can be tolerated.
Time indexed queries: (Example Query : Show me the movie 30 minutes after its start).

These queries are made on the temporal characteristics of the media objects. The temporal
characteristics can be stored using segment index trees. The query processor has to process
the time indexed queries by accessing the index information stored using segment trees or
other similar methods.
Spatial queries: (Example Query: Show me the image where President Yeltsin is seen to the left of President Clinton). These queries are made on the spatial characteristics associated with media objects. These spatial characteristics can be generated as metadata information, and the query processor can access this metadata, stored using appropriate techniques, to generate the response.
Application specific queries: (Example Query: Show me the video where the river changes its course). Application specific descriptions can be stored as metadata information, which the query processor can access to generate the response.

Options for Query Processing

Queries in multimedia databases may involve references to multiple media objects. The query processor may have different options for selecting the media database that is to be accessed first. As a simple case, the figure below describes the processing of a query that references a single medium: text.

Assuming the existence of metadata for the text information, the index file is accessed first. Based on the text document selected by accessing the metadata, the information is presented to the user.
When the query references more than one medium, the processing can be done in different ways. The figure above describes one possible way of processing a query that references multiple media, text and image. Assuming that metadata is available for both text and image data, the query can be processed in two different ways:

• The index file associated with the text information is accessed first to select an initial set of documents. This set of documents is then examined to determine whether any document contains the image object specified in the query. This implies that documents carry information regarding the images they contain.
• The index file associated with the image information is accessed first to select a set of images. The information associated with this set of images is then examined to determine whether the images are part of any document. This strategy assumes that the information regarding the containment of images in documents is maintained as a separate information base.

6.5 Query expansion mechanism


The information retrieval system described above works very well if the user is able to convey their information need in the form of a query. But the query is seldom complete. The query provided by the user is often unstructured and incomplete, and an incomplete query hinders a search engine from satisfying the user's information need.
In relevance feedback, users give additional input on documents (by marking documents in the result set as relevant or not), and this input is used to reweight the terms in the query.
In query expansion, on the other hand, users give additional input on query words or phrases, possibly suggesting additional query terms. Some search engines (especially on the web) suggest related queries in response to a query; the user can then opt to use one of these alternative query suggestions. The central question in this form of query expansion is how to generate alternative or expanded queries for the user.
Following are some simple techniques of query expansion:

• Finding synonyms of words, and searching for the synonyms as well
• Finding all the various morphological forms of words by stemming each word in the
search query
• Fixing spelling errors and automatically searching for the corrected form or suggesting
it in the results
• Re-weighting the terms in the original query
• Creating a dictionary of expansion terms for each term, and then looking up expansions in the dictionary
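The first technique in the list, synonym-based expansion, can be sketched as a simple dictionary lookup (the synonym table here is a hypothetical stand-in for a resource such as WordNet):

```python
# Hypothetical synonym dictionary; a real system would consult
# WordNet or a domain thesaurus.
SYNONYMS = {
    "film": ["movie", "motion picture"],
    "image": ["picture", "photo"],
}

def expand_query(terms):
    """Append the known synonyms of every query term so that
    documents using either wording can be matched."""
    expanded = list(terms)
    for term in terms:
        expanded.extend(SYNONYMS.get(term, []))
    return expanded

print(expand_query(["film", "review"]))
# → ['film', 'review', 'movie', 'motion picture']
```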

6.5.1 External Resource Based Query Expansion


In these approaches, the query is expanded using some external resource such as WordNet, a lexical dictionary, or a thesaurus. These dictionaries are built manually and contain mappings from terms to their related terms. The techniques involve looking up such resources and adding the related terms to the query. The following are some external resource based query expansion techniques.
Thesaurus based expansion: A thesaurus is a data structure that lists words grouped together according to similarity of meaning (containing synonyms and sometimes antonyms), in contrast to a dictionary, which provides definitions for words and generally lists them in alphabetical order. In this approach, a thesaurus is used to expand the query terms: all words connected to the query terms are added to the query. Thesaurus based systems have been explored and put to use by many organizations. A well known example of such a system is the Unified Medical Language System (UMLS), used with MedLine for querying the biomedical research literature. It maintains a controlled vocabulary, curated by humans, that contains similar terms for each biomedical concept. Use of a controlled vocabulary is common for domains which are well resourced. Qiu and Frei propose the use of a similarity thesaurus for query expansion; they build the thesaurus automatically using the available domain-specific data.
A thesaurus based query expansion system works well only if we have a rich domain specific thesaurus.
WordNet based expansion: WordNet is a lexical database for multiple languages. Similar terms from multiple languages are connected via synsets (sets of senses). WordNet can be used to fetch related terms for a particular term in multiple languages and can help in satisfying the user's information need.

Early work using WordNet for query expansion reported negative results. Synonym terms were added to the query, and it was observed that this approach makes little difference in retrieval effectiveness if the original query is well formed.
WordNet is also used by Smeaton et al., along with POS tagging, for query expansion. Each query term was expanded independently and equally. Interestingly, they completely ignored the original query terms after expansion. As a result, precision and recall dropped, but they were able to retrieve documents that did not contain any of the query terms yet were relevant to the query.
6.5.2. QUERY LOGS BASED EXPANSION
With the increase in usage of Web search engines, it is easy to collect and use user query logs. Query logs are maintained by each search engine in order to analyze the behavior of users interacting with the search engine. These approaches use the query logs to analyze the user's preferences and add corresponding terms to the query. This method can fail when the user wants to search for something that is not at all related to their earlier searches.

6.5.3. RELEVANCE FEEDBACK BASED EXPANSION


Relevance feedback based methods execute the initial query on the collection and extract the top k documents. The ranked documents are then used to improve the performance of retrieval: it is assumed that the initially retrieved documents are relevant and can thus be used to extract expansion terms. These models fail when the initial retrieval algorithm of the search engine is poor. They can be classified into the following types:

• Explicit feedback from the user: An interactive approach in which the initially retrieved documents are presented to the user, who is asked to select the relevant ones. These models are not very useful because users expect the system to be autonomous; a user would ultimately get irritated by the repeated interaction required for each search. Models of this type can be used for testing search engines, where developers are willing to interact with the system.
• Implicit feedback: A model in which the user's feedback is inferred by the system from the user's behavior, such as the pages the user opens for reading, or the pages the user clicks on once the results are displayed.
• Pseudo Relevance Feedback (PRF): The initial query is fired and the top k results are obtained. Important terms from these documents, mostly selected based on co-occurrence, are then added to the query.

6.5.4. QUERY EXPANSION USING WIKIPEDIA
Wikipedia is the biggest encyclopedia freely available on the web, and its content is well structured and correct. Many works suggest the use of Wikipedia as a source for query expansion [1, 11, 20, 21, 22]. Li et al. [2007] [13] proposed query expansion using Wikipedia by utilizing the category assignments of its articles. The base query is run against a Wikipedia collection and each category is assigned a weight proportional to the number of top-ranked articles assigned to it. Articles are then re-ranked based on the sum of the weights of the categories to which each belongs.

Reference Materials

1. Ralf Steinmetz and Klara Nahrstedt, Multimedia Fundamentals: Media Coding and Content Processing, Prentice Hall.
2. Ze-Nian Li and M. S. Drew, Fundamentals of Multimedia, Prentice Hall, 2004.
3. G. Lu, Multimedia Database Management Systems, Artech House, 1999.
4. K. R. Rao, Zoran S. Bojkovic, and Dragorad A. Milovanovic, Multimedia Communication Systems, Prentice Hall, 2002.
5. Isaac V. Kerlow, The Art of 3D Computer Animation and Effects (3rd ed.), Wiley, 2

