BY
WEGDERES TARIKU
REVIEWED BY
SAMUEL MULATU (MSc)
MAY 2022
HOSSANA, ETHIOPIA
Table of Contents
Chapter 1. Introduction to Multimedia Systems
1.1. What is Multimedia?
1.2. Characteristics of a Multimedia System
1.3. Multimedia Applications (where it is applied)
1.4. History of Multimedia Systems
Chapter 2. Multimedia Basics and Representation
2.1. Digital Representation of Multimedia Contents
2.2. Digital Audio and MIDI
2.3. Colors in Image and Video
2.4. Color Models in Images
2.5. Color Models in Video
Chapter 3. Multimedia Data Compression
3.1. Why Compression?
3.2. Lossy and Lossless Compression
3.3. Huffman Coding
3.4. Dictionary-Based Coding
Chapter 4. Storage of Multimedia
4.1. Basics of Optical Storage Technologies
4.1.1. Random Access Memory (RAM)
4.1.2. Read-Only Memory (ROM)
4.1.3. Floppy and Hard Disks
4.1.4. Zip, Jaz, SyQuest and Optical Storage Devices
4.1.5. Digital Versatile Disc (DVD)
4.1.6. CD-ROM
4.1.7. CD-ROM Players
4.1.8. CD Recorders
4.1.9. Videodisc Players
Chapter 5. Multimedia Database Management System (MMDBMS)
5.1. Multimedia Database
5.2. Usage of Multimedia Databases at Different Levels
5.3. Design and Architecture of a Multimedia Database
5.3.1. MMDB Architectural Considerations
5.3.2. Distributed MMDBMS Architecture
5.3.3. MMDBMS Server Components
5.3.4. Implementation Considerations
5.4. Indexing Mechanisms for Organizing a Multimedia Database
Chapter 6. Multimedia Data Retrieval
6.1. Multimedia Content Representation
6.2. Similarity Measures During Searching
6.3. Retrieval Performance Measures
6.4. Query Languages
6.5. Query Expansion Mechanisms
6.5.1. External-Resource-Based Query Expansion
6.5.2. Query-Logs-Based Expansion
6.5.3. Relevance-Feedback-Based Expansion
6.5.4. Query Expansion Using Wikipedia
Reference Materials
Chapter 1
Introduction to Multimedia Systems
✓ Speech
✓ Music
✓ Film
2) Based on the word "Multimedia"
It is composed of two words:
Multi- multiple/many
Media- source
Source refers to the different kinds of information that we use in multimedia. This includes:
✓ text
✓ graphics
✓ audio
✓ video
✓ images
Multimedia refers to multiple sources of information. It is a system which integrates
all the above types.
Definitions:
What is Multimedia Application?
A Multimedia Application is an application which uses a collection of multiple media
sources e.g. text, graphics, images, sound/audio, animation and/or video.
What is Multimedia system?
A Multimedia System is a system capable of processing multimedia data. A Multimedia
System is characterized by the processing, storage, generation, manipulation and
rendition of multimedia information.
World Wide Web (WWW)
Multimedia is closely tied to the World Wide Web (WWW). Without networks,
multimedia is limited to simply displaying images, videos, and sounds on your local
machine. The true power of multimedia is the ability to deliver this rich content to a
large audience.
In 1895, Guglielmo Marconi sent his first wireless radio transmission at Pontecchio,
Italy. A few years later (in 1901), he detected radio waves beamed across the Atlantic.
Initially used for telegraphy, radio is now a major medium for audio broadcasting.
Television was the new medium of the 20th century. It brought video and has
since changed the world of mass communications.
Project Xanadu was the explicit inspiration for the World Wide Web, for Lotus
Notes, and for HyperCard.
1985 - Negroponte and Wiesner opened the MIT Media Lab. Research at the Media Lab
comprises interconnected developments in an unusual range of disciplines, such as
software agents; machine understanding; how children learn; human and machine
vision; audition; speech interfaces; wearable computers; affective computing;
advanced interface design; tangible media; object-oriented video; interactive
cinema; and digital expression from text, to graphics, to sound.
1989 - Tim Berners-Lee proposed the World Wide Web to CERN (the European
Organization for Nuclear Research)
1990 - K. Hooper Woolsey, Apple Multimedia Lab gave education to 100 people
1992 - The first M-Bone audio multicast on the net (MBONE- Multicast Backbone)
1994 - Jim Clark and Marc Andreessen introduced Netscape Navigator (web browser)
Hypermedia/Multimedia
Hypermedia
2. Data conversion: in a multimedia application, data is represented digitally.
Because of this, we have to convert analog data into digital form.
3. Compression and decompression: Why? Because multimedia deals with large
amounts of data (e.g. movies, sound, etc.) that take a lot of storage space.
4. Rendering different kinds of data at the same time (continuous data).
1) Software tools
2) Hardware Requirement
1. Software Requirement
OCR software
This software converts printed documents into electronically recognizable ASCII
characters. It is used with scanners. Scanners convert a printed document into a
bitmap; the OCR software then breaks the bitmap into pieces according to whether
each contains text or graphics. This is done by examining the texture and density
of the bitmap and by detecting edges.
Text area: ASCII text
Examples:
o Macromedia Freehand
o CorelDraw
o Illustrator
Hardware Requirement
Three groups of hardware for multimedia:
1) Memory and storage devices
2) Input and output devices
3) Network devices
2) Input-Output Devices
I) Interact with the system: to interact with a multimedia system, we use a keyboard,
mouse, trackball, touch screen, etc.
Mouse: a multimedia project is typically designed to be used with a mouse as the
input pointing device. Other devices, such as a trackball or touch screen, could be
used in place of a mouse. A trackball is similar to a mouse in many ways.
Wireless mouse: important when the presenter has to move around during a
presentation.
Touch screen: we use fingers instead of a mouse to interact with touch-screen
computers. There are three technologies used in touch screens:
i) Infrared light: such touch screens use beams of invisible infrared light projected
across the surface of the screen. A finger touching the screen interrupts the beams,
generating an electronic signal. The system then identifies the x-y coordinate of the
screen where the touch occurred and sends a signal to the operating system for
processing.
ii) Texture-coated: such monitors are coated with a textured material that is
sensitive to pressure. When the user presses the monitor, the coating yields the x-y
coordinate of the location and sends a signal to the operating system.
iii) Touch mate:
Use: touch screens are used to display/provide information in public areas such as:
o airports
o museums
o transport service areas
o hotels, etc.
Advantages:
o user friendly
o easy to use, even for non-technical people
o easy to learn how to use
II) Information Entry Devices: the purpose of these devices is to enter information to
be included in our multimedia project into our computer.
OCR: enables us to use OCR software to convert a printed document into an ASCII file.
Graphics tablets/digitizers: both are used to convert points, lines, and curves from
a sketch into digital format. They use a movable device called a stylus.
Scanners: enable us to convert printed images into digital format. Two types of
scanners:
▪ Flatbed scanners
▪ Portable scanners
Microphones: they are important because they enable us to record speech, music, etc.
The microphone is designed to pick up and amplify incoming acoustic waves or
harmonics precisely and correctly and convert them to electrical signals. You have to
purchase a superior, high-quality microphone because your recordings will depend on
its quality.
Digital camera and video camera: important for recording and including images
and video in a multimedia system, respectively. Digital video cameras store images
as digital data; they do not record on film. Video captured with a video camera or
VCR can be edited using video editing tools.
Remark: video takes a large amount of memory.
Output Devices
Depending on the content of the project and how the information is presented, you
need different output devices. Some output hardware includes:
Speaker: if your project includes speech meant to convey a message to the audience,
or background music, a speaker is required.
Projector: when to use a projector:
▪ if you are presenting at a meeting or group discussion,
▪ if you are presenting to a large audience
Types of projector:
➢ LCD projector
➢ CRT projector
Plotter/printer: when the situation calls for presenting on paper, you use plotters
and/or printers. In such cases, the print quality of the device should be taken into
consideration.
Impact printers: poor-quality graphics; not commonly used.
Non-impact printers: good-quality graphics.
3) Network Devices
ii) ISDN: stands for Integrated Services Digital Network. It is a circuit-switched
telephone network system designed to allow digital transmission of voice and data
over ordinary telephone copper wires. This has the advantage of better quality and
higher speeds than analog systems.
✓ It has a higher transmission speed, i.e. a faster data transfer rate.
✓ It uses additional hardware, and hence is more expensive.
iii) Cable modem: uses existing cables stretched for television broadcast reception.
The data transfer rate of such devices is very fast i.e. they provide high bandwidth.
They are primarily used to deliver broadband internet access, taking advantage of
unused bandwidth on a cable television network.
iv) DSL: provides digital data transmission over the wires of the local telephone
network. DSL is faster than a telephone line with a modem. How? It carries a digital
signal over the unused frequency spectrum (analog voice transmission uses only a
limited range of the spectrum) available on the twisted-pair cables running between
the telephone company's central office and the customer premises.
Summary
Chapter 2
Multimedia Basics and Representation
Images: continuous-tone images can be digitized into bitmaps. A digitized image can
be represented as a two-dimensional matrix of individual picture elements (pixels).
Each pixel is represented by a fixed number of bits (N-bit quantization); 24-bit
quantization provides 16,777,216 colors per pixel.
Resolution for print is measured in dots per inch (dpi); resolution for screens is
measured in pixels per inch (ppi). Dots are digitized with 1-bit quantization.
Speech/Audio Signal: one-dimensional, but needs a high dynamic range and produces large file sizes.
For telephone lines, the channel bandwidth is 200 Hz - 3.4 kHz, with 8-bit samples
per channel.
Video: Composed of a series of still image frames and produces the illusion of
movement by quickly displaying one frame after another.
The Human Visual System (HVS) accepts anything more than 20 frames per second
(fps) as smooth motion.
Digital Audio Coding:
Codecs: (Co + Dec) means Coding & Decoding or
Encoder & Decoder or
Compress & Decompress
Audio Coding:
Analog sound waves (voice/natural) are received through a microphone (as an analog
voltage, based on air pressure). The analog voltage (amplitude) is then sampled at
sampling frequency f, and the sampled signal is quantized into discrete values.
Analog-to-Digital conversion:
Sample and hold: taking measurements of the sound wave (voltage) at given
intervals of time.
Sample rate:
The number of samples per second.
Telephone : 8000 samples/sec
CD: 44,100 samples/sec
DVD: 48,000 samples/sec
Ex: the upper limit of human hearing is ~20 kHz, so by Nyquist the sample rate
must be ≥ 40,000 samples/sec.
For telephone, 4 kHz is the maximum frequency, so 8 kHz sampling is used.
AM radio – 11 kHz
FM radio – 22 kHz
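The Nyquist rule above can be sketched in a few lines of code. This is an illustrative sketch (the function name is mine, not from the text):

```python
def min_sample_rate(max_freq_hz):
    """Nyquist criterion: sample at (at least) twice the highest signal frequency."""
    return 2 * max_freq_hz

# Human hearing tops out around 20 kHz, so a rate of at least
# 40,000 samples/sec is needed (CD audio uses 44,100).
print(min_sample_rate(20_000))  # 40000

# Telephone speech is band-limited to ~4 kHz, hence 8 kHz sampling.
print(min_sample_rate(4_000))   # 8000
```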
dB – decibels (used to measure sound pressure level)
35 dB – quiet
120 dB – discomfort
Quantization:
The sampled analog signal needs to be quantized (digitized):
✓ How many discrete digital values?
✓ What analog value does each digital value correspond to?
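The two questions above can be answered concretely with a small sketch: an N-bit quantizer has 2^N discrete values, and each analog sample is mapped to the nearest one. Function and parameter names here are illustrative:

```python
def quantize(sample, bits=8):
    """Map an analog sample in [-1.0, 1.0] to one of 2**bits discrete levels."""
    levels = 2 ** bits
    # Scale the sample to [0, levels - 1] and round to the nearest level.
    level = round((sample + 1.0) / 2.0 * (levels - 1))
    # Clamp, in case the input strays slightly outside [-1, 1].
    return max(0, min(levels - 1, level))

print(quantize(-1.0))       # 0    (lowest of 256 levels at 8 bits)
print(quantize(1.0))        # 255  (highest level)
print(quantize(1.0, 16))    # 65535 (16-bit audio has 65,536 levels)
```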
MIDI (Musical Instrument Digital Interface):
✓ A protocol, adopted by the electronic music industry, that enables computers,
synthesizers, keyboards and other musical devices to communicate with each other.
✓ A description format that is made up of semantic information about the sounds
it represents and that makes use of high-level (algorithmic) models.
Normal music digitization performs waveform coding: sample the music signal and
then try to reconstruct it exactly.
MIDI: records only musical actions, such as
the key depressed,
the time when the key is depressed,
the duration for which the key remains depressed, and
how hard the key is struck (pressure)
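The actions listed above can be pictured as small event records rather than waveforms. This is a conceptual sketch only: the field names are illustrative and do not reflect the actual MIDI byte format.

```python
from dataclasses import dataclass

@dataclass
class NoteEvent:
    key: int          # which key was depressed (note number; 60 = middle C)
    start_time: float # when the key was depressed, in seconds
    duration: float   # how long the key remained depressed
    velocity: int     # how hard the key was struck (0-127)

# A short melody is just a handful of tiny records --
# far smaller than the sampled waveform it describes.
melody = [NoteEvent(60, 0.0, 0.5, 100),
          NoteEvent(64, 0.5, 0.5, 90)]
print(len(melody))  # 2
```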
Advantages:
✓ This can be very effective and realistic, but requires a lot of memory.
✓ Good for playing music, but not realistic for speech synthesis.
✓ Good for creating special sound effects from sample libraries.
Synthesizer:
An electronic musical instrument (usually played with a keyboard) that generates
and modifies sounds electronically and can imitate a variety of other musical
instruments. If a musical instrument satisfies both components of the MIDI standard,
it is a MIDI-compatible device.
✓ Sound Generations: use a mix of
Synthesis and
Samples
✓ Samplers – Digitize (Sample) sound then
Play back,
Loop (beats),
Simulate musical instruments,
AAC – Advanced Audio Coding – designed for high compression as well as high
audio quality
MP3 – MPEG-1 Audio Layer 3
MP3 VBR – MP3 with variable bit rate
Audible 2, 3, 4 – (.aa) used for audio books or other voice recordings
Apple Lossless – (.m4a)
AIFF – Audio Interchange File Format, similar to WAV
WAV – from IBM and Microsoft
WMA – Windows Media Audio
FLAC – Free Lossless Audio Codec
ATRAC (Sony) – Adaptive Transform Acoustic Coding
Ogg Vorbis – an open and free audio compression format
RealAudio – .rm, .ram
Graphic/Image Data Representation
An image can be described as a two-dimensional array of points, where every point
is allocated its own color. Every such point is called a pixel, short for picture
element. An image is a collection of these points, colored in such a way that they
produce meaningful information.
A pixel (picture element) contains the color or hue and the relative brightness of
that point in the image. The number of pixels in the image determines the resolution
of the image.
Fig 1 pixels
Types of images
There are two basic forms of computer graphics: bitmaps and vector graphics. The
kind you use determines the tools you choose. Bitmap formats are the ones used for
digital photographs; vector formats are used for line drawings.
Each of the small pixels can be a shade of gray or a color. Using 24-bit color, each pixel
can be set to any one of 16 million colors. All digital photographs and paintings are
bitmapped, and any other kind of image can be saved or exported into a bitmap
format. In fact, when you print any kind of image on a laser or ink-jet printer, it is first
converted by either the computer or printer into a bitmap form so it can be printed
with the dots the printer uses.
To edit or modify bitmapped images you use a paint program. Bitmap images are
widely used, but they suffer from a few unavoidable problems. They must be printed
or displayed at a size determined by the number of pixels in the image. Bitmap
images also have large file sizes, determined by the image's dimensions in pixels and
its color depth. To reduce this problem, graphic formats such as GIF and JPEG are
used to store images in compressed form.
Image coding:
Sampling:
✓ Taking measurement (of color) at discrete locations within the image.
o Image sensor in camera measures the color at each pixel.
o 16 samples per inch means 16 pixels per inch.
Quantization:
✓ Represent the measured value (i.e. voltage) at the sampled point by an integer.
Pixel dimension:
The number of pixels horizontally (i.e. width, w) and vertically (i.e. height h),
denoted as w x h
Resolution:
The numbers of pixels in an image file per unit of spatial measure.
Resolution can be measured in pixels per inch (ppi).
Resolution by a printer is a matter of how many dots of color per inch (dpi) it
can print.
Image size:
The physical dimensions of an image when it is printed out or displayed on a
computer.
The relation of image size, pixel dimension and resolution is:
a = w/r, b = h/r
r – resolution
w x h – pixel dimension
a x b – image size
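The relation a = w/r, b = h/r can be checked with a quick calculation. A minimal sketch (the function name and sample numbers are illustrative):

```python
def print_size(width_px, height_px, resolution_ppi):
    """Physical size when printed/displayed: a = w/r, b = h/r (in inches)."""
    return width_px / resolution_ppi, height_px / resolution_ppi

# A 3000 x 2000 pixel photo printed at 300 ppi
# comes out 10 inches by about 6.67 inches.
a, b = print_size(3000, 2000, 300)
print(a, b)
```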
Color depth:
The amount of information per pixel is called color depth.
1-bit => white/black (monochrome)
8-bits => gray scale
16-bits =>color (RGB)
24 or 32 bits => true color
Color models: A method by which we can specify, create, and visualize color
Vector graphics
These are really just a list of graphical objects, such as lines, rectangles, ellipses,
arcs, or curves, called primitives. Draw programs, also called vector graphics
programs, are used to create and edit these vector graphics. These programs store
the primitives as a set of numerical coordinates and mathematical formulas that
specify their shape and position in the image. This format is widely used by
computer-aided design programs to create detailed engineering and design drawings,
and in multimedia when 3D animation is desired. Draw programs have a number of
advantages over paint-type programs.
These include:
▪ Precise control over lines and colors.
▪ Ability to skew and rotate objects to see them from different angles or add
perspective.
▪ Ability to scale objects to any size to fit the available space. Vector graphics
always print at the best resolution of the printer you use, no matter what size
you make them.
▪ Color blends and shadings can be easily changed.
▪ Text can be wrapped around objects.
Monochrome/Bit-Map Images
▪ Each pixel is stored as a single bit (0 or 1)
▪ The value of the bit indicates whether it is light or dark
▪ A 640 x 480 monochrome image requires 37.5 KB of storage.
▪ Dithering is often used for displaying monochrome images
Fig 3 Monochrome bit-map image
Gray-scale Images
▪ Each pixel is usually stored as a byte (a value between 0 and 255)
▪ This value indicates the degree of brightness of that point. This brightness goes
from black to white
▪ A 640 x 480 grayscale image requires over 300 KB of storage.
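The storage figures quoted above follow directly from width × height × bits per pixel. A minimal sketch (function name is mine) that reproduces them:

```python
def storage_bytes(width, height, bits_per_pixel):
    """Uncompressed image size: total pixels times bits per pixel, in bytes."""
    return width * height * bits_per_pixel // 8

print(storage_bytes(640, 480, 1) / 1024)   # 37.5  KB (1-bit monochrome)
print(storage_bytes(640, 480, 8) / 1024)   # 300.0 KB (8-bit grayscale)
print(storage_bytes(640, 480, 24) / 1024)  # 900.0 KB (24-bit true color)
```

This is also why compressed formats such as GIF and JPEG matter: the raw sizes grow linearly with color depth.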
Fig 5 8-bit Color Image
Image Resolution
Image resolution refers to the spacing of pixels in an image and is measured in pixels per
inch, ppi, sometimes called dots per inch, dpi. The higher the resolution, the more pixels in
the image. A printed image that has a low resolution may look pixelated or made up of
small squares, with jagged edges and without smoothness.
Image size refers to the physical dimensions of an image. Because the number of pixels in
an image is fixed, increasing the size of an image decreases its resolution and decreasing its
size increases its resolution.
Standard System Independent Formats
GIF
✓ Graphics Interchange Format (GIF), devised by CompuServe, initially for
transmitting graphical images over phone lines via modems.
✓ Uses the Lempel-Ziv-Welch (LZW) algorithm, a dictionary-based compression
method, modified slightly for image scan-line packets (line grouping of pixels).
✓ LZW compression was patented technology of the UNISYS Corp.
✓ Limited to 8-bit (256) color images; suitable for images with few distinctive
colors (e.g., graphics drawings).
✓ Supports one-dimensional interlacing (downloading gradually in web browsers).
Interlaced images appear gradually while they are downloading: they display at a
low, blurry resolution first and then transition to full resolution by the time the
download is complete.
✓ Supports animation: multiple pictures per file (animated GIF).
✓ GIF format has long been the most popular on the Internet, mainly because of
its small size
✓ GIFs allow single-bit transparency, which means when you are creating your image,
you can specify one colour to be transparent. This allows background colours to
show through the image.
PNG
✓ stands for Portable Network Graphics
✓ It is intended as a replacement for GIF in the WWW and image editing tools.
✓ GIF uses LZW compression, which is patented by Unisys. Use of GIF may require
paying royalties to Unisys due to the patent.
✓ PNG uses the unpatented zip technology for compression.
✓ One version of PNG, PNG-8, is similar to the GIF format. It can be saved with a
maximum of 256 colours and supports 1-bit transparency. File sizes when saved in a
capable image editor like Fireworks will be noticeably smaller than the GIF
counterpart, as PNGs save their colour data more efficiently.
✓ PNG-24 is another version of PNG, with 24-bit colour support, allowing ranges of
colour similar to a high-colour JPG. However, PNG-24 is in no way a replacement
format for JPG, because it is a lossless compression format, which results in large
file sizes.
✓ Provides transparency using alpha values
✓ Supports interlacing
✓ PNG can be animated through the MNG extension of the format, but browser
support for this format is limited.
JPEG/JPG
✓ A standard for photographic image compression
✓ created by the Joint Photographic Experts Group
✓ Intended for encoding and compression of photographs and similar images
✓ Takes advantage of limitations in the human vision system to achieve high rates
of compression
✓ Uses complex lossy compression, which allows the user to set the desired level
of quality (compression). A compression setting of about 60% often gives a good
balance of quality and file size.
✓ Though JPGs can be interlaced, they do not support animation or transparency,
unlike GIF.
TIFF
✓ Tagged Image File Format (TIFF) stores many different types of images (e.g.,
monochrome, grayscale, 8-bit and 24-bit RGB, etc.)
✓ Uses tags: keywords defining the characteristics of the image included in the
file. For example, a picture 320 by 240 pixels would include a 'width' tag
followed by the number '320' and a 'depth' tag followed by the number '240'.
✓ Developed by the Aldus Corp. in the 1980s and later supported by Microsoft.
✓ TIFF is a lossless format (when not utilizing the new JPEG tag, which allows
for JPEG compression).
✓ It does not provide any major advantages over JPEG and is not as
user-controllable.
✓ Do not use TIFF for web images. They produce big files and, more importantly,
most web browsers will not display TIFFs.
In the realm of digital imaging, frequency refers to the rate at which color values
change.
Image decimation (subsampling) is a reduction in the dimensions (or resolution) of
an image.
Ex: decimation by 2 results in half the size of the image.
The simplest method is skipping pixels.
Image interpolation (upsampling) is an increase in the dimensions (or resolution)
of an image.
Ex: interpolation by 2 results in double the size of the image.
The simplest method is duplication of pixels.
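The two "simplest methods" mentioned above, pixel skipping and pixel duplication, can be sketched on a tiny image stored as a list of rows (function names are illustrative):

```python
def decimate(image, factor):
    """Shrink by skipping pixels: keep every `factor`-th pixel in each row and column."""
    return [row[::factor] for row in image[::factor]]

def interpolate(image, factor):
    """Enlarge by duplicating pixels: repeat each pixel `factor` times both ways."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(factor)]  # duplicate horizontally
        out.extend([list(wide) for _ in range(factor)]) # duplicate vertically
    return out

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
print(decimate(img, 2))          # [[1, 3], [9, 11]] -- half the size
print(interpolate([[1, 2]], 2))  # [[1, 1, 2, 2], [1, 1, 2, 2]] -- double the size
```

Real resamplers use smarter filters (averaging, bilinear interpolation) to avoid the blockiness these naive methods produce.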
Color Models:
Color perception:
✓ Color is how we perceive the frequencies of light that reach the retinas of our
eyes.
✓ The human retina has three types of color photoreceptor cone cells,
corresponding to the colors red, green and blue.
Three characteristics of color:
Hue (essential color): dominant wavelength
Saturation (color purity)
Brightness (Luminance - Intensity of light)
Procedural modeling:
Creating a digital image by writing a computer program based on some mathematical
computation or a particular type of algorithm. Procedural models are often defined by a
small set of data that describes the overall properties of the model.
For example, a tree might be defined by some branching properties and leaf shapes. The
actual model is constructed by a procedure that often makes use of randomness to add
variety. In this way, a single tree pattern can be used to model an entire forest.
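The tree-and-forest idea above can be sketched as a tiny recursive procedure. This is purely illustrative (names and parameters are mine, not a real modeling API):

```python
import random

def grow(depth, branch_factor=2):
    """Return a nested structure describing a tree: each node branches
    until depth 0. Random branch lengths add variety, so one small
    pattern yields many different-looking trees."""
    if depth == 0:
        return "leaf"
    length = round(random.uniform(0.5, 1.0), 2)  # random branch length
    return (length, [grow(depth - 1, branch_factor) for _ in range(branch_factor)])

random.seed(1)  # reproducible "forest"
# A whole forest from one procedure and a handful of parameters.
forest = [grow(depth=3) for _ in range(5)]
print(len(forest))  # 5 distinct trees from the same pattern
```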
Video formats:
AVI, WMF, MPEG, QuickTime, RealVideo, Shockwave (Flash)
AVI – Audio Video Interleaved – a Windows movie file with high video quality but a
large file size (approx. 25 GB is required for 60 minutes of video)
In order to play digital audio (i.e. a WAVE file), you need a card with Digital-to-Analog
Converter (DAC) circuitry on it. Most sound cards have both an ADC (Analog-to-Digital
Converter) and a DAC, so that the card can both record and play digital audio. The DAC
is attached to the Line Out jack of your audio card and converts the digital audio values
back into the original analog audio. This analog audio can then be routed to a mixer,
speakers, or headphones so that you can hear a recreation of what was originally recorded.
The playback process is almost an exact reverse of the recording process. To record
digital audio, you need a card with Analog-to-Digital Converter (ADC) circuitry. The
ADC is attached to the Line In (and Mic In) jack of your audio card and converts the
incoming analog audio to a digital signal. Your computer software can store the
digitized audio on your hard drive, display it visually on the computer's monitor,
manipulate it mathematically to add effects, or otherwise process the sound. While the
incoming analog audio is being recorded, the ADC creates many digital values in its
conversion to a digital audio representation of what is being recorded. These values
must be stored for later playback.
Digitizing Sound
• A microphone produces an analog signal
• Computers understand only discrete (digital) entities
This creates a need to convert analog audio to digital audio using specialized
hardware. This process is also known as sampling.
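The analog-to-digital chain described above (sample-and-hold, then quantize) can be simulated end to end on a pure tone. A sketch under illustrative names, standing in for what the ADC hardware does:

```python
import math

def digitize(freq_hz, sample_rate, duration_s, bits=8):
    """Sample a pure sine tone at `sample_rate`, then quantize each
    sample to one of 2**bits discrete levels (what an ADC produces)."""
    levels = 2 ** bits
    n = int(sample_rate * duration_s)
    samples = []
    for i in range(n):
        t = i / sample_rate                      # sample-and-hold instant
        v = math.sin(2 * math.pi * freq_hz * t)  # "analog voltage" in [-1, 1]
        samples.append(round((v + 1) / 2 * (levels - 1)))  # discrete level
    return samples

# One hundredth of a second of a 440 Hz tone at telephone quality (8 kHz, 8-bit):
data = digitize(440, 8000, 0.01)
print(len(data))  # 80 samples
```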
Common Audio Formats
There are two basic types of audio files:
1. The Traditional Discrete Audio File
A traditional audio file can be saved to a hard drive or other digital storage medium.
WAV: The WAV format is the standard audio file format for Microsoft Windows
applications, and is the default file type produced when conducting digital recording
within Windows. It supports a variety of bit resolutions, sample rates, and channels of
audio. This format is very popular on IBM PC (clone) platforms and is widely used as a
basic format for saving and modifying digital audio data.
AIF: The Audio Interchange File Format (AIFF) is the standard audio format employed by
computers using the Apple Macintosh operating system. Like the WAV format, it supports
a variety of bit resolutions, sample rates, and channels of audio, and is widely used in
software programs used to create and modify digital audio.
AU: The AU file format is a compressed audio file format developed by Sun Microsystems
and popular in the Unix world. It is also the standard audio file format for the Java
programming language. It only supports 8-bit depth and thus cannot provide CD-quality
sound.
MP3: MP3 stands for Moving Picture Experts Group, Audio Layer 3 compression. MP3 files
provide near-CD-quality sound but are only about 1/10th as large as a standard audio CD file.
Because MP3 files are small, they can easily be transferred across the Internet and played on
any multimedia computer with MP3 player software.
MIDI/MID: MIDI (Musical Instrument Digital Interface) is not a file format for storing or
transmitting recorded sounds, but rather a set of instructions used to play electronic music on
devices such as synthesizers. MIDI files are very small compared to recorded audio file formats.
However, the quality and range of MIDI tones is limited.
2. Streaming Audio File Formats
Streaming is a network technique for transferring data from a server to a client in a format that
can be continuously read and processed by the client computer. Using this method, the client
computer can start playing the initial elements of large time-based audio or video files before
the entire file is downloaded. As the Internet grows, streaming technologies are becoming an
increasingly important way to deliver time-based audio and video data.
For streaming to work, the client side has to receive the data and continuously feed it to the
player application. If the client receives the data more quickly than required, it has to
temporarily store or buffer the excess for later play. On the other hand, if the data doesn't
arrive quickly enough, the audio or video presentation will be interrupted. There are three
primary streaming formats that support audio files: RealNetworks' RealAudio (RA, RM),
Microsoft's Advanced Streaming Format (ASF) with its audio subset Windows Media Audio 7
(WMA), and Apple's QuickTime 4.0+ (MOV).
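The buffering behavior described above can be sketched as a toy simulation. The function and its units are invented for illustration and are not part of any streaming product:

```python
def simulate_stream(arrivals, playback_rate, buffer_start):
    """Simulate client-side buffering: each tick, arrivals[i] KB come in
    from the network and playback_rate KB are consumed by the player.
    Returns the buffer level after each tick and whether playback stalled."""
    buffer_kb = buffer_start
    levels, stalled = [], False
    for arrived in arrivals:
        buffer_kb += arrived             # store excess data for later play
        if buffer_kb >= playback_rate:
            buffer_kb -= playback_rate   # feed the player application
        else:
            stalled = True               # data did not arrive quickly enough
            buffer_kb = 0
        levels.append(buffer_kb)
    return levels, stalled

# Data arriving faster than playback builds up the buffer;
# data arriving too slowly stalls the presentation.
smooth = simulate_stream([10, 10, 10], 8, 5)
choppy = simulate_stream([2, 2], 8, 0)
```

The buffer grows when the network outpaces playback and absorbs short slowdowns; a sustained shortfall interrupts the presentation.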
RA/RM
For audio data on the Internet, the de facto standard is RealNetworks' RealAudio (.RA)
compressed streaming audio format. These files require a RealPlayer program or browser
plug-in. The latest versions of RealNetworks' server and player software can handle multiple
encodings of a single file, allowing the quality of transmission to vary with the available
bandwidth. Webcast radio broadcasts of both talk and music frequently use RealAudio.
Streaming audio can also be provided in conjunction with video as a combined Real Media
(RM) file.
ASF
Microsoft's Advanced Streaming Format (ASF) is similar in design to RealNetworks' Real
Media format, in that it provides a common definition for internet streaming media and can
accommodate not only synchronized audio, but also video and other multimedia elements, all
while supporting multiple bandwidths within a single media file. Also like RealNetworks'
RealMedia format, Microsoft's ASF requires a program or browser plug-in. The pure audio file
format used in Windows Media Technologies is Windows Media Audio 7 (WMA files). Like MP3
files, WMA audio files use sophisticated audio compression to reduce file size. Unlike MP3 files,
however, WMA files can function as either discrete or streaming data and can provide a
security mechanism to prevent unauthorized use.
MOV
Apple QuickTime movies (MOV files) can be created without a video channel and used as a
sound-only format. Since version 4.0, QuickTime has provided true streaming capability.
QuickTime also accepts different audio sample rates and bit depths, and offers full
functionality on both Windows and Mac OS. Popular audio file formats are:
• au (Unix)
• aiff (MAC)
• wav (PC)
• mp3
MIDI
MIDI stands for Musical Instrument Digital Interface.
Definition of MIDI: MIDI is a protocol that enables computers, synthesizers, keyboards, and
other musical devices to communicate with each other. This protocol is a language that allows
interworking between instruments from different manufacturers by providing a link that is
capable of transmitting and receiving digital data. MIDI transmits only commands; it does not
transmit an audio signal. It was created in 1982.
Components of a MIDI System
1. Synthesizer: It is a sound generator (producing various pitches, loudnesses, and tone
colors). A good (musician's) synthesizer often has a microprocessor, keyboard, control panels,
memory, etc.
2. Sequencer: It can be a stand-alone unit or a software program for a personal computer. (It
used to be a storage server for MIDI data; nowadays it is more often a software music editor on
the computer.) It has one or more MIDI INs and MIDI OUTs.
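Since MIDI transmits commands rather than audio, a single message is just a few bytes. As a sketch, the standard Note On message is a status byte (0x9n, where n is the channel), followed by a key number and a velocity; the helper function below is hypothetical:

```python
def note_on(channel, key, velocity):
    """Build the 3-byte MIDI Note On message: a status byte (0x90 plus the
    4-bit channel number), then the key number and velocity (0-127 each)."""
    assert 0 <= channel <= 15 and 0 <= key <= 127 and 0 <= velocity <= 127
    return bytes([0x90 | channel, key, velocity])

# Middle C (key 60) on channel 0, struck at velocity 100:
msg = note_on(0, 60, 100)
```

Three bytes per note event is why MIDI files are so much smaller than recorded audio: the sound itself is produced by the receiving synthesizer, not stored in the file.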
Visible light is an electromagnetic wave in the 400 nm - 700 nm range
(blue ~400 nm, green ~500 nm, red ~700 nm). Most light we see is not one wavelength; it is a
combination of many wavelengths. For example, purple is a mixture of red and violet.
1 nm = 10^-9 m.
Color Spaces
Color space specifies how color information is represented. It is also called color model. Any
color could be described in a three dimensional graph, called a color space. Mathematically the
axis can be tilted (sloped) or moved in different directions to change the way the space is
described, without changing the actual colors. The values along an axis can be linear or
nonlinear.
This gives a variety of ways to describe colors, which has an impact on the way we process
a color image. There are different ways of representing color. Some of these are:
• RGB color space
• YUV color space (luma (brightness) 'Y', blue projection 'U', red projection 'V')
• YIQ color space (luma (brightness) 'Y'; I and Q represent the chrominance
information. Chrominance = color - luma.)
• CMY/CMYK color space
• YCbCr color space (Y' is the luma component and Cb and Cr are the blue-difference
and red-difference chroma components. Y' (with prime) is distinguished from Y,
which is luminance, meaning that light intensity is nonlinearly encoded based on
gamma-corrected RGB primaries.)
In the RGB model, red, green, and blue light add together to produce the other
colors, and it is a natural color space to use with video displays.
Cyan, Magenta, and Yellow (CMY) are the complementary colors of RGB. The CMY model is
mostly used in printing devices, where the color pigments on the paper absorb certain colors
(e.g., no red light is reflected from cyan ink), and in painting. The three colors together absorb
all the light that strikes them, appearing black (in contrast to RGB, where the three colors
together make white). "Nothing" on the paper is white (in contrast to RGB, where nothing is
black). These are called the subtractive or "paint" colors.
In practice, it is difficult to mix the three colors so that they perfectly absorb all light and thus
produce black. Expensive inks are required to produce the exact color, and the paper must
absorb each color in exactly the same way. To avoid these problems, a fourth color, black, is
often added, creating the CMYK color space (K stands for black), even though black is
mathematically not required. CMYK is widely used in color printing (e.g., to produce a darker
black than simply mixing CMY).
2.4. Color Models in Images
Color models and spaces are used for stored, displayed, and printed images.
RGB Color Model (Additive Model)
Color images are encoded as integer triplets (R, G, B). These triplets encode how much the
corresponding phosphor should be excited in devices such as a monitor. For images produced
from computer graphics, we store integers proportional to intensity in the frame buffer.
CMY Color Model (Subtractive Color)
Additive color: when two light beams impinge on a target, their colors add; when two
phosphors on a CRT screen are turned on, their colors add. But for ink deposited on paper, the
opposite situation holds: yellow ink subtracts blue from white illumination, but reflects red and
green, so it appears yellow. Instead of red, green, and blue primaries, we need primaries that
amount to -red, -green, and -blue; i.e., we need to subtract R, G, or B. These subtractive color
primaries are the Cyan (C), Magenta (M), and Yellow (Y) inks.
Undercolor Removal: CMYK System
Undercolor removal gives sharper and cheaper printer colors: calculate the part of the CMY
mix that would be black, remove it from the color proportions, and add it back as real black.
The new specification of inks is thus K = min(C, M, Y), C' = C - K, M' = M - K, Y' = Y - K,
where K stands for black.
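The undercolor-removal rule can be sketched directly in Python. This follows the simple subtractive formulation in the text (pull K = min(C, M, Y) out of the mix); real printer color profiles are far more elaborate:

```python
def rgb_to_cmyk(r, g, b):
    """Convert RGB (each in 0..1) to CMY, then apply undercolor removal:
    the common black component K is removed from the CMY proportions."""
    c, m, y = 1 - r, 1 - g, 1 - b   # CMY are the subtractive complements of RGB
    k = min(c, m, y)                 # the part of the mix that would be black
    return c - k, m - k, y - k, k

# Pure red needs no black ink; pure black needs only black ink.
red = rgb_to_cmyk(1, 0, 0)      # (0, 1, 1, 0): magenta + yellow
black = rgb_to_cmyk(0, 0, 0)    # (0, 0, 0, 1): K alone
```

Note how black is produced entirely by the K channel rather than by an expensive (and imperfect) mix of all three inks.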
Color combinations that result from combining primary colors available in the two situations,
additive color and subtractive
Fig: Additive and subtractive color. (a): RGB is used to specify additive color. (b): CMY is used
to specify subtractive color
YUV Color Model
Established in 1982 to build a digital video standard, YUV represents video as a sequence of
fields (odd and even lines); two fields make a frame. It works with PAL (50 fields/sec) or NTSC
(National Television System Committee, 60 fields/sec). The luminance (brightness), Y, is retained
separately from the chrominance (color). The Y component determines the brightness of the
color (referred to as luminance or luma), while the U and V components determine the color
itself (it is called chroma). U is the axis from blue to yellow and V is the axis from magenta to
cyan. Y ranges from 0 to 1 (or 0 to 255 in digital formats), while U and V range from -0.5 to 0.5
(or -128 to 127 in signed digital form, or 0 to 255 in unsigned form). Luminance is the
component of a television signal that carries information about the brightness of the image:
Y (luminance) is the intensity of light emitted from a surface per unit area in a given direction.
Chrominance refers to the difference between a color and a reference white at the same
luminance.
Fig: YUV decomposition of a color image. (a) is the original color image; (b) is Y; (c, d) are
(U, V).
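The split into luminance and chrominance can be computed directly. A minimal sketch using the standard luminance weights (0.299, 0.587, 0.114) and the usual U/V scale factors:

```python
def rgb_to_yuv(r, g, b):
    """Convert RGB (each in 0..1) to YUV: Y is a weighted sum of the
    primaries; U and V are scaled blue and red differences."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance (brightness)
    u = 0.492 * (b - y)                      # blue-difference chrominance
    v = 0.877 * (r - y)                      # red-difference chrominance
    return y, u, v

# White has full luminance and zero chrominance; black is all zeros.
white = rgb_to_yuv(1, 1, 1)
```

For any grey value (r = g = b) both chrominance components vanish, which is exactly why a black-and-white receiver can use Y alone.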
YIQ Color Model
YIQ is used in NTSC color television; I and Q carry the chrominance. YIQ is intended to take
advantage of human color response characteristics: the eye is more sensitive to I than to Q,
so less bandwidth is required for Q than for I. NTSC limits I to 1.5 MHz and Q to 0.6 MHz;
Y is assigned a higher bandwidth, 4 MHz.
YCbCr
This is similar to YUV. This color space is closely related to the YUV space, but with the
coordinates shifted to allow all positive valued coefficients. The luminance (brightness), Y, is
retained separately from the chrominance (color). During development and testing of JPEG it
became apparent that chrominance subsampling in this space allowed much better
compression than simply compressing RGB or CMY. Subsampling means that only one half or
one quarter as much detail is retained for the color as for the brightness. This color space is
used in MPEG and JPEG compression.
Y: luma component
Cb: blue-difference chroma component
Cr: red-difference chroma component
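The savings from chroma subsampling are easy to quantify. A sketch at 8 bits per sample (the function name and scheme labels follow the common 4:x:x notation; the helper itself is illustrative):

```python
def frame_bytes(width, height, scheme):
    """Bytes per Y'CbCr frame at 8 bits per sample, for common chroma
    subsampling schemes (chroma stored at reduced resolution)."""
    luma = width * height
    if scheme == "4:4:4":      # full chroma resolution
        chroma = 2 * luma
    elif scheme == "4:2:2":    # chroma halved horizontally
        chroma = luma
    elif scheme == "4:2:0":    # chroma halved in both directions (JPEG/MPEG)
        chroma = luma // 2
    else:
        raise ValueError(scheme)
    return luma + chroma

# A 720x576 frame: 4:2:0 needs exactly half the bytes of 4:4:4.
full = frame_bytes(720, 576, "4:4:4")
sub = frame_bytes(720, 576, "4:2:0")
```

Because the eye is far less sensitive to color detail than to brightness detail, this halving of the data is nearly invisible to viewers.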
Digital Video Representation: Video can be thought of as a sequence of moving images (frames).
Important parameters in video:
Digital image resolution (n x m pixels)
Quantization (k bits per pixel)
Frame rate (p frames per second, fps)
Types of video signals:
a- Component video
b- Composite video
c- S-video
Composite video:
✓ Chrominance and luminance signals mixed into a single carrier wave
Y-luminance
I,Q or U,V – chrominance
✓ Only one wire required – some interference
✓ Used by broadcast color TV, downward compatible with black and white TV
✓ In NTSC TV, I and Q combined into chrominance signal, at the higher frequency end of
the channel.
Downward compatible with the black-and-white TV system; color model: YUV.
Other standards:
PAL (Phase Alternating Line): used in parts of western Europe
SECAM: the French standard
CIF standard:
CIF – Common Intermediate Format
The idea of CIF: a format for lower bit rates
QCIF (Quarter CIF): an even lower bit rate
The CIF/QCIF resolutions are divisible by 8 and 16.
Ex: a resolution of 640 × 480 with 24 bits per pixel color depth at 30 frames per second means
640 × 480 × 24 × 30 = 221,184,000 bits/sec (about 221 Mbit/s).
Normally one frame consists of 3 component pictures (Y, Cb, Cr).
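The uncompressed video bit rate follows directly from the frame parameters; a one-line sketch:

```python
def video_bitrate(width, height, bits_per_pixel, fps):
    """Uncompressed video bit rate: pixels per frame times color depth
    times frame rate."""
    return width * height * bits_per_pixel * fps

# 640x480 at 24-bit color and 30 frames/sec:
bps = video_bitrate(640, 480, 24, 30)   # 221,184,000 bits/sec (~221 Mbit/s)
```

Rates of this magnitude are why uncompressed video is rarely stored or transmitted, motivating the compression techniques of the next chapter.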
Chapter 3
Compression is enabled by statistical and other properties of most data types. However, some
data types cannot be compressed, e.g., various kinds of noise or encrypted data.
Image: a 1024 × 768 image with 24 bits per pixel requires
1024 × 768 × 24 = 18,874,368 bits.
Audio: CD quality, a sampling rate of 44.1 kHz and 16 bits per sample, results in
44.1 × 16 = 706 kbit/s (mono).
Stereo: 1,412 kbit/s.
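These figures follow directly from the parameters; a short sketch that reproduces them:

```python
def image_bits(width, height, bits_per_pixel):
    """Bits needed to store an uncompressed image."""
    return width * height * bits_per_pixel

def audio_bitrate(sample_rate, bits_per_sample, channels):
    """Bits per second of uncompressed PCM audio."""
    return sample_rate * bits_per_sample * channels

img = image_bits(1024, 768, 24)        # 18,874,368 bits per image
mono = audio_bitrate(44100, 16, 1)     # 705,600 bit/s (~706 kbit/s)
stereo = audio_bitrate(44100, 16, 2)   # 1,411,200 bit/s (~1.4 Mbit/s)
```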
Lossless compression:
In lossless compression (as the name suggests), data are reconstructed after compression
without errors, i.e., no information is lost. Typical application domains where you do not want
to lose information are the compression of text, files, and faxes. In the case of image data, for
medical imaging or the compression of maps in the context of a land registry, no information
loss can be tolerated.
A further reason to stick to lossless coding schemes instead of lossy ones is their lower
computational demand. For all lossless compression techniques there is a well-known
trade-off: Compression Ratio - Coder Complexity - Coder Delay
The original content of the data is not lost or changed when it is compressed (encoded). It is
used mainly for compressing symbolic data such as database records, spreadsheets, texts,
executable programs, etc. Lossless compression can recover the exact original data after
compression; it is used where exact replication of the original is essential and changing even a
single bit cannot be tolerated. Examples: Run-Length Encoding (RLE), Lempel-Ziv (LZ),
Huffman coding.
✓ The model: the data to be compressed are analysed with respect to their structure and
the relative frequency of the occurring symbols.
✓ The encoder: produces a compressed bitstream/file using the information provided by
the model.
✓ The adaptor: uses information extracted from the data (usually during encoding) in
order to adapt the model (more or less) continuously to the data.
Lossy compression:
Lossy compression is generally used for image, audio, and video data. In this compression
technique, the compression process discards some less important data, and an exact replica of
the original file cannot be retrieved from the compressed file. Decompressing the compressed
data yields a close approximation of the original file.
The original content of the data is lost to a certain degree when compressed. For visual and
audio data, some loss of quality can be tolerated without losing the essential nature of the data.
Lossy compression is used for image compression in digital cameras (JPEG), audio
compression (MP3), and video compression on DVDs (MPEG).
Lossy compression removes some image details to achieve a higher compression rate, e.g., by
suppressing higher frequencies.
Ex: JPEG 100:1 Compression rate
Combined with lossless techniques
Trade-off between file size and quality
Compression categories
1. Entropy coding: (Run-length coding, Huffman coding, Arithmetic coding)
✓ Ignoring semantics of the data
✓ Lossless
2. Source coding: (Prediction: DPCM, DM; Transformation: FFT, DCT)
✓ Based on semantic of the data
✓ Often lossy
3. Channel coding:
✓ Adaptation to communication channel
✓ Introduction of redundancy
4. Hybrid coding:
✓ Combination of entropy and source coding, often proprietary
(JPEG, MPEG, H.261, DVI RTV, DVI PLV, ...)
As we can see, we can only expect a good compression rate when long runs occur frequently.
Examples of long runs: blank space in text documents, leading zeros in numbers, and strings of
white in grey-scale images.
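Run-length encoding itself is only a few lines; a minimal sketch:

```python
def rle_encode(data):
    """Run-length encoding: replace each run of identical symbols with a
    (symbol, count) pair. Pays off only when long runs are frequent."""
    if not data:
        return []
    runs = []
    current, count = data[0], 1
    for symbol in data[1:]:
        if symbol == current:
            count += 1
        else:
            runs.append((current, count))
            current, count = symbol, 1
    runs.append((current, count))
    return runs

# A number with leading zeros compresses well:
runs = rle_encode("0000007")   # [('0', 6), ('7', 1)]
```

On data without runs (e.g., "ab"), the encoding is actually larger than the input, which is exactly the trade-off noted above.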
D. Variable length coding:
Classical character codes use the same number of bits for each character. When the frequency
of occurrence differs between characters, we can use fewer bits for frequent characters and
more bits for rare characters.
p_i: the probability that symbol S_i will occur in S.
log2(1/p_i): the amount of information contained in S_i, which corresponds to the number of
bits (code length) needed to encode S_i.
Entropy: H = sum over i of p_i * log2(1/p_i).
Code length: the entropy H specifies the lower bound for the average number of bits needed to
code each symbol in S, i.e., H <= l, where l is the average length (measured in bits) of the
codewords produced by the encoder.
Example:
Probabilities of the characters:
p(A) = 0.3, p(B) = 0.3, p(C) = 0.1, p(D) = 0.15, p(E) = 0.15
Characters with higher probabilities are closer to the root of the tree and thus have shorter
codeword lengths; thus it is an optimal code.
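For the probabilities above, a Huffman tree can be built with a priority queue. The sketch below is illustrative (the helper name is invented) and returns only the codeword lengths, which is enough to see that frequent symbols get shorter codes:

```python
import heapq

def huffman_lengths(probs):
    """Build a Huffman tree over {symbol: probability} and return the
    codeword length of each symbol (depth in the tree)."""
    # Each heap entry: (probability, tiebreak id, {symbol: depth-so-far}).
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)   # two least probable subtrees
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: l + 1 for s, l in {**d1, **d2}.items()}  # one level deeper
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

lengths = huffman_lengths({'A': 0.3, 'B': 0.3, 'C': 0.1, 'D': 0.15, 'E': 0.15})
# A and B (and E) get 2-bit codes; the rarer C and D get 3-bit codes.
```

The resulting average code length, sum of p_i times length_i, is 2.25 bits per symbol, which sits just above the entropy bound of roughly 2.2 bits.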
Static techniques: The dictionary exists before a string is encoded. It is not changed, either in
the encoding or in the decoding process.
Dynamic techniques: The dictionary is created “on the fly” during the encoding and
decoding process.
Lempel and Ziv proposed an especially brilliant dynamic, dictionary-based technique in 1977.
Variants of this technique (such as LZW) are used very widely today for lossless compression.
TIFF (Tagged Image File Format) is based on LZ coding.
LZW coding principle: The current piece of the message can be encoded as a reference to an
earlier (identical) piece of the message. This reference will usually be shorter than the piece
itself. As the message is processed, the dictionary is created dynamically.
LZW Algorithm (the pseudocode made runnable as Python; the logic is unchanged):

def lzw_encode(s, alphabet):
    # initialize the code table with single-character strings
    table = {ch: i + 1 for i, ch in enumerate(alphabet)}
    next_code = len(alphabet) + 1
    w = ""                            # W <- the empty string
    output = []
    for k in s:                       # K <- next character from S
        if w + k in table:
            w = w + k                 # string concatenation
        else:
            output.append(table[w])   # write code(W)
            table[w + k] = next_code  # add table entry for W+K
            next_code += 1
            w = k
    if w:
        output.append(table[w])       # write code for the final W
    return output
Example:
Our alphabet is {A, B, C, D}; we encode the string ABACABA.
In the first step we initialize the string table.
Dictionary
Code String
1 A
2 B
3 C
4 D
We read A from the input. We find A in the table and keep A in the buffer.
We read B from the input into the buffer and now consider AB.
AB is not in the table, so we add AB with index 5, write 1 (for A) to the output, and remove A
from the buffer.
The buffer now contains only B.
Next, we read A, consider BA, add BA as entry 6 to the table, and write 2 (for B) to the output,
etc.
At the end the code table is
1=A
2=B
3=C
4=D
5=AB
6=BA
7=AC
8=CA
9=ABA
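The decoder mirrors this process, rebuilding the same table from the codes alone. Working through the encoder on ABACABA yields the output codes 1, 2, 1, 3, 5, 1; a decoding sketch:

```python
def lzw_decode(codes, alphabet):
    """Rebuild the original string from LZW codes, reconstructing the
    dictionary on the fly exactly as the encoder did."""
    table = {i + 1: ch for i, ch in enumerate(alphabet)}
    next_code = len(alphabet) + 1
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:                         # code refers to the entry being built
            entry = prev + prev[0]
        table[next_code] = prev + entry[0]   # same entry the encoder added
        next_code += 1
        out.append(entry)
        prev = entry
    return "".join(out)

decoded = lzw_decode([1, 2, 1, 3, 5, 1], "ABCD")
```

After decoding, the reconstructed dictionary contains exactly the entries listed above (5=AB, 6=BA, 7=AC, 8=CA, 9=ABA), even though no dictionary was ever transmitted.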
Algorithm:
initial_code();
while not EOF
{
    get(c);
    encode(c);      // identify the symbol or character c
    update_tree(c);
}
initial_code(): assigns symbols some initially agreed-upon codes, without any prior
knowledge of the frequency counts.
update_tree(): constructs an adaptive Huffman tree.
Note:
If any character / symbol is to be sent the first time, it must be preceded by a special symbol,
NEW.
The initial code for NEW is 0. The count for NEW is always kept as 0 (the count is never
increased).
It is important to emphasize that the code for a particular symbol changes during the adaptive
Huffman coding process.
Chapter 4
4. Storage of Multimedia
Optical disks offer far greater capacity than magnetic media. Besides higher capacity,
optical-storage technology also delivers more authentic duplication of sounds and images.
Optical disks are also inexpensive to make: the plastic disks are simply molds pressed from a
master, as phonograph records are. The data on them cannot be destroyed by power outages
or magnetic disturbances, the disks themselves are relatively impervious to physical damage,
and unlike magnetic disks and tapes, they need not be kept in tightly sealed containers to
protect them from contaminants. Optical-scanning equipment is similarly durable because it
has relatively few moving parts.
Early optical disks were not erasable, i.e., data encoded onto their surfaces could be read but
not erased or rewritten. This problem was solved in the 1990s with the development of WORM
(write once, read many) disks and of writable/rewritable disks.
slower rate of information retrieval compared with conventional magnetic-storage media.
Despite its slowness, its superior capacity and recording characteristics make optical storage
ideally suited to memory-intensive applications, especially those that incorporate still or
animated graphics, sound, and large quantities of text. Multimedia encyclopedias, video games,
training programs, and directories are commonly stored on optical media.
Data Storage:
A data storage device is a device for recording (storing) information (data). Recording can
be done using virtually any form of energy. A storage device may hold information, process
information, or both. A device that only holds information is a recording medium. Devices that
process information (data storage equipment) may use either a separate portable
(removable) recording medium or a permanent component to store and retrieve information.
Electronic data storage is storage which requires electrical power to store and retrieve that
data.
Most storage devices that do not require visual optics to read data fall into this category.
Electronic data may be stored in either an analogue or digital signal format. This type of data is
considered to be electronically encoded data, whether or not it is electronically stored. Most
electronic data storage media (including some forms of computer storage) are considered
permanent (non-volatile) storage; that is, the data will remain stored when power is removed
from the device. In contrast, storage that loses its data when power is removed is volatile
memory.
As more memory and storage space are added to a computer, computing needs and habits
keep pace, quickly filling the new capacity.
On the Macintosh, the minimum RAM configuration for serious multimedia production is about
32 MB; even 64 MB and 256 MB systems are becoming common, because while digitizing audio
or video you can store data much more quickly in RAM. And when you are using some
software, you can quickly chew up available RAM: for example, Photoshop (16 MB minimum,
20 MB recommended), After Effects (32 MB required), Director (8 MB minimum, 20 MB
better), PageMaker (24 MB recommended), Illustrator (16 MB recommended), and Microsoft
Office (12 MB recommended).
In spite of all the marketing hype about processor speed, this speed is ineffective if not
accompanied by sufficient RAM. A fast processor without enough RAM may waste processor
cycles while it swaps needed portions of program code into and out of memory.
In some cases, increasing available RAM may yield more performance improvement on your
system than upgrading the processor chip. On an MPC platform, multimedia authoring can also
consume a great deal of memory. It may be needed to open many large graphics and audio files,
as well as your authoring system, all at the same time to facilitate faster copying/pasting and
then testing in your authoring software. Although 8 MB is the minimum under the MPC
standard, much more is required as of now.
Read-only memory is not volatile. Unlike RAM, when you turn off the power to a ROM chip,
it will not forget, or lose, its memory. ROM is typically used in computers to hold the small
BIOS program that initially boots up the computer, and it is used in printers to hold built-in
fonts. Programmable ROMs (called PROMs) allow changes to be made that are not forgotten. A
newer and inexpensive technology, optical read-only memory (OROM), is provided in
proprietary data cards using patented holographic storage. Typically, OROMs offer 128 MB of
storage, have no moving parts, and use only about 200 milliwatts of power, making them ideal
for handheld, battery-operated devices.
Adequate storage space for the production environment can be provided by large capacity hard
disks; a server-mounted disk on a network; Zip, Jaz or SyQuest removable cartridges; optical
media; CD-R (compact disc-recordable) discs; tape; floppy disks; banks of special memory
devices or any combination of the above.
Removable media (floppy disks, compact or optical discs and cartridges) typically fit into a
letter sized mailer for overnight courier service. One or many disks may be required for
storage and archiving each project, and it is necessary to plan for backups kept off-site.
Floppy disks and hard disks are mass-storage devices for binary data that can be easily read by
a computer. Hard disks can contain much more information than floppy disks and can operate
at far greater data transfer rates. In the scale of things, floppies are, however, no longer “mass
storage” devices. A floppy disk is made of flexible Mylar plastic coated with a very thin layer of
special magnetic material.
A hard disk is actually a stack of hard metal platters coated with magnetically sensitive
material, with a series of recording heads or sensors that hover a hairsbreadth above the
fast-spinning surface, magnetizing or demagnetizing spots along formatted tracks using
technology similar to that used by floppy disks and audio and video tape recording. Hard disks
are the
most common mass-storage device used on computers, and for making multimedia, it is
necessary to have one or more large-capacity hard disk drives.
As multimedia has reached consumer desktops, makers of hard disks have been challenged to
build smaller profile, larger-capacity, faster, and less-expensive hard disks.
In 1994, hard disk manufacturers sold nearly 70 million units; in 1995, more than 80 million
units, and prices have dropped a full order of magnitude in a matter of months. As network
and Internet servers increase the demand for centralized data storage requiring terabytes (1
terabyte = 1 trillion bytes), hard disks are being configured into fail-safe redundant arrays
offering built-in protection against crashes.
SyQuest’s 44 MB removable cartridges have been the most widely used portable medium
among multimedia developers and professionals, but Iomega’s inexpensive Zip drives with
their likewise inexpensive 100 MB cartridges have significantly penetrated SyQuest’s market
share for removable media. Iomega’s Jaz cartridges provide a gigabyte of removable storage
media and have fast enough transfer rates for audio and video development. Pinnacle Micro,
Yamaha, Sony, Philips and others offer CD-R “burners” for making write-once CDs, and some
double as quad-speed players. As blank CD-R discs become available for less than a dollar each,
this write-once medium competes as a distribution vehicle.
Magneto-optical (MO) drives use a high-power laser to heat tiny spots on the metal oxide
coating of the disk. While the spot is hot, a magnet aligns the oxides to provide a 0 or 1 (on or
off) orientation. Like SyQuests and other Winchester hard disks, this is rewritable technology,
because the spots can be repeatedly heated and aligned. Moreover, this media is normally not
affected by stray magnetism (it needs both heat and magnetism to make changes), so these
disks are particularly suitable for archiving data. The data transfer rate is, however, slow
compared to Zip, Jaz and SyQuest technologies.
One of the most popular formats uses a 128 MB-capacity disk-about the size of a 3.5 inch
floppy. Larger-format MO drives with 5.25 inch cartridges offering 650 MB to 1.3 GB of storage
are also available.
4.1.6 CD-ROM
A Compact Disc or CD is an optical disc used to store digital data, originally developed for
storing digital audio. The CD, available on the market since late 1982, remains the standard
playback medium for commercial audio recordings to the present day, though it has lost
ground in recent years to MP3 players.
An audio CD consists of one or more stereo tracks stored using 16-bit PCM coding at a
sampling rate of 44.1 kHz. Standard CDs have a diameter of 120 mm and can hold
approximately 80 minutes of audio. There are also 80 mm discs, sometimes used for CD singles,
which hold approximately 20 minutes of audio. The technology was later adapted for use as a
data storage device, known as a CD-ROM, and to include record-once and re-writable media
(CD-R and CD-RW respectively).
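The stated capacities follow from the CD audio parameters (16-bit stereo PCM at 44.1 kHz); a sketch:

```python
def cd_audio_bytes(minutes, sample_rate=44100, bits=16, channels=2):
    """Storage needed for uncompressed CD audio (16-bit stereo PCM)."""
    seconds = minutes * 60
    return seconds * sample_rate * channels * bits // 8

full = cd_audio_bytes(80)     # a full 80-minute disc of raw samples
single = cd_audio_bytes(20)   # an 80 mm single (~20 minutes)
```

An 80-minute disc works out to roughly 847 million bytes of raw audio samples, which is why CD-quality audio was such a demanding target for early multimedia storage.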
CD-ROMs and CD-Rs remain widely used technologies in the computer industry as of 2007.
The CD and its extensions have been extremely successful: in 2004, the worldwide sales of CD
audio, CD-ROM, and CD-R reached about 30 billion discs. By 2007, 200 billion CDs had been
sold worldwide.
4.1.8 CD recorders
With a CD recorder, you can make your own CDs using special CD recordable (CD-R) blank
optical discs to create a CD in most formats of CD-ROM and CD-Audio. The machines are made
by Sony, Philips, Ricoh, Kodak, JVC, Yamaha, and Pinnacle. Software, such as Adaptec's Toast
for Macintosh or Easy CD Creator for Windows, lets you organize files on your hard disk(s) into
a “virtual” structure, then writes them to the CD in that order. The CD-R discs are made
differently than normal CDs but can play in any CD-Audio or CD-ROM player. They are
available in either a "63-minute" or "74-minute" capacity; the former holds about 560 MB, and
the latter about 650 MB. These write-once CDs make excellent high-capacity file archives and
are used extensively by multimedia developers for premastering and testing CD-ROM projects
and titles.
4.1.9 Videodisc players
Videodisc players (commercial, not consumer quality) can be used in conjunction with the
computer to deliver multimedia applications. You can control the videodisc player from your
authoring software with X-Commands (XCMDs) on the Macintosh and with MCI commands in
Windows.
Chapter 5
A multimedia DBMS provides support for multimedia data types, plus facilities for the creation,
storage, access, query, and control of the multimedia database. The different data types
involved in multimedia databases might require special methods for optimal storage, access,
indexing, and retrieval. The multimedia DBMS should accommodate these special
requirements by providing high-level abstractions to manage the different data types, along
with a suitable interface for their presentation.
A multimedia database management system provides a suitable environment for using and
managing multimedia database information. Therefore, it must support the various multimedia
data types, in addition to providing facilities for traditional DBMS functions like database
definition and creation, data retrieval, data access and organization, data independence,
privacy, integration, integrity control, version control, and concurrency support. The functions
of a multimedia DBMS basically resemble those of a traditional DBMS.
➢ Integration: ensures that data items need not be duplicated during different program
invocations requiring the data.
➢ Data independence: separation of the database and the management functions from
the application programs.
➢ Concurrency control: ensures multimedia database consistency through rules, which
usually impose some form of execution order on concurrent transactions.
➢ Persistence: the ability of data objects to persist (survive) through different
transactions and program invocations.
➢ Privacy: restricts unauthorized access and modification of stored data.
➢ Integrity control: ensures consistency of the database state from one transaction to
another through constraints imposed on transactions.
➢ Recovery: methods needed to ensure that results of transactions that fail do not affect
the persistent data storage.
➢ Query support: ensures that the query mechanisms are suited to multimedia data.
➢ Version control: organization and management of different versions of persistent
objects, which might be required by applications.
certain rate offering the quality of service above a certain threshold. Here data is
processed as it is delivered. Example: Annotating of video and audio data, real-time
editing analysis.
3. Collaborative work using multimedia information – It involves executing a complex
task by merging drawings and exchanging change notifications. Example: an intelligent
healthcare network.
There are still many challenges to multimedia databases, some of which are :
✓ Modelling – Work in this area can improve database and information retrieval
techniques; documents constitute a specialized area and deserve special
consideration.
✓ Design – The conceptual, logical, and physical design of multimedia databases has not
yet been addressed fully, as performance and tuning issues at each level are far more
complex: the data come in a variety of formats (JPEG, GIF, PNG, MPEG) that are not
easy to convert from one form to another.
✓ Storage – Storage of multimedia database on any standard disk presents the problem
of representation, compression, mapping to device hierarchies, archiving and buffering
during input-output operations. In a DBMS, a "BLOB" (Binary Large Object) facility allows
untyped bitmaps to be stored and retrieved.
✓ Performance – For an application involving video playback or audio-video
synchronization, physical limitations dominate. The use of parallel processing may
alleviate some problems but such techniques are not yet fully developed. Apart from
this multimedia database consume a lot of processing time as well as bandwidth.
✓ Queries and retrieval –For multimedia data like images, video, audio accessing data
through query opens up many issues like efficient query formulation, query execution
and optimization which need to be worked upon.
77
(2) Annotated & indexed archives:
• Search by keyword; see e.g.
– Image archive: Gimp-Savvy
– Audio archive: Spotify
– Spatial DB: Google Maps
– Video archive: YouTube
(3) MM archive as part of a wider application
• Browsing/search by keyword, plus related actions, e.g.
– Web shop: Amazon
(4) 'True' MM databases:
• General queries by media content, e.g.
– Melody search: Musipedia
• 'NoSQL' databases ('Not only SQL')
• Improved retrieval speed for very large quantities of data.
• Restricted update types (mainly append); restricted transaction support (relaxed consistency requirements).
Ideal for MMDB:
• Extensible database system with OO capabilities
• Support for queries and transactions involving MM objects
• Support for complex objects with MM subobjects
5.3.2. Distributed MMDBMS Architecture
Storage Manager: handles the storage and retrieval of the different media objects composing a database. The metadata as well as the index information related to media objects are also handled by this module.
Metadata Manager: deals with creating and updating the metadata associated with multimedia objects. The metadata manager provides relevant information to the query processor in order to process a query.
Index Manager: supports the formulation and maintenance of fast access structures for multimedia information.
Data Manager: helps in the creation and modification of multimedia objects. It also helps in handling the temporal and spatial characteristics of media objects. The metadata manager, index manager, and data manager access the required information through the storage manager.
Query Processor: receives and processes user queries. This module uses information provided by the metadata manager, index manager, and data manager to process a query. In the case where objects are to be presented to the user as part of the query response, the query processor identifies the sequence of media objects to be presented. This sequence, along with the temporal information for object presentation, is passed back to the user.
Communication Manager: handles the interface to the computer network for server-client interaction. Depending on the services offered by the network service provider, this module reserves the necessary bandwidth for server-client communication and transfers queries and responses between the server and the client.
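The server-side modules above can be sketched as cooperating components. This is a minimal illustration only; all class names, method names, and the keyword-index layout are assumptions for the sketch, not part of any real MMDBMS:

```python
# Hypothetical sketch of the server-side module wiring described above.
class StorageManager:
    """Single point of access for stored media objects, metadata and indexes."""
    def __init__(self):
        self.store = {}

    def put(self, key, value):
        self.store[key] = value

    def get(self, key):
        return self.store.get(key)

class MetadataManager:
    def __init__(self, storage):
        self.storage = storage

    def describe(self, object_id):
        # Metadata is fetched through the storage manager, as in the text.
        return self.storage.get(("meta", object_id))

class IndexManager:
    def __init__(self, storage):
        self.storage = storage

    def lookup(self, keyword):
        return self.storage.get(("index", keyword)) or []

class QueryProcessor:
    """Uses index and metadata information to answer a keyword query."""
    def __init__(self, metadata, index):
        self.metadata = metadata
        self.index = index

    def query(self, keyword):
        hits = self.index.lookup(keyword)
        # Return object ids with their metadata, in presentation order.
        return [(oid, self.metadata.describe(oid)) for oid in hits]

# Wiring: managers share one storage manager; the query processor uses both.
storage = StorageManager()
storage.put(("meta", "clip1"), {"type": "video", "duration_s": 30})
storage.put(("index", "goal"), ["clip1"])
qp = QueryProcessor(MetadataManager(storage), IndexManager(storage))
print(qp.query("goal"))  # -> [('clip1', {'type': 'video', 'duration_s': 30})]
```

Note how the metadata and index managers go through the storage manager rather than holding data themselves, mirroring the access pattern described in the text.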
❖ Retrieval Schedule Generator: based on factors such as the throughput offered by the network, determines the schedule for object retrieval. This retrieval schedule is used by the communication manager to download media objects in the specified sequence.
❖ Response Handler: interacts with the client communication manager in order to identify the type of response generated by the server. If the response comprises information about the objects to be presented, the information is passed to the retrieval schedule generator for determining the retrieval sequence. If the response comprises information about modifying the posed query, or a null answer, the response is passed to the user.
❖ Interactive Query Formulator: helps a user frame an appropriate query to be communicated to a database server. This module takes the response from the server (through the response handler module) in order to reformulate the query, if necessary.
Hardware resources: The available hardware resources influence both client and server design. On the server side, the available disk space limits the size of the multimedia database that can be handled. The number of queries that can be handled and the query response time depend, to a certain extent, on the speed of the system. On the client side, buffer availability for media object retrieval is influenced by the available hardware resources. The user interface for multimedia presentation is also influenced by hardware characteristics such as monitor resolution, width and height.
Operating System: Multimedia databases are composed of continuous media objects. The retrieval and presentation of these media objects have associated time constraints. Hence, real-time features might be needed as part of the operating system. Also, the file system needs to handle large media objects.
Computer Network: The services offered by computer networks influence the retrieval of media objects by a client system. If guaranteed throughput is not offered by the network service provider, the client may not be able to retrieve objects at the required time. This may lead to an object presentation sequence that differs from the temporal requirements specified in the database. Also, the buffer requirements in the client system depend on the throughput offered by the network service provider.
• The web itself and search engines may be considered a kind of (poorly organized) digital library.
(g) Medical information systems
• Patient data, including X-rays, ECG curves, MRI images, ...
• Strict confidentiality
• Used for diagnosis, monitoring and research
• Automated tools: image/signal processing, pattern recognition, ...
http://www.google.com/mobile/goggles/#landmark
Application: real-time skin detection for human recognition
Indexing and Searching Methods
Traditional database indexing and searching methods:
– Object identifiers
– Attributes
– Keywords ...
– These can be used in an MDBMS, but this alone is not enough!
Content-based retrieval is computationally intensive. Hence, there is a need for powerful indexing and searching schemes.
Contents of MMDB
• Media data – the original content
• Metadata:
– Media format data, e.g. size, compression details
– Media keyword data, e.g. name, author, date
– Media feature data, e.g. scene, background, other contextual details
• Indexing can be done on all the above (format, keyword, feature) metadata
Example of Metadata (Feature)
• Image data:
– Content features
• Color, shape, texture, spatial information
• Video data:
– A video sequence can be viewed as a set of continuous video frames, some of which are representative (key frames)
• Audio data:
– Acoustic characteristics of the audio data
So while indexing we need to consider different levels:
– High-level
– Mid-level
– Low-level
Traditional DBMSs are designed and implemented to find exact matches between the stored data and the query parameters for precise queries. An MDBMS needs to handle incomplete and fuzzy queries.
A multimedia object is amenable to different interpretations by different users, and the representation of a multimedia object in an MDBMS is always less than complete. Therefore there is a need for interactive retrieval.
Feature extraction and indexing of images
Extraction of descriptive attributes from images:
• Manually, automatically, or using a hybrid scheme (automatic segmentation and manual assignment of properties).
Manual indexing:
Performed by a 'knowledge worker' trained on the patterns and vocabulary of the image database application. Multiple indexers require strict consistency rules and a common glossary. Each interesting object is presented manually to the system for indexing, equipped with descriptive attributes. This is time-consuming and costly; see http://gimp-savvy.com/PHOTO-ARCHIVE
Automatic indexing:
Specialized for various application domains (document recognition, optical character recognition (OCR), engineering drawings, X-rays, ...). The system must first 'learn' and categorize the domain's element objects. A certain amount of uncertainty (fuzziness) must be tolerated.
An important area of automatic image analysis and object recognition is the transformation of paper documents into digital form, and indexing those documents appropriately (so-called document imaging → digital libraries).
• Object relation: objects (segments, rectangles) within images, extracted manually or automatically.
– Attributes include: image id, object id, MBR coordinates, features.
• Queries: apply 'normal' database techniques using feature values in query conditions.
General observations:
In non-spatial respects, images can usually be treated as documents and retrieved using techniques developed for general information retrieval. Combined use of spatial and non-spatial criteria in retrieval is achieved simply by combining (union, intersection, etc.) the pointer lists from the related indexes.
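Combining the pointer lists from separate indexes is plain set algebra. A minimal sketch, in which both index contents are made-up illustrations:

```python
# Hypothetical indexes: image ids grouped by a spatial criterion and a keyword.
spatial_index = {"contains_rectangle": {3, 5, 9}}
keyword_index = {"bridge": {2, 5, 9, 11}}

def combined_query(spatial_key, keyword):
    # Intersection = images satisfying both criteria;
    # union (|) would relax the query to "either criterion".
    return spatial_index[spatial_key] & keyword_index[keyword]

print(sorted(combined_query("contains_rectangle", "bridge")))  # -> [5, 9]
```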
5.4.1. Data Models: Data Model Requirements
The role of a data model in DBMSs is to provide a framework (or language) to express the properties of the data items that are to be stored and retrieved using the system. The framework should allow designers and users to define, insert, delete, modify, and search database items and properties. In MIRSs and MMDBMSs, the data model assumes the additional role of specifying and computing different levels of abstraction from multimedia data.
Multimedia data models capture the static and dynamic properties of the database items, and
thus provide a formal basis for developing the appropriate tools needed in using the
multimedia data. The static properties could include the objects that make up the multimedia
data, the relationships between the objects, and the object attributes. Examples of dynamic
properties include those related to interaction between objects, operations on objects, user
interaction, and so forth.
The richness of the data model plays a key role in the usability of the MIRS. Although the basic multimedia data types must be supported, they only provide the foundation on which to build additional features.
The key issues are how to specify spatial and temporal relationships. Spatial relationships are specified by the display window size and position for each item. The common method for temporal specification is timeline-based specification, in which the starting time and duration of each item are specified against a common clock. Other methods include scripts and event-driven models.
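Timeline-based specification can be sketched as a list of (item, start, duration) entries on a common clock. The item names and times below are illustrative assumptions:

```python
# Each presentation item gets a start time and a duration on a common clock.
presentation = [
    {"item": "title_text",  "start": 0.0, "duration": 5.0},
    {"item": "intro_audio", "start": 0.0, "duration": 12.0},
    {"item": "main_video",  "start": 5.0, "duration": 30.0},
]

def active_items(t):
    """Items that should be visible/audible at common-clock time t."""
    return [p["item"] for p in presentation
            if p["start"] <= t < p["start"] + p["duration"]]

print(active_items(6.0))  # -> ['intro_audio', 'main_video']
```

A script or event-driven model would instead trigger items on events ("when the video ends, start the credits") rather than on absolute clock times.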
During presentation, the semantics of an object is maintained only when spatial and temporal
relationships are maintained.
The media type layer contains common media types such as text, graphics, image, audio, and video. These media types are derived from a common abstract media class. At this level, features or attributes are specified. Taking the image media type as an example, the image size, color histogram, main objects contained, and so on are specified. These features are used directly for searching and distance calculation.
The media format layer specifies the media formats in which the data is stored. A media type
normally has many possible formats. For example, an image can be in raw bitmapped or
compressed formats. There are also many different compression techniques and standards.
The information contained in this layer is used for proper decoding, analysis and presentation.
Remaining Issues
Note that different applications may need different data models, because different features and objects are used in different applications. But many applications can share a common base model if it is properly designed, and new features and objects can be added to or derived from this base model to meet application requirements.
At the moment, none of the above layers of the data model is completely designed. There is no common standard in this area, partly because no large-scale MIRS has been developed. Most MIRSs are application-specific, focusing on a limited number of features of a limited number of media types. Their data models, if any, are designed in an ad hoc fashion. More work is needed in multimedia data modeling to develop consistent large-scale MIRSs and MMDBMSs.
The VIMSYS model is proposed specifically to manage visual information (images and video). It consists of four layers: the representation layer, the image object layer, the domain object layer, and the domain event layer. All objects in each layer have a set of attributes and methods.
FIG 5.3 The VIMSYS data model
The representation layer contains image data and any transformation that results in an
alternate image representation. Examples of transformations are compression, color space
conversion, and image enhancement. This layer is not very powerful at processing user
queries; it can only handle queries that request pixel-based information. However, this layer
provides the raw data for upper layers to extract and define higher level features.
The image object layer has two sub-layers: the segmentation sub-layer and the feature sub-layer.
The segmentation sub-layer condenses information about an entire image or video into spatial
or temporal clusters of summarized properties. These localized properties can be searched for
directly. This sub-layer relies heavily on image and video segmentation techniques.
The feature sub-layer contains features that are efficient to compute, organizable in a data
structure, and amenable to a numeric distance computation to produce a ranking score. The
common features are color distribution (histogram), object shape, and texture.
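A color histogram illustrates all three feature sub-layer requirements: it is cheap to compute, easy to store, and supports a numeric distance that yields a ranking score. A minimal grayscale sketch (the pixel values and image names are made up):

```python
def histogram(pixels, bins=4, maxval=256):
    """Count pixel intensities into a fixed number of bins (grayscale for brevity)."""
    h = [0] * bins
    for p in pixels:
        h[p * bins // maxval] += 1
    # Normalize so images of different sizes are comparable.
    n = len(pixels)
    return [c / n for c in h]

def l1_distance(h1, h2):
    """L1 distance between two histograms; smaller = more similar."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

query = histogram([10, 20, 200, 210])
db = {"dark": histogram([5, 15, 25, 30]),
      "mixed": histogram([12, 18, 205, 220])}
# Rank database images by distance to the query: most similar first.
ranking = sorted(db, key=lambda name: l1_distance(query, db[name]))
print(ranking)  # -> ['mixed', 'dark']
```

Real systems use finer bins over a color space (e.g. per-channel RGB or HSV histograms), but the organize-then-rank pattern is the same.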
The domain event layer defines events that can be queried by the user. Events are defined
based on motion (velocity of object movement), spatial and/or temporal relationships between
objects, appearance and disappearance of an object, and so on. Event detection and
organization mechanisms are needed to implement this layer.
Chapter 6
• Multimedia systems are integrated.
• The information they handle must be represented digitally.
• The interface to the final presentation of media is usually interactive.
Computer Controlled
Producing the content of the information – e.g. by using authoring tools, an image editor, or sound and video editors. Storing the information – providing large and shared capacity for multimedia information. Transmitting the information – through the network. Presenting the information to the end user – making direct use of computer peripherals such as a display device (monitor) or sound generator (speaker).
Integrated
All multimedia components (audio, video, text, graphics) used in the system must be somehow integrated. Every device, such as a microphone or camera, is connected to and controlled by a single computer. A single type of digital storage is used for all media types. Video sequences are shown on the computer screen instead of a TV monitor.
Interactivity
Level 1: Interactivity strictly on information delivery. Users select the time at which the presentation starts, and the order, speed and form of the presentation itself. Level 2: Users can modify or enrich the content of the information, and this modification is recorded. Level 3: Actual processing of the user's input, with the computer generating genuine results based on that input.
Digitally Represented
Digitization: the process of transforming an analog signal into a digital signal.
Text metadata consists of index features that occur in a document as well as descriptions about
the document. For providing fast text access, appropriate access structures have to be used for
storing the metadata. Also, the choice of index features for text access should be such that it
helps in selecting the appropriate document for a user query. In this section, we discuss the
factors influencing the choice of the index features for text data and the methodologies for
storing them.
Selection of Index Features
The index features should be chosen so that they describe the documents in as unique a manner as possible. The definitions of document frequency and inverse document frequency describe the characteristics of index features.
Methodologies for Text Access: Once the indexing features for a set of text documents are determined, appropriate techniques must be designed for storing and searching the index features. The efficiency of these techniques directly influences the response time of searches. Here, we discuss the following techniques:
• Full Text Scanning: The easiest approach is to search the entire set of documents for the
queried index feature(s). This method, called full text scanning, has the advantage that the
index features do not have to be identified and stored separately. The obvious disadvantage is
the need to scan the whole document(s) for every query.
• Inverted Files: Another approach is to store the index features separately and check the stored features for every query. A popular technique, termed inverted files, is used for this purpose.
• Document Clustering: Documents can be grouped into clusters, with the documents in each
cluster having common indexing features.
document. In the case of a mismatch, the position of the search in the document is shifted right by one, and the search continues until either the feature is found or the end of the document is reached. Though the algorithm is very simple, it suffers from the number of comparisons that must be made to locate the feature.
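The shift-right-by-one search described above is naive string matching. A minimal sketch:

```python
def full_text_scan(document, feature):
    """Compare the feature against the document at each position,
    shifting right by one on a mismatch (naive string matching)."""
    n, m = len(document), len(feature)
    for shift in range(n - m + 1):
        if document[shift:shift + m] == feature:
            return shift          # feature found at this position
    return -1                     # end of document reached without a match

print(full_text_scan("multimedia database", "data"))  # -> 11
```

The comparison cost the text mentions is the worst case O(n·m); algorithms such as Knuth-Morris-Pratt or Boyer-Moore reduce it by shifting more than one position at a time.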
Inverted Files
Inverted files are used to store search information about a document or a set of documents.
The search information includes the index feature and a set of postings. These postings point to
the set of documents where the index features occur. Access to an inverted file is based on a single key, and hence efficient access to the index features should be supported. The index features can be sorted alphabetically, stored in the form of a hash table, or stored using a sophisticated mechanism such as a B-tree.
Text Retrieval Using Inverted Files
Index features in user queries are searched by comparing them with the ones stored in the inverted files, using B-tree searching or hashing, depending on the technique used in the inverted file. The advantage of inverted files is that they provide fast access to the features and reduce the response time for user queries. The disadvantage is that the size of the inverted files can become very large when the number of documents and index features grows. Also, the cost of maintaining the inverted files (updating and reorganizing the index files) can be very high.
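A minimal inverted-file sketch, with a plain dict standing in for the sorted/hashed/B-tree access structure and made-up document contents:

```python
from collections import defaultdict

def build_inverted_file(documents):
    """Map each index feature to its postings (ids of documents containing it)."""
    inverted = defaultdict(set)
    for doc_id, text in documents.items():
        for feature in text.lower().split():
            inverted[feature].add(doc_id)
    return inverted

docs = {1: "multimedia database systems",
        2: "relational database design"}
inv = build_inverted_file(docs)
print(sorted(inv["database"]))    # -> [1, 2]
print(sorted(inv["multimedia"]))  # -> [1]
```

The maintenance cost mentioned above shows up here as well: adding or deleting a document touches one posting set per distinct feature in that document.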
Multi-attribute Retrieval
When a query for searching a text document consists of more than one feature, different
techniques must be used to search the information. Consider a query used for searching a book
titled 'Multimedia database management systems'. Here, four key words (or attribute values)
are specified: 'multimedia', 'database', 'management', and 'systems'. Each attribute is hashed to
give a bit pattern of fixed length and the bit patterns for all the attributes are superimposed
(boolean OR operation) to derive the signature value of the query.
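The superimposed-coding step can be sketched as follows; the pattern length, bit count, and use of MD5 as the hash are illustrative assumptions, not the book's prescription:

```python
import hashlib

def bit_pattern(word, length=16, bits=3):
    """Deterministically set a few bits of a fixed-length pattern for a word."""
    digest = hashlib.md5(word.encode()).digest()
    pattern = 0
    for i in range(bits):
        pattern |= 1 << (digest[i] % length)
    return pattern

def signature(words):
    """Superimpose (boolean OR) the bit patterns of all attributes."""
    sig = 0
    for w in words:
        sig |= bit_pattern(w)
    return sig

query_sig = signature(["multimedia", "database", "management", "systems"])
record_sig = signature(["multimedia", "database", "management", "systems", "book"])
# A stored record can match only if its signature contains every query bit.
print(query_sig & record_sig == query_sig)  # -> True
```

Note that signatures only filter candidates: unrelated words can collide on the same bits, so every candidate still needs a final exact check.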
Searching from an image database
1. Using a hierarchical classification of images:
The user follows paths in the hierarchy,
Ex.
Art works
Paintings
France
18th century
2. Search using keywords in metadata:
Images can be considered similar to documents with index terms.
3. Search by content features (sample given):
Pattern matching based on similarity with a query image: shape, color distribution, etc.
Image segmentation
Detection of interesting regions within images. A segment is a connected region that satisfies a homogeneity predicate. Segmentation is the basis for subsequent search, and one of the most difficult tasks in image processing.
Connected region:
◼ For each pair (x1, y1), (xn, yn) of pixels, there exists a chain of pixels {(x1, y1), ..., (xn, yn)} in the region such that (xi, yi) and (xi+1, yi+1) are adjacent for all i.
Homogeneity predicate:
◼ p % of the pixels of the connected region have the same color
Similarity search: Find images (segments) similar to a given query image (segment).
Applications: recognition of persons from photographs / fingerprints, recognition of military
airplanes, ships, etc.
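The connected-region and homogeneity-predicate definitions above combine into a region-growing sketch. The tiny grid and the "same value as the seed" predicate are illustrative simplifications:

```python
def grow_region(image, seed):
    """Grow a connected region from a seed pixel; the homogeneity predicate
    here is simply "same value as the seed"."""
    rows, cols = len(image), len(image[0])
    target = image[seed[0]][seed[1]]
    region, stack = set(), [seed]
    while stack:
        x, y = stack.pop()
        if (x, y) in region or not (0 <= x < rows and 0 <= y < cols):
            continue
        if image[x][y] != target:          # homogeneity predicate fails
            continue
        region.add((x, y))
        # 4-adjacency keeps the region connected, as in the definition above.
        stack += [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return region

img = [[1, 1, 0],
       [0, 1, 0],
       [0, 1, 1]]
print(sorted(grow_region(img, (0, 0))))
# -> [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2)]
```

A p%-style predicate, as in the text, would tolerate some off-color pixels instead of demanding exact equality.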
6.2. Similarity measures during searching
The image DB user should be able to show the system a sample image, paint one interactively on the screen, or just sketch the outline of an object. The system should then be able to return similar images, or images containing similar objects.
A sample query:
SELECT CONTENTS(image),
       QbScoreFromStr('averageColor=<255,0,0>', image) AS score
FROM signs
ORDER BY score
Result of the search: the system retrieves all images related to the draft image (sketch).
Relevance Feedback in CBIR
Motivation:
Human perception of image similarity is subjective, semantic, and task-dependent. CBIR based on the similarity of pure visual features is not necessarily perceptually or semantically meaningful. Each type of visual feature tends to capture only one aspect of an image property, and it is usually hard for a user to specify clearly how the different aspects should be combined. Relevance feedback is introduced to address these problems: it makes it possible to establish a link between high-level concepts and low-level features.
RF is a supervised active learning technique used to improve the effectiveness of information systems.
Main idea: use positive and negative examples from the user to improve system performance.
Precision p:
The proportion of retrieved items that are relevant. For a ranked list of n items, p = g(n) / n, where g(n) is the number of items in the list relevant to the query.
Recall r:
The proportion of relevant documents retrieved from the DB.
r = (number of retrieved documents that are relevant) / (total number of relevant documents in the database)
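The two measures can be computed directly from a result list and a ground-truth set; the document ids below are made up:

```python
def precision(retrieved, relevant):
    """Proportion of retrieved items that are relevant."""
    hits = len(set(retrieved) & relevant)
    return hits / len(retrieved)

def recall(retrieved, relevant):
    """Proportion of relevant items that were retrieved."""
    hits = len(set(retrieved) & relevant)
    return hits / len(relevant)

relevant = {"d1", "d3", "d5", "d7"}     # ground-truth relevant documents
retrieved = ["d1", "d2", "d3"]          # what the system returned
print(precision(retrieved, relevant))   # 2 of 3 retrieved are relevant
print(recall(retrieved, relevant))      # 2 of 4 relevant were retrieved -> 0.5
```

Relevance feedback aims to raise both numbers over successive rounds by reshaping the query from the user's positive and negative examples.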
Querying mechanism: Example
A Sample Multimedia Scenario
Consider a police investigation of a large-scale drug operation. This investigation may generate the following types of data:
– Video data captured by surveillance cameras that record the activities taking place at various locations.
– Audio data captured by legally authorized telephone wiretaps.
– Image data consisting of still photographs taken by investigators.
– Document data seized by the police when raiding one or more places.
– Structured relational data containing background information, back records, etc., of the suspects involved.
– Geographic information system data containing geographic data relevant to the drug investigation being conducted.
Possible Queries
Image Query (by example):
◼ Police officer Rocky has a photograph in front of him.
◼ He wants to find the identity of the person in the picture.
◼ Query: “Retrieve all images from the image library in which the person appearing in the
(currently displayed) photograph appears”
Video Query:
◼ Police officer Rocky is examining a surveillance video of a particular person being fatally
assaulted by an assailant. However, the assailant's face is occluded and image
processing algorithms return very poor matches. Rocky thinks the assault was by
someone known to the victim.
◼ Query: “Find all video segments in which the victim of the assault appears.”
◼ By examining the answer of the above query, Rocky hopes to find other people who
have previously interacted with the victim.
MM Database Architectures
1. Based on the Principle of Autonomy
Each media type is organized in a media-specific manner suitable for that media type, so joins need to be computed across different data structures. Query processing is relatively fast thanks to the specialized structures, and this is the only choice for legacy data banks.
2. Based on the Principle of Uniformity
A single abstract structure is used to index all media types, by abstracting out the common part of the different media types (difficult!) – the metadata. A single structure is, however, easy to implement.
QUERY PROCESSING
Query on the content of the media information (Example query: Show the details of the movie where a cartoon character says: 'Somebody poisoned the water hole'). The content of media information is described by the metadata associated with media objects. Hence, these queries have to be processed by accessing the metadata directly and then the media objects.
Query by example (QBE) (Example query: Show me the movie which contains this song). QBEs have to be processed by finding a similar object that matches the one in the example. The query processor has to identify exactly which characteristics of the example object the user wants to match. Consider the query: Get me the images similar to this one. The similarity matching required by the user can be on texture, colour, spatial characteristics (the position of objects within the example image) or the shape of the objects present in the image. Also, the matching can be exact or partial. For partial matching, the query processor has to identify the degree of mismatch that can be tolerated. The query processor then applies the cluster generation function to the example media object; these cluster generating functions map the example object into an m-dimensional feature space. The query processor has to identify the objects that are mapped within a distance d in the m-dimensional feature space. Objects within this distance d are retrieved with a certain measure of confidence and are presented as an ordered list. Here, the distance d is proportional to the degree of mismatch that can be tolerated.
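The distance-d retrieval step can be sketched as follows; the 2-dimensional feature vectors, object ids, and the choice of Euclidean distance are illustrative assumptions:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def qbe(example_features, database, d):
    """database: {object_id: feature_vector}; d: tolerated degree of mismatch.
    Returns ids within distance d of the example, most confident (closest) first."""
    matches = [(euclidean(example_features, f), oid)
               for oid, f in database.items()
               if euclidean(example_features, f) <= d]
    return [oid for _, oid in sorted(matches)]

db = {"img_a": (0.1, 0.9), "img_b": (0.25, 0.8), "img_c": (0.9, 0.1)}
print(qbe((0.15, 0.85), db, d=0.2))  # -> ['img_a', 'img_b']
```

In practice m is much larger and the linear scan is replaced by a multidimensional index (e.g. an R-tree variant), but the "within distance d, ordered by distance" semantics is the same.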
Time-indexed queries (Example query: Show me the movie 30 minutes after its start). These queries are made on the temporal characteristics of the media objects. The temporal characteristics can be stored using segment index trees. The query processor has to process time-indexed queries by accessing the index information stored using segment trees or other similar methods.
Spatial queries (Example query: Show me the image where President Yeltsin is seen to the left of President Clinton). These are made on the spatial characteristics associated with media objects. The spatial characteristics can be generated as metadata information, which the query processor accesses to generate the response.
Application-specific queries (Example query: Show me the video where the river changes its course). Application-specific descriptions can be stored as metadata information. The query processor can access this information to generate the response.
Queries in multimedia databases may involve references to multiple media objects. The query processor may have different options for selecting the media database that is to be accessed first. As a simple case, the figure describes the processing of a query that references a single medium, text.
Assuming the existence of metadata for the text information, the index file is accessed first. Based on the text document selected by accessing the metadata, the information is presented to the user.
When the query references more than one medium, the processing can be done in different ways. The figure above describes one possible way of processing a query that references multiple media: text and image. Assuming that metadata is available for both text and image data, the query can be processed in two different ways:
• The index file associated with the text information is accessed first, to select an initial set of documents. This set of documents is then examined to determine whether any document contains the image object specified in the query. This implies that documents carry information regarding the images they contain.
• The index file associated with the image information is accessed first, to select a set of images. The information associated with this set of images is then examined to determine whether the images are part of any document. This strategy assumes that the information regarding the containment of images in documents is maintained as a separate information base.
• Finding synonyms of words, and searching for the synonyms as well
• Finding all the various morphological forms of words by stemming each word in the
search query
• Fixing spelling errors and automatically searching for the corrected form or suggesting
it in the results
• Re-weighting the terms in the original query
• Creating a dictionary of expansion terms for each term, and then looking up the dictionary for expansion
An early study used WordNet for query expansion and reported negative results: synonym terms were added to the query, and it was observed that this approach makes little difference in retrieval effectiveness if the original query is well formed.
WordNet is also used by Smeaton et al., along with POS tagging, for query expansion. Each query term was expanded independently and equally. Interestingly, they completely ignored the original query terms after expansion. As a result, precision and recall dropped, but they were able to retrieve documents which did not contain any of the query terms yet were relevant to the query.
6.5.2. QUERY LOGS BASED EXPANSION
With the increase in usage of Web search engines, it is easy to collect and use user query logs. Query logs are maintained by each search engine in order to analyze the behavior of users while they interact with the search engine. These approaches use query logs to analyze the user's preferences and add corresponding terms to the query. This method can fail when the user wants to search for something that is not at all related to earlier searches.
6.5.3. QUERY EXPANSION USING WIKIPEDIA
Wikipedia is the biggest encyclopedia freely available on the web, and its content is well structured and correct. Many works suggest the use of Wikipedia as a source for query expansion [1, 11, 20, 21, 22]. Li et al. [2007] [13] proposed query expansion using Wikipedia by utilizing the category assignments of its articles. The base query is run against a Wikipedia collection and each category is assigned a weight proportional to the number of top-ranked articles assigned to it. Articles are then re-ranked based on the sum of the weights of the categories to which each belongs.
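The category-weighting idea can be sketched as follows; the articles, categories, and initial top-ranked list are made-up illustrations of the mechanism, not data from the cited work:

```python
from collections import Counter

def rerank(top_articles, article_categories):
    """Weight each category by how many top-ranked articles it contains,
    then re-rank all articles by the summed weights of their categories."""
    weights = Counter(c for a in top_articles for c in article_categories[a])
    score = {a: sum(weights[c] for c in article_categories[a])
             for a in article_categories}
    return sorted(article_categories, key=lambda a: -score[a])

cats = {
    "A": {"databases", "multimedia"},
    "B": {"databases"},
    "C": {"history"},
}
# A and B were top-ranked for the base query, so their categories gain weight.
print(rerank(["A", "B"], cats))  # -> ['A', 'B', 'C']
```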
Reference Materials
1. Ralf Steinmetz and Klara Nahrstedt, Multimedia Fundamentals: Media Coding and Content Processing; Prentice Hall.
2. Ze-Nian Li and M. S. Drew, Fundamentals of Multimedia; Prentice Hall, 2004.
3. G. Lu, Multimedia Database Management Systems; Artech House, 1999.
4. K. R. Rao, Zoran S. Bojkovic, Dragorad A. Milovanovic, Multimedia Communication Systems; Prentice Hall, 2002.
5. Isaac V. Kerlow, The Art of 3D Computer Animation and Effects (3rd ed.); Wiley, 2