FOURIER SERIES
Any periodic function can be expressed as the sum of a
series of sines and cosines (of varying amplitudes)
SQUARE WAVE
[figure: partial sums of the square wave's odd harmonics — frequencies f; f + 3f; f + 3f + 5f; and so on — the approximation sharpens as more harmonics are added]
FOURIER SERIES
A function f(x) can be expressed as a series of sines and cosines:
Transform means:
To change completely the appearance or character of something or
someone, especially so that the thing or person is improved.
Inverse DFT:
The complex numbers F0 … FN−1 are transformed
into complex numbers f0 … fN−1
DFT EXAMPLE
Interpreting a DFT can be slightly difficult, because the DFT of
real data includes complex numbers.
Basically:
“Negative” Frequencies
DFT: Magnitude
Sampled data:
f(x) = 2 sin(x+45) + sin(3x)
DFT: Magnitude
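The magnitude spectrum of the sampled signal above can be reproduced with a short NumPy sketch (a minimal illustration; the sample count N = 64 is an arbitrary choice, and the 45° phase is converted to radians):

```python
import numpy as np

N = 64
x = 2 * np.pi * np.arange(N) / N                 # one full period, N samples
f = 2 * np.sin(x + np.radians(45)) + np.sin(3 * x)

F = np.fft.fft(f)                                # forward DFT
mag = np.abs(F) / N                              # normalised magnitude spectrum

# Energy concentrates in bins 1 and 3 and in their "negative-frequency"
# mirrors at N-1 and N-3, which is why the magnitude plot looks symmetric.
print(sorted(np.argsort(mag)[-4:]))              # [1, 3, 61, 63]
```

The four largest magnitudes sit at bins 1 and 3 (the two sinusoids) and their mirrors, matching the "negative frequencies" note above.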
FAST FOURIER
TRANSFORM
Computing the Discrete Fourier Transform directly from its definition requires O(n²) time for n samples; the Fast Fourier Transform produces the same result in O(n log n).
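A direct way to see the cost difference: the definition-based DFT below builds an n × n matrix (O(n²) work), while `np.fft.fft` computes the identical result in O(n log n). The helper name `naive_dft` is ours, not from the text:

```python
import numpy as np

def naive_dft(f):
    """O(n^2) DFT straight from the definition: F_k = sum_n f_n e^{-2*pi*i*k*n/N}."""
    n = len(f)
    k = np.arange(n)
    W = np.exp(-2j * np.pi * np.outer(k, k) / n)   # n x n matrix of twiddle factors
    return W @ f

f = np.random.default_rng(0).standard_normal(256)
assert np.allclose(naive_dft(f), np.fft.fft(f))    # same answer, very different cost
```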
discontinuity
DCT TYPES
DCT Type II
In MP3, the data is overlapped so that half the data from
one sample set is reused in the next.
This is known as the Modified DCT (MDCT).
Some tasks are much easier to handle in the frequency domain than in the
time domain.
Eg: graphic equalizer. We want to boost the bass:
1. Transform to frequency domain.
2. Increase the magnitude of low frequency components.
3. Transform back to time domain.
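The three steps above can be sketched as follows (a minimal illustration; the 200 Hz cutoff and 2x gain are arbitrary choices, not values from the text):

```python
import numpy as np

def boost_bass(samples, rate_hz, cutoff_hz=200.0, gain=2.0):
    """Graphic-equalizer-style bass boost done in the frequency domain."""
    F = np.fft.rfft(samples)                         # 1. transform to frequency domain
    freqs = np.fft.rfftfreq(len(samples), 1 / rate_hz)
    F[freqs < cutoff_hz] *= gain                     # 2. boost low-frequency components
    return np.fft.irfft(F, n=len(samples))           # 3. transform back to time domain
```

Components at or above the cutoff pass through unchanged; only the low-frequency bins are scaled.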
[figure: JPEG noise from discarding high-frequency DCT coefficients — original image vs. JPEG image]
MULTIMEDIA PROCESSING IN
COMMUNICATION
ADVANTAGES OF DIGITAL MEDIA
Robustness
Seamless integration
Reusability and interchangeability
Ease of distribution
DIGITAL IMAGE
Can be captured by:
a digital camera
scanning a photograph
Digital images are composed of a collection of pixels arranged in
a 2D matrix; the matrix dimensions are called the image resolution.
Each pixel consists of R, G, B components.
The number of bits used to represent a pixel is called the color depth, which
decides the actual number of colors available for a pixel.
Resolution and color depth determine the presentation quality and
the storage size of the image:
more pixels and more colors mean better quality but larger size.
To reduce the storage, three approaches are used:
1. Index color
This approach reduces the storage size by using a limited number of bits together with a color
lookup table to represent each pixel. Dithering can be used to create additional colors by
blending colors from the palette.
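A nearest-palette-entry lookup is the core of index color; this sketch (with a hypothetical `to_indexed` helper and a toy 3-entry palette) maps each pixel to an index into the color lookup table:

```python
import numpy as np

def to_indexed(image, palette):
    """Map each RGB pixel to the index of its nearest palette entry."""
    # image: (H, W, 3), palette: (K, 3); distance computed per pixel per entry
    diff = image[:, :, None, :].astype(float) - palette[None, None, :, :].astype(float)
    return np.linalg.norm(diff, axis=-1).argmin(axis=-1).astype(np.uint8)

palette = np.array([[0, 0, 0], [255, 255, 255], [255, 0, 0]], dtype=np.uint8)
img = np.array([[[250, 250, 250], [10, 0, 0]]], dtype=np.uint8)
print(to_indexed(img, palette))   # [[1 0]] -> near-white maps to white, near-black to black
```

Storing one small index per pixel instead of three full color components is where the saving comes from.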
2. Color subsampling
Humans perceive color as brightness, hue and saturation rather than as RGB components. To
take advantage of this mechanism of the human eye, light can be separated into luminance
and chrominance instead of RGB components.
This reduces the size by using fewer bits to represent the chrominance components while
leaving the luminance component unchanged.
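As a sketch of this idea, the luminance/chrominance conversion and a 4:2:0-style chroma reduction might look like the following (BT.601 conversion coefficients; the 2x2 averaging is one common subsampling choice):

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """ITU-R BT.601 RGB -> luminance (Y) and chrominance (Cb, Cr)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, cb, cr

def subsample(chroma):
    """Keep one chroma value per 2x2 block: a quarter of the original samples."""
    h, w = chroma.shape
    return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
```

Luminance keeps full resolution; each chroma plane shrinks to a quarter of its original size, which the eye barely notices.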
3. Spatial reduction
Also known as data compression, this reduces the size by discarding the spatial redundancy
within images.
DIGITAL VIDEO
Video is a series of still image frames.
Anything more than about 20 fps appears smooth to the Human Visual System (HVS).
Biggest challenges:
Massive volume of data involved
Meeting real-time constraints on retrieval, delivery and display
Solutions:
Compromise in presentation quality
Video compression
DIGITAL AUDIO
It is designed to make use of the range of human hearing.
Two important considerations:
Frequency response
Dynamic range
• The higher the sampling rate, the more bits per sample and the more channels,
the higher the quality of the audio, and the higher the storage and bandwidth
requirements.
A 44.1 kHz sampling rate, 16-bit quantization and stereo (2-channel)
reception produce CD-quality audio, requiring a bandwidth of
44,100 × 16 × 2 ≈ 1.4 Mb/s.
Telephone-quality audio has an 8 kHz sampling rate, 8-bit
quantization and mono reception, requiring a bandwidth of
8000 × 8 × 1 = 64 kb/s.
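The two bandwidth figures above follow from a single multiplication, sketched here:

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

print(pcm_bitrate(44_100, 16, 2))   # 1411200 -> about 1.4 Mb/s (CD quality)
print(pcm_bitrate(8_000, 8, 1))     # 64000   -> 64 kb/s (telephone quality)
```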
Integrated media systems will only achieve their potential if they are truly integrated
in three key ways: integration of content, integration with human users and
integration with other media systems.
Integration of content: such systems must successfully combine digital video and
audio, text, animation, graphics, etc.
Integration with human users: they must integrate with the individual user through
cooperative, interactive, multidimensional, dynamic interfaces.
Integration with other media systems: integrated media systems must connect with
other such systems.
SIGNAL PROCESSING ELEMENTS
Many classical signal processing methods have been developed, but the
key driver is optimizing the representation of multimedia
components and the associated storage and delivery requirements.
The optimization procedures range from very simple to sophisticated,
such as:
Nonlinear analog (video and audio) mapping
Quantization for analog signal
Statistical characterization
Motion representation and models
3D representations
Color processing
CHALLENGES OF MULTIMEDIA INFORMATION
PROCESSING
•Novel communications and networking are critical for a multimedia database
to support interactive dynamic interface.
•A truly integrated media system must connect with individual users and content
addressable multimedia database.
•Multimedia systems must successfully combine digital video and audio, text
animation, graphics and knowledge about such information units and their
interrelationships in real time.
•The operations of filtering, sampling, spectrum analysis and signal
representation are basic for all signal processing.
•Understanding these operations in the multidimensional (mD) case is a major
activity.
• Algorithms for processing mD signals can be grouped into four
categories:
1. Separable algorithms that use 1D operators to process the rows
and columns of a multidimensional array
2. Non-separable algorithms that borrow their derivation from
their 1D counterparts
3. mD algorithms that are significantly different from their 1D
counterparts
4. mD algorithms that have no 1D counterparts
SEPARABLE ALGORITHMS
•operate on the rows and columns of an mD signal sequentially.
•widely used for image processing because they invariably require
less computation than non-separable algorithms
• examples of separable procedures include the mD DFT, DCTs, and FFT-
based spectral estimation using the periodogram
•separable Finite Impulse Response (FIR) filters can be used in separable
filter banks, wavelet representations for mD signals, and
decimators and interpolators for changing the sampling rate.
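The row/column idea is easy to demonstrate: a 2D DFT computed as 1D FFTs along each axis matches the direct 2D transform:

```python
import numpy as np

def separable_dft2(image):
    """2D DFT computed separably: a 1D FFT down every column, then along every row."""
    return np.fft.fft(np.fft.fft(image, axis=0), axis=1)

img = np.random.default_rng(1).standard_normal((8, 8))
assert np.allclose(separable_dft2(img), np.fft.fft2(img))   # identical results
```

For an N×N image this replaces one 2D operator with 2N applications of a 1D operator, which is the source of the computational saving mentioned above.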
NON-SEPARABLE ALGORITHMS
•They are uniquely mD in that they cannot be decomposed into a
repetition of 1D procedures.
•These can usually be derived by repeating the corresponding 1D
derivation in an mD setting.
•Ex: Upsampling and downsampling
•There are also mD algorithms that have no 1D counterparts,
especially algorithms that perform inversion and computer imaging.
•One of these is the operation of recovering an mD distribution
from a finite set of its projections, equivalently inverting a
discretized Radon transform.
•This is the mathematical basis of computed tomography (Tomography
is imaging by sections or sectioning, through the use of any kind of penetrating wave.)
and positron emission tomography.
•Another imaging method, developed first for geophysical
applications, is Fourier integration.
•Finally signal recovery methods unlike the 1D case are possible.
•The mD signals with finite support can be recovered from the
amplitudes of their Fourier transforms or from threshold crossings.
In graphics (computer-generated images), all objects are built from primitives
such as lines and curves connected to one another.
FACSIMILE MACHINE
PRE AND POST PROCESSING
The hardware available to capture the data should be cheap and
affordable for a large number of users.
It is mandatory to use a preprocessing step prior to coding in
order to enhance the quality of the final pictures and to remove
the noise that will affect the performance of compression
algorithms.
Many solutions have been proposed in the field of imaging.
A more appropriate approach would be to identify the charac-
teristics of the coding scheme when designing such operators.
For example, mobile phones are widely used devices.
Such devices are usually subject to various motions, such as tilting and jitter,
which translate into global motion in the scene due to the motion of the camera.
Here, the pre and post processing plays an important role.
It is normal to expect a certain degree of distortion of the
decoded images for very low-bit rate applications.
An additional postprocessing stage could be added to further reduce
the distortion due to compression.
There are solutions to reduce the effects of ringing,
blurring, mosquito noise, etc.
Recently, advances in postprocessing mechanisms have
improved lip synchronization in head-and-shoulder
video coding at very low bit rates by using
knowledge of the decoded audio to correct the
positions of the speaker's lips.
SPEECH, AUDIO AND ACOUSTIC PROCESSING
Primary advances in speech and audio signal processing are:
Speech and audio signal compression
Speech synthesis
Acoustic processing and echo cancellation
Network echo cancellation
Speech and audio signal compression
Aims at efficient digital representation and reconstruction of speech and audio
signals for storage and playback as well as transmission
Various techniques for signal analysis and compression have been applied to
achieve excellent speech quality even at less than 8kbps, which forms the basis for
cellular as well as Internet telephony
Speech synthesis
includes generation of speech from unlimited text, voice conversion and
modification of speech attributes such as time scaling and articulatory mimic
Key problems are the conversion of text into a sequence of speech units and
methods to concatenate them and reconstruct the sound waveform
Acoustic processing and echo cancellation
Sound pickup and recording is an important area
In sound recording, interference (e.g., ambient noise and reverberation) degrades
the quality
Acoustic processing and echo cancellation includes the modelling of
reverberation, design of dereverberation algorithm, echo suppression,
double talk detection and adaptive echo cancellation
Network echo cancellation
In telephony, the hybrid coil used for two-to-four-wire conversion produces
both near and far echoes.
Adaptive echo cancellation algorithms have been developed to remove them.
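As an illustrative sketch (not any standard's algorithm), a basic LMS adaptive filter estimates the echo path from the far-end signal and subtracts the echo estimate from the microphone signal; the tap count and step size below are arbitrary choices:

```python
import numpy as np

def lms_echo_canceller(far, mic, taps=32, mu=0.01):
    """Minimal LMS adaptive echo canceller: returns the residual (echo removed)."""
    w = np.zeros(taps)                   # adaptive estimate of the echo path
    out = np.zeros(len(mic))
    for n in range(taps, len(mic)):
        x = far[n - taps:n][::-1]        # most recent far-end samples
        e = mic[n] - w @ x               # residual = mic minus echo estimate
        w += mu * e * x                  # LMS weight update
        out[n] = e
    return out

rng = np.random.default_rng(2)
far = rng.standard_normal(5000)
mic = 0.5 * np.roll(far, 3)              # simulated echo: delayed, attenuated far end
mic[:3] = 0.0
residual = lms_echo_canceller(far, mic)
# after convergence the residual energy falls far below the echo energy
print(np.mean(residual[-500:] ** 2) < 0.1 * np.mean(mic[-500:] ** 2))
```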
Video signal processing
• Digital video has many advantages over conventional analog video, including bandwidth compression,
robustness against channel noise, interactivity and ease of manipulation.
• digital video signals come in many formats:
• e.g., broadcast TV signals are digitized in the ITU-R 601 format: 30/25 fps, 720 pixels by 488 lines per
frame, 2:1 interlaced, 4:3 aspect ratio, and 4:2:2 chroma subsampling.
The TV and PC industries have resulted in the approval of 18 different digital video formats in the
United States. Exchange of video signals between TVs and PCs requires effective format
conversion.
Several conversion methods are available, e.g., SIF (source input format) conversion, motion-adaptive
field-rate doubling and deinterlacing, and motion-compensated frame-rate conversion.
Video signals suffer from several degradations and artifacts.
Some of them are objectionable for freeze-frame or print-from-video
applications.
Some filters are adaptive to scene content to preserve spatial and temporal
edges while removing the noise.
Examples of edge preserving filters are: median, weighted
median, adaptive linear mean square error and adaptive
weighted-averaging filtering.
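Of these, the median filter is the simplest to sketch; the pure-NumPy version below removes impulse noise while leaving step edges largely intact:

```python
import numpy as np

def median_filter(img, k=3):
    """Edge-preserving k x k median filter (edge-padded, pure NumPy)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return np.median(windows, axis=(-2, -1))

noisy = np.zeros((5, 5))
noisy[2, 2] = 255.0                      # a single "salt" impulse
print(median_filter(noisy)[2, 2])        # 0.0 -- the impulse is removed
```

A linear mean filter would smear the impulse into its neighbours and blur edges; taking the median discards the outlier outright, which is why it is listed among the edge-preserving filters.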
Deblocking filters can be classified into those that:
1. Require a model of the degradation process (inverse, constrained least-squares
and Wiener filtering), and
2. Do not require a model of the degradation (contrast adjustment by histogram
specification and unsharp masking); these may exploit smooth intensity variations, a
high-resolution reference image, a reconstruction image, etc.
Another challenge is to decompose a video sequence into its elementary
parts, like synthetic or natural visual objects, finding shot boundaries, spatial
segmentation , object tracking, with that 2D and 3D representation.
Storage and archiving of digital video in shared disks and servers in large
volumes, browsing of such databases in real time and retrieval across switched
and packet networks pose many new challenges.
The simplest method to index content is to assign,
manually or semi-automatically, the content to programs,
shots and visual objects.
It is of interest to browse and search for content using compressed
data, because almost all video data will likely be
stored in compressed format.
What is video indexing?
Video indexing is the process of providing viewers a way to
access and navigate content easily, similar to book indexing.
The selection of indexes derived from the content of the video to
help organize video data and metadata that represents the
original video stream
A video indexing system may be frame-based, content-based
or object-based.
The basic components of a video indexing system are:
temporal segmentation: it extracts shots, scenes and/or video
objects
analysis of indexing features: computes content-based indexing
features for the extracted shots, scenes or objects.
visual summarization: produces storyboards, visual posters and
mosaic-based summaries.
CONTENT BASED IMAGE RETRIEVAL
Multimedia signal-processing methods must allow efficient access to processing and
retrieval of content in general, and visual content in particular.
Applications: medicine, entertainment, the consumer industry, broadcasting, journalism,
art and e-commerce.
Signal processing, pattern recognition, computer vision, database organization,
human-computer interaction and psychology must all contribute to achieving the image
retrieval goal.
Image retrieval methods face several challenges when addressing this goal.
To improve performance and address these problems, content-based image retrieval methods
have been proposed which focus on feature extraction automatically or semi-automatically
•Texture based methods
•Shape based methods
•Color based methods
TEXTURE BASED METHODS
Methods based on
spatial frequencies: evaluate the coefficients of the autocorrelation function of the
texture
co-occurrence matrices: identify repeated occurrences of gray-level pixel
configurations within the texture
multiresolution methods: describe the texture characteristics at coarse-to-fine
resolutions
These have been frequently employed for texture description because of their efficiency.
A major problem is sensitivity to scale: texture characteristics may disappear
at low resolutions or may contain a significant amount of noise at high resolutions.
Skin texture based
SHAPE BASED METHOD
Describing quantitatively the shape of an object is a difficult task.
Several contour-based and region-based shape description methods have been
proposed.
1. Contour based
• Chain codes, geometric border representations, Fourier transforms of the boundaries,
polygonal representations and deformable (active) model
2. Region based
• scalar region descriptors, moments, region decompositions and region neighborhood graph
• The main problems associated with shape description methods are
high sensitivity to scale, the difficulty of describing the shapes of objects, and high
subjectivity of the retrieved shape results.
COLOR BASED METHODS
Three description methods:
Color histogram based : use a quantitative representation of
the distribution of color intensities.
Dominant color based: use a small number of color ranges to
construct an approximate representation of color distribution
Color moment based: use statistical measures of the image's
color characteristics.
The performance of these methods depends on: color space,
quantization and distance measure for evaluation of the
retrieved results.
Limitation of histogram based and dominant color based – inability
to allow the localization of an object in an image
Solution to it is – color segmentation
But the limitation – complexity
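A histogram-based comparison can be sketched in a few lines (the 8-bins-per-channel choice and the histogram-intersection similarity measure are illustrative, not from the text):

```python
import numpy as np

def color_histogram(img, bins=8):
    """Concatenated per-channel color histogram, normalised to sum to 1."""
    counts = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0]
              for c in range(3)]
    h = np.concatenate(counts).astype(float)
    return h / h.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; identical color distributions score 1."""
    return float(np.minimum(h1, h2).sum())

red  = np.zeros((8, 8, 3), dtype=np.uint8); red[..., 0] = 255
blue = np.zeros((8, 8, 3), dtype=np.uint8); blue[..., 2] = 255
print(histogram_intersection(color_histogram(red), color_histogram(red)))   # 1.0
print(histogram_intersection(color_histogram(red), color_histogram(blue)))
```

Note that the histogram carries no spatial information at all, which is exactly the localization limitation noted above.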
Some or all of the limitations of these systems are the following:
• Few query types are supported
• Limited set of low- level features
• Difficult access to visual objects
• Results partially match user's expectations
• Limited interactivity with the user
• Limited system interoperability
• Scalability problems
PERCEPTUAL CODING OF DIGITAL AUDIO
SIGNALS
Perceptual audio coding is a compression technology for audio
signals that is based on imperfections of the human ear.
It is a lossy compression technique
The task of a perceptual audio coding technique is to produce a
decoded bitstream that sounds as close as possible to the
original audio while keeping the compressed file as small as possible.
GENERAL PERCEPTUAL AUDIO CODING ARCHITECTURE
Perceptual audio coders typically segment input signals into quasi-stationary
frames ranging from 2 to 50 ms. A time-frequency analysis section estimates the
temporal and spectral components of each frame. The time-frequency mapping is
usually matched to the analysis properties of the human auditory system, and it
transforms the input into a set of parameters which can be quantized and encoded
according to a perceptual distortion metric.
The time-frequency analysis section may contain the following:
• Unitary transform
• Time-invariant bank of uniform bandpass filters
• Time-varying, critically sampled bank of nonuniform
bandpass filters
• Hybrid transform/filter bank signal analyser
• Harmonic/sinusoidal analyser
• Source-system analysis (LPC/multipulse excitation).
Time frequency analysis involves a fundamental tradeoff between time and
frequency resolution requirements.
Perceptual distortion control is achieved by a psychoacoustic signal analysis that
estimates signal masking power based on psychoacoustic principles.
The frequency dependence of this threshold was quantified with a large number of
listeners.
The quiet threshold is well approximated by the nonlinear function
Tq(f) = 3.64 (f/1000)^(−0.8) − 6.5 e^(−0.6 (f/1000 − 3.3)^2) + 10^(−3) (f/1000)^4  [dB SPL]
Where, SPL = Sound Pressure Level
This is representative of a young listener with acute hearing
Tq(f) can be interpreted as a maximum allowable energy level for coding distortions
introduced in the frequency domain.
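The quiet-threshold function Tq(f) can be evaluated directly; this sketch reproduces its familiar shape, with a minimum near 3.3 kHz (where hearing is most sensitive) and a steep rise at low frequencies:

```python
import math

def threshold_in_quiet(f_hz):
    """Absolute threshold of hearing Tq(f) in dB SPL (nonlinear approximation)."""
    f = f_hz / 1000.0
    return 3.64 * f ** -0.8 - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4

# hearing is most sensitive around 3-4 kHz: the threshold dips below 0 dB SPL there
print(round(threshold_in_quiet(3300.0), 2))   # about -5 dB SPL
print(round(threshold_in_quiet(100.0), 2))    # about 23 dB SPL
```

In a coder, any quantization distortion that falls below this curve at a given frequency should be inaudible to the listener.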
Algorithm designers have no a priori knowledge regarding actual playback levels.
where the summation limits are the critical band boundaries (bl = band low, bh = band high).
The range of the index i is sampling-rate dependent; in particular, i ∈ {1, …, 25}.
Step 3: A basilar spreading function is then convolved with the
discrete bark spectrum to account for interband masking
Where,
i = the index of the critical band,
bli and bhi = the lower and upper bounds of band i,
ki = the number of transform components in band i,
Ti = the masking threshold in band i, and
int denotes rounding to the nearest integer.
TRANSFORM AUDIO CODERS