
Application 1. Tracking using Discrete Wavelet Transform

1. Discrete Wavelet Transform

DWT stands for Discrete Wavelet Transform. In numerical analysis and functional analysis, a discrete
wavelet transform (DWT) is any wavelet transform for which the wavelets are discretely sampled. As with
other wavelet transforms, a key advantage it has over Fourier transforms is temporal resolution: it
captures both frequency and location information (location in time).

1.1. Haar wavelets

The first DWT was invented by the Hungarian mathematician Alfréd Haar. For an input represented by a list of 2^n numbers, the Haar wavelet transform may be considered to simply pair up input values, storing the difference and passing the sum. This process is repeated recursively, pairing up the sums to provide the next scale, finally resulting in 2^n − 1 differences and one final sum.
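The pairing-up procedure above can be sketched in a few lines of Python. This is a minimal sketch, assuming sums and differences are halved at each step; other normalisations (e.g. dividing by the square root of 2) are equally common.

```python
# Minimal sketch of the Haar transform idea described above: repeatedly pair
# up values, keeping pairwise sums and differences. The halving of sums and
# differences is one convention (an assumption, not fixed by the text).

def haar_1d(values):
    """Full Haar decomposition of a list whose length is a power of 2.

    Returns [overall average, coarsest difference, ..., finest differences].
    """
    out = list(values)
    n = len(out)
    while n > 1:
        sums = [(out[i] + out[i + 1]) / 2 for i in range(0, n, 2)]
        diffs = [(out[i] - out[i + 1]) / 2 for i in range(0, n, 2)]
        out[:n] = sums + diffs   # the sums go on to the next, coarser scale
        n //= 2
    return out

print(haar_1d([9, 7, 3, 5]))  # [6.0, 2.0, 1.0, -1.0]
```

For 4 inputs this yields 3 differences and one final sum, matching the 2^n − 1 count above.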
1.2. Daubechies wavelets
The most commonly used set of discrete wavelet transforms was formulated by the Belgian
mathematician Ingrid Daubechies in 1988. This formulation is based on the use of recurrence relations to
generate progressively finer discrete samplings of an implicit mother wavelet function; each resolution is
twice that of the previous scale. In her seminal paper, Daubechies derives a family of wavelets, the first of
which is the Haar wavelet. Interest in this field has exploded since then, and many variations of
Daubechies' original wavelets were developed.
1.3. Others
Other forms of discrete wavelet transform include the non-decimated (or undecimated) wavelet transform, where downsampling is omitted, and the Newland transform, where an orthonormal basis of wavelets is formed from appropriately constructed top-hat filters in frequency space. Wavelet packet transforms and the complex wavelet transform are also related to the discrete wavelet transform.
2. Applications of DWT
The discrete wavelet transform has a huge number of applications in science, engineering, mathematics and computer science. Most notably, it is used for signal coding, to represent a discrete signal in a more redundant form, often as a preconditioning for data compression. Practical applications can also be found in signal processing of accelerations for gait analysis. DWT can also be used effectively for person tracking, as discussed below.

Fig : 2D Discrete Wavelet Transform

3. Human body tracking system:
The objective of tracking is to closely follow objects in each frame of a video stream so that the object's position, as well as other information, is always known. To overcome the difficulties of achieving real-time tracking and to improve tracking efficiency, a novel real-time colour-image human body tracking system based on the discrete wavelet transform can be used, where a CCD camera is mounted on a rotary platform for tracking moving objects. The procedure for tracking moving objects via this approach is illustrated in the flow chart shown in the figure below. Assume colour images of size 240 × 320 captured by a CCD camera are sent to a computer for further processing via an image acquisition card. A 2-level discrete wavelet transform of each image is performed, where only the lowest-frequency sub-image is used for subsequent processing.
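The 2-level decimation step can be sketched as follows. This is a hypothetical illustration in Python/NumPy: only the Haar low-pass (averaging) band is computed, since only the lowest-frequency sub-image is used.

```python
import numpy as np

# Illustrative sketch of the decimation described above: two levels of 2D
# Haar averaging shrink a 240x320 frame to 60x80, and only that
# lowest-frequency sub-image is kept for tracking.

def haar_lowpass(img):
    """One level of 2D Haar low-pass filtering: average each 2x2 block."""
    return (img[0::2, 0::2] + img[0::2, 1::2] +
            img[1::2, 0::2] + img[1::2, 1::2]) / 4.0

frame = np.random.rand(240, 320)         # stand-in for a captured frame
low = haar_lowpass(haar_lowpass(frame))  # 2-level decomposition, LL band only
print(low.shape)                         # (60, 80)
```

Processing the 60 × 80 low-pass band instead of the full 240 × 320 frame is what makes real-time operation feasible.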
As shown in the above diagram, the image is first captured and preprocessed. Image preprocessing
involves statistical analysis of the input image as 2D array, while filtering the image and performing noise
and bad features removal. Moving object detection is then done. If any moving object is found, the
features are extracted.

Fig 1. Flow chart of Human Body tracking system

Transforming the input data set into a set of features is called feature extraction. Feature extraction makes heavy use of processes and algorithms such as edge detection, curvature/corner detection, blob detection, and ridge detection.
Then the next image frame is captured and the target object is identified. This process is repeated multiple times. If no moving object is found, an image is captured once again and the entire process is repeated.

Application 2. Vector Quantization for Image Compression
Image compression is the process of reducing the number of bits required to represent an image. Vector quantization, the mapping of pixel intensity vectors into binary vectors indexing a limited number of possible reproductions, is a popular image compression algorithm. Compression has traditionally been done with little regard for image processing operations that may precede or follow the compression step. Recent work has used vector quantization both to simplify image processing tasks -- such as enhancement, classification, halftoning, and edge detection -- and to reduce the computational complexity by performing them simultaneously with the compression. After briefly reviewing the fundamental ideas of vector quantization, we present a survey of vector quantization algorithms that perform image processing.

1. Introduction
Data compression is the mapping of a data set into a bit stream to decrease the number of bits required to represent the data set.

What is Vector Quantization?

A vector quantizer maps k-dimensional vectors in the vector space R^k into a finite set of vectors Y = {y_i : i = 1, 2, ..., N}. Each vector y_i is called a code vector or a codeword, and the set of all the codewords is called a codebook. Associated with each codeword y_i is a nearest-neighbor region called a Voronoi region, defined by:

V_i = { x in R^k : ||x - y_i|| <= ||x - y_j|| for all j = 1, 2, ..., N }

How does VQ work in compression?

A vector quantizer is composed of two operations. The first is the encoder, and the second is the decoder. The encoder takes an input vector and outputs the index of the codeword that offers the lowest distortion. In this case the lowest distortion is found by evaluating the Euclidean distance between the input vector and each codeword in the codebook. Once the closest codeword is found, the index of that codeword is sent through a channel (the channel could be computer storage, a communications channel, and so on). When the decoder receives the index of the codeword, it replaces the index with the associated codeword. Figure 2 shows a block diagram of the operation of the encoder and decoder.

Figure 2: The Encoder and decoder in a vector quantizer. Given an input vector, the closest codeword is found and the index
of the codeword is sent through the channel. The decoder receives the index of the codeword, and outputs the codeword.
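The encoder/decoder pair described above can be sketched as follows. The tiny codebook here is made up for illustration; a real one would be trained (for example with the LBG algorithm).

```python
import numpy as np

# Sketch of a vector quantizer's encoder and decoder. The codebook below is
# hypothetical; in practice it is learned from training vectors.

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])

def encode(vector):
    """Return the index of the codeword with the lowest Euclidean distortion."""
    dists = np.linalg.norm(codebook - vector, axis=1)
    return int(np.argmin(dists))

def decode(index):
    """Replace the received index with its associated codeword."""
    return codebook[index]

idx = encode(np.array([0.9, 0.8]))  # the index is what travels the channel
print(idx, decode(idx))             # 1 [1. 1.]
```

Only the index is transmitted, which is the source of the compression: log2(N) bits replace a full k-dimensional vector.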

Application 3: Color Image Processing

Color image processing is motivated by two main factors:
1. Color is a powerful descriptor simplifying object recognition.
2. We can distinguish between thousands of color shades and intensities, compared to only about 20-30 shades of gray.
There are two major areas of color image processing:
1. Full-color processing: images are acquired with a full-color sensor (TV camera, color scanner).
2. Pseudocolor processing: colors are assigned to a particular monochrome intensity or intensity range.

Fundamentals of color
The particular colors of an object as humans perceive them are determined by the nature of the light-reflection properties of the object. A body reflecting light that is balanced in all visible wavelengths appears white. A body reflecting more light in a particular range of wavelengths and absorbing light in other bands appears colored.
If the light is achromatic (no colors), its only attribute is its intensity (amount). Examples of achromatic light: images produced by a b&w TV set, and monochrome pictures (not necessarily b&w).
Due to the absorption characteristics of the human eye, we see colors as variable combinations of the so-called primary colors of light: red (R), green (G), blue (B). The following wavelengths were designated to them in 1931: 700 nm, 546.1 nm, and 435.8 nm.
Adding primary colors of light produces the secondary colors: magenta (red + blue), cyan (green + blue), and yellow (green + red). Mixing the three primary colors of light (or a secondary with its opposite primary color) in the right intensities produces white light. By mixing together the three secondary colors as pigments, black (no light) can be produced.
Colors are usually distinguished from each other through three characteristics: brightness, hue, and saturation. As mentioned before, brightness embodies achromatic intensity. Hue represents the dominant color as perceived by an observer (red, yellow, blue). Saturation is the amount of white added to a hue (the purity of the color). For example, we need to specify saturation to characterize pink (red + white).

Color models
A color model (color space or color system) is a specification of a coordinate system and a subspace within that system where each color is represented by a single point. Most contemporary color models are oriented either toward hardware (color monitors and printers) or toward applications where color manipulation is used (color graphics for animation).
The models most commonly used in image processing practice are RGB (monitors, most cameras), CMY and CMYK (printers), and HSI (which closely corresponds to the human visual system).

RGB color model

The RGB (red, green, blue) color model is based on a Cartesian coordinate system. The color subspace is a cube, in which the RGB primary values are at three corners; the secondary colors (cyan, magenta, and yellow) are at three other corners; black is at the origin; and white is at the corner farthest from the origin. The gray scale (points of equal RGB values) extends from black to white along the straight line connecting these two points.

Images represented in the RGB model consist of three component images (one for each primary color)
that are combined into a composite color image. The number of bits used to represent a pixel is called
the pixel depth. Assuming that each component image uses 8 bits, each RGB color pixel is said to have a
depth of 24 bits. Such RGB color images are frequently called full-color images.

CMY and CMYK color models

Cyan, magenta, and yellow are the secondary colors of light (or the primary colors of pigments). Most color printing devices use the CMY model. With RGB values normalised to [0, 1], the conversion from RGB is done by:

C = 1 - R,  M = 1 - G,  Y = 1 - B

Equal amounts of the three pigments should produce black. In practice this approach leads to a muddy-looking black. To produce a true black (the predominant color in printing), a fourth color, black, is added to the color model to make the CMYK model.
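The conversions can be sketched as below. The RGB-to-CMY step follows the relation C = 1 − R, M = 1 − G, Y = 1 − B; the black-extraction step shown for CMYK is one common convention and is an assumption, not taken from the text.

```python
# Sketch of RGB -> CMY (values normalised to [0, 1]) and a common
# CMY -> CMYK black-extraction convention (an assumption for illustration).

def rgb_to_cmy(r, g, b):
    return 1 - r, 1 - g, 1 - b

def cmy_to_cmyk(c, m, y):
    k = min(c, m, y)             # the amount of pure black shared by C, M, Y
    if k == 1.0:                 # pure black: avoid division by zero
        return 0.0, 0.0, 0.0, 1.0
    return (c - k) / (1 - k), (m - k) / (1 - k), (y - k) / (1 - k), k

print(rgb_to_cmy(1.0, 0.0, 0.0))                 # pure red -> (0.0, 1.0, 1.0)
print(cmy_to_cmyk(*rgb_to_cmy(0.0, 0.0, 0.0)))   # black -> (0.0, 0.0, 0.0, 1.0)
```

Pulling the shared black component into K is exactly the "fourth color" fix described above: the muddy CMY black is replaced by true black ink.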

HSI color model

While the RGB and CMY color models are well suited for hardware implementations, they do not fit human perception of colors well, since we describe colors by their hue, saturation, and brightness. Therefore, the hue, saturation, intensity (HSI) model was developed, in which the intensity component is independent of the color information. The intensity axis goes from black to white, and intensity can be computed as
I = (R + G + B)/3
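The intensity computation can be applied per pixel as below. The saturation formula included alongside it is the standard HSI one and is an assumption, since the text only gives the intensity equation.

```python
import numpy as np

# Per-pixel HSI intensity as given above, plus the usual HSI saturation
# formula (the latter is an assumption; the text only states I = (R+G+B)/3).

def hsi_intensity(rgb):
    """I = (R + G + B) / 3 for an (..., 3) array of RGB values in [0, 1]."""
    return rgb.mean(axis=-1)

def hsi_saturation(rgb, eps=1e-12):
    """S = 1 - 3 * min(R, G, B) / (R + G + B); eps guards against black."""
    return 1 - 3 * rgb.min(axis=-1) / (rgb.sum(axis=-1) + eps)

px = np.array([[0.5, 0.5, 0.5], [1.0, 0.0, 0.0]])  # a gray pixel, a red pixel
print(hsi_intensity(px))    # gray: 0.5, red: 1/3
print(hsi_saturation(px))   # gray is unsaturated (~0), red fully saturated (~1)
```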

Application 4: Human Facial Expression Recognition


1. Introduction
Human facial expression recognition by a machine can be described as the interpretation of human facial characteristics via mathematical algorithms. Gestures of the body are read by an input sensing device such as a web-cam. It reads the movements of the human body and communicates with a computer that uses these gestures as input. These gestures are then interpreted using algorithms based either on statistical analysis or on artificial intelligence techniques. The primary goal of gesture recognition research is to create a system which can identify specific human gestures and use them to convey information. By observing the face, one can decide whether a person is serious, happy, thinking, sad, feeling pain and so on. Recognizing a person's expression can help in many areas; for example, in the field of medical science a doctor can be alerted when a patient is in severe pain, which helps in taking prompt action at that time.

Figure 1: Simple architecture of Gesture Recognition
Each box shown in figure 1 is treated as one module. The first module captures the image using the web-cam. The second module is for face detection, which detects the human face in the captured image. A set of modules bounded by a boundary line represents the pre-processing block. It consists of histogram equalization, edge detection, thinning, and token generation modules. The next module is the training module, which stores the token information that comes from the image pre-processing module. This training has been done using a backpropagation neural network. The last module, called the recognition module, performs token matching and decision making and produces the final result. The following flow chart represents how all the modules work.

2. Face detection
Face detection is a process that aims to locate a human face in an image. The process is applied to a stored image or to images from a camera. The human face varies from one person to another; this variation could be due to race, gender, age, and other physical characteristics of an individual. Face detection is therefore a challenging task in computer vision. It becomes even more challenging due to the additional variations in scale, orientation, pose, facial expression, and lighting conditions. Many methods have been proposed to detect faces, such as neural networks, skin locus, and color analysis. Since the detected faces become the input to gesture recognition, it is important to get rid of non-facial information in the image.

3. Image pre-processing
In this block, consisting of four different modules, a face image is taken as input and tokens are produced as output. The first step in this block is to enhance the image quality. To do this, histogram equalization is performed. It is then followed by the edge detection process. Since edge detection plays an important role in finding the tokens, four well-known algorithms, i.e. Prewitt, Sobel, Prewitt Diagonal, and Sobel Diagonal, are implemented to do this. To find edges, an image is convolved with both masks, producing two derivative images (dx and dy). The strength of the edge at any given image location is then calculated by taking the square root of the sum of the squares of these two derivatives. Basically these kernels respond to edges that run vertically and horizontally according to the pixel grid.
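The edge-strength computation just described can be sketched as follows, using the Sobel pair as one of the four mask pairs mentioned; this is an illustrative sketch, not the project's actual code.

```python
import numpy as np
from scipy.ndimage import convolve

# Sketch of the edge-strength step above: convolve with both Sobel masks,
# then take the square root of the sum of the squared derivatives.

sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
sobel_y = sobel_x.T

def edge_strength(img):
    dx = convolve(img, sobel_x)      # responds to vertical edges
    dy = convolve(img, sobel_y)      # responds to horizontal edges
    return np.sqrt(dx**2 + dy**2)    # gradient magnitude per pixel

# A vertical step edge: strong response along the boundary, zero far from it.
img = np.zeros((5, 5))
img[:, 3:] = 1.0
print(edge_strength(img))
```

Swapping in the Prewitt masks ([[-1,0,1]]*3 and its transpose) or their diagonal variants changes only the two kernel arrays.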

Figure 2: Detection using Prewitt, Sobel, and their diagonal.
4. Recognition
Once the training is over, the network is ready to recognize a gesture presented at its input. For recognizing the gesture of a face, two options are provided. If the user wants to recognize the gesture of an existing image, it can be loaded from memory. As the user selects the image, the face recognition method works and returns the face part of the image.

5. Applications
1. Detecting the gesture of a driver while he/she is driving, and alerting him/her when drowsy.
2. In Human Computer Interaction, where the computer can interact with humans based on their gestures.

Application 5. Image Compression using Wavelets
The objective of image compression is to reduce irrelevance and redundancy of the image data
in order to be able to store or transmit data in an efficient form. In numerical analysis and
functional analysis, a discrete wavelet transform (DWT) is any wavelet transform for which the
wavelets are discretely sampled. As with other wavelet transforms, a key advantage it has over
Fourier Transform is temporal resolution: it captures both frequency and location information
(location in time).
Wavelet compression is a form of data compression well suited for image compression (sometimes also video compression and audio compression). Notable implementations are JPEG 2000 and ECW for still images, and REDCODE, the BBC's Dirac, and Ogg Tarkin for video. The goal is to store image data in as little space as possible in a file.
Wavelet compression can be either lossless or lossy.
Using a wavelet transform, the wavelet compression methods are adequate for
representing transients, such as percussion sounds in audio, or high-frequency components in
two-dimensional images, for example an image of stars on a night sky. This means that the
transient elements of a data signal can be represented by a smaller amount of information than
would be the case if some other transform, such as the more widespread discrete cosine
transform, had been used.
Wavelet compression is not good for all kinds of data: transient signal characteristics mean good wavelet compression, while smooth, periodic signals are better compressed by other methods, particularly traditional harmonic compression (frequency domain, as by Fourier transforms and related methods). Data statistically indistinguishable from random noise is not compressible by any method.
First a wavelet transform is applied. This produces as many coefficients as there are pixels in the image (i.e. there is no compression yet, since it is only a transform). These coefficients can then be compressed more easily because the information is statistically concentrated in just a few coefficients; this principle is called transform coding. After that, the coefficients are quantized and the quantized values are entropy encoded and/or run-length encoded. A few 1D and 2D applications of wavelet compression use a technique called "wavelet footprints".
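The quantize-then-run-length-encode step of the pipeline above can be sketched as follows; the step size and coefficient values are made up for the example.

```python
# Sketch of the quantization and run-length encoding steps described above.

def quantize(coeffs, step=0.5):
    """Map each coefficient to the nearest multiple of the step size."""
    return [round(c / step) for c in coeffs]

def run_length_encode(symbols):
    """Collapse runs of repeated symbols into (symbol, count) pairs."""
    runs = []
    for s in symbols:
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1] + 1)
        else:
            runs.append((s, 1))
    return runs

# After a wavelet transform most coefficients are near zero, so quantization
# creates long zero runs, which run-length encoding compresses well:
coeffs = [3.1, 0.1, -0.2, 0.0, 0.1, -2.6, 0.0, 0.2]
print(run_length_encode(quantize(coeffs)))  # [(6, 1), (0, 4), (-5, 1), (0, 2)]
```

An entropy coder (e.g. Huffman or arithmetic coding) would then assign short codes to the frequent (0, n) runs.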

Haar wavelet transformation

The Haar wavelet transformation is composed of a sequence of low-pass and high-pass filters, known as a filter bank. These filter sequences can be applied in the same way as a discrete FIR filter on the DSP, using the MACP instruction, except as multiple successive FIR filters. The low-pass filter performs an averaging/blurring operation and is expressed as

h = (1/2, 1/2)

and the high-pass filter performs a differencing operation and can be expressed as

g = (1/2, -1/2)
On any adjacent pixel pair, the complete wavelet transform can be represented in matrix format by

T = WN A WNT

where multiplying by WNT on the right applies the 1D transformation to the rows of the image (first half), and multiplying by WN on the left applies the 1D transformation to the columns (second half). Here A is the matrix representing the 2D image pixels, T is the Haar wavelet transformation of the image, and, for N = 4,

       | 1/2  1/2   0    0  |
W4  =  |  0    0   1/2  1/2 |
       | 1/2 -1/2   0    0  |
       |  0    0   1/2 -1/2 |
This is for the case of transforming a 4x4 pixel image; the equivalent matrix can be expanded for larger images. This transformation can be represented by an FIR filter approach for use with the DSP's MACP operation. The result of the complete transformation, T, is composed of 4 new sub-images, which correspond to the blurred image and the vertical, diagonal, and horizontal differences between the original image and the blurred image. The blurred representation of the image removes the details (high-frequency components), which are represented separately in the other three images, in a manner that produces a sparser representation overall, making it easier to store and transmit. Below is an example of a wavelet-transformed image portraying the four sub-images as explained above.


The inverse transformation can be applied to T, resulting in lossless compression. Lossy compression can be implemented by manually setting to zero the elements of T below a certain threshold. The equation of the inverse transformation is:

A~ = WN^-1 T (WN^T)^-1  (= A after lossless compression)
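The matrix form for a 4x4 image can be sketched numerically as below. The 1/2-normalised Haar matrix is an assumption consistent with averaging/differencing filters; other normalisations change only scaling.

```python
import numpy as np

# Sketch of the matrix-form Haar transform for a 4x4 image: forward
# transform T = W A W^T, exact (lossless) inverse, and lossy compression
# by zeroing small elements of T. The 1/2 normalisation is an assumption.

W = 0.5 * np.array([[1,  1, 0,  0],
                    [0,  0, 1,  1],
                    [1, -1, 0,  0],
                    [0,  0, 1, -1]], dtype=float)

A = np.arange(16, dtype=float).reshape(4, 4)   # a toy 4x4 "image"
T = W @ A @ W.T                                 # forward transform

Wi = np.linalg.inv(W)
A_lossless = Wi @ T @ Wi.T                      # inverse: exact reconstruction
print(np.allclose(A_lossless, A))               # True

T_lossy = np.where(np.abs(T) < 1.0, 0.0, T)     # zero out small details
A_lossy = Wi @ T_lossy @ Wi.T                   # approximate reconstruction
```

Zeroing small entries of T before inverting is exactly the threshold-based lossy compression described above.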



Application 6. Biometrics

There are several techniques that can be applied for verifying and confirming a user's identity. They can be broadly classified as below:

Something the user knows, such as a password or PIN
Something the user has, such as a smart card or ATM card
Something that's part of the user, such as a fingerprint or iris

The strongest authentication involves a combination of all three.

The technology used for identification of a user based on a physical characteristic, such as a fingerprint,
iris, face, voice or handwriting is called Biometrics.
Advancements in technology have made it possible to build rugged and reliable biometric authentication systems, and the costs of biometric authentication systems have been dropping as reliability improves.
The key steps involved in a biometric authentication system are:

Image capture - using scanning devices available in the market
Image recognition - using standard algorithms
Template creation - using standard algorithms
Matching - using application and standard algorithms

Biometrics growth and advantages

Biometrics is experiencing fast development across the world for two main reasons:
1. The evolution and rapid expansion of information technology and the web, which call for secure access control and secure data transfer.
2. Terrorism poses a serious threat to governments, which has raised the demand for accurate identification of individuals. Consequently, new biometric detectors are being explored and more and more impeccable biometric systems are being invented. This situation demands in-depth biometric research. Many conferences are held frequently which bring together biometrics researchers in one place; companies and researchers come forward to share their knowledge about biometrics trends, and such exhibitions showcase the latest biometrics products.

Fingerprint recognition
Fingerprint recognition begins with image acquisition. In this part of the process, a user places his or her finger on a scanner, and numerous images of the fingerprint are then captured. It should be noted that during this stage, the goal is to capture images of the center of the fingerprint, which contains many of the unique features. All of the captured images are then converted into black and white images.

Face Recognition:
Face recognition technology has been brought close to perfection over many years, and is now among the best technologies available where a large database must be searched with an easy, user-friendly process. The technology has been made to safely recognize persons, independent of variances that appear in human faces. Face recognition technology handles pose, expression, and aging variance, as well as variances coming from a new hair style, glasses or temporary lighting changes.

Vein Recognition:
Every person has a unique pattern of palm veins; even twins have different patterns. This complicated vascular pattern is very helpful in identifying a person, and thus provides quite distinctive features for personal identification. One of the great advantages of palm veins is that they do not change during a human's life, because they lie under the skin. It is a very secure method of identification and authentication.

Iris Recognition:
Iris recognition is a biometric identification and authentication method that employs pattern recognition technology using high-resolution images of the iris of a person's eye. Iris recognition is entirely different from retina scan technology. Iris recognition employs a camera with infrared illumination, which reduces reflection from the convex cornea and produces detail-rich images of the complex structure of the iris. These images are then converted to digital templates that provide a mathematical representation of the iris. This technology provides unambiguous and accurate identification of an individual; a person with glasses or contact lenses can also be identified. Because of its speed of comparison, iris recognition is uniquely suited for one-to-many identification.

Ethical challenges
The deployment of biometric systems in the health care industry, law enforcement, the financial sector, and security raises ethical challenges, including database creation and management, theoretical limitations, and the future and open issues of biometrics.

Applications of Biometric

Physical access control

Justice/judiciary/law enforcement
Logical access control
Time and attendance
Healthcare biometrics
Border control/Airport biometrics
Financial and transactional
Biometric integrators/Resellers


Application 7. Content-Based Image Retrieval


Content-based image retrieval (CBIR), also known as query by image content (QBIC) and content-based
visual information retrieval (CBVIR) is the application of computer vision techniques to the image
retrieval problem, that is, the problem of searching for digital images in large databases. (see this
survey[1] for a recent scientific overview of the CBIR field). Content based image retrieval is opposed to
concept based approaches (see concept based image indexing).
"Content-based" means that the search will analyze the actual contents of the image rather than the
metadata such as keywords, tags, and/or descriptions associated with the image. The term 'content' in
this context might refer to colors, shapes, textures, or any other information that can be derived from the
image itself. CBIR is desirable because most web based image search engines rely purely on metadata
and this produces a lot of garbage in the results. Also having humans manually enter keywords for images
in a large database can be inefficient, expensive and may not capture every keyword that describes the
image. Thus a system that can filter images based on their content would provide better indexing and
return more accurate results.
CBIR is about developing an image search engine that uses not only the text annotated to the image by an end user (as traditional image search engines do), but also the visual content available in the images themselves.
Initially, a CBIR system should have a database containing several images to be searched. Then, it should derive the feature vectors of these images and store them in a data structure such as one of the tree data structures (these structures improve searching efficiency).
A CBIR system gets a query from the user, either an image or the specification of the desired image. Then, it searches the whole database in order to find the images most similar to the input or desired image.
The main issues in improving CBIR systems are:
Which features should be derived to better describe the images within the database
Which data structure should be used to store the feature vectors
Which learning algorithms should be used in order to make the CBIR system wiser
How to incorporate the user's feedback in order to improve the search results
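The retrieval pipeline described above can be sketched as follows. The gray-level histogram feature and the linear scan are deliberately simple stand-ins: a real system would use richer features and a tree data structure instead of comparing against every stored vector.

```python
import numpy as np

# Sketch of a minimal CBIR loop: derive a feature vector per database image
# (here a normalised gray-level histogram, a deliberately simple choice),
# then answer a query image by nearest-neighbour search over those vectors.

def feature_vector(img, bins=8):
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()

rng = np.random.default_rng(0)
database = [rng.random((16, 16)) for _ in range(5)]   # toy image database
features = [feature_vector(img) for img in database]  # precomputed offline

def query(img):
    """Return the index of the most similar database image."""
    f = feature_vector(img)
    dists = [np.linalg.norm(f - g) for g in features]
    return int(np.argmin(dists))                      # linear scan for clarity

print(query(database[2]))  # querying with a stored image retrieves itself: 2
```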



Application 8. Signature Recognition

A person's signature is an important biometric attribute of a human being and has been used for authorization purposes for decades. With the large amount of computing power available in modern computers, there is vast scope to develop fast algorithms for signature recognition, and a lot of research work is being conducted in this field.
Various approaches are possible for signature recognition, with a lot of scope for research. In this project we deal with an off-line signature recognition technique, where the signature is captured and presented to the user in image format only. We use various image processing techniques to extract the parameters of signatures and verify the signature based on these parameters.
1. Preprocessing
(I) Select Signature Image
As the person signs, the signature is scanned using a scanner and inserted into the system as an RGB image, regardless of the pen color used in the signing process.
(II) Invert Image
The NOT of an image is carried out by performing the inversion operation on each pixel of the image to produce the output pixel value. The inversion technique can be used to get the negative of the image.
(III) Gray Scale Image
Grayscale images are images without color, or achromatic images. The levels of a grayscale range from 0 (black) to 1 (white).
(IV) Binary Image
A binary image is a digital image that has only two possible values for each pixel.
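Steps (I)-(IV) can be sketched on a toy "scanned" signature as below; the luminance weights and the 0.5 threshold are common conventions, not taken from the text.

```python
import numpy as np

# Sketch of preprocessing steps (I)-(IV): start from an RGB scan, convert
# to gray scale, invert (negative), and threshold to a binary image.
# Weights and threshold are assumptions chosen for illustration.

def to_gray(rgb):
    """Luminance-weighted gray scale in [0, 1]."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def invert(gray):
    """Negative of the image: dark ink becomes bright foreground."""
    return 1.0 - gray

def binarize(gray, threshold=0.5):
    """Binary image: True where intensity exceeds the threshold."""
    return gray > threshold

scan = np.ones((4, 4, 3))     # a white page...
scan[1:3, 1:3] = 0.0          # ...with a dark 2x2 "stroke" of any pen color
binary = binarize(invert(to_gray(scan)))
print(binary.sum())           # 4 foreground pixels
```

Inverting before thresholding makes the ink the foreground (True) pixels, which is what the skeletonization step below expects.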
2. Skeletonization
Skeletonization is a process for reducing foreground regions in a binary image to a skeletal remnant that largely preserves the extent and connectivity of the original region while throwing away most of the original foreground pixels. To see how this works, imagine that the foreground regions in the input binary image are made of some uniform slow-burning material. Light fires simultaneously at all points along the boundary of this region and watch the fire move into the interior. At points where fires traveling from two different boundaries meet, the fire extinguishes itself, and the points at which this happens form the so-called 'quench line'. This line is the skeleton. Under this definition it is clear that thinning produces a sort of skeleton.


3. Dilation
Dilation is one of the two basic operators in the area of mathematical morphology, the other
being erosion. It is typically applied to binary images, but there are versions that work on
grayscale images. The basic effect of the operator on a binary image is to gradually enlarge the
boundaries of regions of foreground pixels (i.e. white pixels, typically). Thus areas of foreground
pixels grow in size while holes within those regions become smaller.
4. Image Subtraction
The pixel subtraction operator takes two images as input and produces as output a third image
whose pixel values are simply those of the first image minus the corresponding pixel values from
the second image. It is also often possible to just use a single image as input and subtract a
constant value from all the pixels. Some versions of the operator will just output the absolute
difference between pixel values, rather than the straightforward signed output.
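The dilation and subtraction operators above can be sketched together on a toy binary image. This is an illustrative sketch; subtracting the original from its dilation is one simple use, leaving just the ring of pixels that dilation added.

```python
import numpy as np
from scipy.ndimage import binary_dilation

# Sketch of the two operators above: dilation grows the foreground region,
# and pixel subtraction (here, dilated minus original) isolates the
# one-pixel boundary that dilation added.

img = np.zeros((7, 7), dtype=bool)
img[2:5, 2:5] = True                  # a 3x3 foreground square

dilated = binary_dilation(img)        # default 3x3 cross structuring element
boundary = dilated & ~img             # "subtract" the original from the dilated

print(img.sum(), dilated.sum(), boundary.sum())  # 9 21 12
```

With the default cross-shaped structuring element, each side of the square gains a row or column of 3 pixels, so 12 boundary pixels appear.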


Application 9.

Colour Image Smoothing And Sharpening

The process of smoothing or blurring an image suppresses noise and small fluctuations. In the frequency domain, this process corresponds to the suppression of high frequencies.
A smoothing filter can be built in Matlab using the function fspecial (special filters):
gaussianFilter = fspecial('gaussian', [7, 7], 5)
This builds a Gaussian filter matrix of 7 rows and 7 columns, with a standard deviation of 5.

The process of sharpening is related to edge detection - changes in color are accentuated to create an effect of sharper edges.
Using fspecial, we create a filter for sharpening an image. The sharpening filter is ironically named 'unsharp':
sharpFilter = fspecial('unsharp');
subplot(2,2,1), image(pomegranate), title('Original Pomegranate Seeds');
sharp = imfilter(pomegranate, sharpFilter, 'replicate');
subplot(2,2,2), image(sharp), title('Sharpened Pomegranate');
sharpMore = imfilter(sharp, sharpFilter, 'replicate');
subplot(2,2,3), image(sharpMore), title('Excessive sharpening amplifies noise');


Application 10. Digital watermarking

Digital watermarking is the process of embedding information into a digital signal which may be used to verify its authenticity or the identity of its owners, in the same manner as paper bearing a watermark for visible identification.


Figure: Watermark Embedding Scheme.

Types of Digital Watermarking:

Visible watermarks
Invisible watermarks
Public watermarks
Fragile watermarks
Private watermarks
Perceptual watermarks


Copyright protection is probably the most common use of watermarks today. Copyright owner information is embedded in the image in order to prevent others from alleging ownership of the image.
Fingerprinting embeds information about the legal receiver in the image. This involves embedding a different watermark into each distributed image, and allows the owner to locate and monitor pirated images that are illegally obtained.
Prevention of unauthorized copying is accomplished by embedding information about how often an image can be legally copied. An ironic example in which the use of a watermark might have prevented the wholesale pilfering of an image is the ubiquitous Lena image, which has been used without the original owner's permission.
In an image authentication application the intent is to detect modifications to the data. The
characteristics of the image, such as its edges, are embedded and compared with the current
images for differences.
Medical applications Names of the patients can be printed on the X-ray reports and MRI scans
using techniques of visible watermarking.

Invisibility: an embedded watermark is not visible.
Robustness: piracy attack or image processing should not affect the embedded watermark.


Readability: A watermark should convey as much information as possible, yet be statistically undetectable. Retrieval of the digital watermark can be used to identify ownership and copyright unambiguously.
Security: A watermark should be secret and must be undetectable by an unauthorized user in general; it should only be accessible to authorized parties. This security requirement is usually achieved by the use of cryptographic keys.

In the watermark embedding phase, the color space of the color host image is first converted from RGB
to YCbCr.
The original image X is converted to the YCbCr color space.
The following equation gives the YCbCr transformation:

    [ Y  ]   [ 0.299  0.587  0.114 ] [ R ]
    [ Cb ] = [-0.169 -0.331  0.500 ] [ G ]
    [ Cr ]   [ 0.500 -0.419 -0.081 ] [ B ]
The Y component represents the luminance. The Cb and Cr components represent the chrominance.
Many chroma subsampling schemes are available, such as 4:2:2 sampling and 4:1:1 sampling.
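The colour-space conversion above can be sketched as follows (an illustrative Python/NumPy sketch, not the document's MATLAB; the matrix holds the standard full-range YCbCr coefficients, with the usual offset of 128 added to the chroma channels):

```python
import numpy as np

# Standard full-range RGB -> YCbCr conversion matrix.
M = np.array([[ 0.299,  0.587,  0.114],
              [-0.169, -0.331,  0.500],
              [ 0.500, -0.419, -0.081]])

def rgb_to_ycbcr(rgb):
    """Convert an HxWx3 RGB array (floats in [0, 255]) to YCbCr.

    Cb and Cr are offset by 128 so all channels stay non-negative."""
    ycbcr = rgb @ M.T
    ycbcr[..., 1:] += 128.0
    return ycbcr

img = np.full((2, 2, 3), 255.0)   # a tiny all-white test image
out = rgb_to_ycbcr(img)
print(out[0, 0])                  # white -> Y = 255, Cb = Cr = 128
```

For pure white, the chroma terms cancel exactly, which is a quick sanity check that the matrix rows for Cb and Cr each sum to zero.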
Digital image watermarking schemes mainly fall into two broad categories: Spatial-domain and
Frequency-domain techniques.
Spatial Domain Techniques
Some of the Spatial Techniques of watermarking are as follow.
Least-Significant Bit (LSB): The earliest digital image watermarking schemes embed
watermarks in the LSB of the pixels. Given an image in which each pixel is represented
by an 8-bit sequence, the watermark is embedded in the last (i.e., least significant) bit of
selected pixels of the image. This method is easy to implement and does not introduce serious
distortion to the image; however, it is not very robust against attacks. For instance, an attacker
could simply randomize all LSBs, which effectively destroys the hidden information.
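LSB embedding can be sketched in a few lines (an illustrative Python/NumPy sketch; the tiny 2x2 image and 4-bit watermark are arbitrary demonstration data):

```python
import numpy as np

def embed_lsb(pixels, bits):
    """Embed watermark bits into the least significant bit of the first pixels."""
    out = pixels.copy()
    flat = out.ravel()
    # clear the LSB with & 0xFE, then OR in the watermark bit
    flat[:len(bits)] = (flat[:len(bits)] & 0xFE) | bits
    return out

def extract_lsb(pixels, n):
    """Read back the first n embedded bits."""
    return pixels.ravel()[:n] & 1

img = np.array([[200, 53], [128, 255]], dtype=np.uint8)
mark = np.array([1, 0, 1, 1], dtype=np.uint8)
stego = embed_lsb(img, mark)
print(extract_lsb(stego, 4))   # -> [1 0 1 1]
```

Each embedded bit changes a pixel value by at most 1, which is why the distortion is imperceptible, and also why randomizing the LSB plane destroys the watermark.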
SSM-Modulation-Based Technique: Spread-spectrum techniques are methods in which energy
generated at one or more discrete frequencies is deliberately spread or distributed in the time or
frequency domains. This is done for a variety of reasons, including the establishment of secure
communications, increasing resistance to natural interference and jamming, and preventing
detection. When applied to image watermarking, SSM-based watermarking
algorithms embed information by linearly combining the host image with a small pseudo-noise
signal that is modulated by the embedded watermark.
Frequency Domain Techniques
Compared to spatial-domain methods, frequency-domain methods are more widely applied. The aim is to
embed the watermarks in the spectral coefficients of the image. The most commonly used transforms are
the Discrete Cosine Transform (DCT), Discrete Fourier Transform (DFT), Discrete Wavelet Transform
(DWT), Discrete Laguerre Transform (DLT) and the Discrete Hadamard Transform (DHT).
The reason for watermarking in the frequency domain is that the characteristics of the human visual
system (HVS) are better captured by the spectral coefficients. For example, the HVS is more sensitive to
low-frequency coefficients, and less sensitive to high-frequency coefficients. In other words,
low-frequency coefficients are perceptually significant, which means alterations to those components might
cause severe distortion to the original image. On the other hand, high-frequency coefficients are
considered insignificant; thus, processing techniques, such as compression, tend to remove
high-frequency coefficients aggressively. To obtain a balance between imperceptibility and robustness, most
algorithms embed watermarks in the midrange frequencies.
Discrete Cosine Transformation (DCT):
The DCT, like a Fourier transform, represents data in frequency space rather than
amplitude space. This is useful because it corresponds more closely to the way humans perceive light,
so the parts that are not perceived can be identified and thrown away.
DCT based watermarking techniques are robust compared to spatial domain techniques. Such
algorithms are robust against simple image processing operations like low pass filtering,
brightness and contrast adjustment, blurring etc. However, they are difficult to implement and
are computationally more expensive. At the same time they are weak against geometric attacks
like rotation, scaling, cropping etc. DCT domain watermarking can be classified into Global DCT
watermarking and Block based DCT watermarking. Embedding in the perceptually significant
portion of the image has its own advantages because most compression schemes remove the
perceptually insignificant portion of the image.
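A minimal sketch of DCT-domain embedding follows (illustrative Python/NumPy; the orthonormal DCT matrix is built explicitly, and the choice of coefficient (3, 4) as a "mid-band" location and the strength 5.0 are assumptions for demonstration):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix."""
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    C = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    C[0, :] = np.sqrt(1.0 / n)      # DC row has a different normalisation
    return C

def dct2(x):
    C = dct_matrix(x.shape[0])
    return C @ x @ C.T

def idct2(X):
    C = dct_matrix(X.shape[0])
    return C.T @ X @ C              # C is orthogonal, so C.T inverts it

rng = np.random.default_rng(1)
block = rng.uniform(0, 255, (8, 8))
X = dct2(block)
X[3, 4] += 5.0                      # perturb one assumed mid-band coefficient
marked = idct2(X)
print(np.abs(marked - block).max() < 5.0)   # the change spreads thinly -> True
```

Because the basis is orthonormal, a change of 5.0 in one coefficient spreads over all 64 pixels, so no single pixel moves by anywhere near 5.0; this is the imperceptibility argument made above.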
Discrete Wavelet Transformation (DWT): The Discrete Wavelet Transform (DWT) is currently
used in a wide variety of signal processing applications, such as audio and video compression,
removal of noise in audio, and the simulation of wireless antenna distribution. Wavelets have
their energy concentrated in time and are well suited for the analysis of transient, time-varying
signals. Since most real-life signals are time-varying in nature, the Wavelet
Transform suits many applications very well. We use the DWT to implement a simple
watermarking scheme. The 2-D DWT decomposes the image into four
sub-images: three details and one approximation. The approximation looks just like the original, only at
1/4 the scale. The 2-D DWT is an application of the 1-D DWT in both the horizontal and the
vertical directions. The DWT separates an image into a lower-resolution approximation image (LL)
as well as horizontal (HL), vertical (LH) and diagonal (HH) detail components. The low-pass and
high-pass filters of the wavelet transform naturally break a signal into similar (low-pass) and
discontinuous/rapidly changing (high-pass) sub-signals. The slowly changing aspects of a signal are
preserved in the low-pass channel and the quickly changing parts are kept in the
high-pass channel. Therefore we can embed high-energy watermarks in the regions to which
human vision is less sensitive, such as the high-resolution detail bands (LH, HL, and HH).
Embedding watermarks in these regions allows us to increase the robustness of the watermark at
little to no additional impact on image quality. The fact that the DWT is a multi-scale analysis can
be used to the watermarking algorithm's benefit.
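The scheme above can be sketched with a one-level Haar DWT written out by hand (an illustrative Python/NumPy sketch; the +/-1 watermark pattern and the embedding strength alpha = 2.0 are assumptions for demonstration):

```python
import numpy as np

def haar_dwt2(img):
    """One level of the 2-D Haar DWT: returns LL, LH, HL, HH sub-bands."""
    # rows: average and difference of horizontal pixel pairs
    lo = (img[:, 0::2] + img[:, 1::2]) / 2.0
    hi = (img[:, 0::2] - img[:, 1::2]) / 2.0
    # columns: repeat on the row-filtered outputs
    ll = (lo[0::2, :] + lo[1::2, :]) / 2.0
    lh = (lo[0::2, :] - lo[1::2, :]) / 2.0
    hl = (hi[0::2, :] + hi[1::2, :]) / 2.0
    hh = (hi[0::2, :] - hi[1::2, :]) / 2.0
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2 (exact reconstruction)."""
    h, w = ll.shape
    lo = np.empty((2 * h, w)); hi = np.empty((2 * h, w))
    lo[0::2], lo[1::2] = ll + lh, ll - lh
    hi[0::2], hi[1::2] = hl + hh, hl - hh
    out = np.empty((2 * h, 2 * w))
    out[:, 0::2], out[:, 1::2] = lo + hi, lo - hi
    return out

rng = np.random.default_rng(0)
host = rng.uniform(0, 255, (8, 8))
ll, lh, hl, hh = haar_dwt2(host)
mark = rng.choice([-1.0, 1.0], hh.shape)   # hypothetical +/-1 watermark
alpha = 2.0                                # assumed embedding strength
marked = haar_idwt2(ll, lh, hl, hh + alpha * mark)
print(np.abs(marked - host).max())         # each pixel moves by at most alpha
```

Re-running `haar_dwt2` on the marked image and subtracting the original HH band recovers `alpha * mark` exactly, which is the detection side of this scheme.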


Application 11. Image segmentation based on colour and colour image compression


In computer vision, segmentation refers to the process of partitioning a digital image into multiple
segments (sets of pixels, also known as superpixels). The goal of segmentation is to simplify and/or
change the representation of an image into something that is more meaningful and easier to analyze.
Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images.
More precisely, image segmentation is the process of assigning a label to every pixel in an image such
that pixels with the same label share certain visual characteristics.
The result of image segmentation is a set of segments that collectively cover the entire image, or a set of
contours extracted from the image. Each of the pixels in a region is similar with respect to some
characteristic or computed property, such as color, intensity, or texture.
Adjacent regions are significantly different with respect to the same characteristics.
When applied to a stack of images, typical in medical imaging (the technique and
process used to create images of the human body, or parts and function thereof, for clinical purposes), the
resulting contours after image segmentation can be used to create 3D reconstructions with the help of
interpolation algorithms like Marching cubes.

1. Medical Imaging

A CT scan image showing a ruptured abdominal aortic aneurysm.

Color image segmentation is useful in many applications. From the segmentation results, it is possible to
identify regions of interest and objects in the scene, which is very beneficial to the subsequent image
analysis or annotation.
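A colour-based segmentation of this kind can be sketched with a plain k-means clustering of pixel colours (illustrative Python/NumPy; the farthest-point initialisation and the synthetic red/blue pixel data are assumptions for demonstration):

```python
import numpy as np

def kmeans_segment(pixels, k=2, iters=10):
    """Cluster Nx3 pixel colours with plain k-means (farthest-point init)."""
    pixels = pixels.astype(float)
    centres = pixels[:1].copy()
    for _ in range(k - 1):    # farthest-point initialisation
        d = np.linalg.norm(pixels[:, None] - centres[None], axis=2).min(axis=1)
        centres = np.vstack([centres, pixels[d.argmax()]])
    for _ in range(iters):
        # assign each pixel to its nearest centre
        d = np.linalg.norm(pixels[:, None] - centres[None], axis=2)
        labels = d.argmin(axis=1)
        # move each centre to the mean of its assigned pixels
        for j in range(k):
            if np.any(labels == j):
                centres[j] = pixels[labels == j].mean(axis=0)
    return labels, centres

# two obvious colour populations: reddish and bluish pixels
red = np.tile([200.0, 10.0, 10.0], (50, 1))
blue = np.tile([10.0, 10.0, 200.0], (50, 1))
pixels = np.vstack([red, blue])
labels, centres = kmeans_segment(pixels, k=2)
print(labels[0] != labels[50])   # the two populations get distinct labels
```

Reshaping the label vector back to the image dimensions gives the segment map; each centre is the representative colour of its region.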

2. Color image compression

The objective of image compression is to reduce irrelevance and redundancy of the image data in order
to be able to store or transmit data in an efficient form.
Image compression may be lossy or lossless.
Lossless compression is preferred for archival purposes and often for medical imaging, technical
drawings, clip art, or comics. This is because lossy compression methods, especially when used at low bit
rates, introduce compression artifacts.
Lossy methods are especially suitable for natural images such as photographs in applications where
minor (sometimes imperceptible) loss of fidelity is acceptable to achieve a substantial reduction in bit


rate. The lossy compression that produces imperceptible differences may be called visually lossless.

Figure: A few segmentation results. Column (a): original images; column (b): segmented images.

3. Methods for lossless image compression are:

Run-length encoding: used as the default method in PCX and as one of the possible methods in BMP, TGA and TIFF
DPCM and predictive coding
Entropy encoding
Adaptive dictionary algorithms such as LZW, used in GIF and TIFF
Deflation, used in PNG, MNG and TIFF
Chain codes
4. Methods for lossy compression:
Reducing the color space to the most common colors in the image. The selected colors are specified in
the color palette in the header of the compressed image. Each pixel just references the index of a color in
the color palette. This method can be combined with dithering to avoid posterization.
Chroma subsampling. This takes advantage of the fact that the human eye perceives spatial changes of
brightness more sharply than those of color, by averaging or dropping some of the chrominance
information in the image.
Transform coding. This is the most commonly used method. A Fourier-related transform such as the DCT or
the wavelet transform is applied, followed by quantization and entropy coding.
Fractal compression


Application 12. Object detection using correlation principle

The basis for finding matches of a sub-image w(x, y) (size J x K) within an image f(x, y) (size M x N)
is the correlation between f(x, y) and w(x, y):

    c(x, y) = sum over s, t of f(s, t) . w(x + s, y + t)

for x = 0, 1, 2, ..., M-1 and y = 0, 1, 2, ..., N-1.
The summation is taken over the region where w and f overlap, and the functions used are normalized.

Any one value (x0, y0) yields one value of the correlation c; as x and y are varied, we obtain the
function c(x, y). The maximum value of c indicates the position where w best matches f.
Accuracy is lost near the edges of the image f.
The correlation function is sensitive to changes in the amplitudes of f and w: if all the values of f
are doubled, the correlation c(x, y) also doubles. This difficulty is overcome by matching the
correlation coefficient, defined as

    gamma(x, y) = sum [f(s, t) - f_mean][w(x + s, y + t) - w_mean]
                  / { sum [f(s, t) - f_mean]^2 . sum [w(x + s, y + t) - w_mean]^2 }^(1/2)

for x = 0, 1, 2, ..., M-1 and y = 0, 1, 2, ..., N-1, where w_mean is the average value of the pixels
in w, and f_mean is the average value of f in the region coincident with the current location of w.
The correlation coefficient gamma(x, y) is scaled in the range -1 to +1, independent of scale
changes in the amplitude of f and w.
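The correlation-coefficient search can be sketched directly from the definition (an illustrative Python/NumPy sketch; only valid template positions are evaluated, which sidesteps the edge-accuracy problem by simply not matching there):

```python
import numpy as np

def correlation_coefficient(f, w):
    """Correlation coefficient gamma(x, y) between image f and template w."""
    J, K = w.shape
    M, N = f.shape
    wz = w - w.mean()
    denom_w = np.sqrt((wz ** 2).sum())
    gamma = np.zeros((M - J + 1, N - K + 1))
    for x in range(M - J + 1):
        for y in range(N - K + 1):
            patch = f[x:x + J, y:y + K]        # region coincident with w
            pz = patch - patch.mean()
            denom = np.sqrt((pz ** 2).sum()) * denom_w
            if denom > 0:
                gamma[x, y] = (pz * wz).sum() / denom
    return gamma

f = np.zeros((8, 8))
f[3:5, 4:6] = [[1, 2], [3, 4]]                 # hidden copy of the template
w = np.array([[1.0, 2.0], [3.0, 4.0]])
g = correlation_coefficient(f, w)
peak = np.unravel_index(g.argmax(), g.shape)
print(peak)                                    # -> (3, 4): best match location
```

Doubling all values of f leaves gamma unchanged, which is exactly the amplitude invariance the coefficient was introduced for.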


Application 13. Text compression

Text compression is a technique which reduces the size of a file or data collection without
affecting the information contained in the file. A compressed file, obviously, requires less
storage space. Moreover, electronic transfer of a compressed file is faster than the transfer of an
uncompressed file and a smaller file is consequently less susceptible to transmission errors.
Compression is useful because it helps reduce the consumption of expensive resources, such as
hard disk space or transmission bandwidth. On the downside, compressed data must be
decompressed to be used, and this extra processing may be detrimental to some applications.
For instance, a compression scheme for video may require expensive hardware for the video to
be decompressed fast enough to be viewed as it is being decompressed (the option of
decompressing the video in full before watching it may be inconvenient, and requires storage
space for the decompressed video).
The design of data compression schemes therefore involves trade-offs among various factors,
including the degree of compression, the amount of distortion introduced (if using a lossy
compression scheme), and the computational resources required to compress and uncompress
the data.
Table: compression results for the LZ family of techniques.



Applications of Data Compression

Generic file compression.
Archivers: PKZIP.
File systems: NTFS.
Images: GIF, JPEG, CorelDraw.
Sound: MP3.


Video: MPEG, DivX, HDTV.

ITU-T T4 Group 3 Fax.
V.42bis modem.
Run-Length Encoding
Exploit long runs of repeated characters: replace a run by a count followed by the repeated character,
but don't bother if the run is shorter than 3.
One annoyance is how to represent the counts. In a binary file, runs alternate between 0 and 1, so only
the counts need to be output. "File inflation" occurs if runs are short. Applications: black-and-white
graphics and JPEG (Joint Photographic Experts Group), where the compression ratio improves with resolution.
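The scheme above can be sketched in a few lines (an illustrative Python sketch; the count-then-character output format is one of several possible conventions, and it is ambiguous if the data itself contains digits):

```python
def rle_encode(data, min_run=3):
    """Run-length encode a string; runs shorter than min_run are left alone."""
    out = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1                           # extend the current run
        run = j - i
        if run >= min_run:
            out.append(str(run) + data[i])   # e.g. "AAAA" -> "4A"
        else:
            out.append(data[i:j])            # short runs: don't bother
        i = j
    return "".join(out)

print(rle_encode("W" * 12 + "B" + "W" * 12 + "B" * 3))  # -> "12WB12W3B"
```

Note the "file inflation" effect: on input with no runs, such as "ABAB", the output is no shorter than the input.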

Huffman coding
A variable-length code in which a character's code length is inversely proportional to its frequency.
The code must satisfy the prefix property to be uniquely decodable. It is a two-pass algorithm: the
first pass accumulates the character frequencies and generates the codebook; the second pass performs
the compression with the codebook.

Huffman Algorithm:
Create codes by constructing a binary tree.
1. Consider all characters as free nodes.
2. Assign the two free nodes with the lowest frequencies to a parent node with weight equal to the sum of their frequencies.
3. Remove the two free nodes and add the newly created parent node to the list of free nodes.
4. Repeat steps 2 and 3 until there is one free node left; it becomes the root of the tree.
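The four steps above can be sketched with a heap holding the free nodes (an illustrative Python sketch; the sample string "aaaabbc" is arbitrary demonstration data):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman codebook following the four steps above."""
    freq = Counter(text)
    # step 1: every character starts as a free node (the id breaks ties)
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # steps 2-3: merge the two lowest-frequency free nodes under a parent
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))
        next_id += 1
    # step 4: the single remaining node is the root; walk it for the codes
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("aaaabbc")
print(codes)   # the most frequent character 'a' gets the shortest code
```

Because codes are read off root-to-leaf paths, no codeword can be a prefix of another, which is the prefix property required for unique decodability.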

Figure: Huffman coding example over 64 data values, showing the colour frequencies and the resulting Huffman codes.


Application 14. Automatic Number Plate Recognition

Automatic number plate recognition is a mass surveillance method that uses optical character
recognition on images to read the license plates on vehicles. They can use existing closed-circuit
television or road-rule enforcement cameras, or ones specifically designed for the task. They are used by
various police forces and as a method of electronic toll collection on pay-per-use roads and cataloging
the movements of traffic or individuals.
ANPR can be used to store the images captured by the cameras as well as the text from the license plate,
with some configurable to store a photograph of the driver. Systems commonly use infrared lighting to
allow the camera to take the picture at any time of the day. ANPR technology tends to be region-specific,
owing to plate variation from place to place.
There are six primary algorithms that the software requires for identifying a license plate:
1. Plate localization: responsible for finding and isolating the plate in the picture.
2. Plate orientation and sizing: compensates for the skew of the plate and adjusts the dimensions
to the required size.
3. Normalization: adjusts the brightness and contrast of the image.
4. Character segmentation: finds the individual characters on the plate.
5. Optical character recognition.
6. Syntactical/geometrical analysis: checks characters and positions against country-specific rules.
The complexity of each of these subsections of the program determines the accuracy of the system.
During the third phase (normalization), some systems use edge detection techniques to increase the
picture difference between the letters and the plate backing. A median filter may also be used to reduce
the visual noise on the image.

1. The first step in the recognition process is obtaining a photo of the vehicle, usually by use of a mounted
CCTV camera. After this, some algorithm must be applied to transform the image into a stream
consisting of the license plate number.
2. Basic preprocessing is required, such as image enhancement, plate area localization and
noise reduction. An important characteristic of a license plate is its rectangular shape, which can also be
exploited for localization purposes. This is followed by image segmentation, where
individual characters are identified based on their orientation. A simple way to localize these features is to
examine edge and variance information.
3. This can be done by applying a Sobel operator and obtaining the image gradient. A thresholding
algorithm can then be applied to obtain a binary edge image. A local variance image can be obtained by
sliding a window across the image and calculating the variance within each window. Combining these,
areas of high activity can be localized. Finally, character recognition can be performed on the number plate.
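The edge and variance steps can be sketched as follows (an illustrative Python/NumPy sketch; the synthetic vertical-edge image, the edge threshold of 100 and the 3x3 window size are assumptions for demonstration):

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude via the Sobel operator (valid region only)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2)); gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)

def local_variance(img, win=3):
    """Variance inside a sliding window: a crude 'activity' measure."""
    h, w = img.shape
    out = np.zeros((h - win + 1, w - win + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = img[i:i + win, j:j + win].var()
    return out

img = np.zeros((10, 10))
img[:, 5:] = 255.0                        # a single vertical intensity edge
edges = sobel_magnitude(img) > 100.0      # assumed threshold -> binary edge map
print(edges[:, 3].all(), edges[:, 0].any())   # -> True False
```

Columns of the edge map straddling the intensity step fire on every row, while the flat regions stay silent; combining this with the variance map highlights high-activity areas such as plate text.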


Application 15. Currency





Application 16. Handwritten and Printed Character Recognition

1. Handwritten Character Recognition

Handwriting recognition is the ability of a computer to receive and interpret intelligible handwritten input
from sources such as paper documents, photographs, touch-screens and other devices. The image of the
written text may be sensed "off line" from a piece of paper by optical scanning (optical character
recognition) or intelligent word recognition. Alternatively, the movements of the pen tip may be sensed
"on line", for example by a pen-based computer screen surface.
Handwriting recognition principally entails optical character recognition. However, a complete
handwriting recognition system also handles formatting, performs correct segmentation into characters
and finds the most plausible words.
Character extraction
Off-line character recognition often involves scanning a form or document written sometime in the past.
This means the individual characters contained in the scanned image need to be extracted. Tools
exist that are capable of performing this step [1]; however, there are several common imperfections in
this step. The most common is that characters that are connected together are returned as a single
sub-image containing both characters. This causes a major problem in the recognition stage. Many
algorithms are nevertheless available that reduce the risk of connected characters.
Character recognition
After the extraction of individual characters, a recognition engine is used to identify the
corresponding computer character. Several different recognition techniques are currently available.
Neural networks
Neural network recognizers learn from an initial training set of images. The trained network then makes the
character identifications. Each neural network uniquely learns the properties that differentiate training
images. It then looks for similar properties in the target image to be identified. Neural networks are quick
to set up; however, they can be inaccurate if they learn properties that are not important in the target
images.
Feature extraction
Feature extraction works in a similar fashion to neural network recognizers; however, programmers must
manually determine the properties they feel are important.
Some example properties might be:
Aspect ratio
Percent of pixels above the horizontal half point
Percent of pixels to the right of the vertical half point
Number of strokes
Average distance from the image center
Whether the image is reflected about the y-axis
Whether the image is reflected about the x-axis
This approach gives the recognizer more control over the properties used in identification. Yet any
system using this approach requires substantially more development time than a neural network because
the properties are not learned automatically.
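A few of the properties listed above can be computed directly (an illustrative Python/NumPy sketch; the 4x4 glyph with ink in the top-right corner is hypothetical demonstration data):

```python
import numpy as np

def extract_features(glyph):
    """Hand-picked features for a binary character image (1 = ink)."""
    h, w = glyph.shape
    ys, xs = np.nonzero(glyph)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    return {
        "aspect_ratio": w / h,
        "above_half": np.mean(ys < h / 2),       # fraction of ink in top half
        "right_of_half": np.mean(xs >= w / 2),   # fraction of ink in right half
        "mean_dist_from_centre": np.hypot(ys - cy, xs - cx).mean(),
    }

# a hypothetical 4x4 glyph with all ink in the top-right corner
glyph = np.zeros((4, 4), dtype=int)
glyph[0, 3] = glyph[1, 3] = 1
feats = extract_features(glyph)
print(feats["above_half"], feats["right_of_half"])   # -> 1.0 1.0
```

A classifier would compare such feature vectors against stored prototypes; the point of the example is only that each property is a cheap, hand-chosen measurement rather than a learned one.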
On-line recognition


On-line handwriting recognition involves the automatic conversion of text as it is written on a special
digitizer or PDA, where a sensor picks up the pen-tip movements as well as pen-up/pen-down switching.
That kind of data is known as digital ink and can be regarded as a dynamic representation of handwriting.
The obtained signal is converted into letter codes which are usable within computer and text-processing
applications.
The elements of an on-line handwriting recognition interface typically include:
a pen or stylus for the user to write with.
a touch sensitive surface, which may be integrated with, or adjacent to, an output display.
a software application which interprets the movements of the stylus across the writing surface,
translating the resulting strokes into digital text.
2. Printed Character Recognition:
The recognition system consists of two main processing units: a character separator and an isolated
character classifier.
Character separation (frequently called segmentation) can work in two modes:
fixed (constrained) spacing mode (where character size is known in advance and therefore
segmentation can be very robust)
variable (arbitrary) spacing (where no a priori information can be assumed)

Isolated Character Classifier

The recognition module receives as input an extracted and size-normalized image representing a character
to be recognized. The module produces as output an ordered list of a few of the most probable
classification candidates, together with their confidence values.
The task is performed by matching the raster sample with template masks representing different
characters. The masks are prepared by an off-line training phase. A mask can be considered as a raster
image containing three types of pixels: black, white, and undefined (gray).
Initially, template masks are built per font. In a single font-set of masks, every character is represented by
exactly one mask. Images representing template masks built for Courier font are presented below.
In practice, a font character to be recognized is often unknown a priori. Hence, templates representing
the most prevalent fonts are prepared and combined together.
The Omnifont recognizer, containing a number of masks per character, is shown below. An input image is
correlated with all the masks stored in the recognizer. The mask which has the highest correlation score
is taken to be the primary result of the recognition.
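The mask-matching step can be sketched as follows (an illustrative Python/NumPy sketch; the tri-valued encoding of +1 for expected black, -1 for expected white and 0 for undefined pixels, and the tiny 3x3 masks for '|' and '-', are assumptions for demonstration):

```python
import numpy as np

def mask_score(sample, mask):
    """Match a binary sample (1 = black) against a tri-valued mask.

    Undefined (gray) mask pixels are 0, so they contribute nothing."""
    signed = 2 * sample - 1               # map {0, 1} -> {-1, +1}
    return (signed * mask).sum()

def classify(sample, masks):
    """Return the character whose mask scores highest against the sample."""
    scores = {ch: mask_score(sample, m) for ch, m in masks.items()}
    return max(scores, key=scores.get)

# hypothetical 3x3 masks for a vertical bar and a horizontal bar
masks = {
    "|": np.array([[-1, 1, -1], [0, 1, 0], [-1, 1, -1]]),
    "-": np.array([[-1, -1, -1], [1, 1, 1], [-1, -1, -1]]),
}
sample = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]])   # looks like '|'
print(classify(sample, masks))                          # -> '|'
```

A multi-font ("Omnifont") recognizer would simply store several masks per character and keep the best score over all of them.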


Constrained Printing Recognition

In this case, character spacing is fixed. Hence, segmentation is possible even when fields are
distorted.

Unconstrained Printing Recognition

The main steps in this recognition process are:
Possible slant is estimated and compensated for (in order to cope with italic and back-slanted writing).
Top and bottom base lines are detected.
The whole image is divided into horizontally separated "words."
Each word is processed separately and decomposed into connected components.
The connected components undergo further analysis. Some of them are decomposed into
smaller parts (we call them atoms).
Thus, the problem of character separation is reduced to a problem of correct partition of an
ordered sequence of atoms. In other words, we need to combine the atoms into molecules. Of
course, this can be done in a variety of ways.