
1. INTRODUCTION:

Uncompressed multimedia (graphics, audio and video) data requires considerable storage

capacity and transmission bandwidth. Despite rapid progress in mass-storage density, processor

speeds, and digital communication system performance, demand for data storage capacity and

data-transmission bandwidth continues to outstrip the capabilities of available technologies. The

recent growth of data-intensive multimedia-based web applications has not only sustained the need for more efficient ways to encode signals and images but has also made compression of such signals central to storage and communication technology.

Image compression can be lossy or lossless. Lossless compression means that the data, when decompressed, is an exact replica of the original. This is required when binary data such as executables or documents are compressed: they need to be reproduced exactly when decompressed. Images (and music), on the other hand, need not be reproduced exactly.

Near-lossless compression denotes compression methods, which give quantitative bounds

on the nature of the loss that is introduced. Such compression techniques provide the guarantee

that no pixel difference between the original and the compressed image is above a given value.

For most purposes an approximation of the original image is enough, as long as the error between the original and the compressed image is tolerable. Lossy compression methods, however, especially when used at low bit rates, introduce compression artifacts, and no precise bounds can be set on the extent of distortion present in the lossy reconstructions at intermediate stages. Near-lossless compression in such a framework is only possible either by an appropriate pre-quantization of the wavelet coefficients and lossless transmission of the resulting bit stream, or by truncation of the bit stream at an appropriate point followed by transmission of a residual layer to provide the near-lossless bound.
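This guarantee can be checked mechanically: the following Python sketch (NumPy assumed; the pixel values are made up for illustration) tests whether a reconstruction stays within a given bound d of the original.

```python
import numpy as np

def is_near_lossless(original, reconstructed, d):
    """Return True if no pixel differs from the original by more than d."""
    diff = np.abs(original.astype(np.int32) - reconstructed.astype(np.int32))
    return int(diff.max()) <= d

original = np.array([[10, 20], [30, 40]], dtype=np.uint8)
reconstructed = np.array([[12, 19], [30, 38]], dtype=np.uint8)

print(is_near_lossless(original, reconstructed, 2))  # max |difference| is 2 -> True
print(is_near_lossless(original, reconstructed, 1))  # 2 > 1 -> False
```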

This project aims to provide an application of neural networks to still image compression in the frequency domain. The sparse properties of Support Vector Machine (SVM) learning are exploited in the compression algorithm. The SVM has the property that it will choose the minimum number of training points to use as centers of the Gaussian kernel functions; it is this property that is exploited as the basis of the image compression algorithm. Compression is more efficient in frequency space.


1.1 OBJECTIVE:

• To clearly understand and implement an algorithm for the application of SVM (non-linear regression) learning and the DCT to image compression.

• To compare the obtained results with standard image compression techniques such as JPEG.

• To obtain good quality, a good compression ratio, and a signal-to-noise ratio within the required bound.


1.2 DETAILED LITERATURE SURVEY:

IMAGE COMPRESSION

The degree of compression is best expressed in terms of the average information, or entropy, of the compressed image source, measured in bits/pixel. Regardless of the particular

technique used, compression engines accomplish their intended purpose in the following manner:

1. Those portions of the image which are not perceptible to the human eye are not

transmitted.

2. Frame redundancies in the image are not transmitted.

3. The remaining information is coded in an efficient manner for transmission.

Currently, a number of image compression techniques are being used singly or in combination.

These include the following:

1.2.1 ARTIFICIAL NEURAL NETWORKS[1]

This paper describes an algorithm using backpropagation learning in a feed-forward network. The number of hidden neurons was fixed before learning, and the weights of the network after training were transmitted. The neural network (and hence the image) could then be recovered from these weights. Compression was generally around 8:1, with an image quality much lower than JPEG.

1.2.2 IMAGE COMPRESSION BY SELF-ORGANIZED KOHONEN MAP[2]

This paper presents a compression scheme based on the discrete cosine transform (DCT), vector quantization of the DCT coefficients by a Kohonen map, differential coding by a first-order predictor, and entropy coding of the differences. This method gave better performance than JPEG for compression ratios greater than 30:1.

1.2.3 SUPPORT VECTORS IN IMAGE COMPRESSION[3]

This paper first presented the use of support vector machines (SVMs) in an image compression algorithm. The method used an SVM to directly model the color surface. The parameters of a neural network (weights and Gaussian centers) were transmitted so that the color surface could be reconstructed from a neural network using these parameters.

1.2.4 SUPPORT VECTOR REGRESSION MACHINES[4]

This paper introduces a new regression technique based on Vapnik's concept of support vectors. Support vector regression (SVR) is compared with a committee regression technique (bagging) based on regression trees, and with ridge regression done in feature space. On the basis of these experiments, SVR is expected to have advantages in high-dimensional spaces, because SVR optimization does not depend on the dimensionality of the input space.

1.2.5 SUPPORT VECTOR METHOD FOR FUNCTION APPROXIMATION[5]

This paper reports results of applying the Support Vector (SV) method, recently proposed for estimating regressions, constructing multidimensional splines, and solving linear operator equations, to these problems. The SV method is a universal tool for solving multidimensional function estimation problems. Initially it was designed to solve pattern recognition problems, where, in order to find a decision rule with good generalization ability, one selects some (small) subset of the training data, called the Support Vectors (SVs). Optimal separation of the SVs is equivalent to optimal separation of the entire data. This led to a new method of representing decision functions, where the decision functions are a linear expansion on a basis whose elements are nonlinear functions parameterized by the SVs (one SV is needed for each element of the basis).

1.2.6 THE NATURE OF STATISTICAL LEARNING THEORY[6]

The aim of this book is to discuss the fundamental ideas which lie behind the statistical theory of learning and generalization. It considers learning as a general problem of function estimation based on empirical data. Its topics include: the setting of learning problems based on the model of minimizing a risk functional from empirical data; a comprehensive analysis of the empirical risk minimization principle, including necessary and sufficient conditions for its consistency; non-asymptotic bounds on the risk achieved using the empirical risk minimization principle; methods for controlling the generalization ability of learning machines using small sample sizes based on these bounds; and the Support Vector methods that control the generalization ability when estimating functions using small sample sizes.

1.2.7 SUPPORT VECTOR MACHINES, NEURAL NETWORKS AND FUZZY LOGIC

MODELS [7]

This is the first textbook that provides a thorough, comprehensive and unified

introduction to the field of learning from experimental data and soft computing. Support

vector machines (SVMs) and neural networks (NNs) are the mathematical structures, or

models, that underlie learning, while fuzzy logic systems (FLS) enable us to embed

structured human knowledge into workable algorithms. The book assumes that it is not

only useful, but necessary, to treat SVMs, NNs, and FLS as parts of a connected whole.

This approach enables the reader to develop SVMs, NNs, and FLS in addition to

understanding them.

1.2.8 IMAGE COMPRESSION WITH NEURAL NETWORKS [8]

This paper surveys new technologies, such as neural networks and genetic algorithms, being developed to explore the future of image coding. Successful applications of neural networks to vector quantization have become well established, and other aspects of neural network involvement in this area are stepping up to play significant roles in assisting the traditional technologies. The paper presents an extensive survey of the development of neural networks for image compression, covering three categories: direct image compression by neural networks; neural network implementations of existing techniques; and neural-network-based technologies which provide improvements over traditional algorithms.

1.2.9 NEURAL NETWORKS BY SIMON HAYKIN[9]

The author of this book briefs the concepts of SVM and how SVM is used in pattern recognition. The book also gives information about the generalization ability of a linear SVM and about the kernels used.

CHAPTER 2

BACKGROUND THEORIES

2.1 IMAGE COMPRESSION:

Image compression is the application of data compression to digital images. In

effect, the objective is to reduce redundancy of the image data in order to be able to store or

transmit data in an efficient form. Image compression is minimizing the size in bytes of a

graphics file without degrading the quality of the image to an unacceptable level. The reduction

in file size allows more images to be stored in a given amount of disk or memory space. It also

reduces the time required for images to be sent over the Internet or downloaded from Web pages.

2.1.1 APPLICATIONS:

Currently image compression is recognized as an “Enabling Technology”. It is used in the following applications:

• Image compression is the natural technology for handling the increased spatial resolutions of today’s imaging sensors and evolving broadcast television standards.

• It plays a major role in many important and diverse applications, including video teleconferencing, remote sensing, document and medical imaging, and facsimile transmission.

• It is also very useful in the control of remotely piloted vehicles in military, space, and hazardous waste management applications.

2.1.2 NEED FOR COMPRESSION:

One of the important aspects of image storage is its efficient compression. To make this clear, consider an example. An image of 1024 pixels x 1024 pixels x 24 bits, without compression, would require 3 MB of storage and about 7 minutes for transmission over a high-speed, 64 kbps, ISDN line. If the image is compressed at a 10:1 compression ratio, the storage requirement is reduced to about 300 KB and the transmission time drops to about 40 seconds. Seven 1 MB images can be compressed and transferred to a floppy disk in less time than it takes to send one of the original files, uncompressed, over an AppleTalk network.
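The arithmetic of this example can be reproduced directly; the following Python sketch assumes the conventions used above (1 MB = 2^20 bytes, a 64 kbps line).

```python
# Storage and transmission figures for the 1024 x 1024 x 24-bit example above.
width, height, bits_per_pixel = 1024, 1024, 24

size_bits = width * height * bits_per_pixel
print(f"Uncompressed size: {size_bits / 8 / 2**20:.0f} MB")         # 3 MB

isdn_bps = 64_000                      # 64 kbps ISDN line
print(f"Transmission time: {size_bits / isdn_bps / 60:.1f} min")    # about 6.6 min

ratio = 10                             # 10:1 compression
print(f"Compressed size: {size_bits / ratio / 8 / 1024:.0f} KB")    # about 300 KB
print(f"Compressed time: {size_bits / ratio / isdn_bps:.0f} s")     # about 39 s
```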

In a distributed environment large image files remain a major bottleneck within systems.

Compression is an important component of the solutions available for creating file sizes of

manageable and transmittable dimensions. Increasing the bandwidth is another method, but the

cost sometimes makes this a less attractive solution.

At the present state of technology, the only solution is to compress multimedia data

before its storage and transmission, and decompress it at the receiver for playback. For example,

with a compression ratio of 32:1, the space, bandwidth, and transmission time requirements can

be reduced by a factor of 32, with acceptable quality.

The figures in Table 2.1 show the qualitative transition from simple text to full-motion video data, and the disk space, transmission bandwidth, and transmission time needed to store and transmit such uncompressed data.

Table 2.1 Multimedia data types and the uncompressed storage space, transmission bandwidth, and transmission time required. The prefix kilo- denotes a factor of 1000 rather than 1024.

Multimedia data            Size/Duration      Bits/pixel or    Uncompressed   Transmission    Transmission time
                                              bits/sample      size           bandwidth       (28.8k modem)
A page of text             11'' x 8.5''       varying          4-8 kB         32-64 kb/page   1.1-2.2 sec
Telephone-quality speech   10 sec             8 bits/sample    80 kB          64 kb/sec       22.2 sec
Grayscale image            512 x 512          8 bpp            262 kB         2.1 Mb/image    1 min 13 sec
Colour image               512 x 512          24 bpp           786 kB         6.29 Mb/image   3 min 39 sec
Medical image              2048 x 1680        12 bpp           5.16 MB        41.3 Mb/image   23 min 54 sec
SHD image                  2048 x 2048        24 bpp           12.58 MB       100 Mb/image    58 min 15 sec
Full-motion video          640 x 480, 1 min   24 bpp           1.66 GB        221 Mb/sec      5 days 8 hrs
                           (30 frames/sec)

The examples above clearly illustrate the large storage space, transmission bandwidth, and transmission time demanded by uncompressed image, audio, and video data.

2.1.3 COMPRESSION PRINCIPLE:

A common characteristic of most images is that the neighboring pixels are correlated and

therefore contain redundant information. The foremost task then is to find a less correlated

representation of the image.

Image compression addresses the problem of reducing the amount of data required to

represent a digital image. The underlying basis of the reduction process is the removal of

redundant data. From a mathematical viewpoint, this amounts to transforming a 2-D pixel array

into a statistically uncorrelated data set. The transformation is applied prior to storage and

transmission of the image. The compressed image is decompressed at some later time, to

reconstruct the original image or an approximation to it.
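The correlation between neighboring pixels can be demonstrated on a small synthetic image; the sketch below (NumPy, with an artificial smooth ramp standing in for a natural image) also shows how simple differencing, one elementary decorrelating transform, removes most of that redundancy.

```python
import numpy as np

# Correlation between horizontally neighboring pixels of a smooth synthetic image.
rng = np.random.default_rng(0)
row = np.linspace(0, 255, 256)
img = np.tile(row, (64, 1)) + rng.normal(0, 4, (64, 256))  # smooth ramp + mild noise

left = img[:, :-1].ravel()
right = img[:, 1:].ravel()
corr = np.corrcoef(left, right)[0, 1]
print(f"neighbor correlation: {corr:.3f}")     # close to 1, i.e. highly redundant

# Differencing neighboring pixels concentrates the signal near zero.
diff = np.diff(img, axis=1)
print(f"variance of pixels:      {img.var():.1f}")
print(f"variance of differences: {diff.var():.1f}")  # far smaller
```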

Two fundamental components of compression are redundancy and irrelevancy reduction.

Redundancy reduction aims at removing duplication from the signal source (image/video).

Irrelevancy reduction omits parts of the signal that will not be noticed by the signal receiver,


namely the Human Visual System (HVS). In general, three types of redundancy can be

identified:

• Spatial Redundancy or correlation between neighboring pixel values.

• Spectral Redundancy or correlation between different color planes or spectral

bands.

• Temporal Redundancy or correlation between adjacent frames in a sequence of

images (in video applications).

Image compression research aims at reducing the number of bits needed to represent an

image by removing the spatial and spectral redundancies as much as possible.

The best image quality at a given bit-rate (or compression rate) is the main goal of image

compression. However, there are other important properties of image compression schemes.

Scalability generally refers to a quality reduction achieved by manipulation of the bit-stream or file (without decompression and re-compression). Other names for scalability are progressive coding or embedded bit-streams. Despite its contrary nature, scalability can also be found in lossless codecs, usually in the form of coarse-to-fine pixel scans. Scalability is especially useful for previewing images while downloading them (e.g. in a web browser) or for providing variable-quality access to e.g. image databases. There are several types of scalability:

• Quality progressive or layer progressive: The bit-stream successively refines the

reconstructed image.

• Resolution progressive: First encode a lower image resolution; then encode the difference

to higher resolutions.

• Component progressive: First encode grey; then color.

Region of interest coding: Certain parts of the image are encoded with higher quality than others.

This can be combined with scalability (encode these parts first, others later).

Meta information: Compressed data can contain information about the image which can be used

to categorize, search or browse images. Such information can include color and texture statistics and small preview images.


The quality of a compression method is often measured by the peak signal-to-noise ratio (PSNR), which measures the amount of noise introduced through a lossy compression of the image. However, the subjective judgment of the viewer is also regarded as an important, perhaps the most important, measure.

2.1.4 CLASSIFICATION OF COMPRESSION TECHNIQUE:

Two ways of classifying compression techniques are mentioned here.

(a) Lossless vs. Lossy compression:

In lossless compression schemes, the reconstructed image, after compression, is numerically identical to the original image. However, lossless compression can only achieve a modest amount of compression. An image reconstructed following lossy compression contains degradation relative to the original, often because the compression scheme completely discards redundant information. However, lossy schemes are capable of achieving much higher compression, and under normal viewing conditions no visible loss is perceived (visually lossless).

Compressing an image is significantly different from compressing raw binary data. Of

course, general-purpose compression programs can be used to compress images, but the

result is less than optimal. This is because images have certain statistical properties,

which can be exploited by encoders specifically designed for them. Also, some of the

finer details in the image can be sacrificed for the sake of saving a little more bandwidth

or storage space. Lossy compression methods are especially suitable for natural images

such as photos in applications where minor (sometimes imperceptible) loss of fidelity is

acceptable to achieve a substantial reduction in bit rate.

A text file or program can be compressed without the introduction of errors, but only up to a certain extent; this is lossless compression, and beyond this point errors are introduced. Lossless compression is sometimes preferred for artificial images such as technical drawings, icons or comics, because lossy compression methods, especially when used at low bit rates, introduce compression artifacts. Lossless compression may also be preferred for high-value content, such as medical imagery or image scans made for archival purposes. In text and program files, it is crucial that compression be lossless, because a single error can seriously damage the meaning of a text file or cause a program not to run. In image compression, a small loss in quality is usually not noticeable. There is no "critical point" up to which compression works perfectly but beyond which it becomes impossible. When there is some tolerance for loss, the compression factor can be greater than when there is no loss tolerance. For this reason, graphic images can be compressed more than text files or programs.

The information loss in lossy coding comes from quantization of the data. Quantization can be described as the process of sorting the data into different bins and representing each bin with a single value. The value selected to represent a bin is called the reconstruction value. Every item in a bin has the same reconstruction value, which leads to information loss (unless the quantization is so fine that every item gets its own bin).
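A minimal sketch of uniform quantization (step size 16, chosen arbitrarily) makes the bins, reconstruction values, and resulting information loss concrete:

```python
import numpy as np

# Uniform quantization with step 16: each bin of 16 values shares one
# reconstruction value (the bin center), which is where information is lost.
step = 16
data = np.arange(0, 256)

bins = data // step                        # bin index for every sample
reconstruction = bins * step + step // 2   # one reconstruction value per bin

print(np.unique(reconstruction).size)      # 16 distinct values instead of 256
print(np.abs(data - reconstruction).max()) # error never exceeds step/2 = 8
```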

(b) Predictive vs. Transform coding: In predictive coding, information already sent or

available is used to predict future values, and the difference is coded. Since this is

done in the image or spatial domain, it is relatively simple to implement and is

readily adapted to local image characteristics. Differential Pulse Code Modulation

(DPCM) is one particular example of predictive coding. Transform coding, on the

other hand, first transforms the image from its spatial domain representation to a

different type of representation using some well-known transform and then codes the

transformed values (coefficients). This method provides greater data compression

compared to predictive methods, although at the expense of greater computation.
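The predictive idea can be sketched as a first-order DPCM-style coder; the signal values below are made up, and the entropy coding of the differences is omitted:

```python
import numpy as np

# First-order DPCM sketch: transmit the first sample plus successive
# differences; the decoder rebuilds the signal by accumulating them.
signal = np.array([100, 102, 105, 104, 108], dtype=np.int32)

residual = np.diff(signal)               # small differences, cheap to code
decoded = np.concatenate(([signal[0]], signal[0] + np.cumsum(residual)))

print(residual)                          # differences have a much smaller range
print(np.array_equal(decoded, signal))   # lossless reconstruction -> True
```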

2.1.5 IMAGE COMPRESSION MODEL:

The block diagram of the image compression model is given in fig 2.1

Figure 2.1 Image Compression Model (Source Encoder -> Channel Encoder -> Channel -> Channel Decoder -> Source Decoder)

2.1.5.1 SOURCE ENCODER:

The source encoder is responsible for reducing the coding, interpixel, or psychovisual redundancies in the input image. In the first stage of the source encoding process, the mapper transforms the input data into a format designed to reduce the interpixel redundancies in the input image. The second stage, or quantizer block, reduces the accuracy of the mapper's output in accordance with some pre-established fidelity criterion; this stage reduces the psychovisual redundancies of the input image. In the third and final stage of the source encoder, the symbol encoder creates a fixed- or variable-length code to represent the mapped and quantized data set.

Figure 2.2 Source Encoder (Mapper -> Quantizer -> Symbol Encoder)

2.1.5.2 SOURCE DECODER:

The source decoder contains only two components: a symbol decoder and an inverse mapper. These blocks perform, in reverse order, the inverse operations of the source encoder's symbol encoder and mapper blocks.

2.1.5.3 CHANNEL ENCODER & DECODER:

The channel encoder and decoder play an important role in the overall encoding-decoding

process when the channel in fig 2.1 is noisy or prone to error. They are designed to reduce the

impact of channel noise by inserting a controlled form of redundancy into the source encoded

data. As the output of the source encoder retains little redundancy, it would be highly sensitive to

transmission noise without the addition of this controlled redundancy.

2.2.1 COMPRESSION RATIO:

The compression ratio is defined as the ratio of the size of the original uncompressed image to the size of the compressed image:

CR = n1 / n2        (Eq 2.1)

where n1 and n2 are the number of bits in the original and the compressed image respectively.


2.2.2 BITS PER PIXEL:

Bits per pixel is defined as the ratio of the number of bits required to encode the image to the number of pixels in the image:

bpp = (total number of bits) / (number of pixels)        (Eq 2.2)

2.2.3 ENTROPY:

Entropy is the measure of average information in an image:

H = - Σ p_k log2(p_k), summed over the L gray levels        (Eq 2.3)

where p_k = probability of the kth gray level = n_k / n, n_k = total number of pixels with gray level k, n = total number of pixels, and L = total number of gray levels.
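As a quick check of the definition, the entropy of a toy image with four equally likely gray levels works out to exactly 2 bits/pixel:

```python
import numpy as np

# Entropy of a toy 4-level image: H = -sum(p_k * log2(p_k)).
img = np.array([[0, 0, 1, 1],
                [2, 2, 3, 3],
                [0, 1, 2, 3],
                [0, 1, 2, 3]])

counts = np.bincount(img.ravel())
p = counts / img.size                 # p_k = n_k / n, here 0.25 for every level
entropy = -np.sum(p * np.log2(p))
print(f"{entropy:.2f} bits/pixel")    # 2.00 for four equally likely levels
```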

2.2.4 PSNR:

The peak signal to noise ratio is defined as

PSNR = 10 log10( 255^2 / MSE ),  with  MSE = (1/(M×N)) Σ_i Σ_j (X_ij − X′_ij)^2        (Eq 2.4)

where X_ij and X′_ij are the original and reconstructed pixel values at location (i, j) respectively, and M×N is the image size.
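The definition translates directly into code; the sketch below assumes 8-bit images (peak value 255):

```python
import numpy as np

# PSNR for 8-bit images: 10*log10(255^2 / MSE) over an M x N image.
def psnr(x, x_rec):
    mse = np.mean((x.astype(np.float64) - x_rec.astype(np.float64)) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)

x = np.zeros((8, 8), dtype=np.uint8)
x_rec = np.ones((8, 8), dtype=np.uint8)   # every pixel off by 1 -> MSE = 1
print(f"{psnr(x, x_rec):.2f} dB")         # 48.13 dB
```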

2.3 IMAGE COMPRESSION TECHNIQUES:

2.3.1 JPEG: DCT BASED IMAGE CODING STANDARD:

The DCT can be regarded as a discrete-time version of the Fourier-Cosine series. It is a

close relative of the DFT, a technique for converting a signal into elementary frequency components.

Thus DCT can be computed with a Fast Fourier Transform (FFT) like algorithm in O(n log n)


operations. Unlike the DFT, the DCT is real-valued and provides a better approximation of a signal with fewer coefficients. The DCT of a discrete signal x(n), n = 0, 1, ..., N-1, is defined as:

X(u) = sqrt(2/N) α(u) Σ_{n=0..N-1} x(n) cos[ (2n+1)uπ / 2N ],  u = 0, 1, ..., N-1        (Eq 2.5)

where α(u) = 0.707 for u = 0 and
α(u) = 1 otherwise.
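The defining formula can be written as an N x N matrix, which makes two properties easy to verify numerically: the transform is orthonormal (so the inverse is simply the transpose), and a smooth signal's energy is concentrated in the first few coefficients. A NumPy sketch:

```python
import numpy as np

# N-point DCT as a matrix: C[u, n] = sqrt(2/N) * a(u) * cos((2n+1) u pi / (2N)),
# with a(0) = 0.707 (= 1/sqrt(2)) and a(u) = 1 otherwise.
N = 8
n = np.arange(N)
C = np.sqrt(2.0 / N) * np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / (2 * N))
C[0, :] /= np.sqrt(2)                    # a(0) scaling

print(np.allclose(C @ C.T, np.eye(N)))   # orthonormal, so the IDCT is just C.T

# Energy compaction: a smooth ramp concentrates energy in the lowest coefficients.
x = np.linspace(0, 7, N)
X = C @ x
print(np.abs(X).round(2))                # large DC term, rapidly decaying rest
```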

JPEG established the first international standard for still image compression where the encoders

and decoders are DCT-based. The JPEG standard specifies three modes namely sequential,

progressive, and hierarchical for lossy encoding, and one mode of lossless encoding. The

`baseline JPEG coder' which is the sequential encoding in its simplest form, will be briefly

discussed here. Fig. 2.3 and 2.4 show the key processing steps in such an encoder and decoder

for grayscale images. Color image compression can be approximately regarded as compression

of multiple grayscale images, which are either compressed entirely one at a time, or are

compressed by alternately interleaving 8x8 sample blocks from each in turn.

The original image block is recovered from the DCT coefficients by applying the inverse discrete cosine transform (IDCT), given by:

x(n) = sqrt(2/N) Σ_{u=0..N-1} α(u) X(u) cos[ (2n+1)uπ / 2N ],  n = 0, 1, ..., N-1        (Eq 2.6)

where α(u) = 0.707 for u = 0 and
α(u) = 1 otherwise.

Steps in JPEG Compression:

1. If the color is represented in RGB mode, translate it to YUV.

2. Divide the file into 8 X 8 blocks.


3. Transform the pixel information from the spatial domain to the frequency domain with the

Discrete Cosine Transform.

4. Quantize the resulting values by dividing each coefficient by an integer value and

rounding off to the nearest integer.

5. Scan the resulting coefficients in zigzag order, then entropy-code them with Huffman coding.
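Steps 2-4 can be sketched on a single 8x8 block; for simplicity a single quantizer step stands in for the JPEG quantization table, and the zigzag/Huffman stage is omitted:

```python
import numpy as np

# Minimal sketch of JPEG-style coding of one 8x8 block: 2-D DCT,
# uniform quantization of the coefficients, then dequantize + inverse DCT.
N = 8
n = np.arange(N)
C = np.sqrt(2.0 / N) * np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / (2 * N))
C[0, :] /= np.sqrt(2)                    # orthonormal DCT matrix

block = np.add.outer(np.arange(N), np.arange(N)) * 8.0  # smooth 8x8 gradient

coeffs = C @ block @ C.T                 # step 3: forward 2-D DCT
q = 16                                   # one step in place of the JPEG table
quantized = np.round(coeffs / q)         # step 4: quantize
restored = C.T @ (quantized * q) @ C     # dequantize + inverse DCT

print(int((quantized != 0).sum()))            # only a handful of nonzero coefficients
print(float(np.abs(block - restored).max()))  # reconstruction error stays small
```

Most quantized coefficients are zero, which is exactly what the zigzag scan and Huffman coding then exploit.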

Figure 2.3 Encoder block diagram.

Figure 2.4 Decoder block diagram

2.3.2 BASIC CONCEPTS OF SVM:

Support Vector Machine is a universal learning machine. It has its roots in neural networks and

statistical learning theory.

2.3.2.1 MACHINE LEARNING:

Machine learning is a subfield of artificial intelligence (AI) concerned with algorithms that allow computers to learn. In most cases this means that an algorithm is given a set of data and infers information about the properties of the data, and that information allows it to make predictions about other data that it might see in the future. This is possible because almost all nonrandom data contains patterns, and these patterns allow the machine to generalize. In order to generalize, it trains a model with what it determines are the important aspects of the data.

To understand how models come to be, we consider a simple example in the otherwise complex

field of email filtering. Suppose we receive a lot of spam that contains the words online

pharmacy. As a human being, we are well equipped to recognize patterns, and can quickly

determine that any message with the words online pharmacy is spam and should be moved

directly to the trash. This is a generalization: we have in fact created a mental model of what spam is.

There are many different machine-learning algorithms, all with different strengths and suited to

different types of problems. Some, such as decision trees, are transparent, so that an observer can

totally understand the reasoning process undertaken by the machine. Others, such as neural networks, are "black boxes", meaning that they produce an answer, but it is often very difficult to reproduce the reasoning behind it.

2.3.2.2 SUPPORT VECTOR MACHINE:

Support vector machines (SVMs), introduced by Vapnik and coworkers in 1992, have been noted as among the best classifiers of the past 20 years. They are popular in bioinformatics, text analysis and pattern classification. As a learning method, the support vector machine is regarded as one of the best classifiers with a strong mathematical foundation, and during the past decade it has been commonly used as a classifier for various applications.

The handling of high feature dimensionality and the labeling of training data are the two major challenges in pattern recognition. To handle high feature dimensionality, there are two major approaches; one is to use special classifiers which are not sensitive to dimensionality, for example the SVM algorithm.

2.3.2.3 LINEAR CLASSIFICATION PROBLEM:

Most matrimonial sites collect a lot of interesting information about their members, including

demographic information, interests, and behavior. Imagine that this site collects the following

information:


• Age

• List of interests

• Location

• Qualification

Furthermore, this site collects information about whether two people have made a good match,

whether they initially made contact, and if they decided to meet in person. This data is used to

create the matchmaker dataset.

Each row has information about a man and a woman and, in the final column, a 1 or a 0 to

indicate whether or not they are considered a good match. For a site with a large number of

profiles, this information might be used to build a predictive algorithm that assists users in

finding other people who are likely to be good matches. It might also indicate particular types of

people that the site is lacking, which would be useful in strategies for promoting the site to new

members. Let us take only the age parameters and the match information to illustrate how the classifiers work, since two variables are much easier to visualize.

2.3.2.4 SVM IN LINEAR CLASSIFICATION:

The main idea of SVM is to construct a hyperplane as the decision surface in such a way that the margin of separation between the positive and negative examples is maximized. The basic idea is to map the data into some other dot-product space (called the feature space). Consider a two-class linearly separable classification problem.

Figure 2.5 Linearly Separable Classification


Let {x1, ..., xn} be our data set and let di ∈ {1, -1} be the class label of xi. The decision boundary should classify all points correctly. The decision boundary is the hyperplane, whose equation is given by

w^T x + b = 0

where x is the input vector, w is the adjustable weight vector, and b is the bias. The problem here is that there can be many decision boundaries, as shown in figures 2.6(a), 2.6(b) and 2.6(c).

Figure 2.6 (a), (b), (c): Decision Boundaries that Can Be Formed

The decision boundary should be as far away from the data of both classes as possible; therefore we should maximize the margin. The margin is the width by which the boundary can be increased before it hits a data point. The positive plane that acts as the margin for the positive class is given by

{ x : w^T x + b = +1 }

and the negative plane which acts as the margin for the negative class is

{ x : w^T x + b = -1 }

Hence we classify a point as +1 if w^T x + b >= +1, and as -1 if w^T x + b <= -1. The vector w is perpendicular to both planes, and the margin width is 2 / |w|. So to maximize the margin we have to minimize the value of |w|. This can be done in many ways; the trick often used is the Lagrangian formulation of the problem.
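For intuition, these margin formulas can be verified numerically on a hypothetical two-point problem in which each class contributes exactly one support vector; in that special case the maximal-margin w can be written down in closed form:

```python
import numpy as np

# Two-point toy problem: with one support vector per class,
# w = 2(x_pos - x_neg)/||x_pos - x_neg||^2 satisfies w.x + b = +1 and -1
# on the two points, and the margin width 2/|w| equals their distance.
x_pos = np.array([2.0, 2.0])   # class +1
x_neg = np.array([0.0, 0.0])   # class -1

d = x_pos - x_neg
w = 2 * d / np.dot(d, d)
b = 1 - np.dot(w, x_pos)

print(np.dot(w, x_pos) + b)    # +1.0 on the positive margin plane
print(np.dot(w, x_neg) + b)    # -1.0 on the negative margin plane
print(2 / np.linalg.norm(w))   # margin width = distance between the points
```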


Figure 2.7 Support Vectors and the Hyperplane

Support vectors are those points which the margin pushes up against the hyperplane. The particular data points for which the constraint y_i (w^T x_i + b) >= 1 is satisfied with the equality sign are called the support vectors, hence the name support vector machine. These vectors play a prominent role in the operation of this class of learning machine. In conceptual terms, the support vectors are the points that lie closest to the hyperplane and are the most difficult to classify. As such, they have a direct bearing on the optimum location of the decision surface.

2.3.2.5 SOLUTION BY LAGRANGIAN MULTIPLIERS:

The Lagrangian is written:

L(w, b, α) = 0.5 w^T w - Σ_{i=1..l} α_i [ y_i (w^T x_i + b) - 1 ]

where the α_i are the Lagrange multipliers. This is now an optimization problem without constraints, where the objective is to minimize the Lagrangian L(w, b, α).

2.3.2.6 NON-SEPARABLE CLASSIFICATION:

There is no line that can be drawn between the two classes that separates the data without misclassifying some data points. Now the aim is to find the hyperplane that makes the smallest number of errors. Non-negative 'slack' variables ξ_1, ξ_2, ..., ξ_l are introduced. These measure the deviation of the data from the maximal margin, so it is desirable that the ξ_i be as small as possible.

The optimization problem is now:

minimize  f(x, w) = 0.5 ||w||^2 + C Σ_{i=1..l} ξ_i

Here C is a design parameter called the penalty parameter, which controls the magnitude of the ξ_i. An increase in C penalizes larger errors (large ξ_i). However, this can be achieved only by increasing the weight vector norm ||w|| (which we want to minimize). At the same time, an increase in ||w|| does not guarantee smaller ξ_i.

Figure 2.8 Non-Linearly Separable Classification

2.3.2.7 FUNCTION APPROXIMATION BY SVM:

Regression is an extension of the non-separable classification such that each data point can be

thought of as being in its own class.

We are now approximating functions of the form

f(x, w) = Σ_{i=1..N} w_i φ_i(x)

where the functions φ_i(x) are termed kernel functions (basis functions) and N is the number of support vectors.

Vapnik's linear loss function with ε-insensitivity zone is used as a measure of the error of approximation.

Thus, the loss is equal to 0 if the difference between the predicted f(x, w) and the measured value is less than ε. Vapnik's ε-insensitivity loss function defines an ε-tube such that if the predicted value is within the tube the error is zero. For all predicted points outside the tube, the error equals the magnitude of the difference between the prediction error and the radius ε of the tube:

Error = |y − f(x, w)| − ε
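As an illustration, the ε-insensitive loss just described can be written down directly. The following Python sketch is ours (function and variable names are not from the text):

```python
# Vapnik's epsilon-insensitive loss: zero inside the tube, linear outside.

def eps_insensitive_loss(y, y_pred, eps):
    """Return 0 if |y - y_pred| <= eps, else |y - y_pred| - eps."""
    diff = abs(y - y_pred)
    return 0.0 if diff <= eps else diff - eps

print(eps_insensitive_loss(1.0, 1.05, 0.1))  # inside the tube -> 0.0
print(eps_insensitive_loss(1.0, 1.30, 0.1))  # outside -> roughly 0.2
```

Only the excess beyond the tube radius is penalized, which is what allows many training points to drop out of the solution entirely.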

The total ‘risk’ or error is given by:

R_emp = (1/l) Σ_{i=1}^{l} | y_i − w^T x_i − b |_ε

The goal is now to minimize R_emp. From the definition of (ξ_i, ξ_i*) for data outside the insensitivity tube ε:

|y − f(x, w)| − ε = ξ for data above the ε-tube
|y − f(x, w)| − ε = ξ* for data below the ε-tube

so our optimization problem is now to find w which minimizes the ‘risk’ or error given by

R_{w,ξ,ξ*} = 0.5 ||w||^2 + C ( Σ_{i=1}^{l} ξ_i + Σ_{i=1}^{l} ξ_i* )

where ξ_i and ξ_i* are slack variables for measurements 'above' and 'below' the ε-tube respectively, and a Gaussian kernel is used. Forming the Lagrangian and taking the partial derivative with respect to w gives:

w = Σ_{i=1}^{l} (α_i − α_i*) x_i

Similarly, taking the partial derivatives with respect to b, ξ_i and ξ_i*, we obtain, in matrix notation:

min_α L(α) = 0.5 α^T H α − f^T α

where H = [ x x^T + 1 ] and

f^T = [ ε − y_1, ε − y_2, …, ε − y_l, ε + y_1, ε + y_2, …, ε + y_l ]

Our final goal is to solve non-linear regression problems, i.e. problems of the type

f(x; w) = w^T G(x) + b

where G(x) is a non-linear mapping that maps the input space x to a feature space. The mapping G(x) is normally the RBF design matrix, given by:

G = [ G(x_1, c_1)  …  G(x_1, c_l)
           …        …       …
      G(x_l, c_1)  …  G(x_l, c_l) ]

where G(x, c) is the kernel function. Typically a Gaussian kernel function is used, given by (in 1 dimension):

G(x, c, λ) = exp[ −0.5 ((x − c) / λ)^2 ]

where x is the spatial coordinate, c is the centre of the Gaussians, and λ is the Gaussian width (or shape parameter).
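The Gaussian kernel and the RBF design matrix above can be sketched in a few lines of Python. This is our own illustrative code (names and sample data are made up, not from the text):

```python
import math

# 1-D Gaussian kernel G(x, c, lam) = exp(-0.5 * ((x - c) / lam)**2),
# and the RBF design matrix G[i][j] = G(x_i, c_j).

def gaussian_kernel(x, c, lam):
    return math.exp(-0.5 * ((x - c) / lam) ** 2)

def design_matrix(xs, centres, lam):
    return [[gaussian_kernel(x, c, lam) for c in centres] for x in xs]

xs = [0.0, 1.0, 2.0]
G = design_matrix(xs, xs, lam=1.0)   # centres at the data points themselves
print(G[0][0])  # kernel of a point with itself is exp(0) = 1.0
```

When the centres are the training points themselves, the diagonal of G is all ones, since each point is at zero distance from its own centre.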

To solve non-linear regression problems, the only change required is to the Hessian matrix, which is given by

H = [  G(x, c)   −G(x, c)
      −G(x, c)    G(x, c) ]

The weights vector w is found from

w = α − α*

Note that when a positive definite kernel (such as a Gaussian or complete polynomial) is used, the bias b equals zero.

2.3.2.8 APPLICATIONS OF SUPPORT VECTOR MACHINES

Since support-vector machines work well with high-dimensional datasets, they are most often

applied to data-intensive scientific problems and other problems that deal with very complex sets

of data. Some examples include:

• Classifying facial expressions

• Detecting intruders using military datasets

• Predicting the structure of proteins from their sequences

• Handwriting recognition

• Determining the potential for damage during earthquakes

• Digital watermarking

• Image compression

CHAPTER 3

3.1 PROGRAMMING METHODOLOGY:


Most image compression algorithms operate in the frequency domain. That is, the image is first processed through some frequency-analyzing function, further processing is applied to the resulting coefficients, and the results are generally encoded using an entropy encoding scheme such as Huffman coding. The JPEG image compression algorithm is an example of an algorithm of this type. The first step of the JPEG algorithm is to subdivide the image into 8×8 blocks, then apply the DCT to each block. Next, quantization is applied to the resulting DCT coefficients. This is simply dividing each element in the matrix of DCT coefficients by a corresponding element in a 'quantizing matrix'. The effect is to reduce the value of most coefficients, some of which vanish (i.e. their value becomes zero) when rounding is applied. Huffman coding is used to encode the coefficients.
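The JPEG-style quantization step just described can be sketched as follows. This is our own illustration; the quantizing matrix here is a toy uniform matrix, not the JPEG standard one:

```python
# JPEG-style quantization of an 8x8 DCT block: divide each coefficient by the
# matching entry of a quantizing matrix and round; many values become zero.

def quantize_block(dct_block, q_matrix):
    return [[round(dct_block[i][j] / q_matrix[i][j]) for j in range(8)]
            for i in range(8)]

q = [[16] * 8 for _ in range(8)]            # toy uniform quantizer
block = [[5.0] * 8 for _ in range(8)]       # small coefficients...
print(quantize_block(block, q)[0][0])       # ...round to zero
```

The rounding after division is exactly where the lossy step occurs: coefficients smaller than about half the quantizer entry are discarded entirely.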

In this chapter the image is transformed into the frequency domain and SVM is applied to the frequency components. The Discrete Cosine Transform (DCT) is used as it has properties which are exploited in SVM learning. The basic idea is to transform the image using the DCT, use SVM learning to compress the DCT coefficients, and use Huffman coding to encode the data as a stream of bits.

The algorithm presented here uses the discrete cosine transform. The DCT has properties which make it suitable for SVM learning, and SVM learning is applied to the DCT coefficients. Before the SVM learning is applied, the DCT coefficients are 'processed' in such a way as to make the trend of the DCT curve more suitable for generalization by an SVM. As the DCT is fundamental to the algorithm, a detailed description follows.

3.2 DESCRIPTIONS:

3.2.1 Input image:

The input image that is chosen is required to be a gray-scale image with intensity levels 0–255. The input image chosen depends upon the application where the compression is required.

3.2.2 DISCRETE COSINE TRANSFORM:


The DCT has properties making it the choice for a number of compression schemes; it is the basis for the JPEG compression scheme. The DCT is a transform that maps a block of pixel colour values in the spatial domain to values in the frequency domain.

The DCT of a discrete signal x(n), n = 0, 1, …, N−1, is defined as:

C(u) = α(u) √(2/N) Σ_{n=0}^{N−1} x(n) cos[ (2n + 1)uπ / 2N ]    (Eq 3.1)

where α(u) = 0.707 for u = 0, and α(u) = 1 otherwise.
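The 1-D DCT can be computed directly from this definition. The following Python sketch is ours (not part of the original Matlab implementation) and uses the direct O(N^2) form rather than a fast transform:

```python
import math

# Direct 1-D DCT: C(u) = alpha(u) * sqrt(2/N) *
# sum_n x(n) * cos((2n+1) * u * pi / (2N)), with alpha(0) = 1/sqrt(2).

def dct_1d(x):
    N = len(x)
    out = []
    for u in range(N):
        a = 1.0 / math.sqrt(2) if u == 0 else 1.0
        s = sum(x[n] * math.cos((2 * n + 1) * u * math.pi / (2 * N))
                for n in range(N))
        out.append(a * math.sqrt(2.0 / N) * s)
    return out

print(dct_1d([1.0, 1.0, 1.0, 1.0]))  # constant signal: all energy in C(0)
```

For a constant signal the entire energy lands in the DC term, which matches the interpretation of the DC coefficient given below.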

The DCT is more efficient on smaller images. When the DCT is applied to large images, the rounding effects that arise when floating-point numbers are stored in a computer system result in the DCT coefficients being stored with insufficient accuracy, and the result is a deterioration in image quality. Moreover, as the size of the image is increased, the number of computations increases disproportionately. The image is therefore subdivided into 8×8 blocks. Where an image is not an integral number of 8×8 blocks, the image can be padded with white pixels (i.e. extra pixels are added so that the image can be divided into an integral number of 8×8 blocks). The 2-dimensional DCT is applied to each block so that an 8×8 matrix of DCT coefficients is produced for each block. This is termed the 'DCT matrix'. The top-left component of the DCT matrix is termed the 'DC' coefficient and can be interpreted as the component responsible for the average background colour of the block. The remaining 63 components of the DCT matrix are termed the 'AC' components, as they are frequency components. The DC coefficient is often much higher in magnitude than the AC components in the DCT matrix. The original image block is recovered from the DCT coefficients by applying the inverse discrete cosine transform (IDCT), given by:

x(n) = √(2/N) Σ_{u=0}^{N−1} α(u) C(u) cos[ (2n + 1)uπ / 2N ]    (Eq 3.2)

where α(u) = 0.707 for u = 0, and α(u) = 1 otherwise.
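The DCT/IDCT pair is lossless up to floating-point rounding, which is why all the loss in the scheme comes from the later quantization and SVM approximation stages. A self-contained round-trip sketch (our own code, with made-up sample data):

```python
import math

# Orthonormal 1-D DCT and its inverse: x(n) = sqrt(2/N) * sum_u alpha(u) *
# C(u) * cos((2n+1) * u * pi / (2N)).

def dct_1d(x):
    N = len(x)
    return [(1 / math.sqrt(2) if u == 0 else 1.0) * math.sqrt(2.0 / N) *
            sum(x[n] * math.cos((2 * n + 1) * u * math.pi / (2 * N))
                for n in range(N))
            for u in range(N)]

def idct_1d(c):
    N = len(c)
    return [math.sqrt(2.0 / N) *
            sum((1 / math.sqrt(2) if u == 0 else 1.0) * c[u] *
                math.cos((2 * n + 1) * u * math.pi / (2 * N))
                for u in range(N))
            for n in range(N)]

x = [52.0, 55.0, 61.0, 66.0]
rec = idct_1d(dct_1d(x))
print(max(abs(a - b) for a, b in zip(x, rec)))  # ~0: exact up to rounding
```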

3.2.3 TRANSFORMATION OF THE DCT MATRIX TO 1-D(ZIG-ZAG

TRANSFORMATION):


The elements of the DCT matrix are mapped using the zig-zag sequence shown in Figure 3.1 to produce a single row of numbers. That is, a single row of numbers is collected as the zig-zag trail is followed in the DCT matrix. This produces a row of 64 numbers whose magnitudes tend to decrease traveling down the row.

Figure 3.1: The zig-zag pattern applied to a block of DCT Coefficients
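The zig-zag scan of the figure can be sketched as follows. This is our own illustrative implementation (names are ours):

```python
# Zig-zag scan of an 8x8 matrix into a single row of 64 numbers, ordering
# DCT coefficients roughly from low to high frequency.

def zigzag(block):
    n = len(block)
    out = []
    for s in range(2 * n - 1):          # anti-diagonals, s = i + j
        coords = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            coords.reverse()            # go up-right on even diagonals
        out.extend(block[i][j] for i, j in coords)
    return out

m = [[r * 8 + c for c in range(8)] for r in range(8)]
print(zigzag(m)[:5])  # [0, 1, 8, 16, 9]: the familiar JPEG scan order
```

The first few indices (0, 1, 8, 16, 9, …) match the standard JPEG scan order, so low-frequency coefficients come first in the resulting row.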

3.2.4 COMBINING SVM WITH DCT

The 1-dimensional row of DCT coefficients is used as the training data for a SVM. SVM will

produce the minimum number of support vectors required to generalize the training data within a

predefined error (the ε-tube). Thus it is expected that when the row of DCT coefficients are used

as training data for the SVM, a lower number of support vectors will be required in order to

recover the DCT coefficients within the predefined error. Examination of the input data (i.e. the

DCT coefficients) reveals that the magnitudes of the coefficients are generally decreasing

27

traveling down the row of input data, however the sign (positive or negative) appears to be

random. This has the consequence that two coefficients next to each other can be of similar

magnitude but opposite sign causing a large swing in the input data. If the sign of each DCT

coefficient is ignored when used as input data to the SVM, there is the problem of how to re-

assign the signs when the DCT coefficients have been recovered.

The SVM learning process selects the minimum number of training points to use as the centers of the Gaussian kernel functions in an RBF network in order for the function to be approximated within the insensitivity zone. These selected training points are the support vectors. The insensitivity zone is drawn around the resulting function. When the penalty parameter C is infinite, the support vectors will always lie at the edge of the zone. There are only three parameters affecting the compression which must be defined before learning can begin: the maximum allowed error ε (termed the insensitivity zone in SVM terminology), the penalty parameter C, and the Gaussian shape parameter.

3.2.4.1 QUADRATIC PROGRAMMING

Quadratic Programming deals with functions in which the x_i are raised to the power of 0, 1, or 2. The goal of Quadratic Programming is to determine the x_i for which the function L(α) is a minimum. The system is usually stated in matrix and vector form.

A quadratic program is an optimization problem with a quadratic objective and linear constraints:

Minimize L(α) = 0.5 α^T H α + f^T α
subject to A x ≤ b

which is usually further defined by a number of constraints (the 1/2 factor is included in the quadratic term to avoid the appearance of a factor of 2 in the derivatives). L(α) is called the objective function, H is a symmetric matrix called the Hessian matrix, and f is a vector of constants. This is a constrained minimization problem with a quadratic function and linear inequality constraints, where

H = [  G(x, c)   −G(x, c)
      −G(x, c)    G(x, c) ]

and G(x, c) is given by the Gaussian kernel function. The vector f is:

f^T = [ ε − y_1, ε − y_2, …, ε − y_l, ε + y_1, ε + y_2, …, ε + y_l ]

3.2.4.2 KERNEL FUNCTION

The relationship between the kernel function K and the mapping φ(·) is

K(x, y) = <φ(x), φ(y)>

This is known as the kernel trick. In practice, we specify K, thereby specifying φ(·) indirectly, instead of choosing φ(·) directly. Intuitively, K(x, y) represents our desired notion of similarity between data x and y, and this comes from our prior knowledge. K(x, y) needs to satisfy a technical condition (Mercer's condition) in order for φ(·) to exist.

A linear operation in the feature space is equivalent to a non-linear operation in the input space, and the classification task can be "easier" with a proper transformation. Transforming x_i to a higher-dimensional space involves two spaces:
– Input space: the space containing the x_i
– Feature space: the space of the φ(x_i) after transformation

Figure 3.2: Transformation of input space to feature space
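The kernel-trick identity K(x, y) = <φ(x), φ(y)> can be checked numerically for a small example. The quadratic feature map below is a standard textbook choice, not one used by the algorithm in this report:

```python
import math

# For the 2-D quadratic map phi(x) = (x1^2, sqrt(2) x1 x2, x2^2), the inner
# product <phi(x), phi(y)> equals the polynomial kernel K(x, y) = (x . y)^2,
# so the feature space never has to be computed explicitly.

def phi(v):
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, y):
    return (x[0] * y[0] + x[1] * y[1]) ** 2

x, y = (1.0, 2.0), (3.0, 4.0)
lhs = sum(a * b for a, b in zip(phi(x), phi(y)))
print(lhs, poly_kernel(x, y))  # both equal (1*3 + 2*4)^2 = 121, up to rounding
```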

A Gaussian kernel function is used, given by (in 1 dimension):

G(x, c, λ) = exp[ −0.5 ((x − c) / λ)^2 ]

where x is the spatial coordinate, c is the centre of the Gaussians, and λ is the Gaussian width (or shape parameter).

3.2.5 THE ‘INVERSION’ BIT

The 'inversion bit' indicates which of the recovered points should be inverted (i.e. multiplied by −1) so that they are negative; that is, recovered points that were originally negative are made negative again by multiplying by −1 if the inversion bit is set. The inversion bit is a single '0' or '1'. It is the sign of the corresponding input data, and each input datum has an inversion bit.

After a block has been processed by the SVM, some of the recovered DCT coefficients may have a magnitude lower than the maximum error defined for the SVM. If such a component had an inversion bit of '1', it can be set to '0', as the sign of coefficients with small magnitude does not affect the final recovered image. Put another way, inversion bits for very small magnitude DCT coefficients do not contain significant information required for the recovery of the image.
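The sign-stripping and sign-restoring steps just described can be sketched as follows. This is our own illustration of the idea (names and data are ours):

```python
# Inversion bits: store |coefficient| plus one sign bit per coefficient.
# Bits for coefficients smaller than the allowed error eps are cleared,
# since their sign no longer matters for reconstruction.

def split_signs(coeffs, eps):
    mags = [abs(c) for c in coeffs]
    bits = [1 if (c < 0 and abs(c) >= eps) else 0 for c in coeffs]
    return mags, bits

def apply_signs(mags, bits):
    return [-m if b else m for m, b in zip(mags, bits)]

mags, bits = split_signs([3.0, -2.5, -0.01, 1.2], eps=0.1)
print(bits)                     # [0, 1, 0, 0]: tiny -0.01 loses its bit
print(apply_signs(mags, bits))  # [3.0, -2.5, 0.01, 1.2]
```

Clearing the bit for the tiny coefficient flips its sign on recovery, but the error introduced is below ε by construction.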

3.2.6 ENCODING DATA FOR STORAGE

For each block, weights and support vectors are required to be stored. The support vectors are the Gaussian centers. In our algorithm we combine the weights with the support vectors so that each block has the same number of weights as DCT coefficients. Where a weight has no corresponding support vector, the value of the weight is set to zero. That is, the only non-zero weights are those for which a training point has been chosen as a support vector by the support vector machine. The next step is to quantize the weights.

3.2.6.1 QUANTIZATION

Quantizing involves reassigning the value of each weight to one of a limited number of values. To quantize the weights, the maximum and minimum weight values (for the whole image) are found and the number of quantization levels is pre-defined. The number of quantization levels chosen is a degree of freedom in the algorithm.

The steps taken to quantize the weights are:

1. Find the maximum and minimum weight values. Call these max and min.
2. Find the difference d between quantization levels by d = (max − min)/n, where n is the number of quantization levels.
3. Set the lowest quantization level q_1 = min.
4. Set the remaining quantization levels by q_m = q_{m−1} + d, until q_n = max.
5. Reassign each weight the value of the closest matching quantization level q_m.
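The five steps can be sketched in Python. This is our own reading of the procedure (with n intervals of width d giving levels that run from min up to max; function names are ours):

```python
# Uniform quantization of the weights per the five steps above.

def quantize_weights(weights, n_levels):
    w_min, w_max = min(weights), max(weights)              # step 1
    d = (w_max - w_min) / n_levels                         # step 2
    levels = [w_min + m * d for m in range(n_levels + 1)]  # steps 3-4
    # step 5: snap each weight to the nearest level
    return [min(levels, key=lambda q: abs(q - w)) for w in weights]

print(quantize_weights([0.0, 0.24, 0.51, 1.0], n_levels=4))
# -> [0.0, 0.25, 0.5, 1.0]
```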

The inversion bits are now combined with the weights as follows. After quantization, the minimum quantization level is subtracted from each weight. This ensures that all weights have a positive value. An arbitrary number is then added to all weights (the same number is added to every weight), making all weights positive and non-zero. To recover the weights, both the minimum quantization level and the arbitrary number must be stored. Each individual weight has an associated inversion bit. The inversion bit is combined with its corresponding weight, making the value of the weight negative if the inversion bit is '1'; otherwise it is positive. Where the weight is not a support vector, the inversion data is discarded. This introduces a small error when the image is decompressed, but significantly increases compression. The above steps introduce many 'zero' values into the weight data. By setting inversion bits from '1' to '0' when the associated DCT coefficient is less than the error ε, many more zeros are introduced.

3.2.6.2 HUFFMAN ENCODING

The quantized weights are encoded using Huffman encoding. Huffman coding is an entropy encoding algorithm used for lossless data compression; it uses a variable-length code table for encoding source symbols.
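A minimal Huffman code construction can be sketched as follows. This is our own illustrative implementation (a real encoder would also store the code table alongside the bit stream):

```python
import heapq
from collections import Counter

# Build a Huffman code for a list of symbols: frequent symbols get shorter
# codes. Each heap entry is (frequency, tiebreak, {symbol: code-so-far}).

def huffman_code(symbols):
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in
            enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)       # two least frequent subtrees
        f2, i2, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, i2, merged))
    return heap[0][2]

data = [0, 0, 0, 0, 1, 1, 2]                  # many zeros, as in the weights
code = huffman_code(data)
print(len(code[0]) <= len(code[2]))           # frequent symbol: shorter code
```

Because the previous steps deliberately introduce many zero weights, the most frequent symbol (zero) receives the shortest codeword, which is where the compression gain comes from.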


CHAPTER 4

4.1 FORMULATION OF THE APPROACH:

The image is first sub-divided into 8×8 blocks. The 2-dimensional DCT is applied to each block

to produce a matrix of DCT coefficients. The zig-zag mapping is applied to each matrix of DCT

coefficients to obtain a single row of numbers for each original block of pixels. The first term of

each row (the DC component) is separated so that only the AC terms are left. Not all the terms in

the row of AC coefficients are needed since the higher order terms do not contribute significantly

to the image. Exactly how many values are taken is a degree of freedom in the algorithm.

Support vector machine learning is applied to the absolute values of each row of AC terms as

described above and the inversion number for each block is generated. By following this method,

for each original block the Gaussian centers (i.e. the support vectors), the weights and the inversion number need to be stored/transmitted to be able to recover the block. The AC components

are used as training data to a SVM. The SVM learning process selects the minimum number of

training points to use as the centers of the Gaussian kernel functions in an RBF network in order

for the function to be approximated within the insensitivity zone. These selected training points

are the support vectors. An SVM is trained on the data with the error and the Gaussian width set to different values. The SVM was implemented in Matlab using quadratic programming. This will return a value for α from which we can compute the weights. In order to recover the image

the DC coefficient, the support vectors, the weights and the inversion number are stored. The

next step is to quantize the weights. Quantizing involves reassigning the value of each weight to one of a limited number of values. To quantize the weights, the maximum and minimum weight values (for the whole image) are found and the number of quantization levels is pre-defined.

The number of quantization levels chosen is a degree of freedom in the algorithm. The inversion

bits are now combined with the weights. After quantization, the minimum quantization level is

subtracted from each weight. This will ensure that all weights have a positive value. An arbitrary

number is added to all weights (the same number is added to all numbers) making all weights

positive and non-zero. To recover the weights both the minimum quantization level and the

arbitrary number must be stored. The quantized weights and the number of zeros between non-zero weights are Huffman encoded to produce a binary file. The compression of the SVM surface

modeled images was computed from an actual binary file containing all information necessary to recover an approximated version of the original image. To objectively measure image quality, the signal-to-noise ratio (SNR) is calculated.
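The SNR measure used for this comparison can be computed as follows. This is a generic SNR-in-dB sketch of ours (flattened pixel lists, made-up data), not the exact Matlab code of the project:

```python
import math

# Signal-to-noise ratio in dB between an original and a reconstructed
# image: SNR = 10 * log10(signal power / noise power).

def snr_db(original, recovered):
    signal = sum(p * p for p in original)
    noise = sum((a - b) ** 2 for a, b in zip(original, recovered))
    return 10.0 * math.log10(signal / noise)

orig = [100.0, 120.0, 130.0, 90.0]
rec = [101.0, 119.0, 131.0, 89.0]     # off by 1 everywhere
print(round(snr_db(orig, rec), 1))    # about 40.9 dB
```

Higher values mean the reconstruction error is small relative to the image energy, which is how the tables in Chapter 5 should be read.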

4.2 FLOW CHART:

READING IMAGE
↓
DIVIDING IMAGE INTO 8×8 BLOCKS
↓
APPLYING DCT TO EACH BLOCK OF IMAGE
↓
ZIG-ZAG TRANSFORMATION
↓
MODELLING OF DCT COEFFICIENTS USING SVM
↓
QUANTIZATION AND HUFFMAN ENCODING
↓
HUFFMAN DECODING AND DEQUANTIZATION
↓
DCT COEFFICIENTS FROM SVM MODEL
↓
APPLYING IDCT TO EACH BLOCK
↓
OUTPUT IMAGE

CHAPTER 5

RESULTS AND ANALYSIS:

In this section, simulation results of the performance of the image compression algorithm are presented and compared with the existing JPEG algorithm. The implementation of the algorithm for the application of SVM (regression) learning and the DCT to image compression follows the steps described in Chapter 4: the image is first sub-divided into 8×8 blocks; the 2-dimensional DCT is applied to each block to produce a matrix of DCT coefficients; the zig-zag mapping converts each matrix into a single row of numbers; the first term of each row (the DC component) is separated so that only the AC terms are left; and SVM learning is applied to the absolute values of each row of AC terms while the inversion number for each block is generated. The resulting constrained minimization problem, with a quadratic function and linear inequality constraints (quadratic programming), returns a value for α from which the weights are computed. In order to recover the image, the DC coefficient, the support vectors, the weights and the inversion number are stored. The weights are then quantized (the number of quantization levels being a degree of freedom in the algorithm), combined with the inversion bits, offset so that all stored values are positive and non-zero, and Huffman encoded together with the run lengths of zeros between non-zero weights to produce a binary file. The compression of the SVM surface-modeled images was computed from an actual binary file containing all information necessary to recover an approximated version of the original image. To objectively measure image quality, the signal-to-noise ratio (SNR) is calculated.

5.1 INPUT IMAGE:

Figure 5.1: Input image (a) Lena, of size 128×128, is considered for compression

5.2 RESULTS OBTAINED FOR IMAGE COMPRESSION:

5.2.2 DIFFERENT VALUES OF EPSILON:

5.2.2.1 EPSILON=0.001

The compressed image is shown in Figure 5.2(e); the input image, the plot of the DCT coefficients (for one example block), the plot of the absolute values of the DCT coefficients, and the error between the output and the desired input (for one example block) are shown in Figures 5.2(a), 5.2(b), 5.2(c) and 5.2(d).


Figure 5.2: (a) Input image; (b) DCT coefficients (for one example block); (c) absolute value of DCT coefficients (for one example block); (d) error between the output and desired input (for one example block); (e) output image

5.2.2.2 EPSILON=0.01

The compressed image is shown in Figure 5.3(e); the input image, the plot of the DCT coefficients (for one example block), the plot of the absolute values of the DCT coefficients, and the error between the output and the desired input (for one example block) are shown in Figures 5.3(a), 5.3(b), 5.3(c) and 5.3(d).


Figure 5.3: (a) Input image; (b) DCT coefficients (for one example block); (c) absolute value of DCT coefficients (for one example block); (d) error between the output and desired input (for one example block); (e) output image

5.2.2.3 EPSILON=0.1

The compressed image is shown in Figure 5.4(e); the input image, the plot of the DCT coefficients (for one example block), the plot of the absolute values of the DCT coefficients, and the error between the output and the desired input (for one example block) are shown in Figures 5.4(a), 5.4(b), 5.4(c) and 5.4(d).


Figure 5.4: (a) Input image; (b) DCT coefficients (for one example block); (c) absolute value of DCT coefficients (for one example block); (d) error between the output and desired input (for one example block); (e) output image


Table 5.1: The number of bits for different epsilon values, quantization levels and numbers of support vectors, with the corresponding compression ratio and SNR (dB).

5.3 COMPARISON OF THE OBTAINED RESULTS WITH JPEG ALGORITHM:

Analysis:

For the purpose of comparison between the proposed algorithm and the JPEG algorithm, we have the compression ratio for both sets of images. Since the error bound can be set in advance in our proposed algorithm, we set the bound to different values and compare the resulting compressed images. In JPEG, considerable error is found even though the picture quality is maintained. The signal-to-noise ratio was considered for comparison, and its value in dB is found as per the formula discussed in Chapter 2. It is seen that the signal-to-noise ratio is very high in the case of our algorithm, so the image information is well preserved, and good still-image compression can be obtained through this algorithm.


Epsilon (ε) | No. of quantization levels | Number of support vectors | Length of Huffman code | Total number of bits | Compression ratio | SNR (dB)
0.001       | 60                         | 16128                     | 64343                  | 131072               | 2.03              | 38
0.01        | 60                         | 16128                     | 36107                  | 131072               | 3.63              | 22
0.1         | 60                         | 8756                      | 22354                  | 131072               | 5.86              | 18

5.4 INPUT IMAGE:

Figure 5.5 Input image Lena of size 128*128 is considered for compression

5.5 RESULTS OBTAINED FOR JPEG COMPRESSION:

5.5.1 ANALYSIS:

The compressed images are shown in Figures 5.6(b), 5.6(c) and 5.6(d), and the input image in Figure 5.6(a).


Figure 5.6: (a) Input image; (b)(c)(d) compressed images for quality coefficients 2, 5 and 10 respectively

Table 5.2: The number of bits for different quality coefficients and lengths of Huffman code, with the corresponding compression ratio and SNR (dB).


CHAPTER 6

CONCLUSION:

In this project, an image compression algorithm which takes advantage of SVM learning was presented. The algorithm

exploits the trend of the DCT coefficients after the image has been transformed from the spatial domain to the frequency domain via the DCT. SVM learning is used to estimate the DCT coefficients within a predefined error. The SVM is trained on the absolute magnitudes of the DCT coefficients, as these values require fewer support vectors to estimate the underlying function. The net result of the SVM learning is to compress the DCT coefficients much further than other methods such as JPEG. The algorithm also defines how the original values are recovered by the introduction of the inversion number. The inversion number allows us to recover the original sign (i.e., positive

or negative) of each DCT coefficient so that, combined with the magnitude of the coefficient as estimated by the SVM, a close approximation to the original value of the DCT coefficient is obtained in order to reconstruct the image. The new method produces better image quality than the JPEG compression algorithm at comparable compression ratios. Large compression ratios are possible with the new method while still retaining reasonable image quality.

Data for Table 5.2:

Quality coefficient | Length of Huffman code | Total number of bits | Compression ratio | SNR (dB)
2                   | 25201                  | 131072               | 5.2               | 21.7
5                   | 21264                  | 131072               | 6.16              | 19.5
10                  | 19381                  | 131072               | 6.76              | 18.2

REFERENCES:

[1] M. H. Hassoun, Fundamentals of Artificial Neural Networks. Cambridge, MA: MIT Press, 1995.

[2] C. Amerijckx, M. Verleysen, P. Thissen, and J. Legat, “Image compression by self-organized

Kohonen map,” IEEE Trans. Neural Networks, vol. 9, pp. 503–507, May 1998.

[3] J. Robinson and V. Kecman, "The use of support vectors in image compression," Proc. 2nd Int. Conf. Engineering Intelligent Systems, June 2000.

[4] H. Drucker, C. J. C. Burges, L. Kaufmann, A. Smola, and V. Vapnik, Support Vector Regression Machines. Cambridge, MA: MIT Press, 1997, Advances in Neural Information Processing Systems, pp. 155–161.

[5] V. Vapnik, S. Golowich, and A. Smola, Support Vector Method for Function Approximation,

Regression Estimation and Signal Processing. Cambridge, MA: MIT Press, 1997, vol. 9,

Advances in Neural Information Processing Systems.

[6] V. N. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.

[7] V. Kecman, Learning and Soft Computing: Support Vector Machines, Neural Networks and Fuzzy Logic Models. Cambridge, MA: MIT Press, 2001.

[8] J. Jiang, “Image compression with neural networks—A survey,” Signal Processing: Image

Communication, vol. 14, 1999.

[9] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed.

[10] J. Miano, Compressed Image File Formats. Reading, MA: Addison-Wesley, 1999.

[11] "Digital Compression and Coding of Continuous-Tone Still Images," Amer. Nat. Standards Inst., ISO/IEC IS 10918-1, 1994.

BIO DATA:

Name: Rama Kishor Mutyala

Email : ramkishore_mutyala@yahoo.com

Course: Bachelors of Technology

University: Vellore Institute of Technology University

Branch: Electronics and Communication Engineering

Address: Rama Kishor Mutyala


Door no: 2-74,Near ramalayam street

Gandhi Nagar, Vetlapalem

Samalkot Mandal, E.G.Dist

Andhra Pradesh-533434


1.1

OBJECTIVE:

•

To clearly understand and implement an algorithm for the application of SVM (non-linear regression) learning and the DCT to image compression

•

To compare the obtained results with standard image compression techniques such as JPEG

•

To obtain good quality, a good compression ratio, and a signal-to-noise ratio within the required bound


1.2

DETAILED LITERATURE SURVEY:

IMAGE COMPRESSION The degree of compression is best expressed in terms of the average information or entropy of a compressed image source, expressed in terms of bits/pixel. Regardless of the particular technique used, compression engines accomplish their intended purpose in the following manner: 1. Those portions of the image which are not perceptible to the human eye are not transmitted. 2. Frame redundancies in the image are not transmitted. 3. The remaining information is coded in an efficient manner for transmission. Currently, a number of image compression techniques are being used singly or in combination. These include the following 1.2.1 ARTIFICIAL NEURAL NETWORKS[1] In this paper describes an algorithm using backpropagation learning in a feed forward network. The number of hidden neurons were fixed before learning and the weights of the network after training were transmitted. The neural network (and hence the image) could then be 3

recovered from these weights. Compression was generally around 8:1 with an image quality much lower than JPEG. 1.2.2 IMAGE COMPRESSION BY SELF-ORGANIZED KOHONEN MAP[2] In this paper a compression scheme based on the discrete cosine transform (DCT), vector Quantization of the DCT coefficients by Kohonen map, differential coding by first-order predictor and entropic coding of the differences. This method gave better performance than JPEG for compression ratios greater than 30:1 . 1.2.3 SUPPORT VECTORS IN IMAGE COMPRESSION[3] In this paper The use of support vector machines (SVMs) in an image compression algorithm was first presented. This method used SVM to directly model the color surface. The parameters of a neural network (weights and Gaussian centers) were transmitted so that the color surface could be reconstructed from a neural network using these parameters. 1.2.4 SUPPORT VECTOR REGRESSION MACHINES[4]

In this paper a new regression technique based on Vapnik’s concept of support vectors is introduced. We compare support vector regression (SVR) with a committee regression technique (bagging) based on regression trees and ridge regression done in feature space. On the basis of these experiments it is expected that SVR will have advantages in high dimensionality space because SVR optimization does not depend on the dimension &y of the input space 1.2.5 SUPPORT VECTOR METHOD FOR FUNCTION OF APPROXIMATION[5] In this paper The Support Vector (SV) method was recently proposed for estimating regressions, constructing multidimensional splines, and solving linear operator equations In this presentation we report results of applying the SV method to these problems. 1 Introduction The Support Vector method is a universal tool for solving multidimensional function estimation problems. Initially it was designed to solve pattern recognition problems, where in order to find a decision rule with good generalization ability one selects some (small) subset of the training data, called the Support Vectors (SVs). 4

It considers learning as a general problem of function estimation based on empirical data. or models. and FLS as parts of a connected whole. comprehensive and unified introduction to the field of learning from experimental data and soft computing. to treat SVMs. a comprehensive analysis of the empirical risk minimization principle including necessary and sufficient conditions for its consistency non-asymptotic bounds for the risk achieved using the empirical risk minimization principles for controlling the generalization ability of learning machines using small sample sizes based on these bounds the Support Vector methods that control the generalization ability when estimating function using small sample size.6 THE NATURE OF STATISTICAL LEARNING THEORY[6] The aim of this book is to discuss the fundamental ideas which lie behind the statistical theory of learning and generalization. These include the setting of learning problems based on the model of minimizing the risk functional from empirical data . This approach enables the reader to develop SVMs. Support vector machines (SVMs) and neural networks (NNs) are the mathematical structures. and FLS in addition to understanding them.2. 1. NNs. 1. 1. This led to a new method of representing decision functions where the decision functions are a linear expansion on a basis whose elements are nonlinear functions parameterized by the SVs (we need one SV for each element of the basis).2.8 IMAGE COMPRESSION WITH NEURAL NETWORKS [8] 5 . The book assumes that it is not only useful.7 SUPPORT VECTOR MACHINES.. NEUTRAL NETWORKS AND FUZZY LOGIC MODELS [7] This is the first textbook that provides a thorough. but necessary. that underlie learning.Optimal separation of the SVs is equivalent to optimal separation the entire data. while fuzzy logic systems (FLS) enable us to embed structured human knowledge into workable algorithms. NNs.2.

In this paper, new technologies such as neural networks and genetic algorithms are being developed to explore the future of image coding. Successful applications of neural networks to vector quantization have become well established, and other aspects of neural network involvement in this area are stepping up to play significant roles in assisting traditional technologies. The paper presents an extensive survey on the development of neural networks for image compression, covering three categories: direct image compression by neural networks, neural network implementations of existing techniques, and neural-network-based technologies which provide improvements over traditional algorithms.

1.2.9 NEURAL NETWORKS BY SIMON HAYKIN [9]

The author of this book has briefed the concepts of SVM and how SVM is used in pattern recognition. The book also gives information about the generalization ability of a linear SVM and about the kernels used.

CHAPTER 2
BACKGROUND THEORIES

2.1 IMAGE COMPRESSION:

Image compression is the application of data compression to digital images. In effect, the objective is to reduce redundancy of the image data in order to be able to store or transmit the data in an efficient form. Image compression means minimizing the size in bytes of a graphics file without degrading the quality of the image to an unacceptable level. The reduction in file size allows more images to be stored in a given amount of disk or memory space. It also reduces the time required for images to be sent over the Internet or downloaded from Web pages.

2.1.1 APPLICATIONS:

Currently image compression is recognized as an "enabling technology". It has been used in the following applications:

• Image compression is the natural technology for handling the increased spatial resolutions of today's imaging sensors and evolving broadcast television standards.
• It plays a major role in many important and diverse applications, including televideo conferencing, remote sensing, document and medical imaging, and facsimile transmission.
• It is also very useful in the control of remotely piloted vehicles in military, space, and hazardous waste management applications.

2.1.2 NEED FOR COMPRESSION:

One of the important aspects of image storage is its efficient compression. To make this clear, consider an example. An image of 1024 pixels x 1024 pixels x 24 bits, without compression, would require 3 MB of storage and about 7 minutes for transmission over a high-speed, 64 Kbps ISDN line. If the image is compressed at a 10:1 compression ratio, the storage requirement is reduced to 300 KB and the transmission time drops to under a minute. Seven 1 MB images can be compressed and transferred to a floppy disk in less time than it takes to send one of the original files, uncompressed, over an AppleTalk network.

In a distributed environment, large image files remain a major bottleneck within systems. Compression is an important component of the solutions available for creating file sizes of manageable and transmittable dimensions. Increasing the bandwidth is another method, but its cost sometimes makes it a less attractive solution.

The figures in Table 2.1 show the qualitative transition from simple text to full-motion video data, and the disk space, transmission bandwidth, and transmission time needed to store and transmit such uncompressed data. (The prefix kilo here denotes a factor of 1000 rather than 1024.)
Table 2.1 Multimedia data types and uncompressed storage space, transmission bandwidth, and transmission time required (B = bytes, b = bits; transmission times assume a 28.8 kb/s modem).

Multimedia data           Size/Duration           Bits/pixel or   Uncompressed   Transmission    Transmission
                                                  bits/sample     size           bandwidth       time

A page of text            11'' x 8.5''            varying         4-8 KB         32-64 Kb/page   1.1-2.2 sec
Telephone quality speech  10 sec                  8 bps           80 KB          64 Kb/sec       22.2 sec
Grayscale image           512 x 512               8 bpp           262 KB         2.1 Mb/image    1 min 13 sec
Colour image              512 x 512               24 bpp          786 KB         6.29 Mb/image   3 min 39 sec
Medical image             2048 x 1680             12 bpp          5.16 MB        41.3 Mb/image   23 min 54 sec
SHD image                 2048 x 2048             24 bpp          12.58 MB       100 Mb/image    58 min 15 sec
Full-motion video         640 x 480, 1 min        24 bpp          1.66 GB        221 Mb/sec      5 days 8 hrs
                          (30 frames/sec)

The examples above clearly illustrate the need for sufficient storage space, large transmission bandwidth, and long transmission time for image, audio, and video data. At the present state of technology, the only solution is to compress multimedia data before storage and transmission, and decompress it at the receiver for playback. For example, with a compression ratio of 32:1, the space, bandwidth, and transmission time requirements can be reduced by a factor of 32, with acceptable quality.

2.1.3 COMPRESSION PRINCIPLE:

A common characteristic of most images is that neighboring pixels are correlated and therefore contain redundant information. The foremost task, then, is to find a less correlated representation of the image. Image compression addresses the problem of reducing the amount of data required to represent a digital image. The underlying basis of the reduction process is the removal of redundant data. From a mathematical viewpoint, this amounts to transforming a 2-D pixel array into a statistically uncorrelated data set. The transformation is applied prior to storage and transmission of the image; the compressed image is decompressed at some later time to reconstruct the original image or an approximation to it.

Two fundamental components of compression are redundancy reduction and irrelevancy reduction. Redundancy reduction aims at removing duplication from the signal source (image/video). Irrelevancy reduction omits parts of the signal that will not be noticed by the signal receiver, namely the Human Visual System (HVS).
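The storage and transmission figures quoted above follow from simple arithmetic. A minimal sketch (the helper names are our own, not from the text):

```python
# Reproduce the 1024 x 1024 x 24-bit example: raw size and transmission time.
def uncompressed_bytes(width, height, bits_per_pixel):
    """Raw image size in bytes."""
    return width * height * bits_per_pixel // 8

def transmission_seconds(size_bytes, link_bits_per_sec):
    """Time to send size_bytes over a link of the given bit rate."""
    return size_bytes * 8 / link_bits_per_sec

size = uncompressed_bytes(1024, 1024, 24)          # 3,145,728 bytes (3 MB)
t_raw = transmission_seconds(size, 64_000)         # ~393 s on a 64 kbps ISDN line
t_10to1 = transmission_seconds(size / 10, 64_000)  # ~39 s at a 10:1 ratio
```

The same helpers reproduce the table entries, e.g. a 512 x 512 x 8-bit grayscale image gives 262,144 bytes and about 73 seconds over a 28.8 kb/s modem.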

In general, three types of redundancy can be identified:

• Spatial redundancy, or correlation between neighboring pixel values.
• Spectral redundancy, or correlation between different color planes or spectral bands.
• Temporal redundancy, or correlation between adjacent frames in a sequence of images (in video applications).

Image compression research aims at reducing the number of bits needed to represent an image by removing the spatial and spectral redundancies as much as possible. The best image quality at a given bit rate (or compression rate) is the main goal of image compression. However, there are other important properties of image compression schemes.

Scalability generally refers to a quality reduction achieved by manipulation of the bit-stream or file (without decompression and re-compression). Other names for scalability are progressive coding or embedded bit-streams. Despite its contrary nature, scalability can also be found in lossless codecs, usually in the form of coarse-to-fine pixel scans. Scalability is especially useful for previewing images while downloading them (e.g. in a web browser) or for providing variable-quality access to e.g. databases. There are several types of scalability:

• Quality progressive or layer progressive: the bit-stream successively refines the reconstructed image.
• Resolution progressive: first encode a lower image resolution, then encode the difference to higher resolutions.
• Component progressive: first encode grey, then color.

Further properties include:

• Region of interest coding: certain parts of the image are encoded with higher quality than others. This can be combined with scalability (encode these parts first, others later).
• Meta information: compressed data can contain information about the image which can be used to categorize, search, or browse images. Such information can include color and texture statistics and small preview images.

2.1.4 CLASSIFICATION OF COMPRESSION TECHNIQUES:

Two ways of classifying compression techniques are mentioned here.

(a) Lossless vs. lossy compression: In lossless compression schemes, the reconstructed image, after compression, is numerically identical to the original image. However, lossless compression can only achieve a modest amount of compression. Compressing an image is significantly different from compressing raw binary data. Of course, general-purpose compression programs can be used to compress images, but the result is less than optimal. This is because images have certain statistical properties which can be exploited by encoders specifically designed for them. A text file or program can be compressed without the introduction of errors, but only up to a certain extent; this is called lossless compression. Beyond this point, errors are introduced. In text and program files, it is crucial that compression be lossless, because a single error can seriously damage the meaning of a text file or cause a program not to run.

Lossy compression methods, especially when used at low bit rates, introduce compression artifacts. They are especially suitable for natural images such as photos, in applications where minor (sometimes imperceptible) loss of fidelity is acceptable to achieve a substantial reduction in bit rate. Under normal viewing conditions, no visible loss is perceived (visually lossless), and some of the finer details in the image can be sacrificed for the sake of saving a little more bandwidth or storage space.

The quality of a compression method is often measured by the peak signal-to-noise ratio (PSNR), which measures the amount of noise introduced through a lossy compression of the image. However, the subjective judgment of the viewer is also regarded as an important, perhaps the most important, measure.
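The PSNR just mentioned is straightforward to compute. A minimal sketch for 8-bit images (flat lists of pixel values; not tied to any particular library):

```python
import math

# PSNR against the 8-bit peak value of 255: infinite for identical images,
# 0 dB when every pixel is maximally wrong.
def psnr(original, reconstructed):
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    return float("inf") if mse == 0 else 10 * math.log10(255 ** 2 / mse)

assert psnr([10, 20, 30], [10, 20, 30]) == float("inf")  # identical images
assert abs(psnr([0, 0], [255, 255]) - 0.0) < 1e-12       # worst case: 0 dB
```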
An image reconstructed following lossy compression contains degradation relative to the original. Often this is because the compression scheme completely discards redundant information; however, lossy schemes are capable of achieving much higher compression. Lossless compression is sometimes preferred for artificial images such as technical drawings, icons or comics, and lossless methods may also be preferred for high-value content, such as medical imagery or image scans made for archival purposes.
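The defining property of lossless compression, an exact round trip, is easy to demonstrate with a stdlib general-purpose compressor (zlib; this is an illustration of losslessness, not of the image codecs discussed here):

```python
import zlib

# Lossless round trip: the decompressed output is a bit-exact replica.
data = b"executable and document data must survive compression unchanged " * 100
packed = zlib.compress(data, level=9)
restored = zlib.decompress(packed)

assert restored == data          # exact replica, as lossless compression guarantees
assert len(packed) < len(data)   # the repetitive input also compresses well
```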

In image compression, on the other hand, a small loss in quality is usually not noticeable. There is no "critical point" up to which compression works perfectly but beyond which it becomes impossible. When there is some tolerance for loss, the compression factor can be greater than when there is no loss tolerance. For this reason, graphic images can be compressed more than text files or programs.

(b) Predictive vs. transform coding: In predictive coding, information already sent or available is used to predict future values, and the difference is coded. Since this is done in the image or spatial domain, it is relatively simple to implement and is readily adapted to local image characteristics. Differential Pulse Code Modulation (DPCM) is one particular example of predictive coding. Transform coding, on the other hand, first transforms the image from its spatial-domain representation to a different type of representation using some well-known transform, and then codes the transformed values (coefficients). This method provides greater data compression than predictive methods, although at the expense of greater computation.

2.1.5 IMAGE COMPRESSION MODEL:

The block diagram of the image compression model is given in Fig 2.1.

SOURCE ENCODER -> CHANNEL ENCODER -> CHANNEL -> CHANNEL DECODER -> SOURCE DECODER

Figure 2.1 Image Compression Model

2.1.5.1 SOURCE ENCODER:

The source encoder is responsible for reducing the coding, interpixel, or psychovisual redundancies in the input image. In the first stage of the source encoding process, the mapper transforms the input data into a format designed to reduce the interpixel redundancies in the input image. The second stage, or quantizer block, reduces the accuracy of the mapper's output in accordance with some pre-established fidelity criterion; this stage reduces the psychovisual redundancies of the input image. In the third and final stage, the symbol encoder creates a fixed- or variable-length code to represent the mapped and quantized data set.

MAPPER -> QUANTIZER -> SYMBOL ENCODER

Figure 2.2 Source Encoder

2.1.5.2 SOURCE DECODER:

The source decoder contains only two components, a symbol decoder and an inverse mapper. These blocks perform, in reverse order, the inverse operations of the source encoder's symbol encoder and mapper blocks.

2.1.5.3 CHANNEL ENCODER & DECODER:

The channel encoder and decoder play an important role in the overall encoding-decoding process when the channel in Fig 2.1 is noisy or prone to error. They are designed to reduce the impact of channel noise by inserting a controlled form of redundancy into the source-encoded data. As the output of the source encoder retains little redundancy, it would be highly sensitive to transmission noise without the addition of this controlled redundancy.

2.2.1 COMPRESSION RATIO:

The compression ratio is defined as the ratio of the size of the original uncompressed image to the size of the compressed image:

    CR = n1 / n2        (Eq 2.1)

where n1 and n2 are the numbers of bits in the original and compressed representations respectively.
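The compression ratio (and the related bits-per-pixel figure defined in the next subsection) can be computed directly. A toy example; the 16:1 figure is hypothetical:

```python
# Size metrics for a coded image (illustrative numbers only).
def compression_ratio(original_bits, compressed_bits):
    return original_bits / compressed_bits

def bits_per_pixel(compressed_bits, num_pixels):
    return compressed_bits / num_pixels

orig = 512 * 512 * 8        # an 8-bit 512 x 512 grayscale image
comp = orig // 16           # suppose the coder achieved 16:1
assert compression_ratio(orig, comp) == 16.0
assert bits_per_pixel(comp, 512 * 512) == 0.5
```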

2.2.2 BITS PER PIXEL:

Bits per pixel is defined as the ratio of the number of bits required to encode the image to the number of pixels in the image:

    bpp = (number of bits in the compressed representation) / (number of pixels)        (Eq 2.2)

2.2.3 ENTROPY:

Entropy is the measure of average information in an image:

    H = - Σ (k = 0 to L-1) p_k log2 p_k        (Eq 2.3)

where p_k = probability of the kth gray level = n_k / (M x N), n_k = total number of pixels with gray level k, and L = total number of gray levels.

2.2.4 PSNR:

The peak signal-to-noise ratio is defined as

    PSNR = 10 log10( 255^2 / MSE ),  MSE = (1 / (M x N)) Σ_i Σ_j ( X_ij - X'_ij )^2        (Eq 2.4)

where X_ij and X'_ij are the original and reconstructed pixel values at location (i, j), and M x N is the image size.

2.3 IMAGE COMPRESSION TECHNIQUES:

2.3.1 JPEG: DCT-BASED IMAGE CODING STANDARD:

The DCT can be regarded as a discrete-time version of the Fourier cosine series. It is a close relative of the DFT, a technique for converting a signal into elementary frequency components. Thus the DCT can be computed with a Fast Fourier Transform (FFT)-like algorithm in O(n log n) operations.
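A direct O(N^2) implementation of the 1-D DCT and its inverse makes the transform concrete. This is a reference sketch using the standard orthonormal convention, with α(0) = 1/√2 as in the text and a √(2/N) scale factor (our addition) so that the inverse recovers the signal exactly:

```python
import math

# Naive 1-D DCT-II and its inverse (DCT-III), orthonormal scaling.
def dct_1d(x):
    N = len(x)
    out = []
    for u in range(N):
        a = math.sqrt(0.5) if u == 0 else 1.0
        s = sum(x[n] * math.cos((2 * n + 1) * u * math.pi / (2 * N)) for n in range(N))
        out.append(a * math.sqrt(2 / N) * s)
    return out

def idct_1d(C):
    N = len(C)
    out = []
    for n in range(N):
        s = sum((math.sqrt(0.5) if u == 0 else 1.0) * C[u]
                * math.cos((2 * n + 1) * u * math.pi / (2 * N)) for u in range(N))
        out.append(math.sqrt(2 / N) * s)
    return out

block = [52, 55, 61, 66, 70, 61, 64, 73]       # one row of pixel values
coeffs = dct_1d(block)
assert all(abs(a - b) < 1e-9 for a, b in zip(idct_1d(coeffs), block))
```

A constant signal produces a single non-zero DC coefficient, which is the behavior JPEG exploits on smooth blocks.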

The DCT of a discrete signal x(n), n = 0, 1, ..., N-1 is defined as:

    C(u) = α(u) √(2/N) Σ (n = 0 to N-1) x(n) cos[ (2n+1)uπ / 2N ]        (Eq 2.5)

and the original signal is recovered by applying the inverse discrete cosine transform (IDCT):

    x(n) = √(2/N) Σ (u = 0 to N-1) α(u) C(u) cos[ (2n+1)uπ / 2N ]        (Eq 2.6)

where α(u) = 0.707 for u = 0 and = 1 otherwise. Unlike the DFT, the DCT is real-valued and provides a better approximation of a signal with fewer coefficients.

JPEG established the first international standard for still image compression, where the encoders and decoders are DCT-based. The JPEG standard specifies three modes, namely sequential, progressive, and hierarchical, for lossy encoding, and one mode of lossless encoding. The 'baseline JPEG coder', which is sequential encoding in its simplest form, will be briefly discussed here. Figures 2.3 and 2.4 show the key processing steps in such an encoder and decoder for grayscale images. Color image compression can be approximately regarded as compression of multiple grayscale images, which are either compressed entirely one at a time, or are compressed by alternately interleaving 8x8 sample blocks from each in turn.

Steps in JPEG compression:

1. If the color is represented in RGB mode, translate it to YUV.
2. Divide the file into 8 x 8 blocks.

3. Transform the pixel information from the spatial domain to the frequency domain with the Discrete Cosine Transform.
4. Quantize the resulting values by dividing each coefficient by an integer value and rounding off to the nearest integer.
5. Read the resulting coefficients in a zigzag order and encode them with Huffman coding.

Figure 2.3 Encoder block diagram

Figure 2.4 Decoder block diagram

2.3.2 BASIC CONCEPTS OF SVM:

The Support Vector Machine is a universal learning machine. It has its roots in neural networks and statistical learning theory.

2.3.2.1 MACHINE LEARNING:

Machine learning is a subfield of artificial intelligence (AI) concerned with algorithms that allow computers to learn. What this means in most cases is that an algorithm is given a set of data and infers information about the properties of the data, and that information allows it to make predictions about other data that it might see in the future.

This is possible because almost all nonrandom data contains patterns, and these patterns allow the machine to generalize. In order to generalize, it trains a model with what it determines are the important aspects of the data. To understand how models come to be, consider a simple example in the otherwise complex field of email filtering. Suppose we receive a lot of spam that contains the words "online pharmacy". As human beings, we are well equipped to recognize patterns, and we can quickly determine that any message with the words "online pharmacy" is spam and should be moved directly to the trash. This is a generalization: we have in fact created a mental model of what spam is.

There are many different machine-learning algorithms, all with different strengths and suited to different types of problems. Some, such as decision trees, are transparent, so that an observer can totally understand the reasoning process undertaken by the machine. Others, such as neural networks, are black box, meaning that they produce an answer, but it is often very difficult to reproduce the reasoning behind it.

2.3.2.2 SUPPORT VECTOR MACHINE:

Support vector machines (SVMs) were introduced by Vapnik and coworkers in 1992. As a learning method, the support vector machine is regarded as one of the best classifiers with a strong mathematical foundation, and it has been noted as such during the past 20 years. SVM is popular in bioinformatics, text analysis and pattern classification, and during the past decade it has been commonly used as a classifier for various applications. The handling of high feature dimensionality and the labeling of training data are the two major challenges in pattern recognition. To handle high feature dimensionality, there are two major approaches; one is to use special classifiers which are not sensitive to dimensionality.

2.3.2.3 LINEAR CLASSIFICATION PROBLEM:

Most matrimonial sites collect a lot of interesting information about their members, including demographic information, interests, and behavior.
Imagine that this site collects the following information:

Linearly Separable Classification 17 . a 1or a 0 to indicate whether or not they are considered a good match.5. this site collects information about whether two people have made a good match. 2. which is to map the data into some other dot product space (called the feature space) Consider a two-class linearly separable classification problem Figure2. which would be useful in strategies for promoting the site to new members.3. in the final column. and if they decided to meet in person. Each row has information about a man and a woman and.4 SVM IN LINEAR CLASSIFICATION: The main idea of SVM is to construct a Hyper plane as the decision surface in such a way that the margin of separation between the positive and negative examples is maximized. SVM basic idea. whether they initially made contact. It might also indicate particular types of people that the site is lacking. this information might be used to build a predictive algorithm that assists users in finding other people who are likely to be good matches. Lets take only the parameter ages and give the match information to illustrate how the classifiers work. For a site with a large number of profiles. since two variables are much easier to visualize.• Age • List of interests • Location • Qualification Furthermore.2. This data is used to create the matchmaker dataset.

Let {x1, ..., xn} be our data set and let di ∈ {+1, -1} be the class label of xi. The decision boundary should classify all points correctly. The decision boundary is the hyperplane, whose equation is given by

    WT·X + b = 0

where X is the input vector, W is the adjustable weight vector, and b is the bias. The problem here is that there can be many such decision boundaries, as shown in Figure 2.6 (a), (b) and (c).

19 . Margin is the width the boundary can be increased by before it hits a data point.X>+b =+1}) and (-1 if {X :<WT . Margin width is 2 / |w|.6 Decision Boundaries that Can Be Formed The decision boundary should be as far away from the data of both classes as possible.X>+b =-1}) The vector w is perpendicular to both planes.This is done by many ways. The trick often used is the Lagrangian formulation of the problem. Therefore we should maximize the margin. So to maximize margin we have to minimize the value of | w | . The positive plane that acts as a margin for positive class is given by {X :<WT .(b) (c) Figure 2.X>+b =-1} Hence we classify as (+1 if {X :<WT .X>+b =+1} The negative plane which acts as the margin for negative class is {X :<WT .

Figure 2.7 Support Vectors and the Hyperplane

Support vectors are those points which the margin pushes up against the hyperplane; the particular data points for which the above plane equations are satisfied with the equality sign are called the support vectors. In conceptual terms, the support vectors are the points that lie closest to the hyperplane and are the most difficult to classify. As such, they have a direct bearing on the optimum location of the decision surface. These vectors play a prominent role in the operation of this class of learning machine, hence the name support vector machine.

2.3.2.5 SOLUTION BY LAGRANGIAN MULTIPLIERS:

The Lagrangian is written:

    L(w, b, α) = 0.5 wTw - Σ (i = 1 to l) αi [ yi (wT xi + b) - 1 ]

where the αi are the Lagrange multipliers. This is now an optimization problem without constraints, where the objective is to minimize the Lagrangian L(w, b, α).

2.3.2.6 NON-SEPARABLE CLASSIFICATION:

When there is no line that can be drawn between the two classes that separates the data without misclassifying some data points, the aim is to find the hyperplane that makes the smallest number of errors. Non-negative 'slack' variables ξ1, ξ2, ..., ξl are introduced. These measure the deviation of the data from the maximal margin.

2. Non Linearly Separable Classification 2.5 || w || 2 +C∑ ξi i =1 l Here C is a design parameter called the penalty parameter. At the same time an increase in W does not guarantee smaller ξi Figure2. We are now approximating functions of the form f ( x . thus it is desirable that the ξi be as small as possible.deviation of the data from the maximal margin. The optimization problem is now: f ( x . Vapnik’s linear loss functions with ε-insensitivity zone as a measure of the error of approximation: 21 . The penalty parameter controls the magnitude of the ξi An increase in C penalizes larger errors (large ξi ).7 Function Approximation by SVM: Regression is an extension of the non-separable classification such that each data point can be thought of as being in its own class. w ) = 0. w ) = ∑ w i φi ( x ) i =1 N where the functions φi ( x ) are termed kernel functions(basis functions) and N is the number of support vectors.8.3. However this can be achieved only by increasing the weight vector norm W (that we want to minimize).

Vapnik's ε-insensitivity loss function defines an ε-tube such that the loss is 0 if the difference between the predicted value f(x, w) and the measured value is less than ε. For all predicted points outside the tube, the error equals the magnitude of the difference between the prediction error and the radius ε of the tube:

    error = 0                        if |y - f(x, w)| ≤ ε
    |y - f(x, w)| - ε = ξ            for data above the ε-tube
    |y - f(x, w)| - ε = ξ*           for data below the ε-tube

The total empirical 'risk' or error is given by:

    Remp = (1/l) Σ (i = 1 to l) | yi - wT xi - b |ε

The goal is now to minimize the risk

    R(w, ξ, ξ*) = 0.5 ||w||^2 + C ( Σ ξi + Σ ξi* )

where ξi and ξi* are slack variables for measurements 'above' and 'below' the ε-tube respectively. Forming the Lagrangian and setting its partial derivative with respect to w to zero gives

    w = Σ (i = 1 to l) (αi - αi*) xi

Similarly, taking partial derivatives with respect to b, ξi, and ξi* and substituting back, we obtain in matrix notation the problem

    Min L(α) = 0.5 αT H α - fT α        where H = [ xT x + 1 ]

and where

    f = [ ε - y1, ε - y2, ..., ε - yl, ε + y1, ε + y2, ..., ε + yl ]T

The weight vector w is found from w = α* - α. Note that when a positive definite kernel (such as the Gaussian or the complete polynomial kernel) is used, the bias b equals zero.

Our final goal is to solve non-linear regression problems, i.e. problems of the type

    f(x, w) = wT G(x) + b

where G(x) is a non-linear mapping that maps input space x to feature space G(*). The mapping G(x) is normally the RBF design matrix, given by:

    G = [ G(x1, c1)  ...  G(x1, cl)
          ...
          G(xl, c1)  ...  G(xl, cl) ]

where G(x, c) is the kernel function. Typically a Gaussian kernel function is used, given by (in 1 dimension):

    G(x, c, λ) = exp[ -0.5 (x - c)^2 / λ ]

where x is the spatial coordinate, c is the centre of the Gaussian, and λ is the Gaussian width (or shape parameter). To solve non-linear regression problems, the only change required is to the Hessian matrix, which becomes

    H = [  G(x, c)   -G(x, c)
          -G(x, c)    G(x, c) ]
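The Gaussian kernel and the RBF design matrix G are easy to sketch directly (centers placed at the data points, as the text describes):

```python
import math

# 1-D Gaussian kernel with width/shape parameter lam, and the design matrix G.
def gauss(x, c, lam):
    return math.exp(-0.5 * (x - c) ** 2 / lam)

def design_matrix(xs, centers, lam):
    return [[gauss(x, c, lam) for c in centers] for x in xs]

xs = [0.0, 0.5, 1.0]
G = design_matrix(xs, xs, lam=1.0)

assert G[0][0] == 1.0                            # a point is maximally similar to itself
assert G[0][2] == G[2][0]                        # G is symmetric
assert abs(G[0][2] - math.exp(-0.5)) < 1e-12     # kernel value falls off with distance
```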

2.3.2.8 APPLICATIONS OF SUPPORT VECTOR MACHINES:

Since support vector machines work well with high-dimensional datasets, they are most often applied to data-intensive scientific problems and other problems that deal with very complex sets of data. Some examples include:

• Classifying facial expressions
• Detecting intruders using military datasets
• Predicting the structure of proteins from their sequences
• Handwriting recognition
• Determining the potential for damage during earthquakes
• Digital watermarking
• Image compression

CHAPTER 3

3.1 PROGRAMMING METHODOLOGY:

Most image compression algorithms operate in the frequency domain. That is, the image is first processed through some frequency-analyzing function, further processing is applied to the resulting coefficients, and the results are generally encoded using an entropy encoding scheme such as Huffman coding. The JPEG image compression algorithm is an example of an algorithm of this type. The first step of the JPEG algorithm is to subdivide the image into 8x8 blocks and then apply the DCT to each block. Next, quantization is applied to the resulting DCT coefficients: each element in the matrix of DCT coefficients is simply divided by a corresponding element in a 'quantizing matrix'. The effect is to reduce the value of most coefficients, some of which vanish (i.e. their value becomes zero) when rounding is applied. Huffman coding is then used to encode the coefficients.

In this chapter the image is transformed into the frequency domain and SVM learning is applied to the frequency components. The basic idea is to transform the image using the DCT, use SVM learning to compress the DCT coefficients, and use Huffman coding to encode the data as a stream of bits. The algorithm presented here uses the discrete cosine transform because the DCT has properties which are exploited in SVM learning. Before the SVM learning is applied, the DCT coefficients are 'processed' in such a way as to make the trend of the DCT curve more suitable for generalization by a SVM.

3.2 DESCRIPTIONS:

3.2.1 INPUT IMAGE:

The input image that is chosen is required to be a gray-scale image with intensity levels 0-255. The input image chosen depends upon the application where the compression is required. As the DCT is fundamental to the algorithm, a detailed description follows.

3.2.2 DISCRETE COSINE TRANSFORM:

The DCT has properties making it the choice for a number of compression schemes; it is the basis for the JPEG compression scheme. The DCT is a transform that maps a block of pixel color values in the spatial domain to values in the frequency domain. The DCT of a discrete signal x(n), n = 0, 1, ..., N-1 is given by:

    C(u) = α(u) √(2/N) Σ (n = 0 to N-1) x(n) cos[ (2n+1)uπ / 2N ]        (Eq 3.1)

and the signal is recovered by the inverse transform:

    x(n) = √(2/N) Σ (u = 0 to N-1) α(u) C(u) cos[ (2n+1)uπ / 2N ]        (Eq 3.2)

where α(u) = 0.707 for u = 0 and = 1 otherwise.

The DCT is more efficient on smaller images. When the DCT is applied to large images, the number of computations increases disproportionately; moreover, the rounding effects when floating point numbers are stored in a computer system result in the DCT coefficients being stored with insufficient accuracy, and the result is deterioration in image quality. For these reasons the image is subdivided into 8x8 blocks. Where an image is not an integral number of 8x8 blocks, the image can be padded with white pixels, i.e. extra pixels are added so that the image can be divided into an integral number of 8x8 blocks.

The 2-dimensional DCT is applied to each block so that an 8x8 matrix of DCT coefficients is produced for each block; this is termed the 'DCT matrix'. The top left component of the DCT matrix is termed the 'DC' coefficient and can be interpreted as the component responsible for the average background colour of the block. The remaining 63 components of the DCT matrix are termed the 'AC' components, as they are frequency components. The DC coefficient is often much higher in magnitude than the AC components. The original image block is recovered from the DCT coefficients by applying the inverse discrete cosine transform (IDCT).

3.2.3 TRANSFORMATION OF THE DCT MATRIX TO 1-D (ZIG-ZAG TRANSFORMATION):

The elements of the DCT matrix are mapped using the zig-zag sequence shown in Figure 3.1 to produce a single row of numbers; that is, a single row of numbers is collected as the zig-zag trail is followed through the DCT matrix. This produces a row of 64 numbers whose magnitude tends to decrease traveling down the row.

Figure 3.1: The zig-zag pattern applied to a block of DCT coefficients

3.2.4 COMBINING SVM WITH DCT:

The 1-dimensional row of DCT coefficients is used as the training data for a SVM. The SVM will produce the minimum number of support vectors required to generalize the training data within a predefined error (the ε-tube). Examination of the input data (i.e. the DCT coefficients) reveals that the magnitudes of the coefficients are generally decreasing traveling down the row of input data. Thus it is expected that when the row of DCT coefficients is used as training data for the SVM, a lower number of support vectors will be required in order to recover the DCT coefficients within the predefined error.
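The zig-zag ordering used above can be generated by walking the anti-diagonals of the block and reversing every other one. A minimal sketch:

```python
# Zig-zag scan of an n x n block (the Figure 3.1 ordering).
def zigzag_indices(n=8):
    order = []
    for d in range(2 * n - 1):
        diag = [(i, d - i) for i in range(n) if 0 <= d - i < n]
        if d % 2 == 0:
            diag.reverse()           # even diagonals run bottom-left to top-right
        order.extend(diag)
    return order

def zigzag_scan(block):
    return [block[i][j] for i, j in zigzag_indices(len(block))]

idx = zigzag_indices(8)
assert len(idx) == 64
assert idx[:4] == [(0, 0), (0, 1), (1, 0), (2, 0)]   # DC first, then low frequencies
```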

However, while the magnitudes generally decrease down the row, the sign (positive or negative) of each coefficient appears to be random. This has the consequence that two coefficients next to each other can be of similar magnitude but opposite sign, causing a large swing in the input data. If the sign of each DCT coefficient is ignored when used as input data to the SVM, there is the problem of how to reassign the signs once the DCT coefficients have been recovered.

The SVM learning process selects the minimum number of training points to use as the centers of the Gaussian kernel functions in an RBF network, in order for the function to be approximated within the insensitivity zone; these selected training points are the support vectors. The insensitivity zone is drawn around the resulting function. When the penalty parameter C is infinite, the support vectors will always lie at the edge of the zone.

3.2.4.1 QUADRATIC PROGRAMMING:

Quadratic programming deals with functions in which the xi are raised to the power of 0, 1, or 2; its goal is to determine the xi for which the function L(α) is a minimum. A quadratic program is an optimization problem with a quadratic objective and linear constraints:

    Minimize  L(α) = (1/2) αT H α + fT α
    Subject to  A·x ≤ b

(The 1/2 factor is included in the quadratic term to avoid the appearance of a factor of 2 in the derivatives.) L(α) is called the objective function, H is a symmetric matrix called the Hessian matrix, and f is a vector of constants; the system is usually stated in matrix and vector form. This is a constrained minimization problem with a quadratic function and linear inequality constraints, where

    H = [  G(x, c)   -G(x, c)
          -G(x, c)    G(x, c) ]
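In the unconstrained 1-D case, the quadratic objective can be minimized by simple gradient descent, which makes the role of H and f concrete. A tiny numeric illustration (our own example, not part of the algorithm):

```python
# Minimize L(a) = 0.5*h*a^2 + f*a; the exact minimum is a = -f/h.
def qp_min_1d(h, f, steps=200, lr=0.1):
    a = 0.0
    for _ in range(steps):
        grad = h * a + f        # derivative of 0.5*h*a^2 + f*a
        a -= lr * grad
    return a

assert abs(qp_min_1d(2.0, -4.0) - 2.0) < 1e-6   # -f/h = 2
```

Real SVM training adds the linear inequality constraints, which is why a dedicated quadratic programming solver is used in practice.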
There are only three parameters which affect the compression which must be defined before learning can begin.
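As a concrete illustration, the H and f above can be assembled for a small 1-dimensional training set. This is a hedged pure-Python sketch (the thesis solves the QP in Matlab); `build_qp` and `gaussian_kernel` are names chosen here, and no QP solver is invoked:

```python
import math

def gaussian_kernel(x, c, lam):
    # G(x, c, lambda) = exp(-0.5 * (x - c)^2 / lambda)
    return math.exp(-0.5 * (x - c) ** 2 / lam)

def build_qp(xs, ys, eps, lam):
    """Assemble the Hessian H and linear term f of the epsilon-SVR dual:
    minimize (1/2) a^T H a + f^T a over a = [alpha; alpha*]."""
    l = len(xs)
    K = [[gaussian_kernel(xi, xj, lam) for xj in xs] for xi in xs]
    # H is the 2l x 2l block matrix [[K, -K], [-K, K]].
    H = [[0.0] * (2 * l) for _ in range(2 * l)]
    for i in range(l):
        for j in range(l):
            H[i][j] = K[i][j]
            H[i][j + l] = -K[i][j]
            H[i + l][j] = -K[i][j]
            H[i + l][j + l] = K[i][j]
    # f = [eps - y_1, ..., eps - y_l, eps + y_1, ..., eps + y_l]^T
    f = [eps - y for y in ys] + [eps + y for y in ys]
    return H, f
```

For two training points the result is a 4×4 symmetric H whose diagonal entries are G(x, x) = 1, matching the block structure given above.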

3.2.2 KERNEL FUNCTION

The relationship between the kernel function K and the mapping φ(.) is

K(x, y) = <φ(x), φ(y)>

This is known as the kernel trick. In practice, we specify K, thereby specifying φ(.) indirectly, instead of choosing φ(.) directly. Intuitively, K(x, y) represents our desired notion of similarity between data x and y, and this comes from our prior knowledge. K(x, y) needs to satisfy a technical condition (Mercer's condition) in order for φ(.) to exist.

The idea is to transform the xi to a higher dimensional space:
- Input space: the space containing the xi
- Feature space: the space of the φ(xi) after transformation

A linear operation in the feature space is equivalent to a non-linear operation in the input space, and the classification task can be "easier" with a proper transformation.

Figure 3.2.2: Transformation of input space to feature space
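The kernel trick can be verified numerically for a kernel whose feature map is known in closed form. This illustrative sketch (not from the thesis) uses the 2-D homogeneous polynomial kernel K(x, y) = (x·y)^2, whose explicit feature map is φ(x) = (x1^2, sqrt(2)·x1·x2, x2^2):

```python
import math

def phi(x):
    # Explicit feature map for K(x, y) = (x . y)^2 in 2 dimensions.
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)

def kernel(x, y):
    # Kernel evaluated directly in input space.
    return (x[0] * y[0] + x[1] * y[1]) ** 2

x, y = (1.0, 2.0), (3.0, 0.5)
lhs = kernel(x, y)                                # K(x, y)
rhs = sum(a * b for a, b in zip(phi(x), phi(y)))  # <phi(x), phi(y)>
assert abs(lhs - rhs) < 1e-9
```

The inner product in the 3-dimensional feature space equals the kernel evaluated in the 2-dimensional input space, which is exactly why φ never needs to be computed in practice.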

The Gaussian kernel function used is given by (in 1 dimension):

G(x, c, λ) = exp[ -0.5 (x - c)^2 / λ ]

where x is the spatial coordinate, c is the centre of the Gaussian, and λ is the Gaussian width (or shape parameter).

3.2.5 THE 'INVERSION' BIT

The 'inversion bit' indicates which of the recovered points should be inverted (i.e. multiplied by -1) so that they are negative; that is, points in Figure 6.4(b) that were originally negative are made negative by multiplying by -1 if the inversion bit is set. The inversion bit is a single '0' or '1'; it is the sign of the corresponding input data, and each input data point has an inversion bit.

After a block has been processed by the SVM, some of the recovered DCT coefficients may have a magnitude lower than the maximum error defined for the SVM. If these components had an inversion bit of '1', it can be set to '0', as the sign of coefficients with small magnitude does not affect the final recovered image. Put another way, inversion bits for very small magnitude DCT coefficients do not contain significant information required for the recovery of the image.

3.2.6 ENCODING DATA FOR STORAGE

For each block, the weights and support vectors are required to be stored. The support vectors are the Gaussian centers. In our algorithm we combine the weights with the support vectors so that each block has the same number of weights as DCT coefficients; that is, the only non-zero weights are those for which a training point has been chosen as a support vector by the support vector machine. Where a weight has no corresponding support vector, the value of the weight is set to zero. The next step is to quantize the weights.

3.2.6.1 QUANTIZATION

Quantizing involves reassigning the value of each weight to one of a limited number of values. This introduces a small error when the image is decompressed, but significantly increases compression. To quantize the weights, the maximum and minimum weight values (for the whole image) are found and the number of quantization levels is pre-defined.
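The inversion-bit logic above can be sketched as follows. This is a hedged pure-Python illustration; `apply_inversion_bits` is a name chosen here, not from the thesis:

```python
def apply_inversion_bits(magnitudes, bits, eps):
    """Recover signed DCT coefficients from SVM-estimated magnitudes.
    Inversion bits whose coefficient magnitude is below the SVM error
    eps carry no significant sign information and are cleared to '0',
    which introduces extra zeros that help the later Huffman stage."""
    cleared = [0 if m < eps else b for m, b in zip(magnitudes, bits)]
    coeffs = [-m if b else m for m, b in zip(magnitudes, cleared)]
    return coeffs, cleared
```

A coefficient of magnitude 0.01 with ε = 0.1 keeps its magnitude but loses its inversion bit, since its sign cannot affect the recovered image beyond the error bound already accepted.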

The number of quantization levels chosen is a degree of freedom in the algorithm. The steps taken to quantize the weights are:

1. An arbitrary number is added to all weights (the same number is added to all of them), making all weights positive and non-zero.
2. Find the maximum and minimum weight values. Call these max and min.
3. Find the difference (d) between quantization levels by d = (max - min)/n, where n is the number of quantization levels.
4. Set the lowest quantization level q1 = min. Set the remaining quantization levels by qm = q(m-1) + d, until qn = max.
5. Reassign each weight the value of the closest matching quantization level qm.

This ensures that all weights have a positive value. After quantization, the minimum quantization level is subtracted from each weight. To recover the weights, both the minimum quantization level and the arbitrary number must be stored.

The inversion bits are now combined with the weights as follows. Each individual weight has an associated inversion bit. Where the weight is not a support vector, the inversion data is discarded. The inversion bit is combined with its corresponding weight, making the value of the weight negative if the inversion bit is '1'; otherwise it is positive.

The above steps introduce many 'zero' values into the weight data. By setting inversion bits from '1' to '0' when the associated DCT coefficient is less than the error ε, many more zeros are introduced.

3.2.6.2 HUFFMAN ENCODING

The quantized weights are encoded using Huffman encoding. Huffman coding is an entropy encoding algorithm used for lossless data compression; it uses a variable-length code table for encoding each source symbol.
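The five quantization steps can be sketched in pure Python. One assumption is flagged: step 3 above gives d = (max - min)/n, but for q1 = min and qn = max to both hold with n levels the spacing must be (max - min)/(n - 1), which is what this illustrative sketch uses; the function and parameter names are chosen here:

```python
def quantize_weights(weights, n_levels, offset):
    """Steps 1-5 of the weight quantization. 'offset' is the arbitrary
    number added to make all weights positive; it must be stored
    (together with the minimum level) to recover the weights."""
    shifted = [w + offset for w in weights]             # step 1
    wmax, wmin = max(shifted), min(shifted)             # step 2
    d = (wmax - wmin) / (n_levels - 1)                  # step 3 (see note)
    levels = [wmin + m * d for m in range(n_levels)]    # step 4: q1..qn
    quantized = [min(levels, key=lambda q: abs(q - w))  # step 5
                 for w in shifted]
    return quantized, levels
```

With weights [-0.5, 0.0, 0.5], an offset of 1.0 and 5 levels, the levels span 0.5 to 1.5 in steps of 0.25 and each shifted weight snaps to its nearest level.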

CHAPTER 4

4.1 FORMULATION OF THE APPROACH:

The image is first sub-divided into 8×8 blocks. The 2-dimensional DCT is applied to each block to produce a matrix of DCT coefficients. The zig-zag mapping is applied to each matrix of DCT coefficients to obtain a single row of numbers for each original block of pixels. The first term of each row (the DC component) is separated so that only the AC terms are left. Not all the terms in the row of AC coefficients are needed, since the higher-order terms do not contribute significantly to the image; exactly how many values are taken is a degree of freedom in the algorithm.

The AC components are used as training data for an SVM. Support vector machine learning is applied to the absolute values of each row of AC terms as described above, and the inversion number for each block is generated. An SVM is trained on this data with the error and Gaussian width set to different values. The SVM learning process selects the minimum number of training points to use as the centers of the Gaussian kernel functions in an RBF network in order for the function to be approximated within the insensitivity zone; these selected training points are the support vectors. The SVM was implemented in Matlab with a quadratic programming solver, which returns a value for α from which we can compute the weights.

The next step is to quantize the weights. Quantizing involves reassigning the value of each weight to one of a limited number of values. To quantize the weights, the maximum and minimum weight values (for the whole image) are found and the number of quantization levels is pre-defined; the number of quantization levels chosen is a degree of freedom in the algorithm. An arbitrary number is added to all weights (the same number is added to all of them), making all weights positive and non-zero. This ensures that all weights have a positive value. After quantization, the minimum quantization level is subtracted from each weight. To recover the weights, both the minimum quantization level and the arbitrary number must be stored. The inversion bits are then combined with the weights.

The quantized weights and the number of zeros between non-zero weights are Huffman encoded to produce a binary file. In order to recover the image, the DC coefficient, the support vectors, the weights and the inversion number need to be stored/transmitted for each block. By following this method, for each original block the Gaussian centers (i.e. the support vectors), the weights and the inversion number are stored. The compression of the SVM surface-modeled images was computed from an actual binary file containing all information necessary to recover an approximated version of the original image.
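The Huffman stage mentioned above can be sketched as follows. This is an illustrative pure-Python implementation of standard textbook Huffman coding (not the thesis code), building a code table from a stream of quantized symbols:

```python
import heapq
from collections import Counter

def huffman_table(symbols):
    """Build a Huffman code table (symbol -> bit string) from a list of
    source symbols, e.g. quantized weights and zero-run lengths."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate one-symbol case
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tie-break id, tree). A tree is a 1-tuple
    # (symbol,) for a leaf or a 2-tuple (left, right) for an inner node.
    heap = [(f, i, (s,)) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    uid = len(heap)
    while len(heap) > 1:                     # merge the two rarest subtrees
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, uid, (t1, t2)))
        uid += 1
    table = {}
    def walk(tree, prefix):
        if len(tree) == 1:
            table[tree[0]] = prefix
        else:
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
    walk(heap[0][2], "")
    return table
```

Frequent symbols (such as the many zero weights) receive the shortest codes, which is what makes the deliberately introduced zeros pay off: encoding the stream [0,0,0,0,0,1,1,2] costs 11 bits instead of 64 bits at a fixed 8 bits per symbol.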

To objectively measure image quality, the signal-to-noise ratio (SNR) is calculated.

4.2 FLOW CHART:

CHAPTER 5

RESULTS AND ANALYSIS:

In this section, simulation results of the performance of the image compression algorithm are presented, and the results are compared with those of the existing JPEG algorithm. In the implementation of the algorithm applying SVM (regression) learning and the DCT to image compression, we first sub-divide the image into 8×8 blocks. The 2-dimensional DCT is applied to each block to produce a matrix of DCT coefficients, and the zig-zag mapping is applied to each matrix to obtain a single row of numbers for each original block of pixels. The first term of each row (the DC component) is separated so that only the AC terms are left. Not all the terms in the row of AC coefficients are needed, since the higher-order terms do not contribute significantly to the image; exactly how many values are taken is a degree of freedom in the algorithm. The AC components are used as training data for an SVM: support vector machine learning is applied to the absolute values of each row of AC terms, and the inversion number for each block is generated. The support vector machine learning used is identical to that described earlier, a constrained minimization problem with a quadratic function and linear inequality constraints; it is called quadratic programming, and solving it returns a value for α from which we can compute the weights.

The next step is to quantize the weights. Quantizing involves reassigning the value of each weight to one of a limited number of values. To quantize the weights, the maximum and minimum weight values (for the whole image) are found and the number of quantization levels is pre-defined; the number of quantization levels chosen is a degree of freedom in the algorithm. An arbitrary number is added to all weights (the same number is added to all of them), making all weights positive and non-zero; this ensures that all weights have a positive value. After quantization, the minimum quantization level is subtracted from each weight. To recover the weights, both the minimum quantization level and the arbitrary number must be stored. The inversion bits are then combined with the weights, and the quantized weights and the number of zeros between non-zero weights are Huffman encoded to produce a binary file.

In order to recover the image, the DC coefficient, the support vectors, the weights and the inversion number need to be stored/transmitted for each block. By following this method, for each original block the Gaussian centers (i.e. the support vectors), the weights and the inversion number are stored. The compression of the SVM surface-modeled images was computed from an actual binary file containing all information necessary to recover an approximated version of the original image. To objectively measure image quality, the signal-to-noise ratio (SNR) is calculated.
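The SNR measure can be sketched as follows. The thesis uses the formula of Chapter 2; the standard definition below (signal power over error power, in decibels) is an assumption stated here, not a quote of that formula:

```python
import math

def snr_db(original, reconstructed):
    """Signal-to-noise ratio in decibels:
       SNR = 10 * log10( sum(x^2) / sum((x - x_hat)^2) )."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    return 10.0 * math.log10(signal / noise)
```

For example, four pixels of value 10 reconstructed as 9 give a signal power of 400 against a noise power of 4, i.e. 10·log10(100) = 20 dB.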

5.1 INPUT IMAGE:

Figure 5.1: Input image (a) Lena of size 128×128 is considered for compression

5.2 RESULTS OBTAINED FOR IMAGE COMPRESSION:

5.2.1 EPSILON = 0.001

The image compression obtained can be seen in Figure 5.2: the input image in 5.2(a), the plot of DCT coefficients (for one example block) in 5.2(b), the plot of the absolute values of the DCT coefficients in 5.2(c), the error between the output and desired input (for one example block) in 5.2(d), and the output image in 5.2(e).

Figure 5.2: (a) Input image (b) DCT coefficients (for one example block) (c) Absolute value of DCT coefficients (for one example block) (d) Error between the output and desired input (for one example block) (e) Output image

5.2.2 EPSILON = 0.01

The image compression obtained can be seen in Figure 5.3: the input image in 5.3(a), the plot of DCT coefficients (for one example block) in 5.3(b), the plot of the absolute values of the DCT coefficients in 5.3(c), the error between the output and desired input (for one example block) in 5.3(d), and the output image in 5.3(e).

4(a).4(e) and input image. plot of DCT coefficients(for one example block) and plot of Absolute value of DCT coefficients.4(d) 37 .3 EPSILON=0.2.4(c). error between the output and Desired input(for one example block) was shown in 5. 5.2.5.5.4(b).(a) (b) (c) (d) (e) Figure 5.3 (a) Input image (b)DCT Coefficients(for one example block) (c)Absolute Value of DCT Coefficients(for one example block) (d) error between the output and Desired input(for one example block) (e)output image 5.1 The image compression is obtained which can be seen in figures 5.

Figure 5.4: (a) Input image (b) DCT coefficients (for one example block) (c) Absolute value of DCT coefficients (for one example block) (d) Error between the output and desired input (for one example block) (e) Output image

5.3 COMPARISON OF THE OBTAINED RESULTS WITH THE JPEG ALGORITHM:

Table 5.1: The number of bits, quantization levels and number of support vectors for different epsilon values, with the corresponding compression ratio and SNR (dB)

Epsilon (ε) | No. of Quantization Levels | Number of Support Vectors | Length of Huffman Code | Total Number of Bits | Compression Ratio | SNR (dB)
0.001 | 60 | 16128 | 64343 | 131072 | 2.03 | 38
0.01 | 60 | 16128 | 36107 | 131072 | 3.63 | 22
0.1 | 60 | 8756 | 22354 | 131072 | 5.86 | 18

Analysis: For the purpose of comparison of the proposed algorithm with the JPEG algorithm, we take the compression ratio for both sets of images. Since the error bound can be set beforehand in our proposed algorithm, we set the bound to different values and compare the resulting compressed images. In JPEG, by contrast, considerable error is found even though the picture quality is maintained. The signal-to-noise ratio was considered for comparison, and its value in dB is found as per the formula discussed in Chapter 2. It is seen that the signal-to-noise ratio is much higher in the case of our algorithm, so the image information is better preserved, and high compression can still be obtained through this algorithm.
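The compression ratios in Table 5.1 can be reproduced arithmetically, assuming the "total number of bits" column is the raw size of the 128×128 8-bit image and the ratio is that size divided by the Huffman code length:

```python
total_bits = 128 * 128 * 8          # 131072 bits for the raw image
table_rows = [                      # (huffman_length, compression_ratio)
    (64343, 2.03),
    (36107, 3.63),
    (22354, 5.86),
]
for huff_len, expected in table_rows:
    ratio = total_bits / huff_len
    # each reported ratio matches the table to two decimal places
    assert abs(ratio - expected) < 0.01
```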

2.6(b).4 INPUT IMAGE: Figure 5.5.5.2 RESULTS OBTAINED FOR JPEG COMPRESSION: 5.6(a) 40 .6(d) and input image 5.1 ANALYSIS: The image compression is obtained which can be seen in figures 5.5.5 Input image Lena of size 128*128 is considered for compression 5.6(c).

Figure 5.6: (a) Input image (b)(c)(d) Compressed image for quality coefficients 2, 5 and 10 respectively

Table 5.2: The number of bits and length of Huffman coding for different quality coefficients, with the corresponding compression ratio and SNR (dB)

Quality Coefficient | Length of Huffman Code | Total Number of Bits | Compression Ratio | SNR (dB)
2 | 25201 | 131072 | 5.20 | 21.2
5 | 21264 | 131072 | 6.16 | 19.7
10 | 19381 | 131072 | 6.76 | 18.5

CHAPTER 6

CONCLUSION:

In this project, an image compression algorithm which takes advantage of SVM learning was presented. The algorithm exploits the trend of the DCT coefficients after the image has been transformed from the spatial domain to the frequency domain via the DCT. SVM learning is used to estimate the DCT coefficients within a predefined error. The SVM is trained on the absolute magnitudes of the DCT coefficients, as these values require fewer support vectors to estimate the underlying function. The algorithm also defines how the original values are recovered through the introduction of the inversion number. The inversion number allows us to recover the original sign (i.e. positive or negative) of each DCT coefficient so that, combined with the magnitude of the coefficient as estimated by the SVM, a close approximation to the original value of the DCT coefficient is obtained in order to reconstruct the image. The net result of the SVM learning is to compress the DCT coefficients much further than other methods such as JPEG. Large compression ratios are possible with the new method while still retaining reasonable image quality, and the new method produces better image quality than the JPEG compression algorithm for comparable compression ratios.

REFERENCES:

[1] M. H. Hassoun, Fundamentals of Artificial Neural Networks. Cambridge, MA: MIT Press, 1995.
[2] C. Amerijckx, M. Verleysen, P. Thissen, and J. Legat, "Image compression by self-organized Kohonen map," IEEE Trans. Neural Networks, vol. 9, May 1998.
[3] J. Robinson and V. Kecman, "The use of support vectors in image compression," Proc. 2nd Int. Conf. Engineering Intelligent Systems, June 2000, pp. 503-507.
[4] H. Drucker, C. J. C. Burges, L. Kaufmann, A. Smola, and V. Vapnik, "Support vector regression machines," Advances in Neural Information Processing Systems, vol. 9. Cambridge, MA: MIT Press, 1997, pp. 155-161.
[5] V. Vapnik, S. Golowich, and A. Smola, "Support vector method for function approximation, regression estimation and signal processing," Advances in Neural Information Processing Systems, vol. 9. Cambridge, MA: MIT Press, 1997.
[6] V. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
[7] V. Kecman, Learning and Soft Computing: Support Vector Machines, Neural Networks and Fuzzy Logic Models. Cambridge, MA: MIT Press, 2001.
[8] J. Jiang, "Image compression with neural networks - A survey," Signal Processing: Image Communication, vol. 14, 1999.
[9] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed., 1999.
[10] J. Miano, Compressed Image File Formats. Reading, MA: Addison-Wesley, 1999.
[11] "Digital Compression and Coding of Continuous-Tone Still Images," ISO/IEC IS 10918-1, Amer. Nat. Standards Inst., 1994.

BIO DATA:

Name: Rama Kishor Mutyala
Email: ramkishore_mutyala@yahoo.com
Course: Bachelor of Technology
University: Vellore Institute of Technology
Branch: Electronics and Communication Engineering
Address: Rama Kishor Mutyala, Door No: 2-74, Near Ramalayam Street, Gandhi Nagar, Vetlapalem, Samalkot Mandal, E.G. Dist., Andhra Pradesh - 533434
