

Chapter 1

An intensive study of related areas was undertaken to gain a clear understanding of the entire project work.

1.1 Motivation

The survey of the following concepts has made working on the project much easier. It has generated a tremendous amount of enthusiasm for gaining knowledge of a system that has become important in the field of medical imaging. The project is all the more inspiring because it is linked to the lives of many patients, provides real-time collaboration among physicians, and greatly enhances productivity and patient coordination.

1.2 Medical Images

In this project, the PACS system mainly deals with medical image management and with displaying an image at the destination system in the same form in which it was acquired. To transfer large medical images, an image must first be compressed and then transferred using a suitable protocol. Hence there is a need to study the storage size of medical images, the compression standards available to reduce large images, and the protocol that supports the chosen format.

The size of large medical images often exceeds the display area of the user's output device. To present such images appropriately, sophisticated image browsing techniques have to be developed. Digital images are described as bitmaps formed of individual pixels. Semantic content, or structural information, is not preserved in this representation; as a result, images cannot be revised at the level of their structure. Digital images result from either real-world capture or computer generation. They can be captured from the real world through scanning or the use of a digital camera. Computer generation can be performed with a paint program, a screen capture, or the conversion of a graphic into a bitmap image.

The motivation for the compression of medical images is illustrated in Table 1.2.1, which shows the storage size, transmission bandwidth, and transmission time needed for various types of uncompressed images. It is clear from these values that images require considerable storage space, large transmission bandwidths, and long transmission times. With the present state of technology, the only practical solution is to compress images before storage and transmission; at the receiving end, the compressed images can then be decompressed.

Image Type   Size          Bits/Pixel   Uncompressed Size   Transmission Bandwidth   Transmission Time (approx.)
Gray         512 x 512     8 bpp        262 KB              2.1 Mbit/image           1 min 13 s
Color        512 x 512     24 bpp       786 KB              6.29 Mbit/image          3 min 39 s
Medical      2048 x 1680   12 bpp       5.16 MB             41.3 Mbit/image          23 min 54 s

Table 1.2.1: Storage size and transmission bandwidth for various uncompressed images
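The figures in Table 1.2.1 follow directly from dividing the raw image size in bits by the link bandwidth. A minimal sketch, assuming a 28.8 kbit/s modem link (the link rate is our assumption, chosen because it reproduces the table's timings; the text itself does not state it):

```python
# Transmission time for an uncompressed image over a slow link.
# Assumption: a 28.8 kbit/s modem link, which reproduces the timings
# in Table 1.2.1; the table itself does not name the link rate.

def uncompressed_bits(width, height, bits_per_pixel):
    """Raw size of an uncompressed image in bits."""
    return width * height * bits_per_pixel

def transmission_seconds(width, height, bits_per_pixel, link_bps=28_800):
    """Seconds needed to send the raw image over the given link."""
    return uncompressed_bits(width, height, bits_per_pixel) / link_bps

# Grayscale 512 x 512 at 8 bpp: 2,097,152 bits (2.1 Mbit) = 262,144 bytes (262 KB)
bits = uncompressed_bits(512, 512, 8)
print(bits // 8, "bytes")                              # 262144
print(round(transmission_seconds(512, 512, 8)), "s")   # 73, i.e. about 1 min 13 s
```

Even a modest 10:1 compression ratio would bring the 73-second grayscale transfer down to about 7 seconds, which is precisely the motivation for compressing before transmission.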

The JPEG standard has been in use for almost a decade. It has proved a valuable tool over the years, but it cannot fulfill today's advanced requirements. JPEG uses a Discrete Cosine Transform (DCT)-based method. With the continual expansion of multimedia and Internet applications, the needs and requirements of the underlying technologies have grown and evolved. Today's digital imagery is extremely demanding, not only from the quality point of view but also in terms of image size. Current medical image sizes span orders of magnitude, ranging from less than 100 KB to high-quality scanned images of approximately 40 GB. The JPEG 2000 international standard represents an advance in image compression technology in which the image coding system is optimized not only for efficiency but also for scalability and interoperability in network and mobile environments. Digital imaging has become an integral part of the Internet, and JPEG 2000 is a powerful new tool that provides strong capabilities for designers and users of networked image applications. After compression, the image must be transferred; the JPEG 2000 standard uses the JPIP protocol for image browsing. As the PACS system involves compression of medical images and their transmission over the Web, the JPEG 2000 technique fulfills the needs of the system.

1.3 Preamble

In medical imaging, PACS (Picture Archiving and Communication System) is an integrated system of digital products and technology allowing for the acquisition, storage, retrieval, and display of radiographic images.

The key components of PACS are modality interfaces, a network backbone, a database management system, an image management system, a long-term archive, and diagnostic and clinical workstations. A PACS includes interfaces with the hospital information system (HIS) and the radiology information system (RIS). A web server, allowing Internet access, is also a strategic component of PACS.

Medical images are stored in a vendor-independent format. The most common format for image storage is DICOM (Digital Imaging and Communications in Medicine), a comprehensive set of standards for handling, storing, and transmitting information in medical imaging. It includes a file format definition and a network communication protocol.

The goal of the project is to compress medical images using the JPEG 2000 format. JPEG 2000 uses the wavelet transform for compression and retrieval of the compressed image, which is stored on the server; the compressed images are then transferred over the network for viewing through the JPIP protocol.

PACS replaces hard-copy means of managing medical images, such as film archives. It expands on the possibilities of such conventional systems by providing capabilities for off-site viewing and reporting (tele-education, tele-diagnosis). Additionally, it enables practitioners at different physical locations to peruse the same information simultaneously. With the ever-decreasing price of digital storage, PACS systems are increasingly cost-effective.

1.4 Problem Formulation

Despite the increasing use of imaging modalities that produce cross-sectional slices, such as computed tomography (CT), ultrasound (US), and magnetic resonance (MR) imaging, which generally supply images in digital format, conventional radiology examinations still represent 70% of the examinations carried out in a radiology department. A film-based system brings no reduction in time or work: existing problems persist, and the risk of error is increased.


The success of any healthcare service depends on the efficient use and sharing of patient information. The problem statement is: "Diagnostic images are frequently lost, misplaced, and unread. Diagnostic images and information are not available everywhere they are needed, and the costs involved are high, resulting in decreased customer service and efficiency." There is a requirement for image processing, handling, and display that preserves, in particular, the original color and motion of diagnostic images.

In pursuance of the above goals, we have decided to implement a system in which medical images are compressed and manipulated, metadata are added to the diagnosed portion of the image, and the result is stored, to be transferred from the place of acquisition, via the storage system, to the viewing station for further diagnosis: the "Implementation of mini-PACS in Healthcare Systems".


1.5 Scope of the Project

Medical imaging is important and widespread in the diagnosis of disease. In certain situations, however, the particular manner in which images are made available to physicians and their patients introduces obstacles to timely and accurate diagnosis. These obstacles generally relate to the fact that each manufacturer of a medical imaging system uses a different, proprietary format to store the images in digital form. This means, for example, that images from a scanner manufactured by General Electric Corp. are stored in a different digital format than images from a scanner manufactured by Siemens Medical Systems. Further, images from different imaging modalities, such as ultrasound and MRI, are stored in formats different from each other. In practice, viewing medical images typically requires a different proprietary "workstation" for each manufacturer and for each modality.

In principle, medical images could be converted to Internet web pages for widespread viewing. Several technical limitations of current Internet standards, however, mean that straightforward processing of the image data yields images that transfer across the Internet too slowly, lose diagnostic information, or both. One such limitation is the bandwidth of current Internet connections, which, because of the large size of medical images, results in transfer times that are unacceptably long. The bandwidth problem can be addressed by compressing the image data before transfer, but compression typically involves loss of diagnostic information. In addition, because of the size of the images, the time required to process image data from its original format into a format viewable by Internet browsers is considerable, meaning that systems designed to create web pages "on the fly" introduce a delay of seconds to minutes while the person requesting the images waits for the data to be processed. Workstations allow images to be reordered or placed "side by side" for viewing, but again an Internet system would have to create new web pages "on the fly", introducing further delays. Finally, diagnostic interpretation of medical images requires that the images be presented with appropriate brightness and contrast. On proprietary workstations these parameters can be adjusted by the person viewing the images, but control of image brightness and contrast is not a feature of current Internet standards (HTTP or HTML).

Chapter 2




2.1 Introduction to PACS

PACS (Picture Archiving and Communication Systems) are high-speed, graphical, computer network systems for the storage, retrieval, and display of radiologic images. PACS are electronic medical image management systems. They consist of image display systems, archiving systems, networks, and interfaces, presenting one unified system to the user:

Picture - digital diagnostic (radiological) image
Archiving - electronic storage and retrieval
Communication - computer network (multiple access)
Systems - control of the processes (integrated technology)

Figure 2.1.1: PACS system

The images are acquired, compressed, archived, and retrieved over a network for diagnosis and review by physicians. These images can be interpreted and viewed at workstations, which can also double as archive stations for image storage. The introduction of client/server computing, improved digital imaging and computer network technologies, and the advancement of the DICOM and HL7 standards have put PACS alongside radiology information systems (RIS) as an ideal solution for managing radiological images.

One of the main benefits of PACS is that it provides timely and efficient access to images, interpretations, and related data throughout the organization. This eases consultations between physicians, who can now simultaneously access the same images over networks, leading to a better diagnostic process. It also benefits physicians in emergency situations, who need not wait for long periods to view a patient's radiological images, since these are instantly available on the network when ready. Another feature of PACS is the ability to digitally enhance images, providing more detailed and sharper images. This improves diagnostic capability in radiological examinations.

The high cost of PACS has led vendors to offer mini-PACS, a cheaper alternative for organizations that cannot afford a full PACS, or that seek to implement some form of digital image management system but would rather start with something small. While PACS are considered to be at minimum hospital-wide, mini-PACS usually tend to be department-based (radiology, emergency room, or orthopedics). Mini-PACS are easy to maintain and cheap to repair, and they can gradually be upgraded to a fully functioning hospital-wide PACS.

2.2 Types of PACS

Full PACS handle images from various modalities, such as ultrasonography, radiography, magnetic resonance imaging, positron emission tomography, computed tomography, and plain X-rays. Small-scale systems that handle images from a single modality (usually connected to a single acquisition device) are sometimes called mini-PACS.

Figure 2.2.1: The entire PACS system (CT, MRI, ultrasound, and laser scanner modalities connected to an image database, with workstations and review stations)


Typically a PACS network consists of a central server that stores a database containing the images. This server is connected to one or more clients via a local area network (LAN). The PACS is connected to an interface engine and receives orders for diagnostic studies, which it then matches to image sets coming into the PACS from the digital modalities (CT, CR, MR, etc.) via DICOM (a digital imaging communications standard), in order to ensure that all images are associated with the right patient. To process these order messages successfully, the PACS must receive admission/discharge/transfer messages about patients from the RIS. Finally, the PACS receives electronically signed reports from the RIS, which it archives with the images so that reports and images may be retrieved and displayed concurrently. PACS workstations provide for scanning image films into the system, printing image films from the system, and interactive display of digital images, and offer means of manipulating the images (crop, rotate, zoom, brightness, contrast, and others).

2.3 Storage

Image storage and communication can be based on either a centralized or

distributed architecture. In centralized storage system all the acquired images are forward

to a central archive system to which every modality or workstation is attached on a point-

to-point basis. Whereas a distributed architecture is composed of linked local storage

subsystems or file servers. Each server has its own short-term storage unit (usually a

small RAID), one or more image acquisition modalities, and several diagnostic/review

workstations. Each of these architectures has its own advantages and disadvantages.

However distributed storage architecture has been found suitable for large-scale PACS

and centralized architecture for miniPACS.

Storage Media: PACS storage devices should hold gigabytes of data with relatively efficient access times. Research continues to improve PACS by providing storage media that can hold many images with quick access times. A PACS needs at least two levels of archive (short-term and long-term). Images should be retrievable from the short-term archive within 2 seconds; images from the long-term archive should take no more than 3 minutes to retrieve.


Figure 2.3.1: Storage subsystem in a distributed large scale PACS

Examples of storage media that can be used for PACS archiving include:

• Redundant array of inexpensive disks (RAID) for immediate access to current images.

• Magnetic disks for faster retrieval of cached images.

• Erasable magneto-optical disks for temporary long-term archiving.

• Write once, read many (WORM) disks in the optical disk library, which constitute the permanent archive.

• Recently developed digital versatile discs (DVD-ROM) for low-cost permanent storage.

• Digital linear tape (DLT) for backup storage.


2.4 Multiple Modality Interface

First-generation modality interfaces were achieved primarily through video acquisition techniques, whereas second-generation interfaces made use of ACR-NEMA V2, proprietary digital, and early DICOM storage interfaces. Such interfaces allowed the export of image information into the PACS. The "double entry" problem, that of entering one set of patient information into the RIS and an incorrect variant of the same information into the modality, was not significantly addressed by first- or second-generation systems. Some second-generation modality interface gateways ("protocol converters") made use of DICOM modality worklist interfaces to reconcile the RIS and modality patient information. However, such interfaces were the exception rather than the rule.

Modality image acquisition, as a technical problem in PACS, has largely been solved by the widespread implementation of the DICOM standard and the availability of image "protocol converters" used to interface pre-DICOM modalities. The modality interface problem facing third-generation PACS is that of truly solving the "double entry" problem and improving workflow. Although most, if not all, modalities manufactured today support the DICOM standard, they support only the DICOM Storage SOP portion of the standard. A smaller number of modalities support the DICOM Modality Worklist SOP or the DICOM Detached Study Management SOP, which are required to solve the "double entry" problem; even fewer modalities manufactured today support the DICOM Modality Performed Procedure Step SOP, which is used to track the status of image acquisition workflow.

The next-generation modality interfaces ("protocol converters") will have to address these informatics problems. Such devices will have to provide for the acquisition of images, the reconciliation of patient information with the RIS patient information, and the reporting of modality workflow status. These interfaces will provide a complete DICOM interface to the modality; in the case of some modalities, they will be supplied as part of the modality itself.

The best examples of such modality interfaces can be found in the world of computed radiography. These interfaces present a modality worklist to the operator, associate records from the worklist with the digitized images, provide image quality control, transmit the digitized images to multiple destinations, print images to both local and network printers, and report modality workflow status.

2.5 Application Level

The Integrated Medical Application (IMA) gives the physician access to medical services and functionality from a single graphical desktop. The desktop offers services for local and remote access to the electronic patient records of hospitals, specialist clinics, and general practitioners. The distributed patient records within a hospital or practice are logically combined by means of a meta-patient record, which provides information about the local multimedia patient documents. In addition to basic data such as document type, location, and creation date, the record provides information about the document structure. The information supplied by the meta-record from each practice and hospital can be combined to form a complete, virtual patient record. Management of and access to each record and document are carried out by the Open Distributed Management System. An object-oriented model has been used throughout. Each local implementation of the record may differ, depending on the facilities of the local environment, but each presents the same external interface to the outside. This mechanism enables the scalable integration of all patient documents over a wide area.

The central components of the IMA provide functionality for transparent access to local and remote documents obtained through selected meta-record services, navigation in the patient record, visualization of multimedia documents, and processing of images. Other tools that have been integrated into the graphical desktop include advanced image processing capabilities (e.g. for quantification of stenosis), communication services such as email, text processing, and desktop conferencing, and access to external information sources such as the World Wide Web.

The applications of PACS are as follows:

• Rapid access to critical information to decrease exam-to-diagnosis time. This is especially useful in emergency and operating rooms.

• Elimination of film and of film handling and storage costs.

• Images can be easily shared between reading radiologists, other physicians, and medical records.

• Radiologists can access soft-copy images instantly after acquisition to expedite diagnosis and reporting at almost any available workstation.

• Web servers can be used to share images cost-effectively with other departments, and even with referring physicians across town, who can access the images using the Internet or the local intranet.

• Hard-copy films or paper printouts can be made when needed for traditional archiving or for providing images to other departments.

Images can be archived at secure locations; a database server manages the transfer, retrieval, and storage of images and related information, while the archive provides permanent image storage.


2.6 Network

Topology refers to the way a network is laid out, physically or logically. Two or more devices connect to a link, and two or more links form a topology. Five basic topologies are possible: bus, star, tree, mesh, and ring.

PACS communication networks enable the movement of medical data between modality imaging devices, gateway computers, the PACS server, display workstations, remote locations for diagnosis and consultation, and other hospital information systems such as HIS/RIS.

The most commonly used network technologies in building PACS networks are:

• Ethernet, based on the IEEE 802.3 standard and the Carrier Sense Multiple Access with Collision Detection (CSMA/CD) protocol. It is suitable for LANs and can operate at 10 Mbit/s over half-inch coaxial cable, twisted-pair wire, or fiber-optic cable.

• FDDI, which can be used for medium-speed communication. It runs on optical fiber at 100 Mbit/s over distances of up to 200 km with up to 1000 stations connected.

• ATM, which can be used to combine LAN and WAN applications. ATM is a virtual-circuit-oriented packet-switching network with transmission speeds ranging from 51.84 Mbit/s to 2.5 Gbit/s.

Conceptually, three main types of networks are used to transport radiology images:

• a LAN linking imaging devices, data storage units, and display devices within one departmental area;

• a larger LAN for intra-hospital transport, linking departments; and

• a tele-radiology network for transmission of images to other hospitals in the region, or to and from remote sites, for diagnosis at a distance.


2.7 Digital Imaging and Communications in Medicine

2.7.1 Medical Imaging Standard

The Digital Imaging and Communications in Medicine (DICOM) standard was created by the National Electrical Manufacturers Association (NEMA) to aid the distribution and viewing of medical images, such as CT scans, MRIs, and ultrasound.

DICOM is a comprehensive set of standards for handling, storing, and transmitting information in medical imaging. It includes a file format definition and a network communication protocol; this protocol is an application-level protocol that uses TCP/IP to communicate between systems. DICOM files can be exchanged between any two entities capable of receiving the information - image and patient data - in DICOM format.

DICOM was developed to enable the integration of scanners, servers, workstations, and network hardware from multiple vendors into a picture archiving and communication system. Machines, servers, and workstations come with DICOM conformance statements that clearly state the DICOM classes they support. DICOM has been widely adopted by hospitals and is making inroads into smaller applications such as dentists' and doctors' offices. DICOM defines two different types of transmission: DICOM Store and DICOM Print. DICOM Store is a format for sending images to a PACS system or workstation, while DICOM Print is a format for sending images to a DICOM printer, normally to print an "X-ray" film. Most vendors require individual licenses to perform these types of transmission. The standard provides the sender with quality control over the image being sent.

Despite the diversity of sources of digital medical images (CR, digital X-ray detectors, MRI, CT, PET, SPECT, ultrasound, etc.), all modalities are usually encoded in the DICOM format, which is accepted worldwide by medical users and by vendors of medical image acquisition devices. The consolidation of DICOM has facilitated the development of computer applications for processing medical image studies, and has eased the setup of collaborative environments in which medical images, preserving their clinical value, can be understood by different computer applications.

2.7.2 Why was DICOM created?

Before DICOM, equipment makers could offer confident data interchange and communication support only because they pushed clients to buy all their equipment from the same company.

Imagine that we wanted to exchange a picture from a CT (computed tomography) scanner with another user working on a radiotherapy planning system. We might then have had to rewrite the software of the planning system so that it would be able to read the picture, and the same would happen if we wanted to load a picture from the planning system into the CT system. DICOM was created to solve these compatibility problems.

2.7.3 DICOM Standard

This standard, now designated Digital Imaging and Communications in Medicine (DICOM), embodies a number of major enhancements:

a. It is applicable to a networked environment. DICOM supports operation in a networked environment using industry-standard networking protocols such as OSI and TCP/IP.

b. It specifies how devices claiming conformance to the standard react to commands and data being exchanged. DICOM specifies, through the concept of Service Classes, the semantics of commands and associated data.

c. It specifies levels of conformance. DICOM explicitly describes how an implementer must structure a Conformance Statement to select specific options.

d. It is structured as a multi-part document. This facilitates evolution of the standard in a rapidly evolving environment by simplifying the addition of new features. The ISO directives that define how to structure multi-part documents have been followed in the construction of the DICOM standard.

e. It introduces explicit Information Objects not only for images and graphics but also for studies, reports, etc.

f. It specifies an established technique for uniquely identifying any Information Object. This facilitates unambiguous definitions of relationships between Information Objects as they are acted upon across the network.

2.7.4 Overview of DICOM Standard

The contents of the DICOM standard go far beyond the definition of an exchange format for medical image data. DICOM defines:

• data structures (formats) for medical images and related data;

• network-oriented services, e.g.

o image transmission,

o query of an image archive (PACS),

o print (hardcopy), and

o RIS - PACS - modality integration;

• formats for storage media exchange; and

• requirements for conforming devices and programs.

2.7.5 DICOM data structures

A DICOM image consists of a list of data elements (so-called attributes) which contain a multitude of image-related information:

• patient information (name, sex, identification number),

• modality and imaging procedure information (device parameters, calibration, radiation dose, contrast media), and

• image information (resolution, windowing).

For each modality, DICOM precisely defines the data elements that are required, optional (i.e. may be omitted), or required under certain circumstances (e.g. only if contrast media was used). This powerful flexibility is, however, at the same time one crucial weakness of the DICOM standard, because practical experience shows that image objects are frequently incomplete: required fields are missing or contain incorrect values. These problems can lead to further problems when exchanging data.
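To make the attribute-list structure concrete, the sketch below models a few data elements as (group, element) tag pairs with values and checks for a missing required field, in the way a validator might flag the incomplete objects described above. The tags shown are real DICOM tags (Patient's Name (0010,0010), Modality (0008,0060), Rows (0028,0010), Columns (0028,0011)); the tiny validator itself is only an illustration, not part of the standard.

```python
# Illustrative model of a DICOM image as a list of data elements (attributes),
# keyed by (group, element) tag. The tags are real DICOM tags; the validation
# logic is a simplified sketch of what a conformance checker might do.

dataset = {
    (0x0010, 0x0010): "DOE^JOHN",  # Patient's Name
    (0x0008, 0x0060): "CT",        # Modality
    (0x0028, 0x0010): 512,         # Rows (image height in pixels)
}

# Elements a given modality might require; incomplete objects are a known
# practical problem, as noted in the text above.
required = {
    (0x0010, 0x0010): "Patient's Name",
    (0x0008, 0x0060): "Modality",
    (0x0028, 0x0011): "Columns",   # deliberately absent from the dataset
}

missing = [name for tag, name in required.items() if tag not in dataset]
print("Missing required elements:", missing)  # ['Columns']
```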

2.7.6 DICOM network services

The DICOM network services are based on the client/server concept. When two DICOM applications want to exchange information, they must establish a connection and agree on the following parameters:

• which application is the client and which is the server,

• which DICOM services are to be used, and

• in which format data is transmitted (e.g. compressed or uncompressed).

The connection can and will be established only if both applications agree on a common set of parameters. In addition to the most basic DICOM service, image transmission (in DICOM terminology, the "Storage Service Class"), there are a number of advanced services, e.g.:

• The DICOM image archive service ("Query/Retrieve Service Class") allows images in a PACS archive to be searched by certain criteria (patient, time of creation of the images, modality, etc.) and selectively downloaded from the archive.

• The DICOM print service ("Print Management Service Class") allows laser cameras or printers to be accessed over a network, so that multiple modalities and workstations can share one printer.

• The DICOM modality worklist service allows up-to-date worklists, including a patient's demographic data, to be downloaded automatically from an information system (HIS/RIS) to the modality.
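The agree-on-parameters step at the start of this section amounts to intersecting what the two sides support. A toy sketch of the negotiation idea (the transfer syntax names are real DICOM transfer syntaxes, but the negotiation code is purely illustrative and not a real DICOM stack):

```python
# Toy sketch of DICOM association negotiation: the requesting application
# proposes transfer syntaxes in preference order; the accepting application
# picks the first proposal it also supports. Purely illustrative.

def negotiate(proposed, supported):
    for syntax in proposed:   # requestor's preference order
        if syntax in supported:
            return syntax     # first mutually supported syntax wins
    return None               # no common format: this context is rejected

proposed = ["JPEG 2000", "Explicit VR Little Endian", "Implicit VR Little Endian"]
supported = {"Explicit VR Little Endian", "Implicit VR Little Endian"}
print(negotiate(proposed, supported))  # Explicit VR Little Endian
```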

2.7.7 Media exchange

In addition to the exchange of medical images over a network, media exchange has become another focus that has been integrated into the DICOM standard. Fields of application include, for example, the storage of cardiac angiography films in cardiology and the storage of ultrasound images. To make sure that DICOM storage media are really interchangeable, the standard defines so-called "application profiles" which explicitly define:

• which modalities' images may be present on the medium (e.g. "only X-ray angiography images"),

• which encoding formats and compression schemes may be used (e.g. only uncompressed or lossless JPEG), and

• which storage medium is to be used (e.g. "CD-R with ISO file system").

Apart from the image files, each DICOM medium contains a so-called "DICOM directory". This directory holds the most important information (patient name, modality, unique identifiers, etc.) for all images captured on the medium. With the help of this directory, it is possible to quickly browse or search through all images on the medium without having to read the complete image files, which could otherwise take a couple of minutes when reading from a CD.

2.7.8 DICOM File Format

A single DICOM file contains both a header (which stores information about the patient's name, the type of scan, image dimensions, etc.) and all of the image data (which can contain information in three dimensions). DICOM image data can be compressed (encapsulated) to reduce the image size; files can be compressed using lossy or lossless variants of the JPEG format.

DICOM files consist of a header with standardized as well as free-form fields and a body of image data. A single DICOM file can contain one or more images, allowing the storage of volumes and/or animations, and the image data can be compressed using JPEG variants.

DICOM differs from other data formats in that it groups information together into

a data set. That is, an X-Ray of your chest is in the same file as your patient ID, so that


the image is never mistakenly separated from your information. It also mandates the

presence of a media directory, the DICOMDIR file that provides index and summary

information for all the DICOM files on the media.

DICOM restricts the filenames on DICOM media to 8 character names

(sometimes 8.3). This is a common source of problems with media created by developers

who did not read the specifications carefully. This is a historical requirement to maintain

compatibility with older existing systems. The DICOMDIR information provides

substantially greater information about each file than any filename could, so there is less

need for meaningful file names.


Chapter 3


3.1 Wavelet Transform

The transform of a signal is just another form of representing the signal. It does

not change the information content present in the signal. The Wavelet Transform provides

a time-frequency representation of the signal. It was developed to overcome the shortcoming of the Short Time Fourier Transform (STFT), which can also be used to analyze

non-stationary signals. While STFT gives a constant resolution at all frequencies, the

Wavelet Transform uses multi-resolution technique by which different frequencies are

analyzed with different resolutions.

A wave is an oscillating function of time or space and is periodic. In contrast,

wavelets are localized waves. They have their energy concentrated in time or space and

are suited to analysis of transient signals. While Fourier Transform and STFT use waves

to analyze signals, the Wavelet Transform uses wavelets of finite energy.

Figure 3.1.1 : (a) a Wave (b) a Wavelet

The wavelet analysis is done similar to the STFT analysis. The signal to be

analyzed is multiplied with a wavelet function just as it is multiplied with a window


function in STFT, and then the transform is computed for each segment generated.

However, unlike STFT, in Wavelet Transform, the width of the wavelet function changes

with each spectral component. The Wavelet Transform, at high frequencies, gives good

time resolution and poor frequency resolution, while at low frequencies, the Wavelet

Transform gives good frequency resolution and poor time resolution.

3.2 The Continuous Wavelet Transform and the Wavelet Series

The Continuous Wavelet Transform (CWT) is provided by equation 3.2.1, where

x(t) is the signal to be analyzed. y(t) is the mother wavelet or the basis function. All the

wavelet functions used in the transformation are derived from the mother wavelet through

translation (shifting) & scaling (dilation or compression).

X_WT(τ, s) = (1/√|s|) ∫ x(t) y*((t − τ)/s) dt (3.2.1)

The mother wavelet used to generate all the basis functions is designed based

on some desired characteristics associated with that function. The translation parameter τ

relates to the location of the wavelet function as it is shifted through the signal. Thus, it

corresponds to the time information in the Wavelet Transform. The scale parameter s is

defined as |1/frequency| and corresponds to frequency information. Scaling either dilates

(expands) or compresses a signal. Large scales (low frequencies) dilate the signal and provide global information about the signal, while small scales (high frequencies) compress the signal and reveal the detailed information hidden in it. The Wavelet

Transform merely performs the convolution operation of the signal and the basis function.
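To make the definition concrete, the integral in equation (3.2.1) can be approximated by a simple Riemann sum. The sketch below is an illustrative example only (not part of the project code); it uses the Mexican Hat function as the mother wavelet y(t), which is real-valued, so the conjugate y* is y itself.

```python
import math

def mexican_hat(u):
    # a real-valued mother wavelet, so y*(t) = y(t)
    return (1.0 - u * u) * math.exp(-u * u / 2.0)

def cwt(x, t, scale, tau):
    """Approximate equation (3.2.1) by a Riemann sum:
    X(tau, s) ~ (1/sqrt(|s|)) * sum_k x(t_k) y((t_k - tau)/s) * dt"""
    dt = t[1] - t[0]                      # uniform sampling step
    acc = sum(xk * mexican_hat((tk - tau) / scale) for xk, tk in zip(x, t))
    return acc * dt / math.sqrt(abs(scale))

# a transient signal: a Gaussian blip centred at t = 0
N = 513
t = [-10.0 + 20.0 * k / (N - 1) for k in range(N)]
x = [math.exp(-tk * tk) for tk in t]
row = [cwt(x, t, 1.0, tau) for tau in t]  # one row of the time-scale plane
```

Evaluating `cwt` over a grid of scales and translations fills in the whole time-scale plane; the response is largest where the wavelet, at the given scale, overlaps the transient.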

The above analysis becomes very useful as in most practical applications, high

frequencies (low scales) do not last for a long duration, but instead, appear as short bursts,

while low frequencies (high scales) usually last for entire duration of the signal.


The Wavelet Series is obtained by discretizing CWT. This aids in computation of

CWT using computers and is obtained by sampling the time-scale plane. The sampling

rate can be changed accordingly with scale change without violating the Nyquist

criterion. The Nyquist criterion states that the minimum sampling rate that allows

reconstruction of the original signal is 2ω radians, where ω is the highest frequency in the

signal. Therefore, as the scale goes higher (lower frequencies), the sampling rate can be

decreased thus reducing the number of computations.

3.3 The Discrete Wavelet Transform

The Wavelet Series is just a sampled version of CWT and its computation may

consume significant amount of time and resources, depending on the resolution required.

The Discrete Wavelet Transform (DWT), which is based on sub-band coding, is found to

yield a fast computation of Wavelet Transform. It is easy to implement and reduces the

computation time and resources required.

The foundations of DWT go back to 1976 when techniques to decompose discrete

time signals were devised. Similar work was done in speech signal coding which was

named as sub-band coding. In 1983, a technique similar to sub-band coding was

developed which was named pyramidal coding. Later many improvements were made to

these coding schemes which resulted in efficient multi-resolution analysis schemes.

In CWT, the signals are analyzed using a set of basis functions which relate to

each other by simple scaling and translation. In the case of DWT, a time-scale

representation of the digital signal is obtained using digital filtering techniques. The

signal to be analyzed is passed through filters with different cutoff frequencies at different scales.


3.4 Classification of wavelets

We can classify wavelets into two classes: (a) orthogonal and (b) biorthogonal.

Based on the application, either of them can be used.

(a) Features of orthogonal wavelet filter banks –

The coefficients of orthogonal filters are real numbers. The filters are of the same length

and are not symmetric. The low pass filter, G0 and the high pass filter, H0 are related to

each other by

H0(z) = z^-N G0(-z^-1) (3.4.1)

The two filters are alternating flips of each other. The alternating flip automatically gives double-shift orthogonality between the low pass and high pass filters, i.e., the scalar product of the filters for a shift by two is zero:

∑ G[k] H[k − 2l] = 0, where k, l ∈ Z.

Perfect reconstruction is possible with alternating flip. Also, for perfect

reconstruction, the synthesis filters are identical to the analysis filters except for a time

reversal. Orthogonal filters offer a high number of vanishing moments. This property is

useful in many signal and image processing applications. They have regular structure

which leads to easy implementation and scalable architecture.
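The alternating flip and the double-shift orthogonality it induces are easy to verify numerically. The sketch below is an illustrative check using the Daubechies-4 low pass coefficients as an assumed example (one common sign/index convention for the flip):

```python
import math

# Daubechies-4 low pass filter coefficients
s3 = math.sqrt(3.0)
G = [(1 + s3) / (4 * math.sqrt(2)), (3 + s3) / (4 * math.sqrt(2)),
     (3 - s3) / (4 * math.sqrt(2)), (1 - s3) / (4 * math.sqrt(2))]

# high pass filter by an alternating flip of the low pass filter:
# H[n] = (-1)^n * G[N - 1 - n]
N = len(G)
H = [(-1) ** n * G[N - 1 - n] for n in range(N)]

def double_shift_product(l):
    """Scalar product of G and H shifted by two: sum_k G[k] * H[k - 2l]."""
    return sum(G[k] * H[k - 2 * l] for k in range(N) if 0 <= k - 2 * l < N)
```

For these filters the products cancel pairwise, so the double-shift orthogonality holds exactly, and the low pass coefficients are normalized so that their squares sum to one.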

(b) Features of biorthogonal wavelet filter banks -

In the case of the biorthogonal wavelet filters, the low pass and the high pass filters do not

have the same length. The low pass filter is always symmetric, while the high pass filter

could be either symmetric or anti-symmetric. The coefficients of the filters are either real

numbers or integers.

For perfect reconstruction, biorthogonal filter bank has all odd length or all even

length filters. The two analysis filters can be symmetric with odd length or one symmetric

and the other anti-symmetric with even length. Also, the two sets of analysis and

synthesis filters must be dual. The linear phase biorthogonal filters are the most popular

filters for data compression applications.


3.5 Wavelet Families

There are a number of basis functions that can be used as the mother wavelet for

Wavelet Transformation. Since the mother wavelet produces all wavelet functions used in

the transformation through translation and scaling, it determines the characteristics of the

resulting Wavelet Transform. Therefore, the details of the particular application should be

taken into account and the appropriate mother wavelet should be chosen in order to use

the Wavelet Transform effectively. Figure 3.5.1 illustrates some of the commonly used wavelet functions. The Haar wavelet is the oldest and simplest wavelet; therefore, any

discussion of wavelets starts with the Haar wavelet. Daubechies wavelets are the most

popular wavelets. They represent the foundations of wavelet signal processing and are

used in numerous applications. These are also called Maxflat wavelets as their frequency

responses have maximum flatness at frequencies 0 and π. This is a very desirable property

in some applications.

Figure 3.5.1: Wavelet families (a) Haar (b) Daubechies4 (c) Coiflet1 (d) Symlet2 (e) Meyer

(f) Morlet (g) Mexican Hat.


The Haar, Daubechies, Symlets and Coiflets are compactly supported orthogonal

wavelets. These wavelets along with Meyer wavelets are capable of perfect

reconstruction. The Meyer, Morlet and Mexican Hat wavelets are symmetric in shape.

The wavelets are chosen based on their shape and their ability to analyze the signal in a

particular application.

3.6 DWT and Filter Banks

3.6.1 Multi-Resolution Analysis using Filter Banks :

Filters are one of the most widely used signal processing

functions. Wavelets can be realized by iteration of filters with rescaling.

The resolution of the signal, which is a measure of the amount of detail

information in the signal, is determined by the filtering operations, and

the scale is determined by up sampling and down sampling

(subsampling) operations. The DWT is computed by successive low

pass and high pass filtering of the discrete time-domain signal as

shown in Figure 3.6.1. This is called the Mallat algorithm or Mallat-tree

decomposition. Its significance is in the manner it connects the

continuous-time multiresolution to discrete-time filters. In the Figure,

the signal is denoted by the sequence x[n], where n is an integer. The

low pass filter is denoted by G0 while the high pass filter is denoted by

H0. At each level, the high pass filter produces detail information, d[n],

while the low pass filter associated with scaling function produces

coarse approximations, a[n].

At each decomposition level, the half band filters produce signals

spanning only half the frequency band. This doubles the frequency

resolution as the uncertainty in frequency is reduced by half. In


accordance with Nyquist’s rule, if the original signal has a highest

frequency of ω, which requires a sampling frequency of 2ω radians,

then it now has a highest frequency of ω/2 radians. It can now be

sampled at a frequency of ω radians thus discarding half the samples

with no loss of information.

Figure 3.6.1: Three-level Wavelet decomposition tree

This decimation by 2 halves the time resolution, as the entire signal is now represented by only half

the number of samples. Thus, while the half band low pass filtering

removes half of the frequencies and thus halves the resolution, the

decimation by 2 doubles the scale.

With this approach, the time resolution becomes arbitrarily good at high

frequencies, while the frequency resolution becomes arbitrarily good at low frequencies.

The filtering and decimation process is continued until the desired level is reached. The

maximum number of levels depends on the length of the signal. The DWT of the original


signal is then obtained by concatenating all the coefficients, a[n] and d[n], starting from

the last level of decomposition. Figure 3.6.2 shows the reconstruction of the original signal from the wavelet coefficients.

Figure 3.6.2: Three-level Wavelet reconstruction tree

Basically, the reconstruction is the reverse process of

decomposition. The approximation and detail coefficients at every level are upsampled by

two, passed through the low pass and high pass synthesis filters and then added. This

process is continued through the same number of levels as in the decomposition process

to obtain the original signal. The Mallat algorithm works equally well if the analysis filters, G0 and H0, are exchanged with the synthesis filters, G1 and H1.
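A one-level Mallat step takes only a few lines for the orthonormal Haar pair. The sketch below is a minimal illustration with assumed example data (not the project implementation): it decomposes a short sequence twice and shows that the DWT is the concatenation of the final approximation with the details, and that the synthesis side recovers the input exactly.

```python
import math

R2 = math.sqrt(2.0)

def haar_decompose(x):
    """One level of the Mallat decomposition with orthonormal Haar filters:
    low pass then downsample gives a[n], high pass then downsample gives d[n]."""
    a = [(x[2 * n] + x[2 * n + 1]) / R2 for n in range(len(x) // 2)]
    d = [(x[2 * n] - x[2 * n + 1]) / R2 for n in range(len(x) // 2)]
    return a, d

def haar_reconstruct(a, d):
    """Upsample by two, filter with the synthesis pair, and add."""
    x = []
    for an, dn in zip(a, d):
        x.append((an + dn) / R2)
        x.append((an - dn) / R2)
    return x

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
a1, d1 = haar_decompose(x)      # level 1
a2, d2 = haar_decompose(a1)     # level 2: iterate on the approximation a1
dwt = a2 + d2 + d1              # concatenated DWT coefficients
```

Reconstruction simply runs the two levels in reverse: rebuild a1 from (a2, d2), then rebuild x from (a1, d1).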

3.6.2 Conditions for Perfect Reconstruction

In most Wavelet Transform applications, it is required that the original signal be

synthesized from the wavelet coefficients. To achieve perfect reconstruction the analysis

and synthesis filters have to satisfy certain conditions. Let G0(z) and G1(z) be the low pass

analysis and synthesis filters, respectively and H0(z) and H1(z) the high pass analysis and

synthesis filters respectively. Then the filters have to satisfy the following two conditions

G0(-z) G1(z) + H0(-z) H1(z) = 0 (3.6.1)

G0(z) G1(z) + H0(z) H1(z) = 2z^-d (3.6.2)


The first condition implies that the reconstruction is aliasing-free and the second

condition implies that there is no amplitude distortion, only a delay of d samples. It can be observed

that the perfect reconstruction condition does not change if we switch the analysis and

synthesis filters.

There are a number of filters which satisfy these conditions. But not all of them

give accurate Wavelet Transforms, especially when the filter coefficients are quantized.

The accuracy of the Wavelet Transform can be determined after reconstruction by

calculating the Signal to Noise Ratio (SNR) of the signal. Some applications like pattern

recognition do not need reconstruction, and in such applications, the above conditions

need not apply.
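Both conditions can be checked mechanically for a concrete filter bank by multiplying the filters as polynomials in z^-1. The sketch below verifies (3.6.1) and (3.6.2) for the orthonormal Haar pair, taken here as an assumed example (the helper names are illustrative):

```python
import math

def poly_mul(f, g):
    """Multiply two filters viewed as polynomials in z^-1."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

def negate_z(f):
    """Map F(z) to F(-z): flip the sign of odd powers of z^-1."""
    return [(-1) ** k * c for k, c in enumerate(f)]

s = 1 / math.sqrt(2)
g0, h0 = [s, s], [s, -s]     # Haar analysis pair
g1, h1 = [s, s], [-s, s]     # Haar synthesis pair

# (3.6.1): the aliasing term must vanish identically
cond1 = [u + v for u, v in zip(poly_mul(negate_z(g0), g1),
                               poly_mul(negate_z(h0), h1))]
# (3.6.2): the distortion term must reduce to a pure delay 2z^-d
cond2 = [u + v for u, v in zip(poly_mul(g0, g1), poly_mul(h0, h1))]
```

For Haar, `cond1` is the zero polynomial and `cond2` equals 2z^-1, i.e., a pure delay with d = 1.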

3.7 Applications of Wavelets

There is a wide range of applications for Wavelet Transforms. They are applied in

different fields ranging from signal processing to biometrics, and the list is still growing.

One of the prominent applications is in the FBI fingerprint compression standard.

Wavelet Transforms are used to compress the fingerprint pictures for storage in their data

bank. The previously chosen Discrete Cosine Transform (DCT) did not perform well at

high compression ratios. It produced severe blocking effects which made it impossible to

follow the ridge lines in the fingerprints after reconstruction. This did not happen with

Wavelet Transform due to its property of retaining the details present in the data.

In DWT, the most prominent information in the signal appears in high amplitudes

and the less prominent information appears in very low amplitudes. Data compression can

be achieved by discarding these low amplitudes. The wavelet transform enables high

compression ratios with good quality of reconstruction. At present, the application of

wavelets for image compression is one of the hottest areas of research. The Wavelet


Transforms have been chosen for the JPEG 2000 compression standard. Figure 3.7.1

shows application of wavelets in signal processing.

Input Signal → Wavelet Transform → Processing → Inverse Wavelet Transform → Output Signal

Figure 3.7.1: Signal processing application using Wavelet Transform

Chapter 4


4.1 Embedded Zerotree Wavelet Transform

EZW (Embedded Zerotrees of Wavelet Transforms) is a lossy image compression

algorithm. At low bit rates (i.e. high compression ratios) most of the coefficients

produced by a sub-band transform (such as the wavelet transform) will be zero, or very

close to zero. This occurs because "real world" images tend to contain mostly low

frequency information (highly correlated). However, where high frequency information does occur (such as edges in the image), it is particularly important in terms of human

perception of the image quality, and thus must be represented accurately in any high

quality coding scheme.

By considering the transformed coefficients as a tree (or trees) with the lowest

frequency coefficients at the root node and with the children of each tree node being the

spatially related coefficients in the next higher frequency sub-band, there is a high

probability that one or more sub-trees will consist entirely of coefficients which are zero

or nearly zero; such sub-trees are called zero-trees. Because of this tree structure, we use the terms node and

coefficient interchangeably, and when we refer to the children of a coefficient, we mean

the child coefficients of the node in the tree where that coefficient is located. We use

children to refer to directly connected nodes lower in the tree and descendants to refer to

all nodes which are below a particular node in the tree, even if not directly connected.
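For square sub-band decompositions, this parent-child relation maps a coefficient at (i, j) to the four spatially related coefficients (2i, 2j), (2i, 2j+1), (2i+1, 2j), (2i+1, 2j+1) in the next finer sub-band. A minimal sketch of the relation (hypothetical helpers; it sidesteps the special parent rule of the coarsest sub-band, where (0, 0) would otherwise list itself as a child):

```python
def children(i, j, rows, cols):
    """The four spatially related coefficients one sub-band finer than (i, j),
    clipped to the image bounds; (i, j) itself is excluded so the coarsest
    coefficient cannot become its own child."""
    cand = [(2 * i, 2 * j), (2 * i, 2 * j + 1),
            (2 * i + 1, 2 * j), (2 * i + 1, 2 * j + 1)]
    return [(r, c) for r, c in cand
            if (r, c) != (i, j) and r < rows and c < cols]

def descendants(i, j, rows, cols):
    """Every node below (i, j) in the tree, directly connected or not."""
    out = []
    for r, c in children(i, j, rows, cols):
        out.append((r, c))
        out.extend(descendants(r, c, rows, cols))
    return out
```

A coder walks these trees: if a node and all of its descendants are insignificant, the whole sub-tree is summarized by a single zero-tree symbol.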

In zero-tree based image compression schemes such as EZW and SPIHT, the intent

is to use the statistical properties of the trees in order to efficiently code the locations of

the significant coefficients. Since most of the coefficients will be zero or close to zero, the

spatial locations of the significant coefficients make up a large portion of the total size of

a typical compressed image. A coefficient (likewise a tree) is considered significant if its

magnitude (or magnitudes of a node and all its descendants in the case of a tree) is above

a particular threshold. By starting with a threshold which is close to the maximum

coefficient magnitudes and iteratively decreasing the threshold, it is possible to create a

compressed representation of an image which progressively adds finer detail. Due to the

structure of the trees, it is very likely that if a coefficient in a particular frequency band is

insignificant, then all its descendants (the spatially related higher frequency band

coefficients) will also be insignificant.

EZW uses four symbols to represent (a) a zero-tree root, (b) an isolated zero (a

coefficient which is insignificant, but which has significant descendants), (c) a significant

positive coefficient and (d) a significant negative coefficient. The symbols may thus be

represented by two binary bits. The compression algorithm consists of a number of

iterations through a dominant pass and a subordinate pass; the threshold is updated

(reduced by a factor of two) after each iteration. The dominant pass encodes the

significance of the coefficients which have not yet been found significant in earlier

iterations, by scanning the trees and emitting one of the four symbols. The children of a

coefficient are only scanned if the coefficient was found to be significant, or if the

coefficient was an isolated zero. The subordinate pass emits one bit (the most significant

bit of each coefficient not so far emitted) for each coefficient which has been found

significant in the previous significance passes. The subordinate pass is therefore similar to

bit-plane coding.

There are several important features to note. Firstly, it is possible to stop the

compression algorithm at any time and obtain an approximation of the original image; the

greater the number of bits received, the better the image. Secondly, due to the way in





Figure 4.1.1: Block Diagram of EZW

which the compression algorithm is structured as a series of decisions, the same

algorithm can be run at the decoder to reconstruct the coefficients, but with the decisions

being taken according to the incoming bit stream. In practical implementations, it would

be usual to use an entropy code such as arithmetic code to further improve the

performance of the dominant pass. Bits from the subordinate pass are usually random

enough that entropy coding provides no further coding gain.

4.1.1 Implementation of EZW Algorithm


The Embedded Zerotree algorithm is a simple yet powerful algorithm having the

property that the bits in the stream are generated in the order of their importance. The first

step in this algorithm is setting up an initial threshold.

Any coefficient in the wavelet decomposition is said to be significant if its absolute value is

greater than the threshold. In a hierarchical sub-band system, every coefficient is spatially

related to a coefficient in the lower band. Such coefficients in the higher bands are called

‘descendants’. This is shown in Figure 4.1.2.

Figure 4.1.2: Hierarchical Sub-Band System

If a coefficient is significant and positive, then it is coded as ‘positive significant’

(ps). If a coefficient is significant and negative, then it is coded as ‘negative significant’

(ns). If a coefficient is insignificant and all its descendants are insignificant as well, then

it is coded as ‘zero tree root’ (ztr). If a coefficient is insignificant but has some significant descendants, then it is coded as ‘isolated zero’ (iz). The algorithm involves

two passes – Dominant pass and Subordinate pass.

In the dominant pass, the initial threshold is set to one half of the maximum coefficient

value. Subsequent passes have threshold values one half of the previous threshold. The

coefficients are then coded as ps, ns, iz or ztr according to their values. The important part


is that if a coefficient is a zerotree root, then the descendants need not be encoded. Thus

only the significant values are encoded.

In the subordinate pass, those coefficients which were found significant in the

dominant pass are quantized based on the pixel value. In the first pass, the threshold is

half of the maximum magnitude, so the interval is divided into two and the subordinate

pass codes a 1 if the coefficient is in the upper half of the interval and codes a 0 if the coefficient is in the lower half of the interval. Thus, if the number of passes is increased, the precision of the coefficient is increased. This is the first algorithm that is implemented in

the compression coding.
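The two passes can be sketched in a few lines. The function below is a deliberately simplified illustration, not Shapiro's full coder: it works on a flat coefficient list and emits 'z' for every insignificant coefficient instead of distinguishing ztr from iz, but it shows the threshold halving, the ps/ns symbols, and the one refinement bit per known-significant coefficient in each subordinate pass.

```python
def ezw_passes(coeffs, n_passes=3):
    """Threshold-halving dominant/subordinate passes on a flat coefficient
    list (zerotree symbols omitted: every insignificant value is 'z')."""
    # initial threshold: the largest power of two not exceeding the maximum
    # magnitude (a common choice; "half of the maximum" also works)
    T = 1.0
    while 2 * T <= max(abs(c) for c in coeffs):
        T *= 2
    significant = [False] * len(coeffs)
    out = []
    for _ in range(n_passes):
        for i, v in enumerate(coeffs):         # dominant pass
            if not significant[i]:
                if abs(v) >= T:
                    out.append('ps' if v > 0 else 'ns')
                    significant[i] = True
                else:
                    out.append('z')
        for i, v in enumerate(coeffs):         # subordinate pass
            if significant[i]:                 # one refinement bit each:
                out.append(int(abs(v) // (T / 2)) % 2)
        T /= 2
    return out

stream = ezw_passes([26, -14, 6, 3])
```

Reading the stream prefix by prefix reproduces the embedded property: truncating it after any pass still yields a coarser but valid approximation of the coefficients.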

4.2 Huffman Coding

In computer science and information theory, Huffman coding is an entropy

encoding algorithm used for lossless data compression. The term refers to the use of

a variable-length code table for encoding a source symbol (such as a character in a file)

where the variable-length code table has been derived in a particular way based on the

estimated probability of occurrence for each possible value of the source symbol. It was

developed by David A. Huffman while he was a Ph.D. student at MIT, and published in

the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".

Huffman coding uses a specific method for choosing the representation for each

symbol, resulting in a prefix code (sometimes called "prefix-free codes", that is, the bit

string representing some particular symbol is never a prefix of the bit string representing

any other symbol) that expresses the most common characters using shorter strings of bits

than are used for less common source symbols. Huffman was able to design the most

efficient compression method of this type: no other mapping of individual source symbols

to unique strings of bits will produce a smaller average output size when the actual


symbol frequencies agree with those used to create the code. A method was later found to

do this in linear time if input probabilities (also known as weights) are sorted.

Figure 4.2.1: Huffman Coding

4.2.1 Basic Technique

The technique works by creating a binary tree of nodes. These can be stored in a
regular array, the size of which depends on the number of symbols, n. A node can be
either a leaf node or an internal node. Initially, all nodes are leaf nodes, which contain
the symbol itself, the weight (frequency of appearance) of the symbol and optionally, a
link to a parent node which makes it easy to read the code (in reverse) starting from a leaf
node. Internal nodes contain symbol weight, links to two child nodes and the optional link
to a parent node. As a common convention, bit '0' represents following the left child and
bit '1' represents following the right child. A finished tree has up to n leaf nodes and n −
1 internal nodes. A Huffman tree that omits unused symbols produces the optimal
code lengths. The process essentially begins with the leaf nodes containing the
probabilities of the symbol they represent, then a new node whose children are the 2
nodes with smallest probability is created, such that the new node's probability is equal to
the sum of the children's probability. With the previous 2 nodes merged into one node
(thus not considering them anymore), and with the new node being now considered, the
procedure is repeated until only one node remains, the Huffman tree.

The simplest construction algorithm uses a priority queue where the node with lowest
probability is given highest priority:

1. Create a leaf node for each symbol and add it to the priority queue.

2. While there is more than one node in the queue:

i. Remove the two nodes of highest priority (lowest probability) from

the queue


ii. Create a new internal node with these two nodes as children and
with probability equal to the sum of the two nodes' probabilities.

iii. Add the new node to the queue.

3. The remaining node is the root node and the tree is complete.

Since efficient priority queue data structures require O(log n) time per insertion, and a
tree with n leaves has 2n−1 nodes, this algorithm operates in O(n log n) time.
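The three numbered steps map directly onto a binary heap. The sketch below (illustrative helper names; breaking ties with an insertion counter is an assumed convention) builds a code table from a {symbol: weight} map and can be tried on the letter frequencies of a short string:

```python
import heapq
from collections import Counter

def huffman_codes(freqs):
    """Build a Huffman code table from {symbol: weight} using a priority
    queue (lowest weight = highest priority)."""
    # heap entries: (weight, tie_breaker, tree); a tree is either a leaf
    # symbol or a (left, right) pair of subtrees
    heap = [(w, i, sym) for i, (sym, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    tick = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)    # two lowest-weight nodes
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, tick, (t1, t2)))  # merged internal node
        tick += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + '0')    # bit 0: follow the left child
            walk(tree[1], prefix + '1')    # bit 1: follow the right child
        else:
            codes[tree] = prefix or '0'    # single-symbol edge case
    walk(heap[0][2], '')
    return codes

codes = huffman_codes(Counter("BILL GATES"))
```

Different tie-breaking rules can produce different trees, but every tree built this way is optimal, so the total encoded length of the input is the same.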

If the symbols are sorted by probability, there is a linear-time (O(n)) method to create
a Huffman tree using two queues, the first one containing the initial weights (along with
pointers to the associated leaves), and combined weights (along with pointers to the trees)
being put in the back of the second queue. This assures that the lowest weight is always
kept at the front of one of the two queues:

1. Start with as many leaves as there are symbols.

2. Enqueue all leaf nodes into the first queue (by probability in increasing
order so that the least likely item is in the head of the queue).

3. While there is more than one node in the queues:

i. Dequeue the two nodes with the lowest weight by examining the
fronts of both queues.

ii. Create a new internal node, with the two just-removed nodes as
children (either node can be either child) and the sum of their weights as
the new weight.

iii. Enqueue the new node into the rear of the second queue.

4. The remaining node is the root node; the tree has now been generated.

It is generally beneficial to minimize the variance of codeword length. For example, a

communication buffer receiving Huffman-encoded data may need to be larger to deal
with especially long symbols if the tree is especially unbalanced. To minimize variance,
simply break ties between queues by choosing the item in the first queue. This
modification will retain the mathematical optimality of the Huffman coding while both
minimizing variance and minimizing the length of the longest character code.

4.2.2 Main Properties

The probabilities used can be generic ones for the application domain that are
based on average experience, or they can be the actual frequencies found in the text being
compressed. (This variation requires that a frequency table or other hint as to the


encoding must be stored with the compressed text; implementations employ various tricks
to store tables efficiently.)

Huffman coding is optimal when the probability of each input symbol is a

negative power of two. Prefix codes tend to have slight inefficiency on small alphabets,
where probabilities often fall between these optimal points. "Blocking", or expanding the
alphabet size by coalescing multiple symbols into "words" of fixed or variable-length
before Huffman coding, usually helps, especially when adjacent symbols are correlated
(as in the case of natural language text). The worst case for Huffman coding can happen
when the probability of the most likely symbol far exceeds 2^-1 = 0.5, making the upper limit of inefficiency
unbounded. These situations often respond well to a form of blocking called run-length
encoding; for the simple case of Bernoulli processes, Golomb coding is a provably
optimal run-length code.

Arithmetic coding produces slight gains over Huffman coding, but in practice
these gains have seldom been large enough to offset arithmetic coding's higher
computational complexity and patent royalties.

4.2.3 Applications

Arithmetic coding can be viewed as a generalization of Huffman coding; indeed,

in practice arithmetic coding is often preceded by Huffman coding, as it is easier to find
an arithmetic code for a binary input than for a nonbinary input. Also, although arithmetic
coding offers better compression performance than Huffman coding, Huffman coding is
still in wide use because of its simplicity, high speed and lack of encumbrance by patents.

Huffman coding today is often used as a "back-end" to some other compression

method. DEFLATE and multimedia codecs such as JPEG and MP3 have a front-end
model and quantization followed by Huffman coding.

4.3 Arithmetic Coding

Arithmetic coding is a compression technique that encodes data (the data string)

by creating a code string which represents a fractional value on the number line between

0 and 1. The coding algorithm is symbol wise recursive; i.e., it operates upon and encodes

(decodes) one data symbol per iteration or recursion. On each recursion, the algorithm


successively partitions an interval of the number line between 0 and 1, and retains one of

the partitions as the new interval. Thus, the algorithm successively deals with smaller

intervals, and the code string, viewed as a magnitude, lies in each of the nested intervals.

The data string is recovered by using magnitude comparisons on the code string to

recreate how the encoder must have successively partitioned and retained each nested

subinterval. Arithmetic coding differs considerably from the more familiar compression

coding techniques, such as prefix (Huffman) codes.

4.3.1 Compression systems

The notion of compression systems captures the idea that data may be transformed

into something which is encoded, then transmitted to a destination, then transformed back

into the original data. Any data compression approach, whether employing arithmetic

coding, Huffman codes, or any other coding technique, has a model which makes some

assumptions about the data and the events encoded.

The code itself can be independent of the model. Some systems which compress

waveforms (e.g., digitized speech) may predict the next value and encode the error. In this

model the error and not the actual data is encoded. Typically, at the encoder side of a

compression system, the data to be compressed feeds a model unit. The model determines

1) the event to be encoded, and 2) the estimate of the relative frequency (probability) of

the events. The encoder accepts the event and some indication of its relative frequency

and generates the code string.

A simple model is the memoryless model, where the data symbols themselves are

encoded according to a single code. Another model is the first-order Markov model,

which uses the previous symbol as the context for the current symbol. Consider, for

example, compressing English sentences. If the data symbol (in this case, a letter) “q” is

the previous letter, we would expect the next letter to be “u.” The first-order Markov


model is a dependent model; we have a different expectation for each symbol (or in the

example, each letter), depending on the context. The context is, in a sense, a state

governed by the past sequence of symbols. The purpose of a context is to provide a

probability distribution, or statistics, for encoding (decoding) the next symbol.

Corresponding to the symbols are statistics. To simplify, consider a single-context model,

i.e., the memoryless model. Data compression results from encoding the more frequent

symbols with short code-string length increases, and encoding the less-frequent events with long code-string length increases.

Most of the data compression methods in common use today fall into one of two

camps: dictionary based schemes and statistical methods. In the world of small systems,

dictionary based data compression techniques seem to be more popular at this time.

However, by combining arithmetic coding with powerful modeling techniques, statistical

methods for data compression can actually achieve better performance.

4.3.2 Arithmetic Coding: how it works

It has only been in the last ten years that a respectable candidate to replace

Huffman coding has been successfully demonstrated: Arithmetic coding. Arithmetic

coding completely bypasses the idea of replacing an input symbol with a specific code.

Instead, it takes a stream of input symbols and replaces it with a single floating point

output number. The longer (and more complex) the message, the more bits are needed in

the output number. It was not until recently that practical methods were found to

implement this on computers with fixed sized registers. The output from an arithmetic

coding process is a single number less than 1 and greater than or equal to 0. This single
number can be uniquely decoded to create the exact stream of symbols that went into its construction. In order to construct the output number, the symbols being encoded have to have a set of probabilities assigned to them. For example, if we are going to encode the random message "BILL GATES" we would have a probability distribution that looks like this:

Character Probability
--------- -----------
SPACE        1/10
A            1/10
B            1/10
E            1/10
G            1/10
I            1/10
L            2/10
S            1/10
T            1/10

Once the character probabilities are known, the individual symbols need to be

assigned a range along a "probability line", which is nominally 0 to 1. It does not matter

which characters are assigned which segment of the range, as long as it is done in the

same manner by both the encoder and the decoder. The nine-character symbol set used here

would look like this:

Character Probability Range

--------- ----------- -----------
SPACE 1/10 0.00 - 0.10
A 1/10 0.10 - 0.20
B 1/10 0.20 - 0.30
E 1/10 0.30 - 0.40
G 1/10 0.40 - 0.50
I 1/10 0.50 - 0.60
L 2/10 0.60 - 0.80
S 1/10 0.80 - 0.90
T 1/10 0.90 - 1.00

Each character is assigned the portion of the 0-1 range that corresponds to its

probability of appearance. Note also that the character "owns" everything up to, but not

including the higher number. So the letter 'T' in fact has the range 0.90 - 0.9999....

The most significant portion of an arithmetic coded message belongs to the first

symbol to be encoded. When encoding the message "BILL GATES", the first symbol is

"B". In order for the first character to be decoded properly, the final coded message has to

be a number greater than or equal to 0.20 and less than 0.30. What we do to encode this

number is keep track of the range that this number could fall in. So after the first character

is encoded, the low end for this range is 0.20 and the high end of the range is 0.30.

After the first character is encoded, we know that our range for the output number

is now bounded by the low number and the high number. What happens during the rest of

the encoding process is that each new symbol to be encoded will further restrict the

possible range of the output number. The next character to be encoded, 'I', owns the range

0.50 through 0.60. If it was the first number in the message, we would set low and high

range values directly to those values. But 'I' is the second character. So what we do

instead is say that 'I' owns the range that corresponds to 0.50-0.60 in the new subrange of

0.2 - 0.3. This means that the new encoded number will have to fall somewhere in the

50th to 60th percentile of the currently established range. Applying this logic will further

restrict our number to the range 0.25 to 0.26. The algorithm to accomplish this for a

message of any length is shown below:

Set low to 0.0
Set high to 1.0
While there are still input symbols do
    get an input symbol
    range = high - low
    high = low + range * high_range(symbol)
    low  = low + range * low_range(symbol)
End of While
output low

Following this process through to its natural conclusion with our chosen message looks

like this:


New Character   Low value      High value
-------------   ------------   ------------
                0.0            1.0
B               0.2            0.3
I               0.25           0.26
L               0.256          0.258
L               0.2572         0.2576
SPACE           0.25720        0.25724
G               0.257216       0.257220
A               0.2572164      0.2572168
T               0.25721676     0.2572168
E               0.257216772    0.257216776
S               0.2572167752   0.2572167756

So the final low value, 0.2572167752, will uniquely encode the message "BILL GATES" using the present encoding scheme.

Given this encoding scheme, it is relatively easy to see how the decoding process

will operate. We find the first symbol in the message by seeing which symbol owns the

code space that the encoded message falls in. Since the number 0.2572167752 falls

between 0.2 and 0.3, we know that the first character must be "B". We then need to

remove the "B" from the encoded number. Since we know the low and high ranges of B,

we can remove their effects by reversing the process that put them in. First, we subtract

the low value of B from the number, giving 0.0572167752. Then we divide by the range

of B, which is 0.1. This gives a value of 0.572167752. We can then calculate where that

lands, which is in the range of the next letter, "I".

The algorithm for decoding the incoming number looks like this:

get encoded number
Do
    find the symbol whose range straddles the encoded number
    output the symbol
    range = symbol high value - symbol low value
    subtract symbol low value from encoded number
    divide encoded number by range
Until no more symbols
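A matching Python sketch of the decoding loop is shown below. Again this is illustrative rather than the project's code: the symbol table is repeated so the snippet stands alone, exact rationals are used to avoid floating-point drift, and the message length is passed in explicitly instead of using a terminator:

```python
from fractions import Fraction as F

# Symbol ranges repeated here so the snippet is self-contained.
RANGES = {
    ' ': (F('0.0'), F('0.1')), 'A': (F('0.1'), F('0.2')),
    'B': (F('0.2'), F('0.3')), 'E': (F('0.3'), F('0.4')),
    'G': (F('0.4'), F('0.5')), 'I': (F('0.5'), F('0.6')),
    'L': (F('0.6'), F('0.8')), 'S': (F('0.8'), F('0.9')),
    'T': (F('0.9'), F('1.0')),
}

def decode(code, length):
    out = []
    for _ in range(length):
        # find the symbol whose range straddles the current code value
        for symbol, (lo, hi) in RANGES.items():
            if lo <= code < hi:
                out.append(symbol)
                # remove the symbol's effect: subtract its low value,
                # then divide by the width of its range
                code = (code - lo) / (hi - lo)
                break
    return ''.join(out)
```

`decode(F('0.2572167752'), 10)` walks through exactly the sequence of intermediate values shown in the table that follows.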

Note that we have conveniently ignored the problem of how to decide when there

are no more symbols left to decode. This can be handled by either encoding a special

EOF symbol, or carrying the stream length along with the encoded message.

The decoding algorithm for the "BILL GATES" message will proceed something like this:


Encoded Number   Output Symbol   Low   High   Range
--------------   -------------   ---   ----   -----
0.2572167752     B               0.2   0.3    0.1
0.572167752      I               0.5   0.6    0.1
0.72167752       L               0.6   0.8    0.2
0.6083876        L               0.6   0.8    0.2
0.041938         SPACE           0.0   0.1    0.1
0.41938          G               0.4   0.5    0.1
0.1938           A               0.1   0.2    0.1
0.938            T               0.9   1.0    0.1
0.38             E               0.3   0.4    0.1
0.8              S               0.8   0.9    0.1

In summary, the encoding process is simply one of narrowing the range of

possible numbers with every new symbol. The new range is proportional to the

predefined probability attached to that symbol. Decoding is the inverse procedure, where

the range is expanded in proportion to the probability of each symbol as it is extracted.


Chapter 5


5.1 Introducing Mex-Files

We can call C or Fortran subroutines from MATLAB as if they were built-in

functions. MATLAB callable C and Fortran programs are referred to as MEX-files.

MEX-files are dynamically linked subroutines that the MATLAB interpreter can

automatically load and execute.

MEX-files have several applications:

• Large pre-existing C and Fortran programs can be called from MATLAB without

having to be rewritten as M-files.

• Bottleneck computations (usually for-loops) that do not run fast enough in

MATLAB can be recoded in C or Fortran for efficiency.

5.1.1 Using Mex-Files


MEX-files are subroutines produced from C or Fortran source code. They behave

just like M-files and built-in functions. While M-files have a platform-independent

extension, .m, MATLAB identifies MEX-files by platform-specific extensions.

We can call MEX-files exactly as we would call any M-function. For example, a

MEX-file called conv2.mex on your disk in the MATLAB datafun toolbox directory

performs a 2-D convolution of matrices. conv2.m only contains the help text

documentation. If we invoke the function conv2 from inside MATLAB, the interpreter

looks through the list of directories on MATLAB’s search path. It scans each directory

looking for the first occurrence of a file named conv2 with the corresponding filename

extension or .m. When it finds one, it loads the file and executes it. MEX-files take

precedence over M-files when like-named files exist in the same directory. However, help text is still read from the .m file.

5.1.2 Running Mex Files

Before a MEX-file can be compiled in MATLAB, a compiler must be selected by running the mex -setup command:

>> mex -setup
Please choose your compiler for building external interface (MEX) files:
Would you like mex to locate installed compilers [y]/n? y

Select a compiler:
[1] Digital Visual Fortran version 6.0 in C:\Program Files\Microsoft Visual Studio
[2] Lcc C version 2.4 in C:\MATLAB7\sys\lcc
[3] Microsoft Visual C/C++ version 6.0 in C:\Program Files\Microsoft Visual Studio
[0] None

Compiler: 2

Please verify your choices:
Compiler: Lcc C 2.4
Location: C:\MATLAB7\sys\lcc
Are these correct?([y]/n): y

Try to update options file: C:\Documents and Settings\Administrator\Application
From template: C:\MATLAB7\BIN\WIN32\mexopts\lccopts.bat
Done . . .

Then, to compile a MEX-file, the source file name is preceded by the keyword mex, with the appropriate extension. For example, to compile the MEX-file dominant_pass_c we use the following command:

>> mex dominant_pass_c.c

5.2 Flowchart For Compression:


5.3 Embedded Zero-Tree Wavelet :


5.4 Dominant Pass :


5.5 Subordinate Pass :


5.6 Huffman Coding :



5.7 Arithmetic Coding :


5.8 Decompression :


Chapter 6

In this project we have compressed an image using the JPEG2000 compression technique. This process includes obtaining the wavelet coefficients of the image, encoding the resulting coefficients using the EZW algorithm, and then Huffman- and arithmetic-encoding the resulting sequence.

The compression ratios obtained for a threshold of 2048 are:

Huffman coding    - 7281.8
Arithmetic coding - 5041.2

The Figure below compares the uncompressed original image and the compressed

output image obtained after execution of the matlab code.

(a) (b)

Figure 6.1: (a) Original Image (b) Compressed Image


Chapter 7
Transform coding forms an integral part of compression techniques. The

advantage of using a transform is that it packs the data into a lesser number of

coefficients. The main purpose of using the transform is thus to achieve energy

compaction. The type of transform used varies between different codecs.

The Discrete Cosine Transform (DCT) is commonly used in different compression techniques, and it closely resembles the optimal transform in terms of performance. DCT-based compression techniques are efficient; however, the DCT has its own shortcomings. While good compression is achieved by this method, it does not achieve high compression with low distortion. To achieve this, wavelet transforms are used in the latest standards.

In this project, an Embedded Zero-tree Wavelet encoder was developed for JPEG 2000 images and its performance was studied. It can be seen that the EZW scheme has marginally better performance than a JPEG-like, DCT-based encoder.


Chapter 8
It can be seen that for very small block sizes and very large block sizes the visual distortions are more pronounced. In the case of small block sizes, the distortions may be due to the size of the header details that need to be added for each block, which would be significant compared to the information in the block itself. In the case of large block sizes, the initial threshold is high and hence a large number of passes is needed to achieve a significant amount of visual quality.

The embedded block coding algorithm achieves excellent compression

performance, usually higher than that of EZW with arithmetic coding, but in some cases

substantially higher. The algorithm utilizes the same low complexity binary arithmetic

coding engine. Together with careful design of the bit-plane coding primitives, this

enables comparable execution speed to that observed with the simpler EZW without

arithmetic coding. The coder offers additional advantages including memory locality,

spatial random access and ease of geometric manipulation.


