

of National Conference on
VLSI for Communication, Computation and

VCCC'08

15th March

Mrs. C. Kezi Selva Vijila
(Head, Department of ECE)
Mrs. G. Josemin Bala
(Assistant Professor, Department of ECE )

Organized by
Department of Electronics and Communication Engineering

(Declared as Deemed to be University under Sec. 3 of the UGC Act,1956)
Coimbatore, Tamilnadu.

Dr. Paul Dhinakaran
Karunya University, Coimbatore


Dr. S. Arumugam
Additional Director, DOTE, Chennai.

Dr. V. Palaniswami
Principal, GCT, Coimbatore.

Dr. A. Ebenezer Jeyakumar

Principal, GCE, Salem.

Dr.E. Kirubakaran
BHEL, Tiruchirappalli.


Chairman : Dr. Paul P. Appasamy

Vice Chancellor,
Karunya University, Coimbatore

Vice Chairman : Dr. Anne Mary Fernandez

Karunya University, Coimbatore

Convenor : Mrs.C.Kezi Selva Vijila,

HOD, Department of ECE.
Karunya University, Coimbatore

Co-Convenor : Mrs. G. Josemin Bala

Asst. Professor, Department of ECE.
Karunya University, Coimbatore
Dr. Easter Selvan
Ms. Shanthini Pandiaraj
Mr. Albert Rajan
Mr. Shanty Chacko
Mr. Abraham Chandy
Mr. Karthigai Kumar
Mrs. Nesasudha
Ms. Rahimunnisa
Mr. Jude Hemanth


Mrs.D.Jackuline Moni
Mrs.T.Anita JonesMary
Mrs.S.Sridevi Sathyapriya
Mr.N.Satheesh Kumar
Mrs.Jennifer S. Raj
Mr.S.Immanuel Alex
Ms.J.Grace Jency Gananammal
Ms.F.Agi Lydia Prizzi
Mr.J.Samuel Manoharan
Mr.D.Narain Ponraj
Mrs.G.Shine Let
Ms.Linda Paul
Ms.Cynthia Hubert
Ms.Anu Merya Philip
Mr.Arul Rajkumar
Mr.Wilson Christopher
Mr.Manohar Livingston

Editors : Mrs.C. Kezi Selva Vijila

(Head, Department of ECE)

Mrs.G. Josemin Bala

(Assistant Professor, Department of ECE )

Staff Co-Ordinators : Mrs.K.Rahimunnisa

(Senior Lecturer, Department of ECE )

Mr.A. Amir Anton Jone

(Lecturer, Department of ECE )

Student Co-Ordinators : II ME - Tintu Mol

IV Yr - Lisbin

II Yr - Nixon
S.Arock Roy
I.Kingsly Jeba
J.John Christo
M.Muthu Kannan
D.Arun Premkumar

Karunya University (Declared as Deemed to be University under

Sec. 3 of the UGC Act, 1956 ) is located 25 kms away from Coimbatore

in the very bosom of Mother Nature. The University is surrounded by

an array of green-clad, sky scraping mountains of the Western Ghats.

The Siruvani river with its crystal clear water has its origin here and it

is nature's boon to Coimbatore.

KITS is thus set in a natural environment ideal for a residential

institution. During leisure time, one can feast on the mountains, the

skies with its rainbow colours and the horizon. One with an aesthetic

sense will not miss the waters trickling down the hills, the birds that

sing sweetly on the trees, the cool breeze and the drizzles. One cannot

but wonder at the amazing craftsmanship of God Almighty.


The origin of the Institution is still amazing. In the year 1981,

Dr. D. G. S. Dhinakaran, God's servant received the divine

commission to start a Technical University which could turn out

outstanding engineers with leadership qualities at the national and

global level. Building up such a great Institution was no easy task.

The Dhinakarans had to face innumerable trials and tribulations

including the tragic death of their dear daughter during the course of

this great endeavor. But nothing could stop them from reaching the goal.


In response to the divine command Dr. D. G. S. Dhinakaran

received from the Lord Almighty, the Institute was established with

the vision of turning out of its portals engineers excelling both in

academics and values. They will be total persons with the right

combination of academic excellence, personality development and

spiritual values.


To provide the youth with the best opportunities and

environment for higher education and research in Sciences and

Technology and enable them to attain very high levels of academic

excellence as well as scientific, technical and professional competence.


To train and develop students to be good and able citizens

capable of making significant contribution towards meeting the

developmental needs and priorities of the nation.

To inculcate in students moral values and leadership qualities

to make them appreciate the need for high ethical standards in

personal, social and public life and reach the highest level of

humanism, such that they shall always uphold and promote a high

social order and be ready and willing to work for the emancipation of

the poor, the needy and the under-privileged.


The Department of Electronics and Communication Engineering was established in the year 1986. It is very well equipped with highly commendable facilities and is effectively guided by a set of devoted and diligent staff members. The department offers both Under Graduate and Post Graduate programmes (Applied Electronics and VLSI Design). The department has 39 teaching faculty, 6 technical assistants and an office assistant. It has 527 students in the UG programme and 64 students in the PG programme. The department has been awarded an 'A' grade by the National Board of Accreditation.


The mission of the department is to raise Engineers and

researchers with technical expertise on par with International

standards, professional attitudes and ethical values with the ability to

apply acquired knowledge to have a productive career and empowered

spiritually to serve humanity.


To undertake research in telemedicine and signal processing, thereby opening new avenues for mass funded projects.

To meet the diverse needs of the student community and to contribute to society through placement oriented training and technical

Inculcating moral, social & spiritual values through charity & outreach programs.


The department has fully furnished class rooms with e-learning facility, a conference hall with video conferencing and the latest teaching aids. The department laboratories are equipped with highly sophisticated equipment like digital storage oscilloscopes, Lattice ISP Expert System, SPARTAN FPGA trainer, ADSP 2105 trainer, Antenna Training System, Transmission Line Trainer and Analyzer, Spectrum Analyzer (HAMEG), and Fiber Optic Transmitting and Receiving Units.

The department laboratories utilize the latest advanced software like Mentor Graphics, MATLAB, LabVIEW, Tanner Tools, FPGA Advantage 6.3 LS, MicroSim 8.0 and VI 5416 Debugger.


Research oriented teaching with highly qualified faculty and experts from the industries.

Excellent placement for both UG and PG students in various reputed companies like VSNL, HAL, DRDO, BSNL, WIPRO,

Hands-on practice for the students in laboratories equipped with sophisticated equipment and advanced software.

Centers of Excellence in signal processing, medical image processing, and VLSI for faculty and students.

Funded projects from AICTE in VLSI systems and the communication field.

Effective research forums to work on current research areas.

Industrial training in industry during vacations for all students.

Advanced software facilities to design, develop and implement Electronic Systems.




Organizing Committee

Advisory Committee

Profile of the University

Profile of Department of ECE



VL 01. An FPGA-Based Single-Phase Electrical Energy Meter 1

Binoy B. Nair, P. Supriya
Amrita Vishwa Vidhya Peetham, Coimbatore

VL 02. A Multilingual, Low Cost FPGA Based Digital Storage 7

Binoy B Nair, Sreeram, Srikanth, Srivignesh
Amrita Vishwa Vidhya Peetham, Coimbatore

VL 03. Design Of Asynchronous NULL Convention Logic FPGA 10

R.Suguna, S.Vasanthi M.E., (Ph.D)
K.S.Rangasamy College of Technology, Tiruchengode

VL 04. Development of ASIC Cell Library for RF Applications 16

K.Edet Bijoy, Mr.V.Vaithianathan
SSN College of Engineering, Chennai

VL 05. A High-Speed Clustering VLSI Processor Based on the 22

Histogram Peak-Climbing Algorithm
I.Poornima Thangam, M.Thangavel
K.S.Rangasamy College of Technology, Tiruchengode

VL 06. Reconfigurable CAM- Improving The Effectiveness Of Data 28

Access In ATM Networks
C. Sam Alex , B.Dinesh, S. Dinesh kumar
JAYA Engineering College, Thiruninravur, Near Avadi, Chennai

VL 07. Design of Multistage High Speed Pipelined RISC Architecture 35

Manikandan Raju, Prof.S.Sudha
Sona College of Technology, Salem
VL 08. Monitoring of An Electronic System Using Embedded 39
N.Sudha , Suresh, R.Norman,
SSN College of Engineering, Chennai

VL 09. The Design of a Rapid Prototype Platform for ARM Based 42

Embedded Systems
A.Antony Judice, II M.E. (Applied Electronics),
SSN College of Engineering, Chennai
Mr.Suresh R Norman,Asst.Prof.,
SSN College of Engineering, Chennai

VL 10. Implementation of High Throughput and Low Power FIR Filter 49

V.Dyana Christilda B.E*, R.Solomon Roach
Francis Xavier Engineering College, Tirunelveli

VL 11. n x Scalable Stacked MOSFET for Low Voltage CMOS 54

M.Jeyaprakash, T.Loganayagi
Sona College of Technology, Salem

VL 12. Test Pattern Selections Algorithms Using Output Deviations 60

S.Malliga Devi,Lyla,B.Das,S.Krishna Kumar
NIT, Calicut, Student IEEE member

VL 13. Fault Classification Using Back Propagation Neural Network 64

For Digital To Analog Converter
B.Mohan, R.Sundararajan, J.Ramesh and Dr.K.Gunavathi
PSG College of Technology Coimbatore

VL 14. Testing Path Delays in LUT based FPGAs 70

R.Usha, Mrs.M.Selvi
Francis Xavier Engineering College, Tirunelveli

VL 15. VLSI Realisation of SIMPPL Controller SOC for Design Reuse 76

Tressa Mary Baby John
Karunya University, Coimbatore

VL 16. Clock Period Minimization of Edge Triggered Circuit 82

Anitha.A, D.Jackuline Moni, S.Arumugam
Karunya University, Coimbatore

VL 17. VLSI Floor Planning Based on Hybrid Particle Swarm 87

Optimization (HPSO)
D.Jackuline Moni,
Karunya University, Coimbatore
Bannariamman educational trust…

VL 18. Development Of An EDA Tool For Configuration Management 91

Of FPGA Designs
Anju M I , F. Agi Lydia Prizzi, K.T. Oommen
Karunya University, Coimbatore
VL 19. A BIST for Low Power Dissipation 95
Rohit Lorenzo, A. Amir Anton Jone
Karunya University, Coimbatore

VL 20. Test Pattern Generation for Power Reduction using BIST 99

Anu Merya Philip,
Karunya University, Coimbatore

VL 21. Test Pattern Generation For Microprocessors Using Satisfiability 104

Format Automatically And Testing It Using Design For
Cynthia Hubert
Karunya University, Coimbatore

VL 22. DFT Techniques for Detecting Resistive Opens in CMOS 109

Latches and Flip-Flops
Reeba Rex.S
Karunya University, Coimbatore

VL 23. 2-D Fractal Array Design for 4-D Ultrasound Imaging 113
Mrs.C Kezi Selva Vijila,Ms Alice John
Karunya University ,Coimbatore




SPC 01. Secured Digital Image Transmission Over Network Using 118
Efficient Watermarking Techniques On Proxy Server
Jose Anand, M. Biju, U. Arun Kumar
JAYA Engineering College, Thiruninravur, Chennai 602024

SPC 02. Significance of Digital Signature & Implementation through 123

RSA Algorithm
R.Vijaya Arjunan
ISTE: LM -51366
Aarupadai Veedu Institute of Technology, Chennai

SPC 03. A Survey On Pattern Recognition Algorithms For Face 128

N.Hema, C.Lakshmi Deepika
PSG College of Technology, Coimbatore

SPC 04. Performance Analysis Of Impulse Noise Removal Algorithms 132

For Digital Images
K.Uma, V.R.Vijaya Kumar
PSG College of Technology, Coimbatore
SPC 05. Confidentiality in Composition of Clutter Images 135
G.Ignisha Rajathi, M.E II Year, Ms.S.Jeya
Francis Xavier Engg College , Tirunelveli

SPC 06. VHDL Implementation Of Lifting Based Discrete Wavelet 139

M.Arun Kumar, C.Thiruvenkatesan
SSN College of Engineering, Chennai

SPC 07. VLSI Design Of Impulse Based Ultra Wideband Receiver For 142
Commercial Applications
G.Srinivasa Raja, V.Vaithianathan
SSN College of Engineering, Chennai

SPC 08. Distributed Algorithms for Energy Efficient Routing in Wireless 147
Sensor Networks
T.Jingo M.S.Godwin Premi, K.S.Shaji
Sathyabama University, Chennai


SPC 09. Decomposition Of EEG Signal Using Source Separation 152

Kiran Samuel
Karunya University, Coimbatore

SPC 10. Segmentation of Multispectral Brain MRI Using Source 157

Separation Algorithm
Krishnendu K
Karunya University, Coimbatore

SPC 11. MR Brain Tumor Image Segmentation Using Clustering 162

Lincy Annet Abraham,D. Jude Hemanth
Karunya University, Coimbatore

SPC 12. MRI Image Classification Using Orientation Pyramid and 166
Multiresolution Method
R.Catharine Joy, Anita Jones Mary
Karunya University, Coimbatore

SPC 13. Dimensionality reduction for Retrieving Medical Images Using 170
J.W Soumya
Karunya University, Coimbatore
SPC 14. Efficient Whirlpool Hash Function 175
J.Piriyadharshini , D.S.Shylu
Karunya University, Coimbatore

SPC 15. 2-D Fractal Array Design For 4-D Ultrasound Imaging 181
Alice John, Mrs.C Kezi Selva Vijila
Karunya University, Coimbatore

SPC 16. PC Screen Compression for Real Time Remote Desktop Access 186
Jagannath.D.J,Shanthini Pandiaraj
Karunya University, Coimbatore

SPC 17. Medical Image Classification using Hopfield Network and 191
Principal Components
G.L Priya
Karunya University, Coimbatore

SPC 18. Delay Minimization Of Sequential Circuits Through Weight 195

S.Nireekshan kumar,Grace jency
Karunya University, Coimbatore

SPC 19. Analysis of MAC Protocol for Wireless Sensor Network 200
Jeeba P.Thomas, Mrs.M.Nesasudha
Karunya University, Coimbatore

SPC 20. Improving Security and Efficiency in WSN Using Pattern Codes 204
Anu Jyothy, Mrs.M.Nesasudha
Karunya University, Coimbatore



CC 01. Automatic Hybrid Genetic Algorithm Based Printed Circuit 208

Board Inspection
Mridula, Kavitha, Priscilla
Adhiyamaan college of Engineering, Hosur-635 109

CC 02. Implementation Of Neural Network Algorithm Using VLSI 212

B.Vasumathi, Prof.K.R.Valluvan
Kongu Engineering College, Perundurai

CC 03. A Modified Genetic Algorithm For Evolution Of Neural 217

Network in Designing an Evolutionary Neuro-Hardware
N.Mohankumar, B.Bhuvan, M.Nirmala Devi, Dr.S.Arumugam
NIT Calicut, Kerala
CC 04. Design and FPGA Implementation of Distorted Template Based 221
Time-of-Arrival Estimator for Local Positioning Application
Sanjana T S., Mr. Selva Kumar R, Mr. Cyril Prasanna Raj P
VLSI System Design Centre,
M S Ramaiah
School of Advanced Studies, Bangalore

CC 05. Design and Simulation of Microstrip Patch Antenna for Various 224
T.Jayanthy, A.S.A.Nisha, Mohemed Ismail, Beulah Jackson
Sathyabama University, Chennai.


CC 06. Motion Estimation Of The Vehicle Detection and Tracking 229

Karunya University, Coimbatore

CC07. Architecture for ICT (10,9,6,2,3,1) Processor 234

Karunya University, Coimbatore

CC08. Row Column Decomposition Algorithm For 2d Discrete 240

Cosine Transform
Caroline Priya.M.
Karunya University, Coimbatore

CC09. VLSI Architecture for Progressive Image Encoder 246

Resmi E,K.Rahimunnisa
Karunya University, Coimbatore

CC10. Reed Solomon Encoders and Decoders using Concurrent Error 252
Detection Schemes
Rani Deepika.B.J, K.Rahimunnisa
Karunya University, Coimbatore

CC11. Design of High Speed Architectures for MAP Turbo Decoders 258
Lakshmi .S.Kumar ,Mrs.D.Jackuline Moni
Karunya University, Coimbatore

CC12. Technology Mapping Using Ant Colony Optimization 264

M.SajanDeepak, Jackuline Moni,
Karunya University, Coimbatore
Bannariamman educational trust…

An FPGA-Based Single-Phase Electrical Energy Meter

Binoy B. Nair, P. Supriya

Abstract— This paper presents the design and development of a novel FPGA based single phase energy meter which can measure power contained in the harmonics accurately up to the 33rd harmonic. The design presented in this paper has an implementation of the Booth multiplication algorithm, which provides a very fast means of calculating the instantaneous power consumption. The energy consumed is displayed using seven segment displays, and a serial communication interface for transmission of the energy consumption to a PC is also implemented, the drivers for which are implemented inside the FPGA itself. The readings are displayed on the PC through an interface developed using the Visual Basic programming language.

Index Terms— FPGA, Energy meter

The main types of electrical energy meters available in the market are the Ferraris meter, also referred to as an induction-type meter, and microcontroller based energy meters. However, a Ferraris meter has disadvantages such as creeping, limited voltage and current range, inaccuracies due to non-ideal voltage and current waveforms, and high wear and tear due to moving parts [1]. A wide variety of microcontroller based energy meters are available in the market and offer a significant improvement over an induction type energy meter [2]. However, a microcontroller based energy meter has the following disadvantages:

1. Power consumption is large when compared to FPGA based meters.
2. All the resources of the microcontroller may not be made use of, resulting in wastage of resources and money when large scale manufacture is concerned.

An FPGA based energy meter not only provides all the advantages offered by the microcontroller based energy meter, but also offers additional advantages such as lower power consumption and lesser space requirements (as very little external circuitry is required). An FPGA based energy meter can also be reconfigured any number of times, that too at very short notice, thus making it ideal in cases where the user requirements and specifications vary with time [3]. The block diagram of the FPGA based energy meter developed is given in Fig. 1.

Fig.1 FPGA based energy meter block diagram

This paper is divided into four sections; section II describes the method used for computing the electrical energy consumed, section III gives the implementation details and the results are presented in section IV.

Energy consumed is calculated by integrating the instantaneous power values over the period of consumption of energy:

E = Σ (Vn * In) * T    (1)

Vn – instantaneous value of voltage.
In – instantaneous value of current.
T – sampling time.

The instantaneously calculated power is accumulated, this accumulated value is compared with a constant that is equal to 0.01 kWh, and the display is updated once this constant is reached.

Energy of 0.01 kWh = 0.01 * 1000 * 3600 watt-sec    (2)

0.01 * 1000 * 3600 W-s = Σ (Vn * In) * T (for n = 0 to N)

(0.01 * 1000 * 3600) / T = Σ (Vn * In)    (3)

When the sampling time T = 0.29 ms,

(0.01 * 1000 * 3600) / (0.29 * 10^-3) = Σ (Vn * In) = 124137931

The multiplication factor for the potential transformer (PT) is taken to be 382.36 and for the current transformer (CT) it is taken as 9.83. The conversion factor of the ADC for converting the 0-255 output into the actual scale of 0-5 V for voltage and current is 51.
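The arithmetic above can be checked with a short script. This is an illustrative behavioral sketch in Python, not part of the paper (the design itself is in VHDL); the ADC scale factor of 51 counts per volt (255 counts over 5 V) is an assumption inferred from the ×51×51 term in the stored meter constant.

```python
# Check the meter-constant arithmetic described in the text (illustrative sketch).

KWH_001_WS = 0.01 * 1000 * 3600   # 0.01 kWh expressed in watt-seconds
T = 0.29e-3                       # sampling time in seconds

# Threshold on the accumulated sum of Vn*In products, equation (3)
sum_vi = KWH_001_WS / T
print(round(sum_vi))              # 124137931

# Scale from real-world volts and amperes to raw ADC counts:
# 51 counts per volt (assumed), plus the transformer multiplication factors.
ADC_COUNTS_PER_VOLT = 255 / 5
PT_FACTOR = 382.36                # potential transformer
CT_FACTOR = 9.83                  # current transformer

meter_const = sum_vi * ADC_COUNTS_PER_VOLT ** 2 / (PT_FACTOR * CT_FACTOR)
print(round(meter_const))         # 85905087, the constant stored in the FPGA
```

Rounding to the nearest integer reproduces both the 124137931 threshold and the stored meter constant 85905087 derived in the next section.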


Therefore the constant value to be stored for the comparison should be equal to:

(124137931 * 51 * 51) / (382.36 * 9.83) = 85905087

Thus, the constant 85905087 has been stored as the meter constant. After reaching this value, the energy reading displayed is incremented by 0.01 kWh.

The FPGA forms the core part of the FPGA based energy meter. But in addition to the FPGA, various other hardware components were used to convert the voltage and current inputs to digital form for processing by the FPGA. The energy consumed must be displayed, and seven-segment displays were used for the purpose. The consumed energy was transmitted to a PC using an RS-232 interface, which required additional external circuitry. The hardware details of the FPGA based single phase energy meter are provided in this section. The working of each component too is presented in brief.

A. Sensing unit
The function of the sensing unit is to sense the voltage and current through the mains and to convert them into a 0-5 V signal which is then fed into the ADC. The sensing unit is composed of the current transformer, the potential transformer and the adder circuit.
The potential transformer is used to step down the mains voltage to a fraction of its actual value, so that it can be safely fed into the adder circuit. The current transformer is used to detect the current flowing through the mains. A burden resistance is used at the secondary side to convert the current into an equivalent voltage signal, as current cannot be directly fed to the ADC.
Two op-amps in the IC are used as voltage followers and the remaining two are configured as non-inverting amplifiers with a gain of 2 which also act as level shifters, adding a d.c. voltage of 2.5 V to the input a.c. signal, thus changing the a.c. signal range from -2.5 V to +2.5 V to 0 V to +5 V, as the A/D converter used can only operate in the 0 V to +5 V range [3].

B. Analog to Digital Conversion
The basic function of an analog to digital (A/D) converter is to convert an analog input to its binary equivalent. ADC 0808, an 8-bit successive approximation A/D converter from National Semiconductor, is employed for converting the sampled voltage and current signals into equivalent 8-bit binary values [4].
A Sample And Hold (SAH) circuit is needed as the input voltage keeps varying during A/D conversion. If a Sample and Hold circuit is not used, and the input signal changes during the A/D conversion, the output digital value will be unpredictable. To overcome this, the input voltage is sampled and held constant for the ADC during the conversion. Two LF 398 ICs from National Semiconductor have been used to sample and hold the sampled values of voltage and current during the A/D conversion. The working of a Sample and Hold (SAH) circuit is illustrated in Fig.2.

Fig.2 Working of SAH

The sampling frequency used was 3.45 kHz, which helps the user to accurately measure the power contained in the harmonics up to the 33rd harmonic. This significantly increases the accuracy of the energy meter, and the meter can be used in environments where the presence of harmonics in the supply is significant.

C. Field Programmable Gate Array (FPGA)
The FPGA is the key unit of the energy meter presented in this paper. It is programmed to perform the following functions:

1) Find the product of the instantaneous values of voltage and current to get the instantaneous power.
2) Accumulate the power and compare the accumulated value to the meter constant.
3) When the meter constant is exceeded, the energy consumed is incremented by 00.01 and displayed.
4) Drive the seven-segment displays.
5) Send the energy reading to PC via RS-232.

The instantaneous power consumption is calculated using an implementation of the Booth multiplier algorithm. The Booth multiplier algorithm provides a fast means of multiplying the 8-bit values of voltage and current obtained from the ADC. The resultant value is the instantaneous power, which can be of a maximum 17-bit length. These instantaneous power values are accumulated and the accumulated value is compared to the meter constant already stored in the FPGA. Once that meter constant is exceeded, the display is incremented by 00.01 kWh, the accumulator gets reset and the amount by which the accumulator reading exceeded the meter constant is loaded into the accumulator. The meter constant is chosen to correspond to 00.01 kWh primarily due to limitations of the FPGA kit which is used for implementing the energy meter. Now the next set of digital values for
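As an aside on the multiplication step: the radix-2 Booth algorithm named above can be modeled behaviorally. The sketch below is illustrative Python, not the paper's VHDL; `booth_multiply` is a hypothetical helper that multiplies two signed 8-bit samples and returns the product as a signed value of up to the 17-bit width mentioned in the text.

```python
def booth_multiply(m, r, bits=8):
    """Radix-2 Booth multiplication of two signed `bits`-bit integers."""
    def tc(x, n):                 # two's-complement encoding of x on n bits
        return x & ((1 << n) - 1)

    n = 2 * bits + 2              # register width: multiplicand field + multiplier + appended bit
    A = tc(m, bits + 1) << (bits + 1)    # multiplicand, placed in the upper field
    S = tc(-m, bits + 1) << (bits + 1)   # its negation, used for a '10' bit pair
    P = tc(r, bits) << 1                 # multiplier with an appended 0 bit
    for _ in range(bits):
        pair = P & 0b11                  # inspect the two lowest bits
        if pair == 0b01:
            P = (P + A) & ((1 << n) - 1)
        elif pair == 0b10:
            P = (P + S) & ((1 << n) - 1)
        P = (P >> 1) | ((P >> (n - 1)) << (n - 1))   # arithmetic shift right
    P >>= 1                              # drop the appended bit
    # Interpret the result as a signed (2*bits + 1)-bit value.
    return P - (1 << (2 * bits + 1)) if P >= (1 << (2 * bits)) else P

print(booth_multiply(-93, 75))    # -6975
```

The multiplicand field is one bit wider than the operand so that even the most negative 8-bit sample (-128) negates correctly, a classic Booth corner case.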


voltage and current are available at the input and the process of power calculation and accumulation repeats.
The FPGA used for implementing the energy meter is a Spartan2 from Xilinx. The Hardware Description Language (HDL) used for the purpose is VHDL [5].

D. Seven segment display
To display the total energy consumed, four seven-segment displays are used, which can display energy from 00.00 to 99.99 kW-hour. Each of the displays needs a base drive signal for enabling it, and the seven segment equivalent of the digit it has to display. The base drive is provided by the FPGA at the rate of 0.25 MHz per display, and at the same time it sends the seven segment equivalent of the digit to that display. Hence, all the four displays appear to be displaying the digits simultaneously.

E. Serial Communication Interface
RS-232 (Recommended Standard-232) is a standard interface approved by the Electronic Industries Association (EIA) for connecting serial devices. Each byte of data is synchronized using its start bit and stop bit. A parity bit can also be included as a means of error checking. Fig.3 shows the TTL/CMOS serial logic waveform when using the common 8N1 format. 8N1 signifies 8 Data bits, No Parity and 1 Stop Bit. The RS-232 line, when idle, is in the Mark State (Logic 1). A transmission starts with a start bit, which is Logic 0. Then each bit is sent down the line, one at a time. The LSB (Least Significant Bit) is sent first. A Stop Bit (Logic 1) is then appended to the signal to make up the transmission. The data sent using this method is said to be framed; that is, the data is framed between a Start and a Stop Bit.

Fig.3 TTL/CMOS Serial Logic Waveform

The waveform in Fig. 3 is only relevant for the signal immediately at the output of the FPGA. RS-232 logic levels use +3 to +25 volts to signify a "Space" (Logic 0) and -3 to -25 volts for a "Mark" (Logic 1). Any voltage in between these regions (i.e. between +3 and -3 volts) is undefined. Therefore this signal is put through an RS-232 Level Converter. The signal present on the RS-232 port of a personal computer is shown in Fig. 4.

Fig.4 RS-232 Logic Waveform

The rate at which bits are transmitted (bits per second) is called baud. Each piece of equipment has its own baud rate requirement. A baud rate of 100 bits per second is used in the design presented. This baud is set both on the PC side as well as on the FPGA side. The RS-232 Level Converter used is a MAX-232, which generates +10 V and -10 V from a single 5 V supply. On the PC side, the Microsoft Comm control in Visual Basic is used to read and display the incoming data from the FPGA.

The FPGA based single phase energy meter was designed, simulated and implemented on a Spartan 2 FPGA. The sensing circuit, consisting of the op-amps and the sample and hold ICs, was implemented on a printed circuit board. The results of simulation and the test results are presented in this section.

A. Simulation Results
Simulation results for the adder circuit
The aim of the adder circuit, implemented using LM 324 op-amps, is to shift the input a.c. voltage (maximum allowed is 2.5 Vmax) up by 2.5 V, so that the input is in the range 0-5 V. This is required as the ADC used is unipolar and can only convert signals in the range 0-5 V to their 8-bit binary equivalent.
The results of simulating the adder circuit are presented in Fig. 5. The input signal provided was an a.c. signal of 2.5 Vmax, and the output obtained was the same as the input, but shifted up by 2.5 V.

Fig. 5 Adder circuit simulation

Simulation results for the sample and hold
The ADC used for converting voltage and current signals into digital form was ADC 0808, which can do A/D conversion on only one channel at a time. Hence the sampled values must be held till the inputs on both channels (voltage and current signals are given to separate channels) are converted. An added requirement was that the input signal should not change during the A/D conversion process. Hence it was essential that sample and hold circuits be used. The result of simulating the sample and hold circuit is given in Fig.7. The sampling frequency used was 3.45 kHz.
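The 8N1 framing just described can be sketched in a few lines of Python. This is an illustrative model only (`frame_8n1` is a hypothetical helper, not from the paper): it returns the sequence of logic levels at the FPGA output for one character, idle-high, start bit low, eight data bits LSB first, then a high stop bit.

```python
def frame_8n1(byte):
    """Return the 10 logic levels of one 8N1 character frame, in transmission order."""
    assert 0 <= byte <= 0xFF
    bits = [0]                                   # start bit: Logic 0 (Space)
    bits += [(byte >> i) & 1 for i in range(8)]  # 8 data bits, LSB first
    bits += [1]                                  # stop bit: Logic 1 (Mark)
    return bits

# The ASCII digit '7' (0x37) framed for transmission:
print(frame_8n1(0x37))   # [0, 1, 1, 1, 0, 1, 1, 0, 0, 1]
```

At the 100 bit/s rate used in the design, each such 10-bit frame occupies 0.1 s on the line.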


Fig. 7 Sample and hold output with the sampling frequency of 3.45 kHz

Simulation results for the VHDL code
The VHDL code was simulated using ModelSim. Since it is not possible to present the whole process of energy computation in one figure, the individual operations are presented as separate figures, in the order in which they occur.
Fig. 8 shows the multiplication of voltage and current signals taking place. The process of multiplication takes place after every second end of conversion signal. As soon as the end_conv signal, indicating the end of conversion, goes high, the test data is read into signal 'current' and '10000000' is subtracted from it to yield a signal 'i'. This is then multiplied with signal 'v', obtained in the same manner as described for 'i', and the 16-bit product is prepended with 0s to make it 30-bit for addition with the 30-bit signal 'accumulator'. After the end of conversion signal is received, the hold signal, indicated by samp_hold, is made low to start sampling again.

Fig.8 Multiplication

The next process after multiplication is accumulation. The process of accumulation is triggered after every third end of conversion signal. The product obtained has to be either added to or subtracted from the accumulated value, depending on whether the inputs were of the same polarity or of opposite polarity. When both 'voltage' and 'current' are positive (i.e. greater than '10000000') or both of them are negative (i.e. less than '10000000'), the product is positive and has to be added to the accumulator. Otherwise, the product is negative and should be subtracted from the accumulated value. Signals det_v and det_i check for the polarity of the signals, and the addition and subtraction processes are triggered by these two signals. The process of accumulation is shown in Fig. 9.

Fig. 9 Accumulation

The process of updating the energy consumed is given in Fig. 10. Once the accumulator value exceeds the meter constant, a signal sumvi_gr_const goes high. This low to high transition triggers another process which increments the energy consumed by one unit, indicating a consumption of 0.01 kWh of energy on the seven-segment display. The total energy consumed is indicated by four signals: last_digit, third_digit, second_digit and first_digit. In Fig. 10, the energy consumed indicated initially is 13.77 kWh, which then increments to 13.78 kWh. The RS-232 interface transmits the ASCII equivalent of the four signals through the output 'bitout' at a baud rate of 100 bits/s.

Fig.10 Energy Updating

B. Hardware Implementation Results
Design overview
The design overview was generated using Xilinx ISE. It gives an overview of the resources utilized on the FPGA board on which the design is implemented. The details, such as the number of slice registers used as flip-flops, latches etc., can be found from the design overview. Fig.11 presents the design overview for the energy meter code implemented on the FPGA.

Fig.11 Design overview

RTL schematic
The RTL schematic presents the design of the implemented energy meter code after synthesis. The top level schematic, showing only the inputs and the outputs obtained after synthesis using Leonardo Spectrum, is presented in Fig.12.

Fig.12 Top-level RTL schematic

Pin locking
The I/O pins for the FPGA were configured using Xilinx ISE. A total of twenty pins were configured as outputs, including those for control of the ADC, the base drive for the seven-segment display, the data for the seven-segment display and the serial communication. A pin was configured exclusively to give the sample/hold signal to the LF 398 Sample and Hold IC. Eleven pins were configured as inputs, including eight pins to detect the ADC output, the reset signal and the clock signal for the FPGA. Fig.13 shows the pin locking and the location of the pins on the board.

Fig.13 Pin Locking

Serial communication interface
The GUI for serial communication on the PC was developed using Visual Basic. The data sent by the FPGA was received and displayed on the PC. The tariff chosen was 1 rupee per unit, but can be changed by modifying the Visual Basic program. The GUI form is shown in Fig.14.

Fig.14 GUI form for serial communication

The experimental setup used to implement and test the design is shown in Fig.15.

Fig.15 Experimental setup
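The multiply-accumulate-update cycle shown in Figs. 8-10 can be summarised behaviorally. The following Python sketch is illustrative only (the function and constant names are not the paper's VHDL identifiers); it assumes samples arrive as unsigned bytes offset by '10000000' (128), as described above.

```python
METER_CONSTANT = 85905087          # reached once per 0.01 kWh (see Section II)

def update(acc, centi_kwh, v_byte, i_byte):
    """One multiply/accumulate/update step of the behavioral model."""
    v = v_byte - 128               # remove the mid-scale '10000000' offset
    i = i_byte - 128
    acc += v * i                   # opposite polarities yield a negative product,
                                   # so its magnitude is subtracted from the accumulator
    if acc > METER_CONSTANT:
        centi_kwh += 1             # display increments by 0.01 kWh
        acc -= METER_CONSTANT      # the excess is reloaded into the accumulator
    return acc, centi_kwh

print(update(85905087, 5, 200, 200))   # (5184, 6): 72*72 pushes it over the constant
```

Signed arithmetic makes the det_v/det_i add-or-subtract decision implicit: a product of opposite-polarity samples is negative and reduces the accumulator on its own.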


[1] Kerry McGraw, Donavan Mumm, Matt Roode, Shawn Yockey, The Theories and Modeling of the Kilowatt-hour Meter, [Online]. Available:
[2] Anthony Collins, Solid State Solutions for Electricity Metrology, [Online]. Available: http://
[3] Ron Mancini, Op Amps for Everyone - Design Reference, Texas Instruments, Aug. 2002.
[4] Nicholas Gray, ABCs of ADCs: Analog-to-Digital Converter Basics, National Semiconductor Corporation, Nov. 2003.
[5] Douglas L. Perry, VHDL: Programming by Example, 4th Ed., TMH Publishing Ltd., New Delhi.
[6] T. Riesgo, Y. Torroja, and E. Torre, "Design Methodologies Based on Hardware Description Languages", IEEE Transactions on Industrial Electronics, vol. 46, no. 1, pp. 3-12, Feb. 1999.


A Multilingual, Low Cost FPGA Based

Digital Storage Oscilloscope
Binoy B.Nair, L.Sreeram, S.Srikanth, S.Srivignesh
Amrita Vishwa Vidhyapeetham, Ettimadai, Coimbatore – 641105,
Tamil Nadu, India.
Email addresses: [binoybnair, lsree87, srikanth1986, s.srivignesh]

Abstract--In a country like India, a Digital Storage Oscilloscope is too costly an instrument for most schools to use as a teaching aid. Another problem associated with commercially available Digital Storage Oscilloscopes is that the user interface is usually in English, which is not the medium of instruction in most schools in rural areas. In this paper, the design and implementation of an FPGA based Digital Storage Oscilloscope that overcomes the above difficulties is presented. The oscilloscope not only costs a fraction of the commercially available Digital Storage Oscilloscopes, but also has an extremely simple user interface based on regional Indian languages. The oscilloscope developed is based on a Cyclone II FPGA. The Analog to Digital Converter interface developed allows usage of ADCs depending on the consumer's choice, allowing customization of the oscilloscope. The VGA interface developed allows any VGA monitor to be used as the display.

Keywords - Digital Storage Oscilloscope, FPGA, VGA.

I. INTRODUCTION

Oscilloscopes available today cost thousands of rupees [11]. Moreover, these oscilloscopes have fewer functions, smaller displays and a limited number of channels [1],[10]. The major disadvantage of PC based oscilloscopes is that they are tied to a PC: they are not portable, and they require specialized software packages to be installed on the PC [2]. These packages are usually expensive and may not produce optimum performance on low-end PCs. Additional hardware such as data acquisition cards is also required [3]. With this design there are no such problems: the PC is replaced with an FPGA, and instead of a data acquisition card, a low cost op-amp based circuit is used for input signal conditioning and a commercially available A/D converter is used for digitizing the signals. This results in significant cost reduction with little difference in performance. The FPGA, A/D converter and signal conditioning circuit together form a single system, which is portable, and any VGA monitor can be used as the display.

Functions like Fast Fourier Transform, convolution, integration, differentiation and mathematical operations like addition, subtraction and multiplication of signals are implemented [5],[6]. Since the oscilloscope is interfaced with a VGA monitor, it has a larger display. New functions can be easily added just by changing the VHDL code. The number of input channels available is also not restricted, as it depends on the A/D converter one uses. Here an 8-bit A/D converter with 8 input channels is used, and hence it is possible to view up to 8 input waveforms on the screen simultaneously. The different waveforms can be viewed in different colors, thereby reducing confusion.

For ease of understanding, the design presented can be considered to be made up of three modules. The analog signal conditioning, the analog-to-digital conversion and its interface to the FPGA comprise the first module. The second module deals with the processing of the acquired digital signal, and the third module deals with presenting the output in user-understandable form on a VGA screen. The whole system is implemented using VHDL [8]. The flow of the process is shown in Fig.1.

Fig 1. Flowchart of FPGA based DSO

A. Input Signal Conditioning
The input signal is scaled using op-amp based scaling circuits to bring it into the voltage range accepted by the A/D converter. After scaling, the signal is fed to the A/D converter, which converts it to its digital equivalent [2]. The control signals to the A/D converter are provided from the FPGA itself. The circuit designed draws very little power, thus minimizing loading effects. An additional advantage of having the A/D converter as a separate hardware unit is that any commercially available A/D converter can be used, depending on user requirements, with little or no change in the interface code.

B. Signal Processing
The digital values obtained after A/D conversion are stored in a 640 x 8 bit RAM created inside the FPGA. Control signals are sent to the memory to allow data to be written to it. The clock is fed to a counter that generates the memory's sequential addresses. An option to store the captured wave information is also provided through a flash memory interface, so that the information can be stored for future reference. It also gives the user the ability to log the waveform data for a duration limited only by the size of the flash memory used.

Additional functions like integration, differentiation and Fast Fourier Transform, and mathematical operations like addition, subtraction and multiplication of signals, are also implemented. Integration is done using the rectangular rule. Differentiation is done using the finite difference method [9]. The Fast Fourier Transform is implemented using the CORDIC algorithm. Addition, subtraction and multiplication are done with basic operations: a ripple adder, 2's complement addition and the Booth algorithm, respectively [7].

C. VGA Interface & Display
A VGA monitor interface was also developed, and the waveforms are displayed on the screen at 640x480 resolution, with all the necessary signals, such as horizontal synchronization and vertical synchronization along with the RGB color information, generated by the FPGA and sent to the VGA monitor [4]. The timing diagram of the VGA signals is shown in Fig. 2.

Fig 2. Timing Diagram of Analog to Digital Convertor

The VGA display is divided into two parts: the upper part displays the wave and the lower part displays the menu and the wave parameters. One of the distinguishing features of the oscilloscope presented here is its ability to display the menu and the wave information in Indian languages like Tamil, Telugu, Malayalam or Hindi, in addition to English. Each character is generated by considering a grid of 8x8 pixels for Indian languages and 8x6 pixels for English letters and numbers. A sample grid for displaying the letter 'A' on the screen is given in Fig.3 [4].

Fig 3. Matrix values of letter "A"

III. LABORATORY TESTS

A sample result of the laboratory tests is shown in Fig.4. In this sample test, a 2.5 kHz sine wave is displayed on the second channel. The interface options are displayed in the Tamil language. Additionally, the maximum and minimum values of both waves are displayed. All the waveforms are in different colors, so that it is easy to differentiate between them. Each waveform is in the same color as that of its menu options, to avoid confusion between waves.

Fig.4. Sample Laboratory Test Output

IV. CONCLUSION

The FPGA based Digital Storage Oscilloscope presented here has many advantages: low cost, portability, availability of channels, 4096 colors for the waveforms and a large display, which helps in analyzing the waveforms clearly, together with an interactive interface in multiple regional languages. The user specifications of the developed system have been set up in accordance with the requirements of common high school level science laboratories. The interface circuit hardware has been developed with a few affordable electronic components for conversion and processing of the analog signal into digital form before being acquired by the FPGA.
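The rectangular-rule integration and finite-difference differentiation named under "B. Signal Processing" reduce to one multiply-accumulate or one subtraction per sample. The DSO implements them in VHDL on 8-bit samples; the following is only an illustrative Python sketch of the same arithmetic, with the sample period dt assumed.

```python
# Sketch of two of the waveform operations named in the paper:
# rectangular-rule integration and finite-difference differentiation of
# a sampled signal. The DSO itself does this in VHDL; Python is used
# here only to illustrate the arithmetic. dt is the assumed sample period.

def integrate_rect(samples, dt):
    """Running integral by the rectangular rule: acc[n] = acc[n-1] + x[n]*dt."""
    acc, out = 0.0, []
    for x in samples:
        acc += x * dt
        out.append(acc)
    return out

def differentiate_fd(samples, dt):
    """Forward finite difference: y[n] = (x[n+1] - x[n]) / dt."""
    return [(samples[i + 1] - samples[i]) / dt for i in range(len(samples) - 1)]

ramp = [0, 1, 2, 3, 4]              # unit-slope ramp sampled at dt = 1
print(integrate_rect(ramp, 1.0))    # [0.0, 1.0, 3.0, 6.0, 10.0]
print(differentiate_fd(ramp, 1.0))  # [1.0, 1.0, 1.0, 1.0]
```

Both loops map directly onto one adder (plus a shift or multiplier for dt) per clock, which is why they are cheap to add to the FPGA design.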


The overall cost of the Digital Storage Oscilloscope presented here is approximately USD 184. The system was successfully tested with several forms of input waveforms, such as sinusoidal, square and triangular signals. The developed system has possible expansion capacity in the form of additional signal processing modules. This DSO can be used for real-time data acquisition in most common-purpose, low-power and low-frequency-range applications for high school laboratories. It can also be used as an instructional tool in undergraduate data acquisition courses for illustrating complex concepts concerning parallel port programming, A/D conversion and detailed circuit development. The entire system is shown in Fig.5.

Fig.5. Entire System of FPGA based DSO

REFERENCES

[1] J. Miguel Dias Pereira, "The History and Technology of Oscilloscopes," IEEE Instrumentation & Measurement Magazine.
[2] Chandan Bhunia, Saikat Giri, Samrat Kar, Sudarshan Haldar, and Prithwiraj Purkait, "A Low Cost PC Based Oscilloscope," IEEE Transactions on Education, vol. 47, no. 2, May 2004.
[3] R. Lincke, I. Bijii, Ath. Trutia, V. Bogatu, B. Logofzitu, "PC Based Oscilloscopes."
[4] A. N. Netravali and P. Pirsch, "Character Display on CRT," IEEE Transactions on Broadcasting, vol. BC-29, no. 3, September 1983.
[5] IEEE Standard Specification of General-Purpose Laboratory Cathode-Ray Oscilloscopes, IEEE Transactions on Instrumentation and Measurement, vol. IM-19, no. 3, August 1970.
[6] Oscilloscope Primer, XYZs of Oscilloscopes.
[7] Morris Mano, Digital Design.
[8] Douglas Perry, VHDL: Programming by Example.
[9] W. Cheney and D. Kincaid, Numeric Methods and
[10] Product Documentation of Tektronix and Aplab.


Design of Asynchronous NULL Convention Logic FPGA

R. Suguna, II M.E. (VLSI Design), and S. Vasanthi, M.E., (Ph.D.), Senior Lecturer, ECE Department,
K.S.Rangasamy College of Technology, Tiruchengode.

Abstract— NULL Convention Logic (NCL) is a self-timed circuit style in which the control is inherent in each datum. There are 27 fundamental NCL gates. The author proposes a logic element that can be configured as any one of the NCL gates. Two versions of the reconfigurable logic element are developed for implementing an asynchronous FPGA: one with embedded registration logic and the other without embedded registration logic. Both versions can be configured as any one of the 27 fundamental NCL gates, including resettable and inverting variations, and both can utilize embedded registration for gates with three or fewer inputs. The version with extra embedded registration can also utilize gates with four inputs. The two approaches are compared with an existing approach, showing that both versions developed herein yield more area-efficient NCL circuits.

Index Terms—Asynchronous logic design, delay-insensitive circuits, field-programmable gate array (FPGA), NULL convention logic (NCL), reconfigurable logic.

I. INTRODUCTION

Though synchronous circuit design presently dominates the semiconductor design industry, there are major limiting factors to this design approach, including clock distribution, increasing clock rates, decreasing feature size, and excessive power consumption [6]. As a result of the problems encountered with synchronous circuit design, asynchronous design techniques have received more attention. One such asynchronous approach is NULL Convention Logic (NCL), a clock-free delay-insensitive logic design methodology for digital systems. The separation between data and control representations provides self-synchronization without the use of a clock signal.

NCL is a self-timed logic paradigm in which control is inherent in each datum. NCL follows the so-called weak conditions of Seitz's delay-insensitive signaling scheme.

II. NCL OVERVIEW

NCL uses threshold gates as its basic logic elements [4]. The primary type of threshold gate, shown in Fig. 1, is the THmn gate, where 1 ≤ m ≤ n. THmn gates have single-wire inputs, where at least m of the n inputs must be asserted before the single-wire output becomes asserted. In a THmn gate, each of the n inputs is connected to the rounded portion of the gate; the output emanates from the pointed end of the gate, and the gate's threshold value m is written inside the gate. NCL circuits are designed using a threshold gate network for each output rail [3] (i.e., two threshold gate networks would be required for a dual-rail signal D: one for D0 and another for D1).

Fig. 1. THmn threshold gate.

Another type of threshold gate is the weighted threshold gate, denoted THmnWw1w2…wR. Weighted threshold gates have an integer value m > wR > 1 applied to input R. Here 1 < R < n, where n is the number of inputs, m is the gate's threshold, and w1, w2, …, wR, each > 1, are the integer weights of input 1, input 2, …, input R, respectively. For example, consider the TH34w2 gate shown in Fig. 2, whose n = 4 inputs are labeled A, B, C and D. The weight of input A, w(A), is 2. Since the gate's threshold is 3, the output is asserted either when inputs B, C and D are all asserted, or when input A is asserted along with any other input (B, C or D).

Fig. 2. TH34w2 threshold gate: Z = AB + AC + AD + BCD.

NCL threshold gates are designed with hysteresis state-holding capability, such that all asserted inputs must be deasserted before the output is deasserted. Hysteresis ensures a complete transition of inputs back to NULL before the output associated with the next wavefront of input data is asserted.

NCL threshold gate variations include resettable THnn and inverting TH1n gates. Circuit diagrams designate resettable gates by either a 'd' or an 'n' appearing inside the gate, along with the gate's threshold: 'd' denotes a gate that resets to logic '1' and 'n' one that resets to logic '0'. Both resettable and inverting gates are used in the design of delay-insensitive registers [8].
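The set/hold behavior of a weighted threshold gate with hysteresis can be captured in a few lines. The following Python sketch is a behavioral model written for this article (it is not from the paper), using the TH34w2 gate of Fig. 2 as the example:

```python
# Behavioral model of an NCL threshold gate with hysteresis, as described
# above: the output is set when the weighted sum of asserted inputs reaches
# the threshold m, and is cleared only after ALL inputs return to 0 (NULL).

class ThresholdGate:
    def __init__(self, m, weights):
        self.m = m        # threshold
        self.w = weights  # one integer weight per input
        self.z = 0        # output state (held between evaluations)

    def eval(self, inputs):
        total = sum(wi for wi, x in zip(self.w, inputs) if x)
        if total >= self.m:
            self.z = 1    # set: threshold reached
        elif not any(inputs):
            self.z = 0    # reset: complete return to NULL
        return self.z     # otherwise: hold previous value (hysteresis)

# TH34w2 from Fig. 2: threshold 3, input A carries weight 2.
g = ThresholdGate(3, [2, 1, 1, 1])
print(g.eval([1, 1, 0, 0]))  # A+B: weight 2+1 = 3 -> output asserted (1)
print(g.eval([1, 0, 0, 0]))  # below threshold but not all-NULL -> holds 1
print(g.eval([0, 0, 0, 0]))  # all inputs NULL -> output deasserted (0)
```

The middle evaluation is the point of the model: a plain combinational gate would drop the output there, while the NCL gate holds it until the complete NULL wavefront arrives.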



NCL uses symbolic completeness of expression to achieve delay-insensitive behavior [7]. A symbolically complete expression depends only on the relationships of the symbols present in the expression, without reference to their time of evaluation [8]. In particular, dual-rail and quad-rail signals, or other mutually exclusive assertion groups (MEAGs) [3], can incorporate data and control information into one mixed-signal path to eliminate time reference.

A dual-rail signal D consists of two mutually exclusive wires, D0 and D1, which may assume any value from the set {DATA0, DATA1, NULL}. Likewise, a quad-rail signal consists of four mutually exclusive wires that represent two bits. For NCL and other circuits to be delay insensitive, they must meet the input completeness and observability criteria [2].

A. Input completeness

In order for NCL combinational circuits to maintain delay-insensitivity, they must adhere to the completeness-of-input criterion [5], which requires that:
1. no output of a combinational circuit may transition from NULL to DATA until all inputs have transitioned from NULL to DATA, and
2. no output of a combinational circuit may transition from DATA to NULL until all inputs have transitioned from DATA to NULL.

B. Observability

The observability condition, also referred to as indicatability or stability, ensures that every gate transition is observable at the output, which means that every gate that transitions is necessary to transition at least one of the outputs [5].

III. DESIGN OF A RECONFIGURABLE NCL LOGIC ELEMENT

Fig. 3 shows a hardware realization [1] of a reconfigurable NCL LE, consisting of reconfigurable logic, reset logic, and output inversion logic. There are 16 inputs used specifically for programming the gate: Rv, Inv, and Dp(14:1). Five inputs are used only during gate operation: A, B, C, D and rst. P is used to select between programming and operational mode. Z is the gate output; Rv is the value to which Z will be reset when rst is asserted during operational mode; Inv determines whether the gate output is inverted or not. During programming mode, Dp(14:1) is used to program the LUT's 14 latches in order to configure the LE as a specific NCL gate. Addresses 15 and 0 are constant values and therefore do not need to be programmed.

A. Reconfigurable logic

The reconfigurable logic portion consists of a 16-address LUT [1], shown in Fig. 4, and a pull-up/pull-down (PUPD) function. The LUT contains 14 latches and a pass-transistor multiplexer (MUX). When P is asserted (nP is deasserted), the Dp values are stored in their respective latches to configure the LUT output to one of the 27 equations in Table I. Only 14 latches are required because address 0 is always logic '0' and address 15 is always logic '1' according to the 27 NCL gate equations. The gate inputs A, B, C and D are connected to the MUX select signals to pass the selected latch output to the LUT output. The MUX consists of N-type transistors and a CMOS inverter to provide a full voltage swing at the output.

Table I. 27 NCL fundamental gates.
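The dual-rail encoding and the two completeness conditions above can be illustrated directly. In the following Python sketch, written for this article, the function and signal names are invented for the example:

```python
# Illustration of dual-rail encoding and the input-completeness rule:
# a dual-rail signal is NULL when both rails are 0, DATA0/DATA1 when
# exactly one rail is asserted, and (1, 1) is an illegal state.

NULL = (0, 0)

def encode(bit):
    """Encode one bit as a dual-rail pair (D0, D1)."""
    return (0, 1) if bit else (1, 0)

def is_data(rails):
    d0, d1 = rails
    assert not (d0 and d1), "illegal dual-rail state"
    return d0 != d1

def outputs_may_transition(inputs, to_data):
    """Completeness of input: outputs may go NULL->DATA only after ALL
    inputs are DATA, and DATA->NULL only after ALL inputs are NULL."""
    if to_data:
        return all(is_data(r) for r in inputs)
    return all(r == NULL for r in inputs)

ins = [encode(1), NULL]
print(outputs_may_transition(ins, to_data=True))   # False: one input still NULL
ins = [encode(1), encode(0)]
print(outputs_may_transition(ins, to_data=True))   # True: all inputs are DATA
```

The hysteresis of the threshold gates is what enforces this discipline in hardware; the sketch only states the criterion the gate network must satisfy.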


Fig. 3. Reconfigurable NCL LE without extra embedded registration.

The LUT output, F, is then connected to the N-type transistor of the PUPD function, such that the output of this function will be logic '0' only when F is logic '1'. Since all gate inputs (i.e., A, B, C and D) are connected to a series of P-type transistors, the PUPD function output [4] will be logic '1' only when all gate inputs are logic '0'.
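How the 16-address LUT is programmed can be sketched in software: addresses 1 to 14 come from the Dp(14:1) configuration bits, while addresses 0 and 15 are hardwired. The TH23 set function used below is an illustrative choice for the example, not an entry copied from Table I.

```python
# Sketch of programming the reconfigurable LE's 16-address LUT.
# Addresses 1..14 are the Dp(14:1) configuration bits; address 0 is
# hardwired to 0 and address 15 to 1, matching all 27 NCL gate equations,
# which is why only 14 latches are needed.

def th23_set(a, b, c, d):
    # TH23: output set when at least 2 of inputs A, B, C are asserted
    # (input D is unused for this 3-input gate).
    return int(a + b + c >= 2)

def program_lut(set_fn):
    """Build the 16-entry LUT; returns the full table and the 14 latch bits."""
    lut = [set_fn(addr & 1, (addr >> 1) & 1, (addr >> 2) & 1, (addr >> 3) & 1)
           for addr in range(16)]
    lut[0], lut[15] = 0, 1         # constant addresses, never programmed
    return lut, lut[1:15]          # Dp(14:1) is what the latches store

lut, dp = program_lut(th23_set)
print(len(dp))                     # 14 latches suffice
print(lut[0b0011], lut[0b0001])    # A,B asserted -> 1; A alone -> 0
```

In operation, the gate inputs A, B, C, D act as the MUX select lines that pick one of these 16 entries, and the PUPD stage adds the hysteresis on top of the selected value.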

B. Reset Logic
The reset logic consists of a programmable latch and
transmission gate MUX [1]. During the programming
phase when P is asserted (nP is deasserted), the
latch stores the value Rv. The gate will be reset when
rst is asserted. rst is the MUX select input, such that
when it is logic ‘0’, the output of the PUPD function
passes through the MUX to be inverted and output on
Z. When rst is logic 1, the inverse of Rv is passed
through the MUX.

C. Output Inversion Logic

The output inversion logic also consists of a programmable latch and a transmission gate MUX. The programmable latch stores Inv during the programming phase, which determines whether the gate is inverting or not. The input and output of the reconfigurable logic are both fed as data inputs to the MUX, so that either the noninverted or the inverted value can be output; the stored Inv value is used as the MUX select input.




An alternative to the reconfigurable NCL LE described above is shown in Fig. 5. This design is very similar to the previous version; however, it contains an additional latch and an input, ER, for selecting embedded registration. Additional embedded registration logic within the reconfigurable logic's PUPD function, along with an additional registration request input, Ki, is used. The remaining portions of the design (the reconfigurable logic, reset logic and output inversion logic) function the same.

A. Reconfigurable Logic
The reconfigurable logic portion consists of the same
16-address LUT used in the previous version and a
revised PUPD function that includes additional
embedded registration logic. When embedded
registration is disabled (i.e., ER=logic ‘0’ during the
programming phase), Ki should be connected to logic
‘0’, and the PUPD logic functions the same as
explained. However, when embedded registration is
enabled, the output of the PUPD function will only be
logic ‘0’ when both F and Ki are logic ‘1’, and will
only be logic ‘1’ when all gate inputs (i.e., A,B,C and
D) and Ki are logic ‘0’.
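The enabled/disabled behavior of the revised PUPD function just described can be tabulated. The following Python sketch is a behavioral model with invented names; the polarity conventions follow the text (the PUPD node is '0' to assert the gate output and '1' to deassert it):

```python
# Behavioral sketch of the revised PUPD function with embedded
# registration, per the description above: with ER enabled, the node is
# pulled low (gate output asserted) only when both F and Ki are 1, and
# pulled high (gate output deasserted) only when all gate inputs AND Ki
# are 0; otherwise the node holds its previous state (hysteresis).

def pupd_next(prev, f, ki, inputs, er_enabled):
    """One evaluation step of the PUPD node value."""
    if not er_enabled:
        # Original PUPD: set on F, clear when all gate inputs are NULL.
        # (Ki is tied to logic '0' externally and plays no role.)
        if f:
            return 0
        if not any(inputs):
            return 1
        return prev
    # Embedded registration enabled: Ki gates both transitions.
    if f and ki:
        return 0
    if not any(inputs) and not ki:
        return 1
    return prev

# With ER enabled, F alone is not enough: Ki must also request data.
print(pupd_next(1, f=1, ki=0, inputs=[1, 1, 0, 0], er_enabled=True))  # 1: holds
print(pupd_next(1, f=1, ki=1, inputs=[1, 1, 0, 0], er_enabled=True))  # 0: asserts
```

This is exactly the merge of a delay-insensitive register's Ki handshake into the gate's own set/reset conditions, which is what saves the separate register stage.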

B. Embedded Registration

Embedded registration [1] merges delay-insensitive registers into the combinational logic, when possible. This increases circuit performance and substantially decreases the FPGA area required to implement most designs, especially high-throughput circuits (i.e., circuits containing many registers). Fig. 6 shows an example of embedded registration applied to an NCL full-adder, where (a) shows the original design consisting of a full-adder and a 2-bit NCL register [2], [8], (b) shows the design utilizing embedded registration when implemented using the reconfigurable NCL LE without extra embedded registration capability, and (c) shows the design utilizing embedded registration when implemented using the reconfigurable NCL LE with extra embedded registration capability.

Fig. 4. 16-bit LUT.


Fig. 5. Reconfigurable NCL LE with extra embedded registration.

Fig. 6. Embedded registration example: (a) original design; (b) implementation using the NCL reconfigurable LE of Fig. 3; (c) implementation using the NCL reconfigurable LE of Fig. 5.



V. CONCLUSION

Table II compares the propagation delays of the two reconfigurable LEs developed herein, based on which input transition caused the output to transition, and shows the average propagation delay, TP, during normal operation (i.e., excluding reset). Comparing the two reconfigurable LEs shows that the version without extra embedded registration is 6% smaller and 20% faster. However, since fewer gates may be required when using the version with extra embedded registration, that version may nevertheless produce a smaller, faster circuit, depending on the amount of additional embedded registration that can be utilized.

Table II. Propagation delay comparison based on input transition.

REFERENCES

[1] Scott C. Smith, "Design of an FPGA Logic Element for Implementing Asynchronous NULL Convention Logic Circuits," IEEE Trans. on VLSI, vol. 15, no. 6, June 2007.
[2] D. Lamb, "Optimization of NULL Convention Self-Timed Circuits," Integr., VLSI J., vol. 37, no. 3, pp. 135-
[3] S. C. Smith, R. F. DeMara, J. S. Yuan, M. Hagedorn, and D. Ferguson, "Delay-Insensitive Gate-Level Pipelining," Integr., VLSI J., vol. 30, no. 2, pp. 103-131, 2001.
[4] G. E. Sobelman and K. M. Fant, "CMOS Circuit Design of Threshold Gates with Hysteresis," in Proc. IEEE Int. Symp. Circuits Syst. (II), 1998, pp. 61-65.
[5] S. A. Brandt and K. M. Fant, "Considerations of Completeness in the Expression of Combinational Processes," Theseus Research, Inc., 2524 Fairbrook Drive, Mountain View.
[6] J. McCardle and D. Chester, "Measuring an Asynchronous Processor's Power and Noise," in Proc. Synopsys User Group Conf. (SNUG), 2001, pp. 66-70.
[7] A. Kondratyev, L. Neukom, O. Roig, A. Taubin and K. Fant, "Checking Delay-Insensitivity: 10^4 Gates and Beyond," in Proc. 8th Int. Symp. Asynchronous Circuits Syst., 2002, pp. 137-145.
[8] K. M. Fant and S. A. Brandt, "NULL Convention Logic: A Complete and Consistent Logic for Asynchronous Digital Circuit Synthesis," in Proc. Int. Conf. Appl. Specific Syst., Arch., Process., 1996, pp. 261-273.


Development of ASIC Cell Library for RF Applications.

K. Edet Bijoy(1), V. Vaithianathan(2)
(1) II M.E. (Applied Electronics), SSN College of Engineering, Kalavakkam, Chennai-603110. Email:
(2) Asst. Prof., SSN College of Engineering, Kalavakkam, Chennai-603110.

Abstract— The great interest in RF CMOS comes from the obvious advantages of CMOS technology in terms of production cost, high-level integration, and the ability to combine digital, analog and RF circuits on the same chip. This paper reviews the development of an ASIC cell library especially for RF applications. The developed cell library includes cells like filters, oscillators, impedance matching circuits, low noise amplifiers, mixers, modulators and power amplifiers. All cells were developed using standard 0.25 µm and 0.18 µm CMOS technology. Circuit design criteria and measurement results are presented. Applications are low power, high speed data transfer RF applications.

Index Terms— ASIC Cell Library, RF VLSI

I. INTRODUCTION

The use of analog CMOS circuits at high frequencies has garnered much attention in the last several years. CMOS is especially attractive for many of these applications because it allows integration of both analog and digital functionality on the same die, increasing performance while keeping system sizes modest. The engineering literature has shown a marked increase in the number of papers published on the use of CMOS in high frequency applications, especially since 1997. These applications cover such diverse areas as GPS, micropower circuits, GSM, and other wireless applications at frequencies from as low as 100 MHz for low earth orbiting satellite systems to 1000 MHz and beyond. Many of the circuits designed are of high performance and have been designed with and optimized for the particular application in mind.

At the heart of rapid integrated system design is the use of cell libraries for various system functions. In digital design, these standard cells exist both at the logic primitive level (NAND and NOR gates, for example) and at higher levels of circuit functionality (ALUs, memory). For baseband analog systems, standard cell libraries are less frequently used, but libraries of operational amplifiers and other analog circuits are available. In the design of a CMOS RF cell library, the cells must be designed to be flexible in terms of drive requirements, bandwidth and circuit loading. For RF applications, the most common drive requirements for off-chip loads are based on 50 Ω impedances. A factor governing the bandwidth of the RF cells is the nodal capacitance to ground, primarily the drain and source sidewall capacitances. Transistors making up the library elements are usually designed with multiple gate fingers to reduce the sidewall capacitance. Since these cells are to be used with digital and baseband analog systems, control by on-chip digital and analog signals is another factor in the design.

The choice of cells in such a cell library should be based on the generalized circuit layout of a wireless system front end. A typical RF front end will have both a receiver and a transmitter connected to an antenna through some type of control device. For the receiver chain, the RF signal is switched to the upper arm and enters the low noise amplifier and then a down-converting mixer. For the transmit chain, the RF signal enters an up-converting mixer and is then sent to the output amplifier and through the control device to the antenna. A number of CMOS cells should be designed for the library. These cells include an RF switch for control of microwave and RF energy flow from the antenna to the transmitter or receiver, a transmitter output amplifier capable of driving a 50 Ω antenna directly and at low distortion, and a mixer that may be used in either circuit branch. An active load is included for use wherever a load may be required.

The cell library for RF applications presented here attempts to address many of these design factors. The library consists of cells designed using 0.18 µm and 0.25 µm CMOS processes. The cells described in this paper can be used separately or combined to construct more complex functions such as an RF application. Each of the cells will be discussed separately for the sake of clarity of presentation and understanding of the operation of the circuit. There was no post-processing performed on any of the circuit topologies presented in this paper. The systems were designed to maintain 50 Ω system compatibility. The cells have been designed for flexibility in arrangement to meet the designer's specific application. The larger geometry devices may also be used for educational purposes, since there are a number of low cost fabrication options available for the technology. In the design of any cell library, a trade-off between


speed/frequency response and circuit complexity is always encountered. A portion of this work is to show the feasibility of the cell library approach in RF design. The approach taken in this work with the technologies listed above is directly applicable to smaller device geometries. These geometries will yield even better circuit performance than the cells discussed here.

II. LOW NOISE AMPLIFIER DESIGN

The most critical point for the realization of a highly integrated receiver is the RF input. The first stage of a receiver is a low noise amplifier (LNA), which dominates the noise figure of the whole receiver. Besides low noise, low power consumption, high linearity and small chip size are the other key requirements. Because of this situation, the design of the LNA is a real challenge.

Fig. 1. Amplifiers with input matching circuits: (a) inductor Lg connected directly to the transistor, (b) pad capacitance Cpad connected directly to the transistor.

Among the few possible solutions for the LNA core, a cascode amplifier with inductive degeneration, shown in Fig. 1, is often preferred. The transistor in common-gate (CG) configuration of the cascode amplifier reduces the Miller effect. It is well known that a capacitance connected between the output and input of an amplifier with inverting gain is seen at its input and output multiplied by the gain. The gain of the common-source (CS) configuration is gmRL, where RL is the output impedance, and the input impedance of the CG configuration is 1/gm. Therefore, if both transistors have similar gm, the gain of the transistor in CS configuration decreases and the Miller capacitance is reduced. At the output of the cascode amplifier, the overlap capacitance does not affect the Miller effect, since the gate of the amplifier is grounded. Thus, the tuned capacitor of the LC tank only has to be large enough to make the tank insensitive to Cgd2. In addition, with a low impedance point at the output of the common-source amplifier, the instability caused by the zero of the transfer function is highly reduced. Finally, with an AC ground at the gate of the cascode amplifier, the output is decoupled from the input, giving the cascode configuration a high reverse isolation. Although in Fig. 1 the LC tank is shown explicitly, in practical situations another configuration can be used, while for small-signal circuits it does not matter whether the second node of the capacitor C is connected to Vdd or ground. However, in any case a series output capacitor is needed to block the DC path. This capacitor, not shown in Fig. 1, can contribute to the output matching, so it has to be chosen very carefully. The output pad capacitance can additionally be used for output matching.

In order to connect the LNA to measurement equipment, a package or an antenna, bonding pads (Cpad) are needed. Fig. 1 shows two LNAs with different input matching networks. In the networks from Fig. 1 all components are placed on the chip. This principle is very often used, therefore we start the LNA analysis from this point. The bonding pad is parallel to the input of the LNA, and as long as its impedance is much higher than the input impedance of the LNA, it does not introduce any significant effect on the input impedance of the whole circuit. In our case, assuming a practical value of 150 fF for Cpad and a frequency of 2 GHz, the impedance of the pad can be neglected in comparison with the required 50 Ω. However, if the influence of Cpad cannot be neglected, only the imaginary part of Zin is affected.

The use of inductive degeneration results in no additional noise generation, since the real part of the input impedance does not correspond to a physical resistor. The source inductor Ls generates a resistive term in the input impedance:

Zin = gmLs/Cgs + j(ω²(Lg + Ls)Cgs − 1)/(ωCgs)

where Ls and Lg are the source and gate inductors, respectively, and gm and Cgs denote small-signal parameters of transistor M1 (Cgd, gds and Cpad are neglected). The inductor Lg, connected in series with the gate, cancels out the admittance due to the gate-source capacitor. Here it is assumed that the tuned load (L, C) is in resonance at the angular frequency ω0 and therefore appears as a pure resistive load RL.

To obtain a pure resistive term at the input, the capacitive part of the input impedance introduced by the capacitance Cgs should be compensated by the inductances. To achieve this cancellation and input matching, the source and gate inductances should be set to

Ls = RsCgs/gm

Lg = (1 − ω0²LsCgs)/(ω0²Cgs)

where Rs is the required input resistance, normally 50 Ω.

The noise figure of the whole amplifier, with the noise contribution of transistor M2 neglected, can be given as

F = 1 + (γ/α)(1/Q)(ω0/ωT)[1 + (δ/kγ)(1 + Q²) + 2|c|√(δ/kγ)]

where α = gm/gd0; γ, δ, c and k are bias-dependent transistor parameters; and Q = 1/(ω0CgsRs) is the quality factor of the input circuit. It can be seen that the noise figure is improved by the factor (ωT/ω0)². Note that for currently used sub-micron MOS technologies, ωT is of the order of 100 GHz. The noise figure of the LNA can also be expressed in a simplified form, with induced gate noise neglected, which is easier for a first-order analysis:

F ≈ 1 + kgmRs(ω0/ωT)²

where k is a bias-dependent constant and Rs is the source resistance. Although at first sight this suggests a low transconductance gm for a low noise figure, taking into account that ωT ≈ gm/Cgs, one can see that this is not true. Increasing gm lowers the noise figure, but at the cost of higher power consumption. Since Cgs contributes to the (ωT/ω0)² factor, lowering this capacitance leads to improved noise. The last possibility for noise reduction is reducing the signal source resistance Rs; however, this resistance is normally fixed.

Decreasing the Cgs capacitance is done by reducing the size of the transistor. This also has an impact on the linearity of the amplifier, and according to the input matching requirements, very large inductors Lg

been proposed, which allows dominant noise contributions to be reduced. A very low noise figure can be achieved in this way. This matching consists of a series inductance and a parallel capacitance connected between the base and emitter of the common-source transistor.

The input matching presented in Fig. 1(b) is quite similar to that of the bipolar amplifier. Here, instead of the base-emitter capacitance, the pad capacitance is used. It can be expected that taking the pad capacitance as a part of the input matching can lower the noise figure of a FET LNA. RF-CMOS LNAs have achieved the lowest noise values when the pad capacitance was taken into consideration. The reason for this behavior has not been discussed enough so far.

Oscillators can generally be categorised either as amplifiers with positive feedback satisfying the well-known Barkhausen criteria, or as negative resistance circuits. At RF and microwave frequencies the negative resistance design technique is generally favoured. The procedure is to design an active negative resistance circuit which, under large-signal steady-state conditions, exactly cancels out the load and any other positive resistance in the closed loop circuit. This leaves the equivalent circuit represented by a single L and C in either parallel or series configuration. At one frequency the reactances will be equal and opposite, and this resonant frequency is given by the standard formula
should be used that can not be longer placed on chip. f= 1/ (2 (LC))
Because of this reason the inductor Lg is placed off- It can be shown that in the presence of excess
chip. Between the inductor and the amplifier the on negative resistance in the small-signal state, any small
chip pad capacitance Cpad is located as it is shown in perturbation caused, for example, by noise will
Fig. (1)b. It consists of the pad structure itself and the rapidly build up into a large signal steady-state
on chip capacitance of ESD structure and signal resonance given by equation
wiring. In this case pad capacitance and Cgs are in Negative resistors are easily designed by taking a
similar order. Therefore, the pad has to be treated as a three terminal active device and applying the correct
part of an amplifier and then taken into account in the amount of feedback to a common port, such that the
design process. magnitude of the input reflection coefficient becomes
It should be noted, that particularly input pads need greater than one. This implies that the real part of the
special consideration. It has been proven that shielded input impedance is negative. The input of the 2-port
pads have ideally no resistive component, and so they negative resistance circuit can now simply be
neither consume signal power nor generate noise. terminated in the opposite sign reactance to complete
They consist of two metal plates drawn on the top and the oscillator circuit. Alternatively high-Q series or
bottom metals to reduce the pad capacitance value parallel resonator circuits can be used to generate
down to 50 fF. Unfortunately, it is not the whole higher quality and therefore lower phase noise
capacitance, which should be taken into account. One oscillators. Over the years several RF oscillator
has to realize that all connections to the pad increase configurations have become standard. The Colpitts,
this value. Hartly and Clapp circuits are examples of negative
The input matching circuit is very important for resistance oscillators shown here using bipolars as the
low noise performance of the LNA. Low noise active devices. The Pierce circuit is an op-amp with
cascode amplifiers using different approaches for positive feedback, and is widely utilised in the crystal
input impedance matching have been analyzed and oscillator industry.
compared in terms of noise figure performance for The oscillator design here concentrates on a
bipolar technology. The effect of noise filtering worked example of a Clapp oscillator, using a
caused by the matching network has been pointed out. varactor tuned ceramic coaxial resonator for voltage
Furthermore, a parallel-series matching network has control of the output frequency. The frequency under


consideration will be around 1.4 GHz, which is

purposely set in-between the two important GSM The first step in the design process is to ensure
mobile phone frequencies. It has been used at Plextek adequate (small-signal) negative resistance to allow
in Satellite Digital Audio Broadcasting circuits and in oscillation to begin, and build into a steady-state. It is
telemetry links for Formula One racing cars. At these clear that capacitor values of 2.7 pF in the Clapp
frequencies it is vital to include all stray and parasitic capacitive divider result in a magnitude of input
elements early on in the simulation. For example, any reflection coefficient at 1.4 GHz.. This is more than
coupling capacitances or mutual inductances affect enough to ensure that oscillation will begin.
the equivalent L and C values in equation, and
therefore the final oscillation frequency. Likewise,
any extra parasitic resistance means that more
negative resistance needs to be generated.
(A) Small-Signal Design Techniques
The small-signal schematic diagram of the
oscillator under consideration is illustrated in
Figure(2). The circuit uses an Infineon BFR181W
Silicon Bipolar as the active device, set to a bias point
of 2V Vce and 15 mA collector current. The resonator
is a 2 GHz quarter wavelength short circuit ceramic
coaxial resonator, available from EPCOS. The
resonator is represented by a parallel LCR model and
the Q is of the order of 350. It is important to note Fig.3 Result of Small-Signal Negative Resistance
that for a 1.4 GHz oscillator a ceramic resonator some Simulation
15 – 40 % higher in nominal resonant frequency is
required. This is because the parallel resonance will The complete closed loop oscillator circuit is next
be pulled down in frequency by the necessary analysed (small-signal) by observing the input
coupling capacitors (4 pF used) and tuning varactor impedance at the ideal transformer. The oscillation
etc. The varactor is a typical Silicon SOD-323 condition is solved by looking for frequencies where
packaged device, represented by a series LCR model, the imaginary part of the impedance goes through
where the C is voltage dependent. The load into zero, whilst maintaining an excess negative
which the circuit oscillates is 50 . At these resistance. It can be seen that the imaginary part goes
frequencies any necessary passive components must through zero at two frequencies, namely 1.35 GHz
include all stray and parasitic elements. The and 2.7 GHz. However, there is no net negative
transmission lines represent the bonding pads on a resistance at 2.7 GHz, while at 1.35 GHz there is
given substrate. The transmission lines have been some –70 . Thus with the component values we
omitted for the sake of clarity. have designed a circuit capable of oscillating at
The oscillator running into its necessary load forms approximately 1.35 GHz.
a closed-loop circuit and cannot be simulated in this
form because of the absence of a port. Therefore an III.MIXER DESIGN
ideal transformer is used to break into the circuit at a RF mixer is an essential part of wireless
convenient point, in this case, between the negative communication systems. Modem wireless
resistance circuit and resonator. It is important to note communication systems demand stringent dynamic
that this element is used for simulation purposes only, range requirements. The dynamic range of a receiver
and is not part of the final oscillator circuit. Being is often limited by the first down conversion mixer.
ideal it does not affect the input impedance at its point This force many compromises between figures of
of insertion. merit such as conversion gain, ~ linearity, dynamic
range, noise figure and port-to-port isolation of the
mixer. Integrated mixers become more desirable than
discrete ones for higher system integration with cost
and space savings. In order to optimize the overall
system performance, there exists a need to examine
the merits and shortcomings of each mixer feasible
for integrated solutions. Since balanced mixer designs
are more desirable in todays integrated receiver
designs due to its lower spurious outputs, higher
common-mode noise rejection and higher port-to-port
isolation, only balanced type mixers are discussed
Fig.2 Schematic for Small-Signal Oscillator Design
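The small-signal oscillation test described in section (A) — a zero crossing of the imaginary part of the closed-loop impedance while the real part stays negative — can be sketched numerically. The impedance function and component values below are illustrative assumptions, not the paper's actual circuit:

```python
import math

def oscillation_candidates(z_of_f, freqs):
    """Scan an impedance-vs-frequency function for oscillation candidates:
    frequencies where Im{Z} changes sign while Re{Z} stays negative."""
    found = []
    for f0, f1 in zip(freqs, freqs[1:]):
        z0, z1 = z_of_f(f0), z_of_f(f1)
        if (z0.imag < 0) != (z1.imag < 0) and z0.real < 0 and z1.real < 0:
            found.append(0.5 * (f0 + f1))
    return found

# Toy closed-loop impedance (assumed values): -70 ohm of excess negative
# resistance in series with L = 5 nH and C = 2.7 pF.
def z(f):
    w = 2 * math.pi * f
    return complex(-70.0, w * 5e-9 - 1.0 / (w * 2.7e-12))

sweep = [1.0e9 + k * 1e6 for k in range(1000)]   # 1.0 to 2.0 GHz, 1 MHz steps
candidates = oscillation_candidates(z, sweep)
```

The single candidate found sits at the series resonance f = 1/(2π√(LC)) of the toy circuit, about 1.37 GHz for these values.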


Fig.4 Schematic of a Single-Balanced Mixer

The design of a single-balanced mixer is discussed here. The single-balanced mixer shown in Fig. (4) is the simplest approach that can be implemented in most semiconductor processes. The single-balanced mixer offers a desired single-ended RF input for ease of application, as it does not require a balun transformer at the input. Though simple in design, it has moderate gain and low noise figure. However, the design has a low 1 dB compression point, low port-to-port isolation, low input IP3 and high input

Fig.5 Transformation Networks Using λ/4-Long Transmission Lines

Figures (6) and (7) show the selectivity curves for different transformation ratios and section numbers.

(B) Exponential lines

Exponential lines have largely frequency-independent transformation properties. The characteristic impedance of such lines varies exponentially with their length l:

Z = Z0·e^(kl)

where k is a constant, but these properties are preserved only if k is small.
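The exponential-line relation Z = Z0·e^(kl) can be illustrated with a short sketch; the taper values below are assumed for illustration only:

```python
import math

def exp_line_impedance(z0, k, l):
    """Characteristic impedance along an exponential line: Z(l) = Z0 * e^(k*l)."""
    return z0 * math.exp(k * l)

# Illustrative taper (assumed values): 50 ohm to 100 ohm over 0.1 m,
# so k = ln(100/50) / 0.1; the properties hold only while k stays small.
k = math.log(100.0 / 50.0) / 0.1
z_end = exp_line_impedance(50.0, k, 0.1)
```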


Some graphic and numerical methods of impedance matching will be reviewed here with reference to high frequency power amplifiers. Although matching networks normally take the form of filters and are therefore also useful to provide frequency discrimination, this aspect will only be considered as a corollary of the matching circuit.

(A) Matching networks using quarter-wave lines

At sufficiently high frequencies, where λ/4-long lines of practical size can be realized, broadband transformation can easily be accomplished by the use of one or more λ/4-sections. Figure (5) summarizes the main relations for (a) one-section and (b) two-section transformation. A compensation network can be realized using a λ/2-long transmission line.

Fig.6 Selectivity Curves for Two λ/4-Section Networks at Different Transformation Ratios

Fig.7 Selectivity Curves for One, Two and Three λ/4-Sections
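For a single λ/4-section, the required line impedance is the geometric mean of the two impedances being matched — a standard transformer relation that complements the one-section case of Figure (5); the 50 Ω / 200 Ω values below are illustrative:

```python
import math

def quarter_wave_z0(z_in, z_load):
    """Single lambda/4-section transformer: the section's characteristic
    impedance is the geometric mean of the impedances being matched."""
    return math.sqrt(z_in * z_load)

# Example: matching a 50 ohm system to a 200 ohm load needs a 100 ohm section.
z_section = quarter_wave_z0(50.0, 200.0)
```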
This paper presented the results of on-going work done in designing and developing a library of 0.18 μm and 0.25 μm CMOS cells for RF applications. The higher operating frequency ranges are expected to occur with the 0.18 μm CMOS cells. The 1000 MHz upper frequency is important because it includes several commercial communications bands. The design goals were met for all the cell library elements. The designed amplifier can be extended with an on-off function by connecting the gate of transistor M2 through a switch to Vdd or ground. Although not fully integrated on chip, this architecture is a good solution for multistandard systems which operate at different frequency bands. In reality, the small-signal simulation is vital to ensure that adequate negative resistance is available for start-up of oscillation. With the emergence of new and more advanced semiconductor processes, the proper integrated mixer circuit topology with the highest overall performance can be devised and implemented.

REFERENCES
[1] D. Jakonis, K. Folkesson, J. Dabrowski, P. Eriksson, C. Svensson, "A 2.4-GHz RF Sampling Receiver Front-End in 0.18um CMOS", IEEE Journal of Solid-State Circuits, vol. 40, no. 6, June 2005, pp. 1265-1277.
[2] C. Ruppel, W. Ruile, G. Schall, K. Wagner, and O. Manner, "Review of Models for Low-Loss Filter Design and Applications," IEEE Ultrasonics Symp. Proc., 1994, pp. 313-324.
[3] H. T. Ahn and D. J. Allstot, "A 0.5-8.5-GHz fully differential CMOS distributed amplifier," IEEE J. Solid-State Circuits, vol. 37, no. 8, pp. 985-993, Aug. 2002.
[4] B. Kleveland, C. H. Diaz, D. Vook, L. Madden, T. H. Lee, and S. S. Wong, "Exploiting CMOS Reverse Interconnect Scaling in Multigigahertz Amplifier and Oscillator Design," IEEE J. Solid-State Circuits, vol. 36, no. 10, pp. 1480-1488, Oct. 2001.
[5] M. Yamaguchi and K.-I. Arai, "Current status and future prospects of RF integrated inductors", J. Magn. Soc. Jpn., vol. 25, no. 2, pp. 59-65, 2001.
[6] N. Ratier, M. Bruniaux, S. Galliou, R. Brendel, J. Delporte, "A very high speed method to simulate quartz crystal oscillator", Proc. of the 19th EFTF, Besançon, March 2005, to be published.
[7] J. Park, C.-H. Lee, B. Kim, and J. Laskar, "A low flicker noise CMOS mixer for direct conversion receivers," presented at the IEEE MTT-S Int. Microw. Symp., Jun. 2006.
[8] T. H. Lee, The Design of CMOS Radio-Frequency Integrated Circuits. Cambridge, United Kingdom: Cambridge University Press, 2002.
[9] D. Pienkowski, R. Kakerow, M. Mueller, R. Circa, and G. Boeck, "Reconfigurable RF Receiver Studies for Future Wireless Terminals," Proceedings of the European Microwave Association, vol. 1, June 2005.
[10] S. Camerlo, "The Implementation of ASIC Packaging, Design, and Manufacturing Technologies on High Performance Networking Products," 2005 Electronic Components and Technology Conference Proceedings, June 2005, pp. 927-932.
[11] J. Grad, J. Stine, "A Standard Cell Library for Student Projects", Technical Report, Illinois Institute of Technology, 2002, cad/scells
[12] D. Stone, J. Schroeder, R. Kaplan, and A. Smith, "Analog CMOS Building Blocks for Custom and Semicustom Applications", IEEE JSSC, vol. SC-19, no. 1, February 1984.


A High-Performance Clustering VLSI Processor Based On

The Histogram Peak-Climbing Algorithm
I.Poornima Thangam,
II M.E. (VLSI Design),
M.Thangavel M.E., (Ph.D).,
Professor, Dept. of ECE,
K.S.Rangasamy College of Technology,
Tiruchengode, Namakkal District.

Abstract – In computer vision systems, image feature separation is a very difficult and important step. An efficient and powerful approach is to do unsupervised clustering of the resulting data set. This paper presents the mapping of the unsupervised histogram peak-climbing clustering algorithm to a novel high-speed architecture suitable for VLSI implementation and real-time performance. It is the first special-purpose architecture that has been proposed for this important problem of clustering massive amounts of data, which is a very computationally intensive task, and the performance is improved by making the architecture truly systolic. The architecture has also been prototyped using a Xilinx FPGA development environment.

Key words – Data clustering, systolic architecture, peak climbing algorithm, VLSI design, FPGA

Components of a Clustering Task

Typical pattern clustering activity involves the following steps [4], as shown in Fig.1: pattern representation (optionally including feature extraction and/or selection); definition of a pattern proximity measure appropriate to the data domain; clustering or grouping; data abstraction (if needed); and assessment of output (if needed).


As new algorithms are developed using a paradigm of off-line, non-real-time implementation, there is often a need to adapt and advance hardware architectures to implement algorithms in a real-time manner if they are to truly serve a useful purpose in industry and defense. This paper presents a high-performance, systolic architecture [2], [3], [4] for the important task of unsupervised data clustering. Special attention is paid to the clustering of information-rich features used for color image segmentation, and an "orders of magnitude" performance increase from the current implementation on a generic compute platform.

Clustering for image segmentation

Special attention is given to the image segmentation application in this proposed architecture. The target segmentation algorithm, described in [5], relies on scanning images or video frames with a sliding window and extracting features from each window. The texture, or the pixels bounded by each window, is characterized using mathematically modeled features. Once all features are extracted from the sliding windows, they are clustered in the feature space. The mapping of the identified clusters back into the image domain results in the desired segmentation.

Fig.1 Stages in Data Clustering

II. SIGNIFICANCE OF THE PROPOSED ARCHITECTURE

The main consideration for the implementation of the clustering algorithm as a dedicated architecture is its simplicity and highly nonparametric nature, where very few inputs are required into the system. These characteristics lend the algorithm to implementation in an FPGA environment, so it can form part of a flexible and reconfigurable real-time computing platform for video frame segmentation. This system is depicted in Fig.2, and it can also serve as a rapid prototyping image segmentation platform.

This is the first special-purpose architecture that has been proposed for the important problem of clustering massive amounts of feature data generated during the real-time segmentation [1]. During image segmentation, it is highly desirable to be able to


choose the best fitting features and/or clustering method based on the problem domain [6], type of imagery, or lighting conditions.

Fig. 2. Reconfigurable real-time computing platform for video frame segmentation.

This section describes the clustering algorithm implemented in this work. The histogram peak-climbing approach is used for clustering features extracted from the input image.

A. Histogram Generation

Given M features f of dimensionality N to be clustered, the first step is to generate a histogram of N dimensions [5], [7]. This histogram is generated by quantizing each dimension, for each of the M f(k) feature members, with a cell size

CS(k) = (f max(k) − f min(k)) / Q          (1)

where:
N → dimensions of the features;
CS(k) → length of the histogram cell in the kth dimension;
f max(k) → maximum value of the kth dimension of the features;
f min(k) → minimum value of the kth dimension of the M features;
Q → total number of quantization levels for each dimension of the N-dimensional histogram;
dk → index for a histogram cell in the kth dimension associated with a given feature.

Since the dynamic range of the vectors in each dimension can be quite different, the cell size for each dimension could be different. Hence, the cells will be hyper-boxes. This provides efficient dynamic range management of the data, which tends to enhance the quality and accuracy of the results. Next, the number of feature vectors falling in each hyper-box is counted and this count is associated with the respective hyper-box, creating the required histogram [8].

B. Peak-climbing Approach

After the histogram is generated in the feature space, a peak-climbing clustering approach is utilized to group the features into distinct clusters [9]. This is done by locating the peaks of the histogram.

Fig. 3. Illustration of the peak climbing approach for a two-dimensional feature space example.

In Fig.3, this peak-climbing approach is illustrated for a two-dimensional space example. A peak is defined as a cell with the largest density in its neighborhood. A peak and all the cells that are linked to it are taken as a distinct cluster representing a mode in the histogram. Once the clusters are found, these can be mapped back to the original data domain from which the features were extracted. Features grouped in the same cluster are tagged as belonging to the same category.
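The histogram generation and peak-climbing steps above can be sketched in software. This is a minimal behavioural model of the algorithm (the helper names and the one-dimensional test data are our own), not the VLSI implementation:

```python
from collections import Counter
from itertools import product

def build_histogram(features, Q):
    """N-dimensional histogram: quantize each dimension into Q levels of
    size CS(k) = (fmax(k) - fmin(k)) / Q, then count vectors per hyper-box."""
    N = len(features[0])
    fmin = [min(f[k] for f in features) for k in range(N)]
    fmax = [max(f[k] for f in features) for k in range(N)]
    CS = [((fmax[k] - fmin[k]) / Q) or 1.0 for k in range(N)]
    def cell(f):
        return tuple(min(int((f[k] - fmin[k]) / CS[k]), Q - 1) for k in range(N))
    return Counter(cell(f) for f in features)

def peak_climb(hist):
    """Link every occupied cell to its densest neighbour and follow the links
    to a local peak; cells sharing a peak form one cluster."""
    def neighbours(c):
        return [t for t in product(*[(x - 1, x, x + 1) for x in c]) if t in hist]
    def climb(c):
        while True:
            best = max(neighbours(c), key=lambda t: (hist[t], t))
            if best == c:
                return c
            c = best
    return {c: climb(c) for c in hist}

# Two well-separated 1-D modes should yield two distinct peaks.
data = [(0.00,), (0.05,), (0.10,), (0.90,), (0.95,), (1.00,)]
hist = build_histogram(data, Q=4)
labels = peak_climb(hist)
```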



Fig.4 shows the different steps of this

implementation of the clustering algorithm and the
overall architecture. The chosen architecture follows a
globally systolic partition. For the hardware
implementation, the dataset to be clustered and Q are
inputs to the system, and the clustered data and the
number of clusters found are outputs of the system.

Fig.4. Peak climbing clustering algorithm overall architecture


A. Overall Steps

The main steps performed by this architecture are:

A. Feature selection / Extraction
B. Inter pattern similarity
C. Clustering / Grouping
i. Histogram Generation
ii. Identifying Peaks
iii. Finding the peak indices
iv. Link Assignment
The input image is digitized and the features (intensity here) of each pixel are found. Then the inter-pattern similarity is identified by calculating the number of pixels with equal intensity. The clustering of features is done in four steps, namely, generation of the histogram, peak identification, finding the corresponding peak index for each and every peak [10], [11], and at last the link assignment by setting a threshold value, by which the features are clustered.

Fig.6. Cell size CS(k) processing element

Fig.7 shows the details of the PE to compute histogram indexes for each data vector. N PEs are instantiated in parallel; one for each dimension.

B. Architectural Details

This section presents the architectural details of the

processor. Fig.5 shows the PE for the operation of
finding the minimum and maximum values for each
dimension of the feature vectors in the data set. N
Min-Max PEs are instantiated in parallel, one for each
dimension. The operations to find the minimum and
maximum values are run sequentially, thus, making
use of a single MIN/MAX cell in the PE.
Fig.6 shows the details of the PE to compute the Cell
Size CS(k) for each dimension. N PEs are instantiated
in parallel; one for each dimension. Because of the
high dimensionality of Random Field models, the
number of quantization levels in each dimension
necessary for effective and efficient clustering is very
small, that is, Q = 3 … 8. This allows the division
operation of Equation 1 to be implemented by a
multiplication by the inverse of Q stored in a small
look-up table (LUT).
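The multiply-by-reciprocal LUT trick can be sketched as follows; the 16-bit fraction width is an assumption for illustration, and for the small operand ranges used here the result matches true integer division:

```python
FRAC = 16                                              # fixed-point fraction bits
RECIP = {q: -((-1 << FRAC) // q) for q in range(3, 9)} # LUT of ceil(2^16 / q)

def div_by_q(x, q):
    """Divide by Q with a LUT multiply and a shift, standing in for a
    hardware divider (Q = 3 ... 8 as in the text)."""
    return (x * RECIP[q]) >> FRAC
```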
Fig.7. Index processing element

Fig.8 shows the details of the PE to allocate

and identify a data vector with a given histogram bin.
One instantiation per each possible bin is made. The
purpose of the compressor is to count the number of
ones from the comparators, which corresponds to the
density of a given bin in the histogram.

The rest of the micro-architecture to establish

the links between the histogram bins, and assign the
clusters, so that the results can be output, follows a
very similar structure as Fig.8. The only notable
exception is that the PE uses a novel computational
cell to calculate the norm between two 22 dimensional
vectors. This cell is shown in Fig.9.

Fig.5. Min-Max processing element


Fig.8. Processing Element used to allocate vectors to histogram bins

VI. RESULTS
This paper describes a high performance
VLSI architecture for the clustering of high
dimensionality data. This architecture can be used
in many military, industrial, and commercial
applications that require real-time intelligent
machine vision processing. However, the approach
is not limited to this type of signal processing only,
but it can also be applied to other types of data for
other problem domains, for which the clustering
process needs to be accelerated. In this paper, the
performance of the processor has been improved
by making the architecture systolic.


In the future, there is the possibility of processing data in floating-point format, and also of implementing the architecture across several FPGA chips to address larger resolutions.

Fig. 9. Neighbor detector


The architecture has been prototyped using a

Xilinx FPGA development environment. The issue
of cost effectiveness for FPGA implementation is
somewhat secondary for reconfigurable computing
platforms. The main advantage of FPGAs is their flexibility. The target device here is the Virtex-II.



REFERENCES
[1] O. J. Hernandez, “A High-Performance VLSI

Architecture for the Histogram Peak-Climbing Data
Clustering Algorithm,” IEEE Trans. Very Large
Scale Integr.(VLSI) Syst.,vol.14,no.2,pp. 111-121,
Feb. 2006.
[2] M.-F. Lai, C.-H. Hsieh and Y.-P. Wu, “A VLSI
architecture for clustering analyzer using systolic
arrays,” in Proc. 12th IASTED Int. Conf. Applied
Informatics, May 1994, pp. 260–260.
[3] M.-F. Lai, Y.-P. Wu, and C.-H. Hsieh, “Design of
clustering analyzer based on systolic array
architecture,” in Proc. IEEE Asia-Pacific Conf.
Circuits and Systems, Dec. 1994, pp. 67–72.
[4] M.-F. Lai, M. Nakano, Y.-P.Wu, and C.-H. Hsieh,
“VLSI design of clustering analyzer using systolic
arrays,” Inst. Elect. Eng. Proc.: Comput. Digit. Tech.,
vol. 142, pp. 185–192, May 1995.
[5] A. Khotanzad and O. J. Hernandez, “Color image
retrieval using multispectral random field texture
model and color content features,” Pattern Recognit.
J., vol. 36, pp. 1679–1694, Aug. 2003.
[6] M.-F. Lai and C.-H. Hsieh, “A novel VLSI
architecture for clustering analysis,” in Proc. IEEE
Asia Pacific Conf. Circuits and Systems, Nov. 1996,
pp. 484–487.
[7] O. J. Hernandez, “High performance VLSI
architecture for data clustering targeted at computer
vision,” in Proc. IEEE SoutheastCon, Apr. 2005, pp.
[8] A. Khotanzad and A. Bouarfa, “Image
segmentation by a parallel, nonparametric histogram
based clustering algorithm,” Pattern Recognit.J, vol.
23, pp. 961–963, Sep. 1990.
[9] S. R. J. R. Konduri and J. F. Frenzel, “Non-
linearly separable cluster classification: An
application for a pulse-coded CMOS neuron,” in
Proc.Artificial Neural Networks in Engineering Conf.,
vol. 13, Nov. 2003, pp. 63–67.
[10] /clustering/tutorial_html
[11] /cluster analysis


Reconfigurable CAM-Improving the Effectiveness of Data

Access in ATM Networks
Sam Alex 1, B.Dinesh2, S. Dinesh kumar 2
1: Lecturer, 2: Students, Department of Electronics and Communication Engineering,
JAYA Engineering College, Thiruninravur, Near Avadi, Chennai-602024.

Abstract - Content addressable memory is an expensive component in fixed-architecture systems; however, it may prove to be a valuable tool in online architectures (that is, run-time reconfigurable systems with an online decision algorithm to determine the next reconfiguration). Using an ATM, customers access their bank account in order to make cash withdrawals (or credit card cash advances) and check their account balances. The existing system has dedicated lines from a common server. In this paper we use the concept of IP address matching using reconfigurable content addressable memories (RCAM), by means of which we replace the dedicated lines connected to the different ATMs with a single line, where every ATM is associated with an individual RCAM circuit. We implement the RCAM circuit using finite state machines. Thus we improve the efficiency with which data is accessed. We have also made efficient cable usage, bringing in less maintenance and making the overall design cheap.

Keywords: Content addressable memory, ATM, RCAM, Finite state machines, look up table, packet forwarding.

A Content-Addressable Memory (CAM) compares input search data against a table of stored data, and returns the address of the matching data [1]–[5]. CAMs have a single clock cycle throughput, making them faster than other hardware- and software-based search systems. CAMs can be used in a wide variety of applications requiring high search speeds. These applications include parametric curve extraction [6], Hough transformation [7], Huffman coding/decoding [8], [9], Lempel–Ziv compression [10]–[13], and image coding [14]. The primary commercial application of CAMs today is to classify and forward Internet protocol (IP) packets in network routers [15]–[20]. In networks like the Internet, a message such as an e-mail or a Web page is transferred by first breaking up the message into small data packets of a few hundred bytes and then sending each data packet individually through the network. These packets are routed from the source, through the intermediate nodes of the network (called routers), and reassembled at the destination to reproduce the original message. The function of a router is to compare the destination address of a packet to all possible routes, in order to choose the appropriate one. A CAM is a good choice for implementing this lookup operation due to its fast search capability. In this paper, we present a novel architecture for a content addressable memory that provides arbitrary tag and data widths. The intention is that this block would be incorporated into a Field-Programmable Gate Array or a Programmable Logic Core; however, it can be used whenever post-fabrication flexibility of the CAM is desired [21].

II. CONTENT ADDRESSABLE MEMORIES

Content Addressable Memory (CAM) is a hardware search engine that is much faster than algorithmic approaches for search-intensive applications. CAMs are a class of parallel pattern matching circuits. In one mode these circuits operate like standard memory circuits and may be used to store binary data. Unlike standard memory circuits, a powerful match mode is available. This allows all of the data in the CAM to be searched in parallel. In the match mode, each memory cell in the array is accessed in parallel and compared to some value. If the value is found, a match signal is generated. In some implementations, all that is significant is that a match for the data is found. In other cases, it is desirable to know exactly where in the memory address space this data was located. Rather than producing a simple match signal, the CAM supplies the address of the matching data. The CAM compares input search data against a table of stored data, and returns the address of the matching data. CAMs have a single clock cycle throughput, making them faster than other hardware- and software-based search systems. CAMs can be used in a wide variety of applications requiring high search speeds. These applications include parametric curve extraction, Hough transformation, Huffman coding/decoding, Lempel–Ziv compression, and image coding.

Fig.1. Block Diagram of a CAM Cell
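The match mode described above, including the don't-care "X" convention used in the routing example later in the paper, can be sketched as follows. The function name and list-based storage are illustrative; a real CAM compares all stored words in parallel in a single cycle:

```python
def cam_search(table, key):
    """Match-mode sketch: compare the key against every stored word (in
    hardware this happens in parallel) and return the matching addresses.
    A stored 'X' bit is a don't-care that matches either input value."""
    def hit(word):
        return all(w in ("X", k) for w, k in zip(word, key))
    return [addr for addr, word in enumerate(table) if hit(word)]

table = ["101XX", "0110X", "011XX", "10011"]   # the paper's routing entries
assert cam_search(table, "10100") == [0]       # only entry 1 matches
assert cam_search(table, "01101") == [1, 2]    # entries 2 and 3 both match
```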


A. Working of CAM

Writing to a CAM is exactly like writing to a conventional RAM. However, the "read" operation is actually a search of the CAM for a match to an input "tag." In addition to storage cells, the CAM requires one or more comparators. Another common scheme involves writing to consecutive locations of the CAM as new data is added. The outputs are a MATCH signal (along with an associated MATCH VALID signal) and either an encoded N-bit value or a one-hot-encoded bus with one match bit corresponding to each CAM cell.

The multi-cycle CAM architecture tries to find a match to the input data word by simply sequencing through all memory locations: reading the contents of each location, comparing the contents to the input value, and stopping when a match is found. At that point, MATCH and MATCH VALID are asserted. If no match is found, MATCH is not asserted, but MATCH VALID is asserted after all addresses are compared; MATCH VALID indicates the end of the read cycle. In other words, MATCH VALID asserted and MATCH not asserted indicates that all the addresses have been compared during a read operation and no matches were found.

When a match is found, the address of the matching data is provided as an output and the MATCH signal is asserted. It is possible that multiple locations might contain matching data, but no checking is done for this. Storage for the multi-cycle CAM can be either in distributed RAM (registers) or block RAM.

B. Packet Forwarding Using CAM

We describe the application of CAMs to packet forwarding in network routers. First, we briefly summarize packet forwarding and then show how a CAM implements the required operations. Network routers forward data packets from an incoming port to an outgoing port, using an address-lookup function. The address-lookup function examines the destination address of the packet and selects the output port associated with that address. The router maintains a list, called the routing table, that contains destination addresses and their corresponding output ports. An example of a simplified routing table is displayed in Table I.

Example Routing Table
Entry   Address (Binary)   Output port
1       101XX              A
2       0110X              B
3       011XX              C
4       10011              D
Fig.3. Simple routing table
M bits
M-1 bits 01101 Port B
Last data word of previous entry 0
First tag word 1 Fig. 4. CAM based implementation of he routing
Second tag word 0 table.
Third tag word 0
All four entries in the table are 5-bit words,
First data word 0
with the don’t car care bit “X”, matching both a 0
Second data word 0
and a 1 in that position. Due to the “X” bits, the first
Third data word 0
three entries in the Table represent a range of input
Fourth data word 0
addresses, i.e., entry 1 maps all addresses in the range
First tag word of next entry 1 10100 to 10111 to port A. The router searches this
One entry with 3*(M-1) tag bits and 4*(M-1) data table for the destination address of each incoming
bits packet, and selects the appropriate output port.
Fig.2. Cam Storage Organization For example, if the router receives a packet
In a typical CAM, each memory word is With the destination address 10100, the packet is
divided into two parts. 1. A tag field .2.An address forwarded to port A. In the case of the incoming
field. Each tag field is associated with one address 01101, the address lookup matches both entry
comparator. Each comparator compares the associated 2 and entry 3 in the table. Entry 2 is selected since it
tag with the input tag bits, and if a match occurs, the has the fewest “X” bits, or alternatively it has the
corresponding data bits are driven out of the memory. longest prefix, indicating that it is the most direct
Although this is fast, it is not suitable for applications route to the destination. This lookup method is called
with a very wide tag or data width. Wide tags lead to longest-prefix matching.
large comparators, which are area inefficient, power Fig illustrates how a CAM accomplishes address
hungry, and often slow. lookup by implementing the routing table shown in
Table able I. On the left of Fig, the packet destination-


address of 01101 is the input to the CAM. As in the classified. This typically involves some type of
table, two locations match, with the (priority) encoder search. Current software based approaches rely on
choosing the upper entry and generating the match standard search schemes such as hashing.
location 01, which corresponds to the most-direct This results in savings not only in the cost of the
route. processor itself, but in other areas such as power
This match location is the input address to a RAM consumption and overall system cost. In addition, an
that contains a list of output ports, as depicted in Fig. external CAM provides networking hardware with the
A RAM read operation outputs the port designation, ability to achieve packet processing in essentially
port B, to which the incoming Packet is forwarded. constant time. Provided all elements to be matched fit
We can view the match location output of the CAM in the CAM circuit, the time taken to match is
as a pointer that retrieves the associated word from independent of the number of items being matched.
the RAM. In the particular case of pack packet
forwarding the- associated word is the designation of
the output port. This CAM/RAM System is a
complete implementation of an address-lookup engine
for packet forwarding.
The Reconfigurable Content Addressable & & &
Memory or RCAM makes use of run-time
reconfiguration to efficiently implement a CAM
circuit. Rather than using the FPGA flip-flops to store
the data to be matched, the RCAM uses the FPGA
Look up Tables or LUTs. Using LUTs rather than Fig.6. IP Match circuit using the RCAM.
flip-flops results in a smaller, faster CAM. The
approach uses the LUT to provide a small piece of Figure above shows an example of an IP
CAM functionality. In Figure, a LUT is loaded with Match circuit constructed using the RCAM approach.
data which provides\match 5" functionality. That is, Note that this example assumes a basic 4-input LUT
whenever the binary encoded value \5" is sent to the structure for simplicity. Other optimizations,
four LUT inputs, a match signal is generated. including using special-purpose hardware such as
4 input LUT carry chains are possible and may result in substantial
circuit area savings and clock speed increases.
This circuit requires one LUT input per
matched bit. In the case of a 32-bit IP address, this
circuit requires 8 LUTs to provide the matching, and
three additional 4-input LUTs to provide the ANDing
for the MATCH signal. An array of this basic 32-bit
Fig. 5. A simple Look up Table. … 000100 matching block may be replicated in an array to
produce the CAM circuit.
Note that using a LUT to implement CAM
functionality, or any functionality for that matter, is IV. FINITE STATE MACHINES
not unique. An N-input LUT can implement any
arbitrary function of N inputs; including a CAM. This If a combinational logic circuit is an
circuit demonstrates the ability to embed a mask in implementation of a Boolean function, then,
the configuration of a LUT, permitting arbitrary sequential circuit can be considered an
disjoint sets of values to be matched, within the LUT. implementation of finite state machine.
This function is important in many matching The goal of FSM is not accepting or rejecting things,
applications, particularly networking. This approach but generating a set of outputs, given, a set of inputs.
can be used to provide matching circuits such as They describe how the inputs are being processed,
match all or match none or any combination of based on input and state, to generate outputs. This
possible LUT values. FSM uses only entry actions, that is, output depends
One currently popular use for CAMs is in only on the state.
networking. Here data must be processed under
demanding real-time constraints. As packets arrive, In a Moore finite state machine, the output of the
their routing information must be processed. In circuit is dependent only on the state of the machine
particular, destination addresses, typically in the form and not its inputs. This FSM uses only input actions,
of 32-bit Internet Protocol (IP) addresses must be
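To make the Moore/Mealy contrast concrete, here is a small software sketch of the two machine types. The rising-edge-detector behavior, state names and transition tables are invented for illustration; they are not from the paper.

```python
# Illustrative rising-edge detector as a Moore machine and a Mealy machine.

def run_moore(bits):
    """Moore: the output is a function of the state alone (entry action)."""
    next_state = {("S0", 0): "S0", ("S0", 1): "EDGE", ("EDGE", 0): "S0",
                  ("EDGE", 1): "S1", ("S1", 0): "S0", ("S1", 1): "S1"}
    output_of_state = {"S0": 0, "EDGE": 1, "S1": 0}   # one value per state
    state, outs = "S0", []
    for b in bits:
        state = next_state[(state, b)]
        outs.append(output_of_state[state])           # read after the transition
    return outs

def run_mealy(bits):
    """Mealy: the output is a function of state and input together."""
    state, outs = "S0", []
    for b in bits:
        outs.append(1 if (state, b) == ("S0", 1) else 0)  # output on the edge itself
        state = "S1" if b else "S0"
    return outs

bits = [0, 1, 1, 0, 1]
print(run_moore(bits))  # [0, 1, 0, 0, 1]
print(run_mealy(bits))  # [0, 1, 0, 0, 1]
```

The Mealy machine needs only two states (last bit 0 or 1), while the Moore machine needs a third state solely to express the output pulse in its state-output function; this is the state-count reduction referred to in the text.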


In a Mealy finite state machine, the output is dependent both on the machine state and on the inputs to the finite state machine; a Mealy FSM uses input actions, and its use generally results in a reduction in the number of states. Notice that in this case, outputs can change asynchronously with respect to the clock.

One of the best ways of describing a Mealy finite state machine is by using two always statements: one for describing the sequential logic, and one for describing the combinational logic (this includes both the next-state logic and the output logic). It is necessary to do this since any changes on the inputs directly affect the outputs; the combinational logic is described in terms of the inputs and the state of the machine, which is held in a reg variable.

V. THE AUTOMATED TELLER MACHINE

An ATM is also known, in English, as an Automated Banking Machine, Money Machine or Bank Machine.

A. Usage

On most modern ATMs, the customer identifies him or herself by inserting a plastic card with a magnetic stripe, or a plastic smartcard with a chip, that contains his or her card number and some security information, such as an expiration date or CVC (CVV). The customer then verifies their identity by entering a pass code, often referred to as a Personal Identification Number (PIN), on an Encrypting PIN Pad (EPP).

B. Types of ATMs

There are mono-function devices, in which only one type of mechanism for financial transactions is present (such as cash dispensing or statement printing), and multi-function devices, which incorporate multiple mechanisms to perform multiple services (such as accepting deposits, dispensing cash, printing statements, etc.) all within a single footprint. There is now a capability (in the U.S. and Europe, at least) for no-envelope deposits with a unit called a Batch- or Bunch-Note Acceptor (BNA) that will accept up to 40 bills at a time. There is another unit called a Cheque Processing Machine, or Module (CPM), that will accept a cheque, take a picture of both sides, read the magnetic ink code line at the bottom of the cheque, read the amount written on it, and capture it into a bin, giving the customer instant access to the money, if the account allows.

There are also two types of ATM installations: on-premise and off-premise. On-premise ATMs are typically more advanced, multi-function machines that complement an actual bank branch's capabilities, and are thus more expensive. Off-premise machines are deployed by financial institutions and also by ISOs (Independent Sales Organizations) where there is usually just a straight need for cash, so they typically are the cheaper mono-function devices.

C. Hardware

An ATM typically is made up of the following devices: a CPU (to control the user interface and transaction devices); a magnetic and/or chip card reader (to identify the customer); a PIN pad (similar in layout to a touch-tone or calculator keypad), often manufactured as part of a secure enclosure; a secure cryptoprocessor, generally within a secure enclosure; a display (used by the customer for performing the transaction); function key buttons (usually close to the display) or a touch screen (used to select the various aspects of the transaction); a record printer (to provide the customer with a record of the transaction); a vault (to store the parts of the machinery requiring restricted access); housing (for aesthetics and to attach signage to); a Cheque Processing Module; and a Batch Note Acceptor. Recently, due to heavier computing demands and the falling price of computer-like architectures, ATMs have moved away from custom hardware architectures using microcontrollers and/or application-specific integrated circuits, and have adopted a hardware architecture that is very similar to a personal computer. Many ATMs are now able to use operating systems such as Microsoft Windows and Linux. Although it is undoubtedly cheaper to use commercial off-the-shelf hardware, this does make ATMs vulnerable to the same sort of problems exhibited by conventional computers.

D. Future

ATMs were originally developed as just cash dispensers; they have evolved to include many other bank-related functions. ATMs can also act as an advertising channel for companies to advertise their own products or third-party products and services.

VI. IMPLEMENTATION

Fig.9. Current ATM System


Having seen the details of current ATM systems, we can understand that these systems have dedicated lines to their servers. In the current system, the server checks all the users' information and then gives the details of the user who inserted the card.

Fig.10. Proposed system with RCAM

Fig.11. Block Diagram

The data packets coming from the server are available to all the RCAMs simultaneously. The packets are formatted to have a 32-bit IP address followed by 10 bits of data. All the RCAMs receive the data packets (DP) simultaneously, and the IP addresses of the data packets are traced by all four RCAM circuits. Each RCAM performs its matching and determines whether the IP address of the packet matches the IP address of its ATM. In case the match occurs with the address of the first ATM, the following 10 bits of data are taken by the first ATM alone; if there is a mismatch in the IP address, the following 10 bits of data are not taken.

We initially set up a CAM array consisting of four CAMs: cam0, cam1, cam2 and cam3 (one for each of the four ATMs taken as a sample). In each element of the CAM array, we have variables ranging from 0 to 31, in order to take up the 32-bit IP address fed in at runtime: cam0(0), cam0(1), ..., cam0(31), and similarly for all four CAMs. At runtime, we decide the IP address of the ATMs and force it onto the channels. We also send the 10 bits of data following the IP address. The 'dsel' pin is set up such that, if the IP address is forced on the channel for cam0, 'dsel' becomes 0001, so that the transmitted data appears as output on the 'tout0' pin. Similarly, when the address for cam1 is forced, 'dsel' becomes 0010, so that the transmitted data appears as output on the 'tout1' pin.

VII. RESULTS

Fig.12. The first CAM's IP address is read
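The dispatch behavior described in the implementation section, a 32-bit address match deciding which ATM latches the following 10 data bits, with a one-hot 'dsel', can be modeled roughly as follows. The sample addresses and helper names are invented for illustration.

```python
# Rough software model of the RCAM-based ATM dispatch (illustrative names):
# each of four RCAM blocks holds one 32-bit address, and only the matching
# unit latches the 10 data bits that follow.

ATM_ADDRESSES = ["{:032b}".format(n) for n in
                 (0xC0A80001, 0xC0A80002, 0xC0A80003, 0xC0A80004)]

def dispatch(packet_bits):
    """packet_bits: 42-bit string = 32-bit address + 10 data bits.
    Returns (one-hot dsel string, list of per-ATM latched data)."""
    addr, data = packet_bits[:32], packet_bits[32:]
    tout = [None] * 4
    dsel = ["0"] * 4
    for i, cam_addr in enumerate(ATM_ADDRESSES):
        if addr == cam_addr:      # all four CAMs compare in parallel
            dsel[i] = "1"         # one-hot select, e.g. 0001 for cam0
            tout[i] = data        # only this ATM takes the 10 data bits
    return "".join(reversed(dsel)), tout

packet = ATM_ADDRESSES[1] + "1010101010"
dsel, tout = dispatch(packet)
print(dsel)     # 0010  (cam1 matched)
print(tout[1])  # 1010101010
```

As in the described design, a mismatching unit latches nothing: its 'tout' entry stays empty.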


Fig.13. The data for the first CAM is read

Fig.14. The addresses of two CAMs are taken

VIII. CONCLUSION

Today, advances in circuit technology permit large CAM circuits to be built. However, uses for CAM circuits are not necessarily limited to niche applications like cache controllers or network routers. Any application which relies on searching data can benefit from a CAM-based approach.

In addition, the use of parallel matching hardware in the form of CAMs can provide another, more practical benefit. For many applications, the use of CAM-based parallel search can offload much of the work done by the system processor. This should permit smaller, cheaper and lower-power processors to be used in embedded applications which can make use of CAM-based parallel search.

The RCAM is a flexible, cost-effective alternative to existing CAMs. By using FPGA technology and run-time reconfiguration, fast, dense CAM circuits can be easily constructed, even at run-time. In addition, the size of the RCAM may be tailored to a particular hardware design or even to temporary changes in the system. This flexibility is not available in other CAM solutions. Furthermore, the RCAM need not be a stand-alone implementation. Since the RCAM is entirely a software solution using state-of-the-art FPGA hardware, it is quite easy to embed RCAM functionality in larger FPGA designs.

Finally, we believe that existing applications, primarily in the field of network routing, are just the beginning of RCAM usage. Once other applications realize that simple, fast, flexible parallel matching is available, it is likely that other applications and algorithms will be accelerated.

REFERENCES

[1] T. Kohonen, Content-Addressable Memories, 2nd ed. New York: Springer-Verlag, 1987.
[2] L. Chisvin and R. J. Duckworth, "Content-addressable and associative memory: alternatives to the ubiquitous RAM," IEEE Computer, vol. 22, no. 7, pp. 51–64, Jul. 1989.
[3] K. E. Grosspietsch, "Associative processors and memories: a survey," IEEE Micro, vol. 12, no. 3, pp. 12–19, Jun. 1992.
[4] I. N. Robinson, "Pattern-addressable memory," IEEE Micro, vol. 12, no. 3, pp. 20–30, Jun. 1992.
[5] S. Stas, "Associative processing with CAMs," in Northcon/93 Conf. Record, 1993, pp. 161–167.
[6] M. Meribout, T. Ogura, and M. Nakanishi, "On using the CAM concept for parametric curve extraction," IEEE Trans. Image Process., vol. 9, no. 12, pp. 2126–2130, Dec. 2000.
[7] M. Nakanishi and T. Ogura, "Real-time CAM-based Hough transform and its performance evaluation," Machine Vision Appl., vol. 12, no. 2, pp. 59–68, Aug. 2000.
[8] E. Komoto, T. Homma, and T. Nakamura, "A high-speed and compact-size JPEG Huffman decoder using CAM," in Symp. VLSI Circuits Dig. Tech. Papers, 1993, pp. 37–38.
[9] L.-Y. Liu, J.-F. Wang, R.-J. Wang, and J.-Y. Lee, "CAM-based VLSI architectures for dynamic Huffman coding," IEEE Trans. Consumer Electron., vol. 40, no. 3, pp. 282–289, Aug. 1994.
[10] B. W. Wei, R. Tarver, J.-S. Kim, and K. Ng, "A single chip Lempel-Ziv data compressor," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), vol. 3, 1993, pp. 1953–1955.
[11] R.-Y. Yang and C.-Y. Lee, "High-throughput data compressor designs using content addressable memory," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), vol. 4, 1994, pp. 147–150.
[12] C.-Y. Lee and R.-Y. Yang, "High-throughput data compressor designs using content addressable memory," IEE Proc. Circuits, Devices and Syst., vol. 142, no. 1, pp. 69–73, Feb. 1995.
[13] D. J. Craft, "A fast hardware data compression algorithm and some algorithmic extensions," IBM J. Res. Devel., vol. 42, no. 6, pp. 733–745, Nov. 1998.
[14] S. Panchanathan and M. Goldberg, "A content-addressable memory architecture for image coding using vector quantization," IEEE Trans. Signal Process., vol. 39, no. 9, pp. 2066–2078, Sep. 1991.
[15] T.-B. Pei and C. Zukowski, "VLSI implementation of routing tables: tries and CAMs," in Proc. IEEE INFOCOM, vol. 2, 1991, pp. 515–524.
[16] T.-B. Pei and C. Zukowski, "Putting routing tables in silicon," IEEE Network Mag., vol. 6, no. 1, pp. 42–50, Jan. 1992.
[17] A. J. McAuley and P. Francis, "Fast routing table lookup using CAMs," in Proc. IEEE INFOCOM, vol. 3, 1993, pp. 1282–1391.
[18] N.-F. Huang, W.-E. Chen, J.-Y. Luo, and J.-M. Chen, "Design of multi-field IPv6 packet classifiers using ternary CAMs," in Proc. IEEE GLOBECOM, vol. 3, 2001, pp. 1877–1881.
[19] G. Qin, S. Ata, I. Oka, and C. Fujiwara, "Effective bit selection methods for improving performance of packet classifications on IP routers," in Proc. IEEE GLOBECOM, vol. 2, 2002, pp. 2350–
[20] H. J. Chao, "Next generation routers," Proc. IEEE, vol. 90, no. 9, pp. 1518–1558, Sep. 2002.
[21] C. J. Jones and S. J. E. Wilton, "Content Addressable Memory with Cascaded Match, Read and Write Logic in a Programmable Logic Device," U.S. Patent 6,622,204, issued Sept. 16, 2003, assigned to Cypress Semiconductor Corporation.


Design of Multistage High Speed Pipelined RISC

Manikandan Raju, Prof. S. Sudha
Electronics and Communication Department (PG),
Sona College of Technology, Salem.

Abstract - The paper describes the architecture and design of the pipelined execution unit of a 32-bit RISC processor. Organization of the blocks in the different stages of the pipeline is done in such a way that the pipeline can be clocked at high frequency. Control and forwarding of data flow among the stages are taken care of by dedicated hardware logic. The different blocks of the execution unit and the dependencies among them are explained in detail with the help of relevant block diagrams. The design has been modeled in VHDL, and the functional verification policies adopted for it are described thoroughly. Synthesis of the design is carried out at 0.13-micron standard cell technology.

Keywords: ALU, Pipeline, RISC, VLSI, Multistage

I. INTRODUCTION

The worldwide development of high-end, sophisticated digital systems has created a huge demand for high-speed, general-purpose processors. The performance of processors has increased exponentially since their launch in 1970. Today's high-performance processors have a significant impact on the commercial marketplace. This high growth rate of processors is possible due to dramatic technical advances in computer architecture, circuit design, CAD tools and fabrication methods. Different processor architectures have been developed and optimized to achieve better performance. RISC philosophy [1] has attracted microprocessor designers to a great extent; most computation engines used these days in different segments like servers, networking and signal processing are based on RISC philosophy [1]. To cater to the needs of multi-tasking, multi-user applications in high-end systems, a 32-bit generic processor architecture, CORE-I, has been designed based on RISC philosophy [2].

II. PROCESSOR OVERVIEW

CORE-I is a 32-bit RISC processor with a 6-stage pipelined execution unit based on load-store architecture. The ALU supports both single-precision and double-precision floating-point operations. The CORE-I Register File has 45 General-Purpose Registers (GPRs), 19 Special-Purpose Registers (SPRs) and 64 Double-Precision-Floating-Point-Unit (DPFPU) registers. The Instruction Set Architecture (ISA) has 136 instructions in total. The processor has two modes of operation: user mode and supervisor mode (protected mode). A Dependency Resolver detects and resolves data hazards within the pipeline. The execution unit is interfaced with an instruction channel and a data channel. Both channels operate in parallel and communicate with external devices through a common Bus Interface Unit (BIU). The instruction channel has a 128-bit Instruction Line Buffer, a 64-KB Instruction Cache [1], a Memory Management Unit (MMU) [1] and a 128-bit Prefetch Buffer [3]. The data channel has a 128-bit Data Line Buffer, a 64-KB Data Cache, an MMU and a Swap Buffer [3]. The Prefetch Buffer and Swap Buffer are introduced to reduce memory latency during instruction fetches and data cache misses respectively. The external data flow through the instruction channel and data channel is controlled by respective controller state machines. The processor also has seven interrupt-request (IRQ) inputs and one non-maskable interrupt (NMI) input. The Exception Processing Unit controls the interrupts and exceptions.

III. EXECUTION UNIT

The CORE-I execution unit contains an ALU unit, a branch/jump unit, a single-precision floating-point unit (SPFPU) and a double-precision floating-point unit (DPFPU) [4]. The execution unit is implemented in six pipeline stages: IF (Instruction Fetch), ID (Instruction Decode), DS (Data Select), EX (Execution), MEM (Memory) and WB (Write Back). The main blocks in the different stages of the pipeline are shown in Figure 1.

A. Program Counter Unit (PCU):

The Program Counter Unit provides the value of the program counter (PC) in every cycle for fetching the next instruction from the Instruction Cache. In every cycle, it checks for the presence of any branch instruction in the MEM stage, any jump instruction in the EX stage, any interrupt or on-chip exception, and the presence of an RTE (return from exception) instruction [2] inside the pipeline. In the absence of any one of


the above conditions, the PCU either increments the PC value by 4 when the pipeline is not halted, or keeps the old value. This operation is performed in the IF stage.

B. Register File:

CORE-I has two separate register units: the Primary Register Unit (PRU) and the Double Precision Register Unit (DPRU). The PRU contains 64 registers used by integer and single-precision floating-point operations, and the DPRU contains two sets of 32 registers used for double-precision floating-point operations. All the registers are 32 bits wide. There are 45 GPRs and 19 SPRs in the PRU. The six-bit address of the registers to be read is specified in fixed locations in the instruction. Performing register reading in a high-speed pipelined execution unit is a time-critical design. In CORE-I, register reading is performed in two stages, ID and DS: the three least significant bits of the register address are used in the ID stage to select 8 of the 64 registers, and the three most significant bits are used in the DS stage to select the final register out of those 8.

The DPRU contains 64 registers arranged in two banks, each bank having 32 registers. The register banks are named the Odd Bank and the Even Bank. The Odd Bank contains all odd-numbered registers, viz. reg1, reg3 up to reg63, and the Even Bank contains the even-numbered registers, viz. reg0, reg2 up to reg62. This arrangement is made to provide two 64-bit operands to the DPFPU from the registers simultaneously. A DP instruction specifies only the address of the even registers (e.g., r0, r2), which are read from the Even Bank; their corresponding odd registers are also read, and the whole 64-bit operand is prepared (e.g., (r1:r0), (r3:r2)). All register reading is done in two clock cycles as mentioned earlier. A special instruction is used to transfer data between the PRU and DPRU; the dependency between them is taken care of by a separate Dependency Resolver.

Fig1. Six Stages of CORE-I Pipeline

C. ALU Unit:

CORE-I has instructions to perform arithmetic operations like addition, subtraction, shifting, rotating, multiplication, single-step division, bit set/reset, sign extension, character reading/writing, etc. The operations are performed in the EX stage. There are two operands to the execution unit. Each operand can take its value either from the register content or from a value forwarded from the EX, MEM or WB stage, so a multiplexer (data-input mux) in front of the ALU block selects the actual operand. Since all the computational blocks execute in parallel on the input data and produce 32-bit results, a multiplexer (ALU-output mux) is also placed after the blocks to select the correct result. So, the EX stage in a general pipelining scheme contains the input data mux, the operational block and the output ALU mux; this is one of the critical timing paths to be handled in high-speed pipeline design. To clock the pipeline at high speed, in CORE-I the data selection for the computational blocks of the EX stage is performed one stage ahead, in the DS stage; the multiplexed operand is latched and then fed to the operational blocks. In addition, in CORE-I the ALU result multiplexing is done in the MEM stage instead of the EX stage. The main issue to be handled for this organizational change of the pipeline occurs at the time of consecutive data-dependent instructions in the pipeline. At the time of a true data dependency [1] between two consecutive instructions, the receiving instruction has to wait one cycle in the DS stage, as the forwarding instruction has to reach the MEM stage to produce the correct ALU output. The Dependency Resolver controls the data forwarding. The data flow in the pipeline among the DS, EX and MEM stages is shown in Figure 2. The address and data calculation for the load and store instructions is performed in the ALU.

D. Dependency Resolver:

The Dependency Resolver module resolves the data hazards [1] of the six-stage pipeline architecture of CORE-I. True data dependency in CORE-I is resolved mainly by data forwarding, but in case of a data dependency between two consecutive instructions, some stages of the pipeline have to be stalled for one cycle (as explained earlier in the ALU section). This module handles both stalling [1] and data forwarding.

D.I Stalling

The Dependency Resolver checks the instructions in the ID and DS stages and generates an enable signal for stalling. In the next cycle this enable signal is used by logic (placed in the DS stage) to produce the Freeze signal. This Freeze signal stalls all the flops between the IF/ID, ID/DS and DS/EX stages. Figure 2 shows the stall-enable and Freeze signal generation blocks in the ID, DS and EX stages. The EX, MEM and WB stages are not stalled, so the EX result moves to MEM and is forwarded to the instructions in DS.

Fig2. Stall Enable and Freeze Signal Generation Logic

D.II Forwarding

In the CORE-I architecture, data are forwarded from the MEM, WB, WB+ and WB++ stages to the DS stage. WB+ and WB++ are stages used for forwarding only and contain the flopped data of the previous stage for the purpose of forwarding. Generation of all the control signals for data forwarding, as well as the actual transfer of data, is time critical. A unique feature of the CORE-I data forwarding is that all the control signals used for the forwarding multiplexers are generated one clock cycle earlier; they are then latched and used in the DS stage, as shown in Figure 3. The consequences of the early select-signal generation are:

1. The forwarding instruction has to be checked one stage before the actual stage of forwarding. For example, to forward data from the MEM stage, the instruction in the EX stage has to be checked against the receiving instruction.

2. The receiving instruction also has to be compared with the forwarding instruction one stage before. The receiving stage is always DS. In most situations the instruction receiving in DS was, one clock cycle back, in ID, so the ID-stage instruction is compared with the forwarding instruction. But in case of successive dependency, when the IF, ID and DS stages are stalled for one cycle, the receiving instruction remains in the DS stage itself before the forwarding; in that case the DS instruction is compared with the forwarding instruction. To meet the timing constraint, the Dependency Resolver generates the control signals for both cases, and the correct one is finally selected in the DS stage.

Fig3. Forwarding Scheme

E. Multi Cycle Unit:

Multiplication and floating-point operations are multi-cycle in CORE-I. A 16x16 multiplication takes 3 cycles, and for other instructions like 32x32 multiplication and single-precision and double-precision floating-point operations, the number of cycles is programmable; it can be programmed through instructions or by setting values on dedicated input ports. When such an instruction reaches the EX stage, the pipeline is frozen for the required number of cycles.

F. Branch and Jump Unit:

CORE-I supports 14 conditional branch instructions, 2 jump instructions and 1 return instruction. Jump and return instructions are scheduled in the EX stage, i.e. the PC value is updated when the instruction is in the EX stage. But for the conditional branch instructions, the condition is evaluated in the EX stage and the PC value is updated in the MEM stage. All these instructions have 3 delay slots [1].

G. Exception Processing Unit:

CORE-I has seven external IRQs, 1 NMI, and 1 PIO interrupt [2]. In addition to these external interrupts, the on-chip interrupt controller serves SIO and Timer interrupts, and on-chip exceptions due to arithmetic operations, bus errors and illegal opcodes. CORE-I also supports software interrupts with TRAP instructions. The Interrupt Service Routine (ISR) address for an interrupt is calculated in the EX stage and fed to the PCU. The return PC value and processor status word are saved on the interrupt stack before transferring control to the routine. At the time of exception processing, if a higher-priority interrupt comes, the interrupt controller serves the higher-priority interrupt.

The complete processor is modeled in Verilog HDL. The syntax of the RTL design is checked using the LEDA tool. For functional verification of the design, the processor is modeled in a high-level language, SystemVerilog [5]. The design is verified both at the block level and at the top level. Test cases for the block level are generated in SystemVerilog in both directed and random ways. For top-level verification, assembly programs are written and the corresponding hex code from the assembler is fed to both the RTL design and the model. A checker module captures and compares the signals from both, and displays messages for mismatching signal values. For completeness of the verification, different coverage metrics have been checked. Formal verification of the design is also planned. The design has been synthesized targeting 0.13-micron standard cell technology using the Cadence PKS tool [6]. The complete design, along with all timing constraints and optimization options, is described using a TCL script [7]. At the maximum timing corner in the library, the synthesizer reports a worst-path timing of 1.4 ns in the whole design. After synthesis, the Verilog netlist is verified functionally with the synthesizer-generated 'sdf' file.
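The checker idea mentioned above, comparing signals captured from the RTL simulation against the reference model and reporting mismatches, can be sketched as follows. The trace format and signal names here are illustrative assumptions, not the actual CORE-I testbench.

```python
# Sketch of a trace checker: compare cycle-by-cycle signal values from the
# RTL simulation and the reference model, and report any mismatches.

def check_traces(rtl_trace, model_trace, signals=("pc", "alu_out")):
    """Each trace is a list of per-cycle dicts {signal: value}.
    Returns a list of human-readable mismatch messages (empty if clean)."""
    mismatches = []
    for cycle, (rtl, ref) in enumerate(zip(rtl_trace, model_trace)):
        for sig in signals:
            if rtl.get(sig) != ref.get(sig):
                mismatches.append(
                    f"cycle {cycle}: {sig} rtl={rtl.get(sig)} model={ref.get(sig)}")
    return mismatches

rtl   = [{"pc": 0, "alu_out": 5}, {"pc": 4, "alu_out": 9}]
model = [{"pc": 0, "alu_out": 5}, {"pc": 4, "alu_out": 7}]
print(check_traces(rtl, model))  # -> ['cycle 1: alu_out rtl=9 model=7']
```

In the real flow the RTL trace would come from the simulator (e.g., via a VCD dump) and the reference trace from the SystemVerilog golden model; the comparison logic stays the same.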


This paper has described the architecture of the pipelined execution unit of a 32-bit RISC processor. The design is working at 714 MHz after synthesis at 0.13-micron technology. Presently the design is being carried through the backend flow, and chips will be fabricated after it. Our future work includes changing the core architecture to make it capable of handling multiple threads and supporting network applications.


[1] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 3rd ed., Morgan Kaufmann Publishers, 2003.
[2] ABACUS User Manual, Advanced Numerical Research and Analysis.
[3] N. P. Jouppi, "Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers," Digital Equipment Corporation Western Research Lab, 1990.
[4] IEEE Standard for Binary Floating-Point Arithmetic, approved March 21, 1985, IEEE Standards Board.
[5] Accellera's Extensions to Verilog, SystemVerilog 3.1a Language Reference Manual.
[6] "PKS User Guide," Product version 5.0.12, October.
[7] TCL and the Tk Toolkit.


Monitoring of an Electronic System Using Embedded Technology

N. Sudha¹, Suresh R. Norman²
1: N. Sudha, II yr M.E. (Applied Electronics), SSN College of Engineering, Chennai
2: Asst. Professor, SSN College of Engineering, Chennai
SSN College of Engineering (Affiliated to Anna University)
Phone no: 9994309837

Abstract- The embedded remote electronic measuring system board, with interface modules, uses an embedded board to replace a computer. The embedded board has the advantages of being portable, operating in real time, being low cost, and being programmable with an on-board operating system.
The design provides step-by-step functions to help the user operate, such as keying in the waveform parameters with the embedded board keyboard, providing test waveforms, and then connecting the circuit under test to the embedded electronic measurement system. This design can also display the output response waveform measured by the embedded measurement system (from the circuit under test) on the embedded board LCM (Liquid Crystal Monitor).

I. INTRODUCTION
The initially designed remote electronic measurement system used interface chips to design and implement the interface cards for the functions of the traditional electronic instruments of a laboratory, such as a power supply, a signal generator and an oscilloscope. They integrate the communication software to control the communication between the computer and the interface card by means of an ISA (Industry Standard Architecture) bus to transmit or receive the waveform data. By utilizing widely used software and hardware, the users at the client site convey the waveform data through the computer network and can not only receive the input waveforms from the clients, but can also upload the measurement waveforms to a server. This remote electronic measurement system has some disadvantages: it requires operation with a computer and it is not portable.
The intended embedded electronic measurement system overcomes these disadvantages by replacing the computer with the embedded board, since it has the advantage of portability, has real-time operation, is low cost, and is programmable.
In this design the users need only key in the waveform and voltage parameters using the embedded keyboard. Then the embedded remote measurement system will output the voltage waveform into the circuit under test. The users can then observe the waveforms by means of the oscilloscope interface of the embedded board. If the users are not satisfied with this waveform, they can set up another waveform. The dual-channel LCM (Liquid Crystal Monitor) provides a comparison between the input and response waveforms to determine the characteristics of the circuit under test. The network function can also transfer the measured waveforms to a distant server. Our design can serve different applications, in which any electronics factory can build the design house and the product lines at different locations. The designer can measure the electronic products through the Internet and the necessary interfaces. If there are any mistakes in the circuit boards on the production line, the engineers can solve these problems through network communications, which will reduce the time and cost.

II. HARDWARE OF THE EMBEDDED REMOTE ELECTRONIC MEASUREMENT SYSTEM
The embedded remote electronic measurement system includes a power supply, a signal generator and an oscilloscope. Fig. 1 shows the embedded remote measurement system architecture.

Fig 1. The embedded remote electronic measurement system architecture

Fig. 2 shows the hardware interface modules of the embedded remote electronic measurement system. We can divide these interface modules into three parts: the ADC, DAC and control modules. The function of the ADC module is for the oscilloscope, which is used mainly to convert the analog signal to the digital format for the measurement waveform. The function of the DAC module is to convert the digital signal to an analog signal for outputting, as for the power supply and signal generator. The control signals manage the ADC and DAC modules to connect and transfer the measurement waveform data to the embedded board during each time interval to avoid snatching resources from each other.
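The time-interval arbitration between the ADC and DAC modules described above can be pictured as a simple time-slot scheduler. This C fragment is an illustration of the idea only, not the board's firmware:

```c
/* Time-slot arbitration: the control logic grants the shared data path to
 * the ADC and DAC modules in alternating intervals, so the two modules do
 * not contend for the embedded board's resources. Illustrative sketch. */
enum module { MOD_ADC = 0, MOD_DAC = 1 };

static enum module granted_module(unsigned long tick, unsigned long slot_len)
{
    /* Even-numbered slots go to the ADC, odd-numbered slots to the DAC. */
    return ((tick / slot_len) % 2 == 0) ? MOD_ADC : MOD_DAC;
}
```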

Fig 2. The interface module of the embedded remote electronic measurement system

A) The DAC modules provide the major functions of the power supply and signal generator. The power supply provides a stable DC voltage, while the signal generator provides all kinds of waveforms, such as sine, square and triangular waveforms.
a. The power supply provides an adjustable DC voltage with small drift and noise in the output voltage. The embedded board keyboard establishes the voltage values and sends the data to the power supply module. The system can supply a maximum current of up to 2 A.
b. The signal generator provides the specific waveforms, which are based on sampling theory. According to the sampling theory, if the conversion rate is 80 MHz and a clear waveform is composed of at least 10 samples, the generator can only provide a maximum waveform bandwidth of 8 MHz.

Fig 3. The signal generator of the embedded remote electronic measurement system

In our design, by using a keypad the users can enter all settings of waveform, amplitude and frequency into the embedded board, and the embedded board will output the waveform into the testing circuit. The users can preview the waveforms on the LCM. If users are not satisfied with this waveform, they can set up another waveform. Fig. 4 shows the signal generation flow chart.
B) The ADC module provides the key function of the oscilloscope, which converts the analog signal to digital format for the embedded board.

Fig 4. Flow chart of the embedded signal generator

According to the sampling theory, the sample rate should be more than twice the signal bandwidth, but in our design the sample rate is 10 times the signal bandwidth; only then can we get better quality waveforms. If a waveform is composed of only two or three samples, a triangular waveform looks the same as a sine waveform, and the users cannot recognize what kind of waveform it is, so the sample rate must be more than ten times the signal bandwidth in order to recognize waveforms.

Fig 5. The procedure of analog to digital conversion.

The embedded oscilloscope provides the view of the measurement waveform for the users and transfers the waveform data to the distant server for verification. Because the resolution of the embedded board LCM is lower than the resolution of a computer monitor, the user can only observe a low-quality waveform on the LCM at the client site. If it is desired to observe a high-quality waveform, the user can transfer the measurement waveform to a computer system.
C) The control module provides the control between the memory and the data storage, and the I/O connections. Because the I/O pins are limited, not every extra module can connect to an independent I/O simultaneously. Some of the I/O pins need to be shared or multiplexed. As the embedded board cannot recognize which module is connected to it and cannot allocate the system resources to an extra module, we need a control module to manage every extra module. The control module includes three control chips, each of which has three I/O ports and a bidirectional data bus, which is very convenient for the input and the output.
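The sampling-rate arithmetic used above (an 80 MHz conversion rate with at least 10 samples per period gives an 8 MHz maximum waveform bandwidth) can be checked with a small C helper; the function name is illustrative only:

```c
/* Maximum waveform bandwidth supported by a DAC or ADC, given its
 * conversion (sample) rate and the minimum number of samples the design
 * requires per waveform period: 10 in this paper, versus the Nyquist
 * minimum of 2. Illustrative helper, not the paper's code. */
static unsigned long max_bandwidth_hz(unsigned long conversion_rate_hz,
                                      unsigned samples_per_period)
{
    if (samples_per_period == 0)
        return 0;  /* guard against division by zero */
    return conversion_rate_hz / samples_per_period;
}
```

With the Nyquist minimum of 2 samples the same converter would nominally reach 40 MHz, which is why the 10-sample rule trades bandwidth for recognizable waveform shape.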



III. SOFTWARE OF THE EMBEDDED REMOTE ELECTRONIC MEASUREMENT SYSTEM
The overall performance of an embedded system is poor compared to that of a PC, but the embedded system executes a specific application program which requires fewer resources and is more reliable than that of a PC. In addition, the embedded system can be designed using the C language, which forms the GUI module for user operation. The advantages of the design are that it is easy to use and easy to debug program errors. Fig. 6 shows the flowchart of the embedded remote electronic measurement system. At first, the users choose the type of instrument and then key in the relevant parameters of the instrument. For example, one can key in the voltage values for the power supply, and key in waveform types for the generator module. When the system setup is finished, the system will output the DC voltages and the waveforms. In addition, if the users operate the oscilloscope, they only need to choose the sample rate, and one can observe the measurement waveform on the LCM. If the users want to send the waveform to the server, they only need to key in the server IP address. When the transmission is connected, the embedded system will send the data to the server. Another advantage of the embedded remote electronic measurement system is that the server can receive many waveforms from the different client sites. If the observer has some questions to ask one of the clients, the observer can send the defined waveform to that embedded remote electronic measurement system and collect the output waveform again. This function can assist users in debugging distant circuits, to both locate and understand the problems of the testing circuit in detail.

Fig 6. The flow chart of the embedded remote electronic measurement system.

IV. CONCLUSION
The embedded design takes very little space and is easily portable. The electronic instrument operation is unified, as our design uses fewer system resources, increases the operating efficiency and has more functions together with an extra I/O interface. This design can replace the traditional signal generator and oscilloscope and integrate the measurement testing system into one portable system. It is also possible for electronic engineers to remotely measure circuits under test through the network transmission of the measurement waveform.

REFERENCES

[1] M. J. Callaghan, J. Harkin, T. M. McGinnity and L. P. Maguire, "An Internet-based methodology for remotely accessed embedded systems," Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Volume 6, pp. 6-12, Oct. 2002.
[2] Jihong Lee and Baeseung Seo, "Real-time remote monitoring system based on personal communication service (PCS)," Proceedings of the 40th SICE Annual Conference, pp. 370-375, July 2001.
[3] Ying-Wen Bai, Cheng-Yu Hsu, Yu-Nien Yang, Chih-Tao Lu, Hsiang-Ting Huang and Chien-Yung Cheng, "Design and Implementation of a Remote Electronic Measurement System," 2005 IEEE Instrumentation and Measurement Technology Conference, pp. 1617-1622, May 2005.
[4] T. Yoshino, J. Munemori, T. Yuizono, Y. Nagasawa, S. Ito and K. Yunokuchi, "Development and application of a distance learning support system using personal computers via the Internet," Proceedings of the 1999 International Conference on Parallel Processing, pp. 395-402, Sept. 1999.
[5] Li Xue Ming, "Streaming technique and its application in distance learning system," Proceedings of the 5th International Conference on Signal Processing, Volume 2, pp. 1329-1332, Aug. 2000.
[6] J. L. Schmalzel, P. A. Patel and H. N. Kothari, "Instrumentation curriculum: from ATE to VXI," Proceedings of the 9th IEEE Instrumentation and Measurement Technology Conference, pp. 265-267, May 1992.
[7] Ying-Wen Bai, Hong-Gi Wei, Chung-Yueh Lien and Hsin-Lung Tu, "A Windows-Based Dual-Channel Arbitrary Signal Generator," Proceedings of the IEEE Instrumentation and Measurement Technology Conference, pp. 1425-1430, May 2002.
[8] Emmanuel C. Ifeachor and Barrie W. Jervis, Digital Signal Processing: A Practical Approach, Second Edition, 1996.
[9] Ying-Wen Bai and Hong-Gi Wei, "Design and implementation of a wireless remote measurement system," Proceedings of the 19th IEEE Instrumentation and Measurement Technology Conference, IMTC/2002, Volume 2, pp. 937-942, May 2002.
[10] Ying-Wen Bai and Cheng-Yu Hsu, "Design and Implementation of an Embedded Remote Electronic Measurement System," 2006 IEEE Instrumentation and Measurement Technology Conference, pp. 1311-1316, April 2006.

The Design of a Rapid Prototype Platform for ARM Based Embedded System

A. Antony Judice¹, Suresh R. Norman²
1: A. Antony Judice, II M.E. (Applied Electronics), SSN College of Engineering, Kalavakkam, Chennai-603110.
2: Suresh R. Norman, Asst. Prof., SSN College of Engineering, Kalavakkam, Chennai-603110.

Abstract— Embedded system designs continue to increase in size, complexity, and cost. At the same time, aggressive competition makes today's electronics markets extremely sensitive to time-to-market pressures. A hardware prototype is a faithful representation of the final design, guaranteeing its real-time behavior, and it is also the basic tool to find deep bugs in the hardware. The most cost-effective technique to achieve this level of performance is to create an FPGA-based prototype. As both the FPGA and the ARM embedded system support BST (Boundary Scan Test), we can detect faults on the connections between the two devices by chaining their JTAG ports and performing BST. Since FPGA-based prototypes enable both chip- and system-level testing, and such prototypes are relatively inexpensive, they can be provided to multiple developers and also deployed to multiple development sites. Thus a fast prototyping platform for ARM based embedded systems provides a low-cost solution to meet the demand for flexibility and testability in embedded system prototype development.

Index Terms— FPGA, Rapid prototyping, Embedded system, ARM, Reconfigurable Interconnection.

I. INTRODUCTION

RAPID prototyping is a form of collecting information on requirements and on the adequacy of possible designs. Prototyping is very useful at different stages of design, such as product conceptualization at the task level and determining aspects of screen design. Embedded system designers are under more and more pressure to reduce design time, often in the presence of continuously changing specifications. To meet these challenges, the implementation architecture is more and more based on programmable devices: micro-controllers, digital signal processors and Field Programmable Gate-Arrays. The development tools used by system designers are often rather primitive: simulation models for FPGA devices, synthesis tools to map the logic into FPGAs, some high-level models and emulation systems for micro-controllers, and software tools such as editors, debuggers and compilers. One of the major design validation problems facing an embedded system designer is the evaluation of different hardware-software partitionings. Reaching the integration point earlier in the design cycle not only finds any major problems earlier, while there is still time to fix them, but also speeds software development. Most times, software integration and debugging could start much earlier, and software development proceed faster, if only a hardware (or ASIC) prototype could consistently be available earlier in the development cycle. A possible choice for design validation is to simulate the system being designed. However, this approach has several drawbacks. If a high-level model for the system is used, simulation is fast but may not be accurate enough. With a low-level model, too much time may be required to achieve the desired level of confidence in the quality of the evaluation. Modeling a composite system that includes complex software-programmable components is not easy due to synchronization issues. In most embedded system applications, safety and the lack of a well-defined mathematical formulation of the goals of the design make simulation inherently ill-suited for validation. For these reasons, design teams build prototypes which can be tested in the field to physically integrate all components of a system for functional verification of the product before production. Since design specification changes are common, it is important to maintain high flexibility during development of the prototype. In this paper we address the problem of hardware-software partitioning evaluation via board-level rapid prototyping. We believe that providing efficient and flexible tools allowing the designer to quickly build a hardware-software prototype of the embedded system will help the designer in this difficult evaluation task more effectively than having a relatively inflexible non-programmable prototype. Furthermore, we believe that coupling this board-level rapid-prototyping approach with synthesis tools for fast


programming data for both the devices and the interconnections among them can make a difference in shortening the design time. The problems that we must solve include fast and easy implementation of a prototype of the embedded system, and validation of hardware and software communication (synchronization between hardware and software heavily impacts the performance of the final product).
Our approach is based on the following basic ideas: 1. Use of a programmable board, a sort of universal printed circuit board providing re-programmable connections between components. With a programmable board as the prototyping vehicle, the full potential of the FPGA can be exploited: FPGA programming is no longer affected by constraints such as a fixed pin assignment due to a custom printed board or a wire-wrap prototype. 2. Debugging the prototype by loading the code on the target emulator and making it run, programming the FPGA, providing signals to the board via a pattern generator and analyzing the output signals via a logic analyzer. This approach can be significantly improved by using debugging tools for both software and hardware in order to execute step by step the software part and the clock-triggered hardware part. We argued that prototyping is essential to validate an embedded system. However, to take full advantage of the prototyping environment, it is quite useful to simulate the design as much as feasible at all levels of the hierarchy. Simulation is performed at different stages along the design flow. At the specification level we use an existing co-simulation environment for heterogeneous systems, which provides an interface to a well-developed set of design aids for digital signal processing.
Prototyping can be used to gain a better understanding of the kind of product required in the early stages of system development, where several different sketch designs can be presented to users and to members of the development team for critique. The prototype is thrown away in the end, although it is an important resource during the product's development into a working model. The prototype gives the designer a functional working model of their design, so they can work with the design and identify some of its possible pros and cons before it is actually produced. The prototype also allows the user to be involved in testing design ideas. Prototyping can resolve uncertainty about how well a design fits the user's needs. It helps designers to make decisions by obtaining information from users about the necessary functionality of the system, user help needs, a suitable sequence of operations, and the look of the interface. It is important that the proposed system have the necessary functionality for the tasks that users may want to perform, anywhere from gathering information to task analysis. Information on the sequence of operations can tell the designers what users need to interact successfully with the system. Exchanges can be fixed and supportive, but potentially constraining, or free and flexible. In interactive systems design, prototypes are critical in the early stages of design, unlike other fields of engineering design where design decisions can initially be carried out analytically without relying on a prototype. In systems design, prototyping is used to create interactive systems design where the completeness and success of the user interface can utterly depend on the testing.
Embedded systems are found everywhere. An embedded system is a specialized computer system that is part of a larger system or machine. Typically, an embedded system is housed on a single microprocessor board with the programs stored in ROM. Virtually all appliances that have a digital interface -- watches, microwaves, VCRs, cars -- utilize embedded systems. Some embedded systems include an operating system, but many are so specialized that the entire logic can be implemented as a single program.
In order to deliver correct-the-first-time products with complex system requirements and time-to-market pressure, design verification is vital in the embedded system design process. A possible choice for verification is to simulate the system being designed. Since debugging of real systems has to take into account the behavior of the target system as well as its environment, runtime information is extremely important. Therefore, static analysis with simulation methods is too slow and not sufficient, and simulation cannot reveal deep issues in a real physical system. A hardware prototype is a faithful representation of the final design, guaranteeing its real-time behavior, and it is also the basic tool to find deep bugs in the hardware. For these reasons, it has become a crucial step in the whole design flow. Traditionally, a prototype is designed similarly to the target system, with all the connections fixed on the PCB (printed circuit board).
As embedded systems are getting more complex, the needs for thorough testing become increasingly important. Advances in surface-mount packaging and multiple-layer PCB fabrication have resulted in smaller boards and more compact layouts, making traditional test methods, e.g., external test probes and "bed-of-nails" test fixtures, harder to implement. As a result, acquiring signals on boards, which is beneficial to hardware testing and software development, becomes infeasible, and tracking bugs in a prototype becomes increasingly difficult. Thus the prototype design has to take account of testability. If errors on the prototype are detected, such as misconnections of signals, it could be impossible to correct them on the multiple-layer PCB board with all the components mounted. All this would lead to another round of prototype fabrication, extending development time and increasing cost.
Besides testability, it is important to maintain high flexibility during development of the prototype, as design specification changes are common. Nowadays


complex systems are often not built from scratch but are assembled by reusing previously designed modules or off-the-shelf components such as processors, memories or peripheral circuitry, in order to cope with more aggressive time-to-market constraints. Following the top-down design methodology, a lot of effort in the design process is spent on decomposing the customers' requirements into proper functional modules and interfacing them to compose the target system. Some previous research works have suggested that an FPLD (field programmable logic device) could be added to the final design to provide flexibility, as FPLDs can offer programmable interconnections among their pins and many more advantages. However, extra devices may increase production cost and power dissipation, weakening the market competitiveness of the target system. To address these problems, there are also suggestions that FPLDs could be used in the hardware prototype as an intermediate approach, although modules on such a prototype cannot be reused directly. In industry, there have been companies that provide commercial solutions based on FPLDs for rapid prototyping. Their products are aimed at SOC (system on a chip) functional verification instead of embedded system design and development. Our platform also encourages concurrent development of different parts of the system hardware, as well as module reuse.

Fig.1 FPGA architecture

II. THE DESIGN OF A RAPID PROTOTYPING PLATFORM

A. Overview

ARM based embedded processors are widely used in embedded systems due to their low cost, low power consumption and high performance. An ARM based embedded processor is a highly integrated SOC including an ARM core with a variety of different system peripherals. Many ARM based embedded processors adopt an architecture similar to the one shown in Fig. 1. The integrated memory controller provides an external memory bus interface supporting various memory chips and various operation modes (synchronous, asynchronous, burst modes). It is also possible to connect bus-extended peripheral chips to the memory bus. The on-chip peripherals may include an interrupt controller, OS timer, UART, I2C, PWM, AC97, etc. Some of these peripheral signals are multiplexed with general-purpose digital I/O pins to provide flexibility to the user, while other on-chip peripherals, e.g. USB host/client, may have dedicated signal pins. By connecting or extending these pins, the user may use these on-chip peripherals. When the on-chip peripherals cannot fulfill the requirements of the target system, extra peripheral chips have to be added.
To enable rapid prototyping, the platform should be capable of quickly assembling parts of the system into a whole through flexible interconnection. Our basic idea is to insert a reconfigurable interconnection module composed of an FPLD into the system to provide adjustable connections between signals, and to provide testability as well. To determine where to place this module, we first analyze the architecture of the system. The embedded system shown in Fig. 2 can be divided into two parts. One is the minimal system composed of the embedded processor and memory devices. The other is made up of peripheral devices extended directly from the on-chip peripheral interfaces of the embedded processor, and specific peripheral chips and circuits extended by the bus. The minimal system is the core of the embedded system, determining its processing capacity. Embedded processors are now routinely available at clock speeds of up to 400 MHz, and will climb still further. The speed of the bus connecting the processor and the memory chips is exceeding 100 MHz. As the pin-to-pin propagation delay of an FPLD is in the magnitude of a few nanoseconds, inserting such a device there would greatly impair the system performance. The peripherals enable the embedded system to communicate and interact with the environment in the real world. In general, peripheral circuits are highly modularized and independent of each other, and there is hardly a need for flexible connections between them.
Here we apply a reconfigurable interconnection module to substitute the connections between the microcomputer and the peripherals, which enables flexible adjustment of connections to facilitate interfacing extended peripheral circuits and modules. As the speed of the data communication between the peripherals and the processor is much slower than that in the minimal system, the FPLD solution is feasible. Following this idea, we design the Rapid Prototyping Platform as shown in Fig. 2. We define the interface ICB between the platform and the embedded processor core board that holds the minimal system of the target embedded system. The interface IPB


between the platform and the peripheral boards that hold extended peripheral circuits and modules is also defined. These interfaces enable us to develop different parts of the target embedded system concurrently and to compose them into a prototype rapidly, and they encourage module reuse as well. The two interfaces are connected by a reconfigurable interconnect module. There are also some commonly used peripheral modules, e.g. an RS232 transceiver module, a bus-extended Ethernet module, an AC97 codec, a PCMCIA/Compact Flash Card slot, etc., on the platform, which can be interfaced through the reconfigurable interconnect module to expedite the embedded system development.

FIGURE 2: Schematic of the Rapid Prototyping Platform

Fig 3: FPGA design flow

The use of an FPLD to build the interconnection module not only offers low cost and a simple architecture for fast prototyping, but also provides many more advantages. First, interconnections can be changed dynamically through internal logic modification and pin re-assignment to the FPLD. Second, as the FPLD is connected to most pins of the embedded processor, it is feasible to detect interconnection problems due to design or physical fabrication faults in the minimal system with BST (Boundary-Scan Test, IEEE Standard 1149.1 specification). Third, it is possible to route the FPLD internal signals and data to the FPLD's I/O pins for quick and easy access without affecting the whole system design and performance. It is even possible to implement an embedded logic analyzer in the FPLD to smooth the progress of the hardware verification and software development.
B. Reconfigurable Interconnect Module

With the facility of state-of-the-art FPLDs, we design an interconnection module to interconnect, monitor and test the bus and I/O signals between the minimal system and the peripherals. As bus accesses obey a specific protocol and have control signals to identify the data direction, the interconnection of the bus can be easily realized by designing a corresponding bus transceiver into the FPLD, whereas the interconnection of the I/Os is more complex. As I/Os are multiplexed with on-chip peripheral signals, there may be I/Os with bi-directional signals, e.g. the signals for the on-chip I2C interface, or signals for the on-chip MMC (Multi Media Card) interface. The data direction on these pins may alter without an external indication, making it difficult to connect them via an FPLD. One possible solution is to design a complex state machine according to the corresponding access protocol to control the data transfer direction. In our design we assign specific locations on the ICB and IPB interfaces to these bi-directional signals and use some jumpers to directly connect these signals when needed. The problem is circumvented at the expense of losing some

Before the advent of programmable logic, custom logic circuits were built at the board level using standard components, or at the gate level in expensive application-specific (custom) integrated circuits. The FPGA is an integrated circuit that contains many (64 to over 10,000) identical logic cells that can be viewed as standard components. Each logic cell can independently take on any one of a limited set of personalities. The individual cells are interconnected by a matrix of wires and programmable switches. A user's design is implemented by specifying the simple logic function for each cell and selectively closing the switches in the interconnect matrix. The array of logic cells and interconnects forms a fabric of basic building blocks for logic circuits. Complex designs are created by combining these basic blocks to create the desired circuit.
Field Programmable means that the FPGA function is defined by a user's program rather than by the manufacturer of the device. A typical integrated circuit performs a particular function defined at the time of manufacture. In contrast, the FPGA function is defined by a program written by someone other than the device manufacturer. Depending on the


particular device, the program is either 'burned'

in permanently or semi-permanently as part of a board
assembly process, or is loaded from an external
memory each time the device is powered up. This
user programmability gives the user access to
complex integrated designs without the high
engineering costs associated with application specific
integrated circuits.
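The direction-control state machine mentioned in Section B for bi-directional signals might be sketched as follows in C. The states and transitions are simplified stand-ins for a real I2C-style access protocol, and all names here are illustrative:

```c
/* Simplified direction controller for a bi-directional data line whose
 * direction is implied by the protocol phase rather than by an external
 * pin. Modeled loosely on an I2C-style transfer: the address phase is
 * driven by the master, then data flows according to the read/write bit.
 * Illustrative sketch, not the platform's implementation. */
typedef enum { PHASE_IDLE, PHASE_ADDRESS, PHASE_DATA } phase_t;
typedef enum { DIR_MASTER_TO_SLAVE, DIR_SLAVE_TO_MASTER } dir_t;

typedef struct {
    phase_t phase;
    int read_transfer;  /* 1 = master reads, 0 = master writes */
} bus_fsm_t;

static void fsm_start(bus_fsm_t *f) { f->phase = PHASE_ADDRESS; }

static void fsm_addr_done(bus_fsm_t *f, int read_bit)
{
    f->read_transfer = read_bit;
    f->phase = PHASE_DATA;
}

static void fsm_stop(bus_fsm_t *f) { f->phase = PHASE_IDLE; }

/* The transceiver in the FPLD consults this to set its buffer direction. */
static dir_t fsm_direction(const bus_fsm_t *f)
{
    if (f->phase == PHASE_DATA && f->read_transfer)
        return DIR_SLAVE_TO_MASTER;
    return DIR_MASTER_TO_SLAVE;  /* idle and address phases: master drives */
}
```

Because the direction is derived entirely from observed protocol phases, no extra direction pin is needed, which is exactly what makes such multiplexed bi-directional I/Os hard to route through a generic transceiver.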

C. Design Flow Changes Allowed by Reprogrammability

This combination of moderate density, reprogrammability and powerful prototyping tools
provides a novel capability for systems designers:
hardware that can be designed with a software-like
iterative-implementation methodology. Figure 4
Shows a typical ASIC design methodology in which
the design is verified by simulation at each stage of
refinement. Accurate simulators are slow; fast
simulators trade away simulation accuracy. ASIC
designers use a battery of simulators across the speed-accuracy spectrum in an attempt to verify the design.
Although this design flow works with FPGAs as
well, an FPGA designer can replace simulation with
in-circuit verification, “simulating” the circuitry in
real time with a prototype. The path from design to
prototype is short, allowing a designer to verify
operation over a wide range of conditions at high
speed and high accuracy. This fast design-place-
route-load loop is similar to the software edit-
compile-run loop and provides the same benefits.
Designs can be verified by trial rather than reduction
to first principles or by mental execution. A designer
can verify that the design works in the real system,
not merely in a potentially-erroneous simulation
model of the system. This makes it possible to build
proof-of-concept prototype designs easily. Design-by-prototype does not verify proper operation with worst-case timing, merely that the design works on the presumed-typical prototype part. To verify worst-case timing, designers may check speed margins at actual voltage and temperature corners with a scope, speeding up marginal signals; they may use a software timing analyzer or simulator after debugging to verify worst-case paths; or they may simply use faster speed-grade parts in production to ensure sufficient speed margin over the complete temperature and voltage range.

Fig 4: Contrasting design methodologies: (a) traditional gate arrays; and (b) FPGA

Prototype versus Production

As with software development, the dividing line between prototyping and production can be blurred with a reprogrammable FPGA. A working prototype may qualify as a production part if it meets cost and performance goals. Rather than re-design, an engineer may choose to substitute a faster speed-grade FPGA using the same programming bit stream, or a smaller, cheaper compatible FPGA with more manual work to squeeze the design into a smaller IC. A third solution is to substitute a mask-programmed version of the LCA for the field-programmed version. All three of these options are much simpler than a system redesign. Rapid prototyping is most effective when it becomes rapid product development.


Field Upgrades

Reprogrammability allows a systems designer another option: modifying the design in the FPGA by changing the programming bit stream after the design is in the customer's hands. The bit stream can be stored in a PROM or elsewhere in the system. For example, an FPGA used as a peripheral on a computer may be loaded from the computer's disk. In some existing systems, manufacturers send modified hardware to customers as a new bit stream on a floppy disk or as a file sent over a modem.

Reprogrammability for Board-Level Test

The most common system use of reprogrammability is for board-level test circuitry. Since FPGAs are
commonly used for logic integration, they naturally
have connections to major subsystems and chips on
the board. This puts the FPGA in an ideal location to
provide system-level test access to major subsystems.
The board designer makes one configuration of the
FPGA for normal operation and a separate
configuration for test mode. The “operating” logic
and the “test” logic need not operate simultaneously,
so they can share the same FPGA. The test logic is
simply a different configuration of the FPGA, so it
requires no additional on-board logic or wiring. The
test configuration can be shipped with the board, so
the test mode can also be invoked as a diagnostic after
delivery without requiring external logic.

III. EXPERIMENTAL RESULTS

As the Rapid Prototyping Platform is still under development, we present an example applied with the same considerations in the Rapid Prototyping Platform. It is an embedded system prototype based on the Intel XScale PXA255, an ARM-based embedded processor. The diagram of the prototype is illustrated in Fig. 5, where a Bluetooth module is connected to the prototype USB port and a CF LAN card is inserted. The FPGA (an Altera Cyclone EP1C6F256) here offers the same function as the reconfigurable interconnection module shown in Fig. 2. Most of the peripheral devices are expanded to the system through the FPGA, and more peripherals can easily be interfaced when needed. As both the FPGA and the PXA255 support BST, we can detect faults, e.g. short-circuit and open-circuit faults, on the connections between the two devices by chaining their JTAG ports and performing BST. Here, we use an open source software package to perform the BST.

The FPGA internal signals can be routed to the LED matrix for easy access, which is helpful in testing and debugging. We also insert an embedded logic analyzer, the SignalTap II embedded logic analyzer provided in Altera's Quartus II software, into the FPGA for handling more complicated situations. With the help of the logic analyzer, we are able to capture and monitor data passing through over a period of time, which expedites the debugging process for the prototype system.

FIGURE 5: Hardware prototyping making use of the boundary scan test (IEEE 1149.1)

The IEEE 1149.1 (JTAG) standard incorporated earlier boundary scan tests that had been developed for testing printed circuit boards. Boundary scan allows the engineer to verify that the connections on a board are functioning.
• The JTAG standard uses an internal shift register which is built into JTAG-compliant devices.
• This boundary scan register allows monitoring and control of all I/O pins, signals and registers.
• The interface to JTAG is through a standard PC parallel port to a JTAG port on the board to be tested.


• The operation of the BSR is controlled by the TAP (Test Access Port) state machine.
• This decodes the instructions sent through the BSR and decides what actions to take.
• The actions mainly involve loading the BSR cells with data, executing the instructions and then examining the results.
• The TAP is controlled by the state of TMS.
• As long as the addresses, data and signal states are correct, the developer can do tasks such as query the flash chips, erase them, and load data in.

(Figure: boundary scan register detail, showing TDI data in, instruction and device-ID registers, TAP clock/mode/reset inputs, TDO data out, and the PC output and input ports.)

IV. CONCLUSION

In this paper we have shown that the use of a flexible prototype based on reprogrammable devices, coupled with a set of synthesis tools that provide fast programming data, can shorten the design time. The prototype we obtain is neither the result of a software simulation nor the result of hardware emulation, because it is made up of hardware circuitry and a software implementation. The components used in the prototype are therefore close to those used in the final product. Moreover, it is possible to evaluate the tradeoff between the hardware partition and the software partition. Last but not least, the technology used allows a real-time operation mode. This approach can help to fix errors early in the design cycle, since the hardware-software integration test is done earlier than in the common approaches with a custom PCB. The next step will be to test the prototype in a real environment. In summary, we have discussed the design of a fast prototyping platform for ARM-based embedded systems to accommodate the requirements of flexibility and testability in the prototyping phase of embedded system development.

REFERENCES

[1] S. Trimberger, "A Reprogrammable Gate Array and Applications," Proc. IEEE, vol. 81, no. 7, July 1993, pp. 1030-1041.
[2] S. Hauck, "The Roles of FPGAs in Reprogrammable Systems," Proc. IEEE, vol. 86, no. 4, April 1998, pp. 615-638.
[3] S. Cardelli, M. Chiodo, P. Giusto, A. Jurecska, L. Lavagno, and A. Sangiovanni-Vincentelli, "Rapid Prototyping of Embedded Systems via Reprogrammable Devices," Proc. Seventh IEEE International Workshop on Rapid System Prototyping, 1996.
[4] Product Briefs for System Explorer Reconfigurable Platform with debugging capabilities.
[5] Steve Furber, ARM System-on-Chip Architecture, Addison-Wesley, 2000.
[6] Intel PXA255 Processor Developer's Manual, rs/manuals/278693.htm.
[7] JTAG Tools.


Implementation of High Throughput and

Low Power FIR Filter in FPGA
V. Dyana Christilda, B.E.*, R. Solomon Roach, M.Tech.**
*Student, Department of Electronics and Communication Engineering
**Lecturer, Department of Electronics and Communication Engineering
Francis Xavier Engineering College, Tirunelveli.

Abstract - This paper presents the implementation of high throughput and low power FIR filtering IP cores. Multiple datapaths are utilized for high throughput, and low power is achieved through coefficient segmentation, block processing, and combined segmentation and block processing algorithms. A coefficient reduction algorithm is also proposed for modifying the values and the number of non-zero coefficients used to represent the FIR digital pulse shaping filter response. With this algorithm, the FIR filter frequency and phase response can be represented with a minimum number of non-zero coefficients, thereby reducing the arithmetic complexity needed to compute the filter output. Consequently, the system characteristics, i.e. power consumption, area usage, and processing time, are also reduced. The paper presents the complete architectural implementation of these algorithms for high performance applications. Finally, the FIR filter is designed and implemented in an FPGA.

I. INTRODUCTION

One of the fastest growing areas in the computing industry is the provision of high throughput DSP systems in portable form. With the advent of SoC technology, and due to the intensive use of FIR filters in video and communication systems, high performance in speed, area and power consumption is demanded. Basically, digital filters are used to modify the characteristics of signals in the time and frequency domains, and they have been recognized as primary digital signal processing operations. For high performance, low power applications there is a continuous demand for DSP cores which provide high throughput while minimizing power consumption. Recently, more and more traditional applications and functionalities have been targeted to palm-sized devices, such as Pocket PCs and camera-enabled mobile phones with colorful screens. Consequently, not only is there a demand for high data processing capability for multimedia and communication purposes, but the requirement for power efficiency has also increased significantly. Furthermore, power dissipation is becoming a crucial factor in the realization of parallel-mode FIR filters. There is an increasing number of published techniques to reduce the power consumption of FIR filters.

The authors in [1] utilize the differential coefficients method (DCM), which involves using various orders of differences between coefficients, along with stored intermediate results, rather than using the coefficients themselves directly for computing the partial products in the FIR equation. To minimize the overhead while retaining the benefit of DCM, the differential coefficient and input method (DCIM) [2] and the decorrelating (DECOR) transformation [3] have been proposed. Another approach, used in [4], is to optimize the word-lengths of input/output data samples and coefficient values. This involves a general search-based methodology founded on statistical precision analysis and the incorporation of cost/performance/power measures into an objective function through word-length parameterization. In [5], Mehendale et al. present an algorithm for optimizing the coefficients of an FIR filter so as to reduce power consumption in its implementation on a programmable digital signal processor.

This paper presents the implementation of high throughput and low power FIR filtering Intellectual Property (IP) cores, and shows their implementation for increased throughput as well as low power applications through employing multiple datapaths. The paper studies the impact of parameterization, in terms of datapath parallelization, on the power/speed/area performance of these algorithms.

II. GENERAL BACKGROUND

Finite Impulse Response (FIR) filters have been used in signal processing for ghost cancellation and channel equalization. FIR filtering, whose output is described in Equation 1, is realized by a large number of adders, multipliers and delay elements.

Y[n] = Σ (k = 0 to N-1) h[k] · X[n-k]     (1)

where Y[n] is the filter output, X[n-k] is the input data, and h[k] is the filter coefficient. The direct form of a finite word-length FIR filter generally begins with rounding or truncating the optimum infinite-precision coefficients determined by McClellan and Parks.
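Equation 1 can be checked directly with a short sketch (ours, not the paper's IP core); `fir` below is a hypothetical reference model, not the hardware architecture.

```python
# Direct evaluation of Equation 1, Y[n] = sum_k h[k] * X[n-k],
# with X[n-k] taken as 0 for n-k < 0 (zero initial conditions).

def fir(h, x):
    """Return y[0..len(x)-1] for coefficients h and input samples x."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, hk in enumerate(h):      # one multiply-accumulate per tap
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y

# 4-tap moving-sum filter: each output is the sum of the last 4 inputs.
assert fir([1, 1, 1, 1], [1, 2, 3, 4, 5]) == [1, 3, 6, 10, 14]
```

An N-tap filter costs N multiplies and N-1 adds per output, which is the arithmetic complexity the later algorithms set out to reduce.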
The block diagram of a generic DF FIR filter implementation is illustrated in Figure 1. It consists of two memory blocks for storing coefficients (HROM) and input data samples (XRAM), two registers for holding the coefficient (HREG) and input data (XREG), an output register (OREG), and the controller along with the datapath unit. The XRAM is realized in the form of a latch-based circular buffer to reduce its power consumption. The controller is responsible for applying the appropriate coefficients and data samples to the datapath.

(Figure: FSM, HROM and XRAM feeding HREG and XREG, which drive the arithmetic unit.)
Fig. 2: Low power generic FIR core

In order to increase the throughput, the number of datapaths should be increased, and data samples and coefficients should be allocated to these datapaths in each clock cycle. For example, for a 4-tap FIR filter with 2 datapaths, the coefficient data can be separated into 2 parts, (h3, h2) and (h1, h0), each allocated to a different datapath with a corresponding set of input data samples, as shown in Figure 2. Therefore, an output will be obtained in ⌈N/M⌉ clock cycles, where N is the number of taps and M is the number of datapaths. For example, for a 4-tap filter, an output can be obtained in 2 clock cycles with 2 datapaths.

Fig. 3: A 2-datapath architecture

IV. DESIGN AND METHODOLOGY

A. Coefficient Segmentation Algorithm

Two's complement representation is commonly used in DSP applications due to the ease of performing arithmetic operations. Nevertheless, sign extension is its major drawback, and it causes more switching activity when data toggles between positive and negative values. For this reason, in the coefficient segmentation algorithm the coefficient h is segmented into two parts: one part, mk, for the multiplier and one part, sk, for the shifter. Segmentation is performed such that mk is the smallest positive value, in order to minimize the switching activity at the multiplier input. On the other hand, sk is a power-of-two number and can be either positive or negative, depending on the original coefficient.

The MSB of sk acts as the sign bit and the remaining bits give the amount of shift. For instance, if a coefficient is 11110001, the decomposed number and shift value will be 00000001 and 10100, respectively. An example 2-datapath implementation architecture of this algorithm is shown in Figure 4.

Fig. 4: AU of the segmentation algorithm

The AU for the coefficient segmentation algorithm is shown in Fig. 4. It consists of a multiplier (mult), an adder (add), a logarithmic shifter (shift) implemented using arrays of 2-to-1 multiplexers, a conditional two's complementor (xconv), a multiplexer (mux) to load and clear the shifter, and a clearing block (clacc) identical to the one in the conventional FIR filtering block. The MSB of the shift value sk determines whether a negative shift has to be performed, and it therefore controls the conversion unit xconv.
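The segmentation rule above can be sketched and checked against the paper's 11110001 example. The function names below are our own, and the sketch assumes the convention stated in the text that coefficients 0 and 1 pass unchanged to the multiplier (mk = h, sk = 0).

```python
# Sketch (ours) of coefficient segmentation: split a coefficient h into
# h = mk + sk, where sk is a signed power of two for the shifter and mk
# is the smallest non-negative remainder for the multiplier.

def segment(h):
    """Return (mk, sk) with h == mk + sk and sk a signed power of two."""
    if h in (0, 1):
        return h, 0                          # special cases per the paper
    if h > 0:
        sk = 1 << (h.bit_length() - 1)       # largest 2**j <= h
    else:
        sk = -(1 << (-h - 1).bit_length())   # smallest-magnitude -2**j <= h
    return h - sk, sk

def encode_shift(sk, width=5):
    """Pack sk as the shift word: MSB = sign bit, rest = shift amount."""
    if sk == 0:
        return '0' * width
    sign = '1' if sk < 0 else '0'
    shift = abs(sk).bit_length() - 1
    return sign + format(shift, f'0{width - 1}b')

# The paper's example: coefficient 11110001 (two's complement, -15)
# decomposes into mk = 00000001 and shift word 10100 (negative, shift 4),
# since -15 = 1 + (-2**4).
mk, sk = segment(-15)
assert (mk, sk) == (1, -16)
assert encode_shift(sk) == '10100'
```

Because h·x = mk·x + sk·x, the sk term costs only a shift (and a conditional two's complement), while the multiplier sees the small, mostly-positive mk, which is what cuts the switching activity.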


The output of xconv is the two's complement of the data only if the MSB of sk is one; otherwise the output is equal to the input data. When hk is zero (mk = 0, sk = 0) or one (mk = 1, sk = 0), the shift value will be zero. In these cases, the output of the shifter must be zero as well. To guarantee this behavior, a multiplexer is needed between the conversion unit and the shifter that applies a zero vector when sk equals zero. Since three values (the multiplier, shifter and accumulator outputs) are to be added, a single multi-input adder carries out this addition.

Fig. 5: Flow chart of the algorithm

B. Data Block Processing

The main objective of block processing is to implement signal processing schemes with high inherent parallelism. A number of researchers have studied block processing methods for the development of computationally efficient high-order recursive filters, which are less sensitive to round-off error and coefficient accuracy. During filtering, data samples in fixed-size blocks, L, are processed consecutively. This procedure reduces power consumption by decreasing the switching activity, by a factor depending on L, in the following: (1) the coefficient input of the multiplier, (2) the data and coefficient memory buses, and (3) the data and coefficient address buses. Due to the successive change of both coefficient and data samples at each clock cycle, there is high switching activity within the multiplier unit of the datapath. This high switching activity can be reduced significantly if the coefficient input of the multiplier is kept unchanged and multiplied with a block of data samples.

Once a block of data samples is processed, a new coefficient is obtained and multiplied with a new block of data samples. However, this process requires a set of accumulator registers corresponding to the size of the data block. Previous results have shown that a block size of 2 provides the best results in terms of power saving. An example datapath allocation for N = 6 and L = 2 and its corresponding architecture is shown in Figure 5. The sequence of steps for the multiplication scheme can be summarized as follows:
1. Get the first filter coefficient, h(N-1).
2. Get data samples x[n-(N-1)], x[n-(N-2)], ..., x[n-(N-L)] and save them into data registers R0, R1, ..., RL-1 respectively.
3. Multiply h(N-1) by R0, R1, ..., RL-1 and add the products into the accumulators ACC0, ACC1, ..., ACCL-1 respectively.
4. Get the second coefficient, h(N-2).
5. Get the next data sample, x[n-(N-L-1)], and place it in R0, overwriting the oldest data sample in the block.
6. Process h(N-2) as in step (3); however, use the registers in a circular manner, e.g. multiply h(N-2) by R1, ..., RL-1, R0. Their respective products will be added to accumulators ACC0, ACC1, ..., ACCL-1. Process the remaining coefficients as for h(N-2).
7. Get the output block, y(n), y(n-1), ..., y(n-L), from ACC0, ACC1, ..., ACCL-1 respectively.
8. Increment n by L and repeat steps (1) to (7) to obtain the next output block.

Fig. 5: The block processing algorithm with 2 datapaths

C. Combined Coefficient Segmentation and Block Processing Algorithm

The architectures of the coefficient segmentation and block processing algorithms can be merged together. This reduces the switching activity at both the coefficient and data inputs of the multiplier units within the datapaths, with only a slight overhead in area. The algorithm commences by processing the coefficient set through the segmentation algorithm, which segments the coefficients into two primitive parts. The first part, sk, is processed through a shifter and the remaining part, mk, is applied to the multiplier input.
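The block-processing scheme of steps (1) to (8) above can be checked functionally with a small sketch. This models only the order of the multiply-accumulate operations (coefficient held fixed across a block of L samples), not the register file or circular addressing, and all names are ours.

```python
# Functional sketch (ours) of block processing: each coefficient is held
# constant at the multiplier input while it multiplies a whole block of
# L data samples, and accumulators ACC0..ACC(L-1) build L outputs at
# once.  The result is checked against the direct FIR sum.

def fir_block(h, x, L=2):
    y = [0] * len(x)
    for m in range(0, len(x), L):          # one output block per pass
        acc = [0] * L                      # the L accumulators
        for k, hk in enumerate(h):         # coefficient fetched once...
            for j in range(L):             # ...reused across the block
                if 0 <= m + j - k < len(x):
                    acc[j] += hk * x[m + j - k]
        for j in range(L):
            if m + j < len(x):
                y[m + j] = acc[j]          # drain the accumulators
    return y

def fir(h, x):
    return [sum(hk * x[n - k] for k, hk in enumerate(h) if n - k >= 0)
            for n in range(len(x))]

h, x = [3, 1, -2, 4], [1, 2, 0, -1, 3, 5]
assert fir_block(h, x, L=2) == fir(h, x)
```

The outputs are identical to the direct form; only the loop order changes, which is why the power saving comes for free arithmetically.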


The algorithm performs the segmentation by selecting a value of sk which leaves mk as the smallest positive number. This results in a significant reduction in the amount of switched capacitance. The resulting sk and mk values are then stored in the memory for the filtering operations. The filtering operation commences by fetching the sk and mk values and applying them to the shifter and multiplier inputs respectively. Next, a block of L data samples (x0, x1, ..., xL-1) is fetched from the data memory and stored in the register file.

This is followed by applying the first data sample, x0, in the register file to both the shifter and multiplier units. The resulting values from both units are then summed, and the final result is added to the first accumulator. The process is repeated for all other data samples. The contents of the register file are updated with the addition of a single new data entry, which replaces the first entry of the previous cycle. This procedure reduces the switching activity at the coefficient inputs of the multiplier, since the same coefficient is used with all data samples in the block. In addition, fewer accesses to both the data and coefficient memories are required, since coefficient and data samples are obtained through internal registers.

Fig. 6: Combined segmentation and block processing algorithm with 2 datapaths

The sequence of steps involved is given below:
1. Clear all accumulators (ACC0 to ACCL-1).
2. Get the multiplier part, m(N-1), of the coefficient h(N-1) from the coefficient memory and apply it to the coefficient input of the multiplier.
3. Get the shifter part, s(N-1), of the coefficient h(N-1) and apply it to the inputs of the shifter.
4. Get the data samples x[n-(N-1)], x[n-(N-2)], ..., x[n-(N-L)], and store them into data registers R0, R1, ..., RL-1 respectively. This forms the first block of data samples.
5. Apply R0 to both the multiplier and shifter units. Add their results and the content of accumulator ACC0 together and store the final result into accumulator ACC0. Repeat this for the remaining data registers R1 to RL-1, this time using accumulators ACC1 to ACCL-1 respectively.
6. Get the multiplier part, m(N-2), and the shifter part, s(N-2), of the next coefficient, h(N-2), and apply them to the multiplier and shifter inputs respectively.
7. Update the data block formed in step (4) by getting the next data sample, x[n-(N-L-1)], and storing it in data register R0, overwriting the oldest data sample in the block.
8. Process the new data block as in step (5); however, start processing with R1, followed by R2, ..., RL-1, R0 in a circular manner. During this procedure use the accumulators in the same order as the data registers.
9. Process the remaining multiplier and shifter parts as in steps (6) to (8).
10. Get the first block of filter outputs, y(n), y(n-1), ..., y(n-L), from ACC0, ACC1, ..., ACCL-1.
11. Increment n by L and repeat steps (1) to (10) to obtain the next block of filter outputs.

D. Coefficient Reduction Algorithm for Coefficient Representation

The main goal of the coefficient reduction algorithm is to reduce the number of non-zero coefficients used to represent the filter response. The coefficient reduction algorithm is summarized below:
1. Derive the filter coefficients based on the desired specifications using MATLAB or any other filter design software.
2. Multiply these coefficients by a constant value so that some of them become greater than zero.
3. Round the values obtained from step 2 to the nearest integer.
4. The number of non-zero values obtained from step 3 must represent at least 93% of the signal power.
5. If the same signal power representation can be obtained with different constant values, then the smaller value is chosen.
6. The values of the first and last coefficients produced from step 5 are equal to zero.
7. Change the values of the first and last coefficients to be non-zero with their original sign from step 1.
8. Find the frequency response of the filter using the new set of coefficients and see whether or not it fulfills the desired specifications.
9. The absolute value of the first coefficient must be less than 10. Values greater than the proper one will cause ripples in the pass band and/or the transition band, and/or maximize the gain factor of the filter response.


10. If ripples are present in the pass band or the transition band region of the frequency response found in step 8, then the first and last coefficient values must be reduced.
11. Divide the new set of coefficients by the constant used in step 2, so that the filter response is normalized back to zero magnitude.

Fig. 7: Distribution of the transmitted signal's average power

The coefficient reduction algorithm starts by obtaining the filter coefficients based on the desired specifications. Using the round function in MATLAB, these coefficients are rounded to the nearest integer after being multiplied with a constant integer value. It is better to choose the constant value to be a power of 2, i.e. 2^m, so that the division in step 11 is done simply by a right shift. The frequency domain representation obtained with the new set of coefficients must cover at least 93% of the signal power (see Fig. 7); otherwise the filter performance with the new set of coefficients will differ greatly from the original. The value of the constant must be the smaller one if more than one constant can produce the same signal power: a smaller value will lead to a smaller gain factor and fewer pass band and/or transition band ripples.

V. CONCLUSION

This paper gives the complete architectural implementations of low power algorithms for high performance applications. By combining the two algorithms, power is reduced and throughput is increased by increasing the number of datapath units. The combined segmentation and block processing (COMB) algorithm achieves the best power savings.

This paper also presents in detail an algorithm proposed for modifying the number and values of FIR filter coefficients, namely the coefficient reduction algorithm. The algorithm's target is to reduce the number of non-zero coefficients used to represent any FIR filter. Encouraging results are achieved when obtaining the phase and frequency responses of the filters with the new set of coefficients. The schemes target reducing power consumption through a reduction in the amount of switched capacitance within the multiplier section of the DSPs.

REFERENCES

[1] N. Sankarayya, K. Roy, and D. Bhattacharya, "Algorithms for Low Power and High Speed FIR Filter Realization Using Differential Coefficients," IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, vol. 44, pp. 488-497, June.
[2] T.-S. Chang, Y.-H. Chu, and C.-W. Jen, "Low Power FIR Filter Realization with Differential Coefficients and Inputs," IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, vol. 47, no. 2, pp. 137-148, Feb. 2000.
[3] M. Mehendale, S. D. Sherlekar, and G. Venkatesh, "Low Power Realization of FIR Filters on Programmable DSPs," IEEE Trans. on VLSI Systems.
[4] S. Ramprasad, N. R. Shanbhag, and I. N. Hajj, "Decorrelating (DECOR) Transformations for Low Power Digital Filters," IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing.
[5] Choi and W. P. Burleson, "Search-Based Wordlength Optimization for VLSI/DSP Synthesis," VLSI Signal Processing.
[6] T. Erdogan, M. Hasan, and T. Arslan, "Algorithm Low Power FIR Cores," Circuits, Devices and Systems, IEEE Proceedings.
[7] Kyung-Saeng Kim and Kwyro Lee, "Low-Power and Area-Efficient FIR Filter Implementation Suitable for Multiple Taps," IEEE Trans. on VLSI Systems, vol. 11, no. 1, Feb.
[8] Tang Zhangwen, Zhang Zhanpeng, Zhang Jie, and Min Hao, "A High-Speed, Programmable, CSD Coefficient FIR Filter," ICASSP.


n x Scalable Stacked MOSFET for Low Voltage CMOS

T. Loganayagi, Lecturer, Dept. of ECE (PG), Sona College of Technology, Salem
M. Jeyaprakash, Student, Dept. of ECE (PG), Sona College of Technology, Salem

Abstract - This paper presents a design and implementation of a stacked MOSFET circuit for output drivers in low voltage CMOS technologies. A monolithic implementation of series-connected MOSFETs for high voltage switching is presented. Using a single low voltage control signal to trigger the bottom MOSFET in the series stack, a voltage division across parasitic capacitances in the circuit is used to turn on the entire stack of devices. Voltage division provides both static and dynamic voltage balancing, preventing any device in the circuit from exceeding its nominal operating voltage. This circuit, termed the stacked MOSFET, is n x scalable, allowing for on-die control of voltages that are n x the fabrication process's rated operating voltage. The governing equations for this circuit are derived, and reliable operation is demonstrated through simulation and experimental implementation in a 180 nm SOI CMOS process.

Key Words - CMOS integrated circuits, high voltage techniques, buffer circuits, input/output (I/O).

I. INTRODUCTION

High-voltage switching in current MOSFET technology is becoming increasingly difficult due to decreasing gate-oxide thickness. Devices with reduced gate oxide are optimized for speed, power consumption and size. Stacked MOSFETs in combination with level shifters are one circuit technique to switch high voltages and overcome the decreased gate-oxide breakdown; the stacked MOSFET enables rail-to-rail high voltage switching. On-die high-voltage switching (where high voltage is defined as any voltage greater than the rated operating voltage of the CMOS fabrication process being used) is a system-on-chip (SOC) design challenge that is becoming ever more problematic. Such difficulty is a direct result of the reduced breakdown voltages that have arisen from the deep sub-micrometer and nanometer scaling of MOSFET geometries. While these low-voltage processes are optimized for minimum power consumption, high speed, and maximum integration density, they may not meet the requirements of system applications where high-voltage capabilities are needed. Applications of on-die high-voltage switching include MEMS device control, monolithic power converter switching, high-voltage aperture control, ultrasonic transducer control, electrostatic device control, piezoelectric positioning, and many others.

Existing methods for handling such high-voltage switching can be divided into two general categories: device techniques and circuit techniques [1]. Device techniques include lateral double-diffused MOSFETs (LDMOSFETs) and mixed voltage fabrication processes. These methods increase the individual transistor's breakdown voltage by modifying the device layout. In the past, LDMOSFETs have been used to achieve extremely high operating voltages in standard CMOS technologies [2]. This is accomplished by adding a lightly-doped drift region between the drain and the gate channel. The layout of such devices is unique to each process they are implemented in and, as such, is very labor and cost intensive. Further, because modern fabrication processes utilize thinner gate oxides and reduced overall process geometries, new LDMOSFETs are becoming less effective. Even circular LDMOSFETs, the most effective device shape for mitigating high e-field stress, are less effective than they once were.

Mixed voltage fabrication processes essentially take a step back in time, allowing for the fabrication of large-geometry, thick-oxide devices on the same substrate as sub-micrometer geometry devices [3]. Although effective, these processes are more expensive due to their added mask and process steps, and they still exhibit an upper limit on operating voltage. Further, because more die space per transistor is required, the performance per area is relatively poor.

Circuit techniques used for on-die high-voltage control include level shifters and monolithic high-voltage input/output (I/O) drivers. Taking many different forms, level shifters work by upwardly translating low-voltage signals, such that the voltage across any two terminals of each device in the circuit never exceeds the rated operating voltage [1], [4]. In doing this, an output voltage that is greater than the individual transistor breakdown voltages can be controlled. However, the magnitude of the output voltage swing is still limited by the individual transistor breakdown, which requires an output signal that does not operate rail-to-rail. As such, these level shifters are only suitable for applications where the addition of off-die high-voltage transistors is possible. Monolithic high-voltage I/O drivers are a relatively new technique for the on-die switching of voltages greater than the rated operating voltage of the process [6]. These circuits enable high-voltage switching using only the native low-voltage FETs of the


fabrication process. Reference [5] reports a circuit that switches 2.5x the rated voltage of the process. While this topology is theoretically n-x scalable, it requires an excessive number of devices in the signal path, not only taking up a large amount of die area but also increasing the on-resistance. Reference [6] reports achieving 3x the rated voltage of the process using only three devices in the signal path. This minimizes the on-resistance, but the design is not n-x scalable.
In this paper, we present a circuit technique for on-die high-voltage switching that uses a minimum number of devices in the signal path while remaining n-x scalable. Termed the stacked MOSFET, this circuit uses the native low-voltage FETs of the fabrication process to switch voltages n times greater than the rated breakdown voltage of the process used. That is, this circuit is scalable to arbitrarily high output voltages, limited only by the substrate breakdown voltage.
The goal of this paper is to show that the stacked MOSFET is scalable in integrated circuits [7]. The topology is not changed from [7], but the governing equations are rederived here such that the design variables are those that are commonly available in any IC process ([7] derived the governing equations based on design variables commonly available for discrete high-voltage power MOSFETs). First, an overview of the stacked MOSFET topology is presented, along with a derivation of its governing equations. This discussion focuses on the specific realization of a two-MOSFET stack, with a generalization given for extending to an n-MOSFET stack. Second, circuit simulation results are presented, giving validity to our mathematical model. Third, and finally, experimental results are presented, revealing excellent correlation between the analytic models, simulation results, and measured data.

Fig. 1. Schematic of a two-device stacked MOSFET.

A. Derivation for a Two-Device Stacked MOSFET

Fig. 1 shows the topology of a two-device stacked MOSFET. By placing MOSFETs in series and equally dividing the desired high voltage across them for the entire switching period, reliable high-voltage control can be achieved. In switching applications this circuit acts as a single MOSFET switch controlled by a low-voltage logic level. Hess and Baker implemented this circuit using discrete power MOSFETs. As such, their characterization of the circuit was well suited to the discrete design process, utilizing spec-sheet parameters such as MOSFET input and output capacitance. To realize this circuit concept in IC technology, the governing equations need to be recharacterized in terms of IC design parameters. The following is a derivation of the governing equations for the two-device stacked MOSFET based on conservation-of-charge principles. From this derivation, equations representing an n-device stacked MOSFET will be generated.

Fig. 2. Two-device stacked MOSFET, including parasitic capacitances, with definition of the notation used in the derivation.

The triggering of the stacked MOSFET is accomplished through capacitive voltage division. As shown in Fig. 2, there exists an inherent parasitic capacitance Cp between the gate and source of M2. This capacitance, along with a capacitor C2 inserted in the gate leg of M2, sets the value of Vgs2 that turns on M2.
Consider the circuit in Fig. 2 with an initial condition of both devices turned off. If the resistors are sized such that

Rbias << R1 + R2    (1)

then the output voltage rises to Vdd. (Note that this assumes that the off-state leakage current through M1 and M2 is much less than the current through R1 and R2.) Since M1 is off, the node Vdrain is free to take on the value dictated by the voltage divider of R1 and R2. If R1 and R2 are sized equally, then

Vdrain = Vdd/2    (2)

This voltage is greater than Vg2 (the reason for this will become more apparent later in the derivation) and causes the diode to be forward biased. The resulting voltage at the gate of M2 will be

Vg2 = Vdrain - Vdiode = Vdd/2 - Vdiode    (3)

where Vdiode is the forward voltage across the diode. Equations (2) and (3) dictate a Vgs2 of

Vgs2 = -Vdiode    (4)

keeping M2 off. As such, the off condition, with the output voltage at Vdd and Vdrain at Vdd/2, exhibits static voltage balancing and results in this condition being safely held.
When Vin rises high, M1 is turned on, pulling Vdrain to ground. This reverse biases the diode, leaving the gate-source voltage of M2 to be set by the capacitive voltage divider of C2 and Cp. Cp represents the lumped total parasitic capacitance across the gate-source of M2 and can be solved for as

Cp = Cdiode + Cgs + Cgb + Cgd(1 - Ev1) + Cds(1 - Ev2)    (5)

where Cdiode is the reverse-bias junction capacitance of the diode and Cgs, Cgb, Cgd, and Cds are the corresponding MOSFET junction capacitances. Ev1 and Ev2 are used to approximate the Miller capacitance resulting from Cgd and Cds, respectively, and are defined as

Ev1 = dVds/dVgs = -(Vdd/2) / (Vgs + Vdiode)    (6a)

Ev2 = dVdg/dVgs = -(Vdd/2 + Vgs + Vdiode) / (Vgs + Vdiode)    (6b)

At turn-on, C2 and Cp are in parallel, resulting in the final gate-source voltage being dictated by the total charge on the parallel combination of the two capacitors. By the conservation of charge, the total charge on the parallel combination of C2 and Cp will be the sum of their initial charges

Qtotal = Q2(initial) + Qp(initial)    (7)

where

Q2(initial) = C2(Vdd/2 - Vdiode),  Qp(initial) = Cp(-Vdiode)    (8)

The resulting gate-source voltage will be

Vgs2(final) = Qtotal / (C2 + Cp)    (9)

Substituting in (8),

Vgs2 = [C2(Vdd/2 - Vdiode) + Cp(-Vdiode)] / (C2 + Cp)    (10)

This simplifies to

Vgs2 = (C2 / (C2 + Cp)) (Vdd/2) - Vdiode    (11)

Solving (11) for C2, an expression for setting the desired gate-source voltage to turn on M2 can be found as

C2 = Cp (Vgs + Vdiode) / (Vdd/2 - (Vgs + Vdiode))    (12)

M2 will then be held on as long as the charge on C2 maintains a voltage greater than the threshold of M2. This implies an inherent low-frequency limitation, due to on-state leakage current dissipating the charge on C2. If frequencies of operation lower than those allowed by the given value of C2 are desired, C2 and Cp can be simultaneously scaled up according to the ratio

C2/Cp = (Vgs + Vdiode) / (Vdd/2 - (Vgs + Vdiode))    (13)

Because MOSFETs are majority-charge-carrier devices and each device in this circuit is capacitively coupled to the next, it is expected that all devices in the stack will turn on and turn off together, maintaining dynamic voltage balancing. This will be experimentally verified in the following.
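As a quick numerical cross-check of the derivation (a sketch added for illustration, not part of the measured design; the values of Vdd, Vdiode, Cp, and the target Vgs are assumed), equations (10), (11), and (12) can be verified to be mutually consistent:

```python
# Numerical cross-check of the two-device stacked-MOSFET equations.
# All example values are illustrative assumptions, not from the paper.
Vdd = 10.0        # supply voltage (V)
Vdiode = 0.7      # diode forward drop (V)
Cp = 1.0e-12      # lumped parasitic gate-source capacitance of M2 (F)
Vgs_target = 4.0  # desired turn-on gate-source voltage for M2 (V)

# Eq. (12): size C2 to hit the target Vgs2.
C2 = Cp * (Vgs_target + Vdiode) / (Vdd / 2 - (Vgs_target + Vdiode))

# Eq. (10): charge balance of C2 and Cp after the diode turns off.
vgs2_eq10 = (C2 * (Vdd / 2 - Vdiode) + Cp * (-Vdiode)) / (C2 + Cp)

# Eq. (11): simplified capacitive-divider form.
vgs2_eq11 = C2 / (C2 + Cp) * (Vdd / 2) - Vdiode

assert abs(vgs2_eq10 - vgs2_eq11) < 1e-9   # (10) and (11) agree
assert abs(vgs2_eq10 - Vgs_target) < 1e-9  # (12) recovers the target Vgs
```

Any consistent set of values satisfies both assertions, since (11) is an algebraic simplification of (10) and (12) is its inversion.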


B. Derivation for an n-Device Stacked MOSFET

The previous analysis and characterization can be directly extended to a stack of n MOSFETs. Fig. 3 shows a generalized schematic of an n-device stacked MOSFET. Equation (11) can be redefined for the generalized circuit as

Vgs(i) = (C(i) / (C(i) + Cp(i))) ((i - 1)(Vdd/n) - Vdiode) + (Cp(i) / (C(i) + Cp(i))) (-Vdiode)    (14)

where n is the number of devices in the stack and i is the specific device being considered. The parasitic capacitances Cp(i) are defined in the same manner as for the two-device stack and are all equal for equally sized devices in the stack. The design equation for setting the turn-on gate-source voltages is then generally defined as follows:

C(i) = Cp(i) (Vgs + Vdiode) / ((i - 1)(Vdd/n) - (Vgs + Vdiode))    (15)

Fig. 3. Generalized n-device stacked MOSFET.

The (i - 1)(Vdd/n) term in the denominator of (15) increases for devices higher in the stack, resulting in a value for C(i) that is less than C(i-1). This reduction in C(i) implies that the ratio of die-space occupied to output voltage decreases for higher voltages. In other words, less overhead space is required for devices at the top of the stack than at the bottom.
As with the two-device stacked MOSFET, if frequencies of operation lower than those allowed by the given value of C(i) are desired, C(i) and Cp(i) can be simultaneously scaled up according to the ratio

C(i)/Cp(i) = (Vgs + Vdiode) / ((i - 1)(Vdd/n) - (Vgs + Vdiode))    (16)

III. DESIGN AND SIMULATION

Utilizing the previous equations, a two-device stacked MOSFET has been designed and simulated for implementation in Honeywell's 0.35-µm PD SOI CMOS process. This process is rated for 5-V operation. The models used are the BSIMSOI models provided by Honeywell. Also, to illustrate the validity of the design equations for the general n-device stacked MOSFET, simulation results for an eight-device stacked MOSFET are included.
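The scaling behavior of (15) can be sketched numerically; the stack depth below matches the eight-device example, but Vdd, Vgs, Vdiode, and Cp are illustrative assumptions:

```python
# Gate-capacitor sizing for an n-device stacked MOSFET, per Eq. (15).
# All numerical values are illustrative assumptions.
n = 8          # devices in the stack
Vdd = 40.0     # total switched voltage (V), i.e. Vdd/n = 5 V per device
Vgs = 4.0      # desired turn-on gate-source voltage (V)
Vdiode = 0.7   # diode forward drop (V)
Cp = 1.0e-12   # parasitic capacitance, equal for equally sized devices (F)

# Device 1 is driven directly; devices i = 2..n each get a gate capacitor C(i).
C = {i: Cp * (Vgs + Vdiode) / ((i - 1) * (Vdd / n) - (Vgs + Vdiode))
     for i in range(2, n + 1)}

# C(i) decreases monotonically up the stack, so less die area is needed
# for the devices supporting the higher voltages.
caps = [C[i] for i in range(2, n + 1)]
assert all(a > b for a, b in zip(caps, caps[1:]))
```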
Consider the two-device stacked MOSFET shown in Fig. 1. If each FET used in the stack is sized to have a W/L of 500 and a gate-source voltage of 4 V, then the parasitic capacitances, under the desired operating conditions, can be extracted from the device models as shown in Table I. This table also includes the extracted diode junction capacitance at the appropriate biasing conditions. Accordingly, C2 can be sized using (5) and (12) to be 14.6 pF. The simulated drain voltages resulting from the previous design values are shown in Fig. 4. The top trace is the drain voltage for M2 and the lower trace is the drain voltage for M1. Note that the voltages are evenly distributed, causing neither device to exceed its drain-source breakdown voltage. The gate-source voltage which controls M2 is shown in Fig. 5. Note that the 4-V gate-source voltage designed for turning on M2 is achieved. Also, the predicted 0.7-V gate-source voltage used to hold the stack off is exhibited.
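The sizing above can be approximately reproduced from the Table I values (assumed here to be in femtofarads) using (5), (6a), (6b), and (12). The 0.7-V diode drop is an assumption, so the result lands near, rather than exactly at, the quoted 14.6 pF; the residual difference comes from the bias-dependent diode parameters:

```python
# Reproduce the C2 sizing from Eqs. (5), (6a), (6b) and (12) using the
# extracted junction capacitances of Table I (assumed to be in fF).
# The 0.7-V diode drop is an illustrative assumption.
Vdd, Vgs, Vdiode = 10.0, 4.0, 0.7                               # volts
Cgs, Cgb, Cgd, Cds, Cdiode = 838.63, 16.62, 52.03, 10.87, 9.96  # fF

# Eqs. (6a)/(6b): Miller-effect voltage-swing factors.
Ev1 = -(Vdd / 2) / (Vgs + Vdiode)
Ev2 = -(Vdd / 2 + Vgs + Vdiode) / (Vgs + Vdiode)

# Eq. (5): lumped parasitic gate-source capacitance of M2.
Cp = Cdiode + Cgs + Cgb + Cgd * (1 - Ev1) + Cds * (1 - Ev2)

# Eq. (12): gate capacitor needed for a 4-V turn-on Vgs.
C2 = Cp * (Vgs + Vdiode) / (Vdd / 2 - (Vgs + Vdiode))

print(round(Cp / 1e3, 2), "pF parasitic;", round(C2 / 1e3, 1), "pF for C2")
```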

Fig. 4. Drain voltages for a two-device stacked MOSFET operating with a 10-V supply.

TABLE I
MODELED JUNCTION CAPACITANCES

Capacitance     Extracted value
Gate-Source     838.63
Gate-Bulk       16.62
Gate-Drain      52.03
Drain-Source    10.87
Diode           9.96

Fig. 5. Gate-source voltage for M2 in a two-device stacked MOSFET operating with a 10-V supply.

Fig. 6. Dynamic drain voltage balancing on the rising edge.

Fig. 7. Dynamic drain voltage balancing on the falling edge.

IV. EXPERIMENTAL RESULTS

The previously simulated two-device stacked MOSFET has been implemented in the same Honeywell 0.35-µm PD SOI CMOS process. The layout and test structure are shown in Fig. 9. In implementing this circuit it is important to take into account any parasitics that are introduced in layout as well as in the test and measurement setup. All capacitances will affect the operation of the stacked MOSFET. For this reason, good layout techniques, coupled with post-layout parasitic simulation of the circuit, are critical. Further, realistic models of capacitances and inductances introduced by probe tips, bond wires, or other connections should be taken into account.
Fig. 8 shows a drain voltage characteristic similar to the simulation results shown in Fig. 4. This characteristic results from the two-device stacked MOSFET being biased with a 10-V supply, operating at 50 kHz. As predicted, these measurements show that in the off state static voltage balancing is achieved. This balancing ensures that each device is supporting an even 5-V share of the 10-V output. When the stack turns on, both devices turn on almost simultaneously, pulling the output to ground.
As discussed previously, because the MOSFET is a majority charge carrier device, and each device is capacitively coupled to the next, all of the devices in the stack rise and fall together. This dynamic voltage sharing is what allows each component in the circuit to operate very near the edge of its rating.



Fig. 8. Measured drain voltages for a two-device stacked MOSFET showing even voltage sharing for both devices.

Fig. 9. Layout of a two-device stacked MOSFET with test pads.

V. CONCLUSION

In this paper we have shown, with new characteristic equations, that the series-connected MOSFET circuit is adaptable to IC technology. Using this monolithic implementation of series-connected MOSFETs, on-die high-voltage switching is achieved. The governing design equations have been derived and verified through circuit simulation and experimental measurement. This technique for on-die high-voltage switching can be classified as a circuit technique that reliably achieves rail-to-rail output swings. Such high-voltage switching is accomplished using only the fabrication process's native low-voltage devices. The triggering of this circuit is extremely fast, exhibiting input-to-output delays of only 5.5 ns, with rise and fall rates of approximately 10 kV/µs. The low-frequency limit is set only by the scaling of the inserted gate capacitor. The high-frequency limit will ultimately be set by the rise/fall times. Our measured results show excellent static and dynamic voltage sharing. In the event of transient overvoltages, the overvoltage is evenly distributed across the stack, minimizing the impact.

REFERENCES

[1] H. Ballan and M. Declercq, High Voltage Devices and Circuits in Standard CMOS Technologies. Norwell, MA: Kluwer, 1999.
[2] T. Yamaguchi and S. Morimoto, "Process and device design of a 1000-V MOS IC," IEEE Trans. Electron Devices, vol. 29, no. 8, pp. 1171–1178, Aug. 1982.
[3] J. Williams, "Mixing 3-V and 5-V ICs," IEEE Spectrum, vol. 30, no. 3, pp. 40–42, Mar. 1993.
[4] D. Pan, H. W. Li, and B. M. Wilamowski, "A low voltage to high voltage level shifter circuit for MEMS application," in Proc. UGIM Symp., 2003, pp. 128–.
[5] A.-J. Annema, G. Geelen, and P. de Jong, "5.5 V I/O in a 2.5 V 0.25 µm CMOS technology," IEEE J. Solid-State Circuits, vol. 36, no. 3, pp. 528–538, Mar. 2001.
[6] B. Serneels, T. Piessens, M. Steyaert, and W. Dehaene, "A high-voltage output driver in a 2.5-V 0.25-µm CMOS technology," IEEE J. Solid-State Circuits, vol. 40, no. 3, pp. 576–583, Mar. 2005.
[7] H. Hess and R. J. Baker, "Transformerless capacitive coupling of gate signals for series operation of power MOS devices," IEEE Trans. Power Electron., vol. 15, no. 5, pp. 923–930, Sep. 2000.


Test pattern selection algorithms using output deviation

S. Malliga Devi, Student Member, IEEE, Lyla B. Das, and S. Krishna Kumar

Abstract: It is well known that n-detection test sets are effective in detecting unmodeled defects and improving defect coverage. However, in these sets, each of the n-detection test patterns has the same importance to overall test set performance. In other words, the test pattern that detects a fault for the first time plays the same role as the test pattern that detects that fault for the n-th time. Moreover, the test data volume of an n-detection test set is often too high. In this paper, we use output deviation algorithms combined with an n-detection test set to reduce test data volume and test application time efficiently, performing test selection using a probabilistic fault model and the theory of output deviations. To demonstrate the quality of the selected patterns, we present experimental results for non-feedback zero-resistance bridging faults and stuck-open faults in the ISCAS benchmark circuits. Our results show that for the same test length, patterns selected on the basis of output deviations are more effective than patterns selected using several other methods.

I. INTRODUCTION

Semiconductor manufacturers strive to attain a high yield (ideally 100%) when fabricating integrated circuits. Unfortunately, numerous factors can lead to a variety of manufacturing defects which may reduce the overall yield. The purpose of testing is to identify and eliminate any defective chips after the chips are manufactured. However, it is currently impractical to test exhaustively for all possible defects. This is a result of the computational infeasibility of accurately modeling defects, limitations imposed by existing manufacturing test equipment, and time/economic constraints imposed by the test engineers. For these reasons, the stuck-at fault (SAF) model has been accepted as the standard model for generating test patterns [2]. Most of the existing commercial ATPG tools use SAF coverage as a metric of the quality of a test set and terminate test generation when a high SAF coverage is attained.
Each possible physical defect in a tested circuit should be covered by the test method that leads to the lowest overall testing costs, taking into account, e.g., the complexity of test pattern generation (TPG) and the test application time. The problem of finding an optimal test set for a tested circuit with acceptable fault coverage is an important task in the diagnostics of complex digital circuits and systems. It has been published that high stuck-at fault coverage cannot guarantee high-quality testing, especially for CMOS integrated circuits. The SAF model ignores the actual behaviour of digital circuits implemented as CMOS integrated circuits and does not adequately represent the majority of real integrated-circuit defects and failures.
The purpose of fault diagnosis is to determine the cause of failure in a manufactured, faulty chip. An n-detection test set has the property that each modeled fault is detected either by n different tests, or by the maximum obtainable m different tests that can detect the fault (m < n). Here, by different tests for a fault, we mean tests which can detect this fault and activate and/or propagate the faulty effect along different paths [3]. The existing literature reports experimental results [4] suggesting that n-detection test sets are useful in achieving high defect coverage for all types of circuits (combinational, scan sequential, and non-scan sequential). However, the effectiveness of n-detection tests for diagnosis remains an unaddressed issue.
The inherent limitation of n-detection tests is their increased pattern size. Typically, the size of an n-detection test set increases approximately linearly with n [3]. Because tester storage space is limited, large test volume may create problems for storing the failing-pattern responses.
In this paper, we investigate the effectiveness of n-detection tests to diagnose failure responses caused by stuck-at and bridging faults. It was observed in [] that a common one-detection test set with greater than 95% stuck-at fault coverage produced only 33% coverage of node-to-node bridging faults. A test that detects a stuck-at fault on a node will also detect the corresponding low-resistive bridges (AND, OR) with the supply lines. This is also the reason that tests generated for stuck-at faults can detect some bridging defects in the circuit. However, such test sets do not guarantee the detection of node-to-node bridges. If a stuck-at fault on a node is detected once, the probability of detecting a static bridging fault with another uncorrelated node that has a signal probability of 50% is also 50%. When the stuck-at fault is detected twice (thrice), the estimated probability of detecting the bridging fault with another node acting as an aggressor increases to 75%


(88%). A test set created by a conventional ATPG tool aiming at single detection may have up to 6% of stuck-at faults detected only once, and up to 10% of stuck-at faults detected only once or twice. This may result in inadequate coverage of node-to-node bridging defects. The experimental results show that, in general, n-detection tests can effectively improve the diagnostic algorithm's ability to locate the real fault locations even when using a single-stuck-at-fault-based diagnosis algorithm.

A. Fault model

We consider a probabilistic fault model [5] that allows any number of gates in the IC to fail probabilistically. Tests for this fault model, determined using the theory of output deviations, can be used to supplement tests for classical fault models, thereby increasing test quality and reducing the probability of test escape. By targeting multiple fault sites in a probabilistic manner, such a model is useful for addressing phenomena or mechanisms that are not fully understood. Output deviations can also be used for test selection, whereby the most effective test patterns can be selected from large test sets during time-constrained and high-volume production testing [1]. The key idea here is to use redundancy to ensure correct circuit outputs if every logic gate is assumed to fail with a given probability. Elegant theoretical results have been derived on the amount of redundancy required for a given upper bound on this probability. However, these results are of limited practical value because the redundancy is often excessive, the results target only special classes of circuits, and a fault model that assigns the same failure probability to every gate (and for every input combination) is too restrictive. Stochastic techniques have also been proposed to compute reliably using logic gates that fail probabilistically.

II. FINDING OUT ERROR PROBABILITY

In this section, we explain how to calculate the error probability of a logic gate. The error probability of a gate defines the probability of the output being an unexpected value for the corresponding input combination. For calculating the error probability we need the reliability vector of the gate and the probabilities of the various input combinations. Using the method of [7], we calculate the output probabilities of a gate. For an example gate with the input combination 00, the output probabilities are pc0 = 0.1 and pc1 = 0.9, where pc0 is the probability of the output being 0 and pc1 is the probability of the output being 1 for the corresponding input combination.

III. CALCULATION OF THE OUTPUT DEVIATION

Output deviation is the metric which tells how much the output deviates from the expected value. We use the ISCAS-85 benchmark circuits for calculation of the output deviations. For example, consider calculating the output deviation [5] of the circuit c17.bench from the ISCAS-85 benchmark circuits. For the input pattern 00000, the expected outputs are 0 0, but the probability of output line 22 being 1 is 0.333 and of being 0 is 0.667. Similarly, the probability of output line 23 being 1 is also 0.333 and of being 0 is 0.667. From the definition of output deviation [7], the output deviations of output lines 22 and 23 for this circuit are 0.333 and 0.333, respectively.

IV. N-DETECTION TEST SET

In an n-detection test set, each target fault is targeted by n different test patterns. Using an n-detection test set, defect coverage is increased: as the number of unique detections for each fault increases, the defect coverage usually improves. An advantage of this approach is that even when n is very large, n-detection test sets can be generated using existing single stuck-at ATPG tools with reasonable computation time. We use the ATALANTA single stuck-at ATPG tool to generate the n-detection test set.

A. Disadvantage of n-detection test

However, the data volume of an n-detection test set is often too large, resulting in long testing time and high tester memory requirements. This is because the n-detection method simply tries to detect each single stuck-at fault n times, and does not use any other metric to evaluate the contribution of a test pattern towards increasing the defect coverage. It has been reported in the literature that the sizes of n-detection test sets tend to grow linearly with n.

B. Importance of test selection

Therefore, test selection is necessary to ensure that the most effective test patterns are chosen from large test sets during time-constrained and high-volume production testing. If highly effective test


patterns are applied first, a defective chip can fail earlier, further reducing test application time. Moreover, test compression is not effective if the patterns are delivered without a significant fraction of don't-care bits. In such cases, test set selection can be a practical method to reduce test time and test data volume.
In this paper, we use the output deviation metric for test selection. To evaluate the quality of the selected test patterns, we determine the coverage that they provide for single non-feedback zero-resistance bridging faults (s-NFBFs) and stuck-open faults. Experimental results show that patterns selected using the probabilistic fault model and output deviations provide higher fault coverage than patterns selected using other methods.

V. ALGORITHMS

Test pattern selection is done using the theory of output deviations: a small set of test patterns T11 is selected from a large test set T1. To generate T1, we run ATALANTA, a single stuck-at fault ATPG tool. The ATPG tool generates n-detection test patterns for each single stuck-at fault. Each time a test pattern is selected, we perform fault simulation and drop those faults that are already detected n times. The set T1 is randomly reordered before being provided as input to Procedure 1. The flow chart for the procedure is shown in Fig. 3.
We then sort T1 such that test patterns with high deviations are selected earlier than test patterns with low deviations. For each primary output (PO), all test patterns in T1 are sorted in descending order based on their output deviations. The sorted test set is applied to Procedure 1; we thereby obtain an optimized n-detection test set that normally contains a smaller number of test patterns and achieves high defect coverage.
In Procedure 2, test patterns with low output deviations are selected earlier than test patterns with high output deviations. This procedure takes one more parameter, called the threshold [7].

VI. EXPERIMENTAL WORK

The work is in progress. All experiments are being performed on a Pentium 4 PC running Linux with a 2.6-GHz processor and 1 GB of memory. The program to compute output deviations is to be implemented in C. ATALANTA and its associated simulation engine are used to generate n-detection test sets. We have written the fault simulation program in C so that we can add constraints in the simulation in future. We are also implementing a bridging fault simulator to calculate the coverage of single non-feedback, zero-resistance bridging faults (s-NFBFs). To eliminate any bias in the comparison of different methods for test set selection, we use two arbitrarily chosen sets of confidence-level vectors for our experiments. One more method to evaluate the test patterns selected using the output deviation method is the gate-exhaustive (GE) testing metric [8], which is computed using an in-house simulation tool based on the fault simulation program FSIM [9]. A CMOS combinational circuit in the presence of a stuck-open (SOP) fault behaves like a sequential circuit [10]. In CMOS circuits, the traditional line stuck-at fault model does not properly represent the behavior of stuck-open faults; a sequence of two test patterns is required to detect a SOP fault. SOPRANO [11] is an efficient automatic test pattern generator for stuck-open faults in CMOS combinational circuits. We also apply the output deviation algorithms to stuck-open faults to evaluate the quality of selected test patterns in a high-volume production testing environment. We are currently concentrating on obtaining the tools to evaluate the stuck-open faults.

VII. CONCLUSION

Evaluation of pattern grading using the fault coverage for stuck-open faults and non-feedback bridging faults is being done to demonstrate the effectiveness of output deviation as a metric to model the quality of test patterns. This proves especially useful in high-volume and time-constrained production testing environments. The work is in progress and final results will be exhibited at the time of presentation.

REFERENCES

[1] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital Systems Testing and Testable Design. Computer Science Press, 1990, pp. 94–95.
[2] Y. Tian, M. Mercer, W. Shi, and M. Grimaila, "An optimal test pattern selection method to improve the defect coverage," in Proc. ITC, 2005.
[3] Z. Wang, M. Marek-Sadowska, K.-H. Tsai, and J. Rajski, "Multiple fault diagnosis using n-detection tests," in Proc. 21st Int. Conf. on Computer Design (ICCD), 2003.
[4] E. J. McCluskey and C.-W. Tseng, "Stuck-fault tests vs. actual defects," in Proc. Int. Test Conf., 2000, pp. 336–342.
[5] Z. Wang, K. Chakrabarty, and M. Goessel, "Test set enrichment using a probabilistic fault model and the theory of output deviations," in Proc. DATE Conf., 2006, pp. 1275–1280.
[6] K. P. Parker and E. J. McCluskey, "Probabilistic treatment of general combinational networks," IEEE Trans. Computers, vol. C-24, pp. 668–670, Jun. 1975.
[7] Z. Wang and K. Chakrabarty, "An efficient test pattern selection method for improving defect coverage with reduced test data volume and test application time," in Proc. 15th Asian Test Symp. (ATS), 2006.
[8] K. Y. Cho, S. Mitra, and E. J. McCluskey, "Gate exhaustive testing," in Proc. ITC, 2005, pp. 771–777.
[9] "An efficient, forward fault simulation algorithm based on the parallel pattern single fault propagation," in Proc. ITC, 1991, pp. 946–955.
[10] H. K. Lee and D. S. Ha, "A CMOS stuck-open fault simulator," in IEEE Proc., 1989.
[11] H. K. Lee and D. S. Ha, "SOPRANO: an efficient automatic test pattern generator for stuck-open faults in CMOS circuits," in Proc. 27th Design Automation Conf.


Fault Classification Using Back Propagation Neural Network for

Digital to Analog Converter
B. Mohan*, R. Sundararajan*, J. Ramesh**, and Dr. K. Gunavathi***
* UG Student
** Senior Lecturer
*** Professor
Department of ECE
PSG College of Technology, Coimbatore

Abstract: In today’s world Digital to Analog Fig 1 illustrates an 8-bit R-2R ladder. Starting at the
converters are used in the wide range of right end of the network, notice that the resistance
applications like wireless networking (WLAN, looking to the right of any node to ground is 2R. The
voice/data communication and Bluetooth), wired digital input determines whether each resistor is
communication (WAN and LAN), and consumer switched to ground (non inverting input) or to the
electronics (DVD, MP3, digital cameras, video inverting input of the op-amp. Each node voltage is
games, and so on). Therefore the DAC unit must related to VREF, by a binary-weighted relationship
be fault free, and there is a need for a system to detect the fault occurrence. This paper deals with designing an efficient system to detect and classify the fault in the DAC unit. An R-2R DAC has been used for analysis, and back propagation neural network algorithms are used in classifying the faults. An efficiency of 77% is achieved in classifying the fault by implementing three back propagation neural network algorithms.

There are many challenges for mixed signal design to be adaptable for SOC implementation. The major considerations in designing these mixed signal circuits for the complete SOC are high speed, low power, and low voltage. Both cost and high speed operation are limitations of the complete SOC. Accordingly, to remove the speed gap between a processor and circuits in the complete SOC implementation, architectures must not only be fast but also cheap. The next challenge is low power consumption. In the portable device market, reducing the power consumption is one of the main issues. Low voltage operation is one of the difficult challenges in mixed-signal ICs. Above all, the circuits designed must be fault free. If any fault occurs, then it must be detected. Therefore fault classification is one of the major needs in mixed signal ICs. This paper aims at implementing efficient fault classification in a DAC unit using a neural network.

II. R-2R DAC

The R-2R D/A converter works under the principle of voltage division, and this configuration consists of a network of resistors alternating in value of R and 2R. The output is caused by the voltage division of the ladder network. The total current flowing from VREF is constant, since the potential at the bottom of each switched resistor is always zero volts (either ground or virtual ground). Therefore, the node voltages will remain constant for any value of the digital input.

Fig 1 Schematic of R-2R digital to analog converter (2R legs switched by D7-D0 feeding series R segments)

The output voltage, vOUT, is dependent on the currents flowing through the feedback resistor, RF (= R), such that

vOUT = -iTOT * RF    (1)

where iTOT is the sum of the currents selected by the digital input:

iTOT = Σ (k = 0 to N-1) Dk * VREF / (2^(N-k) * 2R)    (2)

where Dk is the k-th bit of the input word, with a value that is either a 1 or a 0. The voltage scaling DAC structure is very regular and thus well suited for MOS technology. An advantage of this architecture is that it guarantees monotonicity, for the voltage at each tap cannot be less than the tap below. The area required for the voltage scaling DAC is large if the number of bits is eight or more. Also, the conversion speed of the converter will be sensitive to the parasitic capacitance at each of its internal nodes.
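Equations (1) and (2) can be sketched numerically as follows. This is a minimal illustration, not the paper's simulation: VREF = 2.5 V (the supply voltage mentioned later) and R = 10 kΩ are assumed values, and D7 is taken as the MSB consistent with equation (2).

```python
# Ideal R-2R DAC output per equations (1)-(2).
# VREF and R are illustrative assumptions, not values stated in the paper.
VREF = 2.5   # reference voltage (V), assumed
R = 10e3     # ladder resistance (ohms), assumed
N = 8        # 8-bit converter

def r2r_output(code):
    """Return vOUT for an N-bit input code, per eqs. (1)-(2)."""
    # iTOT = sum over bits Dk of VREF / (2^(N-k) * 2R), eq. (2)
    i_tot = sum(((code >> k) & 1) * VREF / (2 ** (N - k) * 2 * R)
                for k in range(N))
    return -i_tot * R   # vOUT = -iTOT * RF with RF = R, eq. (1)
```

With this sign convention the inverting output swings from 0 toward -VREF/2 as the code increases; setting only the MSB (code 128) gives -VREF/4.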


The output of the R-2R DAC for the 8-bit pattern counter input is shown below in Fig 2. The output is very linear, glitch free, and rises to the supply voltage of 2.5 V within 256 µs.

FIG 2 OUTPUT RESPONSE OF 8-BIT R-2R DAC

The INL and DNL curves for the fault-free case are plotted using MATLAB and are shown in Fig 3. The maximum INL and DNL are found to be 0.038 LSB and -0.012 LSB respectively.

FIG 3.1 INL CURVES OF R-2R DAC

The offset error, gain error and power consumed by the R-2R DAC are shown in Table 1.

Table 1
Offset error      0.002484 LSB
Gain error        0.00979 LSB
Average power     34.7106 µW
Max power         827.2653 µW
Min power         0.000127 µW

The structural fault models considered for the testing of the DACs are:
1) Gate-to-Source Short (GSS)
2) Gate-to-Drain Short (GDS)
3) Drain-to-Source Short (DSS)
4) Resistor Short (RS)
5) Capacitance Short (CS)
6) Gate Open (GO)
7) Drain Open (DO)
8) Source Open (SO)
9) Resistor Open (RO)

The structural faults are illustrated in Fig 4. A low resistance (1 Ω) and a high resistance (10 MΩ) are frequently used to simulate structural faults. Restated, a transistor short is modeled using a low resistance (1 Ω) between the shorted terminals, and an open transistor is modeled as a large resistance (10 MΩ) in series with the open terminals. For example, the gate-open is modeled by connecting the gate and the source, and also the gate and the drain of the transistor, by a large resistance (10 MΩ).

Fig 4 Structural faults (Gate-to-Drain Short, Drain-to-Source Short, Gate-to-Source Short, Resistor Short)
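The INL and DNL curves above were plotted in MATLAB; an equivalent calculation can be sketched in Python. The end-point straight-line fit used here is one common convention, assumed since the paper does not state which fit it uses.

```python
import numpy as np

def inl_dnl(levels):
    """Compute INL and DNL (in LSB) from measured DAC output levels.

    levels: array of 2^N measured outputs, one per input code.
    Uses the end-point fit line (an assumed convention).
    """
    levels = np.asarray(levels, dtype=float)
    n = len(levels)
    lsb = (levels[-1] - levels[0]) / (n - 1)   # average step = 1 LSB
    dnl = np.diff(levels) / lsb - 1.0          # step-size error per code
    ideal = levels[0] + lsb * np.arange(n)     # end-point straight line
    inl = (levels - ideal) / lsb               # deviation per code
    return inl, dnl

# Example: an ideal 8-bit ramp has zero INL and DNL everywhere.
ramp = np.arange(256) * (2.5 / 255)
inl, dnl = inl_dnl(ramp)
```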




Fig 4 panels (continued): Resistor Open, Capacitance Short, Gate Open, Drain Open, Source Open, modeled with 1 Ω short and 10 MΩ open resistances.

IV. MONTE CARLO ANALYSIS

All types of faults are introduced in each transistor and resistor, and a Monte Carlo simulation is done for each case. The Monte Carlo analysis in T-Spice is used to perform the simulation by varying the value of the threshold voltage (parameter). The iteration value of the Monte Carlo analysis specifies the number of times the file should be run by varying the threshold value. The syntax in T-Spice to invoke the Monte Carlo analysis is:

.param VTHO_N=unif(0.3694291,.05,2) VTHO_P=unif(0.3944719,.05,2)    (3)

The result thus obtained is stored in a spreadsheet for further fault classification using the neural network.

V. FAULT CLASSIFICATION USING BACK PROPAGATION NEURAL NETWORK

Any function from input to output can be implemented as a three-layer neural network. In order to train a neural network to perform some task, the weights and bias values must be adjusted at each iteration of each unit in such a way that the error between the desired output and the actual output is reduced. This process requires that the neural network compute the error derivative of the weights (EW). In other words, it must calculate how the error changes as each weight is increased or decreased slightly. The back propagation algorithm is the most widely used method for determining the EW. The goal now is to set the interconnection weights based on the training patterns and the desired outputs. In a three-layer network, it is a straightforward matter to understand how the output, and thus the error, depends on the hidden-to-output layer weights.

The results obtained from the Monte Carlo simulation are used to detect and classify the fault using the neural network model. Here a back propagation neural network model is used. The following back propagation algorithms are used to classify the faults: trainbfg, traincgb and trainoss.

(A) TRAINBFG
Trainbfg can train any network as long as its weight, net input, and transfer functions have derivative functions. Back propagation is used to calculate derivatives of performance with respect to the weight and bias variables X. Each variable is adjusted according to the following:
X = X + a*dX;    (4)
where dX is the search direction. The parameter a is selected to minimize the performance along the search direction. The first search direction is the negative of the gradient of performance. In succeeding iterations the search direction is computed according to the following formula:
dX = -H\gX;    (5)
where gX is the gradient and H is an approximate Hessian matrix.

(B) TRAINCGB
Traincgb can train any network as long as its weight, net input, and transfer functions have derivative functions. Back propagation is used to calculate derivatives of performance with respect to the weight and bias variables X. Each variable is adjusted according to the following:
X = X + a*dX;    (6)
where dX is the search direction. The parameter a is selected to minimize the performance along the search direction. The first search direction is the negative of the gradient of performance. In succeeding iterations the search direction is computed from the new gradient and the previous search direction according to the formula:
dX = -gX + dX_old*Z;    (7)
where gX is the gradient. The parameter Z can be computed in several different ways.

(C) TRAINOSS
Trainoss can train any network as long as its weight, net input, and transfer functions have derivative functions. Back propagation is used to calculate derivatives of performance with respect to the weight


and bias variables X. Each variable is adjusted according to the following:
X = X + a*dX;    (8)
where dX is the search direction. The parameter a is selected to minimize the performance along the search direction. The first search direction is the negative of the gradient of performance. In succeeding iterations the search direction is computed from the new gradient and the previous steps and gradients according to the following formula:
dX = -gX + Ac*X_step + Bc*dgX;    (9)
where gX is the gradient, X_step is the change in the weights on the previous iteration, and dgX is the change in the gradient from the last iteration.

While using the neural network algorithms, the following parameters are varied within the ranges specified and the results are obtained.

Parameter                Range of Variation
Learning Rate            0.01 to 0.05
Hidden Layer Neurons     10 to 15 Neurons
Epochs for Training      100 to 1500 Epochs

The following are the output results for the different back propagation algorithms, obtained by varying the parameter values (learning rate, epochs and hidden layers) for maximum fault detection.

Fig 5: performance graph for the trainbfg algorithm with learning rate = 0.03, hidden layer = 8
Fig 6: performance graph for the traincgp algorithm with learning rate = 0.03, hidden layer = 8
Fig 7: performance graph for the trainoss algorithm with learning rate = 0.03, hidden layer = 8
Fig 8: performance graph for the trainbfg algorithm with learning rate = 0.01, epochs = 1000
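The update rules in equations (4) through (9) all share the form X = X + a*dX. As a toy numerical sketch of the trainbfg-style variant, equations (4)-(5), the MATLAB expression H\gX is a linear solve, and for a simple quadratic objective the exact Hessian can stand in for the approximate Hessian that trainbfg builds up; the objective and values below are illustrative assumptions, not the paper's network.

```python
import numpy as np

# Toy illustration of the trainbfg-style update, eqs. (4)-(5):
# X = X + a*dX with dX = -H\gX, where "H\gX" is MATLAB notation for
# solving H*y = gX. For f(X) = 0.5 * X.T @ A @ X the exact Hessian A
# replaces the approximate Hessian, so with a = 1 the minimum (X = 0)
# is reached in one step.
A = np.array([[3.0, 0.0],
              [0.0, 7.0]])      # Hessian of the toy objective
X = np.array([2.0, -5.0])       # initial weight/bias variables
gX = A @ X                      # gradient of f at X
H = A                           # approximate Hessian (exact here)
dX = -np.linalg.solve(H, gX)    # search direction, eq. (5)
a = 1.0                         # step size from the line search (exact here)
X = X + a * dX                  # eq. (4)
```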


Fig 9: performance graph for the traincgp algorithm with learning rate = 0.01, epochs = 1000
Fig 10: performance graph for the trainoss algorithm with learning rate = 0.01, epochs = 1000
Fig 11: performance graph for the trainbfg algorithm with number of hidden layers = 8, epochs = 1000
Fig 12: performance graph for the trainbfg algorithm with number of hidden layers = 8, epochs = 1000
Fig 13: performance graph for the trainoss algorithm with number of hidden layers = 8, epochs = 1000

From Figs 5, 6 and 7 it can be inferred that, for a constant learning rate of 0.03 and a hidden layer of value 8, the fault coverage is best for the trainoss algorithm with an epoch value of 1000.

From Figs 8, 9 and 10 it can be inferred that, for a constant learning rate of 0.01 and an epoch value of 1000, the fault coverage is best for the trainoss algorithm with a hidden layer of value 8.

From Figs 11, 12 and 13 it can be inferred that, for a constant hidden layer of value 8 and an epoch value of 1000, the fault coverage is best for the trainoss algorithm with a learning rate of 0.01.

The fault coverage of all three algorithms has been compared in the graphs above. It can be inferred that the best fault coverage of 77% is obtained for trainoss when compared to the other algorithms.

Fig 14: Performance comparison for the three algorithms with learning rate = 0.01, epochs = 1000, hidden layer = 8, giving the best fault classification
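The kind of back propagation training compared above can be sketched as a minimal network with gradient-descent weight updates. The architecture (2 inputs, 8 hidden neurons, 1 output) and the toy patterns below are illustrative stand-ins; the paper's actual inputs are Monte Carlo simulation results for each fault class, and its training uses MATLAB's trainbfg/traincgb/trainoss rather than plain gradient descent.

```python
import numpy as np

# Minimal back propagation sketch (toy data, assumed architecture).
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # toy patterns
T = np.array([[0], [1], [1], [0]], dtype=float)              # toy targets

W1 = rng.normal(0, 1, (2, 8))   # input -> hidden weights
W2 = rng.normal(0, 1, (8, 1))   # hidden -> output weights
sig = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5                        # learning rate (assumed)

def forward(X):
    H = sig(X @ W1)
    return H, sig(H @ W2)

_, Y = forward(X)
loss_start = np.mean((Y - T) ** 2)
for _ in range(2000):
    H, Y = forward(X)
    # back propagation: error derivatives w.r.t. each weight layer (the "EW")
    dY = (Y - T) * Y * (1 - Y)
    dH = (dY @ W2.T) * H * (1 - H)
    W2 -= lr * H.T @ dY
    W1 -= lr * X.T @ dH
_, Y = forward(X)
loss_end = np.mean((Y - T) ** 2)
```

After training, the mean squared error is lower than at initialization, which is the behavior the performance graphs above track per epoch.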


In this paper, fault classification using a neural network with three back propagation algorithms (trainbfg, traincgp, and trainoss) is used to classify the faults by varying parameters such as the learning rate, epochs and hidden layers, and the output results are obtained. The trainoss algorithm is efficient enough to classify the faults up to 77%. The output results show the best values of the epochs, learning rate and number of hidden neurons for which the algorithms show the best performance. This work can be extended to classifying the faults in other DAC converters by using the efficient neural network.



Testing Path Delays In LUT-Based FPGAs

Ms. R. Usha*, Mrs. M. Selvi, M.E., (Ph.D).**

*Student, II yr M.E., VLSI Design, Department of Electronics & Communication Engineering

**Asst. Prof., Department of Electronics & Communication Engineering
Francis Xavier Engineering College, Tirunelveli

Abstract- Path delay testing of FPGAs is especially important since path delay faults can render an otherwise fault-free FPGA unusable for a given design layout. In this approach, we select a set of paths in FPGA-based circuits that are tested in the same test configuration. Each path is tested for all combinations of signal inversions along the path length. Each configuration consists of a sequence generator, a response analyzer and circuitry for controlling inversions along the tested paths, all of which are formed from FPGA resources not currently under test. The goal is to determine by testing whether the delay along any of the paths in the test set exceeds the clock period. Two algorithms are presented for target path partitioning to determine the number of required test configurations. Test circuitry associated with these methods is also described.

Index terms- Design automation, Field Programmable Gate Arrays, Programmable logic devices, testing.

I. INTRODUCTION

This paper is concerned with testing paths in lookup-table (LUT) based FPGAs after they have been routed. While this may be regarded as user testing, we are considering an environment in which a large number of manufactured FPGA devices implementing a specific design are to be tested to ensure correct operation at the specified clock speed. It is thus akin to manufacturing testing in that the time needed for testing is important. Ideally, we would like to verify that the actual delay of every path between flip-flops is less than the design clock period. Since the number of paths in most practical circuits is very large, testing must be limited to a smaller set of paths. Testing a set of paths whose computed delay is within a small percentage of the clock period may be sufficient in most cases. Thus, our goal is to determine by testing whether the delay along any of the paths in the set exceeds the clock period.

II. BASIC APPROACH

This path delay testing method is applicable to FPGAs in which the basic logic elements are implemented by LUTs. The goal of this work is to test a set of paths, called target paths, to determine whether the maximum delay along any of them exceeds the clock period of the circuit. These paths are selected based on static timing analysis using nominal delay values and actual routing information. Circuitry for applying test patterns and observing results is configured using parts of the FPGA that are not under test.

INTRODUCTION TO APPROACH

The delay of a path segment usually depends on the direction of the signal transition in it. The direction of the signal transition in any segment is determined by that of the transition at the source and the inversions along the partial path leading to the particular segment. A test to determine whether the maximum delay along a path is greater than the clock period must propagate a transition along the path and produce a combination of side-input values that maximizes the path delay. This approach is not usually feasible because of the difficulty of determining the inversions that maximize the path delay and the necessary primary input values to produce them. Instead, we propose to test each target path for all combinations of inversions along it, guaranteeing that the worst case will also be included.

Although the number of combinations is exponential in the number of LUTs along the path, the method is feasible because application of each test requires only a few cycles of the rated clock. However, the results may be pessimistic in that a path that fails a test may operate correctly in the actual circuit, because the combination of inversions in the failing test may not occur during normal operation.

The method of testing a single path in a circuit reprograms the FPGA to isolate each target path from the rest of the circuit and make inversions along the path controllable by an on-chip test controller. Every LUT along the path is re-programmed based on its original function. If it is positive unate in the on-path input, the LUT output is made equal to the on-path input, independent of its side inputs. Similarly, negative unate functions are replaced by inverters. If the original function is binate in the on-path input, the LUT is re-programmed to implement the exclusive-OR (XOR) of the on-path input and one of its side-inputs, which we shall call its controlling side-input.


As mentioned earlier, this change of functionality does not affect the delay of the path under test, because the delay through an LUT is unaffected by the function implemented. Inversions along the path are controlled by the signal values on the controlling side inputs. For each combination of values on the controlling side inputs we apply a signal transition at the source of the path and observe the signal value at the destination after one clock period. The absence of a signal transition will indicate that the delay along the tested path exceeds the clock period for the particular combination of inversions.

The basic method described above can be implemented by the circuitry shown in Fig. 1, consisting of a sequence generator, a response analyzer and a counter that generates all combinations of values in some arbitrary order. A linear feedback shift register modified to include the all-0's output may be used as the counter. The controlling side inputs are connected to the counter. The controller and the circuitry for applying tests and observing results are also formed during configuration in parts of the FPGA that do not affect the behavior of the path(s) under test.

The sequence generator produces a sequence of alternating zeros and ones, with period equal to 6T, where T is the operational clock period. The response analyzer checks for an output transition for every test, and sets an error flip-flop if no transition is observed at the end of a test. The flip-flop is reset only at the beginning of the test session, and will indicate an error if and only if no transition is produced in some test. The counter has as many bits as the number of binate LUTs along the tested path.

The test for a path for each direction of signal transition consists of two parts, an initialization part and a propagation part, each of duration 3T. A path is tested in time 6T by overlapping the initialization part of each test with the propagation part of the preceding test. In addition, the change of counter state for testing a path for a new combination of inversions is done during the initialization phase of rising transition tests.

Fig. 2 shows the timing of the signals during the application of a test sequence. It can be seen from the figure that the source s of the test path toggles every three clock cycles. For correct operation, the input transition occurring at 3T must reach the destination within time T (i.e., before 3T+T). On the following clock edge at 3T+T, the result of the transition is clocked into the destination flip-flop at d. A change must be observed at the destination for every test, otherwise a flip-flop is set to indicate an error. In Fig. 2, a test for the rising edge starts at time 3T, with the source s steady at zero for the preceding three clock cycles. A test for the falling transition starts at 6T, with the input steady at one for the preceding three clock cycles. Results are sampled at d at time 4T (for the rising s transition) and 7T (for the falling s transition), respectively. Thus, both rising and falling transitions are applied at the source for each combination of inversions in time 6T. As the falling transition is applied at 6T, the enable input E of the counter is set to 1. This action starts a state (counter) change at 7T to test the path for the next combination of inversions. A counter change at this time point allows 2T of settling time before the following transition occurs at the source s. By ensuring that the counter reaches its final value within T and propagates to the path destination d within an additional T, d is ensured to be stable before the following source transition. Thus, the destination will reach the correct stable value corresponding to the new combination of inversions if no path from the counter to the destination has a delay greater than 2T. This delay explains the need for a 3T period between s transitions (1T to perform the test, 1T for possible counter state changes, and 1T for subsequent propagation of the counter changes to d).

III. TEST STRATEGY

The method described in the preceding section requires the test control circuitry to be reconfigured for every path to be tested. The total time for testing a set of target paths in a circuit consists of the test application time and the reconfiguration time. Our goal is to reduce both components of the total time for testing a specified set of paths. Since the time needed for configuring the test structure is usually larger than that for applying test patterns generated on chip, we shall focus on reducing the number of test configurations needed by testing as many paths as possible in each configuration.

Two approaches to maximizing the number of paths tested in a test configuration suggest themselves. First, we can try to select a set of target paths that can be tested simultaneously. This will also have the effect of reducing test application time. Secondly, we can try to select a set of simultaneously testable sets that can be tested in sequence with the same configuration. In this case, the number of simultaneously tested paths may have to be reduced so as to maximize the total number of paths tested with the configuration. These two approaches will be elaborated in the next two sections, but first we define a few terms.

The simultaneous application of a single rising or falling transition at the sources of one or more paths and observing the response at their destinations is called a test. The set of tests for both rising and falling transitions for all combinations of inversions along each path is called a test phase, or simply, a


phase. As mentioned earlier, a single path with k binate LUTs will have 2·2^k tests in a test phase. The application of all test phases for all target paths in a configuration is called a test session.

A. Single Phase Method

This method attempts to maximize the number of simultaneously tested paths. A set of paths may be tested in parallel if it satisfies the following conditions:
1) No two paths in the set have a common destination.
2) No fanout from a path reaches another path in the set.

The above conditions guarantee that signals propagating along paths in the set do not interfere with one another. Moreover, if the same input is applied to all paths in the set, two or more paths with a common initial segment will not interact if they do not re-converge after fanout.

All LUTs on paths to be tested in a session are reprogrammed to implement inverters, direct connections or XORs as discussed in the preceding section. The LUTs with control inputs are levelized, and all control inputs at the same level are connected to the same counter output. The source flip-flops of all paths to be tested in the session are connected to the same sequence generator, but a separate transition detector is used for each path. The transition detectors of all paths are then ORed together to produce an error indication if any of the paths is faulty. Alternatively, a separate error flip-flop can be used for each tested path, connected to form a scan chain and scanned out to identify the faulty path(s).

B. Multi-phase Method

The single phase method described above requires that all paths tested in a session be disjoint. The number of test sessions needed for a large target set is therefore likely to be very large. The multi-phase method attempts to reduce the number of test sessions needed by relaxing the requirement that all paths tested in a session be disjoint. This, however, increases the test application time.

Consider sets of target paths S1, S2, ..., Sp such that all paths in each set are disjoint except for common sources. Clearly, all paths in each set Si can be tested simultaneously, as in the single phase method, if each set can be selected and logically isolated from all other paths. This allows the testing of the sets Si in sequence, and is the basis of our multi-phase method. We also restrict the target paths for each session to simplify the control circuitry needed.

We assume that the LUTs in the FPGA are 4-input LUTs, but the method can be easily modified to allow a larger number of inputs. Since each LUT may need up to two control inputs, one for path selection and the other for inversion control, at most two target paths may pass through any LUT. Target paths satisfying the following conditions can be tested in a single session.
1) There is a path to each target path destination, called the main path to the destination.
2) Main paths may not intersect, but they may have a common initial section.
3) Additional paths to each destination, called its side paths, must meet only the main path and continue to the destination along the main path.
4) Main and side paths may not intersect any other path, except that two or more paths may have a common source.
5) No more than two target paths may pass through any LUT.
6) The number of target paths to all destinations must be the same.

The above conditions allow us to select one path to each output and test all of them in parallel. The first two conditions guarantee that the signals propagating along main paths to different destinations will not interact. The main paths can therefore be tested in parallel. The restriction that a side path can meet only the main path to the same destination [condition 3)] allows a simple mechanism for propagating a signal through the main path or one of its side paths. Together with condition 4), it guarantees that a set of main paths or a set of side paths, one to each destination, can be tested in parallel. Condition 5) allows for two control signals to each LUT, one for controlling inversion, and the other for selecting the path for signal propagation. A single binary signal is sufficient for selecting one of the target paths that may pass through an LUT. The last condition is required to produce a signal change at every destination for every test, simplifying the error detection logic.
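The parallel-testability conditions of the single phase method can be sketched as a simple check. Paths are modeled here as node-name sequences in the style of Fig. 3 (e.g. "dAEJLy", with the last character the destination), and the check is a simplification: it treats "disjoint except for a common source" as a proxy for conditions 1)-2), whereas a real implementation would need fanout analysis on the routed netlist.

```python
# Simplified single-phase compatibility check (assumed path encoding:
# each path is a sequence of node names, last element = destination).

def compatible(p, q):
    """True if paths p and q have distinct destinations and share no node
    except possibly a common source."""
    if p[-1] == q[-1]:              # condition 1: no common destination
        return False
    shared = set(p) & set(q)
    if not shared:
        return True
    return shared == {p[0]} and q[0] == p[0]   # only a common source

def parallel_testable(paths):
    """True if every pair of paths in the set may be tested in parallel."""
    return all(compatible(p, q)
               for i, p in enumerate(paths) for q in paths[i + 1:])

# Paths from the example of Fig. 3:
# ["dAEJLy", "hCGKMz"] are disjoint, so they can share one session;
# ["dAEJLy", "cEJLy"] share the segment EJLy and destination y, so not.
```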
With the above restrictions, LUTs on target paths will have one or two target paths through them. These LUTs are called 1-path LUTs and 2-path LUTs, respectively. The inputs that are not on target paths will be called free inputs.

The following procedure selects a set of target paths satisfying the conditions for multi-phase testing by selecting appropriate target paths for each set Si from the set of all target paths in the circuit. The union of these sets is the set of paths targeted in a test session. The procedure is then repeated for the remaining paths to obtain the target paths for subsequent test sessions until all paths are covered.

PROCEDURE 1
1) Select a path that does not intersect any already selected path, as the main path to each destination.
2) For each main path, select a side path such that
a) it meets the main path and shares the rest of the path with it,
b) no other path meets the main path at the same LUT, and
c) it does not intersect any already selected target path (except for segments overlapping the main path).
3) Repeat Step 2 until no new side path can be found for any main path.
4) Find the number, n, of paths such that
a) there are n target paths to each destination, and
b) the total number of paths is a maximum.
5) Select the main path and n − 1 side paths to each destination as the target paths for the session.

Example: Figure 3 shows all the target paths in a circuit. The source and destination flip-flops are omitted for the sake of clarity. We start Procedure 1 by (arbitrarily) selecting dAEJLy and hCGKMz as the main paths to the destinations y and z. Adding paths eAEJLy, cEJLy and fBFJLy to the first path, and jCGKMz, nDGKMz and qHKMz to the second, we get the set of target paths shown in heavy lines. Since there are four paths to each destination, the eight target paths shown can be tested in a single four-phase session.

The procedure can be repeated with the remaining paths to select sets of target paths for subsequent sessions. One possible set of test sessions is given in the following table, where the path(s) in the first row of each session were those chosen as the main path(s).

            Destination: y    Destination: z
Session 1   dAEJLy            hCGKMz
Session 2   gBEJLy            gHKMz
Session 3   gBFJLy            mDGKMz
Session 4   hCFJLy
Session 5   nDGLy
Session 6   mDGLy


The set of sessions may not be unique and depends on the choices made. Also note that not all sessions obtained are multi-phase sessions. Session 3, for example, became a single-phase session because no path qualified as a side path of mDGKMz, which was arbitrarily chosen as the main path. No paths could be concurrently tested with those in Sessions 4, 5, and 6 because all paths to z had already been targeted. The sets of target paths obtained by Procedure 1 are such that each 2-path LUT has a main path and a side path through it. Thus, a single binary signal is sufficient to select the input through which the signal is to be propagated. Since the side path continues along the main path, selecting the appropriate input at the 2-path LUT where it meets the main path is sufficient for selecting the side path for testing. By using the same path selection signal, one side path to each destination can be selected simultaneously and tested in parallel.

The FPGA configuration for a test session is obtained by the following procedure:
1) Configure a sequence generator and connect its output to the sources of all target paths of the session.
2) Configure a counter to control inversion parity, with the number of bits equal to the largest number of binate LUTs along any target path for the test session.
3) Configure a path selector to select the set of paths tested in each test phase, with the number of bits equal to the number of side paths to a destination.
4) Designate a free input of each LUT as its inversion control input p, and connect it to the counter output corresponding to its level.
5) Designate another free input of each 2-path LUT as its selector input s, and connect it to the path selector.
6) Modify the LUT of each 1-path LUT with on-path input a to implement f = a ⊕ p if the original function is binate in a; otherwise f = a if it is positive, or f = a′ if it is negative in a.
7) Modify the LUT of each 2-path LUT to implement the corresponding selector function, where a and b are on the main path and a side path, respectively.

The above modification for 2-path LUTs assumes that they are binate in both on-path inputs. If the output of a 2-path LUT is unate in a or b or both, a slightly different function f is needed, for example when the LUT output is binate in a and negative in b.

Figure 4 shows the test structure for the circuit of Fig. 3. Only target paths that were selected for the first test session are shown, and all LUT functions are assumed to be binate in their inputs. The test circuitry consists of a sequence generator that produces a sequence of alternating 1's and 0's, a four-bit counter for inversion control, and a path selector. The path selector is a shift register that produces the output sequence 000, 100, 010, 001 for the 4-phase test of the first session in our example. It can be verified from the figure that the main paths are selected when all selector outputs are 0. When any output is 1, exactly one side path to each destination is selected. Input transitions are applied to all paths simultaneously, but propagate only up to the first 2-path LUT on all paths except the selected ones. Thus, only one path to each destination will have transitions along its entire length. Since these paths are disjoint, no interaction can occur among them.

In this paper, a new approach to testing selected sets of paths in FPGA-based circuits is presented. Our approach tests these paths for all combinations of inversions along them to guarantee that the maximum delays along the tested paths will not exceed the clock period during normal operation. While the test method requires reconfiguring the FPGA for testing, the tested paths use the same connection wires, multiplexers and internal logic connections as the original circuit, ensuring the validity of the tests. Following testing, the test circuitry is removed from the device and the original user circuit is programmed into the FPGA. Two methods have been presented for reducing the number of test configurations needed for a given set of paths. In one method, called the single-phase method, paths are selected so that all paths in each configuration can be tested in parallel. The second method, called the multi-phase method, attempts to test the paths in a configuration with a sequence of test phases, each of which tests a set of paths in parallel. Our experimental results with benchmark circuits show that these methods are viable, but the preferable method depends on the circuit structure. The use of other criteria, such as the total time for configuration and test application for each configuration, or better heuristics, may lead to more efficient testing with the proposed approach.



REFERENCES

[1] M. Abramovici, C. Stroud, C. Hamilton, S. Wijesuriya, and V. Verma, "Using roving STARs for on-line testing and diagnosis of FPGAs in fault-tolerant applications," in IEEE Int. Test Conf., Atlantic City, NJ, Sept. 1999, pp. 28–30.
[2] M. Abramovici, C. Stroud, and J. Emmert, "Online BIST and BIST-based diagnosis of FPGA logic blocks," IEEE Trans. on VLSI Systems, vol. 12, no. 12, pp. 1284–1294, Dec. 2004.
[3] I. G. Harris and R. Tessier, "Interconnect testing in cluster-based FPGA architectures," in ACM/IEEE Design Automation Conf., Los Angeles, CA, June 2000, pp. 49–54.
[4] I. G. Harris and R. Tessier, “Testing and diagnosis
of interconnect faults in cluster-based FPGA
architectures,” IEEE Trans. on CAD, vol. 21, no.
11, pp. 1337–1343, Nov. 2002.
[5] W.K. Huang, F.J. Meyer, X-T. Chen, and F.
Lombardi, “Testing configurable LUT-based
FPGAs,” IEEE Trans. on VLSI Systems, vol. 6,
no. 2, pp. 276–283, June 1998.
[6] C. Stroud, S. Konala, P. Chen, and M.
Abramovici, “Built-in self-test of logic blocks in
FPGAs (Finally, a free lunch),” in IEEE VLSI
Test Symp., Princeton, NJ, Apr. 1996, pp. 387–
[7] C. Stroud, S. Wijesuriya, C. Hamilton, and M.
Abramovici, "Built-in self-test of FPGA
interconnect,” in IEEE Int. Test Conf.,
Washington, D.C., Oct. 1998, pp. 404–411.
[8] L. Zhao, D.M.H. Walker, and F. Lombardi,
“IDDQ testing of bridging faults in logic
resources of reprogrammable field programmable
gate arrays,” IEEE Trans. on Computers, vol. 47,
no. 10, pp. 1136–1152, Oct. 1998.
[9] M. Renovell, J. Figuras, and Y. Zorian, “Test of
RAM-based FPGA: Methodology and
application to the interconnect,” in IEEE VLSI
Test Symp., Monterey, California, Apr. 1997, pp.
[10] C-A. Chen and S.K. Gupta, “Design of efficient
BIST test pattern generators for delay testing,”
IEEE Trans. on CAD, vol. 15, no. 12, pp. 1568–
1575, Dec. 1996.
[11] S. Pilarski and A. Pierzynska, “BIST and delay
fault detection,” in
IEEE Int. Test Conf., Baltimore, MD, Oct. 1993,
pp. 236–242.
[12] A. Krasniewski, "Application-dependent testing
of FPGA delay faults," in Euromicro Conf.,
Milan, Italy, Sept. 1999, pp. 26267.


VLSI Realisation of SIMPPL Controller SoC for Design

Tressa Mary Baby John, II M.E. VLSI, Karunya University, Coimbatore
S. Sherine, Lecturer, ECE Dept., Karunya University, Coimbatore
email contact no: 09994790024

Abstract- SoCs are defined as a collection of functional units on one chip that interact to perform a desired operation. The modules are typically of a coarse granularity to promote reuse of previously designed Intellectual Property (IP). The decreasing size of process technologies enables designers to implement increasingly complex SoCs using Field Programmable Gate Arrays (FPGAs). This reduces the impact of increased design time and costs for electronics as design complexity grows. This project describes how SoCs are designed using the Systems Integrating Modules with Predefined Physical Links (SIMPPL) controller. The design represents computing systems as a network of Computing Elements (CEs) interconnected with asynchronous queues. The strength of the SIMPPL model is the CE abstraction, which allows designers to decouple the functionality of a module from system-level communication and control via a programmable controller. This design aims at reducing design time by facilitating design reuse, system integration, and system verification. The SIMPPL controller acts as the physical interface of the IP core to the rest of the system. Its instruction set is designed to facilitate controlling the core's operations and reprogramming the core's use for different applications. The SIMPPL controller consists of:
• Execute Controller
• Debug Controllers
• Control Sequencer
All the functional blocks of the SIMPPL controller are to be implemented, and a test bench will be created to prove the functionality with data from off-chip interfaces.

Index Terms- Design reuse, Intellectual Property, Computing Element.

I. INTRODUCTION

A. What Is SIMPPL?

The SIMPPL controller acts as the physical interface of the IP core to the rest of the system. It processes instruction packets received from other CEs, and its instruction set is designed to facilitate controlling the core's operations and reprogramming the core's use for different applications. SIMPPL uses IP (Intellectual Property) concepts with predefined modules, making use of design reuse, and it expedites system integration. Reusing IP is more challenging in hardware designs than reusing software functions. Software designers benefit from a fixed implementation platform with a highly abstracted programming interface, enabling them to focus on adapting the functionality to the new application. Hardware designers not only need to consider changes to the module's functionality, but also to the physical interface and communication protocols. SIMPPL is a system model with an abstraction for IP modules, called the computing element (CE), that facilitates SoC design for FPGAs. The processing element represents the datapath of the CE, i.e. the IP module, where an IP module implements a functional block having data ports and control and status signals.

B. Why SIMPPL?

In a normal communication interface there is merely a transfer of data: there is no processing of the data, and if there is an error it is checked only after the data is received at the receiver end, so a lot of time is wasted debugging the error. SIMPPL, in contrast, has two controllers, namely a normal (execute) and a debug controller, so testing is done as and when an error is detected. Abstracting IP modules as computing elements (CEs) can reduce the complexities of adapting IP to new applications. The CE model separates the datapath of the IP from system-level control and communications.

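The decoupling just described, where CEs interact only through asynchronous FIFOs so each can process at its own rate, can be sketched in software. Python's thread-safe queue.Queue stands in for the hardware FIFO; the CE functions, packet contents, and end-of-stream marker are illustrative, not the paper's RTL.

```python
import queue
import threading

def producer_ce(tx, words):
    # A producer CE only writes to its Tx link.
    for w in words:
        tx.put(w)
    tx.put(None)                      # end-of-stream marker (illustrative)

def consumer_ce(rx, sink):
    # A consumer CE only reads from its Rx link, at its own pace.
    while True:
        w = rx.get()
        if w is None:
            break
        sink.append(w * 2)            # stand-in for the PE's computation

link = queue.Queue(maxsize=4)         # bounded depth, like a hardware FIFO
out = []
t1 = threading.Thread(target=producer_ce, args=(link, [1, 2, 3, 4, 5]))
t2 = threading.Thread(target=consumer_ce, args=(link, out))
t1.start(); t2.start(); t1.join(); t2.join()
print(out)                            # -> [2, 4, 6, 8, 10]
```

The bounded queue is the point of the model: the producer stalls when the FIFO is full and the consumer stalls when it is empty, so neither side needs to know the other's clock rate.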

A lightweight controller provides the system-level interface for the IP module and executes a program that dictates how the IP is used in the system. Localizing the control for the IP to this program simplifies any necessary redesign of the IP for other applications. Most applications are ready to use; hence, with slight modification, we can make them compatible with other applications or more complicated architectures.

C. Advantages of SIMPPL in FPGAs

The advantage of using SIMPPL in an FPGA is that the SIMPPL design is based on simple data flow, and the design is split into CE and PE. The CE can be implemented with combinational circuits, and the PE can easily be realised with a simple FSM. Both of these favour high-speed applications because there are no complicated arithmetic operations or transformations such as sine and cosine transforms. The most important benefits of designing SoCs on an FPGA are that there is no need to finalize the partitioning of the design at the beginning of the design process or to create a complex co-simulation environment to model communication between hardware and software.

D. The SIMPPL System Model

Fig. 1 below shows a SIMPPL system model: the SIMPPL SoC architecture of a network of CEs comprising the hardware and software modules in the system. I/O Communication Links are used to communicate with off-chip peripherals using the appropriate protocols. The Internal Communication Links are represented by arrows between the CEs. These are defined as point-to-point links to provide inter-CE communications, where the communication protocols are abstracted from the physical links and implemented by the CEs. SIMPPL is thus a point-to-point interconnection architecture for rapid system development. Communication between processing elements is achieved through SIMPPL. Several modules are connected on a point-to-point basis to form a generic computing system. A mechanism for the physical transfer of data across a link is provided so that the designer can focus on the meaning of the data transfer. SIMPPL greatly facilitates the speed and ease of hardware development. For our current investigation, we use n-bit-wide asynchronous first-in–first-out queues (FIFOs) to implement the internal links in the SIMPPL model. Asynchronous FIFOs are used to connect the different CEs to create the system. SIMPPL thus represents computing systems as hardware CEs interconnected with asynchronous FIFOs. Asynchronous FIFOs isolate clocking domains to individual CEs, allowing them to transmit and receive at data rates independent of the other CEs in the system. This simplifies system-level design by decoupling the processing rate of a CE from the inter-CE communication rate. For the purposes of this discussion, we assume a FIFO width of 33 bits, but leave the depth variable.

Fig. 1. Generic computing system described using the SIMPPL model.

II. SIMPPL CE ABSTRACTION

A. Design Reuse With IP Modules and Adaptability

IP reuse is one of the keys to SoC design productivity improvement. An IP core is a block of logic used in making an FPGA or ASIC for a product. IP cores are blocks of information and they are portable as well. They are essential elements of design reuse. Design reuse makes it faster and cheaper to build a new product because the cores are not only designed earlier but also tested for reliability. In hardware, a reusable design component is called an IP core. Software designers have a fixed implementation with a highly abstracted programming interface that helps them adapt functionality to a new application, while hardware designers need to consider physical-interface and communication-protocol adaptability along with the module's functionality. Amortizing the cost of design and verification across multiple designs is proven to increase productivity. The VSI Alliance has proposed the Open Core Protocol (OCP) to enable the type of design reuse provided by SIMPPL. IP reuse enables the team to leverage the separation of external core communications from the IP core's functionality, similar to the SIMPPL model. Both communication models are illustrated in Figure 2. The SIMPPL model targets the direct communication model using a defined, point-to-point interconnect structure for all on-chip communications. In contrast, OCP is used to provide a well-defined socket interface for IP that allows a designer to attach interface modules that act as adaptors to different bus standards, including point-to-point interconnect structures, as shown in Figure 2. This allows a designer to easily connect a core to all bus types supported by the standard. The SIMPPL model, however, has a fixed interface, supporting only point-to-point connections, with the objective of enabling designers to treat IP modules as programmable coarse-grained functional units. Designers can then reprogram the IP module's usage in the system to adapt to the requirements of new applications.

Fig. 2. Standardizing the IP interface using (a) OCP for different bus standards and (b) SIMPPL for point-to-point communications.

B. CE Abstraction

The strength of the SIMPPL model is the CE abstraction, which allows designers to decouple the functionality of a module from system-level communication and control via a programmable controller. This design aims at reducing design time by facilitating design reuse, system integration, and system verification. The CE is an abstraction of software or hardware IP that facilitates design reuse by separating the datapath (computation), the inter-CE communication, and the control. Researchers have demonstrated some of the advantages of isolating independent control units for a shared datapath to support sequential procedural units in hardware. Similarly, when a CE is implemented as software on a processor (a software CE), the software is designed with the communication protocols, the control sequence, and the computation as independent functions. Ideally, a controller customized to the datapath of each CE could be used as a generic system interface, optimized for that specific CE's datapath. To this end, we have created two versions of a fast, programmable, lightweight controller, an execution-only (execute) version and a run-time debugging (debug) version, that are both adaptable to the different types of computations suitable to SoC designs on field-programmable gate arrays (FPGAs). Fig. 3 illustrates how the control, the communications, and the datapath are decoupled in hardware CEs. The processing element (PE) represents the datapath of the CE, i.e. the IP module, where an IP module implements a functional block having data ports and control and status signals. It performs a specific function, be it a computation or communication with an off-chip peripheral, and interacts with the rest of the system via the SIMPPL controller, which interfaces with the internal communication links to receive and transmit instruction packets. The SIMPPL Control Sequencer (SCS) module allows the designer to specify, or "program", how the PE is used in the SoC. It contains the sequence of instructions that are executed by the controller for a given application. The controller then manipulates the control bits of the PE based on the current instruction being executed by the controller and the status bits provided by the PE.

Fig. 3. Hardware CE abstraction.

III. SIMPPL CONTROLLER

The SIMPPL controller acts as the physical interface of the IP core to the rest of the system. Its instruction set is designed to facilitate controlling the core's operations and reprogramming the core's use for different applications. As noted above, two versions of the controller are designed: an execution-only version and a run-time debugging version, in other words, an execute controller and a debug controller. The Execute controller has three variants, namely consumer execute, producer execute, and full execute. The Debug controller likewise has three variants: consumer debug, producer debug, and full debug.

A. Instruction Packet Format

SIMPPL uses instruction packets to pass both control and data information over the internal communication links shown in Fig. 1. Fig. 4 provides a description of the generic instruction packet structure transmitted over an internal link. Although the current SIMPPL controller uses a 33-bit-wide FIFO, the data word is only 32 bits. The remaining bit is used to indicate whether the transmitted word is an instruction or data. The instruction word is divided into the least significant byte, which is designated for the opcode, and the upper 3 bytes, which represent the number of data words (NDWs) sent or received in an instruction packet. The current instruction set uses only the five least significant bits (LSBs) of the opcode byte to represent the instruction. The remaining bits are reserved for future extensions of the controller instruction set.

Fig. 4. An internal link's data packet format.

Each instruction packet begins with an instruction word that the controller interprets to determine how the packet is used by the CE. Since the SIMPPL model uses point-to-point communications, each CE can transfer/receive instruction packets directly to/from the necessary system CEs to perform the appropriate application-specific computations.

B. Controller Architecture

Figure 5 illustrates the SIMPPL controller's datapath architecture. The controller executes instructions received via both the internal receive (Rx) link and the SCS. Instructions from the Rx link are sent by other CEs as a way to communicate control or status information from one CE to another, whereas instructions from the SCS implement local control. Instruction execution priority is determined by the value of the Cont Prog bit, so that designers can vary the priority of program instructions depending on how a CE is used in an application. If this status bit is high, then the "program" (SCS) instructions have the highest priority; otherwise the Rx link instructions have the highest priority. Since the user must be able to properly order the arrival of instructions to the controller from two sources, allowing multiple instructions in the execution pipeline greatly complicates the synchronization required to ensure that the correct execution order is achieved. Therefore, the SIMPPL controller is designed as a single-issue architecture, where only one instruction is in flight at a time, to reduce design complexity and to simplify program writing for the user. The SIMPPL controller also monitors the PE-specific status bits that are used to generate status bits for the SCS, which are used to determine the control flow of a program. The format of an output data packet sent via the internal transmit (Tx) link is dictated by the instruction currently being executed. The inputs multiplexed to the Tx link are the Executing Instruction Register (EX IR), an immediate address that is required in some instructions, the address stored in the address register a0, and any data that the hardware IP transmits. Data can only be received and transmitted via the internal links and cannot originate from the SCS. Furthermore, the controller can only send and receive discrete packets of data, which may not be sufficient for certain types of PEs requiring continuous data streaming. To solve this problem, the controller supports the use of optional asynchronous FIFOs to buffer the data transmissions between the controller and the PE.

Fig. 5. An overview of the SIMPPL controller datapath architecture.

C. Controller Instruction Set

Table 1 contains all the instructions currently supported by the SIMPPL controller. The objective is to provide a minimal instruction set to reduce the size of the controller, while still providing sufficient programmability such that the cores can be easily reconfigured for any potential application.

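The packet layout described in the Instruction Packet Format section can be made concrete with a small pack/unpack sketch: a 33-bit link word whose top bit flags instruction versus data, with the least significant byte of the 32-bit instruction word holding the opcode (only its five LSBs are currently used) and the upper three bytes holding the NDW count. The field positions follow the text; the helper names and the polarity of the flag bit are assumptions.

```python
INSTR_FLAG = 1 << 32          # 33rd bit: 1 = instruction, 0 = data (assumed polarity)

def pack_instruction(opcode, ndw):
    """Build a 33-bit instruction word: opcode in the LSB byte, NDW above it."""
    assert 0 <= opcode < 32, "only the 5 LSBs of the opcode byte are used"
    assert 0 <= ndw < 2 ** 24, "NDW must fit in the upper 3 bytes"
    return INSTR_FLAG | (ndw << 8) | opcode

def unpack(word):
    """Split a 33-bit link word into ('data', word) or ('instr', opcode, ndw)."""
    is_instr = bool(word & INSTR_FLAG)
    payload = word & 0xFFFFFFFF
    if not is_instr:
        return ('data', payload)
    return ('instr', payload & 0x1F, payload >> 8)

w = pack_instruction(opcode=0b00110, ndw=16)
print(unpack(w))              # -> ('instr', 6, 16)
```

Because the flag travels as a 33rd FIFO bit rather than inside the 32-bit word, a data word can use its full 32-bit range without being mistaken for an instruction.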

TABLE 1. Current Instruction Set Supported by the SIMPPL Controller

Although some instructions required to fully support the reconfigurability of some types of hardware PEs may be missing, the instructions in Table 1 support the hardware CEs that have been built to date. Furthermore, the controller supports the expansion of the instruction set to meet future requirements. The first column in Table 1 describes the operation being performed by the instruction. Columns 2 through 4 indicate whether the different instruction types can be used to request data (Rd Req), receive data (Rx), or write data (Wr). The next two columns denote whether each instruction may be issued from or executed from the SCS (S) or the internal Receive Communication Link (R). Finally, the last two columns denote whether the instruction requires an address field (Addr Field) or a data field (Data Field) in the packet transmission. The first instruction type described in Table 1 is the immediate data transfer instruction. It consists of one instruction word of the format shown in Figure 4, excluding the address field, where the two LSBs of the opcode indicate whether the data transfer is a read request, a write, or a receive. The immediate data plus immediate address instruction is similar to the immediate data transfer instruction except that an address field is required as part of the instruction packet. Designers can reduce the size of the controller by tailoring the instruction set to the PE. Although some CEs receive and transmit data, thus requiring the full instruction set, others may only produce data or consume data. The Producer controller (Producer) is designed for CEs that only generate data. It does not support any instructions that may read data from a CE. The Consumer controller (Consumer) is designed for CEs that receive input data without generating output data. It does not support any instructions that try to write PE data to a Tx link.

IV. SIMPPL CONTROL SEQUENCER

The SIMPPL Control Sequencer provides the local program that specifies how the PE is to be used by the system. The operation of a SIMPPL controller is analogous to a generic processor, where the controller's instruction set is akin to assembly language. For a processor, programs consist of a series of instructions used to perform the designed operations. Execution order is dictated by the processor's Program Counter (PC), which specifies the address of the next instruction of the program to be fetched from memory. While a SIMPPL controller and program perform the equivalent operations to a program running on a generic processor, the controller uses a remote PC in the SCS to select the next instruction to be fetched. Figure 6 illustrates the SCS structure and its interface with the SIMPPL controller via six standardized signals. The 32-bit program word and the program control bit, which indicates if the program word is an instruction or an address, are only valid when the valid instruction bit is high. The valid instruction signal is used by the SIMPPL controller in combination with the program instruction read to fetch an instruction from the Store Unit and update the PC. The continue program bit indicates whether the current program instruction has higher priority than the instructions received on the CE Rx link. It can be used in combination with PE-specific and controller status bits to help ensure the correct execution order of instructions.

Fig. 6. Standard SIMPPL control sequencer structure and interface to the SIMPPL controller.

A. Consumer Controller

We have four interfacing blocks for communication within the consumer execute controller: the Master, the Slave, the Processing Element, and the Programmable Interface. The Consumer writes data to the Master; the Slave is where the Consumer reads data from. The signals of the Master block are Master clock, Master write, Master data, Master control, and Master full. The signals of the Slave block are Slave clock, Slave data, Slave control, Slave read, and Slave exist. There are two more signals generated from the Processing Element to the Consumer: can_write_data and can_write_addr. The signals generated from the Programmable Interface to the Consumer are as follows: program control bit, program valid instruction, cont_program, and program instruction. The signal generated from the Consumer to the Programmable Interface is prog_instruction_read. The input signals of the blocks are given to the consumer controller, and the output signals are directed to the blocks from the consumer controller.


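The consumer-side handshake described in this section can be condensed into a single decision step: act only when the valid-instruction bit is high, read slave data only when it exists, and hand the word to the PE only once it raises can_write_data. The signal names are taken from the text, but the exact sequencing below is our reading of it, not the actual RTL.

```python
def consumer_step(valid_instruction, slave_exist, can_write_data,
                  slave_data, pe_store):
    """One simplified control step of the consumer execute controller;
    returns the action taken."""
    if not valid_instruction:
        return 'ignored'              # instruction not executed
    if not slave_exist:
        return 'wait-slave'           # nothing to read on the slave side yet
    if not can_write_data:
        return 'wait-pe'              # PE has not signalled readiness
    pe_store.append(slave_data)       # slave_read: transfer word into the PE
    return 'stored'

pe = []
print(consumer_step(False, True, True, 0xAB, pe))   # -> 'ignored'
print(consumer_step(True, True, False, 0xAB, pe))   # -> 'wait-pe'
print(consumer_step(True, True, True, 0xAB, pe))    # -> 'stored'
print(pe)                                           # -> [171]
```

Each guard mirrors one of the conditions the prose walks through, which is why the transfer only completes when all three handshake bits line up.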
Initially, when the process begins, the controller checks whether the instruction is a valid instruction or not. If not, the instruction is not executed, as the valid instruction bit is not set high. On receiving a valid instruction, the valid instruction bit goes high, and the instruction is then identified by the control bit. We may receive either data or an instruction. When data is received from the slave, the consumer reads the data and stores it in the Processing Element. When the slave read pin becomes '1', the slave data is transferred. Once this data is received, the Processing Element checks whether it is ready to set the can_write_data pin or the can_write_addr pin. This is known once the data is sent to the consumer, and hence can_write_data is set. After this, the corresponding acknowledge signals are sent, and once the data transfer is ensured, the can_write_addr pin is set to '1' from the Processing Element. Once this write address is received, the data in the slave is transferred to the Processing Element. When the consumer communicates with the Master, all the data is transferred to the Master. The Master block deals with pure data transfer; hence, on receiving pure data instead of an instruction, the Slave_data is stored as Master_data. The address at which to store this Master_data is governed by the Consumer controller.

The two important aspects we are dealing with here are the program instruction and the slave data. The slave data for this module is a fixed value. The program instruction is given any random value. It contains the instruction and the size of the data packet, that is, the number of data words. These data words are in a continuous format and are generated as per the counter.

V. CONCLUSION

The CE abstraction facilitates verification of the PE's functionality. Hence, a debug controller will be introduced, based on the execute SIMPPL controller, that allows the detection of low-level programming and integration errors. For secure data transfer, encryption and decryption of data will be done at the producer and consumer controller ends, respectively, as an enhancement of this project.

REFERENCES

[1] M. Keating and P. Bricaud, Reuse Methodology Manual for System-on-a-Chip Designs. Norwell, MA: Kluwer Academic, 1998.
[2] H. Chang, L. Cooke, M. Hung, G. Martin, A. J. McNelly, and L. Todd, Surviving the SOC Revolution: A Guide to Platform-Based Design. Norwell, MA: Kluwer Academic, 1999.
[3] L. Shannon and P. Chow, "Maximizing system performance: Using reconfigurability to monitor system communications," in Proc. IEEE Int. Conf. on Field-Programm. Technol., Dec. 2004, pp. 231–238.
[4] ——, "Simplifying the integration of processing elements in computing systems using a programmable controller," in Proc. IEEE Symp. on Field-Programm. Custom Comput. Mach., Apr. 2005, pp. 63–72.
[5] E. Lee and T. Parks, "Dataflow process networks," Proc. IEEE, vol. 83, no. 5, pp. 471–475, May 1995.
[6] K. Jasrotia and J. Zhu, "Stacked FSMD: A power efficient micro-architecture for high level synthesis," in Proc. Int. Symp. on Quality Electronic Des., Mar. 2004, pp. 425–430.

Clock Period Minimization of Edge Triggered Circuits

1D. Jackuline Moni, 2S. Arumugam, 1Anitha A.
1ECE Department, Karunya University
2Chief Executive, Bannari Amman Educational Trust

Abstract--In a sequential VLSI circuit, due to differences in interconnect delays on the clock distribution network, clock signals do not arrive at all of the flip-flops (FFs) at the same time. Thus there is a skew between the clock arrival times at different latches. Among the various objectives in the development of sequential circuits, clock period minimization is one of the most important. Clock skew can be exploited as a manageable resource to improve circuit performance. However, due to the limitation of race conditions, optimal clock skew scheduling often does not achieve the lower bound of sequential timing optimization. This paper presents the clock period minimization of edge-triggered circuits. The objective here is not only to optimize the clock period but also to minimize the inserted delay required for resolving the race conditions. This is done using ModelSim XE II 5.8c.

I. INTRODUCTION

Most integrated circuits of sufficient complexity utilize a clock signal in order to synchronize different parts of the circuit and to account for propagation delays. As ICs become more complex, the problem of supplying accurate and synchronized clocks to all the circuits becomes difficult. One example of such a complex chip is the microprocessor, the central component of modern computers. A clock signal might also be gated or combined with a controlling signal that enables or disables the clock signal for a certain part of a circuit. In a synchronous circuit, the clock signal is used to coordinate the actions of two or more circuits. A clock signal oscillates between a high and a low state and is usually in the form of a square wave. Circuits using the clock signal for synchronization may become active at the rising edge, the falling edge, or both edges of the clock cycle. A synchronous circuit is one in which all the parts are synchronized by a clock. In ideal synchronous circuits, every change in the logical levels of the storage components is simultaneous. These transitions follow the level change of a special signal called the clock. Ideally, the input to each storage element has reached its final value before the next clock edge occurs, so the behaviour of the whole circuit can be predicted exactly. Practically, some delay is required for each logical operation, resulting in a maximum speed at which each synchronous system can run. To make these circuits work correctly, a great deal of care is needed in the design of the clock distribution network. This paper deals with the clock period minimization of edge-triggered circuits. Clock skew is a phenomenon in synchronous circuits in which the clock signal arrives at different components at different times. This can be due to wire-interconnect length, temperature variations, capacitive coupling, material imperfections, etc. As design complexity and clock frequency continue to increase, more techniques are being developed for clock period minimization. An application of optimal clock skew scheduling to enhance the speed characteristics of functional blocks of an industrial chip was demonstrated in [1].

II. PROJECT DESCRIPTION

This paper deals with the clock period minimization of edge-triggered circuits. Edge-triggered circuits are sequential circuits that use the edge-triggered clocking scheme. Such a circuit consists of registers and combinational logic gates with wires connecting them. Each logic gate has one output pin and one or more input pins. A timing arc is used to denote the signal propagation from an input pin to an output pin, and a suitable delay value for the timing arc is also taken into account. In the design of an edge-triggered circuit, if the clock edge arrives at each register exactly simultaneously, the clock period cannot be shorter than the longest path delay. If the circuit has timing violations caused by long paths, an improvement can be made by an optimization step. There are two approaches to resolve the timing violations of long paths. One is to apply logic optimization techniques to reduce the delays of long paths; the other is to apply sequential timing optimization techniques, such as clock skew scheduling [7] and retiming transformation [5], [8], to adjust the timing slacks among the data paths. Logic optimization techniques are applied earlier. For those long paths whose delays are difficult to reduce further, sequential timing optimization techniques are necessary.

It is well known that the clock period of a nonzero-clock-skew circuit can be shorter than the longest path delay if the clock arrival times of the registers are properly scheduled. The optimal clock skew scheduling problem can be formulated as a constraint graph and solved by polynomial-time algorithms such as the cycle detection method [6], binary search algorithms, shortest path algorithms [2], etc. Given a circuit graph G, the optimal clock skew scheduling problem is to determine the smallest feasible clock period and find an optimal clock skew schedule, which specifies the clock arrival times of the registers for the circuit to work with the smallest feasible clock period. Due to the limitation of race conditions, optimal clock skew scheduling often does not achieve the lower bound of sequential timing optimization. Thus, a combination of optimal clock skew scheduling and delay insertion may lead to further clock period reduction. The circuit graph shown below is taken for analysis. This approach of combining optimal clock skew scheduling and delay insertion for the synthesis of nonzero clock skew circuits is realised using the Delay Insertion and Nonzero Skew (DIANA) algorithm. The DIANA algorithm is an iteration process between the construction of an effective delay-inserted circuit graph and the construction of an irredundant delay-inserted circuit graph. The iteration process repeats until the clock period cannot be further reduced.

the clock period cannot be further reduced. The delay The delay to register ratio of a directed cycle C is
component is then applied to the edge triggered circuit that given by maximum delay of C / the number of registers in
we have taken. C. This gives the lower bound of sequential timing
III.METHOD 1 optimization. From the circuit graph it is clear that, the
A. LOWER BOUND OF SEQUENTIAL TIMING maximum delay to register ratio [9] of the directed cycle is
OPTIMIZATION 4 tu. The waveform of the edge triggered circuit is shown
Fig 1 shows an edge triggered flipflop circuit. It in fig 3.
consists of registers and combinational logic gates with
wires connecting them. The circuit has four registers and
eight logic gates. Each logic gate has one ore more input
pin and one output pin. A timing arc is defined to denote
the signal propagation from input to output. The delays of
the timing arc in the edge triggered circuit are initialized
as shown in the table below. A data path from register Ri
to register Rj denoted as Ri Rj includes the
combinational logic from Ri to Rj. The circuit can also be
modeled as a circuit graph G (V, E) for timing analysis Fig.2. Circuit Graph
where V is the set of vertices
and E is the set of directed edges. Each vertex represents a
register and special vertex called host is used to
synchronize the input and output. A directed edge (Ri, Rj)
represents a data path
Ri Rj, and it is associated with weight which represents
the minimum and maximum propagation delay of the data
path. The circuit graph of the edge triggered flipflop is
shown in fig 2. From the graph it is clear that the
maximum propagation delay path is TPD3,4 (max) and is 6
time units (tu) .
Fig. 3. Waveform of edge triggered flipflop
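The lower bound above can be illustrated with a short brute-force sketch (our own illustration; the graph type and function names are assumptions, not the paper's code). It enumerates directed cycles by DFS and returns the maximum ratio of total maximum delay to register count. The test delays are hypothetical except for the 6 tu maximum delay of R3 → R4 quoted in the text; the return delay is chosen so that the ratio comes out at the 4 tu reported:

```cpp
#include <vector>
#include <algorithm>

// One adjacency-list edge of the circuit graph: destination register and
// the maximum propagation delay of the data path (in tu).
struct Edge { int to; double maxDelay; };

// DFS that extends a path from `start`; whenever an edge closes the cycle
// back to `start`, update the best delay-to-register ratio seen so far.
static void dfs(int start, int v, double delay, int len,
                std::vector<char>& onPath,
                const std::vector<std::vector<Edge>>& g, double& best) {
    for (const Edge& e : g[v]) {
        if (e.to == start) {
            // Cycle closed: (len + 1) edges == registers on the cycle.
            best = std::max(best, (delay + e.maxDelay) / (len + 1));
        } else if (e.to > start && !onPath[e.to]) { // each cycle counted once
            onPath[e.to] = 1;
            dfs(start, e.to, delay + e.maxDelay, len + 1, onPath, g, best);
            onPath[e.to] = 0;
        }
    }
}

// Lower bound of sequential timing optimization: the maximum, over all
// directed cycles, of (total max delay on the cycle) / (registers on it).
double maxDelayToRegisterRatio(const std::vector<std::vector<Edge>>& g) {
    double best = 0.0;
    std::vector<char> onPath(g.size(), 0);
    for (int s = 0; s < (int)g.size(); ++s) {
        onPath[s] = 1;
        dfs(s, s, 0.0, 0, onPath, g, best);
        onPath[s] = 0;
    }
    return best;
}
```

Brute-force enumeration is exponential in general, but adequate for a four-register example like the one analyzed here.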


This section introduces the circuit graph and constraint graph used to model the optimum clock skew scheduling problem, which can be cast as a graph-theoretic problem [4]. Let TCi denote the clock arrival time of register Ri. TCi is defined relative to a global time reference, so TCi may be negative. For a data path Ri → Rj, there are two types of clocking hazards: double clocking, where the same clock pulse triggers the same data in two adjacent registers, and zero clocking, where the data reaches a register too late relative to the following clock pulse. To prevent double clocking, the clock skew must satisfy TCj − TCi ≤ TPDi,j(min). To prevent zero clocking, the clock skew must satisfy TCi − TCj ≤ P − TPDi,j(max), where P is the clock period. Together, the two inequalities define a permissible clock skew range for each data path. Thus, given a circuit graph G and a clock period P, we can model the clocking-hazard constraints by a constraint graph Gcg(G, P), in which each vertex represents a register and each directed edge corresponds to one type of constraint. Each directed edge (Ri, Rj) in the circuit graph G has a D-edge and a Z-edge in the corresponding constraint graph Gcg(G, P). The D-edge corresponds to the double clocking constraint, runs in the direction of signal propagation, and is associated with weight TPDi,j(min). The Z-edge corresponds to the zero clocking constraint, runs against the direction of signal propagation, and is associated with weight P − TPDi,j(max). Using the circuit graph shown in fig 2, the corresponding constraint graph is shown in fig 4(a) with clock period P.

Fig. 1. Edge triggered flipflop

Table 1. Delays of timing arcs
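The permissible skew range defined by the two inequalities can be checked directly. The following sketch is our own illustration (names and the test delays are assumptions): it rejects a schedule that violates either the double clocking (D-edge) or the zero clocking (Z-edge) constraint:

```cpp
#include <vector>

// A data path Ri -> Rj with its minimum and maximum propagation delays (tu).
struct DataPath { int from, to; double tpdMin, tpdMax; };

// A clock skew schedule T assigns an arrival time to each register.
// For every data path Ri -> Rj the schedule must satisfy both
//   T[j] - T[i] <= TPDij(min)       (double clocking, D-edge)
//   T[i] - T[j] <= P - TPDij(max)   (zero clocking, Z-edge)
bool scheduleIsFeasible(const std::vector<DataPath>& paths,
                        const std::vector<double>& T, double P) {
    for (const DataPath& p : paths) {
        if (T[p.to] - T[p.from] > p.tpdMin) return false;     // double clocking
        if (T[p.from] - T[p.to] > P - p.tpdMax) return false; // zero clocking
    }
    return true;
}
```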

A circuit graph works with the clock period P only if the clock skew schedule satisfies the clocking constraints. The optimal clock skew scheduling problem is to determine the smallest feasible clock period of a circuit graph and to find the corresponding clock skew schedule for the circuit graph to work with that clock period.

The optimum clock skew scheduling problem is solved by applying a binary search approach. At each step of the binary search [3], for a fixed value of the clock period P, a check for negative cycles in the constraint graph is performed. The binary search is repeated until the smallest feasible clock period is attained. After applying this approach, we get the smallest feasible clock period as 5 tu. The corresponding constraint graph is shown in fig 4(b). When the clock period is 5 tu, there exists a critical cycle R3 → R4 → R3 in the constraint graph; if the clock period is less than 5 tu, this cycle becomes a negative cycle. From fig 4(b), optimum clock skew scheduling is limited by the critical cycle R3 → R4 → R3, which is not a critical Z-cycle. This critical cycle has a critical D-edge ed(R3 → R4), whose weight is the minimum delay from register R3 to register R4. Thus, if we increase this minimum delay, the cycle becomes a noncritical one. The optimal clock skew schedule is taken as Thost = 0 tu, TC1 = 2 tu, TC2 = 2 tu, TC3 = 2 tu and TC4 = 3 tu. The corresponding waveform representation is shown in fig 5.

But due to the limitation of race conditions, optimal clock skew scheduling often does not achieve the lower bound of sequential timing optimization. Moreover, different clock skew schedules give rise to different race conditions. So delay insertion [10] is taken into account in determining the clock skew schedule.

Fig 4. (a) Constraint Graph Gcg(ex1, P). (b) Constraint Graph Gcg(ex1, 5).

Fig. 5. Waveform of OCSS

The delay-inserted circuit graph models the increase of the minimum delay of every data path during the stage of clock skew scheduling. This is a two-step process in which we can achieve the lower bound of sequential timing optimization. In the first step, a clock skew schedule is derived by taking only zero clocking constraints into account. In the second step, delay insertion is applied to resolve the race conditions. Consider the circuit graph shown in fig 6(a). Here the lower bound of sequential timing optimization is 3 tu. There is no negative cycle in the constraint graph of fig 6(b). The clock skew schedule is taken as Thost = 0 tu, TC1 = 0 tu, TC2 = 0 tu and TC3 = 1 tu. Here the lower bound is achieved without any delay insertion.

Fig.6 (a) Circuit Graph ex2 (b) Constraint Graph Gcg(ex2, 3).

On the other hand, fig 7 shows the two-step process for obtaining a delay-inserted circuit graph which works with a clock period P = 3 tu. In the first step, since only zero clocking constraints are considered, the clock skew schedule is taken as Thost = 0 tu, TC1 = 2 tu, TC2 = 2 tu and TC3 = 3 tu. This is shown in fig 7(a). Then, in the second step, delay insertion is applied to resolve the race conditions. Here the required increase of the minimum delay from host to R2 is 1 tu and the required increase of the minimum delay from host to R3 is 2 tu. Fig 7(b) shows this process. The two-step process results in extra delay insertion. The corresponding waveform is shown in fig 8.

Fig.7. Two step process (a) First Step

Fig. 8. Waveform of the two step process
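The binary search with negative-cycle detection can be sketched as follows (illustrative C++; the encoding and all names are our own). The test models the critical cycle R3 → R4 → R3, assuming TPD3,4(min) = 1 tu, which together with the quoted TPD3,4(max) = 6 tu makes the cycle critical exactly at P = 5 tu:

```cpp
#include <vector>

// Constraint-graph edge encoding T[to] - T[from] <= weight(P): for a D-edge
// the weight is TPDmin (constant), for a Z-edge it is P - TPDmax.
struct CEdge { int from, to; double constant; bool addsP; };

// Bellman-Ford relaxation from an implicit super source (all distances 0):
// the constraint system is satisfiable iff no negative cycle exists at P.
bool feasible(int n, const std::vector<CEdge>& edges, double P) {
    std::vector<double> dist(n, 0.0);
    for (int pass = 0; pass <= n; ++pass) {
        bool changed = false;
        for (const CEdge& e : edges) {
            double w = e.constant + (e.addsP ? P : 0.0);
            if (dist[e.from] + w < dist[e.to] - 1e-9) {
                dist[e.to] = dist[e.from] + w;
                changed = true;
            }
        }
        if (!changed) return true;  // converged: no negative cycle
    }
    return false;  // still relaxing after n+1 passes: negative cycle
}

// Binary search for the smallest feasible clock period in [lo, hi].
double smallestFeasiblePeriod(int n, const std::vector<CEdge>& edges,
                              double lo, double hi) {
    while (hi - lo > 1e-3) {
        double mid = 0.5 * (lo + hi);
        if (feasible(n, edges, mid)) hi = mid; else lo = mid;
    }
    return hi;
}
```

Each feasibility check costs O(V·E), so the overall search is polynomial, as the text claims for this class of algorithms.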


(b) Second step

IV.DESIGN METHODOLOGY

The proposed approach combines optimum clock skew scheduling and delay insertion using an algorithm known as the Delay Insertion and Nonzero Skew Algorithm (DIANA). This is a three-step iteration process which involves delay insertion, optimum clock skew scheduling and delay minimization. The input to the algorithm is an edge triggered circuit, and the output is also an edge triggered circuit that works with a clock period under a given clock skew schedule. The algorithm involves the construction of an effective delay-inserted circuit graph and the construction of an irredundant delay-inserted circuit graph. The iteration process is repeated until the clock period cannot be reduced further. The edge triggered circuit shown in fig 1 is used for this approach. The pseudocode of the algorithm is shown below.

Procedure DIANA(Gin)
begin
    k = 0;
    GMin(0) = Gin;
    (SMin(0), PMin(0)) = OCSS(GMin(0));
    repeat
        k = k + 1;
        GIns(k) = Delay_Insertion(GMin(k-1), SMin(k-1), PMin(k-1));
        (SIns(k), PIns(k)) = OCSS(GIns(k));
        (GMin(k), SMin(k), PMin(k)) = Del_Min(GIns(k), PIns(k));
    until (PMin(k) = PMin(k-1));
    Gopt = GMin(k);
    Sopt = SMin(k);
    Popt = PMin(k);
    return (Gopt, Sopt, Popt);
end

Initially GMin(0) = Gin. The procedure OCSS performs optimal clock skew scheduling. The procedure Delay_Insertion is used to obtain the delay-inserted circuit graph GIns(k) by increasing the minimum delay of every critical minimum path with respect to the given clock period PMin(k-1) and clock skew schedule SMin(k-1). After delay insertion, the D-edges in the constraint graph are noncritical, so we use OCSS again to reduce the clock period further. The procedure Del_Min is used to obtain the irredundant delay-inserted circuit graph by minimizing the increased delay of each delay-inserted data path in the circuit graph with respect to the clock period. This process is repeated until the clock period cannot be further reduced.

Here the edge triggered circuit shown in fig 1 is taken. The circuit graph is shown in fig 2. From fig 4(b), the smallest clock period is 5 tu and the clock skew schedule is taken as (0, 2, 2, 3). As the first step, the delay-inserted circuit graph is constructed for fig 2. Here two loop iterations are performed, one with clock period 4.5 tu and the other with a clock period of 4 tu. The data paths host → R1 and R3 → R4 are critical minimum paths. The feasible value for the increase of the minimum delay from host to register R1 lies within the interval (0, 5), whereas for R3 to R4 it is (0, 6 − 1). Thus we take phost,1 as 5/2 = 2.5 tu and p3,4 as 5/2 = 2.5 tu. The effective delay-inserted circuit and corresponding constraint graph are shown in fig 8(a) and (b).

Fig 8 (a) Effective delay inserted circuit graph (b) Corresponding constraint graph

The next step is the construction of the irredundant delay-inserted circuit graph. Here there are two delay-inserted data paths, host → R1 and R3 → R4. From fig 8(b) it is clear that the minimum value of phost,1 needed to work with a clock period of 4.5 tu is 0, and for p3,4 it is 0.5 tu. The corresponding circuit and constraint graph under the given clock skew schedule are shown in fig 9(a) and (b).

Fig 9 (a) Irredundant delay inserted circuit graph (b) Corresponding constraint graph

In the second loop iteration we again construct the effective delay-inserted circuit graph of fig 9(a) and, as before, first find the critical minimum paths and then the feasible values for the increase of the minimum delay. After finding the smallest clock period, we construct the irredundant delay-inserted graph. Once a critical Z-cycle exists, the clock period cannot be further reduced; the process is repeated until this stage. The clock period thus obtained through the DIANA algorithm gives the lower bound of sequential timing optimization for the edge triggered circuit. The waveform representation of the above approach for clock periods 4.5 tu and 4 tu is shown in fig 10(a) and (b).
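The outer DIANA loop can be sketched as a driver that treats OCSS, Delay_Insertion and Del_Min as pluggable procedures. This is a skeleton of our own; Graph, Schedule and Result are placeholder types, not the paper's implementation:

```cpp
#include <functional>

struct Graph { /* circuit graph with min/max path delays */ };
struct Schedule { /* clock arrival time per register */ };
struct Result { Graph g; Schedule s; double period; };

// DIANA outer loop: delay insertion, re-scheduling (OCSS) and delay
// minimization, repeated until the clock period stops improving.
Result diana(Graph gin,
             std::function<Result(const Graph&)> ocss,
             std::function<Graph(const Graph&, const Schedule&, double)> delayInsertion,
             std::function<Result(const Graph&, double)> delMin) {
    Result cur = ocss(gin);   // (SMin(0), PMin(0)) = OCSS(GMin(0))
    cur.g = gin;
    for (;;) {
        Graph gIns = delayInsertion(cur.g, cur.s, cur.period);
        Result ins = ocss(gIns);               // OCSS on delay-inserted graph
        Result next = delMin(gIns, ins.period); // strip redundant delay
        if (next.period >= cur.period)          // until PMin(k) = PMin(k-1)
            return cur;
        cur = next;
    }
}
```

The test below drives the skeleton with stub procedures whose reported periods shrink 5 → 4.5 → 4 tu and then stall, mirroring the two loop iterations described above.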
Fig 10(a) waveform for the clock period 4.5tu

Fig 10(b) waveform for the clock period 4tu

The DIANA algorithm is applied to a series of benchmark circuits. The results obtained are as follows.

Circuit   Clock Period   Gate Delay
B01       2.212          6.788
B03       6.23           5.32
B04       17.85          5.942

This paper has described the clock period minimization of edge triggered circuits, using the delay insertion and nonzero skew (DIANA) algorithm to optimize the clock period. Experimental results for the various sections of this project are shown above. The clock period obtained with this algorithm is smaller than with the other approaches considered. The algorithm was applied to a series of benchmark circuits, and the results are shown above.

REFERENCES

[1] Adler V., Baez F., Friedman E. G., Kourtev I. S., Tang K. T., and Velenis D., "Demonstration of speed enhancements on an industrial circuit through application of non-zero clock skew scheduling," in Proc. IEEE Int. Conf. Electronics, Circuits and Systems, St. Julians, Malta, 2001, vol. 2, pp. 1021–1025.
[2] Albrecht C., Korte B., Schietke J., and Vygen J., "Cycle time and slack optimization for VLSI chips," in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, San Jose, CA, 1999, pp. 232–238.
[3] Cormen T. H., Leiserson C. E., and Rivest R. L., Introduction to Algorithms. New York: McGraw-Hill.
[4] Deokar R. B. and Sapatnekar S. S., "A graph-theoretic approach to clock skew optimization," in Proc. IEEE Int. Symp. Circuits and Systems, London, U.K., 1994, vol. 1, pp. 407–410.
[5] Friedman E. G., Liu X., and Papaefthymiou M. C., "Retiming and clock scheduling for digital circuit optimization," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 21, no. 2, pp. 184–203, Feb. 2002.
[6] Burns S. M., "Performance analysis and optimization of asynchronous circuits," Ph.D. dissertation, Dept. Comput. Sci., California Inst. Technol., Pasadena.
[7] Fishburn J. P., "Clock skew optimization," IEEE Trans. Comput., vol. 39, no. 7, July 1990.
[8] Leiserson C. E. and Saxe J. B., "Retiming synchronous circuitry," Algorithmica, vol. 6, no. 1, pp. 5–35, 1991.
[9] Papaefthymiou M. C., "Understanding retiming through maximum average-delay cycles," Math. Syst. Theory, vol. 27, no. 1, pp. 65–84, Jan./Feb. 1994.
[10] Shenoy N. V., Brayton R. K., and Sangiovanni-Vincentelli A. L., "Clock skew scheduling with delay padding for prescribed skew domains," in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, Santa Clara, CA, 1993, pp. 156–1


VLSI Floor Planning Based On Hybrid Particle Swarm Optimization

1D. Jackuline Moni, 2S. Arumugam
1Associate Professor, ECE Department, Karunya University
2Chief Executive, Bannari Amman Educational Trust

Abstract- Floorplanning is important in very large scale integrated circuit (VLSI) design automation, as it determines the performance, size and reliability of VLSI chips. This paper presents a floorplanning method based on hybrid Particle Swarm Optimization (HPSO). The B*-tree floorplan structure is adopted to generate an initial floorplan without any overlap, and then HPSO is applied to find the optimal solution. HPSO has been implemented and tested on popular MCNC and GSRC benchmark problems for nonslicing and hard-module VLSI floorplanning. Experimental results show that HPSO can quickly produce optimal or nearly optimal solutions for all popular benchmark circuits.

I.INTRODUCTION

As technology advances, design complexity is increasing and circuit sizes are getting larger. To cope with the increasing design complexity, hierarchical design and IP modules are widely used. This trend makes module floorplanning much more critical to the quality of a VLSI design than ever. Given a set of circuit components, or "modules," and a netlist specifying interconnections between the modules, the goal of VLSI floorplanning is to find a floorplan for the modules such that no module overlaps with another and the area of the floorplan and the interconnections between the modules are minimized.

A fundamental problem in floorplanning lies in the representation of the geometric relationship among modules. The representation profoundly affects the operations on modules and the complexity of the floorplan design process. It is thus desirable to use an efficient, flexible, and effective representation of geometric relationships for floorplan designs. Existing floorplan representations can be classified into two categories: 1) slicing representations and 2) non-slicing representations. Slicing floorplans are those that can be recursively bisected by horizontal and vertical cut lines down to single blocks. They can be encoded by slicing trees or Polish expressions [9,18]. For non-slicing floorplans, researchers have proposed several representations such as sequence pair [12,13,16,17], bounded slicing grid [14], O-tree [4], B*-tree [2], Transitive Closure Graph (TCG) [10,11], Corner Block List (CBL) [5,21], and Twin Binary Sequences [20].

Since the B*-tree representation [2] is an efficient, flexible, and effective data structure, we have used the B*-tree floorplan to generate an initial floorplan without any overlap. Existing approaches [6,19] use simulated annealing because it allows modifying the objective function in applications. The drawback of adopting SA is that the system must be close to equilibrium throughout the process, which demands a careful adjustment of the annealing schedule parameters.

In this paper, we adopt the non-slicing B*-tree representation with a Hybrid Particle Swarm Optimization (HPSO) algorithm. HPSO [1] utilizes the basic mechanism of PSO [7,8] together with the natural selection method usually employed by evolutionary computation (EC) methods such as the genetic algorithm (GA). Since the search procedure of PSO depends strongly on pbest and gbest, the searching area may be limited by them; by introducing natural selection, a broader area search can be realized.

The remainder of this paper is organized as follows. Section 2 describes the PSO and HPSO methodology. Section 3 presents the B*-tree representation and our proposed methods for floorplanning. The experimental results are reported in Section 4. Finally, the conclusion is in Section 5.

II.METHODOLOGY

A. Particle Swarm Optimization

Particle swarm optimization (PSO) is a population-based stochastic optimization technique developed by Eberhart and Kennedy in 1995, inspired by the social behavior of bird flocking. In PSO, the potential solutions, called particles, fly through the problem space by following the current optimum particles. All the particles have fitness values, which are evaluated by the fitness function to be optimized, and velocities, which direct the flight of the particles. PSO is initialized with a group of random particles (solutions) and then searches for optima by updating generations. In every iteration, each particle is updated by following two "best" values, pbest and gbest. When a particle takes part of the population as its topological neighbors, the best value is a local best and is called lbest.

Let s denote the number of dimensions of the problem. In general, each particle in the search space is characterized by three attributes: its current position xi, current velocity vi and local best position yi. Each particle in the swarm is iteratively updated according to these attributes [3,7]. Each agent tries to modify its position using the following information: the current position, the current velocity, the distance between the current position and pbest, and the distance between the current position and gbest. Assuming that the function f is to be minimized and that the swarm consists of n particles, the new velocity of every particle is updated by (1):

vi,j(t+1) = w·vi,j(t) + c1·r1(t)·[yi,j(t) − xi,j(t)] + c2·r2(t)·[ŷj(t) − xi,j(t)]   (1)

where vi,j is the velocity of particle i in the jth dimension for all j in 1…s, w is the inertia weight, c1 and c2 denote the acceleration coefficients, r1 and r2 are elements from two uniform random sequences in the range (0, 1), and t is the generation number. The new position of the particle is calculated as follows:

xi(t+1) = xi(t) + vi(t+1)   (2)

The local best position of each particle is updated by (3):

yi(t+1) = yi(t)      if f(xi(t+1)) ≥ f(yi(t))
yi(t+1) = xi(t+1)    if f(xi(t+1)) < f(yi(t))   (3)

The global best position ŷ found from all particles during the previous three steps is defined as

ŷ(t+1) = argmin over yi of f(yi(t+1)),  1 ≤ i ≤ n

B. Hybrid particle swarm optimization (HPSO)

The structure of the hybrid model is illustrated below:

begin
  initialize
  while (not terminate-condition) do
  begin
    evaluate
    calculate new velocity vectors
    natural selection
  end
end

Breeding is done by first determining which particles should breed: we iterate through all the particles and, with probability pi, mark a given particle for breeding, where pi is a uniformly distributed random value between 0 and 1. Note that fitness is not used when selecting particles for breeding. From the pool of marked particles we then repeatedly select two random particles for breeding until the pool is empty. The parent particles are replaced by their offspring particles, thereby keeping the population size fixed. The velocity vectors of the offspring are calculated as the sum of the velocity vectors of the parents, normalized to the original length of each parent velocity vector. The flow chart of HPSO is shown in figure 1.

C. Steps of Hybrid Particle Swarm Optimization

Step 1: Generation of the initial condition of each agent. Initial searching points (si0) and velocities (vi0) of each agent are usually generated randomly within the allowable range. The current searching point is set to pbest for each agent. The best evaluated value of pbest is set to gbest, and the agent number with the best value is stored.

Step 2: Evaluation of the searching point of each agent. The objective function value is calculated for each agent. If the value is better than the current pbest of the agent, the pbest value is replaced by the current value. If the best value of pbest is better than the current gbest, gbest is replaced by the best value and the agent number with the best value is stored.

Step 3: Natural selection using the evaluation value of each searching point is performed.

Step 4: Modification of each searching point. The current searching point of each agent is changed.

Step 5: Checking the exit condition. If the current iteration number reaches the predetermined maximum iteration number, exit; otherwise go to Step 2.

Given an admissible placement P, we can represent it by a unique (horizontal) B*-tree T. Fig 2(b) gives an example of a B*-tree representing the placement of Fig 2(a). A B*-tree is an ordered binary tree whose root corresponds to the module on the bottom-left corner. Similar to a DFS procedure, we construct the B*-tree T for an admissible placement P in a recursive fashion: starting from the root, we first recursively construct the left subtree and then the right subtree. Let Ri denote the set of modules located on the right-hand side of and adjacent to bi. The left child of node ni corresponds to the lowest module in Ri that is unvisited. The right child of node ni represents the lowest module located above bi with its x coordinate equal to that of bi. Following the above DFS procedure and definitions, we can guarantee the one-to-one correspondence between an admissible placement and its induced B*-tree.
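The updates of Eqs. (1) and (2) amount to a few lines per dimension. The following sketch is our own illustration (the struct layout and function names are assumptions; w, c1, c2 match the symbols in the text):

```cpp
#include <vector>
#include <random>

// One PSO particle: current position, velocity and local best (pbest).
struct Particle {
    std::vector<double> x, v, pbest;
};

// One generation of the velocity update, Eq. (1), followed by the
// position update, Eq. (2), with fresh uniform r1, r2 per dimension.
void psoUpdate(Particle& p, const std::vector<double>& gbest,
               double w, double c1, double c2, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    for (size_t j = 0; j < p.x.size(); ++j) {
        double r1 = u(rng), r2 = u(rng);
        p.v[j] = w * p.v[j]
               + c1 * r1 * (p.pbest[j] - p.x[j])   // cognitive term, Eq. (1)
               + c2 * r2 * (gbest[j]  - p.x[j]);   // social term, Eq. (1)
        p.x[j] += p.v[j];                          // Eq. (2)
    }
}
```

With the paper's settings (w = 0.4, c1 = c2 = 1.4), HPSO would additionally apply the natural-selection/breeding step between generations.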

Figure 1: Flow chart of HPSO

Fig2: (a) An admissible placement (b) The (horizontal) B*-tree representing the placement

As shown in fig 2, module a becomes the root of T since module a is on the bottom-left corner. Constructing the left subtree of na recursively, it makes nh the left child of na. Since the left child of nh does not exist, it then constructs the right subtree of nh (which is rooted by ni). The construction is recursively performed in the DFS order. After completing the left subtree of na, the same procedure applies to the right subtree of na. The resulting B*-tree for the placement of fig 2(a) is shown in fig 2(b). The construction takes only linear time.

Given a B*-tree T, we shall compute the x and y coordinates for each module associated with a node in the tree. The x and y coordinates of the module associated with the root are (xroot, yroot) = (0, 0), since the root of T represents the bottom-left module. The B*-tree keeps the geometric relationship between two modules as follows. If node nj is the left child of node ni, module bj must be located on the right-hand side of and adjacent to module bi in the admissible placement, i.e. xj = xi + wi. Besides, if node nj is the right child of node ni, module bj must be located above bi, with the x coordinate of bj equal to that of bi, i.e. xj = xi. Therefore, given a B*-tree, the x coordinates of all modules can be determined by traversing the tree once. The contour data structure is adopted to efficiently compute the y coordinate from a B*-tree. Overall, given a B*-tree we can determine the corresponding packing (i.e. compute the x and y coordinates for all modules) in amortized linear time.

B*-tree Perturbations

Given an initial B*-tree, we perturb the B*-tree to another using the following three operations:
• Op1: Rotate a module.
• Op2: Move a module to another place.
• Op3: Swap two modules.

Op1 rotates a module, and the B*-tree structure is not changed. Op2 deletes and inserts a node. Op2 and Op3 need to apply the deletion and insertion operations for deleting and inserting a node from and to a B*-tree.

A. Floorplanning using B*-tree

Read the input benchmark circuit and construct the B*-tree representation. Then start with random floorplanning to get initial solutions. These initial solutions are assigned to different particles, and the velocity is found for each particle. Depending upon the velocity of each particle, we perturb the B*-tree; after each perturbation a new solution is obtained. Then gbest and lbest are found. Natural selection using the evaluation value of each searching point is performed, and the same process is repeated until the termination condition is reached.

The experiments in this study employed the GSRC and MCNC benchmarks [22] for the proposed floorplanner and were compared with [2]. The simulation programs were written in C++, compiled using Microsoft Visual C++, and the results were obtained on a Pentium 4 2 GHz with 256 MB RAM. The initializations of w, c1 and c2 for PSO were 0.4, 1.4 and 1.4 respectively. For HPSO, the probability of selection is chosen as 0.6. The particle number is set to twenty. The floorplanner was run 10 times, and average values of chip area and run time were taken.

The results are shown in Table 1. Compared with [2], our method can find a better placement solution in even less computation time. Under the same tree structure, our approach has more efficiency and solution-searching ability.

Table 1 Results of Hard Modules using B*-tree based HPSO

Circuit   #Blocks   With B*-tree              Our method
                    Area (mm2)   Time (sec)   Area (mm2)   Time (sec)
Apte      9         46.92        7            46.829       1.31
Xerox     10        20.06        25           19.704       3.69
Ami33     33        1.27         3417         1.26         4.44

In this paper, we proposed a floorplanner based on HPSO with a B*-tree structure for placing blocks. HPSO exhibits the ability to search the solution space more efficiently than SA. The experimental results proved that the proposed HPSO method can lead to more optimal and reasonable solutions on the hard IP module placement problem. Our future work is to deal with soft IP modules and also to include constraints such as alignment and performance constraints.
REFERENCES

[1] P. J. Angeline, "Using Selection to Improve Particle Swarm Optimization," in Proceedings of the IEEE Congress on Evolutionary Computation, 1998, pp. 84-89, IEEE Press.
[2] Y.-C. Chang, Y.-W. Chang, G.-M. Wu and S.-W. Wu, "B*-trees: A New Representation for Non-Slicing Floorplans," DAC 2000, pp. 458-463.
[3] R. C. Eberhart and J. Kennedy, "A New Optimizer Using Particle Swarm Theory," in Proceedings of the Sixth International Symposium on Micromachine and Human Science, 1995, pp. 39-43.
[20] E. F. Y. Young, C. C. N. Chu and Z. C. Shen, "Twin Binary Sequences: A Nonredundant Representation for General Nonslicing Floorplan," IEEE Trans. on CAD 22(4), pp. 457–469, 2003.
[21] S. Zhou, S. Dong, C.-K. Cheng and J. Gu, "ECBL: An Extended Corner Block List with Solution Space Including Optimum Placement," ISPD 2001, pp. 150-155.
[22] ss.html
[4] P.-N. Guo, C.-K. Cheng and T. Yoshimura, “An O-
tree Representation of Non-Slicing Floorplan,” DAC
‘99, pp. 268-273.
[5] X. Hong et al., “Corner Block List: An Effective and
Efficient Topological Representation of Non-Slicing
Floorplan,” ICCAD 2000, pp. 8-13.
[6] A. B. Kahng, “Classical floorplanning harmful?”
ISPD 2000, pp. 207-213.
[7] J.Kennedy and R.C.Eberhart ‘Particle Swarm
Optimization.’ In Proceedings of the IEEE International
Joint Conference on Neural Networks, (1995) pages
1942-1948.IEEE Press
[8] J.Kennedy ‘The Particle Swarm: Social Adaptation
of Knowledge.’ In Proceedings of the IEEE
International Conference on
Evolutionary Computation, 1997, pages 303-308.
[9] M. Lai and D. F. Wong, "Slicing Tree Is a Complete Floorplan Representation," DATE 2001, pp. 228–232.
[10] J.-M. Lin and Y.-W Chang, “TCG: A Transitive
Closure Graph-Based Representation for Non-Slicing
Floorplans,” DAC 2001, pp. 764–769.
[11] J.-M. Lin and Y.-W. Chang, “TCG-S: Orthogonal
Coupling of P*-admissible Representations for General
Floorplans,” DAC 2002, pp. 842–847.
[12] H. Murata, K. Fujiyoshi, S. Nakatake and Y. Kajitani, "VLSI Module Placement Based on Rectangle-Packing by the Sequence Pair," IEEE Trans. on CAD 15(12), pp. 1518-1524, 1996.
[13] H. Murata and E. S. Kuh, “Sequence-Pair Based
Placement Methods for Hard/Soft/Pre-placed Modules”,
ISPD 1998, pp. 167-172.
[14] S.Nakatake, K.Fujiyoshi, H.Murata,and Y.Kajitani,
“Module placement on BSG structure and IC Layout
Applications,” Proc.ICCAD,pp.484-491,1998.
[15] K. E. Parsopoulos and M. N. Vrahatis, "Recent Approaches to Global Optimization Problems through Particle Swarm Optimization," Natural Computing, 2002, 1(2-3):235-306.
[16] X. Tang, R. Tian and D. F. Wong, “Fast Evaluation
of Sequence Pair in Block Placement by Longest
Common Subsequence Computation,” DATE 2000, pp.
[17] X. Tang and D. F.Wong, “FAST-SP: A Fast
Algorithm for Block Placement Based on Sequence
Pair,” ASPDAC 2001, pp. 521-526.
[18] D. F. Wong and C. L. Liu, "A New Algorithm for Floorplan Design," DAC 1986, pp. 101-107.
[19] B. Yao et al., “Floorplan Representations:
Complexity and Connections,” ACM Trans. on Design
Autom. of Electronic Systems 8(1), pp. 55–80, 2003.

Development Of An EDA Tool For Configuration Management Of

FPGA Designs
Anju M I 1, F. Agi Lydia Prizzi 2, K.T. Oommen Tharakan3
PG Scholar, School of Electrical Sciences,
Karunya University, Karunya Nagar, Coimbatore - 641 114
Lecturer, School of Electrical Sciences,
karunya University, Karunya Nagar, Coimbatore - 641114
Manager-IED, Avionics, VSSC, ISRO P.O. Thiruvanathapuram

Abstract - To develop an EDA tool for configuration management of various FPGA designs. As FPGA designs evolve with respect to additional functionality, design races etc., it has become very important to use only the right design for the application. In this project we propose to solve the problem for the case of VHDL. The FPGA VHDL codes will be coded for the various constructs, the number of pins used, the pin checksum, the fuse checksum, the manufacturer, the design, the device part number and the device resources used. This will help in fusing the right VHDL file to the FPGA.

I. INTRODUCTION

a) EDA tools:

Electronic design automation (EDA) is the category of tools for designing and producing electronic systems ranging from printed circuit boards (PCBs) to integrated circuits. It is sometimes referred to as ECAD (electronic computer-aided design) or just CAD; this usage probably originates in the IEEE Design Automation Technical Committee. EDA for electronics has rapidly increased in importance with the continuous scaling of semiconductor technology. EDA tools are used for programming design functionality into FPGAs.

b) Configuration Management:

Configuration Management (CM) is a documentation system for tracking the work. Configuration management involves the collection and maintenance of data concerning the hardware and software of computer systems that are being used. CM embodies a set of techniques used to help define, communicate and control the evolution of a product or system through its concept, development, implementation and maintenance phases. It is also a set of systematic controls to keep information up to date and accurate. A collection of hardware, software, and/or firmware which satisfies an end-use function may be designated for configuration management.

Configuration management is a discipline applying technical and administrative direction and surveillance to identify and document the functional and physical characteristics of a configuration item, control changes to those characteristics, record and report change processing and implementation status, and verify compliance with specified requirements (IEEE STD 610.12, 1990).

The IEEE's definition has three keywords: technical, administrative and surveillance. This definition fits the CM concept into the organization. CM not only helps the technical staff to track their work, but also helps the administrator to create a clear view of the target, the problem and the current status. Furthermore, CM supplies an assessment framework to track the whole progress.

c) Checksum:

It is important to be able to verify the correctness of files that are moved between different computing systems. This is traditionally handled by computing a number which depends in some clever way on all of the characters in the file, and which will change, with high probability, if any character in the file is changed. Such a number is called a checksum.

This paper presents Development of an EDA Tool for the Configuration Management of FPGA Designs, a tool which converts VHDL code to a unique code. The tool is implemented in the C language by assigning values to some of the constructs present in the HDL. It is very important to use only the right design for the application; this tool will help in fusing the right VHDL file to the FPGA.

The conversion of a VHDL file to a unique code is done by assigning values to some of the HDL constructs. The code or number thus obtained will be unique. For developing this tool we consider not only the HDL constructs (IEEE 1164 logic) but also the file name, fuse checksum and pin checksum.
II. BLOCK DIAGRAM

III. OVERVIEW

For a totally hardware-oriented design (e.g. FPGAs) the development time is prohibitive in bringing fresh and affordable products to the market. Equally restrictive is a totally software-based solution, which will perform slowly due to the use of generalised computing. This is where designing for a hybrid between a hardware- and software-based implementation can be of particular advantage.

This paper helps in developing an EDA tool for configuration management of FPGA designs. The tool will help in selecting the right file for the application. A program can be written in many ways (data flow, structural, behavioural and mixed level modelling); whatever the modelling, the right application should be downloaded to the FPGA kit. Otherwise it leads to wastage of time, money and energy.

For developing this tool, some of the constructs present in VHDL are considered, and weights are assigned to the constructs which are considered. Here we have taken a .vhd file (after writing a VHDL program, we save the file with the extension .vhd) as the input and a unique code or number as the output. With the help of the C language the .vhd file is converted to a unique code or number. If a file is converted into a number and saved, it will be helpful while coding the modules of a big project. Consider an example: suppose we want to code a big project, and one of its modules contains a CPU or a RAM. In such a situation the programmer has to code the CPU or RAM, depending on the need. The programmer will directly copy and use that code if it is available, but to make it error free he has to check whether the code works, which is time consuming. If it is coded and saved as a number, it is easier for the programmer to call that particular program for his project; he can just call that code for downloading to the FPGA kit.

In this paper we have considered five different VHDL programs (.vhd files). These files are converted into unique codes, as shown in the output.

EDA TOOL DEVELOPMENT: The following steps are taken into consideration.
1. The various constructs used in VHDL (quantity and location of the constructs).
2. Develop a unique number for the above, giving weights to the various constructs.
3. This shall be mathematically or logically operated with the respective FPGA manufacturer, e.g. Actel.
4. Further, it shall again be operated with respect to the actual Actel device number.

There are a total of 97 constructs in HDL (IEEE 1164 logic). For some of these constructs we have assigned different values. The assigned values may be decimal, hexadecimal or octal. The algorithm used is given below.

IV. ALGORITHM

Step 1: begin
Step 2: read .vhd file
Step 3: assign values for constructs
Step 4: weighted file name * [weighted construct * position of the construct] = a number // for a single construct //
Step 5: total no. of similar constructs + step 4 = a number
Step 6: repeat step 4 and step 5 // for all constructs //
Step 7: product no. * step 6 = a number
Step 8: product version no. + step 7 = a number
Step 9: [fuse checksum + pin checksum] + step 8 = a number

4. a) STEP DETAILS

INPUT: .vhd file
Step 3: file name ==> assign a value
    For e.g.: wt assigned for the file = 1
Step 4: wtd file name * [wtd construct * line no.] = a number or a code (this is for a single construct)
    For e.g.: case statement
    wt assigned for case ==> 8
    case is in line no. 30
    then, 1 * [8 * 30] = 240 (this is for a single construct)
Step 5: add similar constructs
    For e.g.: total no. of case statements = 90
    then, add the single construct value and the total no.
    i.e.: 240 + 90 = 330
Step 6: repeat steps 4 and 5 for all constructs
    For e.g.: 'if' statement
    wt assigned for 'if' = 10
    suppose 'if' is in line no. 45
    then, 1 * [10 * 45] = 450
    total no. of 'if' statements = 15
    construct value + total no. = 450 + 15 = 465
    so step 6 = 330 + 465 = 795
Step 7: 795 * product no.
    For e.g.: product no. = 77
    then, 795 * 77 = 61215
Step 8: 61215 + version no.
    For e.g.: version no. = 3
    61215 + 3 = 61218
Step 9: pin checksum + fuse checksum + 61218
OUTPUT: a code or a number

V. OUTPUTS

1) 16-bit adder
2) ALU
3) 8-bit counter
4) Fibonacci series
5) 32-bit memory module

VI. CONCLUSION

This paper presents the various steps required for the development of an EDA tool for configuration management of FPGA designs, in the form of an algorithm. The tool has been implemented in C, and the experimental results for the five .vhd files are shown above.



REFERENCES

[1] W. Ecker, M. Heuchling, J. Mades, C. Schneider, T. Schneider, A. Windisch, Ke Yang, and Zarabaldi, "HDL2HYPER - a highly flexible hypertext generator for VHDL models," IEEE, Oct. 1999, pp. 57-
[2] "The Verilog to HTML converter".
[4] "Method and program product for protecting information in EDA tool design views".
[5] "Intusoft makes HDL development model easy".
[6] Arie Komarnitzky, Nadav Ben-Ezer, and Eugene Lyubinsky, AAI (Avnet ASIC Israel Ltd.), Tel Mond, Israel, "Unique Approach to Verification of Complex SoC Designs".
[7] Matthew F. Parkinson, Paul M. Taylor, and Sri Parameswaran, "C to VHDL Converter in a Codesign Environment," VHDL International Users Forum Spring Conference, 1994.


A BIST for Low Power Dissipation

Mr. Rohit Lorenzo, PG Scholar & Mr. Amir Anton Jone, M.E., Lecturer, ECE Dept., Karunya University, Coimbatore

Abstract - In this paper we propose a new scheme for built-in self-test (BIST). We propose different architectures that reduce power dissipation; the architectures are designed with techniques that reduce the power dissipation. The BIST with the different techniques decreases the transitions that occur at scan inputs during scan shift operations and hence reduces power dissipation in the CUT. We compare the different BIST architectures. We fix the values at the inputs of the BIST architecture, and at the output we restructure the scan chain to get optimized results. Experimental results of the proposed technique show that the power dissipation is reduced significantly compared to existing methods.

I. INTRODUCTION

Circuit power dissipation in test mode is much higher than the power dissipation in functional mode [21]. High power consumption in BIST mode is especially a serious concern because of at-speed testing. Low power BIST techniques are gaining attention in recent publications [11]. The first advantage of low power BIST is avoiding the risk of damaging the circuits under test (CUT). Low power BIST techniques also save the cost of expensive packages or external cooling devices for testing. Power dissipation in BIST mode is made up of three major components: the combinational logic power, the sequential circuit power, and the clock power. In the clock power reduction category, disabling or gating the clock of scan chains has been proposed [2]. By modifying the clock tree design, these techniques effectively reduce the clock power consumption, which is shown to be a significant component of the test power [23]. However, clock trees are sensitive to changes of timing; even small modifications can sometimes cause serious failure of the whole chip. Modifying the clocks therefore not only increases the risk of skew problems but also imposes constraints on test pattern generation. The low transition random test pattern generator (LT-RTPG) has been proposed to reduce the number of toggles of the scan input patterns. In the 3-weight weighted random technique we fix transitions at the input, and in this way we reduce power in the 3-weight WRBIST. Switching activity in a circuit can be significantly higher during BIST than during its normal operation. Finite-state machines are often implemented in such a manner that vectors representing successive states are highly correlated, to reduce power dissipation [16]. Use of scan allows applying patterns that cannot appear during normal operation to the state inputs of the CUT during test application. Furthermore, the values applied at the state inputs of the CUT during scan shift operations represent shifted values of test vectors and circuit responses and have no particular temporal correlation. Excessive switching activity due to low correlation between consecutive test patterns can cause several problems [14]. Since heat dissipation in a CMOS circuit is proportional to switching activity, a CUT can be permanently damaged due to excessive heat dissipation if switching activity in the circuit during test application is much higher than during its normal operation. Heat dissipated during test application is already influencing the design of test methodologies for practical circuits [14].

The BIST TPG proposed in this paper reduces switching activity in the CUT during BIST by reducing the number of transitions at the scan input during scan shift cycles (Fig. 1). If a scan input is assigned a value at one scan shift cycle and the opposite value at the next, a transition occurs at that scan input. The transition that occurs at a scan input can propagate into internal circuit lines, causing more transitions. During scan shift cycles, the response to the previous scan test pattern is also scanned out of the scan chain; hence, transitions at scan inputs can be caused by both test patterns and responses. Since it is very difficult for a random pattern generator to generate test patterns that cause a minimal number of transitions while they are scanned into the scan chain and whose responses also cause a minimal number of transitions while they are scanned out, we focus on minimizing the number of transitions caused only by the test patterns that are scanned in. Even so, our extensive experiments show that the proposed TPG can still reduce switching activity significantly during BIST, since circuit responses typically have higher correlation among neighbouring scan outputs than test patterns and therefore cause fewer transitions while being scanned out. A transition at the input of the scan chain at a scan shift cycle, caused by scanning in a value opposite to the value scanned in at the previous cycle, continuously causes transitions at scan inputs while the value travels through the scan chain during the following scan shift cycles. As an example, consider scanning a scan test pattern 01100 into a scan chain that has five scan flip-flops. Since a 0 is scanned into the scan chain first, the 1 that is scanned in next causes a transition at the input of the scan chain and continuously causes transitions at the scan flip-flops it passes through until it arrives at its final destination.
In contrast, the 1 that is scanned into the scan chain at the next cycle causes no transition at the input of the scan chain and arrives at its final destination without causing any transition at the scan flip-flops it passes through [14]. This shows that transitions that occur in the entire scan chain can be reduced by reducing transitions at the input of the scan chain. Since transitions at scan inputs propagate into internal circuit lines causing more transitions, reducing transitions at the input of the scan chain can eventually reduce switching activity in the entire circuit.

Fig. 1. Transitions at scan chain input

III. ARCHITECTURE OF 3WT-WRBIST

Fig. 2. Generator: (a) with toggle flip-flops TF0 and TF1 and (b) without toggle flip-flops.

Fig. 3. 3-weight WRBIST

Fig. 2 shows a set of generators and Fig. 3 shows an implementation of the 3-weight WRBIST for the generators shown. The shift counter is an (m+1) modulo counter, where m is the number of scan elements in the scan chain (since the generators are 9 bits wide). When the content of the shift counter is k, where k = 0, 1, …, 8, a value for input pk is scanned into the scan chain. The generator counter selects the appropriate generator; when the content of the generator counter changes, test patterns are generated by using the corresponding generator. Pseudo-random pattern sequences generated by an LFSR are modified (fixed) by controlling the AND and OR gates with overriding signals s0 and s1. Fixing a random value to 0 is achieved by setting s0 to 1 and s1 to 1. Overriding of signals s0 and s1 is driven by T flip-flops TF0 and TF1. The inputs of TF0 and TF1 are driven by D0 and D1 respectively, which are generated from the outputs of the shift counter and the generator counter. The shift counter is required by all scan-based BIST techniques and is not particular to the proposed 3-weight WRBIST scheme. All BIST controllers need a pattern counter that counts the number of test patterns applied. The generator counter can be implemented with log G flip-flops, where G is the number of generators, so no additional hardware is required; hardware overhead for implementing a 3-weight WRBIST is incurred only by the decoding logic and the fixing logic, which includes two toggle flip-flops (T flip-flops), an AND gate and an OR gate. Since the fixing logic can be implemented with very little hardware, the overall hardware overhead for implementing the serial fixing 3-weight WRBIST is determined by the hardware overhead for the decoding logic. In cycles when a scan value of pk is scanned in, both D0 and D1 are set to 0; hence the T flip-flops hold their previous state. Assume also that T flip-flop TF0 is initialized to 1 and TF1 to 0. Flip-flops are placed in the scan chain in descending order of their subscript number; hence the value of p0 is scanned first and p8 is scanned last. Random patterns generated by the LFSR can also be fixed by controlling the AND/OR gates directly by the decoding logic, without the two T flip-flops; however, this scheme incurs larger hardware overhead for the decoding logic and also more transitions in the circuit under test (CUT) during BIST than the scheme with T flip-flops. The TF0, TF1, D0 and D1 values for the scheme implemented with T flip-flops are shown in the figure.

The LT-RTPG reduces switching activity during BIST by reducing transitions at scan inputs during scan shift operations. An example LT-RTPG is shown in Fig. 4. The LT-RTPG is comprised of an LFSR, an AND gate, and a toggle flip-flop (T flip-flop). Hence, it can be implemented with very little hardware. Each input of the AND gate is connected to either a normal or an inverting output of an LFSR stage. If a large number of AND-gate inputs is used, large sets of neighbouring state inputs will be assigned identical values in most test patterns, resulting in decreased fault coverage or increased test sequence length. Hence, like [15], in this paper LT-RTPGs with only two or three AND-gate inputs are used. Since a T flip-flop holds its previous value until its input is assigned a 1, the same value is repeatedly scanned into the scan chain until the value at the output of the AND gate becomes 1. Hence, adjacent scan flip-flops are assigned identical values in most test patterns and scan inputs have fewer transitions
during scan shift operations. Since most switching activity during scan BIST occurs during scan shift operations (a capture cycle occurs only once every several cycles), the LT-RTPG can reduce heat dissipation during overall scan testing. Various properties of the LT-RTPG have been studied and a detailed methodology for its design has been presented. It has been observed that many faults that escape random patterns are highly correlated with each other and can be detected by continuously complementing the values of a few inputs of a parent test vector. This observation is exploited in [22] to improve fault coverage for circuits that have large numbers of RPRFs. We have also observed that tests for faults that escape LT-RTPG test sequences share many common input assignments. This implies that RPRFs that escape LT-RTPG test sequences can be effectively detected by fixing selected inputs to the binary values specified in deterministic test cubes for these RPRFs and applying random patterns to the rest of the inputs. This technique is used in the 3-weight WRBIST to achieve high fault coverage for random pattern resistant circuits. In this paper we demonstrate that augmenting the LT-RTPG with the serial fixing 3-weight WRBIST proposed in [15] can attain high fault coverage without excessive switching activity or large area overhead, even for circuits that have large numbers of RPRFs.

Fig. 4. LT-RTPG

V. CONCLUSION

This paper presents a low hardware overhead TPG for scan-based BIST that can reduce switching activity in CUTs during BIST. The main objective of most recent BIST techniques has been the design of TPGs that achieve low power dissipation. Since the correlation between consecutive patterns applied to a circuit during BIST is significantly lower, switching activity in the circuit can be significantly higher during BIST than during its normal operation.

REFERENCES

[1] Z. Barzilai, D. Coppersmith, and A. L. Rosenberg, "Exhaustive generation of bit patterns with applications to VLSI self-testing," IEEE Trans. Comput., vol. C-32, no. 2, pp. 190-194, Feb. 1983.
[2] L. T. Wang and E. J. McCluskey, "Circuits for pseudo-exhaustive test pattern generation," in Proc. IEEE Int. Test Conf., 1986, pp. 25-37.
[3] W. Daehn and J. Mucha, "Hardware test pattern generators for built-in test," in Proc. IEEE Int. Test Conf., 1981, pp. 110-113.
[4] S. Hellebrand, S. Tarnick, and J. Rajski, "Generation of vector patterns through reseeding of multiple-polynomial linear feedback shift registers," in Proc. IEEE Int. Test Conf., 1992, pp. 120-129.
[5] N. A. Touba and E. J. McCluskey, "Altering a pseudo-random bit sequence for scan-based BIST," in Proc. IEEE Int. Test Conf., 1996.
[6] M. Chatterjee and D. K. Pradhan, "A new pattern biasing technique for BIST," in Proc. VLSITS, 1995.
[7] N. Tamarapalli and J. Rajski, "Constructive multi-phase test point insertion for scan-based BIST," in Proc. IEEE Int. Test Conf., 1996, pp. 649-658.
[8] Y. Savaria, B. Lague, and B. Kaminska, "A pragmatic approach to the design of self-testing circuits," in Proc. IEEE Int. Test Conf., 1989, pp. 745-754.
[9] J. Hartmann and G. Kemnitz, "How to do weighted random testing for BIST," in Proc. IEEE Int. Conf. Comput.-Aided Design, 1993, pp. 568-571.
[10] J. Waicukauski, E. Lindbloom, E. Eichelberger, and O. Forlenza, "A method for generating weighted random test patterns," IEEE Trans. Comput., vol. 33, no. 2, pp. 149-161, Mar. 1989.
[11] H.-C. Tsai, K.-T. Cheng, C.-J. Lin, and S. Bhawmik, "Efficient test point selection for scan-based BIST," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 6, no. 4, pp. 667-676, Dec. 1998.
[12] W. Li, C. Yu, S. M. Reddy, and I. Pomeranz, "A scan BIST generation method using a Markov source and partial BIST bit-fixing," in Proc. IEEE-ACM Design Autom. Conf., 2003, pp. 554-559.
[13] N. Z. Basturkmen, S. M. Reddy, and I. Pomeranz, "Pseudo random patterns using Markov sources for scan BIST," in Proc. IEEE Int. Test Conf., 2002, pp. 1013-1021.
[14] S. B. Akers, C. Joseph, and B. Krishnamurthy, "On the role of independent fault sets in the generation of minimal test sets," in Proc. IEEE Int. Test Conf., 1987, pp. 1100-1107.
[15] S. W. Golomb, Shift Register Sequences. Laguna Hills, CA: Aegean Park, 1982.
[16] C.-Y. Tsui, M. Pedram, C.-A. Chen, and A. M. Despain, "Low power state assignment targeting two- and multi-level logic implementation," in Proc. IEEE Int. Conf. Comput.-Aided Des., 1994, pp. 82-87.
[17] P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch, "A test vector inhibiting technique

for low energy BIST design," in Proc. VLSI Test. Symp., 1999, pp. 407-412.
[18] J. A. Waicukauski, E. B. Eichelberger, D. O. Forlenza, E. Lindbloom, and T. McCarthy, "Fault simulation for structured VLSI," VLSI Syst. Design, pp. 20-32, Dec. 1985.
[19] R. M. Chou, K. K. Saluja, and V. D. Agrawal, "Scheduling tests for VLSI systems under power constraints," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 5, no. 2, pp. 175-185, Jun. 1997.
[20] T. Schuele and A. P. Stroele, "Test scheduling for minimal energy consumption under power constraints," in Proc. VLSI Test. Symp., 2001, pp. 312-318.
[21] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective, 2nd ed. Reading, MA: Addison-Wesley, 1992.
[22] B. Pouya and A. L. Crouch, "Optimization trade-offs for vector volume and test power," in Proc. Int. Test Conf., 2000, pp. 873-881.
[23] Y. Bonhomme, P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch, "A gated clock scheme for low power scan testing of logic ICs or embedded cores," in Proc. 10th Asian Test Symp., 2001, pp. 253-258.
[24] Y. Zorian, "A distributed BIST control scheme for complex VLSI design," in Proc. 11th IEEE VLSI Test Symp., 1993, pp. 4
[25] P. Girard, "Survey of low-power testing of VLSI circuits," IEEE Design and Test of Computers, May-June 2002, pp. 82-92.


Test pattern generation for power reduction using BIST architecture

Anu Merya Philip, II M.E. (VLSI)
D. S. Shylu, M.Tech, Sr. Lecturer, ECE Dept.,
Karunya University, Coimbatore, Tamil Nadu

Abstract--Advances in built-in self-test (BIST) techniques have enabled IC testing using a combination of external automated test equipment and a BIST controller on the chip. A new low power test pattern generator using a linear feedback shift register (LFSR), called LP-TPG, is presented to reduce the average and peak power of a circuit during test. The correlation between the test patterns generated by LP-TPG is higher than for a conventional LFSR. LP-TPG inserts intermediate patterns between the random patterns. The goal of the intermediate patterns is to reduce the transition activity of the primary inputs, which eventually reduces the switching activity inside the circuit under test and, hence, the power consumption. The random nature of the test patterns is kept intact.

Keywords—LP-LFSR, R-injection, test patterns

I. INTRODUCTION

The linear feedback shift register (LFSR) is commonly used as a test pattern generator in low overhead built-in self-test (BIST). This is due to the fact that an LFSR can be built with little area overhead and used not only as a TPG, which attains high fault coverage for a large class of circuits, but also as an output response analyzer. An LFSR TPG requires an unacceptably long test sequence to attain high fault coverage for circuits that have a large number of random pattern resistant faults. The main objective of most recent BIST techniques has been the design of TPGs that achieve high fault coverage at acceptable test lengths. Another objective is to reduce the heat dissipation during test application.

A significant correlation exists between consecutive vectors applied to a circuit during its normal operation. This fact has motivated several architectural concepts, such as cache memories, and also holds for high speed circuits that process digital audio and video signals. In contrast, the consecutive vectors of a sequence generated by an LFSR are proven to have low correlation. Since the correlation between consecutive test vectors applied to a circuit during BIST is significantly lower, the switching activity in the circuit can be significantly higher during BIST than during its normal operation.

Excessive switching activity during test can cause several problems. Firstly, since heat dissipation in a CMOS circuit is proportional to switching activity, a circuit under test (CUT) can be permanently damaged due to excessive heat dissipation if the switching activity in the circuit during test application is much higher than during its normal operation. The seriousness of excessive heat dissipation during test application is worsened by trends such as circuit miniaturization for portability and high performance. These objectives are typically achieved by using circuit designs that decrease power dissipation and by reducing the package size to aggressively match the average heat dissipation during the circuit's normal operation. In order to ensure non-destructive testing of such a circuit, it is necessary either to apply test vectors which cause a switching activity comparable to that during normal circuit operation or to remove any excessive heat generated during test using special cooling equipment. The use of special cooling equipment to remove excessive heat dissipated during test application becomes increasingly difficult and costly as tests are applied at higher levels of circuit integration, such as BIST at board and system levels. Elevated temperature and current density caused by excessive switching activity during test application will severely decrease the reliability of circuits under test due to metal migration or electro-migration.

In the past, tests were typically applied at rates much lower than a circuit's normal clock rate. Circuits are now tested at higher clock rates, possibly at the circuit's normal clock rate (at-speed testing). Consequently, heat dissipation during test application is on the rise and is fast becoming a problem. A new low power test pattern generator using a linear feedback shift register, called LP-TPG, is presented to reduce the power consumption of a circuit during test. The original patterns are generated by an LFSR, and the proposed technique generates and inserts intermediate patterns between each pair of patterns to reduce the primary inputs' (PIs') activity.

II. LOW POWER TEST PATTERN GENERATION

The basic idea behind low power BIST is to reduce the PI activity. Here we propose a new test pattern generation technique which generates three intermediate test patterns between every two consecutive random patterns generated by a conventional LFSR. The proposed method does not decrease the random nature of the test patterns. This technique reduces the PI activity and eventually the switching activity in the CUT.

Assume that Ti and Ti+1 are two consecutive test patterns generated by a pseudorandom pattern generator. Suppose the two vectors are Ti = {t1i, t2i, …, tni} and Ti+1 = {t1i+1, t2i+1, …, tni+1}, where n is the number of bits in the test patterns, which is equal to the number of PIs in the circuit under test.

Assume that Tk1, Tk2, and Tk3 are the intermediate patterns between Ti and Ti+1. Tk2 is generated as

Tk2 = {t1i, …, tn/2i, tn/2+1i+1, …, tni+1}

Tk2 is generated using one half of each of the two random patterns Ti and Ti+1. Tk2 is also a random pattern because it is generated using two random patterns. The other two patterns are generated using Tk2: Tk1 is generated between Ti and Tk2, and Tk3 is generated between Tk2 and Ti+1. Tk1 is obtained by

tjk1 = { tji, if tji = tjk2 ; R, if tji ≠ tjk2 }
where j {1,2,…,n} and R is a random bit. This method of The first half of LFSR is active and second half is in idle
generating Tk1 and Tk3 is called R-injection. If two mode. Selecting sel1sel2=11, both halves of LFSR are sent to
corresponding bits in Ti and Ti+1 are the same, the same bit is the outputs O1 to On. Here Ti is generated.
positioned in the corresponding bit of Tk1, otherwise a random Step 2: en1en2=00, sel1sel2=10
bit (R ) is positioned. R can come from the output of the Both halves of LFSR are in idle mode. The first half of the
random generator. In this method, the sum of the PI’s activities LFSR is sent to the outputs O1 to On/2, but the injector circuit
between Ti and Tk1 (Ntransi.,k1), Tk1 and Tk2 (Ntransk1,k2), Tk2 and outputs are sent to the outputs On/2+1 to On. Tk1 is generated.
Tk3 (Ntransk2,k3) and Tk3 and Ti+1 (Ntransk3,i+1) are equal to the Step 3: en1en2=01, sel1sel2=11
activities between Ti and Ti+1 (Ntransi,i+1). The second half of the LFSR is active and the first half is
Ntrans(i,k1) + Ntrans(k1,k2) + Ntrans(k2,k3) + Ntrans(k3,i+1) = Ntrans(i,i+1)

III. LP-TPG

The proposed technique is built into the LFSR architecture to create the LP-TPG. Figure 1 shows the LP-TPG with the added circuitry that generates the intermediate test patterns.

Fig 1. Proposed LP-TPG

The LFSR used in the LP-TPG is an external-XOR LFSR. The R-injection circuit taps the present state (the Ti pattern) and the next state (the Ti+1 pattern) of the LFSR. The R-injection circuit includes one AND gate, one OR gate and one 2x1 mux. When the bits tji and tji+1 are equal, both the AND and OR gates generate the same bit and, regardless of R, that bit is transferred to the MUX output. When they are not equal, a random bit R is sent to the output.
The LP-TPG is activated by two non-overlapping enable signals, en1 and en2; each enable signal activates one half of the LFSR. When en1en2=10, the first half of the LFSR is active and the second half is in idle mode. When en1en2=01, the first half is in idle mode and the second half is active. The middle flip-flop, placed between the n/2-th and (n/2+1)-th flip-flops, stores the n/2-th bit of the LFSR when en1en2=10, and that bit is used by the second half when en1en2=01. A small finite state machine (FSM) controls the pattern generation process.
Step 1: en1en2=10, sel1sel2=11

in idle mode. Both halves are transferred to the outputs O1 to On and Tk2 is generated.
Step 4: en1en2=00, sel1sel2=01
Both halves of the LFSR are in idle mode. From the first half, the injector outputs are sent to the outputs O1 to On/2, and the second half sends the exact bits in the LFSR to the outputs On/2+1 to On. Thus Tk3 is generated.
Step 5:
The process continues by going through step 1 to generate the next pattern.
The LP-TPG with the R-injection circuit keeps the random nature of the test patterns intact. The FSM controls the pattern generation throughout the steps, and it is independent of the LFSR size and polynomial. Clk and test_en are the inputs of the FSM. When test_en=1, the FSM starts with step 1 by setting en1en2=10 and sel1sel2=11, and it continues the process by going through steps 1 to 4. One pattern is generated in each clock cycle. The size of the FSM is very small and fixed, and the FSM can be part of the BIST controller used in the circuit to control the test session.
Figure 2 shows an example of 8-bit test pattern generation between T1 and T2, assuming R=0.

Pattern 1: 1 0 1 0 0 0 0 1
Intermediate patterns:
1 0 1 0 0 1 0 1
0 0 0 0 0 1 0 1
0 1 0 1 0 1 0 1

Fig 2. 8-bit pattern generation
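The half-fixing and injection scheme above can be sketched in a few lines of Python. Which half receives the injected bits in Tk1 and Tk3 is inferred from the fact that the two halves of Tk2 match T1 and T2, so this is an illustrative model rather than the paper's RTL:

```python
def inject(a, b, r):
    # R-injection: keep the bit where the two patterns agree,
    # emit the random bit R where they differ.
    return [x if x == y else r for x, y in zip(a, b)]

def intermediate_patterns(t1, t2, r=0):
    # Build the three intermediate patterns between consecutive
    # LFSR patterns t1 and t2 (assumed construction, see lead-in).
    h = len(t1) // 2
    tk1 = t1[:h] + inject(t1[h:], t2[h:], r)   # inject in second half
    tk2 = t1[:h] + t2[h:]                      # halves taken from T1 and T2
    tk3 = inject(t1[:h], t2[:h], r) + t2[h:]   # inject in first half
    return tk1, tk2, tk3

def ntrans(a, b):
    # Number of bit transitions between two consecutive patterns.
    return sum(x != y for x, y in zip(a, b))
```

Whatever value R takes, the four hops T1 -> Tk1 -> Tk2 -> Tk3 -> T2 split the transitions of T1 -> T2 without adding any, which is exactly the Ntrans identity quoted above.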

The example shows an LP-TPG using an 8-bit LFSR with the polynomial x^8 + x + 1 and seed = 01001011. Two consecutive patterns T1 and T2 and three intermediate patterns are generated. The first and second halves of Tk2 are equal to T1 and T2, respectively. Tk1 and Tk3 are generated using R-injection (R=0 is injected in the corresponding bits of Tk1 and Tk3). Ntrans(1,2)=7, Ntrans(1,k1)=2, Ntrans(k1,k2)=1, Ntrans(k2,k3)=2, Ntrans(k3,2)=2. This reduction of the activity at the primary inputs reduces the switching activity inside the circuit and eventually the power consumption. Having three intermediate patterns between each pair of consecutive patterns may seem to prolong the test session by a factor of 3; however, empirically many of the intermediate patterns do as well as conventional LFSR patterns in terms of fault detection.

Fig 3. Block diagram of 8-bit LP-TPG.

Figure 3 shows the block diagram of an LP-TPG using an 8-bit LFSR.

V. POWER ANALYSIS

The power comparison between a conventional LFSR and the low power LP-TPG is performed. The power reports below were obtained during simulation.

A. Power report of conventional LFSR

Release 6.3i - XPower SoftwareVersion: G.35
Copyright (c) 1995-2004 Xilinx, Inc. All rights reserved.
Design: lfsr4_9
Preferences: lfsr4_9.pcf
Part: 2s15cs144-6
Data version: PRELIMINARY,v1.0,07-31-02

Power summary:                 I(mA)  P(mW)
-------------------------------------------
Total estimated
power consumption:                      9
---
Vccint 2.50V:                    1      3
Vcco33 3.30V:                    2      7
---
Clocks:                          0      0
Inputs:                          0      0
Logic:                           0      0
Outputs:
  Vcco33:                        0      0
Signals:                         0      0
---
Quiescent
Vccint 2.50V:                    1      3
Vcco33 3.30V:                    2      7

Thermal summary:
Junction temperature: 25C
Ambient temp: 25C
Case temp: 25C
Theta J-A: 34C/W

Network Summary: Cap Range (uF) : #
Capacitor Recommendations:
Total for Vccint : 8
  470.0 - 1000.0  : 1
  0.0470 - 0.2200 : 1
  0.0100 - 0.0470 : 2
  0.0010 - 0.0047 : 4
Total for Vcco33 : 1
  470.0 - 1000.0  : 1

Analysis completed: Fri Jan 25 11:01:38 2008

The power report shows that the conventional LFSR exhibits a total power consumption of 9 mW.

B. Power report of low power LP-TPG

Release 6.3i - XPower SoftwareVersion: G.35
Copyright (c) 1995-2004 Xilinx, Inc. All rights reserved.
Design: lp_lfsr
Preferences: lp_lfsr.pcf
Part: 2s15cs144-6
Data version: PRELIMINARY,v1.0,07-31-02

Power summary:                 I(mA)  P(mW)
-------------------------------------------
Total estimated
power consumption:                      7
---
Vccint 2.50V:                    0      0

Vcco33 3.30V:                    2      7
---
Clocks:                          0      0
Inputs:                          0      0
Logic:                           0      0
Outputs:
  Vcco33:                        0      0
Signals:                         0      0
---
Quiescent
Vcco33 3.30V:                    2      7

Thermal summary:
Junction temperature: 25C
Ambient temp: 25C
Case temp: 25C
Theta J-A: 34C/W

Network Summary: Cap Range (uF) : #
Capacitor Recommendations:
Total for Vccint : 8
  470.0 - 1000.0  : 1
  0.0470 - 0.2200 : 1
  0.0100 - 0.0470 : 2
  0.0010 - 0.0047 : 4
Total for Vcco33 : 3
  470.0 - 1000.0  : 1
  0.0010 - 0.0047 : 2

Analysis completed: Fri Jan 25 11:00:00 2008

The power report of the low power LP-TPG shows a total power consumption of 7 mW, a considerable reduction compared with a conventional LFSR. The LP-LFSR was simulated using Xilinx software: the conventional LFSR consumed a total power of 9 mW, whereas the LP-TPG has a much reduced power of 7 mW. The output waveform is shown in figure 4.

Fig 4. Waveform of LP-LFSR

VII. PAPER OUTLINE

The proposed technique reduces the correlation between the test patterns. Original patterns are generated by an LFSR, and the proposed technique generates and inserts intermediate patterns between each pair of patterns to reduce the activity at the primary inputs (PIs), which reduces the switching activity inside the CUT and hence the power consumption. Adding test patterns does not prolong the overall test length, so the application time stays the same. The technique of R-injection is embedded into a conventional LFSR to create the LP-TPG.

REFERENCES
[1] Y. Zorian, "A Distributed BIST Control Scheme for Complex VLSI Devices," in Proc. VLSI Test Symp. (VTS'93), pp. 4-9, 1993.
[2] S. Wang and S. Gupta, "DS-LFSR: A New BIST TPG for Low Heat Dissipation," in Proc. Int. Test Conf. (ITC'97), pp. 848-857, 1997.
[3] F. Corno, M. Rebaudengo, M. Reorda, G. Squillero and M. Violante, "Low Power BIST via Non-Linear Hybrid Cellular Automata," in Proc. VLSI Test Symp. (VTS'00), pp. 29-34, 2000.
[4] P. Girard, L. Guiller, C. Landrault, S. Pravossoudovitch, H.-J. Wunderlich, "A Modified Clock Scheme for a Low Power BIST Test Pattern Generator," in Proc. VLSI Test Symp. (VTS'01), pp. 306-311, 2001.
[5] D. Gizopoulos et al., "Low Power/Energy BIST Scheme for Datapaths," in Proc. VLSI Test Symp. (VTS'00), pp. 23-28, 2000.
[6] X. Zhang, K. Roy and S. Bhawmik, "POWERTEST: A Tool for Energy Conscious Weighted Random Pattern Testing," in Proc. Int. Conf. VLSI Design, pp. 416-422, 1999.
[7] S. Wang and S. Gupta, "LT-RTPG: A New Test-Per-Scan BIST TPG for Low Heat Dissipation," in Proc. Int. Test Conf. (ITC'99), pp. 85-94, 1999.
[8] P. Girard et al., "Low Energy BIST Design: Impact of the LFSR TPG Parameters on the Weighted Switching Activity," in Proc. Int. Symp. on Circuits and Systems (ISCAS'99), 1999.
[9] P. Girard et al., "A Test Vector Inhibiting Technique for Low Energy BIST Design," in Proc. VLSI Test Symp. (VTS'99), pp. 407-412, 1999.
[10] S. Manich et al., "Low Power BIST by Filtering Non-Detecting Vectors," in Proc. European Test Workshop (ETW'99), pp. 165-170, 1999.
[11] F. Corno, M. Rebaudengo, M. Sonza Reorda and M. Violante, "A New BIST Architecture for Low Power Circuits," in Proc. European Test Workshop (ETW'99), pp. 160-164, 1999.
[12] X. Zhang and K. Roy, "Peak Power Reduction in Low Power BIST," in Proc. Int. Symp. on Quality Elect. Design (ISQED'00), pp. 425-432, 2000.
[13] Synopsys Inc., "User Manuals for SYNOPSYS Toolset Version 2002.05," Synopsys, Inc., 2002.
[14] S. Manich and J. Figueras, "Sensitivity of the Worst Case Dynamic Power Estimation on Delay and Filtering Models," in Proc. PATMOS Workshop, 1997.


Test Pattern Generation For Microprocessors

Using Satisfiability Format Automatically and Testing It Using Design
for Testability
Cynthia Hubert, II ME and Grace Jency.J, Lecturer, Karunya University

Abstract—In this paper, a satisfiability-based framework for automatically generating test programs that target gate-level stuck-at faults in microprocessors is demonstrated. The micro architectural description of a processor is translated into RTL for test analysis. Test generation involves the extraction of propagation paths from a module's input/output ports to primary I/O ports. A solver is then used to find the valid paths that justify the precomputed vectors at the primary input ports and propagate the good/faulty responses to the primary output ports. The test program is constructed in a deterministic fashion from the micro architectural description of a processor and targets stuck-at faults. This is done using ModelSim.

Index Terms—microprocessor, satisfiability, test generation, test program.

I. INTRODUCTION

For high-speed devices such as microprocessors, a satisfiability-based register transfer level test generator that automatically generates test programs and detects gate-level stuck-at faults is demonstrated. Test generation at the RTL can be broadly classified into two categories: 1) constraint-based test generation and 2) the precomputed test set-based approach. Constraint-based test generation relies on the fact that a module can be tested by abstracting the RTL environment, in which it is embedded, as constraints. The extracted constraints, with the embedded module, present the gate-level automatic test pattern generation (ATPG) tool with a circuit of significantly lower complexity than the original circuit. Precomputed test set-based approaches first precompute the test sets for different RTL modules and then attempt to determine functional paths through the circuit for symbolically justifying a test set and the corresponding responses. This symbolic analysis is followed by a value analysis phase, when the actual tests are assembled using the symbolic test paths and the module-level precomputed test sets. The use of precomputed test sets enables RTL ATPG to focus its test effort on determining symbolic justification and propagation paths. However, symbolic analysis is effective only when: 1) a clear separation of controller and datapath in RTL circuits is available and 2) design for testability (DFT) support mechanisms, such as a test architecture, are provided to ease the bottlenecks presented by the controller/datapath interface. These issues were addressed by using a functional circuit representation based on assignment decision diagrams (ADDs).

II. RELATED WORK

Generating test sequences that target gate-level stuck-at faults without DFT is applicable to both RTL and mixed gate-level/RTL circuits, and there is no need to assume that the controller and datapath are separable; however, only a limited set of transparency rules and a limited number of faulty responses are used during propagation analysis [1]. An algorithm for generating test patterns that target stuck-at faults at the logic level yields a reduction in test generation time and improved fault coverage, but it cannot handle circuits with multiple clocks in a functional RTL design [2]. A technique for extracting functional information from RTL controller/datapath circuits results in low area, delay and power overheads, high fault coverage and low test generation time, but some faults in the controller are sequentially untestable [3].

III. DESIGN FOR TESTABILITY

Design for test is used here. In fig1, the automatic test program generator supplies the input test vectors to the test access port (TAP). The test access port gives the input sequences to the system under test, which performs the operations and passes the results to the signature analyzer; the analyzer's output is compared with the expected output to tell whether the circuit is faulty or good. Suppose an 8-bit input is taken: it has 256 possible combinations, and when the test vectors are constructed in a pseudo-random fashion each of the 256 combinations is tested, so time is wasted, and sometimes 100% fault coverage is still not reached. To overcome this wastage of time, the test programs are constructed in a deterministic fashion and only precomputed test vectors are applied.

A. How to compute test vectors
This is done by automatic test pattern generation or test program generation. In microprocessors the test vectors are determined manually and put into memory, and then each precomputed test vector is taken automatically from the memory and testing is done.

B. Unsatisfiable test vector
A test vector is unsatisfiable when it is not able to perform the particular function or is not able to achieve 100% fault coverage. A test vector that overcomes this disadvantage is called satisfiable.

Fig.1 Design for Testability
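The apply-compress-compare flow of Fig. 1 can be sketched with a toy system under test. The 3-bit adder, the XOR-fold signature and all names here are illustrative stand-ins, not part of the proposed framework:

```python
def adder(a, b, stuck_at_fault=False):
    # Toy system under test: 3-bit adder; the modeled defect
    # forces bit 0 of the sum to a stuck-at-0 value.
    s = (a + b) & 0b111
    return s & ~1 if stuck_at_fault else s

def signature(responses):
    # Toy signature analyzer: XOR-fold the response stream.
    sig = 0
    for r in responses:
        sig ^= r
    return sig

def run_test(sut, vectors, expected_sig):
    # Apply each precomputed vector via the TAP, compress the
    # responses, and compare against the golden signature.
    return signature([sut(a, b) for a, b in vectors]) == expected_sig

vectors = [(1, 0), (2, 2), (3, 3)]                 # precomputed test vectors
golden = signature([adder(a, b) for a, b in vectors])
good = run_test(adder, vectors, golden)
bad = run_test(lambda a, b: adder(a, b, stuck_at_fault=True), vectors, golden)
```

With these vectors the faulty signature differs from the golden one, so the compare step flags the defect; as the section notes, a poorly chosen (pseudo-random) vector set may leave the fault undetected.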

A satisfiability (SAT)-based framework automatically generates test programs that target gate-level stuck-at faults in microprocessors. In fig2, the micro architectural description of a processor is first translated into a unified register-transfer level (RTL) circuit description, called an assignment decision diagram (ADD), for test analysis. Test generation involves extraction of justification/propagation paths in the unified circuit representation from an embedded module's input–output (I/O) ports to primary I/O ports, abstraction of the RTL modules in the justification/propagation paths, and translation of these paths into Boolean clauses. Since the ADD is derived directly from a micro architectural description, the generated test sequences correspond to a test program. If a given SAT instance is not satisfiable, then the Boolean implications (also known as the unsatisfiable segment) that are responsible for the unsatisfiability are efficiently and accurately identified. We show that adding design for testability (DFT) elements is equivalent to modifying these clauses such that the unsatisfiable segment becomes satisfiable. The proposed approach constructs test programs in a deterministic fashion from the micro architectural description of a processor. We develop a test framework in which test programs are generated automatically for microprocessors to target gate-level stuck-at faults. Test generation is performed on a unified controller/datapath representation (ADD) derived from the micro architectural description of the processor.

Fig.2 Test generation methodology

The RTL modules are captured "as-is" in the ADD. In order to justify/propagate the precomputed test vectors/responses for an embedded module, we first derive all the potential justification/propagation paths from the I/O ports of the embedded module to primary I/O ports. The functionality of the RTL modules in these paths is abstracted by their equivalent I/O propagation rules. The generated paths are translated into Boolean clauses by expressing the functionality of the modules in these paths as Boolean clauses in conjunctive normal form (CNF). The precomputed test vectors/responses are also captured with the help of additional clauses. These clauses are then resolved using a SAT solver, resulting in valid test sequences that are guaranteed to detect the stuck-at faults in the embedded module targeted by the precomputed test vectors. Since the ADD represents the micro architecture of the processor, the test sequences correspond to a test program. RTL test generation also imposes a large number of initial conditions corresponding to the initial state of flip-flops, the precomputed test vectors, and the propagation of faulty responses. These conditions are propagated through the circuit by the Boolean constraint propagation (BCP) engine in SAT solvers before searching through the sequential search space for a valid test sequence. This results in significant pruning of the sequential search space and a reduction in test generation time. The Boolean clauses capturing the test generation problem for some precomputed test vectors/responses are not satisfiable; the Boolean variables in these implications are targeted for DFT measures.
1. The proposed approach constructs test programs in a deterministic fashion from the micro architectural description of a processor that target stuck-at faults.
2. Test generation is performed at the RTL, resulting in very low test generation times compared to a gate-level sequential test generator.
3. The proposed test generation-based DFT solution is both accurate and fast.
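As a toy illustration of the CNF translation and the BCP engine mentioned above (not the paper's actual encoding), the Tseitin clauses of a single AND gate let unit propagation justify a required output value back to the inputs:

```python
def and_gate_cnf(a, b, z):
    # Tseitin clauses for z = a AND b; positive ints are variables,
    # negative ints are their negations.
    return [[-z, a], [-z, b], [z, -a, -b]]

def unit_propagate(clauses, assignment):
    # Boolean constraint propagation: repeatedly assign literals
    # forced by unit clauses; return None on a conflict.
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned, satisfied = [], False
            for lit in clause:
                val = assignment.get(abs(lit))
                if val is None:
                    unassigned.append(lit)
                elif (lit > 0) == val:
                    satisfied = True
            if satisfied:
                continue
            if not unassigned:
                return None            # clause falsified: conflict
            if len(unassigned) == 1:   # unit clause forces a value
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment

# Requiring the AND output z (variable 3) to be 1 forces both inputs to 1.
A, B, Z = 1, 2, 3
result = unit_propagate(and_gate_cnf(A, B, Z), {Z: True})
```

In the same spirit, an instance where propagation reaches a conflict corresponds to an unsatisfiable segment, which is where the DFT modifications described above come in.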
An ADD can be automatically generated from a functional or structural circuit description. In fig3, it consists of four types of nodes: READ, operation, WRITE, and assignment-decision. READ nodes represent the current contents of input ports, storage elements, and constants. WRITE nodes represent output ports and the values held by the storage elements in the next clock cycle. Operation nodes represent various arithmetic and logic operations, and the assignment-decision node implements the functionality of a multiplexer.

Fig.3 Assignment Decision Diagram

Fig4 shows the RTL datapath of a simple microprocessor. pI1 and pI2 represent the primary inputs; R1, R2 and R3 are registers; + and - represent the adder and subtractor; MUL represents the multiplier; CMP represents the comparator; and PO1 represents the primary output.

Fig.4 RTL datapath of simple microprocessor

In fig5, R1, R2 and R3 inside a square box represent READ nodes, and R1, R2 and R3 inside a circle represent WRITE nodes. +1, -1, mul and cmp represent operation nodes; A3, A4, A5, A6 and A7 represent assignment-decision nodes; and M1, M2, L1, L2, L3 are select signals. If M1 is 0, the output of the adder is selected; if M1 is 1, pI1 is selected. If L1 is 0, R1 is selected; if L1 is 1, the output of A4 is selected. If M2 is 0, pI1 is selected; if M2 is 1, the output of the subtractor is selected. If L2 is 0, R2 is selected; if L2 is 1, the output of A5 is selected. If L3 is 0, R3 is selected; if L3 is 1, the output of the adder is selected. mul performs the multiplication of R2 and R3, and cmp compares the values of R2 and R1.

Fig.5 Assignment decision diagram of simple microprocessor datapath
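The select-signal semantics of fig. 5 can be sketched behaviorally. Only the register paths described above are modeled (mul, cmp and pI2 are omitted), and the operand wiring of the adder and subtractor is an assumption made for this fragment:

```python
def adnode(select, val0, val1):
    # Assignment-decision node: behaves as a 2-to-1 multiplexer.
    return val1 if select else val0

def next_state(regs, pI1, M1, M2, L1, L2, L3):
    # One clock cycle: READ nodes supply current register values,
    # WRITE nodes capture the values held in the next cycle.
    add = regs["R2"] + regs["R3"]   # assumed adder operands
    sub = regs["R2"] - regs["R1"]   # assumed subtractor operands
    a4 = adnode(M1, add, pI1)       # M1=0 -> adder, M1=1 -> pI1
    a5 = adnode(M2, pI1, sub)       # M2=0 -> pI1,   M2=1 -> subtractor
    return {
        "R1": adnode(L1, regs["R1"], a4),  # L1=0 holds R1, L1=1 loads A4
        "R2": adnode(L2, regs["R2"], a5),  # L2=0 holds R2, L2=1 loads A5
        "R3": adnode(L3, regs["R3"], add), # L3=0 holds R3, L3=1 loads adder
    }
```

With all L-selects at 0 every register simply holds its value, which is the idle behavior the select signals encode.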



Fig.6 Assignment decision diagram output of datapath.
Fig.7 Assignment decision diagram for a new value of datapath.
Fig.8 Assignment decision diagram for a new value of datapath.
Fig.9 Assignment decision diagram for a new value of datapath.



Fig.10 Assignment decision diagram for a new value of datapath.
Fig.11 Assignment decision diagram for a new value of datapath.

In this paper, we present a novel approach that extends SAT-based ATPG to generate test programs that detect gate-level stuck-at faults in microprocessors.

REFERENCES
[1] L. Lingappan, S. Ravi, and N. K. Jha, "Satisfiability-based test generation for nonseparable RTL controller-datapath circuits," IEEE Trans. Computer-Aided Design Integr. Circuits Syst., vol. 25, no. 3, pp. 544-557, Mar. 2006.
[2] I. Ghosh and M. Fujita, "Automatic test pattern generation for functional register-transfer level circuits using assignment decision diagrams," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 20, no. 3, pp. 402-415, Mar. 2001.
[3] I. Ghosh, A. Raghunathan, and N. K. Jha, "A design for testability technique for register-transfer level circuits using control/data flow extraction," IEEE Trans. Computer-Aided Design Integr. Circuits Syst., vol. 17, no. 8, pp. 706-723, Aug. 1998.
[4] A. Paschalis and D. Gizopoulos, "Effective software-based self-test strategies for on-line periodic testing of embedded processors," IEEE Trans. Computer-Aided Design Integr. Circuits Syst., vol. 24, no. 1, pp. 88-99, Jan. 2005.
[5] B. T. Murray and J. P. Hayes, "Hierarchical test generation using precomputed tests for modules," IEEE Trans. Computer-Aided Design Integr. Circuits Syst., vol. 9, no. 6, pp. 594-603, Jun. 1990.
[6] S. Bhatia and N. K. Jha, "Integration of hierarchical test generation with behavioral synthesis of controller and data path circuits," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 6, no. 4, pp. 608-619, Dec. 1998.
[7] H. K. Lee and D. S. Ha, "HOPE: An efficient parallel fault simulator for synchronous sequential circuits," IEEE Trans. Computer-Aided Design Integr. Circuits Syst., vol. 15, no. 9, pp. 1048-1058, Sep. 1996.
[8] L. Chen, S. Ravi, A. Raghunathan, and S. Dey, "A scalable software-based self-test methodology for programmable processors," in Proc. Design Autom. Conf., 2003, pp. 548-553.
[9] N. Kranitis, A. Paschalis, D. Gizopoulos, and G. Xenoulis, "Software-based self-testing of embedded processors," IEEE Trans. Comput., vol. 54, no. 4, pp. 461-475, Apr. 2005.


DFT Techniques for Detecting Resistive Opens in CMOS Latches and Flip-Flops

Reeba Rex.S and Mrs.G.Josemin Bala, Asst.Prof, Karunya University

Abstract-- In this paper, a design-for-testability (DFT) technique is proposed to detect resistive opens in the conducting paths of the clocked inverter stage in CMOS latches and flip-flops. The main benefit of this technique is that it is able to detect a parametric range of resistive open defects. The testability of the added DFT circuitry is also addressed, and application to a large number of cells is considered. A comparison with other previously proposed testable latches is carried out. Circuits with the proposed technique have been simulated and verified using TSPICE.

Index Terms—Design-for-testability (DFT), flip-flop, latches, resistive open.

I. INTRODUCTION

Conventional tests cannot detect FET stuck-open faults in several CMOS latches and flip-flops. Stuck-open faults can change static latches and flip-flops into dynamic devices, a danger to circuits whose operation requires static memory, since undetected FET stuck-open faults can cause malfunctions. Designs have been given for several memory devices in which all single FET stuck-open faults are detectable; these memory devices include common latches, master-slave flip-flops, and scan-path flip-flops that can be used in applications requiring static memory elements whose operation can be reliably ascertained through conventional fault testing methods. Stuck-at faults occur due to thin oxide shorts (the n-transistor gate to Vss or the p-transistor gate to Vdd) and metal-to-metal shorts. Stuck-open or stuck-closed faults are due to a missing source, drain or gate connection. An open or break at the drain or source of a MOSFET gives rise to a class of conventional failures called stuck-open faults. If a stuck open exists, a test vector may not always guarantee a unique repeatable logic value at the output, because there is no conducting path from the output node to either Vdd or Vss. Undetectable opens may occur in some branches of CMOS latches and flip-flops. These undetectable opens occur in the clocked inverter stage (CIS) of the symmetric D-latch, because the input data is correctly written through the driver stage despite the defective stage. Opens in vias and contacts are likely to occur, since the number of vias and contacts in actual integrated circuits is high due to the many metal levels. In the damascene-copper process, vias and metal are patterned and etched prior to the additive metallization, and the open density in copper is higher than that found in aluminum. Random particle-induced contact defects are the main test target in production testing. In addition, silicided opens can occur due to excess anneal during manufacturing; low temperature screening techniques can detect cold delay defects such as silicide resistive opens.

Memory elements like latches and flip-flops are widely used in the design of digital CMOS integrated circuits. Their application depends on the requirements of performance, gate count, power dissipation, area, etc. Resistive opens affecting certain branches of fully static CMOS memory elements go undetected by logic and delay testing: for these opens the input data is correctly written and memorized. However, for high resistive opens the latch may fail to retain the information after some time in the presence of leakage or noise. Testable latches have been proposed for making stuck-open faults in these otherwise undetectable branches detectable. Reddy has proposed a testable latch where an additional controllable input is added to the last stage of the latch; a proper sequence of vectors is then generated for testing these opens, but the delay is penalized due to the added series transistors. Rubio has proposed a testable latch whose number of test vectors is lower than that proposed by Reddy; one additional input is required, and the delay is again penalized due to the added series transistors. In this paper, a design-for-testability (DFT) technique for testing full and resistive opens in undetectable branches of fully static CMOS memory elements is proposed. This is the first testable latch able to cover both full opens and parametric resistive opens in otherwise undetectable faulty branches. Design considerations for the DFT circuitry are stated, and the results are compared with previously reported testable structures.

Here a fault-free circuit is taken and simulated. A faulty circuit is then taken, the DFT circuitry is added, and it is simulated. The two simulation results are compared and the fault is located.

II. DESIGN FLOW

Fig.1. DFT design flow

III. METHODOLOGY

The methodology used here is DFT circuitry, which is used to detect faults in the undetectable branches of CMOS latches and flip-flops; these opens cannot be detected by delay and logic testing. The approach considers not only stuck-open faults but also resistive opens in the CIS branches. Opens are modeled with a lumped resistance which can take a continuous range of values. The proposed testable CMOS latch cell has four additional transistors, and only one control signal is required. The network under test (NMOS or PMOS) is selected by proper initialization of the latch state.

In this paper, a symmetric CMOS D-latch cell (see fig.2) has been considered, and possible open locations affecting the conductive paths of the CIS (clocked inverter stage) are taken. Resistive opens in the NMOS (PMOS) network are tested as follows:
• initialize the latch to the 1 (0) state;
• in the memory phase, activate transistors MTP and MTN;
• deactivate both transistors MTP and MTN;
• observe the output of the CMOS latch.
The detectability of the open defects is determined by the voltages imposed by the DFT circuitry during the memory phase and by the latch input/output characteristics.

Fig.2. Symmetrical CMOS latch with undetected branches

The voltage values imposed during the memorizing phase are determined by the transistor sizes of the DFT circuitry and the latch memorizing circuitry. Let us consider a resistive open in the NMOS network, with the latch output initialized to the one state. When the two DFT transistors MTP and MTN are activated there is a competition between three networks: the NMOS branch under test, the MTP transistor, and the MTN transistor. Due to the resistive open, the strength of the NMOS branch of the CIS decreases. Hence, different voltage values at Q and Qbar appear for the defect-free and the defective cases: the voltage at Qbar (Q) for the defective latch is higher (lower) than for the defect-free latch. When the transistors MTP and MTN are deactivated, the cell evolves to a stable quiescent state. The transistors are sized such that the defective latch flips its state but the defect-free latch remains unchanged.

Let Vpg be the PMOS gate voltage and Vng the NMOS gate voltage; L and W correspond to the length and width of the transistors, and Rop corresponds to the resistive open. Based on the values of Vpg, Vng, L and W, different ranges of the resistive open are detectable.

Fig.3. Proposed testable latch with one control signal

Let Wp be the width of the PMOS DFT transistor and Wn the width of the NMOS DFT transistor. RminP is the minimum detectable resistance for the PMOS network and RminN is the minimum detectable resistance for the NMOS network. Based on the Wp/Wn ratio, the minimum detectable resistances for PMOS and NMOS vary.

Fig.4. Waveform for fault free symmetrical latch
Fig.5. Timing diagram for latch with one control signal. Resistive open R11=45k

V. TESTABILITY OF THE DFT CIRCUITRY

The testability of the added DFT circuitry is now addressed. The DFT circuitry is composed of the transistors MTP, MTN and the inverter (see Fig. 3). Let us focus on the transistors MTP and MTN; defects affecting the DFT inverter can be analyzed in a similar way. Stuck-open faults, resistive open defects and stuck-on faults are considered. Resistive opens located in the conducting paths of the two DFT transistors can be tested using the same procedure as for opens affecting the undetectable branches of the latch. For a stuck-open fault at the NMOS DFT transistor (see Fig. 3) the latch is initialized to logic one. When the two DFT transistors are activated in the memory phase, the voltage at Qbar (Q) increases (decreases). The voltage at Qbar (Q) tends to a higher (lower) value than for the defect-free case because the NMOS transistor is off. After the two DFT transistors are deactivated, the defect-free (defective) latch maintains (changes) the initialized state. Hence, the defect is detected. Resistive opens are tested in a similar way, and low values of resistive opens can be detected: for the latch topology used, resistive opens as low as 5 k are detectable.

Fig.6. Output of the DFT circuitry for the case of Rop=45k

A. Waveform Description

Fig.4 corresponds to the waveform of the symmetrical latch initialized to d=1 (5 V). In the fault-free condition we get Q=1 and Qbar=0. Fig.6 corresponds to the waveform of the output of the DFT circuitry. The DFT transistors are activated when control is low. When the DFT transistors are activated, the faulty latch voltage at Qbar (Q) tends to increase (decrease). Here Vp=2.2 V and Vn=1.5 V, where Vp and Vn are the voltages at the gates of the transmission gate.

Let us assume a scan design. Using the proposed technique, a current pulse appears at the power buses during the activation of the DFT circuitry. When the DFT circuitries of the flip-flops in the scan chain are simultaneously activated, the current drawn from the power supply could be significant. Due to the high current density, mass transport caused by momentum transfer between conducting electrons and diffusing metal atoms can occur; this phenomenon is known as electromigration. As a consequence the metal lines can be degraded and even an open failure can occur. The activation of the DFT circuitries for blocks of scan cells can be skewed to minimize stressing of the power buses during test mode. This is implemented by inserting delay circuitries in the path of the control signal of blocks of scan cells (see Fig. 7). In this way, the activation of the DFT circuitries of each block of scan cells is time skewed; hence, at a given time there is a stressing current pulse due to only one block of flip-flops. For comparison purposes, the current drawn from the power supply for 4 symmetrical flip-flop cells activated simultaneously and time skewed is shown in fig.8 and fig.9. In this example, the scan chain has been divided into three blocks of 4 cells each, and a delay circuitry composed of 4 inverters has been implemented.

Fig.7. Skewing activation of scan cells by blocks.
Fig.8. Current consumption with delay circuitry
Fig.9. Current consumption without delay circuitry

When we examine the waveforms, the current consumption with the delay circuitry is 9 µA and the current consumption without the delay circuitry is 14 µA.
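The benefit of skewing can be illustrated numerically. The pulse shape and amplitudes below are invented for the sketch and are not taken from the TSPICE waveforms:

```python
def peak_current(n_blocks, pulse, skew):
    # Superpose one activation current pulse per block of scan cells,
    # each block delayed by `skew` samples (the inserted delay
    # circuitry), and return the worst-case total supply draw.
    length = len(pulse) + skew * (n_blocks - 1)
    total = [0.0] * length
    for b in range(n_blocks):
        for i, amp in enumerate(pulse):
            total[b * skew + i] += amp
    return max(total)

pulse = [4.0, 2.0, 1.0]                        # illustrative per-block pulse (uA)
simultaneous = peak_current(3, pulse, skew=0)  # all blocks switch together
skewed = peak_current(3, pulse, skew=3)        # staggered activation
```

With three blocks activated together the per-block pulses add up at the power bus, whereas fully staggered activation limits the peak to a single block's pulse, which is the electromigration-stress reduction the delay circuitries aim for.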

Technique Add Add RDET
Inpu Trans A DFT technique to test resistive opens in otherwise
t . undetectable branches in fully static CMOS latches and flip-
[2] 1 4 R∞ flops has been proposed. The main benefit of this proposal
[3] 2 4 R∞ is that it is able to detect a parametric range of resistive
This 1 4 >40k- opens with reduced performance degradation. We can
Proposal ∞ apply this DFT technique for other flipflops.
Table.1.Comparison with other testable latches
Table.1. shows a comparison between our proposal and
other testable latch structures [2], [3]. This proposal [1]Antonio Zenteno Ramirez, Guillermo Espinosa, and
requires one additional input. The number of additional Victor Champac “Design-for-Test Techniques for Opens in
inputs for proposals previously reported is also given. In Undetected Branches in CMOS Latches and Flip-Flops,”
this proposal, the number of additional transistors per cell is IEEE Transaction on VLSI Systems, vol.15, no. 5, may
smaller than for the other techniques. The delay 2007.
penalization using our proposal is significantly small. This [2] M. K. Reddy and S. M. Reddy, “Detecting FET stuck-
technique requires eight vectors for testing both CIS open faults in CMOS latches and flip-flops,” IEEE Design
branches of the latch. For testing one branch, the first vector Test, vol. 3, no. 5, pp. 17–26, Oct. 1986.
writes the desired state into the latch. The second vector [3] A. Rubio, S. Kajihara, and K. Kinoshita, “Class of
memorizes this state. Then, the third vector activates the undetectable stuck open branches in CMOS memory
DFT circuitry and the fourth vector deactivates the DFT elements,” Proc. Inst. Elect. Eng.-G, vol. 139, no. 4, pp.
circuitry. A similar sequence is required for complementary 503–506, 1992.
branch. The main benefit of this proposal is that it can [4] C. -W. Tseng, E. J. McCluskey, X. Shao, and D. M. Wu,
detect a parametric range of the resistance of the open. The “Cold delay defect screening,” in Proc. 18th IEEE VLSI
other proposals only detect a line completely open (or Test Symp., 2000, pp. 183–188.
infinite resistive open). [5]Afzel Noore,”Reliable detection of CMOS stuck open
faults due to variable internal delays”,IEICE Electronics
Express,vol..2, no.8, pp. 292-297.
[6] S. M. Samsom, K. Baker, and A. P. Thijssen, “A
comparative analysis of the coverage of voltage and tests of
realistic faults in a CMOS flip-flop,” in Proc. ESSCIRC
20th Eur. Solid-State Circuits Conf., 1994, pp. 228–231.
[7] K. Banerjee, A. Amerasekera, N. Cheung, and C. Hu,
“High-current failure model for VLSI interconnects under
short-pulse stress conditions,” IEEE Electron Devices Lett.,
vol. 18, no. 9, pp. 405–407, Sep.1997.
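The eight-vector test sequence described above (four vectors per branch: write, memorize, activate DFT, deactivate DFT) can be written out explicitly. The sketch below is an assumed encoding for illustration; the names `d` and `ctrl` for the data input and the extra DFT-control input are not from the paper:

```python
# Hypothetical encoding of the four-vector-per-branch sequence described above.
# 'd' is the data input; 'ctrl' is the assumed extra DFT-control input.
def branch_test_sequence(state: int):
    """Return the four (d, ctrl, action) vectors that test one latch branch."""
    return [
        (state, 0, "write state"),       # vector 1: write the desired state
        (state, 0, "memorize state"),    # vector 2: hold (memorize) that state
        (state, 1, "activate DFT"),      # vector 3: enable the DFT circuitry
        (state, 0, "deactivate DFT"),    # vector 4: disable the DFT circuitry
    ]

def full_test_sequence():
    """Eight vectors in total: one four-vector pass per complementary branch."""
    return branch_test_sequence(1) + branch_test_sequence(0)

assert len(full_test_sequence()) == 8   # both CIS branches covered
```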


2-D fractal array design for 4-D Ultrasound Imaging

Ms. Alice John, Mrs.C.Kezi Selva Vijila
M.E. Applied electronics, HOD-Asst. Professor
Dept. of Electronics and Communication Engineering
Karunya University, Coimbatore
Abstract- One of the most promising techniques for limiting complexity in real-time 3-D ultrasound systems is to use sparse 2-D layouts. For a given number of channels, optimization of performance is desirable to ensure high-quality volume images. To find optimal layouts, several approaches have been followed with varying success. The most promising designs proposed are Vernier arrays, but these also suffer from high peaks in the sidelobe region compared with a dense array. In this work, we propose a new method based on the principle of suppression of grating lobes. The proposed method extends the concept of the fractal layout. Our design has simplicity in construction, flexibility in the number of active elements, and the possibility of suppression of grating lobes.

Index Terms- 4-D ultrasound imaging, sparse 2-D array, fractal layout, Sierpinski carpet layout.

1. INTRODUCTION

The new medical image modality, volumetric imaging, can be used for several applications including diagnostics, research and non-invasive surgery. Existing 3-D ultrasound systems are based on mechanically moving 1-D arrays for data collection and preprocessing of data to achieve 3-D images. The main aim is to minimize the number of channels without compromising image quality and to suppress the sidelobes. New generations of ultrasound systems will have the possibility to collect and visualize data in near real time. To develop the full potential of such a system, an ultrasound probe with a 2-D transducer array is needed.

Current systems use linear arrays with more than 100 elements. A 2-D transducer array will contain between 1500 and 10,000 elements. Such arrays represent a technological challenge because of the high channel count [1]. To overcome this challenge, undersampling the 2-D array by connecting only some of all the possible elements [2] is a suitable solution. For a given set of constraints, the problem is to choose those elements that give the most appropriate beam pattern or image. The analysis of such sparse array beam patterns has a long history. A short review of some of these works can be found in [3].

Several methods for finding sparse array layouts for 4-D ultrasound imaging have been reported. Random approaches have been suggested by Turnbull et al. [4], [5], and this work has been followed up at Duke University [6]-[7]. Weber et al. have suggested using genetic algorithms. Similar layouts have been found by Holm et al. using linear programming and by Trucco using simulated annealing.

Sparse arrays can be divided into three categories: random, fractal, and periodic. One of the promising categories is sparse periodic arrays [8]. These are based on the principle of different transmit and receive layouts, where the grating lobes in the transmit array response are suppressed by the receive array response and vice versa. Periodic arrays utilize partial cancellation of transmit and receive grating lobes. Sparse periodic arrays have a few disadvantages; one is the use of overlapping elements, another is the strict geometry which fixes the number of elements. An element in a 2-D array will occupy a small area compared to an element in a 1-D array. The sparse periodic array has high resolution, but there is frequent occurrence of sidelobes.

In sparse random arrays, each element is chosen at random according to a chosen distribution function. Due to the randomness, the layouts are very easy to find. Sparse random arrays have low resolution, but the suppression of sidelobes is maximal. By exploiting the properties of sparse random arrays and sparse periodic arrays, we go for fractal arrays. In fractal arrays, we can obtain high resolution with a low sideband level by using the advantages of both periodic and random arrays.

To simplify future integration of electronics into the probe, the sparse transmit and receive layouts should be chosen to be non-overlapping. This means that some elements should be dedicated to transmit while others should be used to receive. To increase system performance, future 2-D arrays should possibly include pre-amplifiers directly connected to the receive elements.

The paper is organized in the following manner. Section II describes fractal array design, starting with the Sierpinski fractal and the carpet fractal, and then the pulse-echo response. Section III describes the simulation and performance of different designs obtained by adjusting the kerf value. In Section IV, we summarize the paper.

II. FRACTAL ARRAY LAYOUTS

A fractal is generally a rough or fragmented geometric shape that can be subdivided into parts, each of which is (at least approximately) a reduced-size copy of the whole, a property called self-similarity. The Fractal component model has the following important features:

• Recursivity: components can be nested in composite components.

• Reflectivity: components have full introspection and intercession capabilities.

• Component sharing: a given component instance can be included (or shared) by more than one component.

• Binding components: a single abstraction for component connections, called bindings. Bindings can embed any communication semantics, from synchronous method calls to remote procedure calls.

• Execution model independence: no execution model is imposed. Components can be run within execution models other than the classical thread-based model, such as event-based models and so on.

• Open: extra-functional services associated with a component can be customized through the notion of a control membrane.

A. Sierpinski Fractal

In the Sierpinski fractal we have considered mainly two layouts:

• Sierpinski triangle

• Sierpinski carpet

B. Sierpinski Triangle

The Sierpinski triangle is also called the Sierpinski gasket and the Sierpinski sieve.

• Start with a single triangle. This is the only triangle in this direction; all the others will be drawn upside down.

• Inside the first triangle, we have drawn a smaller upside-down triangle. Its corners should be exactly in the centers of the sides of the large triangle.

C. Sierpinski Carpet

In this paper we mainly consider the carpet layout because we are considering a 2-D array.

• Transmitter array: the transmit array is drawn using a matrix M consisting of both ones and zeros. These arrays have been constructed by considering a large array of elements surrounded by a small matrix. In the carpet fractal array, first of all we have drawn a square at the right middle, and this small square will occupy 1/3rd of the original big array. Surrounding the above-built square we have constructed small squares.

• Receiver array: in the sparse 2-D array layout, to avoid overlapping we select different receiver and transmitter arrays. In our paper we have taken for the receiver array those elements which will never cause an overlap.

D. Pulse-Echo Response

The layout should have optimal pulse-echo performance, i.e., the pulse-echo radiation pattern should have as low a sidelobe level as possible for a specified mainlobe width for all angles and depths of interest. Computing the pulse-echo response for a given transmit and receive layout is time consuming. A simplification commonly used is to evaluate the radiation properties in continuous-wave mode in the far field. An optimal set of layouts for continuous waves does not necessarily give optimal pulse-echo responses. To ensure reasonable pulse-echo performance, additional criteria which ensure a uniform distribution of elements could be introduced. This will limit the interference in the sidelobe region between pulses transmitted from different elements and reduce the sidelobe level.

Fig. 1. Pulse-echo response of a Sierpinski carpet layout

III. RESULTS AND DISCUSSION

The fractal layout exploits the advantages of both the periodic and random arrays. Our main aim is to suppress the sidelobes and to narrow down the mainlobe. First, we created the transmit and receive array layouts. Both layouts have been constructed in such a way that they do not overlap each other. The transmit array is designed using a matrix M. Iterations up to 3 were taken to construct the transmit array. The intensity distributions were taken to find out the spreading of the sidelobe and the mainlobe.

In our paper we have taken into consideration different specifications such as the speed of the sound wave, i.e., 1540 m/s, the initial frequency, and the sampling frequency of 100×10^6 Hz, along with the width and height of the array; the kerf, that is, the distance between the elements in an array, is also considered.
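The carpet-style layout and its continuous-wave far-field response can be sketched as follows. The 3×3 generator with a removed centre square and the 3 MHz carrier are assumptions for illustration; the paper does not give its exact matrix M or initial frequency:

```python
import numpy as np

def sierpinski_carpet(iterations: int) -> np.ndarray:
    """Binary element layout: 1 = active element, 0 = removed middle square."""
    gen = np.ones((3, 3), dtype=int)
    gen[1, 1] = 0                       # drop the centre square at each level
    mask = np.ones((1, 1), dtype=int)
    for _ in range(iterations):
        mask = np.kron(mask, gen)       # self-similar refinement
    return mask

def array_factor(mask, pitch, wavelength, angles):
    """Normalized continuous-wave far-field response along one azimuth cut."""
    nx = mask.shape[1]
    x = (np.arange(nx) - (nx - 1) / 2) * pitch   # element x-positions (m)
    weights = mask.sum(axis=0)                   # active elements per column
    k = 2 * np.pi / wavelength
    u = np.sin(np.asarray(angles))               # u = sin(theta)
    af = np.abs(np.exp(1j * k * np.outer(u, x)) @ weights)
    return af / af.max()

layout = sierpinski_carpet(3)           # 27 x 27 grid after 3 iterations
c = 1540.0                              # speed of sound in tissue (m/s)
f = 3.0e6                               # assumed carrier frequency (Hz)
lam = c / f
af = array_factor(layout, pitch=lam / 2, wavelength=lam,
                  angles=np.linspace(-np.pi / 2, np.pi / 2, 721))
```

Three iterations leave 8^3 = 512 active positions out of 729, and the response peaks at broadside; sweeping the `pitch` argument mimics the kerf cases studied below.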

A. Case I: kerf = 0

We have simulated the transmitter and receiver layouts. Since the kerf value, i.e., the distance between the elements, is given as zero, there is no spacing between the elements. From the pulse-echo response we can conclude that in this case the mainlobe is not sharp, but the sidelobe level is highly suppressed. Fig. 2(a)-(b) shows the transmitter and receiver layouts. Fig. 2(c) shows the pulse-echo response and Fig. 2(d) shows the intensity distribution, from which we can see that the sidelobe level is reduced.

Fig. 2. (a)-(b) show the array layouts and (c)-(d) show the pulse-echo response and intensity distribution for kerf = 0

B. Case II: kerf = lambda/2

In the second case the kerf value is taken as lambda/2, so we can see a lambda/2 spacing between the elements of the transmitter and receiver arrays. Fig. 3(a)-(b) shows the layouts. Fig. 3(c) shows the pulse-echo response, in which we can see that the mainlobe is now sharp but the sidelobes are not highly suppressed. Fig. 3(d) shows the intensity distribution, where the sidelobe level is high compared to that of case I.

Fig. 3. (a)-(b) show the array layouts and (c)-(d) show the pulse-echo response and intensity distribution for kerf = lambda/2

C. Case III: kerf = lambda/4

In the third case the kerf value is taken as lambda/4. Fig. 4(a)-(b) shows the array layouts. Fig. 4(c) shows the pulse-echo response, in which the mainlobe is sharp but the sidelobe level is high. From the intensity distribution also we can see that the sidelobe distribution is high compared to case II.

Fig. 4. (a)-(b) show the array layouts and (c)-(d) show the pulse-echo response and intensity distribution for kerf = lambda/4

D. Case IV: kerf = lambda

In the last case the kerf value is taken as lambda, and because of this we can see a spacing of lambda between the elements in the array. Fig. 5(a)-(b) shows the transmitter and receiver layouts. Fig. 5(c) shows the pulse-echo response; here the mainlobe is very sharp, but the sidelobe level has started spreading towards both sides. Fig. 5(d) shows the intensity distribution, which shows the spreading of the sidelobe clearly. The sidelobe level in this case is high compared to all the other cases.

Fig. 5. (a)-(b) show the array layouts and (c)-(d) show the pulse-echo response and intensity distribution for kerf = lambda

IV. CONCLUSION

To construct a 2-D array for 4-D ultrasound imaging we need to meet many constraints, an important one being the mainlobe and sidelobe levels. To evaluate this we use the pulse-echo response. We have shown that it is possible to suppress the unwanted sidelobe levels by adjusting different parameters of the array layout. We have also shown the changes in the intensity level while adjusting the spacing between array elements. As future work we will calculate the mainlobe BW, the ISLR, and the sidelobe peak value in order to choose the correct fractal, since the above parameters affect the image quality.

REFERENCES

[1] B. A. J. Angelsen, H. Torp, S. Holm, K. Kristoffersen, and T. A. Whittingham, "Which transducer array is best?," Eur. J. Ultrasound, vol. 2, no. 2, pp. 151-164, 1995.
[2] S. Holm, "Medical ultrasound transducers and beamforming," in Proc. Int. Cong. Acoust., pp. 339-342, Jun. 1995.
[3] R. M. Leahy and B. D. Jeffs, "On the design of maximally sparse beamforming arrays," IEEE Trans. Antennas Propagat., vol. AP-39, pp. 1178-1187, Aug. 1991.
[4] D. H. Turnbull and F. S. Foster, "Beam steering with pulsed two-dimensional transducer arrays," IEEE Trans. Ultrason., Ferroelect., Freq. Contr., vol. 38, no. 4, pp. 320-333, 1991.
[5] D. H. Turnbull, "Simulation of B-scan images from two-dimensional transducer arrays: Part II — Comparisons between linear and two-dimensional phased arrays," Ultrason. Imag., vol. 14, no. 4, pp. 334-353, Oct. 1992.


Secured Digital Image Transmission over Network Using

Efficient Watermarking Techniques on Proxy Server
Jose Anand, M. Biju, U. Arun Kumar
JAYA Engineering College, Thiruninravur, Near Avadi, Chennai 602024.

Abstract: With the rapid growth of Internet technologies and the wide availability of multimedia computing facilities, the enforcement of multimedia copyright protection becomes an important issue. Digital watermarking is viewed as an effective way to deter content users from illegal distribution. The watermark can be used to authenticate the data file and for tamper detection. This is of much value in the use and exchange of digital media, such as audio and video, on emerging handheld devices. However, watermarking is computationally expensive and adds to the drain on the available energy of handheld devices. This paper analyzes the energy, average power and execution time of various watermarking algorithms. We also propose a new approach in which the watermarking algorithm is partitioned for embedding and extraction by migrating some tasks to the proxy server. Security measures are provided by the DWT, which leads to lower energy consumption on the handheld device without compromising the security of the watermarking process. The proposed approach shows that executing the watermarking tasks partitioned between the proxy and the handheld device reduces the total energy consumed by a substantial factor and improves performance by two orders of magnitude compared to running the application on the handheld device alone.

Keywords:- energy consumption, mobile computing, proxy server, security, watermarking.

I. INTRODUCTION

Watermarking is used to provide copyright protection for digital content. A distributor embeds a mark into a digital object so that ownership of this digital object can be proved. This mark is usually a secret message that contains the distributor's copyright information. The mark is normally embedded into the digital object by exploiting its inherent information redundancy.

The problem arises when a dishonest user tries to delete the mark from the digital object before redistribution in order to claim ownership. In consequence, the strength of watermarking schemes must be based on the difficulty of locating and changing the mark. There are many watermarking approaches that try to protect the intellectual property of multimedia objects, especially images, but unfortunately very little attention has been given to software watermarking.

There are two kinds of digital watermarking, visible and invisible. Visible watermarking adds visible information, like a company logo, to indicate the ownership of the multimedia. Visible watermarking causes distortion of the cover image, and hence invisible watermarking is more practical. In invisible watermarking, as the name suggests, the watermark is imperceptible in the watermarked image. Invisible watermarking can be classified into three types: robust, fragile and semi-fragile.

A popular application of watermarking techniques is to provide a proof of ownership of digital data by embedding copyright statements into video or image digital products. Automatic monitoring and tracking of copyrighted material on the web, automatic auditing of radio transmissions, data augmentation, and fingerprinting applications, over all kinds of data such as audio, images, video, formatted text models and model animation parameters, are examples where watermarking can be applied.

To allow the architecture to use a public-key security model on the network while keeping the devices themselves simple, we create a software proxy for each device. All objects in the system, e.g., appliances, wearable gadgets, software agents, and users, have associated trusted software proxies that run either on an embedded processor on the appliance or on a trusted computer.

In the case of the proxy running on an embedded processor on the appliance, we assume that device-to-proxy communication is inherently secure. If the device has minimal computational power and communicates with its proxy through a wired or wireless network, we force the communication to adhere to a device-to-proxy protocol. The proxy is software that runs on a network-visible computer. The proxy's primary function is to make access-control decisions on behalf of the device it represents. It may also perform secondary functions such as running scripted actions on behalf of the device and interfacing with a directory service. The device-to-proxy protocol varies for different types of devices. In particular, we consider lightweight devices with low-bandwidth, usually wireless, network connections and slow CPUs, and heavyweight devices with higher-bandwidth connections and faster CPUs.

It was assumed that heavyweight devices are capable of running proxy software locally. With a local proxy, a sophisticated protocol for secure device-to-proxy communication is unnecessary, assuming critical parts of the device are tamper resistant. For lightweight devices, the proxy must run elsewhere.

The proxy and device communicate through a secure channel that encrypts and authenticates all messages. Different algorithms are used for authentication and encryption. It may use symmetric keys.
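A device-to-proxy channel of this kind (symmetric-key encryption plus message authentication) can be sketched as below. This is a toy encrypt-then-MAC construction for illustration only, not the protocol used in the paper and not production-grade cryptography; the keystream here is simply SHA-256 in counter mode:

```python
# Toy symmetric-key secure channel: encrypt-then-MAC, illustrative only.
import hashlib
import hmac

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Pseudorandom keystream from SHA-256 in counter mode (toy cipher)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def seal(enc_key: bytes, mac_key: bytes, nonce: bytes, msg: bytes) -> bytes:
    """Encrypt msg, then authenticate nonce + ciphertext with HMAC-SHA256."""
    ct = bytes(m ^ k for m, k in zip(msg, _keystream(enc_key, nonce, len(msg))))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag              # 16-byte nonce assumed by open_()

def open_(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    """Verify the tag first; only then decrypt (reject tampered messages)."""
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expect = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expect):
        raise ValueError("message failed authentication")
    return bytes(c ^ k for c, k in zip(ct, _keystream(enc_key, nonce, len(ct))))

blob = seal(b"k" * 32, b"m" * 32, b"n" * 16, b"image chunk from device")
assert open_(b"k" * 32, b"m" * 32, blob) == b"image chunk from device"
```

Verifying the MAC before decrypting means a malicious or faulty relay cannot feed the device altered ciphertext, which mirrors the "encrypts and authenticates all messages" requirement above.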

In this paper the energy profiles of various watermarking algorithms are analyzed, along with the impact of security and image quality on energy consumption. We then present a task partitioning scheme for wavelet-based image watermarking algorithms in which the computationally expensive portions of the watermarking are offloaded to a proxy server. The proxy server acts as an agent between the content server and the handheld device and is also used for various other tasks such as data transcoding and load management. The partitioning scheme can be used to reduce the energy consumption associated with watermarking on the handheld without compromising the security of the watermarking process.

II. WATERMARKING

The increasing computational capability and availability of broadband in emerging handheld devices have made them true endpoints of the Internet. They enable users to download and exchange a wide variety of media such as e-books, images, etc. Digital watermarking has been proposed as a technique for protecting the intellectual property of digital data.

It is the process of embedding a signature/watermark into a digital media file so that it is hidden from view but can be extracted on demand to verify the authenticity of the media file. The watermark can be binary data, a logo, or a seed value to a pseudorandom number generator that produces a sequence of numbers with a certain distribution.

Watermarking can be used to combat fraudulent use of wireless voice communications, to authenticate the identity of cell phones and transmission stations, and to secure the delivery of music and other audio content. Watermarking bears a large potential in securing such applications, for example, e-fax for owner verification, customer authentication in service delivery, and customer support.

Watermarking algorithms are designed for maximum security with little or no consideration for other system constraints such as computational complexity and energy availability. Handheld devices such as PDAs and cell phones have a limited battery life that is directly affected by the computational burden placed on them by the application. Digital watermarking tasks place an additional burden on the available energy in these devices.

Watermarking, like steganography, seeks to hide information inside another object. Therefore, it should be resilient to intentional or unintentional manipulations and resistant to watermark attacks. Although several techniques have been proposed for remote task execution for power management, these do not account for application security during the partitioning process.

Figure 1 Architecture of target system

Figure 1 shows our implementation of a watermarking system in which multimedia content is streamed to a handheld device via a proxy server. This system consists of three components: mobile devices, proxy servers, and content servers.

A mobile or handheld device refers to any type of networked resource; it could be a handheld (PDA), a gaming device, or a wireless security camera. Content servers store multimedia and database content and stream data (images) to a client on request. All communication between the mobile devices and the servers is relayed through the proxy servers.

Proxy servers are powerful servers that can, among other things, compress/decompress images, transcode video in real time, access/provide directory services, and provide services based on a rule base for specific devices. Figure 2 shows the general process of watermarking image data, where the original image (host image) is modified using a signature to create the watermarked image.

In this process, some error or distortion is introduced. To ensure transparency of the embedded data, the amount of image distortion due to the watermark embedding process has to be small. There are three basic tasks in the watermarking process with respect to an image, as shown in figure 2. A watermark is embedded either in the spatial domain or in the frequency domain. Detection and extraction refer to determining whether an image has a watermark and extracting the full watermark from the image. Authentication refers to comparing the extracted watermark with the original watermark.

Figure 2 Watermarking process (a) watermark generation and embedding (b) watermark extraction and authentication

Watermarks are used to detect unauthorized modifications of data and for ownership authentication. Watermarking techniques for images and video differ in that watermarking in video streams takes advantage of the temporal relation between frames to embed watermarks.

Figure 3 Digital Signal Generations
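The three basic tasks above (embedding, detection/extraction, authentication) can be illustrated with a minimal sketch. The trivial spatial-domain LSB scheme used here is an assumed stand-in for illustration, not the paper's wavelet-based algorithm:

```python
import numpy as np

# Minimal illustration (assumed scheme) of the three watermarking tasks on an
# 8-bit grayscale image: embed, extract, and authenticate.
def embed(host: np.ndarray, bits: np.ndarray) -> np.ndarray:
    """Hide watermark bits in the least significant bits of the first pixels."""
    wm = host.copy().ravel()
    wm[:bits.size] = (wm[:bits.size] & 0xFE) | bits   # overwrite the LSBs
    return wm.reshape(host.shape)

def extract(image: np.ndarray, n_bits: int) -> np.ndarray:
    """Read the hidden bits back out of the LSBs."""
    return image.ravel()[:n_bits] & 1

def authenticate(extracted: np.ndarray, original: np.ndarray) -> bool:
    """Compare the extracted watermark with the original watermark."""
    return bool(np.array_equal(extracted, original))

rng = np.random.default_rng(0)
host = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)
mark = rng.integers(0, 2, size=16, dtype=np.uint8)
marked = embed(host, mark)
assert authenticate(extract(marked, mark.size), mark)     # watermark survives
assert int(np.abs(marked.astype(int) - host.astype(int)).max()) <= 1  # <= 1 LSB
```

The final assertion checks the transparency requirement stated above: the embedding distorts each pixel by at most one gray level.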

A simple approach for embedding data into images is to set the least significant bit of some pixels to zero. Data is then embedded into the image by assigning 1's to the LSBs in a specific manner which is known only to the owner. This method satisfies the perceptual transparency property, since only the least significant bit of an 8-bit value is altered.

In DCT-based watermarking, the original image is divided into 8 x 8 blocks of pixels, and the two-dimensional (2-D) DCT is applied independently to each block. The watermark is then embedded into the image by modifying the relationship of the neighboring blocks of DCT coefficients that are in the middle-frequency range of the original image.

The spatial and frequency domain watermarking techniques used for still images are extended to the temporal domain for video streams. Here, one can take advantage of the fact that in MPEG video streams the frames and the bi-directional frames are derived from reference intermediate frames using motion estimation. Wavelet-based watermarking is one of the most popular approaches due to its robustness against malicious attacks.

Wavelet-based image watermark embedding consists of three phases: 1) watermark preprocessing; 2) image preprocessing; and 3) watermark embedding, as shown in Figure 4. First, each bit in each pixel of both the image and the watermark is assigned to a bit plane. There are 8 bit planes, corresponding to the gray-level resolution of the image. Then DWT coefficients are obtained for each bit plane by carrying out the DWT on a plane-by-plane basis. The DWT coefficients of the watermark are encrypted using a public key. The watermark embedding algorithm then uses the coefficients of the original image and those of the encrypted watermark to generate the watermarked image. A similar reverse process is used for watermark extraction and authentication.

First, the encrypted coefficients of the image and the watermark are extracted from the image. Then a secret private key is used to decrypt the coefficients of the watermark, an inverse DWT is applied, and so on, till the original watermark is obtained. The DWT uses filters with different cutoff frequencies to analyze a signal at different resolutions. The signal is passed through a series of high-pass filters, also known as wavelet functions, to analyze the high frequencies, and it is passed through a series of low-pass filters, also known as scaling functions, to analyze the low frequencies.

We present two partitioning schemes; the first gives priority to reduction of energy consumption. This watermark process migration is applicable in office environments where a trusted proxy can act as an "agent" or representative for the mobile device and can take care of authentication and quality-of-service negotiation with the content server. A more secure partitioning scheme, for both watermark embedding and extraction, requires some participation from the device in the watermarking process. During watermark embedding, we migrate the following tasks to the proxy: bit decomposition, coefficient calculation using the DWT, and watermark coefficient encryption using the public key. So the handheld first sends the image and the watermark to the proxy. The proxy processes them and sends the image and watermark coefficients back to the handheld.

The handheld then embeds the watermark coefficients into the image using a unique coefficient relationship to generate the watermarked image. This is a secure approach, as the proxy does not know the coefficient relationship used to embed the watermark coefficients in the image. During watermark extraction, the handheld extracts the image and watermark coefficients from the watermarked image and uses its private secure key to decrypt the image and watermark coefficients.

The handheld sends the image coefficients to the proxy for processing, such as carrying out the inverse DWT; on the other hand, it processes the coefficients of the watermark itself to generate the watermark. Then it authenticates the watermark against the original watermark. The fact that the watermark is not sent to the proxy makes this scheme secure against any potential malicious attack by the proxy, as shown in figures 4 and 5 respectively.

Figure 4 Embedding and Extraction

Figure 5 Partitioning of the image watermarking embedding and extraction process

III. EXPERIMENTAL SETUP

Our experimental setup is shown in Figure 6. All the measurements were made using a Sharp Zaurus PDA with an Intel 400-MHz XScale processor with a 64-MB ROM and 32-MB SDRAM. It uses a National Instruments PCI-6040E data acquisition (DAQ) board to sample the voltage drop across the resistor (to calculate current) at 1000 samples/s. The DAQ has a resolution of 16 bits.

IV. ENERGY CONSUMPTION ANALYSIS

We calculated the instantaneous power consumption corresponding to each sample and the total energy using the following equations:

P_i = (V_r,i / R) × V_PDA,        E = Σ_i P_i × T_s

where V_r,i is the instantaneous voltage drop (in volts) across the resistor of resistance R, V_PDA is the voltage across the Zaurus PDA (the supply voltage), and T_s is the sampling period. Energy is the sum of all the instantaneous power samples for the duration of the execution of the application, multiplied by the sampling period. We calculate average power as the ratio of total energy to total execution time.
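Applied to the sampled resistor voltages, these relations amount to the following sketch; the sample values, resistance, and supply voltage are made-up illustrative numbers, not measurements from the paper:

```python
# Instantaneous power, total energy, and average power from sampled voltage
# drops across the sense resistor, following the relations above.
def energy_profile(v_r_samples, r_ohms, v_dd, t_s):
    """Return (total energy in J, average power in W) for one application run."""
    powers = [(v_r / r_ohms) * v_dd for v_r in v_r_samples]  # P_i = I_i * V_dd
    energy = sum(powers) * t_s            # E = sum_i P_i * T_s
    exec_time = len(v_r_samples) * t_s    # duration of the run
    return energy, energy / exec_time     # average power = E / execution time

# Illustrative numbers only: three samples at T_s = 1 ms, R = 0.5 ohm, 5 V rail.
e, p_avg = energy_profile([0.05, 0.06, 0.055], r_ohms=0.5, v_dd=5.0, t_s=1e-3)
```

At 1000 samples/s (as in the setup above) `t_s` would be 1 ms, and one run of an embedding algorithm would contribute one (energy, average power) row of the tables that follow.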

TABLE I Embedding Energy, Power and Execution Time Analysis

Algorithm      Energy (J)   Avg. Power (W = J/s)   Exec. Time (s)
Bruyndonckx    1.47         0.11                   13.46
Corvi          83.20        0.61                   136.15
Cox            126.00       1.10                   115.23
Dugad          68.70        0.50                   136.64
Fridrich       196.00       1.15                   171.00
Kim            73.50        0.52                   140.81
Koch           2.19         0.17                   12.64
Wang           85.80        0.61                   140.20
Xia            90.00        0.67                   133.82
Xie            154.80       1.05                   147.07
Zhu            163.30       1.14                   143.74

Table I lists the energy usage, average power (energy/execution time), and execution time for watermark embedding by the various watermarking algorithms when they are executed on the handheld device. Calculating wavelet and inverse-wavelet transforms is computationally expensive and, thus, also power hungry.

The large variation in the power consumption of the different algorithms can be attributed in part to the difference in the type of instructions executed in each case. The instruction sequence executed is largely dependent on algorithmic properties, which enable certain optimizations such as vectorization, and on the code generated by the compiler.

We present the energy, power, and execution time analysis of watermark extraction in Table II. Watermark extraction is more expensive than watermark embedding. During extraction, the transform is carried out on both the input image and the output image, and the corresponding coefficients are normalized. The correlation between the normalized coefficients of the input and output is used as a measure of the fidelity of the watermarked image. The overhead of computing band-wise correlation and image normalization accounts for the higher energy consumption.

In Table III, we list the energy, power, and execution time for watermark authentication. This task is computationally inexpensive, since it involves a simple comparison of the extracted watermark and the original watermark.

TABLE II Extracting Energy, Power and Execution Time Analysis

Algorithm      Energy (J)   Avg. Power (W = J/s)   Exec. Time (s)
Bruyndonckx    0.22         0.79                   0.28
Corvi          70.30        0.47                   150.77
Cox            121.00       0.95                   128.02
Dugad          38.40        0.49                   79.00
Fridrich       191.00       1.10                   173.60
Kim            91.30        0.55                   166.57
Koch           0.61         0.61                   1.00
Wang           88.00        0.59                   147.90
Xia            82.70        0.57                   144.51
Xie            74.06        1.00                   73.88
Zhu            158.80       1.16                   137.38

TABLE III Authentication Energy, Power and Execution Time Analysis

Algorithm      Energy (J)   Avg. Power (W = J/s)   Exec. Time (s)
Bruyndonckx    0.02         0.59                   0.034
Corvi          0.10         0.73                   0.138
Cox            0.05         1.35                   0.037
Dugad          0.03         0.97                   0.031
Fridrich       0.18         1.36                   0.132
Kim            0.10         0.76                   0.131
Koch           0.04         1.25                   0.032
Wang           0.08         1.36                   0.059
Xia            0.08         1.40                   0.057
Xie            0.04         1.00                   0.039
Zhu            0.06         1.20                   0.050

V. CONCLUSION

In this paper the energy characteristics of several wavelet-based image watermarking algorithms are analyzed, and a proxy-based partitioning technique for energy-efficient watermarking on mobile devices is designed. The energy consumption due to the watermarking tasks can be minimized for the handheld device by offloading the tasks completely to the proxy server with sufficient security. This approach maximizes the energy savings while ensuring security. These approaches can be enhanced by providing some error correction codes in the embedding and extraction stages.

REFERENCES

[1] A. Fox and S. D. Gribble, "Security on the move: Indirect authentication using Kerberos," in Proc. Mobile Computing Networking, White Plains, NY, 1996, pp. 155–164.
[2] B. Zenel, A Proxy Based Filtering Mechanism for the Mobile Environment, Comput. Sci. Dept., Columbia University, New York, 1995, Tech. Rep. CUCS-0-95.
[3] A. Rudenko, P. Reiher, G. J. Popek, and G. H. Kuenning, "The remote processing framework for portable computer power saving," in Proc. 1999 ACM Symp. Appl. Comput., 1999, pp. 365–372.
[4] U. Kremer, J. Hicks, and J. Rehg, Compiler-Directed Remote Task Execution for Power Management: A Case Study, Compaq Cambridge Research Laboratory (CRL), Cambridge, MA, 2000, Tech. Rep. 2000-2.
[5] P. Rong and M. Pedram, "Extending the lifetime of a network of battery-powered mobile devices by remote processing: A Markovian decision-based approach," in Proc. 40th Conf. Des. Automat., 2003, pp. 906–911.
[6] F. Hartung, J. K. Su, and B. Girod, "Spread spectrum watermarking: Malicious attacks and counterattacks," in Security and Watermarking of Multimedia Contents, 1999, pp. 147–158.
[7] W. Diffie and M. E. Hellman, "New directions in cryptography," IEEE Trans. Inform. Theory, vol. IT-22, no. 6, pp. 644–654, Nov. 1976.
[8] I. Cox, J. Kilian, T. Leighton, and T. Shamoon, "Secure spread spectrum watermarking for multimedia," IEEE Trans. Image Process.
[9] S. Voloshynovskiy, S. Pereira, and T. Pun, "Watermark attacks," in Proc. Erlangen Watermarking Workshop.


Significance of Digital Signature and Implementation through RSA

R. VIJAYA ARJUNAN, M.E., Member, ISTE: LM-51366
Senior Lecturer, Department of Electronics and Communication & Biomedical Engineering, Aarupadai Veedu Institute of
Technology, Vinayaka Missions University, Old Mahabalipuram Road, Chennai.

Abstract-Internet-enabled wireless devices continue to
proliferate and are expected to surpass traditional Internet
clients in the near future. This has opened up exciting new
opportunities in the mobile e-commerce market. However,
data security and privacy remain major concerns in the
current generation of "Wireless Web" offerings. All such
offerings today use a security architecture that lacks
end-to-end security. This unfortunate choice is driven by
perceived inadequacies of standard Internet security
protocols like SSL on less capable CPUs and low-bandwidth
wireless lines. This article presents our experiences in
implementing and using standard security mechanisms and
protocols on small wireless devices. We have created new
classes for the Java 2 Micro-Edition platform that offer
fundamental cryptographic operations such as message
digests and ciphers, as well as higher-level security
protocols, for ensuring end-to-end security of wireless
Internet transactions even within today's technological
constraints.

I. INTRODUCTION

Cryptography is the science of using mathematics to encrypt
and decrypt data. Cryptography enables you to store
sensitive information or transmit it across insecure networks
(like the Internet) so that it cannot be read by anyone except
the intended recipient. While cryptography is the science of
securing data, cryptanalysis is the science of analyzing and
breaking secure communication. Classical cryptanalysis
involves an interesting combination of analytical reasoning,
application of mathematical tools, pattern finding, patience,
determination, and luck. Cryptanalysts are also called
attackers. Cryptology embraces both cryptography and
cryptanalysis. PGP is also about the latter sort of
cryptography. Cryptography can be strong or weak.
Cryptographic strength is measured in the time and
resources it would require to recover the plaintext. The
result of strong cryptography is ciphertext that is very
difficult to decipher without possession of the appropriate
decoding tool. How difficult? Given all of today's computing
power and available time, even a billion computers doing a
billion checks a second, it is not possible to decipher the
result of strong cryptography before the end of the universe.
One would think, then, that strong cryptography would hold
up rather well against even an extremely determined
cryptanalyst. Who's really to say? No one has proven that the
strongest encryption obtainable today will hold up under
tomorrow's computing power. However, the strong
cryptography employed by PGP is the best available today.

II. CONVENTIONAL CRYPTOGRAPHY

In conventional cryptography, also called secret-key or
symmetric-key encryption, one key is used both for
encryption and decryption. The Data Encryption Standard
(DES) is an example of a conventional cryptosystem that is
widely employed by the Federal Government. The figure is
an illustration of the conventional encryption process.

Key management and conventional encryption

Conventional encryption has benefits. It is very fast. It is
especially useful for encrypting data that is not going
anywhere. However, conventional encryption alone as a
means for transmitting secure data can be quite expensive,
simply due to the difficulty of secure key distribution. Recall
a character from your favorite spy movie: the person with a
locked briefcase handcuffed to his or her wrist. What is in
the briefcase, anyway? It's the key that will decrypt the
secret data. For a sender and recipient to communicate
securely using conventional encryption, they must agree
upon a key and keep it secret between themselves. If they
are in different physical locations, they must trust a courier,
the Bat Phone, or some other secure communication medium
to prevent the disclosure of the secret key during
transmission. Anyone who overhears or intercepts the key in
transit can later read, modify, and forge all information
encrypted or authenticated with that key.

III. PUBLIC KEY CRYPTOGRAPHY

The problems of key distribution are solved by public key
cryptography, the concept of which was introduced by
Whitfield Diffie and Martin Hellman in 1975. Public key
cryptography is an asymmetric scheme that uses a pair of
keys for encryption: a public key, which encrypts data, and a
corresponding private, or secret, key for decryption. You
publish your public key to the world while keeping your
private key secret. Anyone with a copy of your public key
can then encrypt information that only you can read, even
people you have never met. It is computationally infeasible
to deduce the private key from the public key. Anyone who
has a public key can encrypt information but cannot decrypt
it. Only the person who has the corresponding private key
can decrypt the information.


Key

A key is a value that works with a cryptographic algorithm
to produce a specific ciphertext. Keys are basically really,
really, really big numbers. Key size is measured in bits; the
number representing a 1024-bit key is darn huge. In public
key cryptography, the bigger the key, the more secure the
ciphertext. However, public key size and conventional
cryptography's secret key size are totally unrelated. A
conventional 80-bit key has the equivalent strength of a
1024-bit public key. A conventional 128-bit key is
equivalent to a 3000-bit public key. Again, the bigger the
key, the more secure, but the algorithms used for each type
of cryptography are very different, and thus comparison is
like that of apples to oranges. While the public and private
keys are related, it's very difficult to derive the private key
given only the public key; however, deriving the private key
is always possible given enough time and computing power.
This makes it very important to pick keys of the right size:
large enough to be secure, but small enough to be applied
fairly quickly. Additionally, you need to consider who might
be trying to read your files, how determined they are, how
much time they have, and what their resources might be.

Larger keys will be cryptographically secure for a longer
period of time. If what you want to encrypt needs to be
hidden for many years, you might want to use a very large
key. Of course, who knows how long it will take to
determine your key using tomorrow's faster, more efficient
computers? There was a time when a 56-bit symmetric key
was considered extremely safe. Keys are stored in encrypted
form. PGP stores the keys in two files on your hard disk:
one for public keys and one for private keys. These files are
called key rings. As you use PGP, you will typically add the
public keys of your recipients to your public key ring. Your
private keys are stored on your private key ring. If you lose
your private key ring, you will be unable to decrypt any
information encrypted to keys on that ring.

IV. DIGITAL SIGNATURES

A major benefit of public key cryptography is that it
provides a method for employing digital signatures. Digital
signatures enable the recipient of information to verify the
authenticity of the information's origin, and also verify that
the information is intact. Thus, public key digital signatures
provide authentication and data integrity. A digital signature
also provides non-repudiation, which means that it prevents
the sender from claiming that he or she did not actually send
the information. These features are every bit as fundamental
to cryptography as privacy, if not more. A digital signature
serves the same purpose as a handwritten signature.
However, a handwritten signature is easy to counterfeit. A
digital signature is superior to a handwritten signature in that
it is nearly impossible to counterfeit, plus it attests to the
contents of the information as well as to the identity of the
signer. Some people tend to use signatures more than they
use encryption. For example, you may not care if anyone
knows that you just deposited $1000 in your account, but
you do want to be darn sure it was the bank teller you were
dealing with. The basic manner in which digital signatures
are created is illustrated. Instead of encrypting information
using someone else's public key, you encrypt it with your
private key. If the information can be decrypted with your
public key, then it must have originated with you.
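As a minimal sketch of this sign-with-the-private-key, verify-with-the-public-key idea, the following uses the toy RSA parameters worked out later in this paper (N = 943, e = 7, d = 503). The hash-then-sign structure is an assumption added for illustration; real systems use much larger keys and padded signature schemes.

```python
import hashlib

# Toy RSA parameters from the worked example later in this paper
# (p = 23, q = 41); real keys are 1024+ bits.
N, E, D = 943, 7, 503

def sign(message: bytes) -> int:
    # Reduce a digest of the message modulo N, then "encrypt" it
    # with the private exponent d.
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % N
    return pow(digest, D, N)

def verify(message: bytes, signature: int) -> bool:
    # Anyone holding the public key (N, e) can recover the digest
    # and compare it against a freshly computed one.
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % N
    return pow(signature, E, N) == digest

sig = sign(b"deposit $1000")
print(verify(b"deposit $1000", sig))  # True
print(verify(b"deposit $9000", sig))  # almost surely False: message altered
```

Because e·d ≡ 1 (mod (p-1)(q-1)), applying the public exponent undoes the private-key operation, which is exactly the origin check described above.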

V. RSA ENCRYPTION

Public Key Cryptography

One of the biggest problems in cryptography is the
distribution of keys. Suppose you live in the United States
and want to pass information secretly to your friend in
Europe. If you truly want to keep the information secret, you
need to agree on some sort of key that you and he can use to
encode/decode messages. But you don't want to keep using
the same key, or you will make it easier and easier for others
to crack your cipher. But it's also a pain to get keys to your
friend. If you mail them, they might be stolen. If you send
them cryptographically, and someone has broken your code,
that person will also have the next key. If you have to go to
Europe regularly to hand-deliver the next key, that is also
expensive. If you hire some courier to deliver the new key,
you have to trust the courier, etcetera.

RSA Encryption

In the previous section we described what is meant by a
trap-door cipher, but how do you make one? One commonly
used cipher of this form is called RSA encryption, where
RSA are the initials of the three creators: Rivest, Shamir,
and Adleman. It is based on the following idea: it is very
simple to multiply numbers together, especially with
computers, but it can be very difficult to factor numbers.
For example, if I ask you to multiply together 34537 and
99991, it is a simple matter to punch those numbers into a
calculator and get 3453389167. But the reverse problem is
much harder. Suppose I give you the number 1459160519.
I'll even tell you that I got it by multiplying together two
numbers.

1. Person A selects two prime numbers. We will use p = 23
and q = 41 for this example, but keep in mind that the real
numbers person A should use should be much larger.
2. Person A multiplies p and q together to get N = pq =
(23)(41) = 943. 943 is the "public key", which he tells to
person B (and to the rest of the world, if he wishes).
3. Person A also chooses another number e, which must be
relatively prime to (p - 1)(q - 1). In this case,
(p - 1)(q - 1) = (22)(40) = 880, so e = 7 is fine. e is also
part of the public key, so B is also told the value of e.
4. Now B knows enough to encode a message to A. Suppose,
for this example, that the message is the number M = 35.
5. B calculates the value of C = M^e (mod N) =
35^7 (mod 943).
6. 35^7 = 64339296875 and 64339296875 (mod 943) = 545.
The number 545 is the encoding that B sends to A.
7. Now A wants to decode 545. To do so, he needs to find a
number d such that ed ≡ 1 (mod (p - 1)(q - 1)), or in this
case, such that 7d ≡ 1 (mod 880). A solution is d = 503,
since 7 × 503 = 3521 = 4(880) + 1 ≡ 1 (mod 880).
8. To find the decoding, A must calculate C^d (mod N) =
545^503 (mod 943). This looks like it will be a horrible
calculation, and at first it seems like it is, but notice that
503 = 256 + 128 + 64 + 32 + 16 + 4 + 2 + 1 (this is just the
binary expansion of 503). So this means that
545^503 = 545^256 × 545^128 × ... × 545^1. But since we
only care about the result (mod 943), we can calculate all
the partial results in that modulus, and by repeated squaring
of 545, we can get all the exponents that are powers of 2.
For example, 545^2 (mod 943) = 545 × 545 =
297025 (mod 943) = 923. Then square again:
545^4 (mod 943) = (545^2)^2 (mod 943) = 923 × 923 =
851929 (mod 943) = 400, and so on. We obtain the
following table:

545^1 (mod 943) = 545
545^2 (mod 943) = 923
545^4 (mod 943) = 400
545^8 (mod 943) = 633
545^16 (mod 943) = 857
545^32 (mod 943) = 795
545^64 (mod 943) = 215
545^128 (mod 943) = 18
545^256 (mod 943) = 324

So the result we want is:

545^503 (mod 943) = 324 × 18 × 215 × 795 × 857 × 400 ×
923 × 545 (mod 943) = 35.

Using this tedious (but simple for a computer) calculation, A
can decode B's message and obtain the original message.
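The whole worked example can be checked in a few lines of code; the repeated-squaring routine below mirrors the table above (Python's built-in three-argument pow does the same computation internally):

```python
def modexp(base: int, exp: int, mod: int) -> int:
    """Square-and-multiply, as in the repeated-squaring table above."""
    result, square = 1, base % mod
    while exp:
        if exp & 1:  # this power of two appears in exp's binary expansion
            result = (result * square) % mod
        square = (square * square) % mod
        exp >>= 1
    return result

p, q = 23, 41
N = p * q                # 943, told to everyone
phi = (p - 1) * (q - 1)  # 880
e = 7                    # public exponent, relatively prime to 880
d = pow(e, -1, phi)      # 503, the private exponent (Python 3.8+)

M = 35                   # B's message
C = modexp(M, e, N)      # encode: 35^7 mod 943
print(C)                 # 545
print(modexp(C, d, N))   # 35, the original message recovered by A
```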


Key Length Comparison

ECC (base point)    RSA (modulus n)
106 bits            512 bits
132 bits            768 bits
160 bits            1024 bits
224 bits            2048 bits

Symmetric    ECC    DH/RSA
80           163    1024
128          283    3072
192          409    7680
256          571    15360

Our experiments with RSA and other cryptanalytic
algorithms show that SSL is a viable technology even for
today's mobile devices and wireless networks. By carefully
selecting and implementing a subset of the protocol's many
features, it is possible to ensure acceptable performance and
compatibility with a large installed base of secure Web
servers while maintaining a small memory footprint. Our
implementation brings mainstream security mechanisms,
trusted on the wired Internet, to wireless devices for the first
time.

The use of standard SSL ensures end-to-end security, an
important feature missing from current wireless
architectures. The latest version of J2ME MIDP
incorporating KSSL can be downloaded. In our ongoing
effort to further enhance cryptographic performance on
small devices, we plan to explore the use of smart cards as
hardware accelerators and Elliptic Curve Cryptography in
our implementations.
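The key-length comparison in this section can be kept as a small lookup table; the mapping below is transcribed from the figures above:

```python
# Comparable key sizes transcribed from the tables above:
# symmetric-key bits -> ECC base-point size and DH/RSA modulus size.
COMPARABLE_BITS = {
    80:  {"ecc": 163, "dh_rsa": 1024},
    128: {"ecc": 283, "dh_rsa": 3072},
    192: {"ecc": 409, "dh_rsa": 7680},
    256: {"ecc": 571, "dh_rsa": 15360},
}

def equivalent_rsa_modulus(symmetric_bits: int) -> int:
    """RSA/DH modulus size giving roughly the same strength."""
    return COMPARABLE_BITS[symmetric_bits]["dh_rsa"]

print(equivalent_rsa_modulus(128))  # 3072
```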



REFERENCES

1) R. L. Rivest, A. Shamir, and L. M. Adleman, "A method for obtaining digital signatures and public key cryptosystems," Communications of the ACM, 21 (1978), 120-126.
2) FIPS 186, "Digital Signature Standard," 1994.
3) W. Diffie and M. E. Hellman, "New directions in cryptography," IEEE Transactions on Information Theory, 22 (1976), 644-654.
4) J. Daemen and V. Rijmen, AES Proposal: Rijndael, AES Algorithm Submission, September 3, 1999.
5) J. Daemen and V. Rijmen, "The block cipher Rijndael," Smart Card Research and Applications, LNCS 1820, Springer-Verlag, pp. 288-296.
6) A. Frier, P. Kariton, and P. Kocher, "The SSL Protocol, Version 3.0."
7) D. Wagner and B. Schneier, "Analysis of the SSL 3.0 Protocol," 2nd USENIX Wksp. Electronic Commerce, 1996.
8) WAP Forum, "Wireless Transport Layer Security Specification."
9) A. Lee, NIST Special Publication 800-21, Guideline for Implementing Cryptography in the Federal Government, National Institute of Standards and Technology, Nov. 1999.
10) A. Menezes, P. van Oorschot, and S. Vanstone, Handbook of Applied Cryptography, CRC Press, New York, 1997.
11) J. Nechvatal et al., Report on the Development of the Advanced Encryption Standard (AES), National Institute of Standards and Technology, October 2, 2000.


A Survey on Pattern Recognition Algorithms For Face Recognition

N. Hema*, C. Lakshmi Deepika**
*PG Student
**Senior Lecturer
Department of ECE
PSG College of Technology
Coimbatore-641 004
Tamil Nadu, India.

Abstract- This paper discusses face recognition, which
refers to an automated or semi-automated process of
matching facial images. Since visual face recognition has
its own disadvantages, thermal face recognition is used.
The major advantage of using thermal infrared imaging is
to improve the face recognition performance. While
conventional video cameras sense reflected light, thermal
infrared cameras primarily measure emitted radiation
from objects such as faces [1]. Thermal infrared (IR)
imagery offers a promising alternative to visible face
recognition, as it is relatively insensitive to variations in
face appearance caused by illumination changes. The
fusion of visual and thermal face recognition can increase
the overall performance of face recognition systems.
Visual face recognition systems perform relatively well
under controlled illumination conditions. Thermal face
recognition systems are advantageous for detecting
disguised faces or when there is no control over
illumination. Thermal images of individuals wearing
eyeglasses may result in poor performance, since
eyeglasses block the infrared emissions around the eyes,
which are important features for recognition. By taking
advantage of both visual and thermal images, new fused
systems can be implemented combining low-level data
fusion and high-level decision fusion [4, 6]. This survey
was further carried out through neural networks and
support vector machines. Neural networks have been
applied successfully in many pattern recognition
problems, such as optical character recognition, object
recognition, and autonomous robot driving. The
advantage of using neural networks for face recognition
is the feasibility of training a system to capture face
patterns. However, one drawback of network
architectures is that they have to be extensively tuned
(number of layers, number of nodes, learning rates, etc.)
to get exceptional performance. Support Vector Machines
can also be applied to face detection [8]. Support vector
machines can be considered a new paradigm to train
polynomial functions or neural networks.

I. INTRODUCTION

Face recognition has developed over 30 years and is
still a rapidly growing research area due to increasing
demands for security in commercial and law enforcement
applications. Although face recognition systems have
reached a significant level of maturity with some practical
success, face recognition still remains a challenging problem
due to large variation in face images. Face recognition is
usually achieved through three steps: acquisition,
normalization and recognition. The acquisition can be
accomplished by digitally scanning an existing photograph
or by taking a photograph of a live subject [2].
Normalization includes the segmentation, alignment and
normalization of the face images. Finally, recognition
includes the representation and modeling of face images as
identities, and the association of novel face images with
known models. In order to realize such a system, acquisition,
normalization and recognition must be performed in a
coherent manner.

The thermal infrared (IR) spectrum comprises mid-wave
infrared (MWIR), ranging from 3-5 µm, and long-wave
infrared (LWIR), ranging from 8-12 µm, both longer than
the visible spectrum, which is from 0.4-0.7 µm. Thermal IR
imagery is independent of ambient lighting, since thermal IR
sensors only measure the heat emitted by objects [3]. The
use of thermal imagery has great advantages in poor
illumination conditions, where visual face recognition
systems often fail. It would be a highly challenging task to
solve those problems using visual images only.

II. VISUAL FACE RECOGNITION

A face is a three-dimensional object and can be
seen differently according to inside and outside elements.
Inside elements are expression, pose, and age, which make
the face appear different. Outside elements are brightness,
size, lighting, position, and other surroundings. Face
recognition typically uses a single image, or at most a few
images, of each person, and a major concern has been
scalability to large databases containing thousands of people.
Face recognition addresses the problem of identifying or
verifying one or more persons by comparing input faces with
the face images stored in a database [6].

While humans quickly and easily recognize faces
under variable situations or even after several years of
separation, the problem of machine face recognition is still a
highly challenging task in pattern recognition and computer
vision. Face recognition in outdoor environments is a
challenging task, especially where illumination varies
greatly. Performance of visual face recognition is sensitive
to variations in illumination conditions. Since faces are
essentially 3D objects, lighting changes can cast significant
shadows on a face. This is one of the primary reasons why
current face recognition technology is constrained to indoor
access control applications where illumination is well
controlled. Light reflected from human faces also varies
significantly from person to person. This variability, coupled
with dynamic lighting conditions, causes a serious problem.

Face recognition can be classified into two broad
categories: feature-based and holistic methods. The analytic
or feature-based approaches compute a set of geometrical
features from the face, such as the eyes, nose, and mouth.
The holistic or appearance-based methods consider the
global properties of the human face pattern.

Data reduction and feature extraction schemes
make the face recognition problem computationally
tractable. Some of the commonly used methods for visual
face recognition are as follows.

NEURAL NETWORK BASED FACE RECOGNITION

A neural network can be used to detect frontal views
of faces. Each network is trained to provide the output as the
presence or absence of a face [9]. Here the training methods
are designed to be general, with little customization for
faces. Many face detection methods have used the idea that
facial images can be characterized directly in terms of pixel
intensities. The neural network-based face detection method
uses a retinally connected neural network that examines
small windows of an image and decides whether each
window contains a face. It arbitrates between multiple
networks to improve performance over a single network.

Training a neural network for the face detection
task is challenging because of the difficulty in characterizing
prototypical "no face" images. The two classes to be
discriminated in face detection are "images containing
faces" and "images not containing faces". It is easy to get a
representative sample of images which contain faces, but
much harder to get a representative sample of those which
do not contain faces.

A NEURAL BASED FILTER

This approach contains a set of neural network-based
filters for an image, and then uses an arbitrator to combine
the outputs. The filters examine each location in the image at
several scales, looking for locations that might contain a
face. The arbitrator then merges detections from individual
filters and eliminates overlapping detections. The filter
receives as input a 20x20 pixel region of the image, and
generates an output ranging from 1 to -1, signifying the
presence or absence of a face, respectively [12]. To detect
faces anywhere in the input, the filter is applied at every
location in the image. To detect faces larger than the window
size, the input image is repeatedly reduced in size (by
sub-sampling), and the filter is applied at each size [13].

SUPPORT VECTOR MACHINE

Among the existing face recognition techniques,
subspace methods are widely used in order to reduce the
high dimensionality of the face image. Much research has
been done on how they can represent expressions.
The Karhunen-Loeve Transform (KLT) is used to
produce the most expressive subspace for face representation
and recognition. Linear discriminant analysis (LDA), or
Fisherface, is an example of the most discriminating
subspace methods. It seeks a set of features that best
separates the face classes. Another important subspace
method is the Bayesian algorithm using a probabilistic
subspace; unlike other subspace techniques, which classify
the test face image into M classes of M individuals, the
Bayesian algorithm casts the face recognition problem into a
binary pattern classification problem. The aim of the training
of the SVMs is to find the hyperplane (if the classes are
linearly separable) or the surfaces which separate the six
different classes [8].

CELLULAR NEURAL NETWORK

Cellular neural networks or cellular nonlinear
networks (CNN) provide an attractive paradigm for very
large-scale integrated (VLSI) circuit architecture in
applications devoted to pixel-parallel image processing. The
resistive-fuse network is well known as an effective model
for image segmentation, and some analog circuits
implementing it have been proposed. Gabor filtering is an
effective method for extracting the features of images, and it
is known that such filtering is used in the human vision
system. A flexible face recognition technique using this
method has also been proposed [19]. To implement
Gabor-type filters using analog circuits, CNN models have
been proposed. A pulse-width modulation (PWM) approach
is used for achieving time-domain analog information
processing. The pulse signals have digital values in the
voltage domain and analog values in the time domain. The
PWM approach is suitable for the large-scale integration of
analog processing circuits because it matches the scaling
trend in Si CMOS technology and leads to low-voltage
operation [20]. It also has high controllability and allows
highly effective matching with ordinary digital systems.

III. THERMAL FACE RECOGNITION

Face recognition in the thermal infrared domain has
received relatively little attention when compared to visible
face recognition. Identifying faces from different imaging
modalities, in particular infrared imagery, has become an
area of growing interest.

THERMAL CONTOUR MATCHING

Thermal face recognition extracts and matches
thermal contours for identification. Such techniques include
elemental shape matching and the eigenface method.
Elemental shape matching techniques use the elemental
shape of thermal face images. Several different closed
thermal contours can be observed in each face. The sets of
shapes are unique for each individual because they result
from the underlying complex network of blood vessels.
Variations in defining the thermal slices from one image to
another have the effect of shrinking or enlarging the
resulting shapes, but the centroid location and other features
of the shapes remain constant.

A NON-ITERATIVE ELLIPSE FITTING ALGORITHM

Ellipses are often used in face-recognition
technology, such as face detection and other facial
component analysis. The use of an ellipse can be a powerful
representation of certain features around the faces in the
thermal images. The general equation of a conic can be
represented as

F(A, T) = A·T = ax^2 + bxy + cy^2 + dx + ey + f

where A = [a, b, c, d, e, f] and T = [x^2, xy, y^2, x, y, 1]^T.
Commonly used conic fitting methods minimize the
algebraic distance in terms of least squares. The
minimization can be solved by a generalized eigenvalue
system, which can be denoted as


S·A = λ·C·A

where D = [X1, X2, ..., Xn]^T is called the design matrix,
S = D^T·D is called the scatter matrix, and C is a constant
constraint matrix. Least-squares conic fitting was commonly
used for fitting ellipses, but it can lead to other conics [6].
The non-iterative ellipse-fitting algorithm that yields the best
least-squares ellipse fit has a low eccentricity bias, is
affine-invariant, and is extremely robust to noise.
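A plain least-squares conic fit of the kind discussed above can be sketched with NumPy. The SVD-based minimization and the b^2 - 4ac ellipse test are standard; the ellipse-specific generalized-eigenvalue formulation of the non-iterative algorithm is not reproduced here.

```python
import numpy as np

def fit_conic(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares conic fit: minimize ||D A|| over unit vectors A,
    where each row of the design matrix D is [x^2, xy, y^2, x, y, 1].
    As noted above, this plain fit can return any conic, not only an ellipse."""
    D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    _, _, Vt = np.linalg.svd(D)
    return Vt[-1]  # right singular vector for the smallest singular value

# Sample points on the ellipse x^2/16 + y^2/9 = 1.
t = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
a, b, c, d, e, f = fit_conic(4.0 * np.cos(t), 3.0 * np.sin(t))
print(b * b - 4.0 * a * c < 0.0)  # True: the fitted conic is an ellipse
```

For noise-free points on an ellipse the recovered coefficients are proportional to the true conic, so the discriminant test b^2 - 4ac < 0 confirms an ellipse was found.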


IV. FUSION OF VISUAL AND THERMAL IMAGES

There are several motivations for using fusion:
utilizing complementary information can reduce error rates,
and the use of multiple sensors can increase reliability. The
fusion can be performed using pixel-based fusion in the
wavelet domain and feature-based fusion in the eigenface
domain.
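As an illustration, the pixel-level weighted fusion used in this section, with the weights constrained to sum to 1, can be sketched in a few lines of NumPy; the array shapes and values are made up:

```python
import numpy as np

def fuse_pixelwise(visual: np.ndarray, thermal: np.ndarray,
                   a: float = 0.5) -> np.ndarray:
    """F(x,y) = a*V(x,y) + b*T(x,y) with b = 1 - a, so the weights sum to 1."""
    return a * visual.astype(float) + (1.0 - a) * thermal.astype(float)

# Toy 2x2 "images": equal weights give the pixel-wise average.
V = np.array([[0, 100], [200, 50]])
T = np.array([[100, 0], [100, 150]])
print(fuse_pixelwise(V, T))  # pixel-wise average: [[50, 50], [150, 100]]
```

With a = b = 0.5 this is exactly the average-intensity fusion discussed later in this section; spatially varying weights a(x,y) would replace the scalar a.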


Fusion in the eigenspace domain involves
combining the eigen features from the visible and IR images. Figure 1: A data fusion example. (a) Visual image, (b)
Specifically, first we compute two eigen spaces, one using thermal image, and (c) fused
the visible face images and the other using the IR face Image.
images. Then, each face is represented by two sets of eigen
features, the first computed by projecting the IR face image V.CONCLUSION
in the IR-eigenspace, and the second by projecting the In this paper fusion of visual thermal images was
visible face image in the visible-eigenspace. Fusion is discussed and the various methods such as neural networks
performed by selecting some eigen features from the IR- and support vector machine for recognition purpose were
eigenspace and some from the visible-eigenspace. discussed. Till now the cellular neural network was applied
only for visual face recognition [20]. But there are effective
PIXEL BASED FUSION IN WAVELET DOMAIN IR cameras which can take thermal image irrespective of the
Fusion in the wavelet domain involves combining the wavelet coefficients of the visible and IR images. To fuse the visible and IR images, we select a subset of coefficients from the IR image and the rest from the visible image. The fused image is obtained by applying the inverse wavelet transform to the selected coefficients.
The fusion can also be done by pixel-wise weighted summation of the visual and thermal images:

  F(x,y) = a(x,y)V(x,y) + b(x,y)T(x,y)

where F(x,y) is the fused output, V(x,y) a visual image and T(x,y) a thermal image, while a(x,y) and b(x,y) represent the weighting factor of each pixel. A fundamental problem is which modality should receive more weight at each pixel. This can be answered if we know the illumination direction affecting the face in the visual images and the other variations affecting the thermal images. Illumination changes in the visual images and facial variations after exercise in the thermal images are among the challenging problems in face recognition technology [14]. Instead of finding each weight, we make use of the average of both modalities, constraining the weighting factors so that a(x,y) + b(x,y) = 1.0.
The average of the visual and thermal images can compensate for variations in each other, although this is not a perfect way to achieve data fusion. Figure 1 shows a fused image based on average intensity: (a) visual image, (b) thermal image, (c) fused image.
... surrounding conditions. So we propose that the same can be used for thermal face recognition to get effective results.

REFERENCES

[1] Y. Adini, Y. Moses, and S. Ullman, "Face Recognition: The Problem of Compensating for Changes in Illumination Direction," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, pp. 721-732, 1997.
[2] P. J. Phillips, P. Grother, R. J. Micheals, D. M. Blackburn, E. Tabassi, and M. Bone, "Face Recognition Vendor Test 2002," Evaluation Report, National Institute of Standards and Technology, pp. 1-56, 2003.
[3] M. S. Bartlett, J. R. Movellan, and T. J. Sejnowski, "Face recognition by independent component analysis," IEEE Trans. Neural Networks, Vol. 13, No. 6, pp. 1450-1464, 2002.
[4] Y. Yoshitomi, T. Miyaura, S. Tomita, and S. Kimura, "Face identification using thermal image processing," Proc. IEEE Int. Workshop on Robot and Human Communication, pp. 374-379, 1997.
[5] J. Wilder, P. J. Phillips, C. Jiang, and S. Wiener, "Comparison of Visible and Infrared Imagery for Face Recognition," Proc. Int. Conf. Automatic Face and Gesture Recognition, pp. 182-187, 199.
[6] J. Heo, B. Abidi, S. Kong, and M. Abidi, "Performance Comparison of Visual and Thermal Signatures for Face Recognition," Biometric Consortium, Arlington, VA, Sep. 2003.
[7] Y.I. Tian, T. Kanade, J.F. Cohn, Recognizing action
units for facial expression analysis, IEEE Trans. Patt. Anal. Mach. Intell. 23 (2) (2001) 97–115.
[8] J. Wan, X. Li, PCB infrared thermal imaging diagnosis using support vector classifier, Proc. World Congr. Intell. Control Automat. 4 (2002) 2718–2722.
[9] Y. Yoshitomi, N. Miyawaki, S. Tomita, S. Kimura, Facial expression recognition using thermal image processing and neural network, Proc. IEEE Int. Workshop Robot Hum. Commun. (1997) 380–385.
[10] E. Hjelmas, B.K. Low, Face detection: a survey, Comput. Vis. Image Und. 83 (3) (2001) 236–274.
[11] M.H. Yang, D.J. Kriegman, N. Ahuja, Detecting faces in images: a survey, IEEE Trans. Patt. Anal. Mach. Intell. 24 (1) (2002) 34–58.
[12] H.A. Rowley, S. Baluja, T. Kanade, Neural network-based face detection, IEEE Trans. Patt. Anal. Mach. Intell. 20 (1) (1998) 23–38.
[13] A fast and accurate face detector based on neural networks, IEEE Trans. Patt. Anal. Mach. Intell. 23 (1) (2001) 42–53.
[14] D. Socolinsky, L. Wolff, J. Neuheisel, C. Eveland, Illumination invariant face recognition using thermal infrared imagery, Comput. Vision Pattern Recogn. 1 (2001).
[15] X. Chen, P. Flynn, K. Bowyer, Visible-light and infrared face recognition, in: Proc. Workshop on Multimodal User Authentication, 2003, pp. 48–55.
[16] K. Chang, K. Bowyer, P. Flynn, Multi-modal 2D and 3D biometrics for face recognition, in: IEEE Internat. Workshop on Analysis and Modeling of Faces and Gestures, 2003, pp.
[17] R.D. Dony, S. Haykin, Neural network approaches to image compression, Proc. IEEE 83 (2) (1995) 288–303.
[18] Dowdall, J., Pavlidis, I., Bebis, G.: A face detection method based on multiband feature extraction in the near-IR spectrum. In: Proceedings IEEE Workshop on Computer Vision Beyond the Visible Spectrum: Methods and Applications, Kauai, Hawaii (2002).
[19] T. Morie, S. Sakabayashi, H. Ando, A. Iwata, "Pulse Modulation Circuit Techniques for Nonlinear Dynamical Systems," in Proc. Int. Symp. on Nonlinear Theory and its Applications (NOLTA'98), pp. 447–450, Crans-
[20] Takashi Morie, Makoto Miyake, Seiichi Nishijima, Makoto Nagata, and Atsushi Iwata, A Multi-Functional Cellular Neural Network Circuit Using Pulse Modulation Signals for Image Recognition, Faculty of Engineering, Hiroshima University, Higashi-Hiroshima, 739-8527, Japan.


Performance Analysis of Impulse Noise Removal Algorithms for

Digital Images
K. Uma1, V. R. Vijaya Kumar2
1PG Student, 2Senior Lecturer
Department of ECE, PSG College of Technology

Abstract:- In this paper, three different impulse noise removal algorithms are implemented and their performances analysed. The first algorithm uses an alpha-trimmed mean based approach to detect the impulse noise. The second algorithm follows the principle of the multi-state median filter. The third algorithm works on the principle of thresholding. Experimental results show that these algorithms are capable of removing impulse noise effectively compared with many of the standard filters, in terms of both quantitative and qualitative analysis.

The acquisition or transmission of digital images through sensors or communication channels is often corrupted by impulse noise. It is very important to eliminate noise in images before subsequent processing such as image segmentation, object recognition, and edge detection. Two common types of impulse noise are salt-and-pepper noise and random-valued impulse noise. A large number of techniques have been proposed to remove impulse noise from corrupted images. Many existing methods use an impulse detector to determine whether a pixel should be modified. In images corrupted by salt-and-pepper noise, the noisy pixels can take only the maximum and minimum values. The median filter [6] was once the most popular nonlinear filter for removing impulse noise because of its good denoising power and computational efficiency. However, when the noise level is over 50%, some details and edges of the original image are smeared by the filter. Different remedies of the median filter have been proposed, e.g. the adaptive median filter and the multi-state median filter. Switching is another strategy: identify the noisy pixels and then replace them using the median filter or its variants. These filters are good at detecting noise even at a high noise level, but the main drawback of the median filter remains: details and edges are not recovered satisfactorily, especially when the noise level is high. The NASM filter [4] achieves performance fairly close to that of the ideal switching median filter. The weighted median filter controls the filtering in order to preserve signal details; in the centre weighted median filter, only the centre pixel of the filtering window carries a weighting factor. Filtering should be applied to corrupted pixels only, leaving the uncorrupted ones untouched; switching-based median filter [4] methodologies therefore apply no filtering to true pixels and the standard median filter to impulse pixels. The mean filter, rank filter and alpha-trimmed mean filter are also used to remove impulse noise.

The alpha-trimmed mean based approach [1] is used to detect the impulse noise. This algorithm consists of three steps: impulse noise detection, refinement, and impulse noise cancellation, which replaces the values of identified noisy pixels with the median value.

A. IMPULSE NOISE DETECTION

Let I denote the corrupted, noisy image of size l1 x l2, and let Xij be its pixel value at position (i, j). Let Wij denote the window of size (2Ld + 1) x (2Ld + 1) centered about Xij. The alpha-trimmed mean is

  Mij^a(I) = ( 1 / (t − 2⌊at⌋) ) Σ_{i = ⌊at⌋+1}^{t − ⌊at⌋} X(i),   t = (2Ld + 1)^2

where a (alpha) is the trimming parameter, taking values between 0 and 0.5, and X(i) represents the i-th data item in the increasingly ordered samples of Wij, i.e. x(1) ≤ x(2) ≤ ... ≤ x(t); that is, X(i) = i-th smallest(Wij(I)).

The alpha-trimmed mean Mij^a(I), with appropriately chosen alpha, approximately represents the average of the noise-free pixel values within the window Wij(I). The absolute difference between Xij and Mij^a(I),

  rij = | Xij − Mij^a(I) |,

should be relatively large for a noisy pixel and small for a noise-free pixel.

First, when the pixel Xij is an impulse, it takes a value substantially larger than or smaller than those of its neighbors. Second, when the pixel Xij is a noise-free pixel, which could belong to a flat region, an edge, or even a thin line, its value will be very similar to those of some of its neighbors. Therefore, we can separate image details from noisy pixels by counting the number of pixels whose values are similar to that of Xij in its local window:

  δ_{i−u,j−v} = 1 if | x_{i−u,j−v} − x_ij | < T, and 0 otherwise

where T is a predetermined parameter; δ_{i−u,j−v} = 1 indicates that pixel x_{i−u,j−v} is similar to pixel x_ij. Then

  ξij = Σ_{−Ld ≤ u,v ≤ Ld} δ_{i−u,j−v}

denotes the number of neighbouring pixels that are similar to x_ij.
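The detection rule above can be sketched in Python with NumPy. This is a hypothetical implementation for illustration only; the window size Ld, trimming parameter alpha, similarity threshold T and count threshold N are illustrative values, not the ones used in [1].

```python
import numpy as np

def alpha_trimmed_detect(img, Ld=1, alpha=0.2, T=20, N=3):
    """Flag impulse-noise candidates per the alpha-trimmed mean rule.

    Returns r*phi: 0 for pixels judged noise-free, |x - M_alpha| otherwise.
    Border pixels are left unflagged for simplicity.
    """
    H, W = img.shape
    t = (2 * Ld + 1) ** 2
    trim = int(alpha * t)                 # samples trimmed at each end
    flag = np.zeros((H, W))
    for i in range(Ld, H - Ld):
        for j in range(Ld, W - Ld):
            win = img[i - Ld:i + Ld + 1, j - Ld:j + Ld + 1].ravel()
            srt = np.sort(win)
            m_alpha = srt[trim:t - trim].mean()       # alpha-trimmed mean
            r = abs(float(img[i, j]) - m_alpha)       # deviation from it
            xi = np.sum(np.abs(win - img[i, j]) < T) - 1  # similar neighbours
            phi = 0 if xi >= N else 1                 # 0 -> noise-free
            flag[i, j] = r * phi
    return flag
```

A lone bright pixel in a flat region gets a large flag value, while its neighbours (which have many similar pixels in their windows) stay at zero.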
The flag for each pixel is then

  φij = 0 if ξij ≥ N, and 1 otherwise

where N is a predetermined parameter; φij = 0 indicates that x_ij is a noise-free pixel. The detection map is

  R(1)ij = rij x φij,

which is 0 for pixels judged noise-free and rij for impulse candidates.

Fig 1 Impulse noise detection: (a) image corrupted by 20% fixed-value impulse noise; (b) absolute difference image; (c) binary flag; (d) product of binary flag and absolute difference image; (e) restored image.

The second algorithm falls under median based switching schemes and is called the multi-state median (MSM) filter [2]. By using simple thresholding logic, the output of the MSM filter [5] is adaptively switched among those of a group of center weighted median (CWM) filters that have different center weights. As a result, the MSM filter is equivalent to an adaptive CWM filter [6] with a space-varying center weight that depends on local signal statistics. The efficacy of this filter has been evaluated by extensive simulations.

Let Sij and Xij denote the intensity values of the original image and the observed noisy image, respectively, at pixel location (i, j). The output of a CWM filter, in which a weight adjustment is applied to the center pixel Xij within a sliding window, can be defined as

  Yij = median( Xij^w ),   Xij^w = { X_{i−s,j−t}, w◊Xij }

where ◊ denotes repetition: the center sample is repeated w times, so for a 3x3 window the median is computed over 8 + w samples, w being the centre weight. The output of a CWM filter with center weight w can also be represented through order statistics as

  Yij^w = median{ Xij(k), Xij, Xij(N + 1 − k) },   k = (N + 2 − w)/2,

where Xij(k) denotes the k-th smallest sample in the window. CWM filters with different center weights have different capabilities of suppressing noise and preserving details; switching among them can be realized by a simple thresholding operation as follows. For the current pixel Xij, we first define the differences

  d_w = | Yij^w − Xij |,   w = 1, 3, 5, ..., N − 2.
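The CWM definition above (center sample repeated w times, median over 8 + w samples for a 3x3 window) can be sketched as follows; this is an illustrative helper, applied here to every pixel rather than only to detected impulses.

```python
import numpy as np

def cwm_filter(img, w=3, Ld=1):
    """Center weighted median: the center pixel of each window is
    repeated w times before taking the median, giving 8 + w samples
    for a 3x3 window. w is the (odd) centre weight."""
    H, W = img.shape
    out = img.astype(float).copy()
    for i in range(Ld, H - Ld):
        for j in range(Ld, W - Ld):
            win = img[i - Ld:i + Ld + 1, j - Ld:j + Ld + 1].ravel().tolist()
            win.remove(img[i, j])              # drop one copy of the center sample
            out[i, j] = np.median(win + [img[i, j]] * w)
    return out
```

With w = 1 this reduces to the standard median filter (an impulse at the center is removed); a large w favours the center pixel and preserves it.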

R(1) retains the impulse noise and removes the image details. Next, a fuzzy impulse detection technique is applied to each pixel: a fuzzy flag measures how much the pixel is corrupted. Noisy pixels are located near one of the two extremes of the ordered samples; based on this observation, a refinement of the fuzzy flag can be generated, and the impulse noise can then be effectively cancelled. Compared with the median filter, this method shows better performance, and it also removes random-valued impulse noise. To demonstrate its superior performance, extensive experiments have been conducted on a variety of standard test images, comparing this method with many other well-known techniques.

Fig 2 Space variant median filter: (a) noisy image (20% impulse noise); (b) restored image.

The differences d_w provide information about the likelihood of corruption of the current pixel Xij. For instance, consider the difference d_{N−2}: if this value is large, then the current pixel is not only the smallest or the largest one among the observation samples, but very likely contaminated by impulse noise. If d_1 is small, the current pixel may be regarded as noise-free and kept unchanged in the filtering. Together, the differences d_1 through d_{N−2} reveal even more information about the presence of a corrupted pixel, and a classifier based on the differences d_w is employed to estimate the likelihood of the current pixel being contaminated. An attractive merit of the MSM filtering technique is that it provides an adaptive mechanism to detect the likelihood of a pixel being corrupted by impulses. As a result, it satisfactorily trades off detail preservation against noise removal by adjusting the center weight of the CWM filtering according to the local signal characteristics. Furthermore, it possesses a simple computation structure for implementation.

Fig 3 Multiple thresholds: (a) noisy image (20%); (b) restored image.
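The order-statistic form of the CWM output and the corruption evidence d_w can be sketched with a small helper. This is an illustrative sketch only; the window is flattened row-major with the center sample at index N//2, and no particular thresholds from [2] are assumed.

```python
import numpy as np

def cwm_order_stat(window, w):
    """CWM output via order statistics:
    Y^w = median{ x_(k), x, x_(N+1-k) },  k = (N + 2 - w) / 2,
    where x_(k) is the k-th smallest of the N window samples and x is
    the center sample."""
    window = np.asarray(window, dtype=float)
    N = len(window)
    x = window[N // 2]
    srt = np.sort(window)
    k = (N + 2 - w) // 2
    return float(np.median([srt[k - 1], x, srt[N - k]]))  # 1-based -> 0-based

def corruption_evidence(window):
    """Differences d_w = |Y^w - x| for w = 1, 3, ..., N-2.
    A large d_{N-2} suggests the center pixel is an impulse."""
    window = np.asarray(window, dtype=float)
    N = len(window)
    x = window[N // 2]
    return {w: abs(cwm_order_stat(window, w) - x) for w in range(1, N - 1, 2)}
```

For a 3x3 window (N = 9), w = 1 recovers the standard median; an impulse at the center produces large differences for every w, while an impulse among the neighbours of a clean center pixel leaves d_{N−2} at zero.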


A novel decision-based filter, called the multiple thresholds switching (MTS) filter [3], is used to restore images corrupted by salt-and-pepper impulse noise. The filter is based on a detection-estimation strategy: the impulse detection algorithm is applied before the filtering process, and therefore only the noise-corrupted pixels are replaced with the estimated central noise-free ordered mean value in the current filter window. The new impulse detector, which uses multiple thresholds with multiple neighborhood information of the signal in the filter window, is very precise, while avoiding an undue increase in computational complexity. To avoid damage to good pixels, decision-based median filters realized by thresholding operations have been introduced.

In general, the decision-based filtering procedure consists of the following two steps: an impulse detector that classifies the input pixels as either noise-corrupted or noise-free, and a noise reduction filter that modifies only those pixels classified as noise-corrupted. The main issue in the design of the decision-based median filter is how to extract features from the local information and establish the decision rule, in such a way as to distinguish noise-free pixels from contaminated ones as precisely as possible. In addition, to achieve high noise reduction with fine detail preservation, it is also crucial to apply the optimal threshold value to the local signal statistics; usually a trade-off exists between noise reduction and detail preservation.

The MTS filter takes a new impulse detection strategy to build the decision rule and apply the threshold function. The detection approach considers multiple neighborhood information of the filter window, under multiple thresholds, to judge whether impulse noise exists. Extensive experimental results demonstrate that the new filter is capable of preserving more details while effectively suppressing impulse noise in corrupted images.

COMPARISONS

Fig 4 Performance comparisons between various filters (PSNR versus noise density).

III. CONCLUSION

In this paper the removal of impulse noise was discussed, along with the detection of impulse noise by various methods such as the fuzzy flag, noise refinement, and a classifier. Restoration performance is quantitatively measured by the peak signal-to-noise ratio (PSNR), MAE, and MSE. Impulse noise detection by the alpha-trimmed mean approach provides a significant improvement over other state-of-the-art methods. Among the various impulse noise removal algorithms, the alpha-trimmed mean based approach yields better PSNR when compared to the other algorithms.

REFERENCES

[1] Wenbin Luo (2006), 'An Efficient Detail-Preserving Approach for Removing Impulse Noise in Images', IEEE Signal Processing Letters, Vol. 13, No. 7, pp. 413-416.
[2] Tao Chen and Hong Ren Wu (2001), 'Space Variant Median Filters for the Restoration of Impulse Noise Corrupted Images', IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, Vol. 48, No. 8, pp. 784-789.
[3] Raymond Chan (2006), 'Salt-Pepper Impulse Noise Detection and Removal Using Multiple Thresholds for Image Restoration', Journal of Information Science and Engineering, Vol. 22, pp. 189-198.
[4] How-Lung Eng and Kai-Kuang Ma (2000), 'Noise Adaptive Soft-Switching Median Filter for Image Denoising', IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 6, pp. 2175-2178.
[5] Tao Chen, Kai-Kuang Ma, and Li-Hui Chen (1999), 'Tri-state median filter for image denoising', IEEE Transactions on Image Processing, Vol. 8, No. 12.
[6] J. Astola and P. Kuosmanen (1997), Fundamentals of Nonlinear Digital Filtering. Boca Raton, FL: CRC Press.


Confidentiality in Composition of Clutter Images

G. Ignisha Rajathi, M.E. II Year, Department of CSE, Francis Xavier Engg College, TVL.
Ms. S. Jeya Shobana, M.E., Lecturer, Department of CSE, Francis Xavier Engg College, TVL.

Abstract- In this whirlpool world, conveying highly confidential information secretly has become one of the important aspects of living. With increasing distances, communication through computer technology has made wide coverage simple nowadays. An example of hidden communication is STEGANOGRAPHY. The outdated trend of hiding information (secrets) behind text has now given way to hiding the secrets behind clutter images: changing the appearance of the picture is preferable to changing its features.
This paper combines two processes. A simple OBJECT IMAGE is used for the steganography process, which is done based on the F5 algorithm. The prepared stego images are placed on the BACKGROUND IMAGE; that is COLLAGE STEGANOGRAPHY. Here the patchwork is done by changing the type of each object as well as its location. An increased number of images leads to an increased amount of information hiding.

The F5 algorithm is different in that it uses subtraction and matrix encoding to embed data into the (DCT) coefficients.

II. PROPOSED SYSTEM

1. Select the image for stego image preparation.
2. Generate the stego image using the F5 algorithm.
3. Select the background image.
4. Embed the stego image in the background image (collage steganography).
5. Finally, extract it.

A COMPLETE SYSTEM DESIGN
SENDER SIDE :

Keywords : Steganography, Collage - Patchwork, Information hiding, Package, Stego image, Steganalysis.

The word steganography is a Greek word that means 'writing in hiding'. The main purpose is to hide data in a cover medium so that others will not notice it. Steganography has found use in military, diplomatic, personal and intellectual applications. A major distinction of this method from the other methods is that, for example, in cryptography individuals see the encoded data and notice that such data exists, even though they cannot comprehend it. In steganography, however, individuals will not notice at all that data exists in the sources. Most steganography work has been performed on images, video clips, text, music and sound.

Among the methods of steganography, the most common one is to use images. In these methods, features such as the pixels of the image are changed in order to hide the information so as not to be identifiable by human users, and the changes applied to the image are not tangible. Methods of steganography in images are usually applied to the structural features of the image. Ways to identify steganographic images and how to extract such information have usually been discovered, while so far no method has applied steganography to the appearance of images.

This paper provides a new method for steganography in images by applying changes to the appearance of the image and putting relevant pictures on a background; then, depending on the location and mode, data is hidden. The F5 algorithm is used for the preparation of the stego image.

STEGO IMAGE PREPARATION
USUAL METHOD ( USING LSB ) :
LSB in BMP – A BMP is capable of hiding quite a large message, but the fact that more bits are altered results in a larger possibility that the altered bits can be seen with the human eye, i.e., it creates suspicion when transmitted between parties.
Suggested applications: LSB in BMP is most suitable for applications where the focus is on the amount of information to be transmitted and not on the secrecy.
LSB in GIF – GIF images are especially vulnerable to statistical or visual attacks, since the palette processing that has to be done leaves a very definite signature on the image. This approach is dependent on the file format as well as the image itself.
Suggested applications: LSB in GIF is a very efficient algorithm to use when embedding data in a grayscale image.
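The LSB replacement idea above can be sketched generically. This is an illustrative sketch over a raw byte sequence; a real BMP embedder would skip the file header and work only on the pixel array.

```python
def lsb_embed(cover, payload_bits):
    """Classic LSB replacement: overwrite the least significant bit of
    successive cover bytes with the message bits."""
    assert len(payload_bits) <= len(cover), "cover too small for the message"
    stego = bytearray(cover)
    for i, bit in enumerate(payload_bits):
        stego[i] = (stego[i] & 0xFE) | (bit & 1)
    return bytes(stego)

def lsb_extract(stego, n_bits):
    """Read the hidden bits back from the first n_bits bytes."""
    return [b & 1 for b in stego[:n_bits]]
```

Each cover byte changes by at most 1, which is visually negligible but, as the text notes, statistically detectable; this motivates the F5 approach described next.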
PROPOSED METHOD ( USING F5 ) :

The new method, evolved with more functionality to override all the other techniques, is the F5 steganographic algorithm. It provides more security and is found to be more efficient in its performance, making it an ideal choice among the various techniques for preparing stego images. The term "F5" represents five functions:
1. Discrete Cosine Transformation
2. Quantization (Quality)
3. Permutation (Password-Driven)
4. Embedding Function (Matrix Encoding)
5. Huffman Encoding.

SYSTEM FLOW DIAGRAM – EMBEDDER ( F5 ALGORITHM ) :

SIZE VERIFICATION ( STEGO IMG ) :
While choosing the object image, it has to be noted that the size of the text file should always be less than the size of the object image, so that it fits in the object image for the hiding process. For example, a text file of 1 KB can easily fit into an object image of 147 x 201 JPEG size.

F5 ALGORITHM :

Input: msg, shared secret, cover img
Output: stego image

  initialize PRNG with shared secret
  permutate DCT co-eff with PRNG
  determine k from image capacity
  calculate code word length n = 2^k − 1
  while data left to embed do
    get next k-bit message block
    repeat
      G = {n non-zero AC co-eff}
      s = k-bit hash f of LSBs in G
      s = s ⊕ k-bit message block
      if s ≠ 0 then
        decrement absolute value of DCT coefficient G_s
        insert G_s into stego img
      end if
    until s = 0 or G_s ≠ 0
    insert DCT coefficients from G into stego image
  end while

The F5 algorithm is used as follows. Instead of replacing the LSB of a DCT coefficient with message data, F5 decrements its absolute value in a process called matrix encoding. So there is no coupling of any fixed pair of DCT coefficients.

Matrix encoding computes an appropriate (1, 2^k − 1, k) Hamming code by calculating the message block size k from the message length and the number of nonzero non-DC coefficients. The Hamming code (1, 2^k − 1, k) encodes a k-bit message word m into an n-bit code word a, with n = 2^k − 1. F5 uses the decoding function f(a) = ⊕_{i=1}^{n} a_i · i and the Hamming distance d. In other words, we can find a suitable code word a′ for every code word a and every message word m so that m = f(a′) and d(a, a′) ≤ 1. Given a code word a and a message word m, we calculate the difference s = m ⊕ f(a) and get the new code word a′ as a itself if s = 0, and otherwise as a with its s-th bit inverted.

First, the DCT coefficients are permutated by a keyed pseudo-random number generator (PRNG), then arranged into groups of n while skipping zeros and DC coefficients. The message is split into k-bit blocks. For every message block m, we get an n-bit code word a by concatenating the least significant bits of the current coefficients' absolute values. If the message block m and the decoding f(a) are the same, the message block can be embedded without any changes; otherwise, we use s = m ⊕ f(a) to determine which coefficient needs to change. If the coefficient becomes zero, shrinkage happens, and it is discarded from the coefficient group. The group is filled with the next nonzero coefficient and the process repeats until the message block can be embedded. For smaller messages, matrix encoding lets F5 reduce the number of changes to the image; for example, for k = 3, every change embeds 3.43 message bits while the total code size more than doubles. Because F5 decrements DCT coefficients, the sum of adjacent coefficients is no longer invariant.

Steganographic interpretation:
– Positive coefficients: LSB
– Negative coefficients: inverted LSB
Skip 0, adjust coefficients to the message bit:
– Decrement positive coefficients
– Increment negative coefficients
– Repeat if shrinkage occurs
[Straddling] Permutation equalizes the change density and scatters the message more uniformly:
– Key-driven distance schemes
– Parity block schemes
Independent of message length.
[Matrix Encoding] Embed k bits by changing one of n = 2^k − 1 places:

k   n    change density   embedding rate   efficiency
1   1    50 %             100 %            2
2   3    25 %             66.7 %           2.7
3   7    12.5 %           42.9 %           3.4
4   15   6.25 %           26.7 %           4.3

To determine the type and location of each object, first convert the input text to an array of bits. Then calculate the number of possible modes for the first object; e.g., there were 4,000 modes for the airplane. Find the largest power of 2 below the number of modes and read that many bits from the input array: the closest power of 2 below 4,000 is 2^11 = 2048, so we read 11 bits of the input array. For example, if the 11 obtained bits are 00001100010, the number is 98. Now, to find the location and type of the object, divide the obtained number by the number of types of the object: dividing 98 by 4 (types), the remainder is 2, so the airplane type is military.

This stego image preparation moves on to the next step, collage preparation.
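The matrix-encoding step of the F5 algorithm described above (the hash f(a), the difference s = m ⊕ f(a), and the single bit change) can be sketched as follows. Note this sketch flips a bit in the extracted LSB vector for clarity; real F5 instead decrements the corresponding coefficient's absolute value and handles shrinkage.

```python
def f_hash(bits):
    """F5 decoding function f(a): XOR of the (1-based) indices i
    where bit a_i = 1."""
    s = 0
    for i, b in enumerate(bits, start=1):
        if b:
            s ^= i
    return s

def matrix_encode(code_bits, message, k):
    """(1, n, k) matrix encoding: embed a k-bit message word into
    n = 2**k - 1 carrier bits by changing at most one bit, so that
    f(a') == message afterwards."""
    n = 2 ** k - 1
    assert len(code_bits) == n and 0 <= message <= n
    s = message ^ f_hash(code_bits)
    a = list(code_bits)
    if s != 0:
        a[s - 1] ^= 1          # invert the s-th bit (1-based position s)
    return a
```

Every group of n = 2^k − 1 coefficients thus carries k message bits at the cost of at most one change, which is exactly the efficiency behaviour tabulated above.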
HORIZONTAL & VERTICAL DISP :
Now we divide the quotient of the previous division by the number of possible columns in which the object can be displaced. Here we divide 24 (the quotient of dividing 98 by 4) by 20: the remainder gives the amount of displacement in the horizontal direction and the quotient the amount of displacement in the vertical direction. For the airplane we have: horizontal disp: 24 % 20 = 4; vertical disp: 24 / 20 = 1. By adding these two quantities to the primary location of the picture, the image location is determined. For the airplane: horizontal position: 600 + 4 = 604; vertical position: 400 + 1 = 401.
Thus, the type and location of the other objects are also found. Now the image is sent along with the key file (object name, object types, object location and object displacement). This is collage steganography.

COLLAGE STEGANOGRAPHY

BRIEF INTRODUCTION :
The stego image prepared from the object image, which holds the secret text, should be placed at the appropriate locations in the background image; that is COLLAGE STEGANOGRAPHY. Usually the different modes of the relevant object images and their locations are held in a separate file in a database and sent as a key to the sender and the receiver.
While choosing the background image, note that the size of the object image should always be less than the size of the background image, so that the stego image can fit into it. For example, an object image of 147 x 201 JPEG can easily fit into a background image of size 800 x 600 JPEG. The starting location points are specified and have to be checked against the object's full size on the X and Y axes to fit in the background. For example, consider a background of size 800 x 600 and an object image of 147 x 201; the starting points specified for the placement of the object image should then always be less than (653, 399), i.e., 800 − 147 = 653 and 600 − 201 = 399.

COLLAGE PROCESS : Here first a picture is selected as the background image, for example the picture of an airport runway. Then the images of a number of appropriate objects are selected to match the background, for example birds in the sky, an airplane on the ground and a guide car. For each of these objects various types are selected; for the airplane, for example, several types such as training airplanes, passenger airplanes, military airplanes and jet airplanes.

THEORETICAL LOCATION :
Each of the selected objects can be placed only in a certain area. For instance, if a background scene is 480*680 pixels, the permissible area for the position of the airplane image with dimensions of 100*200 can range from the rectangular area with apexes [(0,0), (600,0), (600,400), (0,400)] to the rectangular area with apexes [(20,50), (620,50), (620,450), (20,450)], with displacement up to 50 pixels to the right and 20 pixels to the bottom.

MODE SPECIFICATION :
In view of the above factors (existing object, type of object and location of object), one can create pictures in different positions. For the airplane in the above example, there are 4 types of airplanes (training, passenger, military or jet) and 1,000 different positions (20*50 = 1,000), which gives 4,000 modes. There are two other objects (bird and car), each of which has 2,000 different modes. In this picture the number of modes = 16*10^9 (4,000*2,000*2,000 = 16*10^9).

EXTRACTION
STEGO-IMAGE EXTRACTION ( FROM BACKGROUND IMAGE ) :
While extracting information, the program, using the key file, finds the type and location of each object. For example, from the rectangular area with apexes [(0,0), (600,0), (600,400), (0,400)] to the rectangular area with apexes [(20,50), (620,50), (620,450), (20,450)], it searches for airplanes of different types. Then, considering the type and location of the object, we find the figure of this mode.

SECRET MESSAGE EXTRACTION ( FROM STEGO IMAGE ) :
Finally, with the suspect image and the key file in hand, we carry out the inverse of the actions performed in stego image preparation using the F5 algorithm for all objects; by putting the corresponding bits next to each other, the information hidden in the image is obtained from the package.

EXPERIMENTAL RESULTS

EMBEDDING :
The project implementation is in Java. The steganography program uses the key file to load information relating to the background image and the objects in the image. The object name and the x and y coordinate positions are provided, and the types of the object are under the name imageXX.png (objtype). The pictures are in JPEG format. Finally, there are the displacements of the picture in the horizontal and vertical directions.
Here we selected the picture of a house as the background and put in 3 objects (car, animal and human). For each of the objects we considered 4 different types. The text for hiding in the image is prepared; then the appropriate type of each object and its location are calculated according to the input information, and it is placed on the background image and saved as JPEG.
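The worked airplane example above (11 bits read from the input array, value 98, type and displacement arithmetic) can be reproduced with a small helper; the function name and default parameters are illustrative, mirroring the numbers in the text.

```python
def decode_object_mode(bits, n_types=4, n_cols=20, origin=(600, 400)):
    """Turn one block of hidden bits into (type, position) for an object:
    the value modulo the number of types picks the type; the quotient is
    split into a horizontal (modulo n_cols) and a vertical (integer
    division) displacement added to the object's primary location."""
    value = int(bits, 2)                  # '00001100010' -> 98
    obj_type = value % n_types            # 98 % 4 = 2  -> military airplane
    quotient = value // n_types           # 24
    dx, dy = quotient % n_cols, quotient // n_cols   # 4 and 1
    return obj_type, (origin[0] + dx, origin[1] + dy)

# Worked example from the text:
# decode_object_mode('00001100010') -> (2, (604, 401))
```

The receiver runs exactly this arithmetic, driven by the key file, to locate each object before undoing the F5 embedding.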


The object image sizes should be proportionate to the background image size. The decoder loads the key file; then the collage-stego image is received from the user. According to the key file and the algorithm, it finds the type and location of each object in the image, calculates the corresponding number of each object, and extracts the bits hidden in the image. Then, by placing the extracted bits beside each other, the hidden text is extracted and shown to the user.


The method changes the appearance of the image by using the displacement of various coordinated objects rather than changing its features.

The F5 algorithm for stego image preparation is more suitable for large amounts of data and for color images. By creating a large bank of interrelated images, one can hide a large amount of information in the image.

- Applicable to color, grayscale and binary images.
- Hide – Print – Scan – Extract.
- No change in the features of the background image, as only the appearance is changed.
- The collage stego image as a whole cannot be detected.
- An increased number of appropriate objects means increased storage of messages.


1. Niels Provos and Peter Honeyman, "Hide and Seek: An Introduction to Steganography," IEEE Security & Privacy Magazine, May/June 2003, pp. 32-44.
2. Mohammad Shirali-Shahreza and Sajad Shirali-Shahreza, "Collage Steganography," Computer Science Department, Sharif University of Technology, Tehran,


VHDL Implementation of Lifting Based Discrete Wavelet Transform

M. Arun Kumar1, C. Thiruvenkatesan2
1: M. Arun Kumar, II yr M.E. (Applied Electronics), SSN College of Engineering
2: Mr. C. Thiruvenkatesan, Asst. Professor, SSN College of Engineering, Chennai

Abstract: Wavelet Transform has been successfully applied in different fields, ranging from pure mathematics to applied science. Software implementation of the Discrete Wavelet Transform (DWT), however greatly flexible, appears to be the performance bottleneck in real-time systems. Hardware implementation, in contrast, offers high performance but is poor in flexibility. A compromise between these two is reconfigurable hardware. For 1-D DWT, the architectures are mainly convolution-based and lifting-based, while direct and line-based methods are the most common implementations for the 2-D DWT. The lifting scheme used to construct VLSI architectures for the DWT outperforms the convolution-based architectures in many aspects, such as fewer arithmetic operations, in-place implementation and easy management of boundary extension. But the critical path of the lifting-based architectures is potentially longer than that of the convolution-based ones, and this can be reduced by employing pipelining in the lifting-based architecture. The 1-D and 2-D DWT using the lifting scheme have been obtained for signals and images respectively through MATLAB simulation. The Liftpack algorithm for calculating the DWT has been implemented using the VHDL language. The lifting algorithm for the 1-D DWT has also been implemented in VHDL.

1. INTRODUCTION

Mathematical transformations are applied to signals to obtain further information from the signal that is not readily available in the raw signal. Most signals in practice are time-domain signals (time-amplitude representation) in their raw format. This representation is not always the best one for most signal processing applications. In many cases, the most distinguished information is hidden in the frequency content (frequency spectrum) of the signal; often, information that cannot be readily seen in the time domain can be seen in the frequency domain. The Fourier Transform (FT) is a reversible transform, that is, it converts a time-domain signal into a frequency-domain signal and vice-versa. However, only either of them is available at any given time: no frequency information is available in the time-domain signal, and no time information is available in the Fourier-transformed signal. The Wavelet Transform (WT) addresses this issue by providing a time-frequency representation of a signal or an image. The objectives proposed in the thesis are:

1. To implement the 1-D and 2-D Lifting Wavelet Transform (LWT) in MATLAB to understand the concept of the lifting scheme.
2. To develop the lifting algorithm for the 1-D and 2-D DWT using the C language.
3. To implement the 1-D LWT in VHDL using the prediction and updating scheme.
4. To implement the 5/3 wavelet filter using the lifting scheme.

Lifting Scheme Advantages

The lifting scheme is a new method for constructing biorthogonal wavelets. In this way lifting can be used to construct second-generation wavelets: wavelets that are not necessarily translates and dilates of one function. Compared with first-generation wavelets, the lifting scheme has the following advantages:

• Lifting leads to a speedup when compared to the classic implementation. The classical wavelet transform has a complexity of order n, where n is the number of samples. For long filters, the lifting scheme speeds up the transform by another factor of two; hence it is also referred to as the fast lifting wavelet transform.
• All operations within a lifting step can be performed entirely in parallel, while the only sequential part is the order of the lifting operations.
• The lifting scheme can be used in situations where no Fourier transform is available. Typical examples include wavelets on bounded domains, wavelets on curves and surfaces, weighted wavelets, and wavelets for irregular sampling.

II. Lifting Algorithm

The basic idea behind the lifting scheme is very simple: use the correlation in the data to remove redundancy. To this end, the data is first split into two sets (Split phase): the odd samples and the even samples (Figure 2). If the samples are indexed beginning with 0 (the first sample is the 0th sample), the even set comprises all the samples with an even index and the odd set contains all the samples with an odd index. Because of the assumed smoothness of the data, it is predicted that the odd samples have a value closely related to their neighboring even samples. N even samples are used to predict the value of a neighboring odd sample (Predict phase). With a good prediction method, the chance is high

that the original odd sample is in the same range as its prediction. The difference between the odd sample and its prediction is calculated and is used to replace the odd sample. As long as the signal is highly correlated, the newly calculated odd samples will on average be smaller than the original ones and can be represented with fewer bits. The odd half of the signal is now transformed. To transform the other half, we have to apply the predict step on the even half as well. Because the even half is merely a sub-sampled version of the original signal, it has lost some properties that are to be preserved; in the case of images, for instance, the intensity (mean of the samples) should be kept constant throughout the different levels. The third step (Update phase) therefore updates the even samples, using the newly calculated odd samples, such that the desired property is preserved. These three steps are repeated on the even samples, each time transforming half of the remaining even samples, until all samples are transformed. Here follows a summary of the steps to be taken for both the forward and inverse transform.

Figure.1 Predict and update stages

Figure.3 The lifting scheme, inverse transform: Update, Predict and Merge stages

This section explains the method to calculate the lifting based 1-D DWT using MATLAB. A noisy input signal (amplitude plotted against sampling instant n, over 4000 samples) is used as the test input.
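The split, predict and update phases described above can be sketched compactly. The paper's implementations are in MATLAB, C and VHDL and are not reproduced here; the following is an illustrative Python sketch of one level of the 5/3 wavelet filter in lifting form (the function names, floating-point arithmetic and simple edge replication are assumptions, not the paper's code).

```python
def lwt53_forward(x):
    """One level of the 5/3 lifting DWT: split, predict, update.

    x must have even length; the boundary is handled by
    repeating the last neighbour. Returns (approximation, detail)."""
    even = x[0::2]                     # split phase
    odd = x[1::2]
    n = len(odd)
    # predict phase: detail = odd minus the average of neighbouring evens
    detail = [odd[i] - 0.5 * (even[i] + even[min(i + 1, n - 1)])
              for i in range(n)]
    # update phase: approximation = even plus a quarter of neighbouring details
    approx = [even[i] + 0.25 * (detail[max(i - 1, 0)] + detail[i])
              for i in range(n)]
    return approx, detail


def lwt53_inverse(approx, detail):
    """Invert by reversing the order, inverting the signs, then merging."""
    n = len(detail)
    even = [approx[i] - 0.25 * (detail[max(i - 1, 0)] + detail[i])
            for i in range(n)]
    odd = [detail[i] + 0.5 * (even[i] + even[min(i + 1, n - 1)])
           for i in range(n)]
    x = [0.0] * (2 * n)                # merge phase
    x[0::2], x[1::2] = even, odd
    return x
```

For a locally linear signal the interior details come out zero, which is why the detail band can be coded with fewer bits; reversing the steps with inverted signs and merging recovers the input exactly.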

Figure .4. The input signal with noise signals

Figure.5. The Approximation (A1) and Detail (D1) signals

Figure2 Multiple levels of decomposition

III. THE INVERSE TRANSFORM

One of the great advantages of the lifting scheme realization of a wavelet transform is that it decomposes the wavelet filters into extremely simple elementary steps, and each of these steps is easily invertible. As a result, the inverse wavelet transform can always be obtained immediately from the forward transform. The inversion rules are trivial: revert the order of the operations, invert the signs in the lifting steps, and replace the splitting step by a merging step. The block diagram for the inverse lifting scheme is shown in Figure.3.

This section explains the method to calculate the lifting based 2-D DWT using MATLAB.

Figure.6 The cameraman input image

Figure.8. Reconstructed image
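The 2-D transform applies the 1-D lifting step along the rows and then along the columns, producing the approximation (LL) and the horizontal, vertical and diagonal detail images. An illustrative Python sketch using Haar lifting (predict d = odd − even, update s = even + d/2); the function names and the tiny test image are assumptions, not the paper's MATLAB code:

```python
def haar_fwd(x):
    """One Haar lifting step on a 1-D sequence of even length."""
    even, odd = x[0::2], x[1::2]
    detail = [o - e for o, e in zip(odd, even)]           # predict
    approx = [e + d / 2 for e, d in zip(even, detail)]    # update
    return approx, detail


def lwt2d(img):
    """One level of the 2-D LWT: rows first, then columns.

    Returns the four subbands (LL, LH, HL, HH)."""
    rows = [haar_fwd(r) for r in img]                     # row transform
    lo = [a for a, _ in rows]                             # row approximations
    hi = [d for _, d in rows]                             # row details

    def cols(block):
        t = list(map(list, zip(*block)))                  # transpose to columns
        out = [haar_fwd(c) for c in t]
        approx = list(map(list, zip(*[a for a, _ in out])))
        detail = list(map(list, zip(*[d for _, d in out])))
        return approx, detail

    LL, LH = cols(lo)
    HL, HH = cols(hi)
    return LL, LH, HL, HH
```

Each LL entry is the mean of a 2x2 block of the input, so repeating the procedure on LL gives the multi-level decomposition of Figure 2.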
Here the Haar wavelet is used as the mother wavelet function and is lifted using elementary lifting steps. This new lifted wavelet is then used to find the wavelet transform of the input signal. This results in two output signals, called the approximation and detail signals. The approximation represents the low frequency components present in the original input signal. The detail gives the high frequency components in the signal and represents the hidden details in the signal. If we do not get sufficient information from the detail, the approximation is again decomposed into an approximation and details. This decomposition continues until sufficient information about the image is recovered. Finally, the inverse lifting scheme is performed on the approximation and detail images to reconstruct the original image. If we compare the original (Figure.6) and reconstructed (Figure.8) images, they look exactly the same and the transform is lossless.

The lifting scheme to construct VLSI architectures for DWT outperforms the convolution based architectures in many aspects such as fewer arithmetic operations, in-place implementation and easy management of boundary extension. But the critical path of the lifting based architectures is potentially longer than that of the convolution based ones, and this can be reduced by employing pipelining in the lifting based architecture. The 1-D and 2-D DWT using the lifting scheme have been obtained for signals and images respectively through MATLAB simulation.
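The repeated decomposition described above (decompose, keep the details, re-decompose the approximation) can be sketched as follows; Haar lifting in Python is assumed purely for illustration:

```python
def haar_step(x):
    """One Haar lifting level: split, predict, update."""
    even, odd = x[0::2], x[1::2]
    detail = [o - e for o, e in zip(odd, even)]           # predict
    approx = [e + d / 2 for e, d in zip(even, detail)]    # update
    return approx, detail


def multilevel(x, levels):
    """Re-decompose the approximation repeatedly, as in Figure 2.

    Returns the final approximation and the detail band of each level."""
    details = []
    approx = list(x)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details
```

A constant (perfectly smooth) signal yields all-zero details at every level, leaving only a single approximation coefficient.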
Figure.7. Approximation and detail images (Approximation image; Detail image-Horizontal; Detail image-Vertical; Detail image-Diagonal)


VLSI Design Of Impulse Based Ultra Wideband Receiver For

Commercial Applications
G.Srinivasa Raja1, V.Vaithianathan2
1: G.Srinivasa Raja, II yr M.E.(Applied Electronics)
SSN College of Engineering,
Old Mahabalipuram Road, SSN Nagar - 603 110.
2: Mr.V.Vaithianathan, Asst. Professor, SSN College of Engineering, Chennai

Abstract: An impulse based ultra-wide band (UWB) receiver front end is presented in this paper. Gaussian modulated pulses in the frequency range 3.1-10.6 GHz, satisfying the Federal Communications Commission spectral mask, are received through an omni-directional antenna and fed into the corresponding LNAs, filters and detectors. The low noise amplifiers, filters and detectors are integrated on a single chip and simulated using 0.18 µm CMOS technology. All these simulations are done using the Tanner EDA tool along with the Puff software supporting the filter and amplifier designs.

I. INTRODUCTION

Ultra-Wide Band (UWB) wireless communication offers a radically different approach to wireless communication compared to conventional narrow band systems. Compared with other wireless technologies, ultra wide band has some specific characteristics: high-data-rate communication at shorter distances, improved channel capacity and immunity to interference. All these make UWB useful in military, imaging and vehicular applications. This paper presents the design of an impulse based ultra wide band receiver which can be built into systems to avoid cable links at shorter distances and which works with low power. The receiver has a low complexity design; the impulse-based signal (i.e., the absence of a local oscillator) makes it easily portable. Ultra-wideband communication is not a new technology; in fact, it was first employed by Guglielmo Marconi in 1901 to transmit Morse code sequences across the Atlantic Ocean using spark gap radio transmitters. However, the benefit of a large bandwidth and the capability of implementing multi-user systems provided by electromagnetic pulses were not considered at that time. Approximately fifty years after Marconi, modern pulse-based transmission gained momentum in military applications in the form of impulse radars.

Fig. 1 History of UWB

Ultra-wide band technology based on the WiMedia standard brings the convenience and mobility of wireless communications to high-speed interconnects in devices throughout the digital home and office. Designed for low-power, short-range, wireless personal area networks, UWB is the leading technology for freeing people from wires, enabling wireless connection of multiple devices for transmission of video, audio and other high-bandwidth data.

UWB's combination of broader spectrum and lower power improves speed and reduces interference with other wireless spectra. It is used to relay data from a host device to other devices in the immediate area (up to 10 meters, or 30 feet). UWB radio transmissions can legally operate in the range from 3.1 GHz to 10.6 GHz at a limited transmit power of -41 dBm/MHz. Consequently, UWB provides dramatic channel capacity at short range with limited interference. A signal is said to be UWB if it occupies at least 500 MHz of bandwidth or its fractional bandwidth occupies more than 25% of the center frequency. The UWB signal is a time modulated impulse radio signal, seen as a carrier-less baseband transmission.


Fig. 2. UWB Spectrum (emitted signal versus the "Part 15 Limit" across 1.6-10.6 GHz, with the 802.11b, 802.11a and microwave bands marked)

Table.1 Comparison of wireless technologies

Classification of signals based on their fractional bandwidth:

Narrowband Bf < 1%
Wideband 1% < Bf < 20%
Ultra-Wideband Bf > 20%
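Under the classification above (together with the 500 MHz absolute-bandwidth criterion mentioned earlier), a band can be classified from its edge frequencies. A small illustrative sketch; the 1% and 20% thresholds follow the table, while the function name is hypothetical:

```python
def classify(f_low_hz, f_high_hz):
    """Classify a signal by its fractional bandwidth Bf = BW / fc."""
    bw = f_high_hz - f_low_hz
    fc = (f_high_hz + f_low_hz) / 2       # center frequency
    bf = bw / fc                          # fractional bandwidth
    if bf > 0.20 or bw >= 500e6:          # UWB: Bf > 20% or BW >= 500 MHz
        return "Ultra-Wideband"
    if bf > 0.01:                         # Wideband: 1% < Bf < 20%
        return "Wideband"
    return "Narrowband"                   # Narrowband: Bf < 1%
```

The 3.1-10.6 GHz band used by this receiver has a fractional bandwidth of roughly 110%, far beyond the UWB threshold.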

Fig. 3. Theoretical data rate over ranges

Types of receiver:
1. Impulse type
2. Multicarrier type

Impulse – UWB
A pulse of very short duration (typically a few nanoseconds).
Merits and demerits:
1. High resolution in multipath reduces fading margins; low complexity implementation.
2. Highly precise synchronization is required, and power during the brief interval increases the possibility of interference.

Multi Carrier – UWB
A single data stream is split into multiple data streams of reduced rate, with each stream transmitted on a separate frequency (sub-carrier). Sub-carriers must be properly spaced so that they do not interfere.
Merits and demerits:
1. Well suited for avoiding interference, because its carrier frequency can be precisely chosen to avoid narrowband interference.
2. Front-end design can be challenging due to variation in
3. A high speed FFT is needed.

Applications:
1. Military.
2. Indoor applications, such as WPAN (Wireless Personal Area Network).
3. Outdoor (substantial) applications, but with very low data rates.
4. High-data-rate communications, multimedia applications, and cable replacement.

Impulse: Radio technology that modulates impulse based waveforms instead of continuous carrier waves.

Pulse types:
1. Gaussian first derivative, second derivative.
2. Gaussian modulated sinusoidal pulse.

Fig. 4 UWB time-domain behavior

Fig. 5 UWB frequency-domain behavior

II. RECEIVER ARCHITECTURE

The UWB impulse based receiver consists of impedance matching circuits, an LNA, filters and detectors.

Fig.6 Impulse Based UWB Receiver

Antenna: The purpose of the antenna is to capture the propagating signal of interest. The ideal antenna for the wideband receiver would itself be wideband; that is, the antenna would nominally provide constant impedance over the bandwidth. This would facilitate efficient power transfer between the antenna and the preamplifier. However, due to cost and size limitations, wide band antennas are often not practical. Thus, most receivers are limited to simple antennas, such as the dipole, monopole, or variants. These antennas are inherently narrowband, exhibiting a wide range of impedances over the bandwidth. For the purpose of antenna-receiver integration, it is useful to model the antenna using the 'Pi' equivalent.

An attenuator circuit allows a known source of power to be reduced by a predetermined factor, usually expressed in decibels. A powerful advantage of an attenuator is that, since it is made from non-inductive resistors, it is able to change a source or load, which might be reactive, into one which is precisely known and resistive. The attenuator achieves this power reduction without introducing distortion. The factor K is the ratio of current, voltage, or power corresponding to a given value of attenuation "A" expressed in decibels.

Fig.7 Impedance matching circuit (resistive network with R1 = R2 = 436 and R3)

Low Noise Amplifier: The amplifier has two primary purposes. The first is to interface the antenna impedance over the band of interest to a standard input impedance, such as 50 or 75 Ω. The second purpose of the preamplifier is to provide adequate gain at a sufficiently low noise figure to meet system sensitivity requirements. In the VHF low band, a preamplifier may not necessarily need to be a low noise amplifier; an amplifier noise temperature of several hundred kelvin will usually be acceptable. Without a matching circuit, the input impedance of a wideband amplifier is usually designed to be approximately constant over a wide range of frequencies. As shown in the previous section, the antenna impedance can vary significantly over a wide bandwidth. The resulting impedance mismatch may result in an unacceptable loss of power efficiency between the antenna and the preamplifier. Matching networks are used to overcome this impedance mismatch.

Fig.8 Low Noise Amplifier
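The attenuation factor K mentioned above follows the standard decibel relations, which the paper does not spell out: a voltage or current ratio of 10^(A/20) and a power ratio of 10^(A/10) for an attenuation of A dB. A quick sketch:

```python
def k_voltage(a_db):
    """Voltage (or current) ratio K for an attenuation of a_db decibels."""
    return 10 ** (a_db / 20)


def k_power(a_db):
    """Power ratio K for an attenuation of a_db decibels."""
    return 10 ** (a_db / 10)
```

A 20 dB pad therefore divides the voltage by 10 and the power by 100.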

Impedance matching is important in LNA design
because often times the system performance can be strongly
affected by the quality of the termination. For instance, the


frequency response of the antenna filter that precedes the LNA will deviate from its normal operation if there are reflections from the LNA back to the filter. Furthermore, undesirable reflections from the LNA back to the antenna must also be avoided. An impedance match is when the reflection coefficient is equal to zero, which occurs when ZS = ZL. There is a subtle difference between impedance matching and power matching. As stated in the previous paragraph, the condition for impedance matching occurs when the load impedance is equal to the characteristic impedance. However, the condition for power matching occurs when the load impedance is the complex conjugate of the characteristic impedance. When the impedances are real, the conditions for power matching and impedance matching coincide.

For the analysis of LNA design for low noise, the origins of the noise must be identified and understood. There are two important noise sources in CMOS transistors. Thermal noise is due to the random thermal motion of the carriers in the channel. It is commonly referred to as a white noise source because its power spectral density holds a constant value up to very high frequencies (over 1 THz). Thermal noise is given by

    id^2 / Δf = 4kT (µ / L^2) (-Q)

Induced gate noise is a high frequency noise source caused by the non-quasi-static effects influencing the power spectral density of the drain current. Induced gate noise has a power spectral density given by

    ig^2 / Δf = 4kT δ (ω^2 Cgs^2) / (5 gd0)

Noise Figure: Noise figure (NF) is a measure of signal-to-noise ratio (SNR) degradation as the signal traverses the receiver front-end. Mathematically, NF is defined as the ratio of the input SNR to the output SNR of the system:

    NF = Total Output Noise Power / Output Noise Power due to Source

NF may be defined for each block as well as for the entire receiver. NF_LNA, for instance, determines the inherent noise of the LNA, which is added to the signal through the amplification process. The corresponding signal of the low noise amplifier, with amplification, is then fed into consecutive stages to get the required RF signal.

Filter Design: One major technique to combat interference is to filter it out with band pass filters. For most band pass filters, the relevant design parameters are the center frequency, the bandwidth (which together with the center frequency defines the quality factor Q) and the out-of-band suppression. The bandwidth of the band selection filter is typically around the band of interest and the center frequency is the center of the band. The required Q is typically high and the center frequency is high as well. On the other hand, the suppression is typically not prohibitive: it only needs to be large enough to ensure that interference is suppressed to a point where it does not cause undesirable effects. To satisfy these specifications, the BPF can be implemented using a passive LC filter. The LC filter can be combined with the input-matching network of the LNA. A low-pass filter is a filter that passes low-frequency signals but attenuates (reduces the amplitude of) signals with frequencies higher than the cutoff frequency.

Fig .9 Band pass and Low pass filter Design

Butterworth filter, 3rd order. Normalized values: C2 = 0.6180 F, C4 = 2.0000 F, L1 = 1.6180 H, L3 = 1.6180 H.

Square law detector: A square law means that the DC component of the diode output is proportional to the square of the AC input voltage. So if you reduce the RF input voltage by half, you get one quarter as much DC output; if you apply ten times as much RF input, you get 100 times as much DC output as before.

Op-Amp: An operational amplifier, usually referred to as an op-amp for brevity, is a DC-coupled high-gain electronic voltage amplifier with differential inputs and, usually, a single output. In its typical usage, the output of the op-amp is controlled by negative feedback, which largely determines the magnitude of its output voltage gain, the input impedance at one of its input terminals and the output impedance. The output of the op-amp is then fed into an A/D converter for specific applications.

Simulated results:

Tanner EDA simulation for LNA
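The noise-figure definition given earlier (total output noise over the part contributed by the source alone) can be checked with illustrative numbers; the gain and noise values below are hypothetical, not measurements from this receiver:

```python
import math


def noise_factor(gain, source_noise, amp_added_noise):
    """F = total output noise power / output noise power due to the source."""
    total_out = gain * source_noise + amp_added_noise
    return total_out / (gain * source_noise)


def noise_figure_db(f):
    """Noise figure is the noise factor expressed in decibels."""
    return 10 * math.log10(f)
```

An amplifier with power gain 10 that adds as much noise at its output as it passes through from the source has F = 2, i.e. an NF of about 3 dB.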


Band Pass Filter Simulation Using Puff

Low Pass Filter Simulation using Puff


This impulse based ultra wide band receiver consumes very low power with a minimum supply voltage and is easily portable. Though various wireless technologies become extinct, UWB retains distinct advantages. Utilizing this fact, the receiver is designed and laid out using the Tanner EDA tool, with an allowable bandwidth of 7 GHz and a transmit power of -41 dBm/MHz.




Distributed Algorithms for energy efficient Routing in Wireless

Sensor Networks
T. Jingo, M. S. Godwin Premi, S. Shaji
Department of Electronics & Telecommunications Engineering
Sathyabama University
Jeppiaar Nagar, Old Mamallapuram Road, Chennai -600119

Abstract: Sensor networks have appeared as a promising technology with various applications, where power efficiency is one of the critical requirements. Each node has a limited battery energy supply and can generate information that needs to be communicated to a sink node. We assume that each node in the wireless network has the capacity to forward information in the form of packets, and each node is also assumed to be able to dynamically adjust its transmission power depending on the distance over which it transmits a packet. To improve power efficiency without affecting the network delay, we propose and study a number of schemes for the deletion of obsolete information from the network nodes, and we propose distributed algorithms to compute an optimal routing scheme that maximizes the time at which the first node in the network runs out of energy. For computing such a flow we analyze a partially distributed algorithm and a completely distributed algorithm. The resulting algorithms have low computational complexity and are guaranteed to converge to an optimal routing scheme that maximizes the lifetime of the network. To reduce the power consumption, we take the base station to move dynamically from one location to another, while the sensor nodes are static and cannot move from the location where they are created. The results of our study will allow a network designer to implement such a system and to tune its performance in a delay-tolerant environment with intermittent connectivity, so as to ensure with some chosen level of confidence that the information is successfully carried through the mobile network and delivered within some time period.

I. INTRODUCTION

Consider a network of wireless sensor nodes distributed in a region. Each node has a limited battery energy supply and can generate information that needs to be communicated to a sink node. It is assumed that each wireless node has the capability to relay packets; also, each node can adjust its transmission power depending on the distance over which it transmits a packet. We focus on the problem of computing a flow that maximizes the lifetime of the network, where the lifetime is taken to be the time at which the first node runs out of energy. Since sensor networks need to self-configure in many situations, the goal of this paper is to find algorithms that do this computation in a distributed manner. We analyze a partially distributed algorithm and a completely distributed algorithm to compute such a flow. The algorithms described can be used in static networks, or in networks in which the topology changes slowly enough that there is enough time between topology changes to optimally balance the traffic.

Energy efficient algorithms for routing in wireless networks have received considerable attention over the past few years. Distributed algorithms to form sparse topologies containing minimum-energy routes were proposed in "Minimum energy mobile wireless networks [1]" and "Minimum energy mobile wireless networks revisited [2]". An approximate approach based on discretization of the coverage region of a node into cones was described in "Distributed topology control for power efficient operation in multi-hop wireless ad hoc networks [3]" and "Analysis of a cone-based distributed topology control algorithm for wireless multi-hop networks [4]". All the above mentioned works focused on minimizing the total energy consumption of the network. However, as pointed out there, this can lead to some nodes in the network being drained of energy very quickly. Hence, instead of trying to minimize the total energy consumption, routing to maximize the network lifetime was considered in "Energy conserving routing in wireless ad-hoc networks [5]" and "Routing for maximum system lifetime in wireless ad-hoc networks [6]". The problem was formulated as a linear program, and heuristics were proposed to select routes in a distributed manner to maximize the network lifetime. However, as illustrated in these papers, these heuristics do not always lead to the selection of routes that are globally optimal; a similar problem formulation for the selection of relay nodes was given in "Topology control for wireless sensor networks [7]". We note that distributed iterative algorithms for the computation of the maximum lifetime routing flow were described in "Energy efficient routing in ad hoc disaster recovery networks [8]". Each iteration involved a bisection search on the network lifetime and the solution of a max-flow problem to check the feasibility of the network lifetime. The complexity of the algorithm was shown to be polynomial in the number of nodes in the special case of one source node. We use a different approach based on the subgradient algorithm for the solution of the dual problem. We exploit the separable nature of the problem using dual decomposition to obtain partially and fully distributed algorithms. This is similar to

the dual decomposition approaches applied to other reducing the power consumption. The problems faced in the
problems in communication networks existing systems are overcome through the proposed system.
When power efficiency is considered, ad hoc Each mobile estimate its life-time based on the traffic
networks will require a power-aware metric for their routing volume and battery state. The extension field in route-
algorithms. Typically, there are two main optimization request RREQ and route reply RREP packets are utilized to
metrics for energy-efficiency broadcast/ multicast routing in carry the life-time (LT) information. LT field is also
wireless ad hoc networks: included into the routing tables. When a RREQ packet is
(1) Maximizing the network lifetime; and send, LT is set to maximum value (all ones). When an
(2) Minimizing the total transmission power intermediate node receives the RREQ, it compares the LT
assigned to all nodes. field of the packet to its own LT. Smallest of the two is set to
Maximum lifetime broadcast/multicast routing algorithms forwarded RREQ packet. When a node having a path to the
can distribute packet relaying loads for each node in a destination hears the RREQ packet, it will compare the LT
manner that prevents nodes from being overused or abused. field of the RREQ with the LT field in its routing table and
By maximizing the lifetime of all nodes, the time before the put the smaller of the two into RREP. In case destination
network is partitioned is prolonged. hears the RREQ, it will simply send RREP with the lifetime
II. OBJECTIVE field equal to the LT in the RREQ. All intermediate nodes
that hear RREP store the path along with the life time
information. In case the source receives several RREPs, it
• We reduce the power consumption for packet selects the path having the largest LT.
• We achieve maximum lifetime using the partially • Unattended operation
and fully distributed processing techniques. • Robustness under dynamic operating conditions
• Scalability to thousands of sensors
III.GENERAL BLOCK DIAGRAM • Energy consumption is low
• Efficiency is high


We describe the system model and formulate the

problem of maximizing the network lifetime as an
optimization problem. We are introducing the sub-gradient
algorithm to solve a convex optimization problem via the
dual problem since the objective function is not strictly
convex in the primal variables, the dual function is non-
differentiable. Hence, the primal solution is not immediately
IV.EXISTING SYSTEM available, but it can be recovered. We derive the partially
and fully distributed algorithms. We describe a way to
Power consumption is one of the major drawbacks completely decentralize the problem by introducing
in the existing system. When a node traverse from one additional variables corresponding to an upper bound on the
network to another network located within topology, the inverse lifetime of each node. The problem of maximizing
The average end-to-end delay increases because of the larger number of coordinator nodes present in the topology. By traversing more coordinator nodes from the centralized node, battery life is decreased, so network connectivity is not maintained while the sensor nodes forward traffic. The sensors collect all the information for which they have been deployed, and the information collected by the sensors is sent to the nearest node. The drawbacks of the existing system are:
• Existing works focus on minimizing the total energy consumption of the network.
• Nodes in the network are drained of energy very quickly.
• Energy consumption is high.
• The system is not robust.
• The sensors have limited power, so they are not capable of transmitting their information to all the other sensors.
• Because of this power consumption, the network lifetime is low.

V. PROPOSED SYSTEM

In the proposed system the base station can dynamically move from one location to the other. The network lifetime can be reformulated as the following convex quadratic optimization problem: the flow-conservation violation is normalized with respect to the total flow in the network, and the minimum node lifetime is normalized with respect to the optimal value of the network lifetime given by a centralized solution to the problem. We considered the network lifetime to be the time at which the first sensor node runs out of energy; thus we assumed that all nodes are of equal importance and critical to the operation of the sensor network. However, for a heterogeneous wireless sensor network, some nodes may be more important than others. Also, if there are two nodes collecting highly correlated data, the network can remain functional even if one node runs out of energy. Moreover, for the case of nodes with highly correlated data, we may want only one node to forward the data at a given time; we can then activate the two nodes in succession and still be able to send the necessary data to the sink. We therefore model the lifetime of a network as a function of the times for which the nodes in the network can forward their data to the sink node. In order to state this precisely, we redefine the node lifetime and the network lifetime for the analysis in this section. We also relax the constraint on the maximum flow over a link at a given time, describe various extensions of the problem for which we can obtain distributed algorithms using the approach described in this paper, and extend the simplistic definition of network lifetime to more general definitions which model more realistic scenarios in sensor networks.
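As a toy illustration of this first definition of network lifetime (all energies and power draws below are hypothetical values, not figures from the paper), the network lifetime is simply the smallest ratio of initial energy to power drain over the nodes:

```python
# Toy illustration of the network-lifetime definition (hypothetical numbers).
# Node lifetime = initial energy / power drain; network lifetime (first
# definition) = time at which the first sensor node runs out of energy.

energy = {1: 100.0, 2: 80.0, 3: 120.0}   # Joules, assumed initial batteries
power  = {1: 0.5,   2: 0.8,  3: 0.4}     # Watts, assumed average drain

node_lifetime = {n: energy[n] / power[n] for n in energy}
network_lifetime = min(node_lifetime.values())

print(node_lifetime)      # node 2 is the bottleneck
print(network_lifetime)   # 100.0
```

Under the more general definitions discussed above, this hard minimum would be replaced by a function of the individual node lifetimes, e.g. allowing two nodes with correlated data to be active in succession.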


The project is organized into the following modules:
1. Node creation and plotting
2. Lifetime estimation and path tracing
3. Partially distributed processing
4. Fully distributed processing
5. Data passing


A. Partially Distributed Processing

• Each mobile node estimates its lifetime based on its traffic volume and battery state.
• The extension fields in the route-request (RREQ) and route-reply (RREP) packets are used to carry the lifetime (LT) information.
• When a RREQ packet is sent, its LT field is set to the maximum value.
• When an intermediate node receives the RREQ, it compares the LT field of the packet with its own LT; the smaller of the two is set in the forwarded RREQ.
• When a node having a path to the destination hears the RREQ packet, it compares the LT field of the RREQ with the LT of its stored path and puts the smaller of the two into the RREP. If the destination itself hears the RREQ, it simply sends a RREP with the lifetime field equal to the LT in the RREQ.
• All intermediate nodes that hear the RREP store the path along with the lifetime information.
• If the source receives several RREPs, it selects the path having the largest LT.

B. Fully Distributed Algorithm

The distributed nature of ad hoc networks makes resource allocation strategies very challenging, since there is no central node to monitor and coordinate the activities of all the nodes in the network. Because a single node cannot be delegated to act as a centralized authority, owing to limitations in the transmission range, several delegated nodes may coordinate the activities in certain zones. This methodology is generally referred to as clustering, and the delegated nodes are called clusterheads. Each clusterhead employs centralized algorithms within its cluster; the clusterheads themselves, however, operate in a distributed fashion.

A first consideration is that the requirement for sensor networks to be self-organizing implies that there is no fine control over the placement of the sensor nodes when the network is installed (e.g., when nodes are dropped from an airplane). Consequently, we assume that nodes are randomly distributed across the environment.
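The lifetime bookkeeping in the partially distributed steps above can be sketched as follows; the node lifetimes and the topology are hypothetical, and the RREQ/RREP extension field is modeled simply as the running minimum along each candidate path:

```python
# Sketch of bottleneck-lifetime route selection (hypothetical values).
# The LT carried by a RREQ is the minimum node lifetime seen along the path;
# the source selects the RREP reporting the largest bottleneck LT.

# Node lifetimes in hours, estimated from traffic volume and battery state.
lifetime = {"A": 9.0, "B": 4.5, "C": 7.0, "D": 6.0, "E": 2.5}

def bottleneck_lt(path, lifetime):
    """LT carried by a RREQ after traversing `path`: min over its nodes."""
    lt = float("inf")                    # LT starts at the maximum value
    for node in path:
        lt = min(lt, lifetime[node])     # each hop keeps the smaller LT
    return lt

# Candidate source-to-destination paths discovered via RREQ flooding.
paths = [["A", "B", "C"], ["A", "D", "C"], ["A", "E", "C"]]

# The source receives one RREP per path and selects the largest LT.
best = max(paths, key=lambda p: bottleneck_lt(p, lifetime))
print(best, bottleneck_lt(best, lifetime))   # ['A', 'D', 'C'] 6.0
```

Note that in the real protocol these minima are computed hop by hop inside the RREQ/RREP packets rather than centrally as in this sketch.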
The fully distributed algorithm manages the node states as follows:
• First, all the nodes are put in the vulnerable state.
• If a node has a face that is not covered by any other active or vulnerable sensor, it goes to the active state and informs its neighbors.
• If all of its faces are covered by active sensors or by vulnerable sensors with a larger energy supply (i.e., the sensor is not a champion for any of its faces), it goes to the idle state and informs its neighbors.
• After a sensor node goes to the active state, it stays in the active state for a predefined time called the reshuffle-triggering threshold.
• Upon reaching the threshold, a node in the active state goes to the vulnerable state and informs its neighbors.
• A sensor node in the idle or active state goes to the vulnerable state if one of its neighbors goes into the vulnerable state; this causes a global reshuffle, and a new minimal sensor cover is found.

A network consisting of 32 nodes was deployed on a small island to monitor the habitat environment. Several energy conservation methods were adopted, including the use of sleep mode, energy-efficient communication protocols, and heterogeneous transmission power for different types of nodes. We use both of the above-mentioned techniques to maximize the network lifetime in our solution: we find the optimal schedule to switch sensors on and off to watch targets in turn, and we find the optimal routes to forward data from the sensor nodes to the BS.

X. LITERATURE SURVEY
There are two major techniques for maximizing the routing lifetime: the use of energy-efficient routing and the introduction of sleep/active modes for sensors. Extensive research has been done on energy-efficient data gathering and information dissemination in sensor networks. Some well-known energy-efficient protocols have been developed, such as Directed Diffusion [9], LEACH [10], PEGASIS [11], and ACQUIRE [12]. Directed Diffusion is regarded as an improvement over the SPIN [13] protocol, which used a proactive approach for information dissemination. LEACH organizes sensor nodes into clusters to fuse data before transmitting to the BS. PEGASIS improved on LEACH by considering both metrics of energy consumption and data-gathering delay.
In [14], an analytical model was proposed to find the upper bound of the lifetime of a sensor network, given the surveillance region, a BS, the number of sensor nodes deployed and the initial energy of each node. Some routing schemes for maximizing network lifetime were presented in [15]. In [16], an analytic model was proposed to analyze the tradeoff between the energy cost for each node to probe its neighbors and the routing accuracy in geographic routing, and a localized method was proposed. In [17] and [8], a linear programming (LP) formulation was used to find energy-efficient routes from sensor nodes to the BS, and approximation algorithms were proposed to solve the LP.
Another important technique used to prolong the lifetime of sensor networks is the introduction of switch on/off modes for sensor nodes. J. Carle et al. did a good survey in [18] on energy-efficient area monitoring for sensor networks. They pointed out that the best method for conserving energy is to turn off as many sensors as possible, while still keeping the system functioning. An analytical model was proposed in [19] to analyze the system performance, such as network capacity and data delivery delay, against the sensor dynamics in on/off modes. A node scheduling scheme was developed in [20]. This scheme schedules the nodes to turn on or off without affecting the overall service provided. A node decides to turn off when it discovers that its neighbors can help it to monitor its monitoring area. The scheduling scheme works in a localized fashion where nodes make decisions based on their local information. Similar to [21], the work in [22] defined a criterion for sensor nodes to turn themselves off in surveillance systems. A node can turn itself off if its monitoring area is the smallest among all its neighbors, and its neighbors then become responsible for that area. This process continues until the surveillance area of a node is smaller than a given threshold. A deployment of a wireless sensor network in the real world for habitat monitoring was discussed in [23].

X. SIMULATION RESULTS

XI. CONCLUSION

In this project, we proposed two distributed algorithms to calculate an optimal routing flow to maximize the network lifetime. The algorithms were derived to solve the dual problems of the programs of “Analysis of a cone-based distributed topology control algorithm for wireless multi-hop networks
[4],” and “Energy efficient routing in ad hoc disaster recovery networks [8],” in a partially and a fully decentralized manner, respectively. The computation results show that the rate of convergence of the fully distributed algorithm was slower than that of the partially distributed algorithm. However, each iteration of the partially distributed algorithm involves communication between all the nodes and a central node (e.g., the sink node). Hence, it is not obvious which algorithm will have the lower total energy consumption cost. If the radius of the network graph is small, then it is more energy efficient to use the partially distributed algorithm, even though each iteration involves the update of a central variable. Conversely, for a large network radius, the fully distributed algorithm is the better choice. We also note that the computation at each node for the fully distributed algorithm involves the solution of a convex quadratic optimization problem. This is in contrast to the partially distributed algorithm, where each iteration consists of the minimization of a quadratic function of a single variable, which can be done analytically. This communication paradigm has a broad range of applications, such as telemetry collection and sensor networks. It could be used for animal tracking systems, for medical applications with small sensors that propagate information from one part of the body to another or to an external machine, and to relay traffic or accident information to the public through the vehicles themselves, as well as many other applications.

REFERENCES

[1] V. Rodoplu and T. H. Meng, “Minimum energy mobile wireless networks,” IEEE J. Select. Areas Commun., vol. 17, no. 8, pp. 1333–1344, 1999.
[2] L. Li and J. Y. Halpern, “Minimum energy mobile wireless networks revisited,” in Proc. IEEE Int. Conf. Communications (ICC), 2001.
[3] R. Wattenhofer et al., “Distributed topology control for power efficient operation in multihop wireless ad hoc networks,” in Proc. IEEE INFOCOM, 2001.
[4] L. Li et al., “Analysis of a cone-based distributed topology control algorithm for wireless multi-hop networks,” in Proc. ACM Symp. Principles of Distributed Computing (PODC), 2001.
[5] J. H. Chang and L. Tassiulas, “Energy conserving routing in wireless ad-hoc networks,” in Proc. IEEE INFOCOM, 2000, pp. 22–31.
[6] “Routing for maximum system lifetime in wireless ad-hoc networks,” in Proc. 37th Annu. Allerton Conf. Communication, Control and Computing, 1999.
[7] J. Pan et al., “Topology control for wireless sensor networks,” in Proc. ACM MobiCom, 2003.
[8] G. Zussman and A. Segall, “Energy efficient routing in ad hoc disaster recovery networks,” in Proc. IEEE INFOCOM, 2003.
[9] C. Intanagonwiwat, R. Govindan, and D. Estrin, “Directed diffusion: A scalable and robust communication paradigm for sensor networks,” presented at the 6th Annu. ACM/IEEE Int. Conf. Mobile Computing and Networking (MOBICOM), Boston, MA, Aug. 2000.
[10] W. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” presented at the 33rd Annu. Hawaii Int. Conf. System Sciences (HICSS-33), Maui, HI, Jan. 2000.
[11] S. Lindsey, C. Raghavendra, and K. M. Sivalingam, “Data gathering algorithms in sensor networks using energy metrics,” IEEE Trans. Parallel Distrib. Syst., vol. 13, no. 9, pp. 924–935, Sep. 2002.
[12] N. Sadagopan and B. Krishnamachari, “ACQUIRE: The acquire mechanism for efficient querying in sensor networks,” in Proc. 1st IEEE Int. Workshop on Sensor Network Protocols and Applications (SNPA), 2003, pp. 149–155.
[13] W. R. Heinzelman, J. Kulik, and H. Balakrishnan, “Adaptive protocols for information dissemination in wireless sensor networks,” presented at the 5th ACM/IEEE Annu. Int. Conf. Mobile Computing and Networking (MOBICOM), Seattle, WA, Aug. 1999.
[14] M. Bhardwaj, T. Garnett, and A. Chandrakasan, “Upper bounds on the lifetime of sensor networks,” in Proc. IEEE Int. Conf. Communications, 2001, pp. 785–790.
[15] J. Chang and L. Tassiulas, “Maximum lifetime routing in wireless sensor networks,” presented at the Advanced Telecommunications and Information Distribution Research Program (ATIRP’2000), College Park, MD, Mar. 2000.
[16] T. Melodia, D. Pompili, and I. F. Akyildiz, “Optimal local topology knowledge for energy efficient geographical routing in sensor networks,” in Proc. IEEE INFOCOM, 2004, pp. 1705–1716.
[17] N. Sadagopan and B. Krishnamachari, “Maximizing data extraction in energy-limited sensor networks,” in Proc. IEEE INFOCOM, 2004, pp. 1717–1727.
[18] J. Carle and D. Simplot-Ryl, “Energy-efficient area monitoring for sensor networks,” IEEE Computer, vol. 37, no. 2, pp. 40–46, Feb. 2004.
[19] C. F. Chiasserini and M. Garetto, “Modeling the performance of wireless sensor networks,” in Proc. IEEE INFOCOM, 2004, pp. 220–231.
[20] D. Tian and N. D. Georganas, “A coverage-preserving node scheduling scheme for large wireless sensor networks,” in Proc. 1st ACM Int. Workshop on Wireless Sensor Networks and Applications, 2002, pp. 32–41.
[21] L. B. Ruiz et al., “Scheduling nodes in wireless sensor networks: A Voronoi approach,” in Proc. 28th IEEE Conf. Local Computer Networks (LCN 2003), Bonn/Konigswinter, Germany, Oct. 2003, pp. 423–429.
[22] A. Mainwaring, J. Polastre, R. Szewczyk, D. Culler, and J. Anderson, “Wireless sensor networks for habitat monitoring,” in Proc. 1st ACM Int. Workshop on Wireless Sensor Networks and Applications, Atlanta, GA, Sep. 2002, pp. 88–97.
[23] H. J. Ryser, Combinatorial Mathematics. Washington, DC: The Mathematical Association of America, 1963, pp. 58–59.
[24] R. A. Brualdi and H. J. Ryser, Combinatorial Matrix Theory. Cambridge,

Decomposition of EEG Signal Using Source Separation
Kiran Samuel, PG Student, Karunya University, Coimbatore, and Shanty Chacko, Lecturer, Department of
Electronics & Communication Engineering, Karunya University, Coimbatore

Abstract: The objective of this project is to reconstruct brain maps from the EEG signal and, from the brain maps, to attempt to diagnose anatomical, functional and pathological problems. These brain maps are projections of the energy of the signals. First we perform the deconvolution of the EEG signal into its components; then, using visualization tools, we plot brain maps, which show the underlying brain activity. The EEG sample is divided into four sub bands: alpha, beta, theta and delta. Each EEG sub band sample has some specified number of components. To extract these components we perform the deconvolution of the EEG signal, which can be done using source separation algorithms; many such algorithms are available today. For all of this we use a dedicated toolbox called EEGLAB, made exclusively for EEG signal processing.
Keywords – EEG signal decomposition, Brain map

1. INTRODUCTION

The EEG data will first be divided into its four frequency sub bands, based on frequency separation. Electroencephalography is the measurement of the electrical activity of the brain by recording from electrodes placed on the scalp or, in special cases, subdurally or in the cerebral cortex. The resulting traces are known as an electroencephalogram (EEG) and represent a summation of post-synaptic potentials from a large number of neurons. These are sometimes called brainwaves, though this use is discouraged because the brain does not broadcast electrical waves [1]. Electrical currents are not measured, but rather voltage differences between different parts of the brain. The measured EEG signal is composed of many components, so it is useful to decompose the EEG signal into its components first and then do the analysis. In the decomposition of EEG [1], the more sources we include, the more accurate it becomes. So our first aim is to decompose the EEG signal into its components. As a beginning we start with the reading, measuring and displaying of the EEG signal.

II. MEASURING EEG

In conventional scalp EEG, the recording is obtained by placing electrodes on the scalp with a conductive gel or paste, usually after preparing the scalp area by light abrasion to reduce impedance due to dead skin cells. The technique has been advanced by the use of carbon nanotubes to penetrate the outer layers of the skin for improved electrical contact; this sensor is known as ENOBIO. Many systems typically use electrodes, each of which is attached to an individual wire. Some systems use caps or nets into which electrodes are embedded; this is particularly common when high-density arrays of electrodes are needed. Electrode locations and names are specified by the International 10–20 system for most clinical and research applications (except when high-density arrays are used). This system ensures that the naming of electrodes is consistent across laboratories. In most clinical applications, 19 recording electrodes (plus ground and system reference) are used. A smaller number of electrodes is typically used when recording EEG from neonates. Additional electrodes can be added to the standard set-up when a clinical or research application demands increased spatial resolution for a particular area of the brain. High-density arrays (typically via cap or net) can contain up to 256 electrodes more-or-less evenly spaced around the scalp. Even though there are many ways of taking an EEG, in most cases the 10–20 system is used; as an example, we take one EEG sample measured with the 10–20 system, and that sample is then decomposed.

Fig: 1 Normal EEG wave in time domain

III. EEG SUB BANDS

1. Delta: activity up to about 4 Hz.
2. Theta: activity between 4 and 8 Hz.
3. Alpha: activity between 8 and 14 Hz.
4. Beta: activity from 14 Hz upwards.

The EEG is typically described in terms of (1) rhythmic activity and (2) transients. The rhythmic activity is divided into bands by frequency. To some degree, these frequency bands are a matter of nomenclature, but these designations arose because rhythmic activity within a certain frequency range was noted to have a certain distribution over the scalp or a certain biological significance. Most of the cerebral signal observed in the scalp EEG falls in the range of 1–40 Hz. The normal EEG signal is passed through a band pass filter to extract these sub bands. The frequency spectrum of the whole EEG signal is also plotted; this is done by taking the FFT of the signal. Figure 1 shows all 32 channels of an EEG sample over a particular time period; the number of channels in an EEG sample may vary.

EEGLAB offers a structured programming environment for storing, accessing, measuring, manipulating and visualizing event-related EEG. The EEGLAB GUI is designed to allow non-experienced Matlab users to apply advanced signal processing techniques to their data [4]. We use two basic filters from this toolbox: one is a low pass filter, which eliminates the noise components in the data (all content above 50 Hz is considered noise); the other is a band pass filter, whose pass band and stop band are selected according to the sub band frequencies. The power can also be estimated using the formula

P = (1/N) Σ |X(m)|²,

where X(m) is the DFT of the desired signal and N is the number of samples. After finding the power for each sub band, we plot the brain map. The power spectrum of the whole sample is also shown in the figure below.
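In the paper this filtering and power estimation is done with EEGLAB in Matlab; as a rough language-neutral sketch (synthetic stand-in signal, assumed 128 Hz sampling rate, band selection done directly on the FFT bins), the sub-band power computation above can be written as:

```python
import numpy as np

fs = 128                        # sampling rate in Hz (assumed)
t = np.arange(0, 4, 1 / fs)     # 4 s of synthetic data
# Stand-in "EEG": a strong 10 Hz (alpha) component plus a weak 6 Hz (theta) one.
x = 2.0 * np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 6 * t)

# Sub band edges from Section III (Hz).
bands = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 14), "beta": (14, 40)}

X = np.fft.rfft(x)                       # FFT of the whole signal
freqs = np.fft.rfftfreq(len(x), 1 / fs)  # frequency of each bin

def band_power(X, freqs, lo, hi, n):
    """P = (1/N) * sum |X(m)|^2 over the bins falling in [lo, hi)."""
    sel = (freqs >= lo) & (freqs < hi)
    return float(np.sum(np.abs(X[sel]) ** 2) / n)

powers = {name: band_power(X, freqs, lo, hi, len(x))
          for name, (lo, hi) in bands.items()}
strongest = max(powers, key=powers.get)
print(strongest)   # alpha
```

With real EEG one would band-pass filter in the time domain (as EEGLAB does) rather than select FFT bins, but the resulting per-band power estimate is the same quantity.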


EEGLAB provides an interactive graphical user interface (GUI) allowing users to flexibly and interactively process their high-density EEG data.

Fig: 4 Power spectrum for a single EEG channel

A. Delta waves

Delta is the frequency range up to 4 Hz. It is seen normally in adults in slow wave sleep. It is also seen normally in babies. It may be seen over focal lesions or diffusely in encephalopathy [6]. Delta waves are also naturally present in stages three and four of sleep (deep sleep), but not in stages 1 and 2 or in rapid eye movement (REM) sleep. Finally, delta rhythm can be observed in cases of brain injury and in comatose patients.

Fig: 5 Power spectrum of delta wave

B. Theta waves

Theta rhythms are one of several characteristic electroencephalogram waveforms associated with various sleep and wakefulness states. Theta is the frequency range from 4 Hz to 8 Hz. Theta is seen normally in young children. It may be seen in drowsiness or arousal in older children and adults; it can also be seen in meditation. Excess theta for age represents abnormal activity. These rhythms are associated with spatial navigation and some forms of memory and learning, especially in the temporal lobes. Theta rhythms are very strong in rodent hippocampi and entorhinal cortex during learning and memory retrieval; they can equally be seen in cases of focal or generalized subcortical brain damage.

Fig: 6 Power spectrum of theta wave

C. Alpha waves

Alpha is the frequency range from 8 Hz to 14 Hz. Hans Berger named the first rhythmic EEG activity he saw the "alpha wave" [6]. This is activity in the 8–12 Hz range seen in the posterior head regions when an adult patient is awake but relaxed. It was noted to attenuate with eye opening or mental exertion. This activity is now referred to as the "posterior basic rhythm," the "posterior dominant rhythm" or the "posterior alpha rhythm." The posterior basic rhythm is actually slower than 8 Hz in young children (therefore technically in the theta range). In addition to the posterior basic rhythm, there are two other normal alpha rhythms that are typically discussed: the mu rhythm and a temporal "third rhythm." Alpha can be abnormal; for example, an EEG that has diffuse alpha occurring in coma and is not responsive to external stimuli is referred to as "alpha coma."

Fig: 7 Power spectrum of alpha wave

D. Beta waves

Beta is the frequency range from 14 Hz to about 40 Hz. Low amplitude beta with multiple and varying frequencies is often associated with active, busy or anxious thinking and active concentration. Rhythmic beta with a dominant set of frequencies is associated with various pathologies and drug effects, especially benzodiazepines. Activity over about 25 Hz seen in the scalp EEG is rarely cerebral [6]. This is mostly seen in old people, whenever they are trying to relax; this activity is low in amplitude but occurs in a rhythmic pattern.

Fig: 8 Power spectrum of beta wave
V. CHANNEL LOCATION

Fig: 9 Channel locations of brain

The channel locations show the places where the electrodes are placed on the head. The above figure shows the two-dimensional plot of the brain and the channel locations.

Fig: 10 Channel locations of brain on a 3-d plot

The major algorithms used for the deconvolution of the EEG signal are ICA and JADE; first we try ICA.

VI. ICA

ICA can deal with an arbitrarily high number of dimensions. Consider 32 EEG electrodes, for instance. The signal recorded at all electrodes at each time point then constitutes a data point in a 32-dimensional space. After whitening the data, ICA rotates the axes in order to minimize the Gaussianity of the projection on each axis. The ICA component is the matrix that allows projecting the data in the initial space onto one of the axes found by ICA [5]. The weight matrix is the full transformation from the original space. We write:
• X - the original EEG channels;
• S - the EEG components;
• W - the weight matrix to go from the X space to the S space, so that S = WX.
In EEG, a component is an artifact time course or the time course of one compact domain in the brain. For example, with three electrodes:

                 elec1   elec2   elec3
S = Component 1 [0.824   0.534   0.314  ...]
    Component 2 [0.314   0.154   0.732  ...]
    Component 3 [0.153   0.734   0.13   ...]

Now we see how to reproject one component to the electrode space. W^-1 is the inverse matrix that goes from the source space S back to the data space X [2]:

X = W^-1 S

In conclusion, when we talk about independent components, we usually refer to two concepts:
• the rows of the S matrix, which are the time courses of the component activities;
• the columns of the W^-1 matrix, which are the scalp projections of the components.

Brain Mapping is a procedure that records the electrical activity within the brain. It gives us the ability to view the dynamic changes taking place throughout the brain during processing tasks and assists in determining which areas of the brain are fully engaged and processing efficiently. The electrical activity of the brain behaves like any other electrical system: changes in membrane polarization, inhibitory and excitatory postsynaptic potentials, action potentials, etc. create voltages that are conducted through the brain tissues. These electrical voltages pass through the membranes surrounding the brain, continue up through the skull and appear at the scalp, where they can be measured as microvolts.

Fig: 11 Brain Map for all 32 EEG components

These potentials are recorded by electrodes attached to the scalp with non-toxic conductive gel; the electrodes are fed into a sensitive amplifier. The EEG is recorded from many electrodes arranged in a particular pattern. Brain Mapping techniques are constantly evolving, and rely on the development and refinement of image acquisition, representation, analysis and visualization.
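The relationship X = W^-1 S described in the ICA section can be sketched numerically; here the channel data and the unmixing matrix W are stand-ins chosen for illustration (a real W would be learned by ICA or JADE from the EEG):

```python
import numpy as np

rng = np.random.default_rng(0)

# X: channels x time -- 3 "electrodes", 100 samples of stand-in EEG data.
X = rng.standard_normal((3, 100))

# W: an assumed invertible unmixing matrix (illustrative, not learned by ICA).
W = np.array([[0.9, 0.1,  0.0],
              [0.2, 1.1, -0.3],
              [0.0, 0.4,  0.8]])

S = W @ X                  # rows of S: component activity time courses
W_inv = np.linalg.inv(W)   # columns of W^-1: scalp projections of components
X_back = W_inv @ S         # reprojection: X = W^-1 S recovers the channels

print(np.allclose(X, X_back))   # True

# Reprojecting a single component k to the electrode space gives a rank-1
# contribution; summing these over all k reconstructs the full data.
k = 1
X_k = np.outer(W_inv[:, k], S[k])
print(X_k.shape)                # (3, 100)
```

This makes the two concepts above concrete: row k of S is the component's activity, and column k of W^-1 is how that activity projects onto the scalp electrodes.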


We took 32-channel raw EEG data and found the power spectrum of the signal. We deconvoluted the EEG sample using the ICA algorithm and, from the power spectrum, with the help of the EEGLAB toolbox in MATLAB, plotted brain maps for the sample EEG data. As future work we plan to split the EEG signal into its components using source separation algorithms other than ICA, to plot the brain maps for each component, and thus to compare all the available algorithms.


REFERENCES

[1] S. Sanei and A. R. Leyman, “EEG brain map reconstruction using blind source separation,” in Proc. IEEE Signal Processing Workshop, pp. 233–236, Aug. 2001.
[2] T. Ning and J. D. Bronzino, “Autoregressive and bispectral analysis techniques: EEG applications,” IEEE Engineering in Medicine and Biology Magazine, pp. 18–23, March.
[3] Downloaded EEG sample database.
[4] Downloaded EEGLAB toolbox for MATLAB.
[5] ICA toolbox for MATLAB.
[6] Notes on the alpha, beta, delta and theta sub bands.

Segmentation of Multispectral Brain MRI using Source
Separation Algorithm
Krishnendu K, PG Student, and Shanty Chacko, Lecturer, Department of Electronics & Communication
Engineering, Karunya University, Karunya Nagar, Coimbatore – 641 114, Tamil Nadu, India.

Abstract-- The aim of our paper is to implement an algorithm for segmenting multispectral MRI brain images and to check whether there is any performance improvement. One set of multispectral MRI brain images consists of spin-lattice relaxation time, spin-spin relaxation time and proton density weighted images (T1, T2 and PD). The algorithm to be used is a 'source separation algorithm'; source separation is a more general term, as we can use algorithms such as ICA, BINICA, JADE, etc. The first thing needed for implementing the algorithm is a database of multispectral MRI brain images, sometimes called the 'test database'. After the image database is acquired, we implement the algorithm, calculate the performance parameters and check for performance improvement with respect to an already implemented technique.
Keywords – Multispectral MRI, Test Database, Source Separation Algorithm, Segmentation.

I. INTRODUCTION

In the image processing field, segmentation [1] refers to the process of partitioning a digital image into multiple regions (sets of pixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. The result of image segmentation is a set of regions that collectively cover the entire image. Several general-purpose algorithms and techniques have been developed for image segmentation. Since there is no general solution to the image segmentation problem, these techniques often have to be combined with domain knowledge in order to effectively solve an image segmentation problem for a given problem domain. The methods most commonly used are clustering methods, histogram-based methods, region-growing methods, graph partitioning methods, model-based segmentation, multi-scale segmentation, semi-automatic segmentation and neural network segmentation.

Some of the practical medical imaging applications of image segmentation are:
o Locating tumors and other pathologies
o Measuring tissue volumes
o Diagnosis
o Treatment planning
o Study of anatomical structure

A. Need for Segmentation
The purposes of segmenting magnetic resonance (MR) images are:
1) to quantify the volume sizes of different tissue types within the body, and
2) to visualize the tissue structures in three dimensions using image fusion.

B. Magnetic Resonance Imaging (MRI)
Magnetic Resonance Imaging (MRI) is a technique primarily used in medical imaging to demonstrate pathological or other physiological alterations of living tissues. Medical MRI most frequently relies on the relaxation properties of excited hydrogen nuclei in water and lipids. When the object to be imaged is placed in a powerful, uniform magnetic field, the spins of atomic nuclei with a resulting non-zero spin have to arrange in a particular manner with the applied magnetic field according to quantum mechanics. Nuclei of hydrogen atoms (protons) have a simple spin 1/2 and therefore align either parallel or antiparallel to the magnetic field. The MRI scanners used in medicine operate with a strong, uniform magnetic field. The spin polarization determines the basic MRI signal strength. For protons, it refers to the population difference of the two energy states that are associated with the parallel and antiparallel alignment of the proton spins in the magnetic field. The tissue is then exposed to pulses of electromagnetic energy (RF pulses) in a plane perpendicular to the magnetic field, causing some of the magnetically aligned hydrogen nuclei to assume a temporary non-aligned high-energy state. In other words, the steady-state equilibrium established in the static magnetic field becomes perturbed and the population difference of the two energy levels is altered. In order to selectively image different voxels (volume picture elements) of the subject, orthogonal magnetic gradients are applied. The RF transmission system consists of an RF synthesizer, power amplifier and transmitting coil; this is usually built into the body of the scanner, and the power of the transmitter is variable. Magnetic gradients are generated by three orthogonal coils, oriented in the x, y and z directions of the scanner.
These are usually resistive electromagnets powered by sophisticated amplifiers which permit rapid and precise adjustments to their field strength and direction. Several time constants are involved in the relaxation processes that establish equilibrium following the RF excitation. These time constants are T1 and T2, and together with the proton density (PD) they determine image contrast. In the brain, T1-weighting causes the nerve connections of white matter to appear white, and the congregations of neurons of gray matter to appear gray, while cerebrospinal fluid appears dark. The contrast of white matter, gray matter and cerebrospinal fluid is reversed using T2 or PD imaging. A tissue with a long T1 and a long T2 (like cerebrospinal fluid) appears dark in the T1-weighted image and bright in the T2-weighted image. A tissue with a short T1 and a long T2 (like fat) is bright in the T1-weighted image and gray in the T2-weighted image. Gadolinium contrast agents reduce T1 and T2 times, resulting in an enhanced signal in the T1-weighted image and a reduced signal in the T2-weighted image.

In clinical practice, MRI is used to distinguish pathologic tissue (such as a brain tumor) from normal tissue. One advantage of an MRI scan is that it is thought to be harmless to the patient: it uses strong magnetic fields and non-ionizing radiation in the radio frequency range.

C. Multispectral MR Brain Images
Magnetic resonance imaging (MRI) is an advanced medical imaging technique providing rich information about the human soft tissue anatomy.

T1 (Spin-lattice Relaxation Time)
Spin-lattice relaxation time, known as T1, is a time constant in Nuclear Magnetic Resonance and Magnetic Resonance Imaging. T1 characterizes the rate at which the longitudinal Mz component of the magnetization vector recovers. The name spin-lattice relaxation refers to the time it takes for the spins to give the energy they obtained from the RF pulse back to the surrounding lattice in order to restore their equilibrium state. Different tissues have different T1 values: fluids have long T1s (1500-2000 ms), water-based tissues are in the 400-1200 ms range, while fat-based tissues are in the shorter 100-150 ms range. T1-weighted images can be obtained by setting short TR (< 750 ms) and TE (< 40 ms) values in conventional Spin Echo sequences.
has several advantages over other imaging techniques
enabling it to provide three-dimensional data with
high contrast between soft tissues. A multi-spectral
image (fig.1) is a collection of several monochrome
images of the same scene, each of them taken with a
different sensor. The advantage of using MR images
is the multispectral characteristics of MR images with
relaxation times (i.e.,T1 and T2) and proton density
(i.e., PD) information.

Fig 2. T1 weighted image

. T2 (Spin-spin Relaxation Time)
Spin-spin relaxation time, known as T2, is a
Figure. 1.MR multispectral images T1w (left), T2w (center), and time constant in Nuclear Magnetic Resonance and
PDw (right) for one brain axial slice Magnetic Resonance Imaging. T2 characterizes the
T1, T2 and PD weighted images depends on rate at which the Mxy component of the magnetization
two parameters, called sequence parameters, Echo vector decays in the transverse magnetic plane. T2
Time (TE) and Repetition Time (TR). Spin Echo decay occurs 5 to 10 times more rapidly than T1
sequence is based on repetition of 90° and 180° RF recovery, and different tissues have different T2s. For
pulses. Spin Echo sequence have two parameters, example, fluids have the longest T2s (700-1200
Echo Time (TE) is the time between the 90° RF pulse mSec), and water based tissues are in the 40-200
and MR signal sampling, corresponding to maximum mSec range, while fat based tissues are in the 10-100
of echo. The 180° RF pulse is applied at time TE/2 mSec range. T2 images in MRI are often thought of as
and Repetition Time is the time between 2 excitations "pathology scans" because collections of abnormal
pulses (time between two 90° RF pulses). Nearly all fluid are bright against the darker normal tissue. T2
MR image display tissue contrasts that depend on weighted images can be obtained by setting long TR
proton density, T1 and T2 simultaneously. PD, T1 and (>1500 mSec) and TE (> 75mSec) values in
T2 weighting will vary with sequence parameters, and conventional Spin Echo sequences. The "pathology
may differ between different tissues in the same weighted" sequence, because most pathology contains
image. A tissue with a long T1 and T2 (like water) is

more water than normal tissue around it, it is usually than the segmentation obtained from each image
brighter on T2. individually or from the addition of the three images’
PD (Proton Density) segmentations.
Proton density denotes the concentration of Some examples are,
mobile Hydrogen atoms within a sample of tissue. An 1) Dark on T1, bright on T2, This is a typical
image produced by controlling the selection of scan pathology. Most cancers have these characteristics.
parameters to minimize the effects of T1 and T2, 2) Bright on T1, bright on T2, blood in the brain has
resulting in an image dependent primarily on the these characteristics.
density of protons in the imaging volume. 3) Bright on T1, less bright on T2, this usually means
the lesion is fatty or contains fat.
4) Dark on T1, dark on T2, chronic blood in the brain
has these characteristics.
Following is a table of approximate values of
the two relaxation time constants for nonpathological
human tissues.

Tissue Type T1 (ms) T2 (ms)

Cerebrospinal Fluid 2300 2000
(similar to pure
Gray matter of 920 100
White matter of 780 90
Fig 3. T2 weighted image Blood 1350 200
Proton density contrast is a quantitative Fat 240 85
summary of the number of protons per unit tissue. The Gadolinium Reduces T1 and T2 times
higher the number of protons in a given unit of tissue,
the greater the transverse component of Table 1. Approximate values of the two relaxation time constants
magnetization, and the brighter the signal on the D. Applications of Segmentation
proton density contrast image.
The classic method of medical image
analysis, the inspection of two-dimensional grayscale
images, is not sufficient for many applications. When
detailed or quantitative information about the
appearance, size, or shape of patient anatomy is
desired, image segmentation is often the crucial first
step. Applications of interest that depend on image
segmentation include three-dimensional visualization,
volumetric measurement, research into shape
representation of anatomy, image-guided surgery, and
detection of anatomical changes over time.
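The TR/TE rules quoted above for conventional Spin Echo sequences (T1w: short TR and TE; T2w: long TR and TE; PDw: long TR, short TE) can be collected into a small decision helper. This is an illustrative sketch in Python rather than anything from the paper; the function name and the "mixed contrast" fallback are assumptions, while the threshold values are the ones stated in the text.

```python
def spin_echo_weighting(tr_ms, te_ms):
    """Classify a conventional Spin Echo acquisition by its sequence
    parameters, using the threshold values quoted in the text."""
    if tr_ms < 750 and te_ms < 40:
        return "T1-weighted"      # short TR, short TE
    if tr_ms > 1500 and te_ms > 75:
        return "T2-weighted"      # long TR, long TE
    if tr_ms > 1500 and te_ms < 40:
        return "PD-weighted"      # long TR, short TE
    return "mixed contrast"       # anything in between
```

For example, an acquisition with TR = 2000 ms and TE = 20 ms falls in the long-TR/short-TE corner and would be reported as PD-weighted.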

Fig 4. PD weighted image
A T1 weighted image is usually acquired using a short TR (the repetition time of a pulse sequence) and a short TE (the spin-echo delay time). Similarly, a T2 weighted image is acquired using relatively long TR and TE, and a PD weighted image with a long TR and a short TE. Since the three images are strongly correlated (and spatially registered) over the patient space, the information extracted by image processing from the images together is obviously more valuable than that extracted from each image individually. Therefore, tissue segmentation from the three MR images is expected to produce more accurate 3D reconstruction and visualization than the segmentation obtained from each image individually, or from the addition of the three images' segmentations.

A. Algorithm
1) Loading the T1, T2 and PD images.
2) Converting to double precision format.
3) Converting each image matrix to a row matrix.
4) Combining the three row matrices to form a single matrix.
5) Computing the independent components of the matrix using the FastICA algorithm.
6) Separating the rows of the resultant matrix into three row matrices.
7) Reshaping each row matrix to 256x256.
8) Executing dynamic pixel range correction.
9) Converting to unsigned integer format.
10) Plotting the input images and segmented output images.

B. Independent Component Analysis
• Introduction to ICA
• Whitening the data
• The ICA algorithm
• ICA in N dimensions
• ICA properties

Introduction to ICA
ICA is a quite powerful technique that is able to separate independent sources linearly mixed in several sensors. For instance, when recording magnetic resonance images (MRI), ICA can separate out artifacts embedded in the data (since they are usually independent of each other). ICA is a technique to separate linearly mixed sources. We used the FastICA algorithm for segmenting the images, as the code is directly available on the World Wide Web.

Whitening the data
Some preprocessing steps are performed by most ICA algorithms before ICA is actually applied. A first step in many ICA algorithms is to whiten (or sphere) the data. This means that we remove any correlations in the data, i.e. the different channels of, say, a matrix Q are forced to be uncorrelated. The reason for whitening is that it restores the initial "shape" of the data, so that ICA then only has to rotate the resulting matrix. After whitening, the variance on both axes is equal and the correlation of the projection of the data on both axes is 0 (meaning that the covariance matrix is diagonal and all its diagonal elements are equal). Applying ICA then only means "rotating" this representation back to the original axis space. The whitening process is simply a linear change of coordinates of the mixed data; once the ICA solution is found in this "whitened" coordinate frame, we can easily reproject it back into the original coordinate frame. Putting it in mathematical terms, we seek a linear transformation V of the data D such that for P = V*D we have Cov(P) = I (I being the identity matrix, with zeros everywhere and 1s on the diagonal; Cov being the covariance). This means that all the rows of the transformed matrix are uncorrelated.

The ICA algorithm
ICA rotates the whitened matrix back to the original space. It performs the rotation by minimizing the Gaussianity of the data projected on both axes (fixed-point ICA). By rotating the axes and minimizing the Gaussianity of the projection, ICA is able to recover the original sources, which are statistically independent (this property comes from the central limit theorem, which states that any linear mixture of two independent random variables is more Gaussian than the original variables).

ICA in N dimensions
ICA can deal with an arbitrarily high number of dimensions. The ICA components form the matrix that projects the data in the initial space onto one of the axes found by ICA; the weight matrix is the full transformation from the original space. When we write
S = W X,
X is the data in the original space, S is the source activity, and W is the weight matrix to go from the data space X to the source space S. The rows of W are the vectors with which we can compute the activity of one independent component. To go back from the source space S to the data space X we use the inverse matrix W^-1:
X = W^-1 S
If S is a row vector and we multiply it by the corresponding column vector of the inverse matrix above, we obtain the projected activity of one component. All the components together form a matrix; the rows of the S matrix are the time courses of the component activities.

ICA properties
• ICA can only separate linearly mixed sources.
• Since ICA is dealing with clouds of points, changing the order in which the points are plotted has virtually no effect on the outcome of the algorithm.
• Changing the channel order also has no effect on the outcome of the algorithm.
• Since ICA separates sources by maximizing their non-Gaussianity, perfect Gaussian sources cannot be separated.
• Even when the sources are not independent, ICA finds a space where they are maximally independent.
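The whitening condition Cov(P) = I and the fixed-point rotation described above can be sketched in a few lines of NumPy. This is an illustrative re-implementation, not the FastICA toolbox code the authors used: `whiten` enforces Cov(P) = I through an eigendecomposition of the covariance, and `fastica` performs the fixed-point iteration with a tanh nonlinearity and symmetric decorrelation. Function names and defaults are assumptions made for this sketch.

```python
import numpy as np

def whiten(X):
    """Whiten X (channels x samples): find V such that Cov(V @ Xc) = I."""
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(Xc))     # eigendecomposition of the covariance
    V = E @ np.diag(d ** -0.5) @ E.T      # whitening (sphering) matrix
    return V @ Xc, V

def fastica(X, n_iter=200, seed=0):
    """Fixed-point ICA with a tanh nonlinearity and symmetric decorrelation."""
    P, V = whiten(X)                      # after whitening, ICA only rotates P
    n, N = P.shape
    W = np.random.default_rng(seed).standard_normal((n, n))
    for _ in range(n_iter):
        G = np.tanh(W @ P)
        # fixed-point update, row-wise: E{x g(w.x)} - E{g'(w.x)} w
        W1 = G @ P.T / N - np.diag((1.0 - G ** 2).mean(axis=1)) @ W
        # symmetric decorrelation: W <- (W W^T)^(-1/2) W
        d, E = np.linalg.eigh(W1 @ W1.T)
        W = E @ np.diag(d ** -0.5) @ E.T @ W1
    return W @ P, W @ V                   # sources S and full unmixing matrix
```

On synthetic data (a few independent non-Gaussian signals mixed by a random matrix), the recovered rows of S match the true sources up to permutation, sign and scale, which is all ICA can promise.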


The acquired MR images are in the DICOM (Digital Imaging and Communications in Medicine, .dcm) single-file format; a special command is used to load them in MATLAB. The input T1, T2 and PD images and the corresponding segmented output images are given below (Fig. 5).

Fig. 5. MR multispectral images T1w (left), T2w (center) and PDw (right) for one brain axial slice, and the corresponding segmented T1w (left), T2w (center) and PDw (right) images

Segmented multispectral MR (T1, T2 and PD) images are obtained using the ICA algorithm. The tissues can be analyzed using the segmented image; for analyzing the tissues, parameters like the T1 time and T2 time of each tissue type must be known. As future work we are planning to extract only the brain part from the image using the snake algorithm, and will try to segment it using the ICA algorithm and our
MR Brain Tumor Image Segmentation Using Clustering
Lincy Annet Abraham1, D. Jude Hemanth2
PG Student of Applied Electronics1, Lecturer2
Department of Electronics & Communication Engineering
Karunya University, Coimbatore.

Abstract- In this study, unsupervised clustering methods are examined to develop a medical diagnostic system, and fuzzy clustering is used to assign patients to the different clusters of brain tumor. We present a novel algorithm for obtaining fuzzy segmentations of images that are subject to multiplicative intensity inhomogeneities, such as magnetic resonance images. The algorithm is formulated by modifying the objective function in the fuzzy algorithm to include a multiplier field, which allows the centroids of each class to vary across the image. Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give a visual representation of the original data. The results are compared with the results of clustering according to classification performance. This application shows that fuzzy clustering methods can be an important supportive tool for medical experts in diagnostics.

Index Terms- Image segmentation, intensity inhomogeneities, fuzzy clustering, magnetic resonance imaging.

I. INTRODUCTION

According to the rapid development of medical devices, traditional manual data analysis has become inefficient and computer-based analysis is indispensable. Statistical methods, fuzzy logic, neural networks and machine learning algorithms are being tested on many medical prediction problems to provide a decision support system.

Image segmentation plays an important role in a variety of applications such as robot vision, object recognition, and medical imaging. There has been considerable interest recently in the use of fuzzy segmentation methods, which retain more information from the original image than hard segmentation methods. The fuzzy c-means algorithm (FCM), in particular, can be used to obtain segmentation via fuzzy pixel classification. Unlike hard classification methods, which force pixels to belong exclusively to one class, FCM allows pixels to belong to multiple classes with varying degrees of membership. The approach allows additional flexibility in many applications and has recently been used in the processing of magnetic resonance (MR) images. In this work, unsupervised clustering methods are performed to cluster the patients' brain tumors. Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give a visual representation of the original data. In this study the fuzzy c-means algorithm is used to separate the tumor from the brain so that it can be identified in a particular color. Supervised and unsupervised segmentation techniques provide broadly similar results.

II. PROPOSED METHODOLOGY

Figure 1. Block Diagram

Figure 1 shows the proposed methodology of segmentation of images. Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give a visual representation of the original data with three approaches: the literal and approximate fuzzy c-means unsupervised clustering algorithms, and a supervised computational neural network, a dynamic multilayered perceptron trained with the cascade correlation learning algorithm. Supervised and unsupervised segmentation techniques provide broadly similar results. The unsupervised fuzzy algorithms were visually observed to show better segmentation when compared with raw image data for volunteer studies.

In computer vision, segmentation refers to the process of partitioning a digital image into multiple regions (sets of pixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze.
Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. The result of image segmentation is a set of regions that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region is similar with respect to some characteristic or computed property, such as color, intensity, or texture; adjacent regions are significantly different with respect to the same characteristic(s). Some of the practical applications of image segmentation are:
• Medical imaging
  o Locate tumors and other pathologies
  o Measure tissue volumes
  o Computer-guided surgery
  o Diagnosis
  o Treatment planning
  o Study of anatomical structure
• Locate objects in satellite images (roads, forests, etc.)
• Face recognition
• Fingerprint recognition
• Automatic traffic controlling systems
• Machine vision

Fuzzy C-means Clustering (FCM), also known as Fuzzy ISODATA, is a clustering technique distinct from hard k-means, which employs hard partitioning. FCM employs fuzzy partitioning such that a data point can belong to all groups, with different membership grades between 0 and 1. FCM is an iterative algorithm; its aim is to find cluster centers (centroids) that minimize a dissimilarity function.

To accommodate the introduction of fuzzy partitioning, the membership matrix (U) is randomly initialized according to Equation (1):

    ∑_{i=1}^{c} u_ij = 1,  ∀ j = 1, ..., n        (1)

The dissimilarity function used in FCM is given in Equation (2):

    J(U, c_1, c_2, ..., c_c) = ∑_{i=1}^{c} J_i = ∑_{i=1}^{c} ∑_{j=1}^{n} u_ij^m d_ij^2        (2)

where
u_ij is between 0 and 1;
c_i is the centroid of cluster i;
d_ij is the Euclidean distance between the ith centroid (c_i) and the jth data point;
m ∈ [1, ∞) is a weighting exponent.

To reach a minimum of the dissimilarity function there are two conditions. These are given in Equation (3) and Equation (4):

    c_i = ∑_{j=1}^{n} u_ij^m x_j / ∑_{j=1}^{n} u_ij^m        (3)

    u_ij = 1 / ∑_{k=1}^{c} (d_ij / d_kj)^(2/(m−1))        (4)

3.1 ALGORITHM
The algorithm proceeds through the following steps.
Step 1. Randomly initialize the membership matrix (U) subject to the constraints in Equation (1).
Step 2. Calculate the centroids (c_i) using Equation (3).
Step 3. Compute the dissimilarity between centroids and data points using Equation (2). Stop if its improvement over the previous iteration is below a threshold.
Step 4. Compute a new U using Equation (4). Go to Step 2.

FCM does not ensure convergence to an optimal solution, because the cluster centers (centroids) are initialized from a U that is itself randomly initialized (Equation (3)).

Figure 2 shows the systematic procedure of the algorithm, using the computations given above:
1) Read the input image
2) Set the number of clusters = 4
3) Calculate the Euclidean distance
4) Randomly initialize the membership matrix
5) Calculate the centroids
6) Calculate the membership coefficients
7) If the change is above 0.01, update the membership matrix and repeat
8) If the change is at or below 0.01, display the segmented image
9) The image is converted into colour
10) The segmented tumor is displayed in a particular colour and the rest in another

The set of MR images consists of 256x256 12-bit images. The fuzzy segmentation was done in MATLAB software. Four types of brain tumor were used in this study, namely astrocytoma, meningioma, glioma and metastasis.
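Equations (1)-(4) and the stepwise procedure above translate almost directly into code. The following is a minimal 1-D sketch in Python/NumPy rather than the authors' MATLAB implementation: it clusters a vector of pixel intensities and stops when the objective J of Equation (2) improves by less than a threshold. The function name and defaults are illustrative assumptions.

```python
import numpy as np

def fcm(x, c=4, m=2.0, tol=0.01, max_iter=300, seed=0):
    """Fuzzy c-means on a 1-D vector of pixel intensities, following
    Equations (1)-(4): random U (1), centroid update (3), membership
    update (4); stop when the objective J of (2) improves by < tol."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, x.size))
    U /= U.sum(axis=0)                                  # Equation (1): columns sum to 1
    J_prev = np.inf
    for _ in range(max_iter):
        Um = U ** m
        centroids = (Um @ x) / Um.sum(axis=1)           # Equation (3)
        d = np.abs(x[None, :] - centroids[:, None])     # distances d_ij (1-D case)
        J = np.sum(Um * d ** 2)                         # objective, Equation (2)
        if J_prev - J < tol:
            break
        J_prev = J
        d = np.maximum(d, 1e-12)                        # guard against d_kj = 0
        U = 1.0 / ((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))).sum(axis=1)  # Equation (4)
    return centroids, U
```

Note that the membership update of Equation (4) automatically keeps every column of U summing to one, so the constraint of Equation (1) is preserved at every iteration.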

Table 1. Types and number of data
Astrocytoma    15
Meningioma     25
Glioma         20
Metastasis     10

Figure 2. Algorithm for identification (flowchart): read the input MR images → set the number of clusters → calculate the Euclidean distance → randomly initialize the membership matrix → calculate the centroids → calculate the membership coefficients for each pixel in each cluster → if the change is <= 0.01, display the output image; otherwise iterate.

The fuzzy c-means algorithm is used to assign the patients to different clusters of brain tumor. This application of fuzzy sets in a classification function causes the class membership to become a relative one, and an object can belong to several classes at the same time but with different degrees. This is an important feature for a medical diagnostic system, as it increases the sensitivity. The four types of data were used; one sample is shown below. Figure 3 shows the input image of the brain tumor and Figure 4 shows the fuzzy segmented output image.

Figure 3. Input image

Figure 4. Output image
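Steps 9) and 10) of the procedure — converting the label image to colour so that the tumor cluster stands out — can be sketched as a simple palette lookup. The palette values below are hypothetical; the paper states only that the tumor is shown in one particular colour and the rest in another.

```python
import numpy as np

# Hypothetical palette: tumor cluster in red, remaining clusters in
# dark/grey/blue shades (the exact colours are not given in the paper).
PALETTE = np.array([[0, 0, 0],        # background
                    [90, 90, 90],     # tissue cluster 1
                    [60, 60, 200],    # tissue cluster 2
                    [255, 0, 0]],     # tumor cluster
                   dtype=np.uint8)

def colourize(labels, palette=PALETTE):
    """Turn an HxW array of cluster labels (0..3) into an HxWx3 RGB image."""
    return palette[labels]
```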

Table 2. Segmentation results
Count          20
Threshold      0.0085
Time period    78.500000 seconds
Centroid       67.3115   188.9934   120.2793   13.9793

VI. CONCLUSION

In this study, we use the fuzzy c-means algorithm to cluster the brain tumor. In medical diagnostic systems, the fuzzy c-means algorithm gives better results for our application. Another important feature of the fuzzy c-means algorithm is the membership function: an object can belong to several classes at the same time, but with different degrees. This is a useful feature for a medical diagnostic system. As a result, the fuzzy clustering method can be an important supportive tool for medical experts in diagnostics. As future work, the fuzzy c-means result is to be compared with other fuzzy segmentation methods. The reduced time period of fuzzy segmentation is useful in medical practice.

We would like to thank M/S Devaki Scan Center, Madurai, Tamil Nadu, for providing the MR brain tumor images of various patients and the associated data.

MRI Image Classification Using Orientation Pyramid and Multi-resolution Method
R. Catharine Joy, Anita Jones Mary
PG Student of Applied Electronics, Lecturer
Department of Electronics and Communication Engineering
Karunya University, Coimbatore.

Abstract--In this paper, a multi-resolution volumetric texture segmentation algorithm is used. Textural measurements were extracted in 3-D data by sub-band filtering with an Orientation Pyramid method. Segmentation is used to detect objects by dividing the image into regions based on colour, motion, texture, etc. Texture relates to the surface or structure of an object, depends on the relation of contiguous elements, and may be characterised by granularity or roughness, principal orientation and periodicity. We describe the 2-D and 3-D frequency domain texture feature representation by illustrating and quantitatively comparing results on example 2-D images and 3-D MRI. First, the algorithm was tested with 3-D artificial data, and natural textures of human knees will be used to describe the frequency and orientation multi-resolution sub-band filtering. Next, three magnetic resonance imaging sets of human knees will be used to discriminate anatomical structures that can be used as a starting point for other measurements such as cartilage extraction.

Index Terms- Volumetric texture, Texture Classification, sub-band filtering, Multi-resolution.

I.INTRODUCTION

Volumetric texture analysis is highly desirable for medical imaging applications such as magnetic resonance imaging segmentation, ultrasound, or computed tomography, where the data provided by the scanners are either intrinsically 3-D or a time series of 2-D images that can be treated as a data volume. Moreover, the segmentation system can be used as a tool to replace the tedious process of manual segmentation. We also describe a full texture description using multi-resolution sub-band filtering. Texture features derived from the grey level co-occurrence matrix (GLCM) calculate the joint statistics of grey-levels of pairs at varying distances, and constitute a simple and widely used texture feature. Texture analysis has been used with mixed success in MRI, such as for detection of microcalcification in breast imaging and for knee segmentation. Each portion of the cartilage image is segmented and shown clearly. Texture segmentation is to segment an image into regions according to the textures of the regions; the goal is to simplify and change the representation of an image into something that is more meaningful and easier to analyse. For example, the problem of grey-matter white-matter labelling in central nervous system (CNS) images like MRI head-neck studies has been addressed by supervised statistical classification methods, notably EM-MRF. The segmented portions cannot be seen clearly through a 2-D slice image, so we are going for 3-D rendering. The cartilage image also cannot be segmented and viewed clearly. Tessellation or tiling of a plane is a collection of plane figures that fills the plane with no overlaps and no gaps.

In this paper we describe fully a 3-D texture description scheme using multi-resolution sub-band filtering, and develop a strategy for selecting the most discriminant texture features conditioned on a set of training images. We propose a sub-band filtering scheme for volumetric textures that provides a series of measurements which capture the different textural characteristics of the data. The filtering is performed in the frequency domain with filters that are easy to generate and give powerful results. A multi-resolution classification scheme is then developed which operates on the joint data-feature space within an oct-tree structure. This benefits the efficiency of the computation and ensures that only the certain labelling at a given resolution is propagated to the next. Interfaces between regions (planes), where the label decisions are uncertain, are smoothed by the use of 3-D "butterfly" filters which focus the inter-class labels.

II.LITERATURE SURVEY

Texture analysis has been used with mixed success in MRI, such as for detection of microcalcification in breast imaging, for knee segmentation, and in CNS imaging to detect macroscopic lesions and microscopic abnormalities, such as for quantifying contralateral differences in epilepsy subjects, to aid the automatic delineation of cerebellar volumes, to estimate effects of age and gender on brain asymmetry, and to characterize spinal cord pathology in Multiple Sclerosis. Segmenting the trabecular region of the bone can also be viewed as classifying the pixels in that region, since the boundary is initialized to contain intensity and texture corresponding to trabecular bone and then grows outwards to find the true boundary of that bone region. However, no classification is performed on the rest of the image, and the classification of trabecular bone is performed locally. The concept of image texture is intuitively obvious to us, yet it can be difficult to provide a satisfactory definition. Texture relates to the surface or structure of an object, depends on the relation of contiguous elements, and may be characterized by granularity or roughness, principal orientation and periodicity.

The principle of sub-band filtering can equally be applied to images or volumetric data. Wilson and Spann proposed a set of operations that subdivide the frequency domain of an image into smaller regions by the use of two operators, quadrant and centre-surround. By combining these operators, it is possible to construct different tessellations of the space, one of which is the orientation pyramid. To visualize the previous distribution, the Bhattacharyya space and its two marginal distributions were obtained for a natural texture image with 16 classes. It is important to mention two aspects of this selection process: the Bhattacharyya space is constructed on training data, and the individual Bhattacharyya distances are calculated between pairs of classes. Therefore, there is no guarantee that the features selected will always improve the classification of the whole data space; the features selected could be mutually redundant, or may only improve the classification for a pair of classes but not the overall classification.

Fig: 1 (a, b) 2-D orientation pyramid
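The pairwise Bhattacharyya distances mentioned above can be computed in closed form when each sub-band feature is modelled as a univariate Gaussian per class. This is a hedged sketch of that single ingredient only: the original work builds a whole Bhattacharyya space over many sub-band features, while the function below gives just the per-feature, per-class-pair distance.

```python
import math

def bhattacharyya_gaussian(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two univariate Gaussian
    class-conditional distributions of one sub-band feature."""
    term_mean = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
    term_var = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return term_mean + term_var
```

Identical class distributions give a distance of zero, and the distance grows as the class means separate relative to their variances, which is what makes it usable as a discriminability score for feature selection.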


Fig: 1 (c) 3-D orientation pyramid

Volumetric texture is considered as the texture that can be found in volumetric data. Texture relates to the surface or structure of an object and depends on the relation of contiguous elements. Other concepts of texture are smoothness, fineness, coarseness and graininess, and three different approaches for texture analysis can be described: statistical, structural and spectral. The statistical methods rely on the moments of the grey level histogram: mean, standard deviation, skewness, flatness, etc. According to Sonka, texture is scale dependent; therefore a multi-resolution analysis of an image is required if texture is going to be analysed. Texture analysis has been used with mixed success in MRI, such as for detection of microcalcification in breast imaging and for knee segmentation. Texture segmentation is to segment an image into regions according to the textures of the regions.

IV.SUBBAND FILTERING USING AN ORIENTATION PYRAMID

Fig: 2 A graphical example of sub-band filtering.

The principle of sub-band filtering can equally be applied to images or volumetric data. Certain characteristics of signals in the spatial domain, such as periodicity, are quite distinctive in the frequency or Fourier domain. If the data contain textures that vary in orientation and frequency, then certain filter sub-bands will contain more energy than others. Wilson and Spann proposed a set of operations that subdivide the frequency domain of an image into smaller regions by the use of two operators, quadrant and centre-surround. By combining these operators, it is possible to construct different tessellations of the space, one of which is the orientation pyramid.

A.SUBBAND FILTERING

A filter bank is an array of band-pass filters that separates the input signal into several components, each one carrying a single frequency sub-band of the original signal. It is also desirable to design the filter bank in such a way that the sub-bands can be recombined to recover the original signal. The first process is called analysis, while the second is called synthesis. The output of analysis is referred to as a sub-band signal, with as many sub-bands as there are filters in the filter bank.
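The idea that textures of different periodicity concentrate their energy in different sub-bands can be illustrated with a crude ring-shaped filter bank in the Fourier domain. This is a simplified stand-in for the orientation-pyramid tessellation (concentric frequency rings only, with no orientation selectivity), written in Python/NumPy purely for illustration.

```python
import numpy as np

def band_energies(img, n_bands=4):
    """Energy of an image in concentric frequency rings -- a minimal
    stand-in for orientation-pyramid sub-band measurements."""
    F = np.fft.fftshift(np.fft.fft2(img))   # spectrum with DC at the centre
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h / 2, xx - w / 2)    # radial frequency of each bin
    r_max = r.max()
    energies = []
    for b in range(n_bands):
        lo, hi = b * r_max / n_bands, (b + 1) * r_max / n_bands
        mask = (r >= lo) & (r < hi)
        energies.append(float(np.sum(np.abs(F[mask]) ** 2)))
    return energies
```

A smooth, low-frequency texture puts nearly all its energy in the innermost ring, while a fine, high-frequency texture peaks in an outer ring, so the vector of band energies already separates the two.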

Fig: 3 Sub-band filter images of the second orientation pyramid containing 13 sub-band regions of the human knee MRI.

Fig: 4 K-means classification of an MR image of a human knee based on frequency and orientation regions.

The filter bank serves to isolate different frequency components in a signal. This is useful because for most applications some frequencies are more important than others. For example, these important frequencies can be coded with a fine resolution: small differences at these frequencies are significant, and a coding scheme that preserves these differences must be used. On the other hand, less important frequencies do not have to be exact; a coarser coding scheme can be used, even though some of the finer details will be lost in the coding.

Once the phase congruency map of an image has been constructed, we know the feature structure of the image. However, thresholding is coarse, highly subjective, and in the end eliminates much of the important information in the image. Some other method of compressing the feature information needs to be considered, and some way of extracting the non-feature information, or the smooth map of the image, needs to be developed. In the absence of noise, the feature map and the smooth map should comprise the whole image. When noise is present, there will be a third component to any image signal, one that is independent of the other two.

B.PYRAMIDS

Pyramids are an example of a multi-resolution representation of the image. Pyramids separate information into frequency bands. In the case of images, we can represent high frequency information (textures, etc.) in a finely sampled grid, while coarse information can be represented in a coarser grid (a lower sampling rate is acceptable). Thus, coarse features can be detected in the coarse grid using a small template size. This is often referred to as a multi-resolution or multi-scale representation.

VI.EXPERIMENTAL RESULTS

The 3-D MRI sets of human knees were acquired with different protocols, one set with Spin Echo and two sets with SPGR. In the three cases each slice had dimensions of 512 x 512 pixels, with 87, 64, and 60 slices respectively. The bones, background, muscle and tissue classes were labelled to provide for evaluation. Four training regions of size 32 x 32 x 32 elements were manually selected for the classes of background, muscle, bone and tissue. These training regions were small relative to the size of the data set, and they remained as part of the test data. Each training sample was filtered with the OP sub-band filtering scheme.

V.MULTIRESOLUTION CLASSIFICATION

A multi-resolution classification strategy can exploit the inherent multi-scale nature of texture, and better results can be achieved. The multi-resolution procedure consists of three main stages: climb, decide and descend. The climbing stage represents the decrease in resolution of the data by means of averaging a set of neighbours on one level (children elements or nodes) up to a parent element on the upper level. Two common climbing methods are the Gaussian pyramid and the quad tree. The decrease in resolution correspondingly reduces the uncertainty in the
elements’ values since they tend toward their mean. In
contrast, the positional uncertainty increases at each level.
At the highest level, the new reduced space can be classified
either in a supervised or unsupervised scheme.

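The climbing stage (averaging a set of child elements into a parent element on the next, coarser level, as in the Gaussian pyramid or quadtree) can be sketched as follows. The plain 2x2 block average and the toy 8x8 image are illustrative assumptions, not the paper's exact kernels:

```python
import numpy as np

def climb(level):
    """One climbing step: average each 2x2 block of children
    into a single parent element on the next (coarser) level.
    Assumes even dimensions, as in a quadtree."""
    h, w = level.shape
    return level.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid(image, levels):
    """Build a Gaussian-pyramid-style stack by repeated climbing."""
    stack = [image]
    for _ in range(levels - 1):
        stack.append(climb(stack[-1]))
    return stack

# Toy 8x8 image climbs to 4x4 and 2x2 levels; element values
# tend toward their mean as the resolution decreases.
img = np.arange(64, dtype=float).reshape(8, 8)
pyr = pyramid(img, 3)
```

Note how averaging preserves the overall mean at every level, which is why value uncertainty decreases while positional uncertainty grows.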
Fig: 5 One slice from a knee MRI data set is filtered with a sub-band filter
with a particular frequency.
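The analysis/synthesis property of a sub-band filter bank can be sketched with ideal brick-wall masks on a 1-D signal's FFT. The real method uses 2-D/3-D orientation-pyramid filters, so this is only an illustration of the recombination property:

```python
import numpy as np

def analysis(signal, bands):
    """Split a 1-D signal into frequency sub-bands by masking
    disjoint ranges of FFT bins (ideal brick-wall filters)."""
    spectrum = np.fft.fft(signal)
    n = len(signal)
    edges = np.linspace(0, n, bands + 1).astype(int)
    subbands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = np.zeros(n)
        mask[lo:hi] = 1.0
        subbands.append(np.fft.ifft(spectrum * mask))
    return subbands

def synthesis(subbands):
    """Because the masks tile the whole spectrum, summing the
    sub-band signals recovers the original signal exactly."""
    return np.real(sum(subbands))

x = np.random.default_rng(0).standard_normal(64)
parts = analysis(x, 4)   # 4 sub-band signals, one per filter
```

Each sub-band carries one portion of the spectrum, and summing them performs perfect-reconstruction synthesis for these ideal filters.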

The SPGR (Spoiled Gradient Recalled) MRI data sets were classified and the bone was segmented, with the objective of using this as an initial condition for extracting the cartilage of the knee. The cartilage adheres to the condyles of the bones and appears as a bright, curvilinear structure in SPGR MRI data. In order to segment the cartilage out of the MRI sets, two heuristics were used: cartilage appears bright in the SPGR MRIs, and cartilage resides in the region between bones. These are translated into two corresponding rules: threshold voxels above a certain gray level, and discard those not close to the region of contact between bones.

REFERENCES

[1] C. C. Reyes-Aldasoro and A. Bhalerao, "Volumetric texture description and discriminant feature selection for MRI," in Proc. Information Processing in Medical Imaging, C. Taylor and A. Noble, Eds., Ambleside, U.K., Jul. 2003.
[2] W. M. Wells, W. E. L. Grimson, R. Kikinis, and F. A. Jolesz, "Adaptive segmentation of MRI data," IEEE Trans. Med. Imag., vol. 15, no. 4, Aug. 1996.
[3] C. Reyes-Aldasoro and A. Bhalerao, "The Bhattacharyya space for feature selection and its application to texture segmentation," Pattern Recognit., vol. 39, no. 5, pp. 812-826, 2006.
[4] G. B. Coleman and H. C. Andrews, "Image segmentation by clustering," Proc. IEEE, vol. 67, no. 5, pp. 773-785, May 1979.
[5] P. J. Burt and E. H. Adelson, "The Laplacian pyramid as a compact image code," IEEE Trans. Commun., vol. COM-31, no. 4, pp. 532-540, Apr. 1983.
[6] V. Gaede and O. Günther, "Multidimensional access methods," ACM Computing Surveys, vol. 30, no. 2, pp. 170-231, 1998.

Fig:6 The cartilage and one slice of the MRI set.
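The two cartilage-segmentation rules (keep voxels above a gray-level threshold, then discard those far from the bone-contact region) can be sketched in a toy form. The function name, the Chebyshev distance measure and the 2-D example are assumptions for illustration, not the paper's exact implementation:

```python
import numpy as np

def segment_cartilage(volume, contact_mask, gray_thresh, max_dist):
    """Rule 1: keep voxels brighter than gray_thresh.
    Rule 2: keep only those within max_dist voxels (Chebyshev
    distance, an illustrative choice) of the bone-contact region."""
    bright = volume > gray_thresh
    contact_idx = np.argwhere(contact_mask)
    keep = np.zeros(volume.shape, dtype=bool)
    for idx in np.argwhere(bright):
        # distance from this bright voxel to the nearest contact voxel
        d = np.abs(contact_idx - idx).max(axis=1).min()
        keep[tuple(idx)] = d <= max_dist
    return keep

# Toy 2-D slice: two bright pixels, contact region at (1, 2);
# only the pixel near the contact region survives rule 2.
vol = np.zeros((5, 5))
vol[1, 1] = vol[1, 4] = 200.0
contact = np.zeros((5, 5), dtype=bool)
contact[1, 2] = True
mask = segment_cartilage(vol, contact, gray_thresh=100, max_dist=1)
```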


VII.CONCLUSION

A multi-resolution algorithm was used to view the classified images as segments. A sub-band filtering algorithm for segmentation was described and tested, first with artificial and natural textures, yielding fairly good results. The algorithm was then used to segment a human knee MRI; the anatomical regions muscle, bone, tissue and background could be distinguished. Textural measurements were extracted from 3-D data by sub-band filtering with an Orientation Pyramid tessellation method. The algorithm was tested with artificial 3-D images and MRI sets of human knees, and satisfactory classification results were obtained in 3-D at a modest computational cost. In the case of MRI data, M-VTS improves the textural characteristics of the data, and the resulting segmentations of bone provide a good starting point. As a future enhancement, fuzzy clustering will be used.

Dimensionality reduction for Retrieving Medical Images
Using PCA and GPCA
W Soumya, ME, Applied Electronics, Karunya University, Coimbatore

Abstract— Retrieving images from large and varied collections using image content is a challenging and important problem in medical applications. In this paper, to improve the generalization ability and efficiency of the classification, a feature selection method called principal component analysis is presented to select the most discriminative features from the extracted regional features. A new feature space reduction method, called Generalized Principal Component Analysis (GPCA), is also presented, which works directly with images in their native state, as two-dimensional matrices. In principle, redundant information is removed and relevant information is encoded into feature vectors for efficient medical image retrieval under limited storage. Experiments on databases of medical images show that, for the same amount of storage, GPCA is superior to PCA in terms of memory requirement, quality of the compressed images, and computational cost.

Index Terms—Dimension reduction, eigenvectors, image retrieval, principal component analysis.

INTRODUCTION

ADVANCES in data storage and image acquisition technologies have enabled the creation of large image datasets. Also, the number of digitally produced medical images is rising strongly in various medical departments, such as radiology, cardiology and pathology, and in the clinical decision-making process. With this increase has come the need to be able to store, transmit, and query large volumes of image data efficiently. Within the radiology department, mammographies are one of the most frequent application areas with respect to classification and content-based search [7-9]. Within cardiology, CBIR has been used to discover stenosis images [13]. Pathology images have often been proposed for content-based access [12], as their color and texture properties can be identified relatively easily. In this scenario, it is necessary to develop appropriate information systems to efficiently manage these collections [3]. A common operation on image databases is the retrieval of all images that are similar to a query image, which is referred to as content-based medical image retrieval; the block diagram is shown in Fig. 1. A dimension reduction step is usually applied to the vectors to concentrate relevant information in a small number of dimensions, not only for reasons of computational efficiency but also because it can improve the accuracy of the analysis. The set of techniques that can be employed for dimension reduction can be partitioned in two important ways: they can be separated into techniques that apply to supervised or unsupervised learning, and into techniques that entail either feature selection or feature extraction. Some of the feature space reduction methods include Principal Component Analysis (PCA), Independent Component Analysis (ICA), Linear Discriminant Analysis (LDA), and Canonical Correlation Analysis (CCA). Among these, PCA finds principal components, ICA finds independent components [11], CCA maximizes correlation [5], and LDA maximizes the interclass variance [10]. PCA is the most well-known statistical approach for mapping the original high-dimensional features into low-dimensional ones by eliminating the redundant information from the original feature space [1]. The advantage of the PCA transformation is that it is linear and that any linear correlations present in the data are automatically detected. Then, Generalized Principal Component Analysis (GPCA), a novel feature space reduction technique which is superior to PCA, is also presented [2].

Fig. 1. Block diagram of content-based image retrieval.

FEATURE SELECTION METHODS

Principal Component Analysis (PCA)

Principal Components Analysis (PCA) is an unsupervised feature transformation technique, in contrast to supervised feature selection strategies such as the use of information gain for feature ranking/selection. Principal component analysis reduces the dimensionality of the search to a basis set of prototype images that best describes the images. Each image is described by its projection on the basis set; a match to a query image is determined by comparing its projection vector on the basis set with that of the images in the database. The reduced dimensions are chosen in a way that captures essential features of the data with very little loss of information.

The idea behind the principal component analysis method is briefly outlined herein: An image can be viewed as a vector by concatenating the rows of the image one after another. If the image has square dimensions (as in MR images) of L x L pixels, then the size of the vector is L^2. For typical image dimensions of 124 x 124, the vector length
(dimensionality) is 15,376. Each new image has a different vector, and a collection of images will occupy a certain region in an extremely high-dimensional space. The task of comparing images in this hundred-thousand-dimension space is a formidable one. The medical image vectors are large because they belong to a vector space that is not optimal for image description. However, knowledge of brain anatomy provides us with similarities between these images. It is because of these similarities that we can deduce that image vectors will be located in a small cluster of the entire image space. The optimal system can be computed by the Singular Value Decomposition (SVD). Dimension reduction is achieved by discarding the lesser principal components; i.e., the idea is to find a more appropriate representation for the image features so that the dimensionality of the space used to represent them can be reduced.

A.1 PCA Implementation

The mathematical steps used to determine the principal components of a training set of medical images are outlined in this paragraph [6]: A set of n training images are represented as vectors of length L x L, where L is the number of pixels in the x (y) direction. These pixels may be arranged in the form of a column vector. If the images are of size M x N, there will be a total of MN such n-dimensional vectors comprising all pixels in the n images. The mean vector, Mx, of a vector population can be approximated by the sample average

Mx = (1/K) Σ_{k=1..K} Xk    (1)

with K = MN. Similarly, the n x n covariance matrix, Cx, of the population can be approximated by

Cx = 1/(K-1) Σ_{k=1..K} (Xk - Mx)(Xk - Mx)^T    (2)

where K-1 instead of K is used to obtain an unbiased estimate of Cx from the samples. Because Cx is real and symmetric, finding a set of n orthonormal eigenvectors is always possible.

The principal components transform is given by

Y = A(X - Mx)    (3)

It is not difficult to show that the elements of Y are uncorrelated; thus, the covariance matrix Cy is diagonal. The rows of matrix A are the normalized eigenvectors of Cx. These eigenvectors determine linear combinations of the n training set images that form the basis set of images best describing the variations in the training set. Because Cx is real and symmetric, these vectors form an orthonormal set, and it follows that the elements along the main diagonal of Cy are the eigenvalues of Cx. The main diagonal element in the i-th row of Cy is the variance of vector element Yi. Because the rows of A are orthonormal, its inverse equals its transpose. Thus, we can recover the X's by performing the inverse transformation

X = A^T Y + Mx    (4)

A new query image is projected similarly onto the eigenspace and the coefficients are computed. The class that best describes the query image is determined by a similarity measure defined in terms of the Euclidean distance between the coefficients of the query and each image in each class. The training set image whose coefficients are closest (in the Euclidean sense) to those of the query image is selected as the match image. If the minimum Euclidean distance exceeds a preset threshold, the query image is assigned to a new class.

Generalized Principal Component Analysis (GPCA)

This scheme works directly with images in their native state, as two-dimensional matrices, by projecting the images to a vector space that is the tensor product of two lower-dimensional vector spaces. GPCA is superior to PCA in terms of quality of the compressed images, query precision, and computational cost. The key difference between PCA and the generalized PCA (GPCA) method proposed in this paper is in the representation of image data. While PCA uses a vectorized representation of the 2D image matrix, GPCA works with a representation that is closer to the 2D matrix representation (as illustrated schematically in Figure 2) and attempts to preserve the spatial locality of the pixels. The matrix representation in GPCA leads to SVD computations on matrices with much smaller sizes. More specifically, GPCA involves SVD computations on matrices of sizes r x r and c x c, which are much smaller than the matrix in PCA (where the dimension is n x (r x c)). This dramatically reduces the time and space complexities of GPCA as compared to PCA. This is partly because images are two-dimensional signals with intrinsic spatial locality properties that the representation used by GPCA takes advantage of.

B.1 GPCA Implementation

In GPCA, the algorithm deals with data in its native matrix representation and considers the projection onto a space which is the tensor product of two vector spaces. More specifically, for given integers l1 and l2, GPCA computes the (l1, l2)-dimensional axis system ui x vj, for i = 1,...,l1 and j = 1,...,l2, where x denotes the tensor product, such that the projection of the data points (after subtracting the mean) onto this axis system has the largest variance among all (l1, l2)-dimensional axis systems.

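The PCA mapping of equations (1)-(4) above can be sketched as follows. This is a generic NumPy implementation for illustration, not the authors' code; with all components kept, the inverse transform recovers the original vector exactly because the rows of A are orthonormal:

```python
import numpy as np

def pca_fit(X):
    """X holds one vectorized image per row (K x n).
    Eq (1): mean vector; eq (2): unbiased covariance estimate;
    the rows of A are the eigenvectors of Cx, largest first."""
    Mx = X.mean(axis=0)                      # eq (1)
    Xc = X - Mx
    Cx = Xc.T @ Xc / (X.shape[0] - 1)        # eq (2)
    evals, evecs = np.linalg.eigh(Cx)        # Cx is real and symmetric
    A = evecs[:, np.argsort(evals)[::-1]].T  # rows = eigenvectors
    return Mx, A

def project(x, Mx, A, p):
    """Eq (3), keeping the p largest principal components."""
    return A[:p] @ (x - Mx)

def reconstruct(y, Mx, A):
    """Eq (4): the rows of A are orthonormal, so inverse = transpose."""
    return A[:len(y)].T @ y + Mx

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 4))    # 6 samples, 4 "pixels" each
Mx, A = pca_fit(X)
y = project(X[0], Mx, A, p=4)      # full projection, no truncation
```

Dimension reduction corresponds to choosing p smaller than the vector length, which discards the lesser principal components.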
Fig 2: Schematic view of the key difference between GPCA and PCA. GPCA works on the original matrix representation of images directly, while PCA applies matrix-to-vector alignment first and works on the vectorized representation of images, which may lead to loss of spatial locality information.

Formulation of GPCA: Let Ak, for k = 1,...,n, be the n images in the dataset, and calculate the mean using equation (5) given below:

M = (1/n) Σ_{k=1..n} Ak    (5)

Subtract the mean from each image:

Aj = Aj - M, for j = 1,...,n    (6)

GPCA aims to compute two matrices L and R with orthonormal columns such that the variance var(L, R) is maximum, using equations (7) and (8):

ML = Σ_{j=1..n} Aj R R^T Aj^T    (7)

MR = Σ_{j=1..n} Aj^T L L^T Aj    (8)

The main observation, which leads to an iterative algorithm for GPCA, is stated in the following theorem.

Theorem: Let L, R be the matrices maximizing the variance var(L, R). Then:
- For a given R, matrix L consists of the l1 eigenvectors of the matrix ML corresponding to the largest l1 eigenvalues.
- For a given L, matrix R consists of the l2 eigenvectors of the matrix MR corresponding to the largest l2 eigenvalues.

The theorem provides us with an iterative procedure for computing L and R. More specifically, for a fixed L, we can compute R by computing the eigenvectors of the matrix MR; with the computed R, we can then update L by computing the eigenvectors of the matrix ML. The solution depends on the initial choice, L0, for L. Experiments show that choosing L0 = (Id, 0)^T, where Id is the identity matrix, produces excellent results, and we use this initial L0 in all the experiments. Given L and R, the projection of Aj onto the axis system defined by L and R can be computed by Dj = L^T Aj R.

Algorithm: Let A1,...,An be the n images in a database.

Step 1: Calculate the mean of all n images using equation (5).
Step 2: Subtract the mean from each image using equation (6).
Step 3: Initialize L0 = (Id, 0)^T.
Step 4: Form the matrix MR using equation (8).
Step 5: Compute the l2 eigenvectors (Ri) of MR corresponding to the largest eigenvalues.
Step 6: Form the matrix ML using equation (7).
Step 7: Compute the l1 eigenvectors (Li) of ML corresponding to the largest eigenvalues.
Step 8: Obtain the reduced representation using the equation

Dj = L^T Aj R    (9)

In this experiment, we applied PCA and GPCA to the 40 images of size 124 x 124 in the medical image dataset, which contains brain, chest, breast and elbow images, as shown in figure 3. Both PCA and GPCA can be applied to medical image retrieval. The experimental comparison of PCA and GPCA is based on the assumption that they both use the same amount of storage; hence it is important to understand how to choose the reduced dimension for PCA and GPCA for a specific storage requirement. We use p = 9 (where p corresponds to the principal components) in PCA (as shown in TABLE I) and set d = 4 (where d corresponds to the two largest eigenvalues) for GPCA (as shown in TABLE II).

Fig. 3. Medical image database

The reduced dimensions are chosen in a way that captures essential features of the data with very little loss of information. PCA is popular because of its use of multidimensional representations for the compressed format.

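The iterative GPCA procedure of Steps 1-8 can be sketched as follows. The fixed iteration count and the `top_eigvecs` helper are illustrative choices (a practical implementation would iterate until the variance converges):

```python
import numpy as np

def top_eigvecs(S, k):
    """Eigenvectors of a symmetric matrix for its k largest eigenvalues."""
    w, V = np.linalg.eigh(S)
    return V[:, np.argsort(w)[::-1][:k]]

def gpca(images, l1, l2, iters=5):
    """images: list of r x c arrays. Returns L (r x l1), R (c x l2)
    and the reduced l1 x l2 representations Dj = L^T Aj R."""
    M = sum(images) / len(images)               # Step 1, eq (5)
    A = [a - M for a in images]                 # Step 2, eq (6)
    r, c = A[0].shape
    L = np.eye(r, l1)                           # Step 3: L0 = (Id, 0)^T
    for _ in range(iters):                      # alternate L and R updates
        MR = sum(a.T @ L @ L.T @ a for a in A)  # eq (8)
        R = top_eigvecs(MR, l2)
        ML = sum(a @ R @ R.T @ a.T for a in A)  # eq (7)
        L = top_eigvecs(ML, l1)
    D = [L.T @ a @ R for a in A]                # Step 8, eq (9)
    return L, R, D

rng = np.random.default_rng(0)
imgs = [rng.standard_normal((6, 5)) for _ in range(8)]
L, R, D = gpca(imgs, l1=2, l2=2)
```

The eigendecompositions involve only r x r and c x c matrices, which is the source of GPCA's lower time and space complexity compared with PCA on vectorized images.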
TABLE I
Features obtained for PCA

Image  Eigenvector  Brain                      Chest                       Breast                      Elbow
1      V1           (0.5774, 0.5774, 0.5774)   (0.5774, 0.5774, 0.5774)    (0.5774, 0.5774, 0.5774)    (0.5774, 0.5774, 0.5774)
       V2           (0.4082, 0.4082, -0.8165)  (-0.0775, -0.6652, 0.7427)  (0.7071, -0.7071, 0)        (-0.8128, 0.4735, 0.3393)
       V3           (0.7071, -0.7071, 0)       (-0.8128, 0.4735, 0.3393)   (0.4082, 0.4082, -0.8165)   (-0.0775, -0.6652, 0.7427)
2      V1           (0.5774, 0.5774, 0.5774)   (0.5774, 0.5774, 0.5774)    (0.5774, 0.5774, 0.5774)    (0.5774, 0.5774, 0.5774)
       V2           (0.7071, -0.7071, 0)       (-0.7946, 0.5599, 0.2348)   (-0.7887, 0.5774, 0.2113)   (-0.7573, 0.6430, 0.1144)
       V3           (0.4082, 0.4082, -0.8165)  (-0.1877, -0.5943, 0.7820)  (-0.2113, -0.5774, 0.7887)  (-0.3052, -0.5033, 0.8084)

GPCA computes the optimal feature vectors L and R such that the original matrices are transformed to reduced 2 x 2 matrices, while in PCA the feature vectors are obtained as a 3 x 3 matrix, as listed in Tables I and II.

TABLE II
Features obtained for GPCA

Image       Matrix
Brain  Db1  [-3.3762  1.1651; -0.2207 -0.6612]
       Db2  [ 4.6552  2.6163; -0.4667  0.7519]
       Db3  [ 4.6552 -2.7397;  2.6163 -1.6044]
       Db4  [-1.7318  0.1744;  0.7202 -0.4391]
       Db5  [-1.6252  0.0462; -0.0010  0.1173]

Therefore, GPCA has asymptotically minimum memory requirements and lower time complexity than PCA, which is desirable for large medical image databases. GPCA also uses transformation matrices that are much smaller than those of PCA. This significantly reduces the space needed to store the transformation matrices and reduces the computational time for computing the reduced representation of a query image. Experiments show superior performance of GPCA over PCA, in terms of quality of the compressed images and query precision, when using the same amount of storage.

The feature vectors obtained through the feature selection methods are fed to a Hopfield neural classifier for efficient medical image retrieval. The Hopfield neural network is a type of recurrent neural network in which a physical path exists from the output of a neuron to the input of all neurons except for the corresponding input neuron. If PCA features are fed to the Hopfield network, then 9 neurons are used in the input layer, since the size of the PCA feature vector is 1 x 9; if GPCA features are used as the classifier input, then 4 neurons are used in the input layer, since the GPCA feature vector is of size 1 x 4. The energy is calculated using the equation

E = -0.5 * S^T * W * S

where E is the energy of a particular pattern S, and W is the weight matrix.

The test pattern energy is compared with the stored pattern energies, and the images having energy close to the test pattern energy are retrieved from the database.

CONCLUSION

To overcome problems associated with high dimensionality, such as high storage and retrieval times, a dimension reduction step is usually applied to the vectors to concentrate relevant information in a small number of dimensions. In this paper, two subspace analysis methods, Principal Component Analysis (PCA) and Generalized Principal Component Analysis (GPCA), are presented and compared. PCA is a simple, well-known dimensionality reduction technique that applies matrix-to-vector alignment first and works on the vectorized representation of images, which may lead to loss of spatial locality information, while GPCA works on the original matrix representation of images directly. GPCA is found to be superior to PCA: its dimensionality is reduced to a 2 x 2 matrix, whereas in PCA the eigenvectors are obtained as a 3 x 3 matrix. GPCA works directly with images in their native state, as two-dimensional matrices, by projecting the images to a vector space that is the tensor product of two lower-dimensional vector spaces.

REFERENCES

[1] U. Sinha and H. Kangarloo, "Principal component analysis for content-based image retrieval," RadioGraphics, vol. 22, no. 5, pp. 1271-1289, 2002.
[2] J. Ye, R. Janardan, and Q. Li, "GPCA: An efficient dimension reduction scheme for image compression and retrieval," in KDD '04: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 2004, pp. 354-363.
[3] H. Muller, N. Michoux, D. Bandon, and A. Geissbuhler, "A review of content-based image retrieval systems in medical applications - clinical benefits and future directions," International Journal of Medical Informatics, vol. 73, pp. 1-23, 2004.
[4] I. K. Fodor, "A survey of dimension reduction techniques," Center for Applied Scientific Computing, Lawrence Livermore National Laboratory.
[5] M. Loog, B. van Ginneken, and R. P. W. Duin, "Dimensionality reduction by canonical contextual correlation projections," T. Pajdla and J.
[6] P. Korn, N. Sidiropoulos, C. Faloutsos, E. Siegel, and Z. Protopapas, "Fast and effective retrieval of medical tumor shapes," IEEE Transactions on Knowledge and Data Engineering, vol. 10, no. 6, pp. 889-904, 1998.
[7] S. Baeg and N. Kehtarnavaz, "Classification of breast mass abnormalities using denseness and architectural distortion," Electronic Letters on Computer Vision and Image Analysis, vol. 1, no. 1, pp. 1-20, 2002.
[8] F. Schnorrenberg, C. S. Pattichis, C. N. Schizas, and K. Kyriacou, "Content-based retrieval of breast cancer biopsy slides," Technology and Health Care, vol. 8, pp. 291-297, 2000.
[9] X. Qiu and L. Wu, "Two-dimensional nearest neighbor discriminant analysis," 2007, doi:10.1016/j.neucom.2007.02.001.
[10] B. Bai, P. Kantor, N. Cornea, and D. Silver, "Toward content-based indexing and retrieval of functional brain images," in Proceedings of RIAO 2007, 2007.
[11] D. Comaniciu, P. Meer, D. Foran, and A. Medl, "Bimodal system for interactive indexing and retrieval of pathology images," in Proceedings of the Fourth IEEE Workshop on Applications of Computer Vision (WACV'98), Princeton, NJ, USA, 1998, pp. 76-81.
[12] M. R. Ogiela and R. Tadeusiewicz, "Semantic-oriented syntactic algorithms for content recognition and understanding of images in medical databases," in Proceedings of the Second International Conference on Multimedia and Expo (ICME 2001), Tokyo, Japan, 2001, pp. 621-624.
[13] Image base v6, July 2007.

Efficient Whirlpool Hash Function
D.S.Shylu J.Piriyadharshini
Sr.Lecturer, ECE Dept, II ME(Applied Electronics)
Karunya University, Karunya University
Coimbatore- 641114. Coimbatore- 641114.
mail id: riya_