
Implementation of Discrete Wavelet Transform on FPGA for Iris Recognition

Akhilesh Patil
Dept. of Electrical Engg, Arizona State University,
Tempe, AZ

Adwait Purandare
Dept. of Electrical Engg, Arizona State University,
Tempe, AZ

Akash Sharma
Dept. of Electrical Engg, Arizona State University,
Tempe, AZ

Abstract: Iris recognition systems are gaining significance in today's security systems, from high-end buildings to smartphone locks. Low-power and accurate recognition, along with a compressed iris image size, is possible by implementing the Discrete Wavelet Transform in such scenarios. The wavelet transform provides time as well as frequency information for a non-stationary signal, unlike the Fourier transform, which carries information pertaining only to the frequency components. It further compresses the actual 2D image for faster comparison and detection. This paper implements the Discrete Wavelet Transform with the goal of a real-time iris recognition system, thus emphasizing architecture techniques to improve performance. We see considerable improvement when implemented on a Virtex-7 FPGA.

Keywords: architecture techniques; Low Power; Area; Iris Recognition

I. INTRODUCTION
Iris recognition systems are quite popular today as a foolproof method for security checks. Not only are they installed at entrances of buildings of national security importance, but they have also become a trend at corporate offices. Smartphone security concerns have also become prominent, and many leading smartphone vendors are entering the domain of iris-recognition-based unlock systems. Iris recognition is unique in the sense that no two humans out of 10^72 have the same iris pattern. Even genetically identical twins have different iris patterns, and unlike fingerprints, the iris is well protected and does not vary with human age. Since iris recognition is pattern dependent, color-induced variations such as contact lenses also do not matter. Thus it has proved to be the most reliable biometric system available today. A typical iris recognition system is seen in Fig. 1. Extraction of the image, image processing, and recognition form the three main subsystems. The algorithm of interest, the Discrete Wavelet Transform (DWT), is part of the image processing module.

Fig. 1. Iris recognition system

The DWT algorithm used in feature extraction and image compression forms the bottleneck of the iris recognition system due to its computationally intensive nature, involving floating-point multiplications and additions; hence it is imperative to enhance the performance of the algorithm. The design goal of this project is therefore to improve the operating frequency of the DWT implementation, such that the entire iris recognition system works in real time.

II. IMAGE PRE-PROCESSING
The iris images are accessed from the database [1]. For the pre-processing of the iris images and the extraction of useful features, the method in [2] is followed. The input image does not contain only useful information from the iris zone but also useless data derived from the surrounding eye region. Before extracting the features of an iris, the input image must be pre-processed to localize, segment and enhance the region of interest (i.e., the iris zone). The system normalizes the iris region to overcome the problem of changes in camera-to-eye distance and pupil-size variation caused by illumination. Furthermore, since the brightness is not uniformly distributed due to non-uniform illumination, the system must be capable of removing this effect and further enhancing the iris image. Hence, the image pre-processing consists of iris localization, iris segmentation & normalization, and enhancement units. We did this part in MATLAB to generate the final iris template, as seen in Fig. 2.

First, iris localization and segmentation are performed to identify the iris and pupil regions in the image template. The boundary between pupil and iris is recognized using the Canny edge detector, as it has a low error rate, responds only to edges, and keeps the difference between the detected and the actual edge location small.

Fig. 2a. Iris image

Fig. 2b. Edge detection and segmentation
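As an illustration of this boundary-detection step, the sketch below applies a much-simplified gradient-magnitude edge detector in pure Python. It is only a stand-in for the Canny detector the paper actually uses; the toy image, threshold, and function names are hypothetical.

```python
def edge_map(img, thresh):
    """Simplified gradient-magnitude edge detector (illustrative
    stand-in for Canny): mark pixels where the local gradient is large."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]   # horizontal difference
            gy = img[y + 1][x] - img[y - 1][x]   # vertical difference
            if (gx * gx + gy * gy) ** 0.5 > thresh:
                edges[y][x] = 1
    return edges

# A dark "pupil" disc on a bright background: edges appear at its boundary
img = [[0 if (x - 4) ** 2 + (y - 4) ** 2 < 6 else 200 for x in range(9)]
       for y in range(9)]
edges = edge_map(img, 50)
assert edges[4][4] == 0                  # interior of the pupil is flat
assert any(1 in row for row in edges)    # but its boundary is detected
```

A real Canny detector adds Gaussian smoothing, non-maximum suppression and hysteresis thresholding on top of this gradient step.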

Fig. 2c. Iris pre-processed image

Once the segmentation module has estimated the iris's boundary, the normalization module uses an image registration technique to transform the iris texture from Cartesian to polar coordinates. The process, often called iris unwrapping, yields a rectangular entity that is used for subsequent processing. Normalization has three advantages:
It accounts for variations in pupil size due to changes in external illumination that might influence iris size.
It ensures that the irises of different individuals are mapped onto a common image domain in spite of the variations in pupil size across subjects.
It enables iris registration during the matching stage through a simple translation operation that can account for in-plane eye and head rotations.
Once normalization is done we obtain the template shown in Fig. 2c, which is used for further analysis by the DWT.

III. DISCRETE WAVELET TRANSFORM

The Fourier transform converts time signals into their frequency-domain representation. However, information pertaining to the exact or approximate time sample that generated a given frequency band is lost. Indeed, the Fourier transform applied to a stationary signal and to a non-stationary signal can generate the same frequency bands. Hence the need for the wavelet transform, which divides the input signal into time-related sub-bands and generates frequency-related information for each sub-band, or wavelet. It retains time as well as frequency information. Further, one level of the discrete wavelet transform compresses the input signal by half due to its symmetric regeneration property. Although the idea behind wavelet transforms dates back to the 1970s, practical implementations date back only about a decade. Wavelet transforms have become the norm for video and image processing due to their applicability to non-stationary signals as well as their ability to compress them.

A. Block Diagram
A 1D DWT implements a high-pass and a low-pass filter. For multiresolution analysis and DWT purposes, Haar or Daubechies wavelets can be used. The length of the filter plays an important role in the number of FIR filters used in the next stage: after the rows are filtered, the number of channels used for the FIR filters of the columns can be reduced if the filter length is small, although this is a tradeoff against accuracy. In our case the word length of the filter coefficients was kept at 16 and the FIR filter length was 4, so a 'db2' mother wavelet was used. A 2D wavelet transform, as in Fig. 3, first applies the mother wavelet filtering on all the rows and then on all the columns, leading to four output matrices.

Fig. 3. Level-1 implementation of 2D DWT

After passing the signal through a half-band low-pass filter, half of the samples can be eliminated according to Nyquist's rule, since the signal now has a highest frequency of π/2 radians instead of π radians. Simply discarding every other sample subsamples the signal by two, and the signal then has half the number of points: the scale of the signal is doubled. Note that the low-pass filtering removes the high-frequency information but leaves the scale unchanged; only the subsampling process changes the scale. Resolution, on the other hand, is related to the amount of information in the signal and is therefore affected by the filtering operations. Half-band low-pass filtering removes half of the frequencies, which can be interpreted as losing half of the information, so the resolution is halved after the filtering operation. The subsampling operation after filtering, however, does not affect the resolution, since removing half of the spectral components makes half the samples redundant anyway; they can be discarded without any loss of information. In summary, the low-pass filtering halves the resolution but leaves the scale unchanged, and the signal is then subsampled by 2 since half of the samples are redundant, which doubles the scale.
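A minimal Python sketch of this filter-then-subsample step, assuming the 4-tap db2 low-pass analysis coefficients implied by the filter length above (function names are illustrative, and borders are simply zero-padded):

```python
import math

# db2 (Daubechies-4) low-pass analysis coefficients: 4 taps, matching
# the filter length used in the paper
s = math.sqrt(3.0)
H = [(1 + s) / (4 * math.sqrt(2)), (3 + s) / (4 * math.sqrt(2)),
     (3 - s) / (4 * math.sqrt(2)), (1 - s) / (4 * math.sqrt(2))]

def lowpass(x, h=H):
    """Plain FIR convolution with zero-padded borders."""
    n, k = len(x), len(h)
    return [sum(h[j] * x[i - j] for j in range(k) if 0 <= i - j < n)
            for i in range(n)]

def analyze(x):
    """One low-pass stage: filter, then keep every other sample.
    Dropping half the samples is legal because the filtered signal
    occupies only half the original bandwidth (Nyquist)."""
    return lowpass(x)[::2]

x = [1.0] * 8
y = analyze(x)
assert len(y) == len(x) // 2   # sample count halved, scale doubled
```

For a constant input the fully-overlapped outputs equal the sum of the taps, which for db2 is sqrt(2), reflecting the filter's energy normalization.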

In a similar manner, the filtering and subsampling operations are carried out on the high-pass branch to generate the 1-level coefficients.
B. Software implementation of DWT
The 2D DWT is implemented in MATLAB using the inbuilt function, as well as using a block-wise convolution function on the input template of Fig. 2c. The block-wise implementation involves convolution on the rows followed by downsampling, and then convolution on the columns, again followed by downsampling. This exercise was performed to verify that the block-wise implementation matches the expected functional implementation. The goal is to implement the block-wise version on the Simulink Synphony Model Compiler (SMC) platform, to be synthesized for an FPGA later on.
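The block-wise procedure just described (filter rows, downsample, then filter columns, downsample) can be sketched as follows. The paper's implementation is in MATLAB; this pure-Python version with a toy matrix is only illustrative.

```python
import math

s = math.sqrt(3.0)
# db2 low-pass coefficients (4 taps), matching the filter length in the paper
LO = [(1 + s) / (4 * math.sqrt(2)), (3 + s) / (4 * math.sqrt(2)),
      (3 - s) / (4 * math.sqrt(2)), (1 - s) / (4 * math.sqrt(2))]

def conv_down(x, h):
    """Convolve one row/column with h (zero-padded), then downsample by 2."""
    n = len(x)
    full = [sum(h[j] * x[i - j] for j in range(len(h)) if 0 <= i - j < n)
            for i in range(n)]
    return full[::2]

def dwt2_ll(img):
    """One-level LL band: low-pass + downsample rows, then columns."""
    rows = [conv_down(r, LO) for r in img]        # filter each row
    cols = list(zip(*rows))                       # transpose
    out = [conv_down(list(c), LO) for c in cols]  # filter each column
    return [list(r) for r in zip(*out)]           # transpose back

img = [[1.0] * 8 for _ in range(4)]   # toy 4x8 "template"
ll = dwt2_ll(img)
assert len(ll) == 2 and len(ll[0]) == 4   # 4x8 -> 2x4: 4x fewer pixels
```

The shape check mirrors the paper's observation that one DWT level shrinks a 60*480 template to 30*240, a 4x pixel reduction.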
C. Hardware implementation (SMC)
SMC inputs are time sampled; every input is entered at a time slice. The filtering operation (LPF) is performed on the rows and the result is then down-sampled. The intermediate output of the level-1 2D DWT is verified against the software output. To perform the filtering operation on the columns, some array manipulation and internal data storage have to be maintained, since all the data is not available in one time slice. A commutator block time-multiplexes the inputs, and an M-control block selects the time for de-multiplexing for the second stage of low-pass filtering. This is again followed by a down-sampler block. Thus an image of resolution 60*480 gets down-sampled to 30*240, demonstrating the image compression achieved: a 4x reduction relative to the original image.

Fig 4: One level DWT in SMC Simulink.

The above one-level DWT generates the LL part (low-pass on rows followed by low-pass on columns). This is just one of the four possible outputs of the DWT on the image, HL, LH and HH being the others. LL is called the approximation image and contains most of the actual image information (about 99%). Although the other sub-bands can be effectively ignored for image recognition as well as image reconstruction, they were constructed for architectural observation purposes. The DWT output is converted into a binary image using thresholding, which is the input for pattern recognition, as seen in Fig. 6.

Fig 6: DWT pattern for recognition
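The thresholding that produces the binary pattern can be sketched in a few lines; the threshold value and names here are illustrative, not the paper's actual parameters.

```python
def to_binary(img, thresh):
    """Map each DWT coefficient to 1 if it exceeds the threshold, else 0."""
    return [[1 if p > thresh else 0 for p in row] for row in img]

coeffs = [[0.2, 1.7], [0.9, 0.1]]   # toy LL coefficients
binary = to_binary(coeffs, 0.5)     # -> [[0, 1], [1, 0]]
```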


IV. ARCHITECTURE TECHNIQUES
A. FIR Filter Architecture

As far as the DWT hardware is concerned, its main computationally intensive part is the filtering process. The filtering is carried out by FIR filters, and the FIR architecture used plays an important role in the performance/power/area tradeoffs. Each convolution operation carried out by the FIR filters inherently consists of multiplications and additions of the samples, as shown in Fig. 5. Multipliers and adders are therefore the important blocks that affect the performance of the design.
In our project we implemented both the direct form and the systolic form of the FIR filter. In our case we have 4 filter coefficients, so the number of taps of the FIR filter is 4; one channel of the FIR filter thus uses 4 multipliers and 3 adders, i.e., 7 DSP slices/blocks. The total number of DSP slices utilized for a one-level DWT on an input image of dimensions 60*480 is shown in Table 1.
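A behavioral Python model of the 4-tap direct-form filter makes the 4-multiply/3-add structure explicit. The coefficients below are placeholders, not the db2 values, and the function name is illustrative.

```python
def fir_direct(x, h):
    """Direct-form FIR: each output needs len(h) multiplies and
    len(h)-1 adds, i.e. 4 multipliers and 3 adders for a 4-tap filter."""
    taps = [0.0] * len(h)              # delay line (tapped shift register)
    out = []
    for sample in x:
        taps = [sample] + taps[:-1]    # shift in the new sample
        out.append(sum(c * t for c, t in zip(h, taps)))
    return out

h = [0.25, 0.25, 0.25, 0.25]           # placeholder 4-tap coefficients
y = fir_direct([1, 1, 1, 1, 1], h)
assert y[0] == 0.25 and y[3] == 1.0    # delay line fills over 4 samples
```

In hardware, the `sum(...)` corresponds to a chain of three adders fed by four multipliers, which is exactly the critical path the systolic form breaks up.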

Fig 7a: Direct Form Implementation

Fig 7b: Systolic Form Implementation

Fig 5: Direct form FIR filter using adder and gain blocks

The systolic form can run at a higher frequency than the direct form because of the pipelining present in the systolic FIR filter. The critical path is reduced to Tmult + Tadder, from the earlier Tmult + 3Tadder. The number of flip-flops increases because of the additional pipelining registers.
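The critical-path reduction can be illustrated with a behavioral model of a transposed-form FIR, where registers sit between the adders so each clock performs only one multiply and one add per stage. This is a sketch of the pipelining idea, not the exact SMC systolic structure (which may also register the input path); it produces the same outputs as the direct form.

```python
def fir_transposed(x, h):
    """Transposed-form FIR: pipeline registers break the adder chain,
    so the per-clock critical path is Tmult + Tadd instead of
    Tmult + 3*Tadd for a 4-tap direct form."""
    regs = [0.0] * (len(h) - 1)            # pipeline registers
    out = []
    for sample in x:
        products = [c * sample for c in h]  # one multiplier per tap
        out.append(products[0] + regs[0])   # single add on the output path
        # each register loads its tap's product plus the next register
        regs = [products[i + 1] + (regs[i + 1] if i + 1 < len(regs) else 0.0)
                for i in range(len(regs))]
    return out

h = [0.25, 0.25, 0.25, 0.25]               # placeholder coefficients
assert fir_transposed([1, 0, 0, 0, 0], h) == [0.25, 0.25, 0.25, 0.25, 0.0]
```

The impulse response matches the coefficients, confirming the register shuffle still computes the same convolution; the extra registers are the flip-flop overhead noted above.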

Table 1: Filter implementation results

The direct form of the FIR filter was also implemented by manual construction of multipliers and adders, to compare against the SMC in-built block implementation. The observation is that although both use the same number of DSP slices, the numbers of flip-flops and LUTs are significantly lower in the in-built implementation. In conclusion, the systolic architecture from the SMC FIR block is chosen henceforth.
B. Choice of FPGA
DSP slices are crucial for the implementation of the DWT algorithm, since they offer improved flexibility and utilization, improved application efficiency, reduced overall power consumption, increased maximum frequency, and reduced set-up plus clock-to-out time. They support many independent functions, including multiply, multiply-accumulate (MACC), multiply-add, three-input add, barrel shift, wide-bus multiplexing, magnitude comparison, bit-wise logic functions, pattern detection, and wide counters, and they support cascading multiple DSP48 slices to form wide math functions, DSP filters, and complex arithmetic without the use of general FPGA fabric.

C. 1 Level DWT Complete v/s LL

A 1-level DWT with all of its LL, LH, HL and HH outputs would require 7588 DSP slices in theory, while the implementation of only LL requires 2107 DSP slices, as seen in Table 3. As mentioned before, the implementation of the FIR filters using DSP slices is of prime importance; hence we discard the implementation of the DWT with all four images as output.
D. 1 Level DWT Folded architecture
The implementation of the DWT is a feed-forward architecture, and all the filter coefficients are the same; thus, folding and retiming are possible. Fig. 7 shows the 1-level DWT in a regular implementation and after folding. Folding reduces DSP slice utilization but also degrades performance: the same DSP slices are re-utilized, which yields the reduction, but because the same blocks are shared, the net operating frequency drops. LUT utilization increases due to the multiplexing required for the shared blocks.

Table 3: One Level DWT implementation Regular and Folded

E. 2 Level DWT
The 2-level DWT works on the LL output of the first level; the LL is further filtered row-wise as well as column-wise, as shown in Fig. 8. A level-2 DWT helps performance at the recognition step, since the image is further compressed, giving smaller data to the Hamming-distance based recognition algorithm. However, as the number of levels increases, the amount of data lost also increases, which may lead to an increased false acceptance rate, showing a tradeoff between recognition speed and recognition rate.
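For context, the matching step mentioned above can be sketched as a normalized Hamming distance between two binary templates. This is a simplified stand-in for the recognition algorithm; the function name and sample codes are hypothetical.

```python
def hamming_distance(code_a, code_b):
    """Fraction of disagreeing bits between two equal-size binary codes."""
    bits_a = [b for row in code_a for b in row]   # flatten 2D code
    bits_b = [b for row in code_b for b in row]
    diff = sum(a != b for a, b in zip(bits_a, bits_b))
    return diff / len(bits_a)

a = [[1, 0, 1, 1], [0, 0, 1, 0]]
b = [[1, 0, 0, 1], [0, 1, 1, 0]]
assert hamming_distance(a, a) == 0.0
assert hamming_distance(a, b) == 0.25   # 2 of 8 bits differ
```

A smaller DWT output means fewer bits per comparison, which is why deeper decomposition levels speed up matching at the cost of discarded detail.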

Table 2: FPGA platform comparison

We implemented the design on the Virtex-7 platform, as it provides the required high number of DSP slices.

Fig. 8. Step-wise three-level wavelet image decomposition

F. 2 Level DWT Implementation

The 2-level DWT is implemented in its regular structure as well as in a folded architecture, using the systolic FIR filter with only the LL images. The results are seen in Table 4.

Table 4: Two Level DWT implementation Regular and Folded

V. CONCLUSION

Iris recognition is one of the most efficient biometric methods used for the recognition and authentication of a person. The amount of compression achieved (in terms of number of pixels) and the feature extraction depend on the level of DWT used. The size of the image generated as the DWT output plays a vital role in the subsequent comparison against the iris database: the smaller the image, the less time required for comparisons. In this project we observed how different styles of architectural optimizations at the filter level and the system level affect the performance and area requirements of the design. Pipelining the direct form of the FIR filter to convert it into the systolic form boosts the frequency of operation, which is essential for real-time applications such as iris recognition. Area reduction is possible using folding techniques, but this degrades performance and is not suitable for real-time applications. Thus the systolic form of the FIR filter is one of the most efficient filter architectures, usable not only in the DWT algorithm but also in other DSP algorithms that are to be optimized for performance.

REFERENCES

[1] Institute of Automation, Chinese Academy of Sciences, CASIA Iris Image Database.
[2] Libor Masek and Peter Kovesi, "MATLAB Source Code for a Biometric Identification System Based on Iris Patterns," The School of Computer Science and Software Engineering, The University of Western Australia.
[3] Wen-Shiung Chen, Kun-Huei Chih, Sheng-Wen Shih and Chih-Ming Hsieh, "Personal Identification with Human Iris Recognition based on Wavelet Transform," IAPR Conference on Machine Vision Applications, May 16-18, 2005, Tsukuba Science City, Japan.
[4] Qijun Huang, Yajuan Wang and Sheng Chang, "High-Performance FPGA Implementation of Discrete Wavelet Transform for Image Processing," Photonics and Optoelectronics (SOPO), 2011 Symposium.
[5] M. Sifuzzaman, M. R. Islam and M. Z. Ali, "Applications of Wavelet Transform and its Advantages Compared to Fourier Transform," Journal of Physical Sciences, Vol. 13, 2009, pp. 121-134.
[6] Philippe Guermeur, "A Real Time Discrete Wavelet Transform Implementation on an FPGA Architecture," ISAS '98.
[7] Jusso, Discrete Wavelet Transforms: Theory and Applications.
[8] Ahmad M. Sarhan, "Iris Recognition Using Discrete Cosine Transform and Artificial Neural Networks," Journal of Computer Science, Vol. 5, No. 5, pp. 369-373, 2009.