
Offline Signature Verification with Convolutional Neural Networks

Abstract

Signature verification is an important biometric technique that aims to detect whether


a given signature is forged or genuine. It is essential in preventing falsification of documents
in numerous financial, legal, and other commercial settings. Our project aims to automate the
process of signature verification by using convolutional neural networks (CNNs). Our model is based on a CNN architecture trained with transfer learning. When classifying whether a given signature is forged or genuine, we achieve accuracies of 97% for Dutch signatures and 95% for Chinese signatures. We also performed several experiments altering the types of training data and the prediction task to make them more relevant to real-life applications; for these, our methods seem promising, but we were not able to achieve results much higher than a naive baseline.

CHAPTER 1
INTRODUCTION

Biometric authentication is the process of verifying the identity of individuals based


on their unique biological characteristics. It has become a ubiquitous standard for access
to high security systems. Current methods in machine learning and statistics have allowed
for the reliable automation of many of these tasks (face verification, fingerprinting, iris
recognition). Among the numerous tasks used for biometric authentication is signature
verification, which aims to detect whether a given signature is genuine or forged.
Signature verification is essential in preventing falsification of documents in numerous
financial, legal, and other commercial settings. The task presents several unique
difficulties: high intra-class variability (an individual’s signature may vary greatly day-to-
day), large temporal variation (signature may change completely over time), and high
inter-class similarity (forgeries, by nature, attempt to be as indistinguishable from genuine
signatures as possible). There exist two types of signature verification: online and offline.

Signature verification is an important research area in the field of person


authentication [1]. A signature is treated as an image carrying a certain pattern of pixels
that pertains to a specific individual. The signature verification problem is concerned with examining and determining whether a particular signature truly belongs to a person or not. Signatures are a special case of handwriting in which special characters and flourishes are viable. Signature verification is a difficult pattern recognition problem because no two genuine signatures of a person are precisely the same. The difficulty also stems from the fact that skilled forgeries follow the genuine pattern, unlike fingerprints, which vary widely between two different persons. Ideally, interpersonal variation should be
much more than intrapersonal variation. Therefore, it is important to identify those
features that minimize the intrapersonal variation and maximize interpersonal variation.
The key factor is to differentiate between the parts of the signature that are habitual and
those that vary with almost every signature. There are two approaches to signature verification, depending on how the data is acquired: offline and online signature verification systems. In an offline verification system, only static features are considered, whereas in online systems, dynamic features are also taken into consideration. Offline signature recognition [2] is performed after the writing is complete. The data is captured
at a later time by using an optical scanner to convert the image into a bit pattern. Online
signature recognition, in contrast, means that the machine recognizes the handwriting as
the user writes. It requires a transducer that captures the signature as it is written and
hence the features are dynamic in nature. Offline data is a 2-D image of the signature.
Offline processing is complex due to the absence of stable dynamic characteristics. Difficulty also lies in the fact that it is hard to segment signature strokes due to highly stylised and unconventional writing styles. The nature and the variety of the writing pen
may also affect the nature of the signature obtained. The non-repetitive nature of variation
of the signatures, because of age, illness, geographic location and perhaps to some extent
the emotional state of the person, accentuates the problem. All these coupled together
cause large intra-personal variation. A robust system has to be designed which should not
only be able to consider these factors but also detect various types of forgeries. The
system should neither be too sensitive nor too coarse. It should have an acceptable trade-
off between a low False Acceptance Rate (FAR) and a low False Rejection Rate (FRR).
The designed system should also find an optimal storage and comparison solution for the
extracted feature.
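As a concrete illustration of this trade-off, the sketch below computes FAR and FRR from a vector of ground-truth labels and a vector of accept/reject decisions. The variable names, the 0/1 label convention, and the example values are illustrative assumptions, not taken from any particular system described in this thesis.

% Hypothetical example: computing FAR and FRR from verification decisions.
% labels:    1 = genuine signature, 0 = forgery (ground truth)
% decisions: 1 = accepted as genuine, 0 = rejected (system output)
labels    = [1 1 1 1 0 0 0 0 1 0];
decisions = [1 1 0 1 1 0 0 0 1 0];

falseAccepts = sum(decisions == 1 & labels == 0);   % forgeries accepted
falseRejects = sum(decisions == 0 & labels == 1);   % genuine signatures rejected

FAR = falseAccepts / sum(labels == 0);   % False Acceptance Rate
FRR = falseRejects / sum(labels == 1);   % False Rejection Rate
fprintf('FAR = %.2f, FRR = %.2f\n', FAR, FRR);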

Figure 1: Superimposed examples of multiple genuine signatures from the same ID, indicating high intra-class variability (from [10]).

Online verification requires an electronic signing system which provides data such as the pen's position, azimuth/altitude angle, and pressure at each time-step of the signing. By contrast, offline verification uses solely 2D
visual (pixel) data acquired from scanning signed documents. While online systems
provide more information to verify identity, they are less versatile and can only be used in
certain contexts (e.g. transaction authorization) because they require specific input
systems. We aim to build an offline signature verification system using a Convolutional
Neural Network (CNN). Our paper focuses on building systems trained on data with
varying degrees of information, as well as experimenting with different objective
functions to obtain optimal error rates.
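For orientation only, the following is a minimal sketch, in MATLAB's Deep Learning Toolbox, of how a small CNN for a binary genuine/forged decision could be defined; it is not the architecture used in this work, and the input size, layer sizes, and training options are illustrative assumptions.

% Illustrative sketch of a small CNN for binary signature classification.
% The 150x220 grayscale input size and all hyper-parameters are assumptions.
layers = [
    imageInputLayer([150 220 1])
    convolution2dLayer(3, 16, 'Padding', 'same')
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 32, 'Padding', 'same')
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    fullyConnectedLayer(2)          % two classes: genuine vs. forged
    softmaxLayer
    classificationLayer];

options = trainingOptions('sgdm', 'MaxEpochs', 10, 'InitialLearnRate', 1e-3);
% net = trainNetwork(trainingImages, trainingLabels, layers, options);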

CHAPTER 2

RELATED WORK

2.1. Tasks Within Signature Verification

Unfortunately, the task of offline signature verification does not have a uniform standard
for performance evaluation. In 2004, the University of Hong Kong hosted the first
international signature verification competition [2], but the competition was solely for
online signatures. Nonetheless, this paved the way for future competitions for both online
and offline verification ([3], [4]). Beyond the online/offline distinction, there are two
further distinctions that define the task of signature verification, which we will refer to as
"forgery exposure" and "writer-dependence". The forgery exposure of the task is
determined by what kinds of signatures the model is trained on. Some authors train on
forgeries and genuine signatures, but test on different forgeries and genuine signatures
during testing; some train on forgeries for identities that exist solely in the training set
(then test on forgeries and genuine signatures for novel IDs); some authors do not use
forgeries at all during training. It is worth noting that only the latter two tasks are
reasonable for practical applications. That is, while a system could have access to a
development set that includes forgeries, it is unlikely that forgeries can be acquired for all
users of that system. We have trained systems for the first two of these varieties. In the
experiments section, we detail exactly how we use the training and test data to
accomplish each task. The writer-dependence distinction refers to whether the model has
a different classifier for each identity, or a single classifier for all identities. Writer-
dependent models, which are more common in the literature, use a different classifier for
each user. At test time, the model is given the ID of the signature for which it is testing
authenticity. Writer-independent models use a single classifier for all identities. We have
developed both writer-dependent and writer-independent models in our report.

2.2. Feature Selection

Most existing models in the literature use explicit feature extraction, including geometric [9], graphometric [10], directional [11], wavelet [12], shadow [13], and texture [14] features. Only in recent years has feature learning been explored [15]. We feed raw pixels into our network, letting the CNN learn the relevant features for signature verification.

2.3. Competing Models

The signature verification literature includes several examples of Hidden Markov Models [18], Neural Networks [17], Support Vector Machines [19], and other machine learning models. In 2012, Khalajzadeh et al. [16] used CNNs for Persian signature verification, which is the only report of CNNs being used in the offline signature verification literature. Unfortunately, the paper includes very little information about their methodology (i.e. forgery exposure, writer dependence). Based on their explanation, we assume their model is trained on forgeries and genuine signatures for all IDs, and thus can be compared with our work in our main experiment. Their results only report an average of 99.86% for validation performance, and mean squared error, making it difficult to fully compare our model to theirs.

MOTIVATION:

THESIS ORGANIZATION:

CHAPTER 3

DIGITAL IMAGE PROCESSING

3.1 BACKGROUND:
Digital image processing is an area characterized by the need for extensive
experimental work to establish the viability of proposed solutions to a given problem. An
important characteristic underlying the design of image processing systems is the significant
level of testing & experimentation that normally is required before arriving at an acceptable
solution. This characteristic implies that the ability to formulate approaches &quickly
prototype candidate solutions generally plays a major role in reducing the cost & time
required to arrive at a viable system implementation.

3.2 What is DIP?

An image may be defined as a two-dimensional function f(x, y), where x & y are
spatial coordinates, & the amplitude of f at any pair of coordinates (x, y) is called the
intensity or gray level of the image at that point. When x, y & the amplitude values of f are
all finite discrete quantities, we call the image a digital image. The field of DIP refers to processing digital images by means of a digital computer. A digital image is composed of a finite number of elements, each of which has a particular location & value. These elements are called pixels.

Vision is the most advanced of our senses, so it is not surprising that images play the single most important role in human perception. However, unlike humans, who are limited to the visual band of the EM spectrum, imaging machines cover almost the entire EM spectrum, ranging from gamma to radio waves. They can also operate on images generated by sources that humans are not accustomed to associating with images.

There is no general agreement among authors regarding where image processing stops & other related areas such as image analysis & computer vision start. Sometimes a distinction is made by defining image processing as a discipline in which both the input & output of a process are images. This is a limiting & somewhat artificial boundary. The area of image analysis (image understanding) is in between image processing & computer vision.

There are no clear-cut boundaries in the continuum from image processing at one end to
complete vision at the other. However, one useful paradigm is to consider three types of
computerized processes in this continuum: low-, mid-, & high-level processes. A low-level process involves primitive operations such as image preprocessing to reduce noise, contrast enhancement & image sharpening. A low-level process is characterized by the fact that both its inputs & outputs are images. A mid-level process on images involves tasks such as segmentation, description of objects to reduce them to a form suitable for computer processing, & classification of individual objects. A mid-level process is characterized by the fact that its inputs generally are images but its outputs are attributes extracted from those images. Finally, higher-level processing involves "making sense" of an ensemble of recognized objects, as in image analysis, & at the far end of the continuum, performing the cognitive functions normally associated with human vision.

Digital image processing, as already defined, is used successfully in a broad range of


areas of exceptional social & economic value.

INTRODUCTION TO IMAGE PROCESSING

5.1 Introduction to Digital Image Processing

Image processing is any form of signal processing for which the input is an image,
such as a photograph or video frame; the output of image processing may be either an image
or a set of characteristics or parameters related to the image. Most image-processing
techniques involve treating the image as a two-dimensional signal and applying standard
signal-processing techniques to it. Image processing usually refers to digital image
processing, but optical and analog image processing are also possible.

Gray scale image:

A grayscale image is a function I(x, y) of the two spatial coordinates of the image plane.

I(x, y) is the intensity of the image at the point (x, y) on the image plane.

I(x, y) takes non-negative values; assuming the image is bounded by a rectangle [0, a] × [0, b], we have I: [0, a] × [0, b] → [0, ∞).

Color image:

It can be represented by three functions, R(x, y) for red, G(x, y) for green and B(x, y) for blue.

An image may be continuous with respect to the x and y coordinates and also in amplitude. Converting such an image to digital form requires that the coordinates as well as the amplitude be digitized. Digitizing the coordinate values is called sampling. Digitizing the amplitude values is called quantization.
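As a small illustration of quantization, the sketch below re-quantizes a grayscale image to a coarser number of gray levels in MATLAB; the sample image name and the number of levels are assumptions for illustration.

% Illustrative sketch: re-quantizing a grayscale image to fewer gray levels.
% 'cameraman.tif' is a sample image shipped with the Image Processing Toolbox.
f = im2double(imread('cameraman.tif'));   % amplitude values scaled to [0, 1]
levels = 8;                               % target number of gray levels
q = floor(f * levels) / (levels - 1);     % quantize the amplitude values
q = min(q, 1);                            % keep values within [0, 1]
imshow(q);                                % display the coarsely quantized image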

Coordinate convention:

The result of sampling and quantization is a matrix of real numbers. We use two
principal ways to represent digital images. Assume that an image f(x, y) is sampled so that
the resulting image has M rows and N columns. We say that the image is of size M X N. The
values of the coordinates (x, y) are discrete quantities. For notational clarity and convenience, we use integer values for these discrete coordinates. In many image processing books, the image origin is defined to be at (x, y) = (0, 0). The next coordinate values along the first row of the image are (x, y) = (0, 1). It is important to keep in mind that the notation (0, 1) is used to signify the second sample along the first row. It does not mean that these are the actual values of physical coordinates when the image was sampled. The following figure
shows the coordinate convention. Note that x ranges from 0 to M-1 and y from 0 to N-1 in
integer increments.

The coordinate convention used in the toolbox to denote arrays is different from the preceding paragraph in two minor ways. First, instead of using (x, y), the toolbox uses the notation (r, c) to indicate rows and columns. Note, however, that the order of coordinates is the same as the order discussed in the previous paragraph, in the sense that the first element of a coordinate tuple, (a, b), refers to a row and the second to a column. The other difference is that the origin of the coordinate system is at (r, c) = (1, 1); thus, r ranges from 1 to M and c from 1 to N in integer increments. The IPT documentation refers to these as pixel coordinates. Less frequently, the toolbox also employs another coordinate convention called spatial coordinates, which uses x to refer to columns and y to refer to rows. This is the opposite of our use of the variables x and y.

Image as Matrices:

The preceding discussion leads to the following representation for a digitized image
function:

f(x, y) =  [ f(0,0)     f(0,1)     ...   f(0,N-1)
             f(1,0)     f(1,1)     ...   f(1,N-1)
               .           .                 .
             f(M-1,0)   f(M-1,1)   ...   f(M-1,N-1) ]

The right side of this equation is a digital image by definition. Each element of this
array is called an image element, picture element, pixel or pel. The terms image and pixel are
used throughout the rest of our discussions to denote a digital image and its elements.

A digital image can be represented naturally as a MATLAB matrix:

f = [ f(1,1)   f(1,2)   ...   f(1,N)
      f(2,1)   f(2,2)   ...   f(2,N)
        .        .              .
      f(M,1)   f(M,2)   ...   f(M,N) ]

where f(1,1) = f(0,0) (note the use of a monospace font to denote


MATLAB quantities). Clearly the two representations are identical, except for the shift in
origin. The notation f(p, q) denotes the element located in row p and column q. For example, f(6,2) is the element in the sixth row and second column of the matrix f. Typically we use the letters M and N respectively to denote the number of rows and columns in a matrix. A 1xN matrix is called a row vector, whereas an Mx1 matrix is called a column vector. A 1x1 matrix is a scalar.

Matrices in MATLAB are stored in variables with names such as A, a, RGB, real_array and so on. Variables must begin with a letter and contain only letters, numerals and underscores. As noted in the previous paragraph, all MATLAB quantities are written using monospace characters. We use conventional Roman, italic notation, such as f(x, y), for mathematical expressions.
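The following short sketch illustrates this indexing convention on an arbitrary small matrix; the matrix values are made up purely for illustration.

% Illustrative sketch of MATLAB matrix indexing for an image array.
f = magic(6);        % a 6x6 example matrix standing in for a small image
pixel = f(6, 2);     % element in the sixth row, second column
row3  = f(3, :);     % the entire third row (a 1xN row vector)
col5  = f(:, 5);     % the entire fifth column (an Mx1 column vector)
[M, N] = size(f);    % number of rows and columns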

Reading Images:

Images are read into the MATLAB environment using function imread whose syntax is

imread(‘filename’)
Format name    Description                          Recognized extensions

TIFF           Tagged Image File Format             .tif, .tiff

JPEG           Joint Photographic Experts Group     .jpg, .jpeg

GIF            Graphics Interchange Format          .gif

BMP            Windows Bitmap                       .bmp

PNG            Portable Network Graphics            .png

XWD            X Window Dump                        .xwd

Here filename is a string containing the complete name of the image file (including any applicable extension). For example, the command line

>> f = imread('chestxray.jpg');

reads the JPEG image (see the table above) chestxray into image array f. Note the use of single quotes (') to delimit the string filename. The semicolon at the end of a command line is used by MATLAB for suppressing output. If a semicolon is not included, MATLAB displays the results of the operation(s) specified in that line. The prompt symbol (>>) designates the beginning of a command line, as it appears in the MATLAB command window.

When, as in the preceding command line, no path is included in filename, imread reads the file from the current directory and, if that fails, tries to find the file on the MATLAB search path. The simplest way to read an image from a specified directory is to include a full or relative path to that directory in filename.

For example,

>> f = imread('D:\myimages\chestxray.jpg');

reads the image from a folder called myimages on the D: drive, whereas

>> f = imread('.\myimages\chestxray.jpg');

reads the image from the myimages subdirectory of the current working directory. The Current Directory window on the MATLAB desktop toolbar displays MATLAB's current working directory and provides a simple, manual way to change it. The table above lists some of the most popular image/graphics formats supported by imread and imwrite.

Data Classes:

Although we work with integer coordinates, the values of the pixels themselves are not restricted to be integers in MATLAB. The table above lists the various data classes supported by MATLAB and IPT for representing pixel values. The first eight entries in the table are referred to as numeric data classes. The ninth entry is the char class and, as shown, the last entry is referred to as the logical data class.

All numeric computations in MATLAB are done in double quantities, so this is also a frequently encountered data class in image processing applications. Class uint8 is also encountered frequently, especially when reading data from storage devices, as 8-bit images are the most common representations found in practice. These two data classes, class logical, and, to a lesser degree, class uint16 constitute the primary data classes on which we focus. Many IPT functions, however, support all the data classes listed in the table. Data class double requires 8 bytes to represent a number, uint8 and int8 require one byte each, uint16 and int16 require 2 bytes each, and uint32 and int32 require 4 bytes each.
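A brief sketch of converting pixel data between these classes, using standard IPT conversion functions (the sample file name is an assumption):

% Illustrative sketch: converting pixel data between MATLAB classes.
f8  = imread('cameraman.tif');   % uint8 image, values in [0, 255]
fd  = im2double(f8);             % double image, values scaled to [0, 1]
f16 = im2uint16(f8);             % uint16 image, values in [0, 65535]
whos f8 fd f16                   % report the class and memory use of each array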

Image Types:

The toolbox supports four types of images:

1 .Intensity images

2. Binary images

3. Indexed images

4. R G B images

Most monochrome image processing operations are carried out using binary or intensity images, so our initial focus is on these two image types. Indexed and RGB colour images are discussed below.

Intensity Images:

An intensity image is a data matrix whose values have been scaled to represent intensities. When the elements of an intensity image are of class uint8 or class uint16, they have integer values in the range [0, 255] and [0, 65535], respectively. If the image is of class double, the values are floating-point numbers. Values of scaled, double intensity images are in the range [0, 1] by convention.

Binary Images:

Binary images have a very specific meaning in MATLAB. A binary image is a logical array of 0s and 1s. Thus, an array of 0s and 1s whose values are of a numeric data class, say uint8, is not considered a binary image in MATLAB. A numeric array is converted to binary using the function logical. Thus, if A is a numeric array consisting of 0s and 1s, we create a logical array B using the statement

B = logical(A)

If A contains elements other than 0s and 1s, use of the logical function converts all nonzero quantities to logical 1s and all entries with value 0 to logical 0s.

Using relational and logical operators also creates logical arrays.

To test whether an array is logical, we use the islogical function:

islogical(c)

If c is a logical array, this function returns a 1; otherwise it returns a 0. Logical arrays can be converted to numeric arrays using the data class conversion functions.

Indexed Images:

An indexed image has two components:

A data matrix of integers, x.

A color map matrix, map.


Matrix map is an m x 3 array of class double containing floating-point values in the range [0, 1]. The length m of the map is equal to the number of colors it defines. Each row of map specifies the red, green and blue components of a single color. An indexed image uses "direct mapping" of pixel intensity values to color map values. The color of each pixel is determined by using the corresponding value of the integer matrix x as a pointer into map. If x is of class double, then all of its components with values less than or equal to 1 point to the first row in map, all components with value 2 point to the second row, and so on. If x is of class uint8 or uint16, then all components with value 0 point to the first row in map, all components with value 1 point to the second, and so on.

RGB Image:

An RGB color image is an M x N x 3 array of color pixels, where each color pixel is a triplet corresponding to the red, green and blue components of an RGB image at a specific spatial location. An RGB image may be viewed as a "stack" of three gray scale images that, when fed into the red, green and blue inputs of a color monitor, produce a color image on the screen. By convention, the three images forming an RGB color image are referred to as the red, green and blue component images. The data class of the component images determines their range of values. If an RGB image is of class double, the range of values is [0, 1].

Similarly, the range of values is [0, 255] or [0, 65535] for RGB images of class uint8 or uint16, respectively. The number of bits used to represent the pixel values of the component images determines the bit depth of an RGB image. For example, if each component image is an 8-bit image, the corresponding RGB image is said to be 24 bits deep.
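As an illustration, the sketch below separates the three component images of an RGB array and computes its bit depth; the sample file name is an assumption ('peppers.png' ships with MATLAB).

% Illustrative sketch: extracting the component images of an RGB image.
rgb = imread('peppers.png');        % M x N x 3 uint8 array (24 bits per pixel)
R = rgb(:, :, 1);                   % red component image
G = rgb(:, :, 2);                   % green component image
B = rgb(:, :, 3);                   % blue component image
bitDepth = 8 * size(rgb, 3);        % 3 components x 8 bits = 24 bits deep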

Figure 5.1 : An image is an array or a matrix of pixels arranged in columns and rows.
In a (8-bit) grayscale image each picture element has an assigned intensity that ranges
from 0 to 255. A grey scale image is what people normally call a black and white image, but
the name emphasizes that such an image will also include many shades of grey.

 Pixel :

A pixel (short for picture element) is a single point in a graphic image. Each such
information element is not really a dot, nor a square but an abstract sample. Each element
of the above matrix is known as a pixel where dark = 0 and light = 1. A pixel with only 1 bit will represent
a black and white image. If the numbers of bits are increased then the number of gray levels
will increase and a better picture quality is achieved. All naturally occurring images are
analog in nature. If the number of pixels is more than the clarity is more. An image is
represented as a matrix in Digital Image Processing. In Digital Signal Processing we use only
row matrices. Naturally occurring images should be sampled and quantized to get a digital
image. A good image should have 1024*1024 pixels which is known as 1k * 1k = 1Mpixel

Figure 5.2: Each pixel has a value from 0 (black) to 255 (white). The possible range of the pixel values depends on the color depth of the image, here 8 bit = 256 tones or grayscales.

A normal grey scale image has 8 bit color depth = 256 grey scales. A "true color" image has 24 bit color depth = 3 x 8 bits = 256 x 256 x 256 colors = ~16 million colors.

Figure 5.3: A true-color image assembled from three grayscale images colored red, green and blue. Such an image may contain up to 16 million different colors.

Some grayscale images have more grayscales, for instance 16 bit = 65536 grayscales. In principle, three grayscale images can be combined to form an image with 281,474,976,710,656 grayscales.

5.1.2 RGB image

The RGB color model relates very closely to the way we perceive color with the r, g
and b receptors in our retinas. RGB uses additive color mixing and is the basic color model
used in television or any other medium that projects color with light. It is the basic color
model used in computers and for web graphics, but it cannot be used for print production.

The secondary colours of RGB – cyan, magenta, and yellow – are formed by mixing
two of the primary colours (red, green or blue) and excluding the third color. Red and green
combine to make yellow, green and blue to make cyan, and blue and red form magenta. The
combination of red, green, and blue in full intensity makes white. The RGB color model is
an additive color model in which red, green, and blue light are added together in various ways
to reproduce a broad array of colors. The name of the model comes from the initials of the
three additive primary colors, red, green, and blue.

The main purpose of the RGB color model is for the sensing, representation, and
display of images in electronic systems, such as televisions and computers, though it has also
been used in conventional photography. Before the electronic age, the RGB color model
already had a solid theory behind it, based in human perception of colours.

RGB is a device-dependent color model: different devices detect or reproduce a given


RGB value differently, since the color elements (such as phosphors or dyes) and their
response to the individual R, G, and B levels vary from manufacturer to manufacturer, or
even in the same device over time. Thus an RGB value does not define the same color across
devices without some kind of color management. Typical RGB input devices are color TV
and video cameras, image scanners, and digital cameras. Typical RGB output devices are TV
sets of various technologies (CRT, LCD, plasma, etc.), computer and mobile
phone displays, video projectors, multicolor LED displays, and large screens such as
JumboTron, etc. Color printers, on the other hand, are not RGB devices, but subtractive
color devices (typically CMYK color model).
Figure 5.4: The additive model of RGB. Red, green, and blue are the primary stimuli for
human color perception and are the primary additive colors

5.1.3 Astronomical images

Images of astronomical objects are usually taken with electronic detectors such as a
CCD (Charge Coupled Device). Similar detectors are found in normal digital cameras.
Telescope images are nearly always gray scale, but nevertheless contain some color
information. An astronomical image may be taken through a color filter. Different detectors
and telescopes also usually have different sensitivities to different colors (wavelengths).

5.1.4 Representative color images

If one or more of the images in a data set is taken through a filter that allows radiation
that lies outside the human vision span to pass – i.e. it records radiation invisible to us - it is
of course not possible to make a natural colour image. But it is still possible to make a colour
image that shows important information about the object. This type of image is called a
representative color image. Normally one would assign colors to these exposures in
chromatic order with blue assigned to the shortest wavelength, and red to the longest. In this
way it is possible to make color images from electromagnetic radiation far from the human
vision area, for example x-rays. Most often it is either infrared or ultraviolet radiation that is
used.

5.1.5 Fundamental steps in Digital Image Processing


Image Acquisition:
Digital image acquisition is the creation of digital images typically from a physical
object. A digital image may be created directly from a physical scene by a camera or similar
device. Alternatively it can be obtained from another image in an analog medium such as
photographs, photographic film, or printed paper by a scanner or similar device. Many
technical images acquired with tomographic equipment, side-looking radar, or radio
telescopes are actually obtained by complex processing of non-image data.
Image Enhancement:
The process of image acquisition frequently leads to image degradation due to
mechanical problems, out-of-focus blur, motion, inappropriate illumination and noise. The
goal of image enhancement is to start from a recorded image and to produce the most visually
pleasing image.
Image Restoration:
The goal of image restoration is to start from a recorded image and to produce the most
visually pleasing image. The goal of enhancement is beauty. The goal of restoration is truth. The
measure of success in restoration is usually an error measure between the original and the
estimated image. No mathematical error function is known that corresponds to human perceptual assessment of error.
Color image Processing:
Color image processing is based on the fact that any color can be obtained by mixing the 3 basic colors red, green and blue. Hence 3 matrices are necessary, each one representing one color.
Wavelet and Multi-Resolution Processing:
Many times a particular spectral component occurring at any instant can be of
particular interest. In these cases it may be very beneficial to know the time intervals these
particular spectral components occur. For example, in EEGs the latency of an event-related
potential is of particular interest. Wavelet transform is capable of providing the time and
frequency information simultaneously, hence giving a time-frequency representation of the
signal. Although the time and frequency resolution problems are results of a physical
phenomenon (the Heisenberg uncertainty principle) and exist regardless of the transform used, it is possible to analyze any signal by using an alternative approach called multi-resolution analysis (MRA). MRA analyzes the signal at different frequencies with different resolutions.
MRA is designed to give good time resolution and poor frequency resolution at high
frequencies and good frequency resolution and poor time resolution at low frequencies.
Compression:
Image compression is the application of data compression on digital images. Its
objective is to reduce redundancy of the image data in order to be able to store or transmit
data in an efficient form.
Morphological processing:
Morphological processing is a collection of techniques for image processing based on
mathematical morphology. Since these techniques rely only on the relative ordering of pixel
values not on their numerical values they are especially suited to the processing of binary
images and gray scale images.
Segmentation:
In the analysis of the objects in images it is essential that we can distinguish between
the objects of interest and "the rest". This latter group is also referred to as the background.
The techniques that are used to find the objects of interest are usually referred to as
segmentation techniques.

In image processing it is usually necessary to perform a high degree of noise reduction


in an image before performing higher-level processing steps, such as edge detection. The
median filter is a non-linear digital filtering technique, often used to remove noise from
images or other signals. The idea is to examine a sample of the input and decide if it is
representative of the signal. This is performed using a window consisting of an odd number
of samples. The values in the window are sorted into numerical order; the median value, the
sample in the center of the window, is selected as the output. The oldest sample is discarded,
a new sample acquired, and the calculation repeats.
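A minimal sketch of this procedure using the toolbox's median filter; the sample image, noise density, and window size are assumptions for illustration.

% Illustrative sketch: removing salt and pepper noise with a median filter.
f = imread('cameraman.tif');              % clean grayscale image
g = imnoise(f, 'salt & pepper', 0.05);    % corrupt about 5% of the pixels
r = medfilt2(g, [3 3]);                   % 3x3 sliding-window median filter
figure, imshow(g), title('Noisy image');
figure, imshow(r), title('Median-filtered image');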

Median filtering is a common step in image processing. It is particularly useful to


reduce speckle noise and salt and pepper noise. Its edge-preserving nature makes it useful in
cases where edge blurring is undesirable. Image synthesis is the process of creating new
images from some form of image description. The kinds of images that are typically
synthesized include:

 Test patterns - scenes with simple two-dimensional geometric shapes.

 Image noise - images containing random pixel values, usually generated from specific parameterized distributions.

 Computer Graphics - scenes or images based on geometric shape descriptions. Often


the models are three dimensional, but may also be two dimensional.

Synthetic images are often used to verify the correctness of operators by applying them to
known images. They are also often used for teaching purposes, as the operator output on such
images is generally `clean', whereas noise and uncontrollable pixel distributions in real
images make it harder to demonstrate unambiguous results. The images could be binary, grey
level or colour.

5.2 Tools And Techniques Of Image Processing

Numerous sources introduce unwanted signals into digital images in the form of poor illumination and noise [86-94]. This happens especially when images are acquired, quantized, and transmitted.

Noise is unavoidable in image processing. Degradation directly affects the visualization of the image while it is being processed. This is still an unsolved problem with wide scope for study, which leads us to introduce a scheme to address the issue.

At present, images play an imperative part in every aspect of life, especially in the exchange of data that carries particular details. Images and videos are used all over the world to a great extent depending on their usage. The unwanted signal alters the true image at edges and lines, degrading image details and pixel values. This is the main reason for removing noise from images before processing them. The different processing techniques are discussed in the sections below.

5.3 Noise removal of image

The present and future scenario is closely related to digital image processing, which has become an integral part of everyone's life. With applications ranging from general documentation and the visual exchange of information to medical fields and serious surveillance, the demand for images has increased along with the required image quality and accuracy.

Blur in images is acquired while capturing and transmitting pictures of degraded quality, which spoils the picture quality in both sensing and visual awareness. The quality also depends on the environmental conditions when the image was captured. For example, images sent over a wireless medium may be affected by lighting effects, temperature variation, etc.

Noise removal is a well explored topic whose prime aim in any image processing pipeline is to enhance image quality by diminishing the noise in the image. The main theme here is to safeguard the required information of the picture while removing undesired details from the image, which is achieved using the local geometry and the pixel values carrying the imperative data [92-94].

Fig. 5.5: Model of the noise-removal process. The input image f(x, y) is corrupted by an additive unwanted signal n(x, y), giving the noisy image g(x, y); a noise-reduction filter (window) then produces the denoised image.

As shown in figure 5.5, the input image is subjected to the addition of noise n(x, y), producing the noisy image g(x, y). This is given to the noise reduction filter(s), and the denoised image (an estimate of f(x, y)) is obtained at the output.
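A minimal sketch of this additive model in MATLAB; the noise level, file name, and choice of averaging filter are assumptions for illustration.

% Illustrative sketch of the additive noise model g(x, y) = f(x, y) + n(x, y).
f = im2double(imread('cameraman.tif'));   % clean image f(x, y) in [0, 1]
n = 0.05 * randn(size(f));                % zero-mean Gaussian noise n(x, y)
g = f + n;                                % noisy observation g(x, y)
g = max(0, min(1, g));                    % clip back to the valid range
% A noise-reduction filter then estimates f(x, y) from g(x, y),
% for example a simple 3x3 averaging filter:
fhat = imfilter(g, fspecial('average', [3 3]));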

5.4 Image Noise

Noise is introduced when an image is acquired, formed, recorded, or transmitted, and it has to be trimmed down as far as possible using various techniques. Even a minute amount of unwanted signal changes the whole scenario when accuracy is needed. The unwanted signals are mainly of the following types: impulse noise, additive white Gaussian noise (AWGN), speckle noise, Poisson noise, salt and pepper noise, and many more.

A vast number of techniques have been proposed by a number of authors, which paved the way for further development of the noise-removal concept. These techniques can further be categorized by their nature and use.

The various types of noise encountered in the proposed technique are discussed below.

5.4.1 Noise types and sources

The source and type of noise depend partly on the material or dimension being imaged, i.e. on the quantity that is to be measured. An unwanted signal is obtained when the acquired signal differs from the required source; noise can even be caused by the measuring process itself.

Sometimes the unwanted signal also occurs due to mathematical iteration, especially when we try to compress an image. Noise is an undesirable part of all digital images. It is not possible to prevent noise from entering digital images while they are being acquired, so it is essential to remove the unwanted signal using denoising algorithms.

5.4.2 Noise in Camera

We observe noise from a variety of sources when taking images with digital cameras and conventional film cameras. These noise sources are numerous; the most frequently occurring ones are listed below.

Gaussian noise and salt and pepper noise are the two most widespread unwanted signals that we observe. In each case the noise at the various pixels may be correlated or uncorrelated, and may be independent or identically distributed.

5.4.3 Additive White Gaussian Noise (AWGN)

This is the most commonly encountered noise in black and white images and is frequently used in digital image processing. It is mainly used as a benchmark standard, where AWGN is used to assess the performance of denoising algorithms. Gaussian noise is so called because its probability distribution function is the normal (Gaussian) distribution; it is also called statistical noise.

Normally, a random value is added to the clean pixel value; this is clearly observed in the Gaussian phenomenon, and the noise statistics are the same for all pixels. An amount of noise is added to each and every pixel value, and the amplifier noise of every pixel follows the same AWGN model. Gaussian noise mainly arises from the thermal agitation of charge carriers, which is often seen within an electrical conductor. The bell-shaped distribution function of the Gaussian variable (bell curve) is shown in Fig. 5.6.
Fig. 5.6: The distribution function of the Gaussian variable (bell curve)

In statistics the normal distribution is the most prominent. The main reason for this is the central limit theorem: under mild conditions, the sum of a huge number of independent random variables drawn from the same distribution is approximately normally distributed. The theorem is also analytically tractable and can be derived explicitly.

p_f(f) = (1 / (σ √(2π))) · exp( -(1/2) ((f - μ) / σ)² )

where μ is the mean/expectation (the location of the peak), σ² is the variance, and σ is the standard deviation.
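The sketch below simply evaluates and plots this density for assumed values of the mean and standard deviation.

% Illustrative sketch: evaluating the Gaussian PDF for assumed parameters.
mu    = 0.5;                                 % mean (location of the peak)
sigma = 0.1;                                 % standard deviation
f     = linspace(0, 1, 200);                 % pixel-value axis
pf    = (1 ./ (sigma * sqrt(2*pi))) .* exp(-0.5 * ((f - mu) ./ sigma).^2);
plot(f, pf), xlabel('f'), ylabel('p_f(f)'), title('Gaussian (bell) curve');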

5.4.4 Salt and Pepper Noise

Salt and pepper noise has a fat-tailed distribution; other names are spike noise or impulsive noise. The main cause of this noise is the loss of pixels while transmitting images. When we compare the affected pixels with the surrounding pixels, we notice a variation in color or intensity, but it is limited to a small number of pixels. This noise type makes pixels either dark or saturated, so the value of an affected pixel is uncorrelated with the value of the corresponding pixel in the clean picture. It occurs mainly due to dust in the camera, overheated or faulty CCD elements, A-D converter errors, etc. Bit errors in transmission are also a cause. The denoising effect of median filters is quite effective on this kind of noise.

5.4.5 Photon Shot Noise

Photon shot noise is mainly caused by quantum fluctuations and is dominant in the lighter parts of an image captured by an image sensor; it describes the variation in the photon count sensed at a given exposure level. Its root mean square value is directly proportional to the square root of the intensity, and the noise at different pixels is independent. It follows the Poisson distribution, which does not differ far from the Gaussian either: if the mean is large, it is analogous to a Gaussian distribution with equal mean and variance, and removing Poisson noise is then the same as removing Gaussian noise. Shot noise has become a major concern in recent times because image sensors are shrinking in size, which decreases the number of photons captured per pixel.

5.4.6 Thermal noise

Thermal noise is caused mainly by the thermal energy of the chip, which accumulates photoelectrons and results in unwanted signals. It also occurs in the absence of light and is hence called dark-current noise; it varies with the temperature of the sensor. It can be diminished during signal acquisition by cooling the camera sensor.

5.5 Other Types of Noises

5.5.1 Noises that are spatially independent

Here the noise is independent of the spatial coordinates and uncorrelated with the picture itself, as in applications like X-ray and nuclear imaging in medicine. Table 5.1 lists the PDFs, means, and variances of the spatially independent noise categories.

5.5.2 Blurs that are spatially dependent

Here the noise depends on the spatial coordinates and is correlated with the picture itself. The most frequently occurring noise of this kind is periodic noise. Periodic noise arises mainly from electrical and electromagnetic interference while acquiring the image.
Table 5.1: Probability density functions (PDFs) of the noise categories, with mean and variance

Gaussian:
  p(z) = (1 / (√(2π) b)) exp( -(z - a)² / (2b²) ),  -∞ < z < ∞
  mean m = a,  variance σ² = b²

Rayleigh:
  p(z) = (2/b)(z - a) exp( -(z - a)² / b ) for z ≥ a;  p(z) = 0 for z < a
  mean m = a + √(πb/4),  variance σ² = b(4 - π)/4

Erlang (gamma):
  p(z) = (a^b z^(b-1) / (b - 1)!) exp(-az) for z ≥ 0;  p(z) = 0 for z < 0
  mean m = b/a,  variance σ² = b/a²

Exponential:
  p(z) = a exp(-az) for z ≥ 0;  p(z) = 0 for z < 0
  mean m = 1/a,  variance σ² = 1/a²

Uniform:
  p(z) = 1/(b - a) for a ≤ z ≤ b;  p(z) = 0 elsewhere
  mean m = (a + b)/2,  variance σ² = (b - a)²/12

Salt and pepper (impulse):
  p(z) = p_a for z = a;  p(z) = p_b for z = b;  p(z) = 0 otherwise
  mean m = a·p_a + b·p_b,  variance σ² = (a - m)²·p_a + (b - m)²·p_b
5.6 Image Representation

The human visual system (HVS) perceives a given picture as a collection of luminant energy distributed spatially [20]. This type of representation is termed an optical image. Optical images are the general type of image that a camera captures. They are then represented as video, an analog electrical signal, which is further sampled to transform it into a digital image represented as S(r, c).

The digital image S(r, c) is interpreted as a 2D collection of information. This is called the pixel representation. Each pixel magnitude refers to a certain feature of the image at that point in the spatial system, for example the brightness of the image. Finally, the whole image is transformed into a data matrix having rows and columns to fit a 2D array representation. Every row in the matrix corresponds to a vector, and many image formats are based on this mapping [95-104].

5.6.1 Binary Image

The most elementary type of image representation is the "binary image". It typically uses only two levels, referred to as black and white and denoted '1' and '0'. This kind of image representation is considered a 1 bit per pixel image, for the reason that only a single binary digit is needed to signify every pixel. These types of images are often used to depict low-level information about the picture, such as its outline or shape, especially in applications like optical character recognition (OCR), where only the outline of a character is required to recognize the letter it represents. Binary images are generated from gray scale images through a technique called thresholding. Two-level thresholding simply acts as a decision factor: above the threshold a pixel switches to the numerical value '1', and below it the pixel switches to the numerical value '0'.

Fig. 5.7: Illustrations of binary images ((a), (b), (c))
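A minimal sketch of this thresholding step, using Otsu's method as implemented in the toolbox; the sample file name is an assumption.

% Illustrative sketch: producing a binary image from a gray scale image.
f  = imread('coins.png');        % gray scale sample image shipped with MATLAB
t  = graythresh(f);              % global threshold chosen by Otsu's method
bw = imbinarize(f, t);           % logical array: 1 above the threshold, 0 below
imshow(bw);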

5.6.2 Gray scale images

Gray scale (GS) images are referred to as monochrome or single-color pictures. Such images contain brightness information only; they carry no color data. The brightness is represented at different levels: a typical 8-bit image holds a range of 0-255 brightness levels, known as gray levels, where 0 refers to black and 255 refers to white. The 8-bit depiction is natural given that a computer actually handles data in 8-bit units. Fig. 5.8 (a) and (b) below are two examples of such gray scale images (GSI).

Fig. 5.8: Examples of GSI ((a), (b), (c))

5.6.3 Color images

A color image (CI) is modeled as three bands of monochromatic light information, where each band of information corresponds to a different color.

Fig. 5.9: Illustrations of CI ((a), (b), (c))

The actual data stored in the digital image is the brightness in each spectral band. When the image is displayed, the corresponding brightness values are shown on the screen by picture elements that emit light energy corresponding to a particular color. Typical color images are represented as RGB images; using the 8-bit monochrome standard as a model, the corresponding color image has 24 bits per pixel, 8 bits for each of the color bands (red, green, blue).

(a) SR(r, c)  (b) SB(r, c)  (c) SG(r, c)

Fig. 5.10: Representation of a CI as individual color (RGB) images.

The following figure illustrates, with an arrow, how an individual pixel's red, green, and blue measurements form a color vector (R, G, B).

Fig. 5.11: A color pixel vector comprises the red, green, and blue pixel values (R, G, B) at one given row/column pixel coordinate (r, c).

Fig. 5.12: Color pixel representation

For many applications, the RGB color information is transformed into a mathematical space that decouples the brightness information from the actual color data. The HSL transformation lets us describe colors in terms we more readily understand. Lightness is the brightness of the color; hue is what we normally think of as "color" (e.g. green, blue, red, orange); and saturation is a measure of how much white is in the color (e.g. pink is red with extra white, so it is less saturated than a pure red).
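MATLAB's toolbox provides the closely related HSV transformation; a minimal sketch is given below, with the sample file name as an assumption.

% Illustrative sketch: decoupling brightness from color with an HSV transform.
rgb = imread('peppers.png');
hsv = rgb2hsv(rgb);        % H, S, V planes with values in [0, 1]
H = hsv(:, :, 1);          % hue: what we usually call the "color"
S = hsv(:, :, 2);          % saturation: how much white is mixed in
V = hsv(:, :, 3);          % value: brightness of the color
imshow(V);                 % the brightness plane alone looks like a gray scale image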

5.6.4 Multispectral Images

A multispectral image is one that captures image data at specific frequencies across the electromagnetic spectrum. Multispectral images regularly contain data outside the typical human perceptual range. This may include infrared, ultraviolet, X-ray, acoustic, or radar data. Sources of these kinds of images include satellite systems, underwater sonar systems, and medical diagnostic imaging systems.

Fig. 5.13: The electromagnetic spectrum

Fig. 5.14: Examples of multispectral images ((a), (b))

5.6.5 Computer Graphics

Computer graphics is a specialized field within the computer science domain that refers to the reproduction of visual data using a computer.

In computer graphics, image data is divided into two principal classes:

1. Bitmap images (or raster images): can be represented by our image model I(r, c), where we have pixel data and corresponding brightness values stored in a file format.
2. Vector images: refer to methods of representing lines, curves, and shapes by storing only the key points. These key points are sufficient to define the shapes, and the process of turning them into an image is called rendering. After the image has been rendered, it can be thought of as being in bitmap form, where every pixel has particular values associated with it.

5.7 Digital Image Formats

Image formats are standardized means of organizing and storing digital images [101]. Image files are composed of digital data in one of these formats that can be rasterized for use on a computer display or printer. An image file format may store data in uncompressed or compressed form, or as vector graphics. Once rasterized, an image becomes a grid of pixels, each of which has a number of bits to designate its color, equal to the color depth of the device displaying it. Most file formats fall into the class of bitmap images. In general, these types of images contain both header information and the raw pixel data. The header information contains information regarding:

i. The number of rows (height)

ii. The number of columns (width)

iii. The number of bands

iv. The number of bits per pixel

v. The file type

vi. Additionally, with some of the more complex file formats, the header may contain information about the type of compression used and other necessary parameters to recreate the image, I(r, c).

5.7.1 BMP format (Bitmap Image File Format)

The BMP format, also called bitmap image file or device independent bitmap (DIB) file format or simply bitmap, is a raster graphics image file format used to store bitmap digital images independently of the display device (for example, a graphics adapter), especially on Microsoft Windows and OS/2 operating systems. This file format can store 2D digital images of arbitrary width, height, and resolution, in both monochrome and color, in a variety of color depths, and optionally with data compression, alpha channels, and color profiles.

5.7.2 TIFF (Tagged Image File Format)

Tagged Image File Format is one of the most popular and flexible of the present public domain raster file formats, and is used on the World Wide Web (WWW). GIF files are restricted to a maximum of 8 bits/pixel and allow for a type of compression called LZW. The GIF image header is 13 bytes long and contains basic information.

5.7.3 JPEG (Joint Photographic Experts Group)

This is the right format for photographic images that must be small files, for instance for web sites or for email. JPG is frequently used on digital camera memory cards. The JPG file is impressively small, often compressed to perhaps 1/10 of the size of the original data, which is a good thing when modems are involved. However, this great compression efficiency comes at a high cost.

JPG uses lossy compression (lossy meaning "with losses to quality"). Lossy means that image quality is lost when the JPG data is compressed and saved, and this quality can never be recovered. JPEG compression is used extensively on the WWW. It is flexible, so it can also produce large files with excellent image quality.

5.7.4 VIP (Visualization in Image Processing) format

This format was developed for the CVIPtools software. During processing, temporary images are created that use floating-point representations beyond the standard 8 bits/pixel. To represent this type of data, remapping is used, which is the process of taking the original image and applying an equation to translate it to the range (0-255).

5.7.5 Graphics Interchange Format (GIF)

GIF uses an indexed representation of color images (with a palette of a maximum of 256 shades), the LZW (Lempel-Ziv-Welch) compression procedure, and a header of size 13 bytes.

5.7.6 Portable Network Graphics (PNG)


The most popular image format that supports both true color and indexed images is PNG. Additionally, it offers a patent-free alternative to the GIF format.

5.8 Medical Picture Modalities

The most primitive methods of generating medical pictures use illumination to produce photographs, whether of gross anatomic structures or, with the use of a microscope, of histological specimens. Images are still created with light; however, the internal structures of the body cannot be seen with visible light.

5.8.1 X-Rays

X-rays were first discovered in 1895 by Wilhelm Conrad Roentgen, for which he was awarded the Nobel Prize in Physics in 1901. This achievement sparked worldwide enthusiasm among researchers for applications in medicine and beyond; by 1900 there were already numerous medical radiological societies. Accordingly, the basis was laid for a new subdivision of medicine devoted to imaging the structure and function of the body. A standard X-ray system with an image intensifier is considered here: X-rays impinging on the image intensifier are converted into a distribution of electrons, which, after acceleration, produces an amplified light image on a smaller fluorescent screen.

Fig. 5.15: X-ray image model

The image is viewed by a TV camera and a film camera, can be viewed on a computer screen, and can be stored on a CD-ROM or a PACS. X-ray technology is the oldest and most commonly used form of medical imaging. X-rays use ionizing radiation to produce images of a person's internal structure by sending X-ray beams through the body, which are absorbed in different amounts depending on the density of the material. Also included as "X-ray type" devices are mammography, interventional radiology, computed radiography, digital radiography, and computed tomography (CT). Radiation therapy is a related kind of device, which likewise uses X-rays, gamma rays, electron beams, or protons to treat cancer.

X-ray images are regularly used to assess:

a) Fractured bones

b) Cavities

c) Objects Swallowed

d) Lungs

e) Human veins

f) Mammography

5.8.2 Ultrasound

Diagnostic ultrasound, also called medical sonography or ultrasonography, uses high frequency sound waves to create images of the inside of the body. The ultrasound device sends sound waves into the body and converts the returning echoes into an image. Ultrasound can also produce audible representations of blood flow, enabling medical professionals to use both sound and visuals to assess a patient's health.

Fig. 5.16: Ultrasound image model

Ultrasound is regularly used to assess:

a) Pregnancy

b) Abnormalities in the heart and blood vessels

c) Organs of the pelvis and abdomen

d) Symptoms of pain, swelling, and infection

5.8.3 Positron Emission Tomography (PET)

PET is a nuclear imaging technique that gives physicians information about how tissues and organs are functioning. PET is combined with CT imaging with the help of a scanner and a small amount of radio-pharmaceutical, which is injected into a patient's vein to help produce detailed, computerized pictures of areas inside the body.

Fig. 5.17: PET image model

PET is frequently used to evaluate:

a) Neurological diseases such as Multiple Sclerosis and Alzheimer's

b) Leukemia

c) Effectiveness of treatments

d) Heart conditions

5.8.4 Computed Tomography (CT)

Computed Tomography (CT), also called a CAT scan, is a medical imaging method that combines a number of X-ray projections taken from different angles to produce detailed cross-sectional images of areas inside the body. CT images allow physicians to obtain very accurate, 3-D views of specific parts of the body, such as soft tissues, the pelvis, blood vessels, the lungs, the brain, the heart, the abdomen and bones. CT is also often the preferred method for diagnosing many cancers, for example liver, lung and pancreatic cancers.

Fig.5.18 CT image model

CT is frequently used to assess:

a) Presence, size and location of tumors

b) Organs in the pelvis, chest and abdomen

c) Colon health (CT colonography)

d) Vascular condition/blood flow

e) Pulmonary embolism (CT angiography)

f) Abdominal aortic aneurysms (CT angiography)

g) Bone injuries

h) Heart tissue

i) Traumatic injuries

j) Heart disease

5.8.5 PET-CT

For added accuracy, physicians use a medical imaging technique that combines PET and CT: images acquired from the two devices in sequence are combined into a single superimposed image. PET-CT serves as a key instrument in the delineation of tumor extent, staging and the planning of patient treatment. The combination has been shown to enhance oncologic care by positively influencing treatment decisions, disease recurrence monitoring and patient outcomes such as disease-free progression.

Fig.5.19 PET-CT image model

5.8.6 Magnetic Resonance Imaging (MRI)

Magnetic Resonance Imaging (MRI) is a medical imaging technology that uses radio waves and a magnetic field to obtain detailed images of organs and tissues. It has proven particularly effective at distinguishing diseased tissue from the surrounding healthy tissue, which makes MRI one of the most widely used techniques in present-day medical imaging.

Fig.5.20 MRI model

MRI is often used to evaluate:

a) Veins

b) Abnormal tissue

c) Breasts

d) Bones and joints


e) Organs in the pelvis, chest and abdomen (kidney, heart, spleen, liver)

f) Spinal injuries

g) Tendon and ligament tears

5.9 Image Quality Metrics

Image quality describes the characteristic of an image that deals with perceived image degradation (typically compared to an ideal image). Imaging systems may introduce some amount of noise or artifacts into the signal, so quality assessment is an important problem.

There are various methods and metrics that can be calculated objectively and evaluated automatically by a computer program. They are classified as Full-Reference (FR) and No-Reference (NR) schemes. In FR image quality schemes, the quality of a test image is evaluated by comparing it with a reference image that is assumed to have perfect quality. NR metrics try to assess the quality of an image without any reference to the original one.

Image quality indices attempt to capture one or a combination of the different factors that determine image quality. Some of the common image quality metrics are given as follows.

a. Mean Square Error (MSE)

b. Peak Signal to Noise Ratio (PSNR)

c. Mean Difference (AD)

d. Normalized Cross Correlation (NCC)

e. Maximum Difference (MD)

f. Normalized Absolute Error (NAE)

g. Laplacian Mean Square Error (LMSE)

h. Structural Similarity Index Metric (SSIM)

However, only some of the above-mentioned image metrics are used in this thesis. They are explained in detail, in terms of their meaning and calculation, as follows.
5.9.1 Mean Square Error (MSE)

The expression for evaluating the MSE is given as in eq (2.1)

MSE = (1/(m·n)) · Σ_{i=1}^{m} Σ_{j=1}^{n} (A_ij − B_ij)²        (2.1)

Where,

m is the height of the image, i.e., the number of pixel rows

n is the width of the image, i.e., the number of pixel columns

A_ij is the pixel intensity value of the ideal (reference) image

B_ij is the pixel intensity value of the fused image

Mean Square Error is the most commonly used error measure: the error value is the difference between the actual data and the resulting data, and the mean of the square of this error gives the overall difference between the expected/ideal result and the obtained result. Here, the calculation is carried out at the pixel level; a total of m*n pixels are considered. A(i,j) is the pixel intensity value of the ideal image and B(i,j) that of the fused image. The difference between the pixel intensities of the ideal and fused images is squared, and the mean of these squared differences is taken as the error. The MSE value is 0 if the two images are identical.
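
As a rough illustration, the following is a minimal MATLAB sketch of this pixel-wise calculation; the file names are hypothetical and the images are assumed to be grayscale and of the same size:

>> A = double(imread('ideal.png'));    % reference (ideal) image, hypothetical file name
>> B = double(imread('fused.png'));    % fused image of the same size, hypothetical file name
>> [m, n] = size(A);                   % m = number of pixel rows, n = number of pixel columns
>> MSE = sum(sum((A - B).^2)) / (m*n)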

5.9.2 Mean Difference (AD)

The expression for evaluating the Mean Difference (also called Average Difference, AD) is given in eq (2.2):

AD = (1/(m·n)) · Σ_{i=1}^{m} Σ_{j=1}^{n} (A_ij − B_ij)        (2.2)

Where,

m is the height of the image, i.e., the number of pixel rows

n is the width of the image, i.e., the number of pixel columns

A(i,j) is the pixel intensity value of the ideal image

B(i,j) is the pixel intensity value of the fused image


Average Difference, as the term itself suggests, is the average value of the difference between the actual/ideal data and the obtained/resultant data. In our case, the corresponding pixel values of the ideal image A and the fused image B are considered.

The differences of the corresponding pixel intensities are averaged to obtain the metric. This metric gives the overall average difference between corresponding pixels of the two images, indicating how different the fused image is from the ideal image.
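
Under the same assumptions as the MSE sketch above (A, B, m and n already defined), this metric can be computed in MATLAB as:

>> AD = sum(sum(A - B)) / (m*n)        % average difference between corresponding pixels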

5.9.3 Structural Content (SC)

The expression for calculating the SC is given in eq (2.3):

SC = [ Σ_{i=1}^{m} Σ_{j=1}^{n} (A_ij)² ] / [ Σ_{i=1}^{m} Σ_{j=1}^{n} (B_ij)² ]        (2.3)

Where,

m is the height of the image, i.e., the number of pixel rows

n is the width of the image, i.e., the number of pixel columns

A(i,j) is the pixel intensity value of the ideal image

B(i,j) is the pixel intensity value of the fused image

Structural Content is the ratio between the content of the expected data and that of the obtained data. Essentially, it is the ratio between the net sum of the squares of the expected data and the net sum of the squares of the obtained data. In our case we compute the ratio between the net sum of the squared pixel intensities of the ideal image and the net sum of the squared pixel intensities of the fused image.

If the fused image and the ideal image are identical, the Structural Content metric value is 1. However, SC being 1 does not necessarily imply that the fused and ideal images are identical.
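
A corresponding one-line MATLAB sketch, reusing the A and B matrices assumed in the MSE example above:

>> SC = sum(sum(A.^2)) / sum(sum(B.^2))   % equals 1 when the two images are identical (but not only then)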

5.9.4 Maximum Difference (MD)

Mathematically the MD is represented as in eq (2.4)


MD = max_{i,j} |A_ij − B_ij|,  i = 1, 2, 3, …, m;  j = 1, 2, 3, …, n        (2.4)

Where,

m is the height of the image, i.e., the number of pixel rows

n is the width of the image, i.e., the number of pixel columns

A(i,j) is the pixel intensity value of the ideal image

B(i,j) is the pixel intensity value of the fused image

Maximum Difference is a very basic metric that gives the largest of the corresponding pixel errors. The difference between each pair of corresponding pixel intensities is calculated, and the maximum of these differences is taken as the metric. This metric reflects a high value if a significant difference exists in any part of the two images.
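
Using the same assumed A and B matrices as above, a minimal MATLAB sketch:

>> MD = max(max(abs(A - B)))           % largest absolute difference over all corresponding pixels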

5.9.5 Laplacian Mean Square Error (LMSE)

The expression for evaluating the LMSE is given as in eq (2.5)

LMSE = [ Σ_{i=1}^{m} Σ_{j=1}^{n} (∇²A_ij − ∇²B_ij)² ] / [ Σ_{i=1}^{m} Σ_{j=1}^{n} (∇²A_ij)² ]        (2.5)

Where,

m is the height of the image, i.e., the number of pixel rows

n is the width of the image, i.e., the number of pixel columns

A(i,j) is the pixel intensity value of the ideal image

B(i,j) is the pixel intensity value of the fused image

Laplacian Mean Square Error, as the name suggests, follows the usual mean square error calculation. The difference here is that the mean square error is computed not from the expected and obtained data directly, but from their Laplacians. Hence, the Laplacian of each image is computed, the net sum of the squared error (difference) between the two Laplacians is calculated, and this is divided by the sum of the squares of the Laplacian of the expected data.
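
A minimal MATLAB sketch under the same assumptions as before; del2 computes one quarter of the discrete Laplacian, so it is scaled by 4 here (the constant cancels in the ratio in any case):

>> LA = 4*del2(A);                     % discrete Laplacian of the ideal image
>> LB = 4*del2(B);                     % discrete Laplacian of the fused image
>> LMSE = sum(sum((LA - LB).^2)) / sum(sum(LA.^2))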

5.9.6 Structural Similarity Index Metric (SSIM)

The Structural Similarity Index measures the similarity between two images. It is an improved version of traditional metrics like PSNR and MSE, and is considered one of the most effective and consistent quality metrics.

SSIM is computed based on two parameters:

(a) the vector K, a constant in the SSIM index formula, with default value K = [0.01 0.03];

(b) L, the dynamic range of the images; in our case the default value is L = 255.

The values C1 and C2 are computed based on the following formulas:

C1 = (K1·L)²

C2 = (K2·L)²

G is a Gaussian filter mask with the default value given in MATLAB by fspecial('gaussian', 11, 1.5). The input images A and B are low-pass filtered with G, giving µ1 and µ2 respectively. The filtering (convolution) operation is denoted by ".":

µ1 = A.G

µ2 = B.G

Then the values σ1², σ2² and σ12 are calculated using the following formulas:

σ1² = (A².G) − µ1²

σ2² = (B².G) − µ2²

σ12 = (A·B.G) − µ1·µ2

Once the above quantities are calculated, the SSIM value is finally computed using the following formula:

SSIM = mean[ ((2·µ1·µ2 + C1) · (2·σ12 + C2)) / ((µ1² + µ2² + C1) · (σ1² + σ2² + C2)) ]


The SSIM index is a decimal value in the range 0 to 1. A value of 0 would mean zero correlation with the original image, and 1 means exactly the same image. An SSIM of 0.95, for instance, would imply half as much variation from the original image as an SSIM of 0.90. Using this index, image and video compression techniques can be effectively compared.
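
A simplified MATLAB sketch of this computation (it assumes the Image Processing Toolbox for fspecial, imfilter and mean2, and the same double-valued images A and B as in the earlier examples; border handling is ignored for brevity):

>> K = [0.01 0.03];  L = 255;
>> C1 = (K(1)*L)^2;  C2 = (K(2)*L)^2;
>> G = fspecial('gaussian', 11, 1.5);              % Gaussian window described above
>> mu1 = imfilter(A, G);   mu2 = imfilter(B, G);   % local means
>> s1  = imfilter(A.^2, G) - mu1.^2;               % local variance of A
>> s2  = imfilter(B.^2, G) - mu2.^2;               % local variance of B
>> s12 = imfilter(A.*B, G) - mu1.*mu2;             % local covariance
>> SSIM = mean2(((2*mu1.*mu2 + C1).*(2*s12 + C2)) ./ ((mu1.^2 + mu2.^2 + C1).*(s1 + s2 + C2)))

Recent MATLAB releases also provide a built-in ssim function in the Image Processing Toolbox that can be used instead of coding the formula by hand.
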
5.9.7 Benchmarking
Table 5.2 below lists the ideal value each metric takes when the fused image and the ideal image are identical. When the fused image is not perfect, the table indicates the range over which the metric value will lie.
Table 5.2: Ideal values of some image quality metrics

Regarding the image quality metric readings, the more significant aspect is their absolute value. Our primary objective is to identify the difference/error value in the fused image with respect to the ideal image; the sign of the value does not hold much importance in our case.

FORGERIES

Automatic examination of questioned signatures was introduced in the late 1960s with the advent of computers. As computer systems became more powerful and more affordable, designing an automatic forgery detection system became an active research subject. Most of the work in off-line forgery detection, however, has been on random or simple forgeries and less on skilled or simulated forgeries. Before looking into the landmark contributions in the area of forgery detection, we first enumerate the types of forgeries. Verification is the decision about whether a signature is genuine or forged. Forged images can be classified into three groups: skilled, random and casual forgeries. Shown below is a diagram of the various kinds of forgeries:

Random forgeries: produced without any knowledge of the signer's name or signature shape.

Simple forgeries: produced by people who know the name of the signer but have no example of the signature.

Skilled forgeries: produced by people who look at the original signature image and try to imitate it as closely as possible.

(Figure: taxonomy of forgeries — random forgeries, casual forgeries, and skilled forgeries, the latter including professional, amateur, over-the-shoulder and home-improved forgeries.)

Various kinds of forgeries are classified into the following types:
Random Forgery
The signer uses the name of the victim in his own style to create a forgery, known as a simple or random forgery. This forgery accounts for the majority of forgery cases, although such forgeries are very easy to detect, even with the naked eye.
Unskilled Forgery
The signer imitates the signature in his own style without any knowledge of the spelling and without any prior experience. The imitation is preceded by observing the signature closely for a while.
Skilled Forgery
Undoubtedly the most difficult of all forgeries, a skilled forgery is created by professional impostors or persons who have experience in copying signatures. To achieve this, one may either trace the signature or imitate it freehand. The figure shows the different types of forgeries and how much they vary from the original signature:
(a) Genuine Signature; (b) Random Forgery; (c) Simulated Simple Forgery; (d)
Simulated Skilled Forgery

About forgeries: Initially a set of signatures is obtained from the subject and fed to the system. These signatures are pre-processed. The pre-processed images are then used to extract relevant geometric parameters that can distinguish the signatures of different persons, and these are used to train the system; the mean value of these features is obtained. In the next step, the scanned signature image to be verified is fed to the system. It is pre-processed to make it suitable for feature extraction, and various features are extracted from it. These values are then compared with the mean features that were used to train the system. The Euclidean distance is calculated and a suitable threshold per user is chosen. Depending on whether the input signature satisfies the threshold condition, the system either accepts or rejects the signature. Handwritten signature verification is the process of confirming the identity of a user using the handwritten signature of the user as a form of behavioural biometrics [1][2]. Automatic handwritten signature verification has been studied for decades, and many early research attempts were reviewed in survey papers [3]. The main advantage that signature verification has over other forms of biometric technologies, such as fingerprints or voice verification, is that the handwritten signature has been the most widely accepted biometric for identity verification in society for years. This long history of trust means that people are very willing to accept a signature-based biometric authentication system [4].
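
A simplified MATLAB sketch of the distance-and-threshold comparison step described above (all values here are illustrative and do not come from the thesis code; the feature values and the threshold stand in for quantities produced by earlier stages):

>> trainFeats = [12.0 0.45 3.1; 11.6 0.48 3.0; 12.3 0.44 3.2];   % illustrative features, one row per training signature
>> meanFeat = mean(trainFeats, 1);                               % mean feature vector for this user
>> testFeat = [11.9 0.47 3.4];                                   % features of the questioned signature (illustrative)
>> d = sqrt(sum((testFeat - meanFeat).^2));                      % Euclidean distance to the user's mean
>> threshold = 0.5;                                              % per-user threshold, chosen from training data
>> if d <= threshold, disp('accept as genuine'), else disp('reject as forged'), end
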
Types of signature verification:
There are two approaches used for signature verification according to the acquisition of the
data as:
• Offline signature verification system
• Online signature verification system
Offline signature verification system
In offline verification system, only static features are considered, whereas in case of online
systems, dynamic features are taken into consideration. Offline signature recognition is
performed after the writing is complete. The data is captured at a later time by using an
optical scanner to convert the image into a bit pattern. As compared to online signature
verification systems, off-line systems are difficult to design as many desirable characteristics
such as the order of strokes, the velocity and other dynamic information are not available in
the off-line case. The verification process has to fully rely on the features that can be
extracted from the trace of the static signature image only. Although difficult to design, off-line signature verification is crucial for determining writer identity, as most financial transactions at present are still carried out on paper. Therefore, it becomes all
the more essential to verify a signature for its authenticity. The design of any signature
verification system generally requires the solution of five subproblems: data acquisition, pre-
processing, feature extraction, comparison process and performance evaluation. Off-line
verification just deals with signature images acquired by a scanner or a digital camera. In an
off-line signature verification system, a signature is acquired as an image.
This image represents a personal style of human handwriting, extensively described by graphometry. In such a system the objective is to detect different types of forgeries, which are related to intra- and inter-personal variability. The system should be able to tolerate intra-personal variability and mark such signatures as genuine, and should be able to detect inter-personal variability and mark those signatures as forgeries. In off-line signature recognition the signature template comes from an imaging device, so only static characteristics of the signatures are available. The person need not be present at the time of verification. Hence off-line signature verification is convenient in various situations like document verification, banking transactions etc. As we have a limited set of features for verification purposes, off-line signature recognition systems need to be designed very carefully to achieve the desired accuracy. The difference between the off-line and on-line approaches lies in how the data are obtained: in an on-line SRVS the data are obtained using a special peripheral device, while in an off-line SRVS images of the signature written on paper are obtained using a scanner or a camera [2]. In this research, an approach for off-line signature recognition and verification is proposed. The designed system consists of three stages: the first stage is the pre-processing stage, which applies operations and filters to improve and enhance the signature image.
The purpose of the pre-processing stage is to produce the best signature image for the next stage, which is the feature extraction stage; choosing the right features is more of an art than a science. Three powerful features are used: a global feature, a texture feature and a grid information feature [3]. The three features are calculated for each signature image and passed to the last stage, which is the neural network stage. The neural network stage consists of a two-stage classifier: the first classifier stage contains three back-propagation (BP) neural networks, each of which takes its input from one of the three features and is trained independently of the others.

Each BP network has two outputs, which are fed as input to the second-stage classifier. The second-stage classifier consists of two radial basis function (RBF) neural networks. It is the task of the second classifier (RBF) to combine the results of the first classifier (BP) to make the final decision of the system [4]. Offline verification is concerned with the verification of a signature made with a normal pen. Various different approaches to both classes have been proposed.

CHAPTER 4
INTRODUCTION TO MATLAB

4.1 What Is MATLAB?

MATLAB ® is a high-performance language for technical computing. It integrates


computation, visualization, and programming in an easy-to-use environment where problems
and solutions are expressed in familiar mathematical notation. Typical uses include

Math and computation


Algorithm development
Data acquisition
Modeling, simulation, and prototyping
Data analysis, exploration, and visualization
Scientific and engineering graphics
Application development, including graphical user interface building.
MATLAB is an interactive system whose basic data element is an array that does not
require dimensioning. This allows you to solve many technical computing problems,
especially those with matrix and vector formulations, in a fraction of the time it would take to
write a program in a scalar non interactive language such as C or FORTRAN.

The name MATLAB stands for matrix laboratory. MATLAB was originally written to
provide easy access to matrix software developed by the LINPACK and EISPACK projects.
Today, MATLAB engines incorporate the LAPACK and BLAS libraries, embedding the
state of the art in software for matrix computation.

MATLAB has evolved over a period of years with input from many users. In university
environments, it is the standard instructional tool for introductory and advanced courses in
mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-
productivity research, development, and analysis.
MATLAB features a family of add-on application-specific solutions called toolboxes.
Very important to most users of MATLAB, toolboxes allow you to learn and apply
specialized technology. Toolboxes are comprehensive collections of MATLAB functions (M-
files) that extend the MATLAB environment to solve particular classes of problems. Areas in
which toolboxes are available include signal processing, control systems, neural networks,
fuzzy logic, wavelets, simulation, and many others.

4.2 The MATLAB System:


The MATLAB system consists of five main parts:
Development Environment:
This is the set of tools and facilities that help you use MATLAB functions and files.
Many of these tools are graphical user interfaces. It includes the MATLAB desktop and
Command Window, a command history, an editor and debugger, and browsers for viewing
help, the workspace, files, and the search path.
The MATLAB Mathematical Function:
This is a vast collection of computational algorithms ranging from elementary
functions like sum, sine, cosine, and complex arithmetic, to more sophisticated functions like
matrix inverse, matrix eigenvalues, Bessel functions, and fast Fourier transforms.
The MATLAB Language:
This is a high-level matrix/array language with control flow statements, functions, data
structures, input/output, and object-oriented programming features. It allows both
"programming in the small" to rapidly create quick and dirty throw-away programs, and
"programming in the large" to create complete large and complex application programs.
Graphics:
MATLAB has extensive facilities for displaying vectors and matrices as graphs, as
well as annotating and printing these graphs. It includes high-level functions for two-
dimensional and three-dimensional data visualization, image processing, animation, and
presentation graphics. It also includes low-level functions that allow you to fully customize
the appearance of graphics as well as to build complete graphical user interfaces on your
MATLAB applications.

The MATLAB Application Program Interface (API):


This is a library that allows you to write C and Fortran programs that interact with
MATLAB. It includes facilities for calling routines from MATLAB (dynamic linking),
calling MATLAB as a computational engine, and for reading and writing MAT-files.
MATLAB working environment:
MATLAB desktop:

Matlab Desktop is the main Matlab application window. The desktop contains five sub
windows, the command window, the workspace browser, the current directory window, the
command history window, and one or more figure windows, which are shown only when the
user displays a graphic.

The command window is where the user types MATLAB commands and expressions at the prompt (>>) and where the output of those commands is displayed. MATLAB defines the workspace as the set of variables that the user creates in a work session. The workspace browser shows these variables and some information about them. Double clicking on a variable in the workspace browser launches the Array Editor, which can be used to obtain information and, in some instances, edit certain properties of the variable.

The Current Directory tab above the workspace tab shows the contents of the current directory, whose path is shown in the current directory window. For example, in the Windows operating system the path might be as follows: C:\MATLAB\Work, indicating that directory "work" is a subdirectory of the main directory "MATLAB", which is installed in drive C. Clicking on the arrow in the current directory window shows a list of recently used paths. Clicking on the button to the right of the window allows the user to change the current directory.

MATLAB uses a search path to find M-files and other MATLAB-related files, which are organized in directories in the computer file system. Any file run in MATLAB must reside in the current directory or in a directory that is on the search path. By default, the files supplied with MATLAB and MathWorks toolboxes are included in the search path. The easiest way to see which directories are on the search path, or to add or modify the search path, is to select Set Path from the File menu of the desktop and then use the Set Path dialog box. It is good practice to add any commonly used directories to the search path to avoid repeatedly having to change the current directory.
The Command History Window contains a record of the commands a user has entered
in the command window, including both current and previous MATLAB sessions. Previously
entered MATLAB commands can be selected and re-executed from the command history
window by right clicking on a command or sequence of commands.

This action launches a menu from which to select various options in addition to executing the commands. This is a useful feature when experimenting with various commands in a work session.

Using the MATLAB Editor to create M-Files:

The MATLAB editor is both a text editor specialized for creating M-files and a graphical MATLAB debugger. The editor can appear in a window by itself, or it can be a sub window in the desktop. M-files are denoted by the extension .m, as in pixelup.m. The MATLAB editor window has numerous pull-down menus for tasks such as saving, viewing, and debugging files. Because it performs some simple checks and also uses color to differentiate between various elements of code, this text editor is recommended as the tool of choice for writing and editing M-functions. Typing edit filename at the prompt opens the M-file filename.m in an editor window, ready for editing. As noted earlier, the file must be in the current directory, or in a directory on the search path.

Image Representation

3.1 Image Format

An image is a rectangular array of values (pixels). Each pixel represents the measurement of some property of a scene measured over a finite area. The property could be many things, but we usually measure either the average brightness (one value) or the brightnesses of the image filtered through red, green and blue filters (three values). The values are normally represented by an eight-bit integer, giving a range of 256 levels of brightness. We talk about the resolution of an image: this is defined by the number of pixels and the number of brightness values. A raw image will take up a lot of storage space. Methods have been defined to compress the image by coding redundant data in a more efficient fashion, or by discarding the perceptually less significant information. MATLAB supports reading all of the common image formats. Image coding is not addressed here.

3.2 Image Loading, Displaying and Saving

An image is loaded into working memory using the command

>> f = imread('image file name');

The semicolon at the end of the command suppresses MATLAB output. Without it, MATLAB will execute the command and echo the results to the screen. We assign the image to the array f. If no path is specified, MATLAB will look for the image file in the current directory. The image can be displayed using

>> imshow(f, G)

f is the image to be displayed, and G defines the range of intensity levels used to display it. If it is omitted, the default value 256 is used. If the syntax [low, high] is used instead of G, values less than low are displayed as black, and ones greater than high are displayed as white. Finally, if low and high are left out, i.e. [ ] is used, low is set to the minimum value in the image and high to the maximum one, which is useful for automatically fixing the range of the image if it is very small or very large. Images are usually displayed in a figure window. If a second image is displayed it will overwrite the first, unless the figure function is used:
>> figure, imshow(f)

will generate a new figure window and display the image in it. Note that multiple functions may be called in a single line, provided they are separated by commas. An image array may be written to file using:

>> imwrite(array name, 'file name')

The format of the file can be inferred from the file extension, or can be specified by a third argument. Certain file formats have additional arguments.

3.3 Image Information

Information about an image file may be found by

>> imfinfo filename

4 Quantisation

4.1 Grey Level Ranges

Images are normally captured with pixels in each channel being represented by eight-bit integers. (This is partly for historical reasons – it has the convenience of being a basic memory unit, it allows for a suitable range of values to be represented, and many cameras could not capture data to any greater accuracy. Further, most displays are limited to eight bits per red, green and blue channel.) But there is no reason why pixels should be so limited; indeed, there are devices and applications that deliver and require higher resolution data.

illumination (other wavelength ranges and more of them are possible). MATLAB provides
functions for changing images from one type to another. The syntax is

>> B = data_class_name(A)

where data_class_name is one of the data types in the above table, e.g.

>> B = uint8(A)
4.2 Number of Pixels

Images come in all sizes, but are (almost) always rectangular. MATLAB gives several methods of accessing the elements of an array, i.e. the pixels of an image. An element can be accessed directly: typing the array name at the prompt will return all the array elements (which could take a while), while typing the array name followed by element indices in round brackets will return that value. Ranges of array elements can be accessed using colons.


>> A(first:last)

will return elements first to last inclusive of the one-dimensional array A. Note that the indices start at one.

>> A(first : step : last)

will return every step-th element, starting from first and finishing when last is reached or exceeded. step could be negative, in which case you'd have to ensure that first was greater than last. Naturally, this notation can be extended to access portions of an image. An image,
f, could be flipped using

>> fp = f(end : -1 : 1, :);

The keyword end is used to signify the last index. Using the colon alone implies that all index
values are traversed. This also indicates how multi-dimensional arrays are accessed. Or a
section of an image could be abstracted using

>> fc = f(top : bottom, left : right);


Or the image could be subsampled using

>> fs = f(1 : 2 : end, 1 : 2 : end);

4.2.1 A note on colour images

If the input image is colour, these operations will return greyscale results. A colour image has three values per pixel, which are accessed using a third index.

>> A(x, y, 1:3)

Would return all three colour values of the pixel at (x,y). A colour plane could be abstracted
using

>> R = A(x, y, 1);

And similarly for G (last index = 2) and B.

Point Processing

Point processing operations manipulate individual pixel values, without regard to any neighbouring values. Two types of transforms can be identified, manipulating the two properties of a pixel: its value and its position.

5.1 Value Manipulation

The fundamental value of a pixel is its brightness (in a monochrome image) or colour (in a multichannel image).

5.1.1 Pixel Scaling

Scaling of pixel values is achieved by multiplying by a constant.


MATLAB provides a single function that achieves several effects

>> R = imadjust(A, [low_in, high_in], [low_out, high_out], gamma);

This takes the input range of values as specified and maps them to the output range that’s
specified. Values outside of the input range are clamped to the extremes of the output range
(values below low_in are all mapped to low_out). The range values are expected to be in the
interval [0, 1]. The function scales them to values appropriate to the image type before
applying the scaling operation. Whilst low_in is expected to be less than high_in, the same is
not true for low_out and high_out. The image can therefore be inverted. The value of gamma
specifies the shape of the mapped curve. Gamma = 1 gives a linear scaling, a smaller gamma
gives a mapping that expands the scale at lower values, a larger gamma expands the upper
range of the scale. This can make the contrast between darker or brighter tones more visible,
respectively. Omitting any of the parameters results in default values being assumed. The
extremes of the ranges are used (0 and 1), or gamma = 1.
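
For instance, a hedged usage example (pout.tif is a sample image shipped with the Image Processing Toolbox; the parameter values are purely illustrative):

>> f = imread('pout.tif');
>> g = imadjust(f, [0.3 0.7], [0 1], 0.5);     % map [0.3, 0.7] to the full range; gamma < 1 brightens darker tones
>> figure, imshow(f), figure, imshow(g)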

Histogram

The histogram of an image measures the number of pixels with a given grey or colour value. Histograms of colour images are not normally used, so they will not be discussed here. The histogram of an image with L distinct intensity levels in the range [0, G] is defined as the function

h(r_k) = n_k

where r_k is the k-th intensity level in the image and n_k is the number of pixels with grey value r_k. G will be 255 for a uint8 image, 65535 for uint16 and 1.0 for a double image. Since the lower index of MATLAB arrays is one, never zero, r_1 corresponds to intensity level 0, etc. For the integer-valued images, G = L - 1.

We often work with normalised histograms. A normalised histogram is obtained by dividing each element of h(r_k) by the total number of pixels in the image (equal to the sum of the histogram elements). Such a histogram is called the probability density function (pdf) and reflects the probability of a given intensity level occurring:

p(r_k) = n_k / n

MATLAB functions for computing and displaying the histogram and normalised histogram are:

>> h = imhist(A, b);
>> p = imhist(A, b)/numel(A);

b is the number of bins in the histogram. If it is omitted, 256 is assumed. The function numel(A) returns the number of elements in its argument, in this case the number of pixels in A. Additional functions are defined for displaying the data in different forms, notably as a bar graph and for adding axes.
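
A short illustrative example (again assuming the Image Processing Toolbox sample image pout.tif):

>> f = imread('pout.tif');
>> h = imhist(f, 256);                         % 256-bin histogram
>> p = h / numel(f);                           % normalised histogram (pdf)
>> bar(0:255, p), xlabel('r_k'), ylabel('p(r_k)')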

CHAPTER 6
PROPOSED METHOD
BLOCK DIAGRAM
Convolutional Neural Networks

Convolutional Neural Networks (CNNs) have proven successful in recent years at a large
number of image processing-based machine learning tasks. Many other methods of
performing such tasks revolve around a process of feature extraction, in which hand-
chosen features extracted from an image are fed into a classifier to arrive at a
classification decision. Such processes are only as strong as the chosen features, which
often take large amounts of care and effort to construct. By contrast, in a CNN, the
features fed into the final linear classifier are all learned from the dataset. A CNN consists
of a number of layers, starting at the raw image pixels, which each perform a simple
computation and feed the result to the next layer, with the final result being fed to a linear
classifier. The layers’ computations are based on a number of parameters which are
learned through the process of backpropagation, in which for each parameter, the gradient
of the classification loss with respect to that parameter is computed and the parameter is
updated with the goal of minimizing the loss function. Exactly how this update is done
and what the loss function is are tunable hyperparameters of the network, discussed in
more detail below. For more details on backpropagation, see [5].

3.2. VGG Architecture
The architecture of a CNN determines how many layers it has, what each of these layers does, and how the layers are connected to each other. Choosing a good architecture is crucial to successful learning with a CNN. For our main training tasks, we used the VGG-16 CNN architecture [6]. This network contains a total of 16 layers with learnable parameters. These layers are of the following types:

Fully Connected Layers: Fully connected layers apply an affine transformation to their inputs. Mathematically, a fully-connected layer from n inputs to h outputs computes f(x) = Wx + b. In a fully-connected layer, every output depends on every input according to the weight matrix W, a learnable parameter. Outputs also depend on a bias term b, which is learnable but does not depend on the inputs.

ReLU Nonlinearity: The Rectified Linear Unit is a commonly used activation function after fully-connected layers. This layer applies the operation f(X) = max(0, X) to its input X, where the maximum is taken elementwise. ReLU layers do not have any learnable parameters. ReLU is commonly used in modern neural networks instead
of other possible activation functions such as sigmoid and tanh for several reasons. One reason is that its computation is very simple, saving time during training that would otherwise be spent computing exponentials for sigmoid and tanh. ReLU neurons also do not become saturated at high input values, meaning their gradient does not vanish to zero when receiving such values. This allows the neurons to continue learning in scenarios where other activation functions would have vanishing gradients. However, because the gradient of a ReLU neuron is 0 for negative inputs, it is also possible for these neurons to stop learning in cases where their input never produces positive values.

Softmax Nonlinearity: The Softmax nonlinearity appears in the final layer of the neural network and computes the final class scores that are fed into the loss function during training or output during testing. It has no learnable parameters. These scores can be interpreted as the neural network's estimated probabilities for each class; note that all the output values produced by the Softmax function add to one. Mathematically, the i-th class probability f(x)_i is computed as

f(x)_i = exp(x_i) / Σ_j exp(x_j)

Using Softmax to compute class scores is an attractive option because of its ease of interpretation.

Convolutional Layers: Convolutional layers process an input image by sliding a number of small filters across each possible region and outputting the dot product of the filter and the image at each region. They are similar to fully-connected layers, but with restrictions on which input neurons are connected to which outputs. Specifically, outputs are only connected to inputs of a small region, and all weights for each filter are tied together rather than being allowed to be learned independently. The learnable parameters for a convolutional layer are the weights of each filter and one bias value for each filter. Convolutional layers lend themselves naturally to understanding images, in which we often want to extract features by looking at small areas of an image and we don't care exactly where in the image the feature is. For example, a face is still a face regardless of where in the image it is. Our architecture uses 3x3 filters, with more filters used per layer as the network gets deeper.

Max Pooling Layers: Max pooling layers reduce the size of an image by combining 2x2 regions of the input into a single output value. For each 2x2 region of input, the output value is simply the maximum of those 4 input pixels. This layer has no learnable parameters. It cuts both the width and the height of the image in half as it goes to the next layer. In networks with max pooling layers, the input width and height decrease as the image is forwarded through the network, and the number of filters (depth) tends to increase. This corresponds to processing the image at a higher level of abstraction, in which features correspond to larger regions of the input image.

Dropout Layers: Dropout layers are a non-deterministic nonlinearity used in many modern neural networks. A dropout layer takes in a number of inputs and, for each input, sets it to 0 with probability p and leaves it unchanged with probability (1-p). During test time, dropout layers instead behave deterministically and multiply all input values by (1-p), the average value by which they were multiplied during training. The dropout value p is a tunable hyperparameter for the network. Dropout can be interpreted as a form of regularization, as using dropout during training forces the network to have many ways of computing a correct result rather than just one. This keeps the network from relying too heavily on any single connection.

3.3. Training the Network

As our loss function, we chose the standard Categorical Cross-Entropy loss with L2 regularization, which is computed from the outputs of the final Softmax layer. If X is the output from the Softmax layer for a given training data point, the unregularized loss is defined as

L = -log(X_{y_i})

where y_i is the correct class label. This is a standard loss function for CNNs. One important feature of this loss function is that, unlike other choices such as the Hinge Loss function, it does not become satisfied with results that are "good enough": it seeks to push the class scores closer and closer to perfection, corresponding to all the probability mass being assigned to the correct class. Using the gradient of this loss function, we update all parameters in the network using the Nesterov Momentum update. Our reasoning for using this particular update method is explained in our Results section. This update keeps track of a variable "v" that is a function of the magnitude of previous updates, allowing the learning rate to depend on this variable. This type of update is more tolerant to different values of the learning rate and tends to converge relatively quickly, as it can adapt over time based on how quickly the parameters are changing.

3.4. Transfer Learning

VGG-16 is a large neural network with a huge number of learnable parameters. Effectively training a network this large from scratch requires a large dataset and access to substantial computational resources. However, this problem can be averted by leveraging similarities between different image datasets. Specifically, for most reasonable image classification tasks, the low-level features learned by the first few layers of a network will be roughly the same regardless of the dataset. This means that we can initialize our neural network with parameter values learned from a different dataset and expect that the values for the first layers of the network will work well without having to be trained. This process is known as transfer learning. For our task, we downloaded pretrained ImageNet weights for the VGG-16 architecture by following the directions here:

We initialized our model with the pretrained weights and allowed only the fully-connected layers at the end to train, keeping the convolutional layers fixed throughout training (this is a decision we came to based on testing different options; see Results for further discussion).
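
As a rough, self-contained illustration of the layer operations and loss described in this chapter (this is not the project's training code, which used a pretrained VGG-16; all values here are made up for the example), a minimal MATLAB sketch:

>> relu = @(X) max(X, 0);                                 % elementwise Rectified Linear Unit
>> softmax = @(x) exp(x - max(x)) / sum(exp(x - max(x))); % class probabilities summing to one (max subtracted for stability)
>> h = relu([-1.2; 0.4; 2.0]);                            % ReLU zeroes the negative activation
>> scores = randn(10, 1);                                 % pretend outputs of the final fully-connected layer
>> p = softmax(scores);                                   % Softmax class probabilities
>> y = 3;                                                 % index of the correct class (illustrative)
>> loss = -log(p(y))                                      % unregularized categorical cross-entropy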

4.5. Comparison Networks

In our final experiment, we make use of a model proposed in [7] for image patch comparison. The paper proposes several models for image patch comparison, including siamese (similar to [8]), pseudo-siamese, and 2-channel networks. In the 2-channel network, two images are stacked (i.e. each image serves as a channel of the composite, paired image) as the input to a convolutional network. The convolutional network then leads to a fully connected linear decision layer with 1 output that indicates the similarity of the two patches. They found that the 2-channel network outperforms the other proposed architectures. Further details about our adaptation are in the Results section. The following is a sketch of a 2-channel architecture:

Figure 2. 2-channel network for image patch comparison [7]
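
As a hedged sketch of how such a paired input can be formed in MATLAB (the file names are hypothetical, and the two signature images are assumed to be grayscale and of the same size):

>> imgA = im2single(imread('sig_genuine.png'));   % hypothetical reference signature
>> imgB = im2single(imread('sig_query.png'));     % hypothetical questioned signature
>> pairInput = cat(3, imgA, imgB);                % stack the two images as two channels of one input
>> size(pairInput)                                % height x width x 2, fed to the convolutional network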

4. Dataset and Features

5. Results and analysis

5.1. Evaluation Metrics


During training and model tuning, we held out a test set which was not tested on until the final submission of our project. We report accuracy on this set as well as on a validation set, which we used to tune our hyperparameters as we iterated on our models. The model we consider "best" is the one which achieved the best performance on the validation set, and we ran this best model on the test set to
achieve the results for the test set. All of our tasks involve the test-time behavior of classifying whether a given signature is forged or genuine. We evaluate our performance using three metrics: classification accuracy, False Acceptance Rate (FAR), and False Rejection Rate (FRR). They are defined as follows, where genuine signatures are considered "positive" examples: accuracy is the fraction of all test signatures classified correctly, FAR is the fraction of forged signatures that are incorrectly accepted as genuine, and FRR is the fraction of genuine signatures that are incorrectly rejected as forged.
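
A minimal MATLAB sketch of these three metrics (the label vectors below are illustrative, with 1 denoting genuine and 0 denoting forged):

>> yTrue = [1 1 1 0 0 0 1 0];    % ground-truth labels (illustrative)
>> yPred = [1 0 1 0 1 0 1 0];    % classifier decisions (illustrative)
>> accuracy = mean(yPred == yTrue)
>> FAR = sum(yPred == 1 & yTrue == 0) / sum(yTrue == 0)    % forged signatures accepted as genuine
>> FRR = sum(yPred == 0 & yTrue == 1) / sum(yTrue == 1)    % genuine signatures rejected as forged
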
CHAPTER 7

Conclusions

We experimented with several variations on signature verification tasks. We showed that


convolutional neural networks do an excellent job of verifying signatures when allowed
access during training to examples of genuine and forged signatures of the same people
whose signatures are seen at test time. We then conducted an experiment where we tested our
network on the signatures of new people whose signatures had not been seen at all during
training, resulting in performance little better than a naive baseline due to the inherent
difficulty of this task. Finally, we proposed a novel architecture for the comparison of
signatures which has promise for future work in signature verification, specifically in
situations where a possibly-forged signature can be compared to known genuine signatures of
a specific signer. We propose two directions for future work. First, access to more resources would allow us to achieve better performance on our main task: training on a larger dataset with more signature examples per person, and training a larger network for more epochs, could achieve higher accuracies; we were not able to do this due to time and computational resource constraints. This task is attractive because it
mirrors the situation of a real-world application of signature verification. Although we were
unable to achieve high results at this task, the literature on signature verification indicates that
our method is promising. Having access to more data and more computational resources
would likely allow us to achieve stronger results at this task as well. With access to these, we
would train our model on larger datasets and allow more layers to train for more epochs.

REFERENCES

[1] E. J. Justino, A. El Yacoubi, F. Bortolozzi, and R. Sabourin, "An off-line signature verification system using HMM and graphometric features," in Fourth IAPR International Workshop on Document Analysis Systems (DAS), Rio de Janeiro. Citeseer, 2000, pp. 211-222.

[2] D.-Y. Yeung, H. Chang, Y. Xiong, S. E. George, R. S. Kashi, T. Matsumoto, and G. Rigoll, "SVC2004: First International Signature Verification Competition," in ICBA 2004, LNCS, vol. 3072, pp. 16-22. Springer, Heidelberg, 2004.

[3] M. Liwicki, M. Malik, C. van den Heuvel, X. Chen, C. Berger, R. Stoel, M. Blumenstein, and B. Found, "Signature Verification Competition for Online and Offline Skilled Forgeries (SigComp2011)," in 2011 International Conference on Document Analysis and Recognition, 2011.

[4] J. Vargas, M. Ferrer, C. Travieso, and J. Alonso, "Off-line Handwritten Signature GPDS-960 Corpus," in Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 2, Sep. 2007, pp. 764-768.

[5] A. Karpathy, "Backpropagation, Intuitions," from Stanford's CS231n, http://cs231n.github.io/optimization-2/, 2015.

[6] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," in ICLR 2015.

[7] S. Zagoruyko and N. Komodakis, "Learning to Compare Image Patches via Convolutional Neural Networks," CVPR 2015, arXiv:1504.03641v1.

[8] J. Bromley, J. W. Bentz, L. Bottou, I. Guyon, Y. LeCun, C. Moore, E. Säckinger, and R. Shah, "Signature verification using a siamese time delay neural network," Int. J. Pattern Recognit. Artificial Intell., vol. 7, no. 4, pp. 669-687, Aug. 1993.

[9] H. Baltzakis and N. Papamarkos, "A new signature verification technique based on a two-stage neural network classifier," Engineering Applications of Artificial Intelligence, vol. 14, no. 1, pp. 95-103, 2001.

[10] L. S. Oliveira, E. Justino, C. Freitas, and R. Sabourin, "The graphology applied to signature verification," in 12th Conference of the International Graphonomics Society, 2005, pp. 286-290.

[11] J.-P. Drouhard, R. Sabourin, and M. Godbout, "A neural network approach to off-line signature verification using directional PDF," Pattern Recognition, vol. 29, no. 3, pp. 415-424, 1996.

[12] P. S. Deng, H.-Y. M. Liao, C. W. Ho, and H.-R. Tyan, "Wavelet-Based Off-Line Handwritten Signature Verification," Computer Vision and Image Understanding, vol. 76, no. 3, pp. 173-190, Dec. 1999.

[13] R. Sabourin and G. Genest, "An extended-shadow-code based approach for off-line signature verification. I. Evaluation of the bar mask definition," in Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 2 - Conference B: Computer Vision & Image Processing, vol. 2, Oct. 1994, pp. 450-453.

[14] T. Ojala, M. Pietikäinen, and D. Harwood, "A comparative study of texture measures with classification based on featured distributions," Pattern Recognition, vol. 29, no. 1, pp. 51-59, Jan. 1996.

[15] N. A. Murshed, R. Sabourin, and F. Bortolozzi, "A cognitive approach to off-line signature verification," International Journal of Pattern Recognition and Artificial Intelligence, vol. 11, no. 05, pp. 801-825, 1997.

[16] H. Khalajzadeh, M. Mansouri, and M. Teshnehlab, "Persian Signature Verification using Convolutional Neural Networks," in International Journal of Engineering Research and Technology, vol. 1. ESRSA Publications, 2012.

[17] K. Huang and H. Yan, "Off-line signature verification based on geometric feature extraction and neural network classification," Pattern Recognition, vol. 30, no. 1, pp. 9-17, Jan. 1997.

[18] L. Batista, E. Granger, and R. Sabourin, "Dynamic selection of generative-discriminative ensembles for off-line signature verification," Pattern Recognition, vol. 45, no. 4, pp. 1326-1340, Apr. 2012.

[19] E. Özgündüz, T. Şentürk, and M. E. Karslıgil, "Off-line signature verification and recognition by support vector machine," in European Signal Processing Conference (EUSIPCO), 2005.
