CHAPTER 1
INTRODUCTION
Online verification requires an electronic signing system which provides data such as
the pen's position, azimuth/altitude angle, and pressure at each time-step of the signing.
By contrast, offline verification uses solely 2D visual (pixel) data acquired from
scanning signed documents. While online systems provide more information to verify
identity, they are less versatile and can only be used in certain contexts (e.g.
transaction authorization) because they require specific input systems. We aim to build
an offline signature verification system using a Convolutional Neural Network (CNN). Our
paper focuses on building systems trained on data with varying degrees of information, as
well as experimenting with different objective functions to obtain optimal error rates.

Figure 1: Superimposed examples of multiple genuine signatures from the same ID,
indicating high intraclass variability (from [10]).
CHAPTER 2
RELATED WORK
Unfortunately, the task of offline signature verification does not have a uniform standard
for performance evaluation. In 2004, the University of Hong Kong hosted the first
international signature verification competition [2], but the competition was solely for
online signatures. Nonetheless, this paved the way for future competitions for both online
and offline verification ([3], [4]). Beyond the online/offline distinction, there are two
further distinctions that define the task of signature verification, which we will refer to as
"forgery exposure" and "writer-dependence". The forgery exposure of the task is
determined by what kinds of signatures the model is trained on. Some authors train on
forgeries and genuine signatures, then test on different forgeries and genuine signatures;
some train on forgeries for identities that exist solely in the training set
(then test on forgeries and genuine signatures for novel IDs); and some do not use
forgeries at all during training. It is worth noting that only the latter two tasks are
reasonable for practical applications. That is, while a system could have access to a
development set that includes forgeries, it is unlikely that forgeries can be acquired for all
users of that system. We have trained systems for the first two of these varieties. In the
experiments section, we detail exactly how we use the training and test data to
accomplish each task. The writer-dependence distinction refers to whether the model has
a different classifier for each identity, or a single classifier for all identities. Writer-
dependent models, which are more common in the literature, use a different classifier for
each user. At test time, the model is given the ID of the signature for which it is testing
authenticity. Writer-independent models use a single classifier for all identities. We have
developed both writer-dependent and writer-independent models in our report.

2.2. Feature Selection

Most existing models in the literature use explicit feature extraction,
including geometric [9], graphometric [10], directional [11], wavelet [12], shadow [13],
and texture [14] features. Only in recent years has feature learning been explored [15].
We feed raw pixels into our network, letting the CNN learn the relevant features for
signature verification.

2.3. Competing Models

The signature verification literature
includes several examples of Hidden Markov Models [18], Neural Networks [17],
Support Vector Machines [19], and other machine learning models. In 2012, Khalajzadeh
et al. [16] used CNNs for Persian signature verification, which is the only report of CNNs
being used in the offline signature verification literature. Unfortunately, the paper
includes very little information about their methodology (i.e. forgery exposure, writer
dependence). Based on their explanation, we assume their model is trained on forgeries
and genuine signatures for all IDs, and thus can be compared with our work in our main
experiment. Their results report only an average validation performance of 99.86 and
a mean squared error, making it difficult to fully compare our model to theirs.
MOTIVATION:
THESIS ORGANIZATION:
CHAPTER 3
3.1 BACKGROUND:
Digital image processing is an area characterized by the need for extensive
experimental work to establish the viability of proposed solutions to a given problem. An
important characteristic underlying the design of image processing systems is the significant
level of testing and experimentation that normally is required before arriving at an acceptable
solution. This characteristic implies that the ability to formulate approaches and quickly
prototype candidate solutions generally plays a major role in reducing the cost and time
required to arrive at a viable system implementation.
An image may be defined as a two-dimensional function f(x, y), where x and y are
spatial coordinates, and the amplitude of f at any pair of coordinates (x, y) is called the
intensity or gray level of the image at that point. When x, y and the amplitude values of f are
all finite discrete quantities, we call the image a digital image. The field of digital image
processing (DIP) refers to processing digital images by means of a digital computer. A digital
image is composed of a finite number of elements, each of which has a particular location
and value. These elements are called pixels.
Vision is the most advanced of our senses, so it is not surprising that images play the
single most important role in human perception. However, unlike humans, who are limited to
the visual band of the electromagnetic (EM) spectrum, imaging machines cover almost the
entire EM spectrum, ranging from gamma rays to radio waves. They can also operate on
images generated by sources that humans are not accustomed to associating with images.
There is no general agreement among authors regarding where image processing stops
and other related areas, such as image analysis and computer vision, start. Sometimes a
distinction is made by defining image processing as a discipline in which both the input and
output of a process are images. This is a limiting and somewhat artificial boundary. The area
of image analysis (image understanding) lies between image processing and computer vision.
There are no clear-cut boundaries in the continuum from image processing at one end to
complete vision at the other. However, one useful paradigm is to consider three types of
computerized processes in this continuum: low-, mid- and high-level processes. Low-level
processes involve primitive operations such as preprocessing to reduce noise, contrast
enhancement and image sharpening. A low-level process is characterized by the fact that both
its inputs and outputs are images. Mid-level processes on images involve tasks such as
segmentation, description of objects to reduce them to a form suitable for computer
processing, and classification of individual objects. A mid-level process is characterized by
the fact that its inputs generally are images but its outputs are attributes extracted from those
images. Finally, higher-level processing involves "making sense" of an ensemble of
recognized objects, as in image analysis, and, at the far end of the continuum, performing the
cognitive functions normally associated with human vision.
Image processing is any form of signal processing for which the input is an image,
such as a photograph or video frame; the output of image processing may be either an image
or, a set of characteristics or parameters related to the image. Most image-processing
techniques involve treating the image as a two-dimensional signal and applying standard
signal-processing techniques to it. Image processing usually refers to digital image
processing, but optical and analog image processing are also possible.
A grayscale image is a function I(x, y) of the two spatial coordinates of the image
plane: I(x, y) is the intensity of the image at the point (x, y) on the image plane.
I(x, y) takes non-negative values; assuming the image is bounded by a rectangle
[0, a] × [0, b], we have I: [0, a] × [0, b] → [0, ∞).
Color image:
A color image can be represented by three functions: R(x, y) for red, G(x, y) for green
and B(x, y) for blue.
Coordinate convention:
The result of sampling and quantization is a matrix of real numbers. We use two
principal ways to represent digital images. Assume that an image f(x, y) is sampled so that
the resulting image has M rows and N columns. We say that the image is of size M × N. The
values of the coordinates (x, y) are discrete quantities. For notational clarity and
convenience, we use integer values for these discrete coordinates. In many image processing
books, the image origin is defined to be at (x, y) = (0, 0). The next coordinate values along
the first row of the image are (x, y) = (0, 1). It is important to keep in mind that the notation
(0, 1) is used to signify the second sample along the first row. It does not mean that these are
the actual values of physical coordinates when the image was sampled. The following figure
shows the coordinate convention. Note that x ranges from 0 to M-1 and y from 0 to N-1 in
integer increments.
The coordinate convention used in the toolbox to denote arrays is different from the
preceding paragraph in two minor ways. First, instead of using (x, y), the toolbox uses the
notation (r, c) to indicate rows and columns. Note, however, that the order of coordinates is
the same as the order discussed in the previous paragraph, in the sense that the first element
of a coordinate tuple, (a, b), refers to a row and the second to a column. The other
difference is that the origin of the coordinate system is at (r, c) = (1, 1); thus, r ranges from 1
to M and c from 1 to N in integer increments. The IPT documentation refers to these as pixel
coordinates. Less frequently, the toolbox also employs another coordinate convention, called
spatial coordinates, which uses x to refer to columns and y to refer to rows. This is the
opposite of our use of the variables x and y.
Image as Matrices:
The preceding discussion leads to the following representation for a digitized image
function:

    f(x, y) = [ f(0, 0)     f(0, 1)     ...  f(0, N-1)
                f(1, 0)     f(1, 1)     ...  f(1, N-1)
                ...         ...              ...
                f(M-1, 0)   f(M-1, 1)   ...  f(M-1, N-1) ]

The right side of this equation is a digital image by definition. Each element of this
array is called an image element, picture element, pixel or pel. The terms image and pixel are
used throughout the rest of our discussions to denote a digital image and its elements.
Matrices in MATLAB are stored in variables with names such as A, a, RGB, real_array
and so on. Variables must begin with a letter and contain only letters, numerals and
underscores. As noted in the previous paragraph, all MATLAB quantities are written using
monospace characters. We use conventional Roman italic notation, such as f(x, y), for
mathematical expressions.
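The matrix view above can be made concrete with a small sketch. Python/NumPy is used here for illustration rather than MATLAB, and the pixel values are made up for the example; an M × N digital image is simply a matrix of gray levels:

```python
import numpy as np

# A tiny 3x4 "digital image": each entry is one pixel's gray level.
f = np.array([[ 0,  64, 128, 255],
              [32,  96, 160, 224],
              [16,  80, 144, 208]], dtype=np.uint8)

M, N = f.shape      # M rows, N columns -> an image of size M x N
print(M, N)         # 3 4
print(f[0, 1])      # second sample along the first row: 64
```

Note that NumPy, like the (x, y) convention in the text, indexes from 0, whereas MATLAB's (r, c) convention starts at (1, 1).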
Reading Images:
Images are read into the MATLAB environment using function imread, whose syntax is

imread('filename')

[Table: format name, description, recognized extensions]

Here filename is a string containing the complete name of the image file (including any
applicable extension). For example, the command line

>> f = imread('chestxray.jpg');

reads the JPEG (see the above table) image chestxray into image array f. Note the use of single
quotes (') to delimit the string filename. The semicolon at the end of a command line is used
by MATLAB for suppressing output. If a semicolon is not included, MATLAB displays the
results of the operation(s) specified in that line. The prompt symbol (>>) designates the
beginning of a command line, as it appears in the MATLAB command window.
When as in the preceding command line no path is included in filename, imread reads
the file from the current directory and if that fails it tries to find the file in the MATLAB
search path. The simplest way to read an image from a specified directory is to include a full
or relative path to that directory in filename.
For example,

>> f = imread('D:\myimages\chestxray.jpg');

reads the image from a folder called myimages on the D: drive, whereas

>> f = imread('.\myimages\chestxray.jpg');

reads the image from the myimages subdirectory of the current working directory.
Data Classes:
Although we work with integer coordinates, the values of pixels themselves are not
restricted to be integers in MATLAB. The table above lists the various data classes supported
by MATLAB and IPT for representing pixel values. The first eight entries in the table are
referred to as numeric data classes. The ninth entry is the char class and, as shown, the last
entry is referred to as the logical data class.

All numeric computations in MATLAB are done in double quantities, so this is also a
frequently encountered data class in image processing applications. Class uint8 is also
encountered frequently, especially when reading data from storage devices, as 8-bit images
are the most common representations found in practice. These two data classes, class logical,
and, to a lesser degree, class uint16 constitute the primary data classes on which we focus.
Many IPT functions, however, support all the data classes listed in the table. Data class double
requires 8 bytes to represent a number, uint8 and int8 require one byte each, uint16 and int16
require 2 bytes each, and uint32 and int32 require 4 bytes each.
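This storage bookkeeping is easy to check programmatically. A small sketch in Python/NumPy, whose fixed-width numeric types mirror the MATLAB classes named above:

```python
import numpy as np

# Bytes per element for the numeric classes discussed above:
# double -> float64 (8), uint8/int8 (1), uint16/int16 (2), uint32/int32 (4).
for dtype in (np.float64, np.uint8, np.int8, np.uint16, np.int16, np.uint32, np.int32):
    print(np.dtype(dtype).name, np.dtype(dtype).itemsize)
```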
Image Types:
1 .Intensity images
2. Binary images
3. Indexed images
4. R G B images
Most monochrome image processing operations are carried out using binary or intensity
images, so our initial focus is on these two image types. Indexed and RGB colour images are
discussed later.
Intensity Images:
An intensity image is a data matrix whose values have been scaled to represent
intensities. When the elements of an intensity image are of class uint8 or class uint16, they
have integer values in the range [0, 255] and [0, 65535], respectively. If the image is of class
double, the values are floating-point numbers. Values of scaled, double intensity images are
in the range [0, 1] by convention.
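The conventional scaling between an 8-bit intensity image and a double image in [0, 1] can be sketched as follows (Python/NumPy used for illustration; MATLAB's im2double performs an equivalent conversion for uint8 input):

```python
import numpy as np

g8 = np.array([0, 128, 255], dtype=np.uint8)   # uint8 intensity values in [0, 255]
gd = g8.astype(np.float64) / 255.0             # scaled double image in [0, 1]
print(gd[0], gd[2])                            # endpoints map to 0.0 and 1.0
```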
Binary Images:
Binary images have a very specific meaning in MATLAB. A binary image is a logical
array of 0s and 1s. Thus, an array of 0s and 1s whose values are of data class, say, uint8 is
not considered a binary image in MATLAB. A numeric array is converted to binary using
function logical. Thus, if A is a numeric array consisting of 0s and 1s, we create a logical
array B using the statement

B = logical(A)

If A contains elements other than 0s and 1s, use of the logical function converts all
nonzero quantities to logical 1s and all entries with value 0 to logical 0s.

islogical(C)

If C is a logical array, this function returns a 1; otherwise it returns a 0. Logical arrays can
be converted to numeric arrays using the data class conversion functions.
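The behaviour of logical and islogical described above can be mimicked as follows (a Python/NumPy sketch, where boolean arrays play the role of MATLAB logical arrays; the sample values are arbitrary):

```python
import numpy as np

A = np.array([[0, 2], [1, 0]], dtype=np.uint8)

B = A.astype(bool)          # like logical(A): all nonzero entries -> True (1)
print(B.dtype == bool)      # an islogical-style check: True

back = B.astype(np.uint8)   # convert back to a numeric array of 0s and 1s
print(back)
```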
Indexed Images:
RGB Image:
An RGB color image is an M × N × 3 array of color pixels, where each color pixel is a
triplet corresponding to the red, green and blue components of an RGB image at a specific
spatial location. An RGB image may be viewed as a "stack" of three gray-scale images that,
when fed into the red, green and blue inputs of a color monitor, produce a color image on the
screen. By convention, the three images forming an RGB color image are referred to as the
red, green and blue component images. The data class of the component images determines
their range of values. If an RGB image is of class double, the range of values is [0, 1].
Similarly, the range of values is [0, 255] or [0, 65535] for RGB images of class uint8 or
uint16, respectively. The number of bits used to represent the pixel values of the component
images determines the bit depth of an RGB image. For example, if each component image is
an 8-bit image, the corresponding RGB image is said to be 24 bits deep.
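The M × N × 3 layout and the bit-depth arithmetic can be sketched as follows (Python/NumPy for illustration; the image contents are arbitrary):

```python
import numpy as np

# A 2x2 RGB image: an M x N x 3 stack of red, green and blue component images.
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[0, 0] = (255, 0, 0)                             # a pure red pixel

red = rgb[:, :, 0]                                  # the red component image
bit_depth = rgb.dtype.itemsize * 8 * rgb.shape[2]   # 8 bits x 3 bands = 24
print(bit_depth)                                    # 24
```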
Figure 5.1 : An image is an array or a matrix of pixels arranged in columns and rows.
In a (8-bit) grayscale image each picture element has an assigned intensity that ranges
from 0 to 255. A grey scale image is what people normally call a black and white image, but
the name emphasizes that such an image will also include many shades of grey.
Pixel :
A pixel (short for picture element) is a single point in a graphic image. Each such
information element is not really a dot, nor a square, but an abstract sample. Each element
of the above matrix is known as a pixel, where dark = 0 and light = 1. A pixel with only 1 bit
can represent only a black and white image. If the number of bits is increased, the number of
gray levels increases and a better picture quality is achieved. All naturally occurring images
are analog in nature; they must be sampled and quantized to obtain a digital image. The more
pixels an image has, the greater its clarity. An image is represented as a matrix in Digital
Image Processing, whereas in Digital Signal Processing we use only row matrices. A good
image has on the order of 1024 × 1024 pixels, which is known as 1k × 1k = 1 Mpixel.
Figure 5.2: Each pixel has a value from 0 (black) to 255 (white). The possible range of the
pixel values depends on the color depth of the image; here 8 bit = 256 tones or grayscales.

A normal grey scale image has 8-bit color depth = 256 grey scales. A "true color"
image has 24-bit color depth = 8 + 8 + 8 bits = 256 × 256 × 256 colors = ~16 million colors.
Figure 5.3: A true-color image assembled from three grayscale images colored red, green and
blue. Such an image may contain up to 16 million different colors. Some grayscale images
have more grayscales, for instance 16 bit = 65536 grayscales. In principle, three such
grayscale images can be combined to form an image with 281,474,976,710,656 colors.
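Assembling three grayscale component images into one true-color stack can be sketched as follows (Python/NumPy; the constant pixel values are arbitrary):

```python
import numpy as np

# Three single-band grayscale exposures (e.g. red-, green- and blue-filtered).
r = np.full((2, 2), 200, dtype=np.uint8)
g = np.full((2, 2), 100, dtype=np.uint8)
b = np.full((2, 2),  50, dtype=np.uint8)

color = np.dstack((r, g, b))     # stack along a third axis -> (2, 2, 3)
print(color.shape)
print(color[0, 0])               # one color pixel: its (R, G, B) triplet
```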
The RGB color model relates very closely to the way we perceive color with the r, g
and b receptors in our retinas. RGB uses additive color mixing and is the basic color model
used in television or any other medium that projects color with light. It is the basic color
model used in computers and for web graphics, but it cannot be used for print production.
The secondary colours of RGB – cyan, magenta, and yellow – are formed by mixing
two of the primary colours (red, green or blue) and excluding the third color. Red and green
combine to make yellow, green and blue to make cyan, and blue and red form magenta. The
combination of red, green, and blue in full intensity makes white. The RGB color model is
an additive color model in which red, green, and blue light are added together in various ways
to reproduce a broad array of colors. The name of the model comes from the initials of the
three additive primary colors, red, green, and blue.
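The additive mixing rules above can be verified numerically (a Python/NumPy sketch; a wider integer type is used so the sums do not overflow 8-bit storage before clipping):

```python
import numpy as np

red   = np.array([255,   0,   0], dtype=np.uint16)
green = np.array([  0, 255,   0], dtype=np.uint16)
blue  = np.array([  0,   0, 255], dtype=np.uint16)

yellow  = np.clip(red + green, 0, 255)          # red + green -> yellow
cyan    = np.clip(green + blue, 0, 255)         # green + blue -> cyan
magenta = np.clip(blue + red, 0, 255)           # blue + red -> magenta
white   = np.clip(red + green + blue, 0, 255)   # all three at full intensity -> white
print(yellow, cyan, magenta, white)
```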
The main purpose of the RGB color model is for the sensing, representation, and
display of images in electronic systems, such as televisions and computers, though it has also
been used in conventional photography. Before the electronic age, the RGB color model
already had a solid theory behind it, based in human perception of colours.
Images of astronomical objects are usually taken with electronic detectors such as a
CCD (Charge Coupled Device). Similar detectors are found in normal digital cameras.
Telescope images are nearly always gray scale, but nevertheless contain some color
information. An astronomical image may be taken through a color filter. Different detectors
and telescopes also usually have different sensitivities to different colors (wavelengths).
If one or more of the images in a data set is taken through a filter that allows radiation
that lies outside the human vision span to pass – i.e. it records radiation invisible to us - it is
of course not possible to make a natural colour image. But it is still possible to make a colour
image that shows important information about the object. This type of image is called a
representative color image. Normally one would assign colors to these exposures in
chromatic order with blue assigned to the shortest wavelength, and red to the longest. In this
way it is possible to make color images from electromagnetic radiation far from the human
vision area, for example x-rays. Most often it is either infrared or ultraviolet radiation that is
used.
Image Noise - images containing random pixel values, usually generated from specific
parameterized distributions.
Synthetic images are often used to verify the correctness of operators by applying them to
known images. They are also often used for teaching purposes, as the operator output on such
images is generally `clean', whereas noise and uncontrollable pixel distributions in real
images make it harder to demonstrate unambiguous results. The images could be binary, grey
level or colour.
Numerous sources add unwanted signals to images, in the form of poor illumination and
noise [86-94]. This happens especially while acquiring, quantizing and transmitting
the images.
The present and future scenario is closely tied to digital image processing, which has
become an integral part of everyone's life. Its applications range from general document
processing and the visual exchange of information to medical imaging and serious
surveillance, and the growing demand for images has been accompanied by rising
expectations of image quality and accuracy.
Blur is introduced while acquiring and transmitting images; it degrades picture quality
in both sensing and visual perception. The quality also depends on the environmental
conditions under which the image was captured. For example, images sent over a wireless
medium may be affected by lighting effects, temperature variation, etc.
Image denoising is a well-explored topic whose prime aim is to enhance image quality
by diminishing the noise in the image. The main theme is to safeguard the required
information in the picture while removing undesired details, which is achieved using the
local geometry and the pixel values carrying the imperative data [92-94].
[Figure 2.1: Noise model. The input image f(x, y) plus the unwanted signal n(x, y) yields the
noise image g(x, y), which is passed through a noise-reduction window (filter) to give the
denoised image f̂(x, y).]

In the above figure 2.1, the input image is subjected to the addition of noise n(x, y),
producing the noise image g(x, y). This is given to the noise reduction filter(s), and the
denoised image f̂(x, y) is obtained at the output.
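The additive model g(x, y) = f(x, y) + n(x, y) followed by a noise-reduction window can be sketched as follows (Python/NumPy; a plain 3×3 box average stands in for the filter, and the constant test image, noise level and seed are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

f = np.full((64, 64), 120.0)          # clean input image f(x, y)
n = rng.normal(0.0, 10.0, f.shape)    # unwanted signal n(x, y)
g = f + n                             # noise image g(x, y) = f(x, y) + n(x, y)

# A simple 3x3 box-average "noise reduction window" over the interior pixels.
den = g.copy()
den[1:-1, 1:-1] = sum(g[1 + di:63 + di, 1 + dj:63 + dj]
                      for di in (-1, 0, 1) for dj in (-1, 0, 1)) / 9.0

# Averaging shrinks the noise variance, so the denoised error is smaller.
print(np.abs(g - f).mean() > np.abs(den - f).mean())   # True
```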
A vast number of techniques have been proposed by many authors, paving the way for
further developments in noise removal. These techniques can be categorized by their nature
and use.

Here the various types of noise encountered in the proposed technique are discussed.
They are described below.
Sometimes the unwanted signal also arises from mathematical iteration, especially
when we try to compress an image. Noise is an undesirable part of all digital images. It is not
possible to avoid the noise introduced into digital images while acquiring them, so it is
essential to remove the unwanted signal using denoising algorithms.

Noise from a variety of sources appears when taking images with digital cameras and
conventional film cameras. These noise types are large in number; they are listed here as
follows.
Gaussian noise and salt-and-pepper noise are the two most widespread unwanted signals
that we observe. In each case the noise at the various pixels may be correlated or
uncorrelated, and independent or identically distributed. The most frequently occurring noise
types are described here:
Additive white Gaussian noise (AWGN) is the noise most commonly seen in black and
white images and most frequently used in digital image processing, mainly as a benchmark
standard against which the performance of denoising algorithms is assessed. Gaussian noise
follows the probability distribution function of the normal (Gaussian) distribution, and is
therefore also called statistical noise.
Normally a random value is added to the clean pixel value; this is clearly observed in
the Gaussian phenomenon, and the distribution (mean and variance) is the same for all
pixels. An amount of noise is added to each and every pixel value, with the same
amplifier-noise characteristics as AWGN. Gaussian noise mainly arises from the thermal
agitation of charge carriers, often seen within electrical conductors. The bell-shaped
distribution function of the Gaussian variable (bell curve) is shown in Fig. 5.6.
Fig.5.6 The distribution of Gaussian function (bell curve)
In statistics the normal distribution is the most prominent. The main reason is the
central limit theorem: under mild conditions, the sum of a large number of random variables
drawn independently from the same distribution is approximately normally distributed. The
theorem is analytically tractable and can be derived explicitly.
p(f) = (1 / (√(2π) σ)) e^(−(1/2)((f − μ)/σ)²)

where σ² is the variance and σ is the standard deviation.
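The parameters of the bell curve can be checked by sampling (a Python/NumPy sketch; the mean μ = 0 and standard deviation σ = 10 are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.0, 10.0

noise = rng.normal(mu, sigma, 100_000)   # samples of Gaussian (statistical) noise

# The peak of the bell curve p(f) at f = mu is 1 / (sqrt(2*pi) * sigma).
peak = 1.0 / (np.sqrt(2.0 * np.pi) * sigma)
print(noise.mean(), noise.std(), peak)
```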
Salt-and-pepper noise has a fat-tailed distribution; other names are spike noise or
impulsive noise. The main reason for this noise is the loss of pixels while transmitting
images. Comparing the affected pixels with their surroundings reveals a variation in color or
intensity, but it is limited to a small number of pixels. This noise type makes pixels either
dark or saturated, so the value of a corrupted pixel is uncorrelated with the value of the
corresponding pixel in the clean picture. It occurs mainly due to dust in the camera,
overheated or faulty CCD elements, A/D converter errors, etc., and it also causes bit errors
in transmission. The denoising effect of median filters is quite effective on this kind of
noise.
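The effectiveness of the median filter on impulse noise can be sketched as follows (Python/NumPy; the test image, 10% noise density and seed are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(2)
f = np.full((32, 32), 128, dtype=np.uint8)        # clean gray image

# Corrupt ~10% of pixels with salt (255) or pepper (0) impulses.
g = f.copy()
mask = rng.random(f.shape) < 0.10
g[mask] = rng.choice(np.array([0, 255], dtype=np.uint8), mask.sum())

# 3x3 median filter over the interior: the median ignores isolated impulses.
den = g.copy()
for i in range(1, 31):
    for j in range(1, 31):
        den[i, j] = np.median(g[i - 1:i + 2, j - 1:j + 2])

# After filtering, far more interior pixels are back at the clean value.
print((den[1:-1, 1:-1] == 128).mean() > (g[1:-1, 1:-1] == 128).mean())   # True
```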
This noise is caused mainly by heat in the chip, which accumulates photoelectrons and
results in unwanted signals. It occurs even in the absence of light, hence the name
dark-current noise, and it varies with the temperature of the sensor. It can be diminished
during signal acquisition by cooling the camera sensor.
Here the blur is independent and uncorrelated across all spatial coordinates of the
picture itself, as in applications like X-ray and nuclear imaging in medicine. Table 5.1
demonstrates the spatially independent PDFs with their means and variances.
Here the blur depends on, and is spatially correlated with, the coordinates of the figure
itself. The most frequent such noise is periodic noise, which is correlated with the image.
Periodic noise arises mainly from electrical and electromagnetic interference while
acquiring the image.
Table 5.1: Probability density functions (PDFs) of the blur categories, with mean and variance

Gaussian:        p(z) = (1/(√(2π) b)) e^(−(z − a)²/(2b²)), for all z
                 mean m = a;  variance σ² = b²

Rayleigh:        p(z) = (2/b)(z − a) e^(−(z − a)²/b), for z ≥ a; 0 otherwise
                 mean m = a + √(πb/4);  variance σ² = b(4 − π)/4

Erlang (gamma):  p(z) = (a^b z^(b−1) / (b − 1)!) e^(−az), for z ≥ 0; 0 otherwise
                 mean m = b/a;  variance σ² = b/a²

Exponential:     p(z) = a e^(−az), for z ≥ 0; 0 otherwise
                 mean m = 1/a;  variance σ² = 1/a²

Uniform:         p(z) = 1/(b − a), for a ≤ z ≤ b; 0 elsewhere
                 mean m = (a + b)/2;  variance σ² = (b − a)²/12

Salt and pepper  p(z) = P_a for z = a; P_b for z = b; 0 otherwise
(impulse):       mean m = aP_a + bP_b;  variance σ² = (a − m)²P_a + (b − m)²P_b
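The tabulated mean and variance formulas can be spot-checked by sampling, here for the uniform case (a Python/NumPy sketch; the interval endpoints a and b are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
a, b = 10.0, 50.0

z = rng.uniform(a, b, 200_000)        # samples of uniform noise on [a, b]

m_theory   = (a + b) / 2.0            # table: m = (a + b)/2
var_theory = (b - a) ** 2 / 12.0      # table: sigma^2 = (b - a)^2 / 12
print(z.mean(), m_theory)
print(z.var(), var_theory)
```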
The binary image S(r, c) is interpreted as a 2D collection of information; this is called
the pixel representation. Each pixel magnitude refers to a certain feature of the image in the
spatial system, such as the brightness of the image at that point. Finally, the whole image is
transformed into a data matrix with rows and columns to fit the 2D array representation.
Every row in the matrix corresponds to a vector, based on the mapping of many image
formats [95-104].
The elementary type of image representation is called the "binary image". It typically
uses only two levels, referred to as black and white and denoted '1' and '0'. This kind of
representation is considered a 1-bit-per-pixel image, since only a single binary digit is
needed to signify each pixel. Such images are often used to depict low-level information
about a picture, such as its outline or shape, especially in applications like optical character
recognition (OCR), where only the outline of a character is required to recognize the letter it
represents. Binary images are generated from gray-scale images through a technique called
thresholding. Two-level thresholding simply acts as a decision factor: above the threshold a
pixel switches to the numerical '1', and below it to the numerical '0'.
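Two-level thresholding as described above can be sketched as follows (Python/NumPy; the gray levels and the threshold T = 100 are arbitrary choices):

```python
import numpy as np

g = np.array([[ 12, 200],
              [130,  40]], dtype=np.uint8)   # gray-scale input image

T = 100                                      # decision threshold
binary = (g > T).astype(np.uint8)            # above T -> 1, below -> 0
print(binary)
```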
Gray-scale (GS) images are denoted as neutral or single-color pictures. Such images
carry brightness information only; they contain no color data. The brightness, however, is
represented at different levels: a typical 8-bit image holds a range of 0-255 brightness
levels, known as gray levels, where 0 refers to black and 255 refers to white. The 8-bit
depiction is natural given that a computer actually handles data in 8-bit units. Below,
Fig. 5.7 (a) and (b) are two examples of such GS images.
A color image (CI) is modeled as three bands of monochromatic light information,
where each band of information corresponds to a different color.
Fig. 5.9: Illustrations of color images (CI)
The genuine data accumulated in the binary data of the image is the brightness in each
spectral band. When the image is displayed, the corresponding brightness values are depicted
on the screen by picture elements that emit light energy corresponding to a particular color.
Typical color pictures are represented as RGB images. Using the 8-bit monochrome standard
as a model, the corresponding color image has 24 bits per pixel: 8 bits for each of the three
color bands (red, green and blue).

The following figure illustrates, as an additive vector (arrow), the individual
measurements of the red, green and blue values as a color vector (R, G, B).

Fig. 5.11: A color pixel vector comprises the red, green and blue pixel values (R, G, B)
at one given row/column pixel coordinate (r, c)

Fig. 5.12: Color pixel representation
For many applications, the RGB color information is transformed into a mathematical
space that decouples the brightness information from the actual color information. The HSL
(hue, saturation, lightness) transformation lets us describe colors in terms we can more
readily understand. The lightness is the brightness of the color, and the hue is what we
normally think of as "color" (e.g. green, blue, red or orange). The saturation is a measure of
how much white is in the color (e.g. pink is red with extra white, so it is less saturated than
a pure red).
Computer graphics is a specialized field within computer science that refers to the
reproduction of visual information using a computer.

In computer graphics, types of image data are separated into two principal classes:

1. Bitmap images (or raster images): can be represented by our image model I(r, c), where
we have pixel data and corresponding brightness values stored in a file format.

2. Vector images: refer to methods of representing lines, curves and shapes by storing only
the key points. These key points are sufficient to define the shapes, and the process of
turning them into an image is called rendering. After the image has been rendered, it can be
thought of as being in bitmap format, where each pixel has particular values associated
with it.
Image file formats are standardized means of organizing and storing digital images
[101]. Image files are composed of digital data in one of these formats so that the data can
be rasterized for use on a computer display or printer. An image file format may store data
in uncompressed or compressed form, or as vectors. Once rasterized, an image becomes a
grid of pixels, each of which has a number of bits to designate its color, equal to the color
depth of the device displaying it. Most file formats fall into the class of bitmap images. In
general, these types of images contain both header information and the raw pixel data. The
header contains information about the image, such as its dimensions and file type.
Moreover, in some of the more complex file formats, the header may contain information
about the type of compression used and other parameters necessary to recreate the
image, I(r, c).
The BMP format, also called bitmap image file, device-independent bitmap (DIB) file
format, or simply bitmap, is a raster graphics image file format used to store bitmapped
digital images independently of the display device (such as a graphics adapter), notably on
Microsoft Windows and OS/2 operating systems. This file format can store two-dimensional
digital images of arbitrary width, height, and resolution, both monochrome and color, in a
variety of color depths, and optionally with data compression, alpha channels, and color
profiles.
Tagged Image File Format (TIFF) is one of the most popular and flexible of the current
public-domain raster file formats. GIF files are widely used on the World Wide Web
(WWW); they are limited to a maximum of 8 bits/pixel and allow for a type of compression
called LZW. The GIF image header is 13 bytes long and contains basic information.
JPG is the right format for photographic images that must be small files, for instance for
websites or for email, and is often used on digital camera memory cards. A JPG file can be
remarkably small, often compressed to perhaps 1/10 of the size of the original data, which is
a good thing when modems are involved. However, this impressive compression efficiency
comes at a high cost: JPG uses lossy compression (lossy meaning "with losses to quality").
Here, lossy means that image quality is lost when the JPG data is compressed and saved,
and that quality can never be recovered. JPEG compression is used extensively on the
WWW. It is also flexible, so it can produce large files with excellent image quality.
This format was developed for the CVIPtools software. When temporary images are created
that use floating-point representation beyond the standard 8 bits/pixel, remapping is used,
which is the process of taking the original image and applying an equation to translate it to
the range (0-255).
The indexed representation of color images is used by GIF (with a palette of at most 256
colors), together with the LZW (Lempel-Ziv-Welch) compression procedure and a header of
size 13 bytes.
The most primitive methods of generating medical pictures use illumination to produce
photographs, whether of gross anatomical structures or, with a microscope, of histological
specimens. The creation of images is still done with light; however, the internal structures
of the body cannot be seen with visible light.
5.8.1 X-Rays
X-rays were first discovered in 1895 by Wilhelm Conrad Roentgen, an achievement for
which he was awarded the Nobel Prize in Physics in 1901. The discovery generated
worldwide scientific enthusiasm for applications in medicine, and by 1900 there were
already numerous medical radiological societies. Thus, the basis was laid for a new
subdivision of medicine devoted to imaging the structure and function of the body. A
standard X-ray system with an image intensifier is used: X-rays impinging on the image
intensifier are converted into a distribution of electrons, which, after acceleration, produces
an amplified light image on a smaller fluorescent screen.
The image is viewed by a TV camera or a film camera, and can be displayed on a computer
screen and stored on a CD-ROM or a PACS. X-ray technology is the oldest and most
commonly used form of medical imaging. X-rays use ionizing radiation to produce images
of a person's internal structure by sending X-ray beams through the body, which are
absorbed in different amounts depending on the density of the material. Also included as
"X-ray type" devices are mammography, interventional radiology, computed radiography,
digital radiography, and computed tomography (CT). Radiation therapy devices likewise
use X-rays, gamma rays, electron beams, or protons to treat cancer. X-ray imaging is
commonly used to examine:
a) Fractured bones
b) Cavities
c) Objects Swallowed
d) Lungs
e) Human veins
f) Mammography
5.8.2 Ultrasound
Diagnostic ultrasound, also called medical sonography or ultrasonography, uses high-
frequency sound waves to create images of the inside of the body. The ultrasound device
sends sound waves into the body and converts the returning echoes into an image.
Ultrasound technology can also produce audible sounds of blood flow, enabling medical
professionals to use both sound and visuals to assess a patient's condition. Ultrasound is
commonly used to examine:
a) Pregnancy
b) Abnormalities in both heart and veins
Positron emission tomography (PET) is a nuclear imaging technique that gives physicians
information about how tissues and organs are functioning. PET is often combined with CT
imaging: a scanner and a small amount of radiopharmaceutical injected into a patient's vein
help produce detailed, computerized pictures of areas inside the body. PET scans are
commonly used to evaluate:
b) leukemia
c) Effectiveness of treatments
d) Heart conditions
g) Bone wounds
h) Heart tissue
i) Traumatic wounds
j) Heart ailment
5.8.5 PET-CT
For added accuracy, physicians use a medical imaging technique that combines PET and
CT: the images acquired sequentially from the two devices are combined into a single
superposed image. PET-CT serves as a key tool in delineating tumor extent, staging, and
planning patient treatment. The combination has been shown to improve oncologic care by
positively influencing treatment decisions, disease-recurrence monitoring, and patient
outcomes such as disease-free progression.
Magnetic resonance imaging (MRI) obtains detailed images of organs and tissues using
radio waves and a magnetic field. It has proven especially effective at distinguishing
affected tissue within clusters that contain a mixture of healthy and diseased tissue, which
makes MRI one of the most widely used techniques in medicine today. MRI is commonly
used to examine:
a) Veins
b) Abnormal tissue
c) Breasts
f) Spinal wounds
Image quality characterizes how an image deals with perceived degradation (typically,
compared to an ideal image). Imaging systems may introduce some amount of noise or
artifacts into the signal, so quality assessment is an important problem.
There are various methods and metrics that can be computed objectively and evaluated
automatically by a computer program. They are classified as Full-Reference (FR) and
No-Reference (NR) schemes. In FR image-quality schemes, the quality of the test image is
evaluated by comparing it with a reference image that is assumed to have perfect quality.
NR metrics attempt to assess the quality of an image without any reference to the original.
Image-quality indices try to quantify one or more of the various factors that determine
image quality. Only some of the common image-quality metrics are used in this thesis;
their meaning and calculation are explained in detail as follows.
5.9.1 Mean Square Error (MSE)
MSE = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} (A_{ij} - B_{ij})^2    (2.1)
where A_{ij} is the pixel density of the ideal image and B_{ij} is the pixel density of the
fused image.
Mean square error is the most commonly used measurement for estimating error, where the
error value is the difference between the actual data and the resultant data. The mean of the
square of this error gives the average difference between the expected/ideal result and the
obtained/computed result. Here, the calculation is carried out at the pixel level: a total of
m*n pixels are considered, with A(i,j) the pixel density of the ideal image and B(i,j) that of
the fused image. The difference between the pixel densities of the ideal and fused images is
squared, and the mean of these squared differences is taken as the error. The MSE value is
0 if the two images are identical.
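Eq. (2.1) can be sketched in a few lines of NumPy (the small arrays below are illustrative, not data from this thesis):

```python
import numpy as np

def mse(A, B):
    """Mean Square Error (Eq. 2.1) between an ideal image A and a fused image B."""
    A = np.asarray(A, dtype=np.float64)
    B = np.asarray(B, dtype=np.float64)
    return float(np.mean((A - B) ** 2))

ideal = np.array([[10, 20], [30, 40]])
fused = np.array([[12, 18], [30, 44]])
print(mse(ideal, ideal))   # 0.0 -- identical images give MSE = 0
print(mse(ideal, fused))   # (4 + 4 + 0 + 16) / 4 = 6.0
```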
AD = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} (A_{ij} - B_{ij})    (2.2)
where A_{ij} and B_{ij} are the pixel densities of the ideal and fused images, respectively.
The differences of the corresponding pixel density values are averaged to obtain the metric.
This metric gives the overall average difference between corresponding pixels of the two
images, yielding a value that specifies how different the fused image is from the ideal
image.
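A sketch of Eq. (2.2); note the differences are signed, so positive and negative errors can cancel (the sample arrays are illustrative):

```python
import numpy as np

def average_difference(A, B):
    """Average Difference (Eq. 2.2): mean of the signed pixel differences."""
    return float(np.mean(np.asarray(A, np.float64) - np.asarray(B, np.float64)))

ideal = np.array([[10, 20], [30, 40]])
fused = np.array([[8, 20], [30, 38]])
print(average_difference(ideal, fused))   # (2 + 0 + 0 + 2) / 4 = 1.0
```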
SC = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij}^2}{\sum_{i=1}^{m} \sum_{j=1}^{n} B_{ij}^2}    (2.3)
where A_{ij} and B_{ij} are the pixel densities of the ideal and fused images, respectively.
Structural Content is the ratio between the content of the expected and the obtained data:
essentially, the ratio between the net sum of the squares of the expected data and the net
sum of the squares of the obtained data. In our case we compute the ratio between the net
sum of the squared pixel densities of the ideal image and the net sum of the squared pixel
densities of the fused image. If the fused and ideal images are identical, the Structural
Content value is 1; however, SC being 1 does not imply that the fused and ideal images
are identical.
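A sketch of Eq. (2.3), including a case that demonstrates the caveat above: SC = 1 does not guarantee the images are identical (illustrative arrays):

```python
import numpy as np

def structural_content(A, B):
    """Structural Content (Eq. 2.3): ratio of summed squared pixel densities."""
    A = np.asarray(A, np.float64)
    B = np.asarray(B, np.float64)
    return float(np.sum(A ** 2) / np.sum(B ** 2))

ideal = np.array([[1, 2], [3, 4]])
print(structural_content(ideal, ideal))        # 1.0 for identical images
print(structural_content(ideal, 2 * ideal))    # 0.25: fused image has 4x the energy
print(structural_content([[1, 2]], [[2, 1]]))  # 1.0, yet the images differ
```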
MD = \max_{i,j} |A_{ij} - B_{ij}|    (2.4)
where A_{ij} and B_{ij} are the pixel densities of the ideal and fused images, respectively.
Maximum Difference is a very basic metric that gives the largest of the corresponding pixel
errors. The difference between each pair of corresponding pixel densities is calculated, and
the maximum of these differences is taken as the metric. This metric shows a high value if
a significant difference exists in any part of the two images.
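A sketch of Eq. (2.4) with illustrative arrays:

```python
import numpy as np

def max_difference(A, B):
    """Maximum Difference (Eq. 2.4): largest absolute pixel error."""
    diff = np.asarray(A, np.float64) - np.asarray(B, np.float64)
    return float(np.max(np.abs(diff)))

ideal = np.array([[10, 20], [30, 40]])
fused = np.array([[10, 25], [29, 40]])
print(max_difference(ideal, fused))   # 5.0 -- the single worst pixel error
```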
LMSE = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} (\nabla^2 A_{ij} - \nabla^2 B_{ij})^2}{\sum_{i=1}^{m} \sum_{j=1}^{n} (\nabla^2 A_{ij})^2}    (2.5)
where \nabla^2 denotes the Laplacian operator.
Laplacian Mean Square Error is, as the name suggests, the usual mean-square-error
calculation, except that the error is computed not from the expected and obtained data
directly but from their Laplacians. The Laplacian of each value is computed, then the net
sum of the squared errors (differences) is calculated and divided by the sum of the squares
of the Laplacian of the expected data.
The Structural Similarity Index (SSIM) measures the similarity between two images. It is
an improved version of traditional techniques like PSNR and MSE, and is considered one
of the most effective and consistent metrics.
(a) K is a vector of constants in the SSIM index formula, with default value K = [0.01 0.03].
(b) L is the dynamic range of the images; in our case the default value is L = 255.
C_1 = (K_1 L)^2
C_2 = (K_2 L)^2
G is a Gaussian filter mask with the default value given in MATLAB as
fspecial('gaussian', 11, 1.5). The input images A and B are low-pass filtered with G,
giving \mu_1 and \mu_2 respectively. The filtering (convolution) operation is denoted
by ".":
\mu_1 = A.G
\mu_2 = B.G
Then the values \sigma_1^2, \sigma_2^2 and \sigma_{12} are calculated based on the
following formulas:
\sigma_1^2 = (A^2.G) - \mu_1^2
\sigma_2^2 = (B^2.G) - \mu_2^2
\sigma_{12} = (AB.G) - \mu_1\mu_2
Once these values are calculated, the SSIM value is finally computed using the following
formula:
SSIM = \frac{(2\mu_1\mu_2 + C_1)(2\sigma_{12} + C_2)}{(\mu_1^2 + \mu_2^2 + C_1)(\sigma_1^2 + \sigma_2^2 + C_2)}
Regarding image-quality metric readings, what matters most is the absolute value. Our
main objective is to identify the difference/error value in the fused image with respect to
the ideal image, so the sign does not hold much significance in our case.
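The SSIM computation described above can be sketched as follows. This is an approximation: SciPy's isotropic gaussian_filter with sigma = 1.5 stands in for MATLAB's 11x11 fspecial window, and the mean of the per-pixel SSIM map is returned:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ssim(A, B, L=255.0, K=(0.01, 0.03), sigma=1.5):
    """Mean SSIM between images A and B, following the formula above."""
    A = np.asarray(A, np.float64)
    B = np.asarray(B, np.float64)
    C1, C2 = (K[0] * L) ** 2, (K[1] * L) ** 2
    mu1, mu2 = gaussian_filter(A, sigma), gaussian_filter(B, sigma)  # local means
    s1 = gaussian_filter(A * A, sigma) - mu1 ** 2                    # local variances
    s2 = gaussian_filter(B * B, sigma) - mu2 ** 2
    s12 = gaussian_filter(A * B, sigma) - mu1 * mu2                  # local covariance
    ssim_map = ((2 * mu1 * mu2 + C1) * (2 * s12 + C2)) / \
               ((mu1 ** 2 + mu2 ** 2 + C1) * (s1 + s2 + C2))
    return float(ssim_map.mean())
```

For identical images every local term matches, so the map is 1 everywhere and the index is exactly 1; any mismatch pushes the value below 1.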
FORGERIES
Random forgeries: are formed without any knowledge of the signer's name or signature
shape.
Simple forgeries: are produced by people who know the signer's name but have no example
of the signature.
Skilled forgeries: are produced by people looking at the original signature image and trying
to imitate it as closely as possible.
Forgeries range from casual (random) forgeries to skilled forgeries, the latter including
professional, amateur, over-the-shoulder, and home-improved forgeries. Various kinds of
forgeries are classified into the following types:
Random Forgery
The forger uses the name of the victim in his own style to create a forgery known as a
simple or random forgery. This type accounts for the majority of forgery cases, although
such forgeries are very easy to detect, even with the naked eye.
Unskilled Forgery
The forger imitates the signature in his own style, without any knowledge of the spelling
and without any prior experience. The imitation is preceded by observing the signature
closely for a while.
Skilled Forgery
Undoubtedly the most difficult of all forgeries, the skilled forgery is created by professional
impostors or persons who have experience in copying signatures. To achieve this, one can
either trace the signature or imitate it freehand. The figure shows the different types of
forgeries and how much they vary from the original signature.
(a) Genuine Signature; (b) Random Forgery; (c) Simulated Simple Forgery; (d)
Simulated Skilled Forgery
About Forgeries: Initially, a set of signatures is obtained from the subject and fed to the
system. These signatures are pre-processed, and the pre-processed images are then used to
extract relevant geometric parameters that can distinguish the signatures of different
persons.
These are used to train the system. The mean value of these features is obtained. In the next
step the scanned signature image to be verified is fed to the system. It is pre-processed to be
suitable for extracting features. It is fed to the system and various features are extracted from
them. These values are then compared with the mean features that were used to train the
system. The Euclidean distance is calculated and a suitable threshold per user is chosen.
Depending on whether the input signature satisfies the threshold condition the system either
accepts or rejects the signature. Handwritten signature verification is the process of
confirming the identity of a user using the handwritten signature of the user as a form of
behavioural biometrics [1][2]. Automatic handwritten signature verification has been
studied for decades, and many early research attempts were reviewed in survey papers [3].
advantage that signature verification has over other forms of biometric technologies, such as
fingerprints or voice verification, is that handwritten signature is already the most widely
accepted biometric for identity verification in society for years. The long history of trust of
signature verification means that people are very willing to accept a signature-based
biometric authentication system [4].
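The pipeline described above (mean feature template, Euclidean distance, per-user threshold) can be sketched as follows; the three-dimensional "geometric feature" vectors and the threshold value are hypothetical placeholders, not values from this thesis:

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def train_template(feature_vectors):
    """Mean feature vector computed from a user's enrolment signatures."""
    n = len(feature_vectors)
    return [sum(col) / n for col in zip(*feature_vectors)]

def verify(template, features, threshold):
    """Accept iff the distance from the template is within the per-user threshold."""
    return euclidean(template, features) <= threshold

# Hypothetical 3-dimensional geometric feature vectors for one enrolled user.
enrolment = [[10.0, 4.0, 0.5], [12.0, 4.2, 0.7], [11.0, 3.8, 0.6]]
template = train_template(enrolment)
print(verify(template, [11.2, 4.1, 0.55], 1.0))   # True  -- close to the template
print(verify(template, [20.0, 9.0, 2.0], 1.0))    # False -- far away, rejected
```

In practice the threshold is tuned per user to trade off false acceptances against false rejections.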
Types of signature verification:
There are two approaches to signature verification, according to how the data is acquired:
Offline signature verification system
Online signature verification system
Offline signature verification system
In offline verification system, only static features are considered, whereas in case of online
systems, dynamic features are taken into consideration. Offline signature recognition is
performed after the writing is complete. The data is captured at a later time by using an
optical scanner to convert the image into a bit pattern. As compared to online signature
verification systems, off-line systems are difficult to design as many desirable characteristics
such as the order of strokes, the velocity and other dynamic information are not available in
the off-line case. The verification process has to fully rely on the features that can be
extracted from the trace of the static signature image only. Although difficult to design, off-
line signature verification is crucial for determining the writer identification as most of the
financial transactions in present times are still carried out on paper. Therefore, it becomes all
the more essential to verify a signature for its authenticity. The design of any signature
verification system generally requires the solution of five subproblems: data acquisition, pre-
processing, feature extraction, comparison process and performance evaluation. Off-line
verification just deals with signature images acquired by a scanner or a digital camera. In an
off-line signature verification system, a signature is acquired as an image.
This image represents a personal style of human handwriting, extensively described
by the graphometry. In such a system the objective is to detect different types of forgeries,
which are related to intra and inter-personal variability. The system applied should be able to
overlook inter-personal variability and mark these as original, and should be able to detect
intra-personal variability and mark them as forgeries. In off-line signature recognition we
have the signature template coming from an imaging device, hence we have only the static
characteristics of the signatures. The person need not be present at the time of verification.
Hence off-line signature verification is convenient in various situations like document
verification, banking transactions etc. As we have a limited set of features for verification
purpose, off-line signature recognition systems need to be designed very carefully to achieve
the desired accuracy. The difference between off-line and on-line lies in how the data are
obtained: in an on-line SRVS, data are obtained using a special peripheral device, while in
an off-line SRVS, images of the signature written on paper are obtained using a scanner or
a camera [2]. In this research, an approach for off-line signature recognition and
verification is proposed. The designed system consists of three stages: the first is the
pre-processing stage, which applies operations and filters to improve and enhance the
signature image.
The purpose of the pre-processing stage is to produce the best signature image for the next
stage, feature extraction; choosing the right features is more an art than a science. Three
powerful features are used: a global feature, a texture feature, and a grid information
feature [3]. The three features are calculated for each signature image and passed to the
last stage, the neural network stage. The neural network consists of two classifier stages:
the first classifier stage contains three back-propagation (BP) neural networks, each taking
its input from one of the three features and trained independently of the others.
Each BP has two outputs that serve as inputs to the second-stage classifier, which consists
of two radial basis function (RBF) neural networks. It is the task of the second classifier
(RBF) to combine the results of the first classifier (BP) and make the final decision of the
system [4]. Offline verification is concerned with verifying a signature made with an
ordinary pen. Various approaches to both classes have been proposed.
CHAPTER 4
INTRODUCTION TO MATLAB
The name MATLAB stands for matrix laboratory. MATLAB was originally written to
provide easy access to matrix software developed by the LINPACK and EISPACK projects.
Today, MATLAB engines incorporate the LAPACK and BLAS libraries, embedding the
state of the art in software for matrix computation.
MATLAB has evolved over a period of years with input from many users. In university
environments, it is the standard instructional tool for introductory and advanced courses in
mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-
productivity research, development, and analysis.
MATLAB features a family of add-on application-specific solutions called toolboxes.
Very important to most users of MATLAB, toolboxes allow you to learn and apply
specialized technology. Toolboxes are comprehensive collections of MATLAB functions (M-
files) that extend the MATLAB environment to solve particular classes of problems. Areas in
which toolboxes are available include signal processing, control systems, neural networks,
fuzzy logic, wavelets, simulation, and many others.
Matlab Desktop is the main Matlab application window. The desktop contains five sub
windows, the command window, the workspace browser, the current directory window, the
command history window, and one or more figure windows, which are shown only when the
user displays a graphic.
The command window is where the user types MATLAB commands and expressions
at the prompt (>>) and where the output of those commands is displayed. MATLAB defines
the workspace as the set of variables that the user creates in a work session. The workspace
browser shows these variables and some information about them. Double clicking on a
variable in the workspace browser launches the Array Editor, which can be used to obtain
information and, in some instances, edit certain properties of the variable.
The current Directory tab above the workspace tab shows the contents of the current
directory, whose path is shown in the current directory window. For example, in the windows
operating system the path might be as follows: C:\MATLAB\Work, indicating that directory
“work” is a subdirectory of the main directory “MATLAB”, which is installed in drive C.
Clicking on the arrow in the current directory window shows a list of recently used
paths. Clicking on the button to the right of the window allows the user to change the current
directory.
MATLAB uses a search path to find M-files and other MATLAB-related files, which are
organized in directories in the computer's file system. Any file run in MATLAB must
reside in the current directory or in a directory that is on the search path. By default, the
files supplied with MATLAB and MathWorks toolboxes are included in the search path.
The easiest way to see which directories are on the search path, or to add or modify a
search path, is to select Set Path from the File menu on the desktop, and then use the Set
Path dialog box. It is good practice to add any commonly used directories to the search
path to avoid repeatedly having to change the current directory.
The Command History Window contains a record of the commands a user has entered
in the command window, including both current and previous MATLAB sessions. Previously
entered MATLAB commands can be selected and re-executed from the command history
window by right clicking on a command or sequence of commands.
This action launches a menu from which various options can be selected in addition to
executing the commands. This is a useful feature when experimenting with various
commands in a work session.
The MATLAB editor is both a text editor specialized for creating M-files and a
graphical MATLAB debugger. The editor can appear in a window by itself, or it can be a sub
window in the desktop. M-files are denoted by the extension .m, as in pixelup.m. The
MATLAB editor window has numerous pull-down menus for tasks such as saving, viewing,
and debugging files. Because it performs some simple checks and also uses color to
differentiate between various elements of code, this text editor is recommended as the tool of
choice for writing and editing M-functions. To open the editor, type edit filename at the
prompt; this opens the M-file filename.m in an editor window, ready for editing. As noted
earlier, the file must be in the current directory, or in a directory in the search path.
Image Representation
3.1 Image Format
An image is a rectangular array of values (pixels).
Each pixel represents the measurement of some property of a scene measured over a finite
area. The property could be many things, but we usually measure either the average
brightness (one value) or the brightnesses of the image filtered through red, green and blue
filters (three values). The values are normally represented by an eight bit integer, giving a
range of 256 levels of brightness. We talk about the resolution of an image: this is defined by
the number of pixels and number of brightness values. A raw image will take up a lot of
storage space. Methods have been defined to compress the image by coding redundant data in
a more efficient fashion, or by discarding the perceptually less significant information.
MATLAB supports reading all of the common image formats. Image coding is not addressed
in this course unit.
3.2 Image Loading, Displaying and Saving
An image is loaded into working memory using the command
>> f = imread('filename.ext');
The semicolon at the end of the command suppresses MATLAB output. Without it,
MATLAB will execute the command and echo the results to the screen. We assign the image
to the array f. If no path is specified, MATLAB will look for the image file in the current
directory. The image can be displayed using >> imshow(f, G)
f is the image to be displayed, G defines the range of intensity levels used to display it. If it is
omitted, the default value 256 is used. If the syntax [low, high] is used instead of G, values
less than low are displayed as black, and ones greater than high are displayed as white.
Finally, if low and high are left out, i.e. use [ ], low is set to the minimum value in the image
and high to the maximum one, which is useful for automatically fixing the range of the image
if it is very small or vary large. Images are usually displayed in a figure window. If a second
image is displayed it will overwrite the first, unless the figure function is used:
>> figure, imshow(f)
will generate a new figure window and display the image in it. Note that multiple functions
may be called in a single line, provided they are separated by commas. An image array may
be written to file using:
>> imwrite(f, 'filename.ext')
The format of the file can be inferred from the file extension, or can be specified by a third
argument. Certain file formats have additional arguments.
3.3 Image Information
Information about an image file may be found by
>> imfinfo('filename.ext')
4 Quantisation
4.1 Grey Level Ranges
Images are normally captured with pixels in each
channel being represented by eight bit integers. (This is partly for historical reasons – it has
the convenience of being a basic memory unit, it allows for a suitable range of values to be
represented, and many cameras could not capture data to any greater accuracy. Further, most
displays are limited to eight bits per red, green and blue channel.) But there is no reason why
pixels should be so limited, indeed, there are devices and applications that deliver and require
higher resolution data.
MATLAB provides
functions for changing images from one type to another. The syntax is
>> B = data_class_name(A)
where data_class_name is one of the data types in the above table, e.g.
>> B = uint8(A)
4.2 Number of Pixels
Images come in all sizes, but are (almost) always rectangular.
MATLAB gives several methods of accessing the elements of an array, i.e. the pixels of an
image. An element can be accessed directly: typing the array name at the prompt will return
all the array elements (which could take a while), typing the array name followed by element
indices in round brackets, will return that value. Ranges of array elements can be accessed
using colons.
>> A(first:last)
will return the first through last elements, inclusive, of the one-dimensional array A. Note
that the indices start at one.
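For readers following along outside MATLAB, the same slicing ideas can be sketched in NumPy; note the assumptions of this aside: zero-based indexing and exclusive end indices, unlike MATLAB's one-based inclusive ranges:

```python
import numpy as np

f = np.arange(12).reshape(3, 4)   # a small 3x4 "image"

first_row = f[0, 0:3]             # elements 0..2 of the first row (end index exclusive)
every_other = f[0, 0:4:2]         # stepped range, like MATLAB's first:step:last
flipped = f[::-1, :]              # vertical flip, like f(end:-1:1, :) in MATLAB

print(list(first_row))            # [0, 1, 2]
print(list(every_other))          # [0, 2]
print(flipped[0, 0])              # 8 -- the old bottom-left pixel is now top-left
```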
>> A(first:step:last)
will return every step-th element, starting from first and finishing when last is reached or
exceeded. Step can be negative, in which case first must be greater than last. Naturally,
this notation can be extended to access portions of an image. An image, f, could be flipped
using
>> fp = f(end:-1:1, :);
The keyword end is used to signify the last index. Using the colon alone implies that all
index values are traversed; this also indicates how multi-dimensional arrays are accessed.
A section of an image could be abstracted using
>> fc = f(r1:r2, c1:c2);
4.2.1 A note on colour images
If the input image is colour, these operations will return greyscale results. A colour image
has three values per pixel, which are accessed using a third index.
>> A(x, y, :)
would return all three colour values of the pixel at (x, y). A colour plane could be
abstracted using
>> R = A(:, :, 1);
Point Processing
Point processing operations manipulate individual pixel values, without regard to any
neighbouring values. Two types of transforms can be identified, manipulating the two
properties of a pixel: its value and its position.
5.1 Value Manipulation
The fundamental value of a pixel is its brightness (in a monochrome image) or colour (in a
multichannel image).
>> g = imadjust(f, [low_in high_in], [low_out high_out], gamma);
This takes the input range of values as specified and maps them to the output range that’s
specified. Values outside of the input range are clamped to the extremes of the output range
(values below low_in are all mapped to low_out). The range values are expected to be in the
interval [0, 1]. The function scales them to values appropriate to the image type before
applying the scaling operation. Whilst low_in is expected to be less than high_in, the same is
not true for low_out and high_out. The image can therefore be inverted. The value of gamma
specifies the shape of the mapped curve. Gamma = 1 gives a linear scaling, a smaller gamma
gives a mapping that expands the scale at lower values, a larger gamma expands the upper
range of the scale. This can make the contrast between darker or brighter tones more visible,
respectively. Omitting any of the parameters results in default values being assumed. The
extremes of the ranges are used (0 and 1), or gamma = 1.
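The mapping just described can be sketched for images scaled to [0, 1]; this is a simplified stand-in for the MATLAB function, reusing its parameter names:

```python
import numpy as np

def adjust(f, low_in=0.0, high_in=1.0, low_out=0.0, high_out=1.0, gamma=1.0):
    """Clamp to [low_in, high_in], apply a gamma curve, map to [low_out, high_out]."""
    f = np.clip(np.asarray(f, np.float64), low_in, high_in)   # clamp out-of-range values
    g = ((f - low_in) / (high_in - low_in)) ** gamma          # gamma-shaped rescaling
    return low_out + g * (high_out - low_out)

print(float(adjust(0.25, gamma=2.0)))                  # 0.0625 -- darker tones compressed
print(float(adjust(0.25, low_out=1.0, high_out=0.0)))  # 0.75  -- reversed output range inverts
```

With gamma = 1 and full ranges the mapping is the identity; swapping low_out and high_out inverts the image, as described above.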
Histogram
The histogram of an image measures the number of pixels with a given grey or colour value.
Histograms of colour images are not normally used, so will not be discussed here. The
histogram of an image with L distinct intensity levels in the range [0, G] is defined as the
function h(rk) = nk.
rk is the k th intensity level in the image, and nk will be the number of pixels with grey value
rk . G will be 255 for a uint8 image, 65536 for uint16 and 1.0 for a double image. Since the
lower index of MATLAB arrays is one, never zero, r1 will correspond to intensity level 0,
etc. For the integer valued images, G = L-1. We often work with normalised histograms. A
normalised histogram is obtained by dividing each element of h(rk) by the total number of
pixels in the image (equal to the sum of histogram elements). Such a histogram is called the
probability density function (pdf) and reflects the probability of a given intensity level
occurring. p(rk) = nk /n MATLAB functions for computing and displaying the histogram and
normalised histogram are: >> h = imhist(A, b); >> p = imhist(A, b)/numel(A); b is the
number of bins in the histogram. If it is omitted, 256 is assumed. The function numel(A)
returns the number of elements in the argument, in this case the number of pixels in A.
Additional functions are defined for displaying the data in different forms: notably as a bar
graph and for adding axes.
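A NumPy sketch of the histogram and normalised histogram (pdf) for a uint8-style image, paralleling the imhist calls above (256 bins assumed; the tiny image is illustrative):

```python
import numpy as np

def histogram_pdf(img, bins=256):
    """Histogram h(r_k) = n_k and normalised histogram p(r_k) = n_k / n."""
    img = np.asarray(img)
    h, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = h / img.size                  # divide by the total number of pixels
    return h, p

img = np.array([[0, 0], [128, 255]], dtype=np.uint8)
h, p = histogram_pdf(img)
print(h[0], h[128], h[255])   # 2 1 1
print(p.sum())                # 1.0 -- a pdf sums to one
```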
CHAPTER 4
PROPOSED METHOD
BLOCK DIAGRAM
Convolutional Neural Networks
Convolutional Neural Networks (CNNs) have proven successful in recent years at a large
number of image processing-based machine learning tasks. Many other methods of
performing such tasks revolve around a process of feature extraction, in which hand-
chosen features extracted from an image are fed into a classifier to arrive at a
classification decision. Such processes are only as strong as the chosen features, which
often take large amounts of care and effort to construct. By contrast, in a CNN, the
features fed into the final linear classifier are all learned from the dataset. A CNN consists
of a number of layers, starting at the raw image pixels, which each perform a simple
computation and feed the result to the next layer, with the final result being fed to a linear
classifier. The layers’ computations are based on a number of parameters which are
learned through the process of backpropagation, in which for each parameter, the gradient
of the classification loss with respect to that parameter is computed and the parameter is
updated with the goal of minimizing the loss function. Exactly how this update is done
and what the loss function is are tunable hyperparameters of the network, discussed in
more detail below. For more details on backpropagation, see [5].
3.2. VGG Architecture
The architecture of a CNN determines how many layers it has, what each of these layers
is doing, and how the layers are connected to each other. Choosing a good architecture is
crucial to successful learning with a CNN. For our main training tasks, we used the VGG-
16 CNN architecture [6]. This network contains a total of 16 layers with learnable
parameters. These layers are of the following types: Fully Connected Layers: Fully
connected layers apply an affine transformation to their inputs. Mathematically, a fully-connected
layer from n inputs to h outputs computes y = Wx + b, where x is the length-n input
vector, W is an h x n weight matrix, and b is a length-h bias vector.
In a fully-connected layer, every output depends on every input
according to the weight matrix W, a learnable parameter. Outputs also depend on a bias term
b which is learnable but doesn’t depend on the inputs. ReLU Nonlinearity: The Rectified
Linear Unit is a commonly used activation function after fully-connected layers. This layer
applies the following elementwise operation to its input X: f(X) = max(0, X).
Softmax Loss: At the top of the network, the class scores are converted into probabilities and
the network is trained with the softmax (cross-entropy) loss, Li = -log(e^(s_yi) / Σj e^(s_j)),
where yi is the correct class label. This is a standard loss function for CNNs.
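To make the layer definitions concrete, here is a minimal pure-Python sketch of a fully-connected layer, the ReLU nonlinearity, and a standard softmax (cross-entropy) loss for a single example; the shapes, seed, and helper names are illustrative, not taken from our implementation:

```python
import math
import random

def fully_connected(x, W, b):
    # Affine transformation y = Wx + b: every output mixes every input via W,
    # plus a bias term that does not depend on the input.
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def relu(x):
    # Rectified Linear Unit: elementwise max(0, x).
    return [max(0.0, v) for v in x]

def softmax_loss(scores, y):
    # Cross-entropy of the softmax probabilities; y is the correct class index.
    m = max(scores)                        # shift scores for numerical stability
    exps = [math.exp(s - m) for s in scores]
    return -math.log(exps[y] / sum(exps))

random.seed(0)
x = [random.gauss(0, 1) for _ in range(4)]                      # n = 4 inputs
W = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]  # h = 3 outputs
b = [0.0, 0.0, 0.0]
scores = relu(fully_connected(x, W, b))
loss = softmax_loss(scores, y=1)
print(loss >= 0.0)                         # True: cross-entropy is non-negative
```

The loss is zero only when all probability mass sits on the correct class, which is why, unlike the hinge loss, it keeps pushing scores toward perfection.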
One important feature of this loss function is that unlike other choices such as the Hinge Loss
function, it does not become satisfied with results that are "good enough" and seeks to push
the class scores closer and closer to perfection, corresponding to all the probability mass
being assigned to the correct class. Using the gradient of this loss function, we update all
parameters in the network using the Nesterov Momentum update. Our reasoning for using
this particular update method is explained in our Results section. This update keeps track of a
velocity variable "v", a decaying running sum of past gradients, so the size and direction of
each step depend on the history of previous updates. This makes the update more tolerant to
different values of the learning rate, and it tends to converge relatively quickly because it
adapts over time to how quickly the parameters are changing.
3.4. Transfer Learning
VGG-16 is a large neural network with a huge number of learnable parameters. Effectively
training a network this large
from scratch requires a large dataset and access to substantial computational resources.
However, this problem can be averted by leveraging similarities between different image
datasets. Specifically, for most reasonable image classification tasks, the low-level features
learned by the first few layers of a network will be roughly the same regardless of the dataset.
This means that we can initialize our neural network with parameter values learned from a
different dataset and expect that the values for the first layers of the network will work well
without having to be trained. This process is known as transfer learning. For our task, we
downloaded pretrained weights trained on ImageNet for the VGG-16 architecture by
following the directions here:
We initialized our model with the pretrained weights and allowed only the fully-connected
layers at the end to train, keeping the convolutional layers fixed throughout training (this is a
decision we came to based on testing different options, see results for further discussion).
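As a sketch of how the update rule and the layer freezing fit together, the toy loop below applies the Nesterov momentum update only to a "trainable" parameter while a "frozen" one is left untouched; the scalar quadratic loss and the hyperparameter values are illustrative assumptions, not the settings we used:

```python
def nesterov_step(p, v, grad, lr=0.01, mu=0.9):
    # Nesterov momentum: v is a velocity (a decaying running sum of past
    # gradients) and the parameter takes a "look-ahead" corrected step.
    v_prev = v
    v = mu * v - lr * grad
    p = p - mu * v_prev + (1.0 + mu) * v
    return p, v

# Toy model: one trainable (fully-connected) scalar and one frozen
# (convolutional) scalar, minimising the loss 0.5 * p**2 (gradient = p).
params = {"fc": 1.0, "conv": 1.0}
vel = {"fc": 0.0}
trainable = {"fc"}                 # convolutional weights stay fixed

for _ in range(200):
    for name in trainable:
        grad = params[name]        # d/dp of 0.5 * p**2
        params[name], vel[name] = nesterov_step(params[name], vel[name], grad)

print(params["conv"])              # 1.0 -- frozen, never updated
```

Only the parameters in the trainable set receive gradients and velocities, which is exactly the effect of fixing the convolutional layers during training.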
In our final experiment, we make use of a model proposed in [7] for image patch comparison.
The paper proposes several models for image patch comparison, including siamese (similar to
[8]), pseudo siamese, and 2-channel networks. In the 2-channel network, two images are
stacked (i.e. each image serves as a channel for the composite, paired image) as the input to a
convolutional network. The convolutional network then feeds into a fully-connected linear
decision layer with one output that indicates the similarity of the two patches. They found
that the 2-channel network outperforms the other proposed architectures. Further details about our
adaptation are in the Results section. The following is a sketch of a 2-channel architecture:
Figure 2. 2-channel network for image patch comparison [7]
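The difference between the siamese and 2-channel inputs can be shown with a toy sketch (pure Python, illustrative sizes):

```python
# Two grayscale signature patches, stored as nested lists (H x W).
H, W = 4, 4                                    # tiny illustrative size
patch_a = [[0.0] * W for _ in range(H)]
patch_b = [[1.0] * W for _ in range(H)]

# Siamese-style: each patch is fed separately to twin networks.
siamese_inputs = (patch_a, patch_b)

# 2-channel: the patches are stacked so each serves as one channel of a
# single composite input of shape (channels=2, H, W).
two_channel_input = [patch_a, patch_b]
print(len(two_channel_input), len(two_channel_input[0]), len(two_channel_input[0][0]))
```

The network then sees both signatures jointly from the first layer onward, rather than comparing independently computed descriptors.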
achieve the results for the test set. All of our tasks involve the test-time behavior of
classifying whether a given signature is forged or genuine. We evaluate our performance
using three metrics: classification accuracy, False Acceptance Rate, and False Rejection Rate.
They are defined as follows, where genuine signatures are considered "positive" examples:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
False Acceptance Rate (FAR) = FP / (FP + TN)
False Rejection Rate (FRR) = FN / (FN + TP)
Here TP and TN count correctly classified genuine and forged signatures, FP counts forgeries
accepted as genuine, and FN counts genuine signatures rejected as forgeries.
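With genuine signatures as the positive class, the metrics follow directly from the four confusion counts; a small sketch with hypothetical counts:

```python
def verification_metrics(tp, tn, fp, fn):
    # Genuine signatures are "positive": FP = forgery accepted as genuine,
    # FN = genuine signature rejected as a forgery.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    far = fp / (fp + tn)   # False Acceptance Rate: fraction of forgeries accepted
    frr = fn / (fn + tp)   # False Rejection Rate: fraction of genuines rejected
    return accuracy, far, frr

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
acc, far, frr = verification_metrics(tp=90, tn=80, fp=20, fn=10)
print(acc, far, frr)   # 0.85 0.2 0.1
```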
CHAPTER 7
Conclusions
REFERENCES
[2] D.-Y. Yeung, H. Chang, Y. Xiong, S. E. George, R. S. Kashi, T. Matsumoto, and G. Rigoll,
"SVC2004: First International Signature Verification Competition," in ICBA 2004, LNCS,
vol. 3072, pp. 16-22. Springer, Heidelberg, 2004.
[3] M. Liwicki, M. Malik, C. van den Heuvel, X. Chen, C. Berger, R. Stoel, M. Blumenstein,
and B. Found, "Signature Verification Competition for Online and Offline Skilled Forgeries
(SigComp2011)," in 2011 International Conference on Document Analysis and Recognition.
[4] J. Vargas, M. Ferrer, C. Travieso, and J. Alonso, "Off-line Handwritten Signature
GPDS-960 Corpus," in Ninth International Conference on Document Analysis and Recognition
(ICDAR 2007), vol. 2, Sep. 2007, pp. 764-768.
[5] A. Karpathy, "Backpropagation, Intuitions," from Stanford's CS231n,
http://cs231n.github.io/optimization-2/, 2015.
[6] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image
Recognition," in ICLR 2015.
[7] S. Zagoruyko and N. Komodakis, "Learning to Compare Image Patches via Convolutional
Neural Networks," in CVPR 2015, arXiv:1504.03641v1.
[9] H. Baltzakis and N. Papamarkos, "A new signature verification technique based on a
two-stage neural network classifier," Engineering Applications of Artificial Intelligence,
vol. 14, no. 1, pp. 95-103, 2001.
[11] J.-P. Drouhard, R. Sabourin, and M. Godbout, "A neural network approach to off-line
signature verification using directional PDF," Pattern Recognition, vol. 29, no. 3,
pp. 415-424, 1996.
[12] P. S. Deng, H.-Y. M. Liao, C. W. Ho, and H.-R. Tyan, "Wavelet-Based Off-Line
Handwritten Signature Verification," Computer Vision and Image Understanding, vol. 76,
no. 3, pp. 173-190, Dec. 1999.
[17] K. Huang and H. Yan, "Off-line signature verification based on geometric feature
extraction and neural network classification," Pattern Recognition, vol. 30, no. 1,
pp. 9-17, Jan. 1997.
[19] E. Özgündüz, T. Şentürk, and M. E. Karslıgil, "Off-line signature verification and
recognition by support vector machine," in European Signal Processing Conference
(EUSIPCO), 2005.