by
(CB002737)
GF09C1SE
Staffordshire University
ABSTRACT
Face recognition is a broad area currently studied by many researchers and vendors, yet there are still sub-areas that researchers have barely touched. Partial face recognition can be considered one of them. A few years ago, researchers regarded partial face recognition as an unrealistic approach, because a partial region carries less information with which to recognise a person. However, research such as that conducted by Sato et al. (1998) has shown that a partial face region also contains a useful amount of information for recognition.
In this project, I investigated how far partial face regions can be used to recognise individuals. The artefact allows users to input probe images, and the system then determines whose face the submitted region belongs to. Although this appears to be a simple process, the system must perform an intensive amount of work to achieve it.
A domain investigation was conducted to identify and study the problem, related face recognition work and similar systems. By analysing the findings, I decided which areas the system research should concentrate on.
During the system research, I studied the fundamentals of image processing, face detection techniques and face recognition techniques. After analysing them, appropriate techniques were selected for extracting partial face regions and recognising individuals.
Finally, the chosen image processing, face detection and face recognition techniques were justified, along with the selected project development methodology and development platform.
ACKNOWLEDGEMENT
Special thanks to my family; without their help, guidance and love I would not have made it this far. I would also like to thank my colleagues for their advice and guidance on the project. Special thanks go to my friends Tharindu Roshantha and Andrew Jebaraj for being good friends and for proofreading my document and correcting my mistakes.
Finally, I would like to thank APIIT Sri Lanka for providing the necessary educational support: a well-qualified lecturing staff, library and laboratory facilities needed to make this project a success. I especially thank the library for helping me obtain the reading materials needed for my project research.
TABLE OF CONTENTS
ABSTRACT .................................................................................................................ii
CHAPTER 1 ................................................................................................................ 1
1 INTRODUCTION ................................................................................................... 1
CHAPTER 2 ................................................................................................................ 8
2.3.1 Pose variance ............................................................................................. 12
CHAPTER 3 .............................................................................................................. 15
3.3.3 Facial feature detection using AdaBoost with shape constraints .............. 35
3.4.3 Support vector machine kernel correlation feature analysis method ......... 42
3.4.5 Justification on selected technique ............................................................ 56
Chapter 4 .................................................................................................................... 60
CHAPTER 5 .............................................................................................................. 64
CHAPTER 6 .............................................................................................................. 72
6.2 Image processing APIs ..................................................................................... 76
CHAPTER 7 .............................................................................................................. 80
7.6 Sample Unit Testing Test plans for eye region extraction module .................. 81
REFERENCE ............................................................................................................. 83
APPENDICES .............................................................................................................. i
LIST OF FIGURES
Figure 1.1: Altered faces. ............................................................................................. 3
Figure 1.2: Normal face recognition process ............................................................... 3
Figure 1.3: Proposed solution overview ...................................................................... 4
Figure 2.1: Configuration of generic face recognition ................................................. 8
Figure 2.2: Google Picasa ............................................................................................ 9
Figure 2.3: Apple iPhoto face recognition ................................................................. 10
Figure 2.4: Face.com Photo Tagging ......................................................................... 11
Figure 2.5: Facebook Photo Tagging ........................................................................ 11
Figure 2.6: Different poses [Source: Audio Visual Technologies] ............................ 12
Figure 2.7: Different illuminations ............................................................................ 13
Figure 2.8: Different facial expression....................................................................... 13
Figure 2.9: Different facial occlusions ....................................................................... 13
Figure 3.1: Image representation in plane, As a matrix ............................................. 15
Figure 3.2: Image segmentation approach ................................................................. 16
Figure 3.3: Low noise object/background image histogram ...................................... 18
Figure 3.4: Canny edge detection process.................................................................. 21
Figure 3.5: Masks ....................................................................................................... 22
Figure 3.6: Sobel Template ........................................................................................ 22
Figure 3.7: Geometrical features ................................................................................ 26
Figure 3.8: Template matching strategy..................................................................... 27
Figure 3.9: Template matching .................................................................................. 28
Figure 3.10: Template ................................................................................................ 29
Figure 3.11: Test one Grey-Level Template matching .............................................. 29
Figure 3.12: Test two Grey-Level Template matching .............................................. 30
Figure 3.13: Test three Grey-Level Template matching ............................................ 30
Figure 3.14: Test four Grey-Level Template matching ............................................. 30
Figure 3.15: Test Five Grey-Level Template matching ............................................. 31
Figure 3.16: Test Five template is not available ........................................................ 31
Figure 3.17: Rectangle Features ................................................................................. 33
Figure 3.18: The features selected by AdaBoost ....................................................... 33
Figure 3.19: Cascade of classifiers............................................................................. 34
Figure 3.20: Features selected by AdaBoost .............................................................. 36
Figure 3.21: Elastic branch graph matching face recognition.................................... 39
Figure 3.22: Face image transformation .................................................................... 44
Figure 3.23: Distribution of faces in image space...................................................... 44
Figure 3.24: Faces in face space ................................................................................ 45
Figure 3.25: Overview of Eigenface Approach ........................................................ 46
Figure 3.26: Vector .................................................................................................... 47
Figure 3.27: Representation of A x = b ..................................................................... 47
Figure 3.28: Representation of A v = λ v. .................................................................. 48
Figure 3.29: Training Images ..................................................................................... 50
Figure 3.30: Example images from the ORL database .............................................. 53
Figure 3.31: Mean face obtained from the ORL database ......................................... 53
Figure 3.32: Eigenfaces .............................................................................................. 54
Figure 5.1: SDLC Model ........................................................................................... 65
Figure 5.2: Incremental model ................................................................................... 66
Figure 5.3: Hybrid of waterfall and incremental model ............................................. 66
Figure 5.4: Prototyping model ................................................................................... 67
Figure 5.5: Evolutionary Prototyping ........................................................................ 68
Figure 6.1: Dot net Framework .................................................................................. 73
Figure 6.2: Dot net Code Execution process .............................................................. 73
Figure 6.3: Java Architecture ..................................................................................... 75
LIST OF TABLES
Table 3.1: Detection rates for various numbers of false positives on the test set ...... 34
Table 3.2: Comparison of face detection approach .................................................... 38
Table 3.3: Recognition results between different galleries using EGBM .................. 40
Table 3.4: Recognize and Detect Faces ..................................................................... 56
Table 7.1: Test Case : Eye region detection ............................................................... 82
Table 7.2: Test Name : Eye region Extraction ........................................................... 82
LIST OF ABBREVIATIONS
CHAPTER 1
1 INTRODUCTION
Automated face recognition and computer vision is one of the favourite areas of researchers, because it relates not only to image processing but also to artificial intelligence, which is considered the next generation of computing. Humans can identify familiar faces regardless of variations in viewing conditions, expression, ageing and distractions such as glasses or masks, but a computer cannot understand what it sees through a camera and recognise faces in the same way. However, as mentioned by Zhang et al. (2007), researchers and vendors have invented several approaches to face recognition. Still, there are many areas in which automated face recognition can be improved.
However, in the real world it is not always possible to capture a full frontal picture of a face in uncontrolled environments. Even though many face recognition systems are available, most of them work only under optimal conditions; without a full frontal face, these systems fail to recognise a person. As a result, most systems cannot give accurate face match results, and there can be many complications in identifying a person in an image.
One reason identified is that criminals intentionally alter their appearance using disguises to deceive law enforcement and the public. In addition, because of the influence of different cultures and environmental factors, it is sometimes not possible to expose the full face. In those situations, face recognition approaches normally fail to give accurate results.
There are various reasons for the failings of the current systems. One possible reason is that they cannot match disguised faces against full faces, because current approaches are not capable of identifying individuals from partial face regions, which carry fewer characteristics than full faces.
Figure 1.1: Altered faces.
(a) Muslim girl wearing a veil [Source: Chopra, 2007]. (b) Women wearing masks [Source: Tarlow, 2007]. (c) Person wearing sunglasses [Source: University of Buenos Aires, n.d.].
Within the steps above, many sub-steps are performed. Although there are many approaches to full-face recognition, the proposed solution deviates from the regular face recognition approaches: the overall face recognition process still applies, but the internal sequence of steps differs from the regular face recognition process.
1.4 Project Objectives
The proposed solution takes one segment of a face (the eye, nose or mouth region) at a time and identifies which region has been submitted. Based on the input region, it extracts the features that are unique to that region. It then extracts the same region from the faces in the database. After that, it matches the features extracted from the two face regions.
[Figure: Proposed solution overview — Submitted image → Face Region Identifier → Face Region Extractor → Feature Matcher → Results]
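The flow above can be sketched in code. The following is a minimal illustration only: the function and field names (`identify_region`, `regions`, `features`) are hypothetical placeholders rather than part of the artefact, and the "features" are toy bit vectors instead of real image features.

```python
# Hypothetical sketch of the proposed pipeline: identify the submitted
# region, extract the same region's features from each database face,
# then match and rank. All names and data here are illustrative only.

def identify_region(probe):
    # Placeholder: decide which face region the probe is (eye/nose/mouth).
    return probe["region"]

def extract_region_features(face, region):
    # Placeholder: pull the stored features for that region of a face.
    return face["regions"][region]

def match_features(a, b):
    # Toy similarity: fraction of matching feature values.
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)

def recognise(probe, database):
    region = identify_region(probe)
    scores = {name: match_features(probe["features"],
                                   extract_region_features(face, region))
              for name, face in database.items()}
    return max(scores, key=scores.get)

database = {
    "person_a": {"regions": {"eye": [1, 0, 1, 1]}},
    "person_b": {"regions": {"eye": [0, 1, 0, 0]}},
}
probe = {"region": "eye", "features": [1, 0, 1, 0]}
best = recognise(probe, database)   # "person_a" (3 of 4 features match)
```

The real artefact replaces each placeholder with the techniques selected in Chapter 3; only the control flow is conveyed here.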
1.6 Project Scope
This project is to develop a system that allows recognising individuals using partial regions of the face. Initially, the project is based on recognising individuals using submitted face regions.
This project focuses on identifying individuals using only the eye region, but as extended functionality it will attempt to apply the same approach to the mouth region.
All images are taken at a 0-degree camera angle, meaning all images are frontal-view images taken in a controlled environment.
Eyes should be open and mouths closed in all faces, in both the input image and the database images. All faces should have a neutral expression.
It is assumed that the input image of the system (the image to be recognised) is pre-scaled, so it is not necessary to rescale it.
The following are outside the scope of the project:
Disguised faces from which at least one of the eye, nose or mouth regions cannot be extracted.
Face regions in different poses (taken at different angles).
Low-quality images (low resolution, etc.).
Different facial expressions.
The core functionality of the artefact can be identified as follows. The main functionality of the solution is to recognise individuals using partial face segments.
This research and implementation focus on face recognition using the eye region of a person. Because of that, what is referred to as partial face recognition from here onwards means partial face recognition using the eye region.
Furthermore, the functionality can be divided as follows:
Input identification
This will identify whether the input region is a face region and, if it is, which region it is.
i.e.
The user inputs an eye region; first of all, the system will check whether it is a face region and, if so, it will identify which face region it is.
Face region extraction
This will detect the particular face region in the full frontal faces in the database and extract that region from the faces.
Face match
This will match the submitted face region against the extracted face regions in the database and provide the match result.
The proposed artefact will be a standalone application that runs on a PC, together with a project report that includes the details of the project.
1.7 Constraints
1. Camera view
2. Inadequate resolution of the images
3. Lighting condition
4. Scale of the image
5. Distance between camera and person
6. Facial expressions
7. Occlusion
8. Ageing
1.8 Assumptions
1. The input images submitted to the system and the images in the database are at the same scale.
2. The input image has been extracted manually based on the given dimensions.
3. Images have been taken in a controlled environment, which avoids different lighting conditions, variance of view and varying distance between camera and person.
4. All images have a constant resolution.
5. All faces have a neutral facial expression.
6. The face areas used for recognition are not occluded.
7. It is assumed that all images were taken at the same age, so the distances between the facial features do not change.
CHAPTER 2
2 DOMAIN INVESTIGATION
Typical face recognition systems recognise faces based on various factors and various technologies. Although different face recognition systems use different approaches, most of them perform a few key steps that are common to most face recognition approaches.
[Figure: Configuration of generic face recognition — Input images and video → Face detection → Feature extraction → Face recognition]
2.2 Similar systems
Face recognition systems date back over five decades, starting from the 1960s. Today, face recognition has achieved excellent development and gained considerable attention from areas such as surveillance and biometric identification. Because of that, many systems have been developed using face recognition technology, but during the research the researcher could not find a system capable of partial face recognition.
2.2.1 Google Picasa
Google Picasa is photo organising and editing software provided free by Google. One of Picasa's newest features is face recognition. According to Baker (2006), Google Picasa uses a face recognition approach called "Neven Vision" to recognise faces.
Baker (2006) also mentioned that Neven Vision is a very advanced technique covering over 15 patents. Furthermore, Neven Vision is used not only for face recognition but also for object recognition.
2.2.2 Apple iPhoto
Apple Inc (2010) stated that “iPhoto introduces Faces: a new feature that
automatically detects and even recognizes faces in your photos. iPhoto uses face
detection to identify faces of people in your photos and face recognition to match
faces that look like the same person”.
2.2.3 Face.com automatic Facebook photo tagger
According to Bil (2010), Face.com provides automatic face tagging for Facebook photo albums. It checks all the photos available in a user's Facebook photo collection and tags the user and their friends.
During experimentation, the researcher identified that this application gives poor results, especially when it tries to identify images under different lighting conditions.
2.3 Problem of face recognition systems
Even though face recognition is five decades old, there are still many unsolved problems in the field of face recognition systems and in the wider biometric identification area. Most of the face recognition approaches currently in use or at the experimental stage can solve only a few of the problems in face recognition. During the research, the researcher could not find a robust face recognition algorithm that gives 100% accurate results.
Pose variance means a difference in the angle of the camera that takes the photo (image).
2.3.2 Lighting/Illumination variance
2.3.5 Individual characteristics
Face recognition systems might depend on certain facial characteristics such as skin colour. Sometimes a face recognition system designed for Caucasian faces might not be capable of handling faces of other races.
In addition, gender can be considered another factor: a face recognition system that relies on lip identification for females might not work for males.
The face scale depends on the distance between the person and the camera. Therefore, a system is sometimes not capable of handling faces taken at different distances.
Apart from that, face recognition requires a lot of computational power, which should be considered: although there are high-end computers capable of handling such processing, they are reasonably expensive.
CHAPTER 3
Digital image processing can be considered one of the interesting areas of computer vision. Image processing is based on mathematical concepts. The following sections cover the fundamentals of image processing.
According to Jähne (2005, p. 32), there are entirely different ways to represent an image. The most popular way is a rectangular grid. Gonzalez (2004, p. 1) describes this further: a digital image can be represented as a matrix, i.e. as a 2D function where x and y give coordinates in a geometric plane.
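As a concrete illustration of the matrix view described above, a small greyscale image can be held as a 2D array and indexed as f(x, y); the pixel values below are arbitrary examples.

```python
# A tiny greyscale image as a matrix: rows correspond to y, columns to x.
image = [
    [12,  40,  40, 12],
    [40, 200, 200, 40],
    [40, 200, 200, 40],
    [12,  40,  40, 12],
]

height = len(image)      # extent in y (number of rows)
width = len(image[0])    # extent in x (number of columns)
value = image[1][2]      # f(x=2, y=1): intensity 200
```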
3.1.1.1 Vector Image
As defined by Busselle (2010), vector images are collections of connected lines and curves, each defined by a mathematical formula. Therefore, these images are resolution independent, since the mathematical formulas can be adjusted according to the scale.
Binary images represent images using 0s and 1s: the pixels of the object are represented as "1" and the background as "0". Using a threshold, a digital colour image can be converted into a binary image. Thresholding is a method used to segment an image.
According to Grady & Schwartz (2006), “Image segmentation has often been
defined as the problem of localizing regions of an image relative to content“.
Image segmentation
Furthermore, Biswas (2008) and Yang (2004) described discontinuity-based image segmentation as focused on the detection of isolated points, lines and edges, while the region-based / similarity-based approach focuses on thresholding.
Spann (n.d.) mentioned that the grey-level histogram can be considered one of the most popular image segmentation methods among the available techniques.
An image histogram is a "graph which shows the size of the area of the image that is captured for each tonal variation that the camera is capable of recording" (Sutton, n.d.).
Noise is unwanted external information in an image. The noise should either be filtered out of the image first, or taken into account when processing the image.
3.1.2.2 Thresholding
“The operation known as “simple thresholding” consists in using zero for all pixels
whose level of grey is below a certain value (called the threshold) and the maximum
value for all the pixels with a higher value” (kioskea.net , 2008).
3.1.2.2.1 Grey level thresholding
Grey-level thresholding decides whether a particular grey pixel belongs to the object or to the background. By using this method, it is possible to separate an object from the background. For a pixel with grey level f(x, y) and a threshold T, the binarised image g is:
g(x, y) = 1 if f(x, y) ≥ T, else g(x, y) = 0
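This per-pixel rule is a one-line operation in code. A minimal sketch (the image values and the threshold of 128 are arbitrary example numbers):

```python
# Grey-level thresholding: pixels at or above T become object (1),
# everything else becomes background (0).
def threshold(image, t):
    return [[1 if pixel >= t else 0 for pixel in row] for row in image]

grey = [
    [10,  20, 200],
    [30, 220, 210],
    [15,  25,  35],
]
binary = threshold(grey, t=128)
# binary == [[0, 0, 1], [0, 1, 1], [0, 0, 0]]
```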
3.1.2.2.2 Determine threshold
Spann (n.d.) mentioned that there are several approaches to determining the threshold, of which the following are of the most interest:
Interactive threshold
Adaptive threshold
Minimisation method
Please refer to the appendix for a detailed explanation of these approaches.
Neoh and Hazanchuk (2005) mentioned, “Edge detection is a fundamental tool used
in most image processing applications to obtain information from the frames as a
precursor step to feature extraction and object segmentation.” In addition, they
further explained it as follows “This process detects outlines of an object and
boundaries between objects and the background in the image.” According to Green
(2002), “Edge detecting an image significantly reduces the amount of data and filters
out useless information, while preserving the important structural properties in an
image”.
The relationship between edge detection and thresholding is that once the gradient image has been computed, a threshold is applied to decide whether edges are present or not.
If the threshold is lower, more edges are found, while a high threshold might cause edges to be lost.
3.1.3.1 Canny’s edge detection
According to Green (2002b), there are a few criteria that an edge detector should satisfy. Green (2002b) applied the following steps to detect edges using Canny's edge detection.
Green (2002b) found the gradient of the image by performing the "Sobel" operator, a 2-D spatial gradient measurement, on the image.
Step 4
“Once the edge direction is known, the next step is to relate the edge
direction to a direction that can be traced in an image” (Green, 2002b).
Step 5
After the edge directions are known, non maximum suppression now has to
be applied (Green ,2002b).
Step 6
Hysteresis is then used to determine the final edges, by suppressing all edges that are not connected to a selected strong edge (Green, 2002b).
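The hysteresis step can be sketched as follows. This is not Green's implementation, only an illustrative version under the usual definition: magnitudes at or above the high threshold are strong edges, and weak pixels (at or above the low threshold) are kept only when 8-connected to a strong edge. The thresholds and magnitudes are made-up example numbers.

```python
# Sketch of the hysteresis step: keep strong edges, and keep weak edges
# only if they connect (8-neighbourhood) to a strong edge.
def hysteresis(mag, low, high):
    rows, cols = len(mag), len(mag[0])
    strong = [(r, c) for r in range(rows) for c in range(cols)
              if mag[r][c] >= high]
    keep = set(strong)
    stack = list(strong)
    while stack:                      # grow out from strong pixels
        r, c = stack.pop()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and (nr, nc) not in keep
                        and mag[nr][nc] >= low):
                    keep.add((nr, nc))
                    stack.append((nr, nc))
    return keep

mag = [
    [0, 40, 90, 0],
    [0,  0, 40, 0],
    [0, 40,  0, 0],
]
edges = hysteresis(mag, low=30, high=80)
# Only the weak pixels connected to the strong pixel (0, 2) survive.
```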
According to Green (2002a), "The Sobel operator performs a 2-D spatial gradient measurement on an image. Typically, it is used to find the approximate absolute gradient magnitude at each point in an input greyscale image".
Green (2002a) and Biswas (2008) describe the Sobel detector further as an algorithm that performs two-dimensional gradient measurements on an image.
The technique works by calculating the estimated gradient magnitude at each point of a greyscale image.
In his article, Green (2002a) mentioned that the Sobel edge detector uses a pair of 3 x 3 convolution masks, one estimating the gradient in the x-direction and the other estimating the gradient in the y-direction. The mask is smaller than the actual image, so it is slid over the image, manipulating a square of pixels at a time.
Exact magnitude of the gradient:
|G| = √(Gx² + Gy²)
Approximate magnitude of the gradient:
|G| = |Gx| + |Gy|
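A minimal sketch of the Sobel computation at a single pixel, using one common choice of the two 3 x 3 masks (mask sign conventions vary between sources) and both magnitude formulas above:

```python
# Sobel gradient sketch: convolve the two 3x3 masks at one pixel and
# combine them with the exact and the approximate magnitude formulas.
GX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
GY = [[ 1,  2,  1],
      [ 0,  0,  0],
      [-1, -2, -1]]

def gradient_at(image, r, c):
    gx = sum(GX[i][j] * image[r - 1 + i][c - 1 + j]
             for i in range(3) for j in range(3))
    gy = sum(GY[i][j] * image[r - 1 + i][c - 1 + j]
             for i in range(3) for j in range(3))
    exact = (gx ** 2 + gy ** 2) ** 0.5   # |G| = sqrt(Gx^2 + Gy^2)
    approx = abs(gx) + abs(gy)           # |G| = |Gx| + |Gy|
    return gx, gy, exact, approx

# A vertical edge: intensity jumps from 0 to 100 between columns 1 and 2.
img = [[0, 0, 100, 100]] * 3
gx, gy, exact, approx = gradient_at(img, 1, 1)
```

On this vertical edge the x-mask responds strongly while the y-mask cancels out, as expected.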
It has been identified that by using grey-level histogram segmentation and applying an edge detector such as the Canny edge detector, it is possible to extract face regions. In the researcher's view, this method can be useful for extracting face regions from the face, but it might not give accurate results, because the histogram threshold value might depend on gender and race.
3.2 Face recognition approaches
The first face recognition approach was introduced by Sir Francis Galton in 1888 (Kepenekci, 2001), and was done by measuring four characteristics of French prisoners. This attempt by Sir Francis Galton to identify persons in a more scientific way laid the foundation of biometric recognition, which led to the development of face recognition.
As mentioned before, there are different approaches to face recognition. These approaches can be divided into two main categories based on the image type: two-dimensional and three-dimensional face recognition.
3D model-based face recognition can be considered the latest trend in face recognition. As mentioned by Bonsor & Johnson (2001), this approach uses distinctive features of the face: features such as the depth of the facial features and the curves of the eyes, nose and chin are used to recognise a face.
Akarun, Gökberk & Salah (2005) stated that three-dimensional face recognition has a higher accuracy rate than traditional two-dimensional face recognition, but compared with the traditional approach it has a few disadvantages, such as the cost of implementation, unfamiliarity of the technology and a lack of hardware, especially camera equipment. Because of that, 2D face recognition still has the higher demand.
As mentioned by Gökberk (2006), different approaches are used in 3D face recognition. Point cloud-based, depth image-based, curve-based, differential geometry-based, facial feature-based geometrical and shape descriptor-based approaches can be considered the most popular approaches in 3D face recognition.
3.2.2 Two dimensional face recognition
Two-dimensional face recognition can be considered the oldest and most popular method currently used in the field. This approach processes 2D images taken by regular cameras (such as the cameras in security systems) and identifies faces using different approaches. As mentioned by the NSTC Subcommittee on Biometrics (2006), some algorithms use facial landmarks to identify faces, while other algorithms use normalised face data to identify the probe image.
As stated by Brunelli and Poggio (1993), the idea behind geometric face recognition is to "extract relative position and other parameters of distinctive features such as eyes". After finding the features, they are matched against the known individuals' feature details to find the closest match. According to Kanade (1973), his research achieved a 45-75% recognition rate over 20 test cases. According to Brunelli and Poggio (1993), the disadvantage of geometric face recognition compared with view-based face recognition is the extraction of face features.
The following image shows some of the facial features that can be used to recognise faces.
Figure 3.7: Geometrical features
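The "closest match" idea behind geometric recognition can be sketched as follows. This is a toy illustration only: the measurements are invented numbers, not real facial data, and a real system would use many more features.

```python
# Toy sketch of geometric matching: represent each face as a vector of
# inter-feature distances and pick the enrolled face whose vector has
# the smallest Euclidean distance to the probe's vector.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# (eye-to-eye distance, nose length, mouth width) in arbitrary units
enrolled = {
    "person_a": (62.0, 48.0, 50.0),
    "person_b": (70.0, 44.0, 55.0),
}
probe = (63.0, 47.5, 50.5)

closest = min(enrolled, key=lambda name: euclidean(probe, enrolled[name]))
# "person_a": its measurement vector is much nearer to the probe's.
```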
However, with modern techniques and approaches, facial feature detection has improved considerably compared with the old days. As mentioned by Bagherian, Rahmat and Udzir (2009), feature extraction can be done by different approaches, such as the colour segmentation approach and the template-based approach, which uses a pre-defined image to detect facial features and their relative locations.
With the development of the appearance-based face recognition approach, geometric face recognition has faded from the face recognition area, and the techniques used by geometric face recognition have been applied to different areas such as facial expression detection.
The second approach to face recognition is appearance-based face recognition, which is quite popular among researchers and industries related to machine vision. In this approach, the probe face is taken as a matrix of pixels and compared with the known individuals. Brunelli and Poggio (1993) described it as follows: "the image, which is represented as a bi-dimensional array of intensity values, is compared using a suitable metric with a single template representing the whole face". The above is only an overall picture of the appearance-based approach; there are more sophisticated ways of performing appearance-based face recognition.
Figure 3.8: Template matching strategy
This project considers only 2D face recognition, because it was not feasible to find enough resources for 3D face recognition. Within 2D face recognition, it is constrained to view-based face recognition, because with geometric face recognition the data (distances between features) that can be used for recognition is not enough.
i.e.:
From the eye region, only a few measures can be taken, such as the distance between the eye corners and the width of the eyelid. These measures can vary, so this method is not a reliable measure.
3.3 Face region extraction approaches
The author identified several approaches to extracting face features during the research. Among them, the following techniques show promising results.
According to Latecki (n.d.), in template matching the matching process shifts the template image to all possible positions in the source image and computes a numerical index that indicates how well the template matches the image at each position. The match is done on a pixel-by-pixel basis.
3.3.1.1 Bi-Level Image template matching
Latecki (n.d.) conducted an experiment on five data sets and achieved the following correlation maps.
Figure 3.12: Test two Grey-Level Template matching
Figure 3.15: Test Five Grey-Level Template matching
cor = Σ (xᵢ − x̄)(yᵢ − ȳ) / √( Σ (xᵢ − x̄)² · Σ (yᵢ − ȳ)² ), where each sum runs over i = 0 … N − 1
where:
x is the template grey-level image and x̄ is the average grey level in the template image;
y is the source image section and ȳ is the average grey level in the source image section;
N is the number of pixels in the section (N = template image size = columns × rows).
The value cor is between −1 and +1, with larger values representing a stronger relationship between the two images.
If the maximum correlation value found in the source is less than the correlation threshold for the template, the template image is not present in the source.
Grey-level template matching makes it possible to match face regions, but it may not give accurate results, because the intensity values of faces can depend on race and gender, and the computational cost can be high because template matching uses pixel-based calculations. However, by limiting this variance it is possible to apply this approach to face region extraction.
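As an illustration of the grey-level matching strategy described above, the following sketch (Python with NumPy, used here purely for illustration; the project itself targets MATLAB / Emgu CV) slides a template over a source image and computes the normalised cross-correlation at every position:

```python
import numpy as np

def ncc(template, window):
    """Normalised cross-correlation between a template and an equally
    sized image window; the result lies in [-1, +1]."""
    t = template - template.mean()
    w = window - window.mean()
    denom = np.sqrt((t ** 2).sum() * (w ** 2).sum())
    if denom == 0:
        return 0.0          # flat window: no meaningful correlation
    return float((t * w).sum() / denom)

def match_template(image, template):
    """Slide the template over every position of the source image and
    return the correlation map plus the best-matching top-left corner."""
    ih, iw = image.shape
    th, tw = template.shape
    cor = np.empty((ih - th + 1, iw - tw + 1))
    for r in range(cor.shape[0]):
        for c in range(cor.shape[1]):
            cor[r, c] = ncc(template, image[r:r + th, c:c + tw])
    best = np.unravel_index(np.argmax(cor), cor.shape)
    return cor, best
```

The double loop makes the pixel-based cost visible: every template position requires a full pass over the window, which is exactly why this approach is computationally heavy on large images.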
As an alternative to pixel-based calculation, Viola & Jones (2001) proposed a feature-based approach, described in the following section.
3.3.2 Face detection using a boosted cascade of simple features

Viola & Jones (2001) proposed an approach to detecting objects using a boosted cascade of simple features. They introduced a new image representation called the “integral image”, which, according to Viola & Jones (2001), allows the data to be processed without working on the raw images directly. They then used the learning algorithm AdaBoost to select a small number of visual features from a larger set and to yield extremely efficient classifiers. Finally, they combined successively more complex classifiers in a cascade structure, which dramatically increases the speed of the detector by focusing attention on promising regions of the image.
In their approach, Viola & Jones (2001) used features generated by Haar basis functions. According to Viola & Jones (2001), the reason for using features is “that features can act to encode ad-hoc domain knowledge that is difficult to learn using a finite quantity of training data.” They used three kinds of features: two-rectangle, three-rectangle and four-rectangle features. These are, respectively, the difference between the sums of the pixels within two rectangular regions; the sum within two outside rectangles subtracted from the sum in a centre rectangle; and the difference between diagonal pairs of rectangles.
Figure 3.17: Rectangle Features
According to Viola & Jones (2001), the reason for using the above-mentioned integral image is that rectangle features can be computed very rapidly from it. AdaBoost is then trained to extract the discriminative features using a set of negative images (containing no faces) and positive images (containing faces).
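To make the integral-image idea concrete, here is a small illustrative sketch (Python/NumPy, not part of the original work): once the summed-area table is built, the sum of any rectangle, and hence a two-rectangle Haar-like feature, costs only a handful of array look-ups regardless of rectangle size:

```python
import numpy as np

def integral_image(img):
    """Summed-area table padded with a leading zero row/column, so
    ii[r, c] equals the sum of img[0:r, 0:c]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, r, c, h, w):
    """Sum of the h x w rectangle with top-left pixel (r, c),
    obtained from only four references into the integral image."""
    return ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c]

def two_rect_feature(ii, r, c, h, w):
    """Two-rectangle feature in the Viola-Jones style: the difference
    between the sums of two horizontally adjacent h x w rectangles."""
    return rect_sum(ii, r, c, h, w) - rect_sum(ii, r, c + w, h, w)
```

Because each rectangle sum needs only four look-ups, a feature of any size is evaluated in constant time, which is what makes scanning thousands of features per window practical.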
“The two features are shown in the top row and then over layed on a typical training
face in the bottom row. The first feature measures the difference in intensity between
the region of the eyes and a region across the upper cheeks. The feature capitalizes
on the observation that the eye region is often darker than the cheeks. The second
feature compares the intensities in the eye regions to the intensity across the bridge
of the nose” (Viola & Jones, 2001).
In the end, they formed the cascade, which can be viewed as a decision tree: a network of classifiers that filters out negative results and provides an accurate final result. The following diagram shows the structure of a cascade.

The detector proposed by Viola & Jones (2001) has 38 stages with over 6000 features. Each classifier of the cascade was trained with 4916 faces and 10,000 non-faces, all images being 24 x 24 pixels.
Table 3.1: Detection rates for various numbers of false positives on the test set
This method can be considered a good approach to face region extraction, but identifying individual face regions can be difficult because a face contains many rectangular-feature responses. Cristinacce & Cootes (2003) therefore proposed an approach that can be used to detect individual face features.
3.3.3 Facial feature detection using AdaBoost with shape constraints
This can be considered an extension of the face detection method proposed by Viola & Jones (2001).

Cristinacce & Cootes (2003) proposed a method for facial feature extraction using AdaBoost with shape constraints, locating the eye, nose and mouth corners in frontal face images.

Their approach can be divided into two main stages: face detection and facial feature detection.
Cristinacce & Cootes (2003) used the Viola & Jones face detector, described previously, to find the faces for feature selection. They described it as follows: “The output of the face detector is a image region containing the face, which is then examined to predict the location of the internal face features”.
Cristinacce & Cootes (2003) deviated from the Viola & Jones approach in how they select negative and positive images when building the AdaBoost templates from Haar-like features. Viola & Jones use human faces as positive examples and regions known not to contain a human face as negative examples. According to Cristinacce & Cootes (2003), in their approach the positive examples are image patches centred on a particular facial feature, and the negative examples are image patches randomly displaced a small distance from the same facial feature.
Cristinacce & Cootes (2003) applied the same algorithm to train each local feature detector. The following figure shows a few features selected by AdaBoost for the right eye.
Figure 3.20: Features selected by AdaBoost
The other way in which Cristinacce & Cootes (2003) extend the approach of Viola & Jones (2001) is that their method uses shape constraints to check candidate feature points. Cristinacce & Cootes (2003) described it as follows: “Firstly a shape model is fitted to the set of points and the likelihood of the shape assessed. Secondly, limits are set on the orientation, scale and position of a set of candidate feature points relative to the orientation, scale and position implied by the global face detector”.
The shape model they used was designed in a similar way to that proposed by Dryden and Mardia (1998, cited in Cristinacce & Cootes, 2003). Cristinacce & Cootes first aligned the points into a common co-ordinate frame. According to Cristinacce & Cootes (2003), the distribution of the aligned shapes is approximately multivariate Gaussian. They then estimated the probability ps(x) of a given shape and obtained a threshold T from the training data set: if the probability of a given shape is greater than the threshold, it is accepted as a valid shape.
Cristinacce & Cootes (2003) also analysed the “range of variation in position of the features relative to the bounding box found by the full face detection”. Furthermore, the feature detectors implemented in their application return a list of candidate points whose shape probability exceeds the threshold.
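The shape-constraint test can be pictured with a toy model (an illustrative Python/NumPy sketch, not the authors' implementation): fit a multivariate Gaussian to aligned training shapes and accept a candidate point set only if its Mahalanobis distance, and hence its likelihood, stays within the range seen in training:

```python
import numpy as np

class ShapeModel:
    """Toy multivariate-Gaussian shape model in the spirit of the
    constraint described above: a candidate set of feature points is
    accepted only if it is at least as likely as the least likely
    training shape."""

    def __init__(self, shapes):
        # shapes: (n_samples, n_coords) array of aligned point sets
        self.mean = shapes.mean(axis=0)
        cov = np.cov(shapes, rowvar=False)
        # regularise so the covariance stays invertible for tiny sets
        cov += 1e-6 * np.eye(cov.shape[0])
        self.cov_inv = np.linalg.inv(cov)
        # threshold = worst Mahalanobis distance seen in training
        self.threshold = max(self._mahalanobis(s) for s in shapes)

    def _mahalanobis(self, x):
        d = x - self.mean
        return float(d @ self.cov_inv @ d)

    def plausible(self, candidate):
        return self._mahalanobis(candidate) <= self.threshold
```

The Mahalanobis distance is monotonically related to the Gaussian log-likelihood, so thresholding it is equivalent to the probability test ps(x) > T described above.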
According to Cristinacce & Cootes (2003), the maximum time the entire process took was less than 0.5 seconds, and they achieved an 88.8% detection rate, reported as follows.
“Feature distance is within 5% of the eye separation for 65% of faces, 10% for 85%
of faces and 15% for 90% of faces” Cristinacce & Cootes (2003).
This method shows promising results, but implementing it requires a lot of time and effort because of the AdaBoost training. During the technical investigation, however, a method to overcome this problem was found; the implementation approach is described later.
The research discussed the Viola-Jones face detection approach using Haar-like features, the template matching approach, and facial feature detection using AdaBoost with shape constraints. In the Viola-Jones (Viola & Jones, 2001) approach, features generated by Haar basis functions are used; the features are rectangular areas that summarise the intensity values of the face. This method achieved a 76.1% - 96.9% face detection rate. Cristinacce & Cootes (2003) proposed an extended version of the Viola-Jones (Viola & Jones, 2001) approach for detecting local facial features and achieved better results than the Viola-Jones approach. Kuo & Hannah (2005) proposed a template-matching approach for eye feature extraction and achieved 94% for iris extraction and 88% for eye corner and eyelid extraction.
Both the Viola-Jones (Viola & Jones, 2001) and Cristinacce-Cootes (Cristinacce & Cootes, 2003) approaches use AdaBoost to train the Haar cascades, whereas the Kuo & Hannah (2005) approach does not use any learning method. Training AdaBoost requires a huge number of positive and negative images, which takes a lot of time and effort, but once the cascade files are created they can be reused for different data sets.
Kuo & Hannah (2005) mentioned that their approach is relatively limited, because the algorithm does not work on occluded faces or under pose variation. In addition, the algorithm is capable of identifying only eyes.
Approach               | Viola-Jones         | Cristinacce-Cootes  | Template-based
Detection performance  | High detection rate | High detection rate | High detection rate
The researcher identified that pre-trained Haar-like feature cascade files exist which can be used with the Viola-Jones and Cristinacce-Cootes approaches, so AdaBoost training is not required.
Considering the above facts, it was decided to use Haar-like features to detect the face and face regions, because the comparison of the three techniques shows that this choice offers high accuracy and efficiency. The template-based approach was rejected because the accuracy of template matching depends on the template and the test set.
Furthermore, in the face graph, nodes represent fiducial points (eyes, nose, etc.) and edges represent the distances between those fiducial points. The points themselves are described by sets of wavelet components (jets).
According to Wiskott et al (1997), the image graph of a new face is extracted using an elastic graph matching process that represents the face. Wiskott et al (1997) note that accurate node positioning is vital for getting accurate face recognition results, so phase information is used to position the nodes precisely. Object-adapted graphs, which use the fiducial point representation, handle in-depth rotations of the objects (in this case, faces).
The face graph is extracted based on a bunch graph, which is a combined representation of all model graphs and covers a wider range of variation than a single model does. Wiskott et al (1997) also mention that individual models are used to cover variations such as differently shaped eyes, mouths or noses, different types of beards, and variations due to sex, age and race.
The method proposed by Wiskott et al (1997) is not specific to human face recognition; they mention that it can be used for other object recognition tasks as well. Furthermore, the method copes well with substantial variation caused by size, expression, position and pose changes.
In experiments on the FERET database they achieved a high recognition rate (98%) for frontal-to-frontal face recognition, with and without expression changes. They achieved a 57% - 81% recognition rate when comparing the left half-profile of a face with the right half-profile.
The following table shows a summary of the results achieved in their approach, reporting the number of images, the number of successes and the success percentage for each comparison. [Table not legible in source.]
It was identified that this method is suitable for multi-view face recognition but, like the geometric face recognition approach, it cannot be applied to partial face recognition because of the lack of information available for matching.
As stated by Delac, Grgic and Liatsis (2005), several statistical (appearance-based) methods have been proposed, among which PCA and LDA play major roles. According to Navarrete and Ruiz-del-Solar (2001), subspace methods “project the input faces onto a reduced dimensional space where the recognition is carried out, performing a holistic analysis of the faces”.
Furthermore, Navarrete and Ruiz-del-Solar (2001) state that PCA and LDA can be considered projection methods of this kind, reducing the high-dimensional image space to a low-dimensional one. Heseltine (2005) elaborates further: PCA, LDA and other methods can be used for “image subspace projection in order to compare face images by calculating image separation in a reduced dimensionality coordinate space.” He also mentions that they use “a training set of face images in order to compute a coordinate space in which face images are compressed to fewer dimensions, whilst maintaining maximum variance across each orthogonal subspace dimension” (Heseltine, 2005).
According to Yambor (2000), the training images are projected into a subspace; the subspaces of all training images are combined into one, and the test (probe) image is projected into that subspace. Each test image is then compared with the training images using a similarity or distance measure, and the most similar or closest image is identified as the match.
3.4.3 Support vector machine kernel correlation feature analysis method
Savvides et al (2006) proposed a method for recognising partial and holistic faces based on kernel correlation feature analysis (KCFA) and support vector machines.
They used correlation filters to extract features from the face images and face regions. A minimum average correlation energy (MACE) filter, designed to minimise the average correlation plane energy resulting from the training images (Xie, 2005, cited in Savvides et al 2006), is used in this approach to produce the correlation peaks that identify the faces and face regions to be extracted.
According to Savvides et al (2006), they designed one MACE correlation filter set, trained on 12,776 images from 222 different classes. As a result, they obtain a 222-dimensional correlation feature vector after projecting the input images onto the correlation filters generated by MACE.
Savvides et al (2006) state that the linear advanced correlation filters can be extended to kernel correlation filters by applying the kernel trick; they extended their CFA method in this way and applied it over all the images.
By examining distance and similarity threshold values between the training images and the probe image (in the KCFA projection coefficients), they identify the best match and thereby recognise the individual, in a manner similar to PCA.
They obtained promising results with this approach: the eye region yields a verification rate of 83.1%, compared to 53% obtained using the mouth region and 50% with the nose region.
Figure 3.22: Face image transformation
[Source: Fladsrud,2005]
X = [x1, x2, x3, ..., xn]ᵀ
The rows of the image are placed one after another to form a single vector. This vector belongs to an image space containing all images of dimension w by h; by concatenating the rows, each w-by-h image can be represented as a one-dimensional vector, as shown in figure 4.4. The following images show the basis of the image space.
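The row-concatenation step can be sketched in a couple of lines (illustrative Python/NumPy):

```python
import numpy as np

# A 2 x 2 toy "image": placing its rows one after another turns it
# into a single 4 x 1 column vector in image space.
image = np.array([[10, 20],
                  [30, 40]])
vector = image.reshape(-1, 1)   # rows concatenated, then made a column
```

A w-by-h image therefore becomes a point in a (w*h)-dimensional image space, and a training set of M images becomes M such points.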
When considering faces, all faces look alike, because every face has a mouth, a nose, a pair of eyes, etc., located at approximately the same relative positions. As mentioned by Fladsrud (2005), because of this property all faces (face vectors) are located in a very small region of the image space. Figure 3.26 represents the distribution of faces (face vectors) in image space.
From this it follows that representing faces in the full image space is a waste of space. O'Toole et al (1993) proposed using PCA to find the specific vectors that best capture the distribution of face images. These vectors define the “face space”, a subspace of the image space. Furthermore, O'Toole et al (1993) stated that “Face space will be a better representation for face images than image space which is the space which containing all possible images since there will be increased variation between the faces in face space”.
According to O’Toole et al (1993) the vectors in face space are known as eigenfaces.
The simplest method of comparing two images is pixel-by-pixel comparison, but a 100 x 100 pixel image contains 10⁴ pixels, and comparing that many pixels is time-consuming and inefficient. As a solution, Kirby & Sirovich (1990) used the Karhunen-Loève expansion, popularly known as principal component analysis (PCA). Kirby & Sirovich (1991) also stated that the main idea behind applying PCA to an image is to find the weights (vectors) that account for the distribution of the face space within the image space.
By applying PCA it is possible to store the image data in a more compact form, so comparing two image vectors is faster and more efficient than matching images pixel by pixel.
[Figure: Flowchart of the eigenface algorithm. The training set of known faces is transformed into eigenfaces E with weights W; an unknown input image is projected, and the distances D and δ are compared with the threshold e to decide whether the input X is a known face, an unknown face, or not a face.]
The figure above represents the overall idea behind the eigenface algorithm; the diagram is based on Turk & Pentland (1991). The training set of known images is transformed into a set of eigenvectors (eigenfaces) E, after which the weights W of the training set with respect to E are calculated. The weight vector of a new face is then compared with the weights of the training set. D is the average distance between weight vectors, e is the maximum allowable distance from any face class, and δ is the Euclidean (L2 norm) distance of the projection. Faces are then identified based on D, δ and e.
3.4.4.1 Covariance
Covariance is a measure of the joint variability of two data sets. As further described by Weisstein (2010a), “Covariance provides a measure of the strength of the correlation between two or more sets of random variants”.
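As a small numerical illustration (Python/NumPy sketch with arbitrary values):

```python
import numpy as np

# Covariance: the mean of the products of the two variables'
# deviations from their respective means (population form).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])   # y varies together with x
cov_xy = ((x - x.mean()) * (y - y.mean())).mean()
```

A positive value means the two variables tend to move together; here cov_xy is positive because y increases whenever x does.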
3.4.4.3 Eigenvector
A vector is a quantity that has both magnitude and direction; it can be pictured as an arrow whose length represents the magnitude and whose head represents the direction. In general, a matrix A acting on a vector x produces some other vector b:

Ax = b

[Source: Baraniuk, 2009]
An eigenvector has special properties that an ordinary vector does not have. As described by Marcus & Minc (1988, p. 144, cited by Weisstein, 2010b), “Eigenvectors are a special set of vectors associated with a linear system of equations (i.e., a matrix equation) that are sometimes also known as characteristic vectors, proper vectors, or latent vectors”.
Baraniuk (2009) further defines an eigenvector as a vector v satisfying

Av = λv

where λ is the corresponding eigenvalue; A only changes the length of v, not its direction.

[Source: Baraniuk, 2009]
As noted in the previous section, the eigenvalue is the factor by which the corresponding eigenvector is scaled. Based on Reedstrom (2006), the following calculation is presented here. Starting from

Av = λv

the eigenvalues are found by solving

(A − λI)v = 0, which has non-trivial solutions only when det(A − λI) = 0
For example, take

A = | 2  1 |
    | 1  2 |

det(A − λI) = | 2−λ   1  |
              |  1   2−λ |
            = (2 − λ)(2 − λ) − 1
            = λ² − 4λ + 3

∴ λ = 3 or λ = 1
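The same eigenvalues can be checked numerically (illustrative Python/NumPy sketch):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
# Roots of det(A - lambda*I) = lambda^2 - 4*lambda + 3
eigenvalues, eigenvectors = np.linalg.eig(A)
```

Each column of `eigenvectors` satisfies Av = λv for its eigenvalue, confirming the hand calculation above.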
The following steps are based on Turk and Pentland (1991, cited in Gül 2003 and Bebis 2003).
Computation of Eigen faces
[Source: Bebis,2003]
Take an image matrix of (Nx x Ny) pixels. The image is transformed into a new image vector Γ of size (P x 1), where P = Nx x Ny, and Γi is a column vector. As mentioned in the image space section, this is done by locating each column one after the other.
Step 3: Create mean / average face
After transforming the training images to 1D vectors, the average of the faces should be calculated. This average is called the mean face: the average of all training image vectors at each pixel position. The size of the mean face is (P x 1).
After creating the mean face, the difference between the mean face and each image should be calculated; this makes it possible to characterise each training image uniquely. The mean-subtracted image is the difference between a training image and the mean image, and its size is also (P x 1). This subtraction is carried out for every training image, and the output is a matrix holding all the mean-subtracted faces.
After finding the mean-subtracted faces, principal component analysis is applied to this set of difference vectors. The resulting eigenvalues are

λₖ = (1/M) Σₙ₌₁..M (uₖᵀ Φₙ)²
Step 6: Covariance matrix calculation
After applying PCA, the corresponding eigenvalues and eigenvectors must be calculated, which requires the covariance matrix. For a face image of Nx by Ny pixels, the covariance matrix has size (P x P), where P = Nx x Ny. Processing a covariance matrix of this size consumes a great deal of time and processing power, so using it directly is not a good choice. Therefore, according to Turk & Pentland (1991), if the number of training images M is less than the dimension of the image space (M < N²), there will be only M − 1 meaningful eigenvectors, and the problem can be solved for an M x M matrix, which is far more tractable than solving the N x N dimensional problem. The eigenvectors of AAᵀ are then obtained as linear combinations of the face images Φi.
Turk & Pentland (1991) stated that by constructing the M x M matrix L = AᵀA, one obtains M eigenvectors vₗ. They also mention that the eigenvectors uₗ of the full covariance matrix can then be determined as linear combinations of the M training set images, where l = 1 ... M:

uₗ = Σₖ₌₁..M vₗₖ Φₖ
Turk & Pentland (1991) stated that with this approach the number of calculations required is greatly reduced, “from the order of the number of pixels in images (N2) to the order of the number of images in the training set (M)”, and in practice the training set of face images is relatively small.
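The computational trick above can be demonstrated with a small sketch (Python/NumPy, illustrative only): the eigenvectors of the small M x M matrix L = AᵀA, mapped back through A, are exactly eigenvectors of the large covariance matrix AAᵀ:

```python
import numpy as np

rng = np.random.default_rng(1)
P, M = 100, 5                    # P pixels per image, M training images
A = rng.normal(size=(P, M))      # columns: mean-subtracted face vectors

L = A.T @ A                      # small M x M matrix instead of P x P
mu, V = np.linalg.eigh(L)        # eigenpairs of the small problem

U = A @ V                        # map back: u_l = A v_l
U /= np.linalg.norm(U, axis=0)   # normalise each eigenvector
```

The check works because C(Av) = AAᵀAv = A(Lv) = μ(Av): every small-problem eigenvector lifts to an eigenvector of the full covariance matrix with the same eigenvalue.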
Figures 4.8 and 4.9 show images taken from the ORL database and the mean image calculated from the ORL faces.
[Source: Gül,2003]
[Source: Gül,2003]
Furthermore, Gül (2003) stated that by using fewer eigenfaces than the total number of faces for the eigenface projection, it is possible to eliminate the eigenvectors with small eigenvalues, which contribute little to the variance in the data.
Figure 3.32: Eigenfaces
[Source: Gül,2003]
Based on Turk & Pentland (1991) and Turk & Pentland (1990), Gül (2003) described that larger eigenvalues indicate larger variance; therefore the eigenvector with the largest eigenvalue is taken as the first eigenvector, so that the most generalising eigenvectors come first in the eigenvector matrix.
Let Γ be the probe image (the image to be recognised). It should be the same size as the training images and taken under the same lighting conditions. As mentioned by Turk & Pentland (1991), the image should be normalised before it is transformed into the eigenface representation. Given that Ψ is the mean of the training set, the deviation of the probe from the mean is

Φ = Γ − Ψ
The projection value is then calculated for the probe image in the same way as for the training images.
Step 8-4: Calculating Euclidean distance
The Euclidean distance is used to measure the distance between the projection values of the training images (the eigenface weights) and the projected input image:

e² = ‖Φ′ − Φᵢ‖²

where e is the Euclidean distance, Φ′ is the projection value of the probe and Φᵢ is the projection value of the i-th training image. The best match minimises this distance:

eₘᵢₙ = minᵢ ‖Φ′ − Φᵢ‖
Step 9-1:
First the probe image is normalised and transformed into a 1D vector, and its deviation from the mean is calculated. Its projection value is computed in the same way as for the training images, and the Euclidean distance between the projection of the probe and the projections of the training set (the eigenface weights) is measured. The distance threshold defines the maximum allowable distance from any face class; it is equal to half the distance between the two most distant classes.
“Classification procedure of the Eigenface method ensures that face image vectors
should fall close to their reconstructions, whereas non-face image vectors should fall
far away. “( Gül,2003).
The distance measure ε is the distance between the mean-subtracted image and its reconstruction. Based on Turk & Pentland (1991), an image can be recognised by knowing eₘᵢₙ, ε and Θ, where eₘᵢₙ is the Euclidean distance to the closest face class, ε is the distance from face space, and Θ is the distance threshold.
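Putting the steps above together, here is a compact end-to-end sketch of eigenface training and nearest-neighbour recognition (illustrative Python/NumPy; the real system is to be built on MATLAB / Emgu CV, and the thresholding against Θ is omitted for brevity):

```python
import numpy as np

def train_eigenfaces(images, k):
    """images: (M, P) array with one flattened face per row.
    Returns the mean face, the top-k eigenfaces (P x k) and the
    weight vectors of the training images (M x k)."""
    X = images.astype(float)
    mean = X.mean(axis=0)
    A = (X - mean).T                   # P x M mean-subtracted faces
    mu, V = np.linalg.eigh(A.T @ A)    # small M x M eigenproblem
    order = np.argsort(mu)[::-1][:k]   # largest eigenvalues first
    U = A @ V[:, order]
    U /= np.linalg.norm(U, axis=0)     # columns are the eigenfaces
    weights = (X - mean) @ U
    return mean, U, weights

def recognise(probe, mean, U, weights):
    """Project the probe into face space and return the index of the
    nearest training image plus the Euclidean distance e_min."""
    w = (probe.astype(float) - mean) @ U
    dists = np.linalg.norm(weights - w, axis=1)
    return int(np.argmin(dists)), float(dists.min())
```

In a full system, the returned distance would additionally be compared against the threshold Θ to reject probes that are not close to any face class.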
During this research, three approaches to face recognition were discussed: EBGM, the support vector machine based kernel correlation feature analysis method, and the popular eigenface method. Each takes a different approach to face recognition, and each has achieved good results under different conditions.
The EBGM technique treats the face as a graph in which facial features are nodes and the distances between them are the weights of the connecting edges. As mentioned before, it is capable of identifying human faces under different poses, facial expressions and image scales. According to Wiskott et al (1997), they achieved 12% to 98% recognition rates under different face poses, lighting conditions and facial expressions. Wiskott et al (1997) also mentioned that the EBGM technique can be used to recognise objects other than faces.
The support vector machine based kernel correlation feature analysis method is similar to principal component analysis. Savvides et al (2006) proposed a method for recognising partial face regions (eyes, mouth, nose) using SVM-based KCFA. In that method they used class-dependent feature analysis (CFA) for dimensionality reduction and feature extraction of the given images, along with techniques such as the kernel trick and kernel CFA.
According to Savvides et al (2006), the eye-region yields a verification rate of 83.1%
compared to 53% obtained by using the mouth region and 50% with the nose region.
The eigenface method can be considered one of the most popular face recognition techniques used in the field. It treats faces as eigenvectors and applies PCA for feature extraction and dimensionality reduction, then measures the similarity between the eigenfaces and a new probe image by projecting the probe into the eigenface space. Compared with the other techniques, the eigenface approach is simple, and many sources and development platforms support its implementation. Campos, Feris & Cesar (2000) adopted the eigenface approach to recognise faces using eigeneyes and achieved a 25.00% - 62.50% face recognition rate. Yuen et al (2009) proposed a method for extracting facial features for template-matching based face recognition; in their face recognition module they achieved recognition rates of 79.17% for the eye and 51.39% for the mouth. Considering the results obtained by Savvides et al (2006), Campos, Feris & Cesar (2000) and Yuen et al (2009), the following conclusions can be drawn.
The eye and mouth regions are sufficiently distinctive to be used for recognition, and template matching, kernel correlation feature analysis and the eigenface approach can all be used to recognise faces from partial face regions.
Beyond that, the EBGM approach is much more complex than the eigenface approach and would take a lot of time and effort to implement; considering the allocated time, using EBGM is not feasible.
Comparing the eigenface approach with the SVM-based kernel correlation feature analysis method, the latter can be considered novel while the eigenface approach can be considered more stable. The SVM-based KCFA method also uses a support vector machine to measure distances, which takes more computational power than the eigenface approach, and the research established that developing and training a support vector machine within the given time frame is not feasible. Therefore, between the two, the eigenface approach was selected.
3.5 Approach to the solution
As mentioned above, the solution consists of three main modules: face region extraction, face region identification and face matching.

The face region extraction module will use Haar-like features to detect the particular face region, extract it from the face image, and process it to create an eigenimage.

Face region identification is done by projecting the probe image into the eigenimage space and measuring distance and similarity values; the system then decides whether the submitted probe image is a face region or not using the conditions in Table 3.4.

After identifying the face region, the face match module projects it into the eigenspace and determines the best match.
CHAPTER 4
4 REQUIREMENT SPECIFICATION
Since this system identifies individuals using partial face regions, it will have simple interfaces for user interaction. The functional requirements section shows the procedures of the proposed system and their functional outcomes; the non-functional requirements section then describes the performance, usability and security of the system.
Registration
The registration module will allow the user to input a new record into the system. It will then call the face region extraction module to extract the relevant face segments and process them as input for the face recognition module.

Using this module, the user will be able to input a probe image to search for in the database.
Image recognition
This feature recognises the input image against the images in the database and identifies the individual based on the input image.
Accessibility
Since this is developed as a PC client application, it will be accessible to only a single user at a time.
Availability
Accuracy
It is required to have over a 50% accuracy rate for eye-pair detection and over a 40% accuracy rate for eye-pair recognition.
Response Time
Response time will depend on the detection, extraction and recognition modules. Based on the research, a response time of 4 – 5 s is expected.
Security
Since the target users of this application are law enforcement agencies and surveillance monitoring systems, and the application handles sensitive data, it will have a high security level. Therefore, a login module will be implemented for user login.
This part is based on the technical research and investigation described in chapter 6.
The software requirements section discusses the software needed to implement this solution.

MATLAB R2009a requires at least an Intel Pentium 4 or AMD Athlon CPU, with 680 MB of hard disk space and 512 MB of RAM.
Visual Studio 2008 requires at least a 1.6 GHz CPU, 384 MB of RAM, a 1024x768 display and a 5400 RPM hard disk.

The MATLAB Image Processing Toolbox also requires MATLAB plus additional hard disk space.
Emgu CV requires Visual Studio 2008 with a Windows operating system as its minimum requirements.

Considering the above facts, and since this is an image processing project, it may require more processing power and disk storage capacity than these minimums.
Windows XP or Windows 7 operating system (Home or Professional editions).
CHAPTER 5
5 DEVELOPMENT METHODOLOGIES
There are different approaches to developing a particular solution; however, the approach should be selected based on the characteristics and requirements of the project. The next sections of this chapter briefly explain some development approaches and analyse them to select the best approach for the project.
5.1 Waterfall
This is a sequential method in which the phases start one after another, so the success of one phase directly affects the next. In addition, every phase should be well documented.
[Figure: Waterfall model phases: software concept, requirements analysis, architectural design, detailed design, coding and debugging, system testing.]
The basic software functionality is required early.
[Source: SoftDevTeam,2010]
[Source: SoftDevTeam,2010]
5.4 Prototyping
[Figure: Prototyping cycle: build prototype, test, gather user feedback, move to production.]
According to Albert, Yeung & Hall (2007, p. 352), the prototyping approach has two variations.

Evolutionary prototyping

In this approach, one prototype is built and refined until it reaches an appropriate state.
[Figure: Prototyping steps: requirement gathering, quick design, build prototype, engineer product.]
5.5 Justification for the selected method
Considering the project characteristics and facts, I decided to use a hybrid method: a combination of prototyping and the waterfall model, in which the coding and debugging phase of the waterfall model is replaced with an iterative evolutionary prototyping phase.
The reason for selecting this approach is this project requires good documentation.
By using waterfall approaches, it is possible to give rich documentation about the
research. This is important because implementation and accuracy of the solution
depend on the research. In addition, this project is conduct by individual because of
that it is not practical to working on parallel phases. Since this waterfall is a
sequential, that is suite for this project.
The domain research considers the typical face recognition system structure and similar systems, then identifies the approaches taken by those systems. This part of the research also studies problems related to face recognition.
System research (Week 5 – Week 12)
This section studies the image processing and face recognition approaches that can be used to develop the solution.
During this stage, all the details gathered through the research are analysed to identify the technologies required to implement the system.
To better understand and manage the system, the design is produced at this stage. First, the modules of the system are designed, starting with a logical design that gives the overall idea and the theory behind the system logic; the logical design is produced by comparing it with other approaches.
Phase 4.2 Development Increment 2 Face Match Module (Week 23 – Week
28)
During this stage, the following tests are performed on each module and on the system as a whole, checking whether the system meets the required results.
The final documentation is prepared during this stage, and the system and documentation are delivered to the APIIT project board for evaluation.
CHAPTER 6
There are different development platforms and APIs for face recognition. Finding an appropriate platform is very important, so this chapter briefly analyses the available development platforms and APIs.
6.1.1 Dot.Net
.NET Framework 3.5 can be considered the most reliable and stable .NET framework currently available. .NET Framework 4.0 is the next version, which is supposed to be released this year.
6.1.1.1 Dot Net Architecture
.NET runs on top of the Windows API (operating system); the following diagram shows the architecture of .NET.
[Source: Northrup, 2009]
6.1.1.2 Languages
VB.NET, C#.NET and Visual C++.NET are the most popular languages in the .NET framework; besides these, J#, F#, IronPython, etc. can also be considered .NET languages.
C#.NET (C Sharp) is a popular programming language similar to VB.NET, but C# has more object-oriented capability than VB.NET and supports a huge variety of APIs.
There are different software tools for developing .NET applications. Visual Studio .NET, Phrogram and Delphi are commercial tools, while environments like MonoDevelop and SharpDevelop can be considered free development environments. The speciality of MonoDevelop is that it is the only development environment capable of running on Linux and Mac as well as Windows.
Sun Java is a software platform built on the Java Virtual Machine and developed by Sun Microsystems. Today Java is considered a powerful development platform; one reason for its popularity is that it is free, which has led many developers to adopt and extend the platform. The Java Virtual Machine supports several third-party compilers in addition to Java, which is the main and fully supported language of the platform.
Clojure, Groovy, JRuby, Rhino, Jython and Scala are among the most popular compilers and interpreters supported by the Java platform.
6.1.2.1 Java Architecture
Java, the most favoured language of the Java platform, is object-oriented. The technologies and APIs described in the diagram above are compatible with Java.
The J2SE application development layer is where application development starts; the relevant components are embedded in that environment, and using the development platform mentioned above it is possible to develop enterprise-level applications.
Since Java is free, many development environments have been built for Java-based development; among them Eclipse and NetBeans are distinctive. Both allow GUI-based development and integrate with other language interpreters and compilers.
It is possible to integrate MATLAB with .NET, C++ and Java.
An API consists of a set of libraries that provides the functionality to perform processing and to communicate between software components.
The APIs most widely used in the image processing field are AForge.NET, OpenCV and Emgu CV.
6.2.1 OpenCV
OpenCV (Open Source Computer Vision) is a library of programming functions for real-time computer vision (OpenCV, n.d.). It is one of the most popular library sets for image processing and ANN training, and it can be integrated with C, C++ and Python.
OpenCV has over 500 optimized algorithms for image processing, face detection, face recognition, OCR, object tracking, ANN training, etc.
6.2.2 EmguCV
One of the main disadvantages of OpenCV is that it is not easy to integrate with the .NET environment. As a solution, the Emgu CV wrapper was developed to support integration between OpenCV and .NET.
6.2.3 Aforge.Net
AForge.NET only works with C#, which is one of the drawbacks of this framework.
Extract the face region using Emgu CV and perform the recognition process using MATLAB.
In this approach both platforms must be integrated with Visual Studio and a development language such as C#.NET in order to exchange data. There are many ways to integrate MATLAB and Emgu CV separately, but during the research no way was found to integrate both technologies at the same time. Emgu CV also does not provide face recognition using the eigenface method.
OpenCV is an ideal solution because the OpenCV API provides Haar-like feature object detection. C or C++ should be used to interact with OpenCV, and Visual Studio can integrate with OpenCV as the development platform.
The .NET and Java development platforms are competitive platforms that offer similar features and performance. For Windows application development, the .NET environment is more compatible with Microsoft technologies because Microsoft developed both .NET and Windows.
However, for non-Windows application development Java is ideal, because the .NET architecture can only be implemented in non-Windows environments using Mono technology, which is not yet stable. In terms of API support, .NET is supported by more image processing APIs than the Java environment.
By considering the above facts, .NET was selected as the development platform and Visual Studio 2008 as the development software.
As mentioned before, C++ shows a higher efficiency rate and has a lot of support from APIs, including OpenCV, Emgu CV and AForge.NET; therefore C++ provides strong support for image processing.
Visual C#.NET and Visual Basic.NET are simpler languages than C++, and both have less capability for implementing image processing. However, this does not mean that it is impossible to implement an image processing application in those languages. Since VB.NET, C#.NET and Visual C++ all work on the .NET platform, all three languages can use the platform's features and therefore combine well with it.
Nevertheless, when comparing C#.NET, VB.NET and C++, Visual C++ has more processing capacity than C# and VB.NET.
By considering the above facts, Visual C++ was selected as the development language, with OpenCV as the supporting library set for image processing.
If this approach fails, C# and MATLAB will be used as the alternative approach.
CHAPTER 7
Testing and evaluation are necessary to verify and measure the system. Since this project uses an evolutionary-prototyping-based hybrid approach, each module produced by a development phase must be tested.
This checks each module/component of the system right after its implementation, mainly verifying the functionality of the module. This testing is carried out while developing the system; in this project there will be unit testing at the end of each increment, which makes it possible to discover errors in the subsystems.
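As a minimal sketch of what such an increment-level unit test might look like (the `resize_to_training_size` helper and the module it stands in for are hypothetical, not part of the actual system):

```python
import unittest

# Hypothetical stand-in for a module produced by one increment:
# scales (width, height) so the longer side matches the training size.
def resize_to_training_size(width, height, target=100):
    scale = target / max(width, height)
    return round(width * scale), round(height * scale)

class EyeRegionModuleTest(unittest.TestCase):
    # unit test run at the end of the increment, right after implementation
    def test_longer_side_matches_training_size(self):
        self.assertEqual(resize_to_training_size(200, 100), (100, 50))

result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(EyeRegionModuleTest))
```

A real increment test would exercise the actual module interface; the point is only that each subsystem gets its own small, automated check before integration.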
This checks whether the system meets the required functionality after a new module is integrated with the core module, verifying that communication between modules happens properly.
This is performed once all modules have been integrated, ensuring that all modules of the system communicate properly within the system.
This test considers whether the system gives smooth performance and evaluates the reliability and maintainability of the system; it also checks whether the system produces runtime errors or build errors.
7.5 Accuracy testing
This test checks whether the system output meets the required expectations.
The following test plans show the basic tests that can be performed for each module, but they will not be the final tests because each module consists of sub-modules.
E.g. storing and retrieving the face database can be considered one test case.
7.6 Sample Unit Testing Test plans for eye region extraction module
[Test plan table — input: JPEG image]
REFERENCE
Apple Inc. (2010). Apple - iPhoto - Organize, edit, and share photos on the web or
in a book. [Online]. Available from : http://www.apple.com/ilife/iphoto/.[Accessed:
08 March 2008]
Audio Visual Technologies Group. (n.d.). GTAV Face Database. [Online]. Available
from: http://gps-
tsc.upc.es/GTAV/ResearchAreas/UPCFaceDatabase/GTAVFaceDatabase.htm.
[Accessed: 30 January 2010]
Baker, L. (2006). Google, Neven Vision & Image Recognition | Search Engine
Journal. [Online]. Available from : http://www.searchenginejournal.com/google-
neven-vision-image-recognition/3728/. [Accessed: 08 March 2008]
Bagherian, E., Rahmat, R.W. & Udzir, N.I. (2009). Extract of Facial Feature Point. IJCSNS International Journal of Computer Science and Network Security. 9(1). p.49-53.
Bebis, G. (2003). CS4/791Y: Mathematical Methods for Computer Vision. [Online].
Available from : http://www.cse.unr.edu/~bebis/MathMethods/. [Accessed: 23
February 2010]
Biswas, P.K. (2008). Image Segmentation - I. Lecture 29. [Online video]. 16 October. Available from: http://www.youtube.com/watch?v=3qJej6wgezA. [Accessed: 4 March 2010]
Chopra, M. (2007). Two Muslim Girls. [Online]. 24 June 2007. Available from: http://www.intentblog.com/archives/2007/06/two_muslim_girls.html. [Accessed: 8 March 2010]
Costulis, P.K. (2004). The Standard Waterfall Model for Systems Development.
[Online]. Available from : http://web.archive.org/web/20050310133243/http://asd-
www.larc.nasa.gov/barkstrom/public/The_Standard_Waterfall_Model_For_Systems
_Development.htm [Accessed: 28 February 2010]
Cristinacce, D. & Cootes, T. (2003). Facial feature detection using AdaBoost with shape constraints. [Online]. Available from: citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.119.8459. [Accessed: 20 January 2010]
Grady, L. & Schwartz, E.L. (2006). Isoperimetric Graph Partitioning for Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence. [Online] 28. (3). p.469-475. Available from: http://ieeexplore.ieee.org/Xplore/login.jsp?url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F34%2F33380%2F01580491.pdf%3Farnumber%3D1580491&authDecision=-203. [Accessed: 2 March 2010].
Gonzalez, R.C. & Woods, R.E. (2004). Digital Image Processing. 2nd ed. New Jersey: Prentice Hall.
GPWiki. (n.d.). Game Programming Wiki. [Online]. Available from: http://gpwiki.org/index.php/.NET_Development_Environments. [Accessed: 28 February 2010]
Kirby, M. & Sirovich, L. (1990). Application of the Karhunen-Loeve Procedure for the Characterization of Human Faces. IEEE Transactions on Pattern Analysis and Machine Intelligence. [Online] 10. (1). p.103-108. Available from: http://portal.acm.org/citation.cfm?id=81077.81096. [Accessed: 2 March 2010]
Kepenekci, B. (2001). Face Recognition Using Gabor Wavelet Transform. In partial fulfilment of the requirements for the degree of Master of Science. Ankara: The Graduate School of Natural Sciences of the Middle East Technical University.
Neoh, S. & Hazanchuk, N.A. (2005). Adaptive Edge Detection for Real-Time Video Processing using FPGAs. [Online]. Available from: http://www.altera.com/literature/cp/gspx/edge-detection.pdf. [Accessed: 2 March 2010]
Savvides, L. et al. (2006). Partial & Holistic Face Recognition on FRGC-II data using Support Vector Machine. CVPRW: IEEE Proceedings of the 2006 Conference on Computer Vision and Pattern Recognition Workshop. [Online] 24. (6). p.48. Available from: http://portal.acm.org/citation.cfm?id=1153824. [Accessed: 21 February 2010].
Sommerville, I. (2009). Software Engineering. [Online]. Delhi: Pearson Education. Available from: http://books.google.com/books?id=VbdGIoK0ZWgC&dq=Sommerville+%2B+Software+prototyping&source=gbs_navlinks_s. [Accessed: 3 March 2010].
Viola, P. & Jones, M. (2001). Robust Real-time Object Detection. [Online]. Available from: www.hpl.hp.com/techreports/Compaq-DEC/CRL-2001-1.pdf. [Accessed: 20 February 2010].
Waltercedric. (n.d.). Picasa 3.5 available with face recognition / easy geotagging.
[Online]. Available from :
http://www.waltercedric.com/component/content/article/238-software/1653-picasa-
35-available-with-face-recognition--easy-geotagging.html. [Accessed: 08 March
2008]
Wiskott, L. et al. (1997). Face Recognition by Elastic Bunch Graph Matching. IEEE Transactions on Pattern Analysis and Machine Intelligence. [Online] 19. (7). p.775-779. Available from: http://www.face-rec.org/algorithms/EBGM/WisFelKrue99-FaceRecognition-JainBook.pdf. [Accessed: 21 January 2010].
Zhang, W. et al. (2007). Local Gabor Binary Patterns Based on Kullback–Leibler Divergence for Partially Occluded Face Recognition. IEEE Signal Processing Letters. [Online] 14. (11). p.875-878. Available from: http://ieeexplore.ieee.org/Xplore/login.jsp?url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F97%2F4351936%2F04351969.pdf%3Farnumber%3D4351969&authDecision=-203. [Accessed: 21 January 2010].
APPENDICES
APPENDIX A
1. Image Processing Techniques
Spann (n.d.) used three images and obtained the following histograms from them.
Spann (n.d.) mentioned that, using the histograms, the three images above can be identified as follows:
Noise free
“For the noise free image, it’s simply two spikes at i=100, i=150” (Spann, n.d.)
Low Noise
“There are two clear peaks centred on i=100, i=150” (Spann, n.d.)
High Noise
po(T) = Σ P(i), summed over i = 0 … T

pb(T) = Σ P(i), summed over i = T+1 … 255

where P(i) = h(i)/N, h(i) is the histogram count at grey level i and N is the total number of pixels.
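These definitions can be sketched directly in code; a toy 4-bin histogram stands in for a real 256-level one:

```python
# Sketch of the object/background probabilities for a threshold T,
# using P(i) = h(i)/N, po(T) = sum of P(i) for i <= T,
# pb(T) = sum of P(i) for i > T.
def threshold_probabilities(hist, T):
    N = sum(hist)                     # total number of pixels
    P = [h / N for h in hist]         # P(i) = h(i) / N
    po = sum(P[:T + 1])               # object pixels: i = 0 .. T
    pb = sum(P[T + 1:])               # background pixels: i = T+1 ..
    return po, pb

po, pb = threshold_probabilities([10, 30, 40, 20], T=1)
print(po, pb)                         # po ≈ 0.4, pb ≈ 0.6
```

For any threshold T the two probabilities sum to 1, since every pixel falls on one side of T or the other.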
There can be cases where the object and background distributions overlap (this happens when both have similar characteristics).
APPENDIX B
2. Eigenface Step by Step (Formulas)
The following steps are based on Turk and Pentland (1991, cited in Gül 2003 and Bebis 2003).
[Source : Bebis,2003]
Γi = transformed image (2D to 1D)
Take an image matrix of (Nx × Ny) pixels. The image is transformed into a new image vector Γi of size (P × 1), where P = Nx × Ny, and Γi is a column vector. As mentioned in the image space section, this is done by placing each column one after the other.
After transforming the training images to 1D vectors, the average of the faces should be calculated. This average is called the mean face.

Ψ = (1/Mt) Σ Γi , summed over i = 1 … Mt

The mean face is the average of all training image vectors at each pixel point. Its size is (P × 1).
After creating the mean face, the difference between the mean face and each image should be calculated. This makes it possible to identify each training set image uniquely.

Φi = Γi − Ψ

The mean-subtracted image is the difference between a training image and the mean image. Its size is (P × 1).
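A minimal numpy sketch of these first steps (the two 2×2 "faces" are made-up toy data, not real images):

```python
import numpy as np

# Flatten each Nx x Ny image column-wise into a (P x 1) vector Gamma_i,
# compute the mean face Psi, then the mean-subtracted faces Phi_i.
images = [np.array([[1.0, 2.0], [3.0, 4.0]]),
          np.array([[3.0, 2.0], [1.0, 0.0]])]

# columns of Gamma are the image vectors Gamma_i (here P = 4)
Gamma = np.column_stack([im.flatten(order="F") for im in images])

Psi = Gamma.mean(axis=1, keepdims=True)   # mean face, size (P x 1)
Phi = Gamma - Psi                         # mean-subtracted faces Phi_i

print(Psi.ravel())                        # [2. 2. 2. 2.]
```

By construction the Phi columns sum to zero across the training set, which is what makes the later covariance step meaningful.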
Each subtracted mean face (the difference between the mean face and a training image) is calculated in the same way; the output is a matrix holding all the subtracted mean faces.
After finding the differences, principal component analysis is applied to this set of vectors.
λk = (1/Mt) Σ (Uk^T Φn)² , summed over n = 1 … Mt

λk = kth eigenvalue
Uk = kth eigenvector
Mt = number of training images
Φn = difference (mean-subtracted image)
After applying PCA, the corresponding eigenvalues and eigenvectors should be calculated; to do this, the covariance matrix is needed.

C = (1/Mt) Σ Φi Φi^T = A·A^T , summed over i = 1 … Mt

C = covariance matrix
Φi = difference / subtracted mean face
Φi^T = transpose of the difference
A = difference matrix [Φ1, Φ2, …, ΦMt]
i = current image number
The transpose of the difference (Φi^T) is the difference vector written as a row, so A^T is the transposed difference matrix.
For a face image of Nx by Ny pixels, the covariance matrix has size (P × P), where P = Nx × Ny. Processing this covariance matrix directly consumes a lot of time and processing power, so using it in this form is not a good choice. According to Turk & Pentland (1991), if the number of data points in the image space M is less than the dimension of the face space (M < N²), there will be only M − 1 meaningful eigenvectors. These can be found by solving an M × M problem, which is considerably more efficient than solving for the N × N dimensional matrix; the eigenvectors of A·A^T are then obtained as linear combinations of the face images Φi.
Turk & Pentland (1991) stated that by constructing the M × M matrix L = A^T·A it is possible to obtain M eigenvectors. They also mentioned that the linear combinations of the M training set images that form the eigenfaces Ul, where l = 1 … M, are determined from those eigenvectors:

Ul = Σ Vlk Φk , summed over k = 1 … M

Ul = eigenface l
k = image number
M = number of images in the training set
Φk = difference of the kth image
Vlk = kth component of the lth eigenvector of L
Turk & Pentland (1991) stated that this approach greatly reduces the required calculations: “from the order of the number of pixels in the images (N²) to the order of the number of images in the training set (M)”, and in practice the training set of face images is relatively small.
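The M × M trick above can be sketched with numpy on random toy data (the sizes P and M are arbitrary choices here): each eigenvector v of L = A^T·A maps to an eigenface u = A·v of A·A^T with the same eigenvalue.

```python
import numpy as np

# Eigen-decompose the small M x M matrix L = A^T A instead of the
# large P x P covariance A A^T, then map eigenvectors back: u_l = A v_l.
rng = np.random.default_rng(0)
P, M = 100, 5                       # P pixels per image, M training images
A = rng.standard_normal((P, M))     # columns: mean-subtracted faces Phi_i

L = A.T @ A                         # M x M, cheap to decompose
eigvals, V = np.linalg.eigh(L)      # eigenvectors v_l of L (ascending order)

U = A @ V                           # eigenfaces u_l = A v_l (columns)
U /= np.linalg.norm(U, axis=0)      # normalise each eigenface

# verify: u is an eigenvector of A A^T with the same eigenvalue,
# since (A A^T)(A v) = A (A^T A v) = lambda (A v)
C = A @ A.T
residual = C @ U[:, -1] - eigvals[-1] * U[:, -1]
print(np.max(np.abs(residual)))     # numerically ~0
```

This is exactly why the trick works: if L·v = λ·v, then (A·A^T)(A·v) = A(A^T·A·v) = λ(A·v).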
Figure 4.8 and figure 4.9 show images taken from the ORL database and the mean image calculated using the ORL faces.
[Source: Gül, 2003]
Furthermore, Gül (2003) stated that by using fewer eigenfaces than the total number of faces for the eigenface projection, it is possible to eliminate the eigenvectors with small eigenvalues, which contribute less variance to the data.
Figure 2.4: Eigenfaces
Based on Turk & Pentland (1991) and Turk & Pentland (1990), Gül (2003) described that a larger eigenvalue indicates larger variance; therefore the eigenvector with the largest eigenvalue is taken as the first eigenvector, so the most generalizing eigenvectors come first in the eigenvector matrix.
Let Γ be the probe image (the image to be recognised). It should be the same size as the training images and taken under the same lighting conditions. As mentioned by Turk & Pentland (1991), the image should be normalized before it is transformed into eigenface space. Assuming Ψ contains the average of the training set, the following formula gives the probe's difference from the average:

Φ = Γ − Ψ

Φ = distance of difference
Γ = captured image value
Ψ = average value
The projection value is calculated from the probe image and the values calculated from the training images:

Φ` = Σ wi ui , summed over i = 1 … k, where wi = ui^T Φ

Φ` = projection value
k = image set count
i = image number
wi = projection of the image
ui = eigenfaces
Φ = distance of difference
Ω = (w1, w2, w3, … , wM)

Ω = weight matrix
w = weight of image / projection of the image
The Euclidean distance is used to measure the distance between the projection value (of the eigenfaces in the training set) and the input image:

e² = ‖Φ` − Φi‖²

e = Euclidean distance
Φ` = projection value
Φi = projection value of image i

The minimum e is then taken:

e_min = min over i of ‖Φ` − Φi‖
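These two steps — computing the weights wi = ui^T Φ and picking the training face whose weight vector is nearest in Euclidean distance — can be sketched as follows (the eigenfaces and faces below are toy values, not real data):

```python
import numpy as np

# Project probe and training faces onto the eigenfaces, then find
# the training face whose weight vector is nearest in Euclidean distance.
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])            # toy orthonormal eigenfaces (columns)
train_Phi = np.array([[2.0, -1.0],
                      [0.0,  3.0],
                      [0.0,  0.0]])   # mean-subtracted training faces

W_train = U.T @ train_Phi             # weight vector Omega_i per face
probe_Phi = np.array([1.8, 0.1, 0.0])
w = U.T @ probe_Phi                   # probe weights w_i = u_i^T Phi

dists = np.linalg.norm(W_train - w[:, None], axis=0)
print(int(np.argmin(dists)))          # 0: probe is closest to face 0
```

The comparison happens entirely in the low-dimensional weight space, which is what makes eigenface matching cheap.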
Step 9: classify face image
First the probe image is normalized and transformed into a 1D vector, then the distance difference is calculated:

Φ = Γ − Ψ

Φ = distance of difference
Γ = captured image value
Ψ = average value

The projection value is calculated from the probe image and the values calculated from the training images:

Φ` = Σ wi ui , summed over i = 1 … k, where wi = ui^T Φ

Φ` = projection value
k = image set count
i = image number
wi = projection of the image
ui = eigenfaces
Φ = distance of difference

The Euclidean distance is used to measure the distance between the projection value and the input image:

e² = ‖Φ` − Φi‖²

e = Euclidean distance
Φ` = projection value
Φi = projection value of image i
The distance threshold defines the maximum allowable distance from any face class; it is equal to half the distance between the two most distant classes:

Θ = (1/2) max over j,k of ‖Ωj − Ωk‖

Θ = distance threshold
Ωj, Ωk = weight vectors of face classes j and k
“Classification procedure of the Eigenface method ensures that face image vectors should fall close to their reconstructions, whereas non-face image vectors should fall far away.” Gül (2003)

ε² = ‖Φ − Φ`‖²

The distance measure ε is the distance between the mean-subtracted image and its reconstruction.
Based on Turk & Pentland (1991), recognising an image can be done by knowing e_min and ε.
If ε > Θ, the input is not a face.
If ε < Θ and e_min > Θ, it is an unknown face.
If ε < Θ and e_min < Θ, the image matches training image i.
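The three decision rules can be written as a small sketch (the ε, e_min and Θ values below are made-up illustrations):

```python
# Classify a probe from its distance-to-face-space (eps), nearest-class
# distance (e_min) and the distance threshold (theta).
def classify(eps, e_min, theta):
    if eps > theta:
        return "not a face"          # far from face space
    if e_min > theta:
        return "unknown face"        # a face, but near no known class
    return "known face"              # matches the class giving e_min

print(classify(eps=5.0, e_min=1.0, theta=3.0))   # not a face
print(classify(eps=2.0, e_min=4.0, theta=3.0))   # unknown face
print(classify(eps=2.0, e_min=1.0, theta=3.0))   # known face
```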