CHAPTER 4
GABOR AND NEURAL BASED FACE RECOGNITION
4.1 GABOR WAVELET BASED FACE RECOGNITION

4.1.1 Introduction
Generally, a wavelet can be viewed as a continuous wave of effectively limited duration that has an average value of zero.
4.1.2 Gabor Wavelets
Gabor wavelets are self-similar, i.e., all the filters can be generated from
one mother wavelet by dilation and rotation. 2D Gabor functions (Hossein Sahoolizadeh
et al 2008) enhance edge contours, as well as valley and ridge contours, of the
image. This corresponds to enhancing the eye, mouth and nose edges, which are
considered the most important points
on a face. Moreover, such an approach also enhances moles, dimples, scars,
etc. Hence, by using such enhanced points as feature locations, a feature
map for each facial image can be obtained and each face can be
represented with its own characteristics without any initial constraints.
Having feature maps specialized for each face makes it possible to keep
overall face information while enhancing local characteristics (Zhang B et al
2007).
Gabor wavelets are used to extract facial appearance changes as a
set of multi-scale and multi-orientation coefficients. These coefficients are shown to be
robust against noise and changes in illumination for all facial patterns. The common
approach when using Gabor filters (Chengjun Liu 2002) for face recognition
is to construct a filter bank with filters of different scales and orientations and
to filter the given face image. A well-designed Gabor filter bank can capture
the relevant frequency spectrum in all directions.
This method is based on selecting peaks (high-energy points) of
the Gabor wavelet responses as feature points. Detected feature points
together with their locations are stored as feature vectors. The feature vector
consists of all useful information extracted from different frequencies,
orientations and from all locations and is hence very useful for expression
recognition. Feature vectors are generated by sampling wavelet responses of
the facial images at the specific nodes.
A 2D Gabor filter \psi_{f,\theta_n}(x, y) can be represented as

\psi_{f,\theta_n}(x, y) = \frac{1}{2\pi\sigma_x\sigma_y}\,\exp\!\left[-\frac{1}{2}\left(\frac{x_n^2}{\sigma_x^2}+\frac{y_n^2}{\sigma_y^2}\right)\right]\exp\!\left(j\,2\pi f x_n\right)        (4.1)

where σ_x and σ_y are the standard deviations of the Gaussian envelope along the x and y
directions, f is the frequency of the modulating sinusoid and θ_n is the n-th orientation. The rotated coordinates are

x_n = x\cos\theta_n + y\sin\theta_n, \qquad y_n = -x\sin\theta_n + y\cos\theta_n        (4.2)

with θ_n denoting the n-th of N equally spaced orientations (n = 1, …, N), so that rotating the mother wavelet will result in a bank of filters tuned to N orientations. The
The
important issue in the design of Gabor filters for face recognition is the choice
of filter parameters. The Gabor representation of a face image is computed by
convolving the face image with the Gabor filters. Let f(x, y) be the intensity at
the coordinate (x, y) in a gray scale face image; its convolution with a Gabor
filter \psi_{f,\theta}(x, y) is defined as

g_{f,\theta}(x, y) = f(x, y) * \psi_{f,\theta}(x, y)        (4.3)

where * denotes the convolution operator. The magnitude of the complex response is

\left\|g_{f,\theta}(x, y)\right\| = \sqrt{\mathrm{Re}^2\!\left[g_{f,\theta}(x, y)\right] + \mathrm{Im}^2\!\left[g_{f,\theta}(x, y)\right]}        (4.4)
The orientation parameter takes the values 0, 2π/5, 3π/5, 4π/5, π, 6π/5 and 7π/5.
The Gabor wavelets are scale invariant and the statistics of the image must
remain constant as one magnifies any local region of the image.
Figure 4.1 illustrates the convolution result of a face image with a
Gabor filter. Here, a 2D Gabor filter is expressed as a Gaussian modulated
sinusoid in the spatial domain and as a shifted Gaussian in the frequency
domain.
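The filtering pipeline of Equations (4.1)-(4.4) can be sketched in a few lines of NumPy/SciPy. The frequencies, orientations and standard deviations below are illustrative assumptions, not the exact parameter values used in this work:

    import numpy as np
    from scipy.signal import fftconvolve

    def gabor_kernel(f, theta, sigma_x, sigma_y, size=31):
        # 2D Gabor filter of Equation (4.1): a Gaussian envelope modulated
        # by a complex sinusoid of frequency f along orientation theta.
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        x_n = x * np.cos(theta) + y * np.sin(theta)      # rotated coordinates, Equation (4.2)
        y_n = -x * np.sin(theta) + y * np.cos(theta)
        envelope = np.exp(-0.5 * (x_n**2 / sigma_x**2 + y_n**2 / sigma_y**2))
        carrier = np.exp(2j * np.pi * f * x_n)
        return envelope * carrier / (2 * np.pi * sigma_x * sigma_y)

    def gabor_magnitude(image, kernel):
        # Convolution of Equation (4.3) followed by the magnitude of Equation (4.4).
        return np.abs(fftconvolve(image, kernel, mode='same'))

    # Illustrative bank of 40 filters (5 frequencies x 8 orientations).
    frequencies = [0.25 / (np.sqrt(2) ** k) for k in range(5)]
    orientations = [n * np.pi / 8 for n in range(8)]
    bank = [gabor_kernel(f, th, sigma_x=4.0, sigma_y=4.0)
            for f in frequencies for th in orientations]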
4.1.3 Feature Extraction

Feature extraction for face recognition has two main steps: feature point
localization and feature vector computation (Lee 1996). In this stage, feature
vectors are extracted from points with high information content on the face
image. In most feature-based methods, facial features are assumed to be the
eyes, nose and mouth (Yousra Ben Jemaa and Sana Khanfir 2009). The number of
feature vectors and their locations can vary in order to better represent
diverse facial characteristics.
Figure 4.2 Flowchart of the feature extraction stage of the facial images
From the responses of the face image to Gabor filters, peaks are
found by searching the locations in a window W0 of size (w*w) by the
following procedure:
A feature point is located at (x_0, y_0) if

R_j(x_0, y_0) = \max_{(x, y) \in W_0} R_j(x, y)        (4.5)
and

R_j(x_0, y_0) > \frac{1}{N_1 N_2}\sum_{x=1}^{N_1}\sum_{y=1}^{N_2} R_j(x, y)        (4.6)
Here j = 1, …, 40, where R_j is the response of the face image to the j-th Gabor filter,
N_1 and N_2 are the dimensions of the face image, and the center of the window W_0 is at
(x_0, y_0). The window size w must be chosen small enough to capture the
important features and large enough to avoid redundancy.
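A direct (unoptimised) Python sketch of this search is given below; it assumes responses is the list of 40 magnitude maps produced by the filter bank above and w is the assumed window width:

    import numpy as np

    def locate_feature_points(responses, w=9):
        # Equations (4.5)-(4.6): keep a pixel if it is the maximum of R_j inside
        # the w x w window centred on it and exceeds the mean of R_j over the image.
        points = set()
        half = w // 2
        for R in responses:                      # one response map per Gabor filter
            mean_response = R.mean()             # right-hand side of Equation (4.6)
            n1, n2 = R.shape
            for x0 in range(half, n1 - half):
                for y0 in range(half, n2 - half):
                    window = R[x0 - half:x0 + half + 1, y0 - half:y0 + half + 1]
                    if R[x0, y0] == window.max() and R[x0, y0] > mean_response:
                        points.add((x0, y0))
        return sorted(points)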
4.1.4 Feature Vector Computation

Feature vectors are generated at the detected feature points as the point coordinates together with the Gabor wavelet responses sampled there,

v_k = \left(x_k,\; y_k,\; R_j(x_k, y_k)\right), \qquad j = 1, \ldots, 40        (4.7)
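As a sketch, the feature vector of Equation (4.7) for one detected point is simply the point's coordinates followed by the 40 filter responses sampled at that location:

    def feature_vector(point, responses):
        # Equation (4.7): location of the feature point plus the response of
        # every Gabor filter at that location.
        x_k, y_k = point
        return [x_k, y_k] + [R[x_k, y_k] for R in responses]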
The function im2vec(image) converts an image window into a vector suitable for the neural network input, as described below.
4.1.6

The function im2vec operates on a window of the test image, whose actual size
is 320 × 243. At first the function adjusts the histogram of the window. Then,
to convolve the window with the Gabor filters, the window is multiplied by the
Gabor filters in the frequency domain. The Gabor filters are loaded and the
window histogram is adjusted, with the parameters set by trial and error. The
numbers in the input vector of the neural network should lie between -1 and 1.
For this, a feature matrix of size 45 × 48 is formed, and the matrix is then
converted into an image vector of size 2160 × 1 by reshaping.
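A rough Python sketch of this conversion is given below. It assumes gabor_bank holds frequency-domain filters of the same size as the window; the histogram adjustment and the pooling into a 45 × 48 feature matrix are simplified placeholders rather than the exact implementation used in this work:

    import numpy as np

    def im2vec(window, gabor_bank):
        # Convert an image window into a 2160 x 1 input vector: adjust the
        # contrast, filter in the frequency domain, form a 45 x 48 feature
        # matrix and reshape it, scaling the values into [-1, 1].
        window = window.astype(float)
        window = (window - window.min()) / (np.ptp(window) + 1e-9)
        W = np.fft.fft2(window)
        responses = [np.abs(np.fft.ifft2(W * G)) for G in gabor_bank]
        feat = np.resize(np.mean(responses, axis=0), (45, 48))   # crude 45 x 48 pooling
        vec = feat.reshape(2160, 1)                              # 45 * 48 = 2160 values
        return 2.0 * (vec - vec.min()) / (np.ptp(vec) + 1e-9) - 1.0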
The input query image is shown in Figure 4.4 and is resized into a matrix
of size 50 × 50.
The Gabor wavelet technique has recently been used not only for face
recognition, but also for face tracking and face position estimation. Thus this
approach not only reduces computational complexity, but also improves the
performance in the presence of occlusions. For a given input image the Gabor
filters are formed as shown in Figure 4.11(c).
4.1.7 Summary

Gabor wavelets provide optimal resolution in both the spatial and frequency domains.
4.2 NEURAL NETWORK BASED FACE RECOGNITION

4.2.1 Introduction
A boosting learning process is used to reduce the feature
dimensions and make the Gabor feature extraction process substantially more
efficient (Daugman 1988). Combining optimized Gabor features with Neural
Networks (Rowley et al 1996) not only reduces the computation and memory cost of
the feature extraction process, but also achieves very accurate recognition
performance. In practice, the training process of a neural network does not consist
of a single call to a training function; instead, the network is trained several
times on various noisy images (Hutchinson and Welsh 1989).
In the previous chapters, PCA and LDA based face reconstruction
and discrimination were carried out effectively. However, the classification of
face and non-face images was not performed, and the images are reconstructed even
for a rose, as shown in Figure 4.5. In this work, neural networks (Agui et al 1992)
effectively classify face and non-face images using the BPNN algorithm.
4.2.2 Neural Networks

Neural networks perform well when they have a large database of previously stored training data. Neural
networks can be used to extract patterns and detect trends that are too
complex to be noticed by either humans or other computer techniques. Neural
networks exhibit the ability (Hutchinson and Welsh 1989) of adaptive
learning, which is the ability to learn how to do tasks based on the data given
for training or initial experience.
To reduce complexity, neural network (Jahan Zeb et al 2007) is
often applied to the face recognition phase rather than to the feature extraction
phase. The network is initialized with random weights at first, and the data is
then fed into the network. As each sample is tested, the result is checked. The
square of the difference between the expected and actual result is calculated,
and this value is used to adjust the weights of each connection accordingly. The
accuracy of neural networks is mostly a function of the size of their training
set rather than their complexity. The procedure for face recognition using a
neural network is shown in Figure 4.6.
A neural network with at least one hidden layer is able to approximate almost any
regularity between its input and output. The delta rule is often utilized by the
most common class of ANNs, called back propagation neural networks. The NN weights
are adjusted by a supervised training procedure called back propagation. Back
propagation performs a gradient descent within the solution's vector space towards
a minimum of the error. The flow chart of the BPNN algorithm used to identify
whether a given image is a face or a non-face is shown in Figure 4.7.
Figure 4.7 Flow chart for neural network based face recognition
Back propagation is a kind of gradient descent method, which
searches for an acceptable local minimum in the NN weight space in order to
achieve minimal error. In principle, NNs can compute any computable
function, i.e., they can do everything a normal digital computer can do
(Kurita et al 2003). Almost any mapping between vector spaces can be
approximated by such a network.
BPNN Algorithm
STEP 1: Load the new neural network using the MATLAB function.
STEP 2: Call the special function sim, passing the new neural network and the image vector as parameters.
STEP 3: Train the neural network.
STEP 4: Obtain the return value in the variable result.
STEP 5: If result is greater than 0.1, then print that the given image is a face.
STEP 6: Make F = 1.
STEP 7: Else, print that the given image is a non-face.
STEP 8: Make F = 0.
STEP 9: If F = 1, call the PCA program.
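The decision logic of these steps is summarised in the Python sketch below; net.predict is a hypothetical stand-in for the MATLAB sim call used in the thesis:

    def classify_window(net, image_vector, threshold=0.1):
        # Steps 2-9: feed the image vector through the trained network and
        # flag the window as a face when the output exceeds the threshold.
        result = net.predict(image_vector)
        if result > threshold:
            print("The given image is a face")
            return 1          # F = 1; the PCA stage is invoked next
        print("The given image is a non-face")
        return 0              # F = 0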
The BPNN algorithm involves two phases. During the first phase,
the input vector is presented and propagated forward through the network to
compute the output values ok for each output unit. This output is compared
with its desired value, resulting in an error signal for each output unit. The
second phase involves a backward pass through the network during which the
error signal is passed to each unit in the network and appropriate weight
changes are calculated.
The learning process in back propagation requires providing pairs of
input and target vectors. The output vector o of each input vector is
compared with the target vector t. If the two differ, the weights
are adjusted to minimize the difference. Initially, random weights and
thresholds are assigned to the network. These weights are updated every
iteration in order to minimize the cost function or the mean square error
between the output vector and the target vector. The BPNN algorithm applied
in face recognition is shown in Figure 4.8.
The input to the m-th hidden neuron is the weighted sum of the input units,

\mathrm{net}_m = \sum_{z} x_z w_{mz}        (4.8)

where w_{mz} is the weight from input unit z to hidden unit m. The units of the output vector of the hidden layer, after passing through the activation function, are given by

h_m = \frac{1}{1 + \exp(-\mathrm{net}_m)}        (4.9)
Similarly, the input to the k-th output neuron is the weighted sum of the hidden-layer outputs,

\mathrm{net}_k = \sum_{z} h_z w_{kz}        (4.10)

and the corresponding output is

o_k = \frac{1}{1 + \exp(-\mathrm{net}_k)}        (4.11)
For updating the weights, we need to calculate the error. This can be done by

E = \frac{1}{2}\sum_{i=1}^{l}\left(o_i - t_i\right)^2        (4.12)

where l is the number of output units. The change of the weights between the hidden layer and the output layer is then

\Delta w_{ij} = \eta\, \delta_i\, h_j        (4.13)
where η is the learning rate and δ_i can be obtained by

\delta_i = \left(t_i - o_i\right) o_i \left(1 - o_i\right)        (4.14)
Here o_i and t_i represent the actual output and the target output at neuron i in the output
layer respectively. Similarly, the change of the weights between the input layer
and the hidden layer is given by

\Delta w_{ij} = \eta\, \delta_i^{H}\, x_j        (4.15)

where
xj is the output of neuron j in the input layer. A hidden unit h receives a delta
from each output unit o equal to the delta of that output unit weighted with the
weight of the connection between those units.
The hidden-layer delta δ_i^H can be obtained by

\delta_i^{H} = h_i\left(1 - h_i\right)\sum_{j=1}^{k}\delta_j w_{ij}        (4.16)
Here h_i is the output at neuron i in the hidden layer, and the summation term represents
the weighted sum of the deltas δ_j of all k output units connected to it.
After calculating the weight changes in all layers, the weights can simply be updated by

w_{ij}(\mathrm{new}) = w_{ij}(\mathrm{old}) + \Delta w_{ij}        (4.17)
A momentum term α can be included in the weight update,

\Delta w_{ij}(N) = -\eta\,\frac{\partial E\big(w(N)\big)}{\partial w_{ij}} + \alpha\,\Delta w_{ij}(N-1)        (4.18)

and this is called the generalized delta rule. The effect is that if the basic delta
rule is consistently pushing a weight in the same direction, then it gradually
gathers "momentum" in that direction. Including the momentum term smoothens the
weight changes and amplifies the effective learning rate, causing faster
convergence and enabling the search to escape from small local minima on the error surface.
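Equations (4.8)-(4.18) translate into the following minimal NumPy sketch of one-hidden-layer training with momentum; it is an illustration of the update rules rather than the implementation used in this work, and the default parameter values are only illustrative:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_bpnn(X, T, n_hidden=70, lr=0.4, momentum=0.9, epochs=400):
        # X: (samples x inputs), T: (samples x outputs); returns the two weight matrices.
        rng = np.random.default_rng(0)
        W1 = rng.uniform(-0.5, 0.5, (X.shape[1], n_hidden))   # input -> hidden weights
        W2 = rng.uniform(-0.5, 0.5, (n_hidden, T.shape[1]))   # hidden -> output weights
        dW1 = np.zeros_like(W1)
        dW2 = np.zeros_like(W2)
        for _ in range(epochs):
            for x, t in zip(X, T):
                h = sigmoid(x @ W1)                 # Equations (4.8)-(4.9)
                o = sigmoid(h @ W2)                 # Equations (4.10)-(4.11)
                delta_o = (t - o) * o * (1.0 - o)   # Equation (4.14)
                delta_h = h * (1.0 - h) * (W2 @ delta_o)          # Equation (4.16)
                dW2 = lr * np.outer(h, delta_o) + momentum * dW2  # Equations (4.13), (4.18)
                dW1 = lr * np.outer(x, delta_h) + momentum * dW1  # Equations (4.15), (4.18)
                W2 += dW2                           # Equation (4.17)
                W1 += dW1
        return W1, W2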
The feature representation vectors from PCA and LDA are then
used to train the weighting factors in the combined neural networks. Gradient
descent, also called steepest descent, is one of the algorithms developed for
non-linear optimization problems. The BPNN algorithm moves in the space of the
weight variables in the direction opposite to the gradient of the function being
minimized.
A large number of neurons in the hidden layer can give high
generalization error due to over fitting and high variance. On the other hand,
with fewer neurons, high training error and high generalization error are
obtained due to under fitting and high statistical bias. 'Over fitting' is the
phenomenon whereby, in most cases, a network that is trained to as low an error
as possible gets worse instead of better after a certain point during training.
4.2.5 Training Parameters

The network must be trained so that it can actually learn from the input data. The various parameters assumed for this network are as follows:

No. of input units    = 1 feature vector
No. of hidden neurons = 70
No. of output units   = 1
Learning rate         = 0.4
No. of epochs         = 400
Error goal            = 0.01
Momentum              = 0.9
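With the train_bpnn sketch given earlier, these parameters would be wired in roughly as follows; the training data here is a random placeholder, and the labels reconstructed above for the hidden/output unit counts are assumptions:

    import numpy as np

    rng = np.random.default_rng(1)
    X_train = rng.uniform(-1.0, 1.0, (20, 2160))           # placeholder feature vectors in [-1, 1]
    T_train = rng.integers(0, 2, (20, 1)).astype(float)    # placeholder face / non-face labels

    W1, W2 = train_bpnn(X_train, T_train,
                        n_hidden=70, lr=0.4, momentum=0.9, epochs=400)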
Figure 4.11 (a)-(e) Images of Gabor based neural network
4.2.6 Advantages, Disadvantages and Applications of Neural Networks
High accuracy (a recognition rate of more than 90 %), ease of implementation
and reduced execution time are the main advantages of neural network based
face recognition. Neural networks are also flexible for solving non-linear
tasks. However, since a gradient-based method is applied, some inherent problems
such as slow convergence and difficulty in escaping from local minima are encountered here.
In practice, NNs are especially useful for classification and
approximation problems when rules such as those that might be used in an
expert system cannot easily be applied. NNs are, at least today, difficult to
apply successfully to problems that concern the manipulation of symbols and memory.
Figure 4.12 compares the recognition rates (%) obtained with the PCA, FLDA and BPNN algorithms for varying numbers of images; the values are summarised below.

No. of Images    PCA    FLDA    BPNN
     50           89     92      94
    100           86     88      90
    200           83     86      88
    300           80     83      86
    400           75     79      82
4.3 CASCADE CORRELATION NEURAL NETWORK

The Cascade Correlation Neural Network (CCNN) is built on a cascade architecture,
in which hidden neurons are added to the network one at a time and do not change
after they have been added. It is called a cascade because the output from all
neurons already in the network feeds into the new
neurons. As new neurons are added to the hidden layer, the learning algorithm
attempts to maximize the magnitude of the correlation between the new
neuron's output and the residual error of the network, which is to be
minimized. A cascade correlation neural network has three layers: input,
hidden and output.
Input Layer: A vector of predictor variable values (x1 … xp) of the
given image is presented to the input layer. The input neurons perform no
action on the values other than distributing them to the neurons in the hidden
and output layers. In addition to the predictor variables, there is a constant
input of 1.0, called the bias, that is fed to each of the hidden and output
neurons. The bias is multiplied by a weight and added to the sum going into
the hidden neuron.
Hidden Layer: Arriving at a neuron in the hidden layer, the value
from each input neuron is multiplied by a weight, and the resulting weighted
values are added together producing a combined value. The weighted sum is
fed into a transfer function, which outputs a value. The outputs from the
hidden layer are distributed to the output layer.
Output Layer: Each output neuron receives values from all of the
input neurons and all the hidden layer neurons, along with the bias value. Each
value presented to the output neuron is multiplied by a weight, and the
resulting weighted values are added together producing a combined output
value. The weighted sum is fed into a transfer function, which outputs a final
value for classification. For regression problems, a linear transfer function is
used in the output neurons. But for classification problems, there is a neuron
for each category of the target variable and a sigmoid transfer function is used.
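A compact Python sketch of the resulting forward pass is given below; it assumes one frozen weight vector per installed hidden unit, output_weights with one row per output neuron (covering the bias, the inputs and all hidden units), and a sigmoid output for classification:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def cascade_forward(x, hidden_weights, output_weights):
        # Each hidden unit sees the bias, the predictors and every earlier hidden
        # unit; each output unit sees the bias, the predictors and all hidden units.
        values = np.concatenate(([1.0], x))        # constant bias input of 1.0 plus predictors
        for w in hidden_weights:                   # frozen incoming weights of each hidden unit
            values = np.append(values, sigmoid(values @ w))
        return sigmoid(values @ output_weights.T)  # sigmoid transfer for classification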
4.3.2 CCNN Training Algorithm

Cascade correlation combines two key ideas. The first is the cascade architecture,
in which hidden units are added to the network one at a time and do not change
after they have been added. The second is the learning algorithm, which creates
and installs the new hidden units. For each new hidden unit, an attempt is made
to maximize the magnitude of the correlation between the new unit's output and
the residual error signal. The training steps of the CCNN algorithm are as follows.
Step 1: Start with a minimal network consisting only of the input and output layers, and train the output weights over the training set.
Step 2: When training no longer reduces the error significantly, measure the residual error of the network.
Step 3: If the residual error is acceptably small, stop; otherwise create a candidate hidden unit connected to all input units and to all previously installed hidden units.
Step 4: Train the incoming weights of the candidate unit so as to maximize the correlation between its output and the residual error (Equation 4.19).
Step 5: Freeze the incoming weights of the candidate and install it as a new hidden unit, connecting its output to all the output units.
Step 6: Retrain the output weights and repeat from Step 2 until the stopping criterion is met.
The input and the output neurons are linked by a weight value.
Values on a vertical line are added together after being multiplied by their
weights. Every input is connected to every output unit by a connection with
an adjustable weight. There is also a bias input, permanently set to +1. The
output units may just produce a linear sum of their weighted inputs, or they
may employ some non-linear activation function. So each output neuron
receives a weighted sum from all of the input neurons including the bias. The
cascade architecture with one hidden unit is shown in Figure 4.14.
The cascading of hidden units allows the network to build high-order feature
detectors; it also may lead to very deep networks and high fan-in
to the hidden units.
For each candidate unit, the quantity maximized during training is the correlation C between the candidate's output and the residual output error of the network,

C = \sum_{o}\left|\sum_{p}\left(y_p - \bar{y}\right)\left(e_{op} - \bar{e}_o\right)\right|        (4.19)

where y_p is the candidate unit's output for training pattern p, e_{op} is the residual error at output o for pattern p, and \bar{y} and \bar{e}_o are the mean values of the outputs and output errors over all patterns p of the training sample.
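As a sketch, the correlation of Equation (4.19) for a single candidate unit can be computed as follows:

    import numpy as np

    def candidate_correlation(y, errors):
        # y: candidate output per training pattern (shape P,);
        # errors: residual error per pattern and output unit (shape P x O).
        y_centered = y - y.mean()
        e_centered = errors - errors.mean(axis=0)        # subtract the mean error of each output
        return np.abs(e_centered.T @ y_centered).sum()   # sum of |covariances| over the outputs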
After learning, the candidate node is added to the main net and the
weights of this added node are frozen. The output of this node, in its turn, can
either be forwarded to the output of the main net or serve as one of the inputs
for subsequently added hidden units. Hidden nodes added one by one in this way
build up the cascade architecture shown in Figure 4.16.
Figure 4.16 Cascade architecture (input units, hidden unit 1, hidden unit 2, output units)
To adjust the incoming weights of a unit, the derivative of the error with respect to each weight is computed over all training patterns,

\frac{\partial E}{\partial w_{i}} = \sum_{p} e_{op}\, X_{ip}, \qquad e_{op} = \left(y_{op} - t_{op}\right) f'_p        (4.20)

and, for a candidate unit, the derivative of the correlation C is

\frac{\partial C}{\partial w_{i}} = \sum_{o}\sum_{p}\left(e_{op} - \bar{e}_o\right) f'_p\, X_{ip}        (4.21)

where X_{ip} is the input the unit receives from unit i for pattern p and f'_p is the derivative of the unit's activation function for pattern p.
If the derivatives \partial E/\partial w and \partial C/\partial w are both denoted by S, the weight correction formula is

\Delta w_t =
\begin{cases}
S(t), & \text{if } \Delta w_{t-1} = 0,\\
\Delta w_{t-1}\,\dfrac{S(t)}{S(t-1) - S(t)}, & \text{otherwise}
\end{cases}        (4.22)
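A small Python sketch of this update rule is shown below; the scale applied to the very first step is an assumption of the sketch:

    def weight_step(S_now, S_prev, dw_prev, first_step_scale=1.0):
        # Equation (4.22): S is the derivative being followed (dE/dw when
        # training output weights, dC/dw when training a candidate unit).
        if dw_prev == 0.0:
            return first_step_scale * S_now              # first move: follow the derivative
        return dw_prev * S_now / (S_prev - S_now)        # subsequent quickprop-style step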
While the weights in the output layer are being trained, the other weights in
the active network are frozen. While the candidate weights are being trained,
none of the weights in the active network are changed. In a machine with
plenty of memory, it is possible to record the unit-values and the output errors
for an entire epoch, and then to use these cached values repeatedly during
training, rather than recomputing them for each training case. This can result
in a tremendous speedup, especially for large networks. In addition, a
reasonably small net is built automatically.
4.3.3 Advantages and Disadvantages of Cascade Correlation Algorithm
Cascade-Correlation Network is useful for incremental learning, in
which new information is added to an already-trained net. Once built, a
feature detector is never cannibalized. It is available from that time on for
producing outputs or more complex features. Training on a new set of
examples may alter a network's output weights, but these are quickly restored
on return to the original problem. At any given time, only one layer of
weights in the network can be trained. The rest of the network is not changing.
In CCNN, there is no need to guess the size, depth, and
connectivity pattern of the network in advance. It may be possible to build
networks with a mixture of nonlinear types. Cascade-Correlation learns fast.
In back propagation, the hidden units interact in a complex way before they
settle into distinct useful roles; in Cascade-Correlation, each unit sees a fixed
problem and can move decisively to solve that problem. The learning time in
epochs grows very roughly as N log N, where N is the number of hidden units
ultimately needed to solve the problem. Cascade-Correlation can build deep
nets (high-order feature detectors) without the dramatic slowdown that is seen
in back-propagation networks with more than one or two hidden layers.
Table 4.2 Recognition rate (%) of BPNN and CNN

No. of images    BPNN    CNN
     50           94      95
    100           90      92
    200           88      89
    300           86      88
    400           82      84
Table 4.3 Execution time of BPNN + FLDA and CNN + FLDA

No. of images    BPNN + FLDA    CNN + FLDA
     50             30.09          25.36
    100             39.51          33.04
    200             45.26          39.41
    300             51.02          45.21
    400             60.26          53.34
4.3.4 Summary
Neural Networks (NN) have found use in a large number of
computational disciplines. The well known PCA and FLDA algorithms are
applied with BPNN to improve the performance. LDA is robust to illumination
variance. The performance of LDA with BPNN is discussed here for various
databases and diverse environments.

BPNN enhances the classification, and LDA combined with BPNN resulted in a
recognition rate of more than 90 %. CNN is better suited to large numbers of
database images, as its recognition is fast. Mostly, the execution time of CNN
is about 20 % less when compared to BPNN.
Neural networks are currently used prominently in voice
recognition systems, image recognition systems, industrial robotics, medical
imaging, data mining and aerospace applications.