


A project report
Submitted in the partial fulfillment of the requirement for
The award of the degree of
Bachelor of Engineering


Prateek Saraogi
Shubham Pandey
Shubhanshoo Agarwal

Dr. K. Sridhar Patnaik

This is to certify that the content of the project entitled FACIAL
RECOGNITION is a bona fide work carried out by Prateek Saraogi,
Shubham Pandey, Shubhanshoo Agarwal under my supervision and
guidance in partial fulfillment of the requirement for the degree of
Bachelor of Engineering in Computer Science of Birla Institute of
Technology, Mesra, Ranchi.
The contents of this project report have not been submitted earlier for the
award of any other degree or certificate. I hereby commend this work.


Dr. K. Sridhar Patnaik

Associate Professor
Dept. of CSE
Birla Institute of Technology
Mesra, Ranchi-835215



Department of CSE

(Undergraduate Studies)

BIT Mesra



The project work entitled IMAGE RECOGNITION is carried out and
presented in a manner satisfactory to warrant its acceptance as a
prerequisite to the degree for which it has been submitted. It is understood
that by this approval, the undersigned do not necessarily endorse any
conclusion drawn or opinion expressed therein, but approve the project
report for the purpose for which it is submitted.

Internal Examiner

External Examiner

Head of Department
Computer Science and Engineering
Birla Institute of Technology, Mesra
Ranchi 835215

We would like to thank all the people who helped and supported us in
writing this research project.
We would like to express our gratitude to our project guide, Dr. K. Sridhar
Patnaik, for his constant motivation while we worked on this project. We are
grateful that he shared his experience in this field with us.
We would also like to thank all the other faculty members who lectured us
in the various fields crucial to this project.

1. Introduction
1.1 Digital Image Processing
1.2 Image Recognition
1.3 IMED
1.4 Expression Detection
2. Literature Review
2.1 Euclidean Distance
2.2 IMage Euclidean Distance
2.3 Standardizing Transform
3. Research Background
3.1 Principal Component Analysis
3.2 Bayesian Similarities
3.3 Artificial Neural Network
4. Research Methodology
4.1 Block Diagram
4.2 Steps
4.2.1 Using Bayesian Similarity
4.2.2 Using Principal Component Analysis
5. Implementation
5.1 Demonstration with Images
5.2 Code Snippet
6. Future Works
7. Bibliography

Digital Image Processing
Digital image processing is the use of computer algorithms to
perform image processing on digital images. As a subcategory or field
of digital signal processing, digital image processing has many
advantages over analog image processing. It allows a much wider range
of algorithms to be applied to the input data and can avoid problems such
as the build-up of noise and signal distortion during processing. Since
images are defined over two dimensions (perhaps more) digital image
processing may be modeled in the form of multidimensional systems.

Facial Recognition
A facial recognition system is a computer application capable of
identifying or verifying a person from a digital image or a video frame
from a video source. One of the ways to do this is by comparing selected
facial features from the image and a facial database. It is typically used in
security systems and can be compared to other biometrics such as
fingerprint or eye iris recognition systems. Recently, it has also become
popular as a commercial identification and marketing tool.

IMage Euclidean Distance (IMED)

Unlike the traditional Euclidean distance, IMED (IMage Euclidean
Distance) takes into account the spatial relationships of pixels. Therefore,
it is robust to small perturbation of images. We argue that IMED is the
only intuitively reasonable Euclidean distance for images. IMED is then
applied to image recognition. The key advantage of this distance measure
is that it can be embedded in most image classification techniques such as
SVM, LDA, and PCA. The embedding is rather efficient by involving a
transformation referred to as Standardizing Transform (ST). We show that
ST is a transform domain smoothing. Using the Face Recognition
Technology (FERET) database and two state-of-the-art face identification
algorithms, we demonstrate a consistent performance improvement of the
algorithms embedded with the new metric over their original versions.

Expression Detection
Human facial expression recognition by a machine can be described as an
interpretation of human facial characteristics via mathematical
algorithms. Expressions of the face are read by an input sensing device
such as a web-cam. It reads the movements of the facial muscles and
communicates with a computer that uses these gestures as an input. These
gestures are then interpreted using an algorithm based either on statistical
analysis or on artificial intelligence techniques. The primary goal of facial
expression recognition research is to create a system which can identify
specific human expressions and use them to convey information. By observing
a face, one can decide whether a person is serious, happy, thinking, sad,
feeling pain and so on.

Euclidean Distance
Among all the image metrics, Euclidean distance is the most commonly
used due to its simplicity. Let $x$, $y$ be two $M$ by $N$ images,
$x = (x_1, x_2, \ldots, x_{MN})$, $y = (y_1, y_2, \ldots, y_{MN})$, where
$x_{kN+l}$, $y_{kN+l}$ are the gray levels at location $(k, l)$. The
Euclidean distance $d_E(x, y)$ is given by:

$$d_E^2(x, y) = \sum_{k=1}^{MN} (x_k - y_k)^2 = (x - y)^T (x - y)$$

The Euclidean distance defined above does not take into account that $x$, $y$
are images and that $x_k$, $y_k$ are gray levels on pixels. For images, there
are spatial relationships between pixels. The traditional Euclidean distance is
only a summation of the pixel-wise intensity differences and, consequently,
a small deformation may result in a large Euclidean distance.
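
The sensitivity described above can be seen on a toy example. The sketch below is only an illustration in plain Python, using hypothetical 1 by 6 "images" that are not data from this project: moving a single bright pixel by one position gives exactly the same Euclidean distance as moving it to the far end of the image.

```python
import math

def euclidean_distance(x, y):
    """Traditional Euclidean distance between two flattened images."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical 1 x 6 images with a single bright pixel (gray level 1).
base = [0, 1, 0, 0, 0, 0]
shifted_by_one = [0, 0, 1, 0, 0, 0]   # small deformation
shifted_far = [0, 0, 0, 0, 0, 1]      # large deformation

d_small = euclidean_distance(base, shifted_by_one)
d_large = euclidean_distance(base, shifted_far)
print(d_small, d_large)  # the two distances are identical
```

The distance cannot tell the small shift from the large one, which is the weakness that IMED is meant to address.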

Generally, we call a Euclidean distance an IMage Euclidean Distance
(IMED) if the metric satisfies three conditions which lead to appealing
properties. A Euclidean distance $d(x, y) = [(x - y)^T G (x - y)]^{1/2}$,
with $G = (g_{ij})_{MN \times MN}$, is said
to be an IMED if the following conditions are satisfied:
1. The metric coefficient $g_{ij}$ depends on the distance between pixels
$P_i$ and $P_j$. Let $f$ represent this dependency:

$$g_{ij} = f(|P_i - P_j|), \quad i, j = 1, 2, \ldots, MN$$

2. $f$ is continuous and $g_{ij}$ decreases monotonically as $|P_i - P_j|$
increases.
3. The functional dependency $f$ is a universal function. That is, it is not
for images of a particular size or resolution.
Condition 1 means that the information about pixel distance must be
considered in the metric. Depending only on $|P_i - P_j|$ makes $g_{ij}$ (and,
hence, the induced Euclidean distance) invariant to linear transformation of
images. It also implies that all the base vectors have the same length and,
therefore, $g_{ij}$ is proportional to $\cos \theta_{ij}$, where $\theta_{ij}$
is the angle between base vectors $i$ and $j$.
Condition 2 says how to merge the pixel distance into the metric
coefficients so that the induced distance is intuitively reasonable. The
continuity of $f$ is a general necessity. The request that $g_{ij}$ decreases
as $|P_i - P_j|$ increases means that the distance depends on the extent of the
deformation.
Finally, Condition 3 guarantees the universal validity of this distance.
More precisely, Conditions 1-3 imply that IMED is characterized by the
following properties:
1. Small deformation yields small image distance. The stronger the
deformation, the larger the distance. And, the distance is continuous
to the extent of deformation.
2. The distance between two images remains invariant if we perform
the same translation, rotation, and reflection to the images.
3. The metric applies to images of any size and resolution.
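
As an illustration of these conditions, the sketch below builds the metric matrix G for a small image and evaluates the resulting distance. It assumes a Gaussian form for f with width sigma = 1, which is the choice made in [1]; the function names and the toy 1 by 6 images are hypothetical and are not taken from our implementation.

```python
import math

def imed_metric(rows, cols, sigma=1.0):
    """Metric matrix G with g_ij = f(|P_i - P_j|), f assumed Gaussian."""
    # Pixel index k * cols + l corresponds to location (k, l).
    pts = [(k, l) for k in range(rows) for l in range(cols)]
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    return [[norm * math.exp(-((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)
                             / (2.0 * sigma ** 2))
             for q in pts] for p in pts]

def imed(x, y, G):
    """d(x, y) = sqrt((x - y)^T G (x - y)) for flattened images x, y."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    quad = sum(d[i] * G[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(quad)

G = imed_metric(1, 6)
base = [0, 1, 0, 0, 0, 0]
shifted_by_one = [0, 0, 1, 0, 0, 0]   # small deformation
shifted_far = [0, 0, 0, 0, 0, 1]      # large deformation

d_small = imed(base, shifted_by_one, G)
d_large = imed(base, shifted_far, G)
print(d_small < d_large)  # the small shift now gives the smaller distance
```

Because neighbouring pixels have a large g_ij, intensity that moves to a nearby pixel is partly cancelled in the quadratic form, so a small deformation yields a small distance.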

Standardizing Transform
In recognition algorithms such as SVM, LDA, and PCA, one often needs to
compute IMED, i.e., $(x_i - x_j)^T G (x_i - x_j)$, for all pairs of images.
Thus, for large databases, this evaluation is expensive. However, these
computations can be greatly simplified by introducing a linear transformation.
Consider a decomposition of matrix $G$, $G = A^T A$. If we transform all
images $x$, $y$, ... by $A$ and denote $u = Ax$, $v = Ay$, ..., then the IMED
between $x$, $y$ is equal to the traditional Euclidean distance between $u$, $v$:

$$(x - y)^T G (x - y) = (x - y)^T A^T A (x - y) = (u - v)^T (u - v)$$

Thus, one avoids unnecessary and repeated computation of IMED by
utilizing the transformed images $u$, $v$ as inputs to the image recognition
algorithms. The decomposition can be written in another way as

$$G = G^{1/2} G^{1/2}$$

where the symmetric matrix $G^{1/2}$ is uniquely defined as

$$G^{1/2} = Q \Lambda^{1/2} Q^T$$

Here, $\Lambda$ is a diagonal matrix whose elements are the eigenvalues of $G$
(remember that $G$ is positive definite, so the diagonal entries of
$\Lambda^{1/2}$ are positive real numbers) and $Q$ is an orthogonal matrix whose
column vectors are the eigenvectors of $G$. Thus, applying the transformation
$G^{1/2}$ to the images $x$, $y$ gives

$$u = G^{1/2} x, \quad v = G^{1/2} y$$

We call the transformation $G^{1/2}(\cdot)$ the Standardizing Transform (ST).
Hence, feeding the transformed images to a recognition algorithm automatically
embeds IMED in it. An interesting result is that ST is a transform domain
smoothing. Note that ST is a composition of three operations, namely applying
$Q^T$, scaling by $\Lambda^{1/2}$, and applying $Q$:

$$G^{1/2} = Q \Lambda^{1/2} Q^T$$
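
The identity behind the transform can be checked numerically. The sketch below is a minimal illustration and not our Matlab code: it assumes a Gaussian metric matrix G (as in [1]) and, to stay within the Python standard library, uses a Cholesky factor A = L^T with G = L L^T in place of the symmetric square root G^{1/2}. Any A with A^T A = G satisfies the same identity; all names and pixel values are hypothetical.

```python
import math

def gaussian_metric(rows, cols, sigma=1.0):
    """Assumed Gaussian IMED metric matrix G for a rows x cols image."""
    pts = [(k, l) for k in range(rows) for l in range(cols)]
    c = 1.0 / (2.0 * math.pi * sigma ** 2)
    return [[c * math.exp(-((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)
                          / (2.0 * sigma ** 2))
             for q in pts] for p in pts]

def cholesky(G):
    """Lower-triangular L with G = L L^T (G must be positive definite)."""
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(G[i][i] - s)
            else:
                L[i][j] = (G[i][j] - s) / L[j][j]
    return L

def transform(L, x):
    """u = A x with A = L^T, so that A^T A = L L^T = G."""
    n = len(x)
    return [sum(L[k][i] * x[k] for k in range(n)) for i in range(n)]

# Two hypothetical 2 x 3 images, flattened row by row.
x = [10, 20, 30, 40, 50, 60]
y = [12, 18, 33, 41, 47, 66]

G = gaussian_metric(2, 3)
L = cholesky(G)
u, v = transform(L, x), transform(L, y)

d = [a - b for a, b in zip(x, y)]
imed_sq = sum(d[i] * G[i][j] * d[j] for i in range(6) for j in range(6))
euclid_sq = sum((a - b) ** 2 for a, b in zip(u, v))
print(math.isclose(imed_sq, euclid_sq, rel_tol=1e-9))
```

The transform is computed once per image, after which any algorithm that relies on the ordinary Euclidean distance works on u and v unchanged.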

Principal Component Analysis
Principal Component Analysis (PCA) was invented in 1901 by Karl
Pearson. PCA is a variable reduction procedure that is useful when the obtained
data have some redundancy. It reduces the variables to a smaller number of
variables, called Principal Components, which account for most of the
variance in the observed variables.
Problems arise when we wish to perform recognition in a high-dimensional
space. The goal of PCA is to reduce the dimensionality of the
data while retaining as much as possible of the variation in the original data
set. On the other hand, dimensionality reduction implies information loss. The
best low-dimensional space can be determined by the best principal
components.
The major advantage of PCA is its use in the Eigen face approach, which
helps in reducing the size of the database for recognition of test images.
The images are stored in the database as their feature vectors, which are
found by projecting each training image onto the set of Eigen faces
obtained. PCA is applied in the Eigen face approach to reduce the
dimensionality of a large data set.
Eigen Face Approach
It is an adequate and efficient method for face recognition due to
its simplicity, speed and learning capability. Eigen faces are a set of Eigen
vectors used in the Computer Vision problem of human face recognition.
They refer to an appearance-based approach to face recognition that seeks
to capture the variation in a collection of face images and use this
information to encode and compare images of individual faces in a holistic
manner.
The Eigen faces are the Principal Components of a distribution of faces, or
equivalently, the Eigen vectors of the covariance matrix of the set of
face images, where an image with N by N pixels is considered a point in
$N^2$-dimensional space. This suggests that coding and decoding of face
images may give information about face images that emphasizes the significance
of features. These features may or may not be related to facial features
such as eyes, nose, lips and hair. We want to extract the relevant
information in a face image, encode it efficiently and compare one face
encoding with a database of faces encoded similarly. A simple approach
to extracting the information content in an image of a face is to somehow
capture the variation in a collection of face images.
We wish to get the Principal Components of the distribution of faces, or the
Eigen vectors of the covariance matrix of the set of face images. Each
image location contributes to each Eigen vector, so that we can display
the Eigen vector as a sort of face. Each face image in the training set can be
represented exactly as a linear combination of the Eigen faces. The number of
possible Eigen faces is equal to the number of face images in the training
set. The faces can also be approximated by using only the best Eigen faces,
those that have the largest Eigen values and which therefore account for the
most variance within the set of face images. The primary reason for using
fewer Eigen faces is computational efficiency.

Bayesian Similarities
The Bayesian approach provides the means to incorporate prior
knowledge in data analysis. Bayesian analysis revolves around the
posterior probability, which summarizes the degree of one's certainty
concerning a given situation. Bayes's law states that the posterior
probability is proportional to the product of the likelihood and the prior
probability. The likelihood encompasses the information contained in the
new data. The prior expresses the degree of certainty concerning the
situation before the data are taken. Although the posterior probability
completely describes the state of certainty about any possible image, it is
often necessary to select a single image as the result or reconstruction.
A typical choice is the image that maximizes the posterior probability,
which is called the MAP estimate. Other choices for the estimator may be
more desirable, for example, the mean of the posterior density function.
In situations where only very limited data are available, the data alone
may not be sufficient to specify a unique solution to the problem. The
prior introduced with the Bayesian method can help guide the result
toward a preferred solution. As the MAP solution differs from the
maximum likelihood (ML) solution solely because of the prior, choosing
the prior is one of the most critical aspects of Bayesian analysis. I will
discuss a variety of possible priors appropriate to image analysis.
Current approaches to image matching for visual object recognition and
image database retrieval often make use of simple image similarity
metrics such as Euclidean distance or normalized correlation, which
correspond to a standard template-matching approach to recognition. For
example, in its simplest form, the similarity measure $S(I_1, I_2)$ between
two images $I_1$ and $I_2$ can be set to be inversely proportional to the norm
$\|I_2 - I_1\|$. Such a simple formulation suffers from two major drawbacks: it
requires precise alignment of the objects in the image and does not exploit
knowledge of which types of variation are critical (as opposed to
incidental) in expressing similarity. Following [3], we use a
probabilistic similarity measure which is based on the probability that the
image-based differences, denoted by $d(I_1, I_2)$, are characteristic of typical
variations in appearance of the same object. For example, for purposes of
face recognition we can define two classes of facial image variations:
intrapersonal variations $\Omega_I$ (corresponding, for example, to different
facial expressions of the same individual) and extrapersonal variations
$\Omega_E$ (corresponding to variations between different individuals). Our
similarity measure is then expressed in terms of the probability

$$S(I_1, I_2) = P(d(I_1, I_2) \in \Omega_I) = P(\Omega_I \mid d(I_1, I_2))$$

where $P(\Omega_I \mid d(I_1, I_2))$ is the a posteriori probability given by
Bayes' rule, using estimates of the likelihoods $P(d(I_1, I_2) \mid \Omega_I)$
and $P(d(I_1, I_2) \mid \Omega_E)$ which
are derived from training data using an efficient subspace method for
density estimation of high-dimensional data.
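
To make the use of Bayes' rule concrete, the sketch below computes the posterior for the intrapersonal class from the two likelihoods. It is only an illustration under strong assumptions: the image difference is taken to be the pixel-wise difference, both likelihoods are modelled as zero-mean isotropic Gaussians with hypothetical standard deviations, and the priors are equal. The subspace density estimation of [3] is not reproduced here.

```python
import math

def log_gaussian(delta, sigma):
    """Log density of a zero-mean isotropic Gaussian at the vector delta."""
    n = len(delta)
    sq = sum(d * d for d in delta)
    return (-0.5 * sq / sigma ** 2
            - n * math.log(sigma)
            - 0.5 * n * math.log(2.0 * math.pi))

def bayesian_similarity(img1, img2, sigma_intra=5.0, sigma_extra=30.0,
                        prior_intra=0.5):
    """Posterior probability that the difference is intrapersonal."""
    delta = [a - b for a, b in zip(img1, img2)]
    log_i = log_gaussian(delta, sigma_intra) + math.log(prior_intra)
    log_e = log_gaussian(delta, sigma_extra) + math.log(1.0 - prior_intra)
    # Bayes' rule: P(I|d) = L_I P_I / (L_I P_I + L_E P_E) = 1 / (1 + e^z),
    # evaluated in a numerically stable way.
    z = log_e - log_i
    if z > 0:
        ez = math.exp(-z)
        return ez / (1.0 + ez)
    return 1.0 / (1.0 + math.exp(z))

# Hypothetical 4-pixel images.
face_a1 = [100, 120, 90, 60]
face_a2 = [102, 118, 93, 61]    # small difference from face_a1
face_b = [40, 200, 10, 150]     # large difference from face_a1

s_same = bayesian_similarity(face_a1, face_a2)
s_diff = bayesian_similarity(face_a1, face_b)
print(s_same > 0.5, s_diff < 0.5)
```

A similarity above 0.5 means the difference is more likely to be intrapersonal than extrapersonal under the assumed model.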

Artificial Neural Network

An Artificial Neural Network (ANN) is an information processing
paradigm that is inspired by the way biological nervous systems, such as
the brain, process information. The key element of this paradigm is the
novel structure of the information processing system. It is composed of a
large number of highly interconnected processing elements (neurones)
working in unison to solve specific problems. ANNs, like people, learn by
example. An ANN is configured for a specific application, such as pattern
recognition or data classification, through a learning process. Learning in
biological systems involves adjustments to the synaptic connections that
exist between the neurones. This is done in ANN as well.
One special model of ANN is the multilayer perceptron (MLP), which is a
feed-forward model that maps sets of input data onto a set of appropriate
outputs.
An MLP consists of multiple layers of nodes in a directed graph, with each
layer fully connected to the next one. Except for the input nodes, each
node is a neuron (or processing element) with a nonlinear activation
function.
Backpropagation, an abbreviation for "backward propagation of errors",
is a common method of training artificial neural networks used in
conjunction with an optimization method such as gradient descent.
It works in two passes:
1. Forward pass: a training pattern's input is propagated through the neural
network in order to generate the output activations and calculate the final
output.
2. Backward pass: the weights are updated using the error signals, which are
propagated in the reverse direction from the output layer to the input layer.
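
The two passes can be sketched on a very small network. The example below is an illustration only and is not the network used in this project: it assumes a 2-2-1 MLP with sigmoid activations, a squared-error loss, plain gradient descent, and hypothetical initial weights and training pattern.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Pass 1: propagate x forward. Each weight row ends with a bias weight."""
    h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
         for row in w_hidden]
    y = sigmoid(sum(w * v for w, v in zip(w_out, h + [1.0])))
    return h, y

def backward(x, target, w_hidden, w_out, lr=0.5):
    """Pass 2: propagate the error back and update the weights in place.

    Returns the squared error measured before the update.
    """
    h, y = forward(x, w_hidden, w_out)
    delta_out = (y - target) * y * (1.0 - y)
    # Hidden error signals use the output weights before they are updated.
    delta_hidden = [delta_out * w_out[j] * h[j] * (1.0 - h[j])
                    for j in range(len(h))]
    for j, v in enumerate(h + [1.0]):
        w_out[j] -= lr * delta_out * v
    for j, row in enumerate(w_hidden):
        for i, v in enumerate(x + [1.0]):
            row[i] -= lr * delta_hidden[j] * v
    return 0.5 * (y - target) ** 2

# Hypothetical initial weights (last entry of each row is the bias weight).
w_hidden = [[0.1, -0.2, 0.05], [0.3, 0.1, -0.1]]
w_out = [0.2, -0.3, 0.1]
x, target = [1.0, 0.0], 1.0

errors = [backward(x, target, w_hidden, w_out) for _ in range(200)]
print(errors[0], errors[-1])  # the error shrinks as training proceeds
```

Repeating the forward and backward passes over all training patterns is what drives the weights toward values that reproduce the desired outputs.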

Block Diagram

Step 1: Selection of image from database; resizing of image
Step 2: Standardizing Transform
Step 3: Recognition using PCA or recognition using Bayesian Similarity
Step 4: Display of output

Using Bayesian Similarity
1. Training image acquisition and alignment: Before using the
system for recognizing faces, we have to train the system so that it
is able to identify faces. For this, we used 60 pairs of face images (2
images per individual) from the FERET database and from those we had
taken ourselves. The training images are aligned into 24 by 24 grayscale
images. The aligned images are then put into a folder called
Training set. The following is a brief graphical description of how
we obtained the aligned training images. After doing that, we read
the images into a large matrix as column vectors. We also read the
images whose file names contain 'a' (e.g. 1a.png) into another matrix
called ImgMatrixA and those with 'b' in the file names into
ImgMatrixB.
2. Calculation of intrapersonal differences: The next step is to
calculate the intrapersonal differences (i.e. the differences between
images of the same individual). To do that, we took each of the image
column vectors in ImgMatrixA and subtracted from it the one with
the same index in ImgMatrixB. Then we subtracted the vectors the
other way around. This is to make sure that the mean of the differences
is zero.
3. Selection of eigenvectors: After getting the zero-mean
intrapersonal difference vectors, we had to get the eigenvectors that
will reduce the dimensionality of the images so that they are suitable
for feature extraction. For this purpose, we used the Matlab
implementation of eigenspace decomposition, which is based on
Turk and Pentland's eigenspace decomposition method.
4. Offline whitening transformation: This is the crucial step in the
implementation of the system. This is where we performed the offline
whitening transformation using the formula proposed by
Moghaddam et al. in [3]. The reason is to reduce the computational
complexity of iteratively calculating the differences between each
of the training images and the input image.

5. Calculating the normalizing denominator: One of the most
important parts in the implementation of the system is the
calculation of the normalizing denominator. It should be noted that
any mistake in the calculation of the denominator will reduce the
accuracy of the system in recognizing the images. We used
the normalizing formula to calculate the denominator.
6. Implementing the system: Our face recognition system can be
described in the following diagram:

When an image is input into the system, we use a modified
version of Ms. Lingyung Zhang's Matlab implementation of face
alignment. The program is modified in such a way that the aligned
image is directly passed as a parameter into the back end of the
system. The back end treats the aligned image in the same
manner as the training images and then calculates the Maximum
Likelihood (ML) estimates.
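
Step 2 above (the intrapersonal differences) can be sketched as follows. This is a simplified Python illustration rather than our Matlab code: each image is stored as a row (a list of gray levels) instead of a matrix column, and the pixel values are hypothetical. It shows why taking the differences in both directions gives a set with zero mean.

```python
def intrapersonal_differences(img_matrix_a, img_matrix_b):
    """Return A_i - B_i and B_i - A_i for every individual i."""
    diffs = []
    for a, b in zip(img_matrix_a, img_matrix_b):
        diffs.append([p - q for p, q in zip(a, b)])
        diffs.append([q - p for p, q in zip(a, b)])
    return diffs

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

# Hypothetical 4-pixel images: row i of each list belongs to individual i.
img_matrix_a = [[10, 200, 30, 90], [55, 60, 120, 14]]
img_matrix_b = [[12, 190, 35, 80], [50, 66, 118, 20]]

diffs = intrapersonal_differences(img_matrix_a, img_matrix_b)
print(len(diffs), mean_vector(diffs))  # 4 difference vectors, mean of zeros
```

Every difference vector is paired with its negative, so the sum over the whole set cancels exactly and no separate mean subtraction is needed.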

Using PCA
To train a recognizer using a set of M images:
1. Convert the images into vector form: each N x N image is converted into
an $N^2 \times 1$ vector, and all the vectors are combined to form an
$N^2 \times M$ matrix.
2. Normalization of images: each image is normalized by taking the
difference between the image vector and the average of all the
image vectors.
3. Dimensionality reduction: calculate the covariance matrix using
$A^T A$ (A being the matrix of normalized image vectors) and calculate the
eigenvectors corresponding to each facial image.
These correspond to the Eigen faces.
4. Take the K most useful Eigen faces to represent all the images. This
discards the noisy, low-variance components of the representation.
5. Each image in the training set is represented by a weighted linear
combination of the K Eigen faces plus the mean of the images. We get a weight
vector corresponding to each image and so have a weight matrix
corresponding to the images.
To recognize an unknown image:
1. Convert the image into vector form.
2. Normalize the image.
3. Project the input image onto the Eigen faces to get a weight vector.
4. Compare this weight vector with the weight vectors of all training images
using Euclidean distance. The image with the minimum distance is the
best match.
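
The training and recognition steps above can be sketched as follows. This is a minimal illustration in plain Python and not the Matlab code of this project: the 4-pixel "images" are hypothetical, and power iteration with deflation stands in for a library eigen-solver so that the example needs only the standard library.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def top_eigenvectors(C, k, iters=500):
    """Top-k eigenvectors of a symmetric matrix (power iteration, deflation)."""
    n = len(C)
    C = [row[:] for row in C]
    vecs = []
    for _ in range(k):
        v = normalize([float(i + 1) for i in range(n)])
        for _ in range(iters):
            v = normalize([dot(row, v) for row in C])
        lam = dot(v, [dot(row, v) for row in C])
        vecs.append(v)
        for i in range(n):          # remove the component just found
            for j in range(n):
                C[i][j] -= lam * v[i] * v[j]
    return vecs

def train(images, k):
    m = len(images)
    mean = [sum(col) / m for col in zip(*images)]
    phi = [[p - mu for p, mu in zip(img, mean)] for img in images]
    C = [[dot(a, b) for b in phi] for a in phi]     # small M x M matrix A^T A
    eigenfaces = []
    for v in top_eigenvectors(C, k):
        # Map an eigenvector v of A^T A to an eigenface u = A v.
        u = [sum(v[i] * phi[i][d] for i in range(m)) for d in range(len(mean))]
        eigenfaces.append(normalize(u))
    weights = [[dot(u, p) for u in eigenfaces] for p in phi]
    return mean, eigenfaces, weights

def recognize(image, mean, eigenfaces, weights):
    p = [a - mu for a, mu in zip(image, mean)]
    w = [dot(u, p) for u in eigenfaces]
    dists = [math.sqrt(sum((a - b) ** 2 for a, b in zip(w, wt)))
             for wt in weights]
    return dists.index(min(dists))

training = [[10, 20, 200, 210], [220, 200, 30, 10],
            [15, 190, 25, 205], [120, 110, 115, 125]]
mean, eigenfaces, weights = train(training, k=2)
query = [12, 22, 198, 205]      # a slightly perturbed copy of image 0
match = recognize(query, mean, eigenfaces, weights)
print(match)
```

Working with the M x M matrix A^T A instead of the much larger pixel-by-pixel covariance matrix is what keeps the eigen-decomposition affordable when the number of training images is far smaller than the number of pixels.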

Best result from different methods
Method           Time
IMED             1.357 s
Vector cosine    2.98 s
Euclidean        13.826 s

Resizing and converting the image to

Applying Standardizing Transform

Recognition using ML technique

Time profile 1.358s

Code Snippet
Create Database

Create Database of Images

Eigen Face Core

Determine most discriminating feature


Compare two images

A. The Viola-Jones Object Detection Framework: Viola and Jones
proposed an object detection framework to detect faces in images.
The algorithm has four stages: Haar feature selection, creation of an
integral image, AdaBoost training, and cascading classifiers.
Haar Features: Haar features are digital image features used in object
recognition. They suit face detection because all human faces share some
common properties.
Integral image representation: The integral image at location (x, y) is the
sum of the pixels above and to the left of (x, y). The integral image can be
calculated in a single pass and only once for each sub window.
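
A minimal sketch of the integral image in plain Python is given below, assuming the image is a list of rows of gray levels and that the sum includes the pixel at (x, y) itself. The function names and the 3 by 3 test image are hypothetical.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows <= y and columns <= x (single pass)."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        row_sum = 0
        for x in range(cols):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of the pixels in an inclusive rectangle using four lookups."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(ii[2][2])                   # 45, the sum of the whole image
print(rect_sum(ii, 1, 1, 2, 2))   # 28 = 5 + 6 + 8 + 9
```

Since any rectangle sum costs at most four lookups, Haar features of any size can be evaluated in constant time once the integral image is available.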
The mechanism responsible for feature selection is the AdaBoost
algorithm. This algorithm creates a strong classifier as a linear combination
of weighted simple weak classifiers. First, the system chooses the most
efficient weak classifier that will be a component of the final strong
classifier. Then the weights are updated to emphasize the examples
which were incorrectly classified. This procedure is repeated n
times. This makes the next weak classifier focus on the harder
examples. The final strong classifier is a weighted combination of the n
weak classifiers.
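
The boosting loop described above can be sketched on a toy problem. The example below is an illustration only: it assumes one-dimensional inputs, threshold "stumps" as the weak classifiers, and labels of +1 and -1. In the Viola-Jones framework the weak classifiers are built on Haar features instead; the data here are hypothetical.

```python
import math

def best_stump(xs, ys, weights):
    """Pick the threshold/polarity stump with the lowest weighted error."""
    best = None
    for thr in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= thr else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, polarity)
    return best

def adaboost(xs, ys, n_rounds):
    m = len(xs)
    weights = [1.0 / m] * m
    classifiers = []
    for _ in range(n_rounds):
        err, thr, pol = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1.0 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        classifiers.append((alpha, thr, pol))
        # Raise the weights of misclassified examples, lower the others.
        weights = [w * math.exp(-alpha * y * (pol if x >= thr else -pol))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return classifiers

def predict(classifiers, x):
    """Strong classifier: sign of the weighted vote of the weak classifiers."""
    score = sum(a * (p if x >= t else -p) for a, t, p in classifiers)
    return 1 if score >= 0 else -1

xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
strong = adaboost(xs, ys, n_rounds=3)
print([predict(strong, x) for x in xs] == ys)
```

No single stump can separate this data set, but the weighted vote of three stumps, each trained on the re-weighted examples, classifies every training example correctly.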
B. Image Cropping: - Once the face has been detected by the Viola-Jones
algorithm, a simple MATLAB routine was written to crop the face image
by detecting the coordinates of the top-left corner, the height and width of
the face enclosing rectangle.
C. Facial Image Preparation: In order to recognize the facial expression
in the cropped image of the previous phase, the image has to be resized to
64 × 64 pixels. Next, the RGB image is converted into grayscale.

D. ANN using MLP with Backpropagation Algorithm: ANN is an
information processing paradigm that is inspired by the way biological
nervous systems, such as the brain, process information.
An ANN is configured for a specific application, such as pattern
recognition or data classification, through a learning process. One special
model of ANN is the multilayer perceptron (MLP), which is a feed-forward
model that maps sets of input data onto a set of appropriate outputs. The
MLP consists of three or more layers: an input and an output layer with
one or more hidden layers. Since an MLP is a fully connected network,
each node in one layer connects with a certain weight to every node in the
following layer. The MLP utilizes a supervised learning technique called
backpropagation for training the network.

Number of input nodes: 64 × 64 + 1 (bias) = 4097 nodes
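
The input node count can be checked with a short sketch that turns an already resized 64 × 64 RGB image into the network's input vector. This is an assumption-laden illustration, not our Matlab routine: it uses the common luminance weights 0.299, 0.587 and 0.114 for the grayscale conversion, scales gray levels to the range 0 to 1, and appends a constant 1 as the bias input. The function names are hypothetical.

```python
def rgb_to_gray(pixel):
    """Luminance of an (R, G, B) pixel with the assumed standard weights."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b

def to_input_vector(rgb_image):
    """Flatten a 64 x 64 RGB image to 4096 gray levels plus one bias input."""
    if len(rgb_image) != 64 or any(len(row) != 64 for row in rgb_image):
        raise ValueError("expected a 64 x 64 image")
    vec = [rgb_to_gray(p) / 255.0 for row in rgb_image for p in row]
    vec.append(1.0)   # bias input
    return vec

# Hypothetical all-white test image.
white = [[(255, 255, 255)] * 64 for _ in range(64)]
vec = to_input_vector(white)
print(len(vec))  # 4097
```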

For documentation of various matlab image processing tools.
cMinor answer for Viola Jones algorithm for face detection

[1] On the Euclidean Distance of Images, L Wang, Y Zhang, J Feng
[2] Eigenfaces for Recognition, M Turk, A Pentland
[3] A Bayesian Similarity Measure for Direct Image Matching, B
Moghaddam, C Nastar, A Pentland
[4] PCA by Victor Lavrenko