

A Mid-Term Project Report submitted in partial fulfillment of the requirements for the Degree of Bachelor of Technology under Biju Pattnaik University of Technology

Project ID: 11079

Submitted By


Roll No. # CSE200833392 Roll No. # CSE200833397

2010-2011

Under the guidance of

Mr. Sourav Pramanik



Generic face recognition systems identify a subject by comparing the subject's image to images in an existing face database. These systems are very useful in forensics for criminal identification and in security for biometric authentication, but they are constrained by the availability and quality of subject images. In this project, we propose a novel system that uses descriptive, non-visual human input of facial features to perform face recognition without the need for a reference image for comparison. Our system maps images in an existing database to a fourteen-dimensional descriptive feature space and compares input feature descriptions to images in that space. The system clusters database images in feature space using feature-weighted K-means clustering, which offers a computational speedup while searching feature space for matching images. We are working in the MATLAB environment. Our system has four modules: 1. Convert descriptive features to numeric data. 2. Extract fourteen features from each face image in the face database. 3. Compare the input features with the extracted features of a face. 4. Display the best-match image.

It is our proud privilege to express our deepest sense of gratitude and indebtedness to our guide, Mr. SOURAV PRAMANIK, for his valuable guidance, keen support, intuitive ideas and persistent endeavour. His inspiring assistance and affectionate care enabled us to complete our work smoothly and successfully. We are also thankful to Mr. SWADHIN MISHRA, B.Tech Project Coordinator, for giving us valuable time and support during the presentation of the project. We acknowledge with immense pleasure the interest, encouraging attitude and constant inspiration rendered by Dr. Ajit Kumar Panda, Dean, N.I.S.T. and Prof. Sangram Mudali, Director, N.I.S.T. Their continued drive for better quality in everything that happens at N.I.S.T. and their selfless inspiration have always helped us to move ahead. We can never forget to thank our family and friends for taking the pains of helping us and understanding us at any hour during the completion of the project. Lastly, we bow in gratitude to the omnipresent Almighty for all his kindness, and we still seek his blessings to proceed further.





The face recognition problem involves searching an existing face database for a face, given a description of the face as input. The face identification problem is one of accepting or rejecting a person's claimed identity by searching an existing face database to validate the input data. Many databases for face identification and recognition have been built and are now widely used. However, most systems developed in the past are constrained by images being the primary, and often the only, form of input data. In cases where images are not available as sample input, it is not possible for such systems to perform face recognition. Our system uses general facial descriptions as input to retrieve images from a database. Users may identify images by entering general descriptions alone, removing the constraint of input images for face recognition and identification. Our system formalizes subjective human descriptions into discrete feature values and associates seven descriptive and seven geometric features with each face image. The seven discretized geometric features combine with the seven descriptive features to form a composite fourteen-dimensional feature set. Similar images are clustered in feature space using weighted K-means clustering. User input, in the form of facial descriptions, maps directly to the fourteen-dimensional descriptive feature space. The input description is then compared iteratively to the three closest clusters of images in feature space to check for matches, and a set of prospective matches is identified and returned. Our approach draws inspiration from the fact that humans describe faces using abstract and often subjective feature measures such as the shape of the face, the color of the skin, hair color, etc. [3].
These semantic descriptions, supplied by humans, are immune to picture quality and other effects that reduce the efficiency of contemporary face recognition and identification algorithms. We will identify the possible facial features that may lead to better recognition [5] while arriving at our present feature set. We will implement all of this in the MATLAB programming language.

For every unknown person, it is his or her face that draws our attention most; the face is the most important visual identity of a human being. For that reason, face recognition has been an important research problem spanning numerous fields and disciplines. This is because face recognition, in addition to having numerous practical applications such as bankcard identification, access control, mug-shot searching, security monitoring, and surveillance, is a fundamental human behaviour that is essential for effective communication and interaction among people. A formal method of classifying faces was first proposed in [5]. The author proposed collecting facial profiles as curves, finding their norm, and then classifying other profiles by their deviations from the norm. This classification is multi-modal, i.e., it results in a vector of independent measures that can be compared with other vectors in a database. Progress has advanced to the point that face recognition systems are being demonstrated in real-world settings [2]. The rapid development of face recognition is due to a combination of factors: active development of algorithms, the availability of large databases of facial images, and methods for evaluating the performance of face recognition algorithms. The problem of face recognition can be stated as follows: given still images or video of a scene, identify one or more persons in the scene using a stored database of faces [1]. The problem is mainly a classification problem: training the face recognition system with images of the known individuals and classifying newly arriving test images into one of the classes is the main aspect of face recognition systems.

2.1 Face Recognition Techniques:

There are several techniques for face recognition:
1. Eigenfaces (Eigenfeatures)
2. Neural Networks
3. Dynamic Link Architecture
4. Hidden Markov Model
5. Feature Based Matching
6. Template Matching

1. Eigenfaces:
Eigenface is one of the most thoroughly investigated approaches to face recognition. It is also known as the Karhunen-Loève expansion, eigenpicture, eigenvector, and principal component approach. References [2, 3] used principal component analysis to efficiently represent pictures of faces. They argued that any face image can be approximately reconstructed from a small collection of weights for each face and a standard face picture (eigenpicture). The weights describing each face are obtained by projecting the face image onto the eigenpicture. There is substantial related work in multimodal biometrics: for example, some systems have combined face and fingerprint in multimodal biometric identification, and others face and voice. However, the use of the face and ear in combination seems more relevant to surveillance applications.

2. Neural Networks:
The attractiveness of neural networks lies in the nonlinearity of the network; hence, the feature extraction step may be more efficient than in the eigenface method. One of the first artificial neural network (ANN) techniques used for face recognition was a single-layer adaptive network called WISARD, which contains a separate network for each stored individual. The way the neural network structure is constructed is crucial for successful recognition, but this approach does not scale to a large number of persons: as the number of persons increases, the computational expense becomes more demanding. In general, neural network approaches encounter problems when the number of classes (i.e., individuals) increases. Moreover, they are not suitable for a single-model-image recognition test, because multiple model images per person are necessary to train the system to an optimal parameter setting.

3. Graph Matching:
Graph matching is another approach to face recognition. One reference presented a dynamic link structure for distortion-invariant object recognition, which employed elastic graph matching to find the closest stored graph. Dynamic link architecture is an extension of classical artificial neural networks. Memorized objects are represented by sparse graphs whose vertices are labeled with a multi-resolution description in terms of a local power spectrum, and whose edges are labeled with geometrical distance vectors. Object recognition can then be formulated as elastic graph matching, performed by stochastic optimization of a matching cost function. In general, dynamic link architecture is superior to other face recognition techniques in terms of rotation invariance; however, the matching process is computationally expensive.

4. Hidden Markov Models (HMMs):
Stochastic modeling of non-stationary vector time series based on hidden Markov models (HMMs) has been very successful for speech applications. Reference [3] applied this method to human face recognition. Faces were intuitively divided into regions such as the eyes, nose, and mouth, which can be associated with the states of a hidden Markov model. Since HMMs require a one-dimensional observation sequence and images are two-dimensional, the images must be converted into either 1D temporal sequences or 1D spatial sequences.

5. Feature Based Matching:

Geometrical feature matching techniques are based on the computation of a set of geometrical features from the picture of a face. The fact that face recognition is possible even at resolutions as coarse as 8x6 pixels [5], when individual facial features are hardly revealed in detail, implies that the overall geometrical configuration of the face features is sufficient for recognition. The overall configuration can be described by a vector representing the position and size of the main facial features, such as the eyes and eyebrows, nose, and mouth, and the shape of the face outline.

Geometrical feature matching based on precisely measured distances between features may be most useful for finding possible matches in a large database such as a mug-shot album. However, it depends on the accuracy of the feature location algorithms; current automated face feature location algorithms do not provide a high degree of accuracy and require considerable computational time.
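The idea above (reduce each face to a vector of measured distances and sizes, then find the closest database entry) can be sketched briefly. The report's implementation is in MATLAB; this is an illustrative pure-Python version, and the feature names, subject labels, and values below are made-up assumptions, not the project's data.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical geometric feature vectors:
# (eye distance, nose width, mouth width, face-outline ratio)
database = {
    "subject_A": (62.0, 34.0, 50.0, 1.30),
    "subject_B": (58.0, 31.0, 47.0, 1.42),
    "subject_C": (65.0, 36.0, 53.0, 1.25),
}

# A query measured from a new face; return the nearest stored subject.
query = (59.0, 32.0, 47.5, 1.40)
best = min(database, key=lambda name: euclidean(database[name], query))
print(best)  # -> subject_B
```

As the text notes, the usefulness of this matching depends entirely on how accurately the feature locations (and hence the distances) are measured.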

6. Template Matching:
A simple version of template matching compares a test image, represented as a two-dimensional array of intensity values, with a single template representing the whole face, using a suitable metric such as the Euclidean distance. There are several more sophisticated versions of template matching for face recognition; for example, more than one face template from different viewpoints can be used to represent an individual's face. In general, template-based approaches are a more logical approach than feature matching. In summary, no existing technique is free from limitations; further efforts are required to improve the performance of face recognition techniques, especially in the wide range of environments encountered in the real world.

2.2 Fuzzy Logic

Fuzzy logic is a form of many-valued logic; it deals with reasoning that is approximate rather than fixed and exact. In contrast with traditional logic theory, where binary sets have two-valued logic (true or false), fuzzy logic variables may have a truth value that ranges in degree between 0 and 1. Fuzzy logic has been extended to handle the concept of partial truth, where the truth value may range between completely true and completely false. Furthermore, when linguistic variables are used, these degrees may be managed by specific functions. Fuzzy logic began with the 1965 proposal of fuzzy set theory by Lotfi Zadeh. Though fuzzy logic has been applied to many fields, from control theory to artificial intelligence, it remains controversial among most statisticians, who prefer Bayesian logic, and some control engineers, who prefer traditional two-valued logic.

2.2.1 Degrees of truth: Fuzzy logic and probabilistic logic are mathematically similar (both have truth values ranging between 0 and 1) but conceptually distinct, due to their different interpretations. Fuzzy logic corresponds to "degrees of truth", while probabilistic logic corresponds to "probability, likelihood"; as these differ, fuzzy logic and probabilistic logic yield different models of the same real-world situations. Both degrees of truth and probabilities range between 0 and 1 and hence may seem similar at first. For example, let a 100 ml glass contain 30 ml of water. We may consider two concepts: empty and full. The meaning of each can be represented by a certain fuzzy set; one might then define the glass as being 0.7 empty and 0.3 full. Note that the concept of emptiness is subjective and thus depends on the observer or designer: another designer might equally well design a set membership function where the glass is considered full for all values down to 50 ml. It is essential to realize that fuzzy logic uses truth degrees as a mathematical model of the vagueness phenomenon, while probability is a mathematical model of ignorance. The same could be achieved using probabilistic methods, by defining a binary variable "full" that depends on a continuous variable describing how full the glass is. There is no consensus on which method should be preferred in a specific situation.
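The glass example above can be made concrete with a pair of membership functions. The linear functions below are an illustrative assumption (a designer could choose very different shapes, as the text notes); the sketch is in Python rather than the project's MATLAB.

```python
def membership_empty(volume_ml, capacity_ml=100.0):
    """Degree to which the glass is 'empty': 1.0 at 0 ml, 0.0 when full."""
    return (capacity_ml - volume_ml) / capacity_ml

def membership_full(volume_ml, capacity_ml=100.0):
    """Degree to which the glass is 'full': complement of 'empty' here."""
    return volume_ml / capacity_ml

# A 100 ml glass containing 30 ml of water is 0.7 empty and 0.3 full:
print(membership_empty(30))  # -> 0.7
print(membership_full(30))   # -> 0.3
```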

2.2.2 Linguistic variables: While variables in mathematics usually take numerical values, in fuzzy logic applications non-numeric linguistic variables are often used to facilitate the expression of rules and facts [4]. A linguistic variable such as age may have a value such as young or its antonym old. The great utility of linguistic variables, however, is that they can be modified via linguistic hedges applied to primary terms, and these hedges can be associated with certain functions.

2.2.3 Example Fuzzy set theory defines fuzzy operators on fuzzy sets. The problem in applying this is that the appropriate fuzzy operator may not be known. For this reason, fuzzy logic usually uses IF-THEN rules, or constructs that are equivalent, such as fuzzy associative matrices. Rules are usually expressed in the form: IF variable IS property THEN action

2.3 Principal Components Analysis

Finally, we come to Principal Components Analysis (PCA). PCA is a way of identifying patterns in data and expressing the data in a way that highlights their similarities and differences. Since patterns can be hard to find in data of high dimension, where the luxury of graphical representation is not available, PCA is a powerful tool for analyzing data. The other main advantage of PCA is that, once you have found these patterns, you can compress the data, i.e., reduce the number of dimensions, without much loss of information. This technique is used in image compression, as we will see in a later section.

2.3.1 Method
Step 1: Get some data.
Step 2: Subtract the mean.
Step 3: Calculate the covariance matrix.
Step 4: Calculate the eigenvectors and eigenvalues of the covariance matrix.
Step 5: Choose components and form a feature vector.
Step 6: Derive the new data set.
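The six steps above can be sketched end to end. The report works in MATLAB; the following pure-Python illustration uses made-up 2-D toy data and the closed-form eigendecomposition of a 2x2 symmetric matrix, so no linear-algebra library is needed.

```python
import math

# Step 1: some toy 2-D data (two correlated measurements per sample).
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]

# Step 2: subtract the mean of each dimension.
mx = sum(x for x, _ in data) / len(data)
my = sum(y for _, y in data) / len(data)
centered = [(x - mx, y - my) for x, y in data]

# Step 3: sample covariance matrix [[cxx, cxy], [cxy, cyy]].
n = len(data) - 1
cxx = sum(x * x for x, _ in centered) / n
cyy = sum(y * y for _, y in centered) / n
cxy = sum(x * y for x, y in centered) / n

# Step 4: eigenvalues/eigenvectors of the 2x2 symmetric matrix (closed form).
mean_diag = (cxx + cyy) / 2.0
radius = math.sqrt(((cxx - cyy) / 2.0) ** 2 + cxy ** 2)
lam1, lam2 = mean_diag + radius, mean_diag - radius  # lam1 >= lam2
v1 = (cxy, lam1 - cxx)                               # eigenvector for lam1
norm = math.hypot(*v1)
v1 = (v1[0] / norm, v1[1] / norm)

# Step 5: keep the component with the largest eigenvalue (v1).
# Step 6: derive the new (1-D) data set by projecting onto v1.
projected = [x * v1[0] + y * v1[1] for x, y in centered]

print(lam1 > lam2)  # the first principal component carries the most variance
```

Dropping the second component compresses each 2-D point to a single number, which is exactly the dimensionality reduction the section describes.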


2.4 K-Means Clustering

The k-means clustering algorithm was developed by J. MacQueen (1967) and later by J. A. Hartigan and M. A. Wong around 1975. Simply speaking, k-means clustering is an algorithm to classify or group objects, based on their attributes/features, into K groups, where K is a positive integer. The grouping is done by minimizing the sum of squared distances between the data and the corresponding cluster centroids; thus, the purpose of k-means clustering is to classify the data. Suppose, for example, that we know beforehand that a set of objects belongs to two groups of medicine (cluster 1 and cluster 2); the problem is then to determine which medicines belong to cluster 1 and which to the other cluster. The step-by-step k-means clustering algorithm is as follows:

Figure 2.1: Flow chart for k-means clustering.

Step 1. Begin with a decision on the value of k = the number of clusters.


Step 2. Put any initial partition that classifies the data into k clusters. You may assign the training samples randomly, or systematically as follows: 1. Take the first k training samples as single-element clusters. 2. Assign each of the remaining (N - k) training samples to the cluster with the nearest centroid; after each assignment, recompute the centroid of the gaining cluster.

Step 3. Take each sample in sequence and compute its distance from the centroid of each of the clusters. If a sample is not currently in the cluster with the closest centroid, switch this sample to that cluster, and update the centroids of the cluster gaining the new sample and the cluster losing it.

Step 4. Repeat Step 3 until convergence is achieved, that is, until a pass through the training samples causes no new assignments.

If the number of data points is less than the number of clusters, we assign each data point as the centroid of a cluster, and each centroid is given a cluster number. If the number of data points is greater than the number of clusters, then for each data point we calculate the distance to all centroids and take the minimum; the data point is said to belong to the cluster whose centroid is at minimum distance.
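The steps above can be sketched compactly. The project uses MATLAB and a feature-weighted variant of k-means; this is a plain, illustrative pure-Python version on 1-D toy data (the two "groups of medicine" from the text's example are stand-ins here).

```python
def kmeans(points, k, iterations=100):
    """Plain k-means on a list of 1-D points; returns centroids and clusters."""
    # Steps 1-2: take the first k samples as the initial centroids.
    centroids = points[:k]
    for _ in range(iterations):
        # Step 3: assign each sample to the cluster with the nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Recompute centroids; step 4: stop when a full pass changes nothing.
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Two obvious groups, analogous to the two medicine clusters in the text:
data = [1.0, 1.1, 0.9, 8.0, 8.2, 7.9]
centroids, clusters = kmeans(data, 2)
print(sorted(centroids))  # centroids settle near 1.0 and 8.03
```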

2.5 Neural Network:

Artificial Neural Networks (ANNs) are a powerful tool for pattern recognition problems. The use of neural networks (NNs) for faces has addressed several problems: gender classification, face recognition, and classification of facial expressions. One of the earliest demonstrations of NNs for face recall applications is Kohonen's associative map [52]. Using a small set of face images, accurate recall was reported even when the input image was very noisy or when portions of the image were missing. A few NN-based face recognition techniques are discussed in the following.

2.5.1 Single Layer adaptive NN:

A single-layer adaptive NN (one for each person) for face recognition, expression analysis, and face verification was reported in [53]. The system, named WISARD (Wilkie, Stonham and Aleksander's Recognition Device), typically needs 200-400 presentations for training each classifier, where the training patterns include translation and variation in facial expressions. One classifier is constructed for each subject in the database, and classification is achieved by determining the classifier that gives the highest response for the given input image.

2.5.2 Multilayer Perceptron (MLP):

Much of the present literature on face recognition with neural networks presents results with only a small number of classes (often fewer than 20). In [33], the first 50 principal components of the images were extracted and reduced to five dimensions using an auto-associative neural network; the resulting representation was classified using a standard multilayer perceptron (MLP).

Figure 2.2 : Basic structure of Artificial Neural Network.

2.5.2 Network Architecture.

1) Single-layer feedforward networks: In a layered neural network, the neurons are organized in the form of layers. In the simplest form of a layered network, we have an input layer of source nodes that projects onto an output layer of neurons, but not vice versa. In other words, this network is strictly of the feedforward or acyclic type. Such a network is called a single-layer network, with the designation "single layer" referring to the output layer of neurons.

2) Multilayer feedforward networks: The second class of feedforward neural network distinguishes itself by one or more hidden layers, whose computation nodes are correspondingly called hidden neurons or hidden units. The function of the hidden neurons is to intervene between the external input and the network output in some useful manner. The ability of hidden neurons to extract higher-order statistics is particularly valuable when the size of the input layer is large. The input vectors are fed forward to the first hidden layer, then passed to the second hidden layer, and so on until the last layer, i.e., the output layer, which gives the actual network response.

3) Recurrent networks: A recurrent network distinguishes itself from a feedforward neural network in that it has at least one feedback loop. As shown in the figures, when the output of a neuron is fed back into its own inputs, this is referred to as self-feedback. A recurrent network may consist of a single layer of neurons, with each neuron feeding its output signal back to the inputs of all the other neurons; the network may or may not have hidden layers.
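The multilayer feedforward data flow described above (input layer to hidden layer to output layer) can be illustrated with a tiny forward pass. The weights, biases, and sigmoid activation below are arbitrary assumptions chosen only to show how each layer's output feeds the next; this Python sketch stands in for the report's MATLAB environment.

```python
import math

def sigmoid(x):
    """Common squashing activation: maps any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """Each neuron: weighted sum of the layer's inputs plus bias, then activation."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

x = [0.5, -1.0]                                            # input vector
hidden = layer(x, [[0.4, 0.3], [-0.2, 0.6]], [0.1, 0.0])   # hidden layer (2 neurons)
output = layer(hidden, [[0.7, -0.5]], [0.05])              # output layer (1 neuron)

print(output)  # the actual network response, squashed into (0, 1)
```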

2.5.3 Learning of ANNS

The property of primary significance for a neural network is its ability to learn from its environment and to improve its performance through learning. A neural network learns about its environment through an interactive process of adjustments applied to its synaptic weights and bias levels; the network becomes more knowledgeable about its environment after each iteration of the learning process.

Learning with a teacher:

1) Supervised learning: the learning process in which a teacher teaches the network by giving it knowledge of the environment in the form of sets of pre-calculated input-output examples. The network's response to the inputs is observed and compared with the predefined outputs; the difference, referred to as the error signal, is fed back to the input-layer neurons along with the inputs, to reduce the error and obtain the desired response of the network as per the predefined outputs.
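The compare-and-correct loop described above can be sketched with the classic delta rule on a single linear neuron. The learning rate, epoch count, and training examples are illustrative assumptions, and the Python form stands in for the report's MATLAB; the point is only to show the error signal being fed back into the weights.

```python
def train(samples, epochs=50, lr=0.1):
    """Delta-rule training of one linear neuron: y = w*x + b."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            response = w * x + b       # network response to the input
            error = target - response  # error signal: teacher vs. network
            w += lr * error * x        # feed the error back into the weight
            b += lr * error            # ... and into the bias
    return w, b

# Pre-calculated input-output examples of the mapping y = 2x + 1:
w, b = train([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (-1.0, -1.0)])
print(round(w, 3), round(b, 3))  # converges close to w = 2, b = 1
```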

Learning without a teacher:

Unlike supervised learning, in unsupervised learning the learning process takes place without a teacher; that is, there are no examples of the function to be learned by the network.

1) Reinforcement learning: In reinforcement learning, the learning of an input-output mapping is performed through continued interaction with the environment in order to minimize a scalar index of performance. Because no information on what the right output should be is provided, the system must employ some random search strategy, so that the space of plausible and rational choices is searched until a correct answer is found. Reinforcement learning is usually involved in exploring a new environment when some knowledge (or subjective feeling) about the right response to environmental inputs is available. The system receives an input from the environment and produces an output as a response; subsequently, it receives a reward or a penalty from the environment, and it learns from a sequence of such interactions.

2) Unsupervised learning: In unsupervised, or self-organized, learning there is no external teacher or critic to oversee the learning process. Rather, provision is made for a task-independent measure of the quality of the representation that the network is required to learn, and the free parameters of the network are optimized with respect to that measure. Once the network has become tuned to the statistical regularities of the input data, it develops the ability to form internal representations for encoding features of the input and thereby to create new classes automatically.



2.6 Edge Detection:

Edge detection refers to the process of identifying and locating sharp discontinuities in an image. The discontinuities are abrupt changes in pixel intensity which characterize the boundaries of objects in a scene. Classical methods of edge detection involve convolving the image with an operator (a 2-D filter), which is constructed to be sensitive to large gradients in the image while returning values of zero in uniform regions. There is an extremely large number of edge detection operators available, each designed to be sensitive to certain types of edges. Variables involved in the selection of an edge detection operator include edge orientation, noise environment, and edge structure. The geometry of the operator determines a characteristic direction in which it is most sensitive to edges; operators can be optimized to look for horizontal, vertical, or diagonal edges.

Edge detection is difficult in noisy images, since both the noise and the edges contain high-frequency content, and attempts to reduce the noise result in blurred and distorted edges. Operators used on noisy images are typically larger in scope, so they can average enough data to discount localized noisy pixels; this results in less accurate localization of the detected edges. Not all edges involve a step change in intensity: effects such as refraction or poor focus can result in objects with boundaries defined by a gradual change in intensity [1], and the operator needs to be chosen to be responsive to such a gradual change in those cases. There are thus problems of false edge detection, missing true edges, edge localization, high computational time, and problems due to noise. Therefore, the objective is to compare various edge detection techniques and analyze their performance under different conditions. There are many ways to perform edge detection; however, the majority of methods may be grouped into two categories:


Figure 2.3: Edge detection process

2.6.1 Gradient-based Edge Detection:

The gradient method detects the edges by looking for the maximum and minimum in the first derivative of the image.
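The gradient method can be illustrated with the well-known Sobel operators: convolving the image with two 3x3 kernels approximates the first derivative in x and y, and a large gradient magnitude marks an edge. This is a pure-Python sketch on a tiny synthetic image (the report's actual work is done in MATLAB).

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def gradient_magnitude(img, r, c):
    """Apply both Sobel kernels centred on pixel (r, c); return |gradient|."""
    gx = sum(SOBEL_X[i][j] * img[r - 1 + i][c - 1 + j]
             for i in range(3) for j in range(3))
    gy = sum(SOBEL_Y[i][j] * img[r - 1 + i][c - 1 + j]
             for i in range(3) for j in range(3))
    return math.hypot(gx, gy)

# A 5x5 image: dark on the left, bright on the right (a vertical edge).
image = [[0, 0, 255, 255, 255] for _ in range(5)]

edge_pixel = gradient_magnitude(image, 2, 2)  # centred on the edge
flat_pixel = gradient_magnitude(image, 2, 3)  # inside the uniform region
print(edge_pixel > flat_pixel)  # -> True: the maximum sits on the edge
```

The operator returns zero in the uniform region and a large response at the intensity jump, exactly the behaviour the section describes.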

2.6.2 Laplacian-based Edge Detection:

The Laplacian method searches for zero crossings in the second derivative of the image to find edges. An edge has the one-dimensional shape of a ramp, and calculating the derivative of the image can highlight its location. Suppose we have a signal with an edge shown by a jump in intensity: the first derivative peaks at the edge, while the second derivative crosses zero there.
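The zero-crossing idea can be shown in one dimension with the discrete second derivative (an illustrative Python sketch on a synthetic signal, not the report's code).

```python
# A 1-D signal with a jump in intensity (the "edge").
signal = [0, 0, 0, 10, 10, 10]

# Discrete second derivative: f(i-1) - 2*f(i) + f(i+1) at interior points.
second_derivative = [signal[i - 1] - 2 * signal[i] + signal[i + 1]
                     for i in range(1, len(signal) - 1)]
print(second_derivative)  # -> [0, 10, -10, 0]
```

The sign change from +10 to -10 is the zero crossing, and it sits exactly at the jump, which is how the Laplacian method locates the edge.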


When the input data to an algorithm is too large to be processed and is suspected to be highly redundant (much data, but not much information), the input data is transformed into a reduced representation set of features (also called a feature vector). Transforming the input data into this set of features is called feature extraction. If the features are carefully chosen, it is expected that the feature set will extract the relevant information from the input data, so that the desired task can be performed using this reduced representation instead of the full-size input.



In the initial stage of our project we studied various books related to our project. Our project requires a good knowledge of MATLAB, so in order to fulfill the requirements of the project we worked sincerely on MATLAB and learned various operations in it, such as how to read a picture, convert it from one form to another, sampling, masking, and edge detection using different operators. In this way we have understood the use of MATLAB in the field of digital image processing. For our project, the following are the concepts that we went through: i) Artificial Neural Networks ii) Fuzzy Logic iii) K-means Clustering iv) Feature Extraction v) Edge Detection


We believe that our approach will be of great use for forensic face recognition and criminal identification systems, which require descriptive input semantics, since the available data often consist of witnesses' descriptions. In addition, our method of searching for data using descriptive semantics could be combined with existing automated face recognition systems to augment them. Adler et al. concluded in 2006 that humans effectively utilize contextual information while recognizing faces and, in general, equal or outperform even the best automated systems. Extensions to our work could include the annotation of contextual data to images using the descriptive semantic method. This could help improve our face recognition method by obtaining qualitatively better user input as well as improving recognition performance. In general, the use of descriptive input features allows the input data to bear different semantics than the data being searched for. We believe that this could yield good results for other data types as well, especially where direct pattern recognition is either infeasible or yields unsatisfactory results.



[1] A. Adler and M. E. Schuckers. Comparing human and automatic face recognition performance. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 37(5):1248-1255, Oct. 2007.
[2] B. W. Behrman and S. L. Davey. Eyewitness identification in actual criminal cases: an archival analysis. Law and Human Behavior, 25(5):475-491, 2001.
[3] R. Gross. Face databases. February 2005.
[4] J. A. Hartigan and M. A. Wong. A k-means clustering algorithm. Journal of the Royal Statistical Society, Series C (Applied Statistics), 28:100-108, 1979.
[5] M. Kirby and L. Sirovich. Application of the Karhunen-Loève procedure for the characterization of human faces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12:831-835, Dec. 1990.