



A Study of Genetic – Principal Component Analysis in the Feature Extraction and Recognition of Face Images
Omidiora E. Olusayo, Oladipo Oluwasegun, Oyeleye C. Akinwale and Ismaila W. Oladimeji
Abstract - This paper presents the development of a face recognition system using a hybrid Genetic - Principal Component Analysis (GPCA). Principal component analysis uses eigenfaces for feature extraction after image pre-processing (i.e. grayscale conversion and matrix-to-vector conversion), and the genetic algorithm removes irrelevant features from the eigenface space. This ensures that only relevant features are used in biometric verification. Euclidean distance was used as the similarity measure, and a separate threshold was set for each cropped image dimension to determine whether a test image is “known” or “unknown” to a trained image database. The performance metrics used were the average recognition accuracy, the average recognition time and the total training time. These were measured while varying the cropped image dimension between 30*30 and 80*80. The experiment was carried out on a database of 320 black faces. The results showed that the GPCA-based system has better recognition accuracy, with a 9.1665% cumulative increment in recognition accuracy over the PCA-based system, and better recognition time, with a cumulative time gain of 0.085401 seconds over the PCA-based system. However, the PCA-based system took less time to train the image database than the GPCA-based system.

Index Terms - Faces, Genetic Algorithms, Principal Component Analysis, Feature Extraction and Recognition



1. Introduction
Face recognition is the ability to recognize human faces despite many variations in facial appearance, such as facial expression, background lighting, face orientation (which depends on the angle of capture), aging, facial hair and cosmetics. This is usually done by computer analysis of two-dimensional images that represent a three-dimensional human face. The image often includes a human face together with a background, so the face has to be extracted from the background under a variety of light sources. This requires analysis of a number of human face characteristics that distinguish a face from other objects. Increasing demand for fast and reliable recognition of face images has led researchers to examine different pattern recognition schemes. This paper presents a study of a hybrid of Principal Component Analysis (PCA) and Genetic Algorithm (GA) in image processing, with particular reference to human faces.

Face recognition has been one of the most active research areas of pattern recognition since the early 1990s. In the past 20 years, significant advances have been made in the design of successful classifiers for face recognition [1]. PCA, which is the basis of the well-known eigenfaces face recognition algorithm [5], is an appearance-based technique that is widely used for feature extraction and has recorded great performance in face recognition. PCA-based approaches typically include two phases: training and classification. In the training phase, an eigenspace is established from the training samples using PCA and the training face images are mapped to the eigenspace for classification. In the classification phase, an input face is projected to the same eigenspace and classified by an appropriate classifier, such as Support Vector Machines (SVMs) or Neural Networks [2]. A genetic algorithm is an evolutionary algorithm methodology inspired by biological evolution [3]. Evolutionary algorithms create a population of abstract representations of candidate solutions, which is evolved towards better solutions using biology-inspired operators such as selection, crossover and mutation. In recent years, Genetic Programming and other evolutionary algorithms have been used in classification and pattern recognition problems [6].

With the rapid development of biometrics in security, face recognition systems are being used in many domains because of their non-intrusive and stress-free nature in confirming the true identity of a human being. This advancement removes the limitation whereby face recognition facilities were used only for identity verification and surveillance purposes. Face recognition is now used in many applications for interpreting human intentions, actions and behavior, and as input to various context-awareness solutions. The main objective of this work is to develop a face recognition system using hybrid Genetic Principal Component Analysis based techniques. The performance of the algorithm was also evaluated against conventional PCA using the following parameters: average recognition rate, recognition time, training time, false acceptance rate and false rejection rate.

• Omidiora E. Olusayo, Oladipo Oluwasegun, Oyeleye C. Akinwale and Ismaila W. O. are all with the Department of Computer Science and Engineering, Ladoke Akintola University of Technology, Ogbomoso, Nigeria




2. Review of Related Works
Over the years, many researchers have been actively involved in face recognition research. Duan, Liu, Zhao and Chao (2011) [1] worked on improving the PCA algorithm for face recognition. The traditional PCA algorithm was combined with processing that enhances the local mean and standard deviation of the image. The improved PCA algorithm, mixed with LDA and two-dimensional PCA algorithms respectively, extends the scope of the PCA algorithm. Experimental results on the ORL face database showed that the method had good recognition performance and was more robust to uneven illumination and expressive faces.

In 2006, Sentayehu [4] developed a Face Recognition System (FRS) based on an Artificial Neural Network (ANN). In that work, two FRSs were developed. The first model used Principal Component Analysis (PCA) for feature extraction and an ANN for classification. The second model used a combination of Gabor Filter (GF) and PCA for feature selection and an ANN for classification. Experiments were carried out on the FRS using the ORL dataset. The results showed that model 1 achieved a 75.6% recognition rate while model 2 achieved an 88.3% rate of correct classification and performed very efficiently when subjected to new unseen images, with a false rejection rate of 0% during testing. It was concluded that the high recognition rate of model 2 showed the efficiency of GF in feature extraction.

Sarawat et al. (2009) presented an FRS using a genetic algorithm and a back-propagation neural network. The overall system architecture consists of three steps. First, some pre-processing was performed on the input image. Secondly, face features were extracted and taken as input to the Back-Propagation Neural network (BPN). The third step involved the genetic algorithm. Classification was carried out by the BPN and GA. The results obtained demonstrated a high degree of performance of the implemented algorithms.

3. Methodology
This research paper presents the development of an FRS using a hybrid algorithm, Genetic - Principal Component Analysis. Principal component analysis uses eigenfaces for feature extraction and the genetic algorithm removes irrelevant features from the eigenface space. This ensures that only features relevant to biometric identification are passed on to the testing stage. The developed system uses Euclidean distance as the similarity measure to determine whether a face image is recognized or not. Figure 1 shows a simple block diagram of the system implementation. The face database consists of 320 images in 40 classes. Each class comprises 8 images of one subject. Some of the subject images were taken at different times with different facial expressions. The sequence of steps involved in developing the system is described in the flow charts shown in Figures 2a and 2b. The implementation proceeds in two stages, namely the training stage and the testing stage.
Figure 1: Simple block diagram of the implemented face recognition system. Stages: Face image -> Feature extraction (PCA) -> Feature selection and weighting for better classification (GA) -> Face classification.
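As a rough illustration of the feature extraction (PCA) stage in Figure 1, the following Python/NumPy sketch computes eigenfaces from flattened grayscale images: it eigen-decomposes the covariance matrix, sorts the eigenvectors by descending eigenvalue and discards those whose eigenvalues are (near) zero. The function names, the `tol` cut-off and the choice of NumPy are assumptions made for illustration; the paper does not specify its implementation.

```python
import numpy as np

def fit_eigenfaces(images, tol=1e-10):
    """images: (n_samples, n_pixels) array, one flattened grayscale face per row.

    Returns the mean face, the retained eigenvectors (one per column) and
    their eigenvalues, sorted from the largest eigenvalue to the smallest.
    """
    X = np.asarray(images, dtype=float)
    mean_face = X.mean(axis=0)
    A = X - mean_face                       # centre the data
    cov = A.T @ A / X.shape[0]              # covariance over pixel dimensions
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder: most important first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Discard eigenvectors whose eigenvalues are numerically zero
    # (tol is a hypothetical relative cut-off).
    keep = eigvals > tol * eigvals[0]
    return mean_face, eigvecs[:, keep], eigvals[keep]

def project(images, mean_face, eigenfaces):
    """Map flattened images into the eigenface space."""
    return (np.asarray(images, dtype=float) - mean_face) @ eigenfaces
```

For 80*80 crops the covariance matrix is 6400 x 6400, so a practical implementation might prefer a decomposition of the much smaller sample-by-sample matrix; the direct form is kept here because it mirrors the description most closely.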





Training stage: Acquire a set of face images for training -> Crop the training images to the desired dimension -> Convert the images to grayscale -> Convert the image data into vector form -> Apply Principal Component Analysis for dimension reduction and form eigenfaces -> Apply SSGA to the PCA features to remove irrelevant features -> Project the faces in the training set onto the GPCA face space -> Store the projected images in the database.

Testing stage: Acquire a test face image -> Crop the test image to the desired dimension -> Convert the image to grayscale -> Convert the image data into vector form -> Project the image data onto the GPCA face space -> Match the introduced face with the ones in the database template -> Classify the image as either known or unknown.

Figure 2a: Training stage of the FRS
Figure 2b: Testing stage of the FRS

The training stage is preceded by a series of pre-processing steps: image cropping (to remove the background and unwanted parts of the image), grayscale conversion, and conversion of the image data to vector form. Principal component analysis is applied to the vectorized images so that the face image dataset is represented by a reduced number of effective features while retaining the most intrinsic information of the image data. The face database has 200 images used for training, and each image has 6400 (80*80) pixels after background removal. Figure 3 shows sample images before cropping and Figure 4 shows cropped sample images. The number of pixels defines the number of dimensions. The next step was to find the covariance matrix and its eigenvectors and eigenvalues. The eigenvectors were sorted in descending order of importance, that is, by their corresponding eigenvalues from high to low (the first eigenvectors are those whose corresponding eigenvalues are the maximum), and the less important eigenvectors were eliminated. The eigenvectors corresponding to zero eigenvalues were discarded while those associated with non-zero eigenvalues were kept. The retained eigenvectors formed a matrix, which constitutes the eigenface.

The eigenface was passed into the genetic algorithm module to remove its irrelevant features. This was to ensure that only relevant features were used in biometric verification and identification of faces. In the genetic algorithm module, an initial population of uniformly distributed random numbers of the same size as the eigenface matrix was generated as a feature mask. Each element of the feature mask was multiplied by the corresponding element of the eigenface to generate weighted features. Weighted features whose values were neither zero nor approximately zero were selected to create a parent population. The fitness value of the parent population was calculated using equation (1). This was repeated twenty times to generate 20 parent populations. Parent populations were ranked in descending order, and crossover was performed on the strongest two parents to replace the weakest parent. The population was mutated based on the mutation probability and the fitness value of the new population was computed. This was done for fifty generations, and the fittest parent was selected as the Genetic feature matrix, which was used for the recognition process. Roulette wheel selection and one-point crossover were used for the selection and crossover procedures respectively.

Fitness = ((number of errors) * 10) + Feature Used    (1)

Projected training images whose Euclidean distance to the test image was less than or equal to the threshold were selected, and the one with the least Euclidean distance was chosen when more than one image was selected. In this case the image was classified as “known”. When none of the images in the GPCA face space had a distance less than or equal to the threshold value, the test image was said to be “unknown”. It should be noted that the threshold value varies with the cropped face image dimension. All the 320 images acquired were used in this research work. Five (5) images from each class of 8 were chosen for training while the other three were used in testing the system. The test images were introduced one at a time, by specifying the filename, to see whether each would be classified as known or unknown to the face recognition system. The performance metrics used were average recognition accuracy, average recognition time and total training time. These were investigated as the cropped image dimension varied, in order to deduce the optimum dimension for the face recognition system, and the results were compared with the performance of a conventional PCA-based face recognition system.
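The genetic feature-selection step described in this section can be sketched as a small steady-state GA. The population size (20), the number of generations (50), roulette wheel selection, one-point crossover and the fitness of equation (1) come from the text. Everything else is an assumption made to obtain runnable code: the mask is simplified to a binary on/off flag per feature, "number of errors" is counted as nearest-neighbour misclassifications on a held-out set, a lower fitness value is treated as better, and the mutation probability `p_mut` and all function names are hypothetical.

```python
import numpy as np

def fitness(mask, train_X, train_y, val_X, val_y):
    """Equation (1): (number of errors) * 10 + number of features used."""
    tr, va = train_X[:, mask], val_X[:, mask]
    dists = np.linalg.norm(va[:, None, :] - tr[None, :, :], axis=2)
    predicted = train_y[dists.argmin(axis=1)]      # nearest training sample
    errors = int((predicted != val_y).sum())
    return errors * 10 + int(mask.sum())

def roulette_pick(scores, rng):
    # Lower score is assumed better, so weight by distance from the worst.
    weights = scores.max() - scores + 1.0
    return rng.choice(len(scores), p=weights / weights.sum())

def ga_select(train_X, train_y, val_X, val_y,
              pop_size=20, generations=50, p_mut=0.05, seed=0):
    """Return (best binary feature mask, its fitness). Needs >= 2 features."""
    rng = np.random.default_rng(seed)
    n_feat = train_X.shape[1]
    # Uniform random numbers thresholded at 0.5 give the initial masks.
    pop = rng.random((pop_size, n_feat)) > 0.5
    scores = np.array([fitness(m, train_X, train_y, val_X, val_y)
                       for m in pop], dtype=float)
    for _ in range(generations):
        a, b = roulette_pick(scores, rng), roulette_pick(scores, rng)
        cut = rng.integers(1, n_feat)              # one-point crossover
        child = np.concatenate([pop[a][:cut], pop[b][cut:]])
        child ^= rng.random(n_feat) < p_mut        # bit-flip mutation
        worst = scores.argmax()                    # steady state: replace worst
        pop[worst] = child
        scores[worst] = fitness(child, train_X, train_y, val_X, val_y)
    best = scores.argmin()
    return pop[best].copy(), scores[best]
```

Here `train_X` and `val_X` would be PCA-projected feature matrices and `train_y`, `val_y` NumPy arrays of class labels. The factor of 10 in the fitness makes one extra classification error cost as much as ten extra features, so the search first protects accuracy and only then trims features.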

4. Simulation Output and Result Discussion
The simulation process proceeds in two stages, namely the training stage and the testing stage. At the training stage, the system used five (5) of the 8 image samples collected per subject to form a GPCA feature space. This was done by iterating through the faces in folders f1 to f40 of the project directory and training on the files named 1.jpeg, 2.jpeg, 3.jpeg, 4.jpeg and 5.jpeg. The testing image samples were arranged thus: 11.jpeg, 12.jpeg and 13.jpeg were the three remaining images of the first subject, 21.jpeg, 22.jpeg and 23.jpeg were the three remaining images of the second subject, and so on, while 401.jpeg, 402.jpeg and 403.jpeg were the three remaining images of the last subject. These were all stored and arranged in the project directory on the system’s local drive. The dimension of the cropped image influences the performance of both the GPCA- and PCA-based FRS. Table 1 summarizes the effect of increasing the cropped dimension from 30X30 through 80X80 for both the GPCA- and PCA-based systems. The recognition accuracies at dimension 30X30 are 91.6667% and 93.3333% for the PCA- and GPCA-based systems respectively. At dimension 80X80, recognition accuracies of 96.6667% and 98.3333% were obtained for the PCA- and GPCA-based systems respectively. This implies that the recognition accuracy of both systems increases with increase in cropped dimension, because more features are available to identify the face image as the cropped image dimension increases. The GPCA-based system had better recognition accuracy because only relevant features are made available for recognition in the GPCA feature matrix.

Figure 3 Sample image before background removal

Figure 4 Sample image after cropping

The Genetic feature matrix produced was projected into the face space and stored in the database (template). During testing, each image was projected into the GPCA face space after it had been pre-processed (i.e. cropped, converted to grayscale and converted to a vector), and the task of determining whether the image was recognized or not was then carried out. This was achieved by computing the Euclidean distance between the projected test image and the stored GPCA-projected images. The resulting distance values were then compared against a threshold.
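The matching step can be sketched as a thresholded nearest-neighbour search: compute the Euclidean distance from the projected test image to every stored template, take the closest one, and accept it only if its distance does not exceed the threshold. The function name, the return format and the example threshold are assumptions for illustration; the paper's actual threshold values depend on the cropped image dimension and are not listed.

```python
import numpy as np

def classify(test_vec, templates, labels, threshold):
    """Return ("known", label, distance) or ("unknown", None, distance).

    templates: (n_templates, n_features) projected training images.
    labels:    one class label per template.
    """
    diffs = np.asarray(templates, dtype=float) - np.asarray(test_vec, dtype=float)
    dists = np.linalg.norm(diffs, axis=1)      # Euclidean distance per template
    best = int(np.argmin(dists))               # least distance wins
    if dists[best] <= threshold:
        return "known", labels[best], float(dists[best])
    return "unknown", None, float(dists[best])

# Toy usage with two 2-D templates and a hypothetical threshold of 2.0.
templates = [[0.0, 0.0], [3.0, 4.0]]
labels = ["f1", "f2"]
print(classify([0.0, 1.0], templates, labels, threshold=2.0))
print(classify([10.0, 10.0], templates, labels, threshold=2.0))
```

Taking the minimum first and then testing it against the threshold gives the same outcome as selecting every template within the threshold and keeping the closest.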

Table 1: Recognition accuracy for both GPCA- and PCA-based systems

Dimension of    Average recognition   Average recognition   Increment in    Percentage increment in
cropped face    accuracy for GPCA     accuracy for PCA      recognition     recognition accuracy (%)
image           (%) (G)               (%) (P)               accuracy (G-P)  ((G-P) * 100) / P
30X30           93.3333               91.6667               1.6666          1.818108
40X40           95.8333               94.1667               1.6666          1.76984
50X50           96.6667               95.8333               0.8334          0.869635
60X60           97.5000               95.8333               1.6667          1.739166
70X70           98.3333               96.6667               1.6666          1.724068
80X80           98.3333               96.6667               1.6666          1.724068
Cumulative increment in recognition accuracy                9.1665

Dimension of    Number of   Total training   Total training   Extra training   Increase in training
cropped face    trained     time GPCA        time PCA         time (sec)       time (%)
images          images      (sec) (t1)       (sec) (t2)       (T = t1-t2)      ((t1-t2) * 100) / t2
30X30           200         29.9054          17.7997          12.1057          68.0107
40X40           200         51.5895          30.1862          21.4033          70.90425
50X50           200         104.4580         80.0285          24.4295          30.526
60X60           200         236.5440         209.3220         27.2220          13.00484
70X70           200         549.6040         530.0290         19.5750          3.693194
80X80           200         1170.03          1142.4400        27.5900          2.415006
Extra time incurred in GPCA training (total)                  132.3255
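The derived columns of Table 1 follow directly from the two accuracy columns. The short sketch below recomputes the increment G - P, the percentage increment ((G - P) * 100) / P and the cumulative increment from the tabulated accuracies; the variable names are illustrative only.

```python
# Average recognition accuracies (%) from Table 1, for 30X30 through 80X80.
gpca = [93.3333, 95.8333, 96.6667, 97.5000, 98.3333, 98.3333]   # G
pca = [91.6667, 94.1667, 95.8333, 95.8333, 96.6667, 96.6667]    # P

# Increment in recognition accuracy, G - P, rounded to the table's precision.
increments = [round(g - p, 4) for g, p in zip(gpca, pca)]

# Percentage increment relative to the PCA accuracy.
percent = [(g - p) * 100 / p for g, p in zip(gpca, pca)]

# Cumulative increment over all six cropped dimensions.
cumulative = round(sum(increments), 4)

for dim, inc, pct in zip(range(30, 90, 10), increments, percent):
    print(f"{dim}X{dim}: increment={inc:.4f}, percentage={pct:.6f}")
print("cumulative increment:", cumulative)
```

The cumulative value is the sum of the six per-dimension increments, which is the 9.1665 figure quoted in the abstract.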

The recognition time for each image introduced to the system was captured by the program as the difference between the time the image was recognized and the time the search commenced. The system was made to iterate through images 11.jpeg, 12.jpeg, 13.jpeg through 401.jpeg, 402.jpeg, 403.jpeg in the project directory, which were the testing images. The time to recognize each image was captured and accumulated to calculate the average time taken to recognize a test face image. At cropped dimension 30X30 it took an average of 0.087502 seconds and 0.0787076 seconds to recognize a test image in the PCA- and GPCA-based systems respectively, and at cropped dimension 80X80 it took an average of 0.107769 seconds and 0.0845118 seconds respectively, as shown in Table 2. This shows that the GPCA-based system searches faster than its PCA counterpart, because the feature matrix presented for the GPCA search contains only relevant features to search through.
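The timing procedure described above can be sketched as follows: time each recognition call, accumulate the elapsed times and divide by the number of test images. The helper name and the use of `time.perf_counter` are assumptions; the paper does not state which clock was used for recognition time.

```python
import time

def average_recognition_time(recognize, test_images):
    """Average wall-clock seconds that recognize() takes per test image.

    recognize:   any callable that takes one test image (hypothetical).
    test_images: non-empty sequence of images or filenames.
    """
    total = 0.0
    for image in test_images:
        start = time.perf_counter()            # search commences
        recognize(image)
        total += time.perf_counter() - start   # image recognized
    return total / len(test_images)
```

With 40 subjects and three test images each, `test_images` would hold 120 entries, from 11.jpeg through 403.jpeg.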

5. Conclusion
In conclusion, the experiments reveal that the GPCA-based system developed had better recognition accuracy and better recognition time than the conventional PCA-based system, but the PCA-based system trains faster than the GPCA-based system. It could also be inferred that the difference in average recognition time increases with increase in cropped image dimension, that the percentage increase in training time of the GPCA-based system relative to the PCA-based system was minimal (2.415006%) at the highest cropped image dimension of 80X80, and that recognition accuracy generally increases with increase in the cropped face image dimension used.


Table 2: Average recognition time for GPCA- and PCA-based systems

Dimension of    Average recognition   Average recognition   Recognition time   Percentage gain in
cropped face    time GPCA system      time PCA system       gain (sec)         recognition time (%)
images          (sec) (t1)            (sec) (t2)            (T = t2-t1)        ((t2-t1) * 100) / t2
30X30           0.0787076             0.087502              0.008794           10.05120
40X40           0.0817280             0.092515              0.010787           11.65973
50X50           0.0841049             0.094705              0.010600           11.19265
60X60           0.0868764             0.100801              0.013925           13.81435
70X70           0.0830243             0.101062              0.018038           17.84845
80X80           0.0845118             0.107769              0.023257           21.58042
Cumulative recognition time gained                          0.085401



The time incurred while training the GPCA-based system was captured as the difference between the CPU time at the commencement of training on the first image and the CPU time at the conclusion of training on the last image. At dimension 30X30, a total time of 17.7997 seconds and 29.9054 seconds was incurred in training the PCA- and GPCA-based systems respectively, while at dimension 80X80 a total time of 1142.4400 seconds and 1170.03 seconds was incurred respectively, as shown in Table 3. This indicates that the PCA-based system trains faster than its GPCA counterpart. This can be explained by the time incurred in creating new parents in the feature population and the time taken to iterate through the generations in search of the population with the best fitness value.

References
[1] F. Duan, Y. Liu, Q. Zhao and I. Chao (2011). Face Recognition Hybrid Algorithm Based on Improved PCA, Journal of Computational Information Systems 7: 9 (2011), pp. 3062-3069.
[2] A. Eleyan and H. Demirel (2005). Face Recognition System Based on PCA and Feedforward Neural Networks, Computational Intelligence and Bioinspired Systems, vol. 3512, pp. 935-942.
[3] J. R. Koza (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press: Cambridge.
[4] E. W. Sentayehu (2006). Face Recognition Using Artificial Neural Network, Addis Ababa University, Ethiopia.
[5] M. Turk and A. Pentland (1991). Eigenfaces for Recognition, J. Cognitive Neurosci. 3 (1), pp. 71-86.
[6] S. Xuesong and Y. Zhou (2009). Gray Intensity Images Processing for PD Pattern Recognition Based on Genetic Programming, International Joint Conference on Artificial Intelligence JCAI’09, Haikou, pp. 711-714.

Table 3: Training time for both GPCA- and PCA-based systems
Columns: Dimension of cropped face images; Number of trained images; Total training time GPCA (sec) (t1); Total training time PCA (sec) (t2); Extra training time (sec) (T = t1-t2); Increase in training time (%) ((t1-t2) * 100) / t2