Kyung-A Kim, Se-Young Oh, Hyun-Chul Choi
Dept. of Electrical Engineering, Pohang University of Science and Technology, Pohang, Kyungbuk, 790-784, South Korea
kablue98@postech.ac.kr, syoh@postech.ac.kr, dreaming@postech.ac.kr
Abstract
This paper presents a novel algorithm for extracting the facial feature (eyebrow, eye, nose and mouth) fields from 2-D gray-level face images. The fundamental idea is that eigenfeatures, derived from the eigenvalues and eigenvectors of the gray-level data set constructed from the feature fields, are very useful for locating these fields efficiently. In addition, multi-resolution images derived from a 2-D DWT (Discrete Wavelet Transform) are used to reduce the search time for the facial features. The experimental results indicate that the proposed algorithm is robust against facial feature size and slight variations in pose.
1. Introduction

Face recognition is useful for many applications, such as identity authentication, security access and e-commerce, and has received growing interest recently. To recognize a face, automatic extraction of facial features from a person's face image is an essential step. Many researchers have proposed methods to find the facial feature regions [1,3-5,8] or to locate the face region [6-7,9-10] in an image. These methods can be classified by the type of information they use: template matching, intensity, and geometrical features. In general, template matching requires many templates to accommodate varying pose, the intensity-based method requires good lighting conditions, and the geometrical method is not robust enough to pose variations.

In this paper, we present a novel algorithm for the extraction of the facial feature fields from 2-D gray-level face images. We use a sliding window template of the facial features, represented in the eigenfeature space, to locate facial features in face images. Here, the eigenfeature space is defined as the eigenvector space computed from a training set consisting of one particular facial feature: eyes, eyebrows, nose or mouth. To speed up execution, multi-resolution images and heuristic assumptions are used. Also, to enhance the performance of the system, we combine a few single feature detectors. In Section 2, the basic idea of the proposed algorithm is reviewed briefly. A facial feature extractor based on the proposed algorithm is introduced in Section 3, and methods to improve the performance of the extraction system are presented in Section 4. Experimental results are reported in Section 5.
Each candidate window of the image is projected into the eigenfeature space. The dominant PCA components are then mapped back into the original image space, and the result is called a PCA-filtered image. In the facial feature map, the lowest value indicates the presence of a feature (see Figure 2).
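The mechanics of this eigenfeature map can be illustrated with the minimal sketch below. It is only illustrative: the function names, the number of components and the reconstruction-error ("distance from feature space") criterion in the spirit of [2] are our own assumptions, not a prescription from the paper.

```python
import numpy as np

def train_eigenfeatures(patches, n_components=10):
    """Compute eigenfeatures (principal components) from a set of manually
    cropped feature patches, each flattened to a row vector.
    Returns the mean patch and the top eigenvectors."""
    X = np.asarray([p.ravel() for p in patches], dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data gives the PCA basis directly.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mean, Vt[:n_components]              # shape: (n_components, h*w)

def feature_map(image, mean, eigvecs, win_h, win_w):
    """Slide a window over the image; at each position, project the window
    onto the eigenfeature space, map it back into image space, and store the
    reconstruction error.  Low values indicate likely feature locations."""
    H, W = image.shape
    fmap = np.full((H - win_h + 1, W - win_w + 1), np.inf)
    for y in range(H - win_h + 1):
        for x in range(W - win_w + 1):
            w = image[y:y + win_h, x:x + win_w].ravel().astype(float) - mean
            coeffs = eigvecs @ w                  # project onto eigenfeatures
            recon = eigvecs.T @ coeffs            # map back to image space
            fmap[y, x] = np.linalg.norm(w - recon)  # distance from feature space
    return fmap
```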
Figure 2. Example of the feature map for the eye.

Step 1: Compute the eigenvectors (eigenfeatures) of the facial features (eyebrow, eye, nose, mouth) from the training set with manually marked features.

Step 2: Perform a Haar wavelet transform of the input image at several levels (three levels in our experiments) to obtain a scale-frequency decomposition. Among the resulting subbands, the LL2, LH2, HL2 and LL3 images are used to detect facial features. To improve the accuracy of detection at a higher cost, the LL1, LH1 and HL1 images can also be used; there is therefore a trade-off between accuracy and computational cost. In our tests, the former images were sufficient to obtain good performance, so the latter images were not used.

Step 3: Determine the coarse regions of the eyebrows, eyes, nose and mouth using the LL3 (32x32) image. Here, two double feature detectors are used: one finds the coarse region of the eyebrows and eyes, while the other finds the region containing both the nose and the mouth. The coarse region detectors were trained on 30 randomly selected frontal images with manually marked samples; the sample size is 6x15 for the eyebrow-eye region and 8x8 for the nose-mouth region. This step not only reduces the size of the search window in the next step but also improves the performance of the facial feature detection. Figure 4 shows examples of the detection result.

Step 4: Compute the feature map as follows.

Step 4-1: Generate a binary edge image (Canny or Sobel) or a thresholded image within the coarse region obtained in the previous step, using the sum of the LH2 and HL2 images (see Figure 5), which contain the high-frequency features. To save time, the facial feature map is computed only at those pixels within the coarse region whose binary value is 1.

Step 4-2: Compute the facial feature map using PCA information in the LL2 image. In the feature map, the lowest value indicates the presence of a feature [2].

Figure 3. The system block diagram: the input image is decomposed by the 2-D DWT into multi-resolution subbands (LL3, HL3, LL2, HL2, HL1, ...); the coarse regions for the eyebrows, eyes, nose and mouth are determined; a binary edge or thresholded image is generated within each coarse region; the facial feature map is computed and the single feature detectors (eye, eyebrow, nose, mouth) detect the features, with an "if failed" path back into the detection stage; the output is the set of facial features.
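For Step 2 and Step 4-1, a minimal sketch of the multi-level Haar decomposition and the high-frequency edge mask is given below. It assumes the PyWavelets package and a 256x256 input; the function names and the threshold are our own choices, and the LH/HL labels follow the paper's naming rather than PyWavelets' horizontal/vertical convention, so they may be swapped relative to a particular implementation.

```python
import numpy as np
import pywt

def haar_pyramid(image, levels=3):
    """Multi-level 2-D Haar DWT returning the subbands used by the detector:
    the approximation (LL) image at each level plus the detail images
    labelled LH/HL as in the paper."""
    subbands = {}
    ll = np.asarray(image, dtype=float)
    for level in range(1, levels + 1):
        ll, (lh, hl, hh) = pywt.dwt2(ll, 'haar')  # one decomposition level
        subbands[f'LL{level}'] = ll
        subbands[f'LH{level}'] = lh
        subbands[f'HL{level}'] = hl
    return subbands

# A 256x256 image gives a 32x32 LL3 image for coarse region detection and
# 64x64 LL2 / LH2 / HL2 images for the feature map and the edge mask.
def edge_mask(lh2, hl2, threshold):
    """Binary mask from the summed high-frequency subbands (Step 4-1).
    The threshold value is an assumption; the paper only says that a
    'certain threshold' is applied to the summed LH2 + HL2 image."""
    return (lh2 + hl2) > threshold
```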
(a) (b) Figure 4. Example of the coarse feature regions obtained in Step 3: (a) the coarse region containing the eyebrows and eyes, and (b) the coarse region containing the nose and mouth.

(a) (b) Figure 5. (a) Clockwise from the top left: the LL2, LH2, (LH2+HL2) and HL2 images. (b) The binary image obtained by applying a threshold to the summed LH2 and HL2 image.

4.1. Combined usage of two single feature detectors for increased robustness

We developed a feature detection system which consists of two double feature detectors and four single feature detectors (see Figure 3). Double feature detector 1 extracts a region containing the eyebrows and eyes, while double feature detector 2 extracts the region containing the nose and mouth. The second row of Figure 6 shows the advantage of detecting the eyes and eyebrows together rather than with single feature detectors alone: the single eye detector cannot extract the right eyebrow, whereas the eyebrow detector makes up for the eye detector's defects. The first row of Figure 7 likewise demonstrates the advantage of the nose-mouth detector: the single nose detector cannot extract the exact nose position, but this problem is solved because the mouth detector detects the nose as well as the mouth. The inverse case may also occur.

(a) (b) (c) Figure 6. Example of the results of (a) the eye detector, (b) the eyebrow detector and (c) the combined detector.

(a) (b) (c) Figure 7. Example of the results of (a) the nose detector, (b) the mouth detector and (c) the combined detector.

4.1.1 Eye-eyebrow detector

(1) Eye detector. First, select the four candidates (two eyes and two eyebrows) that have the lowest values in the feature map. This extracts both eyes and eyebrows with over 80% accuracy. If the four candidates satisfy the heuristic assumptions (the geometric relations between eyebrows and eyes), the eyebrow detector does not operate (see the first row of Figure 6). Otherwise, the eyebrow detector operates to detect the missed features.

(2) Eyebrow detector. It works within the coarse region, taking into account both the heuristic assumptions and the result of the eye detector. If the lowest value in the eyebrow feature map is below a chosen threshold, that position is selected for the remaining feature (see the second row of Figure 6).

4.1.2 Nose-mouth detector

(1) Nose and mouth detector. Select the two candidates that have the lowest value in the feature map of each single feature detector. Considering both the heuristic assumptions (the geometric relations between nose and mouth) and the results of the nose and mouth detectors, the features are extracted (see Figure 7).
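To make the candidate selection and heuristic consistency check concrete, a minimal sketch is given below. The particular geometric constraints, the thresholds and the helper names are our own illustrative assumptions; the paper only states that heuristic relations between the paired features are used.

```python
import numpy as np

def lowest_candidates(fmap, k):
    """Return the k positions with the lowest feature-map values."""
    flat = np.argsort(fmap, axis=None)[:k]
    return [np.unravel_index(i, fmap.shape) for i in flat]

def plausible_eye_eyebrow(candidates, max_tilt=5):
    """Illustrative heuristic only: sort the four candidates by row, take the
    upper two as eyebrows and the lower two as eyes, and require each pair to
    be roughly level (small vertical difference)."""
    ys = sorted(y for y, _ in candidates)
    eyebrows_level = abs(ys[0] - ys[1]) <= max_tilt
    eyes_level = abs(ys[2] - ys[3]) <= max_tilt
    return eyebrows_level and eyes_level

def detect_eyes_and_eyebrows(eye_map, eyebrow_map, threshold=0.5):
    """Combined detector in the spirit of Section 4.1.1: take the four best
    candidates from the eye feature map; only if they violate the heuristic
    assumptions is the eyebrow detector consulted for the missed features."""
    candidates = lowest_candidates(eye_map, 4)
    if plausible_eye_eyebrow(candidates):
        return candidates                       # eye detector alone suffices
    extra = lowest_candidates(eyebrow_map, 2)   # fall back to the eyebrow map
    extra = [p for p in extra if eyebrow_map[p] < threshold]
    return candidates + extra
```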
5. Experimental Results

We used the IMDB (Intelligent Multimedia Lab. Database) face data, which consists of 107 Asian faces (56 males and 51 females) [11]. All images are 256x256 pixels with 256 gray levels and were taken against a homogeneous background, with the person in an upright frontal position and with tolerance for some tilting and rotation. The eye and eyebrow detectors were trained on 20 selected frontal images, while the nose and mouth detectors were trained on 30 frontal images with manually marked features. The size of each feature in the training samples is 20x50, 20x40, 26x50 and 30x60 pixels for the eyebrow, eye, nose and mouth, respectively. Figure 8 shows examples of the training feature set.

In this experiment, our algorithm was applied to different people with various poses. The algorithm is very fast (with an average execution time of 0.0862 s) and produced good results. The feature detectors were trained as described above, using 20~30 frontal images; to improve the generalization performance, it is important to select good samples that represent a variety of feature shapes. The proposed algorithm is robust on the test data, which consists of rotated face images. Table 1 shows the experimental results. Feature extraction achieved a correct hit rate of 92.23~98.13% on the training feature samples consisting of frontal faces, and the system generalized to 90.17~96.78% on the test set. In particular, the extraction performance for the eyes is higher than that of other algorithms [1,3]. Figure 9 shows representative experimental results; they demonstrate the robust performance of the detector, which extracts narrow eyes as well as eyes behind spectacles. Dotted rectangles mark examples of mis-extraction in rotated faces. To improve the performance on rotated faces, we can select sample features in the rotated space as well as the frontal space. For simplicity, facial features hidden by hair were not considered.

Table 1. Extraction performance. A: number of missed features in the frontal face DB. B: number of missed features in the rotated face DB. C: extraction performance in the frontal face DB. D: extraction performance in the rotated face DB. (Figures in parentheses are the total number of features.)

Feature    A           B             C         D
eyebrow    16/(206)    160/(1628)    92.23%    90.17%
eye        4/(214)     55/(1710)     98.13%    96.78%
nose       5/(107)     40/(856)      95.33%    95.33%
mouth      7/(107)     47/(856)      93.46%    94.51%
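As a consistency check on Table 1, each hit rate follows directly from the miss counts: for the eyebrow in the frontal face DB, (206 - 16)/206 = 92.23%, and the remaining entries follow in the same way.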
Figure 9. Examples of the results of the facial feature detector. The cases include male and female faces, with and without eyeglasses. Dotted boxes indicate errors.
6. Conclusion
A facial feature extraction algorithm based on eigenfeatures and multi-resolution images has been presented, with the following three merits. First, the training and extraction times of the proposed system are shorter than those of the existing algorithms we know of [1,3]: training requires only the computation of the eigenfeatures, which makes it faster than training an SVM or MLP [1,3], and extraction takes less than 0.09 seconds per image thanks to the 2-D DWT, coarse region extraction, binary (edge or threshold) images and heuristic assumptions. Second, although the detector system is trained on a relatively small feature sample set, only 2~3% of the total data, it shows good generalization performance. Third, the output of the proposed feature detection system, namely the eigenfeatures and the geometric information of the features, can be applied directly to face recognition without additional processing. The proposed system can be applied to 3-D face modeling, face tracking and detection in mobile robots, and face recognition.
References

[1] Yeon-Sik Ryu and Se-Young Oh. Automatic Extraction of Eye and Mouth Fields from a Face Image using Eigenfeatures and Ensemble Networks. Applied Intelligence, Vol. 17(2), 2002, pp. 171-185.
[2] Matthew Turk and Alex Pentland. Eigenfaces for Recognition. Journal of Cognitive Neuroscience, Vol. 3, No. 1, 1991.
[3] Dihua Xi, Igor T. Podolak and Seong-Whan Lee. Facial Component Extraction and Face Recognition with Support Vector Machines. Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition, 2002, pp. 76-81.
[4] Yankang Wang, H. Kuroda, M. Fujumura and A. Nakamura. Automatic extraction of eye and mouth fields from monochrome face image using fuzzy technique. Proceedings of the Fourth IEEE International Conference on Universal Personal Communications, Tokyo, Japan, 1995, pp. 778-782.
[5] R. Pinto-Elias and J. H. Sossa-Azuela. Automatic facial feature detection and location. Proceedings of the 14th International Conference on Pattern Recognition, Vol. 2, 1998, pp. 1360-1364.
[6] R. Brunelli and T. Poggio. Face recognition: features versus templates. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 15, No. 10, October 1993, pp. 1042-1052.
[7] P. Juell and R. Marsh. A hierarchical neural network for human face detection. Pattern Recognition, Vol. 29, No. 5, 1996, pp. 781-787.
[8] Weimin Huang, Q. Sun, C. P. Lam and J. K. Wu. A robust approach to face and eyes detection from images with cluttered background. Proceedings of the 14th International Conference on Pattern Recognition, Vol. 1, 1998, pp. 110-113.
[9] Kin Choong Yow and R. Cipolla. Feature-based human face detection. Image and Vision Computing, Vol. 15, No. 9, 1997, pp. 713-735.
[10] Takeo Kanade. Picture processing by computer complex and recognition of human faces. Technical Report, Department of Information Science, Kyoto University, 1973.
[11] Intelligent Multimedia Laboratory. Asian Face Image Database PF01. Technical Report, Department of Computer Science and Engineering, Pohang University of Science and Technology, 2001.