

8th Seminar on Neural Network Applications in Electrical Engineering, NEUREL-2006, Faculty of Electrical Engineering, University of Belgrade, Serbia, September 25-27, 2006


Zoran Bojkovic, Senior Member IEEE, Andreja Samcovic
University of Belgrade, Faculty for Traffic and Transport Engineering, Serbia
e-mail: zsbojkovic@yahoo.com
Abstract: Neural networks are adaptive information processing systems that offer attractive solutions for video surveillance. This application aims at identifying particular patterns. Also, the MPEG-4 standard profiling strategy in facial animation guarantees that the standard can provide adequate solutions for video surveillance. The main goal of this presentation is to provide face detection for video surveillance using a neural network based method. After providing the corresponding architecture for face detection, the emphasis is on the detector, which is trained with multilayer back propagation neural networks. Three different face representations are taken into account, i.e. pixel representation, partial profile representation and eigenface representation. Based on these, three independent sub-detectors are generated and the detection rates are measured. A circle at about 94% marks the point where the neural network achieves its optimal performance.
I. INTRODUCTION

In the past years, there has been a continuous increase of interest in the field of human face processing [3], [4], [5]. The research areas are: (a) video surveillance, which is based on the fact that faces provide an important cue for people's identity; (b) facial expression analysis, which provides a natural and intelligent computer interaction interface; (c) human faces in semantics-based video compression and coding. The first part of this work deals with the state of the art in the area of face detection algorithms. After that, the neural network based face detection approach, together with the corresponding architecture, is analyzed. The training process method and results conclude the presentation.

Keywords: Face detection, facial structure, video surveillance

There are two possible approaches to the communication of talking-head video. The pixel-based approach renders the facial images and transmits the resulting images as arrays of pixels, whereas the model-based approach transmits the facial animation parameters (FAPs) that describe the facial motions and renders the images at the receiver. The model-based approach divides the task into geometric and articulation modeling, described by the MPEG-4 Synthetic and Natural Hybrid Coding (SNHC) group as the facial definition parameters (FDPs) and the FAPs, respectively. The geometric model defines the polygonal mesh of the face and the associated skin texture, from which visually realistic facial images at different view angles can be synthesized [1]. The articulation model deals with the deformation of static geometric models to generate various dynamic effects for intelligible reproduction of facial expressions [2]. FAPs describe the movements of the face, either at low level or at high level. Here, low level means displacement of a specific single point of the face, and high level represents reproduction of a facial expression. In other words, the FAPs form the proper animation parameter stream, while the FDPs are responsible for defining the appearance of the face.
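The FDP/FAP split above can be illustrated with a minimal sketch. The class layout, field names and parameter names below are hypothetical placeholders chosen for illustration, not the actual MPEG-4 bitstream syntax:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class FDP:
    """Facial Definition Parameters: static face geometry, sent once."""
    mesh_vertices: List[Tuple[float, float, float]]  # polygonal face mesh
    texture_id: str                                  # associated skin texture

@dataclass
class FAPFrame:
    """Facial Animation Parameters for one frame: motion only."""
    low_level: Dict[str, float]   # displacement of single feature points
    high_level: Dict[str, object] # e.g. a whole expression such as "joy"

def animate(fdp: FDP, fap_stream: List[FAPFrame]):
    """Receiver-side loop: geometry is defined once by the FDP, while each
    FAP frame deforms it (actual mesh deformation/rendering omitted)."""
    for frame in fap_stream:
        yield (fdp, frame)  # one rendered frame per FAP frame

stream = [FAPFrame({"open_jaw": 100.0}, {"expression": "joy"})]
rendered = list(animate(FDP([(0.0, 0.0, 0.0)], "skin"), stream))
print(len(rendered))
```

The point of the sketch is the bandwidth asymmetry: the FDP travels once, and only the small per-frame FAP records travel continuously.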

In the past decade, considerable effort has been spent on designing face detection methods. These methods range from simple edge-based algorithms to complex high-level approaches using pattern recognition techniques. Generally speaking, they can be classified as knowledge-based classifiers and statistical learning-based classifiers. Knowledge-based classifiers use low-level image features such as skin color, face geometry and facial organ distribution. These detection methods use semantic knowledge of human faces and at the same time are relatively simple to implement; however, they are not robust to large face variations. A multilayered network that learns face/non-face patterns from numerous training samples is reported in [6]. The presented detector suffers from the fact that the execution speed of the algorithm is too low for real-time surveillance applications: the system has no pre-knowledge about the most probable locations of faces, so it must scan windows at all pixel positions and at arbitrary window sizes extracted from the input image. The most important factor in the execution time of the system is the number of small windows that the neural network has to process. To improve the efficiency of the neural network based method, a face detection approach that uses successive face detectors to progressively restrict the possible candidate face regions to smaller areas can be used. The successive face detectors approach is presented in Figure 1: three detectors (color-based, structure-based, learning-based) are cascaded, so that the outputs of a previous detector (potential facial regions) act as the inputs of the subsequent detector.
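The cascade idea can be sketched as follows. The detector functions, field names and thresholds are illustrative placeholders, not the paper's actual detectors; the point is only the control flow, in which each stage sees only the survivors of the previous one:

```python
def color_based(regions):
    # fast skin-color pruning: keep regions with enough skin-like pixels
    return [r for r in regions if r.get("skin_ratio", 0.0) > 0.3]

def structure_based(regions):
    # profile analysis: keep regions with a plausible facial structure
    return [r for r in regions if r.get("structure_score", 0.0) > 0.5]

def learning_based(regions):
    # neural network detector: final, most expensive verification
    return [r for r in regions if r.get("nn_score", 0.0) > 0.9]

def cascade(regions, stages=(color_based, structure_based, learning_based)):
    for stage in stages:
        regions = stage(regions)  # output of one stage feeds the next
        if not regions:
            break                 # early exit: nothing left to verify
    return regions

candidates = [
    {"skin_ratio": 0.6, "structure_score": 0.7, "nn_score": 0.95},  # a face
    {"skin_ratio": 0.1},                                            # background
]
survivors = cascade(candidates)
print(len(survivors))  # -> 1
```

Because the cheap color test discards most background windows, the expensive neural network stage runs on only a small fraction of the image, which is exactly the speed argument made above.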

1-4244-0433-9/06/$20.00 ©2006 IEEE.


II. NEURAL NETWORK BASED FACE DETECTION APPROACH

Initial pruning of large non-face areas by the first detector significantly decreases the number of input windows for the final neural network based detector and increases the processing speed of the detection process, while the overall detection performance can be reinforced by the subsequent detectors. The detector is trained with multilayered back propagation neural networks which take different face representations as input. Different inputs take into account different information patterns, giving the trained neural networks a broader sensitivity to certain image patterns. The weighted sum of the results from the networks should give a reliable judgment on the existence of face patterns.

Fig. 1. Block-scheme for the successive face detectors approach in the whole chain

A biologically-motivated face detection system developed in [7] is used to segment the face from the rest of the image. The implementation of the system emphasizes the criteria adopted in developing the face detector. They are as follows:
* The algorithm must be robust enough to cope with the intrinsic variability in images
* It must perform well in an unstructured environment
* It should be amenable to real-time implementation and produce low or no false alarms
A successful implementation of face detection was performed using a retinally connected neural network architecture and was later refined to make it suitable for real-time applications. The average performance of the system is above 95% on face images having 30-35% deviation from the frontal face image.

In order to improve the efficiency of face detection for video surveillance, a data projection approach can be recommended. A great deal of effort has been devoted to this subject: searching for better and more suitable data projection methods has always been an integral objective of pattern recognition. High dimensionality of data poses various challenges for a learning algorithm [7]. In higher dimensions, data may be sparse, making it difficult for an algorithm to find any structure in the data; in addition, the presence of irrelevant and noisy information can mislead the learning algorithm. Data projection methods attempt to take data from a high dimensional space and map it into a low dimensional space with a minimum of error. The two main reasons for reducing the dimensionality of data are: (a) to allow the distribution of the data to be visualized, and (b) to reduce the size of the input space and find the intrinsic dimension of a signal. Such a method enables one to observe and detect underlying data distributions, patterns and structures.

The first detector in a cascade of face detectors emphasizes speed more than accuracy. For roughly locating facial regions, color information is an effective image feature [8]. The principal steps in the skin-color detector are pixel by pixel color verification, skin-color segmentation, and a binary filter as a post-processor to smooth the segmentation results. The chrominance components (Cb, Cr) in the YCbCr color space form a condensed cluster. An elliptical model, shown in Figure 2, is generated by fitting an ellipse over this cluster. Depending on whether an input pixel falls inside or outside of the elliptical region, we can eliminate large background areas that contain no significant skin-like colors.

Fig. 2. Elliptical model in skin-color distribution (Cb versus Cr plane)
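A pixel test against such an elliptical model might look like the sketch below. The ellipse center, axes and rotation angle are invented placeholder values for illustration, not parameters fitted from real skin data:

```python
import math

# Placeholder ellipse in the Cb-Cr plane (NOT fitted from real skin samples)
CENTER_CB, CENTER_CR = 110.0, 155.0
AXIS_CB, AXIS_CR = 25.0, 15.0
THETA = math.radians(-30.0)  # rotation of the ellipse's principal axes

def is_skin(cb: float, cr: float) -> bool:
    """True when the chrominance pair falls inside the elliptical cluster."""
    # translate to the ellipse center, then rotate into its principal axes
    x, y = cb - CENTER_CB, cr - CENTER_CR
    u = x * math.cos(THETA) + y * math.sin(THETA)
    v = -x * math.sin(THETA) + y * math.cos(THETA)
    # standard ellipse inclusion test in the rotated frame
    return (u / AXIS_CB) ** 2 + (v / AXIS_CR) ** 2 <= 1.0

print(is_skin(110, 155))  # center of the cluster -> True
print(is_skin(20, 240))   # far from the cluster  -> False
```

The test ignores luminance (Y) entirely, which is what makes the skin cluster compact; applying it per pixel yields the binary skin map that the post-processing filter then smooths.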

The structure-based detector focuses on the feature structure of the face, because facial features have been proven valuable for classifying human faces. On the other hand, facial feature appearance and distribution vary considerably among different people and under different imaging conditions, which often degrades the performance of the detector. In order to select face areas present in the skin blobs acquired by the first detector, we can enumerate all small windows (for example, 25 by 25 pixels) extracted from each skin blob at every position and scale, and use a probability-based face structure detector which ranks each input window as a positive or negative face candidate. The algorithm starts with a gray-scale input window. The facial profile in the vertical direction is then generated from the window, from which the second-order derivatives of local maxima are located as reference lines. The reference lines correspond to some comparatively "flat" and "light" areas in the face. After the vertical profile position of each candidate feature is located, we can apply a probability-based evaluation process to further verify the existence of prominent facial features. Profile analysis thus acts as a first-step prune in the structure detector and also as a feature candidate selector. The positive candidates are then supplied to a neural network based detector to further evaluate the human face resemblance, since it is important to differentiate faces from other non-face objects.

III. NETWORK ARCHITECTURE FOR FACE DETECTION

The goal is to determine which face representation method describes a face optimally. We suppose three face representations: pixel representation, partial profile representation and eigenface representation. Three independent sub-detectors are generated based on these representations.

The pixel representation contains the complete information about the input window and can be seen as a lossless representation method; it should be noted that this is probably the most commonly used input format for neural network based object detectors. The partial profile representation looks at the profiles of three blocks, each featuring a salient facial feature region in the face; the profile information contains the integral information about the pixel distribution and retains certain invariability among faces. The partial profile block representation is given in Figure 3. The eigenface representation is based on the principal component decomposition and provides a compact way of representing an arbitrary face using only a few parameters. Starting from the pixel representation, it captures all the variability among the face samples; we select coefficients (for example, the first 80) in the eigenface space for the representation of faces. Eigenface representation is very effective in facial coding and compression.

Fig. 3. Partial profile block representation (x- and y-profiles of blocks A, B and C)

The network architecture for face detection is shown in Figure 4. It can be seen that the architecture is composed of three parts which look at the three described face representations. Each sub-network is a connected three-layer back propagation network with a fixed number of middle layer hidden units. The weighted sum of the sub-network outputs gives an indication between 0 and 1 about the input window: an output of 1 indicates that the input window contains a face, while an output of 0 indicates that the input window contains no recognizable face pattern.

IV. TRAINING PROCESS

The neural network gradually learns face characteristics by learning from samples. As for the network training, the face training set contains 12000 face images collected from various face databases and web photo galleries. These samples also include scaled versions of the same faces, with a scaling factor between 0.8 and 1.2. The samples are generated from the repository in a random way, so that the correlation effect within one image can be maximally reduced. Sometimes faces were not detected due to substantial in-depth or in-plane rotation, or when a too bright or too dark environment occurs, leading to a final detection failure.

It is equally important to avoid false acceptances. A similar strategy is reported in [6], but it is computationally expensive. Also, we propose the non-face training samples to be added to the training set dynamically. The whole training process can be divided into parts, and each part contains a variable number of iterations. During each iteration, the network is trained with, for example, 100 randomly selected face samples and 100 non-face samples from each training set; this number is determined by the size of the non-face training set at that time. At the end of each part, the network undergoes a test stage, in which it is supplied with random new non-face samples clipped from the non-face repository. When up to 250 false detections are collected, the test process pauses; these images are added to the non-face training set, and the next part is started all over again. In that way, the network is trained explicitly to handle non-face data.

The detection rates are measured against a separate test set of 500 faces and 4000 non-faces. The test samples are selected in such a way that the correlation between samples is kept as low as possible. Figure 5 shows the training process for the face detection rate in the case of the eigenface representation. The network achieves its optimal performance at about 94%; this is indicated by a circle in the figure.
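The eigenface representation described above can be sketched as follows: windows are flattened to vectors, the principal components are obtained from the centered face samples (here via SVD), and each window is then described by its first 80 coefficients in the eigenface space. The random array merely stands in for real 25-by-25 face windows:

```python
import numpy as np

rng = np.random.default_rng(0)
faces = rng.random((200, 25 * 25))  # 200 training windows, flattened 25x25 px

mean = faces.mean(axis=0)
centered = faces - mean
# SVD of the centered sample matrix; the rows of vt are the eigenfaces,
# i.e. the principal directions of variation among the face samples
_, _, vt = np.linalg.svd(centered, full_matrices=False)
eigenfaces = vt[:80]                # keep the first 80 components

def to_eigenspace(window: np.ndarray) -> np.ndarray:
    """Compact 80-coefficient description of a 625-pixel window."""
    return eigenfaces @ (window.ravel() - mean)

coeffs = to_eigenspace(faces[0])
print(coeffs.shape)  # (80,)
```

The compression is the point: the sub-network for this representation receives 80 inputs instead of 625, while the leading components still capture most of the variability among the face samples.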

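The dynamic collection of non-face samples described in this section can be sketched as a bootstrap loop. The network object, the detector function and the sample repository below are all placeholders; only the part/test/augment control flow reflects the text:

```python
import random

def train_part(network, faces, non_faces):
    # one training part: e.g. 100 random face and 100 random non-face samples
    batch_f = random.sample(faces, min(100, len(faces)))
    batch_n = random.sample(non_faces, min(100, len(non_faces)))
    network["trained_on"] += len(batch_f) + len(batch_n)

def network_says_face(network, sample):
    # placeholder detector that misfires on a fixed fraction of samples
    return hash(sample) % 10 == 0

def bootstrap(network, faces, non_faces, repository, parts=3, max_false=250):
    for _ in range(parts):
        train_part(network, faces, non_faces)
        false_detections = []
        for sample in repository:            # random new non-face clips
            if network_says_face(network, sample):
                false_detections.append(sample)
            if len(false_detections) >= max_false:
                break                        # pause the test at 250 hits
        non_faces.extend(false_detections)   # grow the non-face training set
    return network

net = bootstrap({"trained_on": 0},
                ["f%d" % i for i in range(500)],
                ["n%d" % i for i in range(500)],
                ["bg%d" % i for i in range(2000)])
print(net["trained_on"])
```

Each part thus teaches the network precisely the non-face patterns it currently mistakes for faces, which is the stated advantage over collecting a fixed non-face set up front.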
Fig. 4. Network architecture for face detection (input units, three sub-networks, output units, weighted final decision: face 1 / non-face 0)

V. CONCLUSION

We have analyzed a neural network based face detection approach. It is demonstrated how neural network technology can give a solution for fast human face detection using a cascade of detectors: color-based, structure-based and learning-based. A neural network based face detector is trained with three multilayered back propagation neural networks which take three different face representations as input. The weighted sum of the results from the three networks should give a reliable judgment on the existence of face patterns. The MPEG-4 standard provides a consistent and complete architecture for the coded representation of the desired combination of streamed elementary audiovisual information. Within MPEG-4, the binary format for scene description framework offers a parametric methodology for scene structure representation, as well as adequate solutions for applications in face detection for video surveillance.

Fig. 5. Training process of the neural network, eigenspace representation (detection rate versus false detection rate)

ACKNOWLEDGMENTS

This research has been supported by the Ministry for Science and Environmental Protection of the Republic of Serbia.

REFERENCES

[1] Z.Bojkovic, D.Milovanovic: "Audiovisual integration in multimedia communications based on MPEG-4 facial animation", Circuits, Systems and Signal Processing (CSSP), Birkhaeuser, Vol.20, No.3, 2001, pp 311-339.
[2] P.Doenges et al: "MPEG-4: audio/video and synthetic graphics/audio for mixed media", Signal Processing: Image Communication, Vol.9, No.4, May 1997, pp 433-464.
[3] M.Yeasin, B.Bullot, R.Sharma: "Recognition of facial expressions and measurement of levels of interest from video", IEEE Trans. on Multimedia, Vol.8, No.3, June 2006, pp 500-508.
[4] R.Chellappa, C.Wilson, S.Sirohey: "Human and machine recognition of faces: a survey", Proceedings of the IEEE, Vol.83, No.5, May 1995, pp 705-740.
[5] A.Samal, P.Iyengar: "Automatic recognition and analysis of human faces and facial expressions: a survey", Pattern Recognition, Vol.25, No.1, January 1992, pp 65-77.
[6] H.Rowley, S.Baluja, T.Kanade: "Neural network-based face detection", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.20, No.1, January 1998, pp 23-38.
[7] W.Vieux, K.Schwerdt, J.Crowley: "Face-tracking and coding for video compression", Proc. of the 1st Conf. on Computer Vision Systems, LNCS 1542, 1999, pp 151-161.
[8] R.Hsu, M.Abdel-Mottaleb, A.Jain: "Face detection in color images", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.24, No.5, May 2002, pp 696-706.
[9] M.Turk, A.Pentland: "Eigenfaces for recognition", Journal of Cognitive Neuroscience, Vol.3, No.1, 1991, pp 71-86.