
AUTOMATIC FACE REGION TRACKING FOR HIGHLY ACCURATE FACE RECOGNITION IN UNCONSTRAINED ENVIRONMENTS

Young-Ouk Kim†*, Joonki Paik†, Jingu Heo‡, Andreas Koschan‡, Besma Abidi‡, and Mongi Abidi‡

Image Processing Laboratory, Department of Image Engineering
Graduate School of Advanced Imaging Science, Multimedia, and Film
Chung-Ang University, Seoul, Korea

*Korea Electronics Technology Institute, 203-103 B/D 192, Yakdae-Dong,
Wonmi-Gu Puchon-Si, Kyunggi-Do 420-140, Korea

Imaging, Robotics, and Intelligent Systems Laboratory
Department of Electrical and Computer Engineering
The University of Tennessee, Knoxville

ABSTRACT

In this paper, we present a combined real-time face region tracking and highly accurate face recognition technique for an intelligent surveillance system. High-resolution face images are very important to achieve an accurate identification of a human face. Conventional surveillance or security systems, however, usually provide poor image quality because they use only fixed cameras to passively record scenes. We implemented a real-time surveillance system that tracks a moving face using four pan-tilt-zoom (PTZ) cameras. While tracking, the region-of-interest (ROI) can be obtained by using a low-pass filter and background subtraction with the PTZ. Color information in the ROI is updated to extract features for optimal tracking and zooming. FaceIt®, which is one of the most popular face recognition software packages, is evaluated and then used to recognize the faces from the video signal. Experimentation with real human faces showed highly acceptable results in the sense of both accuracy and computational efficiency.

1. INTRODUCTION

Recently, intelligent surveillance systems have gained more attention, especially for use in unconstrained, complicated security environments. The main purpose of these systems is to monitor and identify an intruder with an acceptable level of accuracy. Most existing surveillance systems simply record a fixed viewing area, while some others adopt a tracking technique for wider coverage areas [1,2]. Although panning and tilting the camera extends its viewing area, only a few automatic zoom control techniques have been proposed for acquiring the optimum ROI.

The final goal of intelligent surveillance systems is to accurately identify the subject. Face recognition is a separate research area in image processing and computer vision that can serve this objective.

The area of face recognition has become more attractive than ever because of the increasing need for security. Eigenface (PCA) [3] and Local Feature Analysis (LFA) [4] are popular algorithms in face recognition technology. Other algorithms such as Linear Discriminant Analysis (LDA) [5], Independent Component Analysis (ICA) [6], Elastic Graph Matching (EGM) [7], Neural Networks (NN) [8], Support Vector Machines (SVM) [9] and Hidden Markov Models (HMM) [10] have also been actively investigated for face recognition. Although some leading (according to the FERET Evaluation Report [11]) commercial software packages such as those by Identix and Viisage are widely used in real applications such as Super Bowl games and airports, critics believe that their accuracy is still questionable. The performance of any face recognition software depends on how one controls the area where faces are captured in order to minimize illumination effects, pose and other facial variations [12]. A performance enhancement technique has been proposed using the post-processing method in [13].

Among various factors that directly affect the accuracy of a face recognition algorithm, the size and pose of the face are the most important in the sense of quality and reliability of outcome.

In this paper, we present an efficient, real-time implementation of a four-channel automatic zoom (in/out) module for high-resolution acquisition of face regions. We also test an existing face recognition algorithm [14] using the optimally detected face region. Although object tracking is an active research topic in computer vision, its practical implementation is still under development due to
the high computational complexity and difficulty in analysis of false detections.

Optimum zooming control plays an important role in enhancing the performance of tracking [15] and at the same time provides for highly accurate identification of an intruder. To realize this function, we first detect and track the face of a moving person in front of four PTZ cameras, and extract several features for tracking and optimization of the zooming scale. Existing real-time tracking techniques include CAMSHIFT [16], condensation [17], and adaptive Kalman filtering. However, these algorithms fail to track the object when it moves far away from the camera.

Many chroma distribution-based face tracking algorithms have been proposed since these are very efficient in the sense of both tracking performance and computational speed. Yang and Waibel [18] proposed a real-time face tracking algorithm using normalized color distribution. Yao and Gao [19] presented a face tracking algorithm based on the skin and lip chroma transform. Huang and Chen [20] built a statistical color model and deformable template for tracking multiple faces.

These algorithms, however, cannot successfully track the face region in the presence of occlusion or when colors are similar to the background. The proposed technique utilizes both color distribution and ellipse template matching to solve the occlusion problem in real time.

2. THE PROPOSED FACE TRACKING-RECOGNITION FRAMEWORK

The framework for automatic face region detection and recognition is shown in Figure 1.

Figure 1: The proposed face region tracking and recognition system

We used four PTZ cameras for tracking and recording the moving object and an additional fixed camera, which is not shown in the figure, for recording the wide-angle view.

The four cameras can be flexibly arranged for a specific application. For this research we aligned the four cameras horizontally, 1 meter apart from each other. This arrangement is intended to capture the face over the widest possible range of viewing angles in unconstrained environments. Each camera has a zooming ratio of 25 and is built over an in-house pan-tilt assembly. Because of the pan-tilt assembly and high power zooming, the proposed system can provide high quality, reliable face features for input to the recognition module.

The outputs from the four cameras are directed to the four-input multiplexer, and the output of the multiplexer is digitized by a frame grabber. Since most surveillance systems have a recording device with various kinds of image compression, we needed a separate frame grabber to acquire raw image data. In this system, we used Microsoft DirectShow to minimize redundant computations for real-time image processing. DirectShow can seamlessly integrate the modules to play back or capture a particular media type [20].

3. REAL-TIME FACE REGION SEGMENTATION AND RECOGNITION

For accurate identification of an intruder, an optimum zooming ratio must be automatically generated by the system. This optimum zooming ratio can be obtained only by a robust tracking algorithm. Features used by most tracking algorithms include: (i) color, (ii) motion, and (iii) contour. The tracking algorithms may fail when the target object becomes extremely small in the viewing region since color, motion, and contour information are subject to being unstable [21].

In this paper, we adopt the low-pass filter based technique proposed in [24] to detect the candidate area of a moving object. After detecting the moving object, we segment the face area from the background based on the HSV color system. We can then extract the appropriate zooming ratio and features for tracking based on the fault analysis of four inputs at the same time. Figure 2 shows the flowchart of the proposed algorithm.

[Figure 2 flowchart: RGB 24-bit 4-channel image -> Multiplex 4-channel image to one image -> Motion detection by low-pass -> HSV transform and dynamic threshold -> Feature extraction for zooming, tracking -> Template matching, check (Y/N) -> 4 active cameras control (P/T/Z) -> Face recognition]

Figure 2: PTZ camera control for face segmentation and recognition
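As a rough illustration of the low-pass motion detection stage in the flowchart, the sketch below blurs two grayscale frames with a 3x3 Gaussian-like kernel, differences them, and returns the bounding box of the pixels that changed. It is a minimal sketch under stated assumptions, not the authors' implementation: the kernel size, the threshold `thresh`, the use of grayscale values in place of normalized RGB, and the names `gaussian_blur` and `candidate_box` are all hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[float]]  # grayscale frame, row-major (assumption: not normalized RGB)

# 3x3 binomial approximation of a Gaussian kernel; the weights sum to 16.
KERNEL = ((1, 2, 1), (2, 4, 2), (1, 2, 1))


def gaussian_blur(frame: Frame) -> Frame:
    """Low-pass filter a frame with the 3x3 kernel, clamping at the borders."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += KERNEL[dy + 1][dx + 1] * frame[yy][xx]
            out[y][x] = acc / 16.0
    return out


def candidate_box(
    frame_n: Frame, frame_m: Frame, thresh: float
) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (x0, y0, x1, y1) of pixels whose filtered difference exceeds thresh.

    Returns None when no pixel changes by more than thresh (hypothetical parameter).
    """
    a, b = gaussian_blur(frame_n), gaussian_blur(frame_m)
    xs, ys = [], []
    for y, (row_a, row_b) in enumerate(zip(a, b)):
        for x, (pa, pb) in enumerate(zip(row_a, row_b)):
            if abs(pa - pb) > thresh:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)
```

Filtering before differencing suppresses isolated pixel noise, so the box tends to follow the moving object rather than sensor flicker; a lower threshold widens the box into the blurred fringe around the object.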
3.1. Adaptive motion detection

In a tracking algorithm, automatic ROI detection is very important to meet the perceptual requirements. This processing, in general, consumes large amounts of system resources because of its computational complexity. Color correlation, blob detection, region growing, prediction, and contour modeling [16] are some of the techniques used for automatic ROI detection.

We were able to detect a reasonably accurate candidate region using a Gaussian low-pass filter. The candidate area of a moving object is obtained as

$$ \hat{I}_{nm} = I_{ng} - I_{mg}, \tag{1} $$

where $I_{ng}$ and $I_{mg}$ respectively represent the Gaussian filtered $n$-th and $m$-th image frames, which are converted to the normalized RGB color coordinate system.

Figure 3 shows the result of candidate moving area detection: the top left image represents $I_{5}$, top right image $I_{23}$, bottom left $I_{5g} - I_{8g}$, and bottom right $I_{23g} - I_{25g}$.

Figure 3: Candidate moving area detection

This method can be successfully applied even when the target face disappears during initialization or tracking.

3.2. Skin color segmentation from background

Color information for a moving object is one of the most important features. However, color changes due to illumination changes and reflected light. In this experiment, we applied the HSV color model since it is less sensitive to illumination changes than other color models. In the proposed surveillance system, the skin color of moving objects changes according to the distance between the object and camera even if light conditions are fixed.

Figure 4 presents experimental results of skin color changes according to the distance between the cameras and the moving object. In this figure the horizontal axis represents the hue value of the face and the vertical axis represents the number of pixels having the same hue value. As shown in this figure, the distribution of hue values for the same face changes according to the distance from the camera.

[Figure 4 plot: horizontal axis, hue value (99 to 179); left vertical axis, number of pixels for skin color (1M, 2M), 0 to 180; right vertical axis, number of pixels for skin color (3M), 0 to 50]

Figure 4: Skin color histograms of the same face at three different distances

We can extract maximum, low-threshold, and high-threshold values of a face using the previously defined candidate area. These three variables can efficiently segment the face region from the background. Figure 5 presents the hue distribution of a face within the ROI with the three values marked.

[Figure 5 plot: horizontal axis, hue value (99 to 179); vertical axis, number of pixels (0 to 180); markers for $f(x)_{Max}$, $f(x_i)_{Low\text{-}th}$, and $f(x_i)_{Hi\text{-}th}$]

Figure 5: Hue distribution within the ROI

Using the three values, $f(x)_{Max}$, $f(x_i)_{Low\text{-}th}$, and $f(x_i)_{Hi\text{-}th}$, we can segment the face region within the ROI from the background. The hue index of $f(x)_{Max}$ can be iteratively calculated and the other variables, $f(x_i)_{Low\text{-}th}$ and $f(x_i)_{Hi\text{-}th}$, can be formulated as

$$ f(x_i)_{Low\text{-}th}: \; f'(x_i)\, f'(x_{i-1}) \le 0, \text{ and } \big( f(x_i) < f(x)_{Max} \big), \tag{2} $$

$$ f(x_i)_{Hi\text{-}th}: \; f'(x_i)\, f'(x_{i+1}) \le 0, \text{ and } \big( f(x_i) > f(x)_{Max} \big), \tag{3} $$

where $f'$ represents the first derivative of $f$.

Figure 6 shows, respectively, the original input image, the corresponding HSV image, and the face region segmented from the background.

Figure 6: Skin color segmentation result
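The dynamic thresholding of Section 3.2 can be sketched as follows. This is one possible discrete reading of the low and high threshold conditions, not the authors' code: it assumes the inequalities compare positions on the hue axis (the low threshold lies below the peak hue, the high threshold above it), and it locates each threshold by walking downhill from the histogram peak until the histogram stops decreasing, which is where the first difference changes sign or becomes zero. The function names `hue_thresholds` and `segment` are hypothetical.

```python
from typing import List, Tuple


def hue_thresholds(hist: List[int]) -> Tuple[int, int, int]:
    """Return (low, peak, high) bin indices for a hue histogram.

    peak is the most populated bin. low and high are the nearest bins on
    each side of the peak where the histogram stops strictly decreasing,
    i.e. where its first difference changes sign or is zero.
    """
    peak = max(range(len(hist)), key=lambda i: hist[i])

    low = peak
    while low > 0 and hist[low - 1] < hist[low]:
        low -= 1  # still descending toward lower hue values

    high = peak
    while high < len(hist) - 1 and hist[high + 1] < hist[high]:
        high += 1  # still descending toward higher hue values

    return low, peak, high


def segment(hue_image: List[List[int]], low: int, high: int) -> List[List[bool]]:
    """Mark pixels whose hue bin lies inside [low, high] as face candidates."""
    return [[low <= h <= high for h in row] for row in hue_image]


if __name__ == "__main__":
    hist = [0, 2, 1, 3, 9, 20, 12, 4, 6, 1]  # toy histogram, one bin per hue index
    low, peak, high = hue_thresholds(hist)
    print(low, peak, high)
```

Because the thresholds are recomputed from the current histogram, they follow the shift in hue distribution that the text reports as the subject moves closer to or farther from the camera.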

3.3. Feature extraction for zooming and tracking

In this paper, we select three features for automatic zooming and face tracking. The first feature is the mean location $(x_c, y_c)$ of hue values, which are located between $f(x_i)_{Low\text{-}th}$ and $f(x_i)_{Hi\text{-}th}$, within the detected ROI

$$ x_c = \frac{\sum_{x} H(x,y)}{E_H}, \qquad y_c = \frac{\sum_{y} H(x,y)}{E_H}, \tag{4} $$

where $H(x,y)$ represents the pixel location of an effective hue value and $E_H$ the number of selected pixels having effective hue values. The second feature is the area of the detected ROI, $A_{ROI}$, and the third is the effective pixel ratio, $R_{ROI}$, within the detected ROI. The mean location $(x_c, y_c)$ indicates the direction of the moving object, the second feature $A_{ROI}$ determines the optimum zooming ratio, and the third feature $R_{ROI}$ is used for fault detection in zooming and tracking. The second and the third features can be formulated as

$$ A_{ROI} = Width_{ROI} \times Height_{ROI} \quad \text{and} \quad R_{ROI} = \frac{E_H}{A_{ROI}}. \tag{5} $$

Automatic zooming is performed using the $A_{ROI}$ feature. There are two experimentally selected limiting values for automatic zooming, Tele and Wide. If $A_{ROI}$ is greater than Wide, the zoom lens moves toward the wide-angle end to zoom out, and vice versa. Figure 7 presents experimental results of the proposed face tracking algorithm using only the pan/tilt function. The result of 4-channel automatic zooming with face tracking is shown in Figure 8.

In Figures 7, 8, and 9, segmented face regions are shown in black, and the histogram of the face region is overlaid on each image.

Figure 7: Single face tracking

Figure 8: 4-camera automatic zooming with face tracking

In (5), the effective pixel ratio indicates the error probability of zooming and tracking. If this value is smaller than a prespecified value, we must detect a new candidate area for the moving object having the latest $f(x_i)_{Low\text{-}th}$ and $f(x_i)_{Hi\text{-}th}$ values. This dynamic change of the ROI is necessary for correct tracking. This process is shown in Figure 9.

Figure 9: Dynamic change of ROI

3.4. Simple template matching for occlusion problems

In order to avoid the undesired extension of the tracked region to neighboring faces, an ellipse fitting is performed every 30 frames, using the generalized Hough transform on the edge image of the rectangular region that is searched based on color distribution. The ellipse fitting procedure followed by a region search can make the detection more robust. Figure 10 shows the ellipse fitting result on the edge image.

Figure 10: Occlusion between two faces (top), Sobel edge detection within the ROI (bottom-left), ellipse fitting (bottom-right)
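The three tracking features of Section 3.3 and the decisions derived from them can be sketched as below. This is an illustrative reading rather than the authors' implementation: the ROI is assumed to be given as a boolean mask of effective-hue pixels, "vice versa" is read as zooming in when the area falls below Tele, and the names `roi_features`, `zoom_command`, and `needs_redetection`, as well as the `min_ratio` parameter, are hypothetical.

```python
from typing import List, Tuple


def roi_features(mask: List[List[bool]]) -> Tuple[float, float, int, float]:
    """Return (x_c, y_c, A_ROI, R_ROI) for an ROI mask of effective-hue pixels.

    x_c, y_c: mean location of the effective pixels (ROI coordinates).
    A_ROI:    ROI width times ROI height.
    R_ROI:    number of effective pixels E_H divided by A_ROI.
    """
    height, width = len(mask), len(mask[0])
    points = [(x, y) for y, row in enumerate(mask) for x, on in enumerate(row) if on]
    e_h = len(points)
    if e_h == 0:
        raise ValueError("ROI contains no effective hue pixels")
    x_c = sum(x for x, _ in points) / e_h
    y_c = sum(y for _, y in points) / e_h
    area = width * height
    return x_c, y_c, area, e_h / area


def zoom_command(area: int, tele: int, wide: int) -> str:
    """Compare A_ROI against the two limits Tele < Wide."""
    if area > wide:
        return "zoom_out"  # face region too large: move lens toward wide angle
    if area < tele:
        return "zoom_in"   # face region too small: move lens toward tele
    return "hold"


def needs_redetection(ratio: float, min_ratio: float) -> bool:
    """Fault check: a low effective pixel ratio triggers a new candidate search."""
    return ratio < min_ratio
```

The mean location drives pan and tilt, the area drives zoom, and the ratio acts as a cheap sanity check that the ROI still mostly contains skin-colored pixels.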
3.5. Face recognition using FaceIt

The fitted face region obtained in subsection 3.4 is fed into a face recognition package, FaceIt, where it is matched against a database of faces and results of identification or rejection are reported. Figure 11 shows the template for FaceIt® software. In general, the steps involved in the process of recognition are: (1) creation of a gallery database of face images, (2) selection or input of a subject image to be identified, which can either be a still image or a live sequence, and (3) matching, with results given with their respective confidence rate.

Figure 11: FaceIt® software template window

3.6. Time performance

The performance of the proposed algorithm is evaluated by measuring the processing time of each step. Test image frames of 320×240 resolution were used. Table 1 summarizes the processing time of the algorithms using different PC platforms. For real-time tracking, at least 15 frames per second (FPS) is required, and the algorithm showed acceptable speed even with the slowest PC (Pentium 3, 670 MHz).

Table 1: Result of algorithm processing speed in ms

Step                 P3-0.6GHz   P3-1.2GHz   P4-1.7GHz
Motion detection*    (10.22)     (8.38)      (7.03)
HSV transform        36.96       28.55       24.86
Dynamic threshold    6.72        5.19        4.52
Feature extraction   4.70        3.63        3.16
Fault analysis       2.02        1.56        1.36
Template match*      (50.2)      (35.2)      (31.4)
Camera interface     3.36        2.60        2.26
Total Time (ms)      67.20       51.90       45.20
Speed (fps)          14.88       19.27       22.12
* Not executed every frame

In the following section, a thorough performance evaluation of the commercial face recognition package used in our experiments is conducted.

4. PERFORMANCE EVALUATION OF FACE RECOGNITION USING FACEIT®

4.1 FaceIt® identification accuracy

In this experiment, still images were used and the focus was on several facial image variations such as expression, illumination, age, pose, and face size. These factors represent major concerns for face recognition technology. According to the FERET evaluation report [11], other factors such as compression and media type do not affect the performance and are not included in this experiment.

We divided the evaluation into two main sections with an overall test and a detailed test. In the overall test, we evaluated the overall accuracy rates of FaceIt®. In the detailed test, we determined what variations affect the system's performance. For lack of databases with mixed variations, we only considered one variation at a time in the face image for the detailed test. Table 2 shows a summary and description of the tests included in this section. The overall performance of FaceIt® Identification for 1st match is about 88%. FaceIt® also works well under expression and face size variations in cases where these types of variations are not mixed. Age variation, illumination, and pose changes have proven to be a challenging problem for FaceIt®.

Table 2: Experimental results for FaceIt® Identification

Tests          Gallery    Subject              1st Match Success Rate (%)   1st 10 Match Success Rate (%)
Overall Test   700 (fa)   1,676 (fa, fb)       1,475 (88.0%)                1,577 (94.1%)
Expression     200 (ba)   200 (bj)             197 (98.5%)                  200 (100%)
Illumination   200 (ba)   200 (bk)             188 (94.0%)                  197 (98.5%)
Age            80 (fa)    104 (fa)             83 (79.8%)                   99 (95.2%)
Pose           200 (ba)   200 (bb~bh) /pose    Frontal image gives the best result
Face Size      200 (ba)   200 (ba)             No effect as long as the distance between the eyes is more than 20 pixels

We did not use all of the images provided by FERET but selected only those suitable for this experiment. The 2-letter codes (fa, fb, etc.) indicate the type of imagery. For example, fa indicates a regular frontal image. Detailed naming conventions can be seen at [24]. Figures 12 and 13 show example images of the same individuals under different conditions, such as expression, illumination, age, and pose. In the pose test, FaceIt achieved acceptably good accuracy rates for poses within ± 25° of the frontal image.
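To make the two success-rate columns of Table 2 concrete, the sketch below computes a rank-k identification rate from ranked candidate lists. It assumes that "1st Match" means the correct identity is the top-ranked gallery candidate and "1st 10 Match" means it appears among the first ten; that reading, the function name `rank_k_rate`, and the toy identifiers are assumptions, and nothing here reflects the FaceIt API.

```python
from typing import Sequence


def rank_k_rate(ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int) -> float:
    """Percentage of subjects whose true identity is within the top k candidates.

    ranked[i] is the gallery ranking returned for subject i (best match first);
    truth[i] is that subject's correct gallery identity.
    """
    if len(ranked) != len(truth) or not truth:
        raise ValueError("ranked and truth must be non-empty and of equal length")
    hits = sum(1 for candidates, t in zip(ranked, truth) if t in candidates[:k])
    return 100.0 * hits / len(truth)


if __name__ == "__main__":
    # Toy example with three probe images of the same person "a".
    ranked = [["a", "b"], ["c", "a"], ["b", "c"]]
    truth = ["a", "a", "a"]
    print(rank_k_rate(ranked, truth, 1), rank_k_rate(ranked, truth, 2))
```

With the overall test counts in Table 2, 1,475 correct first matches out of 1,676 subjects gives 88.0%, and 1,577 out of 1,676 gives 94.1%.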
Figure 12: Example images of the same individual under different conditions tested with FaceIt Identification [23]

Figure 13: Example images of same individual with different poses [23]

Table 3 and Figure 14 show a summary of the pose tests (R: right rotation, L: left rotation). The greater the pose deviation from the frontal view, the lower the accuracy FaceIt® achieved and the more manual aligning was required.

Table 3: Summary of pose test

Pose (R, L)   1st Match (%)   1st 10 Match (%)   Manual Aligning Required (%)
90°L          N/A             N/A                100.0
60°L          34.5            71.0               13.5
40°L          65.0            91.0               4.5
25°L          95.0            99.5               2.5
15°L          97.5            100.0              0.5
0             100.0           100.0              0.0
15°R          99.0            99.5               0.0
25°R          90.5            99.5               2.0
40°R          61.5            87.5               4.5
60°R          27.5            65.0               11.0
90°R          N/A             N/A                100.0

Figure 14: Summary of pose test

Table 4 shows a description of the tests that were not included in this report and the reasons they were not included. Table 5 shows the execution time and other compatibilities of FaceIt®.

Table 4: Test items not included in this experiment [11]

Not included   Description                                           Reason
Compression    Different compression ratios by JPEG                  Does not affect performance
Media          Images stored on different media (CCD or 35 film)     Does not affect performance
Image type     BMP, JPG, TIFF, etc.                                  Does not affect performance
Temporal       Time delay of a photo taken                           Covered by overall and age test
Resolution     Image resolution                                      Features should be seen clearly

Table 5: Execution time and compatibilities

Feature                      Description
Aligning (eye positioning)   In order to create a gallery database, three steps are necessary: auto aligning, create template, and create vector (2~3 sec / image).
Matching                     In order to match against the database, subjects should be aligned first (1~2 sec) and then matched (2.5~3 sec; depends on the size of the database).
Speed Up                     We can load the data into RAM to speed up the process.
Ease of Use                  Easy to add and delete images regardless of the size and image types (drag images from Windows Explorer into the FaceIt® software).

4.2 FaceIt® surveillance accuracy

In this experiment, live face images from real scenes were captured by FaceIt software using a small PC camera attached via a USB port. We used randomly captured face images and matched these against databases which were used previously in the FaceIt Identification test.

In order to see the effects of variations, we applied different database sizes (the small DB was the IRIS database which contains 34 faces while the large DB was 700 faces from FERET plus the IRIS DB) and different lighting conditions to face images. Since face variations are hard to measure, we divided variations such as pose, expressions and age into small and large variations. Figure 15 shows an example of captured faces used in the experiment. When we captured the faces, any person with significant variations such as quick head rotation or continuous or notable expression changes was considered as a large variation, while the others were considered as small variations.

Figure 15: Example images of face frames used for FaceIt® Surveillance experiment
Table 6 provides a results summary for this experiment. The time elapsed between the preparation of the IRIS database and the captured faces was approximately 3 months. The basic distance between the camera and the faces was 2~3 ft. The detailed test only used a person who seemed to be moderately well recognized in the overall test. From detailed tests 1 to 4, we can see how the database size and facial variations affect performance. From detailed tests 3 to 8, we can see how lighting can affect the performance. We can also observe how distance affects the performance from detailed tests 8 and 9. For the lighting conditions, we set 'High' as an indoor ambient illumination condition and 'Medium' as not ambient but still recognizable through human eyes. 'Front', 'Medium', 'Side', and 'Back' indicate the placement location of additional lights.

Table 6: Summary of experimental results (basic distance 2~3ft, time elapsed 3 months, O: overall, D: detail, sub: subject, ind: individuals, mat: match)

Test No.   Description                                    DB size   Light            Face/ind   1st mat (Num/Sub)   10 mat (Num/Sub)
O1         Small DB & Large Variations                    34        High & Front     758/13     55.8% (423/758)     96.6% (732/758)
D1         Small DB & Large Variations                    34        High & Front     200/1      55.0% (110/200)     99.0% (198/200)
D2         Large DB & Large Variations                    734       High & Front     200/1      47.5% (95/200)      78.5% (157/200)
D3         Small DB & Small Variations                    34        High & Front     200/1      67.0% (134/200)     99.0% (198/200)
D4         Large DB & Small Variations                    734       High & Front     200/1      60.5% (121/200)     93.0% (186/200)
D5         Small DB & Small Variations                    34        Medium           200/1      34.0% (68/200)      96.5% (193/200)
D6         Small DB & Small Variations                    34        Medium & Side    200/1      60.5% (121/200)     98.5% (197/200)
D7         Small DB & Small Variations                    34        Medium & Back    200/1      32.0% (64/200)      80.5% (161/200)
D8         Small DB & Small Variations, Dist: 9~12ft      34        Medium           200/1      0.0% (0/100)        16.0% (16/100)
D9         Small DB & Small Variations, Dist: 9~12ft      34        Medium & Front   200/1      5.0% (5/100)        78.0% (78/100)

Figure 16 shows the effects of database size and variations while Figure 17 addresses lighting and distance. A small DB, small variations, close distance, high lighting and additional frontal lighting result in best performance.

Figure 16: The effects of DB size and variations

Figure 17: The effects of lighting and distance

5. CONCLUSIONS

In this work we presented a real-time, optimum ROI detection technique, especially useful for face tracking, in an intelligent surveillance system. Since accurate identification of a human face is more important than just tracking a moving object, an efficient method to detect the face region and a resulting high-resolution acquisition are needed.

The proposed intelligent surveillance system with built-in automatic zooming and tracking algorithms can efficiently detect high-resolution face images and stably track the face. One major contribution of this work is the development of real-time, robust algorithms for automatic zooming and tracking, and an intelligent surveillance system architecture using multiple PTZ cameras with seamless interface.
We also evaluated FaceIt, one of the most popular commercial face recognition software packages, and examined how variations in faces affect its performance. From a distance with poor illumination conditions, FaceIt gave unacceptably poor accuracy. FaceIt needs at least 20 pixels between the eyes in order to detect and recognize faces. The proposed system can detect small face areas and zoom in on those regions in order to increase the performance of FaceIt with high quality images where features are clearly seen.

Although face recognition systems work well with "in-lab" databases and ideal conditions, they have exhibited many problems in real applications. [So far, no face-recognition systems, tested in airports, have spotted a single person who is wanted by authorities.] Variations exist in unconstrained environments including pose, resolution, illumination, and age differences. They make face recognition a very difficult problem. Detection of faces from a distance and in crowds is also a challenging task.

In order to increase the performance of face detection and recognition, a combination of robust face detection and recognition is necessary. An incorporated face recognition system using other imaging modalities, such as thermal imagery and 3D face modeling, which provide more features and are invariant to changes in pose, should be developed to be successfully used for surveillance.

By using four horizontally aligned cameras, we can significantly extend the viewing angle of a person-of-interest. More specifically, the maximum viewing angle with recognition accuracy 95% or higher is ± 25° for a single camera. On the other hand, the corresponding viewing angle can be extended up to ± 75° when using four cameras.

REFERENCES

[1]. L. Davis, I. Haritaoglu, and D. Harwood, “W4: real-time surveillance of people and their activities,” IEEE Trans. PAMI, vol. 22, no. 8, pp. 809-830, 2000.
[2]. R. Collins, O. Amidi, and T. Kanade, “An active camera system for acquiring multi-view video,” Proc. Int. Conf. Image Processing, pp. 517-520, 2002.
[3]. M. Turk and A. Pentland, “Eigenfaces for recognition,” Journal, Cognitive Neuroscience, vol. 3, pp. 72-86, 1991.
[4]. P. Penev and J. Atick, “Local Feature Analysis: a general statistical theory for object representation,” Network: Computation in Neural Systems, vol. 7, no. 3, pp. 447-500, 1996.
[5]. P. Belhumeur, J. Hespanha, and D. Kriegman, “Eigenfaces vs. fisherfaces: recognition using class specific linear projection,” IEEE Trans. PAMI, vol. 19, no. 7, pp. 711-720, 1997.
[6]. P. Comon, “Independent component analysis, a new concept?,” Signal Processing, vol. 36, pp. 287–314, 1994.
[7]. L. Wiskott, J. Fellous, N. Krüger, and C. Malsburg, “Face recognition by elastic bunch graph matching,” IEEE Trans. PAMI, vol. 19, pp. 775-779, 1997.
[8]. H. Rowley, S. Baluja, and T. Kanade, “Neural network-based face detection,” Proc. IEEE Conf. Computer Vision, Pattern Recognition, pp. 203-208, 1996.
[9]. E. Osuna, R. Freund, and F. Girosi, “Training support vector machines: an application to face detection,” Proc. IEEE Conf. Computer Vision, Pattern Recognition, pp. 130-136, 1997.
[10]. F. Samaria and S. Young, “HMM based architecture for face identification,” Image, Vision Computing, vol. 12, pp. 537-583, 1994.
[11]. D. Blackburn, J. Bone, and P. Phillips, “FRVT 2000 evaluation report,” Evaluation Report NIST, pp. 1-70, February 2001.
[12]. M. Bone and D. Blackburn, “Face recognition at a chokepoint: scenario evaluation results,” Evaluation Report Department of Defense, November 2002.
[13]. C. Sacchi, F. Granelli, C. Regazzoni, and F. Oberti, “A real-time algorithm for error recovery in remote video-based surveillance application,” Signal Processing: Image Communication, vol. 17, pp. 165-186, 2002.
[14]. P. Phillips, H. Moon, and S. Rizvi, “The FERET evaluation methodology for face-recognition algorithms,” IEEE Trans. PAMI, vol. 22, no. 10, pp. 1090-1104, October 2000.
[15]. X. Clady, F. Collange, F. Jurie, and P. Martinet, “Object tracking with a pan-tilt-zoom camera: application to car driving assistance,” Proc. Int. Conf. Robotics, Automation, pp. 1653-1658, 2001.
[16]. G. Bradski, “Computer vision face tracking for use in a perceptual user interface,” Intel Tech. Journal, Q2, 1998.
[17]. M. Isard and A. Blake, “Condensation-conditional density propagation for visual tracking,” Int. Journal, Computer Vision, vol. 29, no. 1, pp. 5-28, 1998.
[18]. J. Yang and A. Waibel, “A real-time face tracker,” Proceedings of WACV’96, pp. 142-147, 1996.
[19]. H. Yao and W. Gao, “Face locating and tracking method based on chroma transform in color images,” Signal Processing Proc. 2000, vol. 2, pp. 1367-1371, 2000.
[20]. F. Huang and T. Chen, “Tracking of multiple faces for human-computer interfaces and virtual environments,” Int. Conf. Multimedia, Expo, vol. 3, pp. 1563-1566, 2000.
[21]. M. Linetsky, Programming Microsoft Direct Show, Wordware Publishing Inc, 2002.
[22]. P. Fieguth and D. Terzopoulos, “Color-based tracking of heads and other mobile objects at video frame rates,” Proc. Computer Vision, Pattern Recognition, pp. 21-27, 1997.
[23]. B. Menser and M. Wien, “Automatic face detection and tracking for H.263 compatible region-of-interest coding,” Proc. SPIE, vol. 3974, 2000.
[24]. http://www.itl.nist.gov/iad/humanid/feret/feret_master.html