
International Conference on Convergence and Hybrid Information Technology 2008

Automatic face detection using feature tracker

Ki-sang Kim1, Gye-Young Kim2, Hyung-Il Choi1

1School of Media, Soongsil University, Seoul, Korea

2School of Computing, Soongsil University, Seoul, Korea

Abstract

This paper presents a face detection method that is robust to face rotation. The method has two main modules: candidate face region detection and tracking. To detect a face region in images taken under varying conditions, we used skin color detection, which is a rule-based algorithm. Then, to track the region, we applied Harris corner detection and a greedy feature tracker that is robust to rotated facial images. In the experimental results, we assess the performance of the face tracking algorithm under rotation.

1. Introduction

Research on face tracking has intensified due to its wide range of applications in security, the entertainment industry, gaming, psychological facial expression analysis and human-computer interaction. Recent advances in face video processing and compression have made face-to-face communication practical in real-world applications. However, higher bandwidth is still in demand because of increasingly intensive communication. And after decades, robust and realistic real-time face tracking still poses a big challenge. The difficulty lies in a number of issues, including real-time face feature tracking under a variety of imaging conditions (e.g., skin color, pose change, self-occlusion and deformation of multiple non-rigid features).

In general, skin color detection methods for face detection fall into several categories.[16] The first builds a skin classifier by explicitly defining the boundaries of the skin cluster in some color space. This method is very simple, so it is well suited to real-time processing. The second is nonparametric skin distribution modeling. The key idea is to estimate the skin color distribution from the training data without deriving an explicit model of the skin color. This method has higher accuracy, but it is too slow for a real-time system. The third method is parametric skin distribution modeling. This method requires much storage space, and its performance depends directly on the representativeness of the training image set.

Face detection algorithms are classified into a few categories. First, a rule-based face detection algorithm[8] is based on reasoning rules derived from researchers' knowledge of human faces. It makes it easy to use known knowledge. However, it is hard to turn knowledge about human faces into exact reasoning rules. Second, a feature-based face detection algorithm[9] uses facial features for face detection. Skin color, which is one of a variety of such features, is less sensitive to facial translation, rotation and scale, so this algorithm has been the most commonly used recently. Third, a template-based algorithm[11] compares a standard facial pattern with a search window. Its advantage is that it is very simple. However, this algorithm is very sensitive to facial rotation, scale, lighting variation and image noise. Last, a neural network algorithm[12] learns face and non-face regions from a variety of images and then detects faces. This algorithm works well on frontal and side faces. However, it requires too much computation and does not handle a variety of rotated faces well.[7]

This paper describes an active face feature tracking method that does not depend on imaging conditions. To detect the face, we used skin color detection. After that, we extract feature points from the facial image using the Harris corner detector[15] and track these feature points with a greedy feature tracker.

In general, our real-time face tracking system is outlined in Fig. 1, which consists of two big modules:

978-0-7695-3328-5/08 $25.00 © 2008 IEEE 211

DOI 10.1109/ICHIT.2008.203
Figure 1. Overall system configuration; n is the minimum number of feature points

One module detects the face region in the image using skin color detection. The other tracks the face with a greedy feature tracker.

The organization of the paper is as follows. In Section 2, we explain skin color detection. Face tracking with the Harris corner detector and a greedy feature tracker is described in Section 3, followed by experimental results and evaluations in Section 4. Finally, concluding remarks are given in Section 5.

2. Detect face region with skin color

This method detects faces in an image using typical facial skin color.[16] For fast face detection, we used a simple method of skin color detection. To detect skin color clusters in RGB color space, we found several rules. This method is simple and very fast.

(R, G, B) is classified as skin if:

R > 95 & R < 220 & G > 40 & B > 20 &
max{R, G, B} - min{R, G, B} > 15 &
|R - G| > 15 & R > G & R > B

Figure 2. Original image

Figure 3. Result of skin color detection without filtering

As Figure 3 shows, the result of skin color detection contains noise, which needs to be removed. For filtering, we used erosion and dilation. This removes the noise. It can also merge a face region that is divided by glasses. (See Figure 4 and Figure 5.)

Figure 4. Result of skin color detection with filtering
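The skin rule and the filtering step of this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 3x3 structuring element, the treatment of pixels outside the image as non-skin, and the order of the morphological operations (an opening followed by a closing) are all assumptions, since the paper only states that erosion and dilation were used.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Mask = List[List[bool]]


def is_skin(r: int, g: int, b: int) -> bool:
    """RGB rule from Section 2, applied to a single pixel."""
    return (
        95 < r < 220
        and g > 40
        and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g
        and r > b
    )


def skin_mask(image: Sequence[Sequence[Pixel]]) -> Mask:
    """Classify every pixel of an image given as rows of (R, G, B) tuples."""
    return [[is_skin(*px) for px in row] for row in image]


def _morph(mask: Mask, erosion: bool) -> Mask:
    """3x3 erosion or dilation (assumed structuring element).

    Pixels outside the image count as non-skin (assumption of this sketch).
    """
    h, w = len(mask), len(mask[0])

    def at(y: int, x: int) -> bool:
        return 0 <= y < h and 0 <= x < w and mask[y][x]

    combine = all if erosion else any
    return [
        [combine(at(y + j, x + i) for j in (-1, 0, 1) for i in (-1, 0, 1))
         for x in range(w)]
        for y in range(h)
    ]


def erode(mask: Mask) -> Mask:
    return _morph(mask, True)


def dilate(mask: Mask) -> Mask:
    return _morph(mask, False)


def filter_mask(mask: Mask) -> Mask:
    """Assumed order: opening removes specks, closing fills thin gaps."""
    opened = dilate(erode(mask))
    return erode(dilate(opened))
```

In this sketch the opening removes isolated skin-colored pixels, and the closing joins two skin regions separated by a one-pixel gap, which corresponds to the glasses case mentioned above.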

(a) non-filtered face region

(b) filtered face region

Figure 5. Comparison of (a) and (b)

3. Face tracking using feature tracking

Face tracking has two big modules. One extracts feature points; the other tracks them. We explain these two modules below.

3.1. Extract feature points using Harris corner detector

After extracting the face region from the image, we need facial feature points for tracking. So we used the Harris corner detector algorithm[15] to extract feature points from the face. The basic principle of the Harris corner detector is that a good feature is one that can be tracked well, so tracking should not be separated from feature extraction. A good feature is a textured patch with high intensity variation in both the x and y directions, such as a corner. Denote the intensity function by $I(x, y)$ and consider the local intensity variation matrix

$$g = \begin{bmatrix} g_x \\ g_y \end{bmatrix} = \nabla I, \qquad g g^T = \begin{bmatrix} g_x \\ g_y \end{bmatrix} \begin{bmatrix} g_x & g_y \end{bmatrix} = \begin{bmatrix} g_x^2 & g_x g_y \\ g_x g_y & g_y^2 \end{bmatrix}, \qquad Z = \iint g g^T w \, dx \tag{1}$$

where the integral is taken over the window and $w$ is a weighting function.

The symmetric 2 × 2 matrix Z of the system must be both above the image noise level and well-conditioned. The noise requirement implies that both eigenvalues of Z must be large, while the conditioning requirement means that they cannot differ by several orders of magnitude. Two small eigenvalues mean a roughly constant intensity profile within a window. A large and a small eigenvalue correspond to a unidirectional pattern. Two large eigenvalues can represent corners, salt-and-pepper textures, or any other pattern that can be tracked reliably.

In practice, when the smaller eigenvalue is sufficiently large to meet the noise criterion, the matrix Z is usually also well conditioned. This is due to the fact that the intensity variations in a window are bounded by the maximum allowable pixel value, so that the greater eigenvalue cannot be arbitrarily large. As a consequence, if the two eigenvalues of Z are $\lambda_1$ and $\lambda_2$, we accept a window if $\min(\lambda_1, \lambda_2) > T$, where $T$ is a predefined threshold. Figure 6 shows the corner region in terms of $\lambda_1$ and $\lambda_2$.

Figure 6. Classification of image points

Figure 7 shows the result of extracting feature points.
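The acceptance test min(λ1, λ2) > T can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' code: the central-difference gradient, the uniform weight w = 1, the window half-size `half` and all function names are hypothetical choices. The caller is assumed to keep the window at least one pixel away from the image border.

```python
import math
from typing import List, Tuple

Image = List[List[float]]


def gradient(img: Image, x: int, y: int) -> Tuple[float, float]:
    """Central-difference gradient (g_x, g_y) at an interior pixel."""
    gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
    gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return gx, gy


def structure_matrix(img: Image, cx: int, cy: int,
                     half: int = 2) -> Tuple[float, float, float]:
    """Entries (Zxx, Zxy, Zyy) of Z from equation (1).

    The integral becomes a sum of g g^T over a (2*half+1)^2 window,
    with uniform weight w = 1 (assumption of this sketch).
    """
    zxx = zxy = zyy = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            gx, gy = gradient(img, x, y)
            zxx += gx * gx
            zxy += gx * gy
            zyy += gy * gy
    return zxx, zxy, zyy


def eigenvalues(zxx: float, zxy: float, zyy: float) -> Tuple[float, float]:
    """Closed-form eigenvalues of [[zxx, zxy], [zxy, zyy]], larger first."""
    mean = (zxx + zyy) / 2.0
    root = math.sqrt(((zxx - zyy) / 2.0) ** 2 + zxy ** 2)
    return mean + root, mean - root


def is_good_feature(img: Image, cx: int, cy: int,
                    threshold: float, half: int = 2) -> bool:
    """Accept the window centred at (cx, cy) if min(lambda1, lambda2) > T."""
    lam1, lam2 = eigenvalues(*structure_matrix(img, cx, cy, half))
    return min(lam1, lam2) > threshold
```

On synthetic input, a window over a flat patch or a straight edge has a smaller eigenvalue of zero and is rejected, while a window centred on a corner is accepted, which matches the classification described above.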


Figure 7. Result of extracting feature points

3.2. Track feature points

After extracting feature points with the Harris corner detector, we have to track those points. To track these points continuously, we used a greedy feature tracking algorithm. This algorithm requires less computation and has higher accuracy, so it is well suited to a real-time system.

Let $I^t(x, y)$ be the 2D grayscale image of the t-th frame, in which we want to find the feature from frame $I^{t-1}(x, y)$. Consider a feature point $f^{t-1} = [f_x^{t-1} \;\; f_y^{t-1}]^T$ on frame $I^{t-1}(x, y)$. The goal is to find the feature point location $f^t = f^{t-1} + d = [f_x^{t-1} + d_x \;\; f_y^{t-1} + d_y]^T$ on frame $I^t$ such that the two feature values are similar. The vector $d = [d_x \;\; d_y]^T$ is the feature movement, also known as the optical flow. Similarity is defined in a 2D neighborhood sense. Let $\omega_x$ and $\omega_y$ be two integers that define an image neighborhood of size $(2\omega_x + 1) \times (2\omega_y + 1)$. To find the feature movement, the minimization function $\varepsilon$ of the vector $d$ is defined as follows:

$$\varepsilon(d) = \varepsilon(d_x, d_y) = \sum_{x = f_x^{t-1} - \omega_x}^{f_x^{t-1} + \omega_x} \; \sum_{y = f_y^{t-1} - \omega_y}^{f_y^{t-1} + \omega_y} \left( I^{t-1}(x, y) - I^{t}(x + d_x, y + d_y) \right)^2 \tag{2}$$

However, some tracked features are not the ones we want to find. So the minimization function needs an estimation value $e$. The estimation function is defined as follows:

$$\varepsilon(d) < e \tag{3}$$

As Figure 8 and Figure 9 show, the estimation function deleted the feature point when face is

Figure 8. Result of feature tracking without estimation

Figure 9. Result of feature tracking with estimation.

4. Experimental Result

In this section, we show the results of face tracking. The testing environment is Microsoft Windows XP on an Intel Core Duo 3.0 GHz. The compiler used was Visual C++ 6.0. The camera used for the experiments had a resolution of 640 × 580. Each frame has a color-value resolution of 24 bits, i.e., each of R, G and B has 256 levels.

Figure 11 displays the face region found by skin color detection.
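The tracking step of Section 3.2 can be illustrated with a minimal Python sketch of equations (2) and (3). The paper does not spell out the search strategy of the greedy tracker, so the exhaustive search over a small displacement range used here is an assumption, as are the parameter names `search` and `e` and their default values. The caller is assumed to keep the window plus the search range inside both images.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]


def residual(prev: Image, curr: Image, fx: int, fy: int,
             dx: int, dy: int, wx: int, wy: int) -> float:
    """Equation (2): squared differences summed over the neighborhood."""
    total = 0.0
    for y in range(fy - wy, fy + wy + 1):
        for x in range(fx - wx, fx + wx + 1):
            diff = prev[y][x] - curr[y + dy][x + dx]
            total += diff * diff
    return total


def track_feature(prev: Image, curr: Image, feature: Tuple[int, int],
                  wx: int = 1, wy: int = 1, search: int = 2,
                  e: float = 1.0) -> Optional[Tuple[int, int]]:
    """Return the new feature location, or None if the feature is deleted.

    Every displacement with |dx|, |dy| <= search is tried (assumed strategy).
    The best one is kept only if its residual satisfies equation (3).
    """
    fx, fy = feature
    best_d = (0, 0)
    best_eps = float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            eps = residual(prev, curr, fx, fy, dx, dy, wx, wy)
            if eps < best_eps:
                best_eps = eps
                best_d = (dx, dy)
    if best_eps < e:
        return fx + best_d[0], fy + best_d[1]
    return None  # residual too large: the feature point is deleted
```

A feature whose neighborhood reappears unchanged at a shifted position is returned at that position, while a feature with no good match in the search range yields `None` and would be removed from the tracked set.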
Figure 10. Original image

Figure 11. Detection of face region with skin color

After that, we need to find facial feature points. Figure 11 shows the results of extracting feature points by applying the Harris corner detection algorithm.

Figure 11. Extraction of facial feature points with Harris corner detection.

These images show the feature points found in the face region. When the face is translated, the feature points must be translated too. Figure 12 shows the face tracking result obtained by applying the greedy feature tracker.

Figure 12. Face tracking when face is translated

Figure 12 shows that most of the feature points are translated when the face is translated. But some feature points do not match exactly, and one feature point is

Figure 13 shows the tracking result when the face is rotated.

Figure 13. Face tracking when face is rotated

Figure 13 shows that most of the feature points are rotated. But many of the feature points are deleted because of

Figure 14. Average of feature points count in ideal face region. Diamond is estimation tracking and square is tracking without estimation.

As Figure 14 shows, estimation deletes useless feature points. So the greedy feature tracker is better than the existing tracker. The horizontal axis is time. The vertical axis is the ratio of the average feature point count in the ideal face region. Figure 15 shows the ratio of overlap between the detected face region and the ideal face region. The horizontal axis is time. The vertical axis is the ratio of the overlapped region. We confirmed that estimation tracking is better than the existing tracker. We used formula (4) to compare face regions.
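As an illustration, formula (4), ratio = overlapped region / (face region + ideal region), can be computed for axis-aligned rectangles with the following Python sketch. The rectangle layout (left, top, width, height) and the function names are assumptions made for the example; the paper does not state how the regions are represented.

```python
from typing import Tuple

# Hypothetical rectangle layout: (left, top, width, height) in pixels.
Rect = Tuple[int, int, int, int]


def area(r: Rect) -> int:
    return r[2] * r[3]


def overlap_area(a: Rect, b: Rect) -> int:
    """Area of the intersection of two axis-aligned rectangles."""
    width = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    height = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(0, width) * max(0, height)


def overlap_ratio(face: Rect, ideal: Rect) -> float:
    """Formula (4) as printed: overlapped / (face region + ideal region)."""
    denominator = area(face) + area(ideal)
    if denominator == 0:
        return 0.0
    return overlap_area(face, ideal) / denominator
```

With the formula taken exactly as printed, two identical rectangles give a ratio of 0.5 and disjoint rectangles give 0.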

ratio = overlapped region / (face region + ideal region) (4)

Figure 15. The ratio of overlap between the resulting face region and the ideal face region

5. Conclusion

In this paper, we presented an automatic face detection and tracking algorithm for a real-time camera input environment. To detect the face, we used skin color detection. To extract and track facial features, we used Harris corner detection and a greedy feature tracking algorithm that is robust to rotated facial images. The experimental results show that face detection and tracking work well. However, if too many feature points are deleted, the face has to be detected again, and this process takes too much time.

References

[1] X. Wei, Z. Zhu, L. Yin, and Q. Ji, "A real-time face tracking and animation system", Proceedings of the CVPR Workshop on Face Processing in Video (FPIV 2004), Washington, D.C., June 28, 2004.

[2] Jianbo Shi and Carlo Tomasi, "Good features to track", IEEE Conference on CVPR, Seattle, 1994, pp. 593-600.

[3] Carlo Tomasi and Takeo Kanade, "Detection and Tracking of Point Features", Carnegie Mellon University Technical Report CMU-CS-91-132, 1991.

[4] J. Y. Bouguet, "Pyramidal Implementation of the Lucas Kanade Feature Tracker", Intel Corporation, Microprocessor Research Labs, 2000.

[5] Vámossy, Y., Tóth, Á., Hirschberg, P., "PAL-based Localization Using Pyramidal Lucas-Kanade Feature Tracker", 2nd Serbian-Hungarian Joint Symposium on Intelligent Systems, Subotica, Serbia and Montenegro, 2004, pp. 223-

[6] Q. Zhu, S. Avidan, and K. Cheng, "Learning a sparse, corner-based representation for time-varying background modelling", Proc. 10th Intl. Conf. on Computer Vision, Beijing, China, 2005.

[7] Ming-Hsuan Yang, David Kriegman, and Narendra Ahuja, "Detecting Faces in Images: A Survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 1, pp. 34-58, Jan. 2002.

[8] C. Kotropoulos and I. Pitas, "Rule-based detection in frontal views", International Conference on Acoustics, Speech and Signal Processing, vol. 4, pp. 2537-2540, 1997.

[9] S.A. Sirohey, "Human face segmentation and identification", Technical Report CS-TR-3176, University of Maryland, 1993.

[10] H. P. Graf, E. Consatto, D. Gibbon, M. Kocheisen, and E. Petajan, "Multi-Modal system for locating heads and faces", The Second International Conference on Automatic Face and Gesture Recognition, pp. 88-93, 1996.

[11] V. Govindaraju, S.N. Srihari, and D.B. Sher, "A computational model for face location", The Third IEEE International Conference on Computer Vision, pp. 718-721, 1990.

[12] H.A. Rowley, S. Baluja, and T. Kanade, "Neural network-based face detection", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 1, pp. 22-38, Jan. 1998.

[13] K.K. Sung and T. Poggio, "Example-based learning for view-based human face detection", Technical Report A.I. Memo 1521, CBLC paper 112, MIT, Dec. 1994.

[14] Kin C. Yow and Roberto Cipolla, "Feature-Based Human Face Detection", Second International Conference on Automatic Face and Gesture Recognition, 1996.

[15] C. Harris and M.J. Stephens, "A combined corner and edge detector", Alvey Vision Conference, pp. 147-152,

[16] V. Vezhnevets and A. Andreeva, "A Comparative Assessment of Pixel-based Skin Detection Methods", Technical report, Graphics and Media Lab., Moscow State University, Moscow, Russia, 2006.