Professional Documents
Culture Documents
Detection
(To appear in the Proceedings of CVPR'97, June 17-19, 1997, Puerto Rico.)
Time (hours)
700
2.5
the one in gure 3a. This means that the number of (a) 0
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
(b) 300
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
Time (hours)
ure 3d we report the number of global iterations (the
2.5 8
1.5
points.
The memory requirements of this technique are Figure 3: (a) Training Time on a SPARC station-20
quadratic in the size of the working set B . For the vs. number of samples; (b)Number of Support Vec-
50,000 points data set we used a working set of 1,200 tors vs. number of samples; (c) Training Time on
variables, that ended up using only 25Mb of RAM. a SPARCstation-20 vs. Number of Support Vectors.
However, a working set of 2,800 variables will use ap- Notice how the number of support vectors is a bet-
proximately 128Mb of RAM. Therefore, the current ter indicator of the increase in training time than the
technique can deal with problems with less than 2,800 number of samples alone; (d) Number of global itera-
support vectors (actually we empirically found that tions performed by the algorithm.
the working set size should be about 20% larger than
the number of support vectors). In order to overcome
this limitation we are implementing an extension of
the decomposition algorithm that let us deal with very methodology for nding faces using SVM's should gen-
large numbers of support vectors (say 10,000). eralize well for other spatially well-dened pattern and
feature detection problems.
3 SVM Application: Face Detection in It is important to remark that face detection, like
Images most object detection problems, is a dicult task due
to the signicant pattern variations that are hard to
This section introduces a Support Vector Machine ap- parameterize analytically. Some common sources of
plication for detecting vertically oriented and unoc- pattern variations are facial appearance, expression,
cluded frontal views of human faces in grey level im- presence or absence of common structural features,
ages. It handles faces over a wide range of scales and like glasses or a moustache, light source distribution,
works under dierent lighting conditions, even with shadows, etc.
moderately strong shadows. The system works by scanning an image for face-
The face detection problem can be dened as fol- like patterns at many possible scales and uses a SVM
lows: given as input an arbitrary image, which could as its core classication algorithms to determine the
be a digitized video signal or a scanned photograph, appropriate class (face/non-face).
determine whether or not there are any human faces
in the image, and if there are, return an encoding of 3.1 Previous Systems
their location. The problem of face detection has been approached
Face detection as a computer vision task has many with dierent techniques in the last few years. This
applications. It has direct relevance to the face recog- techniques include Neural Networks [2, 9, 11], detec-
nition problem, because the rst important step of a tion of face features and use of geometrical constraints
fully automatic human face recognizer is usually lo- [13], density estimation of the training data [6], la-
cating faces in an unknown image. Face detection beled graphs [5] and clustering and distribution-based
also has potential application in human-computer in- modeling [10].
terfaces, surveillance systems, census systems, etc. Out of all these previous works, the results of Sung
From the standpoint of this paper, face detection and Poggio [10], and Rowley et al. [9] re
ect systems
is interesting because it is an example of a natural with very high detection rates and low false positive
and challenging problem for demonstrating and test- rates.
ing the potentials of Support Vector Machines. There Sung and Poggio use clustering and distance met-
are many other object classes and phenomena in the rics to model the distribution of the face and non-face
real world that share similar characteristics, for exam- manifold, and a Neural Network to classify a new pat-
ple, tumor anomalies in MRI scans, structural defects tern given the measurements. The key of the quality
in manufactured parts, etc. A successful and general of their result is the clustering and use of combined
Mahalanobis and Euclidean metrics to measure the dierent textured patterns they contain. This
distance from a new pattern and the clusters. Other bootstrapping step, which was successfully used
important features of their approach is the use of non- by Sung and Poggio [10] is very important in the
face clusters, and the use of a bootstrapping technique context of a face detector that learns from exam-
to collect important non-face patterns. One drawback ples because:
of this technique is that it does not provide a prin-
cipled way to choose some important free parameters Although negative examples are abundant,
like the number of clusters it uses. negative examples that are useful from a
Similarly, Rowley et al. have used problem infor- learning point of view are very dicult to
mation in the design of a retinally connected Neural characterize and dene.
Network that is trained to classify faces and non-faces The two classes, face and non-face are not
patterns. Their approach relies on training several equally complex since the non-face class is
NN emphasizing subsets of the training data, in or- broader and richer, and therefore needs more
der to obtain dierent sets of weights. Then, dierent examples in order to get an accurate de-
schemes of arbitration between them are used in order nition that separates it from the face class.
to reach a nal answer. Figure 4 shows an image used for bootstrap-
Our approach to face detection with SVM uses no ping with some misclassications, that were
prior information in order to obtain the decision sur- later used as non-face examples.
face, so that this technique could be used to detect
other kind of objects in digital images. 4. After training the SVM, we incorporate it as the
3.2 The SVM Face Detection System classier in a run-time system very similar to the
This system detects faces by exhaustively scanning an one used by Sung and Poggio [10] that performs
image for face-like patterns at many possible scales, the following operations:
by dividing the original image into overlapping sub-
images and classifying them using a SVM to determine Re-scale the input image several times.
the appropriate class (face/non-face). Multiple scales Cut 1919 window patterns out of the
are handled by examining windows taken from scaled scaled image.
versions of the original image. More specically, this
system works as follows: Preprocess the window using masking, light
correction and histogram equalization.
1. A database of face and non-face 19 19 = 361 Classify the pattern using the SVM.
pixel patterns, assigned to classes +1 and -1 re- If the pattern is a face, draw a rectangle
spectively, is used to train a SVM with a 2nd- around it in the output image.
degree polynomial as kernel function and an up-
per bound C = 200.
2. In order to compensate for certain sources of im-
age variation, some preprocessing of the data is
performed:
Masking: A binary pixel mask is used to
remove some pixels close to the boundary
of the window pattern allowing a reduction
in the dimensionality of the input space to
283. This step is important in the reduc-
tion of background patterns that introduce
unnecessary noise in the training process.
Illumination gradient correction: A
best-t brightness plane is subtracted from
the unmasked window pixel values, allowing
reduction of light and heavy shadows.
Histogram equalization: A histogram
equalization is performed over the patterns
in order to compensate for dierences in il- Figure 4: Some false detections obtained with the rst
lumination brightness, dierent cameras re- version of the system. This false positives were later
sponse curves, etc. used as non-face examples in the training process
3. Once a decision surface has been obtained
through training, the run-time system is used over 3.2.1 Experimental Results
images that do not contain faces, and misclassi-
cations are stored so they can be used as negative To test the run-time system, we used two sets of im-
examples in subsequent training phases. Images ages. The set A, contained 313 high-quality images
of landscapes, trees, buildings, rocks, etc., are with one face per image. The set B, contained 23 im-
good sources of false positives due to the many ages of mixed quality, with a total of 155 faces. Both
sets were tested using our system and the one by Sung
and Poggio [10]. In order to give true meaning to the
number of false positives obtained, it is important to
state that set A involved 4,669,960 pattern windows,
while set B 5,383,682. Table 2 shows a comparison
between the 2 systems. At run-time the SVM system
is approximately 30 times faster than the system of
Sung and Poggio. One reason for that is the use of
a technique introduced by C. Burges [3] that allows
to replace a large numbers of support vectors with a
much smaller number of points (which are not neces-
sarily data points), and therefore to speed up the run
time considerably.
In gure 5 we report the result of our system on some
test images. Notice that the system is able to handle,
up to a small degree, non-frontal views of faces. How-
ever, since the database does not contain any example
of occluded faces the system usually misses partially
covered faces, like the ones in the bottom picture of
gure 5. The system can also deal with some degree of
rotation in the image plane, since the data base con-
tains a number of \virtual" faces that were obtained
by rotating some face example of up to 10 degrees.
In gure 6 we report some of the support vectors we
obtained, both for face and non-face patterns. We
represent images as points in a ctitious two dimen-
sional space and draw an arbitrary boundary between
the two classes. Notice how we have placed the sup-
port vectors at the classication boundary, accord-
ingly with their geometrical interpretation. Notice
also how the non-face support vectors are not just ran-
dom non-face patterns, but are non-face patterns that
are quite similar to faces.
Test Set A Test Set B
Detect False Detect False
Rate Alarms Rate Alarms
SVM 97.1 % 4 74.2% 20
Sung et al. 94.6 % 2 74.2% 11
Table 2: Performance of the SVM face detection sys-
tem