• Embed Doc
  • Readcast
  • Collections
  • CommentGo Back
Download
 
Copyright 2009 IEEE. Published in the Proceedings of ISSCS 2009, Iasi, Romania, 9–10 July 2009. Personal use of this material is permitted. However, permission toreprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuseany copyrighted component of this work in other works, must be obtained from the IEEE.
Head Tracking WithCombined Face And Nose Detection
Martin B¨ohme, Martin Haker, Thomas Martinetz, and Erhardt Barth
Insitute for Neuro- and BioinformaticsUniversity of L¨ubeck, GermanyEmail:
{
boehme, haker, martinetz, barth
}
@inb.uni-luebeck.de
 Abstract
We present a facial feature detector for time-of-flight(TOF) cameras that extends previous work by combining a nosedetector based on geometric features with a face detector. Thegoal is to prevent false detections outside the area of the face.To detect the nose in the image, we first compute the geometricfeatures per pixel. We then augment these geometric features withtwo additional features: The horizontal and vertical distance tothe most likely face detected by a cascade-of-boosted-ensemblesface detector. We use a very simple classifier based on an axis-aligned bounding box in feature space; pixels whose featurevalues fall within the box are classified as nose pixels, andall other pixels are classified as “non-nose”. The extent of thebounding box is learned on a labeled training set. Despite itssimiplicity, this detector already delivers satisfactory results onthe geometric features alone; adding the face detector improvesthe equal error rate (EER) from 22.2% (without face detector)to 10.4% (with face detector). (Note when comparing with ourprevious results from [1] and [2] that, in contrast to this paper,the test data used there did not contain scale variations.)
I. I
NTRODUCTION
In previous work, we have shown that the time-of-flight (TOF)camera, a novel type of image sensor that delivers a rangemap that is perfectly registered with an intensity image, canbe used to implement a detector for a prominent facial feature,the nose, by combining a set of geometric features with a verysimple bounding-box classifier [1]–[3].While this simple approach already yields satisfactory results,it has its limitations, particularly if distance variations causethe apparent size of the nose in the image to change, whichinfluences the values of the geometric features. To cope withthis, the detector has to accept a larger range of feature valuesas “nose”, and this increases the number of false detections.One way of coping with this problem is to modify thegeometric features to be scale-invariant by computing themon the reconstructed surface of the object instead of on theimage [3], but this approach is not computationally efficientenough to run at camera frame rates on current hardware.In this paper, we will examine a different approach. Toimprove the robustness of the detector to false detections, wewill augment it with a fast face detector based on a cascadeof boosted ensembles. This approach to face detection was
The ARTTS project is funded by the European Commission (contract no.IST-34107) within the Information Society Technologies (IST) priority of the6th Framework Programme. This publication reflects the views only of theauthors, and the Commission cannot be held responsible for any use whichmay be made of the information contained therein.
originally described by Viola and Jones [4] and has sincegained enormous popularity for a wide range of applications.As we have shown in [5], the Viola-Jones face detector, whichin its original formulation operates on intensity images, canbe extended to work on the combined range and intensitydata of a TOF camera. This not only increases the detectionperformance but also reduces the running time of the facedetector.An obvious way of combining the face detector with the nosedetector would be to reject any pixels that do not fall withina detected face. However, this would have the disadvantagethat if the face is not detected (a false negative), the resultof the nose detector is always rejected, even if it is correct.For this reason, we use a slightly different approach: For eachframe, we try to find a face; if no face is found, we determinethe most likely position of the face by taking the subregionof the image that generated the strongest response in the facedetector. Then, for each pixel, we compute the horizontal andvertical distance to the center of the face and use these valuesas two additional features in addition to the geometric features.The idea is that the nose is generally close to the center of theface, i.e. pixels that are far away from the center of the faceshould probably not be classified as “nose”.We then apply a very simple bounding-box classifier to thesefeatures. To train this classifier, we determine the minimumand maximum feature values that are obtained for labeled nosepixels in a set of training data. In this way, we define an axis-aligned bounding box in feature space. To detect the nose innew images, we compute the feature values for each pixel;if they lie within the bounding box, the pixel is classified as“nose”; otherwise, it is classified as ”non-nose”.Once the location of the nose has been identified, it can beused to implement a head tracker. The nose has already beenidentified by several other authors as an important feature forhead tracking [6], [7], and we have already used the previousversion of our nose detector to implement a head tracker [2].The rest of this paper is structured as follows: We willfirst define the geometric features and present an overviewof the Viola-Jones face detector. We will then describe thesimple bounding-box classifier. Finally, we will evaluate theperformance of our nose detector on a database of face imagesand quantify the improvement in detection performance thatis obtained using the face detector.
 
Fig. 1. Sample detection results on test images (top: intensity image, bottom: range map). The square indicates the detected face, the pixels marked in whiteindicate detected nose pixels. The second image shows an example of false detections (on the chin).
II. M
ETHOD
 A. Geometric Features
The geometric features that we use for the nose detectiontask, known as the
generalized eccentricities
, are related toGaussian curvature. We will briefly review their definitionhere; for more detail, see [1].We will define the generalized eccentricities on a special typeof surface known as the
Monge patch
or 2-1/2-D image. Suchsurfaces have the property that the surface points
(
x,y,z
)
canbe defined by a function
of 
x
and
y
, with
z
=
(
x,y
)
.Because the TOF camera delivers both an intensity valueper pixel, we can define two such surfaces, where
(
x,y
)
is either the range or the intensity value corresponding toa pixel position
(
x,y
)
. (For more details on the geometricalinterpretation of images for image analysis, see [8] and [9].)The generalized eccentricities
n
,
n
= 0
,
1
,
2
,...
are nowdefined as
2
n
= (
c
n
(
x,y
)
l
(
x,y
))
2
+ (
s
n
(
x,y
)
l
(
x,y
))
2
,
(1)where
c
n
and
s
n
are filter kernels corresponding to transferfunctions
n
and
n
defined in terms of polar coordinates
ρ
(spatial frequency) and
θ
(orientation):
n
=
i
n
A
(
ρ
)cos(
)
,
n
=
i
n
A
(
ρ
)sin(
)
,
(2)where
A
(
ρ
)
is a radial filter tuning function. If we set this to
A
(
ρ
) = (2
πρ
)
2
, the generalized eccentricities
0
and
2
can beused to distinguish between six basic surface types (see [9] fordetails). In practice,
A
(
ρ
)
is combined with a low-pass filterto reduce the sensitivity of the features to noise.In a TOF range map, background regions can contain arelatively high amount of noise because they return littlelight. To avoid unwanted spatial filter responses in theseregions, a threshold computed using Otsu’s method [10] wasapplied to the intensity data to segment the foreground fromthe background; the background was then set to a fixedvalue in both the range map and the intensity image. Thisprevents false detections on the background; the face detectorwill additionally prevent false detections on the rest of theforeground, such as the person’s torso.
 B. Face Detecto
Viola and Jones [4] introduced an approach to face detection(also known as a
cascade of boosted ensembles
) that iscomputationally efficient while achieving good classificationperformance. The Viola-Jones face detector is based on acascade structure (see Fig. 2); early stages in the cascaderequire little computation but can only identify subregions inthe image that are easily classified as nonfaces. These nonfacesare rejected immediately, while all other subregions (true facesand “hard” nonfaces) are passed on to the subsequent stagesin the cascade. The cascade stages become progressively moresophisticated and are able to reject more and more nonfacesuntil all that is left (ideally) are the faces. Because mostsubregions in an image are very dissimilar to a face, theaverage number of stages that are processed per subregionand hence the average computational cost are low.Each of the stages of the cascade consists of a classifiertrained using the AdaBoost algorithm on a set of features (so-called
Haar-like
features) that can be evaluated quickly and inconstant time, independent of the size of the feature, using adata structure known as an integral image.The Viola-Jones face detector originally operates on intensityimages; as we have shown in [5], the detector can be extendedto use both intensity and range features on the data from a TOFcamera, and this detector has better classification performance
 
Stage 1 Stage 2 Stage n
.....
reject reject rejectaccept accept acceptfaceinputnonfacenonface nonface
Fig. 2. Cascade structure of the Viola-Jones face detector
and lower running time than a detector operating on eithertype of data alone. In this paper, we will therefore use thisextended face detector based on intensity and range data.One way of integrating the face detector with the nose detectorbased on the geometric features would be to reject anynoses that fall outside the detected face region. However, thisapproach has two disadvantages: (i) The nose is expected tolie near the center of the face, so if a nose is detected near theborder of the face region, this is most likely a false detectionthat we would like to be able to identify as such. (ii) If no faceis detected in the current frame (a false negative), a potentiallycorrect nose detection is always suppressed.For these two reasons, we use an alternative approach. Wealways find the most likely face subregion in the image bychoosing the subregion that passed the largest number of stages in the cascade. In the case of a false negative, it isstill likely that the true face passed more cascade stages thanall nonface subregions. If more than one subregion passed thesame maximum number of stages, we use the decision valueof that stage to break ties.When the most likely face has been found, we compute twoadditional features for each pixel: The horizontal and verticaldistance of that pixel to the center of the detected face. Becausethe nose will always lie at a similar position relative to thecenter of the face, this will help the classifier to ignore falsedetections that occur far from the center of the face.
C. Bounding-Box Classifier 
The classifier is based on the features
0
and
2
(see Section II-A), evaluated on both the range map and the intensity data,as well as the horizontal and vertical distance from the centerof the detected face (see Section II-B). Because the featurespace spanned by
0
and
2
has a radial structure (see [9]),we convert these features to polar coordinates
r
and
φ
beforepassing them on to the classifier. For each pixel, we obtaina feature vector
F
= (
1
,...,F 
6
)
, which contains the valuesof 
r
and
φ
for the range and intensity data as well as thehorizontal and vertical distance from the center of the face.On a set of training images, we compute the feature vectorsobtained on the nose pixels (which were hand-labeled in theimages). For each feature, we compute the minimum andmaximum values
min
j
and
max
j
of that feature acrossall of these nose pixels. In this way, we obtain vectors
F
min
and
F
max
that define an axis-aligned bounding box in feature
00.10.20.30.40.50.60.70.80.9100.10.20.30.40.50.60.70.80.91false positive rate
   d  e   t  e  c   t   i  o  n   r  a   t  e
 with face detectorwithout face detector
Fig. 3. ROC curves of detection rate versus false positive rate for thecombined nose detector (geometric features and face detector) and for thedetector based only on geometric features (no face detector). Detection rateand false positive rate are evaluated on a per-image basis, i.e. an image iscounted as a detection if the nose is identified correctly and as a false positiveif at least one non-nose pixel is falsely classified as “nose”. Strictly speaking,therefore, the curves are not standard ROC curves, but they represent theinformation one is interested in for this application: How accurately does thedetector give the correct response per image.
space. A pixel is classified as “nose” if its feature values fallwithin this bounding box and “non-nose” otherwise.To control the tradeoff between false-positive rate and false-negative rate, we can scale the box around its center, obtainingnew bounding box limits
ˆ
F
min
=
F
centre
α
F
halfwidth
,
ˆ
F
max
=
F
centre
+
α
F
halfwidth
,
(3)where
F
centre
=
F
min
+
F
max
2
,
F
halfwidth
=
F
max
F
min
2
, and
α
is the parameter that controls the scaling of the box.We use this simple classifier because it can be evaluatedquickly and still yields relatively good results. We wouldexpect a more sophisticated classifier, such as an SVM, tobe superior; however, we are mainly interested in the effect of the face detector on classification performance, and this effectshould remain fundamentally the same independent of the typeof classifier that is used.
of 00

Leave a Comment

You must be to leave a comment.
Submit
Characters: ...
You must be to leave a comment.
Submit
Characters: ...