
# Object Detection: The Viola-Jones Face Detector

Augusto Morgan
Institute of Computing - University of Campinas
augusto.morgan@students.ic.unicamp.br

June 9, 2014


## Overview

- Object Detection
- Viola-Jones Face Detector
  - Haar-like features and the integral image
  - Cascade of weak classifiers

## Object Detection

## How can we detect objects in an image?

We can use a classifier: given an image, is it the object we are looking for or not?
But what if the image contains a lot of other objects?
We are interested in finding where in the image the objects are.


Sliding Window

## We can apply the classifier to small portions of the image!

We slice the image into small sub-windows and apply the classifier to each one of them.
Problems?
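The sliding-window idea can be sketched as follows (a minimal Python sketch; the 24x24 window and step of 4 pixels are illustrative assumptions, not values fixed by the slide):

```python
import numpy as np

def sliding_windows(image, win=24, step=4):
    """Yield (x, y, patch) for each win-by-win sub-window of a grayscale image."""
    h, w = image.shape
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            yield x, y, image[y:y + win, x:x + win]

def detect(image, classifier, win=24, step=4):
    """Apply a binary classifier to every sub-window; return positive locations."""
    return [(x, y) for x, y, patch in sliding_windows(image, win, step)
            if classifier(patch)]
```

One obvious problem this exposes: the number of sub-windows grows quickly with image size and with the number of scales tried, so the per-window classifier must be very cheap.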


## Proposed in 2001 by Paul Viola and Michael Jones

It discards a great number of negative samples before spending much processing time on them, achieving high frame rates.
How does it achieve that?


## Haar wavelet function

The classifier used in the paper is based on Haar-like features.

The Haar wavelet function:

ψ(t) = 1 if 0 ≤ t < 1/2, −1 if 1/2 < t ≤ 1, 0 otherwise.

Figure: Haar wavelet
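The definition above translates directly into code (Python sketch):

```python
def haar_wavelet(t):
    """Haar mother wavelet: 1 on [0, 1/2), -1 on (1/2, 1], 0 elsewhere."""
    if 0 <= t < 0.5:
        return 1
    if 0.5 < t <= 1:
        return -1
    return 0
```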



Haar-like Features

Rectangles representing a score based on positive areas and negative areas.
Three kinds of features: 2, 3 and 4 rectangles.
Each feature is calculated by:

f = Σ I_White − Σ I_Black

Figure: The different types of Haar-like features

Problem: the number of Haar-like features is too large! For a 24x24-pixel window there are more than 160,000 distinct Haar-like features.

Note: this set is overcomplete.
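A two-rectangle feature from the formula above can be sketched like this (Python; the horizontal left-white/right-black layout is just one of the possible configurations):

```python
import numpy as np

def two_rect_feature(window, x, y, w, h):
    """Horizontal two-rectangle feature at (x, y): sum over the white
    (left) w-by-h rectangle minus sum over the black (right) one."""
    white = window[y:y + h, x:x + w].sum()
    black = window[y:y + h, x + w:x + 2 * w].sum()
    return white - black
```

On a uniform window the two sums cancel and the feature is 0; a vertical edge between the halves produces a large response, which is why these features respond to face structures like the eye/cheek boundary.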


## The Integral Image

A new intermediate representation of the image, similar to the Summed Area Table used in computer graphics.
Each pixel (x, y) contains the sum of the original pixels above and to the left of (x, y), inclusive:

ii(x, y) = Σ_{x′ ≤ x, y′ ≤ y} i(x′, y′)

It can be computed in one pass over the original image.
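The one-pass computation can be written with cumulative sums (a minimal NumPy sketch):

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img over all pixels above and to the left of
    (x, y), inclusive. Two cumulative sums give it in a single sweep."""
    return img.cumsum(axis=0).cumsum(axis=1)
```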


The sum of any rectangle R can be calculated using the integral image in four array references:

Sum(R) = ii(A) − ii(B) − ii(D) + ii(C)

Figure: The sum of one rectangle using the integral image

Each feature can then be calculated in a few array references.
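A sketch of the four-reference lookup, using the inclusive convention above (the corner labels A, B, C, D refer to a figure not reproduced here, so this version indexes the corners directly):

```python
def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y),
    using at most four references into the integral image ii."""
    total = ii[y + h - 1, x + w - 1]          # bottom-right corner
    if x > 0:
        total -= ii[y + h - 1, x - 1]         # strip to the left
    if y > 0:
        total -= ii[y - 1, x + w - 1]         # strip above
    if x > 0 and y > 0:
        total += ii[y - 1, x - 1]             # subtracted twice, add back
    return total
```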



## Rectangular Features are very simple and coarse.

However, they are really fast!
They can be calculated at different scales without the need to compute a Gaussian pyramid and an integral image for each level, which speeds up multiscale detection.
Every other feature strategy that needs the pyramid for multiscale processing runs slower than this approach.


## Training the Classifier

Given the features and the set of positive and negative examples, any classifier can be trained.
There is, however, a huge number of features.
A very small number of features can be combined to create an effective classifier.
How do we find these features?


A weak classifier

The weak classifier used in the paper takes as input a sub-window (x) and consists of a feature (f), a threshold (θ) and a polarity (p) indicating the direction of the following inequality:

h(x, f, θ, p) = 1 if p·f(x) < p·θ, 0 otherwise.

The weak classifier can be viewed as a single-node decision tree: a stump.
For each feature, an optimal threshold is chosen, the one that minimizes the number of misclassifications.
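The stump above, written out (Python sketch; θ and p become plain arguments):

```python
def weak_classifier(x, f, theta, p):
    """Decision stump: 1 when p * f(x) < p * theta, else 0.
    Polarity p in {+1, -1} selects the direction of the inequality."""
    return 1 if p * f(x) < p * theta else 0
```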



## AdaBoost is used to boost the performance of a simple learning algorithm.

It combines weak classification functions to create a more powerful one.
At each round the examples are re-weighted to emphasize those which were incorrectly classified by the previous weak classifier.
The final strong classifier is a weighted combination of weak classifiers followed by a threshold.



## We can see the AdaBoost procedure as a greedy feature selection process:

AdaBoost is actually selecting a small set of good features.
This way, the weak learning algorithm tries to select the single rectangle feature that best separates the positive and negative examples.



Training

## Done in multiple rounds.

All examples start with the same weight.
At each round the algorithm searches over a large set of features and thresholds, choosing the feature/threshold pair that minimizes the weighted error.
The weights are then updated to emphasize the wrongly classified examples, and the process is repeated.
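The training loop can be sketched as a toy AdaBoost over precomputed feature values (Python; the weight update follows the paper's β^(1−e) form, but the brute-force threshold search is only illustrative, real implementations sort the feature values instead):

```python
import numpy as np

def adaboost(fvals, labels, rounds):
    """Toy AdaBoost. fvals: (K, N) array, value of feature k on example i;
    labels: (N,) array of 0/1 labels. Returns a list of (k, theta, p, alpha)."""
    K, N = fvals.shape
    w = np.full(N, 1.0 / N)                      # all examples start equal
    strong = []
    for _ in range(rounds):
        w /= w.sum()                             # normalize the weights
        best = None
        for k in range(K):                       # search every feature...
            for theta in np.unique(fvals[k]):    # ...and candidate threshold
                for p in (1, -1):
                    h = (p * fvals[k] < p * theta).astype(int)
                    err = np.sum(w * (h != labels))
                    if best is None or err < best[0]:
                        best = (err, k, theta, p, h)
        err, k, theta, p, h = best
        beta = err / (1.0 - err + 1e-12)
        w = w * beta ** (h == labels)            # shrink correctly classified
        strong.append((k, theta, p, np.log(1.0 / (beta + 1e-12))))
    return strong
```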



Considerations

Huge set of possible features and related thresholds (NK, where N is the number of examples and K the number of features).
For 20,000 samples and 160,000 features (the number for the 24x24-pixel sub-window) there are 3.2 billion distinct classifiers!
Using M rounds, AdaBoost takes O(MKN).
For each sub-window, all the selected classifiers are evaluated and combined to get the final classification.
What if we could eliminate sub-windows earlier?



## The Attentional Cascade

The insight is that smaller, and therefore more efficient, boosted classifiers
can be constructed which reject many of the negative sub-windows while
detecting almost all positive instances.
This can be done by adjusting the threshold of the AdaBoost strong classifier to minimize false negatives.



## The Attentional Cascade

With a two-feature classifier, they achieved a 100% hit rate and a 50% false positive rate.
Far from acceptable but, with only a few operations, it discards around 50% of the non-face sub-windows. And this is only the first classifier.
A cascade of classifiers is built this way, with the positive output of each one activating the next, so the more complex classifiers run only on the sub-windows that are more likely to be a face.
Since the great majority of sub-windows in an image are negative, the cascade tries to eliminate as many sub-windows as possible at the earliest stage possible.
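The cascade's control flow is just early rejection (Python sketch; each stage stands in for one boosted strong classifier):

```python
def cascade_classify(window, stages):
    """Attentional cascade: a window is kept as a face candidate only if
    every stage accepts it; most non-faces exit at the cheap early stages."""
    for stage in stages:
        if not stage(window):
            return False      # rejected: no further work is spent here
    return True
```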


In the end, a post-processing step is taken to handle multiple detections of the same face, so that there are no duplicates.
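The slide does not spell out the merge procedure; one simple possibility is to group overlapping boxes and average each group (a hypothetical sketch, not the authors' exact method):

```python
def boxes_overlap(a, b):
    """True when two (x, y, size) square detections overlap."""
    ax, ay, s1 = a
    bx, by, s2 = b
    return ax < bx + s2 and bx < ax + s1 and ay < by + s2 and by < ay + s1

def merge_detections(dets):
    """Greedily group overlapping detections, then average each group."""
    groups = []
    for d in dets:
        for g in groups:
            if boxes_overlap(d, g[0]):
                g.append(d)
                break
        else:
            groups.append([d])
    return [tuple(sum(vals) / len(g) for vals in zip(*g)) for g in groups]
```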


## Haar-like Features Extended Set

Proposed by Rainer Lienhart and Jochen Maydt in 2002.
Same principle, more variability.



References

- Viola, P. and Jones, M. Rapid Object Detection using a Boosted Cascade of Simple Features. CVPR 2001.
- Viola, P. and Jones, M. Robust Real-Time Face Detection. International Journal of Computer Vision, v. 57, 2004.
- Lienhart, R. and Maydt, J. An Extended Set of Haar-like Features for Rapid Object Detection. IEEE ICIP 2002.
- Weisstein, Eric W. Haar Function. From MathWorld, A Wolfram Web Resource. http://mathworld.wolfram.com/HaarFunction.html
