Principled Asymmetric Boosting Approaches in Face Detection
presented by
Minh-Tri Pham
Ph.D. Candidate and Research Associate
Nanyang Technological University, Singapore
Outline
• Motivation
• Contributions
– Automatic Selection of Asymmetric Goal
– Fast Weak Classifier Learning
– Online Asymmetric Boosting
– Generalization Bounds on the Asymmetric Error
• Future Work
• Summary
Problem

Applications
• Face recognition
• 3D face reconstruction
• Camera auto-focusing
• Windows face logon (Lenovo Veriface Technology)
Appearance-based Approach
• Scan the image with a probe window patch (x, y, s)
  – at different positions and scales
• Binary-classify each patch into
  – face, or
  – non-face
• Desired output: the states (x, y, s) that contain a face
• Key requirement: a very fast classifier
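The scan described above is a loop over positions and scales. A minimal sketch, assuming a hypothetical `classify(patch)` that returns True for a face; the parameter names and defaults are illustrative:

```python
def scan_image(image, classify, win=24, scale_step=1.25, stride=2):
    """Slide a probe window (x, y, s) over the image at several scales.

    `image` is a 2-D list of pixel values; `classify` is a hypothetical
    binary face/non-face classifier taking a square patch.
    Returns every (x, y, s) triple whose patch is classified as a face.
    """
    detections = []
    H, W = len(image), len(image[0])
    size = win
    while size <= min(H, W):
        # scale the step with the window so coverage stays proportional
        step = max(1, int(stride * size / win))
        for y in range(0, H - size + 1, step):
            for x in range(0, W - size + 1, step):
                patch = [row[x:x + size] for row in image[y:y + size]]
                if classify(patch):          # binary decision: face / non-face
                    detections.append((x, y, size))
        size = int(size * scale_step)        # next pass: a larger probe window
    return detections
```

Because every position-scale combination is tested, the number of patches per image is large, which is why the per-patch classifier must be so fast.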
A very fast classifier
• Cascade of non-face rejectors:

  [Figure: a patch passes through stages F1 → F2 → … → FN; each stage
  either passes the patch to the next stage or rejects it as non-face;
  a patch passing all N stages is declared a face.]
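The pass/reject chain can be sketched as a short loop; `rejectors` and the per-stage functions are hypothetical stand-ins for the learned F1 … FN:

```python
def cascade_classify(patch, rejectors):
    """Run a patch through a cascade F1, F2, ..., FN of non-face rejectors.

    Each rejector maps a patch to True (pass) or False (reject).  A patch
    is declared a face only if every stage passes it; most non-face
    patches are discarded by the cheap early stages, which is what makes
    the cascade fast on average.
    """
    for stage in rejectors:
        if not stage(patch):
            return False      # rejected: non-face
    return True               # passed all N stages: face
```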
  [Figure: boosting with two stages. Weak Classifier Learner 1 (Stage 1)
  trains on the examples; the examples it classifies wrongly drive Weak
  Classifier Learner 2 (Stage 2). Legend: positive and negative
  examples.]
Asymmetric Boosting
• Weight positive examples λ times more than negative examples

  [Figure: Weak Classifier Learner 1 (Stage 1) and Weak Classifier
  Learner 2 (Stage 2), with positive examples carrying λ times the
  weight of negative examples. Legend: positive and negative examples.]
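One common way to realize this weighting is to scale the initial boosting weights of the positives. A minimal AdaBoost-style sketch under that assumption; the weak-learner pool and the factor k are illustrative, not the talk's exact algorithm:

```python
import math

def asymmetric_boost(samples, labels, weak_learners, k=4.0, rounds=3):
    """AdaBoost-style loop where each positive example starts with k times
    the weight of a negative example (one way to encode asymmetry).
    `weak_learners` is a pool of candidate functions x -> +1/-1.
    Returns a list of (alpha, h) pairs forming the strong classifier.
    """
    w = [k if y > 0 else 1.0 for y in labels]
    total = sum(w)
    w = [wi / total for wi in w]
    ensemble = []
    for _ in range(rounds):
        # pick the weak learner with the lowest weighted error
        best, err = None, float("inf")
        for h in weak_learners:
            e = sum(wi for wi, x, y in zip(w, samples, labels) if h(x) != y)
            if e < err:
                best, err = h, e
        err = min(max(err, 1e-9), 1 - 1e-9)          # avoid log of 0
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, best))
        # reweight: misclassified examples gain weight, then renormalize
        w = [wi * math.exp(-alpha * y * best(x))
             for wi, x, y in zip(w, samples, labels)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def strong_classify(x, ensemble):
    return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
```

Because positives start heavier, weak learners that miss positives pay a higher weighted error, biasing the ensemble toward a low false-rejection rate.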
Non-face Rejector
• A strong combination of weak classifiers:

  F1: if f1,1(x) + f1,2(x) + … + f1,K(x) > θ then pass, else reject
Main issues
• Requires too much intervention from experts
  – e.g., how to choose the asymmetric goal λ and the threshold θ for
    each rejector?
• Slow training: up to 10 minutes to learn a single weak classifier
Detection with Multi-exit Asymmetric Boosting
• A non-face rejector as a boosted sum of weak classifiers:

  F1(x) = sign( Σ i=1..K f1,i(x) )    (pass if +1, reject if −1)
Idea
• For k from 1 until convergence:
  – Let F1(x) = sign( Σ i=1..k f1,i(x) )
  – Learn a new weak classifier f1,k(x):
      f̂1,k = argmin over f1,k of λ·FAR(F1) + FRR(F1)
  – Adjust the threshold to see if we can achieve FAR(F1) ≤ α0 and
    FRR(F1) ≤ β0:
    • Break the loop if such a threshold exists

Issues
• Weak classifiers are sub-optimal w.r.t. the training goal.
• Too many weak classifiers are required in practice.
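The node-training loop above can be sketched concretely. `far_frr` and `train_node` are our own hypothetical helpers, with the asymmetric goal set to β0/α0 as the later slides suggest:

```python
def far_frr(scores, labels, theta):
    """False-accept and false-reject rates of sign(score - theta)."""
    pos = [s for s, y in zip(scores, labels) if y > 0]
    neg = [s for s, y in zip(scores, labels) if y <= 0]
    far = sum(1 for s in neg if s > theta) / max(len(neg), 1)
    frr = sum(1 for s in pos if s <= theta) / max(len(pos), 1)
    return far, frr

def train_node(samples, labels, pool, alpha0, beta0, max_k=50):
    lam = beta0 / alpha0                   # asymmetric goal lambda
    scores = [0.0] * len(samples)
    chosen = []
    for _ in range(max_k):
        # greedily pick the weak classifier minimizing lam*FAR + FRR
        def cost(f):
            s = [si + f(x) for si, x in zip(scores, samples)]
            far, frr = far_frr(s, labels, 0.0)
            return lam * far + frr
        f = min(pool, key=cost)
        chosen.append(f)
        scores = [si + f(x) for si, x in zip(scores, samples)]
        # adjust the threshold: stop once some theta meets both targets
        for theta in sorted(set(scores)):
            far, frr = far_frr(scores, labels, theta)
            if far <= alpha0 and frr <= beta0:
                return chosen, theta
    return chosen, 0.0
```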
Existing trends (2)
Idea
• For k from 1 until convergence:
  – Let F1(x) = sign( Σ i=1..k f1,i(x) )
  – Learn a new weak classifier f1,k(x):
      f̂1,k = argmin over f1,k of λ·FAR(F1) + FRR(F1)
• Learn every weak classifier f1,k(x) using the same asymmetric goal λ.
Why?
Because…
• Consider two desired bounds (or targets) for learning a boosted
  classifier FM(x):
  – Exact bound:
      FAR(FM) ≤ α0 and FRR(FM) ≤ β0                  (1)
  – Conservative bound:
      λ·FAR(FM) + FRR(FM) ≤ β0, with λ = β0/α0       (2)
• (2) is more conservative than (1) because (2) => (1): since FAR and
  FRR are non-negative, λ·FAR(FM) + FRR(FM) ≤ β0 forces
  FAR(FM) ≤ β0/λ = α0 and FRR(FM) ≤ β0.

  [Figure: two ROC plots (FAR vs. FRR) with the exact and conservative
  bounds drawn, one for λ = 1 and one for λ = β0/α0; the operating
  points H1, H2, … move toward the bounds as weak classifiers are
  added.]

• At λ = β0/α0, for every new weak classifier learned, the ROC operating
  point moves the fastest toward the conservative bound.
Implication

  [Figure: a multi-exit classifier. Weak classifiers f1 … f8 form one
  accumulated sum; exit nodes F1, F2, F3 each threshold the running
  score, passing the patch onward or rejecting it as non-object; a
  patch passing the last exit is declared an object.]
• Features:
  – Weak classifiers are trained with the same goal λ = β0/α0.
  – Every pass/reject decision is guaranteed with FAR ≤ α0 and FRR ≤ β0.
  – The classifier is a cascade.
  – The score is propagated from one node to another.
• Main advantages:
  – Weak classifiers are learned (approximately) optimally.
  – No training of multiple boosted classifiers.
  – Far fewer weak classifiers are needed than in traditional cascades.
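Score propagation is the key structural difference from a traditional cascade: one running sum serves every exit node. A sketch with our own illustrative data structures (not the paper's implementation):

```python
def multi_exit_classify(x, weak_classifiers, exit_thresholds):
    """Evaluate a multi-exit boosted classifier.

    `exit_thresholds` maps the index of the last weak classifier of an
    exit node to that node's threshold (a hypothetical structure).  The
    running score is shared by all nodes, so each weak classifier is
    evaluated at most once even though the classifier acts as a cascade.
    """
    score = 0.0
    for k, f in enumerate(weak_classifiers, start=1):
        score += f(x)
        theta = exit_thresholds.get(k)
        if theta is not None and score <= theta:
            return -1            # rejected at this exit node: non-object
    return 1                     # passed every exit node: object
```

A traditional cascade would restart the sum at each stage; here earlier evidence keeps contributing, which is why fewer weak classifiers suffice.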
Results
• Goal (λ) vs. number of weak classifiers (K)

  [Table: Method | No. of weak classifiers | No. of exit nodes |
   Total training time]

• Better accuracy
Fast Training and Selection of Haar-like Features using Statistics
• …but learn each weak classifier from class-conditional statistics:
  – Assumption: the feature value v(t) is normally distributed given the
    class c (face or non-face)
  – Closed-form solution for the optimal threshold
  – Sample total weight: ẑc = Σ over {n : cn = c} of wn(m)

  [Figure: histogram of feature values for the Face class.]

• Total training time: 3.1 seconds per weak classifier with 300K features
  – Existing methods: up to 10 minutes with 40K features or fewer
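Under the normality assumption, both the weighted class statistics and the threshold where the two class densities cross have closed forms. A sketch with our own helper names (the slide's exact formulation is only partially recoverable); the crossing point solves a quadratic obtained by equating the two weighted Gaussian densities:

```python
import math

def weighted_stats(values, weights):
    """Weighted class-conditional statistics: total weight, mean, variance."""
    z = sum(weights)                                   # sample total weight
    mu = sum(w * v for w, v in zip(weights, values)) / z
    var = sum(w * (v - mu) ** 2 for w, v in zip(weights, values)) / z
    return z, mu, var

def optimal_threshold(z_p, mu_p, var_p, z_n, mu_n, var_n):
    """Threshold t where the weighted Gaussian class densities cross:
    z_p * N(t; mu_p, var_p) = z_n * N(t; mu_n, var_n).
    Taking logs turns this into a*t^2 + b*t + c = 0.
    """
    a = 1.0 / var_n - 1.0 / var_p
    b = 2.0 * (mu_p / var_p - mu_n / var_n)
    c = (mu_n ** 2 / var_n - mu_p ** 2 / var_p
         + 2.0 * math.log((z_p * math.sqrt(var_n)) / (z_n * math.sqrt(var_p))))
    if abs(a) < 1e-12:                   # equal variances: linear equation
        return -c / b
    disc = math.sqrt(b * b - 4.0 * a * c)
    t1 = (-b + disc) / (2.0 * a)
    t2 = (-b - disc) / (2.0 * a)
    # prefer the crossing that lies between the two class means
    lo, hi = min(mu_p, mu_n), max(mu_p, mu_n)
    return t1 if lo <= t1 <= hi else t2
```

With equal class weights and unit variances, means at 0 and 2, the crossing lands at the midpoint t = 1, as expected; avoiding a search over all sorted feature values is what makes per-feature training so cheap.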
Experimental Results
• Comparison with Fast AdaBoost (J. Wu et al. '07), the fastest known
  implementation of the Viola-Jones framework:

  [Figure: training time per weak classifier in seconds (0–12) versus
  number of features T (0–300,000), comparing Fast AdaBoost and Fast
  StatBoost.]
Experimental Results
• Performance of a cascade:
• Time:
  – Reduction of the face-detector training time from up to a month to
    3 hours
  – Significant gain in both N and T with little increase in training time
    • due to the O(N+T) cost per weak classifier
• Accuracy:
  – Even better accuracy for the face detector
    • due to the much larger set of Haar-like features explored
Weak classifier
• Cascade of non-face rejectors:
Summary