Open this page, allow it to access your webcam, and watch your face get recognized
by your browser using JavaScript and OpenCV, an “open source computer vision
library”. That’s pretty cool! But recognizing faces in images is not terribly new and
exciting. Wouldn’t it be great if we could tell OpenCV to recognize something of our
choice, something that is not a face? Let’s say… a banana?
That is totally possible! What we need for that is called a “cascade classifier for Haar
features” to point OpenCV at. A cascade classifier basically tells OpenCV what to look
for in images. In the example above a classifier for face features was being used.
There are a lot of cascade classifiers floating around on the internet and you can
easily find a different one and use it. But most of them are for recognizing faces, eyes,
ears and mouths, and it would be great if we could tell OpenCV to recognize an object
of our choice. We need a cascade classifier that tells OpenCV how to recognize a
banana.
Here’s the good news: we can generate our own cascade classifier for Haar features.
Over in computer vision land that’s called “training a cascade classifier”. Even better
news: it’s not really difficult. And by “not really” I mean it takes time and a certain
amount of willingness to dig through the internet to find the relevant information and
tutorials on how to do it, but you don’t need a PhD and a lab.
But now for the best of news: keep on reading! We’ll train our own cascade classifier in
the following paragraphs and you just need to follow the steps described here.
The following instructions are heavily based on Naotoshi Seo’s immensely
helpful notes on OpenCV haartraining and make use of his scripts and resources he
released under the MIT license. This is an attempt to provide a more up-to-date step-
by-step guide for OpenCV 2.x that wouldn’t be possible without his work. Thanks
Naotoshi!
You’ll also need OpenCV on your system. The preferred installation methods differ
between operating systems so I won’t go into them, but be sure to get at least OpenCV
2.4.5, since that’s the version this post is based on and OpenCV 2.x had some major
API changes. Build OpenCV with TBB enabled, since that allows us to make use of
multiple threads while training the classifier.
If you’re on OS X and use homebrew it’s as easy as this:
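At the time this post targets (OpenCV 2.4.x), the Homebrew invocation looked roughly like the following; the tap name and flag are assumptions from that era and may have changed in today's Homebrew:

```shell
# Install OpenCV with TBB support via Homebrew.
# The homebrew/science tap and the --with-tbb flag reflect the
# Homebrew of the OpenCV 2.4.x era and may differ today.
brew tap homebrew/science
brew install opencv --with-tbb
```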
Another thing we need is the OpenCV source code corresponding to our installed
version. So if your preferred installation method doesn’t provide you with access to it
go and download it. If you get a compiler error further down, be sure to try it with
OpenCV 2.4.5.
SAMPLES
In order to train our own classifier we need samples, which means we need a lot of
images that show the object we want to detect (positive samples) and even more
images without the object (negative samples).
How many images do we need? The numbers depend on a variety of factors, including
the quality of the images, the object you want to recognize, the method to generate the
samples, the CPU power you have and probably some magic.
Training a highly accurate classifier takes a lot of time and a huge number of samples.
The classifiers made for face recognition are great examples: they were created by
researchers with thousands of good images. The TechnoLabsz blog has a great
post that provides some information based on their experience:
It is unclear exactly how many of each kind of image are needed. For Urban
Challenge 2008 we used 1000 positive and 1000 negative images whereas the
previous project Grippered Bandit used 5000. The result for the Grippered Bandit
project was that their classifier was much more accurate than ours.
This post is just an introduction, and getting a large number of good samples is
harder than you might think, so we’ll settle on an amount that gives us decent results
and is not too hard to come by: I’ve had success with 40 positive samples and 600
negative samples for small experiments. So let’s use those!
POSITIVE IMAGES
Now we need to either take photos of the object we want to detect, look for them on
the internet, extract them from a video or take some Polaroid pictures and then scan
them: whatever it takes! We need 40 of them, which we can then use to generate
positive samples OpenCV can work with. It’s also important that they differ in lighting
and background.
Once we have the pictures, we need to crop them so that only our desired object
is visible. Keep an eye on the aspect ratios of the cropped images; they shouldn’t differ
too much. The best results come from positive images that look exactly like the ones you’d
want to detect the object in, except that they are cropped so only the object is visible.
Again: a lot of this depends on a variety of factors and I don’t know all of them and a
big part of it is probably black computer science magic, but since I’ve had pretty good
results by cropping them and keeping the ratio nearly the same, let’s do the same now.
To give you an idea, here are some scaled down positive images for the banana
classifier:
NEGATIVE IMAGES
Now we need the negative images, the ones that don’t show a banana. In the best
case, if we were to train a highly accurate classifier, we would have a lot of negative
images that look exactly like the positive ones, except that they don’t contain the object
we want to recognize. If you want to detect stop signs on walls, the negative images
would ideally be a lot of pictures of walls. Maybe even with other signs.
We need at least 600 of them. And yes, collecting them by hand takes a long time. I
know, I’ve been there. But again: you could take a video file and extract the frames as
images. That way you’d get 600 pictures pretty fast.
For the banana classifier I used random photos from my iPhoto library and some
photos of the background where I photographed the banana earlier, since the classifier
should be able to tell OpenCV about a banana in pretty much any picture.
Once we have the images, we put all of them in the negative_images folder of the
repository and use find to save the list of relative paths to a file:
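Assuming the negatives are .jpg files (adjust the pattern to your image types), something like this works:

```shell
# Ensure the folder exists, then write the relative path of every
# negative image into negatives.txt, one per line.
mkdir -p negative_images
find ./negative_images -iname "*.jpg" > negatives.txt
```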
CREATING SAMPLES
With our positive and negative images in place, we are ready to generate samples out
of them, which we will use for the training itself. We need positive and negative
samples. Luckily we already have the negative samples in place. To quote the
OpenCV documentation about negative samples:
"Negative samples are taken from arbitrary images. These images must not contain
detected objects. Negative samples are enumerated in a special file. It is a text file in
which each line contains an image filename (relative to the directory of the description
file) of a negative sample image."
So, let’s make sure we’re in the root directory of the repository and fire this up:
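A sketch of the invocation; the createsamples.pl helper comes from Naotoshi Seo's scripts, and the file names, sample count and distortion parameters here are illustrative:

```shell
# Generate 1500 distorted positive samples from the cropped images,
# combining each image listed in positives.txt with backgrounds from
# negatives.txt. All names and parameter values are assumptions.
perl bin/createsamples.pl positives.txt negatives.txt samples 1500 \
  "opencv_createsamples -bgcolor 0 -bgthresh 0 -maxxangle 1.1 \
  -maxyangle 1.1 -maxzangle 0.5 -maxidev 40 -w 80 -h 40"
```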
This produces one .vec file per positive image, and they need to be merged into a
single file. Naotoshi Seo’s mergevec.cpp does exactly that. In order to use it, we need
to copy it into the OpenCV source directory and compile it there with other OpenCV
files:
cp src/mergevec.cpp ~/opencv-2.4.5/apps/haartraining
cd ~/opencv-2.4.5/apps/haartraining
g++ `pkg-config --libs --cflags opencv` -I. -o mergevec mergevec.cpp \
  cvboost.cpp cvcommon.cpp cvsamples.cpp cvhaarclassifier.cpp \
  cvhaartraining.cpp -lopencv_core -lopencv_calib3d -lopencv_imgproc \
  -lopencv_highgui -lopencv_objdetect
Then we can go back to the repository, bring the executable mergevec with us and use
it:
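Assuming the generated .vec files landed in a samples directory, merging them might look like this (paths are illustrative):

```shell
# Collect the generated .vec files and merge them into samples.vec
# with Naotoshi Seo's mergevec tool.
find ./samples -name '*.vec' > samples.txt
./mergevec samples.txt samples.vec
```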
We can now use the resulting samples.vec to start the training of our classifier.
Many tools and libraries out there (including jsfeat, which is used in the example in the
first paragraph) only accept classifiers in the old format. The reason for this is most
likely that the majority of the easily available classifiers (e.g. the ones that ship with
OpenCV) still use that format. So if you want to use these libraries and tools you
should use opencv_haartraining to train your classifier.
The opencv_traincascade call then looks like this:
opencv_traincascade -data classifier -vec samples.vec -bg negatives.txt \
  -numStages 20 -minHitRate 0.999 -maxFalseAlarmRate 0.5 -numPos 1000 \
  -numNeg 600 -w 80 -h 40 -mode ALL -precalcValBufSize 1024 \
  -precalcIdxBufSize 1024
Read at least this post on the OpenCV answers board and the official
documentation about opencv_traincascade to get a better understanding of the
parameters in use here. The -numPos parameter in particular can cause some
problems.
This is going to take a lot of time. And I don’t mean the old “get a coffee and come
back” kind of a lot of time, no. Running this took a couple of days on my mid-2011
MacBook Air. It will also use a lot of memory and CPU. Do something else while it
runs and check back once it has finished. Even better: run it on an EC2 box.
You don’t have to keep the process running without interruption, though: you can
stop and restart it at any time and it will proceed from the latest training stage it
finished.
<BEGIN
Precalculation time: 11
+----+---------+---------+
| N | HR | FA |
+----+---------+---------+
| 1| 1| 1|
+----+---------+---------+
| 2| 1| 1|
+----+---------+---------+
| 3| 1| 1|
+----+---------+---------+
| 4| 1| 1|
+----+---------+---------+
| 5| 1| 1|
+----+---------+---------+
| 6| 1| 1|
+----+---------+---------+
| 7| 1| 0.711667|
+----+---------+---------+
| 8| 1| 0.54|
+----+---------+---------+
| 9| 1| 0.305|
+----+---------+---------+
END>
Each row represents a feature that is being trained and contains some output about its
hit ratio (HR) and false alarm (FA) ratio. If a training stage only selects a few features
(e.g. N = 2), it’s possible something is wrong with your training data.
At the end of each stage the classifier is saved to a file and the process can be
stopped and restarted. This is useful if you are tweaking a machine/settings to
optimize training speed.
When the process is finished we’ll find a file called classifier.xml in
the classifier directory. This is it: our classifier, which we can now use to detect
bananas with OpenCV! The sky is the limit now. Here’s a sketch of how to use it with
the node-opencv bindings:
var cv = require('opencv');

var color = [0, 255, 0];
var thickness = 2;
// The cascade path is an assumption; point it at your generated classifier.
var cascadeFile = './classifier/classifier.xml';

var inputFiles = [
  // add the paths of the images you want to scan here
];

inputFiles.forEach(function(fileName) {
  cv.readImage(fileName, function(err, im) {
    im.detectObject(cascadeFile, {neighbors: 2, scale: 2}, function(err, objects) {
      console.log(objects);
      objects.forEach(function(object) {
        // draw a rectangle around each detected object
        im.rectangle(
          [object.x, object.y],
          [object.width, object.height],
          color,
          thickness
        );
      });
      im.save(fileName.replace(/\.jpg/, 'processed.jpg'));
    });
  });
});
OPENCV’S FACEDETECT
Now let’s use our webcam to detect a banana! OpenCV ships with a lot of samples
and one of them is facedetect.cpp, which allows us to detect objects in video frames
captured by a webcam. Compiling and using it is pretty straightforward:
cd ~/opencv-2.4.5/samples/c
chmod +x build_all.sh
./build_all.sh
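Then run it with our classifier. The cascade path and scale value here are assumptions; point --cascade at wherever your classifier.xml actually lives:

```shell
# Run the compiled facedetect sample against the webcam using our
# own classifier; the path and scale are illustrative.
./facedetect --cascade="../../classifier/classifier.xml" --scale=1.5
```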
Be sure to play around with the --scale parameter, especially if the results are not
what you expected. If all goes well, you should see something like this:
IN CLOSING
OpenCV, Haar classifiers and image detection are vast topics that are nearly
impossible to cover in a blog post of this size, but I hope this post helps you to get your
feet wet and gives you an idea of what’s possible. If you want to get a more thorough
understanding start reading through the references linked below.
Be sure to dig through the samples folder of the OpenCV source and play around with
other language bindings and libraries. With your own classifier it’s even more fun than
“just” detecting faces. And if your own classifier is not finished yet, try playing around
with the banana classifier I put in the trained_classifier directory of the repository. I’m
sure there are thousands of super practical use cases for it, so go ahead and use it.
And if you want to add your classifier to the directory just send a pull request!
If there are any questions, problems or remarks: leave a comment!
http://opencvuser.blogspot.mx/2011/08/creating-haar-cascade-classifier-aka.html
http://note.sonots.com/SciSoftware/haartraining.html
Objective
Data Preparation
o Positive (Face) Images
o Negative (Background) Images
o Natural Test (Face in Background) Images
o How to Crop Images Manually Fast
Create Samples (Reference)
o 1. Create training samples from one
o 2. Create training samples from some
o 3. Create test samples
o 4. Show images
o EXTRA: random seed
Create Samples
o Create Training Samples
o Create Testing Samples
Training
o Haar Training
o Generate a XML File
Testing
o Performance Evaluation
o Fun with a USB camera
Experiments
o PIE Experiment 1
o PIE Experiment 2
o PIE Experiment 3
o PIE Experiment 4
o PIE Experiment 5
o PIE Experiment 6
o UMIST Experiment 1
o UMIST Experiment 2
o CBCL Experiment 1
o haarcascade_frontalface_alt2.xml
Discussion
Download
o How to enable OpenMP
References
Objective
The OpenCV library provides a very interesting demonstration of face detection.
Furthermore, it provides the programs (or functions) that were used to train the
classifiers for its face detection system, called HaarTraining, so that we can create
our own object classifiers using these functions.
However, I could not follow exactly how the OpenCV developers performed the
haartraining for their face detection system, because they did not provide some
information, such as which images and parameters they used for training. The
objective of this report is to provide step-by-step procedures for those who follow.
My working environment is Visual Studio + cygwin on Windows XP, or Linux. Cygwin
is required because I use several UNIX commands. I am sure that you will use cygwin
(more precisely, its UNIX commands) not only for this haartraining but also for other
tasks in the future if you work in engineering or science.
FYI: I recommend working on something else concurrently with the haartraining,
because you have to wait many days during training (it can easily take a week). I
typically experimented like this: 1. run haartraining on Friday, 2. forget about it
completely, 3. see the results the next Friday, 4. run another haartraining (loop).
Data Preparation
FYI: There are database lists at Face Recognition Homepage - Databases
and Computer Vision Test Images.
Kuranov et al. [3] mention that they used 5000 positive frontal face patterns, and that
these 5000 positive frontal face patterns were derived from 1000 original faces. I
describe how to increase the number of samples in a later chapter.
Earlier, I downloaded and used The UMIST Face Database (dead link) because
cropped face images were available there. The UMIST Face Database has video-like
image sequences from side faces to frontal faces. I thought training with such images
would generate a face detector robust to facial pose. However, the generated face
detector did not work well. Probably I dreamed too much. That was back in 2006.
I also obtained and used a cropped frontal face database based on the CMU PIE
Database. This dataset has large illumination variations, and thus it produced the
same bad result as the UMIST Face Database with its large pose variations.
# Sorry, it looks like redistribution (of modifications) of the PIE database is not
allowed. I made only a generated (distorted and diminished) .vec file available in the
Download section. The PIE database is free (send a request e-mail), but it does not
originally include the cropped faces.
The MIT CBCL Face Data is another choice. It has 2,429 frontal faces with few
illumination and pose variations, which would be good for haartraining. However, the
images are only 19 x 19 pixels, so we cannot perform experiments to determine good
sample sizes.
Probably the OpenCV developers used the FERET database. It appears the FERET
database became available for download over the internet on Jan. 31, 2008(?).
There is the CMU-MIT Frontal Face Test Set that the OpenCV developers used for
their experiments. This dataset has ground truth text with the locations of eyes, noses,
and lip centers and tips; however, it does not have the locations of faces expressed as
the rectangular regions required by the haartraining utilities by default.
I created a simple script to compute facial regions from the given ground truth
information. My computation works as follows:
The right boundary is located a margin to the right of the right eye
The left boundary is located a margin to the left of the left eye
The generated ground truth text and image dataset are available in the Download
section; you may also download only the ground truth text. By the way, I converted the
GIF images to PNG because OpenCV does not support GIF. The mogrify
(ImageMagick) command is useful for such conversions of image types:
$ mogrify -format png *.gif
Usage: ./createsamples
[-info <description_file_name>]
[-img <image_file_name>]
[-vec <vec_file_name>]
[-bg <background_file_name>]
The 1st function generates many training samples from one image by applying
distortions; it is triggered when -img, -bg, and -vec are specified:
-img <one_positive_image>
-bg <collection_file_of_negatives>
-vec <name_of_the_output_file_containing_the_generated_samples>
For example,
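A sketch; the file names and distortion parameters are illustrative:

```shell
# Generate 10 distorted samples of face.png over backgrounds listed
# in negatives.dat and store them in samples.vec; names are placeholders.
./createsamples -img face.png -num 10 -bg negatives.dat \
  -vec samples.vec -maxxangle 0.6 -maxyangle 0 -maxzangle 0.3 \
  -maxidev 100 -bgcolor 0 -bgthresh 0 -w 20 -h 20
```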
[filename]
[filename]
[filename]
...
such as
img/img1.jpg
img/img2.jpg
Let me call this file format the collection file format.
How to create a collection file
This format can easily be created with the find command, such as
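For instance (the start directory and extension are placeholders; adjust them to your data):

```shell
# List every .jpg below the current directory into negatives.dat,
# one relative path per line.
find . -name '*.jpg' > negatives.dat
```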
The 2nd function is triggered when -info and -vec are specified:
-info <description_file_of_samples>
-vec <name_of_the_output_file_containing_the_generated_samples>
For example,
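Something like this (file names, count and size are illustrative):

```shell
# Convert the already-cropped images listed in samples.dat into a
# 20x20 .vec file without applying any distortions.
./createsamples -info samples.dat -vec samples.vec -num 1000 -w 20 -h 20
```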
This generates samples without applying distortions. You may think this function as a file
format conversion function.
[filename] [# of objects] [[x y width height] [x y width height] ...]
where (x,y) is the left-upper corner of the object where the origin (0,0) is the left-upper
corner of the image such as
Let me call this format as a description file format against the collection file format
although the manual [1] does not differentiate them.
This function crops the specified regions, resizes the images and converts them into
the .vec format, but (let me say it again) it does not generate many samples from one
image (one cropped image) by applying distortions. Therefore, you may use this 2nd
function only when you already have a sufficient number of natural images and their
ground truths (in total, 5000 or 7000 would be required).
Note that the option -num is used only to restrict the number of samples to generate,
not to increase the number of samples by applying distortions in this case.
For such a situation, you can use the find command and the identify command (cygwin
should have identify (ImageMagick) command) to create a description file as
$ find <dir> -name '*.<ext>' -exec identify -format '%i 1 0 0 %w %h' \{\}
\; > <description_file>
such as
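For instance (directory and extension are placeholders; needs ImageMagick):

```shell
# Emit "<path> 1 0 0 <width> <height>" for every image, i.e. one
# object covering the whole pre-cropped image.
find faces/ -name '*.png' \
  -exec identify -format '%i 1 0 0 %w %h' \{\} \; > faces.dat
```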
If all images have the same size, it becomes simpler and faster,
$ find <dir> -name '*.<ext>' -exec echo \{\} 1 0 0 <width> <height> \; >
<description_file>
such as
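For instance, assuming all images are pre-cropped to 20x20 (the directory, extension and size are placeholders):

```shell
# Every .png below the current directory gets the line
# "<path> 1 0 0 20 20" in faces.dat.
find . -name '*.png' -exec echo \{\} 1 0 0 20 20 \; > faces.dat
```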
How to automate cropping images? If you could do that, you would not need
haartraining; you would already have an object detector.
3. Create test samples
The 3rd function creates test samples and their ground truth from a single image by
applying distortions. This function (cvsamples.cpp#cvCreateTestSamples) is
triggered when the options -img, -bg, and -info are specified.
-img <one_positive_image>
-bg <collection_file_of_negatives>
-info <generated_description_file_for_the_generated_test_images>
In this case, -w and -h are used to determine the minimal size of the positives to be
embedded in the test images.
Be careful that only the first <num> negative images in the <collection_file_of_negatives>
are used.
4. Show images
The 4th function is to show images within a vec file. This function
(cvsamples.cpp#cvShowVecSamples) is triggered when only an option, -vec, was specified
(no -info, -img, -bg). For example,
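Perhaps like this (the size must match the one used when the vec file was created):

```shell
# Display the samples stored inside samples.vec at 20x20.
./createsamples -vec samples.vec -w 20 -h 20
```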
EXTRA: random seed
The createsamples utility uses a fixed random seed, so repeated runs produce
identical distortions. To randomize the samples, seed the random number generator
in the source and recompile:
#include <time.h>
srand(time(NULL));
Create Samples
Create Training Samples
Kuranov et al. [3] mention that they used 5000 positive frontal face patterns and 3000
negatives for training, and that the 5000 positive frontal face patterns were derived
from 1000 original faces.
However, you may have noticed that none of the 4 functions of the createsamples
utility generates 5000 positive images from 1000 images in one go. We have to use
the 1st function of createsamples to generate 5 (or some) positives from 1 image,
repeat the procedure 1000 (or some) times, and finally merge the generated output
vec files. *1
I wrote a program, mergevec.cpp, to merge vec files. I also wrote a
script, createtrainsamples.pl, to repeat the procedure 1000 (or some) times. I
specified 7000 instead of 5000 as the default because the Tutorial [1] states that "the
reasonable number of positive samples is 7000." Please modify the path to
createsamples and its option parameters, which are written directly in the file.
The input format of createtrainsamples.pl is (roughly; see the script itself for the
authoritative usage):
$ perl createtrainsamples.pl <positives_collection> <negatives_collection>
<output_dir> [<totalnum = 7000>] [<createsamples_options>]
Example)
$ cd HaarTraining/bin
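An illustrative invocation (the paths, the 7000 target and the distortion options are assumptions):

```shell
# Repeat createsamples over all positives until roughly 7000 samples
# exist in the samples directory; everything here is a placeholder.
perl createtrainsamples.pl positives.dat negatives.dat samples 7000 \
  "./createsamples -bgcolor 0 -bgthresh 0 -maxxangle 1.1 \
  -maxyangle 1.1 -maxzangle 0.5 -maxidev 40 -w 20 -h 20"
```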
Kuranov et al. [3] state that a 20x20 sample size achieved the highest hit rate.
Furthermore, they state: "For 18x18 four split nodes performed best, while for 20x20
two nodes were slightly better." Thus, -w 20 -h 20 would be good.
Example)
$ cd HaarTraining/bin
$ find tests/ -name 'info.dat' -exec cat \{\} \; > tests.dat # merge info files
Training
Haar Training
Now, we train our own classifier using the haartraining utility. Here is the usage of the
haartraining.
Usage: ./haartraining
-data <dir_name>
-vec <vec_file_name>
-bg <background_file_name>
[-eqw]
Kuranov et al. [3] state that a 20x20 sample size achieved the highest hit rate.
Furthermore, they state: "For 18x18 four split nodes performed best, while for 20x20
two nodes were slightly better. The difference between weak tree classifiers with 2, 3
or 4 split nodes is smaller than their superiority with respect to stumps."
There is also a description that "20 stages were trained. Assuming that my test
set is representative for the learning task, I can expect a false alarm rate
about 0.5^20 ≈ 9.6e-07 and a hit rate about 0.999^20 ≈ 0.98."
Therefore, a 20x20 sample size with nsplit = 2, nstages = 20, minhitrate = 0.999
(default: 0.995), maxfalsealarm = 0.5 (default: 0.5), and weighttrimming = 0.95
(default: 0.95) would be good, such as
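Putting those parameters together, an invocation might look like this (the paths and the positive/negative counts are assumptions):

```shell
# Train a 20-stage cascade on 20x20 samples with 2-split weak trees;
# sample counts and file names are illustrative.
./haartraining -data haarcascade -vec samples.vec -bg negatives.dat \
  -nstages 20 -nsplits 2 -minhitrate 0.999 -maxfalsealarm 0.5 \
  -npos 7000 -nneg 3000 -w 20 -h 20 -mem 512 -mode ALL
```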
The "-nonsym" option is used when the object class does not have vertical (left-right)
symmetry. If object class has vertical symmetry such as frontal faces, "-sym (default)"
should be used. It will speed up processing because it will use only the half (the centered
and either of the left-sided or the right-sided) haar-like features.
The "-mode ALL" uses Extended Sets of Haar-like Features [2]. Default is BASIC and it
uses only upright features, while ALL uses the full set of upright and 45 degree rotated
feature set [1].
The "-mem 512" is the available memory in MB for precalculation [1]. The default is
200MB, so increase it if more memory is available. We should not specify all system
RAM because this number is only for precalculation, not for everything. The maximum
sensible number would be 2GB, because there is a 4GB limit on 32-bit CPUs (2^32 ≒
4GB), and it becomes 2GB on Windows (the kernel reserves 1GB and Windows does
something more).
There are other options that [1] does not list such as
#You can use OpenMP (multi-processing) with compilers such as Intel C++ compiler and
MS Visual Studio 2005 Professional Edition or better. See How to enable OpenMP section.
#One training took three days.
If you want to convert an intermediate haartraining output directory tree into an xml
file, there is a program at OpenCV/samples/c/convert_cascade.c (that is, in your
installation directory). Compile it, then run:
$ convert_cascade --size="<sample_width>x<sample_height>"
<haartraining_output_dir> <output_file>
Example)
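Something like this (the names and the 20x20 size are illustrative):

```shell
# Convert the intermediate haarcascade training directory into a
# single xml classifier trained on 20x20 samples.
convert_cascade --size="20x20" haarcascade haarcascade.xml
```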
Usage: ./performance
-data <classifier_directory_name>
-info <collection_file_name>
[-ni]
During detection, a sliding window was moved pixel by pixel over the picture at each scale. Starting
with the original scale, the features were enlarged by 10% and 20%, respectively (i.e., representing a
rescale factor of 1.1 and 1.2, respectively) until exceeding the size of the picture in at least one
dimension. Often multiple faces are detected at nearby locations and scales at an actual face location.
Therefore, multiple nearby detection results were merged. Receiver Operating Curves (ROCs) were
constructed by varying the required number of detected faces per actual face before merging into a
single detection result. During experimentation only one parameter was changed at a time. The best
mode of a parameter found in an experiment was used for the subsequent experiments. [3]
Execute the performance utility as
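Perhaps like this (the names are illustrative; -w and -h must match the training sample size):

```shell
# Evaluate the trained classifier against the test collection
# without writing result images (-ni).
./performance -data haarcascade -w 20 -h 20 -info tests.dat -ni
```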
Be careful that you have to tell it the size of the training samples when you specify the
classifier directory, although the classifier xml file includes that information inside *2.
The -ni option suppresses the creation of result images of the detection. By default,
the performance utility creates result images of the detection and stores them in
directories named after the test image directories with a 'det-' prefix. When you want
to use this function, you have to create the destination directories beforehand
yourself. Execute the next command to create the destination directories
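One way to sketch this with standard tools, assuming tests.dat lists paths like tests/01/img01.bmp:

```shell
# Derive the directories referenced in tests.dat and create a
# det- prefixed counterpart of each, as the performance utility expects.
cat tests.dat | xargs -n1 dirname | sort -u \
  | sed 's/^/det-/' | xargs mkdir -p
```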
where tests.dat is the collection file for the testing images, which you created at the
createtestsamples.pl step. Now you can execute the performance utility without the
'-ni' option.
+================================+======+======+======+
| File Name                      | Hits |Missed| False|
+================================+======+======+======+
|tests/01/img01.bmp/0001_0153_005|     0|     1|     0|
+--------------------------------+------+------+------+
....
+--------------------------------+------+------+------+
+================================+======+======+======+
Number of stages: 15
15
874 72 0.612045 0.050420
26 0 0.018207 0.000000
8 0 0.005602 0.000000
4 0 0.002801 0.000000
1 0 0.000700 0.000000
....
'Hits' shows the number of correct detections. 'Missed' shows the number of missed
detections or false negatives (the object truly exists, but the detector failed to detect
it). 'False' shows the number of false alarms or false positives (the object does not
exist, but the detector reported that it does).
The latter table is for the ROC plot. Please see my modified version of the
haartraining document [5] for more.
I modified facedetect.c slightly because the facedetect utility did not work in the same
manner as the performance utility. I added options to change parameters on the
command line. The source code is available in the Download section (or via the direct
link facedetect.c). Now the usage is as follows:
[ filename | camera_index = 0 ]
Experiments
PIE Experiment 1
The PIE dataset has only frontal faces with big illumination variations. The dataset used in
PIE experiments looks as follows:
Number of stages: 16
16
847 67 0.593137 0.046919
15 0 0.010504 0.000000
1 0 0.000700 0.000000
Number of stages: 16
16
20 9 0.039139 0.017613
20 9 0.039139 0.017613
20 0.003914 0.000000
PIE Experiment 2
haarcascade_frontalface_pie2.sh .
o Tried -nonsym option
o The training took 3 days on Intel Xeon 3GHz with 1GB memory machine.
haarcascade_frontalface_pie2.performance_pie_tests.txt
haarcascade_frontalface_pie2.performance_cmu_tests.txt
PIE Experiment 3
haarcascade_frontalface_pie3.sh .
o Increased number of training samples from 7000 to 10000. Still -nonsym
o The training took 4 days on Intel Xeon 2GHz with 1GB memory machine.
o This was the best among PIE experiments
haarcascade_frontalface_pie3.performance_pie_tests.txt
haarcascade_frontalface_pie3.performance_cmu_tests.txt
PIE Experiment 4
haarcascade_frontalface_pie4.sh .
o Tried to change -minhitrate from 0.999 to 0.995
o The training took 5 days on Intel Xeon 3GHz with 1GB memory machine.
haarcascade_frontalface_pie4.performance_pie_tests.txt
haarcascade_frontalface_pie4.performance_cmu_tests.txt
| Total| 3| 508| 1|
PIE Experiment 5
haarcascade_frontalface_pie5.sh .
o -sym for experiment 3
o The training took 4 days on Intel Xeon 2GHz with 1GB memory machine.
haarcascade_frontalface_pie5.performance_pie_tests.txt
| Total| 737| 691| 46|
haarcascade_frontalface_pie5.performance_cmu_tests.txt
PIE Experiment 6
haarcascade_frontalface_pie5.sh .
o -maxtreesplits 4
o The training took 3 weeks
haarcascade_frontalface_pie5.performance_pie_tests.txt
haarcascade_frontalface_pie5.performance_cmu_tests.txt
| Total| 8| 503| 7|
UMIST Experiment 1
The UMIST is a multi-view face dataset.
haarcascade_profileface_umist1.performance_cmu_tests.txt
| Total| 0| 511| 5|
UMIST Experiment 2
haarcascade_profileface_umist2.sh
o -maxtreesplits 4
o training took 2 weeks
haarcascade_profileface_umist2.performance_umist_tests.txt
haarcascade_profileface_umist2.performance_cmu_tests.txt
CBCL Experiment 1
haarcascade_frontalface_cbcl1.sh
o -maxtreesplits 4
o The training took 5 weeks
haarcascade_frontalface_cbcl1.performance_cbcl_tests.txt
haarcascade_frontalface_cbcl1.performance_cmu_tests.txt
haarcascade_frontalface_alt2.xml
haarcascade_frontalface_alt2.performance_pie_tests.txt
haarcascade_frontalface_alt2.performance_umist_tests.txt
haarcascade_frontalface_alt2.performance_cbcl_tests.txt
| Total| 109| 891| 1490|
haarcascade_frontalface_alt2.performance_cmu_tests.txt
Discussion
The created detectors outperformed the opencv default xml in terms of synthesized test
samples created from training samples. This shows that the training was successfully
performed. However, the detector did not work well in general test samples. This might
mean that the detector was over-trained or over-fitted to the specific training samples. I
still don't know good parameters or training samples to generalize detectors well.
False alarm rates of all of my generated detectors were pretty low compared with the
opencv default detector. I don't know which parameters are especially different. I set false
alarm rate with 0.5 and this makes sense theoretically. I don't know.
Training illumination-varying faces in one detector gave pretty poor results. The
generated detector became sensitive to illumination rather than robust to it; it does
not detect normally lit frontal faces. This makes sense, because normal frontal faces
were not well represented in the training set. Training multi-view faces in one detector
resulted in the same thing.
We should train different detectors for each face pose or illumination state to
construct a multi-view or illumination-varied face detector, as in Fast Multi-view Face
Detection. Viola and Jones extended their work to multi-view by training 12 separate
face pose detectors. To keep it fast, they also built a pose estimator from a C4.5
decision tree re-using the haar-like features, and cascaded the pose estimator and
the face detectors (of course, this means that if pose estimation fails, face detection
also fails).
Theory behind
The advantage of haar-like features is rapidness in the detection phase, not accuracy.
We can of course construct a face detector that achieves better accuracy using, e.g.,
PCA or LDA, although it becomes slower in the detection phase. Use such features
when you do not require rapidness. PCA does not require training with AdaBoost, so
its training phase would finish quickly. I am pretty sure such face detection methods
already exist, although I did not search for them (I do not search because I am sure).
Download
The files are available at https://github.com/sonots/tutorial-haartraining
Directory Tree
o HaarTraining: haartraining
o src: source code; haartraining and my additional C++ source files are here
o src/Makefile: Makefile for Linux, please read the comments inside
o bin: binaries for Windows are ready, my perl scripts are also here. This directory
would be a working directory.
o make: Visual Studio project files
o data: the collected image datasets
o result: generated files (vec and xml etc.) and results
This is an svn repository, so you can download the files in one go if you have an svn
client (you should have one on cygwin or Linux). For example,
$ svn co https://github.com/sonots/tutorial-haartraining/blob/master/
tutorial-haartraining
Sorry, but downloading (checking out) the image datasets may take forever... I
created a zip file once, but the Google Code repository did not allow me to upload
such a big file (100MB). I recommend you check out only the HaarTraining directory
first, as
$ svn co
https://github.com/sonots/tutorial-haartraining/blob/master/HaarTraining/
HaarTraining
Here is the list of my additional utilities (I put them in the HaarTraining/src and
HaarTraining/bin directories):
convert_cascade.c
facedetect.c: my slightly modified version
Right click cvhaartraining project > Properties. You will see a picture as below.
Reference http://www.codeproject.com/KB/cpp/BeginOpenMP.aspx
Follow Configuration Properties > C/C++ > Language > Change 'OpenMP Support' to 'Yes
(/openmp)' as the above picture shows. If you can not see it, probably your environment
does not support OpenMP.
Build cvhaartraining only (Right click the project > Project Only > Rebuild only
cvhaartraining) and do the same procedure (enable OpenMP) for haartraining project. Now,
haartraining.exe should work with OpenMP.
You may use Process Explorer to verify whether it is utilizing OpenMP or not.
Run the Process Explorer > View > Show Lower Pane (Ctrl+L) > choose 'haartraining.exe'
process and see the Lower Pane. If you can see two threads not one thread, it is utilizing
OpenMP.
References
[1] HaarTraining doc This document can be obtained from
OpenCV/apps/HaarTraining/doc on your OpenCV install directory
[2] Rainer Lienhart and Jochen Maydt. An Extended Set of Haar-like Features for Rapid
Object Detection. IEEE ICIP 2002, Vol. 1, pp. 900-903, Sep.
2002. http://www.lienhart.de/ICIP2002.pdf
[3] Alexander Kuranov, Rainer Lienhart, and Vadim Pisarevsky. An Empirical Analysis of
Boosting Algorithms for Rapid Objects With an Extended Set of Haar-like Features. Intel
Technical Report MRL-TR-July02-01,
2002. http://www.lienhart.de/Publications/DAGM2003.pdf
[4] Paul Viola and Michael J. Jones. Rapid Object Detection using a Boosted Cascade of
Simple Features. IEEE CVPR,
2001. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.3.7597
[5] Modified HaarTraining doc This is my modified version of [1]
https://codeyarns.com/2014/09/01/how-to-train-opencv-cascade-classifier/
Now we can train the cascade classifier. The details of its parameters can be
seen in its documentation. I used this invocation:
$ opencv_traincascade -data obj-classifier -vec pos-samples.vec -bg neg-filepaths.txt \
  -precalcValBufSize 2048 -numPos 200 -numNeg 2000 -numStages 20 \
  -minhitrate 0.999 -maxfalsealarm 0.5 -w 50 -h 50 -nonsym -baseFormatSave
obj-classifier is a directory where we are asking the classifier files to be stored. Note
that this directory should already be created by you. pos-samples.vec is the file we
generated in the step above. neg-filepaths.txt is a file with the list of paths to the
negative sample files. 2048 is the amount of memory in MB we are requesting the
program to use; the more memory, the faster the training. 200 is the number of
positive samples in pos-samples.vec; this number is also reported by
opencv_createsamples when it finishes its execution. 2000 is the number of negative
sample image paths we have specified in neg-filepaths.txt. 20 is the number of stages
we wish the classifier to have. 50x50 is the size of the object in these images; this
should be the same as what was specified with opencv_createsamples. 0.999 and 0.5
are self-explanatory. Details on all these parameters are found in the documentation.
Related: See my tips and other posts on using OpenCV Cascade Classifier.
Tried with: OpenCV 2.4.9 and Ubuntu 14.04