Abstract—Detecting large animals on roadways using automated systems such as robots or vehicles is a vital task. This can be achieved using conventional tools such as ultrasonic sensors, or with innovative technology based on smart cameras. In this paper, we investigate a vision-based solution. We begin the paper by performing a comparative study between three detectors: 1) Haar-AdaBoost; 2) histogram of oriented gradient (HOG)-AdaBoost; and 3) local binary pattern (LBP)-AdaBoost, which were initially developed to detect humans and their faces. These detectors are implemented, evaluated, and compared to each other in terms of accuracy and processing time. Based on our evaluation and comparison results, we design a two-stage architecture which outperforms the aforementioned detectors. The proposed architecture detects candidate regions of interest using LBP-AdaBoost in the first stage, which offers robustness to false positives in real-time conditions. The second stage is based on support vector machine classifiers that were trained using HOG features. The training data are generated from our novel dataset called large animal dataset, which contains common and thermographic images of large road-animals. We emphasize that no such public dataset currently exists.

Index Terms—Animal–vehicle collisions (AVCs) detection, automated systems, obstacle detection and avoidance.

I. INTRODUCTION

WITH the rapid advances in technology, there is a concerted determination to use automatic systems to increase driver aptitude in safety tasks such as driving vehicles. Automatic collision avoidance systems are intended to assist drivers in obstacle detection and avoidance [1]. The animal–vehicle collision (AVC) avoidance system is an example of such systems, which serve to enhance the safety of roadway users and increase highway throughput.

AVCs are challenging issues for vehicles, particularly in rural regions of North America and Europe. They account for about 200 human deaths, 29 000 injuries, and $1.1 billion in property damage every year in the U.S. [2]. Similar situations are encountered in Europe, Africa, and Asia. For instance, some European countries witnessed more than 507 000 collisions, resulting in around 300 human fatalities, 30 000 human injuries, and more than $1 billion in damages every year.

Over the last decade, many AVC mitigation architectures have been proposed in [3] and [4]. These architectures can be grouped into two main categories: 1) passive methods, which use deterrence to keep large animals away from roadways, and 2) active methods, based on animal detection. Passive methods make use of deterrence strategies to warn animals; an example of this is the use of ultrasonic noise such as whistles (e.g., Hornet V120 [3]), or the generation of high-intensity lights from vehicles, which increase the distance from which they may be perceived by animals. Other earlier and inefficient techniques, such as electronic mats, animal reflectors, roadside refractors, and break-the-beam methods, have also been used to keep animals away from roads. It is observed in [4] that the most effective way to reduce the number of AVCs is to detect animals using cameras, rather than relying on deterrence strategies. This is because camera-based systems are the most efficient and accurate way to see around regions under investigation in order to reduce AVCs. However, the disadvantages of camera-based solutions lie in the fact that they focus on animals within the road, and ignore those outside the field of investigation. These systems also fail to detect animals on curved lanes. Deterrence methods (e.g., ultrasonic devices), on the other hand, require a clear line of sight to establish beam connections. Another problem is their false activation by smaller species of animals, air movement, or humans, which results in false alarms. The advantage of these systems is that they are relatively insensitive to changes in temperature. In this paper, we focus only on the active methods from a computer-vision perspective. To detect the presence of animals, vision-based detection systems use visible-range cameras, thermographic cameras, radars, or lasers. These devices are installed inside cars or along roadsides. When a large animal is detected, drivers are notified through a warning message from a car dashboard system, or through flashing roadside signs.

Manuscript received November 4, 2014; revised March 7, 2015; accepted July 31, 2015. This work was supported in part by the Canada Research Chair Programs, in part by the DIVA Strategic Research Network, and in part by the Natural Sciences and Engineering Research Council of Canada. This paper was recommended by Associate Editor E. Tunstel.
The authors are with the School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON K1N 6N5, Canada (e-mail: amammeri@uottawa.ca).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSMC.2015.2497235

In this research paper, based on [5], we explore the detection of roadway animals using a camera-based architecture. We begin this paper with a comparative study between three detectors: 1) Haar-AdaBoost; 2) histogram of oriented gradient (HOG)-AdaBoost; and 3) local binary pattern (LBP)-AdaBoost, all of which were originally developed to detect humans and their faces. We then evaluate and compare these detectors in terms of accuracy and processing time. Additionally, we compare these detectors to the
2168-2216 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
well-known HOG-SVM approach. Based on the evaluation method of extracting features improves the robustness of the
results, we propose a two-stage architecture that outperforms the aforementioned schemes: Haar-AdaBoost, HOG-AdaBoost, LBP-AdaBoost, and HOG-SVM. All of these detectors were
evaluated and tested in different daytime and nighttime condi- siders the anterior view of lions, making it vulnerable to
tions. Our two-stage architecture exhibits good performance in situations in which animal faces are not available. In [10],
daytime conditions; however, at nighttime, it is less efficient. a Haar-based method that builds visual models of animals is
The first stage, i.e., LBP-AdaBoost, is used to detect regions of interest (ROIs) that potentially contain large animals. The second stage is based on support vector machine (SVM) classifiers that were trained using HOG features; this increases the detection rate, particularly in daytime conditions, as explained in Section VIII [6].
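As a rough sketch of how the two stages compose, consider the following; the function and parameter names are ours, and the trained LBP-AdaBoost cascade and HOG-SVM models are stand-in callables rather than the paper's actual models:

```python
def two_stage_detect(frame, lbp_cascade, hog_svm):
    """Two-stage detection sketch: stage 1 proposes candidate ROIs
    (in the paper, an LBP-AdaBoost cascade); stage 2 keeps only the
    ROIs that an SVM trained on HOG features scores as animal.

    Assumed interfaces (not from the paper):
      lbp_cascade(frame) -> list of (x, y, w, h) candidate boxes
      hog_svm(frame, roi) -> decision score; > 0 means 'animal'
    """
    candidates = lbp_cascade(frame)
    return [roi for roi in candidates if hog_svm(frame, roi) > 0]
```

Injecting the two models as callables keeps the composition independent of any particular detector implementation.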
To train our classifiers and to test the aforementioned archi- invariant feature transform (SIFT) descriptor. Three classifiers
tectures, we have created a new dataset called large animal are also tested: 1) K-way logistic regression; 2) SVMs; and
dataset (LADSet). We emphasize that no such public dataset 3) K-nearest neighbors (KNN). It is reported that KNN, with
for large animals exists in the literature. This dataset is fre- K = 1, performs better than SVM and K-way logistic regres-
quently updated by the addition of new images. To perform sion. Despite the good performance of the proposed system,
our experiments, we have focused mainly on lateral views of some limitations are still found. For instance, its application
moose; the lateral position is the most common for animals to is restricted to lateral views of animals. Additionally, this sys-
take when crossing roadways. Other uncommon postures of tem is only appropriate for frames with mono-targeted animals.
animals, such as posterior view and the recumbent position, The authors also use SIFT, which is considered slow compared
are left to our future work. to the HOG method. This is because SIFT is a local descriptor
We begin this paper by reviewing the most important which only computes the gradient histogram for blocks around
research works on large animal detection in Section II, and specific interest points, while HOG is computed for an entire
presenting the animal detection scenario in Section III. Next, in image. HOG also performs better than SIFT in terms of its
Section IV, we introduce our dataset, which primarily consists false positive rate (FPR). Haar-like features are used to detect
of Internet images and video frames. In Section V, the three and identify African penguins by their chest pattern in [12],
features used to detect animals are briefly explained. The main based on the presence of black chest spots in adult penguins.
algorithms used in training are then introduced in Section VI. AdaBoost algorithm is used to train the penguins’ features.
The three detectors constructed in this paper are explained The system proposed in [12] does not work for penguins that
and compared to each other in Section VII. To address the change their feather pattern, for penguins with extraordinary
problems found with these detectors, we propose a two-stage patterns or for posterior views.
architecture in Section VIII. In Section IX, we investigate A two-stage strategy architecture is developed in [7]. It
nighttime conditions, and suggest a new system. We conclude begins by segmenting images into many regions according to
our paper with some useful remarks that outline our future their grayscale values. Contours of animals are found using a
work in Section X. contour finding function. After the generation of ROIs, they are
resized in such a way that they may contain the contour of the animals.
In the second stage, HOG has been applied to detect animals
II. RELATED WORK
Surprisingly, large animal detection in the automotive context has not received a great deal of interest from the human-machine systems community, despite the existence of some
AVC mitigation architectures. In fact, almost all of these entire image. For animal identification, a linear SVM classi-
papers present some countermeasures that are mainly used to fier is used. In general, the system proposed in [7] fails to
prevent collisions; however, they do not address the detection detect animals at longer distances. Moreover, directly apply-
algorithms of AVC systems (see [3], [4]). The only excep- ing HOG without tuning its parameters leads to unoptimized
tion comes from [7], in which a contour-based HOG-SVM results. Instead of extracting Haar features from gray or color
method is developed to detect deer. In this section, we review images as performed originally in [13], Zhang et al. [14], [15]
the detection and recognition methods for large animals in extracted Haar features from four channels to capture local pat-
general applications. Haar-like features, originally developed terns. These features, which are essentially Haar-like features,
to detect human faces, are explored in [8] to detect the faces are renamed in [15] as Haar of oriented gradients (HOOG).
of lions. Indeed, these features are extracted from a color map HOOG is used to handle the shape and variation in texture
calculated from the difference between the R and G color chan- of the animal head. The classification stage jointly captures
nels, instead of being extracted from gray images as performed shape and texture features; a second step, called deformable
in [9]. These features are trained using the AdaBoost algo- detection, is then performed. The role of this second step is
rithm. A combination of the Kanade–Lucas–Tomasi method to handle the spatial misalignment observed between the out-
is used to track animal faces. It is stated in [8] that this put of shape and texture detectors. The results shown in [14]
are very promising, since they consider both texture and edge information. However, this paper is validated only by using still images (obtained from the Internet) of animals' heads.

Aside from texture and shape, color hue can also be used to detect animals, particularly in daylight. For instance, the work in [16] starts by preprocessing the input images to reduce the amount of processed data. The color space Luv is used, and the mean-shift clustering algorithm is applied to the preprocessed images to perform color segmentation. This approach uses training images to obtain a color model for an animal. Unfortunately, this method is only applicable to daytime conditions, and does not seem adaptable to night conditions. Khorrami et al. [17] proposed a method that is capable of detecting multiple types of animals using principal components analysis (PCA). PCA is a mathematical technique used to reduce data dimensionality. After detecting animals, local entropy and connected component analysis are used to isolate the foreground containing animals from the background. At the end, large displacement optical flow is applied to ensure that areas in the frames correspond to large changes in velocity. The mentioned works indicate that gradient features can also be used alongside color or texture features. This is due to the fact that gradient features, such as HOG, are invariant to scale and illumination, and are hence well suited to nighttime applications.

III. ANIMAL DETECTION SCENARIO

In this paper, the proposed architecture can be implemented in a dashboard system or in roadside units (RSUs). Stationary cameras are installed at the roadside; when a large animal enters their field of view, the cameras detect the animal and notify upcoming vehicles through flashing signs installed on the roadside (see Fig. 1). Approaching drivers, after seeing the flashing signs, immediately reduce their vehicle's speed and make appropriate decisions to avoid a collision. If a dashboard camera (see Fig. 2) is installed inside the vehicle, flashing signs are not necessary. The dashboard camera will detect animals crossing the roadway, and will notify the driver through a warning message.

Fig. 2. Example (video) of large animal detection using our architecture proposed in this paper.

We have successfully tested our proposed architecture on many videos and images taken from moving dashboard and RSU cameras. The detection of road-faring animals by stationary or mobile cameras is a challenge due to several undesirable factors. These factors are mainly related to the surrounding environment, camera resolution, vehicle mobility, and the large intraclass variability between different types of large animals.

The detection of large animals imposes several new challenges compared to the detection of human faces or pedestrians. Human faces have relatively fixed texture features that can be accurately described by Haar features, as shown in [9]. The HOG descriptor was originally developed to detect pedestrians, since their outlines are nearly invariable even when they are walking in different directions. It is shown in [18] that HOG outperforms most object detection algorithms. With large animals, however, colors and outlines vary greatly. Compared to human faces and heads, which are almost unique and standard, animal faces and heads have greater variation in appearance. Furthermore, human faces are characterized by skin texture, while the texture of animals' faces is more varied and complex [14]. Moreover, the body of a large animal exhibits high variability within the same class (e.g., moose), and between animals of different classes. This is due to the fact that animals possess specific properties, including texture, height, shape, and different views (posterior, anterior, and lateral), that distinguish them from others. The effective use of those features to detect large animals is a challenging issue, and is largely discussed in this paper.

IV. DATASET CREATION

We created our dataset (called LADSet) from a large set of images and videos collected mainly from the Internet [6]. Collecting videos directly from nature is challenging, and to the best of our knowledge, no such public dataset exists. Approximately 20 h of various videos were selected, containing images of large animals such as moose, elk, horses, cows, and deer in residential areas, zoos, or forests. Particular interest was given to large animals crossing rural roads and highways, and to videos recorded from moving vehicles. This reflects real situations, and helps improve system performance. Moreover, videos with different weather conditions (rain, sun, and snow) are considered. The collected videos were downloaded from video websites (e.g., Youtube and Youku) and converted to image format using a ratio of 1:4 continuous video frames to avoid repeated sampling. For nighttime conditions, a small dataset was constructed; it is explained in Section IX.
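The 1:4 frame-sampling rule used above can be written as a small helper; the function name is ours, since the paper gives no code:

```python
def sample_frame_indices(total_frames, ratio=4):
    """Return the indices of the frames kept when converting a video
    to still images at a 1:`ratio` rate, i.e., one frame out of every
    `ratio` consecutive frames, to avoid near-duplicate samples."""
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    return list(range(0, total_frames, ratio))
```

For example, a 1027-frame video (the length used for the speed tests in Table I) yields 257 sampled frames at the 1:4 rate.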
Fig. 3. HTR shape category images taken from LADSet.
Fig. 4. HTL shape category images taken from LADSet.
TABLE I
DIFFERENT PARAMETERS OF HOG DESCRIPTORS USED TO SELECT A COMBINATION THAT YIELDS THE BEST PERFORMANCE RESULTS FOR DETECTING ANIMALS. THE RESULTS OF VIDEO TEST SPEEDS COME FROM THE SAME VIDEO (LENGTH: 57 S, SIZE: 320 × 240, TOTAL FRAMES: 1027)
V. FEATURE EXTRACTION
In this paper, we used three features to describe large ani-
mals: LBP, HOG, and Haar. A brief description of these
features is given in the following sections [6].
A. LBP Features
The LBP is a simple and powerful texture descriptor used
to discover and summarize local patterns in frames. Each pixel
is described by its relative gray level to its direct neighboring
pixels using the original basic version of LBP, or to indirectly
neighboring pixels using the extended LBP version. If the
intensity of the neighboring pixel pn is lower than the inten-
sity of the center pixel pc , it is set to zero; otherwise, it is set
to one. Consequently, each pixel is represented by a binary
code. For instance, for a region of 3 × 3 pixels, the LBP code of its center pixel (x_c, y_c) is expressed in its decimal form as LBP(x_c, y_c) = Σ_{i=0}^{7} S(p_i − p_c) × 2^i, where p_i is the intensity of the ith neighboring pixel, S(v) = 1 if v ≥ 0, and S(v) = 0 otherwise.

Fig. 7. Comparison between different parameters of HOG.
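The basic LBP operator above can be sketched in a few lines; the clockwise neighbor ordering below is an assumption, since the paper does not fix which neighbor maps to which bit:

```python
import numpy as np

def lbp_code(region):
    """Basic LBP code of the center pixel of a 3x3 region:
    LBP(x_c, y_c) = sum_i S(p_i - p_c) * 2^i, with S(v) = 1 iff v >= 0."""
    region = np.asarray(region)
    center = region[1, 1]
    # 8 neighbors, clockwise from the top-left corner (assumed ordering)
    coords = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for i, (r, c) in enumerate(coords):
        if region[r, c] >= center:    # S(p_i - p_c) = 1 when p_i >= p_c
            code |= 1 << i
    return code
```

Neighbors darker than the center contribute nothing, so a uniform bright center over a dark surround maps to code 0, and the reverse maps to 255.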
More distinctive block features, called multiblock LBP (MB-LBP) features, were proposed in [23]. Here, the authors applied MB-LBP and AdaBoost learning methods to face detection, which perform better than the original LBP features and Haar-like features. The basic idea of MB-LBP is that the simple binary difference rule that works on a single pixel is transferred to a block, which may have a different size. This means that the MB-LBP operator is defined by comparing the central block intensity b_c with those of its eight neighborhood blocks b_0, ..., b_7 (see Fig. 6). The size of the blocks considered in this paper varies from 1 × 1 to 5 × 5 pixels. Furthermore, we only apply LBP(8,1) as the operator of MB-LBP. This is because the larger blocks have very limited descriptive functions compared with adjacent blocks. The final binary sequence can be obtained as MB-LBP = Σ_{i=0}^{7} S(b_i − b_c) × 2^i, where b_c is the average intensity of the central block and b_i (i = 0, ..., 7) is the average intensity of the neighbor block i.

In this paper, we consider MB-LBP for animal detection, and we use the AdaBoost algorithm in order to select an optimal set of local regions and their weights (see Section VI). By doing this, a smaller MB-LBP feature set, which represents animals, may be generated compared to instances in which earlier versions of LBP are used.

B. HOG Features

HOG [18] is a gradient-based method initially proposed for the detection of pedestrians. Usually, the first step of HOG is gamma and color normalization of the input image. The horizontal and vertical gradients of each pixel are then computed using the 1-D mask [−1 0 1], as shown in [18]. The image is then divided into a set of cells with a size of 8 × 8 pixels. After that, four adjacent cells are regrouped to form a block. For each cell, a histogram of gradients with nine orientation bins is computed for later use as a block descriptor. This is performed by accumulating votes into bins for each orientation. The vote is weighted by the magnitude of the gradient at each pixel. Finally, when all histograms are computed, the descriptor vector is built into a single vector, and cell histograms are normalized. For normalization purposes, cell histograms are organized into blocks of 16 × 16 pixels. L1-norm, L1-sqrt, L2-norm, and L2-hys can be applied to normalize the gradient intensity (to make the feature vector space robust to local illumination changes). Once this normalization step has been performed, all the histograms can be concatenated into a single feature vector.

The parameters of HOG, as defined in [18], yield a good performance when used to detect pedestrians. In this paper, we vary the parameters of HOG in order to obtain the combination that yields the best performance in detecting large animals using many videos and images (see Table I). We show in Table I and Fig. 7 that HOG1 performs better than HOG2–HOG5 in terms of both detection speed and low FPRs. On the other hand, HOG3 has the highest true positive rate; unfortunately, it is extremely slow (343 ms) compared to HOG1, HOG2, HOG4, and HOG5. Its FPR is also high, thus it does not meet the requirements of our entire system. Hence, HOG1 parameters are used in our experiments.
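The cell/block pipeline described above can be sketched as follows. This is a simplified version, not the paper's implementation: no gamma correction, hard bin assignment without interpolation, unsigned gradients, and plain L2 block normalization:

```python
import numpy as np

def hog_descriptor(img, cell=8, bins=9):
    """Minimal HOG sketch: [-1 0 1] gradients, 8x8-pixel cells with
    nine orientation bins, and 2x2-cell (16x16-pixel) blocks that are
    L2-normalized and concatenated into one feature vector."""
    img = np.asarray(img, dtype=float)
    # horizontal and vertical gradients with the 1-D mask [-1 0 1]
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0      # unsigned orientation
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            sl = (slice(cy * cell, (cy + 1) * cell),
                  slice(cx * cell, (cx + 1) * cell))
            # magnitude-weighted votes into the orientation bins
            hist[cy, cx] = np.bincount(bin_idx[sl].ravel(),
                                       weights=mag[sl].ravel(),
                                       minlength=bins)
    blocks = []
    for cy in range(n_cy - 1):
        for cx in range(n_cx - 1):
            v = hist[cy:cy + 2, cx:cx + 2].ravel()    # 4 cells -> 36 values
            blocks.append(v / (np.linalg.norm(v) + 1e-6))  # L2 normalization
    return np.concatenate(blocks) if blocks else hist.ravel()
```

With N blocks, the final vector has dimension 36N, matching the 36N count used later for the HOG weak classifiers.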
Fig. 8. Example of Haar features shown in the detection window, for instance: edge features (a, d), line features (b, c, e), and center-surround features (f) [9], [13], [24].

C. Haar Features

Haar-like features are a set of 2-D Haar functions used to encode the texture of objects [13] in images. Each Haar-like feature consists of at least two adjacent "black" and "white" rectangles. The value of a Haar-like feature is then found by computing the difference between the sums of the pixel values within the black and white rectangular regions. Originally, two kinds of Haar features were introduced in [13], as shown in Fig. 8(a) and (b). These features were first extended in [9] to include a third feature, as shown in Fig. 8(c); in [24] they were extended to contain a tilted (45°) Haar-like feature, as shown in Fig. 8(d)–(f).

The set of the basic Haar features extracted from a given image is extremely large. For instance, in a sub-window of 24 × 24 pixels, more than 45 000 Haar features of the types shown in Fig. 8(a)–(c) can be computed. A convenient and fast method for computing the huge number of Haar-like features is through the integral image. In this paper, the features shown in Fig. 8(a) and (b), and a tilted (90°) version of them, are used to represent animals. After that, we trained them using the AdaBoost algorithm as performed in [9] to detect large animals.

VI. CLASSIFICATION

In this paper, AdaBoost is used to separately train the three aforementioned features: 1) Haar; 2) MB-LBP; and 3) HOG. AdaBoost is extremely simple to use and to implement, and often yields very effective results.

A. AdaBoost Algorithm

Given T weak classifiers h_t (each of them represented by one feature: HOG, MB-LBP, or Haar) learned through an iterative process, the strong classifier is then formed through a linear combination of weak classifiers

H(x) = sign( Σ_{t=1}^{T} α_t · h_t(x) )    (1)

where α_t refers to the weight of each weak classifier found in the boosting process, x represents the input sub-window, and h_t is a weak classifier generated by a single feature. The weight coefficients are computed by an iterative process as described in [9]. Assuming each feature is considered a weak classifier, the total number of each type of feature in a certain image would be extremely large. For each type of feature, the AdaBoost training function aims to select a small number of them.

B. AdaBoost Training

1) Weak Classifiers: As we know, the sets of Haar, MB-LBP, and HOG features computed over a given image are extremely large. For instance, in a sub-window of 20 × 20 pixels, we may find 45 891 Haar-like features, 3600 MB-LBP features, and 576 HOG features. Although the numbers of MB-LBP and HOG features are much lower than that of Haar-like features, the three descriptors contain a considerable amount of redundant information. For instance, the features that only appear in the background are useless. The AdaBoost algorithm is then used to select significant features and to construct a powerful binary classifier.

In this paper, each single feature (i.e., Haar, HOG, or MB-LBP) is used as a weak classifier to separate positive from negative images. For each weak classifier, an optimal threshold classification function is defined with the purpose of maximizing the classification ability and minimizing the number of misclassified sub-windows. For that purpose, we define a weak classifier by h_t; this consists of a feature f_t, a threshold θ_t, and a coefficient factor p_t which indicates the direction of the inequality sign ("<" or ">" between f_t and θ_t), as performed in [25]. Equation (2) defines the optimal threshold classification function for a weak classifier as follows:

h_t(x) = { 1, if p_t f_t(x) < p_t θ_t; 0, otherwise }    (2)

where x is a sub-window of an input image and f_t indicates the tth Haar feature or the HOG histogram bin value (we use H_{k,t} instead of f_t to denote the tth histogram bin value in the kth cell when extracting HOG weak classifiers). If we apply the AdaBoost algorithm on MB-LBP features, it will be difficult to use the threshold classification function, since the value of MB-LBP features is nonmetric. As performed in [23], the weak classifier known as the decision tree or regression tree is applied instead. Hence, the multibranch tree is adopted to design the weak classifiers based on MB-LBP features. The multibranch tree has 256 branches, each of which corresponds to a certain discrete value of MB-LBP features [23]. The weak classifier (in the case of MB-LBP features) is defined as

h_t(x) = a_j, if x_k = j, for j = 0, ..., 255    (3)

where x_k is the kth element of feature vector x and a_j (j = 0, ..., 255) are regression parameters learned in the AdaBoost training process. We calculate the best tree-based weak classifier just as we would learn a node in a decision tree [23]. The minimization of (3) gives the following parameters:

a_j = ( Σ_i w_i y_i δ(x_i^k = j) ) / ( Σ_i w_i δ(x_i^k = j) ).    (4)
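Equations (1) and (2) can be sketched together as follows. One interpretive note: since the weak classifiers of (2) output {0, 1}, the literal sign of the weighted sum in (1) is never negative; the sketch therefore compares the weighted vote with half the total weight, the standard Viola-Jones decision rule, which we assume is the intended reading. Threshold values in the usage are illustrative only:

```python
def stump(f, theta, p):
    """Threshold weak classifier of Eq. (2): h(x) = 1 if p*f(x) < p*theta,
    else 0. `f` maps a sub-window x to its scalar feature value; the
    polarity p in {+1, -1} selects the direction of the inequality."""
    return lambda x: 1 if p * f(x) < p * theta else 0

def strong_classify(x, weak_classifiers, alphas):
    """Strong classifier of Eq. (1): alpha-weighted vote of the weak
    classifiers, accepted when it reaches half the total weight."""
    vote = sum(a * h(x) for a, h in zip(alphas, weak_classifiers))
    return 1 if vote >= 0.5 * sum(alphas) else 0
```

In training, AdaBoost would pick, at each round, the (f_t, θ_t, p_t) triple with the lowest weighted error and derive α_t from that error; here the stumps and weights are simply given.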
Obviously, the parameters aj ∈ [−1, +1]. aj > 0 indicates that Algorithm 1: Cascade Classifier
the tth MB-LBP feature extracted from a positive sample is 1 Input: Detection sub-window
greater than that which is extracted from a negative sample. 2 Output: Detection result: Positive or Negative
Thus, when ht (x) is greater than a threshold TLBP , we say that 3 for i = 1; i ≤ N; i + + do
the input window x represents a true animal. Otherwise, it is 4 Si = 0
Initialize the calculation result Si for stage i/N classifier.
not considered to be an animal. 5 for t = 1; t ≤ T; t + + do
2) HOG-AdaBoost Algorithm: With the SVM classifier, 6 Si = Si + αj hj (x)
all the 36N 1 (N is the number of blocks) HOG vectors are where ht (x) is the t/T weak classifier, namely the tth
extracted to participate in the classification process. However, selected feature in this stage.
7 end
in the case of the AdaBoost algorithm, only a small set of 8 if Si ≥ Thresholdi then
histogram values, known as weak classifiers, are used. This 9 RESULT ← TRUE
means that each single histogram value in one bin of the cell The looping would continue
has classification capabilities. Actually, in each cell, we have 10 end
nine weak classifiers, each of which corresponds to one bin. 11 else
12 RESULT ← FALSE
The AdaBoost algorithm aims to pick up the most powerful skip detection and the result is negative.
weak classifiers from the 36N histogram bins. We set a thresh- 13 end
old θt for each bin value; we then compared the value Hk,t 14 end
(which indicates the tth histogram bin value in the kth cell) 15 return RESULT
of the input image to the threshold θt (which corresponds to
the tth feature); this was done based on (2) using hk,t (x) = 1
if pt Hk,t (x) < pt θt , and 0, otherwise. Finally, we combined
the selected weak classifiers into a strong final one. A num-
ber of trained, strong AdaBoost classifiers can be linked by a
“cascade” algorithm to get a more efficient and accurate
classifier, as explained in the next section.
3) Cascade of AdaBoost Classifiers: A cascade of classi-
fiers was constructed in order to improve detection performance and reduce processing time. The word cascade means that the resulting classifier consists of several simple classifiers applied successively to an input window until the target is recognized or rejected. In this paper, the AdaBoost training algorithm was applied to each stage. The key principle is that simple but efficient AdaBoost classifiers are arranged at the early stages to reject many negative sub-windows while accepting almost all positive sub-windows. The subsequent, more complex and stronger classifiers aim to achieve low FPRs. Algorithm 1 shows the pseudo-code of the cascade classifier [9].

4) Classifier Training: We assume that we need to train a classifier with N stages. The FPR and the detection rate of the classifier are then F = ∏_{i=1}^{N} f_i and T = ∏_{i=1}^{N} t_i, where F and T are the desired FPR and accuracy rate, respectively, and f_i and t_i are, respectively, the FPR and the detection rate of the ith stage classifier. If we intend our classifier to achieve a detection rate of 90% (for the whole 20 stages), the minimum hit rate of each stage should be 0.9^{1/20} ≈ 0.995. At the same time, if we define the FPR of each classifier stage as ≤ 50%, the maximum FPR should be less than 0.50^{20} ≈ 0.95 × 10^−6, which is considered a very low FPR. Note that the F and T rates are not the final performance rates of our architecture; they are estimated rates obtained from the training dataset. Recall that in this paper, Haar, MB-LBP, and HOG features are trained separately on our image dataset using the AdaBoost learning algorithm. These classifiers are referred to in this paper as "Haar-AdaBoost," "LBP-AdaBoost," and "HOG-AdaBoost."

Fig. 9. Number of features chosen by each stage for the three classifiers.

Fig. 9 shows the number of features selected at each stage. The HOG-AdaBoost classifier uses the largest number of weak classifiers in each stage; the LBP cascade classifier uses fewer than ten features in the earlier stages and around 17 in the later stages; and the Haar detector chooses from 15 to 56 features. Before conducting the comparison, we observed that LBP features were the most powerful and efficient compared to HOG and Haar.

VII. COMPARISON BETWEEN DETECTORS

To evaluate the performance of each detector, two main criteria were considered: 1) detection accuracy and 2) processing time. We tested these detectors on more than 7000 testing still images and on 13 video sequences (see Figs. 2 and 16–18) under different weather conditions. The experiments were performed on an Intel Core i5-2450M 2.50 GHz dual-core with 4 GB of RAM.

A. Processing Time

In order to measure the time required to recognize animals in a given scene, otherwise known as the processing time, we chose continuously playing video instead of single images to simulate a real-time situation (as shown in Figs. 2, 17, and 18). We tested each detector (i.e., HOG-AdaBoost, Haar-AdaBoost,

1 Assume we have N blocks in an image window, and that each block contains four cells divided into nine bins; the final HOG feature vector then has dimension N × 4 × 9 = 36N.
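The stage-rate arithmetic above and the footnote's HOG dimensionality are easy to check numerically. The short Python sketch below is not from the paper; the 105-block figure used as input is the block count of the standard 64 × 128 HOG window and serves only as an illustration:

```python
# Per-stage targets for an N-stage cascade: the overall detection rate T and
# FPR F are products of the per-stage rates t_i and f_i.
N_STAGES = 20

# Minimum per-stage hit rate needed for an overall detection rate of 90%.
min_hit_rate = 0.9 ** (1 / N_STAGES)

# Overall FPR when every stage passes at most 50% of negative sub-windows.
overall_fpr = 0.5 ** N_STAGES

# HOG descriptor length for a window of N blocks, 4 cells/block, 9 bins/cell.
def hog_dim(n_blocks: int) -> int:
    return n_blocks * 4 * 9  # 36N

print(round(min_hit_rate, 3), overall_fpr, hog_dim(105))
```

Running this reproduces the quoted values: a per-stage hit rate of about 0.995 and an overall FPR of about 0.95 × 10^−6.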
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
Fig. 10. Processing time per frame: input video size: 320 × 240; total
frame number: 1026; initial (minimal) detection window size: 56 × 40;
scale rate: 1.05.
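The scan parameters in the caption fix how many pyramid scales the detector visits per frame. A small sketch, assuming the usual multiscale scan in which the detection window grows by the scale rate until it no longer fits in the frame (an assumption, since the paper does not spell out its scanning loop):

```python
# Count the scales visited by a sliding-window scan: starting from the
# minimal window, grow both dimensions by `scale` until the window no
# longer fits inside the frame.
def count_scales(frame_w, frame_h, win_w, win_h, scale):
    n = 0
    w, h = float(win_w), float(win_h)
    while w <= frame_w and h <= frame_h:
        n += 1
        w *= scale
        h *= scale
    return n

# Settings from Fig. 10: 320x240 frame, 56x40 minimal window, scale rate 1.05.
print(count_scales(320, 240, 56, 40, 1.05))
```

Under these assumptions the width is the binding constraint and the scan visits 36 scales per frame, which is part of why the per-frame processing times in Fig. 10 are non-trivial even for the fast detectors.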
Fig. 12. Video detection results using (from left to right) Haar-AdaBoost,
HOG-AdaBoost, and LBP-AdaBoost.
TABLE II
SOME INFORMATION COMPARISONS FOR HAAR, LBP, AND HOG

the Haar-AdaBoost, when the FPPI rate is less than 10^−4. We have also noticed that the Haar-AdaBoost classifier curve can
achieve an extremely low miss rate; at this point, the system
can recognize 7030/7110 positive samples, but has more than
3000 false positive results in 31 712 negative images.
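The operating point quoted above can be restated as a miss rate and an FPPI value with a few lines of Python (the 3000 figure is described as a lower bound in the text, so the FPPI computed here is likewise a lower bound):

```python
# Operating point quoted for Haar-AdaBoost at its extremely low miss rate:
# 7030 of 7110 positive samples recognized, ~3000 false positives across
# 31 712 negative images.
tp, pos_total = 7030, 7110
fp, neg_images = 3000, 31_712

miss_rate = 1 - tp / pos_total  # fraction of positives missed (~1.1%)
fppi = fp / neg_images          # false positives per image (~0.095)

print(round(miss_rate, 4), round(fppi, 4))
```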
On the other hand, if FPPI is greater than 10^−3, the LBP-AdaBoost and Haar-AdaBoost methods have comparable miss rates. However, as Haar-AdaBoost consumes much more time than LBP-AdaBoost (as shown in Figs. 10 and 11) and has a significant number of false positives (see Table II), LBP-AdaBoost can be considered the best solution, followed by Haar-AdaBoost. The video detection results of Fig. 12 show a clear difference in performance between the three detectors. The LBP-AdaBoost detector wins the comparison in terms of both true positive results and missed detections (false negatives). In addition, the Haar-AdaBoost detector obtains better results than
the HOG-AdaBoost detector, particularly when considering
FPRs. The three aforementioned schemes are also compared
to the well-known HOG-SVM in terms of accuracy of detec-
tion (see Fig. 13). Again, LBP-AdaBoost seems to be the most
accurate scheme compared to the three other detectors. On the
other hand, HOG-SVM only outperforms the HOG-AdaBoost
detector. If we associate the evaluation results of Fig. 13 with Fig. 9, we find that LBP features are very efficient at animal detection: only 219 LBP features applied in 18 stages achieve the highest detection rate.

VIII. PROPOSED ARCHITECTURE

We want to emphasize that, due to the large diversity of animal types and postures compared to human faces or pedestrian postures, it is hard to develop detectors for all large animals at once. As a first attempt, we propose in this paper a two-stage architecture that can detect some large animals, as highlighted in Section IV. We begin this section with a qualitative comparison between the aforementioned detectors, which clarifies the fundamental reasons for and advantages of our proposed two-stage architecture for detecting large animals. The HOG-SVM and HOG-AdaBoost methods are less beneficial than the LBP-AdaBoost and Haar-AdaBoost algorithms if they are applied to the entire image (in the first stage), for the following reasons. In Fig. 13, we show that HOG-AdaBoost and HOG-SVM have the highest FPRs compared to Haar-AdaBoost and LBP-AdaBoost. The application of these two detectors might have a severe impact on our system. Furthermore, as we are dealing with real-time systems, HOG-SVM has the highest processing time compared to the other schemes, as shown in Fig. 11. Therefore, only LBP-AdaBoost and Haar-AdaBoost can be applied to the entire image (first stage) [6].

Now, if we compare LBP-AdaBoost and Haar-AdaBoost to determine which is best suited as a first stage, we select LBP-AdaBoost because of its low FPR and noticeably superior processing time, despite its lower true positive rate. Moreover, we show in Table II a quantitative comparison between the three features. Again, we observe that the use of LBP yields few false positives. The second stage of our architecture is used to eliminate the false positive results extracted by the first stage. This can dramatically increase the final detection accuracy rate. The stages' order is driven by the accuracy of detection and the processing time.

With the two-stage architecture, the whole image is scanned in the first stage to yield ROIs that possibly contain animals. After a preprocessing step, which involves adapting the size of the ROIs to the requirements of the second stage, these resized ROIs are then scanned by the second stage to verify whether or not they contain animals. The first stage of our proposed scheme can be considered as a "detector" and the second stage as a "classifier" or "recognizer."

Fig. 14. Architecture design of the two-stage animal detection system.

A. Architecture Design

We design our system with two main criteria in mind: accuracy of detection and processing time. The first criterion was chosen in order to decrease FPRs and to increase the accuracy of the system. Moreover, in order to obtain a real-time detection system, the processing time is used as a second design criterion; this enables quick target detection. To achieve these criteria, a two-stage system is suggested in Fig. 14.

In the first stage, we apply a fast detection algorithm, which supplies the second stage with a set of ROIs that may contain animals and other similar objects (false positive targets). To fulfil the system requirements, the detector of the first stage should operate simply and quickly, because it is applied to the entire input frame. We chose LBP-AdaBoost for the first stage, since AdaBoost is quicker at rejecting false targets and less complex than SVM. Moreover, it is shown in Section VII and Fig. 13 that LBP-AdaBoost has good performance results compared to the other schemes. The first-stage LBP-AdaBoost is applied to the entire image to obtain ROIs that may contain animals. Furthermore, this first stage uses a single classifier trained on both animal-dataset categories (HTL and HTR). Conversely, the second stage uses two parallel sub-classifiers.
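The control flow described in this subsection can be summarized in a short structural sketch. The function and classifier names below are illustrative stand-ins, not the authors' implementation: `lbp_detect` plays the role of the first-stage LBP-AdaBoost cascade, and `hog_svm_htl`/`hog_svm_htr` the two parallel second-stage HOG-SVM sub-classifiers (HTL and HTR):

```python
from typing import Callable, List, Tuple

ROI = Tuple[int, int, int, int]  # x, y, width, height

def two_stage_detect(
    frame,
    lbp_detect: Callable[[object], List[ROI]],
    resize_roi: Callable[[object, ROI], object],
    hog_svm_htl: Callable[[object], bool],
    hog_svm_htr: Callable[[object], bool],
) -> List[ROI]:
    detections = []
    # Stage 1: scan the whole frame once with the fast LBP-AdaBoost cascade.
    for roi in lbp_detect(frame):
        patch = resize_roi(frame, roi)  # adapt the ROI size to stage-2 input
        # Stage 2: either parallel sub-classifier may confirm the ROI.
        if hog_svm_htl(patch) or hog_svm_htr(patch):
            detections.append(roi)
    return detections

# Tiny demo with stub classifiers: stage 1 proposes two ROIs, stage 2
# confirms one and rejects the other as a false positive.
rois = [(10, 10, 56, 40), (100, 50, 56, 40)]
out = two_stage_detect(
    frame=None,
    lbp_detect=lambda f: rois,
    resize_roi=lambda f, r: r,
    hog_svm_htl=lambda p: p == (10, 10, 56, 40),
    hog_svm_htr=lambda p: False,
)
print(out)
```

The design point the sketch makes explicit is that the expensive HOG-SVM classifiers only ever see the handful of ROIs that survive the cheap first stage, never the full frame.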
Fig. 15. Example showing that an ROI detected by the first-stage classifier (red rectangle) does not cover the minimum detectable rectangle size (green rectangle) of the second-stage classifiers unless the ROI is extended to the blue region.
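The extension step of Fig. 15 can be sketched as a small helper that grows an undersized ROI about its center and clamps it to the frame; the function name and the centering/clamping policy are illustrative assumptions, since the paper describes the step only pictorially:

```python
# Grow an ROI (x, y, w, h) to at least the minimum window size the
# second-stage classifiers can handle, keeping it centered where possible
# and clamped to the frame boundaries.
def extend_roi(x, y, w, h, min_w, min_h, frame_w, frame_h):
    new_w, new_h = max(w, min_w), max(h, min_h)
    new_x = min(max(x - (new_w - w) // 2, 0), frame_w - new_w)
    new_y = min(max(y - (new_h - h) // 2, 0), frame_h - new_h)
    return new_x, new_y, new_w, new_h

# A 40x30 ROI is padded out to the 56x40 minimum window used in this paper.
print(extend_roi(150, 100, 40, 30, 56, 40, 320, 240))
```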
Fig. 16. Experimental results of large animal detection performed on still
images in snowy, sunny, cloudy, and nighttime conditions.
Fig. 18. Experimental results performed on video showing a horse crossing the roadway in snowy conditions.

Fig. 19. Average detection speed: LBP-AdaBoost cascaded with HOG-SVM is 64.32 ms, and HOG-SVM operating as a first stage cascaded with LBP-AdaBoost is 167.45 ms. Video size: 320×240; total frame number: 1026; initial (minimal) detection window size: 56 × 40; scale rate: 1.05.

Fig. 20. Miss rate versus FPPI curves used to demonstrate the two-stage animal detection system.

Fig. 21. Miss rate versus FPPI curves used to demonstrate the two-stage animal detection system (nighttime scenario).
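Converting the per-frame averages in Fig. 19 to throughput makes the effect of the stage order explicit, a quick check:

```python
# Per-frame averages from Fig. 19 converted to frames per second.
def fps(ms_per_frame: float) -> float:
    return 1000.0 / ms_per_frame

fps_lbp_first = fps(64.32)   # LBP-AdaBoost first, HOG-SVM second
fps_hog_first = fps(167.45)  # HOG-SVM first, LBP-AdaBoost second

print(round(fps_lbp_first, 1), round(fps_hog_first, 1))
```

With LBP-AdaBoost as the first stage the system runs at roughly 15.5 frames/s, versus about 6 frames/s with HOG-SVM first, which is why the cheap detector is placed ahead of the expensive classifier.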
Fig. 22. Nighttime experimental results performed on infrared still images.

Fig. 23. Experimental results performed on infrared videos at nighttime.

We then retrained the aforementioned one-stage and two-stage architectures on the thermographic animal images, using the same strategy as for daytime detection. Of the four one-stage detectors, while LBP-AdaBoost has the best animal detection performance in daytime conditions (see Fig. 13), its classification ability at nighttime seems to be the worst, together with Haar-AdaBoost, as shown in Fig. 21. More precisely, the Haar-AdaBoost detector performs slightly better than LBP-AdaBoost when the FPPI rate is greater than 0.4 × 10^−2. This is because texture features (LBP and Haar features) are weakened in nighttime environments. Conversely, the gradient features remain strong enough to be detected at night; that is, HOG-AdaBoost followed by HOG-SVM have the best detection rates in nighttime conditions (see Fig. 21).

These results are explained as follows. In the context of animal detection technology, there are two main differences between thermographic and common (visible light) images.

X. CONCLUSION

In this paper, we investigated the problem of AVC mitigation using camera-based systems. Three detectors based on HOG, LBP, and Haar features were used to detect large animals. These features were trained separately on our dataset, LADSet, using the well-known AdaBoost algorithm; that is, three different detectors called Haar-AdaBoost, HOG-AdaBoost, and LBP-AdaBoost were constructed. These detectors were then assessed and compared with each other in terms of accuracy and processing time. After that, they were compared to the well-known HOG-SVM. Overall, LBP-AdaBoost showed strong results compared to the other schemes; however, a high FPR was observed. To cope with this issue, we kept LBP-AdaBoost, taking advantage of its good detection rate, and combined it with HOG-SVM, which has shown good performance when detecting the contours of animals. That is, a new scheme based on a two-stage strategy was developed. The aforementioned schemes were evaluated and tested under different illumination conditions. In our experiments, we concentrated on the lateral view of the moose, since this is the position it takes when crossing roadways; other positions, such as posterior and anterior views and postures at various angles, are left to future work. Our two-stage architecture LBP-AdaBoost/HOG-SVM has shown good performance in daytime conditions. However, we have observed that during the nighttime, the combination of LBP-AdaBoost and HOG-SVM has limited capabilities.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their comments, which helped to improve the manuscript.

REFERENCES

[1] I. D. Katzourakis, C. F. J. de Winter, M. Alirezaei, M. Corno, and R. Happee, "Road-departure prevention in an emergency obstacle avoidance situation," IEEE Trans. Syst., Man, Cybern., Syst., vol. 44, no. 5, May 2014.
[2] J. M. Conn, J. L. Annest, and A. Dellinger, "Nonfatal motor-vehicle animal crash-related injuries, United States, 2001–2002," J. Safety Res., vol. 35, no. 5, pp. 571–574, 2004.
[3] M. A. Sharafsaleh et al., "Evaluation of an animal warning system effectiveness phase two-final report," Dept. Transp., Inst. Transp. Studies, Univ. California, Berkeley, CA, USA, Tech. Rep. UCB-ITS-PRR-2012-12, 2012.
[4] K. Knapp et al., "Deer-vehicle crash countermeasure toolbox: A decision and choice resource," Midwest Regional Univ. Transp. Center, Deer-Veh. Crash Inf. Clearinghouse, Univ. Wisconsin-Madison, Madison, WI, USA, Tech. Rep. DVCIC-02, 2004.
[5] A. Mammeri, T. Zuo, and A. Boukerche, "Extending the detection range of vision-based driver assistance systems application to pedestrian protection system," in Proc. IEEE Glob. Commun. Conf. (GLOBECOM), Austin, TX, USA, Dec. 2014, pp. 1358–1363.
[6] D. Zhou, "Real-time animal detection system for intelligent vehicles," M.S. thesis, School Elect. Eng. Comput. Sci., Univ. Ottawa, Ottawa, ON, Canada, 2014.
[7] Z. Debao, W. Jingzhou, and W. Shufang, "Contour based HOG deer detection in thermal images for traffic safety," in Proc. Int. Conf. Image Process. Comput. Vis. Pattern Recognit., Las Vegas, NV, USA, Jul. 2012, pp. 1–6.
[8] T. Burghardt and J. Calic, "Real-time face detection and tracking of animals," in Proc. 8th Seminar Neural Netw. Appl. Elect. Eng. (NEUREL), Belgrade, Serbia, Sep. 2006, pp. 27–32.
[9] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 1. Kauai, HI, USA, Dec. 2001, pp. 511–518.
[10] D. Ramanan, D. A. Forsyth, and K. Barnard, "Building models of animals from video," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 8, pp. 1319–1334, 2006.
[11] H. Cho, P. E. Rybski, A. Bar-Hillel, and W. Zhang, "Real-time pedestrian detection with deformable part models," in Proc. IEEE Intell. Veh. Symp. (IV), Alcalá de Henares, Spain, Jun. 2012, pp. 1035–1042.
[12] T. Burghardt, B. Thomas, P. J. Barham, and J. Calic, "Automated visual recognition of individual African penguins," in Proc. 5th Int. Penguin Conf., Ushuaia, Argentina, Sep. 2004.
[13] C. P. Papageorgiou, M. Oren, and T. Poggio, "A general framework for object detection," in Proc. IEEE 6th Int. Conf. Comput. Vis., Mumbai, India, Jan. 1998, pp. 555–562.
[14] W. Zhang, J. Sun, and X. Tang, "From tiger to panda: Animal head detection," IEEE Trans. Image Process., vol. 20, no. 6, pp. 1696–1708, 2011.
[15] W. Zhang, J. Sun, and X. Tang, "Cat head detection—How to effectively exploit shape and texture features," in Proc. ECCV, vol. 4. Marseille, France, 2008, pp. 802–806.
[16] M. Zeppelzauer, "Automated detection of elephants in wildlife video," EURASIP J. Image Video Process., vol. 46, no. 1, pp. 1–44, 2013.
[17] P. Khorrami, J. Wang, and T. Huang, "Multiple animal species detection using robust principal component analysis and large displacement optical flow," in Proc. Workshop Vis. Observation Anal. Animal Insect Behav. (VAIB), Tsukuba, Japan, 2012.
[18] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), vol. 1. San Diego, CA, USA, Jun. 2005, pp. 886–893.
[19] R. M. Nowak, Walker's Mammals of the World. Baltimore, MD, USA: Johns Hopkins Univ. Press, 1999, pp. 1081–1091.
[20] J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel, "Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition," Neural Netw., vol. 32, pp. 323–332, Aug. 2012.
[21] J. Ge, Y. Luo, and G. Tei, "Real-time pedestrian detection and tracking at nighttime for driver-assistance systems," IEEE Trans. Intell. Transp. Syst., vol. 10, no. 2, pp. 283–298, Jun. 2009.
[22] D. Chen, X. B. Cao, H. Qiao, and F.-Y. Wang, "A multiclass classifier to detect pedestrians and acquire their moving styles," in Proc. IEEE Int. Conf. Intell. Security Informat., San Diego, CA, USA, May 2006, pp. 758–759.
[23] L. Zhang, R. Chu, S. Xiang, S. Liao, and S. Z. Li, "Face detection based on multi-block LBP representation," in Advances in Biometrics. Berlin, Germany: Springer, 2007, pp. 11–18.
[24] R. Lienhart and J. Maydt, "An extended set of Haar-like features for rapid object detection," in Proc. Int. Conf. Image Process., vol. 1. Rochester, NY, USA, 2002, pp. I-900–I-903.
[25] P. Viola and M. Jones, "Robust real-time object detection," Int. J. Comput. Vis., vol. 4, no. 2, pp. 51–52, 2001.
[26] T. Ojala, M. Pietikainen, and D. Harwood, "Performance evaluation of texture measures with classification based on Kullback discrimination of distributions," in Proc. 12th IAPR Int. Conf. Pattern Recognit. (Conf. A: Comput. Vis. Image Process.), vol. 1. Jerusalem, Israel, 1994, pp. 582–585.
[27] Tardif & Associates Inc., "Collisions involving motor vehicles and large animals in Canada," Final report, Transp. Canada Road Safety Dir., Ottawa, ON, Canada, Mar. 2003, p. 44.
[28] P. Klocek, Handbook of Infrared Optical Materials. New York, NY, USA: Marcel Dekker, 1991.

Abdelhamid Mammeri received the M.Sc. degree from the Catholic University of Louvain, Louvain-la-Neuve, Belgium, in 2002, and the Ph.D. degree from Sherbrooke University, Sherbrooke, QC, Canada, in 2010, both in electrical and computer engineering.
He is a Research Associate with the DIVA Strategic Research Network, Ottawa University, Ottawa, ON, Canada. His current research interests include visual sensor networks, wireless ad hoc networks, vehicular networks, energy minimization schemes for visual sensor networks, and object detection through vision applied to vehicular ad hoc networks. He has extensively published in highly ranked international conferences and journals in the above areas.
Dr. Mammeri was a recipient of the FQRNT Quebec Scholarship Award at the Post-Doctorate Level in 2012. He has served as a Technical Program Committee Member for several conferences, including the IEEE Vehicular Technology Conference 2013, the IEEE Local Computer Networks 2013, and ACM Modeling, Analysis and Simulation of Wireless and Mobile Systems 2013.

Depu Zhou is currently pursuing the master's degree in electrical and computer engineering with the School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON, Canada.
His current research interests include video streaming over vehicular networks.

Azzedine Boukerche received the M.Sc. and Ph.D. degrees in computer science from McGill University, Montreal, QC, Canada.
He is a Full Professor and the Canada Research Chair Tier-1 with the University of Ottawa, Ottawa, ON, Canada. He was a Faculty Member with the University of North Texas, Denton, TX, USA. He is the Scientific Director of the NSERC-DIVA Strategic Research Network and the Director of the PARADISE Research Laboratory with Ottawa University. He was a Senior Scientist with the Simulation Sciences Division, Metron Corporation, San Diego, CA, USA. He was with the JPL/NASA-California Institute of Technology, Pasadena, CA, USA, for one year, where he contributed to a project centered on the specification and verification of the software used to control interplanetary spacecraft operated by the JPL/NASA Laboratory. His current research interests include vehicular networks, sensor networks, mobile ad hoc networks, mobile and pervasive computing, wireless multimedia, performance evaluation and modeling of large-scale distributed systems, distributed computing, and large-scale distributed interactive simulation. He has published several research papers in the above areas.
Dr. Boukerche was a recipient of the Ontario Distinguished Researcher Award, the Premier of Ontario Research Excellence Award, the G. S. Glinski Award for Excellence in Research, the IEEE Computer Society Golden Core Award, the IEEE CS-Meritorious Award, the University of Ottawa Award for Excellence in Research, and several best research paper awards for his work on vehicular and sensor networking and mobile computing. He is an Editor of three books on mobile computing, wireless ad hoc, and sensor networks. He serves as an Associate Editor for several IEEE TRANSACTIONS and ACM journals, as well as the Steering Committee Chair for several IEEE and ACM international conferences. He is a Fellow of the Engineering Institute of Canada, the Canadian Academy of Engineering, and the American Association for the Advancement of Science.