You are on page 1of 13

Ecological Informatics xxx (xxxx) 101919

Contents lists available at ScienceDirect

Ecological Informatics
journal homepage: www.elsevier.com/locate/ecolinf

F
WilDect-YOLO: An efficient and robust computer vision-based accurate
object localization model for automated endangered wildlife detection

OO
Arunabha M. Roy a, ⁎, Jayabrata Bhaduri b, Teerath Kumar c, Kislay Raj c
a Aerospace Engineering Department, University of Michigan, Ann Arbor, MI 48109, USA
b Capacloud AI, Deep Learning & Data Science Division, Kolkata, WB 711103, India
c School of Computing, Dublin City University, Dublin 9, Ireland

PR
ARTICLE INFO ABSTRACT

Keywords: Objective. With climatic instability, various ecological disturbances, and human actions threaten the existence of
Endangered wildlife detection various endangered wildlife species. Therefore, an up-to-date accurate and detailed detection process plays an
You only look once (YOLOv4) algorithm important role in protecting biodiversity losses, conservation, and ecosystem management. Current state-of-the-
Object detection (OD)
art wildlife detection models, however, often lack superior feature extraction capability in complex environ-
Computer vision
ments, limiting the development of accurate and reliable detection models. Method. To this end, we present
Deep learning (DL)
D
Wildlife preservation WilDect-YOLO, a deep learning (DL)-based automated high-performance detection model for real-time endan-
gered wildlife detection. In the model, we introduce a residual block in the CSPDarknet53 backbone for strong
and discriminating deep spatial features extraction and integrate DenseNet blocks to improve in preserving criti-
cal feature information. To enhance receptive field representation, preserve fine-grain localized information,
TE

and improve feature fusion, a Spatial Pyramid Pooling (SPP) and modified Path Aggregation Network (PANet)
have been implemented that results in superior detection under various challenging environments. Results. Eval-
uating the model performance in a custom endangered wildlife dataset considering high variability and complex
backgrounds, WilDect-YOLO obtains a mean average precision (mAP) value of 96.89%, F1-score of 97.87%, and
precision value of 97.18% at a detection rate of 59.20 FPS outperforming current state-of-the-art models. Signifi-
cance. The present research provides an effective and efficient detection framework addressing the shortcoming
of existing DL-based wildlife detection models by providing highly accurate species-level localized bounding box
EC

prediction. Current work constitutes a step toward a non-invasive, fully automated animal observation system in
real-time in-field applications.

1. Introduction with aerial image object detection generally suffer from low accuracy
due to complex backgrounds and disturbances among wild animals
RR

In recent years, automated wildlife detection plays a critical role in (Eikelboom et al., 2019). Moreover, satellite-based monitoring methods
wildlife survey (Chalmers et al., 2021; Delplanque et al., 2021; Peng et require very-high-resolution satellite imagery which are limited for rel-
al., 2020), conservation (Khaemba and Stein, 2002; O'Brien, 2010), and atively larger-sized animals (Wang et al., 2019).
ecosystem management (Austrheim et al., 2014; Harris et al., 2010) to To circumvent such issues, various automatic and semi-automatic
tackle worldwide accelerated biodiversity crisis. Up-to-date detailed detection algorithms for wildlife animals have been adopted, in particu-
and accurate wildlife data can be beneficial in preventing biodiversity lar, from unmanned aircraft systems (UASs) imagery (Gonzalez et al.,
CO

losses, ecosystem damage, and poaching (Norouzzadeh et al., 2018; 2016; Ofli et al., 2016). Additionally, pixel-based classification methods
Petso et al., 2021). While traditional wildlife survey techniques mainly that include threshold setting, supervised, and unsupervised classifica-
include distance sampling (Aebischer et al., 2017), camera trapping tion have been popular methods for detecting animals in remote sens-
(Chauvenet et al., 2017), and satellite monitoring (Chauvenet et al., ing images (Kudo et al., 2012; Pringle et al., 2009). However, these
2017), however, such traditional techniques have disadvantages due to methods are not adequate for detecting targets with similar gray-scale
lower efficiency, high cost, the requirement of qualified personals, and values with the complex background (Wang et al., 2019). To detect tar-
their individual bias (Guo et al., 2018). Similarly, wild animal surveys gets in complex environments, various machine learning (ML) methods

⁎ Corresponding author.
E-mail address: arunabhr.umich@gmail.com (A.M. Roy).

https://doi.org/10.1016/j.ecoinf.2022.101919
Received 10 October 2022; Received in revised form 9 November 2022; Accepted 13 November 2022
1574-9541/© 20XX

Note: Low-resolution images were used to create this PDF. The original images will be used in the final composition.
A.M. Roy et al. Ecological Informatics xxx (xxxx) 101919

have been employed to localize objects combining rotation-invariant leading to missed detection and false object predictions for endangered
object descriptors for automated wildlife detection (Cheng and Han, species which posses unique body textures, shapes, sizes, and colors
2016). Although, traditional ML yields encouraging results in relatively (Kim et al., 2019). Between various species, accurate detection and lo-
simple scenarios, however, they are not adequate and robust methods calization tasks can be challenging due to significant variability of light-
for detecting complicated animal features such as structure, texture, ening conditions, low visibility, high degree of osculation and overlap,
morphology, etc. (Peng et al., 2020; Rey et al., 2017). the coexistence of multi-object classes with various aspect ratios, and
More recently, driven by big-data methods (Khan et al., 2022a), other morphological characteristics (Chabot et al., 2019). Additionally,
deep learning (DL) characterized by multilayer neural networks (NN) visual similarities, complex background and the low distinguishable in-
(LeCun et al., 2015) has shown remarkable breakthroughs in pattern terface between species and their surroundings, and various other criti-
recognition for various fields including image classification (Jamil et cal factors offer additional challenges and difficulties for the state-of-

F
al., 2022; Khan et al., 2022b;Singh et al., 2023 ), computer vision the-art wildlife detection models (Feng and Li, 2022).
(Chandio et al., 2022; Voulodimos et al., 2018), object detection (Roy To address the aforementioned shortcomings, in the current study,
et al., 2022; Roy and Bhaduri, 2021; Roy and Bhaduri, 2022; Zhao et we present WilDect-YOLO, based on an improved version of the state-

OO
al., 2019a), time-series classification (Xiao et al., 2021a, 2021c; Xing et of-art YOLOv4 detection model for accurate real-time endangered
al., 2022a, 2022b), brain-computer interface (Roy, 2022a, 2022b, wildlife detection. In WilDect-YOLO, we integrate DenseNet blocks to
2022c), and across diverse scientific disciplines (Bose and Roy, 2022; improve preserving critical feature information and reuse. In addition,
Roy, 2021; Roy and Bose, 2023 ). Particularly in object localization, DL two residual blocks have been carefully designed in the CSPDarknet53
methods have demonstrated superior accuracy (Han et al., 2018) that backbone for strong and discriminating deep spatial features extraction.
can be categorized into two classes: two-stage and one-stage detector Furthermore, Spatial Pyramid Pooling (SPP) has been tightly attached
(Lin et al., 2017a). Two-stage detectors including Region Convolution to the backbone to enhance the representation of receptive fields. We

PR
Neural Network (RCNN) (Girshick, 2015), faster-RCNN (Ren et al., have also utilized a modified Path Aggregation Network (PANet) to effi-
2016), mask-RCNN (He et al., 2017) etc. have shown a significant im- ciently preserve fine-grain localized information by feature fusion. Ad-
provement in accuracy in object localization. In recent times, You Only ditionally, we performed an extensive ablation study for backbone-neck
Look Once (YOLO) variants (Bochkovskiy et al., 2020; Redmon et al., architecture to optimize both accuracy of detection and detection
2016; Redmon and Farhadi, 2017, 2018) have been proposed that unify speed. The proposed WilDect-YOLO has been employed to detect dis-
target classification and localization leading to significant improvement tinct eight different endangered wildlife species that provide superior
in the detection speed (Roy et al., 2022; Roy and Bhaduri, 2021, 2022). and accurate detection under various complex and challenging environ-
Therefore, driven by advances in computer vision technologies, wildlife ments. The WilDect-YOLO effectively addresses the shortcoming of ex-
D
detection is rapidly transforming into a data-rich discipline and has isting DL-based wildlife detection models and illustrates the superior
been applied in the automated detection of a variety of wildlife species potential in real-time in-field applications. In short, current work con-
(Duporge et al., 2021; Eikelboom et al., 2019; Gonçalves et al., 2020). stitutes a step toward a non-invasive, fully automated efficient animal
Along the similar line, various DL methodologies such as convolutional observation system.
TE

neural network (CNN) (Kellenberger et al., 2018), RetinaNet


(Eikelboom et al., 2019), ResNet-50 (Chabot et al., 2022), YOLOv3 2. Related works
(Torney et al., 2019), Faster R-CNN (Peng et al., 2020), Libra-RCNN
(Delplanque et al., 2021) etc. have demonstrated high precision in ob- In the present section, some recent and relevant works have been
ject localization and can be deployed as a reliable and predictable highlighted. More recently, a two-channeled perceiving residual pyra-
EC

model for automated wildlife detection. mid network (Ruff et al., 2021) has been proposed based on audio sig-
Motivations: The main motivation of the present study is to design nals that deliver superior detection accuracy. Furthermore, different
an efficient and robust computer vision-based algorithm for the accu- techniques such as segmentation-based YOLO model (Parham et al.,
rate classification and localization of endangered wildlife species. Cli- 2018), fast-depth CNN-based detection model from highly cluttered
matic instability and various human activities such as thawing, hunt- camera images (Singh et al., 2020), sparse multi discriminative-neural
ing, oil drilling, etc. threaten the existence of various endangered ani- network (SMD-NN) (Meena and Loganathan, 2020), a fast image-
mals and create damage to ecosystems (Jaskólski, 2021). Species that enhancement algorithm based on Multi-Scale Retinex (MSR) (Singh et
RR

inhabit such ecosystems are highly specialized to live in adverse al., 2022), CNN-based model for facial detection (Taheri and Toygar,
weather conditions, which is why such changes affect them severely 2018), a semi-supervised learning-based Multi-part CNN (MP-CNN)
(Crooks et al., 2017). Thus, it is crucial to build an accurate automated (Divya Meena and Agilandeeswari, 2019), CNN with k-Nearest Neigh-
endangered wildlife detection model to conserve and protect the bor (kNN) has been utilized for wildlife detection that provides state-of-
species and the ecosystem. Although, there exists several state-of-the- the-art performance.
art works for wildlife detection (Barbedo et al., 2019; Moreni et al., In terms of endangered animal detection, there is only a handful of
CO

2021; Naude and Joubert, 2019; Peng et al., 2020) including multi- work that has been geared toward addressing such an important issue.
species animal detection (Delplanque et al., 2021; Eikelboom et al., Notably, the DL-based model for classifying red pandas (He et al.,
2019), however, they often suffer from low accuracy, missed detection, 2019); animal action recognition based on wildlife videos (Schindler
and relatively large computational overhead. Additionally, there is no and Steinhage, 2021) are some of the representative works in recent en-
systemic study, as per the authors' best knowledge, that addresses the deavors. Additionally, RGB and thermal image-based Arctic bird detec-
challenge of detecting and accurate localization of multiple endangered tion using drones has been developed in (Lee et al., 2019). After review-
wildlife species that is worthy of further investigation. To this end, the ing the aforementioned methods which are geared toward endangered
current works aim to develop an efficient and robust endangered wildlife detection, the current works aim to develop an efficient and ro-
wildlife classification and accurate object localization model simultane- bust endangered wildlife classification and accurate object localization
ously productive in terms of training time and computational cost model simultaneously productive in terms of training time and compu-
which is currently missing in recent state-of-the-art models for endan- tational cost which is currently lacking in the recent state-of-the-art en-
gered wildlife detection. deavors.
Challenges: Despite illustrating outstanding performance in detect-
ing wildlife species, current state-of-the-art DL algorithms are still not
suitable due to their insufficient fine-grain feature extraction capability

2
A.M. Roy et al. Ecological Informatics xxx (xxxx) 101919

3. Endangered wildlife species dataset and the background, and noisy environment. Additionally, the images
of the dataset have variations in their scale, orientation, and resolution.
Since there is no publicly available endangered wildlife dataset, in
the present work, we have extensively collected high-resolution web- 4. Proposed methodology for object localization
harvested images for different endangered species under various com-
plex backgrounds. The dataset used for the experimentation comprises In object detection, the target object classification and localization
eight classes: Polar Bear (Ursus maritimus), Galápagos Penguin (Sphenis- are performed simultaneously where the target class has been catego-
cus mendiculus), Giant Panda (Ailuropoda melanoleuca), Red Panda (Ailu- rized and separated from the background by drawing bounding boxes
rus fulgens), African forest elephant (Loxodonta cyclotis), Sunda Tiger (BBs) on input images containing the entire object. This can be particu-
(Panthera tigris sondaica), Black Rhino (Diceros bicornis), and African larly useful for counting endangered species for accurate surveying. To

F
wild Dog (Lycaon pictus). Fig. 1 shows some of the representative im- this end, the main goal of the current work is to develop an accurate
ages from the custom dataset for the eight different classes considered and robust endangered wildlife localization model. In this regard, dif-
herein. Noteworthy to mention, categories including Galápagos Pen- ferent variants of YOLO (Bochkovskiy et al., 2020; Redmon et al., 2016;

OO
guin, Red Panda, African forest elephant, Sunda Tiger, Black Rhino and Redmon and Farhadi, 2017, 2018) are some of the best high-precision
African wild Dogs have been declared critically endangered species. In one-stage object detection models that consist of the following parts: a
the datasets, there are a total number of 1600 images of which there are backbone for semantic deep feature extraction, followed by the neck for
200 images for each class. For the variability and challenges in the hierarchical feature fusion, and finally detection head for object classi-
datasets, we have included images that characterize limited and/or full fication and localization.
illumination, low visibility, high degree of occultation, multiple objects The overall schematic of the YOLO object localization process has
with overlap, complex backgrounds, the textural similarity of the object been depicted in Fig. 2 where the YOLO algorithm transforms the object

PR
D
TE
EC
RR

Fig. 1. (a) Representative samples images from endangered wildlife dataset that consist of eight classes: (a) Polar Bear; (b) Galápagos Penguin; (c) Giant Panda; (d)
Red Panda; (e) African forest elephant; (f) Sunda Tiger; (g) Black Rhino; and (h) African wild Dog. (For interpretation of the references to colour in this figure leg-
end, the reader is referred to the web version of this article.)
CO

Fig. 2. Schematic of (a) YOLO object localization process for endangered wildlife detection; (b) offset regression process for target BBs prediction during CIoU loss.

3
A.M. Roy et al. Ecological Informatics xxx (xxxx) 101919

detection task into a regression problem by generating BBs coordinates maximum suppression (NMS) (Ren et al., 2016) algorithm from multi-
and probabilities for each class. During the process, the inputted image ple scales.
size has been uniformly divided into N × N grids where B predictive
BBs have been generated. Subsequently, a confidence score has been as- 4.1. WilDect-YOLO architecture
signed if the target object falls inside that particular grid. It detects the
target object for a particular class when the center of the ground truth In recent endeavors, various attempts have been made on computer
lies inside a specified grid. During detection, each grid predicts NB num- vision-based object detection algorithm for accurate wildlife detection
bers of BBs with the confidence value ΘB as: and survey utilizing deep CNN (Kellenberger et al., 2019), R-CNN
(Ibraheam et al., 2021), Faster R-CNN (Peng et al., 2020), single shot
(1) multi-box detector (SSD) (Saxena et al., 2021), and YOLO (Choe and

F
Kim, 2020). Although the aforementioned techniques have demon-
where infers the accuracy of BB prediction, i.e., strated outstanding performance, however, the detection of endangered
indicates that the target class falls inside the grid, otherwise, wildlife detection task, specifically in Polar and African regions, faces

OO
. The degree of overlap between ground truth and the pre- several specific challenges, in particular, significant variability of light-
dicted BB has been described by the scale-invariant evaluation metric ening conditions, low visibility, high degree of osculation and overlap,
intersection over union (IoU) which can be expressed as the coexistence of multiple target classes with various aspect ratios, vi-
sual similarities, complex backgrounds, and the low distinguishable in-
(2) terface between species and its surroundings. Such challenging condi-
tions lead to false object prediction with a large number of missed de-
tection from the original YOLOv4 (Bochkovskiy et al., 2020) due to its
where Bt and Bp are the ground truth and predicted BBs, respec-

PR
insufficient fine-grain feature extraction capabilities.
tively. However, to further improve BBs regression and gradient disap- To resolve the existing issues, in the current work, we propose a
pearance, generalized IoU (GIoU) (Rezatofighi et al., 2019) and dis- novel object localization algorithm WilDect-YOLO based on a state-of-
tance-IoU (DIoU) (Zheng et al., 2020) as been introduced considering the-art YOLOv4 network, specially designed for endangered wildlife de-
aspect ratios and orientation of the overlapping BBs. More recently, tection, to enhance feature extraction, preserve fine-grain localized in-
complete IoU (CIoU) (Zheng et al., 2020) has been proposed for im- formation and improve feature fusion that provides superior detection
proved accuracy and faster convergence speed in BB prediction which under various challenging environments. The model has been opti-
can be expressed as mized to achieve better efficiency and accuracy of BB prediction based
D
on the characteristics and complexities of the endangered wildlife
(3) dataset considered herein. The overall network of the object localiza-
tion model is shown in Fig. 3.
To improve performance in terms of classification accuracy and ob-
TE

ject localization, we perform extensive experiments, and various modi-


(4)
fications are proposed which are detailed in the subsequent sections.

4.2. Improvement of discriminative feature extraction


where bgt and bp denotes the centroids of Bgt and Bp, respectively; ξ
EC

and β are the consistency and trade-off parameters, respectively. As In the present study, we have introduced a residual block CSPX1-n
shown in Fig. 2 -(b), η is the smallest diagonal length of Bp ∪ Bt; wgt, wp where n represents residual weighting operations to improve detection
are widths and hgt, hp are heights of Bgt and Bp, respectively. With in- speed and performance. We integrate CSPX1-n modules in the CSPDark-
creasing wp/hp, we get ξ → 0 from Eq. (4). Therefore, to optimize the in- net53 backbone replacing the original CSP8 and CSP4 residual blocks to
fluence of ξ on the CIoU, wp/hp can be properly chosen for the YOLO extract fine-grained rich semantic information as shown in Fig. 3. In the
model. Finally, the best BB prediction can be obtained from the non- CSPX1-n block, we divide the input features into two parts. In the first
RR
CO

Fig. 3. Schematic of the proposed WilDect-YOLO consists of improved Dense-CSPDarknet53 with residual block CSPX1-n and SPP in the backbone, modified PANet in
the neck part with regular YOLO head.

4
A.M. Roy et al. Ecological Informatics xxx (xxxx) 101919

part, (3 × 3) convolution was performed followed by an additional ture operations, such implementation improve the computational
(3 × 3) convolution to maintain the number of feature maps after en- speed.
tering the next residual unit as shown in Fig. 4-(a). To further improve
the feature extraction, we perform 3 × 3 convolution at the end. 4.4. Receptive field enhancement
Whereas, the second part acts as a residual edge for the convolution.
These two parts have been concatenated at the end to improve the se- One of the requirements of CNN is to have fixed-size input images.
mantic feature information. Implementation of the CSPX1-n modules in However, due to the different aspect ratios of the images, they have
the improved CSPDarknet53 helps to learn more expressive features been fixed by cropping and warping during the convolution process
that demonstrate significant improvement of detection accuracy for the which results in losing important features. In this regard, SPP (He et al.,
custom wildlife datasets used herein. 2015) applies an efficient strategy in detecting target objects at multi-

F
ple length scales. To this end, we have added an SPP block integrated
4.3. Preserving critical feature information with CSPX1–2 of the Dense-CSPDarknet53 backbone to improve recep-
tive field representation and extraction of important contextual features

OO
To preserve critical feature maps and efficiently reuse the discrimi- as shown in Fig. 4. In the proposed model, a modified SPP consisting of
native feature information, we have fused DenseNet (Huang et al., various sizes of sliding kernels (i.e., 5 × 5, 9 × 9, and 13 × 13) with
2017) in the original CSPDarknet53. In DenseNet, each layer has been maximum pooling has been prescribed that effectively increases the re-
connected to other layers in a feed-forward mode where n-th layer can ceptive field representation of the backbone.
receive the important feature information Xn from all the previous lay-
ers X0, X1, …, Xn−1 as Xn = Hn[X0, X1, …, Xn−1] where Hn is the feature 4.5. Preserving fine-grain localize information
map function for n-th layer. The schematic of the DenseNet blocks net-

PR
work structure have been shown in Fig. 4-(b, c). In addition, an improved PANet (Liu et al., 2018) integrated with
As shown in Fig. 3, we have introduced two DenseNet blocks; the CSPX2-n has been utilized as a neck of the detection model as shown in
first block (Dense B-1) has been attached before cross-stage partial Fig. 2. It can efficiently combine high and low feature fusion for multi-
block CSPX1–4; whereas the second block (Dense B-2) has been placed scale feature pyramid maps preserving fine-grain localized information.
before CSPX1–2 in the proposed WilDect-YOLO network which results Additionally, by employing flexible ROI pooling and element-wise max
in enhance feature propagation. It has been found that DenseNet signifi- operation, PANet can efficiently fuse the information from previous fea-
cantly improves the feature transfer and mitigates over-fitting in the ture layers resulting in significant improvement in the detection accu-
proposed detection network. Additionally, by reducing redundant fea- racy of the model.
D
TE
EC
RR
CO

Fig. 4. Schematic of (a) CSPX1-n residual block; (b) dense block (DB)-1; (c) dense block (DB)-2; (d) CSPX2-n residual block architecture used in WilDect-YOLO detec-
tion model.

5
A.M. Roy et al. Ecological Informatics xxx (xxxx) 101919

Furthermore, CIoU loss function (Zheng et al., 2020), dropblock reg-


ularization (Ghiasi et al., 2018), Cross mini Batch Normalization (Yao (5)
et al., 2021), dropout in feature map (Srivastava et al., 2014), and co-
sine annealing scheduler (Loshchilov and Hutter, 2017) have been em- The ratio of the correct prediction of target classes is called R of the
ployed to further improve the performance of WilDect-YOLO. We use classifier which can be evaluated as:
the original YOLOv3 head in the final part of the detection network.
Utilizing 416 × 416 × 3 image size as the input, the detection head of
(6)
the WilDect-YOLO can predict BBs in three different scales:
(13 × 13 × 24), (26 × 26 × 24), and (52 × 52 × 24) as shown in
Fig. 2. After extensive experiments, we have found that Mish (Misra, The higher values of P and R indicate superior detection capability.

F
2020) activation provides the optimal performance in terms of model Whereas, the F-1 score is the arithmetic mean of the P and R given as:
accuracy. Overall, our proposed methodology provides the best results
in terms of accuracy and performance compared to current state-of-the- (7)

OO
art models for endangered wildlife detection (see Section 6.2).
A relatively high F1 score represents a robust detection model. The
5. Training and performance performance metrics AP can be defined as the area under a P-R curve
(Davis and Goadrich, 2006) as follows
5.1. Training procedure
(8)
In the present work, we have performed an extensive and elaborate

PR
study to explore the comparative performance analysis of the proposed
WilDect-YOLO models for endangered wildlife classification and object A higher average AP value indicates better accuracy in predicting
localization. From the initial custom endangered wildlife species various object classes. In addition, AP50:95 denotes AP over
dataset consisting of 1600 images has been further expanded tenfold by IoU=0.50 : 0.05 : 0.95; AP50 and AP75 are APs at IoU threshold of 50%
utilizing various data augmentation procedures (i.e., colour balancing, and 75%, respectively. The AP for detecting small, medium, and large
rotation, blur processing, mirror projection, brightness transformation) objects can be measured through APS, APM, and APL, respectively. Fi-
to obtain the final dataset of a total of 16,000 images (2000 images per nally, mAP can be obtained from the average of all APs as:
class). From the final dataset, a total of 60%, 20%, and 20% images
D
have been randomly chosen for training, validation, and test sets, re- (9)
spectively. For the training set, LabelImg (Tzutalin, 2015) has been
used for the annotation of BBs around the target classes. For all the ex-
periments, we have used a Windows 10 Pro (64-bit) based computa- 6. Results
TE

tional system that has Intel Core i5-10210U with CPU @ 2.8 GHz ×6,
32 GB DDR4 memory, NVIDIA GeForce RTX 2080 utilizing CUDA In this section, the performance and detection accuracy of the pro-
10.2.89 and cuDNN 10.2 v7.6.5 for GPU parallelization. As required CV posed WilDect-YOLO frameworks have been discussed which have been
libraries, Visual Studio v15.9 (2017), and OpenCV 4.5.1-vc14 have evaluated in a custom-made endangered wildlife dataset consisting of 8
been integrated with DarkNet. Unless otherwise stated, a batch size set classes. For better clarity in BBs representation, the following BB class
EC

to 32 with a total number of training steps has been kept as 85,000 dur- identifiers have been associated in the detection results: class 1- Polar
ing training. The initial learning rate has been set to 0.001. The training Bear; class 2- Galápagos Penguin; class 3- Giant Panda; class 4- Red
dataset has been trained utilizing the available pre-trained weights-file Panda; class 5- African forest elephant; class 6- Sunda Tiger; class 7-
(AlexeyAB, 2021). Various training hyperparameters for WilDect-YOLO Black Rhino; and class 8- African wild Dog. The performance of the
have been detailed in Table 1. WilDect-YOLO network has been optimized through extensive ablation
studies. Finally, the performance of the proposed model has been stud-
5.2. Performance metrics ied in detail and compared with several state-of-the-art object detection
RR

models.
In the present work, the performance of the object detection models
has been evaluated by common standard measures (Ferri et al., 2009) 6.1. Optimization of network performance
including average precision (AP), precision (P), recall (R), IoU, F-1
score, mean average precision (mAP), etc. The confusion matrix ob- At first, we conduct extensive experiments to select proper back-
tained from the evaluation procedure provides the following interpreta- bone-neck combinations to optimize the performance of the proposed
CO

tions of the test results: true positive (TP), false positive (FP), false neg- WilDect-YOLO model in terms of both detection accuracy and speed.
ative (FN), and true negative (TN). During binary classification, the For different combinations of backbone-neck configurations, detection
classified object can be defined as TP for IoU ≥0.5. Whereas, it can be accuracy in terms of parameters AP, AP50, AP75, APS, APM, and APL as
classified as FP for IoU <0.5. Based on the aforementioned interpreta- well as detection speed (in FPS) has been reported in Table. 2. For the
tions, the metric P of the classifier can be defined by its ability to distin- comparison, we select Mish as the activation function. From the Table.
guish target classes correctly as: 2, one can see that DenseNet blocks in CSPDarknet53 (i.e., D-
CSPDarknet-53) improve the accuracy of the detection model compared
to the original YOLOv4. The performance is further improved by intro-
ducing CSPX1-n into D-CSPDarknet53. However, such a configuration
Table 1 results in a slight decrease in detection speed. We observe that the best
Various hyparameters values for training the WilDect-YOLOv model. performance has been achieved when both CSPX1-n and CSPX2-n have
Image size Sub-division Batch Channels Decay been integrated into D-CSPDarknet53 and PANet, respectively. There is
a significant improvement in the accuracy parameter, in particular, AP,
416 × 416 × 3 8 32 6 0.005 APS, and APL increase by 4.9%, 6.9%, and 7.6%, respectively compared
Initial learning rate Momentum Classes Training steps Filters to CSPDarknet53 + PANet configuration. Thus, a such configuration in
0.001 0.9 8 85,000 36

6
A.M. Roy et al. Ecological Informatics xxx (xxxx) 101919

Table 2
Performance of various residual and dense block combinations in WilDect-YOLO architecture for anchors size of 416 × 416.
Backbone+ add-in Neck +add-in AP AP50 AP75 APS APM APL FPS

CSPDarknet53 PANet 76.8 93.6 92.5 80.9 89.2 80.9 59.6


D-CSPDarknet53 PANet 78.4 96.1 92.2 78.3 87.7 81.7 61.1
D-CSPDarknet53 + CSPX1-n PANet 79.5 96.1 92.5 77.9 88.2 82.9 60.1
CSPDarknet53 PANet+CSPX2-n 77.1 95.6 91.2 74.1 87.9 84.7 63.2
D-CSPDarknet53 + CSPX1-n PANet+CSPX2-n 81.7 96.9 92.3 87.8 92.5 88.5 59.2

F
WilDect-YOLO provides the optimal performance in terms of detection crease in F1 and 4.68% increase in mAP, respectively. We observe that
accuracy and speed for the custom wildlife species data set considered the performance of Dense-YOLOv4 is superior to the original YOLOv4
herein. In summary, together with proper activation function and im- with 3.34%, 2.63%, 3.01%, and 2.32% increase in P, R, F1, and mAP,

OO
proved backbone-neck combination provide an efficient high- respectively. However, WilDect-YOLO yields the best performance
performance model for wildlife detection in complex scenarios. reaching the values of 97.18%, 98.56%, 97.87%, and 96.89% in P, R,
F1, and mAP, respectively as shown in Fig. 5. Moreover, WilDect-YOLO
6.2. Comparison with existing state-of-the-art models provides a superior real-time detection speed of 59.21 FPS which is
3.34% higher than the original YOLOv4 model. In summary, WilDect-
In this section, the detection performance of WilDect-YOLO is com- YOLO outshines some of the best detection models in terms of both de-
pared with some of the existing state-of-the-art detection models (Zhao tection accuracy and speed suitable for automated high-performance
et al., 2019b). For the performance comparison, we consider Faster R- wildlife detection models.

PR
CNN (Ren et al., 2016), Mask R-CNN He et al. (2017), RetinaNet (Lin et
al., 2017b), SSD Liu et al. (2016), YOLOv3 (Redmon and Farhadi, 6.3. Overall performance of WilDect-YOLO
2018), YOLOv4 (Bochkovskiy et al., 2020), and Dense-YOLOv4 (Roy
and Bhaduri, 2022) that are trained in the custom wildlife dataset in From the previous section, it has been observed that YOLOv4,
OpenMMLab object detection toolbox Chen et al. (2019). Comparison Dense-YOLOv4, and WilDect-YOLO provide better performance com-
of different performance parameters including P, R, F1-score, mAP, and pared to other state-of-the-art models. Therefore, these three models
detection speed obtained from these models have been shown in Table are closely compared in terms of mAP, F1, IoU, final loss, and average
D
3. detection time as shown in Table 4. The proposed WilDect-YOLOv has
The comparison reveals that the accuracy of R-CNN, RetinaNet, achieved the highest average IoU value of 0.917 indicating superior BB
SSD, and Mask R-CNN is quite inferior compared to YOLO variants as accuracy during target detection compared to the other two models.
visually illustrated in the bar-chart plot in Fig. 5. Between YOLOv3 and Similarly, it has also illustrated better detection performance and accu-
TE

YOLOv4, YOLOv4 demonstrated better performance with a 6.46% in- racy by achieving the highest F1 and mAP values of 97.9% and 96.9%
which are 6.1% and 5.6% improvement over the original YOLOv4, re-
spectively.
Table 3
Comparison of different performance parameters including P, R, F1, mAP,
Furthermore, the detection speed of 59.21 FPS obtained from
and detection speed (in FPS) between WilDect-YOLO and other state-of-the- WilDect-YOLO was found to be higher than YOLO and slightly less than
Dense-YOLOv4. Thus, it can provide real-time detection of wildlife
EC

art models where bold highlights the best performance values.


species with better accuracy compared to the other two models. In addi-
Model P (%) R (%) F1-score (%) mAP (%) Dect. time (ms) FPS
tion, the comparison of P-R curves between the three models have been
Faster R-CNN 71.32 72.39 71.85 73.17 41.12 24.32 depicted in Fig. 6-(a). From the comparison of the P-R curves, one can
RetinaNet 75.11 77.67 76.36 77.11 32.89 30.40 see that WilDect-YOLO attains a better P value for a particular R. It
SSD 76.13 80.19 78.10 80.52 28.22 35.43 achieved the highest area under the P-R curve indicating superior de-
Mask R-CNN 78.22 83.35 80.70 81.61 50.72 19.72
tection performance compared to YOLOv4 and Dense-YOLOv4. Next,
RR

YOLOv3 83.61 87.47 85.49 86.61 25.11 39.82


YOLOv4 90.19 93.79 91.95 91.29 17.21 58.10
we compare the loss evolution curves as shown in Fig. 6-(b). In the ini-
Dense-YOLOv4 93.53 96.42 94.95 93.61 16.77 59.63 tial phase, after exhibiting several cycles of fluctuation, the loss in the
WilDect-YOLO 97.18 98.56 97.87 96.89 16.89 59.20 WilDect-YOLO model tends to saturate after approximately 20,000
CO

Fig. 5. Comparison bar chart of different performance parameters including P, R, F1-score, mAP, and detection speed (in FPS) between WilDect-YOLO and other
state-of-the-art models.

7
A.M. Roy et al. Ecological Informatics xxx (xxxx) 101919

Table 4 objects have a significant degree of overlap between them. From the de-
Overall performance comparison between original YOLOv4, Dense-YOLOv4, tection result, one can see that the bounding box prediction from the
and WilDect-YOLO. proposed WilDect-YOLO is quite accurate in detecting each target ob-
Detection IoU F1 mAP Validation Detection Detection ject as illustrated in Fig. 8.
model loss time (ms) speed (FPS) In Fig. 9, we have extended the detection for African Elephant and
Sunda Tiger classes where the target class is placed in a complex and
YOLOv4 0.810 0.919 0.913 12.07 17.21 58.10
challenging background. Detection results from WilDect-YOLO in terms
Dense- 0.881 0.949 0.936 5.31 16.77 59.63
YOLOv4 of boundary box precision are more accurate compared to Dense-
WilDect- 0.917 0.979 0.969 1.88 16.89 59.21 YOLOv4 as shown in Table. 6. To further illustrate the efficacy of the
YOLO WilDect-YOLO detection performance, we have considered the detec-

F
tion of Black Rhino and African Wild Dog cases that have a high degree
training steps with a final loss value of 1.88. Whereas, the other two of occlusion, and dense overlapping between object classes. This is
models exhibit higher fluctuation in loss evolution and yield higher fi- quite a challenging task to detect target objects individually. In such

OO
nal loss value. Evidently, the proposed WilDect-YOLO is easier to train cases, the detection results from WilDect-YOLO elucidate superior de-
with faster convergence characteristics demonstrating its efficacy from tection accuracy by preciously detecting each target class with high
the computational point of view. confidence index as shown in Figs. 10.
To further gain insight into the performances of these models, detec- Additionally, for poorly visible multiple target objects due to insuffi-
tion result containing TP, FP, and FN for each class and corresponding cient lightening conditions, the proposed localization algorithm per-
P, R, and F-1 values from Dense-YOLOv4 and WilDect-YOLO has been forms well without missed detection as demonstrated in Figs. 7-10. For
shown in Table 5. WilDect-YOLO has illustrated significant improve- high-aspect-ratio object detection cases with the presence of irregular

PR
ment in P and R values for various classes, in particular, for detecting shapes and the similarity of their texture with surrounding environ-
Galapagoes Penguine, African Elephant, and Black Rhino classes. ments, the proposed the model yields good performance in such chal-
WilDect efficiently maximizes the TP value while simultaneously reduc- lenging scenarios. The overall detection result illustrates accurate and
ing FP and FN values for all classes. The proposed model improves robust bounding box prediction from WilDect-YOLO for all target
3.65% in P and 2.14% in R compared to Dense-YOLOv4. From the over- classes compared to Dense-YOLOv4.
all comparison, we can conclude that WilDect-YOLO demonstrated the
best performance in detecting various endangered wildlife species out- 7. Discussion
performing both YOLOv4 and Dense-YOLOv4 in terms of precision and
D
accuracy values. The current study proposes an efficient automated detection frame-
work for the endangered wildlife species which can be deployed for ani-
6.4. Detection of various animal species mal surveys in various demographic regions without human interven-
tion. Thus, it can significantly reduce the cost of operation, manual
TE

In this section, we have demonstrated the detection results for eight equipment, and overcome the difficulties of working in these adverse
different classes of endangered animal species from the proposed weather conditions. The current framework illustrates its superior capa-
WilDect-YOLO and compared them with Dense-YOLOv4. The visual bility of detecting various endangered animals which are significantly
representations of the detection results have been presented with con- different in terms of body textures, shapes, sizes, colors, and morpho-
fined BBs considering complex backgrounds and challenging environ- logical characteristics. Furthermore, in the presence of various detec-
EC

ments as shown in Figs. 7-10. Corresponding detailed detection results tion challenges such as visual similarities, complex backgrounds, a high
consisting of the number of detected and undetected target classes with degree of occultation and overlap, and the low distinguishable interface
average confidence scores have been reported in Table. 6. between species and its surroundings, the proposed model can replace
In Fig. 7, we tested the model for detecting Polar Bears and Galapa- current state-of-the-art detection models in terms of accuracy and ro-
gos Penguins in a challenging scenario where the target objects have bustness. Additionally, the current deep learning framework can be ex-
been placed in a similar textured background. The proposed model tended to UAS imagery to further expand the capability of detecting
shows its efficacy by preciously detecting the target objects with high various wildlife animals. With improved feature extraction capability
RR

average confidence index values. In separate cases, we have considered and an efficient localization algorithm, the proposed model can be suit-
detection for Giant Panda and Red Panda classes where multiple target able for detecting small-size animals from relatively low-resolution im-
CO

Fig. 6. Comparison of (a) P-R curves; (b) loss evolution curves between original YOLOv4, Dense-YOLOv4, and WilDect-YOLO.

8
A.M. Roy et al. Ecological Informatics xxx (xxxx) 101919

Table 5 the current work can be integrated with geographic information sys-
Comparison of detection results for individual classes between Dense-YOLOv4 tems (GIS) for analyzing the migrations and activities of wild animals.
and WilDect-YOLO. Moreover, one of the potential applications can be assembling object
Model Class Objects TP FP FN P (%) R (%) F1- detection framework with semantic segmentation methods such as
score Mask R-CNN (Bharati and Pramanik, 2020), U-Net (Esser et al., 2018)
to extract additional physical information such as diseases, body fat,
WilDect- All 10,070 9694 281 141 97.18 98.56 97.87
YOLO
height as well as various animal activities including eating, running,
Polar Bear 675 656 12 8 98.20 98.79 98.49
Galap. 1453 1398 33 27 97.69 98.10 97.89 and resting which can be helpful in better understanding animal health
Penguine and habits (Norouzzadeh et al., 2018). Nevertheless, the current deep-
Giant Panda 1211 1201 23 11 98.12 99.09 98.60 learning model outshines classical automated image analysis and vari-

F
Red Panda 789 756 12 09 98.43 98.82 98.63 ous state-of-the-art approaches in wildlife animal detection indicating
African 1987 1878 89 32 95.47 98.32 96.87
future improvements in performance and usability for the precise and
Elephant
accurate endangered animal survey which can be applied to various au-

OO
Sunda Tiger 987 923 44 19 95.44 97.98 96.70
Black Rhino 1001 981 23 12 97.70 98.79 98.24 tomated wildlife monitoring (Arbieu et al., 2021; Chen et al., 2020;
Wild Dog 1967 1901 45 23 97.68 98.80 98.24 Desgarnier et al., 2022; Hou et al., 2020; Mannocci et al., 2021) and dif-
Dense- All 10,070 9291 642 345 93.53 96.42 94.95 ferent biological conservation purposes (Stern and Humphries, 2022).
YOLO4 Polar Bear 675 621 37 22 94.37 96.58 95.46
The current framework can also be extended for various fault detec-
Galap. 1453 1378 39 32 97.24 97.73 97.48
Penguine
tion/thermal imaging(Glowacz, 2021a, 2021b, 2021c), human activity
Giant Panda 1211 1118 87 52 92.78 95.56 94.14 recognition (Xiao et al., 2021b) etc.
Red Panda 789 740 29 26 96.22 96.60 96.41

PR
African 1987 1801 177 78 91.05 95.84 93.38 8. Conclusions
Elephant
Sunda Tiger 987 901 76 32 92.22 96.57 94.34
Summarizing, in the present work, we have developed an efficient
Black Rhino 1001 921 87 36 91.36 96.23 93.74
Wild Dog 1967 1811 110 67 94.27 96.43 95.34 and robust object localization algorithm WilDect-YOLO is based on
computer vision for accurate classification and localization of various
endangered wildlife species. In the proposed network, we integrate
ages as well as satellite imagery. Although, the present work focus on
DenseNet blocks to improve feature critical feature information and
endangered animal detection, however, the current framework can be
two new residual blocks for efficient deep spatial feature extraction. In
extended to more generalized automated animal species detection for
D
addition, SPP and improved PANet modules have been employed to ef-
comprehensive and systematic wildlife animal surveys. Furthermore,
TE
EC
RR
CO

Fig. 7. Detection results for Polar Bear (class-1) and Galapagos Penguin (class-2) from the proposed WilDectYOLO and Dense-YOLOv4. Detailed detection results
with average confidence indexes have been shown in Table 6.

9
A.M. Roy et al. Ecological Informatics xxx (xxxx) 101919

F
OO
PR
Fig. 8. Detection results for Giant Panda (class-3) and Red Panda (class-4) from the proposed WilDectYOLO and Dense-YOLOv4. Detailed detection results
with average confidence scores have been shown in Table 6. (For interpretation of the references to colour in this figure legend, the reader is referred to the
web version of this article.)
D
TE
EC
RR

Fig. 9. Detection results for African Elephant (class-5) and Sunda Tiger (class-6) from the proposed WilDectYOLO and Dense-YOLOv4. Detailed detection results with
CO

average confidence scores have been shown in Table 6.

ficiently preserve fine-grain localized information by feature fusion. Data availability


Evaluated on a custom-made dataset for endangered wildlife species, it
has been found that at a detection rate of 59.20 FPS, WilDect-YOLO has The data that support the findings of this study are available upon
achieved mAP, F1-score, and precision values of 96.89%, 97.87%, and reasonable request.
97.18%, respectively outperforms existing state-of-the-art wildlife de-
tection models in terms of both classification accuracy and localized Declaration of Competing Interest
bounding box prediction in detecting various wildlife spices. Current
work effectively addresses the shortcoming of existing deep learning- The authors declare that they have no known competing financial
based wildlife detection models and constitutes a step toward a fully au- interests or personal relationships that could have appeared to influ-
tomated accurate automated wildlife monitoring system in real-time in- ence the work reported in this paper.
field applications.

10
A.M. Roy et al. Ecological Informatics xxx (xxxx) 101919

F
OO
PR
Fig. 10. Detection results for Black Rhino (class-7) and African wild Dog (class-8) from the proposed WilDectYOLO and Dense-YOLOv4. Detailed detection results
with average confidence scores have been shown in Table 6.
D
Table 6 Data availability
Detailed detection results from WilDect-YOLO and Dense-YOLOv4 for differ-
ent classes as shown in Figs. 7-10. Data will be made available on request.
TE

Species Figs. No Model Detc. Undetc. Avg. confidence


Score Acknowledgements

Polar Bear 7 (a)-(c) WilDect- 10 0 0.96


YOLO
The support of the Aeronautical Research and Development Board
Polar Bear 7 (a-i)-(c-i) Dense- 10 0 0.91 (Grant No. DARO/08/1051450/M/I) is gratefully acknowledged.
YOLOv4
Galap. 7 (d)-(f) WilDect- 16 0 0.93 References
EC

Penguine YOLO
Galap. 7 (d-i)-(f-i) Dense- 13 3 0.88 Aebischer, T., Siguindo, G., Rochat, E., Arandjelovic, M., Heilman, A., Hickisch, R.,
Penguine YOLOv4 Vigilant, L., Joost, S., Wegmann, D., 2017. First quantitative survey delineates the
Giant Panda 8 (a)-(c) WilDect- 18 1 0.94 distribution of chimpanzees in the eastern Central African Republic. Biol. Conserv.
YOLO 213, 84–94.
Giant Panda 8 (a-i)-(c-i) Dense- 14 5 0.83 AlexeyAB, 2021. Pre-Trained Weights-File.
YOLOv4 Arbieu, U., Helsper, K., Dadvar, M., Mueller, T., Niamir, A., 2021. Natural language
RR

Red Panda 8 (d)-(f) WilDect- 7 0 0.98 processing as a tool to evaluate emotions in conservation conflicts. Biol. Conserv. 256,
YOLO 109030.
Red Panda 8 (d-i)-(f-i) Dense- 7 0 0.93 Austrheim, G., Speed, J.D., Martinsen, V., Mulder, J., Mysterud, A., 2014. Experimental
YOLOv4 effects of herbivore density on aboveground plant biomass in an alpine grassland
ecosystem. Arct. Antarct. Alp. Res. 46 (3), 535–541.
African 9 (a)-(c) WilDect- 14 0 0.92
Barbedo, J.G.A., Koenigkan, L.V., Santos, T.T., Santos, P.M., 2019. A study on the
Elephant YOLO
detection of cattle in uav images using deep learning. Sensors 19 (24), 5436.
African 9 (a-i)-(c-i) Dense- 10 4 0.83
Bharati, P., Pramanik, A., 2020. Deep learning techniques—r-cnn to mask r-cnn: a survey.
Elephant YOLOv4 Comput. Intell. Pattern Recog. 657–668.
CO

Sunda Tiger 9 (d)-(f) WilDect- 8 0 0.99 Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M., 2020. Yolov4: Optimal Speed and Accuracy
YOLO of Object Detection.
Sunda Tiger 9 (d-i)-(f-i) Dense- 7 1 0.92 Bose, R., Roy, A., 2022. Accurate deep learning sub-grid scale models for large eddy
YOLOv4 simulations. Bull. Am. Phys. Soc. 1, 1–10.
Black Rhino 10 (a)-(c) WilDect- 7 0 0.98 Chabot, D., Stapleton, S., Francis, C.M., 2019. Measuring the spectral signature of polar
YOLO bears from a drone to improve their detection from space. Biol. Conserv. 237,
Black Rhino 10 (a-i)-(c- Dense- 6 1 0.91 125–132.
i) YOLOv4 Chabot, D., Stapleton, S., Francis, C.M., 2022. Using web images to train a deep neural
Wild Dog 10 (d)-(f) WilDect- 16 0 0.91 network to detect sparsely distributed wildlife in large volumes of remotely sensed
YOLO imagery: a case study of polar bears on sea ice. Ecol. Inform. 101547.
Chalmers, C., Fergus, P., Curbelo Montanez, C.A., Longmore, S.N., Wich, S.A., 2021. Video
Wild Dog 10 (d-i)-(f- Dense- 11 5 0.77
analysis for the detection of animals using convolutional neural networks and
i) YOLOv4
consumer-grade drones. J. Unmanned Vehicle Syst. 9 (2), 112–127.
Chandio, A., Gui, G., Kumar, T., Ullah, I., Ranjbarzadeh, R., Roy, A.M., Hussain, A., Shen,
Y., 2022. Precise Single-Stage Detector. arXiv preprint arXiv:2210.04252.
Chauvenet, A.L., Gill, R.M., Smith, G.C., Ward, A.I., Massei, G., 2017. Quantifying the bias
in density estimated from distance sampling and camera trapping of unmarked
individuals. Ecol. Model. 350, 79–86.
Chen, K., Wang, J., Pang, J., Cao, Y., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Xu, J.,
Zhang, Z., Cheng, D., Zhu, C., Cheng, T., Zhao, Q., Li, B., Lu, X., Zhu, R., Wu, Y., Dai,

11
A.M. Roy et al. Ecological Informatics xxx (xxxx) 101919

J., Wang, J., Shi, J., Ouyang, W., Loy, C.C., Lin, D., 2019. MMDetection: Open mmlab Kellenberger, B., Marcos, D., Tuia, D., 2018. Detecting mammals in uav images: best
Detection Toolbox and Benchmark. arXiv preprint arXiv:1906.07155. practices to address a substantially imbalanced dataset with deep learning. Remote
Chen, X., Zhao, J., Chen, Y.-H., Zhou, W., Hughes, A.C., 2020. Automatic standardized Sens. Environ. 216, 139–153.
processing and identification of tropical bat calls using deep learning approaches. Kellenberger, B., Marcos, D., Lobry, S., Tuia, D., 2019. Half a percent of labels is enough:
Biol. Conserv. 241, 108269. efficient animal detection in uav imagery using deep cnns and active learning. IEEE
Cheng, G., Han, J., 2016. A survey on object detection in optical remote sensing images. Trans. Geosci. Remote Sens. 57 (12), 9524–9533.
ISPRS J. Photogramm. Remote Sens. 117, 11–28. Khaemba, W.M., Stein, A., 2002. Improved sampling of wildlife populations using
Choe, D.-G., Kim, D.-K., 2020. Deep learning-based image data processing and archival airborne surveys. Wildl. Res. 29 (3), 269–275.
system for object detection of endangered species. J. Inform. Commun. Converg. Eng. Khan, W., Kumar, T., Cheng, Z., Raj, K., Roy, A.M., Luo, B., 2022a. Sql and nosql
18 (4), 267–277. Databases Software Architectures Performance Analysis and Assessments–A
Crooks, K., Burdett, C., Theobald, D., King, S., Marco, M.D., Rondinini, C., Boitani, L., Systematic Literature Review. arXiv preprint arXiv:2209.06977.
2017. Quantification of habitat fragmentation reveals extinction risk in terrestrial Khan, W., Raj, K., Kumar, T., Roy, A.M., Luo, B., 2022b. Introducing urdu digits dataset
mammals. Proc. Natl. Acad. Sci. 114 (29), 7635–7640. with demonstration of an efficient and robust noisy decoder-based pseudo example

F
Davis, J., Goadrich, M., 2006. The relationship between precision-recall and roc curves. generator. Symmetry 14 (10), 1976.
In: Proceedings of the 23rd International Conference on Machine Learning. pp. Kim, J.S., Elli, G.V., Bedny, M., 2019. Knowledge of animal appearance among sighted
233–240. and blind adults. Proc. Natl. Acad. Sci. 116 (23), 11213–11222.
Delplanque, A., Foucher, S., Lejeune, P., Linchant, J., Théau, J., 2021. Multispecies detection and identification of african mammals in aerial imagery using convolutional neural networks. Remote Sens. Ecol. Conserv.
Desgarnier, L., Mouillot, D., Vigliola, L., Chaumont, M., Mannocci, L., 2022. Putting eagle rays on the map by coupling aerial video-surveys and deep learning. Biol. Conserv. 267, 109494.
Divya Meena, S., Agilandeeswari, L., 2019. An efficient framework for animal breeds classification using semi-supervised learning and multi-part convolutional neural network (mp-cnn). IEEE Access 7, 151783–151802.
Duporge, I., Isupova, O., Reece, S., Macdonald, D.W., Wang, T., 2021. Using very-high-resolution satellite imagery and deep learning to detect and count african elephants in heterogeneous landscapes. Remote Sens. Ecol. Conserv. 7 (3), 369–381.
Eikelboom, J.A., Wind, J., van de Ven, E., Kenana, L.M., Schroder, B., de Knegt, H.J., van Langevelde, F., Prins, H.H., 2019. Improving the precision and accuracy of animal population estimates with aerial image object detection. Methods Ecol. Evol. 10 (11), 1875–1887.
Esser, P., Sutter, E., Ommer, B., 2018. A variational u-net for conditional appearance and shape generation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 8857–8866.
Feng, J., Li, J., 2022. An adaptive embedding network with spatial constraints for the use of few-shot learning in endangered-animal detection. ISPRS Int. J. Geo Inf. 11 (4), 256.
Ferri, C., Hernández-Orallo, J., Modroiu, R., 2009. An experimental comparison of performance measures for classification. Pattern Recogn. Lett. 30 (1), 27–38.
Ghiasi, G., Lin, T.-Y., Le, Q.V., 2018. Dropblock: a regularization method for convolutional networks. Adv. Neural Inf. Proces. Syst. 31.
Girshick, R., 2015. Fast r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision. IEEE, Piscataway, NJ, pp. 1440–1448.
Glowacz, A., 2021a. Fault diagnosis of electric impact drills using thermal imaging. Measurement 171, 108815.
Glowacz, A., 2021b. Thermographic fault diagnosis of ventilation in bldc motors. Sensors 21 (21), 7245.
Glowacz, A., 2021c. Ventilation diagnosis of angle grinder using thermal imaging. Sensors 21 (8), 2853.
Gonçalves, B.C., Spitzbart, B., Lynch, H.J., 2020. Sealnet: a fully-automated pack-ice seal detection pipeline for sub-meter satellite imagery. Remote Sens. Environ. 239, 111617.
Gonzalez, L.F., Montes, G.A., Puig, E., Johnson, S., Mengersen, K., Gaston, K.J., 2016. Unmanned aerial vehicles (uavs) and artificial intelligence revolutionizing wildlife monitoring and conservation. Sensors 16 (1), 97.
Guo, X., Shao, Q., Li, Y., Wang, Y., Wang, D., Liu, J., Fan, J., Yang, F., 2018. Application of uav remote sensing for a population census of large wild herbivores—taking the headwater region of the yellow river as an example. Remote Sens. 10 (7), 1041.
Han, J., Zhang, D., Cheng, G., Liu, N., Xu, D., 2018. Advanced deep-learning techniques for salient and category-specific object detection: a survey. IEEE Signal Process. Mag. 35 (1), 84–100.
Harris, G., Thompson, R., Childs, J.L., Sanderson, J.G., 2010. Automatic storage and analysis of camera trap data. Bull. Ecol. Soc. Am. 91 (3), 352–360.
He, K., Zhang, X., Ren, S., Sun, J., 2015. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37 (9), 1904–1916.
He, K., Gkioxari, G., Dollár, P., Girshick, R., 2017. Mask r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision.
He, Q., Zhao, Q., Liu, N., Chen, P., Zhang, Z., Hou, R., 2019. Distinguishing individual red pandas from their faces. In: Lin, Z., Wang, L., Yang, J., Shi, G., Tan, T., Zheng, N., Chen, X., Zhang, Y. (Eds.), Pattern Recognition and Computer Vision. Springer International Publishing, Cham, pp. 714–724.
Hou, J., He, Y., Yang, H., Connor, T., Gao, J., Wang, Y., Zeng, Y., Zhang, J., Huang, J., Zheng, B., et al., 2020. Identification of animal individuals using deep learning: a case study of giant panda. Biol. Conserv. 242, 108414.
Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q., 2017. Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4700–4708.
Ibraheam, M., Li, K.F., Gebali, F., Sielecki, L.E., 2021. A performance comparison and enhancement of animal species detection in images with various r-cnn models. AI 2 (4), 552–577.
Jamil, S., Abbas, M.S., Roy, A.M., 2022. Distinguishing malicious drones using vision transformer. AI 3 (2), 260–273.
Jaskólski, M.W., 2021. For human activity in arctic coastal environments – a review of selected interactions and problems. Miscellanea Geogr. 25 (2), 127–143.
Kudo, H., Koshino, Y., Eto, A., Ichimura, M., Kaeriyama, M., 2012. Cost-effective accurate estimates of adult chum salmon, Oncorhynchus keta, abundance in a japanese river using a radio-controlled helicopter. Fish. Res. 119, 94–98.
LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521 (7553), 436–444.
Lee, W.Y., Park, M., Hyun, C.-U., 2019. Detection of two arctic birds in Greenland and an endangered bird in Korea using rgb and thermal cameras with an unmanned aerial vehicle (uav). PLoS One 14 (9), 1–16.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P., 2017a. Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 2980–2988.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P., 2017b. Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 2980–2988.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C., Berg, A., 2016. Ssd: single shot multibox detector. In: European Conference on Computer Vision (ECCV).
Liu, S., Qi, L., Qin, H., Shi, J., Jia, J., 2018. Path aggregation network for instance segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 8759–8768.
Loshchilov, I., Hutter, F., 2017. Sgdr: stochastic gradient descent with warm restarts.
Mannocci, L., Baidai, Y., Forget, F., Tolotti, M.T., Dagorn, L., Capello, M., 2021. Machine learning to detect bycatch risk: novel application to echosounder buoys data in tuna purse seine fisheries. Biol. Conserv. 255, 109004.
Meena, S.D., Loganathan, A., 2020. Intelligent animal detection system using sparse multi discriminative-neural network (smd-nn) to mitigate animal-vehicle collision. Environ. Sci. Pollut. Res. 27, 39619–39634.
Misra, D., 2020. Mish: a self regularized non-monotonic activation function.
Moreni, M., Theau, J., Foucher, S., 2021. Train fast while reducing false positives: improving animal classification performance using convolutional neural networks. Geomatics 1 (1), 34–49.
Naude, J., Joubert, D., 2019. The aerial elephant dataset: a new public benchmark for aerial object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. pp. 48–55.
Norouzzadeh, M.S., Nguyen, A., Kosmala, M., Swanson, A., Palmer, M.S., Packer, C., Clune, J., 2018. Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proc. Natl. Acad. Sci. 115 (25), E5716–E5725.
O'Brien, T., 2010. Wildlife picture index and biodiversity monitoring: issues and future directions. Anim. Conserv. 13 (4), 350–352.
Ofli, F., Meier, P., Imran, M., Castillo, C., Tuia, D., Rey, N., Briant, J., Millet, P., Reinhard, F., Parkan, M., et al., 2016. Combining human computing and machine learning to make sense of big (aerial) data for disaster response. Big Data 4 (1), 47–59.
Parham, J., Stewart, C., Crall, J., Rubenstein, D., Holmberg, J., Berger-Wolf, T., 2018. An animal detection pipeline for identification. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, pp. 1075–1083.
Peng, J., Wang, D., Liao, X., Shao, Q., Sun, Z., Yue, H., Ye, H., 2020. Wild animal survey using uas imagery and deep learning: modified faster r-cnn for kiang detection in tibetan plateau. ISPRS J. Photogramm. Remote Sens. 169, 364–376.
Petso, T., Jamisola, R.S., Mpoeleng, D., Mmereki, W., 2021. Individual animal and herd identification using custom yolo v3 and v4 with images taken from a uav camera at different altitudes. In: 2021 IEEE 6th International Conference on Signal and Image Processing (ICSIP). IEEE, pp. 33–39.
Pringle, R.M., Syfert, M., Webb, J.K., Shine, R., 2009. Quantifying historical changes in habitat availability for endangered species: use of pixel- and object-based remote sensing. J. Appl. Ecol. 46 (3), 544–553.
Redmon, J., Farhadi, A., 2017. Yolo9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 7263–7271.
Redmon, J., Farhadi, A., 2018. Yolov3: an incremental improvement.
Redmon, J., Divvala, S., Girshick, R., Farhadi, A., 2016. You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 779–788.
Ren, S., He, K., Girshick, R., Sun, J., 2016. Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39 (6), 1137–1149.
Rey, N., Volpi, M., Joost, S., Tuia, D., 2017. Detecting animals in african savanna with uavs and the crowds. Remote Sens. Environ. 200, 341–351.
Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., Savarese, S., 2019. Generalized intersection over union: a metric and a loss for bounding box regression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 658–666.
Roy, A.M., 2021. Finite element framework for efficient design of three dimensional multicomponent composite helicopter rotor blade system. Eng 2 (1), 69–79.
Roy, A.M., 2022a. Adaptive transfer learning-based multiscale feature fused deep convolutional neural network for eeg mi multiclassification in brain–computer interface. Eng. Appl. Artif. Intell. 116, 105347.
Roy, A.M., 2022b. An efficient multi-scale CNN model with intrinsic feature integration for motor imagery EEG subject classification in brain-machine interfaces. Biomed. Signal Process. Control 74, 103496.
Roy, A.M., 2022c. A multi-scale fusion cnn model based on adaptive transfer learning for multi-class mi-classification in bci system. BioRxiv.
Roy, A.M., Bhaduri, J., 2021. A deep learning enabled multi-class plant disease detection model based on computer vision. AI 2 (3), 413–428.
Roy, A.M., Bhaduri, J., 2022. Real-time growth stage detection model for high degree of occultation using densenet-fused YOLOv4. Comput. Electron. Agric. 193, 106694.
Roy, A.M., Bose, R., 2023. Physics-aware deep learning framework for linear elasticity. arXiv 1–48.
Roy, A.M., Bose, R., Bhaduri, J., 2022. A fast accurate fine-grain object detection model based on YOLOv4 deep neural network. Neural Comput. & Applic. 1–27.
Ruff, Z.J., Lesmeister, D.B., Appel, C.L., Sullivan, C.M., 2021. Workflow and convolutional neural network for automated identification of animal sounds. Ecol. Indic. 124, 107419.
Saxena, A., Gupta, D.K., Singh, S., 2021. An animal detection and collision avoidance system using deep learning. In: Advances in Communication and Computational Technology. Springer, pp. 1069–1084.
Schindler, F., Steinhage, V., 2021. Identification of animals and recognition of their actions in wildlife videos using deep learning techniques. Ecol. Inform. 61, 101215.
Singh, A., Pietrasik, M., Natha, G., Ghouaiel, N., Brizel, K., Ray, N., 2020. Animal detection in man-made environments. In: 2020 IEEE Winter Conference on Applications of Computer Vision (WACV). pp. 1427–1438.
Singh, A., Raj, K., Kumar, T., Verma, S., Roy, A.M., 2023. Deep learning-based cost-effective and responsive robot for autism treatment. Drones 7 (2), 1–18. https://doi.org/10.3390/drones7020081.
Singh, A., Ranjbarzadeh, R., Raj, K., Kumar, T., Roy, A.M., 2022. Understanding EEG signals for subject-wise definition of Armoni activities. arXiv 1–11. https://doi.org/10.48550/arXiv.2301.00948.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R., 2014. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15 (1), 1929–1958.
Stern, E.R., Humphries, M.M., 2022. Interweaving local, expert, and indigenous knowledge into quantitative wildlife analyses: a systematic review. Biol. Conserv. 266, 109444.
Taheri, S., Toygar, Ö., 2018. Animal classification using facial images with score-level fusion. IET Comput. Vis. 12 (6), 679–685.
Torney, C.J., Lloyd-Jones, D.J., Chevallier, M., Moyer, D.C., Maliti, H.T., Mwita, M., Kohi, E.M., Hopcraft, G.C., 2019. A comparison of deep learning and citizen science techniques for counting wildlife in aerial survey images. Methods Ecol. Evol. 10 (6), 779–787.
Tzutalin, 2015. LabelImg.
Voulodimos, A., Doulamis, N., Doulamis, A., Protopapadakis, E., 2018. Deep learning for computer vision: a brief review. Comput. Intell. Neurosci. 2018.
Wang, D., Shao, Q., Yue, H., 2019. Surveying wild animals from satellites, manned aircraft and unmanned aerial systems (uass): a review. Remote Sens. 11 (11), 1308.
Xiao, Z., Xu, X., Xing, H., Luo, S., Dai, P., Zhan, D., 2021a. Rtfn: a robust temporal feature network for time series classification. Inf. Sci. 571, 65–86.
Xiao, Z., Xu, X., Xing, H., Song, F., Wang, X., Zhao, B., 2021b. A federated learning system with enhanced feature extraction for human activity recognition. Knowl.-Based Syst. 229, 107338.
Xiao, Z., Xu, X., Zhang, H., Szczerbicki, E., 2021c. A new multi-process collaborative architecture for time series classification. Knowl.-Based Syst. 220, 106934.
Xing, H., Xiao, Z., Qu, R., Zhu, Z., Zhao, B., 2022a. An efficient federated distillation learning system for multitask time series classification. IEEE Trans. Instrum. Meas. 71, 1–12.
Xing, H., Xiao, Z., Zhan, D., Luo, S., Dai, P., Li, K., 2022b. Selfmatch: robust semisupervised time-series classification with self-distillation. Int. J. Intell. Syst.
Yao, Z., Cao, Y., Zheng, S., Huang, G., Lin, S., 2021. Cross-iteration batch normalization.
Zhao, Z.-Q., Zheng, P., Xu, S.-T., Wu, X., 2019a. Object detection with deep learning: a review. IEEE Trans. Neural Netw. Learn. Syst. 30 (11), 3212–3232.
Zhao, Z.-Q., Zheng, P., Xu, S.-T., Wu, X., 2019b. Object detection with deep learning: a review. IEEE Trans. Neural Netw. Learn. Syst. 30 (11), 3212–3232.
Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., Ren, D., 2020. Distance-iou loss: faster and better learning for bounding box regression. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 12993–13000.