
Animal Detection Using Deep Learning in Natural Scene

Neeraj Gupta, Mukul Agrawal, Kshitij Gupta, Krishna Gupta, Nitin Kumar

Department of CEA, GLA University, Mathura, India

neeraj.gupta@gla.ac.in, mukul.agrawal@gla.ac.in, kshitij.gupta1@gla.ac.in, krishna.gupta@gla.ac.in, nitin.kumar_cs18@gla.ac.in

Abstract— Systematic and reliable capture and monitoring of animal activity is an essential task, because population and harvest monitoring are important components of wildlife management used to evaluate the effects of management decisions. This project develops an algorithm to detect animals. Since there are a large number of different animal species, identifying them manually is a very difficult task. The algorithm we develop classifies animals based on their images, so that they can be monitored efficiently. Animal detection and classification can help to prevent animal-vehicle accidents, trace animals and prevent theft. This can be achieved by applying effective deep learning algorithms.

Keywords— Animal detection; deep learning; classification

I. INTRODUCTION

Machine learning is the subset of Artificial Intelligence that provides systems the ability to learn automatically and improve from experience without being programmed explicitly. Machine learning focuses on the development of computer programs that can access data and use it to learn. The learning process starts with observations or data, such as examples, direct experience, or instruction, in order to look for patterns in data and make better decisions in the future based on the examples that we provide. The primary aim is to allow computers to learn automatically, without human intervention or assistance, and to adjust their actions accordingly.

Deep learning is a subset of machine learning. Within deep learning, the convolutional neural network (CNN) is a class of deep neural networks most commonly applied to analyzing visual imagery. CNNs use relatively little pre-processing compared to other image classification algorithms: the network learns the filters that in traditional algorithms were hand-engineered. This independence from prior knowledge and human effort in feature design is a major advantage. CNNs have applications in image and video recognition, recommender systems, image classification, medical image analysis, and natural language processing.

This project develops an algorithm to detect and classify animals, an essential step towards systematic and reliable monitoring of animal activity. The algorithm classifies animals from captured images, which makes prediction and classification straightforward. We use the TensorFlow library and an image classification model to build our project.
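As a concrete illustration of this setup, the following is a minimal sketch of a TensorFlow/Keras image classifier. The paper does not publish its code, so the directory path "data/animals", the image size and the layer sizes are illustrative assumptions; folder names are taken as species labels.

import tensorflow as tf

IMG_SIZE = (224, 224)

# Load labeled images from disk (hypothetical layout: one folder per
# species under data/animals); folder names become class labels.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/animals", validation_split=0.2, subset="training",
    seed=42, image_size=IMG_SIZE, batch_size=32)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "data/animals", validation_split=0.2, subset="validation",
    seed=42, image_size=IMG_SIZE, batch_size=32)

num_classes = len(train_ds.class_names)

# A small illustrative CNN classifier: conv/pool blocks, then a dense head.
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255, input_shape=IMG_SIZE + (3,)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=10)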
II. RELATED WORK

In this section we first briefly describe the CNN and its applications to image classification. We then summarize various CNN architectures that have demonstrated state-of-the-art performance in recent ImageNet challenges. Finally, we discuss some existing approaches to our particular problem, animal detection in natural scenes.

A. Convolutional Neural Networks for Image Classification

For humans, visual recognition is a relatively trivial task, but it remains challenging for automated image recognition systems due to the varied properties and complexity of images. An effectively infinite number of images can be generated by varying the position, scale, view, background or illumination of each object of interest. These challenges become even more serious in real-world problems such as animal detection in natural scenes. It is therefore important to build models that are invariant to certain transformations of the inputs while remaining sensitive to inter-class differences.

According to LeCun et al. [17], CNNs have shown great practical performance and have been widely used in machine learning in recent years, especially in areas such as image classification, speech recognition and natural language processing (NLP). Due to recent improvements in neural networks, namely deep CNNs, and in computing power, especially the successful implementation of parallel computing on GPUs (graphical processing units), these models have achieved state-of-the-art results that even outperform humans on image recognition tasks [20].

[Figure 3: Illustration of a typical convolutional neural network architecture setup.]

CNNs are learning models based on neural networks, specifically designed to take advantage of the spatial structure of input images, which are usually 3-dimensional volumes: width, height and depth. As shown in Figure 3, a CNN is essentially a sequence of layers which can be divided into groups, each comprising a convolutional layer. In standard neural networks, each neuron is fully connected to all neurons in the previous layer, and the neurons of each layer are fully independent. When applied to high-dimensional data such as natural images, the total number of parameters can reach millions, leading to serious overfitting and making the network impractical to train. In CNNs, by contrast, each neuron is connected only to a small region of the preceding layer, forming local connectivity. The convolutional layer computes the outputs of its neurons connected to local regions in the previous layer; the spatial extent of this connection is specified by a filter size. In addition, another important property of CNNs, parameter sharing, dramatically reduces the number of parameters and hence the computational complexity. Thus, compared to regular neural networks with similarly sized layers, CNNs have far fewer connections and parameters, making them easier to train while their performance is only slightly degraded [20]. These three main characteristics – spatial structure, local connectivity and parameter sharing – allow CNNs to convert an input image into layers of abstraction: the lower layers represent detailed features of images such as edges, curves and corners, while the higher layers exhibit more abstract features of objects.
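To make local connectivity and parameter sharing concrete, here is a small illustrative sketch (not the paper's network) that counts the parameters of a single convolutional layer and compares them, in a comment, with a fully-connected layer producing the same output volume.

import tensorflow as tf

# One 3x3 convolution over a 32x32 RGB input: each output neuron is
# connected only to a 3x3 local patch (local connectivity), and the same
# filter weights are reused at every spatial position (parameter sharing).
conv_net = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(16, 3, padding="same"),
])
print(conv_net.count_params())  # 16 filters * (3*3*3 weights + 1 bias) = 448

# A fully-connected layer producing the same 32*32*16 = 16,384 outputs from
# the 32*32*3 = 3,072 inputs would need 3072*16384 + 16384 = 50,348,032
# parameters, roughly 100,000x more than the convolutional layer above.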
AlexNet [20] is a CNN-based architecture comprising 8 layers: 5 convolutional layers and 3 fully-connected layers. A variant of the AlexNet model achieved a top-5 test error rate more than 10% better than that of the second-best entry [20]. GoogLeNet [19], the winner of ILSVRC-2014, developed an Inception Module that dramatically reduces the number of parameters. Furthermore, GoogLeNet replaced the fully-connected layers at the top of the CNN with average pooling, removing a large number of parameters that do not affect the performance of the network. The VGG Nets [18], which are analogous to AlexNet but with the network depth increased up to 19 layers and smaller convolutional filters, outperformed all other models in ILSVRC-2014 except GoogLeNet. Not only do the VGG models show great performance on the ImageNet dataset, they also generalize well and achieve the best results on other datasets [18]. The most recently published state-of-the-art architecture is the ResNet, a residual learning framework with a depth of up to 152 layers but still lower complexity than the VGG Nets. Similar to the VGG Nets, the ResNet also shows good generalization performance on datasets other than ImageNet [21].

Table I: The most common and successful CNN architectures for image classification.

Model     | Trainable layers | Main specifications
AlexNet   | 8                | 5 convolutional layers and 3 fully-connected layers. [20]
VGG-16    | 16               | 13 convolutional layers with 3x3 filters, and 3 fully-connected layers. [18]
GoogLeNet | 22               | Developed an Inception Module that dramatically reduces the number of parameters while achieving high accuracy; average pooling is used at the top of the CNN instead of fully-connected layers. [19]
ResNet-50 | 50               | A deep residual learning framework with skip connections and batch normalization; much deeper than VGG-16 (50 layers compared to 16) but with lower complexity and higher performance. [21]

B. Wildlife Classification

Monitoring wildlife through camera traps is an effective and reliable method of natural observation, as it can collect a large volume of visual data naturally and inexpensively. The wildlife data, which can be captured and collected fully automatically from camera traps, is however a burden for biologists to analyze: they must detect whether an animal is present in each image, or identify which species the objects belong to.

There have been some attempts to build an automatic wildlife classification system. In [10], Yu et al. employed improved sparse coding spatial pyramid matching (ScSPM) for image classification [27], [28]. Animal objects are first manually detected and cropped out of the background with the whole body, then image features are extracted based on the ScSPM to convert an image or a bounding box into a single vector, and finally a linear multi-class SVM is applied for classification. The average classification accuracy was 82% on their own dataset of 7,196 images of 18 species.

Inspired by the great success of deep CNN-based models, in this work we apply deep CNN models to wild animal classification, similarly to [11] and [12], on the Wildlife Spotter dataset. Differently from [12], we solve the task of animal image filtering prior to the task of animal identification, as the Wildlife Spotter dataset contains a large number of blank images (i.e. images without animal presence). Furthermore, for the task of animal identification, we investigate two training scenarios for comparison: training models from scratch on the Wildlife Spotter dataset, and training with available ImageNet pre-trained models (i.e. fine-tuning).
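The two training scenarios can be sketched with a standard keras.applications backbone; the choice of ResNet-50 and the 18-class head below are illustrative assumptions, not the paper's exact configuration.

import tensorflow as tf

NUM_CLASSES = 18  # hypothetical species count, used only for illustration

def build_model(pretrained: bool) -> tf.keras.Model:
    # Scenario 1 (pretrained=False): weights=None trains from scratch.
    # Scenario 2 (pretrained=True): start from ImageNet features and
    # fine-tune them on the wildlife images.
    base = tf.keras.applications.ResNet50(
        include_top=False, pooling="avg",
        weights="imagenet" if pretrained else None,
        input_shape=(224, 224, 3))
    outputs = tf.keras.layers.Dense(
        NUM_CLASSES, activation="softmax")(base.output)
    return tf.keras.Model(base.input, outputs)

scratch_model = build_model(pretrained=False)
finetune_model = build_model(pretrained=True)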
C. Citizen Science

Citizen science plays a major role in many research areas such as ecology and environmental sciences [33], [34], [35], [36]. A citizen scientist is a volunteer who contributes to science by collecting and processing data as part of a scientific enquiry. Significant developments in digital technology, especially the Internet and mobile computing, are among the key factors responsible for the great expansion of recent citizen science projects [33]. Volunteers are now able to take part in a project remotely, using designated applications on their mobile phones or computers to collect data or process submitted data, and then enter them online into centralized, relational databases [36]. Citizen scientists now participate in many projects across a range of areas, including climate change, invasive species and monitoring of all kinds [33], [36]. In addition, public engagement significantly benefits the field of machine learning. Supervised machine learning algorithms require large amounts of labeled data to train automated models, so human-labeled datasets such as Snapshot Serengeti or Wildlife Spotter are valuable resources. Many Internet-based applications, such as Google Search, Facebook or Amazon, leverage machine learning techniques on data collected from public user activities to enhance their business management.

D. Wildlife Spotter Project

Many Australian organizations and universities have undertaken an online citizen science project named Wildlife Spotter, taking a crowd-sourcing approach to science by asking volunteers to help scientists classify animals from a huge number of collected images. To deal with the enormous volume of images, the project invites volunteers to act as "citizen scientists" and join in the image analysis. The main goal of the project is, through the analysis of captured images, to assist researchers in studying Australian wildlife populations, behaviours and habitats, in order to save threatened species and preserve balanced, diverse, and sustainable ecosystems.

The Wildlife Spotter project is divided into six sub-projects, each specializing in a separate natural area of Australia: Tasmanian nature reserves, Far North Queensland, South-central Victoria, the Northern Territory arid zone, New South Wales coastal forests, and the Central mallee lands of New South Wales. Volunteers participate in the project by registering online accounts, logging in to the Web-based image classification system and manually labeling the displayed images, one by one. A user assigns a presented image to a specific species by clicking the appropriate category from a given list of animals. In cases of uncertainty, a blank image or an image problem, the user labels the image as "Something else", "There is no animal in view" or "There is a problem with this image", respectively. In order to obtain reliable classification accuracy, each image in the dataset is repeatedly presented to a number of different users to label. For instance, most classified images in the South-central Victoria dataset were each annotated by five citizen scientists.

As we described in Section I, the image datasets collected from camera traps are usually large in volume and imperfect in quality, which critically prolongs processing time and may lead to misclassification or inconsistent labeling. In this work, we aim at building a practical, fully automatic animal recognition framework for the Wildlife Spotter project, freeing scientists from the burden of manual labeling while dramatically reducing processing time.

III. EXPERIMENT RESULT AND DISCUSSION

• Classify multiple different types of animals using a convolutional image classifier
• Use the Visual Geometry Group (VGG16) model and transfer learning (see the sketch after this list)
• A pre-trained model on ImageNet
• Help identify animal pictures taken by the Wildlife Conservatory
• Identify potential new species as well as endangered species
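A minimal sketch of the VGG16 transfer-learning setup described above: the frozen base, the 224x224 input and the dense head sizes are assumptions, with a three-class head matching the three common species reported in the conclusion (bird, rat, bandicoot).

import tensorflow as tf

NUM_CLASSES = 3  # hypothetical head: bird, rat, bandicoot

# Load VGG16 pre-trained on ImageNet, without its classification head.
base = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # transfer learning: keep convolutional features fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])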
[Figure: Training and validation accuracy.]

[Figure: Training and validation loss.]

[Figure: Confusion matrix.]

[Figure: Normalized confusion matrix.]

[Figure: Classification metrics.]

[Figure: Accuracy of the model.]
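The confusion matrices and classification metrics shown in these figures can be recomputed from a trained model's validation predictions; below is a sketch using scikit-learn, assuming the model and val_ds objects from the earlier sketches.

import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

# Collect ground-truth labels and model predictions over the validation set.
y_true, y_pred = [], []
for images, labels in val_ds:
    probs = model.predict(images, verbose=0)
    y_true.extend(labels.numpy())
    y_pred.extend(np.argmax(probs, axis=1))

cm = confusion_matrix(y_true, y_pred)
# Normalized confusion matrix: each row sums to 1.
cm_norm = cm / cm.sum(axis=1, keepdims=True)

print(cm_norm)
print(classification_report(y_true, y_pred))  # precision, recall, F1 per class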

IV. CONCLUSION

Efficient and reliable monitoring of wild animals in their natural habitats is essential to inform conservation and management decisions. In this paper, using the Wildlife Spotter dataset, which contains a large number of images taken by trap cameras in South-central Victoria, Australia, we proposed and demonstrated the feasibility of a deep learning approach towards constructing a scalable automated wildlife monitoring system. Our models achieved more than 96% accuracy in recognizing images with animals and close to 90% in identifying the three most common animals (bird, rat and bandicoot). Furthermore, under different experimental settings for balanced and imbalanced data, the system has proven robust, stable and suitable for dealing with images captured from the wild.

REFERENCES

[1] P. M. Vitousek, H. A. Mooney, J. Lubchenco, and J. M. Melillo, "Human domination of Earth's ecosystems," Science, vol. 277, no. 5325, pp. 494–499, 1997.

[2] G. C. White and R. A. Garrott, Analysis of Wildlife Radio-Tracking Data. Elsevier, 2012.

[3] B. J. Godley, J. Blumenthal, A. Broderick, M. Coyne, M. Godfrey, L. Hawkes, and M. Witt, "Satellite tracking of sea turtles: Where have we been and where do we go next?" Endangered Species Research, vol. 4, no. 1-2, pp. 3–22, 2008.

[4] I. A. Hulbert and J. French, "The accuracy of GPS for wildlife telemetry and habitat mapping," Journal of Applied Ecology, vol. 38, no. 4, pp. 869–878, 2001.

[5] A. F. O'Connell, J. D. Nichols, and K. U. Karanth, Camera Traps in Animal Ecology: Methods and Analyses. Springer Science & Business Media, 2010.

[6] S. Thorpe, D. Fize, and C. Marlot, "Speed of processing in the human visual system," Nature, vol. 381, no. 6582, p. 520, 1996.

[7] G. Chen, T. X. Han, Z. He, R. Kays, and T. Forrester, "Deep convolutional neural network based species recognition for wild animal monitoring," in Proceedings of the IEEE International Conference on Image Processing (ICIP), 2014, pp. 858–862.

[8] A. Gómez, A. Salazar, and F. Vargas, "Towards automatic wild animal monitoring: Identification of animal species in camera-trap images using very deep convolutional neural networks," arXiv:1603.06169, 2016.

[9] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 248–255.

[10] O. Russakovsky, J. Deng, H. Su et al., "ImageNet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.

[11] N. Pinto, D. D. Cox, and J. J. DiCarlo, "Why is real-world visual object recognition hard?" PLOS Computational Biology, vol. 4, no. 1, p. e27, 2008.

[12] C. M. Bishop, "Pattern recognition," Machine Learning, vol. 128, pp. 1–58, 2006.

[13] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Backpropagation applied to handwritten zip code recognition," Neural Computation, vol. 1, no. 4, pp. 541–551, 1989.

[14] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv:1409.1556, 2014.

[15] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1–9.

[16] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[17] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.

[18] R. Collobert and J. Weston, "A unified architecture for natural language processing: Deep neural networks with multitask learning," in Proceedings of the 25th International Conference on Machine Learning (ICML), 2008, pp. 160–167.

[19] J. Gehring, M. Auli, D. Grangier, and Y. N. Dauphin, "A convolutional encoder model for neural machine translation," arXiv:1611.02344, 2016.

[20] J. Gehring, M. Auli, D. Grangier, D. Yarats, and Y. N. Dauphin, "Convolutional sequence to sequence learning," arXiv e-prints, 2017.
