
2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE)

Bird Species Identification using Deep Learning on GPU platform
978-1-7281-4142-8/20/$31.00 ©2020 IEEE. DOI 10.1109/ic-ETITE47903.2020.85

Pralhad Gavali
Computer Science & Information Technology Department,
Rajarambapu Institute of Technology, Sakharale,
Sangli, Maharashtra, India.
pralhad.gavali@ritindia.edu

J. Saira Banu
School of Computer Science and Engineering,
Vellore Institute of Technology, Vellore,
Vellore, Tamilnadu, India.
jsairabanu@vit.ac.in

Abstract

Today, many species of birds are rarely found, and when one is found it is difficult to classify. Birds appear in different scenarios with different sizes, shapes and colors, and a human observer sees them from different angles. Indeed, images present more variation for distinguishing bird species than audio recordings do, and people also find it easier to identify birds in pictures. Bird species classification is now possible using a deep convolutional neural network (DCNN) on the GoogLeNet framework. In this experiment, a bird image was converted into gray scale format, from which an autograph was generated. Each autograph is examined to calculate a score sheet from each node, and the respective bird species is predicted after analysis of the score sheet. The Caltech-UCSD Birds 200 [CUB-200-2011] dataset was used for both training and testing: 500 labeled samples are used for training and 200 unlabeled samples for testing. Classification uses a deep convolutional neural network, with parallel processing carried out on a GPU. The final results show that the DCNN predicts bird species with 88.33% accuracy. The experiments were performed on a Linux operating system with the TensorFlow library, using an NVIDIA GeForce GTX 680 with 2 GB of RAM.

Keywords: Autograph, Caltech-UCSD, DCNN, Tensorflow

I. INTRODUCTION

Nowadays, identification of bird species is a difficult activity that sometimes leads to uncertainty. Birds help us detect other organisms in the environment, because they respond quickly to environmental changes (for example, to the insects on which they feed) [1]. But collecting and gathering bird information requires huge human effort and is also a much more expensive method. In such situations, a robust system is needed that provides large-scale processing of bird information and serves as a valuable resource for scholars, government agencies and so on. Naming the bird species therefore plays a significant role here, that is, determining which species a specific bird image belongs to. Generally, birds have been identified using images, audio or video [6]. In 2013, the IEEE International Workshop on Machine Learning for Signal Processing (MLSP) announced a challenge to identify bird species [14]. Audio processing techniques allow birds to be detected from a recorded audio signal, but processing such recordings is complicated by mixed sounds in the environment, such as insects and other real-world objects. People are usually more effective at recognizing images than audio or video. So, it is preferable to use an image rather than audio or video to classify birds [15].

Ornithologists have faced problems in identifying bird species for many decades. They have to learn all the specifics of birds, such as their climate, genetics, distribution, environmental impact, etc. Normally, bird identification is conducted by an ornithologist based on the classification suggested by Linnaeus, using criteria such as Kingdom, Phylum, Class, Order, Family and Species [5].

The rest of the paper is arranged as follows. First, a brief overview is given of a general introduction to images of the species and then of their classification methods.


A summary of the approach with processes for classification is discussed in Section III. Section IV includes the results of experiments and discussions. Finally, in Section V, the findings are addressed.

II. LITERATURE SURVEY

Table 1: Literature Review

| Author Name | Paper title | Advantages | Disadvantages |
|---|---|---|---|
| Andreia Marini, Jacques Facon | Bird Species Classification Based on Color Features. | Classification of birds using color features. | Sometimes fails to identify the species, because many sub-species of birds have quite similar colors. |
| Karol J. Piczak | Identification of bird species in audio recordings using deep convolution neural networks | It is simple to upload only one recorded audio clip. | Difficult to recognize the voice, because the audio clip contains many other voices that create disturbance when finding the species. |
| Andrei-Petru Bărar | Bird Species Identification from an Image | Easy to identify bird species using an image. | Difficult to detect the bird in images taken from different angles. |
| Rahul P Tivarekar, Vinayak D. Chavan | Audio based bird species recognition using naive bayes algorithm | Bird vocalization measures introduce automated methods that deal with powerful audio signal processing techniques. | The complete identification process involves collecting recorded vocalizations of different species, which is lengthy and inefficient. |
| John Martinsson | Bird species identification using convolutional neural networks | Accuracy is close to the state-of-the-art and has an advantage over raw spectral data when computational resources are limited. | A common problem when training neural networks is the lack of available training data, which calls for data augmentation techniques. |
| Steve Branson, Grant Van Horn | Bird species categorization using pose normalized deep convolutional nets | The use of meta-data, in addition to the audio data, as a way of increasing classification accuracy. | Error rate on CUB-200-2011 reduced by 30 percent as compared to previous methods. |

III. METHODOLOGY

The main purpose of this experiment is to identify images of birds and classify them into the species concerned, considering the following objectives:

• Bird species identification is a major concern for ornithologists.
• The study of bird species resources for the sake of protection cannot be ignored.
• Protecting bird species resources will provide our nation with prestige and value.

A methodology block diagram is displayed in Figure 1. The method shown is made up of five processes: (A) Input/Upload image; (B) Pre-processing; (C) Deep learning; (D) Classification; and (E) Evaluation.

Figure 1: Bird Identification sequence.


A. Browse/upload Image

Caltech-UCSD Birds 200 (CUB-200-2011) is a well-known bird image dataset with 200 categories. The dataset includes birds found mostly in North America [6]. Caltech-UCSD Birds 200 consists of 11,788 pictures and annotations such as 312 binary attributes, 15 component positions and 1 bounding box.

B. Pre-processing

Pre-processing produces a gray scale image dataset that is used for pixel-by-pixel image recognition and image size reduction [2]. Then, these features are aggregated and forwarded to the classifier. This shortens processing time while retaining the quality of the image.

C. Deep learning

The input file is fed to the system and forwarded to the DCNN, where a suitable dataset is coupled with the CNN [2]. A DCNN is composed of multiple convolution layers. Different alignments or features such as head, body, color, beak, shape, and the whole bird image are considered to give maximum classification accuracy. Each alignment is passed through a deep network to extract multiple features.

D. Classification

In this analysis, the GoogLeNet framework has been used to identify the images. TensorFlow is an open-source software library created by Google. It allows developers to monitor each neuron (node) in order to adjust the parameters to achieve the desired output [12]. TensorFlow has a number of built-in image classification libraries. TensorFlow [3] produces an autograph consisting of a sequence of processing nodes, and the dataset is retrained to achieve greater recognition accuracy using retrain.py in Google Colab [13]. The GoogLeNet framework plots the curves for training loss and testing accuracy.

E. Evaluation

To produce possible results, the input is compared with the trained dataset. An autograph, consisting of nodes that ultimately form a network, is created during classification. A score sheet is generated on the basis of this network, and the result is produced with the aid of the score sheet output [4].

IV. EXPERIMENTAL RESULTS

This section reports the experiment on the available Caltech-UCSD Birds 200 (CUB-200-2011) image dataset and the evaluation results. The experimental analysis is carried out using the TensorFlow library on the Ubuntu 16.04 operating system, with an NVIDIA GeForce GTX 680 with 2 GB of memory. The experimental classification results for bird species using deep learning were obtained on the following data [7] [11]: Number of categories: 200; Number of images: 11,788; Annotations per image: 15 Part Locations, 312 Binary Attributes, 1 Bounding Box.

Figure 2: Caltech-UCSD Dataset
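
The pre-processing step of Section III-B (gray scale conversion followed by image size reduction) can be illustrated with a minimal sketch. The paper does not state which conversion formula or reduction method was used, so the BT.601 luma weights and the block-averaging reduction below are assumptions, and the function names and the tiny 4 x 4 stand-in image are hypothetical.

```python
def to_grayscale(rgb):
    """Convert rows of (r, g, b) pixels into rows of gray levels."""
    # Assumed BT.601 luma weights; the paper does not state its formula.
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb]


def downsample(gray, factor):
    """Shrink an image by averaging factor x factor pixel blocks (assumed method)."""
    height = len(gray) - len(gray) % factor        # crop so blocks tile exactly
    width = len(gray[0]) - len(gray[0]) % factor
    out = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            block = [gray[y][x]
                     for y in range(top, top + factor)
                     for x in range(left, left + factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# Hypothetical 4 x 4 stand-in for an uploaded bird image.
image = [[(x * 60, y * 60, 128) for x in range(4)] for y in range(4)]
small = downsample(to_grayscale(image), factor=2)
print(len(small), len(small[0]))  # 2 2
```

Averaging over blocks keeps every source pixel's contribution while cutting the pixel count by the square of the factor, which is one plausible way to reduce the amount of data processed while preserving the overall appearance of the image.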


Figure 3 shows the process by which the bird is detected from the image. It consists of the following steps:

• First, upload the image and then consider the various alignments such as head, body, color, beak and whole image from that file.
• Each alignment is passed through a deep convolutional network to extract features from multiple layers of the network.
• The image will be taken into account after its representation.
• Finally, the result of the classification will be generated (i.e. the characteristics are aggregated and transferred to the classifier) and the bird species will be found.

Figure 3: Bird detection process from image.

In Figure 4, each example image is shown with a rough outline in green. To the right of each picture is a table of attributes (one per row, 11 out of a total of 25 attributes displayed) and attribute values given by Amazon Mechanical Turk workers looking at the picture [6].

Amazon Mechanical Turk (MTurk) provides a wide environment where some work is done using human intelligence. It is based on the idea that there are still many tasks that people can do more effectively than machines, such as recognizing objects in a picture or video, de-duplicating data, transcribing audio recordings or researching data details [10]. Tasks like this have historically been done by recruiting a large temporary workforce (which is time consuming, expensive and difficult to scale).

Figure 4: Images and annotations from CUB-200

The result is described in Figure 5 as a bar chart. It reveals that a Flycatcher is identified with 94 percent accuracy, and that other species in the CUB-200-2011 database are classified with accuracies of up to 100 percent.

Figure 5: Score Sheet
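
A score sheet of the kind shown in Figure 5 can be sketched as a ranked list of per-species scores. The paper does not specify how its scores are computed, so this sketch assumes that the raw outputs of the network's final layer are normalized with a softmax and sorted; the function name `score_sheet`, the species labels other than Flycatcher, and the raw scores are all hypothetical.

```python
import math


def score_sheet(logits, labels, top_k=3):
    """Turn raw network outputs into a ranked list of (label, probability)."""
    peak = max(logits)                           # subtract max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]            # softmax: probabilities sum to 1
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Hypothetical labels and raw scores, for illustration only.
labels = ["Flycatcher", "Warbler", "Sparrow", "Gull"]
logits = [4.0, 1.5, 0.5, -1.0]
for name, prob in score_sheet(logits, labels):
    print(f"{name}: {prob:.1%}")
```

The predicted species is then the first entry of the ranked list, and the remaining entries indicate which species the network considered as alternatives.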


The average accuracy of all 7 classes is 92.90%, which is shown in the bar chart (specific bird classification accuracy).

After analyzing the score sheet, it is observed that the accuracy obtained is lower when using a single parameter. But if a mixed approach (Figure 4) is used, accuracy increases by observing parameters such as pose, wings, color, beak, head, etc.

We test our experiments in a GPU cluster in which each node is fitted with 2 NVIDIA GeForce GTX 680 dual GPU chips, 2 Intel Xeon E5-2630 8-core processors and 128 GB of RAM. A 6 GB/s InfiniBand network performs inter-node communication. The DCNN architectures and their training are implemented in TensorFlow 2, operating on CUDA 7.5 and using cuDNN 5.1.3 primitives for improved performance [2].

Figure 6: Time and test accuracy when using different numbers of workers.

The test-set accuracies achievable by each configuration and the final accuracies are shown in Figure 6. The computation graph variables are stored in RAM to ensure fair sharing of weights among the model replicas. Thus each GPU executes all of the CNN operations on a separate set of data. For each replica, the measured gradients are averaged before the weight changes occur, which is the graph's synchronization stage.

V. CONCLUSION

This research examined the use of deep learning algorithms to address the problem of bird species identification. The framework used a deep learning algorithm (Unsupervised Learning) for image classification on the Caltech-UCSD Birds 200 dataset, which consists of 200 categories and 11,788 photos. The CUB-200-2011 dataset provides mixed parameters such as pose, wings, color, beak, arms, etc. An initial study of these species could then be carried out using 200 Caltech-UCSD Birds images as test images. By fine-tuning with TensorFlow and GoogLeNet on the Caltech-UCSD Birds 200 dataset, which includes 11,788 images, the experiment achieved an average classification accuracy of 90.93 percent. Future research will concentrate on developing an architecture for real-time use on smartphones.

REFERENCES

[1] Tóth, B.P. and Czeba, B., 2016, September. Convolutional Neural Networks for Large-Scale Bird Song Classification in Noisy Environment. In CLEF (Working Notes) (pp. 560-568).

[2] Gavali, Pralhad, and J. Saira Banu. "Deep Convolutional Neural Network for Image Classification on CUDA Platform." In Deep Learning and Parallel Computing Environment for Bioengineering Systems, pp. 99-122. Academic Press, 2019.

[3] Pradelle, B., Meister, B., Baskaran, M., Springer, J. and Lethin, R., 2017, November. Polyhedral Optimization of TensorFlow Computation Graphs. In 6th Workshop on Extreme-scale Programming Tools (ESPT-2017)


at The International Conference for High Performance Computing, Networking, Storage and Analysis (SC17).

[4] Prof. Pralhad Gavali, Ms. Prachi Abhijeet Mhetre. Bird Species Identification using Deep Learning. International Journal of Engineering Research & Technology (IJERT), ISSN: 2278-0181, Vol. 8 Issue 04, April-2019, pp. 68-72.

[5] Goering, C., Rodner, E., Freytag, A., Denzler, J., "Nonparametric Part Transfer for Fine-grained Recognition", IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2014.

[6] Fagerlund, S., 2007. Bird species recognition using support vector machines. EURASIP Journal on Applied Signal Processing, 2007(1), pp.64-64.

[7] Wah, C., Van Horn, G., Branson, S., Maji, S., Perona, P., Belongie, S., "Similarity Comparisons for Interactive Fine-Grained Categorization", IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2014.

[8] Cireşan, D., Meier, U. and Schmidhuber, J., 2012. Multi-column deep neural networks for image classification. arXiv preprint arXiv:1202.2745.

[9] Andreia Marini, Jacques Facon and Alessandro L. Koerich. Bird Species Classification Based on Color Features. Postgraduate Program in Computer Science (PPGIa), Pontifical Catholic University of Paraná (PUCPR), Curitiba PR, Brazil 80215–901.

[10] Andrei-Petru Bărar, Victor-Emil Neagoe, Nicu Sebe. Image Recognition with Deep Learning Techniques. Faculty of Electronics, Telecommunications & Information Technology, Polytechnic University of Bucharest.

[11] http://www.vision.caltech.edu/visipedia/CUB-200.html

[12] https://www.tensorflow.org/datasets/catalog/overview

[13] https://colab.research.google.com/notebooks/gpu.ipynb

[14] Elias Sprengel, Martin Jaggi, Yannic Kilcher, and Thomas Hofmann. Audio Based Bird Species Identification using Deep Learning Techniques. 2016.

[15] Forrest Briggs, Yonghong Huang, Raviv Raich. The 9th annual MLSP competition: New methods for acoustic classification of multiple simultaneous bird species in a noisy environment. IEEE International Workshop on Machine Learning for Signal Processing, MLSP, 2013.
