6 authors, including: Nuwan Kuruwitaarachchi (Sri Lanka Institute of Information Technology)
All content following this page was uploaded by Nuwan Kuruwitaarachchi on 07 December 2018.
Abstract—Illegal logging has been identified as a major problem in the world, which may be minimized through effective monitoring of forest covered areas. In this paper, we propose and describe the initial steps to build a new three-tier architecture for Forest Monitoring based on a Wireless Sensor Network and Chainsaw Noise Identification using a Neural Network. In addition to the detection of chainsaw noises, we also propose methodologies to localize the origin of the chainsaw noise.

Index Terms—illegal logging, chainsaw, forest monitoring, wireless sensor network, neural network, audio identification, deforestation.

I. INTRODUCTION

[...] Forest Monitoring WSN Architecture that addresses the above concern.

II. SYSTEM DESIGN

[Figure: system design overview showing Listening Posts (Arduino) surrounding a central Sound Processor in Forest (Raspberry Pi)]
powered by solar panels with battery packs as backup power systems.

TABLE I
HARDWARE DEVICES USED

Platform                 Additional Devices
Arduino Uno              nRF24L01+ WiFi Module
                         nRF24L01+ WiFi Module Base
                         DS3231 Real Time Clock Module
                         ADMP401 MEMS Microphone
                         Solar Panel
                         Battery Pack
Raspberry Pi 3 Model B   nRF24L01+ WiFi Module
                         nRF24L01+ WiFi Module Base
                         DS3231 Real Time Clock Module
                         Texas Instruments CC1310 Module
                         Solar Panel
                         Battery Pack

III. COMMUNICATION ARCHITECTURE

The Communication Architecture is the part of the System Architecture focused on utilizing different hardware and software components to transmit and receive data on the different IoT platforms defined in the System Architecture. Ideally, such data transmissions between devices should be reliable and have considerable range while keeping the associated legal/licensing [...]

In case 1, a radio link must be constantly maintained between each Listening Post and the Sound Processor to exchange the recorded sounds for analysis and sound localization. It is important that the quality of the transferred sounds remains adequate for analysis. However, since Listening Posts are low-power devices, it is also important that the selected communication platform has low power consumption. Therefore, a low-cost, low-power, high-speed communication chip based on the nRF24L01+ 2.4 GHz RF Transceiver (Fig. 4) has been selected as the preferred technology. This chip is capable of transmitting data at speeds up to 2 Mbps [9] and has a range of up to 1000 m line of sight with improvements [10].

In case 2, a radio link must be occasionally established between a Sound Processor and the Base Station to transmit data regarding any potential illegal logging activity detected by the Sound Processor. Since this link only needs to carry a limited set of characters indicating the type of illegal activity and the predicted coordinates of the origin of the sound, high speed is not important. However, since Sound Processors are located deep within a forest while the Base Station remains outside it, there is a considerable distance between each Sound Processor and the Base Station; hence, range becomes an important factor. Again, Sound Processors are low-powered devices, so it is important that the selected communication platform has low power consumption. To fulfill these requirements, the Texas Instruments CC1310 (Fig. 5) [11] communication module has been selected as the preferred technology.
Fig. 5. Texas Instruments CC1310 LaunchPad

IV. SOFTWARE ARCHITECTURE

The software architecture of the proposed system can be divided into four main sections: the Sound Recognition Module, the Sound Localization Module, System Reliability and the Real-time Web Application. Fig. 6 presents a high-level view of the overall software architecture of the proposed system.

A. Sound Recognition Module

1) Sound Identification Process: This operation identifies incoming sound signals from the environment. To fulfill the requirements of the device, we tried two approaches: an Audio Fingerprinting Mechanism and a Deep Neural Network (DNN) for Sound Identification.

The first approach was an Audio Fingerprinting Mechanism [12]. This mechanism repeatedly applies the FFT over small windows of time in the audio samples to create a spectrogram of the audio. Based on these spectrograms, it stores combinations of frequency-time data in a database, and incoming sound signals are compared against this data-set. In the worst case, it compares exact values against the stored data. Therefore, in order to increase the accuracy of the audio identification process, it needs to maintain a huge data-set inside the database. Since the Sound Processor has limited storage capacity, this approach was not a suitable solution to the problem.

The second approach was to use a Deep Neural Network (DNN) with a proper data-set [13]. This module mainly focuses on identifying chainsaw sounds in the forest; using a properly trained DNN, the device can identify chainsaw sounds with higher accuracy.

2) Data-sets: One of the main problems with training a Deep Neural Network in a supervised manner is the amount of computational effort and labeled data required for efficient learning. The authors were not able to find a proper data-set of chainsaw sounds available publicly, and therefore had to collect one. Our team was able to gather more than 100 chainsaw audio clips from an actual rain-forest, recorded at various distances and directions with natural obstacles in between.

3) Experiment Setup: We used the Fourier Transform to convert our audio data to the frequency domain. This allows a much simpler and more compact representation of the data, which is exported as a spectrogram (Fig. 7). This process gives us an image file containing the evolution of all the frequencies of the audio through time. Time is on the x-axis and frequency on the y-axis; the highest frequencies are at the top and the lowest at the bottom. The scaled amplitude of each frequency is shown in grey-scale, with white being the maximum and black the minimum.

The next step is to deal with the length of the spectrogram. We create fixed-length slices of the spectrogram and treat them as independent samples representing the audio. We use square slices for convenience, which means cutting the spectrogram into 128 x 128 pixel slices, each representing 2.56 s worth of data. After slicing all audio into square spectral images, we can train a Deep Neural Network to identify these samples. For this purpose, we have used TensorFlow [14].

The CNN takes the wave pattern in as a spectrogram and classifies it based on "weights" and "biases" that must have correct values for the prediction to work well. Each "neuron" in a CNN computes a weighted sum of all of its inputs, adds a constant called the "bias" and then feeds the result through a non-linear activation function. In this CNN we used softmax as the activation function for the last layer. Training is performed on the CNN using batches of 100 sound clips. The network has three Convolutional layers, three ReLU layers, three Pooling layers and one Fully Connected layer with 50% dropout.

Considering the available hardware resources and power consumption limitations, we decided to use the Berry Conda package and environment management system [15] to deploy this module on the Raspberry Pi. Berry Conda is a conda-based Python distribution for the Raspberry Pi. With it, we can install and manage a scientific or PyData stack on the Raspberry Pi using conda, all without compiling a single package.

4) Test Results: The module was evaluated with test data to check its accuracy. Fig. 9 and Fig. 10 depict the loss and accuracy of the audio identification module. The lower the loss, the better the model (unless the model has over-fitted to the training data). The loss is calculated on the training and validation sets, and its interpretation is how well the model is doing on these data.
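The slicing step from the Experiment Setup above can be sketched as follows. The spectrogram is modeled as a plain list of 128-bin frequency columns, and the 0.02 s per column is inferred from 128 columns spanning 2.56 s; it is an inference, not a value the paper states.

```python
# Hedged sketch of the spectrogram slicing step described above.
# Each column is one STFT frame holding 128 frequency bins.

SLICE = 128                          # slice width and height, in pixels
SECONDS_PER_COLUMN = 2.56 / SLICE    # 0.02 s of audio per column (inferred)

def slice_spectrogram(columns, size=SLICE):
    """Split spectrogram columns into fixed-length square slices, treating
    each slice as an independent sample; a trailing partial slice is dropped."""
    n_full = len(columns) // size
    return [columns[i * size:(i + 1) * size] for i in range(n_full)]

# A dummy 10 s spectrogram: 500 columns of 128 bins each (all zeros).
dummy = [[0.0] * SLICE for _ in range(500)]
slices = slice_spectrogram(dummy)
print(len(slices))                        # 3 full slices (500 // 128)
print(len(slices[0]), len(slices[0][0]))  # 128 128
print(SECONDS_PER_COLUMN * SLICE)         # 2.56 seconds of audio per slice
```

Dropping the trailing partial slice keeps every training sample the same shape, which is what a fixed-input CNN such as the one described above requires.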
[Fig. 6. High-level view of the overall software architecture (diagram). Labels include: Web Application, REST API, Web Socket API, Remote Database; Sound Processor: Main Module, Python Interpreter, Audio Identification Module, Sound Localization Module (TDOA, 2D Multilateration Module), Cloud Pusher, NRF24L01+ Radio Module; Listening Post: NRF24L01+ Module, Message Composer, Protothreading Module, Radio Transmitter, Audio Transmitter; received audio (WAV format) + timestamp]

[Fig. 9. Plot with Loss: loss vs. epoch (0 to 350), falling from about 2.6 to under 1.0]

Loss is not expressed as a percentage, unlike accuracy; it is a summation of the errors made for each sample in the training or validation sets. The loss graph converges to a value of approximately 0.98. According to the regression line, the module could reduce the loss further with more epochs. However, the accuracy of the trained module is very high: out of 50 test samples, the module was able to identify 48 correctly.
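To make the relationship between the two reported metrics concrete, here is a minimal sketch of cross-entropy loss (a summed per-sample error) versus accuracy (the fraction of correct argmax predictions). The toy predictions are invented for illustration; only the 48-of-50 figure comes from the test results above.

```python
import math

# Hedged sketch: loss sums per-sample errors, accuracy only counts whether
# the highest-probability class is right. Toy values, not the paper's data.

def cross_entropy(pred_probs, true_idx):
    """Per-sample cross-entropy: -log(probability assigned to the true class)."""
    return -math.log(pred_probs[true_idx])

def accuracy(preds, labels):
    """Fraction of samples whose highest-probability class matches the label."""
    correct = sum(1 for p, y in zip(preds, labels)
                  if max(range(len(p)), key=p.__getitem__) == y)
    return correct / len(labels)

# Two-class toy batch: [P(not chainsaw), P(chainsaw)]
preds = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
labels = [1, 0, 0]                    # third sample is misclassified
print(round(accuracy(preds, labels), 3))   # 0.667
print(round(sum(cross_entropy(p, y) for p, y in zip(preds, labels)), 3))  # 1.532
print(48 / 50)                        # 0.96, the reported test accuracy
```

Note how the third sample is confidently wrong: it barely moves the accuracy but dominates the summed loss, which is why loss can stay high while accuracy looks good.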
[Fig. 10. Plot with Accuracy: accuracy vs. epoch, axis from roughly 0.7 to 1.0]

B. Sound Localization

Sound source localization (SSL) must be performed in an outdoor environment, without microphone arrays, and within this specific architecture. The output of this component is the location where logging is happening. To arrive at a solution, we tried out new and existing methods to find which fitted our purpose and architecture best, and finally settled on TDOA combined with Multilateration as the solution for SSL. TDOA is basically about the time difference of received [...]

[...] and vice versa. An FFT rapidly computes such transformations by factorizing the DFT matrix into a product of sparse (mostly zero) factors. As a result, it manages to reduce the complexity of computing the DFT from O(n^2), which arises if one simply applies the definition of the DFT, to O(n log n), where n is the data size.
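The TDOA-with-multilateration idea described above can be illustrated with a toy solver. The receiver layout with 50 m spacing, the speed of sound, and the brute-force grid search are all illustrative assumptions, not the system's actual solver.

```python
import math

# Hedged 2D TDOA + multilateration sketch (stdlib only). A coarse grid
# search stands in for a real multilateration solver; all values assumed.

C = 343.0  # assumed speed of sound in air, m/s

def range_diffs(src, receivers):
    """Range differences |src - r_i| - |src - r_0| for i >= 1; multiplying
    the TDOAs by C yields exactly these quantities."""
    d0 = math.dist(src, receivers[0])
    return [math.dist(src, r) - d0 for r in receivers[1:]]

def locate(tdoas, receivers, span=100.0, step=1.0):
    """Pick the grid point whose range differences best match C * tdoas."""
    target = [C * t for t in tdoas]
    best, best_err = None, float("inf")
    n = int(span / step)
    for ix in range(n + 1):
        for iy in range(n + 1):
            p = (ix * step, iy * step)
            err = sum((a - b) ** 2
                      for a, b in zip(range_diffs(p, receivers), target))
            if err < best_err:
                best, best_err = p, err
    return best

# Four listening posts on a 50 m square (assumed layout); source at (30, 40).
rx = [(0.0, 0.0), (50.0, 0.0), (0.0, 50.0), (50.0, 50.0)]
tdoas = [d / C for d in range_diffs((30.0, 40.0), rx)]  # ideal noise-free TDOAs
print(locate(tdoas, rx))   # (30.0, 40.0) recovered on this grid
```

With noise-free TDOAs the true source is recovered exactly because it lies on the grid; a real deployment would use a least-squares multilateration solver and must cope with clock synchronization error between listening posts.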
3) Noise Filtering: Basically, it attenuates selected frequencies by 6 decibels from their initial intensity. A high pass filter is