
Acoustic Detection of Drone

Presented by: Jasvinder Singh (20116040) and Lalit Kumar (20116050)
Supervised by: Dr. Ekant Sharma
Department of Electronics and Communication Engineering

Introduction

• In recent years, usage of Unmanned Aerial Vehicles (UAVs) has increased, with a wide range of applications spanning photography, surveillance and video monitoring.
• However, drones are often engaged in malicious activities: armed drones can harm individuals, and drones can be used for spying.
• Therefore, to address such concerns, we aim to propose a deep learning framework using a Convolutional Neural Network (CNN) that detects an incoming drone using its acoustic signature.

Figure 1. Drone

Mel Spectrogram

• A Mel spectrogram is a representation of the spectrum of an audio signal in which the frequency range is divided into a set of bands spaced according to the mel scale.
• The mel scale is a non-linear scale, designed to map perceived differences in pitch to differences in frequency.
• Mel spectrograms can be used to identify patterns and features that are relevant for analysis.
• Optimization of Mel spectrograms:
  • Normalization of spectrograms
  • Data augmentation
  • Conversion to grayscale
  • Use of transfer learning

Figure 3. Mel Spectrogram of a) Noise b) Drone sound
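The conversion described above can be sketched end-to-end in plain numpy. The sample rate, FFT size, hop length and band count below (16 kHz, 512, 256, 40) are illustrative assumptions, not the project's actual settings:

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale: equal mel steps approximate equal pitch steps
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters whose centres are evenly spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)   # falling slope
    return fb

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Segment the signal into windows, take the power spectrum of each,
    # then project the spectra onto the mel filterbank
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft] * np.hanning(n_fft)
        frames.append(np.abs(np.fft.rfft(frame)) ** 2)
    spec = np.array(frames).T                         # (n_fft//2+1, n_frames)
    return mel_filterbank(n_mels, n_fft, sr) @ spec   # (n_mels, n_frames)

# One second of a 1 kHz tone at 16 kHz, as a stand-in for a drone clip
sr = 16000
t = np.arange(sr) / sr
mel = mel_spectrogram(np.sin(2 * np.pi * 1000 * t), sr=sr)
print(mel.shape)  # (40, 61)
```

In practice a library routine such as `librosa.feature.melspectrogram` does the same job; the sketch only makes the windowing, FFT and mel-filterbank steps explicit.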

Convolutional Neural Network

• A CNN is a type of deep learning model, primarily used for image recognition and classification.
• CNNs are effective at identifying spatially related features, such as specific patterns in images.
• The convolutional layer is the first layer of a CNN. It has a set of kernels which, on convolving over the input image, produce a feature map, extracting relevant features such as edges and lines.
• The pooling layer is the second layer of a CNN. It down-samples the output of the first layer, reducing the spatial dimensions of the input while preserving essential information.
• Adaptive pooling is generally preferred, as a fixed-size pooling kernel might lead to loss of information from the feature map.
• Adaptive pooling also reduces the number of parameters, giving faster model training and lower memory requirements.
• The fully connected layer is the last layer. It uses the output of the previous layers to make predictions about the input image.
• Different activation functions can be used in different layers of a neural network to achieve the desired behaviour and performance.

Figure 2. Convolving a kernel of 3x3

Model Training

The original dataset is divided into 70% for training, 15% for validation and 15% for testing the model. The model is then trained on the training dataset while its accuracy is checked on the training and validation datasets. When the accuracy on the validation dataset stops improving for 10 epochs, training is stopped to avoid overfitting, and the model is tested on the test dataset.

Graph 1. Finding number of epochs for training model
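The convolution and adaptive-pooling steps above can be sketched in numpy. The 3x3 edge kernel and the toy 40x61 "spectrogram" are illustrative assumptions; the adaptive pool follows the same cell-averaging rule as PyTorch's `AdaptiveAvgPool2d`:

```python
import numpy as np

def conv2d(image, kernel):
    # Valid convolution: slide the kernel over the image; each dot
    # product becomes one entry of the feature map
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def adaptive_avg_pool(fmap, out_h, out_w):
    # Average over a grid of cells so the output size is fixed
    # regardless of the feature map's size (unlike a fixed-size kernel)
    in_h, in_w = fmap.shape
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r0, r1 = (i * in_h) // out_h, -(-(i + 1) * in_h // out_h)
            c0, c1 = (j * in_w) // out_w, -(-(j + 1) * in_w // out_w)
            out[i, j] = fmap[r0:r1, c0:c1].mean()
    return out

# A vertical-edge kernel applied to a toy "spectrogram" image
image = np.zeros((40, 61))
image[:, 30:] = 1.0                      # sharp vertical edge
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], float)
fmap = conv2d(image, sobel_x)            # (38, 59) feature map
pooled = adaptive_avg_pool(np.maximum(fmap, 0), 4, 4)  # ReLU, then pool
features = pooled.flatten()              # 16 fixed inputs to the FC layer
print(fmap.shape, features.shape)
```

Because the pooled output is always 4x4 here, the fully connected layer that follows sees the same number of inputs no matter how large the spectrogram is, which is the practical benefit of adaptive pooling noted above.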

Methodology

Figure 4. Audio signal    Figure 5. Mel Spectrogram    Figure 6. CNN architecture    Figure 7. Model Output (Drone / Not Drone)

• Obtaining data: The audio dataset of drone sounds is obtained from a GitHub repository [2]. It contains mono audio clips of drone sounds of one-second duration. This data is used for training the model.
• Data pre-processing: The audio files are converted into Mel spectrograms. This involves segmenting the audio into small windows and performing a Fast Fourier Transform to obtain the frequency spectrum.
• Training the CNN: The CNN is trained on the pre-processed audio data. Various features are extracted from the generated spectrograms by the algorithm to train the deep learning models. The model learns to identify patterns that are unique to a drone.
• Testing the model: The model is tested first on part of the audio dataset obtained from GitHub to evaluate its accuracy in detecting drones, and then on drone audio obtained from the field-testing dataset.
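The 70/15/15 split and the 10-epoch early-stopping rule can be sketched as follows; the simulated validation curve is illustrative, not the project's actual training history:

```python
import random

def split_dataset(items, train=0.70, val=0.15, seed=0):
    # Shuffle, then slice into 70% train / 15% validation / 15% test
    items = items[:]
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train, n_val = int(n * train), int(n * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

def early_stopping(val_accuracies, patience=10):
    # Stop once validation accuracy has not improved for `patience` epochs;
    # return the epoch of the best model and the epoch training stopped
    best, best_epoch = -1.0, 0
    for epoch, acc in enumerate(val_accuracies):
        if acc > best:
            best, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_accuracies) - 1

train_set, val_set, test_set = split_dataset(list(range(1000)))
print(len(train_set), len(val_set), len(test_set))  # 700 150 150

# Simulated validation-accuracy curve that plateaus after epoch 5
curve = [0.5, 0.6, 0.7, 0.75, 0.78, 0.80] + [0.79] * 15
best_epoch, stopped = early_stopping(curve)
print(best_epoch, stopped)  # 5 15
```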

Result

The accuracy obtained on the dataset from the GitHub repository is better than that obtained on the dataset derived from the field-testing videos. The recall of 35% indicates that only 35% of drones were detected by the model.

Table 1. Detection Results

Discussion

The difference in the model's performance on the two datasets can be attributed to the lack of a clear acoustic signature. The audio in the initial dataset is clean and has negligible noise, while in the field-testing dataset the audio is unclear and the noise is comparable to the drone audio. Moreover, when the drone flies far from the recording source, its sound is very weak. A better microphone could improve detection accuracy.
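Recall here is the standard detection metric; a minimal sketch (the counts below are illustrative, not the project's actual confusion-matrix numbers):

```python
def recall(true_positives, false_negatives):
    # Fraction of actual drone clips that the model detected
    return true_positives / (true_positives + false_negatives)

# E.g. a recall of 35% corresponds to detecting 35 of 100 drone clips
print(recall(35, 65))  # 0.35
```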
Future Work

In our future work, we will carry out real-time field testing using a parabolic microphone for better capture of the drone's acoustic signature. Weak sound signals need to be amplified for better identification. Combinations of multiple audio clips from different drones, mimicking a swarm attack, can also be used to train and test the model.

References

[1] S. Al-Emadi, A. Al-Ali, A. Mohammad and A. Al-Ali, "Audio Based Drone Detection and Identification using Deep Learning," 2019 15th International Wireless Communications & Mobile Computing Conference (IWCMC), Tangier, Morocco, 2019, pp. 459-464, doi: 10.1109/IWCMC.2019.8766732.
[2] [Online]. Available: https://github.com/saraalemadi/DroneAudioDataset
[3] Bernardini, Andrea & Mangiatordi, Federica & Pallotti, Emiliano & Capodiferro, Licia. Drone detection by acoustic signature. doi: 10.2352/ISSN.2470-1173.2017.10.IMAWM-168.

Rushil Motwani (20116083)
Sanchit Garg (20116086)
