
Studies in Computational Intelligence 885

Vinit Kumar Gunjan


Jacek M. Zurada
Balasubramanian Raman
G. R. Gangadharan   Editors

Modern Approaches
in Machine
Learning and
Cognitive Science:
A Walkthrough
Latest Trends in AI
Studies in Computational Intelligence

Volume 885

Series Editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes new develop-
ments and advances in the various areas of computational intelligence—quickly and
with a high quality. The intent is to cover the theory, applications, and design
methods of computational intelligence, as embedded in the fields of engineering,
computer science, physics and life sciences, as well as the methodologies behind
them. The series contains monographs, lecture notes and edited volumes in
computational intelligence spanning the areas of neural networks, connectionist
systems, genetic algorithms, evolutionary computation, artificial intelligence,
cellular automata, self-organizing systems, soft computing, fuzzy systems, and
hybrid intelligent systems. Of particular value to both the contributors and the
readership are the short publication timeframe and the world-wide distribution,
which enable both wide and rapid dissemination of research output.
The books of this series are submitted to indexing to Web of Science,
EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink.

More information about this series at http://www.springer.com/series/7092


Vinit Kumar Gunjan · Jacek M. Zurada ·
Balasubramanian Raman · G. R. Gangadharan
Editors

Modern Approaches
in Machine Learning
and Cognitive Science:
A Walkthrough
Latest Trends in AI

Editors

Vinit Kumar Gunjan
Department of Computer Science and Engineering
CMR Institute of Technology
Hyderabad, Telangana, India

Jacek M. Zurada
Department of Electrical and Computer Engineering
University of Louisville
Louisville, KY, USA

Balasubramanian Raman
Department of Computer Science and Engineering
Indian Institute of Technology Roorkee
Roorkee, Uttarakhand, India

G. R. Gangadharan
Department of Computer Applications
National Institute of Technology
Tiruchirappalli, Tamil Nadu, India

ISSN 1860-949X          ISSN 1860-9503 (electronic)
Studies in Computational Intelligence
ISBN 978-3-030-38444-9          ISBN 978-3-030-38445-6 (eBook)
https://doi.org/10.1007/978-3-030-38445-6
© Springer Nature Switzerland AG 2020
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

Today's information and data technologies are advancing rapidly, and machines that
replace humans in such onerous tasks as decision making, data analysis and
optimization are becoming more intelligent and efficient than ever. Machine
learning and cognitive science approaches are the most essential components of this
new wave of intelligent computing. They are driven by innovations in computing
power and rest on the firm foundation of mathematics, statistics and the curation of
large datasets. Last but not least, this rapid progress is aided by the democratization
of software, inexpensive data storage and the vast needs of social platforms that
have spread across the world. Today, in order to succeed, almost every organization
needs to integrate these methods into its business fabric. Nonetheless, these ideas
were out of reach for most organizations until a few years ago.
The purpose of this book is to contribute to a comprehensive knowledge of the
fast-growing area of machine learning and cognitive science research. The editors
aimed to facilitate a cohesive view of the framework for this novel applied
research discipline by focusing on modern approaches in machine learning and
the cognitive sciences and their applications. This book is also intended as a tool for
advancing machine learning and cognitive science studies. It is particularly suitable
for researchers and application scientists in machine learning, the cognitive sciences
and data technologies, and should also serve as a reference for scholars
intending to pursue research in these fields.
This book makes few assumptions about the reader's background, owing to the
interdisciplinary nature of the content. It incorporates fundamental concepts
from statistics, artificial intelligence, information theory and other fields as the
need arises, concentrating on just those main concepts that are most applicable to
machine learning and the cognitive sciences. Through the discussion of a select
number of case studies, this book gives researchers a detailed perspective on the vast
panorama of research directions, in the hope of providing readers with an effective
overview of applied machine learning, cognitive science and related technologies.
This volume consists of 18 chapters, arranged on the basis of their approaches
and contributions to the scope of this book. The chapters present key
algorithms and theories that form the core of the technologies and applications
concerned, covering mainly face recognition, evolutionary algorithms such as
genetic algorithms, automotive applications, automation devices with artificial
neural networks, business management systems and modern speech processing
systems. This book also covers recent advances in medical diagnostic systems,
sensor networks and systems in the VLSI domain. Discussion of learning and
software modules in deep learning algorithms is added wherever suitable.

Hyderabad, India         Dr. Vinit Kumar Gunjan
Louisville, USA          Dr. Jacek M. Zurada
Roorkee, India           Dr. Balasubramanian Raman
Tiruchirappalli, India   Dr. G. R. Gangadharan
Contents

Face Recognition Using Raspberry PI . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Shruti Ambre, Mamata Masurekar and Shreya Gaikwad
Features Extraction for Network Intrusion Detection Using
Genetic Algorithm (GA) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Joydev Ghosh, Divya Kumar and Rajesh Tripathi
Chemical Sensing Through Cogno-Monitoring System
for Air Quality Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Kanakam Prathyusha and ASN Chakravarthy
3 DOF Autonomous Control Analysis of a Quadcopter Using
Artificial Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Sanket Mohanty and Ajay Misra
Cognitive Demand Forecasting with Novel Features Using
Word2Vec and Session of the Day . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Rishit Dholakia, Richa Randeria, Riya Dholakia, Hunsii Ashar
and Dipti Rana
A Curvelet Transformer Based Computationally Efficient Speech
Enhancement for Kalman Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Manju Ramrao Bhosle and K. N. Nagesh
Dexterous Trashbot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Eshwari A. Madappa, Amogh A. Joshi, P. K. Karthik, Ekhelikar Shashank
and Jawali Veeresh
Automated Question Generation and Answer Verification Using
Visual Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Shrey Nahar, Shreya Naik, Niti Shah, Saumya Shah and Lakshmi Kurup
Comprehensive Survey on Deep Learning Approaches
in Predictive Business Process Monitoring . . . . . . . . . . . . . . . . . . . . . . . 115
Nitin Harane and Sheetal Rathi


Machine Learning Based Risk-Adaptive Access Control System
to Identify Genuineness of the Requester . . . . . . . . . . . . . . . . . . . . . . . 129
Kriti Srivastava and Narendra Shekokar
An Approach to End to End Anonymity . . . . . . . . . . . . . . . . . . . . . . . . 145
Ayush Gupta, Ravinder Verma, Mrigendra Shishodia
and Vijay Chaurasiya
PHT and KELM Based Face Recognition . . . . . . . . . . . . . . . . . . . . . . . 157
Sahil Dalal and Virendra P. Vishwakarma
Link Failure Detection in MANET: A Survey . . . . . . . . . . . . . . . . . . . . 169
Manjunath B. Talawar and D. V. Ashoka
Review of Low Power Techniques for Neural Recording
Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
P. Brundavani and D. Vishnu Vardhan
Machine Learning Techniques for Thyroid Disease Diagnosis:
A Systematic Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
Shaik Razia, P. Siva Kumar and A. Srinivasa Rao
Heuristic Approach to Evaluate the Performance of Optimization
Algorithms in VLSI Floor Planning for ASIC Design . . . . . . . . . . . . . . 213
S. Nazeer Hussain and K. Hari Kishore
Enhancement in Teaching Quality Methodology by Predicting
Attendance Using Machine Learning Technique . . . . . . . . . . . . . . . . . . 227
Ekbal Rashid, Mohd Dilshad Ansari, Vinit Kumar Gunjan
and Mudassir Khan
Improvement in Extended Object Tracking with the Vision-Based
Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
Ekbal Rashid, Mohd Dilshad Ansari, Vinit Kumar Gunjan
and Muqeem Ahmed
Face Recognition Using Raspberry PI

Shruti Ambre, Mamata Masurekar and Shreya Gaikwad

Abstract In an age where public security is a priority, there is a growing need
for autonomous systems capable of monitoring hotspots to ensure public safety.
Face recognition technology could create an increased level of security, enable busi-
nesses and governments to save money on CCTV monitoring staff, and increase
business productivity by automating processes such as attendance monitoring. In
recent decades, such a system would have been unfeasible to implement due to
cost and technological restraints. This study explores a real-time face recognition
system built from easily attainable components and libraries, such as the Raspberry
PI together with Dlib, the Face Recognition library and the Open Source Computer
Vision Library (OpenCV). It also covers various face recognition machine learning
algorithms. The results show that in real-time applications the system runs at
2 frames per second and recognizes faces despite the Raspberry PI's limitations,
such as low CPU and GPU processing power.

Keywords Face recognition · Haar cascade · OpenCV · Python · Raspberry PI

1 Introduction

The temporal lobe of the brain is responsible for face recognition in humans and
is an essential part of the human perception system. Similarly, in machine learning
systems, facial recognition is a technology capable of autonomously identifying or
verifying a person from a digital image or video in real-time. This is accomplished
by comparing and analyzing patterns such as contours and facial features [1].
Compared to other methods of identification, such as physiological biometrics like
finger and iris scans or behavioral biometrics like voice and signature scans, face
recognition has the advantage of being non-invasive and user-friendly. A real-time
face recognition system can be accessed from almost any location that has a com-
puter with an internet connection. Facial images can be recorded and recognized
from a distance without interacting with a person, which is particularly beneficial for

S. Ambre (B) · M. Masurekar · S. Gaikwad
St. Francis Institute of Technology, Mumbai, India

security and surveillance purposes. Practical applications such as real-time crowd
surveillance, criminal identification, human-computer interface, and the prevention
of unauthorized personnel from accessing restricted areas, etc. are made possible
with face recognition. Furthermore, for government security surveillance, personal
information such as name, address, criminal record, etc. can be obtained by further
analyzing recognition results. Face recognition can also be used for generic pur-
poses in institutions such as schools, shopping malls and other public and private
enterprises for access control and as a content-based database management system.
Face recognition is a two-part procedure: face detection must precede face recog-
nition. Real-time face detection was made attainable by the work of Viola and Jones
[2]. In this paper, the Haar cascade algorithm proposed by Viola and Jones, the
histogram of oriented gradients (HOG) and linear Support Vector Machines (SVM)
are used for face detection, while Geitgey's Face Recognition library for Python and
the command line, OpenCV and Dlib are used for face recognition.
These algorithms can be used with relatively low-cost hardware, i.e. a Raspberry PI
and a Raspberry PI (RPI) camera, making the proposed system cost-effective.
The paper is structured as follows: Sect. 2 covers literature review on substantial
findings within face recognition. Section 3 focuses on the system design, components
and its operation. In Sect. 4, results and analysis have been noted, and finally, Sect. 5
concludes the study.

2 Literature Review

Ishita et al. [3] proposed a system for face detection and recognition using a Raspberry
PI, Principal Component Analysis (PCA) and the Eigenface approach. For detecting
faces, a Haar feature-based cascade classifier is used, which uses positive and negative
images (images with and without faces, respectively) to train the classifier. PCA
is an algorithm that reduces a number of (possibly) correlated variables to a smaller
number of uncorrelated variables. While capturing and training, various images of
positive and negative types are created. To display results, an LCD as well as a command
terminal are used. The LCD shows the name of a person whose face is detected with
a 'present' status, and in the terminal window their name is printed with a 'presentOk'
status.
Ali et al. [4] proposed a system where an LED glows if a face is detected within
a specific range. A Haar classifier based on the Viola-Jones algorithm is used for
face detection. The Eigen features of the face, used for tracking its position, are detected
using MATLAB and a Raspberry PI. The system uses a camera and an array of LEDs;
an LED glows where the face is located, and the system tracks the face within its
commanded limit. The frame is given as an input to the Haar classifier and then to
the Eigenface stage, where the Eigen features detect the eyes, nose and mouth.
This information is passed to a geometric transformation, where pixel adjustment
takes place without modifying the color of the image. By specifying the LED
(ON) condition, the LED glows if the face enters a specific range. However, this
system is limited to a single person's face.
Ayman et al. [5] proposed a system for optimizing face detection and recognition
in real-time using a Raspberry PI, OpenCV and Python. The faces in frames sent by the
camera to the Raspberry PI are detected using the Boosted Cascade of Simple Features
(BCOSF) algorithm and cropped by a Python-based program, and the cropped faces are
routed to one of several computers (Linux based servers) connected via TCP/IP over
an Ethernet connection. The cropped faces are recognized using Local Binary
Pattern Histograms (LBPH). If a face is unfamiliar and the recognition system is
unable to recognize it, the face is passed on to another computer. However, as the image
size is reduced, the image quality degrades, which increases the average error rate.
Factors like lighting, distance and camera resolution also affect the results of the
system. The more servers there are, the more accurate the result will be; the cost and
security between servers should be considered while designing the system.
Priya and Purna [6] used a Raspberry PI, OpenCV, Haar cascade, the LBPH recognizer
and the Viola-Jones framework for face detection and recognition. The total system is
divided into three modules—dataset creation, training the dataset and testing—with
sending alert messages as an extension. In the first part, the dataset is created by taking
the input images, converting them to grayscale and then storing them with an ID. In the
training phase, the LBPH face recognizer is initialized, then the faces and IDs are trained
using LBPH and are later saved as an xml or yml file. In the testing phase, the system
uses the Viola-Jones algorithm for face detection. In this algorithm, each feature
is represented as a single value obtained by subtracting the summation of pixels under
the white rectangles from that under the black ones. As the number of classifiers
increases, the computation becomes more complex; this is overcome by using integral
images. As the classification takes place in stages, the regions that pass through all the
classifiers are detected as faces, and the local binary pattern algorithm is used for face
recognition. The Adaboost machine learning algorithm has been used to cascade the
classifiers and thus increase the efficiency.
Umm-e-Laila et al. [7] proposed a system that uses a Raspberry PI with Python, C++,
Java, MATLAB, etc. for implementation. The program is designed to use all the algorithms
of OpenCV for greater efficiency and speed. The system's graphical user interface
(GUI) has been designed to detect and recognize faces in real-time using a webcam.
After the camera is activated, the system uses algorithms such as LBPH to detect
the faces in real-time. The GUI includes the option of choosing a specific
algorithm before starting face detection. It has been concluded that the Raspberry PI
eliminates machine dependencies, and that the LBPH algorithm gives better results in
terms of accuracy, while the Fisherface algorithm provides better results in terms of time
consumption. It has also been observed that most of the algorithms perform similarly
because of the Raspberry PI's processing power.

3 Proposed System

3.1 System Architecture

The system design in Fig. 1 consists of a PI camera module that is used for real-
time video streaming, capturing input frames and forwarding the frames to the
Raspberry PI. The Raspberry PI detects and recognizes the faces in the frame, with
the resultant output being shown in a real-time video stream on connected devices.
The Raspberry PI is connected via Ethernet, and the devices via Wi-Fi or cellular
data.
The system workflow is as follows:
1. The RPI camera connected to the Raspberry PI live streams video.
2. The Raspberry PI detects faces from captured frames, computes face embed-
dings, compares the vector output to the known database, and labels the
best-matched face based on k-NN classification.
3. The Raspberry PI and the devices are connected over a virtual network using
Virtual Network Computing (VNC), where the Raspberry PI acts as a server and
the devices as clients. The Raspberry PI should be connected to an Ethernet
connection, and the devices must be networked with TCP/IP.
4. Devices connected to the Raspberry PI can see the labels for recognized faces in
a real-time video stream.
5. If a face is not recognized, an unknown label is shown for the detected face.

Fig. 1 System design



3.2 Components

Hardware Includes the Raspberry PI 3 B+ model, the Raspberry PI camera module v1,
an Ethernet cable and a power supply.
Raspberry PI 3 B+ Model. The Raspberry PI is a small and affordable computer that can
be plugged into a monitor or an LCD and used with a keyboard and a mouse. It can
be used to build devices that are Internet of Things (IoT) or sensor based. The Raspberry
PI is mainly used for projects that don't require much processing power or storage
space.

The PI 3 Model B makes use of a 1.2 GHz 64-bit quad-core processor and 1 GB of
RAM. It has onboard Wi-Fi, an Ethernet port and one microSD slot.
Raspberry PI camera module v1. The Raspberry PI camera module can record videos
and capture still images. The camera module is compatible with all Raspberry PI
models. It has a native resolution of 5 MP; the sensor resolution is 2592 × 1944
pixels, and 1080p30, 720p60 and 640 × 480p60/90 video modes are supported.

Software The Python language was used for programming face recognition.


Haar cascade classifier. Viola and Jones proposed a machine learning object detec-
tion algorithm known as the Haar cascade to identify objects in images and videos
based on the concept of features [2]. The cascade function is trained on a huge dataset
of negative and positive images. For face detection, the algorithm requires positive
images that contain faces and negative images that do not, which are used to train
the classifier from which features are then extracted.

Every feature is a single value obtained by calculating the difference between the
sums of pixel intensities in adjacent rectangular regions. Due to the quantity of
features used, the integral image concept is applied to prevent an increase in
computation time; it also simplifies pixel calculations (Fig. 2).

Adaboost is used for feature selection, reducing the complexity of the classifiers and
training them. Each feature is then applied to the training images, and for every
feature the best threshold is found to classify the faces as positive or negative.
However, since this is inefficient and time consuming, cascading classifiers are used
to obtain the best features from a face by grouping the features into stages of
classifiers and processing or discarding a face region according to the stages of
features passed.

Fig. 2 Haar classifier patterns [8]
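As a brief, hedged illustration of applying a pretrained Haar cascade in practice, the following OpenCV sketch detects faces in a single image (the image path is illustrative; the cascade file ships with the opencv-python package):

```python
import cv2

# Pretrained frontal-face Haar cascade bundled with opencv-python
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("sample.jpg")                 # illustrative path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # detection runs on grayscale

# scaleFactor and minNeighbors are common starting values, not tuned settings
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
```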
OpenCV. Open Source Computer Vision Library (OpenCV) [9] is a machine
learning library with programming functions for real-time computer vision and image
processing. The OpenCV library consists of many built-in packages for face recognition.
The library includes linear and non-linear image filtering, geometric image transfor-
mations, changing color spaces, smoothing images, image thresholding, histograms,
and so on. OpenCV includes functions for algorithms such as the Haar classifier,
histogram of oriented gradients (HOG), Eigenfaces, Fisherfaces and Local Binary
Patterns Histograms (LBPH).
Histogram of Oriented Gradients. The histogram of oriented gradients is a feature
descriptor used in image processing for object detection. The HOG descriptor tech-
nique counts occurrences of gradient orientations in localized portions of an image:
the detection window, or region of interest (ROI).
The HOG descriptor algorithm is as follows:
1. Divide the image into small connected regions called cells, and for each cell
compute a histogram of gradient directions or edge orientations for the pixels
within the cell.
2. Discretize each cell into angular bins according to the gradient orientation.
3. Each cell’s pixel contributes weighted gradient to its corresponding angular bin.
4. Groups of adjacent cells are considered as spatial regions called blocks. The
grouping of cells into block is the basis for grouping and normalization of
histograms.
5. The normalized group of histograms represents the block histogram, and the set
of these block histograms represents the descriptor [10]. A sketch follows.
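As a sketch of steps 1–5, scikit-image's hog function (an assumption here; the system itself relies on Dlib's HOG-based detector) exposes the bin count, cell size and block size directly:

```python
from skimage import color, io
from skimage.feature import hog

image = color.rgb2gray(io.imread("face.jpg"))   # illustrative path

# 9 angular bins per 8x8-pixel cell; histograms normalized over 2x2-cell blocks
descriptor = hog(image,
                 orientations=9,
                 pixels_per_cell=(8, 8),
                 cells_per_block=(2, 2),
                 block_norm="L2-Hys")
print(descriptor.shape)   # the flattened set of block histograms
```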
Support Vector Machines. The Support Vector Machine (SVM) is a machine learning
algorithm used for classification and regression. It uses supervised learning to group
data into two categories and is trained with a collection of categorized data. The
aim of the SVM algorithm is to determine which category a new data point belongs
to; it should not only classify the data but also draw a margin between the two
categories that is as wide as possible.
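A toy illustration of this maximum-margin idea, using scikit-learn purely for demonstration (the data points are made up):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Two small labelled clusters; the SVM learns the widest separating margin
X = np.array([[0.0, 0.5], [0.3, 0.2], [2.0, 2.2], [2.4, 1.9]])
y = np.array([0, 0, 1, 1])

clf = LinearSVC().fit(X, y)
print(clf.predict([[0.2, 0.1], [2.1, 2.0]]))   # -> [0 1]
```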
k-Nearest Neighbor. k-Nearest Neighbor (k-NN) is known for being a lazy learning
algorithm and is used for classification and regression predictive problems. k-NN is
easy to interpret and has a low calculation time. The k-NN classifier classifies test
points by finding the most similar class among the k closest examples; k is the number
of training set items that are considered for classification. A larger k gives smoother
boundaries and better generalization, but it is important that locality is preserved
(Fig. 3).

Fig. 3 Euclidean distance [11]
Dlib Library. Dlib [12] is an open source C++ toolkit containing machine learning
algorithms for classification, clustering, regression, data transformation and structure
prediction. Its designs are modular and easily executable, and can be used through a
C++ API. It is used to solve real world problems in both industry and academia, with
applications in robotics, embedded devices and face recognition. Dlib's deep metric
learning tool can be used for face recognition; it is a pretrained ResNet model that
recognizes faces with an accuracy of 99.38%.
Face Recognition Library. Face Recognition [13] is a Python library for recognizing
faces, built using Dlib's state-of-the-art face recognition. The library also provides
command line tools, and can be used for finding faces, manipulating facial features
and identifying faces in pictures.
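A minimal sketch of the library's core calls (the image path is illustrative):

```python
import face_recognition

# Load an image and locate faces with the HOG-based detector
image = face_recognition.load_image_file("person.jpg")
boxes = face_recognition.face_locations(image, model="hog")

# One 128-d embedding per detected face, computed via Dlib's deep metric network
encodings = face_recognition.face_encodings(image, known_face_locations=boxes)
print(len(encodings), "face(s) encoded")
```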

3.3 Working

Many algorithms can be used for face detection and face recognition. The algorithms
proposed in this paper for face detection are Haar cascade, HOG and Linear SVM.
For face recognition, the system will make use of Geitgey’s Face Recognition Library
for Python, OpenCV and King’s Dlib library. The open source Dlib library, OpenCV
library and Face Recognition library for Python contain built-in face recognition
algorithms that have been used in this system.
The system consists of three subsystems: dataset creation, dataset training, and
testing.

Dataset Creation A folder containing images of a person is created. The folder
name should be the name of the individual whose photos are contained within.

Dataset Training As per the flowchart in Fig. 4, Dlib's HOG and linear SVM
algorithms are used for face detection. For every image in the dataset, the person's
name is extracted from the path, the image is converted to RGB, and faces are localized
in the image using HOG and linear SVM. The Face Recognition library uses the
Dlib module to compute face embeddings using Dlib's deep metric network. A pickle
with the 128-d embeddings for each face in the dataset, along with their names, is
created.

Fig. 4 Flowchart for dataset training
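A condensed sketch of this training stage, assuming the dataset/<person_name>/*.jpg folder layout described under dataset creation:

```python
import glob
import os
import pickle

import cv2
import face_recognition

known_encodings, known_names = [], []
for path in glob.glob("dataset/*/*.jpg"):            # illustrative layout
    name = os.path.basename(os.path.dirname(path))   # folder name = person name
    rgb = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)

    # HOG + linear SVM localization, then 128-d embeddings via Dlib
    boxes = face_recognition.face_locations(rgb, model="hog")
    for encoding in face_recognition.face_encodings(rgb, boxes):
        known_encodings.append(encoding)
        known_names.append(name)

with open("encodings.pickle", "wb") as f:
    pickle.dump({"encodings": known_encodings, "names": known_names}, f)
```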

Dataset Testing The known 128-d face embeddings are loaded and the video stream
is initialized with OpenCV’s Haar cascade for localizing and detection of faces. The
RPI camera module then starts video streaming to connected devices in real-time.
A frame is captured from the threaded video stream and resized for preprocessing.
OpenCV is then used for converting the input frame to grayscale for face detection
and to RGB for face recognition. Faces in the grayscale frame are then detected
using Haar cascade and bounding box coordinates are returned. Face Recognition
library is then used to compute facial embeddings for each face in the bounding
box. The Face Recognition library makes use of k-NN algorithm for calculating the
Euclidean distance between the candidate facial embedding and the known facial
embeddings of the dataset. This distance shows how similar the faces are. If the
calculated distance is above a tolerance of 0.6, then “True” is returned, indicating the
faces match. Otherwise, false is returned. A list of True/False values, one for each
image in the dataset is returned. The indexes of the images with “True” values are
stored and the name associated with each index is used to count the number of “votes”
for each name. The recognized face with the largest number of votes is selected. This
is done for each detected face in the frame. The predicted name is then used to label
the faces in the frame. The resultant output is video streamed in real-time for other
connected devices to view. See Fig. 5.
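The matching-and-voting step for a single detected face can be sketched as follows (encodings.pickle is the output of the training stage above; 0.6 is the library's default tolerance):

```python
import pickle

import face_recognition

with open("encodings.pickle", "rb") as f:
    data = pickle.load(f)

def label_face(candidate_encoding, tolerance=0.6):
    # True where the Euclidean distance to a known embedding is within tolerance
    matches = face_recognition.compare_faces(data["encodings"],
                                             candidate_encoding,
                                             tolerance=tolerance)
    if not any(matches):
        return "Unknown"
    # One vote per matched dataset image; the most-voted name wins
    votes = {}
    for i, matched in enumerate(matches):
        if matched:
            votes[data["names"][i]] = votes.get(data["names"][i], 0) + 1
    return max(votes, key=votes.get)
```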

4 Results and Discussions

The resulting figures show the output. In Fig. 6, Mamata's, Shreya's and Shruti's
faces are recognized accurately.
In Fig. 7, Mamata’s face is detected and the person whose photos are not in the
dataset is labelled as unknown.

Fig. 5 Flowchart for dataset testing

Fig. 6 Output showing recognized faces

Because of Raspberry PI’s processing and GPU power limitations, the frame rate
was noted to be 1–2 FPS. Due to the low frame rate, an error causing inconsistencies
in facial recognition can be seen in Fig. 8.

Fig. 7 Output showing recognized and unrecognized faces

Fig. 8 Output showing error

5 Conclusion

The system was able to recognize the faces in the video stream in real-time and
label them correctly. The desktop interface of the Raspberry PI was remotely
accessed from multiple devices through Virtual Network Computing (VNC). The
system was cost effective; however, one can only obtain a frame rate of 1–2 frames
per second (FPS). The Raspberry PI, while powerful for such a cheap and small
device, is limited in terms of processing power and memory, especially as it lacks
GPU power. Factors such as lighting, camera resolution and distance affected the
face recognition process as well. Face recognition is the first step in many of its
applications, such as expression analysis, attendance monitoring, security systems,
surveillance and human-computer interfaces.

Acknowledgements We would like to thank our mentor, Prof. Vivian Lobo for his support and
encouragement. We would also like to thank Mr. Thomas Calvert for his diligent proofreading,
which greatly improved the research work.

References

1. Techopedia.com. What is Facial Recognition?—definition from Techopedia. Available at:


https://www.techopedia.com/definition/32071/facial-recognition. Accessed 8 Feb 2019
2. Viola, P., Jones, M.J.: Rapid object detection using a boosted cascade of simple features.
In: 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition,
pp. 511–518. USA (2001). https://doi.org/10.1109/cvpr.2001.990517
3. Ishita, G., Varsha, P., Chaitali, K., Shreya, D.: Face detection and recognition using Rasp-
berry PI. In: IEEE International WIE Conference on Electrical and Computer Engineering
(WIECON-ECE), pp. 83–86. Pune, India (2016). https://doi.org/10.1109/wiecon-ece.
2016.8009092
4. Ali, A.S., Zulfiqar, A.Z., Bhawani, S.C., Jawaid, D.: Real-time face detection/monitor using
Raspberry PI and MATLAB. In: 2016 IEEE 10th International Conference on Application of
Information and Communication Technologies (AICT), pp. 171–174. Baku, Azerbaijan (2016).
https://doi.org/10.1109/icaict.2016.7991743
5. Ayman, A.W., Amir, O.H., Mohammad, J.T., Sajed, Y.H.: Raspberry PI and computers-based
face detection and recognition system. In: 4th International Conference on Computer and
Technology Applications. Istanbul, Turkey (2018). https://doi.org/10.1109/cata.2018.8398677
6. Priya, P., Purna, S.: Classroom attendance using face detection and Raspberry-Pi. Int. Res. J.
Eng. Technol. (IRJET) 05, 167–171 (2018)
7. Umm-e-Laila, A.A., Muzammil, A.K., Muhammad, K.S., Syed, A.M., Khalid, M.: Compar-
ative analysis for a real-time face recognition system using Raspberry PI. In: 2017 IEEE 4th
International Conference on Smart Instrumentation, Measurement and Applications, pp. 1–4.
Putrajaya, Malaysia (2017). https://doi.org/10.1109/icsima.2017.8311984
8. Docs.opencv.org. OpenCV: face detection using haar cascades. Available at: https://docs.
opencv.org/3.4.3/d7/d8b/tutorial_py_face_detection.html. Accessed 8 Feb 2019
9. Opencv.org. OpenCV. Available at: https://opencv.org/. Accessed 8 Feb 2019
10. Software.intel.com. Histogram of oriented gradients (HOG) descriptor. Available at: https://
software.intel.com/en-us/ipp-dev-reference-histogram-of-oriented-gradients-hog-descriptor.
Accessed 8 Feb 2019
11. En.m.wikipedia.org. Euclidean distance. Available at: https://en.m.wikipedia.org/wiki/
Euclidean_distance. Accessed 8 Feb 2019
12. Dlib.net. Dlib C++ library. Available at: http://dlib.net/. Accessed 8 Feb 2019
13. GitHub. Ageitgey/face_recognition. Available at: https://github.com/ageitgey/face_
recognition. Accessed 8 Feb 2019
14. Rosebrock, A.: Raspberry Pi face recognition—PyImageSearch. PyImageSearch. Available at:
https://www.pyimagesearch.com/2018/06/25/raspberry-pi-face-recognition/. Accessed 8 Feb
2019
15. Lienhart, R., Maydt, J.: An extended set of Haar-like features for rapid object detection. In:
Proceedings International Conference on Image Processing, pp. I-900–I-903 (2002). https://
doi.org/10.1109/icip.2002.1038171
Features Extraction for Network
Intrusion Detection Using Genetic
Algorithm (GA)

Joydev Ghosh, Divya Kumar and Rajesh Tripathi

Abstract Nowadays, the Internet has emerged as one of the essential parts of human
life. This increase in the use of the Internet has led to the sharing of large amounts of
data across it. These data are susceptible to attacks from various malicious
users and thus need to be protected. As a result, the Intrusion
Detection System (IDS) has emerged as a widely researched area aimed at preventing
such malicious users from gaining access to these data. Many machine learning
approaches to the detection of such intrusions have been proposed and implemented
on large amounts of data, leading to the design of various intrusion detection systems.
In this paper, a Genetic Algorithm approach is proposed to extract the minimum
number of features required to classify a network packet as normal or as an attack,
using a Multi-Layer Perceptron as the classifier for the final classification. Further,
the NSL-KDD benchmark dataset (a refined version of KDD99) has been used to
predict different types of attacks, owing to the legacy of this dataset family in the
field of intrusion detection. The results show that the accuracy of the IDS using
the features extracted by the proposed algorithm is appreciable.

Keywords Intrusion detection system · Genetic algorithm · Multi-layer
perceptron · NSL-KDD dataset

1 Introduction

With the rapid growth of information and technologies, the way businesses work
has been transformed. This has also led to an increase in the number of associated
security threats from which organizations must be protected.

J. Ghosh (B)
Central Research Laboratory, Bharat Electronics Limited, Bengaluru, India
e-mail: joydevghosh@bel.co.in
D. Kumar · R. Tripathi
Department of Computer Science and Engineering, MNNIT Allahabad, Allahabad, India
e-mail: divyak@mnnit.ac.in
R. Tripathi
e-mail: rajeshtcsed@mnnit.ac.in

Maintaining a high level of security has become essential for safe communication
between different organizational entities. This security can be in terms of either data
security or network security. Various monitoring systems and intrusion detection
systems have been developed for monitoring the events happening in a computer
system or network and analysing the results to detect any signs of intrusion. This is
necessary to attain overall security in terms of authentication, confidentiality, data
integrity and non-repudiation.

1.1 Classification of Intrusion Detection

Intrusion detection [1] can mainly be divided into the following two subdivisions,
based on the component for which the intrusion detection system has been designed:

A. Host Based Intrusion Detection

The main role of a host-based IDS [2] is to monitor the behavior of one or multiple
hosts and look for any malicious actions. Host-based IDSs can further be classified
into four categories: file system monitors, log file analyzers, connection analyzers
and kernel-based IDSs. As a HIDS provides detailed information about the attack,
it is very advantageous. It is also known as a System Integrity Verifier.

B. Network-Based Intrusion Detection

A network-based IDS (NIDS) [3] monitors network communications by gathering
data directly from the packets transferred during communication; for this purpose,
any NIDS is essentially a sniffer. A distributed system architecture can be used for
load balancing in a NIDS.

The work proposed in this paper is based on the network intrusion detection
technique.

1.2 Networking Attacks

Every network attack can be categorized into one of the following attacks:

A. Denial of Service (DoS)

In a DoS [4] attack, the attacker floods the network with multiple requests, causing
the network to overflow and reject requests from genuine users.

B. Remote to User Attacks (R2L)

In R2L [5], the attacker sends packets to a machine over the internet without having
proper access privileges, with the aim of gaining admin privileges and exploiting them.

C. User to Root Attacks (U2R)

In U2R [6], an attacker with local access privileges tries to gain superuser privileges
on the victim machine.

D. Probing

In a probing [7] attack, the attacker scans the system or a remote device to find
vulnerabilities in the machine with the intention of exploiting them.

1.3 Components of Intrusion Detection System

There are mainly three functional components [8] of an intrusion detection system. The
first component is an event generator, which is mainly used for the generation of events
by monitoring the flow of data in different environments. The different types of event
generators are network-based, application-based, host-based and target-based. The
second component is the analysis engine, which uses the following approaches for
analysis:
A. Misuse/Signature-Based Detection

This type of intrusion detection system mainly relies on searching for pat-
terns/signatures of attacks that exploit vulnerabilities in software. The misuse
approach uses several techniques, such as rule-based approaches or expert systems,
signature-based approaches and Genetic Algorithms (GA).
B. Anomaly/Statistical Detection

Using different statistical techniques, this detection engine tries to find patterns of
activity that appear to be abnormal, rare or unusual. In this paper, we have used
Genetic Algorithms for anomaly/statistical detection.

The third component is the response manager, which acts only when inaccuracies
are found on the system.

2 Related Work

Denning in [9] proposed an intrusion detection technique for network attacks.
Afterwards, different soft computing methods, including various Genetic Algorithm
(GA) approaches [10–12], were used for detecting intrusions and for deriving
classification rules of Intrusion Detection Systems. Goyal and Kumar in [13] classify
different types of smurf attacks using Genetic Algorithm based operators. The authors
report a low false positive rate of 0.2% and a detection rate of almost 100%. To
detect network anomalies, Bridges et al. [14] proposed a fuzzy set based approach.

Log file based forensics is dealt with mainly in [15, 16]. Herrerias and Gomez in
[15] have discussed a log correlation model for supporting the evidence search process
in a forensic investigation; their model copes with the complexities arising from
massive numbers of recorded events. Fan and Wang in [16] have used a steganography
based technique for logs, with intrusions then detected on the basis of the alteration
behavior of the logs.
In this paper, we have used the Genetic Algorithm along with some graph-based
methods and artificial neural network models for classification purposes. The Genetic
Algorithm, being a heuristic approach, provides a larger solution base and can be
efficiently used for parallel processing; it is closely related to search and optimiza-
tion problems. Genetic algorithms consider a population of solutions to provide
the optimal solution, as compared to conventional methods, which focus on single
solutions only. For these reasons, we have implemented the genetic
algorithm in the optimization part. The rest of the paper is organized as follows:
Sect. 3 contains the proposed methodology, the process of feature extraction from
the original KDD dataset, the pre-processing of the raw data and classification using
an Artificial Neural Network; Sect. 4 contains the results obtained by implementing
the algorithm; finally, Sect. 5 contains the conclusion and future work.

3 Proposed Methodology

The proposed feature extraction methodology is based on a combination of Genetic
Algorithms and a Multi-Layer Perceptron model. The implementation is done in
Python 3.6 [17], and the NSL-KDD dataset [18] is used to test the accuracy
of the proposed methodology. The system architecture of the proposed model is
represented in Fig. 1. The detailed workflow of the proposed system is described in
the following subsections:

3.1 Preprocessing

Before implementing the proposed work on the entire NSL-KDD dataset, pre-
processing of the dataset needs to be performed. A large volume of redundant
records is a significant weakness of the KDD dataset; this leads to a bias towards the
frequent records and results in a decline in accuracy due to misclassification of
network attacks such as R2L and U2R attacks.

As an initial step, numerical values are substituted for all the string values.
For this substitution, the probabilities of all the unique values in each column are
computed, and the values are then replaced by their respective probability values.

Fig. 1 Flow diagram

This is done to preserve the occurrence of each unique value and to bring the values
into the range 0–1 at the same time.

In the preprocessing stage, the variance of each feature is calculated from the
dataset using Eq. (1), and the features having variance less than the average variance
are removed. These attributes are removed on the grounds that values which do not
show much change over the dataset are effectively constant features and will
not play any significant role in the prediction.


$$\mathrm{Var}(X) = \frac{\sum_{k=0}^{n} (X_k - \mu)^2}{N} \qquad (1)$$

where $X_k$ is the value of each element in a column of the dataset, $\mu = \frac{\sum_{k=0}^{n} X_k}{N}$ is the mean of all the values in that column, and $N$ is the number of
elements in the column.

Algorithm 3.1: Pre-processing (Train data)


Input: KDD Training Dataset Train data (say).
Output: Pre-processed dataset with reduced features.
1. for each column with string values C i ∈ Train_data do
1.1. for each unique element ∈ C i do
1.1.1. Compute the probability Pi j of unique element by dividing
the occurrence of that unique value with the total number
of values in the column;
1.1.2. Replace that unique value with the computed probability
Pi j;
2. Normalize the processed dataset;
3. for each column C i ∈ normalized dataset D do
3.1 Compute the variance of C i using Eq. (1);
4. Remove columns with variance less than the average variance, considering
all columns ∈ D ;
5. Let the final processed dataset produced be F ;
return (F )
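A compact pandas rendering of Algorithm 3.1 (pandas is an assumption here; note that df.var() uses the sample variance, a harmless deviation from Eq. (1) for this filtering step):

```python
import pandas as pd

def preprocess(train_data: pd.DataFrame) -> pd.DataFrame:
    df = train_data.copy()

    # Steps 1.1.1-1.1.2: replace each string value with its relative frequency
    for col in df.select_dtypes(include="object").columns:
        probs = df[col].value_counts(normalize=True)
        df[col] = df[col].map(probs)

    # Step 2: min-max normalization (guarding constant columns)
    span = (df.max() - df.min()).replace(0, 1)
    df = (df - df.min()) / span

    # Steps 3-4: drop columns whose variance is below the average variance
    variances = df.var()
    return df.loc[:, variances >= variances.mean()]
```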

3.2 Intrusion Detection

In the proposed work, a Genetic Algorithm has been used to extract the optimal
features, from those remaining after pre-processing, for the classification of a packet
traveling through a network as an attack or as normal. The rest of this work is
subdivided into the following subsections:

Creation of Initial Population Initially, a 2-dimensional matrix of M rows and N
columns is generated, where M is taken as 50 and N is the number of features left
after removing the features with variance less than the average variance, from which
the best features are to be extracted. The number of features left after this removal
is 17. The initial population of chromosomes is created by taking random vector
representations of features from F as bit strings, where F is the set of features. The
feature set for the ith chromosome is $F_i = \{f_{i1}, f_{i2}, f_{i3}, \dots, f_{iN}\}$
for all $i = 1, 2, \dots, M$. Hence each individual is treated as an individual reduced
feature set. The initial population looks like:
Features Extraction for Network Intrusion … 19
$$\text{initial\_population} = \begin{bmatrix} & f_1 & f_2 & f_3 & f_4 & \dots & f_N \\ ch_1 & 0 & 1 & 0 & 1 & \dots & 0 \\ ch_2 & 1 & 1 & 0 & 0 & \dots & 0 \\ \vdots & & & \vdots & & & \\ ch_M & 0 & 1 & 0 & 0 & \dots & 1 \end{bmatrix}$$

where N is the number of features in F.
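Creating such a population is a one-liner with NumPy (M = 50 and N = 17, as stated above; the seed is illustrative):

```python
import numpy as np

M, N = 50, 17                       # population size, surviving features
rng = np.random.default_rng(seed=0)

# Each row is one chromosome: a bitstring selecting a candidate feature subset
initial_population = rng.integers(0, 2, size=(M, N))
```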

Computation of Fitness Function Each probable solution in the genetic algorithm
is represented as a series of numbers known as a chromosome. In every round
of evaluation, the aim is to generate better solutions (children) from the solutions of
the previous iteration (parents). The better solutions replace the less fit individuals
(candidate solutions of the previous iteration); thus the population converges towards
the optimal solution. Next, we find the correlation of each of the remaining features
with every other feature, and based on a threshold we plot a graph such that each
node denotes a feature and the edge between two features carries the correlation
coefficient between them. The correlation coefficient matrix obtained looks like:

$$\text{correlation\_coefficient} = \begin{bmatrix} & f_1 & f_2 & \dots & f_N \\ f_1 & w_{11} & w_{12} & \dots & w_{1N} \\ f_2 & w_{21} & w_{22} & \dots & w_{2N} \\ \vdots & \vdots & & & \vdots \\ f_N & w_{N1} & w_{N2} & \dots & w_{NN} \end{bmatrix}$$

From the values obtained from this matrix, a weighted graph G = (V, E, W) is
derived, where the elements of V represent the features, E is the set of edges
between the vertices, and the weights W are the correlation coefficients. Since the
constructed graph contains a lot of edges, it is advisable to perform thresholding and
remove all edges whose weights are less than a predefined threshold. Similarly, the
chromosomal graph G_i = (V_i, E_i, W_i) for each chromosome is developed in the
same way. In the case of the chromosomes, V_i is the set of features represented by
'1' in the bitstring, and W_i for the nodes f_ij (the jth feature of the ith chromosome)
and f_ik (the kth feature of the ith chromosome) is calculated as in Eq. (2):

$$W_i(f_{ij}, f_{ik}) = \frac{\sum_{l=1}^{N} \left(f_{ijl} - \bar{f}_{ij}\right)\left(f_{ikl} - \bar{f}_{ik}\right)}{\sqrt{\sum_{l=1}^{N} \left(f_{ijl} - \bar{f}_{ij}\right)^2 \sum_{l=1}^{N} \left(f_{ikl} - \bar{f}_{ik}\right)^2}} \qquad (2)$$
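A sketch of the thresholded feature graph described above, using NumPy's corrcoef and NetworkX (NetworkX is an assumption; any graph representation would do), with the 0.3 threshold from Algorithm 3.2:

```python
import networkx as nx
import numpy as np

def build_feature_graph(F: np.ndarray, threshold: float = 0.3) -> nx.Graph:
    """F: samples x features matrix; returns the weighted graph G = (V, E, W)."""
    W = np.corrcoef(F, rowvar=False)        # N x N correlation coefficients
    n_features = W.shape[0]
    G = nx.Graph()
    G.add_nodes_from(range(n_features))
    for j in range(n_features):
        for k in range(j + 1, n_features):
            if abs(W[j, k]) >= threshold:   # keep only strongly correlated pairs
                G.add_edge(j, k, weight=W[j, k])
    return G
```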

Once the graphs are created, we compute the degree of each feature in (G ∩ G_i),
because features with a higher degree are more correlated with other features and are
likely to replace their correlated features during feature extraction. Each feature is
assigned a fitness value which is updated in each iteration based on the selection of
chromosomes for crossover and mutation. The chromosomes are selected at random
for crossover, and the child chromosome resulting from the crossover undergoes
mutation by flipping the bits at random positions. Further, the fitness of each feature
is defined as a function of the degree of correlation of that feature with other features,
together with the fitness of the chromosome in which that feature is present. The
final fitness of the features is computed using Eq. (3):

$$\text{Feature\_fitness} = \frac{\text{chromosome\_fitness}}{\text{degree\_correlation}} \qquad (3)$$

where chromosome_fitness is the accuracy of the Multi-Layer Perceptron [19],
calculated by passing the features whose bits are set to 1 in that particular chromosome
to the Multi-Layer Perceptron, and degree_correlation is the degree of the graph
vertex corresponding to the feature. For each chromosome, we randomly generate a
binary string; the columns with the value 1 are used as features for classification,
and the accuracy returned by the classifier is then used to compute the fitness of
these features.
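Given the classifier accuracy for one chromosome and the vertex degrees in (G ∩ G_i), the per-feature fitness update of Eq. (3) can be sketched as follows (the function and variable names are hypothetical):

```python
import numpy as np

def update_feature_fitness(chromosome, chromosome_fitness, degrees, feature_fitness):
    """chromosome: 0/1 bitstring; chromosome_fitness: MLP accuracy for this
    chromosome; degrees: per-feature vertex degree in (G intersect G_i)."""
    for f in np.flatnonzero(chromosome):
        # Eq. (3), accumulated across iterations as described above
        feature_fitness[f] += chromosome_fitness / max(degrees.get(f, 1), 1)
    return feature_fitness
```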

Algorithm 3.2: Features-Extraction (F , Target_Values)


Input: Pre-processed Dataset F (say),Target_Values
Output: Extracted Features.
1. Construct an N × N correlation coefficient matrix for N features by
computing the correlation between every possible pair of features, i.e.
(f i j, f i k) ∈ F , using Eq. (2);
2. Build a graph G = (V, E, W ) where V = set of vertices corresponding to
the features of the dataset F , E = set of edges and W = set of weights
associated with each edge (f i j, f i k) ∈ E computed by Eq. (2);
3. Sparse the graph G by removing the edges with weight less than the
threshold weight (i.e. 0.3) considering all edges of G;
4. The initial population of chromosomes i.e. initial population is created
by taking random vector representation of features from F as bit strings
of 0’s and 1’s;
5. for each individual chromosome of the population ∈ initial population
do
5.1 Graph Gi = (V i , E i , W ) is constructed for ith chromosome where
V i is the vertices of the features with bitstring set to 1 in the ith
chromosome, E i is the edge between these features and W is the
correlation coefficient between these features.

Classification For classification, we have used a Multi-Layer Perceptron. The opti-
mization algorithm used is Adam, a first-order gradient-based optimizer for
stochastic objective functions. We chose Adam because it has low memory
requirements, is computationally efficient, is invariant to diagonal rescaling of the
gradients, and is well suited to large datasets involving a large number of param-
eters. The Multi-Layer Perceptron has one hidden layer with 15 nodes. This number
of hidden nodes was chosen on the basis of the rule of thumb proposed in [19],
according to which the number of nodes in the hidden layer can be chosen based on
the following criteria:
(1) The number of hidden neurons should lie between the size of the input layer
and the size of the output layer.
(2) The number of hidden neurons should be 2/3 the size of the input layer, plus
the size of the output layer.
(3) The number of hidden layer neurons should be less than twice the number of
neurons in the input layer.

Algorithm 3.3: Features-Extraction (F , Target_Values)


5.2. The degree of each vertex of (G ∩ G i ) corresponding to the features of the
dataset F is computed for calculating the Feature fitness;
5.3. The features with the bitstring set to 1 are passed to the Multi-Layer Percep-
tron classifier, with the input layer equal to the number of features passed
and the output layer containing the target values;
5.4. Set the fitness of the chromosome, i.e. chromosome fitness, as the accuracy
of the Multi-Layer Perceptron classifier on the target values, obtained
with the Adam optimization algorithm;
5.5. The Feature fitness is computed as the fitness of the chromosome (chro-
mosome fitness) in which that feature occurs, divided by the degree of
the feature, using Eq. (3);
5.6. The Feature fitness is updated by adding the new Feature fitness to its
previous value;
6. Arrange the features ∈ F in descending order of their Feature fitness;
7. Take the first k features from the sorted list and put them in the Extracted
Features list E;
return (E)

The learning rate for the classifier is chosen to be 0.001. The intention behind
this choice is to make the model converge quickly while ensuring that it does not
diverge, which could happen if the learning rate were larger. Also, ReLU (Rectified
Linear Unit) [20] is chosen as the activation function because it is computationally
simple, makes optimization easier and avoids the vanishing gradient problem. For
each iteration, we update the fitness of the features.

After all the iterations, the features are sorted according to their fitness values and
the best K features are selected for the prediction. We then apply these features to
the test data and note the results.
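With the sklearn library used in Sect. 4, the classifier configuration described above corresponds roughly to the following (max_iter is an illustrative addition; other settings are sklearn defaults):

```python
from sklearn.neural_network import MLPClassifier

# One hidden layer of 15 nodes (rule of thumb), Adam optimizer, ReLU activation,
# learning rate 0.001 as described above
clf = MLPClassifier(hidden_layer_sizes=(15,),
                    solver="adam",
                    activation="relu",
                    learning_rate_init=0.001,
                    max_iter=500)

# Usage: clf.fit(X_train, y_train); accuracy = clf.score(X_test, y_test)
```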

4 Experimental Results

The method has been implemented on the NSL-KDD dataset. We have used the
numpy library for matrix operations and the sklearn library of Python 3.6 for the
machine learning algorithms. The NSL-KDD dataset contains 125,973 records in
total for training and 22,543 records for testing. In this paper, we have used 70% of
the training data for training the proposed model and the remaining 30% for
validation testing of the model. Finally, the accuracy is calculated on the test data.
There are 41 attributes in each record of the dataset, corresponding to different
features of a network packet, and a label assigned either as an attack type or as
normal. Four attack types are considered, namely DoS, Probe, R2L and U2R. In
this paper, we have used the described methodology to extract 7 features out of the
41 features of the dataset, as depicted in Table 1.
Accuracy in Table 2 is calculated as the difference between 1 and the relative error
(RE), multiplied by 100, where RE relates the predicted attack type to the desired
attack type, as given in Eqs. (4) and (5):

$$\text{Accuracy} = (1 - \text{RE}) \times 100 \qquad (4)$$

$$\text{RE} = \frac{|\text{desired\_type} - \text{actual\_predicted\_type}|}{\text{desired\_type}} \qquad (5)$$

where desired_type is the target attack type and actual_predicted_type is the actual
predicted output of the multilayer perceptron model.

Table 1 Extracted features after reduction

Extracted feature    Features' info
F1                   protocol_type
F2                   rerror_rate
F3                   dst_host_count
F4                   dst_host_srv_count
F5                   dst_host_srv_serror_rate
F6                   dst_host_rerror_rate
F7                   dst_host_srv_rerror_rate
% Total accuracy     97.299%

Table 2 Confusion matrix for system evaluation

Actual class    Predicted label                          % Accuracy
                normal   probe   dos    u2r   r2l
normal          9271     141     64     102   132        95.48
probe           0        2380    13     11    17         98.33
dos             0        0       7998   0     0          100
u2r             0        0       2      67    4          92.64
r2l             0        0       0      0     2332       100

5 Conclusion

In this research paper, a new approach for feature extraction is demonstrated and the
accuracy of the outcome is evaluated on the benchmark NSL-KDD data. We have
explored GA and MLP for extracting the features. The results reveal that the
accuracy of the system in detecting the Normal, U2R, R2L, Probe and DoS classes
based on the extracted features is much higher than that of previously proposed
systems, and the system is also time efficient due to the utilization of a reduced
number of features. The approach provided high accuracy on the DoS and R2L
classes, suggesting that with a larger population in the GA and more iterations,
comparable accuracy can be achieved in the other classes as well. Further, in the future,

the performance of the proposed system can be compared with other systems, both in
terms of accuracy and time complexity. Also, this approach can be implemented in a
cloud environment to provide a safe platform for cloud users. As this system uses an
MLP as the classifier, it should also be efficient in the cloud environment, since the
nodes of the MLP can be distributed across different nodes of the cloud, thus
preventing the overloading of a single machine.

References

1. Mukkamala, S., Janoski, G., Sung, A.: Intrusion detection using neural networks and sup-
port vector machines. In: Proceedings of the 2002 International Joint Conference on, Neural
Networks. IJCNN’02, vol. 2, pp. 1702–1707. IEEE (2002)
2. Vigna, G., Kruegel, C.: Host-based intrusion detection (2005)
3. Mukherjee, B., Heberlein, L.T., Levitt, K.N.: Network intrusion detection. IEEE Netw. 8(3),
26–41 (1994)
4. Mirkovic, J., Dietrich, S., Dittrich, D., Reiher, P.: Internet denial of service: attack and defense
mechanisms (radia perlman computer networking and security) (2004)
5. Das, M.L., Saxena, A., Gulati, V.P.: A dynamic id-based remote user authentication scheme.
IEEE Trans. Consum. Electron. 50(2), 629–631 (2004)
6. Lippmann, R., Cunningham, R.K., Fried, D.J., Graf, I., Kendall, K.R., Webster, S.E., Zissman,
M.A.: Results of the darpa 1998 offline intrusion detection evaluation. In: Recent Advances in
Intrusion Detection, vol. 99, pp. 829–835 (1999)
7. Zargar, G.R., Kabiri, P.: Identification of effective network features for probing attack detection.
In: First International Conference on Networked Digital Technologies. NDT’09, pp. 392–397.
IEEE (2009)
8. Hoque, M.S., Mukit, M., Bikas, M., Naser, A., et al.: An implementation of intrusion detection
system using genetic algorithm. arXiv preprint arXiv:1204.1336 (2012)
9. Denning, D.E.: An intrusion detection model. IEEE Trans. Softw. Eng. 13(2), 222–232 (1987)
10. Chittur, A.: Model generation for an intrusion detection system using genetic algorithms. High
Sch. Honor. Thesis, Ossining High Sch. Coop. Columbia Univ (2001)
11. Li, W.: Using genetic algorithm for network intrusion detection. Proc. U. S. Dep. Energy Cyber
Secur. Group 1, 1–8 (2004)
12. Wei, L., Traore, I.: Detecting new forms of network intrusion using genetic programming.
Comput. Intell. 20(3), 475–494 (2004)
13. Goyal, A., Kumar, C.: GA-NIDS: a genetic algorithm based network intrusion detection
system. Northwest. Univ. (2008)

14. Wang, W., Bridges, S.: Genetic algorithm optimization of membership functions for mining
fuzzy association rules. Dep. Comput. Sci. Miss. State Univ. 2 (2000)
15. Herrerias, J., Gomez, R.: A log correlation model to support the evidence search process in a
forensic investigation. In: Second International Workshop on Systematic Approaches to Digital
Forensic Engineering. SADFE 2007, pp. 31–42. IEEE (2007)
16. Fan, Y.T., Wang, S.J.: Intrusion investigations with data-hiding for computer log-file forensics.
In: 2010 5th International Conference on Future Information Technology (FutureTech), pp. 1–6.
IEEE (2010)
17. Python 3.6.6rc1 documentation. https://docs.python.org/3/download.html. Accessed 30-11-2017
18. KDD Cup. Dataset. Available at the following website http://kdd.ics.uci.edu/databases/
kddcup99/kddcup99.html, 72 (1999)
19. Gardner, M.W., Dorling, S.R.: Artificial neural networks (the multilayer perceptron) a review
of applications in the atmospheric sciences. Atmos. Environ. 32(14–15), 2627–2636 (1998)
20. Maas, A.L., Hannun, A.Y., Ng, A.Y.: Rectifier nonlinearities improve neural network acoustic
models. Proceedings of ICML, vol. 30, p. 3 (2013)
Chemical Sensing Through
Cogno-Monitoring System for Air
Quality Evaluation

Kanakam Prathyusha and ASN Chakravarthy

Abstract Nowadays, global warming as well as heavy automobile usage leads to wide
changes in environmental conditions, raising the toxic level of the surrounding air and
disturbing the lives of individuals. In this internet era, problems are addressed with
smart objects that are remotely connected to one another and notify the user about
issues ahead. Sensors now measure the quality of the surrounding air by measuring
the toxic levels of gases and identifying the harmful gases that lead to pollution and,
in turn, to deaths. This work concentrates on profiling the chemicals in air pollutants.
After profiling, the toxic levels are calculated using air quality index methods that
give weighted values for the individual air pollutants in the environment. The work
presents a prototype Cogno-Monitoring System that uses an air quality algorithm and
connects the air quality system to a smart device, notifying the user about air quality
in the areas the device has sensed.

Keywords Cogno-Monitoring system · Chemical profiling · Air quality sensors ·


Air quality index · Internet of things · Chemical sensing

1 Introduction

In this urbanized and industrialized world, there has been rapid growth in industry,
deforestation and motor vehicle usage, which ultimately gives rise to air pollution,
causing problems to the environment and leading to many issues regarding health,
climate, loss of biodiversity, etc. In addition to outdoor air pollution, the quality of
air is also concerned with indoor air pollution, which is

K. Prathyusha (B)
Department of CSE, MVGR College of Engineering, Vizianagaram, Andhra Pradesh, India
A. Chakravarthy
Department of CSE, University College of Engineering Vizianagaram, JNTUK Vizianagaram,
Vizianagaram, Andhra Pradesh, India


produced by some heating practices, insufficient cooking and other scenarios. Indoor
air pollution also accelerates hazardous air pollutants, leading to the occurrence of
diseases such as lung cancer, pneumonia, asthma, chronic bronchitis, coronary artery
disease, and chronic pulmonary diseases.
Air is the primary source for sustaining an individual's life. Due to the rise in
urbanization and industrialization, the number of industries releasing waste that
pollutes the air has increased. Various sectors contribute to air pollution: electricity
generation, chemicals, paper products, food and beverages, primary metals, vehicle
emissions and many more. Of all these, electricity generation contributes the most,
at nearly 49% of air pollution. These in turn lead to many issues regarding health,
climate, loss of biodiversity, etc. Technology should step forward to analyze the
quality of air in different areas of the environment.
The major air pollutants which are responsible for these problems are O3 , NO2 ,
CO, SO2 and particulate matter, which is the sum of all solid and liquid hazardous
particles. Among these pollutants, most loss of life is due to exposure to particulate
matter and ozone. It is observed that NO2 from vehicle emissions is a prime cause
of air pollution. The necessity lies in measuring air quality by finding the classes of
harmful and harmless gases among the air pollutants.
Every individual's life is interrelated with the future internet, which connects various
devices, including smart phones, located at distinct places. Smartness is the measure
of providing ease in utilizing resources as well as quality services to individuals.
Many applications come into play in this smart era, and the smart city is one such
deployment serving various purposes: weather forecasting, air quality management,
automation of homes and buildings and many more.

2 Preliminaries

Nihal et al. [1] implemented an environmental air pollution monitoring system in
which the concentrations of harmful air pollutants are monitored through semiconductor
gas sensors, calibrated by the standard static chamber method, along with a smart
transducer interface module (STIM) implemented on an ADuC812 micro-converter and
a network capable application processor (NCAP). The STIM is connected to the NCAP
via a transducer independent interface (TII). Japla et al. [2] proposed a customized
design for an environmental monitoring system that predicts the temperature, humidity
and CO2 level; through the nodes of the network, the respective values are notified to
a smart phone.
A Wireless Sensor Network Air Pollution Monitoring System (WAPMS) to monitor
air pollution in Mauritius is proposed in [3]. It makes use of wireless sensors arranged
around the island, together with an Air Quality Index (AQI) and Recursive Converging
Quartiles (RCQ), a data aggregation algorithm. The authors of [4] used Arduino
and an ATMEGA328 microcontroller, along with temperature, humidity, gas, and sound

sensors for sensing the environmental conditions, providing the data to the cloud
server via an IoT module.
Gas-detecting sensors are calibrated to serve different purposes. Together with the
future internet, the sensor domain collaboratively provides smart applications. In this
smart era, every object is related to another through the internet. Technology has
advanced to obtain readings for the individual gases in the environment that cause
pollution. A traditional system notifies the weather forecast and the level of pollution
in an area. Extending this, the cogno-monitoring system notifies the level of pollutants
to the smart device of an individual in that particular area.

2.1 Air Quality Sensors

Every object in this modern era relies on sensor technology, and various types of
sensors available in the market serve different purposes. These sensors are cost
effective, and their operation depends on their individual design and capabilities,
owing to technological advancements. In most environmental monitoring systems,
the air pollution sensors listed in Table 1 are widely used to calculate air quality,
pollution level, and compliance with pollution standards at different locations. They
can be applied to distinct applications,

Table 1 Various air quality sensors

Name of air quality sensor | Used to detect | Applications
MQ-7 gas sensor | Carbon monoxide (CO) | Home, industrial and automobiles
Grove gas sensor | LPG, methane (CH4), carbon monoxide (CO), alcohol, smoke, or propane | Home and industrial
Hydrogen sensor | Carbon monoxide (CO), LPG, cooking fumes, alcohol | Home and industrial
MQ-2 gas sensor | LPG, propane, hydrogen, methane (CH4) | All types of applications
CAIR-CLIP | Ozone (O3), nitrogen dioxide (NO2) | Indoor and outdoor air quality monitoring
Air quality-egg sensor | Carbon monoxide (CO), nitrogen dioxide (NO2), along with temperature and humidity | All types of applications
MiCS 2610/2611 | Ozone (O3) | All types of applications
Shinyei PPD42 | Particulate matter (PM2.5, PM10) | All types of applications
MiCS 5521 | Volatile organic compounds (VOCs) | All types of applications

mainly employed in home and industrial domains, for both indoor and outdoor
environments. The major focus of these air pollution sensors is on 5 prime
pollutants: ozone (O3), particulate matter (PM2.5 and PM10), carbon monoxide
(CO), sulphur dioxide (SO2), and nitrogen dioxide (NO2). These sensors can serve
many purposes and help bring attention to environmental issues beyond the scope
of the human eye. Table 1 lists the different types of air quality sensors commonly
used for the monitoring of air pollutants.

3 Cogno-Monitoring System (CMS)

Heavy vehicle emissions and other factors like industrialization, deforestation and
electricity generation cause air pollution, the contamination of the surrounding air.
This pollution in turn may lead to several health, climatic and biodiversity issues.
It concerns both outdoor and indoor air pollution factors, such as heating practices
and insufficient cooking. Reacting to these circumstances is the major issue in
protecting the surroundings from air pollution, as environmental issues draw attention
beyond the scope of the human eye.
The Future Internet, or Internet of Things (IoT), provides a strong foundation for
solving these issues. IoT has the potential to serve different fields like habitat
monitoring, environmental monitoring [5], fire and flood detection, bio-complexity
mapping and precision agriculture. Air quality monitoring is one such application
that plays a crucial part in building smart cities. It may involve the reduction of
manpower, chemical profiling of air pollutants and monitoring of their levels,
on-location testing, and connecting output devices with processing systems.
Cogno-Monitoring System (CMS) is one such environmental monitoring system
that measures and predicts the air quality and atmospheric conditions. The machine
is trained to measure the air quality index of the pollutants. Cognition is applied
to the machine to understand the atmospheric conditions in an area and give a
notification to the user. It is mainly composed of three primary modules: sensor
array, processing unit and decision-making unit, as depicted in Fig. 1. The system
is connected to the internet to notify the user about the pollution level of a
particular location.
Sensor Array: This is a combination of the various types of sensors listed in Table 1,
used to detect the respective prime pollutants in the air. Air is passed through the
inlet of the sensor array unit; air is a composition of both organic and inorganic
particles such as dust, pollen, soot, smoke and liquid droplets. Among all the
pollutants, five major components are observed: carbon monoxide (CO), sulphur
dioxide (SO2), nitrogen dioxide (NO2), ozone (O3) and particulate matter (PM2.5,
PM10); among them, particulate matter and ground-level ozone are the most
hazardous, causing respiratory and cardiovascular illness. The major constituents of
PM2.5 are ammonium sulphate, ammonium nitrate, organic carbon, elemental carbon
and crustal material (Al, Fe, Ca, Ti, Mn). The numerical value in the component
name represents the particle diameter in micrometers (i.e., PM2.5 has a diameter
of 2.5 micrometers or less and PM10 a diameter of 10 micrometers).

Fig. 1 Schematic view of the cogno-monitoring system (air enters the sensor array, whose readings pass to the processing unit and decision-making unit; the system connects to a database, the network, an output screen and device notifications)

Processing Unit: After the detection of gases, the pollutants are profiled with their
pollution level calculated in parts per million (ppm), and a characteristic equation
is derived using linear regression for each prime pollutant (where the independent
variables are the pollutants and those chemicals depend on 'time'). The computation
helps in calculating air quality, which is measured by an index value that translates
the weighted average of each prime pollutant into a single value. Index values are
stored in the database for the respective pollutants and may be used while working
with the decision-making unit.
Decision-Making Unit: The crucial role of this module is to compare against the air
quality index values already stored in the database. Depending on these values, the
output is reported. If the index value exceeds the threshold limit, the user connected
to the device gets a notification about the pollution level of that particular location.
This module resembles an IoT module in that the user device is connected to the
cogno-monitoring system to receive the report. Table 2 indicates the air quality
index values for different atmospheric conditions.
This system can be incorporated in heavy-traffic and public areas to measure the
air quality at those locations. The data obtained from pollution sensors can thus be
made available to users as notifications, which in turn helps build a smart city.

Table 2 Details of air quality index values

Index values | Air quality description
0–100 | Clean air
101–125 | Light pollution
126–150 | Significant pollution
Above 150 | Heavy pollution
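A minimal Python sketch of the decision rule implied by Table 2 follows; the function name and the exact boundary handling are illustrative assumptions.

```python
def air_quality_category(index_value):
    """Map an air quality index value to the categories of Table 2."""
    if index_value <= 100:
        return "Clean air"
    elif index_value <= 125:
        return "Light pollution"
    elif index_value <= 150:
        return "Significant pollution"
    return "Heavy pollution"
```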

3.1 Experimental Analysis

Air quality monitoring plays a crucial role in building smart cities and making
citizens aware of the pollution level at their locations. The cogno-monitoring system
is a machine trained to measure the quality of air using mathematical measures
and linear regression transforms. Air is pumped through the inlet of the sensor array
unit, which combines different types of sensors to detect the prime pollutants in the
air. Table 3 shows the list of pollutants along with their pollution level values,
calibrated annually and daily, in various industrial, residential, rural and ecologically
sensitive zones.
Among these pollutants, the prime pollutants are identified depending on their ppm
values and toxic levels. The threshold values, noted at different instances of time for
each of these prime pollutants, are listed in Table 4.
Intra-procedure of the processing unit: The challenge of the processing unit is to
obtain the air quality measure, ascertain the toxic level of the prime pollutants and
notify the smart device connected to the system. This is done by following the air
quality algorithm, which uses a linear regression transformation. Linear regression
analysis derives the characteristic equation of the respective pollutant as a function
of the variable time 't' at various instances.

Table 3 List of pollutants and their concentration levels occurring in different areas

Air pollutant (µg/m3) | Industrial/residential/rural areas: Annual | Daily | Ecologically sensitive areas: Annual | Daily
Sulphur dioxide (SO2) | 50 | 80 | 20 | 80
Nitrogen dioxide (NO2) | 40 | 80 | 30 | 80
Particulate matter (size less than 10 µm, PM10) | 60 | 100 | 60 | 100
Particulate matter (size less than 2.5 µm, PM2.5) | 40 | 60 | 40 | 60
Ozone (O3) | 100 | 180 | 100 | 180
Lead (Pb) | 0.5 | 1 | 0.5 | 1
Carbon monoxide (CO) | 2 | 4 | 2 | 4
Ammonia (NH3) | 100 | 400 | 10 | 400
Benzene (C6H6) | 5 | 0 | 5 | 0
Benzo(a)pyrene (BaP), particulate phase only | 1 | 0 | 1 | 0
Arsenic (As) | 6 | 0 | 60 | 0
Nickel (Ni) | 20 | 0 | 20 | 0

Table 4 Major pollutants and their concentration levels

Air pollutant (µg/m3) | Annual | 24 h/8 h/1 h/10 min (weighted average of time)
Sulphur dioxide (SO2) | 20 | 500 (10 min)
Nitrogen dioxide (NO2) | 40 | 200 (1 h)
Particulate matter (size less than 10 µm, PM10) | 12 | 25
Particulate matter (size less than 2.5 µm, PM2.5) | 20 | 50
Ozone (O3) | 100 | 100 (8 h)

3.2 Air Quality Algorithm

The Air Quality Algorithm (AQA) is a mathematical procedure for obtaining the most
hazardous pollutant in a particular area. The CMS processing unit partly depends on
the AQA. This scheme is used to retrieve the characteristic equation of a particular
pollutant whose values are indexed. Depending on the intercept values, the
highest-indexed pollutant is filtered out and can be considered the most influential
pollutant in that area.
Let X(p) = {SO2, NO2, PM2.5, PM10, O3} and Y(t) = {10 min, 1 h, 8 h, 24 h,
annual}, where Y = aX + b. Here, X are independent variables that represent the
distinct air pollutants and Y(t) is the dependent variable; i.e., the value of X(p)
changes over {x1, x2, …, x5} with different values of Y. 'a' and 'b' are the
coefficients of the fitted line, and the points (x, y) scatter over the x–y plane.

Step 1: For X(p) = {x1, x2, …, x5}, where x1, x2, …, x5 represent the pollution
level values at the different units of Y(t) = {0.16, 1, 8, 24}, compute the
characteristic equation Y = aX + b of each pollutant as follows.
Step 2: Compute XY, X², Y².
Step 3: Compute the summations ΣX, ΣY, ΣXY, ΣX², ΣY².
Step 4: Compute 'a', where a = (ΣY · ΣX² − ΣX · ΣXY) / (n · ΣX² − (ΣX)²).
Step 5: Compute 'b', where b = (n · ΣXY − ΣX · ΣY) / (n · ΣX² − (ΣX)²).
Step 6: Derive the characteristic equation Y = aX + b.
The obtained equation is the function applied to the respective pollutant to calculate
the index values for measuring the overall air quality index. From the air, various
pollutants are obtained and their index values are calculated from the equation
derived by the AQ algorithm (i.e., f(X1) is the characteristic equation

Fig. 2 Detailed description of the air quality algorithm (each pollutant Xi is mapped to an index Yi = f(Xi); the indexes of all pollutants in the air are aggregated and transformed into a single AQI value)
Table 5 Obtained air quality index values for NO2 to get the characteristic equation

x | y | xy | x² | y²
0.5 | 0.16 | 0.08 | 0.25 | 0.0256
3 | 1 | 3 | 9 | 1
24 | 8 | 192 | 576 | 64
72 | 24 | 1728 | 5184 | 576
Σx = 99.5 | Σy = 33.16 | Σxy = 1923.08 | Σx² = 5769.25 | Σy² = 641.0256

aX1 + b) likewise the same equation is calculated for each and every pollutant of air
to transform into a single air quality index value as shown in Fig. 2.
It is observed that there are five prime components that are most hazardous and
frequently sighted in heavy pollution of air. They are NO2 , SO2 , PM2.5, PM10, O3 .
By considering the sampling values for NO2, the steps of the air quality algorithm
are followed to calculate the individual characteristic equation of the pollutant
(Table 5 gives the values). Thus, a = 0.08 and b = 2.99, and the index equation is
f(x = NO2) = 0.08x + 2.99. In the same way, the equations for the remaining
pollutants are calculated and transformed into a single air quality index value.
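A minimal Python sketch of Steps 1–6 of the air quality algorithm follows, using the NO2 samples from Table 5 as input; the function and variable names are illustrative assumptions, and the coefficients it returns follow directly from the formulas in Steps 4 and 5.

```python
def characteristic_equation(x, y):
    """Steps 2-6 of the AQA: fit Y = aX + b via the summation formulas."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))   # Step 2/3: sum of XY
    sx2 = sum(xi * xi for xi in x)               # Step 2/3: sum of X^2
    denom = n * sx2 - sx ** 2
    a = (sy * sx2 - sx * sxy) / denom            # Step 4
    b = (n * sxy - sx * sy) / denom              # Step 5
    return a, b                                  # Step 6: Y = aX + b

# NO2 samples from Table 5
a, b = characteristic_equation([0.5, 3, 24, 72], [0.16, 1, 8, 24])
```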

4 CMS Experimental Setup

The sample CMS has been set up and tested to obtain the PPM values of CO and SO2,
two of the most toxic gases that cause air pollution. MQ3 and MQ135 sensors are used
for detecting them. Figure 3 depicts the experimental design of the sample
cogno-monitoring system, implemented using two air quality sensors that detect the
PPM values of CO and SO2, connected to an Arduino Uno microcontroller board to
record the values. These values can then be sent to a mobile or monitor display
connected through a Bluetooth module or Zigbee protocol (both establish wireless
personal area networks among devices).
After establishing the connections between the components, the system is placed in
the environment to detect the gases and to check the working of the system. In the
case of any activity like burning of fuel, coal or wood, carbon monoxide is

Fig. 3 Schematic view of the sample CMS (MQ3 and MQ135 air quality sensors feed the Arduino Uno microcontroller, which reports to a mobile device via Zigbee/Bluetooth and to a display monitor)

produced, and its toxicity level is detected by the MQ3 gas sensor. Similarly, the
burning of fossil fuels such as coal, oil and natural gas is the main source of sulphur
dioxide emissions; when these actions are performed, the readings of the MQ135 gas
sensor increase, indicating the concentration or toxic level of sulphur dioxide (SO2).
In this system, the concentration of carbon monoxide is determined by the MQ3 gas
sensor, owing to its sensitivity to that gas, and the concentration of sulphur dioxide
is measured by the MQ135 gas sensor. Figure 4 shows the concentrations of these
gases in the environment before and after the activities.
The data obtained from the system is disseminated to citizens through a mobile
application via Bluetooth. The Bluetooth module HC-05 communicates with the
Arduino via the UART interface. Every message the Arduino wants to send is first
given to the Bluetooth module, which sends the message wirelessly. To avoid problems
with the UART, the Arduino and the Bluetooth module have to use the same baud
rate (9600 by default). Before using the app, the Bluetooth module (HC-05/HC-06)
has to be paired with the Android device in the system preferences.
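On the receiving side, a minimal pyserial sketch of reading the values the Arduino writes to the serial/Bluetooth link might look as follows; the port name and the one-reading-per-line format are assumptions for illustration (9600 baud is the HC-05 default mentioned above).

```python
import serial  # pyserial

# Port name is platform dependent (e.g., "COM3" on Windows); assumed here.
ser = serial.Serial("/dev/ttyUSB0", 9600, timeout=1.0)

while True:
    line = ser.readline().decode("ascii", errors="ignore").strip()
    if line:                      # e.g., "CO:2.1,SO2:0.4" (assumed format)
        print("sensor reading:", line)
```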
Further, the air quality index (AQI) [6] values obtained for the pollutants are
classified into six categories depending on their range. Each category indicates a
different level of health concern, ranging from Good to
Fig. 4 Values noted before and after the activity conducted to test the sample CMS (left panel: values sensed before the activity; right panel: values sensed after the activity)

Fig. 5 Values notified on mobile

Hazardous. After evaluating the AQI for the air pollutants, the system determines
which category they fall under. Once a user is connected to the system through the
mobile application via Bluetooth, the AQI values of the air pollutants, their level of
toxicity and their health concern are notified as shown in Fig. 5.

5 Conclusion

The Future Internet has taken a step forward in relating all computing devices to
analyze, validate and transfer data among themselves, achieving either human-to-machine
or machine-to-machine interaction. It combines various things, objects and people with
unique identifiers into what is called the Internet of Things (IoT). Smart-object
interactions can be applied to various fields to solve distinct problems that surround
people's everyday lives. The cogno-monitoring system is one such prototype; it works
with the air quality algorithm to notify users about the toxic levels of pollutants in
the environment through their handheld devices. It enhances the smart city domain
with chemical profiling procedures, preventing the harm caused by entering heavily
polluted areas.

References

1. Kularatna, N., Sudantha, B.: An environmental air pollution monitoring system based on the
IEEE 1451 standard for low cost requirements. IEEE Sens. J. 8, 415–422 (2008)
2. Shah, J., Mishra, B.: IoT enabled environmental monitoring system for smart cities. In: 2016
International Conference on Internet of Things and Applications (IOTA) (2016)

3. Shah, J., Mishra, B.: Customized IoT enabled wireless sensing and monitoring platform for
smart buildings. Procedia Technol. 23, 256–263 (2016)
4. Khedo K.K., Perseedoss R., Mungur A.A.: A wireless sensor network air pollution monitoring
system. Int. J. Wirel. Mob. Netw. 2, 31–45 (2010)
5. Uma, K., Swetha, M., Manisha, M., Revathi, S., Kannan, A.: IOT based environment condition
monitoring system. Indian J. Sci. Technol. 10, 1–6 (2017)
6. Air Quality Index (AQI) Basics. https://airnow.gov/index.cfm?action=aqibasics.aqi
3 DOF Autonomous Control Analysis
of a Quadcopter Using Artificial Neural
Network

Sanket Mohanty and Ajay Misra

Abstract The quadcopter is an Unmanned Aerial Vehicle (UAV) which has become
very popular among researchers in the recent past due to the advantages it offers over
conventional helicopters. The quadcopter is unique and interesting; however, it is
inherently unstable from an aerodynamics point of view. In the recent past, researchers
have proposed many control schemes for quadcopter stability, but Artificial Neural
Network (ANN) systems provide a fusion of human intelligence, logic and reasoning.
This research focuses on the use of ANNs to control plant systems whose plant
dynamics are expensive to model, inaccurate, or change with time and environment. In
this paper, Linear Quadratic Regulator (LQR) and Sliding Mode Control (SMC)
controllers are designed for a quadcopter with the 3 Degree Of Freedom (DOF) Hover
model by Quanser. The main benefits of this approach are the model's ability to adapt
quickly to unmodeled aerodynamics, disturbances, component failure due to battle
damage, etc. It eliminates the costs and time associated with wind tunnel testing and
the generation of control derivatives for UAVs.

Keywords Quadcopter · Unmanned aerial vehicle (UAV) · Artificial neural


network (ANN) · Linear quadratic regulator (LQR) · Sliding mode control (SMC)

1 Introduction

An Unmanned Aerial Vehicle (UAV) is an unpiloted aircraft which can either fly
autonomously or be remotely controlled based on a program uploaded to its on-board
computers. UAVs have a vast area of application in the military for missions that are
too dull, dirty, or dangerous for human-piloted aircraft. In the past decade, there has
been significant growth in the role of UAVs in civilian and military application
scenarios. UAVs are utilized in applications such as border security, surveillance,
aerial survey, search, and rescue. The growth in the number of these applications

S. Mohanty (B) · A. Misra


Defense Institute of Advanced Technology, Pune, India


can be attributed to rapid enhancements in the fields of control, robotics,
communication and computer technology over the years.
Flight control system design is still a fundamental problem for UAVs. The desire
for enhanced agility and functionality in a UAV requires that it perform over an
extended range of operating conditions characterized by large variations in pressure
and nonlinear aerodynamic phenomena [1]. In addition, the use of nonlinear actuation
systems increases the complexity of the control design [1].
The most widely studied approach to nonlinear control involves the use of nonlinear
transformation procedures and differential geometry. This procedure transforms the
state and control of the nonlinear system [2] so that the resulting system exhibits
linear dynamics. Linear tools can then be applied, and the result converted back into
the original coordinates using the inverse transformation [2]. This broad class of
methods is most commonly known as 'feedback linearization'.

2 Experimental Setup

The 3 DOF Hover model by Quanser [3] is a Multiple Input Multiple Output (MIMO)
system [4] designed for studying the behavior of a controlled quadrotor using the
platform shown in the figure. It contains a frame with 4 propellers mounted on a 3 DOF
pivot joint [4]. Every propeller is driven by a DC motor, and the joint permits the
frame to freely roll, pitch and yaw. The pitch/roll and the yaw can be independently
commanded through the lift control and the torque delivered by the propellers.
Information about the frame orientation is given by optical encoders mounted on
each pivot, and each DC motor is independently controlled by means of an analog
signal. This nonlinear system with four inputs (analog signals to the DC motors)
and three outputs (the angular displacements of the frame) is an excellent plant for
studying the properties of a wide range of controllers on a fixed rotational stage.
Moreover, the low coupling that exists among the pitch, roll and yaw states [4] of
the structure allows the development of an independent control law for each angular
component of the frame orientation. In other words, the system can follow an
independent reference signal for each angular component, with the DC motor control
signals obtained from the outputs of 3 distinct controllers.
To control the system, Quanser furnishes a computer system with the required
equipment (acquisition cards) as well as 4 signal amplifiers to convert the computer's
analog outputs to the DC motor control range [0; 22 V]. In addition, the computer is
equipped with Quarc, the educational control software by Quanser, which allows the
computer to run the plant from Simulink using code generated from a Simulink model.
For instance, to control the 3 DOF Hover, Quanser provides an LQR implemented in
Simulink [4].

2.1 Mathematical Modeling of the Quanser 3 DOF Hover Model

The free-body diagram of the Quanser 3 Degree Of Freedom (DOF) Hover is shown
in Fig. 1.
The 3 DOF Hover modeling conventions:
• The 3 DOF Hover is horizontal (i.e., parallel with the ground platform) when the
pitch and roll angles are zero, θp = 0 and θr = 0.
• The yaw angle increases positively, θ̇y(t) > 0, when the body rotates in the
counter-clockwise (CCW) direction.
• The pitch angle increases positively, θ̇p(t) > 0, when rotated CCW.
• The roll angle increases positively, θ̇r(t) > 0, when rotated CCW.
When a positive voltage is applied to any DC motor of the 3 DOF Hover, a positive
thrust force is produced, making the corresponding propeller assembly rise. The
thrust forces generated by the front, back, right, and left DC motors are denoted
Ff, Fb, Fr and Fl, respectively. The thrust forces created by the front and back DC
motors mainly control the motion about the pitch axis, while the right and left DC
motors mainly move the hover about its roll axis. The pitch angle increases when the
thrust force from the front DC motor is larger than that of the back DC motor,
Ff > Fb. The roll angle increases when the thrust force from the right DC motor is
larger than that of the left DC motor, Fr > Fl.

2.1.1 Pitch and Roll Axis Model

The dynamics for each axis can be described by the general equation:

J θ̈ = F L

Fig. 1 Free-body diagram of the 3 DOF hover

Fig. 2 Free-body diagram of the pitch axis

where θ is the angle of the pivot, L is the distance between the propeller motor and the
pivot on the axis, J is the moment of inertia about the axis, and F is the differential
thrust-force. With the force diagram in Fig. 2, we can model the pitch axis using the
equation.
 
J p θ̈ p = K f V f − Vb

where K f is the thrust force constant, V f is the front motor voltage, Vb is the back
motor voltage, θ p is the pitch angle, and J p is the moment of inertia about the pitch
axis. This follows the conventions shown in Fig. 1, where the pitch angle increases
when the front motor voltage is larger than the back motor.
Similarly, for the roll axis we have

Jr θ̈r = K f (Vr − Vl )

where Kf is the thrust force constant, Vr is the right motor voltage, Vl is the left
motor voltage, θr is the roll angle, and Jr is the moment of inertia about the roll axis.
The roll angle increases when the right motor voltage is larger than the left motor
voltage.

2.1.2 Yaw Axis Model

The motion about the yaw axis, shown in Fig. 3, is caused by the difference in torques
exerted by the two counter clockwise and two clockwise rotating propellers.

Jy θ̈ y = τ = τl + τr − τ f − τb

where τl and τr are the torques generated by the left and right clockwise propellers
and τf and τb are the torques exerted by the front and back counter-clockwise rotors.
By convention, the counter-clockwise torques are negative. The torque generated by
each propeller is assumed to be τ = Kt·Vm, where Kt is the thrust-torque constant
and Vm is the motor voltage. Thus, in terms of applied voltage, the yaw axis equation
of motion is
 
Jy θ̈ y = K t (Vr + Vl ) − K t V f + Vb

Fig. 3 Free-body diagram of the yaw axis

2.1.3 State Space Model

The state space representation is given by ẋ = Ax + Bu and y = Cx + Du.
For the Quanser 3 DOF Hover, we define the state vector

xᵀ = [θy θp θr θ̇y θ̇p θ̇r]

the output vector yᵀ = [θy θp θr], and the control vector uᵀ = [Vf Vb Vr Vl].
Using the equations of motion, the corresponding 3 DOF hover state-space matrices
are as follows (Table 1):

Table 1 Parameters of the system

Symbol | Description | Value | Unit
Kt,n | Counter-rotation propeller torque-thrust constant | 0.0036 | N·m/V
Kt,c | Normal-rotation propeller torque-thrust constant | 0.0036 | N·m/V
Kf | Propeller force-thrust constant | 0.1188 | N/V
l | Distance between pivot and each motor | 0.197 | m
Jy | Equivalent moment of inertia about the yaw axis | 0.110 | kg·m²
Jp | Equivalent moment of inertia about the pitch axis | 0.0552 | kg·m²
Jr | Equivalent moment of inertia about the roll axis | 0.0552 | kg·m²
A = [ 0 0 0 1 0 0
      0 0 0 0 1 0
      0 0 0 0 0 1
      0 0 0 0 0 0
      0 0 0 0 0 0
      0 0 0 0 0 0 ]

B = [ 0         0         0         0
      0         0         0         0
      0         0         0         0
      −Kt/Jy    −Kt/Jy    Kt/Jy     Kt/Jy
      L·Kf/Jp   −L·Kf/Jp  0         0
      0         0         L·Kf/Jr   −L·Kf/Jr ]

C = [ 1 0 0 0 0 0
      0 1 0 0 0 0
      0 0 1 0 0 0 ]

D = [ 0 0 0 0
      0 0 0 0
      0 0 0 0 ]

2.2 Control Design

2.2.1 State Feedback

The state feedback controller is designed to regulate the pitch, roll, and yaw angles of
the Quanser 3 DOF Hover model to desired positions. The control gains are computed
using the Linear Quadratic Regulator (LQR) and Sliding Mode Control (SMC)
algorithms in the sections below.
The state feedback controller for the motors is defined as

u = [Vf Vb Vr Vl]ᵀ = K(xd − x) + u_bias  if u ≥ 0, and 0 if u < 0

where x is defined in the section above, K ∈ R^(4×6) is the control gain,

xd = [θd,y θd,p θd,r 0 0 0]ᵀ

is the set-point vector (i.e., reference angles) and

 
u_bias = [Vbias Vbias Vbias Vbias]ᵀ

is the bias voltage, i.e. a fixed constant voltage applied to each DC motor. Adding a
bias voltage to each propeller prevents the voltage from going below zero and cutting
off. This makes the system more responsive. Allowing only positive thrust also makes
the system resemble more closely how actual VTOL aircraft and helicopters operate:
the propellers cannot reverse their direction.
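A minimal numpy sketch of this feedback law follows; K, x, xd and u_bias are assumed to be available as arrays with the dimensions given above.

```python
import numpy as np

def motor_voltages(K, x, xd, u_bias):
    """State feedback: u = K(xd - x) + u_bias, clipped at zero."""
    u = K @ (xd - x) + u_bias
    return np.maximum(u, 0.0)   # only positive thrust commands are allowed
```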

2.2.2 Linear Quadratic Regulator (LQR)

The control gains are computed using the Linear Quadratic Regulator (LQR) scheme,
with the feedback law u = −Kx and the weighting matrices
Q = diag(500, 350, 350, 0, 20, 20)

and

R = diag(0.01, 0.01, 0.01, 0.01)

Given these weighting matrices and the state-space matrices, the control gain

K = [ −111.8   132.3   0       −41.41   36.23   0
      −111.8  −132.3   0       −41.41  −36.23   0
       111.8   0       132.3    41.41   0       36.23
       111.8   0      −132.3    41.41   0      −36.23 ]

is computed by minimizing the cost function (Fig. 4)


J = ∫₀^∞ (xᵀ Q x + uᵀ R u) dt
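A sketch of this LQR computation in Python (scipy) is shown below, assuming the A and B matrices built from the Table 1 parameters and the Q, R weights given above; the resulting gain should have the same structure as the K reported above.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

Kt, Kf, L = 0.0036, 0.1188, 0.197        # Table 1 parameters
Jy, Jp, Jr = 0.110, 0.0552, 0.0552

A = np.zeros((6, 6))
A[0:3, 3:6] = np.eye(3)                  # angles integrate the angular rates

B = np.zeros((6, 4))
B[3] = [-Kt / Jy, -Kt / Jy, Kt / Jy, Kt / Jy]   # yaw axis
B[4] = [L * Kf / Jp, -L * Kf / Jp, 0.0, 0.0]    # pitch axis
B[5] = [0.0, 0.0, L * Kf / Jr, -L * Kf / Jr]    # roll axis

Q = np.diag([500.0, 350.0, 350.0, 0.0, 20.0, 20.0])
R = np.diag([0.01] * 4)

P = solve_continuous_are(A, B, Q, R)     # solve the algebraic Riccati equation
K = np.linalg.solve(R, B.T @ P)          # K = R^-1 B^T P
```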

Fig. 4 Simulink model of LQR control on 3 DOF hover

2.2.3 Sliding Mode Control (SMC)

The control gains are computed using the Sliding Mode Control (SMC) scheme, with
the feedback law (Fig. 5)

U(t) = Uc(t) + Ueq(t)

where U(t) is the control law, Uc(t) the corrective control, and Ueq(t) the equivalent
control.

Fig. 5 Closed loop system of quadrotor dynamics with SMC control



3 Simulation Result

3.1 Simulation of the Quanser 3 DOF Hover Control Module

Based on the equations of motion described in the earlier section for the combined
dynamics of the Quanser 3 DOF Hover model, Simulink models of the Linear
Quadratic Regulator (LQR) controller and Sliding Mode Control (SMC) controller
were developed for the 3 DOF Hover setup [5]. These were developed in the first
phase, and the training and validation data were collected from real-time operation
of the Quanser setup for predefined commands (i.e. pitch, roll, and yaw).
This is an attitude command (control) only model. In other words, there is no
control system to track position (a model limitation). Instead the controller can only
track the attitude θp, θr, θy.
An Artificial Neural Network (ANN) was created from the data collected during
real-time operation of the Quanser 3 DOF Hover model (as shown below). The inputs
of the ANN were the desired angles pitch (θp), roll (θr), and yaw (θy); the outputs of
the ANN were assigned to the voltages of the respective DC motors: front motor
voltage (Vf), back motor voltage (Vb), left motor voltage (Vl), and right motor
voltage (Vr).

3.2 Simulation Result and Analysis

For both controllers, LQR and SMC, neural networks with 10 neurons (2 layers) and
20 neurons (2 layers) respectively were created. Training of the neural networks was
done with the real-time data collected from the Quanser 3 DOF hover module [5]
(Figs. 6 and 7).

Fig. 6 Artificial feed-forward neural network with 10 neurons

Fig. 7 Artificial feed-forward neural network with 20 neurons

3.2.1 Training and Validation for LQR Control

The LQR-control network was trained with 9896 data points from real-time operation
using the Levenberg-Marquardt (TRAINLM) training function with random data
division. The adaptation learning function used for the neural network is LEARNGDM.
Training of the LQR-control-based Artificial Neural Network (ANN) was done with
10 neurons (2 layers) and 20 neurons (2 layers) respectively. After validation, the
following plots were obtained: performance plot, training state plot, and regression
plot (as shown below).
The performance plot gives the Mean Squared Error (MSE); the best validation
performance of 6.9769 was achieved after 252 epochs. This shows the relationship
between the outputs of the network and the targets. The training state plot gives the
gradient (used for the calculation of the weights in the neural network), mu (training
gain/momentum update) and the validation checks (Figs. 8 and 9).
The regression plots of training, validation, test, and all are shown below. The 'R'
value indicates the relationship between outputs and targets. Here, the LQR-control-based
ANN with 10 neurons has a testing regression R of 0.51789 and an overall R of
0.51229 (Fig. 10).
The performance plot gives the MSE; the best validation performance of 3.2229 was
achieved after 151 epochs. The LQR-control-based ANN with 20 neurons gives better
results than the 10-neuron network: increasing the number of neurons reduced the
MSE to 3.2229 with fewer iterations, i.e. 151 (Figs. 11 and 12).
The regression plots of training, validation, test, and all are shown below. The 'R'
value indicates the relationship between outputs and targets. Here, the LQR-control-based
Fig. 8 Performance plot of LQR based ANN with 10 neurons



Fig. 9 Training state plot of LQR based ANN with 10 neurons

Fig. 10 Regression plot of LQR control based ANN with 10 neurons

ANN with 20 neurons has a testing regression R of 0.7398 and an overall R of 0.7511,
better than the 10-neuron network (Fig. 13).
A nonlinear input-output time series dynamic neural network uses previous (past)
values of one or more time series to predict future values. Dynamic artificial neural
networks which include tapped delay lines are used for non-linear filtering and
prediction.

Fig. 11 Performance plot of LQR based ANN with 20 neurons

Fig. 12 Training state plot of LQR based ANN with 20 neurons

The output of the dynamic neural network (LQR control) with 20 neurons uses a delay
of 2. The non-linear input-output network gives the time series response and the error
histogram (Figs. 14 and 15).

Fig. 13 Regression plot of LQR based ANN with 20 neurons

Fig. 14 Time series response of LQR based ANN with 20 neurons

3.2.2 Training and Validation for SMC Control

The SMC-control network was trained with 10,000 data points from real-time
operation using the Levenberg-Marquardt (TRAINLM) training function with random
data division. The adaptation learning function used for the neural network is
LEARNGDM.

Fig. 15 Error Histogram for LQR based ANN with 20 neurons

Training of the Sliding Mode Control (SMC) based Artificial Neural Network (ANN)
was done with 10 neurons (2 layers) and 20 neurons (2 layers) respectively. After
validation, the following plots were obtained: performance plot, training state plot,
and regression plot (as shown below).
The performance plot gives the Mean Squared Error (MSE); the best validation
performance of 3.3431 was achieved after 46 epochs. This shows the relationship
between the outputs of the network and the targets. The training state plot gives the
gradient (used for the calculation of the weights in the neural network), mu (training
gain/momentum update) and the validation checks (Figs. 16, 17 and 18).
The regression plot of the SMC-based ANN with 10 neurons gives an 'R' value of
0.6030 for testing and an overall R of 0.5921.
The performance plot gives the MSE; the best validation performance of 3.9639 was
achieved after 119 epochs. The SMC-control-based ANN with 20 neurons gives a more
sluggish result than the one with 10 neurons (Figs. 19, 20 and 21).
A non-linear input-output time series dynamic neural network uses previous (past)
values of one or more time series to predict future values. Dynamic artificial neural
networks which include tapped delay lines (in this case, a delay of 2) are used for
non-linear filtering and prediction with the earlier 20-neuron network [6]. The time
series response obtained is better than in all the other cases considered, as is the
error histogram (as shown below) (Figs. 22 and 23).

Fig. 16 Performance plot of SMC based ANN with 10 neurons

Fig. 17 Training state for SMC based ANN with 10 neurons

4 Conclusion

An Artificial Neural Network is used in the plant to simplify the design of the Linear
Quadratic Regulator (LQR) and Sliding Mode Control (SMC) controllers and to
decrease the computational time and complexity.
The LQR- and SMC-controller-based ANNs give good outputs for the 3 DOF Hover
Quanser setup in real-time simulation [7], and the research was able to show the
comparison and analysis between the LQR- and SMC-controller-based ANNs

Fig. 18 Regression plot for SMC based ANN with 10 neurons

Fig. 19 Performance plot of SMC control based ANN with 20 neurons

with 10 and 20 neurons respectively. Ultimately, the SMC-control-based ANN with
20 neurons and the time delay series gave better results than the other cases.

Fig. 20 Training state of SMC control based ANN with 20 neurons

Fig. 21 Regression plot for SMC control based ANN with 20 neurons

ANNs are the right choice for any plant system where one needs an adaptive response
with time and environment and where the plant dynamics are inaccurate or expensive
to model. The ANN with 20 neurons generated a control signal exactly in phase with
the desired output, with some gain in its amplitude.

Fig. 22 Time series response for SMC control based ANN with 20 neurons

Fig. 23 Error histogram for SMC control based ANN with 20 neurons

References

1. Cao, J., Yan, C., Wang, X.N.: Application research of integrated design using reinforcement
learning model. Appl. Mech. Mater. (2014)
2. Cao, J., Yan, C., Wang, X.N.: Application research of integrated design using neural networks.
Appl. Mech. Mater. (2014)
3. Hamel, T., Mahony, R., Lozano, R., Ostrowski, J.: Dynamic modeling and configuration
stabilization for and x4-yer. In: 15th Triennial World Congress of the IFAC, Spain, 2002
4. Besada-Portas, E., Lopez-Orozco, J.A., Aranda, J., de la Cruz, J.M.: Virtual and remote practices
for learning control topics with a 3DOF quadrotor. In: IFAC Proceedings Volumes, 2013

5. Quanser Inc.: Quanser 3 DOF (Degree of Freedom) Hover user manual. vol. 1 (2013)
6. Mutaz, T., Ahmad, A.: Solar radiation prediction using radial basis function models. In: 2015
International Conference on Developments of E-Systems Engineering (DeSE), 2015
7. Apkarian, J., Levis, M.: Quanser Inc. “Quanser 3 DOF (Degree of Freedom) Hover experiment
for MATLAB/Simulink” vol. 1 (2013)
Cognitive Demand Forecasting
with Novel Features Using Word2Vec
and Session of the Day

Rishit Dholakia, Richa Randeria, Riya Dholakia, Hunsii Ashar and Dipti Rana

Abstract Demand Forecasting is one of the most crucial aspects in the supply chain
business to help the retailers in purchasing supplies at an economical cost with the
right quantity of product and placing orders at the right time. The present investigation
utilizes a years’ worth of point-of-sale (POS) information to build a sales prediction
model, which predicts the changes in the sales for the following fortnight from the
sales of previous days. This research describes the existing and newly proposed
features for demand forecasting. The motivation behind this research to provide
novel features is to obtain an improved and intuitive demand forecasting model.
Two features proposed are: Item categorization using word2vec with clustering and
session of the day based on the time. The demand forecasting models with traditional
features like seasonality of goods, price points, etc. together with our proposed novel
features achieve better accuracy, in terms of lower RMSE, compared to demand
forecasting models with only traditional features.

Keywords Retail industry · Product categorization · Demand forecasting · Novel


features · Word2vec · Word embeddings · Session of the day

1 Introduction

The retail business globally is expanding at a rapid pace. With increasing
competition, every retailer needs to effectively adapt to the impending demand. This
additionally implies a growing shift towards efficiency and a conscious step away
from excess and waste of product. In recent times, a company's most valuable asset
is the data generated by its customers. Consequently, it has become popular to try
to win business benefits from analysing this data. Using this approach in big and
small scale industries is our aim; hence the focus is to provide a much more
intuitive approach to utilizing the generated data, using the latest advancements

R. Dholakia (B) · R. Randeria · R. Dholakia · H. Ashar · D. Rana


Sardar Vallabhbhai National Institute of Technology, Surat, India
D. Rana
e-mail: dpr@coed.svnit.ac.in

like word2vec based word embedding and session of the day. This would help the
retailers in business strategy improvement, by using accurate forecasting of the sales
of every item.

1.1 Motivation

In past research papers, some traditional and obvious features such as price,
holiday and stock code were used to perform demand forecasting. On detailed
analysis and observation, it was identified that categorizing each product using
word2vec-based word embeddings and forming clusters would be more helpful to
generalize each product, providing more intuition in predicting the quantity of the
product. Also, in order to smooth the fluctuating variation of the time series data,
it was found that categorizing the day into sessions would provide more accurate
results. Utilizing the traditional features along with the proposed novel features
thus provides better accuracy.

1.2 Problem Statement

Improved demand forecasting using a more intuitive approach with novel features
such as word2vec-based item category and session of the day based on time.
The further sections of the report are organised as follows. Section 2 talks about
the survey done on the prior forecasting techniques and the theoretical background
of those techniques. Section 3 describes the proposed framework and methodologies
used in this research. Section 4 consists of the pre-processing techniques used on
the raw data to prepare it for prediction using machine learning models. Section 5
consists of feature engineering techniques which include clustering of items on
word2vec based data and relevant attributes creation to improve accuracy in pre-
diction. Section 6 describes the experimental analysis of forecasting models for
analysing the trade-off between different models and trade-off between inclusion
and exclusion of novel features. Section 7 consists of the research conclusion and
future work.

2 Theoretical Background and Literature Survey

Traditional features such as past weeks’ sales data, price of each item, presence of
holidays etc. have been used to generate a predictive model. Also, various statistical
techniques such as exponential smoothing, ARIMA regression models, SVR, etc.
have been analysed for this application. Word2Vec algorithm has earlier been used
with respect to different applications requiring word embedding.

2.1 Research and Analysis

Past work in this field includes work related to features and various predictive models
used for demand forecasting. Analysis on numerous applications of word2vec was
also done.

Word2vec Based Word Embeddings Word2Vec is used in various applications,
such as dependency parsing, named entity recognition, sentiment analysis, and
information retrieval. One such application, mentioned by Sachin et al., is
evaluating exam papers automatically using word2vec word embeddings [1].
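As a minimal sketch of the item-categorization idea proposed in this paper (assuming gensim 4.x and scikit-learn; the sample product descriptions and the cluster count are illustrative): descriptions are tokenized, word2vec embeddings are trained on them, each item is represented by the mean of its word vectors, and items are grouped with k-means.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

descriptions = ["white hanging heart t-light holder",
                "red woolly hottie white heart",
                "set of 3 cake tins pantry design"]
tokens = [d.lower().split() for d in descriptions]

# Train word2vec on the product descriptions.
w2v = Word2Vec(tokens, vector_size=50, window=3, min_count=1, epochs=50)

# Represent each item by the mean of its word vectors, then cluster.
item_vecs = np.array([np.mean([w2v.wv[t] for t in sent], axis=0)
                      for sent in tokens])
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(item_vecs)
```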

Input Features Retail forecasting research by Fildes et al. [2] mentioned the need to
view the POS (point of sale) data from different perspectives such as seasonality,
calendar events and weather, together with the past-week sales data of an item
mentioned in Martin's blog [3]. This was done in order to capture hidden yet
significant trends in the data.

Retail product sales data have strong seasonality and usually contain multiple
seasonal cycles of different lengths, i.e. sales exhibit weekly or annual trends. Sales
are high during the weekends and low during the weekdays, high in summer and low
in winter. Data may also possess biweekly or monthly (pay check effects) or even
quarterly seasonality, depending on the nature of the business and business location
[2]. For this reason, models used in forecasting must be able to handle multiple
seasonal patterns, hence gain maximum knowledge from the data. Retail sales data
are strongly affected by some calendar events. These events may include holidays,
festivals and special activities (e.g., important sport matches or local activities) [2].
Most research includes dummy variables for the main holidays in their regression
models. Certain variables are not related to the chosen dataset and hence are ignored
for example weather.
To capture the trends in change of demand comparing to past few weeks, be it
upward or downward, another set of variables must be included which are the sales
of the item in past one week, past two weeks and past three weeks [3].

Predictive Models Kris et al. suggested a few models such as regression trees,
principal components regression etc., to be used in this retail forecasting scenario [4].
Regression trees are decision trees used for continuous dependent variables in
datasets containing many features interacting in nonlinear ways. To handle the
non-linearity, the space is recursively partitioned into smaller regions to form
sub-regions, and the resulting chunks are progressively fitted with simple models [5].
XGBOOST and LGBM are based on regression trees [5]. Principal component
regression is regression based on principal component analysis: the principal
components are calculated and used as predictors in a linear regression model [6].

Another researcher Xia et al. suggested that in order to deal with seasonality and
limited data problems of retail products, a seasonal discrete grey forecasting model
such as Fuzzy Grey regression model and ANN model should be used [7].

Current trends in machine learning models are bagging and boosting, which include
XGBOOST and LGBM; other models such as SVR and ARIMA are also used for
formulating predictive solutions. Bagging and boosting algorithms drastically
increase accuracy by learning from weak learners. LGBM is one such boosting
algorithm which is fast, distributed and high-performing, produces generalized
results, and grows decision trees leaf-wise. XGBOOST is another powerful model
which grows trees level-wise and provides inbuilt regularization to increase speed
and work around the problem of overfitting. SVRs are used to perform non-linear
regression by mapping the data to a multidimensional space using kernels.
Time-series data is handled using ARIMA models.
XGBOOST, ANN and LGBM were considered for implementation. PCR was not
considered, as PCA was not required on our data. The fuzzy grey regression model
was not implemented as it works on limited time series data (50 observational time
stamps) for predicting the sales output. SVR and ARIMA were also considered, but
as ARIMA could only be used individually on each product, it is not covered in this
paper; SVR uses multidimensional kernels to perform regression, requiring large
computational power and memory for a large dataset, so it is likewise not covered.
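A minimal LightGBM sketch of the boosted-tree setup discussed above follows; the random arrays stand in for the engineered feature matrix (price, lag sales, session of the day, item cluster, calendar flags) and the quantity target, and the hyper-parameters are illustrative.

```python
import numpy as np
import lightgbm as lgb
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))     # stand-in for engineered features
y = rng.poisson(5.0, size=1000)    # stand-in for quantities sold

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2)

model = lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05, num_leaves=31)
model.fit(X_tr, y_tr)

rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
print("RMSE:", rmse)
```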

3 Proposed Framework

From the literature review and after analysing the latest trends, the proposed
framework for cognitive demand forecasting is shown in Fig. 1, with the following
objectives:
• Collect and pre-process the data.
• Group the items based on item similarities to make effective models such that the
disparity of items is less and accurate predictions can be made on the data using
word2vec based word embeddings.
• Derive the Session of the day feature from timestamp information.
• Aggregate features for prediction.
• Predict stock requirement using eclectic machine learning algorithms.
The workflow of the proposed framework, along with a brief introduction of each
step, is as follows; the detailed workflow is described in the later sections.

Fig. 1 Proposed framework with novel features



3.1 Dataset Description

UCI repository dataset was used for this research work [8].
• Invoice No: Invoice number uniquely assigned to each transaction. If this code
starts with letter ‘c’, it indicates a cancellation.
• Stock Code: Product (item) code uniquely assigned to each distinct product.
• Description: Product (item) name.
• Quantity: The quantities of each product (item) per transaction.
• Invoice Date: Invoice Date and time, the day and time when each transaction was
generated.
• Unit Price: Unit price. Product price per unit in sterling.
• Customer ID: Customer number uniquely assigned to each customer.
• Country: Country name. The name of the country where each customer resides.
The data set is time-series data, consisting of the exact time and date of the items purchased, and covers many transactions throughout the year, with around 541,910 tuples in total. It contains one of the most important attributes, Invoice Date, from which other features such as seasons, weekends and weekdays can be generated.

3.2 Data Pre-processing

Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviours or trends, and is likely to contain many errors. Data pre-processing [9] is a proven method of resolving such issues: it prepares raw data for further processing by transforming it into an understandable format.
Anomalous rows and noise were identified and rectified by identifying a pattern throughout the dataset. Anomalous rows include negative values for the quantity of an item, quantities of products above 70,000, and invalid stock codes; these were removed using regular expressions. Duplicate rows were removed and unneeded columns were dropped. The data set was highly skewed in terms of unit price, so a logarithm transformation [10] was applied to the price column. Heteroscedasticity was also identified and removed using the Box-Cox transformation [11].
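As an illustrative sketch of these cleaning and transformation steps (the file name and column names are assumptions based on the dataset description in Sect. 3.1):

import numpy as np
import pandas as pd
from scipy import stats

df = pd.read_csv("online_retail.csv")

# Drop cancellations (invoice numbers starting with 'c'), invalid
# quantities and duplicate rows.
df = df[~df["InvoiceNo"].astype(str).str.upper().str.startswith("C")]
df = df[(df["Quantity"] > 0) & (df["Quantity"] <= 70000)]
df = df.drop_duplicates()

# Logarithm transformation to correct the skew in unit price.
df["LogUnitPrice"] = np.log1p(df["UnitPrice"])

# Box-Cox transformation (strictly positive values only) to reduce
# heteroscedasticity.
positive = df["UnitPrice"] > 0
transformed, _ = stats.boxcox(df.loc[positive, "UnitPrice"])
df.loc[positive, "BoxCoxUnitPrice"] = transformed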

3.3 Feature Engineering Approach

A well-known characteristic of any data set is garbage in, garbage out: if dirty data is passed into a model, we get garbage values out of the model, resulting in low accuracy. So, in order to get the best accuracy, the concept of feature engineering is used. Feature engineering consists of feature extraction, i.e. including only the required features and eliminating the irrelevant ones, together with feature segmentation, feature creation and feature augmentation.
Segmenting the Day into Sessions The Invoice Date is split into hour categories known as sessions of the day (e.g. morning, night). This is done in order to analyse the trends of the product market during different time periods of the day. It can even help in finding anomalies in the data set that might not be discovered through a normal database scan.
Analysis of Past few Weeks Another strategy involves finding the variations in the quantity sold for items over a range of past days. Deriving a feature such as the past few days' sales gives important weight to the weekly or fortnightly analysis of each product. Since the data is of non-perishable products, a weekly analysis of the past 3 weeks provides a much better trend intuition. This feature is used to accurately predict the safety stock required for the product, as suggested by Martin [3].
Day of the Week InvoiceDate is an important feature for stock prediction. The date attribute can also be broken down into the days of the week, like Monday, Tuesday etc., after which it is converted into a numerical attribute signifying weekdays and weekends, indicating the past daily or weekly sales of the product [3]. This is also used to gain knowledge and insights on the seasons for a better forecast.
Word2vec Based Categorical Data In this approach, a categorical feature is created by clustering similar products, as described in a later section of this paper; the clusters are generated using word2vec based word embeddings, which provide the required information about each product's similarity to other products.

3.4 Machine Learning Model

This research also focuses on training the data using recently popular machine learn-
ing models like XGBOOST, LGBM and classic model like Artificial Neural Net-
works. It also focuses on the accuracy trade-off between different models and accu-
racy trade-off between inclusion and exclusion of the word2vec based categorical
data and sessions of the day feature. The output of this model is the forecasted stock
value of the product.

4 Feature Engineering

Feature Engineering involves using domain knowledge to create, extract and reduce
features. This is necessary to generate comprehensive knowledge from the data for
more accurate results. Features are augmented and processed. Incubating features
allows the model to derive higher accuracy and knowledge from the data.

4.1 Feature Segmentation

Feature segmentation is performed for better analysis of the data. The InvoiceDate attribute is broken down into year, month, week, day and hour. These categories generate one of our proposed novel features, session of the day, as well as other traditional features like the numerical attribute for the particular day of the week and the past 3 weeks' sales of each product.
Session of the day provides more intuitive knowledge of the data that can be used in developing particular marketing schemes for a given product, depending on its highest-demand time of the day in hours, i.e. morning (8:00–12:00), afternoon (12:00–16:00), evening (16:00–20:00) or night (20:00–24:00). For example, the greatest percentage of newspapers is sold during the morning session. This provides intuitive knowledge regarding the outflux session of the particular product during the day.
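A minimal sketch of this derivation, continuing the pre-processing sketch above (hours before 8:00 defaulting to night is an assumption):

import pandas as pd

def session_of_day(hour):
    # Map an hour of the day to one of the four sessions.
    if 8 <= hour < 12:
        return "morning"
    if 12 <= hour < 16:
        return "afternoon"
    if 16 <= hour < 20:
        return "evening"
    return "night"

df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])
df["Session"] = df["InvoiceDate"].dt.hour.apply(session_of_day)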
Identifying the day of the week, i.e. weekday or weekend, caters to analysing the trend of which period of the week an item is bought in. Usually, decorative party items or drinks are bought during the weekend rather than on weekdays.
As suggested by Martin in his blog [3], the previous sales for the past 3 weeks for each product are derived. This is an important feature for predicting the requirement of a particular item for the current week with respect to its sales in the past few weeks. It provides additional insight to the model by identifying a sudden rise or drop in the sales of an item, which can be attributed to an indirect, unidentified attribute that is only discoverable from past weeks' sales. Three weeks were chosen as the retail data contains non-perishable goods such as furniture.
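One way to derive these lag features with pandas, continuing the sketch above (weekly totals per product are computed first, then shifted by one to three weeks):

weekly = (
    df.set_index("InvoiceDate")
      .groupby("StockCode")["Quantity"]
      .resample("W")
      .sum()
      .reset_index(name="WeeklySales")
)
for lag in (1, 2, 3):
    weekly[f"SalesLag{lag}w"] = (
        weekly.groupby("StockCode")["WeeklySales"].shift(lag)
    )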

4.2 Feature Augmentation

Feature augmentation is used to add features from domains other than the one the data has come from. This helps in increasing the accuracy of the model by considering possible relationships outside the scope of the data provided. Another important feature for the construction of the model is the Boolean holiday feature. It is generated by taking the InvoiceDate and mapping this date to a holiday dataset obtained from a holiday API. This feature helps to identify the surge in sales of an item prior to and during the holiday period [3].
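The paper derives this flag from a holiday API; as an assumption, the same idea can be sketched with the open-source holidays package (the UCI data comes from a UK-based retailer):

import holidays

uk_holidays = holidays.UnitedKingdom()
df["IsHoliday"] = df["InvoiceDate"].dt.date.apply(lambda d: d in uk_holidays)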

4.3 Word2vec Based Categorical Data

Each of the 500 unique products is mapped to a category manually, in order to create a feature which provides more intuition into every product. For example, the category handbags consists of unique handbags such as floral handbags, checkered handbags and tote bags. This results in the creation of 351 different categories.
For easier organization of these categories in the dataset and modularity of each category, it is better to term them subcategories and group them into their respective categories. For example, different types of bags like handbags, soft bags and water bags belong to the category "bag". These categories are the clusters discussed later in this section. Aggregation of these different subcategories into categories can be done as shown in Fig. 2.
To categorize these sub-categories, human labour would be needed to individually identify similarities between the subcategories. This is a more cumbersome task than the former task of sub-categorization. The creation of categories from subcategories can be completed in less time by providing the machine with the intelligence to know the meaning of the subcategories and to find similarities between them to form categories. To solve this problem, word2vec based word embedding is used [12]: the meaning of each word is found using word2vec based word embeddings, and the words are categorized into clusters. Figure 3 shows the workflow for the creation of the categorical data.

Working with Word2vec Based Word Embeddings The common words shown in Fig. 4 are represented in one-hot encoded form, as an m × n matrix. The presence of a word in a particular row is marked with 1.

For applications like language translation and chatbots, it becomes difficult for learning algorithms like recurrent neural networks to predict the next word when there is no relation among the word representations. For example, the words pencil and handbag are nowhere close to each other in the one-hot encoded representation; yet given the sentences "I want to buy a pencil for school" and "I want to buy a bag for school", the model should predict "for school" for the latter as well. The solution to this is to create a matrix

Fig. 2 Categorization of sub-categories

Fig. 3 Workflow of creating the categorical data using word2vec based word embedding

Fig. 4 One-hot encoded values of the products

that describes each word. The description of a word depends on the different relations it has with the factors that represent it; examples of such factors are age, gender, royal, food etc., which are then related to each of the words present. This concept of relating words to factors based on these values is known as word embedding. To generate the word embeddings, a word2vec model is used [13]. The embeddings are then used as input to the machine learning models. The meaning of each subcategory, like pencil, box etc., is provided by its word embedding. Using these word embeddings, the similarity of each sub-category is known, so that similar sub-categories can be clustered together. The embedding values are numeric. Figure 5 shows a part of the embedding matrix that is used for each of the vocabulary words.
The dimensions of these embedding matrices can vary; there may be a 300–1000 dimensional space depending on the amount of corpus available. Ready-made embedding matrices are usually available, since training the word vectors with a word2vec model requires a lot of computation, a long time period and a huge corpus of data. For this paper, a pre-trained word-vector model has been used: the Google News data set is one such corpus whose vocabulary word embeddings are used [14]. Since each embedding is a 300-dimensional vector, it is computationally expensive to create clusters of the words and difficult to visualize them. For this reason, the dimensionality of the vectors has to be reduced. PCA (principal component analysis) is used to reduce these dimensions for easier computation and better visualization of the object space [15]. For this paper, a two-dimensional space is used to view the points.
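A sketch of this embedding and projection step is shown below; the archive name is the standard Google News release, while the sub-category list is a hypothetical sample:

import numpy as np
from gensim.models import KeyedVectors
from sklearn.decomposition import PCA

w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

sub_categories = ["pencil", "pen", "handbag", "mug", "plate"]
vectors = np.array([w2v[w] for w in sub_categories if w in w2v])

# Reduce the 300-dimensional embeddings to 2-D for visualization.
points_2d = PCA(n_components=2).fit_transform(vectors)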

Working with Clustering of Data Now that the representations have been made,
these representations have to be clustered into groups to represent the metadata to
be worked with. For this research k-means clustering is used because it provides

Fig. 5 Embedding matrix of the products

Fig. 6 Optimum cluster representation

tighter clusters. The clusters formed represent the groups to which the subcategories of products belong. As shown in Fig. 6, the elbow method is used to find the optimum number of clusters [16].
Cluster values ranging from 5 to 8 and dimensions ranging from 2 to 5 were analysed using a cross-grid to determine which values form the correct clusters for the sub-categories. The range of dimensions is examined to establish how many dimensions provide a sufficient amount of information about the similarity of the products. A combination of 4 dimensions and 7 clusters is used to create the categorical feature. Since this is a newly created novel feature, the results of the clustering need to be compared against a benchmark to verify that the categorization is right. In the absence of previous benchmarks, the data was compared with online stores like Jarir Bookstore [17] and Home Centre [18]. The results proved to be accurate, with only a few miscategorizations. The screenshots below show the cluster to which each product belongs after applying clustering.
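Continuing the sketch above, the elbow analysis and the final 7-cluster, 4-dimension configuration can be reproduced roughly as follows:

from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

vectors_4d = PCA(n_components=4).fit_transform(vectors)

# Elbow method: inspect the inertia over the candidate cluster range.
inertias = [
    KMeans(n_clusters=k, random_state=0).fit(vectors_4d).inertia_
    for k in range(5, 9)
]

kmeans = KMeans(n_clusters=7, random_state=0).fit(vectors_4d)
category_labels = kmeans.labels_   # cluster id per sub-category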
From Fig. 7, it can be seen that art is a separate category and that subcategories like pencil, pen and bags belong to the same category, both on the online website of

Fig. 7 Novel cluster comparison with Jarir bookstore

Fig. 8 Novel cluster comparison with Home Centre

Jarir bookstore and in the clusters that have been made using word2vec based word embedding. Similarly, Fig. 8 depicts that mugs, cups and plates belong to the category dining on Home Centre's website as well as in the assigned cluster above.

5 Experiment Analysis

This section describes the implementation of the proposed features and the application of various models to the aforesaid data set. The experiments were executed using Anaconda Navigator on the Python 3.6 platform. The ANN, LGBM and XGBOOST models were trained and evaluated on the basis of the calculated root mean squared error (RMSE) values.

5.1 Model Trade-off

In this section the data is trained on different models like artificial neural networks,
regression trees like extreme gradient boosting and light gradient boosting.
The ANN model is used for both continuous and discrete dependent features; it is one of the most flexible models, as it requires a small amount of parameter tuning and provides good output results. The XGBOOST model is used for its speed in

Table 1 Comparison of accuracy of various models

Model            Train-set RMSE   Test-set RMSE
ANN (rectifier)  30.73            31.86
ANN (tanh)       24.84            28.29
LGBM             34.34            34.342
XGBOOST          8.22             11.86

training the data, and it provides effective output values for both the train and the test set compared to other gradient boosting algorithms. The LGBM model is used as it provides faster training and a much more generalized model, i.e. less variance between the training and test accuracies.
The pre-processed data contains an overlap of one month that is used for analysis and for showcasing the output of the working prediction model. The data is first randomised, in order to provide different variations of features from throughout the year, and is then split as 90% train-set and 10% test-set.
In this paper, the ANN model is trained with two different activation functions, namely rectifier and tanh, with two hidden layers, as there are 475,455 tuples to be trained. Each hidden layer consists of 100 nodes, as this is the optimum parameter tuning that gives the least RMSE value for the ANN. In order to get the least RMSE value for XGBOOST, the number of iterations is set to 100, the maximum depth of the tree is set to 25, and the number of CPU cores used for the training is 3. The LGBM model works as XGBOOST does but uses extra parameters, like the number of estimators, which is tuned to 6000. It is used in comparison to XGBOOST because it has a faster execution rate.
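A sketch of this model set-up, using the scikit-learn style wrappers, is given below; X_train, y_train, X_test and y_test stand for the engineered feature matrices and sales targets from the 90/10 split:

import numpy as np
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_squared_error

def train_and_score(X_train, y_train, X_test, y_test):
    xgb = XGBRegressor(n_estimators=100, max_depth=25, n_jobs=3)
    lgbm = LGBMRegressor(n_estimators=6000)
    scores = {}
    for name, model in [("XGBOOST", xgb), ("LGBM", lgbm)]:
        model.fit(X_train, y_train)
        for split, (X, y) in [("train", (X_train, y_train)),
                              ("test", (X_test, y_test))]:
            rmse = np.sqrt(mean_squared_error(y, model.predict(X)))
            scores[name + "-" + split] = rmse
    return scores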
It is observed from Table 1 that ANN and LGBM have high RMSE values but good generalization in comparison to XGBOOST, for which the training RMSE value is the least and the test RMSE is slightly higher. Since the variance between the test and train sets of XGBOOST is not very distinct compared to the other two algorithms, and since it provides the least RMSE value, it is safe to consider this model for further analysis.

5.2 Feature Trade-off

Once the model analysis is done and the best model is selected, it is necessary to check whether the additional features added to the dataset help in improving the predictions. The two main features, the word2vec based categorical feature and sessions of the day, are analysed. In this feature trade-off, the RMSE values are compared by removing one of these two features at a time and retraining. Over-fitting is also considered when checking the importance of the two features. Since the best model was XGBOOST, as mentioned in the model trade-off above, this same model with the same parameter tuning is used for training and testing. It can be seen from Table 2 that the exclusion of the novel features over-fits the model by a large margin.

Table 2 Comparison of XGBOOST accuracy with different features

Model                                                                    Train-set RMSE   Test-set RMSE
XGBOOST (without proposed novel item category feature)                   14.33            189.24
XGBOOST (without proposed novel sessions of the day feature)             11.03            25.78
XGBOOST (with proposed item category and sessions of the day features)   8.22             11.87

But the inclusion of these features reduces the variance margin by a large amount,
hence generalizing the model.

6 Conclusion

Demand forecasting was evaluated with novel features: word2vec based word embedding was used to generate the clusters to which each product belongs, and another novel feature, session of the day, was also generated. The improved data set was trained using three models, namely ANN, XGBOOST and LGBM. Upon evaluation, it was found that XGBOOST on a dataset using the clusters and sessions of the day provided improved accuracy, in terms of lower RMSE, compared to previous research papers on demand forecasting.

7 Future Works

The following is the proposed future work:
• Recency, Frequency, Monetary (RFM) segmentation allows retailers to target spe-
cific set of clusters of customers with offers that are much more relevant for their
particular behavioural patterns.
• Recommendation system for retailers suggesting the new items to be kept in the
store.
• Sentiment analysis of product to generate better sales brand wise.
• Using stacking of training models LGBM and XGBOOST.
• Applying the proposed novel features into other applications such as tweets
categorization, weather forecasting etc.

References

1. Sachin, B.S., Shivprasad, K., Somesh, T., Sumanth, H., Radhika A. D.: Answer script evalu-
ator: a literature survey. International Journal of Advance Research, Ideas and Innovations in
Technology 5(2) Ijariit (2019)
2. Fildes, R., Ma, S., Kolassa, S.: Retail forecasting: research and practice. MPRA Paper, University Library of Munich, Germany (2019)
3. Retail store sales forecasting. https://www.neuraldesigner.com/blog/retail-store-sales-
forecasting
4. Ferreira, K.J., Lee, B.H.A., Simchi-Levi, D.: Analytics for an online retailer: demand forecasting and price optimization. Manufacturing and Service Operations Management 18(1), 69–88 (2016)
5. Regression trees. http://www.stat.cmu.edu/~cshalizi/350–2006/lecture-10.pdf
6. Principal Components Regression. https://ncss-wpengine.netdna-ssl.com/wp-content/themes/
ncss/pdf/Procedures/NCSS/Principal_Components_Regression.pdf
7. Xia, M., Wong, W.K.: A seasonal discrete grey forecasting model for fashion retailing.
Knowledge-Based Systems 57, 119–126. Elsevier (2014)
8. Online retail dataset. https://archive.ics.uci.edu/ml/datasets/online+retail
9. Data pre-processing. https://en.wikipedia.org/wiki/Data_pre-processing
10. Dealing with Skewed data. https://becominghuman.ai/how-to-deal-with-skewed-dataset-in-
machine-learning-afd2928011cc
11. Dealing with Heteroscedasticity. https://www.r-bloggers.com/how-to-detect-
heteroscedasticity-and-rectify-it/
12. Word embeddings: exploration, explanation, and exploitation (with code in Python). https://
towardsdatascience.com/word-embeddings-exploration-explanation-and-exploitation-with-
code-in-python-5dac99d5d795
13. Word embedding and Word2Vec. https://towardsdatascience.com/introduction-to-word-
embedding-and-word2vec-652d0c2060fa
14. Google news dataset. https://code.google.com/archive/p/word2vec/
15. Principal component analysis. https://machinelearningmastery.com/calculate-principal-
component-analysis-scratch-python/
16. Elbow method (clustering). https://en.wikipedia.org/wiki/Elbow_method_(clustering)#cite_
note-3
17. Jarir bookstore website. https://www.jarir.com/sa-en/
18. Home center website. https://www.homecentre.in/
A Curvelet Transformer Based
Computationally Efficient Speech
Enhancement for Kalman Filter

Manju Ramrao Bhosle and K. N. Nagesh

Abstract In this paper, we propose an adaptive wavelet packet (WP) thresholding method with an iterative Kalman filter (IKF) and a Curvelet transformer for speech
enhancement. The WP transform is first applied to the noise corrupted speech
on a frame-by-frame basis, which decomposes each frame into a number of sub-
bands. For each sub-band, a voice activity detector (VAD) is designed to detect the
voiced/unvoiced parts of the speech. Based on the VAD result, an adaptive thresh-
olding scheme is then utilized to each sub-band speech to obtain the pre-enhanced
speech. To achieve a further level of enhancement, an IKF is next applied to the
pre-enhanced speech. An improved method based on Curvelet Transform using dif-
ferent window functions is presented for the speech enhancement. The window func-
tion is used for pre-processing of speech signals. In this method, instead of using
two-dimensional (2-D) discrete Fourier Transform, Curvelet transform is employed
with the spectral magnitude subtraction method. The proposed method is evaluated under
various noise conditions. Experimental results are provided to demonstrate the effec-
tiveness of the proposed method as compared to some previous works in terms of seg-
mental SNR and perceptual evaluation of speech quality (PESQ) as two well-known
performance indexes.

Keywords Wavelet packet (WP) · Voice activity detector (VAD) · Iterative Kalman
filter (IKF) · Perceptual evaluation of speech quality (PESQ) · Curvelet transform

1 Introduction

Since the 1990s, the wavelet transform has been widely used in speech recognition, image processing and seismic data de-noising, owing to its good localization property. Wavelets do a good job of approximating signals with sharp spikes or signals containing discontinuities. Curvelets are a suitable basis for
M. R. Bhosle (B)
Government Engineering College, Raichur, India
K. N. Nagesh
Nagarjuna College of Engineering and Technology, Bangalore, India

representing speech signals that are smooth apart from singularities along smooth curves, where the curves have bounded curvature, i.e. where objects in the speech have a minimum length scale.
In statistics, Kalman filtering, also known as linear quadratic estimation (LQE), is an algorithm that uses a series of measurements observed over time, containing statistical noise and other inaccuracies, and produces estimates of unknown variables that tend to be more precise than those based on a single measurement alone. The filter is named after Rudolf E. Kalman, one of the primary developers of its theory. The Kalman filter is a widely applied concept in time series analysis, used in fields such as signal processing and econometrics. The algorithm works in a two-step process. In the prediction step, the Kalman filter produces estimates of the current state variables, along with their uncertainties. At every discrete-time increment, a linear operator is applied to the state to generate the new state, with some noise mixed in, and optionally some information from the controls on the system if they are known. Then, another linear operator mixed with further noise generates the observed outputs from the "hidden" state. As soon as the outcome of the next measurement is observed, these estimates are updated using a weighted average, with more weight given to estimates with higher certainty. Extensions and generalizations of the method have also been developed, such as the extended Kalman filter and the unscented Kalman filter, which work on non-linear systems. The underlying model is similar to a hidden Markov model.
As a statement of our problem: speech quality and intelligibility may significantly degrade in the presence of background noise, particularly once the speech signal is subject to subsequent processing. In particular, speech coders and automatic speech recognition (ASR) systems that were designed or trained to act on clean speech signals might be rendered useless in the presence of background noise. Speech enhancement algorithms have consequently attracted a great deal of attention in the previous two decades. Some of the problems in existing procedures are as follows. Algorithms may be distinguished by the amount of a priori knowledge that is assumed about the clean speech signal. Some algorithms entail a training stage in which the clean speech parameters are estimated, prior to the application of the enhancement procedure, while other approaches do not require such a training phase. Most of the procedures run for a single iteration, which gives very little noise cancellation and low efficiency in recognition. To overcome this, iterative Kalman-filter based methods were brought into existence; however, they have certain disadvantages, as explained below. Some of the iterative batch algorithms are spectral subtraction, the short-time spectral amplitude (STSA) estimator, the hidden Markov model (HMM), the log-spectral amplitude estimator (LSAE) based filtering procedures, and the Wiener filter approach.
To reduce the computational load and to eliminate the delay of the iterative batch algorithm, a sequential algorithm may be used. Although in general the performance of the iterative batch procedure is superior, at low SNR the differences in performance are minor.
To address the aforementioned limitations, in this paper we propose an improved thresholding scheme with the IKF for speech enhancement on a frame-by-frame basis. The noisy speech is first decomposed into a number of sub-bands
with the WP. The VAD is then applied to each sub-band frame to determine whether
the frame is voice or noise. In contrast to most existing works where only a single
parameter is employed for voice/noise frame detection, our method makes use of
two measurements in the VAD stage: (i) frame energy and (ii) spectral flatness. A
VAD based adaptive thresholding scheme is then proposed for speech enhancement
in accordance with each sub-band frame activity. Finally, an IKF is used for further
noise reduction, which is followed by reconstruction of the full-band speech from
the enhanced sub-band speeches.
Speech enhancement has been widely studied for many years, and numerous speech enhancement approaches have been developed during the past decades. One purpose of speech enhancement is to deliver good quality speech communication in the presence of background noise and simultaneous interfering signals. Consequently, there is much interest in developing an effective speech enhancement method to recover the original speech from noisy observations. Some of the recent investigations and publications on speech processing and enhancement procedures and methods are described below, together with their accuracy and shortcomings.
In the paper, A Wavelet Fusion Method for Speech Enhancement by Xia et al. [1], in order to integrate the characteristics of different noise reduction procedures, a wavelet fusion method for speech enhancement is proposed. In the proposed method, the noisy speech is first decomposed into several sub-bands by wavelet packet analysis, and then enhanced by a statistical model based technique and a wavelet thresholding technique, respectively. The output of every sub-band is obtained under a fusion guideline based on cross correlation and the a priori SNRs of the two enhanced signal sets. Finally, the enhanced coefficients are transformed back to the time domain to obtain the enhanced speech. The performance of the proposed method is evaluated under ITU G.160. The experimental results show that, compared with the reference procedures, the proposed method yields enhanced speech with better objective quality, while the amount of noise reduction and SNR improvement remains at a high level. At the same time, the impact on the speech level is kept within an acceptable range.
Islam et al. [2], in Enhancement of Noisy Speech Based on a Decision-Directed Wiener Approach in the Perceptual Wavelet Packet Domain, proposed the following: for the enhancement of noisy speech, a method based on the decision-directed Wiener approach in the perceptual wavelet packet (PWP) domain is presented. The proposed method assumes an additive Gaussian noise model to derive the formulation for estimating the clean speech coefficients. The proposed method also considers the signal-to-noise ratio (SNR) information of the preceding frame to obtain estimates of the clean speech coefficients of the current frame. Using speech data available in the NOIZEUS database, a number of simulations were carried out to evaluate the performance of the proposed method for speech signals in the presence of babble, car and street noise. The proposed method outperforms some state-of-the-art speech enhancement approaches at both low and high levels of SNR in terms of standard objective measures and subjective assessments.
In Speech De-noising using Wavelet-based Methods with Focus on Classification of Speech into Voiced, Unvoiced and Silence Regions by Baishya et al. [3], the paper presents an improved speech enhancement procedure based on the wavelet transform, along with excitation-based classification of speech, to remove noise from speech signals. The method initially categorizes speech into voiced, unvoiced and silence regions on the basis of a novel energy-based threshold, and the wavelet transform is applied. To remove noise, thresholding is applied to the detail coefficients, taking into consideration the different characteristics of speech in the three different regions. For this, soft thresholding is used for voiced regions, hard thresholding for unvoiced regions, and the wavelet coefficients of silence regions are set to zero. Speech signals obtained from the SPEAR database and corrupted with white noise were used for the evaluation of the proposed method. Experimental results show that, in terms of PESQ and SNR scores, de-noising of speech is achieved using the proposed method. With respect to SNR, the best improvement is 9.4 dB when compared to the SNR of the original speech, and 1.3 dB as compared to the improvement attained by one of the recently reported approaches.
In Speech Enhancement by Combination of Transient Emphasis and Noise Cancelation by Rasetshwane et al. [4], the paper evaluates the effectiveness of combining speech modification methods that enhance transient components with active noise cancelation to improve the intelligibility of speech in noise. Two speech modification methods were considered. One is based on wavelet packet analysis, and the other uses a static filter, derived from time-frequency analysis, that emphasizes high frequencies. Active noise cancelation was provided by Bose noise-canceling headphones. The test noise was real, produced by a ground auxiliary generator on the tarmac at an Air National Guard facility. The test signals were speech tokens from the modified rhyme test, recorded by a male talker. This paradigm was used to measure word recognition rates at various signal-to-noise ratios. Active noise cancelation by itself provided over a 40% increase in word recognition, while the enhanced speech and static filter techniques alone provided up to 20% improvement, depending on the signal-to-noise ratio. In combination, the speech modification approaches delivered over 15% additional improvement in intelligibility over noise cancelation alone.
With the proposed iterative methodology, we try to improve the efficiency and decrease the time consumption using the wavelet and Kalman technologies.

2 Proposed Method

The proposed method is presented in Fig. 1. In this method we provide two proposed models. The first model is used to enhance the speech signal: we first take the noisy speech signal as input, and then different window functions are applied for speech enhancement. The window function is used for pre-processing of the speech signals [5]. In this method, instead of using the two-dimensional (2-D) discrete Fourier transform, the Curvelet transform is employed with the spectral magnitude subtraction method. The second model is used to improve the quality of the speech signal further beyond the first model.
In this paper, we have proposed a VAD-based adaptive WP thresholding scheme with an IKF for speech enhancement. The noisy speech is first decomposed into 8 sub-bands. Two features are selected for the VAD to detect whether a speech frame of each sub-band is a voice or a noise frame. Based on the VAD results [6], the threshold is updated for each frame of the different sub-bands, while each frame is adjusted by adaptive thresholding. Consider a time-domain noisy speech y(k) as given by

y(k) = s(k) + v(k) (1)

Here, s(k) is the kth sample of the clean speech, and v(k) is the noise. In this paper, the input noisy speech is first segmented into frames yn(k), where n is the frame index. The subsequent processing is then carried out on a frame-by-frame basis. Our proposed method consists of two consecutive stages. In the first stage, an improved VAD-based adaptive WP thresholding scheme is developed to reduce the noise of the unvoiced frames for each sub-band. In the second stage, the reconstructed and pre-enhanced full-band speech is processed by the IKF for further enhancement. The details of the proposed method are presented in the following two sub-sections.

2.1 Adaptive Thresholding

In this subsection, yn(k) is processed by means of a sub-band VAD scheme along with adaptive WP thresholding. The block diagram of the pre-enhancement stage is shown in Fig. 1, where the framed speech is first decomposed into several sub-bands. Each sub-band speech is denoted as yn(i)(k) [7], where i is the sub-band index. A VAD-based adaptive thresholding scheme is then applied to every sub-band, yielding an improved sub-band speech. After processing all sub-bands, WP reconstruction is adopted in order to rebuild the full-band enhanced speech signal ŷn(k).

Fig. 1 Block-diagram of pre-enhancement scheme
Figure 2 shows the flowchart of the VAD-based adaptive thresholding approach. The main idea of the VAD scheme is to extract the measured features from the input noisy speech, and to compare the feature values with thresholds that are calculated from noise frames. A voiced frame is identified if the measured values exceed the particular thresholds; otherwise, the input speech frame is considered a noise frame. When the VAD is implemented, a voice frame is marked as VAD = 1, whereas a noise frame is marked as VAD = 0.
For the proposed VAD procedure, we mainly consider threshold initialization. For every decomposed sub-band, we compute the two features over the first N frames; the minimum value of each feature among these frames is then taken as the initial thresholding value, denoted by E_T,0 and F_T,0 respectively.
The VAD process starts by computing the two features for each frame n (n ≥ 1), which results in E_n and F_n. Both feature values are then compared with the initial thresholding values E_T,0 and F_T,0, respectively. If the two feature values exceed the thresholds E_T,0 and F_T,0 respectively, speech frame n is marked as a speech frame and the two thresholding values are not updated. Otherwise, frame n is marked as a noise frame, and the two thresholding values are then updated [8].

Fig. 2 Flowchart of VAD based adaptive thresholding
 
E_T,nl = 40 log(((nl − 1) E_T,nl−1 + E_nl) / nl) + E_T,0    (2)

F_T,nl = α F_T,nl−1 + (1 − α) F_nl + F_T,0    (3)

Here nl is the index of the detected noise-only frame and α is an exponential smoothing factor. Figure 3 shows an example of the proposed VAD results. The analysed noisy speech has 12 frames and each frame length is 64. As can be seen, frames 1, 2, 6 and 7 are marked as noise frames, whereas frames 03–05 and 08–10 are detected as voice frames.
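The sketch below implements the two features and the threshold updates of Eqs. (2) and (3); the base-10 logarithm and the value of α are assumptions, while the feature definitions follow the text.

import numpy as np

def frame_energy(frame):
    return float(np.sum(frame ** 2))

def spectral_flatness(frame, eps=1e-12):
    # Geometric mean over arithmetic mean of the power spectrum.
    power = np.abs(np.fft.rfft(frame)) ** 2 + eps
    return float(np.exp(np.mean(np.log(power))) / np.mean(power))

def vad_step(frame, E_T, F_T, E_T0, F_T0, nl, alpha=0.9):
    # Return (vad_flag, E_T, F_T, nl) after processing one frame.
    E_n, F_n = frame_energy(frame), spectral_flatness(frame)
    if E_n > E_T and F_n > F_T:
        return 1, E_T, F_T, nl             # voice frame: thresholds kept
    nl += 1                                # noise frame: update thresholds
    E_T = 40 * np.log10(((nl - 1) * E_T + E_n) / nl) + E_T0   # Eq. (2)
    F_T = alpha * F_T + (1 - alpha) * F_n + F_T0              # Eq. (3)
    return 0, E_T, F_T, nl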

Proposed Speech Enhancement Process

Fig. 3 Proposed method



1. WP Transform: Obtain 8 sub-band frequency levels from the noisy speech y(k).
2. VAD Thresholding: WP thresholding is applied to every sub-band frame ỹ(k), which produces 8 improved sub-bands; ȳ(k) represents each improved sub-band [9].
   VAD: Based on feature adaptation, the VAD is applied to detect voice and noise frames for every sub-band.
   Thresholding: Based on Eqs. (2) and (3), each sample of the sub-band is tuned.
3. Inverse WP Transform: The enhanced speech signal ŷ(k) is reconstructed at full band.

2.2 Iterative Kalman Filter

At this point, the full-band pre-enhanced speech signal [10] ŷ(k) is further processed by an IKF, as shown below

y(k) = Hx(k) + w(k), (4)

x(k) = Fx(k − 1) + Gu(k), (5)

Here x(k) = [s(k − p + 1), …, s(k)]ᵀ and H = Gᵀ = [0, …, 0, 1] ∈ R^(1×p), so that the observation picks out the latest speech sample. The term F denotes the p × p state transition matrix, characterized by the LPC estimates obtained from the modified Yule-Walker equations.
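A minimal single-pass numpy sketch of this state-space filter, assuming the LPC vector a and the noise variances q (process) and r (measurement) have already been estimated, e.g. via the Yule-Walker equations, is given below; the iterative version re-estimates the LPCs from the output and repeats.

import numpy as np

def kalman_enhance(y, a, q, r):
    # One Kalman filtering pass over the AR(p) model of Eqs. (4)-(5).
    # y: noisy speech frame, a: LPC coefficients [a1..ap],
    # q: excitation variance, r: additive measurement noise variance.
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    p = len(a)
    # Companion-form state transition matrix built from the LPCs.
    F = np.vstack([np.hstack([np.zeros((p - 1, 1)), np.eye(p - 1)]),
                   a[::-1].reshape(1, -1)])
    H = np.zeros((1, p))
    H[0, -1] = 1.0                        # observe the latest sample s(k)
    x = np.zeros((p, 1))
    P = np.eye(p)
    s_hat = np.zeros_like(y)
    for k, yk in enumerate(y):
        x = F @ x                         # predict
        P = F @ P @ F.T + q * (H.T @ H)
        K = P @ H.T / (H @ P @ H.T + r)   # Kalman gain, then update
        x = x + K * (yk - H @ x)
        P = (np.eye(p) - K @ H) @ P
        s_hat[k] = x[-1, 0]
    return s_hat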

3 Results and Analysis

The detailed step-by-step input and output waveforms are shown in Figs. 4, 5, 6, 7 and 8. Initially, the noisy signal is fed to the system and subjected to multi-level de-noising, finally giving out the Kalman-filtered output speech signal. The next section details the comparison of the outcomes of the proposed and existing methodologies with respect to the noise reduction factor.

3.1 Input and Output Waveforms

A. Scenario-A
See Figs. 4, 5, 6, 7 and 8.

Fig. 4 Noisy speech signal

Fig. 5 Multilevel de-noised speech signal

3.2 Result and Analysis of DWT, Kalman and Curvelet

B. Scenario-B
See Table 1; Fig. 9.

Fig. 6 Combined plot signal

Fig. 7 Kalman de-noised speech signal

4 Conclusion

The proposed adaptive wavelet packet (WP) thresholding technique with an iterative Kalman filter (IKF) for speech enhancement works as follows: the WP transform is first applied to the noise-corrupted speech on a frame-by-frame basis, which decomposes every frame into a number of sub-bands. For every sub-band, a voice activity detector is computed to identify the voiced or unvoiced parts of the speech. Based on the VAD outcome, an adaptive thresholding scheme is then applied to each sub-band speech to obtain the pre-enhanced speech. To achieve an additional level

Fig. 8 Combined plot signal

Table 1 Result and analysis of DWT, Kalman and Curvelet

Sl. No.   Noise           DWT (dB)   Kalman (dB)   Curvelet (dB)
1         Burst noise 1   5.01       12            15.8357
2         Burst noise 2   4.94       10.2265       19.198
3         Pink noise      5.0068     8.3836        16.2092
4         White noise     4.979      10.8525       16.3931


Fig. 9 Chart showing the result and analysis of DWT, Kalman and Curvelet

of enhancement, an IKF is subsequently applied to the pre-enhanced speech. The proposed technique was evaluated under numerous noise conditions, and experimental results are provided to validate its efficiency as compared to some earlier works in terms of segmental SNR and perceptual evaluation of speech quality as two well-known performance indexes.
We have presented iterative batch and sequential speech enhancement procedures in the presence of background noise, and compared the performance of these procedures with alternative speech enhancement algorithms. The iterative batch procedure employs the EM technique to estimate the spectral parameters of the speech signal and the noise process. Every iteration of the algorithm is composed of an estimation (E) step and a maximization (M) step. The E-step is implemented by means of Kalman filtering calculations. The M-step is implemented by means of a non-standard Yule-Walker equation set, in which correlations are substituted by their a posteriori values, computed using the Kalman filtering calculations. The enhanced speech is obtained as a byproduct of the E-step. The performance of this procedure was compared to other alternative speech enhancement algorithms. A distinct advantage of the proposed algorithm over alternative algorithms is that it improves the quality and SNR of the speech while preserving its intelligibility and natural sound. An additional advantage of the procedure is that a VAD is not obligatory.

References

1. Xia, B., Bao, C.: A wavelet fusion method for speech enhancement. In: International Conference
on Electrical Power and Energy Systems (ICEPES), pp. 473–476, 04 (2017)
2. Islam, M.T., Shaan, M.N., Easha, E.J., Tahseen Minhaz, A., Shahnaz, C., Anowarul Fattah,
S.: Enhancement of noisy speech based on decision-directed wiener approach in perceptual
wavelet packet domain. In: Proceedings of the 2017 IEEE Region 10 Conference (TENCON),
Malaysia, pp. 2666–2671, 21 (2017)
3. Baishya, A., Kumar, P.: Speech de-noising using wavelet based methods with focus on classi-
fication of speech into voiced, unvoiced and silence regions. In: 5th International Conference
on Signal Processing and Integrated Networks (SPIN), pp. 419–424 (2018)
4. Rasetshwane, D.M., Boston, J.R., Durrant, J.D., Yoo, S.D., Li, C.-C., Shaiman, S.: Speech
enhancement by combination of transient emphasis and noise cancelation. In: 2011 Digital
Signal Processing and Signal Processing Education Meeting (DSP/SPE), pp. 116–121, 4–7,
Sedona, AZ, USA (2011)
5. Li, J., Sakamoto, S., Hongo, S., Akagi, M., Suzuki, Y.: Two-stage binaural speech enhancement
with wiener filter for high-quality speech communication. Speech Commun. 53(5), 677–689
(2011)
6. Varga, A., Steeneken, H.J.: Assessment for automatic speech recognition: Ii. noisex-92: a
database and an experiment to study the effect of additive noise on speech recognition systems.
Speech Commun. 12(3), 247–251 (1993)
7. Roy, S.K., Zhu, W.-P., Champagne, B.: Single channel speech enhancement using subband
iterative Kalman filter. In: Proceedings of IEEE International Symposium on Circuits and
Systems (ISCAS), pp. 762–765, IEEE (2016)

8. Ishaq, R., Zapirain, B.G., Shahid, M., Lövström, B.: Subband modulator Kalman filtering
for single channel speech enhancement. In: Proceedings of IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), pp. 7442–7446, IEEE (2013)
9. Oktar, M.A., Nibouche, M., Baltaci, Y.: Speech denoising using discrete wavelet packet decom-
position technique. In: Proceedings of IEEE Signal Processing and Communication Application
Conference (SIU), Zonguldak (2016)
10. Verma, N., Verma, A.K.: Performance analysis of wavelet thresholding methods in denoising of
audio signals of some Indian Musical Instruments. Int. J. Eng. Sci. Technol. 4(5), 2040–2045.
ISSN 0975-5462 (2012)
Dexterous Trashbot

Eshwari A. Madappa, Amogh A. Joshi, P. K. Karthik, Ekhelikar Shashank and Jawali Veeresh

Abstract Waste segregation involves assembling a wide range of waste in an unsorted way by utilizing manual work. Segregation of this waste is exceptionally repetitive, tedious and wasteful. There is a requirement for a framework, which
robotizes the procedure of waste isolation, with the goal that the junk transfer can
be executed effectively and productively. In the proposed solution, the framework
utilizes deep learning approach, use of CNNs, which combines with robotics to clas-
sify wastes into categories like plastic, paper, cardboard, glass and metal. Garbage
segregation is accomplished by classifying images and picking the object with a
robotic arm. The problem of manual picking of garbage which is hazardous to human
health is avoided using the Dexterous Trashbot. ResNet and Vgg architectures were
used for transfer learning and gave better results than a custom CNN model for
image classification. Real-time images captured are classified by making an HTTP request to the deployed model. The garbage is segregated into its respective category by picking the selected object using the robotic arm. The solution proposed discourages
the hazardous manual picking of garbage. Segregation of dry waste could be done
in real-time. Dexterous Trashbot has the ability to segregate garbage easily without
any human intervention.

Keywords Convolutional neural networks · Object classification · Transfer learning · VGG · ResNet · Waste segregation · Robotic arm

1 Introduction

Garbage segregation is a problem faced worldwide and a challenging task in urban and metropolitan areas across the globe. According to the Press Information Bureau,

E. A. Madappa · A. A. Joshi (B) · P. K. Karthik · E. Shashank · J. Veeresh
Sri Jayachamarajendra College of Engineering, JSS Science and Technology University, Mysuru, Karnataka 570006, India
E. A. Madappa
e-mail: eshwarinaveen@sjce.ac.in


62 million tonnes of waste (mixed waste containing both recyclable and non-recyclable waste) is generated in India every year, and the average annual growth rate is reported to be 4% [1]. Manual segregation of waste is dangerous since it can cause health hazards; the toxicity is unknown and the quality of life is degraded. It
is also necessary to understand that garbage segregation is essential to ensure proper
waste treatment. Diverse waste material requires different waste treatment procedures
and mixed waste cannot be treated. Garbage can be segregated into biodegradable
wet waste and inorganic dry waste, and the two categories of waste can be treated
accordingly.
Biodegradable waste can be deposited in vacant land for composting or can be
sent to dumping ground. Non-biodegradable waste can be recycled or be treated
separately. This work aims to segregate garbage into five categories namely glass,
paper, cardboard, plastic and metal. Paper and cardboard waste can further be treated
as recyclable waste. Segregation makes the recycling of the waste easier.

1.1 Background and Related Works

One of the earlier works related to classification of trash involved a group of students
at Stanford University [2]. It involved the use of SVM and CNN to classify the
trash, and found that SVM performed better than CNN. The dataset consisted of six categories of waste: paper, plastic, cardboard, glass, metal and trash. The labelled dataset consisted of around 2600 images. Concepts like SIFT
and radial basis function were applied during the course of experimentation in this
study. The highest accuracy reported using SVM was 63%. Through RecycleNet [3],
the authors were able to obtain better results than the earlier work. It is evident that
they have developed a variation of DenseNet [4] through optimization. The work
also saw the experimentation with optimizers where Adam and AdaDelta optimizers
were used. Transfer learning was used involving the architectures of Inception-v4,
ResNet and MobileNet. A similar work [5] is the project from 2016 TechCrunch
Disrupt Hackathon, where garbage segregation was being done to classify the waste
into two categories—recyclable and compostable. The images were captured using
RaspberryPi camera and prediction was accomplished using a TensorFlow model. In
[6] the authors came up with a Faster R-CNN [7] implementation of waste classifier
with the classes landfill, recycling, and paper. The model was pre-trained on the
PASCAL VOC dataset [8]. Our implementation involves a combination of transfer
learning and custom CNN to provide a novel solution to the problem of garbage
segregation. In transfer learning, we have used ResNet [9] and VggNet [10]. It was
seen that Vgg-16 performed the best. The project involves the use of a labelled dataset with around 2500 images belonging to the five categories paper, plastic, cardboard, glass and metal. FastAI [11] libraries were used in the development of the deep learning model.
Functions from these libraries help in eliminating the need to identify the optimal
values of learning rate.

Fig. 1 Sample images present in dataset

2 Dataset

An important part of this work is the classification of garbage into five categories—
plastic, paper, cardboard, metal and glass. The images have been obtained from the dataset in the repository [12]. There are about 500 images for each category, amounting to about 2500 images in total. The images have been resized to 224 × 224
through transforms while training the model when applying transfer learning. Various
data augmentation techniques like applying vertical flip, warp, lighting, zoom and
rotation have been applied in the process. The size of the dataset is around 3.5 GB
(Fig. 1).

3 Models and Methods

3.1 Transfer Learning

The concept of transfer learning was applied using the ResNet and Vgg architectures. The models used in the process are ResNet-34, ResNet-50, VggNet-16 and VggNet-19. The models were trained using FastAI [11] libraries with PyTorch [13] as the backend, in Jupyter notebooks using Google's Colab [14] service. Models were trained by finding the optimum learning rate and calling the fit-one-cycle method as described

in [15]. These techniques are tuned to find the best learning rate in order to optimize the accuracy of the model. The Adam [16] optimizer was used while training the models. It was observed that Vgg-16 was the best performing model on the given dataset. The train/validation/test split was 60-20-20.
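A compact sketch of this training set-up under fastai v1 (the library version contemporary with this work; the data path is an assumption):

from fastai.vision import ImageDataBunch, cnn_learner, models, get_transforms
from fastai.metrics import accuracy

data = ImageDataBunch.from_folder(
    "trashnet/", valid_pct=0.2, size=224,
    ds_tfms=get_transforms(do_flip=True, max_rotate=10.0, max_zoom=1.1,
                           max_lighting=0.2, max_warp=0.2),
)
learn = cnn_learner(data, models.vgg16_bn, metrics=accuracy)
learn.lr_find()                 # locate a suitable learning rate
learn.fit_one_cycle(7, 1e-2)    # 7 epochs at lr = 0.01, as reported for Vgg-16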
Figure 2 provides the confusion matrix obtained with a learning rate of 0.01, trained for 7 epochs. The accuracy obtained was 94.351% and the F1 score was 94.352%. The Vgg-19 model was trained with a learning rate of 0.01 for 4 epochs (Fig. 3); it produced an accuracy of 89.53% and an F1 score of 89.49%, as shown in Table 1. The ResNet-34 model used a learning rate of 0.01 and was trained for 4 epochs. An accuracy of 86.82% and an F1 score of 86.72% were

Fig. 2 Vgg-16 confusion matrix

Fig. 3 Vgg-19 confusion matrix

Table 1 Performance comparison of architectures

Architecture   Accuracy   F1-score
ResNet-34      86.8201    86.7281
ResNet-50      89.1213    89.0312
Vgg-16         94.3515    94.3522
Vgg-19         89.5397    89.49214

reported; the confusion matrix of the same can be seen in Fig. 4. The ResNet-50 model used a learning rate of 0.01 and was trained for 4 epochs. An accuracy of 89.12% and an F1 score of 89.03% were reported; the confusion matrix of the same is shown in Fig. 5.

Fig. 4 ResNet-34 confusion matrix

Fig. 5 ResNet-50 confusion matrix

3.2 Convolutional Neural Network

A convolutional neural network based deep learning model was developed using PyTorch [13]. The data was split using a 60/20/20 train/validation/test split.
A batch size of 32 images was used. Data augmentation techniques used in the
process includes random horizontal flip, random rotation and normalization. The
size of images was reduced to 128 × 128 in order to improve the performance of
the model. Adam optimizer was used, with a learning rate of 9e−04. A ten-layer
convolution neural network was formed as follows:
1. Layer 0: Input image of size: 128 × 128.
2. Layer 1: Convolution with output channels 16, 3 × 3 kernels, stride 1, and
padding 1.
3. Layer 2: Max-pooling with 2 × 2 filter, stride 2.
4. Layer 3: Convolution with output channels 32, 3 × 3 kernels, stride 1, and
padding 1.
5. Layer 4: Max-pooling with 2 × 2 filter, stride 2.
6. Layer 5: Convolution with output channels 64, 3 × 3 kernels, stride 1, and
padding 1.
7. Layer 6: Max-pooling with 2 × 2 filter, stride 2.
8. Layer 7: Convolution with output channels 128, 3 × 3 kernels, stride 1, and
padding 1.
9. Layer 8: Max-pooling with 2 × 2 filter, stride 2.
10. Layer 9: Fully Connected with 8192 input features and 512 output features.
11. Layer 10: Fully Connected with 512 input features and 128 output features.
12. Layer 11: Fully Connected with 128 input features and 5 output features.
13. Intermediate: Dropout with p = 0.2 was used between fully connected layers
and the activation function ReLU was used.
An accuracy of 75% was reported by the convolutional neural network (Fig. 6).
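A PyTorch sketch of the network enumerated above:

import torch.nn as nn
import torch.nn.functional as F

class TrashCNN(nn.Module):
    def __init__(self, num_classes=5):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, stride=1, padding=1)
        self.conv3 = nn.Conv2d(32, 64, 3, stride=1, padding=1)
        self.conv4 = nn.Conv2d(64, 128, 3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(128 * 8 * 8, 512)   # 8192 input features
        self.fc2 = nn.Linear(512, 128)
        self.fc3 = nn.Linear(128, num_classes)
        self.dropout = nn.Dropout(p=0.2)

    def forward(self, x):                  # x: (batch, 3, 128, 128)
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = self.pool(F.relu(self.conv4(x)))
        x = x.flatten(1)                   # -> (batch, 8192)
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        return self.fc3(x)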

Fig. 6 Training versus validation loss

4 Model Deployment

It was observed that Vgg-16 was the best performing model of the four. The model was deployed on Render using a Flask-based application. A pre-configured Dockerfile with the requirements was created and deployed to build and run on the platform provided by Render [17]. A GitHub integration is a requisite in the process of deployment. The corresponding GUI application rendered is shown in Figs. 7 and 8.
The application runs on Render [17], with the latest commit from the GitHub repository being used for deployment. An endpoint is exposed for the classification of images. The image of the object is sent as multipart-form encoded data in an HTTP POST request, and the corresponding response gives the result of the object classification.
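A sketch of such a request from the control program (the endpoint URL and field names are hypothetical):

import requests

with open("capture.jpg", "rb") as f:
    resp = requests.post(
        "https://trashbot-example.onrender.com/analyze",
        files={"file": ("capture.jpg", f, "image/jpeg")},
    )
category = resp.json().get("result")   # e.g. "plastic"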

Fig. 7 Standalone GUI application

Fig. 8 Prediction using the GUI application

5 Hardware Implementation

The robot built to segregate the classified garbage is controlled mainly by a RaspberryPi. The RaspberryPi runs Raspbian OS, and a Python program on the RaspberryPi controls all the operations of the robot. Each functionality of the robot is implemented in a separate function, and the functions are called from the main control program according to the requirement. When the control program is run, the robot slowly moves towards the target object from its initial position. The distance travelled by the robot is specified in terms of time; the default time is set to one second. The time can easily be changed, and the distance travelled by the robot thereby manipulated. When the robot reaches the object to be classified, the control program takes a picture of the target object using a web camera and sends an HTTP POST request to the deployed deep learning model. The trained deep learning model detects the waste object present in the picture and classifies it into one of the categories plastic, paper, cardboard, metal or glass. The result of the classification is sent back to the control program running on the RaspberryPi. After
obtaining the result, the robotic arm picks the waste object and travels backwards to
its original position. Based on the result obtained from the deep learning model, the
robot travels in one of the five directions and segregates the waste picked according
to the category detected. In the control program a particular direction is mentioned
for each of the five waste categories. The five directions for five categories are par-
tial left, partial right, right, left and deep right. The program to move the robot in
a particular direction is also written and functions are available for each direction.
When a particular waste category is detected, required direction is selected in the
main control program.
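A simplified sketch of this control flow is given below. The helper functions (move_forward, capture_image, and so on) and the category-to-direction mapping are hypothetical stand-ins for the separate per-function programs described above, not the authors' actual code.

import requests

# Assumed mapping of the five categories to the five directions named above.
CATEGORY_DIRECTIONS = {
    "plastic": "partial left", "paper": "partial right",
    "cardboard": "right", "metal": "left", "glass": "deep right",
}

def run_once(model_url, travel_time=1.0):
    move_forward(travel_time)                  # default travel time: one second
    image = capture_image()                    # web camera frame as JPEG bytes
    resp = requests.post(model_url, files={"file": image})
    category = resp.json()["category"]         # result from the deployed model
    pick_object()                              # robotic arm grabs the waste
    move_backward(travel_time)                 # return to the initial position
    move_in_direction(CATEGORY_DIRECTIONS[category])
    drop_object()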
A single-axis robotic arm with two degrees of freedom was designed to pick the
waste object classified by the deep learning model. The robotic arm uses two servo
motors to provide the two degrees of freedom: the arm can move up and down with
the help of the servo motor attached to the base, and the claw of the robotic arm opens
and closes based on the movement of the second servo motor attached at the tip of
the arm. Both servo motors are controlled by the Raspberry Pi. The angle of rotation
of a servo motor is specified by giving it a PWM signal with a particular duty cycle.
The servo motor expects a pulse every 20 ms, and the length of the pulse determines
how far the motor turns. The Raspberry Pi module RPi.GPIO, which is used to control
the servo motors, provides this duty cycle value to the servo motor as a percentage
(Figs. 9 and 10). So the expression to calculate the percentage duty cycle is:

Duty Cycle = (ON Time × 100) / Period        (1)
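A minimal RPi.GPIO sketch of this duty-cycle computation is shown below; the GPIO pin number and the example pulse width are assumptions for illustration.

import RPi.GPIO as GPIO

SERVO_PIN = 18                    # assumed GPIO pin for the servo signal
PERIOD_MS = 20.0                  # the servo expects a pulse every 20 ms (50 Hz)

GPIO.setmode(GPIO.BCM)
GPIO.setup(SERVO_PIN, GPIO.OUT)
pwm = GPIO.PWM(SERVO_PIN, 1000.0 / PERIOD_MS)   # 50 Hz PWM signal
pwm.start(0)

def set_pulse(on_time_ms):
    # Eq. (1): duty cycle (%) = ON time * 100 / period
    pwm.ChangeDutyCycle(on_time_ms * 100.0 / PERIOD_MS)

set_pulse(1.5)   # e.g. a 1.5 ms pulse, roughly a servo's centre position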

Fig. 9 Dexterous Trashbot working prototype side view

Fig. 10 Dexterous Trashbot working prototype front view

Fig. 11 Dexterous Trashbot picking the waste object after detection

6 Result

Various deep learning architectures were used for training and building the model.
The results obtained from the models were evaluated through a combination of
metrics. It was observed that VGG-16 performs the best among the architectures
and was used for deployment.
The robot was able to pick feasible waste objects and isolated the picked objects
properly. The robot fails to pick the detected object when the weight
of the object cannot be handled by the robotic arm or when the size of the object is
larger than the size of the claw (Figs. 11 and 12).
The results obtained can be tabulated as in Table 1.

7 Conclusion

The deep learning models developed for image classification were based on
convolutional neural networks and a transfer learning approach. While implementing the

Fig. 12 Dexterous Trashbot segregating the picked waste object

work, an object classification paradigm, rather than object detection, was used; hence,
the prediction might result in errors if a collection of objects belonging to different
classes is present in the image. It was also observed that transfer learning gives better
results than the custom CNN model. An HTTP request is made to the deployed model
in order to classify the image in real time, which implies that an internet connection
is required for the task of image classification; this could lead to delays depending
on the connectivity and internet speed. Also, the dataset is restricted to five categories
(paper, plastic, cardboard, glass and metal) and would need to be more exhaustive
for real-world scenarios. The hardware is also a point of concern while picking the
object: the feasibility of picking an object depends on the strength of the arm and
the size of the object.

8 Future Scope

Object detection can be implemented in the future to identify multiple objects in a
given image. This could be implemented using state-of-the-art techniques like YOLO
[18] or SSD [19]. In the case of the YOLO algorithm, applying bounding boxes around
the objects helps localize them within the image. Also, the

robotic arm used for picking the object can be made more rugged. A crucial part of
the segregation work is inverse kinematics and the mapping of 3-D coordinates to
identify the depth of the object; this can also be improved. Navigation and obstacle
avoidance algorithms can also be implemented to make the overall process slick and
robust.

References

1. Press Information Bureau: Solid waste management rules revised after 16 years; rules
now extend to urban and industrial areas. http://pib.nic.in/newsite/PrintRelease.aspx?-relid=
138591. Available via Press Information Bureau, Government of India (2016)
2. Yang, M., Thung, G.: Classification of trash for recyclability status. arXiv preprint (2016)
3. Bircanoğlu, C., Atay, M., Beşer, F., Genç, Ö., Kızrak, M.A.: RecycleNet: intelligent waste
sorting using deep neural networks (2018). https://doi.org/10.1109/inista.2018.8466276
4. Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional
networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
Honolulu, HI, pp. 2261–2269 (2017). https://doi.org/10.1109/cvpr.2017.243
5. Donovan, J.: Auto-trash sorts garbage automatically at the techcrunch disrupt hackathon.
https://techcrunch.com/2016/09/13/auto-trash-sortsgarbageautomaticallyat-the-techcrunch-
disrupt-hackathon. Available via TechCrunch (2016)
6. Awe, O., Mengistu, R., Sreedhar, V.: Smart trash net: waste localization and classification.
arXiv preprint (2017)
7. Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with
region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1137–1149 (2015)
8. Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual
object classes (voc) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2010)
9. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
10. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image
recognition. In: CoRR. https://arxiv.org/abs/1409.1556 (2015)
11. Howard, J., et al.: https://github.com/fastai/fastai. Available via GitHub (2018)
12. Thung, G.: Trashnet. https://github.com/garythung/trashnet. Available via GitHub (2016)
13. Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A.,
Antiga, L., Lerer, A.: Automatic differentiation in PyTorch. In: NIPS-W (2017)
14. Google: Google colaboratory. https://colab.research.google.com/. Available via Colab (2017)
15. Smith, L.N., Topin, N.: Super-convergence: very fast training of residual networks using large
learning rates. In: CoRR. https://arxiv.org/abs/1708.07120 (2017)
16. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: CoRR. https://arxiv.org/
abs/1412.6980 (2015)
17. Render: https://render.com/
18. Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: unified, real-
time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), pp. 779–788 (2016)
19. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.E., Fu, C., Berg, A.C.: SSD: single shot
multibox detector. In: ECCV (2016)
Automated Question Generation
and Answer Verification Using Visual
Data

Shrey Nahar, Shreya Naik, Niti Shah, Saumya Shah and Lakshmi Kurup

Abstract This paper delineates the automation of question generation as an extension
to existing Visual Question Answering (VQA) systems. Through our research,
we have been able to build a system that can generate question and answer pairs
on images. It consists of two separate modules: a Visual Question Generation (VQG)
module, which generates questions based on the image, and a Visual Question
Answering (VQA) module, which produces a befitting answer to the question that the
VQG module generates. Through our approach, we not only generate questions but
also evaluate the generated questions by using a question answering system. Moreover,
with our methodology, we can generate question-answer pairs as well as improve the
performance of VQA models. It eliminates the need for human intervention in dataset
annotation and also finds applications in the educational sector, where the requirement
of human input for textual questions has been essential till now. Using our system, we
aim to provide an interactive interface which helps the learning process among young
children.

Keywords Natural language processing · Computer vision · Question-answering ·
Visual question answering · Visual question generation · Recurrent neural
network · VQA dataset · Image feature extraction · Long short term memory ·
Gated recurrent unit · Convolution neural network · Attention module · E-learning
system
The authors Shrey Nahar, Shreya Naik, Niti Shah and Saumya Shah contributed equally to this
chapter, and the author Lakshmi Kurup is the Principal Investigator.

S. Nahar · S. Naik · N. Shah · S. Shah (B) · L. Kurup
Department of Computer Engineering, Dwarkadas J. Sanghvi College of Engineering, Mumbai,
India
e-mail: saumya.shah@djsce.edu.in
S. Nahar
e-mail: shrey.nahar@djsce.edu.in
S. Naik
e-mail: shreya.naik@djsce.edu.in
N. Shah
e-mail: shah.niti@djsce.edu.in
L. Kurup
e-mail: lakshmi.kurup@djsce.ac.in



1 Introduction

In recent years, we have witnessed significant progress in various fields of AI, such
as computer vision and language understanding. This progress has motivated
researchers to address a more challenging problem: question answering and question
generation on visual data. This problem combines both image understanding and
language understanding.
Essentially, this task is defined as follows: an image along with a question about
that image is the input to the AI system, and the intelligent system is supposed to
output a correct answer to the question with respect to the input image. However,
taking a step beyond that, through this project we aim to create a system which,
given an image as input, generates a question-answer pair; when the user submits
an answer, verification of that answer takes place.
Our solution thus involves the usage of both a Visual Question Generation
(VQG) and a VQA model in a sequential manner. The VQG module generates
reasonable questions given an image, whereas the VQA module generates natural
answers given a question and an image. On the basis of the image, the VQG and
VQA modules are used to obtain the question and then the answer.
The proposed solution aims at bridging the gap and assimilating verification
of answers into existing question-answer generation platforms. The most immediate
application is as an automatic dataset annotation tool, reducing the need for human
intervention in dataset annotation. With an increasing need for varied datasets for
deep learning, such applications can help reduce the need for human labor. Further,
the solution will benefit the educational sector, as well as help improve existing
CAPTCHA-based authentication. In the educational sector, children can use such
an application to learn to answer basic questions from images, while the same
images can be used as an extra step in authentication to increase security (Fig. 1).

2 Theory

2.1 Visual Question Answering

In the year 2015, a system [1] for answering free-form and open-ended questions
about an image was introduced in the field of Artificial Intelligence. Given an image
and a natural language question based on the image, the goal was to provide an
accurate natural language answer. The research jointly utilized the linguistic and

Fig. 1 Overall flow

visual features of the input in order to spawn the next generation of AI algorithms,
such that the multi-modal knowledge and well-defined quantitative evaluation metric
would be incorporated.
With an aim to increase the accuracy of answers related to a set of fine-grained
regions in an image in VQA, an attention mechanism based system [19] was intro-
duced in 2016 by a group of researchers. This stacked attention network (SAN)
system uses the question’s semantic representation as a query to explore regions
in an image that are relevant to the answer. An improvement over this image-
based attention mechanism was introduced through the research [8] on hierarchical
question-image co-attention. The mechanism jointly reasons about visual attention
and question attention, using the concept of question hierarchy.
In order to elevate the role of image understanding in Visual Question Answering,
another form [4] of research in this field was introduced in the year 2017. This
involved using complementary images in the dataset to enable the construction of an
improved model, which provides answers to questions based on an image along with
a counter-example based reasoning. Hence, the model recognizes similar images,
taking into account that they might have different responses to the same question.
Some of the research in the field of VQA also involves finding the relevancy of
a question with respect to any given image. One such paper [13] demonstrates the
concept of question relevance by identifying non-visual and false premise questions.

2.2 Visual Question Generation

To move beyond tasks based on the literal description of images, a system to ask
interesting and natural questions, given an image, was developed in the year 2016
[10]. This task of Visual Question Generation is thus an extension to the existing
Visual Question Answering Systems, to understand how questions based on images

could be linked with common sense inference and reasoning. The system that was
developed presents generative and retrieval-based models for dealing with the task
of question generation on visual images.
Another application of Visual Question Generation lies in the identification of
unknown objects in an image [15]. Since it is not possible to train image recognition
models with every possible object in the world, Visual Question Generation based on
unknown objects can be used for class acquisition. This process of class acquisition is
completed with humans answering unknown object-based questions for each image.

3 Review of Literature

Before designing a VQG and verification system, we must analyze the existing
methodologies in VQA as well as VQG. The use of an integrated system which
uses the elements of VQG and VQA has not yet been properly explored. Thus, it is
also imperative that we understand the limitations of using a particular methodology
before we integrate it with the remaining modules of the system.
The methodology, given by Zhang et al. [20], deals with the generation of questions
of diverse types. It makes use of the simple analogy of creating questions of each
question type (namely—what, when, where, who, why and how). Each question type
will address a specific set of features in the image. The steps used in this process
require the generation of captions for each image. The paper suggests the use of
DenseCap [5] (originally proposed by Johnson et al.) to generate captions based on
the features in the image. For every caption, a question type is chosen and using a
Long Short Term Memory (LSTM), a question is generated. The probability of each
question type with the given caption is compared using Jaccard similarity to evaluate
the most fitting question generated. The caption model DenseCap generates a set of
captions for a given image.
Another study that makes use of an approach to combine VQA and VQG suggested
by Li et al. [7] construes the problem as a dual task. They posit that, given a question
and an image, a VQA system can be used, and given an answer and an image, a VQG
system is used. An end-to-end unified framework is developed that performs VQG
and VQA as a dual task.
In the VQA component, given a question, an RNN is used for obtaining the
embedded feature using MUTAN [3]. In the VQG component, a lookup table is used
for obtaining the embedded feature. A CNN with attention module (proposed as Dual
MUTAN) is used for getting the visual feature from the input image and the answer
feature.
Another approach, given by Meng et al. [9], takes into account an attention
mechanism which takes an image and a text explanation of it, and produces a focused
view of the visual features in the image that explain the text, together with the specific
words or phrases in the text that are most important. This paper answered multiple-choice
questions about a given image, using Parts-of-Speech (POS) tagging for question

generation. After generating a question, the image and text are co-attended to
remove any kind of errors.
The work of Mostafazadeh et al. [10] is on how questions about an image are often
directed at commonsense inference in the image and the abstract events evoked by
objects of the images. Here, they introduce the task of Visual Question Generation
(VQG), where the system is tasked with asking an engaging, natural question when
shown an image. The authors provide three data sets which cover a plethora of images
from object-centric to event-centric, with considerably more abstract training data.
Three generative models are implemented by them of which the state of the art is
based on a Gated Recurrent Neural Network (GRNN).
Vinyals et al. [17], in their paper, implemented the captioning of an image; however,
the paper also presents an approach that can be applied to question generation. It
uses a deep convolutional neural network (CNN) as an image encoder. The features
of each image, extracted from the last hidden layer, are directly fed into a recurrent
neural network to generate sentences. The recurrent structure used is a Long
Short Term Memory (LSTM) network, which is known for its performance
in sequence translation. The model is trained using the image features and all
preceding words. To evaluate the question, we use beam search, which picks the top
k questions up to a given time instant before proceeding forward.
Another paper, proposed by Xu et al. [18], is an improvement on the previous
hypothesis. The idea is to use an attention-based mechanism that guides the model
by focusing on certain words/phrases in the text and regions in the image. While this
paper addresses image captioning, the approach can be used for question generation
as well. It adds another component to the architecture, which consists of an RNN-based
image attention model. It takes an image and a region description and creates an
image feature which blurs out everything except the subject in attention. It then passes
these feature vectors to an LSTM, which generates the questions as per normal.

4 Proposed Architecture and Modular Description

4.1 Complete Architecture

The architecture consists of three major components, that is, the preprocessing unit,
the Visual Question Generation (VQG) module, and the Visual Question Answering
(VQA) module. The preprocessing module requires image processing techniques to
prepare the image and convert it to a format that is easier to work with. For image
feature extraction, we use the VGG19 [14] convolutional neural network. Further,
annotation and question preprocessing is carried out to generate vocabulary. The
Visual Question Generation (VQG) module will take in the visual features that will
learn how to generate an embedding of the question. At each time-step, the same
visual features are shown to the model, which will produce one question word at a
time. The generated question is fed to Visual Question Answering (VQA) module

Fig. 2 Complete architecture

that will generate corresponding answers to the given questions. In order to check
the validity of the question-answer pairs produced, answer verification is carried out
at the last step (Fig. 2).

4.2 Visual Question Generation

Visual Question Generation (VQG) has lately been researched as an extension of
image captioning. Unlike image captioning methods, which describe an image using
a few sentences, VQG methods generate questions about it (e.g., “What object is in
the room?”). In the VQG methodology, the image features are first encoded using a
CNN, and then a sentence is generated by decoding those features using an RNN.
The proposed methods in this paper use a gated recurrent unit network (GRU) and
a long short-term memory network (LSTM) (Fig. 3).
Thus, the question generation module is the crux of our model, which uses a
CNN-LSTM model that takes an image as input and generates questions from
its features. For training and testing, the COCO dataset and the VQA dataset have
been used. The module comprises two sub-modules:
1. An LSTM encoder for question generation using image features.
2. An LSTM decoder and a language model trained on image annotations and
questions.
In the beginning, we define a set of parameters for training the LSTM. The process
then takes each image and question as input and creates an embedding. The algorithm
then iterates over each step of the LSTM and finds the current label embedding from
the previous state. The next output state is predicted using the current embedding and
the LSTM state. Thus, at each state, we predict the word with the maximum probability.

Fig. 3 Question generation

Therefore, based on the current state of the image, an image embedding is generated.
For each subsequent step, the maximum-probability word is matched to the
corresponding embedding. Finally, we output the generated words as the generated
question.
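A minimal greedy-decoding sketch of this generation loop is given below. It is written in PyTorch for illustration (the authors' implementation is TensorFlow-based), and the module composition, dimensions and token ids are assumptions.

import torch
import torch.nn as nn

class QuestionGenerator(nn.Module):
    # Image features seed the LSTM; each predicted word is fed back in.
    def __init__(self, vocab_size, feat_dim=4096, hidden=512):
        super().__init__()
        self.img_proj = nn.Linear(feat_dim, hidden)    # image embedding
        self.embed = nn.Embedding(vocab_size, hidden)  # word embedding
        self.lstm = nn.LSTMCell(hidden, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def generate(self, img_feat, start_id, end_id, max_len=26):
        h = torch.zeros(img_feat.size(0), self.lstm.hidden_size)
        c = torch.zeros_like(h)
        x = self.img_proj(img_feat)       # step 0: show the image features
        words = [start_id]
        for _ in range(max_len):
            h, c = self.lstm(x, (h, c))
            word = self.out(h).argmax(dim=-1)   # maximum-probability word
            words.append(word.item())
            if word.item() == end_id:
                break
            x = self.embed(word)          # feed the prediction back in
        return words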

4.3 Visual Question Answering

Visual Question Answering (VQA) is used to understand the images, the questions,
and their interactions. Visual Question Answering is a model which takes an
image and an associated question about the image as input. Using the VQA
algorithm, an answer to the question is obtained as the output. The Visual Question
Generation module provides questions to the Visual Answer Generation module.
The image and question together form the question embedding, which is given as
input to a Long Short Term Memory network (LSTM) to get the output. Visual
features of the image are also extracted in this model (Fig. 4).
It comprises two modules:
1. A CNN to extract the visual features from images.
2. An LSTM which takes question embedding and visual features as input to
produce corresponding answers as output.
The training data used for the VQA module is the same as the training data
used for the VQG module. We then get top answers for each image and filter out
corresponding questions from the top answers.

Fig. 4 Answer generation

For our GRU implementation, we use an episodic memory that uses a retention-based
mechanism for learning the important entities in a question. After that, for each state,
we compute the log probability of each word being predicted and compare it against
the ground truth; the word with the highest log probability is generated. Subsequently,
for the current batch, we get the positional weights and set up the Adam optimizer,
which calculates the loss and propagates it to the subsequent layers.
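The sketch below illustrates one plausible shape of such an answer module, with a GRU question encoder fused with the CNN image feature by simple concatenation; the dimensions and the fusion scheme are illustrative assumptions rather than the exact architecture.

import torch
import torch.nn as nn

class AnswerGenerator(nn.Module):
    # The GRU encodes the question; its final state is fused with the image
    # feature, and a classifier over the answer vocabulary picks the answer.
    def __init__(self, vocab_size, num_answers, feat_dim=4096, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(hidden + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_answers),
        )

    def forward(self, question_ids, img_feat):
        _, h = self.gru(self.embed(question_ids))     # h: (1, batch, hidden)
        fused = torch.cat([h[-1], img_feat], dim=1)   # concatenate text + image
        return self.classifier(fused)                 # logits over answers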

5 Experimental Setup

5.1 Datasets

Visual QA systems require extensive training to be competent to ask meaningful
and grammatically coherent questions. For such a scenario, it is essential
that we provide a large amount of training data which encompasses a variety of
objects, scenes and visual features along with their corresponding language equivalents.
The LSTM model being trained should be exposed to precise captions and questions
for the corresponding images. Since our system generates questions and answers on
the same preprocessed dataset, it is essential that we choose a well-evaluated, accurate
dataset. For this reason, we opted to use the VQA dataset [1] which consists of over
80,000 training images and over 200,000 training questions. The VQA dataset uses
the images and captions from the MS COCO [6] dataset which is the largest dataset

for vision and language tasks. The VQA dataset is the best choice as it provides
human-annotated training questions which also serve the Visual Question Answer-
ing tasks. On this dataset, we perform various transformations which make it fit for
training.

5.2 Data Preparation

The problem of Visual Question Answering makes use of two kinds of data—struc-
tured data from text annotations and unstructured input from images. Such multi-
modal systems require the data to be processed correctly so that the generator model
can make use of the data models correctly. The two preprocessing steps are discussed
below.

5.2.1 Image Preprocessing

Image Preprocessing primarily involves image feature extraction which requires the
use of Convolutional Neural Network capable of performing object recognition. For
such purposes, there are many sophisticated pre-trained CNNs which are trained on
a host of images and object types. We used the VGG19 [14] pre-trained network for
our image preprocessing purposes. However, contrary to a traditional CNN, we do
not use the features from the last activation layer. We take the output points from
the penultimate layer, which in the case of VGG19 is the fc7 layer. The feature
vectors from this layer, having 4096 output points, are directly fed into the LSTM
for training.
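A minimal torchvision sketch of this fc7 feature extraction is shown below; truncating the classifier after its second linear layer (the fc7 activation) is our reading of the description above, not the authors' code.

import torch
import torch.nn as nn
import torchvision.models as models

# Pre-trained VGG19; the classifier is truncated at fc7 (4096-d output).
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
vgg.eval()
fc7 = nn.Sequential(*list(vgg.classifier.children())[:5])  # fc6, ReLU, Dropout, fc7, ReLU

def extract_features(batch):        # batch: (N, 3, H, W), ImageNet-normalized
    with torch.no_grad():
        x = vgg.features(batch)
        x = torch.flatten(vgg.avgpool(x), 1)
        return fc7(x)               # (N, 4096) feature vectors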

5.3 Annotation Preprocessing

We start the data preprocessing by inputting the annotation and question dataset for
all the images, after which we preprocess and encode the answers. For each image,
we filter out the top answers based on their confidence levels given in the dataset.
Based on the top answers the corresponding questions are filtered and encoded.
These encoded questions and answers are mapped to array elements to represent
a Word2Vec representation [11]. Thus we build a representative vocabulary which
allows words to be generated by the later stages of our VQG and VQA architecture.
For unknown words, we map them to an ‘unk’ token, but we do not allow the decoder
to generate this at the time of training and testing.
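A minimal sketch of this vocabulary building and encoding step follows; the frequency threshold and the special token names are assumptions.

from collections import Counter

def build_vocab(sentences, min_count=5):
    # Count word frequencies over all captions/questions (threshold assumed).
    counts = Counter(w for s in sentences for w in s.lower().split())
    word_to_ix = {"<start>": 0, "<end>": 1, "unk": 2}
    for w, c in counts.items():
        if c >= min_count:
            word_to_ix.setdefault(w, len(word_to_ix))
    return word_to_ix

def encode(sentence, word_to_ix):
    # Unknown words fall back to 'unk'; the decoder is never allowed to
    # emit 'unk' at training or test time.
    return [word_to_ix.get(w, word_to_ix["unk"]) for w in sentence.lower().split()]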

5.4 Implementation Details

Our proposed solution can be viewed as a combination of two separate modules, VQG
and VQA. By sharing our training data, the model is trained with two tasks. Although
our proposed method for VQG is borrowed from [17], we have adapted
the implementation for question generation, as opposed to caption generation. For
our image preprocessing, we use the ImageNet pre-trained VGG19 architecture for
computing the image features. We resize the images to 448 × 448 and apply them
as input to the deep convolutional neural network. The input
to the deep generative models is the 4096-dimensional output from the fc7 layer of
the CNN.
For the language generation model, we use a Long Short Term Memory (LSTM).
The LSTM model is a Tensorflow implementation of Show and Tell [17]. The LSTM
architecture was ideal since it is a recurrent neural network with a memory compo-
nent, which is required to connect previously processed information, thus remember-
ing long-term dependencies. The LSTM architecture takes as input the 4096 feature
vector for each image, processed by the CNN and the corresponding question for
the image from our questions training data. Each layer of the LSTM takes one word
as labels for supervised learning. All the other word embedding and hidden layers
are vectors of size 512. We have initialized the number of LSTM steps to be 28,
with a maximum word limit of 26 words and 2 steps for the start and end tags.
We use the Adam optimizer with a learning rate of 0.001 to update the parameters.
Training takes place in batches of 256 for 300 epochs to maximize performance.
For the VQA module, we build a GRU, which is well suited for combining
the visual features and text features. The question from the VQG component is first
converted into a word embedding and is passed to the GRU in a sequential fashion.
This fixed-length vector is concatenated with the 4096 dimensional CNN vector for
the image and passed on to a multilayer perceptron with fully connected layers. It
utilizes a dynamic memory network with an attention mechanism to generate the
answer based on the encoded question. The last layer is the softmax which converts
the logit output and provides a probability distribution. The word with the highest
probability is chosen as the answer.

6 Evaluation

The VQA and VQG models use two preprocessing files: one from the images and
one from the annotations. The annotations contain 2,483,490 captions and 248,349
questions for 82,783 training images. All these images are object-specific images
containing common objects. However, the VQA dataset does not train the model to
recognize people or unknown objects. Hence, while testing, some of the conditions
to be followed are:
1. The images used for testing should have common objects.
2. An image should ideally portray some kind of action.
3. If not an action image, then any image with a display of recognizable objects is
possible.
In the process of VQG, it is difficult to produce a limited set of questions. Hence,
forms of automatic evaluation become tricky. Since the questions that are generated
are based on a random object, that the deep neural network picks up from the image,
they do not always correspond to any validation questions associated with that image.
Hence we need to rely on a combination of human evaluation and automated metrics
to evaluate our model.

6.1 Baseline

For our evaluation, we consider the baseline as the image caption generation model
[17] for our visual question generation. The implementation is similar to the question
generation model proposed by Zhang et al. [20]. The difference between the two
hypotheses is the use of the caption generation module called DenseCap [5] in [20]
which is used to generate captions from which questions are obtained. The two
models use an LSTM architecture for language generation, and hence we use [20] as
our baseline. For the VQA module, we have chosen the hypothesis proposed by Lu
et al. [8] as our baseline paper. Similar to our implementation, this model also uses
an attention-based system to improve answer generation accuracy.

6.2 Human Evaluation

Our system, an automated visual question generation and answering system, is
tested using black-box testing. We use this method to hide the complicated internal
functions and display only the end result to the user, and implement it using the
equivalence partitioning technique. In the equivalence partitioning technique, the
input values are divided into valid and invalid partitions, and then representative
values are chosen from each partition for the testing data. The human judges see the
image along with the question. Only after answering the question themselves is the
answer from our VQA model revealed; this eliminates any indirect bias. Our human
evaluation was carried out by asking three people each from the categories academic,
worker and student. Each of these candidates was asked to rate our model with an
accuracy ranging from 0 to 100.

6.3 Automatic Evaluation

The samples collected from the human evaluation were also used as reference
hypotheses for our automatic evaluation. Since VQG is a sequence generation problem,
we use the BLEU [12] and METEOR [2] recall scores, which evaluate sentence-level
semantics, and CIDEr [16], which relates image descriptions, to evaluate the
correctness of the image description. Here, the recall score, as opposed to the precision
score, is more useful since we need to test the sensitivity of the text rather than its
specificity. We use BLEU with a smoothing function up to 4-grams and the default settings of
METEOR and CIDEr. While BLEU and METEOR metrics are used for machine
translation use-cases, they evaluate the sentence structure with a high correlation
with human judgment. The CIDEr metric is particularly useful for VQA since it
focuses on the evaluation of image descriptions.
VQA is commonly considered to be a classification problem as it chooses the
answer with the highest probability in the softmax layer of the LSTM. In order
to evaluate the accuracy of predicted answers, we use top-1 and top-5 accuracy
measures. We have taken into account the relevance of the answer to the question as
well as its correctness.
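A minimal NLTK sketch of the smoothed BLEU-1 to BLEU-4 computation described above; the tokenization and the particular smoothing method are assumptions.

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def bleu_scores(references, hypothesis):
    refs = [r.lower().split() for r in references]
    hyp = hypothesis.lower().split()
    smooth = SmoothingFunction().method1          # smoothing choice assumed
    weights = [(1, 0, 0, 0), (0.5, 0.5, 0, 0),
               (1/3, 1/3, 1/3, 0), (0.25, 0.25, 0.25, 0.25)]
    return [sentence_bleu(refs, hyp, weights=w, smoothing_function=smooth)
            for w in weights]

refs = ["what is the man holding?", "what is he holding in his hand?"]
print(bleu_scores(refs, "what is the man holding in his hand?"))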

7 Results

See Figs. 5 and 6.

7.1 Evaluation Scores

The VQA dataset [1] consists of three subsets—the training dataset, validation
dataset, and test dataset. We use the val data consisting of 40,504 images to evaluate
our model. For our human evaluation, we randomly choose a set of images which
are evaluated by our human judges. It is important to note the criterion that each
of these metrics uses to evaluate the system. Since the automatic evaluation metrics
have no reference to the image itself, the human annotators ensure that the question
correlates with the image subject matter. BLEU and METEOR look for sentence
structure, while CIDEr ensures consensus with image annotations. The input to our
system consists of only an image. Based on the image features, the model is able to
deduce the most appropriate question for the image. As we can see from the table,
the VQG component fares very well compared to the baseline and maintains a steady
score from BLEU-1 to BLEU-4. In the VQA component, we can see that there
is a slight improvement in the score. This could be because of using similar training
data for both VQG and VQA components, thus allowing the model to complement
the questions generated by the VQG system well (Tables 1 and 2).

Fig. 5 Correct results of our proposed system

8 Conclusion

Visual Question Answering (VQA) is an important interdisciplinary field that
combines linguistic capabilities with visual data. The popularity of the field is
entrenched in combining image features with linguistic features, along with the
difficulties that come with them. Moreover, an automated system that combines
VQG as well as VQA capabilities has been absent. Given an image, our project
generates a question using our VQG model, which is then passed to our VQA model
to generate an answer.

Fig. 6 Incorrect results of our proposed system

Table 1 Automatic evaluation results for VQG

Metric              Baseline performance   Proposed solution performance
Human evaluation    –                      70
BLEU-1 (recall)     0.434456               0.533762
BLEU-2 (recall)     0.392671               0.467108
BLEU-3 (recall)     0.150000               0.405431
BLEU-4 (recall)     0.997655               0.334750
METEOR (recall)     0.193276               0.399732
CIDEr               –                      1.234419

Table 2 Automatic evaluation results for VQA

Accuracy score    Baseline performance   Proposed solution performance
Top-1 accuracy    58.2                   60.43
Top-5 accuracy    –                      53.3762

We have also received promising results for our given hypothesis. Thus, if the system
is trained and tested using sufficient data, it will be able to generate questions and
validate them with a robust question answering module.

9 Future Scope

Our future scope is aimed at overcoming the drawbacks of the current system and
advancing it. Since the current VQA system produces one-word answers to questions

generated by VQG, we plan to make the VQA system capable of generating sentences
too. To enhance the system further, our future scope incorporates recognizing
emotions, actions, and events in the image accurately and generating relevant results
based on the same. Thirdly, we aim to increase the accuracy of the current VQG and
VQA systems, so as to produce more natural question-answer pairs.

References

1. Agrawal, A., Lu, J., Antol, S., Mitchell, M., Zitnick, C.L., Batra, D., Parikh, D.: VQA: visual
question answering. In: IEEE International Conference on Computer Vision (ICCV) (2015).
https://doi.org/10.1109/iccv.2015.279
2. Banerjee, S., Lavie, A.: METEOR: an automatic metric for MT evaluation with improved
correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and
Extrinsic Evaluation Measures for Machine Translation and/or Summarization (2005)
3. Ben-Younes, H., Cadene, R., Cord, M., Thome, N.: MUTAN: multimodal tucker fusion for
visual question answering. In: IEEE International Conference on Computer Vision (ICCV)
(2017). https://doi.org/10.1109/iccv.2017.285
4. Goyal, Y., Khot, T., Summers-Stay, D., Batra, D., Parikh, D.: Making the V in VQA matter:
elevating the role of image understanding in visual question answering. In: IEEE Conference
on Computer Vision and Pattern Recognition (CVPR) (2017). https://doi.org/10.1109/cvpr.
2017.670
5. Johnson, J., Karpathy, A., Li, F.: DenseCap: fully convolutional localization networks for
dense captioning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (2016)
6. Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele,
B., Tuytelaars, T. (eds.) Computer Vision ECCV 2014. ECCV 2014. Lecture Notes in Computer
Science, vol. 8693. Springer, Cham (2014)
7. Li, Y., Duan, N., Zhou, B., Chu, X., Ouyang, W., Wang, X.: Visual question generation as dual
task of visual question answering. In: IEEE/CVF Conference on Computer Vision and Pattern
Recognition (2018). https://doi.org/10.1109/cvpr.2018.00640
8. Lu, J., Yang, J., Batra, D., Parikh, D.: Hierarchical question-image co-attention for visual ques-
tion answering. In: Proceedings of the 30th International Conference on Neural Information
Processing Systems (2016)
9. Meng, C., Wang, Y., Zhang, S.: Image-Question-Linguistic Co-attention for Visual Question
Answering (2017)
10. Mostafazadeh, N., Misra, I., Devlin, J., Mitchell, M., He, X., Vanderwende, L.: Generating
natural questions about an image. In: Proceedings of the 54th Annual Meeting of the Association
for Computational Linguistics (Volume 1: Long Papers) (2016). https://doi.org/10.18653/v1/
p16-1170
11. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in
vector space. arXiv preprint arXiv:1301.3781 (2013)
12. Papineni, K., Roukos, S., Ward, T., Zhu, W.: BLEU: a method for automatic evaluation
of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for
Computational Linguistics (ACL), Philadelphia, July 2002, pp. 311–318 (2002)
13. Ray, A., Christie, G., Bansal, M., Batra, D., Parikh, D.: Question relevance in VQA: identifying
non-visual and false-premise questions. In: Proceedings of the 2016 Conference on Empirical
Methods in Natural Language Processing (2016). https://doi.org/10.18653/v1/d16-1090
14. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image
recognition. In: CoRR. https://arxiv.org/abs/1409.1556 (2014)

15. Uehara, K., Tejero-de-Pablos, A., Ushiku, Y., Harada, T.: Visual question generation for class
acquisition of unknown objects. In: Computer Vision ECCV 2018. ECCV 2018. Lecture Notes
in Computer Science, pp. 492–507 (2018). https://doi.org/10.1007/978-3-030-01258-8_30
16. Vedantam, R., Zitnick, C.L., Parikh, D.: CIDEr: consensus-based image description evaluation
(2014)
17. Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator.
In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015). https://
doi.org/10.1109/cvpr.2015.7298935
18. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., Bengio, Y.: Show,
attend and tell: neural image caption generation with visual attention. In: Proceedings of the
32nd International Conference on Machine Learning. PMLR, vol. 37, pp. 2048–2057 (2015)
19. Yang, Z., He, X., Gao, J., Deng, L., Smola, A.: Stacked attention networks for image question
answering. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016).
https://doi.org/10.1109/cvpr.2016.10
20. Zhang, S., Qu, L., You, S., Yang, Z., Zhang, J.: Automatic generation of grounded visual
questions. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial
Intelligence (2017). https://doi.org/10.24963/ijcai.2017/592
Comprehensive Survey on Deep
Learning Approaches in Predictive
Business Process Monitoring

Nitin Harane and Sheetal Rathi

N. Harane (B) · S. Rathi
Mumbai, India
S. Rathi
e-mail: sheetal.rathi@thakureducation.org

Abstract In the last few years, a lot of work has been carried out in the process
mining field by various researchers. Process mining deals with the analysis and
extraction of process-related information from the event logs created by business
processes. Predictive monitoring of business processes is a subfield of process mining
in which event logs are analyzed to make various process-specific predictions. Various
machine learning and deep learning techniques have been proposed for predictive
business process monitoring (BPM). The aim of these techniques is to predict the
next process event, the remaining cycle time, deadline violations, etc., of a running
process instance. The goal of this paper is to discuss the most representative deep
learning approaches used for the runtime prediction of business processes. The
different types of deep learning approaches used in predictive BPM, based on
Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) and stacked
autoencoders, are highlighted in this paper. We also focus on aspects such as the type
of dataset, the predicted values, the type of data encoding and the quality evaluation
metrics for the categorization of these approaches. In this paper we have highlighted
various research gaps in the mentioned deep learning approaches, which other
researchers in this field can refer to in order to enhance the effectiveness of predictive
BPM.

Keywords BPM (Business process monitoring) · LSTM (Long short term
memory) · RNN (Recurrent neural network) · Auto encoders

1 Introduction

In the field of process mining, various techniques are used to extract required
information from the event logs created by various business processes (BP) [1]. Such
information extracted from event logs helps to enhance the behavior of ongoing
processes. Moreover, enthusiasm is growing for dealing with ongoing process
instances by applying
process mining techniques. Predictive monitoring of BPs [2] is a subfield of process
mining which helps to provide timely information about ongoing processes to deal
with future risks and improve performance. Various runtime methods have been
implemented which aim to develop predictive models [3] that can be used to predict
a particular value by extracting useful data from event logs. In this case, the event
log provides the necessary properties of the running processes for the prediction.
Taking an event log as input, predictive models generate an output which is a
predicted value of a particular type. There are various types of predicted values,
such as boolean, categorical or numeric. Predicting such values using instances of
ongoing processes is practically interesting and challenging. These predicted values
are used to assess the performance of running processes in terms of effectiveness
and efficiency. Predicted values can also be used to mitigate risks or to check for
process rule violations.
Nowadays, deep learning [4] has become a rapidly emerging field. In recent times,
various deep learning approaches have been applied in the predictive monitoring
area [5–7]. These approaches were developed to predict different kinds of parameters
and have been applied in various areas of BPs. In spite of their differences, they all
share numerous comparable perspectives. Therefore, a combined study of the
mentioned approaches will enable researchers to explore this area further. Through
this survey we compare and analyze the various deep learning approaches used in
the predictive business process monitoring area.
Our comparative study will help researchers in the predictive monitoring field
to enhance the effectiveness of current deep learning approaches. They will also be
able to choose the best approach among them as per their research requirements.
Apart from this, researchers will gain twofold support from this comparative study.
First, the ideas presented and the general view may enable researchers to structure
new predictive models that improve the efficiency of available techniques. Second,
future researchers will get an overall idea of current approaches, which will help
them address the existing gaps in predictive BPM.
The remaining part of this paper is organized as follows. In Sect. 2, we summarize
some basic concepts related to predictive BPM. Sections 3 and 4 discuss various
deep learning approaches used in predictive BPM, considering various aspects.
Section 5 gives a comparative analysis of the deep learning approaches based on
various evaluation metrics. Finally, Sect. 6 concludes the paper and identifies gaps
in this field.

2 Preliminary Concepts

This section summarizes preliminary concepts such as the input data for prediction,
encoding, RNNs and LSTMs, which are considered in this survey.

Table 1 Event log example

Event id   Timestamp          Resource   Cost
9901       22-1-2019@09.15    Aarav      100
9902       22-1-2019@09.18    Aditi      200
9903       22-1-2019@09.27    Aarohi     300

2.1 Input Data

The event log created from different running processes is used as the primary input
to predictive BPM techniques. Table 1 shows an example of an event log generated
by an ongoing process. Each row represents the execution of a process event and its
information. Typically, this information includes the identity of the process instance,
the event, and the timestamp of the executed event. Additional attributes, such as the
resource person's name or other activity-related attributes, can also be included in
the log.
The various parameters shown in Table 1 can be categorized as follows: an event id,
the unique identity of each event; a timestamp, representing the time and execution
date of the activity; the resource, who runs the activity; and the cost and other
information, which give an idea about data-flow perspectives.

2.2 Encoding

It is important to explain the encoding, which stores process-related data, before
building any predictive model. It is necessary to convert the event log into feature
vectors; these feature vectors are essentially properties of the events. These properties
can be the event id (EI), timestamp (TS), resource (RS) and cost (C), as shown in
Table 1. Generally, the encoding for a trace includes only the flow perspective; in
some encoding techniques, interdependencies among the events in traces are also
considered. The encoding generally indicates events and the information related to
them. Events with different parameters are considered in the encoding; e.g., in some
techniques the whole process is considered, whereas in others only a few attributes
of the events are considered. In some techniques, attributes like the resources involved
in a process are considered for building accurate information.

2.3 Recurrent Neural Network

Recurrent Neural Networks (RNNs) are an efficient type of neural network and are
among the best algorithms available because of the internal memory they have. In
an RNN, each cell feeds information back to maintain an internal state over time;
because of this property, RNNs are used in sequential data analysis. An RNN has
three layers: input, hidden and output. The output of the input layer is provided as
input to a hidden layer, and

Fig. 1 A simple recurrent neural network

the output of one hidden layer is provided as input to the next hidden layer, and so
on. The output layer takes input from the last hidden layer. The output of every unit
is a function of the weighted sum of its inputs.
As shown in Fig. 1, each step in the diagram is a time step, where xt−1, xt, xt+1 are
the inputs and ot−1, ot, ot+1 are the outputs provided at the different time steps. The
output of the RNN generated at time t − 1 acts as input for time t. U and W are the
weight matrices for the new inputs and the hidden layers, respectively.

2.4 Long Short-Term Memory

Long Short-Term Memory (LSTM) is an artificial recurrent neural network that is
used in the field of deep learning. An LSTM includes an efficient memory cell which
helps it remember or forget information. Groups of LSTM units are used as building
blocks for the layers of an RNN, and LSTMs help RNNs remember and process
entire sequences of data.
As shown in Fig. 2, an LSTM contains three gates: the input (it), forget (ft) and
output (ot) gates. These gates are called controlling gates because the LSTM state is
accessed, written and cleared through the output, input and forget gates, respectively.
Additionally, the past memory cell state is cleared if ft is activated.
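For reference, a standard formulation of the LSTM updates corresponding to these gates is sketched below (the surveyed papers may use minor variants), with σ the sigmoid function and ⊙ element-wise multiplication:

it = σ(Wi xt + Ui ht−1 + bi)          (input gate)
ft = σ(Wf xt + Uf ht−1 + bf)          (forget gate)
ot = σ(Wo xt + Uo ht−1 + bo)          (output gate)
c̃t = tanh(Wc xt + Uc ht−1 + bc)       (candidate memory)
ct = ft ⊙ ct−1 + it ⊙ c̃t              (memory cell update)
ht = ot ⊙ tanh(ct)                     (hidden state)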

Fig. 2 Illustration of LSTM



3 Literature Survey

Apart from explicit process models, various deep learning based techniques have
been presented in recent years. Such techniques used in predictive business process
monitoring are highlighted below with their advantages and limitations.
Evermann et al. [5] presented the idea of using deep learning with recurrent
neural networks (RNN) [6–8], which helps to predict the next event in a running
process using the process-related data available at the prediction point. This was the
first approach to treat an event trace like a natural language sentence and events like
words. The set of events is passed to the RNN in the form of a set of words to predict
the next event in the ongoing process. The authors presented an RNN architecture
with a single hidden layer of LSTM cells to predict the next event in the ongoing
process. They used the BPI Challenge 2012 [9] and BPI Challenge 2013 [10, 11]
datasets for their study. As a limitation, they note that this technique could be extended
to predict process outcomes such as the remaining completion time or violations of
ongoing process rules.
Tax et al. [12] also applied the same LSTM approach, considering the sequence of
activity occurrences and their timestamps. The authors used one-hot encoding to
transform input events into feature vectors, which was not used in the earlier deep
learning approach. Inspired by the results of Evermann et al. [5], they presented
various LSTM architectures to predict the next activity and its timestamp in a running
process instance. In this approach two separate methods were used, one for activity
prediction and the other for timestamp prediction. The Helpdesk dataset and the
BPI'12 subprocess W dataset were used for evaluation. It is worth mentioning that
the authors highlight a limitation of the LSTM model: it predicts long sequences of
the same event when dealing with traces containing similar activities multiple times.
As future work, this technique can be enhanced to predict other targets such as the
remaining cycle time and the attributes of the next activity.
Nijat Mehdiyev et al. [13] presented a unique predictive business process model
based on a deep learning approach. The goal of this technique is to predict the next
event in an ongoing process using its activities available at prediction time. The
authors presented a multi-stage deep learning approach to deal with classification
problems such as next event prediction. An important part of this research work is
building the feature vector from sequence data: the authors introduced an n-gram
encoding technique to detect interdependencies among sequential event data, which
is not possible using a simple index encoding method. After extracting the feature
matrix from the event log, the deep learning method is applied to predict the next
event in the running process. The deep learning method in this approach consists of
two components: an unsupervised component used for pre-training and a supervised
component used for classification. Applying the n-gram encoding technique to a
large event space may generate a feature space with high dimensionality; therefore,
to obtain a proper input vector size from a large event space, the authors used a
feature hashing technique. Apart from this, the authors presented the concept of
optimizing hyperparameters in deep learning, which is

not addressed in earlier techniques. The datasets used for evaluation are BPI Chal-
lenge 2012 [9], BPI Challenge 2013 [10, 11] and Helpdesk [14] data. As future work
perspective, author suggested to apply presented approach to regression problems
like predicting remaining time for case completion. Also author has addressed issues
like concept drift, feature drift which can be considered as future work.

4 Research Questions

4.1 What Kinds of Deep Learning Approaches Are Used in Predictive BPM?

This section discusses the various deep learning approaches presented earlier in the
field of predictive BPM, focusing on approaches based on RNNs, LSTMs and
autoencoders.

LSTM Based Deep Learning Approach Evermann et al. [5] applied an LSTM based
deep learning approach which deals with the prediction of the next event in an
ongoing process. In this approach the authors presented an RNN architecture which
includes hidden layers with a network of LSTM cells. In the mentioned approach,
event traces are treated as natural language sentences and events as words. Tax et al.
[12] presented the same idea of using an LSTM network to predict the timestamp
and next activity in an ongoing case. Tax et al. [12] presented various LSTM based
neural network architectures, which include single-task layers and multi-task layers;
the authors also highlighted how multi-task learning outperforms single-task learning.

Stacked Autoencoders Based Deep Learning Approach Autoencoders are used to
learn data codings in an unsupervised manner. Mehdiyev et al. [13] presented a
novel deep learning approach for business process prediction which uses stacked
autoencoders to predict the next event in an ongoing process instance. The training
process in the mentioned approach is divided into two parts:
Unsupervised pre-training. Stacked autoencoders are used in the unsupervised
pre-training part to extract a high-level feature representation. The initial weights for
the next stage, i.e. supervised fine-tuning, are obtained by independent training of
the stacked autoencoders. As an advantage, stacked autoencoders give a feature
representation which is better than conventional techniques.
Supervised fine-tuning. After the unsupervised pre-training stage, the weights
received from the previous stage are fine-tuned by applying logistic regression on
them. In this stage, an output layer is added on top of the stack for multiclass
classification.
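The sketch below illustrates greedy unsupervised pre-training of two stacked autoencoder layers followed by supervised fine-tuning with an added output layer; the layer sizes, activations and training details are illustrative assumptions, not the surveyed implementation.

import torch
import torch.nn as nn

def pretrain_layer(data, in_dim, hid_dim, epochs=10):
    # One autoencoder: train encoder/decoder to reconstruct its input.
    enc, dec = nn.Linear(in_dim, hid_dim), nn.Linear(hid_dim, in_dim)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()))
    for _ in range(epochs):
        recon = dec(torch.sigmoid(enc(data)))
        loss = nn.functional.mse_loss(recon, data)   # reconstruction objective
        opt.zero_grad(); loss.backward(); opt.step()
    return enc, torch.sigmoid(enc(data)).detach()    # encoder + encoded features

x = torch.rand(1000, 300)                # dummy feature matrix from an event log
enc1, h1 = pretrain_layer(x, 300, 128)   # first autoencoder
enc2, h2 = pretrain_layer(h1, 128, 64)   # second autoencoder, stacked on the first

# Supervised fine-tuning: stack the pre-trained encoders, add an output layer,
# and train the whole network on the labelled next-event classification task.
model = nn.Sequential(enc1, nn.Sigmoid(), enc2, nn.Sigmoid(), nn.Linear(64, 10))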

4.2 What Are the Various Datasets Used in Predictive BPM?

This section describes the most common datasets that are used to measure the
efficiency of the deep learning approaches presented in [5, 12, 13].

BPI Challenge 2012 The BPI Challenge 2012 [9] dataset includes 13,087 cases with
262,000 events collected from a Dutch financial institute. The activities involved in
the loan application process are categorized into three sub-processes: activities linked
to the application (A), activities belonging to the application (W) and activities linked
to the offer (O). Events for the A and O sub-processes include only the completed
lifecycle transition, while the W sub-process has scheduled, started and completed
lifecycle transitions.

BPI'12 Subprocess W Dataset This dataset is created from BPI'12 and comprises
information from an application procedure related to various financial products,
collected from a financial institution. The dataset is divided into three sub-processes:
one that deals with the application condition, one that deals with the offer condition,
and one that deals with data related to the application.

BPI Challenge 2013 The BPI Challenge 2013 [10, 11] dataset incorporates log
information related to an incident and problem management system collected from the
Volvo IT company. It is categorized into three subsets: the incident management subset
has 65,533 events of 13 unique types belonging to 7554 cases; the open problem subset
has 2351 events of 5 unique types belonging to 819 cases; and the closed problem subset
has 6660 events of 7 unique types belonging to 1487 cases.

Helpdesk Dataset Helpdesk [14] contains log information generated from the ticketing
management process of an Italian software company. Each case starts with the creation
of a new ticket and ends with the completion of the case or the closing of the ticket.
This log has about 3,804 cases and 13,710 events with 9 activities.

4.3 What Are the Various Encoding Techniques Applied on Event Logs in Predictive BPM?

An encoding represents sufficient information about the running process to serve as the
primary input of the approach applied to construct the predictive model. In short, an
encoding represents the events and their associated information. This section describes
the various encoding techniques used in the deep learning approaches presented in
[5, 12, 13].

Word Embedding Evermann et al. [5] presented the idea in which event traces are
treated like natural language sentences and events like words. A matrix with v × m
dimensions is formed by converting words into an m-dimensional “embedding”
space [5], where v represents the size of the vocabulary and m stands for the dimension
of the “embedding” space as well as the dimension of every LSTM hidden layer.
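A minimal Keras sketch of such an embedding feeding an LSTM hidden layer is shown
below; the vocabulary size v, the embedding dimension m and the example trace are
illustrative assumptions.

    import numpy as np
    from tensorflow.keras import layers, models

    v, m = 20, 8                                       # assumed vocabulary size and dimension
    model = models.Sequential([
        layers.Embedding(input_dim=v, output_dim=m),   # the v x m embedding matrix
        layers.LSTM(m),                                # LSTM hidden layer of dimension m
        layers.Dense(v, activation='softmax'),         # distribution over next events
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

    trace = np.array([[3, 7, 1, 4]])                   # one event trace as word indices
    print(model.predict(trace, verbose=0).shape)       # (1, v)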

One Hot Encoding One-hot encoding is a procedure by which categorical values
are changed into numerical values that can be given to a deep learning algorithm,
which helps to improve predictive performance. Tax et al. [12] used one-hot
encoding for predicting the next activity and its timestamp. Feature vectors, created
from each event e, are used as input to the LSTM network. The |A| features that
represent the activity type of event e are used to build the feature vector in so-called
one-hot encoding. The authors used an index function A → {1, …, |A|} to indicate the
position of an activity in the set of activities A. If a particular activity belongs to the
event, the value 1 is assigned to the feature at that index and 0 is assigned to all
other features.
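The following small Python sketch illustrates this one-hot scheme; the activity set and
the trace are illustrative assumptions.

    import numpy as np

    activities = ['register', 'check', 'approve', 'notify']   # assumed activity set A
    index = {a: i for i, a in enumerate(activities)}          # position of each activity

    def one_hot(event):
        # |A|-dimensional feature vector with 1 at the event's activity index
        vec = np.zeros(len(activities))
        vec[index[event]] = 1.0
        return vec

    trace = ['register', 'check', 'approve']
    features = np.stack([one_hot(e) for e in trace])          # one row per event
    print(features)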

N-gram Encoding Mehdiyev et al. [13] presented a sequence encoding approach to
convert the executed activities into numerical input features. Event sequence data is a
set of events with their interrelations and dependencies. Compared to one-hot encoding,
n-gram encoding is a more appropriate approach for analyzing dependencies among
events. Consider the following event sequence data, E = {P, Q, S, L, M, N}. All
combinations such as {PQ, QS, SL, …, MN} are 2-gram features, and all combinations
such as {PQS, QSL, …, LMN} are 3-gram features. The reason for using n-gram
encoding is that it requires minimal preprocessing such as sequence alignment;
moreover, apart from encoding the letters, it is able to order them automatically.
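A short sketch of extracting such n-gram features from the event sequence in the text
follows.

    def ngrams(sequence, n):
        # All contiguous n-grams of an event sequence, joined as strings
        return [''.join(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]

    E = ['P', 'Q', 'S', 'L', 'M', 'N']
    print(ngrams(E, 2))   # ['PQ', 'QS', 'SL', 'LM', 'MN'] -- the 2-gram features
    print(ngrams(E, 3))   # ['PQS', 'QSL', 'SLM', 'LMN'] -- the 3-gram features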

4.4 What Are the Various Prediction Types in Predictive BPM?

This section describes the various prediction types used in predictive business process
monitoring. The three main categories of prediction types are presented below.

Numeric Prediction Numeric predictions are partitioned into two groups based on the
particular type of prediction:
Time predictions. Several works in this group are based on explicit models. In [15] a
set of approaches is presented in which transition systems generated from events are
annotated with time-related data derived from the event logs. In [16, 17], annotated
transition systems are combined with machine learning techniques to improve their
performance.
Cost predictions. In [18], the authors presented cost prediction work which relies
explicitly on models. In such approaches, the cost is predicted by analyzing a process
model, considering parameters such as production information, volume and time.
Categorical Prediction Categorical prediction is also partitioned into two groups,
listed below:
Risk predictions. In recent times, a lot of work has dealt with the prediction of risks
under outcome-oriented predictions. An important part of this type is the existence of
an explicit model which guides the prediction. In [19], the authors presented a technique
for reducing process risks. Considering process-related data such as resources and
execution frequencies, decision trees are generated from the event logs; the generated
decision trees are then traversed and used for risk mitigation.
Categorical outcome predictions. No work in this category is based on any explicit
model. In [20], the authors presented an approach which predicts the violation of a
predicate in a running process. Such an approach generates predictions by considering:
(i) event sequence data and (ii) the data payload of the last activity of the running case.
Activity Sequence Prediction This category includes more recent works which deal
with the prediction of the next event in an ongoing process [8, 12, 17, 21]. In [17], an
annotated data-aware transition system is used to predict the future activities of a
running case. Other approaches, e.g., [5, 12], make use of recurrent neural networks
with LSTM cells for predicting various process-related parameters.

4.5 Which Evaluation Parameters Are Used to Measure Effectiveness of Deep Learning Approaches in Predictive BPM?

The quality of a predictive monitoring system can be evaluated with the help of
evaluation metrics, and the choice of metric depends on the type of input and the
encoding technique. The following measures are used by the deep learning approaches
presented in [5, 12, 13].
Precision It is the number of correctly predicted positive instances divided by the total
number of instances predicted as positive (Table 2).

    Precision = TP / (TP + FP)

Recall It indicates what percentage of the instances of the positive class has been
correctly identified (Table 2).

    Recall = TP / (TP + FN)

Table 2 Symbols used in equations

Symbol | Meaning
TP     | Events with positive event type classified correctly
FP     | Events with negative event type classified as positive
TN     | Events with negative event type classified correctly
FN     | Events with positive event type classified wrongly

Accuracy It is the proportion of true positives and true negatives among all evaluated
cases (Table 2).

    Accuracy = (TP + TN) / (TP + TN + FP + FN)

F_measure It represents the harmonic mean of precision and recall (Table 2).

    F_measure = 2 · (Precision · Recall) / (Precision + Recall)
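All four metrics follow directly from the confusion-matrix counts of Table 2; a small
Python sketch with illustrative counts:

    def evaluation_metrics(tp, fp, tn, fn):
        # Precision, recall, accuracy and F-measure from the counts of Table 2
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        accuracy = (tp + tn) / (tp + tn + fp + fn)
        f_measure = 2 * precision * recall / (precision + recall)
        return precision, recall, accuracy, f_measure

    print(evaluation_metrics(tp=80, fp=20, tn=90, fn=10))  # illustrative counts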

4.6 What Research Gaps Exist in the Deep Learning Approaches Used in Predictive BPM?

Evermann et al. [5] presented a novel approach of using a recurrent neural network
(RNN) based deep learning method [5, 12, 13] for the prediction of the next event from
the event log of an ongoing process. Predicting process outcomes such as remaining
time and compliance violations is not addressed by Evermann et al. [5]. The authors
also suggested supplying case attribute information to the predictor, which may lead to
better prediction accuracy.
Tax et al. [12] applied the same LSTM approach, taking into account the sequence of
activity occurrences and their timestamps. Inspired by the results of Evermann et al.
[5], the authors presented various LSTM architectures for predicting the timestamp and
the next event in an ongoing case. It is worth mentioning that the authors highlighted a
limitation of the LSTM model: it predicts long sequences of the same event when
dealing with traces that contain similar activities multiple times. As future work this
technique can be enhanced to predict other tasks such as the remaining cycle time and
the attributes of the next activity, and to predict other parameters such as aggregate
performance indicators and case outcomes. The classification problem is also left
unaddressed in the presented approach, which can be further future work.
Mehdiyev et al. [13] presented a unique predictive business process model based on a
deep learning approach. The goal of this technique is to predict the next event in an
ongoing process using its activities available at prediction time.

Table 3 Comparison of deep learning approaches in predictive BPM

Author   | Deep learning method | Datasets                                                  | Encoding type    | Prediction type                     | Gaps/non-predicted attributes
Evermann | RNN with LSTM        | BPI Challenge 2012, BPI Challenge 2013                    | Word embedding   | Next process event                  | Process outcome
Tax      | RNN with LSTM        | HelpDesk Dataset, BPI’12 Dataset                          | One-hot encoding | Next activity, remaining cycle time | Aggregate performance indicators, case outcome
Mehdiyev | Stacked autoencoders | BPI Challenge 2012, BPI Challenge 2013, HelpDesk Dataset  | N-gram encoding  | Next process event                  | Time to next event, remaining time to case completion

As a future work perspective, the authors suggested applying the presented approach to
regression problems such as predicting the remaining time to case completion. Issues
such as concept drift and feature drift are not considered in the approach and can also
be taken up as future work. The presented approach can further be applied to predict
the outcome of a business process, such as compliance with service-level agreements
and process success or failure.
Based on the above research questions, Table 3 gives an overall comparison of the
various deep learning approaches used in predictive business process monitoring,
based on parameters such as deep learning method, dataset, encoding and prediction
type.

5 Comparative Analysis

In this section we compare the effectiveness of the various deep learning approaches
used in predictive business process monitoring based on evaluation parameters such as
precision and recall. The datasets BPI Challenge 2012 [9], BPI Challenge 2013 [10, 11]
and Helpdesk [14] are considered for the comparison. The BPI Challenge 2012 [9]
dataset is categorized into three sub-processes: activities linked to the application (A),
activities belonging to the application work items (W) and activities linked to the offer
(O). Events for the A and O sub-processes include only the completed

Table 4 Comparison based on evaluation metrics

Dataset            | Deep learning approach | Accuracy | Precision | Recall
BPI 2012_W         | Evermann et al. [5]    |          | 0.658     |
                   | Tax et al. [12]        | 0.760    |           |
                   | Mehdiyev et al. [13]   | 0.831    | 0.811     | 0.832
BPI 2012_A         | Evermann et al. [5]    |          | 0.832     |
                   | Mehdiyev et al. [13]   | 0.824    | 0.852     | 0.824
BPI 2012_O         | Evermann et al. [5]    |          | 0.836     |
                   | Mehdiyev et al. [13]   | 0.821    | 0.847     | 0.822
BPI 2013_incidents | Evermann et al. [5]    |          | 0.735     |
                   | Mehdiyev et al. [13]   | 0.663    | 0.648     | 0.664
BPI 2013_problems  | Evermann et al. [5]    |          | 0.628     |
                   | Mehdiyev et al. [13]   | 0.662    | 0.641     | 0.662
Helpdesk           | Tax et al. [12]        | 0.712    |           |
                   | Mehdiyev et al. [13]   | 0.782    | 0.632     | 0.781

Bold in the original indicates that the algorithm achieved better results than the other algorithms

lifecycle transition, while the W sub-process has scheduled, started and completed
lifecycle transitions. The deep learning approaches in [5, 12, 13] used only completion
events from the above-mentioned datasets. The BPI Challenge 2013 [10, 11] dataset
incorporates log information related to an incident and problem management system
collected from the Volvo IT company. From BPI Challenge 2013 [10], two subsets,
incident management and problem management, are considered for the comparison.
Helpdesk [14] contains log information generated from the ticketing management
process of an Italian software company.
As shown in Table 4, considering all three datasets from BPI 2012, Mehdiyev et al.
[13] outperforms the other two approaches: the accuracy achieved by Mehdiyev et al.
[13] is better than that achieved by Tax et al. [12], and Mehdiyev et al. [13] also
outperforms the approach presented by Evermann et al. [5] with respect to precision.
The results for BPI 2013_incidents and BPI 2013_problems are mixed: Mehdiyev et al.
[13] performs better with respect to recall, whereas Evermann et al. [5] performs better
with respect to precision. Finally, on the Helpdesk dataset Mehdiyev et al. [13]
performs better than Tax et al. [12] in terms of accuracy.

6 Conclusion

In this paper we discussed input data, encoding, LSTMs and RNNs as preliminary
concepts that are useful in predictive BPM. We discussed the various deep learning
approaches used in predictive business process monitoring. We presented the types of
predictions, the encoding techniques and the datasets used in predictive business
process monitoring, and compared them across three deep learning approaches, two of
them based on LSTM neural networks and the remaining one based on stacked
autoencoders. We considered the BPI 2012, BPI 2013 and Helpdesk datasets for the
comparison. From the comparison we highlighted the advantage of using a stacked
autoencoder based deep learning approach over LSTM based deep learning approaches
for predicting future events in an ongoing process: the stacked autoencoder based
approach performs better with respect to evaluation parameters such as precision,
recall and accuracy.
In this paper we have also highlighted unique challenges and gaps in the current deep
learning approaches used in predictive BPM. As a limitation, LSTM based approaches
do not perform up to the mark when dealing with multiple instances of the same
activity. LSTM based approaches also did not focus on predicting case outcomes and
aggregate performance indicators. In the mentioned approaches, regression problems
such as the remaining time to case completion are not addressed. Concept drift, feature
drift and hyperparameter optimization are further issues that are not addressed in the
current approaches. These gaps can be considered as future work for enhancing the
effectiveness of the mentioned deep learning approaches in predictive BPM.

References

1. Teinemaa, I., Dumas, M., Maggi, F.M., Di Francescomarino, C.: Predictive bp monitoring with
structured and unstructured data. In: Proceedings of the International Conference on Business
Process Management (BPM), pp. 401–417 (2016)
2. van der Aalst, W.: Process Mining: Data Science in Action. Springer, Berlin (2016)
3. Grigori, D., Casati, F., Castellanos, M., Dayal, U., Sayal, M., Shan, M.: Bp intelligence. Comput.
Ind. 53(3), 321–343 (2004)
4. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
5. Evermann, J., Rehse, J.R., Fettke, P.: A deep learning approach for predicting process behaviour
at runtime. In: PRAISE-2016 (2016)
6. Sutskever, I., Martens, J., Hinton, G.E.: Generating text with recurrent neural networks. In:
Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue,
Washington, USA, June 28–July 2, 2011
7. Graves, A.: Generating sequences with recurrent neural networks. CoRR abs/1308.0850 (2013)
8. Zaremba, W., Sutskever, I., Vinyals, O.: Recurrent neural network regularization. CoRR
abs/1409.2329 (2014)
9. van Dongen, B.F.: BPI Challenge 2012. Eindhoven University of Technology. Dataset. http://
dx.doi.org/10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f
10. Steeman, W.: BPI challenge 2013, incidents. Ghent University. Dataset. http://dx.doi.org/10.
4121/uuid:500573e6-accc-4b0c-9576-aa5468b10cee
11. Steeman, W.: BPI challenge 2013, open problems. Ghent University. Dataset. http://dx.doi.org/
10.4121/uuid:3537c19d-6c64-4b1d-815d-915ab0e479da
12. Tax, N., Verenich, I., La Rosa, M., Dumas, M.: Predictive business process monitoring with
LSTM neural networks. In: Proceedings of CAiSE 2017 (2017)
13. Mehdiyev, N., Evermann, J., Fettke, P.: A Novel Business Process Prediction Using a Deep
Learning Method. Springer, Berlin (2018)
14. Verenich, I.: Helpdesk. Dataset (2016). https://doi.org/10.17632/39bp3vv62t.1

15. Van der Aalst, W.M.P., Schonenberg, M.H., Song, M.: Time prediction based on process mining.
Inf. Syst. 36(2), 450–475 (2011)
16. Polato, M., Sperduti, A., Burattin, A., de Leoni, M.: Data-aware remaining time prediction
of business process instances. In: 2014 International Joint Conference on Neural Networks
(IJCNN) (July 2014)
17. Polato, M., Sperduti, A., Burattin, A., de Leoni, M.: Time and activity sequence prediction of
business process instances. Computing (2018)
18. Tu, T.B.H., Song, M.: Analysis and prediction cost of manufacturing process based on process
mining. In: ICIMSA (2016)
19. Conforti, R., de Leoni, M., La Rosa, M., van der Aalst, W.M.P.: Supporting risk informed
decisions during business process execution. In: CAiSE (2013)
20. Maggi, F.M., Di Francescomarino, C., Dumas, M., Ghidini, C.: Predictive monitoring of
business processes. In: Proceedings of CAiSE 2014 (2014)
21. Di Francescomarino, C., Ghidini, C., Maggi, F.M., Petrucci, G., Yeshchenko, A.: An eye into
the future: leveraging a-priori knowledge in predictive business process monitoring. In: BPM.
Springer, Berlin (2017)
Machine Learning Based Risk-Adaptive
Access Control System to Identify
Genuineness of the Requester

Kriti Srivastava and Narendra Shekokar

Abstract Data access can be controlled in a static manner using role-based or
policy-based access control. These access control systems can easily handle situations
in structured databases. In today’s era of big data, where a lot of research work has
gone into storing huge and unstructured data, there is still a big gap in providing data
access security. There are many real-world applications where static access control
systems are not effective, such as defense, airport surveillance and hospital management
systems. There is a need for a system which learns and adapts according to the
genuineness of the requester. The existing role-based access control methodology easily
attracts intruders, while the main drawback of policy-based access control is its lack of
adaptability, as the policy decided initially cannot be changed dynamically. The
proposed risk-adaptive access control is a framework which understands the genuineness
of the requester, calculates the risk and then acts accordingly. This framework considers
many real-world attributes in its design, such as time of access, location of access,
previous history of the requester (how many times the same request has been made by
the requester) and sensitivity of the information requested. The system senses the
situation (emergency or normal) and learns from past history. It calculates a risk score,
and based on the risk score access is provided. We have tested the accuracy of the
system as well as its false negatives, which ensures that the framework is adaptable.

Keywords Risk adaptive access control · Machine learning · Deep learning · Neural
network · Hospital management system

K. Srivastava (B) · N. Shekokar (B)


D. J. Sanghvi College of Engineering, Mumbai, India
e-mail: kriti.srivastava@djsce.ac.in
N. Shekokar
e-mail: narendra.shekokar@djsce.ac.in


1 Introduction

A Hospital Management System (HMS) includes different types of information, some
of it very personal or very critical. In a regular scenario in an HMS, only authorized
doctors are allowed to access information. But there may be an emergency in which
the authorized doctor is not available and another genuine doctor has to access
information in order to treat the patient. In such a situation the system should be able
to understand the genuineness of the doctor who is trying to access the information.
This genuineness can be calculated based on many parameters such as time of access,
location of access, previous history, emergency of the situation and sensitivity of the
information. An HMS is a very sensitive system, and relaxing the policies in case of
emergency has to be carefully learned by the system; this may let malicious intruders
into the system if “genuineness” is not identified correctly. Hence there is a need for a
strong learning system which determines the genuineness of a situation, adapts and
then acts accordingly. In this paper an optimized framework for an access control
system is developed using deep learning.
The rest of the paper is divided into six sections. Section 2 is a detailed analysis of the
literature in three main domains: risk-adaptive access control, risk calculation
methodologies and machine learning approaches for providing dynamicity. Section 3
describes the system architecture and is divided into three subsections: the first
discusses the need for RAdAC in an HMS, the second discusses the parameters
involved in risk calculation and the last is a detailed discussion of the preprocessing of
data. Section 4 discusses the neural network and autoencoder methods for RAdAC.
Section 5 is a detailed analysis of the various results from both methodologies. The
conclusion and future scope are discussed in Sect. 6.

2 Literature Survey

2.1 Risk Adaptive Access Control

There are many different types of access control systems, such as role-based and
policy-based. These access control systems provide the best results with limited data
and in static scenarios. Shermin [1] implemented a role-based access control system
for NoSQL but did not test it on multiple nodes, so it is difficult to say whether it will
work with the same efficiency when the data is huge and dynamic. Policy-based
systems are able to change and update their policies, but the developer has to make
those changes; there is no provision for dynamically adapting to changes in the models
discussed in these papers [2, 3]. Farroha [4] discussed the need for risk-adaptive access
control. They chose the real-world example of the United States Department of Defense,
where they elaborated the various operational needs for having a risk-adaptive access
control system. Farroha concluded that we need to list the operational needs of the
application and then calculate the trustworthiness of the requester based on regular
factors as well as environmental factors. This survey paper gave us the idea that
risk-adaptive access control can be useful in many different applications such as airport
surveillance [5] and hospital management systems [6].
Yang and Liu [7] developed a two-step method in which first a purpose forest is created
to address static needs and then previous history is added to provide dynamicity in
access control. Montrieux and Hu proposed an attribute-based access control for
bidirectional programming which used policy languages; they implemented the concept
of filters to enforce policies [8]. Rasheed discussed fine-grained medical software
component integrity verification techniques and fine-grained role-based access control
[9]. Many authors have claimed to have a role-based access control system that is
dynamic in nature [10, 11]. But having a fully dynamic access control system needs a
proper mechanism for calculating and analyzing the risk of the situation, which includes
current factors, previous history, the sensitivity of the information being requested and
the situation (if an emergency).

2.2 Risk Calculation

A role-based system identifies user roles, and based on those roles access is provided;
it is a very static system. In applications such as defense, hospitals or airports, the
system needs dynamicity if there is an emergency: it has to take a decision when an
authorized person is not available, and it can only take a proper decision if the risk
value is calculated keeping all the factors in mind. Pham et al. formulated a risk
calculation method which uses a greedy approach with nearest neighbor, support vector
machines and the local outlier factor [12]; they showed various results for reducing
denial-of-service, probe and user-to-root attacks. Lu and Sagduyu [13] included text
and behavioral analysis in their risk calculation. Laufer and Koczy used a fuzzy
approach for their risk calculation [14]: they identified input patterns, grouped similar
inputs and then applied different membership functions to the different groups. A
similar approach for risk-adaptive access control is used in [15]. Many authors have
used rule-based methods in order to arrive at an accurate risk value [16–18]. Wang and
Singh used a multi-agent approach to calculate the risk [19]. Risk calculation is an
integral part of any authentication or intrusion detection system, and it is an important
factor in achieving dynamicity in any security system. Achieving dynamicity in a
system is not a one-time effort; there has to be a proper provision for learning with
experience. This can be done by calculating the cost and reducing it each time. The
system should have a learning approach in order to find an accurate risk value and
then, based on the risk value, provide appropriate access.

2.3 Machine Learning Approach for Providing Dynamicity

As concluded in the previous section, calculating risk once or with limited parameters
will effectively behave as a static system. Risk may not always be present in the
system, but it may occur suddenly, so the system needs a learning methodology which
senses, assesses and highlights the appropriate risk. Of the three types of learning,
supervised learning is used in most access control cases. The main reason is that there
can be only two types of label in an access control system: grant access or deny access.
Supervised machine learning has proved to provide dynamicity and adaptability in
many areas. Reference [20] used SVMs for multi-agent based classification; SVMs
proved to be very effective for high-dimensional datasets. ANN-GA is also a very
effective combination for training the input parameters and making effective predictions
[21]. The neural network is a very robust algorithm and is very effective in decision
making [22]. There are many applications where the assessment of access grant and
denial needs to be learned appropriately. One system which is very similar to
risk-adaptive access control is the intrusion detection system; in fact, if the system is
able to identify intruders, then access grant and denial can be easily handled. Machine
learning approaches provide effective ways to identify intruders in a system [23–25].
Recently a good amount of work has been done using deep learning [26, 27]. Several
authors have applied CNNs, SVMs and nearest neighbors to improve predictions on
unbalanced datasets [28]. Studies show that autoencoders are very efficient for
unsupervised feature extraction, and in practice high-dimensional, huge datasets can be
processed on GPUs [29, 30]. A few authors have used autoencoders effectively for
credit card fraud and anomaly detection [31, 32].

3 System Description

Role-based, attribute-based and policy-based access control systems have been very
useful for static systems. In real life there are systems where there is a need to provide
access to a person whose role does not allow him to access the system, and the
situation may be completely new. In such cases the system has to use its own
intelligence to decide. These decisions are based on existing parameters, previous
history and the intensity of the situation. In this section we first discuss the base system
on which the entire research work is done and then discuss the various parameters used.

3.1 Details of the Base System

Risk-adaptive access control (RAdAC) is an access control system which identifies
the need of the requester as well as the criticality of the situation. After a well-analyzed
process it gives its decision on providing access. Defense, airport surveillance and
hospital management systems are a few systems where a risk-adaptive access control
system is needed. We have considered a hospital management system as the base
system for this work. The reason RAdAC is needed in an HMS is that during any kind
of emergency, if the assigned doctor is not available, the system has to decide whether
to provide access to another doctor or not. In such a situation a regular system will not
allow the new doctor to access the information; hence we need a system which senses
that this doctor is genuine and allows access. But this may not always be legitimate:
sometimes someone may pretend to be a doctor and try to access a patient’s
information illegally. In such a situation the system shall be able to identify the
intruder and deny access.

3.2 Discussion on Parameters

Choosing correct parameters is very important in RAdAC. Some of the research work
done in this area includes time and location parameters for risk evaluation, but gives no
importance to the situation. Hence we decided to also include emergency, previous
history and sensitivity of information as input parameters. Emergency is a binary
value, either yes or no. Location is also a binary value: information will be accessed
either from inside the hospital or from outside. Previous history means how many
times the same doctor has asked for the same patient’s information. This count is of
great importance in identifying an intruder, because a doctor who is not assigned to the
patient will only access the information in case of emergency; emergencies are rare
situations, so if the count in the previous history is high it can be alarming. Hence
previous history leads the system to a better understanding of the genuineness of the
requester. Sensitivity is a parameter which describes how relevant the requested data is
to the requester (doctor). Every doctor has a certain specialization, and based on that
specialization patients approach them. A straightforward example: only female patients
will be assigned to a gynecologist, so if a gynecologist accesses a female patient’s
record it will be of low sensitivity, but if a male patient’s record is accessed by a
gynecologist it should be alarming. Yet there may be situations where a gynecologist
needs to access both a husband’s and a wife’s data in order to treat infertility issues.
Similarly, an orthopedic doctor needs to know only general and bone-related
information, but if the patient has neurological issues then access to neurological data
by an orthopedist is justified. Though this falls under highly sensitive data for an
orthopedist, the system has to understand and allow access.

3.3 Preprocessing Input Data

As discussed in the previous section, the input table shall have a sensitivity parameter.
This parameter is not given in the dataset, so the novelty in our work is to write an
appropriate algorithm for calculating the sensitivity parameter:

    Sensitivity_score = 0.80–1.00, if assigned + all relevant
                        0.30–0.79, if assigned + 50% relevant
                        0.10–0.29, if not assigned + 50% relevant

For simplicity we have considered doctors with five specializations: gynecology,
dentistry, neurology, orthopedics and cardiology. The Sensitivity_score is generated
keeping in mind whether the doctor who is asking for the information is assigned to
the patient or not; another factor is how relevant the information is to the doctor. The
relevance parameter was assigned by consulting various doctors in Mumbai, as shown
in Table 1.
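A minimal sketch of this scoring rule is given below; the exact score chosen inside each
band and the relevance computation are illustrative assumptions consistent with the
ranges above.

    def sensitivity_score(assigned, relevance):
        # assigned: is the requesting doctor assigned to the patient?
        # relevance: fraction of the requested data relevant to the specialization
        if assigned and relevance >= 1.0:
            return 0.90      # assigned + all relevant: band 0.80-1.00
        if assigned and relevance >= 0.5:
            return 0.55      # assigned + 50% relevant: band 0.30-0.79
        if not assigned and relevance >= 0.5:
            return 0.20      # not assigned + 50% relevant: band 0.10-0.29
        return 0.10          # assumed floor for any remaining case

    print(sensitivity_score(assigned=True, relevance=1.0))    # 0.9
    print(sensitivity_score(assigned=False, relevance=0.6))   # 0.2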
The time parameter in the input data had a wide range. For our work we needed two
categories of the time parameter: during hospital hours and after hospital hours. One
option would have been a binary value, but as discussed before we can find genuine
accesses after hospital hours as well; there could be many reasons for accessing data at
an inappropriate time, such as the doctor arriving late or an emergency case. So instead
of a binary value a fuzzy value was preferred. In order to find an appropriate
membership function we considered the histogram of the time attribute and found that
it is very similar to a sigmoid function, as shown in Fig. 1. We used the scikit tool of
Python to fuzzify the time attribute. The input values ranged over the complete 24 h of
the day. After applying the sigmoid membership function we got three categories of
values: less than 0.5, where the accessing time was from 10 pm to morning

Table 1 Sample of sensitivity scores

Specialization | Related_data | Sensitivity_score
Gynec   | Surgical_history, Obstetric, Allergy, STD, X-Rays, Blood_test, Rehab, Consultation_reports, MRI, CT-scan, Endoscopy | 0.9
Cardio  | Surgical_history, Allergy, Family_history, X-Rays, Blood_test, Rehab, Consultant_reports, MRI, CT-scan, ECG, Endoscopy | 0.8
Neuro   | Surgical_history, Allergy, Family_history, X-Rays, Blood_test, Consultant_reports, CT-scan, MRI, ECG, EEG | 0.65
Dentist | Surgical_history, Allergy, Dental, X-Ray, Blood_test, Consultant_reports, CT-scan, Endoscopy | 0.4
Ortho   | Surgical_history, Allergy, Dental, X-Ray, Blood_test, Rehab, Consultant_reports, MRI, CT-scan, Endoscopy | 0.2

Fig. 1 Histogram for time parameter

8 am; 0.5–0.7, where the accessing time was 8–10 pm (the OPD closes at 8 pm); and
0.8–1, which corresponds to 8 am to 8 pm (regular OPD hours).
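A small NumPy sketch of such a sigmoid fuzzification follows; the paper's "scikit tool"
is presumably scikit-fuzzy, and the midpoint and slope below are illustrative
assumptions tuned so that roughly the three bands above emerge.

    import numpy as np

    def time_membership(hour):
        # Close to 1 in regular OPD hours (8 am-8 pm), around 0.5 just after
        # closing, and well below 0.5 deep into the night
        d = np.abs(hour - 14.0)          # hours away from mid-afternoon
        d = np.minimum(d, 24.0 - d)      # wrap around midnight
        return 1.0 / (1.0 + np.exp(2.0 * (d - 7.0)))

    for h in [12, 21, 23]:               # noon, 9 pm, 11 pm
        print(h, round(float(time_membership(h)), 2))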

4 System Development

A RAdAC system depends on a lot of factors, and the most important one is a good
learning mechanism. As discussed earlier, there cannot be fixed rules for allowing or
denying access; RAdAC depends purely on the situation. Hence the system has to
efficiently learn, adapt and then act. In this work we have explored and tested two
different ideas. The first idea is to calculate the sensitivity score and use a neural
network learning method; the second is to take raw inputs and perform feature
extraction using an autoencoder. Both ideas are explained in the following subsections.

4.1 Neural Network Approach

Recently, with the advent of GPUs, neural networks have gained a lot of popularity.
The neural network is one of the most accurate learning algorithms. Earlier, developers
used to find it very hard to train the network, but now, because of GPUs, the training
time is reduced. This was the reason we started our work with a neural network. Since
any complex real-world problem can be represented by a two-hidden-layer neural
network, we decided

Fig. 2 Network diagram for the neural network approach

to have a two-hidden-layer network, shown in Fig. 2. The neural network is evaluated
on five main criteria: time taken to converge, number of epochs, batch size, error and
optimizer used. The batch size can be fixed before starting the training; we chose batch
sizes of 128 and 1024. For calculating the cost we chose mean squared error and
binary cross-entropy error.
binary cross entropy error.
There are two main optimizers used in neural networks. Stochastic gradient descent
(SGD), also known as incremental gradient descent, is an iterative method for
optimizing a differentiable objective function, a stochastic approximation of gradient
descent optimization. In stochastic gradient descent, the true gradient of Q(w) is
approximated by the gradient at a single example:

    w := w − η ∇Qi(w)    (1)

As the algorithm sweeps through the training set, it performs the above update for each
training example. Several passes can be made over the training set until the algorithm
converges. If this is done, the data can be shuffled before each pass to prevent cycles.
The Adam optimization algorithm is an extension to stochastic gradient descent that
has recently seen broader adoption for deep learning applications. We have considered
various combinations of batch size, error, number of epochs and optimizer. As shown
in Table 2, with batch size 1024, mean squared error and the Adam optimizer we
obtained convergence in 30 epochs, where each epoch took 2 s. With batch

Table 2 Various combinations of neural network tested


Case Time No. of epochs Batch size Error Optimizer
1 2 s/epoch 30 1024 Mean squared Adam
2 16 s/epoch 5 128 Mean squared Adam
3 3 s/epoch 30 1024 Binary cross entropy Adam
4 2 s/epoch 30 1024 Mean squared SGD

size 128, convergence occurred at 5 epochs, but each epoch took 16 s. Similarly, we
also tried the stochastic gradient descent optimizer and binary cross-entropy. The
results are discussed in the Result Analysis section.

4.2 Autoencoder Based Approach

An autoencoder is a neural network that has three layers: an input layer, a hidden
(encoding) layer, and a decoding layer which has the same number of outputs as
inputs. The network is trained to reconstruct its inputs, which forces the hidden layer
to try to learn good representations of the inputs. An autoencoder is an unsupervised
machine learning algorithm that applies back-propagation. Internally, it has one or
more hidden layers that describe a code used to represent the input. Autoencoders
belong to the neural network family, but they are also closely related to PCA (principal
component analysis). Some key facts about the autoencoder: it is an unsupervised ML
algorithm like PCA, it minimizes the same objective function as PCA, and it is a neural
network. Although the autoencoder is quite similar to PCA, it is much more flexible:
autoencoders can represent both linear and non-linear transformations in the encoding,
whereas PCA can only perform a linear transformation. Autoencoders can also be
layered to form a deep learning network owing to their network representation.
Leveraging these properties of autoencoders, we constructed an architecture where,
instead of taking pre-processed data, we feed raw input data to the autoencoder and
then use classification methods. This method is better than the neural network alone,
as it does not need domain expertise for calculating the sensitivity of the information.
We used unlabelled raw data for autoencoding in order to get better autoencoded
inputs for classification. As shown in Figs. 3 and 4, we used a denoising autoencoder.
The logic of the denoising autoencoder is to be able to reconstruct data from a
corrupted input. Here we train the autoencoder by stochastically disrupting the input
data and

Fig. 3 Denoising autoencoder



Set learning_rate, # iterations, dropout, batch_size, validation_size, hidden-layer architecture
Read data from input file
Input = elements of columns 1 to N-1 (column N is the output)
Split input into training and validation sets
Initialize weights W
Initialize biases b
(a) h[0] = Input * W[0] + b[0]
    h[0] = ReLU(h[0])
(b) Repeat step (a) for all nodes in this layer
(c) Apply dropout to h
Repeat steps (a), (b), (c) for all layers
Perform Adam optimization

Fig. 4 Pseudo-code of the autoencoder and random forest

then sending them to the neural network. The method used here is to randomly remove
a few inputs; approximately 30% noise is added before sending the data to the neural
network. For training the autoencoder to denoise data, a mapping x → x̃ is performed,
which corrupts the input data. The autoencoder then uses x̃ as its input, but the main
difference is that the loss function, which is the mean squared error in this case, is
computed against the original input x rather than against the corrupted input x̃, i.e.
as L(x, x̂) instead of L(x̃, x̂), where x̂ is the reconstruction.
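A minimal Keras sketch of this denoising scheme follows; the 19-node input and the
9/3/9 hidden architecture mirror the configurations listed in Table 3, while the noise
mechanism (randomly zeroing about 30% of the entries, per the text) and the synthetic
data are illustrative assumptions.

    import numpy as np
    from tensorflow.keras import layers, models

    x = np.random.rand(1000, 19)                   # assumed 19 raw input features
    mask = np.random.rand(*x.shape) < 0.3          # corrupt ~30% of the entries
    x_noisy = np.where(mask, 0.0, x)               # randomly removed inputs

    # Reconstruct the clean x from the corrupted input, so the mean squared
    # error is computed against x rather than against x_noisy
    autoencoder = models.Sequential([
        layers.Dense(9, activation='relu', input_shape=(19,)),
        layers.Dense(3, activation='relu'),        # bottleneck encoding
        layers.Dense(9, activation='relu'),
        layers.Dense(19, activation='linear'),
    ])
    autoencoder.compile(optimizer='adam', loss='mse')
    autoencoder.fit(x_noisy, x, epochs=20, batch_size=128, verbose=0)

    # The encoder part yields the autoencoded features used for classification
    encoder = models.Model(autoencoder.inputs, autoencoder.layers[1].output)
    features = encoder.predict(x, verbose=0)       # shape (1000, 3)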
The main reason for using a denoising autoencoder was to let the autoencoder try to
predict missing data. After getting the autoencoded inputs we merged them with the
labels. At this stage we had ensured good quality training data in the form of
autoencoded data. Next, a classifier is needed, and there are many options. Since the
training data was quite large and varied, we chose an ensemble classification method.
Random forest is one of the best ensemble methods: the input data is divided into
many different subsets, each of which trains its own decision tree model, and a
weighted sum is then taken to finally classify the data. We implemented this
methodology for various combinations of parameters and hyperparameters, as shown
in Table 3. The results are discussed in the next section.
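A short scikit-learn sketch of this classification stage follows; the Gini criterion and
min_samples_split of 2 match Table 5, while the stand-in features and labels (which
would in practice be the autoencoded inputs merged with the grant/deny labels) are
illustrative assumptions.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    features = np.random.rand(1000, 3)             # stand-in autoencoded features
    labels = np.random.randint(0, 2, 1000)         # grant (1) / deny (0)

    # 70/30 split as used in the result analysis
    x_tr, x_te, y_tr, y_te = train_test_split(features, labels, test_size=0.3)

    # Gini criterion and min_samples_split = 2 as reported in Table 5
    rf = RandomForestClassifier(criterion='gini', min_samples_split=2)
    rf.fit(x_tr, y_tr)
    print(accuracy_score(y_te, rf.predict(x_te)))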

Table 3 Various combinations of autoencoders and random forest tested


Case Hidden layers Batch size Iterations Optimizer
1 1 layer, 19 nodes 1 2000 Adam
2 1 layer, 19 nodes 1024 2000 Adam
3 1 layer, 19 nodes 1024 20,000 Adam
4 1 layer, 19 nodes 128 2000 Adam
5 3 layers, 9/3/9 nodes 128 2000 Adam
6 3 layers, 9/3/9 nodes 1 2000 Adam
7 3 layers, 9/3/9 nodes 1024 2000 Adam

5 Result Analysis

Our input data file contains more than 8 lakh (800,000) records, approximately 147 MB
of data. For all approaches, 70% of the data is used for training and 30% for testing.
The possible cases for each field are:
p_id → doctor accessing data of a patient assigned to him, patient not assigned to him
d_id → normal
location_of_access → own PC, hospital PC, outside PC
time_of_access → hospital hours, just before/after hospital hours, other
specialization → normal
data_requested → relevant, slightly irrelevant, irrelevant
emergency → yes, no
access_granted → yes, no
The first approach used the neural network. Four cases were discussed in the previous
subsection; Table 4 shows the difference between the training and validation error for
each case.
The performance of the system is good when the difference between the training and
validation errors is minimal. For cases 1 and 2 the difference between validation and
training error is the smallest (refer to Table 4), and when we checked the accuracy,
cases 1 and 2 also had the maximum values. The accuracies of cases 1, 2 and 3 are
above 80%, as shown in Fig. 5. Case 4 performs less accurately than the others; hence
we can conclude that the SGD optimizer is not the better optimizer for this dataset.
Our second approach used an autoencoder for preprocessing: raw data was fed to the
autoencoder, and the autoencoded data was then sent to a random forest for
classification. The main idea was to see how well the autoencoder performs feature
selection on raw data, which removes the time spent calculating the sensitivity_score.
Cases 1 and 4 perform far ahead of the rest, giving more than 90% accuracy as shown
in Fig. 6. Seeing the results, we can say that a smaller batch size can give better results.
Given the better performance with autoencoders and random forest, we also examined
the accuracy with just the ensemble model, i.e. random forest, shown in Table 5. The
inputs given to the random forest cannot be raw, as there is no provision for feature
selection in random forest, so we gave pre-processed input to the random forest and
got remarkable results: the highest accuracy, as shown in Fig. 7.

Table 4 Training and validation error in the neural network

Case | Hidden layers   | Training error | Val error
1    | 2, 3 nodes each | 0.0929         | 0.0932
2    | 2, 3 nodes each | 0.0932         | 0.0935
3    | 2, 3 nodes each | 0.2987         | 0.3
4    | 2, 3 nodes each | 0.1615         | 0.1607

Fig. 5 Output of various cases of neural network

Fig. 6 Accuracy for various cases of autoencoder and random forest (bar values: 91.14%, 90.69%, 89.09%, 88.77%, 74.89%, 74.87%, 75.84%)

Table 5 Accuracy of random forest

Model | Criterion | No. of features | Min. samples split | Accuracy
RF    | Gini      | 10              | 2                  | 93.35%

Fig. 7 Accuracy of all three different approaches (Neural Network 88.19%, Autoencoder + Random Forest 91.14%, Random Forest 93.35%)

Table 6 False positive rate of all the three approaches

Neural network | Autoencoder + random forest | Random forest
20.68%         | 19.25%                      | 18.16%

Apart from accuracy, the false negative rate is also of great importance. The whole
idea of developing this RAdAC is to sense the genuineness of the requester and allow
access, so we need to see from our testing results how many false negatives we have.
As shown in Table 6, random forest gives the best value for the false positive rate.

6 Conclusion and Future Scope

We have developed a risk-adaptive access control model for a hospital management
system. The dataset did not have any attribute reflecting the sensitivity of information,
so we preprocessed the data and added a sensitivity score, which is the key enabler in
deciding the genuineness of the requester. In RAdAC, accuracy and false positive rate
are both of great importance. We first tried a neural network and got fairly accurate
results. Then we decided to give raw data to the system and let the system do the
feature selection; for this we used a denoising autoencoder for feature selection and a
random forest for classification, and the results were much better than with the neural
network. We wanted to understand whether this was because of the autoencoder or
the random forest, so we also executed the random forest alone, which proved to be
the best. However, we could not use raw input for the random forest, as it is not
capable of performing feature selection. In conclusion, we would like to say that
risk-adaptive access control depends a lot on the kinds of parameters we use: if no
domain expertise is available, autoencoders can be used for feature selection;
otherwise, add the sensitivity of information to the input dataset and use a random
forest for best results. In the future we would like to test the same concepts on defense
and airline data.

References

1. Shermin, M.: An access control model for NO SQL databases. The University of Western
Ontario (2013)
2. Colombo, P., Ferrari, E.: Fine grained access control within NO SQL document oriented data
stores. Data Sci. Eng. 1(3), 127–138 (2016)
3. Srivastava, K., Shah, P., Shah, K., Shekokar, N.: Int. J. Adv. Res. Comput. Sci. Softw. Eng.
7(5), 518–522 (2017)
4. Farroha, B., Farroha, D.: Challenges of “Operationalizing” dynamic system access control:
transition from ABAC to RAdAC. In: Systems Conference SysCon 2012 IEEE, pp. 1–7 (2012)

5. Fugini, M., Hadjichristofi, G., Teimourijia, M.: Dynamic security modelling in risk manage-
ment using environmental knowledge. In: 23rd international WETICE conference, pp. 429–434
(2014)
6. Athinaiou, M.: Cyber security risk management for health based critical infrastructures. In:
11th International Conference on Research Challenges in Information Science, RCIS 2017,
pp. 402–407 (2017)
7. Yang, Y., Liu, S.: Research on the qualification method of the operational need based on access
purpose and exponential smoothing. In: IEEE 7th Joint International Information Technology
and Artificial Intelligence Conference, pp. 516–522 (2014)
8. Montrieux, L., Hu, Z.: Towards attribute based authorisation for bidirectional programming.
In: 20th ACM Symposium on Access Control Models and Technologies SACMAT 2015,
pp. 185–196 (2015)
9. Rasheed, A.A.: A trusted computing architecture for health care. In: International Conference
on Information Networking, ICOIN 2017, pp. 46–50 (2017)
10. Bijon, K.Z., Krishnan, R., Sandhu, R.: A framework for risk—aware role based access control.
In: 6th Symposium on Security Analytics and Automation, pp. 462–469 (2013)
11. Wang, Q., Jin, H.: Quantified risk-adaptive access control for patient privacy protection in
health information systems. In: ASIACCS 6th ACM Symposium on Information, Computer
and Communications Security, pp. 406–410 (2011)
12. Pham, L.H., Albanese, M., Venkatesan, S.: A quantitative risk assessment framework for
adaptive intrusion detection in cloud. In: Security and Privacy SPC 2016, pp. 489–497 (2016)
13. Lu, Z., Sagduyu, Y.: Risk assessment based access control with text and behavior analysis
for document management. In: IEEE Military Communications Conference MILCOM 2016,
pp. 37–42 (2016)
14. Toth Laufer, E., Varkonyi-Koczy, A.R.: Personal statistics-based heart rate evaluation model.
IEEE Trans. Instrum. Measur., 64(8), 2127–2135 (2015)
15. AI-Zewairi, M., Suleiman, D., Shaout, A.: Multilevel fuzzy inference system for risk adaptive
hybrid RFID access control system. In: Cybersecurity and Cyberforensics Conference (CCC),
pp 1–7 (2016)
16. Marin, P.A.R., Herran, A.O., Mendez, N.D.D.: Rule- based system to educative personalized
strategy recommendation according to the CHAEA test. In: XI Latin America Conference on
Learning Objectives and Technology, pp. 1–7 (2016)
17. Sun, J., Ye, Y., Chang, L., Jiang, J., Ji, X.: Sleep monitoring approach based on belief rule- based
systems with pulse oxygen saturation and heart rate. In: 29th Chinese Control and Decision
Conference, pp. 1335–1340 (2014)
18. Srivastava, K., Aher, P., Shekokar, N.: Fuzzy inference to rule-based risk calculation for risk
adaptive access control. INDIACom 2018 (in press)
19. Wang, Y., Singh, M.P.: Evidence based trust a mathematical model geared for multiagent
system. ACM Trans. Auton. Adapt. Syst. 5(3), 1–15 (2010)
20. Ponni, J., Shunmuganathan, K.L.: Multi-agent system for data classification from data mining
using SVM. In: 5th Green Computing Communication Conservation Energy IEEE ICGCE,
pp. 828–832 (2013)
21. Yuce, B., Rezgui, Y.: An ANN-GA semantic rule based system to reduce the gap between
predicted and actual energy consumption in buildings. IEEE Trans Autom. Sci. Eng. 14(3),
1351–1363 (2017)
22. Setiono, R., Baesens, B., Mues, C.: Recursive neural network rule extraction for data with
mixed attributes. IEEE Trans. Neural Netw. 19(2), 299–307 (2008)
23. Li, L., Yu, Y., Bai, S., Hou, Y., Chen, X.: An effective two step intrusion detection approach
based on binary classification and kNN. IEEE Access 6, 12060–12073 (2018)
24. Lee, C.H., Su, Y.Y., Lin, Y.-C., Lee, S.-J.: Machine learning based network intrusion detection.
In: 2nd IEEE International Conference on Computational Intelligence and Application ICCIA
2017, pp. 79–83 (2017)
25. Kumar, G.R., Mangathayaru, N., Narasimha, G., Reddy, G.S.: Evolutionary approach for
Intrusion detection. In: International conference on engineering and MIS (ICEMIS), pp 1–6
(2017)

26. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015) [Online].
Available: http://www.Nature.com/doifiner; https://doi.org/10.1038/nature14539
27. Goodfellow, I., Bengio, Y., Courville, A.: Deep learning (2016) [Online]. Available http://www.
deeplearningbook.org
28. Chawdhury, M.M.U., Hammond, F., Konowiz, G., Xin, C., Li, H.W.: A few shot deep learning
approach for improved intrusion detection. In: 8th Annual Ubiquitous Computing Electronic
and Mobile Communication Conference (UEMCON), pp. 456–462 (2017)
29. Shone, N., Ngoc, T.N., Phai, V.D., Shi, Q.: A deep learning approach to network intrusion
detection. IEEE Trans. Emerg. Topic Comput. Intell. 2(1), 41–50 (2018)
30. Farahnakian, F., Heikkonen, J.: A deep autoencoder based approach for intrusion detection
system. In: 20th International conference on Advance Communication technology ICACT,
pp. 178–183 (2018)
31. Rushin, G., Stancil, C., Sun, M., Adams, S., Beling, P.: Horse race analysis in credit card
fraud: deep learning, logistic regression and gradient boosted tree. In: IEEE SIEDS,
pp. 117–121 (2017)
32. Paula, E.L., Laderia, M., Caravalho, R.N., Marzagao, T.: Deep learning anomaly detection
as support fraud investigation in Brazilian Exports and Anti–Money laundering. In: 15th
International Conference on Machine Learning and Applications, pp. 954–960 (2016)
An Approach to End to End Anonymity

Ayush Gupta, Ravinder Verma, Mrigendra Shishodia and Vijay Chaurasiya

Abstract We introduce a novel methodology which is resistant to censorship via deep
packet inspection by censors or ISPs and fortified against malicious exit nodes using
VPN technology. To achieve this, we conducted a survey to collect data about the
preferences users seek regarding TOR privacy, security and performance. Based on
that, we categorize the threat models and the respective security models. Using
simulation, we validate these security models, and we believe that as high-speed
anonymity networks become readily available, the aforementioned security models
based on the threat models will prove significant and compelling.

Keywords Anonymous network · VPN · Obfsproxy · TOR architecture

1 Introduction

The low-latency anonymity system TOR provides a high degree of anonymity to users
by routing their communication through various relays deployed globally by TOR
volunteers. It hides the actual source or destination address from other communicating
entities. The TOR network is a highly usable anonymity network, consisting of
approximately 3000 nodes and 500,000 users [1–3]; it has been growing day by day
and is mostly used by social activists, communists, researchers and hackers [4]. But it
too has some limitations, such as traffic analysis, whether by a global adversary who
has the power to analyze all traffic or by other means. The design of TOR is also
somewhat responsible for inviting challenges and attacks: the relays do not modify the
inter-packet delay, in order to achieve low end-to-end latency, which leaves the TOR
network susceptible to traffic pattern observation attacks [5, 6].
In this paper we present a security solution for these threat models, with the belief that
coming technology with high computation power and high-speed anonymity networks
will not degrade or limit the performance and security capacity of TOR as they stand
today. In the first security model we focus on evasion of deep packet inspection

A. Gupta · R. Verma (B) · M. Shishodia · V. Chaurasiya


MS (Cyber Laws and Information Security) Division, Indian Institute of Information Technology,
Allahabad, UP 211015, India


by censors that analyze the traffic between the client and the TOR bridge. To curb this
attack, we forward the TOR traffic through a cloud, which itself provides a tunnel for
the communication traffic. CloudTransport is one solution which enables this and
ensures the security and privacy of the user's traffic [7].
The other threat model focuses on malicious exit nodes, which could be controlled by
an adversary or some other illegitimate party. The problem is that after the traffic has
floated through various network routers and relays, it is finally decrypted at the exit
node, and from the exit node onwards it remains unencrypted up to the server. So any
malicious party or adversary can analyze the traffic and extract information about the
user. It is not possible to de-anonymize every user and relay in a circuit, but there are
methods, such as the identification of clients, relays and hidden servers using
bandwidth estimation described by Sambuddho Chakravarty [8], which was a
remarkable approach in this field. So, to eliminate the possibility of identification at
the exit node, we deploy a VPN: after exiting from TOR, the traffic remains encrypted
inside the VPN tunnel and is decrypted later at an endpoint near the server.

2 Related Work

The majority of the work done to secure traffic from censorship uses pluggable
transports in conjunction with TOR, which provide the facility to randomize the traffic
pattern or, say, disguise the network traffic signature regularly. StegoTorus [9] is one
circumvention tool which disguises TOR from protocol analysis and provides two
additional layers of obfuscation to TOR traffic. The authors of that paper demonstrate
chopping, which converts an ordered sequence of fixed-length cells into variable-length
blocks that are delivered unordered, while steganography disguises each block as a
message in an innocuous cover protocol, such as an unencrypted HTTP request or
response. Short-lived connections per packet have been used to foil analysis at the
transport layer, but with active probing by probers at different points in the network it
can be easily detected, and it is also susceptible to MITM attacks.
There are other sophisticated circumvention tools: Obfsproxy [10], which applies an
additional stream cipher, has a higher-order tradeoff between inter-packet arrival times
and packet sizes. Meek [11] is another new pluggable transport [12] which uses the
Google App Engine as a gateway to relay TOR traffic. It binds the TOR transport with
an HTTP header which is further concealed within a TLS session for obfuscation, but
it is vulnerable to rogue certificates.
Even with the use of such sophisticated tools, which help circumvent internet
censorship for anonymity, we cannot elude the power of a strong adversary that can
analyze the traffic [13]. Initially, bridges using special plugins called pluggable
transports helped users evade their ISP detecting that they were using TOR; now
censors have found ways to block TOR even when clients are using TOR bridges, and
they do this by installing boxes in ISPs that watch over the traffic and block TOR
traffic when discovered. In a paper given by Roger Dingledine and Nick
An Approach to End to End Anonymity 147

Mathewson, 2006 [14] they presented the anti-censorship design techniques where
one of it described “the deployment of bridges inside the blocked network too” but
later deep packet inspection by the governments and law agencies proves fatal to the
need of security and specially privacy to the user.
The main problem lies in network-level filtering, which will persist until some irresistible methodology is adopted: TOR network traffic is disjoint from normal traffic, so censors can easily infer that clients are using bridges to circumvent blocking, which in turn can lead to identification of the network locations of clients, relays and hidden servers through various attacking techniques.
Our approach is similar to the "Using Cloud Storage for Censorship-Resistant Networking" methodology proposed by Brubaker [7], which describes a censorship-resistant communication system that hides a user's network traffic by tunneling TOR traffic through a cloud storage service such as Amazon S3. The censorship threat model we present, derived from our survey on latency, follows the same approach. We, however, aim to introduce some changes to the current TOR design: fixing the inter-packet delay by using UDP in the TOR circuit with short-lived connections per packet, and running TOR as a bridge in the cloud, which improves the speed and safety of the TOR network by donating bandwidth.
The malicious exit node, on the other hand, has been a frustrating problem so far. Chakravarty explains the detection of traffic snooping in TOR using decoys [15], demonstrating the injection of traffic that exposes bait credentials for decoy services such as IMAP and SMTP and tracing it back to the TOR exit node on which it was exposed. In another paper he demonstrates the effectiveness of traffic analysis using Cisco's NetFlow records [16], with which adversaries could mount large-scale traffic analysis at exit or entry nodes; the success rate of identifying the source of anonymous traffic was around 81.6%. Results using bandwidth estimation [8] show that real-world TOR relays could be exposed with a true-positive rate of 49.3%. The main aim there was to induce fluctuations in the victim's connection at different network points using colluding servers and to measure the bandwidth fluctuation, and hence the network path, with the process coordinated by adversaries acting as probers at various network hops.
We conclude from the survey that many users seek high privacy regardless of high network latency. Therefore, by configuring some changes in the TOR design and using a VPN from the exit node onwards, we remove the fear of identification at the exit node and of the various attacks on it. Even if a less aware client uses an innocuous protocol such as HTTP, traffic leaving the TOR exit node will still be tunneled by the VPN, securing the perimeter at a malicious exit node. During the experiments and the simulation we observed that governments and law agencies can still inspect the traffic between the VPN server and the hosted server, but this can be eliminated by using a paid, reliable VPN located outside the surveillance of the country or state where the user resides. By paying with Bitcoin, the user can protect his anonymity from revelation.

3 Threat Model

3.1 Depending upon Intermediate Latency and Intermediate Privacy

To define the problem set we conducted a survey collecting data on user activity over TOR. As the results in Fig. 2 show, most TOR users are unaware of pluggable transports and are therefore more vulnerable to censorship threats and deep packet inspection. Most use the default settings, which of course deprives them of one additional layer of obfuscation security. In the survey shown in Fig. 2, 50% of respondents demand intermediate latency and privacy, which leaves them somewhat vulnerable to attacks from censors and malicious exit nodes, as shown in Fig. 1.
Based on users' preferences, we developed the security model shown in Fig. 3 for this threat model, where censorship-resistant networking is the main concern. To evade censorship we tunnel our TOR traffic through a cloud storage service such as Amazon S3 [7, 17]. Our approach is similar to CloudTransport [18], an anti-censorship tool used as a pluggable transport for TOR, as a gateway to TOR, or as a standalone service. The CloudTransport rendezvous protocol works on the principle that there is no direct connection between the CloudTransport client and the CloudTransport bridge. Hence, where previously an ISP or other censor could easily identify the bridge and block the TOR traffic, here it cannot: taking such strong action against a giant cloud network such as Amazon, Rackspace or Google would require high capital outlays for analyzing traffic and building network filtering. In addition, attacks such as bandwidth estimation via trace-back mechanisms or denial of service would not be tolerated by the cloud provider, because thousands of other services run across these domains, and no country, however willing, would wish to perform an action that entails disruption to normal services and users.

Fig. 1 Censorship threat model



Fig. 2 Survey results on users' preferences for TOR (globally)

Fig. 3 Censorship resistant security model

We, however, use UDP instead of TCP, which improves the latency score, achieves high speed, and also helps remove the problem of inter-packet delay due to congestion control and end-to-end reliability checks. Our focus remains on Amazon, as it provides easy image configuration with fast instance start-up and is supported by TOR itself.
As Table 1 shows, the characteristics provided by running TOR as a bridge in the cloud fortify our gates against traffic analysis.

Table 1 Features of TOR cloud with pluggable transport [7, 18]

Property                              | Users' ISP | Cloud storage provider | TOR cloud bridge with pluggable transport
Network locations of TOR cloud users  | Hidden     | Known                  | Hidden
Destinations of TOR cloud traffic     | Hidden     | Hidden                 | Known (tunnel mode); Hidden (proxified mode)
Content of TOR cloud traffic          | Hidden     | Hidden                 | Known (tunnel mode); Hidden (proxified mode)

Performance
Under proxified-TOR mode, a CloudTransport-like pluggable transport running in the cloud, or acting as a bridge, enters the network after passing through the bridge and is thereafter subject to the same performance as TOR + Obfsproxy. Determining the performance of a realistic, large-scale deployment, where the storage site lies on a distant continent, remains future work.

3.2 Depending Upon High Latency and High Privacy

The same survey revealed another interesting figure: a demand for high privacy regardless of high latency. In Fig. 2, 41.02% of respondents favour this methodology, willing to compromise on time but not on their privacy; people such as social activists, communists and researchers are likely to use it.
In the threat model shown in Fig. 4, malicious adversaries at the exit node inspect all traffic and can exploit the information by any means. This gap in security between the exit node and the server is mitigated in the malicious exit node evasion security model presented below.

Fig. 4 Malicious exit node evasion threat model



Fig. 5 Malicious exit node evasion security model

We have therefore developed the security model shown in Fig. 5, which fulfils these user requirements. In the existing TOR architecture, when a user communicates with a server, the user's privacy and security are easily compromised if a malicious exit node is under an adversary's control. To mitigate this, we propose an architecture that provides end-to-end security and privacy to the user.
The proposed architecture uses the cloud, TOR with UDP, and a VPN. The cloud provides a tunnel up to the exit node, and all user traffic goes through that tunnel. The VPN provides a second tunnel from where the cloud tunnel ends, at the (possibly malicious) exit node, to the server. The whole of the user's traffic therefore remains encrypted within a tunnel. Even if the exit node is malicious or under an adversary's control, it would be very difficult, almost impossible, to break into the tunnel or the VPN server, because privacy policies prevent information from being handed over easily, especially if the user paid via Bitcoin.

Using a VPN increases latency, but it provides end-to-end security and ensures privacy. Users whose main concerns are secure communication and staying hidden, without worrying about the latency factor, can opt for this methodology.

Performance
During the research we observed that using UDP in TOR may degrade performance and speed, which users would not appreciate, as TOR is known as a low-latency network. Therefore, for users whose main concern is high privacy rather than latency, we were still able to achieve approximately the same latency by altering the architecture of the TOR circuit.
We deployed UDP + TLS between the initiator and the exit node, i.e. within the TOR circuit the connection runs over UDP with the transport layer security protocol, as presented by Reardon [19]. We note, however, that there are n nodes along the circuit, and the client has no way of knowing how many nodes are used per circuit. As the number of nodes increases, the packet delivery ratio also increases under TCP, but falls under UDP [20].

To eradicate this problem we applied two techniques: UDP + TLS, and many short-lived connections per packet. The first handles UDP's issues with the help of TLS; the second means that, to obtain higher throughput from UDP, we must limit the connections per packet, which also helps evade fingerprinting attacks [20]. We use a 1:1 circuit per TCP connection rather than TOR's n:1 [21]; here TCP frames are wrapped in DTLS encryption.
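As a rough illustration of the second technique, the following minimal sketch (our own; the relay address and the packets-per-connection limit are assumed values, and the DTLS wrapping is omitted) sends each run of cells over a fresh, short-lived UDP socket, so that the source-port 5-tuple observed by a prober changes every few packets:

    import socket

    # Assumed illustrative parameters, not taken from the TOR implementation.
    RELAY_ADDR = ("127.0.0.1", 9050)   # hypothetical next-hop relay
    PACKETS_PER_CONNECTION = 4         # keep each UDP "connection" short-lived

    def send_cells(cells):
        """Send cells over many short-lived UDP sockets (DTLS wrapping omitted)."""
        sock, sent = None, 0
        for cell in cells:
            if sock is None or sent >= PACKETS_PER_CONNECTION:
                if sock is not None:
                    sock.close()
                # A new socket gets a fresh ephemeral source port, so the
                # 5-tuple seen by an observer changes every few packets.
                sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
                sent = 0
            sock.sendto(cell, RELAY_ADDR)  # in practice a DTLS-encrypted payload
            sent += 1
        if sock is not None:
            sock.close()

    send_cells([b"cell-%d" % i for i in range(10)])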

4 Architecture

Most adversaries are under government control; if both the entry and exit nodes are under such control, the adversary can modify, add or delete traffic and de-anonymize the user, so the user's privacy and integrity are compromised. In general, the degree of anonymity can be measured by the size of the anonymity set, a metric proposed by Diaz et al. [22] for anonymous communication networks.
A user's anonymity depends on the size of the anonymity set: the larger the set, the greater the anonymity provided to users. The degree of anonymity can be calculated using entropy as a tool, as proposed by Diaz.
The entropy is given by Eq. (1):

H(X) = -\sum_{i=1}^{N} p_i \log_2(p_i)    (1)

where p_i is the probability mass function over senders. The negative sign neutralizes the negative value of the logarithm, since the logarithm of a number between 0 and 1 is negative.
The degree of anonymity d is defined by Eq. (2):

d = 1 - \frac{H_M - H(X)}{H_M} = \frac{H(X)}{H_M}    (2)

where H_M = \log_2 N is the maximum entropy. For one user, d is defined by Eq. (3):

d = \frac{H(X)}{H_M}    (3)

Entropy H(X) is a function of randomization [23–25]: the greater the randomization, the higher the entropy, and with it the degree of anonymity (d is directly proportional to H(X), and H(X) to the randomization). In TOR we use pluggable transports to randomize the traffic, so the degree of anonymity increases, and so does the privacy of the user.

Suppose there are only N = 5 users in the experiments we assumed, as shown in Tables 2 and 3.

Table 2 Without use of pluggable transport

Users X | Entropy value H(x) | Max entropy Hm | Degree of anonymity (d)
1st     | 0.2   | 0.698 | 0.2865
2nd     | 0.31  | 0.698 | 0.4441
3rd     | 0.42  | 0.698 | 0.6017
4th     | 0.49  | 0.698 | 0.7020
5th     | 0.51  | 0.698 | 0.7306

Table 3 With use of pluggable transport

Users X | Entropy value H(x) | Max entropy Hm | Degree of anonymity (d) | % of increase in d
1st     | 0.24  | 0.698 | 0.3438 | 20
2nd     | 0.38  | 0.698 | 0.5444 | 22.58
3rd     | 0.49  | 0.698 | 0.7020 | 16.66
4th     | 0.53  | 0.698 | 0.7593 | 8.16
5th     | 0.59  | 0.698 | 0.8452 | 15.68

With pluggable transport the entropy values rise, and hence the degree of anonymity also increases: our experiments show that as the entropy value grows, so does the degree of anonymity.
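A minimal sketch of the computation behind Eqs. (1)-(3) follows (our own illustration; the probability vector below is an assumed example, not the survey data behind Tables 2 and 3):

    import math

    def entropy(probs):
        """H(X) = -sum(p_i * log2(p_i)), Eq. (1); zero-probability terms contribute 0."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    def degree_of_anonymity(probs, n_users):
        """d = H(X) / H_M with H_M = log2(N), Eqs. (2)-(3)."""
        return entropy(probs) / math.log2(n_users)

    # Hypothetical sender distribution over N = 5 users as seen by an observer;
    # the closer it is to uniform (i.e. the more randomization), the higher d.
    probs = [0.4, 0.3, 0.15, 0.1, 0.05]
    print("H(X) = %.4f, d = %.4f" % (entropy(probs), degree_of_anonymity(probs, 5)))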
In our approach we used UDP in both security models, prompted by the survey that raised the alarm for the threat models (the original design prefers TCP in the TOR circuit). Since UDP alone provides neither reliability, in-order delivery nor congestion control, we deploy TLS above the transport layer to tunnel the TCP traffic between peers over a UDP-based TLS connection, which handles and eliminates the corresponding TCP issues. This technique was proposed by Reardon and Goldberg in 2009 [19], who demonstrated that using TCP over DTLS improves TOR's speed and architecture by ensuring that each circuit uses a unique TCP connection. The QUIC (Quick UDP Internet Connections) protocol could be an appropriate approach for future work, as it provides low latency with 0-RTT [26].

5 Conclusion

We proposed a new working methodology over TOR, using simulation, for two threat models and their respective security models, addressing the security and privacy concerns as far as possible. The first model, which mainly concerns intermediate latency and privacy, presents a censorship-resistant networking security model running in the cloud, satisfying the 50% of respondents in this category; the second, which exhibits high latency and high privacy, proposes evasion of malicious exit nodes through VPN technology, satisfying the 41.02% of respondents in this category.
Our approach not only secures users against censorship and adversaries; performance-wise we attained the standard criteria through the use of UDP with TLS and short-lived connections. Furthermore, our assumptions and experiments with the entropy formula provide significant results, showing that pluggable transports increase randomization and hence help make the network censorship-resistant. We believe that, as the coming era will not be limited by speed or computational power, our techniques will prove significant and increasingly useful.

References

1. Identify location, https://atlas.torproject.org/


2. TOR: overview, https://www.torproject.org/about/overview
3. Users of TOR, https://www.torproject.org/about/torusers.html.en
4. TOR sponsors, https://www.torproject.org/about/sponsors.html.en
5. http://www.theguardian.com/world/2013/oct/04/tor-attacks-nsa-users-online-anonymity
6. Raymond, J.-F.: Traffic analysis: protocols, attacks, design issues and open problems. In: Pro-
ceedings of the International Workshop on Design Issues in Anonymity and Un-observability,
2001, pp. 10–29
7. Brubaker, C., Houmansadr, A., Shmatikov, V.: CloudTransport: using cloud storage for
censorship-resistant networking. PETS (2014)
8. Chakravarty, S., Stavrou, A., Keromytis, A.D.: Traffic analysis against low-latency anonymity
networks using available bandwidth estimation. In: Proceedings of the 15th European Sym-
posium on Research in Computer Security (ESORICS), pp. 249–267. Athens, Greece, Sept
2010
9. Weinberg, Z., Wang, J., Yegneswaran, V., Briesemeister, L.: StegoTorus: a camouflage proxy
for the TOR anonymity system. In: Proceedings of the 19th ACM Conference on Computer
and Communications Security (2012)
10. Dingledine, R.: Obfsproxy: the next step in the censorship arms race. TOR project official blog
(2012) https://blog.torproject.org/blog/obfsproxy-next-step-censorship-arms-race
11. Field, D.: Meek: a simple HTTP transport. TOR Wiki (2014)
12. TOR bridges usage, https://www.torproject.org/docs/bridges#PluggableTransports
13. Winter, P., Lindskog, S.: How the great firewall of china is blocking TOR. In: FOCI, 2012
14. Dingledine, R., Mathewson, N.: Design of a blocking-resistant anonymity system, https://svn.
torproject.org/svn/projects/design-paper/blocking.html
15. Chakravarty, S., Portokalidis, G., Polychronakis, M., Keromytis, A.D.: Detecting traffic snoop-
ing in tor using decoys. In: International symposium on recent advances in intrusion detection
(RAID), pp. 222–241. Menlo Park, CA, Sept 2011
16. Chakravarty, S., Barbera, M.V., Portokalidis, G., Polychronakis, M., Keromytis, A.D.: On
the effectiveness of traffic analysis against anonymity networks using flow records. In: The
Proceedings of the 15th Passive and Active Measurement Conference (PAM). Los Angeles,
CA, Mar 2014
17. Run TOR as a bridge in the Amazon cloud, https://blog.torproject.org/blog/run-tor-bridge-
amazon-cloud

18. TOR bridges in the cloud Amazon, https://cloud.torproject.org/


19. Reardon, J., Goldberg, I.: Improving tor using a TCP-over-DTLS tunnel. In: Proceedings of
18th USENIX Security Symposium 2009 (USENIX Security), Aug 2009
20. Meenakshi, M.: Impact of network size and link bandwidth in wired TCP & UDP network
topologies. Int. J. Eng. Res. Gen. Sci. 2(5) (2014)
21. Murdoch, S.J.: Comparison of TOR datagram designs. Technical report, 7 Nov 2011
22. Diaz, C.: Anonymity metrics revisited. Dagstuhl Seminar on Anonymous Communication and its Applications, Oct 2005
23. Serjantov, A., Danezis, G.: Towards information theoretic metric for anonymity. In: Proceedings
of the 2nd International Conference on Privacy Enhancing Technologies (PET’02). Springer,
Berlin, Heidelberg (2002)
24. Al Sabah, M.: Performance and security improvements for TOR: a survey. Qatar University
and Qatar Computing Research Institute Ian Goldberg, University of Waterloo
25. Ellis, R.S.: Entropy as a measure of randomness, http://people.math.umass.edu/~rsellis/pdf-
files/entropy-randomness-2000.pdf
26. http://blog.chromium.org/2013/06/experimenting-with-quic.html
PHT and KELM Based Face Recognition

Sahil Dalal and Virendra P. Vishwakarma

Abstract Recognition of human face images has attracted much attention in pattern recognition over the last few decades, and artificial intelligence and machine learning continually strive for greater accuracy in recognizing face images. Pixel-based information alone can be helpful in recognizing face images, but the recognition rate increases when features of the face image are added to its pixel information. Based on this observation, the polar harmonic transform is used as the feature extraction technique providing the feature-based information, and the kernel extreme learning machine (KELM) is used as the classification tool. The results obtained on the ORL, YALE and Georgia Tech face databases show that more accurate results can be obtained using the feature-based information.

Keywords Face recognition · Feature extraction · Polar harmonic transform · Kernel extreme learning machine

1 Introduction

Face recognition is booming in the fields of biometrics, intelligence, the military, face recognition-based attendance systems, and so on. Face recognition has already been performed using various classification techniques and many feature extraction methods, including artificial neural networks [1], support vector machines [2] and extreme learning machines [3].
Using the images directly for classification can give good face recognition, but accuracy is increased by combining the classification techniques with feature extraction [4, 5]. Many feature extraction methods have already been used for face images, including fractional DCT-DWT [6],

DCT [7], KPCA [2], 2DPCA [8], etc. An advantage of extracting features from an image is that they add extra information to the pixel information already present in the face image, so the combination of pixel-based and feature-based information yields more accurate face recognition results. Beyond the feature extraction methods mentioned above, one more method can be used: the polar harmonic transform (PHT). It was used for image watermarking by Li et al. [9] but has never been used as a feature extraction technique for face recognition. PHT is used here for feature extraction because it can be very helpful with pose and expression variations in the face images.
The remaining sections of the paper are arranged as follows: Sect. 2 presents the basic concepts used in the proposed approach; Sect. 3 explains the proposed robust feature extraction method for classifying face images in detail; Sect. 4 gives the experimental results, followed by the conclusion in Sect. 5.

2 Preliminaries

2.1 PHT

Polar harmonic transform (PHT) comprises three transforms: the polar complex exponential transform (PCET), the polar sine transform (PST) and the polar cosine transform (PCT). All three share the same mathematical form, differing only in the radial part of the kernel function. Let g(r, \theta) be a continuous image function defined on the unit disk X = \{(r, \theta) : 0 \le r \le 1,\ 0 \le \theta \le 2\pi\}. The PHTs of order m and repetition n are defined by

L_{mn} = \delta \int_0^{2\pi} \int_0^1 g(r, \theta)\, U^{*}_{mn}(r, \theta)\, r\, dr\, d\theta    (1)

where m, n = 0, \pm 1, \pm 2, \ldots. The kernel function U^{*}_{mn}(r, \theta) is the complex conjugate of the function U_{mn}(r, \theta), determined by

U_{mn}(r, \theta) = R_m(r)\, e^{jn\theta}    (2)

with j = \sqrt{-1}. The radial part of the kernel function and the parameter \delta are expressed as

PCET:  R_m(r) = e^{j 2\pi m r^2}, \quad \delta = \frac{1}{\pi}    (3)

PCT and PST:  R_m(r) = \cos(\pi m r^2) \text{ for PCT}, \quad R_m(r) = \sin(\pi m r^2) \text{ for PST}    (4)

\delta = \begin{cases} \frac{1}{\pi}, & m = 0 \\ \frac{2}{\pi}, & m \neq 0 \end{cases}    (5)

As the kernel functions are orthogonal, the image function g(r, \theta) can be reconstructed:

\hat{g}(r, \theta) = \sum_{m=-m_{max}}^{m_{max}} \sum_{n=-n_{max}}^{n_{max}} L_{mn}\, U_{mn}(r, \theta)    (6)

where m_{max} and n_{max} are the maximum values of m and n respectively. Increasing the number of transform coefficients brings \hat{g}(r, \theta) closer and closer to g(r, \theta). Let L^{\beta}_{mn} be the transform coefficient of the image rotated about its centre by an angle \beta, which gives g(r, \theta + \beta). It can be represented as

L^{\beta}_{mn} = L_{mn}\, e^{-jn\beta}    (7)

This gives a very important property of PHT, rotation invariance, since |L^{\beta}_{mn}| = |L_{mn}|. PHT has many advantages over other orthogonal moments (such as Zernike and pseudo-Zernike moments) [10], including lower computational complexity, lower sensitivity to noise and better image reconstruction ability. PHT is therefore used as the feature extraction technique for the face images, which is further used in KELM for the classification, briefly explained in the next sub-section [11].
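To make the definitions concrete, the following numerical sketch (our own, using numpy; the grid mapping and the chosen order (m, n) are arbitrary illustrative choices) approximates the coefficient L_mn of Eq. (1) for a square image mapped onto the unit disk. Since r dr dtheta in polar coordinates equals dx dy on the disk, the double integral reduces to a pixel sum:

    import numpy as np

    def pht_coefficient(img, m, n, transform="PCT"):
        """Approximate L_mn of Eq. (1) for a square image mapped onto the unit disk."""
        h, w = img.shape
        y, x = np.mgrid[0:h, 0:w]
        # Map pixel centres to the unit disk centred on the image.
        xc = (2 * x - w + 1) / w
        yc = (2 * y - h + 1) / h
        r = np.sqrt(xc**2 + yc**2)
        theta = np.arctan2(yc, xc)
        inside = r <= 1.0
        if transform == "PCET":
            radial = np.exp(1j * 2 * np.pi * m * r**2)   # Eq. (3)
            delta = 1 / np.pi
        elif transform == "PCT":
            radial = np.cos(np.pi * m * r**2)            # Eq. (4)
            delta = (1 if m == 0 else 2) / np.pi         # Eq. (5)
        else:  # PST
            radial = np.sin(np.pi * m * r**2)            # Eq. (4)
            delta = (1 if m == 0 else 2) / np.pi         # Eq. (5)
        kernel_conj = np.conj(radial * np.exp(1j * n * theta))   # U*_mn, Eq. (2)
        area = 4.0 / (h * w)   # pixel area on the [-1, 1]^2 grid
        return delta * np.sum(img[inside] * kernel_conj[inside]) * area

    rng = np.random.default_rng(0)
    img = rng.random((64, 64))
    # By Eq. (7), |L_mn| is (approximately, up to sampling error) rotation-invariant.
    print(abs(pht_coefficient(img, m=2, n=3, transform="PCT")))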

2.2 KELM

Extreme learning machine (ELM) is a simple feed-forward neural network with a single hidden layer. ELM assigns the hidden-layer weights at random, so its results vary from one execution to the next. In KELM, the kernel matrix is related only to the input and the training samples; it has no dependence on the output or target. Here \psi is the hidden-layer activation function and \sigma the hidden-layer output matrix; the network output K is computed as follows:

K = \sum_{s=1}^{S} \lambda_s\, \psi(A_s, B_s, x) = \lambda \cdot \sigma(x)    (8)

where \sigma(x) = [\psi(A_1, B_1, x), \ldots, \psi(A_S, B_S, x)] is the output vector of the hidden layer with respect to the input x, S is the number of neurons in the hidden layer, and A_s and B_s are the input weights and biases of the s-th neuron in the hidden layer. The output weight \lambda, which connects the output to the hidden nodes, is obtained analytically as:

\lambda = \sigma^{T} \left( I/\varsigma + \sigma \sigma^{T} \right)^{-1} G    (9)

where G is the target class and \varsigma is a user-defined regularization parameter. The ELM model can therefore be formulated as:

f_{ELM}(x) = \sigma(x)\, \sigma^{T} \left( I/\varsigma + \sigma \sigma^{T} \right)^{-1} G    (10)

ELM minimizes the training errors and also reduces the computation time. Alongside these advantages it has some disadvantages: it suffers from the local-minima issue and is prone to over-fitting. To overcome these limitations, another formulation is proposed in terms of a kernel matrix [12–14], which can be obtained using (11):

\zeta = \sigma \sigma^{T}, \quad \zeta_{s,t} = \sigma(x_s) \cdot \sigma(x_t) = K(x_s, x_t)    (11)

where x_s denotes the training data and S is the number of training samples used. KELM can then be formulated as:

f_{KELM}(x) = [K(x, x_1), \ldots, K(x, x_S)] \left( I/\varsigma + \zeta \right)^{-1} G    (12)

This approach is named the kernel extreme learning machine (KELM) and can be used for binary as well as multi-class classification [15]. KELM is used for the face recognition in the proposed approach; it has two parameters, the regularization coefficient and the kernel parameter, which need to be set for the experimentation.
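A minimal sketch of KELM in the kernel form of Eqs. (11)-(12) follows (our own illustration with numpy and an RBF kernel; the regularization value, kernel width and synthetic data are assumptions, not the paper's settings):

    import numpy as np

    def rbf_kernel(A, B, gamma):
        """K(a, b) = exp(-gamma * ||a - b||^2) for all pairs of rows of A and B."""
        sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * sq)

    class KELM:
        def __init__(self, c=50.0, gamma=0.01):
            self.c, self.gamma = c, gamma          # regularization and kernel parameter

        def fit(self, X, y):
            self.X = X
            T = np.eye(y.max() + 1)[y]             # one-hot targets G
            omega = rbf_kernel(X, X, self.gamma)   # kernel matrix zeta, Eq. (11)
            # alpha = (I/c + zeta)^{-1} G, the kernel-space analogue of Eq. (9)
            self.alpha = np.linalg.solve(np.eye(len(X)) / self.c + omega, T)
            return self

        def predict(self, X):
            # f(x) = [K(x, x_1), ..., K(x, x_S)] alpha, Eq. (12)
            return np.argmax(rbf_kernel(X, self.X, self.gamma) @ self.alpha, axis=1)

    # Tiny synthetic two-class demo.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(3, 1, (20, 5))])
    y = np.array([0] * 20 + [1] * 20)
    print((KELM().fit(X, y).predict(X) == y).mean())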

3 Proposed Method

The proposed approach provides a novel, robust and accurate method for face recognition based on feature extraction from the face images. It is represented by the block diagram shown in Fig. 1.
In the proposed approach, face images are recognized with the help of their pixel-based information and some features. Pixel-based information alone is also helpful for face recognition, but with feature extraction the recognition becomes more accurate. As shown in the block diagram, features are extracted from the face images; these features are the polar harmonic transform information of the face images.

Fig. 1 Block diagram of the proposed approach: face images → PHT-based features → combined features (pixel-based + PHT-based) → KELM

Say an image is of size u × v. Then the feature matrix obtained by applying PHT is also u × v. These features are termed PHT-based features, or the feature-based information of the face images. They are robust to pose and expression variations because PHT is a rotation-invariant transform and hence extracts rotation-invariant features from the face images. The feature-based information matrix of size u × v is combined with the pixel-based information matrix of size u × v, making the combination a 2u × v matrix. After feature extraction, KELM is applied over these extracted features. Three parameters in KELM need to be adjusted for accurate classification: the regularization coefficient, the kernel parameter and the kernel type. No method exists in the literature to obtain optimal values for these parameters, so their values are tuned per database to achieve the minimum percentage error rate.
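The combination step itself reduces to stacking the two u × v matrices and flattening the result into one sample vector, as in this sketch (ours; the FFT magnitude below is only a stand-in for the PHT feature matrix, since the paper does not spell out how the u × v feature matrix is laid out):

    import numpy as np

    def combined_features(img, pht_feat):
        """Stack the u-by-v pixel matrix with the u-by-v feature matrix (2u-by-v),
        then flatten to one sample vector for KELM."""
        assert img.shape == pht_feat.shape
        return np.vstack([img, pht_feat]).ravel()

    # Hypothetical example: a 92x112 ORL-sized image and a same-sized feature map.
    img = np.random.default_rng(2).random((112, 92))
    pht_feat = np.abs(np.fft.fft2(img))  # stand-in for the PHT-based feature matrix
    x = combined_features(img, pht_feat)
    print(x.shape)  # (2*112*92,) = (20608,)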

4 Experimental Results and Analysis

The proposed method of feature extraction and classification is tested on three face databases, briefly described below.

4.1 Face Databases

4.1.1 ORL Face Database

The ORL face database is one of the three databases used for evaluating the consistency of the proposed approach. It contains 400 images of 40 different subjects, 10 images each, with different facial expressions and details [16].

Fig. 2 Sample images of ORL face database for 2 subjects

These expressions and variations include face images with open or closed eyes, smiling or non-smiling faces, and images with or without glasses. For some subjects the images were taken at different times. Some rotation and tilting of the faces is also present, with a tolerance of up to 20 degrees. All are gray-scale images, normalized to the same size of 92 × 112 pixels and stored in pgm format. Sample face images from the ORL database are shown in Fig. 2; some variation in pose and expression can be seen in the images.

4.1.2 YALE Face Database

The YALE face database images are in gif format and also contain various facial expressions. The database has 165 images in total, 15 subjects with 11 face images each. The expressions included are normal, sad, wink, surprised and happy, along with occlusion (with or without glasses) and varying directions of illumination [17]. All images have a resolution of 320 × 243 pixels, are cropped to 220 × 175 pixels, and are resized to 138 × 110 for the proposed approach.
The number of training images is varied from 1 to 8 per subject, with the remaining images used for testing. Figure 3 shows sample face images from the YALE database, with small variations in pose and expression. These variations do not affect the proposed approach because of the use of PHT, a rotation-invariant transform.

Fig. 3 Sample images of YALE face database for 2 subjects

4.1.3 Georgia Tech Face Database

The Georgia Tech face database has 15 images per subject for 50 subjects, 750 face images in total [18].
Here too, the face images vary in illumination, scale and orientation for each subject; they were taken between 06/01/99 and 11/15/99, captured in two different sessions to obtain these variations. All images in this database are colour jpegs and are therefore converted to gray scale for experimentation with the proposed approach; all are resized to a resolution of 40 × 30 pixels. As for the other two databases, sample images are shown in Fig. 4; pose and expression variations can again be seen in the face images.

Fig. 4 Sample images of Georgia tech face database for 2 subjects



Table 1 Percentage error rate of the proposed method on ORL database

No. of images used for training | IMG + KELM | Interval type II FPIE [19] | Proposed method
3 | 14.29 | 13.57 | 13.57
4 | 13.33 | 11.67 | 11.67
5 | 10    | 9.00  | 7.5
6 | 8.125 | 7.50  | 6.25
7 | 7.5   | 5.00  | 5.83

4.2 Results

The face databases used in the experimentation of the proposed approach contain pose and expression variations; Figs. 2, 3 and 4 show sample images. As PHT is a rotation-invariant transform, it is used for feature extraction from such face images, and KELM is then used to recognize them.

4.2.1 ORL Face Database

The proposed approach is first tested on the ORL database, briefly described in the sub-section above. Images are selected from the database for the training and test sets: the number of training images per subject is varied from 3 to 7 (out of 10), and the feature matrix is obtained for the corresponding training and test sets. Using KELM, the percentage error rate is obtained first with the image sets alone for training and testing, and then with the images combined with the features extracted by PHT.
The three KELM parameters are selected as follows: the regularization coefficient is set to 50, to address over-fitting and ill-posedness; the kernel type is RBF, chosen from the linear, polynomial, wavelet and RBF kernels; and the kernel parameter is set to 5, enabling the non-linear mapping from the lower-dimensional to the higher-dimensional feature space. The results are shown in Table 1, together with a comparison against the Interval Type II FPIE (fuzzy-based pixel-wise information extraction) method [19].

4.2.2 YALE Face Database

The proposed feature extraction approach is also applied to the YALE face database. Here too, images are selected sequentially for the training and testing datasets: from the 11 images per subject, between 1 and 8 are used for training, and the testing dataset is formed from the remaining images (10 down to 3). Feature extraction with PHT is computed on these training and testing datasets. Percentage error rates are then computed for two cases: applying KELM directly to the datasets, and applying KELM after concatenating the datasets with the features extracted by PHT. Here too, the regularization coefficient, kernel type and kernel parameter are tuned to minimize the error. After a number of experiments over many parameter combinations, the regularization coefficient is set to 1000 and the kernel parameter to 9000, and the RBF kernel proves best suited to the YALE face database. The results, shown in Table 2, demonstrate that the proposed approach achieves better results than [19, 20].

Table 2 Percentage error rate of the proposed method on YALE database

No. of images used for training | IMG + KELM | Discriminative sparse representation [20] | Interval type II FPIE [19] | Proposed method
1 | 45.33  | –     | –     | 44.667
2 | 13.33  | –     | 34.07 | 13.33
3 | 9.167  | 15.00 | 20.00 | 9.167
4 | 8.5714 | 12.38 | 18.09 | 8.5714
5 | 8.889  | 10.00 | 14.44 | 6.667
6 | 5.333  | –     | –     | 4.00
7 | 5.000  | –     | –     | 3.333
8 | 6.667  | –     | –     | 4.444

4.2.3 Georgia Tech Face Database

After testing the proposed approach on gray-scale face images, a database containing colour jpeg images is used. These face images are converted to gray scale and, as for the ORL and YALE databases, training and testing datasets are formed. Images are selected sequentially, from 4 to 8 out of the 15 images per subject, so that a fair comparison can be made with [19, 20]. The training and testing datasets are formed and the corresponding features are extracted using PHT. Experimentation with KELM on these features gives quite good results when few training images are used, compared with the other face databases; with 7 and 8 training images, a slightly higher percentage error rate is obtained than with the other state-of-the-art approaches. The KELM parameters (regularization coefficient, kernel type and kernel parameter) are tuned for the proposed approach: over various experiments, the regularization coefficient is set to 60 and the kernel parameter to 800, and here too the RBF kernel gives the best results. The percentage error rates obtained on this database are shown in Table 3, compared with other state-of-the-art approaches.

Table 3 Percentage error rate of the proposed method on Georgia tech face database

No. of images used for training | Discriminative sparse representation [20] | Interval type II FPIE [19] | Proposed method
4 | 42.73 | 48.36 | 41.09
5 | 38.40 | 44.80 | 38.20
6 | 31.33 | 33.56 | 31.11
7 | 28.75 | 27.50 | 27.75
8 | 26.29 | 24.57 | 25.71

5 Conclusion

The proposed approach proves to be a novel and robust feature extraction approach for face recognition. The results achieved with feature-based information combined with pixel-based information are more accurate than those achieved with pixel-based information alone. The results also show that PHT helps obtain features from the images that are invariant to orientation and rotation, and that combining them with KELM further improves the results. The proposed approach can be modified for still better results and can be applied to other face databases to make it more robust to face poses and expressions.

References

1. Latha, P., Ganesan, L., Annadurai, S.: Face recognition using neural networks. Signal Process.
Int. J. 3(5), 153–160 (2009)
2. Kim, K.I., Jung, K., Kim, H.J.: Face recognition using kernel principal component analysis.
IEEE Signal Process. Lett. 9(2), 40–42 (2002)
3. Goel, T., Nehra, V., Vishwakarma, V.P.: An adaptive non-symmetric fuzzy activation function-
based extreme learning machines for face recognition. Arab. J. Sci. Eng. 42(2), 805–816 (2017)
4. Lu, J., Liong, V.E., Wang, G., Moulin, P.: Joint feature learning for face recognition. IEEE
Trans. Inf. Forensics Secur. 10(7), 1371–1383 (2015)
5. Banitalebi-Dehkordi, M., Banitalebi-Dehkordi, A., Abouei, J., Plataniotis, K.N.: Face recogni-
tion using a new compressive sensing-based feature extraction method. Multimed. Tools Appl.
77(11), 14007–14027 (2018)
6. Goel, A., Vishwakarma, V.P.: Fractional DCT and DWT hybridization based efficient feature
extraction for gender classification. Pattern Recognit. Lett. 95, 8–13 (2017)
7. Hafed, Z.M., Levine, M.D.: Face recognition using the discrete cosine transform. Int. J. Comput.
Vis. 43(3), 167–188 (2001)
8. Yang, J., Zhang, D., Frangi, A.F., Yang, J.: Two-dimensional PCA: a new approach to
appearance-based face representation and recognition. IEEE Trans. Pattern Anal. Mach. Intell.
26(1), 131–137 (2004)

9. Li, L., Li, S., Abraham, A., Pan, J.-S.: Geometrically invariant image watermarking using polar
harmonic transforms. Inf. Sci. (NY) 199, 1–19 (2012)
10. Qi, M., Li, B.-Z., Sun, H.: Image watermarking via fractional polar harmonic transforms. J.
Electron. Imaging 24(1), 013004 (2015)
11. Wang, X., et al.: Two-dimensional polar harmonic transforms for invariant image representa-
tion. IEEE Trans. Pattern Anal. Mach. Intell. 46(7), 403–418 (2010)
12. Huang, G.-B., Siew, C.-K.: Extreme learning machine with randomly assigned RBF kernels.
Int. J. Inf. Technol. 11(1), 16–24 (2005)
13. Huang, G.-B., Zhu, Q.-Y., Siew, C.-K.: Extreme learning machine: a new learning scheme
of feedforward neural networks. In: 2004 IEEE International Joint Conference on Neural
Networks, 2004. Proceedings, vol. 2, 2004, pp. 985–990
14. Huang, G.-B., Zhou, H., Ding, X., Zhang, R.: Extreme learning machine for regression and
multiclass classification. IEEE Trans. Syst. Man Cybern. Part B 42(2), 513–529 (2012)
15. Wong, C.M., Vong, C.M., Wong, P.K., Cao, J.: Kernel-based multilayer extreme learning
machines for representation learning. IEEE Trans. Neural Netw. Learn. Syst. (2016)
16. AT&T (ORL) face database: [Online]. Available: https://www.cl.cam.ac.uk/research/dtg/
attarchive/facedatabase.html
17. YALE face database: [Online]. Available: http://cvc.yale.edu/projects/yalefaces/yalefaces.%
0Ahtml
18. Georgia tech face database: [Online]. Available: http://ftp.ee.gatech.edu/pub/users/hayes/
facedb/
19. Yadav, S., Vishwakarma, V.P.: Interval type-2 fuzzy based pixel wise information extraction: an
improved approach to face recognition. In: 2016 International Conference on Computational
Techniques in Information and Communication Technologies (ICCTICT), 2016, pp. 409–414
20. Xu, Y., Zhong, Z., Yang, J., You, J., Zhang, D.: A new discriminative sparse representation
method for robust face recognition via l 2 regularization. IEEE Trans. Neural Netw. Learn. Syst.
28(10), 2233–2242 (2017)
Link Failure Detection in MANET:
A Survey

Manjunath B. Talawar and D. V. Ashoka

Abstract A MANET is a self-configuring, infrastructure-less wireless network composed of mobile devices connected by wireless links. MANETs are dynamic, and the motion of the different nodes can lead to frequent link failures; link breakage reduces network performance and increases network overhead. The purpose of this review is to survey the various approaches proposed for link failure detection and prediction in wireless networks, with a focus on route migration after link failure. The review also discusses the accepted routing protocols in relation to link breakage problems.

Keywords Mobile Ad Hoc networks (MANETs) · Link failure · AODV · DSR · DSDV · Wireless mesh networks (WMNs)

1 Introduction

A MANET [1] is an infrastructure-less network formed from different mobile nodes, each of which runs on limited battery power and forwards packets from source node to destination node. The dynamic nature and mobility of MANETs call for networking strategies that provide efficient, robust and reliable communication during emergency acquisition operations, search-and-rescue operations, military operations, classrooms and conventions. End users share information through their mobile equipment, and communication must remain efficient while power consumption and transmission overhead are kept low. The structure of the network alters continuously because of node mobility.
Maintaining routes and handling link failures is challenging for routing protocols that must cope with frequent changes in the network topology. As the number of users grows, a MANET suffers from common network problems: performance degradation, route loss, poor link quality, interference between nodes, power expenditure, network overhead and topology changes. Reliable routing protocols and link failure detection methods therefore play vital roles in MANETs.


Fig. 1 Mobile ad hoc network

These routing protocols have different features; to understand them, a detailed understanding of each MANET routing protocol [2, 3] and of link failure detection methods is needed (Fig. 1).
A MANET is a multi-hop wireless network in which the transmission range of the system is amplified by routing packets through the participating nodes, hop by hop. MANETs can be used in situations where no pre-deployed infrastructure support is available. MANETs can be regarded as a subset of wireless mesh networks (WMNs) [4]; connectivity in a MANET can be increased by configuring any chosen topology as a wireless mesh topology. Nodes in MANETs have identical specifications, functionalities and regulations, with no restrictions on mobility. Link failure is a big challenge in both WMNs and MANETs: path overhead, interference between user nodes, path loss, and the bandwidth demanded by applications all lead to link breakage. The dynamic topology created by node mobility in MANETs results in high rates of link failure, greater energy consumption and network partitions that interrupt data transmission. Studying network recovery and link breakage prediction is therefore an advanced research area, aiming to design and develop fast, accurate mechanisms to predict link breakages and to recover from them using backup routing protocols.

2 Related Work

The broad classification of mobile ad hoc routing protocols is shown in Fig. 2 [5].

2.1 Proactive Routing Protocols

Proactive (table-driven) routing protocols compute routes to the various nodes in the network and proactively propagate route updates at fixed time intervals. Maintaining routing tables for end-to-end data communication introduces overhead in the wireless network. When the network topology changes, the original paths are no longer valid and all nodes receive updates on the path status, so each node's routing table is kept up to date and is immediately available when needed. In proactive routing the data experiences minimal delay, but considerable wireless node power and wireless bandwidth are wasted.

2.2 Reactive Routing Protocols

Reactive (on-demand) routing protocols [6] compute a route only when two nodes need to communicate: on request, path information is communicated to neighbours, the route is calculated and the routing table is sent. Because it does not have to store all possible routes in the network, reactive routing has smaller route discovery overheads and thus consumes less bandwidth than proactive routing.

Fig. 2 Classification of mobile ad-hoc routing protocols

The disadvantage is that a node sending data cannot always find the route quickly: the route-finding procedure can cause significant delay. A comparison between proactive (table-driven) and reactive (on-demand) routing protocols is shown in Table 1, collected from [7, 8].

Table 1 Comparison of proactive and reactive protocols

Description         | Proactive                          | Reactive
Control traffic     | Usually higher than reactive       | Increases with the mobility of active paths
Delay               | Small, as routes are predetermined | High, as routes are computed on demand
Scalability         | Up to 150 nodes                    | Higher than proactive protocols
Periodic updates    | Always required                    | Not required
Storage requirement | More than reactive                 | Depends on the required paths
Power requirement   | High                               | Low
Route structure     | Hierarchical/flat                  | Flat
Bandwidth           | Higher required                    | Lower required
Path availability   | Always available                   | On-demand

2.3 Hybrid Routing Protocols

Hybrid routing protocols combine both proactive (table-driven) and reactive (on-demand) techniques.

3 Overview of Routing Protocols

In relation to this work, the well-known protocols AODV, DSR and DSDV are briefly discussed.

3.1 Destination Sequenced Distance Vector (DSDV) Protocol

The DSDV protocol [9] is a proactive routing protocol and an improvement on the Bellman-Ford algorithm. DSDV takes a hop-by-hop approach, broadcasting updated routing tables periodically so that each node gathers information about the actual network topology. Every node holds its own routing table, containing the shortest-path information to every other node in the network. To find routes while preventing routing loops, it stamps updated routing table information with increasing sequence numbers (a sketch of this update rule is given below).
To maintain table consistency at each node, routing details are modified periodically using two kinds of routing table update: full dumps and incremental updates. A full dump transfers all reachable routing details, fully or partially, to a node's neighbours; an incremental update transfers only the routing data altered since the last full dump operation. The protocol has the benefit of lower route-request latency; its disadvantage is higher network overhead, and it performs well only in networks of average mobility with a modest number of nodes.
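A minimal sketch of the DSDV preference rule (our own simplification; real DSDV also distinguishes even and odd sequence numbers for reachable and broken routes, which is omitted here): a route advertisement replaces the stored entry only if it carries a fresher sequence number, or the same sequence number with a lower hop count.

    def dsdv_update(table, dest, seq, hops, next_hop):
        """Keep the entry with the freshest sequence number; break ties on hop count."""
        cur = table.get(dest)
        if cur is None or seq > cur["seq"] or (seq == cur["seq"] and hops < cur["hops"]):
            table[dest] = {"seq": seq, "hops": hops, "next": next_hop}

    table = {}
    dsdv_update(table, "N5", seq=102, hops=3, next_hop="N2")
    dsdv_update(table, "N5", seq=100, hops=1, next_hop="N4")  # stale: ignored
    dsdv_update(table, "N5", seq=102, hops=2, next_hop="N3")  # same seq, fewer hops
    print(table["N5"])  # {'seq': 102, 'hops': 2, 'next': 'N3'}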

3.2 Ad Hoc On-Demand Distance Vector Routing (AODV) Protocol

In the Ad hoc On-demand Distance Vector routing protocol [9], a path is established only when the source node requests one for data transmission. The protocol uses the distance-vector concept, but in a different way: every node holds path information in its routing table for packet transmission and creates routes on demand, as opposed to DSDV, which keeps the list of all routes; AODV only needs to maintain routing information about the active paths. The protocol is categorized as a purely on-demand route acquisition system. When the source node moves, the path-finding protocol is re-initiated to seek a fresh path to the target. When an intermediate node moves, on the other hand, its upstream neighbours recognize the movement and propagate a link-failure notification message to their current neighbours, which in turn propagate it onwards until it reaches the source node (a sketch of this propagation follows). Paths are established on demand by discovering the latest route to the destination. The disadvantage of the AODV protocol [10] is the large control overhead caused by the many path-reply messages generated for a single route request; the periodic hello messages can also consume bandwidth unnecessarily.
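A toy sketch of the notification walk described above (ours; real AODV uses RERR messages carrying lists of affected destinations, omitted here): the upstream neighbour that detects the break passes the failure notice hop by hop back towards the source.

    def propagate_link_failure(upstream, detecting_node):
        """Return the chain of nodes notified, from the detecting node to the source."""
        notified, node = [], detecting_node
        while node is not None:
            notified.append(node)
            node = upstream.get(node)     # one hop closer to the source
        return notified

    # Hypothetical active path S -> A -> B -> C; B moved away, so A detects the break.
    upstream = {"C": "B", "B": "A", "A": "S", "S": None}
    print(propagate_link_failure(upstream, "A"))  # ['A', 'S']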

Table 2 Property comparison of DSDV, AODV and DSR

Sl. No. | Property of the protocol    | DSDV          | AODV          | DSR
1       | Reactive                    | No            | Yes           | Yes
2       | Loop free                   | Yes           | Yes           | Yes
3       | Periodic broadcast          | Yes           | Yes           | No
4       | Paths maintained in         | Routing table | Routing table | No
5       | QoS support                 | No            | No            | No
6       | Unidirectional link support | No            | No            | Yes
7       | Multicast routes            | No            | No            | Yes
8       | Distributed                 | Yes           | Yes           | Yes
9       | Route cache/table timer     | Yes           | Yes           | Yes

3.3 Dynamic Source Routing (DSR) Protocol

DSR is a source-routed on-demand routing protocol: the packet header carries the complete hop-by-hop path to the destination node, and paths are maintained in a route cache. The source routes most important to a node are saved in its route cache. When a node wants to communicate with another node to which it does not know the path, it initiates a route discovery process by flooding RouteRequest (RREQ) packets; the destination node, after receiving the first RREQ, sends a RouteReply (RREP) back to the source node (the sketch below simulates this flooding). The advantage of the DSR protocol is that the path-cache information decreases control overhead; its disadvantages are inconsistencies in the route construction phase and potentially poor performance in networks with high mobility (Table 2) [9].
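A toy sketch of this discovery process (our simplification; real DSR adds request identifiers, route caching and RREP unicasting): the RREQ floods outwards, accumulating the hop-by-hop source route in its header, and the first copy to reach the destination defines the replied route.

    from collections import deque

    def dsr_route_discovery(graph, src, dst):
        """Flood RREQs over an adjacency-dict graph; each RREQ carries the route
        it has traversed, and the first RREQ to reach dst defines the RREP route."""
        queue = deque([[src]])
        seen = {src}                       # nodes that already forwarded this RREQ
        while queue:
            route = queue.popleft()
            node = route[-1]
            if node == dst:
                return route               # destination answers the first RREQ (RREP)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(route + [neighbour])
        return None                        # no route: link failure or partition

    graph = {"S": ["A", "B"], "A": ["S", "C"], "B": ["S", "C"],
             "C": ["A", "B", "D"], "D": ["C"]}
    print(dsr_route_discovery(graph, "S", "D"))  # e.g. ['S', 'A', 'C', 'D']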

4 Literature Survey

Different researchers have investigated link failure prediction in MANETs; this section discusses illustrative examples of their work.
The overhead of the Ad hoc On-demand Distance Vector (AODV) routing protocol has been analyzed as a function of the link failure probability in MANETs [11, 12]. That work analyzed how the collision probability is affected by the hidden-node issue, and its outcome on the link breakage probability; however, the maximum routing overhead was high, and only two scenarios, rectangle and chain, with all nodes stationary, were discussed.
Ramesh et al. [13] addressed link breakage prediction in Dynamic Source Routing, discussing two path-finding methods: the source path and the backup path.

Li et al. [14] gave a link breakage prediction algorithm using a signal-level threshold in the Ad hoc On-demand Distance Vector routing protocol. When the received signal level of a packet is lower than the threshold, the next node calculates the distance between itself and the sending node and estimates the speed between them.
Qin and Kunz [15] predicted link breakdown by applying an equation that finds the exact time at which the breakdown will take place, using the DSR routing protocol. Every node maintains a routing table holding the address of the previous hop node, the received signal power of each packet, and the time at which the packet was received. After acquiring three packets, the node calculates the link breakdown time and compares it with a set threshold, then notifies the source node of the probability of link breakdown.
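A rough sketch of this style of prediction (ours, under the common simplifying assumption that received power varies smoothly with time; the sample values and threshold are invented): fit a curve through the last three (time, power) samples and solve for the time at which the power decays to the threshold.

    import numpy as np

    def predict_breakdown_time(times, powers, threshold):
        """Fit p(t) = a t^2 + b t + c through three (time, received-power) samples
        and return the earliest future time at which p(t) drops to the threshold."""
        a, b, c = np.polyfit(times, powers, 2)
        roots = np.roots([a, b, c - threshold])
        future = [t.real for t in roots if abs(t.imag) < 1e-9 and t.real > times[-1]]
        return min(future) if future else None

    # Hypothetical samples: power (arbitrary units) decaying as the nodes move apart.
    times = [0.0, 1.0, 2.0]
    powers = [9.0, 7.5, 5.2]
    print(predict_breakdown_time(times, powers, threshold=2.0))  # approx. 3.03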
Zhu [16] dealt with the same link breakage prediction method as that given by Qin and Kunz [15], developing the algorithms on the MAODV and AODV protocols.
Choi et al. [17] studied a link failure prediction algorithm for vehicular ad hoc networks; the algorithm predicts a link failure with the help of the Received Signal Strength Indicator (RSSI).
Goff et al. [18] dealt with the link breakage problem in the DSR routing protocol. They choose a preemptive region and threshold: when a node travels into the preemptive region, it forwards a warning message to the node that initiated the active route, notifying it that a link breakdown may occur soon.
Ouni et al. dealt with a prediction algorithm for link breakage in Dynamic Source Routing by applying a check model with two further modules: one component models the behaviour of the nodes to calculate appropriate routes for use, and the second finds path availability with delay taken into consideration.
Lu et al. [19] dealt with a technique for switching to a new path by identifying link failures early in Dynamic Source Routing. The mechanism, named Dynamic Source Routing Link Switch, detects an impending link failure between a node and its adjacent node by measuring the quality of the observed packets; if a link breakage is noticed to be about to occur, the node forwards a link-switch request message to give early notification of the breakage.
Kaplan et al. [20] and Singh and Sharma [21] dealt with techniques for finding link failures in the network in advance, anticipating link breakage early by means of signal strength.
Wang and Lee [22] and Maamar and Abderezzak [23] dealt with a reliable multi-path QoS routing (RMQR) protocol using a slot assignment scheme. They use the route lifetime and the total hop count to choose a route with low delay and high stability; the Global Positioning System is used to calculate the path loss time between adjacent nodes. The properties of the technique were analyzed against other protocols.
Cha et al. [24] proposed a routing technique that finds reliable routes and increases link reliability for MANET nodes. Using GPS, every node predicts its anticipated location from its speed and position, which allows the source node to accept the path with the longest link lifetime among several candidate paths. The proposed method may cut down redundant control messages.
Ad hoc On-demand Distance Vector-Reliable Delivery (AODV-RD) [25] identifies link breakage using Signal Stability-based Adaptive Routing (SSA) [26], a technique that detects good- or poor-quality links from the strength or weakness of their signals. The method reduces end-to-end delay while increasing the packet delivery ratio (PDR).
Veerayya [27] gave the SQ-AODV routing protocol, which improves QoS in AODV by taking node energy into account during path finding and maintenance. The SQ-AODV algorithm provides stable paths while minimizing control overhead and packet delay.
Sarma and Nandi [28] gave the RSQR routing protocol for wireless networks, supporting the latency and throughput requirements of QoS routing. The RSQR method calculates path stability from the strength of the received signal, improving the packet delivery ratio and the average delay.

5 Comparison of Link Failure Detection and Prediction Algorithms in MANETs

| Sl. No. | Algorithm or method | Authors | Description of the features covered | Remarks |
|---------|---------------------|---------|-------------------------------------|---------|
| 1 | LFPQR prediction algorithm | Satyanarayana et al. [29] | Prediction algorithm that predicts the upcoming state of a node, to determine whether the node should be selected as a router or not; end-to-end packet delay and packet loss are the performance metrics reduced here | The algorithm depends on the power level and the mobility of each node |
| 2 | PF_AODV (Predict Failure in AODV) | Sedrati Maamar and Benyahia Abderezzak [23] | Prediction method to detect and predict link failures using signal strength; the algorithm can identify the link quality that has to be improved (QoS and link breakage features are covered) | The purpose of this work is to improve the AODV routing mechanism and its maintenance phase |
| 3 | Route stability based QoS routing (RSQR) protocol | Sarma and Nandi [28] | Calculates link stability and route stability from the received signal strengths; the path ranked highest among all feasible paths is selected using route stability information | Throughput and delay parameters are considered for the QoS routing requirements |
| 4 | Optimized AODV routing protocol | Ghanbarzadeh and Meybodi [30] | Hello-message mechanism to optimize the efficiency of AODV with the help of path availability prediction | The average number of broken-link messages and the message overhead are reduced |
| 5 | AODV_LFF routing protocol | Li et al. [14] | Technique to predict link breakage during data transfer | Boosts the packet delivery rate and reduces the network transmission delay |
| 6 | RMQR | Wang and Lee [22] | Uses path reliability and stability to predict link failure in a multi-route context with AODV | Selects a path with low latency and high stability using the route lifetime and hop count |
| 7 | AODV reliable delivery (AODV-RD) | Liu and Li [25] | Identifies link failures using signal stability-based adaptive routing (SSA) | Warning-message technique to detect a link breakage |
| 8 | Link break and load balanced prediction algorithm (LBALBP) | Gulati et al. [31] | An alternative path is found before the link actually breaks, based on the signal strength of packets received from the neighbouring node; each node calculates the link-break prediction time of the link | Cost of high routing overhead |
| 9 | Mathematical model | Qin et al. [32] | Prediction method for link breakage between mobile nodes, to support multimedia streaming | — |
| 10 | Predictive preemptive AODV (PPAODV) | Hacene et al. [33] | Uses the Lagrange interpolation method to predict link failure, with an averaging function to approximate the received signal strength (RSS) | The fresh path is found before the active route breaks |
| 11 | Fuzzy-based hybrid blackhole/grayhole attack detection technique | Rathiga and Sathappan [12] | Finds the link breakage time between two mobile nodes on a route while maintaining the active routes, using an effective link failure prediction approach based on a linear regression model | Enhances the packet delivery ratio and routing performance |
| 12 | Improved AODV protocol | Ambhaikar et al. [34] | The improved AODV functions differently, resolving link breaks and updating new paths (to solve link failures in MANETs) | Enhances the existing AODV and compares its performance with various parameters |
| 13 | CPRR algorithm based on AODV | Hanumanthu Naik and Raghunatha Reddy [35] | Link failure route recovery using node monitoring; the check point route recovery algorithm (CPRRA) handles low node energy and uses network topology management to avoid path breakage | — |
| 14 | Enhanced AODV | Kaur et al. [36] | Uses signal strength to provide the best path: the path with the higher signal level is selected as the final path; link failure and packet loss problems are reduced | — |
| 15 | Ant colony optimization (ACO) | Aruna et al. [37] | Improves the end-to-end delay and throughput metrics by discovering an alternative path from the neighbourhood of the node | Handles increased node mobility and density by enhancing the congestion control mechanism |
| 16 | SSED-AODV | Upadhyaya et al. [38] | Finds a longer-lived route, minimizes link breakage and maximizes network lifetime | — |
| 17 | Link breakage prediction | Choi et al. [17] | Uses the value of the Received Signal Strength Indicator (RSSI) to predict the possibility of a link breakage | — |
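Several of the tabulated schemes (rows 2, 8, 10, 11 and 17) reduce to extrapolating the received signal strength. As an illustration of the Lagrange-interpolation variant used by PPAODV-style schemes, the following sketch (with hypothetical sample values; not the exact algorithm of [33]) fits a Lagrange polynomial through the last three smoothed RSS samples and extrapolates one step ahead:

```python
def lagrange_extrapolate(ts, ys, t_next):
    """Evaluate the Lagrange polynomial through points (ts, ys) at t_next."""
    total = 0.0
    for i, (ti, yi) in enumerate(zip(ts, ys)):
        term = yi
        for j, tj in enumerate(ts):
            if j != i:
                term *= (t_next - tj) / (ti - tj)
        total += term
    return total

# Hypothetical smoothed RSS samples (dBm) at the last three hello intervals.
ts = [0.0, 1.0, 2.0]
rss = [-60.0, -68.0, -74.0]
rss_next = lagrange_extrapolate(ts, rss, 3.0)
print(f"Predicted RSS at t = 3 s: {rss_next:.1f} dBm")
if rss_next < -85.0:  # hypothetical usability threshold
    print("Trigger preemptive route discovery")
```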

6 Conclusion

Link failures occur more often in wireless communication than in wired networks
because of node mobility, dynamic obstacles, limited energy resources, fading, and
spectrum allocation rules or regulations, in addition to the open transmission medium.
These frequent link failures interrupt communication until they are repaired, and they
are an unavoidable problem in wireless communications. Link failure detection and
recovery are therefore vital issues to investigate. In this paper, a brief overview of
different strategies for the detection and prediction of link failures in MANETs is
given, together with a brief comparative survey of link failure detection, prediction
and recovery techniques for MANETs.

References

1. ShodhGangotri: https://shodhgangotri.inflibnet.ac.in/bitstream/123456789/6971/2/02_introduction.pdf

2. Rekha, B., Ashoka, D.V.: An enhanced inter-domain communication among MANETs through
selected gateways. Int. J. Recent Trends Eng. Technol. 9(1) (2013)
3. Yamini, A., Suthendran, K., Arivoli, T.: Enhancement of energy efficiency using a transition
state mac protocol for MANET. Comput. Netw. 155, 110–118 (2019). https://doi.org/10.1016/
j.comnet.2019.03.013
4. Research Gate: http://espace.etsmtl.ca/1928/1/HAYAJNA_Tareq.pdf
5. Alshaer, N., El-Rabaie, E.-S.: A Survey on Ad Hoc Networks (2016)
6. Jadye, S.: Int. J. Comput. Sci. Inf. Technol. (IJCSIT) 7(2), 1014–1017 (2016)
7. Zahedi, K., Ismail, A.S.: Route maintenance approach for link breakage prediction in mobile
ad hoc networks. Int. J. Adv. Comput. Sci. Appl. (IJACSA) 2(10) (2011)
8. Ouni, S., Bokri, J., Kamoun, F.: DSR based routing algorithm with delay guarantee for ad hoc
networks. J. Netw. 4(5), 359–369 (2009)
9. Gulati, M.K., Kumar, K.: Performance comparison of mobile ad hoc network routing protocols.
Int. J. Comput. Netw. Commun. (IJCNC) 6(2) (2014)
10. Rekha, B., Ashoka, D.V.: Performance analysis of AODV and AOMDV routing protocols on
scalability for MANETs. In: Sridhar, V., Sheshadri, H., Padma, M. (eds) Emerging Research in
Electronics, Computer Science and Technology. Lecture Notes in Electrical Engineering, vol.
248. Springer, New Delhi (2014)
11. Zhang, Q.J., Wu, M.Q., Yan, Z.H.E.N., Shang, C.L.: AODV routing overhead analysis based on
link failure probability in MANET. J. China Univ. Posts Telecommun. 17(5), 109–115 (2010)
12. Rathiga, P., Sathappan, S.: Regression-based link failure prediction with fuzzy-based hybrid
blackhole/grayhole attack detection technique. Int. J. Appl. Eng. Res. 12, 7459–7465 (2017).
ISSN 0973-4562
13. Ramesh, V., Subbaiah, P., Supriya, K.: Modified DSR (preemptive) to reduce link breakage and
routing overhead for MANET using proactive route maintenance (PRM): Global J. Comput.
Sci. Technol. 9(5), 124–129 (2010)
14. Li, Q., Liu, C., Jiang, H.: The routing protocol AODV based on link failure prediction. In:
ICSP, IEEE (2008)
15. Qin, L., Kunz, T.: Increasing packet delivery ratio in DSR by link prediction: HICSS 03. IEEE.
Hawaii (2002)
16. Zhu, Y.: Proactive Connection Maintenance in AODV and MAODV: Master of Science.
Carleton University, Canada (2002)
17. Hoi, W., Nam, J., Choi, S.: (2008) Hop state prediction method using distance differential of
RSSI on VANET. NCM 2008. IEEE, pp. 426–431
18. Goff, T., Abu-Ghazaleh, N., Phatak, D., Kahvecioglu, R.: Preemptive routing in ad hoc
networks. J. Parallel Distrib. Comput. 63, 123–140 (2003)
19. Lu, H., Zhang, J., Luo, X.: Link switch mechanism based on DSR route protocol: ICINIS IEEE
(2008)
20. Kaplan Elliott, D.: Understanding GPS: Principles and Applications. Artech House Publishers,
Boston (1996)
21. Singh, M., Sharma, J.: Performance Analysis Of Secure & Efficient AODV (SE-AODV) with
AODV Routing Protocol Using Ns2: https://www.researchgate.net/publication/286679504
22. Wang, N.-C., Lee, C.-Y.: A reliable QoS aware routing protocol with slot assignment for mobile
ad hoc networks. J. Netw. Comput. Appl. 32(16), 1153–1166 (2009)
23. Maamar, S., Abderezzak, B.: Predict link failure in AODV protocol to provide quality of service
in MANET. I. J. Comput. Netw. Inf. Secur. 3(1–9). Published Online in MECS: http://www.
mecs-press.org (2016)
24. Cha, H.-J., Han, I.-S., Ryou, H.-B.: QoS routing mechanism using mobility prediction of node
in ad-hoc network: In: Proceedings of the 6th ACM International Symposium on Mobility
Management and Wireless Access. ACM (2008)
25. Liu, J., Li, F.-M.: An Improvement of AODV protocol based on reliable delivery in mobile
ad hoc networks. In: Fifth International Conference on Information Assurance and Security
(2009)

26. Dube, R., Rais, C.D., Wang, K.-Y., Tripathi, S.: Signal stability-based adaptive routing (SSA)
for ad hoc mobile networks. IEEE Pers. Commun. (1997)
27. Veerayya, M., Sharma, V., Karandikar, A.: SQ-AODV: A novel energy-aware stability-based
routing protocol for enhanced QoS in wireless ad-hoc networks: In: Military Communications
Conference. MILCOM 2008. IEEE (2008)
28. Sarma, N., Nandi, S.: Route stability based QoS routing in mobile Ad Hoc networks. Wirel.
Pers. Commun. 54(11), 203–224 (2010)
29. Satyanarayana, D., Rao, S.V.: Link failure prediction QoS routing protocol for MANET. ICTES
2007, 1031–1036 (2007)
30. Ghanbarzadeh, R., Meybodi, M.R.: Reducing message overhead of AODV routing protocol in
urban area by using link availability prediction: In: 2010 Second International Conference on
Computer Research and Development. IEEE (2010)
31. Gulati, M.K., Sachdeva, M., Kumar, K.: Load balanced and link break prediction routing
protocol for mobile ad hoc networks. J. Commun. 12(6) (2017)
32. Qin, M., Zimmermann, R., Liu, L.S.: Supporting multimedia streaming between mobile peers
with link availability prediction. In: Proceedings of 13th Annual ACM International Conference
on Multimedia, ACM New York, NY, USA, pp. 956–965 (2005)
33. Hacene, S.B., Lehireche, A., Meddahi, A.: Predictive preemptive ad hoc on-demand distance
vector routing: Malaysian. J. Comput. Sci. 19(2), 189–195 (2006)
34. Ambhaikar, A., Sharma, H.R., Mohabey, V.K.: Improved AODV protocol for solving link
failure in MANET. Int. J. Sci. Eng. Res. 3(10) (2012). ISSN 2229-5518
35. Hanumanthu Naik, K., Raghunatha Reddy, V.: Link failure route rectification in MANET using
CPRR algorithm based on AODV. Int. J. Innov. Res. Sci. Eng. Technol. 4(9) (2015)
36. Kaur, H., Brar, G.S., Malhotra, R.: To propose a novel technique to reduce link failure problem
in MANET. Int. J. Adv. Res. Comput. Eng. Technol. (IJARCET) 3(10) (2014)
37. Kadam, A.A., Jain, S.A.: Congestion control mechanism using link failure detection in
MANET. Multidisc. J. Res. Eng. Technol. 1(2), 152–161
38. Upadhyaya, J., Manjhi, N., Upadhyaya et al.: Energy based delay with link failure detection in
MANET. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 6(6), 437–443 (2016)
Review of Low Power Techniques
for Neural Recording Applications

P. Brundavani and D. Vishnu Vardhan

Abstract Continuous recording of neural signals is very important in the design of
an effective brain machine interface and for interpreting human neurophysiology.
Advancements in technology have made electronics capable of recording signals
from a large number of neurons on a single device, and the demand for data from
more and more neurons is continuously increasing. This makes it a challenging
task for design engineers to produce an efficient Neural Recording Front End
(NRFE). For a small implant size, the area occupied per channel must be low. The
dynamic range of an NRFE varies with time due to changes in the distance between
electrode and neuron or in the background noise, which requires adaptability. In
this work, techniques for reducing the power consumption per channel and the area
per channel in an NRFE are studied, via new circuits and architectures, and
compared to guide the proper choice of sub-blocks.

Keywords NRFE · Neurophysiology · Neural signals · Dynamic range · Power · Area

1 Introduction

In the recent past, interest in technologies for neuroscience and neuro-prosthetic
applications has increased manifold. The main aim of studying neuroscience is to
better understand the brain, human neurophysiology and the origin of disorders
such as schizophrenia, epilepsy and Parkinson's disease. The study of neuro-
prosthetics, on the other hand, deals with the restoration of a lost biological function
by mimicking its functionality. The constraints on neuro-prosthetic applications are

P. Brundavani (B) · D. Vishnu Vardhan


JNTUA, Anantapuramu, AP, India
D. Vishnu Vardhan
e-mail: vishnu.ece@jntua.ac.in
P. Brundavani
AITS, Rajampet, AP, India

© Springer Nature Switzerland AG 2020 183


V. K. Gunjan et al. (eds.), Modern Approaches in Machine Learning and Cognitive
Science: A Walkthrough, Studies in Computational Intelligence 885,
https://doi.org/10.1007/978-3-030-38445-6_14
184 P. Brundavani and D. Vishnu Vardhan

much more rigid than those of neuroscience applications. A neuro-prosthetic system
must ensure safety and reliability for its complete acceptance, and it has to keep the
economic challenges in view too.
The areas of retinal [1] and cochlear prosthetics have made great progress in the last
few decades, and implants have been successfully tested on human beings. A great
deal of work is ongoing on the recovery of lower limb function lost due to spinal cord
damage [2]. In the treatment of Parkinson's disease, Deep Brain Stimulation (DBS)
has been found to be very effective, and vagus nerve stimulation has proved to be a
safe and reliable way to treat epilepsy [3]. At the very core of these prosthetic systems
there is a Brain Machine Interface (BMI) whose accuracy directly decides the
reliability and efficiency of the system.
Typically, the brain acts as the main controller of the various limbs and organs in a
human body, commanding all of them through the central nervous system. Different
dedicated regions of the brain control different limbs or organs. If the brain fails to
communicate with a limb or organ due to any mishap or disease, that limb or organ
becomes non-functional or malfunctions. To restore the functionality, the
communication link between the brain and the limb has to be re-established. For
this, a parallel artificial link from the brain to the limb can be made which mimics
the process exactly as in a normal human being. Two very important blocks of a
neural prosthetic system are recording and stimulation, which act as the interface
between the machine and the human body. The stimulation block deals with the last
requisite of a BMI and controls the movement or function of the body part; it can
be electrical or magnetic in nature, with electrical stimulation being more popular.
Device reliability is an important issue in stimulation circuitry, as large voltages
are required to send strong current pulses to stimulate the organ. The recording
block, on the other hand, deals with the first requisite of a BMI and is more
commonly known as the neural recording system (NRS). It must be able to extract
information from the brain without disrupting the brain's normal behavior. Several
signal modalities can be used for neural recording, such as the electroencephalogram
(EEG), the electrocorticogram (ECoG) and the extracellular action potential (EAP),
each with its own merits and demerits for the recording process. This work focuses
on the recording of EAP signals, and hence only EAP-based systems are discussed.
The NRS should not corrupt the information by adding a large amount of noise or
interference. The NRS is usually implanted in the brain, in the proximity of the
neurons, which are the basic building blocks of a human brain, to protect the signals
from noise and other potential interference sources.
Figure 1 represents a conventional multichannel neural recording system. This
system consists of m N-channel neural recording front ends (NRFE) which can cater
to different regions of a human brain. The main role of an NRFE is to sense, amplify
and digitize the information extracted from the neural signals of many neurons
without corruption due to electronic noise. An NRFE typically consists of Low
Noise Amplifiers (LNA), gain stages and an Analog to Digital Converter (ADC),
and must be designed to adapt to the dynamic range requirement so as to reduce
power consumption when conditions are better than the worst case. The digital
data is fed to a Digital Signal Processor (DSP) which applies a spike sorting algorithm

Fig. 1 A conventional neural recording system (N-channel)

to attribute each spike to its source neuron. This DSP also helps in reducing the
amount of data to be transmitted, either by spike thresholding [4] or by extracting
important spike features and transmitting only those. The output of the DSP is fed
to an RF telemetry block which serializes, encodes and transmits the data over a
wireless link. Each block of an NRS should necessarily have low power consumption,
should add low noise and must occupy a small area. The NRFE typically consumes
the most power (excluding RF telemetry) and area in an NRS and needs careful
optimization [5] to reduce them.
The next section briefly describes the basic neural recording front end, followed by
a review of different research works related to the blocks of the recording system;
these works are then analyzed, and finally the conclusions and outcomes are
discussed.

2 Neural Recording Front End

Figure 2 presents a typical N-channel Neural Recording Front End (NRFE). In an
NRFE, every channel contains an LNA to sense the small differential voltage created
in the electrolyte through a Multi Electrode Array (MEA) and to amplify it so that it
is protected from the noise of the succeeding stages. Each channel has additional
gain stage(s) (Av) to further amplify the weak neural signals so that the signal covers
the entire dynamic range of the ADC. Each channel is usually ac-coupled through
an input capacitance Cin to block the large dc offset voltages at the electrode-
electrolyte interface.
The N channels are time-division multiplexed into a Variable Gain Amplifier
(VGA) followed by a k-bit ADC which digitizes the signal and feeds it to the DSP of
the NRS for further processing. The NRFE usually dictates the number of channels
that can be employed in an NRS for a given chip area [6]. However, there is a
continuous demand for data from more and more neurons, which requires a larger number

Fig. 2 A conventional N-channel neural recording front end. Courtesy Vikram Chaturvedi, “An
8-to-1 bit 1-MS/s SAR ADC With VGA and integrated data compression for neural recording”,
IEEE (VLSI) SYSTEMS, 2013

of channels in the same chip area. Safety issues due to power dissipation per unit
area also set an upper limit on the number of channels allowed in a given area. Two
very important metrics used for evaluating an NRFE are the power consumed per
channel and the area occupied by a single channel. An NRFE with low per-channel
power consumption and area is well suited for an NRS that performs chronic neural
signal recording from a large number of neurons, and it will result in a BMI with
high accuracy and reliability.

3 Literature Review

In the paper titled “Towards a 1.1 mm² Free-Floating Wireless Implantable Neural
Recording SoC”, Yeon et al. [7] designed and implemented a wireless implantable
neural recorder system-on-chip. To achieve low power capability for the proposed
SoC, the authors implemented a 10-bit Voltage Controlled Oscillator (VCO) based
ADC for the digitization of the analog signals. The ADC designed in this work
yielded good results in terms of power, resolution and Effective Number of Bits
(ENOB) compared with a Successive Approximation Register (SAR) ADC. The
whole work was carried out in 350 nm CMOS technology. The VCO-based ADC
architecture is illustrated in Fig. 3.

Fig. 3 VCO-based ADC architecture in the wireless implantable neural recorder
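As an illustration of the operating principle (a behavioral sketch with assumed parameters, not the circuit of [7]), a VCO-based quantizer lets the input set the oscillator frequency and counts VCO edges per sample period; the fractional phase left over at each sample boundary carries into the next period, which is what gives the first-order noise shaping:

```python
import numpy as np

def vco_adc(x, f0=1e6, kvco=2e6, fs=25e3):
    """Behavioral VCO-based ADC: input x (normalized to [-0.5, 0.5])
    sets the instantaneous frequency f0 + kvco*x; the output code is
    the number of VCO edges counted in each sample period 1/fs."""
    phase = 0.0
    codes = []
    for xi in x:
        phase += (f0 + kvco * xi) / fs  # cycles accumulated this period
        codes.append(int(phase))        # edge count for this period
        phase -= int(phase)             # leftover phase carries over (noise shaping)
    return np.array(codes)

# 1 kHz test sine sampled at 25 kS/s, assumed parameters for illustration.
t = np.arange(256) / 25e3
codes = vco_adc(0.4 * np.sin(2 * np.pi * 1e3 * t))
print(codes[:8])
```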
In the paper titled “A 70.8 dB 0.0045 mm² Low-power Continuous-Time Incre-
mental Delta-Sigma Modulator for Multi-Site Neural Recording Interfaces”, Shui
et al. [8] designed and introduced a power- and area-efficient incremental
continuous-time delta-sigma (CT incremental ΔΣ) ADC for the applications
discussed above. Figure 4 shows the structure of the second-order CT incremental
ΔΣ ADC used in the system.

Fig. 4 Second-order CT incremental ΔΣ ADC

The second-order continuous-time modulator features a cascade-of-integrators loop
filter built from active-RC integrators with distributed feedback, using passive
components optimized for area. The ADC is designed in a 0.18 µm CMOS process
and achieves a signal-to-noise-and-distortion ratio (SNDR) of 70.8 dB over a
10 kHz bandwidth. It occupies an area of 0.0045 mm² and consumes 16.6 µW.
Figure 5 shows the area-efficient current-steering DAC used in the implementation
of the CT incremental ΔΣ ADC of the system.
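A minimal discrete-time behavioral model of a second-order ΔΣ modulator (a generic MOD2 illustration, not the continuous-time loop of [8]) is sketched below; the two accumulators stand in for the integrators, and the 1-bit quantizer decision is fed back to both:

```python
import numpy as np

def delta_sigma_2nd_order(x):
    """Second-order discrete-time delta-sigma modulator model.
    x: input samples in (-1, 1). Returns the 1-bit output stream (+/-1)."""
    i1 = i2 = 0.0
    out = np.empty_like(x)
    for n, xn in enumerate(x):
        y = 1.0 if i2 >= 0 else -1.0   # 1-bit quantizer
        i1 += xn - y                    # first integrator with feedback
        i2 += i1 - y                    # second integrator with feedback
        out[n] = y
    return out

# DC input of 0.3: the mean of the bitstream approaches 0.3,
# while the quantization error is pushed to high frequencies.
bits = delta_sigma_2nd_order(np.full(100_000, 0.3))
print(bits.mean())
```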
A scalable neural recording interface was developed by Park et al. [9] in their paper
entitled “Dynamic Power Reduction in Scalable Neural Recording Interface Using
Spatio-Temporal Correlation and Neural Signal Temporal Sparsity”. An integrated
lossless compression method is used to decrease the dynamic power dissipation of
data transmission, which is appropriate for high-density neural recording devices.
The authors studied the characteristics of neural signals and introduced a powerful
lossless compression scheme with distinct signal paths for the Local Field Potential
(LFP) and the EAP (spike).
For the LFP, a modulated ADC and a specialized digital difference circuit exploit
the spatio-temporal correlation of the LFP signals, and the resulting statistical
redundancy is removed using entropy encoding with no data loss. For the spike
signals, only the vital sections of the waveforms are extracted from the original
information, using spike detectors and reconfigurable analog memories. The chip was

Fig. 5 Implementation of the current steering DACs

produced in 180 nm CMOS technology and incorporates 128 channels into a
flexible design that can be easily scaled and extended for large-scale neural signal
recording. The fabricated chip accomplished a data-rate reduction from the
suggested compression system by a factor of 5.35 for the LFPs and by a factor of
10.54 for the spikes. Consequently, compared to the uncompressed case, the power
dissipation was decreased by 89%. A recording efficiency of 3.37 µW/channel,
5.18 µVrms noise, and an NEF of 3.41 were also accomplished. Figure 6
demonstrates the architecture of the 128-channel neural recording interface with
the built-in lossless compression system.
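The source of the data-rate saving in such LFP compression is easy to demonstrate: temporal differencing removes the correlated part of the signal, and an entropy coder then needs far fewer bits for the residuals. The sketch below is illustrative only, using synthetic data rather than the signals of [9]:

```python
import numpy as np

def entropy_bits(symbols):
    """Empirical Shannon entropy (bits/symbol) of an integer sequence,
    i.e. the best average code length an entropy coder could approach."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

rng = np.random.default_rng(0)
# Synthetic slowly varying LFP-like signal plus small noise.
t = np.arange(4096)
lfp = np.round(200 * np.sin(2 * np.pi * t / 512) + rng.normal(0, 2, t.size)).astype(int)

residuals = np.diff(lfp)  # temporal difference removes the correlated part
print(f"raw:      {entropy_bits(lfp):.2f} bits/sample")
print(f"residual: {entropy_bits(residuals):.2f} bits/sample")
```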
Kim et al. [10], in their work on ADC-direct neural recording with transient recovery
through predictive digital auto-ranging, address the significant problems of
integrated recording of neural electrical signals from the brain: a high dynamic range
is required to resolve very small signal amplitudes embedded in noise amid large
transients and stimulation artifacts, under severe energy and size limitations for
minimally invasive, untethered operation. A 16-channel neural recording system-
on-chip with a dynamic range exceeding 90 dB, an input-referred noise of less than
1 µVrms from dc to 500 Hz, an energy consumption of 0.8 µW and 0.024 mm²/channel
was realized in a 65 nm CMOS process.
Each recording channel has a mixed analog-digital second-order oversampling
ADC. This avoids the need for the pre-amplification and high-pass filtering of other
neural recording schemes, which often lead to signal distortions; gain is instead
realized in the digital-domain integrator, which is particularly useful for large
converter gains and for dynamic offset subtraction. The ADC-direct neural recording
front end achieves a noise efficiency factor (NEF) of 1.81 and an associated Power
Efficiency Factor (PEF) of 2.6.
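For reference, the NEF and PEF figures quoted throughout this review follow the conventional definitions from the amplifier literature (stated here for the reader's convenience, not taken from [10] itself):

```latex
% Conventional definitions of NEF and PEF:
\mathrm{NEF} \;=\; V_{\mathrm{ni,rms}}\,
    \sqrt{\frac{2\, I_{\mathrm{tot}}}{\pi \cdot U_T \cdot 4kT \cdot \mathrm{BW}}},
\qquad
\mathrm{PEF} \;=\; \mathrm{NEF}^2 \cdot V_{DD}
```

where V_ni,rms is the input-referred noise voltage, I_tot the total supply current, U_T the thermal voltage, kT the thermal energy, and BW the noise bandwidth; NEF = 1 corresponds to a single ideal bipolar transistor.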
The predictive digital auto-ranging of the binary quantizer further enables fast
transient recovery while retaining fully dc-coupled operation. The ADC is therefore
able to record neural signals as slow as 0.01 Hz and to recover from 200 mVpp
transients within 1 ms, which are significant prerequisites for effective electro-
cortical recording for mapping brain activity. In vivo recordings from marmoset
primate frontal cortex show its unique capacity to resolve extremely slow local field
potentials. The system circuit diagram and micrograph of the 16-channel neural-
signal acquisition integrated circuit are shown in Fig. 7.
Tong et al. [11] proposed a 10-bit fully differential SAR ADC with multiple input
channels for neural recording implants in their work “A 0.6 V 10 bit 120 kS/s SAR
ADC for Implantable Multichannel Neural Recording”. The proposed SAR ADC
combines energy-efficient and low-voltage switching schemes that exploit each
other's strengths to achieve lower energy consumption. The 10-bit SAR ADC is
able to work at a sampling rate of 120 kS/s from a 0.6 V power supply and is
implemented in 0.18 µm CMOS technology. The ADC consumes 0.5 µW of power
and achieves an ENOB of 9.51, which corresponds to a figure of merit of 7.03
fJ/conversion-step. The ADC occupies a 386 µm × 345 µm area. Figure 8 describes
a multichannel NRS built around this 10-bit SAR ADC.
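The successive-approximation search at the heart of such converters is a binary search over DAC codes; the sketch below is a generic ideal model (an illustration, not the specific switching scheme of [11]):

```python
def sar_convert(vin, vref=0.6, bits=10):
    """Generic successive-approximation conversion: binary-search the
    DAC code whose output best matches the sampled input vin."""
    code = 0
    for b in reversed(range(bits)):
        trial = code | (1 << b)            # tentatively set this bit
        vdac = vref * trial / (1 << bits)  # ideal DAC output for the trial code
        if vin >= vdac:                    # comparator decision
            code = trial                   # keep the bit
    return code

print(sar_convert(0.25))  # ~0.25/0.6 * 1024 = 426
```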
Tong et al. [6] proposed an implantable bio-electronic differential-input ADC based
on a VCO in their work “A 1 V 10 bit 25 kS/s VCO-based ADC for Implantable
Neural Recording”. To support differential inputs, two single-ended VCO-based
ADCs are combined, eliminating common-mode interference and even-order
harmonic distortion. The digital output is produced by a 10-bit binary subtraction
circuit that processes the digital outputs of the two matched ADCs. The subtraction
circuit is designed with Domino logic gates to further reduce the power consumption
and chip area. The 10-bit 25 kS/s VCO-based ADC can work from a 1 V power
supply and is designed in 0.18 µm CMOS technology. This ADC's active area is
270 µm × 100 µm, which is much smaller in comparison with a SAR ADC. Figure 9
shows the design of the 10-bit VCO-based ADC.

Fig. 7 16-channel ADC-direct neural recording IC with PDA. a System diagram and circuit
architecture with single-channel detail. b IC micrograph with corresponding single-channel detail

Rehman et al. [12], in their work “A 79 µW 0.24 mm² 8-channel Neural Signal
Recording Front-End Integrated Circuit”, introduced a new architecture for an
ultra-low-power and area-efficient 8-channel prototype of a neural signal recording
front-end circuit. Low energy and low area are two of the most critical demands for
implantable neural recording circuits. Instead of using a stand-alone amplifier for
each electrode, the recording path outlined in this architecture is built around a single
programmable high-gain-bandwidth amplifier. Compared to all earlier published
designs, the resulting circuitry occupies a lower area and also consumes less power.
The single-ended version of the complete neural recording path, shown in Fig. 10,
is implemented in 0.5 µm CMOS with a supply voltage of 1.8 V, and the 8-channel
recording

Fig. 8 Multichannel neural recording system using the suggested 10-bit SAR ADC

Fig. 9 Architecture of the 10 bit VCO-based ADC

path is measured to dissipate 79 µW of power and to occupy an area of 0.24 mm²,
making the design suitable for use in high channel count environments.
Tang et al. [13] designed a new area-efficient Driven Right-Leg (DRL) circuit in
their work titled “An Integrated Multichannel Neural Recording Analog Front-End
(AFE) with Area-Efficient DRL”, improving the Common Mode Rejection Ratio
(CMRR) of the system. In comparison with conventional DRL circuits, the proposed
capacitor-less DRL design achieves a 90% reduction in chip area with enhanced
CMRR, which makes it appropriate for multichannel biomedical recording
applications. The AFE comprises a low-noise programmable amplifying system, an
area-efficient DRL block, and a 10-bit ADC. The AFE circuit was built in a standard
0.18 µm CMOS technology. Post-layout simulation results show that the AFE
offers two different gains of 54 dB or 60 dB while drawing 1 µA/channel from a
1 V supply voltage. The integrated input-referred noise from 1 Hz to 10 kHz is
only 4 µVrms and the CMRR

Fig. 10 Single-ended version of the complete neural recording path

is 110 dB. The architecture of this 10-channel neural AFE interface system is shown
in Fig. 11.
Luan et al. [14] describe a novel neural recording system-on-chip (SoC) with 64
ultra-low-power, low-noise channels featuring a highly reconfigurable analog front
end (AFE) and block-selectable data-driven output in their paper entitled “An
Event-Driven SoC for Neural Recording”. This reconfigurability allows the
extraction of LFPs and/or EAPs with a tunable bandwidth and sampling rate.
Real-time spike detection utilizes a simple dual-polarity threshold to trigger the
event-driven output of neural spikes. Each channel can be powered down individually
and configured for the required gain, bandwidth and detection threshold. The output
can therefore merge continuously streamed data and event-driven data packets, with
the system organized as a serial peripheral interface (SPI) slave. This SoC is
fabricated in a 0.35 µm CMOS technology with a silicon area of 19.1 mm²
(0.3 mm² gross/channel), and the AFE consumes only 32 µW of power per channel
(Fig. 12).
Ando et al. [15], in their research work “Wireless Multi-channel Neural Recording
with a 128 Mbps UWB Transmitter for Implantable Brain-Machine Interfaces”,
presented large-scale, long-term and bio-safe neural activity recording. Such data
can be used for brain-machine interfaces and to understand brain activity in clinical
applications. For this purpose, a new multichannel neural recording system allows
multiple custom embedded ASICs to be connected to record ECoG data scaled up
to 4096 channels. Each ASIC contains 64 channels of low-noise amplifiers, analog
multiplexers and a 12-bit SAR ADC. The recorded data is sampled at 1 kS/s; in
total, 51.2 Mbps of raw data is generated from the 4096 channels.
Figure 13 shows the neural recording ASIC architecture and the LNA and VGA
circuits. The device has a wireless ultra-wideband (UWB) unit to transmit the
collected neural signals. The ASICs, multiplexer boards and transmission units of

Fig. 11 Architecture of an implantable 10-channel analog front-end interface for NRS

UWB are specifically designed for implantation. Preliminary experiments with a
human-body-equivalent fluid phantom showed reliable 4096-channel UWB wireless
data transmission at 128 Mbps at distances below 20 mm.
Bahr et al. [16], in their work titled “Small Area, Low Power Neural Recording
Integrated Circuit in 130 nm CMOS Technology for Small Mammalians”, designed
a novel architecture for an NRFE. In neuroscience studies, genetic mouse disease
models are used for the analysis of brain development and also for the treatment

Fig. 12 System architecture showing the 64 channels grouped into 16 4-channel AFE blocks

Fig. 13 a Neural recording ASIC, b LNA and VGA

of diseases such as specific types of epilepsy. For the distinctive situation of
recording from neonatal mice, a custom integrated circuit is used. Neonatal mice are
only 2–3 cm in size and weigh only a few grams; the size and weight of the recording
system must therefore be very low. The IC utilizes 16 low-area analog differential
pre-amplifiers with a band-pass (0.5 Hz–10 kHz) characteristic. A mixed 8 × 1
multiplexer, post-amplifier and 10-bit SAR ADC structure digitizes the signals at
high resolution and transmits the digital data over an SPI. The IC was fabricated in
a 130 nm CMOS technology and was effectively used in in vivo measurements.
In the work entitled “A Fully Integrated Wireless Compressed Sensing Neural
Signal Acquisition System for Chronic Recording and Brain Machine Interface”,
Liu et al. [17] note that a reliable multichannel neural recording system is crucial
for neuroscience research and clinical therapy. For practical usage, the trade-offs
among functionality, power consumption, size, reliability and compatibility

need to be carefully considered. The work offers an optimized design for a wireless
compressed-sensing neural signal recording module that benefits from both a special
integrated circuit and universal wireless solutions.
The system is composed of an implantable wireless SoC and an external wireless
relay. The SoC integrates 16-channel neural LNAs, programmable filters and high-
gain stages, a SAR ADC, an embedded compressed sensing module and a near-field
wireless link for power and data transmission. The external relay includes a
Bluetooth 4.0 wireless module with a 32-bit low-power microcontroller, a
programming interface, and an inductive charging unit. With minimal power
consumption the SoC achieves outstanding signal performance and, by avoiding
connectors through the skin, decreases the risk of infection. Compatibility and
programmability are increased through the external relay. With a compression ratio
of 8 and an SNDR of 9.78 dB, the suggested compressed sensing module is highly
configurable. The SoC is produced in a conventional 180 nm CMOS technology,
covering a silicon area of 2.1 mm × 0.6 mm. To demonstrate the paradigm, a
pre-implantable device was developed and used effectively for long-term wireless
neural recording in a freely behaving rhesus monkey. Figure 14 illustrates the
continuous neural recording system with the fully integrated compressed sensing
chip.
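In compressed sensing, each transmitted measurement is a random projection of a window of samples, and reconstruction happens off-chip; below is a minimal sketch with a random Bernoulli sensing matrix and a compression ratio of 8 (illustrative parameters, not the on-chip implementation of [17]):

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 128, 16                           # window length, measurements (ratio 8)
Phi = rng.choice([-1, 1], size=(M, N))   # Bernoulli sensing matrix

x = np.sin(2 * np.pi * 3 * np.arange(N) / N)  # toy neural-signal window
y = Phi @ x                              # compressed measurements sent off-chip
print(x.size, "->", y.size, "samples")
# Reconstruction (e.g. by l1 minimization) is performed later on the receiver.
```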
In the work titled “A Time-Based, Digitally Intensive Circuit and System Archi-
tecture for Wireless Neural Recording with High Dynamic Range”, Hsu et al. [18]
proposed a new type of wireless NRFE based on pulse-width modulation (PWM).
The target signal is encoded in the varying width of a binary pulse, which gives a
larger dynamic range compared with common voltage-mode instrumentation
amplifiers. The NRFE is thus not saturated by stimulation artifacts, allowing
recording and stimulation simultaneously. The schematic of the proposed AFE is
shown in Fig. 15.
Together with the PWM front-end output, a set of counters triggered by a system
clock forms a first-order continuous-time delta-sigma modulator for signal
digitization; it can be treated as a modified voltage-controlled-oscillator (VCO)
based ADC. To eliminate supply ripple and to improve resolution, an inherent sinc
anti-aliasing filter and first-order noise shaping are employed. The proposed
architecture is digitally intensive, so low-power and low-voltage operation in scaled
technologies can be achieved. The work describes the theoretical analysis and
simulations based on behavioral models.
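The time-based encoding itself is straightforward to model; the sketch below (a behavioral illustration with an assumed frame period and clock rate, not the circuit of [18]) maps an input sample to a pulse width and digitizes it by counting system-clock cycles while the pulse is high:

```python
def pwm_digitize(x, t_frame=1e-5, f_clk=100e6):
    """Encode x in [-1, 1] as a pulse width within one PWM frame, then
    digitize it by counting system-clock cycles during the high phase."""
    duty = 0.5 + 0.5 * x               # input sets the duty cycle
    pulse_width = duty * t_frame       # time the output stays high
    return round(pulse_width * f_clk)  # counter value = digital code

print(pwm_digitize(0.0))   # mid-scale input: 500 counts
print(pwm_digitize(0.5))   # 750 counts
```

With these assumed parameters the full scale spans 1000 counts, i.e. roughly 10 bits per frame; finer resolution comes from the noise shaping described above rather than from a faster clock alone.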
In their work entitled “A Bidirectional Neural Interface Circuit with Active Stimu-
lation Artifact Cancellation and Cross-Channel Common-Mode Noise Suppression”,
Mendrela et al. [19] presented a bidirectional neural interface circuit with a
stimulation-artifact cancellation circuit. A common-average-referencing (CAR)
front-end circuit is used in the scheme; it suppresses environmental noise across
channels, making the system suitable for clinical use. The article also presents a new
range-adapting (RA) SAR ADC to decrease the power consumption.
The architecture of this NRFE is shown in Fig. 16. A prototype was fabricated and
characterized in a 0.18 µm CMOS process and tested in vivo in an epileptic rat
model. The prototype attenuates stimulation artifacts by up to 42 dB and removes up

Fig. 14 a Illustration of NRS using a fully integrated compressed sensing chip, and b chip
architecture

to 39.8 dB of cross-channel noise. The measured power consumption per channel is
330 nW, while the area of a single channel is 0.17 mm².
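Common average referencing is simple to state precisely: the instantaneous mean across all channels is subtracted from each channel, cancelling interference common to every electrode. A minimal sketch (illustrative, with synthetic data; not the analog implementation of [19]):

```python
import numpy as np

def common_average_reference(x):
    """x: (channels, samples) array. Subtract the per-sample mean across
    channels, cancelling noise common to all electrodes."""
    return x - x.mean(axis=0, keepdims=True)

rng = np.random.default_rng(2)
signal = rng.normal(0, 1, (8, 1000))   # per-channel neural activity
common = rng.normal(0, 10, (1, 1000))  # large shared interference
clean = common_average_reference(signal + common)
print(np.std(signal + common), "->", np.std(clean))  # ~10 -> ~1
```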
Liu et al. [20], in the work “Design of a Low-Noise, High Power Efficiency Neural
Recording Front-End With an Integrated Real-Time Compressed Sensing Unit”,
offered a high-efficiency, low-power NRFE for the acquisition of neural signals. It
is capable of recording both LFP and AP signals using

Fig. 15 Schematic of AFE

Fig. 16 Architecture of NRFE

its 12 channels. The suggested NRFE includes low-noise instrumentation amplifiers,
low-power filter circuits with suitably configured gains and cut-off frequencies, a
SAR ADC, and a real-time compressed sensing and processing unit. An embedded
capacitor-coupled instrumentation amplifier with input-impedance boosting was
developed that dissipates 1 µA of quiescent current. The measured input-referred
noise was 1.63 µVrms over the 1 Hz–7 kHz range, and the measured amplifier NEF
is 0.76. At a sampling rate of 1 MS/s, an ENOB of 10.6 bits was achieved by the
SAR ADC. A compressed sensing processing unit is incorporated in the design with
a configurable compression ratio of up to 8. Figure 17 shows the wireless neural
recorder's block diagram. The design was fabricated in a 180 nm CMOS process
and occupies a silicon area of 4.5 mm × 1.5 mm. With the custom IC and a
commercial low-power wireless module, a portable neural recorder was constructed;
a lithium battery weighing 4.6 g powers the device to record neural signals for up
to 70 h of continuous compressed sensing.

Fig. 17 A wireless portable NRFE IC with off-chip transmitter

Fig. 18 A multi-channel neural recording system

Tao et al. [21], in the work “A 0.8 V, 1 MS/s, 10-bit SAR ADC for Multi-Channel
Neural Recording”, presented a single-ended 10-bit Successive Approximation
Register (SAR) ADC suitable for multichannel neural signal recording (Fig. 18).
This ADC applies several power-saving methods to increase the power efficiency
of the system. The ADC is constructed with an on-chip common-mode buffer to
track the input; during the conversion process, this buffer is reused as the
pre-amplifier of a current-mode comparator. To reduce the capacitive load on the
amplifier, a small capacitor is placed between the amplifier and the capacitive DAC
array.
To decrease the switching power, a split capacitor array with dual thermometer
decoders is suggested. Implemented in 0.13 µm CMOS technology, this analog-to-
digital converter accomplished a differential non-linearity (DNL) of −0.33/0.56 LSB
(max), an integral non-linearity (INL) of −0.61/0.55 LSB (max), an ENOB of
8.8 bits, and 9 µW of power usage.
Widge et al. [22], in the work “An Implantable 64-channel Neural Interface with
Reconfigurable Recording and Stimulation”, presented the next generation of
implantable medical devices, which will be able to deliver more accurate and
efficient therapies using adaptive closed-loop controllers that combine sensing and
stimulation across a wide range of channels. A significant challenge in designing
these systems is balancing enhanced functionality and channel count against the
miniaturization needed for implantation in tiny anatomical spaces. Custom-made
therapies of this type require adaptive systems capable of tuning the sensed and
stimulated channels to address patient-specific variability, chronic physiological
responses and the surgical placement of electrodes. To tackle these difficulties, the
authors present a reduced-size, fully reconfigurable and implantable front-end
system incorporated at the distal end of an 8-wire lead. It allows dynamic
configuration for sensing and stimulation of up to 64 electrodes.

Two custom 32 × 2 cross-point switch (CPS) matrix application-specific integrated
circuits enable full configurability. They can route any of the electrodes either to an
amplifier with a reprogrammable bandwidth and an embedded analog-to-digital
converter, or to either of two autonomous stimulation channels that can be driven
through the 8-wire lead. The 8-wire interface includes a robust digital communication
interface along with a load-balanced power system for increased safety. The specified
device is embedded in a hermetic package intended to fit within a 14 mm burr hole
in the skull for brain neuromodulation, but it could readily be adapted to improve
therapies across a wide range of applications.

4 Analysis

Various architectures for the neural recording system front end were studied. The
studied NRFE architectures were implemented in different CMOS process
technologies. They were designed using different LNA circuits for reduced noise, in
combination with different ADC structures for good resolution and dynamic range.
Some of them include variable or programmable gain amplifiers, so that the gain can
be adjusted or adapted as necessary at a given instant to the captured neural spike
amplitude. The ADC structure decides the accuracy, resolution and dynamic range
of the NRFE, whereas the LNA and PGA are responsible for handling the signal
(sensitivity) levels of the neural signals.
SAR-ADC-based NRFE circuits achieved low power consumption through efficient
switching schemes, and they were able to operate at scalable sampling rates from
the lowest power supplies. VCO-based ADCs produced good results in terms of
power, resolution and ENOB compared with SAR ADCs, whereas the continuous-
time delta-sigma ADC achieved area efficiency using a current-steering DAC along
with low power consumption at a 10 kHz bandwidth. In the multichannel AFE
ASIC, the CMRR was increased with an area-efficient DRL circuit. Dynamic power
reduction was achieved by a scalable neural recording interface exploiting the
spatio-temporal correlation and temporal sparsity of neural signals. Implantable
wireless SoC systems with multichannel low-noise amplifiers, programmable filters,
programmable gain and SAR ADCs, combined with external wireless relays, were
designed to bring together custom integrated circuits and universally compatible
wireless solutions. Cross-channel environmental noise was suppressed by using a
CAR front-end circuit to enable use in clinical environments.

5 Conclusion

Power consumption and area consumption are two serious bottlenecks for the
scalability of present neural recording systems to a larger number of channels.
Future neural recording systems must solve these critical issues to realize an
efficient brain-machine interface with an implantable neuro-sensor. Various
techniques and considerations to reduce the power consumption per channel and the
area per channel in an NRFE were studied. For low-power requirements, delta-sigma
ADCs are suitable and provide high resolution, but they need more conversion time;
SAR ADCs are suitable for medium conversion speed and medium resolution.
Implementation in a 90 nm technology may further reduce the area and power
requirements.

References

1. Chen, K., Yang, Z., Hoang, L., Weiland, J., Humayun, M., Liu, W.: An integrated 256-channel
epi-retinal prosthesis. Solid-State Circ. 45(9), 1946–1956 (2010)
2. He, J., Ma, C., Herman, R.: Engineering neural interfaces for rehabilitation of lower limb
function in spinal cord injured. Proc. IEEE 96(7), 1152–1166 (2008)
3. Amar, A., Levy, M., Liu, C., Apuzzo, M.: Vagus nerve stimulation. Proc. IEEE 96(7), 1142–
1151 (2008)
4. Harrison, R., Kier, R., Chestek, C., Gilja, V., Nuyujukian, P., Ryu, S., Greger, B., Solzbacher,
F., Shenoy, K.: Wireless neural recording with single low-power integrated circuit. Neural Syst.
Rehabil. Eng. IEEE Trans. 17(4), 322–329 (2009)
5. Chaturvedi, V., Amrutur, B.: An area-efficient noise-adaptive neural amplifier in 130 nm CMOS
technology. Emerg. Sel. Top. Circ. Syst. PP(99), 1–10 (2011)
6. Tong, X., Wang, J.: A 1 V 10 bit 25kS/s VCO-based ADC for implantable neural recording.
In: IEEE Conference (2017)
7. Yeon, P., Bakir, M.S., Ghovanloo, M.: Towards a 1.1 mm2 free-floating wireless implantable
neural recording SoC. In: IEEE Conference (2018)
8. Shui, B., Keller, M., Kuhl, M., Manoli, Y.: A 70.8 dB 0.0045 mm2 low-power continuous-
time incremental delta-sigma modulator for multi-site neural recording interfaces. In: IEEE
Conference (2018)
9. Park, S.-Y., Cho, J., Lee, K., Yoon, E.: Dynamic power reduction in scalable neural recording
interface using spatiotemporal correlation and temporal sparsity of neural signals. IEEE J. Solid
State Circ. (2018)
10. Kim, C., Joshi, S., Courellis, H., Wang, J., Miller, C., Cauwenberghs, G.: Sub-μVrms-noise
sub-μW/channel ADC-direct neural recording with 200-mV/ms transient recovery through
predictive digital autoranging. IEEE J. Solid-State Circ. (2018)
11. Tong, X., Wang, R.: A 0.6 V 10 bit 120kS/s SAR ADC for implantable multichannel neural
recording. In: IEEE Conference (2017)
12. Rehman, S.U., Kamboh, A.M., Yang, Y.: A 79 µW 0.24 mm2 8-channel neural signal recording
front-end integrated circuit. In: IEEE Conference (2017)
13. Tang, T., Goh, W.L., Yao, L., Cheong, J.H., Gao, Y.: An integrated multichannel neural record-
ing analog front-end ASIC with area-efficient driven right leg circuit. In: IEEE Conference
(2017)
14. Luan, S., Liu, Y., Williams, I., Constandinou, T.G.: An event-driven SoC for neural recording.
In: IEEE Conference (2016)

15. Ando, H., Yoshida, T., Matsushita, K., Hirata, M., Suzuki, T.: Wireless multichannel neural
recording with a 128 Mbps UWB transmitter for implantable brain-machine interfaces. IEEE
Trans. (2015)
16. Bahr, A., Saleh, L.A., Hinsch, R., Schroeder, D., Isbrandt, D., Krautschneider, W.H.: Small
area, low power neural recording integrated circuit in 130 nm CMOS technology for small
mammalians. In: 2016 IEEE (2016)
17. Liu, X., Zhang, M., Xiong, T., Richardson, A.G., Lucas, T.H., Chin, P.S., Etienne-Cummings,
R., Tran, T.D., Van der Spiegel, J.: A fully integrated wireless compressed sensing neural
signal acquisition system for chronic recording and brain machine interface. IEEE Trans.
10(4), 874–883 (2016)
18. Hsu, W.-Y., Cao, C., Schmid, A.: A time-based, digitally intensive circuit and system architec-
ture for wireless neural recording with high dynamic range. In: 2016 IEEE 59th International
Midwest Symposium on Circuits and Systems (MWSCAS), 16–19 Oct. 2016, Abu Dhabi,
UAE (2016)
19. Mendrela, A.E., Cho, J., Fredenburg, J.A., Nagaraj, V., Netoff, T.I., Flynn, M.P., Yoon, E.:
A bidirectional neural interface circuit with active stimulation artifact cancellation and cross-
channel common-mode noise suppression. IEEE J. Solid-State Circ. (2015)
20. Liu, X., Zhu, H., Zhang, M., Richardson, A.G., Lucas, T.H., Van der Spiegel, J.: Design of
a low-noise, high power efficiency neural recording front-end with an integrated real-time
compressed sensing unit. 2015 IEEE, pp. 2996–2999 (2015)
21. Tao, Y., Lian, Y.: A 0.8 V, 1MS/s, 10-bit SAR ADC for multi-channel neural recording. IEEE
Trans. IEEE (2014)
22. Widge, A.S., Dougherty, D.D., Eskandar, E.N.: An implantable 64-channel neural interface
with reconfigurable recording and stimulation. In: 2015 EU, pp. 7837–7840 (2015)
Machine Learning Techniques
for Thyroid Disease Diagnosis:
A Systematic Review

Shaik Razia, P. Siva Kumar and A. Srinivasa Rao

Abstract In disease diagnosis, the recognition of patterns is important for identifying
the disease accurately. Machine learning is the field concerned with building models
that can predict outputs from inputs, based on correlations learned from previous
data. Disease identification is the most crucial task in treating any disease, and
classification algorithms are used for classifying the disease; several classification
algorithms and dimensionality reduction algorithms are in use. Machine learning
gives computers the capacity to learn without being explicitly programmed. Using
a classification algorithm, the hypothesis that best fits a set of observations can be
selected from a set of alternatives. Machine learning is used for high-dimensional
and multi-dimensional data, and elegant, automatic algorithms can be developed
with it.

Keywords Machine learning · Classification algorithms · Decision trees · KNN · K-means · ANN

1 Introduction

Disease diagnosis is abbreviated as Dx or Ds. It is the process of determining
which disease explains a person's symptoms. Many signs and symptoms are non-
specific, which makes diagnosis the most challenging job. Disease diagnosis can
be performed using machine learning techniques: a model can be developed in
which the user enters his or her symptoms and the model predicts a particular
disease. Machine learning gives computers the capacity to learn without being
explicitly programmed.
There are several types of machine learning:


1.1 Supervised

It can be seen as the machine learning task of inferring a function from labelled
training data. The training data consist of a set of training examples, each of which
is a pair of an input object (typically a vector) and a desired output value (also
called the supervisory signal).
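As a minimal illustration of supervised classification (using scikit-learn on synthetic data as a stand-in for a labelled medical dataset; the dataset and all parameters here are hypothetical):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for a labelled dataset (features + diagnosis label).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(max_depth=4, random_state=0)
clf.fit(X_train, y_train)                       # learn from labelled examples
print(accuracy_score(y_test, clf.predict(X_test)))  # accuracy on unseen inputs
```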

1.2 Unsupervised

It can be seen as the machine learning task of drawing inferences from datasets that
contain input data without labelled responses. Cluster analysis is the most widely
recognized unsupervised learning technique; it is used in data exploration to discover
hidden patterns.

1.3 Deep Learning

Deep learning uses artificial neural networks with many layers to learn hierarchical
representations of data directly from raw inputs. It can be applied in supervised,
unsupervised or semi-supervised settings and is particularly effective on large,
high-dimensional datasets such as images and signals.

1.4 Semi Supervised

This learning strategy sits between supervised learning, which uses labelled data,
and unsupervised learning, which uses unlabelled data: it typically uses a small
amount of labelled data together with a large amount of unlabelled data for training.

1.5 Reinforcement

This type of learning tells the algorithm when an answer is wrong but does not
indicate how to correct it; the algorithm has to test different possibilities until it
finds the right one (Fig. 1).

2 Literature Survey

In 2013, Tiwari and Diwan [1] presented a paper that gives an automatic approach
to identifying the hidden patterns of cancer disease. The system uses data mining
techniques such as association rules and clustering; the steps involved are data
collection, data processing, categorization of the dataset and rule mining. Attribute-
based clustering for feature selection is an important contribution of this paper:
vertical fragmentation is applied to the dataset, which is divided into two clusters,
one containing all the relevant attributes and the other all the irrelevant attributes.
In 2006, Peleg and Tu [2] presented a paper named "Decision Support, Knowledge
Representation and Management". A clinical decision support system is a complete
program designed to help health professionals make clinical decisions, and it has
been considered an active knowledge system. The main objective of a modern
clinical system is to assist clinicians at the point of care and to provide the needed
information within the health care organization's dynamics. Decision support
systems are implemented through standardization of the information system
infrastructure. The system gives its support in the complex

Fig. 1 Types of machine learning

tasks of differential diagnosis and therapy planning. The system relies on a
knowledge modelling task in which modelers capture the medical knowledge that
enables the system to deliver appropriate decision support. The developers of such
a system have two knowledge management tasks: project-oriented tasks that
elucidate the organizational goals and responsibilities, and the communication and
coordination patterns of the care process in which the system has to operate.
In August 2013, Mohammed Abdul Khaleel presented a survey of medical data
mining for finding frequent diseases [3]. The paper concentrates on examining the
data mining strategies required for medical data mining, especially for finding
frequent diseases such as heart disease, lung cancer and breast cancer. Data mining
is the process of extracting data to discover latent patterns which can be translated
into valuable information. The data mining techniques applied to medical data
include Apriori, FP-Growth, unsupervised neural networks, linear programming
and association rule mining; association rule mining discovers frequently occurring
item sets in a given dataset. Medical mining yields the business intelligence required
to support well-informed diagnoses and decisions.
Vembandasamy et al. [4] performed a work to diagnose heart disease using the
Naive Bayes algorithm. Naive Bayes applies Bayes' theorem and therefore makes
a strong independence assumption. The dataset used was obtained from one of the
leading diabetic research institutes in Chennai, Tamil Nadu, and contains more than
500 patients. The tool used is Weka, and classification is executed using a 70%
percentage split. The accuracy offered by Naive Bayes is 86.419%.
Chaurasia and Pal [5] suggested applying data mining approaches to detect heart
disease. The data mining tool WEKA, which contains a set of machine learning
algorithms for mining, was used with Naive Bayes, J48 and bagging. The heart
disease dataset, provided by the UCI machine learning repository, consists of 76
attributes, of which only 11 are used for prediction. Naive Bayes gives 82.31%
accuracy, J48 gives 84.35%, and bagging achieves 85.03%; bagging therefore gives
the best classification on this dataset.
The researchers Parthiban and Srivatsa [6] worked on detecting heart disease in diabetic patients using machine learning techniques. Naive Bayes and SVM methods were applied using WEKA. The data set, gathered from a research institute in Chennai, Tamil Nadu, contains 500 patients, of whom 142 have the disease and 358 do not. An accuracy of 74% was achieved using the Naive Bayes algorithm, while the highest accuracy of 94.60% was achieved using SVM.
The researcher Tan et al. [7] presented a hybrid strategy in which two machine learning methods, the Genetic Algorithm (GA) and the Support Vector Machine (SVM), are combined effectively through a wrapper approach. The tools used in this investigation are LIBSVM and WEKA. Five data sets (Iris, diabetes, breast cancer, heart disease and hepatitis) were taken from the UC Irvine machine learning repository. After applying the GA and SVM hybrid approach, an accuracy of 84.07% was achieved for heart disease, 78.26% precision for the diabetes data set, 76.20% for breast cancer, and 86.12% for hepatitis.
The researcher Iyer et al. [8] worked on predicting diabetes using two methods, decision trees and Naive Bayes. The disease can occur when insulin production is inadequate or insulin is used improperly. The Pima Indian diabetes data set was used in this work, and various tests were performed using the WEKA data mining tool. J48 gives 74.8698% and 76.9565% precision using cross validation and percentage split, respectively, while Naive Bayes gives 79.5652% accuracy using percentage split. Both algorithms give their highest accuracy using the percentage split test.
The researchers Sarwar and Sharma [9] worked on Naive Bayes for predicting Type-2 diabetes. There are three types of diabetes: Type-1 is the first, Type-2 the second, and gestational diabetes the third. Type-2 diabetes arises from increased insulin resistance. The data set contains 415 cases and, for the sake of variety, the information was gathered from different parts of society in India. MATLAB with SQL Server was used to develop the model. An accuracy of 95% was achieved using Naive Bayes.
The researcher Ephzibah [10] built a model for diagnosis of diabetes that combines a GA with fuzzy logic. It is used to choose the best subset of features and also to improve classification accuracy. The data set, taken from the UCI machine learning repository, has 8 attributes and 769 test cases. MATLAB was used for the implementation. Just three best features/attributes are chosen using the genetic algorithm; the fuzzy logic classifier uses these three attributes and gives 87% correctness.
The researchers Fatima and Pasha [11] in 2017 studied why machine learning is so essential in disease determination, and its precision in predicting illnesses in which pattern recognition is learnt from examples. Such a model is used in the decision-making process when anticipating disease. The paper compared diseases such as heart disease, diabetes, liver disease, dengue and hepatitis. Finally, they concluded that statistical models fail to handle the categorical information which plays a critical part in disease prediction.
The researcher Anju Jain [12] in 2015 studied how extracting data from various sources raises problems such as heterogeneous, unorganized, high-dimensional data that may contain missing values and outliers. Mining the data accurately, using data pre-processing techniques such as feature scaling and similar techniques for noise removal and missing-data handling, can be used to build a model whose accuracy certainly improves, and this will be useful in more complex biological situations.
The researcher Alic [13] in 2017 worked on a comparative analysis of the most commonly used disease prediction techniques, Artificial Neural Networks (ANN) and Bayesian Networks (BN), for early-stage classification of diabetes. Higher accuracy is achieved by ANN, with 89.78%, compared with 80.43% for BN, owing to the assumed independence between observed nodes; hence ANNs are the better way of predicting these diseases.
The researchers Vijayarani and Dhayanand [14] predicted liver disease using the Support Vector Machine (SVM) and Naive Bayes classification algorithms. The ILPD data set was acquired from UCI and comprises 560 instances and 10 attributes. The comparison is made in terms of precision and execution time. Naive Bayes gives 61.28% accuracy within 1670.00 ms, while 79.66% correctness is obtained within 3210.00 ms using SVM. MATLAB was used for the implementation. SVM gives the highest accuracy for the prediction of liver disease compared with Naive Bayes; regarding execution time, Naive Bayes takes less time than SVM.
The researcher Gulia et al. [15] performed an examination of intelligent methods used to classify patients having liver disease. The study used a data set taken from UCI and the data mining tool WEKA. Five intelligent classifiers were used: J48, Random Forest, MLP, SVM and Bayesian Network. In stage 1, all the chosen algorithms are applied to the original data set to obtain baseline accuracy. In stage 2, feature selection is applied to the whole data set to obtain a subset of liver patients, and all the chosen algorithms are used for testing the subset. In stage 3 the results before and after feature selection are compared. After feature selection, the algorithms give their highest correctness: J48 gives 70.669%, MLP 70.8405%, SVM 71.3551%, Random Forest 71.8696%, and Bayes Net 69.1252%.
The researchers Fathima and Manimegalai [16] worked on prediction of the Arbovirus-Dengue disease. The data mining algorithm used is the Support Vector Machine (SVM). The data set used for the investigation was taken from the King Institute of Preventive Medicine, Chennai, and from surveys of numerous hospitals and laboratories in Tirunelveli, India. It contains 5000 samples and 29 attributes. R version 2.12.2 was used for examining the data. The accuracy obtained by SVM is 0.9042.
The researcher Karlik [17] presents a study comparing back propagation and Naive Bayes classifiers for diagnosing hepatitis. A fundamental advantage of these classifiers is that only a small amount of data is needed for classification. There are several types of hepatitis (A, B, C, D and E), caused by different viruses. The open-source software Rapid Miner was used in this study. The data set, obtained from UCI, comprises 155 cases and 20 features. With these attributes, an accuracy of 97% was obtained from the Naive Bayes classifier.
The researcher Keramidas [18] presented a USG image analysis method for detecting the boundary of a thyroid nodule. Initially a Region of Interest (ROI) is selected, and then the Thyroid Boundary Detection (TBD) algorithm is applied. The K-Nearest Neighbour (k-NN) algorithm was selected as a powerful and useful classification method. The method works well on longitudinal USG images.
In 2015, the researcher Baby [19] developed a model that takes a kidney disease data set of patients and predicts the type of kidney disease. The model used several classification algorithms, such as random forests, ADTrees, J48 and K-means, and a comparison of the results showed that random forest gives better results than the other algorithms.
In 2017, the researcher Sontakke [20] studied and compared two methodologies, classical machine learning and Artificial Neural Networks (ANN), on reported deaths classified by different types of liver disease, with ANN obtaining better results. The field of disease diagnosis is going to see many more advancements in the coming years.
In 2017, Razia [21] developed a framework model to diagnose thyroid disease using machine learning techniques. Unsupervised and supervised learning are used to diagnose thyroid disease and are compared with a decision tree model; ultimately the framework model outperformed the decision tree model.

3 Algorithms in ML for Disease Diagnosis

Several algorithms are used in disease prediction, and they fall mainly into two phases.
Pre-processing: In pre-processing we have several techniques for cleaning the data, where we need to combine heterogeneous, high-dimensional data containing noise and missing values. In the next step we apply feature scaling to the data so that new data entered into the model can be predicted correctly. In the last step of this phase we apply dimensionality reduction techniques to combine the data, as illustrated in the sketch below.
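As a rough illustration of this pre-processing phase, the following is a minimal Python sketch using scikit-learn; the array values and the choice of mean imputation are assumptions made purely for illustration:

    import numpy as np
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler

    # Hypothetical patient records with missing entries (np.nan)
    X = np.array([[63.0, 1.2, np.nan],
                  [54.0, np.nan, 98.0],
                  [71.0, 0.9, 110.0]])

    # Fill each missing value with its column mean
    X_filled = SimpleImputer(strategy="mean").fit_transform(X)

    # Feature scaling: zero mean and unit variance per column
    X_scaled = StandardScaler().fit_transform(X_filled)
    print(X_scaled)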
For reducing the dimensions, several algorithms are available, such as the following.

3.1 Principal Component Analysis (PCA)

In this method an orthogonal transformation, a statistical technique, converts a set of possibly correlated variables into a set of linearly uncorrelated variables known as principal components.
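As a brief illustration, a minimal scikit-learn sketch of PCA is shown below; the bundled breast cancer dataset and the choice of three components are assumptions for illustration only:

    from sklearn.datasets import load_breast_cancer
    from sklearn.decomposition import PCA

    X, y = load_breast_cancer(return_X_y=True)   # 30 correlated attributes

    # Orthogonal projection onto 3 uncorrelated principal components
    pca = PCA(n_components=3)
    X_reduced = pca.fit_transform(X)

    print(X_reduced.shape)                 # (569, 3)
    print(pca.explained_variance_ratio_)   # variance captured per component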
3.2 Linear Discriminant Analysis (LDA)

It is a technique used in statistics and pattern recognition to discover a linear combination of features in a machine learning approach. The result can be used for linear classification.
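A matching sketch for LDA on the same illustrative dataset follows; with two classes the data projects onto a single discriminant axis, and the fitted model doubles as a linear classifier:

    from sklearn.datasets import load_breast_cancer
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X, y = load_breast_cancer(return_X_y=True)

    # Find the linear combination of features that best separates the classes
    lda = LinearDiscriminantAnalysis(n_components=1)
    X_proj = lda.fit_transform(X, y)

    # Use the same fitted model for linear classification
    print(lda.score(X, y))   # training accuracy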
The next phase contains several machine learning classification algorithms, described below.

3.3 Decision Trees

The decision tree is one of the most important and most used classification algorithms. It uses a divide-and-conquer strategy to build a tree. There is a set of instances, each associated with a collection of attributes. A decision tree comprises nodes and leaves, in which nodes are tests on the values of attributes and leaves are the classes of the examples that satisfy the given conditions; the outcome of a test may be "true" or "false". Rules can be obtained from the path that begins at the root node and finishes at a leaf node, using the nodes along the way as preconditions of the rule, to predict the class at the leaf. Tree pruning must be done to remove needless preconditions and duplications.
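A short sketch of a decision tree classifier in scikit-learn is given below; the dataset and the depth limit (standing in for pruning) are illustrative assumptions:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Limiting the depth acts as a simple form of pruning
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)
    tree.fit(X_train, y_train)

    print(tree.score(X_test, y_test))   # accuracy on held-out examples
    print(export_text(tree))            # the root-to-leaf rules as text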

3.4 K-means

It can also be called k-means grouping. This is a technique for vector quantization, originally from signal processing, that is widely used for cluster analysis in data mining. We can use a 1-nearest-neighbour classifier on the cluster centres obtained from k-means to assign new data to the already existing groups.
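The following sketch illustrates that idea on synthetic data: k-means finds the cluster centres, and a 1-nearest-neighbour lookup on those centres assigns new points to the existing groups (KMeans.predict performs the same centre lookup internally):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))          # unlabelled data (synthetic)

    # Vector quantization: partition the data into 3 clusters
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

    # Assign new points to the existing groups via the nearest cluster centre
    new_points = rng.normal(size=(5, 2))
    nn = NearestNeighbors(n_neighbors=1).fit(km.cluster_centers_)
    _, nearest_centre = nn.kneighbors(new_points)
    print(nearest_centre.ravel())          # cluster index for each new point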

3.5 k-NN

It stands for K-nearest neighbours. Although often presented alongside K-means, k-NN is a supervised classification technique rather than a clustering one: it does not use the cluster mean and distance; instead, a new sample is assigned the class held by the majority vote of its k nearest neighbours.
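A minimal k-NN classification sketch (the dataset and k = 5 are illustrative assumptions):

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Each test sample receives the majority class of its 5 nearest neighbours
    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(X_train, y_train)
    print(knn.score(X_test, y_test))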
4 Conclusion

Machine learning is a kind of brute-force mechanism that tries to find correlations between the numerical attributes of the inputs and the matching outputs based on previous data. In other words, no single algorithm is ideally suited to disease prediction unless there is enough labelled data, so as of now there exist some limitations even for machine learning algorithms.

References

1. Tiwari, V., Diwan, T.D., Miri, R.: Design and implementation of an efficient relative model in
cancer disease recognition. IJARCSSE (2013)
2. Peleg, M., Tu, S.: Decision support, knowledge representation and management. IMIA (2006)
3. Khaleel, M.A.: A survey of data mining techniques on medical data for finding frequent
diseases. IJARCSSE (2013)
4. Vembandasamy, K., Sasipriya, R., Deepa, E.: Heart diseases detection using naive bayes
algorithm. IJISET (2015)
5. Chaurasia, V., Pal, S.: Data mining approach to detect heart disease. IJACSIT, 56–66 (2013)
6. Parthiban, G., Srivatsa, S.K.: Applying machine learning methods in diagnosing heart disease
for diabetic patients. IJAIS, 25–30 (2012)
7. Tan, K.C., Teoh, E.J., Yu, Q., Goh, K.C.: A hybrid evolutionary algorithm for attribute selection
in data mining. IJDKP, 8616–8630 (2009)
8. Iyer, A., Jeyalatha, S., Sumbaly, R.: Diagnosis of diabetes using classification mining
techniques. IJDKP, 1–14 (2015)
9. Sarwar, A., Sharma, V.: Intelligent naive bayes approach to diagnose diabetes type-2.
ICNICT, 14–16 (2012)
10. Ephzibah, E.P.: Cost effective approach on feature selection using genetic algorithms and fuzzy logic for diabetes diagnosis. IJSC (2011)
11. Fatima, M., Pasha, M.: Survey of machine learning algorithms for disease diagnostic. J. Intell.
Learn. Syst. Appl. 1–16 (2017)
12. Jain, A.: Machine learning techniques for medical diagnosis. ICSTAT (2015)
13. Alic, B.: Machine learning techniques for classification of diabetes and cardiovascular diseases. Mediterr. Conf. Embed. Comput. (2017)
14. Vijayarani, S., Dhayanand, S.: Liver disease prediction using SVM and naive bayes algorithms.
Int. J. Sci. Eng. Technol. Res. (IJSETR) 4, 816–820 (2015)
15. Gulia, A., Vohra, R., Rani, P.: Liver patient classification using intelligent techniques. IJCSIT,
5011–5115 (2014)
16. Fathima, A., Manimegalai, D.: Predictive analysis for the arbovirus-dengue using SVM
classification. Int. J. Eng. Technol. 521–527 (2012)
17. Karlik, B.: Hepatitis disease diagnosis using back propagation and the naive bayes classifiers.
J. Sci. Technol. 1, 49–62 (2011)
18. Keramidas, E.G.: Efficient and effective image analysis for thyroid nodule detection. ICIAR, 1052–1060 (2007)
19. Baby, P.: Statistical analysis and predicting kidney diseases using machine learning algorithms.
IJERT (2015)
20. Sontakke, S., Lohokare, J.: Diagnosis of liver diseases using machine learning. ICEI (2017)
21. Razia, S., Narasingarao, M.R.: A neuro computing framework for thyroid disease diagnosis using machine learning techniques. J. Theor. Appl. Inf. Technol. 95(9), 1996–2005. ISSN: 1992-8645, E-ISSN: 1817-3195. www.jatit.org
22. Kavuri, M., Prakash, K.B.: Performance comparison of detection, recognition and tracking
rates of the different algorithms. Int. J. Adv. Comput. Sci. Appl. 10(6), 153–158 (2019)
23. Pathuri, S.K., Anbazhagan, N.: Basic review of different strategies for sentiment analysis in
online social networks. Int. J. Recent. Technol. Eng. 8(1) (2019). ISSN: 2277-3878
24. Padma, G.V., Kishore, K.H., Sindura, S.J.: Controlling the traffic interactions with high mobility
and constant network connectivity by vanets. Lect. Notes Electr. Eng. 593–601 (2018). ISSN
No: 1876-1100, E-ISSN: 1876-1119
25. Yadlapati, A., Kakarla, H.K.: Constrained level validation of serial peripheral interface protocol.
Smart Innov. Syst. Technol. 77, 743–753 (2018). ISSN No: 2190-3018, E-ISSN: 2190-3026
26. Murali, A., Kishore, K.H., Srikanth, L., Rao, A.T., Suresh, V.: Implementation of reconfigurable
circuit with watch-points in the hardware. Lect. Notes Electr. Eng. 657–664 (2018). ISSN No:
1876-1100, E-ISSN: 1876-1119
27. Razia, S. and Rao, M.N.: Machine learning techniques for thyroid disease diagnosis - a review.
(INDJST) Indian J. Sci. Technol. 9(28), Article number 93705 (2016). ISSN: 09746846
28. Razia, S., Narasingarao, M.R. and Sridhar, G.R.: A decision support system for prediction of
thyroid disease- a comparison of multilayer perceptron neural network and radial basis function
neural network. (JATIT) J. Theor. Appl. Inf. Technol. 80(3) (2015) ISSN: 1992-8645, E-ISSN:
1817-3195. www.jatit.org
29. Razia, S., Narasingarao, M.R.: Development and analysis of support vector machine techniques
for early prediction of breast cancer and thyroid. (JARDCS) J. Adv. Res. Dyn. Control. Syst.
9(6), 869–878 (2017). ISSN: 1943-023X
Heuristic Approach to Evaluate
the Performance of Optimization
Algorithms in VLSI Floor Planning
for ASIC Design

S. Nazeer Hussain and K. Hari Kishore

Abstract This paper addresses the physical layout problem of VLSI floorplanning, using optimization methods to improve VLSI chip efficiency. VLSI floorplanning is regarded as a non-polynomial (NP-hard) problem, and heuristic algorithms can solve such problems. Floorplan representation is the basis of this process: the depiction of the floorplan has a strong effect on the search space as well as the design complexity of the floorplan. This article aims at exploring various algorithms that address the problem of managing alignment constraints, such as excellent positioning, optimal area and brief run time. Many researchers have proposed diverse heuristic algorithms, as well as distinct metaheuristic algorithms, to solve the VLSI floorplanning problem. Simulated annealing, tabu search, the ant colony optimization algorithm and, finally, the genetic optimization algorithm are addressed in this article.

Keywords Circuit · Design · Floorplanning · System · VLSI

1 Introduction

With rapid technological modifications and improvements, the complexity of circuit design is growing, and the area a design occupies therefore plays a crucial role in the design of circuits. Physical design begins with the initial phase of floorplanning, which determines block sizes and the places where the blocks are located in an IC, keeping in mind the goals of minimum area and minimum interconnecting wire length. The floorplan representation governs the trade-off between size and complexity of the floorplan. Floorplanning in VLSI is regarded as an NP-hard problem: as the number of modules increases, finding the optimal solution can become very hard, which makes the chosen representation of the floorplan critical. Floorplan quality depends purely on how well it is represented. Figure 1 shows the

S. Nazeer Hussain (B) · K. Hari Kishore


Department of ECE, Koneru Lakshmaiah Educational Foundation, Vaddeswaram, Guntur, Andhra
Pradesh 522502, India
K. Hari Kishore
e-mail: kakarla.harikishore@kluniversity.in
VLSI system design flow. The design flow involves various steps: providing specifications to the system using schematic or HDL coding, producing the architectural design, checking the functional and logic design, followed by physical design and verification, fabrication and packaging.
This article focuses primarily on the physical design method. Physical design: during this processing stage the geometry of blocks, such as size and shape, is allocated spatial locations, and appropriate connections for routing are created so as to obtain the ideal area. As a result of this method, a set of production requirements must subsequently be verified [1].
The physical design step is split into sub-stages: partitioning the system, floorplanning, placement, routing and so on. Breaking a circuit down into several manageable-size sub-circuits is called partitioning.
The floorplan is used for estimating the reliability, performance and size of VLSI ICs. The objective is to assign space to the circuit modules so that there is no chance of modules overlapping each other.
Placement allocates circuit modules based on their geometries. Design components are of the same width in the standard cell array method, while in the macro cell method the design components are of varying dimensions; they are placed invariably with the aim of achieving the optimal IC design area.
Routing is divided into two steps: the first is global routing and the second is detailed routing. The routing process attempts to determine how to interconnect the distinct modules available on the chip. Physical design has a direct effect on circuit performance, reliability and area: chip performance is hurt by lengthy routes, as they lead to significantly longer delays; chip area is affected by uneven component placement; and so on.

1.1 Need for Optimization

In relation to the widespread adoption of advanced microelectronic systems, rising technological requirements have generated exceptional demand for large-scale, complicated, embedded circuits. Meeting these requirements demands technological advances in both materials and processing facilities, notable increases in the number of people engaged in designing an integrated circuit, and greater emphasis on using the computer effectively to assist in design.
Physical design is a complicated optimization problem with multiple distinct goals, such as optimizing the interconnecting wire length and reducing area as well as vias. Collective optimization goals are improvements in performance, reliability, etc.
Optimization can be defined as maximizing or minimizing a function over a set that often represents the range of possibilities available in a specific situation.
1.2 Existing Optimization Algorithms Overview

Floorplanning helps determine the position of each module so as to achieve the lowest surface area and the least interconnecting wire length. Researchers have worked on various meta-heuristic and heuristic algorithms together with floorplan representations. Examples are the simulated annealing algorithm, the tabu search optimization algorithm, the ant colony algorithm and the genetic algorithm.

1.3 Simulated Annealing Algorithm

The simulated annealing algorithm was suggested by Kirkpatrick, Gelatt and Vecchi in the early 1980s and by Cerny from 1985 onwards. The inspiration for this strategy is to discover the optimal solution based on the correlation between the physical annealing process of solids and the problem of solving a big combinatorial optimization problem.
Annealing involves reaching a high-energy state of a solid by melting it at a greater temperature and then gradually reducing the temperature (annealing), placing the elements in the ground state [2]. The process flow is described as follows (a code sketch is given after the steps):
Step 1: Initially heat the metal to a high temperature.
Step 2: As the temperature decreases gradually, crystals form.
Step 3: Reducing the temperature very slowly helps the material settle into its low-energy ground state.
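A generic sketch of this loop in Python is given below; the cooling schedule, the toy wire-length cost and the swap neighbourhood are illustrative assumptions, not any author's implementation:

    import math
    import random

    def simulated_annealing(cost, initial, neighbour,
                            t_start=1000.0, t_end=1e-3, alpha=0.95):
        """Cool slowly, occasionally accepting worse solutions."""
        state, temp = initial, t_start
        best, best_cost = state, cost(state)
        while temp > t_end:
            candidate = neighbour(state)
            delta = cost(candidate) - cost(state)
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / T), which shrinks as T cools
            if delta < 0 or random.random() < math.exp(-delta / temp):
                state = candidate
                if cost(state) < best_cost:
                    best, best_cost = state, cost(state)
            temp *= alpha  # gradual cooling
        return best, best_cost

    # Toy use: order 20 blocks to minimise a hypothetical wire-length cost
    random.seed(1)
    blocks = list(range(20))
    wirelength = lambda order: sum(abs(a - b) for a, b in zip(order, order[1:]))

    def swap_two(order):
        i, j = random.sample(range(len(order)), 2)
        out = order[:]
        out[i], out[j] = out[j], out[i]
        return out

    print(simulated_annealing(wirelength, blocks, swap_two)[1])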

1.4 Tabu Search Algorithm

One of the meta-heuristic approaches to the floorplanning problem with non-slicing constraints is tabu search. It belongs to the category of iterative heuristics intended to provide solutions to combinatorial optimization problems. The algorithm is a generalization of local search that hunts for the best change within the current neighbourhood of the solution. Unlike a plain local search algorithm, TS is not trapped in local optima, as the method agrees to make even bad moves as long as unvisited solutions are reached [3].
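A simplified sketch follows, in which the tabu list stores recently visited solutions rather than moves (a common simplification); the toy cost and neighbourhood are illustrative assumptions:

    import random

    def tabu_search(cost, initial, neighbours, iters=200, tenure=10):
        """Local search that forbids recently visited solutions."""
        current = initial
        best, best_cost = current, cost(current)
        tabu = []
        for _ in range(iters):
            # Take the best non-tabu neighbour; even a worsening move is
            # accepted, which lets the search escape local optima
            candidates = [n for n in neighbours(current) if n not in tabu]
            if not candidates:
                break
            current = min(candidates, key=cost)
            tabu.append(current)
            if len(tabu) > tenure:
                tabu.pop(0)            # expire the oldest tabu entry
            if cost(current) < best_cost:
                best, best_cost = current, cost(current)
        return best, best_cost

    # Toy use: the same hypothetical block-ordering cost as above
    random.seed(2)
    order = list(range(20))
    wirelength = lambda o: sum(abs(a - b) for a, b in zip(o, o[1:]))

    def adjacent_swaps(o):
        result = []
        for i in range(len(o) - 1):
            n = o[:]
            n[i], n[i + 1] = n[i + 1], n[i]
            result.append(n)
        return result

    print(tabu_search(wirelength, order, adjacent_swaps)[1])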

1.5 Ant Colony Optimization Algorithm

Ant Colony Optimization (ACO) is a population-based optimization method used to discover the best solution to complicated optimization problems. It has three phases: initialization, construction and feedback. The initialization phase consists of setting parameters such as the number of ants and the colony number. The construction stage includes building the route based on the concentration of pheromones. The feedback phase comprises the extraction as well as the reinforcement of the ants' travelling experiences during path searching [4].
The TSP also plays a crucial role in ant colony search, helping to find the shortest path between the nest and the food source. As given in the flowchart, the parameters must initially be set; an ant selects a town and builds its route, moving to the chosen town; if the distance reached is short enough, the process stops; otherwise the pheromone values are updated and the parameter-setting steps are repeated.
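A compact sketch of the construction and feedback phases on a toy travelling-salesman instance is shown below; every parameter value here is an illustrative assumption:

    import random

    def ant_colony_tsp(dist, n_ants=20, n_iter=50, evap=0.5, q=100.0):
        """Build tours from pheromone levels, then reinforce short tours."""
        n = len(dist)
        pher = [[1.0] * n for _ in range(n)]
        best_tour, best_len = None, float("inf")
        for _ in range(n_iter):
            tours = []
            for _ in range(n_ants):
                tour = [random.randrange(n)]
                while len(tour) < n:
                    cur = tour[-1]
                    rest = [c for c in range(n) if c not in tour]
                    # Next town chosen with probability proportional to
                    # pheromone concentration divided by distance
                    weights = [pher[cur][c] / dist[cur][c] for c in rest]
                    tour.append(random.choices(rest, weights)[0])
                length = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
                tours.append((tour, length))
                if length < best_len:
                    best_tour, best_len = tour, length
            # Feedback phase: evaporation, then reinforcement of good edges
            pher = [[p * (1 - evap) for p in row] for row in pher]
            for tour, length in tours:
                for i in range(n):
                    a, b = tour[i], tour[(i + 1) % n]
                    pher[a][b] += q / length
                    pher[b][a] += q / length
        return best_tour, best_len

    # Toy symmetric distance matrix over 8 random points (hypothetical)
    random.seed(3)
    pts = [(random.random(), random.random()) for _ in range(8)]
    d = [[max(((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5, 1e-9)
          for bx, by in pts] for ax, ay in pts]
    print(ant_colony_tsp(d)[1])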

1.6 Genetic Algorithm

The problem of obtaining the optimal area and the minimum wire length is unravelled using the genetic algorithm. The working flow of the genetic algorithm is as shown below (a code sketch follows the list):
• The total population is considered
• Select the chromosomes of the fit part of the population
• Calculate the fitness value
• Alter the value obtained by fitness using the mutation operator
• For every iteration, calculate the final cost function
• If the cost is minimal, the present population is treated as the optimized result
• Otherwise, the operation continues until the required outcome is obtained.
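The sketch below follows that workflow over block orderings; the permutation encoding, order crossover and swap mutation are illustrative choices rather than the exact operators used in any of the surveyed works:

    import random

    def genetic_algorithm(cost, n_genes=20, pop_size=40, n_gen=100, mut_rate=0.1):
        """Selection, crossover and mutation over permutations."""
        def random_perm():
            p = list(range(n_genes))
            random.shuffle(p)
            return p

        def crossover(a, b):
            # Keep a slice of parent a, fill the rest in parent b's order
            i, j = sorted(random.sample(range(n_genes), 2))
            child = a[i:j]
            child += [g for g in b if g not in child]
            return child

        def mutate(p):
            if random.random() < mut_rate:      # swap-mutation operator
                i, j = random.sample(range(n_genes), 2)
                p[i], p[j] = p[j], p[i]
            return p

        pop = [random_perm() for _ in range(pop_size)]
        for _ in range(n_gen):
            pop.sort(key=cost)                  # fitness evaluation
            elite = pop[:pop_size // 2]         # select the fitter half
            children = [mutate(crossover(random.choice(elite),
                                         random.choice(elite)))
                        for _ in range(pop_size - len(elite))]
            pop = elite + children
        best = min(pop, key=cost)
        return best, cost(best)

    # Toy use: the same hypothetical wire-length cost over block orderings
    random.seed(4)
    wirelength = lambda o: sum(abs(a - b) for a, b in zip(o, o[1:]))
    print(genetic_algorithm(wirelength)[1])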

2 Literature Survey

In 2016, Sivasubramanian et al. [5] suggested a method for VLSI floorplanning that focuses on area reduction: an enhanced harmony search together with a twin harmony memory search algorithm (THMS). The two harmony memories are randomly initialized with HMS solutions. The findings of this article showed that the suggested THMS algorithm lowers various parameters such as area, wire length and time.
In 2015, a fresh technique for achieving reduced wire length in FPGA placement was suggested by Premalatha et al. [6], who discuss the "Attractive and Repulsive Particle Swarm Optimization (ARPSO) Algorithm". Depending on a factor D, the velocity values are updated in the ARPSO algorithm. The simulation results show that the ARPSO algorithm is capable of obtaining placement with minimal wire length in FPGAs.
In 2014, an algorithm was suggested by Shanavas et al. [7] to obtain the best solution for problems in VLSI physical design. The authors used an advanced GA named the Hybrid Genetic Algorithm to find a solution, treating each physical design computation separately. Genetic algorithms are used for global optimization and simulated annealing for local optimization. The findings are produced as tables comparing partition and floorplanning optimization using the genetic algorithm against other hybridized algorithms; simulated annealing for placement optimization is compared with the hybrid algorithms, and the routing optimization results using simulated annealing are likewise equated and compared with the hybrid algorithms.
Chen et al. [8] provided a notion of regularity-constrained floorplanning in 2013, using the Half-Perimeter Wire Length (HPWL) approach for estimating the wire length as well as the area. The article describes the Longest Common Subsequence (LCS) packing algorithm, in which pre-packed array blocks are treated as one large block. The floorplanning algorithms were implemented in C++ and the findings were generated on MCNC benchmark circuits.
Abdullah et al. [9] introduced the clonal selection algorithm for VLSI floorplanning design in 2013. As preliminaries the authors considered the depiction of the floorplan, normalized Polish expressions, the artificial immune system, the cost function, the price of the floorplan, and the clonal selection algorithm. The outcomes are tabulated on the standard MCNC circuit benchmarks in addition to GSRC.
In 2013, research on B*-tree-based evolutionary algorithms for floorplanning optimization was conducted by Gracia et al. [10]. Various algorithms for optimizing floorplanning, such as fast simulated annealing, tabu search integrated with SA, evolutionary plus simulated annealing, the hybrid genetic algorithm (HGA) and the DE algorithm, were addressed in the article. All these algorithms are contrasted using MCNC benchmark circuits.
In 2013, Sivaranjani et al. [11] provided an analysis of floorplanning in VLSI using evolutionary algorithms by analyzing their performance. Different optimization algorithms, such as Particle Swarm Optimization, Hybrid Particle Swarm Optimization and the Genetic Algorithm for improved placement outcomes, were described in their paper. The efficiency of the algorithms is assessed by applying MATLAB programs to the conventional MCNC benchmarks.
In 2012, Singha et al. [12] provided a genetic-algorithm-based strategy for solving VLSI non-slicing floorplanning problems. The B*-tree structure is used for non-slicing floorplanning. The authors applied this style to a fresh genetic algorithm named Iterative Prototypes Optimization with Evolved Improvement (POEMS). The genetic algorithm is used for local search in this algorithm, and the work mainly focuses on optimizing the execution time of the algorithms.
In 2011, Hoyingcharoen et al. [13] suggested fault-tolerant sensor placement optimization using genetic algorithms. Their work is designed to guarantee a minimum detection probability: the minimum number of sensor nodes is used to reach a required probability of detection even if a number of sensor nodes fail to function.
In 2011, Sheng et al. [14] proposed a relay race algorithm for minimum-area placement of VLSI modules. The paper compares the genetic algorithm, the simulated annealing algorithm and the suggested relay race algorithm, which mitigates the worst placement instances for multi-objective problems. The experiments were carried out on the standard benchmark circuit MCNC ami49, with a 50 percent improvement in running times.
In 2010, Chen et al. [15] presented hybrid genetic algorithms, combining genetic and memetic algorithms, for the non-slicing hard-module VLSI floorplanning problem with a B*-tree representation. Results demonstrating the effectiveness of the HGA were produced on MCNC benchmark circuits, and the outcomes showed that the circuit area is decreased using the hybrid genetic algorithm.
Chen et al. [16] introduced a fresh method in 2008 in which the integer coding is adjusted depending on the module number. In their work, Discrete Particle Swarm Optimization is integrated with the genetic algorithm through mutation and crossover operators for better optimization. The authors offered a comparison of simulated annealing with the B*-tree representation, particle swarm intelligence and the DPSO algorithms.
The experiments used the MCNC benchmark circuits in addition to GSRC, and the suggested algorithm produced excellent placement outcomes by avoiding local minima solutions.

3 Investigation of Experimental Results Obtained by the Above Algorithms

The effectiveness of the layout needs to be determined at the ASIC design stage. The primary objective of floorplanning is to reduce the delay and the chip area [17]. This can be attained by correctly placing the logic blocks. The interconnections, and the interconnection delay, must therefore be predicted before the actual routing is finalized. Predicting interconnections is normally difficult without knowing the source and destination blocks. With the above brief, a minimal solution is sought for a floorplanning problem containing 20 logical blocks. The experimentation was conducted using the MATLAB technical computing language for 20 blocks with 50 iterations [18]. After simulation, the outcomes are displayed in Figs. 1, 2, 3 and 4. The arrangement of blocks begins with the selection of a block as a reference, which leads into the routing process [19] depending on the constraint each block has reached. For the distinct algorithms, the best solutions for arranging the blocks randomly in the given region are obtained. Among the algorithms mentioned in Table 1, the genetic algorithm gives promising outcomes (Figs. 5, 6, 7 and 8).
The above-mentioned statistics predict the placement of logic blocks and their minimum interconnecting wire length, based on Figs. 5, 6, 7 and 8.
Fig. 1 Design flow of VLSI circuits (system specifications, architectural design, functional and logic design, physical design, physical verification, fabrication, packaging and testing, chip)

4 Conclusion

This paper presents a study of the physical design floorplanning problem in VLSI. The idea is to achieve minimum area and wire length as the modules are positioned in a chip design using the floorplanning technique. The study was performed for twenty blocks with 50 iterations, and the genetic algorithm produced promising results in relation to the other methods described in this paper. These findings therefore encourage researchers to make improvements to the genetic algorithm for multi-unit outcomes.
Fig. 2 Physical design flow of VLSI
Fig. 3 Simulated annealing


Fig. 4 Ant colony optimization flow

Table 1 Comparison of algorithms for 20 blocks

Methods    Simulated annealing   Tabu search   Ant colony   Genetic algorithm
Best cost  441.05                395.4348      90.90        65.70
Fig. 5 Simulated annealing

Fig. 6 Tabu search


Fig. 7 Ant colony algorithm

Fig. 8 Genetic algorithm

References

1. Kahng, A.B., Lienig, J., Markov, I.L., Hu, J.: VLSI Physical Design: From Graph Partitioning
to Timing Closure. Springer, New York (2011)
2. Van Laarhoven, P.J.M., Aarts, E.H.: Simulated annealing. Theory and Applications (1987);
Dorigo, M., Caro, G.D., Gambardella, L.M.: Ant algorithms for discrete optimization. Artif.
Life 5(2), 137–72 (1999)
3. Ninomiya, H., Numayama, K., Asai, H.: Two-staged tabu search for floorplan problem
using o-tree representation. In: Proceedings of IEEE Congress on Evolutionary Computation,
Vancouver (2006)
4. Cordón García, O., Herrera Triguero, F., Stützle, T.: A review on the ant colony optimization meta-heuristic: basis, models and new trends (2002)
5. Sivasubramanian, K., Jayanthi, K.B.: Voltage-island based floorplanning in VLSI for area
minimization using meta-heuristic optimization algorithm. Int. J. Appl. Eng. Res. 11(5), 3469–
3477 (2016). ISSN: 0973-4562
6. Premalatha, B., Umamaheswari, D.S.: Attractive and repulsive particle swarm optimization
algorithm based wire length minimization in FPGA placement. Int. J. VLSI Des. Commun.
Syst. 03 (2015)
7. Shanavas, I.H., Gnanamurthy, R.K.: Optimal solution for VLSI physical design automation
using hybrid genetic algorithm. Hindawi Publ. Corp. Math. Probl. Eng. 2014
8. Chen, X., Hu, J., Xu, N.: Regularity-constrained floorplanning for multi-core processors. Integr.
VLSI J. 47, 86–95 (2014)
9. Abdullah, D.M., Abdullah, W.M., Babu, N.M., Bhuiyan, M.M.I., Nabi, K.M., Rahman, M.S.:
VLSI floorplanning design using clonal selection algorithm. Int. Conf. Inform. Electron. Vis.
(ICIEV) (2013)
10. Gracia, N.R.D., Rajaram, S.: Analysis and design of VLSI Floorplanning algorithms for nano-
circuits. Int. J. Adv. Eng. Technol. (2013)
11. Sivaranjani, P., Kawya, K.K.: Performance analysis of VLSI floor-planning using evolutionary
algorithm. Int. J. Comput. Appl. 0975–8887 (2013)
12. Singha, T., Dutta, H.S., De, M.: Optimization of floor-planning using genetic algorithm.
Procedia Technol. 4, 825–829 (2012)
13. Hoyingcharoen, P., Teerapabkajorndet, W.: Fault tolerant sensor placement optimization with
minimum detection probability guaranteed. In: 8th International Workshop on the Design of
Reliable Communication Networks (DRCN) (2011)
14. Sheng, Y., Takahashi, A., Ueno, S.: RRA-based multi-objective optimization to mitigate the
worst cases of placement. In: IEEE 9th International Conference on ASIC (ASICON) (2011)
15. Chen, J., Zhu, W.: A hybrid genetic algorithm for VLSI floorplanning. In: IEEE International
Conference on Intelligent Computing and Intelligent Systems (ICIS) (2010)
16. Chen, G., Guo, W., Cheng, H., Fen, X., Fang, X.: VLSI Floor planning based on particle
swarm optimization. In: 3rd International Conference on Intelligent System and Knowledge
Engineering (2008)
17. Kilaru, S., Harikishore, K., Sravani, T., Chowdary, A., Balaji, T.: Review and analysis of promis-
ing technologies with respect to fifth generation networks. In: 1st International Conference on
Networks and Soft Computing. (2014). ISSN: 978-1-4799-3486-
18. Gopal, P.B., Kishore, K.H. and Kittu, B.P.: An FPGA implementation of on chip UART testing
with BIST techniques. Int. J. Appl. Eng. Res. 10(14), 34047–34051 (2015). ISSN: 0973-4562
19. Hussain, S.N., Kishore, K.H.: Computational optimization of placement and routing using
genetic algorithm. Indian J. Sci. Technol. 9(47), 1–4, (2016). ISSN: 0974-6846
Enhancement in Teaching Quality
Methodology by Predicting Attendance
Using Machine Learning Technique

Ekbal Rashid, Mohd Dilshad Ansari, Vinit Kumar Gunjan


and Mudassir Khan

Abstract An important task of a teacher is to make every student learn and pass the end examination. For this, teachers make lesson plans for the year/semester according to the number of working days, with the goal of completing the syllabus prior to the final examination. The lesson plans are made without knowledge of the class attendance for any particular day, since it is hard for a teacher to make a correct guess. Therefore, when class strength is unexpectedly low on a given day, the teacher can either postpone the lecture to the next day or continue and let the absent students be at a loss. Postponing the lecture will not complete the syllabus in the expected time, and letting students be at a loss is not a solution either. This paper discusses a solution to this problem using a machine learning model which is trained with past records of student attendance to find a pattern of class attendance and predict accurate class strength for any future date, according to which the lesson plans can be made or modified. Prior knowledge of class strength will help teachers act accordingly to achieve their goals.

Keywords College · Attendance · Academic · Performance · Teaching · Undergraduate

1 Introduction

The academic performance of students also depends on their attendance percentage for a course [1]. Still, most students tend to casually skip classes in a very

E. Rashid
RTC Institute of Technology, Ranchi, Jharkhand, India
M. D. Ansari (B)
CMR College of Engineering &Technology, Hyderabad, India
V. K. Gunjan
CMR Institute of Technology, Hyderabad, India
M. Khan
College of Science and Arts, King Khalid University, Abha, Saudi Arabia

random manner. The reason for skipping a class on a given day depends on a number of factors [2]. Since students' class-attending behavior cannot be controlled, it is necessary to construct lesson plans accordingly. It is the task of the teacher to make teaching plans so that all students learn and pass the examinations. But when students do not attend classes for no apparent reason and the overall class attendance is low, it is hard for teachers to decide whether to continue with the lecture or postpone it for another day. When most of the class is absent, there is no point in continuing with the lecture, as most students will not be able to understand the later lectures that build on it, and ultimately this will have an effect on the examination. Postponing the class for another day, meanwhile, risks falling behind the expected date of completion of the course.
The idea here is to make lesson plans considering the overall class attendance, so that major topics of a subject are not scheduled on a day with low class attendance. But students tend to skip classes in an irregular and random manner, which makes it hard to correctly guess the attendance of a class on a given day, and incorrect guesses are useless. The factors on which class attendance depends are found from a survey. A large dataset is then created from previous attendance data, which can be used to train a machine learning model to make future predictions.
Attendance of the class is the dependent variable, and the different factors affecting attendance are independent variables. Luca Stanca presented new evidence in 2006 on the effects of attendance on academic performance [5]. Nyatanga and Mukorera found in their research that one in four of the students enrolled complete their degrees in the minimum stipulated time [6]. Godlewska et al. mentioned in their research that the main obstacle to the development and maintenance of this model is institutional culture [7]. Lam and Zou observed in their research five ubiquitous pedagogic challenges confronted by educators and students within traditional classroom contexts [8]. There are various techniques available in the literature that are useful for predicting student attendance, including machine learning and image processing techniques [9–15].

2 Relationship Between Teaching Quality Methodology and Attendance

Students should attend classes to learn from the course they have enrolled in at college. But, due to different reasons such as poor teaching quality in institutions, lack of interest among students or the difficulty level of courses, students do not attend classes. Universities in India consider attendance as a parameter to grade students; the minimum attendance required to pass a course is 65–75% in most Indian universities [3]. Different universities around the world have different attendance requirements for students to fulfill, from having no restriction on attendance to leaving it to the course lecturer. But all of them mention that "unexcused absence may affect students' grades" [4]. Therefore, a student's attendance and grades are directly related.
It is not a hidden fact that students' attendance affects their grades, and yet it is found that students do not attend classes. When students' attendance cannot be controlled, it is the task of the teacher to be dynamic and make every student learn.
Teachers have limited time to complete a course for a class. Hence, they schedule their lectures according to the given time. It is common for teachers to fall behind schedule because class attendance was unexpectedly low on some days of the course. This could be controlled if the teacher could predict class attendance while making the schedule for the course (lesson plans), such that major/important topics are not scheduled on a day with low class attendance. Such a schedule, taking class attendance as a parameter, can help most students learn from the course. Teaching quality can be measured by how many students learn and pass the examination. Here, teaching quality is improved by considering class attendance while keeping teaching techniques constant, because classes are properly scheduled based on students' attendance.

3 Methodology

A large dataset was created by collecting 2 years of attendance (see (1) in Appendix) of students of the Computer Science and Engineering Department at Aurora's Technological and Research Institute, Parvathapur, Uppal, Hyderabad. The features were identified from a survey asking students about the reason for not attending a specific class and also from observed patterns in previous attendance data. The following conclusions are drawn:
1. If there are consecutive holidays, the day before the holidays is expected to have low attendance. E.g.: if Saturday and the next Monday are declared holidays for some reason, and Sunday is anyhow a holiday, it is observed that most students will skip classes on Friday.
2. The starting days of the semester are found to have low attendance.
3. The ending days of the semester are found to have high attendance.
4. If there can be 7 classes in a day for a course and on a given day fewer than half of the classes are scheduled, class attendance is found to be low.
5. Attendance depends on the difficulty level of the subject; tough/hard subjects attract less interest and therefore lower attendance compared with easier subjects.
With this analysis of previous attendance records and the student survey, the features of the dataset are identified, on which machine learning is applied for the label, the attendance of the class. The input features are:
(a) Next consecutive holidays—(input type: yes/no). If the next two or more days are holidays, this feature is given a 'yes' value; otherwise 'no'.
(b) Previous consecutive holidays—(input type: yes/no). If the previous two or more days were holidays, this feature is given a 'yes' value; otherwise 'no'.
(c) Semester status—(input type: 3 levels—first, intermediate, last). For the first 3 weeks the value of this feature is 'first'; for the last 3 weeks it is 'last'; in all other cases it is 'intermediate'.
(d) Number of classes—(input type: 1–7). This feature holds an integer giving how many classes are scheduled for a given day, out of a total of 7. A value of 4 means only 4 of the 7 classes were scheduled for that day.
(e) Subject_difficulty—depending on the subject pass percentage and a survey among the students, subjects are given values of 1 for easy, 2 for average and 3 for hard.
(f) Weekday—(input type: Monday to Saturday).
Class strength and difficulty levels are not considered directly, because class strength is dependent on the class name and the difficulty level is dependent on the subject. Also, weekday and semester status together are dependent on the date, and hence it is not necessary to consider the date here. From this data, a machine learning model is constructed, as shown in Fig. 1.
The knowledge base consists of the past 2 years (4 semesters) of attendance data of students, gathered randomly from 7 different classes; it is an organized training dataset with the seven features discussed above. Table 1 shows a random sample of

Fig. 1 Diagram for attendance prediction


Table 1 Data from knowledge base

Weekday     Next consec.  Prev. consec.  Semester      No. of   Subject  Attendance
            holidays      holidays       status        classes           of class
Monday      Yes           No             Intermediate  6        DBMS     39
Wednesday   No            No             First         4        CO       43
Tuesday     No            No             Last          5        ML       57
Wednesday   No            Yes            Last          7        CP       55
Saturday    Yes           No             First         6        OS       13
Thursday    No            Yes            Intermediate  6        FLAT     48
Tuesday     Yes           No             Intermediate  4        CO       43
Tuesday     Yes           No             First         6        M3       33
Monday      No            Yes            Last          4        SE       49
Wednesday   No            No             First         5        CN       42
Thursday    Yes           No             Intermediate  5        SE       35
Friday      No            No             First         7        LP       42
Saturday    Yes           No             Intermediate  6        CP       18
Thursday    No            No             Last          5        CO       46
Saturday    No            Yes            Last          6        OOPS     53
Friday      No            Yes            First         5        DAA      33

data from the knowledge base with all class names. There are 60 students on average
in each class. The complete data is available in the appendix.
Most features are categorical values with different levels. To prepare the dataset for machine learning, we applied dummy encoding to the columns Weekday, Semester-Status and Subject. Next-Consecutive-Holidays and Previous-Consecutive-Holidays are boolean, and Number-of-Classes is left as it is. The training dataset contains 12 feature columns after applying dummy encoding.
The new feature columns are [Next-Consecutive-Holidays, Previous-Consecutive-Holidays, Number-of-Classes, Weekday_Monday, Weekday_Saturday, Weekday_Thursday, Weekday_Tuesday, Weekday_Wednesday, Semester-status_LAST, Semester-status_MID, Subject_HARD, Subject_INTERMEDIATE].
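A minimal pandas sketch of this encoding step is shown below; the rows are a small illustrative slice shaped like Table 1 (with the subject replaced by its difficulty level), not the actual knowledge base:

    import pandas as pd

    # A few rows in the shape of the knowledge base (values are illustrative)
    df = pd.DataFrame({
        "Weekday": ["Monday", "Wednesday", "Tuesday", "Saturday",
                    "Thursday", "Friday", "Tuesday", "Monday"],
        "Next-Consecutive-Holidays": [True, False, False, True,
                                      False, False, True, False],
        "Previous-Consecutive-Holidays": [False, False, True, False,
                                          True, False, False, True],
        "Semester-status": ["MID", "FIRST", "LAST", "FIRST",
                            "MID", "FIRST", "MID", "LAST"],
        "Number-of-Classes": [6, 4, 7, 6, 5, 7, 4, 4],
        "Subject": ["HARD", "INTERMEDIATE", "EASY", "HARD",
                    "INTERMEDIATE", "EASY", "HARD", "EASY"],
        "Attendance": [39, 43, 55, 13, 48, 42, 43, 49],
    })

    # Dummy-encode the categorical columns; drop_first removes one
    # redundant level per column, giving the feature columns listed above
    X = pd.get_dummies(df.drop(columns="Attendance"),
                       columns=["Weekday", "Semester-status", "Subject"],
                       drop_first=True)
    y = df["Attendance"]
    print(X.columns.tolist())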
The model is first trained using the linear regression algorithm in Python from the scikit-learn library, as shown in Fig. 2; its accuracy on test data, along with a new input for prediction, is shown in Fig. 3.
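Since Fig. 2 reproduces the training code only as an image, the following is a minimal sketch of the kind of scikit-learn pipeline it describes; the variable names, and the X and y from the dummy-encoding sketch above, are assumptions:

    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    # X, y as produced by the dummy-encoding sketch above
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = LinearRegression()
    model.fit(X_train, y_train)
    print("R^2 on test data:", model.score(X_test, y_test))

    # Predicted attendance for a new, already dummy-encoded day
    print(model.predict(X_test[:1]))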
The model trains well and the accuracy on test data is found to be 100%, as shown in Fig. 3. But new predictions do not perform well on new real data, as the model has trained too well; over-fitting is the only possible error that has taken place here.
The relationship between each feature and the label is shown in Fig. 4.
It is clear from Fig. 4 that no feature has a linear relationship with the label; hence, the machine learning model will probably not produce desirable results using the linear regression algorithm. The linear model was tested against real data, which is discussed in the results analysis in Sect. 4.

Fig. 2 Linear regression model

Fig. 3 Linear regression model output

Fig. 4 Relationship b/w features and label

Fig. 5 Random forest regression model

Fig. 6 Random forest regression model output
Then we applied random forest regression to train the machine learning model, which better suits our dataset, as a random forest uses multiple decision trees and can handle non-linear and categorical data. The code for the random forest regression model in Python is shown in Fig. 5, and the output accuracy and new prediction values are shown in Fig. 6.
The random forest regression model is trained and also achieves 100% accuracy on test data. New data, the same as the data passed to the linear regression model, is passed to the model, and the predicted class percentage (out of 100%) and class strength are shown in Fig. 6.
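Fig. 5 likewise shows the code only as an image; a minimal sketch of such a random forest regression step (with the same assumed X and y as above) is:

    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    # X, y as produced by the dummy-encoding sketch above
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # An ensemble of decision trees copes with non-linear, categorical features
    rf = RandomForestRegressor(n_estimators=100, random_state=0)
    rf.fit(X_train, y_train)
    print("R^2 on test data:", rf.score(X_test, y_test))

    # Predicted class strength for a new day (same encoding as training data)
    print(rf.predict(X_test[:1]))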

4 Results Analysis

The new data passed to both trained models is the following:
– Weekday: Thursday
– Next-Consecutive-Holiday: False
– Previous-Consecutive-Holiday: True
– Semester-Status: Mid
– Number of Classes: 4
– Subject: Hard.
The linear regression model gave inconsistent results that matched neither intuition nor the real class strength for the given input data. Upon using the linear model and the random forest regression model on real data for prediction, it was clearly found that the linear model was not fit to be used for attendance prediction. The random forest regression model, on the other hand, made more accurate predictions: when both models were tested with real data and attendance, the random forest regression model was successful in predicting attendance 8 out of 10 times with an accuracy of at least 90%. Therefore, the random forest regression model can be linked to any software application managing college attendance, which can take the required data as input and use the model for prediction of future attendance.

5 Conclusion and Future Scope

The model can be further improved by considering the behavior of each individual class, creating a similar per-class dataset to train on. That approach is not feasible, however, as every class would require a model trained on its own data. Since an overall approximation of class strength is enough for teachers to decide how to schedule upcoming topics, the model discussed in this paper is sufficient for our purpose. In this way, teachers will be able to prepare for any situation regarding class attendance and plan how to finish the syllabus on time while most students are present, thereby increasing teaching quality, as more students than before gain knowledge from class through this use of machine learning.

Acknowledgements We would like to thank RTC Institute of Technology as well as CMR College of Engineering and Technology for providing the infrastructure and facilities to carry out this work.

References

1. Oghuvbu, E.P.: Attendance and academic performance of students in secondary schools: a


correlational approach. Stud. Home Community Sci. 4(1), 21–25 (2010). https://doi.org/10.
1080/09737189.2010.11885294
2. MIT Faculty Newsletter, vol. 18 no. 4 (2006). http://web.mit.edu/fnl/volume/184/breslow.html
3. Attendance Requirements (6.0) in B. Tech R16 Academic Regulation, Jawaharlal Nehru
Technological University, Hyderabad
4. Academic Rules and Regulations from the American University website


5. Stanca, L.: The effects of attendance on academic performance: panel data evidence for intro-
ductory microeconomics. J. Econ. Educ. 37(3), 251–266 (2006). https://doi.org/10.3200/JECE.
37.3.251-266
6. Nyatanga, P., Mukorera, S.: Effects of lecture attendance, aptitude, individual heterogeneity and
pedagogic intervention on student performance: a probability model approach. Innov. Educ.
Teach. Int. 56(2), 195–205 (2019). https://doi.org/10.1080/14703297.2017.1371626
7. Godlewska, A., Beyer, W., Whetstone, S., Schaefli, L., Rose, J., Talan, B., Kamin-Patterson,
S., Lamb, C., Forcione, M.: Converting a large lecture class to an active blended learning
class: why, how, and what we learned. J. Geogr. High. Educ. (2019). https://doi.org/10.1080/
03098265.2019.1570090
8. Lam, K.C., Zou, P.: Pedagogical challenges in international economic education within tradi-
tional classroom contexts. J. Teach. Int. Bus. 29(4), 333–354 (2018). https://doi.org/10.1080/
08975930.2018.1557096
9. Gautam, P., Ansari, M.D., Sharma, S.K.: Enhanced security for electronic health care informa-
tion using obfuscation and RSA algorithm in cloud computing. Int. J. Inf. Secur. Priv. (IJISP)
13(1), 59–69 (2019)
10. Kaur, R., Chawla, M., Khiva, N.K., Ansari, M.D.: Comparative analysis of contrast enhance-
ment techniques for medical images. Pertanika J. Sci. Technol. 26(3), 965–978 (2018)
11. Ansari, M.D., Singh, G., Singh, A., Kumar, A.: An efficient salt and pepper noise removal and
edge preserving scheme for image restoration. Int. J. Comput. Technol. Appl. 3(5), 1848–1854
(2012)
12. Sethi, K., Jaiswal, V., Ansari, M.D.: Machine learning based support system for students to select stream (subject). Recent Pat. Comput. Sci. 12, 1 (2019). https://doi.org/10.2174/2213275912666181128120527
13. Rashid, E., Ansari, M.D.: Fixing the bugs in software projects from software repositories for improvisation of quality. Recent Adv. Electr. Electron. Eng. 12, 1 (2019). https://doi.org/10.2174/1872212113666190215150458
14. Ansari, M.D., Mishra, A.R., Ansari, F.T., Chawla, M.: On edge detection based on new intu-
itionistic fuzzy divergence and entropy measures. In: 4th International Conference on Parallel,
Distributed and Grid Computing (PDGC) (pp. 689–693). IEEE (2016)
15. Ansari, M.D., Rashid, E., Siva Skandha, S., Gupta, S.K.: A comprehensive analysis of image forensics techniques: challenges and future direction. Recent Pat. Eng. 13, 1 (2019). https://doi.org/10.2174/1872212113666190722143334
Improvement in Extended Object
Tracking with the Vision-Based
Algorithm

Ekbal Rashid, Mohd Dilshad Ansari, Vinit Kumar Gunjan


and Muqeem Ahmed

Abstract Most vision-based object tracking relies either on parametric state-space algorithms, such as the Bayesian filter and its family of algorithms, or on nonparametric algorithms, such as the color-sensitive mean shift algorithms. In this paper we consider a vision-based algorithm built on Bayesian filter algorithms. Most state-space tracking algorithms use point-based object tracking approaches, on which researchers did very well in recent decades; with the advent of faster computing devices, tracking algorithms have improved a lot with extended object tracking, where the object is tracked as an entire object instead of through point-based approaches. The proposed system provides good results in terms of state estimation over its point-based counterpart. Using this vision algorithm, the complete object can be tracked using a sensor system, and this is the novelty of the paper.

Keywords Kalman filter · Extended Kalman filter · Vision-based · Point-based · Bayesian filter

1 Introduction

There are many point-based vision tracking algorithms. The primary among them is the Extended Kalman Filter (EKF), popular since the 1980s and still in wide use today, although it suffers from considerable state-estimation error caused by linearization of the system. Another approach for nonlinear systems is the particle filter, which has been used extensively, though it has its own drawbacks with respect to the computational resources it consumes.

E. Rashid
RTC Institute of Technology, Ranchi, Jharkhand, India
M. D. Ansari (B)
CMR College of Engineering & Technology, Hyderabad, India
V. K. Gunjan
CMR Institute of Technology, Hyderabad, India
M. Ahmed
Department of CS and IT, MANUU (Central University), Hyderabad, India


Due to the increased resolution of modern sensors, there is a growing need to recognize extended objects as individual units in order to maintain extended object tracks. Extended objects are comparatively large, and the sensor returns they generate fluctuate in nature. Sensor reports therefore originate from the individual scattering centers of one and the same object, so the individual reports cannot be treated like point-object measurements from a group of well-separated targets. Extended targets and target groups are found in short-range applications such as littoral observation and robotics.

In this paper, in contrast to point-based target tracking, we consider extended object tracking, which deals with multiple targets of an object, so that the measurements are distributed over several returns per target object. Extended object tracking is used in many fields; here we use it for single-target object tracking.

To develop this method we have used the Kalman filter (KF), which performs state estimation from object measurements for navigation and tracking.

2 Related Work

Since the problem statement deals with tracking and fusion for automotive systems, the literature survey started from two approaches. The first, Koch's approach, treats objects with shapes and realizes them as ellipsoidal extensions of a Gaussian linear system. The other approach, given by Baum, describes the state space depicted by the sensor measurements, with the object state defined by the object's center; it is applicable to a single target, but has problems with data association and fails in cluttered environments.

Cuevas et al. [1] discussed a vision-based algorithm that estimates an object's state using the Bayesian approach of the Kalman filter. They also addressed the Extended Kalman Filter (EKF) [2–4]. The Kalman filter has been applied successfully to prediction in both object tracking and vision tracking.

Granström et al. [5] gave an exact definition of the extended object tracking problem. They highlighted Multiple Target Tracking (MTT) problems under the small-object assumption. MTT under this assumption is a highly complex problem because of the time-varying number of targets, sensor noise, measurement-origin uncertainty, missed detections, and clutter detections. The four most common approaches to multiple target tracking are:
i. Probabilistic Multiple Hypothesis Tracking (PMHT) [6, 7]
ii. Multiple Hypothesis Tracking (MHT) [8–10]
iii. Random Finite Sets (RFS) approaches [11, 12]
iv. Joint Probabilistic Data Association (JPDA) [13–15].

An overview of multiple target tracking, with a major focus on small, so-called point objects, is given in [16]. In some cases the objects have extents with shapes that cannot be accurately represented by a simple geometric shape such as an ellipse or a rectangle. For estimating arbitrary object shapes, the literature contains at least two different types of approaches: either the shape is modeled as a curve with some parameterization [17–21], or the shape is modeled as a combination of ellipses [22–24]. Moreover, objects can be detected with the help of other techniques such as edge detection, image forensics, image processing, image segmentation, and machine learning algorithms [25–31].

3 Proposed Methodology

The main objective of this paper is the tracking of an object. A tracking system measures an object's movement over time, producing an ordered sequence of location data for further processing. Based on estimation theory, the tracking system renders the state space of the object to an observer while tracking the object's coordinates in that state space. The Kalman filter is used here to track a single target object: the filter first predicts the object's state, and the observations are then used in the correction update of the algorithm.

Extended object tracking with the Kalman filter (KF) approach is used for navigation and tracking of an object by single-target tracking on the basis of a sensor system. The proposed system is based on computer vision and visual tracking for object detection (Fig. 1).
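
To make the predict-correct cycle concrete, a minimal Python sketch of a linear Kalman filter is given below. It is an illustrative sketch, not the implementation used in this chapter; the class design and matrix names are our own assumptions.

```python
import numpy as np

class KalmanFilter:
    """Minimal linear Kalman filter: predict the state, then correct it
    with an incoming measurement (illustrative sketch, not the chapter's code)."""

    def __init__(self, F, H, Q, R, x0, P0):
        self.F, self.H = F, H      # dynamics and observation matrices
        self.Q, self.R = Q, R      # plant and measurement noise covariances
        self.x, self.P = x0, P0    # state estimate and its covariance

    def predict(self):
        # Prediction step: propagate state and covariance through the dynamics.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x

    def correct(self, z):
        # Correction step: fuse measurement z into the prediction via the Kalman gain.
        S = self.H @ self.P @ self.H.T + self.R       # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)      # Kalman gain
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(self.P.shape[0]) - K @ self.H) @ self.P
        return self.x
```

In a visual tracker, predict() would be called once per frame, and correct() whenever the detector supplies a measurement of the object's position.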

4 Object Tracking Mathematical Model

The primary goal of any object tracking is to estimate the state path, or trajectory, of a moving object. The object is mostly assumed to be a point in space at some time instance t_k; the quantity holding the important information about the object dynamics (position, velocity, acceleration, etc.) is known as the state x_k in a discrete time space. A good state estimate is based on the useful extraction of observations while tracking the object. Almost all objects in space having some state can be tracked, and they are represented in a state-space model. Every state-space model consists of two equations, a dynamics equation and an observation equation:

x_{k+1} = f_k(x_k, u_k, w_k)    (1)

z_k = h_k(x_k) + v_k    (2)

Fig. 1 Centralized tracking and fusion system

where x_k, u_k and z_k are the object state, control input, and observation vectors at time instance t_k; w_k and v_k are the plant and measurement noise; f_k and h_k are time-varying vector-valued discrete functions; and x_{k+1} is the estimate of the new state. The dynamics model in Eq. (1) mostly deals with the motion of the object and is used to estimate the new state (which is usually hidden or unknown to the observer due to uncertainty), while the observation model in Eq. (2) gives the measurements obtained (by the observer or sensor) from those dynamics.

There may be no control input vector u_k in the dynamics (while tracking an object, the observer may have no knowledge of any maneuvers the object will perform); in that case Eq. (1) reduces to

x_{k+1} = f_k(x_k, w_k)    (3)
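
As one concrete, purely illustrative instance of Eqs. (1)-(3), the sketch below simulates a two-dimensional constant-velocity model without a control input, i.e. the case of Eq. (3) with linear f_k and h_k. The matrices, noise levels, and initial state are assumptions chosen for the example, not values taken from this chapter.

```python
import numpy as np

dt = 1.0  # assumed sampling interval between frames

# Constant-velocity dynamics for the state x = [px, py, vx, vy]^T (Eq. 3, no u_k).
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)

# Observation model: the sensor reports only the position components (Eq. 2).
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)

rng = np.random.default_rng(seed=0)
x = np.array([0.0, 0.0, 1.0, 0.5])          # assumed true initial state

for k in range(5):
    w = rng.normal(scale=0.05, size=4)      # plant noise w_k
    v = rng.normal(scale=0.5, size=2)       # measurement noise v_k
    x = F @ x + w                           # x_{k+1} = f_k(x_k, w_k)
    z = H @ x + v                           # z_k = h_k(x_k) + v_k
    print(f"k={k}: observation z_k = {z}")
```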



5 Results Analysis

After passing a detection process, essentially working as a means of data rate reduc-
tion, the signal processing provides estimates of parameters characterizing the wave-
forms received at the sensors’ front ends (e.g. radar antennas). From these estimates
sensor reports are created, i.e. measured quantities possibly related to objects of
interest, which are the input for the tracking and sensor data fusion system. By using
multiple sensors instead of one single sensor, among other benefits, the reliability and robustness of the entire system are usually increased, since malfunctions are recognized more easily and earlier and can often be compensated without risking a total system breakdown.
All context information in terms of statistical models (sensor performance, object characteristics, object environment) is a prerequisite for track initiation and maintenance. Track confirmation or termination, classification or identification, and fusion of tracks related to the same objects or object groups are part of the track management.
The scheme is completed by a human-machine interface with display and interaction functions. Context information can be updated or modified by direct human interaction or by the track processor itself, for example as a consequence of object classification or road map extraction.
In this article we present the output of the extended object tracking system, in which one can apply the EKF algorithm with the sensor system and object types to track a single object. The output is depicted in Figs. 2, 3 and 4.
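
The figures show the prediction in red and the estimate in green. Since the chapter does not publish its drawing code, the following sketch only illustrates how such an overlay could be produced with OpenCV; the frame and circle centers are hypothetical placeholders.

```python
import cv2
import numpy as np

# Placeholder frame and tracker outputs (pixel coordinates are hypothetical).
frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a camera frame
predicted_center = (320, 240)                    # output of the prediction step
estimated_center = (325, 244)                    # output of the correction step

# OpenCV uses BGR channel order: (0, 0, 255) is red for the prediction and
# (0, 255, 0) is green for the estimate, matching the convention of Figs. 2-4.
cv2.circle(frame, predicted_center, 12, (0, 0, 255), 2)
cv2.circle(frame, estimated_center, 12, (0, 255, 0), 2)
cv2.imwrite("tracking_overlay.png", frame)
```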

Fig. 2 Two-object image: both objects are tracked; red denotes the prediction and green the estimate

Fig. 3 Two-object image: a single object is targeted while tracking; red denotes the prediction and green the estimate

Fig. 4 Two-object image: the movements of the single target object are observed while tracking

Table 1 Comparative study

Proposed system:
1. It is used for tracking the complete extended object.
2. It can reach the exact estimation state of the measurement.
3. It removes noise and errors in the prediction estimate.

Existing system:
1. It is used for point-based vision tracking.
2. It cannot exactly reach the estimation state with point-based measurements.
3. It cannot remove noise and errors in the prediction estimate.

6 Comparative Analysis

In this section we compare the proposed system with the existing system; the comparison is given in Table 1.

7 Conclusion and Future Scope

In this paper, we have reviewed the state of the art in state estimation and illustrated the methods using dissimilar sensors and object types. Increasing sensor resolution means there will be an increasing number of scenarios in which extended object methods can be applied. It is possible to cluster or segment the data in pre-processing and then apply standard point-object methods; however, this requires careful parameter tuning and thereby increases the risk of errors. Using Bayesian filter and Kalman filter algorithms with multiple measurements per object, on the other hand, makes the tracking performance much less dependent on clustering or segmentation.
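
As a sketch of the clustering-based alternative discussed above, the multiple returns from one extended object could be collapsed to a single centroid and handed to a standard point-object filter. The data and the DBSCAN parameters are illustrative assumptions; the sensitivity of the result to eps and min_samples is precisely the parameter-tuning burden noted above.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical sensor returns: several scattering centers from one extended
# object plus one clutter detection (all coordinates are made up).
returns = np.array([[10.1, 5.0], [10.4, 5.3], [9.8, 4.9], [10.2, 5.1],
                    [40.0, 30.0]])

labels = DBSCAN(eps=1.0, min_samples=2).fit_predict(returns)  # -1 marks clutter

for cluster_id in sorted(set(labels) - {-1}):
    centroid = returns[labels == cluster_id].mean(axis=0)
    # Each centroid would then be fed to a standard point-object Kalman filter.
    print(f"cluster {cluster_id}: centroid = {centroid}")
```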
Due to the high non-linearity and high dimensionality of the problem, estimation of arbitrary shapes is still very challenging. Performance bounds for extended object tracking methods are therefore needed: for a given shape model, how many measurements are required for the estimation algorithm to converge to an estimate with small error? Performance bounds may help answer the question of which shape complexity is suitable when modeling the object. Naturally, in most applications one is interested in a shape description that is as precise as possible. In this paper, we have introduced extended object tracking on a single-target system, through which the single target object can be detected using a sensor system.

Acknowledgements We would like to thank the RTC Institute as well as CMR College of Engineering and Technology for providing the infrastructure and facilities to carry out this work.

References

1. Cuevas, E., et al.: Technical Report, Free University Berlin, Aug 2005
2. Jazwinski, A.H.: Stochastic Processes and Filtering Theory. Academic Press, New York (1970)
3. Maybeck, P.S.: Stochastic Models Estimation and Control, vol. 1. Academic Press, New York
(1979)
4. Maybeck, P.S.: Stochastic Models Estimation and Control, vol. 2. Academic Press, New York
(1982)
5. Granström, K., et al.: Extended object tracking: introduction, overview and applications. arXiv:1604.00970v3 [cs.CV], 21 Feb 2017
6. Streit, R.L., Luginbuhl, T.E.: Probabilistic multi-hypothesis tracking. Tech. Rep. DTIC
Document (1995)
7. Willett, P., Ruan, Y., Streit, R.: PMHT: Problems and some solutions. IEEE Trans. Aerosp.
Electron. Syst. 38(3), 738–754 (2002)
8. Blackman, S., Popoli, R.: Design and Analysis of Modern Tracking Systems. Artech House,
Norwood, MA, USA (1999)
9. Kurien, T.: Issues in the design of practical multitarget tracking algorithms. In: Bar-Shalom, Y.
(ed.) Chapter 3 in Multitarget-Multisensor Tracking: Advanced Applications, Artech House,
pp 43–83 (1990)
10. Reid, D.: An algorithm for tracking multiple targets. IEEE Trans. Autom. Control 24(6), 843–
854 (1979)
11. Mahler, R.: Statistical Multisource-Multitarget Information Fusion. Artech House, Norwood,
MA, USA (2007)
12. Mahler, R.: Advances in Statistical Multisource-Multitarget Information Fusion. Artech House, Norwood, MA, USA (2014)
13. Bar-Shalom, Y.: Extension of the probabilistic data association filter to multi-target tracking.
In: Proceedings of the Fifth Symposium on Nonlinear Estimation, San Diego, CA, USA, Sep
1974
14. Bar-Shalom, Y., Daum, F., Huang, J.: The probabilistic data association filter. IEEE Control Syst. 29(6), 82–100 (2009)
15. Fortmann, T., Bar-Shalom, Y., Scheffe, M.: Sonar tracking of multiple targets using joint
probabilistic data association. IEEE J. Oceanic Eng. 8(3), 173–184 (1983)
16. Vo, B.N., Mallick, M., Bar-Shalom, Y., Coraluppi, S., Osborne, R., Mahler, R., Vo, B.T.:
Multitarget tracking. Wiley Encyclopedia of Electrical and Electronics Engineering, Sep 2015
17. Baum, M., Hanebeck, U.D.: Shape tracking of extended objects and group targets with star-convex RHMs. In: 14th International Conference on Information Fusion, IEEE, pp. 1–8, July 2011
18. Cao, X., Lan, J., Li, X.R.: Extension-deformation approach to extended object tracking. In: Proceedings of the International Conference on Information Fusion, pp. 1185–1192, July 2016
19. Hirscher, T., Scheel, A., Reuter, S., Dietmayer, K.: Multiple extended object tracking using Gaussian processes. In: Proceedings of the International Conference on Information Fusion, pp. 868–875, July 2016
20. Lundquist, C., Granström, K., Orguner, U.: Estimating the shape of targets with a PHD filter. In: Proceedings of the International Conference on Information Fusion, Chicago, IL, USA, pp. 49–56, July 2011
21. Wahlström, N., Özkan, E.: Extended target tracking using Gaussian processes. IEEE Trans. Signal Process. 63(16), 4165–4178 (2015)
22. Granström, K., Willett, P., Bar-Shalom, Y.: An extended target tracking model with multiple random matrices and unified kinematics. In: Proceedings of the International Conference on Information Fusion, Washington, DC, USA, pp. 1007–1014, July 2015
23. Lan, J., Li, X.R.: Tracking of extended object or target group using random matrix—Part II: irregular object. In: 2012 15th International Conference on Information Fusion, IEEE, pp. 2185–2192, July 2012
24. Lan, J., Li, X.R.: Tracking of maneuvering non-ellipsoidal extended object or target group
using random matrix. IEEE Trans. Signal Process. 62(9), 2450–2463 (2014)

25. Gautam, P., Ansari, M.D., Sharma, S.K.: Enhanced security for electronic health care informa-
tion using Obfuscation and RSA algorithm in cloud computing. Int. J. Inf. Secur. Priv. (IJISP)
13(1), 59–69 (2019)
26. Kaur, R., Chawla, M., Khiva, N.K., Ansari, M.D.: Comparative analysis of contrast enhance-
ment techniques for medical images. Pertanika J. Sci. Technol. 26(3), 965–978 (2018)
27. Ansari, M.D., Singh, G., Singh, A., Kumar, A.: An efficient salt and pepper noise removal and
edge preserving scheme for image restoration. Int. J. Comput. Technol. Appl. 3(5), 1848–1854
(2012)
28. Sethi, K., Jaiswal, V., Ansari, M.D.: Machine learning based support system for students to select stream (subject). Recent Pat. Comput. Sci. 12, 1 (2019). https://doi.org/10.2174/2213275912666181128120527
29. Rashid, E., Ansari, M.D.: Fixing the bugs in software projects from software repositories for improvisation of quality. Recent Adv. Electr. Electron. Eng. 12(1) (2019). https://doi.org/10.2174/1872212113666190215150458
30. Ansari, M.D., Mishra, A.R., Ansari, F.T., Chawla, M.: On edge detection based on new intuitionistic fuzzy divergence and entropy measures. In: 2016 Fourth International Conference on Parallel, Distributed and Grid Computing (PDGC), IEEE, pp. 689–693, Dec 2016
31. Ansari, M.D., Rashid, E., Skandha, S.S., Gupta, S.K.: A comprehensive analysis of image forensics techniques: challenges and future direction. Recent Pat. Eng. 13, 1 (2019). https://doi.org/10.2174/1872212113666190722143334
