
Lecture Notes in Networks and Systems 171

H. S. Saini
Rishi Sayal
A. Govardhan
Rajkumar Buyya   Editors

Innovations
in Computer
Science and
Engineering
Proceedings of 8th ICICSE
Lecture Notes in Networks and Systems

Volume 171

Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland

Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA,
School of Electrical and Computer Engineering—FEEC, University of Campinas—
UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering,
Bogazici University, Istanbul, Turkey
Derong Liu, Department of Electrical and Computer Engineering, University
of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy
of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering
University of Alberta, Alberta, Canada; Systems Research Institute
Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering,
KIOS Research Center for Intelligent Systems and Networks, University of Cyprus,
Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong,
Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest
developments in Networks and Systems—quickly, informally and with high quality.
Original research reported in proceedings and post-proceedings represents the core
of LNNS.
Volumes published in LNNS embrace all aspects and subfields of, as well as new
challenges in, Networks and Systems.
The series contains proceedings and edited volumes in systems and networks,
spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor
Networks, Control Systems, Energy Systems, Automotive Systems, Biological
Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems,
Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems,
Robotics, Social Systems, Economic Systems and others. Of particular value to both
the contributors and the readership are the short publication timeframe and the
world-wide distribution and exposure which enable both a wide and rapid
dissemination of research output.
The series covers the theory, applications, and perspectives on the state of the art
and future developments relevant to systems and networks, decision making, control,
complex processes and related areas, as embedded in the fields of interdisciplinary
and applied sciences, engineering, computer science, physics, economics, social, and
life sciences, as well as the paradigms and methodologies behind them.
Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.

More information about this series at http://www.springer.com/series/15179


H. S. Saini · Rishi Sayal · A. Govardhan ·
Rajkumar Buyya
Editors

Innovations in Computer
Science and Engineering
Proceedings of 8th ICICSE
Editors
H. S. Saini Rishi Sayal
Guru Nanak Institutions Guru Nanak Institutions
Ibrahimpatnam, Telangana, India Ibrahimpatnam, Telangana, India

A. Govardhan Rajkumar Buyya


Jawaharlal Nehru Technological University CLOUDS Laboratory
Hyderabad, Telangana, India The University of Melbourne
Melbourne, VIC, Australia

ISSN 2367-3370 ISSN 2367-3389 (electronic)


Lecture Notes in Networks and Systems
ISBN 978-981-33-4542-3 ISBN 978-981-33-4543-0 (eBook)
https://doi.org/10.1007/978-981-33-4543-0

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore
Pte Ltd. 2021
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface

This volume contains 84 papers that were presented at the Eighth International
Conference on Innovations in Computer Science and Engineering (ICICSE-2020),
held during August 28–29, 2020, at Guru Nanak Institutions, Hyderabad, India, in
collaboration with the Computer Society of India (CSI) and with funding from the
All India Council for Technical Education (AICTE).
The aim of this conference is to provide a vibrant virtual international forum
that brings together researchers, scientists, academicians, corporate professionals
and technically sound students under one roof for a phenomenal, informative and
interactive session, which is acutely needed to pave the way for research
advancements in the field of computer science and engineering.
ICICSE-2020 received more than 400 research papers from various sub-fields of
computer science and engineering. Each submitted paper was meticulously reviewed
by our review committee consisting of senior academicians, industry professionals
and professors from premier institutions and universities.
• This conference was inaugurated and attended by top dignitaries such as Mr. Srini
Santhanam, Vice President, S2 Integrators LLC, Atlanta, Georgia, USA; Dr. A.
Govardhan, Professor and Rector, JNTU, Hyderabad; Dr. M. Manzoor Hussain,
Professor and Registrar, JNTU, Hyderabad; and Mr. Aninda Bose, Senior Editor,
Springer India Pvt. Ltd, India.
• This conference had a fantastic line-up of keynote sessions, webinar sessions
by eminent speakers, and paper presentation sessions to present the latest
outcomes related to advancements in computing technologies.
• The keynote and webinar sessions were conducted on cutting-edge technologies
such as advancements in artificial intelligence, advanced machine learning
techniques, cybersecurity and data science case studies. The invited speakers
were Dr. Sujala Deepak Shetty, Professor, BITS Pilani, Dubai Campus, UAE;
Mr. Kiran Naidu, Data Scientist, AW Rostamani, Dubai, UAE; Dr. G.
Shanmugarathinam, Professor and CISCO Certified Ethical Hacker, Presidency
University, Bengaluru, India; and Dr. B. Sateesh Kumar, Professor, JNTUH,
Hyderabad, India, respectively.


• The organizing committee of ICICSE-2020 takes the opportunity to thank the
invited speakers, session chairs and reviewers for their excellent support in
making ICICSE-2020 a grand success during this unprecedented pandemic time.
• The quality of the research papers is a courtesy of the respective authors and
reviewers, who brought them up to the desired level of excellence. We are indebted
to the program committee members and external reviewers for producing the
best-quality research papers in a short span of time. We also thank the CSI
delegates and AICTE for their valuable suggestions and funding in making this
event a grand success.

Hyderabad, India H. S. Saini
Hyderabad, India Rishi Sayal
Hyderabad, India A. Govardhan
Melbourne, Australia Rajkumar Buyya
Contents

Static and Dynamic Activities Prediction of Human Using Machine
and Deep Learning Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
S. Valai Ganesh, Mohit Agarwal, Suneet Kr. Gupta, and S. Rajakarunakaran
Implementation of Braille Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Tejas Kulkarni, Shikha Jha, Sunny Gupta, and Anuja Gote
Resized MIMO Antenna for 5G Mobile Antenna Applications . . . . . . . . . 19
S. Subramanyam, S. Ashok Kumar, and T. Shanmuganantham
Molecule Four-Port Antenna is Utilizing Detachment Progress
of MIMO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
K. Venugopal Rao, S. Ashok Kumar, and T. Shanmuganantham
IOT-Based Underwater Wireless Communication . . . . . . . . . . . . . . . . . . . . 33
Gitimayee Sahu and Sanjay S. Pawar
Pattern Prediction Using Binary Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
T. Aditya Sai Srinivas, Ramasubbareddy Somula, Karrothu Aravind,
and S. S. Manivannan
Fruit Recognition Using Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
P. Balakesava Reddy, Somula Ramasubbareddy, D. Saidulu, and K. Govinda
Cross-Domain Variational Capsules for Information Extraction . . . . . . . 63
Akash Nagaraj, K. Akhil, Akshay Venkatesh, and H. R. Srikanth
Automotive Accident Severity Prediction Using Machine Learning . . . . . 73
Niva Mohapatra, Shreyanshi Singh, Bhabendu Kumar Mohanta,
and Debasish Jena
Analysis of Quality of Experience (QoE) in Video Streaming Over
Wi-Fi in Real Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
M. Vijayalakshmi and Linganagouda Kulkarni


Self Driven UGV for Military Requirements . . . . . . . . . . . . . . . . . . . . . . . . . 87
Hrishikesh Vichore, Jaishankar Gurumurthi, Akhil Nair,
Mukesh Choudhary, and Leena Ladge
Vehicular Ant Lion Optimization Algorithm (VALOA) for Urban
Traffic Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Ruchika Kumari and Rakesh Kumar
Dynamic and Incremental Update of Mined Association Rules
Against Changes in Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
N. Satyavathi and B. Rama
E-Governance Using Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Poonam Salwan and Veerpaul Kaur Maan
Implementation of Voice Controlled Hot and Cold Water
Dispenser System Using Arduino . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
K. Sateesh Kumar, P. Udaya Bhanu, T. Murali Krishna,
P. Vijay Kumar, and Ch. Saidulu
Future Smart Home Appliances Using IoT . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
Pattlola Srinivas, M. Swami Das, and Y. L. Malathi Latha
Multilingual Crawling Strategies for Information Retrieval
from BRICS Academic Websites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
Shubam Bharti, Shivam Kathuria, Manish Kumar, Rajesh Bhatia,
and Bhavya Chhabra
Missing Phone Activity Detection Using LSTM Classifier . . . . . . . . . . . . . . 161
Abhinav Rastogi, Arijit Das, and Aruna Bhat
Suvarga: Promoting a Healthy Society . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
R. L. Priya, Gayatri Patil, Gaurav Tirodkar, Yash Mate, and Nikhil Nagdev
Multi-task Data Driven Modelling Based on Transfer Learned
Features in Deep Learning for Biomedical Application . . . . . . . . . . . . . . . . 185
N. Harini, B. Ramji, V. Sowmya, Vijay Krishna Menon,
E. A. Gopalakrishnan, V. V. Sajith Variyar, and K. P. Soman
Punjabi Children Speech Recognition System Under Mismatch
Conditions Using Discriminative Techniques . . . . . . . . . . . . . . . . . . . . . . . . . 195
Harshdeep Kaur, Vivek Bhardwaj, and Virender Kadyan
Effective Irrigation Management System for Agriculture Using
Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
S. T. Patil, M. S. Bhosale, and R. M. Kamble
IoT-Based Smart Irrigation System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
Mithilesh Kumar Pandey, Deepak Garg, Neeraj Kumar Agrahari,
and Shivam Singh

A Hybrid Approach for Region-Based Medical Image Compression
with Nature-Inspired Optimization Algorithm . . . . . . . . . . . . . . . . . . . . . . . 225
S. Saravanan and D. Sujitha Juliet
Attention Mechanism-Based News Sentiment Analyzer . . . . . . . . . . . . . . . 235
Sweta Kaman
Interactive Chatbot for COVID-19 Using Cloud and Natural
Language Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
Patel Jaimin, Patel Nehal, and Patel Sandip
Investigating the Performance of MANET Routing Protocols
Under Jamming Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
Protiva Sen and Mostafizur Rahman
Classification of Skin Cancer Lesions Using Deep Neural Networks
and Transfer Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
Danny Joel Devarapalli, Venkata Sai Dheeraj Mavilla,
Sai Prashanth Reddy Karri, Harshit Gorijavolu, and Sri Anjaneya Nimmakuri
Security Features in Hadoop—A Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
Gousiya Begum, S. Zahoor Ul Huq, and A. P. Siva Kumar
Optical Character Recognition and Neural Machine Translation
Using Deep Learning Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
K. Chandra Shekar, Maria Anisha Cross, and Vignesh Vasudevan
COVID-19 Touch Project Using Deep Learning and Computer
Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
Chatla Venkat Rohit and GRS Murthy
Flood Relief Rover for Air and Land Deployment (FRRALD) . . . . . . . . . 297
Jewel Moncy John, Justin Eapen, Jeffin John, Ebin Joseph,
and Abraham K Thomas
An Enhanced Differential Evolution Algorithm with Sorted Dual
Range Mutation Operator to Solve Key Frame Extraction Problem . . . . 307
M. Aathira and G. Jeyakumar
Annotation for Object Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
P. Myna, R. V. Anirudh, Brundha Rajendra Babu,
Eleanor Prashamshini, and Jyothi S. Nayak
Development of Self Governed Flashing System in Automotives
Using AI Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
N. Sankarachelliah, V. Rijith Kumar, P. Senthilram, S. Valai Ganesh,
T. Selva Sundar, S. Godwin Barnabas, and S. Rajakarunakaran
Comparison Between CNN and RNN Techniques for Stress
Detection Using Speech . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
Bageshree Pathak, Snehal Gajbhiye, Aditi Karjole, and Sonali Pawar

Finding the Kth Max Sum Pair in an Array of Distinct Elements
Using Search Space Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
Deepak Ahire, Smriti Bhandari, and Kiran Kamble
Dynamic Trade Flow of Selected Commodities Using Entropy
Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
Sharmin Akter Milu, Javed Hossain, and Ashadun Nobi
An Automated Bengali Text Summarization Technique Using
Lexicon-Based Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
Busrat Jahan, Sheikh Shahparan Mahtab, Md. Faizul Huq Arif,
Ismail Siddiqi Emon, Sharmin Akter Milu, and Md. Julfiker Raju
Location-Based Pomegranate Diseases Prediction Using GPS . . . . . . . . . . 375
Rajshri N. Malage and Mithun B. Patil
Medical Image Enhancement Technique Using Multiresolution
Gabor Wavelet Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
Kapila Moon and Ashok Jetawat
HOMER-Based DES for Techno-Economic Optimization of Grid . . . . . . 393
R. Raja Kishore, D. Jaya Kumar, Dhonvan Srinu, and K. Satyavathi
Odor and Air Quality Detection and Mapping in a Dynamic
Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
Raghunandan Srinath, Jayavrinda Vrindavanam,
Rahul Rajendrakumar Budyal, Y. R. Sumukh, L. Yashaswini,
and Sangeetha S. Chegaraddi
A Comparative Study on the Performance of Bio-inspired
Algorithms on Benchmarking and Real-World Optimization
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411
E. Lakshmi Priya, C. Sai Sreekari, and G. Jeyakumar
A Study on Optimization of Sparse and Dense Linear System
Solver Over GF(2) on GPUs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
Prashant Verma and Kapil Sharma
Intracranial Hemorrhage Detection Using Deep Convolutional
Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
K. Thirunavukkarasu, Anmol Gupta, Satheesh Abimannan,
and Shahnawaz Khan
A Multi-factor Approach for Cloud Security . . . . . . . . . . . . . . . . . . . . . . . . . 437
Francis K. Mupila and Himanshu Gupta
An Innovative Authentication Model for the Enhancement
of Cloud Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
Francis K. Mupila and Himanshu Gupta

Substituting Phrases with Idioms: A Sequence-to-Sequence
Learning Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
Nikhil Anand
A Composite Framework for Implementation of ICT Enabled
Road Accident Prediction Using Spatial Data Analysis . . . . . . . . . . . . . . . . 465
Dara Anitha Kumari and A. Govardhan
VISION AID: Scene Recognition Through Caption Generation
Using Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473
Mathew Regi and Mathews Abraham
Effect of Hybrid Multi-Verse with Whale Optimization Algorithm
on Optimal Inventory Management in Block Chain Technology
with Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
C Govindasamy and A. Antonidoss
Bottleneck Feature Extraction in Punjabi Adult Speech
Recognition System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
Shashi Bala, Virender Kadyan, and Vivek Bhardwaj
A Study of Machine Learning Algorithms in Speech Recognition
and Language Identification System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503
Aakansha Mathur and Razia Sultana
Plant Leaf Disease Detection and Classification Using Machine
Learning Approaches: A Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515
Majji V. Appalanaidu and G. Kumaravelan
Single-Channel Speech Enhancement Based on Signal-to-Residual
Selection Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527
Ramesh Nuthakki, Junaid Abbas, Ayesha Afnan,
Faisal Ahmed Shariff, and Akshaya Hari
Evolutionary Algorithm for Solving Combinatorial
Optimization—A Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
Anisha Radhakrishnan and G. Jeyakumar
Effect of J48 and LMT Algorithms to Classify Movies
in the Web—A Comparative Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 547
Prashant Bhat and Pradnya Malaganve
A System to Create Automated Development Environments
Using Docker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555
N. S. Akhilesh, M. N. Aniruddha, Anirban Ghosh, and K. Sindhu
Novel Methodologies for Processing Structured Big Data Using
Hadoop Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 565
Prashant Bhat and Prajna Hegde

Intelligent Cane for Assistant to Blind and Visual Impairment
People . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 573
Meet Patel, Hemal Ahir, and Falgun Thakkar
A Comprehensive Survey on Attacks and Security Protocols
for VANETs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 583
Aminul Islam, Sudhanshu Ranjan, Arun Pratap Rawat, and Soumayadev Maity
Analysis, Visualization and Prediction of COVID-19 Pandemic
Spread Using Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597
Snigdha Sen, B. K. Thejas, B. L. Pranitha, and I. Amrita
Study of Behavioral Changes and Depression Control Mechanism
Using IoT and VR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605
Pavan Kumar Katkuri and Archana Mantri
Sentiment Analysis on Hindi–English Code-Mixed Social Media
Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615
T. Tulasi Sasidhar, B. Premjith, K. Sreelakshmi, and K. P. Soman
Accident Risk Rating of Streets Using Ensemble Techniques
of Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 623
Akanksha Rastogi and Amrit Lal Sangal
Skin Detection Using YCbCr Colour Space for UAV-Based
Disaster Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 633
S. J. Arya, A. Asish, B. S. Febi Shine, J. L. Sreelakshmi,
and Elizabeth Varghese
Lie Detection Using Thermal Imaging Feature Extraction
from Periorbital Tissue and Cutaneous Muscle . . . . . . . . . . . . . . . . . . . . . . . 643
Prajkta Kodavade, Shivani Bhandigare, Aishwarya Kadam,
Neha Redekar, and Kiran P. Kamble
Voting Classification Method with PCA and K-Means for Diabetic
Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 651
Anupama Yadav, Harsh K. Verma, and Lalit Kumar Awasthi
Hybrid Model for Heart Disease Prediction Using Random Forest
and Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 657
Hemant Kumar Sharma and Amrit Lal Sangal
Detection of Android Malware Using Machine Learning Techniques . . . 663
Sonal Pandey, C. Rama Krishna, Ashu Sharma, and Sanjay Sharma
The Predictive Genetic Algorithm (GA) Load Management
Mechanism for Artificial Intelligence System Implementation (AI) . . . . . 677
T. Pushpatha and S. Nagaprasad

Continuous Recognition of 3D Space Handwriting Using Deep
Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 693
Sagar Maheshwari and Sachin Gajjar
Automated SQL Grading System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 701
Shohna Kanchan, Samruddhi Kalsekar, Nishita Dubey,
Chelsea Fernandes, and Safa Hamdare
Error Analysis with Customer Retention Data . . . . . . . . . . . . . . . . . . . . . . . 709
V. Kaviya, V. Harisankar, and S. Padmavathi
Prediction Based Task Scheduling for Load Balancing in Cloud
Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 719
Suresh Chandra Moharana, Amulya Ratna Swain, and Ganga Bishnu Mund
Test Case Generation Using Adequacy-Based Genetic Algorithm . . . . . . . 727
Ruchika Malhotra and Shivani Pandey
Performance Analysis of π, AL and CT for Consistency
Regularization Using Semi-Supervised Learning . . . . . . . . . . . . . . . . . . . . . 737
Rishita Choubey and Koushik Bhattacharyya
An Energy-Efficient PSO-Based Cloud Scheduling Strategy . . . . . . . . . . . 749
Ranga Swamy Sirisati, M. Vishnu Vardhana Rao, S. Dilli Babu,
and M. V. Narayana
A Pronoun Replacement-Based Special Tagging System for Bengali
Language Processing (BLP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 761
Busrat Jahan, Ismail Siddiqi Emon, Sharmin Akter Milu,
Mohammad Mobarak Hossain, and Sheikh Shahparan Mahtab
A High Performance Pipelined Parallel Generative Adversarial
Network (PipeGAN) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 769
Rithvik Chandan, Niharika Pentapati, Rahul M. Koushik, and Rahul Nagpal
Electroencephalogram-Based Classification of Brain Disorders
Using Artificial Intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 779
Laxmi Raja and R. Santhosh
Parallel Antlion Optimisation (ALO) and Grasshopper
Optimization (GOA) for Travelling Salesman Problem (TSP) . . . . . . . . . . 787
G. R. Dheemanth, V. C. Skanda, and Rahul Nagpal
Design and Development of Machine Learning Model
for Osteoarthritis Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 795
Naidu Srinivas Kiran Babu, E. Madhusudhana Reddy, S. Jayanthi,
and K. Rajkumar

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 803


Editors and Contributors

About the Editors

Dr. H. S. Saini, Managing Director of Guru Nanak Institutions, obtained his Ph.D.
in the field of computer science. He has over 30 years of experience at the
university/college level in teaching UG/PG students and has guided several B.Tech.
and M.Tech. projects and six Ph.D. scholars. He has published/presented above 90
high-quality research papers in international and national journals and proceedings
of international conferences. He has published six books with Springer. He is a
lover of innovation and an advisor for the NBA/NAAC accreditation process to
many institutions in India and abroad. He is chief editor of many innovative
journals and has chaired various international conferences.

Dr. Rishi Sayal, Associate Director, Guru Nanak Institutions Technical Campus,
has completed his B.E. (CSE), M.Tech. (IT) and Ph.D. (CSE). He obtained his
Ph.D. in computer science and engineering in the field of data mining from the
prestigious Mysore University of Karnataka State. He has over 28 years of
experience in training, consultancy, teaching and placements. His current areas of
research interest include data mining, network security and databases. He has
published a wide number of research papers in international conferences and
journals. He has guided many UG and PG research projects, and he is a recipient
of many research grants from government funding agencies. He is co-editor of
various innovative journals and has convened international conferences.

Dr. A. Govardhan is presently Professor of Computer Science and Engineering,
Rector and Executive Council Member at Jawaharlal Nehru Technological
University Hyderabad (JNTUH), India. He did his Ph.D. from JNTUH. He has
25 years of teaching and research experience. He is a member of advisory boards,
academic boards and technical program committees for more than 85 international
and national conferences. He is a member of boards of governors and academic
councils for a number of colleges. He has three monographs and ten book chapters
with Springer, Germany. He has guided 85 Ph.D. theses, 1 M.Phil. and 135 M.Tech.
projects. He has published 555 research papers in international/national
journals/conferences including IEEE, ACM, Springer, Elsevier and Inderscience.
He has delivered more than 100 keynote speeches and invited lectures. He has
chaired 22 sessions at international/national conferences in India and abroad. He
has research projects (completed/ongoing) worth Rs. 1.159 crores.

Dr. Rajkumar Buyya is Redmond Barry Distinguished Professor and Director of
the Cloud Computing and Distributed Systems (CLOUDS) Laboratory at the
University of Melbourne, Australia. He is also serving as Founding CEO of
Manjrasoft Pvt. Ltd., a spin-off company of the university, commercializing its
innovations in cloud computing. He served as Future Fellow of the Australian
Research Council during 2012–2016. He received his Ph.D. from Monash
University, Melbourne, Australia, in 2002. He has authored/co-authored over 625
publications. He has co-authored five textbooks and edited proceedings of over 26
international conferences. He is one of the highly cited authors in computer science
and software engineering (h-index=134, g-index=298 and 95,300+ citations). He
has edited proceedings of over 25 international conferences published by
prestigious organizations, namely the IEEE Computer Society Press and Springer
Verlag.

Contributors

M. Aathira Department of Computer Science and Engineering, Amrita School of
Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
Junaid Abbas Department of Electronics and Communication Engineering, Atria
Institute of Technology, Bengaluru, India
Satheesh Abimannan School of Computer Science and Engineering, Galgotias
University, Greater Noida, India
Mathews Abraham Department of Information Technology, Rajagiri School of
Engineering and Technology, Ernakulam, Kerala, India
T. Aditya Sai Srinivas Computer Science Department, G. Pullaiah College of
Engineering and Technology, Kurnool, India
Ayesha Afnan Department of Electronics and Communication Engineering, Atria
Institute of Technology, Bengaluru, India
Mohit Agarwal Bennett University, Greater Noida, India
Neeraj Kumar Agrahari Department of Computer Application, National Institute
of Technology Kurukshetra, Kurukshetra, India

Hemal Ahir G H Patel College of Engineering and Technology, Vallabh
Vidhyanagar, Gujarat, India
Deepak Ahire Walchand College of Engineering, Sangli, Maharashtra, India
K. Akhil Department of Computer Science, PES University, Bengaluru, India
N. S. Akhilesh BMS College of Engineering, Bangalore, India
I. Amrita Department of CSE, Global Academy of Technology, Bengaluru,
Karnataka, India
Nikhil Anand Internshala, Gurugram, India
M. N. Aniruddha BMS College of Engineering, Bangalore, India
R. V. Anirudh Computer Science and Engineering, B.M.S. College of Engineering,
Basavanagudi, Bangalore, Karnataka, India
A. Antonidoss Department of Computer Science and Engineering, Hindustan
Institute of Technology and Science, Chennai, India
Majji V. Appalanaidu Department of Computer Science, Pondicherry University
Karaikal Campus, Karaikal, Pondicherry, India
Karrothu Aravind Computer Science and Engineering, GMRIT Engineering
College, Razam, India
S. J. Arya Department of Electrical & Electronics Engineering, Mar Baselios
College of Engineering & Technology, Thiruvananthapuram, India
S. Ashok Kumar Jyothishmathi Institute of Technological Sciences, Karimnagar,
India
A. Asish Department of Electrical & Electronics Engineering, Mar Baselios College
of Engineering & Technology, Thiruvananthapuram, India
Lalit Kumar Awasthi Department of Computer Science and Engineering, Dr. B R
Ambedkar National Institute of Technology Jalandhar, Jalandhar, Punjab, India
Brundha Rajendra Babu Computer Science and Engineering, B.M.S. College of
Engineering, Basavanagudi, Bangalore, Karnataka, India
Naidu Srinivas Kiran Babu Department of Computer Applications, Career Point
University, Kota, India
Shashi Bala Chitkara University Institute of Engineering & Technology, Chitkara
University, Rajpura, Punjab, India
P. Balakesava Reddy Information Technology, VNRVJIET, Hyderabad,
Telangana, India
Gousiya Begum CSE Department, JNTU, Ananthapuramu, India

Smriti Bhandari Department of Computer Science and Engineering, Annasaheb
Dange College of Engineering and Technology, Ashta, Maharashtra, India
Shivani Bhandigare Department of Computer Science and Engineering, Walchand
College of Engineering, Sangli, India
Vivek Bhardwaj Chitkara University Institute of Engineering & Technology,
Chitkara University, Rajpura, Punjab, India
Shubam Bharti Department of Computer Science and Engineering, Punjab
Engineering College, Chandigarh, India
Aruna Bhat Department of Computer Science and Engineering, Delhi
Technological University, Delhi, India
Prashant Bhat School of Computational Sciences and Information Technology,
Garden City University, Bengaluru, Karnataka, India
Rajesh Bhatia Department of Computer Science and Engineering, Punjab
Engineering College, Chandigarh, India
Koushik Bhattacharyya Computer Science and Engineering, Dream Institute of
Technology, Kolkata, India
M. S. Bhosale Department of Computer Science and Engineering, TKIET,
Warnanagar, Kolhapur, India
Rahul Rajendrakumar Budyal Department of ECE, Nitte Meenakshi Institute of
Technology, Bengaluru, India
Rithvik Chandan Department of Computer Science and Engineering, PES
University, Bangalore, India
Sangeetha S. Chegaraddi Department of ECE, Nitte Meenakshi Institute of
Technology, Bengaluru, India
Bhavya Chhabra Department of Computer Science and Engineering, SRM
Institute of Science and Technology, Chennai, India
Rishita Choubey Computer Science and Engineering, Dream Institute of
Technology, Kolkata, India
Mukesh Choudhary SIES Graduate School of Technology, Navi Mumbai, India
Maria Anisha Cross GNITC, Hyderabad, Telangana, India
Arijit Das Department of Computer Science and Engineering, Delhi Technological
University, Delhi, India
Danny Joel Devarapalli Department of Computer Science and Engineering,
Vignan Institute of Technology and Science, Hyderabad, Telangana, India
G. R. Dheemanth Department of Computer Science and Engineering, PES
University, Bengaluru, India

S. Dilli Babu Department of CSE, Vignan’s Institute of Management and
Technology for Women, Hyderabad, India
Nishita Dubey Department of Computer Engineering, St. Francis Institute of
Technology, Mumbai, India
Justin Eapen Faculty, Department of ECE, Saintgits College of Engineering,
Kottayam, Kerala, India
Ismail Siddiqi Emon Department of CSE, Feni University, Feni, Bangladesh
Md. Faizul Huq Arif Department of ICT(DoICT), ICT Division, Dhaka,
Bangladesh
B. S. Febi Shine Department of Electrical & Electronics Engineering, Mar Baselios
College of Engineering & Technology, Thiruvananthapuram, India
Chelsea Fernandes Department of Computer Engineering, St. Francis Institute of
Technology, Mumbai, India
Snehal Gajbhiye Department of Electronics and Telecommunications, MKSSS’s
Cummins College of Engineering for Women, Pune, India
Sachin Gajjar Department of Electronics and Communication Engineering, Nirma
University, Ahmedabad, Gujarat, India
Deepak Garg Department of Computer Application, National Institute of
Technology Kurukshetra, Kurukshetra, India
Anirban Ghosh BMS College of Engineering, Bangalore, India
S. Godwin Barnabas Ramco Institute of Technology, Rajapalayam, Tamil Nadu,
India
E. A. Gopalakrishnan Center for Computational Engineering & Networking
(CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India
Harshit Gorijavolu Department of Computer Science and Engineering, Vignan
Institute of Technology and Science, Hyderabad, Telangana, India
Anuja Gote Department of Information Technology, Vidyalankar Institute of
Technology, Mumbai, India
A. Govardhan JNTUH, Hyderabad, India
K. Govinda SCOPE, VIT University, Vellore, Tamilnadu, India
C Govindasamy Department of Computer Science and Engineering, Hindustan
Institute of Technology and Science, Chennai, India
Anmol Gupta School of Computer Science and Engineering, Galgotias University,
Greater Noida, India
Himanshu Gupta Amity University, Noida, Uttar Pradesh, India

Suneet Kr. Gupta Bennett University, Greater Noida, India


Sunny Gupta Department of Information Technology, Vidyalankar Institute of
Technology, Mumbai, India
Jaishankar Gurumurthi SIES Graduate School of Technology, Navi Mumbai,
India
Safa Hamdare Department of Computer Engineering, St. Francis Institute of
Technology, Mumbai, India
Akshaya Hari Department of Electronics and Communication Engineering, Atria
Institute of Technology, Bengaluru, India
N. Harini Center for Computational Engineering & Networking (CEN), Amrita
Vishwa Vidyapeetham, Coimbatore, India
V. Harisankar Department of Computer Science and Engineering, Amrita School
of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Coimbatore, India
Prajna Hegde School of Computational Sciences and Information Technology,
Garden City University, Bengaluru, Karnataka, India
Javed Hossain Department of Computer Science and Telecommunication
Engineering(CSTE), Noakhali Science and Technology University, Sonapur,
Noakhali, Bangladesh
Mohammad Mobarak Hossain Department of CSE, Asian University of
Bangladesh, Dhaka, Bangladesh
S. Zahoor Ul Huq CSE Department, GPREC, Kurnool, India
Aminul Islam Department of Information Technology, Indian Institute of
Information Technology Allahabad, Prayagraj, India
Busrat Jahan Department of CSE, Feni University, Feni, Bangladesh
Patel Jaimin Smt. K D Patel Department of Information Technology, Chandubhai
S. Patel Institute of Technology (CSPIT), Faculty of Technology & Engineering
(FTE), Charotar University of Science and Technology (CHARUSAT), Changa,
Gujarat, India
D. Jaya Kumar Department of ECE, Marri Laxman Reddy Institute of Technology
and Management, Hyderabad, India
S. Jayanthi Department of IT, Guru Nanak Institute of Technology, Hyderabad,
India
Debasish Jena Department of Computer Science and Engineering, IIIT
Bhubaneswar, Bhubaneswar, Odisha, India
Ashok Jetawat Faculty of Engineering, Pacific Academy of Higher Education and
Research University, Udaipur, India

G. Jeyakumar Department of Computer Science and Engineering, Amrita School
of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
Shikha Jha Department of Information Technology, Vidyalankar Institute of
Technology, Mumbai, India
Jeffin John Department of ECE, Saintgits College of Engineering, Kottayam,
Kerala, India
Jewel Moncy John Department of ECE, Saintgits College of Engineering,
Kottayam, Kerala, India
Ebin Joseph Department of ECE, Saintgits College of Engineering, Kottayam,
Kerala, India
Md. Julfiker Raju Department of CSE, Feni University, Feni, Bangladesh
D. Sujitha Juliet Department of Computer Science and Engineering, Karunya
Institute of Technology and Sciences, Coimbatore, India
Aishwarya Kadam Department of Computer Science and Engineering, Walchand
College of Engineering, Sangli, India
Virender Kadyan Department of Informatics, School of Computer Science,
University of Petroleum and Energy Studies, Dehradun, India
Samruddhi Kalsekar Department of Computer Engineering, St. Francis Institute
of Technology, Mumbai, India
Sweta Kaman Department of Science of Intelligence, IIT Jodhpur, Karwar, India
Kiran Kamble Department of Computer Science and Engineering, Walchand
College of Engineering, Sangli, Maharashtra, India
Kiran P. Kamble Department of Computer Science and Engineering, Walchand
College of Engineering, Sangli, India
R. M. Kamble Department of Computer Science and Engineering, ADCET,
ASTHA, Ashta, Kolhapur, India
Shohna Kanchan Department of Computer Engineering, St. Francis Institute of
Technology, Mumbai, India
Aditi Karjole Department of Electronics and Telecommunications, MKSSS’s
Cummins College of Engineering for Women, Pune, India
Sai Prashanth Reddy Karri Department of Computer Science and Engineering,
Vignan Institute of Technology and Science, Hyderabad, Telangana, India
Shivam Kathuria Department Electrical Engineering, Punjab Engineering
College, Chandigarh, India
Pavan Kumar Katkuri Chitkara University Institute of Engineering and
Technology, Punjab, India

Harshdeep Kaur Chitkara University, Institute of Engineering and Technology,
Chitkara University, Punjab, India
V. Kaviya Department of Computer Science and Engineering, Amrita School of
Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Coimbatore, India
Shahnawaz Khan Department of Information Technology, University College of
Bahrain, Saar, Bahrain
Prajkta Kodavade Department of Computer Science and Engineering, Walchand
College of Engineering, Sangli, India
Rahul M. Koushik Department of Computer Science and Engineering, PES
University, Bangalore, India
Vijay Krishna Menon Center for Computational Engineering & Networking
(CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India
Linganagouda Kulkarni KLE Technological University, Hubli, India
Tejas Kulkarni Department of Information Technology, Vidyalankar Institute of
Technology, Mumbai, India
Manish Kumar Department of Computer Science and Engineering, Punjab
Engineering College, Chandigarh, India
Bhabendu Kumar Mohanta Department of Computer Science and Engineering,
IIIT Bhubaneswar, Bhubaneswar, Odisha, India
G. Kumaravelan Department of Computer Science, Pondicherry University
Karaikal Campus, Karaikal, Pondicherry, India
Rakesh Kumar Department of CSE, CUH Mahendergarh, Mahendergarh,
Haryana, India
Dara Anitha Kumari Department of Computer Science, JNTUH, Hyderabad,
India
Ruchika Kumari Department of CSE, NITTTR, Chandigarh, India
Leena Ladge SIES Graduate School of Technology, Navi Mumbai, India
E. Lakshmi Priya Department of Computer Science and Engineering, Amrita
School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
Veerpaul Kaur Maan Giani Zail Singh Punjab Technical University, Bathinda,
Punjab, India
Sagar Maheshwari Department of Electronics and Communication Engineering,
Nirma University, Ahmedabad, Gujarat, India
Sheikh Shahparan Mahtab Department of EEE, Feni University, Feni,
Chittagong Division, Bangladesh

Soumayadev Maity Department of Information Technology, Indian Institute of
Information Technology Allahabad, Prayagraj, India
Pradnya Malaganve Department Computational Science and IT, Garden City
University, Bengaluru, India
Rajshri N. Malage Department of CSE, N K Orchid College of Engineering and
Technology Solapur, Solapur, India
Y. L. Malathi Latha Department of CSE, Swami Vivekananda Institute of
Technology, Secunderabad, Telangana State, India
Ruchika Malhotra Department of Computer Science and Engineering, Delhi
Technological University, New Delhi, India
S. S. Manivannan SCOPE, VIT University, Vellore, India
Archana Mantri Chitkara University Institute of Engineering and Technology,
Punjab, India
Yash Mate Computer Department, Vivekanand Education Society’s Education of
Society Chembur, Chembur, Mumbai, India
Aakansha Mathur Department of Computer Science, BITS Pilani, Dubai, United
Arab Emirates
Venkata Sai Dheeraj Mavilla Department of Computer Science and Engineering,
Vignan Institute of Technology and Science, Hyderabad, Telangana, India
Sharmin Akter Milu Department of Computer Science and Telecommunication
Engineering(CSTE), Noakhali Science and Technology University, Sonapur,
Noakhali, Bangladesh
Niva Mohapatra Department of Computer Science and Engineering, IIIT
Bhubaneswar, Bhubaneswar, Odisha, India
Suresh Chandra Moharana KIIT Deemed to be University, Bhubaneswar,
Odisha, India
Kapila Moon Department of Electronics Engineering, Ramrao Adik Institute of
Technology, Navi Mumbai, India
Ganga Bishnu Mund KIIT Deemed to be University, Bhubaneswar, Odisha, India
Francis K. Mupila Amity University, Noida, Uttar Pradesh, India
T. Murali Krishna Department of ECE, Vignan’s Lara Institute of Technology and
Science, Vadlamudi, AP, India
GRS Murthy Department of Computer Science and Engineering, Avanthi Institute
of Engineering and Technology, Vizianagaram, Andhra Pradesh, India
P. Myna Computer Science and Engineering, B.M.S. College of Engineering,
Basavanagudi, Bangalore, Karnataka, India

S. Nagaprasad Department of M.C.A., St. Ann’s College, Mehdipatnam,
Hyderabad, Telangana, India;
Faculty of CS and CA, Tara Govt. College (A), Sangareddy, Telangana, India
Akash Nagaraj Department of Computer Science, PES University, Bengaluru,
India
Nikhil Nagdev Computer Department, Vivekanand Education Society’s Education
of Society Chembur, Chembur, Mumbai, India
Rahul Nagpal Department of Computer Science and Engineering, PES University,
Bengaluru, India
Akhil Nair SIES Graduate School of Technology, Navi Mumbai, India
M. V. Narayana Department of CSE, Guru Nanak Institutions Technical Campus,
Hyderabad, India
Jyothi S. Nayak Computer Science and Engineering, B.M.S. College of
Engineering, Basavanagudi, Bangalore, Karnataka, India
Patel Nehal Smt. K D Patel Department of Information Technology, Chandubhai S.
Patel Institute of Technology (CSPIT), Faculty of Technology & Engineering (FTE),
Charotar University of Science and Technology (CHARUSAT), Changa, Gujarat,
India
Sri Anjaneya Nimmakuri Department of Computer Science and Engineering,
Vignan Institute of Technology and Science, Hyderabad, Telangana, India
Ashadun Nobi Department of Computer Science and Telecommunication
Engineering(CSTE), Noakhali Science and Technology University, Sonapur,
Noakhali, Bangladesh
Ramesh Nuthakki Department of Electronics and Communication Engineering,
Atria Institute of Technology, Bengaluru, India
S. Padmavathi Department of Computer Science and Engineering, Amrita School
of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Coimbatore, India
Mithilesh Kumar Pandey Department of Computer Application, National
Institute of Technology Kurukshetra, Kurukshetra, India
Shivani Pandey Department of Computer Science and Engineering, Delhi
Technological University, New Delhi, India
Sonal Pandey NITTTR Chandigarh, Chandigarh, India
Meet Patel G H Patel College of Engineering and Technology, Vallabh
Vidhyanagar, Gujarat, India
Bageshree Pathak Department of Electronics and Telecommunications, MKSSS’s
Cummins College of Engineering for Women, Pune, India

Gayatri Patil Computer Department, Vivekanand Education Society’s Education
of Society Chembur, Chembur, Mumbai, India
Mithun B. Patil Department of CSE, N K Orchid College of Engineering and
Technology Solapur, Solapur, India
S. T. Patil Department of CSE, Sanjay Ghodawat University, Kolhapur, India
Sanjay S. Pawar Department of EXTC, UMIT, SNDT Women’s University,
Mumbai, India
Sonali Pawar Department of Electronics and Telecommunications, MKSSS’s
Cummins College of Engineering for Women, Pune, India
Niharika Pentapati Department of Computer Science and Engineering, PES
University, Bangalore, India
B. L. Pranitha Department of CSE, Global Academy of Technology, Bengaluru,
Karnataka, India
Eleanor Prashamshini Computer Science and Engineering, B.M.S. College of
Engineering, Basavanagudi, Bangalore, Karnataka, India
B. Premjith Computational Engineering and Networking (CEN), Amrita School of
Engineering, Amrita Vishwa Vidyappetham, Coimbatore, India
R. L. Priya Computer Department, Vivekanand Education Society’s Education of
Society Chembur, Chembur, Mumbai, India
T. Pushpatha Department of M.C.A., St. Ann’s College, Mehdipatnam, Hyderabad,
Telangana, India;
Faculty of CS and CA, Tara Govt. College (A), Sangareddy, Telangana, India
Anisha Radhakrishnan Department of Computer Science and Engineering,
Amrita School of Engineering, Coimbatore, India
Mostafizur Rahman Department of Electronics and Communication Engineering,
Khulna University of Engineering and Technology, Khulna, Bangladesh
Laxmi Raja Department of CSE, Faculty of Engineering, Karpagam Academy of
Higher Education, Coimbatore, India
R. Raja Kishore Department of ECE, Marri Laxman Reddy Institute of
Technology and Management, Hyderabad, India
S. Rajakarunakaran Ramco Institute of Technology, Rajapalayam, Tamil Nadu,
India
K. Rajkumar School of Computer Science and Information Technology, DMI-St
John the Baptist University, Mangochi, Malawi
B. Rama Department of CS, Kakatiya University, Warangal, Telangana, India
C. Rama Krishna NITTTR Chandigarh, Chandigarh, India

Somula Ramasubbareddy Information Technology, VNRVJIET, Hyderabad,
Telangana, India
B. Ramji Center for Computational Engineering & Networking (CEN), Amrita
Vishwa Vidyapeetham, Coimbatore, India
Sudhanshu Ranjan Department of Information Technology, Indian Institute of
Information Technology Allahabad, Prayagraj, India
Abhinav Rastogi Department of Computer Science and Engineering, Delhi
Technological University, Delhi, India
Akanksha Rastogi Department of Computer Science and Engineering, Dr. B R
Ambedkar National Institute of Technology, Jalandhar, Punjab, India
Arun Pratap Rawat Department of Information Technology, Indian Institute of
Information Technology Allahabad, Prayagraj, India
E. Madhusudhana Reddy Department of CSE, Guru Nanak Institutions Technical
Campus, Hyderabad, India
Neha Redekar Department of Computer Science and Engineering, Walchand
College of Engineering, Sangli, India
Mathew Regi Department of Information Technology, Rajagiri School of
Engineering and Technology, Ernakulam, Kerala, India
V. Rijith Kumar Ramco Institute of Technology, Rajapalayam, Tamil Nadu, India
Chatla Venkat Rohit Department of School of Computing, Sastra University,
Thanjavur, Tamil Nadu, India
Gitimayee Sahu Department of EXTC, UMIT, Juhu, Mumbai, India
C. Sai Sreekari Department of Computer Science and Engineering, Amrita School
of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
Ch. Saidulu Department of ECE, Vignan’s Lara Institute of Technology and
Science, Vadlamudi, AP, India
D. Saidulu Information Technology, Guru Nanak Institutions Technical Campus,
Hyderabad, Telangana, India
V. V. Sajith Variyar Center for Computational Engineering & Networking (CEN),
Amrita Vishwa Vidyapeetham, Coimbatore, India
Poonam Salwan I.K. Gujral Punjab Technical University, Jalandhar, Punjab, India
Patel Sandip Smt. K D Patel Department of Information Technology, Chandubhai
S. Patel Institute of Technology (CSPIT), Faculty of Technology & Engineering
(FTE), Charotar University of Science and Technology (CHARUSAT), Changa,
Gujarat, India

Amrit Lal Sangal Department of Computer Science and Engineering, Dr. B R
Ambedkar National Institute of Technology Jalandhar, Jalandhar, Punjab, India
N. Sankarachelliah Ramco Institute of Technology, Rajapalayam, Tamil Nadu,
India
R. Santhosh Department of CSE, Faculty of Engineering, Karpagam Academy of
Higher Education, Coimbatore, India
S. Saravanan Department of Computer Science and Engineering, Karunya Institute
of Technology and Sciences, Coimbatore, India
K. Sateesh Kumar Department of ECE, Vignan’s Lara Institute of Technology and
Science, Vadlamudi, AP, India
K. Satyavathi Department of ECE, Nalla Malla Reddy Engineering College,
Hyderabad, India
N. Satyavathi Department of CSE, JNTUH, Hyderabad, Telangana, India
T. Selva Sundar Ramco Institute of Technology, Rajapalayam, Tamil Nadu, India
Protiva Sen Department of Electronics and Communication Engineering, Khulna
University of Engineering and Technology, Khulna, Bangladesh
Snigdha Sen Department of CSE, Global Academy of Technology, Bengaluru,
Karnataka, India
P. Senthilram Ramco Institute of Technology, Rajapalayam, Tamil Nadu, India
T. Shanmuganantham Pondicherry University, Puducherry, India
Faisal Ahmed Shariff Department of Electronics and Communication
Engineering, Atria Institute of Technology, Bengaluru, India
Ashu Sharma Mindtree Hyderabad, Hyderabad, India
Hemant Kumar Sharma Department of Computer Science and Engineering, Dr. B
R Ambedkar National Institute of Technology Jalandhar, Jalandhar, Punjab, India
Kapil Sharma Department of Information Technology, Delhi Technological
University, New Delhi, Delhi, India
Sanjay Sharma C3i, IIT Kanpur, Kanpur, India
K. Chandra Shekar JNTUH, Hyderabad, Telangana, India
K. Sindhu BMS College of Engineering, Bangalore, India
Shivam Singh Department of Computer Application, National Institute of
Technology Kurukshetra, Kurukshetra, India
Shreyanshi Singh Department of Computer Science and Engineering, IIIT
Bhubaneswar, Bhubaneswar, Odisha, India

Ranga Swamy Sirisati Department of CSE, Vignan’s Institute of Management and
Technology for Women, Hyderabad, India
A. P. Siva Kumar MGIT, Hyderabad, India
V. C. Skanda Department of Computer Science and Engineering, PES University,
Bengaluru, India
K. P. Soman Computational Engineering and Networking (CEN), Amrita School
of Engineering, Amrita Vishwa Vidyappetham, Coimbatore, India
Ramasubbareddy Somula Information Technology, VNRVJIET, Hyderabad,
India
V. Sowmya Center for Computational Engineering & Networking (CEN), Amrita
Vishwa Vidyapeetham, Coimbatore, India
J. L. Sreelakshmi Department of Electrical & Electronics Engineering, Mar
Baselios College of Engineering & Technology, Thiruvananthapuram, India
K. Sreelakshmi Computational Engineering and Networking (CEN), Amrita
School of Engineering, Amrita Vishwa Vidyappetham, Coimbatore, India
H. R. Srikanth Department of Computer Science, PES University, Bengaluru,
India
Raghunandan Srinath SenZopt Technologies, Bengaluru, India
Pattlola Srinivas Department of CSE, Malla Reddy Engineering College
(Autonomous), Hyderabad, Telangana State, India
Dhonvan Srinu Department of ECE, Marri Laxman Reddy Institute of Technology
and Management, Hyderabad, India
S. Subramanyam Jyothishmathi Institute of Technological Sciences, Karimnagar,
India
Razia Sultana Department of Computer Science, BITS Pilani, Dubai, United Arab
Emirates
Y. R. Sumukh Department of ECE, Nitte Meenakshi Institute of Technology,
Bengaluru, India
Amulya Ratna Swain KIIT Deemed to be University, Bhubaneswar, Odisha, India
M. Swami Das Department of CSE, Malla Reddy Engineering College
(Autonomous), Hyderabad, Telangana State, India
Falgun Thakkar G H Patel College of Engineering and Technology, Vallabh
Vidhyanagar, Gujarat, India
B. K. Thejas Department of CSE, Global Academy of Technology, Bengaluru,
Karnataka, India

K. Thirunavukkarasu School of Computer Science and Engineering, Galgotias
University, Greater Noida, India
Abraham K Thomas Department of ECE, Saintgits College of Engineering,
Kottayam, Kerala, India
Gaurav Tirodkar Computer Department, Vivekanand Education Society’s
Education of Society Chembur, Chembur, Mumbai, India
T. Tulasi Sasidhar Computational Engineering and Networking (CEN), Amrita
School of Engineering, Amrita Vishwa Vidyappetham, Coimbatore, India
P. Udaya Bhanu Department of ECE, Vignan’s Lara Institute of Technology and
Science, Vadlamudi, AP, India
S. Valai Ganesh Ramco Institute of Technology, Rajapalayam, Tamil Nadu, India
Elizabeth Varghese Department of Electrical & Electronics Engineering, Mar
Baselios College of Engineering & Technology, Thiruvananthapuram, India
Vignesh Vasudevan NIT, Trichy, Tamil Nadu, India
Akshay Venkatesh Department of Computer Science, PES University, Bengaluru,
India
K. Venugopal Rao Jyothishmathi Institute of Technological Sciences, Karimnagar,
India
Harsh K. Verma Department of Computer Science and Engineering, Dr. B R
Ambedkar National Institute of Technology Jalandhar, Jalandhar, Punjab, India
Prashant Verma Department of Information Technology, Delhi Technological
University, New Delhi, Delhi, India
Hrishikesh Vichore SIES Graduate School of Technology, Navi Mumbai, India
P. Vijay Kumar Department of ECE, Vignan’s Lara Institute of Technology and
Science, Vadlamudi, AP, India
M. Vijayalakshmi KLE Technological University, Hubli, India
M. Vishnu Vardhana Rao Department of CSE, Vignan’s Institute of Management
and Technology for Women, Hyderabad, India
Jayavrinda Vrindavanam Department of ECE, Nitte Meenakshi Institute of
Technology, Bengaluru, India
Anupama Yadav Department of Computer Science and Engineering, Dr. B R
Ambedkar National Institute of Technology Jalandhar, Jalandhar, Punjab, India
L. Yashaswini Department of ECE, Nitte Meenakshi Institute of Technology,
Bengaluru, India
Static and Dynamic Activities Prediction
of Human Using Machine and Deep
Learning Models

S. Valai Ganesh, Mohit Agarwal, Suneet Kr. Gupta, and S. Rajakarunakaran

Abstract Recent advancements in smart phones and computing technologies have
played a vital role in people’s lives. The major work of this paper is to develop a
model that detects basic human dynamic activities, such as ambling, climbing stairs
and coming down the stairs, and basic static activities, such as sitting, standing or
laying, using the person’s smart phone and computer. The results of conventional
machine learning models such as logistic regression, SVC and decision trees are
compared with those of a recurrent deep neural network model, the Long Short-Term
Memory (LSTM) network. The LSTM is proposed to detect human behavior based
on the Human Activity Recognition (HAR) dataset. The data are monitored and
recorded with the aid of sensors, namely the accelerometer and gyroscope, in the
user’s smart phone. The HAR dataset is collected from 30 persons performing
different activities with a smart phone attached to their waists. The model is
evaluated with respect to accuracy and efficiency. The designed activity recognition
system can be applied to other tasks such as predicting abnormal human actions or
diseases from human actions. The overall accuracy improved to 95.40%.

Keywords Human activity recognition · LSTM · Sensors · Smart phones · Recurrent neural network · Gyroscope · Accelerometer

S. Valai Ganesh (B) · S. Rajakarunakaran


Ramco Institute of Technology, Rajapalayam, Tamil Nadu 626117, India
e-mail: valaiganesh@ritrjpm.ac.in
S. Rajakarunakaran
e-mail: rajakarunakaran@ritrjpm.ac.in
M. Agarwal · S. Kr. Gupta
Bennett University, Greater Noida, India
e-mail: ma8573@bennett.edu.in
S. Kr. Gupta
e-mail: suneet.gupta@bennett.edu.in


1 Introduction

Human behaviour detection from static and dynamic motions is a recent technology that identifies human activities through computer and smartphone systems. A typical dataset for this task is the 'Activity Recognition Using Smart Phones' dataset available on the Internet. Input data can be taken from several kinds of devices, such as sensors for capturing images, recording audio, and monitoring pressure, orientation and acceleration. The rapid development of communication between humans and computers and between humans and smartphones makes it possible to identify human activities in many settings. More importantly, the recent introduction of GPUs and deep learning algorithms [1, 2] has enabled human behaviour detection applications in areas such as athletic competition, smart home automation, and health care or monitoring for elderly people.
Currently, two types of methods are available for detecting human behaviour: the first uses live images of human activity, and the second uses wearable sensors [3, 9]. Using the gyroscope and accelerometer in a smartphone [5], acceleration and orientation data with several variations are recorded. Accelerometer [4, 6] and gyroscope readings were taken from 30 volunteers (referred to as subjects) while performing static activities such as sitting, standing and lying, and dynamic activities such as walking, walking upstairs and walking downstairs. Accelerometer readings are divided into gravity acceleration and body acceleration components, each three-dimensional in nature. Each sensor signal is preprocessed with noise filters.
The remainder of the article is organized as follows. Section 2 discusses related work by the research community. The dataset used in the proposed work is described in Sect. 3. Sections 4 and 5 discuss the machine learning models and the LSTM model, respectively. The experimental results are presented in Sect. 6. The article concludes in Sect. 7 with a comparison of the machine learning models' accuracy against the deep learning model's accuracy, along with possible extensions of the work using new deep learning models.

2 Related Work

Bayat et al. [4] developed two models for detecting human activities, named "in-hand" and "in-pocket". Six activities are detected: fast walking, slow walking, running, stairs-up, stairs-down and dancing. A tri-axial accelerometer is used to detect the activities. Six classification methods are adopted and their results compared; a testing accuracy of up to 91.15% is achieved on everyday activities using the accelerometer.

Bulbul et al. [5] predict human behaviour from smartphones using deep learning models. Accelerometer and gyroscope sensors are used to capture the behaviours. The dataset contains recordings of nine individuals performing three dynamic activities (walking, climbing up stairs and climbing down stairs) and three static activities (sitting, standing and lying). Input data are sampled at a frequency of 50 Hz, and the signals are received and saved for every segment. The models are first trained with 80% of the dataset and tested with the remaining 20%, and are developed, observed and tested using fivefold cross-validation. Various conventional machine learning classifiers, such as decision trees and SVM, were used in this work.
Attal et al. [3] provide an overview of techniques for detecting human behaviour from wearable inertial sensing units. Sensors were located at different positions on the human body; in particular, the sensing devices were placed at the lumbar region and the waist to detect various static and dynamic activities. A sequence of forty tasks was chosen for the study. Among the conventional machine learning classification techniques used, k-Nearest Neighbours produced the best accuracy.

3 HAR Dataset

Accelerometer and gyroscope readings were taken from 30 human beings (referred to as subjects) while performing the following six classes (labels): the static classes standing, sitting and lying, and the dynamic classes walking, walking upstairs and walking downstairs.
Accelerometer outputs are separated into two components, gravity acceleration and body acceleration, each with three axes x, y and z. Gyroscope readings represent the angular velocities for the three dynamic activities. Jerk signals are derived from the body acceleration readings, and Fourier transforms are applied to these time-domain readings to compute frequency-domain features. In total, 561 features are available in the dataset, and each window of readings forms one data point with these 561 features. The 30 subjects are randomly split, with 70% (21 subjects) used for training and the rest for testing. Each data point corresponds to one of the six activity classes.
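For illustration, a minimal sketch of loading these 561-feature vectors is shown below; the file layout (train/X_train.txt, train/y_train.txt, test/X_test.txt, test/y_test.txt) follows the publicly available "Human Activity Recognition Using Smartphones" dataset and is an assumption about how the data was accessed.

```python
# Minimal loading sketch, assuming the standard UCI HAR file layout
# (train/X_train.txt, train/y_train.txt, test/X_test.txt, test/y_test.txt).
import numpy as np

def load_split(split):
    X = np.loadtxt(f"{split}/X_{split}.txt")             # (n_samples, 561) feature vectors
    y = np.loadtxt(f"{split}/y_{split}.txt", dtype=int)   # activity labels 1..6
    return X, y

X_train, y_train = load_split("train")
X_test, y_test = load_split("test")
print(X_train.shape, X_test.shape)
```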

4 Machine Learning Models

The HAR dataset was initially modelled using conventional machine learning methods: Logistic Regression, Linear SVC, SVM, Random Forest, Decision Tree and Gradient Boosting. The results of these models are reported in the experimental results section.
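As an illustration of this comparison, a minimal sketch is given below, assuming the 561-feature matrices X_train/X_test and integer labels y_train/y_test from the previous section are already in memory; the authors' exact hyperparameters are not reported, so scikit-learn defaults are used.

```python
# Sketch of the conventional-model comparison; default hyperparameters are assumed.
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC, SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear SVC": LinearSVC(max_iter=5000),
    "SVM (RBF)": SVC(),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
    "Gradient Boosting": GradientBoostingClassifier(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name,
          "accuracy=%.4f" % accuracy_score(y_test, pred),
          "f1=%.4f" % f1_score(y_test, pred, average="weighted"))
```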

Fig. 1 Precision and recall comparison of machine learning methods

Fig. 2 Accuracy and F1-score comparison of machine Learning methods

The precision results of all machine learning models are compared in Fig. 1. The Linear SVC and SVM models produced almost the same values, whereas the Decision Tree model produced the lowest value. The recall comparison shows that Linear SVC produced somewhat better values than the other machine learning models.
Similarly, the other two metrics, accuracy and F1-score, are compared in the bar chart of Fig. 2. The Linear SVC model provides better accuracy than the other five machine learning models, while the Decision Tree provides the lowest accuracy. Linear SVC also shows the best F1-score of all the models. Overall, among the six machine learning models, Linear SVC produced the best results, the Decision Tree produced the worst, and SVM provided decent results.

5 Deep Learning Model-LSTM

The Long Short-Term Memory (LSTM) model is selected in this work. LSTMs are able to learn long-term dependencies and work extremely well for sequential modelling, that is, predicting what comes next in a sequence. Problems such as vanishing and exploding gradients normally occur during back-propagation through time (BPTT); the LSTM largely overcomes these problems. LSTMs have a chain-like structure, but the repeating module differs from that of a plain recurrent network: it contains four interacting layers [7] equipped with pointwise operations and activation functions such as the sigmoid or hyperbolic tangent. In the usual diagrams, merging lines indicate concatenation, forking lines denote copying, and the copied content is fed to different parts of the interacting layers. The LSTM is able to forget information or append incoming data to the cell state by means of structures called gates (Fig. 3).

Fig. 3 Overview of LSTM model
Two additional packages are used with the LSTM: Hyperopt and Hyperas. Hyperopt is an open-source Python library for optimization over serial and parallel search spaces, which may include real-valued, discrete and conditional dimensions. Hyperas is a wrapper around Hyperopt for performing hyperparameter optimization of Keras models.
The softmax activation function is used here to predict the six classes. Softmax provides a probability distribution over the outputs and is used when multiple classes are involved. In this work there are six class labels, and the softmax function assigns a probability to each class; the output is predicted as the class with the highest probability. The softmax function is normally placed in the final layer of a classification problem.
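For reference, the softmax function maps the six output scores z_1, ..., z_6 to class probabilities as

```latex
p_i = \frac{e^{z_i}}{\sum_{j=1}^{6} e^{z_j}}, \qquad i = 1, \dots, 6,
```

and the predicted activity is the class with the largest p_i.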

6 Experimental Results

In this work, human activities are predicted from movement data. The experiments were performed with Python version 3. The work started with a single LSTM layer, was then expanded to a two-layer LSTM, and finally to an LSTM with hyperparameter search over 15 evaluations. The outcomes of the LSTM models are shown in Tables 1, 2 and 3 (Fig. 4).
The same HAR dataset is trained and tested with the deep learning models: a single-layer LSTM, a two-layer LSTM, and an LSTM with tuned hyperparameters. The single-layer LSTM produced a validation accuracy of around 92.40%, the two-layer LSTM slightly improved this to around 92.43%, and the LSTM tuned with the Hyperopt modules reached around 95.40% accuracy. The Linear SVC model and the LSTM with

Table 1 LSTM single layer output results—Model-I


Classifier            Output shape   Parameters   Epochs   Accuracy
lstm_1 (LSTM)         (None, 64)     18,944       50       92.40%
Dropout_1 (dropout)   (None, 64)     0
Dense_1 (dense)       (None, 6)      342

Table 2 LSTM two layer output results—Model-II


Classifier            Output shape      Parameters   Epochs   Accuracy
lstm_2 (LSTM)         (None, 128, 64)   18,944       50       92.43%
Dropout_2 (dropout)   (None, 128, 64)   0
lstm_3 (LSTM)         (None, 56)        27,104
Dropout_3 (dropout)   (None, 56)        0
Dense_2 (dense)       (None, 6)         342

Table 3 LSTM hyperparameters—Model-III


Classifier            Output shape   Parameters   Epochs   Evaluations   Accuracy
lstm_4 (LSTM)         (None, 32)     5376         30       15            95.40%
Dropout_4 (dropout)   (None, 32)     0
Dense_3 (dense)       (None, 6)      198

Fig. 4 a LSTM validation single layer; b LSTM validation two layer; c LSTM validation above
two layer

tuned hyperparameters produce almost the same validation results. In upcoming work, a convolutional neural network (CNN)-based model and a ResNet-based model will be developed and checked for their accuracy. By adopting new deep learning models, many more labels of human activity can be predicted under both normal and abnormal human conditions.
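For reference, a minimal Keras sketch of the best-performing configuration (Model-III in Table 3) is given below. The input shape of 128 time steps with 9 inertial channels and the dropout rate of 0.5 are assumptions; the reported parameter counts (5376 for the LSTM layer and 198 for the dense layer) are consistent with 9 input channels and 32 LSTM units.

```python
# Sketch of the Model-III architecture (Table 3); input shape and dropout rate assumed.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

model = Sequential([
    LSTM(32, input_shape=(128, 9)),    # 4*((9 + 32 + 1) * 32) = 5376 parameters
    Dropout(0.5),                      # dropout rate not reported; 0.5 assumed
    Dense(6, activation="softmax"),    # 32*6 + 6 = 198 parameters
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(X_raw_train, y_train_onehot, epochs=30, batch_size=64,
#           validation_data=(X_raw_test, y_test_onehot))
```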

7 Conclusion and Future Work

Smartphone applications are no longer limited to communications and networking. Various deep learning models are gradually being incorporated into smartphones in order to collect data on different activities, and HAR is one of the important outcomes of this capability. The LSTM produces 95.40% testing accuracy. In the future, we plan to add more activity classes, develop other models such as CNN and ResNet models, and check their validation accuracy.

Acknowledgements We are thankful to RAMCO Institute of Technology and Bennett University for providing expertise that greatly assisted the research, although they may not agree with all of the interpretations provided in this paper.

References

1. Agarwal, M., Kaliyar, R.K., Singal, G., Gupta, S.K.: FCNN-LDA: a faster convolution neural
network model for leaf disease identification on apple’s leaf dataset. In: 2019 12th International
Conference on Information & Communication Technology and System (ICTS), pp. 246–251.
IEEE (2019)
2. Agarwal, M., Sinha, A., Gupta, S.K., Mishra, D., Mishra, R.: Potato crop disease classification
using convolutional neural network. In: Smart Systems and IoT: Innovations in Computing,
pp. 391–400. Springer (2020)
3. Attal, F., Mohammed, S., Dedabrishvili, M., Chamroukhi, F., Oukhellou, L., Amirat, Y.: Phys-
ical human activity recognition using wearable sensors. Sensors 15(12), 31314–31338 (2015)
4. Bayat, A., Pomplun, M., Tran, D.A.: A study on human activity recognition using accelerometer
data from smartphones. Procedia Comput. Sci. 34, 450–457 (2014)
5. Bulbul, E., Cetin, A., Dogru, I.A.: Human activity recognition using smartphones. In: 2018 2nd
International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT),
pp. 1–6. IEEE (2018)
6. Kwapisz, J.R., Weiss, G.M., Moore, S.A.: Activity recognition using cell phone accelerometers.
ACM SigKDD Expl. Newslett. 12(2), 74–82 (2011)
7. Kwon, M.C., Choi, S.: Recognition of daily human activity using an artificial neural network
and smartwatch. Wirel. Commun. Mob. Comput. 2018 (2018)
8. Polu, S.K., Polu, S.K.: Human activity recognition on smartphones using machine learning
algorithms. Int. J. Innov. Res. Sci. Technol. 5(6), 31–37 (2018)
9. Sousa Lima, W., Souto, E., El-Khatib, K., Jalali, R., Gama, J.: Human activity recognition
using inertial sensors in a smartphone: an overview. Sensors 19(14), 3213 (2019)
10. Sun, J., Fu, Y., Li, S., He, J., Xu, C., Tan, L.: Sequential human activity recognition based on
deep convolutional network and extreme learning machine using wearable sensors. J. Sens.
2018 (2018)
Implementation of Braille Tab

Tejas Kulkarni, Shikha Jha, Sunny Gupta, and Anuja Gote

Abstract The Braille tab is an electronic device that performs the various functions of a general e-book reader, but for visually impaired people; it has features that make it user-friendly and convenient both for those who already know the Braille language and for students. The objective of this paper is to demonstrate the various stages of development of the tab, covering both its software and hardware aspects. An Android application called "Braille Tab Companion", which is compatible with the tab, has also been developed to improve the user experience and to help in accessing the tab's features. Education is a special focus here, as both students and teachers can use the device for classroom learning as well as self-learning. A Braille equivalent of all the information is displayed on the tab. The paper contains a detailed illustration and discussion of the software and hardware aspects, including the application and the electronic components used in the system.

Keywords Braille tab · Android app · Firebase · Shift register · Multiplexer

1 Introduction

The digital revolution has brought tremendous changes in the field of education. One can access a pool of knowledge at their fingertips. This revolution has made our society more educated and liberal, due to which the overall standard of living has improved. But for the visually impaired, acquiring knowledge is not easy even in 2020. Efforts

T. Kulkarni (B) · S. Jha · S. Gupta · A. Gote


Department of Information Technology, Vidyalankar Institute of Technology, Mumbai, India
e-mail: kulkarni.tejas04@gmail.com
S. Jha
e-mail: jha17.shikha@gmail.com
S. Gupta
e-mail: gupta666sunny@gmail.com
A. Gote
e-mail: anuja.gote@vit.edu.in


have been made to provide them knowledge using audiobooks and Braille printed books. However, research shows that dependence on audiobooks reduces cortical plasticity, an important factor in cognitive development [1]. Part of the socio-economic strata cannot afford information sources for the blind because only a limited number of books are available in Braille lipi. Often, these books are bulky and expensive, which limits their accessibility in social circles. In recent years, refreshable tactile displays have been developed, thereby allowing the blind to access information available online. In most of these projects, more emphasis was given to the hardware aspect, making them accurate but not focused on ease of use. Hence, the need to develop a system ensuring easy access to Braille tabs was realized and worked upon.

2 Related Work

After conducting a thorough survey of existing systems, it was learnt that diverse research has been performed on the hardware aspect of tactile displays. Actuators composed of piezoelectric materials, shape memory alloys and solenoids have been used to raise individual pins of a Braille character [2]. The bending characteristics of electroactive polymers are utilized to provide hydraulic actuation of a Braille dot [3]. Pneumatic signals were used to raise the dots of several Braille cells arranged in a row [4]. An MCU was designed to convert Chinese or English text into Braille, play music, and provide a keyboard and other display features [5]. Character recognition was used for system development [6]. An Arduino-based tab was created for Devanagari-to-Braille conversion [7]. Solidification of liquid-state alloys was used to lock Braille dots in place [8]. However, a few drawbacks have been noted in the above systems: (a) they are not commercially viable, (b) the actuator mechanism can break when excessive pressure is applied, (c) the one-actuator-per-dot mechanism can make the device bulky, and (d) the refresh rate is not satisfactory in any of the available tabs [9].

3 Proposed System

• The Braille tab is, in effect, an e-book reader for the visually impaired, with special emphasis on its usage in educational institutions. For testing purposes, LEDs are used, which can easily be replaced with solenoid actuators for making tactile displays [10] (see Fig. 1).
• The device uses a hierarchical architecture consisting of shift registers and multiplexers for accessing each character individually. This makes it energy efficient and reduces maintenance costs.
• Android applications help make any device user-friendly. Many inbuilt features help users access the app and help developers create such apps without losing focus on the main objective of the application.

Fig. 1 Selecting 1st horizontal line

• "Braille Companion" acts as a mediator between the user and the Braille tab, helping the user utilize the system effectively by providing a smooth flow. The Android application is designed keeping in mind that most users will be visually impaired; hence, TalkBack descriptions and large buttons are provided to improve the user experience.

4 Implementation and Working (Software)

• To provide a secure system and hassle-free usage, a biometric lock is enabled for the app. Only fingerprints that are already registered for unlocking the device and saved in the TEE are valid; an authenticated user is prompted accordingly (Fig. 3).
• The application uses the Android ID, which is unique, is assigned during the initial boot-up, and remains unchanged unless a factory reset is performed. It is used to identify the user uniquely in the Firebase database.
• The absence of a user profile indicates that the user is not registered, and the user is then redirected to the registration page (Fig. 4a).
• Users enrolled in an institution must fill in their Institute ID, after which they are redirected to another login page (Fig. 4b). After successful registration, the data is uploaded to Firebase. To control accessibility, such a user needs approval from the admin; newly arrived requests are listed as shown in Fig. 7.
• Until the admin takes an action, the user must wait, and the same is prompted to the user if a login attempt is made.
• If the admin accepts the request, a push notification is sent using cloud messaging and Volley. Depending on the admin's action, the user profile is updated on Firebase (Fig. 6).
• Both a teacher and a student can choose the purpose of using the tab through two modes: "Classroom" and "Self-Learning" (Fig. 8).
• After joining a classroom, the student can access the data available for that particular classroom.

Fig. 2 Selecting 1st character from first line

Fig. 3 Login page

• For a non-institutional user, classroom mode is not available and only self-learning mode is accessible.
• After a mode is selected, the user has to choose the type of data to be uploaded (Fig. 9).
• In document mode, .txt files can be selected, which are then uploaded to Firebase Storage, and the download link is uploaded to Firebase for the tab to access the

Fig. 4 Registration request

Fig. 5 Firestorage

Fig. 6 Notification

file. Once the operation is done successfully, the user is prompted accordingly (Figs. 5 and 10).

Fig. 7 Registration request

Fig. 8 Select mode

• "Dictate" mode uses Google's speech-to-text API (see Fig. 11) to perform speech-to-text conversion. The output string is confirmed using the TalkBack system and then uploaded to Firebase (Table 1).

Table 1 maps the flags set by the application to the download location for the tab in Firebase. For example, when a teacher uses the dictate feature in self-learning mode, the flag values of "role-setup-mode" are "1-2-2", respectively, and the tab can download the string from androidID//Text//My. The flag values help the tab identify which mode was used most recently by the user, thereby indicating what data to display on the tab (Fig. 10).

Table 1 Chart for understanding different parameters before fetching data


Role      Setup   Mode   Upload place                          Download from
1         1       1      Teacher: classroom - upload doc       Link//Classroom
1         1       2      Teacher: classroom - dictate          Text//Classroom
1         2       1      Teacher: self-learning - upload doc   Link//My
1         2       2      Teacher: self-learning - dictate      Text//My
2         1       X      Student: join classroom               Depending on what teacher uploaded
2 or 10   2       1      Student: self-learning - upload doc   Link//My
2 or 10   2       2      Student: self-learning - dictate      Text//My
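A small sketch of this flag-to-path mapping is given below; the tuple keys and path strings mirror Table 1, while the helper name and the treatment of role value 10 as a student are assumptions for illustration.

```python
# Sketch of the Table 1 flag-to-path mapping; the resulting path would be
# resolved under the user's Android ID node on Firebase.
FLAG_TO_PATH = {
    (1, 1, 1): "Link//Classroom",   # teacher, classroom, upload doc
    (1, 1, 2): "Text//Classroom",   # teacher, classroom, dictate
    (1, 2, 1): "Link//My",          # teacher, self-learning, upload doc
    (1, 2, 2): "Text//My",          # teacher, self-learning, dictate
    (2, 2, 1): "Link//My",          # student, self-learning, upload doc
    (2, 2, 2): "Text//My",          # student, self-learning, dictate
}

def download_path(role, setup, mode):
    if setup == 1 and role in (2, 10):
        # student joined a classroom: location depends on what the teacher uploaded
        return "Classroom (teacher-dependent)"
    key = (2 if role == 10 else role, setup, mode)   # role 10 treated as a student
    return FLAG_TO_PATH[key]

print(download_path(1, 2, 2))   # Text//My, e.g. a teacher dictating in self-learning mode
```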

Fig. 9 Select file type

Fig. 10 ‘Dictate’ mode



Fig. 11 “Upload File” mode

Fig. 12 LED matrix

5 Implementation and Working (Hardware)

• In the proposed system, a 3 × 3 Braille tab is implemented, in which each character is accessed sequentially.
• Initially, a particular horizontal line is selected using the shift register connected directly to the micro-controller (Fig. 1). Each horizontal line has its own shift register, responsible for accessing each of its three characters individually (Fig. 2). Once all the characters of the first line are processed, the second line is selected, its characters are processed, and the same process is repeated for the third line.
• When a character is accessed, its mux is enabled by Vcc from the shift register. The I/O port requirement is minimized because the same data is provided to all the muxes, but only one is enabled at a time.

Fig. 13 User profile in use

• Once a mux is enabled, its select lines are used to light an LED as required. Only one LED glows at a time, but due to the high processing speed of the micro-controller and persistence of vision, all the LEDs appear to glow simultaneously (a pseudocode sketch of this scanning sequence is given after this list).
• For flawless communication, the Android ID of the user's device is already registered in the tab.
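The following sketch illustrates the scanning sequence in Python-style pseudocode (it is not the actual micro-controller firmware; select_line, enable_character and set_mux_dot are placeholder names for the real shift-register and multiplexer driver routines).

```python
# Illustrative scan loop: a line is selected through its shift register, one character
# mux per line is enabled, and the mux select lines light one dot at a time; persistence
# of vision makes the whole 3 x 3 grid of Braille cells appear to be lit simultaneously.
import time

LINES, CHARS_PER_LINE, DOTS_PER_CELL = 3, 3, 6

def select_line(line):             # placeholder: clock the line-select shift register
    pass

def enable_character(line, char):  # placeholder: Vcc from the shift register enables this mux
    pass

def set_mux_dot(dot):              # placeholder: drive the mux select lines for one LED/dot
    pass

def refresh(frame):
    """frame[line][char] is a 6-bit dot pattern for one Braille cell."""
    for line in range(LINES):
        select_line(line)
        for char in range(CHARS_PER_LINE):
            enable_character(line, char)
            pattern = frame[line][char]
            for dot in range(DOTS_PER_CELL):
                if pattern & (1 << dot):
                    set_mux_dot(dot)
                    time.sleep(0.0005)   # brief on-time, fast enough for persistence of vision

# refresh() would be called continuously with the frame decoded from the downloaded text,
# e.g. refresh([[0b000001, 0b000011, 0b000111]] * 3)
```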

6 Results

See Figs. 12 and 13.

7 Conclusion

In this paper, a system for creating a user-friendly Braille tab was proposed. An implementation based on this proposed system was carried out, and the software and hardware aspects of the system were discussed. Special emphasis was placed on the application side in order to improve the user experience. Future adaptations of the Braille tab are possible.
The tab, which is currently used for pre-defined classrooms, can be further developed to create customized classrooms. A keyboard can also be added to the tab depending upon user requirements, and the tab can be developed to support formats other than .txt files. The system can also be adapted for use as a book reader by designing a web portal.

References

1. Hamilton, R.H., Pascual-Leone, A.: Cortical Plasticity Associated with Braille Learning (1998)
2. Schmidt, R.N., Lisy, F.J., Prince, T.S., Shaw, G.S.: US Patent number: US6743021B2. Retrieved
from https://patents.google.com/patent/US6743021B2/en (2002)
3. Yang, P.: US Patent number: US6881063B2. Retrieved from https://patents.google.com/patent/
US6881063B2/en (2005)
4. Sutherland, N.B.: US Patent number: US3659354A. Retrieved from https://patents.google.
com/patent/US3659354A/en (1972)
5. Xiaoli, H., Tao, L., Bing, H., Qiang, C., Qiang, X., Qiang, H.: Electronic reader for the blind
based on MCU. In: 2010 International Conference on Electrical and Control Engineering,
pp. 888–890. Wuhan (2010)
6. Wajid, M., Kumar, V.: E-Braille documents: novel method for error free generation. Image
Process. Commun. 19(4), 21–26 (2014)
7. Gupta, R., Singh, P.K., Bhanot, S.: Design and implementation of Arduino based refreshable
braille display controller. Indian J. Sci. Technol. 9, 33 (2016)
8. Soule, C.W., Lazarus, N.: Reconfigurable Braille display with phase change locking. Smart
Mater. Struct. 25(7), 075040 (2016)
9. Gote, A., Kulkarni, T., Jha, S., Gupta, S.: A review of literature on braille tab and the underlying
technology. In: 2020 5th International Conference on Devices, Circuits and Systems (ICDCS),
pp. 333–335. Coimbatore, India (2020)
10. Yang, T.-H., Lee, J.-S., Lee, S.S., Kim, S.-Y., Kwon, D.-S.: Conceptual design of new micro-
actuator for tactile display. In: 2007 International Conference on Control, Automation and
Systems, pp. 1306–1309, Seoul (2007)
Resized MIMO Antenna for 5G Mobile
Antenna Applications

S. Subramanyam, S. Ashok Kumar, and T. Shanmuganantham

Abstract A resized MIMO antenna for 5G mobile antenna applications, based on a self-isolation property, is presented. The proposed antenna is miniaturized by inserting two vertical stubs into the self-isolated antenna element. With the help of these isolation elements, the four-antenna MIMO system can achieve good efficiency. The antenna, designed on an FR4 substrate, consists of two different elements, a T-shaped feeding element and perpendicular stubs, inserted into a naturally self-isolated antenna component. The four-antenna MIMO system achieves its target without the use of any decoupling components. The antenna models are constructed and simulated, and the simulation and analysis using the IE3D simulator show good agreement.

Keywords Communication of 5G · MIMO applications · Mobile terminal · Compact self-isolated antenna

1 Introduction

User equipment can gain many advantages from fifth-generation (5G) systems, such as higher transmission rates and lower latency than the present 4G system. MIMO antenna systems with multiple antennas (more than three) are capable of achieving higher transmission rates in 5G. In this paper, the MIMO antenna achieves high isolation by using a larger number of antennas [1]; however, due to the restricted space in mobile phones, it is not possible to use a large number

S. Subramanyam (B) · S. Ashok Kumar


Jyothishmathi Institute of Technological Sciences, Karimnagar, India
e-mail: subramanyam.sana@gmail.com
S. Ashok Kumar
e-mail: ashokape@gmail.com
T. Shanmuganantham
Pondicherry University, Puducherry 605014, India
e-mail: shanmugananthamster@gmail.com


of antennas. There are a few techniques to address this, such as decoupling elements, orthogonal polarization, ground structures, and neutralization lines [2–4].
Most conventional systems consist of only two antennas, but the signal strength of two antennas is weak; antenna isolation was therefore improved by using a four-antenna MIMO system [5–7]. With more antennas in a single system, a device can achieve higher data rates and upload speeds. The antenna size is usually large, but in this work the size of the antenna is decreased by around 30% by using vertical stubs [8–10]. The U-structured antenna element can be used both as a decoupling element and as a radiating element, which is the unique feature that gives this MIMO antenna very good isolation.

2 Self-isolated Antenna Configurations and the System Operating Sequence

The self-isolated antenna configuration is displayed in Fig. 1. The self-isolated antenna element consists of three parts: a T-shaped component, a U-shaped component and two vertical stubs. The T-shaped element acts as the feeding component. The working of the self-isolated antenna is explained through the four-antenna MIMO system.
The height and length of the self-isolated configuration are H = 13.6 mm and L = 17.4 mm, and the parameters p, t, h and q describe the stubs and components, with p = 3.4 mm, q = 11.9 mm, t = 5.1 mm and h = 10.7 mm, as shown in Fig. 1. The U-shaped component is grounded. The values of H and L are kept constant, while h, t, q and p can be varied. Without the vertical stubs, the element length can be estimated from the antenna's operating frequency.
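As a point of reference (this relation is not stated explicitly in the paper), a first-order estimate of the resonant length of a monopole-type element without the stubs follows the quarter-wavelength rule,

```latex
L \approx \frac{c}{4 f_r \sqrt{\varepsilon_\mathrm{eff}}},
```

where c is the speed of light, f_r the resonant frequency and ε_eff the effective permittivity of the substrate.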

Fig. 1 Single-MIMO antenna configuration



In this paper, the proposed antenna operates at a frequency of about 2.4 GHz (2.3–2.5 GHz) for the four-port MIMO antenna, and the same frequency is used for the single antenna. The single antenna and the four-antenna system use the same dielectric constant, loss tangent and substrate thickness: dielectric constant = 3.4, tanδ = 0.02 and thickness = 1.524 mm. The values of H and L are the same for both antenna types, but the total lengths differ; the length of the four-antenna system is 130.3 mm. Here, the antenna size is decreased by about 40% from the original height. There should be a small gap between the U-shaped component and the T-shaped component, since this spacing gives good impedance matching in the final antenna system. MIMO technology leverages multipath behaviour by using multiple smart transmitters and receivers; it is a wireless technique used to increase channel capacity and is also called spatial multiplexing.

3 A Compact Self-isolated Antenna Component System with Four-antenna MIMO

In the mobile terminal, the four-port MIMO antenna system is positioned at the boundary. The overall dimensions are 130.3 mm by 13.6 mm. The 4 × 4 MIMO system receives four different signals from four transmitting antennas, and with this setup the user equipment can receive better signals. The dimensions of the four-antenna board are 127.7 mm × 13.6 mm × 1.2 mm, and the dimensions of the small substrate can be seen in Fig. 2.
The four antennas have four ports, each fed with 50 Ω impedance matching. The distance from the end of one antenna to the next is given as d. The relation between D and d is D = d + L, where D = 38.2 mm and d = 20.9 mm.
The values L = 17.4 mm and H = 13.65 mm are kept constant, while d and D may vary. The antenna system operates well in the 2.4 GHz band with the following specification: t = 5.1 mm, h = 10.7 mm, p = 3.4 mm and

Fig. 2 Four-antenna system with MIMO



Fig. 3 S-parameter display of self-isolated MIMO system

q = 11.9 mm. Very good isolation is obtained between any pair of antenna elements.
The antenna radiation pattern, current distribution and total efficiency are good. For the 2.4 GHz band, the antenna efficiency is more than 90%. The current flowing through the antenna and the ground plane explains the isolation of the MIMO configuration.
The channel capacity achieved with a pair of antennas in the 2.4 GHz band is greater than 34 bps/Hz. The MIMO antenna return loss is influenced by the parameter t and is perfectly matched to 50 Ω at the resonant frequency, and the operating frequency depends on the various stub lengths. For better performance and proper gain, the antenna is structured with the specified measurements and its behaviour is verified at the resonant frequency of 2.4 GHz, with a reflection coefficient of −30 dB, as exhibited in Fig. 3.
The 2D radiation patterns of this four-port MIMO antenna remain well behaved at different angles, as shown in Figs. 4 and 5; these figures show a maximum gain of 1 dBi and an efficiency of 90% at the operating frequency.
The gain and directivity lie between 4.5 and 5 dBi. A 4 × 4 MIMO system consists of four antennas; generally, a device with more antennas has a higher cost because of the extra hardware and consumes slightly more power for the additional wireless circuitry.

Fig. 4 Elevation designed gain

4 Conclusion

A resized four-port antenna system for 5G mobile antenna applications has been presented. The system is based on a compact antenna component that is self-isolated. The MIMO antenna is confirmed by simulation and analysis, and it achieves good isolation without any decoupling elements or isolation components. Without reducing efficiency, the radiating element of the implemented antenna also acts as a decoupling component. Because of its reduced size, the proposed MIMO system is a good choice for 5G mobile portable systems.

Fig. 5 Azimuth designed gain

Acknowledgements The authors would like to thank JNTUH for supporting this project. This
research was supported by the TEQIP III Collaborative Research Scheme, JNTUH.

References

1. Teja, R., Kumar, S.A., Thangavelu: CPW-fed inverted six shaped antenna design for internet
of things (IoT) applications. In: IEEE IMICPW 2019, NIT Trichy, 22–24 May 2019
2. Sahithya, V., Kumar, S.A., Thangavelu, S.: Design of CPW fed Antenna for WIMAX
Applications. In: IEEE IMICPW 2019, NIT Trichy, 22–24 May 2019
3. Ravali, S., Kumar, S.A., Thangavelu, S.: Design of a CPW fed detective solid bowtie antenna
for satellite applications. In: IEEE IMICPW 2019, NIT Trichy, 22–24 May 2019
4. Kumar, S.A., Thangavelu, S.: Design of clover slot antenna for biomedical applications.
Alexandria Eng. J. 56, 313–317 (2016)
5. Andrews, J.G., et al.: What will 5G be? IEEE J. Sel. Areas Commun. 32(6), 1065–1082 (2014)
6. Kumar, S.A., Thangavelu, S.: CPW fed monopole implantable antenna for 2.45 GHz ISM band
applications. IJEL 3(3), 152–159 (2015)
7. Kumar, S.A., Thangavelu, S.: CPW fed implantable Z-monopole antennas for ISM band
biomedical applications. IJMWT 7, 529–533 (2015)

8. Kumar, S.A., Thangavelu, S.: Implantable CPW fed rectangular patch antenna for ISM band
biomedical applications. IJMWT 6(1), 101–107 (2014)
9. Kumar, S.A., Shanmuganantham, T.: Design of CPW-fed inverted six shaped antenna for IoT
applications. TEEM, Springer (2020)
10. Kumar, S.A., Thangavelu, S.: Design and performance of textile antenna for wearable
applications. TEEM 19(5), 352–355 (2018)
Molecule Four-Port Antenna is Utilizing
Detachment Progress of MIMO

K. Venugopal Rao, S. Ashok Kumar, and T. Shanmuganantham

Abstract A four-element antenna is proposed for wireless applications in order to investigate multiple-input multiple-output (MIMO) characteristics. A hexagon molecule-shaped fractal structure is used as the radiating element. To obtain better isolation without any additional decoupling structure, the antenna elements are placed orthogonally to each other. A C-shaped slot is placed on each radiating element to attain a band-notched response in the WLAN band. The designed antenna exhibits a constant omni-directional radiation pattern. An acceptable impedance bandwidth is obtained over the range 3.5–3.9 GHz, with a return loss better than −20 dB at the resonant frequency. The antenna is designed and simulated, and its performance is assessed through its radiation characteristics and MIMO parameters. The isolation level between the elements is suitable for MIMO applications, and the simulated and measured results are good and suitable for high-MIMO and high-density operations.

Keywords Communication of 5G · MIMO applications · Mobile terminal · Wireless applications

1 Introduction

In modern devices, MIMO systems with multiple antennas are designed to make equipment smaller in size. One implication of this is that the spacing between MIMO receiver elements should be as short as possible; however, small spacing between antenna elements leads to mutual coupling problems. A number of

K. Venugopal Rao (B) · S. Ashok Kumar


Jyothishmathi Institute of Technological Sciences, Karimnagar, India
e-mail: venugopalraokalakuntla@gmail.com
S. Ashok Kumar
e-mail: ashokvaasan@gmail.com
T. Shanmuganantham
Pondicherry University, Puducherry 605014, India
e-mail: shanmugananthamster@gmail.com


methods have been proposed to address this, such as adjusting the placement of the elements, inserting a neutralization strip, partially parallel structures, and other decoupling designs. The radiating elements can be arranged perpendicular to each other and fed by coplanar waveguide (CPW) feeds to enhance isolation, although the use of CPW occupies a large area. Isolation can also be enhanced by placing a neutralization line between the antenna elements to cancel the coupling current on the ground plane. Mutual coupling is further reduced by introducing L-shaped slots in the elliptical radiator and an elliptical slot in the ground plane between perpendicularly placed fractal-shaped antennas [1].
To enhance isolation, a T-shaped stub combined with a meander-line feed can be used inside the radiating element; for even higher isolation, I-shaped stubs on the surface and F-shaped slots in a shared ground can be used instead of the T-stub. Most conventional systems consist of only two antennas, but the signal strength of two antennas is weak; antenna isolation was therefore improved by using a four-antenna MIMO system [5–7]. With more antennas in a single system, a device can achieve higher data rates and upload speeds. The antenna size is usually large, but in this work the size of the antenna is decreased by around 30% by using vertical stubs [8–10]. The U-structured antenna element can be used both as a decoupling element and as a radiating element, which is the unique feature that gives this MIMO antenna very good isolation.

2 Antenna Configurations and the System Operating Sequence

Figure 1 shows the diagrammatic representation of the UWB MIMO antenna. The multiple-input multiple-output system consists of four monopoles, each fed by a 50 Ω micro-strip line. Good isolation is achieved by the perpendicular orientation of the elements. Hexagon molecule fractal geometry is applied at the edges to achieve the wideband behaviour.

3 A Compact Self-isolated Antenna Component System with Four-antenna MIMO

In the mobile terminal, the four-port MIMO antenna system is positioned at the boundary. The overall dimensions are 130 mm by 13 mm. The 4 × 4 MIMO system receives four different signals from four transmitting antennas, and with this setup the user equipment can receive better signals. The dimensions of the four-antenna board are 130 mm × 13 mm × 1.2 mm, and the dimensions of the small substrate can be seen in Fig. 2.

Fig. 1 Geometrical view of antenna

Fig. 2 S-parameter display

The channel capacity achieved with a pair of antennas at 3.7 GHz is greater than 34 bps/Hz. The MIMO antenna return loss is influenced by the parameter t and is perfectly matched to 50 Ω at the resonant frequency, and the operating frequency depends on the various stub lengths. For better performance and proper gain, the antenna is structured with the specified measurements and the result is verified at the resonant frequency of 3.7 GHz, with a return loss of less than −22 dB, as shown in Fig. 2.
The 2D radiation patterns of this four-port MIMO antenna remain well behaved at different angles, as shown in Figs. 3 and 4. Figure 5 shows a maximum gain of 1 dBi and an efficiency of 90% at the operating frequency.

Fig. 3 Elevation plane

4 Conclusion

In this paper, a four-port antenna system for 5G mobile antenna applications has been presented. The system is based on a compact antenna component that is self-isolated. The MIMO antenna is confirmed by simulation and analysis, and it achieves good isolation without any decoupling elements or isolation components. Without reducing efficiency, the radiating element of the implemented antenna also acts as a decoupling component. Because of its reduced size, the proposed MIMO system is a good choice for 5G mobile portable systems.

Fig. 4 Azimuth plane

Fig. 5 Field gain



Acknowledgements The authors would like to thank JNTUH for supporting this project. This
research was supported by the TEQIP III Collaborative Research Scheme, JNTUH.

References

1. Sahithya, V., Kumar, S.A., Thangavelu, S.: Design of CPW fed antenna for WIMAX
applications. In: IEEE IMICPW 2019, NIT Trichy, 22–24 May 2019
2. Ravali, S., Kumar, S.A., Thangavelu, S.: Design of a CPW fed detective solid bowtie antenna
for satellite applications. In: IEEE IMICPW 2019, NIT Trichy, 22–24 May 2019
3. Kumar, S.A., Thangavelu, S.: Design of clover slot antenna for biomedical applications.
Alexandria Eng. J. 56, 313–317 (2016)
4. Kumar, S.A., Thangavelu, S.: Design of CPW-fed inverted six shaped antenna for IoT
applications. TEEM, Springer (2020)
5. Kumar, S.A., et al.: CPW fed monopole implantable antenna for 2.45 GHz ISM band
applications. IJEL 3(3), 152–159 (2015)
6. Kumar, S.A., Thangavelu, S.: CPW fed implantable Z-monopole antennas for ISM band
biomedical applications. IJMWT 7, 529–533 (2015)
7. Kumar, S.A., et al.: Implantable CPW fed rectangular patch antenna for ISM band biomedical
applications. IJMWT 6(1), 101–107 (2014)
8. Kumar, S.A., et al.: Design of implantable CPW fed monopole antenna for ISM band
applications. TEEM 15(2), 55–59 (2014)
9. Kumar, S.A., et al.: Design and performance of textile antenna for wearable applications. TEEM
19(5), 352–355 (2018)
10. Teja, R., Kumar, S.A., Shanmuganantham, T.: CPW fed inverted six shaped antenna design for
internet of things (IoT) applications. In: IEEE IMICPW 2019, NIT Trichy, 22–24 May 2019
IOT-Based Underwater Wireless
Communication

Gitimayee Sahu and Sanjay S. Pawar

Abstract Underwater wireless communication is an advanced research area that needs to be explored extensively. The topic is highly significant for various purposes, from aquatic pollution control and marine life monitoring to water-quality measurement and, above all, signal transmission. Different sensors can be used under water for these applications. For signal transmission under water, sound waves and optical signals have been used extensively in the past; their drawbacks, low data rate, attenuation, and backscattering due to suspended particles, are the major challenges. To meet these challenges, a prototype of an underwater RF wireless network is developed here. It not only establishes the wireless network but also tracks underwater devices and updates their location in the IOT cloud. It also measures temperature, vibration and water quality using underwater sensors and updates the readings on the www.thingspeak.com website. The established underwater wireless network provides coverage up to 1.2 km in diameter and 50 m in depth, which is highly considerable. Further optimization can be done to enhance the range.

Keywords Underwater module · Terrestrial module · Arduino Uno · HC12 · ESP8266 WiFi module

1 Introduction

Three-fourths of the earth's surface is covered with water in the form of oceans, rivers and seas. This largely unexplored underwater environment needs to be examined, and the path to fruitful experimentation has always depended on technology. Recent improvements in technology have paved the way for underwater exploration using different sensors at each level. Hence, the underwater

G. Sahu (B)
Department of EXTC, UMIT, Juhu, Mumbai, India
e-mail: giti.sahoo@gmail.com
S. S. Pawar
Department of EXTC, UMIT, SNDT Women’s University, Mumbai, India
e-mail: drsanjayspawar@gmail.com


sensor network (UWSN) is an emerging research area with a variety of applications, such as (i) aquatic surveillance, (ii) river and sea pollution monitoring and control, (iii) oceanographic data collection and commercial exploitation of the aquatic environment, and (iv) marine monitoring.
Signal transmission using an underwater wireless network is a fusion of wireless technology with miniaturized sensors for smart sensing, communication capabilities and intelligent computing. Underwater wireless networks have significant applications in the military and navy, in marine monitoring, and in various industrial settings, such as marine fish farms, reduction of organic waste deposition on the seabed, and the fight against pollution.
A UWSN is a network of autonomous sensor nodes [1]. The sensor nodes are geographically dispersed in order to sense various underwater properties such as salinity level, pressure and temperature. The sensor nodes may be mobile or fixed and are connected wirelessly through communication modems to transfer the sensed data [2]. The data can then be used by different services for the welfare of living things. Underwater communication is primarily carried out by a group of wireless nodes communicating their data to a gateway node, which then relays the data to the closest control station.
The objective of this research work is to establish an aquatic wireless network under water. The idea originated from incidents of inter-continental aeroplanes crashing in mid-air and falling into the sea. For example, the disappearance of Malaysian aeroplane MH370 from air traffic control in March 2014 is the most incredible aviation mystery of the twenty-first century: the plane abruptly turned back towards Malaysia and then towards the Indian Ocean, and the Malaysian government could not find the whereabouts of the plane, the people or the equipment inside the ocean. If an aquatic wireless network can be created under water, then any device falling into the sea can be detected and tracked, and the location of the device can be updated in the IOT cloud.
This research work is innovative and significant. Its objective is to investigate and design a prototype that establishes a wireless network under water for communication, locates and tracks underwater devices, and updates their positions in the IOT cloud. The main use cases of this research are (i) establishing an underwater wireless network and (ii) communication and transfer of data between devices. The network also helps to track cellular devices, find their exact location, i.e. latitude and longitude, and update it in the IOT cloud. It also monitors and controls the quality of the water. Aquatic surveillance, oceanographic data collection and commercial exploitation of the aquatic environment are the major objectives of the work. Other applications include marine monitoring, coastal area surveillance, oil-rig maintenance and collection of data from under water.

1.1 Literature Review

Nowadays, a large amount of research related to underwater communication is in progress. The main research lines focus on increasing the distance and bandwidth while reducing the energy consumption of underwater devices, which aims to increase the network lifetime. Underwater communication research especially focuses on the application of optical signals, electromagnetic waves, and the generation of acoustic and ultrasonic waves. Each approach has its own significance, with its advantages and drawbacks. Some literature related to this research work is summarized below.
(a) Devices using optical communication have high propagation speed, but strong backscattering by suspended particles occurs and performance is affected by the turbidity of the water; hence they are not a good option for large areas [3–5].
(b) Devices using acoustic waves have minimum sensitivity levels and can reach large distances (over 20 km). Their drawback is the low data rate (0 b/s–20 kb/s), which is determined by the low carrier frequency, high attenuation and reflections [6, 7].
(c) For higher data rates, radio frequency (RF) techniques can be used, which achieve data rates of up to 100 Mbit/s over short distances.
Electromagnetic (EM) waves in the high-frequency range are a better choice for underwater wireless communication systems, since EM waves are less sensitive to reflection and refraction in shallow water than acoustic waves.
Oubei et al. [3] describe underwater wireless communication using optical links, which use visible light to transmit data in the underwater environment; they can transmit large amounts of information through a wide, unlicensed spectrum at low power.
Saeed et al. [4] also discuss underwater wireless networking and localization using optical communications.
Adib et al. [6] from the MIT Media Lab designed an underwater transmitter that sends a sonar signal to the water's surface, causing tiny vibrations that correspond to the transmitted 1s and 0s. Above the surface, an extremely sensitive receiver reads these tiny vibrations and decodes the sonar signal. The system is called "translational acoustic-RF communication" (TARF). Acoustic transmitting beacons can be realized like an aeroplane's black box: the beacon transmits a signal every second, and the TARF system can be used to pick it up.
Ranjan et al. [8] describe underwater wireless networks using sensors and autonomous underwater vehicles (AUVs). The AUVs communicate, cooperate and exchange data with each other to carry out sensing and monitoring functions. Underwater communication networks (UWCNs) have found increasing use in a wide range of applications, such as autonomous underwater vehicle operation, coastal surveillance systems, environmental research, oil-rig maintenance, collection of data for water monitoring, and linking submarines to land.
The rest of the paper is organized as follows. Section 1 presents the introduction and a brief literature review, Sect. 2 presents the system model, Sect. 3 presents the results and discussion, and Sect. 4 presents the conclusion.

2 System Model

Underwater wireless communication can use a sensor network to monitor environmental conditions below the water. Figure 2 shows the block diagram of the system model. Sensors are connected to an Arduino Uno, which communicates with the RF antenna attached to it. The underwater sensors collect all the information below the water and convey it to the RF antenna through the Arduino. The antenna then broadcasts to the other antennas. The system establishes communication between devices, tracks the devices, finds their location, i.e. latitude and longitude, and updates it in the IOT cloud.
Further optimization can be done for range expansion and a higher number of attached devices to enhance the depth of the established network. Figure 1 shows the prototype of the underwater wireless network.
The technology of underwater wireless communication provides a solution for transmitting and receiving information between two different media. Here, we have developed a prototype wireless network with coverage in the range of the transmitted frequency.
The two wireless WiFi modules consist of ESP8266 boards that work as transmitter and receiver. Transceiver module 1, designated the terrestrial unit and operating at 2.4 GHz, consists of a NodeMCU and an HC12 wireless module interfaced with the on-board Arduino. The HC12 has a broadcasting range of up to 1 km and an operating band of 433.4–473.0 MHz. This unit has the capacity to communicate over hundreds of live transmitting frequency channels and transmits bit-coded messages from the terrestrial medium to the water medium.

Fig. 1 Underwater wireless network [1]



Fig. 2 Block diagram of system model

2.1 Module 1 (Terrestrial Module)

Figure 3 shows the transceiver block diagram. The terrestrial module consists of four main components: (i) Arduino, (ii) WiFi (ESP8266), (iii) HC12 and (iv) NodeMCU. The Arduino works as a controller for these serially connected devices. First, the Arduino Uno triggers the ESP8266 WiFi module, which functions as a hotspot and operates in the ISM band. After successful interfacing of the hotspot with the WiFi module, a coded message is displayed on a 16 × 2 LCD. After connecting to the nearby RF network, it works as a hotspot to which many cellular devices within a geographical range of approximately 1.5 km can connect.
The data rate is nearly 10 kbps for the terrestrial communication. The interfaced HC12 module frequently transmits and receives data within the specified frequency range; it supports half-duplex transmission, with receiver sensitivity specified at 5000 bps. The paired transceiver antennas are capable of communicating beyond 1 km and thus provide adequate coverage distance.
The NodeMCU (ESP8266) is a server-based communicating device interconnected with the on-board Arduino, HC12 and WiFi module. When all these devices are ready to transmit the ping message from the underwater module, the NodeMCU acts as the transmitter to the IOT cloud via the ThingSpeak website. This module also measures temperature via a sensor and checks the salinity level of the water.
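As an illustration of exercising such an HC-12 link from a PC during bench testing (this is not the on-board Arduino firmware; the serial port name, the 9600 baud default and the "PING" payload are assumptions):

```python
# Bench-test sketch for an HC-12 transparent serial link using pyserial;
# port name, baud rate and payload are illustrative assumptions.
import serial

with serial.Serial("/dev/ttyUSB0", baudrate=9600, timeout=2) as hc12:
    hc12.write(b"PING\n")       # broadcast a short coded message
    reply = hc12.readline()     # a paired HC-12 in range would respond
    print("received:", reply)
```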

Fig. 3 Block diagram of transreceiver

2.2 Module II (Underwater Module)

The transceiver consists of the on-chip Arduino, ESP8266 WiFi module, HC-12 and a piezoelectric sensor. The HC-12 module connects to the upper layer with half the range, i.e. up to a depth of 5 ft. When any cellular device comes in contact with the HC-12 under the water within a range of 0.5 km, it gets attached to the HC-12 and the location of the lost cellular device can be traced. The HC-12 communicates with the Arduino to update the location to the upper HC-12 module. The upper module then stores the information on the www.thingspeak.com (i.e. IOT cloud) site.
The HC-12 is a half-duplex 20 dBm (100 mW) transmitter paired with a receiver that has −117 dBm (2 × 10^−15 W) sensitivity at 5000 bps, used with an external antenna. These transceivers are capable of communicating up to 1 km over the open air interface and provide adequate coverage and high throughput.

3 Results and Discussion

This research work provides an underwater wireless network for short communication distances, i.e. near-field communication (NFC). It can also be used to establish a wireless network under the surface of sea water. This network can be used to connect cellular devices, track them and localize them.

Fig. 4 Module I, i.e. RF terrestrial module

The GPS position of a device can be found and the location stored in the IOT cloud. The network can also be used for precision monitoring and for controlling contamination of the water, which may arrive from neighbouring localities and industries.
Figure 4 shows the RF terrestrial module, which is switched ON first and receives broadband service from the nearby base station, i.e. the primary server. The server pings the NodeMCU, and the HC-12 transmitter and receiver then turn ON, indicated by blinking. This triggers the ESP8266 WiFi module, which operates in the ISM band, i.e. 2.4 GHz. As the whole system starts to respond, the WiFi system transmits and connects to module II, i.e. the underwater module, as shown in Fig. 5. Module II activates, communicates with the terrestrial module and tracks the nearby devices within its range. The HC-12 is interfaced with the NodeMCU and updates the location of the tracked device on www.thingspeak.com. The sensors connected to module II are a DS18B20 temperature sensor, a piezoelectric ceramic ring transducer (SMR3515T55) and a Waspmote Smart Water quality monitoring sensor. The module measures temperature, vibration and water quality and updates them in the IOT cloud. Waspmote is a portable smart water-quality monitoring sensor that detects whether there is any chemical leakage into the water. It checks various water-quality parameters such as pH level, dissolved oxygen (DO), oxidation reduction potential (ORP) and the salinity level of the water. Figure 6 displays the location, i.e. latitude and longitude, of the submerged device under the water.

Fig. 5 Module II, i.e. underwater module with sensors

Fig. 6 Displaying the location of the device under water

The various use cases include (i) military applications, (ii) monitoring marine activities, (iii) industrial applications, for example fish farming, and (iv) reducing waste deposition on the sea bed. The main challenge for an underwater wireless network is that water is a conducting medium, i.e. lossy in nature unlike the air interface. Hence, the coverage range of the network is small, and more base stations (BS) need to be deployed for proper and adequate coverage. Deployment of a BS inside or above the surface of the water is also a major challenge. As water is flowing in nature, a fixed deployment of BS is not possible. Using buoys, drones or short-range waterproof BSs placed on the surface of sea water, wireless communication can be established.

4 Conclusion and Future Scope

This research is significant since it establishes a wireless network under water. The system can communicate between devices, and track and locate devices under water. It updates the GPS location in the IOT cloud for further reference. It helps in marine monitoring, and in sensing and controlling the quality of the saline water. It also measures temperature and vibration, monitors the quality of the water, i.e. pH level, dissolved oxygen (DO) and salinity level, and updates these in the IOT cloud. It reduces the deposition of organic waste on the sea bed. The developed system provides coverage up to a range of 1.2 km in diameter and 50 m in depth. This coverage is adequate in a lossy medium like water as compared with other related literature.
Further optimization can be done for range expansion, a higher number of attached devices and enhancing the depth of the established network. This would also increase the quality of the signal and reduce losses due to backscattering and reflections.

References

1. Felemban, E., Shaikh, F.K., Qureshi, U.M.: Underwater sensor network applications: a
comprehensive survey. Sage J. (2015). https://doi.org/10.1155/2015/896832
2. Khalid, M.A., Shah, P.A., Iqbal, K., Gillani, S., Ahmad, W., Nam, Y.: Underwater wireless sensor
networks: a review of recent issues and challenges. Wirel. Commun. Mobile Comput. 6470359,
20 (2019)
3. Oubei, H.M., Durán, J.R., Janjua, B., Wang, H.-Y., Tsai, C.-T., Chi, Y.-C., Ng, T.K., Kuo, H.-C.,
He, J.-H., Alouini, M.-S., Lin, G.-R., Ooi, B.S.: Wireless optical transmission of 450 nm, 3.2
Gbit/s 16-QAM-OFDM signals over 6.6 m underwater channel. OSA Tech. Digest Opt. Soc.
Am. 23(18), 23302–23309 (2016)
4. Saeed, N., Celik, A., Al-Naffouri, Y.Y., Alouini, M.-S.: Camera based optical communications,
localization, navigation, and motion capture: a survey. Ad Hoc Netw. (2018)
5. Oubei, H.M. et al.: Light based underwater wireless communications. Jpn. J. Appl. Phys. (2018)
6. Jang, J., Adib, F.: Underwater backscatter networking. In: SIGCOMM, Aug 19, pp. 19–23
(2019). Beijing, China
7. Gussen, C.M.G., Diniz, P.S.R., Campos, M.L.R., Martins, W.A., Costa, F.M., Gois, J.N.: A
survey of underwater wireless communication technologies. J. Commun. Inf. Syst. 31(1) (2016)
8. Ranjan, A., Ranjan, A.: Underwater wireless communication network. Adv. Electron. Electr.
Eng. 3(1), 41–46 (2013)
Pattern Prediction Using Binary Trees

T. Aditya Sai Srinivas, Ramasubbareddy Somula, Karrothu Aravind,


and S. S. Manivannan

Abstract In today's busy world, time is scarce, and technology is developed every day to increase efficiency. On this front, a word predictor is a small step that increases efficiency many times over. Word predictors have applications in various areas such as texting and search engines. To develop our word predictor program, this project uses the trie data structure. Our program uses a stored file of words to predict the words the user may intend to type. This project compares an implementation of word completion using binary trees with one using binary tries. The proposed method is word prediction using binary trees, as compared with the already existing binary tries, and the results show that the binary trie implementation takes longer than our proposed approach. Auto-complete is a feature that helps the user find what they want to search for by predicting the value in the search box. Auto-complete starts predicting searches related to the first few letters or words typed by the user in the search box. This feature works best when the words typed by the user are common, such as when addressing an email.

Keywords Prediction · Binary tree · Trie

T. Aditya Sai Srinivas


Computer Science Department, G. Pullaiah College of Engineering and Technology, Kurnool
518002, India
R. Somula (B)
Information Technology, VNRVJIET, Hyderabad 500090, India
e-mail: svramasubbareddy1219@gmail.com
K. Aravind
Computer Science and Engineering, GMRIT Engineering College, Razam 532001, India
S. S. Manivannan
SCOPE, VIT University, Vellore 632014, India


1 Introduction

The "Auto-complete" feature starts predicting words when the user enters the first few letters of the word they want to search. When the user enters the first letter, auto-complete displays the words beginning with that letter, and so the writer can select a word from the predicted values instead of fully typing the text. This saves a lot of time for users [1, 2].
Sometimes the predicted words are the ones recently searched by the user. Language modeling and Augmentative and Alternative Communication (AAC) devices are used in the word prediction process to predict the most frequently and commonly used words. The user can also enter words into the prediction dictionaries using the word prediction software [3, 4].
• To understand the dynamic tree data structure used in developing the program.
• To understand the "trie" data structure being used in the program.
• To construct a robust and efficient algorithm so that the program is editable and can later be used as a module in a larger software system.
• To develop a real-time program which is efficient, has fast processing and has industrial applications.
In this program, the trie data structure is used to search the data in an ordered fashion. This data structure is also known as a radix, prefix or digital tree. It helps in storing the data in a dynamic set in which the keys are strings.
A node in the tree does not store the key itself; instead, the position of the node in the tree defines the key. All the successors of a node share a common prefix, the string associated with that node, and the root is associated with the empty string. Values tend to be associated only with leaves, and with some inner nodes that correspond to keys of interest (Fig. 1).

Fig. 1 Searching a node using trie

Fig. 2 Types of binary tree

A compact prefix tree is used for space optimization. In the example shown above, predictions are made at the nodes based on the information in the first node. The final nodes carry the prefix of the earlier node [5–10].
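A minimal Python sketch of such a prefix tree is given below; the class and method names are illustrative and not taken from the project code. Each node stores only its children and a flag marking the end of a stored word, so the key is defined by the node's position in the tree.

class TrieNode:
    def __init__(self):
        self.children = {}     # maps a character to the child node
        self.is_word = False   # marks nodes that complete a stored word

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def words_with_prefix(self, prefix):
        # Walk down to the node for the prefix, then collect every word below it.
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        results = []
        def collect(n, suffix):
            if n.is_word:
                results.append(prefix + suffix)
            for ch, child in n.children.items():
                collect(child, suffix + ch)
        collect(node, "")
        return results

t = Trie()
for w in ["are", "art", "apple"]:
    t.insert(w)
print(t.words_with_prefix("ar"))   # ['are', 'art']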
Binary Trie
A binary tree is a data structure in which each node has two links, known as the left child and the right child. When a general tree is converted to a binary tree, the leftmost child of a parent becomes its left child, and each remaining child becomes the right child of its preceding sibling (Fig. 2).
To traverse the binary tree, this project uses three different types of traversals (a short Python sketch follows the definitions below). They are:
• Post-order
• In-order
• Pre-order
Post-order: traverses the left child node first, then the right child node and finally the root (LRV).
In-order: traverses the left child node first, then the root and finally the right child (LVR).
Pre-order: traverses the root first, then the left child and finally the right child (VLR).
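These three orders can be sketched in a few lines of Python; the node class and the small sample tree are illustrative only.

class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def post_order(node):   # LRV: left subtree, right subtree, then the node
    if node:
        post_order(node.left)
        post_order(node.right)
        print(node.value, end=" ")

def in_order(node):     # LVR: left subtree, the node, then right subtree
    if node:
        in_order(node.left)
        print(node.value, end=" ")
        in_order(node.right)

def pre_order(node):    # VLR: the node, then left and right subtrees
    if node:
        print(node.value, end=" ")
        pre_order(node.left)
        pre_order(node.right)

root = Node("A", Node("N", Node("D")), Node("R", Node("E")))
pre_order(root)   # prints: A N D R E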
In the auto-complete binary tree, the traversal first visits the root, then moves to its left child and then along its right children until the node for the next letter is found; it then descends to the left child of the found node.
To traverse for the word ARE:
First visit the root node A and move to its left child N. Compare it with the second letter of the word. Since it is not the same, move to its right child R and compare. It is the same; therefore, move to its left child (Fig. 3).

Fig. 3 Finding the prefix ARE

2 Background

The main theme of this study was to investigate whether word processing is helpful to people, especially people with disabilities who find writing difficult. It mainly focused on teaching children to use word processing.
An activity was conducted among the children. In the first case, the children wrote stories on their own, which involved handwritten work, while in the other case the children used word processing and word prediction in writing. Differences were noted in spelling, grammatical errors and the use of legible words. The results varied, and the differences clearly showed the importance of using word processing or word prediction [11–15].
Word processing with word prediction improves the legibility and spelling of written assignments completed by some children with learning disabilities and handwriting difficulties. Many students with physical disabilities find it difficult to write fluently. One type of assistive technology developed to improve accuracy in writing is word prediction software, although there is a lack of research supporting its use for individuals with physical disabilities [16–20].
This study researched word prediction and word processing to examine the accuracy of draft papers written by physically disabled people. Results indicated that there was no effect on writing speed, but the approach shows promise in decreasing spelling and typographical errors [21–25].
Writing is a medium of human communication that involves interaction between physical and cognitive skills. Physically disabled people find writing difficult, and so they have to overcome several barriers in order to write.
Many of the opportunities gained by individuals are based on their writing skills, so technological development in the field of writing is needed. One such technology is assistive technology, which helps increase fluency in typing. The main motive of this study is to improve the writing skills of physically disabled people. An alternating-treatment design was used in which diverse
physically disabled people were recruited. The words correct per minute (WCPM) and the grammatical errors were noted for further investigation [26–30].
The recruited people were allowed to type for three minutes using the word processor and word prediction. This was done to check which of the two was more efficient for writing fluently. The most widely used websites include library websites and the online search tools used by young people to search for things. Providing automated search features is therefore important for obtaining relevant results in their academic research scenarios, helping the user end up with the best hit.
Auto-completion technology has been very productive and affordable for people who have disabilities and find writing difficult. It is easily made available when compared with speech-to-text technology or special input devices.
A limitation of this feature is the following query: for a set A of documents and an alphabetical range B of words, compute the set of all word-in-document pairs (a, b) from the collection such that a belongs to A and b belongs to B; this computation is tied to the size of the underlying document collection. Python is the most frequently used language for testing advanced features compared with scripting languages such as AutoHotkey [31–37].

3 Proposed Method

Implementation of binary tree


The binary tree data structure instantiates a pointer of node type and initializes its left and right child nodes to NULL. It contains a method to add a new word to the dictionary and a method to search for the location of a partial word, which returns a node pointer. It also contains methods to parse the binary tree and return all the results in a vector of strings. A Python sketch of these methods is given after the list below.
Methods
• void addWord(string): adds the given string to the binary tree if it does not already exist. This function works on interfaces provided by the node class to append a child, i.e. to add a character of the word to the binary tree.
• Node* searchWord(string, bool): returns a pointer to the node that contains the last character of the string by traversing the binary tree. If the bool flag is false, it returns the pointer to the node even if it is not a word. If the bool flag is true, it returns the pointer only if the node is a word.
• bool autoComplete(string, vector<string>&): returns true if auto-complete can be performed on the word entered by the user, else returns false. It calls the two utility functions searchWord and parseTree.
• void parseTree(Node*, string&, vector<string>&, bool&): traverses the binary tree and appends the words to the vector<string>&.
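A Python sketch of these methods, under stated assumptions, is given below: it uses the left-child/right-sibling layout described in the introduction, Python naming instead of the C++ signatures above, and is an illustration rather than the authors' implementation.

class Node:
    def __init__(self, char):
        self.char = char
        self.left = None       # first child: next character level
        self.right = None      # next sibling: alternative character at this level
        self.is_word = False

class AutoCompleteTree:
    def __init__(self):
        self.root = Node("")   # dummy root for the empty prefix

    def add_word(self, word):
        # Adds the word if it does not already exist, creating nodes as needed.
        node = self.root
        for ch in word:
            prev, child = None, node.left
            while child is not None and child.char != ch:
                prev, child = child, child.right
            if child is None:              # character not present at this level
                child = Node(ch)
                if prev is None:
                    node.left = child
                else:
                    prev.right = child
            node = child
        node.is_word = True

    def search_word(self, prefix, must_be_word=False):
        # Returns the node holding the last character of the prefix, or None.
        node = self.root
        for ch in prefix:
            child = node.left
            while child is not None and child.char != ch:
                child = child.right
            if child is None:
                return None
            node = child
        if must_be_word and not node.is_word:
            return None
        return node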

Parsing the Tree


Method

void parseTree(Node*, string, vector<string>&, bool&)

This module traverses the binary tree and appends the words to the vector of strings passed by reference to the parseTree function. It uses the following algorithm; a Python sketch continuing the class above is given after the two steps.
1. If a left child exists, go left.
2. If a right child exists, go right; else return to the previous node.
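Continuing the sketch above (and again using illustrative names rather than the authors' code), the parsing and auto-complete steps can be written as:

def parse_tree(node, prefix, results):
    # Collects every word stored in the subtree hanging below 'node'.
    if node.is_word:
        results.append(prefix)
    child = node.left                      # step 1: go left, one level down
    while child is not None:               # step 2: then walk the right-sibling chain
        parse_tree(child, prefix + child.char, results)
        child = child.right

def auto_complete(tree, prefix):
    node = tree.search_word(prefix)
    if node is None:
        return []                          # nothing stored under this prefix
    results = []
    parse_tree(node, prefix, results)
    return results

tree = AutoCompleteTree()
for w in ["apple", "apply", "are", "gang"]:
    tree.add_word(w)
print(auto_complete(tree, "appl"))         # ['apple', 'apply']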

4 Result Analysis

In Fig. 4, the y-axis represents the time in seconds and the x-axis represents the iteration number for the binary tree and trie data for the prefix appl. The binary tree approach is better for the prefix appl (Tables 1 and 2).
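Per-iteration timings of the kind reported in Tables 1 and 2 can be collected with Python's perf_counter; the helper below is an illustrative sketch, and the callable it measures (for example, the auto_complete sketch above) is assumed to be supplied by the caller.

import time

def measure(lookup, iterations=4, repeats=10000):
    # Returns one wall-clock time per iteration, comparable to Tables 1 and 2.
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        for _ in range(repeats):
            lookup()
        times.append(round(time.perf_counter() - start, 3))
    return times

# Example, assuming the earlier sketches are in scope:
# print(measure(lambda: auto_complete(tree, "appl")))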

Fig. 4 Time comparison for prefix appl

Table 1 Time taken for both approaches
Iteration Time taken in BT (s) Time taken in trie (s)
1 0.016 0.031
2 0.015 0.035
3 0.025 0.027
4 0.016 0.026

Table 2 Time comparison for different prefix
Iteration Time taken in BT (s) Time taken in trie (s)
1 0.016 0.030
2 0.016 0.031
3 0.015 0.031
4 0.015 0.026

Fig. 5 Time comparison for prefix Gan

In Fig. 5, the y-axis represents the time in seconds and the x-axis represents the iteration number for the binary tree and trie data for the prefix Gan. The binary tree approach is again better for the prefix Gan.

5 Conclusion

Word predictors have applications in messaging applications like WhatsApp, web search engines, word processors, command-line interpreters, etc. The original need for word prediction software was to help people with physical disabilities, increasing their typing speed and reducing the number of keystrokes needed to complete a word or a sentence. On this front, this project has developed a program for word prediction using the binary tree data structure, which increases the efficiency of the user by at least 10%.

References

1. Sturm, J.M., Rankin-Erickson, J.L.: This report that mind mapping helps students with learning
disabilities to enhance their writing skills. Learn. Disabilities Res. Practice 17, 124–139 (2002)
2. Todman, J., Dugard, P.: Single-Case and Small-N Experimental Designs: A Practical Adviser
to Randomization Tests. Lawrence Erlbaum Associates, Mahwah, NJ (2001)
3. Tumlin, J., Heller, K.: Using word prediction software, writing becomes more easier to mild
disabilities. J. Special Educ. Technol. 19(3) (2004). https://jset.unlv.edu/19.3/tumlin/first.html
4. Weller, H.G.: Evaluating the effect of computer-based methods to support science teaching. J.
Res. Comput. Educ. 28, 461–485 (1996)
5. Zhang, Y.: Technology and the writing skills of students with learning disabilities. J. Res.
Comput. Educ. 32, 467–478 (2000)
6. Basu, S., Kannayaram, G., Ramasubbareddy, S., Venkatasubbaiah, C.: Improved genetic algo-
rithm for monitoring of virtual machines in cloud environment. In: Smart Intelligent Computing
and Applications, pp. 319–326. Springer, Singapore (2019)
7. Somula, R., Sasikala, R.: Round robin with load degree: an algorithm for optimal cloudlet
discovery in mobile cloud computing. Scal. Comput. Practice Exper. 19(1), 39–52 (2018)
8. Somula, R., Anilkumar, C., Venkatesh, B., Karrothu, A., Kumar, C. P., Sasikala, R.: Cloudlet
services for healthcare applications in mobile cloud computing. In: Proceedings of the 2nd
International Conference on Data Engineering and Communication Technology, pp. 535–543.
Springer, Singapore (2019)
9. Somula, R.S., Sasikala, R.: A survey on mobile cloud computing: mobile computing+ cloud
computing (MCC= MC + CC). Scal. Comput. Pract. Experi. 19(4), 309–337 (2018)
10. Somula, R., Sasikala, R.: A load and distance aware cloudlet selection strategy in multi-cloudlet
environment. Int. J. Grid High Perform. Comput. (IJGHPC) 11(2), 85–102 (2019)
11. Somula, R., Sasikala, R.: A honey bee inspired cloudlet selection for resource allocation. In:
Smart Intelligent Computing and Applications, pp. 335–343. Springer, Singapore (2019)
12. Nalluri, S., Ramasubbareddy, S., Kannayaram, G.: Weather prediction using clustering
strategies in machine learning. J. Comput. Theor. Nanosci. 16(5–6), 1977–1981 (2019)
13. Sahoo, K.S., Tiwary, M., Mishra, P., Reddy, S.R.S., Balusamy, B., Gandomi, A.H.: Improving
end-users utility in software-defined wide area network systems. In: IEEE Transactions on
Network and Service Management (2019)
14. Sahoo, K.S., Tiwary, M., Sahoo, B., Mishra, B.K., RamaSubbaReddy, S., Luhach, A.K.: RTSM:
response time optimisation during switch migration in software-defined wide area network. In:
IET Wireless Sensor Systems (2019)
15. Somula, R., Kumar, K.D., Aravindharamanan, S., Govinda, K.: Twitter sentiment analysis
based on US presidential election 2016. In: Smart Intelligent Computing and Applications,
pp. 363–373. Springer, Singapore (2020)
16. Sai, K.B.K., Subbareddy, S.R., Luhach, A.K.: IOT based air quality monitoring system using
MQ135 and MQ7 with machine learning analysis. Scal. Comput. Practice Experi. 20(4), 599–
606 (2019)
17. Somula, R., Narayana, Y., Nalluri, S., Chunduru, A., Sree, K.V.: POUPR: properly utilizing
user-provided recourses for energy saving in mobile cloud computing. In: Proceedings of the
2nd International Conference on Data Engineering and Communication Technology, pp. 585–
595. Springer, Singapore (2019)
18. Vaishali, R., Sasikala, R., Ramasubbareddy, S., Remya, S., Nalluri, S.: Genetic algorithm based
feature selection and MOE Fuzzy classification algorithm on Pima Indians Diabetes dataset. In:
2017 International Conference on Computing Networking and Informatics (ICCNI), pp. 1–5.
IEEE (2017, Oct)

19. Somula, R., Sasikala, R.: A research review on energy consumption of different frameworks in
mobile cloud computing. In: Innovations in Computer Science and Engineering, pp. 129–142.
Springer, Singapore (2019); Kumar, I.P., Sambangi, S., Somukoa, R., Nalluri, S., Govinda, K.:
Server security in cloud computing using block-chaining technique. In: Data Engineering and
Communication Technology, pp. 913–920. Springer, Singapore (2020)
20. Kumar, I.P., Gopal, V.H., Ramasubbareddy, S., Nalluri, S., Govinda, K.: Dominant color palette
extraction by K-means clustering algorithm and reconstruction of image. In: Data Engineering
and Communication Technology, pp. 921–929. Springer, Singapore (2020)
21. Nalluri, S., Saraswathi, R.V., Ramasubbareddy, S., Govinda, K., Swetha, E.: Chronic heart
disease prediction using data mining techniques. In: Data Engineering and Communication
Technology, pp. 903–912. Springer, Singapore (2020)
22. Krishna, A.V., Ramasubbareddy, S., Govinda, K.: Task scheduling based on hybrid algo-
rithm for cloud computing. In: International Conference on Intelligent Computing and Smart
Communication 2019, pp. 415–421. Springer, Singapore (2020)
23. Srinivas, T.A.S., Ramasubbareddy, S., Govinda, K., Manivannan, S.S.: Web image authenti-
cation using embedding invisible watermarking. In: International Conference on Intelligent
Computing and Smart Communication 2019, pp. 207–218. Springer, Singapore (2020)
24. Krishna, A.V., Ramasubbareddy, S., Govinda, K.: A unified platform for crisis mapping using
web enabled crowdsourcing powered by knowledge management. In: International Conference
on Intelligent Computing and Smart Communication 2019, pp. 195–205. Springer, Singapore
(2020)
25. Saraswathi, R.V., Nalluri, S., Ramasubbareddy, S., Govinda, K., Swetha, E.: Brilliant corp yield
prediction utilizing internet of things. In: Data Engineering and Communication Technology,
pp. 893–902. Springer, Singapore (2020)
26. Baliarsingh, S.K., Vipsita, S., Gandomi, A.H., Panda, A., Bakshi, S., Ramasubbareddy, S.:
Analysis of high-dimensional genomic data using map reduce based probabilistic neural
network. Comput. Methods Progr. Biomed. 105625 (2020)
27. Lavanya, V., Ramasubbareddy, S., Govinda, K.: Fuzzy keyword matching using N-gram and
cryptographic approach over encrypted data in cloud. In: Embedded Systems and Artificial
Intelligence, pp. 551–558. Springer, Singapore (2020)
28. Revathi, A., Kalyani, D., Ramasubbareddy, S., Govinda, K.: Critical review on course recom-
mendation system with various similarities. In: Embedded Systems and Artificial Intelligence,
pp. 843–852. Springer, Singapore (2020)
29. Mahesh, B., Kumar, K.P., Ramasubbareddy, S., Swetha, E.: A review on data deduplication
techniques in cloud. In: Embedded Systems and Artificial Intelligence, pp. 825–833. Springer,
Singapore (2020)
30. Sathish, K., Ramasubbareddy, S., Govinda, K.: Detection and localization of multiple objects
using VGGNet and single shot detection. In: Emerging Research in Data Engineering Systems
and Computer Communications, pp. 427–439. Springer, Singapore (2020)
31. Pradeepthi, C., Geetha, V.V., Ramasubbareddy, S., Govinda, K.: Prediction of real estate price
using clustering techniques. In: Emerging Research in Data Engineering Systems and Computer
Communications, pp. 281–289. Springer, Singapore (2020)
32. Maddila, S., Ramasubbareddy, S., Govinda, K.: Crime and fraud detection using clustering
techniques. In: Innovations in Computer Science and Engineering, pp. 135–143. Springer,
Singapore (2020)
33. Rakshitha, K., Rao, A.S., Sagar, Y., Ramasubbareddy, S.: Demonstrating broadcast aggregate
keys for data sharing in cloud. In: Innovations in Computer Science and Engineering, pp. 185–
193. Springer, Singapore (2020)
34. Ramasubbareddy, S., Srinivas, T.A.S., Govinda, K., Manivannan, S.S.: Comparative study
of clustering techniques in market segmentation. In: Innovations in Computer Science and
Engineering, pp. 117–125. Springer, Singapore (2020)
35. Ramasubbareddy, S., Srinivas, T.A.S., Govinda, K., Manivannan, S.S.: Crime prediction
system. In: Innovations in Computer Science and Engineering, pp. 127–134. Springer,
Singapore (2020)

36. Sahoo, K.S., Tiwary, M., Sahoo, S., Nambiar, R., Sahoo, B., Dash, R.: A learning automata-
based DDoS attack defense mechanism in software defined networks. In: Proceedings of the
24th Annual International Conference on Mobile Computing and Networking, pp. 795–797
(2018, Oct)
37. Sahoo, K.S., Sahoo, S., Sarkar, A., Sahoo, B., Dash, R.: On the placement of controllers for
designing a wide area software defined networks. In: TENCON 2017–2017 IEEE Region 10
Conference, pp. 3123–3128. IEEE (2017, Nov)
Fruit Recognition Using Deep Learning

P. Balakesava Reddy, Somula Ramasubbareddy, D. Saidulu, and K. Govinda

Abstract This paper discusses fruit classification using data collected from the Fruits_360 dataset. Using this data, a neural network is trained to identify the fruit, with deep learning and image processing concepts forming the neural network system. The proposed work uses convolution neural networks in building the model and also uses ResNet to obtain the image classification results from the deep learning concept. To meet the resource requirements of the proposed work, it uses the Google Cloud Vision API, which provides the required GPU to proceed with analyzing the data from the image; the paper also discusses in depth how image classification is done using deep learning concepts. A deep learning model is built which classifies the given image into one of these nine categories: Apple, Avocado, Banana, Cherry, Cocos, Kiwi, Mango, Orange, Lemon. This model can also be implemented as a mobile version.

Keywords Deep learning · ResNet · Prediction · Network

1 Introduction

Convolution neural networks are designed based on the neural networks of the human brain. The human brain needs many real-life experiences so that it can recognize similar events again and provide the appropriate response [1]. Similarly, here, the data is given to the convolution network and is used to train the network for the further validation of images [2]. So, this

P. Balakesava Reddy · S. Ramasubbareddy (B)


Information Technology, VNRVJIET, Hyderabad, Telangana, India
e-mail: svramasubbareddy1219@gmail.com
D. Saidulu
Information Technology, Guru Nanak Institutions Technical Campus, Hyderabad, Telangana,
India
K. Govinda
SCOPE, VIT University, Vellore, Tamilnadu, India


convolution network concept is similar to the human neural networks and, moreover, is designed in the same manner. In the testing process, the given image is read and compared with many other images, and the nearest image to the given image is found based on the probabilities. It processes the whole image set until the last image even if it has already found an accurate match, so poor inputs will form a more complicated network, and achieving highly accurate output can become more complicated and sometimes not even possible [3].
In convolution, images fall into two categories: black-and-white images, which form a 2D array, and colored images, which form a 3D array. Since they are different, the values assigned to the pixels when given to the CNN will differ [4]. For a black-and-white image, each pixel is assigned a value between 0 and 255 to represent the intensity of that pixel, whereas a colored image is a combination of red, green and blue and has a separate extra layer for each, meaning each color channel has a range of 0–255; for example, a pixel might have (255, 105, 180) as its value, which defines a pink pixel in the image [5]. From this, we know the color of the given image, which is considered the input to the network.
First, the boundaries of the image are set by detecting its edges; the data inside the boundaries is marked with 1's and the remaining part of the image with 0's, so that the location of the fruit boundaries can be marked.
Now, a 3 × 3 matrix is used as the feature detector: a kernel which scans the image and extracts the data. The kernel is placed so that its first-row, first-column cell fits inside the boundary of the selected image region; the feature detector is then moved across to the other end of the row, and afterwards down to the next row, which results in the feature map. The feature map reduces the number of pixels, which reduces the input image size, so less time is taken to process the image for further use. The larger the stride, the smaller the feature map, and the smaller the feature map, the lower the accuracy. There is also a loss of information, which might reduce accuracy; however, the aim is to keep only the main content of the image and remove the extra part, which can also improve accuracy. Reducing the input image means concentrating only on the main features of the input; this helps the detection process focus on the main data instead of the useless data in the image, which would otherwise decrease the accuracy of the system [2].
Non-linearity is introduced through the rectified linear unit (ReLU) applied after the convolution operation. It removes all the negative (black) elements, keeping only the positive values of the data [3].
In general, a regular feature map is used, but there is another kind, the pooled feature map, whose construction differs from the regular one. Take a 2 × 2 box and place it at the left corner, just as the 3 × 3 kernel was placed, and move it toward the opposite end of the row. With a stride of 2 pixels, a 6 × 6 feature map results in a 3 × 3 pooled feature map; a stride of 2 is the most common choice. Keeping the minimum amount of information that still accounts for distortions is the whole point of pooling. Using these techniques, the creation of the model can be achieved [6–10].
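A minimal NumPy sketch of the 3 × 3 feature detector, the ReLU step and the 2 × 2, stride-2 pooling described above is given below. The kernel values and the 8 × 8 toy image are arbitrary choices for illustration, not values taken from the paper.

import numpy as np

def feature_map(image, kernel, stride=1):
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)   # element-wise multiply, then sum
    return out

def relu(x):
    return np.maximum(x, 0)                      # keep only positive responses

def max_pool(fmap, size=2, stride=2):
    oh = (fmap.shape[0] - size) // stride + 1
    ow = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = fmap[i*stride:i*stride+size, j*stride:j*stride+size].max()
    return out

image = np.random.randint(0, 256, (8, 8))                 # toy grayscale image
kernel = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])   # simple edge detector
pooled = max_pool(relu(feature_map(image, kernel)))
print(pooled.shape)   # (3, 3): 8x8 image -> 6x6 feature map -> 3x3 pooled map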

2 Literature Survey

Previous work has used neural networks and deep learning concepts for image classification. The following papers discuss counting fruits of various kinds from a given bunch of fruits. Locating and counting red and green pepper fruits from a large bunch of fruits is the main aim; around 28,000 images of various plants were used for training and validation. Two steps are involved in the process: one is processing a single image, and the other is integrating all the views to obtain accuracy. In that project, neural network concepts are used together with convolution networks. The network is trained using RGB images, which are further normalized in two dimensions. Another paper discusses apple production prediction, from which we learn how to obtain the edges and cross-sectional area of the fruits, including the cross-sectional area of the ripened part. The damaged part of the fruit is detected based on its texture and color and compared with the other testing data using the k-nearest neighbor algorithm, which predicts the accuracy. Similar approaches are also used in face detection and vehicle detection based on linear projections and analysis of the image [1–4].
This model uses the concepts of the convolutional neural network (CNN). It has five kinds of layers: the convolution layer, followed by the rectified linear unit (ReLU) layer, then the pooling layer, then the fully connected layers and finally the loss layer. Here, RGB images with a pixel size of 100 × 100 are used [5].
An operation over two functions that produces a third function is called convolution. Here, the third function is derived from the two functions and carries the characteristics of both functions used to derive it. The convolution layer does the same: the input image data is convolved to form a resulting function which predicts the output [11–15].
To increase the non-linear properties of the input data, the ReLU layer is used. To reduce the dimensions and the number of computations, pooling layers are used; a 2 × 2 pooling filter with a stride of 2 is applied, which reduces the input to one quarter of its size. The layers from the regular neural network are called fully connected layers; the connections between the neurons of one layer and the next are made here [15–20].

3 Proposed Method

All the images in the dataset were pre-processed using the TensorFlow image data generator. Any null entries were filled with the mean value in the dataset. To avoid overfitting and exploding or vanishing gradients, all the images in the dataset were normalized by subtracting their mean and dividing by their standard deviation. Since deep learning models can only be fed numeric data, raw images cannot be fed directly to the model; the images are therefore converted into numeric tensors, which are multi-dimensional arrays, mostly three dimensional. Generally, grayscale images are two dimensional and their pixel values vary between 0 and 255, whereas colored images are three dimensional, the third dimension being the color dimension, which generally has three channels: the red, green and blue (RGB) channels. As with grayscale images, the pixel values in these channels vary between 0 and 255. Accordingly, all the images in the dataset are converted to numeric tensors with multiple channels. The training and testing data were split in an 80:20 ratio. In the present paper, the size of the training set is increased by data augmentation. Parameters such as rotation range, vertical flip, horizontal flip, height shift range, width shift range and zoom range are applied to the images to increase the dataset size through data augmentation. By applying these parameters, the image gets randomly cropped, rotated, zoomed and flipped at different angles. In this way, the model does not see repeated copies of the base image but learns better by observing the image in different ways, so data augmentation does not cause the model to overfit.
The values of the parameters used for this paper are:
Rotation range = 0.5,
Zoom range = 0.5,
Width shift range = 0.5,
Height shift range = 0.5,
Horizontal flip = False,
Vertical flip = False.
Normalizing the data is very important in deep learning because the training data contains different ranges of feature values for each feature, and the update driven by the learning rate in each dimension will differ from that in the other dimensions. We may be greatly increasing the correction in one weight dimension while decreasing it in another. Training also takes a long time when the model is trained without normalization. In this paper, we have normalized our training and test data by subtracting their mean values and then dividing by their standard deviation, so the values lie roughly between zero and one. The speed of training also increases, since the gradient updates become easier, and the accuracy gradually increases [21–25].
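The augmentation and normalization described above can be expressed with tf.keras's ImageDataGenerator, as in the sketch below. The parameter values follow the list given earlier; the per-image (samplewise) normalization flags, the dataset directory name, the 100 × 100 target size and the batch size are assumptions made for illustration rather than settings reported in the paper.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=0.5,
    zoom_range=0.5,
    width_shift_range=0.5,
    height_shift_range=0.5,
    horizontal_flip=False,
    vertical_flip=False,
    samplewise_center=True,               # subtract each image's mean
    samplewise_std_normalization=True,    # divide by each image's std
    validation_split=0.2)                 # 80:20 train/test split

train_gen = datagen.flow_from_directory(
    "fruits_360/",                        # hypothetical dataset path
    target_size=(100, 100), batch_size=32,
    class_mode="categorical", subset="training")
test_gen = datagen.flow_from_directory(
    "fruits_360/",
    target_size=(100, 100), batch_size=32,
    class_mode="categorical", subset="validation")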
The proposed method uses convolutional neural networks, which play a major role in solving problems related to computer vision. Convolution neural networks are similar to general neural networks, consisting of an input, hidden layers and an output. The activation function used in this work is the rectified linear unit (ReLU). The layers following the convolution layers in the network are max pooling layers, which help in reducing the dimensions, preserving spatial invariance and outputting the high-level features of the input. To avoid overfitting and gradient explosion or vanishing, some dropout layers are added within the model. Since the model is mainly built using convolutional layers and an output over seven classes is needed, the layers are flattened at the end of the model, some dense layers are added along with dropout layers, and the final layer is a softmax layer because the data contains multiple classes.

Fig. 1 Convolution neural network using ResNet flow diagram

Since there are seven classes, the final layer is built with seven neurons. In the output, seven probability values are obtained; each neuron outputs the probability of the input image belonging to that class, and the values of the seven neurons sum to one (Fig. 1).
The proposed method also uses the TensorFlow framework for designing, training and evaluating the deep learning model in this task. The model is built mainly using transfer learning, which means a model pre-trained on large datasets is used as the basis for our model. The pre-trained model used here is ResNet, which is trained on 14 million images spanning more than 1000 classes (categories). ResNet is trained on the ImageNet dataset, and the weights are saved after training. At the starting stage of training, these weights, generated from the ImageNet training, are assigned to the pre-trained model, and some customized convolutional and max pooling layers are added on top of it. At first, the model is trained with the weights of the pre-trained model frozen, meaning the pre-trained model keeps its initial weights and they are not updated; during this training, only the parameters of the added layers are updated. The performance of the model improves after unfreezing the weights, because the whole model is then trained on the particular dataset we provide. The parameters in the model are updated by back propagation, which means that after calculating the loss between the actual and predicted outputs, the parameters are updated with respect to the loss value [26–30].
The loss between the actual and predicted outputs is calculated using the categorical cross-entropy function. In this work, the optimizer used to update the parameters is the RMSprop optimizer with an initial learning rate of 0.0001.
As the earlier model is added to the latter model, artificial neural networks and convolutional neural networks are combined. Hence, the model becomes more complex and

Table 1 Network structure
Layer Dimension parameter Output parameters
Convolutional layer 3×3×4 16
Max pooling 2 × 2—Stride: 2 –
Convolutional layer 3 × 3 × 16 32
Max pooling 2 × 2—Stride: 2 –
Convolutional layer 3 × 3 × 32 64
Max pooling 2 × 2—Stride: 2 –
Convolutional layer 3 × 3 × 64 128
Max pooling 2 × 2—Stride: 2 –
Fully connected layer 3 × 3 128 1024
Fully connected layer 1024 256
Softmax 256 60

sophisticated during the beginning stage of creating a convolutional neural network.


Here, the artificial neural network plays a vital role by making the convolutional network more capable of categorizing an image. The artificial neural network helps in getting the data, integrating the features and making the convolutional network more efficient. Since many classes of data are used, the softmax function is required [31–37] (Table 1).
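The transfer-learning setup described in this section can be sketched in tf.keras as follows. The specific ResNet variant (ResNet50 here), the sizes of the added layers, the dropout rate and the choice of seven output classes are assumptions for illustration; the loss function, optimizer and learning rate follow the values stated above.

import tensorflow as tf

base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(100, 100, 3))
base.trainable = False                     # freeze the pre-trained weights first

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(7, activation="softmax"),   # one neuron per class
])

model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
    loss="categorical_crossentropy",
    metrics=["accuracy"])

# model.fit(train_gen, validation_data=test_gen, epochs=40)
# After this first stage, base.trainable can be set to True and the model
# recompiled to fine-tune (unfreeze) the whole network on the fruit data.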

4 Result

4.1 Validation and Testing Results After Each Epoch

Epoch 31/40
95/95 [==============================]—3 s 31 ms/step—loss:
0.0558—acc: 0.9784—val_loss: 0.0126—val_acc: 0.9959
Epoch 32/40
95/95 [==============================]—3 s 31 ms/step—loss:
0.0602—acc: 0.9764—val_loss: 0.0034—val_acc: 0.9995
Epoch 33/40
95/95 [==============================]—3 s 30 ms/step—loss:
0.0603—acc: 0.9763—val_loss: 0.0618—val_acc: 0.9808
Epoch 34/40
95/95 [==============================]—3 s 31 ms/step—loss:
0.0544—acc: 0.9785—val_loss: 0.0254—val_acc: 0.9906
Epoch 35/40
95/95 [==============================]—3 s 31 ms/step—loss:
0.0703—acc: 0.9720—val_loss: 0.0472—val_acc: 0.9837

Fig. 2 Accuracy after 40 epochs

Epoch 36/40
95/95 [==============================]—3 s 31 ms/step—loss:
0.0551—acc: 0.9788—val_loss: 0.0671—val_acc: 0.9787
Epoch 37/40
95/95 [==============================]—3 s 30 ms/step—loss:
0.0510—acc: 0.9800—val_loss: 0.0124—val_acc: 0.9925
Epoch 38/40
95/95 [==============================]—3 s 32 ms/step—loss:
0.0558—acc: 0.9781—val_loss: 0.0173—val_acc: 0.9934
Epoch 39/40
95/95 [==============================]—3 s 30 ms/step—loss:
0.0481—acc: 0.9810—val_loss: 0.0038—val_acc: 0.9979
Epoch 40/40
95/95 [==============================]—3 s 31 ms/step—loss:
0.0407—acc: 0.9839—val_loss: 0.0417—val_acc: 0.9889 (Figs. 2 and 3).

5 Conclusion

An effective algorithm for detecting and tracking objects has been explained, along with the drawbacks and efficiency of the algorithm, in order to overcome the issues of recognition and of tracking related to movement and appearance. A major application of fruit detection can be observed in vision-based AI systems, where identification and tracking play a major role. For any fruit tracking algorithm, the initial step is to locate the fruit in the respective frame. Though there are numerous algorithms, choosing the accurate location of the fruit has been a difficult task.

Fig. 3 Confusion matrix

A CNN is widely used by researchers for fruit detection, and tracking follows the detection; there are many algorithms to track the objects. For future work, the same algorithm can be implemented for more objects of different kinds, with more advanced filters for noise reduction.

References

1. O’Shea, K., Nash, R.: An Introduction to Convolutional Neural Networks. ArXiv e-prints
(2015)
2. Albawi, S., Abed Mohammed, T., Alzawi, S.: Understanding of a Convolutional Neural
Network (2017). https://doi.org/10.1109/ICEngTechnol.2017.8308186
3. Khan, A., Sohail, A., Zahoora, U., Saeed, A.: A Survey of the Recent Architectures of Deep
Convolutional Neural Networks (2019)
4. Zhang, F., Hu, M.: Memristor-Based Deep Convolution Neural Network: A Case Study (2018)
5. Bambharolia, P.: Overview of convolutional neural networks (2017)
6. Basu, S., Kannayaram, G., Ramasubbareddy, S., Venkatasubbaiah, C.: Improved genetic algo-
rithm for monitoring of virtual machines in cloud environment. In: Smart Intelligent Computing
and Applications, pp. 319–326. Springer, Singapore (2019)
7. Somula, R., Sasikala, R.: Round robin with load degree: an algorithm for optimal cloudlet
discovery in mobile cloud computing. Scal. Comput. Practice Exper. 19(1), 39–52 (2018)
8. Somula, R., Anilkumar, C., Venkatesh, B., Karrothu, A., Kumar, C.P., Sasikala, R.: Cloudlet
services for healthcare applications in mobile cloud computing. In: Proceedings of the 2nd
International Conference on Data Engineering and Communication Technology, pp. 535–543.
Springer, Singapore (2019)
9. Somula, R.S., Sasikala, R.: A survey on mobile cloud computing: mobile computing + cloud
computing (MCC= MC + CC). Scal. Comput. Practice Experi. 19(4), 309–337 (2018)

10. Somula, R., Sasikala, R.: A load and distance aware cloudlet selection strategy in multi-cloudlet
environment. Int. J. Grid High Perform. Comput. (IJGHPC) 11(2), 85–102 (2019)
11. Somula, R., Sasikala, R.: A honey bee inspired cloudlet selection for resource allocation. In:
Smart Intelligent Computing and Applications, pp. 335–343. Springer, Singapore (2019)
12. Nalluri, S., Ramasubbareddy, S., Kannayaram, G.: Weather prediction using clustering
strategies in machine learning. J. Comput. Theor. Nanosci. 16(5–6), 1977–1981 (2019)
13. Sahoo, K.S., Tiwary, M., Mishra, P., Reddy, S.R.S., Balusamy, B., Gandomi, A.H.: Improving
end-users utility in software-defined wide area network systems. In: IEEE Transactions on
Network and Service Management
14. Sahoo, K.S., Tiwary, M., Sahoo, B., Mishra, B.K., RamaSubbaReddy, S., Luhach, A.K.: RTSM:
response time optimisation during switch migration in software-defined wide area network. In:
IET Wireless Sensor Systems
15. Somula, R., Kumar, K.D., Aravindharamanan, S., Govinda, K.: Twitter sentiment analysis
based on US presidential election 2016. In: Smart Intelligent Computing and Applications,
pp. 363–373. Springer, Singapore (2020)
16. Sai, K.B.K., Subbareddy, S.R., Luhach, A.K.: IOT based air quality monitoring system using
MQ135 and MQ7 with machine learning analysis. Scal. Comput. Practice Experi. 20(4), 599–
606 (2019)
17. Somula, R., Narayana, Y., Nalluri, S., Chunduru, A., Sree, K.V.: POUPR: properly utilizing
user-provided recourses for energy saving in mobile cloud computing. In: Proceedings of the
2nd International Conference on Data Engineering and Communication Technology, pp. 585–
595. Springer, Singapore (2019)
18. Vaishali, R., Sasikala, R., Ramasubbareddy, S., Remya, S., Nalluri, S.: Genetic algorithm based
feature selection and MOE Fuzzy classification algorithm on Pima Indians diabetes dataset. In:
2017 International Conference on Computing Networking and Informatics (ICCNI), pp. 1–5.
IEEE (2017, Oct)
19. Somula, R., Sasikala, R.: A research review on energy consumption of different frameworks in
mobile cloud computing. In: Innovations in Computer Science and Engineering, pp. 129–142.
Springer, Singapore (2019); Kumar, I.P., Sambangi, S., Somukoa, R., Nalluri, S., Govinda, K.:
Server security in cloud computing using block-chaining technique. In: Data Engineering and
Communication Technology, pp. 913–920. Springer, Singapore (2020)
20. Kumar, I.P., Gopal, V.H., Ramasubbareddy, S., Nalluri, S., Govinda, K.: Dominant color palette
extraction by K-means clustering algorithm and reconstruction of image. In: Data Engineering
and Communication Technology, pp. 921–929. Springer, Singapore (2020)
21. Nalluri, S., Saraswathi, R.V., Ramasubbareddy, S., Govinda, K., Swetha, E.: Chronic heart
disease prediction using data mining techniques. In: Data Engineering and Communication
Technology, pp. 903–912. Springer, Singapore (2020)
22. Krishna, A.V., Ramasubbareddy, S., Govinda, K.: Task scheduling based on hybrid algo-
rithm for cloud computing. In: International Conference on Intelligent Computing and Smart
Communication 2019, pp. 415–421. Springer, Singapore (2020)
23. Srinivas, T.A.S., Ramasubbareddy, S., Govinda, K., Manivannan, S.S.: Web image authenti-
cation using embedding invisible watermarking. In: International Conference on Intelligent
Computing and Smart Communication 2019, pp. 207–218. Springer, Singapore (2020)
24. Krishna, A.V., Ramasubbareddy, S., Govinda, K.: A unified platform for crisis mapping using
web enabled crowdsourcing powered by knowledge management. In: International Conference
on Intelligent Computing and Smart Communication 2019, pp. 195–205. Springer, Singapore
(2020)
25. Saraswathi, R.V., Nalluri, S., Ramasubbareddy, S., Govinda, K., Swetha, E.: Brilliant corp yield
prediction utilizing internet of things. In: Data Engineering and Communication Technology,
pp. 893–902. Springer, Singapore (2020)
26. Baliarsingh, S.K., Vipsita, S., Gandomi, A.H., Panda, A., Bakshi, S., Ramasubbareddy,
S.: Analysis of high-dimensional genomic data using mapreduce based probabilistic neural
network. Comput. Methods Progr. Biomed. 105625 (2020)

27. Lavanya, V., Ramasubbareddy, S., Govinda, K.: Fuzzy keyword matching using N-gram and
cryptographic approach over encrypted data in cloud. In: Embedded Systems and Artificial
Intelligence, pp. 551–558. Springer, Singapore (2020)
28. Revathi, A., Kalyani, D., Ramasubbareddy, S., Govinda, K.: Critical review on course recom-
mendation system with various similarities. In: Embedded Systems and Artificial Intelligence,
pp. 843–852. Springer, Singapore (2020)
29. Mahesh, B., Kumar, K.P., Ramasubbareddy, S., Swetha, E.: A review on data deduplication
techniques in cloud. In: Embedded Systems and Artificial Intelligence, pp. 825–833. Springer,
Singapore (2020)
30. Sathish, K., Ramasubbareddy, S., Govinda, K.: Detection and localization of multiple objects
using VGGNet and single shot detection. In: Emerging Research in Data Engineering Systems
and Computer Communications, pp. 427–439. Springer, Singapore (2020)
31. Pradeepthi, C., Geetha, V.V., Ramasubbareddy, S., Govinda, K.: Prediction of real estate price
using clustering techniques. In: Emerging Research in Data Engineering Systems and Computer
Communications, pp. 281–289. Springer, Singapore (2020)
32. Maddila, S., Ramasubbareddy, S., Govinda, K.: Crime and fraud detection using clustering
techniques. In: Innovations in Computer Science and Engineering, pp. 135–143. Springer,
Singapore (2020)
33. Rakshitha, K., Rao, A.S., Sagar, Y., Ramasubbareddy, S.: Demonstrating broadcast aggregate
keys for data sharing in cloud. In: Innovations in Computer Science and Engineering, pp. 185–
193. Springer, Singapore (2020)
34. Ramasubbareddy, S., Srinivas, T.A.S., Govinda, K., Manivannan, S.S.: Comparative study
of clustering techniques in market segmentation. In: Innovations in Computer Science and
Engineering, pp. 117–125. Springer, Singapore (2020)
35. Ramasubbareddy, S., Srinivas, T.A.S., Govinda, K., Manivannan, S.S.: Crime prediction
system. In: Innovations in Computer Science and Engineering, pp. 127–134. Springer,
Singapore (2020)
36. Sahoo, K.S., Tiwary, M., Sahoo, S., Nambiar, R., Sahoo, B., Dash, R.: A learning automata-
based DDoS attack defense mechanism in software defined networks. In: Proceedings of the
24th Annual International Conference on Mobile Computing and Networking, pp. 795–797
(2018, Oct)
37. Sahoo, K.S., Sahoo, S., Sarkar, A., Sahoo, B., Dash, R.: On the placement of controllers for
designing a wide area software defined networks. In: TENCON 2017–2017 IEEE Region 10
Conference, pp. 3123–3128. IEEE (2017, Nov)
Cross-Domain Variational Capsules for
Information Extraction

Akash Nagaraj, K. Akhil, Akshay Venkatesh, and H. R. Srikanth

Abstract In this paper, we present a characteristic extraction algorithm and the


Multi-domain Image Characteristics Dataset of characteristic-tagged images to sim-
ulate the way a human brain classifies cross-domain information and generates
insight. The intent was to identify prominent characteristics in data and use this
identification mechanism to auto-generate insight from data in other unseen domains.
An information extraction algorithm is proposed which is a combination of Varia-
tional Autoencoders (VAEs) and Capsule Networks. Capsule Networks are used to
decompose images into their individual features and VAEs are used to explore vari-
ations on these decomposed features. Thus, making the model robust in recognizing
characteristics from variations of the data. A noteworthy point is that the algorithm
uses efficient hierarchical decoding of data which helps in richer output interpreta-
tion. Noticing a dearth in the number of datasets that contain visible characteristics
in images belonging to various domains, the Multi-domain Image Characteristics
Dataset was created and made publicly available. It consists of thousands of images
across three domains. This dataset was created with the intent of introducing a new
benchmark for fine-grained characteristic recognition tasks in the future.

Keywords Machine reasoning · Image information · Capsule networks ·


Variational autoencoders · Hierarchical decoding.

A. Nagaraj · K. Akhil · A. Venkatesh (B) · H. R. Srikanth


Department of Computer Science, PES University, Bengaluru, India
e-mail: akshay.venkatesh24@gmail.com
A. Nagaraj
e-mail: akashn1897@gmail.com
K. Akhil
e-mail: akhilkred@gmail.com
H. R. Srikanth
e-mail: srikanthhr@pes.edu


1 Introduction

The machine reasoning domain [1], a part of the machine learning umbrella, deals
with extracting information from latent data, decoding it and reasoning out the deci-
sions made by machine learning systems. Machine reasoning is a two-step process: the generation of information and the generation of reasoning from this information.
We extract information by training the model on a few domains and testing the
model on a new domain. In doing so, the model discovers information from the new
domain. Though this might not seem like machine reasoning in the truest sense, it does
generate information from latent data. With this paper, we aim to solve a small prob-
lem in this vast domain: Simulate the way a human brain classifies cross-domain
information and generates insight, by identifying prominent characteristics in
data and using this identification mechanism to auto-generate insight from data
in unseen domains.
A part of machine reasoning is transfer learning [2]. It stores the knowledge gained
from tackling one problem and applies it to another problem which is related to the
previous problem solved. Our model incorporates transfer learning to transfer latent
information across domains, known as Domain Adaptation [3].

1.1 Domain Adaptation

Domain adaptation is a field that deals with machine learning as well as transfer
learning. Domain Adaptation can be used when the goal is to learn from one source
distribution and apply the learning to a different target distribution related to the
source. Scenarios in which multiple source distributions are present are called multi-source domain adaptation. Research being done in this field addresses a major
issue—the need to determine a model’s capacity to accurately accept data from a
given target domain and label that data accordingly. The challenge arises because
the model is trained on a different source domain. Unsupervised learning algorithms
[4] that are implemented without using domain adaptation assume that the examples
are independent and identically distributed.

2 Dataset

2.1 Introduction

The dataset introduced in this paper, the Multi-domain Image Characteristic Dataset
[5], consists of thousands of images sourced from the internet. Each image falls
under one of three domains—animals, birds or furniture. There are five types under
each domain. There are 200 images of each type, summing up the total dataset to
3000 images. The master file consists of two columns; the image name and the
visible characteristics in that image. Every image was manually analysed and the
characteristics for each image were generated, ensuring accuracy.
Images falling under the same domain have a similar set of characteristics. For
example, pictures under the Birds domain will have a common set of characteristics
such as the color of the bird, the presence of a beak, wing, eye, legs, etc. Care has been
taken to ensure that each image is as unique as possible by including pictures that
have different combinations of visible characteristics present. This includes pictures
having variations in the capture angle, etc.

2.2 Why Is Our Dataset Required?

At the time of our research, there was a dearth of publicly available datasets that
contain visible characteristics in images belonging to various domains. The proposed
dataset [5] addresses this, as it has the following features:
• describes visible characteristics present in every picture.
• contains at least hundreds of pictures belonging to multiple domains, and also
contains multiple types within each domain. This is crucial to train our model
accurately.
• contains unique pictures belonging to a type that fall under a certain domain. This
is accomplished by collecting pictures that have different combinations of visible
characteristics, different angles in which the object was captured, etc.

2.3 Training and Testing

We recommend a test-train split of 600 samples (20%) and 2400 samples (80%). A
.txt file with the images to be included in the test and train splits is included, with
no overlap between the sets. Following the train-test split as mentioned would help
ensure consistency of experiments reported on the Multi-domain Image Character-
istics Dataset.
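As a minimal illustration of how the recommended split can be loaded, the sketch below reads the master file and the split lists with pandas. The file names master.csv, train_split.txt and test_split.txt are hypothetical placeholders for the files shipped with the dataset [5], and the master file is assumed to carry the two columns described above with no header row.

import pandas as pd

# Hypothetical file names; substitute the actual files distributed with the dataset [5].
# The master file is assumed to have two columns (image name, visible characteristics) and no header.
master = pd.read_csv("master.csv", names=["image", "characteristics"])

def read_split(path):
    # Each split file is assumed to list one image name per line.
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

train_images = read_split("train_split.txt")   # ~2400 samples (80%)
test_images = read_split("test_split.txt")     # ~600 samples (20%)
assert train_images.isdisjoint(test_images)    # no overlap between the sets

train_df = master[master["image"].isin(train_images)]
test_df = master[master["image"].isin(test_images)]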

3 Approach

3.1 Variational Capsules

Variational capsules are a combination of capsule networks [6] and variational


autoencoders [7]. The capsules generated from capsule networks follow a known
prior distribution, and new capsules can be sampled from each of them. They are a

natural fit for the model presented in this paper, as they provide a rich representation
of image data and are robust to tiny variations in the decoupled features of the image.

3.2 Cross Domain Variational Capsules

Cross-Domain Variational Capsules are an enhancement to Variational Capsules


introduced in the previous subsection. After the latent representation of Variational
Capsules is generated for the input image data, this representation is fed to the Infor-
mation Decoder. The Information Decoder performs the hierarchical decoding of the
rich latent information available from the capsules. In comparison with traditional
decoders, our decoder preserves the hierarchical relationship—constructed by the
capsules—between features in the data. It leverages the depth of information (in the
form of a vector) available for each feature to construct a multi-hot vector identifying
the important characteristics from a vocabulary of words spanning all the domains
in scope.
The representation can also be leveraged to store cross-domain information to
perform information extraction across them. The Cross-domain Variational Cap-
sule model is divided into two parts: Creating the latent representation (Variational
Autoencoders and Capsule Networks) and Generating insights from that represen-
tation (a tailor-made deep network is used for this). A high-level overview of the
model can be seen in Fig. 1.

3.3 The Model

Let w[lower, higher] be a matrix where (lower, higher) are the dimensions of lower-
level and higher-level capsules respectively. The depth of the vector (dimensions)
is achieved by stacking m feature maps together. The vector output of the 32 lower
capsules is sent to all the higher-level capsules.

Fig. 1 Model design



Essentially, from the squash function it can be inferred that a lower-level capsule sends information mainly to the higher-level capsule whose centroid is closest to it, as routing reinforces this connection. This enforces a level of agreement or disagreement between the capsules in different layers. The squash function is:

v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \frac{s_j}{\|s_j\|}    (1)

3.4 Learning Algorithm

A prediction vector ûi/j is the prediction from the capsule i to the output of the capsule
j. If the activity vector vj is in close agreement with the prediction vector ûi/j , we
strengthen the connection bij . This is the Routing algorithm introduced in capsule
networks. “Agreement” coefficient:

a_{ij} = \langle \hat{u}_{i/j}, v_j \rangle    (2)

The Routing algorithm works on inner epochs/iterations which specify the number
of times it needs to be run. This is a hyper-parameter to the capsule network model.
An epoch starts with bij = 0 for all capsules i in the lower level and corresponding
connection capsules j in the higher level.
A normalization function is added to bij . We define

c_{ij} = \mathrm{softmax}(b_{ij})    (3)

An agreement weighted sum is calculated,



s_j = \sum_{i} c_{ij}\, \hat{u}_{j/i}    (4)

After squashing this sum, we get

v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2}\, \hat{s}_j    (5)

Finally, we update the weight of the connection

b_{ij} \leftarrow b_{ij} + \hat{u}_{j/i} \cdot v_j    (6)

This process is performed for every pair of consecutive capsule layers.
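To make Eqs. (1) and (3)–(6) concrete, a minimal NumPy sketch of one routing pass is given below. The capsule counts, vector dimension and number of inner iterations are illustrative placeholders, and the prediction vectors u_hat are assumed to have already been computed from the lower-level capsule outputs and the weight matrix w[lower, higher].

import numpy as np

def squash(s, axis=-1, eps=1e-9):
    # Eq. (1): v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||)
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * (s / np.sqrt(sq_norm + eps))

def route(u_hat, inner_iters=3):
    # u_hat[i, j, :] is the prediction from lower-level capsule i to higher-level capsule j.
    n_lower, n_higher, _ = u_hat.shape
    b = np.zeros((n_lower, n_higher))                          # connections start at zero
    for _ in range(inner_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)   # Eq. (3): softmax over higher capsules
        s = np.einsum("ij,ijd->jd", c, u_hat)                  # Eq. (4): agreement-weighted sum
        v = squash(s)                                          # Eq. (5)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)              # Eq. (6): reinforce agreeing connections
    return v

# Toy example: 32 lower-level capsules, 5 higher-level capsules, 8-dimensional outputs.
v_out = route(np.random.randn(32, 5, 8))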



3.5 Losses

The total loss is defined as

TL = Marginal Loss + α · Capsule Loss + β · KL Divergence Loss    (7)

where α and β are weighting constants.


It is important to note that the Reconstruction Loss is not relevant for our model; however, for capsule training purposes, we chose to keep it.

3.5.1 Capsule Loss

The capsule loss Lc for each capsule is

L_c = T_c \max(0, m^+ - \|v_c\|)^2 + \lambda\,(1 - T_c)\,\max(0, \|v_c\| - m^-)^2    (8)

where T_c is 1 if an object of class c is present (if a relevant object is present, the capsule agrees with the lower-level capsule), m^+ is the threshold for \|v_c\| when T_c = 1, m^- is the threshold for \|v_c\| when T_c = 0, and λ is a learning hyper-parameter (the negative-sample loss rate).
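A minimal NumPy sketch of Eq. (8) follows; the default margins m+ = 0.9, m− = 0.1 and λ = 0.5 are the values commonly used for capsule networks and are assumptions here, not values reported in this paper.

import numpy as np

def capsule_loss(v_norms, targets, m_plus=0.9, m_minus=0.1, lam=0.5):
    # Eq. (8): v_norms[c] = ||v_c||, targets[c] = T_c (1 if class c is present, else 0).
    present = targets * np.maximum(0.0, m_plus - v_norms) ** 2
    absent = lam * (1.0 - targets) * np.maximum(0.0, v_norms - m_minus) ** 2
    return float(np.sum(present + absent))

# Toy example with three classes, the second of which is present.
loss = capsule_loss(np.array([0.2, 0.95, 0.4]), np.array([0.0, 1.0, 0.0]))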

3.5.2 Marginal Loss/Hinge Loss

The Hinge loss is:


L_M = \max(0, 1 - t \cdot y)    (9)

where t is the target and y is the output.

3.5.3 KL Divergence Loss

Following [8], let Z be a latent variable, X the real data distribution, Q the encoder (recognition) network, P the decoder (generative) network, and E the expectation.

\log P(X) - D_{KL}(Q(Z|X)\,\|\,P(Z|X)) = E[\log P(X|Z)] - D_{KL}(Q(Z|X)\,\|\,P(Z))    (10)
Equation (10) is the variational autoencoder objective function. The left-hand side is a lower bound on log P(X), which describes our data; the KL divergence term is the error that lowers this bound. The maximum likelihood estimate (MLE) [9] can be approached by maximizing log P(X|Z) and minimizing the difference between the true latent distribution P(Z) and the simple Gaussian distribution Q(Z|X).

Variational autoencoders deal with constructing the underlying distribution of the prior. To achieve this, they use the reparameterization trick to reconstruct the distribution from the trained μ and log(σ²) of the prior. The log variance is used instead of the true variance (σ²) as it is less volatile and more numerically stable.
D_KL is to be minimized so that Q(Z|X) approaches the prior P(Z) = N(0, I). Let Q(Z|X) be Gaussian with parameters μ(x) and Σ(x); these are the trainable capsules' mean and (log) variance. The D_KL between these two distributions is computed in closed form.

D_{KL}[\mathcal{N}(\mu(x), \Sigma(x))\,\|\,\mathcal{N}(0, I)] = \tfrac{1}{2}\big(\mathrm{trace}(\Sigma(x)) + \mu(x)^{T}\mu(x) - k - \log\det(\Sigma(x))\big)    (11)

where k is the dimension of the Gaussian distribution, trace(Σ(x)) is the trace (the sum of the diagonal entries of Σ(x)) and det(Σ(x)) is the determinant of Σ(x).

L_{KL} = \tfrac{1}{2}\sum_{k}\big(\sigma_k^{2}(X) + \mu_k^{2}(X) - 1 - \log \sigma_k^{2}(X)\big)    (12)
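For completeness, a small NumPy sketch of the closed-form KL term of Eq. (12) is given below, assuming a diagonal Gaussian posterior parameterized by the trained mean and log-variance vectors.

import numpy as np

def kl_divergence(mu, log_var):
    # Eq. (12): 0.5 * sum_k (sigma_k^2 + mu_k^2 - 1 - log sigma_k^2), with sigma^2 = exp(log_var).
    return 0.5 * float(np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var))

# Toy example for a single 16-dimensional capsule.
kl = kl_divergence(mu=np.full(16, 0.1), log_var=np.full(16, -0.2))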

4 Experiments and Results

4.1 Model Evaluation

4.1.1 Metrics

The model’s objective dictates that it is tolerant of noisy characteristics but not
with missing ones. Due to this unequal weightage given to false positives and false
negatives, accuracy is a poor evaluation metric. Hence, the model uses recall and
precision instead. To achieve the objective, the recall must be high, while the precision
could be low.
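Since the model emits a multi-hot characteristic vector, recall and precision can be computed directly from true/false positive and negative counts; the sketch below is a minimal version, and the 0.5 decision threshold is an assumption rather than the value used in the experiments.

import numpy as np

def recall_precision(y_true, y_prob, threshold=0.5):
    # y_true and y_prob are (num_samples, vocab_size) multi-hot / probability arrays.
    y_pred = (y_prob >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # high recall: few missed characteristics
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # low precision is tolerated (noisy extras)
    return recall, precision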

4.1.2 Evaluation

To evaluate the performance of the Cross-domain Variational Capsule model, we


used the Multi-domain Image Characteristics Dataset. We have trained the model
on 3 domains: Animals, Birds and Furniture. To test the model, we used cross-
validation with a 20–80 test-train split. A simple end-to-end supervised training of
image versus characteristic gave poor results. We also made sure that capsules were
trained sufficiently to accurately generate the rich vector representation for each
class. Hence, the model is trained on two levels:
• The Variational Capsule setup is a typical Capsule Network with output capsules
duplicated to be the mean and variances for each capsule. It is trained with the
image as the input and the classification as output. This setup uses a modified
Capsule Routing algorithm to train both sections simultaneously.

• The Information Decoder is a hierarchical neural network (where the nodes in a


layer are connected to only its parent in the previous layer). It is trained with the
image as input and its corresponding characteristics as output.

4.2 Results

The results obtained by our algorithm on the Multi-domain Image Characteristic Dataset are shown in Table 1. The value of precision is low while the value of recall is high, because recall depicts the capability of the model to identify relevant characteristics, while precision depicts the proportion of the identified characteristics that are actually correct.
Although the accuracy of the model as a whole is quite low (at about 18%), considering precision and recall shows that the model can successfully identify characteristics in image data. A point worth noting: the F1-score is a metric that finds the balance between precision and recall, and it was not a relevant metric to consider in our case, as all the classes had an equal number of data points.
A sample output is seen in Fig. 2, showing the probabilities of the characteristics
identified in the sample image of a dog from the proposed dataset [5].

Table 1 Model results


Metric Value
Recall 0.7666
Precision 0.0024

Fig. 2 Sample output: characteristic identification from a sample image



5 Conclusion

A cross-domain information extraction algorithm using Variational Capsules that


learns to extract individual characteristics from image data is proposed. The aim
of this algorithm is not to improve an existing model but to satisfactorily solve the
relatively recent problem of identifying prominent characteristics of data.
This algorithm preserves the relationship developed between features in capsules,
using hierarchical decoding as opposed to fully-connected layers. It is also very data
efficient, working with a limited number of data points on multi-domain information
and is also robust to noise owing to the use of Variational Capsules. Our algorithm was
evaluated using the Multi-Domain Image Characteristics Dataset, confirming that
it successfully extracts characteristics (or information in general) from image data.
The algorithm can also work on any form of data supported by capsules. Potential
applications of our algorithm are numerous as information extraction is used in a
wide number of fields. Image characteristic extraction is also very versatile and is
used in a plethora of fields ranging from autonomous driving to astronomy.

5.1 Future Enhancements

Future enhancements include experimentation with different data formats (audio,


text, etc.) and characteristic recognition methods. Applying the above algorithm
to different data formats, and extracting characteristics from the data, we aim to
best represent the underlying characteristics of all formats of data. An additional
improvement would be to improve the efficiency and speed of the proposed algorithm,
drawing inspiration from similar real-time approaches [10].

References

1. Bottou, L.: From machine learning to machine reasoning. Mach. Learn. 94(2), 133–149 (2014)
2. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10),
1345–1359 (2009)
3. Sun, B., Feng, J., Saenko, K.: Return of frustratingly easy domain adaptation. In: Thirtieth
AAAI Conference on Artificial Intelligence (2016)
4. Barlow, H.B.: Unsupervised learning. Neural Comput. 1(3), 295–311 (1989)
5. Nagaraj, A.K.A., Venkatesh, A.: Multi-domain Image Characteristic Dataset. https://www.kaggle.com/grassknoted/multidomain-image-characteristics-dataset (2020)
6. Sabour, S., Frosst, N., Hinton, G.E.: Dynamic routing between capsules. In: Advances in Neural
Information Processing Systems, pp. 3856–3866 (2017)
7. Doersch, C.: Tutorial on variational autoencoders. Stat 1050, 13 (2016)
8. Hershey, J.R., Olsen, P.A.: Approximating the kullback Leibler divergence between gaussian
mixture models. In: 2007 IEEE International Conference on Acoustics, Speech and Signal
Processing—ICASSP’07, vol. 4, pp. IV–317. IEEE (2007)

9. Myung, I.J.: Tutorial on maximum likelihood estimation. J. Math. Psychol. 47(1), 90–100
(2003)
10. Nagaraj, A., Sood, M., Srinivasa, G.: Real-time automated answer scoring. In: 2018 IEEE 18th
International Conference on Advanced Learning Technologies (ICALT), pp. 231–232. IEEE
(2018)
Automotive Accident Severity Prediction
Using Machine Learning

Niva Mohapatra, Shreyanshi Singh, Bhabendu Kumar Mohanta,


and Debasish Jena

Abstract Prediction of the automotive accident severity plays a very crucial role in
the smart transportation system. The main motive behind our research is to find out
the specific features which could affect the vehicle accident severity. In this paper,
some of the classification models, specifically logistic regression, artificial neural
network, decision tree, k-nearest neighbors and random forest, have been imple-
mented for predicting the accident severity. All the models have been verified, and
the experimental results prove that these classification models have attained consid-
erable accuracy. The results of this research can be used in the smart transportation
system to predict if the road accident will be slight, severe or fatal, in accordance
with the top three features as predicted from the machine learning model.

Keywords Logistic regression · Artificial neural network · Decision tree · K-nearest neighbors · Random forest · Machine learning

1 Introduction

Road accidents are an increasing cause of concern in today’s world. These accidents
result in injuries, damage to properties and even death. These accidents also cause
heavy monetary losses. Many researchers have tried to examine the significant fea-
tures that can affect the automotive accident severity [1, 2]. The main aim of this

N. Mohapatra (B) · S. Singh · B. Kumar Mohanta · D. Jena


Department of Computer Science and Engineering,
IIIT Bhubaneswar, Bhubaneswar, Odisha 751003, India
e-mail: b416062@iiit-bh.ac.in
S. Singh
e-mail: b416046@iiit-bh.ac.in
B. Kumar Mohanta
e-mail: C116004@iiit-bh.ac.in
D. Jena
e-mail: debasish@iiit-bh.ac.in

research project is the enhancement of the safety of people by extracting particular


features that determine how severe an accident will be. The authors in [3] proposed
“Magtrack” for detecting the road surface condition using Machine learning. Clas-
sification based on the road accidents dataset is performed using machine learning
methodologies. Based on the prediction, people can be made aware about the sever-
ity of the impending accident by notifying them through text messages. Applying
Machine Learning (ML) methodologies on the available dataset can help to under-
stand the features that have a critical role to play in affecting the severity of accidents
and help prevent such accidents in the future.

1.1 Organization of the Paper

Rest of the sections are arranged as follows. In Sect. 2, some of the related works on
the prediction of crash severity are described briefly. The proposed model is presented
in Sect. 3. It is then followed by implementation of the model and results analysis in
Sect. 4. Further, our paper sums up with a conclusion and any possible future work
that can be extended from our research in Sect. 5.

2 Literature Review

Predicting the severity of road accidents has been a major challenge globally. Iranitalab et al. [4] showed that neglecting crash costs would result in miscalculations while selecting the correct prediction algorithm. They developed a crash cost-based approach to compare accident severity prediction models and investigated various clustering algorithms. Alkheder et al. [5] proposed an artificial neural
network algorithm, which would predict the severity of injuries in the road accidents.
For better accuracy of the ANN classifier, the datasets were then split into three clusters using the K-means Clustering (KC) algorithm. The outcomes after clustering revealed a remarkable enhancement in the accuracy of the ANN classifier.
Zong et al. [6] compared two ML modeling algorithms, Bayesian network as well
as regression models and concluded that the Bayesian network is more efficient than
regression models for predicting accident severity. Hashmienejad et al. [7] devised a multi-objective Genetic Algorithm (GA) for optimizing and identifying rules in accordance with the metrics of confidence, comprehensibility and support. Kunt
et al. [8] predicted the accident severity by implementing twelve crash related fea-
tures in GA, pattern search and ANN algorithms. They concluded that the ANN
algorithm obtained the highest R-value, leading to the result that ANN provided the
best prediction. The security and privacy issues of the Internet of Things (IoT) have been surveyed in detail in [9, 10], which note that machine learning could be used to address such security issues. Although most of the previous works presented the effects of
various classification models, there has been no specific contribution that can com-

pare the accuracy of five classification models taken together. Therefore, we have
collectively applied these models and further, accuracy of all the above mentioned
algorithms is compared, so that we can find the most efficient algorithm which can
predict the accident severity.

3 Proposed Architecture and Methodology

Here, in the architecture depicted in Fig. 1, there is a roadmap that demonstrates


the communication among various vehicles, roadside units (RSU), a base station and
a GPS module. All these are connected within a closed network which ensures that
only authenticated devices are allowed to communicate. The machine learning model
is now fitted into the vehicle. We have assumed automation in this model, which
implies that the vehicle sensors are providing proper signals to the ML algorithm.
Various ML algorithms are compared and the selected features are clustered using k-
means clustering. The most efficient ML model is trained which predicts the accident
severity. If the inputs are matched, the model will either predict 0 (slight), 1 (severe)
or 2 (fatal). The predicted accident severity will be informed to the driver beforehand
through SMS using the way2sms API.
The overall workflow is shown in the flowchart in Fig. 2. Firstly, data preprocessing takes place, which includes merging datasets, dropping rows and columns that contain null values, resampling unbalanced data by oversampling and undersampling to
avoid bias and creating dummies out of categorical variables and dropping variables
containing the same information. Standardization is done to transform input features
into comparable scale. Standardization is performed before PCA to prevent input

Fig. 1 Architecture of the accident severity prediction model



Fig. 2 Flowchart of the accident severity prediction model

features with higher or wider ranges illegitimately dominating over those with low
variance. K-Fold cross-validation is used to produce a less biased result. Here, ten
splits have been chosen. Random forest model [11] can handle datasets with higher
dimensionality. Decision tree algorithm is used since it is quite resistant to the out-
liers [12]. Artificial neural networks [13] algorithm is used as it needs less statistical
training. KNN algorithm classifies a new data point, based on the similarity between
new and available data points [14]. In our research, we have also used multinomial LR classification, which can deal with three or more classes [15]. The mean accuracy and standard deviation of these five classification models
are compared. The selected features are then clustered using k-means clustering.
The ML model with highest accuracy is trained while clustering, which predicts the
severity of the impending accident.
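A minimal scikit-learn sketch of this comparison step is shown below. The placeholder data generated by make_classification stands in for the preprocessed accident dataset, the MLPClassifier is used as a stand-in for the ANN, and all hyper-parameters are library defaults rather than the exact settings used in this work.

from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

# Placeholder three-class data standing in for the preprocessed accident-severity dataset.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),   # multinomial with the default solver
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "Random forest": RandomForestClassifier(random_state=0),
    "ANN (MLP)": MLPClassifier(max_iter=500, random_state=0),
}

cv = KFold(n_splits=10, shuffle=True, random_state=42)          # ten splits, as in the text
for name, model in models.items():
    # Standardization before PCA keeps wide-range features from dominating.
    pipe = make_pipeline(StandardScaler(), PCA(n_components=0.95), model)
    scores = cross_val_score(pipe, X, y, cv=cv)
    print(f"{name}: mean accuracy {scores.mean():.4f}, std {scores.std():.4f}")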

4 Implementation Details and Result Analysis

Here, the authors have used a 64-bit operating system, x64-based processor with
an installed memory (RAM) of 8.00 GB. The system has an Intel(R) Core(TM)
i5-8250U CPU. A laptop manufactured by HP (Hewlett-Packard) is used, where
Windows 10 OS is booted by default. Python (version 3.6.10) is the programming
language used in this project. The front-end/UI technology used here is Flask (version

Fig. 3 Screenshot of the comparison of various ML models

1.1.2). The integrated development environments (IDEs) used are Jupyter Notebook
and PyCharm. Way2sms API is used for sending mobile alerts to the drivers or
doctors of the nearby hospitals.
The comparison of all the five ML models, after implementing all the machine
learning classifiers, is summed up in Fig. 3. The best ML model as per our research
is the Artificial Neural Network (ANN) with a mean accuracy of 73.98%. We also
concluded that the three most important conditions which can affect the automotive
accident severity are the age of casualty, number of vehicles and casualty class-
pedestrian.

5 Conclusion and Future Work

Due to development of Information and Communications Technology (ICT) and


other emerging technology like Internet of Things (IoT), cloud computing, Artificial
Intelligence (AI) and machine learning, transportation systems are now referred to
as smart transportation. The number of vehicles has also rapidly increased, creating heavy traffic congestion and accidents on a daily basis. In this work, the authors have
used various machine learning models and concluded that Artificial Neural Network
(ANN) model has the highest mean accuracy of 72.94% and a standard deviation
of 2.71%. The authors then clustered the accident severity feature into three classes
(using K-Means Clustering) as per ANN classification model, which helped in pre-
dicting if the road accident is slight, severe or fatal. The paper also concluded that the
three most important conditions which can affect the automotive accident severity
are the age of casualty, number of vehicles and casualty class-pedestrian. Severity
prediction of road accidents is very useful in the smart transportation system. It is
high time that this is implemented on our roads to save people’s lives. It can alert the driver beforehand about the accident on his/her mobile phone using the
way2sms API, so that the driver will be careful while driving (Proactive approach).
In future, this work can be extended to alert the nearby hospital to the doctors about
the accident on their mobile phones using the same API, so that the hospital will take
immediate actions to save the victim (Reactive approach).

References

1. Chong, M., Abraham, A., Paprzycki, M.: Traffic accident data mining using machine learning
paradigms. In Fourth International Conference on Intelligent Systems Design and Applications
(ISDA’04), Hungary, pp. 415–420 (2004)
2. Chong, M.M., Abraham, A., Paprzycki, M.: Traffic accident analysis using decision trees and
neural networks. arXiv preprint cs/0405050 (2004)
3. Dey, M.R., Satapathy, U., Bhanse, P., Mohanta, B.K., Jena, D.: MagTrack: detecting road
surface condition using smartphone sensors and machine learning. In: TENCON 2019—2019
IEEE Region 10 Conference (TENCON), pp. 2485–2489. IEEE (2019)
4. Iranitalab, A., Khattak, A.: Comparison of four statistical and machine learning methods for
crash severity prediction. Accid. Anal. Prev. 108, 27–36 (2017)
5. Alkheder, S., Taamneh, M., Taamneh, S.: Severity prediction of traffic accident using an arti-
ficial neural network. J. Forecast. 36(1), 100–108 (2017)
6. Zong, F., Xu, H., Zhang, H.: Prediction for traffic accident severity: comparing the Bayesian
network and regression models. Math. Probl. Eng. 2013 (2013)
7. Hashmienejad, S.H.A., Hasheminejad, S.M.H.: Traffic accident severity prediction using a
novel multi-objective genetic algorithm. Int. J. Crashworth. 22(4), 425–440 (2017)
8. Kunt, M.M., Aghayan, I., Noii, N.: Prediction for traffic accident severity: comparing the
artificial neural network, genetic algorithm, combined genetic algorithm and pattern search
methods. Transport 26(4), 353–366 (2011)
9. Mohanta, B.K., Jena, D., Satapathy, U., Patnaik, S.: Survey on IoT security: challenges and
solution using machine learning. Artificial Intelligence and Blockchain Technology, Internet
of Things, p. 100227 (2020)
10. Mohanta, B.K., Satapathy, U., Jena, D.: Addressing security and computation challenges in
IoT using machine learning. In: Advances in Distributed Computing and Machine Learning,
pp. 67–74. Springer, Singapore (2020)
11. Mohapatra, N., Shreya, K., Chinmay, A.: Optimization of the random forest algorithm. In:
Advances in Data Science and Management, pp. 201–208. Springer, Singapore (2020)
12. Tanha, J., van Someren, M., Afsarmanesh, H.: Semi-supervised self-training for decision tree
classifiers. Int. J. Mach. Learn. Cybern. 8(1), 355–370 (2017)
13. Da Silva, I.N., Spatti, D.H., Flauzino, R.A., Liboni, L.H.B., dos Reis Alves, S.F.: Artificial
neural networks, p. 39. Springer, Cham (2017)
14. Yu, B., Song, X., Guan, F., Yang, Z., Yao, B.: k-Nearest neighbor model for multiple-time-step
prediction of short-term traffic condition. J. Transp. Eng. 142(6), 04016018 (2016)
15. Yin, M., Zeng, D., Gao, J., Wu, Z., Xie, S.: Robust multinomial logistic regression based on
RPCA. IEEE J. Sel. Top. Signal Process. 12(6), 1144–1154 (2018)
Analysis of Quality of Experience (QoE)
in Video Streaming Over Wi-Fi in Real
Time

M. Vijayalakshmi and Linganagouda Kulkarni

Abstract Over the years, in wireless and mobile networks, video traffic is becoming
more dominant. In order to assess the users’ satisfaction of the services, a measure
has to be considered which depicts the delight or annoyance of the users’ experience
with the services. Quality of Experience is one such measure which focuses on the
experience of the users with the services delivered, unlike quality of service (QoS)
which focuses on the media or network itself. In addition to video transmission,
Quality of Experience introduces a user experience-driven strategy that focuses on
the contextual and human factors. This is helpful because it expresses user experi-
ence both objectively and subjectively. Hence, in order to enhance viewers’ experience, measuring the Quality of Experience of the services along with network and system factors proves to be beneficial. We aim to analyze the Quality of Experience of users in the university. The data gives insight into the various parameters that affect trans-
mission of video or any data in that regard. The quality of the transferred videos is
assessed by the end users by rating their experience. We aim to provide objective
and subjective measure of Quality of Experience by analyzing the factors affecting
Quality of Experience and the users’ experience, respectively.

Keywords Mean opinion score (MOS) · Quality of experience (QoE) · Quality of service (QoS)

1 Introduction

User satisfaction is important for any service provider, since it is decisive in deter-
mining the success of the service. Hence, quality of service based on user’s perception
plays an important role. Quality of Experience (QoE) is one such measure that reflects

M. Vijayalakshmi (B) · L. Kulkarni


KLE Technological University, Hubli, India
e-mail: viju11@kletech.ac.in
L. Kulkarni
e-mail: linganagouda@yahoo.ac.uk


this user’s perception. It shows how satisfied the customer is with a certain service and represents how well the service fulfills the user’s expectations [1].
In video streaming and related applications, user viewing experience plays a major
role in determining whether the user wants to repeat the services or discard it forever.
A user will continue to avail the services of the same network provider depending
on the experience of the services offered, be it be video buffering and loading time,
or the quality of transmission. With the increasing demand of multimedia services
such as video transmission, there is a need to develop performance-based evaluation
metrics to evaluate video services/applications.
Although there are many video quality metrics, such as peak signal-to-noise ratio (PSNR), jitter and bandwidth, that objectively measure the quality of video between the clients, users’ views are not considered in such evaluation; hence, they are incapable of representing the true experience of users. Quality of Experience
(QoE) is a user centric quality strategy that overcomes the shortcomings of the above
quality metrics. QoE is the degree of satisfaction or dissatisfaction of the user with an
application or service. There are various factors which drive the Quality of Experience
(QoE) for video consumption which in turn plays a key role in the perception of
quality of the service.
The paper is aimed to analyze the various factors affecting the quality of video
transmission over Wi-Fi of the university and thereby analyze the Quality of Expe-
rience (QoE) of the users with respect to these videos sent over the network. The
quality of the videos is assessed subjectively by the collective ratings given by the
users. These ratings constitute mean opinion score (MOS). MOS in this context is
a numerical measure of the human—judged overall quality of an experience. The
rating scale ranges from 1 to 5 with 1 indicating bad and 5 indicating excellent expe-
rience. The quality of the video is evaluated objectively by objective video quality
models, where several independent variables such as bit rate, length, PSNR are fit
against the results obtained in a subjective quality evaluation using regression tech-
niques. Finally, the objectively predicted values are compared with subjective scores
available as mean opinion score (MOS).

1.1 Motivation for Analysis of QOE

One of the most popular online services today is video streaming. It occupies more than 77% of all consumer Internet traffic [2], as per the Cisco Visual Networking Index.
Users demand high Quality of Experience (QoE) while using these video services
on wireless networks, such as Wi-Fi. This poses a challenge for network administrators in environments such as university campuses, and also for service providers. Guaranteeing the best possible QoE becomes consequential. This leads to challenges
in optimizing network resources and also providing better experience to end-users.
Hence, QoE becomes a key metric for both network providers and end-users.

2 Design for Analysis of QOE

2.1 Measurement Methodology

Setup In Fig. 1, we show the proposed framework for the analysis of QoE. The anal-
ysis of QoE of videos is done over university Wi-Fi network. The network condition
at a place depends on the health of the Access Point the user is connected to. For our analysis, places on the campus corresponding to three network conditions, i.e., good, medium and poor, based on the performance and health of the Access Points, are selected for the transfer of the videos under study. The Access Points (APs) are Aruba AP-135 [3]. Videos are sent from the sender (Client 1) to the receiver (Client 2) at the selected places on the campus. Data and the changes during the transfer of the videos are extracted
from the Aruba controller software, [4] where MAC address and IP address identify
the devices of transfer. FFmpeg [5] tool is used to extract the required video charac-
teristics of the received videos. FFmpeg is a video framework with a large collection of coding libraries. It is also used to calculate the PSNR of the received videos. PSNR is
a video quality metric or performance indicator. Receivers rate their experience and
MOS from all the receivers is tabulated and compared with MOS obtained from the
QoE metrics.

Test videos A total of 22 videos are transferred from Client 1 to Client 2 at different
network conditions. The videos are of variable length, resolution and in MP4 format.

Fig. 1 Overview of the proposed system



2.2 QoE Metrics

PBR It is the number of bits per second. It determines the size and quality of the video; the higher the bit rate, the better the quality. Hence, a higher bit rate may provide excellent video quality.

Dropped frames When the connection to the server is unstable, or when problems such as random disconnections occur due to firewall/anti-virus/security software, routers, etc., frames are dropped. Some of the video frames are dropped in order to lower the traffic, which may even lead to disconnection from the streaming server. Due to congestion during the transfer of videos, the dropped frames are resent, and these constitute the retried frames.

PSNR Peak signal-to-noise ratio is the ratio between the maximum power of signal
and the power of corrupting noise. Logarithmic decibel scale is used to express
PSNR, as many signals have a wide dynamic range. PSNR is used in detecting the
presence of dropped frames and the location of dropped frames in a video.
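As an illustration of how PSNR can be obtained in practice, the sketch below wraps FFmpeg's psnr filter in a small Python call; the file names are placeholders, and the average PSNR is reported by FFmpeg on its console output while per-frame values go to the stats file.

import subprocess

def compute_psnr(received, reference, stats_file="psnr.log"):
    # Compares the received (degraded) video against the reference (sent) video.
    cmd = [
        "ffmpeg", "-i", received, "-i", reference,
        "-lavfi", f"psnr=stats_file={stats_file}",
        "-f", "null", "-",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stderr   # the summary line containing the average PSNR is printed here

print(compute_psnr("received_video.mp4", "sent_video.mp4"))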

3 Related Work

Former works [6] have introduced a machine learning technique that explains the QoE
ground-truth, for all the video applications. In contrast to the above work, we focus
on analyzing the factors affecting the QoE for video streaming over university Wi-Fi
network and providing a comparison between subjective and objective MOS which
depicts the Quality of Experience of the users. Some works [7] show the analysis of
video streaming over mobile network by considering MOS metrics. Models proposed
were a combination of clustering and logistic regression methods on a realistic data
set. Compared to this work, our model proposes to use different models such as
random forest, Ridge, Linear, and Lasso regression for analysis on a Wi-Fi network
in real time over a university scenario. The authors of [8] proposed an analysis of QoE for video streaming by considering MOS metrics. Their model is a combination of the K-means clustering method and the logistic regression method; experiments were conducted on realistic datasets and achieved precisions of 96.94, 97.13, and 97.54%
on dataset 1, dataset 2, and dataset 3. Authors [9] have developed a model based on
Markov chains for user experience using adaptive streaming in dynamic environment
and buffer-based DASH clients for switching frequency. The Authors [10] proposed
an SDN control plane approach to multimedia transmission problem, employing
video encoding based on latest standard. In paper [11], author explains about the
video streaming importance in the Wi-Fi environment and how helpful it will be to
stream video in Wi-Fi condition.

4 Data Analysis

4.1 User Study

The dataset comprises 22 videos of variable length, resolution, codec and size, all in MP4 format. These videos are sent over the Wi-Fi network and received by the receiver (Client 2). The receiver is then asked to rate their experience based on
the quality of the received videos on a scale of 1-5. All the ratings are collected and
this constitutes the subjective MOS. Subjective MOS gives the user’s perception of
video quality.

4.2 Objective and Subjective MOS

Subjective MOS is taken from the users’ ratings, while objective MOS is calculated from the characteristics of the received video and different parameters depicting the network conditions. Train and test data are split in the ratio 7:3. QoE metrics, subjective MOS and different characteristics are considered as the input x, and objective MOS is predicted for the test data. Further, the subjective and objective MOS are calculated and compared (Fig. 2).
Machine learning techniques are used to predict the objective mean opinion score.
Different models like linear regression, Lasso regression, ridge regression, AdaBoost,
random forest have been implemented and their accuracy and mean absolute error
have been calculated. Linear regression performs the task to predict the value of
dependent variable (Objective MOS) based on a given independent variable (QoE
metrics). By applying this model, mean absolute error of 5.023 was obtained. To
reduce the over-fitting caused by simple linear regression and to reduce complexity, regularized techniques like ridge and Lasso regression are used. By applying ridge and Lasso regression, mean absolute errors of 0.867 and 0.327 were obtained, respectively.

Fig. 2 Architecture of the proposed system: the video at Client 1 is sent over Wi-Fi through the controller and received at Client 2; QoE metrics are extracted from the received video and the packet transmission data, and the analysis and prediction stage produces the MOS scores



Table 1 Learning models for prediction of objective MOS

S. No.  Model name         Mean absolute error
1       Linear regression  5.023
2       Lasso regression   0.327
3       Random forest      48.3 (accuracy)
4       Ridge regression   0.867

By applying the random forest model, an accuracy of 0.483 (48.3%) was obtained. The predicted MOS and the actual MOS have been compared (Table 1).
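A minimal scikit-learn sketch of the prediction step follows. The randomly generated X and y are placeholders standing in for the QoE metrics and subjective MOS of the 22 videos, the 7:3 split mirrors the text, and a random forest regressor is used here for uniformity although the text reports an accuracy for that model; the regularization strengths are library defaults rather than the exact values used.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Placeholder data standing in for the 22-video dataset (QoE metrics -> subjective MOS).
rng = np.random.default_rng(0)
X = rng.normal(size=(22, 5))      # e.g. bit rate, dropped frames, retried frames, PSNR, length
y = rng.uniform(1, 5, size=22)    # MOS ratings on the 1-5 scale

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "Linear regression": LinearRegression(),
    "Lasso regression": Lasso(),
    "Ridge regression": Ridge(),
    "Random forest": RandomForestRegressor(random_state=0),
}
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)   # predicted objective MOS
    print(name, "MAE:", mean_absolute_error(y_test, pred))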

5 Conclusion

Measuring the Quality of Experience plays a major role in determining users’ satis-
faction with the services. The videos sent and received over the network under different performance states of the access points (good, medium and low) show how the experience of the users is affected by these parameters. The compar-
ison between the subjective and objective mean opinion score (MOS) depicts the
users experience based on their perception of quality, and the quality of the received
videos based on the different parameters that contribute to it, respectively. This helps in understanding how users perceive the quality of the videos, as well as the different network and video parameters that determine the quality of the transferred videos.

6 Future Work

In future, this understanding becomes helpful in using different techniques such as


adaptation and optimization to enhance the experience of users in video streaming.
Furthermore, Quality of Experience of users can be analyzed for videos of different
formats.

References

1. Dai, Q.: A survey of quality of experience. In: Lehnert, R. (ed.) Energy-Aware Communications.
Springer, Berlin Heidelberg, pp. 146–156 (2011)
2. Pepper, R.: Cisco visual networking index (VNI) global mobile data traffic forecast update.
Tech. Rep. (2013)
3. Aruba ap-135. Available at http://content.etilize.com/user-manual/1023377357.pdf

4. Aruba controller software. Available at https://www.arubanetworks.com/products/networking/gateways-and-controllers/
5. Dasari, M., Sanadhya, S., Vlachou, C.: Ffmpeg. Available at https://www.ffmpeg.org/about.html
6. Kim, K.H., Das, S.R.: Scalable ground-truth annotation for video QOE modeling in enterprise
Wi-Fi. In: 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS).
IEEE, pp. 1–6 (2018)
7. Wang, Q., Dai, H., Wang, H., Wu, D.: Data-driven QOE analysis on video streaming in mobile
networks. In: 2017 IEEE International Symposium on Parallel and Distributed Processing
with Applications and 2017 IEEE International Conference on Ubiquitous Computing and
Communications (ISPA/IUCC), December 2017, pp. 1115–1121 (2017)
8. Wang, Q., Dai, H.-N., Wang, H., Wu, D.: Data-driven QoE analysis on video streaming in mobile
networks. In: 2017 IEEE International Symposium on Parallel and Distributed Processing
with Applications and 2017 IEEE International Conference on Ubiquitous Computing and
Communications (ISPA/IUCC), pp. 1115–1121. IEEE (2017)
9. Poojary, S., El-Azouzi, R., Altman, E., Sunny, A., Triki, I., Haddad, M., Jimenez, T., Valentin,
S., Tsilimantos, D.: Analysis of QoE for adaptive video streaming over wireless networks. In:
2018 16th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and
Wireless Networks (WiOpt), pp. 1–8. IEEE (2018)
10. Awobuluyi, O., Nightingale, J., Wang, Q., Alcaraz-Calero, J.M.: Video quality in 5G networks:
context-aware QoE management in the SDN control plane. In: 2015 IEEE International Confer-
ence on Computer and Information Technology; Ubiquitous Computing and Communica-
tions; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing,
pp. 1657–1662. IEEE (2015)
11. Zhu, X., Schierl, T., Wiegand, T., Girod, B.: Video multicast over wireless mesh networks with
scalable video coding (SVC). In: Visual Communications and Image Processing 2008, vol.
6822, p. 682205. International Society for Optics and Photonics (2008)
Self Driven UGV for Military
Requirements

Hrishikesh Vichore, Jaishankar Gurumurthi, Akhil Nair,


Mukesh Choudhary, and Leena Ladge

Abstract Soldiers of any nation are engaged in close combats against terrorists
where human life is at stake. Unmanned Ground Vehicles are used everywhere to
reduce human life loss as it may be impossible to have a human operator present
at the location. The vehicle will have a set of sensors to observe the environment
which mainly contains four cameras and a gun loaded on the top of the UGV acting
like a turret. The bot will have two modes of operation. For autonomous driving,
the accuracy of the self-driving model is about 74% and it will make decisions by
getting real-time feeds from the camera by using the Image Processing algorithm
with an accuracy of about 95%. For manual driving, a human operator will control
it from a remote control centre over the Internet, with security against one of the biggest threats in remote-controlled vehicles, i.e. the Man-in-the-Middle (MITM) attack.

Keywords Unmanned ground vehicle · Tensorflow · Convolutional neural network · Raspberry pi · Object detection · Carla

H. Vichore (B) · J. Gurumurthi · A. Nair · M. Choudhary · L. Ladge


SIES Graduate School of Technology, Navi Mumbai, India
e-mail: hrishikesh.vichore16@siesgst.ac.in
J. Gurumurthi
e-mail: jaishankar.gurumurthi15@siesgst.ac.in
A. Nair
e-mail: akhil.nair16@siesgst.ac.in
M. Choudhary
e-mail: mukesh.choudhary15@siesgst.ac.in
L. Ladge
e-mail: ladge.leena@siesgst.ac.in


1 Introduction

Carrying out military affairs requires a lot of manpower these days. Thus, the secu-
rity of life becomes the most important and prevailing question. To solve this a
concept known as Unmanned Ground Vehicle or in short UGVs was introduced. It
is a mechatronics robot that is used in place of humans to carry out life-threatening
tasks such as surveillance, disposal of bombs and shooting on the spot. Although it is a military robot, its application is not limited to defence systems only; it can also be used for domestic purposes such as a toy car, a cleaning bot or a payload-carrying bot. This idea was initially criticized, but lately it has gained much wider acceptance. Critical operations such as rescue missions are a prime use case for this technology. To reduce the latency and the decision time, the bot is
self-driven. With 74% of accuracy, the bot will be able to avoid obstacles and correct
its location which is very helpful in rescue operations. The vehicle will use various
components such as cameras, servo motor, stepper motors, geared brush-less DC
motors. The cameras will observe the environment and detect human location and
movements. The vehicle will be controlled autonomously; if needed, a user in a remote location will be able to override the decision of the autonomous bot. The vehicle would be armed with a weapon mounted on a turret which will auto-
matically change its direction and follow the target. A person remotely operating the
vehicle will be getting the live video feed from the camera and will be able to trigger
the weapon according to his own decision. Security is a major issue in this type of vehicle. Spoofing and Man-in-the-Middle (MITM) attacks are common attacks used in security breaches. To secure the vehicle, such attacks are averted using techniques such as two-way authentication protocols and Strict Transport Layer Security (STLS).

2 Literature Survey

In Thomas et al. [1], the design and implementation of an unmanned ground vehicle (UGV) for surveillance and bomb defusal is presented. This paper gives a general idea of surveillance, although the feedback is through a haptic glove system. Following this idea, Noor et al. [2] describe how a remote-operated multi-direction Unmanned Ground Vehicle (UGV) was designed and implemented. An XBee Pro module and a PIC micro-controller were used to achieve this. The 16-bit Microchip micro-controller used in the UGV’s system interfaces with the XBee Pro at a variable baud rate via the UART protocol and controls the direction of the wheels.
For securing the communication, Sandhya and Devi [3] provide countermeasures for thwarting the MITM attack. Along with the existing approaches, a new way was discussed in the paper; it includes the generation of security overheads and allocating a separate channel for the security.

Yong et al. [4] provide methods for performing MITM attacks along with defences; handicapping the attack was also analyzed. Quantum key distribution was used to distribute the cryptographic key. Correspondents had no way to verify each other’s key, which created problems and noticeably delayed communication.
To increase the speed of object detection while keeping the accuracy the same, Kanimozhi et al. [5] provide the solution of a lightweight model to reduce the computational overhead. For this purpose, MobileNet was used. Single Shot Multi-Box Detection was used to increase accuracy and identify household items in real time. The TensorFlow Object Detection API is also used in this process.
The insights of Anping Gu and Xu et al. [6] proved to be very helpful in making object detection very fast. In that paper, a vision-based real-time circular ground marker detection method, used to guide a small autonomously controlled robotic UAV (RUAV) to pick up a ground target object, is presented. The method is based on the RHT algorithm and runs entirely on the GPU via the Microsoft DirectX 11 API, since the on-board CPU does not have sufficient computing power.
To further improve the performance of object detection, Talukdar et al. [7] show that transfer learning through the use of synthetic images and pre-trained convolutional neural networks offers a promising approach to improve the object detection performance of deep neural networks. The object detection performance of various deep CNN architectures is also studied, with Faster-RCNN proving to be the most suitable choice, achieving the highest mAP of 70.67. Transfer learning was used to increase the accuracy of Google’s TensorFlow Object Detection API and extend it to synthesized data. For the self-driving car and its testing in the CARLA environment, the research of Dworak et al. [8] proved very resourceful; although that paper uses LiDAR for object detection, an RGB camera sensor observing the environment can also be used. The data generated from CARLA was used to train a CNN model to create the self-driving experience. For navigation of self-driven cars using CNNs, Duong et al. [9] provided a very innovative way of dealing with the complications of using Markov models, eliminating the need for generative models. Such a colossal task was reduced to one simple model.

3 Existing System

Existing UGVs allow the soldier to spot enemies on patrol or lying in ambush, and can help save lives. They can adjust strategies based on the surroundings, can be used to detonate explosives, help in firefights and combat as well as in supplying ammunition, and can spot explosives or human opposition before soldiers are harmed in combat. Some of the drawbacks of existing systems are: bandwidth is always a problem with wireless (and even some wired) solutions; the battery may discharge during a mission; an autonomous system may misfire at someone other than the enemy; the cost at which these vehicles are brought into the military is a big disadvantage; they require specific programming, with many engineers spending countless hours on testing and designing them; and they can be destroyed before they have benefited any soldiers.

4 Network Architecture

4.1 Convolutional Neural Network(CNN)

Most earlier experiments for self-driving cars involved some variation of generative models, viz. hidden Markov models. These methods are computationally very expensive, and thus a simpler solution was needed; a CNN is one such solution. The feed from the camera fitted on the vehicle acts as the data on which training happens. Accompanying the camera feed is one more feature, viz. the steering angle. All the above information is fed to the CNN model for training, and the error is corrected using the back-propagation algorithm. The training files are created by recording the CARLA environment, which captures the speed, direction and steering angle. The data is recorded and stored in files with the .npy extension. Each .npy file was about 185 MB and there were 106 such files in total, making the
entire dataset around 19.1 GB. Even though the recording was done at 1280 × 720p it
was later resized to 480 × 270p to make the images more CNN friendly. The record-
ing was done at 25 frames per second. The architecture used for training the model
was Inception v3 model. Since Inception focuses mainly on computational cost this
model was used. All the previous weights of the Inception v3 model was used during
the training.
There are 5 epochs overall, i.e. the entire training is repeated 5 times, but fitting on each file is done just once per epoch with a batch size of 15. The learning rate for this
training was 1e-3. The average accuracy came out to be 74%. To avoid overfitting a
dropout rate of 0.3 was set. It took around 8 hours of training on Nvidia GTX 1070
GPU (Fig. 1).
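A minimal Keras sketch of the training setup described above is given below. The nine output classes, the 480 × 270 input size, the batch size of 15, the 1e-3 learning rate and the 0.3 dropout follow the text, while the ImageNet weights, the pooling/classification head and the load_npy helper are assumptions, since those details are not specified.

import tensorflow as tf
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import InceptionV3

NUM_CLASSES = 9   # straight, left, right, reverse, fwd-left, fwd-right, rev-left, rev-right, no key

# Inception v3 backbone with previously trained weights, as described in the text.
base = InceptionV3(weights="imagenet", include_top=False, input_shape=(270, 480, 3))

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),                                  # dropout rate from the text
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=optimizers.Adam(learning_rate=1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])

# Each .npy training file is assumed to hold (frame, one-hot key vector) pairs;
# load_npy is a hypothetical loader for those files.
# for epoch in range(5):
#     for path in npy_files:
#         frames, keys = load_npy(path)
#         model.fit(frames, keys, batch_size=15, epochs=1)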

4.2 Equations

Equation of CNN

G[m, n] = (f * h)[m, n] = \sum_{j}\sum_{k} h[j, k]\, f[m - j, n - k]    (1)

Fig. 1 Training of the model

Equation of Sobel filters


K_x = \begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix}, \quad K_y = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{pmatrix}    (2)

Equation of Gradient Intensity



|G| = \sqrt{I_x^2 + I_y^2}    (3)

\theta(x, y) = \arctan\left(\frac{I_y}{I_x}\right)    (4)

Equation of Gaussian Blur



H_{ij} = \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{(i - (k+1))^{2} + (j - (k+1))^{2}}{2\sigma^{2}}\right),    (5)

1 \le i, j \le (2k + 1)    (6)

Equation of Hough Line Transform

r = x\cos\theta + y\sin\theta    (7)
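These equations map onto standard OpenCV operations; a minimal sketch of the lane-marking step (Gaussian blur with the 5 × 5 mask, Canny edge detection with the 150/220 thresholds described in the next subsection, and the probabilistic Hough line transform) is shown below. The Hough parameters are illustrative assumptions, and the usual blur-then-edge ordering is used here.

import cv2
import numpy as np

def detect_lanes(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)            # Eq. (5): 5x5 Gaussian mask
    edges = cv2.Canny(blurred, 150, 220)                   # thresholds from the experiments
    # Eq. (7): probabilistic Hough transform; the length/gap values are illustrative.
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=50,
                            minLineLength=30, maxLineGap=10)
    overlay = frame.copy()
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            cv2.line(overlay, (x1, y1), (x2, y2), (0, 255, 0), 3)   # green lane lines
    return overlay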



4.3 CARLA

CARLA is an open-source simulator for self-driving car research. Its dynamic, free-roam environment, along with different weather patterns and game mechanics that change with them, makes it a front runner for self-driving experiments. At the time of writing, the latest stable version available for Windows 10 is 0.9.5. In addition to the standard RGB camera, CARLA has 7 different sensors which can also be used to make the self-driving experience even more realistic. In the experiments, every frame from CARLA was taken and a Gaussian blur was applied (Fig. 2) to bring out the outlines of the lane and suppress every other component that does not matter. This is a necessary step to zero in on the lanes and avoid crossing them. Canny edge detection was also used, with the two threshold values set to 150 and 220; these values were chosen after some experimentation, which showed that the algorithm works best with them. A 5 × 5 mask was applied for the Gaussian filter. To extract the shape of the lanes, the Hough line transform was used, a major reason being that it can recognize the shape even if it is broken or distorted to some extent. Finally, the blurred lane image was superimposed on the original image, giving back the original frame with two green lines indicating the lanes (Fig. 2b). Our car should always stay between these lanes, and crossing them counts as an error. This image, with the lanes imposed on it along with the vehicle, is used for training the model. The prediction part of the model outputs an image of size 480 × 270p, which is then scaled up to 1280 × 720p. The choices for the prediction are straight, left, right, reverse, forward left, forward right, reverse left, reverse right and, lastly, no keys pressed.

Fig. 2 Gaussian filter: a Gaussian filter applied on CARLA, b lane detection for CARLA using the Gaussian filter



5 Proposed System

5.1 Objectives

• It identifies a person as well as a gun with the help of the Object Detection API from Google.
• Increases the security.
• The bot can be triggered from the remote location.
• Can identify person in the night as well.
• To make the bot self driven.

The operation of the weapon through the triggers and the central controller is shown in Fig. 3, which also depicts how the camera system helps in triggering the weapon. The flowchart (Fig. 3) shows how the UGV is controlled and how the cameras feed the object classification model so as to detect any intruding person. Based on this, the weapon system described above is triggered.

5.2 Advantages

• Increases mobility.
• Loss of human life is reduced.
• Voice or autonomous controlling of bot.
• Invulnerable to Spoofing and MITM.

Fig. 3 UGV control system



6 Methodology

We use the TensorFlow Object Detection API, with a pre-trained deep learning model, to detect the number of persons along with the weapons they are carrying. The version of the Object Detection API used here is 9.0.2. MobileNet V3 and the COCO database are used for training the model, for which ResNet is also used. A protocol buffer, Protobuf version 3.0.0, is used; it maintains the kernel-based interaction of the API for scheduling the jobs. For the vehicle, iron is used instead of aluminium to maintain ground clearance and stability. Two brushless DC planetary motors are used, and two batteries with a 12 V 9 A configuration are connected in parallel. To handle such a heavy load, an RKI 1341 is used, providing safety to the micro-controller. The micro-controller used here is a Raspberry Pi 3B+. The Rpi is mounted on the vehicle to process all the inputs, which includes controlling the hardware as well as managing the video feed and providing it to the server. As a safety precaution, a 100 F capacitor is used along with a 1000k acting as a shield. End-stop switches are used to make the vehicle move forward, backward, left, and right. The vehicle is armed with a turret; originally an air-pressure gun was used, but reloading it was very difficult, so a gear-based model gun is used instead. A NEMA 17 stepper motor is used to turn it in 30 steps. Security is a major concern in this type of system, and one of the most tangible attacks in this situation is the Man-in-the-Middle (MITM) attack. To prevent this, two-step authentication is used, with the service provided by NGROK, which also uses the SHA-512 system for encryption. NGROK creates a VPN tunnel through its servers and provides proxy servers for safety. The connection is a TCP/TLS connection, which maintains security throughout. The bot is contacted over the Internet; no hotspot module is connected, but a GSM module is used.
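As a hedged illustration of the detection step described above (the export path, the class IDs, and the alert rule are assumptions, not values taken from the paper), an exported TensorFlow Object Detection API SavedModel can be queried per camera frame roughly as follows:

```python
# Hedged sketch: querying an exported TF Object Detection API model per frame.
# The "exported_model/saved_model" path, the label IDs and the alert rule are
# assumptions made only for illustration.
import numpy as np
import tensorflow as tf

detect_fn = tf.saved_model.load("exported_model/saved_model")  # assumed export dir
PERSON_ID, GUN_ID = 1, 2                                       # assumed label map

def detect_threats(frame_rgb, min_score=0.5):
    """Return (box, class_id, score) tuples above the confidence threshold."""
    batch = tf.convert_to_tensor(frame_rgb[np.newaxis, ...], dtype=tf.uint8)
    out = detect_fn(batch)
    boxes = out["detection_boxes"][0].numpy()
    classes = out["detection_classes"][0].numpy().astype(int)
    scores = out["detection_scores"][0].numpy()
    keep = scores >= min_score
    return list(zip(boxes[keep], classes[keep], scores[keep]))

def should_alert(detections):
    """Flag the frame only when both a person and a gun are detected."""
    labels = {cls for _, cls, _ in detections}
    return PERSON_ID in labels and GUN_ID in labels
```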

7 Experimental Setup

Figure 4a shows the bot after complete assembly, with all the cameras and the gun. Scapy (Fig. 4b) was used with Python to perform a Man-in-the-Middle attack in Kali Linux, and Metasploit (Fig. 4c) was used on Kali Linux to perform the attack with different types of sniffers. A Raspberry Pi (Fig. 4d) provides Internet connectivity to the bot and transfers the camera feeds over it. Figure 4e shows an example of how the TensorFlow object detection API works and classifies different types of objects, and Fig. 4f shows how a CNN is used to build self-driven cars from camera feeds and steering angles.

Fig. 4 Experimental setup: (a) bot image; (b) ARP spoofing part 1; (c) ARP spoofing part 2; (d) camera and Internet connectivity via Rpi; (e) Object Detection API; (f) flowchart for the self-driving car

8 Results

8.1 Model Training and Security of the Bot

To make the system secure and avoid MITM attacks, NGROK (Fig. 5a) was used. It provides tunnels from a public endpoint to locally running services; Fig. 5a shows the NGROK tunnel in working condition. The training accuracy (Fig. 5b) of the model caps out just below 80% after more than 30 thousand training iterations. The validation accuracy (Fig. 5c) caps at around an average of 74%, with clear signs of over-fitting here and there.

Fig. 5 Accuracy results and security: (a) NGROK secure tunnelling; (b) training accuracy; (c) validation accuracy

8.2 Testing of the Model

The following are examples of testing the self-driving car. It can be clearly seen that in some locations the car crosses the road and enters the pedestrian path before correcting its way and returning to the street (Fig. 6). This can prove to be very harmful and can be eliminated by incorporating more training data.

9 Conclusion

Thus, concerning the scope decided for the project, we have successfully implemented an Unmanned Ground Vehicle for threat detection and elimination. The prototype bot can be used in war-prone areas to detect threats remotely, using a classification model that shows the detected objects in-camera, and it can be controlled remotely over the Internet along with the camera feed. CARLA is a continuous and dynamic environment, and the data collected from it therefore provides a very realistic approach towards the self-driving car. CNN proves to be a better method than hidden Markov models, which are generative and computationally expensive. The accuracy of the model is around 74%.

Fig. 6 Testing of the model on CARLA (panels a–f)

10 Future Scope

Due to limited computational resources, the data on which the model was trained was limited. More data can therefore be collected, and different optimizers as well as different loss functions can be tried out. For the vehicle, a more lightweight metal can be used to reduce its weight, and the object detection API can be made faster and more efficient. The reliability of the model should be increased, because it still runs out of bounds, which can be a very serious threat. Over-fitting should also be reduced.

Acknowledgements Every aspect of this idea that was brought to fruition would not have been possible without the tremendous support of the authors' families. We would also like to thank the SIES Graduate School of Technology and the HOD of the IT Dept., Dr. Lakshmisudha Kondaka, for allowing us to work on the project and supporting us throughout.

References

1. Thomas, S., Devi, A.: Design and implementation of unmanned ground vehicle (UGV) for
surveillance and bomb detection using haptic arm technology. In: 2017 International Conference
on Innovations in Green Energy and Healthcare Technologies (IGEHT), Coimbatore, pp. 1–5
(2017)
2. Noor, M.Z.H., Zain, S.A.S.M., Mazalan, L.: Design and development of remote-operated multi-
direction unmanned ground vehicle (UGV). In: 2013 IEEE 3rd International Conference on
System Engineering and Technology, Shah Alam, pp. 188–192 (2013)
3. Sandhya, S., Devi, K.A.S.: Contention for man-in-the-middle attacks in bluetooth networks.
In: 2012 Fourth International Conference on Computational Intelligence and Communication
Networks, Mathura, pp. 700–703 (2012)
4. Wang, Y., Wang, H., Li, Z., Huang, J.: Man-in-the-middle attack on BB84 protocol and its
defence. In: 2009 2nd IEEE International Conference on Computer Science and Information
Technology, Beijing, pp. 438–439 (2009)
5. Kanimozhi, S., Gayathri, G., Mala, T.: Multiple real-time object identification using single shot
multi-box detection. In: 2019 International Conference on Computational Intelligence in Data
Science (ICCIDS), Chennai, India, pp. 1–5 (2019)
6. Gu, A., Xu, J.: Vision based ground marker fast detection for small robotic UAV. In: 2014
IEEE 5th International Conference on Software Engineering and Service Science, Beijing, pp.
975–978 (2014)
7. Talukdar, J., Gupta, S., Rajpura, P.S., Hegde, R.S.: Transfer learning for object detection using
state-of-the-art deep neural networks. In: 2018 5th International Conference on Signal Processing
and Integrated Networks (SPIN), Noida, pp. 78–83 (2018)
8. Dworak, D., Ciepiela, F., Derbisz, J., Izzat, I., Komorkiewicz, M., Wójcik, M.: Performance of
LiDAR object detection deep learning architectures based on artificially generated point cloud
data from CARLA simulator. In: 2019 24th International Conference on Methods and Models
in Automation and Robotics (MMAR), Miedzyzdroje, Poland, pp. 600–605 (2019)
9. Duong, M., Do, T., Le, M.: Navigating self-driving vehicles using convolutional neural net-
work. In: 2018 4th International Conference on Green Technology and Sustainable Development
(GTSD), Ho Chi Minh City, pp. 607–610 (2018)
Vehicular Ant Lion Optimization
Algorithm (VALOA) for Urban Traffic
Management

Ruchika Kumari and Rakesh Kumar

Abstract In the past years, various routing methods have been proposed for VANETs. Routing protocols that utilize various parameters have been found to be most suitable for vehicular networks because of their ability to cope with the dynamic environment (DE) caused by vehicular mobility. Such parameters include link stability, network speed, and environmental conditions. This research article presents a traffic-management-based routing protocol for VANETs suitable for an urban city background. The novel method is an improved version of the dynamic source routing (DSR) energy-based protocol. The developed protocol, termed effective DSR, uses an ant-based method to search for a path with optimized network connectivity. It is assumed that every vehicle node has identifiers for paths such as roads and streets. Using the data carried in small control packets called ANTLIONs, the vehicle nodes evaluate the distance and energy of roadsides or streets for the network connections. ANTLION data packets are generated by the vehicles in particular areas. To find the most valuable route between the source and sink nodes, the source vehicle selects the route over roads with the minimum total distance and energy for the complete path. The fitness function of the planned routing protocol has been identified, and its performance has been evaluated through simulation. The experimental outcomes show that the PDR improves by 10% compared with the existing protocol (VACO: vehicle ant colony optimization) when the vehicle ant lion optimization (VALO) method is used. In addition, the end-to-end delay (E2D) and the network overhead (NO) are also mitigated.

Keywords Vehicle ant colony optimization (VACO) · Vehicle ant lion optimization (VALO) · Dynamic environment (DE) · VANET · Dynamic source routing (DSR)

R. Kumari (B)
Department of CSE, NITTTR, Chandigarh, India
e-mail: ruchika.katoch91@gmail.com
R. Kumar
Department of CSE, CUH Mahendergarh, Mahendergarh, Haryana, India
e-mail: raakeshdhiman@gmail.com


1 Introduction

In the past few years, advances in intelligent transportation systems (ITS) have been motivated by the need to reduce traffic congestion, mitigate time complexity and overhead, and improve transportation management and traffic safety. To meet the major communication needs of both safety and non-safety applications in VANET scenarios, there is a requirement to advance vehicle communication (VC) and smart communications (SCs). VANET is a sub-type of MANET in which vehicles communicate with each other and with nearby static roadside devices (RS). Its communications include various models such as V2V and V2I. VANET is a developing topology that aims to provide wireless communication (WC) between moving vehicles and also between vehicles and infrastructure stations [1]. The main goal of VANET is to provide safety-related data to vehicles. Vehicles exchange status data, such as speed and location, in periodic messages known as beacons to create awareness among neighboring vehicles, improve safety, and decrease the rate of accidents.
Figure 1 shows a typical vehicular network setup, where vehicle-to-infrastructure communication can be used to access position services or obtain traffic data. Vehicle-to-vehicle communication can be employed to warn about difficulties or to reach out-of-coverage vehicle nodes (CNs) through multihop messaging.
Vehicular communication is used in several applications with highly varied requirements. The probable applications of VANET are safety oriented, convenience oriented, and commercial oriented. Some of the uses of VANET networks are security, weather conditions, and traffic management [2].
The existing routing protocol, described as a hybrid approach (VACO), is a combination of two algorithms: GSR and the ACO optimization algorithm.

Fig. 1 Illustration of VANET scenes: accident information (V2V) and V2I to send data to emergency services [18]

Fig. 2 Categorization of routing methods in VANET: topology-based protocols, position-dependent routing protocols, cluster-dependent routing protocols, geocast routing protocols, and broadcasting routing protocols

Fig. 3 Research algorithm process: network initialization; searching the vehicle nodes for data transmission; calculating the coverage distance (with Dist_formula); developing the DSR routing protocol (for signal broadcasting, reply, and maintenance); then implementing the novel VALO algorithm; evaluating the parameters; stop

Geographical source routing (GSR) links positional and topology-based routing. Route selection and shortest-path searching use the Dijkstra approach. GSR utilizes a reactive position-based service to obtain the receiver address and uses the best path to forward data packets. The structure of the GSR routing protocol follows a top-down operation. The application layer provides the interface with users and with other layers for the transmission of data. The transport layer utilizes UDP and TCP for controlling congestion over the VANET system [3]. The network layer provides a path to the information through RLS. The last layer is the physical layer, which provides the wireless transmission, emphasizing IEEE 802.11a, IEEE 802.11b, and IEEE 802.11c. The issues of hidden data and flooding can be prevented during communication in this approach. The ant colony optimization approach is a nature-inspired algorithm modeled on the foraging behavior of ants. A pheromone is a hormone-like substance that ants can detect; it attracts ants, and ants follow the highest pheromone concentrations. The approach includes the concepts of group specialization, regulation, and selection rules. The algorithm is based on swarm intelligence, that is, on the shared, evolving behavior observed in animals and on the collective behavior of specific insect colonies. It leads to a complex approach and may exhibit intelligent behavior through the complex interaction of thousands of autonomous swarm members [4]. The communication is decentralized in nature, with no supervision.
In existing research work, the authors of [5] present a traffic-aware, location-dependent routing method for VANET. An improved version of the geographical source routing (GSR) method is used in that work. The VACO method finds an optimal path between a source and a sink and searches for route connectivity in the network. VACO searches for routes to improve network performance, but its maintenance cost is high, and packet delivery is improved by only 10%.
The rest of this research article is organized as follows. Section 2 defines the classification of the numerous routing protocols of vehicular ad hoc networks. Section 3 reviews previous routing protocols, which cannot adequately satisfy the routing requirements of these vehicular networks owing to the dynamic behavior of VANETs, a consequence of traffic situations and problems. Section 4 describes the research policies using the dynamic source routing (DSR) protocol, the ant lion optimization (ALO) algorithm, and the research methodology. Sections 5 and 6 present the experimental result analysis, the mathematical formulas, a comparison between the proposed and existing routing protocols (VACO, DSR, and VALO), the conclusion, and the future scope.

2 Routing Scenarios

The routing process is responsible for discovering and maintaining routes between origin and destination hops and is essential for network operations. Routing protocols are used for exchanging node data among the nodes in the network with the least network overhead. VANET routing protocols may be characterized by different factors such as the routing algorithm used, the routing information, protocol similarities, and network protocols. VANET routing protocols can be categorized as follows [6].

2.1 Topology-Based Protocol

These protocols use the link data present in the system to perform data forwarding; that is, they use the connectivity information that exists in the vehicular network to achieve packet forwarding. They are categorized as follows [7].

2.1.1 Pro-active Routing Protocol

In this routing protocol, routes to other nodes are maintained in the background rather than on transmission demand. Data packets are regularly broadcast and disseminated between the hops to maintain the routes, and a routing table is then built at each hop that records the next node towards the receiver hop. The main benefit of this routing protocol is that no path investigation is required, because the route to the receiver hop is maintained in the background, so the protocol has minimum latency. It is known as a table-driven routing protocol. The protocol runs periodically because of the exchange of topology information between the hops in the system.

2.1.2 Re-Active Routing Protocol

This protocol contains a route investigation stage, in which query packets are flooded into the system to search for the route and complete the task. These routing protocols are known as on-demand routing protocols because the route is discovered and updated only when information is to be transferred.

2.1.3 Hybrid Protocol

This protocol is developed to decrease the data overhead of pro-active routing protocols and to reduce the initial path investigation delay of reactive routing protocols [8].

2.2 Position-Dependent Routing Protocol

This is a group of routing approaches in which geographical positioning data are shared to select the nearest forwarding nodes. A data packet is forwarded, without mapping data, to the one neighbor node that is nearest to the receiver hop. This routing is attractive because no global path from the sender to the receiver hop needs to be generated and maintained. Examples of these protocols are the location-based greedy vehicle-to-vehicle protocol and the delay-tolerant protocol.

2.3 Cluster-Dependent Routing Protocol

In this protocol, vehicles that are placed close to each other form clusters. Every cluster has a unique cluster head (CH) that is accountable for intra- and inter-cluster maintenance. Intra-cluster hops interconnect using direct links, whereas inter-cluster communication is performed by the cluster heads.

2.4 Geo-Cast Routing Protocol

This is mainly a position-based multicast routing protocol. Its major objective is to transmit packets from a source to all connected hops within a specified geographical area. Vehicles placed outside the zone are not notified, to prevent an unexpected, dangerous response. The protocol can be regarded as a multicast service within a geographical area; in the receiver area, unicast routing is utilized to send the data packets [8].

2.5 Broadcasting Routing Protocol

This, too, is mainly a position-based multicast routing protocol. Its main goal is to transmit data packets from a source to all connected hops in a unified geographical area. Vehicles placed outside the zone are not notified, to prevent an unexpected, dangerous response. The protocol is regarded as a multicast service within a geographical area [9].

3 Prior Work

This section elaborates a survey of various research articles on VANET. As described already, routing procedures are a major problem in vehicular ad hoc networks. Goudarzi et al. [10] presented a traffic-aware position-based routing protocol for VANET that is appropriate for the city scenario. The routing protocol is an improved version of the geographic source routing (GSR) protocol and uses an ant-based approach to search for the path that has the optimum connectivity. It is assumed that each vehicle has a digital map of the paths consisting of the distributed routes. Using data contained in small control packets, known as ants, the vehicles compute the weight of each route segment related to that connection. The ant data packets are launched by the vehicles in street areas. The optimum path is searched between the sender and receiver, where the sender vehicle selects the route over the mapped streets with the lowest weight for the whole path. The fitness function of the planned protocol was specified, and its performance was evaluated through simulation. Simulation outcomes showed that the PDR was enhanced by more than 10% for speeds up to 70 km/h compared with the VANET routing protocol based on VACO. Mejdoubi et al. [11] presented a segmented, predictive road traffic management scheme for VANET. It aims at recognizing the traffic on the road along with the regular adaptation of the path at every junction to decrease the driving time and prevent congestion; the communication among the vehicles and roadside units determines the traffic prediction, which is obtained by the segmented method. Nawaz and Sattar [12] analyzed traffic in rural and urban areas using vehicular ad hoc network routing protocols. In this research, the AODV, DSDV, and DSR protocols were studied. The exploration was carried out in both rural and urban zones, and the examination was performed based on packet drop, vehicle density, throughput, and end-to-end delay. The obtained outcomes, in terms of low packet drop and maximum throughput, showed that DSR gives better results than AODV and DSDV in rural regions, while AODV performs better than DSR under conditions of low density. Saha et al. [13] proposed research using simulation parameters of different cities. This research presented a comparative trial of different mobility scenarios of vehicular ad hoc networks in three well-known Indian metros. The AODV routing protocol was utilized for the simulation results, and the comparative analysis among protocols was done based on packet drop, throughput, and the complete time taken by the test system to simulate the given network. Durga et al. [14] defined reliable information dissemination in vehicular ad hoc networks. Collision avoidance and traffic advancement are further areas of investigation of importance in the intelligent vehicle framework, and widespread and proficient data exchange between vehicles is a significant part of a considerable number of ITS applications. Guo et al. [15] implemented a real-time application for monitoring the traffic environment. In this research, they first proposed a compelling real-time traffic data sharing mechanism that depends on a distributed transportation framework with RSUs and has lower computing complexity and less redundancy.

4 Research Policies

To mitigate the existing issues, the major purpose of this proposed work is to develop a routing protocol technique for vehicular networks that enhances:
• PDR: packet delivery ratio
• E2D: end-to-end delay
• RO: routing overhead.
The research technique utilizes the dynamic source routing protocol together with the ant lion optimization (ALO) approach (VALO), thereby improving the delivery rate, the network overhead, and the end-to-end delay (E2D).

4.1 Dynamic Protocol

DSR is a basic routing protocol in which the source determines the series of intermediate hops and places it in the routing header of the data packet. In this protocol, the header is copied into the query packet at each intermediate hop as it is forwarded. The receiver then retrieves the route from the query and uses it to reply to the source hop. In case the receiver hop forwards multiple paths, the source hop will receive and store multiple paths from the receiver hop, and the other hops can use a similar connection in the current path [13]. DSR is a re-active protocol that depends on the source routing method. It is mainly reliant on the link-state convention, in which the sender initiates the route request on an on-demand basis [16]. A schematic sketch of this on-demand route discovery is given below.
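The following is a rough, self-contained sketch of the on-demand source-route discovery just described (not the authors' implementation): the flooding of the route request is simulated by a breadth-first search that accumulates the hop sequence carried in the request.

```python
# Minimal sketch of DSR-style on-demand route discovery over an adjacency map.
# The request flooding is simulated by a breadth-first search that carries the
# accumulated route record; names and structures are illustrative assumptions.
from collections import deque

def dsr_route_discovery(adjacency, source, destination):
    """Return the first discovered hop sequence from source to destination."""
    queue = deque([[source]])          # each entry is a partial route record
    visited = {source}
    while queue:
        route = queue.popleft()
        last_hop = route[-1]
        if last_hop == destination:
            return route               # the destination replies with this record
        for neighbour in adjacency.get(last_hop, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(route + [neighbour])
    return None                        # no route: a route error would be reported

# Example: adjacency = {"S": ["A", "B"], "A": ["D"], "B": ["A"], "D": []}
# dsr_route_discovery(adjacency, "S", "D") -> ["S", "A", "D"]
```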

4.2 Ant Lion Optimization Process

The ant lion optimization algorithm is an evolutionary approach that searches the solution space through randomly generated candidates [13]. A group of candidates guides the search towards the true global optimum rather than relying on a single random variable. The method is used for resolving problems with both interior and exterior solutions [14]. The required optimal value is thus approached through randomized alterations of the output value, and with this method there is a high probability of reaching the desired optimum rather than getting stuck in a local optimum [17]. A simplified illustration of this search idea follows.
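The fragment below is a deliberately simplified sketch of that idea for a numeric minimization problem: candidate ants take random walks around fitter solutions (ant lions) and around the current elite, with the walk radius shrinking over iterations. The roulette-wheel selection and boundary-shrinking walks of the full ALO are collapsed into a uniform choice and a linear radius schedule, so this is an illustration, not the VALO routing algorithm itself.

```python
# Highly simplified sketch of the ant lion optimization (ALO) idea: ants walk
# randomly around fitter ant lions and the elite, with a shrinking radius.
import random

def alo_minimize(fitness, dim, lower, upper, n_agents=20, iters=100):
    antlions = [[random.uniform(lower, upper) for _ in range(dim)]
                for _ in range(n_agents)]
    elite = min(antlions, key=fitness)
    for t in range(1, iters + 1):
        radius = (upper - lower) * (1.0 - t / iters)   # shrinking trap radius
        for i in range(n_agents):
            guide = random.choice(antlions)            # stand-in for roulette wheel
            ant = [min(upper, max(lower,
                       0.5 * (g + e) + random.uniform(-radius, radius)))
                   for g, e in zip(guide, elite)]      # walk around guide and elite
            if fitness(ant) < fitness(antlions[i]):    # ant lion catches fitter ant
                antlions[i] = ant
        elite = min(antlions, key=fitness)
    return elite

# Example: alo_minimize(lambda x: sum(v * v for v in x), dim=2, lower=-5, upper=5)
```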
Initially, a vehicular ad hoc network is created with x-coordinates (network length) and y-coordinates (network width), and the simulation parameters such as vehicle nodes, energy, vehicle IDs, and data packet rates are defined. The source node and the destination node are then searched in the VANET, after which the coverage set is created and its distance, range, and matrix are calculated. The dynamic source routing (DSR) algorithm is developed to send a request from one node to an intermediate node; the route request is sent from one node to another vehicular node for data or packet transmission, and if the request is accepted by the intermediate node, it replies back. In case a route error occurs, the third phase of DSR (route maintenance) is used. After that, the performance parameters, namely PDR, routing overhead (RO), and E2D, are evaluated. In the proposed algorithm, the ALO optimization process, an evolutionary approach that searches the area through randomized outputs, is then applied to optimize the routes, and the performance metrics (PDR, RO, and E2D) are evaluated and compared with the existing VACO traffic control protocol. A sketch of the resulting route-evaluation step is given below.
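A minimal sketch of that route-evaluation step, assuming Euclidean segment distances, per-node energy costs, and equal weighting of the two terms (the paper does not specify the weights), could look as follows:

```python
# Hedged sketch of the route evaluation: each candidate route is scored by its
# total distance plus the energy of its nodes, and the source keeps the route
# with the minimum score. Weights and data layouts are assumptions.
import math

def segment_distance(a, b):
    """Euclidean distance between two (x, y) vehicle/junction positions."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def route_cost(route, positions, energy, w_dist=1.0, w_energy=1.0):
    """route: list of node ids; positions: id -> (x, y); energy: id -> cost."""
    dist = sum(segment_distance(positions[u], positions[v])
               for u, v in zip(route, route[1:]))
    eng = sum(energy[node] for node in route)
    return w_dist * dist + w_energy * eng

def best_route(candidate_routes, positions, energy):
    """Pick the candidate with the minimum combined distance/energy cost."""
    return min(candidate_routes, key=lambda r: route_cost(r, positions, energy))
```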

5 Simulation Result

The simulation tool used is MATLAB, a high-level programming language and interactive environment for numerical, alpha-numerical, and mathematical programming, developed by MathWorks. Tables 1 and 2 show the routing protocol comparison and the network simulation parameters: the network area is 2000 m × 2000 m; the data communication range is 300 m; the DSR protocol is used for data communication in a sequential manner; the vehicle nodes used to send data from one hop to another are 0, 5, 10, 15, 20, …, 30; the number of RSUs (receivers) is 5; the evaluated coverage distance is 100; and the network performance is calculated based on PDR, network overhead, and delay.
The mathematical formulas used in this research work are given below.
Packet delivery ratio: It is the proportion of the packets received to the packets transferred. The performance of the network improves with an increase in PDR. Mathematically, it is given as:

Table 1 Differentiation of various routing protocols in VANET [10]

Protocol type | Topology based | Position based | Cluster based | Geocast based | Broadcast based
FM: Forwarding method | WMH: Wireless multihop | Heuristic | WMH: Wireless multihop | WMH: Wireless multihop | Wireless multihop
Method recovery | No | No | Yes | No | No
The need for a digital map | No | No | Yes | No | No
Structure need | Yes | Yes | No | Yes | Yes
Scenario | City area | City area | City area | High-way area | High-way area

Table 2 Simulation parameters

Parameters | Values
Network area | 2000 m × 2000 m
vnode | 5, 10, 15, 20, 25, 30, …
Range | 300 m
Energy | Randomly
Parameters | PDR, E2D, and RO
Data packets | Randomly
RSU | 5

Fig. 4 Process of the DSR protocol

PDR = (Number of packets received / Number of packets transferred) × 100

Network overhead
It is the proportion of the number of control packets generated to the total number of packets generated.

Network overhead = Number of control packets generated / Total number of packets generated

End-to-end delay
This network performance metric is the average of the delays of all data packets received by the sink hop, i.e., the difference between the time a data packet is received and the time it is transmitted by the sender hop.

End-to-end delay = Packet received time − Packet transmitted time
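These three metrics can be computed from simulation counters as in the small sketch below; reading the overhead numerator as the control packets generated is an interpretation of the formula above, and all names are illustrative.

```python
# Illustrative computation of the three evaluation metrics from counters.
def packet_delivery_ratio(received, transferred):
    return received / transferred * 100          # PDR in percent

def network_overhead(control_packets, total_packets):
    return control_packets / total_packets       # proportion of generated packets

def end_to_end_delay(recv_times, send_times):
    """Average difference between receive and transmit timestamps."""
    return sum(r - s for r, s in zip(recv_times, send_times)) / len(recv_times)

# e.g. packet_delivery_ratio(707, 1000) -> 70.7, the VALO PDR reported in Table 5
```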



Figure 4 shows the deployment of the VANET. The vehicular ad hoc network area is calculated based on the network length and the network width. The start node for packet transmission and the destination node are searched in the vehicular ad hoc network. The coverage set is then formed, and the distance between the source node and the destination node is calculated from the coverage range and matrix of the VANET. The network also defines the vehicle node IDs: when a user sends data from one node to another, a unique ID between 100 and 500 is assigned; if this range is exceeded, the load increases and delays occur in the VANET.
Figure 5 shows the path maintenance process in the VANET. Notably, route maintenance in the DSR protocol needs no periodic data packets at any level in the network. For instance, DSR does not need any route broadcasting or neighbor detection packets and does not depend on such functional data. The fully on-demand behavior and the absence of periodic activity allow the number of overhead data packets produced by DSR to scale down. When a hop starts to transfer data, the communication pattern changes and the route data packet identifies the route. In contrast to route searching, a hop may recognize multiple paths to the receiver hop, which permits a rapid response to route modifications.
Figure 6 shows that the sender forwards a path request message (PREQ). Every node that receives the path request forwards it again to its neighboring nodes. If the sink node (receiver) gets the path request, it responds to the sender with a route reply message (PREP). The start node then obtains the shortest route and forwards the data packets along that specific route. Path maintenance is accountable for handling connection failures: in case an intermediate hop detects a path breakage, it forwards a route error message to the source node.

Fig. 5 Path maintenance procedure

Fig. 6 Route maintenance in VANET
Figure 7 shows the comparative analysis of the VALO, DSR, and VACO algorithms. VALO is the proposed routing protocol, which optimizes the most valuable route and the network performance. VACO is a traffic-aware routing protocol in which ants are transferred by a well-organized broadcasting mechanism to control network issues. The DSR routing protocol is used for route broadcasting and route reply, and, when problems arise from network failures, for route maintenance in the VANET. VALO reduces the end-to-end delay compared with VACO and the dynamic source routing method.
Figure 8 demonstrates the comparison between the proposed and existing routing protocols, namely DSR, VACO, and the VALO optimization algorithm. VALO provides fast signal broadcasting and high data transmission from the source to the destination vehicle nodes. In VACO, the traffic routing protocol manages route errors and recovers lost information from the valid route in the network. In DSR, the routing protocol handles route searching, maintenance, and replying in an accurate manner. With VALO, the PDR performance increases compared with the VACO and DSR routing protocols.

Fig. 7 Comparison—end-to-end delay (ms)

Fig. 8 Comparison—packet delivery ratio (%)

Figure 9 shows a comparison of routing overhead among the VALO, VACO, and DSR routing protocols. With VALO, the overhead is mitigated compared with the VACO and DSR routing protocols.
Tables 4 and 5 show the performance of the network parameters, namely PDR, NO, and E2D. For the proposed VALO algorithm, the PDR is 70%, the delay is 0.2 ms, and the overhead is 0.00057 bytes. For the VACO algorithm, the PDR is 55%, the delay is 0.8 ms, and the overhead is 0.150 bytes. For the DSR routing protocol, the PDR is 60%, the delay is 0.3 ms, and the overhead is 0.0013 bytes. The PDR of VALO is at least 10% higher than that of the VACO and DSR routing protocols in the VANET.

Fig. 9 Comparison—routing overhead (Byte)

Table 3 Simulation result analysis

Parameters | Values
V_network | 2000 m × 2000 m
Range of communication | 300
Protocol | Dynamic Source Routing (DSR)
Vehicle node | 10, 20, 30, 40, …
RSU | 1, 2, 3, 4, 5
Coverage distance | 100
Parameters | PDR (%), Overhead (byte), and Delay (ms)

Table 4 Performance parameters with DSR and DSR_ALO (proposed work)

Parameters | DSR Protocol | DSR_ALO
PDR (%) | 0.60–60 | 0.707–70.7
Delay (ms) | 0.3 | 0.2
Overhead (Byte) | 0.0013 | 0.00057

Table 5 Comparison between proposed and existing protocols

Parameters | VACO | DSR protocol | VALO
PDR (%) | 0.55–55 | 0.60–60 | 0.707–70.7
Delay (ms) | 0.8 | 0.3 | 0.2
Overhead (Byte) | 0.150 | 0.0013 | 0.00057

6 Conclusion and Future Scope

In conclusion, it is determined that VANET is a most prominent type of communication. A VANET consists of vehicles and is a sub-type of MANET; it provides connections between nearby vehicles and between vehicles and nearby roadside devices, but it differs considerably from other systems in its characteristics. Precisely, the movement of hops in a VANET is constrained by the road topology, so road data are obtainable and one may be able to predict the further location of vehicles. Moreover, VANET provides substantial computing, transmission, and sensing abilities, along with regular communication and the energy to support these functions. The protocol was made aware of the traffic situations on street segments; hence, small packets in the form of ants were used to sample the traffic situation and update the vehicle route data.
The existing protocol achieves this through ant colony optimization (VACO). The main issue in the existing research is that it did not consider the vehicle traffic situations on the streets along the route to assure connectivity. Hence, the VALO method is developed to overcome this issue. The new VALO technique is developed to improve the traffic rate in urban environments: VALO is a technique to find the route, manage it, and deliver maximum data from the sender to the receiver in the VANET. The routing protocols in VANET were also studied and characterized. Moreover, the DSR routing protocol is developed to find the route array, request routes, and store information in the VANET; after that, the ant lion optimization (ALO) technique is implemented through convergence and a fitness function to improve the performance rate. Along with that, the performance is evaluated based on the PDR, E2D, and network overhead using VALO and compared with the existing parameters. The PDR rate achieved is at least 10% higher than that of the other compared routing protocols for various numbers of vehicle nodes.
In future work, the experimental outcomes are expected to show that the planned protocol (EEFFA) offers better network performance than the energy-efficient OLSR and firefly optimization methods, improving the routing overhead and delay relative to the previous routing protocols.

References

1. Tonguz, O., Wisitpongphan, N., Bait, F., Mudaliget, P., Sadekart, V.: Broadcasting in VANET.
In: IEEE 2007 Mobile Networking For Vehicular Environments, pp. 7–12 (2007)
2. Singh, A., Kumar, M., Rishi, R., Madan, D.K.: A relative study of MANET and VANET:
its applications, broadcasting approaches, and challenging issues. In: Springe International
Conference on Computer Science and Information Technology, pp. 627–632 (2011)
3. Dixit, M., Kumar, R., Sagar, A.K.: VANET: architectures, research issues, routing protocols,
and its applications. In: IEEE International Conference on Computing, Communication and
Automation (ICCCA), pp. 555–561 (2016)
4. Qu, F., Wu, Z., Wang, F.Y., Cho, W.: A security and privacy review of VANETs. In: IEEE
Transactions on Intelligent Transportation Systems, vol. 16(6), pp. 2985–2996 (2015)

5. Gaikwad, D.S., Zaveri, M.: VANET routing protocols and mobility models: a survey. In:
Springer Trends in Network and Communications, pp. 334–342 (2011)
6. Cabrera, V., Rose, F.J., Ruiz, P.M.: Simulation-based study of common issues in vanet routing
protocols. In: 2009—IEEE 69th Vehicular Technology Conference, pp. 1–5 (2009)
7. Gozalvez, J., Sepulcre, M., Bauza, R.: Impact of the radio channel modelling on the performance
of VANET communication protocols. Springer Telecommun Syst 50(3), 149–167 (2012)
8. Paul, Bijan, Islam, Mohammed J.: Survey over VANET routing protocols for vehicle to vehicle
communication. IOSR J Comput Eng (IOSRJCE) 7(5), 1–9 (2012)
9. Kumar, S., Rani, S.: A study and performance analysis of AODV, DSR and GSR routing
protocols in VANET. Int J Comput Appl 96(9), 48–52 (2014)
10. Goudarzi, F., Asgari, H., Al-Raweshidy, H.S.: Traffic-aware VANET routing for city envi-
ronments—a protocol based on ant colony optimization. IEEE Syst J 13(1), 571–581
(2018)
11. Mejdoubi, A., Fouchal, H., Zytoune, O., Ouadou, M.: A distributed predictive road traffic
management system in urban VANETs. In: IEEE 15th International Wireless Communications
and Mobile Computing Conference (IWCMC), pp. 37–42 (2019)
12. Nawaz, A., Sattar, A.R.: Traffic analysis in rural/urban area using VANET routing protocols.
In: Hybrid Electrical/Fuel Cell Vehicles Advances in Automobile Engineering, pp 2–5 (2016)
13. Saha, S., Roy, D.U., Sinha, D.D.: VANET simulation in different Indian city scenario. In:
Advance in Electronic and Electric Engineering. ISSN 2231-1297 (2013)
14. Durga, C.V., Chakravarthy, G., Alekya, B. Efficient data dissemination in VANETs: urban
scenario. In: IEEE 2018 International Conference on Inventive Research in Computing
Applications (ICIRCA), pp. 891–896 (2018)
15. Guo, C., Li, D., Zhang, G., Zhai, M.: Real-time path planning in urban area via vanet-assisted
traffic information sharing. IEEE Trans Veh Technol 67(7), 5635–5649 (2018)
16. Kaur, H.: Analysis of VANET geographic routing protocols on real city map. In: 2017 2nd IEEE
International Conference on Recent Trends in Electronics, Information and Communication
Technology (RTEICT), pp. 895–899. IEEE (2017)
17. Sachdev, A., Mehta, K., Malik, L.: Design of protocol for cluster based routing in VANET using
Fire Fly algorithm. In: 2016 IEEE International Conference on Engineering and Technology
(ICETECH), pp. 490–495. IEEE (2016)
18. Zhu, M., Cao, J., Pang, D., He, Z., Xu, M.: SDN-based routing for efficient message propaga-
tion in VANET. In: Springer International Conference on Wireless Algorithms, Systems, and
Applications, pp. 788–797 (2015)
Dynamic and Incremental Update
of Mined Association Rules Against
Changes in Dataset

N. Satyavathi and B. Rama

Abstract Association rule mining (ARM) in data mining provides quality association rules based on support and confidence measures. These rules are interpreted by domain experts for making well-informed decisions. However, there is an issue with ARM when the dataset is subjected to changes from time to time. Discovering rules by reinventing the wheel, in other words by scanning the entire dataset every time, consumes more memory, processing power, and time. This is still an open problem due to the proliferation of different data structures being used for extracting frequent item sets. We propose an algorithm for updating mined association rules when dataset changes occur. The algorithm, known as FIN_INCRE, exploits the pre-order coded tree used by the FIN algorithm for fast item set mining. The proposed algorithm outperforms the traditional approach as it mines association rules incrementally and dynamically updates the mined association rules.

Keywords Association rule mining · POC-Tree · FIN_INCRE · Incremental mining · Support · Confidence

1 Introduction

Association rule mining (ARM) has numerous applications, such as sales analysis and discovering latent relationships among attributes in medical datasets, to mention a few. ARM has two important phases: discovery of frequent item sets, and production of association rules from the results of the first phase. Different association rule mining algorithms have been evaluated [1, 2]; the difference between algorithms lies in the data structure used. For instance, the node set is the data structure used in [3] for reducing time and space complexity, which made fast item set mining possible. However,

N. Satyavathi (B)
Department of CSE, JNTUH, Hyderabad, Telangana, India
e-mail: Satyanadendla15@gmail.com
B. Rama
Department of CS, Kakatiya University, Warangal, Telangana, India
e-mail: rama.abbidi@gmail.com


there was a need for incremental association rule mining algorithms that generate association rules incrementally, without rescanning the entire database when a database update occurs.
We found that the data structure used in the FIN algorithm [3], and the POC-Tree associated with it, provide fast and efficient mining of incremental association rules. FIN_INCRE is the algorithm proposed here for discovering association rules incrementally; it exploits the POC-Tree and the underlying data structure of the FIN algorithm.
The paper is structured as follows. Section 2 presents related work on mining association rules. Section 3 presents the proposed methodology for incremental and fast mining of frequent item sets. Section 4 provides the procedure for generating interesting association rules. The conclusion and the future scope of the research are provided in Sect. 5.

2 Related Work

This section reviews the literature on association rule mining. ARM has been a persistent topic in the domain of data mining for a number of years; plentiful research is found on ARM, and it has proved its utility.
Many algorithms were developed for incremental association rule mining: DB-tree and PotFP-tree [4], AFPIM [5], IFP-Growth [6], CAN-tree [7], EFPIM [8], CP-tree [9], FUFP-tree [10], and BIT-FP-Growth [11]. A new approach called IRARM [12] was developed for mining relational association rules incrementally. A system for incremental mining was developed in [13], in which the original database is represented in the form of a COMVAN tree and frequent item sets are mined using the COMVAN tree. Many approaches have thus been developed for mining incremental association rules, but these algorithms still suffer from the following drawbacks:
• They require scanning the original database many times.
• They work only in the case of insertions.
• They work only in the case of deletions.
• They do not work in the case of a support change.
• The data structure used in mining is not efficient in terms of time or space complexity.
Hence, an algorithm for ARM that overcomes the drawbacks of existing algorithms must be developed. Such an algorithm is proposed here by enhancing the FIN algorithm for efficient mining of incremental association rules.

3 Proposed Methodology

The proposed algorithm works well (i) when the original database is changed with new transactions, (ii) when some of the old transactions are deleted, and (iii) when the user-specified threshold changes. The FIN algorithm is used for discovering frequent item sets from a database D. When D is subjected to new records, removal of existing records, or a change of the user-specified support, item sets that were frequent may become infrequent and infrequent item sets may become frequent. The proposed incremental mining algorithm FIN_INCRE (shown in Fig. 1) finds the items that became infrequent after adding new transactions and deletes them from the original POC-Tree, and it also finds the items that became frequent after adding new transactions and adds them to the original POC-Tree. In this way, the POC-Tree is updated, which improves the performance of FIN_INCRE significantly. The algorithms UPOCinINS (shown in Fig. 2), UPOCinDEL (shown in Fig. 3), and UPOCinSup (shown in Fig. 4) are used to update the POC-Tree in the case of insertions, deletions, and support change, respectively.
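The full FIN_INCRE, UPOCinINS, UPOCinDEL, and UPOCinSup procedures are given in Figs. 1, 2, 3 and 4. Purely as a schematic of the underlying idea, and leaving out the POC-Tree and node set machinery of FIN, incremental maintenance can be pictured as adjusting item-set counts from the inserted and deleted transactions only and then re-deriving the frequent set against the (possibly changed) minimum support:

```python
# Schematic sketch (not FIN_INCRE itself): counts are adjusted using only the
# changed transactions, then frequent item sets are re-derived for the current
# minimum support. The POC-Tree/node set structures of FIN are omitted here.
from collections import Counter
from itertools import combinations

def itemsets_of(transaction, max_len=2):
    items = sorted(set(transaction))
    for k in range(1, max_len + 1):
        yield from combinations(items, k)

def update_counts(counts, inserted, deleted, max_len=2):
    """Adjust item-set counts in place from the changed transactions only."""
    for t in inserted:
        counts.update(itemsets_of(t, max_len))
    for t in deleted:
        counts.subtract(itemsets_of(t, max_len))
    return counts

def frequent_itemsets(counts, n_transactions, min_support):
    """Item sets whose relative support meets the (possibly updated) threshold."""
    return {s: c for s, c in counts.items() if c / n_transactions >= min_support}

# Usage: counts = Counter(); update_counts(counts, original_db, [])
#        ... later ... update_counts(counts, new_txns, removed_txns)
```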

Fig. 1 FIN_INCRE algorithm



Fig. 2 UPOCinINS algorithm



Fig. 3 UPOCinDEL algorithm

4 Generating Association Rules

Association rules can be generated from frequent item sets by using the "confidence" measure. However, an enormous number of rules may be generated, which may or may not be interesting to the user, so post-processing of the rules is required. To find the interesting rules, several evaluation measures can be used [14, 15].
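As a hedged sketch of this second phase (not the exact procedure used here), rules A -> B can be derived from the frequent item sets by checking confidence = support(A ∪ B) / support(A) against a threshold:

```python
# Schematic rule generation from frequent item sets using the confidence measure.
from itertools import combinations

def generate_rules(frequent, min_confidence):
    """frequent: dict mapping frozenset(item set) -> support count."""
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                if antecedent in frequent:
                    confidence = supp / frequent[antecedent]
                    if confidence >= min_confidence:
                        rules.append((antecedent, itemset - antecedent, confidence))
    return rules

# Example: with support counts {bread}: 4, {milk}: 5, {bread, milk}: 3 and a
# 0.7 threshold, the only rule kept is bread -> milk with confidence 0.75.
```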

Fig. 4 UPOCinSup algorithm

5 Conclusions and Future Scope

In this paper, FIN_INCRE, an ARM algorithm that is incremental in nature, is proposed. The algorithm exploits the node set data structure and the POC-Tree. The node set consumes less memory as it encodes each node of the POC-Tree; thus, the proposed algorithm runs much faster than other incremental ARM algorithms. The FIN_INCRE algorithm requires scanning the original dataset only once: after scanning the dataset, it generates the POC-Tree, from which it produces the item sets that occur frequently, and these are then used to generate association rules. When new instances are inserted into the original dataset, the algorithm scans only the newly added instances and then updates the POC-Tree and the frequent item sets before actually updating the mined association rules. In future, we will continue this research with the FIN_INCRE algorithm using distributed datasets. Another interesting direction for future work is to explore FIN_INCRE with streaming data.

References

1. Satyavathi, N., Rama, B., Nagaraju, A.: Present State-of-the-Art of association rule mining
algorithms. Int. J. Eng. Adv. Technol. (IJEAT) 9(1). ISSN 2249–8958 (2019)
2. Satyavathi, N., Rama, B., Nagaraju, A.: Present State-of-the-art of dynamic association rule
mining algorithms. Int. J. Innov. Technol. Explor. Eng. (IJITEE) 9(1). ISSN 2278-3075 (2019)

3. Hong, Z., Deng, H., Sheng Long, L.V.: Fast mining frequent itemsets using Nodesets. Exp.
Syst. Appl. 41(10), 4505–4512 (2014)
4. Ezeife, C.I., Su, Y.: Mining incremental association rules with generalized FP-tree. In: Advances
in Artificial Intelligence, Lecture Notes in Computer Science, vol. 2338. Springer, Berlin,
Heidelberg (2002)
5. Koh, J.L., Shieh, S.F.: An efficient approach for maintaining association rules based on adjusting
FP-tree structures. In: Proceedings of the DASFAA, pp. 417–424. Springer, Berlin Heidelberg,
New York (2004)
6. Tong, Y., Baowen, X., Fangjun, W.: A FP-tree based incremental updating algorithm for mining
association rules. 5, 703–710 (2004)
7. Leung, C.K., Khan, Q.I., Hoque, T.: CanTree: a tree structure for efficient incremental mining of
frequent patterns. In: Proceedings of the Fifth IEEE International Conference on Data Mining
(ICDM’05) (2005)
8. Li, X., Deng, X., Tang, S.: A fast algorithm for maintenance of association rules in incre-
mental databases. In: Proceeding of International Conference on Advance Data Mining and
Applications, pp. 56–63 (2006)
9. Tanbeer, S.K., Ahmed, C.F., Jeong, B.S., Lee, YK.: CP-tree: a tree structure for single-pass
frequent pattern mining. In: Advances in Knowledge Discovery and Data Mining, Lecture
Notes in Computer Science, vol. 5012. Springer, Berlin, Heidelberg (2008)
10. Hong, T.P., Lin, J.W., We, Y.L.: Incrementally fast updated frequent pattern trees. Exp. Syst.
Appl. 34, 2424–2435 (2008)
11. Totad, S.G., Geeta, R.B., Prasad Reddy, P.V.G.D.: Batch incremental processing for FP-tree
construction using FP-growth algorithm. Knowl. Inform. Syst. 33(2), 475–490 (2012)
12. Diana-Lucia, M., Gabriela, C., Liana, C.: A new incremental Relational association rules mining
approach. In: International Conference on Knowledge Based and Intelligent Information and
Engineering Systems, KES2018, Belgrade, Serbia (2018)
13. Gupta, A., Tiwari, A., Jain, S.: A system for incremental association rule mining without
candidate generation. Int. J. Comput. Sci. Inform. Sec. (IJCSIS) 17(7) (2019)
14. Tan, P.N., Kumar, V., Srivastava, J.: Selecting the right objective measure for association
analysis. Knowl. Discov. Data Min. 29(4), 293–313 (2004)
15. Liu, B., Hsu, W., Chen, S., Ma, Y.: Analyzing the subjective interestingness of association
rules. Intell. Syst. Appl. 15, 47–55 (2000). https://doi.org/10.1109/5254.889106. IEEE
E-Governance Using Big Data

Poonam Salwan and Veerpaul Kaur Maan

Abstract The continuous advancements in the field of ICT and the constant efforts of the Central and State governments have been the foremost forces behind the successful launch and reinforcement of e-governance in India. With the help of the public and private sectors, governments are encouraging organizations towards interoperability, to store and process data from a central location, which further enhances decision-making. This fast-growing data is turning into big data. The tools used to study and analyse big data with great speed and accuracy are known as big data analytics. These big datasets can be text, audio, video, pictures, etc. As the use of e-governance datasets increases, citizens expect datasets to be analysed and processed with greater speed and accuracy. This paper shows the relationship between e-governance and big data, its implementation around the globe, the initiatives taken by India to establish e-governance, and some challenges in implementing big data with e-governance projects.

Keywords E-governance · Big data · Big data analytics · Interoperability

1 Introduction

E-governance refers to the process of delivering government services electronically. It helps to maintain the essence of real democracy by making government procedures transparent, and it helps to establish a government of the people, for the people, and by the people by making government officers accountable and responsible for their duties. With the help of the private sector [1, 2], the Central and State governments are conducting seminars, workshops, and advertisement campaigns to encourage citizens towards e-governance. With all such initiatives, the transactional amount of

P. Salwan (B)
I.K. Gujral Punjab Technical University, Jalandhar, Punjab, India
e-mail: poonam12_sharma@yahoo.com
V. K. Maan
Giani Zail Singh Punjab Technical University, Bathinda, Punjab, India
e-mail: veerpalkaur1@rediffmail.com


data has been increasing so fast that traditional database management systems cannot be used to deal with such exponentially growing data. This also affects decision-making, as more than half of the data remains unprocessed because of changes in its type. In this paper, Sect. 1 discusses how to manage this continuously growing data related to e-governance. Section 2 discusses big data in e-governance and its features. Section 3 discusses the role of big data in e-governance projects across the globe and India's initiatives to adopt big data. Section 4 discusses some challenges that may occur while using big data analytics in e-governance.

2 E-Governance and Big Data—An Inside

The main notion of e-governance is to provide a better socio-economic-political environment to citizens [3]. In 2006, the Indian government initiated the National e-Governance Plan (NeGP). Initially, it had 27 Mission Mode Projects (MMPs) of the State and Central governments and 8 integrated MMPs; later on, another four projects were added to the NeGP. All these projects led to the generation of a huge amount of data, and this huge amount of data is known as big data [4, 5]. The most popular example of e-governance-based big data is Aadhar-UID. The term big data refers to datasets whose size and capacity are beyond the capabilities of a traditional database system. These datasets may be structured, semi-structured, or unstructured in nature and cannot be dealt with by a traditional database system. "The size or amount of data under big data varies from company to company; i.e. one company's big data may not be as big as another company's big data" [6].

2.1 Characteristics of Big Data

Basically, all datasets that satisfy the characteristics of the 3Vs—Volume, Velocity, and Variety—are considered big data (Fig. 1). The techniques used to study and process such mixed-type datasets at high speed are called big data analytics. Big data analytics processes big data by dividing the datasets into equal-sized parts [7] and storing them on different computers, known as nodes, in a cluster of computers; as sketched below, this makes the processing faster and more accurate.
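A toy sketch of that partitioning idea is given below; the record format, node count, and the local/combine functions are assumptions made only for illustration.

```python
# Toy sketch of the partition-and-combine idea: records are split into nearly
# equal chunks (one per cluster node), processed locally, and the partial
# results are merged. Names and record formats are assumptions.
def partition(records, n_nodes):
    """Split records into n_nodes nearly equal chunks, one per cluster node."""
    size, rem = divmod(len(records), n_nodes)
    chunks, start = [], 0
    for i in range(n_nodes):
        end = start + size + (1 if i < rem else 0)
        chunks.append(records[start:end])
        start = end
    return chunks

def process_in_parallel(records, n_nodes, local_fn, combine_fn):
    """Apply local_fn on every chunk (as each node would) and combine results."""
    partials = [local_fn(chunk) for chunk in partition(records, n_nodes)]
    return combine_fn(partials)

# Example: total record count across 4 nodes
# process_in_parallel(rows, 4, local_fn=len, combine_fn=sum)
```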

2.2 Phases of Big Data

The different phases through which data becomes big data [7] are as follows (Fig. 2):
• Big data generation: This phase refers to different sources generating huge amounts of data at great speed.

Fig. 1 Characteristics of big data

Fig. 2 Phases of big data

• Big data acquisition: This phase refers to collecting data (which turns into big data) from different resources, distributing data among other resources, or pre-processing the data.
• Big data storage: This phase refers to the management skills needed to store big data in such a way as to enhance its accessibility and availability.
• Big data analytics: This phase refers to the analysis of structured, unstructured, or mixed datasets to forecast future trends or make predictions.

2.3 Features of Big Data

The important features of big data are as follows:



• It is capable of managing dynamic types of data; i.e., it can manage structured, semi-structured, and unstructured data easily.
• It can easily manage a great volume of datasets, produced at a great velocity.
• It is scalable in nature; i.e., its setup can be modified as and when required.
• It offers a vast set of analytic techniques for different types of data that help to study patterns or trends in processed and unprocessed data.
• It helps in taking important decisions based on the analysis of current trends.

3 Adoption of Big Data in E-Governance Project

Earlier, when the digital form of data was not available, the veteran leaders of the government were expected to use their wisdom and past experience to make decisions [8]. In the present era, big data analytics helps in decision-making using digitized datasets. Almost 90% of the datasets generated through different resources are of an unstructured type. Big data analytic techniques give us the facility to explore unknown or hidden facts through the dissemination and processing of data in different phases. Figure 3 shows how different types of datasets are collected, refined, and synthesized to get the required information from the datasets [9].
The private sector has started using big data analytics to maximize profit by studying market trends, consumer behaviour, expectations, etc. Government departments are using it for the growth and development of their citizens. Governments are also making laws and implementing policies to ensure security and privacy at all phases of big data processing, for the validity of the information. Many countries of the world, like the US, UK, and Japan, have already started projects using big data analytic techniques to make future predictions [10].

Fig. 3 Overview of different phases of big data processing



3.1 E-Governance and Big Data Across the Globe

Here is an analysis of various countries running e-governance projects based on big data analytics [10, 11].
• The Australian government has been using big data analytics to provide better services to its citizens. The Australian Customs and Border Protection Service (ACBPS) uses big data analytics to ensure the security of the borders.
• The UK government allotted £189 million for big data research, with major emphasis given to the agriculture industry.
• The government of France has allocated €11.5 million to proposals related to 7 big data processing projects.
• The Norwegian government has been using big data analytics for the health care of its citizens.
• The Indian government has invested Rs. 5630 crores in the UID project to provide a unique ID to its citizens.
The United Nations Department of Economic and Social Affairs (U.N DESA)
conducts E-Governance Development Survey [12–14] every two years (biannu-
ally). This survey helps to find out the e-readiness of different countries and calcu-
lates E-Government Development Index (EGDI) using human development-related
parameters. The detail of these parameters is as follows:
1. Online Service Index (OSI): It checks whether the countries are following the
minimum level of Web Content Accessibility Guidelines or not.
2. Telecommunication Infrastructure Index (TII): It checks communication-related
aspects of the nation like total users of computer per 100 people; total connections
of telephone per 100 people; total connections of Internet per 100 people; total
users of mobile per 100 people; and total users of broadband per 100 people.
3. Human Capital Index (HCI): This parameter checks the literacy rate, enrolment,
and level of education at the primary and secondary levels, and skill development.
After calculating the above parameters, the EGDI is obtained as a composite
index based on the weighted average of these parameters. The possible values of this
index lie between zero (minimum) and one (maximum).

EGDI = (1/3 ∗ OSI) + (1/3 ∗ TII) + (1/3 ∗ HCI)

The EGDI report 2018 (Table 1) shows Denmark at the top rank with an index value
of 0.9150. India, through its constant efforts, has achieved the 96th global rank in the
EGDI report with an index value of 0.5669 [12, 15].
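As a worked check of this formula, substituting India's 2018 values from Table 1 (OSI = 0.9514, TII = 0.2009, HCI = 0.5484):

EGDI(India) = (1/3 ∗ 0.9514) + (1/3 ∗ 0.2009) + (1/3 ∗ 0.5484) = 1.7007/3 = 0.5669

which matches the composite index value reported for India in the survey.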
Now the obvious question that comes to mind is: are the ranks scored by
different countries the result of continuous efforts [16] or of efforts invested in just
two years? The answer to this question can be understood with the help of Table 2,
which shows the consolidated status of different countries on the basis of the biennial
EGDI reports of 2014, 2016, and 2018.

Table 1 E-Governance Development Index (EGDI) 2018 survey report


Rank Countries EGDI 2018 OSI 2018 TII 2018 HCI 2018
1 Denmark 0.9150 1.0000 0.7978 0.9472
2 Australia 0.9053 0.9722 0.7436 1.0000
3 Republic of Korea 0.9010 0.9792 0.8496 0.8743
4 UK 0.8999 0.9792 0.8004 0.9200
5 Sweden 0.8882 0.9444 0.7835 0.9366
11 USA 0.8769 0.9861 0.7564 0.8883
65 China 0.6811 0.8611 0.4735 0.7088
94 Sri Lanka 0.5751 0.6667 0.3136 0.7451
96 India 0.5669 0.9514 0.2009 0.5484
117 Nepal 0.4748 0.6875 0.2413 0.4957
Source UN e-government survey 2018

Table 2 Biennial comparison of EGDI


S. No. Country EGDI 2014 EGDI 2016 EGDI 2018
1 Denmark 0.8162 0.8510 0.9150
2 Australia 0.9103 0.9143 0.9053
3 Korea 0.9462 0.8915 0.9010
4 UK 0.8695 0.9193 0.8999
5 Sweden 0.8225 0.8704 0.8882
6 USA 0.8748 0.8420 0.8769
7 China 0.5450 0.6071 0.6811
8 Sri Lanka 0.5418 0.5445 0.5751
9 India 0.3834 0.4637 0.5669
10 Nepal 0.2344 0.3458 0.4748
Source UN e-government survey 2014, 2016, and 2018

Fig. 4 Biennial comparison of EGDI survey 2014–2018 (EGDI index values for EGDI 2014, EGDI 2016, and EGDI 2018 plotted against countries and their ranks)



The pictorial representation (Fig. 4) further helps to understand the difference in
parameters, the growth rate of e-governance, and the e-readiness of various countries
at different time intervals. It indicates that e-governance is a long-term project seeking
continuous efforts, time, money, and management for its successful implementation.

3.2 E-Governance and Big Data in India

The Indian government has initiated many e-governance-based projects at the
Central level, the State level, or with the integration of both, for the citizens' welfare
[17]. The most prestigious project is UID (Aadhaar Card), in which the government
has invested Rs. 5630 crores to uniquely identify citizens. This project uses big
data analytic techniques as it deals with huge amounts of mixed datasets that
need to be processed in real time at great speed. In order to make India a Digital
India, the government is trying to make citizens aware of, and able to use, all
government services. As the digitization of public departments takes place,
the problem of maintaining huge amounts of data using traditional databases
has become severe. Thus, big data analytics has not only been supporting
e-governance but has also provided various techniques to easily store and process
huge datasets with great speed and accuracy.
That is how big data has been proving its worth in e-governance projects.
E-governance projects at Central level: Table 3 shows the detailed list of
Mission Mode Projects initiated in India [18].
Some of the MMPs initiated and implemented at the Central level are:
• Digitization of government offices: The Department of Administrative Reforms
and Public Grievances (DAR&PG) and the National Informatics Centre (NIC)
worked together for computerization and its successful implementation in
all the government departments [19] to make the system more transparent.
• Issuance of Unique Identification (UID): The idea of this project was first triggered
and discussed in 2006 [20]. This project stores all the related information like
name, address, iris scans, and fingerprints.
• Income tax (IT): This project enables citizens to file income tax returns [21] on an
anytime, anywhere basis. It also facilitates issuing PAN cards to citizens, which
are further linked with the citizen's account. Citizens can also track the status of
their returns or refunds online.
• Central Excise and Customs: This project facilitates trade and industry by
simplifying the customs and excise processes [22], filing of returns, reconciliation,
e-registration for excise and service tax, etc.
• Insurance: This project provides speedy processing of claims, online issuance
of policies on the web, etc., through interoperability [23].
E-governance projects at the State level: Some of the Mission Mode Projects
initiated and implemented at the State level are as follows:

Table 3 Mission Mode Projects (MMPs)


S. No.  Central MMPs                                                 State MMPs                   Integrated MMPs
1.      Banking                                                      Agriculture                  CSC
2.      Central Excise and Customs                                   Commercial taxes             e-Biz
3.      Income tax (IT)                                              e-District                   e-Courts
4.      Insurance                                                    Employment Exchange          e-Procurement
5.      MCA 21                                                       Land Records (NLRMP)         EDI for eTrade
6.      Passport                                                     Municipalities               National e-Governance Service Delivery Gateway
7.      Immigration, Visa and Foreigners Registration and Tracking   e-Panchayats                 India Portal
8.      Pension                                                      Police
9.      e-Office                                                     Road transport
10.     Posts                                                        Treasuries Computerization
11.     UID                                                          PDS
12.                                                                  Education
13.                                                                  Health
Source https://meity.gov.in/content/mission-mode-projects

• Agriculture: The main objective of this MMP is to inform the farmers [24] about
seeds, type of soil and matching crops, fertilizers, pesticides, government schemes,
weather forecasts, etc.
• Commercial taxes: The main objectives taken care of by this project are e-filing of
returns [25], refunds, e-payment of taxes, online dealer ledgers, etc.
• Education: Education is the common concern of both the Central and State
governments [26]. Thus, the Ministry of Human Resource Development (MHRD)
established a centralized structure that will be implemented by State governments.
• E-municipalities: Digitization of the state-level municipalities is another very
important initiative taken by the Central government [27] under the e-governance
plan.
• Digitization of land records: The main objective of this project is to digitize the
existing land records to avoid the chances of human mistakes [28].
• Employment Exchange: This project helps employers and employees to match
their requirements and find the best fit using online resources [29].
Integrated e-governance projects: Other than the projects mentioned above,
there are many projects seeking Central and the State governments’ coordination for
the welfare of the citizens, for example, land records, education, entertainment, etc.
Some of the integrated projects and their objectives are as follows.

• Road transport: This project created a unified scheme (states and union territo-
ries) to computerize their transport offices for efficient and quick management of
driving licences and certificates [30].
• E-Procurement: This project helps to make the procurement processes simple,
transparent, and result-oriented [31] using the Internet.
• EDI for eTrade: The electronic data interchange (EDI) for online trade provides
delivery of services electronically (24 × 7), increased transparency, reduced time
and cost, etc. [32].
• E-Biz: This project provides Government-to-Business (G2B) services [33] by
sharing updated information online, easy access through the website, etc.

4 Challenges of Using Big Data Analytic Techniques

Big data analytic techniques have proved their worth in e-governance-based projects.
Still, there remain some challenges or gaps to overcome for the successful use of big
data in e-governance [34, 3].
• Threat to privacy: Big data analytic techniques need to process personal details of
the citizens, like UIDs, bank details, health details, and sale or purchase information,
for analysis. If this personal information is not used appropriately, it may
pose a threat to the citizens' privacy and safety.
• Ethical versus unethical: As the end-users (citizens) are neither aware nor
informed that their personal details have been shared for future analysis, this act
inclines towards the unethical use of power for accessing sensitive information.
• Security of data: The e-governance projects' datasets, placed on distant servers,
may be exposed to intentional or unintentional threats to sensitive data.
• Lack of skilled resources: There is a deficiency of skilled resources to maximize
the utilization of big data analytics by finding out hidden patterns or details.
• Reliability of information: The reliability of these reports mainly depends on the
capabilities and intentions of the skilled resources generating them.

5 Conclusion

E-governance has been transforming the whole world. Paper files have been
turned into computerized files that are stored and maintained at repositories located
far away. Big data analytic techniques have been adding sophistication to
e-governance by providing detailed insights into hidden patterns in datasets. Big data
analytic techniques have also been overcoming traditional DBMS problems
like storing, sharing, and processing high-volume, high-velocity datasets at great
speed. Big data analytics also has some issues or risks related to safety, security, and
access to datasets. Technocrats are continuously working to provide safeguards
against all the odds faced while using big data analytic techniques. The Indian govern-
ment is also working to make India a Digital India. Various e-governance projects
have been implemented at the Central and State levels for the welfare of the citi-
zens. The most popular project, i.e. UID, has been using big data analytic techniques
to store and process huge amounts of data. Thus, the integration of e-governance and
big data should be encouraged to make Indian cities smart cities and India a Digital
India. This will also help the Indian government in decision-making, better planning,
and management of resources for the welfare of citizens.

References

1. WSP International Management Consulting: Report on Public-Private Partnerships in India


2. Infrastructure in India: The Economist (Magazine) (2012)
3. Benchmarking E-government: A Global Perspective. United Nations Division for Public
Economics and Public Administration
4. Preet, N., Neeraj, S., Manish: Role of big data analytics in analyzing e-governance projects.
Gian Jyoti E-J. 6(2), 53–63 (2016)
5. Big data: Wikipedia. https://en.wikipedia.org/wiki/Big_data
6. Poonam, S., Mann, V.K.: IJRTE 8(6), 1609–1615 (2019). https://www.ijrte.org/wp-content/uploads/papers/v8i6/F7820038620.pdf
7. Chen, M., Mao, S., Liu, Y.: Big data: a survey. Mob. Netw. Appl. 171–209 (2014)
8. Zaid, S.A.: Case Study: Impact of Leadership on Governance in Public Administration.
Academia
9. Big Data and Analysis. FOSTECH & Company
10. Rajagopalan, M.R, Solaimurugan, V.: Big data framework for national e-governance plan. In:
11th International Conference on ICT and Knowledge Engineering. IEEE (2013)
11. Sridhar, V.: E-paper—Big Data’s Big Governance Impact (2017)
12. E-governance Development Index survey report from UN DESA (2018). https://drive.google.com/file/d/1FZT5zDfTa-ejvPh9c1Zu1w51DoMOefw1/view
13. E-governance Development Index survey report from UN DESA (2016). https://drive.google.com/file/d/1C-wGuGkLEIY4pwM-cO7Nv2xjM2_IvJbI/view
14. E-governance Development Index survey report from UN DESA (2014). https://drive.google.com/file/d/1BrSZ7zfsPGLd6t6AiHyynsZLhCEFZjmY/view
15. E-governance and Digital India Empowering Indian Citizens Through Technology, September
2015
16. Chap 8—E-governance in India: Initiatives, Opportunities and Prospects to Bridge the Digital Divide
17. Mohanty, P.K.: Using e-Tools for Good Governance and Administrative Reforms. Academia
18. Mission Mode Projects (MMPs) of India: Ministry of Electronics and Information Technology.
Government of India
19. National Portal of India: www.india.gov.in
20. UID: http://www.censusindia.gov.in/
21. Income Tax Department: www.incometaxindia.gov.in
22. Central Excise and Custom: http://www.cbec.gov.in/
23. Insurance: http://financialservices.gov.in/
24. Agriculture: http://agricoop.nic.in/
25. Commercial Taxes: https://dor.gov.in/
26. Ministry of Human Resource Development: www.mhrd.gov.in, www.edudel.nic.in
27. E-municipalities: http://tte.delhigovt.nic.in/wps/wcm/connect/doit_udd/Urban+Development/Home

28. Digitization of land records: http://noidaauthorityonline.com/land-record.html


29. Employment exchange: www.labour.nic.in/
30. Road Transport: http://morth.nic.in/ and http://meity.gov.in/
31. E-procurement: https://commerce.gov.in/
32. Electronic Data Interchange (EDI) for Trade (eTrade): https://commerce.gov.in/
33. E-Biz Homepage: http://dipp.nic.in/
34. Sachdeva, S.: White Paper on E-Governance Strategy in India (2002)
Implementation of Voice Controlled Hot
and Cold Water Dispenser System Using
Arduino
K. Sateesh Kumar, P. Udaya Bhanu, T. Murali Krishna, P. Vijay Kumar,
and Ch. Saidulu

Abstract The water dispenser is a system which can be used to dispense drinking
water at various work and commercial places. Due to extensive usage among the public,
the demand for these water dispensers is increasing day by day. Health-conscious
users, or users with particular preferences, prefer hot or cold water. Even though a
plethora of water dispensers is available in the market, there is still scope to improve
their performance. Existing choice-based, microcontroller-unit-based water dispenser
systems face common problems: they are button operated, and water is wasted through
overflow when no glass/container is present. In this paper, these problems
are addressed with a hardware design. A novel voice-controlled water dispenser is
proposed that maintains choice-based dispensing with voice control using an Arduino
Nano. This system also avoids the wastage of water.

Keywords Arduino · Microcontroller unit · Voice controlled · Water dispenser

1 Introduction

The water dispenser is a system which can be used to dispense drinking water
at various work and commercial places from schools to corporate workplaces
including hospitals. Due to extensive usage among public, the demand for these

K. Sateesh Kumar · P. Udaya Bhanu (B) · T. Murali Krishna · P. Vijay Kumar · Ch. Saidulu
Department of ECE, Vignan’s Lara Institute of Technology and Science, Vadlamudi, AP, India
e-mail: udayabhanu.potu@gmail.com
K. Sateesh Kumar
e-mail: sateeshkumarkanagala@gmail.com
T. Murali Krishna
e-mail: murali22061999@gmail.com
P. Vijay Kumar
e-mail: pusuluriv99@gmail.com
Ch. Saidulu
e-mail: saidulu.ch786@gmail.com


water dispensers is increasing day by day [1, 2]. These dispensers are used to dispense
water at low (cool), normal (room temperature), and high (hot) temperatures with
different microcontroller-based circuits/designs. Even soft drinks are offered with this
technology. Particularly in pandemic situations, people all over the world are
very conscious about their health, especially about drinking water. On the other
side, embedded systems are rapidly developing to address various real-time issues
in our day-to-day life [2]. This creates a large market for innovative and
smart water dispensers on a global scale as an application of embedded systems.
With this, we propose an Arduino Nano-based voice-controlled hot and cold water
dispenser system.
This paper is organized as follows. Section 1 presents the introduction to the
research problem, followed by the background in Sect. 2. A detailed discussion of
the proposed method is presented in Sect. 3. The algorithm of the proposed method is
presented in Sect. 4. Results are presented in Sect. 5, and the conclusion follows.

2 Background

An insight into the background is presented in this section. As an impact of the
digitalization process, all household appliances, from basic needs to entertainment,
are being transformed into a smart mode. This is extensively applicable to water dispensers.
In the early stage, water dispensers were equipped with heating coils and controlling
circuits which were analog in nature. With the development of embedded systems,
microcontroller-based units have replaced these existing analog systems [3]. Perfor-
mance-wise, heating of water, temperature control, speed, and accuracy
are good in these digital systems [4, 5]. The standard microcontroller
unit (MCU) is efficient in power management too [6]. ATMEL family-based water
dispensers can also supply hot water through specially designed outlets made of
copper or steel [7] and have a mixing mechanism to produce water at a required
temperature level [6]. Apart from the basic functioning, water dispensers are also
getting smarter with interesting features like mixing of water to get the required
temperature, monitoring of water level, mineral level in the water, quality of water,
and sensor-based ON and OFF, etc., with the Internet of Things (IoT) [7, 8]. Most
embedded machines are operated with a power supply; these are better
than power-free water dispensers [9]. The motive of this paper is to design a smart
system that avoids wastage of water and is more user-friendly, rather than to study the
chemical attributes of water [4].
The existing water dispenser [9] was implemented with an AT89S52 microcontroller
programmed in embedded C. This microcontroller unit can handle and interface with the I/O
modules. Whenever the user presses a particular button on the dispenser, it sends
the information to the microcontroller, which is connected to DC motors. According
to the information received as input, the corresponding motors activate and
dispense the water. The heating coils present in the dispenser heat the
water, and the temperature sensor senses the temperature
of the water tank. Power is supplied to the heating coils to heat the water.
Some of the drawbacks of the existing system are as follows.
• Water overflow occurs at the dispenser when no glass is present.
• The user has to keep operating the dispenser buttons until the required
water level is reached.
These drawbacks are addressed and resolved in the proposed voice-based water
dispenser system, which uses more advanced hardware with higher computational power
than the existing one.

3 The Proposed System

The block diagram of the proposed Arduino Nano-based water dispenser system is
represented in Fig. 1.
The proposed voice choice-based water dispenser system works with an Arduino
Nano board, which is more advanced than the Arduino Uno [5, 10, 11]. It is
an improved version of the existing 89S52-based system in hardware design and
functionality (software). The novelty of the proposed system over the existing one is
that there is feedback between the user (input) and the outlet (output), which reduces
the wastage of water and improves the performance of the system. The proposed
water dispenser is described in two parts: (i) hardware and (ii) software.

Fig. 1 Block diagram of the proposed water dispenser using Arduino (the Arduino Nano takes inputs from the voice command module, crystal oscillator, power supply, and temperature sensor, and drives the hot and cold relay/pump motors, the IR sensor interface, and the LCD display)



3.1 Hardware

The heart of the hardware section of the water dispenser is the Arduino Nano, which
replaces the AT89S52, whose performance is lower than that of the Arduino.
Arduino Nano: The Arduino Nano is a product of Arduino. It is a flexible, advanced
microcontroller board with broad connectivity support. It offers the functionality of the
earlier UNO version in a smaller size. The Arduino Nano can produce analog and digital
outputs to control the peripherals.
Power supply: The power supply required for this water dispenser is a maximum of
12 V.
Voice recognition module (VR-3): In this voice recognition module, the voice(s) of the
user are recorded and stored. When the user gives a command again, it is compared with
the database, and the response, either hot or cold as opted by the user, is given.
Crystal oscillator: Crystal oscillator is used to provide the clock to Arduino board.
This is assembled on the board itself. The crystal frequency is in the order of 6 MHz.
Relay: The relay is an electromechanical device that acts as an automatic switch to drive
the pump motor drivers (L298N) and dispense the chosen water through the outlets.
Separate relays are used for the hot and cold outlets.
IR sensor: The IR sensor is used to identify the presence of the container or glass at
the outlet by transmitting and receiving infrared signals. The water level is also measured
using the IR sensor.
LCD display: In the proposed method, a 16 × 2 LCD is used to display the status
of the container, the status of the water being dispensed, and any error messages. The
outputs of the LCD display are discussed in the results section.
Temperature sensor: The temperature sensor senses the temperature of
the hot and cold water tanks. Mixing of water is also possible based on the user's
choice.

3.2 Software (Arduino 1.8.10)

The Arduino Integrated Development Environment (IDE) is used to program the
proposed water dispenser in C++. This IDE provides the environment to store and
process the voice commands given by the user.

4 Algorithm of the Proposed Method

The algorithm of the proposed method is presented in this section; an illustrative code sketch follows the listing.


1. Power on the unit; it displays "Please place glass" on the LCD.
2. If the glass is not detected by the IR sensor, it keeps displaying "Please place glass"
on the LCD.
3. If the glass is detected by the IR sensor, it displays "Glass detected, give your
input" on the LCD.
4. Now, the user can give a voice command through VR3. This recorded voice is
compared with the database.
5. If the voice matches the database, the voice command goes to the Arduino
module.
6. The Arduino dispenses the water, i.e., HOT or COLD, based on the choice.
7. Check whether the glass is full or not using the IR sensor.
8. Dispense the water continuously until the glass is full.
9. If the glass is full, display the message "Take your water" on the LCD.
10. Stop dispensing the water.
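The listing below is a minimal host-side sketch of this flow in Python, intended only to make the control loop concrete. The actual firmware described in this paper runs on the Arduino Nano and is written in C++ with the Arduino IDE; all hardware reads here (IR sensor, VR3 module, relays) are replaced by hypothetical stub functions.

```python
# Minimal simulation of the dispensing flow; all hardware interfaces
# (IR sensor, VR3 voice module, relays) are hypothetical Python stubs,
# not the actual Arduino firmware described in this paper.
import itertools
import time

_level = itertools.count()

def glass_present():          # stub for the IR "container present" check
    return True

def glass_full():             # stub for the IR water-level check
    return next(_level) > 3   # pretend the glass fills after a few polls

def read_voice_command():     # stub for the VR3 voice-recognition lookup
    return "HOT"              # the module would return "HOT" or "COLD"

def set_relay(outlet, on):    # stub for driving the hot/cold relay and pump
    print(f"{outlet} relay {'ON' if on else 'OFF'}")

def dispense():
    print("Please place glass")                 # steps 1-2
    while not glass_present():
        time.sleep(0.5)
    print("Glass detected, give your input")    # step 3
    command = None
    while command not in ("HOT", "COLD"):       # steps 4-5
        command = read_voice_command()
    set_relay(command, True)                    # step 6: start the chosen pump
    while not glass_full():                     # steps 7-8: fill until full
        time.sleep(0.2)
    set_relay(command, False)                   # step 10: stop, avoiding overflow
    print("Take your water")                    # step 9

if __name__ == "__main__":
    dispense()
```

The key design point the sketch illustrates is that dispensing starts only after the container check and stops on the level check, which is what removes the overflow problem of the button-operated system.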

5 Results

This section provides a brief discussion of the results with the help of the corresponding
screenshots. Figure 2 shows the overall design of the proposed voice-based smart
water dispenser using Arduino and the message "Please place Glass". Figure 3 shows
the message "Glass detected give voice input", and Fig. 4 shows the display of
"HOT water" and "Cold water" on the LCD after the voice command is accepted from
the user.

Fig. 2 Overall design of the proposed water dispenser using Arduino



Fig. 3 Displaying “Glass detected give the voice”, when system is on

Fig. 4 Display of message “Hot water” and “Cold water” in LCD

6 Conclusion

A voice-controlled water dispenser using Arduino Nano is proposed with more
user-friendly accessibility. This system is an improved version, in hardware and
functionality, of the existing 89S52 model. Avoidance of water wastage is an added advan-
tage of this system, which makes it well suited for home and commercial areas.
It needs less maintenance than power-free water dispensers. The proposed
Arduino-based system can save water resources by implementing an IR sensor-based
container detection and overflow detection mechanism.

References

1. Huang, P.P.: The effect of different temperature water intake to the resting heart rate variability.
Department of Physical Education, Fu Jen Catholic University, Magisterial Thesis (2005)
2. Reverter, F., Gasulla, M., Pallàs-Areny, R.: Analysis of power-supply interference effects on
direct sensor-to-microcontroller interfaces. IEEE Trans. Instrum. Meas. 56(1), 171–177 (2007)
3. Jinxiong, X., Dong, Z., Yuying, W., et al.: A design of energy-saving drinking dispenser based
on fuzzy memory control. J. Inspection Quarantine 20(3), 30–33 (2010)
4. Huang, J., Xie, J.: Intelligent water dispenser system based on embedded systems. In: Proceed-
ings of 2010 IEEE/ASME International Conference on Mechatronic and Embedded Systems
and Applications, pp 279–282. Qingdao (2010)
5. Cheng, W.Z., Cheng, R.Z., Shuo-Yanchou: Power saving for IoT enabled water dispenser
system. In: 2019 42nd International Conference on Tele Communications and Signal Processing
(2019)
6. Huang, C.J., Tsai, F.T.: Research and development of a practical water dispenser. In:
International Conference on Applied System Innovation (ICASI), pp. 1225–1228. Sapporo
(2017)
7. Zhongren, C., Fangjing, C., Yanfeng, Z.: Development and application of an external intelligent
power saver for drinking water dispenser. Shan Xi Electronic Technology (2012)
8. Smart Systems and IoT: Innovations in Computing. Springer Science and Business Media LLC
(2020)
9. Ariffin, S.H.S., Baharuddin, M.A., Fauzi, M.H.M., Latiff, N.M.A., Yusof, S.K.S., Latiff,
N.A.A.: Wireless water quality cloud monitoring system with self-healing algorithm. In:
2017 IEEE 13th Malaysia International Conference on Communications (MICC), pp. 218–223
(2017)
10. Yen, Y., Chou, Z., Hou, M., Wang, X.: The design of intelligent water supply device based on
MCU. In; 2015 IEEE 5th International Conference on Electronics Information and Emergency
Communication, pp. 388–391. Beijing (2015)
11. Aisuwarya, R., Hidyathi, Y.: Implementation of Ziegler-Nichols PID tuning method on stabi-
lizing temperature of hot-water dispenser. In: 2019 16th International Symposium on Research
(QIR), International Symposium on Electrical and Computer Engineering (2019)
Future Smart Home Appliances Using
IoT

Pattlola Srinivas, M. Swami Das, and Y. L. Malathi Latha

Abstract The Internet of Things (IoT) consists of physical devices and objects that
collect, store, and analyze data. In-home appliances have traditionally involved manually
operated activities and functions. IoT uses technical product elements, and IoT-enabled
in-home appliances are bringing rapid changes in society. IoT systems are designed,
developed, controlled, and monitored in various applications such as health, transport,
agriculture, home appliances, etc. We propose a framework model for future smart home
appliances using IoT, which helps the developer to build home automation infrastructure
and applications according to the user specifications and requirements. The proposed
model is the best solution for using smart home applications with sensors, communication,
smart home operations, and control through mobile apps and Arduino. The system will
provide security and smart home automation. In the future, it will be extended to develop
intelligent smart applications with an integrated environment and reporting applications.

Keywords IoT · Home security · SmartPhone · Smart home appliances · Automation

1 Introduction

The Internet of Things is a network of physical devices, objects, and sensors with network
connectivity, which are used to collect, exchange, store, and analyze data.
IoT applications in home automation feature energy management, protection, and safety.

P. Srinivas (B) · M. Swami Das


Department of CSE, Malla Reddy Engineering College (Autonomous), Hyderabad, Telangana
State, India
e-mail: pattlolasrinivas@gmail.com
M. Swami Das
e-mail: msdas.520@gmail.com
Y. L. Malathi Latha
Department of CSE, Swami Vivekananda Institute of Technology, Patny Center, Secunderabad,
Telangana State, India
e-mail: malathilatha_99@yahoo.com


Around 1990, home automation relied on the Internet and connected devices; in 2000,
home network systems used smartphones with apps for remote monitoring; in 2010,
smart home applications based on IoT and AI technologies were used for context-aware
systems; and now, in 2020, intelligent smart home appliances use IoT, AI, and machine
learning to record, store, and analyze data and to provide remote control and access
according to the context. This market is projected to reach US $137.9 billion by 2023.
Home appliances are operated by smartphone with Wi-Fi as the communication medium.
The Internet of Things is a new technology used in various important applications like
smart homes, health, energy servers, defense, monitoring, transport, traffic manage-
ment, infrastructure management, and the water and building environment. IoT
consists of components, networks, and sensors that can be integrated to read, store, and
analyze information. Essential IoT technologies are WSN, RFID, middleware, cloud
computing, and IoT software applications.
IoT services are required in various applications worldwide. According to world
statistics, about 20.4 billion IoT devices were in use in 2020, and 64 billion IoT devices are
expected to be in use by 2025. With the growing popularity of home systems, security is
most important; IoT-based security helps guarantee the availability of services. Today,
smart home technology advancements are mostly used for general human aid and
intelligent smart home IoT services. IoT is changing human life; home appliances are
used at home and in the office, in every domestic space: lighting, dishwashers, gardening,
air conditioning, etc. Sensor-controlled smart devices use smartphones or tablets
with a Wi-Fi connection to collect the sensor data, which allows the data to be read,
stored, and analyzed. Gardening uses automatic sprinklers in the smart automation
infrastructure of the house, with Wi-Fi or Bluetooth as the communication medium. In 2020,
about 20 billion devices are connected in healthcare, and advanced IoT products are
used in home appliances and house automation. IoT is bringing rapid changes
in society and widening the scope of such devices. Future home appliances such as TVs,
lighting, heating, and refrigeration will use IoT systems, with devices connected and
communicating efficiently.
The paper is organized as follows: Sect. 2 describes the literature survey, Sect. 3 describes
the proposed model, Sect. 4 presents the discussion, and Sect. 5 gives the
conclusion and future scope.

2 Literature Survey

2.1 Related Work

Lee et al. [1] emphasize the essential elements, products, and services of IoT
technology. Kumar Mandula et al. [2] discussed IoT-based applications in health care,
home automation, etc., using microcontrollers and mobile apps. Mussab Alaa et al.
[3] reviewed 229 articles related to IoT and technological advancements in smart home
applications, apps, and IoT databases, and classified the papers into an IoT smart
home survey. Vignesh et al. [4] proposed a home automation model that accesses and
controls devices remotely through smartphones with the use of WSN and cloud networking
from remote locations.
Timothy Malche et al. [5] proposed an architecture that uses sensors for environment
alerting, monitoring, control, and intelligence in smart home applications using the Frugal
Laboratories IoT (FLIP) architecture. Swetha et al. [6] studied systems to monitor
electrical appliances in smart homes, such as lights and fans, using sensors and the Internet.
According to Min Li et al. [7], smart home applications are an important
part of smart grid usage, in which users respond to services in designing a smart
home with electricity service. Petnik et al. [8] proposed a cloud-based home care
service with an integration layer. Heetae Yang et al. [9] studied smart home service
functions; the authors collected 216 samples from Korea covering personal characteris-
tics and behavior. Jian Mao et al. [10] studied IoT functionality and security with
machine learning algorithms, which play a most significant role in smart home
systems. Hana Jo et al. [11] studied smart home IoT-related technology which inte-
grates devices and organizes each device in a network to perform activities.
Majid Al-Kuwari et al. [12] proposed smart home automation with the use of IoT-based
sensing and monitoring, controlling the smart home with intelligent automation
through design, sensing, and monitoring. Shradha Somani et al. [13] proposed IoT-
based smart security and home automation using a smart home that
uses software, sensors, and actuators. Ahmed et al. [14] studied IoT quality assur-
ance; IoT applications are growing in various domains such as security, e-health, smart
cities, and defense. Batalla et al. [15] proposed an architecture to provide
security and availability. According to Khalaf et al. [16], smart home activities are controlled
using IoT sensors, processing, and applications.

2.2 Problem Definition

To design and develop a model for an IoT-based smart home appliance system with
automation activities based on sensors, data processing, and a control and monitoring
system in the smart home environment.

3 Proposed Model

The architecture mainly consists of users, devices, network communication, control-
ling, and application services. The proposed system model is shown in Fig. 1: an IoT-
based home application and reporting system covering the design and development of
infrastructure and application services. The architecture is at an initial stage, in which the
developer will need to investigate the needs, design the features of the IoT home automation
appliance system, and collect different specifications according to the operations.

Fig. 1 IoT-based home application and reporting system (users interact through a smartphone/mobile app; Arduinos, sensor networks, and a control and monitoring layer drive IoT home appliances such as smart light, refrigerator, door, security, smart speaker, smart TV, and temperature)

Table 1 Smart environment applications

Technology    Elements
Network size  Small, medium, and large
Users         Home users
Energy        Rechargeable battery
Internet      Wi-Fi, Bluetooth
Data          Local, sensor, and remote data
IoT devices   Smart mobile, RFID, protocols, and apps
Analysis      Data storage, analysis, and reporting

The IoT elements include hardware, middleware, storage, computing tools,
data analysis, and visualization. The system is used by mobile and remote users; IoT
home appliance users are the stakeholders of the user functions [17, 18]. The IoT home
appliances will use the smart environment technology elements described
in Table 1.
In IoT, sensors collect data and communicate it to IoT applications
according to the sensor information and user operations. The sensors are used for
data management, processing, and analysis in home health, entertainment, etc.,
processing events based on the information and running actions in services. The system's
components make use of sensors, which collect the data; for example, sensors will detect
light, refrigerator, door, security, smart speaker, smart TV, temperature, water level,
air quality, video, sound, pressure, humidity, infrared, vibration, and ultrasonic signals. The
communication system will use gateways, protocols, firmware, and networks (for example,
RFID, Wi-Fi, WSN, and satellite) in home appliances to collect the information; home
automation using IoT uses software, hardware, and networking. Automation uses sensors,
protocols, hardware, software, apps, and communication protocols. Application services
in IoT smart home appliances consist of home safety, gardening management and security,
air and water quality monitoring, voice assistants for natural language, smart watches,
smart locks, and smart energy meters, all depending on the smart home IoT home
automation gateway. A smart mobile provides communication and control functions,
communicating with the Arduino for control and monitoring.
Home automation will provide functions according to the stakeholders' specifications,
with characteristic features like house automation functions and security services.
IoT is emerging in personal, home, and enterprise settings, with utilization of mobile
operations for data collection, keeping track, home maintenance application services, and
optimization. Home automation offers the flexibility to automate the use of IoT home
appliances. Open-source IoT platforms such as Home Assistant, Domoticz, and OpenHAB
use IoT device security with message queuing, administration of devices, data collection,
analysis, visualization, and integration with services. Other IoT applications are transport,
agriculture, production, environment, industrial, safety, and retail. Network-connected
objects will provide security and utilization of applications.

3.1 Sensors and Network Communication

The availability of the services of the proposed architecture relies on efficient use of
technologies like CCTV, door sensors, and smart lock sensors, with a gateway collecting
the data using communication protocols and sending alert or alarm warning information
to the users. Security, availability, and response according to events are required, and
providing security and privacy is an important task in smart homes [19].

3.2 Smart Home—Phone and Notification Functions

The proposed smart IoT home system will provide various functions according to the
user specifications, with smart control through mobile phones: IoT operations such as
electric light On/Off, refrigerator On/Off, door Open/Closed, security On/Off, smart
speaker On/Off, smart TV On/Off, and temperature On/Off. In addition to home appliance
control using a smartphone, notifications are sent through Email, SMS, and other channels,
and other applications like solar power systems and smart parking are supported
according to the user specifications.

3.3 Smart Home—Automation

Home IoT applications provide home automation through network service providers
with quality of service. Standard network traffic management and security protocols
such as Bluetooth optimize data transmission using channels like ZigBee and 5G
networks to automate the response of actions in home automation: signals from
sensors pass through interfaces and controllers, which send commands to the actuators (outputs).

It uses a residential gateway with sensors for light and temperature, together with a
smartphone interface for functional control of actuators (i.e., light home automation based
on WSN, ZigBee, and Wi-Fi technology) and an Android-based smartphone for the
functionalities of smart home systems [20]. House automation systems are connected
to the Internet, which communicates with the user. Sensors help the user by providing
security: terminals, alarms, and records with sensor data like video and door
interfaces, together with efficient use of energy. The security system uses automatic
doors, safety, and alarm systems [21]. The home assistant process and home controlling
include collecting, storing, and analyzing data. Home automation systems
trigger commands based on configuration; smart home triggers are based on the past
behavior of a user, and apps are used to control devices from mobile phones and
tablets. The model presents a scenario to design, control, and monitor
smart-based home systems [22, 23].

3.4 Smart Home—Controlling and Monitoring with Smart Mobile Apps and Functions

Home automation uses IoT-based sensing and monitoring platforms that have sensors
and signals for smart home automation, communicating over Wi-Fi, Bluetooth, and other
media. The functional principle follows Algorithm 1; a minimal code sketch is given after
the listing. Sensors and devices read data for home security, smart home interior design,
intelligent lighting, hardware (Arduino board), software IoT design, residential connectivity,
and home ecosystem devices, components, software, and sensors to monitor and set up
the functions of smart home technology, including choosing a site for smart home
construction. The architecture design provides all amenities using IoT-based smart home
technology according to the user specifications; some of the functions are energy
monitoring, health, smart parking, and smart gardening. Operations and applications are
controlled according to the sensor data through mobile app functions, for example,
light on/off.

Algorithm 1 // Pseudocode for home appliances
// Input: sensor data and mobile interface function commands (ON/OFF)
// Output: appliance state, e.g. light ON/OFF
Begin
Step 1. Read the input data from the sensors and set up the communication credentials
Step 2. Use the Arduino board console with the smartphone app to control and operate the
functions according to the sensor data
Step 3. Set up the functions and operate the home appliances using the sensor data over the
communication channel
Step 4. Control the signals using the sensors and switch the home appliances with the required
operations on/off
Step 5. Report the information to the user
End
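A minimal Python sketch of Algorithm 1 is given below for illustration only; the appliance names, sensor readings, and the simple light rule are hypothetical placeholders, and the real system would run this logic through the Arduino and its sensor and relay interfaces.

```python
# Host-side illustration of Algorithm 1; sensor values, appliance names,
# and the rule thresholds are hypothetical placeholders.
appliances = {"light": "OFF", "refrigerator": "OFF", "door": "CLOSED",
              "security": "OFF", "smart TV": "OFF"}

def read_sensors():
    # Step 1: collect sensor data (here a fixed, made-up reading)
    return {"ambient_light": 12, "door_contact": "CLOSED", "temperature": 31}

def apply_command(name, state):
    # Steps 2-4: a mobile-app command or a sensor rule switches an appliance
    if name in appliances:
        appliances[name] = state
        print(f"{name} -> {state}")

def report(sensors):
    # Step 5: report the current status back to the user (e.g. app or SMS)
    print("sensors:", sensors)
    print("appliances:", appliances)

sensors = read_sensors()
if sensors["ambient_light"] < 20:    # example rule: dark room -> light ON
    apply_command("light", "ON")
apply_command("smart TV", "ON")      # example of a direct app command
report(sensors)
```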
Developing a smart home involves home safety, monitoring of the required functions, smart
parking, and video surveillance; the smart home uses hardware, software, and sensors
and utilizes household applications that optimize home application systems through
low-cost maintenance and energy savings (i.e., remote adjustments and improved efficiency).
Example applications are lighting (light turn on/off), stove turn on/off, water on/off,
video monitoring (video conferences with family and friends), and security by video surveil-
lance control with automatic alerts, SMS, and detection. Preventive decision systems
like healthcare applications will use Bluetooth technology to measure blood pressure and
temperature, provide safety, alarms, and activity monitoring, and direct exercises, diet,
food, and preventive measures, reporting the internal functions of daily life activities, SMS,
alerts, and functional operations to the stakeholders [24, 25].

4 Discussion

The IoT-based smart home will improve efficiency: automated home lighting and
energy management through effective use of mobile applications that can
control air conditioning and security with wireless technologies. IoT prob-
lems include data management, home care systems, home appliance consumption,
home safety, home security, device connectivity, and software reliability. The system
uses networking, hardware, software, communication protocols, switches, and sensors,
with interfaces between them to provide security, planning, configuration, and
monitoring, and connectivity for home applications with time-sensitive networking
covering end-to-end communication, time synchronization, latency, reliability, resources,
features, options, configurations, and protocols. Smart home recommendations include
reduced energy usage, warnings of defective appliances at home, smartphones in health
care (medical guidelines and health assistants), fire systems, security systems, home safety,
security, connectivity, and scientific analysis. A smart home is a combination of sub-
systems based on advanced technologies. The smart home interacts with the user
and enhances safety, interaction, and convenience, and optimizes people's lifestyles.
It supports remote operations: the user can monitor and interact with the home through a
mobile device while working remotely. The smart home realizes real-time meter reading
with security, and the timing process supports network, business, and intelligent services
like home air conditioning, power information services, home appliances, and electricity
management. In the future, smart home applications can be extended to communicating
service applications, where intelligent interactive terminals let users operate commands
and interact with smart home appliances and security equipment systems.

5 Conclusion and Future Scope

Sensor networks are used for measurements that help understand the environment, natural
resources, and urban environments; the IoT and the future Internet are shaped by user
applications. Pervasive communication, IoT, and smart connectivity enable computer systems
to interact with objects and sensors using smart devices (smartphones, smart
watches). The Internet of Things helps humans in home automation and household oper-
ations that respond with actions. The proposed model will help developers of
smart home appliances using IoT, which use sensors, hardware, RF transmitters and
receivers, user interfaces, processors, data collection, analysis, and
reporting systems, and effective utilization of the required functional services.
In the future, smart tags will be used for logistics and vehicles in heterogeneous systems,
and smart traffic will automatically use intelligent traffic and intelligent applica-
tions. In the future Internet and the worldwide Internet of Things, many objects, smart
connectivity, network resources, smartphones, devices, and intelligent environ-
ments will be integrated with smart home appliances; future home appli-
ances will be used to identify suspicious activity, detect it by video
surveillance, and report events in critical conditions.

References

1. Lee, In, Lee, Kyoochun: The Internet of Things (IoT): applications, investments, and challenges
for enterprises. Bus. Horiz. 58, 431–440 (2015)
2. Kumar, M., Ramu, P., Murty, C.H.A.S., Magesh, E., Lunagariya, R.: Mobile-based home
automation using Internet of Things (IoT). In: 2015 (ICCICCT). IEEE, pp. 340-34 (2015)
3. Alaa, M., Zaidan, A.A., Zaidan, B.B., Talal, M., Kiah, M.L.M.: A review of smart home
applications based on Internet of Things. J. Netw. Comput. Appl. 1–36 (2017). Elsevier
4. Vignesh, G., Sathiya Narayanan, M., Abubakar, B.: Customary Homes to Smart Homes Using
Internet of Things (IoT) and Mobile Application. IEEE, pp. 1059–1063 (2017)
5. Malche, T., Maheshwary, P.: Internet of Things (IoT) for building smart home system. In:
International Conference on I-SMAC (IoT in Social, Mobile, Analytics, and Cloud) (I-SMAC
2017). IEEE, pp. 65–70 (2017)
6. Swetha, S., Suprajah, S., Vaishnavi Kanna, S., Dhanalakshmi, R.: An intelligent monitor system
for home appliances using IoT. In: International Conference on Technical Advancements in
Computers and Communications. IEEE, pp. 106–109 (2017)
7. Li, M., Gu, W., Chen, W., He, Y., Wu, Y., Zhang, Y.: Smart home: architecture, technologies
and systems. In: ICICT-2018. Elsevier, pp. 393–400 (2018)
8. Petnik, J., Vanus, J.: Design of Smart Home Implementation Within IoT with Natural Language
Interface. Elsevier, pp. 174–179 (2018)
9. Yang, H., Lee, W., Lee, H.: IoT smart home adoption: the importance of proper level automation.
Hindawi. J. Sens. 1–11 (2018)
10. Mao, J., Lin, Q., Bian, J.: Application of learning algorithms in smart home IoT system security.
Math. Found. Comput. 63–76 (2018)
11. Jo, H., Yoon, Y.I.: Intelligent smart home energy efficiency model using artificial TensorFlow
engine. Hum. Cent. Computer. Inf. Sci. 8(9), 1–8 (2018)
12. Al-Kuwari, M., Ramadan, A., Ismael, Y., Al-Sughair, L., Gastli, A., Benammar, M.: Smart-
home automation using IoT-based sensing and monitoring platform. In: 2018 IEEE 12th (CPE-
POWERENG 2018), Doha, pp. 1–6 (2018)

13. Somani, S., Solunke, P., Oke, S., Medhi, P., Laturkar, P.P.: IoT based smart security and home
automation. In: 2018 Fourth International Conference on Computing Communication Control
and Automation (ICCUBEA), Pune, India, pp. 1–4 (2018)
14. Ahmed, B.S., Bures, M., Frajtak, K., Cerny, T.: Aspects of quality in the Internet of Things
(IoT) solutions: a systematic mapping study. IEEE Access 7, 13758–13780 (2019)
15. Batalla, J.M., Gonciarz, F.: Deployment of the smart home management system at the edge:
mechanisms and protocols. Neural Comput. Appl. 31(1301–1315), 1301–1315 (2019)
16. Khalaf, R., Mohammed, A., Essa, E., Ali, H.: Controlling smart home activities Using IoT. In:
2019 International Conference on Computing and Information Science and Technology and
Their Applications (ICCISTA), Kirkuk, Iraq, pp. 1–6 (2019)
17. Bhat, O., Bhat, S., Gokhale, P.: Implementation of IoT in smart homes. Int. J. Adv. Res. Comput.
Commun. Eng. 6(12), 149–154 (2017)
18. Shah. H.: Home Automation Using IoT. https://www.simform.com
19. Batalla, J.M., Gonciarz, F.: Deployment of the smart home management system at the edge:
mechanisms and protocols. Neural Comput. Appl. 31, 1301–1315 (2019). https://doi.org/10.
1007/s00521-018-3545-7
20. https://www.slideshare.net/shohin/iot-home-automation-using-arduino-cayenne
21. https://www.businesswire.com/news/home/20200102005197/en/Prominent-IoT-Technology-
Leader-Showcase-Newest-Must-Have
22. Cheruvu, S., Kumar, A., Smith, N., Wheeler, D.M.: Demystifying Internet of Things security
successful IoT Device/Edge and Platform Security Deployment. Springer, pp. 347–411 (2020)
23. Linskell, J., Dewsbury, G.: Home automation system. Science Directory. In: Handbook of
Electronic Assistive Technology. Elsevier (2019)
24. Kadima, M.N., Jafari, F.: A customized design of smart home using Internet of Things. In:
ICIME2017. ACM, pp. 83–86 (2017)
25. Gubbi, J., Buyya, R., Marusic, S., Palaniswami, M.: Internet of Things (IoT): a vision, architectural
elements and future directions. Future Gener. Comput. Syst. 29, 1645–1660 (2013)
Multilingual Crawling Strategies
for Information Retrieval from BRICS
Academic Websites

Shubam Bharti, Shivam Kathuria, Manish Kumar, Rajesh Bhatia,


and Bhavya Chhabra

Abstract This paper proposes a web crawler for finding details of Indian origin
academicians working in foreign academic institutions. While collecting the data of
Indian origin academicians, we came across BRICS nations. In BRICS, except South
Africa, all other countries have university websites in native languages. Even if the
English version is available, it contains less data, from which it cannot be decided
whether an academician is of Indian origin or not. This paper proposes a method for
translating the data from the main website in the native language into the English language.
It is to be noted that Google translation of such websites does not give output in the
desired manner. We explore the area of translation using various APIs as well as
other techniques available for the task, like UNL, NER (which plays a supportive role
in translation), NMT, etc. Also, we will explore the Stanford NER and segmenter for
these operations.

Keywords Web crawler · Indian origin academicians · BRICS · Language translation

S. Bharti (B) · M. Kumar · R. Bhatia


Department of Computer Science and Engineering, Punjab Engineering College, Chandigarh,
India
e-mail: shubambharti1998@gmail.com
M. Kumar
e-mail: Manishkamboj3@gmail.com
R. Bhatia
e-mail: rbhatiapatiala@gmail.com
S. Kathuria
Department Electrical Engineering, Punjab Engineering College, Chandigarh, India
e-mail: shivamkathuria9@gmail.com
B. Chhabra
Department of Computer Science and Engineering, SRM Institute of Science and Technology,
Chennai, India
e-mail: Bhavya1600@gmail.com


1 Introduction

The heart of any search engine is the data collected by its web crawler. A web
crawler can be defined as a program that browses the Internet, storing links to
and information about the pages it visits. There is a type of web crawler that
focuses only on web pages which are relevant according to a pre-defined set of
keywords/topics; these are called focused crawlers [1].
For finding information about Indian origin academicians [2], various constraints
were devised and different educational domains were studied. After this, a relevancy-
checking-based mechanism was developed that can guide the focused crawling.
Our crawler ran successfully for all the countries in which the
academic websites are in English, but when we tried to apply the same approach to
non-English websites, problems arose.
This paper discusses the problems that arose while crawling the BRICS
nations, which have a high number of Indian origin academicians. These
websites are mostly in their native languages and have to be translated via the Google
Translate API, which is handled automatically by browsers but not by web
crawlers, which directly visit the seed URL provided. This process yields very poor
results, as the keywords used to train the crawler were in the English language.

2 Problem Statement

A. Language problem for academic websites of BRICS Nations

BRICS Nations—BRICS is an association of 5 nations, namely Brazil,
Russia, India, China, and South Africa. As we need data only from non-Indian
nations, India is excluded.
For South Africa, the language of the university websites is English, so there is no
need for translation. For Brazil, the official language of most websites is Portuguese,
and some are in Spanish. Both these languages leave the names unchanged, and
hence extracting the data needs no translation there either.
For China, all the academic websites are in Chinese, and hence the names also
need to be translated. But here, mere Google translation does not work, as it
sometimes gives the literal meaning of a word.
B. Extracting data from the translated text

From the converted texts derived from webpages, the required data needs to be extracted.
The data to be fetched includes anything that might prove useful for the purpose of
the project, like name, department name, specializations, contact, etc. For Chinese
websites, recognizing the above-mentioned data from a translated webpage is difficult.
Hence, we need to handle this challenge as well.

Hence, we aim to enhance the current crawler [2] with the capability to
handle these different languages, converting them to English based on the translation
mechanism needed for that particular instance.

3 Techniques Used

3.1 UNL: Universal Networking Language

UNL is a technique in which the given words/sentences are converted into Universal
words and vice-versa. Thus, it focuses on taking a source language and converting
it into a language which is independent of the given languages, whether source or
target. As shown in Fig. 1, the system of UNL consists of the two core processes,
namely UNLization and NLization, which are explained below.
These two processes are explained in detail.
(1) UNLization (using IAN): The Interactive Analyser (IAN) is a Java-based tool.
It is a web application used for the process of UNLization. Its input is a natural
language (the source language), and it converts the given words into the UNL form,
which is language independent.
(2) NLization (using EUGENE): This is an online software tool developed
by the UNDL organization. It is similar to IAN and was released in 2012 [3]. The
UNL created in the UNLization process is given as input to it.
UNL Components: The various components of UNL are discussed next.

Fig. 1 System architecture of UNL



(a) Universal Words (UW): Universal expressions consist of nodes, which are
formed, or represented, by the Universal Words. Two other components, namely
relations and attributes, are combined to represent these words. The format
given in Eq. (1) is used to represent a Universal Word in UNL.

<uw> = <headword>[<constraint list>]     (1)

To demonstrate this process, we provide the following English expression in (2).

English sentence: Man drives car.     (2)

The Universal Words here are

man(icl>person)@singular, drive(icl>travel>do, agt>thing), car(icl>object)@singular
(b) Relations: Relations are the links that exist between two Universal Words. The relation
names, which are then used to make UNL expressions, come from a pre-decided set of
names.
(c) Attributes: The subjective nature of a Universal Word in a sentence is depicted
with the help of attributes.

3.2 NER: Named Entity Recognition

NER is an initial step in data extraction. The major objective of NER is to locate and
classify named entities in the provided text and allocate them to several pre-defined
categories. These categories can be persons, organizations, expressions of time, locations,
monetary values, symbols, percentages, quantities, etc.
One example of this mapping is shown in Fig. 2, where a sentence is used to identify
person and organization entities.

Fig. 2 An example of NER
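As a concrete illustration of this step, the sketch below uses the spaCy library to pull PERSON and ORG entities out of crawled text; spaCy and the sample sentence are assumptions for illustration, since the paper does not name a specific NER toolkit.

```python
# Minimal NER sketch using spaCy (assumed toolkit; not specified in the paper).
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_entities(text, labels=("PERSON", "ORG")):
    """Return the named entities of the requested labels found in `text`."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents if ent.label_ in labels]

# Hypothetical sentence standing in for crawled faculty-page text.
sample = "Prof. Rajesh Kumar joined the Department of Computer Science at Tsinghua University."
print(extract_entities(sample))
```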



3.3 Direct APIs

Several direct APIs, such as pytrans, googletrans, and the Java translation library, were used for translating Spanish and Portuguese. Direct use of googletrans is sufficient for these languages and yields high accuracy for the names.
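A minimal usage sketch with the googletrans package, one of the libraries named above, is given below; the sample string is illustrative only, and the exact package version and API may vary.

```python
# Minimal sketch: translating Portuguese faculty-page text with googletrans.
# Requires: pip install googletrans==4.0.0rc1 (API may differ across versions).
from googletrans import Translator

translator = Translator()

pt_text = "Professor do Departamento de Ciencia da Computacao"  # illustrative string
result = translator.translate(pt_text, src="pt", dest="en")
print(result.text)  # translated English text; proper names typically pass through unchanged
```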

3.4 NMT: Neural Machine Translation

Neural machine translation (NMT) [3] has proved to be one of the most powerful and efficient approaches to natural language translation. Statistical translation relies only on data, which often leads to wrong sentence translations that make little sense after the process. One NMT model is the encoder-decoder architecture. As shown in Fig. 3, it comprises two recurrent neural networks (RNNs) used in tandem to build a translation model. Coupled with attention mechanisms, this architecture can achieve impressive results [4].
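As a generic illustration of the encoder-decoder idea (not the specific system evaluated in [4]), a minimal Keras sketch could look like the following; the vocabulary sizes and dimensions are placeholder assumptions, and the attention mechanism is omitted for brevity.

```python
# Generic encoder-decoder (seq2seq) sketch in Keras; dimensions are illustrative only.
from tensorflow.keras import layers, Model

src_vocab, tgt_vocab, latent_dim = 8000, 8000, 256  # placeholder sizes

# Encoder: reads the source sentence and summarizes it into its final LSTM states.
enc_inputs = layers.Input(shape=(None,), name="source_tokens")
enc_emb = layers.Embedding(src_vocab, latent_dim)(enc_inputs)
_, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(enc_emb)

# Decoder: generates target tokens, initialized with the encoder states.
dec_inputs = layers.Input(shape=(None,), name="target_tokens")
dec_emb = layers.Embedding(tgt_vocab, latent_dim)(dec_inputs)
dec_out, _, _ = layers.LSTM(latent_dim, return_sequences=True,
                            return_state=True)(dec_emb, initial_state=[state_h, state_c])
dec_logits = layers.Dense(tgt_vocab, activation="softmax")(dec_out)

model = Model([enc_inputs, dec_inputs], dec_logits)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```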

4 Methodology

For the BRICS nations, we modified this approach to include translation as well, making the final approach as follows:
1. A list of university URLs is fed to the crawler, which visits them.
2. It extracts those URLs which might contain faculty member names in their pages.
3. Applying NER on those pages, we find proper nouns, which are mostly names.
4. These names, in the case of non-English BRICS nations, are translated into English.
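A rough sketch of this pipeline is shown below, under stated assumptions: it uses requests and BeautifulSoup for crawling, spaCy for NER, and googletrans for translation; the keyword filter for faculty pages is a hypothetical heuristic, not part of the paper.

```python
# Illustrative pipeline sketch: seed URL -> candidate faculty pages -> NER -> translation.
# Libraries assumed: requests, beautifulsoup4, spacy, googletrans (none are mandated by the paper).
import requests
from bs4 import BeautifulSoup
import spacy
from googletrans import Translator

nlp = spacy.load("en_core_web_sm")
translator = Translator()
KEYWORDS = ("faculty", "people", "staff", "department")  # hypothetical filter

def candidate_pages(seed_url):
    """Collect links from the seed page whose URL or anchor text hints at faculty listings."""
    soup = BeautifulSoup(requests.get(seed_url, timeout=10).text, "html.parser")
    for a in soup.find_all("a", href=True):
        target = (a["href"] + " " + a.get_text()).lower()
        if any(k in target for k in KEYWORDS):
            yield requests.compat.urljoin(seed_url, a["href"])

def extract_names(page_url, src_lang=None):
    """Run NER on a page, translating it to English first if a source language is given."""
    text = BeautifulSoup(requests.get(page_url, timeout=10).text, "html.parser").get_text(" ")
    if src_lang:
        text = translator.translate(text[:4500], src=src_lang, dest="en").text
    return [ent.text for ent in nlp(text).ents if ent.label_ == "PERSON"]
```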

5 Data Gathering

The following datasets were used for the filtering process.

Fig. 3 Structure of NMT


translator

5.1 Dataset of Names

The list of Indian names required for matching has been taken from Kaggle [5]. We also used the list of names from our ongoing project, which maintains a database of academicians of Indian origin in the US and UK. Further names were extracted from Lok Sabha and parliamentary election records [6], since a candidate in these elections must be an Indian citizen.

5.2 List of Universities

To start the crawling, we need the URLs of the academic websites, which act as the seed URLs from which the crawler finds the academicians. The following approaches were used to obtain the list of seed URLs of the home pages of universities in the BRICS nations:
1. Higher education boards and sources [7]
2. Seed URLs obtained via Google Maps crawling [8].

6 Counter-Intuitive Approach

The proposed approach to deal with the change in names caused by literal word translation in Google Translate is as follows. We compare the English version of a name with its Chinese translation, so that the Chinese translation can be mapped back to the original Indian name [9].
The first part of this is creating a mapping of Indian to Chinese names. For this, we used Google Translate to find the Chinese translation of each Indian name, then translated the Chinese name back to English to check the consistency of the translation, and repeated the process one more time. This was done for 33,000 Indian names derived from the datasets mentioned above.
As shown in Fig. 4, the first column is the original name, and the consecu-
tive columns are translations from Google Translate of the previous column. The
reader can clearly observe the total change in name from column A to E after these
translations. So, the set of translations is as follows:
English -> Chinese -> English -> Chinese -> English
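A sketch of this round-trip check is given below; googletrans is assumed as the programmatic interface to Google Translate, and the sample name is hypothetical.

```python
# Round-trip translation check: English -> Chinese -> English -> Chinese -> English.
# googletrans is assumed as the programmatic interface to Google Translate.
from googletrans import Translator

translator = Translator()

def round_trip(name, hops=("zh-cn", "en", "zh-cn", "en")):
    """Translate a name through the chain of languages and return every intermediate form."""
    forms = [name]
    for dest in hops:
        forms.append(translator.translate(forms[-1], dest=dest).text)
    return forms  # corresponds to columns A..E of the mapping table

print(round_trip("Shivam Kathuria"))  # hypothetical name; compare the first and last entries
```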
In the first case, we take an English article containing Indian names, count the Indian names manually from the NER-labelled PERSON entities in the text, and also make a count by comparing against the dataset above.
Next, we convert the English article to a foreign language, say Chinese, using Google Translate and then search for the Indian names in the translated Chinese

Fig. 4 Sample data set of names

text using the lists of names in columns B and D. Surprisingly, the counts in the two cases turn out to be different. In the translated Chinese text file, we use a regex to remove any English words present in the file.
Next, the translated Chinese text is converted back to English and tested for names using NER, as in the first case.

7 Results

Figure 5 shows the results of translating entities from Chinese websites. The first row shows the number of entities obtained after running NER on the sites: 1818. In this corpus, the entities designated as PERSON numbered 2869, and 2239 names were matched from our dataset of 30,000 names. The total number of matches across all entities is 35,821 (this includes names as well as other text on the website pages).
Now, as shown in Fig. 4, we translated the 30,000 names back and forth to English
and Chinese. Using the final translated names from those results, we match them again

Fig. 5 Sample results for Chinese website



with the NER entities. This time, the matches for the English text were reduced to 450, and the Chinese matches were reduced to 317. Rows 8 and 9 similarly match the Chinese versions of the words. After applying all these translations to the text, the final matches with the dataset, shown in Row 11, are 7814, a decrease of 78.18% from Row 4, where the original text had 35,821 matches.

8 Conclusion

This paper discussed various techniques and the results obtained after applying the viable ones to the corpus of data. The results can be divided into three categories for the BRICS nations:
A. South Africa: The data is already in English and hence needs no translation, so direct extraction of the data is possible.
B. Brazil: The website languages are Portuguese and Spanish. Neither language changes people's names, and hence the names need not be translated.
C. Russia and China: The Russian and Chinese character sets differ from that of English; some letters, such as M and V, are not present in one or both of them.
Hence, to generalize, any language whose character set matches that of English is easily translatable using currently available methods, but other languages need different methods to achieve the same. A combination of NER and NMT-based translators seems a viable option for this.

References

1. Kumar, M., Bhatia, R., Rattan, D.: A survey of Web crawlers for information retrieval. Wiley Interdisc. Rev. Data Min. Knowl. Discov. 7(6), e1218 (2017)
2. Kumar, M., Bindal, A., Gautam, R., Bhatia, R.: Keyword query based focused Web crawler.
Procedia Comput. Sci. 125, 584–590 (2018)
3. Neural Machine Translation (online article). https://towardsdatascience.com/neural-machine-
translation-15ecf6b0b. Last accessed 1 March 2020
4. Wang, X., Zhu, C., Li, S., Zhao. T., Zheng, D.: Neural machine translation research based on the
semantic vector of the tri-lingual parallel corpus. In: 2016 International Conference on Machine
Learning and Cybernetics (ICMLC), Jeju, pp. 69–74. https://doi.org/10.1109/icmlc.2016.786
0879
5. https://www.kaggle.com/chaitanyapatil7/indian-names/version/1 [online dataset]
6. https://github.com/datameet [a community of Data Science enthusiasts.]
7. https://www.ugc.ac.in/oldpdf/Consolidated%20list%20of%20All%20Universities.pdf
8. https://github.com/shivamkathuria/Google-Maps-Crawler [to get code of developed crawler]
9. Creekmore, L.: Named entity recognition and classification for entity extraction. District Data
Labs
Missing Phone Activity Detection Using
LSTM Classifier

Abhinav Rastogi, Arijit Das, and Aruna Bhat

Abstract We propose a smart phone application that helps a user find a lost phone. The application trains a classifier to detect the cases in which a mobile phone can become separated from its user. When it identifies such an event, it records cumulative sensor data, which can then be analysed to characterize the phone's surroundings and narrow down the search domain. A simple example is that GPS cannot help when you have forgotten where the phone was placed; the application instead gathers information about the surroundings at the last recognized event, making the search more effective.

Keywords Mobile sensor · Smart phone detection · Long short-term memory · Recurrent neural network

1 Introduction

Science and technology have advanced greatly in today's world, with the aim of making human life easier and more comfortable. Much of the information we need can be stored in a smart phone, so a mobile phone acts as a secondary brain that not only remembers but also reminds, sees, and listens, covering essentially all the senses humans have. With all these abilities, a smart phone is a helpful companion in daily life; but what if you lose it or forget it somewhere, and it is low on battery, causing it to turn off eventually? We would be left
A. Rastogi (B) · A. Das · A. Bhat


Department of Computer Science and Engineering, Delhi Technological University, Delhi
110042, India
e-mail: abhinavrastogidtuite@gmail.com
A. Das
e-mail: arijit.dtu2k16@gmail.com
A. Bhat
e-mail: aruna.bhat@dtu.ac.in


clueless. There are not many efficient ways to find the phone; existing solutions give a GPS location, which helps only outdoors, and the phone needs to be powered on when the location request is triggered.
What if you were doing some task at home and forgot your phone, for instance placing it on a bookshelf while picking out books? GPS can only say that it is in your home, but nothing more. As noted, a smart phone has human-like sensing capabilities, and we want to use this to our advantage. A mobile phone can be made to sense its surroundings, such as lighting, coordinates (indoor positioning where possible), and sound signals (to identify whether the environment is silent or noisy), along with GPS, at the instant the phone becomes separated from the user [5]. Once this sensor information is recorded, it can be uploaded to a server, so that the surrounding information is known in every case where the phone identifies one of the trained instances. In this paper, we propose a mobile phone monitoring application that serves as an aid, helping narrow down the details of the surroundings when a phone gets separated from the user. Existing solutions require the user to query for the location after realizing that the phone is lost; the application can then return the GPS location by contacting the phone, provided it is still on. Our application instead addresses the problem through the question "How or when can a phone be separated from the user?", that is, through trigger events that indicate when a phone gets separated from the user. With this approach, the user can get the latest information about the phone's location before it is dead (turned off). The trigger events can be a phone dropping from the pocket or hand unnoticed, placing it somewhere and forgetting the location due to a distraction, or the phone being stolen from the user [2, 6, 7].

2 Related Work

Losing a phone is one of the most common problems a mobile user can face, yet recovery is difficult. On losing a phone, a user wants to know where the phone currently is and wants the phone not to be misused. Mobile phone operating systems like Android and iOS have apps such as Find your phone and Find My iPhone that can query the device for its location. The disadvantage of this approach is that the phone needs to be alive (on) when the query is made; otherwise, it cannot respond. A smart phone can learn the location of a neighborhood not only through physical coordinates but also in a logical way; that is, it can sense the light ambiance and sound in a neighborhood, logically describing whether the location is a quiet or noisy place, a dark or bright area. Research is also being done in this field, where a mobile phone can locate users in adjacent stores, solving the problem through logical localization [5]. Adding this to the application narrows the search domain, making the user's task easier. The other approach is to detect the trigger event of the phone being lost and alert the user; the authors of [1] proposed a framework called iGuard that triggers an alarm when it identifies a theft. Their paper addresses the problem by analyzing the activity at the moment of theft; that is, they described

how the sensor signatures of a mobile phone would differ when being taken out by
the thief and the mobile owner, thus alerting the user instantly on theft. However,
the disadvantage of this approach is that feature extraction is done manually by the
researchers, which is a very cumbersome task. For the activity of a missing phone, it
is possible that we may miss out on the actual factors responsible for differentiating
two different actions. The approach of our application aligns with theirs, namely in asking how or when a mobile phone could be separated from the user. In this paper, we make use of a deep learning approach to classify sensor data into different activities.

3 System Design

The application at a high level can be divided into communication among three components: (i) the mobile phone, which monitors the user's activity; (ii) a classifier that takes in the sensor information from the mobile phone and identifies the trigger events; (iii) an online platform, where the sensor information about the neighborhood is logged automatically for further logical analysis once a trigger event is identified.
The following assumptions are made for the app to be functional:
(i) the application is started manually and continuously monitors the user activity for triggers; (ii) the mobile phone must be on when the trigger event occurs; (iii) the mobile phone must have Internet access so that the sensor information can be logged to the online platform. Figure 1 gives a high-level overview of the system. The most important and challenging component of the system is identifying the trigger event. The classifier must differentiate short-duration activities, such as taking the phone out of a pocket, using the features extracted from the sensor signatures of various activities. Once a trigger has been identified, the system can log sensor information onto an online platform for further analysis.

Fig. 1 High-level working



Fig. 2 Algorithm

4 Algorithm/Method Design

Figure 2 describes the algorithm of the application. Information shall be logged to the online platform directly in both cases, when the phone is dropped and when it is stolen. But when a mobile phone is placed on a surface, the event lasts longer, and the phone remains stationary unless another activity is noted. Therefore, the algorithm checks whether the mobile is inactive and only then logs the sensor information. The inactivity of the mobile phone can be verified through sensor readings, and the microphone can be used to determine whether the user is around [3]. This is a rather complex case because the mobile could be separated from the user while he or she is still aware that the phone is nearby. Therefore, the system needs all possible factors that can confirm that the user is nearby.

5 Experimental Setup

5.1 Preliminary

In this section, we first develop various scenarios, where a person can be separated
from his phone. Specifically, we take three different situations into consideration.
(i) User places a phone on a surface and forgets about it. (ii) User drops his phone

Fig. 3 Testing accuracy as predicted by our model

while walking or standing. (iii) User’s phone is stolen from his pocket by a thief
or perpetrator. The implementation of this paper is focused on creating different
signatures for each of these activities and then training a classifier to demonstrate
how mobile sensor data could be used to warn the user about the missing phone
problem in real time.

5.2 Data Collection

We used the AndroSensor application, available on the Google Play Store, to collect sensor data while the user performs several motions. We use the values of the accelerom-
eter, gyroscope, linear acceleration, and gravity sensors for our application. Experi-
ments are performed on four android phones (Pixel 2, Samsung s7, Samsung Note3,
Samsung S8). Data for all sensors is collected at a frequency of 10 Hz. We found four
volunteers (2 female and 2 male) between the ages of 19–23 to perform experiments.
In the first experiment, the user walks with the phone in his hand and accidentally
drops it on the ground. As part of the second experiment, a volunteer walks with
the phone in his pocket and then performs the activity of taking the phone out of his
pocket. In the third experiment, while the first volunteer walks at a normal pace, the
second volunteer performs the act of stealing the phone from the person’s pocket
slyly. In the fourth experiment, the user places his phone on a plain surface. Further
experiments comprised the user walking, standing, and sitting with the phone in his/her pocket or hand. Each experiment is performed by a volunteer for seven sets
of 6 min each. In each set, the specific activity is repeated periodically such that we

collect 50 samples per volunteer per activity, leading to a total (50 × 7 × 4) 1400
samples. In addition to that, all experiments are video recorded, and data collected
from the filters is labeled manually by comparing sensor data to the video frame by
frame.

5.3 Data Processing

Collected sensor data was passed through a low pass filter to remove noise. Next,
data collected from each experiment per volunteer was sampled with a window size
of 3 s with 50% overlap. Since the frequency of data collection is 10 Hz, we end up
with 30 samples of data per input. Each sample is further represented by 12 features, comprising the x, y, and z values of each of the four sensors involved, namely the accelerometer, gyroscope, linear acceleration, and gravity sensors.
Data processing has been done using Python’s SciPy and NumPy libraries.
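A sketch of this preprocessing step is shown below; the Butterworth filter and its cutoff are assumptions, since the text only mentions a low-pass filter, while the window length (3 s, i.e. 30 samples at 10 Hz) and the 50% overlap follow the description above.

```python
# Preprocessing sketch: low-pass filtering and 3 s windows with 50% overlap at 10 Hz.
# The Butterworth filter and its cutoff are assumptions; the paper only says "low pass filter".
import numpy as np
from scipy.signal import butter, filtfilt

FS = 10          # sampling frequency in Hz
WINDOW = 30      # 3 s * 10 Hz = 30 samples per window
STEP = 15        # 50% overlap

def lowpass(signal, cutoff=3.0, fs=FS, order=4):
    """Apply a zero-phase Butterworth low-pass filter along the time axis."""
    b, a = butter(order, cutoff / (0.5 * fs), btype="low")
    return filtfilt(b, a, signal, axis=0)

def make_windows(data):
    """Slice a (T, 12) sensor stream into overlapping (30, 12) windows."""
    return np.stack([data[s:s + WINDOW]
                     for s in range(0, len(data) - WINDOW + 1, STEP)])

raw = np.random.randn(600, 12)        # placeholder: 1 min of 12-channel data
windows = make_windows(lowpass(raw))
print(windows.shape)                  # (39, 30, 12)
```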

5.4 Training

In most of the previous models for theft detection, feature extraction is done manually
by the researchers. For example, [1] proposed a framework called iGuard, where the
authors explicitly look for a specific signature in both the activities of the phone
being taken out by the user himself and another, where the phone is stolen by the
perpetrator.
Their paper mentions how, when a user takes out the phone himself, the speed of the user first decreases, then the phone is taken out, and then the normal speed is resumed. In the other case, where the perpetrator takes out the user's phone, the phone is first taken out and then the speed of the perpetrator increases. These scenarios, though true in most cases, are highly specific, and during a theft the user or the perpetrator may not act the way the model assumes in the application.
feature engineering [4]. For the activity of a missing phone, it is possible that we may
miss out on the actual factors responsible for differentiating two different actions.
Also, as we consider more and more sensors in our application, it is a challenging
task to analyze the effect of each sensor on different activities. To counter the above
scenarios, we make use of a deep learning approach to classify our data for different
activities. We make use of a LSTM recurrent neural network to classify mobile sensor
data. The advantage of using LSTM is that while giving accurate results, it does the
feature engineering for us [4]. Also, we can avoid the hassle of doing a whole lot of
signal processing before using the sensor data in our model. We wanted to use an
RNN for our activity identification model, because we were dealing with a sequence

of sensor data. And also, because we wanted the neural network to learn the hidden
features that differentiate two different activities.

5.5 Implementation

The classifier has been implemented in Python using TensorFlow. Along with that, all sensor processing, data preprocessing, and data analysis have been done using the scikit-learn, NumPy, matplotlib, and pandas libraries in Python, along with Jupyter Notebooks.
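A minimal sketch of such an LSTM classifier in TensorFlow/Keras is shown below; the layer sizes, dropout, and number of output classes are illustrative assumptions, as the paper does not list its exact hyperparameters.

```python
# Sketch of an LSTM classifier over (30 timesteps x 12 sensor features) windows.
# Layer sizes, dropout, and the 7-class output are illustrative assumptions.
from tensorflow.keras import layers, models

NUM_CLASSES = 7   # e.g. drop, take-out, theft, place-on-surface, walk, stand, sit

model = models.Sequential([
    layers.Input(shape=(30, 12)),          # 3 s window at 10 Hz, 12 sensor channels
    layers.LSTM(64, return_sequences=True),
    layers.LSTM(32),
    layers.Dropout(0.3),
    layers.Dense(32, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(X_train, y_train, validation_split=0.2, epochs=50, batch_size=32)
```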

6 Performance Evaluation

In this section, we evaluate the performance of our model under various scenarios to
show its accuracy and robustness. We have tested our model using Android Phones
(Pixel 2, Samsung s7, Samsung Note3, Samsung S8). The sampling rate is set as
10 Hz. This sampling rate is conducive to our model because each activity of taking
the phone out, or stealing of the phone by the perpetrator takes roughly 3 s. Having a
sampling rate of 10 Hz gives us 30 readings per sensor per time window, which is good
enough to train our model. All experiments have been conducted in four different
scenarios: (i) the library, (ii) an open area, (iii) an open area with people around,
(iv) in an office-like setting. To evaluate the accuracy of our model, we randomly split the dataset into training and test sets in the ratio 4:1. Figure 4 highlights the accuracy of our model, 95.52%, which is fairly good. We also calculate the precision (96.03%), recall (95.51%), and F1 score (95.50%) for our model. Figure 3 shows how the loss over both training and testing data decreases as the number of training iterations increases, while the accuracy of the model increases.
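The reported metrics can be computed with scikit-learn as in the sketch below; the label arrays are random placeholders standing in for the actual test split and model predictions.

```python
# Sketch: evaluating the classifier with scikit-learn metrics (labels are placeholders).
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Placeholder ground truth and predictions; in practice these come from the test split.
y_test = np.random.randint(0, 7, size=200)
y_pred = np.random.randint(0, 7, size=200)

print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred, average="weighted", zero_division=0))
print("recall   :", recall_score(y_test, y_pred, average="weighted", zero_division=0))
print("f1 score :", f1_score(y_test, y_pred, average="weighted", zero_division=0))
print(confusion_matrix(y_test, y_pred))
```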

Fig. 4 Training and testing losses and accuracies

In the confusion matrix depicted below, we can see that while the activities of sitting, standing, and placing the phone on the table are detected without a miss, certain segments (25) of walking are misclassified as the phone being dropped or being taken out of the pocket. Apart from that, only four segments of the activity of the phone being taken out by the user were classified as it being stolen by a thief, and only eight segments of the phone being stolen by a thief were classified as the phone being taken out by the user (Figs. 5 and 6).

Fig. 5 Confusion matrix normalized to percentage of total dataset

Fig. 6 Confusion matrix showing the accuracy of our model using different colors

7 Conclusion

The proposed solution can successfully identify the triggers with around 95.5% accuracy and can therefore send the log information for an effective search. However, the application has a few drawbacks. It is of no use when the trigger event happens while the mobile phone is turned off. A possible remedy is to transmit a signal from the phone even when it is off, using the BIOS battery, so that minimal log information can be embedded in the signal and sent for recovery. The app must also be opened manually in order to check for triggers. The phone being dropped and the user falling down along with the phone can give the same results, so the classifier can be trained further for such scenarios. Overall, the proposed application can successfully recognize the trigger events, which play a vital role in automated sensor logging. This removes, through automation, the window between the user realizing the phone is missing and responding to it, thus helping to find the phone.

References

1. Jin, M., He, Y., Fang, D., Chen, X., Meng, X., Xing, T.: iGuard: a real-time anti-theft system for
smartphones. IEEE Trans. Mob. Comput. 17(10), 2307–2320 (2018). https://doi.org/10.1109/
tmc.2018.2798618
2. Liu, X., Wagner, D., Egelman, S.: Detecting phone theft using machine learning, pp. 30–36
(2018). https://doi.org/10.1145/3209914.3209923
3. Chang, S., Lu, T., Song, H.: SmartDog: real-time detection of smartphone theft. In: IEEE Interna-
tional Conference on Internet of Things (iThings) and IEEE Green Computing and Communica-
tions (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart
Data (SmartData), Chengdu, pp. 223–228 (2016). https://doi.org/10.1109/ithings-greencom-
CPSCom-SmartData.2016.61
4. Pulver, A., Lyu, S.: LSTM with working memory. In: International Joint Conference on Neural
Networks (IJCNN), Anchorage, AK, pp. 845–851 (2017). https://doi.org/10.1109/ijcnn.2017.
7965940
5. Uddin, M.P., Nitu, A.: A tracking application for lost/stolen android phones using face detection
(2015)
6. Carrara, F., Elias, P., Sedmidubsky, J., Zezula, P.: LSTM-based real-time action detection and
prediction in human motion streams. Multimedia Tools Appl. 78, 27309–27331 (2019)
7. Senyurek, V.Y., Imtiaz, M.H., Belsare, P., Tiffany, S., Sazonov, E.: A CNN-LSTM neural network
for recognition of puffing in smoking episodes using wearable sensors. Biomed. Eng. Lett. 10,
195–203 (2020)
Suvarga: Promoting a Healthy Society

R. L. Priya, Gayatri Patil, Gaurav Tirodkar, Yash Mate, and Nikhil Nagdev

Abstract In India, over 22% of the population lives below the poverty line. This poverty pushes people onto the streets, which over time develop into slums. Because these slums are unplanned, they lack necessities such as electricity, sanitary services, and basic hygiene resources, making them hubs for the spread of diseases. The primary aim of this paper is to identify the leading causes of disease in the slum areas of Mumbai using data collected from IoT modules, health checkup drives, and various government authorities. With this information, the concerned civic authorities and slum residents can be alerted to the danger so that the necessary action can be taken, in turn promoting a healthier society in the various slum regions of India.

Keywords Internet of things (IoT) · Slum management · Sanitation · Decision tree · LSTM · Air quality index · Water quality index

R. L. Priya · G. Patil (B) · G. Tirodkar · Y. Mate · N. Nagdev


Computer Department, Vivekanand Education Society's Institute of Technology, Chembur, Mumbai 400074, India
e-mail: 2017.gayatri.patil@ves.ac.in
R. L. Priya
e-mail: priya.rl@ves.ac.in
G. Tirodkar
e-mail: 2017.gaurav.tirodkar@ves.ac.in
Y. Mate
e-mail: 2017.yash.mate@ves.ac.in
N. Nagdev
e-mail: 2017.nikhil.nagdev@ves.ac.in


1 Introduction

According to United Nations (UN 2009) estimates, only 4% of the terrestrial surface is occupied by cities [1]. Although the percentage is low, more than half of the world's population lives in these cities, which creates a huge imbalance in world resources, as this section of the population consumes three-quarters of the world's natural resources.
For upgrading slums, the first action commonly taken is to demolish them and relocate the residents, but since the 1970s several authors, such as Turner (1972), have recommended otherwise. This gave rise to the concept of upgrading slums and their residents to a better standard of habitation [2]. This paper uses data analysis and deep learning to gain a better understanding of this approach and to provide solutions for implementing it.

2 Literature Survey

Slum management has always been a major issue in cities like Mumbai [3]. Current research by the SRA maps information about residents on a website with the help of drones, and the government has gathered demographic data through this method [4]. Although this method covers many data fields, major fields such as health factors and pollution measures are ignored.
Improved infrastructure can prove to be a major catalyst for achieving sustainability goals. Taking this into consideration, the UN-Habitat opinion survey method, based on the nature of social reality and the perspective of the researchers, was applied to slum residents in Africa. The analysis showed that African infrastructure can be developed primarily with the help of proper water supply, road networks, and telecommunications [4].
To understand the positive and negative implications of upgrading slums, a case
study was conducted in Moravia’s Neighborhood, Medellin. The principles of urban
design strategies and urban rehabilitation programs were identified through technical
documents, qualitative and quantitative data which was collected through surveys at
the community level [5].
A non-integrated framework was adopted to evaluate the suitability of the interior design of a low-income multipurpose apartment to provide enhanced IEQ. Here, an expert opinion survey was conducted, AHP-TOPSIS was performed, and the final optimized solution was generated [6]. This research covered only the interior design and not other parameters such as pollution, health, and geographical aspects.

3 Proposed System

3.1 Overview

To improve the health situation in the country, we propose a novel approach built on an Internet of things (IoT)-based intelligence system. The model provides regular updates about the likelihood of epidemics, symptoms for spotting those diseases, and emergency contacts of concerned doctors, along with a few home remedies. It also alerts the government authorities, with the aim of creating the necessary awareness among both the authorities and the slum residents.

3.2 System Architecture

The system is composed of various modules: data collection from heterogeneous sources (IoT, BMC health data, water data, sanitation, survey data), preprocessing, feature extraction, model training and testing, and display of the final output on the web and mobile applications, as listed below and shown in Fig. 1.

Fig. 1 System architecture of Suvarga



4 Detailed Architecture of Suvarga

The detailed workflow of Suvarga as shown in Fig. 2 describes the data collection
from heterogeneous sources with data preprocessing to build a better prediction
model.

Fig. 2 Detailed architecture of Suvarga



Table 1 Component in air quality monitoring device


Component name Features and description
MQ135 Gases, including NH3 , NOx, alcohol, benzene, smoke, and CO2 , are
detected by this air quality sensor
MQ2 Combustible gas and smoke at concentrations from 300 to 10,000 ppm are
detected by the semiconductor gas sensor
MQ3 This sensor is used to detect leakage of flammable gases (LPG), methane

4.1 Air Quality Monitoring System

To calculate the air quality index of specific regions, we have built an IoT based
module. Each module is composed of MQ series sensors (MQ135, MQ2, MQ3) to
measure the air quality index. These sensors are placed on an ESP8266 (Table 1).

4.2 Water Quality Monitoring System

Poor water quality is a big issue, especially in slum regions. To test the water quality,
the model uses the BMC water quality monitoring module. The data gives out the
pH, dissolved oxygen, BOD, COD, etc. Over ten years of data is collected.

4.3 Sanitation

Sanitation data from BMC gives the distribution of toilets for men and women in the
Chembur region. It provides information on the number of toilets with respect to the
number of people.

4.4 Algorithms Used

The model was trained using algorithms such as long short-term memory (LSTM) and decision trees, which were later compared to choose the best one.

4.5 Intelligent System

The proposed system aims to build an intelligent system for promoting healthy living in various slum regions of India. It consists of two main components: the prediction (analysis) model and the data visualization model. The latter is designed to display the output of the analysis model in graphical formats via the web application.

4.6 Prediction Model

Using the data collected from the IoT module for air quality parameters, together with data collected from various other sources via government authorities such as the BMC and MPCB, an analysis algorithm is run to predict the air/water quality values and the correlation among the various features of the dataset. The LSTM algorithm is applied to the data, and the correlation among features is found using Pearson's correlation, available in the pandas library.
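A minimal sketch of this correlation step with pandas is given below; the file name and column names are illustrative placeholders, not the actual BMC/MPCB schema.

```python
# Sketch: Pearson correlation between air/water quality and health indicators using pandas.
# File name and column names are illustrative placeholders, not the actual dataset schema.
import pandas as pd

df = pd.read_csv("merged_air_water_health.csv")   # hypothetical merged monthly dataset

corr = df[["rspm", "so2", "nox", "bod", "urti_cases", "malaria_cases"]].corr(method="pearson")
print(corr)

# Highest absolute correlations, excluding the diagonal:
pairs = corr.abs().unstack().sort_values(ascending=False)
print(pairs[pairs < 1.0].head(10))
```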

4.7 Data Visualization

The final outcome of all the analysis needs to be presented to the layman in understandable terms; hence, a user-friendly web app is built for the government authorities as well as the slum residents. Each can access multiple features of the web app, such as predictive analysis of future air and water quality, basic care, and home remedies to protect oneself and loved ones from epidemics.

5 Implementation

5.1 Need for Real-Time Monitoring

The aggregated data provided by the government is useful for analysis over the long term, but mishaps can occur in between. To address this, Suvarga has developed a network of IoT devices to be installed in the slums, continuously monitoring various air quality parameters.

5.2 Experimental Setup of Air Quality Monitoring Device

The IoT device consists of a microcontroller called 'NodeMCU.' It can be interfaced with gas sensors and transmits data over a Wi-Fi network. The sensors interfaced with this microcontroller are the MQ135, MQ3, and MQ2.

Fig. 3 Experimental setup of the air quality real-time monitoring device

Three IoT devices are fitted at three corners of the slum area. The devices act in unison, forming a mesh and transferring data to a common device that acts as the server.
The data received from the IoT modules is sent to this centralized Cayenne data visualization server, where it can be visualized in real time and plotted on a live graph. A trigger is activated when a sensor gives a value that crosses a threshold, indicating that an accident has taken place. An instant notification in the form of an SMS and email alert is sent simultaneously to the concerned government authority, so that the government can send instant relief or take the necessary actions to pacify the toxic environment. Figure 3 shows the experimental setup of the air quality real-time monitoring device.

5.3 Slum Health Drive Data Analysis

A slum health drive was conducted for the residents of the slum adjoining VESIT
on the 25th of January 2020. The parameters of the data that the team obtained from
the health drive are name, gender, age, weight, height, poverty status, toilet, drainage
linked to the house, waste collection system, compost pit, source of water, washing
clothes and utensils, alcohol, diabetes, hypertension, cholesterol, level of education,
Aadhaar card, authorized electricity connection, bank account, computer literate,
and source of income. Figure 4 shows the distribution of blood pressure among the residents. Figure 5 represents the percentage of people who were healthy by weight, overweight, or underweight.

Fig. 4 Blood pressure ratio

Fig. 5 Weight ratio

5.4 Sanitation Data Analysis

Open defecation has been an onerous issue for a while, causing a variety of health-
related issues. The team of researchers decided to collect sanitation data from the
government authorities of the Chembur region through BMC offices. The dataset
obtained was in the form of a CSV file encompassing various parameters including
hypertension, fever, asthma, communicable, etc. The dataset comprises 172 rows
(records) and 7 columns (attributes).
A ward-by-ward analysis is done to check that proper sanitation facilities exist in every ward so as not to strain the resources. The groupby method in pandas (a Python library for data frames) is used to group the data by ward, as sketched below. To find out whether all wards have comparable numbers of toilets, a pie chart showing the distribution of toilets in the region has been plotted. A discrepancy is observed in the distribution: while the number of toilets in ward number 154 soars as high as 32, ward number 149 has a meager 3. The total number of toilets in the region is 2669. According to research, the recommended ratio is 100 people per toilet, but using the population data obtained, the number of persons per toilet is close to 457 (Fig. 6).
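The ward-wise aggregation can be sketched with pandas as follows; the CSV name and column names are assumed placeholders, since the exact schema of the BMC export is not given.

```python
# Sketch: ward-wise toilet counts and people-per-toilet ratio with pandas groupby.
# Column names ("ward", "toilet_seats", "population") are assumed placeholders.
import pandas as pd

san = pd.read_csv("chembur_sanitation.csv")       # hypothetical BMC sanitation export

by_ward = san.groupby("ward").agg(toilets=("toilet_seats", "sum"),
                                  population=("population", "sum"))
by_ward["people_per_toilet"] = by_ward["population"] / by_ward["toilets"]

print("total toilets:", by_ward["toilets"].sum())
print(by_ward.sort_values("people_per_toilet", ascending=False).head())

# Pie chart of the toilet distribution across wards, as described in the text.
by_ward["toilets"].plot.pie(autopct="%1.0f%%", figsize=(6, 6))
```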

Fig. 6 People per toilet

5.5 Health Data Analysis

The health data was obtained from hospitals in the vicinity of the area, where cases of malaria and respiratory infections occurred sporadically. The dataset
obtained from BMC offices was month-wise historical data of years 2017 and 2018
and had the name of the dispensary, the month under consideration, and the average
levels of health-related parameters like the total number of people suffering from
asthma, malaria, URTIs, and heart diseases to name a few.
The air and water quality data obtained encompassed historic data for the past four years, whereas the health data procured from the BMC authorities contained records from 2017 to 2018. To avoid aberrations, only the 2017 and 2018 air and water quality data is taken into account, along with the health data of the same two years.
Correlations between all the parameters are computed and stored in a correlation matrix, then sorted in ascending order using quicksort. The variable pairs with the highest correlation are shown in Fig. 7.
However, the correlations above show the relation between the attributes of the
same table. A correlation between air quality parameters and URTIs is established,
and in the same way, malaria is correlated with the biological oxygen demand in the
water.
According to research, there is an association between upper respiratory tract infection (URTI) and respirable suspended particulate matter (RSPM) [7]. An increase in particulates is detrimental to health, as it causes a variety of conditions related to the respiratory tract. The analysis conducted (Fig. 8) also shows a direct positive association between the two factors.

6 Results and Analysis

The comparative study of the regression algorithms suggests that Decision Tree Regression achieves the lowest error when evaluated with regression error metrics comprising mean squared error (MSE), mean absolute error (MAE), and the R2 score, as compared to the other regression algorithms

Fig. 7 Correlation matrix

Fig. 8 URTI versus RSPM



Fig. 9 R2 score

being tested on these metrics, namely Lasso Regression, Lasso Lars Regression, Bayesian Regression, and Random Forest Regression (Fig. 9).
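A sketch of such a comparison with scikit-learn is given below; the feature matrix and target are random placeholders standing in for the merged environmental and health dataset.

```python
# Sketch: comparing the regressors named above on MSE, MAE, and R2 with scikit-learn.
# X and y are random placeholders for the merged environmental/health feature matrix and target.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import Lasso, LassoLars, BayesianRidge
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

X, y = np.random.rand(200, 6), np.random.rand(200)   # placeholder data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "DecisionTree": DecisionTreeRegressor(random_state=0),
    "Lasso": Lasso(),
    "LassoLars": LassoLars(),
    "Bayesian": BayesianRidge(),
    "RandomForest": RandomForestRegressor(random_state=0),
}
for name, reg in models.items():
    pred = reg.fit(X_tr, y_tr).predict(X_te)
    print(name,
          "MSE=%.3f" % mean_squared_error(y_te, pred),
          "MAE=%.3f" % mean_absolute_error(y_te, pred),
          "R2=%.3f" % r2_score(y_te, pred))
```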

6.1 Mean Squared Error

Mean squared error is a metric that tells how close the predicted points are to the
actual points on the regression line.
$L = \frac{1}{N}\sum \big(\hat{Y} - Y\big)^2 \qquad (1)$

where L is the loss, $\hat{Y}$ the predicted output, Y the actual value, and N the number of samples.


The results of MSE indicate that Lasso Lars Regression gives the maximum error
of 381.74, while Decision Tree Regression gives the least MSE of 5.65. Random
forests perform better than most of the algorithms except decision trees giving an
MSE of 17.19. The results are shown in Fig. 10.

6.2 Mean Absolute Error

Mean absolute error is the measure of the difference in the actual value and the
predicted value.

Fig. 10 Mean squared error

$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |x_i - x| \qquad (2)$

where n is the number of errors, $\sum$ is the summation symbol, and $|x_i - x|$ are the absolute errors.
From the chart in Fig. 11, we can infer that the MAE for decision trees is the lowest at 0.64, while Bayesian and Lasso Regression give the highest MAE among the five, indicating poor performance. The MAE values of the other algorithms lie between these two, as represented by the bar chart.

7 Conclusion and Inferences

The research undertaken particularly focuses on improving the health and sanitation facilities in the slums of the Chembur region of Mumbai. Harnessing the potential of artificial intelligence, data analysis, and the Internet of things (IoT), the proposed system predicts the patterns in air quality, water quality, and sanitary facilities and establishes the strong interdependence between these environmental and sanitary factors and the health of the individuals residing there.
Through the sanitation data, it was found that the number of people per toilet was 456.955, far higher than the ideal ratio of 100 people per toilet. A correlation is also established between the number of malaria patients in the hospitals near the slums and the water quality index. It was also concluded that URTI and RSPM have a correlation of 0.547853, which is significant.

Fig. 11 Mean absolute error

Data from the NGO health drive indicated that 57.15% of residents were either overweight or underweight. Moreover, the R2 score obtained from decision trees has a value of 0.99, indicating almost perfect prediction. The government could use the findings of this research to take appropriate actions to mitigate the detrimental effects of poor well-being, unclean surroundings, and a polluted environment on the dwellers of slums.

References

1. United Nations: World Population Prospects: 2009 revision, Population and Development
Division, Department of Economics, and Social affairs (2009)
2. Ahmed, I.: Building resilience of urban slums in Dhaka, Bangladesh. Procedia Soc. Behav. Sci. 218 (2016)
3. Dikhle, S., Lakhena, R.: GIS-Based slum information management system. In: 17th Esri India
User Conference (2017)
4. Arimah, B.: Infrastructure as a catalyst for the prosperity of African cities. In: Urban Transitions
Conference, Shanghai (2016)
5. Vilar, K., Cartes, I.: Urban design and social capital in slums. Case study: Moravia's Neighborhood, Medellin (2004–2014)
6. Sarkar, A., Bardhan, R.: Improved indoor environment through an optimised ventilator and
furniture positioning: a case of slum rehabilitation housing, Mumbai, India. Accepted 1 Dec
2019
7. Li, Y.R., Xiao, C.C., Li, J., Tang, J., Geng, X.Y., Cui, L.J., Zhai, J.X.: Association between air
pollution and upper respiratory tract infection in hospital outpatients aged 0–14 years in Hefei,
China: a time series study. Public Health 156, 92–100 (2018)
Multi-task Data Driven Modelling Based
on Transfer Learned Features in Deep
Learning for Biomedical Application

N. Harini, B. Ramji, V. Sowmya, Vijay Krishna Menon,


E. A. Gopalakrishnan, V. V. Sajith Variyar, and K. P. Soman

Abstract Accurate automatic identification and localization of spine vertebrae points in CT scan images is crucial in medical diagnosis. This paper presents an automatic feature extraction network, based on a transfer-learned CNN, to handle the limited availability of samples. The 3D vertebrae centroids are identified and localized by an LSTM network trained on CNN features extracted from 242 CT spine sequences. The model is further trained to estimate age and gender from the LSTM features. Thus, we present a framework that serves as a multi-task data-driven model for identifying and localizing spine vertebrae points, estimating age, and classifying gender. The proposed approach is compared with benchmark results obtained by testing on 60 scans. The advantage of the multi-task framework is that it does not need any information other than the annotations on the spine images indicating the presence of vertebrae points.

Keywords Vertebrae localization · Transfer learning · Multi-task model · LSTM · CT spine volumes · Convolutional neural network

1 Introduction

Automatic identification and localization of spine vertebrae points from Computerized Tomography (CT) spine scans is challenging due to the homogeneity and structural symmetry of each vertebra [12]. The main goal of this work is to transfer learn from existing image networks to overcome the limitations of limited

N. Harini (B) · B. Ramji · V. Sowmya · V. Krishna Menon · E. A. Gopalakrishnan ·


V. V. Sajith Variyar · K. P. Soman
Center for Computational Engineering & Networking (CEN),
Amrita Vishwa Vidyapeetham, Coimbatore, India
e-mail: harininarasimhan123@gmail.com
URL: https://www.amrita.edu
K. P. Soman
e-mail: kp_soman@amrita.edu


data samples in deep learning, adapting them for biomedical application. We also
extend the model to classify gender and predict the age of the patients using the
features extracted from spine CT volumes. This extension might help in analysing
spine diseases for a particular time or place, even when there is no metadata. The
proposed method uses transfer learning to extract a feature descriptor for each image in each spine sequence. Those feature descriptors are then fed to an LSTM network combined with dense layers to localize the spine vertebrae for each sequence. The final dense-layer features are used to classify the gender and to predict the age of the patient using a random forest classifier and regressor, respectively. We also report a comparative study of our results against previous techniques that used the same dataset. Furthermore, age prediction and gender classification are evaluated using validation splits of 10, 20, and 30% of the available training set. The major contributions and novelty of the proposed method are as follows:
1. Limited availability of the data is handled with transfer learned feature extraction
for spine CT volumes in the MICCAI 2014 challenge dataset.
2. The LSTM network is utilized to extract continuity information from feature
descriptors of the spine volumes, where each feature descriptor is treated as one instance of the spine scan.
3. A novel extension to identify the age and gender of the patient.

2 Literature Survey

A large number of spine scan images is necessary to model a machine learning problem that yields high localization accuracy. This creates challenges where data availability is scarce due to legal and medical constraints. Furthermore, to perform segmentation or localization tasks, annotating large volumes of data (by domain experts) is necessary, which is a tedious and challenging task. Generally, low data volumes are handled using traditional augmentation techniques such as translation and rotation [9, 10] or GANs [7], as proposed for classification [5] and segmentation [2]. Though data generation helps to improve many computer vision applications, generating spine scans comparable to the originals is risky and highly challenging. We therefore use another approach: transfer learning a feature extraction network, which leverages existing experience without any synthetic data generation [13]. Transfer learning approaches using pre-trained CNN networks have improved results on various medical imaging applications, as shown in [8, 13]. Among other methods for localization of spine vertebrae, Glocker et al.'s random forest (RF) regression and Hidden Markov Model achieved benchmark results on the dataset of the MICCAI 2014 Computational Challenge on Vertebrae Localization and Identification [6]; it uses hand-crafted features from CT spine volumes. Chen et al. proposed three stages: coarse vertebra candidate localization, vertebrae identification using a JCNN, and localization refinement with a shape regression model [3]. This method employs a binary

RF classifier, based on HOG features extracted from CT volumes, to separate vertebrae from non-vertebrae, trained using ground truths provided by domain experts. The method succeeded in improving accuracy at the expense of complex computations. Liao et al. proposed a multi-task model that provides short- and long-range contextual information using a CNN and a bi-directional RNN (Recurrent Neural Network) [11]. Wang et al. combined Deep Stacked Sparse Auto Encoder (SSAE) contextual features and a Structured Regression Forest to identify and localize spine vertebrae [14]. This method was evaluated on the MICCAI 2014 test set and 38 local datasets, and the results were compared with previous techniques. None of these methods used transfer-learned feature extraction for the vertebrae localization and identification tasks.

3 Methodology

The overall architecture of the proposed work is shown in Fig. 1. It has three stages,
where each stage serves as a feature extractor for the next stage.
Dataset Description: MICCAI 2014 hosted a challenge titled "Vertebrae Localization and Identification", which consists of 242 training and 60 testing scans of CT spine volumes. Each scan in the training and testing sets is manually annotated with labelled three-dimensional vertebrae centroids; metadata such as the age and gender of the patient is available only for the training set. Within each scan in the

Fig. 1 Overall architecture proposed for the identification and localization of vertebrae centroids
and estimation of age and gender

dataset there is a varying number of CT images, between 31 and 511 grayscale images of size 512 × 512. The vertebrae centroids available are 7 cervical, 12 thoracic, 5 lumbar, and 2 sacral.
Stage 1—Transfer Learned Feature Descriptors: The extraction of feature descriptors from the CT volumes is an essential step in training an automatic localization model. Each scan is converted to feature vectors using a pre-trained Dense network. This network is trained on the ImageNet dataset, and its final Global Average Pooling (GAP) layer is used to extract the feature for each image in the scan volumes. Each scan has N images, represented as $\{I_n\}_{n=1}^{N}$, where each image I is converted to f(I), a transfer-learned feature descriptor of vector length 1664, since the final GAP layer of the Dense network yields a vector of that size.
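A sketch of this feature-extraction stage with Keras is shown below, assuming the DenseNet169 variant that is selected later (Table 1); the resizing of the 512 × 512 slices to 224 × 224 and the grayscale-to-RGB replication are assumptions, since the paper does not state how the slices are fed to the network.

```python
# Sketch: transfer-learned feature descriptors from a pre-trained DenseNet169.
# pooling="avg" gives the Global Average Pooling output, a 1664-dimensional vector per image.
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import DenseNet169
from tensorflow.keras.applications.densenet import preprocess_input

extractor = DenseNet169(weights="imagenet", include_top=False, pooling="avg")

def scan_descriptors(scan):
    """scan: (N, 512, 512) grayscale slices -> (N, 1664) feature descriptors."""
    rgb = np.repeat(scan[..., np.newaxis], 3, axis=-1).astype("float32")  # grayscale -> 3 channels
    rgb = tf.image.resize(rgb, (224, 224)).numpy()                        # assumed input size
    return extractor.predict(preprocess_input(rgb), verbose=0)

fake_scan = np.random.rand(31, 512, 512)      # placeholder CT volume
print(scan_descriptors(fake_scan).shape)       # (31, 1664)
```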
Stage 2—Vertebrae Localization using an LSTM layer: In the converted training set, each scan is represented as $\{f(I_n)\}_{n=1}^{N}$ and has vertebrae centroids $V = \{C_1, C_2, C_3, \ldots, S_1, S_2\}$, where each element is a 3D point represented as (x, y, z). There are 26 vertebrae centroids in total, resulting in a target vector of length 78 (26 × 3); missing centroids are set to zero. The feature descriptors for the training set are used to train an LSTM layer combined with a dense layer; the final fully connected regression layer gives the centroid points.
Stage 3—Age and Gender identification: From the trained LSTM network, the feature just before the regression layer serves as the feature descriptor for each scan. In this module, each scan is represented as f(s), where s is the CT spine volume and f(s) is the feature extracted from a dense layer with 256 nodes. Those features are used to train a random forest regressor to estimate the age and a random forest classifier to identify the gender. Random forests are ensemble machine learning algorithms that can handle both regression and classification [4].
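A sketch of this stage with scikit-learn is shown below; the arrays are placeholders, and the class weighting and number of trees are assumptions reflecting the gender-imbalance handling described in Sect. 4.

```python
# Sketch: age regression and gender classification on the 256-d scan descriptors.
# Array contents are placeholders; class_weight reflects the gender imbalance noted in Sect. 4.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

scan_features = np.random.rand(242, 256)                 # placeholder f(s) descriptors
gender = np.random.randint(0, 2, size=242)               # 0 = female, 1 = male (placeholder)
age = np.random.uniform(10, 100, size=242)

age_scaled = (age - age.min()) / (age.max() - age.min()) # scale target to [0, 1], as in the paper

gender_clf = RandomForestClassifier(n_estimators=200,
                                    class_weight={0: 2.0, 1: 1.0},  # assumed weighting
                                    random_state=0).fit(scan_features, gender)
age_reg = RandomForestRegressor(n_estimators=200,
                                random_state=0).fit(scan_features, age_scaled)
```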

4 Experiments and Results

The feature descriptors obtained by transfer learning are embeddings that are used for vertebrae centroid identification and for age and gender prediction, so a relevant metric should be chosen to evaluate the extracted features. Multiple pre-trained CNN networks are evaluated based on the cosine distance between these embeddings (our feature descriptors). The premise is that each scan maps to a different target vector V, so the cosine distance between images belonging to different scans should indicate divergence, while all images of the same scan are expected to show proximity. The cosine distance between the images is computed as follows:

$D_{same} = \frac{1}{N(N-1)} \sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} \frac{A_i \cdot B_j}{|A_i| \times |B_j|} \qquad (1)$

where $D_{same}$ represents the within-scan distance for a scan with N images. Similarly, an analogous formula is used to calculate the distance between two scans, in which A is the feature

vector of the image belonging to one scan and B is the feature vector of the image
belonging to another scan. The cosine distance between two scans of N1 and N2
images respectively is computed as follows

$D_{diff} = \frac{1}{N_1(N_2-1)} \sum_{i=1}^{N_1} \sum_{j=1,\, j \neq i}^{N_2} \frac{A_i \cdot B_j}{|A_i| \times |B_j|} \qquad (2)$
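The pairwise cosine computations in Eqs. (1) and (2) can be sketched in vectorized NumPy form as follows; the embeddings are random placeholders rather than real DenseNet features.

```python
# Sketch: within-scan and between-scan mean cosine similarity of feature descriptors.
# Vectorized NumPy version of the computations in Eqs. (1) and (2); inputs are placeholders.
import numpy as np

def mean_cosine(A, B, exclude_diagonal=False):
    """Mean pairwise cosine similarity between rows of A and rows of B."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    sims = A @ B.T
    if exclude_diagonal:                       # within-scan case: skip i == j terms
        n = min(len(A), len(B))
        sims = sims.copy()
        sims[np.arange(n), np.arange(n)] = np.nan
        return np.nanmean(sims)
    return sims.mean()

scan_a = np.random.rand(40, 1664)              # placeholder descriptors of one scan
scan_b = np.random.rand(55, 1664)              # placeholder descriptors of another scan
d_same = mean_cosine(scan_a, scan_a, exclude_diagonal=True)
d_diff = mean_cosine(scan_a, scan_b)
print(d_same, d_diff)
```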

The cosine distances obtained for the different pre-trained networks are tabulated in Table 1. The best model is the one with the largest difference between the estimated Dsame and Ddiff. Dsame is estimated for all 242 scans and the mean is computed. Similarly, Ddiff is estimated between pairs of the 242 scans, which produces 58,322 values ((242 × 242) − 242), and the mean is computed. In this way, Densenet 169 is selected as the Stage-1 network that gives the
feature descriptor of length 1664 for each image in the scans, with which the stage-
2 network is trained to identify and localize the vertebrae centroids. The extracted
feature descriptors are of varying sequence length. Each sequence consists of the features of one scan, represented as a 2D matrix of size N × 1664, where N is the number of images in the scan. All sequences are zero-padded to the maximum sequence length (511). These sequences are fed to the LSTM network as shown in Fig. 1. Various experiments on selecting the network are performed and analysed based on the mean localization error; the results of the different experiments are tabulated in Table 2, which shows the difference between the results obtained before and after the preprocessing steps applied to the target vector. The target vector is heavily sparse, as not all centroid points are available for every scan, and it is scaled and transformed based on the minimum and maximum values of the centroid points. The comparative analysis is based on the Mean Localization Error (MLoE), calculated as the mean distance (in mm) between each predicted vertebrae centroid and the manual annotation. From Table 2, it is evident that LSTM cells returning the hidden state of every time step, combined with a GAP layer and the preprocessing steps, improved the results.
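A Keras sketch of the best configuration from Table 2, LSTM(512) + Dense(256) + GAP + Dense(78), is shown below; the masking of zero-padded time steps and the optimizer/loss choice are assumptions, not details stated in the paper.

```python
# Sketch of the LSTM(512) + Dense(256) + GAP + Dense(78) regressor from Table 2.
# Masking of zero-padded time steps and the optimizer/loss choice are assumptions.
from tensorflow.keras import layers, models

MAX_LEN, FEAT_DIM, N_TARGETS = 511, 1664, 78   # 26 centroids x (x, y, z)

model = models.Sequential([
    layers.Input(shape=(MAX_LEN, FEAT_DIM)),
    layers.Masking(mask_value=0.0),                      # ignore zero-padded slices
    layers.LSTM(512, return_sequences=True),             # hidden state of every time step
    layers.TimeDistributed(layers.Dense(256, activation="relu")),
    layers.GlobalAveragePooling1D(),                     # GAP over the sequence axis
    layers.Dense(N_TARGETS),                             # regression of scaled centroids
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```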

Table 1 Comparison between pre-trained networks for feature extraction based on the cosine
distance
Model Cosine distance (same) Cosine distance (diff) Difference in distance
Inception V3 0.246 0.308 0.061
Densenet 169 0.230 0.319 0.089
VGG 19 0.016 0.021 0.004
Xception 0.320 0.406 0.086
Resnet 50 0.036 0.054 0.050

Table 2 Comparison between the experiments performed on localization of vertebrae centroids


Model MLoE (before preprocessing) MLoE (after preprocessing)
LSTM(1024) + Dense(256) + 34.81 17.96
Dense(78)
LSTM(1024) + Dense(256) + 28.72 16.63
GAP + Dense(78)
LSTM(512) + Dense(256) + 28.04 16.11
Dense(78)
LSTM(512) + Dense(256) + 26.33 14.71
GAP + Dense(78)

Table 3 Comparison of proposed results with benchmark results


Method Glocker [6] JCNN [3] Proposed
Region Counts Id.Rate (%) Mean Std Id.Rate (%) Mean Std Id.Rate (%) Mean Std
All 657 74.04 13.20 17.83 84.76 8.82 13.04 85.60 14.71 18.95
Cervical 188 88.76 6.81 10.02 91.84 5.12 8.22 86.44 12.01 15.56
Thoracic 324 61.74 17.35 22.30 76.38 11.39 16.48 81.85 10.87 14.25
Lumbar 113 79.86 13.05 12.45 88.11 8.42 8.62 88.22 12.14 12.84

$V_{scaled} = \frac{V - \min(V)}{\max(V) - \min(V)} \qquad (3)$

The LSTM network without the GAP layer returns only the output of the final LSTM cell rather than that of every cell. Extracting the information provided by every LSTM cell and averaging the features vertically (GAP) before the regression layer (Dense(78)) yields a lower localization error. Furthermore, the results obtained by the best LSTM network are compared with the previous benchmark results that used the same test-set evaluation, as shown in Table 3. Though the network fails to provide localization with lower mean and standard deviation error for the Cervical and Lumbar centroids, it gives better and more uniform identification accuracy across all types of centroids, as shown in Fig. 2a. The method provides better results than Glocker [6] without any classification of images as vertebra or background, as assumed by JCNN [3]. The information on whether an image in the CT spine volume is a vertebra or background is not available in the challenge dataset and requires a domain expert's knowledge. Chen et al. have reported the variation of identification rate across all vertebrae centroids [3], where the identification rate ranges between 30 and 100% and the thoracic region experiences a lower identification rate.
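A minimal Keras-style sketch of the best configuration in Table 2 (LSTM(512) + Dense(256) + GAP + Dense(78)) is given below. The use of a per-timestep (TimeDistributed) Dense layer, the ReLU activation, the Adam optimizer, and the MSE loss are assumptions made for illustration and are not specified in the paper.

```python
from tensorflow.keras import layers, models

def build_stage2_network(max_len=511, feat_dim=1664, n_targets=78):
    inp = layers.Input(shape=(max_len, feat_dim))
    # return_sequences=True exposes the hidden state of every LSTM cell
    x = layers.LSTM(512, return_sequences=True)(inp)
    x = layers.TimeDistributed(layers.Dense(256, activation="relu"))(x)
    x = layers.GlobalAveragePooling1D()(x)     # the "GAP" layer of Table 2
    out = layers.Dense(n_targets)(x)           # 78 regressed centroid values
    return models.Model(inp, out)

model = build_stage2_network()
model.compile(optimizer="adam", loss="mse")
model.summary()
```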
In Fig. 2a and b, the identification rate and the localization error for all the centroids are shown. The proposed method has an identification rate varying between 70 and 90%, resulting in a balanced estimation of the vertebrae centroids.

Fig. 2 a Identification rate of the predicted vertebrae centroids. b Localization error of the predicted
vertebrae centroids

Table 4 Comparison on gender estimation between different validation splits


Validation split     10%               20%               30%
                     Train – 216       Train – 192       Train – 168
                     Validation – 25   Validation – 49   Validation – 73
Accuracy             0.64              0.653             0.6164
F1 score             0.7272            0.7462            0.7142
Confusion matrix     [4 9; 0 12]       [7 13; 4 25]      [10 14; 14 35]

Stage 3 of the proposed model is a novel approach that extends the use of the features extracted from the Stage-2 network. The MICCAI 2014 challenge provided meta information for every scan in the training set, which was not utilized in any previous approach since the main goal was to identify and localize the vertebrae centroids [1]. This extension is evaluated with different validation splits of the training set, because the metadata is available only for the training set. The features obtained from the GAP layer of the LSTM network are used to estimate gender and age. In the training set, the distribution of the data between the two genders is imbalanced, so a class-weighted model with more weight on the Female class is trained on different validation splits, as shown in Table 4.
In all three splits, the accuracy, F1 score, and confusion matrix obtained on the validation data are compared. The model is evidently capable of estimating the gender of the patient from the spine scans with an F1 score of 0.70. The same features that are used to train the gender classifier are utilized to identify the age of the patients.
The age range of the patients in the training data varies between 10 and 100, which is a wide space, so the target variable (age) is transformed into the range between 0 and 1 and trained using a random forest regressor. The results obtained on the validation splits used in the gender-identification experiments are compared based on the Mean Absolute Error (MAE) in Table 5.

Table 5 Mean absolute error (MAE) between the age ranges among different validation splits
Validation split (%)   10–20   20–30   30–40   40–50   50–60   60–70   70–80   80–90   90–100   MAE on validation
10                     0       6.22    0       2.84    1.31    5.70    4.90    0       0        5.74
20                     0       6.54    1.13    2.33    4.44    3.70    4.14    6.25    0.7      5.97
30                     0       7.74    1.23    2.46    4.60    3.64    4.22    5.98    1.10     5.59

Since the age is estimated with a mean absolute error of no more than 6, the model can evidently estimate the age from spine scans with only a small difference from the actual age of the patient. The error obtained for the age ranges in the validation data is tabulated in Table 5. The error is lower for scans belonging to patients younger than 80 and older than 30, which might be due to the larger amount of training data for those age ranges.
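The Stage-3 extension can be sketched with scikit-learn as below. The classifier choice, the feature dimensionality, and the random stand-in data are assumptions for illustration only; the paper specifies only a class-weighted gender model and a random forest regressor trained on a [0, 1]-scaled age target.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import f1_score, mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((241, 256))                 # GAP-layer features (dimensionality assumed)
gender = rng.integers(0, 2, 241)           # two gender classes (imbalanced in reality)
age = rng.uniform(10, 100, 241)            # ages span roughly 10-100 years

X_tr, X_va, g_tr, g_va, a_tr, a_va = train_test_split(
    X, gender, age, test_size=0.10, random_state=0)      # the 10% split of Table 4

# gender: class weighting compensates for the imbalance toward one class
clf = RandomForestClassifier(class_weight="balanced", random_state=0).fit(X_tr, g_tr)
print("F1:", f1_score(g_va, clf.predict(X_va)))

# age: scale the target to [0, 1], regress with a random forest, then rescale back
lo, hi = a_tr.min(), a_tr.max()
reg = RandomForestRegressor(random_state=0).fit(X_tr, (a_tr - lo) / (hi - lo))
print("MAE (years):", mean_absolute_error(a_va, reg.predict(X_va) * (hi - lo) + lo))
```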

5 Conclusion and Future Work

Clinical spine diagnosis and spine disease trend analysis can be assisted with a multi-task data-driven model that localizes spine vertebrae centroids and identifies the age and gender using spine CT volumes. The proposed model handles the disadvantage of a limited sample dataset through transfer-learning feature extraction, using a novel performance analysis to find the right transfer-learning network. The model also performs uniformly across all types of spine vertebrae, producing identification rates between 70 and 90%. The novel extension of the model is not limited to identifying age and gender but can also be extended to cluster the scans belonging to the same patient; this is tabled for future research, since the metadata in the challenge dataset also includes annotations of scans belonging to the same patient. Though the proposed algorithm outperforms the benchmark results in terms of the identification rate of the vertebrae centroids, the model can be hyperparameter-tuned to further reduce the localization error.

References

1. http://csi-workshop.weebly.com/challenges.html
2. Bowles, C., Chen, L., Guerrero, R., Bentley, P., Gunn, R., Hammers, A., Dickie, D.A., Hernán-
dez, M.V., Wardlaw, J., Rueckert, D.: GAN augmentation: augmenting training data using
generative adversarial networks. arXiv preprint arXiv:1810.10863 (2018)
3. Chen, H., Shen, C., Qin, J., Ni, D., Shi, L., Cheng, J.C., Heng, P.A.: Automatic localization and identification of vertebrae in spine CT via a joint learning model with deep neural networks. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 515–522. Springer (2015)
4. Cutler, A., Cutler, D.R., Stevens, J.R.: Random forests. In: Ensemble Machine Learning, pp.
157–175. Springer (2012)
5. Frid-Adar, M., Diamant, I., Klang, E., Amitai, M., Goldberger, J., Greenspan, H.: GAN-based
synthetic medical image augmentation for increased CNN performance in liver lesion classi-
fication. Neurocomputing 321, 321–331 (2018)
6. Glocker, B., Zikic, D., Konukoglu, E., Haynor, D.R., Criminisi, A.: Vertebrae localization
in pathological spine CT via dense classification from sparse annotations. In: International
Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 262–270.
Springer (2013)
7. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville,
A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing
Systems, pp. 2672–2680 (2014)
8. Hon, M., Khan, N.M.: Towards Alzheimer’s disease classification through transfer learning. In:
2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 1166–
1169. IEEE (2017)
9. Hussain, Z., Gimenez, F., Yi, D., Rubin, D.: Differential data augmentation techniques for
medical imaging classification tasks. In: AMIA Annual Symposium Proceedings. vol. 2017,
p. 979. American Medical Informatics Association (2017)
10. Kwasigroch, A., Mikołajczyk, A., Grochowski, M.: Deep neural networks approach to skin
lesions classification—a comparative analysis. In: 2017 22nd International Conference on
Methods and Models in Automation and Robotics (MMAR), pp. 1069–1074. IEEE (2017)
11. Liao, H., Mesfin, A., Luo, J.: Joint vertebrae identification and localization in spinal CT images
by combining short- and long-range contextual information. IEEE Trans. Med. Imaging 37(5),
1266–1275 (2018)
12. Schmidt, S., Kappes, J., Bergtholdt, M., Pekar, V., Dries, S., Bystrov, D., Schnörr, C.: Spine
detection and labeling using a parts-based graphical model. In: Biennial International Confer-
ence on Information Processing in Medical Imaging, pp. 122–133. Springer (2007)
13. Van Opbroek, A., Ikram, M.A., Vernooij, M.W., De Bruijne, M.: Transfer learning improves
supervised image segmentation across imaging protocols. IEEE Trans. Med. Imaging 34(5),
1018–1030 (2014)
14. Wang, X., Zhai, S., Niu, Y.: Automatic vertebrae localization and identification by combining
deep SSAE contextual features and structured regression forest. J. Digit. Imaging 32(2), 336–
348 (2019)
Punjabi Children Speech Recognition
System Under Mismatch Conditions
Using Discriminative Techniques

Harshdeep Kaur, Vivek Bhardwaj, and Virender Kadyan

Abstract It is a very difficult challenge to recognize children's speech with Automatic Speech Recognition (ASR) systems built using adult speech. In such ASR tasks, significantly degraded recognition performance is observed, as noted by several earlier studies. This is primarily related to the significant inconsistency between the two groups of speakers in their acoustic and linguistic attributes. One of the main causes of this mismatch is that adult and child vocal organs are of substantially different dimensions. Discriminative approaches are known to deal effectively with the effects arising from these differences. Specific parameter variations, with different boosting and iteration values, have been introduced to find the optimum configuration of the acoustic models trained with boosted maximum mutual information (bMMI) and feature-space bMMI (fbMMI). Experimental results demonstrate that the feature-space discriminative approaches achieve a significant reduction in the Word Error Rate (WER). It is also shown that fbMMI achieves better performance than bMMI and fMMI. Recognition for children and the elderly will need even more study if we are to account for these age groups' features in existing and future speech recognition systems.

Keywords ASR · Children speech recognition · Kaldi toolkit · Discriminative


techniques · Acoustic mismatch

1 Introduction

Speech recognition devices are usually based on adult data. While the latest tech-
nology of speech recognition is not yet ideal for adults, the task of building children’s

H. Kaur · V. Bhardwaj (B)


Chitkara University, Institute of Engineering and Technology, Chitkara University, Punjab, India
e-mail: vivek.bhardwaj@outlook.in
V. Kadyan
Department of Informatics, School of Computer Science, University of Petroleum and Energy
Studies, Dehradun, India


spoken dialog applications faces far greater challenges [1]. Children are an impor-
tant segment of users that will benefit from advances in multimedia technology.
In multimedia games and computer instructional material, children are one of the
primary potential users of computers for conversational interaction. By using spoken
language interfaces, children are generally comfortable and happy. In order to make
ASR interfaces more interesting to interact with, it is important that they understand
and adapt the language of the user to match or complement the speech of the user
[2].
The topic of children's speech recognition has been gaining attention in recent years [3]. Wilpon's study [4] shows that recognition of children's speech is more difficult than that of adults. He also noticed that some formant information is missing in children's speech transmitted at telecommunications bandwidth, and he lowered the number of Linear Predictive Coding (LPC) coefficients in his recognizer to account for this trend. Another study [1] explored the utility of frequency warping to account for shifts in the frequency spectrum owing to the smaller vocal tracts of children. Some of the mistakes were considered to be related to grammar problems. Further attention had to be given to the features relevant to ASR for children. It is well recognized that young children's speech habits differ significantly from those of adults. Differences between the acoustic properties of speech in children and adults are studied in [5]. For digit and phrase recognition tasks, this form of degradation is examined in [6].
In this paper, we discuss some aspects of our work on a Punjabi children's speech recognition system developed under mismatched acoustic conditions for continuous speech. We used discriminative techniques for this work; discriminative training techniques are key components of current technology and a major area of speech recognition research [7]. These techniques have achieved substantial improvements for small-vocabulary tasks on small datasets.
Standard databases such as TIMIT and ATIS are available for languages such as English, but the key obstacle in Punjabi speech research, or that of any other Indian language, is the lack of resources such as speech and text corpora. In this paper, the confusion among word patterns is handled on a small children's training dataset using discriminative methods that sharpen the separation between correct and incorrect word sequences.
Apart from the introduction in Sect. 1, the rest of the paper is organized as follows. The relevant work is presented in Sect. 2. Section 3 describes the theoretical background. The experimental setup is given in Sect. 4, and the experimental results are provided in Sect. 5. The work is finally summarized in Sect. 6.

2 Related Work

Li and Huang [8] proposed an auditory-based feature extraction algorithm. The authors applied Cochlear Filter Cepstral Coefficient (CFCC) features for speaker identification to resolve acoustic mismatch conditions between training and test environments. Typically, a system's output drops significantly when it is trained on clean speech and examined on noisy data. In this type of situation, CFCC performs better than the baseline Mel Frequency Cepstral Coefficients (MFCC) under the three mismatched conditions of car noise, white noise, and babble noise. Both MFCC and CFCC worked well when the data was clean, but the precision of MFCC decreases at a signal-to-noise ratio of 6 decibels, whereas CFCC can still achieve higher precision than MFCC. CFCC does better than PLP and RASTA under white noise but performs similarly to PLP and RASTA under car and babble noise.
Giuliani and Gerosa [5] examined speech recognition for children in the context of a phone recognition task. They analyzed phone recognition by comparing two experimental configurations, in which children obtained the lower accuracy of 77.30%, compared with 79.43% obtained for adult phone recognition. The outcomes for many child speakers were as strong as for adults. Compared with the reference system under mismatch conditions, Vocal Tract Length Normalization (VTLN) gives a relative reduction of 10.5 and 5.3% for adults and children, respectively. When they recognized children's speech with the baseline system, they obtained a low recognition performance of 58.11%. When they applied VTLN, they obtained better recognition performance on the same dataset with the system trained on children, up to 66.43%.
Das et al. [1] conducted several experiments with children's data to develop a speech recognition system for children. Using certain command-and-control data, they found a gain from frequency warping. They designed the acoustic and language models and analyzed word recognition results in different configurations, where a WER of 10.4 was achieved by the proposed construction.
Li and Russell [9] studied speech recognition quality for a small group of children. Grammar is proposed to be a major influence on the quality of speech recognition. Using a personalized dictionary improves the performance of the ASR, but the change is small. Quality deterioration due to poor speech, combined with degradation due to the use of telecommunications bandwidth, is proposed to account for most of the recorded differences in performance between adults and children.
Lee et al. [3] reported on a collection of temporal and acoustic parameters calculated from a recently compiled speech corpus of 436 participants between 5 and 18 years of age and 56 adults. Their findings indicated that a major trend correlated with speech development in normal children is the decrease, with age, in the magnitude and within-subject variability of both temporal and spectral acoustic parameters.
Arunachalam et al. [2] presented a discourse analysis of child–machine interactions in spoken language. Their results indicate that, with no obvious gender differences, younger children are less likely to use polite markers and make more direct requests for information compared with older ones.
Narayanan and Potamianos [10] reported results on the feasibility of creating a conversational framework for children. Speaker normalization and model adaptation were used to improve speech recognition performance. Overall, the prototype was a positive first effort toward building a children's multimodal program with an emphasis on conversational language.
Kathania et al. [11] investigated the possibility of deliberately adjusting the pitch of children's voices to minimize the reported pitch differences between the two speaker classes. This explicit pitch reduction gives a significant improvement in recognition quality. The feasibility of the suggested methods was tested on ASR models trained on adult speech using different acoustic modeling paradigms, i.e., Gaussian Mixture Model (GMM), subspace GMM, and Deep Neural Network (DNN). It was observed that the suggested approaches were highly effective in all the modeling paradigms studied.
Shahnawazuddin et al. [12] introduced their efforts to improve the quality of a keyword spotting system for children's speech under a limited-data scenario. They addressed two different ways of implementing prosody modification effectively. Data augmentation based on prosody modification helps to improve results on adult speech as well. The pitch of the children's test utterances is lowered, and the speaking rate is increased significantly. The performance attained by prosody adjustment is much higher than that obtained through VTLN. It is also observed that data augmentation is very effective in improving the performance of KWS for children's speech.

3 Theoretical Background

3.1 Discriminative Techniques

Discriminative training is a supervised learning approach that minimizes the discrepancy between the training labels and the recognition outcomes. This article focuses mainly on MMI, though other training criteria are available. The objective function of MMI is given as
$$F_{\text{MMI}}(\lambda) = \sum_{r=1}^{R} \log \frac{p_\lambda\big(\{x_t\}_r \mid H_{s_r}\big)^{K}\, p_L(s_r)}{\sum_{s} p_\lambda\big(\{x_t\}_r \mid H_s\big)^{K}\, p_L(s)} \qquad (1)$$

where R is the number of training utterances and {x_t}_r is the feature sequence of the rth utterance. The extended Baum-Welch algorithm optimizes the acoustic model parameters. H_sr and H_s are the HMM sequences of the correct transcription s_r and of a recognition hypothesis s, respectively. p_λ is the acoustic model probability, K is the acoustic scale, and p_L is the language model probability. Hypotheses containing many errors should be penalized more strongly, and robustness is improved by weighting them with phoneme accuracies, which leads to boosted MMI:

$$F_{\text{bMMI}}(\lambda) = \sum_{r=1}^{R} \log \frac{p_\lambda\big(\{x_t\}_r \mid H_{s_r}\big)^{K}\, p_L(s_r)}{\sum_{s} p_\lambda\big(\{x_t\}_r \mid H_s\big)^{K}\, p_L(s)\, e^{-b A(s, s_r)}} \qquad (2)$$

where A(s, s_r) is the phoneme accuracy of hypothesis s against the reference s_r, and the boosting factor b (b > 0) controls how strongly erroneous paths are emphasized. We compare MMI and bMMI performance against the ML baseline.
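As an illustration of how the boosting term acts, the toy NumPy sketch below evaluates the per-utterance bMMI objective over a handful of competing hypotheses. The acoustic scale, boosting factor, and scores are illustrative values, not the ones used in the experiments.

```python
import numpy as np

def bmmi_utterance(ac_loglik, lm_logprob, accuracy, ref_idx=0, k=0.1, b=0.5):
    """Toy per-utterance evaluation of the bMMI objective of Eq. (2).
    All inputs are 1-D arrays over the competing hypotheses of the lattice;
    ref_idx marks the reference transcription s_r."""
    # log numerator: scaled acoustic score + LM score of the reference
    log_num = k * ac_loglik[ref_idx] + lm_logprob[ref_idx]
    # log denominator: sum over all hypotheses, boosted by -b * A(s, s_r),
    # which up-weights paths that make more errors (low phone accuracy)
    log_terms = k * ac_loglik + lm_logprob - b * accuracy
    log_den = np.logaddexp.reduce(log_terms)
    return log_num - log_den        # setting b = 0 recovers plain MMI

# three competing hypotheses: the reference plus two more error-ful paths
print(bmmi_utterance(ac_loglik=np.array([-100.0, -101.0, -103.0]),
                     lm_logprob=np.array([-8.0, -7.5, -9.0]),
                     accuracy=np.array([1.0, 0.7, 0.4])))
```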

3.1.1 Discriminative Feature Transformation

Aside from discriminative model training, a feature transformation based on discriminative training criteria can also be used. This approach estimates a matrix M that projects high-dimensional non-linear features onto low-dimensional transformed features, as shown in Eq. (3):

$$y_t = x_t + M h_t \qquad (3)$$

where x_t is the initial K-dimensional feature, h_t is the L-dimensional non-linear expanded feature, and y_t is the transformed feature. The dimensions of matrix M are K × L. We verify the utility of fMMI and its boosted feature-space extension (fbMMI). The features are constructed in the same manner in both, but the training objective function differs. After substituting y of Eq. (3) for x in Eqs. (1) and (2), we get the objective function of fMMI:
$$F_{\text{fMMI}}(M) = \sum_{r=1}^{R} \log \frac{p_\lambda\big(\{y_t\}_r \mid H_{s_r}\big)^{K}\, p_L(s_r)}{\sum_{s} p_\lambda\big(\{y_t\}_r \mid H_s\big)^{K}\, p_L(s)} \qquad (4)$$

The objective function F is differentiated with respect to M as

$$\frac{\partial F}{\partial M} = \left[ \frac{\partial F}{\partial y_1} \cdots \frac{\partial F}{\partial y_{T_f}} \right] \left[ h_1 \cdots h_{T_f} \right]^{T} \qquad (5)$$

where T denotes the transpose and T_f is the total number of frames. The objective function of fbMMI is constructed similarly. The optimal matrix M is obtained by gradient descent. An N-component GMM is obtained by clustering the Gaussians of the original triphone acoustic models into N components and re-estimating their parameters. The high-dimensional features h_t are built, for the nth Gaussian component, from the block

$$\left[ p_{t,n}\frac{x_{t,1}-\mu_{n,1}}{\sigma_{n,1}},\;\ldots,\; p_{t,n}\frac{x_{t,K}-\mu_{n,K}}{\sigma_{n,K}},\; \alpha p_{t,n} \right]^{T} \qquad (6)$$

where μ_{n,i} and σ_{n,i} are the mean and variance of dimension i of the nth Gaussian component, and α is a scaling factor. For each frame, p_{t,n} is the posterior of Gaussian component n, computed so that all posteriors except the Q-best are set to zero. This calculation keeps h_t sparse and minimizes the computational cost.
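A toy NumPy sketch of Eqs. (3) and (6) is given below. It treats σ as the standard deviation and uses illustrative shapes and values; the actual implementation (Kaldi's fMMI/fbMMI code) is not reproduced here.

```python
import numpy as np

def fmmi_transform(x, means, sigmas, weights, M, alpha=1.0, q_best=5):
    """Build the sparse offset vector h_t from Gaussian posteriors (Eq. (6))
    and apply y_t = x_t + M h_t (Eq. (3)).
    x: (K,) frame; means, sigmas: (N, K); weights: (N,); M: (K, N*(K+1))."""
    var = sigmas ** 2
    # diagonal-covariance Gaussian posteriors p_{t,n} for this frame
    ll = np.log(weights) - 0.5 * np.sum(((x - means) ** 2) / var + np.log(var), axis=1)
    post = np.exp(ll - ll.max())
    post /= post.sum()
    # zero all but the Q-best posteriors so that h_t stays sparse and cheap
    post[np.argsort(post)[:-q_best]] = 0.0
    # block for component n: [p (x_1-mu_1)/s_1, ..., p (x_K-mu_K)/s_K, alpha*p]
    h = np.concatenate([np.append(p * (x - mu) / s, alpha * p)
                        for p, mu, s in zip(post, means, sigmas)])
    return x + M @ h

# toy usage: 40-dim frame, 4 Gaussian components, small random projection
K, N = 40, 4
rng = np.random.default_rng(0)
y = fmmi_transform(rng.standard_normal(K), rng.standard_normal((N, K)),
                   np.ones((N, K)), np.full(N, 1 / N),
                   0.01 * rng.standard_normal((K, N * (K + 1))))
```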

4 Experimental Setup

The experiments were performed to confirm the efficiency of discriminative sequence training. The analysis was carried out on the Punjabi children's speech corpus under mismatch conditions. Separate systems were trained using the two speech corpora (child/adult). The first is a child speech recognition system; its corpus and results are given in [13]. The second system was trained on the adult speech corpus and tested on the child speech corpus. The adult speech corpus consists of a total of 3353 utterances. Both datasets were used to analyze the performance of discriminative techniques under mismatch conditions. One mismatch recognition system was trained on 3353 utterances spoken by adults and tested on 1440 utterances spoken by child speakers. The other mismatch recognition system was trained on 3859 utterances spoken by adult and child speakers and tested on 1653 utterances spoken by child speakers. All children's and adult speech was sampled at 16,000 Hz. The acoustic models were trained on these databases, and a corresponding 5k language model was used. The input speech signal was analyzed with 13 standard MFCC + Delta + Double Delta coefficients to produce acoustic features. Linear Discriminant Analysis (LDA) applied to these acoustic features was found to improve training on the limited-vocabulary dataset substantially. The 13 MFCC elements, spliced with neighbouring frames, resulted in 117 dimensions that were further reduced to 40 dimensions through LDA. These features were then used with HMM state alignments in a triphone layout. Creating such high-dimensional acoustic features can degrade the efficiency of the model and makes it harder to adapt the feature space; therefore, MLLT was used to transform the feature space using the state-conditional covariances of the combined features. Feature-space maximum likelihood linear regression (fMLLR) was applied to achieve a substantial improvement and was therefore helpful for speaker adaptation.
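The splicing-plus-LDA step can be illustrated with NumPy and scikit-learn as follows. A context of ±4 frames is inferred from 13 × 9 = 117, and the toy alignment labels stand in for the real HMM-state alignments; the actual systems were built with Kaldi's LDA + MLLT recipes.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def splice(frames, context=4):
    """Stack each 13-dim MFCC frame with +/-4 neighbours -> 13 * 9 = 117 dims."""
    T, d = frames.shape
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + T] for i in range(2 * context + 1)])

# toy data: T frames of 13-dim MFCCs with a tied-state label per frame
T = 2000
mfcc = np.random.randn(T, 13)
states = np.random.randint(0, 48, T)      # stand-in HMM-state alignment labels

spliced = splice(mfcc)                    # shape (T, 117)
lda = LinearDiscriminantAnalysis(n_components=40).fit(spliced, states)
reduced = lda.transform(spliced)          # shape (T, 40), fed to the GMM-HMM system
```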

5 Experimental Results

Discriminative training is performed in both feature and model space using a variant of the MMI criterion called boosted (or margin-based) MMI (bMMI). The objective function used is a modification of bMMI that uses a frame-based, state-level loss function instead of a phone-based accuracy measure. Result analysis was conducted in this section utilizing four variants of the MMI discriminative approach (MMI, bMMI, fMMI, and fbMMI) to obtain the following results:

Table 1 Word error rate using MMI, boosted MMI model on boosted value 0.25, and fMMI, fbMMI
on the iteration value 3
Dataset (train/test) WER%
MMI bMMI fMMI fbMMI
Adult/child 63.72 61.62 55.25 53.49
Adult, child/child 21.32 20.23 17.93 16.06

• Variation in error rate with MMI boost value level.


• Error rate variance observed with iteration value number in fbMMI.
The system was developed using the Kaldi toolkit [14] and was built on the Ubuntu operating system, a Linux platform. System recognition performance was evaluated in terms of WER. The results for child speech recognition are given in [13]. Additionally, the output of the recognition system trained on mixed data from the adult and child corpora is shown in Table 1.

5.1 Experimental Results with Varying Boosted Parameter


Values

The primary step in the discriminative training sequence is the generation of numerator and denominator lattices. Lattice generation relies on the forced alignment of the transcriptions. We have adapted Kaldi's MMI recipes to fit our data, to the best of our knowledge.
In order to decode for the MMI process, four iterations (by default) of stochastic gradient descent are used. Further, we used the boosting factor with MMI to boost the likelihood of paths with more errors. bMMI performs better than MMI due to the refined constant learning rate of 0.00001 and I-smoothing as regularization, as shown in Table 1. Moreover, the boosting factor was investigated with values varied over [0.25, 0.5, 0.75, 1.0, 2.0], and the system obtained the lowest WER at a bMMI boost value of 0.25.

5.2 Experimental Results with Varying Number of Iteration


Values

Before starting feature-space discriminative training, a diagonal-covariance mixture of 250 Gaussians with a silence weight of 0.5 was trained. A total of eight iterations with a boost value of 0.25 and a learning rate of 0.001 were employed for training feature-space bMMI. The denominator states are used, and the lattice is re-scored on all eight iterations, producing the feature transformation needed for robust discriminative training in the feature space. The obtained WER shows that the system reached its best output at fbMMI iteration value 3, as shown in Table 1.

6 Conclusion and Future Work

This paper presents a Punjabi-language speech recognition system for children's speech under mismatch conditions. The experiments were repeated for the child and adult speech corpora described in Sect. 5. Discriminative techniques were explored for both matched and mismatched training and testing conditions. The presented framework was developed using one family of discriminative techniques and its variants, i.e., MMI, bMMI, fMMI, and fbMMI. Significant improvements were obtained by using discriminative training methods for limited-vocabulary tasks on small datasets. Recognition performance declines significantly under mismatch conditions: the WER for mismatch conditions is considerably higher than for matched conditions. The primary cause of this loss is the acoustic mismatch between the training and test speech data. The boosting factor was investigated with different values, and the system obtained the lowest WER at a bMMI boost value of 0.25. The obtained WER shows that the system reached its best output at fbMMI iteration value 3. It has been analyzed that fbMMI is a more promising technique than MMI, bMMI, and fMMI, based on the results presented in this paper.
In the mismatched speech recognition process, acoustic properties of the speech signal such as pitch, formant frequencies, fundamental frequency, and speaking rate play an important role in achieving good performance. There are many differences between the acoustic properties of children's and adult speech signals. So, in the future, by enhancing the pitch and acoustic properties of the children's speech signal, the performance of the Punjabi children's speech recognition system can be increased.

References

1. Das, S., Nix, D., Picheny, M.: Improvements in children’s speech recognition performance.
In: Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal
Processing, ICASSP’98 (Cat. No. 98CH36181), vol. 1, pp. 433–436. IEEE (1998, May)
2. Arunachalam, S., Gould, D., Andersen, E., Byrd, D., Narayanan, S.: Politeness and frus-
tration language in child-machine interactions. In: Seventh European Conference on Speech
Communication and Technology (2001)
3. Lee, S., Potamianos, A., Narayanan, S.: Acoustics of children’s speech: developmental changes
of temporal and spectral parameters. J. Acoust. Soc. Am. 105(3), 1455–1468 (1999)
4. Wilpon, J.G., Jacobsen, C.N.: A study of speech recognition for children and the elderly. In:
1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference
Proceedings, vol. 1, pp. 349–352. IEEE (1996, May)

5. Giuliani, D., Gerosa, M.: Investigating recognition of children’s speech. In: Proceedings
of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003.
(ICASSP’03), vol. 2, pp. II-137. IEEE (2003, April)
6. Potamianos, A., Narayanan, S., Lee, S.: Automatic speech recognition for children. In: Fifth
European Conference on Speech Communication and Technology (1997)
7. Heigold, G., Ney, H., Schluter, R., Wiesler, S.: Discriminative training for automatic speech
recognition: modeling, criteria, optimization, implementation, and performance. IEEE Signal
Process. Mag. 29(6), 58–69 (2012)
8. Li, Q., Huang, Y.: An auditory-based feature extraction algorithm for robust speaker identifica-
tion under mismatched conditions. IEEE Trans. Audio Speech Lang. Process. 19(6), 1791–1801
(2010)
9. Li, Q., Russell, M.J.: An analysis of the causes of increased error rates in children’s speech
recognition. In: Seventh International Conference on Spoken Language Processing (2002)
10. Narayanan, S., Potamianos, A.: Creating conversational interfaces for children. IEEE Trans.
Speech Audio Process. 10(2), 65–78 (2002)
11. Kathania, H.K., Ahmad, W., Shahnawazuddin, S., Samaddar, A.B.: Explicit pitch mapping for
improved children’s speech recognition. Circ. Syst. Sig. Process. 37(5), 2021–2044 (2018)
12. Shahnawazuddin, S., Maity, K., Pradhan, G.: Improving the performance of keyword spotting
system for children’s speech through prosody modification. Digit. Signal Proc. 86, 11–18 (2019)
13. Kaur, H., Kadyan, V.: Feature space discriminatively trained Punjabi children speech
recognition system using Kaldi toolkit. Available at SSRN 3565906 (2020)
14. Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Silovsky, J.: The
Kaldi speech recognition toolkit. In: IEEE 2011 Workshop on Automatic Speech Recognition
and Understanding (No. CONF). IEEE Signal Processing Society (2011)
Effective Irrigation Management System
for Agriculture Using Machine Learning

S. T. Patil, M. S. Bhosale, and R. M. Kamble

Abstract Farming of different crops is a major source of income for farmers in India, but there are many factors that affect the farming business. One of the important factors is an efficient water supply for the crop. The work in this paper proposes an effective irrigation system that helps to increase crop productivity by regulating the water requirement of the crop with a machine learning approach. Images of different farmland are studied and classified depending upon the soil type and its properties, such as the water requirement under different conditions. Image processing is applied to the land images to understand the current soil condition. This phase is followed by the application of a decision tree and a random forest to decide whether water is required or not. If the answer is yes, then linear regression is used to calculate the duration of water flow.

Keywords Classification · Regression · Decision tree · Random forest ·


Agriculture · Image processing

1 Introduction

Agriculture is an important sector of the Indian economy, as it contributed 17–18% of the total GDP of the country (2018). Indian farmers grow different crops in different parts of the country, depending upon the weather conditions of the region and the soil type. In addition to the weather conditions and the properties of the soil, effective watering of the crop is one of the important aspects of farming. As Dhawan [1]

S. T. Patil (B)
Department of CSE, Sanjay Ghodawat University, Kolhapur, India
e-mail: sangram.patil@sanjayghodawatuniversity.ac.in
M. S. Bhosale
Department of Computer Science and Engineering, TKIET, Warnanagar, Kolhapur, India
e-mail: msbhosale@tkietwarana.ac.in
R. M. Kamble
Department of Computer Science and Engineering, ADCET, ASTHA, Ashta, Kolhapur, India
e-mail: rmk_cse@adcet.in


explained, good seeds and fertilizers fail to achieve their full potential if crops are not watered optimally. The fresh water requirement of industry and agriculture is increasing rapidly, but uncertainty in rainfall, limited reservoir capacity, and low groundwater levels are a big threat to the farming/agriculture sector. To cope with this situation and to make better and more efficient use of the existing water, we need an effective irrigation management system.
Sugarcane is a major crop in southern Maharashtra, and a proper percentage of water and moisture in the soil yields better productivity. Sugarcane is a very water-intensive crop, as it requires a large quantity of water. Currently, sugarcane farmers do not have any economical resource/equipment to guide them toward effective and efficient watering. They guess the water requirement from past experience and flood the water into the farm, which leads to huge wastage of water. Some farmers use sprinklers to minimize water wastage. In recent years, some low-cost sensors have been developed to detect the water content in soil [10]. In [11], Barapatre and Patel designed IoT-based sensors for irrigation systems, but they do not suggest what water quantity is required to maintain the required moisture. The work presented in this paper estimates the water requirement of a crop based on a given image of the soil with a machine learning approach.
The next section gives a literature survey. The third section explains the methods used for data collection. The fourth section explains the methodology, and the fifth section presents the result analysis, including numerical comparisons of these methods. The last section gives concluding comments.

2 Literature Survey

dos Santos [2] presented the relation between soil and moisture with the help of images taken by a digital camera, with the images adjusted for white balance. In his study, he derived an equation that relates the moisture present in the soil to the captured image. He used data provided by the Federal University of Vicosa.
Fitton [3] studied farming in Africa, China, Europe, and Asia. The study observed that proper watering can increase productivity and explained how water affects the productive capacity of land/soil: too much or too little water decreases productivity, while proper watering increases it.
Ashok [4] studied how captured images can be used to detect and treat different diseases. After capturing the image, some preprocessing is done, from which the generated histogram classifies or detects the disease on the plant. This study is helpful for detecting disease from an image.
Khan [5] uses a decision tree method to estimate how much water is available in a region. He also tried to estimate the moisture in the soil and showed how data mining can be used for estimating water availability in that region. This estimation helps to obtain a good crop.

Dhawan [1] presented a detailed analysis of water and agriculture in India. One outcome of the research is that water efficiency in the country should be increased by making the best use of available technologies.

3 Dataset

For our study, we selected eight different pieces of land of the same size, chosen based on their water-holding capacity. The information about each piece of land is collected in the form of four digital images, one from each corner of the land, together with physical soil samples. The collected soil samples are studied for the moisture content of the soil and other properties. The wind flow rate and the current temperature of the day are also important factors that affect the moisture content of the soil; therefore, while collecting soil samples, the wind flow rate and temperature of that region are also recorded and maintained with the images taken. We then studied the total quantity of water irrigated for sugarcane over the period and measured the final sugarcane yield.
For the experimental study, we prepared a dataset of images. The images were taken twice a day from every piece of land for six months of the sugarcane crop cycle, excluding the rainy season. In this way we collected nearly 11,520 images, 5760 soil samples, and 1440 temperature and wind-flow readings. We considered the standard moisture content that is expected to be present in the land/soil to increase productivity at every stage of the crop.

4 Proposed System Architecture and Methodology

The overall flow of our work is represented in Fig. 1. First, digital images were captured at a height of 1 foot from the surface of the land/soil. The captured images are then taken as input to our system. Since the final analysis depends on the captured images, we preprocessed the images to remove unwanted parts. The images are stored in RGB format with a size of 256 × 256; they are also stored in an 8 × 8 representation for further processing.
The images are classified into three classes. This classification depends on the water requirement of the soil, i.e., low, moderate, and high. From this classification [5], we can also get the current moisture of the land/soil. Then, the images are provided to decision trees [4]. The output of the decision trees is provided as input to a random forest [4], which decides whether water is required for the land/soil or not. If irrigation is required, then linear regression decides how much water is required.
The following equation is used to calculate the water need of the crop.
W = EM − CM (1)

Fig. 1 System architecture: image acquisition → image preprocessing → image classification (against the dataset) → random forest decision (irrigation required / not required) → linear regression to determine how much water is required

where W is the amount of water required, EM is the expected moisture in the land/soil, and CM is the current moisture in the land/soil.
The expected moisture (EM) depends upon which type of sugarcane is selected, the date of sowing, the current date, and the soil type. This information is taken from the government offices/agriculture departments of the region.
The time for which water should be irrigated is calculated as

T = W / F (2)
where T is the time in minutes and F is the flow rate of water from the irrigation system installed in the land/soil.
We also calculate the total water consumption (TWC) of the month using the formula

TWC = F × T (3)

The algorithm is as follows


Input: Images from land/soil.
Output: Quantity of water required in time format.
Step 1: Capture the image.
Step 2: Preprocess the image.
Step 3: Find the class of the image and get the value of EM and CM.
Step 4: Process the random forest.
Step 5: If output of step 4 is no, then go to step 8.
Step 6: Input EM, CM, current temperature, wind flow rate to linear regression.
Step 7: Get the T from linear regression.
Step 8: End.
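As a concrete illustration of Eqs. (1)–(3) and of Steps 5–7 of the algorithm, the following is a minimal Python sketch. The function and variable names and the sample values are assumptions; EM and CM are taken as already produced by the classification stage.

```python
def irrigation_time(expected_moisture, current_moisture, flow_rate):
    """Eq. (1): water need W = EM - CM; Eq. (2): valve-open time T = W / F;
    Eq. (3): total water consumption TWC = F * T."""
    water_needed = expected_moisture - current_moisture      # Eq. (1)
    if water_needed <= 0:
        return 0.0, 0.0                                       # irrigation not required
    minutes = water_needed / flow_rate                        # Eq. (2)
    consumption = flow_rate * minutes                         # Eq. (3)
    return minutes, consumption

# example: soil should hold 35 units of moisture, currently holds 22,
# and the pump delivers 1.5 units per minute (all values illustrative)
t, twc = irrigation_time(35, 22, 1.5)
print(f"run the pump for {t:.1f} min, consuming {twc:.1f} units of water")
```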

5 Experimental Results

Initially, we checked the performance of the system we developed. For this, the collected data was divided into two parts, training data and test data. The training data is used to train the model, the model is then tested on the test data, and the results are shown below. The average accuracy of the model is approximately 84% (Fig. 2).
For the experimental evaluation, we primarily focused on the number of units of electricity and the water consumption for one crop cycle of sugarcane, and we also recorded the total production of the crop. We studied the crop cycle of sugarcane for three years: in the first two years, we observed normal farming of the crop, and for the third-year crop cycle, we implemented our proposed system. The experimental results for electricity consumption follow. We also considered the distance from the water resource and combined the results according to this distance.

Fig. 2 Performance measure of the model



Considering the electrical consumption, it is clear that after the system was implemented, the electrical consumption was reduced compared with the previous two years. This is because a time duration is now given for allowing the water to flow, which was not previously considered (Figs. 3, 4, 5, and 6).
Looking at the water consumption, the results clearly show less water consumption than in previous years after implementation of the system (Fig. 7).
The effective irrigation system provides the proper amount of water for the land, which keeps the moisture in the land/soil as per the requirement and increases the productivity of the land/soil (Fig. 8).

Fig. 3 Electrical consumption in number of units (low water requirement)

Fig. 4 Electrical consumption in number of units (moderate water requirement)



Fig. 5 Electrical consumption in number of units (high water requirement)

Fig. 6 Electrical consumption in number of units (combine)

6 Conclusion

In India, a highly water-intensive crop like sugarcane is a major crop grown by farmers. Water plays an important role in the growth of sugarcane, and optimal use of water gives better growth and better productivity. To achieve this, machine learning techniques can be used effectively. From the results, it is clear that our system decreases water consumption and electricity consumption and increases the productivity of the crop.

Fig. 7 Water consumption in number of units (combine)

Fig. 8 Sugarcane productivity (combine)

In the future, we can implement an IoT system for starting and stopping the irrigation system. We can also use this system to guide farmers in selecting the type of sugarcane and other crops.

References

1. Dhawan, V.: Water and agriculture in India. In: Background paper for the South Asia expert
panel during the Global Forum for Food and Agriculture (GFFA) (2017)
2. dos Santos, J.F.C.: Use of digital images to estimate soil moisture. Sci. Direct (2016)
3. Fitton, N.: Global Environment Change. Elsevier, Amsterdam (2019)
4. Ashok, J.M.: Agricultural Plant Disease Detection and its Treatment using Image Processing. IJSRD (2015)
5. Khan, S.A.: An Approach to Predict Soil Nutrients and Efficient Irrigation for Agriculture with
Spatial Data Mining, IJSRD (2015)
6. Aruna, D.D.: A Survey on Different Disease and Image Processing Techniques in Sugarcane
Crops. IJSRD (2016)
7. Balew, A.: The Egyptian Journal of Remote Sensing and Space Science. Elsevier, Amsterdam
(2020)
8. Sneht, S.H.: Land Use Land Cover Change Detection of Gulbarga City Using Remote Sensing
and GIS. IJSRD (2014)
9. Ben-Dor, E.: Using imaging spectroscopy to study soil properties. Remote Sens. Environ. https://doi.org/10.1016/j.rse.2008.09.019
10. Tomar, M.: Development of Low Cost Soil Moisture Sensor. IEEE, ViTECoN (2019)
11. Barapatre, P., Patel, J.: Determination of soil moisture using various sensors for irrigation water
management. IJITEE, 8 (2019)
12. Wang, W., Liu, K.: Remote sensing image-based analysis of the urban heat island effect in Shenzhen, China. Elsevier Book 110 (2019)
13. Peng, J., Jia, J.: Seasonal contrast of the dominant factors for spatial distribution of land surface temperature in urban areas. Elsevier Book, 215 (2018)
IoT-Based Smart Irrigation System

Mithilesh Kumar Pandey, Deepak Garg, Neeraj Kumar Agrahari,


and Shivam Singh

Abstract An IoT-based smart irrigation system is used to automate farming. It can be used to control the quantity of water flowing at desired intervals, maintain the desired humidity and soil moisture level for crop protection and crop improvement, and save the farmers' time. In this irrigation system, all the work is performed automatically using these technologies, and all the processes can be handled via a mobile phone. Sensors sense the field and inform the microcontroller of changes in moisture and temperature. The microcontroller then reads the parameters measured by the sensors and transfers them to the server and users through the MQTT protocol, which has high speed, so the user can easily access all the information on his mobile. In this paper, we have reviewed this area and related applications in depth. Based on this review, we have highlighted all the features and functioning.

Keywords IoT · Raspberry · Sensor · Cloud computing · MQTT protocol

1 Introduction

The Internet of Things is the concept of connecting any device (so long as it has an on/off switch) to the Internet and to other connected devices. The IoT is a huge network of connected things and people, all of which gather and share data about the way they are used and about the environment around them.

M. K. Pandey · D. Garg · N. K. Agrahari (B) · S. Singh


Department of Computer Application, National Institute of Technology Kurukshetra,
Kurukshetra, India
e-mail: agraharineeraj84@gmail.com
M. K. Pandey
e-mail: pmithilesh967@gmail.com
D. Garg
e-mail: erdeepakgarg21@gmail.com
S. Singh
e-mail: adhiraj03061996@gmail.com


Irrigation is the process of providing water to plants at the desired intervals. Irrigation allows us to grow crops and re-vegetate disturbed soils in arid areas. It delivers water at the proper time, in the proper amount, and at the right location within the field, which plays an essential role in plant growth. Controlling water remotely is also a hard task; in particular, the control becomes more difficult during water shortages, which can otherwise damage the crop. By using sensors such as moisture and rain sensors, the water supply for irrigation can be managed easily by analyzing the state of the soil and the weather. Soil moisture sensors detect the level of soil moisture, and based on that information, the field is irrigated automatically with much less human intervention. Smart irrigation is the idea of doing irrigation in a modern way; there are a variety of strategies that can be adopted in irrigation so that the yield grows and production increases.
2 Related Work

In [1], cloud computing, Wi-Fi sensors, UAVs, and communication technologies are used, and various IoT-based systems are presented with respect to farm applications. In [2], Message Queuing Telemetry Transport (MQTT) is used to communicate with devices; it saves power, which keeps the cost low, and requires much less human intervention and less maintenance for agricultural fields. In [3], an automated irrigation system monitors and maintains the desired soil moisture content through automatic watering. It uses an ATMEGA328P microcontroller and soil moisture sensors, and the sensor readings are transmitted to a ThingSpeak channel to produce graphs for analysis. In [4], threshold values for climatic conditions such as humidity, temperature, and moisture are tested; the system senses the intrusion of animals and delivers alerts via SMS directly to the farmer's mobile using a GSM module.

3 The Necessity to Use the Cloud

Deployment of a wireless sensor network (WSN) in a real environment requires consideration of many things, namely suitable running devices, adequate RAM, and storage. Additionally, the host PC has to run continuously for a long time. A virtual platform, which may be entirely cloud based, is needed to satisfy the above prerequisites. "The Sensefinity Machinates cloud platform, besides storing all of the measurements acquired from the WSN, is also responsible for data-source identification, performing data validation, partitioning, and processing. The latter includes running the irrigation algorithm for detecting whenever the vegetation needs irrigation [5]."

Fig. 1 Principle of cloud computing for IoT

Advantages of using cloud storage:

1. Data can be accessed from anywhere.
2. Hardware requirements and cost decrease.
3. The security of the measurements increases (Fig. 1).

4 Benefits

1. Saves money and water.
2. Reduces the need to carry and store water.
3. Makes watering the field convenient and easy.

5 Different Features

Different features are the following:


1. Easy to install and operate.
2. Allows a unique watering schedule to be designed.
3. Efficient use of water.
4. Controlled by a mobile phone application.

6 Literature Review

In [1], some of these factors are taken into consideration, and the role of various technologies, particularly IoT, in making farms smarter and more efficient to satisfy future expectations is presented. To this end, cloud computing, UAVs, wireless sensors, and communication technologies are explained in detail. The system in [2] requires a great deal less human intervention and minimal maintenance for agricultural fields, and it is easy for all farmers to use. Additionally, time is saved, and water is used adequately without wastage. MQTT is used, which is capable of communicating with various devices; its low bandwidth and low power consumption make the proposed device cost effective. The framework in [3] also enables real-time remote monitoring of the current environmental state of the field, and modern technology can be incorporated to bring down the cost. The device in [4] generates an irrigation timetable based on real-time data sensed from the field and data from a weather repository; this system can advise farmers whether or not there is a need for irrigation. Farms that are irrigated appropriately and optimally give a better crop yield, so [6] designed a smart farming system based on IoT together with a moisture sensor. In [7], the threshold voltages for calibrating the sensors are chosen from a large number of past temperature and soil-moisture values, and the threshold values can differ depending on the crop and plantation. In the future, a machine learning algorithm could be introduced to process the data and reduce the complexity of the hardware. In [8], a channel is created on an open-source IoT platform to store and display the soil moisture data and also to control the irrigation over the Internet. Traditional techniques take longer and waste the available water at higher rates, so they end up using more water than required [9]. The system proposed in [5] pursues the combination of embedded systems with the strengths provided by cloud computing, and it can be applied to rural applications. The whole system features a sensor design aimed at power efficiency, cost efficiency, maintainability, and flexibility, to deliver ease of use [10].

Fig. 2 Raspberry Pi 4

7 Hardware Description

7.1 Raspberry PI 4—Model B

Raspberry Pi is like a small PC; it is lightweight and has an ARM processor. It has an HDMI port, a Wi-Fi module, USB ports, and an Ethernet port. Raspberry Pi can run operating systems such as Raspbian, Kali Linux, Snappy Ubuntu, Arch Linux ARM, etc. It has no HDD or SSD, but a micro SD card can be inserted into the Raspberry Pi to boot its operating system (Fig. 2).

7.2 Software Used

To implement the system, the software must first be installed on the Raspberry Pi 4. Two pieces of software and one operating system need to be downloaded. The first is Win32 Disk Imager, the second is SD Card Formatter, and the operating system for the Raspberry Pi is Raspbian. The programming language used on the Raspberry Pi is Python.

Fig. 3 Primary circuit diagram of a fixed regulated power supply

7.3 Power Supply

Every digital circuit requires a regulated power supply. In this context, we look at approaches to obtain a regulated supply from the mains (Fig. 3).

7.4 Data Acquisition System

An advanced data acquisition device is chosen on a single chip that integrates analog and digital hardware. The MCP3208, shown in Fig. 4, is the ADC IC, which converts analog signals to 12-bit digital values. It is programmable for either single-ended or differential pair inputs. The differential non-linearity is ±1 LSB, and the integral non-linearity is ±1 LSB. It uses a successive approximation register (SAR) architecture. The sample-and-hold capacitor acquires the signal for 1.5 clock cycles starting at the fourth rising edge of the serial clock. The chip converts the sampled value into 12 bits at a rate of up to 100 ksps.
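For reference, a short Python sketch of reading the MCP3208 over SPI on the Raspberry Pi with the spidev library is shown below. The wiring (bus 0, chip-select 0), the channel assignments, and the 3.3 V reference are assumptions about a typical setup rather than details taken from the paper.

```python
import spidev

spi = spidev.SpiDev()
spi.open(0, 0)                      # SPI bus 0, chip-select 0 (assumed wiring)
spi.max_speed_hz = 1_000_000

def read_mcp3208(channel):
    """Read one 12-bit sample (0-4095) from the given MCP3208 channel (0-7)."""
    # start bit + single-ended mode + channel select, spread over the first two bytes
    cmd = [0x06 | ((channel & 0x04) >> 2), (channel & 0x03) << 6, 0x00]
    reply = spi.xfer2(cmd)
    return ((reply[1] & 0x0F) << 8) | reply[2]

soil_raw = read_mcp3208(0)          # soil moisture probe assumed on channel 0
temp_raw = read_mcp3208(1)          # LM35 output assumed on channel 1
# LM35 gives 10 mV per degree C; with a 3.3 V reference each ADC step is 3.3/4096 V
temperature_c = temp_raw * 3.3 / 4096 / 0.010
print(soil_raw, temperature_c)
```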

Fig. 4 Data acquisition chip



Fig. 5 Soil moisture sensor

Fig. 6 Temperature sensor

7.5 Soil Moisture Sensor

A precision soil moisture sensor, shown in Fig. 5, is chosen; it consists of probes that are inserted into the earth. When current passes through the probes, soil with higher moisture offers less resistance and passes a larger current, whereas dry soil offers more resistance. This variable resistance is the criterion used to determine the fraction of soil moisture.

7.6 Temperature Sensor (LM35)

The LM35 series are precision integrated-circuit temperature sensors, as shown in Fig. 6, whose output voltage is linearly proportional to the centigrade temperature.

7.7 Buzzer

A buzzer, shown in Fig. 7, is used in this proposed system to give audible notification that the water pump has been turned ON or OFF. A buzzer is an audio signaling device and may be mechanical, electromechanical, or piezoelectric.

Fig. 7 Buzzer

8 Block Diagram

See Fig. 8.

Fig. 8 Flow chart of the


process used

Fig. 9 Image of output

9 Result

• The image in Fig. 9 shows the output of the MQTT clients, which receive parameter values from the different sensors.
• Using the MQTT protocol, all sensor parameters are transmitted to the clients.
• If “Crop/node” represents the MQTT topic (node), then multiple clients on the same topic can receive multiple pieces of information from different publishers placed in different areas of the field (a minimal publishing sketch follows).
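For illustration, the sketch below shows this publish/subscribe exchange over the "Crop/node" topic with the paho-mqtt Python library; the broker address and the JSON payload format are assumptions, not details taken from the paper.

```python
# Hedged sketch: publishing sensor readings to the "Crop/node" topic with paho-mqtt
# (1.x-style constructor; paho-mqtt 2.x additionally requires a CallbackAPIVersion argument).
import json
import paho.mqtt.client as mqtt

publisher = mqtt.Client()
publisher.connect("test.mosquitto.org", 1883)   # public test broker, illustration only
publisher.loop_start()

def publish_readings(moisture, temperature):
    payload = json.dumps({"moisture": moisture, "temperature": temperature})
    publisher.publish("Crop/node", payload, qos=1)

# Any number of clients subscribed to the same topic receive these values:
def on_message(client, userdata, msg):
    print(msg.topic, json.loads(msg.payload))

subscriber = mqtt.Client()
subscriber.on_message = on_message
subscriber.connect("test.mosquitto.org", 1883)
subscriber.subscribe("Crop/node")
subscriber.loop_start()

publish_readings(moisture=512, temperature=29.5)
```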

10 Conclusion

The IoT-based smart irrigation system reduces manual supervision, water consumption, and the labor associated with conventional irrigation processes. Using simple electronic parts, this smart irrigation system can be built at low cost. A smart irrigation system is very important for avoiding water waste and using water efficiently; it can also increase the production of fruits and vegetables and helps agricultural land reduce water waste. In all the processes of the smart irrigation system, the MQTT protocol plays the most important role: with MQTT, the system operates independently with fast transmission of information. The benefit of the MQTT protocol is that even when clients are out of range of the node network, the information is still published; when clients come back into range and reconnect to that node network, they can see the information that was sent earlier.

References

1. Ayaz, M., Ammad-Uddin, M., Sharif, Z., Mansour, A., Aggoune, E.-H. M.: Internet-of-Things
(IoT)-based smart agriculture: toward making the fields talk. IEEE Access (2019)
2. Islam, M.M., Hossain, M.S., Reza, R.K., Nath, A.: IOT based automated solar irrigation system
using MQTT protocol in Charandeep Chakaria. IEEE (2019)
3. Dokhande, A., Bomble, C., Patil, R., Khandekar, P., Dhone, N., Gode, C.: A review paper on
IOT based smart irrigation system. IJSRCSEIT (2019)
4. Sushanth, G., Sujatha, S.: IOT based smart agriculture system. IEEE (2018)
5. Saraf, S.B., Gawali, D.H.: IOT based smart irrigation monitoring and controlling system. IEEE
(2017)
6. Mishra, D., Khan, A., Tiwari, R., Upadhay, S.: Automated irrigation system-IOT based
approach. IEEE (2018)
7. Nageswara Rao, R., Sridhar, B.: IoT based smart crop-field monitoring and automation
irrigation system. IEEE (2018)
8. Benyezza, H., Bouhedda, M., Djellou, K.: Smart irrigation system based Thingspeak and
Arduino. IEEE (2018)
9. Pernapati, K.: IOT based low-cost smart irrigation system. IEEE (2018)
10. Vaishali, S., Suraj, S., Vignesh, G., Dhivya, S., Udhayakumar, S.: Mobile integrated smart
irrigation management and monitoring system using IOT. IEEE (2017)
A Hybrid Approach for Region-Based
Medical Image Compression
with Nature-Inspired Optimization
Algorithm

S. Saravanan and D. Sujitha Juliet

Abstract Medical modalities generate a massive amount of digital volumetric data to analyze and diagnose medical problems. To preserve data quality while reducing storage space, compression proves to be an efficient methodology in the digital world. Medical images represent body features for diagnostic purposes and need to be compressed without loss of information. Reducing redundancy and representing the data more compactly over the region of interest of an image addresses this problem. The proposed methodology uses the region-based active contour method driven by the bat algorithm to segment an image into a region of interest and a non-region of interest. The region of interest is compressed by a lossless integer-based Karhunen-Loeve transform, while the non-region of interest is compressed by a lossy Karhunen-Loeve transform. The results suggest that the proposed method improves the region segmentation of a medical image, achieving a high PSNR and SSIM and a quality compressed image.

Keywords Medical image compression · Region of interest · Nature-inspired


algorithm · BAT

1 Introduction

Medical image compression is an efficient process for image archiving in hospitals. Computed tomography, magnetic resonance imaging, ultrasound, electrocardiography, X-ray, mammogram, etc. are the popular modalities that generate vast medical data. Compression techniques are able to process substantial volumetric

S. Saravanan (B) · D. S. Juliet


Department of Computer Science and Engineering, Karunya Institute of Technology and
Sciences, Coimbatore, India
e-mail: saranrulz671@gmail.com
D. S. Juliet
e-mail: sujitha.juliet@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 225
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_24

data to attain rapid interactivity, context-oriented images, and quantitative analysis [1]. Lossy, lossless, and hybrid are the categories of compression techniques. Lossless compression represents the medical information source without significantly degrading its quality. Two crucial factors dealt with in image compression are redundancy and irrelevance. Removing duplicate information from the image is called redundancy reduction, whereas discarding information that goes unnoticed by the viewer is called irrelevance reduction. A hybrid compression method applies two or more algorithms to an image to achieve the best visual quality output. JPEG and JPEG2000 [2] are well-known standards whose lossless modes produce an output image with the same quality as the input image. DCT and DWT are the most extensively used transforms in image compression models. Transform coding and predictive coding play a significant role in lossless compression. Predictive coders, namely JPEG-LS and CALIC, use a single predictor to achieve lossless compression.
In this article, we propose a region-based medical image compression model in Sect. 3, where the region of interest and non-region of interest are segmented using the region-based active contour method, further driven by the BAT optimization algorithm. The region of interest is processed with the integer-based KL transform (IKLT), which realizes an integer-to-integer mapping by factorization to achieve lossless image compression, while the non-region of interest is compressed with the KL transform. Results are analyzed in Sect. 4 against the existing image compression schemes IDTT [3] and IDWT [4].

2 Related Works

In telemedicine to overcome the problem of storage space and transmission time,


a medical image compression algorithm is essential. Image compression is widely
classified into lossy technique and lossless technique. The hybrid compression tech-
nique combines two or more algorithms in achieving a high-quality compressed
image. Region-based medical image compression, in which a region of interest is selected for higher-fidelity coding, has become an efficient methodology. DCT, DWT, and KLT are the widely used transforms in image compression. As medical data need to be compressed without a loss of quality, lossless compression schemes such as JPEG-2000, JPEG-LS, and CALIC have been developed. Image segmentation plays a key role in the region-based compression model, through techniques such as thresholding, edge detection, anatomical region separation, and diseased-area separation using K-means [5], fuzzy c-means [6], the active contour method [7], etc. Multilevel thresholding [8] has been proposed to achieve higher PSNR and SSIM metrics. Active contour methods have also proved efficient in separating the region of interest and non-region of interest from the medical image.
Advantages of active contour methods over the classical segmentation methods
are sub-pixel accuracy of edges of the objects; the formulation method is simple for

energy minimization and can achieve a smooth, closed contour result. Of the two types of active contour methods, the edge-based model [9] and the region-based model [10], the region-based active contour method has proved efficient in comparison with the Chan-Vese method [11]. Metaheuristic algorithms are furthermore coupled with a segmentation algorithm to achieve an optimized output selection. The particle swarm optimization algorithm applied to segmentation has been shown to be efficient for compressing an image while retaining high quality. Bat optimization with a fuzzy encoding method proves efficient when compared with traditional transform-based compression models. Using an optimization technique to tune the active contour method can yield an efficiently segmented image output. Integer-based transforms [12] tend to achieve a lossless image compression model, as shown in [3]. When the transformed output is rounded to nearby values, the result is a lossy compression technique; integer-to-integer conversion through the integer-based transform model produces a lossless compression technique.

3 Proposed Method

The findings of the survey indicate that a region-based compression model is an efficient methodology for obtaining a lossless representation of the region of interest. In this section, image segmentation is performed with a region-based active contour method that is optimized using the BAT algorithm. The segmented region of interest is fed into the integer-based KL transform for decorrelation, and the non-region of interest is compressed using the KL transform, as illustrated in Fig. 1.

3.1 Region-Based Active Contour Segmentation

In order to segment the region of interest and non-region of interest from the medical database, an active contour method of segmentation is utilized. Compared with classic segmentation methods such as edge detection [13], thresholding, and region growing, the active contour method attains sub-pixel accuracy of the region boundaries and is easy to formulate in the energy minimization context. As a result, it achieves smooth and closed contours.

[Fig. 1 block diagram: input medical image → region-based active contour method driven by the BAT algorithm → ROI compressed with IKLT, non-ROI compressed with KLT → compressed image]
Fig. 1 Flowchart of the proposed methodology



The region-based active contour method implemented in this article aims to identify each region of interest using a certain region descriptor that guides the movement of the active contour; it works based on intensity homogeneity [14]. The formulation of [14], which overcomes the homogeneity assumption problem of region-based active contours, is adopted in the proposed method.
 

ε = Σ_{i=1}^{N} ∫ ∫_{Ω_i} K(y − x) |I(x) − b(y) c_i|² dx dy   (1)

Here, J denotes the true image and I the observed image, Ω is the image domain, and C_1, …, C_N are N distinct constant values on the disjoint regions Ω_1, …, Ω_N; the segmentation is obtained by minimizing the energy function in Eq. (1). b denotes the component accounting for intensity inhomogeneity, and K is the kernel function, chosen as a truncated Gaussian as in [14]. A metaheuristic algorithm is combined with the active contour method to obtain an optimally segmented region of the medical data. The BAT algorithm [15] drives the region-based active contour method to adaptively select the external energy weights and escape the local minima of the classical contour method.
The BAT optimization is a metaheuristic algorithm based on the echolocation behavior of natural bats. Each bat i is described by its frequency f_i, position X_i, and velocity v_i, initialized together with a loudness A_i and pulse rate r_i. Over the iterations, the velocity and position of the bats are updated as follows:
 
f_i = f_min + (f_max − f_min) β   (2)

v_i^j(t) = v_i^j(t − 1) + (X_i^j(t − 1) − X̂^j) f_i   (3)

X_i^j(t) = X_i^j(t − 1) + v_i^j(t)   (4)

where f_min and f_max denote the minimum and maximum frequencies, β is a random number drawn uniformly from the interval [0, 1], and X̂ is the current global best position. Random walks are created using Eq. (5), while Eqs. (6) and (7) update the loudness and pulse rate.

X_new = X_old + ε A(t)   (5)

A_i(t + 1) = α A_i(t)   (6)

r_i(t + 1) = r_i(0) [1 − e^(−γ t)]   (7)

where α and γ are constants, A(t) denotes the bats' average loudness at time t, and the strength and direction of the random walk are controlled by the variable ε ∈ [−1, 1]. Figure 2 compares the region-based active contour method with the existing Chan-Vese active contour method and shows that the region-based segmentation works more effectively. Table 1 presents the BAT optimization algorithm.
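A compact Python sketch of this update loop (Eqs. 2–7, following the steps of Table 1) is given below for reference; the objective function, search bounds, and parameter values are illustrative assumptions, not values from the paper.

```python
# Hedged sketch of the bat-algorithm update loop; bounds, parameters, and the
# sphere objective are illustrative assumptions.
import numpy as np

def bat_optimize(objective, dim=2, n_bats=20, iters=100,
                 f_min=0.0, f_max=2.0, alpha=0.9, gamma=0.9):
    rng = np.random.default_rng(0)
    X = rng.uniform(-5, 5, (n_bats, dim))      # positions X_i
    V = np.zeros((n_bats, dim))                # velocities v_i
    A = np.ones(n_bats)                        # loudness A_i
    r0 = rng.uniform(0, 1, n_bats)             # initial pulse rates r_i(0)
    r = r0.copy()
    fitness = np.array([objective(x) for x in X])
    best = X[fitness.argmin()].copy()          # global best position X̂

    for t in range(1, iters + 1):
        for i in range(n_bats):
            beta = rng.random()
            f_i = f_min + (f_max - f_min) * beta              # Eq. (2)
            V[i] = V[i] + (X[i] - best) * f_i                 # Eq. (3)
            x_new = X[i] + V[i]                               # Eq. (4)
            if rng.random() > r[i]:
                # local random walk around the best solution, scaled by the
                # average loudness (cf. Eq. 5 and Table 1, step 8)
                x_new = best + 0.01 * rng.normal(size=dim) * A.mean()
            f_new = objective(x_new)
            if rng.random() < A[i] and f_new < fitness[i]:    # accept (Table 1, steps 10-12)
                X[i], fitness[i] = x_new, f_new
                A[i] *= alpha                                 # Eq. (6)
                r[i] = r0[i] * (1 - np.exp(-gamma * t))       # Eq. (7)
            if f_new < objective(best):
                best = x_new.copy()
    return best

# Example: minimise the sphere function
print(bat_optimize(lambda x: float(np.sum(x ** 2))))
```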

Fig. 2 Segmentation results using Chan-Vese contour method and region-based active contour
method

Table 1 BAT optimization algorithm

Objective function f(x), where x = (x_1, …, x_n)
Step 1: Initialize the position xi , velocity vi , and pulse
frequency f i for bat population
Step 2: Initialize the pulse rates ri , loudness Ai —max no of
iterations T
Step 3: Each iteration t in T, repeat step 4 to step 16
Step 4: For each BAT bi , repeat step 5 to step 14
Step 5: Creating new solutions using Eq. (2)–(4)
Step 6: IF (rand > ri )
Step 7: Choose the solution among the best solutions
Step 8: Create a local solution within the best solution
Step 9: END IF
Step 10: IF (rand < Ai ) and (f (xi ) < f (GlobalBest ))
Step 11: Admit the new solutions
Step 12: Increase ri and reduce Ai
Step 13: END IF
Step 14: END FOR
Step 15: Order the bats rank and find GlobalBest
Step 16: END FOR

3.2 Integer Karhunen-Loeve Transform

KLT is a linear transformation that removes redundancy by decorrelating the data, which makes compression effective. However, its floating-point output must be rounded, which leads to a lossy compression model. In order to achieve lossless compression, an integer-based KLT (IKLT) is used, which applies a matrix factorization to produce integer outputs. The main advantages of IKLT over KLT are lossless conversion, strong energy compaction, an efficient lifting scheme, and improved linearity; moreover, no data are lost during compression and decompression with IKLT. The factorization process [12] decomposes the eigenvector matrix into four matrices P, L, U, and S, where P is the pivoting (permutation) matrix, L and S are lower triangular elementary reversible matrices (TERMs), and U is an upper TERM. Equation (8) denotes the integer-to-integer version of the transform matrix A applied to an image. The non-region of interest is compressed by the linear KL transform, which results in lossy compression.

à : Z N → Z N , à = P L̃ Ũ S̃ (8)

4 Results and Discussion

After combining the region of interest and non-region of interest, performance metrics such as peak signal-to-noise ratio (PSNR), mean square error (MSE), compression ratio (CR), and SSIM are used for evaluation. Sample input images considered for evaluation and the compressed output images obtained are illustrated in Fig. 3.


PSNR = 10 · log10(255² / MSE)   (9)

The peak signal-to-noise ratio is a parameter for assessing the quality of the compressed image and is defined in Eq. (9). The mean square error (MSE) is defined in Eq. (10). The compression ratio is obtained by dividing the size of the input image by the size of the compressed image, as given in Eq. (11).

MSE = (1/N) Σ_i Σ_j (f(x, y) − F(x, y))²   (10)

Compression Ratio (CR) = (Size of the original image) / (Size of the compressed image)   (11)

Structural similarity (SSIM) is an important metric used to measure the similarity


between the input and output images. The equation for SSIM is defined in Eq. (12).

Fig. 3 Input images with compressed images using proposed methodology

  
SSIM = [(2 μ_x μ_y + C_1)(2 σ_xy + C_2)] / [(μ_x² + μ_y² + C_1)(σ_x² + σ_y² + C_2)]   (12)
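A short NumPy sketch of the metrics in Eqs. (9)–(11) is given below for reference; the function and variable names are illustrative, not part of the paper.

```python
# Hedged sketch of the evaluation metrics in Eqs. (9)-(11).
import numpy as np

def mse(original, reconstructed):
    return float(np.mean((original.astype(float) - reconstructed.astype(float)) ** 2))

def psnr(original, reconstructed):
    m = mse(original, reconstructed)
    return float("inf") if m == 0 else 10.0 * np.log10(255.0 ** 2 / m)

def compression_ratio(original_bytes, compressed_bytes):
    return original_bytes / compressed_bytes

# SSIM (Eq. 12) is commonly computed with a library implementation such as
# scikit-image's structural_similarity rather than re-implemented by hand.
```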

The proposed method is compared with the existing algorithms, integer-based DWT [4] and integer-based DTT [3], and the results are analyzed in Table 2. The proposed method outperforms the existing compression algorithms. The SSIM values in Table 2 show that the proposed method is able to regain a high-quality output image with the highest similarity of 0.9998. Bold values in Table 2 indicate the highest value achieved compared with the other existing algorithms. This proves that the region-based compression technique works efficiently on medical images in terms of separating the region of interest and non-region of interest. It also achieves a high-quality compressed image with a high PSNR value of about 44 dB. Figure 4 presents the PSNR comparison.

Table 2 Comparison of the proposed method with other existing algorithms for image compression
Images | Methodology | PSNR | CR | MSE | SSIM
Image 1 | IDWT | 40.14 | 4.1 | 2.19 | 0.9971
Image 1 | IDTT | 42.01 | 4.49 | 2.53 | 0.9975
Image 1 | IKLT (proposed) | 42.39 | 4.58 | 2.03 | 0.9998
Image 2 | IDWT | 40.64 | 3.91 | 2.16 | 0.9972
Image 2 | IDTT | 40.16 | 4.18 | 2.20 | 0.9979
Image 2 | IKLT (proposed) | 40.86 | 4.29 | 2.09 | 0.9996
Image 3 | IDWT | 40.26 | 4.40 | 2.17 | 0.9991
Image 3 | IDTT | 41.10 | 4.8 | 2.09 | 0.9996
Image 3 | IKLT (proposed) | 41.4 | 4.93 | 1.9 | 0.9997
Image 4 | IDWT | 41.92 | 4.25 | 2.48 | 0.9971
Image 4 | IDTT | 42.07 | 4.72 | 2.60 | 0.9979
Image 4 | IKLT (proposed) | 43.21 | 4.91 | 2.32 | 0.9997
Image 5 | IDWT | 41.2 | 3.72 | 2.40 | 0.9989
Image 5 | IDTT | 42.81 | 4.04 | 2.15 | 0.9994
Image 5 | IKLT (proposed) | 42.13 | 4.12 | 2.23 | 0.9998

Fig. 4 Comparison of PSNR achieved using the proposed and existing algorithms

5 Conclusion

Region-based medical image compression is achieved with high PSNR and CR values through the region-based active contour method, which is optimized by the metaheuristic BAT algorithm. The integer-based KL transform is used to decorrelate the image and produce a losslessly compressed region of interest via integer-to-integer factorization. The KL transform decomposes the non-region of interest image, and the combination of the compressed images achieves a high similarity index. Thus, an efficient region-based image compression model is evaluated, achieving a high-fidelity compressed image.

References

1. Gonzalez, R.C., Woods, R.E., Masters, B.R.: Digital image processing, Third Edition. J.
Biomed. Opt. 14(2), 029901 (2009). https://doi.org/10.1117/1.3115362
2. Skodras, A., Christopoulos, C., Ebrahimi, T.: The JPEG 2000 still image compression standard.
IEEE Signal Process. Mag. 18(5), 36–58 (2001). https://doi.org/10.1109/79.952804
3. Xiao, B., Lu, G., Zhang, Y., Li, W., Wang, G.: Lossless image compression based on integer
Discrete Tchebichef Transform. Neurocomputing 214, 587–593 (2016). https://doi.org/10.
1016/j.neucom.2016.06.050
4. Nagendran, R., Vasuki, A.: Hyperspectral image compression using hybrid transform with
different wavelet-based transform coding. Int. J. Wavelets Multiresolut. Inf. Process. 17(2),
1–21 2019. https://doi.org/10.1142/s021969131941008x
5. Chen, X., Zhou, Y., Luo, Q.: A hybrid monkey search algorithm for clustering analysis. Sci.
World J. 2014 (2014). https://doi.org/10.1155/2014/938239
6. Vincent, C.S., Janet, J.: An enhanced N-pattern hybrid technique for medical images in
telemedicine. Procedia Comput. Sci. 79, 305–313 (2016). https://doi.org/10.1016/j.procs.2016.
03.040
7. Palanivelu, L.M., Vijayakumar, P.: Effective image segmentation using Particle Swarm Opti-
mization for image compression in multi application smart cards. In: Proceedings of the World
Congress Information and Communication Technologies WICT 2011, pp. 535–539 (2011).
https://doi.org/10.1109/wict.2011.6141302
8. Horng, M.H.: Multilevel thresholding selection based on the artificial bee colony algorithm for
image segmentation. Expert Syst. Appl. 38(11), 13785–13791 (2011). https://doi.org/10.1016/
j.eswa.2011.04.180
9. Xie, W., Li, Y., Ma, Y.: PCNN-based level set method of automatic mammographic image
segmentation. Optik (Stuttg) 127(4), 1644–1650 (2016). https://doi.org/10.1016/j.ijleo.2015.
09.250
10. Zuo, Z., Lan, X., Deng, L., Yao, S., Wang, X.: Optik an improved medical image compression
technique with lossless region of interest. Optik—Int. J. Light Electron Opt. 126(21), 2825–
2831 (2015). https://doi.org/10.1016/j.ijleo.2015.07.005
11. Mandal, D., Chatterjee, A., Maitra, M.: Robust medical image segmentation using particle
swarm optimization aided level set based global fitting energy active contour approach. Eng.
Appl. Artif. Intell. 35, 199–214 (2014). https://doi.org/10.1016/j.engappai.2014.07.001
12. Hao, P., Shi, Q.: Matrix factorizations for reversible integer mapping. IEEE Trans. Signal
Process. 49(10), 2314–2324 (2001). https://doi.org/10.1109/78.950787
13. Kiran, R., Kamargaonkar, C.: Region separation techniques for medical. 1314–1325 (2016)
https://doi.org/10.15680/ijirset.2016.0502021
14. Li, C., Huang, R., Ding, Z., Gatenby, J.C., Metaxas, D.N., Gore, J.C.: A level set method for
image segmentation in the presence of intensity inhomogeneities with application to MRI.
IEEE Trans. Image Process. 20(7), 2007–2016 (2011). https://doi.org/10.1109/TIP.2011.214
6190
15. Yang, X.S.: A new metaheuristic bat-inspired algorithm. Stud. Comput. Intell. 284, 65–74
(2010). https://doi.org/10.1007/978-3-642-12538-6_6
Attention Mechanism-Based News
Sentiment Analyzer

Sweta Kaman

Abstract Sentiment analysis is the task of determining the feeling or opinion of a chunk of text. One of the crucial applications of sentiment analysis is to classify news articles into three fundamental categories of sentiment: negative, positive, and neutral. By considering the sentiments of news articles, we can figure out whether the writer's sentiment is negatively or positively oriented. Multiple models have been constructed to analyze news articles, but none of them operates at both the sentence level and the whole-document level. In this paper, I propose a method that performs this task at both levels with accurate results, using an LSTM network and a deep learning mechanism, i.e., an attention network.

Keywords Sentiment analyzer · News web crawler · Attention mechanism · Deep


learning · Text classification · Semeval 2016 · LSTM · NLP

1 Introduction

The global news media, a very important source of information linked with all of our lives, is associated with numerous biased and unbiased press groups according to [1]. We still expect these news sources, which influence each one of us in many different ways, to deliver true stories to the public rather than sugar-coated ones. However, some news channels, press groups, and websites are corrupt or dedicated to a political party and intentionally publish fake news, hate speech, etc., creating an alarming and disturbing environment. They are so engaged in publishing negative stories that they have forgotten their role in society, i.e., enlightening the citizens of a country with reality and spreading positivity and hope. The attention mechanism [2] allows a neural network to pay attention to only a specific part of an input sentence while generating a translation, much like a human translator. The task of sentiment analysis [3] combined with the attention mechanism will help us to identify those websites and news sources by

S. Kaman (B)
Department of Science of Intelligence, IIT Jodhpur, Karwar, India
e-mail: kaman.1@iitj.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 235
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_25

paying attention to each of the negative words present in an article and uncovering its intention, which will ultimately help us avoid following such sources and sustain an environment filled with positivity. The goal of the model proposed in this paper is to analyze the sentiment of an article and classify it as negative, positive, or neutral with high accuracy.
The brief explanation of the structure of this paper is elaborated as follows.
Section 2 of this paper discusses the pros and cons of the existing methods; Sect. 3
of this paper discloses the dataset used in the project and the method to prepare
the train and test dataset; Sect. 4 elaborates the proposed methodology step by step
in completing the task; Sect. 5 of this paper presents the experimental results and
predictions of the proposed model; Sects. 6 and 7 conclude this paper with some
research accomplishments and future work of this project.

2 Related Work

Analyzing the sentiment of news articles has been a trend for the last few years, and numerous models have been proposed to perform this task with high accuracy. One such method is proposed by Lim et al. [4], in which the authors use a machine learning technique to predict the opinion of articles with business headlines. The task has also been modified by classifying text into two categories, i.e., "good news and bad news," as proposed by Alvarez et al. [5], who focus only on positive and negative sentiments. In my opinion, however, analysis at both the sentence and document level, covering all three sentiments, is important when classifying an article; this is not included in the existing models, and the corresponding methodology is described in the next sections of this paper.

3 Data Preparation

3.1 Train Dataset

SemEval-2016 Task 4 [6] has been used as the training data for my model which
consists of a combination of training and additional datasets. The training dataset
lacked the necessary amount of data to train my model, so I assembled all the additional data files and train data files into a single file consisting of 53,368 sentences, since "the larger the corpus, the higher the accuracy." The training data contains three basic sentiment columns, which I explicitly constructed corresponding to each of the train sentences and named negative, positive, and neutral. The value in each of these columns denotes the presence or absence of the corresponding sentiment tag, i.e., if the value is 1 under the neutral column, then the sentiment of the sentence is neutral, and if the value is 0, then vice versa, and so on (see Fig. 1).

Fig. 1 An instance of training data (Source Jupyter Notebook)

3.2 Test Dataset

BeautifulSoup [7], a Python library that parses XML and HTML files, is used in the proposed model to create a news crawler. The input to the crawler is a news article from any news source, and the output is preprocessed, clean, tokenized sentences of the article. These sentences are compiled together with the whole article itself to construct the test data, i.e., if the number of sentences in the article is "n," then the "n + 1"th row of the file contains the article itself for document-level analysis. Three columns are generated as described in Sect. 3.1, but here the values are initialized to 0.
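A minimal sketch of such a crawler in Python is shown below; the use of requests, the choice of <p> tags, and the regex sentence splitter are assumptions made for illustration.

```python
# Hedged sketch of the news crawler: fetch a page, extract paragraph text with
# BeautifulSoup, split it into sentences, and append the full article as the last row.
import re
import requests
from bs4 import BeautifulSoup

def crawl_article(url):
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    article = " ".join(p.get_text(" ", strip=True) for p in soup.find_all("p"))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", article) if s.strip()]
    # rows 0..n-1: sentences; row n: the whole article for document-level analysis
    return sentences + [article]
```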

4 Proposed Methodology

After performing some basic preprocessing on the train and test datasets, GloVe embeddings [8], which stand for global vector embeddings and were developed by a group of Stanford researchers, are used. Unlike other word embeddings such as Word2Vec [9], which consider only the local properties of the dataset, GloVe also captures its global properties. It makes use of the co-occurrence (count) matrix, which helps extract the semantic relationships between words and predict words that are semantically consistent with the words around them. This reduces the dimension of the word vector embedding; the GloVe embeddings used in this model are 300-dimensional vectors. Then, text-to-sequence conversion and padding are applied, after which the shape of the train data is (53,368, 150) and the shape of the final test data is (310, 150). After splitting the train data into train and validation sets with a ratio of 80:20, it is ready to be fed into the neural network, which uses an attention mechanism and an LSTM network that helps decide what type of output to generate. The activation function used is "relu," the optimizer used to minimize the loss is "rmsprop," and the loss function is "binary_crossentropy." A brief summary of the network is displayed in Fig. 2.
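A minimal Keras sketch of such an embedding + LSTM + attention network is given below; the layer sizes, the simple additive attention, and the use of a trainable (rather than GloVe-initialized) embedding are assumptions made for brevity, not the exact architecture of the paper.

```python
# Hedged sketch: Embedding -> LSTM -> simple additive attention -> three sigmoid
# outputs, compiled with rmsprop and binary_crossentropy as stated in the text.
import tensorflow as tf
from tensorflow.keras import layers, Model

MAX_LEN, VOCAB_SIZE, EMB_DIM = 150, 20000, 300   # vocabulary size is an assumption

inp = layers.Input(shape=(MAX_LEN,), dtype="int32")
emb = layers.Embedding(VOCAB_SIZE, EMB_DIM)(inp)        # could be seeded with GloVe weights
h = layers.LSTM(128, return_sequences=True)(emb)
score = layers.Dense(1, activation="tanh")(h)           # one attention score per timestep
weights = layers.Softmax(axis=1)(score)                 # normalize scores over the sequence
context = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([h, weights])
x = layers.Dense(64, activation="relu")(context)
out = layers.Dense(3, activation="sigmoid")(x)          # negative, positive, neutral
model = Model(inp, out)
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, validation_split=0.2, batch_size=256, epochs=25)
```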

Fig. 2 Summary of the model (Source Jupyter Notebook)

5 Results

The model depicted in Sect. 4 is now ready for training, for which I set the batch size to 256 and the number of epochs to 25. After successful training, the model achieved an accuracy of 0.9186 and a loss of 0.1885, as displayed in Fig. 3. The validation loss and accuracy are 0.1948 and 0.9257, respectively.

5.1 Final Predictions

The final output consists of predicted scores for the probable sentiments corresponding to each sentence of the test dataset. There are a total of 310 rows in the final result, of which the last row, i.e., ID 309, represents the whole article itself, and the remaining rows are sentences of the article. The three sentiment tag columns contain predicted scores, and the highest of the three scores decides the final sentiment of the sentence. The last row of the output is the document-level prediction of the news article, for which the highest score is 0.999970 under the column 'neutral', as illustrated in Fig. 4. This affirms that

Fig. 3 Statistics after training the model



Fig. 4 Final predictions at sentence and document level (Source Jupyter Notebook)

the document-level sentiment is neutral. The remaining rows of the final prediction are the sentence-level predictions for the test dataset, which are laid out in Fig. 4.

6 Conclusions

This project successfully contributes to the task of analyzing the sentiment of news sources. The three elementary sentiments, i.e., neutral, negative, and positive, are predicted with an accuracy of 0.9186 by using the attention mechanism at both the sentence and document level. This model is not limited to predicting the sentiments of news articles; it can be modified further and applied in numerous other fields of natural language processing.

7 Future Work

The task of sentiment analysis has an ample number of application areas, and in the forthcoming years this number will only increase. This project can be further modified to obtain much better performance by using XLNet [10] or BERT [11], which are pre-trained deep learning models. The news articles used here can be replaced by conversations between people so that the task of deception detection can be carried out. The model could also be used to determine the emotions and mental health of people during a pandemic such as COVID-19 [12], or to detect fake news on multiple platforms, since such news spreads chaos among people.

Acknowledgements This project has been fortunately executed because of the inspirational ideas
and teachings I got from numerous remarkable projects of Dr. L. Dey, chief scientist at TCS Research
and Innovation, India.

References

1. Eveland, J.W.P., Shah, D.V.: The impact of individual and interpersonal factors on perceived
news media bias. Polit. Psychol. 24(1), 101–17 (2003)
2. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Polosukhin, I.:
Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–
6008 (2017)
3. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retrieval 2(1–2),
1–135 (2008)
4. Lim, S.L.O., Lim, H.M., Tan, E.K., Tan, T.P.: Examining machine learning techniques
in business news headline sentiment analysis. In: Computational Science and Technology,
pp. 363–372. Springer, Singapore (2020)
5. Alvarez, G., Choi, J., Strover, S.: Good news, bad news: a sentiment analysis of the 2016
Election Russian Facebook Ads. Int. J. Commun. 14, 27 (2020)
6. Nakov, P., Ritter, A., Rosenthal, S., Sebastiani, F., Stoyanov, V.: SemEval-2016 task 4: sentiment
analysis in Twitter. arXiv preprint arXiv:1912.01973 (2019)
7. Chandrika, G.N., Ramasubbareddy, S., Govinda, K., Swetha, E.: Web scraping for unstruc-
tured data over web. In: Embedded Systems and Artificial Intelligence, pp. 853–859. Springer,
Singapore (2020)
8. Pennington, J., Socher, R., Manning, C.D.: Glove: global vectors for word representation. In:
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing
(EMNLP), pp. 1532–1543 (2014)
9. Rong, X.: Word2vec parameter learning explained. arXiv preprint arXiv:1411.2738 (2014)
10. Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R.R., Le, Q.V.: Xlnet: generalized
autoregressive pretraining for language understanding. In: Advances in Neural Information
Processing Systems, pp. 5754–5764 (2019)
11. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: pre-training of deep bidirectional
transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
12. Mehta, P., McAuley, D.F., Brown, M., Sanchez, E., Tattersall, R.S., Manson, J.J.: COVID-19:
consider cytokine storm syndromes and immunosuppression. Lancet 395(10229), 1033–1034
(2020)
Interactive Chatbot for COVID-19 Using
Cloud and Natural Language Processing

Patel Jaimin, Patel Nehal, and Patel Sandip

Abstract The spread of COVID-19 is making it very hard for healthcare departments and governments to answer people's queries due to limited front-desk assistance. This problem can be addressed with cutting-edge technology such as artificial intelligence. The purpose of this study is to implement an AI-powered chatbot, which is very helpful for giving the user necessary information about the disease in a conversational way. With the help of NLP technology, the cloud, and a messaging application, we can build chatbots very easily. We can easily define the flow of the conversation in an AI service with strong natural language capabilities. We have built a chatbot that answers common questions and, on top of that, can predict the sign and severity of COVID-19 based on the user's symptoms. As users interact with the bot, we can make the NLP model more accurate by training it further to understand the meaning of users' questions.

Keywords COVID-19 · Health care · Cloud computing · Artificial intelligence ·


Natural language processing · Chatbot

1 Introduction

A new disease in late December 2019 has been spreading rapidly in China, which
has been named “coronavirus disease 2019,” also known as “COVID-19” [1]. Within
a few weeks, COVID-19 disease has been spread rapidly outside china in all over the
world. On March 11, the World Health Organization has declared it as a pandemic

P. Jaimin · P. Nehal (B) · P. Sandip


Smt. K D Patel Department of Information Technology, Chandubhai S. Patel Institute of
Technology (CSPIT), Faculty of Technology & Engineering (FTE), Charotar University of
Science and Technology (CHARUSAT), Changa, Gujarat 388421, India
e-mail: nehalpatel.it@charusat.ac.in
P. Jaimin
e-mail: jaimin2751997@gmail.com
P. Sandip
e-mail: sandippatel.it@charusat.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 241
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_26
242 P. Jaimin et al.

[2]. COVID-19 is a respiratory disease spreading through person-to-person contact. The incubation period [3] of this disease is between 2 and 14 days. The primary symptoms are fever, cough, and shortness of breath, although studies found that a majority of cases in India, up to 80%, are asymptomatic [4]. Older adults with underlying medical conditions like heart or lung disease or diabetes are at higher risk of developing severe COVID-19 illness [3]. Currently, there is no vaccine or treatment available for COVID-19, so the only way to help slow the spread of this virus is through nonpharmaceutical interventions [5] taken by people and communities. In response, governments are working closely with public health partners.
In this pandemic situation, governments and health departments are receiving a high volume of requests for COVID-19 information, which is very difficult to process in a short time. Thanks to technological advancements, chatbots [6] play a leading role in bridging the gap between patients and clinicians. Because of recent advances in natural language processing [7], chatbots can recognize the meaning of a user's message (its "intent" [8]) and reply with the relevant information accordingly. Hence, users can directly ask their specific questions and save much of the time spent going through a traditional website interface [9]. Chatbots reduce healthcare costs when used in place of a human, or assist humans as a preliminary step in assessing a condition and providing self-care recommendations [10]. This project aims to spread awareness about COVID-19 by giving relevant answers to the user's questions and predicting the severity of COVID-19 based on the user's reported symptoms. Apart from this, this research shows how the cutting-edge field of artificial intelligence [11] becomes useful in this type of pandemic situation with the help of the cloud.

2 Literature Review

Table 1 lists 17 chatbots related to COVID-19. Among these bots, we found only one conversational bot, Providence St. Joseph. Some bots provide primary information about COVID-19 in the form of question–answer, while some check the sign or severity of COVID-19 based on the user's symptoms.

3 Proposed Model

Figure 1 depicts the COVID-19 assessment bot flowchart. A total of three services are used in this system: (i) the Messenger application [12], which is responsible for the user interface; (ii) IBM Watson Assistant [13], the natural language processing service from IBM, which is responsible for detecting the intent of the text message and delivering an appropriate response back to the Messenger application; and (iii) AWS Lambda [14], a cloud service where custom code can be executed, which is responsible for predicting the severity and sign of COVID-19. The flow of this system

Table 1 Chatbot survey


Bot name | Platform | AI service | Conversational | Symptoms checker | Q/A
World Health Organization's health alert | Whatsapp (+41 79 893 18 92) | No | No | No | Yes
Cleveland clinic's tool [15] | Webchat | No | No | Yes | No
Cobot-19 | Whatsapp (+91 7948058218) | No | No | No | Yes
Coronavirus self-checker [16] | Webchat | Yes | No | Yes | No
COVID-19_Mohw [17] | Facebook Messenger | No | No | No | Yes
COVID-19-assessment-chatbot-template [18] | Webchat | No | No | Yes | No
COVID-19-cases-tracker [19] | Webchat | No | No | No | No
COVID-19-faq-chatbot [20] | Webchat | No | No | No | Yes
Delhi Govt - corona helpline | Whatsapp (+91 8800007722) | No | No | No | Yes
Gog covid-19 helpline | Whatsapp (+91 7433000104) | No | No | No | Yes
GOV.UK | Whatsapp (+44 7860064422) | No | No | No | Yes
MyGov corona helpdesk | Whatsapp (+91 9013151515) | No | No | Yes | Yes
MyGov corona hub [21] | Facebook Messenger | No | No | No | Yes
Project baseline [22] | Webchat | No | No | Yes | No
Providence St. Joseph [23] | Webchat | Yes | Yes | Yes | Yes
Symptoms checker [24] | Webchat | No | No | Yes | No
TS Gov Covid Info | Whatsapp (+91 9000658658) | No | No | No | Yes

starts with the user's text entered in the Messenger application, which is sent to IBM Watson Assistant to recognize the meaning of the text. The Assistant identifies the intent of the text message and sends back the response defined in the dialog for that particular intent. If the Assistant identifies the assessment-test intent, a POST-request webhook is called to AWS Lambda, where custom code is executed to predict the COVID-19 sign and severity; the webhook response is then sent back to the Messenger application by the Assistant as a dialog response.
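A hedged sketch of such a Lambda function is given below in Python for readability, although the paper implements the bot in Node.js; the symptom list, weights, and severity thresholds are illustrative assumptions, not the authors' scoring rules.

```python
# Hedged sketch of an AWS Lambda handler that scores reported symptoms.
# Assumption: the Watson Assistant webhook POSTs a JSON body with a "symptoms" list.
import json

SYMPTOM_WEIGHTS = {              # hypothetical weights for illustration only
    "fever": 2, "cough": 2, "shortness of breath": 3,
    "sore throat": 1, "loss of smell": 2, "fatigue": 1,
}

def lambda_handler(event, context):
    body = json.loads(event.get("body", "{}"))
    symptoms = [s.lower() for s in body.get("symptoms", [])]
    score = sum(SYMPTOM_WEIGHTS.get(s, 0) for s in symptoms)
    if score >= 5:
        severity = "high - please contact a healthcare provider"
    elif score >= 2:
        severity = "moderate - monitor symptoms and self-isolate"
    else:
        severity = "low - no typical COVID-19 signs detected"
    return {"statusCode": 200, "body": json.dumps({"severity": severity})}
```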

Fig. 1 System workflow for COVID-19 assessment

4 Implementation

The COVID-19 assessment bot is implemented in the Node.js [25] programming language. The user interface of the bot is the Facebook Messenger application. A Facebook page is required to give the bot a unique identity. The Messenger application is set up from Facebook for Developers [26]. Subscribing the Facebook page from the Messenger application links the page to the application so that the user can communicate with the page through Messenger.
In order to make the bot conversational, an AI service is essential that can recognize the meaning of the user's message and respond accordingly. We have chosen IBM Watson Assistant for this bot. In Watson Assistant, we defined several intents and entities: an intent captures the meaning of a text message [27], and an entity recognizes stored values [28]. We also defined dialog nodes whose conditions are expressed in terms of intents or entities; when Watson Assistant recognizes the configured intent, the corresponding dialog is triggered and the response text configured in that dialog is sent. Besides this, a webhook is required to run custom code on an external server to obtain a response text; in this application, we have used the AWS Lambda service to predict the COVID-19 sign and severity based on the user's reported symptoms. Lastly, the Facebook Messenger application is linked with Watson Assistant to make the bot conversational. When a user enters some text, it is sent to IBM Watson Assistant by the Messenger application; Watson Assistant performs natural language processing and returns the response text based on the meaning of the user's text. In this way, the user gets a personalized experience with the bot. It saves the user's time and effort, as they can directly ask their query instead of navigating the structured design of websites to obtain information. Figure 2 depicts the chatbot profile, Fig. 3 the welcome message, Fig. 4 a question–answer exchange, and Fig. 5 the symptoms checker.
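For reference, the sketch below shows one way a backend could forward a user message to Watson Assistant using the ibm-watson Python SDK (shown in Python for consistency with the other sketches, even though the paper's bot is written in Node.js); the API key, service URL, and assistant ID are placeholders, and this SDK call is an illustrative alternative to the built-in channel linking described above.

```python
# Hedged sketch: sending one user message to Watson Assistant (v2 API) with the
# ibm-watson SDK; credentials and the assistant ID below are placeholders.
from ibm_watson import AssistantV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator("YOUR_API_KEY")
assistant = AssistantV2(version="2020-04-01", authenticator=authenticator)
assistant.set_service_url("https://api.us-south.assistant.watson.cloud.ibm.com")

ASSISTANT_ID = "YOUR_ASSISTANT_ID"
session = assistant.create_session(assistant_id=ASSISTANT_ID).get_result()

response = assistant.message(
    assistant_id=ASSISTANT_ID,
    session_id=session["session_id"],
    input={"message_type": "text", "text": "What are the symptoms of COVID-19?"},
).get_result()
print(response["output"]["generic"][0].get("text"))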

Fig. 2 Chatbot profile



Fig. 3 Welcome message

5 Conclusion

In this demo bot, we are able to give the user information based on their message. This type of healthcare bot does not require a dedicated server or storage; it can easily be implemented in the cloud with the help of AI services such as IBM Watson Assistant [13], Google Dialogflow [29], and Microsoft Azure [30]. In addition, it can easily be integrated into a website or any other messaging platform. Thus, a chatbot is very useful, especially in this type of pandemic situation, as healthcare departments can lower their burden of answering primary user queries with the help of conversational chatbots.

Fig. 4 Question–answer

Fig. 5 Symptoms checker

References

1. APA Wu, Y.-C., Chen, C.-S., Chan, Y.-J.: The outbreak of COVID-19. J. Chin. Med. Assoc.
83(3), 217–220 (2020). https://doi.org/10.1097/jcma.0000000000000270
2. https://www.cdc.gov/mmwr/volumes/69/wr/mm6918e2.htm?s_cid=mm6918e2_w
3. https://www.cdc.gov/coronavirus/2019-ncov/symptoms-testing/symptoms.html
4. https://www.indiatoday.in/newsmo/video/what-are-asymptomatic-covid-19-cases-1670422-
2020-04-24
5. https://www.cdc.gov/nonpharmaceutical-interventions/index.html
6. https://en.wikipedia.org/wiki/Chatbot
7. https://en.wikipedia.org/wiki/Natural_language_processing
8. https://www.nlpworld.co.uk/nlp-glossary/i/intent/
9. Valtolina, S., Barricelli, B.R., Di Gaetano, S.: Communicability of traditional interfaces VS
chatbots in healthcare and smart home domains. Behav. Inf. Technol. 39(1), 108–132 (2020)
10. Fadhil, A.: Beyond patient monitoring: conversational agents role in telemedicine and
healthcare support for home-living elderly individuals. arXiv preprint arXiv:1803.06000 (2018)
11. https://en.wikipedia.org/wiki/Artificial_intelligence

12. https://developers.facebook.com/docs/messenger-platform/
13. https://cloud.ibm.com/docs/services/assistant?topic=assistant-getting-started#getting-started
14. https://docs.aws.amazon.com/lambda/latest/dg/welcome.html
15. https://my.clevelandclinic.org/landing/preparing-for-coronavirus
16. https://www.cdc.gov/coronavirus/2019-ncov/testing/diagnostic-testing.html
17. https://www.messenger.com/t/COVID19.MOHW.BW
18. https://hellotars.com/chatbot-templates/coronavirus-covid19-fight/NkBd08/covid-19-assess
ment-chatbot-template
19. https://hellotars.com/chatbot-templates/coronavirus-covid19-fight/VJXnAu/covid-19-cases-
tracker
20. https://hellotars.com/chatbot-templates/coronavirus-covid19-fight/NJJrH-/covid-19-faq-cha
tbot
21. https://www.facebook.com/MyGovIndia/
22. https://www.projectbaseline.com/study/covid-19/
23. https://coronavirus.providence.org/
24. https://www.buoyhealth.com/symptom-checker/
25. https://nodejs.org/en/docs/
26. https://developers.facebook.com/docs
27. https://cloud.ibm.com/docs/assistant?topic=assistant-intents
28. https://cloud.ibm.com/docs/assistant?topic=assistant-entities
29. https://dialogflow.com/
30. https://azure.microsoft.com/en-in/
Investigating the Performance
of MANET Routing Protocols Under
Jamming Attack

Protiva Sen and Mostafizur Rahman

Abstract Mobile ad hoc networks are a genre of wireless networks whose nodes can perform as both routers and hosts and have the ability to organize themselves dynamically without using static infrastructure.
topological changes, they are mostly affected by various security attacks. Jamming
attack is a physical layer attack which is responsible for decreasing network perfor-
mance by isolating the communication with neighboring nodes. This paper aims to
find out the network performance under jamming attack on three routing proto-
cols such as geographical routing protocol (GRP), optimized link state routing
protocol (OLSR) and ad hoc on-demand distance vector (AODV). The simulation of
these protocols is considered with respect to performance parameters, network load,
throughput and delay by using Riverbed simulator. Finally, the outcome of different
scenarios is compared to find out the better performing protocols in case of jamming
attack.

Keywords MANET · Jamming attack · Riverbed · AODV · GRP · OLSR

1 Introduction

In recent years, the mobile ad hoc network (MANET) has gained popularity because of its dynamic characteristics and mobility and its ability to handle any kind of change that happens within the network. Moreover, it does not require any central management [1].
management [1]. Every node in the network plays a role to discover the routes and
maintain connection with other nodes around. A great benefit of MANET is that
it can be generated in any place, any time and any natural conditions without the
necessity of any pre-installed infrastructure [2]. Note that wireless network faces

P. Sen (B) · M. Rahman


Department of Electronics and Communication Engineering, Khulna University of Engineering
and Technology, Khulna 9203, Bangladesh
e-mail: protiva.ete@gmail.com
M. Rahman
e-mail: mostafiz963@yahoo.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 251
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_27
252 P. Sen and M. Rahman

more security challenges than wired network because of the mobility of the nodes
A pulse jamming attack is one of the most serious denial of service (DoS) attacks; it prevents information transmission between the genuine sender and receiver. Sometimes a malicious node can detect the original signal and disrupt the communication [5]. In this investigation, we present the performance of three routing protocols, AODV, OLSR, and GRP, using medium FTP, medium email, and low database traffic, with respect to the performance parameters throughput and delay. The same scenarios are implemented under a pulse jamming attack to find the best-performing protocol with and without the attack.

1.1 Jamming Attack

The jamming attack is a well-known DoS attack. Due to the characteristics of wireless communication, MANETs are exposed to security attacks [3, 6]. A jamming attack prevents nodes from transmitting and receiving data packets on the network [1]. It is carried out by a jammer device that continuously transmits a radio signal matching the radio frequency of the sender node [2, 7]. The performance of the network is degraded in terms of network load, throughput, end-to-end delay, data dropped, etc. (Fig. 1).

2 Related Works

Singh and Gupta [1] analyzed the performance of MANET under jamming attack and
without jamming attack. AODV routing protocol was selected for simulation. The
network performance was surveyed with simulation results of delay, data dropped
and network load. From the comparison, it was concluded that jamming attack is

Fig. 1 Jamming attack



responsible to decrease the network performance. Rao et al. [2] investigated the
performance of MANET routing protocols such as AODV, DSR, GRP and OLSR.
Eight performance metrics were used to compare the simulation result. Based on
these results, OLSR’s performance proved better than others. Popli and Raj [3]
configured a network with high mobility and AODV routing protocol. The network
in normal condition was compared with the network under jamming attack for end
to end delay and throughput. Jassim [8] presented the effect of jamming attack on
WLAN. Jamming attack decreased the throughput and increased delay. To mitigate
the jamming attack, PCF was enabled into the guard nodes.

3 MANET Routing Protocols

MANET routing protocols are classified according to different routing criteria. Based on how routing information is obtained and maintained, they are categorized into proactive routing protocols (table-driven), reactive routing protocols (on-demand), and geographical routing protocols (hybrid).

3.1 AODV Routing Protocol

The ad hoc on-demand distance vector (AODV) protocol is used for mobile nodes in a MANET and can handle thousands of nodes at a time. It performs route table management to maintain a single route to each destination instead of multiple routes, and no other nodes are required to maintain it. The main features of this protocol are that it adapts quickly and requires low processing and low network utilization [9]. Three message formats are used in this protocol: route request (RREQ), a broadcast message used to discover a new route to the receiver node; route reply (RREP), a unicast reply to an RREQ flood; and route error (RERR), a re-broadcast message used only when at least one unreachable destination is found [10].

3.2 Geographical Routing Protocol (GRP)

GRP is a familiar routing protocol for mobile networks that relies on the position of the source node. Because it combines the robustness of proactive and reactive routing, it is classified as a hybrid routing protocol [2]. The source node is responsible for collecting the information needed to find the best route, and based on this information, data packet transmission begins. The main disadvantages of this protocol are its complexity and overhead [10].

3.3 Optimized Link State Routing Protocol (OLSR)

The optimized link state routing protocol is suitable for random traffic and large networks. It uses multipoint relay (MPR) nodes to forward packets rather than flooding them [2]. It performs hop-by-hop routing and delivers its packets to the destination along the shortest route. Its proactive behavior makes available routes immediately usable when needed [11]. The distributed nature of its design ensures that no central entity is required. OLSR suits dense networks in which a considerable number of nodes communicate frequently [10].

4 Experimental Details

In this section, required simulation tool along with the several simulation setups will
be described.

4.1 Simulation Tool

In this paper, Riverbed Modeler 17.5 is used for network simulation. It was previously known as the OPNET simulator; Riverbed is the updated version of OPNET. It is specialized for network research and development and allows the user to design communication networks with different devices and protocols, test security, and simulate the network with different performance parameters and applications [7]. Several experiments on wireless technologies, their development problems, and their solutions have been carried out with it.

4.2 Simulation Setup

The simulation setup evaluates MANET performance for three different protocols. For each protocol, two scenarios are designed to compare its performance with and without the jamming attack. A campus network is implemented with 20 mobile nodes in a 10 × 10 km area. Each scenario is set to run for 250 s with a seed value of 128. For traffic generation on the network, the applications used are email (medium load), FTP (medium load), and database (low load) (Tables 1 and 2).

Table 1 MANET parameters


Name Value
Mobility model Random waypoint
Mobility speed 10 m/s
Traffic type Email, FTP, database
Ad hoc routing protocol AODV, OLSR, GRP
Pause time 50 s
Trajectory Vector
Packet size 16,000 bits
Physical characteristics Direct sequence
Data rate (bps) 11 mbps
Transmit power (W) 0.005
Packet-reception threshold −95
Rts threshold (bytes) 128
Short retry time 4
Long retry time 7
Max receive lifetime (secs) 0.5
Buffer size (bits) 1,024,000

Table 2 Jammer parameters


Name Value
No. of jammer 3
Trajectory Vector
Jammer band base frequency 2402
Jammer bandwidth 100,000
Jammer transmit power 0.001
Pulse width 1.0

5 Results and Analysis

The simulation outcomes are compared and studied in this section. Jammer nodes are
established inside the network for comparing the performance of this new network
with its normal condition. This comparison is accomplished by observing throughput,
delay, traffic sent, traffic received, etc.

Fig. 2 Performance of AODV (with and without jamming attack)

5.1 Comparison of Jamming Attack Under AODV Protocol

In the first scenario, the AODV protocol is configured without any jammer node. This scenario is then modified by introducing jammer nodes, and the performance of the new scenario is compared with the previous one.
From the simulation results, it is clear that the jammer nodes decrease the performance of the network by creating unwanted traffic: throughput drops from 4.5 megabits to 3.0 megabits, and delay increases from 3.6 to 4.8 s. Figure 2 shows the performance of AODV under normal and under-attack conditions.

5.2 Comparison of Jamming Attack Under OLSR Protocol

The same networks are configured with the OLSR protocol with and without the jamming attack. Figure 3 shows the performance parameters of the OLSR protocol without and with the jamming attack. When the routing traffic sent is compared, it is almost 114,000 bits without the jammer, whereas this value falls to almost 72,000 bits when the jammer is present.
The results of both simulations are then compared in terms of the throughput and delay of the network. After introducing the jammer nodes, throughput falls from 4.0 megabits to 2.52 megabits and delay rises from 3.9 to 5.75 s, which is caused by congestion in the network. As a result, the overall performance falls.

Fig. 3 Performance of OLSR (without and with jamming attack)

5.3 Comparison of Jamming Attack Under GRP Protocol

In this case, two scenarios are again simulated for the GRP protocol, one with jammer nodes and one without. The performance parameters of the GRP protocol are compared for these two scenarios in Fig. 4. This comparison gives clear evidence of the degradation of network performance during the jamming attack. The throughput and delay of the network under the GRP protocol are also affected by the jamming attack, which decreases the number of packets that reach the destination during run time.

Fig. 4 Performance of GRP (without and with jamming attack)



6 Conclusion

Due to the nature of the wireless medium between the sender and receiver nodes, MANETs are highly susceptible to different attacks, which degrade the network performance. The objective of this research was to find a reliable wireless routing protocol in the face of a jamming attack. The networks under the AODV, OLSR, and GRP protocols are all severely affected by the jamming attack. Among these three protocols, OLSR shows the worst performance in terms of traffic sent, delay, throughput, traffic received, and network load. From the observed results, it can be concluded that the GRP and OLSR protocols are more vulnerable to the jamming attack. In contrast, AODV is verified as the best performer under the jamming attack among the three protocols; for this reason, configuring the network with the AODV protocol is the best choice to withstand a jamming attack. Security in wireless networks is now a great concern, and this research work can be further expanded to other security attacks such as the wormhole attack and the byzantine attack, along with prevention mechanisms against them.

Classification of Skin Cancer Lesions
Using Deep Neural Networks
and Transfer Learning

Danny Joel Devarapalli, Venkata Sai Dheeraj Mavilla, Sai Prashanth Reddy Karri, Harshit Gorijavolu, and Sri Anjaneya Nimmakuri

Abstract Skin cancer is among the life-threatening cancers, but unlike most cancers, skin cancer is observable and can be detected in its early stages, yet not many people are aware of this. There are mainly three types of skin cancer: basal cell carcinoma, squamous cell carcinoma, and melanoma, of which melanoma is the most dangerous, with a very low survival rate. Skin cancers are not painful most of the time, but because cancer is nothing but the abnormal growth of skin cells, they are visibly distressing, which makes them easily detectable. A person can detect whether a skin lesion is cancerous by taking a picture, and deep neural networks can be used to classify the type of cancer. This is done by collecting several clinical images of cancerous skin lesions, segmenting them, removing noise, etc., and feeding them to a deep neural network to train on before detecting cancerous lesions. Our data was collected from the HAM10000 dataset, the ISIC Archive, and images scraped from the Web. Image augmentation was used to give all classes an equal number of images: every class has 3552 images, for a total of 10,656 images. The first model was a basic CNN, trained several times while changing the hyperparameter values to fine-tune it, which gave us 86.5% accuracy. We then implemented transfer learning with the ImageNet weights of different ImageNet models, where ResNet101 gave us the highest accuracy of 95.6%. We have deployed this model in a Web application using JavaScript and tensorflow.js.

D. J. Devarapalli (B) · V. S. D. Mavilla · S. P. R. Karri · H. Gorijavolu · S. A. Nimmakuri


Department of Computer Science and Engineering, Vignan Institute of Technology and Science,
Hyderabad, Telangana, India
e-mail: dannyjoeldevarapalli@gmail.com
V. S. D. Mavilla
e-mail: dheerajmavilla.dm@gmail.com
S. P. R. Karri
e-mail: saiprashanth1776@gmail.com
H. Gorijavolu
e-mail: harshitraj0300@gmail.com
S. A. Nimmakuri
e-mail: srianjaneya2019@gmail.com



Keywords Transfer learning · Deep neural networks · Skin cancer · Image classification

1 Introduction

There are many types of skin diseases and skin cancers, and some of them can be fatal. Skin cancers must be treated in the early stages, or they prove to be deadly in the long run. Many people neglect skin lesions, which can cost them their lives. In rural areas, where hospitals are not well equipped, it is a struggle to detect the diseases in their early stages. So the need for automatic skin disease prediction is increasing, for patients as well as for dermatologists. The currently available diagnosis procedure consists of long laboratory tests, and it takes about two weeks for the patient to get their biopsy results; the system described here instead enables users to predict skin disease using image processing. With a predictor system, a patient or a dermatologist can easily check whether a lesion is malignant, so that the cancer can be treated in its early stages. We collected images of the three most common skin cancers, and the prediction system is used to predict these three skin cancers, namely melanoma, squamous cell carcinoma (SCC), and basal cell carcinoma (BCC). Early detection of melanoma can potentially improve the survival rate, yet nearly 30,000 people die yearly in the USA alone. Skin cancers do not cause pain most of the time, but they are visible, as the cancer is nothing but the abnormal growth of skin cells. In this paper, the literature review discusses prior studies published with a similar objective, the methodology section describes our study and the workflow of its implementation, and the results section shows the promising results obtained when transfer learning is used. The future scope section discusses possible real-world applications of this work.

1.1 Objective

Primary Objective: Our primary objective is to classify skin cancers into their types by building the best possible model using convolutional neural networks and transfer learning.
Secondary Objective: Our secondary objective is to learn how convolutional neural networks work on sample data like this, to carefully study other architectures such as VGG16/19, InceptionV2/V3, and ResNet50/101, which have excelled in the ILSVRC over the years, and to study the results with and without transfer learning.

2 Literature Review

There has been much research not only on skin cancer detection/classification but also on many other skin diseases; some of the previously published papers are discussed here. Ansari and Sarode [1] proposed a technique to classify melanoma lesions only. Based on a careful study of skin cancers and their anatomy, they performed three preprocessing techniques—grayscale conversion, noise removal, and image enhancement—and used a support vector machine classifier; the image is segmented and then fed into the fit function for training, as melanoma is tested based on shape. They used a very small sample size, which cannot capture the variability of real-world images. He et al. [2] proposed computer-aided clinical skin disease diagnosis using CNN and object detection models, showing the influence of object detection in this approach: it can increase accuracy and decrease computation cost by removing unwanted background learning. They used an ensemble learning approach to obtain the final output and used two datasets, Skin-10 (which contains images belonging to 10 classes of skin diseases, with a total of 10,218 images) and Skin-100 (which contains 100 classes of common skin diseases with a total of 19,807 images). The best accuracy achieved was 79.01% for Skin-10 and 53.54% for Skin-100; we believe the reason for the low accuracy is the class imbalance visible in the dataset they mention, as the model was sensitive toward the high-volume classes, which may have led to the poor performance. The importance of object detection and ensemble learning methods is emphasized. Jana et al. [3] researched skin cancer cell detection using image processing, describing the role of segmentation and feature extraction in image processing, and proposed a technique to remove unwanted features, such as hair, from the area of interest. Ambad [4] presented an image analysis system to detect skin diseases, with a workflow comprising basic steps and a two-level classifier, which is a good idea: the first classifier detects whether a lesion is normal or defective, and if it is infected, the second classifier classifies whether the lesion is a melanoma, psoriasis, or dermo. A two-way classifier is a good approach.

3 Methodology

This project is aimed at determining the correct class of a skin cancer lesion, achieved by training a deep convolutional neural network along with segmentation methods. The major area of concentration and evaluation is choosing the number of epochs and the batch size so that the model fits the non-linear structure of the data well.
The overall workflow of this research consists of the following steps, each of which is explained and described below.

3.1 Data Preprocessing

Data cleaning is done by removing unwanted images and cropping the images so that the lesion is in focus. After data collection, the images are uploaded to Google Drive in different folders, that is, HAM10000, Web, and social media. Some images contained watermarks and clinical markings on the lesions, which might confuse the model; these pictures have been either discarded or edited using the tool GIMP. The images are resized to 224 × 224 × 3, since most ImageNet models use this image format.
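A minimal sketch of this resizing step in Python with Pillow is given below; the folder names are illustrative and not taken from the paper:

# Resize the cleaned lesion images to 224 x 224 x 3; folder names are illustrative.
from pathlib import Path
from PIL import Image

SRC, DST = Path("cleaned_images"), Path("resized_images")
DST.mkdir(exist_ok=True)

for path in SRC.glob("*.jpg"):
    img = Image.open(path).convert("RGB")   # ensure 3 channels
    img.resize((224, 224)).save(DST / path.name)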

3.1.1 Generating Data

This step produces the final image data that is ready to be fed to a deep neural network. Our dataset is imbalanced, which means the classes have different numbers of images: for melanoma, which is very common, plenty of data was available (1500+ images), but we found fewer images for squamous cell carcinoma (700+) and basal cell carcinoma (900+). With class-imbalanced data, there is a greater chance that the model will overfit to, or be sensitive only to, the class with the highest number of images. To avoid this vulnerability to overfitting, we used the ImageDataGenerator class, which generates images with the specified image augmentations.

Image Augmentation Image augmentation [5] is a technique used to artificially expand a dataset. This is helpful when we are given a dataset with very few data samples. Of the augmentation parameters generally used to increase the sample count, we did not use those that would distort the image too much and remove lesion features/information. We generated 3552 images for each class, bringing the total dataset to 10,656 images. The parameters and their respective values are as follows: brightness_range = [0.3, 1.0], zoom_range = [0.5, 1.0], horizontal_flip = True, rotation_range = 90, and vertical_flip = True, as sketched below.
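A minimal sketch of this augmentation configuration using Keras' ImageDataGenerator, with exactly the parameter values listed above, is shown here; the directory path and the use of flow_from_directory are illustrative assumptions:

# Augmentation parameters as listed above; the directory path is illustrative.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    brightness_range=[0.3, 1.0],
    zoom_range=[0.5, 1.0],
    horizontal_flip=True,
    rotation_range=90,
    vertical_flip=True,
)

# One sub-folder per class; integer labels match sparse_categorical_crossentropy.
train_flow = datagen.flow_from_directory(
    "dataset/train",
    target_size=(224, 224),
    batch_size=32,
    class_mode="sparse",
)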

3.2 Prediction Techniques

In deep learning, model training is the most time-consuming task. To cope with this, we trained our models on Google Colab, which gives users free access to its compute backend with integrated GPUs and TPUs and almost 12 GB of RAM. We started by building a simple CNN architecture to train on our data.

3.2.1 CNN—Basic Architecture

A CNN architecture starts with feature extraction, followed by pooling layers and fully connected layers, and finishes with classification. Feature extraction is performed by alternating convolution layers with subsampling or pooling layers. Classification is performed with dense or fully connected layers followed by a final softmax layer. For image classification, a CNN performs better than a fully connected feedforward neural network. A basic CNN architecture [6] contains the following.
• Filters is the number of desired feature maps.
• Kernel size is the size of the convolution kernel. A single number 5 means a 5 ×
5 convolution.
• Padding is either ‘same’ or ‘valid’. Leaving this blank results in padding = ‘valid’. If padding is ‘valid’, then the size of the new layer's maps is reduced by kernel_size − 1. For example, if you perform a 5 × 5 convolution on a 28 × 28 image (map) with padding = ‘valid’, then the next layer has maps of size 24 × 24. If padding is ‘same’, then the size is not reduced.
• Activation is applied during forward propagation. Leaving this blank results in no activation.
• We used ‘ReLU’ activation for every layer and softmax at the end; we also used a dropout of 0.4 (40%) to generalize the data, thereby avoiding overfitting. Our architecture takes an input_size of (28, 28, 1) ‘grayscale’ image and consists of:
• Two convolutional layers with 32 feature maps and kernel_size 3 × 3, with activation ‘ReLU’, and one convolutional layer with 32 feature maps, kernel_size 5 × 5 and stride 2.
• Two convolutional layers with 64 feature maps and kernel_size 3 × 3, with activation ‘ReLU’, and one convolutional layer with 64 feature maps, kernel_size 5 × 5 and stride 2.
• A flatten layer followed by a fully connected layer, dense—128, a dropout of 0.4, and dense—3 (the number of classes present).
• The model is trained with batch_size = 32 and epochs = 100. We added batch normalization after every layer and two dropouts, one of 0.4 after two layers and one of 0.5 before the final fully connected layer, and used kernel_regularizer = l2(0.001) and bias_regularizer = l2(0.001). The accuracy after 100 epochs was 0.8647 on validation and 0.9947 on training, but this was not our best model (a minimal sketch of this architecture is given after this list). We next trained with ILSVRC models that were trained on the ImageNet dataset. We first trained our data on their architectures without ImageNet weights to see the results, and then used the transfer learning method to fine-tune the model and train it with ImageNet pretrained weights.
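A minimal Keras sketch of the basic CNN described in the list above is given here. The input size, filter counts, kernel sizes, strides, dropouts, regularizers, batch size and epochs follow the text; details the text does not spell out (exact BatchNormalization placement, padding mode, and the optimizer) are assumptions:

# Sketch of the basic CNN described above; padding mode, BatchNorm placement
# and the optimizer are not specified in the text and are assumed here.
from tensorflow.keras import layers, models, regularizers

reg = regularizers.l2(0.001)

def conv(filters, kernel, **kw):
    return layers.Conv2D(filters, kernel, activation="relu", padding="same",
                         kernel_regularizer=reg, bias_regularizer=reg, **kw)

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    conv(32, 3), layers.BatchNormalization(),
    conv(32, 3), layers.BatchNormalization(),
    conv(32, 5, strides=2), layers.BatchNormalization(),
    layers.Dropout(0.4),
    conv(64, 3), layers.BatchNormalization(),
    conv(64, 3), layers.BatchNormalization(),
    conv(64, 5, strides=2), layers.BatchNormalization(),
    layers.Flatten(),
    layers.Dense(128, activation="relu", kernel_regularizer=reg, bias_regularizer=reg),
    layers.Dropout(0.5),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",                      # optimizer assumed
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, batch_size=32, epochs=100, validation_split=0.2)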
We have used ResNet101, VGG16, and InceptionResNetV2 with and without transfer learning, i.e., with the weights parameter set to ‘ImageNet’ or to None, respectively. We also created a pickle object file of our prepared image data in Google Drive so that, on every run, it could simply be loaded from the drive instead of extracting the data again.

3.2.2 Without Transfer Learning

We used Keras applications to build the VGG16, InceptionResNetV2, and ResNet101 architectures, following the documentation provided by Keras, with the weights parameter set to None. The input images were extracted and resized to 224 × 224 × 3, since most ImageNet models use this image format, and include_top = False. On top of this, we added a global average pooling 2D layer and a dropout of 0.5 for VGG16 and ResNet101 (0.4 for InceptionResNetV2), all with activation ‘ReLU’, and a dense layer of 3, for the three classes in the dataset. All the models use ‘sparse_categorical_crossentropy’ as the loss function, since one image belongs to only one type of cancer, and metrics = [‘accuracy’].

ResNet101 A residual neural network (ResNet) [7] is an artificial neural network (ANN) based on constructs known from pyramidal cells in the cerebral cortex. Residual neural networks utilize skip connections or shortcuts to jump over some layers. ResNet-N is a deep residual network that is N layers deep. It is a subclass of convolutional neural networks, with ResNet most popularly used for image classification.
• We used the stochastic gradient descent (SGD) optimizer with a learning rate of 0.01 and a momentum of 0.9.
• With ResNet's 101 layers, the trainable params are 42,558,979 and the non-trainable params are 105,344, with a batch_size of 32, 20 epochs, and a validation_split of 0.2. TRAINING ACCURACY = 0.891 and VALIDATION ACCURACY = 0.8116.

VGG16 VGG16 [8] is a CNN model. The model achieved 92.7% top-5 test accuracy on the ImageNet dataset. This network has a very simple architecture using only 3 × 3 convolutional layers stacked on top of each other in increasing depth.
• We used the Adam optimizer with a learning rate of 0.0001.
• With VGG16's architecture, total params: 3,588,003, trainable params: 3,588,003, non-trainable params: 0, with a batch_size of 32, 100 epochs, and a validation_split of 0.2. TRAINING ACCURACY = 0.9981 and VALIDATION ACCURACY = 0.8918.

InceptionResNetV2 An Inception layer [9] is a combination of a 1 × 1 convolutional layer, a 3 × 3 convolutional layer, and a 5 × 5 convolutional layer, with their output filter banks concatenated into a single output vector forming the input of the next stage.
• We used the Adam optimizer with a learning rate of 0.0001, the same as for VGG16.
• With InceptionResNetV2's architecture, total params: 54,451,939, trainable params: 54,391,395, non-trainable params: 60,544, with a batch_size of 32, 10 epochs, and a validation_split of 0.2. TRAINING ACCURACY = 0.9575 and VALIDATION ACCURACY = 0.866.

3.2.3 With Transfer Learning

With transfer learning [10], instead of starting the learning process from scratch, we start from patterns that have been learned when solving a different problem, such as ImageNet in our case. In this way we leverage previous learning and avoid starting from scratch, which saves a lot of time. There are three strategies for implementing transfer learning:
1. Train the entire model.
2. Train some layers and leave the others frozen.
3. Freeze the convolutional base.
Of the three strategies, the second is used when we have a small dataset, as our dataset of about 10 k images is small compared with the 14 million images of ImageNet.
We again used Keras applications to build the VGG16, InceptionResNetV2, and ResNet101 architectures, this time with the weights parameter set to ‘ImageNet’. The input images were extracted and resized to 224 × 224 × 3, since most ImageNet models use this image format, and include_top = False. All the models use ‘sparse_categorical_crossentropy’ as the loss function, since one image belongs to only one type of cancer, and metrics = [‘accuracy’].

VGG16—ImageNet Weights
• We added to this a convolutional layer with 64 feature maps and kernel_size = (3, 3), a max pooling 2D layer with pool_size = 2, a flatten layer, a fully connected layer of 256, a dropout of 0.5, and a fully connected layer of 3 with ‘softmax’ activation.
• We used the Adam optimizer with a learning rate of 0.0001.
• With VGG16's architecture, total params: 12,278,915, trainable params: 12,278,915, non-trainable params: 0, with a batch_size of 32, 100 epochs, and a validation_split of 0.2.
• TRAINING ACCURACY = 0.9785 and VALIDATION ACCURACY = 0.9009, which is an acceptable result; however, we cannot certainly call it an optimal model, as the difference between the accuracies shows evidence of overfitting.

InceptionResNetV2—ImageNet Weights
• We added a flatten layer to the loaded model, a dropout of 0.4, and finally a dense layer of 3 with ‘softmax’ activation.
• We used the Adam optimizer with a learning rate of 0.0001.
• With InceptionResNetV2's architecture, total params: 54,451,939, trainable params: 54,391,395, non-trainable params: 60,544, with a batch_size of 32, 10 epochs, and a validation_split of 0.2.
• TRAINING ACCURACY = 0.9927 and VALIDATION ACCURACY = 0.9314, which is an acceptable result; although the difference is acceptable, we cannot certainly call it an optimal model. The difference also does not show much evidence of overfitting.

Fig. 1 ResNet101 learning curves with transfer learning

ResNet101—ImageNet Weights
• For ResNet101, we added a global average pooling 2D layer, a dropout of 0.4, and finally a dense layer of 3 with ‘softmax’ activation.
• We used the Adam optimizer with a learning rate of 0.0001, as for the rest of the models; the slow learning rate gave us better results.
• With ResNet101's architecture, total params: 42,664,323, trainable params: 42,558,979, non-trainable params: 105,344, with a batch_size of 32, 100 epochs, and a validation_split of 0.2.
• TRAINING ACCURACY = 0.9992 and VALIDATION ACCURACY = 0.9563, which is an acceptable result and almost an optimal model, with an acceptable difference between the two. This has been the best result so far, with training and testing accuracies differing very little, which means that the model has trained well on the training data of 8516 images, generalized well without overfitting, and predicts new data with 95.63% accuracy; compared with the ResNet101 model without any pretrained weights, we can say that the transfer learning method outperformed the traditional approach. A minimal sketch of this configuration is given after this list.
• The learning curves (in Fig. 1) also do not show much evidence of overfitting compared with ResNet101 without pretrained weights.
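A minimal Keras sketch of this best-performing configuration (ResNet101 with ImageNet weights, a global average pooling 2D layer, dropout of 0.4, a 3-way softmax head, Adam at learning rate 0.0001, batch size 32 and a 0.2 validation split) is given below; setting weights=None instead reproduces the no-transfer baseline of Sect. 3.2.2:

# ResNet101 transfer-learning configuration as described above.
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.ResNet101(
    weights="imagenet",        # use weights=None for the no-transfer baseline
    include_top=False,
    input_shape=(224, 224, 3),
)

x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dropout(0.4)(x)
outputs = layers.Dense(3, activation="softmax")(x)   # melanoma, SCC, BCC

model = models.Model(inputs=base.input, outputs=outputs)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(x_train, y_train, batch_size=32, epochs=100, validation_split=0.2)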

4 Results

The experimental results obtained on the input images of skin cancers by means of the transfer learning approach are shown below.
From the results obtained with the deep learning algorithms (Table 1), it can be concluded that ResNet101 is the best algorithm for predicting the class of skin lesions.

Table 1 Accuracies of different deep learning algorithms

Algorithm (with ImageNet weights)   Training accuracy   Validation accuracy
VGG16                               0.9785              0.9001
InceptionResNetV2                   0.9927              0.9317
ResNet101                           0.9992              0.9563

5 Conclusion

The project's key goal is to predict the type of a skin lesion from its image with the highest possible accuracy by means of the transfer learning approach. Several architectures were trained with different learning rates, epochs, and batch sizes; the ResNet101 architecture with ImageNet weights gave us the best accuracy ever published or recorded for identifying the type of a skin cancer lesion, which is 95.63%, with a training accuracy of 99.92%, and we do not see the model overfitting in this case. Also, an ensemble approach, which has been reported to give better results, is implemented using a basic voting mechanism written in Python. We tried to deploy this model as a Web application, but encountered a few errors with the Express server and the tensorflow.js version. Our second goal, understanding how these deep neural networks work and knowing how to implement and fine-tune them to get better results, is achieved.

5.1 Future Scope

Since the project identifies the cancer lesion type, it can be used by both dermatologists and patients. Before sending a clinical image for a biopsy test, dermatologists can run the lesion through the model and, based on the result, focus on validating whether the lesion belongs to the class that the model has specified. This would cut down the 2- to 3-week delay for biopsy results. If the model predicts inaccurately in certain conditions, it can be retrained with the wrongly classified images to better learn the features it missed in the first round of learning. For better reliability, because we cannot solely trust a machine for the final prediction and take its result as a final answer, we can compare the performance of dermatologists and of the machine by asking both the dermatologists and the model to classify a set of images, validating their predictions, and evaluating the performance of the model against that of an experienced doctor.

References

1. Ansari, U.B., Sarode, T.: Skin cancer detection using image processing. Int. Res. J. Eng. Technol. (IRJET) (2017) Mumbai, India. Available at: https://www.irjet.net/archives/V4/i4/IRJET-V4I4702.pdf
2. He, X., Wang, S., Shi, S., Tang, Z.: Computer-Aided clinical skin disease diagnosis using CNN and object detection models, Nov 2019, China. Available at: https://www.researchgate.net/publication/337413270_Computer-Aided_Clinical_Skin_Disease_Diagnosis_Using_CNN_and_Object_Detection_Models
3. Jana, E., Subban, R., Saraswathi, S.: Research on skin cancer cell detection using image processing. In: IEEE-International Conference on Computational Intelligence and Computing Research (ICCIC), Dec 2017, India. Available at: https://ieeexplore.ieee.org/document/8524554
4. Ambad, P.S.: An image analysis system to detect skin diseases. IOSR J. VLSI Sig. Process. (IOSR-JVSP) (2016) India. Available at: https://pdfs.semanticscholar.org/014e/75f75274d4b8a75ae3e2356556f7450fdb5a.pdf
5. Brownlee, J.: How to configure image data augmentation? (2019). Available at: https://machinelearningmastery.com/how-to-configure-image-data-augmentation-when-training-deep-learning-neural-networks/
6. Deotte, C.: Basic CNN architecture (2018). Available at: https://www.kaggle.com/cdeotte/how-to-choose-cnn-architecture-mnist
7. Fung, V.: An overview of resnet and its variants (2017). Available at: https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035
8. Tewari, S.: CNN architecture series—VGG16 with implementation (Part I) (2019). Available at: https://medium.com/datadriveninvestor/cnn-architecture-series-vgg-16-with-implementation-part-i-bca79e7db415
9. Raj, B.: A simple guide to the versions of inception networks (2018). Available at: https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
10. Marcelino, P.: Transfer learning from pre-trained models (2018). Available at: https://towardsdatascience.com/transfer-learning-from-pre-trained-models-f2393f124751
Security Features in Hadoop—A Survey

Gousiya Begum, S. Zahoor Ul Huq, and A. P. Siva Kumar

Abstract Extensive usage of Information and Communication Technology applications, including online banking, e-commerce, retail, social media, smart phone apps, etc., is responsible for the creation of extremely large amounts of digital data every hour, which is termed Big Data. To store and process Big Data, an open-source framework known as Apache Hadoop is needed. Initially it was developed for internal use at Yahoo with limited security features. Later, Hadoop was made open source and distributed under the Apache License. Once Hadoop was made open source, many developers contributed to its development, and in this process many authentication and authorization services were developed and distributed as part of the Hadoop framework. In this paper, the security features of Hadoop and their limitations are discussed, and the future scope of work with respect to Hadoop security is highlighted.

Keywords HDFS · Map reduce · Security · Kerberos · ACL

1 Introduction

The large amount of data collected is known as Big Data. The data is collected from various sources such as social media, databases, etc. Initially, Big Data characteristics were specified by the 3 V's: variety, velocity and volume. Volume specifies the size of the data to be stored; a recent study forecasted that 1.8 zettabytes of data were created in 2011 alone [1], and around 2.5 quintillion bytes of data are created every day, with this amount increasing every single second.

G. Begum (B)
CSE Department, JNTU, Ananthapuramu, India
e-mail: gousiyabegum@gmail.com
S. Z. U. Huq
CSE Department, GPREC, Kurnool, India
e-mail: szahoor@gmail.com
A. P. Siva Kumar
MGIT, Hyderabad, India
e-mail: sivakumar.ap@gmail.com


Variety specifies whether the data is structured or unstructured [2]. Velocity specifies how fast the data is coming in and how fast it has to be processed. Apart from the 3 V's, the characterization has today been extended to 51 V's. They are
(variety, velocity, volume, volatility, validity, veracity, vanilla, voice, visualization,
victual, value, viability, verification, verbosity, vexed, versatility, voluntariness, vet,
vulpine, variability, viscosity, vocabulary, verdict, venue, versed, violation, vibrant,
versioning, vagueness, vitality, virality, visibility, vastness, varmint, vantage, valor,
varnish, veer, vaticination, vane, varifocal, vault, veil, virtuosity, vivification, vogue,
voodooism, voyage, verve, venturesomeness) [3]. The data collected from various
sources has to be processed, and traditional system resources are not enough to process this data, so Hadoop is used. Hadoop is a fully distributed, massively parallel framework powered by Apache, which stores Big Data and processes it in a distributed fashion on various computers that form clusters, using programming languages such as Java. Hadoop has its own file system (HDFS), which is placed above the host computer's file system [4]. HDFS is used to store Big Data and MapReduce is used to process it; Hadoop uses a scale-out technique, and YARN (Yet Another Resource Negotiator) is used as the resource manager [5, 6].
The organization of the paper is as follows: Sect. 2 describes the Hadoop architecture, Sect. 3 the literature survey, Sect. 4 Hadoop security techniques, and Sect. 5 the conclusion and future work.

2 Hadoop Architecture

Figure 1 describes the Hadoop architecture. In this architecture, data is divided into blocks, and these blocks are distributed over different nodes/machines.

Fig. 1 Architecture of hadoop framework



In each machine there are several splits, and for each split a mapper is run; this is parallelism inside parallelism, also called massive parallelism. A client stores a file on HDFS in order to perform some operation on it, so it sends a request to the NameNode to store the data. The NameNode stores the file's metadata and tells the client the locations of the data nodes where the file has to be stored in HDFS [7]. The client then stores the file on those particular data nodes. After successfully storing the file, the DataNode sends an acknowledgment to the client. The data is replicated on a minimum of three systems to avoid data loss. A heartbeat is sent from the data nodes to the NameNode to deliver the block report and to tell the NameNode that the data node is still running. If the client wants to perform processing, the Job Tracker handles it: it asks the NameNode where the data is stored, and the Task Tracker performs the required functions. During processing, the two functions executed are Map and Reduce; once these two functions are completed, the outputs from all data nodes are taken and reduced to a single output. To report its current functioning, the Task Tracker also sends a heartbeat to the Job Tracker. The main distributors of Hadoop software [8, 9] are: 1. MapR, 2. Cloudera, 3. Hortonworks, 4. Microsoft HD Insight, 5. IBM Infosphere Big Insights, and 6. Pivotal HD.

3 Literature Survey

Hadoop was originally developed with little security; security implementation in Hadoop started in 2009. Hadoop distributors like Cloudera, Hortonworks and MapR have proprietary security features, but these features are not present in the Apache releases of Hadoop. According to Cloudera, the security of Hadoop is based on four levels [10]:
• Authentication: It establishes which users can be authenticated.
• Authorization: It establishes which authenticated users can access how much data and what data.
• Audit: It monitors when the data is accessed, where the data is accessed from and how the data is accessed.
• Encryption: It establishes how the data is protected when it is at rest or in motion.

A. Authentication/Perimeter Security: This was added in 2010, and the aim of providing authentication is that clients accessing the cluster should be genuine and the servers of the cluster should be authenticated. One of the concepts used for authentication is Kerberos, an authentication protocol [11, 12] used in networks to provide authentication for applications. It uses the concept of tickets and is based upon symmetric-key cryptography. First, the user authenticates himself/herself with the Authentication Service (AS) by providing a password, and a Ticket Granting Ticket (TGT) is issued [13] and stored in the cache. When the user wants to use a service, he/she sends the TGT to the Ticket Granting Service (TGS) and gets a service ticket, which is used to authorize access to the service. Though Kerberos is a good solution for authentication, it also has some disadvantages:

• If the user's TGT were used for every MapReduce job, the Key Distribution Center (KDC) would quickly become a bottleneck, traffic would increase, and there would be a chance of a Distributed Denial of Service attack.
• Kerberos tickets are not renewed frequently, so hackers may hijack tickets and use the system.
• Deployed code should be compliant with Kerberos, so separate planning and testing are needed to add authentication to the code.
• If the KDC fails, HDFS and MapReduce will not work.
• The KDC does not have any strategy to identify authentication breaches.

To reduce these disadvantages, Delegation Tokens are used. The user uses a Delegation Token for authenticating with servers, and MapReduce processes use these delegation tokens to authenticate themselves to the NameNode whenever they want to access HDFS. A Delegation Token uses an HMAC mechanism; tokens are stored in a HashMap on the server, where the key contains public information and the value contains private information.
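The HMAC idea behind delegation tokens can be pictured with a short Python sketch. This is only a conceptual illustration of the mechanism described above, with an illustrative identifier format; it is not Hadoop's actual token format or API:

# Conceptual HMAC-protected token: the identifier is the public part and the
# HMAC computed with the server's secret is the private part.
# This is NOT Hadoop's actual delegation-token implementation.
import hashlib
import hmac
import secrets

server_secret = secrets.token_bytes(32)          # known only to the server

def issue_token(identifier: bytes) -> bytes:
    return hmac.new(server_secret, identifier, hashlib.sha256).digest()

def verify_token(identifier: bytes, password: bytes) -> bool:
    expected = hmac.new(server_secret, identifier, hashlib.sha256).digest()
    return hmac.compare_digest(expected, password)

ident = b"owner=alice;renewer=jobtracker"        # illustrative public fields
token_password = issue_token(ident)
assert verify_token(ident, token_password)       # server-side check succeeds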
Block Access Token [11]: For data block accesses on data nodes, the client should be authenticated, so the NameNode provides a Block Access Token, which is valid only for a limited time (by default 10 h) and cannot be renewed after that time.
In [14], an authentication protocol based on the Trusted Platform Module (TPM) was developed, using which authentication is provided to Hadoop internal components. The problem with this TPM approach is that it assumes the NameNode is trustworthy.
B. Authorization: In HDFS, authorization is primarily governed by file permissions: access to a file or directory in HDFS requires the corresponding permissions. Similar to a Linux system, permissions such as read, write and delete are given to the owner, the group and others (a conceptual sketch of this check is given at the end of this subsection). Any member of the group defined in dfs.permissions.superusergroup on the NameNode can read, write or delete any file or directory. HDFS supports three additional special permissions: sticky, setgid and setuid. The sticky bit is used for directories, such as /tmp, where you want all users to have write access to the directory but data can be deleted only by its owner.
Hadoop enables authorization based on ACLs (Access Control Lists) [15, 16]. It supports access control lists on the job queues, controlling which users can submit jobs to queues and which users can administer a queue. Apache Sentry was developed in order to resolve such access-control issues: in Apache Sentry [17], fine-grained role-based access control (RBAC) gives administrators the flexibility to control what can be accessed by which users. Extended ACLs were introduced in Hadoop version 2.4 and are enabled on the NameNode by setting the configuration property dfs.namenode.acls.enabled to true in hdfs-site.xml. Authorization in Hadoop is also supported at the service level by setting the hadoop.security.authorization variable to true in core-site.xml. This checks which users or groups of users can access which protocols, so as to stop unauthorized access; the actual authorization policies are configured in the hadoop-policy.xml file. HDFS, MapReduce and YARN

also supports authorizations at service level. MapReduce or YARN does not control
access to data but only provide access to resources of the clusters like memory, disk,
CPU and network I/O.
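The owner/group/others evaluation described at the start of this subsection can be pictured with a small Python sketch. This is a conceptual model of POSIX-style permission bits, not HDFS's actual implementation:

# Conceptual owner/group/others permission check (rwx bits); not HDFS code.
def is_allowed(user, user_groups, file_owner, file_group, mode, want):
    """mode is e.g. 0o640; want is 'r', 'w' or 'x'."""
    bit = {"r": 4, "w": 2, "x": 1}[want]
    if user == file_owner:
        cls = (mode >> 6) & 7        # owner bits
    elif file_group in user_groups:
        cls = (mode >> 3) & 7        # group bits
    else:
        cls = mode & 7               # others bits
    return bool(cls & bit)

# Mode 640: owner may read/write, group may only read, others have no access.
print(is_allowed("alice", {"analysts"}, "alice", "analysts", 0o640, "w"))  # True
print(is_allowed("bob", {"analysts"}, "alice", "analysts", 0o640, "w"))    # False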
C. Auditing means keeping track of what users and services are doing in the cluster [18]. In HDFS, audit.log is used to audit user activities such as creating a file, changing file permissions, etc., while SecurityAuth-hdfs.audit is used for auditing at the service level. The log4j settings used for auditing in HDFS are log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit and log4j.category.SecurityLogger. Auditing in MapReduce focuses on end-user queries and jobs and uses mapred-audit.log, while authorization at the service level is audited with SecurityAuth-mapred.audit. The log4j settings used for auditing in MapReduce are log4j.logger.org.apache.hadoop.mapred.AuditLogger and log4j.category.SecurityLogger.
D. Encryption: Encryption can be applied as Data-in-Transit encryption and Data-at-Rest encryption.
1. HDFS Data-at-Rest Encryption

Data-at-Rest encryption encrypts data at the application layer before it is sent in transit and reaches storage. This type of encryption runs above the operating-system layer and requires only Hadoop system packages or hardware. Within HDFS, directory paths that have to be encrypted are organized into encryption zones. A unique data encryption key (DEK) [19] is used to encrypt each file in an encryption zone. A zone-level key, known as the encryption zone key (EZK), is used to encrypt the DEK into an encrypted DEK (EDEK), so that plain-text DEKs are never stored. The EZKs should not be stored in HDFS, because if they were, decryption would become easy; EZKs must therefore be accessed through a secure key server. In big enterprises, the actual storage component is taken care of by a dedicated hardware security module (HSM). The Hadoop Key Management Server (KMS) sits between HDFS clients and the key server; it handles both EZKs and DEKs, communicates with the key server, and decrypts EDEKs. The KMS communicates with the key server through a Java API called the KeyProvider.
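The EZK/DEK/EDEK relationship can be illustrated with a short envelope-encryption sketch using the Python cryptography package. This is a conceptual illustration of the scheme described above, not the Hadoop KMS/KeyProvider API:

# Conceptual envelope encryption: each file gets its own DEK, and only the
# EZK-encrypted form of the DEK (the EDEK) is stored alongside the data.
from cryptography.fernet import Fernet

ezk = Fernet.generate_key()                 # zone key, held by the key server/HSM
dek = Fernet.generate_key()                 # per-file data encryption key

edek = Fernet(ezk).encrypt(dek)             # stored with the file metadata
ciphertext = Fernet(dek).encrypt(b"file contents ...")

# To read the file, the KMS-equivalent decrypts the EDEK; the EZK never leaves it.
recovered_dek = Fernet(ezk).decrypt(edek)
plaintext = Fernet(recovered_dek).decrypt(ciphertext)
assert plaintext == b"file contents ..."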
2. Data-in-Transit Encryption

Transport Layer Security: SSL/TLS are the protocols used for securing data that moves through the network; they can be used to secure any socket connection and rely on a certificate authority (CA) to provide trust. Hadoop Data-in-Transit Encryption: Hadoop uses RPC, TCP/IP and HTTP [20, 21] to communicate over the network. RPC calls are used by API clients of MapReduce, the Job Tracker, the Task Tracker, the NameNode and the data nodes; TCP/IP sockets are used by HDFS for data transfer; and MapReduce shuffles use the HTTP protocol.
(a) Hadoop RPC Encryption: Hadoop's RPC implementation supports SASL, which provides integrity, confidentiality and authentication depending on the configured setting.

The RPC protection mechanism of Hadoop is configured in the core-site.xml file with the hadoop.rpc.protection property.
(b) HDFS TCP/IP protocol encryption: To transfer data between clients and HDFS, a direct TCP/IP socket is used. For data transfer encryption, dfs.encrypt.data.transfer is set to true in the hdfs-site.xml file.
3. Hadoop HTTP encryption: HTTPS is used to encrypt Data-in-Transit over HTTP.

4 Some Security Techniques

Data Leakage Prevention (DLP) technology [22] is used to secure data against leakage. This technology was introduced in the year 2000. The disadvantage of DLP is that once the data has been removed, it is not able to protect it.
Verizon released a white paper [23] on cloud security. Its model is divided into four security layers:
• Base: takes care of physical security.
• Logical: checks the integrity, availability and confidentiality of data and resources in the network; it has network, compute, management and storage sublayers.
• Value-Added: provides private IP network, firewall and VPN capabilities.
• Governance, Risk and Compliance: ensures all measures of security in the above three layers.
Based on the Verizon model, a security architecture was introduced by the Twilio company. Twilio is a cloud company that implements Hadoop reliably using Amazon S3 services; it uses S3 policies and ACLs.
According to [24], the computation of MapReduce is distributed in nature, and there is a chance for a variety of attacks, such as:
• Impersonation attack: An illegitimate user acts like a legitimate user, for example through a brute-force attack, and runs MapReduce jobs, which results in data leakage.
• Denial of Service attack: An attacker stops the functioning of, or access to, a mapper or reducer using different undesirable tasks.
• Replay attack: An attacker resends previous tasks to the data nodes and keeps them busy continuously.
• Eavesdropping attack: An attacker observes the input data and obtains intermediate and final outputs without performing the MapReduce computations.
• Man-in-the-middle attack [25]: An attacker modifies or corrupts computing code exchanged between two legitimate users.
Proper authorization, authentication, restricted access, confidentiality and validated input to the mapper and reducer classes are required for secure MapReduce computation.
According to the authors in [11], a Bull Eye Algorithm is proposed for Hadoop. This algorithm allows data to be read or written only by authorized persons and, when implemented, it checks that the data is encrypted for better protection.

Only the highly confidential data stored in the data nodes is checked.
One more approach given in [11] is the NameNode approach: to increase the security of the available data, two name nodes are used, one as master and the other as slave. The Name Node Security Enhance (NNSE) provides these two redundant name nodes, and they use the Bull Eye Algorithm.
Apache Knox [26, 27] is a framework for supporting security on Hadoop clusters. It is a REST (Representational State Transfer) API gateway that interacts with clusters through a single access point. Authentication using LDAP [28] and Active Directory is managed by system administrators. Through Knox, they can conduct HTTP header-based federated identity management and audit hardware on clusters.
Apache Ranger [17] is a centralized framework used to manage policies at the resource level; it provides various tools and techniques to standardize security across Hadoop clusters and also provides authorization in Hadoop.
Apache Rhino provides a security solution for the Hadoop ecosystem. It is a framework based on a crypto codec and offers block-level encryption of data in Hadoop. It also provides token-based authentication and an SSO solution. To encrypt data blocks, various key distribution and management functions for MapReduce jobs are also supported, and an audit-logging framework for auditing [29] is provided.

5 Conclusion and Future Scope

In this paper, the various levels of security defined by Cloudera, namely the authentication, authorization, audit and encryption levels, have been discussed. At each level some features are added to provide security, but each level is subject to certain limitations. We also discussed some additional techniques for providing security in Hadoop. The existing features clearly indicate that the security framework of Hadoop is confined to authentication and authorization. There is no mechanism to detect malicious jars or the presence of harmful code in Pig scripts and Hive queries; a legitimate user may also execute a jar that contains harmful code. Hence, a concrete solution to detect harmful code and prevent it from executing on HDFS is needed.

References

1. Tang, Y., Yang, J.: Secure Deduplications of General Computations. Columbia University
2. Geczy, P.: Big data characteristics. Macrotheme Rev. 3(6), 94–104 (2014)
3. Khan, N., Naim, A., Hussain, M.R., Ahmad, N., Qamar, S.: The 51 V’s of big data: survey
technologies characteristics opportunities issues and challenges. In: Proceedings of ACM
Omni-layer Intelligent Systems Conference (COINS’19), ACM, Heraklion, Crete, May (2019)
4. Martis, M., Pai, N.V., Pragathi, R.S., Rakshatha, S., Dixit, S.: Comprehensive survey on hadoop
security. Springer Nature Singapore Pte Ltd, (2019)
5. Horwitz, J., Nugent, A., Halper, F., Kaufman, M.: Big Data for Dummies. Wiley (2013)

6. Akshata, Chandrashekhar, B.S.: An execution for security scheme in hadoop. MAT J. Comput.
Sci. Eng. Softw. Testing 4(2)
7. Das, D., O’Malley, O., Radia, S., Zhang, K.: Adding Security to Apache Hadoop. HortonWorks
(2011)
8. Erraissi, A., Belangour, A., Tragha, A.: A big data hadoop building blocks comparative study. Int. J. Comput. Trends Technol. (IJCTT) 48(1), 336 (2017). ISSN: 2231-2803. http://www.ijcttjournal.org
9. Securosis: Securing Hadoop: Security Recommendations for Hadoop Environments. Securosis.
White paper, Mar 2016 (2014-06-13). Knox Gateway Available: http://knox.apache.org/
10. Bhatal, G.S., Singh, A.: Big data: hadoop framework vulnerabilities, security issues and attacks.
Elsevier
11. Saraladevi, B., Pazhaniraja, N. Paul, P.V., Saleem Basha, M.S, Dhavachelvan. P.: Big data and
Hadoop—a study in security perspective. In: 2nd International Symposium on Big Data and
Cloud Computing (ISBCC’2015). Elsevier
12. Kohl, J., Neuman, C.: The Kerberos network authentication service (V5). (2017)
13. O’Malley, O., Zhang, K., Radia, S., Marti, R., Harrell, C.: Hadoop security design. Yahoo, Inc.,
Tech. Rep (2009)
14. Dou, Z., Khalil, I., Khreishah, A., Al-Fuqaha, A., Robust insider attacks countermeasure for
hadoop: design and implementation. IEEE Syst. J. (2017)
15. Shetty, M.M., Manjaiah, D.H., Hemdan, E.E.D.: Policy-Based access control scheme for
securing hadoop ecosystem. Springer Nature Singapore Pte Ltd. (2019)
16. Narayana, S., Securing hadoop implement robust end-to-end security for your Hadoop
ecosystem. Packt Publishing
17. Gupta, M., Patwa, F., Sandhu, R.: An attribute-based access control model for secure big data
processing in hadoop ecosystem. In: ABAC’18, Mar 21 (2018), Tempe, AZ, USA, 13
18. Spivey, B., Echeverria, J.: Hadoop security: protecting your big data platform. O’Reilly
Publishers (2015)
19. Cloudera Security Report, Cloudera version Cloudera Enterprise version 5.5x (2016)
20. Perwej, Y.: The hadoop security in big data: a technological viewpoint and analysis. Int. J. Sci.
Res. Comput. Sci. Eng. 7(3), 1–14 (2019)
21. Parmar, R.R., Roy, S., Bhattacharyya, D., Bandopadhyay, S.K., Kim, T.-H.: Large-Scale
encryption in the hadoop environment: challenges and solutions. IEEE (2017)
22. Security and Privacy in the Era of Big Data. The SMW, a Technological Solution to the
Challenge of Data Leakage, Arenci/ National Consortium for Data Science White Paper
23. Sharif, A., Cooney, S. Gong, S.: Current security threats and prevention measures relating to
cloud services, hadoop concurrent processing, and big data. In: IEEE International Conference
on Big Data, Washington, DC, USA (2015)
24. Philip Derbekoa, S.D.E.G.S.S.: Security and privacy aspects in MapReduce on clouds: a survey.
Comput. Sci. Rev. 1–28 (2016)
25. Butt, K.K., Li, G., Rehman, M.O.U.: Comparative analysis of hadoop security Ad-Ons. In:
IEEE International Conference on Computational Science and Engineering (CSE) and IEEE
International Conference on Embedded and Ubiquitous Computing (EUC) (2019)
26. Sharma, P.P., Navdeti, C.P.: Securing big data hadoop: a review of security issues, threats and
solution. Int. J. Comput. Sci. Inf. Technol. (IJCSIT) 5(2), 2126–2131 (2014)
27. Vavilapalli, V.K., Murthy, A.C., Douglas, C., Agarwal, S., Konar, M., Evans, R., Graves, R,
Lowe, J., Shah, H., Seth, S., Saha, B., Curino, C., OMalley, O., Radia, S., Reed, B., Balde-
schwieler, E.: Apache hadoop YARN: yet another resource negotiator. SoCC13, Santa Clara,
California, USA, Oct (2013)
28. Priyadharshini, M., Baskaran, R., Srinivasan, M.K., Rodriques, P.: A framework for securing
web services by formulating a collaborative security standard among prevailing WS-* security
standards. Springer CCIS, Springer, Heidelberg, USA, Sep. 2012, Service, vol. 193, pp. 269–
283 (2012). https://doi.org/10.1007/978-3-642-22726-4_29
29. Kim, S.-H., Lee, I.-Y.: Data block management scheme based on secret sharing for HDFS. In:
10th International Conference on Broadband and Wireless Computing, Communication and
Applications (2015)
Optical Character Recognition
and Neural Machine Translation Using
Deep Learning Techniques

K. Chandra Shekar, Maria Anisha Cross, and Vignesh Vasudevan

Abstract Over the years, the applications of text detection and text translation have expanded across various fields. Many researchers have used deep learning algorithms for text detection and text translation separately. We propose a hybrid methodology that combines NMT with OCR to perform text detection and translation from an image with better results. In this paper, we present techniques to detect and recognize Hindi text in a given image and translate it into English, and vice versa. To achieve this, we combine two concepts: optical character recognition (OCR) and neural machine translation (NMT). The output from this hybrid scheme gives an optimized result.

Keywords Optical character recognition · Neural machine translation · Convolutional recurrent neural networks · Long short-term memory · Recurrent attention model · Encoder–Decoder model

1 Introduction

Deep learning, predominantly used in AI and machine learning applications, empowers a system to learn like a human and to improve its capability from training data. Deep learning strategies [1] are capable of learning feature representations using unsupervised or supervised learning, building ever higher and increasingly abstract layers. Deep learning is currently being utilized in image applications, big data analysis, machine translation, and speech recognition.

K. C. Shekar (B)
JNTUH, Hyderabad, Telangana, India
e-mail: chandhra2k7@gmail.com
M. A. Cross
GNITC, Hyderabad, Telangana, India
e-mail: mariaanishacross@gmail.com
V. Vasudevan
NIT, Trichy, Tamil Nadu, India
e-mail: vigyvasu4937@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 277
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_30

Optical character recognition (OCR) is defined as the mechanical or electronic conversion of images of printed, typed, or handwritten text into encoded text that a computer system can process, edit, and store as a text file. The images may include scanned documents, a photograph of a document, a photograph of a scene containing text, legal forms, street signs, vehicle number plates, shipping container numbers, and ID cards. We use this concept to detect and recognize the text in the image in our problem statement [2].
The recurrent attention model (RAM) is based on the idea that, when the human eye is shown a particular scene, certain parts of the picture grab its attention, and the eye obtains information by focusing on those parts first. In the model, the picture is cropped to various sizes around a common center, and glimpse vectors are made from the salient features of each cropped version. The glimpse vectors are then passed to a location network, which uses an RNN to predict the next part of the picture to focus on; this location becomes the next input for the glimpse network. Eventually, the model investigates additional parts of the picture, each time performing backpropagation to check whether the information from past glimpses is adequate to achieve a high level of accuracy. RAM achieves a high level of accuracy by “glimpsing” at the picture from alternate points of view and afterward classifying the output. DRAM uses two RNNs—a location RNN to predict the next glimpse location and a classification RNN dedicated to predicting class labels.
A convolutional recurrent neural network (CRNN) is a combination of two of the most prominent neural networks: a convolutional neural network (CNN) followed by a recurrent neural network (RNN). A CRNN works in layers that break images into segments, recognize relationships among characters, and then deliver the output.
Machine translation is a technique to translate a sentence from one language to another with the assistance of computerized systems, without any human help [3]. Various methodologies are available for developing such systems [1], yet a more robust strategy is required to build a system more proficient than the existing ones. Jiajun Zhang and Chengqing Zong [4] gave an extensive overview of the use of DNNs in machine translation from two perspectives: indirect application, which attempts to improve standard MT systems, and direct application, which adopts DNNs to build a purely neural MT model. A well-trained system drives the framework toward its objective, which is to produce a more effective translation system capable of achieving good accuracy. A statistics-based system can be developed from training data in which the text has been translated into several languages: thousands of possible translations are considered, and the probability of each translation is estimated from the training data under consideration. This is done in three stages:
Step 1: Divide the original sentence into multiple pieces.
Step 2: Locate every possible translation for each piece.
Step 3: Create all possible sentences and locate the most likely one.

Statistical models are a challenge to build and maintain, as it is a great deal of work to develop multiple pipelines and manage the large amounts of training data. To overcome this problem, we use two concepts, RNNs and the encoder–decoder model; by using these two concepts effectively, we can build a self-training translation model.
A recurrent neural network (RNN) is an adaptation of a neural network in which the previous state is fed back as part of the next input. RNNs are designed to capture the sequential nature of information and use the observed patterns to predict the next likely outcome; for example, the next most likely word is predicted by considering the first few words of a sentence.
Three kinds of RNN cells are commonly used: the simple RNN, the LSTM, and the GRU. The simple RNN limited the training of deep RNNs, so the long short-term memory (LSTM) was developed to address this vanishing gradient problem, and the gated recurrent unit (GRU) was built to simplify the LSTM. It has been demonstrated that both the GRU and the LSTM are considerably better than the simple RNN; however, the LSTM is generally better, and LSTM cells reliably beat GRU cells in our tests.
The ultimate objective of our NMT model is to identify the language of the detected input text and translate that text into the desired language, which is returned as output. Specifically, we need an approach to change sentences into a data format that can be given as input to a machine learning model; this is done by converting our textual data into a numeric form using encoders.
The key advantages of this methodology are the capability to deal with variable-length input and output sequences of text and the capability to train a single end-to-end model directly on the source and target sentences.

2 Related Work

Vijaya Kumar Reddy et al. [5] proposed an alternate neural network approach for
recognition of Hindi characters written by hand. G Vamvakas et al. [6] proposed a
total OCR approach that helps the identification of text in historical documents. This
technique can be applied to either hand-written or printed documents. Shashi Pal
Singh et al. [7] found that RNN and RAE provide better outcomes in text processing
when contrasted to other neural networks. Sarkhel, R. et al. [8] proposed a multi-
scale deep quad tree-based component extraction technique for the acknowledgment
of disconnected transcribed characters of famous Indic contents. Shahnawaz et al. [9]
proposed a neural network-based methodology for machine translation from English
to Hindi.
3 Proposed Methodology

Different models and functions have been used in the research reviewed so far, for
text detection and text translation. In this study, we propose a model which is capable
of performing both these tasks by the use of OCR and NMT in the following way:
Step 1: Image preprocessing
• Removal of noise present in the image.
• Removal of the ambient background.
• Handling of the various lighting conditions.

Step 2: Using an LSTM cell as a component of a CRNN to divide the image into columns, identify relationships between characters, and then generate the text (a minimal sketch of this pipeline is given after the list below).
• An established convolutional neural network (CNN)—this is the first layer that
breaks the image into features and divides it into feature columns.
• These columns are supplemented to a deep-bidirectional long short-term memory
(LSTM) cell, which provides a pattern to identify relationships between the
characters.
• The output of the LSTM cell is then given to a transcription layer, which takes the character sequence, including redundant characters, and applies a probabilistic approach to clean up the output.
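
The following is a minimal, hedged sketch of such a CNN-to-bidirectional-LSTM-to-transcription pipeline in PyTorch; the layer sizes, the class count, and the per-column output head are our assumptions for illustration, not the exact architecture used in this work.

import torch
import torch.nn as nn

class CRNNSketch(nn.Module):
    """CNN feature columns -> bidirectional LSTM -> per-column class scores."""
    def __init__(self, num_classes=80):
        super().__init__()
        # Convolutional front end: turns the image into a feature map
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Bidirectional LSTM reads the feature columns left to right
        self.rnn = nn.LSTM(input_size=128 * 8, hidden_size=256,
                           bidirectional=True, batch_first=True)
        # Transcription layer: per-column scores over the character set
        self.fc = nn.Linear(2 * 256, num_classes)

    def forward(self, images):              # images: (N, 1, 32, W)
        feats = self.cnn(images)            # (N, 128, 8, W/4)
        n, c, h, w = feats.shape
        cols = feats.permute(0, 3, 1, 2).reshape(n, w, c * h)  # feature columns
        seq, _ = self.rnn(cols)             # (N, W/4, 512)
        return self.fc(seq)                 # (N, W/4, num_classes)

scores = CRNNSketch()(torch.randn(2, 1, 32, 128))
print(scores.shape)  # (2, 32, 80); a CTC-style decoder would then clean this up
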

Step 3: Using the LSTM cell as a component of an encoder–decoder model in identifying the language of the detected text and translating it.
• This is achieved by transforming each word into a one hot encoding vector, which
is then fed into the model. A one hot encoding vector is merely a vector with “0”
at every index except for “1” at a single index corresponding to this specific word.
Thus, each word has a distinct one hot encoding vector, and in this way, every
word in our dataset can be represented by a numerical index.
• To develop this encoding, we feed the sentence into the RNN, word by word. The final result, obtained after the last word is processed, is a set of values that represents the entire sentence.

Fig. 1 Encoder–decoder model


• As shown in Fig. 1, two RNNs are placed back to back: the first RNN is responsible for generating the encoding that represents the recognized sentence, and the second RNN is responsible for taking that encoding and applying the same logic in reverse to decode the original sentence. We can train the second RNN to decode the sentence into Hindi or any other language by using parallel corpora as the training data (a minimal sketch of this encoder–decoder arrangement is given after this list).
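
Below is a minimal, hedged PyTorch sketch of such a back-to-back encoder–decoder arrangement. Embedding layers are used as the standard shorthand for multiplying a one-hot vector by a weight matrix, and feeding the target sentence to the decoder during training (teacher forcing) is our assumption; vocabulary sizes and layer widths are arbitrary.

import torch
import torch.nn as nn

class Seq2SeqSketch(nn.Module):
    """Back-to-back RNNs: the encoder summarizes the source sentence and
    the decoder unrolls that summary into the target language."""
    def __init__(self, src_vocab, tgt_vocab, emb=64, hid=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hid, batch_first=True)
        self.decoder = nn.LSTM(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encoder: the final hidden/cell state represents the whole sentence
        _, state = self.encoder(self.src_emb(src_ids))
        # Decoder: starts from that state and predicts target words step by step
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)             # scores over the target vocabulary

model = Seq2SeqSketch(src_vocab=5000, tgt_vocab=6000)
logits = model(torch.randint(0, 5000, (2, 9)), torch.randint(0, 6000, (2, 11)))
print(logits.shape)  # (2, 11, 6000)
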

4 Results and Discussions

For our experiments, we used the Devanagri character dataset and street view text
dataset to train our model to locate and recognize texts in Hindi and English.

Street View Text Dataset Dealing with images that involve ambient noise, lighting issues, and image artifacts is a highly demanding and arduous OCR task. Legacy OCR algorithms normally cannot process the images in this dataset. A sample image from this dataset is shown in Fig. 2. This dataset has only word-level annotations (no character bounding boxes) and can only be used for the
• recognition of cropped lexicon-driven words,
• detection and recognition of full-image lexicon-driven words.

Devanagri Character Dataset This dataset contains 1800 samples of 36 character classes obtained from 25 different writers in the Devanagri script. A distinct file is used to store each character, and all these files are comma-separated text-based values. Each character is estimated at around 4 KB. The organized datasets that mirror the 36 classes are stored in folders, with 50 samples inside each class folder. A pattern of coordinates (pen-tip positions) from pen-down to pen-up movement is considered as one stroke, as shown in Fig. 3. The pattern of strokes made in a pen movement is captured by the digitizer.

Fig. 2 Street view text data


Fig. 3 Devanagri character data

Table 1 Statistics of IIT Bombay English Hindi dataset

Language            Train          Test      Dev
#Sentences          1,492,827      2507      520
#Tokens     eng     20,667,259     57,803    10,656
            hin     22,171,543     63,853    10,174
#Types      eng     250,782        8957      2569
            hin     343            8489      2625

Fig. 4 Final translation

IIT Bombay English Hindi Parallel Corpus We have used the IIT Bombay English Hindi Parallel Corpus [10] to train our model to translate the detected and recognized text from Hindi to English and vice versa. The approach proposed in this study generates enhanced and optimized features by using the LSTM within both the CRNN and the encoder–decoder model. The statistics corresponding to the number of sentences, tokens, and types for the different data splits are given in Table 1.

The model was then able to detect the text and translate it, as shown in Fig. 4.

5 Conclusion

In this paper, we proposed an alternate neural network framework for detecting and recognizing Hindi text in an image and translating it into English, and vice versa. We utilized the LSTM as a part of both the CRNN and the encoder–decoder model to build a system that performs OCR and NMT together. This approach was trained and tested on a standard user-defined dataset, which was collected from different users.
Our future work will concentrate on advancing and optimizing the current recogni-
tion and translation results by implementing new approaches for integrating OCR and
NMT more efficiently. We would also like to move toward hybrid generic intelligent
systems to improve recognition and translation accuracy further.
References

1. Cheragui, M.A.: Theoretical overview of machine translation. African University, Adrar, Algeria, ICWIT (2012)
2. Ciregan, D., Meier, U., Schmidhuber, J.: Multi-column deep neural networks for image classi-
fication. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), Providence, RI, USA, 16–21 June 2012
3. Hutchins, W.J.: Machine Translation: Past, Present, Future (Ellis Horwood Series in Computers and their Applications). Ellis Horwood, Chichester (1986). 382 p. ISBN 0-85312-788-3
4. Zhang, J., Zong, C.: Deep neural network in machine translation. In: Institute of Automation,
Chinese Academy of Sciences, IEEE International Conference on Computer, Communications
and Electronics (2017)
5. Reddy, R.V.K., Babu, U.R.: Handwritten Hindi character recognition using deep learning techniques. Int. J. Comput. Sci. Eng. (2019)
6. Vamvakas, G., Gatos, B., Stamatopoulos, N., Perantonis, S.J.: A complete optical char-
acter recognition methodology for historical documents. In: IEEE Computational Intelligence
Laboratory, Institute of Informatics and Telecommunications, National Center for Scientific
Research “Demokritos”, GR-153 10 Agia Paraskevi, Athens, Greece, 2008
7. Singh, S.P., Kumar, A., Darbari, H., Singh H., Rastogi, A., Jain, S.: AAI, center for development
of advanced computing, Pune, India. In: Conference: International Conference on Computer,
Communications, and Electronics (Comptelix), 2017
8. Sarkhel, R., Das, N., Das, A., Kundu, M. Nasipuri, M.: A multi-scale deep quad tree-based
feature extraction method for the recognition of isolated handwritten characters of popular
indic scripts. Pattern Recogn. (2017)
9. Shahnawaz, Mishra, R.B.: A neural network-based approach for English to Hindi machine translation. Int. J. Comput. Appl. 53 (2012)
10. Kunchukuttan A, Mehta P, Bhattacharyya P.: The IIT Bombay english-hindi parallel corpus.
In: Language Resources and Evaluation Conference (2018)
COVID-19 Touch Project Using Deep
Learning and Computer Vision

Chatla Venkat Rohit and GRS Murthy

Abstract The world is suffering from the COVID pandemic. Out of empathy, we want to serve the planet in whatever way we can. We came up with the idea of using technology to maintain physical distancing and to track the sanitizing of each person at public places, including ATMs and supermarkets. As monitoring each person manually is very difficult, we have used detection models to solve the problem. Our model can be deployed easily and can be integrated into a network of existing CC cameras so that it is best used by society. In our model, we achieved an average close to 60 FPS with a maximum of 92 FPS as a trade-off between speed and accuracy, and our project detects a hand touching a door/chair or any object around, along with those details, with an accuracy score of 96%.

Keywords Computer vision · Cloud functions · YOLOv3 · Darknet · Pub/sub pipeline · Transfer learning

1 Introduction

The COVID-19 pandemic has spread like wildfire all over the world due to unknowing physical contact between persons. Keeping in mind that public places are visited by people on a daily basis, we came up with a model which tracks the contacts made by hands with different objects in the vicinity of the camera and alerts the user whenever a hand is brought close to the face without washing/sanitizing (action). In our work, we considered ATMs and shopping markets as the places that need this model for regulation of the COVID-19 spread. Hand movements and the sanitizing of hands can be detected. Every detail of touching, timestamps, and skeleton information is stored in a database, and
C. V. Rohit (B)
Department of School of Computing, Sastra University, Thanjavur, Tamil Nadu 613401, India
e-mail: rohitchatla@gmail.com
G. Murthy
Department of Computer Science and Engineering, Avanthi Institute of Engineering and
Technology, Vizianagaram, Andhra Pradesh 531162, India
e-mail: murthy.grs@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 285
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_31
using cloud functions, we trigger notifications to citizens whenever they touch their face without washing/sanitizing their hands. Our model also senses whether a person is wearing a mask and gloves or not, to give access to public places. Continuous camera monitoring can be achieved by installing the model at places like ATMs and shopping malls, where doors are touched frequently, and supermarkets, where people come in contact with different objects (fruits, vegetables, etc.). By integrating our model with existing CC cameras, officials now only need to sanitize the touched areas instead of sanitizing the whole market, preventing the adverse effects of over-cleansing (cost and time). This will also help officials to track the travel and engagement history of COVID-19-positive patients.

2 Related Works

There are many projects that adopt YOLOv3 for different purposes, but as far as tracking/detecting for COVID cases and alerting clients using the detection of human movements and actions is concerned, none is available. Recently, India launched its main contact tracing technology, the Arogya Setu app [1], which works on the principle of tracking nearby clients and alerting them if a COVID-positive patient is reported in the client's vicinity. However, this will be fruitful to the fullest only when all clients keep their Bluetooth and GPS turned on continuously, thus allowing uninterrupted tracking. Our model, in contrast, can be integrated into shopping malls and ATMs seamlessly with a simple software plug-in to existing CC cameras, without the installation of any sensors. Arogya Setu also has some privacy issues, such as anyone being able to change the internal database data snaps, etc., which we hope will be solved.

3 Methodology

If we are able to monitor people who make contact with different objects around them, then we can prevent such diseases from spreading. Our project uses YOLOv3 for object detection to recognize the touches made by people near ATMs, a place that we visit frequently (Fig. 1).

3.1 Working of YOLOv3

The YOLO model is fast and accurate enough to recognize different objects and uses a regression mechanism. The model recognizes the images at runtime in each frame and updates the table when required. It uses a convolutional neural network (CNN) for object detection in real time [2]. Each frame is divided into regions
Fig. 1 Architecture diagram of corona touch project

with bounding boxes and their associated probabilities. The boxes are then weighted accordingly to distinguish between different objects [3].

3.2 Preprocessing Steps and Training

We preprocessed images of the human hand, door, and common objects in context (COCO). We used the Open Images Dataset V6 [4] and the OIDv4_Toolkit to download all the images, and we used a custom script to convert the Protobuf-formatted labels to the YOLOv3 coordinate format [5, 6]. We used the darknet.conv.53 pre-trained weights and classes to train YOLOv3. We trained on some objects (chair, door, human hand) and used transfer learning for the common objects in context (COCO objects). As we are going to use this setup for a custom model, we needed to write Python scripts that act as a wrapper around the original darknet repository, and those scripts are used to detect and track (deep sort) human hands touching the objects. To make the dataset serve our needs, we cropped out the unnecessary portions of the background and focused on the required foreground objects.
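
As a hedged illustration of how such a custom-trained YOLOv3 model can be driven from a Python wrapper (the file names below are placeholders, not the exact scripts or weights used in this project), OpenCV's DNN module can load the darknet config and weights directly:

import cv2
import numpy as np

# Placeholder paths: the custom-trained config/weights and class names
net = cv2.dnn.readNetFromDarknet("yolov3_custom.cfg", "yolov3_custom.weights")
with open("custom.names") as f:
    classes = [line.strip() for line in f]

frame = cv2.imread("frame.jpg")
h, w = frame.shape[:2]

# YOLOv3 expects a 416x416, scaled, RGB blob
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())

# Keep detections above a confidence threshold
for out in outputs:
    for det in out:
        scores = det[5:]
        class_id = int(np.argmax(scores))
        conf = float(scores[class_id])
        if conf > 0.5:
            cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
            x, y = int(cx - bw / 2), int(cy - bh / 2)
            print(classes[class_id], conf, (x, y, int(bw), int(bh)))
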

3.3 Darknet

It is an open-source neural network framework written in C and CUDA [7]. It can be installed along with two dependencies: (1) OpenCV and (2) CUDA. The computation power can be increased
Fig. 2 Overlapping model for tracking

by shifting the processing from the CPU to the GPU using darknet. OpenCV along with darknet gives the model more freedom to detect different images and videos. Darknet's flexibility makes it a favorite framework to use.

3.4 OpenCV

In our project, we used darknet/YOLOv3 for custom training, and we needed to build OpenCV as a binder; it has to be built from source, since darknet/YOLOv3 is written in C. After custom training, we used a custom Python wrapper function for tracking (deep sort) and detecting objects. We used OpenCV for image/video manipulations such as opening the stream, and the contours (rectangular boxes) are also drawn using OpenCV [8]. OpenCV is a computer vision library that has 2500+ optimized algorithms for computer vision and machine learning [9]. These algorithms can track camera coordinates or movements, extract 3D and 2D models from images, and manipulate camera, video, and image streams [10]. We used box overlapping methods to implement the touch and hand-coming-to-face features of our project; for example, if you take the red box as the door (Fig. 2) and the white box as the hands, then whenever the white box touches/overlaps/comes inside the red box in the output, it means the hand is touching the door [6].
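
A minimal sketch of the box-overlap test described above (the function and the example coordinates are ours): two axis-aligned boxes given as (x, y, width, height) are treated as a touch when their intersection is non-empty.

def boxes_overlap(box_a, box_b):
    """Return True if two (x, y, w, h) boxes intersect, e.g. hand box vs door box."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width/height of the intersection rectangle (non-positive means no overlap)
    inter_w = min(ax + aw, bx + bw) - max(ax, bx)
    inter_h = min(ay + ah, by + bh) - max(ay, by)
    return inter_w > 0 and inter_h > 0

door_box = (120, 80, 200, 300)   # red box in Fig. 2 (illustrative values)
hand_box = (180, 150, 60, 60)    # white box in Fig. 2 (illustrative values)
if boxes_overlap(door_box, hand_box):
    print("Touch detected: hand overlaps the door")
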

3.5 Posenet

We used Posenet for detecting the hand wash/sanitization action in our project. Our function returns pose info (left/right wrist points, etc.), skeleton info (elbow and arm lines, etc.), key points, and confidence and accuracy scores for all properties. Posenet is a robust, real-time monocular six degree-of-freedom relocalization system: a convolutional neural network trained to regress the 6-DOF camera pose from a single RGB image in an end-to-end manner, with no need for additional engineering or graph optimization [11]. Posenet's pose and skeleton information helped us build a model for custom washing/sanitizing actions with custom coordinate calibrations (Fig. 3).

Fig. 3 Detecting wash/sanitize action

3.6 Tensorflow.js

We used Tensorflow.js for the hand and face detection, which in turn is used for the hand-brought-closer-to-face action (Fig. 4).
We used deep sort for frame-by-frame tracking, giving different IDs to similar objects, which enables every object to be tracked throughout the duration of a video feed once it has been detected by the object detector in the first frame; if the detection of its presence is lost, the tracking is stopped for a static (fixed camera position) video feed. To reach our goal, we tested a few models, gathered meta-information on which one outperforms the others, and collected the results. We mainly used YOLOv3 and the deep sort algorithm [2], with the customization of Posenet and OpenCV mentioned above, for our use case, as shown in the section below.

Fig. 4 Overlapping model for hand-to-face detection

4 Scenarios for Use Case and Working Applications

4.1 Introduction to in-Home-Based Model

What this model basically does: whenever you touch any item in your house, say, for example, a chair, with your hand, the model keeps tracking this through the CC camera context (24 × 7) [6], and that info (metadata) is sent to the database. Along with the metadata, the database holds three important attributes: (1) Touch, (2) Wash, and (3) Detect. All are preset to 0 and are binary-valued classes (logic high referred to by 1).
How Cloud functions and pub/sub are used:
Whenever a touch of a human hand on the chair/door is detected (Fig. 5), a cloud function is activated to change the Touch variable to 1 and store the metadata (object coordinates, timestamp, contours, etc.) in the database. Whenever you wash your hands (Fig. 6), a cloud function, using the Posenet/ml5.js pose and skeleton info, changes the Wash value to 1, stores the wash metadata in the database [12], and changes Touch to 0; otherwise Touch remains 1 and Wash remains 0 (Fig. 7). Whenever a hand coming toward the face is detected (Fig. 8), a cloud function is used to change the Detect value to

Fig. 5 Hand touching chair tracked; (1) Touch == 1: notify chair coordinates and other details

Fig. 6 Posenet's coordinates, poses, and skeleton info; (2) Wash == 1: wash details
Fig. 7 Wash action detected using Posenet/ml5.js

Fig. 8 Hand detected while being brought closer to the face, using the webcam on a laptop, and notified for the same; (3) touch == '1' && detect == '1'

1; when it is no longer detected, the value is changed back to 0 in real time. Whether to buzz the alarm or not is then decided by simply checking if both the Touch and Detect attributes are 1; only then are the alert and mobile notifications fired (or, of course, at other stages too, depending on the attribute states (high/low)) [6].
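
The attribute logic described above can be summarized in a short sketch; a plain Python function stands in here for the actual cloud functions and database, and the event names are illustrative only.

# Binary state kept in the database: all attributes preset to 0
state = {"touch": 0, "wash": 0, "detect": 0}

def on_event(event, metadata, db=state):
    """Toy stand-in for the cloud functions triggered by the detectors."""
    if event == "touch":              # hand box overlapped the chair/door box
        db["touch"] = 1               # plus store coordinates, timestamp, contours...
    elif event == "wash":             # Posenet/ml5.js wash action detected
        db["wash"], db["touch"] = 1, 0
    elif event == "hand_to_face":
        db["detect"] = 1
    elif event == "hand_away":
        db["detect"] = 0
    # Alert only when an unwashed touch meets a hand-to-face detection
    if db["touch"] == 1 and db["detect"] == 1:
        print("ALERT: notify user", metadata)

on_event("touch", {"object": "chair", "t": "12:01:05"})
on_event("hand_to_face", {"t": "12:01:20"})   # -> ALERT
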

4.2 ATM/Supermarket Scenario

In the real-world scenario, we can deploy our (in-home) model by adjusting it to particular needs. First, in ATMs, we can use a session scheme for a person covering a complete transaction cycle, from entering to leaving the ATM; this session is used for the authorization/identity (Fig. 9) of the person, so that his touches are stored in his table of the database, instead of relying on complex and less accurate face detection. The particular person's actions and movements touching the door (Fig. 10) and the currency are then recorded, and if he/she sanitized their hands before entering the ATM, the display board in front of the ATM will show a safe sign; otherwise it will show the touch history (touch details) of others (Fig. 11) who did not follow the rules (without revealing their identity). This mechanism can also be extended to personal hygiene by adding additional rules for sanitizing after touching the door, currency, etc., which will
Fig. 9 ATM card authenticator (outside view)

Fig. 10 Person not touching the door

Fig. 11 Person touching the door tracked

not cause an alarm to buzz; otherwise, as in the in-home scenario while working in front of a PC/laptop, whenever a person brings his hands close to his face, the notifications and alarm will go off. The same goes for supermarkets: we can use the existing CC cameras to monitor the touches of people and send that information on the fly to the inventory department, so that those areas can be sanitized during non-working hours. Using session schemes, the touch information of individuals can be linked to particular customers.
Fig. 12 Graphical representation of (i) accuracy score (dynamic stream), (ii) GPU (training) time, (iii) memory used, and (iv) speed (FPS) of the mask_R-CNN, SSD, R-CNN, and YOLOv3 algorithms, respectively

5 Results

5.1 Comparisons of YOLOv3 Versus SSD Versus R-CNN Versus Mask-R-CNN | Verdict | Test Images

See Fig. 12.

5.2 Observations of the Experimental Analysis

All these graphs (Fig. 12) are plotted by taking the average over 500 images/a 2-min-long video used as validation/testing data, with an 80/20 split into training and testing datasets. Observing accuracy (average, dynamic stream), YOLOv3 is ahead; R-CNN is ahead in accuracy (average, static stream); and mask_R-CNN tends to detect noise. As can be seen from the FPS graph, YOLOv3 is powerful for dynamic motion. Since YOLOv3 is a sparse algorithm (macro items), it further fits our need for dynamic motion, so we opted for YOLOv3 for its faster predictions (once we detect, that is sufficient for the next 10 s). The average FPS is far less than the max FPS shown in
Fig. 13 Testing images of (i) mask_R-CNN, (ii) SSD, (iii) R-CNN, and (iv) YOLOv3

the graph, but YOLOv3 with 40 is still a good average score. Although the GPU time and memory usage are higher for YOLOv3 during training, training is usually a one-time process, so this drawback is not a potential problem in our case. As we are using macro-particles (Fig. 13), the resolution is decreased using a cloud function trigger (on the cloud) before being sent to the algorithm, so the FPS is increased and our average is bumped up to close to 60 with a max of 92 FPS. Thus, with a trade-off between speed and accuracy, we can achieve our project goal of detecting a hand touching a door/chair or any object around, with those details, while the accuracy also remains good, close to 96%; the graphs are evidence of this. This also reduces the computation on the GPU (cost-effective on GPU/CPU), as the resolution is diminished by a scale of 10 [6].
We used the following configuration for training on the dataset—GPU: 1x Tesla K80, compute capability 3.7, with 2496 CUDA cores and 12 GB of GDDR5 VRAM.

6 Conclusion and Future Works

Finally, based on the experimental analysis and best-fit conditions, we found YOLOv3 with a pre-trained model and custom training weights (transfer learning) to be the best fit compared to other algorithms such as mask-R-CNN, SSD, and R-CNN. YOLOv3's speed and manageable accuracy are best suited for our goal, and even after compressing the image stream and fine-adjusting it to our requirements, the accuracy is still kept on par for our project. By using this model, we can bring a small change to society by helping people keep track of their touches. In the future, it can be used for the detection of social distance management by using drones/traffic CC cameras, analyzing whether two persons are at least the minimum distance apart. It can also detect whether a person is wearing a mask/gloves or not, to further allow them into public places by keeping mask-authentication barriers at public places (airports, crowded areas, etc.). In total, the scenarios used in the project are meant to give an idea of how to deploy the same in different cases, with fine adjustments, in real-world applications.
References

1. Arogya Setu App: https://www.mygov.in/aarogya-setu-app/


2. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. (2018), arXiv:1804.02767v1
3. Tian, Y., Yang, G., Wang, Z., Wang, H., Li, E., Liang, Z.: Apple detection during different
growth stages in orchards using the improved YOLO-V3 model. Comput. Electron. Agric.
157, 417–426 (2019). https://doi.org/10.1016/j.compag.2019.01.012
4. Dataset: https://storage.googleapis.com/openimages/web/visualizer/index.html?set=train&
type=detection&c=%2Fm%2F0k65p
5. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: IEEE Conference on Computer
Vision and Pattern Recognition (CVPR). 10.1109/CVPR.2017.690
6. Covid App Extra files/pics: http://www.cvrrocket.ga/projects/touch_app
7. Darknet repository. https://github.com/AlexeyAB/darknet
8. He, K., Zhang, X., Ren, S. et al.: Deep residual learning for image recognition. In: Computer
vision and pattern recognition (cs.CV). (2015b), arXiv:1512.03385
9. https://www.pyimagesearch.com/2018/11/12/yolo-object-detection-with-opencv
10. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer
Vision, pp. 1440–1448 (2015)
11. Kendall, A., Grimes, M., Cipolla, R.: PoseNet: a convolutional network for real-time 6-DOF
camera, (2015). arXiv:1505.07427
12. https://youtu.be/iwSNO8O-xvY (Project Video (Demo in RealTime))
Flood Relief Rover for Air and Land
Deployment (FRRALD)

Jewel Moncy John, Justin Eapen, Jeffin John, Ebin Joseph, and Abraham K Thomas

Abstract A drone and rover integrated setup is used for the rapid recovery of people affected by a natural disaster. The rover has an onboard camera, and the visuals are relayed to a remote operator for control. The rover moves around on its four wheels, and the operator manually moves both the rover and the drone using the system cameras. The rover also has an inbuilt GPS and PIR sensor for giving the exact location and for helping in the accurate detection of victims. Using a face recognition algorithm, the user can identify the victim. This face recognition is done using a database that contains a list of disaster-prone people. The medical staff can also collect the patient's medical information using the facial recognition algorithm for rapid medical support and recovery. A fleet of these systems can be deployed so that search and rescue can be done efficiently and lives can be saved.

Keywords Disaster · Drone · Rover · Mapping · Rescue · Detection · Medical

J. M. John (B) · J. John · E. Joseph · A. K. Thomas


Department of ECE, Saintgits College of Engineering, Kottayam, Kerala, India
e-mail: jewelmj25@gmail.com
J. John
e-mail: jeffin.jz1620@saintgits.org
E. Joseph
e-mail: ebin.joseph1620@saintgits.org
A. K. Thomas
e-mail: abraham.k@saintgits.org
J. Eapen
Faculty, Department of ECE, Saintgits College of Engineering, Kottayam, Kerala, India
e-mail: justin.eapen1620@saintgits.org

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 297
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_32
1 Introduction

In catastrophic circumstances such as floods, tornadoes, or hurricanes, one of the main obstacles facing rescue and recovery teams is to find and identify survivors and casualties at the earliest opportunity. In these situations, however, rescue teams are unable to detect the actual state of life under the rubble, which eventually leads to disaster. In fact, accidents can have such a destructive impact on the body that it becomes much more difficult to differentiate between materials like mud and the person itself. This may lead to a large number of people losing their lives, an uncontrollable situation. The rescue teams find it difficult to save people, as they are unable to locate them quickly and then provide them with emergency medical care. The availability of and demand for drones have risen considerably in recent years due to technical breakthroughs that have provided them with more sophisticated technologies such as multi-functional sensors, position trackers, and built-in cameras. Although commonplace in industrial use, aerial drones [1] are also utilized in surveillance, army operations, and disaster relief investigations. Their flexibility and compactness make it easier to perform activities which might be potentially dangerous to humans.
A literature survey was conducted taking into account the floods that occurred in the state of Kerala, India, in 2018 and 2019. They occurred in the months of July and August, when the rainfall was above the normal average by about 120%. The dams became full; as a result, the government opened all the dams, resulting in a severe flood. More than 700 people died, and it cost the state more than USD 5.6 billion. The rescue teams found it really difficult to evaluate the condition of the disaster that had taken place. It took a lot of time to find the victims trapped under debris during the landslides that accompanied the flood. The rescue teams used helicopters for rescue and to evaluate the situation, which proved to be really inefficient and expensive.
The idea is to create a rover which is both waterproof and suitable for all terrains. The rover is integrated onto a drone. The drone can fly and find any people or victims with its onboard camera. The drone and rover setup is controlled remotely by an operator. Moreover, there is an onboard PIR sensor attached to the rover, which can help the rover detect people stuck under debris in hard-to-reach areas. The rover is deployed when possible victims are found, and it can reach hard-to-access sites like under debris, in tunnels, or inside houses where the drone cannot fly. Critical data such as the position of the rover, which gives the location of the victim, is conveyed to the remote operator, who then passes the location on to the rescue team. The rescue team can reach the location to provide medical help. Simultaneously, the remote operator can identify the person using face recognition and retrieve the victim's medical history, such as blood group, allergies toward any medicine, etc. This data can be helpful for the medical team.
2 Drone Design

The drone provides the lift to reach the affected area quickly [2]. The drone is
integrated to the rover. The parts of the drone include the following [3].
Frame: The frame provides the support for the essential controls and motors of the quadcopter. The design should be light, should be compatible with the rover design, and should hold and carry the rover with ease. It also provides support for the four motors that provide lift to the design. The design should be collision free, and there should be adequate separation between the motor blades. For the frame, a Hobbyking SK450 frame is used. The frame should provide enough holes for screwing other parts onto it.
Brushless Motors: The required lift and thrust for the drone is provided by the four brushless motors fixed on the four arms of the quad frame. Brushless DC motors are used instead of brushed DC motors, as they provide a greater thrust-to-weight ratio. The motors are controlled with the help of the ESC and other sensors so that the correct rpm is maintained and the motion of the motors can be controlled for the 6 DOF. The KV rating indicates the speed of the motor when 1 V is applied; this motor is rated at 1000 KV. The 12 A/60 s rating indicates the maximum current that can be drawn. The weight of the motor is approximately 275 g.
Propellers: A carbon fiber propeller of length 12 in., pitch 4.5 in., weight 36.5 g, and shaft diameter 6 mm is used. Four propellers are required for the four quadcopter motors to provide the thrust. The propellers need to be light, be balanced to reduce vibrations, and should not overheat the motors.
ESC: Each motor needs its own electronic speed controller (ESC). The ESC accepts commands in the form of a pulse-width-modulated control signal and outputs the corresponding motor speed. The current rating of the ESC indicates the maximum current it can deliver without causing the motor to overheat. The ESCs provide power to the motors so that the required rpm is maintained to keep the drone stable. A 25 A four-in-one controller can be used to deliver power to the motors.
Transmitter and Receiver: The radio transmitter is what the pilot or remote operator controls. The transmitter sends out signals, mostly in the 2.4 GHz range, to the receiver. The receiver processes these signals and sends them to the flight control unit. An increased power rating can increase the range, as the operator in this case works with the help of cameras and not line-of-sight communication. A drone army or swarm of drones can act as repeaters for extending the range. The transmitter can be used to control all six DOF.
LIPO Battery: Lithium polymer batteries are used to provide high torque to the motors of the drone. Since LIPO batteries have a very high discharge capacity, they are ideal for placement in the drone, unlike Li-ion batteries, which have a lower discharge capacity. They also have a high energy density compared to the weight of the battery itself. A cell typically has an output voltage of 3.7 V.
Attitude Sensor: The orientation and the attitude are controlled by the attitude
sensors. Basically, they contain sensors like gyroscope and accelerometer for control.
A six-axis inertial measurement unit (IMU) is used, consisting of an accelerometer
Fig. 1 Inertial frame of a free body

and gyroscope on the same unit. Without the attitude sensors, the drone can simply tip over and cannot fly. The drone must be able to control all six degrees of freedom with the least number of motors possible. The six degrees of freedom that we are using involve movement along and rotation about each of the x, y, and z axes, as shown in Fig. 1. All aspects of flight can be manipulated by applying different thrusts to the different motors individually. When all the motors have the same rpm, the drone lifts upwards. To move in a particular direction, for example to the left, the rpm of the two motors on the left can be reduced while increasing the rpm of those on the other side. The six DOF are translation along X, Y, and Z, and the rotational DOF include roll, pitch, and yaw.
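
One common way to express this motor mixing for an X-configuration quadcopter is sketched below; the sign convention depends on the motor layout and propeller spin directions, so this is an illustration rather than the exact mixer used on this drone.

def mix_motors(throttle, roll, pitch, yaw):
    """Toy X-configuration mixer: returns the four motor commands.

    Throttle lifts all motors equally; roll/pitch/yaw are differential
    corrections from the flight controller (signs depend on the frame
    layout and propeller spin directions).
    """
    m_front_left  = throttle + roll + pitch - yaw
    m_front_right = throttle - roll + pitch + yaw
    m_rear_left   = throttle + roll - pitch + yaw
    m_rear_right  = throttle - roll - pitch - yaw
    # Clamp to the ESC command range (e.g. 0..1)
    return [max(0.0, min(1.0, m)) for m in
            (m_front_left, m_front_right, m_rear_left, m_rear_right)]

print(mix_motors(throttle=0.6, roll=0.0, pitch=0.0, yaw=0.0))    # hover: all equal
print(mix_motors(throttle=0.6, roll=-0.05, pitch=0.0, yaw=0.0))  # bank to one side
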

3 Rover Fabrication

The design of the rover was planned to be done using 3D printing. Unfortunately, the overall design proved to be fragile when 3D printed, so laser cutting was used instead. The design was made using SolidWorks. As can be seen from the side view in Fig. 2a, a belt drive was also designed, but during the fabrication stage it was difficult to laser cut. Figure 2b shows the 3D view of the rover. In order to laser cut the model, each frame was reduced from 3D to 2D, that is, an outline was made in 2D and cut using the laser cutter. An acrylic sheet of 5 mm thickness for the structure and 8 mm thickness for the wheels was used. However, the wheels were found to be less strong, so the thickness of both front wheels was increased to 10 mm. To make the design lighter, other materials such as carbon fiber or plastic can be used instead of acrylic.

4 Rover Motion and Control

The rover motion is provided by an Arduino. A dedicated microcontroller is used basically to increase efficiency: the efficiency of communication between the Raspberry Pi and its subsystems can be improved by making use of an independent microcontroller. An Arduino Uno is programmed and used as the microcontroller to
Fig. 2 SolidWorks model a side view, b 3D view

control the system. It is connected to an HC-05 Bluetooth module for communication. Four DC motors, each of 500 rpm with a 12 V (DC) input, are used for rover movement. The rover is powered by a 12 V DC LIPO rechargeable battery. Since the voltage generated by an Arduino is 5 V (DC) and the motors work at 12 V (DC), we use a motor driver, an L298 H-bridge motor controller board, which powers all four motors at 12 V (DC) and also powers the Arduino itself using its 5 V (DC) output. The circuit diagram in Fig. 3 shows the connections with a single motor driver. The motor has two polarities, and the peculiar feature of a DC motor is that when the polarity is changed, the direction of the motor is reversed. So, let us name the front two motors M1A and M2A and, similarly, the back two motors M1B and M2B. For example, to get a
Fig. 3 Circuit diagram of motor control

forward motion, M1A can be set to high and M1B to low. The direction is reversed when M1A is low and M1B is high. An HC-05 Bluetooth module is used, with TX connected to RX and RX to TX between the Bluetooth module and the Arduino.

5 Mapping Using GPS

The GPS module is used to get the position of the victim on Google Maps [4]. The GPS receiver uses three satellites, which send it the longitude, latitude, and time. The GPS module is connected to the Raspberry Pi [5]. It consists of four pins—VCC, GND, TX, and RX. The TX is connected to the RX and the RX to the TX between the Raspberry Pi and the GPS module. The VCC and GND pins are connected to the respective pins on the Raspberry Pi. The data is received in the form of raw NMEA sentences and is converted to coordinates on the Raspberry Pi. The converted coordinates are sent to an online server; in this case, we use ThingSpeak, since it is free. The latitude [6], as in Fig. 4a, and longitude, as in Fig. 4b, are uploaded to the server using its upload keys. The server plots a graph of the data received, as shown in Fig. 4a, b. With the help of Google API keys under the developer options, we can plot the coordinates of the victim's location on Google Maps, as shown in Fig. 5.
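
A hedged sketch of this pipeline on the Raspberry Pi is given below; the serial port, the ThingSpeak field numbers, and the API key are placeholders/assumptions, while the update endpoint is ThingSpeak's standard REST API.

import serial            # pyserial
import pynmea2           # NMEA sentence parser
import requests

THINGSPEAK_KEY = "YOUR_WRITE_API_KEY"   # placeholder write key
PORT = "/dev/ttyS0"                     # assumed UART the GPS module is wired to

def read_fix(port=PORT):
    """Block until a GGA sentence with a valid fix arrives, return (lat, lon)."""
    with serial.Serial(port, 9600, timeout=1) as ser:
        while True:
            line = ser.readline().decode("ascii", errors="ignore").strip()
            if line.startswith("$GPGGA"):
                msg = pynmea2.parse(line)
                if int(msg.gps_qual or 0) > 0:
                    return msg.latitude, msg.longitude

def push_location(lat, lon):
    """Upload the victim location; field1/field2 feed the latitude/longitude charts."""
    requests.get("https://api.thingspeak.com/update",
                 params={"api_key": THINGSPEAK_KEY,
                         "field1": lat, "field2": lon},
                 timeout=10)

if __name__ == "__main__":
    lat, lon = read_fix()
    push_location(lat, lon)
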
Fig. 4 ThingSpeak data a latitude, b longitude

Fig. 5 Location on Google Maps

6 Human Detection Using PIR

The passive infrared (PIR) sensor detects people at a range. The human body emits radiation of 0.7–300 µm. The PIR has two slots, each made of a material which is sensitive to infrared (IR) radiation. When the sensor is idle, both slots receive the same amount of radiation, which can be the radiation from the room, walls, etc. A positive differential change is created when a human body passes in front of the first half of the sensor. When the human body leaves, a negative differential change between the two halves is detected. These pulse changes are detected and communicated to the Raspberry Pi.
The PIR module has three pins—VCC, GND, and control. The PIR module detects people up to about 8 m away, allowing the quick detection of humans [2]. The remote operator can then detect people with ease and can also know the location of the victims with the help of the GPS module. A high signal is given out from the PIR sensor to the microcontroller on detection of a human body. Figure 6 shows the influence of a human or animal body on the PIR sensor. The PIR module is connected to the Raspberry Pi, and the data received by the Pi is uploaded in the same way as for the GPS module to the online ThingSpeak server. If a human is detected, the operator gets a notification. Figure 7 shows the data received on the ThingSpeak server: if the system shows "2", that indicates the presence of a human; on the other hand, if it shows "1", then the system
Fig. 6 Influence of PIR on humans

Fig. 7 PIR data in ThingSpeak

conveys the absence of a human being. However, this system works with continuous
integration with the camera module for accurate results.
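
A similar hedged sketch for the PIR side is shown below; the GPIO pin and the ThingSpeak field number are assumptions, and the 2/1 encoding follows Fig. 7.

import time
import requests
import RPi.GPIO as GPIO

PIR_PIN = 17                              # assumed BCM pin the PIR output is wired to
THINGSPEAK_KEY = "YOUR_WRITE_API_KEY"     # placeholder write key

GPIO.setmode(GPIO.BCM)
GPIO.setup(PIR_PIN, GPIO.IN)

try:
    while True:
        # Encode the reading the way Fig. 7 shows: 2 = human present, 1 = absent
        value = 2 if GPIO.input(PIR_PIN) else 1
        requests.get("https://api.thingspeak.com/update",
                     params={"api_key": THINGSPEAK_KEY, "field3": value},
                     timeout=10)
        time.sleep(15)                    # respect the ThingSpeak free-tier rate limit
finally:
    GPIO.cleanup()
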

7 Camera and Face Recognition

The Pi cam is connected to a Raspberry Pi. The camera gives about 5-megapixel clarity. Instead of the Raspbian OS, we have to install MotionEye OS, an OS used for surveillance with the Raspberry Pi. The data received from the Raspberry Pi is sent to the operator via the Internet [7]. The operator can get the visuals by entering the Raspberry Pi's IP address. This IP address is integrated into MATLAB for visualization. We use a database to store the relevant pictures of all the people; in the case of an emergency, this dataset can be used for face recognition. The system is trained with AlexNet, and the visuals received from the Pi are used for matching. The datasets in Fig. 8a, b are trained using MATLAB. Each dataset consists of more
Fig. 8 Dataset of a person 1, b person 2

Fig. 9 Detection of a person 1, b person 2

than 120 pictures. The training is done with stochastic gradient descent with momentum (SGDM) for 20 epochs. 90% of the data is given for training and the remaining 10% for testing. The testing is done using MATLAB with a test set of persons 1 and 2. As shown in Fig. 9a, b, persons 2 and 1 are detected successfully. The system is modified to return critical information such as the patient's blood group, medical history, etc., along with the name. This can be used for a rapid response from the rescue team and the medical team.
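
The training itself is done in MATLAB with AlexNet and SGDM; purely as an illustration of the same transfer-learning idea, an equivalent setup in Python/PyTorch (folder layout, class count, and learning rate are our assumptions) would look like the following.

import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Assumed folder layout: dataset/person1/*.jpg, dataset/person2/*.jpg, ...
tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_set = datasets.ImageFolder("dataset", transform=tfm)
loader = torch.utils.data.DataLoader(train_set, batch_size=16, shuffle=True)

model = models.alexnet(weights="IMAGENET1K_V1")                 # pre-trained AlexNet
model.classifier[6] = nn.Linear(4096, len(train_set.classes))   # new output layer

# SGD with momentum, mirroring the SGDM solver used in the MATLAB workflow
opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):                                          # 20 epochs as in the paper
    for images, labels in loader:
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        opt.step()
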

8 Conclusion

FRRALD can help in the effortless and rapid rescue of victims affected by floods and other disasters so that they can get immediate medical attention. The system can assist rescue teams in recognizing the effect of the catastrophe that has taken place, and the rescue team can rescue the trapped people by receiving their location. The GPS accuracy was good: some tests were carried out using the GPS and Google Maps
and they gave accurate results; in these tests, the coordinates from the system were compared against known coordinates. With the help of the camera, the operator gets a clear-cut view of the scenario and the effects of the disaster that has taken place. The PIR and camera can assist in detecting and identifying the trapped people. The face recognition is useful in getting important information about the patient for rapid medical support. For this research work, a free ThingSpeak server is used, but for real-time communication and operation and to send the acquired data, a strong backhaul network and server should be used.
The challenges that FRRALD faces include the accurate detection of victims trapped deep within the mud; improving the deep learning and MATLAB algorithms to better visualize the face of the victim is still a major drawback for FRRALD. The cost of construction, and of integrating a drone which can carry the payload weight of the rover, can make the overall cost high. Increasing the accuracy of the tracked location and relaying it to the operator can also be a problem. The system can be fitted with a thermal camera to get an overall better visualization. The system can also be upgraded by sending a FRRALD army or fleet [8], to reduce the time in detecting the victims and also to act as repeaters to increase the range.

References

1. Pedersen, J.: Use of UAVs in the NGO world. In: CRS Conference—ICT4 Development, Nairobi,
Kenya, Mar 25–28, (2014)
2. Rivera, A.J.A., Villalobos, A.D.C., Monje, J.C.N., Mariñas, J.A.G., Oppus, C.M: Post-disaster
rescue facility: Human detection and geolocation using aerial drones. In: 2016 IEEE Region 10
Conference (TENCON)
3. Alwateer, M., Loke, S.W.: On-Drone decision making for service delivery: concept and simu-
lation. 2019 IEEE International Conference on Pervasive Computing and Communications
Workshops (PerCom Workshops)
4. Tariq, R., Rahim, M., Aslam N., Bawany N., Faseeha, U.: DronAID: a smart human detection
drone for rescue. In: 2018 15th International Conference on Smart Cities: Improving Quality of
Life Using ICT & IoT (HONET-ICT)
5. Parvu, P., et al.: Autonomous system for image geo-tagging and target recognition. In: Aerospace Conference, in press, May 2014, pp. 1–26
6. Câmara, D.: Cavalry to the rescue: drones fleet to help rescuers operations over disasters
scenarios. In: 2014 IEEE Conference on Antenna Measurements & Applications (CAMA)
7. Gaszczak, A., Breckon, T.P., Han, J.: Real-time people and vehicle detection from UAV imagery.
In: Proceedings of SPIE: Intelligent Robots and Computer Vision XXVIII: Algorithms and
Techniques, San Francisco, California, 2011, pp
8. Besada, J.A., Bernardos, A.M., Bergesio L, Vaquero, D., Campaña, I, Casar, J.R.: Drones-as-
a-service: A management architecture to provide mission planning, resource brokerage and
operation support for fleets of drones. In: 2019 IEEE International Conference on Pervasive
Computing and Communications Workshops (PerCom Workshops)
An Enhanced Differential Evolution
Algorithm with Sorted Dual Range
Mutation Operator to Solve Key Frame
Extraction Problem

M. Aathira and G. Jeyakumar

Abstract This paper proposes a modified Differential Evolution (DE) algorithm in which the conventional mutation operation of DE is replaced by a 'sorted population' based mutation operation. This 'sorted population' based mutation operation, proposed by the authors, differs from the conventional mutation operation in the way in which it selects the candidates for the mutation process and the values it sets for the mutation scale factor (F). The modified DE was implemented on 14 different standard benchmarking problems to verify its superiority. A comparative study based on the results obtained revealed that the proposed algorithm solved the problems, providing optimal solutions in less time for higher-dimensional problems. Next, the experiments were extended to solve the key frame extraction problem for videos. This part of the experiment combined the conventional SSIM (Structural Similarity Index) approach of key frame extraction with the proposed DE. The results showed that the proposed DE gave comparatively better results than the classical DE.

Keywords Differential evolution · Mutation · Modified mutation · Video analytics · Key frame extraction · SSIM approach

1 Introduction

Evolutionary Algorithms (EAs) are a set of systematic random search algorithms popularly used for solving real-world optimization problems. The researcher community of EAs focuses on various aspects of the algorithms, which include analysing the theoretical properties of the algorithms, modifying their algorithmic structure, integrating them with other similar algorithms, and designing strategies to control/tune their parameters, as well as testing their applicability on solving different real-world

M. Aathira (B) · G. Jeyakumar


Department of Computer Science and Engineering, Amrita School of Engineering, Amrita
Vishwa Vidyapeetham, Coimbatore, India
e-mail: cb.en.p2cse18001@cb.students.amrita.edu
G. Jeyakumar
e-mail: g_jeyakumar@cb.amrita.edu

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 307
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_33
optimization problems. It is also commonly found in the research community that researchers propose innovations to the algorithmic structure of EAs and then test the modified EAs on real-world optimization problems. In a similar line, this paper proposes to modify the mutation component of the Differential Evolution (DE) algorithm (proposed in [1]) and to solve the key frame extraction problem of video analytics with the modified algorithm.
The remaining part of the paper is organized into the following sections—Sect. 2 discusses related works, Sect. 3 introduces the proposed mutation strategy, Sect. 4 explains the design of the experimental setup used for this study, Sect. 5 presents and discusses the results obtained on the benchmarking functions, Sect. 6 verifies the novelty of the proposed mutation on a video analytics problem, and Sect. 7 concludes the paper.

2 Related Works

This section summarizes popular amendments made to the mutation component of DE and to key frame extraction from video.
An enhanced DE algorithm with multiple mutation strategies and self-adapting control parameters was proposed in [2]. In [3], based on the fitness value, the individuals of each generation are ranked from a better part to a worse part. To perform the mutation, two individuals are chosen either from the worse part or from the better part. A modified mutation
named dual preferred learning mutation (DPLM) was proposed in [4]. The DPLM
simultaneously learns behaviours from the individual with better fitness (BFI) and
individual with better diversity (BDI). A Revised Mutation DE, ReDE, was proposed
in [5]. The ReDE used two control parameters and two types of populations. In [6], the
author proposed a diversity based base vector selection mechanism for the mutation.
This idea was extensively evaluated and reported in [7], by the same authors. A
different base vector selection strategy, to select the centroid of top 3 candidates as
the base vector was proposed in [8]. In [9], two different novel variants of Differential
evolution called centroid differential evolution (CDE) and differential evolution with
a local search (DELS) were proposed. [10] proposed a novel version of mutation
operator to DE inspired from the biological phenomenon called Hemostasis. Authors
of [11] proposed a set-based mutation operator which works on the causal matrix.
There are also attempts to investigate and propose new algorithmic structures of DE, viz. dynamic DE and distributed DE [12–14], etc.
In video analytics, extracting the key frames from a video serves many applications, viz. video summarization, content-based retrieval, object detection, etc. Numerous approaches have been proposed in the literature for this purpose. [15] proposed a Euclidean-distance-based strategy. In [16], an entropy-based approach was proposed. The approach proposed in [17] used an improved histogram algorithm. An algorithm based on optimized key frame difference was proposed in [18]. The chi-square histogram algorithm was used in [19], and [20] introduced a formula to calculate the difference between the current and the next frame. The integration of the
Structural Similarity Index Method (SSIM) with the classical DE algorithm to solve the key frame extraction problem was first introduced in [21]. An extensive comparative study of the conventional SSIM, entropy, and Euclidean methods and their integration with DE was presented in [22]; it was reported that the DE-unified algorithms showed high accuracy. Following the DE_SSIM proposed in [21], this paper proposes to integrate the proposed mutation-modified DE algorithm with the SSIM approach to detect the key frames from a set of traffic surveillance videos. The proposed mutation method is described in the next section.

3 Proposed Mutation Strategy

The general logic of DE's mutation (which is named differential mutation) is to add the scaled difference of two (or more) candidates in the population to another candidate in the population. Based on the way these candidates are selected, there are many different mutation operators available for DE.
The proposed mutation strategy is a modified version of ‘rand’ mutation. The
population is sorted in the ascending order of their objective function values and
it is divided into two partitions—promising and non-promising. The candidates are indexed from 0 to NP − 1; hence, the index range of the promising region is [0, NP/2) and that of the non-promising region is [NP/2, NP − 1]. For mutating a candidate in the promising or the non-promising region, the random candidates are selected from within the same region, [0, NP/2) or [NP/2, NP − 1], respectively. The F value is set
randomly in the range [0, 0.5] or [0.5, 1], respectively. In both cases, the base vector is the candidate with the best fitness value among the three random candidates. The
proposed mutation operator is named ‘sorted dual range mutation’ (sdrm). The DE
with sdrm is denoted as DE sdrm , henceforth in the paper. The design of experimental
setup is discussed in the next section (Sect. 4).
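
A minimal NumPy sketch of the sdrm operator, written from the description above (variable names are ours, and minimization is assumed), is given below.

import numpy as np

def sdrm_mutation(population, fitness, i):
    """Sorted dual range mutation for candidate i (minimization assumed)."""
    NP = len(population)
    order = np.argsort(fitness)            # sort candidates, best first
    half = NP // 2
    # Promising region: first half of the sorted order; non-promising: second half
    if np.where(order == i)[0][0] < half:
        region, F = order[:half], np.random.uniform(0.0, 0.5)
    else:
        region, F = order[half:], np.random.uniform(0.5, 1.0)
    r1, r2, r3 = np.random.choice(region, 3, replace=False)
    # Base vector = the fittest of the three random candidates from the region
    base = min((r1, r2, r3), key=lambda r: fitness[r])
    diff = [r for r in (r1, r2, r3) if r != base]
    return population[base] + F * (population[diff[0]] - population[diff[1]])

pop = np.random.uniform(-5, 5, size=(60, 30))      # ps = 60, d = 30
fit = np.sum(pop ** 2, axis=1)                     # e.g. the sphere function
mutant = sdrm_mutation(pop, fit, i=7)
print(mutant.shape)                                # (30,)
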

4 Design of Experiments

The objective of this experiment is to investigate the performance of DEsdrm and the classical DE (cDE). The comparison of the algorithms was done based on their performance on benchmarking problems and on a video surveillance problem at traffic signals.
The parameters of DE are the number of candidates in the population—the population size (ps), the size of each candidate—the dimension (d), the mutation scale factor F, the probability of crossover—the crossover rate (Cr), the maximum number of generations (MaxGen), and the maximum number of trial runs (Mtr). The values of these parameters were set constant before the start of the DEsdrm run, except for F. The values of F were chosen randomly in the ranges specified in the proposed mutation strategy
(sdrm), for every candidate in the population. The experiment was repeated for two
different population sizes, 60 and 200. The summary of the parameter setting is: ps = 60 and 200; d = 30; F in [0, 1]; Cr = 0.5; MaxGen = 10; and Mtr = 50.
The performance metrics used in the experiments were the average of solutions (AOS) and the speed. The AOS was measured as the average of the solutions obtained over the Mtr runs. The speed was measured with two metrics: the number of function evaluations (nFE) and the execution time (ExeTime). The values measured for each run are reported for discussion in Sect. 5.

5 Results and Discussions

The classical DE (cDE) and the proposed DE (DEsdrm) were implemented to solve the benchmarking functions chosen in the experimental setup. The AOS, nFE, and ExeTime measured for cDE and DEsdrm, for ps = 60, are presented in Table 1. The results indicate that the proposed DEsdrm outperformed cDE by all the performance metrics only for f2. DEsdrm outperformed cDE by two metrics together (AOS and ExeTime) for three functions, f3, f8, and f13. DEsdrm outperformed cDE only by nFE for two functions, f4 and f10, and only by ExeTime for five functions, f1, f7, f9, f11, and f14. Except for f5, for all other functions DEsdrm outperformed cDE by at least one of the metrics. The summary of inferences is "cDE was good in AOS and DEsdrm was good in speed, in both ExeTime and nFE".

Table 1 The AOS, nFE and ExeTime for ps = 60


Functions cDE DE sdrm
AOS ExeTime nFE AOS ExeTime nFE
f1 55,270.07 0.0020 660 75,252.14 0.0018 660
f2 109,338.55 0.0040 660 11,374.32 0.0034 606
f3 427.49 0.0047 660 418.03 0.0045 660
f4 20.57 0.0048 660 3.79E + 27 0.0049 606
f5 6.02E + 09 0.0021 660 5.91E + 11 0.0021 660
f6 88.25 0.0020 660 88.07 0.0020 660
f7 1.82E + 08 0.0039 660 2.59E + 08 0.0031 660
f8 58,148.70 0.0024 660 52,925.40 0.0023 660
f9 76.62 0.0060 660 98.16 0.0053 660
f 10 484.59 0.0040 660 919.78 0.0043 606
f 11 528.03 0.0054 660 552.34 0.0044 606
f 12 860.42 0.0053 660 506.95 0.0053 660
f 13 2559.65 0.0053 660 2393.92 0.0052 660
f 14 1522.02 0.0073 660 2033.85 0.0071 660
a good results are marked in bold
Table 2 The AOS, nFE and ExeTime for ps = 200


Functions cDE DE sdrm
AOS ExeTime nFE AOS ExeTime nFE
f1 50,092.57 0.005 2200 55,856.26 0.005 2200
f2 76,148.29 0.010 2200 70,590.85 0.010 2200
f3 389.05 0.012 2200 382.37 0.011 2020
f4 20.58 0.012 2200 1.19E + 25 0.011 1840
f5 2.13E + 09 0.005 2200 6.80E + 08 0.004 1480
f6 83.80 0.005 2200 77.09 0.004 2200
f7 2.07E + 08 0.010 2200 2.00E + 08 0.010 2200
f8 58,462.30 0.007 2200 40,353.00 0.005 1480
f9 70.50 0.015 2200 64.29 0.014 2020
f 10 169.30 0.011 2200 1024.88 0.003 580
f 11 583.61 0.013 2200 530.26 0.014 2200
f 12 788.16 0.013 2200 713.74 0.015 2200
f 13 2103.64 0.013 2200 2048.07 0.012 2020
f 14 1791.78 0.017 2200 1451.90 0.019 2200

The same experiments were repeated for cDE and DEsdrm on the 14 benchmarking problems, however, with ps = 200. The values measured for the performance metrics of cDE and DEsdrm are presented in Table 2. The superiority of DEsdrm was clearly evident: DEsdrm outperformed cDE in 11, 9, and 7 of the 14 function cases by AOS, ExeTime, and nFE, respectively. DEsdrm could outperform cDE by all three metrics for five benchmarking functions (f3, f5, f8, f9, and f13), by speed (both ExeTime and nFE) for two functions (f4 and f10), and by AOS and ExeTime for one function (f6).
Thus, the experiments done on the benchmarking functions proved that the proposed DEsdrm shows superior performance to the classical DE in both solution quality and speed.
To validate further the superiority of DE sdrm , its performance was assessed on
solving the problem of extracting key frames from video. The experimental details
and the observations gathered are presented in the next section.

6 Validation of DEsdrm on Video Analytics Problem

Numerous evolutionary algorithm-based frameworks have been proposed in the literature
for extracting key frames from given videos. In this experiment, cDE
and DEsdrm were implemented for key frame extraction, to demonstrate the efficiency
of DEsdrm. The video was first converted into frames, and 75 frames were obtained. The objective
of this experiment was to extract 10 key frames from these 75 frames.

The values set for the DE parameters were ps = 10, D = 10, F = random (or
constant (0.9)), Cr = 0.6, MaxGen = 10 and Mtr = 3/50. A population with 10
candidates was initialized. Each candidate in the population was a set of 10 random
frames. The fitness of a candidate was measured as the ASSIM (Average Structural
Similarity Index) value of the frames in the set, as sketched below. The experiments were repeated for
3 trials, each with different independent runs, in order to obtain a better comparative
analysis of the algorithms.
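As an illustration of this fitness computation, a minimal sketch is given below; it assumes grayscale frames stored as equally sized NumPy arrays and uses scikit-image's structural_similarity function, which is an assumption and not necessarily the exact implementation used here.

import itertools
import numpy as np
from skimage.metrics import structural_similarity as ssim

def assim_fitness(candidate, frames):
    # candidate: list of 10 frame indices selected as key frames
    # frames: sequence of grayscale frames (2-D NumPy arrays of equal size)
    scores = [ssim(frames[i], frames[j])
              for i, j in itertools.combinations(candidate, 2)]
    # ASSIM: average structural similarity over all frame pairs in the set;
    # lower values indicate a less redundant (more diverse) key frame set
    return float(np.mean(scores))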
In Trial 1, Mtr was set to 3. The F values were chosen in the ranges
[0, 0.5] and [0.5, 1] for the promising and non-promising regions, respectively. The proposed
DEsdrm failed to outperform cDE; the average ASSIM value of DEsdrm was higher
than that of cDE.
In Trial 2, the F values were set differently for each region of the population. The
F values used were 0.5 and 0.9 for the promising and non-promising regions, respectively, and
Mtr was set to 5. DEsdrm outperformed cDE with a marginal difference of 0.0039,
although cDE outperformed DEsdrm in 3 out of 5 runs. This showed the
comparable performance of DEsdrm with cDE. It is worth noting that the average
performance of DEsdrm in Trial 2 increased compared to its performance in
Trial 1.
In Trial 3, the F value was set constant at 0.9 for both the promising and the
non-promising regions, and Mtr was set to 50 for this trial. It was found that the
proposed DEsdrm generated key frames with lower ASSIM values compared to cDE.
The experimental results recorded are presented in Table 3. The cDE and DEsdrm
algorithms were compared by different metrics measured on the ASSIM values obtained
over the 50 runs. The best, worst and average ASSIM values of the 50 runs were lower
for DEsdrm than the corresponding values of cDE. It is observed from the
results that, on comparing the corresponding runs of cDE and DEsdrm, DEsdrm
significantly outperformed cDE in all 50 runs. The pairwise difference between
the ASSIM values attained by the algorithms in each run is also reported in the results.

Table 3 Experimental results (details for Trial 3, Video 1)

Comparison by              cDE                               DEsdrm
Key frames                 1, 1, 3, 23, 34, 43, 50, 51,      1, 1, 5, 24, 28, 39, 48, 51,
                           64, 70                            58, 75
ASSIM    Best              0.6250                            0.5853
         Worst             0.7195                            0.6308
         Worst - best      0.0945                            0.0455
         Average           0.6654                            0.6045
Pair     +                 0                                 50
         -                 50                                0
Min_Diff                   0.0051
Max_Diff                   0.1205
Avg_Diff                   0.0608

Fig. 1 Key frames extracted by a cDE and b DEsdrm

The average difference found was 0.0608, which shows the reasonable performance
enhancement achieved by the proposed DEsdrm algorithm. The key frames extracted
by cDE and DEsdrm are depicted in Fig. 1a, b, respectively, for reference.
Thus, the superiority of the proposed DEsdrm algorithm was demonstrated on a set of
14 benchmarking problems and a video analytics problem.

7 Conclusions

This paper proposed a novel mutation strategy named 'sorted dual range mutation
(sdrm)' for the Differential Evolution (DE) algorithm. The DE variant in which the classical
mutation operator is replaced with sdrm was named DEsdrm. To demonstrate the
novelty of sdrm, the classical DE and DEsdrm were implemented to solve a set of
14 benchmarking problems and a key frame extraction problem. The results of the
benchmarking experiments showed that DEsdrm could outperform cDE significantly
for higher dimensions. For the key frame extraction problem, three trials were
conducted with different F values. The results revealed a trend of performance
enhancement of DEsdrm from Trial 1 to Trial 3 and demonstrated the superiority of the
proposed DEsdrm algorithm in the key frame extraction problem. The superiority of
DEsdrm was well evident in the chosen video.

Overall, the experiments on the benchmarking and key frame extraction problems
revealed the novelty of the sdrm mutation. The sdrm follows a strategy of
exploring and exploiting the population in every generation, from the beginning to
the end of the evolution of DE. This strategy can be further analysed by comparing
it with other similar mechanisms in the literature.

References

1. Rainer, S.: Differential evolution-a simple and efficient adaptive scheme for global optimization
over continuous spaces. Tech Report Int. Comput. Sci. Inst. (1995)
2. Attia, M., Arafa, M., Sallam, E.A., Fahmy, M.M.: An Enhanced differential evolution algorithm
with multi-mutation strategies and self-adapting control parameters. Int. J. Intell. Syst. Appl.
11(4), 26–38 (2019)
3. Zhou, Y., Li, X., Gao, L.: Adaptive differential evolution with intersect mutation and repaired
crossover rate. Appl. Soft Comput. 13(1), 390–401 (2013)
4. Duan, M., Yang, H., Liu, H., Chen, J., Duan, M., et al.: A differential evolution algorithm with
dual preferred learning mutation. Appl. Intell. 49, 605–627 (2019)
5. Ramadas, M., Abraham, A.: Revised mutation strategy for differential evolution algorithm.
In: Metaheuristics for Data Clustering and Image Segmentation-Intelligent Systems Reference
Library, vol. 152, pp 57–65 (2019)
6. Gokul, K., Pooja, R., Gowtham, K., Jeyakumar, G.: A Self-switching base vector selec-
tion mechanism for differential mutation of differential evolution algorithm. In: International
Conference on Communication and Signal Processing (2017)
7. Gokul, K., Pooja, R., Jeyakumar, G.: Empirical evidences to validate the performance of self-
switching base vector based mutation of differential evolution algorithm. In Proceedings of
7th International Conference on Advances in Computing, Communications and Informatics,
pp. 2213–2218 (2018)
8. Salehinejad, H., Rahnamayan, S., Tizhoosh, H.R.: CenDE: centroid-based differential evolu-
tion. In: Proceedings of IEEE Canadian Conference on Electrical & Computer Engineering
(CCECE)
9. Ali, M., Pant, M., Nagar, A.: Two new approach incorporating centroid based mutation
operators for differential evolution. World J. Model. Simul. 7(1), 16–28 (2011)
10. Prabha, S., Yadav, R.: Differential evolution with biological-based mutation operator.
Eng. Sci. Technol. Int. J. 23(2), 253–263 (2020)
11. Jing, S.-Y.: Set-Based differential evolution algorithm based on guided local exploration for
automated process discovery. In: Foundations and Applications of Process-based Modeling of
Complex Systems, Complexity, vol. 2020, (2020)
12. Jeyakumar, G., ShunmugaVelayutham, C.: Differential evolution and dynamic differential
evolution variants—an empirical comparative performance analysis. Int. J. Comput. Appl.
(IJCA) 34(2), 135–144 (2012)
13. Jeyakumar, G., Shunmuga Velayutham, C.: Distributed mixed variant differential evolution
algorithms for unconstrained global optimization. Memetic Comput. 5(4), 275–293 (2013)
14. Jeyakumar, G., Shunmuga Velayutham, C.: Distributed heterogeneous mixing of differential
and dynamic differential evolution variants for unconstrained global optimization. Soft Comput.
18(10), 1949–1965 (2014). Springer
15. Wang, L., Zhang, Y., Feng, J.: On the Euclidean distance of images. IEEE Trans. Pattern Anal.
Mach. Intell. 27(8), (2005)
16. Algur, S.P., Vivek, R.: Video key frame extraction using entropy value as global and local
feature. arXiv:1605.08857 (cs.CV), (2016)

17. Liu, G., Zhao, J.: Key frame extraction from MPEG video stream. In: Proceedings of Second
Symposium International Computer Science and Computational Technology (2009)
18. Liu, H., Meng, W., Liu, Z.: Key Frame extraction of online video based on optimized frame
difference. In: Proceedings 9th International Conference on Fuzzy Systems and Knowledge
Discovery (2012)
19. Ramender, G., Pavani, M., Kishore Kumar, G.: Evolving optimized video processing and
wireless transmission system based on arm-cortex-a8 and gsm. Int. J. Comput. Netw. Wirel.
Mobile Commun. 3(5), (2013)
20. Liu, H., Pan, L., Meng, W.: Key frame extraction from online video based on improved frame
difference optimization. In: Proceedings of 14th International Conference on Communication
Technology (ICCT) (2012)
21. Abraham, K.T., Ashwin, M., Sundar, D., Ashoor, T., Jeyakumar, G.: An evolutionary computing
approach for solving key frame extraction problem in video analytics. In: Proceedings of
ICCSP-2017—International Conference on Communication and Signal Processing (2017)
22. Abraham, K.T., Ashwin, M., Sundar, D., Ashoor, T., Jeyakumar, G.: Empirical comparison
of different key frame extraction approaches with differential evolution based algorithms. In:
Intelligent Systems Technologies and Applications, ISTA 2017 Advances in Intelligent Systems
and Computing, vol. 683, pp. 317–326 (2018)
Annotation for Object Detection

P. Myna, R. V. Anirudh, Brundha Rajendra Babu, Eleanor Prashamshini,


and Jyothi S. Nayak

Abstract Computer vision is an important, new area of research. It requires large
datasets for training; such datasets are often inaccessible for financial reasons or do
not exist for specialized needs. This paper discusses an annotation tool designed for
convenient data annotation. The aim is to enable easy manual annotation of images.
In addition, annotation accuracy has been compared in a case study between detection by
humans and detection by YOLO9000.

Keywords Annotation · Object detection · Computer vision

1 Introduction

Video and image processing are highly researched fields and are predicted to
continue expanding for a significant period. The improvement of computing capabilities
and easy access to video and image recording gadgets have enabled the development
of computer vision applications in surveillance, disease detection, autonomous
vehicle design, etc. Since most real-world applications are highly sensitive, it is
imperative to train and test machine learning algorithms on huge datasets.

P. Myna · R. V. Anirudh · B. R. Babu · E. Prashamshini (B) · J. S. Nayak


Computer Science and Engineering, B.M.S. College of Engineering, Basavanagudi, Bangalore,
Karnataka 560019, India
e-mail: prashamshini@gmail.com
P. Myna
e-mail: myna.pk3@gmail.com
R. V. Anirudh
e-mail: anirudh.rv1234@gmail.com
B. R. Babu
e-mail: brundha.r.reddy@gmail.com
J. S. Nayak
e-mail: jyothinayak.cse@bmsce.ac.in


Niche applications, such as those in biology and astronomy, often do not have
annotated datasets or easily accessible high-quality images. Thus, manual image
collection and annotation become the only option [1]. Another important application
of this tool is to compare the accuracy of algorithm-based object detection to the
accuracy of detection by the human eye.
The prime focus of this paper is to discuss the design of a manual annotation tool
and check the accuracy of the same with respect to algorithm-based annotation.
Annotation of an image means associating critical extra information with the
image/diagram. In this tool, all persons and objects are identified in the image and
assigned the correct labels. YOLO9000 is a real-time object detection algorithm
used for the classification of objects in the annotation tool.
Intersection over Union (IoU) is an evaluation metric popularly used to check
object detection accuracy. This tool provides a feature to check the IoU between
the human-annotated image and the image annotated by the tool [2].

2 Related Work

2.1 Annotation Tools

Over the years, many automatic or semi-automatic annotation tools have been
developed [3]. Most of them work using pre-trained weights or targets. Thus, for
applications where no targets exist, manual annotation becomes a necessity.

2.2 Object Detection Algorithm

Object detection tools, supported by advancements in technology, have recently been
put into production. While accuracy, size and speed issues are still persistent, they
are no longer major hindrances.
Two-step object detection models [4], such as CNN (Convolutional Neural Network),
R-CNN, Fast R-CNN and Faster R-CNN, generally surpass their one-step
counterparts in accuracy. The first step, region proposal, checks for regions in the
image that have a significant probability of being an object and then generates the relevant
coordinates. The second step, object detection, takes the generated regions as inputs
and performs classification.
On the other hand, single-step object detection models combine locating and
classifying into a single step and thus have higher speeds and memory efficiency
despite their simplicity. Some examples of these models are Single Shot MultiBox
Detector Model, Retina Net and You Only Look Once (YOLO).

The object detection algorithm tested for accuracy in this paper
is a version of YOLO [5]. YOLO detects objects and provides a confidence score for
how accurate each detection is.
YOLO employs regression and compacts the whole detection pipeline into one
network. A single head iterates through sections of the image and processes them
using a few convolutional layers to obtain a feature map. Then, offsets are calculated to
get an anchor box; this system of anchors and offsets is reported to decrease training
time. A threshold confidence score of 30% is generally used while generating object
detection outputs, as sketched below.
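As a hedged illustration of this thresholding step (not YOLO's actual API), detections represented as (label, score, box) tuples could be filtered as follows; the 0.3 cut-off corresponds to the 30% confidence score mentioned above, and the example values are hypothetical.

def filter_detections(detections, threshold=0.3):
    # detections: iterable of (label, confidence_score, bounding_box) tuples
    # keep only detections whose confidence score meets the threshold
    return [d for d in detections if d[1] >= threshold]

# Hypothetical example: only the 0.82 detection survives the 30% threshold
detections = [("person", 0.82, (10, 20, 50, 120)),
              ("person", 0.12, (200, 40, 240, 150))]
print(filter_detections(detections))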
YOLO9000 [6] is a more optimized version and is a better fit here. The use of
Siamese Networks [7] has helped to train with the limited annotated surveillance
data that is available.

2.3 Currently Available Annotated Datasets

As computer vision applications expand, newer annotated datasets for specific needs
are required. To provide context, some commonly used datasets are discussed briefly:
1. Common Objects in Context (COCO) dataset [4]: This dataset consists of 328,000
images and 91 classes of objects in their natural surroundings. It has
labels for commonly seen objects such as cat, car and eyeglasses. This dataset
was annotated by a tool called coco-annotator.
2. ImageNet dataset [8]: This mammoth dataset contains 12 subtrees with 5247
synsets of classifications and 3.2 million images. This dataset contains more
detailed labels like Egyptian cat, freight car, passenger car and sunglasses. This
dataset was hand-annotated.
3. SUN dataset [9]: This dataset focuses on scene categorization with 397 categories
and 130,519 images. This contains images with object labels such as door, car
and tree, as well as scene labels such as cafeteria, farm and elevator. This dataset
was hand-annotated.

2.4 Metric

The metric chosen to measure object detection capabilities in this paper is Intersection
over Union (IoU) [10].
Intersection over Union calculation requires:
1. The actual hand-labelled bounding boxes, referred to as ground-truth bounding
boxes.
2. The bounding boxes predicted as output by the object detection model.
Figure 1 explains how Intersection over Union is calculated. An IoU score greater
than 0.5 usually indicates good detection.
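A minimal sketch of the IoU computation for two axis-aligned boxes, each given as (X1, Y1, X2, Y2) with the top-left corner of the image as the origin (the convention used in Sect. 4), is shown below for illustration.

def iou(box_a, box_b):
    # boxes are (x1, y1, x2, y2): (x1, y1) top-left, (x2, y2) bottom-right
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)          # area of overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter                        # area of union
    return inter / union if union > 0 else 0.0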

Fig. 1 Intersection over Union formula [10]

3 Implementation

The input to the system was videos of busy streets. A script was run to extract
frames from the video. Then, frames were run through YOLO9000 and also annotated
manually. The accuracy of detection was calculated using IoU. The frontend for the
application was implemented using ReactJS. MongoDB was used for its ability to
store semi-structured and unstructured data.

3.1 System Architecture

Images are uploaded to the tool, where each is displayed with two layers: the actual
image and a transparent layer above it on which annotation is done. Up to 100 images
can currently be uploaded at once.
Humans annotate each of these images manually by drawing boxes around each
person in the image. If required, YOLO9000 can be run on the images as well to
detect objects classified as 'people'. On saving, a file with the original images, human
annotation details and YOLO9000 annotation coordinates is stored on the local
system.
Input to the system is uploaded as images or as a video through a script that extracts
frames. As inputting images individually would be cumbersome during testing, short
videos were given to the system, and frames were extracted from these videos at
random intervals and processed (Fig. 2); a sketch of such a script is shown below.
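The following is a sketch of such a frame-extraction script, assuming OpenCV is used for reading the video (the paper does not name the exact tool for this step); the output directory layout and the number of sampled frames are illustrative.

import random
import cv2

def extract_frames(video_path, out_dir, n_frames=100, seed=0):
    # grab frames at random positions from the video and save them as images
    random.seed(seed)
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    picks = sorted(random.sample(range(total), min(n_frames, total)))
    for k, idx in enumerate(picks):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)   # jump to the chosen frame
        ok, frame = cap.read()
        if ok:
            cv2.imwrite(f"{out_dir}/frame_{k:04d}.jpg", frame)
    cap.release()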

3.2 Interface Design

A single page application, using ReactJS, has been created to provide access to the
annotation tool. The user flow has been crafted to be simple and intuitive for users.
The application has a drawable area, where the image to be annotated is layered
with a canvas. The user can proceed to manually annotate the displayed image by

Fig. 2 System architecture

drawing boxes around persons, using their mouse. When the user saves the annota-
tions, all coordinates are stored at the backend. Further, the user can run a comparison
with the YOLO9000 for the annotated images and download all the results.

3.3 Design

The frontend is built with ReactJS as a single page application (SPA). The application
is created using create-react-app, and each of the page components is dynamic. HTTP
requests are made from the ReactJS app, to the backend. The backend consists of an
Application Programming Interface (API) written in Go language and NodeJS. The
annotated images are stored using Mongo Atlas cloud services.
A Docker image of the application is created which is used to create a Docker
container. The container is hosted on AWS cloud services thus ensuring security and
scalability (Fig. 3).

4 Experimental Setup

Suitable images of people were collected. Then boxes, called bounding boxes, were
drawn around the object of interest using two sets of coordinates. The coordinates
are denoted by (X1, Y1) and (X2, Y2) such that X1 and Y1 are the coordinates of the
top-left corner of the object, and X2 and Y2 are the coordinates of the bottom-right

Fig. 3 Design

corner of the object. Coordinates are measured with the top-left corner of the image
as the origin.
Using the two sets of coordinates, all the four corners of the object section of the
image can be represented as:
(X1, Y1)—Top-left coordinate of object
(X2, Y1)—Top-right coordinate of object
(X1, Y2)—Bottom-left coordinate of object
(X2, Y2)—Bottom-right coordinate of object
A two-part experiment was set up to record coordinates as follows:
(a) Human Annotation: The authors of this paper manually recorded the coordinates
of boxes around people in the images.
(b) Machine Learning Algorithm: The images are annotated by the chosen ML
algorithm, where emphasis is laid on a particular label/class of objects. Addi-
tionally, most algorithms give a confidence score for these detected objects in the
images. This paper explores using the YOLO9000 object detection algorithm
in the tool.

5 Case Study: YOLO9000

5.1 Overview of YOLO9000

YOLO9000 works well on images with abundant noise and thus is selected for the
accuracy comparison in this paper. YOLO9000 has been tested for object detection
on the ImageNet detection validation set and has received a score of 19.7 mAP
(mean Average Precision); on the 156 classes not present in COCO, it scored
16.0 mAP [6].

5.2 Dataset Description

Images of Indian urban and rural locations were used. A good mixture of images of
busy streets, markets and other public spaces was used. Four hundred images were
used to check the versatility of YOLO9000, especially to check its application in
monitoring crowded Indian public spaces.

5.3 Experimental Setup

YOLO9000 [6] has been chosen as the object detection algorithm for its capability
to provide coordinates of bounding boxes for 9000 object categories and to provide
confidence scores. This case study focuses on detection of the 'people' label.
The images were uploaded to the tool, where each image was manually annotated
using the annotation tool.
Finally, the coordinates for people detected by YOLO9000 were evaluated with
respect to those detected by humans, using IoU. It is unrealistic to expect the model
to predict the exact coordinates of any detected object. By considering the area of
overlap between the ground-truth bounding boxes and the predicted coordinates, the
closeness of values generated by the model and by hand labelling can be measured
(Fig. 4).
In the accuracy analysis, the IoU is computed for each object (person) detected.
Further, the detected bounding boxes are looped over, and the IoU for each is
computed.
In order to measure IoU, each bounding box labelled as ‘people’ is checked with
all the possible ground truths. Then, the maximum IoU is considered for that specific
bounding box (non-max suppression). The above is repeated for each bounding box
detected by the YOLO9000 algorithm.
To tackle the possibility that people detected in the ground truth are completely
ignored by the YOLO9000 algorithm, the difference in the number of people detected
by YOLO9000 and the number of people annotated for ground truth is calculated.

Fig. 4 Comparison of annotation by a humans and b YOLO9000

Fig. 5 Case study results: annotation by YOLO9000. Red bounding boxes represent annotation by
humans, and blue ones represent annotation by YOLO9000

These unaccounted detections of people are added with zero values to the list of
detections for the IoU calculation. Then, the IoU scores are averaged over each image and
finally over the entire dataset (Table 1); a sketch of this procedure is given after Table 1.

Table 1 IoU scores for case studies

Scenario                              Resolution  Density of people  Fig.  IoU
Distant                               Medium      Low                5a    0.30967884374807597
Blurry image with clustered objects   Low         High               5b    0.518701104056638
Well-distributed objects              High        Low                5c    0.7732527620999491
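A sketch of this evaluation procedure is given below: every predicted 'people' box keeps its best IoU against the ground-truth boxes, people missed entirely by the detector contribute zero scores, and the values are averaged per image and then over the dataset. It reuses the iou() helper sketched in Sect. 2.4 and is an illustration, not the authors' exact code.

def image_iou_score(pred_boxes, gt_boxes):
    # best IoU of each predicted box against all ground-truth boxes
    scores = [max((iou(p, g) for g in gt_boxes), default=0.0)
              for p in pred_boxes]
    # people present in the ground truth but missed by the detector count as 0
    scores.extend([0.0] * max(0, len(gt_boxes) - len(pred_boxes)))
    return sum(scores) / len(scores) if scores else 0.0

def dataset_iou_score(per_image_pairs):
    # per_image_pairs: list of (pred_boxes, gt_boxes) tuples, one per image
    per_image = [image_iou_score(p, g) for p, g in per_image_pairs]
    return sum(per_image) / len(per_image) if per_image else 0.0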

5.4 Results

Efficient image annotation is possible using this tool. The tool also enables streamlined,
convenient testing of object detection and text detection algorithms, which
is a significant convenience during the development of computer vision algorithms.
On comparing manual annotation to YOLO9000 annotation, an average IoU score of
0.3005834618310959 was obtained for our dataset. Assuming that the human eye
has an accuracy of 100% in detecting people, YOLO9000 scored about 30%. This
shows that human annotation might be more reliable for a variety of sensitive needs.

6 Conclusion and Future Enhancements

This tool helps to conveniently annotate large sets of images. Functionality to allow
annotation by YOLO9000 has also been implemented. The use of open-source
software has made the tool inexpensive and thus accessible.
The YOLO9000 feature currently processes an image in approximately 4 s on a
1.6 GHz dual-core system. The use of higher-capacity processors would greatly
improve the speed of the YOLO9000 feature. Additionally, the use of improved
metrics might help to better evaluate and compare human annotation with machine
annotation.

References

1. Russell, B.C., Torralba, A., Murphy, K.P., et al.: LabelMe: a database and web-based tool for
image annotation. Int. J. Comput. Vis. 77, 157–173 (2008)
2. Cheng, Q., Zhang, Q., Fu, P., Tu, C., Li, S.: A survey and analysis on automatic image
annotation. Pattern Recogn. 79:242–259 (2018)
3. Zhang, D., Islam M.M., Lu, G.: A review on automatic image annotation techniques. Pattern
Recogn. 45(1):346–362 (2012)
4. Lin, T.-Y., et al.: Microsoft coco: common objects in context. In: European Conference on
Computer Vision. Springer, Cham (2014)
5. Redmon, J., et al.: You only look once: Unified, real-time object detection. In: Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition (2016)
6. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition (2017)
7. Koch, G., Zemel, R., Salakhutdinov, S.: Siamese neural networks for one-shot image
recognition. In: ICML Deep Learning Workshop, vol. 2 (2015)
8. Deng, J., et al.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference
on Computer Vision and Pattern Recognition. IEEE (2009)
9. Xiao, J., et al.: Sun database: large-scale scene recognition from abbey to zoo. In: 2010 IEEE
Computer Society Conference on Computer Vision and Pattern Recognition. IEEE (2010)
10. Rosebrock, A.: Intersection over Union (IoU) for object detection. PyImageSearch,
Machine Learning, Object Detection, Tutorials (2016)
Development of Self Governed Flashing
System in Automotives Using AI
Technique

N. Sankarachelliah, V. Rijith Kumar, P. Senthilram, S. Valai Ganesh,


T. Selva Sundar, S. Godwin Barnabas, and S. Rajakarunakaran

Abstract This work develops an intelligent system to automatically turn ON or OFF the
indicator in an automotive (particularly a four-wheeler) by drawing input from sensors.
Almost 50% of drivers fail to use indicators while changing lanes or overtaking
a vehicle. This leads to vehicle accidents and may cause serious issues. The
proposed system comprises a steering angle sensor and an optical sensor for vehicle
detection and tracking, and also incorporates OpenCV (an AI tool) for lane detection;
the system is adaptable to the current situation.

Keywords Vehicle · Indicator · Intelligence system · Automatic · OpenCV

N. Sankarachelliah (B) · V. Rijith Kumar · P. Senthilram · S. Valai Ganesh · T. Selva Sundar ·


S. Godwin Barnabas · S. Rajakarunakaran
Ramco Institute of Technology, Rajapalayam, Tamil Nadu 626117, India
e-mail: 953617114078@ritrjpm.ac.in
V. Rijith Kumar
e-mail: 953617114075@ritrjpm.ac.in
P. Senthilram
e-mail: 953617114084@ritrjpm.ac.in
S. Valai Ganesh
e-mail: valaiganesh@ritrjpm.ac.in
T. Selva Sundar
e-mail: selvasundar@ritrjpm.ac.in
S. Godwin Barnabas
e-mail: godwin@ritrjpm.ac.in
S. Rajakarunakaran
e-mail: rajakarunakaran@ritrjpm.ac.in


1 Introduction

New technologies are essentially continual developments of technologies that already
exist, with further updating. Today, most of our surroundings are equipped
with intelligent systems, which play a vital role in our day-to-day life. They
take part in almost every daily activity to further improve performance and
enhance human ability, as systems programmed to think like humans and mimic
their actions [1]. An Intelligent System (IS) can be described as a device that
integrates information into machine-handling applications. Intelligent devices often
perform complex automatic processes that are not feasible under the conventional
programming model. A human-machine interface helps the driver to perform various
tasks; for example, an intelligent system can be used to control the turn signals in
automotives. The current turn signaling system requires the driver to turn the signal
on/off for the required turn.
The remainder of the article is structured as follows. Section 2 describes the need
for the device, and the problem identification is presented in Sect. 3. The existing
systems are discussed in Sect. 4 and our proposed system in Sect. 5. The experimental
findings are discussed in Sect. 6, followed by the conclusion and future work in Sect. 7.

2 Necessity of the Device

The testing is carried out on a four-wheeler car. When making a turn, changing
direction or passing a car, drivers normally use the indicators [2]. Many drivers
either fail to signal or do not turn off the indicator while changing from one lane to
another [3, 4]. Though refusing to signal may look like a minor violation, a lot of
car crashes occur when a vehicle turns or switches lanes without notice.

3 Problem Identification

A study conducted by automotive engineers shows that nearly 48% of drivers
failed to turn the indicator off while changing lanes or while making a turn, and
similarly 25% failed to turn the indicator on while making a turn [4]. Further study
shows that drivers fail to use the turn signals nearly 20 crore times a day, which
comes to nearly 7500 crore times a year. This creates more problems than mere
disturbance while driving.
These numbers show that the problem is increasing at an alarming rate and is
happening globally. No solution has been made to date to address this issue, and
the whole present system is dependent on driver input. A driver's mistake on the
road not only threatens the safety of the driver, but also that of the following cars.
A single act of neglect quickly impacts a variety of individuals.

4 Existing Systems

4.1 Conventional Turn Indicator

The conventional turn indicator is a fully manually controlled system which requires
the driver to turn the signal on/off for the required turn.
Often, this system delays a driver's response in triggering the turn signal. Some
drivers do not trigger the turn signal because their hands need to leave
the steering wheel to turn the light on. The approach is much more challenging for
less experienced drivers.

4.2 ORVM

ORVM stands for Outside Rear View Mirror, shown in Fig. 1. Indicators are mounted on
the rear view mirror to make sure that a driver alongside can quickly notice the signal and
respond correctly, particularly when a vehicle drives parallel to the car and has already
passed the traditional indicator mounted near the windshield. This is also very suitable for a
U-turn, because these signs are clearly visible from a perpendicular perspective.

Fig. 1 ORVM

4.3 Automatic Vehicle Turn Indicator Using Speech Recognition (Still, It Did Not Come into the Market)

The system actuates the indicator by recognizing the driver’s speech which is done
with the help of Google Maps Voice Assistant [5].

5 Proposed Solution

The proposed system is designed to automatically turn the indicator on/off and completely
eliminate the manual operation during overtaking and lane changes. Currently, the

proposed system focuses only on two-way traffic. The system draws input from
various devices and sensors.
The framework comprises three segments:
1. Camera data source
2. Steering angle data source
3. Ranging sensor.
When the vehicle (A) leaves its current lane, it crosses a lane line with or without
a second vehicle (B) in proximity. The automatic activation may occur when the
system processes data from the device on the first vehicle to determine whether or
not it crosses the lane line [6]. If a lane crossing by the first vehicle (A) is detected,
the turn signal will be activated.
In the case of an unmarked lane
The automatic activation may occur when the system processes data from the
device on the first vehicle to determine whether any vehicle (B) is in front of it and
whether the distance between the two vehicles is decreasing; if so, the turn signal
will be activated.
For Prior Indication
The velocity difference is inversely proportional to the distance. For example, with
a relative velocity between A and B of VB − VA = 40 km/h, the distance between
A and B decreases, so there is a higher chance of an overtake (Fig. 2). A simplified
sketch of this activation logic follows Fig. 2.

Fig. 2 Relative velocity
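A simplified sketch of the activation decision described above is shown below; the distance threshold and the sensor interface are assumptions made purely for illustration and not part of the implemented system.

def should_activate_indicator(lane_crossing, dist_now, dist_prev,
                              rel_velocity_kmph, gap_threshold_m=15.0):
    # Marked lane: activate when the camera detects the vehicle crossing a lane line
    if lane_crossing:
        return True
    # Unmarked lane / prior indication: a vehicle is ahead, the gap is shrinking
    # and the relative velocity is positive, so an overtake is likely
    closing = dist_now < dist_prev
    return closing and rel_velocity_kmph > 0 and dist_now <= gap_threshold_m

# Example: B approaches A at 40 km/h and the gap has shrunk from 18 m to 12 m
print(should_activate_indicator(False, 12.0, 18.0, 40.0))   # True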

The introduction of the Autopilot [7] mode enhances the safety and comfort features of the
vehicle. Autopilot is designed to support the driver with the most burdensome parts
of driving. Autopilot adds new features over time to make the Tesla safer and more
reliable and improves current functionality. Autopilot allows the car to automatically
steer, accelerate and brake within its lane. Present Autopilot functions require active
supervision by the driver.
Block Diagram
Figure 3 shows a flowchart of an exemplary SGF embodiment.

Fig. 3 Process flow

6 Experimental Results

A system block diagram was proposed, and software for lane detection and tracking
was designed using OpenCV (an AI tool) [6]; a sketch of such a lane detector is given
after Fig. 4. The blinkers are automatically switched on/off by the controller, which
requires a program (code) that was also designed for the implementation (Fig. 4).

Fig. 4 Lane detection and tracking
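A minimal sketch of lane-line detection with OpenCV in the spirit of [6], using Canny edge detection followed by a probabilistic Hough transform, is given below; the parameter values are illustrative assumptions, not the tuned values of the implemented system.

import cv2
import numpy as np

def detect_lane_lines(frame):
    # convert to grayscale, blur and extract edges
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(cv2.GaussianBlur(gray, (5, 5), 0), 50, 150)
    # keep only the lower half of the image, where the road normally appears
    mask = np.zeros_like(edges)
    mask[edges.shape[0] // 2:, :] = 255
    roi = cv2.bitwise_and(edges, mask)
    # the probabilistic Hough transform returns candidate lane-line segments
    lines = cv2.HoughLinesP(roi, 1, np.pi / 180, threshold=50,
                            minLineLength=40, maxLineGap=20)
    return [] if lines is None else [l[0] for l in lines]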

The system thus helps to avoid the situation in which almost 50% of drivers fail to
use indicators while changing lanes or overtaking another vehicle.

7 Conclusion and Future Work

Compared to the conventional system, the proposed system is efficient and completely
removes the need for manual intervention when an indication of a turn is required.
Through this, we may prevent accidents. The project's future work is to extend the
system to all types of vehicles and to all-way traffic, to switch to alternate vision
solutions, and then to test it practically.

Acknowledgements We would like to express our sincere thanks and gratitude to our college
management for providing excellent infrastructure, laboratory and computing facilities to complete
this research work successfully.

References

1. https://www.igi-global.com/dictionary/intelligent-system/15045
2. Yusuf, M.M., Karim, T., Saif, A.S.: A robust method for lane detection under adverse weather
and illumination conditions using convolutional neural network. In: Proceedings of the Inter-
national Conference on Computing Advancements, pp. 1–8 (2020)
3. http://www.foxbusiness.com/features/2012/05/04/half-drivers-dont-use-turn-signals
4. Ponziani, R.: Turn signal usage rate results: A comprehensive field study of 12,000 observed
turning vehicles. In: SAE Technical Paper. SAE International (2012). https://doi.org/10.4271/
2012-01-0261
5. Divakar, A., Krishnakumar, S., et al.: Automatic vehicle turn indicator using speech recognition.
Int. J. Recent Technol. Eng. (IJRTE) 8, 6697–6700 (2019)
6. https://towardsdatascience.com/tutorial-build-a-lane-detector-679fd8953132
7. https://www.tesla.com/autopilot
Comparison Between CNN and RNN
Techniques for Stress Detection Using
Speech

Bageshree Pathak, Snehal Gajbhiye, Aditi Karjole, and Sonali Pawar

Abstract The profession of maintaining law and order is not an easy task; it is an
inherently stressful job. Due to an increase in crime, policemen's working hours have
also increased, resulting in poor psychological health and an increased risk of suicide.
Hence, we are building software for the detection of stressed and non-stressed speech
of policemen. We propose to develop a system for the Central Police Research (CPR)
department using machine learning techniques. We identify whether a person is in a
stressed or non-stressed condition using the Python language. We use two techniques,
Recurrent Neural Network (RNN) and Convolutional Neural Network (CNN), to detect
stress in speech.

Keywords Police · Machine learning · Feature extraction · Supervised learning ·


NN · MFCC · CNN · RNN

1 Introduction

Speech is an expression of ideas and thoughts using articulate vocal sounds. Stress is
a mental, physical, or emotional factor which causes mental or bodily tension. In this
research work, we use machine learning techniques to determine whether an
individual is in a stressed or non-stressed condition, given an audio recording. The database
is generated in two ways for this research work: a database generated at the

B. Pathak · S. Gajbhiye · A. Karjole (B) · S. Pawar


Department of Electronics and Telecommunications, MKSSS’s Cummins College of Engineering
for Women, Pune, India
e-mail: aditi.karjole@cumminscollege.in
B. Pathak
e-mail: bageshree.pathak@cumminscollege.in
S. Gajbhiye
e-mail: snehal.gajbhiye@cumminscollege.in
S. Pawar
e-mail: Sonali.Pawar@cumminscollege.in


CPR department and voice samples recorded from the Internet. In the training phase,
the recorded samples need to be converted into an appropriate format and provided to the
preprocessor for applying different processing techniques such as noise reduction,
silenced voice removal, etc.
The preprocessing output is given to feature extraction. We have used mel-frequency
cepstral coefficients; the representation of the power spectrum of speech on the mel
scale is called the Mel-Frequency Cepstrum (MFC). MFCC is an efficient technique for
feature extraction, and the extracted features are further given to the supervised learning
algorithms, namely the CNN and RNN techniques. CNN generates a fixed-size output by
taking a fixed-size input. RNN, on the other hand, can handle arbitrary input/output
lengths, but would typically require much more data compared to CNN because it is more complex.

2 Literature Survey

Kouzani and Kipli [1] present depression detection using MRI. To find whether a
person is in a depressed or normal condition, the brain's structural MRI and its
volumetric features have been investigated to determine the features that contribute
to more accurate depression detection. It gives an accuracy of up to 80%,
but the drawback of this existing system is that it is costly.
Lee and Kan [2] have researched depression detection using EEG. In a recent study
on this topic, electroencephalography (EEG) is used for analyzing brain waves, as the
brain's electrical activity is measured by EEG. They obtained an accuracy of about
70%, but as a drawback, people usually do not prefer to undergo EEG for the
detection of depression; even though it gives good accuracy, people do not prefer
this process.
In [3], the main target of the review was to discover the occurrence
of stress, anxiety or depression in patients having common pathologies affecting the
voice. The pathologies focused on were MTD and PVFMD because of their presumed
connection to the mental condition of patients.
In [4], the feature extraction models speech as the impulse response of the vocal tract
convolved with the glottal excitation source signal and the fundamental frequency;
IAIF removes the vocal tract effects, and the SWIPE algorithm is used as a feature
selection method. The technique used for classification is based on stressed and
neutral PDF curves. The features extracted in the paper are formants, BFCC, PLP, MFCC,
and energy. If this existing system had used neural network algorithms, that would
surely have added to its accuracy.
Alghowinem [5] uses speech signal datasets and extracts linguistic and acoustic
features such as energy, intensity, loudness, jitter, HNR and MFCC, followed
by the support vector machine classification technique. The databases used are the Berlin
database, the eNTERFACE database, and an expressive speech database. The databases
used for developing this system are highly suitable.
In [6], the ORI-DB database is used with spectral feature extraction and a Support
Vector Machine (SVM) classifier. The accuracy obtained is between 80 and 84.5%,
due to prior knowledge of the emotions considered with the help of speech processing.
In this system, higher accuracy is achieved because of the known dataset samples.
Hence, the drawback of this work is that only known samples of the dataset can be
processed; the system is not efficient for unsupervised techniques.

3 Database Generation

The database is generated in two ways for this research work.

3.1 Database Generated at CPR

The database is generated at the CPR department in collaboration with Cummins College
of Engineering and SNDT Arts and Commerce College. We visited the police training
center to collect speech samples and took around 50 voice samples of
officers, both stressed and non-stressed. We also provided questionnaires to all
officers in the form of a Google form. Based on their answers to the questionnaires, the
psychology department of SNDT validated the stressed and non-stressed speeches.
This validation was done for a better understanding of the database and to cross-verify
the result after getting the output as stressed or non-stressed.

3.2 Recording Speech Samples

We have collected stressed and non-stressed speech samples from media coverage
and YouTube videos. Considering recent incidents such as the nationwide
COVID-19 pandemic situation, acid attack survivors' speeches, and victims of various
judicial cases, the stressed speech samples were collected from such situational
videos. The non-stressed speech samples were taken from family members and
relatives, as they were in a non-stressed phase.

4 Methodology

See Fig. 1.

Fig. 1 Block diagram

Fig. 2 Plot for speech signal

4.1 Signal Pre-processing

Audio channels, sample rate and bit depth are the audio properties that need preprocessing.
The Librosa package in Python is used for audio processing. In this research
work, we used Librosa's load() function for preprocessing. It has a default sampling
rate of 22.05 kHz; it normalizes the data and flattens the audio channels to mono. Figure 2
shows a speech signal after preprocessing. The duration of each audio sample is set to 3 s;
a sketch of this step is given below.
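The following is a sketch of this preprocessing step with Librosa; the zero-padding of shorter clips to a fixed 3 s length is an assumption made for illustration, consistent with the fixed frame count reported in Sect. 4.2.

import librosa
import numpy as np

SAMPLE_RATE = 22050      # librosa.load() default sampling rate
DURATION_S = 3           # each sample is limited to 3 seconds

def load_clip(path):
    # load() resamples to 22.05 kHz and mixes the channels down to mono
    signal, _ = librosa.load(path, sr=SAMPLE_RATE, duration=DURATION_S)
    # pad shorter clips with zeros so every clip has the same length
    target_len = SAMPLE_RATE * DURATION_S
    if len(signal) < target_len:
        signal = np.pad(signal, (0, target_len - len(signal)))
    return signal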

4.2 Feature Extraction

In this research work, we have used Mel-frequency cepstral coefficients. The representation
of the power spectrum of speech on the mel scale is called the Mel-Frequency Cepstrum (MFC).
MFCC is considered an efficient technique for feature extraction, and the extracted coefficients
are further given to the supervised learning algorithms, namely the CNN and RNN techniques.
Figure 3 shows the plot for the MFCC data. We have taken 13 MFCC coefficients per
frame for our dataset, and there are a total of 259 frames for each audio sample;
a sketch of this extraction is given below.

Fig. 3 Plot for MFCC data
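A sketch of the MFCC extraction with Librosa follows; the hop length of 256 samples is an assumption chosen so that a 3 s clip at 22.05 kHz yields roughly the 259 frames of 13 coefficients mentioned above.

import librosa

def extract_mfcc(signal, sr=22050, n_mfcc=13, hop_length=256):
    # returns an array of shape (n_mfcc, n_frames); for a 3 s clip at 22.05 kHz
    # and hop_length = 256 this gives 13 coefficients over 259 frames
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc,
                                hop_length=hop_length)
    # transpose to (frames, coefficients) for the RNN/CNN models
    return mfcc.T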

4.3 Classification

The classification of stressed data and non-stressed data has been done using two
classifiers, RNN and CNN.

4.3.1 RNN

RNN is a supervised machine learning technique and one of the types of artificial
neural networks. Derived from feedforward neural networks, an RNN uses its internal
state to process variable-length sequences of inputs. The term RNN is used to denote
two classes of networks with a similar structure, one having infinite impulse response
and the other having finite impulse response.
Here, we have used Long Short-Term Memory (LSTM), which is an RNN architecture.
Feedback connections are present in LSTM. It can process single data points as
well as complete sequences of data like video or speech. An LSTM unit commonly
comprises a cell, an input gate, an output gate and a forget gate. The cell can remember
values over arbitrary time intervals, and the flow of data in and out of the cell is
controlled by the three gates. LSTM systems are suitable to process, classify, and make
predictions using time series data, as there can be unknown delays between significant
time series events. While training a traditional RNN, a vanishing gradient problem is
encountered; LSTM was developed to deal with this problem.
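A minimal Keras sketch of such an LSTM classifier over the 259 x 13 MFCC sequences is given below; the layer sizes and training settings are illustrative assumptions, not the exact architecture used in this work.

from tensorflow import keras
from tensorflow.keras import layers

def build_lstm_model(n_frames=259, n_mfcc=13):
    # one LSTM layer over the MFCC sequence followed by a binary output
    model = keras.Sequential([
        layers.Input(shape=(n_frames, n_mfcc)),
        layers.LSTM(64),
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"),   # stressed / non-stressed
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# model = build_lstm_model()
# model.fit(X_train, y_train, epochs=10, validation_data=(X_val, y_val))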

4.3.2 CNN

CNN is a supervised machine learning technique. A CNN involves input, output, and
multiple hidden layers. In the hidden layers, a series of convolution layers is present, and
the ReLU layer is normally used as the activation layer. There are additional layers
such as fully connected, pooling, and normalization layers; these are also called
hidden layers. The activation function and final convolution are used to mask the inputs
and outputs of the hidden layers. In the convolutional layer, stride, depth, and zero padding
are the three hyperparameters which control the size of the output.
The formula to calculate the number of neurons which will fit in a given volume is

[(i − k + 2p)/s] + 1

i input size
k kernel field size of the convolutional layer neurons
p zero padding, and
s stride.
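As a worked example of this formula (with illustrative values, not the network used in this work): for an input of size i = 259, kernel size k = 5, zero padding p = 0 and stride s = 1, the output size is (259 − 5 + 0)/1 + 1 = 255. A small helper makes this explicit:

def conv_output_size(i, k, p, s):
    # number of neurons along one dimension of the convolutional layer output
    return (i - k + 2 * p) // s + 1

print(conv_output_size(259, 5, 0, 1))   # 255
print(conv_output_size(259, 5, 2, 2))   # 130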

5 Results

The model loss indicates how poor the model's predictions are in a single epoch. In the
case of RNN, we have taken 10 epochs, and for CNN, 50 epochs have been taken. If the
model's predictions match the true labels during validation, the loss will be close to zero;
otherwise, the loss will be higher. The model loss on training and testing data for RNN and
CNN is shown in Figs. 4 and 5, respectively, with the epoch on the X-axis and the loss on
the Y-axis.
We used 104 speech samples to train the models. For RNN, we obtained 85.58%
accuracy, and for CNN, we obtained 81.73% accuracy.
The confusion matrix is a table that describes the performance of a classifier
using the results on validation data. The confusion matrices shown in Tables 1 and 2
for the validation data are obtained from the two classifiers, RNN and CNN, respectively.
The models' accuracy can be verified from the confusion matrix tables.

Fig. 4 Model loss of RNN algorithm

Fig. 5 Model loss of CNN algorithm

Table 1 Confusion matrix for RNN

             Stress   Non-stress
Stress       51       8
Non-stress   7        38

Table 2 Confusion matrix for CNN

             Stress   Non-stress
Stress       54       5
Non-stress   14       31

6 Conclusion

In this research work, we have developed software to detect whether a person is
under stress or not. Our research work is completely dedicated to the police department.
For this research work, we used Python 3.7 and the Spyder IDE to implement
the Python code.
We generated the database by two means: firstly from the CPR department's trainee
officers, and secondly by using media coverage and YouTube videos for stressed
and non-stressed speech samples. The database collected from the CPR
department has been verified by the psychology department of SNDT college.
We have used the CNN and RNN artificial neural network techniques to get the final
result. The testing accuracy obtained by the CNN technique is 81.73%, and by the RNN
technique it is 85.58%. Therefore, from the results of both techniques, we conclude
that RNN has greater accuracy than CNN for our database.

Acknowledgements We would like to express our special thanks and gratitude to the Central Police
Research (CPR) Department, which gave us the opportunity to work on this wonderful project and to
contribute toward the betterment of the health of police officials.

References

1. Kouzani, A.Z., Kipli, K.: Evaluation of feature selection algorithms for detection of depression
from brain SMRI scans. Adv. Comput. Sci. Appl. Technol. (ACSAT) (2013)
2. Lee, P.F., Kan, D.P.X.: Decrease alpha waves in depression: an electroencephalogram (EEG)
study. In: International Conference on Biosignal Analysis, Processing and Systems (ICBAPS)
(2015)
3. Dietrich, M., Abbott, K.V., Gartner-Schmidt, J., Rosen, C.A.: The frequency of perceived stress,
anxiety, and depression in patients with common pathologies affecting voice. J. Voice 22(4)
(2008)
4. Simantiraki, O., Giannakakis, G., Pampouchidou, A.: Stress detection from speech using spectral
slope measurement. Pervasive Comput. Paradig. Mental Health (2016)
5. Alghowinem, S.: A comparative study of different classifiers for detecting depression in speech:
multi classifier system. In: IEEE International Conference on Acoustics, Speech and Signal
Processing (2013)
6. Stolar, M.N., Lech, M., Allen, N.B., Stolar, S.J.: Detection of Adolescent depression from speech
using optimized spectral roll-off parameters. Biomed. J. 2, 10 (2018)
7. Fung, P., Zuo, X., Li, T.: A multilingual database of natural stress emotion. In: Proceeding of
the 8th International Conference on Language Resources and Evaluation (LREC’12) (2012)
8. Hawila, S., Tomba, K., Dumoulin, J., Khaled, O.A., Mugellini, E.: Stress detection through
speech analysis. In: Proceeding of the 15th International Joint Conference on e-Business and
Telecommunication (ICETE) (2018)
Finding the Kth Max Sum Pair
in an Array of Distinct Elements Using
Search Space Optimization

Deepak Ahire , Smriti Bhandari , and Kiran Kamble

Abstract The algorithm aims to find the Kth max sum pair of two indices of an
array of N (N ≥ 2) distinct elements [a1 , a2 , a3 , …, an ]. If the sum of values repre-
sented by the 2 indices of a single pair in array A is the same as that of any other pair,
i.e., if P(i, j) and P(m, n) are 2 distinct pairs and if (A[i] + A[j] = A[m] + A[n]),
then the pair containing the index which represents the maximum of all 4 values
represented by indices of the 2 pairs in the array obtains the highest priority, i.e., if
(A[m]>A[i]>A[n]>A[j]), then the pair containing the index m obtains the highest
priority. The purpose of this algorithm is to optimize the computation of recommendations
on real-time platforms. At the time of making a purchase on e-commerce
platforms, with millions of options available in the product catalog, the algorithm
can be used to recommend the best complementary product that can be bought as a
pair with the main product, or two altogether different products of the same type as
the main product which can be bought as a combo or a pair. Not only the top
recommendations, but random recommendations are also necessary so that customers get
a good breadth or variety of the available products in the catalog. In this paper, we
propose an algorithm which can be used to address both scenarios in real time;
conclusively, it is evident that the time and space complexities are independent
of K.

All the authors have an equal contribution towards this work.

D. Ahire (B)
Walchand College of Engineering, Sangli, Maharashtra, India
e-mail: ahiredeepak20@gmail.com
S. Bhandari
Department of Computer Science and Engineering, Annasaheb Dange College of Engineering
and Technology, Ashta, Maharashtra, India
e-mail: smriti_bhandari@yahoo.com
K. Kamble
Department of Computer Science and Engineering, Walchand College of Engineering,
Sangli, Maharashtra, India
e-mail: kirankamble5065@gmail.com


Keywords Algorithm · k-max-sum-pair · Searching · Sorting ·


Product-recommendation · Real-time-searching · Search-space-optimization

1 Introduction

Searching is presently one of the most tedious tasks. With the amount of data increasing
every second, searching can consume a substantial amount of CPU time
depending upon the data organization and searching mechanisms used. It taxes not only
the processors but also the users performing online searches [1, 2]. More
importantly, regarding the retrieval of data in real time, in the case of an e-commerce
platform, for example, Amazon found that a delay of a fraction of a second
can cost several percentage points of sales, as discussed in [3]. A Harvard Business
Review article discusses how the design of the product page also affects online sales [4].
In addition to processing time and page design, consumer demand is also
affected by the availability of substitutes and complements, as discussed in [5]. Besides
swiftly rendering customers' requirements, sales promotion is also important
to change their perception and purchasing behaviour [6]. One of the most important
incentives of sales promotion is product bundling, or combos [7]. Customers generally
tend to buy combos instead of one main product if they are offered at the same or a lower
price. According to the study discussed in [8], it was found that customers placed
a perceived value on combo meals, even if they would cost the same when choosing
items a la carte; people prefer combo meals even when there is no discount [8]. Results
reported in [9], on the basis of experiments, provided empirical evidence that customers
preferred bundles in circumstances where the searching cost was reduced by the
availability of combos as a choice. Customers expect a swift and graceful experience,
and humans are generally bad at choosing when plenty of options are available, as
described in [10]; therefore, loading all possible recommendations on the page is not a
feasible option, as it would also consume a lot of time. Instead, the
top K matching combos can be recommended to the customer, which is analogous to
the "Top N Video Ranker" technique used by Netflix as discussed in [11]. Not only
the top K matching recommendations, but completely random recommendations are
also useful for customers so that they get a good breadth or variety of the available
products in the catalog, as discussed in [11, 12]. The random recommendations not
only provide a good breadth of available products, but also act as a choice for customers
in terms of price, brand, current situation, publicity, and more. They can also
be used to promote new, popular, non-recent and non-popular items which would
not have been found by the users, as described in [13]. A customer may like the
recommended combos, but may reject them on the basis of price, as discussed in [14].
Customers also buy combos costing more than their preferred price limit if they
get better product quality, a better brand, or something extra which they might not have
considered while buying in the first place [15]. Therefore, computing and suggesting
random recommendations along with the top K recommendations is also crucial, taking
swiftness into account. In this paper, we propose an algorithm which can be used

to address both scenarios. At the time of making a purchase on e-commerce
platforms, with millions of options available in the product catalog, the algorithm
can be used to recommend the best complementary product that can be bought as
a pair with the main product, or two altogether different products of the same type
as the main product which can be bought as a combo or a pair. Once the customer
filters the products and adds the main product (the product of his/her choice) into the
cart, the algorithm proposed in this article comes into play. It has to suggest pairs
of products from a list or collection of relevant or similar products with respect to the
main product. This list or collection of relevant products is already computed by
the recommendation engine on the basis of several metrics such as the commonality index,
which represents the relatedness of an item to other relevant items [16]. The
list is not the same at all times, as the engine is constantly learning in the backend
considering several factors; but, for a particular instant of time, this list of relevant
products can be used in real time to suggest the pairs of products or combos. This
problem can be solved by finding the Kth Max Sum Pair in an array of distinct
elements. Here, for example, the elements of the array can represent the commonality
indices assigned to other products with respect to the main product. For a pair or
combo of related products, the commonality index can be computed as the summation
of the commonality indices of the individual products. Computing the Kth Max Sum
Pair in real time is tricky, as this task is both compute and space intensive, because
there can be millions of products related to the main one and the maximum number
of possible pairs or combos is Kmax = C(N, 2) = N(N − 1)/2, where N is the size of the
list of related products. This paper is organized as follows: Section 2 describes the
abbreviations used, an example to explain the problem statement and the related
work. The proposed approach is presented in Sect. 3 with an algorithm and complexity
analysis. The experimental setup, results and a detailed discussion of the results for the
implementation with different datasets are reported in Sect. 4. Finally, Sect. 5
provides the conclusion.

2 Problem Statement

2.1 Abbreviations

Table 1 lists the abbreviations used in this manuscript.

2.2 Example

The following is an example that aims to explain the use case:



Table 1 Abbreviations used in this manuscript


Abbreviation Definition
A Input array (0 based indexing is used)
MAX_SUM Sum of a pair having a maximum sum (A[N − 1] + A[N − 2]) after A is
sorted
MIN_SUM Sum of a pair having a minimum sum(A[0] + A[1]) after A is sorted
N Size of the array
P(i, j) Pair of indices i and j, where j > i, i < N, and j < N
P-QUE Priority queue for holding pairs P(i, j) and obeying PRI-1 and PRI-2
PRI-{i} Priority for maintaining pairs in the queue. Here i represents the priority
number
PRI-1 Non-increasing order for a pair sum
PRI-2 If the sum of the values represented by the 2 indices of a single pair in array A is
the same as that of any other pair, for example, if P(i, j) and P(m, n) are 2 distinct
pairs and (A[i] + A[j] = A[m] + A[n]), then the pair containing the index which
represents the maximum of all 4 values represented by the indices of the 2 pairs
in the array obtains the highest priority. For example, if
A[m] > A[i] > A[n] > A[j], then the pair containing the index m obtains
the highest priority
S Set for holding unique pairs P(i, j)
TARGET_PAIR The required Kth pair

Consider an array of distinct elements, A = {1, 2, 3, 4}, which represents the list
of commonality indices of the products. The maximum value, 4, belongs to the main
product and the other three belong to the related products.
Thus, the possible set of pairs (representing pairs of indices of the array A, sorted
according to PRI-1 and PRI-2, as mentioned in Table 1) is {(2, 3), (1, 3), (0, 3),
(1, 2), (0, 2), (0, 1)}.
Notice that the 3rd pair and the 4th pair in the above ordered set have an
equal sum, i.e., (A[0] + A[3]) = (A[1] + A[2]), but the 3rd pair obtains the higher
priority as A[3] > A[2] > A[1] > A[0]. Therefore, if K = 3, then the answer is P(0,
3), which means that the combo of the main product and the product having a
commonality index of 1 (as A[0] = 1) stands 3rd in the list of combos with respect
to PRI-1 and PRI-2.
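For this small example, the ordering and the tie-break can be reproduced with a few lines of Python (an illustrative sketch, not part of the original method):

from itertools import combinations

A = [1, 2, 3, 4]                                  # commonality indices from the example
pairs = list(combinations(range(len(A)), 2))      # all index pairs P(i, j) with j > i
# PRI-1: non-increasing pair sum; PRI-2: among equal sums, the pair containing the
# larger element gets the higher priority (with distinct elements this matches Table 1).
pairs.sort(key=lambda p: (-(A[p[0]] + A[p[1]]), -max(A[p[0]], A[p[1]])))
print(pairs)         # [(2, 3), (1, 3), (0, 3), (1, 2), (0, 2), (0, 1)]
print(pairs[3 - 1])  # (0, 3), the answer for K = 3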

2.3 Related Work

The algorithm devised in this article is inspired by a similar use case based on two
arrays. The use case aims to find the first K maximum sum pairs from all the possible
sum pairs using the two given arrays, as discussed in [17–20]. For our scenario, we

need an approach that works for a single array. The naive approach is to compute
the set of all possible pairs P(i, j) and sort them according to PRI-1 and PRI-2. After
sorting, the first K maximum sum pairs are returned. There are N(N − 1)/2 distinct pairs that
can be formed from a list of N elements. Therefore, sorting takes
O((N(N − 1)/2) * log(N(N − 1)/2)) = O(N^2 * log(N)) time complexity and
O(N(N − 1)/2) = O(N^2) space complexity. A more
optimised approach is to limit the search space, as discussed in [17–20]. An identical
approach was used to devise an optimised algorithm that works for a single array,
provided in Algorithm 1.

Algorithm 1 Find_Kth_Max_Sum_Pair(A, K)
Finds the Kth Max Sum Pair in an array of distinct elements
Pre A is the array containing distinct elements, K is a constant
Post Array A is sorted
Return The TARGET_PAIR
1: Sort the array A in a non-decreasing order.
2: Enqueue the pair having MAX_SUM, i.e., P(N−2, N−1) into P-QUE.
3: Initialise temporary variable dequeue_count = 0.
4: Initialise S to an empty set (to avoid insertion of duplicate pairs into the P-QUE).
5: Loop( P-QUE is not empty and dequeue_count ≠ K−1 ) do
5.1: Dequeue the P-QUE front item (let it be P(i, j)).
5.2: Increment dequeue_count by 1.
5.3: Insert the dequeued pair P(i, j) into set S.
5.4: Enqueue new pair P(i−1, j), if not present in the set S, if i−1 ≥ 0 and (i−1) ≠ j.
5.5: Enqueue new pair P(i, j−1), if not present in the set S, if j−1 ≥ 0 and (j−1) ≠ i.
End Loop
6: Return P-QUE front (front item is the required TARGET_PAIR).
End Algorithm 1
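A minimal Python sketch of the steps above is given for illustration (the experiments in Sect. 4 use a C++ implementation); heapq is a min-heap, so keys are negated to obtain max-heap behaviour, and the set s here records pairs already enqueued, a slight variant of the set S in the pseudocode:

import heapq

def find_kth_max_sum_pair_v1(A, K):
    A = sorted(A)                                    # step 1: sort in non-decreasing order
    N = len(A)
    start = (N - 2, N - 1)                           # step 2: MAX_SUM pair
    # key = (-pair sum, -larger element) so the min-heap respects PRI-1 and PRI-2
    p_que = [(-(A[N - 2] + A[N - 1]), -A[N - 1], start)]
    s = {start}                                      # pairs already enqueued
    dequeue_count = 0                                # step 3
    while p_que and dequeue_count != K - 1:          # step 5
        _, _, (i, j) = heapq.heappop(p_que)          # step 5.1
        dequeue_count += 1                           # step 5.2
        for ni, nj in ((i - 1, j), (i, j - 1)):      # steps 5.4 and 5.5
            if ni >= 0 and nj >= 0 and ni != nj and (ni, nj) not in s:
                s.add((ni, nj))
                heapq.heappush(p_que, (-(A[ni] + A[nj]), -max(A[ni], A[nj]), (ni, nj)))
    return p_que[0][2] if p_que else None            # step 6: front of the queue

print(find_kth_max_sum_pair_v1([1, 2, 3, 4], 3))     # (0, 3)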

Rather than computing all the possible pairs, the focus is to generate only the first
K Max Sum Pairs. Each pair is enqueued once and also dequeued once from the
queue; therefore, for each pair there are 2 operations (enqueue into and dequeue
from P-QUE). Hence, for K pairs, the time complexity is equal to O(K
* 2 * log(K)), that is, O(K * log(K)), as the maximum number of pairs possible in this
case is of the squared order of the size of the input (K_max = N(N − 1)/2). The factor of
log(K) arises from the max-heap operations. Gerald Paul's O(1) time priority
queue [21] is significant in reducing the factor of log(K) and thus finally reducing
the time complexity to O(K). A. Mirzaian and E. Arjomandi devised an O(N) time
algorithm for a similar use case: selecting the Kth smallest element in a matrix
that is the Cartesian sum of 2 sorted vectors of real numbers, each of size N [22]. For
our scenario, we have to compute the Kth Max Sum Pair using a single array. For K
= 1, we can just find the maximum sum pair in the given array, and the pair can be
computed in O(N) time complexity and O(1) space complexity as discussed in [23].
The case K = N(N − 1)/2 is equivalent to finding the minimum sum pair in the given array,
and the pair can be computed in O(N) time complexity and O(1) space complexity
as discussed in [24].

Table 2 All pairs and corresponding pair sums from Sect. 2.2 mentioned earlier
Pairs   Corresponding pair sum   Number of pairs (having pair sum ≥ corresponding pair sum)
(2, 3) 7 1
(1, 3) 6 2
(0, 3) 5 4
(1, 2) 5 4
(0, 2) 4 5
(0, 1) 3 6
The pair sum is the sum of the values in array A at the indices represented by the pair, and the
corresponding pair sum is the pair sum of the pair mentioned in the respective rows

3 Proposed Approach

Even the optimized version, that is, Algorithm 1, has both time and space
complexities of the squared order of N (K_max = N(N − 1)/2). It would consume a substantial
amount of CPU time and hence lead to a greater user response time when applied
to a real-life scenario, where the basic size of the input starts at a value greater
than or equal to a million. Therefore, we devised an algorithm
to find the answer in time and space complexities that are independent of K.
Before we dive into the approach, the following facts are worth examining:
• We know that the TARGET_PAIR will have a pair sum, which we will call
TARGET_SUM.
• MIN_SUM ≤ TARGET_SUM ≤ MAX_SUM.
• Table 2 forms the basis of the development of the approach.
• The number of pairs with a pair sum greater than or equal to a given sum can be
computed in time complexity O(N * log(N)). This can be achieved by subtracting
the number of pairs with a pair sum less than the given sum from the total number of
possible pairs. The number of pairs with a pair sum less than the given sum can
be computed in time complexity O(N) for a sorted array, as discussed in [25]; for an
unsorted array, it takes an extra factor of log(N) to sort the array (a short sketch of this counting step is shown after this list).
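The counting step referred to above can be sketched as follows (illustrative only; it uses the standard two-pointer technique on a sorted array):

def count_pairs_at_least(a_sorted, x):
    # Pairs with sum < x are counted with two pointers in O(N); the pairs with
    # sum >= x are the remaining ones out of all N*(N-1)/2 possible pairs.
    n, less, lo, hi = len(a_sorted), 0, 0, len(a_sorted) - 1
    while lo < hi:
        if a_sorted[lo] + a_sorted[hi] < x:
            less += hi - lo      # every j in (lo, hi] gives a_sorted[lo] + a_sorted[j] < x
            lo += 1
        else:
            hi -= 1
    return n * (n - 1) // 2 - less

print(count_pairs_at_least([1, 2, 3, 4], 5))   # 4, matching Table 2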
The key is to find the greatest TARGET_SUM that follows Eq. 1.

Number of pairs(having pair sum ≥ TARGET_SUM) ≥ K (1)

Having computed the greatest TARGET_SUM, we know that there are at max-
imum N/2 such pairs possible that have a pair sum equal to the greatest computed
TARGET_SUM.
Thus, our new search space is now reduced to the size of N/2 at maximum.
The TARGET_PAIR lies in the newly generated search space.

We need an offset to find the Kth pair. We cannot directly return the Kth pair in
the new search space, so we need to subtract the count of pairs (having pair sum >
greatest computed TARGET_SUM).
Therefore, let us define a function, F(givenSum), which returns the number
of pairs having a pair sum ≥ givenSum.
Therefore, F(givenSum) = (number of pairs with pair sum ≥ givenSum). Then,

New Offset (K_New) = K − F(greatest computed TARGET_SUM + Δ)    (2)

Note that, in Eq. 2, a very small value (Δ) is added to the greatest computed
TARGET_SUM and then passed to the function F, as we want the count of pairs having
a pair sum strictly greater than the greatest computed TARGET_SUM.
Note: Δ can take any value as per the datatype of the input array elements.
For example, if the datatype of the array is integer, then Δ = 1, or, if the datatype
of the array is floating-point, Δ = 0.001. Finally, the K_New-th pair in the new search
space is the required TARGET_PAIR.

3.1 Proposed Algorithm

The proposed approach to solve the problem is provided in Algorithm 2.


Algorithm 2 Find_Kth_Max_Sum_Pair(A, K)
Finds the Kth Max Sum Pair in an array of distinct elements
Pre A is the array containing distinct elements, K is a constant
Post Array A is sorted
Return The TARGET_PAIR
1: Sort the array A.
2: Assign MIN_SUM = A[0]+A[1].
3: Assign MAX_SUM = A[N−2]+A[N−1].
4: Binary Search on TARGET_SUM (lower_bound = MIN_SUM, upper_bound = MAX_SUM, K):
4.1: Find the greatest TARGET_SUM, such that F(TARGET_SUM) ≥ K
End Binary Search on TARGET_SUM.
5: Generate the new search space having pairs that have a pair sum equal to the
greatest computed TARGET_SUM.
6: Calculate the New Offset (K_New) = K − F(greatest computed TARGET_SUM + Δ).
7: Return the K_New-th pair from the new search space.
End Algorithm 2
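The following Python sketch illustrates Algorithm 2 for integer-valued arrays (so Δ = 1); it is only an illustration of the steps above, since the reported experiments use a C++ implementation, and it reuses the two-pointer counting routine sketched in Sect. 3 as the function F:

def count_pairs_at_least(a_sorted, x):
    # F(x): number of pairs with pair sum >= x (same two-pointer routine as before).
    n, less, lo, hi = len(a_sorted), 0, 0, len(a_sorted) - 1
    while lo < hi:
        if a_sorted[lo] + a_sorted[hi] < x:
            less += hi - lo
            lo += 1
        else:
            hi -= 1
    return n * (n - 1) // 2 - less

def find_kth_max_sum_pair_v2(A, K, delta=1):
    A = sorted(A)                                   # step 1
    lo, hi = A[0] + A[1], A[-2] + A[-1]             # steps 2 and 3: MIN_SUM, MAX_SUM
    # Step 4: binary search for the greatest TARGET_SUM with F(TARGET_SUM) >= K.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if count_pairs_at_least(A, mid) >= K:
            lo = mid
        else:
            hi = mid - 1
    target_sum = lo
    # Step 5: new search space = all pairs whose sum equals target_sum, collected with
    # two pointers; decreasing j means a decreasing larger element, i.e. PRI-2 order.
    new_space, i, j = [], 0, len(A) - 1
    while i < j:
        s = A[i] + A[j]
        if s == target_sum:
            new_space.append((i, j))
            i, j = i + 1, j - 1
        elif s < target_sum:
            i += 1
        else:
            j -= 1
    # Step 6: K_New = K - F(target_sum + delta), i.e. subtract pairs with a larger sum.
    k_new = K - count_pairs_at_least(A, target_sum + delta)
    return new_space[k_new - 1]                     # step 7

print(find_kth_max_sum_pair_v2([1, 2, 3, 4], 3))    # (0, 3)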

3.2 Complexity Analysis

Time Complexity: For sorting the array, it takes O(N * log(N)), and for the binary search
and finding the greatest TARGET_SUM, O(N * log(MAX_SUM − MIN_SUM)).
Therefore, Time Complexity = O(N * log(N)) + O(N * log(MAX_SUM − MIN_SUM))
= O(N * max(log(N), log(MAX_SUM − MIN_SUM))).

Space Complexity: For the generation of New Search Space: O(N/2). Therefore,
Space Complexity = O(N).

4 Experimental Setup, Results and Discussion

Both the algorithms were implemented, and multiple tests were performed using the
environment mentioned in Table 3. Tables 4, 5, 6, 7, 8 and 9 describe the results of
the tests carried out on unsorted arrays containing N distinct elements having values
in the range (1, N). Test 1 is a comparison of the average runtimes of both algorithms.


Average Runtime = (Σ_{i=1}^{K} runtime to compute the ith pair)/K    (3)

For Algorithm 1,

Σ_{i=1}^{K} runtime to compute the ith pair = 2 * (1 * log(1) + 2 * log(2) + 3 * log(3) + · · · + K * log(K)) ≈ K^2 * log(K) ≈ O(N^4)    (4)

From Eqs. 3 and 4, it is evident that for Algorithm 1, the time complexity to compute
average runtime is O(N 4 ), therefore, we have limited the input size for comparison
to 200, to compute the average runtime in polynomial time. For Test 2, the average
runtime for greater values of N was computed by limiting K, that is 1 ≤ K ≤ N . In
Test 3, the average runtime was computed for first half of the range of values of K,
that is, 1 ≤ K ≤ (N ∗ (N − 1)/4), whereas for Test 4, it was for second half, that is,
(N ∗ (N − 1)/4) ≤ K ≤ (N ∗ (N − 1)/2). Test 5 was carried out for constant input
size of N = 104 , so that the average runtime could be computed for greater values
of K in polynomial time. Test 6 was carried out solely on Algorithm 2 for greater
values of both N and K.

Table 3 Testing environment for both algorithms


Environment type Value
Testing platform Google cloud
Machine type n1-standard-4
CPU platform Intel Haswell
OS image Debian GNU/Linux 10 (buster)
Implementation Language C++
RAM 15 GB
Number of vCPUs 4
Runtime calculation Using chrono library in C++

Table 4 Results for test 1


Input size (N)   Average runtime in seconds (Algorithm 1, Algorithm 2)   % Reduction in average runtime
2 0.000005 0.000005 0
10 0.000068 0.000006 91.176
15 0.000176 0.000010 94.318
25 0.000560 0.000024 95.714
50 0.002677 0.000070 97.385
75 0.006618 0.000135 97.96
100 0.012644 0.000185 98.537
105 0.013935 0.000192 98.622
110 0.015522 0.000204 98.686
120 0.018851 0.000223 98.817
130 0.022641 0.000243 98.927
150 0.031912 0.000342 98.928
200 0.059142 0.000470 99.205

Table 5 Results for test 2


Input size (N)   Average runtime in seconds (Algorithm 1, Algorithm 2)
200 0.000439 0.000467
400 0.000944 0.001175
1000 0.002737 0.003605
2000 0.005621 0.008752
5000 0.015746 0.029666
10000 0.034195 0.068591

Table 6 Results for test 3


Input size (N)   Average runtime in seconds (Algorithm 1, Algorithm 2)   % Reduction in average runtime
10 0.000031 0.000006 80.645
50 0.001219 0.000074 93.929
100 0.005690 0.000186 96.731
150 0.013735 0.000344 97.495
200 0.025961 0.000475 98.170
250 0.042503 0.000599 98.591

Table 7 Results for test 4


Input size (N)   Average runtime in seconds (Algorithm 1, Algorithm 2)   % Reduction in average runtime
10 0.000102 0.000007 93.137
50 0.003794 0.000071 98.129
100 0.017793 0.000186 98.955
150 0.044392 0.000344 99.225
200 0.084147 0.000473 99.438

Table 8 Results for test 5


K   Average runtime in seconds (Algorithm 1, Algorithm 2)   % Reduction in average runtime
10^2   0.001313   0.059648   −4442.879
10^3   0.006624   0.067545   −919.701
10^4   0.063618   0.067740   −6.479
10^5   0.732708   0.077039   89.486
10^6   9.744984   0.073913   99.242
10^7   126.241947   0.070026   99.945
K_max = (10^4) * (10^4 − 1)/2   717.129630   0.009545   99.999

Table 9 Results for test 6


Input size (N)   Runtime in seconds of Algorithm 2 for K = 10^6, K = 10^7, K = 10^8, K = 10^9, K = 10^10, K = 10^11
10^4   0.079911   0.075864   NA   NA   NA   NA
10^5   1.172436   1.148921   1.022144   1.113623   NA   NA
10^6   14.508822   14.440534   15.067522   16.353549   15.003332   15.336468

From Table 4, it is evident that, for N ≤ 200, there was more than a 90% reduction in
average runtime as compared to Algorithm 1. From Test 2 and Test 5, it is evident that
Algorithm 1 performs better than Algorithm 2 when 1 ≤ K ≤ N. From Tables 6 and
7, it is evident that the % reduction in average runtime for higher values of K is greater
than for lower values. For values of K ≥ N, Algorithm 2 outperforms
Algorithm 1. Table 8 depicts that, as K increases, the % reduction in runtime for
Algorithm 2 increases. Test 5 and Test 6 show that it is possible to compute
the pairs with lower priorities (higher in rank) in real time using Algorithm 2, giving
them a fair chance to be recommended to the customer while saving more than 89%
of the average CPU time as compared to Algorithm 1.

5 Conclusion

In this manuscript, we addressed the importance of both top K and random recommendations.
We discussed why the existing algorithms have a high response time and
cannot provide a fair chance to the pairs with lower priority. We proposed an optimized
algorithm that finds the Kth max sum pair in time and space complexities independent
of K and showed why it is a feasible real-time search option by carrying out
various tests for different values of N and K on an input array of commonality indices,
thus supporting a catalog of a million products.

References

1. Sohail, S., et al.: Product recommendation techniques for ecommerce—past, present and
future. Int. J. Adv. Res. Comput. Eng. Technol. 1(9), 219–225 (2012)
2. Gayle, L.: How Product Bundling Can Boost Your E-Commerce Sales. https://returnonnow.
com/2018/08/how-product-bundling-boost-ecommerce/ (2018)
3. Einav, Y.: Amazon Found Every 100ms of Latency Cost them 1% in Sales. https://www.
gigaspaces.com/blog/amazon-found-every-100ms-of-latency-cost-them-1-in-sales (2019)
4. Harmeling, C. et al.: How to Design Product Pages that Increase Online Sales. https://hbr.
org/2019/11/how-to-design-product-pages-that-increase-online-sales
5. Rousu, M., et al.: The effects of selling complements and substitutes on consumer willing-
ness to pay: evidence from a laboratory experiment. Can. J. Agric. Econ. Revue canadienne
d’agroeconomie. 56(2), 179–194 (2008)
6. Ai, W., Yazdanifard, R.: The review of how sales promotion change the consumer’s perception
and their purchasing behavior of a product. Glob. J. Manage. Bus. Res. E Mark. 15(5), 32–37
(2015)
7. Foubert, B.: Product Bundling: Theory and Application. University of Antwerp, Faculty of
Applied Economics, Working Papers (1999)
8. Sharpe, K., Staelin, R.: Consumption effects of bundling: consumer perceptions, firm actions,
and public policy implications. J. Pub. Policy Mark. 29(2), 170–188 (2010)
9. Harris, J., Blair, E.: Consumer preference for product bundles: the role of reduced search
costs. J. Acad. Mark. Sci. 34(4), 506–513 (2006)
10. Schwartz, B.: The Paradox of Choice. Harper Perennial, New York (2004)
11. Gomez-Uribe, C., Hunt, N.: The netflix recommender system. ACM Trans. Manage. Inf. Syst.
6(4), 1–19 (2016)
12. What the difference between global and random recommendations?. https://support.
shippingeasy.com/hc/en-us/articles/115005400683-What-the-difference-between-global-
and-random-recommendations
13. Hopfgartner, F.: News recommendation in real-time. In: Smart Information Systems: Com-
putational Intelligence for Real-Life Applications, pp. 169–170. Springer International Pub-
lishing (2015)

14. Zhao, Q., et al.: E-commerce recommendation with personalized promotion. In: Proceedings
of the 9th ACM Conference on Recommender Systems—RecSys ’15, pp. 19–226 (2015)
15. Shanthi, R.: Customer Relationship Management. MJP Publisher (2019)
16. Linden, G., et al.: Collaborative Recommendations Using Item-to-Item Similarity Mappings
(2020)
17. Agrawal, N., Sharma, S.: K maximum sum combinations from two arrays—GeeksforGeeks.
https://www.geeksforgeeks.org/k-maximum-sum-combinations-two-arrays/
18. Gangwar, A.: N Max Sum Pairs. https://discuss.codechef.com/t/n-max-sum-pairs/14769
19. Liu, S.: N Max Pair Combinations. https://shengqianliu.me/heaps-and-maps/n-max-pair-
combinations
20. K maximum sum combinations from two arrays—Tutorialspoint.dev—TutorialsPoint.dev.
https://tutorialspoint.dev/data-structure/heap-data-structure/k-maximum-sum-
combinations-two-arrays
21. Paul, G.: A complexity O(1) priority queue for event driven molecular dynamics simulations.
J. Comput. Phys. 221(2), 615–625 (2007)
22. Mirzaian, A., Arjomandi, E.: Selection in X + Y and matrices with sorted rows and columns.
Inf. Process. Lett. 20(1), 13–17 (1985)
23. Mittal, N.: Find the Largest Pair Sum in an Unsorted Array—GeeksforGeeks. https://www.
geeksforgeeks.org/find-the-largest-pair-sum-in-an-unsorted-array/
24. Ojha, D.: Smallest Pair Sum in an array—GeeksforGeeks. https://www.geeksforgeeks.org/
smallest-pair-sum-in-an-array/
25. Mittal, N.: Count Pairs in a Sorted Array Whose Sum is Less than x—GeeksforGeeks. https://
www.geeksforgeeks.org/count-pairs-array-whose-sum-less-x/
Dynamic Trade Flow of Selected
Commodities Using Entropy Technique

Sharmin Akter Milu, Javed Hossain, and Ashadun Nobi

Abstract The global entropy and uniformity of three major commodities that are
exported the most worldwide, as well as of all commodities combined, were observed from
1995 to 2018. It is found that the global entropy and uniformity of manufactured goods
chiefly classified by material are higher than those of the two other products, machinery and transport
equipment and crude materials, inedible, except fuels, whose values fluctuate. In 2018,
they fall remarkably for manufactured goods and for crude materials,
inedible, except fuels. Further, the local entropy and number of trade partners of the two
most influential countries in world trade, China and the USA, were investigated and compared
with the world's average local entropy and number of trade partners. It is seen that the local
entropy and trade partners of the two countries are much higher than the world's average
values, except for some early values of the local entropy of China. It is also observed that
when the local entropy and number of partners of both countries declined together, the world's
average values fell significantly.

Keywords International trade · Export · Global entropy · Uniformity · Trade partnership

1 Introduction

Economic transactions are made among the countries for the purpose of providing a
nation with commodities it lacks in exchange for those commodities that it produces
in abundance. This is called the export-import relationship or worldwide trade [1–4]

S. A. Milu · J. Hossain · A. Nobi (B)


Department of Computer Science and Telecommunication Engineering(CSTE), Noakhali Science
and Technology University, Sonapur, Noakhali 3814, Bangladesh
e-mail: ashadunnobi_305@yahoo.com
S. A. Milu
e-mail: sharminmilu7@gmail.com
J. Hossain
e-mail: javedhossain.nstu@gmail.com


relationship. It is established among nations depending on their economic, political,
and social relationships. It can be thought of as the driving force for the development of
the economic growth of a country. Globalization and regionalization have occurred
during the last two decades in terms of international trade. In recent years, by viewing
the international trade system as an interdependent complex network, a huge and
fast-growing body of literature has been built, where nodes represent countries and
edges represent trade relationships [5–11]. In the weighted network approach, each link carries
the trade value, which is known as its weight. The weight is the amount that a country
trades with a partner country in a year [12]. Trade volumes are also broadly distributed
[7, 13, 14]. To observe how the number of trade partners changes over time,
how trade values are distributed across the varying number of trade partners,
and how different the volumes are among the trading countries, a study has been
employed with the entropy technique [15].
In this paper, we have worked with the three main products that are exported extensively
and are associated with a large number of trade partnerships. The products are:
(1) manufactured goods chiefly classified by material, (2) machinery and transport
equipment, and (3) crude materials, inedible, except fuels. We also took the export
value aggregated over all products, which we name all commodity.
The aim of the study is to find the global entropy for these products all over the world
and how uniform the trade was. We also found the local entropy and the number of
trade partners for the countries most involved in international trade, China and the USA,
and compared them with the average local entropy and number of trade partners of the whole
world for the three individual products and all commodity.

2 Data Analysis

The trade data used here are from the United Nations (UN) COMTRADE database. We
studied the period from 1995 to 2018, about 24 years, for 168
countries. To build an international trade network, we constructed a matrix where
each cell contains the value that a country exported to a partner country.
If there is a trade relationship between two countries (e.g., France, India, Japan,
China, USA), the cell is set to the trade value, and otherwise to 0.
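For illustration only, such a matrix can be assembled from (exporter, importer, value) records roughly as follows; the country list and records shown here are hypothetical, whereas the study itself uses 168 countries and UN COMTRADE export values:

import numpy as np

countries = ["France", "India", "Japan", "China", "USA"]          # hypothetical subset
records = [("China", "USA", 4.8e11), ("USA", "China", 1.2e11),
           ("India", "USA", 5.0e10)]                              # (exporter, importer, value)

idx = {c: k for k, c in enumerate(countries)}
Y = np.zeros((len(countries), len(countries)))                    # Y[i, j]: exports from i to j
for exporter, importer, value in records:
    Y[idx[exporter], idx[importer]] = value                       # 0 where there is no trade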

3 Methods

3.1 Global Entropy

We have worked with export values for measuring the global entropy. We took
the values Y_ij(t) ≥ 0, the trade flow from country i to country j in year t.

We used the normalized value of the exported volume, y_ij(t) = Y_ij(t)/Y(t), where the
total trade value is Y(t) = Σ_{k,l} Y_kl(t).
Then, the global entropy for each commodity for a time period is determined as
[16–18]

S(t) = − Σ_{i,j} y_ij(t) log2 y_ij(t)

This equation provides information about which pairs of countries make a partnership
for trading a product. If the global entropy increases, it means
that the total number of trading country pairs is increasing. On the other hand, trade is
concentrated in only some specific pairs of countries if the entropy decreases.

3.2 Uniformity

Uniformity indicates how heterogeneous or homogeneous the trade was. To calculate
uniformity, we need the total number of trading pairs,
P(t) = Σ_{i,j: y_ij(t) > 0} 1. Then, the uniformity is the ratio U(t) = S(t)/log2 P(t).
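A minimal NumPy sketch of the global entropy and uniformity of one year's export matrix Y, following the definitions above (an illustration, not the authors' code):

import numpy as np

def global_entropy_and_uniformity(Y):
    # Y[i, j] is the export value from country i to country j for one year.
    flows = Y[Y > 0]
    y = flows / flows.sum()              # y_ij(t) = Y_ij(t) / Y(t)
    S = -(y * np.log2(y)).sum()          # S(t) = -sum_ij y_ij(t) log2 y_ij(t)
    P = flows.size                       # P(t): number of trading pairs
    U = S / np.log2(P)                   # U(t) = S(t) / log2 P(t)
    return S, U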

3.3 Local Entropy

The information amount of the trade partnership of a single country in a year is
calculated by

s_i(t) = − Σ_j f_ij(t) log f_ij(t)

The trade flux is normalized locally as f_ij(t) = Y_ij(t)/Y_i(t), where Y_i(t) = Σ_j Y_ij(t),
and then the local trade entropy of a country i in year t is calculated.
The world's average local entropy is calculated by

s_avg(t) = Σ_i s_i(t)/N

Here, N = 168 (the total number of countries considered).

3.4 Trade Partnership

The trade partnership of a country is the total number of countries with which that
country makes trade partnerships. We investigated the number of partnerships based on
export values for some specific countries:

p_i(t) = Σ_{j: y_ij(t) > 0} 1

Here, y_ij(t) represents the export value.
The average partnership of the world, K(t), is calculated over all the countries
considered in the study to compare it with a specific country. In this case,

K(t) = Σ_i p_i(t)/N
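The per-country quantities and their world averages can be sketched in the same way (illustrative only; the natural logarithm is used here because the local entropy above is written without an explicit base):

import numpy as np

def local_entropy_and_partners(Y):
    # Row i of Y holds the exports of country i to every other country.
    N = Y.shape[0]
    s = np.zeros(N)                           # s_i(t)
    p = np.zeros(N, dtype=int)                # p_i(t)
    for i in range(N):
        row = Y[i][Y[i] > 0]
        p[i] = row.size                       # number of export partners
        if p[i] > 0:
            f = row / row.sum()               # f_ij(t) = Y_ij(t) / Y_i(t)
            s[i] = -(f * np.log(f)).sum()     # s_i(t) = -sum_j f_ij(t) log f_ij(t)
    return s, p, s.mean(), p.mean()           # world averages s_avg(t) and K(t)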

4 Results

4.1 Global Entropy and Uniformity

We have calculated the global entropy and uniformity of the exported trade value for the
products manufactured goods chiefly classified by material; machinery and transport
equipment; crude materials, inedible, except fuels; and all commodities.
In Fig. 1a, we can see that the global entropy of all commodities and of manufactured
goods is higher than that of the two other products during the whole time period. This means
that manufactured goods are the most exported products, involving more trade
partners and, obviously, a higher trade value. The global entropy was gradually
increasing till 2007 and was almost constant afterwards, except for a sharp fall in the global
entropy of manufactured goods in 2018. The global entropy of crude materials is
higher than that of machinery and transport equipment till 2006, and then the two interchange
with some fluctuations. A sharp fall for crude materials is also seen
in 2018, as for manufactured goods.
In Fig. 1b, we show the uniformity of trade all over the world. Higher
uniformity means that the trade was homogeneous over the world; in other words,
the trade was evenly distributed, and the influence of any specific country
was smaller. On the other hand, lower uniformity means heterogeneity, or unevenly
distributed trade. As with the global entropy, a transition is seen in
the uniformity of both manufactured goods and crude materials in 2018.


Fig. 1 a Global entropy for all commodities; manufactured goods chiefly categorized by material;
machinery and transport equipment; crude materials, inedible, except fuels. b Uniformity for all
commodities; manufactured goods chiefly categorized by material; machinery and transport equipment;
crude materials, inedible, except fuels

4.2 Local Entropy

We calculated the local entropy for the two most influential countries in world trade, China
and the USA, and compared it with the average local entropy of the world. In Fig. 2a,
in 1995 the local entropy of manufactured goods chiefly classified by material for
China is at its lowest value; it then increased gradually over time and takes the
highest entropy value after 2010. A sharp fall is seen in 2018. In the case of
the USA, the entropy decreased till 2000 and then increased up to 2017. In 2018, the
entropy of the USA fell, as did that of China. The world's average local entropy
remains almost the same over the whole time period, but in 2018 it declined due to the fall
of both China and the USA.
For machinery and transport goods (Fig. 2b), the local entropy of China is increasing with small
fluctuations, and there is a rapid fall in 2018, while the entropy of the USA
increased from 2000 to 2006 and declined slightly from 2007. For this product,
there is no significant change in the world's average local entropy.
As for the two other products, for crude materials China's local entropy fell in
2018 (Fig. 2c). It was increasing with some significant fluctuations up to 2017 and fell
sharply in 2018. This causes a fall in the world's average value, as for the manufactured
products. On the other hand, the USA shows the opposite change in 2018, with an upward
transition, which means that the influence of China is higher than that of the USA for this product.
Finally, considering all the products combined (all commodity), we see
that the local entropy of China also fell drastically in 2018, which pulled down
the average local entropy of the world (Fig. 2d).

Fig. 2 Local entropy for a manufactured goods chiefly classified by material, b machinery and
transport equipment, c crude materials, inedible, except fuels, d all commodity

4.3 Export-Trade Partnership Analysis

In the trade partnership analysis, we took the two most influential countries in world trade,
China and the USA, to see their numbers of trade partners and compared them with the average
number of partners of the world, in the same manner as for the local entropy.
In Fig. 3a, we see that the numbers of partners of China and the USA are almost the same
over the whole time period, and there is a sharp fall to 75 for both countries for manufactured
goods. This affected the average partnership of world trade, as we see a downward transition
in the world average partnership, which means that the trade partnerships of China and the USA
have a very strong effect on world trade.
For machinery and transport goods (Fig. 3b), the partnership of China fell sharply in 2018.
For this reason, the average world partnership decreased slightly,
which means that it affected world trade in these products only a little.

Fig. 3 Trade partnership for a manufactured goods chiefly categorized by material, b machinery
and transport equipment, c crude materials, inedible, except fuels, d all commodity

As for the two other products, for crude materials China's number of partners fell
in 2018 (Fig. 3c). The number of partners of China was gradually increasing with
little fluctuation and fell sharply in 2018.
For all commodity, we see that the partnership of China also fell drastically,
which had a small impact on the average partnership of the world (Fig. 3d). We can
see a common change for China in every product in 2018, which affected world trade
slightly. But when the partnerships of both China and the USA fell, we see a great impact
on world trade, as for manufactured goods. We also found that China's fall in 2018 was
consistent across all products, whereas the fall of the USA in trade partnership occurred only
for manufactured goods.

5 Conclusion

The global entropy describes the trade relationship among countries, and the uniformity
represents how uniform the trade was for the considered individual products
and all commodity. Of the three products, we found that manufactured goods
had the higher global entropy and uniformity, while the two other products had
fluctuating values. From the local entropy and trade partnership analysis, we
found that the local entropy and partnership of China and the USA are much higher than the
world's average, except for some early years of the local entropy of China. Therefore, we can
say that the two countries have an impactful influence on world trade. We also noticed
that China's local entropy and trade partnership fell drastically in 2018 for almost all
products, which had a small effect on world trade, but when the local entropy and trade
partnership of China and the USA fell together, it affected world trade significantly, as
we have seen for manufactured goods in both the local entropy and the trade partnership.

Acknowledgement This work is fully funded and supported by ICT Division, Ministry of Posts,
Telecommunications and Information Technology, Bangladesh under the ICT fellowship scheme.

References

1. Eaton, J., Kortum, S.: Technology, geography and trade. Econometrica 70(5), 1741–1779 (2002)
2. Helpman, E., Melitz, M., Rubinstein, Y.: Estimating trade flows: trading partners and trading
volumes. Quart. J. Econ. 123(2), 441–487 (2008)
3. Rose, A.K.: Do we really know that the WTO increases trade? Am. Econ. Rev. 94(1), 98–114
(2004)
4. Foschi, R., Riccaboni, M., Schiavo, S.: Preferential attachment in multiple trade networks.
Phys. Rev. 90, 022817 (2014)
5. Serrano, M.A., Boguná, M.: Topology of the world trade web. Phys. Rev. E 68, 015101 (2003)
6. Garlaschelli, D., Loffredo, M.I.: Structure and evolution of the world trade network. Physica
A 355, 138–144 (2005)
7. Fagiolo, G., Reyes, J., Schiavo, S.: World-trade web: topological properties, dynamics, and
evolution. Phys. Rev. E 79, 036115 (2009)
8. Riccaboni, M., Schiavo, S.: Structure and growth of weighted networks. New J. Phys. 12,
023003 (2010)
9. De Benedictis, L., Tajoli, L.: The world trade network. World Econ. 34, 1417–1454 (2011)
10. Riccaboni, M., Rossi, A., Schiavo, S.: Global networks of trade and bits. J. Econ. Interac.
Coord. 8, 33–56 (2013)
11. Riccaboni, M., Schiavo, S.: Stochastic trade networks. J. Complex Netw. forthcoming (2014)
12. Fagiolo, G., Reyes, J., Schiavo, S.: The evolution of the world trade web: a weighted–network
analysis. J. Evol. Econ. 20, 479–514 (2010)
13. Bhattacharya, K., Mukherjee, G., Saramäki, J., Kaski, K., Manna, S.S.: The international trade
network: weighted network analysis and modelling. J. Stat. Mech. P02002 (2008)
14. Cha, M.-Y., Lee, J.W., Lee, D.-S.: Complex networks and minimal spanning trees in
international trade network. J. Korean Phys. Soc. 56, 998 (2010)
15. Oh, C.-Y., Lee, D.-S.: Entropy of international trades. Phys. Rev. E95, 052319 (2017)
16. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379 (1948)

17. Pulliainen, K.: Entropy measures for international trade. Swedish J. Econ. 72, 40 (1970)
18. Lei, H., Chen, Y., Li, R., He, D., Zhang, J.: Maximum entropy for the international division of
labor. PLoS ONE 10, e0129955 (2015)
An Automated Bengali Text
Summarization Technique Using
Lexicon-Based Approach

Busrat Jahan, Sheikh Shahparan Mahtab, Md. Faizul Huq Arif,
Ismail Siddiqi Emon, Sharmin Akter Milu, and Md. Julfiker Raju

Abstract There are enough resources for English to process documents and obtain
summaries. However, this is not directly applicable to the Bengali language, as
Bengali has a lot of complexity and differs from English in grammar and sentence
structure. Again, doing this for Bengali is harder, as there is no established tool to
facilitate research work, yet it is necessary because about 26 crore people use this
language. So, we have developed a new approach to Bengali document summarization.
Here, the system design is completed by preprocessing the input document,
tagging the words, replacing pronouns and ranking sentences, respectively.
Pronoun replacement has been added to minimize the rate of swinging pronouns
in the output summary. After pronoun replacement, we rank sentences according to
sentence frequency, numerical figures (in both digit and word form) and the document
title; whether a sentence has any word that also exists in the title is taken into
account. The similarity between two sentences is checked so that one of them can be
removed, which reduces redundancy. Numerical figures also make an impact, so they
were identified as well. We have taken over 3000 newspaper and book documents
whose words have been trained according to grammar, and two

B. Jahan · I. S. Emon · Md. Julfiker Raju
Department of CSE, Feni University, Feni, Bangladesh
e-mail: hossenbipasa980@gmail.com
I. S. Emon
e-mail: emonsahriar0@gmail.com
Md. Julfiker Raju
e-mail: julfikerar@gmail.com
S. S. Mahtab (&)
Department of EEE, Feni University, Feni, Chittagong Division, Bangladesh
e-mail: mahtabshahzad@gmail.com
Md. Faizul Huq Arif
Department of ICT(DoICT), ICT Division, Dhaka, Bangladesh
e-mail: arifict27@gmail.com
S. A. Milu
Department of CSTE, Noakhali Science and Technology University, Noakhali, Bangladesh
e-mail: sharminmilu7@gmail.com


documents have been checked by the designed system to evaluate the efficiency of
the summarizer. From the evaluation, it is found that the recall, precision and
F-score are 0.70 (70%), 0.82 (82%) and 0.74 (74%), respectively.

Keywords Text summarizer · BTS · Bengali · NLP · Python · Machine learning · POS tagging

1 Introduction

Text summarization is the process of summarizing a text or document. There are
many summarization tools for the English language, and there is also some work on
automated Bengali text or document summarization. From an application standpoint,
however, the existing tools do not seem very suitable. Summaries are categorized in two
ways: the extractive and the abstractive approach. Most of the summarizer methods
for Bengali text summarization are extractive [1]. In an automated text summarization
process, a text is delivered to the computer, and the computer returns a
less redundant extract or abstract of the original text(s). Text abstraction is the
process of producing an abstract or a summary by selecting a significant
portion of the information from one or more texts [1–3]. Thus, the overview
summarizes the meaning of the extracts, and sometimes extraction results in data
loss. These methods are also not able to create a plain text from related hierarchical
texts. Extractive summarization involves less complexity than abstractive summarization.
We can use grammatical rules in conjunction with mathematical rules for forming
sentences to decrease unnecessary errors. Again, it can be used for creating new
plain text from multiple texts, which makes it possible to reduce the size of the text
summary [4]. Rafel et al. state that the extractive summarizer meets all the basic
requirements; this method has three stages: text analysis, sentence ranking/scoring
and summarization [5].

2 Literature Review

The work on summarization of the Bangla language for a single document is
reviewed in this section. The area of Bangla text summarization began
several years back as a new research direction. Previously, most of the work in the text
summarization domain was done on the basis of sentence prohibition. A survey of
different text summarization techniques is presented in [1]; the authors
carried out an analysis of various methods for text and implemented an
extraction-based Bangla text summarizer.
The method proposed by Jones [4] provides a summary of a
text without reading the full text. The main steps of this method

are: (i) preprocessing, (ii) sentence scoring/ranking and (iii) summary generation.
It also uses term frequency (TF), inverse document frequency (IDF) and positional
value (PV).
The method presented by Haque et al. [5] summarizes Bangla documents by
using an extraction-based summarization technique. The four major steps of their
method are: (i) preprocessing, (ii) sentence scoring/ranking, (iii) sentence
clustering and (iv) summary generation.
Efat et al. [6] suggested an extraction-based summarization method that
acts on Bangla documents and is capable of summarizing a
single document. Their proposed method has two major steps: (i) preprocessing
and (ii) sentence scoring/ranking and summarization.
The method of Das and Bandyopadhyay [7] identifies sentiment from the
text, combines it and finally produces the text summarization.
They used a sentiment model to restore and integrate sentiment. The integration is
based on theme clustering (K-means) and document-level theme
relational graph algorithms, and the summary is finally generated from sentences selected
by the standard PageRank algorithm for information retrieval.

3 Suggested Method

To tag words successfully, we have employed two tagging systems: a general tagging
system and a special tagging system. The special tagging system makes
the result more precise and up to date.

3.1 General Tagging

Every word is tagged (as noun, pronoun, adjective, verb, preposition, etc.)
by using a lexicon database [2] and SentiWordNet [3]. The lexicon database and
SentiWordNet have a limited number of predefined words. Using the lexicon database,
words can be tagged as “JJ” (adjective), “NP” (proper noun), “VM” (verb),
“NC” (common noun), “PPR” (pronoun), etc. On the other hand, SentiWordNet has
a list of words with the tags “a” (adjective), “n” (noun), “r” (adverb), “v” (verb) and “u”
(unknown). Based on these predefined lists of words, we have experimented on 200
Bangla news documents and found that 70% of words can be tagged. Bangla words
(especially verbs) are very interesting [1]. Though we use word stemming to
identify the original form of a word, verbs cannot be stemmed with 100% accuracy. In
fact, it is very difficult to identify verbs because there are many suffixes in
Bangla. For example, based on the tense and person, the English word “do” may become
“doing”, “did” and “does”; the corresponding word may have many more
forms in Bangla. Consider the present continuous tense of “কর” (kor, do):
its three main forms depend only on whether the subject is in the first, second or third person.
It can be “করছি” (doing) for the first person, “করছ” (doing) for the second person
and “করছেন” (doing) for the third person, respectively. The forms of the verb for the
different Bangla words for “you” are also different, as in “আপনি করছেন” (you are
doing), “তুমি করছ” (you are doing) and “তুই করছিস” (you are doing), where all these
terms are in the present continuous tense and the second person. Thus, the word “কর” (do) may have
the given forms: “করে” (do), “করেন” (do), “করিস” (do), “করি” (do), “করছে”
(doing), “করছেন” (doing), “করছ” (doing), “করছিস” (doing), “করছি” (doing),
“করেছে” (did), “করেছেন” (did), “করেছ” (did), “করেছিস” (did), “করেছি” (did),
“করুক” (do), “করুন” (do), “করল” (did), “করলেন” (did), “করলে” (did), “করলি”
(did), “করলাম” (did), “করত” (do), “করতেন” (did), “করতে” (did), “করতিস” (did),
“করতাম” (did), “করতেছি” (doing), “করতেছ” (doing), “করতেছেন” (doing),
“করছিল” (doing), “করছিলেন” (doing), “করছিলে” (doing), “করছিলি” (doing),
“করছিলাম” (doing), “করেছিল” (doing), “করেছিলেন” (doing), “করেছিলে” (do-
ing), “করেছিলি” (doing), “করেছিলাম” (doing), “করবে” (do), “করবেন” (do),
“করবি” (do), “করব” (do), “করো” (do). Thus, the complexity of verbs in Bangla cannot
be compared with that of English. However, verb identification is very
important for language processing because the verb is the main word of a sentence.
A list of suffixes is considered for the final check, as follows: “ইতেছিস” (itechhis),
“তেছিস” (techhis), “ইতিস” (itis), “ইলে” (ile), “ইবি” (ibi), etc. Now, if the word has
such a suffix, it is tagged as a verb. The result of word tagging improved from
68.12% (before using the list of suffixes [4]) to 70% (after using the list of suffixes).
We get a preliminary tagging in this step; it may be updated in the next steps, and
certain words will be specifically tagged as acronym,
named entity, occupation, etc., in the next step [8–11].
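A minimal sketch of this final suffix check is given below; the suffix list contains only the examples quoted above (the full list and the surrounding tagger are not reproduced here), and the function interface is hypothetical:

VERB_SUFFIXES = ["ইতেছিস", "তেছিস", "ইতিস", "ইলে", "ইবি"]   # only the examples quoted above

def tag_verb_by_suffix(word, current_tag):
    # If the word is still untagged or unknown ("u") and ends with one of the listed
    # suffixes, tag it as a verb ("VM", the verb tag of the lexicon database).
    if current_tag in (None, "u") and any(word.endswith(s) for s in VERB_SUFFIXES):
        return "VM"
    return current_tag

print(tag_verb_by_suffix("করতেছিস", None))   # 'VM' (the word ends with the suffix "তেছিস")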

3.2 Special Tagging

After general tagging, special tagging is introduced to identify words as
acronyms, elementary forms, numerical figures, repetitive words, and names of occupations,
organizations and places.
1. Examining for English acronyms: When a word is formed from the initials of
other words, it is called an acronym, such as “ইউএনও” (UNO), “ওআইসি”
(OIC), “ইউএসএ” (USA). To examine these kinds of words, we
separate a word like “ইউএনও” (UNO) so that it matches “ইউ” (U), “এন” (N),
“ও” (O), matching every letter of the word. Actually, we can write all
English letters in Bangla, like A (“এ”), B (“বি”), C (“সি”), D
(“ডি”), …, W (“ডাব্লিউ”), X (“এক্স”), Y (“ওয়াই”), Z (“জেড”),
and if we sort them in descending order of their string lengths,
where W (“ডাব্লিউ”) is in the first place and A (“এ”) is in the last
place, we can then match every letter of the word. The descending order is important
and is always used to ensure the longest match; for example, “এম” (M) does not
match with “এ” (A), but it will match with “এম” (M). This experiment shows
a 98% success rate for this case (a small sketch of this matching is given after the list).
2. Studying for the Bangla elementary tag: Bangla letters separated by spaces, like “আ ক ম”
(A K M), “এ বি ম” (A B M), etc., are tagged with the Bangla elementary
tag. Based on our research, the accuracy of this elementary tagging is 100%.
3. Studying for recurrent words: Recurrent words are a special form of word
combination where the same word is placed twice consecutively. For
example, “ঠান্ডাঠান্ডা” (thandathanda—cold cold), “বড়বড়” (boroboro—big
big), “ছোটছোট” (chotochoto—small small), etc. Some words
are partially repeated, such as “খাওয়াদাওয়া” (khawadawa—eat). We have
found 100% accuracy in identifying recurrent/repetitive words.
4. Studying for numerical figures: Three conditions for recognizing the
numerical representation in words and digits are examined as follows:
(a) The first part of the word is formed by a digit, such as 0 (০), 1 for
(১), 2 for (২), …, 9 for (৯), or by a number word from “এক” (one), “দুই” (two), “তিন” (three), “চার”
(four) up to “নিরানব্বই” (ninety nine). The decimal point (.) is also considered
when examining the numerical form in digits.
(b) The next part (if any) is one of: “শত” (hundred), “হাজার” (thousand),
etc.
(c) Finally, it can have suffixes such as “টি” (this), “টা” (this), “এন” (en), etc.
After the experiment on our sample test documents, 100% of numerical forms
could be found in both digit and text form.
5. Studying for names of occupations: An occupation is a significant word and is very
helpful for human named entity identification, by which a named
entity can be recognized. If we find a word that denotes an occupation, we may
consider the immediately following words to find the named entity. We have
retrieved some entries for occupations in Bangladesh from a table, such as
“শিক্ষক” (shikkhok-master) and “সাংবাদিক” (sangbadik-journalist). Every word
is matched against these words (collected from different online sources),
and if a match is found, it is tagged as an occupation. Here, “শিক্ষক”
(shikkhok-master) can turn into “প্রধান শিক্ষক” (prodhanshikkhok-Head
master) and so on. From this study, 96% of occupations can be identified.
6. Studying for the name of an organization: The name of an organization is an important
factor, since any type of word may be an element of an organizational name. From
our analysis, the following cases are used:
(a) The complete name of the organization is followed by the acronym of the
name, given in parentheses. For example,
“দূর্নীতিদমনকমিশন (দূদক)” “Durniti Domon Commission (DUDOK), Anti
Corruption Commission (ACC)”.
(b) The last part of the organization name may contain certain words, such as
“লিমিটেড” (limited), “বিদ্যালয়” (biddaloy-school), “মন্ত্রণালয়”
(montronaloy-ministry), etc. If any such word is
present in the text according to point (b), then the three words next to that
particular word are immediately checked; when the words
are found to be a noun, a named entity or any blocked word, they are taken as the
organization's name. It is found that the organization's name can be recognized
on the basis of point (b) 85% of the time [13–20].
7. Studying for names of places: There is a table of the names of places of Bangladesh,
made up of 800 names covering divisions, districts, upazilas and municipalities.
In this area-based hierarchy, the top level is the division, the second level is the district,
and the third level is the upazila or municipality. In addition, we have analyzed
230 country names and their capitals. In this way, about 91% of place
names can be identified in our experiment.
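A small sketch of two of these checks, acronym matching (point 1) and recurrent words (point 3), is shown below; the letter map contains only the Bangla spellings quoted in the text, and the function names are illustrative:

# Bangla spellings of English letters quoted above; sorted by descending length so
# that, e.g., "এম" (M) is matched before the shorter "এ" (A).
LETTERS = {"ডাব্লিউ": "W", "ওয়াই": "Y", "এক্স": "X", "জেড": "Z", "ইউ": "U",
           "এন": "N", "এম": "M", "এস": "S", "আই": "I", "বি": "B", "সি": "C",
           "ডি": "D", "এ": "A", "ও": "O"}
ORDERED = sorted(LETTERS, key=len, reverse=True)

def as_acronym(word):
    # Greedy longest-match segmentation of the word into spelled English letters;
    # returns the acronym, or None if some part of the word is not a spelled letter.
    out, pos = [], 0
    while pos < len(word):
        for spelling in ORDERED:
            if word.startswith(spelling, pos):
                out.append(LETTERS[spelling])
                pos += len(spelling)
                break
        else:
            return None
    return "".join(out)

def is_recurrent(word):
    # Recurrent/repetitive word: the same word written twice in a row, e.g. "বড়বড়".
    half = len(word) // 2
    return len(word) % 2 == 0 and word[:half] == word[half:]

print(as_acronym("ইউএনও"))    # 'UNO'
print(is_recurrent("বড়বড়"))   # True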

4 Experimental Results

Sample input
Title: দুই ভাই-বোনের ময়না তদন্ত হয়েছে, মামলা হয়নি
Text: রাজধানীরবনশ্রীতেদুইভাইবোনেররহস্যজনকমৃত্যুরঘটনায়এখনোমা
মলাহয়নি।শিশুদেরবাবামামলাকরবেনবলেজানিয়েছেপরিবার।দুইশিশুরলাশেরময়ন
াতদন্তহয়েছে।তাঁদেরগ্রামেরবাড়িজামালপুরেলাশদাফনকরাহবে।খাবারেরনমুনা
পরীক্ষারফলাফলএখনোপাওয়াযায়নি।শিশুদের বাবা আমানউল্লাহর বন্ধু জাহিদুল
ইসলাম আজ মঙ্গলবার বেলা সোয়া ১১ টার দিকে প্রথম আলোকে এসব কথা
জানিয়েছেন।রামপুরাথানারভারপ্রাপ্তকর্মকর্তা (ওসি) রফিকুল ইসলাম বলেন,
এখনো মামলা হয়নি।পরিবারের পক্ষ থেকে আজ মামলা হতেপারে।জিজ্ঞাসা বাদের
জন্য চায়নিজ রেস্তোরাঁর ব্যবস্থাপক, কর্মচারী, পাচককে থানায় নেওয়া
হয়েছে।চায়নিজ রেস্তোরাঁ থেকে আগের দিন আনা খাবার গতকাল সোমবার দুপুরে
গরম করে খেয়ে ঘুমিয়ে পড়ে নুসরাত আমান (১২) ও আলভী আমান (৬)। এরপর তারা
আর জেগে ওঠেনি। অচেতন অবস্থায় হাসপাতালে নেওয়া হলে চিকিৎসকেরা তাদের
মৃত ঘোষণা করেন।পরিবারের অভিযোগের ভিত্তিতে পুলিশ জিজ্ঞাসাবাদের জন্য
ওই রেস্তোরাঁর মালিককে থানায় নিয়ে গেছে। নুসরাত ভিকারুননিসা নূন স্কুল অ্যান্ড
কলেজের পঞ্চম ও আলভী হলিক্রিসেন্ট স্কুলে নার্সারি শ্রেণির শিক্ষার্থী। তাদের
বাবা মো. আমান উল্লাহ ব্যবসায়ী ও মা জেসমিন আক্তার গৃহিণী। এই দম্পতির এই
দুটি সন্তানই ছিল। চায়নিজ রেস্তোরাঁ থেকে আগের দিন আনা খাবার গতকাল
সোমবার দুপুরে গরম করে খেয়ে ঘুমিয়ে পড়ে নুসরাত আমান(১২) ও আলভী আমান(৬)।
এরপর তারা আর জেগে ওঠেনি। অচেতন অবস্থায় হাসপাতালে নেওয়া হলে
চিকিৎসকেরা তাদের মৃত ঘোষণা করেন। পরিবারের অভিযোগের ভিত্তিতে পুলিশ
জিজ্ঞাসাবাদের জন্য ওই রেস্তোরাঁর মালিককে ওই দিনই থানায় নিয়ে গেছে।

Getting Summary of Sample


Title: দুইশিশুরলাশেরময়নাতদন্তহয়েছে।
Text:রামপুরাথানারভারপ্রাপ্তকর্মকর্তা (ওসি) রফিকুলইসলামবলেন,
এখনোমামলাহয়নি।
দুইভাইবোনেরময়নাতদন্তহয়েছে,মামলাহয়নিরাজধানীরবনশ্রীতেদুইভ
াইবোনেররহস্যজনকমৃত্যুরঘটনায়এখনোমামলাহয়নি।শিশুদেরবাবামামলাকর
বেনবলেজানিয়েছেপরিবার। পরিবারেরপক্ষথেকেআজমামলাহতেপারে। শিশুদের বাবা
আমানউল্লাহর বন্ধু জাহিদুল ইসলাম আজ মঙ্গলবার বেলা সোয়া ১১টার দিকে প্রথম
আলোকে এসব কথা জানায় ।
See Figs. 1 and 2.

4.1 Co-selection Measures

In co-selection measures, the principal evaluation metrics are [12]:
(i) Precision (P):
It is the number of sentences occurring in both the system-generated summary and the ideal
summary divided by the number of sentences in the system-generated summary.

Fig. 1 Sentence scoring of sample document



Fig. 2 Mean deviance of sample document

Precision (P) = (A ∩ B)/A

where A denotes the number of sentences obtained by the summarizer and B denotes
the number of relevant sentences in the target set.
(ii) Recall (R):
It is the number of sentences occurring in both the system-generated summary and the
ideal summary divided by the number of sentences in the ideal summary.

Recall (R) = (A ∩ B)/B

where A denotes the number of sentences obtained by the summarizer and B denotes
the number of relevant sentences in the target set.
(iii) F-measure:
The F-measure is the integrated measure that incorporates both precision and recall.

F-Score = (2 × P × R)/(P + R)

where P and R are the precision and recall defined above (a small sketch of these measures is given below).
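A small Python sketch of these co-selection measures over sets of summary sentences (illustrative; the sentence identifiers below are hypothetical):

def co_selection_scores(system_summary, ideal_summary):
    # A: sentences produced by the summarizer; B: sentences of the ideal summary.
    A, B = set(system_summary), set(ideal_summary)
    overlap = len(A & B)
    precision = overlap / len(A)
    recall = overlap / len(B)
    f_score = (2 * precision * recall) / (precision + recall) if overlap else 0.0
    return precision, recall, f_score

print(co_selection_scores({1, 2, 3, 5}, {1, 2, 4, 5, 6}))   # (0.75, 0.6, 0.666...)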
The evaluation results of the first ten documents are given in Table 1.

Table 1 Results of precision, recall and F-score
Document No.   Precision (P)   Recall (R)   F-score
1 0.84 0.71 0.76
2 0.79 0.72 0.75
3 0.82 0.69 0.74
4 0.82 0.68 0.74
5 0.79 0.71 0.74
6 0.82 0.73 0.75
7 0.78 0.72 0.73
8 0.85 0.70 0.75
9 0.85 0.71 0.76
10 0.84 0.71 0.76
Average score 0.82 0.70 0.74

5 Conclusion

We have developed an automatic Bengali document summarizer using Python as the
programming platform. There are enough resources for English to process documents and obtain
summaries, but this is not directly applicable to the Bengali language, as Bengali
has a lot of complexity and differs from English in grammar and sentence structure.
Again, doing this for Bengali is harder, as there is no established tool to facilitate
research work, yet it is necessary because about 26 crore people use this language.
So, we have developed a new approach to Bengali document summarization. Here, the
system design has been completed by preprocessing the input document, tagging the
words, replacing pronouns and ranking sentences, respectively. Pronoun replacement
has been added to minimize the rate of swinging pronouns in the output summary.
After pronoun replacement, we rank sentences according to sentence frequency,
numerical figures (in both digit and word form) and the document title; whether a
sentence contains any word that also exists in the title is taken into account. The
similarity between two sentences is checked so that one of them can be removed,
which reduces redundancy. Numerical figures also make an impact, so they were
identified as well. We have taken over 3000 newspaper and book documents whose
words have been trained according to grammar, and two documents have been checked
by the designed system to evaluate the efficiency of the summarizer. From the
evaluation, it is found that the recall, precision and F-score are 0.70 (70%), 0.82
(82%) and 0.74 (74%), respectively.

References

1. Radev, D.R., Hovy, E., McKeown, K.: Introduction to the special issue on summarization.
J. Comput. Linguist. 28(4), 399–408 (2002)
2. Hamou-Lhadj, A., Lethbridge, T.: Summarizing the content of large traces to facilitate the
understanding of the behaviour of a software system. In: Proceedings of the 14th IEEE
International Conference on Program Comprehension (ICPC), pp. 181–190. IEEE, (2006)
3. Hovy, E.: Automated text summarization. In: Mitkov, R. (ed.) The Oxford Handbook of
Computational Linguistics, pp. 583–598. Oxford University Press (2005)
4. Jones, K.S.: Automatic summarizing: factors and directions. In: Advances in Automatic Text
Summarization, pp. 1–12 (1999)
5. https://blog.frase.io/
6. Dongmei, A., Yuchao, Z., Dezheng, Z.: Automatic text summarization based on latent
semantic indexing. J. Artif. Life Robot. 15(1), 25–29 (2010)
7. Kunder, M.D.: The size of the world wide web. Online. Available. http://www.
worldwidewebsize.com. Accessed 15 Feb 2015
8. Chakma, R., et al.: Navigation and tracking of AGV in ware house via wireless sensor
network. In: 2019 IEEE 3rd International Electrical and Energy Conference (CIEEC), Beijing,
China, pp. 1686–1690 (2019). https://doi.org/10.1109/cieec47146.2019.cieec-2019589
9. Emon, I.S., Ahmed, S.S., Milu, S.A., Mahtab, S.S.: Sentiment analysis of bengali online
reviews written with english letter using machine learning approaches. In: Proceedings of the
6th International Conference on Networking, Systems and Security (NSysS ’19). Association
for Computing Machinery, New York, pp. 109–115 (2019). doi: https://doi.org/10.1145/
3362966.3362977
10. Ahmed, S.S., et al.: Opinion mining of Bengali review written with English character using
machine learning approaches. In: Bindhu V., Chen J., Tavares J. (eds.) International
Conference on Communication, Computing and Electronics Systems. Lecture Notes in
Electrical Engineering, vol. 637. Springer, Singapore (2020). https://doi.org/10.1007/978-
981-15-2612-1_5
11. Milu, S.A., et al.: Sentiment Analysis of Bengali reviews for data and knowledge engineering:
a Bengali language processing approach. In: Bindhu V., Chen J., Tavares J. (eds.)
International Conference on Communication, Computing and Electronics Systems. Lecture
Notes in Electrical Engineering, vol. 637. Springer, Singapore (2020). https://doi.org/10.1007/
978-981-15-2612-1_8
12. Munir, C., Ibrahim, K., Mofazzal, H.C.: Bangla VasarByakaran. Ideal publication, Dhaka
(2000)
13. Ferreira, R., de Souza Cabral, L., Freitas, F., Lins, R.D., de Frana Silva, G., Simske, S.J.,
Favaro, L.: A multi-document summarization system based on statistics and linguistic
treatment. Expert Syst. Appl. 41(13), 5780–5787 (2014)
14. Luhn, H.P.: The automatic creation of literature abstracts. IBM J. Res. Dev. 2(2), 159–165
(1958)
15. Foong, O.M., Oxley, A., Sulaiman, S.: Challenges and trends of automatic text summariza-
tion. Int. J. Inf. Telecommun. Technol. 1(1), 34–39 (2010)
16. Azmi, A.M., Al-Thanyyan, S.: A text summarizer for arabic. J. Comput. Speech Lang. 26(4),
260–273 (2012)
17. Karim, M.A., Kaykobad, M., Murshed, M.: Technical challenges and design issues in bangla
language processing. Published in the United States of America by Information Science
Reference (an imprint of IGI Global) (2013)
18. Islam, M.T., Masum, S.: Bhasa: a corpus based information retrieval and summarizer for
bengali text. In: Proceedings of the 7th International Conference on Computer and
Information Technology (2004)

19. Uddin, M.N., Khan, S.A.: A study on text summarization techniques and implement few of
them for bangla language. In: Proceedings of the 10th International Conference on Computer
and Information Technology (ICCIT-2012), pp. 1–4. IEEE (2007)
20. Sarkar, K.: Bengali text summarization by sentence extraction. In: Proceedings of International
Conference on Business and Information Management (ICBIM-2012), pp. 233–245. NIT
Durgapur (2012)
Location-Based Pomegranate Diseases
Prediction Using GPS

Rajshri N. Malage and Mithun B. Patil

Abstract In India, agriculture is a most important and essential sector that plays a major
role in the economy and in daily life. Different types of crops and fruits are cultivated in
the country. Pomegranate is one of the major commercial fruits grown in India, but it is
prone to many diseases caused by uneven climatic conditions. Weather forecasting
technology using GPS is therefore very important, effective, and beneficial for pomegranate
farmers to protect the plants from different diseases and maintain their health. In this
research paper, we have designed a system that predicts the weather at the plantation's
location in order to detect and forecast pomegranate diseases and provide prevention tips.
It gives an alert message to the cultivator, based on which he makes decisions.

Keywords Pomegranate diseases · Accuweather · Segmentation · Weather


forecast · Global positioning system (GPS)

1 Introduction

Pomegranate is one of the commercial horticulture products of India and many other
countries because of its high medicinal value across the globe. Many diseases affect
pomegranate cultivation, in terms of both plants and fruits, resulting in degradation of
the quality and quantity of fruit production, which in turn degrades the financial position
of agriculturists and affects human health; to overcome this problem, forecasting and
prediction of diseases in pomegranate is an important issue. We have therefore designed
an Android mobile application especially for pomegranate diseases. We use GPS to
identify the location of the farm and obtain the five-day weather forecast for
R. N. Malage (B) · M. B. Patil


Department of CSE, N K Orchid College of Engineering and Technology Solapur, Solapur, India
e-mail: Rajshrim57@gmail.com
M. B. Patil
e-mail: mithunpatil@orchidengg.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 375
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_40


Fig. 1 a Mars disease on pomegranate plant. b Bacterial blight on pomegranate plant

that farm; according to the weather situation, we forecast and predict the occurrence of
diseases and suggest remedies, which in turn prevents the plant from being affected.
Pomegranate cultivators can use the system/application to get information regarding
daily weather and changes in environmental conditions; the application mainly gives
information about diseases caused by weather changes, along with appropriate solutions
for those diseases. Generally, pomegranate plants need spray treatment based on humidity
and temperature (Fig. 1 shows bacterial blight on pomegranate fruit), so our application
forecasts diseases using the available weather information. Variation in weather conditions
requires spray treatment, as shown in Fig. 2; diseases caused by a changeable climate can
affect the whole pomegranate plant, which would be a huge loss for the cultivator.
According to pomegranate researchers, the plant requires about 16 months for healthy
growth, and if it is affected by any disease it can be destroyed within a few days. In
pomegranate cultivation, exact information for disease prediction is therefore an important
issue. Pomegranate is prone to many diseases such as bacterial blight and Mars disease, as
shown in Fig. 1, which reduce the yield and its medicinal importance.

2 Literature Review

With recent developments in information and communication technology (ICT), several
computer-based techniques have been made available to agricultural and horticultural
cultivators. Cultivators often cannot contact agricultural or horticultural experts because
of their distant availability, and they are unable to identify disease symptoms owing to
the complexity of disease patterns. In existing systems, images captured from plant leaf
surfaces can provide a solution in which remote agricultural experts instantly see the
image for disease diagnosis and extend advice to remote areas manually or by telephone.
Most diseases of these plants depend on environmental conditions, so we have designed
an application which forecasts the diseases

Fig. 2 Temperature and humidity

in pomegranate plants based on weather conditions. A few related works on forecasting
such diseases are discussed here. Pawara et al. [1] designed a system for detection of
pomegranate disease using machine learning algorithms and the Internet of Things; the
main goal of that work is to examine the whole pomegranate plant in order to avoid
diseases. Dubey and Jalal [2] proposed an image processing-based technique for
identification and detection of pomegranate disease in which image segmentation, local
binary patterns, color coherence vectors, k-means clustering, histograms, and complete
local binary patterns are used for extracting features. Islam et al. [3] proposed a method
that integrates machine learning and image processing to allow disease diagnosis from
leaf images; the authors combined image segmentation with a multiclass SVM to develop
an automated and easily accessible system. Bhange and Hingoliwala [4] explain
pomegranate disease detection using image processing (image processing algorithms and
the K-map algorithm); it is an accurate system that detects pomegranate plant diseases
from images only. Dhakate and Ingole [5] diagnose pomegranate plant diseases using
image processing and neural network algorithms to deal with the pathology issue, i.e.,
classification of diseases. Gaikwad and Karande [6] introduce detection of disease and
grading in pomegranate fruit using digital image processing, where image processing

is required for enhancing the images before fruit disease detection. Lamani et al. [7]
predicted plant diseases from weather forecasts using data mining. Plant disease
determination is both an art and a science; plant diseases are an essential problem that
lowers the quantity and reduces the quality of agricultural production. The proposed
system uses a segmentation technique such as k-means clustering together with deep
neural network learning to predict diseases based on weather features of the orange plant.

3 Proposed System

The proposed architecture for forecasting diseases in pomegranate is shown in Fig. 3; it
consists of different phases, as explained below.

3.1 Processing

In this phase, weather information such as humidity, temperature, wind speed, possibility
of rain, and duration of sunlight is collected, since these are the major parameters in
forecasting diseases of the pomegranate plant.

3.2 Segmentation

In this phase, the processed data is grouped together based on parameters, which enables
efficient calculation and analysis of the stored information; it also takes

Fig. 3 Pomegranate diseases forecasting system



care of the potential challenges in implementing appropriate data segmentation and
data quality tools with customer data validation.

3.3 Feature Extraction

This is an initial process in which dimensionality is reduced to a manageable group for
processing. In this phase, various features are selected and combined, which effectively
reduces the amount of data processed while still accurately and completely describing
the original dataset. Feature extraction reduces the resources needed without losing the
important information.

3.4 Preprocessing

In this phase, the data are normalized, relevant data are selected for processing, and
missing data are corrected so that unreliable data, noise, and irrelevant data are ignored
at processing time.

3.5 Testing

In general, testing means finding out how well something works, i.e., examining it
carefully to determine whether it is working properly. In this phase, testing of the
pomegranate disease forecast based on weather conditions is performed.

3.6 Classification

This is the phase of organizing items into groups based on their type, i.e., systematic
arrangement into groups or categories according to established criteria.

3.7 Detection

Plant detection is the process of matching a specimen plant to a known taxon. The ability
to identify plants gives access to many important pasture variables that are critical to
management, such as range condition, proper stocking rate, and wildlife habitat quality.
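To illustrate how such a forecast could be driven by location-based weather data, the sketch below retrieves a short-range forecast for a pair of GPS coordinates and applies simple humidity/temperature rules to raise alerts. It is only a minimal illustration under assumed thresholds and stubbed forecast values; the real application would query a weather service (e.g., AccuWeather, as named in the keywords) for the plantation's location, and the disease rules here are placeholders, not the authors' model.

```python
def fetch_forecast(lat, lon):
    """Return a short-range forecast for the given GPS coordinates.

    Stubbed with illustrative values; the real application would query a
    weather API for this location instead.
    """
    return [
        {"temp_c": 31, "humidity": 85, "rain_mm": 2,  "wind_kmh": 8},
        {"temp_c": 33, "humidity": 78, "rain_mm": 0,  "wind_kmh": 22},
        {"temp_c": 29, "humidity": 90, "rain_mm": 14, "wind_kmh": 10},
        {"temp_c": 39, "humidity": 40, "rain_mm": 0,  "wind_kmh": 5},
        {"temp_c": 30, "humidity": 82, "rain_mm": 1,  "wind_kmh": 12},
    ]

def disease_alerts(day):
    """Illustrative rules mapping weather to disease/spray advice (assumed thresholds)."""
    alerts = []
    if day["humidity"] > 80 and 25 <= day["temp_c"] <= 35:
        alerts.append("High bacterial blight risk: plan a protective spray")
    if day["rain_mm"] > 10 or day["wind_kmh"] > 20:
        alerts.append("Unsuitable for spraying: postpone until rain/wind subside")
    if day["temp_c"] > 38:
        alerts.append("Sun-scorch risk: consider covering fruit / shade nets")
    return alerts

def notify_cultivator(lat, lon):
    for i, day in enumerate(fetch_forecast(lat, lon), start=1):
        for alert in disease_alerts(day):
            print(f"Day {i}: {alert}")   # the real app would push a mobile notification

notify_cultivator(17.6, 75.9)            # illustrative coordinates near Solapur
```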

4 Result and Discussion

The system provides weather forecast information regarding humidity, temperature, wind
speed, precipitation, and sunlight. Three days of weather are displayed, so the farmer can
decide at what time to spray according to the precipitation and wind speed.
The weather forecast based on humidity, temperature, wind speed, precipitation, and
sunlight is shown in Fig. 4, and the information is notified to the cultivator. Based on this
information, the cultivator can make decisions about spraying as per the precipitation and
wind speed. Figure 5 shows the detailed forecasting of diseases based on weather, such as
Fulkidi, Ali, Pityadhekun, and Mava. These diseases reduce the quality and quantity of the
crop and lower the market price of pomegranate fruit, which creates a big impact on the
farmer's income.

Fig. 4 Weather forecast

Fig. 5 Diseases occurred



Figure 6 shows the temperature graph; temperature is a primary factor affecting the rate
of plant development, and a rise in temperature may affect the plant and reduce its
productivity. The temperature information helps the farmer protect the pomegranate fruit
from sun burn by covering the farm. Figure 7 shows the wind speed; this result helps the
farmer plan the daily spray and thus save money and spray material, since wind direction
and velocity have a significant influence on crop growth. Figure 8 shows the humidity,
i.e., the amount of water vapor in the air, which can also be used to predict rainfall.

Fig. 6 Temperature graph

Fig. 7 Wind speed graph



Fig. 8 Humidity graph

5 Conclusion

The forecasting of pomegranate plant diseases based on environmental conditions,
together with timely intimation to the cultivator, which is the main requirement for a high
pomegranate yield, is designed and implemented in this paper. The designed
system/application forecasts pomegranate plant diseases based on weather conditions by
considering the location of the plant obtained through GPS. The application considers
weather information such as temperature, humidity, wind speed, possibility of rain, and
duration of sunlight to detect pomegranate diseases, which in turn helps farmers/cultivators
increase the yield and quality of the fruit and the pomegranate plant.

References

1. Pawara, S., Navalem, D., Patil, K., Mahajan, R.: Detection of pomegranate disease using
machine learning and internet of things. In: IEEE 3rd International Conference for Convergence
in Technology (I2CT) (2018)
2. Dubey, S.R., Jalal, A.S.: Detection and classification of tomato vegetable diseases using
complete local binary patterns IEEE. In: Third International Conference on Computer and
Communication Technology, vol. 3, pp. 247–251 (2012)
3. Islam, M., Dinh, A., Wahid, K., Bhowmik, P.: Detection of potato diseases using image segmen-
tation and multiclass support vector machine. In: IEEE 30th Canadian Conference on Electrical
and Computer Engineering (CCECE) (2017)
4. Bhange, M., Hingoliwala, H.A.: Pomegranate disease detection using image processing.
Procedia Comput. Sci. 280–288 (2015)
5. Dhakate, M., Ingole, A.: Diagnosis of pomegranate plant diseases using neural network. In:
Fifth National Conference on Computer Vision, Pattern Recognition, Image Processing and
Graphics (NCVPRIPG) (2015)
6. Gaikwad, D.S., Karande, K.J.: Image processing approach for grading and identification of
diseases on pomegranate fruit: an overview. Int. J. Comput. Sci. Inf. Technol. 7(2), 519–522
(2016)
7. Lamani, S.B., Ravikumar, K., Jamal, A.: Pomegranate fruits disease classification with fuzzy
c mean clustering. Int. J. Adv. Eng. Res. Dev. 5(2) (2018)
8. Kaur, K., Kaur, M.: Prediction of plant disease from weather forecasting using data mining.
Int. J. Future Revolution Comput. Sci. Commun. Eng. 4(4) (2018)

9. Sowmya, G.M., Chandan, V., Kin, S.: Disease detection in pomegranate leaf using image
processing technique. Int. J. Sci. Eng. Technol. Res. (IJSETR) 6(3) (2017)
10. Li, Q., Wang, M., Gu, W.: Computer vision based system for tomato surface defect detection.
Comput. Electron. Agric. 36, 215–223 (2002)
11. Mehl, P.M., Chao, K., Kim, M., Chen, Y.R.: Detection of defects on selected tomato cultivars
using hyperspectral and multispectral image analysis. Appl. Eng. Agric. 18, 219–226 (2002)
12. Wang, Y., Cui, Y., Huang, G.Q., Zhang, P., Chen, S.: Study on vegetable quality inspection
based on its surface color in produce logistics. In: International Conference on Manufacturing
Automation (2010)
13. Chaerle, L., Lenk, S., Hagenbeek, D., Buschmann, C., Straeten, D.V.D.: Multicolor fluores-
cence imaging for early detection of the hypersensitive reaction to tobacco mosaic virus. J.
Plant Physiol. 164(3), 253–262 (2007)
14. Singh, V., Varsha, A.K.: Detection of unhealthy region of plant leaves using image processing
and genetic algorithm. In: 2015 International Conference on Advances in Computer Engi-
neering and Applications (ICACEA) IMS Engineering College, Ghaziabad, India
15. Chaudhary, M., Chavan, R., Durgawali, S., Ghodeswar, A.: Smart agriculture: detection of
disease in plants using image processing. In: International Conference on Innovative and
Advanced Technologies in Engineering
16. Mithun, P., Aishwarya, K., Nikita, S., Aishwarya, G.: Android based application for fruit quality
analysis. Int. J. Innovative Res. Sci. Eng. Technol. 12(6) (2016)
17. Doddaraju, P., Kumar, P., Gunnaiah, R., Gowda, A.A., Lokesh, V., Pujer, P., Manjunatha,
G.: Reliable and early diagnosis of bacterial blight in pomegranate caused by Xanthomonas
axonopodis pv punics sensitive PCR technique
18. Sharma, J., Sharma, K.K., Kumar, A., Mondal, K.K., Thalor, S., Maity, A., Gharate, R.,
Chinchur, S., Jadhav, V.T.: Pomegranate bacterial blight: symptomatology and rapid inoculation
technique for Xanthomonas axonopodis pv punicae. J. plant Pathol.
19. Jain, K., Desai, N.: Pomegranate the cash crop of India: a comprehensive review on agricultural
practices diseases. Int. Res. Health Sci. Res.
Medical Image Enhancement Technique
Using Multiresolution Gabor Wavelet
Transform

Kapila Moon and Ashok Jetawat

Abstract Medical images are applied for analysis and diagnosis of particular
medical disorder or diseases. Hence, the medical image enhancement technique is
necessary and challenging for further processing through computer vision systems.
It assists in further processing of medical images for segmentation, detection and
prediction of certain diseases such as cancer, tumor and any other disorder. Most
of the medical images obtained through various sources are dark and seem to be
noisy that requires efficient image enhancement technique that preserves the content
of the images. In this paper, the image enhancement technique through multireso-
lution Gabor wavelet transform is presented. Gabor wavelet transform has demon-
strated multiresolution capabilities with better texture enhancement that helps in
quality improvement in medical images. Experiments based on public dataset reveal
better performance with respect to qualitative and quantitative analysis. Experimental
results on several low illuminated medical images demonstrate best results in terms
of enhancement parameters and visual testing. Finally, the obtained outcomes are
compared with the prominent methods published in the literature.

Keywords Image enhancement · Medical images · Multiresolution · Gabor


wavelet transform · Low illumination

K. Moon (B)
Department of Electronics Engineering, Ramrao Adik Institute of Technology, Navi Mumbai,
India
e-mail: kapila.moon@gmail.com
A. Jetawat
Faculty of Engineering, Pacific Academy of Higher Education and Research University, Udaipur,
India
e-mail: drashokjetawat@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 385
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_41

1 Introduction

Medical images are mostly captured in low light environment and are affected by
noise and low contrast. Therefore, it is utmost necessary to apply image enhance-
ment techniques so as to make it suitable for further processing through computer
vision systems. These systems process medical images for segmentation, detection
and prediction of certain diseases such as cancer, tumor and any other disorder.
During day time or better illumination, the image quality may be sufficient for its
intended application, but especially during night and low illumination circumstances,
the image quality may be worse and affect the correct diagnosis of the disease [1–
3]. Some of the images obtained in low illumination are depicted in Fig. 1. Many
image enhancement techniques are proposed by researchers and developers based
on spatial and transform domain. Most of the spatial domain techniques are based on
histogram equalization for contrast enhancement, adaptive mean filtering technique
that uses statistical methods and models for removing noise. Whereas transform
domain techniques apply frequency domain techniques such as Fourier transform
and wavelet transform for image enhancement and contrast stretching. Nowadays,
machine learning techniques are being explored to enhance the region of interest and
further processing of medical images.
A technique based on multiresolution Gabor wavelet transform for medical image
enhancement is presented in this paper. Medical image enhancement technique is
necessary and challenging for further processing through computer vision systems.
It assists in further processing of medical images for segmentation, detection and
prediction of certain diseases such as cancer, tumor, and other disorders. Our contribution
can be summarized as follows: first, a fixed Gabor wavelet transform is applied; secondly,
a variable Gabor wavelet transform is applied at several resolutions; and finally, the results
are summed to obtain a denoised and contrast-stretched image. Further, the paper is
organized as follows: the introduction in Sect. 1, related work in Sect. 2, the presented
methodology in Sect. 3, experimental results in Sect. 4, and the conclusion in Sect. 5.

Fig. 1 Some examples of medical images obtained under low illumination

2 Related Work

Many researchers, scientists, medical practitioners, engineers, and medical imaging


equipment developers and manufacturers have expressed the necessity of image
enhancement technique for correct analysis and diagnosis [4–6]. Many image
enhancement techniques are proposed by researchers and developers based on
spatial and transform domain. Several techniques in spatial domain are proposed by
researchers based on histogram equalization that mainly focuses on image contrast
[7]. However, it is observed that dark regions within an image are not restored appro-
priately. Single-scale retinex (SSR) [8], multiscale retinex (MSR) [9], and multiscale
retinex with color restoration (MSRCR) [10] techniques based on a frequency-domain
approach have been proposed, but they are unsuitable for medical images that need better
contrast. Yet other techniques were applied using basic image processing algorithms
such as erosion, median filter, dilation, outlining and edge detection for proper extrac-
tion of region of interest and further segmentation to detect cancer nodules [11–13].
Nowadays, machine learning techniques are being explored to enhance the region of
interest and for further processing; however, these techniques depend entirely on the
network that learns the enhancement function and on an appropriate, large training set of
medical images [14, 15]. Convolutional neural networks (CNNs) are now widely applied
to detect abnormalities in medical images. Training the CNN is the most important step
and necessitates an appropriate dataset of images: quality images are needed to extract
feature vectors to train the network, but most medical images are obtained with unwanted
noise and distortion [16–18]. Therefore, a medical image enhancement technique for dark
and noisy images is required that enhances features in the dark regions of the image,
removes noise, and assists further processing for better analysis and diagnosis.

3 Multiresolution Gabor Wavelet Transform

The Fourier transform is one of the best tools to obtain the frequency response of audio,
image, and video signals. However, the Fourier transform loses the spatial localization of
the frequency content, which makes it less appropriate for image restoration and further
processing, especially through a convolutional neural network (CNN). The Gabor wavelet
transform offers a better multiresolution approach that represents the texture of an image.
We apply Gabor filters to extract global features from the whole medical image. The 2-D
Gabor function can be specified by the frequency of the sinusoid w and the standard
deviations σ_x and σ_y of the Gaussian envelope, as shown in Eq. (1):

g(x, y) = \frac{1}{2\pi \sigma_x \sigma_y} \exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right) + 2\pi j w x\right]    (1)

Let g(x, y) be the mother Gabor wavelet; the filter coefficients can be obtained by
appropriate dilations and rotations of g(x, y), as depicted in Eq. (2):

g_{mn}(x, y) = a^{-m} \, g(\tilde{x}, \tilde{y})    (2)

where m specifies the scale and n the orientation of the wavelets; m and n are integers with
m = 0, 1, 2, …, M − 1 and n = 0, 1, 2, …, N − 1. The integers M and N represent the total
number of scales and orientations applied in the wavelet transform, respectively, and the
rotated, dilated coordinates are given by Eqs. (3) and (4):

\tilde{x} = a^{-m}(x \cos\theta + y \sin\theta)    (3)

\tilde{y} = a^{-m}(-x \sin\theta + y \cos\theta)    (4)

where a > 1 and θ = 2πn/N. Let I(x, y) be the gray level of an input medical image; the
convolution of this image I with a Gabor kernel g_{mn} is given by Eq. (5):

G_{mn}(x, y) = \sum_{s}\sum_{t} I(x - s,\; y - t)\, g^{*}_{mn}(s, t)    (5)

where s and t are the filter mask size variables and g^{*}_{mn} is the complex conjugate of
the Gabor function g_{mn}. By applying the Gabor filters to the whole medical image at
different orientations and scales, an array of magnitudes is obtained. These magnitudes at
the different scales and orientations of the image are finally summed to obtain a denoised
and contrast-stretched image.
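A minimal sketch of this filter-bank computation is given below, assuming NumPy and SciPy are available; the kernel size, the frequency w, the σ values, the dilation base a, and the number of scales and orientations are illustrative choices rather than the values used by the authors.

```python
import numpy as np
from scipy.signal import fftconvolve  # 2-D convolution via FFT

def gabor_kernel(w, sigma_x, sigma_y, scale, theta, a=2.0, size=31):
    """Dilated and rotated complex Gabor kernel following Eqs. (1)-(4)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = a ** (-scale) * (x * np.cos(theta) + y * np.sin(theta))    # Eq. (3)
    yr = a ** (-scale) * (-x * np.sin(theta) + y * np.cos(theta))   # Eq. (4)
    g = (1.0 / (2 * np.pi * sigma_x * sigma_y)) * np.exp(
        -0.5 * (xr ** 2 / sigma_x ** 2 + yr ** 2 / sigma_y ** 2)
        + 2j * np.pi * w * xr)                                      # Eq. (1)
    return a ** (-scale) * g                                        # Eq. (2)

def enhance(image, scales=3, orientations=4, w=0.25, sigma_x=3.0, sigma_y=3.0):
    """Sum of Gabor response magnitudes over all scales/orientations, cf. Eq. (5)."""
    image = image.astype(float)
    out = np.zeros_like(image)
    for m in range(scales):
        for n in range(orientations):
            theta = 2 * np.pi * n / orientations
            kernel = np.conj(gabor_kernel(w, sigma_x, sigma_y, m, theta))
            out += np.abs(fftconvolve(image, kernel, mode="same"))
    # Rescale to the 8-bit range for display.
    out = 255 * (out - out.min()) / (out.max() - out.min() + 1e-9)
    return out.astype(np.uint8)
```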

4 Experimental Results

Experimental results are presented for an established public dataset [19]: a brain tumor
dataset containing 3064 images from 233 patients with three kinds of brain tumor, namely
meningioma (708 slices), glioma (1426 slices), and pituitary tumor (930 slices), used here
for enhancing low-light images through the multiresolution Gabor wavelet transform.
Most medical images obtained through various sources are dark and noisy, which calls
for an efficient image enhancement technique that preserves the content of the images;
the Gabor wavelet transform provides multiresolution capability with better texture
enhancement that helps improve the quality of medical images. To quantify our results,
three parameters were evaluated: mean average error (MAE), peak signal-to-noise ratio
(PSNR), and image enhancement factor (IEF), defined in Eqs. (6), (8), and (9),
respectively. MAE, PSNR, and IEF require a base (reference) image for their evaluation.
 
\mathrm{PSNR} = 10 \log_{10}\left(\frac{255^{2}}{\frac{1}{m\,n}\,\mathrm{se}}\right)    (6)

where m and n are the dimensions of the images and

\mathrm{se} = \sum_{x=1}^{m}\sum_{y=1}^{n} \left|I(x, y) - O(x, y)\right|^{2}    (7)

where I(x, y) and O(x, y) are the base input image and the output/restored image,
respectively.

\mathrm{MAE} = \sum_{x=1}^{m}\sum_{y=1}^{n} \left|I(x, y) - O(x, y)\right|    (8)

\mathrm{IEF} = \frac{\sum_{x=1}^{m}\sum_{y=1}^{n} \left|\mathrm{In}(x, y) - I(x, y)\right|^{2}}{\sum_{x=1}^{m}\sum_{y=1}^{n} \left|I(x, y) - O(x, y)\right|^{2}}    (9)

where In(x, y) is the low-illuminated input image.
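For reference, the sketch below computes these three measures from a base image I, a low-illuminated input In, and a restored output O, assuming NumPy and 8-bit gray-scale images; note that the MAE follows Eq. (8) as a plain sum of absolute differences.

```python
import numpy as np

def se(I, O):
    """Squared error of Eq. (7)."""
    return np.sum((I.astype(float) - O.astype(float)) ** 2)

def psnr(I, O):
    """Peak signal-to-noise ratio of Eq. (6), in dB, for 8-bit images."""
    m, n = I.shape
    mse = se(I, O) / (m * n)
    return 10 * np.log10(255.0 ** 2 / (mse + 1e-12))

def mae(I, O):
    """Absolute-error sum of Eq. (8)."""
    return np.sum(np.abs(I.astype(float) - O.astype(float)))

def ief(In, I, O):
    """Image enhancement factor of Eq. (9)."""
    return se(In, I) / (se(I, O) + 1e-12)
```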


Results obtained through various techniques such as histogram equalization (HE),
wavelet transform (DWT) through Haar wavelet and discrete Fourier transform
(DFT) are depicted in Fig. 2 and performance parameters MAE, PSNR and IEF
are tabulated in Table 1. The table clearly indicates the performance with respect to
qualitative and quantitative analysis: our method achieves the highest IEF and PSNR
and the lowest MAE. Figure 3 demonstrates the restored images from various
low light illuminated medical brain images taken from the dataset.

5 Conclusion

In this paper, an image enhancement technique based on the multiresolution Gabor wavelet
transform is presented. Medical images are mostly captured in low-light environments and
are affected by noise and low contrast. Convolutional neural networks (CNNs) are now
widely applied to detect abnormalities in medical images; training the CNN is the most
important step and necessitates an appropriate dataset of quality images from which
feature vectors can be extracted to train the network. Therefore, it is essential to apply
image enhancement techniques so as to make the images suitable for further processing
through computer vision systems. Experimental results on several low-illuminated medical
images demonstrate the best results in terms of enhancement parameters and visual testing.
The Gabor wavelet transform, through its multiresolution approach, demonstrates better
image enhancement

Fig. 2 Experimental results obtained on public dataset through various methods: input image captured under low illumination, and output images from HE, DFT, DWT, Gabor filtering, and our work

Table 1 Performance parameters


Technique HE DFT DWT Our work
MAE 85.64 41.84 2.58 0.017
PSNR (dB) 8.61 12.81 33.50 63.55
IEF 38.18 107.94 15,096.0 1,311,900

as compared with DWT, DFT, and histogram equalization, especially for low-illuminated
images. Thus, the multiresolution Gabor wavelet transform is a preferred preprocessing
step for medical images and assists in further diagnosis and analysis.

Fig. 3 Experimental results obtained on public dataset through multiresolution Gabor wavelet transform (our work): input images captured under low illumination and the corresponding output images

References

1. Kadir, T., Gleeson, F.: Lung cancer prediction using machine learning and advanced imaging
techniques. Transl. Lung Cancer Res. 7(3), 304–312 (2018)
2. Makaju, S., Prasad, P.W.C., Alsadoon, A., Singh, A.K., Elchouemi, A.: Lung cancer detection
using CT scan images. Procedia Comput. Sci. 125, 107–114 (2018)
3. Zhang, G., Jiang, S., Yang, Z., Gong, L., Ma, X., Zhou, Z., Bao, C., Liu, Q.: Automatic nodule
detection for lung cancer in CT images: a review. Comput. Biol. Med. 103, 287–300 (2018)
4. Zhang, J., Xia, Y., Cuia, H., Zhang, Y.: Pulmonary nodule detection in medical images: a survey.
Biomed. Signal Process. Control 43, 138–147 (2018)
5. Uzelaltinbulat, S., Ugurb, B.: Lung tumor segmentation algorithm. Procedia Comput. Sci. 120,
140–147 (2017)
6. Nithila, E.E., Kumar, S.S.: Segmentation of lung nodule in CT data using active contour model
and Fuzzy C-mean clustering. Alexandria Eng. J. 55, 2583–2588 (2016)
7. Abdullah-Al-Wadud, M., Kabir, M.H., Dewan, M.A.A., Chae, O.: A dynamic histogram equal-
ization for image contrast enhancement. IEEE Trans. Consum. Electron. 53(2), 593–600
(2007)
8. Jobson, D.J., Rahman, Z.-U., Woodell, G.A.: Properties and performance of a center/surround
retinex. IEEE Trans. Image Process. 6(3), 451–462 (1997)
9. Rahman, Z., Jobson, D.J., Woodell, G.A.: Multi-scale retinex for color image enhancement.
In: Proceedings of 3rd IEEE International Conference on Image Processing, pp. 1003–1006
(1996)

10. Jobson, D.J., Rahman, Z.-U., Woodell, G.A.: A multiscale retinex for bridging the gap between
color images and the human observation of scenes. IEEE Trans. Image Process. 6(7), 965–976
(1997)
11. Sharma, D., Jindal, G.: Identifying lung cancer using image processing techniques. In:
International Conference on Computational Techniques and Artificial Intelligence (ICCTAI),
pp. 115–120 (2011)
12. Chaudhary, A., Singh, S.S.: Lung cancer detection on CT images by using image processing.
In: IEEE International Conference on Computing Sciences (ICCS), pp. 142–146 (2012)
13. Gupta, A., et al.: Methods for increased sensitivity and scope in automatic segmentation and
detection of lung nodules in CT image. In: IEEE International Symposium on Signal Processing
and Information Technology (ISSPIT), pp. 375–380 (2015)
14. Shen, L., Yue, Z., Feng, F., Chen, Q., Liu, S., Ma, J.: MSRnet: low-light image enhancement
using deep convolutional network. Available https://arxiv.org/abs/1711.02488 (2017)
15. Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional
networks. IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 295–307 (2016)
16. Jakimovski, G., Davcev, D.: Using double convolution neural network for lung cancer stage
detection. Appl. Sci. 9(427), 1–12 (2019)
17. Chapaliuk, B., Zaychenko, Y.: Deep learning approach in computer-aided detection system for
lung cancer. In: IEEE International Conference on System Analysis and Intelligent Computing
(SAIC), Ukraine (2018)
18. Li, Z., Li, L.: A novel method for lung masses detection and location based on deep learning. In:
IEEE International Conference on Bioinformatics and Biomedicine (BIBM), America (2017)
19. https://figshare.com/articles/brain_tumor_dataset/1512427/5
HOMER-Based DES
for Techno-Economic Optimization
of Grid

R. Raja Kishore, D. Jaya Kumar, Dhonvan Srinu, and K. Satyavathi

Abstract This study presents a techno-economic feasibility analysis of a grid-connected
distributed energy system (DES), or micro-grid, for a large technical institute. It
concentrates on optimizing the electricity drawn from the grid by delivering as much
renewable energy as reasonably possible, and it additionally integrates green vehicle
transportation such as hydrogen and electric cars, which are necessary elements of
sustainability in the proposed system. The work starts by collecting the institute's monthly
electrical load data, climate data, and associated monetary data with the aim of
investigating the feasibility of a renewable energy supply system. Different scenarios are
developed according to the project needs, and the scenarios are modelled with the HOMER
software. The study concludes with a direct comparison of the economic feasibility,
renewable energy fraction, and emissions of all systems in search of an appropriate
sustainable solution. This study provides helpful insights for the relevant stakeholders and
policymakers in the development of grid-connected distributed energy systems. The
analysis is achieved with the HOMER software, which simulates hundreds or even
thousands of system configurations; HOMER simulates the operation of a hybrid
micro-grid for an entire year, in time steps from one minute to 60 minutes.

Keywords DES modelling · HRES · HOMER size and cost optimization

R. Raja Kishore (B) · D. Jaya Kumar · D. Srinu


Department of ECE, Marri Laxman Reddy Institute of Technology and Management, Hyderabad,
India
e-mail: rajakishore@mlritm.ac.in
D. Jaya Kumar
e-mail: jayakumar@mlritm.ac.in
D. Srinu
e-mail: srinudhovan@gmail.com
K. Satyavathi
Department of ECE, Nalla Malla Reddy Engineering College, Hyderabad, India
e-mail: satyanarayana.ah@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 393
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_42

1 Introduction

In the present scenario, every country places considerable importance on sustainable
development and energy security; hence, hybrid renewable sources become more
significant. Energy is an essential need for enhancing income and improving the quality
of life of individuals [1]. Developing countries, on the way to growing their economies,
are in great demand of electricity access to facilitate their economic and industrial growth.
These days, renewable power resources are one of the promising approaches to address
numerous issues: climate change, desertification, the greenhouse effect, and so on lead the
world towards a sustainable energy era [2]. Utilizing natural and renewable resources such
as wind, solar, geothermal, tidal, wave, and hydroelectric power offers clean alternatives
to fossil fuels [3]; these resources are ubiquitous, abundant, free, clean, and easily
accessible even in isolated and undeveloped places. One of the challenges in developing
an energy system that is renewable and has the least socio-economic problems is to design
a new form of fuel from sources that can be repeatedly generated [4]. Such an energy
system, drawing repeatedly on the same sources, needs to be appropriately designed and
evaluated from the beginning stages [5]. The unreliable nature of these energy sources is
one of the shortcomings in their development, particularly because more reliable energy
sources must be available as replacements when it is essential to serve the load [6]. This
inadequacy, coupled with a large initial cost and a heavy dependence on weather and
climate conditions, makes it essential to combine various renewable resources into a
hybrid system that can be used more flexibly, more cost effectively, reliably, and
efficiently [7]. However, careful planning and assessment are needed to ensure effective
implementation of such a hybrid power system: training the operators, getting the
community to participate in electrification programmes, supervising the installation and
commissioning, keeping the maintenance structure in force, and following up on
maintenance and reporting are essential for the successful implementation of a hybrid
power system.

2 HOMER Software

HOMER Pro is micro-grid software, and HOMER Energy is the global standard for
optimizing micro-grid design in all sectors, from village power and island utilities to
grid-connected campuses and military bases [3]. It was originally developed at the
National Renewable Energy Laboratory and is now improved and distributed by HOMER
Energy. The Hybrid Optimization Model for Multiple Energy Resources (HOMER) nests
three controlling tools in one software product, so that the engineering and the economic
sides work side by side, as shown in Fig. 1.

Fig. 1 HOMER Pro screenshot

2.1 Simulation

At its core, HOMER is a simulation model: it attempts to simulate a viable system for all
possible combinations of the equipment that you wish to consider. Depending on how you
formulate the problem, HOMER can simulate hundreds or even thousands of system
configurations. HOMER simulates the operation of a hybrid micro-grid for an entire year,
in time steps from one minute to 60 minutes.

2.2 Optimization

HOMER examines every possible combination of system types in a single run and then
ranks the systems according to the optimization variable of interest. HOMER Pro features
an optimization process that considerably simplifies the identification of least-cost options
for micro-grids or other distributed generation of electrical power systems. The HOMER
Optimizer is a patented "derivative-free" optimization algorithm that was designed
specifically to work in HOMER, as shown in Fig. 2.
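The idea of enumerating every combination and ranking it by cost can be illustrated with a toy brute-force search; the candidate sizes, prices, and cost model below are purely illustrative placeholders and bear no relation to HOMER's internal algorithm or to the project's actual cost data.

```python
from itertools import product

# Candidate component sizes to enumerate (in the spirit of "considered sizes").
PV_SIZES_KW = [0, 100, 200]
BATTERY_COUNTS = [0, 20, 40]
ANNUAL_LOAD_KWH = 168.4 * 365        # average campus demand stated in this paper

def annual_cost(pv_kw, batteries):
    """Toy annualized cost: component costs plus grid purchases (illustrative only)."""
    pv_energy = pv_kw * 5.18 * 365 * 0.8          # rough yield from 5.18 kWh/m2/day
    grid_energy = max(0.0, ANNUAL_LOAD_KWH - pv_energy)
    return pv_kw * 60 + batteries * 25 + grid_energy * 0.10   # assumed $/yr figures

best = min(product(PV_SIZES_KW, BATTERY_COUNTS), key=lambda cfg: annual_cost(*cfg))
print("Least-cost configuration (PV kW, batteries):", best)
print("Estimated annual cost: $", round(annual_cost(*best), 2))
```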

3 Case Study

These case studies were developed to test the ability of the multilevel optimization
method to analyse remote communities with different climate conditions. For this, we

Fig. 2 Optimization window

Fig. 3 Proposed location

consider a Marri Laxman Reddy Institute of Technology and Management campus


having three blocks (main block, SR Block and MV block) which is located in
Hyderabad, Telangana (in Fig. 3).
The college campus is selected as it exhibits distinct climatic conditions for solar energy.
Table 1 shows the mean horizontal insolation (Fig. 4).
Since we are interested in analysing the influence of climate conditions, we fixed the
load profile for the campus. Consequently, for this case, we used the sample load profile
shown in Fig. 4, as it mimics the average electricity demand (168.4 kWh/day) of the
campus.

3.1 Renewable Energy Resources (Solar Resource)

The solar resource input data for HOMER consist of the monthly averaged daily insolation
incident on a horizontal surface (kWh/m²/day) from the NASA Surface Meteorology and
Solar Energy (SSE) website. NASA gives monthly averaged values from 10 years of data.
Because of the close distance, the location data of the city of Hyderabad is used as the
location data of the Marri Laxman Reddy Institute of Technology and Management
college in this study. The following location data is used to obtain the solar radiation data:

Table 1 Monthly average solar global horizontal irradiance

Month       Clearness index    Daily radiation (kWh/m²/day)
January     0.645              5.060
February    0.663              5.820
March       0.646              6.360
April       0.616              6.510
May         0.580              6.280
June        0.447              4.840
July        0.395              4.260
August      0.395              4.180
September   0.452              4.529
October     0.532              4.790
November    0.607              4.850
December    0.630              4.740

Fig. 4 Power demand for a sample week of an average sized campus

Latitude: 17° 35′ 39.51″ N
Longitude: 78° 24′ 59.28″ E
Time Zone: UTC+5:30 (New Delhi)
The solar radiation data obtained for the Marri Laxman Reddy Institute of Technology
and Management campus is presented in Table 1.
The solar resource raw data input to the software is the average global horizontal radiation
measured at 10-min intervals over two years. In addition to the solar resource data, the
latitude and longitude of the area are also used as inputs, and the time zone is another
parameter to be set. The college is at latitude 17° 35′ 39.51″ N, longitude 78° 24′ 59.28″ E,
with a time zone of UTC+5:30. The annual solar radiation available at the study location
is 5.18 kWh/m²/day according to HOMER.
HOMER assesses the PV array power for the year on an hourly basis and uses the latitude
value to compute the average daily radiation from the clearness index and vice versa. The
annual average daily solar insolation here was found to be 5.18 kWh/m²/day. The
efficiency of the PV array is not a HOMER input, because the software does not specify
the PV array size in terms of m², but in kW of rated capacity. The rated capacity is the
amount of power the PV module produces under STC and accounts for the panel
efficiency.

By working with rated capacity, HOMER has no need to deal with the efficiency directly,
since two modules with different efficiencies (and the same area) would simply be set to
different sizes.
Solar resource data was downloaded on 21 April 2020 at 11:21:02 AM from the NASA
Surface Meteorology and Solar Energy database:
• Cell Number: 107258
• Cell Dimensions: 1 Degree × 1 Degree
• Cell Midpoint Latitude: 17.5
• Cell Midpoint Longitude: 78.5
• Annual Average Radiation: 5.18 kWh/m²/day.
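As a quick cross-check of the annual figure quoted above, the monthly values in Table 1 can be averaged directly; the short sketch below does so (an unweighted mean; a day-weighted mean would differ only slightly).

```python
# Daily radiation (kWh/m2/day) for January..December, taken from Table 1.
monthly = [5.060, 5.820, 6.360, 6.510, 6.280, 4.840,
           4.260, 4.180, 4.529, 4.790, 4.850, 4.740]

annual_average = sum(monthly) / len(monthly)
print(f"Annual average radiation: {annual_average:.2f} kWh/m2/day")  # ≈ 5.18
```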

4 Description of Hybrid Renewable Energy System

The proposed energy system is expected to meet the electricity load demand of the
community, which also includes classrooms. The renewable energy sources considered
here are mainly solar and wind; because of the unstable nature of renewable energy, a
battery bank is employed as the storage system. In this configuration, a two-way
(bidirectional) converter is inserted: it converts AC voltage to DC to charge the battery
and supplies AC power back from the battery to the AC loads of the consumers. Since
AC power is required by all the consumers, part of the input values to the software are
given according to size and quantity. The other components are the solar PV and the
converter, which also vary in size. The simulated model of the hybrid architecture
considered in this paper is presented in Fig. 5.

Fig. 5 Architecture of the selected technologies of the hybrid system

5 Size and Cost Optimization

Immediately after selecting the component technologies from the HOMER software
library, the power load is entered into the modelling tool. The primary load is entered as
24-h input data, from which the software models a peak load; it also builds the monthly
loads from the 24-h input. This paper describes the primary power load and its inputs:
HOMER generates a weekend load and loads for August, January, and the remaining
months from the 24-h load information, which portrays the diurnal variation of the primary
load profile of the college. Figure 4 shows the primary load demand and indicates that the
load profile changes during the day. The load drops to nearly zero from midnight to 6:00
in the morning and then rises from 6:00 to 9:00. Around lunch time, i.e., from 12:00 noon
to 2:00 PM, there is a greater demand for power, and there is again a greater demand
around dinner time; the peak hours are from 6:00 PM to 12:00 midnight. This clearly
demonstrates that electricity is consumed mostly for lighting purposes.
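To illustrate how a daily profile of this kind translates into the figures HOMER works with, the sketch below scales an assumed hourly shape to the stated 168.4 kWh/day and reports the resulting peak. The hourly shape itself is purely illustrative, not the measured campus data, so it will not reproduce the 27.34 kW peak reported later in the conclusion.

```python
# Illustrative 24-hour load shape (relative units), roughly following the pattern
# described in the text: near zero overnight, a morning rise, lunch and evening peaks.
shape = [0.2, 0.2, 0.2, 0.2, 0.2, 0.2,     # 00:00-05:00
         1.0, 2.0, 3.0, 4.0, 4.5, 5.0,     # 06:00-11:00
         6.0, 6.0, 4.5, 4.0, 4.5, 5.5,     # 12:00-17:00
         7.0, 7.5, 7.0, 6.0, 4.0, 1.0]     # 18:00-23:00

target_daily_kwh = 168.4                   # average daily demand stated in the text
scale = target_daily_kwh / sum(shape)      # kWh per relative unit (1-h steps)
hourly_kw = [v * scale for v in shape]     # kW, since each step lasts one hour

print(f"Daily energy: {sum(hourly_kw):.1f} kWh")
print(f"Peak load:    {max(hourly_kw):.2f} kW")
```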

5.1 Cost Data and Size Specifications of Each Component

Since the chief purpose of the work is to investigate the best power system configuration
that meets the requirements with minimum NPC and COE, the cost of the components is
the basic criterion for the selection of the power system components in this study. The
cost of the equipment was estimated on the basis of current prices available in the market.
Initial capital cost:
The total installed cost of purchasing and installing the components at the beginning of
the project.
O&M cost:
The cost of maintaining and operating the system is the O&M cost. All the components
related to this scheme are considered in this project in terms of variable operation and
maintenance cost. Miscellaneous O&M costs mentioned by HOMER are emission
damages, capacity shortage penalties, and fixed operation and maintenance cost.
Replacement cost:
It is necessary to replace worn-out components at the end of their lifetime. This differs
from the initial cost of the components because not all parts of a component need to be
replaced at the end of its life cycle, and costs borne by donors may offset or reduce the
initial cost; however, costs such as travel may not be included in the replacement cost.

Table 2 Size and cost of PV panel

Size of PV (kW)   Capital cost ($)   O&M cost ($/year)   Life span of PV (year)   Considered sizes (kW)
1                 500                250                 15                       0, 100, 200

5.2 Solar PV Size and Cost

After considering different products with regard to cost, four modules were shortlisted,
and the product was chosen from the stated company because of its low cost; it is
expected to give efficient service for a considerably long time. We considered a 50 kW
solar installation based on 250 W panels delivered by the Generic PV Company. The
panel, known as Generic Flat Plate PV, is built with mono-crystalline silicon, has an
efficiency of 20.4%, and is priced in the range from $1.16 to $1.31/W. The installation
cost is taken as 60% of the PV price, the operation and maintenance cost is expected to
be 1% per year, and other details are found in Table 2.
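The NPC and COE criteria described above can be sketched with a simple hand calculation: discount each year's O&M and any replacement outlays to present value, add the initial capital, and annualize the total over the project lifetime per unit of energy served. The snippet below is a simplified stand-alone illustration with an assumed discount rate, project lifetime, and placeholder costs; it is not HOMER's internal procedure.

```python
def crf(i, n):
    """Capital recovery factor for discount rate i and project lifetime n (years)."""
    return i * (1 + i) ** n / ((1 + i) ** n - 1)

def npc_and_coe(capital, om_per_year, replacement_cost, component_life,
                project_life=25, discount=0.06, energy_served_kwh=168.4 * 365):
    """Net present cost and levelized cost of energy for one candidate system."""
    # Present value of O&M paid every year of the project.
    pv_om = sum(om_per_year / (1 + discount) ** y for y in range(1, project_life + 1))
    # Present value of replacements occurring every `component_life` years.
    pv_rep = sum(replacement_cost / (1 + discount) ** y
                 for y in range(component_life, project_life, component_life))
    npc = capital + pv_om + pv_rep
    coe = npc * crf(discount, project_life) / energy_served_kwh   # $/kWh
    return npc, coe

# Illustrative placeholder figures only (not the project's actual cost data).
npc, coe = npc_and_coe(capital=25000, om_per_year=500,
                       replacement_cost=15000, component_life=15)
print(f"NPC = ${npc:,.0f}   COE = ${coe:.3f}/kWh")
```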

6 Simulation Results and Discussions

Optimization results are presented in overall and categorized forms showing the most
workable power system structures that are suitable for the given load and inputs; feasible
solutions appear in increasing order of net present cost. The categorized table gives the
least-cost option from each type of unit configuration, while the overall optimization
results present all the feasible system combinations ranked by NPC. Net present cost and
cost of energy were the basic criteria for selecting the power systems. Parameters such as
low excess electricity generation, low capacity shortage, and high renewable fraction are
used to illustrate the power generation schemes in order to test their technical feasibility.
Optimization results for the selected hybrid power system are shown in Fig. 6.

7 Conclusion

The configuration of a viable, customized renewable energy system using distributed
energy resources for application at the college has been designed and implemented for
distinct cases of renewable source connections. Case studies were carried out considering
solar as the renewable source under different cases; a further case study was completed
by reckoning the energy load and the resource availability for electricity production at
the site. The scaled annual average electrical load of the college was estimated at
168.4 kWh/day with a peak load of 27.34 kW. The electrical analysis shows that the
remaking scenarios are not an economical fit for the current situation. Under the

Fig. 6 HOMER results

conditions considered in this study, the relatively low NPC of the system depends strongly
on the price at which power can be sold to the grid. Selling electricity therefore plays an
important role in the economic suitability of the system; such an agreement would make
the energy system much more economically viable for the college, which could then
supply power to the utility grid, reduce CO2 emissions, and contribute to increased
renewable energy use and increased availability of power supply.

References

1. Vendoti, S., Muralidhar, M., Kiranmayi, R.: Optimization of hybrid renewable energy systems
for sustainable and economical power supply at SVCET Chittoor. i-manager’s J. Power Syst.
Eng. 1(1), 26–34 (2017)
2. Boqtob, O., El Moussaoui, H.: Optimal sizing of grid connected micro grid in Morocco using
Homer Pro. In: IEEE Conference Proceedings (2019)
3. Vendoti, S., Muralidhar, M., Kiranmayi, R.: HOMER based optimization of solar-wind-diesel
hybrid system for electrification in a rural village. In: IEEE Digital Library Explorer pp. 1–6
(2018)
4. Vendoti, S., Muralidhar, M., Kiranmayi, R.: Techno-economic analysis of off-grid
solar/wind/biogas/biomass/fuelcell/battery based system for electrification in a cluster of villages
by HOMER software. Environ. Dev. Sustain. (2020)
5. Fernando, W., Gupta, N., Kamya, G., Ozveren Suheyl, C.: Feasibility study of small scale battery
storage systems integrated with renewable generation technologies for Sri Lankan domestic
applications. IEEE Conference Proceedings (2019)

6. Khasawneh, H.J., Mustafa, M.B., Al-Salaymeh, A., Saidan, M.: Techno-economic evaluation
of on-grid battery energy storage system in Jordan using Homer Pro. AEIT (2019)
7. Marais, S., Kusakana, K., Koko, S.P.: Techno-economic feasibility analysis of a grid-interactive
solar PV system for South African residential. In: 2019 Proceedings of the 27th Domestic Use
of Energy Conference, pp. 163–168 (2019)
Odor and Air Quality Detection
and Mapping in a Dynamic Environment

Raghunandan Srinath, Jayavrinda Vrindavanam,


Rahul Rajendrakumar Budyal, Y. R. Sumukh, L. Yashaswini,
and Sangeetha S. Chegaraddi

Abstract Biodegradable wastes, if not collected regularly, can pollute the environment
and the surroundings and can be a major health risk. This paper proposes a cost-effective
technique that can detect garbage on pavements and at decentralized collection points
through odor sensing. The system is mounted in a moving vehicle and consists of an
MQ-series sensor which detects foul smell. The sensor is designed to send the information,
a value that indicates the level of toxicity of the smell, together with the location obtained
from the global positioning system fitted in the vehicle. The information is sent with the
support of a LoRa (Long Range) network and the cloud. The location-wise level of toxicity
captured on a master screen supports the authorities in prioritizing the areas to be cleaned
up first and also supports monitoring the results of the action.

Keywords Odor · Air quality · Heltec ESP32 · LoRa · MQ series sensor ·


MapBox (API) · GPS · Firebase (database)

R. Srinath
SenZopt Technologies, Bengaluru, India
e-mail: raghu@senzopt.com
J. Vrindavanam (B) · R. R. Budyal · Y. R. Sumukh · L. Yashaswini · S. S. Chegaraddi
Department of ECE, Nitte Meenakshi Institute of Technology, Bengaluru, India
e-mail: jayavrinda.v@nmit.ac.in
R. R. Budyal
e-mail: rahulrbud99@gmail.com
Y. R. Sumukh
e-mail: yrsumukh@gmail.com
L. Yashaswini
e-mail: yashaswina428@gmail.com
S. S. Chegaraddi
e-mail: sangeethaschegaraddi@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 403
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_43

1 Introduction

The waste disposal management systems of cities around the world have been facing
challenges on account of ever-increasing urbanization and the concomitant rising volume
of waste generated and littered. Public waste bins are filling up faster than ever, and
inevitably many of the bins overflow prior to collection, causing not only bad odors and
garbage overflow near the dumping area but also health hazards and environmental
pollution, as overflowing and uncollected waste bins are a perfect location for the growth
of bacteria, insects, and vermin. Flies that feed on the rubbish can spread diseases like
typhoid, gastroenteritis, and other major illnesses. The smell of overflowing garbage
causes various respiratory diseases, a fall in air quality, and adverse health effects through
disease-causing pathogens entering the human body by breathing and contact. Though
there can be quite a few substances in the waste, in general the air gets polluted with
gases like carbon dioxide, nitrous oxide, ammonia, and methane. In daily life, we identify
polluted air by smelling odors, which are usually caused by the decomposition of
biodegradable items. To ensure timely collection of food and other perishable wastes at
risk of decay, municipal authorities in certain localities have introduced static sensors in
areas like markets and other specific locations. Since wastes are generated at many
locations within a city, timely collection of biodegradable wastes assumes importance,
as delays can pollute the air and the nearby environment. Instead of installing a few
sensors at specific locations, this paper proposes placing the sensors on moving objects
such as vehicles and tracking all sorts of odors, so that city-wide odors can be sensed and
appropriate action can be initiated. When the sensors are fitted to a large number of
vehicles, inputs on air odor can be obtained from multiple locations, which is found to be
a better and more effective approach that can support a cleaner environment.
The odor sensing device proposed in this paper continuously detects, measures, and
monitors odorous gaseous contaminants. The solution incorporates Odor Atmospheric
Dispersion Modeling (OADM) for predicting the odor impact on the surrounding area
depending on meteorological conditions; with the help of meteorological data, the odor
sensing device can trace the odorant dispersion plume induced by conditions such as wind
speed and wind direction. The device uses LoRa, a low-power wide-area network
(LPWAN) technology, which is one of the most cost-effective approaches in such
conditions. The odor sensor is implemented using chemical sensors (MQ-2, MQ-3, MQ-9,
MQ-135, etc.) and air quality sensors. Whenever the threshold of a chemical sensor is
reached, the sensor data is sent to the LoRa gateway together with the location (longitude
and latitude) of the vehicle. A LoRa gateway is placed within every 3–5 km radius; it
receives this data and forwards it to the cloud, from where the municipality can take
action to clean that area. All this data is sent to the municipal corporation for the upkeep
of cleanliness and keeping the environment clean.
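A minimal sketch of the node-side logic is given below, written in Python purely for illustration (the device described in the keywords is a Heltec ESP32, typically programmed in C/C++ or MicroPython): it reads a mocked MQ-sensor value, compares it with a threshold, and packages the reading with GPS coordinates for transmission over LoRa. The threshold, the coordinates, and the send_lora helper are assumptions, not the authors' firmware.

```python
import json
import random
import time

ODOR_THRESHOLD = 400          # assumed raw sensor threshold for "foul smell"

def read_mq_sensor():
    """Mocked MQ-135 analog reading; on the real node this would be an ADC read."""
    return random.randint(100, 900)

def read_gps():
    """Mocked GPS fix; on the real node this comes from the GPS module."""
    return 17.5943, 78.4166   # illustrative latitude, longitude

def send_lora(payload: bytes):
    """Placeholder for the LoRa radio transmit call towards the gateway."""
    print("LoRa TX:", payload)

def node_loop(period_s=30):
    while True:
        level = read_mq_sensor()
        if level > ODOR_THRESHOLD:
            lat, lon = read_gps()
            packet = json.dumps({"toxicity": level, "lat": lat, "lon": lon,
                                 "ts": int(time.time())}).encode()
            send_lora(packet)  # the gateway forwards this to the cloud database
        time.sleep(period_s)
```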

At present, in most countries, the sensors introduced by the municipal authorities are
static and are accordingly placed only in a few select locations. Given the increased
attention to cleanliness among cities (such as the Swachh Bharat Abhiyan (Clean India
Mission) and the ranking of cities based on cleanliness, e.g., in India), the proposed
system, a moving sensor that can detect foul smell in any part of the city, can support the
relevant authorities in ensuring better living conditions by reducing hazardous odors and
enabling timely waste collection.
Section 1 above is the introduction, and Sect. 2 presents the literature review. The
proposed system is discussed in Sect. 3, results are analyzed in Sect. 4, and Sect. 5
concludes the paper.

2 Literature Review

In 2004, a study introduced a detection instrument to identify odor pollution in the
environment [1]. Keeping in view the criticality of waste management for public health,
a number of subsequent studies investigated the organic components and chemical
compositions present in odors. A measurement method for odor emission capacity was
introduced to describe the amount of odorants present. Odor compounds can also be
recognized by detection instruments such as gas chromatography. Yet another method
is the E-nose, an instrument developed to approximate the biological olfaction system;
it comprises electronic chemical sensors with partial specificity and an appropriate
pattern recognition system capable of recognizing simple or complex odors.
Another study [2] on odor detection methods, such as olfactometry and chemical
sensors, examined the state of both human and instrumental sensing currently used for
odor detection. Olfactometric techniques employing a panel of trained experts were
discussed, and the strengths and weaknesses of odor assessment through human
detection were highlighted. The paper also discussed the merits and demerits of
instrumental sensory methods and chemical sensors. A limitation of dynamic
olfactometry is that it provides only point odor concentration data, which is not
sufficient for a full evaluation. Other studies have attempted comparisons and
integrations between olfactometry and the E-nose and listed their outcomes, and
another paper argued that using more than one approach is required for a better
understanding of olfactory nuisance cases. Monitoring household garbage odors in
urban areas through distribution maps was proposed in [3], which introduced a
bicycle-mounted e-nose with sensors such as the MQ series (MQ-2, MQ-9) and TGS series
(2620, 2602a, 2602b). Its limitation is the use of a bicycle for monitoring the waste;
furthermore, the bicycle carries expensive devices such as a laptop, a GPS module,
and the e-nose.
Deepak and Neeta [4] surveyed odor detection systems and sensors in 2017. The
paper reviewed various odor detection systems and sensors that can be employed for
real-world detection, identification, and classification of the various odors present in
the air. Surface acoustic wave sensors were created to detect multiple volatile organic
compounds (VOCs), metal oxide sensors also detect volatile compounds over a given
range, biosensors were introduced to mimic the biological olfactory system, and gas
chromatography systems detect VOCs. Such surveys give in-depth knowledge about the
aspects of odor used during detection and classification and help improve the
efficiency of odor detectors.
Dabholkar et al. [5] proposed a method for illegal dumping detection and demonstrated
that applying a deep learning approach can facilitate automatic detection of illegal
dumping; the authors explored multiple design trade-offs to achieve better accuracy
with a smaller memory footprint. Okokpujie et al. [6] introduced a smart air pollution
monitoring system to continuously track air quality, with the measured air quality
indicators displayed on a screen and on a platform named "ThingSpeak." The purpose of
the system was to enhance public awareness of air quality, and the monitoring device
was capable of delivering real-time measurements.
A smart air quality monitoring system with LoRaWAN (Long Range Wide Area Network)
was proposed by Thu et al. [7] in 2018. The system, which the paper described as
end-to-end, was implemented in Yangon, the business capital of Myanmar. According to
the paper, it allowed users to access an online dashboard to monitor the real-time
status of the air quality and also to retrieve past data by themselves. The system
further supported adding sensor nodes and gateways in case the implementation team
decided to extend the monitored area. Yet another intervention has been an IoT-based
E-tracking system [8] that enables monitoring of garbage. The paper reported that the
proposed application is economical and provides long-range automated garbage
monitoring. The system generates real-time data and analysis, supported through a web
portal and an Android application; through the Android application, the system
notifies the garbage collector about the garbage level along with its ID and location
obtained from a GPS module, which also supports route optimization. In addition, the
proposed system uses a machine learning model to predict the impact of air pollution
based on the air quality parameters collected over a given period of time, enabling
proactive management and deployment of resources and enhancing efficiency in terms of
time and cost. Unlike the above studies, the present implementation is novel and
cost-effective, as it uses vehicles as the platform for fixing the sensors, and online
connectivity is achieved through LoRa.

3 Proposed System

The proposed system uses the MQ-2 gas sensor, which is highly sensitive to LPG,
i-butane, propane, methane, alcohol, hydrogen, and smoke, and the MQ-135 gas sensor,
which is sensitive to ammonia, sulfide, and benzene vapor as well as smoke and other
harmful gases. The Heltec ESP32 module is an integrated board consisting of an ESP32,
LoRa, and an OLED display. Initially, the device senses the odor values using the MQ
series sensors placed within the device. The smell is classified as bad or good based
on the threshold that has been set. If the detected smell is bad, the sensor node sends
the measured value over LoRa and also fetches the location of the vehicle. LoRa, as
already explained in the literature, is a wireless technology that provides long-range,
low-power, and secure data transmission for M2M and Internet of Things (IoT)
applications. LoRa is based on chirp spread spectrum modulation, which has low-power
characteristics similar to Frequency Shift Keying (FSK) modulation but can be used for
long-range communication. The node sends the sensor data together with the location of
the vehicle (latitude and longitude), fetched from the onboard GPS module, to the LoRa
gateway whenever the threshold value is reached.
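As a rough illustration of this node-side logic, the sketch below reads the two MQ sensors, compares them against a threshold, and packages the reading with the GPS coordinates for the LoRa uplink. It is a minimal Python sketch of the control flow only; the function names (read_mq2, read_mq135, read_gps, lora_send), the threshold values, and the sampling interval are hypothetical placeholders, since the actual firmware runs on the Heltec ESP32 board.

```python
# Minimal sketch of the node-side control flow described above.
# read_mq2(), read_mq135(), read_gps() and lora_send() are hypothetical
# placeholders for the actual sensor/LoRa driver calls on the ESP32 board.
import json
import time

MQ2_THRESHOLD = 400     # assumed ADC threshold; the real cut-off would be calibrated
MQ135_THRESHOLD = 300   # assumed ADC threshold

def read_mq2(): return 0                      # placeholder: raw ADC value from MQ-2
def read_mq135(): return 0                    # placeholder: raw ADC value from MQ-135
def read_gps(): return (17.3850, 78.4867)     # placeholder: (latitude, longitude)
def lora_send(payload: bytes): pass           # placeholder: LoRa uplink to the gateway

def loop():
    while True:
        mq2, mq135 = read_mq2(), read_mq135()
        # Transmit only when either sensor crosses its threshold,
        # keeping the LoRa duty cycle low.
        if mq2 > MQ2_THRESHOLD or mq135 > MQ135_THRESHOLD:
            lat, lon = read_gps()
            packet = {"lat": lat, "lon": lon, "data1": mq2, "data2": mq135}
            lora_send(json.dumps(packet).encode())
        time.sleep(10)                        # sampling interval (assumed)
```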
The LoRa receiver receives the information and sends it to the cloud, from which
everyone can access the map. LoRa is a low-powered module and can be connected to the
vehicle battery using voltage regulators. The block diagram of the proposed device is
shown in Fig. 1.
Advantages of the proposed model are as follows:
• Quality of the air can be measured.
• Prevents spreading of disease by detecting the foul smell of dead animals.
• Municipal authorities will get the locations to be cleaned.

Fig. 1 Block diagram of the proposed system. Firebase—to fetch database, MapBox—for maps,
PyQt5—for creating GUI, MQ series sensors—odor detection, LoRa—to receive and send the data,
ESP32—microcontroller

Fig. 2 Flow chart of the proposed system

The flow chart of the proposed system is as shown in Fig. 2.


The sensor detects whether the smell is bad or good. If the detected smell is bad,
the node sends the measured value over LoRa along with the fetched location of the
vehicle.

4 Results and Discussion

In the proposed system, as shown in Fig. 3, the sensor is embedded on vehicles plying
in the city and supports city-wide tracking of odor across multiple locations. This
ensures that considerable data points are gathered, and the municipal authorities
can initiate actions depending upon the intensity of the odor.
The GUI shown in Fig. 4 is a representative screen that provides the data recorded
where garbage is detected by the vehicle. The data is given as latitudinal and
longitudinal coordinates that enable the location of the garbage to be spotted. Data 1
and data 2 correspond to the values of the MQ sensors, i.e., MQ-2 and MQ-135,
respectively. If the sensor data exceeds the threshold value, it indicates a foul smell.
Fig. 3 Snapshots of the vehicle sensing the odor

Fig. 4 Snapshot of the GUI

Live data will be highlighted on the map whenever the value of the sensors surpasses
the threshold value, as shown in Fig. 5. The bar graph on the right indicates the
intensity of the foul smell, which is depicted with different colors. The foul smell
with the highest intensity is represented by red, and the lowest intensity values by
blue. The color code enables the municipal authorities to give priority to the red
hotspot areas over the remaining colors.
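As a simple illustration of this color coding, the following sketch maps a sensor reading to a map color; the bucket boundaries are assumed values for illustration, not the calibrated thresholds of the deployed GUI.

```python
# Minimal sketch of the colour-coding rule used on the map: higher sensor
# readings map to "hotter" colours so red hotspots can be prioritised.
# The bucket boundaries are illustrative assumptions, not calibrated values.
def intensity_colour(reading: int, threshold: int = 400) -> str:
    if reading <= threshold:
        return "none"      # below threshold: nothing is plotted
    excess = reading - threshold
    if excess < 100:
        return "blue"      # least intensity
    if excess < 250:
        return "yellow"
    return "red"           # highest intensity, cleaned first

print(intensity_colour(950))   # -> 'red'
```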

5 Conclusion

The paper has introduced a system for odor detection and tracking that can be fitted
on a vehicle, supported by the network and a GUI. The results on the map indicate the
areas where the odor intensity is higher, so that a waste collection plan can be
initiated accordingly. Further, the system can also be used as a supporting indicator
for placing right-sized bins, keeping in view the volumes of waste generated or
repeated triggers.

Fig. 5 Snapshot of the map showing the locations to be cleaned, as detected from the vehicles

References

1. Yuwono, A., Lammers, P.S.: Odor pollution in the environment and the detection instrumentation.
Agric. Eng. Int. CIGR J. Sci. Res. Dev. 6 (2004)
2. Brattoli, M., Gennaro, G., Pinto, V., Loiotile, A.D., Lovascio, S., Michele, P.: Odor detection
methods: olfactometry and chemical sensors. Proc. J. Sens. 11(5), 5290–5322 (2011)
3. Monroy, G., Gonzalez, J.J., Sanchez-Garrido, C.: Monitoring Household Garbage Odors in
Urban Areas Through Distribution Maps. Department of System Engineering and Automation
IEEE Sensors. Valencia, November (2014)
4. Aeloor, D., Patil, N.: A Survey on Odor Detection and Sensors. Department of Computer
Engineering (2011)
5. Dabholkar, A., Muthiyan, B., Shilpa, S., Swetha, R., Jeon, H., Gao, J.: Smart illegal dumping
detection. In: 2017 IEEE Third International Conference on Big Data Computing Service and
Applications (2017)
6. Okokpujie, K., Noma-Osaghae, E., Modupe, O., John, S., Oluwatosin, O.: Smart air pollution
monitoring system. Int. J. Civil Eng. Technol. (IJCIET) 9(9), 799–809 (2018). ISSN: 0976-6308
and ISSN: 0976-6316
7. Thu, M.Y., Htun, W., Aung, Y.L., Shwe, P., Tun, N.M.: Smart Air Quality Monitoring System
With LoRaWAN (2018)
8. Gokhale, M., Chaudhari, P., Jadhav, N., Wagh, R., Smita, K.: IOT based E-tracking system for
waste management. In: The IEEE International Conference on Internet of Things and Intelligence
System (2018)
A Comparative Study
on the Performance of Bio-inspired
Algorithms on Benchmarking
and Real-World Optimization Problems

E. Lakshmi Priya, C. Sai Sreekari, and G. Jeyakumar

Abstract Biologically inspired computing, known in short as bio-inspired computing
(BiC), follows models from biology to solve computational problems. The main objective
of the study presented in this paper is to present the working principle of three BiC
algorithms with different biological bases. The algorithms considered were the genetic
algorithm (GA), particle swarm optimization (PSO), and simulated annealing (SA). These
algorithms were implemented to solve a set of benchmarking problems and a real-world
image segmentation problem, and their performance was compared. The performance
metrics used for the comparison were the solution obtained (So), the number of
generations (NoG), and the execution time (ExeTime). It was observed from the results
on the benchmarking problems that PSO gave better solutions, followed by SA and GA.
For the real-world problem, the results showed that GA segmented the image better than
SA and PSO.

Keywords Bio-inspired algorithms · Particle swarm optimization · Genetic algorithm · Simulated annealing · Image segmentation · Comparative study

1 Introduction

Nature, which exhibits diversity, dynamicity, complexity, robustness, and fascinating
phenomena, is a great source of inspiration for solving hard and complex problems in
computer science (CS). For the past decades, numerous research efforts have

E. Lakshmi Priya · C. Sai Sreekari · G. Jeyakumar (B)


Department of Computer Science and Engineering, Amrita School of Engineering, Amrita
Vishwa Vidyapeetham, Coimbatore, India
e-mail: g_jeyakumar@cb.amrita.edu
E. Lakshmi Priya
e-mail: cb.en.u4cse16432@cb.students.amrita.edu
C. Sai Sreekari
e-mail: cb.en.u4cse16452@cb.students.amrita.edu


been concentrated on solving the optimization problems around us by taking inspiration
from nature. Thus, the field of bio-inspired computing emerged to solve computer
science problems using models from biology. It is observed across the CS community
that BiC algorithms provide optimal solutions with lower computational requirements,
and they are therefore gradually gaining prominence. There exist many algorithms under
BiC, and each differs from the others in the way it solves an optimization problem;
the algorithmic structure of each algorithm is matched with the biological model it
follows. Hence, articles describing the working of BiC algorithms are much appreciated
by practitioners as well as researchers in the community. Following this tendency,
this paper compares the performance of GA, PSO, and SA on well-defined benchmarking
problems and a real-world medical image problem.
The remaining part of the paper is organized with Sect. 2 for related works, Sect. 3
for presenting the results and discussion, and finally Sect. 4 for the conclusions.

2 Related Works

A comparative study of results of five algorithms: GA, PSO, artificial bee colony
(ABC) algorithm, invasive weed optimization (IWO) algorithm, and artificial immune
(AI) algorithm to solve some standard benchmark multivariable functions was
presented in [1]. The comparison of ant colony optimization (ACO) and PSO on
optimizing the membership functions of a fuzzy logic controller was presented in
[2]. Another comprehensive review and comparative study of the BiC algorithms was
presented in [3]. A comprehensive review of the significant bio-inspired algorithms
that are popularly applied in sentiment analysis is presented in [4]. A comparative
study of four bio-inspired algorithms (GA, PSO, DBDE, and BSO) in finding optimal
energy saving pattern for an intelligent Internet of things-based system was presented
in [5].
The authors of [6] proposed a hybrid bio-inspired algorithm for load balancing
and scheduling among the cloudlets. The proposed algorithm was compared with
firefly algorithm, ACO, and ABC. Aswanth et al. [7] presented comparison of firefly
algorithm, symbiotic organism search algorithm, harmony search algorithm, and the
k-means algorithms for clustering the sensor nodes in wireless sensor networks.
A similar study on comparing the algorithms for their performance in solving the
autonomous landing problem of Unmanned aerial vehicle was presented in [8].
Following the above-mentioned trend of research, this paper proposes to compare
the performance of GA [9], PSO [10] and SA [11] on a set of four benchmarking
problems and a medical image segmentation problem.

3 Results and Discussions

The experimental study was performed in two phases. The Phase I compared
the performance of the algorithms on four benchmarking functions chosen from
CEC2005 [12]. The Phase II used the algorithms to solve a medical image
segmentation problem and compared their performance.
Phase I—GA, PSO, and SA were used to solve the benchmark functions with varying
dimensions. The dimensions (d) used for this study were d = 2, d = 5, and d = 10. The
benchmark functions [12] taken for this phase were Ackley, Rastrigin, Griewank, and
Bent Cigar, which differ from each other in their basic properties. The performance
metrics measured (solution obtained (So), number of generations (NoG), and execution
time (ExeTime)) for GA, PSO, and SA on these benchmarking functions are presented in
Table 1. As shown in Table 1, for the Ackley function, PSO gives the best solutions at
all dimensions. GA takes less execution time as the dimension increases, whereas PSO
takes more execution time as the dimension increases, and SA takes more generations to
solve the function.

Table 1 Results for benchmarking functions

                     Dimension (d = 2)              Dimension (d = 5)              Dimension (d = 10)
Results              GA       PSO        SA         GA       PSO        SA         GA       PSO      SA
Ackley function
So                   4.05     4.44e−16   0.00       7.73     8.93e−08   0.17       7.118    0.01     18.20
NoG                  299      299        29,999     299      299        29,999     299      299      29,999
ExeTime              0.98     0.796      2.13       1.02     1.09       2.28       1.35     1.656    2.886
Rastrigin function
So                   1.07     0.0        0.026      9.21     1.03       0.026      18.01    5.43     0.603
NoG                  299      299        29,999     299      299        29,999     299      299      29,999
ExeTime              0.9538   0.721      1.859      0.965    1.177      1.859      1.276    1.51     2.81
Griewank function
So                   0.528    2.220e−16  3.877      1.549    0.029      16.64      20.56    0.072    13.15
NoG                  299      299        29,999     299      299        29,999     299      299      29,999
ExeTime              0.9171   0.753      1.879      1.028    1.044      2.381      1.405    1.476    2.811
Bent Cigar function
So                   1.09e6   2.53e−33   0.136      25.8e6   1.59e−08   4.100      598e6    132.34   5.17
NoG                  299      299        29,999     299      299        29,999     299      299      29,999
ExeTime              0.766    0.722      1.954      0.861    0.953      2.133      1.186    1.301    2.309

It was also noticed that the performance of the algorithms decays as the dimension
increases. For the Rastrigin function, PSO gives the best solutions at the lower
dimension and SA gives the best solutions at higher dimensions; the execution time
taken by GA was less than that of the other algorithms at higher dimensions, while PSO
takes more execution time as the dimension increases. The results for the Griewank
function show that PSO consistently performs best in all dimensions in terms of the
solution obtained, and the ExeTime of GA was less than that of the other algorithms at
higher dimensions. For the Bent Cigar function, PSO outperformed the others in So for
d = 2 and d = 5, whereas SA gave the best So at d = 10. For d = 2, the ExeTime was also
lowest for PSO; however, for higher dimensions, GA took less ExeTime compared to the
other algorithms.
For the unimodal functions (Griewank and Bent Cigar), PSO gave the best So consistently
in all dimensions, GA obtained its solutions faster, i.e., with less execution time,
and SA could produce good results in a few higher-dimensional cases, although with a
larger number of generations. For the multimodal functions (Ackley and Rastrigin), PSO
was found superior to the other algorithms in producing the best results across
dimensions, GA took less execution time at higher dimensions, PSO took more execution
time as the dimension increased, and SA again required more generations.
Phase I of the comparative study revealed that PSO is good at solving all the functions
irrespective of their problem characteristics. However, PSO takes more execution time
as the dimension increases, while GA takes less time. SA gives better results at higher
dimensions (but not consistently) and takes more generations to solve the benchmarking
functions.
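To make the Phase I protocol concrete, the sketch below runs a basic global-best PSO on the Ackley function for d = 2, 5, and 10 and records the three metrics (So, NoG, ExeTime). It is only an illustrative sketch: the population size, inertia, and acceleration coefficients are assumed values and do not reproduce the exact settings behind Table 1.

```python
# Illustrative sketch of the Phase I protocol: run one algorithm (a basic
# global-best PSO) on the Ackley function and record So, NoG and ExeTime.
import time
import numpy as np

def ackley(x):
    d = len(x)
    return (-20 * np.exp(-0.2 * np.sqrt(np.sum(x**2) / d))
            - np.exp(np.sum(np.cos(2 * np.pi * x)) / d) + 20 + np.e)

def pso(func, d, pop=30, gens=299, lo=-30.0, hi=30.0, w=0.7, c1=1.5, c2=1.5):
    rng = np.random.default_rng(0)
    x = rng.uniform(lo, hi, (pop, d))          # particle positions
    v = np.zeros((pop, d))                     # particle velocities
    pbest, pbest_f = x.copy(), np.array([func(p) for p in x])
    g = pbest[np.argmin(pbest_f)].copy()       # global best position
    for _ in range(gens):
        r1, r2 = rng.random((pop, d)), rng.random((pop, d))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([func(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        g = pbest[np.argmin(pbest_f)].copy()
    return func(g)

for d in (2, 5, 10):
    t0 = time.time()
    so = pso(ackley, d)
    print(f"d={d}: So={so:.3e}, NoG=299, ExeTime={time.time() - t0:.2f}s")
```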
Phase II—The second phase of the comparative study solves a medical image segmentation
problem using GA, PSO, and SA and compares their performance. Image segmentation is
the process of partitioning an image into many segments and is primarily used for
locating objects and boundaries in an image. Medical image segmentation plays an
essential role in computer-aided diagnosis systems used in applications such as
microscopy, X-rays, and MRI scans, and it is considered a most essential medical
imaging process as it extracts the region of interest. The input image taken for
segmentation is depicted in Fig. 1. The steps followed by GA, PSO, and SA to solve the
image segmentation problem are described below, and the output images obtained using
GA, PSO, and SA are shown in Fig. 1.

Fig. 1 Images a input, b GA’s output, c PSO’s output, d SA’s output



3.1 Image Segmentation Using GA

The image is divided into subimages, and GA is applied to each subimage starting with
a random initial population. Each individual is evaluated using an arbitrary fitness
function. The best-fit individuals are selected and mated to produce offspring, forming
the next generation; a morphological operation is used to create the new generation
with the help of the crossover and mutation operators. The algorithm finally terminates
to give the segmented subimage, and the segmented subimages are combined to form the
final image. The execution time taken by GA is 37.16 s.

3.2 Image Segmentation Using PSO

Set a particular threshold level. For each particle (speck) in the population: update
the speck's fitness in the search space, update the speck's best fitness, and move the
speck within the population. For each speck: if the swarm improves, reward the swarm
and extend the speck's and the swarm's life; else remove the speck and decrease the
swarm's life. Allow the swarm to breed, and it is considered for the next iteration.
Delete the failed swarm and reset the threshold counter. The execution time taken by
PSO is 33.39 s.

3.3 Image Segmentation Using SA

The image is divided into subimages, and SA is applied to each subimage. Initialize
the temperature to T. Calculate the energy U of the conformation. Alter the system
using an appropriate Gaussian perturbation. Calculate the new energy U1 of the altered
system and the change in energy of the system, det(U) = U − U1. If det(U) > 0, accept
the altered system as the new conformation; else accept the altered system as the new
conformation with a probability exp[det(U)/kT]. Reduce the temperature according to
the cooling schedule. Repeat the above steps until the system cools to a considerably
low value; SA has then been applied to one subimage. Repeat the above steps for each
subimage and combine all the subimages to get the final segmented image. The execution
time taken by SA is 40.98 s.
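A minimal sketch of the acceptance rule described above (Gaussian perturbation, Metropolis criterion, and geometric cooling) is given below on a generic scalar energy function; the initial temperature, cooling rate, and constant k are assumed values, and the sub-image energy used in the actual segmentation is not reproduced here.

```python
# Minimal sketch of the acceptance rule described above, shown on a generic
# scalar energy function; T0, the cooling rate and k are assumed values.
import math
import random

def energy(x):                        # stand-in for the conformation energy U
    return (x - 3.0) ** 2

def simulated_annealing(x0, T0=10.0, cooling=0.95, T_min=1e-3, k=1.0):
    x, T = x0, T0
    while T > T_min:
        x_new = x + random.gauss(0.0, 1.0)        # Gaussian perturbation
        dU = energy(x) - energy(x_new)            # det(U) = U - U1
        # Accept if the energy decreased, otherwise accept with
        # probability exp(det(U) / kT)  (dU < 0 here, so the probability < 1).
        if dU > 0 or random.random() < math.exp(dU / (k * T)):
            x = x_new
        T *= cooling                              # geometric cooling schedule
    return x

print(simulated_annealing(x0=-8.0))               # ends up near the minimum at 3.0
```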
On comparing the resultant images and the execution times taken by GA, PSO, and SA,
the following inferences are recorded from the Phase II comparative study.
(1) GA segmented the image best, followed by SA and PSO.
(2) PSO takes less execution time but gives a poorer quality result.
(3) The best-to-worst order of the algorithms based on the clarity of the output image
is GA, SA, and PSO.
(4) The best-to-worst order of the algorithms based on the execution time taken is PSO,
GA, and SA.

4 Conclusions

This paper analyzed the working and the performance of three widely used bio-inspired
algorithms, namely GA, PSO, and SA. An elaborate comparative study was performed in
two phases. In Phase I, four benchmarking functions with different characteristics
were solved by GA, PSO, and SA, and their performance was compared using three
performance metrics. The experiments were done with different problem dimensions. This
phase identified that PSO consistently outperformed the other algorithms in producing
the optimal solution at all dimensions, GA took less execution time but did not give
good solutions, and a few interesting higher-dimensional cases were observed where SA
performed better than GA and PSO. In Phase II, a medical image segmentation problem
was solved by GA, PSO, and SA, and their performances were compared based on solution
quality and execution time. The observations were that GA was good at producing good
solutions, PSO was good at solving the problem faster, and there was no remarkable
performance by SA.
This contrasting performance of the algorithms on the benchmarking problems and the
real-world problem needs to be investigated further with a more extensive experimental
setup and different optimization problems.

References

1. Krishnanand, K.R., Nayak, S.K., Panigrahi, B.K., Rout, P.K.: Comparative study of five bio-
inspired evolutionary optimization techniques. In: Proceedings of 2009 World Congress on
Nature and Biologically Inspired Computing (NaBIC), pp. 1231–1236 (2009)
2. Castillo, O., Martinez-Marroquin, R., Melin, P., Veldez, F., Soria, J.: Comparative study of
bio-inspired algorithms applied to the optimization of type-1 and type-2 fuzzy controllers for
an autonomous mobile robot. Inf. Sci. 192, 19–38 (2012)
3. Kalaiarasi, S., Sirramya, P., Edreena, P.: A review and comparative stud of bio-inspired
algorithms. Int. J. Appl. Eng. Res. 9(23), 23435–23448 (2014)
4. Yadav, A., Vishwakarma, D.K.: A comparative study on bio-inspired algorithms for sentiment
analysis. In: Cluster Computing (2020)
5. Romero-Rodriguez, W.J.G., Baltazar, R., Zamudio, V., Casillas, M., Alaniz, A.: Comparative
study of bio-inspired algorithms applied to illumination optimization in an ambient intelligent
environment. Smart Innov. Syst. Technol. 148 (2020)
6. Shobana, S., Radhika, N.: Efficient cloudlet provisioning using bio-inspired hybrid algorithm
in mobile cloud computing. J. Adv. Res. Dyn. Control Syst. 10(5), 1672–1678 (2018)
7. Aswanth, S.S., Gokulakannan, A., Sibi, C.S., Ramanathan, R.: Performance study of bio-
inspired approach to clustering in wireless sensor networks. In: Proceedings of 3rd International
Conference on Trends in Electronics and Informatics (2019)
8. Harun Surej, I., Ramanathan, R.: A Performance study of bio-inspired algorithms in
autonomous landing of unmanned aerial vehicle. In: Proceedings of Third International
Conference on Computing and Network Communications (2019)
9. Holland, J.H.: Adaptation in Natural and Artificial System. MIT press, Cambridge, USA (1975)
10. Russell, E., James, K.: Particle swarm optimization. Proc. IEEE Int. Conf. Neural Netw. 4,
1942–1948 (1995)

11. Van Laarhoven, P.J.M., Aarts, E.H.L.: Simulated Annealing: Theory and Applications, pp. 7–15
(1987)
12. Chen, Q., Liu, B., Zhang, Q., Liang, J., Sugunathan, P., Qu, B.: Problem definitions and
evaluation criteria for CEC 2015. In: Proceedings of Special Session on Bound Constrained
Single-Objective Computationally Expensive Numerical Optimization (2015)
A Study on Optimization of Sparse
and Dense Linear System Solver Over
GF(2) on GPUs

Prashant Verma and Kapil Sharma

Abstract There are various crypt-analytic techniques where solving a large dense
or sparse system of linear equations over a finite field becomes a challenge due to
the high computation involved. For instance, problems like NFS for the factorization
of large integers, cryptanalysis of symmetric ciphers, the discrete log problem, and
algebraic attacks involve solving large sparse or dense linear systems over a finite
field. Here, we consider the finite field GF(2). Gaussian elimination is the popular
and relevant method for solving large dense systems, while the Block Lanczos and Block
Wiedemann algorithms are well known for solving large sparse systems. However, the
time complexity of such popular methods makes them impractical on their own, and hence
parallelism becomes compulsory for such methods. In addition, the availability of
high-end parallel processors and accelerators such as general-purpose graphics
processing units (GPGPUs) makes it possible to solve computationally intensive
problems in reasonable time. The accelerators with thousands of cores available today
exploit the memory bandwidth and take advantage of multi-level parallelism on
multi-node and multi-GPU units. Here, we consider Nvidia GPUs such as Kepler, Pascal,
and Volta along with CUDA and MPI. Also, CUDA-aware MPI leverages GPUDirect RDMA and
P2P for inter- and intranode communication.

Keywords Cryptography · GPGPUs · GPU-direct P2P · MIMD · RDMA

1 Introduction

In today's world, digital information has grown rapidly; therefore, information
security is imperative for the digital world. There are various crypt-analytic
techniques where solving a large system of

P. Verma (B) · K. Sharma


Department of Information Technology, Delhi Technological University, New Delhi, Delhi, India
e-mail: prashantperot@gmail.com
K. Sharma
e-mail: kapil@ieee.org


linear equations over a finite field becomes a challenge due to the high computation
involved. The system can be either dense or sparse depending on the algorithms and
problems defined in cryptography. For instance, the integer factorization problem
using the number field sieve (NFS) [1] algorithm, the discrete log problem,
cryptanalysis of symmetric ciphers, and algorithms used in algebraic attacks involve
handling large systems of linear equations (either dense or sparse) over a finite
field, here GF(2). Gaussian elimination is the popular and relevant method to handle
large dense systems. To obtain results in a short span of time, it is hard to identify
the hotspots for parallelism to a sufficient extent. Additionally, such huge systems
cannot fit into the memory of a single node; therefore, an effective solver based on
the latest parallel hardware platforms and the Gaussian elimination approach is needed.
Block Lanczos [2, 3] and Block Wiedemann [4–6] are the popular methods to solve such
compute-intensive sparse problems, but their time complexity is cubic, so a serial
implementation is computationally slow and practically not feasible. To solve
compute-intensive problems in a reasonable amount of time, accelerator units such as
general-purpose graphics processing units (GPGPUs) are employed. It is now very common
to build supercomputers as clusters where each node hosts multiple GPUs; looking at the
Top 500 supercomputer list [7], most of them are GPU based. Thus, it is necessary to
develop applications so that they can be efficiently scaled over multiple GPGPUs and
nodes. The original Block Lanczos algorithm [8, 9] is roughly split into three steps:
preprocessing, Lanczos iterations, and post-processing. At densities greater than 10%,
Block Lanczos is quite costly in terms of performance.
This paper describes the research work on optimizations carried out on existing
GPU-enabled code for Gaussian elimination and the Block Lanczos algorithm. The
optimization exercise started with understanding and performance profiling of the
existing methods. The next section gives the details of the literature review.
Section 3 explains the parallel methodology of Gaussian elimination and Block Lanczos
over GF(2) [3, 10]. The optimizations on multiple hardware platforms are explained in
Sect. 4. Section 5 shows the results on different hardware platforms for the
performance and scalability of Gaussian elimination and Block Lanczos for dense and
sparse systems over GF(2), and finally Sect. 6 concludes the paper.

2 Literature Review

The available methods to solve dense and sparse systems of linear equations over GF(2)
were originally implemented serially [11, 12]. Parallel implementations are also
available [13], but they are not optimized for the latest hardware platforms and hence
do not fully utilize the hardware resources of existing technology.
Nvidia provides a series of accelerator cards that let researchers parallelize their
applications and solve bigger problems in a reasonable amount of time [14]. Figure 1
shows the architecture of an Nvidia GPU, where grids, blocks, and threads are

Fig. 1 GPUs grid, block, and thread

arranged. For solving large systems of dense linear equations, Gaussian elimination is
a prominent area for researchers, yet research on optimizing its parallel version has
received less focus.
Koc and Arachchige [15] proposed a Gaussian elimination algorithm over the finite
field GF(2) and implemented it on the geometric arithmetic parallel processor known as
GAPP. Parkinson and Wunderlich [16] proposed parallel Gaussian elimination over GF(2)
and deployed it on the parallel array processor ICL-DAP. Bogdanov et al. [17] used
parallel hardware to solve Gaussian elimination over GF(2) quickly; this architecture
was implemented on a field-programmable gate array (FPGA), and the authors also
evaluated a possible ASIC-based implementation. All these solutions can solve only
small dense systems over GF(2) and are very costly, relying on special kinds of
hardware platforms. Albrecht and Pernet [18] proposed a solution for dense systems of
linear equations over GF(2) that uses multicore architectures, is very efficient, and
is part of the Method of Four Russians (M4RI) library [19]. Their solution reports
performance results for 64 × 64 K linear systems of equations and shows that their
method is comparable to the implementation by Allan Steel [20] for Gaussian
elimination over GF(2) using the MAGMA library. The work on solving Gaussian
elimination over GF(2) on general-purpose processors has otherwise received little
focus, and the present work is the first to solve Gaussian elimination over GF(2) on
GPGPUs.

The challenge with a sparse matrix is to reduce the substantial memory requirements by
storing only the nonzero elements. Depending on the sparsity factor, distinct data
structures can be utilized to save an enormous amount of memory. Formats that store
only nonzero elements can be divided into two main groups. The first group comprises
those that support efficient modification, for instance, dictionary of keys (DOK),
list of lists (LIL), or coordinate list (COO); these are typically used for
constructing the matrices. The second group supports efficient access and matrix
operations, such as compressed sparse column (CSC) or compressed sparse row (CSR)
[21, 22]. Figure 3 shows the storage representation of dense and sparse matrix formats.
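The small sketch below contrasts the two groups of formats for a GF(2) matrix: a coordinate (COO) list, convenient for construction, converted into CSR index arrays, convenient for matrix–vector work. Over GF(2), the stored values are all 1, so only the positions are kept; the example matrix is arbitrary.

```python
# Sketch contrasting COO (construction-friendly) and CSR (access-friendly)
# storage for a GF(2) matrix: since all values are 1, only positions matter.
import numpy as np

n_rows, n_cols = 4, 5
coo = [(0, 1), (0, 3), (1, 0), (2, 2), (2, 4), (3, 1)]   # (row, col) of nonzeros

def coo_to_csr(coo, n_rows):
    coo = sorted(coo)                          # order by row, then column
    indices = np.array([c for _, c in coo], dtype=np.int64)
    indptr = np.zeros(n_rows + 1, dtype=np.int64)
    for r, _ in coo:
        indptr[r + 1] += 1                     # count nonzeros per row
    indptr = np.cumsum(indptr)                 # prefix-sum into row pointers
    return indptr, indices

indptr, indices = coo_to_csr(coo, n_rows)
print(indptr)    # [0 2 3 5 6]
print(indices)   # [1 3 0 2 4 1]
```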
The systems of interest have on the order of hundreds of thousands of unknowns
[23, 24]. Therefore, an efficient, optimized Block Lanczos solver for large sparse
systems should be available that can run on a multiple instruction, multiple data
(MIMD) architecture, shown in Fig. 2, i.e., a cluster of multiple nodes, each hosting
one or more graphics processing units. This study shows how Gaussian elimination for
dense systems and Block Lanczos for sparse systems leverage parallel hardware and
scale efficiently over a MIMD architecture with hybrid technology [25].

Fig. 2 Multiple GPU devices across multiple nodes using MPI and CUDA

Fig. 3 Sparse matrix storage format representation



3 Research Methodology and Analysis

Given a system of linear equations over GF(2), the task is to find the equations that
are linearly dependent on others and remove them. Consider a system of equations where
the number of equations equals the number of variables and is of order O(10^5) or
higher.

4 Linear System Solver Over GF(2) for Dense

The system is of the form A * x = B (mod 2), where the matrix A is dense, with about
50% of its elements nonzero, and the number of rows is greater than the number of
columns. All arithmetic operations are over GF(2), which means that addition and
multiplication are equivalent to logical XOR and logical AND, respectively. The
Gaussian elimination used to solve the large dense system of equations has the
following steps:

4.1 Generate Random Matrices

[1] Generate entries of A and x with pseudo-random number generator


[2] Compute A * x = B
[3] Solve linear system [A, B]
[4] Compare computed solution of linear system with reference.
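As a concrete (CPU-only) illustration of steps [1]–[4] above, the sketch below builds a random dense GF(2) system, solves it by Gaussian elimination using XOR row updates, and compares the result with the reference solution; a production solver would pack 64 columns per machine word and offload the work to the GPU, which is not shown here.

```python
# Illustrative single-node sketch of steps [1]-[4]: random GF(2) system,
# Gaussian elimination with XOR row updates, comparison with the reference.
import numpy as np

rng = np.random.default_rng(1)
n = 64

def gf2_solve(A, B):
    M = np.concatenate([A, B[:, None]], axis=1).astype(np.uint8)  # [A | B]
    for col in range(A.shape[0]):
        pivot_rows = np.nonzero(M[col:, col])[0]
        if pivot_rows.size == 0:
            raise ValueError("matrix is singular over GF(2)")
        pivot = col + pivot_rows[0]
        M[[col, pivot]] = M[[pivot, col]]       # bring the pivot row up
        rows = np.nonzero(M[:, col])[0]
        rows = rows[rows != col]
        M[rows] ^= M[col]                       # XOR (GF(2)) row reduction
    return M[:, -1]

while True:
    A = rng.integers(0, 2, (n, n), dtype=np.uint8)      # [1] random A
    x_ref = rng.integers(0, 2, n, dtype=np.uint8)       #     and reference x
    B = (A @ x_ref) % 2                                  # [2] B = A * x (mod 2)
    try:
        x = gf2_solve(A, B)                              # [3] solve [A, B]
        break
    except ValueError:
        continue                                         # retry if A was singular

print(bool(np.array_equal(x, x_ref)))                    # [4] compare: True
```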

4.2 Generate Linear System Using LFSR

[1] LFSR is initialized to random input


[2] Clocked multiple times to produce multiple bits of output
[3] The output bits are expressed as linear combination of initial condition
[4] With enough equations, a linear system of equations can be formed
a. Initial condition of LFSR as unknown
[5] Solve the linear system
a. Compare the computed initial condition with reference
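The sketch below illustrates the idea of Sect. 4.2: while clocking a Fibonacci LFSR, each output bit is tracked as a GF(2) linear combination of the unknown initial state, which yields the coefficient matrix of a linear system whose solution is that initial state. The register length and tap positions are illustrative assumptions.

```python
# Sketch of the LFSR-based test-data idea: build A and b such that
# A * seed = b (mod 2), where seed is the unknown initial condition.
import numpy as np

n, taps = 16, (0, 2, 3, 5)                  # register length and feedback taps (assumed)
rng = np.random.default_rng(2)
seed = rng.integers(0, 2, n, dtype=np.uint8)          # "unknown" initial state

state = seed.copy()                         # concrete register contents
coeffs = [row.copy() for row in np.eye(n, dtype=np.uint8)]   # cell i starts as e_i

A_rows, b = [], []
for _ in range(2 * n):                      # collect more equations than unknowns
    A_rows.append(coeffs[0].copy())         # output bit as a linear form of the seed
    b.append(state[0])
    fb_bit = np.bitwise_xor.reduce(state[list(taps)])          # feedback value
    fb_vec = np.bitwise_xor.reduce([coeffs[t] for t in taps])  # its linear form
    state = np.append(state[1:], fb_bit)    # shift the register
    coeffs = coeffs[1:] + [fb_vec]          # shift the coefficient vectors alike

A, b = np.array(A_rows), np.array(b)
print(bool(np.array_equal((A @ seed) % 2, b)))        # the system is consistent: True
```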

4.3 Single and Multi-GPU Gaussian Elimination

[1] Matrix is split in parts rowwise


[2] Each GPU owns exactly one part
[3] All processing (3 kernels) on the part is done by owner GPU
424 P. Verma and K. Sharma

[4] All operations are done in parallel by the GPUs


[5] Consensus about pivot is achieved after findPivot operation.

4.4 Optimization

[1] Performance is heavily influenced by the memory access pattern
[2] How should A be stored?
[3] Finding the pivot prefers column-major storage; extracting the pivot row prefers row-major
[4] One coalesced and one strided memory access is unavoidable
[5] Row reduction works better with column-major storage
[6] Store the transpose of A instead of A
[7] Memory access pattern.

5 Block Lanczos Solver Over Finite Field or GF(2) for Sparse

The initial Block Lanczos method is roughly split into three steps: preprocessing,
Lanczos iterations, and post-processing, as shown in Fig. 4.
In the preprocessing step, operations such as memory allocation, initialization, and
loading of the linear system data are done. The Lanczos step involves the iterative
part of the code that computes the solution, and finally, in the post-processing step,
the solution is written to a file. The optimization work that has been explored is as
follows.

Fig. 4 Handling steps in Block Lanczos algorithm

5.1 Better Test Data Generation

The method requires sparse linear systems as input for benchmarking the performance.
A new data generation module is needed that is faster and can generate arbitrary
relations between the columns of the matrix.

5.2 Optimization of SpMV and SpMTV Operations

The Lanczos step involves repeated calls to two GPU kernels, the sparse matrix–vector
multiplication (SpMV) and the sparse matrix transpose vector multiplication (SpMTV).
The high percentage share of these two kernels makes them the primary candidates for
optimization, and the performance of both kernels is improved with the following
techniques. The SpMV and the SpMTV are both matrix–vector multiplications. A
matrix–vector multiplication is composed of multiple dot products; multiple dot
products can be executed in parallel, and a warp (a vector of 32 threads) is dedicated
to computing one dot product.
The dot product operation involves two steps: first, pointwise multiplication, and
second, adding all the multiplication results together. The pointwise multiplication
can be done in parallel by each thread of the warp. However, adding the multiplication
results together is a reduction operation, and thus the threads need to cooperate. The
Kepler architecture introduced four shuffle instructions: __shfl(), __shfl_down(),
__shfl_up(), and __shfl_xor(). Figure 5 shows the shuffle-down operation on 8 threads.
Shuffle instructions allow faster cooperation between threads of the same warp;
effectively, threads can read the registers of other threads in the same warp. The
reduction operation in the new version of SpMV is implemented using shuffle
instructions. The shuffle-based reduction performs better than even the shared-memory
atomics-based implementation. This modification leads to better work distribution
among the threads of a warp and reduces warp divergence significantly. The warp-level
approach also results in more coalesced memory access.
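The following CPU-side sketch shows what SpMV and SpMTV reduce to over GF(2) for a CSR matrix whose nonzeros are all 1, with a block of 64 right-hand vectors packed into one uint64 word per row or column: the per-row dot product becomes an XOR of the selected vector words. The warp-per-row shuffle reduction itself is CUDA-specific and is replaced here by a plain sequential XOR.

```python
# CPU-side sketch of SpMV and SpMTV over GF(2) for a CSR matrix with all
# nonzero values equal to 1; one uint64 word packs a block of 64 vectors.
import numpy as np

def spmv_gf2(indptr, indices, v):
    """y = A * v, where v holds one 64-bit packed block per column."""
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows, dtype=np.uint64)
    for i in range(n_rows):
        acc = np.uint64(0)
        for j in indices[indptr[i]:indptr[i + 1]]:   # one "dot product" per row
            acc ^= v[j]                              # multiply-by-1, then XOR-reduce
        y[i] = acc
    return y

def spmtv_gf2(indptr, indices, n_cols, v):
    """y = A^T * v using the same (non-transposed) CSR structure."""
    y = np.zeros(n_cols, dtype=np.uint64)
    for i in range(len(indptr) - 1):
        for j in indices[indptr[i]:indptr[i + 1]]:
            y[j] ^= v[i]                             # scatter instead of gather
    return y

# Tiny example using the CSR arrays from the earlier format sketch.
indptr = np.array([0, 2, 3, 5, 6])
indices = np.array([1, 3, 0, 2, 4, 1])
v = np.arange(5, dtype=np.uint64)                    # packed block vector
print(spmv_gf2(indptr, indices, v))
print(spmtv_gf2(indptr, indices, 5, spmv_gf2(indptr, indices, v)))
```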

Fig. 5 Warp shuffle instruction

6 Conclusions

This paper presents a study on the optimization of a scalable solution for solving
large sparse and dense systems of linear equations over the binary Galois field, i.e.,
GF(2). These solvers are utilized as a library for various cryptography and
cryptanalysis applications such as the integer factorization problem using NFS,
cryptanalysis of ciphers, DLP, algebraic attacks, etc. The research work explored CUDA
and MPI to leverage the multi-level parallelism available in multi-socket, multi-GPU
systems. Many optimization techniques for solving large dense and sparse systems are
discussed, showing the capabilities of the device kernels and excellent scalability on
multi-GPU architectures. At higher densities (>10%), Block Lanczos is quite costly in
terms of performance; for such cases, even a dense solver such as Gaussian elimination
can be tried. The SpMV and SpMTV are essentially matrix–vector operations: in SpMV the
matrix is in normal format, while in SpMTV the matrix is in the transposed format, and
this difference leads to a large difference in code performance. The transpose
multiply is 3–4x slower than the normal multiply. The overhead of the alternative
approach of explicitly transposing the matrix is the time needed for the transpose
and, in terms of memory, a doubling of the matrix storage space. Future research is to
explore hotspots in a program that are massively parallel and offload them to GPGPUs.
We also focus on the out-of-memory case, where the system of linear equations exceeds
the memory capacity of an individual GPU.

References

1. Wang, Q., Fan, X., Zang, H., Wang, Y.: The space complexity analysis in the general NFS
integer factorization. Theor. Comput. Sci. 630, 76–94, (2016). ISSN: 0304–3975, https://doi.
org/10.1016/j.tcs.2016.03.028
2. Sengupta, B., Das, A.: Use of SIMD-based data parallelism to speed up sieving in integer-
factoring algorithms. IACR Cryptol. 44 (2015)
3. Intel Corp.: Technical Report. https://en.wikipedia.org/wiki/Lanczos algorithm (2009)
4. Giorgi, P., Lebreton, R.: Online order basis algorithm and its impact on the block Wiede-
mann algorithm. In: Proceedings of 39th International Symposium on Symbolic and Algebraic
Computation (ISSAC’14), pp. 202–209. ACM (2014)
5. Huang, A.G.: Parallel Block Wiedemann-Based GNFS Algorithm for Integer Factorization.
Master thesis, St. Francis Xavier University, Canada (2010)
6. Zhou, T., Jiang, J.: Performance modeling of hyper-scale custom machine for the principal
steps in block Wiedemann algorithm. J. Supercomput. 1–23 (2016)
7. Top 500 list—Nov 2017. https://www.top500.org/list/2019/11/
8. Summit: Oak Ridge National Laboratory’s Next High-Performance Supercomputer. https://
www.olcf.ornl.gov/olcfresources/computesystems/summit
9. Flesch, I.: A new parallel approach to the Block Lanczos algorithm for finding null spaces over
GF (2). Master thesis, Utrecht University, The Netherlands (2006)
10. Thomé, E.: A Modified Block Lanczos Algorithm with Fewer Vectors. arXiv:1604.02277
11. Yang, L.T., Huang, Y., Feng, J., Pan, Q., Zhu, C.: An improved parallel block Lanczos algorithm
over GF (2) for integer factorization. Inf. Sci. 379, 257–273 (2017). ISSN 0020-0255, https://
doi.org/10.1016/j.ins.2016.09.052

12. Xu, T.L.: Block Lanczos-Based Parallel GNFS Algorithm for Integer Factorization. Master
thesis, St. Francis Xavier University, Canada (2007)
13. Yang, L.T., Xu, L., Yeo, S.S., Hussain, S.: An integrated parallel GNFS algorithm for integer
factorization based on Linbox Montgomery block Lanczos method over GF (2). Comput. Math.
Appl. 60(2), 338–346 (2010)
14. Reaño, C., Silla, F.: Performance evaluation of the NVIDIA pascal GPU architecture. In:
2016 IEEE 18th International Conference on High Performance Computing and Communi-
cations, pp. 1234–1235. Sydney, NSW (2016). https://doi.org/10.1109/HPCCSmartCity-DSS.
2016.0173
15. Koc, K., Arachchige, S.N.: A fast algorithm for gaussian elimination over GF (2) and its
implementation on the GAPP. J. Parallel Distrib. Comput. 13(1), 118–122 (1991)
16. Parkinson, D., Wunderlich, M.: A compact algorithm for gaussian elimination over GF (2)
implemented on highly parallel computers. Parallel Comput. 1(1), 65–73 (1984)
17. Bogdanov, A., Mertens, M.C., Paar, C., Pelzl, J., Rupp, A.: A parallel hardware architecture
for fast gaussian elimination over GF (2). In: 14th IEEE Symposium on Field-Programmable
Custom Computing Machines, pp. 237–248 (2006)
18. Albrecht, M.R., Bard, G.V., Pernet, C.: Efficient dense gaussian elimination over the finite field
with two elements. CoRR, abs/1111.6549, 2011
19. M4ri library. https://github.com/malb/m4ri
20. Bosma, W., Cannon, J., Playoust, C.: The magma algebra system I: the user language. J. Symbol.
Comput. 24(3–4), 235–265 (1997)
21. Buluç, A., Fineman, J.T., Frigo, M., Gilbert, J.R., Leiserson, C.E.: Parallel sparse matrix-vector
and matrix-transpose-vector multiplication using compressed sparse blocks. In: Proceedings of
the Twenty-First Annual Symposium on Parallelism in Algorithms and Architectures (SPAA
’09). Association for Computing Machinery, New York, NY, USA, pp. 233–244. https://doi.
org/10.1145/1583991.1584053
22. Vastenhouw, B., Bisseling, R.H.: A two-dimensional data distribution method for parallel sparse
matrix-vector multiplication. SIAM Rev. 47(1), 67–95 (2004)
23. Zamarashkin, N.L., Zheltkov, D.A.: GPU based acceleration of parallel block Lancoz solver.
Lobachevskii J. Math. 39(4), 596–602 (2018)
24. GPU acceleration of dense matrix and block operations for lanczos method for systems over
large prime finite field. Supercomput. RuSCDays Ser. Commun. Comput. Inf. Sci. 793, 14–26
(2017)
25. Gupta, I., Verma, P., Deshpande, V., Vydyanathan, N., Sharma, B.: GPU-accelerated scalable
solver for large linear systems over finite fields. In: 2018 Fifth International Conference on
Parallel, Distributed and Grid Computing (PDGC), Solan Himachal Pradesh, India, pp. 324–329
(2018). https://doi.org/10.1109/PDGC.2018.8745743
Intracranial Hemorrhage Detection
Using Deep Convolutional Neural
Network
K. Thirunavukkarasu, Anmol Gupta, Satheesh Abimannan,
and Shahnawaz Khan

Abstract A brain hemorrhage is a serious medical emergency involving intracranial
bleeding that occurs inside the cranium. Intracerebral hemorrhage leads to severe
neurological symptoms on one side of the human body, such as loss of consciousness,
numbness, or paralysis, and often needs swift and intensive therapy. Specialists
review the patient's cranial medical images to locate the intracranial bleeding, which
is a complex and often time-consuming process. This research presents a convolutional
neural network approach for automatic brain hemorrhage detection from computed
tomography scans. Convolutional neural networks are a powerful image-recognition
technique. This research evaluates a deep neural network optimized for the detection
and quantification of intraparenchymal, subdural/epidural, and subarachnoid hemorrhage
on contrast CT scans. The dataset used for this research includes 180 GB of 3D head CT
studies (more than 1.5 million 2D images). All provided images are in the DICOM format
used for medical images.

Keywords Intracranial hemorrhage detection · Deep convolutional neural network

1 Introduction

Intracranial hemorrhage (ICH) [1] is classified as a debilitating illness. It is one
of the leading causes of death and injury and can cause a stroke. Intracranial
hemorrhage is defined as bleeding inside the skull. Traumatic brain injury (TBI) is
among the leading causes of death and disability in the USA, representing nearly 30%
of all injury deaths in 2013. There is a high risk of TBI transforming into a
secondary brain injury that can lead to insensitivity. If it remains untreated, it may

K. Thirunavukkarasu (B) · A. Gupta · S. Abimannan


School of Computer Science and Engineering, Galgotias University, Greater Noida, India
e-mail: thiruk.me@gmail.com
S. Khan
Department of Information Technology, University College of Bahrain, Saar, Bahrain


Fig. 1 Types of hemorrhage

lead to death and is considered clinically critical. Intracranial hemorrhage is divided
into five subtypes (Fig. 1) based on its location in the brain: intraparenchymal (IPH),
intraventricular (IVH), subdural (SDH), subarachnoid (SAH), and epidural (EDH).
Intracranial hemorrhage that occurs within brain tissue is often called intracerebral
hemorrhage. Detecting hemorrhages, even mild ones, is difficult, and specially trained
radiologists need a high degree of concentration for the study.
Computed tomography (CT) of the head [2] is the workhorse medical imaging method
adopted worldwide to diagnose neurological emergencies. The CT image quality and its
quick acquisition time make it a more suitable diagnostic method for the primary
evaluation of intracranial hemorrhage than magnetic resonance imaging. A CT scan
produces a series of images; it uses X-ray beams to capture brain tissues with varying
intensities based on the magnitude of X-ray absorption in the tissue. The CT images
are displayed by means of a windowing system, in which the Hounsfield unit (HU)
numbers are converted into grayscale values [0, 255] according to the width and level
parameters of the window. CT grayscale images, however, are constrained by a poor
signal-to-noise ratio, lower contrast, and a large percentage of artifacts, and one
particular challenge is to recognize small, subtle anomalies in a massive 3D volume
with near-perfect sensitivity.
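A minimal sketch of the windowing step is given below: Hounsfield units are clipped to a [level − width/2, level + width/2] window and rescaled to the 8-bit range [0, 255]. The brain window (level 40, width 80) used as the default is a common choice for hemorrhage review and is only an illustrative assumption here.

```python
# Minimal sketch of HU windowing: clip to [level - width/2, level + width/2]
# and rescale to the 8-bit grayscale range [0, 255]. The default brain window
# (level 40, width 80) is an illustrative, commonly used choice.
import numpy as np

def window_ct(hu: np.ndarray, level: float = 40.0, width: float = 80.0) -> np.ndarray:
    lo, hi = level - width / 2.0, level + width / 2.0
    hu = np.clip(hu, lo, hi)                     # discard values outside the window
    return ((hu - lo) / (hi - lo) * 255.0).astype(np.uint8)

# Example: a few HU values (air, blood, bone) mapped to grayscale.
print(window_ct(np.array([-1000.0, 60.0, 1000.0])))   # approximately [0 191 255]
```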
Deep convolutional neural networks are a remarkable branch of machine learning for
visual data that have gained considerable attention in recent years due to their
striking success in various computer vision applications, such as object recognition,
detection, and segmentation. Deep CNNs have achieved significant results [3, 4].

Convolutional neural networks discover the associations between the pixels of input
images by extracting characteristic features through pooling and convolution. The
features discovered using the learned kernels at each layer vary in complexity [5]:
the first layers extract basic features like edges, and deeper layers extract more
complex, high-level features. The convolution operation in CNNs is supported by three
main properties: (i) a weight-sharing mechanism to handle 2D images or 3D data, such
as volumetric images and videos [6, 7], (ii) local connectivity over the input
topology exploited using 2D or 3D kernels, and (iii) slight shift-invariance generated
by the pooling layer [8].
Newly proposed very deep CNN architectures replace the traditional convolutional layer
with more robustly represented modules while using limited computational resources
[9, 10]. Szegedy et al. [11] introduced inception modules that could extract
multi-scale features from the input feature maps and could effectively decrease the
number of parameters. The inception module was developed with a more uniform,
simplified architecture compared to past versions, and therefore the performance
achieved for the large-scale image classification task was of top class.
For years, traditional supervised machine-learning approaches [12, 13] were developed
using well-engineered algorithms and hand-crafted training techniques. The method
consisted of taking the raw data, describing its content with low-dimensional feature
vectors (using detailed prior knowledge of the problem at hand) and feeding the
vectors into a trainable classifier. Although the classifier was indeed useful for
other purposes, the features were not generic in principle, and the precision of the
method would depend on how well the heuristics were designed.
For array-like data, such as images or video sequences, the most important type of
deep neural network is the convolutional neural network. The concept behind a CNN,
from a top-level standpoint, is to identify the compositional hierarchy of features
that objects in real-life scenes exhibit [14].
The next section of this paper describes the materials and methods used in the
detection of intracranial hemorrhage, including the detailed architecture, algorithm,
and implementation procedures of the proposed method. Section 3 presents the results
and discusses them in comparison with existing systems. Finally, Sect. 4 concludes
this research and proposes guidelines for future research.

2 Materials and Methods

The paper proposes a method using deep CNN and weighted multilabel focal loss
for the classification of intracranial hemorrhages into epidural hemorrhage, intra-
parenchymal hemorrhage, intraventricular hemorrhage, subarachnoid [15] hemor-
rhage, and subdural hemorrhage.

Fig. 2 Target class distribution of the training data (hemorrhage subtypes: epidural, intraparenchymal, intraventricular, subarachnoid, subdural)

Dataset
The paper used the RSNA intracranial hemorrhage dataset [16] for the analysis of
intracranial hemorrhage. The dataset contains 4,516,818 DICOM-format images covering
five different types of intracranial hemorrhage, together with the associated
metadata, labelled with the help of 60 volunteers. Figure 2 shows the distribution of
the training data. Since a standard training and validation split was not provided for
the dataset, we split it 70:30.
Image augmentation techniques such as rotation, zoom, scale, and translation were
applied before splitting the dataset. Adversarial cross-validation was also performed
while evaluating model performance to avoid any data leakage.
Proposed Deep CNN Architecture
A deep CNN with regularizing layers [17] such as max pooling and dropout is used to
obtain embedding features from the CT scans, which are then classified into the
different classes using a fully convolutional neural network.
The architecture of the model, with its corresponding input and output shapes, is
shown in Fig. 3. Since the total number of trainable parameters is 5,147,716, which
could result in overfitting during training, batch normalization, max pooling, and
dropout layers are applied to the model, which increased the generalizability of the
proposed model.
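The sketch below is an illustrative Keras model of the kind of stack described: convolution blocks regularized with batch normalization, max pooling, and dropout, followed by a sigmoid multilabel head. The filter counts, input size, and number of output labels are assumptions for illustration and do not reproduce the exact configuration behind the 5,147,716 trainable parameters of Fig. 3.

```python
# Illustrative Keras sketch of stacked convolution blocks with batch
# normalization, max pooling and dropout, ending in a sigmoid multilabel head.
# Filter counts, input shape and label count are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(input_shape=(256, 256, 1), n_labels=5):
    inputs = layers.Input(shape=input_shape)
    x = inputs
    for filters in (32, 64, 128):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)       # regularizing layers used
        x = layers.MaxPooling2D()(x)             # to curb overfitting
        x = layers.Dropout(0.25)(x)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(n_labels, activation="sigmoid")(x)  # multilabel head
    return models.Model(inputs, outputs)

model = build_model()
model.summary()
```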
Loss Function
Our dataset was highly imbalanced, with very few images of epidural hemorrhage, while
the other forms of hemorrhage had roughly the same distribution. Because of this
imbalance, loss functions such as categorical cross-entropy did not reach the global
minimum, so we used a weighted-class approach and a weighted multilabel focal loss to
solve this problem and found that the weighted multilabel focal loss handled the class
imbalance problem very well, as shown in Table 1.
The weighted multilabel focal loss used in our methodology is given as

$$L = \frac{1}{N}\sum_{n=1}^{N}\sum_{m=1}^{M} w_m\,[a + b] \qquad (1)$$

Fig. 3 Architecture of the proposed method with layer type and output shape

Table 1 Results comparison


MT REG AUG Objective function Log loss
CNN No No CCE 0.6923
Deep CNN Yes Yes CCE 0.8144
Deep CNN Yes Yes CCE + WC 0.8529
Deep CNN Yes Yes WFL 0.9721
MT model type; REG regularization; AUG augmentation; CCE categorical cross entropy; WC
weighted classes; WFL weighted focal loss

where $a = (1-\alpha)\,(1-y_{n,m})^{\gamma}\, t_{n,m}\, \ln(y_{n,m})$ and $b = \alpha\, y_{n,m}^{\gamma}\, (1-t_{n,m})\, \ln(1-y_{n,m})$

$$L = \frac{1}{N}\sum_{n=1}^{N}\sum_{m=1}^{M} w_m\,\big[c \cdot \ln(y_{n,m,t})\big] \qquad (2)$$

where $c = (1-\alpha_t)\,(1-y_{n,m})^{\gamma}$.
Here, w_m is the class weight, α is the weighing factor, and γ is the focusing
parameter, which is tuned in the range [0, 5]; it was observed that on moving from
γ = 0 to γ = 5, the evaluated loss [18] received higher contributions from the
imbalanced classes that were wrongly classified.
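A possible TensorFlow realization of Eq. (1), as reconstructed above, is sketched below; a leading minus sign is added so that the minimized quantity is positive, and the class weights, α, and γ defaults are placeholder values rather than the tuned settings of the paper.

```python
# Possible TensorFlow realization of the weighted multilabel focal loss in
# Eq. (1), with a leading minus sign so the minimized quantity is positive.
# Class weights, alpha and gamma defaults are placeholder values.
import tensorflow as tf

def weighted_multilabel_focal_loss(class_weights, alpha=0.25, gamma=2.0):
    w = tf.constant(class_weights, dtype=tf.float32)
    def loss(t, y):                               # t: targets, y: sigmoid outputs
        y = tf.clip_by_value(y, 1e-7, 1.0 - 1e-7)
        a = (1.0 - alpha) * tf.pow(1.0 - y, gamma) * t * tf.math.log(y)
        b = alpha * tf.pow(y, gamma) * (1.0 - t) * tf.math.log(1.0 - y)
        return -tf.reduce_mean(tf.reduce_sum(w * (a + b), axis=-1))
    return loss

# Example: heavier weight on the rare epidural class (ordering is illustrative).
loss_fn = weighted_multilabel_focal_loss([2.0, 1.0, 1.0, 1.0, 1.0])
t = tf.constant([[1.0, 0.0, 0.0, 1.0, 0.0]])
y = tf.constant([[0.8, 0.1, 0.2, 0.6, 0.05]])
print(float(loss_fn(t, y)))
```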

3 Result and Discussion

After careful experimentation with different methodologies and evaluation using the
log loss metric, we found that our proposed method with the deep CNN architecture and
weighted focal loss [19] performs very well and achieves 97% accuracy. In comparison
with the other methodologies, the key takeaway is that regularization techniques and
proper augmentation were the key factors that helped achieve the top accuracy.
Figure 4 shows the training and validation accuracy of the proposed model trained for
40 epochs, while Fig. 5 shows its training and validation losses.

4 Conclusion and Future Work

Although the number of hemorrhages in the test sets is low, especially when broken
down by type, the findings provide important insights. Intraparenchymal hemorrhages
were recognized at the highest rate; typically, they were hyperattenuating and
enclosed by normal tissue. Epidural hemorrhage was evident straight away.

Fig. 4 Training and validation accuracy of the proposed model

Fig. 5 Training and validation loss of the proposed model

Missed subdural hemorrhages were primarily hypoattenuating with regard to healthy
tissue. Subarachnoid hemorrhages are difficult to detect; they are typically thin,
with blood filling the sulci, which are fissures of the cortex.
We use a deep convolutional neural network to detect brain hemorrhage. The method
proposed with the deep CNN architecture and weighted focal loss is 97% accurate. This
supports future research on the identification of multiple pathologies from brain
CT scans. The solution proposed should not be misconstrued as a credible replacement
for real radiologists in the field.
In short, the suggested deep CNN system demonstrates the ability to be used as a
tool for emergency exams. Still, the method has been tested on a limited test set and
is still subject to further experimentation for its real-world implementation.

References

1. Mandybur, T.I.: Intracranial hemorrhage caused by metastatic tumors. Neurology 27(7), 650–
650 (1977)
2. https://www.pnas.org/content/116/45/22737
3. Rao, A.A., Patel, M.D.: Deep 3D convolution neural network for CT brain hemorrhage classi-
fication. In: Proc. SPIE 10575, Medical Imaging 2018: Computer-Aided Diagnosis, 105751C
(27 Feb 2018). https://doi.org/10.1117/12.2293725
4. Khan, S.N., Usman, I.: A model for English to Urdu and Hindi machine translation system using
translation rules and artificial neural network. Int. Arab J. Inf. Technol. 16(1), 125–131 (2019)
5. https://stats.stackexchange.com/questions/362988/in-cnn-do-we-have-learn-kernel-values-at-
every-convolution-layer
6. https://datascience.stackexchange.com/questions/26755/cnn-how-does-backpropagation-
with-weight-sharing-work-exactly
7. Bashir, T., Usman, I., Khan, S., Rehman, J.U.: Intelligent reorganized discrete cosine transform
for reduced reference image quality assessment. Turkish J. Electr. Eng. Comput. Sci. 25(4),
2660–2673 (2017)
8. https://stats.stackexchange.com/questions/121703/what-does-shift-invariant-mean-in-convol
utional-neural-network
9. Khan, A., Sohail, A., Zahoora, M.M.E., Qureshi, A.S.: A survey of the recent architectures of
deep convolutional neural networks. Artif. Intell. Rev. (2019). https://doi.org/10.1007/s10462-
020-09825-6
10. Shahnawaz, Mishra, R.B.: An English to Urdu translation model based on CBR, ANN and
translation rules. Int. J. Adv. Intell. Paradig. 7(1), 1–23 (2015)
11. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V.,
Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1–9
12. Kotsiantis, S.B.K.: Supervised machine learning: a review of classification techniques.
Informatica 31, 249–268 (2007)
13. Khan, S., Kannapiran, T.: Indexing issues in spatial big data management. In: International
Conference on Advances in Engineering Science Management and Technology (ICAESMT)-
2019, Uttaranchal University, Dehradun, India (Mar, 2019)
14. L’azaro-Gredilla, M., Liu, Y., Phoenix, D.S., George, D.: Hierarchical compositional feature
learning. arXiv:1611.02252 [Online], https://arxiv.org/pdf/1611.02252.pdf
15. Thorgood, M., Adam, S.A., Nlann, J.: Fatal subarachnoid haemorrhage in young women: role
of oral contraceptives. Brit. Med. J. 283, 762 (1981)
16. https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/data
17. https://machinelearningmastery.com/dropout-for-regularizing-deep-neural-networks/
18. https://machinelearningmastery.com/loss-and-loss-functions-for-training-deep-learning-neu
ral-networks/
19. Multi-class classification with focal loss for imbalanced datasets [Online]. https://www.dlo
logy.com/blog/multi-class-classification-with-focal-loss-for-imbalanced-datasets/
A Multi-factor Approach for Cloud
Security

Francis K. Mupila and Himanshu Gupta

Abstract Cloud computing is known for its complexity regarding the different models
of deployment and services that it offers. However, security remains a massive
hindrance to its development. Hence, the multi-factor approach to securing the cloud
environment proposed by this paper relies on authentication and auditing as the
fundamental elements for sustaining the privacy of information in the cloud. These
are needful assets to counter various threats and attacks at the cloud service
provider as well as at the user end. This paper proposes a multi-factor approach
through which a user's identity is verified securely, as well as a means to build
trust between the client and the cloud service provider by allowing proper visibility
of the user's activities.

Keywords Authentication · Auditing · Cloud computing · Cloud security · User's trust

1 Introduction

Security in cloud computing is a broad topic, since cloud computing works in terms
of the services provided to users; therefore, its security should be applied
proportionately. Following the architecture of cloud computing, namely SaaS, PaaS,
and IaaS, each level requires particular attention to its security, as the levels do
not face the same threats. Moreover, the cloud service provider should dedicate and
implement the appropriate security role for the needed application or resources so as
not to slow its performance. The security of the cloud environment aims to ensure
the user's trust in their data and to prevent vulnerabilities from being exploited. It
also aims to prevent threats that cause errors in the infrastructure and to reduce
the likelihood of attacks.
Cloud computing is virtual by nature. Its complexity stems from the services and
techniques running in the background, such as elasticity, scalability, and ubiquity.
This makes it hard to secure and easy to breach.
Data protection encompasses a wide variety of laws, technologies, and policies
to protect data, applications, and the ever-expanding cloud computing infrastructure.
The Cloud Security Alliance (CSA) published a detailed study on the top 12
information security risks [1], which are listed below.
• Data breaches
• Insufficient Identity, Credential, and Access Management
• Vulnerable IPS and APIs
• Vulnerabilities of the system
• Account Hijacking
• Malicious Insiders
• Advanced Persistent Threats
• Data Loss
• Inadequate Due Diligence
• Abuse and Nefarious Use of Cloud Services
• Denial of Services
• Shared Technology Vulnerabilities.
There are various approaches to address these threats, such as strong and appropriate
end-to-end encryption, better isolation of resources, strong authentication, and
monitoring and auditing, among others. In this regard, this paper proposes a
multi-factor approach focused on authentication, auditing, and monitoring, composed
of steps shared between the cloud service provider and the end user to manage trust
and the integrity of the user's data.
By definition, authentication and authorization are two different terms, although
they have related implementations. Authentication is performed to verify the user
before any resources are accessed, whereas authorization focuses on giving the right
access to the right user. Multi-factor authentication (MFA) is defined as an
authentication scheme in which a computer user is granted access only after
successfully presenting two or more pieces of evidence (or factors) to an
authentication mechanism, for example, knowledge (something only the user knows),
ownership (something only the user possesses), and inherence (something only the
user is) [2]. A multi-factor authentication approach preserves the confidentiality
and integrity of the users. It deploys multiple manners of authentication, which
guards against attacks such as the man-in-the-cloud attack, the man-in-the-middle
attack, the phishing attack, and so on, which may result in the modification of data.
The end users' concerns are about the storage and location of their data, since they
have no physical access to the data centre; this creates a significant trust problem
with the service provider to which they are subscribed. One of the schemes to build
trust between the user and the cloud service provider is to implement a reliable
authentication system that guarantees the data's integrity and offers proper
visibility to the user over their data. Auditing and monitoring the cloud-based
application is an essential feature, as it helps to recognise any suspicious activity
in the network, which can be detected while tracking and analysing the infrastructure.
As mentioned above, the overall idea of this paper is composed of six steps that
emphasise authentication, auditing, and monitoring. The aim is to strengthen the
trust of the cloud consumer and to provide a new way to secure the cloud computing
environment.

2 Related Work

In view of the proposed work, findings of other researchers regarding the security
of cloud computing are highlighted here. Ganesh V. Gujar proposed STEP-2 user
authentication, in which a dynamic token from a hash table is sent to the user's
email ID; the token value is then required for the step-2 authentication at the user
interface. An additional feature handles session management: the dynamic token
generated from the hash table remains valid for a particular session only. Once the
user logs out from the cloud environment, the token expires [3].
Prachi Soni proposed a multi-factor authentication security framework in cloud
computing. Here, an elliptic curve point algorithm is implemented, executing ten
steps to assure authentication and authorization, while data confidentiality,
integrity, and access control are based on attribute certificates. This technique
gives power and control to the client by using a combination of cryptographic
methods, and the access control keeps the data safe from vulnerabilities [4].
Sabout Nagaraju suggested SecAuthn, a provably secure multi-factor authentication
scheme for cloud computing environments. Four factors, namely a key credential, a
username and password, a biometric fingerprint, and an OTP, are used. A
station-to-station Diffie-Hellman key exchange is then used to prepare, encrypt, and
share one-time session keys, after which the hashed credentials are checked in the
authentication servers against the original credentials. The proposed authentication
scheme offers true security to the cloud user's credentials with the aid of GNY
logic [5].
Kashif Munir stated that an in-depth security strategy must be enforced to protect
against threats to the integrity and safety of applications and data. This line of
defence includes firewalls, intrusion detection and prevention, reputation
management, log review, and malware protection. Prudent organisations and service
providers can apply such protection in their cloud infrastructure to provide data
protection and gain leverage in cloud computing before their competitors [6].
Despite a number of efforts to address this problem, issues such as identification,
privacy, personalization, integration, protection, and scalability remain major
barriers to cloud adoption. These are only a few of the related works, but they show
that the security of cloud computing remains an important factor.

3 Proposed Model

This work aims to reinforce the security of cloud computing. Considering that a
single factor is not enough to secure a cloud-based environment, it is proposed that
a multi-factor approach can provide more security to the environment and establish
trust between the user and the cloud service provider. The proposed work offers a
secure environment, presenting a technique to keep track of all activity and a
reliable authentication process that gives power to the client so that the latter
has control over their activities and data. These processes verify the user's
credentials and monitor their access control over the resources.
As previously mentioned, authentication is crucial for information privacy;
therefore, the first part of this work relies on authentication to mitigate various
cyber-attacks on cloud computing. The second part concerns monitoring the user's
activities and their access control, with the focus on granting the user visibility
over their log records and the ability to track their activities. This follows the
six steps of the proposed framework, in which security responsibilities are shared
between the client and the cloud service provider in order to avoid a lengthy
process on only one side of this communication.
Consequently, one of the most important difficulties in integrating cloud-based
security is ensuring unified access and accountability across the various domains;
a mixture of public, private, and even hybrid cloud-based services makes the
integration of security services a key task, despite several established networks [7].
The lack of visibility creates gaps in the overall safety of an organisation's
network, making it difficult to see attacks. In the old network architecture, all
structures sat inside the wall of an organisation, that is, under its control, so
achieving maximum visibility in the network was not an essential challenge. However,
when the cloud is adopted, some control is lost and, consequently, full visibility is
no longer achievable. Visibility is the main takeaway, since devices that cannot be
seen cannot be secured [8].

3.1 Overview of the Proposed Approach

The concerns of this work apply to the cloud service provider side as well as to the
user side. Of the six steps that make up this process, steps one and four are
executed at the client side, whereas steps two, three, five, and six are executed at
the service provider side.

Fig. 1 Brief description of interaction to the cloud infrastructure

It is known that a private cloud is more secure, since all data are within the
boundary (firewall) and are not available to the general public; in a public cloud,
by contrast, the subscriber or client does not know the structure of the data centre,
in particular which server processes the data, how the network is implemented, or how
secure the environment is.
As a matter of fact, there is an urgent need to focus more on preventing breaches of
confidence than on post-service lack of accountability, so as to diminish the
concerns that hinder progress and to fully benefit from the unprecedented advantages
that cloud computing has to offer. An effective, standardised trust management system
is required for individuals and organisations to adequately utilise the potential
benefits served by cloud computing technology [9].
Figure 1 illustrates the basic interaction between the client and the cloud service
provider. It is well known that authentication is the first operation to take place
in order to verify the user's credentials.

3.2 Detailed Description and Working Principle

(1) Step 1
Step one of this proposed model concerns a standard login process, which means
the user has to enter their credentials to access the requested resource in the
cloud. However, to mitigate identity theft, a unique verification method needs
to be set up to confirm the email address through which the user registers for
the cloud service, along with the establishment of some other parameters to be
used in further steps.
(2) Step 2
After the user has gained access to the cloud services, it is the responsibility
of the cloud service provider to monitor all the activities performed by the user.
The first task of the cloud service provider is to send a login report to the
client, through the email address given at registration time, containing details
of the login such as:
• Location
• Device IP address
• Device type
• Device MAC address
• Time
The log report is essential in preventing privacy leakage of the user's data. The
detailed report not only provides accurate information about the login session
but also enables the user to trace their activities.
Figure 2 demonstrates the steps followed in the proposed model at the user end
as well as at the cloud service provider end.

Fig. 2 Steps of the proposed model

(3) Step 3
In this method, a session time is introduced not only to establish trust between
the user and the service provider but also to maintain the confidentiality and
integrity of data.
At the time of registration or subscription to the cloud, the user must determine
the duration of their daily session, which allows the service provider to convey
a new password to the user's email address once the scheduled session expires.
The session is suspended until the user logs in again.
The user needs to log in again with the new password, which is shared by the
service provider through the registered email address, to keep the session
active. The email sent to the user contains an encrypted password of at least
12 characters. The user must specify the decryption method at registration time
to be able to decrypt the new password and resume the session (a minimal sketch
of this password renewal is given after the six steps).
(4) Step 4
Once the user logs in again with the new password, the session resumes unless
there is a mismatch with the password that was sent. Accordingly, if an attacker
is present, he or she is automatically logged out of the session. Most
importantly, all connected devices that fail to reconnect using the new password
are dropped from the session.

(5) Step 5
The cloud service provider ensures the security of the client over the remote
network, since the security methods applied by the client or subscriber no longer
apply there. This step provides more visibility of the user's activity. After the
client has entered the new password, the cloud service provider performs another
action to maintain the visibility of the user's activities: another login report
is sent to the user via the registered email address, considering:
• Current image of the client
• Screenshot of the current page
• Monitored report of the previous session
• IP address confirmation
• Location confirmation
(6) Step 6
The final step consists of a final report sent from the cloud service provider to
notify the client about the completion of the current session.
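As a minimal, hypothetical sketch of the password renewal described in step 3, the snippet below generates a random password of at least 12 characters and encrypts it with a symmetric key assumed to have been agreed with the user at registration time; Fernet is just one possible choice, since the framework leaves the exact encryption/decryption method to the user.

# Hypothetical sketch of the step-3 password renewal.
import secrets
import string
from cryptography.fernet import Fernet

def generate_session_password(length=12):
    # Random password of at least 12 characters (step 3 requirement).
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))

def encrypt_for_user(password, user_key):
    # user_key: a symmetric Fernet key shared with the user at registration (assumption).
    return Fernet(user_key).encrypt(password.encode("utf-8"))

# Example:
# key = Fernet.generate_key()          # stored by both parties at registration
# token = encrypt_for_user(generate_session_password(), key)
# ...the token is then emailed to the registered address.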

The challenges of cloud security are not insurmountable. With the right partners,
technology, and foresight, companies can leverage the benefits of cloud technology.
A trusted administration service can be cloud independent, but the trust techniques
and evaluation features must be consistent with the IaaS, PaaS, or SaaS cloud model
underlying this approach [10]. We argue that a multi-factor strategy has the
capability to establish such trust.
The management viewpoint and techniques are crucial. It is believed that the steps
presented here are essential for the secure functioning of the cloud. These six steps
are user friendly because of the shared-security method deployed between the user
side and the service provider, and they are quick, reliable, and robust.

4 Research Analysis

The proposed model overcomes some of the threats and attacks cited in Table 1.
Besides granting authority to the users, control is another critical issue that
builds trust; in fact, we trust a system less when we do not have much control over
our assets.
There is, of course, no way to ensure that the cloud is fully safe for customers. The
importance of trust differs between organisations, depending on the nature of the
data. Therefore, the less confidence a company places in the cloud provider, the more
it needs technology to monitor its data [11].
Table 1 indicates some attacks and threats that challenge the cloud environment and
prevent the consumer from fully trusting the service provider.

Table 1 Possible problems in the cloud

Threats              Security control                     Attacks                        Security control
Migration to cloud   Strong authentication                Zombie attack                  Strong authentication
Cloud API            Strong authentication                Attack on monitoring           Monitoring with IDS/IPS
Insider attack       Monitoring                           Spoofing attack                Strong authentication
Data loss            Strong authentication and auditing   Back door and channel attack   Strong authentication
Risk profiling       Monitoring                           Phishing attack                Strong authentication
Identity theft       Strong authentication                Man-in-the-middle attack       Strong authentication and encryption

5 Future Work

Despite its limitations, this work provides valuable elements, namely reliable
authentication and proper visibility of activities, which give authority to the user
and establish trust between the user and the cloud service provider. Further research
should focus on securing the email address in such a way that an intruder or attacker
cannot gain access to it and obtain the new password. The implementation of a secure
authentication method is also favoured.

6 Conclusion

In summary, this paper argued that visibility of activities for the user is a
valuable asset because it brings the trust needed to adequately utilise the potential
benefits served by cloud computing technology. This work identifies the types of
cloud services that the technique supports and outlines a suitable trust management
system. In addition to authentication and authorization procedures, we consider the
use of an audit monitoring system, which records all successful and unsuccessful
authentication and access attempts, to be a genuine way to build trust and to assess
attacks. For this reason, both the client and the cloud service provider are
responsible for maintaining security.

References

1. Walker, K.: The treacherous twelve’ cloud computing top threats. In: RSA Conference Booth
#S2614, SAN FRANCISCO, Cloud Security Alliance, 29 Feb 2016
2. Multi-Factor Authentication, From Wikipedia, the free encyclopaedia. https://en.wikipedia.org/wiki/Multi-factor_authentication
3. Gujar, G.V., Sapkal, S., Korade, M.V.: STEP-2 user authentication for cloud computing. Int. J.
Eng. Innov. Technol. (IJEIT) 2(10), ISSN: 2277-3754 ISO 9001:2008 Certified Apr 2013
4. Soni, P., Sahoo, M.: Multi-factor authentication security framework in cloud computing. Int. J.
Adv. Res. Comput. Sci. Softw. Eng. (IJARCSSE) 5(1), 1065–1071 (2015). ISSN: 2277 128X
5. Nagaraju, S., Parthiban, L.: SecAuthn: provably secure multi-factor authentication for the cloud
computing systems. Ind. J. Sci. Technol. (IJST) 9(9) (2016). ISSN (Online): 0974-5645
6. Munir, K., Palaniappan, S.: Secure cloud architecture. Adv. Comput. Int. J. (ACIJ) 4(1) (2013)
7. SDxCentral Staff “What is Cloud-Based Security” Topic Hub/Security, 17 Oct 2015 1:57 PM.
https://www.sdxcentral.com/security/definitions/what-is-cloud-based-security/
8. Sarai, S.: Building the new network security architecture for the future. In: SANS Institute,
Information Security Reading Room, SANS White Paper, Jan 2018
9. Khan, M.S., Warsi, M.R., Islam, S.: Trust management issues in cloud computing ecosys-
tems. In: International Conference on Sustainable Computing in Science, Technology and
Management (SUSCOM) (2019)
10. Noor, T.H., Sheng, Q.Z., Maamar, Z., Zeadally, S.: Managing trust in the cloud: state of the art
and research challenges. IEEE Computer Society, IEEE Xplore 2016
11. Khan, K.M., Malluhi, Q.: Establishing trust in mobile cloud computing. J. ICIC Express Lett.
9(6), 1713–1718
An Innovative Authentication Model
for the Enhancement of Cloud Security

Francis K. Mupila and Himanshu Gupta

Abstract Cloud computing provides different types of services deployed in different
models; thus, its security has become a paramount concern in the IT field. In this
paper, a conceptual framework is proposed to mitigate various authentication threats
by introducing an encrypted certificate and a token built using the user's
geographical location, in such a way as to enhance security and protect users against
data loss and unauthorized access by unauthorized users and hackers.

Keywords Authentication · Cloud computing · Cloud security · Web API · API gateway

1 Introduction

Cloud computing is a significant aspect of the development of computing technology.
It allows big and small organizations to manage and access their infrastructure,
applications, storage, networks, directories, and platforms at various data centers
on a distributed system through the Internet. Cloud computing is deployed according
to the user's interest and needs, for example as a public cloud, a private cloud, a
hybrid cloud, or a community cloud. Furthermore, its service models include Software
as a Service (SaaS), Platform as a Service (PaaS), Infrastructure as a Service (IaaS),
Network as a Service (NaaS), and Directory as a Service (DaaS).


According to its definition in computer science, authentication refers to the process
of verifying the identity of a connected device. In other words, authentication
refers to how an application determines who you are. It should not be confused with
authorization, which refers to how an application limits access to users.
Authentication and authorization are two requirements that must both be satisfied [1].
The verification process can be based on five types of factors: Type 1, what the user
knows, such as a password or PIN; Type 2, what the user has, such as a phone, token,
or smart card; Type 3, what the user is, such as a fingerprint, retina, or voice;
Type 4, where the user is, such as a location or network; and Type 5, what the user
prefers, such as a signature or pattern.
In order to avoid attacks such as man in the cloud, man in the browser, or any other
attack, the cloud service provider must set up a robust, reliable, and trustworthy
mechanism to prevent and detect unauthorized users trying to access the data or
resources. Even though no security method lasts long without being tampered with,
this paper proposes a conceptual framework to enhance the security of cloud computing
by introducing an encrypted certificate and a token built using the user's
geographical location, in order to enhance security and protect against data loss and
unauthorized access.
Figure 1 shows the login interface which is used to enter the credentials.

Fig. 1 Login interface (username and password fields with a "Remember Me" option and a "Log In" button)

2 Importance of Security

The IT industry is evolving day by day. Companies, organizations, agencies, and other
institutions are migrating from on-premise environments to cloud environments. In
this process, the complexity of the cloud intensifies. The cloud is not simply used
for data storage; it also keeps data highly available, exchanges information between
the client and the cloud service provider, handles the communication channel, and so
on. The cloud's complexity makes it vulnerable, which can result in data loss [2].
Therefore, cloud security is a significant aspect to be taken seriously.
By definition, cloud security is a collection of systems, technologies, software, and
controls used to secure virtualized cloud computing infrastructure, IP, data,
applications, services, and the related cloud infrastructure [3]. Users must use a
secure connection to access the cloud environment, and the cloud service provider
must maintain a high security level, for instance by using a robust encryption system
to preserve the integrity of the data. Besides, it is essential to remember that
increasing security beyond a certain extent reduces performance and functionality;
therefore, safety and performance should be kept at almost equal levels [4].
One of the most dangerous attacks in the cloud environment is the insider attack.
Thus, the cloud service provider must conduct routine background checks on workers
who have physical access to the servers at the data center and must monitor the data
center for suspicious activities [5]. Subsequently, the cloud service provider must
ensure proper data isolation and logical storage segregation, since more than one
customer's data are stored on the same server. This sharing of resources can
contribute to information leakage, and 75% of security problems are triggered by
it [6].
In addition, the primary concern of cloud protection is the secrecy and privacy of
data. The system should therefore be sufficient and highly scalable, so that safety
is not an additional requirement but exists as an essential feature of the system at
all levels (computing, communications, and the service-level agreement) [7].
Accordingly, with the growth of cloud technology, authentication remains a key
factor, as it ensures that the user is who they claim to be [8]. Trust,
confidentiality, integrity, availability, authentication, and authorization are the
most critical security problems in cloud computing [9].

3 Related Work

Several techniques and methods have been deployed to provide sustainable
authentication; some of these techniques and authentication-based works are listed
below:
(1) Certificate-based authentication is widely used in industry today because a
digital certificate is incorporated to authenticate the customer. The first time a
user uses a service, the user installs a unique certificate on their device;
whenever the user accesses the service, the server asks the device for that
specific certificate, and access is given only if the certificate is valid [10].
(2) Server authentication with location verification addresses the problem of Web
authentication. Here, the authors leverage the server location as a second factor
of authenticity by introducing location-based server authentication, preventing
server impersonation at any cost even if the victim server's secret is known to
the attacker [11].
(3) Security algorithms for cloud computing have been reviewed, covering symmetric
algorithms for different encryption and encoding strategies; the review concluded
that AES is a good choice for key encryption and that MD5 is faster for encoding.
In addition, security can be improved by using 1024-bit RSA and 128-bit keys with
a combined RSA-AES encryption scheme, which safeguards the data protection of
cloud-based applications. The private key cannot be derived using AES even if the
attacker is provided with the public keys [12].
Authentication breaches are identified as the root cause of data losses in the cloud
environment. Once this is addressed, customers can be assured that the integrity of
their data stored in the cloud infrastructure is secure, just as a tree in the soil
is secured from its root.

4 Proposed Model

This work aims to provide a secure procedure through which the user's credentials
are entered and verified to gain access to the cloud environment. For this purpose,
the conceptual framework proposes an encrypted certificate and a token built using
the client's geographical location, deployed to provide trusted authentication and
authorization between the clients and the cloud service provider. The HTML5
Geolocation API, supported in Google Chrome 55 and other modern Web browsers, is used
to gather the user's geographical location. Even though this could compromise the
user's privacy, as noted in the new privacy regulations, the position is not
obtainable unless the user consents to it [13].
Using this feature allows the Web browser to notify the Web server about the user's
accurate location. There are, however, a large number of factors, technological,
geographical, and even physical, that influence how precise this feature is when
implemented in the real world [14]. The work proceeds in three phases, each of which
performs a particular task so as to enhance the security of the process of
identifying users before they gain access to their resources.
The API gateway involved in this work acts as a reverse proxy that collects the
users' requests and redirects them to the microservice in charge. It also reduces
the exposure to security breaches, as only one public IP address is exposed
publicly [15].
Fig. 2 Flow of the authentication cycle (client service, API gateway, authentication server, and Web server)

Figure 2 shows the cycle that the authentication in this proposed work follows; the
user's verification operation is consolidated into the three phases described below.

4.1 Working Principle of the Proposed Work

1. Phase 1
This phase covers the communication between the client service embedded in the Web
browser and the API gateway. The client's requests for a Web page from the Web server
are handled by the API gateway, as it is hosted inside the Web server and acts as the
entry point. Generally, the API gateway accommodates the SSL certificate,
authentication, authorization, and many other microservices according to the Web
server's configuration. Additionally, the API gateway is configured to receive and
manage all static requests from clients.
It receives the client's request and then forwards the login page. Besides, the API
gateway is configured to share the cryptographic hash function and the cryptographic
key with the client service so that the HMAC algorithm can be executed at the client
service side. A hash function is preferred here because of its swiftness in
computation and its ability to minimize any duplication of the output value.
2. Phase 2
After the exchange is established, the client service sends the user's credentials
along with the user's location, obtained through the browser's Geolocation API, to
the authentication server. The content sent through the network to the authentication
server is a hash-based message authentication code (a minimal sketch of this
computation is given after the three phases). From this point, the integrity and
confidentiality of the user's credentials and location are guaranteed. The
cryptographic hash function used for this model is SHA3-512. The authentication
server stores the result if it is correct, so that a further request cannot use the
same details (the username and password) to request access to the resources. This
procedure guards against one of the challenging threats faced by the cloud service
provider, namely the replay attack.
The sub-processes of this phase are as follows. First, the authentication server
generates the JWT token to be sent to the client service; the token contains a new
claim obtained by adding the user's location to the payload. Following that, the
authentication server and the Web server execute a key distribution exchange to use
the RSA cryptographic system. Finally, an encrypted certificate containing the user's
location is shared between the authentication server and the Web server to verify
the authenticity of the received token, given the additional claim added. Beyond
being a public-key cryptosystem, RSA is used in this proposed model to sign the
certificate and counter any alteration of the message.
Figure 3 shows how the exchange of the token, the key, and the certificate takes
place. Another important aspect is the refresh token: every new request sent from the
client service has to update its location with the authentication server, after which
the operation to obtain the JWT token takes place again.

Fig. 3 Exchange done in phase 2 of the proposed model


3. Phase 3
The encrypted token reaches the client service and is then sent to the Web server to
validate the JWT token. The pieces of information held in the certificate are then
verified with the private key of the RSA cryptosystem before access to the resources
is granted.
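As referenced in phase 2, the snippet below is a minimal sketch of the client-side HMAC computation over the credentials and geolocation using SHA3-512; the key is assumed to be the one shared by the API gateway in phase 1, and the field names are illustrative.

# Sketch of the phase-2 client-side HMAC over credentials and location (SHA3-512).
import hmac
import hashlib
import json

def credentials_hmac(shared_key: bytes, username: str, password: str, location: str) -> str:
    # Serialize the payload deterministically before hashing.
    payload = json.dumps(
        {"username": username, "password": password, "location": location},
        sort_keys=True,
    ).encode("utf-8")
    return hmac.new(shared_key, payload, hashlib.sha3_512).hexdigest()

# Example:
# tag = credentials_hmac(b"key-from-phase-1", "alice", "s3cret", "28.5355,77.3910")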
Such security techniques cover the security of data transfer, the user interface,
data separation, data storage, and user access control [16]. This research uses a
token to make security decisions and to store tamper-proof information about a device
individually. Although a token is usually used to carry only cryptographic details,
it is also capable of carrying additional free-form data that can be added when the
token is produced. A lack of good authentication can result in illegal disclosure of
cloud user accounts, which can contribute to breaches of privacy. Similarly, the
absence of authorization in cloud computing leads to infringements of privacy when
unauthorized parties access the user's database [17].

4.2 Program Code

4.2.1 String Representing the Token (Output)

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c

4.2.2 Decoded JWT Token with the Extra Claim (enc_loc)

HEADER: ALGORITHM & TOKEN TYPE
{
  "alg": "HS256",
  "typ": "JWT"
}

PAYLOAD: DATA
{
  "sub": "1234567890",
  "name": "John Doe",
  "iat": 1516239022,
  "enc_loc": "1s0x390cfd5b347eb62d:0x52c2b7494e204dce"
}

VERIFY SIGNATURE
HMACSHA256(
  base64UrlEncode(header) + "." + base64UrlEncode(payload),
  your-256-bit-secret
)   [secret base64 encoded]
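As an illustration only, the snippet below sketches how such a token with the extra enc_loc claim could be issued and verified using the PyJWT library; the HS256 algorithm matches the header shown above, while the secret handling and claim values are assumptions.

# Sketch of issuing/verifying a JWT with the extra enc_loc claim (PyJWT).
import time
import jwt  # PyJWT

SECRET = "your-256-bit-secret"  # placeholder shared secret

def issue_token(subject: str, name: str, enc_loc: str) -> str:
    payload = {"sub": subject, "name": name, "iat": int(time.time()), "enc_loc": enc_loc}
    return jwt.encode(payload, SECRET, algorithm="HS256")

def verify_token(token: str) -> dict:
    # Raises an exception if the signature or the claims are invalid.
    return jwt.decode(token, SECRET, algorithms=["HS256"])

# Example:
# t = issue_token("1234567890", "John Doe", "1s0x390cfd5b347eb62d:0x52c2b7494e204dce")
# claims = verify_token(t)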

5 Future Work

It will be important for future research to investigate whether a request really
comes from an authorized user. This requires efficient and secure data transactions
in order to guarantee the integrity and confidentiality of the user's data through a
secure authentication technique. Most users prefer easy passwords, which become easy
for an attacker to guess, and even the best password can be stolen by brute-force and
dictionary attacks. Taking this into consideration, future work should consist of a
more in-depth analysis of the complexity present in the cloud environment, and a
hybrid encryption system is recommended to strengthen security.

6 Conclusion

The main conclusion drawn is that, even though there is no fully robust or stringent
technique for implementing security in the Web environment, this paper presents a
model to authenticate the user with the help of the user's geolocation. A cloud
security architecture works effectively only when the correct defensive
implementations are in place, and it is considered efficient only when it can
recognize the questions that arise in security management. To this end, the user's
access to the resource is granted only after the token and the certificate are
validated.

References

1. Turner, D.M.: Digital authentication: the basics. In: Cryptomathic. Archived from the Original
on 14 Aug 2016. Retrieved 9 Aug 2016
2. Gharami, S., Dinakaran, M.: Sequential mathematical solution for authentication and autho-
rization technique implementing encryption methodology creating a secure transaction using
various methods also at the quantum level. In: IOP Conference Series: Materials Science and
Engineering 2017
3. From Wikipedia, the free encyclopedia, https://en.wikipedia.org/wiki/cloud_computing_sec
urity
4. Bhardwaj, A., Goundar, S.: A framework to define the relationship between cybersecurity and
cloud performance. Comput. Fraud Secur. (2019)
5. Indu, I.A., Rubesh Bhaskar, P.M., Vidhyacharan: Identity and access management in a cloud
environment. Mech. Challenges Eng. Sci. Technol. Int. J. 21, 574–588 (2018)
6. Wueest, C., Barcena, M.B., O’Brien, L.: Mistakes in the IAAS Could Put Your Data
At Risk. https://www.symantec.com/content/en/us/enterprise/media/security_response/Whi
tepapers/mistakes-in-the-iaas-cloud-could-put-your-data-at-risk.pdf. May 2015
7. Subramanian, N., Jeyaraj, A.: Recent security challenges in cloud computing. J. Comput.
Eelectr. Eng. 71, 28–42 (2018)
8. Farooq, H., Lokhande, T.S., Rajeshri, R.: A review on cloud computing security using
authentication techniques. Int. J. Adv. Res. Comput. Sci. 8(2) (2017)
9. Kshetri, N.: “Privacy and security issues in cloud computing” The role of institutions and
institutional evolution. Telecommun. Policy 37, 372–386 (2013)
10. From Wikipedia, the free encyclopaedia. https://en.wikipedia.org/wiki/basic_access_authentic
ation
11. Yu, D.-Y., Ranganathan, A., Masti, R.J.: Salve: server authentication with location verification.
In: International Conference on Mobile Computing and Networking, Mobicom 2016
12. Bhardwaj, A., Subrahmanyam, G.V.B., Avasthi, V., Sastry, H.: Security algorithms for cloud
computing. In: International Conference on Computational Modelling and Security CMS
(2016)
13. The World’s Largest Web Developer. https://www.w3schools.com/html/html5_geolocation.asp
14. Rich, B.: Everything You Ever Wanted to Know About Html5 Geolocation Accuracy. Feb
2018. https://www.storelocatorwidgets.com/blogpost/20453/everything_you_ever_wanted_
to_know_about_html5_geolocation_accuracy
15. Bush, T.: API Gateway. 11 June 2019. https://nordicapis.com/what-is-an-api-gateway/
16. Gonzalez, N., Miers, C., Redigolo, F., Simplicio, M., Carvalho, T., Naslund, M., Pourzandi,
M.: A Quantitative Analysis of Current Security Concerns and Solutions for Cloud Computing.
Springer (2012)
17. Raju, B., Swarna, P., Rao, M.: Privacy and security issues of cloud computing. Int. J. (2016)
Substituting Phrases with Idioms:
A Sequence-to-Sequence Learning
Approach

Nikhil Anand

Abstract In this paper, a sequence-to-sequence model is proposed for translating
sentences without idioms into sentences with idioms. The problem is challenging in
two ways: predicting the correct idiom based on context, and generating the correct
sentence using the idiom given the complex semantic and syntactic rules of language.
Sequence-to-sequence learning has gained popularity in the past few years due to its
impressive results on machine translation tasks. This work is based on
sequence-to-sequence learning of word sequences along with their part-of-speech tags
to predict sentences with correct idiomatic phrases. The results show that the models
achieve a higher BLEU score when part-of-speech tags are used in the input sequences.
These observations show the prominence of part-of-speech tags in identifying hidden
writing patterns in the language.

Keywords Machine learning · NLP · Encoder–decoder · RNN · POS tags

1 Introduction

Communication has evolved in thousands of years. Humans have covered a very long
journey, starting from cave paintings to modern language. Cave paintings, ideograms,
petroglyphs, pictograms, and writing, all these communication techniques have the
same common idea of conveying meanings from one individual to another or one
group to another. Language is an ordered system of communication that has emerged
in the past thousands of years and is continuously evolving. New words, phrases, and
proverbs are continuously being added to languages. Apart from changing over time,
language is also reshaped within communities and groups of people. This leads to
variation in the same language across different periods and different groups.
Idioms are phrases of a language whose metaphorical meaning differs from the literal
meaning of the words comprising them. These phrases amplify the sentence
when they are used. New idioms are being added from time to time, gaining popu-
larity and becoming part of our daily use. Semantic and syntactical rules specific to
any language make linguistics difficult. Irregularities also emerged due to different
writing styles.
This paper explores the possibility of augmenting natural text by substituting
phrases by idioms. This is a development of the previous work on implementing
an idiom recommendation system using POS tagging and sentence parsing [1]. The
earlier work was entirely based on handcrafted rules. In the field of NLP, we have
observed that the irregularities in natural language restrict the effectiveness of rule-
based methods for any task. The impressive results of neural networks in various areas
of natural language processing have influenced this work. From sentiment analysis
to anomaly detection, from text generation to image captioning, deep learning has
shown its unmatched capabilities [2–5].
In this work, a sequence-to-sequence model is proposed that translates a sentence
without any idiom into a sentence with an idiomatic phrase based on the context.
Different variations of RNN encoder–decoder models are used for the experiment
without explicitly defining any syntactic rules. The paper is organised as follows:
Sect. 2 presents the literature survey, Sect. 3 the methodology, Sect. 4 the
experimental setup, Sect. 5 the results, and Sect. 6 the conclusions.

2 Literature Survey

2.1 Encoder–Decoder

The growing popularity of deep learning has led to various architectures for
different applications in machine learning. From image recognition to machine
translation, each task has specialized deep learning architectures [6, 7].
Encoder–decoder is one such neural network architecture, used for image compression,
neural machine translation, anomaly detection [3, 7, 8], etc.
The encoder–decoder architecture contains two connected components known as the
encoder and the decoder. When the encoder receives a source sequence, it reads the
sequence and converts it to a low-dimensional hidden-state feature vector; this
process is called encoding. The decoder reverses the process by transforming the
low-dimensional vector back into a sequence; this process is called decoding. Since
the encoder–decoder architecture is an end-to-end machine learning model, the
intermediate results are not directly visible, and it can be seen as mapping a source
sequence to a target sequence via an intermediate hidden representation that acts as
a feature extractor.
In machine translation, the encoder–decoder architecture has been implemented
successfully before, and different variations have been proposed in the past few
years. An RNN encoder–decoder was proposed for statistical machine translation [7].
Another similar approach, with LSTM layers as the encoder and decoder, was proposed
for machine translation and achieved high BLEU scores even for much longer
sentences [9].

2.2 Recurrent Neural Network

Recurrent neural networks (RNN) are specialized neural networks that take the order
of elements in long sequential inputs into account [10]. The gated architectures of
recurrent neural networks, such as LSTM and GRU, have gained popularity in recent
years due to their capability of capturing sequential regularities. There are two
major problems associated with RNNs: the vanishing gradient and the exploding
gradient [11, 12]. Long short-term memory, popularly known as LSTM, is a solution to
the vanishing gradient problem in recurrent neural networks [13]. It solves the
problem by using a gated architecture. Gates are used at each input state to decide
how much of the new input should be written to the memory cell and how much of the
content of the current memory cell should be forgotten. The LSTM architecture is
defined as:
   
$$s_j = R_{\mathrm{LSTM}}(s_{j-1}, x_j) = [c_j; h_j] \qquad (1)$$
$$c_j = f \odot c_{j-1} + i \odot z \qquad (2)$$
$$h_j = o \odot \tanh(c_j) \qquad (3)$$
$$i = \sigma\!\left(x_j W^{xi} + h_{j-1} W^{hi}\right) \qquad (4)$$
$$f = \sigma\!\left(x_j W^{xf} + h_{j-1} W^{hf}\right) \qquad (5)$$
$$o = \sigma\!\left(x_j W^{xo} + h_{j-1} W^{ho}\right) \qquad (6)$$
$$z = \tanh\!\left(x_j W^{xz} + h_{j-1} W^{hz}\right) \qquad (7)$$
$$y_j = O_{\mathrm{LSTM}}(s_j) = h_j \qquad (8)$$
$$s_j \in \mathbb{R}^{2 d_h},\; x_j \in \mathbb{R}^{d_x},\; c_j, h_j, i, f, o, z \in \mathbb{R}^{d_h},\; W^{x\circ} \in \mathbb{R}^{d_x \times d_h},\; W^{h\circ} \in \mathbb{R}^{d_h \times d_h} \qquad (9)$$

Here, $c_j$ and $h_j$ are the memory and hidden-state components, respectively. There
are three gates, $i$, $f$, and $o$, which stand for the input, forget, and output
gates.
The gated recurrent unit, popularly known as GRU, is an alternative to LSTM. The LSTM
architecture is hard to explain, and its complexity makes it hard to analyze [14].
There are computational constraints with LSTM networks as well. The GRU architecture
overcomes these shortcomings: it has fewer gates than LSTM and does not have a
separate memory cell. The GRU architecture is defined as:
 
$$s_j = R_{\mathrm{GRU}}(s_{j-1}, x_j) = (1 - z) \odot s_{j-1} + z \odot \tilde{s}_j \qquad (10)$$
$$z = \sigma\!\left(x_j W^{xz} + s_{j-1} W^{sz}\right) \qquad (11)$$
$$r = \sigma\!\left(x_j W^{xr} + s_{j-1} W^{sr}\right) \qquad (12)$$
$$\tilde{s}_j = \tanh\!\left(x_j W^{xs} + (r \odot s_{j-1}) W^{sg}\right) \qquad (13)$$
$$y_j = O_{\mathrm{GRU}}(s_j) = s_j \qquad (14)$$
$$s_j, \tilde{s}_j \in \mathbb{R}^{d_s},\; x_j \in \mathbb{R}^{d_x},\; z, r \in \mathbb{R}^{d_s},\; W^{x\circ} \in \mathbb{R}^{d_x \times d_s},\; W^{s\circ} \in \mathbb{R}^{d_s \times d_s} \qquad (15)$$

In a bidirectional RNN, the representation of each element of the sequence is based
on both past and future context [15]. Two different RNNs, one processing the sequence
from left to right and the other from right to left, are concatenated together. These
networks are efficient when features are extracted from the context window around a
word. The bidirectional RNN is defined as:
 
$$\mathrm{biRNN}(x_{1:n}, i) = y_i = \left[\mathrm{RNN}_{\mathrm{forward}}(x_{1:i}); \mathrm{RNN}_{\mathrm{backward}}(x_{n:i})\right] \qquad (16)$$
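For concreteness, the sketch below shows how these recurrent variants could be instantiated with Keras layers; the hidden size is an illustrative assumption.

# Illustrative construction of the recurrent variants discussed above (Keras).
from tensorflow.keras import layers

hidden = 64  # assumed hidden size
simple_rnn = layers.SimpleRNN(hidden, return_sequences=True)
gru        = layers.GRU(hidden, return_sequences=True)
lstm       = layers.LSTM(hidden, return_sequences=True)
# The Bidirectional wrapper concatenates the forward and backward outputs (Eq. 16).
bi_lstm    = layers.Bidirectional(layers.LSTM(hidden, return_sequences=True))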

2.3 Part-of-Speech Tagging

Parts of speech are word categories in a language, including nouns, verbs,
adjectives, determiners, adverbs, etc. POS tagging is a technique in which POS tags
are assigned to words and word sequences. POS tagging techniques are classified into
two categories: supervised and unsupervised.
The supervised POS tagging technique uses probabilities for assigning the POS tags.
A tagger is trained on a large tagged corpus, and a probabilistic approach is used
while tagging the data, considering unigrams, bigrams, trigrams, hidden Markov
models, etc. Due to the sequential training of the POS tagger, these show the best
results for sequential data only [16].
Rule-based taggers utilize grammatical information and a handcrafted set of rules
for assigning POS tags; they are among the earliest tagging practices. The
unsupervised approach is not as accurate as the supervised approach, although some
recent work has filled the gap between unsupervised and supervised approaches for
POS tagging by using bilingual graph-based projections [17].
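As a small, concrete example of supervised tagging, the snippet below uses NLTK's pretrained perceptron tagger; this is an off-the-shelf choice for illustration, as the paper does not prescribe a specific tagger here.

# Example of POS tagging with NLTK's pretrained tagger.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("We will reach there in a short period of time")
print(nltk.pos_tag(tokens))
# e.g. [('We', 'PRP'), ('will', 'MD'), ('reach', 'VB'), ...]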

3 Methodology

In this paper, a sequence-to-sequence model is proposed for substituting phrases with
idiomatic expressions. This work is an extension of the previous work on idiom
recommendation based on syntactic structure using rule-based methods, and it is
influenced by previous work on neural machine translation using sequence-to-sequence
learning [9].
Two different idioms, in a while and for a while/awhile, are used. To train the model
so that it can identify the correct idiom, the correct version of the idiom, and the
correct position of the idiom in the sentence, part-of-speech tag sequences are used
along with the word sequences.
The goal of the model is to estimate the conditional probability of $y_{1:m}$, where
the input sequence $x_{1:n}$ is concatenated with $\mathrm{POS}_{1:n}$. The recurrent
neural network first obtains the fixed-dimensional feature vector
$c = \mathrm{RNN}_{\mathrm{Encoder}}(x_{1:n}; \mathrm{POS}_{1:n})$ by stepping through
the input time steps. A conditional generator $\mathrm{RNN}_{\mathrm{Decoder}}(c)$ is
then used to step through the output time steps, with a softmax over all the words in
the vocabulary, to obtain $y_{1:m}$.
For the experiment, input text sequences, output text sequences, and POS tag
sequences are label encoded. Then, these sequences are padded to convert them to
fixed-size vectors. These fixed-size vectors are then used to train the encoder–decoder
framework (Fig. 1).
Four different versions of recurrent neural networks are used for the experiment:
SimpleRNN, GRU, LSTM, and Bi-LSTM. Each architecture is trained twice, once on word
sequences alone as input and once on word sequences concatenated with the
part-of-speech tag sequences.
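The snippet below is a compact sketch of this set-up: word indices and POS-tag indices are embedded, concatenated, and fed to an LSTM encoder, and an LSTM decoder with a softmax over the target vocabulary is conditioned on the encoder state. Vocabulary sizes, embedding dimensions, and the hidden size are illustrative assumptions, not the paper's exact settings.

# Sketch of the encoder-decoder with concatenated word and POS embeddings (Keras).
from tensorflow.keras import layers, models

VOCAB, POS_VOCAB, TGT_VOCAB = 5000, 50, 5000   # assumed sizes
EMB, POS_EMB, HIDDEN = 128, 32, 256            # assumed dimensions

# Encoder: words + POS tags of the source sentence
src_words = layers.Input(shape=(None,), dtype="int32", name="src_words")
src_pos   = layers.Input(shape=(None,), dtype="int32", name="src_pos")
w = layers.Embedding(VOCAB, EMB)(src_words)
p = layers.Embedding(POS_VOCAB, POS_EMB)(src_pos)
enc_in = layers.Concatenate()([w, p])
_, state_h, state_c = layers.LSTM(HIDDEN, return_state=True)(enc_in)

# Decoder: teacher-forced target sequence conditioned on the encoder state
tgt_in = layers.Input(shape=(None,), dtype="int32", name="tgt_in")
t = layers.Embedding(TGT_VOCAB, EMB)(tgt_in)
dec_out = layers.LSTM(HIDDEN, return_sequences=True)(t, initial_state=[state_h, state_c])
probs = layers.Dense(TGT_VOCAB, activation="softmax")(dec_out)

model = models.Model([src_words, src_pos, tgt_in], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")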

4 Experimental Setup

4.1 Dataset

For the experiment, 1275 sentences were collected from different web sources. The
dataset contains pairs of sentences: a sentence without an idiom and the same
sentence with an idiomatic phrase. These sentences can be classified between the two
idioms, in a while and for a while/awhile, based on the context. The second idiom can
take two forms, for a while or awhile, depending on the semantic rules of the
language. Sample data are shown in Table 1: the left column has the input sentences,
while the right column has the same sentences with idiomatic expressions replacing
the phrases.

Fig. 1 Proposed architecture with concatenated word embedding and part-of-speech embedding
as the input for the encoder–decoder framework

Table 1 Three sample sentences and their corresponding sentences with idiomatic expressions

Sentences                                        Sentence with idiom
We will reach there in a short period of time    We will reach there in no time
Stay for a short period of time and rest         Stay awhile and rest
I was on crutches for a short period of time     I was on crutches for a while

4.2 Evaluation Metrics

For measuring the performance of sequence-to-sequence learning models, various
quantitative methods are available, such as word error rate (WER), multi-reference
word error rate (mWER), BLEU score, subjective sentence error rate (SSER), and
information item error rate (IIER) [18].
Table 2 BLEU-4 scores from different architectures with and without concatenating the POS tag embedding as input along with the word embedding

Layers in model                          BLEU-4 score
SimpleRNN                                0.9420
SimpleRNN with concatenated POS tags     0.9454
GRU                                      0.9572
GRU with concatenated POS tags           0.9570
LSTM                                     0.9476
LSTM with concatenated POS tags          0.9568
Bi-LSTM                                  0.9623
Bi-LSTM with concatenated POS tags       0.9653

The BLEU score is a machine translation evaluation score that compares the n-grams
in the machine-generated text to the n-grams in the reference text. For this model,
we have used a cumulative score from 1-gram to 4-gram, also called BLEU-4 [19].
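As an illustration, the snippet below computes a cumulative BLEU-4 score with NLTK using equal 1- to 4-gram weights; the reference/hypothesis pair and the smoothing choice are only examples, not taken from the paper.

# Cumulative BLEU-4 with NLTK (equal 1- to 4-gram weights).
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

references = [[["i", "was", "on", "crutches", "for", "a", "while"]]]
hypotheses = [["i", "was", "on", "crutches", "for", "a", "while"]]

score = corpus_bleu(
    references, hypotheses,
    weights=(0.25, 0.25, 0.25, 0.25),
    smoothing_function=SmoothingFunction().method1,
)
print(round(score, 4))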

5 Results

The model is evaluated on 10% of the total data. To evaluate the machine-generated
text, we have used the BLEU-4 score; the scores for the different models are shown in
Table 2. From these results, we observe that concatenated POS tag sequences along
with word sequences performed comparatively better than word sequences alone. Except
for the GRU, every other RNN architecture performed noticeably better with the
concatenated inputs, although the difference between the two GRU versions is
insignificant. The qualitative evaluation has also shown that the models with
concatenated POS tags predicted idioms more accurately.

6 Conclusions

A sequence-to-sequence model is introduced for translating input sentences into
output sentences with an idiomatic phrase. It is observed that part-of-speech tags as
features improve the model's ability to capture hidden features and semantic rules.
The qualitative evaluation has shown the importance of part-of-speech tags as
features for predicting the correct idiom and capturing the context.
The results further suggest that the bidirectional LSTM model performs best
among all the other specialized RNN architectures. The results can further be
improved by using a larger data set. This model can be implemented on more idioms
and by using dense encoder–decoder networks. Using attention layers can also be
considered for further improving the results.

References

1. Anand, N.: Idiom recommendation using POS tagging and sentence parsing. In: Kumar, A.,
Paprzycki, M., Gunjan, V. (eds.) ICDSMLA 2019. Lecture Notes in Electrical Engineering,
vol. 601. Springer, Singapore (2020)
2. Tang, D., Qin, B., Liu, T.: Document modeling with gated recurrent neural network for senti-
ment classification. In: Proceedings of Conference on Empirical Methods in Natural Language
Processing—EMNLP 2015, pp. 1422–1432 (2015, September). https://doi.org/10.18653/v1/
d15-1167
3. Sakurada, M., Yairi, T.: Anomaly detection using autoencoders with nonlinear dimension-
ality reduction. In: ACM International Conference Proceeding Series, vol. 2, pp. 4–11 (2014,
December). https://doi.org/10.1145/2689746.2689747
4. Marcheggiani, D., Perez-Beltrachini, L.: Deep graph convolutional encoders for structured data
to text generation, pp. 1–9 (2018). https://doi.org/10.18653/v1/w18-6501
5. You, Q., Jin, H., Wang, Z., Fang, C., Luo, J.: Image captioning with semantic attention. In:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 2016,
pp. 4651–4659 (2016, December). https://doi.org/10.1109/cvpr.2016.503
6. Calderon, A., Roa, S., Victorino, J.: Handwritten Digit Recognition using Convolutional
Neural Networks and Gabor filters. In: Proceedings of the 2003 International Congress on
Computational Intelligence (CIIC), pp. 1–8 (2003)
7. Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical
machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural
Language Processing (EMNLP), pp. 1724–1734, 2014. https://doi.org/10.3115/v1/d14-1179
8. Cheng, Z., Sun, H., Takeuchi, M., Katto, J.: Deep convolutional autoencoder-based lossy image
compression. In: Proceedings of 2018 Picture Coding Symposium (PCS 2018), pp. 253–257
(2018). https://doi.org/10.1109/pcs.2018.8456308
9. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. Adv.
Neural. Inf. Process. Syst. 4(January), 3104–3112 (2014)
10. Elman, J.L.: Finding structure in time. Cogn. Sci. 14(2), 179–211 (1990). https://doi.org/10.
1207/s15516709cog1402_1
11. Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. In:
30th International Conference on Machine Learning (ICML 2013), no. PART 3, pp. 2347–2355
(2013)
12. Pascanu, R., Mikolov, T., Bengio, Y.: Understanding the exploding gradient problem. In: 30th
International Conference on Machine Learning (ICML 2013), no. PART 3, pp. 2347–2355
(2013)
13. Hochreiter, S., Schmidhuber, J.: Long Short-term memory. Neural Comput. 9(8), 1735–1780
(1997). https://doi.org/10.1162/neco.1997.9.8.1735
14. Dey, R., Salemt, F.M.: Gate-variants of gated recurrent unit (GRU) neural networks. In: Midwest
Symposium Circuits System, vol. 2017, pp. 1597–1600 (2017, August). https://doi.org/10.
1109/mwscas.2017.8053243
15. Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans. Sig. Process.
45(11), 2673–2681 (1997). https://doi.org/10.1109/78.650093
16. Ratnaparkhi, A.: A maximum entropy model for part-of-speech tagging. In: Proceedings of the
Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 133–142 (1996)
17. Das, D., Petrov, S.: Unsupervised part-of-speech tagging with bilingual graph-based projec-
tions. In: ACL-HLT 2011—Proceedings of 49th Annual Meeting of the Association for
Computational Linguistics: Human Language Technologies, vol. 1, pp. 600–609 (2011)
18. Tomás, J., Mas, J.À., Casacuberta, F.: A quantitative method for machine translation evaluation,
pp. 27–34 (2003). https://doi.org/10.3115/1641396.1641401
19. Papineni, K., Roukos, S., Ward, T., Zhu, W.-J.: BLEU: a method for automatic evaluation of
machine translation. In: Proceedings of the 40th Annual Meeting of the Association for
Computational Linguistics (ACL), pp. 311–318 (2002). https://doi.org/10.3115/1073083.1073135
A Composite Framework
for Implementation of ICT Enabled
Road Accident Prediction Using Spatial
Data Analysis

Dara Anitha Kumari and A. Govardhan

Abstract Matching the growth of nations, road surface transportation in each country
has increased to its maximum capacity. The increased traffic has left very little room
for maintaining the roads and keeping them fit to handle these higher volumes. Moreover,
the traditional method of road maintenance is manual and highly time consuming. As a
result, many developed and underdeveloped nations face the problem of under-maintained
roads, which in turn leads to road accidents. Road accidents not only reduce a country's
ability to keep pace with industrial development but also threaten human life, which is
unacceptable. Thus, automated accident prediction is a pressing need of current research.
Many parallel research outcomes have aimed to solve this problem by analysing road
traffic volume. However, several researchers, cited further in this work, have shown that
accidents on the road surface are caused by road conditions rather than by traffic volume.
Hence, this work proposes a novel framework demonstrating the use of ICT-enabled
methods for predicting accident-prone zones by analysing road conditions. The work
demonstrates nearly 90% accuracy for noise reduction, nearly 98% accuracy for road
surface defect detection and nearly 98% accuracy for predicting accident-prone zones,
making road surface transportation a much safer option.

Keywords Correlation · Adaptive · Location dependent · Accident possibility
prediction · Regression

D. A. Kumari (B)
Department of Computer Science, JNTUH, Hyderabad, India
e-mail: anithakumaridara@gmail.com
A. Govardhan
JNTUH, Hyderabad, India
e-mail: govardhan_cse@jntuh.ac.in


1 Introduction

Indian cities are among the fastest developing metropolises in the world. With the
growth of the economy, the population of big and medium-sized cities is continuously
increasing along with living standards. The future strength of India lies in its urban
territories; therefore, it is crucial to develop these environments. Smart city solutions
serve the needs of citizens to live a safe, convenient and happy life, and the mission to
develop smart cities in India consists of many diverse tasks. One of the paths towards a
smart city is better road condition. The road network acts as the principal network that
smooths the progress of trade, transport, social assimilation and financial development.
It provides better accessibility, flexibility and reliability, and thereby the greatest
advantage of economies of scale. According to NHAI, 60% of goods and 80% of passenger
traffic are carried by road. Among all modes of transport, road transport is preferable
for short-distance connectivity. Traffic in Indian mega cities is increasing day by day
due to the growing population, and the number of vehicles has been increasing at an
average rate of 10.16% per annum over the decade.
Geometric coordinates in the plane can be used intuitively with respect to one's
current location, in which case the x-axis points towards the local north. More formally,
such coordinates can be obtained from three-dimensional coordinates using the device of
a map projection. It is not possible to map the curved surface of the Earth onto a flat
map surface without distortion. The compromise most frequently chosen—called a
conformal projection—preserves angles and length ratios, so that small circles are
mapped as small circles and small squares as squares.
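
As a small illustration of such a projection (not part of the proposed framework), latitude/longitude pairs can be mapped to planar coordinates with pyproj; the UTM zone and the sample point are only examples.

# Project geographic coordinates onto a planar (conformal) coordinate system.
from pyproj import Transformer

# EPSG:32644 is UTM zone 44N, a conformal projection covering parts of India.
to_utm = Transformer.from_crs("EPSG:4326", "EPSG:32644", always_xy=True)

lon, lat = 78.4867, 17.3850          # e.g. a point in Hyderabad
x, y = to_utm.transform(lon, lat)    # planar easting/northing in metres
print(f"easting={x:.1f} m, northing={y:.1f} m")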
The rest of the paper is organized as follows: Sect. 2 identifies the problem to be
solved, Sect. 2.1 presents the proposed architecture, Sect. 3 lists the comparative
benefits against parallel research outcomes, and Sect. 4 presents the final research
conclusion.

2 Problem Identification

In this section, the problem is identified and presented. Based on recommendations
collected from various research attempts, several approaches have been implemented and
studied in order to establish a process that automates the maintenance cycle of roads.
Nevertheless, the complete cost propositions of those models are not fully justified.
Adding to this complexity, road conditions are captured under various lighting conditions
and with a variety of capture devices, so the images differ in quality and method of
capture. A number of research attempts detect the road condition based on potholes.
Nonetheless, the detection process is highly time complex and delays the maintenance
process [4–6]. Further, the majority of the parallel research outcomes fail to measure
multiple potholes in a single image and cannot distinguish potholes based on the urgency
of repair. Thus, this work defines a new dimension of pothole detection for road images
that contain a high number of potholes in a single image and makes the process faster by
reducing the chance of false detection. It has been observed that preventive measures in
road repair can make the road surface last longer and can significantly reduce the time
needed for maintenance of heavily damaged sections. Nonetheless, while the potholes
causing major problems on the road surface are easily visible, the cracks that will
eventually become potholes cannot always be seen by the human eye.
Henceforth, after this understanding of the problem, the next section presents the
proposed architecture.

2.1 Proposed Architecture

In this section, the proposed architecture is furnished in Fig. 1.

Fig. 1 Proposed architecture of the automated framework

Considering the gaps in recent research, this work identifies the following major steps
to be taken in order to make a significant contribution to this research domain:
• Improving the input image quality is the basic requirement for improving the accuracy
of road condition detection.
• The first and foremost challenge in this direction of the research is to formulate a
method or framework to identify various types of noise and adaptively remove it from
the images.
• This work extracts the parameters for determining the existence of potholes as the
major outcome.
• Yet another outcome of this work is to classify the potholes based on the urgency of
repair.
• A further outcome of the work is to automate the detection facility so as to provide
timely maintenance alerts and deliver better road conditions in India.
• In addition, maintenance tasks demand suitable weather conditions, which are difficult
to predict. Situations have shown that maintenance work started without knowledge of
the weather had to be aborted, causing further delay and further decay of the road
condition. Thus, recent research demands prediction of the road condition, so that
potholes are given higher priority, cracks are considered for immediate repair and patch
works are ignored during the automation.
• The major outcome of this work is to build an automated framework to analyse and
predict road damage and recommend scheduled maintenance tasks with 100% accuracy, in
order to enable better surface transport across the world.
The proposed algorithms have already been discussed in other works by the same
authors [1–3].
Henceforth, in the next section of this work, the proposed framework is compared with
the other parallel research outcomes.

3 Comparative Analysis

As this work has already been demonstrated in various parts of the previous sections,
the final comparative analysis is carried out in this section [1]. Firstly, for research
objective 1, the noise reduction comparisons are furnished in Table 1 and visualized
graphically [6–9] in Fig. 2 (Dara, A.K., and Govardhan, A. [2]). Secondly, for research
objective 2, the clustering comparisons are furnished in Table 2 and visualized
graphically in Fig. 3. Finally, for research objective 3 [9–12], the prediction accuracy
comparisons (Dara, A.K., and Govardhan, A. [3]) are furnished in Table 3 and visualized
graphically in Fig. 4.

Table 1 Noise reduction comparative analysis

Research outcome     Missing value detection and reduction accuracy (%)   Outlier detection and reduction accuracy (%)
Ertürk et al. [4]    58.32                                                58.49
Çeşmeci et al. [5]   62.84                                                61.61
Proposed method      90.00                                                90.00

Fig. 2 Noise reduction comparative analysis

Table 2 Clustering accuracy comparative analysis

Research outcome         Accuracy (%)
Kanarachos et al. [6]    90
Bello-Salau et al. [7]   94
Bayer et al. [8]         91
Azhar et al. [9]         90
Proposed method          98

Fig. 3 Clustering accuracy comparative analysis



Table 3 Prediction accuracy comparative analysis

Research outcome                   Accuracy (%)
Trajectory-based, Cai 2015 [10]    90
Dense trajectory, Wang [11]        84
Auto parking, Mahmood [12]         90
Proposed method                    98

Fig. 4 Prediction accuracy comparative analysis

Henceforth, it is natural to conclude that the proposed automated framework has
outperformed the parallel research works. The next section presents the final research
conclusion.

4 Conclusion

In order to match the current trend of research, this work proposes a novel framework
for predicting road accident-prone zones on a live map. The work maps the zones to
coordinates such as longitude and latitude on the map. To achieve high prediction
accuracy, the first phase of the research deploys three algorithms: the Adaptive
Moment-Based Spatial Image Noise Detection and Removal Algorithm (AMBSI-NDR) for
reducing noise in the image data, which is separated from the spatial data; the Adaptive
Logistic Correlation-Based Missing Value Identification and Replacement Algorithm
(ALC-MVIR) for reducing missing values in the textual data extracted from the spatial
information; and the Correlative Logistic Correction-Based Outlier Identification and
Removal Algorithm (CLC-OIR) for reducing outliers in the textual data extracted from the
spatial information. During this phase, the work demonstrates nearly 90% accuracy. In
the second phase, the work presents a fourth algorithm, Parametric Extraction and
Pragmatic Clustering for Defect Detection (PE-PC-DD), for clustering the defects based
on the extracted parameters; during this phase, the work demonstrates nearly 98%
accuracy. Finally, in the third phase, the work showcases the algorithm implementation
called Correlation-Based Adaptive Location-Dependent Accident Possibility Prediction
(CBA-LD-APP) for predicting accident-prone zones with nearly 98% accuracy. Thus, this
work demonstrates complete autonomy for accident prediction and accident-prone zone
mapping and can be considered one of the benchmarks in this domain of research.

References

1. Dara, A.K., Govardhan, A.: Noise reduction in spatial data using machine learning methods for
road condition data. Int. J. Adv. Comput. Sci. Appl. 11. https://doi.org/10.14569/ijacsa.2020.
0110120
2. Dara, A.K., Govardhan, A.: Parametric extraction of the road conditions spatial data and detec-
tion of defeats using pragmatic clustering method. Int. J. Eng. Adv. Technol. (IJEAT) 9(3)
(2020). ISSN 2249 – 8958
3. Dara, A.K., Govardhan, A.: Detection of coordinate based accident-prone areas on road surface
using machine learning methods. Int. J. Comput. Eng. Inf. Technol. (IJCEIT) 12(3) (2013).
E-ISSN 2412-8856
4. Ertürk, A., Çeşmeci, D., Güllü, M.K., Gerçek, D., Ertürk, S.: Integrating anomaly detection to
spatial preprocessing for endmember extraction of hyperspectral images. In: Proceedings of
IEEE Geoscience and Remote Sensing Symposium (IGARSS), pp. 1087–1090 (2013)
5. Ertürk, A., Çeşmeci, D., Güllü, M.K., Gerçek, D., Ertürk, S.: Endmember extraction guided
by anomalies and homogeneous regions for hyperspectral images, IEEE J. Sel. Topics Appl.
Earth Observ. Remote Sens. 7(8), 3630–3639 (2014)
6. Kanarachos, S., Christopoulos, S.R.G., Chroneos, A., Fitzpatrick, M.E.: Detecting anomalies
in time series data via a deep learning algorithm combining wavelets neural networks and
hilbert transform. Expert Syst. Appl. 85, 292–304 (2017)
7. Bello-Salau, H, Aibinu, A.M., Onumanyi, A.J., Onwuka, E.N., Dukiya, J.J., Ohize, H.: New
road anomaly detection and characterization algorithm for autonomous vehicles. Appl. Comput.
Inf. (2018). [online] Available https://doi.org/10.1016/j.aci.2018.05.002
8. Bayer, F.M., Kozakevicius, A.J., Cintra, R.J.: An iterative wavelet threshold for signal
denoising. Sig. Process. 162, 10–20 (2019)
9. Azhar, K., Murtaza, F., Yousaf, M.H., Habib, H.A.: Computer vision based detection and
localization of potholes in asphalt pavement images. In: 2016 IEEE Canadian Conference on
Electrical and Computer Engineering (CCECE), pp. 1–5 (2016, May)
10. Cai, Y., Wang, H., Chen, X., et al.: Trajectory-based anomalous behaviour detection for
intelligent traffic surveillance. IET Intell. Transp. Syst. 9(8), 810–816 (2015)
11. Wang, H., Klaser, A., Schmid, C., et al.: Action recognition by dense trajectories. In: Proceed-
ings of IEEE International Conference on Computer Vision and Pattern Recognition, Colorado
Springs, CO, USA, pp. 3169–3176 (2011)
12. Mahmood, Z., Haneef, O., Muhammad, N., et al.: Towards a fully automated car parking
system. IET Intell. Transp. Syst. 13, 293–302 (2019)
VISION AID: Scene Recognition
Through Caption Generation Using Deep
Learning

Mathew Regi and Mathews Abraham

Abstract Visually impaired individuals rely heavily on their other senses, such as
hearing and touch, to comprehend the world around them. It is incredibly difficult for a
visually handicapped individual to perceive objects without feeling them, yet there are
times when physical contact between the individual and the object is risky or even
deadly. This paper presents a real-time object recognition application to aid the
visually impaired. Images from a camera-linked mobile phone with systematised
orientation are given as input to a computing device for real-time object detection.
The proposed project utilises a convolutional neural network (CNN) to recognise
pre-trained items in the captured imagery and a recurrent neural network (RNN) with
LSTM units to generate captions. A caption dataset is used to train the captioning
model; after training, these neural models can generate captions for objects. The
network output is then conveyed to those with visual impairment in audio format by
converting the generated captions to speech. Experimental outcomes on the MS-COCO
dataset show that our design outperforms the state of the art.

Keywords Object recognition · Caption generation · CNN · RNN · LSTM

1 Introduction

In this age, where most applications solely benefit the able-bodied, it is essential
to create a device for guiding the visually challenged. Generally, these impaired
individuals depend on the assistance of others to guide them through. Unfortunately,
there could be scenarios where help may not be easily available or the blind may be
misled.

M. Regi (B) · M. Abraham


Department of Information Technology, Rajagiri School of Engineering and Technology,
Ernakulam, Kerala, India
e-mail: mathewregi333@gmail.com
M. Abraham
e-mail: mathewsa@rajagiritech.edu.in


Taking these issues into consideration, it was proposed to design a modern application
favouring the visually impaired. In this technological era, with strides towards progress
in every sphere, the blind must not get left behind. This application aims to provide
them with a better understanding of the world around them. Currently, a few aids such as
spectacles, Braille or a walking stick are used to cope with the impairment and get on
with their lives.
The proposed project utilises a convolutional neural network model for recognising
objects along with a recurrent neural network for generating captions. The captured
imagery is described automatically to the blind person by converting the generated text
to speech, without external help.

2 Related Works

Various methods have been designed for generating captions from images. This
section includes some of the important works done in the area of caption generation
using deep learning techniques.
Khademi et al. [1] propose a contextual, attention-based deep architecture for image
caption generation. The proposed architecture uses a bidirectional grid LSTM, which
takes the visual features of an RGB image as input and learns their complex spatial
patterns based on two-dimensional context, by selecting or disregarding its input. In
addition, region-grounded representations describe the detected entities and their
relationships in the image. For caption generation, the method integrates the grid LSTM
characteristics with these representations using two bidirectional layers.
A new approach based on a region-based deep learning method [2] is recommended to
generate captions for imagery. It consists of a region-based object detector, a
recurrent neural network (RNN) attribute predictor and an encoder-decoder language
generator built with dual RNNs to create meaningful descriptions of the given imagery.
It uses an R-CNN architecture to detect objects and an encoder-decoder RNN model to
generate sentences. The IAPR TC-12 dataset is used for evaluation.
In paper [3], a multilayer dense attention architecture is proposed to generate image
captions. Faster R-CNN is used to obtain the image features, and an LSTM decodes the
multilayer dense attention to produce the caption text. The model's overall architecture
follows an encoder-decoder format split into two levels: bottom-up attention, which
extracts image regions, and top-down attention, which produces the relevant caption
words at each time step. It is evaluated on various datasets such as MS-COCO, Flickr and
Chinese-AI.
The method proposed in paper [4] uses a cascade recurrent neural network (CRNN) to
generate image captions. CRNN uses a cascade network that can exploit the in-depth
semantic context present in the imagery. Unlike the conventional MRNN, CRNN comprises a
front-end and a back-end network, linked to learn visual-language interactions from two
directions. Here, a stacked gated recurrent unit with two hidden layers extends the
depth of the RNN and thus captures meaningful correlations between images and sentences.
The back-end network extracts semantic context in the forward and backward directions to
predict words: it transfers the knowledge acquired by the front-end as the initial state
and feeds the sentence in reverse into the back-end system. The efficacy of CRNN is
confirmed on MS-COCO datasets.

3 Proposed System

The proposed system is a real-time scene capturing application that guides vision-impaired
individuals. The application captures an image of a scene and delivers a description of
the scene in an audible format. In this way, users understand what objects are in their
surroundings through a camera-aligned smartphone and thus reduce the risk of accidents.
The layout of the proposed system is shown in Fig. 1. Initially, the user activates the
application by shaking the mobile, and the camera starts taking pictures.

Fig. 1 System layout



Then, the picture is sent to the server, where the weight file is stored for predicting
the caption. The MS-COCO dataset [5] is employed to train the network.
For generating captions, a large amount of image data is essential. Varied image
datasets such as Flickr30k, Flickr8k, MS-COCO, SBU, Pascal and more can be easily
accessed. MS-COCO is the most recent and possibly the most popularly used and
systematised dataset. It has 82,783 images for training and 40,504 for both testing and
validation, and each image is accompanied by five captions. The current model is trained
with the MS-COCO dataset, which is used extensively for network training and testing.
Initially, a pre-trained convolutional neural network (CNN) with the VGG19 architecture
is used to process the image, and its output is given to an RNN to generate descriptions
for images. Subsequently, the generated captions are saved in a text file and handed
over to the mobile. Next, the caption in text format is converted into speech by using a
text-to-speech API and given back to the visually impaired user. The major steps of the
proposed system are described herewith.

3.1 Object Detection and Recognition

The VGG-19 object detector is implemented to detect objects efficiently. VGG-19 is a
19-layer CNN architecture, where the number 19 represents the total number of layers
with trainable weights: 16 convolutional layers and 3 fully connected ones (Fig. 2) [6].

Fig. 2 VGG19 architecture

Table 1 Example of object attributes

Colour    Black, white, grey, blue, green, etc.
Shape     Long, circle, round, rectangle, square, etc.
Pattern   Spotted, striped
Texture   Rough, furry, smooth, shiny, metallic, wooden, wet, etc.

VGG-19 consists of five sets of convolution layers. The first set has 2 convolution
layers with 64 filters, and the next set has 2 convolution layers with 128 filters. This
is followed by a set of 4 convolution layers with 256 filters and then 2 sets of 4
convolution layers each with 512 filters. The max pooling layers between the sets of
convolution layers use 2 × 2 filters with a stride of 2 pixels. The output of the last
pooling layer is flattened and passed to the fully connected layers, the last of which
has 1000 neurons; these layers are ReLU activated. Finally, a softmax layer outputs a
vector representing the probability distribution over a list of outcomes. The
convolution and fully connected layers carry the trainable weights, the max pooling
layers reduce the size of the input imagery, and the softmax is used for the final
decision making.
The system takes a (224, 224, 3) RGB image from the MS-COCO dataset as input for
training. After training, the network is capable of detecting objects in the scene.
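
A minimal sketch of how a pre-trained VGG-19 can serve as the image encoder is given below; it assumes the Keras implementation of VGG-19 and an illustrative image path, and is not the authors' exact pipeline.

import numpy as np
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.models import Model

base = VGG19(weights="imagenet")                       # the 19-layer network
# Use the second fully connected layer (fc2) as the image feature vector.
encoder = Model(base.input, base.get_layer("fc2").output)

img = image.load_img("scene.jpg", target_size=(224, 224))   # illustrative path
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
features = encoder.predict(x)                          # shape (1, 4096)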

3.2 Attribute Prediction

Here, we utilise RNN-based attribute classification [7]. Research shows that RNNs
benefit varied spheres of machine learning, including caption generation, machine
translation, etc. RNNs are employed here because of their capability to effectively
predict the attributes that should be reported for the given set of features.
The RNNs used in this work are word based and use the LSTM [8] architecture. At test
time, the CNN is used to obtain the image features. These are then used to predict
multiple attributes, one at a time. Each prediction depends on the extracted image
features in combination with the previously generated terms. The model keeps producing
appropriate attributes until the designated STOP token is generated, i.e., when the RNN
concludes that no other attribute can be used to describe the image, given its visual
features and the attributes produced so far (Table 1) [2].

3.3 Caption Generation

The captions generated here are more explanatory than those produced by other
prevailing research studies, in terms of attributes and object recognition details.
Hence, the MS-COCO dataset is ideally suited for training and evaluation; owing to its
rich descriptions, the proposed system is superior when compared to other popular
databases.
Both a CNN and an RNN are utilised to generate captions [9, 10], and the network is
trained on the MS-COCO dataset, in which each image is paired with five captions. To
speed up training, each image is pre-encoded into its feature vector. Since the captions
contain a large number of unique terms, one-hot word encoding is not used; instead, the
trained embedding architecture maps each word to a vector of shape (1, 128). An LSTM
architecture is used to generate the captions. During training, the network learns how
to produce descriptions for images by analysing the provided dataset.
After training the network, a weight model is formed which contains all the learned
weights of the network. The vector form of a test image is fed as input to the weight
model to create the captions. Overall, both the CNN- and RNN-based object and attribute
estimations are very effective in producing highly meaningful sentences.
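
The following sketch outlines one common way to realise such a CNN-RNN captioning model in Keras; the 4096-dimensional image features, 128-dimensional word embeddings, merge-style decoder and all size constants are assumptions made for illustration rather than the authors' exact architecture.

from tensorflow.keras import layers, Model

VOCAB, MAX_LEN, EMB, HIDDEN = 8000, 34, 128, 256

img_feat = layers.Input(shape=(4096,))                 # e.g. VGG-19 fc2 features
img_vec = layers.Dense(HIDDEN, activation="relu")(layers.Dropout(0.5)(img_feat))

cap_in = layers.Input(shape=(MAX_LEN,))                # partial caption (word ids)
cap_emb = layers.Embedding(VOCAB, EMB, mask_zero=True)(cap_in)
cap_vec = layers.LSTM(HIDDEN)(layers.Dropout(0.5)(cap_emb))

merged = layers.add([img_vec, cap_vec])                # fuse image and text states
out = layers.Dense(VOCAB, activation="softmax")(
    layers.Dense(HIDDEN, activation="relu")(merged))   # next-word distribution

caption_model = Model([img_feat, cap_in], out)
caption_model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")

At inference time the model is applied repeatedly, feeding back each predicted word until an end-of-sentence token is produced.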

3.4 Application Development

The proposed caption generation mobile app is developed with React Native, a JavaScript
framework used to write real, natively rendered mobile apps for Android and iOS devices.
Based on React, Facebook's JavaScript library for building user interfaces, it targets
mobile platforms rather than web browsers, and it enables simultaneous development for
both Android and iOS.
Our application is created to work on Android devices. As a blind aid, the user can
activate the application with a gesture such as shaking the mobile. A sensor event
listener, registered with the sensor manager, is notified whenever new sensor data
arrive; the accelerometer sensor and the sensor manager are the two components used to
detect whether a shake has occurred. The camera activity commences only if the measured
value exceeds the configured threshold; otherwise the camera activity is not triggered.
The camera activation enables the user to capture images; a simple illustration of the
threshold test follows.
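
The threshold test itself is simple. The following language-agnostic illustration (written in Python, with an assumed threshold value rather than one taken from the paper) shows the underlying logic applied to one accelerometer sample.

import math

GRAVITY = 9.81          # m/s^2
SHAKE_THRESHOLD = 2.7   # illustrative value, in units of g

def is_shake(ax: float, ay: float, az: float) -> bool:
    """Return True when an accelerometer sample (in m/s^2) looks like a shake."""
    g_force = math.sqrt(ax * ax + ay * ay + az * az) / GRAVITY
    return g_force > SHAKE_THRESHOLD

# Example: a vigorous movement of roughly 3 g would trigger the camera activity.
print(is_shake(18.0, 20.0, 9.8))   # True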
The saved mobile imagery is fed as input to the trained system, which generates captions
for the images. The caption is saved into a text document, sent back to the mobile and
converted into speech. A text-to-speech API is used to convert the generated captions
into an audible format, similar to human speech, for the blind user.

4 Evaluation Results

For evaluating the performance of this system to generate image captions, we use the
BLEU score to compare our model with other existing models. There are different
evaluation metrics like BLEU, ROUGE, METEOR, etc. for evaluating description
generation.

Table 2 BLEU evaluation report

Model                BLEU-1   BLEU-2   BLEU-3   BLEU-4
CRNN [4]             0.691    0.514    0.376    0.275
Deep [11]            0.713    0.539    0.403    0.304
Show and tell [12]   0.718    0.504    0.357    0.25
Adaptive [13]        0.742    0.58     0.439    0.332
Our model            0.751    0.592    0.448    0.341

The bilingual evaluation understudy (BLEU) score is a method for comparing a predicted
caption with a reference caption: a perfect match yields a score of 1, while a complete
mismatch yields 0. Through this evaluation, the closeness between a system-generated
caption and the original dataset caption is determined.
The BLEU evaluation of existing methods has been employed for comparison and to help us
calculate the efficiency of our model. BLEU-1, BLEU-2, BLEU-3 and BLEU-4 are cumulative
scores that compute the individual n-gram scores for all orders from one to n and
combine them using a weighted geometric mean. An individual n-gram score simply matches
grams of a selected order, such as single words (1-gram) or word pairs (2-gram or
bigram). By default, the cumulative 4-gram BLEU score, also known as BLEU-4, is
computed. The cumulative and individual 1-gram BLEU use the same weights; the 2-gram
weights assign 50 percent to each of the 1-gram and 2-gram scores, the 3-gram weights
are 33 percent for each of the 1-, 2- and 3-gram scores, and the BLEU-4 weights are 25
percent for each of the 1-, 2-, 3- and 4-gram scores. Table 2 presents the BLEU scores
of the different models and shows that our model generates captions better than the
other existing models.
The proposed model is implemented on the TensorFlow framework and runs on an NVIDIA
Tesla T4 GPU. Training was carried out for 50 epochs, after which the trained
CNN-RNN-based captioning model, with its weights ready for testing, was obtained.
The real-time caption generation of the developed application is shown in Fig. 3: on
shaking the mobile, an image is taken and its corresponding caption is generated in the
form of speech. The captions generated in Fig. 3 are (a) a vase filled with purple
flowers, (b) a clock on a table and (c) a car parked in front of a window.

5 Conclusion

Today, modern technology has grown by leaps and bounds and can be harnessed efficiently
to create a device that helps the blind live fuller lives. The proposed system will
provide them with a better understanding of their surroundings and make them more
independent. Our project therefore aims to develop a user-friendly application that can
guide the visually impaired in our society. The proposed system focuses on generating
captions for varied images: the Android application generates meaningful sentences for
images captured by the camera-aligned smartphone of the blind user and then speaks out
the captions for the benefit of the visually impaired.

Fig. 3 Generated captions from the application

As future work, the system could generate captions more precisely if it were extended to
video input. Issues such as out-of-focus or blurred imagery could be addressed by
utilising video as input to the system. The efficiency of the network helps tide over
any delays in caption generation for images fed to the server. The accuracy of
prediction can be increased through high-quality datasets and efficient training.

References

1. Khademi, M., Schulte, O.: Image caption generation with hierarchical contextual visual spatial
attention In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Work-
shops (CVPRW), Salt Lake City, UT pp. 2024–20248 (2018). https://doi.org/10.1109/CVPRW.
2018.00260
2. Kinghorn, P., Zhang, L., Shao, L.: A region-Based Image Caption Generator with Refined
Descriptions. Elsevier (2018). https://doi.org/10.1016/2017.07.0140925-2312/2017
3. Wang, E.K., Zhang, X., Wang, F., Wu, T., Chen, C.: Multilayer dense attention model for
image caption. In: 2019 IEEE Access 7, 66358–66368. (2019). https://doi.org/10.1109/ACC
ESS.2019.2917771.

4. Wu, J., Hu, H.: Cascade recurrent neural network for image caption generation. Electron. Lett.
53(25), 1642–1643 (2017)
5. Lin, T.Y., et al.: Microsoft coco: Common objects in context. In: Fleet, D., Pajdla, T., Schiele, B.,
Tuytelaars, T., (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer
Science, vol 8693. Springer, Cham. https://doi.org/10.1007/978-3-319-10602-1_48
6. ResearchGate, Fig. 8 Illustration of the network architecture of VGG-19 model: conv means
convolution, FC means fully connected. https://www.researchgate.net/figure/llustration-of-the-network-architecture-of-VGG-19-model-conv-means-convolution-FC-means_fig2_325137356
7. Wu,Q., Shen, C., Wang, P., Dick, A., van den Hengel, A.: Image captioning and visual question
answering based on attributes and external knowledge. IEEE Trans. Pattern Anal. Mach. Intell.
(2017). https://doi.org/10.1109/tpami.2017.2708709
8. Poghosyan, A., Sarukhanyan, H.: Long short-term memory with read only unit in neural image
caption generator. IEEE Comput. Sci. Inf. Technol. (2017). https://doi.org/10.1109/csitechnol.
2017.8312163,2017
9. Kumar, N.K., Vigneswari, D., Mohan, A., Laxman, K., Yuvaraj, J.: Detection and recognition of
objects in image caption generator system: a deep learning approach. In: 2019 5th International
Conference on Advanced Computing & Communication Systems (ICACCS), Coimbatore,
India, 107–109 (2019). https://doi.org/10.1109/ICACCS.2019.8728516
10. Luo, R.C., Hsu, Y., Wen, Y., Ye, H.: Visual image caption generation for service robotics and
industrial applications. In: 2019 IEEE International Conference on Industrial Cyber Physical
Systems (ICPS), Taipei, Taiwan, 827–832 (2019). https://doi.org/10.1109/ICPHYS.2019.878
0171
11. Ren, Z., Wang, X., Zhang, N., Lv, X., Li, L.-J.: Deep reinforcement learning based image
captioning with embedding reward. In: 2017 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) (2017). IEEE. https://doi.org/10.1109/cvpr.2017.128
12. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., Zemel, R., Bengio, Y.:
Show, attend and tell: neural image caption generation with visual attention. In: Proceedings
of the International Conference on Machine Learning, pp. 2048–2057 (2015)
13. Lu, J., Xiong, C., Parikh, D., Socher, R.: Knowing when to look: adaptive attention via a visual
sentinel for image captioning.In: 2017 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), Honolulu, HI, 3242–3250 (2017). https://doi.org/10.1109/CVPR.201
7.345
Effect of Hybrid Multi-Verse with Whale
Optimization Algorithm on Optimal
Inventory Management in Block Chain
Technology with Cloud

C Govindasamy and A. Antonidoss

Abstract One of the important tasks of supply chain management is optimal inventory
control. Optimal inventory control techniques aim to minimize the supply chain cost by
efficiently managing the inventory. This paper attempts to analyze the influence of a
hybrid of Multi-Verse Optimization (MVO) and the Whale Optimization Algorithm (WOA),
termed the Whale-based Multi-Verse Optimization Algorithm (W-MVO), on optimal inventory
management in block chain under the cloud sector. Costs such as transaction cost,
inventory holding cost, shortage cost, transportation cost, time cost, setup cost,
back-ordering cost and quality improvement cost are considered for deriving the
multi-objective model. The effectiveness of the proposed hybrid algorithm is analyzed by
varying the Travelling Distance Rate (TDR) from 0.2 to 1.2, and the model is evaluated
with the assistance of block chain under the cloud sector.

Keywords Supply chain management · Optimal inventory control · Whale-based
multi-verse optimization algorithm · Block chain · Transaction cost · Inventory
holding cost · Shortage cost · Transportation cost · Time cost · Setup cost ·
Back-ordering cost · Quality improvement cost

1 Introduction

The survival of organizations in this competitive world depends on the merit of
controlling inventories. In most manufacturing organizations, there are a few kinds of
inventories: work in process (material that is not yet completed), raw materials that
are waiting to be processed through production, and finished goods held for sale by the
organization. The growth of the organization is achieved by the best inventory control
strategies [1].

C. Govindasamy (B) · A. Antonidoss


Department of Computer Science and Engineering, Hindustan Institute of Technology and
Science, Chennai, India
e-mail: cgovindasamy30@gmail.com
A. Antonidoss
e-mail: aro.antoni@gmail.com


Inventory management is an analytical management problem for many companies, whether
small, medium-sized or large. Block chains are a distributed framework and decision
model based on asymmetric encryption algorithms, and they offer distinctive benefits
over existing transaction techniques and data storage. One of their key applications is
efficient inventory flow management. Production techniques are adopted for certain
inventories, and retailers and wholesalers hold the necessary sufficient inventories.
The basic objective is to strike a balance between high ROT and low inventory. Proper
inventory levels allow the organization's materials to be handled very efficiently.

2 Literature Review

Although several inventory management models exist, various challenges still have to be
resolved in the future. The integrated research framework of [2] enhances the behaviour
of inventory control systems and reduces inventory-related costs, but it does not
consider the inconsistent and changing expiry dates in different groups of received
orders, nor the up-to-date behavioural factors in healthcare. The two-stage stochastic
programming model of [3] is sufficiently adaptable and reduces the present target levels
to minimize total cost and wastage; still, it does not address the feasibility for an
individual hospital in a network. The holistic Mixed Integer Linear Programming (MILP)
model of [4] permits dynamic inventory management and interacting pumping runs and
quickly proves optimality; however, it does not manually assign the starting products
inside the pipeline, which influences both the Central Processing Unit (CPU) time and
the solution quality. Simultaneous Equation Modelling (SEM) [5] captures the concurrent
and associated relationship between the demand stimulation impact and the sales impact
and provides various methods for producing the resulting goods; yet it is limited by
data availability and does not differentiate and compute the forecasting accuracy of
every system. The continuous-time scheduling model of [6] accomplishes the joint
optimization of depot inventory management and multi-product pipeline transportation,
but there remain gaps between realistic applications and the developed work. These
challenges motivated the analysis of the influence of W-MVO on optimal inventory
management in block chain under the cloud sector.

3 Major Assumptions and Structure of Proposed Inventory Management Model

3.1 Structure and Assumptions

The three-echelon supply chain inventory model is composed of n suppliers, a
manufacturer and o distributors, with a transport chain in the middle of every echelon.
The system is split into manufacturers, suppliers and distributors. A few assumptions
are articulated as in [7].

3.2 Problem Definition

Parameters of the inventory cost: The term ho^{2}_{1j} represents the holding cost of
the final product j at the manufacturer. tc^{1}_{n1k} indicates the transportation cost
of raw material k from supplier n to the manufacturer. sc_{oj} denotes the shortage cost
of product j for distributor o. oc^{2}_{1oj} denotes the fixed order cost of the final
product j from distributor o to the manufacturer. ho^{1}_{nk} represents the holding
cost of raw material k at supplier n. ho^{3}_{oj} denotes the holding cost of the final
product at distributor o. tc^{2}_{1oj} represents the transportation cost of the final
product j from the manufacturer to distributor o. oc^{1}_{n1k} represents the fixed
order cost of raw material k from the manufacturer to supplier n. de_{oj}(tim)
represents the demand from the manufacturer during the time tim.
In^{2}_{1j}(tim) denotes the real-time inventory of the finished product at the
manufacturer during the time tim. In^{1}_{nk}(tim) represents the real-time inventory of
raw material k at supplier n during the time tim. In^{3}_{oj}(tim) represents the
real-time inventory of the finished product at distributor o during the time tim
(In^{1}_{nk}(tim), In^{2}_{1j}(tim) and In^{3}_{oj}(tim) are non-negative integers).
Parameters of the time cost: ve^{2}_{1oj} denotes the delayed transportation cost of the
final product j from the manufacturer to distributor o. tr^{2}_{1oj} denotes the delayed
transit time of the final product j from the manufacturer to distributor o. ve^{1}_{n1k}
represents the delayed transportation cost of raw material k from supplier n to the
manufacturer. tr^{1}_{n1k} represents the delayed transit time of raw material k from
supplier n to the manufacturer. The indices are k = 1, 2, ..., K for raw materials;
n = 1, 2, ..., N for supplier inventories; j = 1, 2, ..., J for final products;
u = 1, 2, ..., U for time periods; and o = 1, 2, ..., O for distributor inventories.
Parameter initialization of the remaining costs: Let the cost of each item be
represented as Ic_1, Ic_2, ..., Ic_j, where j indexes the finished product. The
additional cost to improve quality is represented as Ac_1, Ac_2, ..., Ac_j, where j
indexes the finished product. The supplier setup cost is represented as As_1, As_2, ...,
As_n, where n denotes the number of suppliers. The manufacturer setup cost is
represented as Am. The distributor setup cost is represented as Ad_1, Ad_2, ..., Ad_n,
where n denotes the number of distributors.

4 Contribution of Whale-Based Multi-Verse Optimization for Inventory Management

4.1 Proposed Architecture

The multi-level three-echelon supply chain is shaped with 'manufacturers, suppliers and
distributors'. The architecture of block chain under the cloud sector is represented in
Fig. 1.
In the developed technique, the five inventory management parameters are integrated via
block chain technology inside the cloud environment. These parameters are optimized with
the help of the developed W-MVO algorithm. The multi-objective function involves several
cost functions; therefore, with the help of W-MVO, these costs are minimized, and the
finally obtained optimal solution is then linked to every distributor and stored in the
cloud with the help of block chain. The completed optimal solution of each distributor
is safeguarded and is not visible to the other distributors.

Fig. 1 Architecture of the proposed inventory management: the supplier, manufacturer and
distributor are connected through block chain in the cloud, and W-MVO optimizes a cost
function comprising the transaction, inventory holding, shortage, transportation, time,
setup, quality improvement and back-ordering costs

4.2 Proposed W-MVO

The use of optimization algorithms has gained high attention among scientists [8].
Multi-Verse Optimization (MVO) [9] is inspired by the multi-verse theory of cosmology,
in which repeated big bangs lead to the expansion of universes. Although it has several
advantages, it also suffers from shortcomings; for example, binary and multi-objective
versions are not provided. Therefore, to overcome these disadvantages, the Whale
Optimization Algorithm (WOA) is integrated into it, and the resulting algorithm is
termed W-MVO. WOA [10, 11] is a nature-inspired meta-heuristic technique with the
capacity to handle different problems; compared with other optimization algorithms, WOA
has many advantages, such as its exploration and exploitation capabilities. The two
optimization procedures are integrated to generate a hybrid optimization algorithm.
Generally, in the conventional MVO, if ran2 < WEP the solution is updated using Eq. (1),
and if ran2 ≥ WEP the same solution is retained. In the proposed W-MVO, if ran2 < WEP
the solution is updated using Eq. (1) of MVO (with ran3 selecting the sign); otherwise,
if ran2 ≥ WEP, the location of the individual is updated using the WOA rule of Eq. (2).
c_{gp} = \begin{cases} C_g + \mathrm{TDR} \times ((ub_g - lb_g) \times ran4 + lb_g), & ran3 < 0.5,\ ran2 < \mathrm{WEP} \\ C_g - \mathrm{TDR} \times ((ub_g - lb_g) \times ran4 + lb_g), & ran3 \ge 0.5,\ ran2 < \mathrm{WEP} \\ c_{gp}, & ran2 \ge \mathrm{WEP} \end{cases}    (1)

c(r + 1) = H' \cdot e^{b \cdot ran} \cdot \cos(2\pi \cdot ran) + c^{*}(r)    (2)

In the above equations, r denotes the iteration, c a candidate solution, b a constant
that defines the shape of the spiral, c^{*} the location of the prey (the best solution),
H' = |c^{*}(r) - c(r)| the distance of the whale from the prey, ran a random number in
the interval [-1, 1], and \cdot element-by-element multiplication.
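
A sketch of how Eqs. (1) and (2) can be combined into a single position update is given below; it is an interpretation of the hybrid rule described above, not the authors' released code, and the spiral constant b and the clipping to the bounds are assumptions.

import numpy as np

def wmvo_update(c, c_best, lb, ub, wep, tdr, b=1.0):
    """One W-MVO update of a candidate solution `c` (1-D numpy array)."""
    c = c.copy()
    for g in range(c.size):
        r2, r3, r4 = np.random.rand(3)
        if r2 < wep:                                   # MVO wormhole move, Eq. (1)
            step = tdr * ((ub[g] - lb[g]) * r4 + lb[g])
            c[g] = c_best[g] + step if r3 < 0.5 else c_best[g] - step
        else:                                          # WOA spiral move, Eq. (2)
            dist = abs(c_best[g] - c[g])
            t = np.random.uniform(-1, 1)               # ran in [-1, 1]
            c[g] = dist * np.exp(b * t) * np.cos(2 * np.pi * t) + c_best[g]
    return np.clip(c, lb, ub)                          # keep within the bounds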

Fig. 2 Solution encoding: the encoded vector contains the real-time inventories
In^{1}_{nk}(tim), In^{2}_{1j}(tim) and In^{3}_{oj}(tim), the delayed transit times
tr^{1}_{1k} and tr^{2}_{1oj}, the failure probabilities P_1, ..., P_{ne} and the
back-order counts N_1, ..., N_{be}

4.3 Solution Encoding

The goal of tuning or optimizing the parameters is to reduce the multi-echelon supply
chain cost of the inventory technique. Along with these parameters, the probability of
failure P_n and the number of backorders N_n are also taken as decision variables in the
solution encoding. The boundary values of P_n lie in the range [0.3, 0.9], and the
boundary values of N_n lie in the range [0, 5]. The diagrammatic representation of the
solution encoding is given in Fig. 2, where j represents the finished product.
The proposed W-MVO algorithm is employed to minimize the objective function and thereby
obtain a better solution for inventory management.
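
For illustration, one encoded solution could be initialised as follows; the dimensions and inventory ranges are placeholders, and only the bounds on P_n and N_n come from the text.

import numpy as np

rng = np.random.default_rng(0)
N_SUPPLIERS, N_DISTRIBUTORS, N_PRODUCTS, N_MATERIALS = 3, 4, 2, 5

solution = {
    "inv_supplier": rng.integers(0, 100, size=(N_SUPPLIERS, N_MATERIALS)),
    "inv_manufacturer": rng.integers(0, 100, size=N_PRODUCTS),
    "inv_distributor": rng.integers(0, 100, size=(N_DISTRIBUTORS, N_PRODUCTS)),
    "transit_delay_raw": rng.uniform(0, 10, size=(N_SUPPLIERS, N_MATERIALS)),
    "transit_delay_final": rng.uniform(0, 10, size=(N_DISTRIBUTORS, N_PRODUCTS)),
    "fail_prob": rng.uniform(0.3, 0.9, size=N_PRODUCTS),   # P_n bounds from the text
    "backorders": rng.integers(0, 6, size=N_PRODUCTS),     # N_n in [0, 5]
}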

4.4 Objective Model

The aim is to reduce the ’multi-level inventory cost’.


1. Transaction cost: It involves the transaction cost between the manufacturer and the
suppliers and between the manufacturer and the distributors, as in Eq. (3).

O = \sum_{n=1}^{N} \sum_{k=1}^{K} O^{1}_{n1k} + \sum_{o=1}^{O} \sum_{j=1}^{J} O^{2}_{1oj}    (3)

2. Inventory holding cost: It involves the holding costs of the suppliers, the
manufacturer and the distributors, as in Eq. (4).

H = \sum_{n=1}^{N} \sum_{k=1}^{K} ho^{1}_{nk} \cdot In^{1}_{nk}(tim) + \sum_{j=1}^{J} ho^{2}_{1j} \cdot In^{2}_{1j}(tim) + \sum_{o=1}^{O} \sum_{j=1}^{J} ho^{3}_{oj} \cdot In^{3}_{oj}(tim)    (4)

3. Shortage cost: It is defined by the demand from the manufacturer de_{oj}(tim), the
shortage cost sc_{oj} and the real-time inventory of the final product In^{2}_{oj}(tim),
as in Eq. (5).

S = \sum_{o=1}^{O} \sum_{j=1}^{J} sc_{oj} \cdot (de_{oj}(tim) - In^{2}_{oj}(tim))    (5)

4. Transportation cost: It involves the transport cost between the distributors and the
manufacturer and between the manufacturer and the suppliers, as in Eq. (6).

Tr = \sum_{n=1}^{N} \sum_{k=1}^{K} tc^{1}_{n1k} \cdot In^{1}_{nk}(tim) + \sum_{o=1}^{O} \sum_{j=1}^{J} tc^{2}_{1oj} \cdot In^{2}_{1j}(tim)    (6)

5. Time cost: It involves the time cost between the distributors and the manufacturer
and between the manufacturer and the suppliers, as in Eq. (7).

T = \sum_{n=1}^{N} \sum_{k=1}^{K} ve^{1}_{n1k} \cdot tr^{1}_{n1k} + \sum_{o=1}^{O} \sum_{j=1}^{J} ve^{2}_{1oj} \cdot tr^{2}_{1oj}    (7)

6. Setup cost: It is the cost incurred to get equipment ready to process a different
batch of goods, as in Eq. (8).

Sec = \sum_{n=1}^{N} As_n \sum_{k=1}^{K} tc^{1}_{n1k} \cdot In^{1}_{nk}(tim) + Am + \sum_{o=1}^{O} Ad_o \left[ \sum_{j=1}^{J} tc_{ok} \cdot In^{2}_{kj}(tim) \right]    (8)

7. Quality improvement cost: It accounts for the probability-weighted additional cost
incurred to improve the quality of the finished product, as in Eq. (9).

QIC = \sum_{k=1}^{K} P_k \cdot Ac_j    (9)

In the above equation, P represents the probability, and Ac represents the additional
cost.

8. Back-ordering cost: It is the cost incurred when the inventory cannot fulfil an order
immediately and must complete it at a later time, as in Eq. (10).

BOC = \sum_{k=1}^{K} Ic_j \cdot b_j    (10)

Here, Ic represents the item cost, and b denotes the number of backorders.
The objective function of the multi-echelon supply chain inventory model is given in
Eq. (11).

Z = \alpha (O + H + Tr + S + Sec) + \beta T + \gamma (QIC + BOC)    (11)


In the above equation, the values of α, β and γ are set to α = 0.5, β = 0.2 and γ = 0.3.
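
A compact sketch of evaluating Eq. (11), assuming the individual cost terms of Eqs. (3)–(10) have already been aggregated, is given below; the function and argument names are illustrative, not part of the authors' implementation.

def total_inventory_cost(transaction, holding, transport, shortage, setup,
                         time_cost, quality_improvement, back_ordering,
                         alpha=0.5, beta=0.2, gamma=0.3):
    # Weighted multi-objective cost Z of Eq. (11).
    return (alpha * (transaction + holding + transport + shortage + setup)
            + beta * time_cost
            + gamma * (quality_improvement + back_ordering))

# Example: Z for made-up cost totals.
print(total_inventory_cost(120.0, 80.0, 60.0, 15.0, 40.0, 25.0, 10.0, 5.0))  # 167.0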

5 Results and Discussion

5.1 Simulation Setup

The developed inventory management in block chain technology under the cloud sector was
implemented in MATLAB 2018a, and the analysis was executed there. The characteristics of
the developed technique were analyzed by taking three test cases into account. The total
population size was set to 10, and the maximum number of iterations was 1000. The
behaviour of the developed W-MVO was evaluated through an algorithmic analysis and a
statistical analysis by varying the travelling distance rate over 0.2, 0.4, 0.6, 0.8,
1.0 and 1.2.

5.2 Analysis of Proposed W-MVO

The analysis of the proposed W-MVO is graphically represented in Fig. 3. The travelling
distance rate of the W-MVO is varied from 0.2 to 1.2, and the analysis is performed. In
Fig. 3a, for test case 1, the proposed W-MVO performs well for any travelling distance
rate. The cost-function curves for all TDR values overlap, so it can be concluded that
the proposed W-MVO attains the minimum cost function for any TDR. At the 500th
iteration, the cost function of the proposed W-MVO is maximum at TDR = 1.0. Hence, the
proposed W-MVO is well suited to inventory management in block chain under the cloud
sector.

6 Conclusion

This paper analyzed the influence of the hybrid W-MVO on optimal inventory management in
block chain under the cloud sector. Costs such as transaction cost, inventory holding
cost, shortage cost, transportation cost, time cost, setup cost, back-ordering cost and
quality improvement cost were considered for deriving the multi-objective model. The
effectiveness of the proposed hybrid algorithm was analyzed by varying the TDR value,
and the model was evaluated with the assistance of block chain under the cloud sector.
Moreover, from the analysis, the cost function of the proposed W-MVO is maximum at
TDR = 1.2. Thus, it can be concluded that the proposed W-MVO-based block chain under the
cloud sector performed effectively when analyzed with various TDR values.

Fig. 3 Algorithmic analysis of the proposed W-MVO for inventory management in block
chain under the cloud sector by varying the TDR for (a) test case 1, (b) test case 2 and
(c) test case 3

References

1. Chukwuemeka, G.H., Onwusoronye, O.U.: Inventory management: pivotal in effective and


efficient organizations. A case study. J. Emerg. Trends Eng. Appl. Sci. 4(1), 115–120 (2013)
2. Saha, E., Ray, P.K.: Modelling and analysis of inventory management systems in healthcare: a
review and reflections. Comp. Ind. Engine. 137, 1–16 (2019)
3. Dillon, M., Oliveira, F., Abbasi, B.: A two-stage stochastic programming model for inventory
management in the blood supply chain. Int. J. Prod. Econ. 187, 27–41 (2017)
4. Mostafaei, H., Castro, P.M., Relvas, S., Harjunkoski, I.: A holistic MILP model for scheduling
and inventory management of a multiproduct oil distribution system. Omega. 1–47 (2019)
5. Chuang, C.-H., Zhao, Y.: Demand stimulation in finished-goods inventory management:
empirical evidence from general motors dealerships. Int. J. Prod. Econ. 208, 208–220 (2019)

6. Yu, L., Chen, M., Xu, Q.: Simultaneous scheduling of multi-product pipeline distribution and
depot inventory management for petroleum refineries. 220 (2020)
7. Wang, Y., Geng, X., Zhang, F., Ruan, J.: An immune genetic algorithm for multi-echelon
inventory cost control of IoT based supply chains. IEEE Access 6, 8547–8555 (2017)
8. Rajakumar, B.R.: Impact of static and adaptive mutation techniques on genetic algorithm. Int.
J. Hybrid Intell. Syst. 10(1), 11–22 (2013)
9. Mirjalili, S., Mirjalili, S.M., Hatamlou, A.: Multi-verse optimizer: a nature-inspired algorithm
for global optimization. Neural Comput. Appl. 27, 495–513 (2016)
10. Mirjalili, S., Lewis, A.: The whale optimization algorithm. Adv. Eng. Softw. 95, 51–67 (2016)
11. Beno, M.M., Valarmathi, I.R., Swamy, S.M., Rajakumar, B.R.: Threshold prediction for
segmenting tumour from brain MRI scans. Int. J. Imaging Syst. Technol. 24(2), 129–137
(2014)
Bottleneck Feature Extraction in Punjabi
Adult Speech Recognition System

Shashi Bala, Virender Kadyan, and Vivek Bhardwaj

Abstract In this paper, the bottleneck feature extraction technique with an MLP is used for Punjabi adult speech recognition. Nowadays, neural networks are among the most widely used approaches for training and testing such systems; they help to estimate the posterior probabilities over the phoneme set. This input information can become entangled at some point, and it becomes difficult to model it with a Hidden Markov Model (HMM)-based state-of-the-art system. Here, a context-dependent model is trained first on a Deep Neural Network (DNN) and after that on a Bottleneck Neural Network (BN-NN) system with the use of a Multi-Layer Perceptron (MLP). The baseline ASR is evaluated under different environment conditions on different modelling systems. To improve the performance of the system, an MLP-based supervised learning method uses data from adjoining speech frames to change the design of the deep neural network (DNN) by extracting the bottleneck features. Finally, the MLP features are used as input for the DNN-HMM and BN-NN state-of-the-art systems. This paper presents the improvement obtained by applying the MLP feature vector, with a relative improvement of 4.03% achieved on the Punjabi ASR by varying several attributes associated with the BN-NN and DNN-HMM modelling approaches.

Keywords BN-NN · Mel-frequency cepstral coefficients (MFCC) · MLP ·


DNN-HMM

S. Bala · V. Bhardwaj
Chitkara University Institute of Engineering & Technology, Chitkara University, Rajpura, Punjab,
India
e-mail: Shashi.bala@chitkara.edu.in
V. Bhardwaj
e-mail: vivek.bhardwaj@outlook.in
V. Kadyan (B)
Department of Informatics, School of Computer Science, University of Petroleum and Energy
Studies, Dehradun, India
e-mail: ervirenderkadyan@gmail.com


1 Introduction

Neural networks have become a part of day-to-day human life over the past few years as we move steadily towards human–machine interaction. However, how to recognize the patterns of a speech process has remained a major concern for many researchers. The need for pattern recognition gave rise to many techniques such as HMM and GMM [1–3]. Apart from these, probabilistic bottleneck approaches [4–6] for HMM-GMM [1–3] acoustic modelling (e.g. MLP based) have additionally been investigated as an alternative methodology to the Hidden Markov Model system [1–3]. With regard to estimating posterior likelihoods, NN-based feature extraction with two hidden layers can be considered a non-linear feature transformation procedure, while the BN-NN approach is utilized as a non-linear discriminative analysis that can be interpreted as a dimension reduction method within the state-of-the-art framework. The BN-NN features are basically concatenated with MFCC, and together they serve as the posterior features. In the light of the ongoing success of deep neural networks in hybrid acoustic modelling, the initial step is the calculation of BN features, as taken already []. In order to fuse such state-of-the-art information into HMM-GMM, as shown in [4], the combination with BN-NN and various related ideas resulted in better system performance. Additionally, the MLP-NN is trained on MFCC features with a five-layer BN structure, where the neural network estimates phoneme posteriors. Consequently, we have built the BN-NN feature extraction on MFCC features. To begin with, the output layer is optimized; second, the resulting structure is used for MLP training, whose output feeds the DNN-HMM and BN-NN linear transformation for Punjabi speech recognition.
The rest of this paper is organized as follows: Sect. 2 reviews related work on BN features, and Sect. 3 gives a full description of the BN-NN. In Sect. 4, the whole system overview of BN feature extraction is summarized. Section 5 describes the experimental setup with the corpus table. Results and analysis are reported in Sect. 6, and finally some conclusions are drawn in Sect. 7.

2 Related Work

In [5], the authors generated bottleneck features with the help of an ANN structure, obtained a 33.3% WER and showed a 2.5% improvement over the HLDA-PLP baseline of the state-of-the-art system. Also, [7] presented bottleneck features for an LVCSR dataset with some reduction in the WER of the system, where [8] presented deep neural nets exploring rectified linear units for LVCSR utilizing a Bayesian optimization approach for a relative improvement of the system. In [4], the authors presented a system for training multilingual MLPs and, likewise, characterized the use of a language-dependent layer on top of the traditional three layers, which is used to derive phoneme posteriors. This methodology allows resources to be shared across languages without needing to develop a common phoneme set. Likewise, Morgan et al. [9] proposed a novel method for multi-layer perceptron factor analysis in which a five-layer MLP with a normalized linear bottleneck layer can outperform a three-layer MLP system using MFCC alone. Therefore, while talking about bottlenecks, in [1], Seltzer et al. presented DNN-based acoustic modelling for noise-robust recognition, finding that it can match ASR system performance on Aurora without explicit noise compensation. Kadyan et al. [3] have described various database normalization schemes using RASTA channel standardization of features before input to the MLP, obtaining an 18% relative improvement in WER.

3 Theoretical Background

3.1 Bottleneck

This approach, presented by Grezl et al. [10], can be interpreted as a non-linear dimensionality reduction method. It is fundamentally based on the MLP approach, where one of the internal layers has a small number of hidden units compared with the size of the other hidden layers. This layer imposes a constraint on the network, which must still be able to produce compressed features after forcing the dimensionality reduction. Bottleneck features can be derived using both unsupervised and supervised methods [11]. In supervised training, a decoder is used to train the acoustic model in several languages and conditions [4–6, 10]. The system comprises an encoder and a decoder, as shown in Fig. 1.
The input consists of a classifier with hidden vector x encoded to a hidden layer h, which calculates the posterior probability over the HMM states. x is encoded to the hidden layer h by a non-linear activation function σ, using the learned weight matrix W^(1) and bias vector b^(1) as follows:

Fig. 1 Structure of bottleneck feature with decoder

h = σ(W^(1) x + b^(1))    (1)

After that, the input is reconstructed from the hidden layer to produce an output layer y, using the learned weight matrix W^(2) and bias vector b^(2) as follows:

y = W^(2) h + b^(2)    (2)

The autoencoder parameters θ = {(W^(1), b^(1)), (W^(2), b^(2))} are learned using the back-propagation algorithm by minimizing the mean square error (MSE) loss m_MSE, defined as:

M_MSE(θ) = (1/d) m_MSE(x, y)    (3)

The learning process attempts to minimize the prediction error L(x, y) with respect to the parameters θ = {(W^(1), b^(1)), (W^(2), b^(2)), …, (W^(L), b^(L))}. Typically, the loss function in an MLP is the cross-entropy error function [12]. Bottleneck features provide more effective information while preserving enough information from the original input features.
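To make Eqs. (1)–(3) concrete, the following is a minimal NumPy sketch of a single-bottleneck autoencoder trained by back-propagation on the reconstruction error; the layer sizes (39-dimensional input, 13-dimensional bottleneck), the synthetic data and the learning rate are illustrative assumptions and not taken from the paper.

```python
import numpy as np

# Minimal single-bottleneck autoencoder illustrating Eqs. (1)-(3).
# Sizes 39 -> 13 -> 39 and the random training data are assumptions.
rng = np.random.default_rng(0)
d, k = 39, 13                                # input dim, bottleneck dim
X = rng.standard_normal((1000, d))           # stand-in for acoustic feature frames

W1 = 0.1 * rng.standard_normal((d, k)); b1 = np.zeros(k)
W2 = 0.1 * rng.standard_normal((k, d)); b2 = np.zeros(d)
sigma = lambda a: 1.0 / (1.0 + np.exp(-a))   # non-linear activation

lr = 0.05
for epoch in range(50):
    H = sigma(X @ W1 + b1)                   # Eq. (1): h = sigma(W1 x + b1)
    Y = H @ W2 + b2                          # Eq. (2): reconstruction y = W2 h + b2
    E = Y - X
    loss = np.mean(np.sum(E ** 2, axis=1))   # Eq. (3): mean squared reconstruction error
    dY = 2.0 * E / X.shape[0]                # back-propagation of the MSE loss
    dW2 = H.T @ dY; db2 = dY.sum(axis=0)
    dH = (dY @ W2.T) * H * (1.0 - H)         # sigmoid derivative
    dW1 = X.T @ dH; db1 = dH.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

bottleneck_features = sigma(X @ W1 + b1)     # compressed 13-dimensional features
print(round(loss, 4), bottleneck_features.shape)
```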

4 System Overview

The bottleneck-NN-based Punjabi ASR system is described in Fig. 2. First, the system is trained and tested on the bottleneck-NN. For evaluating the accuracy, the front-end feature extraction technique MFCC is used in the BN-NN-based ASR solution. So, to improve the performance of the BN-NN-based ASR, it is trained on MLP features using the KALDI toolkit [13].

Fig. 2 Block diagram summarizing the BN-NN features for enhancing Punjabi ASR system
In the training and testing phases, the input speech is processed using a 20 ms window with a frame rate of 100 Hz and a pre-emphasis factor of 0.97. The extracted input frames are converted into the frequency domain using the DFT, which helps to remove the phase information from the short-term spectrum. The Fourier output is additionally passed through 25 filter banks. Finally, the DCT is applied to the Mel-frequency spectrum, which is effective in delivering sets of de-correlated cepstral coefficients; the higher-order coefficients are discarded. Therefore, the output obtained is 13 default coefficients with a splicing factor; a context of 9 frames, with 4 left and 4 right neighbours, has been analyzed. The main output obtained with the 13 default MFCCs is processed with HMM modelling; following the feature extraction procedure, the monophone models and the Δ + ΔΔ (delta + delta-delta) triphone features are computed on the tri2 models. Further, these Δ + ΔΔ features are joined with the static cepstra to form a 39-dimensional feature vector. So as to improve the performance of the framework, LDA and MLLT techniques have been applied to obtain 40-dimensional spliced and reduced features. These features are further processed with the fMLLR approach in the tri4 model, where speaker adaptive training is used to handle the tri4 features. The output has been obtained on triphone modelling, where the model has been given to the baseline GMM-HMM and DNN-BN approaches. In the GMM-HMM part, a three-state HMM with eight diagonal-covariance Gaussian mixtures per state is used. A total of 2500 tree leaves and 30,000 Gaussians are selected. Further, the DNN system is trained with a tanh non-linearity model with variation in the hidden layers. For improvement of the DNN-HMM system, the learning rate and number of epochs are tuned on a mini-batch size of 512.
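The following is a hedged sketch of the front-end just described (13 MFCCs with Δ and ΔΔ, and ±4-frame splicing), written with the librosa library rather than the KALDI recipes used in the paper; the file name and exact parameter values are placeholders.

```python
import numpy as np
import librosa

# 'utterance.wav' is a placeholder path; 20 ms windows at a 10 ms shift (100 frames/s).
y, sr = librosa.load("utterance.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=int(0.020 * sr), hop_length=int(0.010 * sr))
static = mfcc.T                                        # (frames, 13)
delta = librosa.feature.delta(mfcc).T                  # first-order dynamics
delta2 = librosa.feature.delta(mfcc, order=2).T        # second-order dynamics
feats39 = np.hstack([static, delta, delta2])           # 13 + delta + delta-delta = 39 dims

def splice(frames, context=4):
    """Concatenate each frame with its 4 left and 4 right neighbours (9 frames)."""
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(frames)] for i in range(2 * context + 1)])

spliced117 = splice(static)     # 9 x 13 = 117 dims, reduced to 40 by LDA + MLLT in KALDI
print(feats39.shape, spliced117.shape)
```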

5 Experimental Setup

For the experiments, we employ two sets of corpora, in which 422 phonetically rich sentences and connected words were generated from the 5000 most frequent words. Later, these sets were combined to form a single set. The unique sentences and words were further recorded by 20 different speakers. A Roman transcription of the audio was prepared, keeping in view the linguistic characteristics of the Punjabi language. There are 7 male and 13 female speakers in the synthetically created dataset. Also, we divide the collected dataset into two sets: 70% for training and the remaining 30% for testing. Table 1 represents the training and testing partitions of the combined dataset. To analyze the performance on the obtained dataset, two parameters were employed, i.e. word error rate (WER) and relative improvement (RI).

Table 1 Corpus specification

Parameter                   Test                Train
No. of speakers             6                   14
Language used               Punjabi
Type of data                Phonetically rich sentences and isolated words
Age of speakers             18–26
Total no. of audio files    1211                2866
Gender                      3 male, 3 female    4 male, 10 female

6 Results and Discussion

This section reports the accuracy of the system. To obtain the performance of the ASR utilizing the BN-NN and DNN-HMM, metrics such as word accuracy, WA = 100 × ((TW − S − D − I)/TW), and word error rate, WER = (S + D + I)/TW, are used, where TW is the total number of words and S, D and I denote substitutions, deletions and insertions, respectively.
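For clarity, a small generic Python function (not tied to the KALDI scoring scripts used in this work) that computes S + D + I via a word-level edit distance and hence the WER defined above; WA is then simply 100 − WER.

```python
def wer(reference, hypothesis):
    """Word error rate (%): 100 * (S + D + I) / TW via Levenshtein alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                                  # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                                  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])   # substitution (or match)
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    errors = dp[len(ref)][len(hyp)]                   # S + D + I
    return 100.0 * errors / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sit on mat"))   # 1 substitution + 1 deletion
```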

6.1 Performance Measure in Clean Environment of GMM-HMM with MFCC and DNN

For the entire dataset, training and testing were done using the corpus of the system, i.e. speaker-independent in the training phase and speaker-dependent in the testing phase. The acoustic models were trained on these datasets, and a corresponding language model of size 5 k was utilized. An input speech signal was processed to generate acoustic features using 13 static MFCC + Delta + Double Delta coefficients. It was likewise observed that linear discriminant analysis (LDA) later transforms these extracted acoustic features and substantially improved training on the small-vocabulary dataset. The initial 13 MFCC features, in combination with nine frames, resulted in 117 dimensions, which were further reduced to 40 dimensions through the LDA approach. Apart from this, these features were additionally used for HMM state alignments utilizing triphone models (Table 2).

Table 2 WER verifying through the acoustic modelling with MFCC


Mono Tri1 Tri2 Tri3 Tri4 DNN BN-NN
WER (%) 6.50 8.09 8.07 7.88 5.76 4.12 4.03

Table 3 WER verifying learning rate on different modelling techniques


Learning rate 0.005–0.0005 0.010–0.0010 0.015–0.0015 0.020–0.0020
DNN 4.12 3.98 4.03 4.11
BN-NN 4.03 4.06 4.05 3.80

6.2 Performance Measure Through Learning Rate

For successful training of the DNN-HMM system, different variants were trained by varying key parameters, i.e. the learning rate, the number of iterations and the number of epochs, as follows:
Further, to boost the accuracy of the explored system, different values of the learning rate are analyzed for the gradient descent algorithm, in which the layer weights are updated through the error gradient so as to reduce the error rate. Therefore, the system is examined by calculating a predetermined piecewise-constant learning rate, which clearly specifies when to change the learning rate and to what value [12]. Table 3 shows the results obtained for the different learning rates, and the system finally achieves the maximum efficiency for DNN and BN-NN at 0.010–0.0010 and 0.020–0.0020, respectively.
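As an illustration only (the exact KALDI schedule is not reproduced here), ranges such as 0.010–0.0010 can be read as a start and an end learning rate; the short sketch below decays the rate geometrically between the two values and holds it piecewise constant within each epoch.

```python
def learning_rate(epoch, num_epochs, lr_start=0.010, lr_end=0.0010):
    """Geometric decay from lr_start to lr_end, constant within each epoch (illustrative)."""
    if num_epochs <= 1:
        return lr_start
    return lr_start * (lr_end / lr_start) ** (epoch / (num_epochs - 1))

# Example: the first few per-epoch rates of a 20-epoch 0.010-0.0010 run
for e in range(5):
    print(e, round(learning_rate(e, 20), 6))
```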

6.3 Performance Measure Through Epochs

Passing the entire dataset through the neural network only once is not enough to find the best results. In speech analysis, the instants of significant excitation caused by the closing of the vocal folds, related to the pitch, are called 'epochs'; the major excitation in the vocal tract is due to the glottal pulse occurring at each epoch location. The error calculation simply involves working out the difference between the observed output for each unit, and then adding up all these squared differences for each output unit and for each input signal [21]. Therefore, in Table 4, the number of epochs for the input audio files ranges over 15, 20, 25 and 30, where the system obtained the finest result at epochs_30 for BN-NN, with no change in the DNN-HMM system.

Table 4 WER verifying epochs on different modelling techniques


No. of epochs Epochs_15 Epochs_20 Epochs_25 Epochs_30
DNN 4.10 4.12 4.04 4.03
BN-NN 4.14 4.03 4.97 3.64

7 Conclusion

The work proposed here focuses on the effect of the feature vector on the Punjabi language with BN-NN. To further extend the effectiveness, these variations have been projected onto the DNN-HMM system. Prior to model training, optimal values of the learning rate and the number of epochs are selected to produce effective results. Overall, the system is evaluated on the original and synthetic speech corpus, where a gain has been obtained through fMLLR speaker adaptive training of the network, which gives the finest performance of the system. The output of the system on BN-NN achieved a relative improvement of 3.33% over the conventional GMM-HMM and DNN-HMM systems.

References

1. Seltzer, M. L., Yu, D., Wang, Y.: An investigation of deep neural networks for noise robust
speech recognition. In: 2013 IEEE International Conference on Acoustics, Speech and Signal
Processing, pp. 7398–7402 (2013)
2. Patel, I., Rao, Y.S.: Speech recognition using HMM with MFCC—an analysis using frequency spectral decomposition technique. Sig. Image Process. Int. J. (SIPIJ) 1(2), 101–110 (2010)
3. Kadyan, V., Mantri, A., Aggarwal, R.K.: Improved filter bank on multitaper framework for
robust Punjabi-ASR system. Int. J. Speech Technol. 1–14 (2019)
4. Yu, D., Seltzer, M.L.: Improved bottleneck features using pretrained deep neural networks. In:
Twelfth Annual Conference of the International Speech Communication Association (2011)
5. Grézl, F., Karafiat, M., Burget, L.: Investigation into bottle-neck features for meeting
speech recognition. In: Tenth Annual Conference of the International Speech Communication
Association (2009)
6. Grézl, F., Karafiát, M.: Hierarchical neural net architectures for feature extraction in ASR. In:
Eleventh Annual Conference of the International Speech Communication Association (2010)
7. Veselý, K., Karafiát, M., Grézl, F.: Convolutive bottleneck network features for LVCSR. In:
2011 IEEE Workshop on Automatic Speech Recognition & Understanding, pp. 42–47 (2011)
8. Dahl, G.E., Sainath, T.N., Hinton, G.E.: Improving deep neural networks for LVCSR using
rectified linear units and dropout. In: 2013 IEEE International Conference on Acoustics, Speech
and Signal Processing, pp. 8609–8613. IEEE (2013)
9. Morgan, N.: Deep and wide: multiple layers in automatic speech recognition. IEEE Trans.
Audio Speech Lang. Process. 20(1), 7–13 (2011)
10. Grézl, F., Karafiát, M., Kontár, S., Cernocky, J.: Probabilistic and bottle-neck features for
LVCSR of meetings. In: 2007 IEEE International Conference on Acoustics, Speech and Signal
Processing-ICASSP’07, vol. 4, pp. IV-757. IEEE (2007)
11. Valente, F., Magimai-Doss, M., Wang, W.: Analysis and comparison of recent mlp features
for lvcsr systems. In: Twelfth Annual Conference of the International Speech Communication
Association (2011)
12. Essays, UK.: Speech recognition using epochwise back propagation (November 2018). Int. J. Comput. Appl. 0975–8887. Retrieved from https://www.ukessays.com/essays/computer-science/speech-recognition-using-epochwise-8817.php?vref=
13. Yegnanarayana, B., Murty, K.S.R.: Event-based instantaneous fundamental frequency esti-
mation from speech signals. IEEE Trans. Audio Speech Lang. Process. 17(4), 614–624
(2009)
14. Dahl, G.E., Yu, D., Deng, L., Acero, A.: Context-dependent pre-trained deep neural networks
for large-vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 20(1),
30–42 (2011)

15. Bourlard, H., Morgan, N.: Continuous speech recognition by connectionist statistical methods.
IEEE Trans. Neural Netw. 4(6), 893–909 (1993)
16. Grézl, F., Karafiat, M., Janda, M.: Study of probabilistic and bottle-neck features in multilingual
environment. In: 2011 IEEE Workshop on Automatic Speech Recognition & Understanding,
pp. 359–364. IEEE (2011, December)
17. Grezl, F., Fousek, P.: Optimizing bottle-neck features for LVCSR. In: 2008 IEEE International
Conference on Acoustics, Speech and Signal Processing, pp. 4729–4732. IEEE (2008)
18. Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M.,
Motlicek, P., Qian, Y., Schwarz, P., Silovsky, J.: The Kaldi speech recognition toolkit. In: IEEE
2011 Workshop on Automatic Speech Recognition and Understanding (No. CONF). IEEE Sig.
Process. Soc. (2011)
19. Rodríguez, L.J., Torres, I.: Comparative study of the baum-welch and viterbi training algo-
rithms applied to read and spontaneous speech recognition. In: Iberian Conference on Pattern
Recognition and Image Analysis, pp. 847–857. Springer, Berlin, Heidelberg (2003)
20. Senior, A., Heigold, G., Ranzato, M. A., Yang, K.: An empirical study of learning rates in deep
neural networks for speech recognition. In: 2013 IEEE International Conference on Acoustics,
Speech and Signal Processing, pp. 6724–6728
A Study of Machine Learning
Algorithms in Speech Recognition
and Language Identification System

Aakansha Mathur and Razia Sultana

Abstract Speech recognition is a broad topic that primarily involves sub-topics like language identification, speaker identification, speech emotion recognition, speech-to-text systems, text-to-speech systems, dialogue systems and much more. Human beings are quickly able to recognize or identify a language because of the corpus of knowledge built over the years; however, it is a challenging task to have a machine identify a spoken language. So, building a system that can correctly identify multiple languages irrespective of the dialect and speaker characteristics is an interesting area of research. One benefit of such a LID system is that the barrier between people caused by language differences can be broken, and such a system will further the progress of globalization. The latest developments of machine learning in speech and language are described as a detailed state of the art in this paper.

Keywords Machine learning · Speech recognition · Support vector machines ·


Classification · Language

1 Introduction

Every language across the world can be recognized by a machine learning algorithm using an identification pattern. A language identification system (LID) aims to detect the
language spoken in an audio speech signal or file. Most LID systems, proposed by
research, consist of two stages:
1. Feature extraction stage: The feature extraction stage primarily involves extrac-
tion of audio signal features like melody and stress.
2. Classification stage.
Before the feature extraction stage, the input speech signals are preprocessed.

A. Mathur (B) · R. Sultana


Department of Computer Science, BITS Pilani, Dubai, United Arab Emirates
e-mail: mathur.aakansha7@gmail.com
R. Sultana
e-mail: razia@dubai.bits-pilani.ac.in


1.1 Preprocessing

The preprocessing steps include the following.

1.1.1 Pre-emphasis

The pre-emphasis stage primarily involves smoothing the frequency spectrum. This method improves the efficiency of the system by amplifying the high-frequency part of the signal.

1.1.2 Framing

Framing is the method of dividing a speech signal into frames. Each frame is usually 20–30 milliseconds long. The frames overlap each other for some milliseconds; a common overlap time is 20 milliseconds.

1.1.3 Windowing

Windowing lessens the discontinuities in the frames. A window function is applied on every frame. A commonly used window function is the Hamming window function.
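A minimal NumPy sketch of the three preprocessing steps above is given below; the pre-emphasis coefficient of 0.97 and the 30 ms frame length with 20 ms overlap are illustrative assumptions rather than values prescribed by any particular system.

```python
import numpy as np

def preprocess(signal, sr, alpha=0.97, frame_ms=30, overlap_ms=20):
    """Pre-emphasise, split into overlapping frames and apply a Hamming window."""
    # 1.1.1 Pre-emphasis: amplify the high-frequency part of the signal
    emphasised = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # 1.1.2 Framing: frame length and step (frame length minus overlap)
    frame_len = int(sr * frame_ms / 1000)
    step = frame_len - int(sr * overlap_ms / 1000)
    n_frames = 1 + max(0, (len(emphasised) - frame_len) // step)
    frames = np.stack([emphasised[i * step:i * step + frame_len]
                       for i in range(n_frames)])
    # 1.1.3 Windowing: a Hamming window lessens discontinuities at the frame edges
    return frames * np.hamming(frame_len)

sr = 16000
windowed = preprocess(np.random.randn(sr), sr)   # one second of dummy audio
print(windowed.shape)
```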

1.2 Machine Learning in Speech Processing

The classification stage involves selecting a classifier, passing the extracted features
to the classifier and identifying the language. Earlier research had involved using
only audio signal processing techniques for language identification. In other words,
only signal processing techniques were used for both stages in a LID system. A
common signal processing method used for classification purposes is vector quan-
tization method. However, as research progressed, many researchers began using
machine learning classification algorithms like Gaussian mixture models (GMM),
decision trees (DT), K-nearest neighbours (K-NN), support vector machines (SVM),
artificial neural networks (ANN) and deep neural networks (DNN). These machine
learning classifiers performed very well in a LID system. This also highlights the
application of machine learning in speech recognition. While the classification stage
of a LID system has used machine learning techniques, the feature extraction stage
predominantly uses signal processing techniques.

1.3 Types of Features

There are various speech signal features that are extracted for discriminating between
languages. There are two types of features: low-level and high-level speech features.
The different types of low-level features are acoustic features, phonotactic features
and prosodic features. The different types of high-level features are lexical and
syntactic features. Most of the research is focused on using low-level features to
language identification.

1.3.1 Acoustic Features

Another stream of features called acoustic features is obtained through two tech-
niques: linear prediction and mel frequency cepstral coefficients (MFCC) based.
The linear prediction techniques give linear predictive coding (LPC), linear predic-
tion cepstral coefficients (LPCC) and perceptual linear predictive features (PLP). The
MFCC features are widely used in research because of their robustness and ability
of eliminating speaker-dependent features.

1.3.2 Auditory Features

Furthermore, MFCC, PLP and RASTA-PLP are auditory features, while MFCC and LPCC are static features. Auditory features use the filter bank method for extracting features and are inspired by the human hearing system. Static features involve dividing speech signals into frames to obtain static characteristics, and these characteristics vary with time. Another notable feature type in speech processing is the phonotactic feature, involving the study of phonemes and their arrangement. The smallest unit in a language is the phoneme, and phonemes are used to construct meaningful parts of a language; a phoneme itself does not have a meaning. Phonology is a field concerned with the functioning of sounds in a language, the objective of which is to make speech meaningful. Prosodic features like melody (pitch), stress, intonation, duration of speech and rhythm are also extracted in the research.

1.3.3 Lexical Features

Lexical features are a type of high-level features and deal with a language’s word
structure. The research with lexical features primarily involves extracting words from
the speech and building word-level LID systems.

1.3.4 Syntactic Features

Syntactic features are concerned with the order of words and sentence structure in a
language. Not many LID systems have been constructed which use syntactic features.
Researchers have used the extracted features individually and in combinations in the
LID systems. Often, researchers compensate for noise in the input speech signal to improve the performance of the LID system; it is up to the researcher whether noise should be compensated for or not. Recent research has also attempted to identify sub-languages. For instance, the Indian subcontinent consists of many sub-languages, and researchers have tried to identify sub-languages such as Tamil, Hindi, Punjabi or Assamese. The data sets used by researchers primarily consist of speeches from local news and radio broadcasts; for instance, researchers using Indian languages derived their data sets from All India Radio broadcasts or the Doordarshan Television Network. Moreover, the data sets consist of male and female speakers and speakers with different dialects to add variability. The extracted features are input to the classification algorithm. The classifier first trains itself on these features and then recognizes the language in an unknown audio signal. So, researchers have further split the classification stage into a learning phase and a recognition phase. Various studies have been conducted to improve the learning phase, which in turn improves the performance of the LID system. Let us now look at the chronological development of research in language identification in recent years. The next four sections explain a few of the LID models, followed by the conclusion.

2 Language Identification Model I

The objective of the research [1] was to build a LID for three types of Indone-
sian languages. The research extracted high-level speech features and phonotactic
features. The research used two phonotactic feature extraction methods:
1. Phone recognition followed by language modelling (PRLM)
2. Parallel phone recognition followed by language modelling (PPRLM).

2.1 Methodology

The research analysed and compared the performance of the two phonotactic
methods. The input to the PRLM is a speech signal. The PRLM method first performs
phone recognition and then performs classification of the phone into the target
languages. The PRLM system consists of a single universal phone recognizer. The
universal phone recognizer is created using an n-gram statistics model. That is, the likelihood of sequences of phones appearing in a certain language is calculated. The phone recognition output from a speech signal is used to tabulate a log likelihood for each language. The identification of the language in a speech is determined by the maximum log-likelihood value. The PPRLM method uses multiple phone recognizers. Each phone recognizer identifies the language for the phones
of a speech signal. Each phone recognizer acts as a language model for different
languages. The log likelihood values are tabulated from each language model. The
log likelihood values are compared against each other. The language in a speech is
determined by maximum log likelihood value. The research used a phone recognizer
developed by Brno University of Technology. The phone recognizer was used to
identify phones in four languages: Czech, English, Mandarin and Russian. Eighteen
speech recordings (three languages × six speakers) are contained in the database. An equal number of male and female speakers' recordings were taken. The speech clips are sampled at a frequency of 16 kHz.

2.2 Data set and Putting into Practice

The data set is divided into training subset, development subset and test subset.
The research performed experiment on the n-gram models for PRLM and PPRLM
methods. In PRLM experimentation, the research trains for four spoken language
identification systems using Czech, English Hungarian and Russian and then tests
the systems. The four systems are tested on three Indonesian languages. The four
systems are tested with different n-grams statistical models. The value of n ranged
from 3 to 10. Confusion matrices for PRLM experiments are derived. It was observed
that English and Russian phone recognizers gave the highest accuracy of 77.42 and
75.94%, respectively. The PPRLM experiments consist of two language identifica-
tion systems. The first system creates interpolated models by using all phone recog-
nizers for Czech, English Hungarian and Russian. Then, the first system tokenizes
the phones. The second language identification system uses two phone recognizers
that give the highest accuracy in PRLM experiments. The research selected phone
recognizers of English and Russian as they had the highest accuracy in PRLM exper-
imentation. The two language identifications systems are also tested with the three
Indonesian languages. The two LID are also tested for the different n-gram statistical
models.
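To make the PRLM scoring concrete, the toy sketch below (not the authors' implementation; the phone strings and smoothing are invented placeholders) builds one phone n-gram model per language and labels a test phone sequence by the maximum log likelihood, as described above.

```python
import math
from collections import Counter

def ngrams(phones, n=3):
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]

def train(phone_sequences, n=3):
    """Count phone n-grams for one language (a toy PRLM language model)."""
    counts = Counter()
    for seq in phone_sequences:
        counts.update(ngrams(seq, n))
    return counts, sum(counts.values())

def log_likelihood(phones, model, n=3, vocab=1000):
    counts, total = model
    # Add-one smoothing so unseen n-grams do not give minus infinity
    return sum(math.log((counts[g] + 1) / (total + vocab)) for g in ngrams(phones, n))

# Invented placeholder phone strings standing in for phone recognizer output
models = {"lang_A": train([list("abacabadaba")]), "lang_B": train([list("zyxwzyxwzyx")])}
test = list("abacaba")
print(max(models, key=lambda lang: log_likelihood(test, models[lang])))   # -> lang_A
```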

3 Language Identification Model II

The research [2] proposed a language identification model that identifies the
following five languages: Arabic, Chinese, English, Korean and Malay. The data set
consisted of ten speakers and each of them spoke the different languages mentioned
earlier. So, the total number of recordings was 50 (ten speakers × five languages).

3.1 Methodology

The preprocessing is the first step in the LID. The preprocessing step consists of
amplification of the speech signal. The speech signal was amplified because it was a
weak signal, and it could not be used as an input. Another preprocessing procedure
was removing the silence in speech recordings and removing the background noise.
The pre-emphasis stage performed noise removal in the speech and emphasized
the higher frequencies in the speech signal. There were two ways to implement
pre-emphasis stage. One way was pre-emphasis as a fixed coefficient filter. The
second way was pre-emphasis as an adaptive coefficient filter. In the second way,
the coefficient was adjusted with time according to a speech’s autocorrelation value.
The pre-emphasis causes spectral flattening. This results in the signal being less
vulnerable to the finite precision effects in subsequent signal processing.

3.2 Procedural Steps in a Nutshell

The speech was divided into frames of 50 milliseconds, with the frames overlapping every 20 milliseconds. The research assumed that the speech signal was stationary over each frame. The research increased the correlation of the linear predictive coding (LPC) in order to decrease the discontinuity between the beginning and end of each frame; this was done by windowing each frame with a Hamming window. Then, the proposed system passed the windowed frames through the fast Fourier transformation and mel-frequency warping, by which the mel spectrum was obtained, and the logarithm of the mel spectrum gave the MFCC. The model derived these features because MFCC features are robust. Once the MFCC features were extracted, they were passed to the classification stage. The research used the vector quantization (VQ) method as the classifier. The VQ technique is a classic technique in audio processing: the process of approximating feature vectors by quantizing them to a small set of values is known as the quantization process. The research created a codebook, which is used by VQ; the objective of the codebook is to work as a descriptor for the vector quantizer. The codebook contains a set of fixed prototype vectors, where each vector is called a codeword. The VQ process matches the input vector to a codeword in the codebook, and to perform this task, the VQ method needs a distortion measure. The index of the matched codeword replaces the input vector; the index should indicate the codeword with the smallest distortion in the codebook. So, minimization of the distortion is the goal of the VQ technique.
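A brief sketch of VQ-based classification in the spirit described above (the codebook size, the Euclidean distortion measure and the synthetic MFCC frames are assumptions, since the paper's exact settings are not fixed here): one k-means codebook is learned per language, and a test utterance is assigned to the language whose codebook yields the smallest average distortion.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_codebook(mfcc_frames, codebook_size=32):
    """Learn a VQ codebook (set of codewords) from one language's MFCC frames."""
    km = KMeans(n_clusters=codebook_size, n_init=10, random_state=0).fit(mfcc_frames)
    return km.cluster_centers_

def avg_distortion(frames, codebook):
    """Mean Euclidean distortion between each frame and its nearest codeword."""
    dist = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return dist.min(axis=1).mean()

rng = np.random.default_rng(1)
# Placeholder MFCC frames for two languages (a real system uses extracted MFCCs)
codebooks = {"arabic": train_codebook(rng.normal(0.0, 1.0, (500, 13))),
             "malay": train_codebook(rng.normal(2.0, 1.0, (500, 13)))}
test_frames = rng.normal(2.0, 1.0, (200, 13))
best = min(codebooks, key=lambda lang: avg_distortion(test_frames, codebooks[lang]))
print(best)   # expected: 'malay'
```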

3.3 Data set

The research divided the data set into testing and training data set. The training data
set consisted of speech recordings from four males and one female. The testing data
set consisted of speech recordings from two males and three females. The research
optimized the frequency parameter and codebook size parameter and observed its
effects on recognition rate.

3.4 Evaluation Results

The audio files were set to two frequencies: 8 and 16 kHz. The recognition rate for
all the five languages was higher for 16 kHz sampling frequency than that of 8 kHz
frequency. The recognition rate is the ability of the classifier to correctly classify the
audio signals into the different languages. The average recognition rate for the five
languages was 78%. A limitation of the research was the lack of experimentation with machine learning classifiers such as SVM, K-NN, ANN and K-means clustering.

4 Language Identification Model III

The next research proposal [3] applied a machine learning procedure to build a LID that uses MFCC and K-NN, a machine learning classifier. The LID is used to identify Arunachal languages.

4.1 Data set

The data set consisted of speech files from five types of Arunachal languages. The speech recordings were taken from All India Radio local news broadcasts, and the speech files are of 4 min duration.

4.2 Procedural Steps in a Nutshell

The first stage of the system is the feature extraction stage. The research extracted MFCC features because the MFCC models the production and perception of speech in a way similar to that of a human being; the logarithmic perception of loudness and pitch is imitated by the MFCC, and MFCC features do not include speaker-dependent characteristics. The MFCC feature extraction technique involves the following steps: framing, windowing, discrete Fourier transformation, mel filter bank and discrete cosine transformation (DCT). Framing is the process of dividing speech signals into frames, and these frames overlap each other. One minute of a speech signal yields a sequence of 5000 13-dimensional feature vectors. The discrete Fourier transformation is a process to convert the speech signal from the time domain to the frequency domain.

4.3 Machine Learning Algorithm

The classification stage follows the feature extraction stage. The research had chosen
K-NN algorithm for classification task. The MFCC features extracted are passed to
the K-NN algorithm for language identification of a speech signal. The training data
set consisted of 20 min of the speech file. The testing data set contained speech
samples of time length of 20 s.
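A hedged sketch of this classification step follows; the feature arrays, labels and k = 5 are placeholders for illustration, since the exact K-NN settings of the paper are not reproduced here, and each speech segment is assumed to be summarized by a fixed-length vector (e.g. the mean MFCC vector).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data: each row summarizes one speech segment (e.g. mean of its MFCC
# frames) and y holds the corresponding Arunachal language label.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 13))
y_train = rng.choice(["Adi", "Apatani", "Galo", "Idu", "Tagin"], size=100)
X_test = rng.normal(size=(10, 13))

knn = KNeighborsClassifier(n_neighbors=5)   # k = 5 is an illustrative choice
knn.fit(X_train, y_train)
print(knn.predict(X_test))
```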

4.4 Extended Evaluation

The research further experimented by changing the test data. The test data was
changed from 20 s speech signals to 10 s speech signals. It was observed that the
correct prediction accuracies of Adi, Apatani, Galo, Idu and Tagin were 77%, 65.8%,
94%, 97% and 83%. The Adi language was misclassified into Apatani, Galo, Idu and
Tagin with a misclassification rate of 1.5%, 20%, 1.5% and 0.5%, respectively. The
Apatani language was misclassified into Adi, Galo, Idu and Tagin with a misclas-
sification rate of 14.6%, 14.1%, 1.4% and 3.9%, respectively. The Galo language
was misclassified into Adi, Apatani, Idu and Tagin with a misclassification rate of
3.6%, 1.2%, 0.2% and 0.9%, respectively. The Idu language was misclassified into
Adi, Apatani, Galo and Tagin with a misclassification rate of 0.6%, 0.4%, 0.8% and
0.2%, respectively. The Tagin language was misclassified into Adi, Apatani, Galo
and Idu with a misclassification rate of 5.3%, 5.1%, 4.5% and 1.8%, respectively.
The research did not explore other classification algorithms when MFCC features
are used.

5 Language Identification Model IV

Another similar work [4] constructs a language identification system that identifies four different Indian languages, Tamil, Telugu, Hindi and Kannada, using machine learning algorithms.

5.1 Methodology

The language identification system takes a speech signal as input and classifies it into one of the four Indian languages by performing computations on the speech signal. The classifiers that the research work uses are the decision tree and SVM.

5.2 Data set

The proposed language identification system consists of several steps: MFCC feature
generator, feature vectors, training data and classifier. The data set consists of audio
files in waveform audio file format (WAV). The speech recordings are obtained from
news broadcasts of Doordarshan Television Network. The data set consists of 5 h
of speech recording for each language. The data set is divided into two: testing
and training data. Before feature extraction, the speech files are preprocessed. The
preprocessing step involves removing silence in the speech recordings. This is done
by using short-term energy function.

5.3 Procedure in a Nutshell

The system proposed by the research has a feature extraction step that involves
extracting MFCC features. The MFCC features remove the harmonics from speech
signals thereby eliminating speaker-dependent characteristics. The MFCC extraction
technique involves the following steps. The first step is framing the signal into short frames; the frames are of length 20 milliseconds, and the frame shift is of length 10 milliseconds. The next step is computing the periodogram estimate of the power spectrum. Following this, the mel filter bank is applied to the power spectra and the energy in each filter is summed; this step is also called mel scale filtering, and its output is the mel frequency spectrum. Next, the logarithm of all filter bank energies is tabulated, and the DCT is applied to these log filter bank energies. The DCT coefficients 2–13 are kept and the others are discarded. From the discrete cosine transformation, we get the mel cepstral coefficients. Once the MFCCs are obtained, the MFCC feature values are saved in a comma-separated value (CSV) file.

5.4 Machine Learning Algorithm

The SVM classifier finds optimal hyperplanes that perform the data separation with minimal to no errors, and the support vectors are those training points that are closest to the optimal hyperplanes. Each training sample is stored as a row in the CSV file. This CSV file is passed to the support vector machine and decision tree classifiers for training. The research assessed the performance of the classifiers by calculating the detection rate. The classifiers classify the unknown test speech signals.
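To illustrate this step, a short scikit-learn sketch is given below; the file name 'mfcc_features.csv', the label column and the 70/30 split are assumptions, not details taken from the paper.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Placeholder CSV: one row per training sample, MFCC columns plus a 'language'
# label column (Tamil / Telugu / Hindi / Kannada).
df = pd.read_csv("mfcc_features.csv")
X, y = df.drop(columns=["language"]), df["language"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

for name, clf in [("SVM", SVC(kernel="rbf")),
                  ("Decision tree", DecisionTreeClassifier(random_state=0))]:
    clf.fit(X_tr, y_tr)
    print(name, "accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```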

5.5 Evaluation and Results

The detection rate is the ratio of the number of correctly classified keywords to the sum of the numbers of correctly classified, incorrectly classified and rejected keywords. The accuracies for Tamil, Telugu, Hindi and Kannada when SVM was used as the classifier were 0.4, 0.2, 0.28 and 0.33, respectively, and when the decision tree was used as the classifier they were 0.8, 0.67, 0.2 and 0.22, respectively. The research obtained overall accuracies of 76% and 73% when using support vector machines and decision trees, respectively. The research only used one type of spectral characteristic, MFCC, and did not explore other features like prosodic features.

6 Conclusion

The current report studies the significant research in speech recognition that has taken place during 2016–2020. From this report, it can be observed that machine learning has clear applications in speech recognition, and the interaction between machine learning and audio signal processing is also significant. A typical language identification system consists of a feature extraction stage and a classification stage. The data sets used by the researchers predominantly consisted of speech recordings from local news broadcasts; moreover, the researchers brought variability to their data sets by incorporating speeches by male and female speakers. Researchers have applied various feature extraction techniques, and machine learning classifiers like SVM, DT and NN have been used. Some types of neural networks that have been used are artificial neural networks, probabilistic neural networks [5], deep belief NN and FFBPNN. Some researchers have broken down the classification stage into a learning phase and a recognition phase and have tried to optimize the learning phase by building learning models. Researchers have used the extreme learning machine approach to create learning models and have tried to optimize this approach by using different optimization techniques. Lastly, the current research uses utterances of words in a speech as the data set; more research needs to be done on language identification systems for continuous speech signals.

References

1. Safitri, N. E., Zahra, A., Adriani, M.: Spoken language identification with phonotactics methods
on Minangkabau, Sundanese, and Javanese Languages. In: SLTU, pp. 182–187 (2016, January)
2. Gunawan, T.S., Husain, R., Kartiwi, M.: Development of language identification system
using MFCC and vector quantization. In: 2017 IEEE 4th International Conference on Smart
Instrumentation, Measurement and Application (ICSIMA), pp. 1–4 (2017, November). IEEE
3. Nyodu, K., Sambyo, K.: Automatic identification of arunachal language using K-nearest
neighbor algorithm. In: 2018 International Conference on Advances in Computing, Communi-
cation Control and Networking (ICACCCN), pp. 213–216 (2018, October). IEEE
4. Venkatesan, H., Venkatasubramanian, T.V., Sangeetha, J.: Automatic language identification
using machine learning techniques. In: 2018 3rd International Conference on Communication
and Electronics Systems (ICCES), pp. 583–588 (2018, October). IEEE
5. Sulthana, A.R., Gupta, M., Subramanian, S., Mirza, S.: Improvising the performance of image-
based recommendation system using convolution neural networks and deep learning. Soft
Comput., 1–14 (2020)
Plant Leaf Disease Detection
and Classification Using Machine
Learning Approaches: A Review

Majji V. Appalanaidu and G. Kumaravelan

Abstract Early detection of plant diseases will certainly increase the productivity of agricultural products. In addition, identification of the type of disease by which plant leaves are affected is a cumbersome task for human beings. Hence, in recent years, image processing techniques with machine learning algorithms have provided an accurate and reliable mechanism to detect and classify the types of diseases in plants. We deliver a comprehensive study on the identification and classification of plant leaf diseases using image processing and machine learning techniques. We present a discussion of common infections and follow the lines of investigation in the various phases of a plant disease detection system. Finally, the problems and future developments in this area are explored and identified. This review would help investigators to learn about image processing and machine learning applications in the field of plant disease detection and classification systems.

Keywords Image processing · Plant disease · Machine learning · Classification

1 Introduction

The detection of diseases in plants is an essential issue in agricultural science, and it should be controlled comprehensively. In particular, crop loss due to diseases in developing countries like India adversely affects their economic growth and nutritional standards, because almost 70% of the population depends on agriculture. Thus, detection of disease in crops/plants at an early stage plays a vital role. Besides, a few diseases have no visible symptoms, and farmers do not have enough knowledge of these diseases.

M. V. Appalanaidu (B) · G. Kumaravelan


Department of Computer Science, Pondicherry University Karaikal Campus, Karaikal,
Pondicherry, India
e-mail: naidu.lolugu@gmail.com
G. Kumaravelan
e-mail: gkumaravelanpu@gmail.com


In these cases, farmers fail to identify those diseases. Therefore, the necessity of an automated system to detect the type of plant disease and its severity level becomes more critical.
Recently, image processing techniques with machine learning (ML) algorithms have proved to be a prominent approach for automatic plant leaf recognition and categorization of diseases. Figure 1 shows the overall architecture of an automated plant disease detection and classification system. It typically involves a two-step practice. The first step consists of image processing routines: image acquisition, a method to capture images of the infected parts of the plant leaf through an RGB camera; image pre-processing, a method that removes noise from the captured image through filters; image segmentation, a method that extracts the diseased portion from the chosen image; and finally feature extraction, a method to derive feature values from the segmented image. The second step consists of a classification process through an ML algorithm, which distinguishes healthy and infected plant leaves.
The organization of this paper is as follows: Section 2 presents a Categorization of
Plant Diseases. Section 3 describes various modules that are involved in the Process
of Plant Leaf Disease Detection and classification Systems. Section 4 discusses the
results of the previous research works. Section 5 concludes this paper along with
future work directions.

Fig. 1 Architecture of an automated plant disease detection and classification system (image acquisition via smartphone or camera, pre-processing by noise removal and image enhancement, segmentation by k-means or thresholding, feature extraction of color, shape and texture, and classification by machine learning algorithms such as SVM, ANN and KNN into healthy or diseased leaf)



2 Categorization of Plant Diseases

Generally, plant diseases are caused by biotic and abiotic factors. Usually, biotic factors are living organisms such as bacteria, fungi and viruses, whereas abiotic factors are non-living factors such as excess temperature, insufficient sunlight and chemical substances from industrial outlets. Nevertheless, abiotic factors are mostly avoidable because they are less dangerous, non-infectious and non-transmissible. Thus, farmers are more worried about the biotic factors than the abiotic factors that affect the agricultural farm in terms of the quality and quantity of the crop products.
Figure 2 shows the various types of biotic factors, such as bacteria, fungi and viruses, through which plant leaves are affected. Soft spot, spot and wilt are examples of bacterial diseases that normally affect plants like potato, corn and bean. Mildew, rots and spots are examples of fungal diseases that normally affect plant leaves like carrots, beetroot and beans. Dwarfing, distortion and mosaic are examples of viral diseases that normally affect plants like tobacco, pepper, potato, tomato, eggplant, cucumber and petunia. Figure 3 shows the leaves of plants affected by the various types of biotic factors.

Fig. 2 Types of various biotic factors influencing plant diseases (bacterial diseases: soft spot, spot, wilt; fungal diseases: molds, rust, mildew, rots, cankers, spots, wilts; viral diseases: mottling, distortion, dwarfing)



Fig. 3 Leaves of the plants affected by various types of biotic factors (bacterial diseases: soft spot, spot, wilt; fungal diseases: molds, rust, mildew, rots, spots, wilts; viral diseases: mottling, distortion, dwarfing)

3 Process of Leaf Disease Detection and Classification


System

This section elaborates in detail on the various processing modules used in the development of a leaf disease detection and classification system.

3.1 Image Acquisition

For this step, investigators have used well-known datasets, namely the Plant Village dataset, integrated pest management (IPM) images, and American Phytopathological Society (APS) images. Most of the works have considered a single crop rather than a full-fledged multi-crop dataset [1–5]. Some of the experimenters use scanned images [6, 7], and a few research workers have used self-collected images. A powerful leaf disease detection system depends on capturing images under realistic environmental conditions. A list of the datasets used by the various researchers is shown in Table 1.

Table 1 List of datasets used by various researchers

S. No.  Dataset       Data collected from the source                                                       Max. no. of images  No. of researchers
1       Apple         Agriculture research institution at University of Tehran                            320                 1
2       Soybean       Plant Village dataset                                                                4775                2
3       Bitter gourd  Self-collected images                                                                470                 1
4       Cassava       Experimental field at Khaphaengsaen campus, Kasetsart University, Nakhon Pathom, Thailand   160         1
5       Cotton        Self-collected images                                                                290                 4
6       Groundnuts    Self-collected images                                                                400                 1
7       Jujube        Self-collected images                                                                45                  1
8       Paddy         Paddy fields, Shivamogga district, Karnataka state, India                            330                 4
9       Potato        Plant Village dataset                                                                300                 3
10      Rice          Self-collected images                                                                500                 4
11      Tomato        Self-collected images                                                                800                 5
12      Watermelon    Watermelon nursery in Kuala Selangor                                                 200                 1
13      Wheat         Self-collected images                                                                800                 2

3.2 Image Preprocessing

This method helps to maintain all the images at a fixed size using the resize function. Various filters are applied to the image for noise removal and image enhancement; if the captured images have no noise, they give better results in the subsequent steps. Mean and median filters have been used to eliminate unwanted information like dust, dewdrops, water drops, insects and shadows that appear in the image, while Wiener filters are used to remove the blurring effect of the leaf image. The list of preprocessing functions is shown in Table 2.

Table 2 List of preprocessing functions

S. No.  Pre-processing        Filters/functions
1       Noise removal         Mean, median and Wiener filters
2       Image enhancement     Image contrast, image resize, image cropping, image smoothing
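As a brief illustration of the preprocessing functions in Table 2, the OpenCV sketch below resizes a leaf image and applies median and mean filtering with a simple contrast adjustment; the file name, filter sizes and contrast parameters are placeholders, not values from any of the surveyed papers (a Wiener deblurring step would typically use scipy.signal.wiener).

```python
import cv2

# 'leaf.jpg' is a placeholder path; kernel sizes and contrast values are illustrative.
img = cv2.imread("leaf.jpg")
img = cv2.resize(img, (256, 256))                           # keep all images at a fixed size
median = cv2.medianBlur(img, 5)                             # median filter: removes dust/noise specks
mean = cv2.blur(img, (5, 5))                                # mean filter as an alternative
enhanced = cv2.convertScaleAbs(median, alpha=1.2, beta=10)  # simple contrast/brightness enhancement
cv2.imwrite("leaf_preprocessed.jpg", enhanced)
```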

3.3 Image Segmentation

Segmentation of the image means splitting the image into different segments and extracting the unhealthy portion of the leaf image using segmentation techniques. A list of the various segmentation techniques used by the researchers is shown in Table 3.

Table 3 List of various segmentation techniques used by the researchers

S. No.  Segmentation method              Used by the number of authors  Highest classification accuracy (%)
1       Active contour model             1                              85.52
2       Binary threshold                 1                              89.6
3       Color threshold segmentation     7                              99.8
4       Edge detection using Sobel       2                              98.75
5       Fermi energy                     2                              92.2
6       Fuzzy c-means                    1                              NA
7       Genetic algorithm                1                              95.7
8       GAACO                            1                              91.3
9       Global threshold                 1                              88
10      Grab cut algorithm               1                              NA
11      Improved histogram segmentation  1                              90
12      k-means                          12                             100
13      Otsu thresholding                4                              98
14      PSO                              1                              97.4
15      Ring segmentation                1                              90
16      SOFM                             1                              97.3
17      Thresholding and masking         1                              100
18      YCbCr color space                1                              98
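Since k-means is the most frequently used technique in Table 3, the sketch below segments a leaf image by clustering its pixel colours; the image path, k = 3 and the 'darkest cluster is diseased' heuristic are assumptions for illustration only.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

# 'leaf.jpg' and k = 3 (background / healthy tissue / diseased spots) are assumptions.
img = cv2.cvtColor(cv2.imread("leaf.jpg"), cv2.COLOR_BGR2RGB)
pixels = img.reshape(-1, 3).astype(np.float32)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pixels)
labels = kmeans.labels_.reshape(img.shape[:2])

# Heuristic: treat the darkest cluster centre as the diseased region of the leaf.
diseased_cluster = int(np.argmin(kmeans.cluster_centers_.sum(axis=1)))
mask = (labels == diseased_cluster).astype(np.uint8) * 255
cv2.imwrite("diseased_mask.png", mask)
```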

Table 4 A list of features used by the researchers

S. No.  Features                          Used by the number of authors  Highest classification accuracy (%)
1       Color                             2                              100
2       Shape                             1                              75
3       Texture                           6                              99.8
4       Color, shape                      5                              97
5       Color, texture                    6                              97.3
6       Color, texture, shape             8                              97.3
7       Discrete wavelet transform        1                              89.6
8       Fractional-order Zernike moments  1                              97.3
9       LBP features                      1                              95
10      Geometrical                       1                              76.5
11      SIFT                              1                              93.3
12      Eigen                             1                              90
13      Hu moments                        1                              85.52

3.4 Feature Extraction

The final method of image processing is feature extraction. This method is useful to reduce the image data and also to identify the disease. Color features are extracted by the color moment method (CMM) and the color co-occurrence matrix (CCM); mean and standard deviation are examples of color features. Shape features are extracted using the minimum enclosing rectangle (MER); area, perimeter and diameter are a few examples of shape features. The gray-level co-occurrence matrix (GLCM) extracts texture features; contrast, entropy and homogeneity are examples of texture features. The CCM is also used to extract combinations of color and texture. A list of the features used by the researchers is shown in Table 4.
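As an illustration of the colour and texture descriptors above, the scikit-image sketch below computes per-channel colour moments and four GLCM texture properties from a segmented leaf image; the file path is a placeholder, and in older scikit-image versions the GLCM functions are named greycomatrix/greycoprops instead.

```python
import cv2
import numpy as np
from skimage.feature import graycomatrix, graycoprops

# 'segmented_leaf.png' is a placeholder for the segmented diseased region.
img = cv2.imread("segmented_leaf.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Colour moments (CMM): per-channel mean and standard deviation
channels = img.reshape(-1, 3).astype(np.float64)
color_features = np.concatenate([channels.mean(axis=0), channels.std(axis=0)])

# GLCM texture features: contrast, homogeneity, energy and correlation
glcm = graycomatrix(gray, distances=[1], angles=[0], levels=256,
                    symmetric=True, normed=True)
texture_features = np.array([graycoprops(glcm, prop)[0, 0]
                             for prop in ("contrast", "homogeneity", "energy", "correlation")])

feature_vector = np.concatenate([color_features, texture_features])
print(feature_vector.shape, feature_vector)
```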

3.5 Existing Machine Learning Algorithms for Plant Disease


Classification

Support Vector Machine (SVM): The authors in [8] classify the five diseases of the
banana leaf. They collect a total number of 106 images by the digital camera. During
classification, training used 60 images, and testing used 46 images. SVM performs
the classification with 95.7% accuracy. The authors in [9] classify the soybean leaves
of three diseases. During classification, training used in 3341, and testing used 1434
images. They divide the whole dataset into three models, like model1, model2, and
model3. For training and testing, model1 uses 50% of the total images each. Model2 uses 60% and 40% of the overall images for training and testing, respectively. For learning and evaluation, model3 uses 70% and 30% of the total images. Among the three models, the highest classification accuracy of 62.53% was achieved by model3. The
images 400 collected from a well-known benchmark dataset called the plant village
dataset in the form of JPEG. During classification, training used in 225 and testing
used 175 images. SVM performs the classification with accuracy 97.3%. The author
compared the proposed model with NN, ANN and fuzzy set theory algorithms and concluded that the proposed model achieves the best precision. The authors in [11] classify the two diseases of potato leaves. They collect the images from the plant village benchmark dataset. During classification, training used 180 images and testing used 120 images. A multiclass SVM classifier performs the classification with an accuracy of 95%.
The authors in [12] classify the four diseases of a wheat leaf. During classification,
training used 150 images, and testing used 50 images. The proposed model multiple
classifier systems (MCS) performs the classification with 95.1% accuracy.
K-Nearest Neighbor (KNN): The authors in [13] categorize two diseases of paddy leaves. The image is first segmented by the global threshold method to isolate the unhealthy region of the leaf. Geometric features are then extracted from the segmented images and submitted to the KNN classifier; 198 images were used for training and 132 for testing. KNN classifies the paddy leaf diseases with 76.59% accuracy. The authors in [14] classify five kinds of diseases of corn leaves, captured with a digital camera; KNN classifies the corn leaf diseases with an accuracy of 90%. The authors in [15] classify two diseases of soybean leaves, using 100 images for training and 44 for testing, with 75% accuracy. The authors in [16] classify two diseases of paddy leaves using the KNN classifier; SVM and KNN perform the classification with accuracies of 93.3% and 91.10%, respectively, with 90 images used for training and 30 for testing. Finally, the authors conclude that, of the two classifiers, KNN gives the best performance. A summary of all the machine learning classification algorithms is shown in Table 5.
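A similar sketch for the KNN classifiers reviewed above, again using scikit-learn rather than the cited authors' implementations; the candidate k values and 5-fold scoring are illustrative assumptions.

```python
# Illustrative KNN classification on leaf features, selecting k by cross-validated accuracy.
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def best_knn(X, y, k_values=(1, 3, 5, 7, 9)):
    scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
              for k in k_values}
    best_k = max(scores, key=scores.get)              # k with the highest mean accuracy
    return KNeighborsClassifier(n_neighbors=best_k).fit(X, y), scores
```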
Naïve Bayes (NB): The authors in [17] present an efficient technique to classify healthy and diseased okra leaves. They test the proposed method on 79 leaf images; 49 were used for training and 30 for testing. The Naïve Bayes classifier categorizes healthy and unhealthy leaves with 87% accuracy.
Neural Network (NN): The authors in [18] proposed Enhanced Particle Swarm Optimization (EPSO) to classify root rot, leaf blight, bacterial blight, micronutrient and wilt diseases of the cotton leaf. They apply the reduced features to SVM and BPNN classifiers, using 270 images for training and 120 for testing, and classify the various cotton leaf diseases with 94% accuracy. Finally, the authors conclude that BPNN is the better of the two classifiers. The authors in [19] developed a new system to identify white rot, anthracnose, rust, ascochyta spot and witches' broom on jujube leaves. Eleven shape, four texture and nine color features are extracted from the segmented images and fed to a neural network classifier as input. The classifier identifies the various jujube leaf diseases with an accuracy of 85.33%.

Table 5 Summary of classification techniques

Classification technique | Author and year | Plant name | Number of diseases | Training images | Testing images | Classification accuracy (%)
SVM | Vijai Singh et al. (2017) | Banana | 5 | 60 | 46 | 95.7
SVM | Sukhvir Kaur et al. (2018) | Soya bean | 3 | 3341 | 1434 | 62.53
SVM | P. Kaur et al. (2019) | Grapes | 2 | 225 | 175 | 97.3
SVM | Islam et al. (2017) | Potato | 2 | 180 | 120 | 95
SVM | Tian et al. (2010) | Wheat | 4 | 150 | 50 | 95.1
KNN | M. Suresha et al. (2017) | Paddy | 2 | 198 | 132 | 76.5
KNN | S. W. Zhang et al. (2015) | Corn | 5 | 90 | 10 | 90
KNN | S. Shrivastava et al. (2014) | Soybean | 2 | 100 | 44 | 75
KNN | K. J. Mohan et al. (2016) | Rice | 3 | 90 | 30 | 93.3
NB | D. Mondal et al. (2015) | Okra | 2 | 40 | 39 | 87
NN | Revathi et al. (2014) | Cotton | 2 | 270 | 120 | 94
NN | W. Zhang et al. (2013) | Jujube | 6 | 30 | 15 | 85.33
NN | M. Ramakrishnan (2015) | Groundnuts | 1 | 360 | 40 | 97.41
DT | H. Sabrol et al. (2016) | Tomato | 5 | 117 | 266 | 97.3
DT | H. Sabrol et al. (2016) | Tomato | 5 | 598 | 150 | 77.6

The authors in [20] investigate a method to identify Cercospora disease of groundnut using BPNN. They collect a total of 400 images for the proposed method. The RGB images are first converted to HSV for color generation and description, and the background is then removed using a thresholding algorithm. Finally, texture and color features are extracted from the segmented image and passed to the BPNN classifier, which performs the classification with an accuracy of 97.41%.
Decision Tree (DT): The authors in [21] classify five diseases from tomato plant images. The decision tree performs the classification with an accuracy of 97.3%, and the authors conclude that a combination of features gives the best classification accuracy. The authors in [22] suggested a model to automate the recognition and categorization of diseases in different types of tomato leaf and stem images. All the pictures are first segmented by the Otsu thresholding technique to separate the diseased part. Ten color features are then extracted from the segmented image and stored in a feature vector. Finally, these extracted features are submitted to a decision tree, which performs the classification with 78% accuracy.
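The Otsu-thresholding segmentation step mentioned above can be sketched as follows with scikit-image; this is a generic illustration, not the cited authors' pipeline, and the assumption that the regions of interest are darker than the background may need to be inverted for a given dataset.

```python
# Sketch of Otsu-threshold segmentation to isolate leaf/lesion regions before feature extraction.
import numpy as np
from skimage import io, color
from skimage.filters import threshold_otsu

def segment_leaf(image_path):
    rgb = io.imread(image_path)
    gray = color.rgb2gray(rgb)
    t = threshold_otsu(gray)            # global Otsu threshold
    mask = gray < t                     # darker pixels assumed to be leaf/lesion; flip if needed
    segmented = rgb.copy()
    segmented[~mask] = 0                # zero out background pixels
    return segmented, mask
```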

4 Discussions

Image processing has proven to be an effective approach for identifying and diagnosing plant disease, replacing the human eye with a digital camera and the human brain with learning and optimization algorithms. The above review clarified different methods for identifying and classifying various plant leaf infections, and several observations follow from it. Table 1 lists the datasets used by the various authors and shows that the maximum number of images was taken from the plant village dataset. Table 2 describes the different preprocessing operations applied by the various authors. Table 3 lists the segmentation methods and indicates that the k-means and thresholding methods perform best among all the segmentation techniques. Table 4 presents the different types of features and the feature combinations used by the various authors; from Table 4, color features alone and combinations of features perform best among all the features. The results in Table 5 show that the NN classifier performs best among all the classifiers with respect to the classification performance measures, with a classification accuracy of 97.41%. The SVM and DT classifiers perform next best, both with a classification accuracy of 97.3%. The KNN classifier yields the next best classification performance, with an accuracy of 93.3%. Finally, NB shows the lowest classification performance with 87%.

5 Conclusions

This review paper describes the various image processing and machine learning strategies used in the detection and classification of diseases of different plants. A detailed list of image processing methods has been explained individually, and a comparison of different classification approaches has been clearly described. From the above review, researchers should implement new algorithms, with a better understanding of the existing methods, to achieve better outcomes. A mixture of as-yet-unexplored preprocessing, selection and training methods can also help improve detection and classification. By developing mobile applications, immediate solutions can be made available to farmers, and web portals may provide online solutions for plant diseases.

References

1. Mohanty, S.P., Hughes, D., Salathe, M.: Using deep learning for image-based plant disease
detection. Front. Plant Sci. 7, 1–10 (2016)
2. Ipm images. https://www.ipmimages.org/about/. Accessed 15 May 2017
3. APS Image database. https://imagedatabase.apsnet.org/search.aspx. Accessed 16 May 2017
4. Pujari, J.D., Yakkundimath, R.S., Jahagirdar, S., Byadgi, A.M.: Quantitative detection of
soybean rust using image processing techniques. J. Crop Prot. 5(1), 75–87 (2015)
5. Rumpf, T., Mahlein, A.K., Steiner, U., Oerke, E.C., Dehne, H.W., Plumer, L.: Early detec-
tion and classification of plant diseases with support vector machines based on hyperspectral
reflectance. Comput. Electron. Agric. 74(1), 91–99 (2010)
6. Pires, R.D.L., Goncalves, D.N., Orue, J.P.M., Kanashiro, W.E.S., Rodrigues, J.F., Machado,
B.B., Gonçalves, W.N.: Local descriptors for soybean disease recognition. Comput. Electron.
Agric. 125, 48–55 (2016)
7. Phadikar, S., Sil, J., Das, A.K.: Rice diseases classification using feature selection and rule
generation techniques. Comput. Electron. Agric. 90, 76–85 (2013)
8. Singh, V., Misra, A.K.: Detection of plant leaf diseases using image segmentation and soft
computing techniques. Inf. Process. Agric. 4(1), 41–49 (2017). (Elsevier)
9. Kaur, S., Pandey, S., Goel, S.: Semi-automatic leaf disease detection and classification system
for soybean culture. IET Image Process. 12(6), 1038–1048 (2018)
10. Kaur, P., Pannu, HS., Malhi, AK.: Plant disease recognition using fractional-order Zernike
moments and SVM classifier. Neural Comput. Appl. pp. 1–20 (2019). (Springer)
11. Islam, M., Dinh, A., Wahid, K., Bhowmik, P.: Detection of potato diseases using image segmen-
tation and multiclass support vector machine. In: 2017 IEEE 30th Canadian Conference on
Electrical and Computer Engineering (CCECE), pp. 1–4. IEEE (2017)
12. Tian, Y., Zhao, C., Lu, S., Guo, X.: SVM-based Multiple classifier system for recognition of
wheat leaf diseases. In: Proceedings of 2010 Conference on Dependable Computing (CDC
‘2010), pp. 2–6 (2010)
13. Suresha, M., Shreekanth, K.N., Thirumalesh, BV.: Recognition of diseases in paddy leaves
using knn classifier. In: 2nd IEEE International Conference for Convergence in Technology
(I2CT 2017), pp. 663–666 (2017)
14. Zhang, S.W., Shang, Y.J., Wang, L.: Plant disease recognition based on plant leaf image. J
Anim Plant Sci 25(Suppl. 1), 42–45 (2015)
15. Shrivastava, S., Hooda, D.S.: Automatic brown spot and frog eye detection from the image
captured in the field. Am. J. Intell. Syst. 4(4), 131–134 (2014)
16. Mohan, K.J., Balasubramanian, M., Palanivel, S.: Detection and recognition of diseases from
paddy plant leaf images. Int. J. Comput. Appl. 144(12), 34–41 (2016)
17. Mondal, D., Kole, D.K.: Detection and classification technique of yellow vein mosaic virus
disease in okra leaf images using leaf vein extraction and Naive Bayesian classifier. In: IEEE
International Conference on Soft Computing Techniques and Implementations (ICSCTI) (2015,
October)
18. Revathi, P., Hemalatha, M.: Cotton leaf spot disease detection utilizing feature selection with
skew divergence method. Int. J. Sci. Eng. Technol. 3(1), 22–30 (2014)
19. Zhang, W., Guifa, T., Chunshan, W.: Identification of jujube trees diseases using a neural
network. Int. J. Light Electron. Opt. 124(11), 1034–1037 (2013)
20. Ramakrishnan, M.: Groundnut leaf disease detection and classification by using a backpropaga-
tion algorithm. In: IEEE International Conference on Communications and Signal Processing
(ICCSP), pp. 0964–0968 (2015, April)
21. Sabrol, H., Kumar, S.: Tomato plant disease classification in digital images using classification
tree. In: International Conference on Communication and Signal Processing, IEEE, pp. 1242–
1246 (2016)
22. Sabrol, H., Kumar, S.: Intensity-based feature extraction for tomato plant disease recognition
by classification using a decision tree. Int. J. Comput. Sci. Inf. Secur. 14(9), 622–626 (2016)
Single-Channel Speech Enhancement
Based on Signal-to-Residual Selection
Criterion

Ramesh Nuthakki, Junaid Abbas, Ayesha Afnan, Faisal Ahmed Shariff, and Akshaya Hari

Abstract Over the last 40 years, researchers and engineers have proposed a great many speech enhancement algorithms to reduce noise, but little effort has been made to improve speech comprehensibility. The prime aim of this paper is to improve speech quality and comprehensibility by examining the application of a binary mask under conditions that hearing-impaired or normal listeners find unfavorable and the speech incomprehensible. Gain functions such as Wiener and spectral subtraction aim to attenuate the signal when speech is absent or the estimated SNR is low, and to retain the signal when speech is present and the estimated SNR is high. This approach requires access to accurate SNR estimates and estimates of the background noise spectrum. Even in extremely low SNR conditions (SNR < −5 dB), this aim is attainable. The method is applicable in real time in hearing aids, mobile phones and speech-activated machines.

Keywords Speech comprehensibility · Ideal binary mask · Parametric gain estimator · STOI · SSNR

R. Nuthakki · J. Abbas · A. Afnan · F. A. Shariff (B) · A. Hari


Department of Electronics and Communication Engineering, Atria Institute of Technology, ASKB
Campus, 1st Main Rd, Ags Colony, Anandnagar, Hebbal, Bengaluru 560024, India
e-mail: fahmed97186@gmail.com
R. Nuthakki
e-mail: nuthakki.ramesh@atria.edu
J. Abbas
e-mail: junaidabbas876@gmail.com
A. Afnan
e-mail: syeda.afnan66@gmail.com
A. Hari
e-mail: akshayahari123@gmail.com


1 Introduction

Human speech signals typically degrade due to various surrounding environmental conditions. Background noise is one of the most influential factors causing deterioration of speech quality and comprehensibility. Background noise may be stationary or non-stationary and is mostly uncorrelated with, but additive to, the speech spectrum [1, 2].
Speech enhancement focuses on improving speech quality through different algorithms. The objective of such an algorithm is to enhance both the comprehensibility and the overall quality of deteriorated speech using audio signal processing techniques. Although many advances have been made in developing enhancement algorithms that suppress background noise and improve the overall quality of speech, significantly less progress has been made in developing algorithms that enhance speech comprehensibility [3–7].
Earlier studies on normal-hearing listeners have reported large improvements in comprehensibility using an ideal binary mask. The mask distinguishes noise-dominated from speech-dominated units and is applied to the input noisy spectrum to obtain the noise-suppressed spectrum. This mask was customized to keep the time-frequency (T-F) regions where the target speech dominates the masker (noise) (local SNR > 0 dB) and to eliminate T-F units where the masker dominates (local SNR < 0 dB). A Bayesian classifier has been used to demonstrate the efficiency of the binary mask in enhancing speech comprehensibility [8]. The elimination or retention of T-F bins by the binary mask is determined by masker-spectrum underestimation or overestimation criteria. This is necessary because numerous existing noise-estimation algorithms underestimate the noise power spectral density (psd). Another mask can alternatively be synthesized by applying restrictions on the two types of speech degradation that the gain function can introduce [5]. When the gain function is applied, it changes the spectral amplitudes, and consequently attenuation and amplification distortions occur. Research has shown that, between amplification and attenuation distortions, the former causes more damage than the latter. Accordingly, enhanced speech containing only attenuation distortion has proven to be more comprehensible than the noisy speech. Hence, an ideal binary mask is applied to the enhanced spectrum to construct a speech signal comprising only attenuation distortion [4, 9].
The binary mask procedure used here can substantially enhance the comprehensibility of speech sentences deteriorated by background noise at SNR levels as low as −10 dB. The proposed method removes the intrusion of noise from the target speech by using the mask and enhances the SNR of the speech, thus improving the efficiency of speech communication by reducing listener fatigue and increasing listening comfort [8, 10].

2 Proposed Method

2.1 Signal Residual Selection Criterion

Consider a clean speech signal z(n) corrupted by noise d(n) that is uncorrelated with z(n). The corrupted speech y(n) is:

y(n) = z(n) + d(n) (1)

Figure 1 shows the blocks used to build the mask in the magnitude domain. The noisy speech is partitioned into frames of 20 ms with 50% overlap between adjoining frames. A Hanning window is applied to each speech frame, which is then short-time Fourier transformed. Multiplying the noisy spectrum Y(k, mi) by the gain G(k, mi) gives an estimate of the clean speech spectrum. Here, G(k, mi) is given with respect to the a priori SNR, Ẑ(k, mi) denotes the estimate of the clean sentence spectrum, mi is the frame index, and k denotes the frequency bin [9, 11].

Ẑ (k, m i ) = G(k, m i ).Y (k, m i ) (2)

After computing the estimated clean spectrum, the binary mask is formulated so as to restrict the anomalies caused by inaccuracies in estimating the noise spectrum. In particular, if Ẑ(k, mi) ≤ 2Z(k, mi), the binary mask lets the spectrum pass through; otherwise it masks the spectrum. Usually, the processed speech contains both noise underestimation and overestimation. For every T-F unit, the spectrum estimate is compared against the corresponding true spectrum; the units that satisfy the constraint are retained and those that do not are eliminated. Applying the ISTFT then gives the enhanced speech in the time domain [12].

Fig. 1 Procedure to build the mask in magnitude domain
A parametric Wiener gain filter is used as the gain function. This algorithm was chosen for its low computational complexity, easy implementation and its efficiency with respect to speech comprehensibility, compared with other, more sophisticated noise-reduction algorithms [2]. The following equation gives the parametric Wiener filter gain:
G(k, m_i) = \left( \frac{\mathrm{SNR}_{\mathrm{prio}}(k, m_i)}{\delta + \mathrm{SNR}_{\mathrm{prio}}(k, m_i)} \right)^{\omega}    (3)

Here, SNRprio is the a priori SNR, which is calculated by:

\mathrm{SNR}_{\mathrm{prio}}(k, m_i) = \alpha \, \frac{\hat{Z}^2(k, m_i - 1)}{\lambda_{\hat{D}}(k, m_i - 1)} + (1 - \alpha) \, \max\!\left( \frac{Y^2(k, m_i)}{\lambda_{\hat{D}}(k, m_i)} - 1, \; 0 \right)    (4)

α is a smoothing constant that controls SNRprio; its value is set to 0.98. The background noise variance estimate is represented by λD̂ [3, 5, 13].
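A minimal NumPy sketch of Eqs. (2)–(5) is given below. It assumes the standard parametric Wiener form for the gain of Eq. (3), oracle access to the clean magnitude spectrum Z (as in the ideal-binary-mask setting), and a per-bin noise-variance estimate lam_d; it is not the authors' MATLAB implementation.

```python
# Sketch of the per-frame gain, enhancement and binary-mask rule described above.
import numpy as np

def enhance_frame(Y, Z, lam_d, Zhat_prev, delta=3.0, omega=0.3, alpha=0.98):
    # Decision-directed a priori SNR estimate, Eq. (4); previous-frame quantities reused for brevity
    snr_prio = alpha * (Zhat_prev ** 2) / lam_d + \
               (1.0 - alpha) * np.maximum((Y ** 2) / lam_d - 1.0, 0.0)
    # Parametric Wiener gain, Eq. (3); delta = omega = 1 reduces to the standard Wiener gain
    G = (snr_prio / (delta + snr_prio)) ** omega
    Zhat = G * Y                                   # Eq. (2): enhanced magnitude spectrum
    mask = (Zhat <= 2.0 * Z).astype(float)         # Eq. (5): retain only regions I + II
    return mask * Zhat, Zhat
```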

2.2 Channel Selection Algorithms

The block diagram of the process involved for the aforementioned SNRESI -based
algorithm is shown in Fig. 2. Unlike the SNR rule, the SNRESI rule selects channels
from the enhanced (noise-suppressed) spectrum rather than from the noise-corrupted
spectrum. The noise-reduction block shown may include any conventional noise
reduction algorithm. The choice of algorithm will not influence performance, at least
in terms of intelligibility [11]. If Ẑ (k, mi ) > Z (k, mi ), it indicates noise overestimation,
and Ẑ (k, mi ) < Z (k, mi ) indicates noise underestimation distortion. Normally, both
are present in the processed speech.

Fig. 2 Block diagram representing two different channel selection algorithms



Fig. 3 Plot of SNRESI versus the ratio of enhanced (|Ẑ|) to clean (|Z|) spectra

The impact of these gain-induced distortions on the comprehensibility of speech, under steady noise and competing-talker conditions, is assessed as shown in Fig. 3. The distortions were confined to three regions by using an ordinary noise-reduction algorithm (square-root Wiener).
Reg I: Only contains attenuation distortion.
Reg II: Only contains amplification distortion lesser than 6 dB.
Reg III: Contains amplification distortion greater than 6 dB.
As seen from above, if we combine the first two regions and denote them as Reg I+II, we get the constraint:

Ẑ (k, m i ) ≤ 2. Z (k, m i ) (5)

In Region I, SNRENH(k) ≤ SNR(k), which leads to Ẑ(k, mi) ≤ Z(k, mi) and gives rise to the condition defining this region. In Region II, SNR(k) < SNRENH(k) ≤ SNR(k) + 6 dB, which gives rise to the condition defining that region. Lastly, the Region III constraint follows because, in this region, SNRENH(k) > SNR(k) + 6 dB. From the definitions of these three regions, it is evident that, to maximize SNRESI (and hence maximize comprehensibility), the estimated magnitude spectrum Ẑ(k, mi) must be retained in both regions I and II [3, 4].

3 Objective Measures

The initial (clean) speech signal and the enhanced speech signal are generally used to calculate the objective quality measures. In either the frequency or the time domain, the distortion measure is averaged over all speech frames to evaluate speech distortion. The objective measures computed in this work are the segmental signal-to-noise ratio (SSNR) and the short-time objective intelligibility (STOI).

3.1 SSNR

The SSNR can be estimated in both the time and frequency domains; the time-domain approach is the simplest measure used to evaluate a speech improvement algorithm. The original and processed signals are time-aligned, with phase errors rectified. SSNR is defined as
\mathrm{SSNR} = \frac{10}{M} \sum_{m=0}^{M-1} \log_{10} \left( \frac{\sum_{i=Nm}^{Nm+N-1} Z^2(i)}{\sum_{i=Nm}^{Nm+N-1} \big( Z(i) - \hat{Z}(i) \big)^2} \right)    (6)

where Z(i) is the initial (clean) signal and Ẑ(i) is the enhanced signal; M denotes the number of signal frames and N the frame length (20 ms).
SSNR is based on the geometric average of the SNRs over all frames of the signal. A probable issue with this approximation is that, during the periods of silence in a speech signal (which are plentiful in human conversation), the signal energy is very low, resulting in highly negative SSNR values that bias the overall assessment. To resolve this, the silent frames are excluded by comparing short-time energy computations against a threshold, and SSNR values are restricted to the range (−10, 35 dB), thereby avoiding the use of a speech silence detector. The SSNR is based on the clean and processed signals, which are passed through perceptual weighting filters; the segmental SNR is then computed from the outputs of these filters [14].
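A sketch of the SSNR of Eq. (6) with the (−10, 35) dB limiting described above follows; frame alignment and silent-frame exclusion are simplified away, so this is an approximation rather than the authors' exact MATLAB routine.

```python
# Sketch of segmental SNR, Eq. (6), with values clamped to the (-10, 35) dB range.
import numpy as np

def ssnr(clean, enhanced, fs, frame_ms=20, lo=-10.0, hi=35.0):
    N = int(fs * frame_ms / 1000)                    # frame length in samples
    M = min(len(clean), len(enhanced)) // N
    vals = []
    for m in range(M):
        z = clean[m * N:(m + 1) * N]
        e = z - enhanced[m * N:(m + 1) * N]          # residual (distortion) signal
        num = np.sum(z ** 2)
        den = np.sum(e ** 2) + 1e-12                 # guard against division by zero
        vals.append(np.clip(10 * np.log10(num / den + 1e-12), lo, hi))
    return float(np.mean(vals))
```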

3.2 STOI

This technique is designed for a sampling rate of 10,000 Hz so as to cover the frequency span relevant for comprehensibility; signals with a different sampling rate must be resampled. Moreover, we presume that the clean and estimated signals are time-aligned. First, both signals are segmented with 50% overlap to obtain a T-F representation: frames of 256 samples are Hann-windowed and zero padded to 512 samples, and a 1/3rd-octave band analysis is performed by grouping DFT channels. In total, fifteen 1/3rd-octave bands are used, with the lowest center frequency set to 150 Hz. Ẑ(k, mi) denotes the kth DFT channel of the mi th frame of the clean signal. The T-F unit given by the norm of the jth 1/3rd-octave band is then

Z_j(m_i) = \sqrt{ \sum_{k = k_1(j)}^{k_2(j) - 1} \left| \hat{Z}(k, m_i) \right|^2 }    (7)

Here k1 and k2 are the 1/3rd-octave band edges rounded to the nearest DFT channel. The T-F representation of the processed signal is obtained in the same manner and denoted by Ẑ_j(m_i). The intermediate comprehensibility assessment for a single T-F unit, denoted d_j(m_i), depends on N successive T-F units from both Z_j(m_i) and Ẑ_j(m_i), with m_i ∈ M, where
M = {(m i − N + 1), (m i − N + 2), . . . , m i − 1, m i } (8)

First, every T-F unit of Ẑ_j(m_i) is normalized by the scale factor


\alpha = \left( \frac{ \sum_{m_i} Z_j(m_i)^2 }{ \sum_{m_i} \hat{Z}_j(m_i)^2 } \right)^{1/2}    (9)

so that its energy equals the clean-signal energy within that T-F region. Then, to lower-bound the signal-to-distortion ratio (SDR), α Ẑ_j(m_i) is clipped. The SDR is defined as
\mathrm{SDR}_j(m_i) = 10 \log_{10} \left( \frac{ Z_j(m_i)^2 }{ \left( \alpha \hat{Z}_j(m_i) - Z_j(m_i) \right)^2 } \right)    (10)

Hence,

\hat{Z}' = \max\!\left( \min\!\left( \alpha \hat{Z}, \; Z + 10^{-\beta/20} Z \right), \; Z - 10^{-\beta/20} Z \right)    (11)

Here Ẑ' denotes the normalized and clipped T-F unit, and β indicates the SDR lower bound. The intermediate comprehensibility assessment is given by the correlation coefficient between the processed and unprocessed T-F units,
   

d_j(m_i) = \frac{ \sum_{m_i} \left( Z_j(m_i) - \frac{1}{N} \sum_{l} Z_j(l) \right) \left( \hat{Z}_j(m_i) - \frac{1}{N} \sum_{l} \hat{Z}_j(l) \right) }{ \sqrt{ \sum_{m_i} \left( Z_j(m_i) - \frac{1}{N} \sum_{l} Z_j(l) \right)^2 \; \sum_{m_i} \left( \hat{Z}_j(m_i) - \frac{1}{N} \sum_{l} \hat{Z}_j(l) \right)^2 } }    (12)

Finally, the OCM is obtained by taking the mean of the intermediate comprehensibility assessments over all frames and bands,

d = \frac{1}{J M} \sum_{j, m_i} d_j(m_i)    (13)
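For reference, the STOI measure defined by Eqs. (7)–(13) is also available in the third-party pystoi package; the snippet below is a convenience sketch using assumed file names and the soundfile reader, and is not part of the authors' MATLAB workflow.

```python
# Convenience sketch: computing STOI for a clean/enhanced pair with pystoi.
import soundfile as sf          # assumed I/O helper; any WAV reader works
from pystoi import stoi

clean, fs = sf.read("clean.wav")                      # hypothetical file names
enhanced, _ = sf.read("enhanced.wav")
score = stoi(clean, enhanced, fs, extended=False)     # value roughly in [0, 1]
print(f"STOI = {score:.3f}")
```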

It is clear from Table 2 that there is a significant improvement in SSNR and STOI for speech signals corrupted by random, babble, helicopter and car noise for the δ and ω values of the parametric Wiener gain filter shown. The values were chosen so as to obtain a trade-off between overall signal quality and comprehensibility.

4 Subjective Measures

A total of 8 listeners, 4 male and 4 female, were asked to participate in a listening test. Both the enhanced speech signal and the noisy sentences were played, and the listeners were asked to rate the enhanced signal out of 5 with respect to the following subjective quality parameters: background (BAK), signal (SIG) and overall (OVL).
The unprocessed speech was deteriorated by random, car, babble and helicopter noise at 0 and −5 dB SNR, respectively [15]. The listeners' scores are shown in Table 1. The scores show a substantial improvement in speech quality for the enhanced speech signal (Table 2).

Table 1 Subjective measure analysis

Noise type | SNR (dB) | BAK | SIG | OVL
Random noise | 0 (δ = 4.8, ω = 0.2) | 3.8 | 4.0 | 4.1
Random noise | −5 (δ = 3.3, ω = 0.4) | 2.2 | 2.3 | 2.9
Babble noise | 0 (δ = 1.3, ω = 0.2) | 3.9 | 4.2 | 4.3
Babble noise | −5 (δ = 2.2, ω = 0.7) | 2.6 | 2.7 | 3.0
Helicopter noise | 0 (δ = 3.0, ω = 0.3) | 3.8 | 4.2 | 4.5
Helicopter noise | −5 (δ = 2.9, ω = 0.3) | 3.7 | 4.1 | 4.4
Car noise | 0 (δ = 4.8, ω = 0.3) | 3.2 | 3.7 | 4.1
Car noise | −5 (δ = 4.5, ω = 0.4) | 2.3 | 2.9 | 4.0

Table 2 Objective measure analysis

Noise type | Input SNR (dB) | δ | ω | SSNR (dB), parametric Wiener filter | SSNR (dB), Wiener filter (δ = 1, ω = 1) | STOI, parametric Wiener filter | STOI, Wiener filter (δ = 1, ω = 1)
Helicopter noise | 0 | 3.0 | 0.3 | 14.7530 | 11.8087 | 0.934 | 0.856
Helicopter noise | −5 | 2.9 | 0.3 | 10.5136 | 9.9602 | 0.905 | 0.817
Random noise | 0 | 4.8 | 0.2 | 12.6998 | 9.4059 | 0.958 | 0.842
Random noise | −5 | 3.3 | 0.4 | 10.1681 | 8.8039 | 0.895 | 0.788
Babble noise | 0 | 1.3 | 0.2 | 3.8180 | 1.2307 | 0.921 | 0.804
Babble noise | −5 | 2.2 | 0.7 | 0.4231 | 0.5082 | 0.698 | 0.684
Car noise | 0 | 4.8 | 0.3 | 11.7970 | 11.0471 | 0.953 | 0.882
Car noise | −5 | 4.5 | 0.4 | 10.4678 | 10.1048 | 0.912 | 0.858

Fig. 4 Mean comprehensibility score

5 Mean Comprehensibility Score

Figure 4 indicates the mean percentage of words identified by listeners with normal hearing. It is evident from the figure that intelligibility improved when the noise distortion constraints were applied in the magnitude domain, and degraded for the Wiener-processed and unprocessed stimuli [3]. The recordings based on the mean percentage of words identified by listeners are shown in Fig. 4, where UN represents the values derived from unprocessed speech. A significant improvement in performance is evident when the proposed binary mask is applied, as depicted in Fig. 3. At −5 dB, performance increased from 25% with unprocessed stimuli (UN) to 97% with the proposed binary mask (Ẑ(k, mi) ≤ 2Z(k, mi)), and at 0 dB performance increased from 65% with unprocessed stimuli (UN) to 99% with the proposed binary mask (Ẑ(k, mi) ≤ 2Z(k, mi)).

6 Spectral Analysis

Spectrograms are plotted to illustrate the time-varying spectral attributes. The spectrograms for the magnitude domain are displayed in Fig. 5 and were obtained for −5 dB and 0 dB input SNR levels. From the spectrograms it can be seen that the method recovers both the voiced and silent regions as well as the formants in the magnitude domain.

7 Results and Conclusion

The parametric Wiener gain filter was used to implement the new binary mask approach in MATLAB. Different subjective and objective tests were carried out. For the objective tests, the parameters calculated were SSNR and STOI in the time domain.

[Fig. 5 shows four spectrogram panels (clean speech, noisy speech, Wiener filter processed, and enhanced speech signal) plotted as frequency (Hz) versus time (s).]

Fig. 5 Spectrograms of helicopter noise a SNR = 0 dB, b SNR = −5 dB, and spectrograms of car noise c SNR = 0 dB, d SNR = −5 dB

The tests were run for different combinations of δ and ω of the parametric Wiener gain filter for different background noises at 0 and −5 dB SNR levels. The objective scores show a clear improvement in the SSNR values for sentences degraded by helicopter, car, random and babble noise at 0 and −5 dB SNR. The subjective tests also show an improvement in overall speech enhancement quality and speech comprehensibility. The mean comprehensibility scores likewise suggest improved intelligibility for the proposed binary mask channel selection criterion.

8 Future Scope

In the future, improvements in speech comprehensibility and overall speech enhancement quality for signals degraded by noise at SNR levels as low as −10 dB may be achievable. Further improvement in other objective measures, such as SDR and PESQ, can also be obtained and verified.

References

1. Naik, D.C., Sreenivasa Murthy, A., Nuthakki, R.: A literature survey on single channel speech
enhancement techniques. Int. J. Sci. Technol. Res. 9(3). ISSN 2277-8616
2. Rangachari, S., Loizou, P.C.: A noise-estimation algorithm for highly non-stationary environ-
ments. Speech Commun. 4, 220–231 (2006). (TX 75083-0688, 2005 Elsevier B.V)
3. Kim, P., Loizou, P.C.: Gain-Induced Speech Distortions and the Absence of Intelligibility
Benefit with Existing Noise-Reduction Algorithms. Department of Electrical Engineering,
University of Texas at Dallas, Richardson, Texas. 75080 VC 2011 Acoustical Society of
America. https://doi.org/10.1121/1.3619790. pp. 1581–1596. Accepted 2 July 2011
4. Kim, G., Loizou, P.C.: Why do Speech-Enhancement Algorithms not Improve Speech Intelli-
gibility? Department of Electrical Engineering, University of Texas at Dallas, Richardson, TX
75080, USA 978-1-4244-4296-6/10/2010 IEEE 4738 ICASSP 2010
5. Nuthakki, R., Sreenivasa Murthy, A., Naik, D.C.: Single channel speech enhancement using a
new binary mask in power spectral domain. In: Proceedings of the 2nd International Conference
on Electronics, Communication and Aerospace Technology (ICECA 2018) IEEE Conference
Record # 42487; IEEE Xplore ISBN:978-1-5386-0965-1
6. Nuthakki, R., Sreenivasa Murthy, A., Naik, D.C.: Modified magnitude spectral subtraction
methods for speech enhancement. In: 2017 International Conference on Electrical, Electronics,
Communication, Computer and Optimization Techniques (ICEECCOT). IEEE. 978-1-5386-
2361-9/17/2017
7. Nuthakki, R.: Speech enhancement techniques. Int. J. Adv. Res. Sci. Eng. 6(8) (2017, August)
8. Kim, G.: Binary mask estimation for noise reduction based on instantaneous SNR estimation
using Bayes risk minimisation. Electron. Lett. 51(6), 526–528 (2015, 19 March)
9. Nuthakki, R., Sreenivasa Murthy, A.: Enhancement of speech intelligibility using binary mask
based on noise constraints. Int. J. Recent Technol. Eng. (IJRTE). 8(3) (2019, September). ISSN:
2277-3878
10. Li, N., Loizou, P.C.: Factors influencing intelligibility of ideal binary-masked speech:
implications for noise reduction. Acoust. Soc. Am. (2008). https://doi.org/10.1121/1.2832617
11. Nuthakki, R., Sreenivasa Murthy, A., Naik D.C.: Enhancement of speech intelligibility using
binary mask based on channel selection criteria. Int. J. Recent Technol. Eng. (IJRTE) 8(5)
(2020, January). ISSN: 2277-3878
12. Chen, F., Loizou, P.C.: Impact of SNR and Gain-Function Over- and Under-Estimation on
Speech Intelligibility. Department of Electrical Engineering, University of Texas at Dallas,
Richardson, TX 75083-0688, USA. Accepted 8 Sept 2011
13. Kim, G., Loizou, P.C.: A New Binary Mask Based on Noise Constraints for Improved Speech
Intelligibility. Department of Electrical Engineering, University of Texas at Dallas, USA, ISCA
1632, 26–30 Sept 2010, Makuhari, Chiba, Japan INTERSPEECH 2010
14. Ma, J., Hu, Y., Loizou, P.C.: Objective measures for predicting speech intelligibility in noisy
conditions based on new band-importance functions. In: 2009 Acoustical Society of America.
https://doi.org/10.1121/1.3097493
15. Hu, Y., Loizou, P.C.: Subjective Comparison and Evaluation of Speech Enhancement
Algorithms. Elsevier B.V (2006)
Evolutionary Algorithm for Solving
Combinatorial Optimization—A Review

Anisha Radhakrishnan and G. Jeyakumar

Abstract Evolutionary computing (EC) has demonstrated remarkable competency in both research and industry, and its efficiency in addressing combinatorial optimization problems (COPs) has gained wide popularity. The exploration of bio-inspired algorithms for solving COPs has experienced a notable shift from classic algorithms to hybridized and co-evolutionary algorithms. This paper presents a broad study of the different approaches that exist for solving COPs. In particular, a detailed explanation of evolutionary algorithms (EAs) and their usage in solving COPs is presented. The study also details the algorithmic adjustments to be made to EAs, and their possible integration with other meta-heuristics, in order to make them apt for solving COPs.

Keywords Evolutionary algorithm · Combinatorial optimization problem · Discrete optimization problem · Continuous optimization problem

1 Introduction

Evolutionary algorithms (EAs) are global, generic, population-based, parallel search optimization techniques inspired by natural evolution. Traditionally, evolutionary programming (EP), evolution strategies (ES), genetic algorithms (GA) and genetic programming (GP) form this family. Other bio-inspired algorithms in the domain are particle swarm optimization (PSO), ant colony optimization (ACO), biogeography-based optimization (BBO), Cuckoo search (CS), artificial bee colony (ABC), learning classifier systems (LCS), differential evolution (DE) and estimation of distribution algorithms (EDA).

A. Radhakrishnan
Department of Computer Science and Engineering, Amrita School of Engineering, Coimbatore,
India
e-mail: r_anisha@cb.amrita.edu
G. Jeyakumar (B)
Amrita Vishwa Vidyapeetham, Ettimadai, India
e-mail: g_jeyakumar@cb.amrita.edu


Combinatorial optimization is a branch of mathematics in which an optimal solution is found from a finite set of possible solutions. Over the last decades, researchers have explored EAs extensively for solving complex COPs. Though several approaches are available for solving COPs, EAs have performed well in solving COPs, finding solutions in polynomial run time [1].
Practical applications of combinatorial optimization can be seen in all directions of real-world applications. As stated in the "No Free Lunch" theorem [2], there is no single globally optimal algorithm that can solve all problems. This survey emphasizes real-world COP applications where EAs are applied. It reviews the approaches embedded with EAs to solve COPs, and presents an analysis of how EAs solve real-world and benchmark COPs, along with the performance measurements.
The rest of this article is structured as follows: Sect. 2 introduces combinatorial optimization problems (COPs), Sect. 3 discusses how EAs are used for solving COPs and summarizes them by application area, and Sect. 4 concludes the article.

2 Combinatorial Optimization Problems

The problems around us that have multiple feasible solutions, but one or more best solutions, are called optimization problems. The process of searching for the best solution among the feasible solutions is termed optimization. Optimization problems are categorized into different types. COPs are those that have optimal solutions within a finite set of possible solutions; this set is defined by a set of conditions and is too large to search exhaustively. The mathematical techniques for finding optimal solutions to COPs involve finding an ordering of a finite set of objects (solution components) that satisfies the given conditions. COPs are harder to solve than continuous optimization problems. However, advancements in algorithm design methodologies and computing technologies have made solving COPs easier. There are two categories of approaches for formulating algorithms to solve COPs: (1) exact approaches and (2) heuristic approaches. The exact approach follows the brute-force strategy, and the complexity involved in generating all possible solutions is high. Hence, the idea of finding approximate solutions that are good enough came into the picture; heuristic approaches follow this idea. They do not guarantee that the exact solution will be found, but they find approximate solutions that are good enough for the problem at hand [3]. This has led to the availability of numerous general-purpose heuristics for solving complex COPs in reasonable time. They are classified as constructive heuristics, meta-heuristics, approximate algorithms, and hyper-heuristics.
The constructive heuristics start the process by generating an "empty solution," which is then extended to obtain a complete solution. Meta-heuristics [4] are problem-independent algorithmic frameworks that provide guidelines for constructing optimization algorithms for solving problems [5]. The most popular meta-heuristics are evolutionary algorithms (EAs) [6], Tabu search [7], simulated annealing [8], and ant colony optimization [9]. The approximate algorithms

are the special class of heuristics, which guarantee the near optimal solutions with a
limited error from global optimal solution with a specified threshold for the error. Inte-
grating the operation research and artificial intelligence techniques, the iper-heuristic
approaches aim at developing general algorithms able to generate problem specific
algorithm. Objective of this paper is to present an overview of how evolutionary
algorithms (EAs) are used to solve the COPs.

3 Evolutionary Algorithms for COPs

EAs are stochastic, approximate optimization methods that belong to the subclass of evolutionary computation (EC). The layout of an EA includes population initialization, fitness function evaluation, mutation, crossover, and selection; the fittest individuals survive and are added to the population for the next generation. The foremost step in an EA is to initialize the population. The set of initial possible solutions (also called individuals or chromosomes) is called the population. Chromosomes can be represented as integer, float, binary or tree structures depending on the application, and the number of chromosomes in a population is termed the population size (Np). There are different initialization techniques, categorized based on randomness, compositionality, and generality [10]. Once the population is initialized, the next task is to evaluate the potential of the solutions; two approaches are suggested for this: problem-based and evolutionary-based [11]. The next step is to perform crossover and mutation. These evolutionary operators are significant in an EA for producing offspring, and the logic of the mutation and crossover strategies is specific to each classical EA. Several variants of these classical EAs, with improved mutation and crossover strategies, have been proposed in the research community. The selection operators select the candidates for the next generation. A minimal sketch of this loop is given below.
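The sketch below illustrates the initialize, evaluate, vary and select loop just described on a toy 0/1 knapsack COP; the instance data, tournament selection, one-point crossover and bit-flip mutation are illustrative choices and are not taken from any of the surveyed works.

```python
# Minimal generic GA loop applied to a toy 0/1 knapsack combinatorial problem.
import random

values, weights, capacity = [10, 6, 8, 7, 3], [4, 2, 3, 3, 1], 8

def fitness(ind):
    v = sum(vi for vi, gi in zip(values, ind) if gi)
    w = sum(wi for wi, gi in zip(weights, ind) if gi)
    return v if w <= capacity else 0                  # infeasible solutions get zero fitness

def evolve(pop_size=20, generations=50, pm=0.1):
    pop = [[random.randint(0, 1) for _ in values] for _ in range(pop_size)]
    for _ in range(generations):
        # Tournament selection of parents
        parents = [max(random.sample(pop, 3), key=fitness) for _ in range(pop_size)]
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            cut = random.randrange(1, len(values))    # one-point crossover
            for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
                child = [1 - g if random.random() < pm else g for g in child]  # bit-flip mutation
                children.append(child)
        pop = children                                # generational replacement
    return max(pop, key=fitness)

best = evolve()
print(best, fitness(best))
```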
For the past two decades, research and studies in the evolutionary computing (EC) field have extensively explored the use of EAs for solving COPs. Since EAs are stochastic search algorithms, their probability of finding approximately best solutions from the finite solution space of a COP at an early stage of the optimization process is very high. An EA solving a COP needs to consider several aspects, and there are different approaches to finding solutions for COPs using EAs. Some EAs are designed specifically for the COP at hand. EAs which were mainly proposed for the continuous domain cannot be applied directly to a COP; modifications to their representations are needed. These modifications can be broadly classified as ensembles for COPs, evolutionary operator-based hybridization, and coevolution [12].
There are several model-based approaches based on ensembles for discrete optimization. Discrete problems can be represented as categorical variables, strings, trees, graphs, permutations, and ordinal integers. The strategies are (i) naive approach, (ii) custom modeling, (iii) discrete modeling, (iv) mapping, (v) feature extraction, and (vi) similarity-centered modeling. A summary of various COPs solved by EAs, other meta-heuristics and their hybridizations is presented in Tables 1, 2 and 3.

Table 1 Summary for algorithms for COPs (for electrical power systems)

Reference | Algorithm used | Technique
Optimal power flow with ecological emission [13] | Enhanced ACO | Modified structure
Optimal chiller loading using minimum power consumption [14] | Fish algorithm | Hybridization
Optimal reactive power dispatch problem [15] | Ant lion optimizer | Global optimizer
Optimal integration of renewable energy sources [16] | PSO | Modified PSO with operators of DE
Optimal reactive power dispatch problems [17–19] | Enhanced firefly algorithm, teaching learning based algorithms, gravitational search algorithm | Hybridization of GA and LS

Table 2 Summary for algorithms for COPs (for routing, traveling salesman, scheduling, and planning)

Reference | Algorithm used | Technique
University time table scheduling [20] | Simulated annealing + GA | Hybridization
Parallel machines manufacturing scheduling [21] | Symbiotic organisms search + simulated annealing | Hybridization
Job scheduling [22] | PSO + simulated annealing | Hybridization
Vehicle routing problem [23] | Tabu search | Modified structure
Vehicle routing problem [24] | Modified PSO | Hybridization
Manufacture scheduling [25] | Hybrid EDA (Markov network-based EDA) | Hybridization
Job scheduling [26] | EDA | Hybridization
Hybrid dynamic berth allocation planning problem [27] | Chemical reaction optimization | Hybridization
Flexible job scheduling [28] | PSO | Hybridization
Constraint shortest path problem [29] | PSO + VNS (variable neighborhood search) | Hybridization (GA + LS)
Traveling salesman problem [30] | ACO + 3-Opt algorithm | Hybridization

4 Conclusion

This paper presented a survey on using EAs and other meta-heuristics for solving combinatorial optimization problems (COPs).

Table 3 Summary for algorithms for COPs (for pattern recognition: feature selection, classification, clustering)

Reference | Algorithm used | Technique
Feature selection and classification [31] | ACO + BCO | Hybridization
Feature selection [32] | Artificial bee colony and gradient boosting decision tree | Hybridization
High-dimensional classification [33] | PSO (competitive swarm optimizer, CSO) | Modified structure
Handwritten signature verification [34] | Artificial immune systems | Modified structure
Feature selection in big data [35] | Fish swarm optimization | Modified structure

As many real-world COPs are NP-hard problems, adapting EAs for approximate solutions is the most inviting possibility. EAs are designed for solving continuous-parameter problems and are not directly adaptable to the discrete domain; a proper mapping method is needed to represent the solutions. Applying genetic operators yields real values, so with proper mapping and search techniques an effective global search is still possible. Studies have shown that EAs perform better when they are modified for solving COPs. This paper summarized several COPs solved by EAs with suitable changes made to the algorithmic structure and through hybridization.

References

1. Puchinger, J., Raidl, G.R.: Combining metaheuristics and exact algorithms in combinatorial
optimization: a survey and classification. In: Mira, J., Álvarez, J.R. (eds.) Artificial Intelligence
and Knowledge Engineering Applications: A Bioinspired Approach. IWINAC 2005. Lecture
Notes in Computer Science, vol. 3562, pp 41–53. Springer, Berlin (2005)
2. Wolpert, D.H., Macreedy, W.G.: No free lunch theorems for optimization. IEEE Trans. Evol.
Comput. 1(1), 67–82 (1997)
3. Montiel, O., Díaz Delgadillo, F.J.: Reducing the size of combinatorial optimization problems
using the operator vaccine by fuzzy selector with adaptive heuristics. Math. Prob. Eng. (2015)
4. Osman, I.H., Kelly, J.P.: Meta-Heuristics: An Overview. Meta-Heuristics, pp. 1–21. Springer,
Boston (1996)
5. Glover, F., Sörensen, K.: Metaheuristics. Scholarpedia 10(4), 6532 (2015)
6. Back, T.: Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary
Programming, Genetic Algorithms. Oxford University Press, Oxford (1996)
7. Glover, F., Laguna, M.: Tabu search. Handbook of Combinatorial Optimization, pp. 2093–2229.
Springer, Boston (1998)
8. Van Laarhoven, P.J.M., Aarts, E.H.L.: Simulated annealing. Simulated ANNEALING:
THEORY AND applications, pp. 7–15. Springer, Dordrecht (1987)
9. Dorigo, M., Di Caro, G.: Ant colony optimization: a new meta-heuristic. In: Proceedings of
the 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406), vol. 2. IEEE
(1999)

10. Kazimipour, B., Li, X,. Qin, A.K.: A review of population initialization techniques for evolu-
tionary algorithms. In: 2014 IEEE Congress on Evolutionary Computation (CEC). IEEE
(2014)
11. Jin, Y.: A comprehensive survey of fitness approximation in evolutionary computation. Soft.
Comput. 9(1), 3–12 (2005)
12. Bartz-Beielstein, T., Zaefferer, M.: Model-based methods for continuous and discrete global
optimization. Appl. Soft Comput. 55, 154–167 (2017)
13. Raviprabakaran, V., Subramanian, R.C.: Enhanced ant colony optimization to solve the optimal
power flow with ecological emission. Int. J. Syst. Assur. Eng. Manag. 9(1), 58–65 (2018)
14. Zheng, Z., Li, J.: Optimal chiller loading by improved invasive weed optimization algorithm
for reducing energy consumption. Energy Build. 161, 80–88 (2018)
15. Mouassa, S., Bouktir, T., Salhi, A.: Ant lion optimizer for solving optimal reactive power
dispatch problem in power systems. Int. J. Eng. Sci. Technol. 20(3), 885–895 (2017)
16. Lorestani, A., Ardehali, M.M.: Optimal integration of renewable energy sources for
autonomous tri-generation combined cooling, heating and power system based on evolutionary
particle swarm optimization algorithm. Energy 145, 839–855 (2018)
17. Liang, R.-H., et al.: An enhanced firefly algorithm to multi-objective optimal active/reactive
power dispatch with uncertainties consideration. Int. J. Electr. Power Energy Syst. 64, 1088–
1097 (2015)
18. Ghasemi, M., et al.: Solving optimal reactive power dispatch problem using a novel teaching–
learning-based optimization algorithm. Eng. Appl. Artif. Intell. 39, 100–108 (2015)
19. Chen, G., et al.: Optimal reactive power dispatch by improved GSA-based algorithm with the
novel strategies to handle constraints. Appl. Soft Comput. 50, 58–70 (2017)
20. Fredrikson, R., Dahl, J.: A comparative study between a simulated annealing and a genetic
algorithm for solving a university timetabling problem (2016)
21. Ezugwu, A.E., Prayogo, D.: Symbiotic organisms search algorithm: theory, recent advances
and applications. Expert Syst. Appl. 119, 184–209 (2019)
22. Tang, H., et al.: Flexible job-shop scheduling with tolerated time interval and limited starting
time interval based on hybrid discrete PSO-SA: An application from a casting workshop. Appl.
Soft Comput. 78, 176–194 (2019)
23. Archetti, C., et al.: An iterated local search for the traveling salesman problem with release
dates and completion time minimization. Comput. Oper. Res. 98, 24–37 (2018)
24. Norouzi, N., Sadegh-Amalnick, M., Tavakkoli-Moghaddam, R.: Modified particle swarm opti-
mization in a time-dependent vehicle routing problem: minimizing fuel consumption. Optim.
Lett. 11(1), 121–134 (2017)
25. Gen, M., et al.: Advances in hybrid EDA for manufacturing scheduling with uncertainty: part I.
In: International Conference on Management Science and Engineering Management. Springer,
Cham (2018)
26. Hao, X., et al.: Effective multiobjective EDA for bi-criteria stochastic job-shop scheduling
problem. J. Intell. Manuf. 28(3), 833–845 (2017)
27. De, Arijit, et al.: A hybrid dynamic berth allocation planning problem with fuel costs consider-
ations for container terminal port using chemical reaction optimization approach. Ann. Oper.
Res. 1–29 (2018)
28. Nouiri, M., et al.: An effective and distributed particle swarm optimization algorithm for flexible
job-shop scheduling problem. J. Intell. Manuf. 29(3), 603–615 (2018)
29. Marinakis, Y., Migdalas, A., Sifaleras, A.: A hybrid particle swarm optimization–variable
neighborhood search algorithm for constrained shortest path problems. Eur. J. Oper. Res.
261(3), 819–834 (2017)
30. Mahi, M., Baykan, O.K., Kodaz, H.: A new hybrid method based on particle swarm optimiza-
tion, ant colony optimization and 3-opt algorithms for traveling salesman problem. Appl. Soft
Comput. 30, 484–490 (2015)
31. Shunmugapriya, P., Kanmani, S.: A hybrid algorithm using ant and bee colony optimization
for feature selection and classification (AC-ABC Hybrid). Swarm Evol. Comput. 36, 27–36
(2017)

32. Rao, H., et al.: Feature selection based on artificial bee colony and gradient boosting decision
tree. Appl. Soft Comput. 74, 634–642 (2019)
33. Gu, S., Cheng, R., Jin, Y.: Feature selection for high-dimensional classification using a
competitive swarm optimizer. Soft. Comput. 22(3), 811–822 (2018)
34. Parmar, M., et al.: State of art survey signature verification techniques 2019. Asian J.
Convergence Technol. (AJCT) 5(3), 91–96 (2020)
35. Manikandan, R.P.S., Kalpana, A.M.: Feature selection using fish swarm optimization in big
data. Cluster Comput. 22(5), 10825–10837 (2019)
Effect of J48 and LMT Algorithms
to Classify Movies
in the Web—A Comparative Approach

Prashant Bhat and Pradnya Malaganve

Abstract Social Media websites such as Facebook, YouTube, Twitter, etc., are convenient platforms for sharing one's views about multimedia. Millions of videos are uploaded to YouTube every day, of different categories such as comedy videos, sports videos, news, advertisements, movie trailers, etc. Nowadays, data mining researchers are attracted to different classification techniques of data mining to discover hidden information and knowledge from huge video data. The goal of this research is to classify and predict movie trailer videos as poor, good, very good or excellent movies based on meta data such as likes, dislikes, comments, ratings, budget, etc. An attempt is made in the present work to provide an effective mining result for classifying Social Media movies. These movies are labelled based on a particular class and other related attributes of the same dataset. A 10-fold cross-validation test is applied to the J48 and LMT decision tree algorithms, and a comparative analysis is made based on the confusion matrix and accuracy rate.

Keywords Classification · J48 Decision tree · LMT Decision tree · Social Media

1 Introduction

Social Media data is very huge in size, and the data is often noisy, fuzzy, incomplete and unstructured in nature. At the same time, it is an essential task to handle such data and discover useful information or knowledge from it. Every day, about a million GB of movie videos are uploaded to Social Media websites such as Facebook, YouTube, Instagram, etc. [1].

P. Bhat · P. Malaganve (B)


Department Computational Science and IT, Garden City University, Bengaluru, India
e-mail: pradnya181278@gcu.ac.in
P. Bhat
e-mail: prashant.bhat@gardencity.university


In the proposed work, using the meta data rating, all movies are classified according to whether the movie is poor, good, very good or excellent to watch. Ratings are marks given to a movie trailer on YouTube in the range of 0–10 to convey one's opinion about the trailer; the average of different viewers' ratings is calculated and assigned to the movie, giving users a hint for deciding whether to watch it. The present dataset contains several attributes related to YouTube and Twitter. Another item of meta data [2], budget, is also considered for comparison; the budget attribute is divided into three labels: high budget movie, average budget movie and low budget movie [3, 4].
In this work, the WEKA tool [5] is used to pre-process and classify the dataset considering the class and the other related meta data. The decision trees J48 [6] and LMT are used for training and testing the dataset with a 10-fold cross-validation test [7], and an attempt is made to carry out a comparative analysis between the J48 and LMT decision trees [8].
The rest of the paper is organised as literature review, proposed methodology including the sample dataset and table contents, the confusion matrices of both the J48 and LMT decision tree algorithms, findings, and conclusion.

2 Proposed Model

Figure 1 represents the proposed model for the comparative analysis of the J48 and LMT decision trees based on the confusion matrix and accuracy rate. The data is extracted from the Social Media platforms YouTube and Twitter and stored in a .CSV file for pre-processing. The pre-processed data is classified into different classes, namely poor movie, good movie, very good movie and excellent movie, based on the meta data, using the LMT and J48 data mining algorithms. A 10-fold cross-validation test is applied to the pre-processed dataset to divide it into training and testing data and obtain an efficient classification result. Finally, the accuracy rate and confusion matrix of both the LMT and J48 algorithms are generated using WEKA, and the results of the two algorithms are compared and analysed to determine which algorithm provides the best accuracy after classifying the whole dataset of movie trailer videos.

2.1 J48 Decision Tree

J48 is the WEKA implementation of the C4.5 decision tree algorithm. The algorithm uses a divide-and-conquer method and applies pruning while constructing the tree [9]. It commonly uses the information gain (entropy) measure to choose split attributes [10]. The result is a tree structure with a root node, intermediate nodes and leaf nodes; each node holds a decision and helps to arrive at the result [11].

[Fig. 1 flowchart: data extracted from YouTube and Twitter → data selection (.CSV file) → data pre-processing → classification algorithms (LMT, J48) → 10-fold cross-validation (training/testing) → confusion matrix and accuracy rate → knowledge discovery → comparison analysis.]

Fig. 1 Proposed model for comparison analysis of the J48 and LMT Decision Trees

2.2 LMT Decision Tree

Logistic model tree (LMT) is a classification model with an associated supervised training algorithm. It combines Decision Tree learning and logistic regression: a logistic model tree uses a Decision Tree with logistic regression models at its leaves to provide a piecewise model [10, 12].

2.3 Attribute Details of Table 1

Movie: The dataset contains 232 different movie names, stored under the Movie attribute column.
Year: Indicates the year in which each movie was released on screen.
Ratings: The rating the movie has received, based on which the movie can be classified.
Genre: Indicates the genre or category of the movie.
Gross: Gross collection of the movie after its release.
Budget: Total budget required to make the movie.
Screens: Number of screens in the USA on which the movie was released.
Sequel: Number of sequels made after the movie.
Sentiment: Sentiment score of the movie.
Views: Number of views of the movie trailer on YouTube.
Likes: Number of likes of the movie trailer on YouTube.
Dislikes: Number of dislikes of the movie trailer on YouTube.
Comments: Number of comments on the movie trailer on YouTube.
Aggregate followers: Aggregate actor followers on Twitter.
Table 1 lists the attributes of the dataset with their data types and descriptions. To label the class as poor movie, good movie, very good movie or excellent movie, the "Ratings" attribute is considered and its numeric values are converted to nominal values using the ranges shown in Table 2.

Table 1 Dataset descriptions


Name of attribute Data type Description
Movie Nominal Name of the movie
Year Numeric Year at which movies were projected on the screens
Ratings Numeric Ratings on the movie
Genre Numeric Genre
Gross Numeric Gross income in USD
Budget Numeric Budget in USD
Screens Numeric Number of screens in USA
Sequel Numeric Sequel
Sentiment Numeric Sentiment score of the movie
Views Numeric Number of views of movie trailer on YouTube
Likes Numeric Number of likes of movie trailer on YouTube
Dislikes Numeric Number of dislikes of movie trailer on YouTube
Comments Numeric Number of comments of movie trailer on YouTube
Aggregate followers Numeric Aggregate actor followers on Twitter

Table 2 Classification of rating attribute

Rating range    Label
0–3             Poor
3.1–5           Good
5.1–8           Very good
8.1–10          Excellent

Table 3 Classification of budget attribute

Budget range             Label
Budget < 800,000         Low budget movie
Budget < 16,000,000      Average budget movie
Budget > 16,000,000      High budget movie

The attribute “Budget”, which was carrying numeric values, has been converted to nominal values to classify the dataset in a better way. The nominal values of the attribute “Budget” are labelled as shown in Table 3.
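For concreteness, the following sketch shows how the numeric-to-nominal conversion of Tables 2 and 3 could be reproduced with pandas; the column names follow Table 1, while the file names are assumptions, since in this work the conversion is carried out as part of the WEKA pre-processing.

```python
# Illustrative sketch: discretise "Ratings" and "Budget" into nominal labels as in Tables 2 and 3.
import pandas as pd

df = pd.read_csv("movies.csv")  # hypothetical raw dataset with the attributes of Table 1

# Ratings -> Poor / Good / Very good / Excellent (Table 2)
df["rating_label"] = pd.cut(
    df["Ratings"],
    bins=[0, 3, 5, 8, 10],
    labels=["Poor", "Good", "Very good", "Excellent"],
    include_lowest=True,
)

# Budget -> Low / Average / High budget movie (Table 3)
df["budget_label"] = pd.cut(
    df["Budget"],
    bins=[0, 800_000, 16_000_000, float("inf")],
    labels=["Low budget movie", "Average budget movie", "High budget movie"],
)

df.to_csv("movies_preprocessed.csv", index=False)
```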

3 Findings and Results

3.1 Folds Cross-Validation Test

Here, we can divide the dataset in different number of folds. If we consider 10 folds,
the dataset is divided into 10 different sets. In the first iteration, first set is considered
as the testing dataset and the remaining 9 sets are considered as training datasets. Similarly, in
the second iteration, the second set is considered as the testing dataset and the remaining 9 sets
are considered as training datasets, and so on. Hence, the entire dataset is used as both
training and testing data.
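A minimal sketch of this 10-fold protocol, assuming scikit-learn in place of WEKA and the hypothetical pre-processed file from the earlier sketch, is given below; cross_val_predict ensures that every instance is used exactly once for testing.

```python
# Illustrative 10-fold cross-validation: every fold is used once for testing and nine times for training.
import pandas as pd
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

data = pd.read_csv("movies_preprocessed.csv")        # hypothetical pre-processed file
X = data.drop(columns=["Ratings"]).select_dtypes("number")   # drop the column the label was derived from
y = data["rating_label"]

clf = DecisionTreeClassifier(criterion="entropy", random_state=42)
pred = cross_val_predict(clf, X, y, cv=10)            # 10-fold cross-validated predictions

print("Accuracy:", accuracy_score(y, pred))
print(confusion_matrix(y, pred))
```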

3.2 Comparison Analyses of J48 and LMT Decision Tree

Figure 2 shows the accuracy of Decision Tree J48 as 85.71% and the accuracy of
Decision Tree LMT as 86.58%; therefore, Decision Tree LMT gives a better result as
compared to J48. When we observe the confusion matrices in Fig. 3, J48 has correctly
classified 193 instances as very good movies, 5 instances as good movies and 0 instances
as excellent movies; the dataset does not contain poor movies (those rated below 3),
so the confusion matrix has only three classes. At the same time, LMT has correctly
classified 199 instances as very good movies, 1 instance as a good movie and 0 instances
as excellent movies; again, since the dataset contains no poor movies rated below 3,
the confusion matrix has only three classes [13, 14].

Fig. 2 J48 and LMT

=== Confusion Matrix (J48) ===
  a    b    c   <-- classified as
193    6    2 |  a = Very good movie
 13    5    0 |  b = Good movie
 12    0    0 |  c = Excellent movie

=== Confusion Matrix (LMT) ===
  a    b    c   <-- classified as
199    2    0 |  a = Very good movie
 17    1    0 |  b = Good movie
 12    0    0 |  c = Excellent movie

Fig. 3 Confusion matrices of J48 and LMT
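The reported accuracies can be verified directly from these matrices: accuracy is the sum of the diagonal (correctly classified) entries divided by the total number of instances, as the short computation below shows.

```python
# Verify the reported accuracies from the confusion matrices in Fig. 3.
import numpy as np

j48 = np.array([[193, 6, 2],
                [ 13, 5, 0],
                [ 12, 0, 0]])
lmt = np.array([[199, 2, 0],
                [ 17, 1, 0],
                [ 12, 0, 0]])

for name, cm in (("J48", j48), ("LMT", lmt)):
    accuracy = np.trace(cm) / cm.sum()          # correctly classified / total instances
    print(f"{name}: {accuracy:.2%}")            # J48 -> 85.71%, LMT -> 86.58%
```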

4 Conclusion

As we saw in Sect. 3.2, i.e., the comparison analysis of Decision Trees J48 and LMT,
the accuracy of LMT is better than the accuracy of J48, as the LMT Decision Tree has
correctly classified a greater number of instances. Based on the analysis of the confusion
matrices, the classification accuracy and the other calculations shown in Fig. 2, such
as the kappa statistic, mean absolute error, root mean squared error, relative absolute
error and root relative squared error, which play a vital role in judging how correctly the
instances are classified, we conclude that the LMT Decision Tree is the most suitable
classification method for classifying the movie trailers data set with good efficiency and accuracy.

5 Future Work

In the future, we will attempt to check whether all high budget movies are rated excellent
or very good, and whether low budget and average budget movies can also carry good ratings.

References

1. Sharma, A.K., Sahni, S.: A comparative study of classification algorithms for spam email data
analysis. Int. J. Comput. Sci. Eng. (IJCSE). 3(5) (2011). ISSN 0975-3397
2. Rangaswamy, S., Ghosh, S., Jha, S., Ramalingam, S.: Metadata extraction and classification of
YouTube videos using sentiment analysis. In: 2016 IEEE International Carnahan Conference
on Security Technology (ICCST)
3. Algur, S.P., Bhat, P., Kulkarni, N.: Educational data mining: classification techniques for recruit-
ment analysis. Int. J. Modern Educ. Comput. Sci. 2, 59-65 (2016). (Published Online February
2016 in MECS). http://www.mecs-press.org/.10.5815/ijmecs.2016.02.08
4. Bansal, A., Gupta, C.L., Muralidhar, A.: A sentimental analysis for youtube data using
supervised learning approach. Int. J. Eng. Adv. Technol. (IJEAT) 8(5) (2019, June). ISSN
2249-8958
5. Weka—Data Mining Machine Learning Software. Available at http://www.cs.waikato.ac.nz/
ml/weka/
6. Kalmegh, S.R.: Comparative analysis of WEKA data mining algorithm random forest,
Randomtree and LADTree for classification of indigenous news data. Int. J. Emerg. Technol.
Adv. Eng. www.ijetae.com. 5(1) (2015, January). ISSN 2250-2459, ISO 9001:2008 Certified
7. Bhat, P., Malaganve, P., Hegde, P.: A new framework for social media content mining and
knowledge discovery. Int. J. Comput. Appl. (0975 – 8887) 182(36) (2019, January)
8. Kalmegh, S.: Analysis of WEKA data mining algorithm REPTree, simple cart and randomtree
for classification of Indian News. Int. J. Innov. Sci. Eng. Technol. (IJISET) 2(2) (2015, February)
9. Nahar, N., Ara, F.: Liver disease prediction by using different decision tree techniques. Int. J.
Data Mining Knowl. Manag. Process (IJDKP) 8(2) (2018, March)
10. Algur, S.P., Bhat, P.: Web video mining: metadata predictive analysis using classification tech-
niques. Int. J. Inf. Technol. Comput. Sci. 2, 68–76 (2016). (Published Online February 2016 in
MECS)
11. Algur, S.P., Bhat, P.: Abnormal web video prediction using RT and J48 classification techniques.
Int. J. Comput. Sci. Eng. 4(6), 101–107 (2016, June). E-ISSN 2347-2693
12. Malika, H., Tiana, Z.: A framework for collecting youtube meta-data. In: Peer-Review Under
Responsibility of the Conference Program Chairs. Published by Elsevier B.V. https://doi.org/
10.1016/j.procs.2017.08.347
13. Algur, S.P., Bhat, P., Ayachit, N.H.: Educational data mining: RT and RF classification models
for higher education professional courses. Int. J. Inf. Eng. Electron. Bus. 2, 59-65 (2016).
(Published Online March 2016 in MECS, http://www.mecs-press.org/) https://doi.org/10.5815/
ijieeb.2016.02.07
14. Vadhanam, B.R.J., Mohan, S., Ramalingam, V.V., Sugumaran, V.: Performance comparison
of various decision tree algorithms for classification of advertisement and non advertisement
videos. Indian J. Sci. Technol. 9(48) (2016, December). https://doi.org/10.17485/ijst/2016/
v9i48/102098
A System to Create Automated
Development Environments
Using Docker

N. S. Akhilesh, M. N. Aniruddha, Anirban Ghosh, and K. Sindhu

Abstract In software development, there are often a great number of dependencies that
need to be set up and managed before the actual process of development can begin.
For instance, before one can start doing Java development, one would need to install
and set up the JRE, JVM, Gradle (Optionally) and Maven (Optionally). Additionally,
when working with a team, maintaining consistency in the dependency versions used
by everyone in the team becomes necessary as well (since version clashes can often
lead to unpredictable behavior and incompatibility issues). To resolve all this, there
exists tools such as package managers and the more commonly used Docker. Docker
is a tool often used in development to achieve cross platform automated dependency
management and uniformity between development and production environments.
While Docker is an incredibly useful tool, it does have a learning curve associated
with it, and novice programmers would need to understand concepts such as virtual-
ization and the like before they can start using Docker and benefiting from its various
features. All of this facilitates the need for an application or system that would allow
developers to leverage the power of Docker without requiring any knowledge of it. In
this paper, we illustrate and implement such a system, one which allows even novice
programmers to easily and effortlessly create automated development environments
that leverage Docker under the hood.

Keywords Docker · Automation · Dependency management

N. S. Akhilesh (B) · M. N. Aniruddha · A. Ghosh · K. Sindhu


BMS College of Engineering, Bangalore, India
e-mail: 1bm16is009@bmsce.ac.in
URL: https://bmsce.ac.in/home/Information-Science-and-Engineering-About
M. N. Aniruddha
e-mail: 1bm16is015@bmsce.ac.in
A. Ghosh
e-mail: 1bm16is016@bmsce.ac.in
K. Sindhu
e-mail: ksindhu.ise@bmsce.ac.in


1 Introduction

Modern applications are often quite complex. Generally, they are composed of a
number of software components, each playing a vital role in the application (e.g., a MEAN stack
application uses MongoDB as its database, Express for routing, AngularJS for its
frontend and NodeJS for its back-end). Setting up and managing each of these soft-
ware dependencies (as well as each of their own internal dependencies) can be quite
cumbersome, especially in a team of people working on the application. This is
where Docker comes in. Docker is a tool that (among other things) allows one to
define all the software dependencies of any application in a configuration file called a
Dockerfile (or docker-compose.yml file) and feed that configuration file into Docker
which will then use the file to create a development environment which has all the
dependencies mentioned in the file automatically installed and set up. And in a team,
the configuration file can easily be shared via git to ensure uniformity across the team
over the application’s dependencies.
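As a minimal sketch of this idea (not part of the system proposed later in this paper), the snippet below writes a Dockerfile for a hypothetical NodeJS service and hands it to the Docker CLI; the base image tag, file names and project layout are assumptions chosen only for illustration.

```python
# Minimal sketch: describe a NodeJS environment in a Dockerfile and let Docker build/run it.
import subprocess
from pathlib import Path

dockerfile = """\
# Pinned NodeJS version shared by the whole team (assumed tag)
FROM node:14
WORKDIR /app
COPY package*.json ./
# Dependencies are installed inside the container, not on the host
RUN npm install
COPY . .
CMD ["node", "index.js"]
"""

Path("Dockerfile").write_text(dockerfile)

subprocess.run(["docker", "build", "-t", "demo-app", "."], check=True)
subprocess.run(["docker", "run", "--rm", "demo-app"], check=True)
```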
By automating much of the dependency setup process, Docker has not only solved
a great deal of problems related to dependency management (such as version clashes,
complications involved in version updates and OS-level interference) but has also
made the actual process of development easier [1]. Needless to say, Docker is an
incredibly useful software, and its wide scale adoption and use in the industry is
reflective of this. In this paper, we focus primarily on Docker’s ability to automate
dependency management as we believe that a great deal of programmers (particularly
novice programmers and students) can benefit from this feature. Novice programmers
in particular often face difficulty setting up dependencies for a language, tool or
framework before they can start using it (Ex: Setting up Ruby on Rails on Windows
or setting up a MEAN stack application). Docker can be useful in this situation, but
it does have an associated learning curve and people who are new to programming or
unfamiliar with the topic may need to understand things such as virtualization before
they are able to understand Docker.
In this paper, we propose a system or application that takes the form of an inte-
grated development environment (IDE) that uses Docker under the hood to set up
environments for any language, tool or framework which people can immediately
start working with. The end product should be a code editor similar to VS code but
one where a developer can additionally type out a few pieces of information (such as
a language and its version) and the editor will then automatically set up an environ-
ment based on that information in which the developer can start coding. In essence,
this is a system which abstracts on top of Docker to allow for its use without having
to know how to write a Dockerfile or docker-compose.yml file. Such a system would
be useful to novice programmers, students (who want a learn various languages,
tools and frameworks without having to worry about setting them up), computer labs
(since this one system can replace a number of languages, tools and frameworks that
would otherwise need to be set up individually) and developers who are interested in
leveraging the power of multiple languages side by side in a Jupyter notebook style
operating environment.

2 Literature Survey

2.1 Docker

Docker is a tool used to create quick execution environments known as containers to
perform various tasks. This concept of using containers to manage various functionalities
is known as containerization. Containerization in many ways is the evolution
alities is known as containerization. Containerization in many ways is the evolution
of virtualization [2]. It was standard practice that if one wanted to run several differ-
ent tasks without those tasks interfering with them, you would run them on individual
virtual machines, and a large part of cloud services such as AWS, GCP, Azure still
uses virtual machines to provide isolated serviceable environments for clients. But
one of the major disadvantages with virtual machines is the amount of overhead they
bring. An Ubuntu virtual machine often contains a large amount of software, such
as a Web browser, a GUI and a text editor, that is largely useless if all one wants to
do with it is run a REST server. Containers help solve this problem because unlike
virtual machines which are entire OSs that take up a certain amount of hardware
resources and space, containers are nothing more than lightweight isolated process
groups which can do many of the same things that a virtual machine can but without
bringing the large amount of overhead associated with virtual machines [3, 4].
Containers can be started and stopped near instantaneously since they are nothing
more than specialized processes unlike virtual machines which need time to boot
up an entire OS. Containers also take up significantly less resources and can be
automated extremely easily [5]. Docker containers have been shown to be faster than
virtual machines in boot up times, calculation times, random read, write and mixed
speeds and sequential read and write speeds [6]. Containers are able to achieve all
these benefits by using technologies which are already provided by the Linux kernel:
Control groups (CGroups) and kernel namespaces [3].
Control groups is a special feature of the Linux kernel which allows for the easy
allocation, management and monitoring of resources for a given process. It can also
be used to set limits to the amount of resources (RAM, CPU, etc.) that a process can
use [7]. Kernel namespace can be used to isolate process groups from each other.
A container can be assigned a specific namespace, and all processes and resources
inside the container are scoped to that namespace and cannot access anything outside
that namespace [8]. All in all, these various technologies come together in Docker
to create automated isolated lightweight execution environments.
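For example, these cgroup-backed limits are exposed directly as flags of the Docker CLI; the sketch below starts a container restricted to 256 MB of RAM and one CPU (the image name and the limits are arbitrary illustrative choices).

```python
# Illustrative: cgroup-based resource limits exposed through `docker run` flags.
import subprocess

subprocess.run([
    "docker", "run", "--rm",
    "--memory", "256m",      # cap RAM via the memory cgroup
    "--cpus", "1.0",         # cap CPU time via the cpu cgroup
    "ubuntu:20.04", "echo", "hello from a resource-limited container",
], check=True)
```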
Docker has become mainstream in the industry. It is commonly used as a tool
for the development, testing and deployment of microservice-based architectures
since containers are ideal environments in which microservices can be run [9].
This is largely because Docker automates the process of setting up networks and
connections between the containers. Additionally, Docker gained a great deal of
traction in the field of DevOps. Another major benefit provided by Docker is that it
creates uniformity between the development, staging and production environments
since the same configuration can be used across all of these environments [10].

2.2 Docker in Research

Sample work and implementation are essential in many aspects of scientific research,
and being able to reproduce the work of a specific piece of research has become vital to its
verification and validation by researchers and domain experts. Though reproducing
computer software seems significantly simpler than replicating the physical envi-
ronments of some experiments, the ever-changing nature of software today and the
challenges of interoperable dependencies in software can make this task a serious
challenge. This is where Docker can prove to be extremely useful as it stands as
a far superior solution to existing solutions such as workflow systems and virtual
machines. Carl Boettiger illustrates this in his paper where he uses an R statistical
environment setup in various conditions (including Docker) and compares them [11].
Additionally, Docker is also useful in automating the various tasks of a workflow
system like makeflow and workqueue. Containers can be connected to various points
of a workflow’s infrastructure, and there have been several methods produced to
manage containers’ images that need to be shared for the execution of tasks [12]. All
of this points to Docker's extensive use in the field of research, which is an area of interest
for this paper since replicating environments to test and review peer research is a
vital aspect of the field.

2.3 Electron JS

Electron is an open-source framework that can be used for desktop app development.
It is created and maintained by GitHub who used it to build the Atom editor. Electron
uses a combination of the Chromium browser and the NodeJS runtime to create fully
functioning desktop applications. Because of this, it allows for the UI development
of the application to be done using standard HTML, CSS and JS, while the core logic
of the application is done via NodeJS. A majority of Electron's APIs are written using
C++ and Objective-C which are then exposed to the core logic via NodeJS bindings
[13].

3 Existing Solutions

Automation in development is not a new concept. There have been tools to do this even
before Docker. So if you are a developer, what are some of the ways you could tackle
common issues that arise when working on a coding project, such as dependency hell,
poor documentation (which can often make it difficult to set up, initiate and work on
existing projects) and code rot (referring to code changing behavior due to external
circumstances such as OS updates or bug fixes in the languages used by the software)
[11]?

One approach is to use OS package managers such as APT (Ubuntu), Homebrew (MacOS) or Chocolatey (Windows) to automate installation and management
of dependencies and then use workflow systems such as MakeFlow to provide a
simple and automated means to build and execute code (instead of having to rely
on documentation). Additionally, to avoid code rot, one could set up a controlled
environment using virtual machines in which to develop the code in. To make life
easier, one could also use a tool like vagrant which automates the process of setting
up and managing virtual machines.
While the above-mentioned approach is fairly popular (especially in research),
it requires a developer to understand and be able to use a large set of tools such
as package managers and workflow systems. Additionally, virtual machines can be
slow and resource intensive, thus slowing down the speed of active development
at the cost of providing consistency and predictability. A better approach would
leverage Docker which as discussed above can solve all the issues we discussed
earlier: dependency hell, poor documentation and code rot (via configurable isolated
lightweight shareable environments called containers for the code to run in).
Docker has seen a large deal of adoption in industries and corporations for its
ability to automate the development workflow and make the lives of developers easier.
So if you are an experienced developer, you could just learn Docker and achieve all
the benefits offered by the system we intend to propose in the next section, but
Docker is not a simple tool. If you are a novice programmer, a student or someone
who is completely new to development, you will be less likely to learn something
like Docker (since you have not even learnt a language yet) and thus cannot benefit
from all the features it provides.

4 Proposed System

As we have discussed in the previous section, Docker alone does already achieve a
great deal of automation. Its only flaw being its associated learning curve which can
deter people who are new to programming. Therefore, in this paper, we propose a
solution which abstracts the features of Docker and provides them to a user through
an easy-to-use and simplified user interface, allowing the user to leverage Docker
without knowing how to use it.
Our proposed system is an IDE in which a user would enter a few details in a
form and the IDE will then use that information to set up a development environment
by creating the required Dockerfiles and docker-compose.yml files as well as the
required language files for that specific language, tool or framework that the user
wishes to work with.

The proposed system will also allow for these development environments to be
shareable. This is done by allowing for each development environment to be created
using a minimal configuration file. Adding this configuration file to any normal
project will make it compatible with the system, and we refer to such a project as
a recipe. These projects (recipes) can then be shared and managed via Git allowing
for them to be community driven and customizable.
The team behind the proposed system will maintain official recipes for various
languages that will act as both stable defaults and base recipes. Any individual can
then build on these official recipes to create customized and personalized recipes, and
this would be especially useful in private organizations where development teams
might have their own custom setups for products.

5 Mechanism

Assuming that all the OS and software requirements are fulfilled, the application
works in the following way: (Let us assume that a user of the application wishes to
execute some code in NodeJS)
First the application pulls a recipe template (remember that a recipe is merely
a project with a special configuration file and template here refers to handlebars
template) for NodeJS (the official by default, but a custom one can be specified by
the user) from GitHub/GitLab/BitBucket and stores the template in a special directory
reserved for the application by the OS which is usually:
• Windows XP—C:/Documents and Settings/USERNAME/Application Data
• Windows 7 and above—C:/Users/USERNAME/AppData/Roaming
• MacOS—/Users/USERNAME/Library/Preferences
• Linux—/home/USERNAME/.local/share.
Then, the application gets any inputs from the recipe that were specified by the
creator of the recipe and renders them as a form to the user. The application then
takes the output of the form and uses it to fill out the recipe template and then places
the recipe in a directory local to the project in which the user is working (usually a
directory called “judip_recipes”).
After the recipe is added, it appears on the frontend as a codeblock to the user
where the user can then enter any type of (in this case NodeJS) code, and the entered
code gets saved to the locally stored recipe.
Finally, the application checks the newly installed recipe’s configuration file which
contains “execute” and “execute_background” keys that the application can use to
execute the recipe.
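To make this flow concrete, a highly simplified sketch of the same steps is given below in Python; the actual application is written in NodeJS/Electron and uses Handlebars templates, so the repository URL, the JSON configuration layout and the helper names are assumptions, with only the "judip_recipes" directory and the "execute" key taken from the description above.

```python
# Simplified illustration of the recipe flow (the actual system is NodeJS/Electron; names are assumed).
import json
import subprocess
from pathlib import Path

RECIPE_REPO = "https://github.com/example/nodejs-recipe.git"   # placeholder template repository
RECIPES_DIR = Path("judip_recipes")                             # per-project recipe directory

def install_recipe(user_inputs: dict) -> Path:
    RECIPES_DIR.mkdir(exist_ok=True)
    target = RECIPES_DIR / "nodejs"
    if not target.exists():
        subprocess.run(["git", "clone", RECIPE_REPO, str(target)], check=True)
    # Fill the template with the values collected from the form (crude stand-in for Handlebars rendering).
    config_path = target / "recipe.json"                        # assumed config file name
    config = json.loads(config_path.read_text())
    for key, value in user_inputs.items():
        config["execute"] = config["execute"].replace("{{" + key + "}}", value)
    config_path.write_text(json.dumps(config))
    return target

def run_recipe(target: Path) -> None:
    config = json.loads((target / "recipe.json").read_text())
    # "execute" is the command stored in the recipe's configuration file, as described above.
    subprocess.run(config["execute"], shell=True, check=True, cwd=target)

recipe = install_recipe({"node_version": "14"})
run_recipe(recipe)
```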

Fig. 1 NodeJS code being executed by the IDE in an expanded codeblock

6 Results

To demonstrate our proposed system in use, we have developed a PoC desktop
application which is implemented in the form of a command line interface (CLI)
built using NodeJS and a graphical user interface (GUI) built using ElectronJS. The
CLI contains the core logic of the application that interacts with Docker underneath
the hood to provide environments to code in, and the GUI provides an interface with
which the user can then interact with the CLI seamlessly. The below screenshots
depict the application running code written in NodeJS on a laptop with a Windows
10 OS, i5-6300HQ Intel processor, 8 GB of RAM, an Nvidia GTX 960M graphics
card and 256 GB Sandisk SSD (Figs. 1 and 2).
The above execution was performed using the official NodeJS recipe for the
proposed system which can be found at https://github.com/izalus which also contains
all the code used to create the application (both CLI and GUI) as well as official
recipes for other languages such as C, C++, Java and Python.

Fig. 2 NodeJS code being executed by the IDE in an un-expanded codeblock

7 Conclusion

In this paper, we illustrated the concept of a recipe, which is a configuration that
the application uses to recreate development environments and enables them to be
shared easily via any kind of public distribution platform like GitHub, BitBucket, etc.
We built an application using a range of technologies: NodeJS, Electron, ReactJS,
Git and Docker that allows one to create automated development environments for
various languages, tools and frameworks. The application is split into a command
line interface that is shared through NPM and a graphical user interface that is shared
using GitHub or Bintray.

References

1. Willis, J.: Docker and the Three Ways of DevOps. https://goto.docker.com/rs/929-FJL-178/images/20150731-wp_docker-3-ways-devops.pdf
2. Turnbull, J.: The Docker Book: Containerization is the New Virtualization (2014)
3. Docker. https://www.docker.com/
4. Linux Containers. https://linuxcontainers.org/
5. Merkel, D.: Docker: Lightweight Linux Containers for Consistent Development and Deploy-
ment. https://www.seltzer.com/margo/teaching/CS508.19/papers/merkel14.pdf
6. Rad, B.B., Bhatti, H.J., Ahmadi, M.: An introduction to Docker and analysis of its perfor-
mance. IJCSNS Int. J. Comput. Sci. Netw. Secur. 17(3) (2017). http://paper.ijcsns.org/07_
book/201703/20170327.pdf
7. Linux control groups. http://man7.org/linux/man-pages/man7/cgroups.7.html
8. Linux namespaces. http://man7.org/linux/man-pages/man7/namespaces.7.html
9. Anderson, C.: Docker [Software engineering]. IEEE Softw. 32(3), 102-c3 (2015). https://
ieeexplore.ieee.org/document/7093032

10. Zhang, Q., Liu, L., Pu, C., Dou, Q., Wu, L., Zhou, W.: A Comparative Study of Containers and
Virtual Machines in Big Data Environment (2018). https://arxiv.org/pdf/1807.01842.pdf
11. Boettiger, C.: An Introduction to Docker for Reproducible Research, with Examples from the
R Environment (2014)
12. Zheng, C., Thain, D.: Integrating Containers into Workflows: A Case Study Using Makeflow,
Work Queue, and Docker (2015)
13. Electron. https://www.electronjs.org/
Novel Methodologies for Processing
Structured Big Data Using Hadoop
Framework

Prashant Bhat and Prajna Hegde

Abstract There are many tools and techniques to store and process data, but traditional
systems fail to handle big data because of its structure, size, etc. For this reason, many
new tools have been developed, and Hadoop is one of them. Hadoop is a framework that
contains many tools to manage big data. Apache Hadoop has a tool called Hive which can
be used to process big data in structured form. There are many ways in which big data can
be processed, but if the user is not well-versed in programming and knows a query language
like SQL, then the required information can be retrieved by using Hive. Using Apache Hive
not only reduces the number of lines of coding but also saves the programmer's time. This
paper explains the working of Hive along with an illustration of how useful data can be
retrieved by using HiveQL. This paper presents an effective way of achieving big data
analytics using Hadoop Hive.

Keywords Big data · Big data analysis · Hadoop · Hive · MapReduce

1 Introduction

The data obtained in huge volume from different sources can be in any form, i.e., it
may be structured, unstructured, and even semi-structured. Based on whether data is
structured, unstructured, or semi-structured, appropriate tool can be selected in order
to get useful data. The challenge is getting something of value for the user from
the huge amount of data gathered from various sources [1]. Analysis of big data [2]
results in information that can be used by the user to implement new ideas in their
business, hence increasing the efficiency of the business. Getting information
from a huge dataset can also help in monitoring financial transactions. Information

P. Bhat · P. Hegde (B)


School of Computational Sciences and Information Technology, Garden City University,
Bengaluru, Karnataka, India
e-mail: prajna_rh@yahoo.co.in
P. Bhat
e-mail: prashantrcu@gmail.com


retrieval can be done from big data [3] that can be used in healthcare, crime, and
other fields. The data that flows into the system needs to be analyzed effectively and
quickly to get useful data [4]. Hadoop enables storing and processing of big data in
distributed systems across a cluster of computers using a simple programming model.
Hadoop MapReduce is one of the data processing techniques which can be applied
to perform big data analytics. Another Hadoop tool, Apache Hive, can be used to
process data quickly. Hive is a data warehouse system which allows the user to write
queries similar to SQL and helps to get appropriate answers to those queries. Hive [5] is
used to analyze huge amounts of data stored in HDFS. This paper gives insight into
working with big data using Hive. Also, this paper explains the flow of data in the Hive
system and the way by which a query is processed [6]. This paper also focuses on
the characteristics of the Hive system.

2 Related Work

Authors Kan et al. [7] presented a paper where they say Electronic Health Records
(EHR) store information in digital format. Due to technological innovations, data in
EHRs is increasing. Effective techniques are needed to store, analyze and interpret
the heterogeneous data in EHRs. In this paper, the authors focused on techniques by which
data in EHRs can be analyzed and required information can be retrieved. Hive queries
are executed in HDFS. This paper also shows the use of Tableau as a data analysis
technique to get meaningful information from visual graphs.
Authors Potharaju et al. [8, 9] say Hadoop is not a single piece of software that can be
downloaded directly onto a computer; rather, Hadoop is a framework that contains many tools.
It makes use of ad hoc queries and analyses huge datasets stored in Hadoop. An SQL-like
language called HiveQL is facilitated by Hive. In this paper, the authors presented simple
examples of using Hive with Hadoop. The paper also explains how to create a table
and store data in the table, along with retrieving data from the table when required. Cumulative
CPU time and the time required for fetching records from files are also explained in this
paper.
In another paper, the author says Apache Hive provides analytical power to users and
organizations and hence has become the de facto standard for SQL. Hive
was created by Facebook in 2008. This paper compares Apache Hive with Impala,
Shark and also HAWQ, and explains the strength of Hive, which has become an
enterprise SQL data warehouse.
Authors Thusoo et al. [10] presented a paper where they say that warehouse
systems are becoming expensive as the datasets to be analyzed are growing rapidly.
The MapReduce programming model needs the user to write custom programs,
which makes it a low-level model. This paper explains the Hive architecture, Hadoop and the
HiveQL language. HiveQL allows user to add custom MapReduce scripts to queries.
HiveQL contains tables, supportive data type, arrays, maps, etc. Along with this, Hive
also has meta store which contains schemas and statistics that can be used for data

exploration, query optimization and compilation. The authors have explained the structure of the Hive warehouse in this paper.
Author Gupta [11] presented a paper where he says it is very difficult as well as
costly to process huge amounts of data in the traditional way. The popular framework called
Hadoop, written in Java, is used by companies like Facebook, Yahoo, etc. These
companies use Hadoop to process huge data on commodity hardware. Hive is used
to process structured data using a query language called HiveQL, which is similar to
SQL. HiveQL was introduced by Facebook. Hive contains a metastore and system catalog
which are useful while doing query optimization and data exploration.

3 Proposed Methodology

Data growing exponentially with time can be stored and processed using the framework
called Hadoop. It provides different tools like MapReduce, Hive, Tez, etc. MapReduce
is a programming model which processes data using two steps called map and
reduce, and the user needs to write lengthy programs in order to work with data. Rather
than writing long code, it is easier to query the dataset to retrieve information. Hive is
built to work with structured data in the same way as SQL and can be used to
query data of huge volume. Hive is a warehouse infrastructure
developed on top of Hadoop, and the Hadoop Hive architecture is a solution to manage big
data. It works on data which is stored in HDFS and uses a language called HiveQL
to query data. Hive works on the principle of write once and read many times. Hive
processes data using MapReduce underneath, but there is no need for the user to write long
code: a Hive query is converted into a MapReduce program, and the data is retrieved
and provided to the user. Hive is essentially a translator which makes the work of the user much
easier (Fig. 1).

3.1 UI

The user interface is an interface between the user and Hive. It allows the user to communicate
with Hive and provides a command line interface, a web interface and a Thrift server
through which users can submit their queries.

3.2 Metastore

It stores the structural details of partitions and tables. Information like the number of columns in
a table and the data types of databases is stored in the metastore. Hive uses the Derby SQL server
as the default metastore.

Fig. 1 Hive data flow (user interface: web interface, Hive CLI and Thrift → (1) execute query → driver → (2) get plan → compiler ↔ (3) get metadata / (4) send metadata ↔ metastore → (5) send plan → driver → (6) execute → execution engine → MapReduce (name node, resource manager) and HDFS)

3.3 Executing Engine

Its work is to execute the plan developed by the compiler. To execute the work plan, it
interacts with the name node and resource manager, and it communicates with the data node,
where the actual data is stored. It also communicates bidirectionally with the metastore to perform
data definition language operations. After communicating with Hadoop daemons like the
data node, name node and job tracker, the execution engine executes the query on HDFS.
The result generated is sent to the user interface via the driver.

4 Characteristics of Hive

Hive is a data warehouse infrastructure which resides on the top of Hadoop and is
used to analyze big data. Some of the characteristics of Hive are as follows (Fig. 2):
• Large data: Hive is a tool that can be used to process data with huge volume.
• Language: Hive uses a query language called HiveQL.
• Table structure: Hive stores data in table format. That is, it stores data in terms of
rows and columns.
• Data analysis: Hive is used to retrieve useful information from large dataset.
Hence, it helps in data analysis.
• Storage: Hive works on data stored in Hadoop distributed file system.
• Multi-user: More than one user can query data stored in Hadoop distributed file
system at the same time using HiveQL language provided by Hive.

Large
Data

Multi User Language

Hive

Table
Storage
Structure

Data
Analysis

Fig. 2 Characteristics of hive

Hive can be used to query huge databases which cannot practically be queried using the structured
query language (SQL). Queries are converted into a series of MapReduce jobs; hence,
the user does not need to write long MapReduce code. Hive uses a query language called
HiveQL to get useful work done (Fig. 3).
Consider a dataset of Zomato restaurants in India. It is a huge dataset which can
be effectively queried using Hive. It contains following attributes:
• Res_id: It represents restaurants id.
• Name: It represents name of the restaurant.
• Establishment: It gives details of restaurant whether it is dhaba, quick bites, casual,
etc.
• Url: It represents url address.
• City: represents the city name, where the restaurant is located at.
• City_id: It represents city id number.
• Locality: It represents locality of restaurant.
• Latitude: It gives latitude coordinate of restaurant.
• Longitude: It gives longitude coordinate of restaurant.

Fig. 3 Zomato India dataset



• Zipcode: It represents zipcode of restaurant.


• Country_id: It represents id of country in which restaurant resides.
• Locality_verbose: It gives locality of restaurant along with city.
• Cuisines: It gives details about type of cuisines available in restaurant.
• Timings: It represents timings during which restaurant provides service.
• Average_cost_for_two: It represents cost for two customers in different cuisines.
• Price_range: It gives details about range of price for food.
• Currency: It represents type of currency.
• Highlights: It represents highlight of the corresponding restaurant.
• Aggregate_rating: It represents the aggregate rating of the restaurant.
• Rating_text: It represents text based on ratings.
• Votes: It represents number of ratings.
• Photo_count: It gives details about photo count.
• Opentable_support: It represents whether open table support is provided or not.
• Delivery: It represents whether online delivery or not
• Takeaway: It represents whether takeaway service is there or not.
As mentioned, this dataset has a large number of columns and rows. The data is huge and
cannot practically be queried using SQL [12], but this becomes possible with the help of Hive.
The process will take time, but the task becomes feasible. Big data analysis can be
done using Hive by executing a series of queries. Here, an attempt is made to retrieve
information from the Zomato restaurant dataset.
To analyze this dataset [13], first a database called “dataset” is created. This can be
done using the following command.
• CREATE DATABASE dataset;
Next, a table called zomato1 is created in the database that we have created. This can
be done with the following commands.
• USE dataset;
• CREATE TABLE zomato1 (Res_id integer, Name string, Establishment
string, Url string, Address string, City string, City_id string, Locality
string, Latitude float, Longitude float, Zipcode integer, Country_id integer,
Locality_verbose string, Cuisines string, Timings string, Average_cost_for_two
integer, Price_range integer, Currency string, Highlights string, Aggregate_rating
float, Rating_text string, Votes integer, Photo_count integer, Opentable_support
integer, Delivery integer, Takeaway integer)
row format delimited
fields terminated by “→”;
The above commands create a database called dataset and a table called zomato1
inside the “dataset” database. The next step is to load the dataset into the created table,
that is, zomato1. To do so, the following command is used.
• LOAD DATA LOCAL INPATH ‘/home/prajna/Desktop/zomato_restaurants_in_
India.txt’ into table zomato1;

Fig. 4 Output sample

Once the dataset has been added to the table, it can be queried as required by the
user. The dataset has now been placed in the Hadoop distributed file system, and
useful information can be retrieved from it by writing queries in the HiveQL language.
A. Display the number of restaurants in Panaji.
SELECT COUNT(*) FROM zomato1 WHERE locality = “Panaji”;
This query gives the total number of restaurants which provide Zomato service
in the locality Panaji.
B. Display the different names of cities which are mentioned in the dataset.
SELECT DISTINCT(city) FROM zomato1;
This query returns the different city names given in the dataset.
C. Display the names and average cost for two persons of restaurants located in Amritsar.
SELECT name, average_cost_for_two FROM zomato1 WHERE city = “Amritsar”;
This query gives the list of restaurant names and the respective cost for two people.
D. How many restaurants are providing Zomato service in Udupi?
SELECT COUNT(city) FROM zomato1 WHERE city = “Udupi”;
This query gives the total number of restaurants in Udupi city (Fig. 4).
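The same queries can also be issued non-interactively. As a small sketch (assuming the Hive command line client is installed and available on the PATH, and that the table created above exists), the snippet below runs one of the queries through hive -e from a Python script and captures the result.

```python
# Sketch: run a HiveQL query non-interactively via the Hive CLI (assumes `hive` is on the PATH).
import subprocess

query = 'USE dataset; SELECT COUNT(*) FROM zomato1 WHERE city = "Udupi";'
result = subprocess.run(["hive", "-e", query], capture_output=True, text=True, check=True)
print(result.stdout)   # the count of Zomato restaurants in Udupi
```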

5 Conclusion

In this paper, we discuss how data flows in Hive and how it processes data. Along
with characteristics of Hive, this paper explains some of the novel examples for

creating, storing, and retrieving useful information using Hive QL command. Big
data is mainly recognized by volume, velocity, and variety. It can be structured,
unstructured, and semi-structured. Big data can be analyzed by using techniques like
MapReduce, but MapReduce expects the user to write code to get useful information
from the stored data. If the data stored is in structured format, then it can be analyzed
using Hadoop Hive, which requires the user to write a query instead of long programming
code. It not only saves the user's time but also helps users who do not have much
expertise in coding. It processes the data by storing it in the form of rows and columns,
i.e., in table format. Hive converts query written in Hive QL language to MapReduce
tasks and processes the data.

References

1. Peng, X., Liu, L., Zhang, L.: A hive -based retrieval optimization scheme for long-term storage
of massive call detail records. IEEE Access 1–1. https://doi.org/10.1109/Access.2019.2961692
2. Shakhovska, N., Veres, O., Mariia, H.: Generalized formal model of big data. ECON-
TECHCHMOD Int. Q. J. 5(2), 33–38
3. Kapil, G., Agrawal, A., Khan, R.A.: Big data security issues. Asian J. Comput. Sci. Technol.
7(2), 128–133
4. Pandey, P., Satsangi, C.S.: Comparative performance using Hadoop ecosystem-PIG and HIVE
through rendering of duplicates. ICANI2018. https://doi.org/10.1007/978-981-13-2673-8_11
5. Krishna Mohan, K.V.N.: Query optimization in big data Hadoop using hive 4(1), 2347–9272
(2016)
6. Pushpalatha, N., Sudheer, P.: Data processing in big data by using hive interface 3(4), 2321–
7782 (2015)
7. Kan, K., Cheng, X., Kim, S.H., Jin, Y.: Apache hive-based big data analysis of health care data.
Int. J. Pure Appl. Math. 119(18), 237–259 (2018)
8. Potharaju, S.P., Shanmuk Srinivas, A., Tirandasu, R.K.: Case study of hive using Hadoop. Int.
J. Eng. Res. Technol. 3(11) (2014). ISSN: 2278–0181
9. Pushpa, S.K., Manjunath, T.N.: Analysis of airport data using Hadoop-hive: a case study. Int.
J. Comput. Appl. 0975–8887 (2016)
10. Thusoo, A., Sarma, J.S., Jain, N., Shao, Z., Chakka, P., Zhang, N., Antony, S., Liu, H., Murthy,
R.: Hive a petabyte scale data warehouse using Hadoop. In: Proceedings of 26th International
Conference on Data Engineering, California, USA, pp. 996–1005. https://doi.org/10.1109/
ICDE.2010.5447738
11. Gupta, A.: HIVE-processing structured data in Hadoop. Int. J. Sci. Eng. Res. 8(6), 2229–5518
(2017)
12. Patel, N.: Analyzing of vehicle registration trend in NY using HBase, pig, hive and MapReduce.
https://doi.org/10.13140//RG.2.2.18574.92488
13. Amiripalli, S.S., Tirandasu, R.K.: Case study of hive using hive. Int. J. Curr. Eng. Sci. Res.
1(3), 2393–8374 (2014)
14. Dubey, A.; Big data. Int. J. Eng. Serv. Manage. Res. 5, 9–12 (2018). https://doi.org/10.29121/
ijetmr.v5.i2.2018.606.
15. Manike, C., Nanda, A.K., Gajulagudem, T.: Hadoop Scalability and Performance Testing in
Homogeneous Clusters. https://doi.org/10.1007/978-3-030-30577-2_81
Intelligent Cane for Assistant to Blind
and Visual Impairment People

Meet Patel, Hemal Ahir, and Falgun Thakkar

Abstract Everyone wants the freedom to go anywhere and everywhere in their life,
but some cannot have it due to their compromised vision. An electronic mobility aid has
been proposed in this paper to help make their life easier and more convenient.
The National Institutes of Health of the United States published an article on NCBI
reporting the problems faced by blind people in navigation and obstacle detection. This work
aims to overcome these problems for visually impaired people by using technologies and Internet
of Things (IoT) platforms like Thingspeak and the IFTTT server. This mobility aid
will help a guardian or family member navigate the user. In addition, if the
person tumbles, an alert message is sent to their close ones so help can reach
them. Existing systems tend to be bulky, so to overcome this restriction, all the
components of this device are fitted within the stick.

Keywords Blind stick · Internet of Things · GPS module · ESP8266

1 Introduction

According to the latest research conducted by World Health Organization, there are
at least 2.2 billion people suffering from vision impairment or blindness, of whom
around 1 billion people have a vision impairment that could have been prevented
or has yet to be addressed [1]. In India, there are 40 million blind people, of whom
1.6 million are children [2]. The major reasons are infections, diabetic retinopathy,

M. Patel (B) · H. Ahir · F. Thakkar


G H Patel College of Engineering and Technology, Bakrol Road, Vallabh Vidhyanagar, Gujarat,
India
e-mail: meet21p@gmail.com
H. Ahir
e-mail: hemalahir149@gmail.com
F. Thakkar
e-mail: falgunthakkar@gcet.ac.in


age-related macular degeneration, cataract, and glaucoma [1]. Among all these,
cataract is the most common cause of blindness.
While blind people may require assistance in certain circumstances, they do not
always get it, so for their convenience this assistive aid is designed to guide them rather
than the old and traditional white cane or guide dogs. Adoption of assistive technology
helps to make their life comfortable. There is some remarkable work
done in the field of electronic mobility aids, which is discussed in the section below;
a smart stick is also one of the electronic travel aids to support them, but there
are some hurdles, so to overcome these, an updated system is proposed. In this system,
navigation of the user is possible, and the guardian of the user also gets alert emails
and calls whenever the user tumbles. The major problem with existing systems is
that they are bulky and complicated for a user to use and understand, so to avoid this, all
the modules in this system are implemented within the stick, and it is foldable, which
makes it convenient in terms of mobility.

2 Related Works

There are already many significant works done in this domain; several researchers
came up with good ideas and constructed their projects, ameliorating traditional
mobility aids by adding various electronic sensors. Different designs of ETAs are
discussed below along with their functionality.

2.1 Design and Implementation of Mobility Aid for Blind People

All the electronic systems are implemented in jackets mounted with five sensors.
There will be five ultrasonic sensors mounted on the jacket such that one sensor
detects potholes or stairs; the other sensor is implemented for obstacles near the
head, and the remaining three sensors are for right, left, and front [3]. They have
included the salient feature for users that the microcontroller finds minimum value
from these three ultrasonic sensors and notifies the user about obstacles by voice
command which is pre-installed in micro SD card [3]. The downside of this system
is that people won’t find it comfortable wearing it all the time.

2.2 Smart Stick for the Blind and Visually Impaired People

This device was developed by Mukesh Agarwal and Atma ram Gupta. This version
of the stick includes ultrasonic and water sensors [4]. Both these sensors help them in

obstacle detection and water detection, as their names suggest. The stick is integrated with
the SIM808 module, which supports the global system for mobile communication and
merges GPS technology for satellite navigation [4]. A SIM card is used to implement
communication identical to regular cell phones. The drawback of this gadget is that
its design is highly complex, and the modules do not fit inside the stick.

3 Proposed System

3.1 Electronic Component and Their Usage

Ultrasonic sensor. The working principle of the ultrasonic sensor is identical to that of
a radar system. The basic difference between sonar and ultrasonic sensing is that sonar is used
underwater with both high and low frequencies, while ultrasonic sensing is used on the terrain
surface and only uses high frequencies. The electrical signals provided to the ultrasonic sensor
are converted to acoustic waves and vice versa. The ultrasonic wave is also called an acoustic
wave. The ultrasonic sensor (HC-SR04) generates acoustic waves at 40 kHz, well above the
roughly 18 kHz limit of human hearing, and these waves travel through a free medium [5]. It
provides a range from 2 to 400 cm. A 10 µs pulse is sent to the trigger pin of the sensor by the
microcontroller, which makes the sensor emit a burst of eight acoustic waves, and at the same
time a timer is initiated [5]. The timer stops immediately after the reflected acoustic waves are
received. The primary aim of this sensor in the blind stick is obstacle detection (potholes,
staircases and many more). By calculating the time difference between transmitting
and receiving a signal, the distance can be calculated by using this formula:

Distance = (Time Taken × Speed of Sound) / 2
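A minimal MicroPython-style sketch of this measurement on the ESP8266 is shown below; the GPIO pin numbers are placeholders, and the firmware actually used in this work is not specified here, so this is only an assumed illustration of the trigger/echo timing and the formula above.

```python
# Illustrative MicroPython sketch (ESP8266): measure distance with an HC-SR04 using the formula above.
from machine import Pin, time_pulse_us
import time

TRIG = Pin(12, Pin.OUT)   # placeholder GPIO numbers
ECHO = Pin(14, Pin.IN)

def distance_cm():
    TRIG.off()
    time.sleep_us(2)
    TRIG.on()                                    # 10 us trigger pulse starts the 8-wave ultrasonic burst
    time.sleep_us(10)
    TRIG.off()
    echo_us = time_pulse_us(ECHO, 1, 30000)      # echo high time in microseconds (timeout ~30 ms)
    return (echo_us * 0.0343) / 2                # time x speed of sound (cm/us), halved for the round trip

print(distance_cm())
```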
MPU-6050 IMU. This module has six outputs, three accelerometer outputs and
three gyroscope outputs, so it is also known as a six-axis motion tracking or six degrees of
freedom device [6]. It uses Micro-Electromechanical Systems (MEMS) and the Coriolis
effect to calculate force and angle in the respective planes. The accelerometer gives
the gravitational acceleration, and the gyroscope gives the angular rate of rotation
about the x, y and z axes. This module uses I2C as the communication protocol with
other devices [6]. The MPU6050 IMU also has an embedded thermistor and a Digital
Motion Processor (DMP), which is used to compute motion processing algorithms.
This sensor plays a vital role by giving the angle of the stick with respect to the ground. A pictorial
representation of the MPU6050 IMU is given in Fig. 1.
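For instance, once the three accelerometer readings are available, the tilt of the stick can be estimated with plain trigonometry; the sketch below shows the usual roll/pitch formulas as an assumed illustration (hardware reads omitted), whereas the actual module can offload this conversion to the DMP.

```python
# Sketch: estimate roll and pitch of the stick from raw accelerometer values (ax, ay, az in g).
import math

def roll_pitch(ax, ay, az):
    roll = math.degrees(math.atan2(ay, az))                          # rotation about the x-axis
    pitch = math.degrees(math.atan2(-ax, math.sqrt(ay*ay + az*az)))  # rotation about the y-axis
    return roll, pitch

# Example: a stick lying almost flat on the ground gives a roll near +/-90 degrees.
print(roll_pitch(0.02, 0.98, 0.05))
```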
Speaker and vibration motor. A speaker of 8 Ω is connected to obtain maximum
output. Both these components alert the user about an obstacle or problem on their
way. If the person falls, the speaker activates and will draw the attention of the public toward

Fig. 1 MPU6050 [9]

them so that they can get help; also, as the distance between the obstacle and the user
decreases, the intensity of the vibration motor increases.
GPS Neo-6M. GPS means global positioning system, which works on the basic
mathematical principle of trilateration. The position is determined by calculating the
distance between the receiver and the satellites. Nowadays, the NEO-6 module series
is very popular due to its cost-effectiveness, on-board memory chip, miniature packages
and high performance; it also has a ceramic patch antenna and a backup battery [7].
This series is based on the u-blox NEO-6M GPS engine. The module works well
with a DC input in the 3.3–5 V range, and UART and USB are some well-known
communication protocols supported by it [7]. The GPS module sends raw data in the
form of NMEA messages [7]. The user's 2D location (latitude and longitude) can be
determined by the receiver using at least three satellites, and movement can be tracked.
By using four or more satellites in view, the receiver can determine the user's 3D position
(latitude, longitude, and altitude).
Node MCU. In this project, we have used the NodeMCU DevKit, which has the ESP8266
as its microcontroller. This microchip is integrated with a Wi-Fi SoC and has low power
consumption. The ESP8266 chip operates between 3 and 3.6 V [8]. The NodeMCU module has
a total of 30 pins, of which 17 are GPIO pins handling all peripheral duties: ADC channels,
UART interface, PWM output, and SPI, I2C and I2S interfaces [8]. It includes four power
pins: three 3.3 V pins and one Vin pin. This microcontroller connects the system to the
Internet and makes the blind stick a part of the IoT, so it can be accessed from anywhere in
the world. This microcontroller board is shown in Fig. 2.

3.2 Design and Development

This stick has features like emailing and calling, as it is connected to the Internet; a
person who has access to the account linked with the stick on the server can see the
location of the user on Google Maps. All the sensors as discussed

Fig. 2 Node MCU [10]

above are installed inside the stick for the user's best comfort. The modules used in this
device are less than 4 cm in size. As we have seen, blind people use conventional sticks
having a stagnant design, so we have replaced this with a foldable mechanism.
Not only this, but a panic button is also included in this stick for emergency
purposes, which will automatically call the user's guardian or relative, and at the same
time an email is sent which incorporates the latitude and longitude of the user. The
ultrasonic sensors are arranged on the outer surface of the cane in such a way that they cover
all the obstacles which come in the user's direction and alert the user about them.

Fig. 3 2D prototype of cane



Fig. 4 Block diagram

3.3 Hardware Assembly

As the NodeMCU is the microcontroller of this system, it is connected with all the
modules. The NodeMCU is connected not only with the MPU6050, using the I2C
protocol, but also with the GPS module. Two General Purpose Input/Output (GPIO) pins of the NodeMCU
are connected to the echo and trigger pins of the ultrasonic sensor (HC-SR04); this
helps the system calculate the distance between an obstacle and the stick. Two other
GPIO pins function as outputs and warn the user about an obstacle on their way:
the first is connected to the speaker, and the second is connected to the vibration
motor. One more pin acts as an input and is connected to a panic button for
emergency purposes. Most of the sensors are oriented inside the stick so it becomes
more convenient for the user to hold it, as shown in Fig. 4.

4 Working

As shown in Fig. 5, when the switch is on, the microcontroller starts controlling
the sensors from its GPIO pins. First, the microcontroller checks whether it is
connected to Wi-Fi or not, and if not, it waits until the connection is established.
Once it makes sure that the connection is established, it activates the trigger pin of
the ultrasonic sensor, which sends acoustic waves. These acoustic waves get reflected
when they strike an obstacle in front of the user, so by calculating the time difference, we can
find the distance between the obstacle and the user. Depending on the distance, the user is
notified by the vibration of the motor. When the distance between the user and the obstacle
is 40–60 inches, the motor vibrates at an interval of 1000 ms; as the distance
decreases below 20 inches, the vibration becomes constant and a predefined tune is played.
Next, the inclination of the stick is checked from the position of the MPU6050, which gives the
position of the module in terms of the x, y and z axes, and this is converted to roll,

Fig. 5 Flowchart

pitch and yaw. By monitoring the roll parameter, the system analyses whether the user
has fallen or not. If the user tumbles, the stick will also fall; to locate them,
data from the GPS module is fetched, and at the same time, the IFTTT server triggers
links related to the webhook service, which will call and send emails to the guardian or
relatives of the user. The call simply informs them to check the email, while the email
contains the location of the user at one click. This directs the user's guardian
or relative through the Thingspeak server, and it shows the latitude and longitude
of the user as a pin on Google Maps. The whole process continues checking all
the aspects of the system until the system is switched off.
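A compact sketch of this alert path is given below; the webhook event name, the API keys and the roll threshold are placeholders, and while the generic IFTTT Webhooks and Thingspeak update URL formats are used, the exact channels and applets configured for this system are not detailed here.

```python
# Sketch of the fall-alert path: roll check -> Thingspeak update + IFTTT webhook (keys are placeholders).
import urequests  # MicroPython HTTP client (assumed to be available in the firmware)

IFTTT_URL = "https://maker.ifttt.com/trigger/stick_fall/with/key/YOUR_IFTTT_KEY"
THINGSPEAK_URL = "https://api.thingspeak.com/update?api_key=YOUR_TS_KEY&field1={lat}&field2={lon}"
FALL_ROLL_DEG = 60   # assumed threshold: beyond this the stick is treated as fallen

def report_fall(roll, lat, lon):
    if abs(roll) < FALL_ROLL_DEG:
        return
    # Push the GPS fix to Thingspeak so the guardian can see it pinned on a map.
    urequests.get(THINGSPEAK_URL.format(lat=lat, lon=lon)).close()
    # Fire the IFTTT webhook that places the call and sends the e-mail.
    urequests.post(IFTTT_URL, json={"value1": lat, "value2": lon}).close()

report_fall(roll=82.0, lat=22.5726, lon=72.9289)   # example values only
```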

5 Results

The system was checked in outdoor and normal conditions, and the results are as
expected. When the stick falls, the microcontroller triggers IoT platform services
like IFTTT and Thingspeak. A call is received informing the guardian to check the
email, which contains a link as shown in Fig. 6; when that link is clicked, a web page
with the pinned location on Google maps opens, as shown in Fig. 7.
There are two ultrasonic sensors connected in this system; their readings are taken
as input and shown in Table 1. In this table, the output of the vibration motor varies
according to the range reported by the ultrasonic sensors: when the range is between
40 and 60 in., vibration occurs at an interval of 1000 ms; if the obstacle is in the range
of 20–40 in., the vibration interval becomes 100 ms; and, at last, if the object is closer
than 20 in., the vibration is constant and the pre-installed tune is played.
Table 2 contains the time taken by the system to connect to Wi-Fi and the trigger
times from the system to the server and from the server to the user's relatives when
the user tumbles. These timings mostly depend on the Internet speeds of the user and
the user's relatives. The readings were taken with 25–30 Mbps as the user's connection
and 35–40 Mbps as the user's relatives' connections.

Fig. 6 Email sent by server

Fig. 7 Location of user



Table 1 Real-time performance of ultrasonic sensor and vibration motor

S. No.   Sensor 1 (in.)   Sensor 2 (in.)   Vibration motor delay (ms)   Intimation
1        79.2             104.2            –                            No obstacle ahead
2        59.93            153.68           1000                         Obstacle 40–60 in. ahead
3        77.77            50.45            1000                         Obstacle 40–60 in. ahead
4        25.49            28.64            100                          Object 20–40 in. ahead
5        25.49            48.02            100                          Object 20–40 in. ahead
6        10.52            28.67            Continuous vibration         Object 0–20 in. ahead
7        29.45            5.45             Continuous vibration         Object 0–20 in. ahead

Table 2 Real time taken by system-server-user's relatives

Feature            Case no.   Time taken by system to trigger the server (ms)   Response time from server to user's close ones (s)
Wi-Fi connection   1          6218                                              –
                   2          6505
                   3          3156
Call               1          1325                                              33
                   2          1184                                              31.25
                   3          1109                                              27.56
Email              1          1347                                              18.17
                   2          1240                                              15.51
                   3          1045                                              17.45

6 Conclusion and Future Work

The model is a simple foldable blind stick consisting of many features and is easy for
users to use. The system is designed to replace the old, traditional blind stick which
visually impaired people have been using for a long time. The main motive is to provide
visually impaired people with affordable assistive technology costing around
3000–4000. However, it has limitations: the user must carry a smartphone with a 24/7
Internet connection. In future, artificial intelligence can be installed so that the user
can easily operate the cane through voice commands and get feedback in the form
of voice.

References

1. World Health Organization: Blindness and Vision Impairment (2019). https://www.who.int/


news-room/fact-sheets/detail/blindness-and-visual-impairment. Accessed 26 June 2020

2. The Tribune: India Home to 20 Percent of World’s Visually Impaired. https://www.tribunein


dia.com. Accessed 26 June 2020
3. Sourab, B.S., Ranganatha Chakravarthy, H.S.: Design and implementation of mobility aid
for blind people. In: International Conference on Power and Advanced Control Engineering
(ICPACE), pp. 290–294, Bangalore, India (2015)
4. Agrawal, M.P., Gupta, A.R.: Smart stick for the blind and visually im-paired people.
In: Proceedings of the 2nd International Conference on Inventive Communication and
Computational Technologies (ICICCT 2018), pp. 290–294, Coimbatore, India (2018)
5. The Working Principle Applications and Limitations of Ultrasonic Sensors. https://www.mic
rocontrollertips.com/principle-applications-limitations-ultrasonic-sensors-faq/. Last accessed
2020/05/07
6. MPU-6000 and MPU-6050. https://howtomechatronics.com/tutorials/arduino/arduino-and-
mpu6050-accelerometer-and-gyroscope-tutorial. Last accessed 2020/05/07
7. U-blox 6 GPS Modules DataSheet. https://www.u-blox.com/sites/default/files/products/documents/NEO-6_DataSheet_%28GPS.G6-HW-09005%29.pdf. Last accessed 2020/05/07
8. Insight into ESP8266 NodeMCU Features. https://lastminuteengineers.com/esp8266-nod
emcu-arduino-tutorial/. Last accessed 2020/05/07
9. Elementz Engineering. https://www.elementzonline.com/mpu6050-gy-521-3-axis-analog-
gyro-sensors-accelerometer-module
10. NodeMCU Pinterest. https://pl.pinterest.com/pin/645492559069569752/
A Comprehensive Survey on Attacks
and Security Protocols for VANETs

Aminul Islam, Sudhanshu Ranjan, Arun Pratap Rawat,


and Soumayadev Maity

Abstract The increasing demand for improving road traffic and driver safety has
brought our attention to the Intelligent Transportation System (ITS), later termed the
Vehicular Ad-hoc Network (VANET). Its main goal is to enhance roadway efficiency
and traffic safety. In VANET, many issues arise while implementing privacy and
security measures. Since this network is vulnerable to security attacks, numerous
security requirements need to be fulfilled. In this survey, we have emphasized finding
the limitations of the existing papers in the respective field. Going through the
fundamentals of VANET, we have illustrated its communication methods. We then
discuss the application areas and security services in the subsequent sections. Later,
possible attacks in VANET are thoroughly discussed.

Keywords VANET · V2I · V2V · ITS · Security · Privacy · Authentication ·


Attacks

1 Introduction

A large number of vehicles can be seen running on the roads of a city. Road traffic
controllers manually direct vehicles to reduce traffic congestion and prevent road
accidents, but without wireless communication technology this is a hectic

A. Islam (B) · S. Ranjan · A. P. Rawat · S. Maity


Department of Information Technology, Indian Institute of Information
Technology Allahabad, Prayagraj, India
e-mail: iiita.aminulislam@gmail.com
S. Ranjan
e-mail: imsranjn@gmail.com
A. P. Rawat
e-mail: rawatarun1592@gmail.com
S. Maity
e-mail: soumyadev@iiita.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 583
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_62

job for them. Unknowingly, they may direct the traffic to an already busy road.
Also, they may not be aware of emergency vehicles stuck in traffic away from their
eyesight. To solve all these kinds of problems, the Intelligent Transportation System
(ITS) was introduced, which provides two types of communication, i.e., vehicle to
vehicle (V2V) and vehicle to infrastructure (V2I). Here, the infrastructure involves
two basic components, the Road Side Unit (RSU) and the Trusted Authority (TA),
installed alongside the road. Later, ITS was termed the Vehicular Ad-hoc Network
(VANET), which uses the functionalities of the Mobile Ad-hoc Network (MANET).
The network architecture of VANET consists of three major components, i.e., the
On-Board Unit (OBU), the Roadside Unit (RSU) and the Trusted Authority (TA), as
shown in Fig. 1. In VANET, every vehicle is assumed to be equipped with an OBU
device that also comprises different components, e.g., a Global Positioning System
(GPS), micro-sensors, etc. The OBU takes advantage of the Dedicated Short Range
Communication (DSRC) protocol, which is based on IEEE 802.11p (5.9 GHz) radio
technology, to communicate among the vehicles. It also uses a Tamper-Proof Device
(TPD) to store the secret information of the vehicles. The TPD is assumed to be more
secure, as it is considered infeasible for a malicious node to access the stored data.
The OBU warns the driver periodically with traffic-related information like speed,
location, direction, road condition, etc., to avoid traffic jams and road accidents.
Further, this information is sent to the RSU, which verifies all the received information
and rebroadcasts it with warnings to other vehicles. Moreover, the RSU is responsible
for all the authentication work in order to lighten the burden of the TA, whereas the
TA plays a major role in registering all the OBUs and RSUs. The TA has high
computational and storage capabilities compared to the other components and also
maintains a database of the vehicles so that a malicious node can be removed from the
network by tracing back to the origin of the messages.

Fig. 1 VANET network architecture



Every vehicle in the VANET broadcasts safety messages which may contain the
vehicle's information (e.g., speed, position, etc.). These messages need to be processed
before being transmitted to other vehicles, because a malicious vehicle may deliberately
send misleading messages that can disrupt the VANET. It is also required to secure the
personal information of a vehicle (e.g., ID, car number, etc.) and prevent other nodes
in the network from accessing it. This gives rise to the requirement for security
services. Verifying every message sequentially at the RSU may not satisfy the timing
requirement of the VANET. Suppose there are 200 vehicles and each one sends a
message that needs to be signed every 300 ms; consequently, an RSU will have to
verify at least 650 (approximately 200/0.3 ≈ 667) messages per second, which is not
a good solution. Moreover, storing and managing public key certificates is also a
communication overhead. To overcome this, an ID-based group verification scheme
was suggested, in which a batch of messages is verified at a time, significantly reducing
the time overhead. However, it also has some drawbacks.
The rest of the paper is structured as follows: Sect. 2 briefly describes the application
areas of VANET. The required security services of VANET are described in Sect. 3.
Possible attack types are mentioned in Sect. 4. Section 5 presents a detailed discussion
of the existing papers. Finally, Sect. 6 presents the concluding remarks of the paper.
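As a quick back-of-the-envelope check of the sequential-verification load mentioned above, a tiny sketch follows; the figures (200 vehicles, one signed message every 300 ms) come from the text, and nothing else is assumed.

```python
# Verification load at a single RSU under the stated assumptions.
vehicles = 200
interval_s = 0.300  # each vehicle broadcasts one signed safety message every 300 ms

messages_per_second = vehicles / interval_s           # ~667 signatures arriving per second
per_message_budget_ms = 1000.0 / messages_per_second  # ~1.5 ms left for each sequential verification

print(f"{messages_per_second:.0f} msg/s, {per_message_budget_ms:.2f} ms per verification")
```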

2 Application Areas

The interaction of the OBU with the RSU has solved the traffic congestion problem.
There are a number of applications of VANET in real life; some of them are as follows.

2.1 Safety Related Applications

It enable a vehicle to gather information from its sensors or by communicating to


other vehicles then this information is used as safety messages which enhances the
traffic system and make it more safe. It provides a secure Ad-hoc network by notifying
its users about emergency services and privacy-related precautions.

2.2 Intelligent Transportation Applications

These applications observe traffic patterns and manage them accordingly. They
enhance the delivery of traffic information by improving the efficiency and accuracy
of traffic detection.

2.3 Comfort Applications

These applications relate directly to the comfort of passengers and drivers. They keep
them updated about their vicinity, such as the locations of the nearest fuel points,
ATMs, food courts and restaurants with their price lists. Through the interface with
the RSUs, they also provide entertainment-related applications such as online games,
the nearest cinema hall's location, etc.

3 Security Services

Security is a basic requirement in VANET that enhances the network by providing
protection to its users, data and services. To make a vehicular network trustworthy
and efficient, its security services should be effective. Some basic security services
are as follows.

3.1 Availability

In VANET, availability ensures that all the required resources and services are
available to all legitimate vehicles during wireless communication. Since all other
services depend on the availability of the resources, it is one of the most crucial
security services.

3.2 Integrity

Integrity assures the vehicles in the network that the data they share during
communication is not altered or modified in transit. It is an important security
service because, in its absence, a hacker can modify the data, which may cause
traffic congestion or, in some cases, accidents.

3.3 Authentication

Authentication service makes sure that the vehicle who is sending the safety message
to the RSU is an authorized user. In addition to this, the receiver can also be sure
about the legitimacy of the sender via a pseudonym. So it allows only the legitimate
vehicle to communicate in VANET.

3.4 Confidentiality

Confidentiality basically assures the vehicles or users that their messages will not
be read by any illegitimate user in the network. It is achieved by encrypting the
transmitted message.

3.5 Non Repudiation

With this service, a vehicle which has sent a message cannot later deny this fact. It
thus works as proof of the sender for the receiver of the message in the VANET.
It can also help in tracing an unauthorized vehicle.

4 Possible Attack Types

So far, we have discussed the vehicular Ad-hoc network, and we know that
VANET is vulnerable to attacks. In VANET, an attack can be defined as stealing or
manipulating a vehicle's information and using it for wrong purposes. In this
section, we describe the major possible attack types in VANET.

4.1 Attacks on Availability

4.1.1 Spamming Attack

Spams are unsolicited messages (e.g., advertisements). They are of no use to the
vehicle's driver or the traveller and are only meant to consume bandwidth, which
may cause high latency. Due to the lack of central administration, they are difficult
to control.

4.1.2 Broadcast Tampering Attack

This attack refers to modifying or manipulating the broadcast information. Hackers
may add new messages or hide precautions, which may result in road jams and, in
some cases, accidents.

4.1.3 Denial of Service Attack (DoS)

If an attacker tries to jam the communication medium and restrict legitimate users
from accessing the network resources, this comes under a DoS attack. The attack is
performed by flooding the RSUs with requests.

4.1.4 Malware Attack

Malware is malicious software carried mostly by an insider and intended to steal
relevant information. Malware (e.g., a worm or virus) could be installed in vehicles
during the installation of an update.

4.2 Attacks on Authentication

4.2.1 Replay Attack

It is an attack where a fraudulent vehicle captures another vehicle's safety message
and replays the manipulated message for its own use, which may cause traffic
congestion. This attack can cause failures in the network.

4.2.2 Sybil Attack

In a Sybil attack, multiple fake vehicles are created to send fake safety messages
which may force another vehicle to change their way and result in traffic jams.

4.2.3 Masquerading Attack

It involves forging an identity for unauthorized access to the VANET, with the
intention of gaining the personal information of some authorized vehicle, and the
attacker may send wrong messages in the network.

4.2.4 Global Positioning Attack

Global positioning is used to locate a vehicle in real-time. This attack involves pro-
viding the wrong location to the other vehicles.

4.2.5 Message Tampering Attack

Message tampering attack involves altering the useful information. In this attack, a
malicious vehicle may discard, alter or drop the information shared by an authorized
vehicle in VANET.

4.2.6 Impersonation Attack

As the name suggests, an impersonation attack refers to impersonating an authorized
vehicle (OBU) in order to send manipulated messages that serve the attacker's own
purpose.

4.2.7 Tunneling Attack

The attacker performs the tunneling attack with the intention of analysing the traffic
by linking two parts of the vehicular network with the help of a tunnel where the
tunnel refers to a communication medium.

4.3 Attack on Confidentiality

We have already mentioned the importance of the confidentiality of data in the
VANET. In this attack, the attacker tries to obtain valuable information from legitimate
users and may disclose it to other vehicles, which may destroy the communication
network.

4.4 Attack on Non-repudiation

Non-repudiation provides assurance about the sender of a message, meaning the
sender cannot deny it [12]. Every user should therefore be identified uniquely. An
attack on non-repudiation happens when two or more users share the same key for
communication; in such cases, tracing the unauthorized user is difficult.

5 Survey on Existing Research

As we can see, road traffic is increasing day by day, and unresponsive behaviour of
drivers may cause traffic jams and occasionally accidents. To overcome this, many
researchers have proposed different security protocols, but these protocols have
various vulnerabilities. In this section, we have classified those protocols into three
different categories, i.e., Public Key Infrastructure (PKI) based schemes, Elliptic
Curve Cryptography (ECC) schemes and Identity-based Signature (IBS) schemes.
Later, we compare those protocols and analyse them briefly. This classification is
shown in Table 1.

5.1 Public Key Infrastructure Based Schemes

Asymmetric key cryptography, also known as public-key cryptography, plays a very


significant role in VANET. The rationale to use this technique is to ensure data safety
in the communication network. In this technique, key pairs that consist of public key
and private key are used to encrypt and decrypt the safety messages. Every message
that is transmitted over the network needs to be digitally signed to make the network
reliable and secure. In PKI, OBUs and RSUs are authenticated by a trusted third
party. Moreover, Digital Certificates, also known as public-key certificates are used
to provide authentication in PKI.
In 2007, Raya et al. [11] suggested a PKI based protocol where all the traffic-
related information is signed and verified by the certificate authority (CA). In order
to start a communication, a large number of certificates with identities are generated
in advance that are randomly selected by a vehicle.
In 2008, Zhang et al. [9] suggested an authentication scheme for an OBU to RSU
communication. In this scheme, the vehicle communicates with RSU with the help
of key pairs, certificate and Hash Message Authentication Code (HMAC).
Later in the same year, Lu et al. [8] suggested an authentication scheme for OBU to
RSU communication in VANET. In this protocol, TA generates the system parameters
like a public key, private keys and certificates by using the Bilinear Pairing (BP) for
signing and verifying traffic-related information. Each vehicle obtains an anonymous
certificate whenever it is in the range of RSU. This protocol minimizes the moving
track attack since the vehicles do not reveal the real identity.
In 2013, Wasef et al. [6] suggested a PKI-based authentication scheme. This
scheme works for both OBU-to-OBU and OBU-to-RSU communication. In this
protocol, the authors reduce the time needed to check the certificate revocation list
(CRL) by replacing it with a keyed hash message authentication code (HMAC).
For making the network secure and reliable, it is an overhead for the TA to maintain,
and for the vehicles to store, a large number of certificates. After analyzing the
protocols, we found that the Raya et al. [11] protocol is vulnerable to a traceability
attack, and the TA also faces overhead due to a large certificate revocation list. In
the Zhang et al. [9] protocol, every vehicle needs to be notified by the RSU to check
whether it is valid or not; this results in heavy message and transmission overhead
and makes the ad-hoc network slow. Due to its heavy operations, the Lu et al. [8]
protocol still faces computational overhead. Moreover, in the Wasef et al. [6]
protocol, because of the global nature of the key, the key update process requires
high transmission delay.

Table 1 Comprehensive analysis of different protocols

Public key infrastructure based schemes:
• Raya et al. [11] (2007, V2I). Objectives: improves security. Limitations: traceability attack, overload on the TA
• Zhang et al. [9] (2008, V2I). Objectives: improves computation overhead. Limitations: transmission overhead is increased
• Lu et al. [8] (2008, V2I). Objectives: moving track attack, reduces storage cost. Limitations: computational overhead
• Wasef et al. [6] (2013, V2V/V2I). Objectives: average verification delay is efficient. Limitations: high transmission delay

Elliptic curve cryptography schemes:
• Cui et al. [3] (2019, V2I). Objectives: reduces the storage and communication cost. Limitations: computational cost is still high
• Ming et al. [2] (2019, V2I). Objectives: reduces the communication cost by 44%. Limitations: high computational cost

ID-based signature schemes:
• Zhang et al. [10] (2008, V2I). Objectives: reduces delay in signature verifications, transmission overhead. Limitations: impersonation attack, traceability attack
• Chim et al. [7] (2010, V2V). Objectives: impersonation attack, traceability attack. Limitations: still vulnerable to impersonation attack
• Horng et al. [5] (2013, V2V). Objectives: impersonation attack. Limitations: still vulnerable to traceability attack
• Li et al. [4] (2018, V2I). Objectives: full key exposure attack, forgeability attack. Limitations: not efficient for message signing and verification
• Ali et al. [1] (2020, V2I). Objectives: computation overhead, forgeability attack. Limitations: communication overhead increases due to PKG

5.2 Elliptic Curve Cryptography Schemes

Elliptic curve cryptography (ECC) was first proposed by Neal Koblitz and Victor
S. Miller in 1985. It provides a high level of security in VANET at low cost.
Considering points on an elliptic curve, it generates the public and private keys.
This algorithm uses smaller keys than RSA (Rivest–Shamir–Adleman) and DSA
(Digital Signature Algorithm), due to which it takes less computational power. In
addition, it requires less space and less bandwidth, and it takes less time to generate
keys and to encrypt or decrypt data.
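The surveyed ECC-based schemes add conditional privacy and batch verification on top of this. Purely as an illustration of elliptic-curve signing and verification (not of any particular surveyed protocol), a sketch using the Python cryptography package and the P-256 curve might look like this:

```python
# Minimal ECC signing/verification sketch (illustration only, not any surveyed scheme):
# a safety message is signed with an ECDSA private key and checked with the public key.
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import hashes
from cryptography.exceptions import InvalidSignature

private_key = ec.generate_private_key(ec.SECP256R1())  # 256-bit curve key, far smaller than comparable RSA
public_key = private_key.public_key()

message = b"speed=42;lat=17.3850;lon=78.4867;ts=1591356000"  # example safety-message payload
signature = private_key.sign(message, ec.ECDSA(hashes.SHA256()))

try:
    public_key.verify(signature, message, ec.ECDSA(hashes.SHA256()))
    print("signature valid")
except InvalidSignature:
    print("signature rejected")
```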
In 2019, Cui et al. [3] analysed the group-based and pseudonym-based protocols
and found a weakness. The authors showed that these schemes lack many
functionalities: they need to manage the CRL and distribute certificates to vehicles,
the vehicles need to store the certificates and key pairs, which is very bulky, and
managing certificate revocation lists requires large computational and storage
capabilities. For that reason, many schemes rely on a TA, but this is very difficult to
implement in the real world. So, the authors suggested a new semi-trusted ECC-based
authentication scheme in which the receiver does not need to worry about the CRL
and the vehicle does not need to store it either.
In the same year, Ming et al. [2] suggested a scheme based on ECC for V2I
communication. According to the scheme, the RSU can handle a vast number of
messages in very little time. The suggested scheme also fulfils all the security
requirements and is provably secure in the random oracle model. This scheme uses
neither bilinear pairing (BP) nor the map-to-point operation; thus it reduces the
computation delay of signing and verifying the messages, and it is appropriate for
real-life applications.
As we know, ECC protocols have the benefit of requiring less computational power
compared to other encryption schemes. After analyzing the protocols, we found that
for batch verification the Cui et al. [3] protocol's computation cost is higher than
that of the Ming et al. [2] protocol: the Ming et al. [2] protocol takes (2n + 2) scalar
multiplication operations in ECC, whereas the Cui et al. [3] protocol takes (n + 2)
scalar multiplication operations, n small-scale multiplication operations and 2n point
addition operations in ECC, along with 2n one-way hash function operations.
Besides, in both protocols, every user authenticated once is assumed not to be
maliciously affected in the near future.
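To make the quoted operation counts concrete, the small sketch below simply tabulates them as functions of the batch size n; it deliberately assumes nothing about per-operation timings, which differ across platforms.

```python
# Operation counts quoted above for verifying a batch of n signatures, as functions of n.
def ming_et_al_ops(n):
    # Ming et al. [2]: (2n + 2) scalar multiplications
    return {"scalar_mul": 2 * n + 2}

def cui_et_al_ops(n):
    # Cui et al. [3]: (n + 2) scalar multiplications, n small-scale multiplications,
    # 2n point additions and 2n one-way hash operations
    return {"scalar_mul": n + 2, "small_scale_mul": n, "point_add": 2 * n, "hash": 2 * n}

for n in (10, 50, 100):
    print(n, ming_et_al_ops(n), cui_et_al_ops(n))
```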

5.3 Identity-Based Signature Schemes

Identity-Based Cryptography (IBC) is a type of Asymmetric Key Cryptography


which was proposed by Adi Shamir in 1984. In this protocol, some significant iden-
tity of the user like name, mobile number and email id etc. can be used as his public
key. This also reduces the overhead of maintaining public keys. In an identity-based
signature (IBS) scheme, the user's identity is used as the public key, and the
corresponding private key is generated with the help of a Private Key Generator
(PKG). In IBS, private keys are used to sign the safety messages. The scheme can be
defined in four phases:
• Setup phase: In the first phase of the ID-based signature scheme, the PKG generates
the system parameters (master key), which are distributed across the vehicles.
• Key extraction phase: In this phase, a private key is generated for communication
using the vehicle's unique ID and the master key.
• Signing phase: In this phase, the message is signed using a timestamp and the
previously derived private key.
• Verification phase: Finally, the signed message is verified using the verification
algorithm.
In 2008, Zhang et al. [10] addressed the issue in OBU-to-RSU communication that
when the RSU receives a large number of signatures, it cannot, due to storage
constraints, verify them within the 300 ms time span, so there is inevitably a delay in
verifying all the signatures. The authors proposed this ID-based protocol to overcome
these issues. Since the protocol is identity based, no certificate is needed, and the
transmission overhead problem is therefore reduced. In this protocol, the authors used
a batch verification technique based on bilinear pairing to overcome the delay in
verifying a huge number of signatures.
In 2010, Chim et al. [7] raised issues with the Zhang et al. [10] protocol, stating that
it is vulnerable to impersonation attacks and depends heavily on the TPD; if the TPD
is compromised, the whole network suffers. To overcome this, they suggested the
first software-based group communication protocol, using a Bloom filter (BF) and
binary search techniques (BST). In this scheme, there is no need for an RSU to share
information within a batch. It also takes advantage of BP and reduces the number of
operations to improve its efficiency.
In 2011, Horng et al. [5] addressed the issues in the previous Chim et al. [7]
protocol and found that it is still vulnerable to an impersonation attack. So, the
authors suggested a new authentication scheme to overcome it. Being a software-based
protocol, it does not rely on the hardware. In this protocol, a vehicle can generate a
pseudo-identity to transfer a message to another vehicle so that the real identities of
vehicles are not revealed. Only the TA can disclose the identity of a vehicle whenever
it is required.
In 2018, Li et al. [4] suggested an ID-based message authentication scheme that
takes advantage of ID-based signatures and ring signatures along with BP. After
analysing its security, they showed that this protocol can defend against key exposure
attacks and forgeability attacks.
In April 2020, Ali et al. [1] suggested an identity-based conditional privacy-preserving
authentication (ID-CPPA) scheme for V2I communication that relies on BP. It uses a
one-way hash function, due to which the messages can be processed efficiently at the
RSU. It also allows batch verification and ensures resistance to forgeability attacks
based on the inverse computational Diffie–Hellman problem in the random oracle
model.
The above protocols use the bilinear pairing (BP) approach, which requires heavy
operations and therefore incurs high computational cost. VANET needs to be more
secure and capable of preventing attackers from accessing the network. For that
reason, we have analysed the aforementioned papers and found that the Zhang et al.
[10] and Chim et al. [7] protocols are still unable to withstand impersonation attacks.
Moreover, the Zhang et al. [10] and Horng et al. [5] protocols still need to provide
security against the traceability attack. We have noticed that the Li et al. [4] protocol
uses bilinear pairing, which increases the computational delay. Along with this, the
Li et al. [4] protocol makes use of ring signatures and ID-based signatures; these are
all heavy operations that make message signing and verification inefficient. At last,
we have observed that the Ali et al. [1] protocol still faces high communication
overhead due to the PKG.

6 Conclusion

The Vehicular Ad-hoc Network fulfils the emerging requirements of vehicles for
making the Intelligent Transportation System a reality. In the past few years,
researchers have therefore concentrated on improving the security and privacy of
Vehicular Ad-hoc Networks. The rationale behind VANET is to implement it in the
real world and to provide a better traffic system. In this paper, we have discussed the
security services, possible attack types and communication methods in VANET. At
last, we have illustrated the benefits and drawbacks of the existing papers. It is
expected that this paper will give a clear overview of already suggested protocols for
VANET and will open a door for researchers to extend security in the VANET.

References

1. Ali, I., Li, F.: An efficient conditional privacy-preserving authentication scheme for vehicle-
to-infrastructure communication in VANETs. Veh. Commun. 22, 100228 (2020). https://www.
sciencedirect.com/science/article/abs/pii/S221420961930275X
2. Ming, Y., Cheng, H.: Efficient certificateless conditional privacy-preserving authentication
scheme in VANETs. In: Mobile Information Systems 2019 (2019). https://www.hindawi.com/
journals/misy/2019/7593138/
3. Cui, J., Wu, D., Zhang, J., Xu, Y., Zhong, H.: An efficient authentication scheme based on semi-
trusted authority in VANETs. IEEE Trans. Veh. Technol. 68(3), 2972–2986 (2019). https://
ieeexplore.ieee.org/document/8629275
4. Li, J., Liu, Y., Zhang, Z., Li, B., Liu, H., Cheng, J.: Efficient ID-based message authentication
with enhanced privacy in wireless ad-hoc networks. In: 2018 International Conference on
Computing, Networking and Communications (ICNC), Maui, HI, pp. 322–326 (2018). https://
ieeexplore.ieee.org/document/8390287
5. Horng, S., et al.: b-SPECS+: batch verification for secure pseudonymous authentication in
VANET. IEEE Trans. Inf. Forensics Secur. 8(11), 1860–1875 (2013). https://ieeexplore.ieee.
org/document/6576161
6. Wasef, A., Shen, X.: EMAP: expedite message authentication protocol for vehicular ad hoc net-
works. IEEE Trans. Mob. Comput. 12(1), 78–89 (2013). https://ieeexplore.ieee.org/document/
6081877

7. Chim, T.W., et al.: SPECS: secure and privacy enhancing communications schemes for
VANETs. Ad Hoc Netw. 9(2), 189–203 (2011). https://www.sciencedirect.com/science/article/
abs/pii/S1570870510000648
8. Lu, R., Lin, X., Zhu, H., Ho, P., Shen, X.: ECPP: efficient conditional privacy preservation pro-
tocol for secure vehicular communications. In: IEEE INFOCOM 2008—The 27th Conference
on Computer Communications, Phoenix, AZ, pp. 1229–1237 (2008). https://ieeexplore.ieee.
org/document/4509774
9. Zhang, C., Lin, X., Lu, R., Ho, P.: RAISE: an efficient RSU-aided message authentica-
tion scheme in vehicular communication networks. In: 2008 IEEE International Conference
on Communications, Beijing, pp. 1451–1457 (2008). https://ieeexplore.ieee.org/document/
4533317
10. Zhang, C., et al.: An efficient identity-based batch verification scheme for vehicular
sensor networks. In: IEEE INFOCOM 2008-The 27th Conference on Computer Com-
munications. IEEE (2008). https://www.researchgate.net/publication/4334277_An_Efficient_
Identity-Based_Batch_Verification_Scheme_for_Vehicular_Sensor_Networks
11. Raya, M., Hubaux, J.-P.: Securing vehicular ad hoc networks. J. Comput. Secur. 15(1),
39–68 (2007). https://www.researchgate.net/publication/37439204_Securing_Vehicular_Ad_
Hoc_Networks
12. Khan, S., Khan Pathan, A.: Wireless Networks and Security, vol. 10, pp. 978–3. Springer
(2013). https://link.springer.com/book/10.1007%2F978-3-642-36169-2
Analysis, Visualization and Prediction
of COVID-19 Pandemic Spread Using
Machine Learning

Snigdha Sen, B. K. Thejas, B. L. Pranitha, and I. Amrita

Abstract Over the years, human beings have faced several health issues related
to the spread of viruses. After Spanish flu, Nipah, and Ebola, now COVID-19 has
thrown a serious threat to society all over the world. As the infection rate is increasing
exponentially, prevention, proper measurement and strategic action are the need of
the hour to combat this pandemic. This paper focuses on analyzing a COVID-19
dataset using numerous machine learning (ML) algorithms, visualizing the results
and evaluating the performance of the best algorithm. The virus outbreak has caused
thousands of deaths across the world and is considered a pandemic according to
WHO reports. There are a number of methods for reducing the risk of infection, such
as predicting the risk of infection, screening patients, using chatbots to analyze the
risk of infection, identifying and speeding up drug development,
etc. In this paper, we mainly experimented with KNN, ANN, SVM, linear (LR) and
polynomial regression (PR) methods to learn and analyze about pandemic spread.
To achieve this, we have considered COVID-19 dataset of Karnataka state. Mostly,
district-wise confirmed, active and death cases have been considered for this work.
In addition, we have also performed gender-wise infection spread and presented a
cumulative dashboard for overall district-wise active, confirmed and recovered cases
of Karnataka.

Keywords COVID-19 · Machine learning · Seaborn · Dashboard · Visualization

S. Sen (B) · B. K. Thejas · B. L. Pranitha · I. Amrita


Department of CSE, Global Academy of Technology, Bengaluru, Karnataka, India
e-mail: snigdha.sen@gat.ac.in
B. K. Thejas
e-mail: thejaskiran99@gmail.com
B. L. Pranitha
e-mail: pranitharenu@gmail.com
I. Amrita
e-mail: amrita.indresh@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 597
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_63

1 Introduction

Originating from the Wuhan market, China, in December 2019, the virus slowly
started stretching its tentacles over the entire world. Previously termed the 2019
novel coronavirus [1], the pathogen behind COVID-19 belongs to a group of viruses
with the scientific name Orthocoronavirinae, or simply coronaviruses, which can
infect animals, mainly mammals. Because one of its major symptoms is acute
respiratory syndrome, the ICTV termed it SARS-CoV-2. After entering the human
body, it can even prove fatal, especially for infected people with comorbidities. The
transmission of this virus through the coughing and sneezing of infected people is
very rapid, and hence measures need to be taken to stop its spread. The COVID-19
pandemic has set foot across the globe and is majorly impacting countries' economies
across industries and businesses. According to the WHO, there are 7,039,918
confirmed cases and 404,396 confirmed deaths worldwide (as per the 09-06-2020
report) [1], which is a serious issue to be considered. While the entire world is worried
about the pandemic, as computer science engineers, we try to explore how ML
algorithms and data analysis can assist in tackling and controlling it better. Being
residents of Karnataka state, we used the state dataset for our case study and evaluate
the performance of various ML algorithms on the Karnataka data; however, these
algorithms can be applied to other datasets as well.
The manuscript is organized as follows. Section 2 discusses relevant literature
survey. The dataset description has been described in Sect. 3. We present experimental
setup and result discussion about various ML algorithm and data visualization in
Sect. 4. Lastly, we conclude with possible future work.

2 Literature Survey

In the last few years, AI, machine learning and deep learning are setting notable
footprints in data analysis across every sector. Due to this COVID-19, researchers
all over the world demand help from data scientists to analyze and predict the spread
so that situation can be handled in a better and organized way. Benvenuto et al. [2]
discussed the ARIMA model based on autoregressive integrated moving average
which is useful for predicting COVID-19 spread and forecasting disease prevalence.
Narinder et al. [3] have evaluated and compared performance of support vector
machine, polynomial regression, deep neural network and recurrent neural networks
using long short-term memory (LSTM) with COVID-19 data from Johns Hopkins
University and finally reported PR offers best prediction results with low root mean
square error (RMSE) over other approaches in confirmed, death and recovered case.
In [4], the time series method described by Deb et al. helps in estimating the
reproduction rate of COVID-19. The authors also concentrated on the use of various
data analysis and statistical tools to find patterns in the virus outbreak so that early
precautions can be taken. Working in the same research direction,

Sars-COV-2 transmission scientific model was developed and proposed by Kucharski


et al. [5] using datasets focusing on confirmed, death and recovery cases inside Wuhan
and rest of the world. Later, Lauer et al. [6] worked on the incubation period of
COVID-19 and mentioned time period can be 5–14 days. While using deep learning,
Narin et al. [7] used and explored usage of CNN for automatic disease detection
from x-ray images. They evaluated performance of three CNN models like ResNet50,
InceptionV3 and InceptionResNetV2 and showed that ResNet50 offers 98% accuracy
in classifying infected patients outperforming other two models. Apart from that,
a lot of other research is going on in this field. Scientists use ML and DL [8] for
screening and creating antibodies to cure COVID-19, which has proved a great
success. These algorithms even help to recognize suitable antibody candidates in a
quicker and more cost-effective way than the traditional approach, which will
certainly accelerate cure therapies for viruses. Columbia University students [8]
launched a startup, EVQLV, which is helping to generate millions of antibodies
quickly. An MIT-developed model [9] is the first that directly uses data from the
coronavirus itself and integrates machine learning with standard epidemiology for
better prediction. At Rensselaer Polytechnic Institute (RPI), researchers are also utilizing
ML to analyze the effects of social distancing. Nguyen [10] conducted a survey
on how various AI, IoT and deep learning methods are effective and prompt in the
battle against COVID-19 pandemic using CT scan images and reported performance
of those methods in terms of accuracy.

3 Description of Dataset Used

Mainly, we used data from Kaggle and Johns Hopkins University. The first reported
confirmed case in Karnataka is on March 9, 2020. So, our dataset contains data from
that day till 5th June for analysis.

4 Experimental Setup and Result

Data analysis has been done using Python in Jupyter notebook with libraries from
Matplotlib and Seaborn for visualization. In Fig. 1, we have reported district-wise
active versus confirmed versus recovered cases captured during March 2020 till June
8, 2020.

4.1 Dashboard Creation

We have built a dashboard using Microsoft Excel for the number of COVID-19 cases
in Karnataka as on May 21, 2020. Some techniques like conditional formatting,

Fig. 1 Confirmed versus active versus recovered cases in Karnataka

Fig. 2 Karnataka COVID-19 dashboard

pivot table and a few basic formulas were used in Excel for obtaining the desired
dashboard. A provision for users to compare districts has also been incorporated.
Users can also visualize the districts which are above a certain limit. Here, the input
for this type of formatting must be given by the user (Fig. 2).

4.2 Comparative Analysis of ML Algorithm

The day-wise rise in confirmed cases has been plotted here, keeping the daily increase
as the target variable. The dataset for Karnataka has been used up to June 5, 2020. For
SVR, the RBF kernel is used; KNN is used with k = 3; and an ANN with a single
hidden layer with ReLU activation and a linear activation function in the last layer is
built using Keras. The network was trained for 10 epochs with a batch size of 10, and
the reported MAE was 797.3837.
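A minimal sketch of the ANN configuration just described is given below; the hidden-layer width (16 units) and the Adam optimizer are assumptions, since the text only fixes the activations, the 10 epochs and the batch size of 10, and the data arrays are placeholders for the Karnataka day-wise series.

```python
# Minimal sketch of the ANN regressor described above, using Keras.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# X: day index since 9 March 2020, y: daily increase in confirmed cases (placeholder values)
X = np.arange(89, dtype="float32").reshape(-1, 1)
y = np.random.randint(0, 400, size=(89, 1)).astype("float32")  # stand-in for the real counts

model = Sequential([
    Dense(16, activation="relu", input_shape=(1,)),  # single hidden layer with ReLU
    Dense(1, activation="linear"),                   # linear activation in the last layer
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(X, y, epochs=10, batch_size=10, verbose=0)  # 10 epochs, batch size 10
print("MAE on training data:", model.evaluate(X, y, verbose=0)[1])
```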

Fig. 3 Comparative study (panels: linear regression (LR), SVR, polynomial regression, KNN)

Table 1 RMSE error

Linear regression (LR): RMSE = 52.6, r² score = 0.51
SVR: RMSE = 72.7, r² score = 0.075
Polynomial regression (PR): RMSE = 28.9, r² score = 0.85
KNN: RMSE = 25.4, r² score = 0.72
ANN: RMSE = 1287.6, r² score = −0.63

Table 2 Forecasting confirmed cases

Date      Predicted value (confirmed case)   Actual value (confirmed case)
1/6/20    3130                               3221
2/06/20   3337                               3408
3/06/20   3556                               3796
4/06/20   3788                               4063
5/06/20   4033                               4320

MAE and RMSE are evaluation metrics for regression models. Linear regression does
not fit the COVID-19 data well, whereas polynomial regression with degree 5 fits it
best (Fig. 3).
From Table 1, it is visible that PR performs best overall, with a low RMSE and the
highest r² score. So, we used PR for forecasting the day-wise confirmed cases depicted
in Table 2.
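A minimal sketch of the degree-5 polynomial regression pipeline, with the RMSE and r² metrics used in Table 1, is shown below; the case counts are synthetic placeholders standing in for the Karnataka series.

```python
# Minimal sketch of the degree-5 polynomial regression used for forecasting.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

days = np.arange(89).reshape(-1, 1)                  # day index since 9 March 2020
cases = (0.0006 * days.ravel() ** 3 + 5).round()     # placeholder for the real confirmed-case series

poly = PolynomialFeatures(degree=5)
X_poly = poly.fit_transform(days)

model = LinearRegression().fit(X_poly, cases)
pred = model.predict(X_poly)

rmse = np.sqrt(mean_squared_error(cases, pred))
print("RMSE:", rmse, "r2:", r2_score(cases, pred))

# Forecast the next five day indices (the 1-5 June style look-ahead of Table 2)
future = np.arange(89, 94).reshape(-1, 1)
print(model.predict(poly.transform(future)))
```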

4.3 Data Analysis and Visualization

Here, in Fig. 4, we have shown an interactive pie chart and a KDE plot to find the
percentage of confirmed, recovered, active and deceased cases for each district of
Karnataka. The brighter region consists of safer districts, and the darker region
consists of districts which are more prone to COVID-19. The date-wise numbers of
male and female confirmed cases, and the categorization of districts based on
infection spread, are analyzed using KNN; 17 districts are in the critical zone. The
critical zone is determined by the percentage of recovered victims with respect to the
total affected victims.

Fig. 4 Visualization using seaborn and matplotlib
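A minimal sketch of the district-wise visualizations of Fig. 4 is given below; the column names and all the counts are hypothetical placeholders, not the actual dataset.

```python
# Minimal sketch of the pie chart / KDE plot style of Fig. 4 (placeholder data only).
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.DataFrame({
    "district": ["District A", "District B", "District C", "District D"],  # placeholders
    "confirmed": [120, 45, 300, 80],    # placeholder counts
    "recovered": [60, 40, 90, 50],      # placeholder counts
})

# Pie chart: share of confirmed cases per district
plt.figure()
plt.pie(df["confirmed"], labels=df["district"], autopct="%1.1f%%")
plt.title("Confirmed cases by district")

# KDE plot of confirmed-case counts; districts in the right tail are more affected
plt.figure()
sns.kdeplot(df["confirmed"], fill=True)
plt.xlabel("Confirmed cases")

# Recovery percentage of the kind used to flag 'critical zone' districts
df["recovered_pct"] = 100 * df["recovered"] / df["confirmed"]
print(df.sort_values("recovered_pct"))
plt.show()
```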

5 Conclusion

To save the world from the jaws of this pandemic, more collaboration between the
medical fraternity and data scientists should be promoted. Through this paper, we
have tried to highlight the impact and potential of machine learning tools in fighting
this disease quickly. Collecting more datasets and exploring other ML algorithms can
be part of further research for even better prediction. Considering the Indian
population, the early lockdown helped to reduce the number of infected cases and the
death rate. Still, there is a long way to go by maintaining social distance, avoiding
crowded places and using sanitizer and masks. So, stay healthy, stay safe.

References

1. WHO corona viruses (COVID-19). Retrieved June 10, 2020 from https://www.who.int/emerge
ncies/diseases/novel-coronavirus-2019
2. Benvenuto, D., Giovanetti, M., Vassallo, L., Angeletti, S., Ciccozzi, M.: Application of the
ARIMA model on the COVID-2019 epidemic dataset. Data Brief 105340 (2020)
3. Punn, N.S., Sonbhadra, S.K., Agarwal, S.: COVID-19 epidemic analysis using machine
learning and deep learning algorithms. Preprint https://doi.org/10.1101/2020.04.08.20057679
(2020)
4. Deb, S., Majumdar, M.: A time series method to analyze incidence pattern and estimate
reproduction number of COVID-19. arXiv preprint arXiv:2003.10655 (2020)
5. Kucharski, A.J., Russell, T.W., Diamond, C., Liu, Y., Edmunds, J., Funk, S., Eggo, R.M., et al.:
Early dynamics of transmission and control of COVID-19: a mathematical modelling study.
Lancet Infect. Dis. (2020)
6. Lauer, S.A., Grantz, K.H., Bi, Q., Jones, F.K., Zheng, Q., Meredith, H.R., Azman, A.S., Reich,
N.G., Lessler, J.: The incubation period of coronavirus disease 2019 (COVID-19) from publicly
reported confirmed cases: estimation and application. Ann. Intern. Med. (2020)
7. Narin, A., Kaya, C., Pamuk, Z.: Automatic detection of coronavirus disease (COVID-19) using
x-ray images and deep convolutional neural networks. https://arxiv.org/ftp/arxiv/papers/2003/
2003.10849.pdf
8. Kent, J.: Data scientists use machine learning to discover COVID-19 treatments. https://health
itanalytics.com/news/data-scientists-use-machine-learning-to-discover-covid-19-treatments
(As on June 10, 2020)
9. Gallagher, M.B.: Model quantifies the impact of quarantine measures on Covid-19’ spread
http://news.mit.edu/2020/new-model-quantifies-impact-quarantine-measures-covid-19-spr
ead-0416 (As on June 10, 2020)
10. Nguyen, T.T.: Artificial intelligence in the battle against coronavirus (COVID-19): a survey
and future research directions. Preprint https://doi.org/10.13140/rg.2.2.36491.23846 (2020)
Study of Behavioral Changes
and Depression Control Mechanism
Using IoT and VR

Pavan Kumar Katkuri and Archana Mantri

Abstract The Internet of things (IoT) has cemented its place as one of the critical
technologies in providing solutions for present-day issues. Although there is massive
advancement in technology, a person still dies every 40 s due to depression or mental
health-related issues. Mental health disorder, deterioration or depression is a key issue
which needs to be addressed in the healthcare domain. This paper is a study of
analyzing behavioral changes and of how IoT and virtual reality (VR) can be used to
identify mental disorder or depression-related issues and help the user avoid reaching
the critical stages of depression.

Keywords Mental disorder · Depression · Internet of things · Virtual exposure ·


Virtual reality · Early stage

1 Introduction

In this millennium generation, we have seen many technologies being used for
addressing the issues in various domains like health care, transport, public or civil
services and so on. Health care has progressed with many innovations including
tracking of health, fitness and providing interventions to maintain the health condi-
tion. However, as per the report by the World Health Organization (WHO), one person
dies every 40 s [1]. Though there is vast research happening with respect to various
diseases and health conditions, there is limited research focused on diagnosing and
healing depression [2]. As per the estimation of the World Health Organization,
depression will be one of the top three pandemic diseases by the year 2030 [1]. Hence,
there is a tremendous necessity to address the mental health-related issues which
cause depression and emotional imbalance.

P. K. Katkuri (B) · A. Mantri


Chitkara University Institute of Engineering and Technology, Punjab, India
e-mail: pavankumarkatkuri@gmail.com
A. Mantri
e-mail: archana.mantri@chitkara.edu.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 605
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_64

Depression and anxiety patients need to be treated with utmost care and support to
cure them and bring them back to normal, but most of the time this end result is not
achieved. Depression leads to low mood and a lack of interest in the activities they do,
which can be identified by changes in their behavior or actions. If these patients are
not identified at an early stage, their disorder could blow out of proportion and lead
them to suicidal thoughts.
Most patients do not even recognize that they are suffering from a disorder
tendency. If identified on time, some could be cured depending on their stage of
depression, some might end up committing suicide, and some may have to undergo
treatment at regular intervals for a lifetime. So, it is essential for the patient to get
diagnosed at an early stage and be cured in time, thus avoiding the critical stages of
depression [3]. As tracking human emotions and health is a key element in curing
depression patients, IoT devices are deployed for tracking and monitoring emotional
health. For this, a suitable mechanism should be able to analyze the data generated by
the IoT devices and VR so as to identify the level of depression and then generate
recommended actions to help the patient recover.

2 Assistance for Depressive Disorder Victims

2.1 Early Stage Identification

Depression is becoming an increasingly severe problem in society these days. It is
therefore necessary to identify patients with any mental health disorder at an early
stage. Depression symptoms can be detected and monitored using IoT devices by
observing behavioral changes without human interaction. It could be advantageous to
monitor the patient's behavior when she/he is alone instead of monitoring it in the
ambience of a psychiatric hospital or when he or she is surrounded by family or
friends. In order to identify the behavioral change, the diagnosing system first needs
to be trained with the patient's normal behavior so that it can detect a change and
report it to the user/patient at an early stage.
The system can identify the change in behavior, which may lead to depression,
based on few parameters like
(i) Feeling of guilt: can be measured with sensors.
(ii) Activity: measuring the energy level with which they perform the actions.
(iii) Sleep: sleep can be calculated with IoT devices or wearable devices.
(iv) Depressed mood: can be measured with IoT devices or wearable devices or
even the smartphone using the parameters like calls, messages, images, etc.
(v) Suicidal tendency: browsing negative content over the Internet.
(vi) Levels of anxiety: IoT devices/wearable devices.
(vii) Weight: monitoring weight loss continuously.
(viii) Breath and heart rate: measured with IoT devices.

Fig. 1 Mechanism to identify behavioral changes

(ix) Phobias, panic attacks.


(x) Obsessive-compulsive disorder (OCD).
(xi) Post-traumatic stress disorder (PTSD).
(xii) Relationship with others.
Mental ill health is wide-ranging, with different symptoms and severity, and is
generally characterized by abnormal thoughts, emotions, behavior, relationships with
others, etc. The aforesaid symptoms are some of the indicators of mental disorder,
and it is very crucial that they be continuously monitored at all locations and at all
times. The data generated by the deployed system can be analyzed based on a
questionnaire, survey, the Hamilton depression scale, self-report rating, BDI [4], etc.
The stage of depression/mental disorder of the patient can be identified using the
aforesaid symptoms (Fig. 1).
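Purely as an illustration of how monitored symptom scores could be mapped to a coarse stage, in the spirit of the questionnaire-style scoring mentioned above, a sketch follows; every weight and threshold is a hypothetical placeholder, not a validated clinical scale.

```python
# Illustrative only: map monitored symptom scores to a coarse stage.
# All thresholds are hypothetical placeholders and not a clinical instrument.
def depression_stage(symptom_scores):
    """symptom_scores: dict of 0-3 severity values for the parameters listed above."""
    total = sum(symptom_scores.values())
    if total < 8:
        return "no concern"
    elif total < 14:
        return "mild"
    elif total < 20:
        return "moderate"
    return "severe"

sample = {
    "sleep": 2, "activity": 1, "depressed_mood": 2,
    "anxiety": 2, "weight_loss": 1, "heart_rate": 1,
}
print(depression_stage(sample))  # total of 9 -> "mild" for this example
```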

2.2 Monitoring the Victims

Depression/mental disorders can be classified into three stages: mild, moderate and
severe. In order to cure depression, we have adequate psychiatrists, hospitals, medical
treatments, etc., but depression is still considered a critical problem because patients
are not medicated or identified in the early stages. If patients are identified in their
early stage, given utmost care and shown empathy, most cases can be cured. However,
most of them cannot even recognize on their own that they are suffering from a
mental disorder. Patients with strong psychological and physical health can recover
even from the severe stage, but the percentage of such cases is very low. Once the
stage of the mental disorder is identified based on the score, the IoT system can be
used to (1) provide interventions between doctors and patients, (2) enable individuals
to engage actively in their health care, (3) support proactive, preventive and
personalized healthcare delivery to individuals around the world and (4) significantly
decrease unnecessary hospital admissions and associated costs [5].
If the patient is identified in the early stages, then the system will be able to provide
recommendations based on the health status of the user. Suggestions play a crucial
role when the patient is identified at an early stage. Figure 2 shows the
recommendations based on the depression disorder [6]. For example, if the user is
feeling deprived of sleep, which might also lead to depression, this can be identified
at an early stage, and the user can initially be provided with recommendations to
listen to music and/or meditate, etc., and be made conscious of the importance of
adequate sleep for good health.
If it continues further, then the recommendations can be changed; based on the
stage, even medical assistance can be recommended. So, it is crucial to provide
recommendations from the very early stage by identifying and monitoring the health
symptoms of the person.
Few other recommendations that can be generated based on the user depres-
sion symptoms are, viz.: watching a movie, doing yoga, meditation, showing good
past moments from mobile images/videos, stroll in the cool breeze with family or
friends, playing sports, reading books and/or suggesting books to be read, suggesting
mobile apps that help the patients in sleeping, meditation, etc., growing plants or

Fig. 2 Recommendations
based on depression disorder

Fig. 3 Remote health monitoring [9]

gardening, etc. These activities can keep the patient engaged and active, both
physically and mentally. Recommendations should vary from person to person and
be categorized based on the age and health condition of the user.
IoT system should generate the recommendations by taking the following into
consideration:
(i) Gender of the use.
(ii) Age group varying from children to old age.
(iii) Based on emotions or symptoms measured.
(iv) Identifying the context when the abnormal behavior is recorded.
(v) Tracking the location of the user (Fig. 3).
Based on the above factors, the recommendation is to be generated by the system.
The sensors used for measurement play a vital role in generating the data, and the
recommendations need to change based on the state of the mental disorder. Simply
having an automated mechanism will not cure the patients; it can only help identify
the condition and suggest recommendations. It is always necessary and beneficial to
have people who can take care of the patient and show love and empathy. If the
patient is suffering from mild depression, this is the stage at which the state and
progress of the patient are regularly reported to family, friends or whomever the
patient maintains frequent contact with. These contacts' data can be taken from the
patient's phone and analyzed to measure the emotional connection of the patient with
those contacts. When the patient is about to reach the severe stage, the report has to
be shared with the doctor. These recommendations, continuous monitoring of the
patient and the collected data will help in precisely identifying the depression or
mental disorder. Data thus has a significant impact on analyzing depression or
mental disorder.
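A hypothetical sketch of the rule-based recommendation step is shown below; the rules, age bands and messages are placeholders intended only to show the shape of such logic, not the authors' system.

```python
# Hypothetical rule-based recommendation step (all rules and thresholds are placeholders).
def recommend(stage, age, symptom):
    if stage == "severe":
        return "share report with doctor and notify emergency contact"
    if symptom == "sleep":
        return "suggest meditation app and sleep-hygiene reminder"
    if symptom == "activity":
        # age-dependent branch: lighter activities suggested for older users
        return "suggest a walk outdoors or light sports" if age < 60 else "suggest gardening or yoga"
    return "suggest music, reading or calling a close contact"

print(recommend("mild", 45, "sleep"))
```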

3 Wearable Devices and Sensors

A vulnerable segment of those with a mental illness is termed severe mental illness
(SMI), a condition associated with psychosis and other extreme states. These patients
may resist treatment, particularly medication; this is where sensors and IoT systems
will be beneficial, and it is the reason we have seen the penetration of wearable
devices for monitoring health. The sensors in the devices can help doctors and family
members ensure that patients take their medication regularly as prescribed.
Various devices like Mimo, Sproutling, Withings Home, EmoSpark, Apple Watch,
Jawbone, Fitbit, etc., have proved that people are not only attracted to wearable
devices but also become inclined to improve a few parameters of their fitness. Some
of the advanced devices can monitor blood pressure, sugar levels, heart rate, sleep,
oxygen level, water consumption, etc. Apple's new cognitive kit can also help patients
share their live moods with the doctors. The best thing about the cognitive kit is that
it has some games which record the reactions of the patients and therefore act like a
psychometric test. This proves that IoT can be a better solution for diagnosing and
monitoring the patients.

4 Diagnostic Algorithms

Is it vital to diagnose the patients faster? Does this have a greater impact on the
rate of recovery? The latest research suggests that IoT systems with VR technologies
help in diagnosing patients faster when compared to the traditional methodology.
Moreover, combining this with AI algorithms improves and speeds up the diagnosis
and the following treatment. IoT combined with VR/AI algorithms can detect signs
of clinical depression three months earlier than the medical provider's diagnosis.
With the help of these advanced systems, changes can be monitored and analyzed
accurately.
During a recent test, the patients underwent an examination of their mood and
thought patterns. This information was then used to guide the patients through
cognitive behavioral therapy (CBT) skills. This treatment showed the AI-based
approach to work better than other mechanisms. As mental health is difficult to
measure, we need better wearable tools to measure our vital statistics. IoT-driven smart
concepts supported by algorithms are a win-win situation for both the rural masses,
for getting cured, and their doctors, for quickly predicting illness before it develops.
These applications or the systems have the high potential to save countless lives. AI
and machine learning, for instance, could learn from the symptoms, treatments and

Fig. 4 Mechanism for detection of depression

outcomes for a specific condition and provide insights to the physician, which would
be difficult to predict as an individual (Fig. 4).

5 VR in Treating Clinical Disorder Victims

Some of the major changes in the behavior that may lead to depression/mental
disorder are phobias and panic attacks. VR technologies provide a great means of
support in assessing and treating clinical disorder patients [9]. Many past reviews
discussed the clinical implications and findings of VR on disorder patients and
assessed their quality [10]. Various pilot studies, open trials and randomized controlled
trials (RCTs) that implemented VR treatment were reviewed. These compared the
effectiveness of VR treatment with other or no treatment. This study showed that
VR treatment provided better outcomes compared to other treatment or no treatment.
VR-enabled treatment can provide better outcomes for disorders caused by "fear of
heights," "fear of flying," "spider phobia," "social phobia," "obesity," "fear of public
speaking," etc. (Table 1).
VR Technology supports various treatments like “VR-assisted cognitive behavior
therapy,” “VR-based cognitive treatment,” “VR exposure,” “VR therapy,” etc. These
treatments provided better results by treating the clinical attendees and referred
patients who were facing depressions and behavioral changes, but the evidence for
the efficacy of VR treatment still needed to be established.

6 Research

Research study shows that one in every 20 persons suffers from depression at one
or the other stage of their life. Many researchers are working on designing various
IoT systems to detect depression and track emotions in diverse age groups. Multiple

Table 1 Controlled trials

Behavioral changes    Sample/size    Treatment    Comparison    Outcome
Fear of heights    CA/37    VR exposure    No treatment    VR > NT
Fear of flying    CA/30    VR exposure with physiological feedback    In vivo exposure    VR > IV
Social phobia    RP/36    VR exposure    CBT group    VR = GCBT
Obesity    CA/216    VR therapy    CBT group; no treatment    VR > CBTP
Fear of public speaking    Students/17    VR exposure    Self-exposure    VR > SE
Spider phobia    Students/40    VR exposure with tactile cues    No treatment    VR > NT
CA—clinical attendees, RP—referred patients

mechanisms like speech features, facial expressions, text patterns, algorithms and
energy levels were used to identify depression and provide recommendations for the
patients. Wearable IoT technology, smart healthcare, virtual reality [11], artificial
intelligence (AI), EEG signal processing and so on are all undergoing extended trials
for optimization in this field.

7 Conclusion and Future Scope

Depression/mental disorder is one of the major problems in humans, causing symp-


toms that affect how one thinks, feels and even performs daily activities. This paper
is the study of behavioral changes and the importance of the IoT mechanism and
VR technologies, which can help in identifying and curing the depression/mental
disorder. This system will be able to monitor the user activities and emotions, and
the data can then be used to determine the mental disorder. This mechanism
can help in monitoring the patients continuously and reporting to the psychiatrist
and family members, with recommendations generated based on the mental disorder
stage of the patient.
There are many wearable devices getting introduced into the market regularly.
These devices can be used for monitoring all emotions and detecting the symp-
toms based on the data. These can be further improved to provide better service by
combining IoT systems with algorithms of artificial intelligence or machine learning.
IoT can also be combined with other technologies like AR or VR to provide an
AR interface, a reliable self-assessment service based on real practitioners' knowledge,
a direct pathway to relevant Rethink Mental Illness resources and personalized
support and interventions. Thus, an IoT system, when combined with AR/VR, AI or
ML, can give better results and thereby significantly control the death rate.

References

1. https://www.who.int/news-room/detail/09-09-2019-suicide-one-person-dies-every-40-sec
onds
2. Anumala, H., Busetty, S.M., Bharti, V.: Leveraging IoT device data for emotional health. In:
International Internet of Things Summit, pp. 487–501. Springer, Cham (2015)
3. Deepika Mathuvanthi, P., Suresh, V., Pradeep, C.: IoT powered wearable to assist individuals
facing depression symptoms (2019)
4. Beck, A.T., Ward, C.H., Mendelson, M., Mock, J., Erbaugh, J.: An inventory for measuring
depression. Arch. Gen. Psychiatry 4, 561–571 (1961)
5. Zois, D.S.: Sequential decision-making in healthcare IoT: real-time health monitoring, treat-
ments and interventions. In: 2016 IEEE 3rd World Forum on Internet of Things (WF-IoT),
pp. 24–29. IEEE (2016)
6. Ali, S., Kibria, M.G., Jarwar, M.A., Kumar, S., Chong, I.: Microservices model in WoO
based IoT platform for depressive disorder assistance. In: 2017 International Conference on
Information and Communication Technology Convergence (ICTC), pp. 864–866. IEEE (2017)
7. Vaseem, A., Sharma, S.: Depression: a survey on the Indian scenario and the technological
work done. Int. J. Eng. Res. Technol. (IJERT) 08(03) (2019)
8. https://www.c-sharpcorner.com/UploadFile/f88748/internet-of-things-applications/
9. Gregg, L., Tarrier, N.: Virtual reality in mental health. Soc. Psychiatry Psychiatr. Epidemiol.
42(5), 343–354 (2007)
10. Glantz, K., Rizzo, A., Graap, K.: Virtual reality for psychotherapy: current reality and future
possibilities. Psychother. Theor. Res. Pract. Training 40, 55–67 (2003)
11. Katkuri, P.K., Mantri, A., Anireddy, S.: Innovations in tourism industry and development using
Augmented Reality (AR), Virtual Reality (VR). In: TENCON 2019–2019 IEEE Region 10
Conference (TENCON), Kochi, India, pp. 2578–2581 (2019). https://doi.org/10.1109/tencon.
2019.8929478
Sentiment Analysis on Hindi–English
Code-Mixed Social Media Text

T. Tulasi Sasidhar, B. Premjith, K. Sreelakshmi, and K. P. Soman

Abstract Social media has been experiencing an enormous amount of activity from
millions of people across the globe over the last few years. This has resulted in the
accumulation of a substantial amount of textual data and opened up several opportunities
for analysis. Sentiment analysis and classification is one such task, where the opinion
expressed in the text is identified and classified accordingly. This becomes even
trickier in code-mixed text due to the free style of writing, which does not have a proper
syntactic structure. In this paper, we worked on such Hindi–English code-mixed
texts obtained from the SentiMix shared task of SemEval-2020. We created a novel cus-
tomized embedding model for feature generation from Hindi–English code-mixed
texts to classify them to various sentiments like positive, neutral and negative using
deep learning techniques. It is observed that attention-based CNN-Bi-LSTM model
has achieved better performance out of all models with 70.32% F1-score.

Keywords Sentiment analysis · Word2Vec · fastText · Long short term memory


(LSTM) · Attention mechanism

1 Introduction

Social media platforms like Facebook, Twitter and Instagram have seen a phenomenal
range of interactions across the world. These platforms are flooded with all sorts of
data like texts, videos, images, and among all, textual communication is a prime

T. Tulasi Sasidhar (B) · B. Premjith · K. Sreelakshmi · K. P. Soman


Computational Engineering and Networking (CEN), Amrita School of Engineering,
Amrita Vishwa Vidyappetham, Coimbatore, India
e-mail: sasidharturaga97@gmail.com
B. Premjith
e-mail: b_premjith@cb.amrita.edu
K. Sreelakshmi
e-mail: ammaslakshmy@gmail.com
K. P. Soman
e-mail: k_psoman@amrita.edu


source of research due to abundant usage. The number of people engaging in social
networking is increasing exponentially each day, across an enormous variety of aspects.
This opened a huge scope for analyzing and understanding the behavioral pattern of
people and leveraging it for the improvement in several fields, i.e., getting feedback
for a product, reviewing public opinion on new government policy, obtaining verdict
of a movie and so on. The series of methods and techniques to understand the human
polarity by extracting the relevant information from the textual data can be termed
as sentiment analysis [1]. Traditionally, sentiment analysis and classification means
analyzing the polarity of the expressed opinion and categorizing them as positive,
neutral or negative.
Sentiment analysis is one of the subcategories and the prime area of research
within the area of natural language processing. There are many advanced models
that have achieved state-of-the-art classification results on texts expressed in a
monolingual manner, like English, Spanish, Chinese and so on. But people from
multilingual societies like India tend to use a different style of writing, as they are
likely to be influenced by at least two languages, and code-mixed [2] writing is one
such text pattern. Code mixing is the phenomenon of transliterating and mixing a
native and a foreign language. An example of such a text is illustrated below.
• Sarkar ne corona time pe lockdown shuru karke spread ko control kar diya.
The above example contains a Hindi–English code-mixed text, and we can see that
Hindi words like "Sarkar ne", "shuru karke" and "kar diya" are written in Roman
script. Analyzing this type of sentence and classifying it based on the sentiment
expressed is still an active research problem and is difficult compared to traditional
text classification. The lack of pretrained models and quality annotated corpora
makes the task even trickier. In this paper, we conducted experiments on such
data: we chose Hindi–English code-mixed texts and attempted to classify them
into buckets of positive, negative or neutral. A novel way of creating a customized
embedding model is proposed for better feature generation, and deep learning
models are used to classify the sentences.
The paper structure is as follows. Section 2 provides a detailed description about
the existing works done in the field of sentiment classification. The details of the
dataset used for experimentation are given in Sect. 3. In Sect. 4, a detailed flow along
with a description of each step followed while conducting the experiments is provided,
and the paper is concluded in Sect. 5.

2 Related Works

Code-mixed text classification in the context of Indian languages is still an active


research area. Lack of annotated data and involving a complex and semantically rich
language like Hindi along with English makes it trickier. The most crucial step for
dealing with textual data is to obtain a relevant numerical representation for it. Word
embeddings are one such learned representations which convert words to vectors

while preserving the context among them. Cha et al. [3] proposed a work on encap-
sulating models by the formation of word embedding clusters for evaluation of text.
They used Bag-Of-Words, word2vec, fastText and Doc2Vec to fabricate semantic
embedding features which are highly beneficial in readability of text, and among
them, fastText gave better performance. In the context of code-mixed text, Braja
et al. [4] presented a summary of a task in which texts are classified based on the
sentiment expressed in them, and two different code-mixed texts (Hindi–English
and Bengali–English) are used for experimentation. A brief of each team approach
in terms of features and models used by them is provided. Among all, the top two
performing teams used GloVe and fastText word embeddings. fastText along with
CNN layer to grab sub-word features and bi-directional long short-term memory
network (Bi-LSTM) to capture sequential information gave top classification per-
formance. Shalini et al. [5] proposed an approach for classifying the code-mixed
texts in Indian languages based on the sentiment embedded in it. They introduced
the first Kannada–English annotated corpus by grabbing Facebook comments using
API. The proposed model used Doc2Vec and fastText for feature vector genera-
tion. Machine learning model like SVM and deep learning networks like convolu-
tional neural network (CNN) and Bi-LSTM are used for classification. The proposed
method is validated on Bengali–English and Hindi–English corpus acquired from a
shared task. They achieved 60.22%, 72.20% using Bi-LSTM on EN-HI, EN-BE and
71.50% using CNN on the EN-KA dataset. In many cases of code-mixed texts, a subset
of words constitutes the entire context of a sentence. In order to achieve better classification,
it is important to weigh each word. This can be carried out by a neural network with
an attention mechanism [6] incorporated in it. Zhou et al. introduced an LSTM
model with attention mechanism for classifying the sentiments in cross-language
texts [7]. Word2vec model trained on both English and Chinese is used to generate
feature vectors. They used a neural network model with attention mechanism which
is trained in combination with the bilingual bi-directional LSTMs to model the word
sequences and achieved 82.4, 84.1 and 81.3% accuracies on NLP&CC datasets. The
main challenge of distinguishing emotions in code-mixed texts is exploring mono-
lingual and bilingual content of each text and identifying the useful words from the
context. Wang et al. [8] addressed these challenges by proposing a bilingual atten-
tion network (BAN) model which accumulates the important word features from both
languages to construct feature vectors and integrate the vectors with high attention
weight to predict the emotion.
The related works reviewed for this task support the fact that sequential
models along with an attention mechanism enhance classification, but the lack of
state-of-the-art pretrained embedding models [9, 10], especially in the Hindi–English
code-mixed domain, resulted in sub-par accuracy values. In this work, we propose to
build a customized embedding model which gives a better numerical representation
of the texts so that better classification can be achieved.

3 Dataset Description

The dataset used for experimentation for sentiment analysis is obtained from Sen-
tiMix shared task organized in SemEval-2020 [11].
The detailed distribution of data is portrayed in Table 1. The task is to classify the
Hindi–English code-mixed text based on the sentiment expressed in it. The sentiment
labels considered are positive, neutral and negative. The dataset contains 14,000
sentences for training, 3000 sentences for validation and 3000 sentences for testing.
Each text is annotated with the respective sentiment, and along with that, word-level
language labels are also provided.

4 Experiments and Results

This section is organized as follows. In Sect. 4.1, preprocessing steps which are used
for cleaning the data are illustrated. First phase of experiments, results and output
analysis are provided in Sect. 4.2. In Sect. 4.3, the reason and procedure of fabricating
a customized embedding model are described, and the experiments conducted with
feature vectors from customized embedding model are provided in Sect. 4.4.

4.1 Preprocessing

The texts present in the data are extracted from social media platforms, and they are filled
with information like usernames, URLs, hashtags and special characters. Prior
preprocessing is required to remove irrelevant information from the sentences. As
each data point is split into words, the first step is to concatenate them and form sentences.
After that, all the usernames, which in general start with @, and hashtags (#) are
removed from the sentences. All the special characters like multiple dots and smileys,
along with additional spaces, are removed, and each sentence is converted to lowercase.
These preprocessed sentences are utilized for all the experiments that are carried
out in this work.
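A minimal Python sketch of these preprocessing steps is shown below (reassembling the word-split data points, stripping usernames, hashtags, URLs and special characters, and lowercasing); the regular expressions are illustrative choices, not the authors' exact code.

# Illustrative preprocessing sketch for the steps described above; the exact
# regular expressions used by the authors are not given in the paper.
import re

def preprocess(tokens):
    sentence = " ".join(tokens)                      # data points arrive word-split
    sentence = re.sub(r"@\w+", " ", sentence)        # drop usernames starting with @
    sentence = re.sub(r"#\w+", " ", sentence)        # drop hashtags
    sentence = re.sub(r"http\S+", " ", sentence)     # drop URLs
    sentence = re.sub(r"[^a-z0-9\s]", " ", sentence.lower())  # special chars, smileys
    return re.sub(r"\s+", " ", sentence).strip()     # collapse extra spaces

print(preprocess(["Sarkar", "ne", "@user", "#lockdown", "shuru", "karke", "..."]))
# -> "sarkar ne shuru karke"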

Table 1 Dataset description


Label Train data Validation data Test data
Positive 4634 982 1000
Neutral 5264 1128 1100
Negative 4102 890 900

4.2 Experiments-1

It is clear from the literature survey that pretrained bilingual embedding models generate
better feature vectors in code-mixed sentence classification. As they were already
trained on a similar pattern of sentences, they establish relatively better semantic relations
between words and generate a better numerical representation for them. Hence,
as the first level of experiments, a domain-specific pretrained model is utilized [12].
Initially, every preprocessed sentence is tokenized. Word2vec from the gensim library
is used to load the pretrained model and retrain it with the tokenized sentences. The
skip-gram method is used, and the model is retrained for 10 epochs. Sequential models like
LSTM and Bi-LSTM are used along with CNN-headed models. Each model is trained for
15 epochs, and the test results are tabulated in Table 2. It is evident from the results
that there is a large scope for improvement. Hence, we performed a retrospective
analysis and found that a huge number of unique words are introduced
by this data to the pretrained embedding model. As a lot of new words are present
in the dataset, it is tough to generate relevant embeddings with the available model.
Hence, we propose to build a customized word embedding model.
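A sketch of this retraining step with the gensim library is shown below, assuming the gensim 4.x API; the pretrained model path is only a placeholder for the domain-specific model of [12], and the two sentences stand in for the full preprocessed corpus.

# Sketch of retraining a pretrained word2vec model with tokenized code-mixed
# sentences (gensim 4.x API assumed; the model path is only a placeholder).
from gensim.models import Word2Vec

sentences = [
    "sarkar ne lockdown shuru karke spread ko control kar diya".split(),
    "movie bahut achhi thi totally worth it".split(),
]  # in practice: all preprocessed training sentences

model = Word2Vec.load("pretrained_hi_en_word2vec.model")   # domain-specific model [12]
model.build_vocab(sentences, update=True)                  # add newly introduced words
model.train(sentences, total_examples=len(sentences), epochs=10)

vector = model.wv["sarkar"]   # embedding used as a feature vector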

4.3 Customized FastText Embedding Model

As the retrained word2vec model has many newly introduced words, we decided
to create a customized embedding model for this dataset. It is also evident from the
literature survey that fastText is one of the word embedding models producing better
feature vectors. All these considerations directed us to use a fastText embedding model for
the further set of experiments. So, the first stage of the work is to collect tweets which are
similar to the texts in the experimentation data. A Python library named tweepy is used
to scrape more code-mixed texts. Initially, all the words in the data are tokenized, and
n-grams are collected out of them. The n value ranged from 1 to 5. The collected
n-grams are used as keywords and given as input to the tweepy library, which collects all
the tweets that contain an n-gram within the text. All the collected tweets are manually
refined, and tweets which have relevant information and are code-mixed in nature are
filtered. In total, 110,000 code-mixed texts are collected from social media and other
sources. The gensim fastText library is utilized to create the embedding model. The
skip-gram mechanism is selected, and a fastText model is built by training it with the collected
data for 10 epochs.
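The training of the customized fastText model could look roughly as follows with gensim (4.x API assumed); the tweet-collection step with tweepy is omitted, and the two sentences shown are placeholders for the roughly 110,000 collected code-mixed texts.

# Sketch of training the customized fastText embedding model on the collected
# code-mixed corpus (gensim 4.x API; the corpus shown here is a tiny placeholder).
from gensim.models import FastText

corpus = [
    "yeh phone kaafi accha hai battery life is great".split(),
    "sarkar ne lockdown shuru karke spread ko control kar diya".split(),
]

ft = FastText(vector_size=300, window=5, min_count=1, sg=1)  # sg=1 -> skip-gram
ft.build_vocab(corpus_iterable=corpus)
ft.train(corpus_iterable=corpus, total_examples=len(corpus), epochs=10)

ft.save("custom_hi_en_fasttext.model")
word_vec = ft.wv["accha"]   # sub-word information helps with unseen spellings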

Table 2 Experiments-1 results


Model Accuracy Precision Recall F1-score
LSTM 0.5543 0.5671 0.5613 0.5572
CNN-LSTM 0.5634 0.5742 0.5680 0.5672
Bi-LSTM 0.5602 0.5627 0.5659 0.5640
CNN-BiLSTM 0.5763 0.5729 0.5896 0.5759

4.4 Experiments-2

In this phase, the customized fastText model is used to generate feature vectors,
and the experiments are conducted using the same deep learning models used in
Experiments-1. There is a significant rise in the classification results. At this stage, the
misclassified sentences are re-examined, and it is observed that most of them are either
very lengthy or very short. This highly arbitrary nature of the sentence lengths is responsible
for false classification. So, an attention model is adopted into the architecture to identify
and capture the relevant information according to context.
This resulted in better performance than the previously experimented models.
The architecture of the top performing model is shown in Fig. 1. The results of
each of the experimented models are illustrated in Table 3. The metrics for measuring
the quality of classification are accuracy, recall, precision and F1-score.
The confusion matrix of the test data results for the best performing model is shown
in Fig. 2, in which the model's class-wise classification performance can be seen. In
comparison with the Experiments-1 results, a surge in the accuracy and F1-score
of the deep learning models can be seen in the Experiments-2 phase. It is evident that
the customized fastText bilingual embedding model gave better feature vectors and
the attention mechanism helped in handling sentences whose lengths are highly
arbitrary in nature.
The optimal hyperparameters of the best performing model are given in Table 4.
Initially, we started off with an embedding vector size of 100 and, to observe the change
in the results, we experimented by varying the vector size from 200 to 400. There
was an improvement in results up to 300, but at 400 the results started decreasing, so the
optimal embedding vector size was fixed at 300. Various activation functions like ReLU,
Tanh, etc., were tried, and on observing the results, Tanh gave the best performance.
We experimented by varying the number of epochs from 5 to 15 and found that beyond
10 epochs the model was overfitting, so we stopped at 10 epochs. The number of

Fig. 1 Overview of attention-based Bi-LSTM model

Table 3 Experiments-2 results


Model Accuracy Precision Recall F1-score
LSTM 0.6335 0.6371 0.6196 0.6282
CNN-LSTM 0.6476 0.6484 0.6560 0.6476
Bi-LSTM 0.6423 0.6409 0.6496 0.6452
CNN-BiLSTM 0.6524 0.6667 0.6546 0.6579
CNN-BiLSTM + Attention 0.7016 0.7060 0.7016 0.7032

Fig. 2 Confusion matrix of best performing model

Table 4 Hyperparameters of best performing model


Hyperparameter Selected value
Embedding dimension 300
Optimizer Adam
Loss function Categorical cross entropy
Activation Tanh(Bi-LSTM), Softmax(dense layer)
Epochs 10
Batch size 100
No. of Bi-LSTM units 350
No. of conv. filters 300
Size of conv. filters 10 × 1

units in the Bi-LSTM was varied from 100 to 400, and at 350 we observed better
classification. In summary, all the hyperparameters were selected based on a
trial-and-error method.
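For illustration, a Keras sketch of an attention-based CNN-Bi-LSTM wired with the Table 4 hyperparameters is given below; it is not the authors' code, the maximum sentence length and the Conv1D activation are assumptions, and the inputs are assumed to be pre-computed fastText vectors.

# Minimal Keras sketch (not the authors' code) of an attention-based
# CNN-BiLSTM classifier using the Table 4 hyperparameters.
from tensorflow.keras import layers, models

MAX_LEN, EMB_DIM, NUM_CLASSES = 50, 300, 3   # MAX_LEN is an assumption

inputs = layers.Input(shape=(MAX_LEN, EMB_DIM))            # pre-computed fastText vectors
x = layers.Conv1D(filters=300, kernel_size=10, padding="same",
                  activation="relu")(inputs)               # 300 conv filters of size 10x1
x = layers.Bidirectional(layers.LSTM(350, activation="tanh",
                                     return_sequences=True))(x)   # 350 Bi-LSTM units
attn = layers.Attention()([x, x])                          # self-attention over time steps
x = layers.GlobalAveragePooling1D()(attn)                  # pool to a sentence vector
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=10, batch_size=100)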

5 Conclusion

Sentiment classification in Hindi–English code-mixed text is carried out in this work.


The data for the experimentation is obtained from SentiMix shared task by SemEval-
2020, and the target labels are positive, neutral and negative. In the first phase of
experiments, a pretrained word2vec model is retrained with the data, and feature

vectors are generated. The CNN-headed Bi-LSTM sequential model gave the best
performance, with a 57% F1-score. In order to improve the classification, a customized
fastText bilingual embedding model is built, and an attention mechanism is utilized
to deal with arbitrary sentence lengths. It is observed that out of all the experimented
models attention-based CNN-BiLSTM has given better performance in terms of F1-
score. It is evident from the confusion matrix that it also has given better class-wise
performance.

References

1. Mäntylä, M.V., Graziotin, D., Kuutila, M.: The evolution of sentiment analysis–a review of
research topics, venues, and top cited papers. Comput. Sci. Rev. 27, 16–32 (2018)
2. Sreelakshmi, K., Premjith, B., Soman, K.P.: Detection of hate speech text in Hindi–English
code-mixed data. Procedia Comput. Sci. 171, 737–744 (2020)
3. Cha, M., Gwon, Y., Kung, H.T.: Language modeling by clustering with word embeddings for
text readability assessment. In: Proceedings of the 2017 ACM on Conference on Information
and Knowledge Management, pp. 2003–2006. ACM (2017)
4. Patra, B.G., Das, D., Das, A.: Sentiment Analysis of Code-Mixed Indian Languages: An
Overview of SAIL-Code-Mixed Shared Task@ ICON-2017. arXiv preprint arXiv:1803.06745
(2018)
5. Shalini, K., Ganesh, H.B., Kumar, M.A., Soman, K.P.: Sentiment analysis for code-mixed
Indian social media text with distributed representation. In: 2018 International Conference on
Advances in Computing, Communications and Informatics (ICACCI), pp. 1126–1131. IEEE
(2018)
6. Chen, H., Sun, M., Tu, C., Lin, Y., Liu, Z.: Neural sentiment classification with user and product
attention. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language
Processing, pp. 1650–1659 (2016)
7. Zhou, X., Wan, X., Xiao, J.: Attention-based LSTM network for cross-lingual sentiment clas-
sification. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language
Processing, pp. 247–256 (2016)
8. Wang, Z., Zhang, Y., Lee, S., Li, S., Zhou, G.: A bilingual attention network for code-switched
emotion prediction. In: Proceedings of COLING 2016, the 26th International Conference on
Computational Linguistics: Technical Papers, pp. 1624–1634 (2016)
9. Kamble, S., Joshi, A.: Hate Speech Detection from Code-mixed Hindi-English Tweets Using
Deep Learning Models. arXiv preprint arXiv:1811.05145 (2018)
10. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword infor-
mation. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017)
11. Patwa, P., Aguilar, G., Kar, S., Pandey, S., PYKL, S., Gambäck, B., Chakraborty, T., Solorio,
T., Das, A.: Semeval-2020 task 9: overview of sentiment analysis of code-mixed tweets. arXiv
e-prints, pp.arXiv-2008 (2020)
12. Sasidhar, T.T., Premjith, B., Soman, K.P.: Emotion detection in Hinglish (Hindi + English)
code-mixed social media text. Procedia Comput. Sci. 171, 1346–1352 (2020)
Accident Risk Rating of Streets Using
Ensemble Techniques of Machine
Learning

Akanksha Rastogi and Amrit Lal Sangal

Abstract Increased vehicular traffic and a lack of expert drivers on the street, coupled
with adverse conditions and poor maintenance of streets, are responsible for the increase in
traffic accidents. Hence, prediction of traffic collisions is of paramount importance for
their mitigation. Street traffic analysis and prediction can be a dedicated approach to
ensure safe and reliable street networks. The primary objective of this research is to
assign an accurate accident risk factor for each street using machine learning models
on the identified dataset. For automated and accurate prediction, various ensemble
models of machine learning are applied, and their performance is compared with the
naive models.

Keywords Receiver operating characteristic (ROC) · Support vector machine


(SVM) · Decision tree (DT) · K-nearest neighbor (KNN) · Root mean square error
(RMSE)

1 Introduction

In recent times, increased urbanization has resulted in a much higher count of vehicles
on streets, which has given rise to numerous troubles, such as traffic congestion, acci-
dents, and air pollution. These issues have caused immense physical and economic
loss as well as human casualties. The Global Status Report on Road Safety 2015,
representing statistics from 180 nations, reveals that traffic fatalities worldwide
amount to roughly 1.25 million every year, with the maximum
traffic mortality rates in lower-income nations. According to the National Vital Statistics
Reports 2017, traffic accidents are responsible for around 36,000 deaths in the USA.
The urgency of the moment is to improve traffic safety and reduce the number of deaths.

A. Rastogi (B) · A. L. Sangal


Department of Computer Science and Engineering, Dr. B R Ambedkar National Institute of
Technology, Jalandhar, Punjab, India
e-mail: arastogi.305@gmail.com
A. L. Sangal
e-mail: sangalal@nitj.ac.in


Some collisions have been found to have occurred because of road structures, whereas
others can be attributed to human error. Although progress has been made in strengthening
the regulations for on-road protection and making cars safer, the study indicates
that the speed of reform remains very slow. With the aid of traffic data and
deep learning, traffic flow forecasting has helped individuals avoid large
road jams and accidents by choosing routes with lower congestion. Big traffic
data and machine learning may also provide a promising solution to predict
and reduce the risk of traffic casualties.
One significant task in car accident avoidance is to develop a successful traffic
safety score prediction system. If a traffic safety score in a
specific area can be predicted, we can forward this report to nearby drivers
to make them careful or, alternatively, to make them select a safer street. However,
accurate forecasting of car accidents is difficult, as many related
causes can influence a car crash.
Ensemble models provide an upper hand in machine learning in that they
combine the results of various models and allow better predictive performance
compared to a single model. This potential makes it beneficial to apply ensemble
models of machine learning to the problem mentioned above. The main objective
is to investigate the risk levels of roads using traffic accident data. The purpose of this
paper is to investigate accident data, classify streets into various accident risk
levels and apply machine learning models to accurately predict the risk levels of
the streets, so that future research can be done on making a device that would aid
in avoiding collision-prone areas and advise alternative approaches to alleviate
accident recurrence and severity.
The remaining part of the paper is structured in the following way: Sect. 2 talks
about the reviewed literature. This section is followed by our proposed accident risk
assignment methods in Sect. 3. Model evaluation is discussed in Sect. 4. Section 5
concludes the paper with a depiction on future directions.

2 Literature Survey

Immense effort has been committed to identifying the main factors or distinct
road patterns that could cause traffic collisions. For instance, Oh suggested
that disrupted traffic flow is one of the reasons provoking accidents [1]. Based on
a loop detector dataset and a crash dataset, they found that the 5-min deviation of
vehicle speeds immediately before a car accident is a notable indication of a crash.
Even though a few accident indicators have been recommended, they could not
address the issue of exact accident prediction, because plenty of factors have
complicated relations with car accidents.
Spatio-temporal dependence is a challenging aspect of traffic; the dependence of traffic
movement on space and time was assessed by Yue using cross-correlation analysis,
which shows its importance in traffic prediction [2]. Dauwels proposed
unsupervised learning approaches to infer the spatio-temporal patterns
in large-scale traffic speed forecasting [3]. Pan devised a model intended to forecast
the spatio-temporal effect of incidents on neighboring traffic contingent
on real-time traffic data [4]. A spatio-temporal recurrent convolutional
neural network that considers the spatial interdependencies and temporal behavior
of network-wide traffic was proposed by Yu [5].
The progress of AI technology has triggered researchers to target real-time traffic
accident prediction. Lv considered factors based on the Euclidean distance
and utilized a k-nearest neighbor approach to predict car accidents [6]. Park
assembled an enormous amount of highway car crash data in Seoul
and built a prediction workflow based on the k-means clustering approach
and logistic regression [7]. Chen gathered a human mobility dataset in Japan and
used a stacked denoising autoencoder model to infer real-time traffic accident
risk. One drawback of these studies is that they did not incorporate
the temporal patterns of the traffic impact itself into the models. Without this
information, the predictive power of the models could be reduced.
In general, the literature reviewed on the severity of collision injury found
that serious thought had been given to modeling crash severity, but prediction of
the injury outcome was not a core concern. Statistical models are more
often utilized in crash severity modeling compared with machine learning techniques, while
machine learning strategies were, for the most part, utilized as prediction tools. DT, NB, SVM and RF
are seen to be used in crash severity modeling with varying popularity.

3 The Proposed Risk Assignment Method

The proposed process for the risk assignment to each street is shown in Fig. 1. The
detailed description of all the steps is given in the subsections following the figure.

Fig. 1 An overview of the risk assignment process



3.1 Dataset Selection

The two datasets used in this research are vehicle dataset and vehicular accident
dataset received from the Chicago Data Portal of the City of Chicago in the year
2015 to 2019.

3.2 Data Preprocessing

Based on the common report number attribute, we have merged both datasets. We have
also cleaned the data to include only on-road and passenger vehicle
types, while excluding unknown values for the traffic type, lighting condition and
weather columns. The attribute 'num-passengers' does not include the driver of the
vehicle. Hence, we check: if 'num-units' = 1, this implies that there was only
one driver involved in the crash, and we add +1 to get the total number of passengers
in the vehicle. If 'num-units' > 1, there was more than one vehicle involved
in the crash, and we add that value to the total number of passengers in the vehicle.
From 1610 samples, 30% of the samples were kept for testing and the remaining 70%
were used to train the model.
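A pandas sketch of this merging and passenger-count adjustment is given below; the column names ('rd_no', 'num_units', 'num_passengers', 'lighting_condition', 'weather_condition') are assumed stand-ins for the Chicago Data Portal field names, not the exact schema.

# Sketch of the preprocessing described above; column names are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split

crashes = pd.read_csv("traffic_crashes.csv")     # accident dataset
vehicles = pd.read_csv("crash_vehicles.csv")     # vehicle dataset

# Merge the two datasets on the common report number attribute
df = crashes.merge(vehicles, on="rd_no", how="inner")

# Drop rows with unknown lighting/weather values (placeholder filter)
df = df[(df["lighting_condition"] != "UNKNOWN") &
        (df["weather_condition"] != "UNKNOWN")]

# 'num_passengers' excludes the driver: add one driver per involved unit
df["num_passengers"] = df["num_passengers"].fillna(0)
df["total_people"] = df["num_passengers"] + df["num_units"].clip(lower=1)

# 70/30 train-test split as described in the text
train_df, test_df = train_test_split(df, test_size=0.30, random_state=42)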

3.3 Feature Engineering

Feature selection is the process of choosing the attributes that can make the predicted
variable more accurate, or removing those attributes that are insignificant
and can diminish the model's precision and quality. Correlation is an approach to
understand the relationship between multiple factors and attributes in the
dataset. Correlation depicts whether one or more attributes depend
on another attribute, are a cause of another attribute, or are related
to other features. Correlation analysis shows that the accident severity can
be determined based on the number of injuries and the physical damage in the accident data.

3.4 Assigning the Danger Score

We assign the danger score according to the number of injuries per person in the
vehicle and the physical damage of the vehicle, as these two factors have the
maximum correlation. We find the number of injuries per person involved in the accident
by dividing the total number of injuries in the accident by the total number of people
involved in the accident. We choose to assign four danger score ratings to account
for all accidents depending on the amount of injuries. We checked the unique values
of the damage and then decided what weight to add to it in the computation of the
danger score. There are three unique values of the damage in monetary terms (<500,
500–1500 and >1500), so we assign rating values 1, 2 and 3, respectively, and then add
a weight of 'w' in the multiplication. With a weight of 0.5, we have eight different
scores to bin into four categories. The danger score is 1 if the current score is 0.5 or 1;
the danger score is 2 if the current score is 1.5 or 2; the danger score is 3 if the current
score is 3 or 4; the danger score is 4 if the current score is 4.5 or 5. Then, we assign
the accidents into three bins as the combined danger score.
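The damage rating and the binning of the raw score into the four danger scores can be written compactly as follows; this is only a sketch of the rules quoted above with the stated weight w = 0.5, and the exact combination of the injury term used by the authors is not reproduced here.

# Sketch of the scoring rules described above (weight w = 0.5, damage ratings
# 1-3, raw-score bins 0.5/1 -> 1, 1.5/2 -> 2, 3/4 -> 3, 4.5/5 -> 4).
def damage_rating(damage_usd):
    """Map the monetary damage to a rating of 1, 2 or 3."""
    if damage_usd < 500:
        return 1
    elif damage_usd <= 1500:
        return 2
    return 3

def bin_danger_score(raw_score):
    """Bin a weighted raw score into one of the four danger scores."""
    if raw_score <= 1:
        return 1
    if raw_score <= 2:
        return 2
    if raw_score <= 4:
        return 3
    return 4

# Example: damage rating 2 weighted by w = 0.5 gives a raw score of 1 -> danger score 1
print(bin_danger_score(damage_rating(800) * 0.5))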

3.5 Applying Machine Learning Models

After assigning the danger score to each accident, we applied machine
learning models to estimate the accuracy of our assigned danger scores. We have
applied basic machine learning models like logistic regression, SVM, KNN, deci-
sion tree, and ensemble models like random forest and gradient boosting. The
implementation results of these models are described in the next section.
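A scikit-learn sketch of fitting and comparing these models is shown below; the feature matrix and danger-score labels are synthetic placeholders standing in for the engineered features and labels of Sects. 3.2–3.4.

# Sketch of fitting and comparing the listed models with scikit-learn; the
# data below are random placeholders for the engineered features and labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X_train, X_test = rng.random((100, 5)), rng.random((30, 5))        # placeholder features
y_train, y_test = rng.integers(1, 4, 100), rng.integers(1, 4, 30)  # danger-score bins

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "Decision tree": DecisionTreeClassifier(),
    "Random forest": RandomForestClassifier(),
    "Gradient boosting": GradientBoostingClassifier(),
}

for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(name)
    print(classification_report(y_test, clf.predict(X_test)))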

4 Model Evaluation

We have implemented basic machine learning models and ensemble models and
compared the results in Table 1 based on metrics like precision, recall and F1-score
with the other existing methods used in this study for prediction, such as the SVM, KNN
and DT models. We have also shown the performance of the models using ROC curves.
Figures 2, 3, 4, 5, 6 and 7 provide the ROC curve for comparison of each model’s
performance on the given dataset for the three danger score classes.
The results obtained show that gradient boosting performs better than all the
other models. The gradient boosting model has the smallest RMSE across the different
prediction models for all the classes.

Table 1 Parameters based on the performance of models


Model Precision Recall F1-score Accuracy RMSE
Logistic regression 62 51 56 51.19 63.97
K-nearest neighbor 86 84 84 83.62 47.96
Support vector machine 91 89 89 85.82 40.46
Decision tree 81 90 85 85.97 39.25
Random forest 86 84 84 85.84 39.35
Gradient boosting 94 93 93 90.26 31.20

Fig. 2 ROC for logistic regression

Fig. 3 ROC for k-nearest neighbor

5 Conclusions and Future Scope

In the last decade, the analysis and prediction of street traffic have become a subject
of continuous research in various sub-fields of computer science. In this paper, we
listed and discussed various approaches proposed for the accident risk rating
assignment of streets, including machine learning and ensemble techniques. With
the results obtained, we can conclude that gradient boosting, an ensemble model of machine
learning, performed better than the simple random forest in terms of accuracy and other
parameters.

Fig. 4 ROC for support vector machine

Fig. 5 ROC for decision tree

Future work can be extended to develop a more successful model that would
outperform the accuracies achieved by gradient boosting. Also, we can use hyperparameter
tuning to optimize our models with the parameters giving the best results.
Moreover, our system could be extended to include more accident-related features
that would help drivers choose the safest path out of the various available routes,
and the model should also be able to tell the risk of all the available streets. Future
research will focus upon confirming the superiority of DNNs as traffic accident
severity classification/prediction models using the existing datasets of the relevant

Fig. 6 ROC for random forest

Fig. 7 ROC for gradient boosting

literature, also putting forward a set of independent parameters that are not only
salient, but also enough, for traffic accident severity prediction.

References

1. Oh, C., Oh, J.-S., Ritchie, S., Chang, M.: Real-time estimation of freeway accident likelihood.
In: 80th Annual Meeting of the Transportation Research Board, Washington, DC (2001)
2. Yue, Y., Yeh, A.G.-O.: Spatiotemporal traffic-flow dependency and short-term traffic forecasting.
Environ. Plann. B Plann. Des. 35(5), 762–771 (2008)
3. Asif, M.T., Dauwels, J., Goh, C.Y., Oran, A., Fathi, E., Xu, M., Dhanya, M.M., Mitrovic, N.,
Jaillet, P.: Spatiotemporal patterns in large-scale traffic speed prediction. IEEE Trans. Intell.
Transp. Syst. 15(2), 794–804 (2014)
4. Pan, B., Demiryurek, U., Shahabi, C., Gupta, C.: Forecasting spatiotemporal impact of traffic
incidents on road networks. In: 2013 IEEE 13th International Conference on Data Mining
(ICDM), pp. 587–596. IEEE (2013)
5. Yu, H., Wu, Z., Wang, S., Wang, Y., Ma, X.: Spatiotemporal recurrent convolutional networks
for traffic prediction in transportation networks. Sensors 17(7), 1501 (2017)
6. Lv, Y., Tang, S., Zhao, H.: Real-time highway traffic accident prediction based on the k-nearest
neighbor method. In: International Conference on Measuring Technology and Mechatronics
Automation. ICMTMA’09, vol. 3, pp. 547–550. IEEE (2009)
7. Park, S.-H., Kim, S.-M., Ha, Y.-G.: Highway traffic accident prediction using vds big data
analysis. J. Supercomput. 72(7), 2815–2831 (2016)
Skin Detection Using YCbCr Colour
Space for UAV-Based Disaster
Management

S. J. Arya, A. Asish, B. S. Febi Shine, J. L. Sreelakshmi,


and Elizabeth Varghese

Abstract The crushing impact of inappropriate disaster management and the increased
death rate during a disaster compel the search for powerful disaster management.
Rapid technological advancement and research on unmanned aerial vehicles (UAVs)
have urged their use in disaster management. A UAV captures images of the disaster site,
and further analysis using image processing techniques helps in human detection
through a skin detection technique. The study involves skin detection using the YCbCr
colour space. Experimental tests were performed both outdoors and indoors, and the
results largely depend on lighting conditions and various environmental factors. The
UAV was hovered at a height of about 15 m above the ground to capture the outdoor
samples. This technique helps to get better output such that humans can be detected
based on their skin and a quick recovery can be made, reducing the mortality and the
risk to rescue operators.

Keywords Unmanned aerial vehicle · Image processing · Disaster management ·


Skin detection

S. J. Arya (B) · A. Asish · B. S. Febi Shine · J. L. Sreelakshmi · E. Varghese


Department of Electrical & Electronics Engineering, Mar Baselios College of Engineering &
Technology, Thiruvananthapuram, India
e-mail: aryasadasivanj@gmail.com
A. Asish
e-mail: asishasokan22@gmail.com
B. S. Febi Shine
e-mail: febishine@gmail.com
J. L. Sreelakshmi
e-mail: jllakshmi1997@gmail.com
E. Varghese
e-mail: eeliza.v@gmail.com


1 Introduction

Disaster surveillance is an unpredictable procedure and is still in a developing stage,
as there is a compelling need for powerful disaster management. Technological progress
has brought refined changes to disaster management, and one among them is surveillance
using unmanned aerial vehicles. Successful disaster management is required
for fast recovery and restoration, thereby giving solid support to rescue
operators. Management becomes troublesome at post-disaster scenes because
of their huge uncertainty and turbulent conditions. This paper conducts a study
on the YCbCr colour space technique for skin detection, which can be utilized
for the purpose of disaster victim surveillance using a UAV [1].
The area of skin detection has been a developing sector for the past ten years, and many
techniques have been proposed for skin detection. Skin colour extraction is a basic element
of the skin detection process [2]. The different characteristics of images are brightness,
variance, visibility and saturation [2]. Human skin detection is a challenging process
because of the huge variation in skin colour from one region to another [3]. One colour
space can be defined by two or three different colour components, and different applications
like TV broadcasting and graphics suit different colour spaces [2]. There
are numerous papers which discuss various texture segmentation procedures like
Gabor filters, edge detection, content-based image retrieval, thresholding, Markov
random fields, supervised segmentation, unsupervised segmentation, clustering techniques,
region-based and histogram-based strategies [4]. A significant property of
Gabor filters is that they have optimal joint localization, or resolution, in both the spatial
and the spatial-frequency domains [5]. In 2011, a texture-based skin detection algorithm
was proposed; the paper covers the process of skin detection via HSV and the
topic of texture segmentation using the grey-level distribution [5]. It concludes that an
algorithm based on the greyscale distribution is effective and yields better results. The
referred papers show that the YCbCr colour space is well-suited for skin detection in
complex images. Since this paper deals with skin detection at disaster sites, it adopts the
YCbCr colour space and greyscale conversion for better results.
By integrating image processing techniques with a UAV, a quick and effective
rescue facility can be enabled, which benefits disaster surveillance.
Image processing is a technique of modifying an image and changing its attributes
to obtain the ideal characteristics. A suitable image processing technique for disaster
surveillance is the skin detection technique, which identifies the presence of humans
trapped at a disaster site through skin colour identification. Human body
detection is done using skin colour recognition, a procedure by which
skin-coloured pixels are distinguished in captured images or video. Skin colour along with
lighting conditions are the deciding factors for skin recognition. The captured image is
broken down into single pixels, which are classified into the desired output in the YCbCr
colour space. Thus, for developing a fast and cheaper rescue facility with better
efficiency, a skin identification technique linked with drone technology opens a new
path in disaster surveillance.

Fig. 1 Block diagram

2 Methodology

The paper describes an experimental study on skin detection in YCbCr colour space.
MATLAB is used for image processing, and results give the feasibility of the selected
skin detection method. This will help to make an effective system for disaster victim
surveillance using UAV.
Once the sample is loaded, it is converted into the YCbCr colour space from the
normal RGB values. Different morphological operations are applied to develop efficient
skin detection. Along with this colour segmentation model, the paper aims to check
the efficiency of skin detection in the greyscale value distribution. For this purpose, the
MATLAB code is extended to plot the histogram and to detect the skin colour in the
greyscale range as well. In the YCbCr procedure, the effectiveness is analyzed by plotting
the histogram of every component.
The UAV mounted with a camera captures the pictures of the disaster site, which
are received at the ground station. Processing the pictures taken by the UAV
during the flight from a height in MATLAB for skin identification opens a way
to discover casualties and ensures a quick recovery. Figure 1 shows the
block diagram of the system, which has a ground-based system and an airborne
system.

3 Image Processing

Image processing is a rising field with a wide assortment of uses and plays an
immense role in surveillance. It is a technique of modifying the characteristics of an
image and changing its attributes to get the ideal output according to the interest
of the user. The fundamental steps in every kind of image processing remain the
same: images are imported using an image acquisition process, the images are
analyzed through software, the captured images are manipulated to deliver
the desired features and different filter techniques are applied to remove noise. Skin
detection is done by separating skin-coloured pixels from non-skin-coloured ones.

3.1 YCbCr Colour Space

The YCbCr colour space is an encoded form of the RGB colour space used for video
streaming and compression. Since the representation makes it easy to discard some
redundant colour information, it finds application in image and video compression
standards like JPEG, MPEG-1, MPEG-2 and MPEG-4. The
simplicity of the transformation and the explicit separation of luminance and chrominance
components make YCbCr an attractive colour model. In this arrangement, luminance
information is stored as a single component ('Y'), and chrominance information is stored
as two colour-difference components ('Cb' and 'Cr'). 'Cb' represents the difference between
the blue component and a reference value. 'Cr' represents the difference between the red
component and a reference value. It is a linear conversion of RGB [4].
YCbCr values can be obtained from the RGB colour space as per Eqs. 1–3 [4].

Y = 0.299R + 0.587G + 0.114B (1)

Cr = R − Y (2)

Cb = B − Y (3)

YCbCr colour model representation is shown in Fig. 2. All three components are
represented in the image.

Fig. 2 YCbCr colour model [4]

4 Results of Skin Detection

The skin tone selected for the study is of the moderate colour range:
0 < H < 50 and 23 < S < 68 [2].
RGB-level to grey-level conversion takes place as per the code, and the converted
range is Y > 0.2; 0.3 < Cb < 0.44; Cr > 0.47.
Skin detection of both an indoor image and an outdoor image captured
during the flight of the UAV is done in the YCbCr colour space.
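The thresholding just described can be sketched in Python with OpenCV as follows (the paper's own implementation is in MATLAB); the normalized Y/Cb/Cr ranges are taken from the text above, and the image file name is only a placeholder.

# Python/OpenCV sketch of the thresholding described above; the paper's own
# code is in MATLAB, and "sample.jpg" is a placeholder for a captured frame.
import cv2
import numpy as np

img = cv2.imread("sample.jpg")
if img is None:
    raise SystemExit("place a sample image next to this script")

ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)      # OpenCV orders channels as Y, Cr, Cb
y, cr, cb = [ycrcb[:, :, i].astype(np.float32) / 255.0 for i in range(3)]

# Skin mask from the normalized thresholds: Y > 0.2, 0.3 < Cb < 0.44, Cr > 0.47
mask = (y > 0.2) & (cb > 0.3) & (cb < 0.44) & (cr > 0.47)

skin_pixels = np.zeros_like(img)
skin_pixels[mask] = img[mask]                       # keep only skin-coloured pixels
cv2.imwrite("skin_pixels.png", skin_pixels)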

4.1 Skin Detection Results in YCbCr

Figures 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 and 14.

Fig. 3 Indoor sample

Fig. 4 Skin pixels



Fig. 5 Skin pixels in colour

Fig. 6 Non-skin pixels in colour

Fig. 7 Original colour image

Fig. 8 Histogram of Y image
Skin Detection Using YCbCr Colour Space for UAV … 639

Fig. 9 Histogram of Cb image

Fig. 10 Histogram of Cr image

Fig. 11 Outdoor sample

Fig. 12 Skin pixels



Fig. 13 Skin pixels in colour

Fig. 14 Non-skin pixels in colour

4.2 Results

RGB colour space is not being used commonly because of its non-uniform nature.
YCbCr colour space shows better results and transformation. As the disaster site
contains more complex situations, YCbCr colour space will be effective for the skin
detection purpose. Figure 3 is an indoor sample. Skin pixels are extracted and shown
in Fig. 4. Skin part and non-skin part are marked in Figs. 5 and 6, respectively.
The histogram of the image can be used for better analysis of the picture. The original
colour image obtained by processing the sample using the code is shown in Fig. 7.
The histograms of Y, Cb and Cr are shown in Figs. 8, 9 and 10, respectively. Figure 11
is the outdoor sample that was taken using the UAV from an approximate height of 15
metres. Skin pixel identification in greyscale is shown in Fig. 12. Figures 13 and 14
show the skin pixels in colour and the non-skin pixels in colour, respectively. The results
were perfect for the samples that were used for the test.

5 Conclusion

Skin detection is a leading edge in human body identification and analysis and is
applied in many emerging technologies like face detection where human skin colour
acts as an elementary guide for detection. The segmentation process of human skin

colour depends upon the colour space selected, as the skin colour distribution largely
depends on the colour space. Images under various conditions of orientation,
illumination, shadow, pose and in-plane rotation can be distinguished and are
used in diverse applications like video compression and recognition technology,
although this remains difficult in computer vision technology.
Considering the various limitations in this field, we can say there has not yet been a
hundred per cent solution for skin detection, and the field is still under development.

References

1. Mbaitiga, Z., Fuji, S., Minori, S.: Rapid human body detection in disaster sites using image
processing from unmanned aerial vehicle (UAV) cameras. In: ICIIBMS 2018, Track 2: Artificial
Intelligent, Robotics, and Human-Computer Interaction, Bangkok, Thailand
2. Lei, Y., Hui, L., Xiaoyu, W., Dewei, Z., Jun, Z.: An algorithm of skin detection based on texture.
In: 4th International Congress on Image and Signal Processing 2011, pp. 1822–1825 (2011)
3. Shaik, K.B., Ganesan, P., Kalist, V., Sathish, B.S., Jenitha, J.M.M.: Comparative study of skin
color detection and segmentation in HSV and YCbCr color space. Procedia Comput. Sci. 57,
41–48 (2015)
4. Ahmed, E., Crystal, M., Dunxu, H.: Skin detection—a short tutorial. Encyclopedia of Biometrics,
pp. 1218–1224. Springer, Berlin, Heidelberg (2009)
5. Kolkur, S.: Human skin detection using RGB, HSV and YCbCr colour models. Adv. Intell. Syst.
Res. 137, 324–332 (2017)
Lie Detection Using Thermal Imaging
Feature Extraction from Periorbital
Tissue and Cutaneous Muscle

Prajkta Kodavade, Shivani Bhandigare, Aishwarya Kadam, Neha Redekar,


and Kiran P. Kamble

Abstract The contribution addresses the problem of detecting deception during an
interrogation by taking thermal images of the face. When the person is
lying, due to stress, the blood flow in the periorbital tissue and cutaneous
muscles increases, which ultimately raises the skin temperature in the respective
areas. In the proposed work, we have collected a dataset of such thermal images and
developed algorithms to extract features from the generated dataset in order to
train a neural network model, which in turn classifies the input image
or video frame as deception or truth. Using the proposed approach, the obtained F1
scores for baseline, truth, direct lie and indirect lie are 54%, 46%, 67%, respectively,
and the overall accuracy is 60.53%.

Keywords Thermal images · Neural networks

1 Introduction

According to several studies, the differentiation between liars and non-liars is
detected very poorly by ordinary as well as expert people. The well-known method,
i.e., polygraphy, includes different sensors which measure a person's blood pressure,

P. Kodavade (B) · S. Bhandigare · A. Kadam · N. Redekar · K. P. Kamble


Department of Computer Science and Engineering,
Walchand College of Engineering, Sangli, India
e-mail: prajktadkodavade@gmail.com
S. Bhandigare
e-mail: shivani1111bh@gmail.com
A. Kadam
e-mail: kadamaishwarya05@gmail.com
N. Redekar
e-mail: neharedekar99@gmail.com
K. P. Kamble
e-mail: kirankamble5065@gmail.com

respiration activity, etc. Mostly, the polygraph technique succeeds in detecting
lies with an accuracy of 90%. But its major drawbacks are the time taken and the
quality of the experts conducting the test. These drawbacks can be overcome by
using an automated deception detector which detects lies from facial and behavioral
changes. The lie detector uses a thermal image as the input image. It uses the temperature
of the skin captured by the thermal camera, which varies due to the blood flow
resulting from varying emotions. This technique seems promising, as it is
hard to control one's emotions [1]. This change in the blood flow is mostly observed
in the forehead and periorbital regions of the face. This suggests that the relevant
patterns for detecting lies and truth are to be found in these regions.

2 Literature Survey

Rajoub et al. [2] presented a lie detection approach based on thermal features in the region of interest, i.e., the periorbital region, and applied machine learning to the extracted features. Bhowmik et al. [3] proposed a solution for facial feature detection in which features such as the eyes, nose, and mouth, which do not vary with rotation, scale, or image noise, are detected using the Harris interest point detection algorithm. Wu et al. [4] presented thermal face recognition and also learned important features such as the nose, eyes, and mouth from the raw dataset using a CNN architecture; the recognition rate is still affected by conditions such as head rotation, expression variation, and illumination variation. Kyal et al. [5] presented a way to identify a human face from a thermal image efficiently; the feature extraction task was done using histogram plots, and techniques such as object boundary analysis and thresholding are applied to the images to detect the face efficiently. George et al. [2] analyzed the eye-blink count and blink duration of truthful and deceptive responses. Analysis of 50 people over some sample questions showed that both the count and the duration are higher in the case of deception. Responses were grouped on the basis of maximum blink duration and maximum blink count for both lie and truth responses, and responses where no blinking was observed were categorized as a no-blink category.

3 Methodology

The methodology used for the present research includes recording facial skin temperature with a thermal camera and processing the captured images on a mobile device using deep learning techniques.

3.1 Architecture

Figure 1 shows the top-level architecture of the system. A thermal camera is plugged into a smartphone, and video is recorded. The recorded video from the smartphone gallery is transferred to the application and fetched on a local machine where the trained model is located. The model is downloaded from a deep learning server after training with a sufficiently large dataset. The processed input image is given to the script, and the corresponding response is generated using the pre-trained model, which is ultimately sent back to the user via the application.

3.2 Data Acquisition

Data will be acquired using a thermal camera, the Seek Compact Thermal Imager for Android, with a temperature range of −20 to 1000 °C. During each interview session, thermal measurements of the participant's face will be obtained using an Android smartphone with the thermal camera attached. The dataset can be obtained using the following methods:
1. Surveillance of participants and interviewing them.
2. Asking a subject to describe another person.
3. Interviewing different people, who can have different base body temperatures under normal circumstances.

Fig. 1 Architecture of Lie detection system



These varying temperatures can change the heat maps and can also worsen the accuracy of detecting a lie. To get around this issue, the initial few seconds of every recording will be used as the baseline for that person; during this interval, the person sits normally without engaging in any activity or answering. The dataset is based on two profile case studies and a particular mock crime. A total of ten participants were considered for the demo interview, and each interview is further divided into four parts: (1) baseline, (2) true, (3) direct lie, and (4) indirect lie. For testing purposes, we will use the mock crime video.

3.3 Data Processing

Data processing begins with identifying and cropping the subject's face [6]. This is followed by detecting the maximum intensity point in the image (the nasal tip, which is closest to the thermal camera). Taking this as the reference point, the locations of the forehead and eye regions are calculated [7]. These regions are then cropped from the image to build the dataset, which thus consists of forehead and periorbital regions obtained by refining the previously cropped images. The cropped images are stored as the dataset. A sample of the processing is shown in Figs. 2 and 3.
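The cropping step can be illustrated with a short sketch. The following minimal example uses OpenCV and NumPy under stated assumptions: the face bounding box is already known, the frame is a single-channel thermal image, and the offset ratios used to place the forehead and periorbital bands relative to the nasal tip are illustrative values, not parameters taken from this work.

import cv2
import numpy as np

def crop_regions(thermal_frame, face_box):
    # thermal_frame: single-channel (grayscale) thermal image as a NumPy array.
    # face_box: (x, y, w, h) of the detected face.
    x, y, w, h = face_box
    face = thermal_frame[y:y + h, x:x + w]

    # Hottest pixel inside the face crop; treated here as the nasal tip,
    # the point closest to the thermal camera.
    _, _, _, max_loc = cv2.minMaxLoc(face)
    tip_x, tip_y = max_loc

    # Regions located relative to the nasal tip (offset ratios are assumptions).
    eye_top = max(0, tip_y - int(0.35 * h))        # periorbital band above the nose
    eye_bottom = max(0, tip_y - int(0.10 * h))
    forehead_top = max(0, tip_y - int(0.70 * h))   # forehead band higher up
    forehead_bottom = max(0, tip_y - int(0.45 * h))

    periorbital = face[eye_top:eye_bottom, :]
    forehead = face[forehead_top:forehead_bottom, :]
    return forehead, periorbital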

3.4 Experimenting

The data acquired from the above steps is experimented with on a deep learning (DL) machine. The dataset, with its four parts (baseline, true, direct lie, and indirect lie), is

Fig. 2 Periorbital regions to be cropped from the eye region

Fig. 3 The cropped forehead and eye region

uploaded to the DL server. The server has two NVIDIA V100 GPU cards and 128 GB of DDR4 ECC RAM. The model is trained using AlexNet [8] and other algorithms available on the server [9]. We varied the number of epochs and other parameters to improve accuracy. The model is trained on the processed dataset, which contains 22,000 baseline images, 33,912 truth images, and 11,000 images each for direct and indirect lie. The test dataset contains 6752 test images for baseline, 7892 for truth, and 3375 for the direct and indirect lie classes.
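For illustration, a hedged sketch of how an AlexNet classifier could be trained on the four classes with PyTorch/torchvision is given below. The directory layout (thermal_dataset/train), transforms, and hyperparameters are assumptions made for the sketch; the actual training in this work was performed with the tooling available on the DL server.

import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),      # AlexNet expects 224x224 inputs
    transforms.ToTensor(),
])

# Assumed folder layout: one subfolder per class (baseline, truth, direct, indirect).
train_set = datasets.ImageFolder("thermal_dataset/train", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

model = models.alexnet()                # optionally start from ImageNet weights
model.classifier[6] = nn.Linear(4096, 4)  # four output classes

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

for epoch in range(10):                 # the number of epochs was varied in the paper
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()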

4 Dataset

The dataset was acquired using a thermal camera, the Seek Compact Thermal Imager for Android, with a temperature range of −20 to 1000 °C (recordings were made in the evening). The camera features are as follows: image resolution of 640 × 480 pixels, frame rate of 12 FPS, a 256 × 156 thermal sensor, and a 36-degree field of view that works day and night. During each interview session, thermal measurements of the participants' faces were obtained using an Android smartphone with the thermal camera attached. The dataset was obtained by using the following methods:
1. Surveillance of people and interviewing them.
2. Asking a subject to describe another person.
3. Different subjects may have different base body temperatures under usual circumstances.
4. These differences can alter the heat maps and worsen the accuracy of lie detection.
5. To get around this issue, the initial few seconds of every recording are used as the baseline for each person.
6. During this interval, the concerned person sits normally without engaging in any activity or answering.
The dataset is based on two profile case studies and a particular mock crime. A total of ten participants were considered for the demo interview, and each interview is further divided into four parts: true, direct lie, indirect lie, and baseline. For testing purposes, we used the mock crime video.

Fig. 4 Overall structure of the dataset

The subject was given a character profile to learn for 10 min. Four sessions were conducted with each subject: a baseline session, a truth session, a direct lie session, and an indirect lie session. Figure 4 gives the overall distribution of the dataset used in the project, which sums to a total of 41 videos. Using videos of various participants narrows the error zone and thereby widens the scope of the project. Table 1 gives the overall distribution of the questions. The baseline questions consist of general questions such as "What is your name?" and "Which city do you belong to?". The true session consisted of questions where the participant gave genuine answers. The direct lie session consisted of questions where the person lied directly, without hiding the truth or making up a story; this was possible because the person was given the story plot before the session. The indirect lie session consisted of questions where the participants made up a story and lied.

5 Experimental Setup

• Sessions are recorded using the thermal Android app provided with the camera.
• The camera should be placed approximately 20 cm away from the subject’s face.
• Sessions to be conducted in the evening so as to get sharp edges.

Table 1 Distribution of questions


Session Number of questions
Baseline 7
True 50
Direct 25
Indirect 25
Total 107

6 Result Analysis

The trained model is able to distinguish correctly between true and lie, whereas finer distinctions, such as between direct and indirect lie, are not detected reliably.
Tables 2 and 3 show the confusion matrices for classification into the two classes true and lie and for classification into the four classes baseline, true, direct lie, and indirect lie, respectively. Table 2 shows that the overall accuracy of the model when classifying an image as true or lie is 100%. Table 3 shows that the F1 scores for baseline, truth, direct lie, and indirect lie are 54%, 46%, 67%, and 77%, respectively. The overall accuracy based on this is 60.53%.

Table 2 For classification as true or lie

                 True      Lie       Overall classification   Precision
True             27,000    0         27,000                   100%
Lie              0         22,000    22,000                   100%
Overall truth    27,000    22,000    –                        –
Recall           100%      100%      –                        –

Table 3 For classification in four classes, i.e., baseline, true, direct lie and indirect lie

                 Baseline   Truth     Direct lie   Indirect lie   Precision   F1 score
Baseline         8000       8000      0            0              50%         54%
Truth            5390       5610      0            0              51%         46%
Direct lie       0          0         6050         4950           55%         67%
Indirect lie     0          0         1000         10,000         90.90%      77%
Overall truth    13,390     13,610    7050         14,950         –           –
Recall           59.74%     41.22%    85.816%      66.89%         –           –

7 Conclusion

We proposed a solution to detect lies in an investigation interview with minimal physical intervention and good accuracy. Our solution works sufficiently well for classifying lie versus not lie, whereas a clear classification between direct and indirect lie is less accurate. Additional cues, such as eye blinking or looking down, can be taken into consideration to improve the accuracy of the classification between direct and indirect lie.

Acknowledgements We thank S. H. Bhandari, Minal Parchand, Pankhudi Bhonsle and Komal Kotyal for building the concrete foundation that helped us accomplish this work, and the Department of Computer Science and Engineering, WCE, Sangli for continuous support and valuable guidance.

References

1. Marzec, M., Koprowski, R., Wrobel, Z.: Method of face localization in thermograms. Biocybern. Biomed. Eng. (2014)
2. Rajoub, B.A., Zwiggelaar, R.: Thermal facial analysis for deception detection. IEEE Trans. Inf. Forensics Secur. 9(6), 1015–1023 (2014)
3. Bhowmik, M.K., Shil, S., Saha, P.: Feature points extraction of thermal face using Harris interest point detection. In: International Conference on Computational Intelligence: Modeling Techniques and Applications (CIMTA) (2013)
4. Wu, Z., Peng, M., Chen, T.: Thermal face recognition using convolutional neural network. In: 2016 International Conference on Optoelectronics and Image Processing (2016)
5. Kyal, C.K., Poddar, H., Reza, M.: Detection of human face by thermal infrared camera using MPI model and feature extraction method. In: 2018 4th International Conference on Computing Communication and Automation (ICCCA) (2018)
6. Latif, M.H., Md. Yusof, H., Sidek, S.N., Rusli, N.: Texture descriptors based affective states recognition—frontal face thermal image. In: 2016 IEEE EMBS Conference on Biomedical Engineering and Sciences (IECBES) (2016)
7. Abd Latif, M.H., Md. Yusof, H., Sidek, S.N., Rusli, N.: Implementation of GLCM features in thermal imaging for human affective state detection. In: 2015 IEEE International Symposium on Robotics and Intelligent Sensors (2015)
8. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105. Curran Associates, Inc. (2012)
9. Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (2017)
Voting Classification Method with PCA
and K-Means for Diabetic Prediction

Anupama Yadav, Harsh K. Verma, and Lalit Kumar Awasthi

Abstract Data mining can be defined as a technology by which valuable information is extracted from massive volumes of data; large patterns can be explored and analyzed in big databases using statistical and artificial intelligence techniques. The goal of this research work is to predict diabetes accurately with machine learning algorithms such as PCA, k-means, random forest, multilayer perceptron (MLP), and naive Bayes. The diabetes prediction model has several steps, such as data preprocessing, feature extraction with the help of PCA, and classification with a voting classifier. The fundamental focus of this research is to improve prediction accuracy, and to this end a voting classifier is introduced for diabetes prediction.

Keywords Diabetes prediction · PCA · K-means · Voting · Logistic regression

1 Introduction

Diabetes is a common chronic disease that severely affects human health. Its main feature is an increase of the blood sugar level above the normal range, and its main causes are defective insulin secretion or impaired genetic effects. In this disease, the human body either does not generate sufficient insulin or becomes inefficient at using the insulin it generates. If the disease is not treated in time, it can damage a person's nerves, eyes, kidneys, and other organs. The first type generally affects youngsters below

A. Yadav (B) · H. K. Verma · L. K. Awasthi


Department of Computer Science and Engineering, Dr. B R Ambedkar National Institute of
Technology Jalandhar, Jalandhar, Punjab, India
e-mail: anupamayadav18june@gmail.com
H. K. Verma
e-mail: vermah@nitj.ac.in
L. K. Awasthi
e-mail: director@nitj.ac.in


thirty years of age. Some medical indications of this disease are increased thirst, frequent urination, high blood sugar levels, etc. It is not possible to cure this disease with oral medicines alone; in many cases, insulin is given to the body through injection. The second type of the disease mainly occurs in middle-aged and old people, in whom it is mainly caused by obesity, high blood pressure, dyslipidemia, arteriosclerosis, and other maladies.
Future trends can be predicted, and hidden patterns discovered, using data mining. Data mining offers various methods for extracting relevant information, including classification, clustering, association rules, regression, and outlier detection. The technology of data mining is gaining popularity in the healthcare sector and is a leading tool set for clinical databases. Nowadays, the use of data mining algorithms for generating clinical predictions has become quite common. Over the past few years, many researchers have theorized that medically assistive support and prediction patterns can be acquired from crucial patient data. Most research in the area of disease prediction analysis is focused on increasing the accuracy rate. The data should be in an understandable format for carrying out the analysis.

2 Literature Survey

A database of diabetes patients was employed in [1] to analyze the diabetes malady. The use of KNN and Bayesian algorithms was suggested and carried out on the dataset of diabetes patients, and several diabetes features were extracted for the analysis of these algorithms to predict the disease [1].
A risk prediction model for type-2 diabetes was recommended in [2]. The model was designed on the basis of ensemble learning: the selection of optimal attributes was done with RF-WFS, and the XGBoost (extreme gradient boosting) ensemble classifier was used. Many performance parameters were compared to validate the efficiency of the recommended classifiers, which showed more accurate prediction results than other existing classifiers [2].
A medical case was considered using electronic health records from different sources related to diabetes patients. The naive Bayes and SVM data mining algorithms were employed for classification, and the analysis focused on diabetes prediction from health records. The superior algorithm for predicting diabetes was identified by comparing the precision of the two algorithms [3].
GMM, support vector machine, artificial neural network, ELM, and logistic regression were the data mining techniques applied to diagnose diabetes in its early phase. The experimental outcomes demonstrated that ANN achieved better accuracy than the other methods [4].

The experiment on WEKA tool was conducted using four classifiers named SVM,
random forest, simple cart, and naive Bayes for diabetes prediction. The comparison
of these classifiers was performed in terms of exactness value, time for training and
testing. The classifier measure of accuracy was another performance technique that
had used for evaluation. The SVM classifier was performed better than naive Bayes,
RF, and simple cart for predicting the diabetes. The results acquired after the testing
demonstrated the efficiency of suggested model [5].
The predicting models were set up from the diagnostic medical datasets for
extracting the knowledge. This extracted knowledge had proved efficient for diabetic
prediction in patients. The diabetic mellitus was predicted using SVM, naive Bayes,
KNN, and decision tree (C4.5) machine learning algorithms on data related to
youngsters. The greater accuracy had obtained from decision tree (C4.5) [6].
The prediction of diabetes was completed using ANN, K-means, and RF methods.
The highest accuracy had achieved from the artificial neural network that was eval-
uated 75.7%. It was helpful to aid medical professionals for making decisions for
treatment [7].
A new model based on data mining to diagnose and predict diabetes disease in
early stage is given. This algorithm could be utilized for different types of data. The k-
means was simple, and it was very responsive to original locations of cluster centers.
This phenomenon determined the ultimate clustering result. The adequate clustered
dataset was received from it for the logistic regression model. To improve the accuracy
rate of k-means along with logistic regression was the main motive of this work. It
was evaluated from the results that the accuracy of both the algorithms mentioned
above was improved by principal component analysis. The recommended model
included three algorithms. These algorithms were identified as principal component
analysis (PCA), k-means for clustering, and logistic regression for classification. The
tested outcomes depicted that PCA algorithm made improvements in the k-means
approach. This k-means algorithm showed 25% improvement in accuracy rate while
logistic regression showed 1.98% more accuracy rate [8].

3 The Proposed Diabetes Prediction Model

Following are the various phases of diabetes prediction (Fig. 1):
1. Dataset input: The diabetes dataset obtained from the UCI repository is used for this prediction. The dataset comprises 768 female patients from the Arizona, USA population who were examined for diabetes. The dataset has a total of 8 attributes, namely pregnancies (preg), glucose (plas), blood pressure (pres), skin thickness (skin), insulin, BMI, diabetes pedigree function, and age, with one target class (0 or 1).
2. Attribute selection: In this phase, PCA is utilized to decrease the dimensionality of the data; it selects the most relevant attributes from the large number of attributes. The selection

Fig. 1 Diabetes disease prediction model (flowchart: input UCI data → PCA feature reduction → k-means clustering → voting classification → analysis of accuracy, precision, and recall)

of relevant attributes may lead to a reduction in execution time, as high-dimensional data is extremely complex to process due to inconsistencies in the features.
3. Clustering: In this phase, the k-means clustering algorithm is used to support better classification. Clustering is the process in which similar objects are grouped together according to their characteristics. K-means clustering is one of the least complex algorithms; it uses an unsupervised learning method to resolve known clustering problems.
Steps involved in k-means clustering are:

Step 1 Randomly set up k points called the group centroids.
Step 2 The elbow curve can be used to determine the value of k (the number of clusters).
Step 3 Calculate the distance between the data points and the group centroids using the Euclidean distance formula.
Step 4 On the basis of minimum distance, data points are allocated to the nearest clusters.
Step 5 Calculate the mean value, including the new data points, for every cluster to find the new centroid.

Table 1 Models performance parameters


Model Precision Recall Accuracy
PCA + logistic regression 71 72 72.07
PCA + Naive Bayes 69 69 69.48
PCA + SVM 71 72 72.07
PCA + K-means + logistic regression 97 97 97.40
Proposed method (PCA + K-means + voting classifier) 98 98 98.05

Table 2 Comparison from previous work


S. No. Author Model Accuracy
1 Zhu et al. [8] PCA + K-means + logistic regression 97.40
2 Our approach PCA + K-means + voting classifier 98.05

Step 6 Repeat the past two steps iteratively until the group centroids stop changing their positions.

4. Classification: In this stage, a voting classification algorithm is utilized for the diabetes forecast. Voting is one of the easiest methods of consolidating the predictions from numerous machine learning algorithms. This voting classifier is a combination of random forest, naive Bayes, and multilayer perceptron, as sketched below.
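As an illustration of the overall pipeline, the following scikit-learn sketch combines PCA, k-means, and a hard-voting ensemble of random forest, naive Bayes, and MLP. The number of principal components, the use of the cluster label as an extra feature, and the hyperparameters are assumptions made for the sketch; X and y stand for the preprocessed UCI diabetes features and labels.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def fit_predict_pipeline(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

    # Feature reduction with PCA (component count assumed).
    pca = PCA(n_components=5).fit(X_tr)
    X_tr_p, X_te_p = pca.transform(X_tr), pca.transform(X_te)

    # Cluster the reduced data and append the cluster label as an extra feature.
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X_tr_p)
    X_tr_f = np.column_stack([X_tr_p, kmeans.predict(X_tr_p)])
    X_te_f = np.column_stack([X_te_p, kmeans.predict(X_te_p)])

    # Hard-voting ensemble of RF, naive Bayes, and MLP.
    voter = VotingClassifier(
        estimators=[("rf", RandomForestClassifier()),
                    ("nb", GaussianNB()),
                    ("mlp", MLPClassifier(max_iter=1000))],
        voting="hard")
    voter.fit(X_tr_f, y_tr)
    return accuracy_score(y_te, voter.predict(X_te_f))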

4 Model Evaluation and Result Comparison from Previous Work

This work focuses on diabetes prediction. The data is taken from the UCI repository; the dataset has 8 attributes and is of multivariate type. Different methods are implemented and compared in terms of parameters such as accuracy, precision, and recall. In the proposed method, PCA, k-means, and voting classification approaches are implemented for diabetes prediction; the voting classification method is a combination of multilayer perceptron (MLP), random forest, and naive Bayes classifiers. We have applied the following models on the dataset, whose results are given in Tables 1 and 2.

5 Conclusion and Future Scope

In this paper, the various steps involved in diabetes prediction are described. PCA is used for feature reduction, and the k-means clustering algorithm is used to cluster similar and diverse types of data. Finally, the voting classification

method is implemented for diabetic and non-diabetic prediction. It is observed that the proposed method has higher precision, accuracy, and recall values than the existing methods. The techniques proposed in previous research works use different sets of algorithms, such as k-means, SVM, logistic regression, and other machine learning algorithms, for prediction. The proposed algorithm is a combination of PCA, k-means, and voting classification. The proposed model gives an accuracy of about 98.05%, which is better than the accuracies previously achieved in the papers mentioned above. In future, the proposed method can be further extended by utilizing transfer learning techniques for diabetes prediction.

References

1. Shetty, D., Rit, K., Shaikh, S., Patil, N.: Diabetes disease prediction using data mining. In:
2017 International Conference on Innovations in Information, Embedded and Communication
Systems (ICIIECS), pp. 1–5, Coimbatore (2017)
2. Xu, Z., Wang, Z.: A risk prediction model for type 2 diabetes based on weighted feature selection
of random forest and XGBoost ensemble classifier. In: 2019 Eleventh International Conference
on Advanced Computational Intelligence (ICACI), pp. 278–283, Guilin, China (2019)
3. Raj, R.S., Sanjay, D.S., Kusuma, M., Sampath, S.: Comparison of support vector machine
and Naïve Bayes classifiers for predicting diabetes. In: 2019 1st International Conference on
Advanced Technologies in Intelligent Control, Environment, Computing and Communication
Engineering (ICATIECE), pp. 41–45, Bangalore, India (2019)
4. Komi, M., Li, J., Zhai, Y., Zhang, X.: Application of data mining methods in diabetes prediction.
In: 2017 2nd International Conference on Image, Vision and Computing (ICIVC), pp. 1006–
1010, Chengdu (2017)
5. Mir, A., Dhage, S.N.: Diabetes disease prediction using machine learning on big data of health-
care. In: 2018 Fourth International Conference on Computing Communication Control and
Automation (ICCUBEA), pp. 1–6, Pune, India (2018)
6. Faruque, M.F., Asaduzzaman, Sarker, I.H.: Performance analysis of machine learning techniques
to predict diabetes Mellitus. In: 2019 International Conference on Electrical, Computer and
Communication Engineering (ECCE), pp. 1–4, Cox’sBazar, Bangladesh (2019)
7. Alam, T.M., Iqbal, M.A., Ali, Y., Wahab, A., Abbas, Z.: A model for early prediction of diabetes.
Inform. Med. Unlocked 16, 100204 (2019)
8. Zhu, C., UwaIdemudia, C., Feng, W.: Improved logistic regression model for diabetes prediction
by integrating PCA and K-means techniques. Inform. Med. Unlocked 17 (2019)
Hybrid Model for Heart Disease
Prediction Using Random Forest
and Logistic Regression

Hemant Kumar Sharma and Amrit Lal Sangal

Abstract Data mining is a method by which valuable data is mined from raw data, and in prediction analysis future outcomes are forecast from existing information. It facilitates more useful, efficient, and economical management of health resources through the recognition of risks, the prediction of disease in people, and the prediction of the length of hospital stays. This research work deals with the prediction of heart disease, which involves several steps, including preprocessing, feature selection, and classification. A hybrid scheme based on random forest (RF) and logistic regression is introduced: features are selected using RF, and logistic regression (LR) is implemented for classification. The performance of the recommended model is analyzed in terms of accuracy, precision, and recall. The accuracy obtained in predicting heart disease with this model is evaluated at 95.08%.

Keywords Heart disease prediction · Naive Bayes · Random forest · Logistic regression

1 Introduction

The use of data mining technology in the healthcare sector has revolutionized the task of disease prediction, and its role in heart disease prediction is quite significant. At present, many data mining techniques are being used to detect and extract valuable information from medical datasets with minimal user input and effort. Over time, researchers have found several methods for implementing data mining in the medical domain so that different types of heart diseases can

H. K. Sharma (B) · A. L. Sangal


Department of Computer Science and Engineering, Dr. B R Ambedkar National Institute of
Technology Jalandhar, Jalandhar, Punjab, India
e-mail: happysharma602@gmail.com
A. L. Sangal
e-mail: sangalal@nitj.ac.in


be predicted accurately. The performance of data mining differs with the technique being used and the features selected. In general, clinical datasets in the medical domain are redundant and unpredictable, so prior and appropriate preparation is needed before applying data mining approaches. Following are the various techniques commonly used in data mining:
a. Association: A well-known data mining technique in which the relationship among particular items of similar transactions is used to discover a certain pattern. For instance, the relationship of the various attributes used for analysis in heart disease prediction is learned through the association technique, using all the risk factors needed for disease prediction [1].
b. Classification: Another classic data mining technique, designed on the basis of machine learning, is classification. Every object in the dataset is categorized into one of a predefined set of classes through classification, using various mathematical techniques.
c. Clustering: Objects with similar properties are grouped together into meaningful clusters by an automatic approach known as clustering. The clustering technique also defines the classes and places the objects in them; the classification objects are then assigned to the predefined classes. For instance, it is possible to cluster the list of patients with similar risk factors when predicting heart disease, so that patients with high blood sugar and relevant risk factors can be separated [2].
d. Prediction: The relation between independent variables and dependent variables is discovered by another data mining technique called prediction.

2 Literature Survey

There are several factors responsible for any sort of heart disease. The naive Bayesian (NB) algorithm is considered, which forms the Smart Heart Disease Prediction (SHDP) system; an accuracy of around 89% is shown by the proposed approach [3].
To resolve heart disease prediction issues, ensemble techniques are used; an accuracy of 85.48% is achieved by the proposed technique [4].
To improve the accuracy of predicting cardiovascular diseases using a model of hybrid random forest with a linear model, an improved accuracy level of around 88.7% was achieved in that research [5].
To predict heart disease, research has also focused on adapting the SVM and apriori algorithms. Medical profiles based on various factors were collected and used, and the patients who were more likely to get heart disease were predicted [6].
For the medical fraternity and patients, the use of appropriate technology support proved to be highly beneficial, and data mining techniques could be used for this purpose. The accuracies of naive Bayes and decision tree were compared in this research [7].

To identify risk in a highly accurate manner, a heart disease prediction system was proposed using data mining techniques. Frequent pattern growth association mining was applied to the patient dataset to provide strong association rules. With this method, doctors can explore the data and predict heart disease accurately [8].

3 The Proposed Heart Disease Prediction Model

See Fig. 1.

3.1 Dataset Selection

The Cleveland dataset, which has 14 attributes, has been widely used for heart disease prediction.

Fig. 1 Heart disease model (flowchart: input UCI data → preprocess to remove missing and redundant values → random forest feature selection followed by k-means → logistic regression classification → analysis of accuracy, precision, and recall)

3.2 Data Preprocessing

Data preprocessing is performed so that data mining techniques can be applied, completeness is introduced, and a meaningful analysis can be achieved on the data. Providing clean, noise-free data to the feature selection process improves the performance of the training model.

3.3 Feature Selection

Feature selection picks a subset of highly discriminative features for diagnosing the disease, i.e., the features that best distinguish the available classes. In the proposed method, the RF model is used for feature selection: the RF model uses 100 estimators and generates a tree structure over the most relevant features, selecting the features that are most relevant or important for heart disease prediction. After that, the selected features are given to k-means for clustering; two clusters are formed, as our target variable has two classes, yes or no. A sketch of the selection step follows.
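A hedged sketch of this feature-selection step with scikit-learn is shown below. The use of SelectFromModel with a median importance threshold is an assumption about how the "most relevant" features are kept; X and y denote the preprocessed Cleveland attributes and target.

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

def select_heart_features(X, y):
    # Forest with 100 estimators, as described above.
    rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)
    # Keep the attributes whose importance is above the median (threshold assumed).
    selector = SelectFromModel(rf, prefit=True, threshold="median")
    return selector.transform(X), selector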

3.4 Classification

To categorize the given features for disease prediction, the selected features are mapped to the training model, where each separate class represents a heart disease outcome. The logistic regression model is applied for the classification and takes the k-means output as input. In this research work, two classes are defined, heart disease and no heart disease, i.e., which persons have a probability of heart disease and which do not.
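One plausible reading of this stage is sketched below: the cluster assignment produced by k-means on the selected features is appended as an extra input to logistic regression, which then predicts heart disease versus no heart disease. This wiring is an assumption; the text above only states that logistic regression takes the k-means output as input.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def classify(X_selected_train, y_train, X_selected_test):
    # Two clusters, matching the two target classes (yes/no).
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X_selected_train)
    train_in = np.column_stack([X_selected_train, kmeans.predict(X_selected_train)])
    test_in = np.column_stack([X_selected_test, kmeans.predict(X_selected_test)])
    # Logistic regression on the selected features plus the cluster label.
    clf = LogisticRegression(max_iter=1000).fit(train_in, y_train)
    return clf.predict(test_in)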

4 Model Evaluation and Result Comparison from Previous Work

A variety of models, such as decision tree, naive Bayes (NB), multilayer perceptron (MLP), and an ensemble of random forest (RF), NB, and MLP, are applied on the dataset. The results of these models are compared in terms of accuracy, precision, and recall. The analysis shows that the accuracy of the proposed model is 95.08%, the maximum among the compared models for heart disease prediction. The dataset is divided in a 60:40 ratio: 60% of the dataset is used for training and the remaining 40% for testing (Fig. 2; Tables 1 and 2).

Fig. 2 Accuracy comparison of models

Table 1 Parameters based on performance of models


Model Precision Recall Accuracy
Decision tree 75 75 75.41
Naive Bayes 84 84 83.61
Multilayer perceptron 85 84 83.61
(NB + RF + MLP) 86 85 85.25
Proposed method 95 95 95.08

Table 2 Comparison from previous work

S. No.   Author                               Model                                            Accuracy
1        Anjan Nikhil Repaka et al.           Naive Bayes [3]                                  89.77
2        C. Beulah Christalin Latha et al.    Ensemble classification [4]                      85.48
3        Our approach                         Proposed random forest and logistic regression   95.08

5 Conclusion and Future Scope

Heart disease is a term that covers any disorder related to the heart; problems involving the blood vessels, the circulatory system, and the heart are defined as cardiovascular disease. This work shows that heart disease prediction is very challenging because of the large number of features involved. Various models were tested for heart disease prediction, such as decision tree, naive Bayes, multilayer perceptron, and an ensemble classifier. A novel model in which random forest and logistic regression are integrated is introduced for the prediction: feature selection is performed using random forest, and logistic regression carries out the classification.

The recall, accuracy, and precision obtained from the proposed model are computed as approximately 95%. In future, the proposed model can be further improved using deep learning methods.

References

1. Duff, F.L., Muntean, C., Cuggia, M., Mabo, P.: Predicting survival causes after out of hospital
cardiac arrest using data mining method. In: Medinfo, pp. 1256–1259 (2004)
2. Frawley, W.J., Piatetsky-Shapiro, G., Matheus, C.J.: Knowledge discovery in databases: an
overview. AI Mag. 13(3), 57–57 (1992)
3. Repaka, A.K., Ravikanti, S.D., Franklin, R.G.: Design and Implementing Heart Disease
Prediction Using Naives Bayesian. IEEE (2019)
4. Latha, C.B.C., Carolin Jeeva, S.: Improving the accuracy of prediction of heart disease risk based
on ensemble classification techniques. Inform. Med. Unlocked 16, 100203 (2019)
5. Mohan, S., Thirumalai, C., Srivastava, G.: Effective heart disease prediction using hybrid
machine learning techniques. IEEE Access 7
6. Sowmiya, C., Sumitra, P.: Analytical Study of Heart Disease Diagnosis Using Classification
Techniques. IEEE (2017)
7. Priyanka, N., Kumar, P.R.: Usage of data mining techniques in predicting the heart diseases—Naïve Bayes and decision tree. In: 2017 International Conference on Circuit, Power and Computing Technologies (ICCPCT), pp. 1–7, Kollam (2017)
8. Chauhan, A., Jain, A., Sharma, P., Deep, V.: Heart disease prediction using evolutionary
rule learning. In: 2018 4th International Conference on Computational Intelligence and
Communication Technology (CICT), pp. 1–4 (2018)
Detection of Android Malware Using
Machine Learning Techniques

Sonal Pandey, C. Rama Krishna, Ashu Sharma, and Sanjay Sharma

Abstract With the increase in popularity of the internet and the Android operating system, the number of active internet users and their daily activity on Android devices is also increasing, which is why malware writers are increasingly targeting Android devices. Rapidly evolving malware is a major issue, and detection of Android malware is required to secure the system. Signature-based technologies work efficiently for known malware but fail to detect unknown or new malware. Academia is continuously working on machine learning and deep learning techniques to detect advanced malware in today's scenario. For machine learning, the feature vector and a sufficient dataset are very important. In this paper, we develop and implement an approach for the detection of unknown malware with a high detection rate.

Keywords Malware · Metamorphic malware · Android · Machine learning

1 Introduction

In recent years, android has overtaken many other mobile operating systems to
become one of the most popular and versatile mobile platforms in the world. Inter-
national Data Corporation (IDC) shared a report on global market share [1] for the

S. Pandey (B) · C. Rama Krishna


NITTTR Chandigarh, Chandigarh, India
e-mail: pandey.sonal88@gmail.com
C. Rama Krishna
e-mail: rkc@nitttrchd.ac.in
A. Sharma
Mindtree Hyderabad, Hyderabad, India
e-mail: ashu.abviiitm@gmail.com
S. Sharma
C3i, IIT Kanpur, Kanpur, India
e-mail: sanjaysr@iitk.ac.in


smartphone operating systems; it shows that in the third quarter of 2018, 86.8% of the total market was held by the Android operating system. Developers prefer Android over other smartphone operating systems for developing applications because it is completely open source. On the other hand, users opt for Android smartphones due to the availability of low- to high-end models, ease of use, customization, a high level of multitasking, custom ROMs, support for a large number of applications, etc.

1.1 Android Background

Google developed the Android operating system, an open-source mobile operating system based on the Linux kernel, released under the Apache v2 open-source license. The following sections provide an overview of the Android system architecture, benign and malware applications, Android tools and techniques to distinguish them, and previous research work done in this field.

1.2 Malware and Their Types

A software program that is intended to attack a system without user permission or to take unauthorized actions is recognized as malicious software [2]. As new variants of malware are introduced every day by malware developers, malware detection is becoming a difficult task. Malicious software is a term used for computer viruses, spyware, trojans, worms, and so on. These are big threats in today's digital world because highly skilled hackers are increasingly using customized malware to disrupt industries and to carry out military espionage [3].

1.3 Malware Analysis Approaches

The detection process is carried out using signature-based, heuristic, normalization, and machine learning techniques [4], which in turn rely on static, dynamic, or hybrid analysis. The analysis approach describes how detection techniques gather the information that is then used for detecting malicious software.
Static Analysis: Static analysis is employed to extract features from code, which is gathered by disassembling the programs with a disassembler tool and is further used to distinguish between malware and benign samples [5].
Dynamic Analysis: Dynamic analysis, also referred to as behavioral analysis, involves executing the malicious program and monitoring its behavior, system interaction, and effect on the host machine [6].

Hybrid Analysis: Hybrid analysis includes both static and dynamic approaches for malware analysis. It first inspects the malware code by static examination, followed by a dynamic analysis approach, for a more complete examination [7].

1.4 Malware Detection Techniques

The purpose of detection methods is to study a program's behavior and verify whether it is malicious or benign. Robust malware detection relies upon the capability of handling obfuscated malware efficiently [2]. Two commonly used obfuscation techniques in the generation of second-generation malware are polymorphism and metamorphism. To battle threats and attacks from malware, antimalware software is created, which primarily relies on the presumption that the form of malware does not change considerably. The following are the techniques for malware detection:
1. Signature-Based: The signature-based technique is an easy and effective way of detecting known malware [8]. To combat threats and attacks from malware, antivirus companies use signature-based techniques: unique byte sequences are extracted when the malware is classified, and these are used as a signature.
2. Heuristic-Based: In the heuristic-based detection technique, artificial intelligence is used together with signature-based detection to enhance efficiency [9].
3. Malware Normalization: In malware normalization, the normalizer takes the obfuscated form of malware, removes the obfuscation applied to the program, and creates a normalized executable.
4. Machine Learning: In the last couple of years, machine learning techniques have been gaining popularity for malware detection. Tom Mitchell describes machine learning as the study of computer algorithms that improve automatically through experience [10].

2 Background Study

Carrying out a literature review is extremely important in any research project, as it establishes the need for the work. To assess the problem and the proposed solution, several research papers and related books from international conferences, journals, and symposia were reviewed.
Schultz et al. [11] introduced machine learning for malware detection; the authors utilized program executable (PE) features, byte n-grams, and strings for feature extraction [11]. The classifiers used by the authors for training and testing are Ripper, naive Bayes, and multi-naive Bayes. Thus, since 2001, machine learning has played a crucial role in unknown malware detection.

Allix et al. [12] introduced a novel approach in 2014 to extract the control flow graph from the application program, which is a more expressive representation than n-grams. The authors used a sizeable dataset (over 50,000) of Android applications and evaluated machine learning classifiers, viz. random forest, J48, LibSVM, and JRip, using ten-fold cross-validation.
Feizollah et al. [13] showed the effectiveness of explicit and implicit Intents for Android malware detection. The evaluation was done on 5560 malware samples and 1846 benign samples. They achieved 91% accuracy by utilizing Android Intents, 83% using Android permissions, and, by combining the two, a detection rate of 95.5% [13].
Sun et al. [14] introduced SigPID, which is based on permission analysis to detect malicious applications. The authors extracted 135 permissions from the dataset but used only 34 permissions (25% of the total) to differentiate between malicious and benign applications [6]. They used a support vector machine (SVM) for model training and report an accuracy of 93.62% within the dataset and 91.4% for unknown malware [14].
Tao et al. [15] studied hidden patterns of malware in real-world Android applications. The authors extracted sensitive APIs that are utilized in malware and implemented an automatic malware recognition system to detect unknown Android malware [15]. They conducted a comprehensive study using 31,185 benign and 15,336 malware samples and obtained an F1 score of 98.24.
Rashidi et al. [9] introduced an Android resource-utilization risk assessment called XDroid. They utilized the Drebin malware dataset and confirmed that their methodology could achieve up to 82% precision [9].
Zhu et al. [16] proposed a highly efficient and low-cost approach that extracts permissions and sensitive APIs as features and uses an ensemble rotation forest for model training. The authors used 2130 samples to train the model and obtained 88.26% detection accuracy with 88.40% sensitivity at a precision of 88.16%. Opcodes play an important role in malware detection; Sanjay et al. [17] used opcode frequency for malware detection in their approach, with the Fisher score feature selection algorithm for relevant feature selection. The authors used several classifiers available in the Weka machine learning tool and obtained almost 100% detection accuracy.
Recently, Ashu et al. [17] examined five classifiers on the Drebin dataset using opcode occurrence as a feature and obtained an accuracy of 79.27% with the functional tree classifier for malicious application detection.
Sahin et al. [18] proposed a permission-based Android malware framework to recognize malicious applications. In contrast to other investigations, the authors proposed a permission-weight approach; k-nearest neighbor (KNN) and naive Bayes (NB) algorithms are then used, achieving 90.76% accuracy. According to the authors, the proposed approach gives better results than the others.

3 Proposed Methodology

In this section, we describe our methodology for the detection of malicious Android applications using machine learning. Figure 1 shows the structure of our proposed approach, in which we perform the following steps:
• Dataset collection
• Feature extraction
• Feature selection
• Classification of malware and benign applications

3.1 Dataset Collection

We collect malicious and benign APKs of Android applications from AndroZoo [7], a growing repository of Android applications collected from various sources, including the Google Play store marketplace. The dataset we use for the analysis contains 15,000 malware and 15,000 benign Android application package (APK) files. We also check the secure hash algorithm (SHA) value of the applications to ensure that each sample in the analysis is unique.

Fig. 1 Flow chart of our approach

3.2 Feature Extraction

In this phase, we extract the features from the dataset that we have collected. In this work, we perform static analysis of Android applications to identify malicious applications. For the extraction of features, we use the Apktool and Androguard [19] reverse engineering tools. We extract seven categories of features, namely requested Android Open Source Project (AOSP) permissions, requested third-party permissions, providers, activities, receivers, services, and opcodes, all based on their occurrence in the application. During the initial stage of feature extraction, we extract a huge number of features in each category and then filter the features with the top frequency of occurrence in the dataset. Figures 2, 3, 4, 5, 6, 7 and 8 show the top 20 features in our dataset for each category (an extraction sketch is given after the feature list below). The information extracted as features is as follows:
1. Permissions: Permissions are used to protect the privacy of an Android user, and some applications also need permission to access users' sensitive data such as short message service (SMS) messages, contacts, etc. Some applications also request third-party permissions which are not defined in the Android Open Source Project [20]. The combination of permissions sometimes reflects malicious behavior. Therefore, we extract two types of permissions as features:
• AOSP and third-party permissions. Figure 2 shows a comparison of the top 20 Android Open Source Project permissions in malware and benign applications, and Fig. 3 shows a comparison of the top 20 third-party permissions.

Fig. 2 Comparison graph of top 20 AOSP permission in malware and benign



Fig. 3 Comparison graph of top 20 third-party permission in malware and benign

Fig. 4 Comparison graph of top 20 activity in malware and benign

2. Activity: An activity is a crucial component of an Android application, and the way activities are launched and put together is an essential part of the platform's application model. Figure 4 shows a comparison of the top 20 activities in malware and benign applications.
3. Opcodes: Opcodes play an essential role in the execution of an application [5]. During the literature review, we found that, for static analysis, operational codes are basic building blocks of application execution.
Figure 5 shows a comparison of the top 20 opcodes present in malware and benign applications.

Fig. 5 Comparison graph of top 20 opcodes in malware and benign

Fig. 6 Comparison graph of top 20 service in malware and benign



Fig. 7 Comparison graph of top 20 provider in malware and benign

Fig. 8 Comparison graph of top 20 receivers in malware and benign

4. Service: A service is an application component that can execute long-running operations in the background, and it does not provide a user interface [1]. Figure 6 shows a comparison of the top 20 services present in malware and benign applications.
5. Provider: A provider is a component of an Android application that manages access to a central repository of data and often provides a user interface (UI) for working with the data. Figure 7 shows a comparison of the top 20 provider features present in malware and benign applications.

6. Receiver: Receivers answer broadcast messages from other applications or from the system itself; these messages are sometimes called events or intents. Figure 8 shows a comparison of the top 20 receiver features present in malware and benign applications.
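A hedged sketch of this extraction step is shown below. It assumes Androguard's documented AnalyzeAPK interface (Apktool output could be parsed instead); the prefix test used to split AOSP from third-party permissions and the per-APK opcode counting are simplifications for illustration.

from collections import Counter
from androguard.misc import AnalyzeAPK

def extract_features(apk_path):
    # AnalyzeAPK returns the APK object, the list of dex files, and an analysis object.
    apk, dex_list, analysis = AnalyzeAPK(apk_path)

    perms = apk.get_permissions()
    features = {
        "aosp_permissions": [p for p in perms if p.startswith("android.permission.")],
        "third_party_permissions": [p for p in perms if not p.startswith("android.permission.")],
        "activities": apk.get_activities(),
        "services": apk.get_services(),
        "receivers": apk.get_receivers(),
        "providers": apk.get_providers(),
    }

    # Opcode occurrence counts over all methods in all dex files.
    opcodes = Counter()
    for dex in dex_list:
        for method in dex.get_methods():
            for ins in method.get_instructions():
                opcodes[ins.get_name()] += 1
    features["opcodes"] = opcodes
    return features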

3.3 Features Selection

During the feature extraction phase, we extract a total of 696 features. However, the efficiency of a machine learning model decreases with a very large number of features, which also increases the time needed to train and test the model. We therefore apply information gain and the correlation coefficient as feature selection algorithms to remove irrelevant features; during the literature review, we found that researchers widely use these feature reduction algorithms [21, 22]. During the feature selection process, we remove those features which either do not contribute to the model's performance or make it worse. In the case of the correlation coefficient, we select 180 features whose ranking score is greater than 0.1189, and in the case of information gain, 106 features are selected whose score is greater than 0.103665.
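A minimal sketch of the two feature-selection passes is given below, assuming the 696 extracted features are arranged in a dense matrix X with labels y (1 = malware, 0 = benign). scikit-learn's mutual_info_classif is used as a stand-in for information gain, and the absolute Pearson correlation of each feature with the label as the correlation score; the thresholds are the ones reported above.

import numpy as np
from sklearn.feature_selection import mutual_info_classif

def select_features(X, y):
    # Information-gain-style ranking.
    ig_scores = mutual_info_classif(X, y)
    ig_mask = ig_scores > 0.103665            # 106 features survived this cut

    # Absolute Pearson correlation of each feature with the label.
    corr_scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    corr_mask = corr_scores > 0.1189          # 180 features survived this cut
    # Constant features yield NaN scores and thus fail the threshold test.

    return X[:, ig_mask], X[:, corr_mask]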

3.4 Classification

This section discusses the machine learning classifiers utilized in our work. For the classification, we use four different supervised machine learning classifiers, namely random forest [4], eXtreme Gradient Boosting (XGBoost) [23], decision tree [24], and k-nearest neighbors (KNN) [25]. These ML classifiers are widely used in the malware detection domain, and one reason to select tree-based classifiers is that they are very robust [20, 26–29]: they perform well on a large variety of problems and capture dependencies in ways linear models cannot. We split the dataset in a 70–30% ratio for training and testing the models, respectively. We also perform parameter tuning during training and testing to obtain higher classifier performance. To analyze the performance of our models, we run experiments using ten-fold cross-validation.
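A hedged sketch of this evaluation protocol is given below: a 70–30 train-test split followed by ten-fold cross-validation with the four classifiers named above. The xgboost scikit-learn wrapper is assumed to be installed, and hyperparameter tuning is omitted here.

from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from xgboost import XGBClassifier

def evaluate(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
    classifiers = {
        "random_forest": RandomForestClassifier(),
        "xgboost": XGBClassifier(),
        "decision_tree": DecisionTreeClassifier(),
        "knn": KNeighborsClassifier(),
    }
    results = {}
    for name, clf in classifiers.items():
        scores = cross_val_score(clf, X_tr, y_tr, cv=10)   # ten-fold cross-validation
        results[name] = (scores.mean(), clf.fit(X_tr, y_tr).score(X_te, y_te))
    return results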

4 Experimental Results

First, we applied ten-fold cross-validation on the dataset with a single feature vector at a time (opcodes, receivers, providers, etc.), and then on the combined features. Finally, we applied ten-fold cross-validation on the dataset with the features selected using information gain and correlation. Table 1 summarizes the accuracy of each classifier with a

Table 1 Experimental results

Features/classifiers                                 Random forest   XGBoost    Decision tree   KNN
Activity                                             0.759036        0.747992   0.757363        0.731928
AOSP                                                 0.911312        0.878179   0.899933        0.873159
Opcodes                                              0.953846        0.920013   0.92838         0.929384
Third-party permissions                              0.723896        0.72423    0.725904        0.691767
Providers                                            0.60107         0.601405   0.60107         0.567603
Receivers                                            0.716867        0.705823   0.716867        0.669344
Service                                              0.727242        0.722557   0.727242        0.683735
Combined all 7 features without feature selection    0.978246        0.985609   0.977912        0.9334
Combined all 7 features with information gain        0.991968        0.988956   0.98929         0.930723
Combined all 7 features with correlation             0.990629        0.985609   0.985609        0.925033
single feature vector, with all features combined without feature selection, and finally with the selected features. Random forest with the combined features selected by information gain gives the maximum accuracy, i.e., 99.19%.

5 Conclusion and Future Scope

We presented a machine learning approach based on seven types of features and two feature selection algorithms, i.e., information gain and the correlation algorithm, to detect and analyze malicious Android apps. Specifically, we used seven types of features, viz. requested AOSP permissions, requested third-party permissions, providers, activities, receivers, services, and opcodes, with the two feature selection algorithms. In our approach, we used four classifiers for classification and obtained 99.1968% accuracy with random forest.
The proposed approach can be extended by analyzing apps dynamically. As future work, another research direction is combining static and dynamic analysis, in which different machine learning classifiers are utilized to analyze both the source code and the dynamic features of applications in a runtime environment.

References

1. International Data Corporation: Smartphone Market Share. https://www.idc.com/promo/smartphone-market-share/os (Nov 2019)
2. Sharma, A., Sahay, S.K.: Evolution and Detection of Polymorphic and Metamorphic Malwares:
A Survey. arXiv:1406.7061 (2014)
3. Stone, R.: A Call to Cyber Arms (2013)

4. Dogru, N., Subasi, A.: Traffic accident detection using random forest classifier. In: 2018 15th
Learning and Technology Conference (L&T), pp. 40–45. IEEE (2018)
5. Sharma, S., Krishna, C.R., Sahay, S.K.: Detection of advanced malware by machine learning
techniques. In: Soft Computing: Theories and Applications, pp. 333–342. Springer (2019)
6. Shabtai, A., Moskovitch, R., Elovici, Y., Glezer, C.: Detection of malicious code by applying
machine learning classifiers on static features: a state-of-the-art survey. Inf. Secur. Tech. Rep.
14(1), 16–29 (2009)
7. Allix, K., Bissyandé, T.F., Klein, J., Le Traon, Y.: AndroZoo: collecting millions of Android apps for the research community. In: Proceedings of the 13th International Conference on Mining Software Repositories, MSR '16, pp. 468–471. ACM, New York, NY, USA (2016). https://doi.org/10.1145/2901739.2903508
8. Griffin, K., Schneider, S., Hu, X., Chiueh, T.C.: Automatic generation of string signatures for
malware detection. In: International Workshop on Recent Advances in Intrusion Detection,
pp. 101–120. Springer (2009)
9. Rashidi, B., Fung, C., Bertino, E.: Android resource usage risk assessment using hidden markov
model and online learning. Comput. Secur. 65, 90–107 (2017)
10. Dietterich, T.G.: Machine learning in ecosystem informatics and sustainability. In: Twenty-First
International Joint Conference on Artificial Intelligence (2009)
11. Schultz, M.G., Eskin, E., Zadok, F., Stolfo, S.J.: Data mining methods for detection of new
malicious executables. In: Proceedings 2001 IEEE Symposium on Security and Privacy. S&P
2001, pp. 38–49. IEEE (2000)
12. Allix, K., Bissyandé, T.F., Jérome, Q., Klein, J., State, R., Le Traon, Y.: Large-scale machine
learning-based malware detection: confronting the "10-fold cross validation" scheme with
reality. In: Proceedings of the 4th ACM Conference on Data and Application Security and
Privacy, pp. 163–166 (2014)
13. Narudin, F.A., Feizollah, A., Anuar, N.B., Gani, A.: Evaluation of machine learning classifiers
for mobile malware detection. Soft. Comput. 20(1), 343–357 (2016)
14. Li, J., Sun, L., Yan, Q., Li, Z., Srisa-an, W., Ye, H.: Significant permission identification for
machine-learning-based android malware detection. IEEE Trans. Industr. Inf. 14(7), 3216–3225
(2018)
15. Tao, G., Zheng, Z., Guo, Z., Lyu, M.R.: Malpat: mining patterns of malicious and benign
android apps via permission-related Apis. IEEE Trans. Reliab. 67(1), 355–369 (2017)
16. Zhu, H.J., You, Z.H., Zhu, Z.X., Shi, W.L., Chen, X., Cheng, L.: Droiddet: effective and
robust detection of android malware using static analysis along with rotation forest model.
Neurocomputing 272, 638–646 (2018)
17. Sharma, A., Sahay, S.K.: An investigation of the classifiers to detect android malicious apps.
In: Information and Communication Technology, pp. 207–217. Springer (2018)
18. Şahin, D.Ö., Kural, O.E., Akleylek, S., Kiliç, E.: New results on permission based static analysis
for android malware. In: 2018 6th International Symposium on Digital Forensic and Security
(ISDFS), pp. 1–4. IEEE (2018)
19. Androguard. https://androguard.readthedocs.io/en/latest/ (Dec, 2019)
20. Milosevic, N., Dehghantanha, A., Choo, K.K.R.: Machine learning aided android malware
classification. Comput. Electr. Eng. 61, 266–274 (2017)
21. Jimenez, J.H., Goseva-Popstojanova, K.: Malware detection using power consumption and
network traffic data. In: 2019 2nd International Conference on Data Intelligence and Security
(ICDIS), pp. 53–59. IEEE (2019)
22. Zhang, Z., Chang, C., Han, P., Zhang, H.: Packed malware variants detection using deep belief
networks. MATEC Web Conf. 309, 02002 (2020)
23. Zhang, Y., Huang, Q., Ma, X., Yang, Z., Jiang, J.: Using multi-features and ensemble learning
method for imbalanced malware classification. In: 2016 IEEE Trustcom/BigDataSE/ISPA,
pp. 965–973. IEEE (2016)
24. Gunnarsdottir, K.M., Gamaldo, C.E., Salas, R.M., Ewen, J.B., Allen, R.P., Sarma, S.V.: A
novel sleep stage scoring system: combining expert-based rules with a decision tree classifier.
In: 2018 40th Annual International Conference of the IEEE Engineering in Medicine and
Biology Society (EMBC), pp. 3240–3243. IEEE (2018)

25. Cunningham, P., Delany, S.J.: k-nearest neighbour classifiers. Multiple Classifier Syst. 34(8),
1–17 (2007)
26. Alam, M.S., Vuong, S.T.: Random forest classification for detecting android malware. In: 2013
IEEE International Conference on Green Computing and Communications and IEEE Internet
of Things and IEEE Cyber, Physical and Social Computing, pp. 663–669. IEEE (2013)
27. Firdausi, I., Erwin, A., Nugroho, A.S., et al.: Analysis of machine learning techniques used in
behavior-based malware detection. In: 2010 Second International Conference on Advances in
Computing, Control, and Telecommunication Technologies, pp. 201–203. IEEE (2010)
28. Kruczkowski, M., Niewiadomska-Szynkiewicz, E.: Comparative study of supervised learning
methods for malware analysis. J. Telecommun. Inf. Technol. (2014)
29. Wang, J., Li, B., Zeng, Y.: Xgboost-based android malware detection. In: 2017 13th Inter-
national Conference on Computational Intelligence and Security (CIS), pp. 268–272. IEEE
(2017)
The Predictive Genetic Algorithm (GA)
Load Management Mechanism
for Artificial Intelligence System
Implementation (AI)

T. Pushpatha and S. Nagaprasad

Abstract The next generation of cloud infrastructure must allow the network to use its
resources more flexibly and effectively. Load balancing is one of the key issues in cloud
computing: it distributes tasks over many nodes to ensure that no single node becomes
overwhelmed or underused. For applications that depend on the cloud almost every day,
the user must be guaranteed that all criteria are fulfilled within a limited span of time
for optimal performance. A genetic algorithm (GA) approach to cloud load balancing is
presented in this article. The urgency of a request is considered as a function of the
population initialization period, and the emphasis is on modeling the system in question;
for real-life situations, additional objectives can be combined into our algorithm. The
suggested method is modeled using a cloud analyst tool, which makes a simulation of the
cloud infrastructure feasible. The results reveal the viability of a quantitative workload
management approach that helps manage working loads with improved use of computational
resources. This article offers a new approach to genetic algorithm (GA) load control; in
order to minimize the difficulty of individual tasks, the algorithm manages the cloud
storage load. The proposed load balancing strategy was evaluated with a cloud analyst
model. The simulation results for a typical sample system show that the suggested
algorithm outperformed existing methods such as FCFS, round robin (RR), and the
stochastic hill climbing (SHC) local search algorithm.

Keywords Cloud computing · Load balancing · OLB · GA

T. Pushpatha (B) · S. Nagaprasad


Department of M.C.A., St. Ann's College, Mehdipatnam, Hyderabad, Telangana, India
e-mail: pushpareddy28@gmail.com
Faculty of CS and CA, Tara Govt. College (A), Sangareddy, Telangana, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 677
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_72

1 Introduction

Load balancing is one of the key issues in virtualization and cloud configuration management.
Large-scale load balancing has been studied extensively, yet it remains a significant topic
in cloud computing and numerous research programs are under way [1], because the architecture
of the cloud is generic and the problem recurs everywhere. Traditional load-balancing
algorithms work only with homogeneous and dedicated resources and therefore cannot operate
successfully in cloud computing [2]. The complexity and flexibility of cloud technology are
widely appreciated, but conventional load-balancing algorithms cannot be applied directly to
cloud infrastructure.
Cloud computing is a rapidly growing network paradigm that delivers services to its
consumers with the aid of remote computing resources: storage, technology platforms,
production systems, and tool-testing environments [1, 3]. This allocation of resources is
carried out by the service providers, broadly under 'Software as a Service' (SaaS) and
'Infrastructure as a Service' (IaaS) offerings [2]. Cloud computing follows a demand-based,
pay-as-you-go (PAYG) [4] model. Adobe, Microsoft, Twitter, SAP, Oracle, VMware, IBM, and
other big players, primarily IT firms, are among the key drivers of this development [1, 3].
Cloud platforms are commonly classified under two headings. The first is the service model,
defined by the way the cloud provider operates; this is why the three main forms SaaS, PaaS,
and IaaS are widely used [5, 6]. The other is the deployment model, characterized by the
size, connectivity, management, sophistication, and accessibility of the cloud; the NIST
summary identifies four such structures: private, public, community, and hybrid [7]. Load
balancing refers to the ways in which operations are spread fairly across the data center
infrastructure to improve cloud computing performance. Its objectives can be stated from
the viewpoint of the client or of the service provider: the user wants to minimize the
response time of its own operations regardless of any other network activity.
The service provider's objective is to improve the turnaround time and to allocate the
available resources effectively. The problem is divided into four steps which together
reflect a load-handling solution. (1) Load calculation: a load estimate is first necessary
to assess load imbalance; the workload calculation involves different activities to evaluate
the balance of operations. (2) Load transfer start-up: initiated if an imbalance is detected
once the loads of all VMs have been determined.
Load imbalance is an unforeseen phenomenon from the point of view of the CSP: it undermines
the capability and reliability of system resources along with the guarantee of quality of
service (QoS) under the agreed service level agreement (SLA). Load balancing (LB) is
essential under these circumstances, and this subject is of particular interest to
researchers. In cloud computing, load balancing may be accomplished at the physical machine
or VM level [1].

2 Literature Survey

Cloud computing is recognized as one of the newest computing paradigms and was designed
not for universities but rather for enterprises. The cloud platform offers end users
virtualized, streamlined, and substantial resources, and its major advantage is that it
promotes computation entirely as a business service. With thousands of machines in the
cloud, it is not feasible to allocate resources manually, and we therefore rely on the
principle of virtualization. Cloud infrastructure offers equipment maintenance, certified
applications, and staff training options. Cloud computing is entirely Internet-based, with
millions of machines linked to the web, and it provides servers, bandwidth, applications,
networking, and more.
Cloud platforms are versatile for users in the context of virtualization. Figure 2.1
illustrates the concept of a cloud computing design. The concept behind cloud computing is
virtualization, which pools vast computing resources in order to maximize their use. Foster
et al. (2008) proposed four layers of cloud computing: the fabric layer includes compute,
storage, and network resources; the unified resource layer contains the virtualized image
of the hardware; the platform layer provides the middleware framework for end users; and
the application layer contains the web client. One of the big problems of virtualization is
load balancing; key load balancing studies remain important topics for cloud infrastructure,
and various research activities are ongoing, because the cloud infrastructure is generic and
the problem is distinctive. The classic load balancing algorithms can only be used with
standard, dedicated services, so they cannot function properly on cloud infrastructure;
efficiency and durability are facets of the cloud architecture that conventional load
balancers cannot address directly. M. Randles et al. have investigated a decentralized load
balancing strategy based on honeybee foraging, which is a nature-inspired solution.
It manages load through the activity of the nearby servers: performance improves as the set
of features is expanded, but increasing the system size does not improve performance, so the
method is best suited to situations in which a specific community of service users is
involved. Z. Samson et al. introduced a load balancing solution for open, distributed
computing systems that combines ant colony optimization with dynamic network theory. The
approach reduces complexity, can be adapted to different environments, offers good fault
tolerance, and is highly adaptable, thereby improving system efficiency; it exploits
small-world properties without requiring a complex load-balance computation.

3 Proposed System and Methodology

A common load balancing technique has been adopted for VMs in the cloud. It requires local
policy knowledge to make load management decisions; average performance is improved by the
load balancing, but fault tolerance is not taken into account. A map approach that combines
load balancing with a distributed rate control system has been introduced by Hamilton et al.
and acts as an integrated tool for cloud management. A further line of work models the load
with linear regression:

$Y_i = f(X_i, \beta) + e_i$   (1)

$f(X_i, \beta) = \beta_0 + \beta_1 X_i$   (2)

For data centers with integrated cloud virtualization and computing, a vector dot
methodology has been implemented. The dot product is used to distinguish nodes by their
resource requirements, and the illustrated algorithm aims to resolve the load-balance
problem for resource delivery.

$Y_i = \beta_0 + \beta_1 X_i + e_i$   (3)

$\sum_i e_i^2 = \sum_i \left(Y_i - f(X_i, \beta)\right)^2$   (4)

Nevertheless, the approach does not tackle the reduction of costs, that is, the
expense of the allocation of loads that may take longer than the actual measurement
time (Fig. 1).
Such schemes mitigate storage costs and benefit from reduced data transmission. However, in
order to optimize the distribution and migration of data using a linear algorithm, they
require data processing and migration to be applied simultaneously, implementing a
master-slave load balance.

Fig. 1 Predicted bandwidth requirement



Fig. 2 System architecture design

 
$\sum_i e_i^2 = \sum_i \left(Y_i - (\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i})\right)^2 = 0$   (5)

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + e_i$   (6)

$\sum_i e_i^2 = \sum_i \left(Y_i - (\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i})\right)^2 = 0$   (7)

Nevertheless, this method addresses only static load balancing: a Lagrange multiplier is
estimated for the transmitted weight, yielding an efficient weight-balance conversion
algorithm in Euclidean form (Fig. 2). The development of a hybrid grid and cloud
infrastructure [8] reduces the overall operation time and the management overhead.

$\beta_1 = \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$   (8)

$y_i = \beta_0 + \beta_1 x_i$   (9)

$\sigma_{\beta_0} = \sigma_\varepsilon \sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}} = \sigma_{\beta_1} \sqrt{\dfrac{\sum_i x_i^2}{n}}$   (10)

This approach addresses both the cost and the time period in question and yields better
outcomes in a shorter time; similar issues have been considered in related work.

3.1 Machine Learning Algorithms with BSP Paradigm

Distributed ML typically uses the BSP model, as found, for example, in Spark-style
frameworks. A BSP computation consists of a set of T supersteps separated by synchronization
barriers; a superstep defines the series of operations carried out between two consecutive
synchronization points. In each superstep, all computation nodes perform their iterative
calculations simultaneously.


$\sum_{i=1}^{n} \sum_{k=1}^{p} x_{ij} x_{ik} \beta_k = \sum_{i=1}^{n} x_{ij} y_i, \quad j = 1, \ldots, p$   (11)

$X^\top X \beta = X^\top Y$   (12)

$\beta = (X^\top X)^{-1} X^\top Y$   (13)

Each node then enters a synchronization barrier and waits. Once all computation nodes have
finished their calculations and the updates are agreed upon, the parameters are changed and
the global parameters are passed back to all computation nodes.
 
$\rho(y \mid X, \beta, \sigma^2) \propto (\sigma^2)^{-n/2} \exp\left(-\tfrac{1}{2\sigma^2}(y - X\beta)^\top (y - X\beta)\right)$   (14)

$\hat{\beta} = (X^\top X)^{-1} X^\top y$   (15)

All computation nodes then move together into the next superstep once the synchronization
barrier is released. This synchronization allows the parallel ML algorithm under the BSP
model to behave as if serialized, ensuring that parameter changes are globally consistent
and that the execution of the algorithm is correct.
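The following Python sketch (a simplified simulation, not the system's implementation) illustrates one BSP superstep consistent with Eqs. (11)–(13): each node computes partial normal-equation terms on its data shard, all nodes meet at the synchronization barrier, and the global parameters are solved and broadcast before the next superstep.

```python
# Simulated BSP superstep: local computation, barrier, global parameter update.
import numpy as np

rng = np.random.default_rng(0)
shards = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(4)]  # 4 "nodes"

def superstep(shards):
    # Local phase: each node computes X_i^T X_i and X_i^T y_i independently.
    partials = [(X.T @ X, X.T @ y) for X, y in shards]
    # Synchronization barrier: aggregate all partial results.
    XtX = sum(p[0] for p in partials)
    Xty = sum(p[1] for p in partials)
    # Global update, Eq. (13): beta = (X^T X)^{-1} X^T Y, then broadcast to all nodes.
    return np.linalg.solve(XtX, Xty)

beta = superstep(shards)
print(beta)
```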

3.2 The Efficiency of the BSP Machine Model

If the cluster load is unbalanced, the efficiency of the BSP model decreases considerably.
For example, the analysis of Ho shows that, if the LDA model is run on 32 BSP machines, the
time spent at the synchronization barrier is six times that spent on iterations [9]. In
some cases a staleness threshold is used to remove the straggler problem: the staleness
threshold of DSP is low, but a balanced cluster load is implied. The straggler problem
cannot be fully solved because, when system nodes are added, DSP does not adapt completely
to the cluster load. The following analyses of the threshold limits are provided.

3.3 DSP-Based Load Equilibrium Adaptation Method

Since all computation nodes are updated synchronously and the iteration quantity of each
node is calculated using the performance model, the controller mechanism obtains the node
performance via the Ganglia machine monitoring unit. A-DSP provides a mechanism for
adjusting the load balance on top of DSP.

4 Prediction and Simulation Method

The main components of the method are the centralized parameter management framework, the
performance assurance unit, the centralized synchronization control unit, and the task
redistribution framework. A standardized parameter control architecture is applied to the
global model parameters (Fig. 3).
More iterations are delegated to the stronger nodes by converting the estimated iteration
count of each node according to the actual iteration time observed between nodes, thereby
increasing the cluster utilization and the effectiveness of the training. The transformation
assigns fewer iterations to slower nodes and more to faster nodes.

$\nu s^2 = (y - X\hat{\beta})^\top (y - X\hat{\beta}) \quad \text{and} \quad \nu = n - k$   (16)

$\rho(\beta, \sigma^2) = \rho(\sigma^2)\, \rho(\beta \mid \sigma^2)$   (17)

$\rho(\sigma^2) \propto (\sigma^2)^{-\frac{\nu_0}{2} - 1} \exp\left(-\tfrac{\nu_0 s_0^2}{2\sigma^2}\right)$   (18)

Fig. 3 Aggregates of sample data



Fig. 4 A basic design with three stages

$\rho(\beta \mid \sigma^2) \propto (\sigma^2)^{-k/2} \exp\left(-\tfrac{1}{2\sigma^2}(\beta - \mu_0)^\top \Lambda_0 (\beta - \mu_0)\right)$   (19)

The workload array records the training count calculated for the next iteration of the
algorithm at node 1. At the site level, three machines with one system and database
transfers are included, connected only through the flow lines from third parties; the case
in which not all operator pairs communicate is represented in the following traffic matrix
(Fig. 4).
The job is passed to the dependency manager for an unbiased evaluation of the operation. The
manager receives the job and tests whether it is completely independent or requires multiple
sub-jobs.
 
$\mu_n = (X^\top X + \Lambda_0)^{-1}\left(X^\top X \hat{\beta} + \Lambda_0 \mu_0\right)$   (20)

If several tasks are involved, it explores the relations between them, taking both the
independent job queue and the dependency job queue into consideration. The tasks are then
directed to the scheduler, which schedules the child tasks one after another.

4.1 The Load Balancing in the IWRR

The dependency job list contains tasks that depend on other VM tasks. Once all the child
tasks in this set are completed, the parent task is delegated to a VM, while the independent
queue contains the remaining tasks. The scheduler thus maintains a separate independent work
queue and a dependency queue.
     
$\rho(\beta, \sigma^2 \mid y, X) \propto \rho(\beta \mid \sigma^2, y, X)\, \rho(\sigma^2 \mid y, X)$   (21)

$\Lambda_n = X^\top X + \Lambda_0, \quad \mu_n = \Lambda_n^{-1}\left(X^\top X \hat{\beta} + \Lambda_0 \mu_0\right)$   (22)

The scheduler selects the correct machine based on the IWRR algorithm. The
scheduler gathers the details of the resource planner.

$a_n = a_0 + \tfrac{n}{2}, \quad b_n = b_0 + \tfrac{1}{2}\left(y^\top y + \mu_0^\top \Lambda_0 \mu_0 - \mu_n^\top \Lambda_n \mu_n\right)$   (23)
It tests the processing power of the VMs and then uses the suggested algorithm to
determine the right VM for the particular task. Every VM provides comprehensive
details on the task execution list, task split list, and job custody.

4.2 Load Balance Measures with Percentage from VM

The load balancer checks the percentage of jobs assigned at the VM level. The load on the
VMs is calculated by means of each VM's job execution list; if the proportion is less than 1,
the scheduler marks that VM as a candidate for the task.

$b_n = b_0 + \tfrac{1}{2}\left(y^\top y + \mu_0^\top \Lambda_0 \mu_0 - \mu_n^\top \Lambda_n \mu_n\right)$   (24)

$p(y \mid m) = \int p(y \mid X, \beta, \sigma)\, p(\beta, \sigma)\, d\beta\, d\sigma$   (25)

$p(y \mid m) = \dfrac{1}{(2\pi)^{n/2}} \sqrt{\dfrac{\det(\Lambda_0)}{\det(\Lambda_n)}} \cdot \dfrac{b_0^{a_0}}{b_n^{a_n}} \cdot \dfrac{\Gamma(a_n)}{\Gamma(a_0)}$   (26)

$p(y \mid m) = \dfrac{p(\beta, \sigma \mid m)\, p(y \mid X, \beta, \sigma, m)}{p(\beta, \sigma \mid y, X, m)}$   (27)

The least-used VM is allocated when its usage falls below 20 percent; the scheduler is
informed of the appropriate VM for the job, and the job is assigned to this machine until a
more suitable server is located.
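A minimal sketch of this selection rule is shown below, with illustrative VM names, capacities, and the 20% idle threshold mentioned above; everything else (the fallback of picking the VM with the most spare weighted capacity, in the spirit of weighted round robin) is an assumption rather than the paper's exact procedure.

```python
# Toy VM selection: prefer an idle VM (< 20% utilization), else the most spare capacity.
from dataclasses import dataclass

@dataclass
class VM:
    name: str
    capacity: float        # relative processing power (weight)
    assigned_load: float   # current load in the same units

    @property
    def utilization(self) -> float:
        return self.assigned_load / self.capacity

def select_vm(vms, idle_threshold=0.20):
    idle = [vm for vm in vms if vm.utilization < idle_threshold]
    if idle:
        return min(idle, key=lambda vm: vm.utilization)      # least-used idle VM
    return max(vms, key=lambda vm: vm.capacity - vm.assigned_load)

vms = [VM("vm1", 4.0, 3.5), VM("vm2", 2.0, 0.3), VM("vm3", 1.0, 0.9)]
target = select_vm(vms)
target.assigned_load += 0.5     # assign the task to the chosen VM
print(target.name)
```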

$P(Z \mid X) = \dfrac{P(X \mid Z)\, P(Z)}{P(X)} = \dfrac{P(X \mid Z)\, P(Z)}{\int_Z P(X, Z)\, dZ}$   (28)

The configured data centers contain hosts and VMs with the corresponding elements. The
resources are checked for idleness and for heavy loads so that worker demands can be moved
efficiently to an acceptable location.

5 Results

The following ordering ranks the heterogeneous VMs from highest to lowest computing power.
For homogeneous workloads in heterogeneous settings, more work is allocated to the
higher-capacity VMs.

$D_{\mathrm{KL}}(Q \parallel P) = \sum_Z Q(Z)\left[\log Q(Z) - \log P(Z, X)\right] + \log P(X)$   (29)

The WRR takes into consideration the ratio of each VM's capacity to the overall VM resources
and assigns a proportionate amount of work.

$D_{\mathrm{KL}}(Q \parallel P) = \mathbb{E}_Z\left[\log Q(Z) - \log P(Z, X)\right] + \log P(X)$   (30)

The next step checks whether the least-loaded worker is able to complete all of the work
currently queued at the most heavily loaded worker in the shortest possible period. Based on
the previous equation, long jobs are allocated to the lightly loaded VMs, so their execution
is deferred accordingly (Fig. 5).
The scheduler then checks the estimated completion time for each of the loaded VMs and
compares the estimated period for a VM with the actual completion time of the set.
Consequently, the minimum expected completion time among the VMs was calculated from the
above measurements, and the task was then allocated to that VM. At the end of each task, the
IWRR rebalances the load using the observed working time. This method is also well suited to
data centers in heterogeneous environments (Fig. 6).
Figure 7 shows the stable diurnal task weights developed by the Cicada prediction algorithm;
nearly all weights are produced by the algorithm 24 h in advance. The following components
are involved: the data storage module stores each computation node's subset of the data, and
each node reads the workload sharing already in place. The goal of this paper is to adapt to
the dynamic workload distribution (Fig. 8). Figure 9 relates the runtime of the Cicada
estimation algorithm to the history used: as the history grows, the amount of past data to
be considered increases, yet Cicada requires fewer than 10 ms for a prediction in all but
one case.

Fig. 5 Relative L-2 error

Fig. 6 Index of X-indicates matrix

Fig. 7 Results CDF

Fig. 8 Speed of prediction calculations

Fig. 9 CDF 2 spatial variations

Fig. 10 Index X-2

Under SSP, multiple training iterations are performed by each node until, by default, the
slowest node completes the iteration. The global model parameters are then reconciled in
order to adjust the local model parameters (Fig. 10). The frequency of synchronization is
thereby reduced, and the synchronization cost of the SSP model is lowered. At the same time,
a free CPU/PE may only execute one operation at any time, so if the number of tasks is
greater than the number of PEs, the scheduler performs more job migrations under the WRR and
RR algorithms; this amount of migration is also significant when the number of resources is
small in the WRR and RR algorithms (Fig. 11).
This type of reservation is displayed accordingly. Rather, a 'virtual oversubscribed
cluster' (VOC) model is better adapted for workloads whose traffic trends are not known in
advance (cf. [37] and [61] in the original source). This model forms groups with
oversubscribed virtual machine interaction, as seen in Figs. 3, 4 and 5b, and the VOC model
needs two additional parameters (Fig. 12). The estimate shows that the improvement in SLA
violation is less than 0.286. The effect of the JCR with and without SVM is shown in
Fig. 13: the red line shows the JCR with SVM, while the blue line reflects the JCR without
SVM; the figure shows that the JCR increases above 0.538.

Fig. 11 Virtual switch network results

Fig. 12 Error

Fig. 13 CDF 4 relative error fraction

This is an example of the network topology for greedy placement. In order to place J1 and
J3, the greedy placement algorithm also requires the rate-1 approach as it places them
(Fig. 14).
Fig. 14 Oversubscription factor

The effect on the fulfillment of requests when the ground-truth data are used instead of the
Cicada estimates can also be seen: Cicada improves fulfillment in this respect by some
55–75%. The average increase is 11–18% in total and 25–26% in the amount of applications
that have been modified by Cicada. These figures are close to the published numbers.

6 Conclusion and Enhancement in Future

The improved weighted round robin (IWRR) algorithm helps assign work to the most suitable
VMs. The life cycle of a task involves three distinct phases. The initial placement phase
uses the improved weighted round robin algorithm to match job requirements to VMs, based on
the capabilities of the VM and the required working time. The dynamic scheduler then tracks
the load and the completion time of all configured VMs. The VM with the minimum completion
time for a specific task is identified from the above calculations. The weighted round robin
rebalancing takes place at the end of each task: when a task completes, the load is spread
consistently across the participating VMs, reducing idle periods. The results of the
performance analysis and tests with this algorithm have shown that the improved weighted
round robin algorithm is better suited to heterogeneous workloads on heterogeneous devices
than the plain round robin and weighted round robin algorithms. The algorithm also addresses
response time, a key QoS parameter.

References

1. Simar, P.S., Anju, S., Rajesh, K.: Analysis of load balancing algorithms using cloud analyst.
Int. J. Grid Distrib. Comput. 9(9), 11–24 (2016)
2. Maguluri, S.T., Srikant, R., Ying, L.: Stochastic models of load balancing and scheduling in
cloud computing clusters. In: INFOCOM Proceedings IEEE, pp. 702–710 (2012)
3. Desyatirikova, E.N., Kuripta, O.V.: Quality management in IT service management based on
statistical aggregation and decomposition approach. In: 2017 International Conference “Quality
Management, Transport and Information Security, Information Technologies” (IT&QM&IS),
pp. 500–505. https://doi.org/10.1109/ITMQIS.2017.8085871
4. Cheng, D., Rao, J., Guo, Y., Jiang, C., Zhou, X.: Improving performance of heterogeneous map
reduce clusters with adaptive task tuning. IEEE Trans. Parallel Distrib. Syst. 28(3), 774–786
(2016)
5. Chiang, M.L., Luo, J.A., Lin, C.B.: High-reliable dispatching mechanisms for tasks in
cloud computing. In: BAI2013 International Conference on Business and Information, Bali,
Indonesia, p. 73, 7–9 July 2013
6. Mohapatra, S., Smruti Rekha, K., Mohanty, S.: A comparison of Four Popular Heuristics for
Load Balancing of Virtual Machines in Cloud Computing
7. Kundu, S., Rangaswami, R., Dutta, K., Zhao, M.: Application Performance Modeling in a
Virtualized Environment. In: Proceedings of IEEE HPCA, Jan 2010
8. Chiang, M.-L., Hsieh, H.-C., Tsai, W.-C., Ke, M.-C.: An improved task scheduling and load
balancing algorithm under the heterogeneous cloud computing network. In: 2017 IEEE 8th
International Conference on Awareness Science and Technology (iCAST). https://doi.org/10.
1109/icawst.2017.8256465

9. von Laszewski, G., Wang, L., Younge, A.J., He, X.: Power-aware scheduling of virtual machines
in DVFS-enabled clusters. In: IEEE International Conference on Cluster Computing and
Workshops, New Orleans, LA, pp. 1–10 (2009)
10. Kaneria, O., Banyal, R.K.: Analysis and improvement of load balancing in cloud computing.
In: International Conference on ICT in Business Industry and Government (ICTBIG), Jan 2016
11. Ajila, S.A., Bankole, A.A.: Cloud client prediction models using machine learning techniques.
In: 37th Annual International Computer Software and Applications Conference, Kyoto, Japan
(2013)
12. Lyu, H., Li, P., Yan, R., Luo, Y.: Load forecast of resource scheduler in cloud architecture. In:
2016 International Conference on Progress in Informatics and Computing (PIC)
13. Shakir, M.S., Razzaque, A.: Performance comparison of load balancing algorithms using cloud
analyst in cloud computing. In: 2017 IEEE 8th Annual Ubiquitous Computing, Electronics
and Mobile Communication Conference (UEMCON). https://doi.org/10.1109/uemcon.2017.
8249108
14. Kumar, M., Sharma, S.C.: Dynamic load balancing algorithm for balancing the workload
among virtual machine in cloud computing. In: 7th International Conference on Advances in
Computing and Communications, ICACC-2017, 22–24 Aug 2017, Cochin, India
15. Volkova, V.N., Chemenkaya, L.V., Desyatirikova, E.N., Hajali, M., Khodar, A., Osama, A.:
Load balancing in cloud computing. In: 2018 IEEE Conference of Russian Young Researchers
in Electrical and Electronic Engineering (EIConRus). https://doi.org/10.1109/eiconrus.2018.
8317113
16. Wang, Y., Ren, Z., Zhang, H., Hou, X., Xiao, Y.: “Combat Cloud-Fog” network architec-
ture for internet of battlefield things and load balancing technology. In: 2018 IEEE Interna-
tional Conference on Smart Internet of Things (SmartIoT).https://doi.org/10.1109/smartiot.
2018.00054
17. Li, J., Qiu, M., Niu, J.-W., Chen, Y., Ming, Z.: Adaptive resource allocation for preempt able
jobs in cloud systems. In: 10th International Conference on Intelligent System Design and
Application, pp. 31–36 (2011)
18. Shi, J.Y., Taifi, M., Khreishah, A.: Resource planning for parallel processing in the cloud. In:
IEEE 13th International Conference on High Performance and Computing, pp. 828–833 (2011)
19. Goudarzi, H., Pedram, M.: Multi-dimensional SLA-based resource allocation for multi-tier
cloud computing systems. In: IEEE International Conference on Cloud Computing, pp. 324–
331 (2011)
20. Dhiman, G., Marchetti, G., Rosing, T.: vGreen: a system for energy efficient computing in
virtualized environments. In: Conference of ISLPED 2009 San Francisco, California ,USA,
pp. 19–21 (2009)
21. Jin, H., Deng, L., Wu, S., Shi, X., Pan, X.: Live virtual machine migration with adaptive, memory
compression. In: IEEE International Conference on Cluster Computing and Workshops, New
Orleans, LA, pp. 1–10 (2009)
22. Pattanaik, P.A., Roy, S., Pattnaik, P.K.: Performance study of some dynamic load balancing
algorithms in cloud computing environment. In: 2015 2nd International Conference on Signal
Processing and Integrated Networks (SPIN)
23. Li, B., Li, J., Huai, J., Wo, T., Li, Q., Zhong, L.: EnaCloud: an energy-saving application live
placement approach for cloud computing environments. In: IEEE International Conference on
Cloud Computing, Bangalore, pp. 17–24 (2009). VM Allocation in cloud computing using SVM.
Available from https://www.researchgate.net/publication/336022132_VM_Allocation_in_cloud_computing_using_SVM. Accessed 16 Mar 2020
Continuous Recognition of 3D Space
Handwriting Using Deep Learning

Sagar Maheshwari and Sachin Gajjar

Abstract In this paper, we present novel input methods that enable a complex, hands-free
interface through the recognition of 3D handwriting. The motion is detected wirelessly by
the use of the inertial measurement unit (IMU) of the Arduino 101 board. Two different
approaches are discussed. One approach is to use the pattern matching engine (PME) of the
Intel® Curie™ module on an Arduino 101 mounted on the back of the hand. The second approach
feeds the IMU input to a well-structured recurrent neural network, with the spotting of
handwriting segments done by a support vector machine. The former approach, being
constrained by memory, is not preferred over the latter. The deep learning approach can
continuously recognize random sentences. The model was trained on a freely definable
vocabulary of 1000 words and was tested by only one person, achieving a word error rate as
low as 2%.

Keywords Air writing · Arduino · Deep learning · Recurrent neural networks · Support vector machine

1 Introduction

Hand gestures are a pervasive, common, and significant part of communicated language. The
advent of 3D hand gesture recognition has attracted a variety of research interests in
emerging fields such as pattern recognition, computer vision, and human–computer
interaction. There are several ways of sensing gestures, one of which is a low-cost method
of sensing through hand-mounted sensors that include accelerometers and gyroscopes [1].
Operations like writing or comprehending text, or other more convoluted tasks, entail more
expressive capacity than a limited set of isolated gestures [2]. This paper presents novel approaches

S. Maheshwari (B) · S. Gajjar


Department of Electronics and Communication Engineering, Nirma University, Ahmedabad,
Gujarat 382481, India
e-mail: 17bec091@nirmauni.ac.in
S. Gajjar
e-mail: sachin.gajjar@nirmauni.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 693
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_73

that combine the intuition gathered from gestures to express it in the form of hand-
writing, specifically as a text output. Several challenges arise. First, in everyday life,
the gestures are not limited to specific handwriting segments, but also include the
normal day-to-day activities, introducing a lot of irrelevance in the text input inter-
face. The handwriting segments should be identified beforehand in the continuous
stream of data. Secondly, as the accelerometer data is noisy, it should be filtered
before sending it to the recognition stage. Third, the actual text input must be recog-
nized in the whole data stream. For continuous recognition, we use two approaches.
The first approach involves the use of Arduino 101 [3]. The Intel® Curie™ module
embedded on the Arduino 101 provides us a pattern matching engine that can be used
to recognize the gestures. The second approach is to use the 3-axes accelerometer
of Arduino 101 and divide the process into 2 stages. The first stage is the spotting
stage, which involves the use of a support vector machine [4] to classify between the
writing and non-writing segments. The second stage uses recurrent neural networks
for recognition of the gestures [5]. While the existing proposed scheme is based on
recognition of text, this can be utilized as a base for any type of gesture recogni-
tion scheme which is built on a primeval alphabet of freely definable gestures. The
first approach lacks suitable memory for large datasets; i.e., it is limited to only 128
bytes of memory per neuron for 128 neurons. The second approach, however, can be
applied to large definable vocabularies, larger than V1K. Following is the organiza-
tion of rest of the paper. Section 2 discusses the related work. Recognition of gestures
using Arduino 101 and deep learning are discussed in Sects. 3 and 4, respectively,
finally followed by conclusion.

2 Related Work

Recent research suggests a paradigm shift toward mobile device computing by facilitating
hands-free interaction. Gestures foster an interface that is independent of any handheld
tool, allowing seamless incorporation into day-to-day activities.
Mini-projectors portray the display on a rigid exterior in front of the subject and the
gesture is tracked via a camera or any other medium [6]. However, the approach
depends on sensory input and hence would perform poorly in case of continuous
stream recognition. Other researchers propose that 3D communication is doable
without any sort of graphical output. The operator needs to imagine a blank surface
that serves the purpose of the screen [7]. Handwriting can be predicted as text lacking
any optical or sensory feedback, a method that is used here. In any accelerometer data,
the spotting of relevant signal segments is necessary. This is possible by employing
a binary classifier to detect probable segments and then classify the gesture after-
ward [7]. This approach, however, introduces latency, and therefore, the overhead
involved reduces the efficiency of the recognition system. The other method is to classify
the input continuously and discard any irrelevant outputs. Gesture recognition using
accelerometer data has been studied extensively, typically with numerous isolated motions
being performed and classified [8]. Many researchers propose

a variety of methods to recognize gestures from accelerometer input. Though various
researchers have proposed various methods, they are either built on a very limited and
primitive vocabulary or lack recognition of a continuous input stream. To this end, this
paper discusses two independent approaches to recognize gestures using accelerometer data.
The said approaches address the issues of limited vocabulary and continuous input streams.

3 Gesture Recognition Using Arduino 101

Arduino 101 is a development kit built around the Intel® Curie™ module, intended to combine
the low power usage of the core with a high degree of ease-of-use [3]. The board supports
Bluetooth Low Energy and has an onboard 6-axis accelerometer/gyroscope. It consists of two
small cores, a 32-bit ARC architecture core and an x86 (Quark) core, both of which are
clocked at 32 MHz. The real-time operating system (RTOS) and the associated framework
designed by Intel are both open-source [3].

3.1 Deep Learning on Intel® Curie™

The pattern matching engine (PME) of the Curie™ module works as an engine for parallel data
recognition with 128 parallel processing elements (PEs), each with a 128-byte input vector,
128 bytes of model memory, and 8-bit arithmetic units. It supports two classification
techniques, radial basis function and k-nearest neighbors, and up to 127 contexts.
Arduino 101 provides the CuriePME API that can be used to train and classify gestures.
Additionally, the module provides an inertial measurement unit with 6 degrees of freedom,
and each sensor sample ($s_o$) can be represented as a 6-dimensional vector of the
corresponding accelerometer/gyroscope values:

$s_o = (a, g) = \left((a_x, a_y, a_z), (g_x, g_y, g_z)\right)$   (1)

As stated earlier, the QuarkSE core on Curie module comes with 128 neurons, with
128 bytes of memory per neuron. And hence, there is a trade-off between memory
and the data that can be classified. Figure 1 shows the glove for gesture recognition.
We propose the system shown in Fig. 2, which is user-dependent and gives comparatively poor
performance in terms of word error rate in a person-independent setup. The system performs
better when the dataset comprises words of at most 3 syllables, and it gives 100% accuracy
when single letters are to be classified. Continuous recognition of words is also possible
with this setup but is not recommended due to the limited memory.
Fig. 1 Prototype of gesture recognition glove

Fig. 2 Prototype of gesture recognition glove

For instance, drawing the letter A takes almost 2 s, which is 200 samples at 100 Hz. The
3-axis accelerometer values, stored as 4-byte integers, then add up to 2400 bytes per
letter. But with 128 neurons, our pattern can be no larger than 128 bytes. So at least 95%
of the data must be discarded without affecting the results, which requires under-sampling,
after which the maximum size of 128 bytes per letter can be achieved. To remove noise, we
also use an averaging filter. From the above discussion, it is clear that memory management
is not efficient with this system, which leads to a lot of data being wasted. The CuriePME
library is mostly used for (1) learning patterns, (2) classifying and recognizing patterns,
and (3) storing and retrieving pattern-matching knowledge. The CurieBLE library provides the
wireless functionality [9].
Figure 3 shows the raw and noisy accelerometer data. Figure 4 shows the
accelerometer data under-sampled to a total of 45 samples and mapped from 0 to 255.
Due to the memory constraints, i.e., only 128 bytes per pattern and poor memory
management, we propose a new method for gesture recognition with the use of deep
learning.
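A small Python sketch of this preprocessing is shown below; it is an illustration, not the firmware code, and assumes a 5-sample averaging filter and 42 retained samples per axis (42 × 3 = 126 bytes, within the 128-byte limit).

```python
# Smooth, under-sample, and scale a raw 3-axis accelerometer trace to a PME-sized pattern.
import numpy as np

def to_pme_pattern(samples, target_len=42):
    """samples: array of shape (N, 3) of raw accelerometer readings."""
    samples = np.asarray(samples, dtype=float)
    # Averaging filter over a short window to suppress noise.
    kernel = np.ones(5) / 5.0
    smoothed = np.column_stack(
        [np.convolve(samples[:, i], kernel, mode="same") for i in range(3)]
    )
    # Under-sample: keep target_len evenly spaced samples per axis.
    idx = np.linspace(0, len(smoothed) - 1, target_len).astype(int)
    reduced = smoothed[idx]
    # Map values to 0-255 and flatten to a byte vector for the PME.
    lo, hi = reduced.min(), reduced.max()
    scaled = (reduced - lo) / (hi - lo + 1e-9) * 255.0
    return scaled.astype(np.uint8).flatten()

pattern = to_pme_pattern(np.random.randn(200, 3))   # ~2 s of data at 100 Hz
print(pattern.shape)   # (126,)
```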

Fig. 3 Drawing ‘A,’ raw accelerometer data

Fig. 4 Under sampled version of ‘A’

4 Gesture Recognition Using Deep Learning

This is a more robust approach for gesture recognition. It can be divided into the spotting
stage and the recognition stage. The combination of the two stages introduces no overhead
and does not affect the accuracy of word detection. The process can be seamlessly pipelined,
and real-time detection of gestures is possible.

Fig. 5 SVM architecture

4.1 Spotting Stage

The role of the spotting stage is to classify the writing and the non-writing segments in
the accelerometer data. The intuition of the spotting stage is derived from Amma et al. [4].
The segments that are correctly recognized as writing segments are then carried forward to
the recognition stage. The stage uses a binary support vector machine (SVM) classifier with
an RBF kernel (C = 126, γ = 2). For use on continuous data streams and in real time, the
sliding-window approach is more suitable: the overlapping sliding windows are classified
and the results accumulated before being sent to the recognition stage. A window length of
0.9 s and a shift width of 0.1 s are used in this approach. Figure 5 depicts the
architecture of the spotting stage; in the figure, green and red segments show writing and
non-writing segments, respectively. Visual inspection shows that the handwriting part has
higher frequency and amplitude than the non-writing part. For each window w_t, the SVM
classifier C(w_t) returns 1 when a handwriting segment is detected and returns 0 otherwise.
A sensor sample s_t is categorized as a handwriting motion if at least one window containing
s_t is categorized as a handwriting segment [4].

$C(s_t) = \max_{k:\, s_t \in w_k} C(w_k)$   (2)

This system is biased towards the detection of writing motion, so brief pauses while writing
do not create gaps in the detected writing segments. All real-time experiment results show
that the chosen values are suitable for the model. As the system is biased, a high recall of
98.2% is attained along with a low precision of 32%. Compared with the results in [4], these
values are reasonable.
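A hedged sketch of the spotting stage follows, assuming a 100 Hz sampling rate (so the 0.9 s window is 90 samples and the 0.1 s shift is 10 samples) and simple mean/standard-deviation/energy window features in place of the actual feature set; the training data here are placeholders.

```python
# Sliding-window SVM spotting: a sample is "writing" if any covering window says so (Eq. 2).
import numpy as np
from sklearn.svm import SVC

WIN, SHIFT = 90, 10   # 0.9 s window, 0.1 s shift at 100 Hz

def window_features(w):
    # Per-axis mean and std plus total energy of the window.
    return np.concatenate([w.mean(axis=0), w.std(axis=0), [np.sum(w ** 2)]])

def spot(signal, clf):
    n = len(signal)
    labels = np.zeros(n, dtype=int)
    for start in range(0, n - WIN + 1, SHIFT):
        w = signal[start:start + WIN]
        if clf.predict([window_features(w)])[0] == 1:
            labels[start:start + WIN] = 1   # max over all windows covering each sample
    return labels

# Training on labeled windows (placeholder features; 1 = writing, 0 = non-writing).
X_train = np.random.randn(40, 7)
y_train = np.array([0, 1] * 20)
clf = SVC(kernel="rbf", C=126, gamma=2).fit(X_train, y_train)
print(spot(np.random.randn(600, 3), clf).sum())
```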

4.2 Recognition Stage

Fig. 6 RNN architecture

The purpose of the recognition stage is to build a robust classifier, and hence several deep
learning models that exhibit temporal dynamic behavior come into play. These
state-of-the-art models include gated recurrent units (GRU) [5], long short-term memory
(LSTM) [5], and recurrent neural networks (RNN) [5]. We discuss the RNN for this stage, as
it is well suited to processing time-sequence data. In a conventional neural network, the
layers from the input layer to the output layer are fully connected, which is not
appropriate for time-series data. Hence, in an RNN, the present output is also related to
the past output: the network remembers the previous output and applies this information when
calculating the present output. Theoretically, an RNN can handle arbitrarily long
time-series data; in practice, to reduce complexity, the present state is only associated
with the past few states, according to need [5]. Averaged 3D acceleration and averaged 3D
angular rate features are extracted from the inertial measurement unit of Arduino 101.
Figure 6 shows the RNN structure that is used. The network is governed by Eqs. (3) and (4):
 
$h_t = f\!\left(w_h h_{t-1} + w_i x_t\right)$   (3)

$y_t = f\!\left(w_o h_t\right)$   (4)

where x_t denotes the input at time step t = 1, 2, 3, 4, …, h_t is the hidden state at step
t, y_t is the output at step t, and f is usually a non-linear activation function such as
ReLU, Leaky ReLU, or tanh [5]. The experiment was conducted by a single subject. The length
of the sentences varied from 2 to 4 words. The user had to write 10 English sentences
without moving the wrist, with an approximate height of 15 cm per character; in total, the
user wrote 37 words with 245 characters. The neural network took half an hour to train on a
small vocabulary (V1K) containing 986 words on an NVIDIA GeForce GTX 1050Ti GPU. The word
error rate was calculated as described in Amma et al. [4], and a word error rate (WER) of 2%
was reached.
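A minimal numpy sketch of the recurrence in Eqs. (3) and (4) is given below; the dimensions and the tanh/softmax choices are illustrative assumptions rather than the trained network used in the experiments.

```python
# Forward pass of a simple RNN over a sequence of averaged IMU feature vectors.
import numpy as np

def rnn_forward(x_seq, W_i, W_h, W_o):
    """x_seq: (T, d_in) sequence; returns per-step class probabilities."""
    h = np.zeros(W_h.shape[0])
    outputs = []
    for x_t in x_seq:
        h = np.tanh(W_h @ h + W_i @ x_t)                        # Eq. (3)
        logits = W_o @ h                                        # Eq. (4), pre-activation
        outputs.append(np.exp(logits) / np.exp(logits).sum())   # softmax over classes
    return np.array(outputs)

rng = np.random.default_rng(1)
d_in, d_h, n_classes, T = 6, 64, 27, 120          # assumed sizes
W_i = rng.normal(size=(d_h, d_in))
W_h = rng.normal(size=(d_h, d_h))
W_o = rng.normal(size=(n_classes, d_h))
probs = rnn_forward(rng.normal(size=(T, d_in)), W_i, W_h, W_o)
print(probs.shape)   # (120, 27)
```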

5 Conclusion

In the initial part of the work, a wearable gesture input system is suggested that is adept
at recognizing text input written in the air, centered on the IMU of Arduino 101. With the
use of CuriePME, the system works well for the detection of gestures containing words with
at most 3 syllables and works with 100% accuracy when a single syllable is input. However,
the dataset is limited because of the lack of memory, and more memory is required to expand
the vocabulary. To avoid the memory constraints, a new method using deep learning was used.
During the spotting stage, 98% recall and 32% precision were achieved. The network was
trained on a very small vocabulary (V1K). Experiments were conducted on a dataset of
approximately 300 words, and a WER of 2% was attained. In the future, the proposed system
will be tested on a versatile dataset with a large vocabulary of V8K and above.

Acknowledgements The work is funded by IDEA LAB Program at Institute of Technology, Nirma
University, India under contract IDEA-2019-EC-02.

References

1. Cheng, H., Yang, L., Liu, Z.: Survey on 3D hand gesture recognition. IEEE Trans. Circuits Syst.
Video Technol. 26(9), 1659–1673 (2016)
2. Amma, C., Schultz, T.: Airwriting: demonstrating mobile text input by 3D-space handwriting.
In: Proceedings of the ACM International Conference on Intelligent User Interfaces (IUI’12)
(2012)
3. “Arduino -Arduino101”, Arduino.cc, 2020 [Online]. Available https://www.arduino.cc/en/
guide/arduino101. Accessed: 05 Apr 2020
4. Amma, C., Georgi, M., Schultz, T.: Airwriting: hands-free mobile text input by spotting and
continuous recognition of 3D-space handwriting with inertial sensors. In: 2012 16th International
Symposium on Wearable Computers, Newcastle, pp. 52–59 (2012)
5. Du, T., Ren, X., Li, H.: Gesture recognition method based on deep learning. In: 2018 33rd
Youth Academic Annual Conference of Chinese Association of Automation (YAC), Nanjing,
pp. 782–787 (2018)
6. Chen, F., et al.: WristCam: a wearable sensor for hand trajectory gesture recognition and
intelligent human–robot interaction. IEEE Sens. J. 19(19), 8441–8451 (2019)
7. Gustafson, S., Bierwirth, D., Baudisch, P.: Imaginary interfaces: spatial interaction with empty
hands and without visual feedback. In: Proceedings of the 23rd Annual ACM Symposium on
User Interface Software and Technology (UIST’10) (2010)
8. Elmezain, M., Al-Hamadi, A., Michaelis, B.: Hand trajectory-based gesture spotting and recog-
nition using HMM. In: 2009 16th IEEE International Conference on Image Processing (ICIP),
Cairo, pp. 3577–3580 (2009)
9. Support for Intel® Curie™ Modules”, Intel, 2020 [Online]. Available https://www.intel.com/con
tent/www/us/en/support/products/94036/boards-and-kits/intel-curie-modules.html. Accessed:
05 Apr 2020
Automated SQL Grading System

Shohna Kanchan, Samruddhi Kalsekar, Nishita Dubey, Chelsea Fernandes,


and Safa Hamdare

Abstract A grading system is a procedure used by teachers to assess and evaluate a student's
educational performance. The method employed by tutors in a general grading system for
grading structured query language (SQL) assignments is laborious, time consuming, and
inaccurate. In this paper, an automated grading system for SQL queries is proposed which
provides an efficient way to assess a student's performance by awarding appropriate scores.
The automated grading system is implemented for partial marking using PostgreSQL. The
front-end of our system is implemented as a database-driven website in Django. We believe
that the system will be very useful to educational systems worldwide.

Keywords Automated grading · Partial marking · PostgreSQL · Canonicalization · Structured query language

1 Introduction

An automated grading system provides an efficient means for tutors to check a student's
understanding of certain concepts in order to determine comprehension. Considering the
increasing number of students, automating this process greatly enhances the overall
efficiency. The student queries are graded by comparing

S. Kanchan · S. Kalsekar (B) · N. Dubey · C. Fernandes · S. Hamdare


Department of Computer Engineering, St. Francis Institute of Technology, Mumbai, India
e-mail: samruddhikalsekar3@gmail.com
S. Kanchan
e-mail: shohnakanchan14@gmail.com
N. Dubey
e-mail: nishitad14@gmail.com
C. Fernandes
e-mail: chelseafernandes966@gmail.com
S. Hamdare
e-mail: safahamdare@sfit.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 701
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_74

each of its components with the components of the correct query [1, 2]. An initial approach
is to award full marks if the query under consideration is correct; i.e., the correctness of
the student's SQL query is evaluated by comparing its result and the query itself with those
of the instructor's query. However, there are cases, as given below [3], wherein the queries
look similar but give different results.

Instructor query:
SELECT Department.Dept_Id, Employee.Name
FROM Department
RIGHT JOIN Employee
ON Department.Employee_Id = Employee.Employee_Id
ORDER BY Department.Dept_Id

Student query:
SELECT Department.Dept_Id, Employee.Name
FROM Department
RIGHT JOIN Employee
ON Department.Employee_Id = Employee.Employee_Id
ORDER BY Employee.Employee_Id

In such a scenario our system allocates partial marks based on the matching
of student query with the correct query. A student may write some parts of the
query correctly, and in such cases, partial marks should be allotted by taking the
weighted attributes and predicates under consideration. Partial marking, thus, incor-
porates various sub techniques under query pre-processing for awarding partial
marks to incorrect student queries. Canonicalization of the student query as well
as the instructor query is required for performing comparison of the same syntactic
variations. Thus, the canonicalized queries are then broken into components and
these components are compared. Moreover, canonicalization may not guarantee an
optimal result due to deviations in the form in which the student query and instructor
query are canonicalized, even though they are equivalent. As a result, various pre-
processing techniques are required. Dividing a given query in different attributes and
performing initial pre-processing is an integral step toward awarding partial scores.
Depending upon the syntactic variations, techniques such as attribute disambiguation, WITH
clause elimination, BETWEEN predicate elimination, normalization of relational predicates,
and join processing are performed. In this paper, we summarize the pre-processing techniques
required to allot partial marks to student queries.

2 Literature Review

The techniques for data generation in the X-Data system were further extended
in order to include a much larger variety of queries and mutations. This data was
used in building a SQL grading system. The testing for the accuracy of the datasets
generated by this X-Data system was conducted by using SQL queries that students
submitted as a part of their DBMS course. This system did not support, or supported
only partially, some SQL features which included sub-queries in a query, queries
containing arithmetic operations, and identifier replacement mutations. It also did

not support the functionality of assigning partial marks to examine the extent of
correctness of the student query [3, 4].
A system was presented that took a database application program as input. It
generated datasets and using these datasets, unit tests were carried out to test the
accuracy of the functions with queries in the application. The techniques that were
used were on the basis of mutation testing and static program analysis. Java appli-
cations that used JDBC or Hibernate APIs were examined. The system could not
handle all areas of SQL query mutations. It would not suggest correct queries based
on the datasets [2, 5].
An automated SQL assignment grading system was developed using object-
oriented design (OOD) technique and model-view-controller (MVC) framework. The
system consisted of two main parts: assignment management and automated SQL
grader. Instructors could manage their assignment and student information conve-
niently anytime and anywhere via internet network. The automated SQL grader was
designed to support four DBMSs: MariaDB, MySQL, PostgreSQL, and Microsoft
SQL server. In this system, grading on SQL outputs was not applicable for the SQL
with comparison operators. The partial marking system was absent [6].
The scope of the XData system was increased by including functionalities of
assigning partial marks to student queries. In the comparison of student and instructor
query, the system was able to check many more syntactic features. Due to this, the
system was able to be fully automated and scalable to huge class sizes such as
those of massive open online courses (MOOCs). Canonicalization of sub-queries
was not taken into account in this system. Canonicalization of DISTINCT placement
in FROM clause sub-queries versus outer queries was another area of future work
[1, 7].

3 Challenges Identified

Canonicalization and partial marking of sub-queries or nested sub-queries needs to


be done by de-correlation. This system would make use of a method of dividing
the constraints and putting it into a table which will contain the following fields:
Predicates, Projections, Relations, Group By, and Having Clause. This technique is
different from the one mentioned in research paper, i.e., test data generation.
Ability to grade the student query according to specifications explicitly indicated
by the instructor which must be reflected in the student query is the biggest challenge
identified. These specifications must be taken into account since it may vary in
different cases.
The student's query execution time is compared with the instructor's, and the comparison is
reflected on the portal. A suggestion is given to the student if the query has taken more
execution time, but no marks are deducted for it.
The automated SQL partial marking system [1] is not open-source and is not freely available
for use.

4 Problem Definition

A generic assessment process for grading structured query language assignments traditionally
follows one of two approaches: either comparing the input query with an optimal set of
queries or executing the respective input query. Considering the manual effort and the
accompanying errors in both of the above approaches, an automated grading approach proves to
be more efficient and effective. Therefore,
we model an automated SQL grading system which includes partial marking of SQL
queries.

5 Problem System Methodology

5.1 Flow-Chart of the System

Step 1 An instructor can create SQL assessment tests and can provide model
answers for the same in the instructor mode.
Step 2 Instructor will enter the required keywords and assign corresponding weights
to entities of the model answer query.
Step 3 In the student mode, the student attempts the assessment test and submits
answers for each question.
Step 4 Once the query is submitted, the student query and the tutor query as well
as their outputs are evaluated by the matching criteria, and if all mentioned
conditions are satisfied, the student is awarded full marks.
Step 5 If the query is incorrect, the student gets the justification for the same in the
learning mode along with appropriate partial scores (Fig. 1).

5.2 Algorithm

1. Evaluation of student query using X-Data System (X-Data generates multiple
datasets to kill mutations).
2. Canonicalization or removal of syntactic variations of student and instructor
query to make the queries comparable.
3. Pre-processing, which includes attribute disambiguation, BETWEEN predicate elimination,
normalization of relational predicates, and join processing.
4. Generating equivalence class of attributes.
5. Join minimization.
6. Functional dependencies which includes canonicalizing ORDER BY attributes,
comparing GROUP BY attributes, canonicalizing duplicate removal.
7. Deconstruction of SQL queries into components such as SELECT list, FROM
clause, WHERE clause, GROUP BY, HAVING, etc.

Fig. 1 Flow-chart of the assessment system

8. Component and sub-parts (attributes) matching.
9. Computation of marks using weighted technique.

In Fig. 2, both the instructor and student queries are segregated into elements inclu-
sive of the basic selection clause, where clause predicates, from clause, operators,
etc. These elements are further divided into sub-parts like Predicates, Projections,
Relations, Group By, and Having Clauses. For each component of the instructor
query, the sub-parts from the instructor query are matched with the corresponding
sub-parts from the student query. Missing sub-parts are penalized by giving marks
for that component in proportion to the number of instructor query sub-parts that
are actually present. Extraneous sub-parts in the student query are not penalized;
marks are computed in this manner for each subpart and added to get a mark for each
component [1].
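Purely as an illustration of this component-wise, weighted marking scheme, the following Python sketch deconstructs the two queries of Fig. 2 into component sets and awards marks in proportion to the matched sub-parts. The component names, weights, and resulting total are assumptions made for the example and do not reproduce the actual 6.5-mark computation of the system.

# Minimal sketch of component-wise partial marking (illustrative only).
# Each query is assumed to be already deconstructed into components,
# each component being a set of sub-parts (attributes, predicates, relations).

def partial_marks(instructor, student, weights):
    """Award marks per component in proportion to the matched instructor sub-parts."""
    total = 0.0
    for component, inst_parts in instructor.items():
        if not inst_parts:
            continue
        stud_parts = student.get(component, set())
        matched = len(inst_parts & stud_parts)
        # Extraneous sub-parts in the student query are not penalized.
        total += weights[component] * matched / len(inst_parts)
    return total

# Hypothetical deconstruction of the queries shown in Fig. 2
instructor = {
    "projections": {"Department.Dept_Id", "Employee.Name"},
    "relations":   {"Department", "Employee"},
    "predicates":  {"Department.Employee_Id = Employee.Employee_Id"},
    "order_by":    {"Department.Dept_Id"},
}
student = dict(instructor, order_by={"Employee.Employee_Id"})   # mismatched ORDER BY attribute
weights = {"projections": 2, "relations": 2, "predicates": 2, "order_by": 1}

print(partial_marks(instructor, student, weights))   # 6.0 with these assumed weights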

6 Performance Evaluation Parameter

Following are the parameters (listed after Fig. 2) based on which the performance will be evaluated:

Instructor query:
SELECT Department.Dept_Id, Employee.Name
FROM Department
RIGHT JOIN Employee
ON Department.Employee_Id = Employee.Employee_Id
ORDER BY Department.Dept_Id;

Student query:
SELECT Department.Dept_Id, Employee.Name
FROM Department
RIGHT JOIN Employee
ON Department.Employee_Id = Employee.Employee_Id
ORDER BY Employee.Employee_Id;
Marks: 6.5
Partial Marking Details (sub-parts of the two queries matched component-wise):

Department.Employee_Id = Employee.Employee_Id   |   Department.Employee_Id = Employee.Employee_Id
Department.Dept_Id, Employee.Name               |   Department.Dept_Id, Employee.Name
Department                                      |   Department
Employee                                        |   Employee
Employee                                        |   Employee
Employee.Employee_Id                            |   Department.Dept_Id

Fig. 2 Component-wise partial marking

1. User Friendly Interface: The interface must be easily accessible and comprehen-
sible by the user (student and tutor).
2. Easy integration with existing systems: The system should offer flexibility
with respect to installation and upgradation.
3. Processing time: Time taken in evaluation of queries and displaying aggregate
marks.
The accuracy of the system will be evaluated based on the required execution time
and how well the awarded partial marks correlate with the marks assigned by the
tutor. Unit testing of each program will be performed using black-box testing, followed
by integration, system, and acceptance testing.

7 Experimental Setup

Hardware Requirements: A system with RAM 2 GB and storage 2 GB


Software Requirements: Python, Django
Database used: PostgreSQL.
The user interface proposed in this project is classified into two working modes:
Instructor mode and Student mode. The fundamental aim of the system is to provide
interfaces offering easy navigation and interaction, thus enhancing the educational
experience. The two corresponding modes will maintain a user-specific profile for
every student and teacher, also providing authorization for the same. Furthermore,
the student mode is classified into two profiles: Learning and Assessment.

Fig. 3 Sample user interface
In the instructor mode, the questions and respective solutions will be provided.
The instructor also defines the keywords and assigns the weights. For example, in the
sample GUI given in Fig. 3, the instructor has provided keywords such
as INNER JOIN and GROUP BY. Then, in the student mode, the student can view the
marks allotted to them and the optimal query execution time taken.

8 Conclusion

In the course of our research, we have studied the various approaches contributing
to the SQL query evaluation process, especially highlighting the syntactic
mutations involved. We have tried to address the challenges identified in
conventional grading systems through our developed system. In this work, we are
primarily focusing on standardization and canonicalization techniques to process and
evaluate SQL queries for assignment evaluation, as well as on enhancing the learning
environment by improving efficiency. The system provides a clear indication of the
correctly and incorrectly assessed queries up to a specific efficiency rate for the result
instances tested against the system. Considering 50 queries and comparing the expected
and actual results, the system achieves an efficiency rate of 76%. We have also

successfully studied the flow of the entire system regarding the query processing and
its various techniques involved.

References

1. Chandra, B., Joseph, M., Radhakrishnan, B., Acharya, S., Sudarshan, S.: Automated grading of
SQL queries. In: 2019 IEEE 35th International Conference on Data Engineering (ICDE)
2. Neumann, T., Moerkotte, G.: A combined framework for grouping and order optimization. In:
VLDB, pp. 960–971 (2004)
3. Chandra, B., Chawda, B., Kar, B., Maheshwara Reddy, K.V., Shah, S., Sudarshan, S.: Data
generation for testing and grading SQL queries. VLDB J. 24(6), 731–755 (2015)
4. Paulley, G.N., Larson, P.-A.: Exploiting uniqueness in query optimization. In: CASCON,
pp. 804–822 (1993)
5. Agrawal, P., Chandra, B., Venkatesh Emani, K., Garg, N., Sudarshan, S.: Test data generation
for database applications. In: IEEE 34th International Conference on Data Engineering (ICDE)
(2018)
6. Singporn, P., Vichianroj, P., Trongratsameethong, A.: ASQLAG—automated SQL assignment
grading system for multiple DBMSs. J. Technol. Innov. Tertiary Educ. 1(1), 41–59 (2018)
7. Silberschatz, A., Korth, H.F., Sudarshan, S.: Database System Concepts, 6th edn. McGraw Hill
(2010)
Error Analysis with Customer
Retention Data

V. Kaviya, V. Harisankar, and S. Padmavathi

Abstract Churn prediction is required by most of the service companies to improve
their business. The machine learning approaches concentrate on the selection of
algorithms or features to improve the accuracy of churn prediction. Algorithms to
understand what went wrong and why a prediction is not accurate are needed to
improve the system. This paper gives special attention to the error analysis of those
approaches and the overall analysis of the dataset. This paper analyses the working
of various machine learning approaches for customer retention prediction based on
bank customer’s transaction data. It also gives a detailed error analysis using distance
and similarity metrics like Mahalanobis distance, Hamming distance, and Jaccard
Similarity Score. It provides a ranking for the features in the dataset based on error
analysis and also lists their importance in a quantified manner by removing highly
ranked features.

Keywords Churn prediction · Data mining · Error analysis · Machine learning ·


Similarity metrics

1 Introduction

In the digital era, vast amounts of data are generated from various sources like
healthcare, retail, telecommunications, banking, social networking sites, etc. Due
to the sharp growth of data, researchers and decision-makers often find it difficult
to analyse the data with efficiency and obtain beneficial and worthy conclusions. For

V. Kaviya (B) · V. Harisankar · S. Padmavathi


Department of Computer Science and Engineering, Amrita School of Engineering, Coimbatore,
Amrita Vishwa Vidyapeetham, Coimbatore, India
e-mail: cb.en.u4cse16020@cb.students.amrita.edu
V. Harisankar
e-mail: cb.en.u4cse16016@cb.students.amrita.edu
S. Padmavathi
e-mail: s_padmavathi@cb.amrita.edu

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 709
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_75

any business, customer retention is the primary key to its long-term survival. Cus-
tomer retention is the act of retaining customers by undertaking activities to prevent
customers from defecting to other peer companies. According to the Harvard Busi-
ness Review, a company can raise its profits by 25–85% if its customer retention
rate is increased by 5% [1]. Therefore, companies are in need of accurate analytical
models that can identify the fruitful customers based on their personal, behavioral
and demographic data. The analysis used in this paper is carried out on a dataset of
transactions of bank clients [2]. This paper intends to analyse and inspect the working
of numerous machine learning approaches for the dataset using different evaluation
metrics and to focus on the error analysis based on the wrong predictions to give a
clear understanding of the reasons for the deviation from the general trend and wrong
predictions. This paper begins with Sect. 2 where a detailed assessment of the already
existing methodologies is presented. This section also discusses the limitations of
these methodologies. In Sect. 3, a description of the overall working mechanism of
the study with results of all the approaches along with the error analysis is presented.
Finally, the paper ends with Sect. 4 where conclusions are presented.

2 Literature Review

Most of the studies conducted by the machine learning community on churn predic-
tion have used datasets from Telecom industry [3–6] and very few studies have taken
datasets from Banking industry [7, 8]. In [2], the study compared the results of the
Decision tree algorithm with both the Spark ML package and the Spark MLlib pack-
age in handling enormous data and found that Spark ML package performed better.
In data pre-processing, it can be seen that feature selection, random under-sampling
or over-sampling, data cleaning, feature extraction, standardization and encoding of
categorical and continuous attributes have a significant impact on the prediction of the
model. Prediction techniques like CART, SVM, Random Forest, MLP (Neural Net-
works), Naive Bayes and DT [3, 4, 9] are used to a large extent, and it has been found
that traditional models like DT and SVM perform better compared to the Neural
networks and clustering models.
However, most of the studies in the literature have not considered a study that
covers the error analysis for the wrong predictions. There are various types of errors
related to machine learning and data analytics. The training and testing error samples
are important for error analysis, and testing error samples are considered the most
important since they aid in assessing the potential performance of a given predictive model
on fresh and unseen data. Therefore, the current study is unique considering
all the widely used pre-processing techniques employed into a single study and then
establishing a solid ground for the classification errors and the deviation from the
general trend by performing error analysis using distance and similarity metrics.

3 Methodology

3.1 Dataset

The dataset taken in this analysis contains details of bank clients and is freely accessible
on Kaggle. It consists of transactional details of 10,000 customers. The features of
the dataset include Row Number, Customer ID, Surname, Credit Score (a credit
score is a 3-digit number that quantifies a person’s capacity to pay back the acquired
amount), Geography (the locality of the clients across the three nations where the bank
is working), Gender, Age, Balance, IsActiveMember, Estimated Salary, Tenure (the
time of having the account in months), NumOfProducts (number of accounts the
individual has), HasCrCard (binary variable to indicate whether the client has a credit
card) and Exited (binary variable to denote whether the client has left the bank).

3.2 Data Preprocessing

Studies have demonstrated that preprocessing has a remarkable impact on the pre-
diction of the model. First, the insignificant attributes (Row Number, CustomerID
and Surname) were dropped. Categorical variables like Gender and Geography are
encoded using one-hot encoding, and min-max normalization is adopted for feature
value normalization.
One of the main problems faced by any classification model is class bias where
there is an unequal distribution between the classes of a target variable. SMOTE
(synthetic minority oversampling technique) algorithm is utilized which works by
creating an arbitrary arrangement of minority class samples to move the classifier
learning bias towards minority class. ROC scores have improved from 74.62 to
75.55% for RandomForest after applying SMOTE.
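As a rough illustration of this preprocessing pipeline (dropping identifier columns, one-hot encoding, min-max normalization, and SMOTE), a minimal Python sketch using pandas, scikit-learn, and imbalanced-learn is given below. The file name and the train/test split are assumptions and not taken from the original study.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from imblearn.over_sampling import SMOTE   # provided by the imbalanced-learn package

df = pd.read_csv("Churn_Modelling.csv")    # hypothetical file name for the Kaggle dataset

# Drop the insignificant attributes
df = df.drop(columns=["RowNumber", "CustomerId", "Surname"])

# One-hot encode the categorical variables
df = pd.get_dummies(df, columns=["Geography", "Gender"], drop_first=True)

X, y = df.drop(columns=["Exited"]), df["Exited"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Min-max normalization of the feature values
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# SMOTE: synthesize minority-class (churn) samples in the training set only
X_train_bal, y_train_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)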

3.3 Evaluation Metrics

The execution of the model is assessed utilizing accuracy and AUC-ROC. Accuracy
is the ratio between the correct number of predicted samples and the total number of
samples. ROC-AUC score reveals the ability of the model to differentiate between
classes. Larger the score, better the model is at distinguishing classes.
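Both metrics are available in scikit-learn; the toy values below only demonstrate the calls and are not results from the study.

from sklearn.metrics import accuracy_score, roc_auc_score

# Toy example: true labels, hard predictions, and predicted churn probabilities
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 0, 1, 0, 0, 1]
y_prob = [0.1, 0.3, 0.8, 0.4, 0.2, 0.9]

print("Accuracy:", accuracy_score(y_true, y_pred))   # fraction of correctly predicted samples
print("ROC-AUC :", roc_auc_score(y_true, y_prob))    # computed from scores, not hard labels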

3.4 Observation

The accuracy and ROC-AUC scores for all algorithms are tabulated in Table 1.
From the table, it can be seen that Random Forest and XGBoost gave better accuracy
and ROC-AUC score compared to other algorithms. Random Forest is a cluster of
decision trees. Each node within the decision trees is a condition on one feature, to
group similar values from the dataset. For a classification algorithm, the condition is
based on Gini impurity. The feature importance is computed based on the amount of
influence that each feature has in decreasing the weighted impurity. For example, in
Random Forest, the final feature importance is the average of the values across all
the trees. Feature importance for Random Forest and XGBoost is shown in Fig. 1 and
it can be observed that ‘Age’, ‘NumOfProducts’, ‘Balance’ are the most dominant
features in both algorithms.
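A minimal sketch of how such impurity-based feature importances can be obtained, continuing the variables of the preprocessing sketch in Sect. 3.2; the hyper-parameters are illustrative and the xgboost package is assumed to be available.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train_bal, y_train_bal)
xgb = XGBClassifier(n_estimators=100, random_state=42).fit(X_train_bal, y_train_bal)

# Impurity-based importances, averaged over all trees of each ensemble
importances = pd.DataFrame({
    "feature": X.columns,
    "random_forest": rf.feature_importances_,
    "xgboost": xgb.feature_importances_,
}).sort_values("random_forest", ascending=False)
print(importances)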
Figure 2 gives the feature importance based on Random Forest algorithm when (a)
wrongly classified samples alone are taken and (b) when an equal number of correct
and wrongly classified samples are used. It can be observed that ‘CreditScore’, ‘Age’,
‘Balance’ are the most dominant features in the first scenario. When features were
analysed individually, customers with balance less than 20,000 and customers with
age greater than forty had high retention.
Table 3 shows the number of samples incorrectly classified and the count of
common incorrectly classified samples between the two algorithms. These samples
are considered for error analysis.

Table 1 Accuracy and roc-score of selected models


Model Accuracy Roc-Auc score
RandomForest 87.1 74.62
XGBoost 86.88 74.62
SVM 86.05 68.13
Naive Bayes 78.40 66.51
Logistic regression 81.45 59.68

Fig. 1 Feature importance of XGBoost and RandomForest

Table 2 Legend for Figs. 1 and 2


Label Feature Label Feature
f1 Geography (France) f7 Balance
f2 Geography (Spain) f8 NumOfProducts
f3 CreditScore f9 HasCrCard
f4 Gender f10 IsActiveMember
f5 Age f11 EstimatedSalary
f6 Tenure

Fig. 2 Feature importance using wrongly classified and an equal number of correct and wrong
samples

Table 3 Values wrongly predicted between two algorithms


Country Xgboost RandomForest Common values
France 101 103 90
Spain 60 65 55
Germany 81 90 71

3.5 Error Analysis

In order to identify the common features that caused the errors, we have analyzed
the similarity and distance of the features belonging to the error samples.

3.5.1 Mahalanobis Distance

Mahalanobis distance is a measure which considers the unequal variances and corre-
lations between features to find the distance between two data elements in the space

defined by features. This algorithm has been used for object classification in [10].
Equation (1) explains how to compute the Mahalanobis distance, which was first introduced
in [11].

D² = (x − m)ᵀ C⁻¹ (x − m)    (1)

where D² = Mahalanobis distance, x = vector of data, m = vector of mean values
of independent variables, C⁻¹ = inverse covariance matrix of independent variables,
and T indicates vector should be transposed.
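A small numpy sketch of Eq. (1); estimating the mean vector and the covariance matrix from the data itself is an assumption about how the measure was applied.

import numpy as np

def mahalanobis_sq(x, data):
    """Squared Mahalanobis distance D^2 of vector x from the distribution of `data` (Eq. 1)."""
    m = data.mean(axis=0)                                   # vector of mean values
    c_inv = np.linalg.pinv(np.cov(data, rowvar=False))      # inverse covariance matrix
    d = x - m
    return float(d @ c_inv @ d)

# Toy example with two correlated features
rng = np.random.default_rng(0)
data = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=500)
print(mahalanobis_sq(np.array([2.0, -2.0]), data))          # far from the bulk of the data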

3.5.2 Hamming Distance

Hamming distance is the minimum number of substitutions required to turn one
binary vector into another. Since Hamming distance works on binary data, all continuous values were
binned and converted into the minimum number of categories possible. In [12], Hamming distance
has been used for fault analysis of various circuits.

3.5.3 Jaccard Similarity Score

The Jaccard similarity estimates the likeness between finite sample sets and is charac-
terized as the cardinality of the intersection of the sets divided by the cardinality of the
union of the sample sets. In [13], Jaccard similarity has been used to find dissimilar-
ity between the frames of a video to detect motion wherein pixels are tokenized and
hashed.
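Both measures are available in common Python libraries; the binary vectors below stand for one binned feature over two groups of error samples and are purely illustrative.

import numpy as np
from scipy.spatial.distance import hamming
from sklearn.metrics import jaccard_score

# 1 = the sample falls in the bin, 0 = it does not (illustrative values only)
wrong_exited     = np.array([1, 0, 1, 1, 0, 1, 0, 1])
wrong_not_exited = np.array([1, 1, 1, 0, 0, 1, 0, 1])

# scipy returns the fraction of differing positions; multiply by the length for the count
print("Hamming distance  :", hamming(wrong_exited, wrong_not_exited))
print("Jaccard similarity:", jaccard_score(wrong_exited, wrong_not_exited))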
Mahalanobis distance is calculated between each feature and the target variable
to find the within-class variance. The feature with the highest Mahalanobis distance
is the most confusing. The feature with the maximum between-class variance also
has the minimum within-class variance. Between class variance of each feature is
calculated by using Hamming distance and Jaccard similarity score. Error analysis
sample set is split into two classes (a) Samples which were wrongly predicted as
Exited and (b) Samples which were wrongly predicted as Not-Exited.
Distance measure can be considered as an inverse for similarity measure. The fea-
tures that contribute to wrong prediction are the ones which have smaller values for
distance and larger values of similarity in the error samples. These are the confusing
features that result in wrong prediction. The following details can be concluded from
the observations in Table 4. Concerning Mahalanobis distance, ‘Gender’, ‘Geog-
raphy’ and ‘NumofProducts’ are the confusing features as they have the highest
Mahalanobis distance. With respect to Hamming distance, the similarity in the dom-
inant features, which are ‘Credit Score’ and ‘Balance’, has resulted in the wrong
classification.
It can be observed that ‘Balance’ and ‘NumofProducts’ are the most dominant
features based on Fig. 1. But ‘Balance’ and ‘NumofProducts’ were found to be con-
fusing features based on error analysis. To quantify the importance of these features,
these features were removed and respective results are tabulated in Table 5. There

Table 4 Results of error analysis

Feature               Binned features              Mahalanobis distance   Hamming distance   Jaccard similarity score
Geography (Germany)                                21.4934                0.3846             0.6153
Credit score                                       17.5018
                      Cred_Score(550, 650)                                0.2857             0.7142
                      Cred_Score(750, 850)                                0.2527             0.7472
Gender                                             22.2626                0.4065             0.5934
Balance                                            16.4628
                      Balance (140,000, 260,000)                          0.2637             0.7362
NumOfProducts         NumOfProducts (1, 4)         21.0203                0.3956             0.6043

Table 5 Accuracy and roc-score after removing confusing features


Features removed   RandomForest (Accuracy / Roc-auc score)   XGBoost (Accuracy / Roc-auc score)
None               87.1 / 74.62                              86.88 / 74.62
NumOfProducts      83.95 / 66.35                             84.6 / 66.21
Credit score       86.45 / 73.64                             86.5 / 73.11

is a drop in accuracy from 87.1 to 83.95 and ROC score from 74.62 to 66.35 when
‘NumofProducts’ alone is removed.
The features in the dataset were ranked based on their importance concerning both
RandomForest and XGBoost for the whole sample set and when wrongly classified
samples alone are taken. When the whole sample set was taken, it could be seen
that Age, Credit Score, Balance, and NumOfProducts were the top-ranked features
for both the algorithms. When wrongly classified samples alone were taken, it can
be observed that the same features—Age, Credit Score, and Balance are the most
dominant.

4 Conclusion and Future Work

Error analysis and identifying the features causing the error is very important in this
machine learning age. This paper has considered the Customer retention data for
error analysis. The data is tested and analysed with various classifiers and it has been
observed that Random Forest and XGboost algorithms perform very well for the data
set under consideration. The misclassified data of these two classifiers is considered

for error analysis. The features are ranked based on Gini impurity which is used in
the default Scikit-Learn implementation.
Based on the observations, features that were found to be the confusing features
were ranked higher in the feature importance ranking with respect to both Random
Forest and XGBoost. When these particular features alone were removed, accuracy
dropped to a great extent. This affirms the fact that an algorithm gives more signifi-
cance to an attribute that is not capable of separating a data sample into two classes.
The classifier performance is analyzed and top ranking features are listed under three
circumstances: actual data with less bias, with 50% error samples and with 100%
error samples. This paper lists the possible confusion features that are responsible
for misclassification and compares with the actual data.
The study can be widened by assessing and performing error analysis with datasets
from various sources to identify a general pattern and to check whether this stratified
error analysis can be generalized. Feature ranking for various sampling techniques
and diverse machine learning algorithms needs to be explored to get a clearer under-
standing of their influence on feature ranking.

References

1. Reichheld, F.F., Sasser, E.: Zero defections: quality comes to services. Harvard Bus. Rev. 68(5),
105–111 (1990)
2. Sayed, H., Abdel-Fattah, M.A., Kholief, S.: Predicting potential banking customer churn using
apache spark ML and MLlib packages: a comparative study. Int. J. Adv. Comput. Sci. Appl. 9,
674–677 (2018). https://doi.org/10.14569/IJACSA.2018.091196
3. Sahar, F.: Machine-learning techniques for customer retention: a comparative study. Int. J. Adv.
Comput. Sci. Appl. 9 (2018). https://doi.org/10.14569/IJACSA.2018.090238
4. Au, T., Ma, G., Li, S.: Applying and evaluating models to predict customer attrition using data
mining techniques. J. Comp. Int. Manage. 6(1), 10 (2003)
5. Qureshi, S.A., Rehman, A.S., Qamar, A.M., Kamal, A., Rehman, A.: Telecommunication sub-
scribers’ churn prediction model using machine learning. In: Eighth International Conference
on Digital Information Management (ICDIM 2013), Islamabad, pp. 131–136 (2013). https://
doi.org/10.1109/ICDIM.2013.6693977
6. Umayaparvathi, V., Iyakutti, K.: Applications of data mining techniques in telecom churn
prediction. Int. J. Comput. Appl. 42, 5–9 (2012). https://doi.org/10.5120/5814-8122
7. He, B., Shi, Y., Wan, Q., Zhao, X.: Prediction of customer attrition of commercial banks based
on SVM model. Procedia Comput. Sci. 31, 423–430 (2014). https://doi.org/10.1016/j.procs.
2014.05.286
8. Devi Prasad, U., Madhavi, S.: Prediction of churn behaviour of bank customers using data
mining tools. Indian J. Mark. 42(9), 25–30 (2012)
9. Xia, G., Jin, W.: Model of customer churn prediction on support vector machine. Syst. Eng.
Theor. Pract. 28, 71–77 (2008). https://doi.org/10.1016/S1874-8651(09)60003-X
10. Natarajan, V., Bharadwaj, L.A., Krishna, K.H., Aravinth, J.: Urban objects classification from
HSR -HTIR data using gaussian and mahalanobis distance classifiers. In: Proceedings of the
2018 IEEE International Conference on Communication and Signal Processing (ICCSP 2018),
Chennai, pp. 1041–1045 (2018)
11. Mahalanobis, P.C.: On the generalized distance in statistics. Proc. Natl. Inst. Sci. India 2(1),
49–55 (1936)

12. Chandini, B., Nirmala Devi, M.: Analysis of circuits for security using logic encryption. In:
Thampi S., Madria S., Wang G., Rawat D., Alcaraz Calero J. (eds.) Security in Computing and
Communications. SSCC, Communications in Computer and Information Science, vol. 969.
Springer, Singapore (2018)
13. Srenithi, M., Kumar, P.: Motion detection algorithm for surveillance videos. In: Pandian, D.,
Fernando, X., Baig, Z., Shi, F. (eds.) Proceedings of the International Conference on ISMAC
in Computational Vision and Bio-Engineering 2018 (ISMAC-CVB). ISMAC 2018. Lecture
Notes in Computational Vision and Biomechanics, vol. 30, pp. 955–964. Springer Netherlands
(2019)
Prediction Based Task Scheduling for
Load Balancing in Cloud Environment

Suresh Chandra Moharana, Amulya Ratna Swain, and Ganga Bishnu Mund

Abstract The exponential growth in demand for computing resources laid the
foundation of cloud computing technology. Cloud computing enables the provision
of virtual resources in terms of Virtual Machine (VM)s to service user requests.
The user tasks are scheduled for these VMs for their accomplishment. However, the
services of cloud computing are web-based and hence the workload over the VM
gets updated dynamically. In order to handle the dynamic workload, smarter task
scheduling heuristics need to be incorporated in the cloud models. The absence of
a proper task scheduling scheme may result in uneven load distribution across VMs
leading to inefficient utilization of resources. In this work, a prediction based task
scheduling scheme is proposed that handles the dynamically changing workload
efficiently. It has been seen that the proposed model lessens the load imbalance level
across VMs as compared to the contemporary task scheduling models.

Keywords Task scheduling · VM · Load balancing · MAD

1 Introduction

The increasing demand for computing resources enabled cloud computing to provide
unlimited resources to the end-user. Cloud computing is based on distributed com-
puting concepts and it offers services to users over the web. It arranges infrastructure,
platform, and software as services as per the pay-per-use model [1, 2]. Besides, cloud
computing also reduces the cost of building and managing infrastructure by provid-
ing scalable virtualized resources to the end-user [3]. Virtualization [4] helps cloud

S. C. Moharana (B) · A. R. Swain · G. B. Mund


KIIT Deemed to be University, Bhubaneswar, Odisha 751024, India
e-mail: sureshmoharana@gmail.com
A. R. Swain
e-mail: swainamulya@gmail.com
G. B. Mund
e-mail: mundgb@yahoo.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 719
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_76

computing to provide scalable resources in terms of the Virtual Machine (VM) to the
end-user. The user tasks are allocated to these VMs for their accomplishment. How-
ever, the monstrous growth in the need for these resources is motivating cloud service
providers to use their resources towards fulfilling more user service requests [5, 6].
It may lead to uneven distribution of workload across these VMs leading to ineffi-
cient usage of computing resources. Hence, there is a necessity of distributing the
workload evenly across the VMs leading to load balancing. It will not only use the
resources efficiently but also satisfy the QoS requirements effectively. Further, it is
also observed that the resource allocation to the tasks in the cloud is NP-complete
[7]. So, it is a challenging task to build up a task scheduling model on cloud
resources.
The above-mentioned challenge can be addressed by designing appropriate task
scheduling heuristic that balances the load across VMs leading to the effective usage
of resources. Further, in the cloud model the workload is always unpredictable and
hence requires a dynamic scheduling model to handle the unpredictability. In this
work, the objective is to design a prediction based task scheduling model that leads to
load balancing across the available VMs. At first, the overloaded VMs get selected
by using an upper threshold. The upper threshold is chosen dynamically using a
popular statistical method considering the historic CPU usage information of VMs.
Then, the tasks need to be remapped to the VMs other than the current VM is selected
using a pre-defined heuristic. Finally, the identified tasks are rescheduled to the non-
overloaded VMs from overloaded ones in-order to achieve load balancing. The load
imbalance level parameter is taken into account to compare the proposed scheme
with the contemporary models. After performing extensive experimentation, it has
been seen that the proposed model decreases the load imbalance level marginally as
contrasted with the current methodologies. So, the proposed model not only achieves
load balancing but also handles the dynamic workload efficiently.
The remainder of the paper is sorted out as follows. The next section presents
a summary of the literature closely linked to the current work. Section 3 provides
the details of the proposed system model. In Sect. 4, the assessment of the proposed
model is featured. Finally, the last section highlights the conclusive remarks and
future directions.

2 Related Work

The assignment of tasks to the VMs famously known as task scheduling in the cloud
model has been widely studied in different literature. Among them, scheduling of
tasks for balancing the load in the cloud has taken a reasonable place. As per the literature
[8], the objective of load balancing schemes is not only to distribute the load evenly
across VMs but also to maximize the utilization of computing resources. Milani and
Navimipour [9] have discussed the different load balancing schemes applicable to
the cloud environment. Besides, they have also mentioned the challenges faced by
the load balancing algorithms. Patel et al. [10] also studied the varied load balancing

schemes in the cloud environment. They have classified the mentioned load balancing
schemes into various categories and highlighted the pros as well as cons of each
method.
The resource allocation problem can be classified as of NP-Complete nature. So,
there is a need of developing heuristic and meta-heuristic schemes for addressing this
problem. Freund et al. [11] discussed max–min and min–min heuristic for scheduling
a task in a distributed environment. He et al. [12] suggested a modification of the min-
min heuristic by taking QoS into consideration. The literature [13, 14] has focused
on task scheduling schemes in the cloud environment that takes QoS into account.
Umarani et al. [15] have presented an ant colony optimization based meta-heuristic
model for scheduling of tasks, which, however, suffers from prolonged waiting
time. Cho et al. [16] proposed a hybrid meta-heuristic approach towards addressing
the scheduling problem taking both the ant colony optimization and particle swarm
optimization into consideration. The task scheduling heuristics termed as round-robin
and random are presented in the literature [17, 18]. The work presented by Rimal et
al. [18] used round-robin for even distribution of load across computing resources,
but it does not take loads of VMs into consideration.
It has been observed that the load balancing schemes can also be either migration
based or prediction based in the cloud environment. Task scheduling schemes based
on migration transfers the running tasks from overloaded VMs to the non-overloaded
ones without service disruption. A particle swarm optimization based task mapping
scheme is proposed in [19] in which rather than migrating the overburdened VM
the tasks on the over-burdened VM are moved to achieve load balancing. Wu et al.
[20] have presented a prediction based task mapping scheme relying upon previous
data. The focus of the authors is to predict the VM needs in advance and schedule
the task accordingly to accomplish load balancing. Bala and Chana [21] presented a
predictive approach to identify the overloaded and underloaded VMs and highlighted
a migration based scheme of task scheduling from the over-burdened VMs to the
under-burdened ones to balance the load across VMs. After reviewing the literature,
the following research gaps are identified. It has been observed that prediction based
task scheduling schemes based on statistical techniques are missing. Alongside this,
most literature has considered a single parameter for taking decisions in their model.
It has been also found that the underutilized virtual machines are taken into account
in a few pieces of literature. In this proposed scheme, the authors have addressed
some of the gaps mentioned above. In the next section, the details of the proposed
task scheduling scheme will be discussed.

3 System Design

In the proposed model, the authors have presented a prediction based task mapping
scheme for achieving load balancing.
For this model, the tasks are assumed to be independent of one another. At first,
the underloaded hosts are detected by considering the lower threshold as 15% CPU

usage. The tasks available on these hosts are randomly placed over the available VMs.
The next job is to identify the overloaded VM. In-order to detect the overloaded
VM, an upper threshold is computed based on a statistical method. As suggested
by Beloglazov and Buyya [22], the median absolute deviation (MAD) has been
employed to decide upon the upper threshold. The MAD value is computed by taking
the previous CPU usage values of available VMs and is used to predict the upper
threshold value dynamically. The technique used for computing the MAD value is
mentioned below.

v_MAD = median(| h_CPUi − h_CPU |) for 1 ≤ i ≤ n (1)

where n represents the count of currently active VMs, v_MAD represents the MAD
value, h_CPUi represents the previous CPU usage value of ith CPU, and h_CPU
represents the mean CPU usage value of n active VMs. Then, the upper threshold
value (u_THR) gets predicted using the rule,

u_THR = 1 − k ∗ v_MAD for 0 ≤ k ≤ 1 (2)

In case, the value of k moves towards 0 then u_THR gets a lesser value otherwise
it gets a larger value. The value of k decides the aggressiveness of the VM for
the accomplishment of tasks assigned to it. In the proposed scheme, the k value is
assumed to be the standard value of 0.7. The complete proposed system model is
provided in Fig. 1 for reference. In the second phase of the presented model, the task
selection is taken into account after the selection of the overloaded VMs. For each
task running on the available VMs, a matrix is maintained that will keep the record
of workload (in MIPS) and the priority of each one.

α ∗ t_LOAD + (1 − α) ∗ t_PRT (3)

Fig. 1 Proposed system model



The task with the highest value is selected for migration from the previously selected
VMs. Nevertheless, a random strategy is applied to break ties in task
selection. The α value is meant for keeping the balance between the task load (t_LOAD)
and its priority (t_PRT), and it must be chosen according to the environmental require-
ments. In the proposed model the α value is chosen as 0.6, as it leads to the best results.
It is worth mentioning that, at the VM selection phase, the VMs are segregated into either
overloaded or non-overloaded ones. After the task gets selected from the overloaded
VM, it is migrated to a non-overloaded VM. This process continues in iterations in
order to achieve load balancing across the VMs. As a result, the computing resources
are also utilized efficiently. The next section will highlight the assessment of the pre-
sented model and compare the findings with the existing approaches.
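To make the two phases concrete, the following Python sketch computes the MAD-based upper threshold of Eqs. (1)-(2) and the weighted task-selection score of Eq. (3); the data structures and usage values are illustrative and not taken from the actual implementation.

import numpy as np

def upper_threshold(cpu_history, k=0.7):
    """u_THR = 1 - k * MAD of the recent CPU usage values of the active VMs (Eqs. 1-2)."""
    cpu = np.asarray(cpu_history, dtype=float)
    v_mad = np.median(np.abs(cpu - cpu.mean()))   # median absolute deviation from the mean CPU usage
    return 1.0 - k * v_mad

def task_score(t_load, t_prt, alpha=0.6):
    """Weighted score of Eq. (3); the task with the highest score is selected for migration."""
    return alpha * t_load + (1 - alpha) * t_prt

# Illustrative CPU usage fractions of the active VMs
cpu_usage = [0.35, 0.90, 0.55, 0.20, 0.75]
u_thr = upper_threshold(cpu_usage)
overloaded = [i for i, u in enumerate(cpu_usage) if u > u_thr]

# Tasks on an overloaded VM: (normalized load, priority)
tasks = [(0.4, 0.2), (0.7, 0.9), (0.5, 0.5)]
to_migrate = max(range(len(tasks)), key=lambda i: task_score(*tasks[i]))
print("u_THR:", round(u_thr, 3), "| overloaded VMs:", overloaded, "| task to migrate:", to_migrate)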

4 Experimental Evaluation and Results

The experimental environment takes an HP ProBook system with a Core i5 8th gen
processor, 8 GB of RAM, and Ubuntu 18.04 OS into consideration. The proposed
model is implemented in the Java programming environment.
The collection framework in Java has been used efficiently for realizing the pro-
posed scheme. One list is created in Java for each VM. The CPU requirements of
the user tasks are represented as random values in a pre-defined range, and these
values are stored in the list. It simulates the scheduling of tasks to VMs. Then, the
proposed model executed in many iterations. In the first experiment, 20 VMs are
considered whereas the second experiment considers 40 VMs for analyzing the per-
formance of the presented scheme with contemporary approaches. The performance
metric Load_Imbalance_Level has been used to measure the performance of the pre-
sented model with the existing schemes. The Load_Imbalance_Level parameter can
be mathematically defined as,

Load_Imbalance_Level = ( Σ (c_LOADi − c_LOADi+1) ) / n for 1 ≤ i ≤ n    (4)

where c_LOADi represents the CPU load post applying the proposed scheme. The
values of c_LOADi at intervals of five iterations have been recorded to analyze the
effectiveness of the presented scheme. It has been observed that the presented scheme
outperforms the existing model in terms of Load_Imbalance_Level in both the exper-
iments as shown in Fig. 2. As Load_Imbalance_Level gets reduced in the presented
scheme, the CPU load difference among the active VMs gets minimized as well
leading to load balancing. The next section will present the concluding remarks and
provide future directions as well.
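For reference, Eq. (4) can be evaluated directly from the recorded per-VM CPU loads; the small sketch below takes the absolute difference between consecutive VMs, which is one interpretation of the formula.

def load_imbalance_level(c_load):
    """Average CPU-load difference between consecutive VMs (Eq. 4)."""
    n = len(c_load)
    return sum(abs(c_load[i] - c_load[i + 1]) for i in range(n - 1)) / n

print(load_imbalance_level([0.60, 0.55, 0.58, 0.62]))   # small value indicates well-balanced VMs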

Fig. 2 Performance assessment of proposed model

5 Conclusion

The scheduling of user service requests to the virtual machines in the cloud environ-
ment to balance the load is extensively studied in the literature. However, prediction
based task scheduling using the statistical approaches can be an extra addition. In this
work, a prediction based task scheduling scheme is presented based on the median
absolute deviation. The work aims to reduce the CPU load difference among the
active VMs to achieve load balancing among the VMs. The experiment results sug-
gest that the presented model outperforms the contemporary scheduling schemes in
terms of load imbalance level. In the future, the plan is to incorporate multiple param-
eters like memory usage, network bandwidth into the proposed model and analyze
its performance.

References

1. Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R., Konwinski, A., Lee, G., Patterson,
D., Rabkin, A., Stoica, I., Zaharia, M.: A view of cloud computing. Commun. ACM 53(4),
50–58 (2010)
2. Armbrust, M., Fox, A., Griffith, R., Joseph, A., Katz, R., Konwinski, A., Lee, G., Patterson, D.,
Rabkin, A., Stoica, I., Zaharia, M.: Above the Clouds: A Berkeley View of Cloud Computing.
Rep. UCB/EECS-2009-28, University of California at Berkley, USA (2009)
3. Yousafzai, A., Gani, A., Noor, R.M., Sookhak, M., Talebian, H., Shiraz, M., Khan, M.K.:
Cloud resource allocation schemes: review, taxonomy, and opportunities. Knowl. Inf. Syst.
50(2), 347–381 (2017)
4. Panda, B., Moharana, S.C., Das, H., Mishra, M.K.: Energy aware virtual machine consoli-
dation for load balancing in virtualized environment. In: 2019 International Conference on
Communication and Electronics Systems, pp. 180–185. IEEE, India (2019)
5. Singh, S., Chana, I.: Cloud resource provisioning: survey, status and future research directions.
Knowl. Inf. Syst. 49(3), 1005–1069 (2016)
6. Arunarani, A., Manjula, D., Sugumaran, V.: Task scheduling techniques in cloud computing:
a literature survey. Future Gener. Comput. Syst. 91, 407–415 (2019)
7. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-
Completeness. Freeman, San Francisco (1979)
8. Subrata, R., Zomaya, A.Y., Landfeldt, B.: Game-theoretic approach for load balancing in
computational grids. IEEE Trans. Parallel Distrib. 19(1), 66–76 (2008)
9. Milani, A.S., Navimipour, N.J.: Load balancing mechanisms and techniques in the cloud envi-
ronments: systematic literature review and future trends. J. Netw. Comput. Appl. 71, 86–89
(2016)
10. Patel, D.K., Tripathy, D., Tripathy, C.R.: Survey of load balancing techniques for grid. J. Netw.
Comput. Appl. 65, 103–119 (2016)
11. Freund, R.F., Gherrity, M., Ambrosius, S.L., Campbell, M., Halderman, M., Hensgen, D.A.,
Keith, E.G., Kidd, T., Kussow, M., Lima, J.D., Mirabile, F., Moore, L., Rust, B., Siegel, H.J.:
Scheduling resources in multi-user, heterogeneous, computing environments with SmartNet.
In: Proceedings of the 7th Heterogeneous Computing Workshop, pp. 184–199. IEEE, USA
(1998)
12. He, X., Sun, X., Von, L.G.: QoS guided min-min heuristic for Grid task scheduling. J. Comput.
Sci. Technol. 18(4), 442–451 (2003)
13. Wu, X., Deng, M., Zhang, R., Zeng, B., Zhou, S.: A task scheduling algorithm based on
QoS-driven in cloud computing. Procedia Comput. Sci. 17, 1162–1169 (2013)
14. Ali, H.G.E.D.H., Saroit, I.A., Kotb, A.M.: Grouped tasks scheduling algorithm based on QoS
in cloud computing network. Egypt. Inform. J. 18(1), 11–19 (2017)
15. Umarani, S.G., Maheswari, V.U., Shanthi, P., Siromoney, A.: Tasks scheduling using ant colony
optimization. J. Comput. Sci. 8(8), 1314–1320 (2012)
16. Cho, K.M., Tsai, P.W., Tsai, C.W., Yang, C.S.: A hybrid meta-heuristic algorithm for virtual
machine scheduling with load balancing in cloud computing. Neural Comput. Appl. 26(6),
1297–1309 (2014)
17. Lee, Y.C., Zomaya, A.Y.: Energy efficient utilization of resources in cloud computing systems.
J. Supercomput. 60(2), 268–280 (2012)
18. Rimal, B.P., Choi, E., Lumb, I.: A taxonomy and survey of cloud computing systems. In: Fifth
International Joint Conference on INC, IMS and IDC, pp. 44–51. IEEE, South Korea (2009)
19. Ramezani, F., Lu, J., Hussain, F.K.: Task-based system load balancing in cloud computing
using particle swarm optimization. Int. J. Parallel Prog. 42(5), 739–754 (2013)
20. Wu, H.S., Wang, C.J., Xie, J.Y.: TeraScaler ELB—an algorithm of prediction-based elastic
load balancing resource management in cloud computing. In: 27th International Conference
on Advanced Information Networking and Applications Workshops, pp. 649–654. IEEE, Spain
(2013)

21. Bala, A., Chana, I.: Prediction-based proactive load balancing approach through virtual machine
migration. Eng. Comput. 32(4), 581–592 (2016)
22. Beloglazov, A., Buyya, R.: Optimal online deterministic algorithms and adaptive heuristics
for energy and performance efficient dynamic consolidation of virtual machines in cloud data
centers. Concurr. Comput.: Pract. Exp. 24(13), 1397–1420 (2011)
Test Case Generation Using
Adequacy-Based Genetic Algorithm

Ruchika Malhotra and Shivani Pandey

Abstract Generating test cases is one of the most time- and effort-consuming prob-
lems in software testing. Many efforts have been made to automate this problem so
as to make the procedure of software testing more efficient. The major part of these
solutions involves the use of evolutionary techniques. Genetic algorithm is associ-
ated with automating the problem of test case generation since early 1990s. This
paper presents an alternative way of using genetic algorithm for test case genera-
tion. It involves adequacy-based approach where the mutants are incorporated into
the source code while generating the test cases. This approach will not only help in
producing efficient results but also will reduce ample amount of time taken in the
process. The results show that the intended approach undergoes an effective decline
in the obtained number of test cases when compared to the path testing approach.

Keywords Test case · Test case generation · Evolutionary techniques · Genetic


algorithm · Adequacy-based approach

1 Introduction

In the conventional life cycle of developing software, the software testing procedure
takes nearly half of the development budget, more than half of the total development
time, and maximum effort compared to all the other phases [1]. The process of soft-
ware testing comprises three main phases considering test cases: (i) generation,
(ii) execution, and (iii) evaluation which can be singularly described as: Test case
generation is the process which involves developing the relevant test cases in accor-
dance with a particular software system. Further, the test cases are executed for the
verification of software functionalities in the process called test case execution. The

R. Malhotra · S. Pandey (B)


Department of Computer Science and Engineering, Delhi Technological University, New Delhi,
India
e-mail: shivanipnd459@gmail.com
R. Malhotra
e-mail: ruchikamalhotra@dtu.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 727
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_77

third phase, known as test case evaluation, involves recording the test cases which
were useful and provided value to the entire process of software testing [2].
The most crucial one out of all these phases is test case generation as it takes
the maximum cost, effort, and time out of all the three phases [1], and its execution
requires a certain level of expert knowledge. The process of automating test case
generation can greatly reduce the overall development time and cost of the software
testing procedure which in turn will reduce the time and cost of overall software
development procedure [3].
There has been a lot of research done in the field involving automation of
software testing. Most of the work regarding automation of test case generation
involves the use of evolutionary techniques. Genetic algorithm is the most used
evolutionary technique in automatic test case generation [4]. It imitates the process
of the natural biological evolution built on the notion: ‘survival of the fittest’ given
by Charles Darwin. This algorithm involves an initial population which evolves into
a better population in each generation by allowing the reproduction of the individ-
uals having high fitness and discarding the individuals with low fitness values.
It involves having an initial population which is checked for its fitness followed by
selection of parents having high fitness values to generate a new population which
contains the offsprings using the process of crossover and mutation of genes. This new
population is then evaluated. This algorithm is iterated until a pre-decided stopping
criterion is met [5].
Software testing is broadly categorized into functional testing and structural
testing. The process of functional testing is used for checking the functionalities of
the software system [6], and structural testing emphasizes the correctness in structure
or hierarchy of the system [7]. Structural testing is further classified into reliability-
based criteria and adequacy-based criteria. The reliability-based criteria outputs a
reliable set of test cases which proves the correctness of the program while adequacy-
based criteria brings out the fault finding capacity of the test suite generated [8]. This
paper uses adequacy-based testing criteria along with incorporating mutation anal-
ysis alongside the process of test case generation which will save a substantial amount
of time in the whole process. It is an extension of work in [9] and uses the concept
with better technology and enriched dataset to get efficient results which prove the
fidelity of the technique.
The rest of the paper is systematized as: Sect. 2 covers methodology employed
in the paper. Section 3 discusses experimental studies, technologies and parametric
settings, and the results obtained by the process. Section 4 carries the conclusions
and the possibilities of the future work.

2 Methodology

This paper proposes an adequacy-based criteria for generation of the test cases. The
most common practice which examines test case adequacy is ‘mutation analysis’
which usually is done after we have generated the test cases. This paper proposes a
technique which uses mutation analysis alongside test case generation, which saves an
ample amount of time and automatically generates adequate test cases as the output.
The typical process of genetic algorithm is shown in the form of a flowchart in
Fig. 1.

Fig. 1 Steps of genetic algorithm in the form of a flowchart diagram (Initial population → Fitness
evaluation → Selection procedure → Mutation/Crossover → Stopping condition; if the condition
is not met, the loop repeats, otherwise the process ends)
In the proposed method, we first generate mutants in the program by making some
slight variations in the source code. Then, we record the difference in the original
and mutated statement of the source code and generate the respective constraints
accordingly. The solution to these constraints would represent the test cases. Subse-
quently, using the rules given in [10], we construct a fitness function for the source
code. This fitness function is then fed to the genetic algorithm for the generation of
the test cases. Now, this process has the capability to kill the other mutants, along
with the current one, recorded at the initial level. So we will now examine the status
of other mutants. If any mutant is still alive, we will repeat the process until all the
mutants are killed. Figure 2 shows the steps involved in the proposed process.

1. Identify the working source code for a program.
2. Generate the mutants in definite statements of the source code.
3. For a particular mutant, precisely record the differences in the mutated and original statements.
4. Generate the constraints for the program according to the recorded differences.
5. Generate the fitness function for the program using the constraints generated.
6. Feed the generated fitness function to genetic algorithm module for generating the test data.
7. If all other mutants are killed by the generation of current test data; end the process.
8. Else, consider the next live mutant and go to step 3.
9. Repeat step 7 and 8 until all the mutants are processed.
10. End.

Fig. 2 Steps showing the proposed method in detail

Table 1 Details of the dataset

S. No. Program Description LOC
P1 snake.c Snake and bits game 564
P2 bike.c Bike racing game 581
P3 pacman.cpp Pacman game 647
P4 sal.c Snake and ladder game 867
P5 heli.cpp Helicopter attack game 959
P6 helilearn.cpp Helicopter driving game 1048
P7 fortell.cpp Fortune teller game 1059
P8 tank.c Tank game 1092

3 Experimental Studies

3.1 Dataset

In this work, we have used source codes of eight real time programs, which lie in
the range of 564–1092 lines of code. All these programs are developed in C/C++
language. All the selected source codes are of game-based programs and are fully
functional. The details of the programs used are shown in Table 1.

3.2 Fitness Function Construction

Once we have finalized the dataset, we have to consider each program source code
individually and apply the process described in Sect. 2 on the same until the desired
results are obtained.
Mutant generation. A mutant in a source code is introduced by intentional alter-
ation of the source code [11]. For each program considered, we have chosen five
mutants by introducing a slight variation in the source code. While choosing the
mutants, we have to make sure that we choose a mutant whose execu-
tion will lead the program onto a different path than the expected path. This is how
the mutant will be identified and killed. We have recorded each of these mutants
carefully along with its processing status during the execution. For each mutant, a
total of 10 runs are iterated in each program.
Constraint Generation. This is the crucial step which will ensure the correctness
of the obtained test data [8]. After the generation of mutants, we record the differences
obtained in the mutated statement with the original statement of the source code. We
record these differences in a specific format to generate the constraints for the mutant.
The solutions to these constraints will give us the required test cases for the specific
program.
Fitness function. The main element in genetic algorithm is its fitness function
value. Based on this value of fitness function, the algorithm decides the goodness
measure of each individual element in the population [12]. The performance of
the algorithm solely depends upon how effective is the associated fitness function.
Henceforth, the step of generating the fitness function is the most crucial in the
execution of genetic algorithm. The procedure adopted for the construction of fitness
function is the one followed in [9]. It will generate the fitness function for the mutant
in consideration, which will then be fed to the genetic algorithm module of Python to
get the desired results.
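The actual construction rules are those of [9, 10]; purely as an illustration, the sketch below shows a branch-distance style fitness for a hypothetical mutant that replaces the condition x > 10 with x >= 10, so the constraint distinguishing the two program versions is x == 10.

# Illustrative only: the original condition (x > 10) is mutated to (x >= 10);
# the two programs behave differently exactly when x == 10.

def fitness(x):
    """Branch-distance style fitness: 0 when the constraint x == 10 is satisfied."""
    return abs(x - 10)

# A genetic algorithm minimizes this distance (or maximizes its reciprocal);
# any individual reaching fitness 0 is a test case that kills the mutant.
print(fitness(3), fitness(10))   # 7 0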

3.3 Parametric Settings

The settings of the adequacy-based algorithm are as follows:


1. Initial population size: 100
2. Genetic Algorithm software: GA module of Python.
3. Selection Technique: Roulette wheel selection.
4. Representation scheme: Double vector.
5. Crossover technique: Single point crossover.
6. Mutation rate: 0.01
7. Crossover probability: 0.09
8. Number of generations (maximum): 40
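Under the settings listed above, a compact, self-contained genetic algorithm loop might look as follows; this is a sketch, not the actual GA module used in the experiments, and the fitness is the reciprocal form of the branch distance from the previous illustration.

import random

POP_SIZE, GENERATIONS, MUT_RATE, CX_PROB = 100, 40, 0.01, 0.09
LOW, HIGH, N_VARS = -100.0, 100.0, 2                 # double-vector representation

def fitness(ind):
    # Reciprocal branch distance for the hypothetical constraint x == 10 on the first variable
    return 1.0 / (1.0 + abs(ind[0] - 10))

def roulette(pop, fits):
    # Roulette wheel selection: probability proportional to fitness
    pick, acc = random.uniform(0, sum(fits)), 0.0
    for ind, f in zip(pop, fits):
        acc += f
        if acc >= pick:
            return ind
    return pop[-1]

def crossover(p1, p2):
    if random.random() < CX_PROB and N_VARS > 1:
        point = random.randint(1, N_VARS - 1)        # single-point crossover
        return p1[:point] + p2[point:], p2[:point] + p1[point:]
    return p1[:], p2[:]

def mutate(ind):
    return [random.uniform(LOW, HIGH) if random.random() < MUT_RATE else g for g in ind]

pop = [[random.uniform(LOW, HIGH) for _ in range(N_VARS)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    fits = [fitness(ind) for ind in pop]
    next_pop = []
    while len(next_pop) < POP_SIZE:
        c1, c2 = crossover(roulette(pop, fits), roulette(pop, fits))
        next_pop += [mutate(c1), mutate(c2)]
    pop = next_pop[:POP_SIZE]

best = max(pop, key=fitness)
print("Best individual:", best, "fitness:", fitness(best))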

3.4 Experimental Results

For each program mentioned in Table 1, we have computed test cases for the proposed
technique by taking five mutants in each source code, generating their respective
constraints and fitness functions, and subsequently feeding the same to the genetic algorithm
module of Python. For comparing the efficiency of this adequacy-based technique,
we have used the most used technique from reliability testing criteria: ‘path testing
technique’ [13]. For this technique, we have constructed control flow graphs for

Table 2 Total test cases generated as per adequacy-based and reliability-based testing techniques

S. No.   Program source code                        Adequacy-based technique   Reliability-based technique
                                                    (proposed technique)       (path testing technique)
P1       snake.c (Snake and bits game)              43                         111
P2       bike.c (Bike racing game)                  64                         104
P3       pacman.cpp (Pacman game)                   117                        199
P4       sal.c (Snake and ladder game)              33                         415
P5       heli.cpp (Helicopter attack game)          124                        172
P6       helilearn.cpp (Helicopter driving game)    107                        179
P7       fortell.cpp (Fortune teller game)          100                        373
P8       tank.c (Tank game)                         165                        241

each of the program source codes mentioned in Table 1 and then chose five unique
paths from those CFGs [14]. We have followed the method given in [9] for the
construction of fitness values for each of these paths. Upon the construction of these
fitness functions, we will feed them to our genetic algorithm module as we did for
the proposed technique; and in the same manner this technique is also iterated
for 10 runs.
Table 2 shows the total number of test cases generated by both the techniques,
namely adequacy-based technique (proposed technique) and reliability-based tech-
nique (path testing technique). Now, for each of the five mutants and paths selected
in each program, the total number of unique test cases is recorded for each of the
10 runs. Then, we have taken the average of the values for each mutants or path
separately. Any value obtained in decimal is approximated by rounding off to the
floor value to get an integral value. After this, we have taken the sum of all these
approximated values. These are the values shown in Table 2 (result of the sum of
approximated average values of all the 10 runs for mutants/paths for each program).
The comparison between both techniques is done on the basis of two measures:
(i) total number of test cases generated and (ii) time taken for generating the test
cases.
Here, the proposed method has generated a considerably smaller number of test cases
when compared with the path testing technique. The main reason is that the
proposed technique only generates adequate test cases while the path testing technique
generates adequate and non-adequate test cases as well. DeMillo and Offutt have stated
that an adequate test case set is responsible for the failure of all faulty versions
of the considered program [6]. Adequacy mainly focuses on detection of faults by the
Test Case Generation Using Adequacy-Based Genetic Algorithm 733

Fig. 3 Percentage reduction obtained in the number of test cases (bar chart; y-axis: % reduction in test cases, x-axis: program number 1–8)

test case set rather than on proving correctness, and this makes it a better alternative. We compare both techniques on the basis of the percentage reduction in test cases, which is calculated by the formula used in [9], shown as Eq. 1 below:
shown as Eq. 1 below:

percentage reduction = (TPT − TOT) / TPT    (1)

Here, TPT denotes the number of test cases generated by the path testing technique
and TOT denotes the number of test cases generated by our proposed technique.
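As a minimal illustration (not part of the original experiment; the dictionary and variable names are our own), the following Python snippet applies Eq. 1 to the Table 2 values, which are copied directly from the table above.

# Percentage reduction of Eq. 1 applied to the Table 2 data:
# TOT = test cases from the proposed technique, TPT = test cases from path testing.
table2 = {
    "P1 snake.c": (43, 111),
    "P2 bike.c": (64, 104),
    "P3 pacman.cpp": (117, 199),
    "P4 sal.c": (33, 415),
    "P5 heli.cpp": (124, 172),
    "P6 helilearn.cpp": (107, 179),
    "P7 fortell.cpp": (100, 373),
    "P8 tank.c": (165, 241),
}

for program, (tot, tpt) in table2.items():
    reduction = (tpt - tot) / tpt * 100.0  # Eq. 1, expressed as a percentage
    print(f"{program}: {reduction:.2f}% reduction")

Up to rounding, this reproduces the 27.9–92.04% range reported for Fig. 3 (the extremes come from P5 and P4, respectively).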
The bar chart in Fig. 3 shows the comparison between the two techniques based on the above-stated formula. Figure 3 shows that a 27.9–92.04% reduction is observed in the number of test cases yielded by the proposed technique in comparison to the reliability-based technique.
For the path testing method, five unique paths are taken into consideration for each program, and a fitness function is constructed for each of these five paths separately before feeding them to the genetic algorithm module. However, in the proposed technique, we have chosen five mutants, which upon execution kill one or more of the other mutants. Therefore, we have to construct the fitness function only for those mutants which are not killed by their fellow mutants. For instance, if out of 5 mutants, 3 are killed in the process, we have to construct the fitness function only for two mutants, saving 60% of the time compared with the path testing method. Hence, a considerable amount of time is saved in the proposed technique compared with the time taken in the path testing technique. The bar chart in Fig. 4 shows the percentage reduction in time for the proposed method on the considered dataset when compared to the path testing technique.
Figure 4 shows that the time taken in generating the test cases undergoes a 20–60% reduction in the proposed technique in comparison to the reliability-based technique.

Fig. 4 Percentage reduction in time taken in generating the test cases (bar chart; y-axis: % reduction in time, x-axis: program number 1–8)

4 Conclusion

In this work, a genetic algorithm has been used for the generation of adequate test cases. We have adopted the concept of generating test cases along with simultaneous mutation analysis so as to automatically generate adequate test cases and to save additional time. We have taken a rich dataset of eight real-time programs that range up to 1092 lines of source code. We implemented the technique on this dataset and obtained promising results. The results of the implementation were compared against the most widely used reliability-based technique, the path testing technique. Considering the number of generated test cases, we have recorded a reduction of up to 92.04% in comparison to the path testing technique, which is substantially better than the parent research. In terms of the time taken to generate these test cases, we have recorded up to 60% savings. Hence, in both comparison criteria, the proposed technique has shown significantly better results than reliability-based testing. Thus, we can say the proposed technique is a promising technique in the area of automatic test case generation.
The future scope of this work includes the implementation of other heuristic algorithms following the concept of the proposed adequacy-based technique, such as particle swarm optimization, the bat algorithm, the artificial bee colony algorithm, etc., to verify the efficiency of the technique independent of the algorithm. Further, an automatic tool can be developed for implementing the proposed technique to save the time and effort of developing the code at each attempt.

References

1. Beizer, B.: Software testing techniques. Dreamtech Press (2003)


2. Singh, Y.: Software Testing. Cambridge University Press (2012)

3. McMinn, P.: Search based software test data generation: a survey. Softw. Test. Verif. Reliab.
14(2), 105–156 (2004)
4. Chuaychoo, N., Kansomkeat, S.: Path coverage test case generation using genetic algorithms.
J. Telecommun. Electron. Comput. Eng. (JTEC) 9(2–2), 115–119 (2017)
5. Korel, B.: Automated software test generation. IEEE Trans. Softw. Eng. 16(8), 870–879 (1990)
6. Duran, J.W., Ntafos, S.C.: An evaluation of random testing. IEEE Trans. Softw. Eng. 10(4),
438–443 (1984)
7. Jones, B.F., Sthamer, H.H., Eyres, D.E.: Automatic structural testing using genetic algorithms.
Softw. Eng. J. 299–306 (1996)
8. DeMillo, R., Offutt, A.J.: Constraint-based automatic test data generation. IEEE Trans. Softw.
Eng. 17(9), 900–910 (1991)
9. Malhotra, R., Garg, M.: An adequacy based test data generation technique using genetic
algorithms. J. Inf. Process. Syst. 7(2), 363–384 (2011)
10. Chen, Y., Zhong,Y.: Automatic path-oriented test data generation using a multi-population
genetic algorithm. In: Fourth International Conference on Natural Computation, pp. 566–570.
IEEE (2008)
11. Haga, H., Suehiro, A.: Automatic test case generation based on genetic algorithm and mutation
analysis. In: IEEE International Conference on Control System, Computing and Engineering,
pp. 119–123. IEEE (2012)
12. Xanthakis, S., Ellis, C.: Application of genetic algorithm to software testing. In: Proceedings
of 5th International Conference on Software Engineering and its Applications, pp. 625–636.
Toulouse, France (1992)
13. Nirpal, P.B., Kale, K.V.: Using genetic algorithm for automated efficient software test case
generation for path testing. Int. J. Adv. Netw. Appl. 2(6), 911–915 (2011)
14. Dahal, K., Hossain, A.: Test data generation from UML state machine diagrams using GAs.
In: International Conference on Software Engineering Advances, pp. 834–840. IEEE (2007)
Performance Analysis of π, AL and CT
for Consistency Regularization Using
Semi-Supervised Learning

Rishita Choubey and Koushik Bhattacharyya

Abstract A semi-supervised learning problem starts with a series of labeled data points as well as some data points for which labels are not known. The primary motive of a typical semi-supervised model is to categorize some of the unlabeled data by means of the labeled information set. The training procedures used in the semi-supervised learning paradigm show different levels of consistency loss. These losses are mainly caused by small disruptions or disturbances of the inputs and parameters, and they help improve generalization performance in comparison to supervised learning. This research article analyzes the performance of the π model, active learning (AL) and interpolation consistency training (ICT).

Keywords Semi-supervised learning · π model · Active learning · Interpolation consistency training · Consistency regularization · Supervised learning · Neural network · Consistency-based model

1 Introduction

Deep learning models [1] yield better results when trained with an ample amount of supervised data. In real-life scenarios, obtaining a large-scale labeled dataset can be challenging, since the construction of such datasets is usually cost-incurring. Here semi-supervised learning [2, 3] comes into play. By virtue of standard development procedures like reinforcement learning models or generative adversarial networks (GANs) [4, 5], large-scale labeled datasets can be composed, and their potency can be further improved by the implementation of consistency-enforcing models. These models are trained using unlabeled data and aim at stabilizing the predictions on being subjected to input perturbations. These are widely used for training audio

R. Choubey (B) · K. Bhattacharyya


Computer Science and Engineering, Dream Institute of Technology, Kolkata, India
e-mail: rishitachoubey999@gmail.com
K. Bhattacharyya
e-mail: koushikbhattacharyya123@gmail.com


recognition models and are further utilized in research-oriented fields of medicine and other technologies. After a detailed analysis, we have determined the accuracy of the following consistency-enforcing models [6–8], based on the error percentage in their predictions. Hence, our research article presents a performance analysis of the best-known consistency regularization models, the π model, the AL model and the ICT model, applied to elementary neural network approaches using conventional datasets, namely CIFAR-10 and SVHN. This gives a clear perception of the three models and of the best among the three.

2 Literature Review

The demand for consistency-based models is rapidly increasing due to the wide use of semi-supervised learning. One example of such consistency-based models has been proposed by Samuli Laine and Timo Aila [4], where self-ensembling [9] is introduced. Here, concurrent predictions for unknown labels are made using the outputs of the network at various time intervals. The proposed theory relies heavily on the concepts of dropout regularization and input augmentation [10]. The drawback here was that dropout, when increased beyond a certain threshold, results in overfitting or underfitting of the model. Intuitively, a higher dropout rate introduces variance in some of the layers, which eventually degrades the overall performance of the model. Hence, dropout is not used much nowadays. On the other hand, input augmentation is also computationally very expensive (only rotation and scaling are cheap). Another model, grounded on training consistency-based methods using stochastic weight averaging (SWA), was proposed by Ben Athiwaratkun, Marc Finzi, Pavel Izmailov and Andrew Gordon Wilson. It is a recent approach where a modified learning rate is used and the mean of the weights along the trajectory of stochastic gradient descent (SGD) [7] is evaluated. But SWA does not give optimum predictions and can often be slow for large datasets, depending on the learning rate of the SGD. Further research includes the concept of active learning (AL) [11], authored by Mingfei Gao, Zizhao Zhang, Guo Yu, Sercan O. Arik, Larry S. Davis and Tomas Pfister. This is a combination of data labeling and model training. Minimization of the labeling cost is achieved by primarily selecting data of higher value. In AL models based on the pool mechanism, easily accessible unlabeled data are used for selection purposes, but not for training a model. The performance of the AL model is better than that of the π model, but it cannot be considered the optimum model, since the error rate of this model is also considerably elevated.

3 Consistency-Based Models
In the semi-supervised setting, we have access to labeled data $D_L = \{(x_i^L, y_i^L)\}_{i=1}^{N_L}$ and unlabeled data $D_U = \{x_i^U\}_{i=1}^{N_U}$. Given two perturbed inputs $x'$, $x''$ of $x$ and the perturbed weights $\omega'_f$ and $\omega'_g$, the consistency loss penalizes the difference between the predicted probabilities $f(x'; \omega'_f)$ and $g(x''; \omega'_g)$.
This loss is typically the mean squared error or KL divergence:
$\ell_{\mathrm{cons}}^{\mathrm{MSE}}(\omega_f, x) = \big\| f(x'; \omega'_f) - g(x''; \omega'_g) \big\|^2$
or
$\ell_{\mathrm{cons}}^{\mathrm{KL}}(\omega_f, x) = \mathrm{KL}\big( f(x'; \omega'_f) \,\|\, g(x''; \omega'_g) \big) \quad (1)$

The total loss used to train the model can be written as:

$L(\omega_f) = \underbrace{\sum_{(x,y) \in D_L} \ell_{CE}(\omega_f, x, y)}_{L_{CE}} \;+\; \lambda \underbrace{\sum_{x \in D_L \cup D_U} \ell_{\mathrm{cons}}(\omega_f, x)}_{L_{\mathrm{cons}}} \quad (2)$

where, for classification, $L_{CE}$ is the cross-entropy between the model predictions and the supervised training labels. The parameter $\lambda > 0$ controls the relative importance of the consistency term in the overall loss.
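As a hedged NumPy sketch (our own illustration; the function names and the toy probability vectors are not from the cited papers), Eqs. (1) and (2) can be combined as a cross-entropy term over labeled examples plus a λ-weighted MSE consistency term over pairs of perturbed predictions:

import numpy as np

def mse_consistency(p1, p2):
    # l_cons^MSE of Eq. (1): squared distance between two perturbed predictions.
    return float(np.sum((p1 - p2) ** 2))

def cross_entropy(p, y):
    # l_CE for one labeled example: negative log-probability of the true class.
    return float(-np.log(p[y] + 1e-12))

def total_loss(labeled, unlabeled, lam):
    # Eq. (2): labeled = [(p(x'), y), ...]; unlabeled = [(p(x'), p(x'')), ...].
    # In the full formulation the consistency sum also runs over labeled inputs.
    l_ce = sum(cross_entropy(p, y) for p, y in labeled)
    l_cons = sum(mse_consistency(p1, p2) for p1, p2 in unlabeled)
    return l_ce + lam * l_cons

# toy example with 3-class probability vectors
labeled = [(np.array([0.7, 0.2, 0.1]), 0)]
unlabeled = [(np.array([0.6, 0.3, 0.1]), np.array([0.5, 0.4, 0.1]))]
print(total_loss(labeled, unlabeled, lam=1.0))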

3.1 Π-model

The Π-model can also be seen as a simplification of the Γ-model of the ladder network by Rasmus et al. [2, 12, 13], a previously presented network architecture for semi-supervised learning.
Algorithm (Π-model pseudocode):
Require: x_i = training stimuli.
Require: L = set of training input indices with known labels.
Require: y_i = labels for labeled inputs i ∈ L.
Require: w(t) = unsupervised weight ramp-up function.
Require: f_θ(x) = stochastic neural network with trainable parameters θ.
Require: g(x) = stochastic input augmentation function.
for t in [1, num_epochs] do
  for each minibatch B do
    z_i∈B ← f_θ(g(x_i∈B)), evaluate network outputs for augmented inputs.
    z̃_i∈B ← f_θ(g(x_i∈B)), again, with different dropout and augmentation.
    loss ← −(1/|B|) Σ_{i∈(B∩L)} log z_i[y_i] (supervised loss component)
           + w(t) · (1/(C|B|)) Σ_{i∈B} ||z_i − z̃_i||² (unsupervised loss component).
    update θ using, e.g., ADAM (update network parameters).
  end for
end for
return θ.
The network is evaluated for each training input x_i twice, resulting in prediction vectors z_i and z̃_i. The loss function consists of two components. The first component is the standard cross-entropy loss, evaluated for labeled inputs only. The second component, evaluated for all inputs, penalizes different predictions for the same training input x_i by taking the mean squared difference between the prediction vectors z_i and z̃_i. To combine the supervised and unsupervised loss terms, the latter is scaled by the time-dependent weighting function w(t).
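To make the pseudocode above concrete, here is a hedged PyTorch-style sketch of one Π-model minibatch step (our own illustration, not the original implementation); model, augment, labeled_mask, w_t and optimizer are assumed to exist, the model is assumed to output logits, and every minibatch is assumed to contain at least one labeled example.

import torch
import torch.nn.functional as F

def pi_model_step(model, augment, x, y, labeled_mask, w_t, optimizer):
    # Two stochastic forward passes: different augmentation and dropout each time.
    z1 = model(augment(x))
    z2 = model(augment(x))

    # Supervised component: cross-entropy on the labeled subset only
    # (the model is assumed to return logits).
    sup = F.cross_entropy(z1[labeled_mask], y[labeled_mask])

    # Unsupervised component: mean squared difference between the two
    # softmax predictions, scaled by the ramp-up weight w(t).
    unsup = F.mse_loss(F.softmax(z1, dim=1), F.softmax(z2, dim=1))

    loss = sup + w_t * unsup
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()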

3.2 Consistency-Based Semi-Supervised AL (Active


Learning)

It includes incorporating a semi-supervised learning (SSL) objective in the training


phases of AL [8, 11]. This model is based on minimizing the notion of sensitivity
to perturbations with the idea of inducing “consistency,” i.e., imposing similarity in
predictions when the input is perturbed in a way that would not change its perceptual
content. For consistency-based semi-supervised training, a common choice of loss
is:

$L_u(x, M) = D\big( P(\hat{Y} = l \mid x, M),\, P(\hat{Y} = l \mid \tilde{x}, M) \big) \quad (3)$

where D is a distance function such as KL divergence, or L2 norm [4, 14] and x̃


denotes a perturbation of the input x.
Specifically, it proposes a simple metric C that measures the inconsistency across perturbations. There are various ways to quantify consistency; the following is used due to its empirically observed superior performance:

$C(B, M) = \sum_{x \in B} \varepsilon(x, M), \quad \text{where} \quad \varepsilon(x, M) = \sum_{l=1}^{J} \mathrm{Var}\big( P(\hat{Y}=l \mid x, M),\, P(\hat{Y}=l \mid \tilde{x}_1, M),\, \ldots,\, P(\hat{Y}=l \mid \tilde{x}_N, M) \big) \quad (4)$

J is the number of response classes and N is the number of perturbed samples of the
original input data x, {x̃1 , …, x̃ N }.
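A small NumPy sketch of Eq. 4 follows (an illustration of ours; the probability values are toy numbers and the perturbation step is assumed to happen elsewhere): for each sample, the class-wise variance of the predictions over the original input and its N perturbed copies is summed over the J classes, and the batch score sums over samples.

import numpy as np

def sample_inconsistency(probs):
    # probs: array of shape (N + 1, J) holding the predicted class probabilities
    # for the original input and its N perturbed copies.
    # epsilon(x, M) of Eq. 4: per-class variance across copies, summed over classes.
    return float(np.var(probs, axis=0).sum())

def batch_inconsistency(batch_probs):
    # C(B, M) of Eq. 4: sum of epsilon(x, M) over the samples in the batch B.
    return sum(sample_inconsistency(p) for p in batch_probs)

# toy example: one sample, J = 3 classes, N = 2 perturbed copies
p = np.array([[0.70, 0.20, 0.10],
              [0.60, 0.30, 0.10],
              [0.65, 0.25, 0.10]])
print(sample_inconsistency(p))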
Algorithm (AL-model pseudocode):

Require: x_i = training stimuli.
Require: L = set of training input indices with known labels.
Require: y_i = labels for labeled inputs i ∈ L.
Require: w(t) = unsupervised weight ramp-up function.
Require: f_θ(x) = stochastic neural network with trainable parameters θ.
Require: g(x) = stochastic input augmentation function.
for t in [1, num_epochs] do
  for each minibatch B do
    z_i∈B ← f_θ(g(x_i∈B)), evaluate network outputs for augmented inputs.
    z̃_i∈B ← f_θ(g(x_i∈B)), again, with different dropout and augmentation.
    loss ← −(1/|B|) Σ_{i∈(B∩L)} log z_i[y_i] (supervised loss component)
           + w(t) · (1/(C|B|)) Σ_{i∈B} ||z_i − z̃_i||² (unsupervised loss component).
    update θ using, e.g., ADAM (update network parameters).
  end for
end for
return θ.

where D is a distance function such as KL divergence and x̃ denotes a perturbation


(augmentation) of the input x. C is the selection criterion for better integration of AL
selection mechanism in the SSL training framework. N is the number of perturbed
samples of the original input data x, {x̃1 , …, x̃ N }.

3.3 ICT Model

Interpolation consistency training (ICT) is a simple and computationally efficient algorithm for training deep neural networks [15] in the semi-supervised learning paradigm. The ICT method encourages the prediction at an interpolation of unlabeled points to be consistent with the interpolation of the predictions at those points. ICT moves the decision boundary to low-density regions of the data distribution in classification problems. It has been observed that ICT's performance, when applied to standard neural network architectures, is optimal [9, 16] on the CIFAR-10 and SVHN benchmark datasets.
Algorithm (ICT-model pseudocode):
Require: f_θ(x): neural network with trainable parameters θ.
Require: f_θ′(x): mean teacher with θ′ equal to a moving average (MA) of θ.
Require: D_L(x, y): collection of the labelled samples.
Require: D_UL(x): collection of the unlabelled samples.
Require: α: rate of moving average.
Require: w(t): ramp function for increasing the importance of consistency regularization.
Require: T: total number of iterations.
Require: Q: random distribution on [0, 1].
Require: Mix_λ(a, b) = λa + (1 − λ)b.
for t = 1, ..., T do
  Sample {(x_i, y_i)}_{i=1}^B ~ D_L(x, y). Sample labeled minibatch.
  L_S = CrossEntropy({(f_θ(x_i), y_i)}_{i=1}^B). Supervised loss (cross-entropy).
  Sample {u_j}_{j=1}^U, {u_k}_{k=1}^U ~ D_UL(x). Sample two unlabeled minibatches.
  {ŷ_j}_{j=1}^U = {f_θ′(u_j)}_{j=1}^U, {ŷ_k}_{k=1}^U = {f_θ′(u_k)}_{k=1}^U. Compute fake labels.
  Sample λ ~ Q. Sample an interpolation coefficient.
  u_m = Mix_λ(u_j, u_k), ŷ_m = Mix_λ(ŷ_j, ŷ_k). Compute interpolation.
  L_US = ConsistencyLoss({(f_θ(u_m), ŷ_m)}_{m=1}^U). e.g., mean squared error.
  L = L_S + w(t) · L_US. Total loss.
  g_θ ← ∇_θ L. Compute gradients.
  θ′ = αθ′ + (1 − α)θ. Update moving average of parameters.
  θ ← Step(θ, g_θ). e.g., SGD, Adam.
end for
return θ.
ICT regularizes semi-supervised learning by encouraging consistent predictions

$f(\alpha u_1 + (1 - \alpha) u_2) = \alpha f(u_1) + (1 - \alpha) f(u_2) \quad (5)$

at interpolations $\alpha u_1 + (1 - \alpha) u_2$ of unlabeled points $u_1$ and $u_2$. ICT learns $f_\theta$ in a semi-supervised manner. ICT uses $f_{\theta'}$, where $\theta'$ is an exponential moving average of $\theta$. During training, the predictions satisfy

$f_\theta\big(\mathrm{Mix}_\lambda(u_j, u_k)\big) \approx \mathrm{Mix}_\lambda\big(f_{\theta'}(u_j), f_{\theta'}(u_k)\big) \quad (6)$

together with correct predictions for labeled examples $x_i$. Given a mixup [8, 17] operation:

$\mathrm{Mix}_\lambda(a, b) = \lambda \cdot a + (1 - \lambda) \cdot b \quad (7)$

interpolation consistency training (ICT) trains a prediction model $f_\theta$ to provide consistent predictions at interpolations of unlabeled points:

$f_\theta\big(\mathrm{Mix}_\lambda(u_j, u_k)\big) \approx \mathrm{Mix}_\lambda\big(f_{\theta'}(u_j), f_{\theta'}(u_k)\big) \quad (8)$

where $\theta'$ is a moving average of $\theta$.
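To make the ICT update concrete, here is a hedged PyTorch-style sketch of one training step (our own illustration; student, teacher, the minibatches, lam ~ Q, w_t and optimizer are assumed to be supplied by the surrounding training loop, and the models are assumed to output logits):

import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, alpha):
    # theta' = alpha * theta' + (1 - alpha) * theta  (moving-average teacher)
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(alpha).add_(p_s, alpha=1.0 - alpha)

def ict_step(student, teacher, x_l, y_l, u_j, u_k, lam, w_t, optimizer, alpha=0.999):
    # Supervised loss L_S on the labeled minibatch.
    sup = F.cross_entropy(student(x_l), y_l)

    # Fake labels from the mean teacher, then mixup of the two unlabeled batches.
    with torch.no_grad():
        y_j = F.softmax(teacher(u_j), dim=1)
        y_k = F.softmax(teacher(u_k), dim=1)
    u_m = lam * u_j + (1 - lam) * u_k  # Mix_lambda(u_j, u_k)
    y_m = lam * y_j + (1 - lam) * y_k  # Mix_lambda(y_j, y_k)

    # Consistency loss L_US: student prediction at the interpolation
    # versus the interpolated fake label (mean squared error).
    unsup = F.mse_loss(F.softmax(student(u_m), dim=1), y_m)

    loss = sup + w_t * unsup
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(teacher, student, alpha)  # theta' <- alpha*theta' + (1-alpha)*theta
    return loss.item()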



4 Experiments

4.1 Implementation

It is a common practice in experiments on semi-supervised learning, in order to cut down the labeling expense, to label only a minor part of the training data and leave the rest unlabelled. To achieve accurate predictions, standardized procedures [18, 19] were used. The CIFAR-10 dataset comprises 60,000 color images of size 32 × 32 each, split between 50 K training and 10 K test images. The ten classes of the dataset comprise images, mainly of common objects around us, namely buildings, dogs, ships and cars. Similarly, there are 73,257 training samples and 26,032 testing samples available in the SVHN dataset, each of dimension 32 × 32. Each sample is a detailed image of the number of a particular house (0–9 as the ten classes mentioned earlier). Further, in the case of CIFAR-10, we resize each image by zero-padding it on each side by 2 px. The obtained image is then restored back to its original size (32 × 32) by arbitrary cropping; a new image is obtained this way. Likewise, for SVHN, each image is resized by zero-padding each side of it by 2 px, and again the same cropping is done, producing a new image of resolution 32 × 32 px. This process is then followed by zero-mean and unit-variance image whitening.
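A hedged torchvision-style sketch of the padding-and-cropping augmentation described above follows (our own illustration; the normalization mean/std values are placeholders and would in practice be computed per dataset):

from torchvision import transforms

# Zero-pad each side by 2 px, randomly crop back to 32 x 32, then standardize
# to zero mean and unit variance (the whitening step described above).
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=2, padding_mode='constant', fill=0),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),  # placeholder statistics
])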
At first, it is observed that the most useful samples for consistency regularization are located adjacent to the decision boundary. If a small perturbation δ is added to such an unsupervised sample u_j, it can push the resultant u_j + δ to the opposite side of the decision boundary. Hence, u_j + δ can be considered a good point for the application of consistency regularization techniques, since it violates the low-density separation assumption. On the contrary, the unlabeled points far from the decision boundary, i.e., having higher margins, do not experience these violations. Hence, for a low-margin unlabeled point u_j, a particular perturbation δ can cause u_j and u_j + δ to lie on opposite sides of the decision boundary. Instead, consider the interpolation u_j + δ = Mix_λ(u_j, u_k) (Fig. 1 shows an example), where u_k is another randomly chosen unlabeled instance. There are three possibilities: first, the unlabeled samples u_j and u_k belong to the same cluster; second, the samples are located in separate clusters but are parts of a single class; and third, the samples are situated in separate clusters and belong to different classes. This observation suggests that interpolations between arbitrary unlabeled samples are likely to lie in low-density areas. Hence, consistency-based regularization can be applied to such interpolations. Intuitively, it can be concluded that decision boundaries having a large margin yield better results. Mixup is one of the methods where prediction models are
compelled to behave linearly between samples; in this way, the decision boundary and the classes have a substantial distance between them. The model f_θ, on being trained, predicts the "fake label" Mix_λ(f_θ′(u_j), f_θ′(u_k)) at location Mix_λ(u_j, u_k), and the mixup is extended to the semi-supervised learning setting (Fig. 1 shows an example), where θ′ is considered to be the MA (i.e., the cumulative average over the groups of subsets

Fig. 1 Proposed block diagram for consistency regularization using semi-supervised learning, where l_i → correct predictions for labeled images; u_j, u_m, u_k → low-margin unlabeled sample images (block diagram: labeled image i → supervised loss; unlabeled images j and k mixed into m → consistency loss; total = supervised loss + consistency loss)

of the main dataset) of θ. By virtue of SGD (stochastic gradient descent), at each iteration t the parameters θ are revised so as to minimize L = L_S + w(t) · L_US, where L_S is the loss incurred by the typical cross-entropy mechanism on the supervised samples D_L. Besides that, L_US is the interpolation consistency regularization term. These consistency losses are reckoned over minibatches, which can be either supervised or unsupervised, and the significance of L_US (the consistency regularization term) is amplified over successive iterations through the ramp function w(t). The unlabeled samples u_j and u_k form a pair of minibatches, by sampling which L_US is evaluated; further, their fake labels are assessed accordingly.

4.2 Performance Analysis

The experiment is conducted by means of CNN-13 (ConvNet-13) along with the Wide Residual Network 28-2 architecture. The CNN-13 architecture is considered the standard benchmark architecture in recent state-of-the-art SSL methods [20]. We are using its variant (i.e., the input layer is devoid of any additional Gaussian noise), as instantiated in the Π-model [7]. The results, i.e., the error percentages, are provided for the CIFAR10 and SVHN datasets (see Table 1). In each experiment, we report attributes like the mean, variance and standard deviation across 3 independently run trials. The value of the consistency coefficient w(t) is initially set to 0.0. Later on, at a quarter of the total number of epochs, the value is elevated to its highest possible limit by executing the typical sigmoid routine. The loss in consistency was

Table 1 Outcomes for various models on CIFAR10 (5000 labels) and SVHN (1000 labels) datasets
S. No. | Models | CIFAR10, 5000 labeled / 60,000 unlabeled (test error %) | SVHN, 1000 labeled / 76,277 unlabeled (test error %)
1 | π | 29.66 ± 2.34 | 17.21 ± 3.01
2 | AL | 18.67 ± 1.23 | 9.01 ± 1.01
3 | ICT | 6.79 ± 0.12 | 2.54 ± 0.04

determined by computing the sum of squared distances between our target variable
and predicted values (mean squared error).
Table 1 shows outcomes for various well-known consistency regularization
models on CIFAR10 (5000 labels) and SVHN (1000 labels) datasets. It is observed
that for CIFAR10 as well as SVHN datasets, CT attains better results compared to
other models. An SSL algorithm can be evaluated by comparing its performance
against a novel algorithm using supervised learning. Hence, this research article
shows an effective comparison of three sophisticated algorithms, administered as π ,
CT, and AL in Table 1. After successful completion of the experiment, it is observed
that CT method outperforms this test as compared to other models and results in
a twofold reduction in the error obtained in the case of CIFAR10, and a drastic
reduction of four-folds is detected for SVHN dataset.
Additionally, in Table 1, it is perceived that CT considerably cuts down the
test error as compared to robust SSL approaches. For example, for 5000 samples
(labeled), it brings down the error percentage of the best-affirmed approach by almost
25%. In general, it is noticed that for a handful of data having labels, lesser the values
of the max-consistency coefficient and α, better the validation errors were obtained.
For SVHN, CT obtains test errors are competent concerning other well-known SSL
methods (Table 1). SSL algorithm, which uses the WRN-28-2, brings out the least
error percentage obtained for either of these algorithms. To find the actual efficiency
of CT contrary to these semi-supervised learning algorithms, the experiments were
conducted on Wide ResNet-28-2 architecture. The outcomes are jotted down in Table
1. CT proves to be more efficient on CIFAR10 and SVHN datasets as compared to
other models.

5 Conclusion

Machine learning [1] has had a radical influence in various domains, yet its application is often constrained by the high cost of labeled data. Advancements in SSL techniques [18] bridge the gap for those implementations where obtaining labeled data is cost-incurring. In this article, we have conducted a performance analysis of the best-known consistency-based models, namely π, AL, and CT, using CIFAR 10 and SVHN, from which we have observed that CT yields the optimal result (having the least prediction error). CT has two benefits when compared to other methods using semi-supervised learning. Firstly, it is a very simple model demanding almost no surplus computation, which is not the case for techniques using adversarial perturbations or generative model training. Secondly, it outpaces robust reference points on two standard datasets, in spite of not requiring extensive hyperparameter tuning.

References

1. Cohn, D., Atlas, L., Ladner, R.: Improving generalization with active learning. Mach. Learn.
15(2), 201–221 (1994)
2. Rasmus, A., Berglund, M., Honkala, M., Valpola, H., Raiko, T.: Semi-supervised learning with
ladder networks. In: Advances in Neural Information Processing Systems (2015)
3. Sajjadi, M., Javanmardi, M., Tasdizen, T.: In regularization of stochastic transformations and
perturbations for deep semi-supervised learning. In: Proceedings of the International Confer-
ence on Neural Information Processing Systems, NIPS’16, pp. 1171–1179, USA, 2016. Curran
Associates Inc. ISBN: 978-1-5108-3881-9
4. Laine, S., Aila, T.: Temporal ensembling for semi-supervised learning. In: International Conference on Learning Representations (2017)
5. Berthelot, D., Carlini, N., Goodfellow, I., Papernot, N., Oliver, A., Raffel, C.: Mixmatch: A Holistic Approach to Semi-Supervised Learning (2019)
6. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
7. Athiwaratkun, B., Finzi, M., Izmailov, P., Wilson, A.G.: There are many consistent explanations of unlabeled data. In: International Conference on Learning Representations
8. Luo, J., Zhu, M., Li, Y.R., Zhang, B.: Smooth neighbors on teacher graphs for semi-supervised
learning. In: CVPR (2018)
9. French, G., Mackiewicz, M., Fisher, M.: Self-ensembling for visual domain adaptation. In: International Conference on Learning Representations (2018)
10. Mohammadi, M., Al-Fuqaha, A., Guizani, M., Oh. J.: Semisupervised deep reinforcement
learning in support of IoT and smart city services. IEEE Internet Things J. 5(2), 624–635
(2018). https://doi.org/10.1109/JIOT.2017.2712560
11. Chapelle, O., Schlkopf, B., Zien, A.: Semi-Supervised Learning, 1st edn. The MIT Press (2010).
ISBN 0262514125, 9780262514125
12. Yazici, Y., Foo, C.-S., Winkler, S., Yap, K.-H., Chandrasekhar, V.: The Unusual Effectiveness of Averaging in GAN Training (2018)
13. Gao, M., Zhang, Z., Yu, G., Arik, S.O., Davis, L.S., Pfister, T.: Consistency-Based Semi-
Supervised Active Learning: Towards Minimizing Labeling Cost (2019). arXiv:1910.07153
14. Berthelot, D., Raffel, C., Roy, A., Goodfellow, I.: Understanding and Improving Interpolation
in Autoencoders Via an Adversarial Regularizer (2019)
15. Bachman, P., Alsharif, O., Precup, D.: Learning with pseudo-ensembles. In: Advances in Neural
Information Processing Systems, pp. 3365–3373 (2014)
16. Goodfellow, I.J., Vinyals, O., Saxe, A.M.: Qualitatively characterizing neural network optimization problems. In: International Conference on Learning Representations (2015)
17. Clanuwat, T., Bober-Irizar, M., Kitamoto, A., Lamb, A., Yamamoto, K., Ha, D.: Deep Learning
for Classical Japanese Literature. arXiv:1812.01718 (2018)
18. Oliver, A., Odena, A., Raffel, C., Cubuk, E.D., Goodfellow, I.: Realistic evaluation of deep semi-supervised learning algorithms. In: ICLR Workshop (2018)

19. Park, S., Park, J., Shin, S.-J., Moon, I.-C.: Adversarial dropout for supervised and semi-
supervised learning. In: AAAI (2018)
20. Balcan, M.-F., Broder, A., Zhang, T.: Margin based active learning. In: international Conference
on Computational Learning Theory, pp. 35–50. Springer (2007)
An Energy-Efficient PSO-Based Cloud
Scheduling Strategy

Ranga Swamy Sirisati, M. Vishnu Vardhana Rao, S. Dilli Babu,


and M. V. Narayana

Abstract Cloud computing provides useful services to users with extensive and scalable resources that are virtualized over the Internet. It is defined as a collection of the communication and computing resources located in the data-center. The on-demand service is subject to QoS, load balance, and certain other constraints with a direct effect on the user's consumption of resources that are controlled by this cloud infrastructure. It is considered a popular method, as it has several advantages that are provided by a cloud infrastructure. The cloud scheduling algorithm's primary goal is to bring down the time taken for completion (the cost of execution) of the task graph. The start time and the finish time of the task nodes influence the task graph completion time (the cost). The sort order of the task nodes is an essential aspect that influences the start time and the finish time of every task node. In a hybrid cloud, efficient particle swarm-based cloud scheduling is important because users need to maintain the security of the hybrid cloud. Different scheduling algorithms have been suggested by researchers for the cloud. This paper proposes particle swarm optimization (PSO)-based optimal cloud scheduling. Effective results are obtained with the proposed energy-efficient fuzzy PSO-based cloud scheduling.

Keywords Cloud scheduling · Particle swarm optimization · Cloud tasks · Load balance · Fuzziness

R. S. Sirisati (B) · M. Vishnu Vardhana Rao · S. Dilli Babu


Department of CSE, Vignan’s Institute of Management and Technology for Women, Hyderabad,
India
e-mail: sirisatiranga@gmail.com
M. Vishnu Vardhana Rao
e-mail: mvvraomca31@gmail.com
S. Dilli Babu
e-mail: dillibooks@gmail.com
M. V. Narayana
Department of CSE, Guru Nanak Institutions Technical Campus, Hyderabad, India
e-mail: mvnarayanacse@gmail.com


1 Introduction

Cloud computing delivers computing services and resources that include user applications, processing power, networks, specialized corporate services, and data storage space. Cloud computing permits users to make use of software and hardware managed by cloud service providers without knowledge of the underlying servers. The main advantage of moving to the cloud is the scalability of the application. Unlike grids, cloud resource scalability permits real-time provisioning of resources to meet the application's requirements. The various cloud services, such as storage and the capacity for data transfer, are used to bring down expenses. New scheduling approaches have been proposed to cope with the properties of the network between the clients and their resources. They may use a part of the ideas of traditional planning and combine them with other techniques to ensure efficient scheduling [1]. Typically, these tasks are scheduled according to the client's needs. Originally, scheduling algorithms were executed on grids; the reduced performance faced in grids also creates a need for actualizing scheduling in clouds. Further, this enables workflow management systems to meet the QoS requirements of applications, as opposed to the conventional approach needed earlier in common multi-client grid conditions. The various cloud services, such as the data transfer capacity, the resources, the processing, and the storage, have to be accessible at a low cost. Such environments are not easy to set up on grid resources: each framework site has a different setup, which can result in additional effort each time the application is ported to another site [2].
The VMs further permit an application developer to make a completely customized and convenient environment for their application, as shown in Fig. 1. A traditional way to do this is to make use of all clients' immediate undertakings as the base of overhead applications. The main issue is the association between the overhead application and the different ways in which overhead expenses arise for resources found in cloud systems. In the case of a significant number of these straightforward assignments, the cost can be decreased compared with what it would be for complex tasks. This on-demand service given by the cloud results in the necessity of newer scheduling strategies. These are proposed by combining some traditional concepts of scheduling with new schedule parameters like the cost of efficient scheduling, job migration, energy consumption, and bandwidth [3].

1.1 Performance Parameters of Cloud Scheduling

Several scheduling parameters are discussed below:

The make-span (completion time): the time difference between the start and the finish of an entire sequence in a schedule.

Fig. 1 A cloud scheduling strategy

The resource cost: this is determined in a cloud by the capacity of the resources and the time they are occupied; more powerful resources result in a higher cost.
Scalability: the capacity to deal with and perform under an increased workload, with the capability to enhance the resources effectively.
Reliability: the system's ability to continue working even in a situation of failure.
Resource utilization: a parameter that defines the effectiveness of the utilization of the resources by the system.

1.2 Existing Algorithms

Based on a straightforward method of classification, these algorithms are grouped into two: batch-mode heuristic scheduling algorithms (BMHA) and online-mode heuristic algorithms (OMHA). BMHA places jobs in queues and creates sets based on time slices; the jobs are collected and scheduled within the predefined time slices. The following are some of the models in this classification:
• First come first served (FCFS) scheduling algorithm.
• Round robin (RR) scheduling algorithm.
• Min–Min scheduling algorithm.
• Max–Min scheduling algorithm.
The second model contains numerous jobs that are scheduled at the time they arrive. In the case of a cloud, the situation is different: the speed of every workstation varies quickly, and online-mode heuristic scheduling algorithms are therefore well suited to the cloud environment. The most fit task scheduling algorithm (MFTF) is an ideal example of an online-mode heuristic scheduling algorithm. The process of scheduling in a cloud has three stages:
• Resource discovering and filtering—the agent of the data-center is aware of the present status of the resources available in the cloud and of the remaining resources that could be available. These resources may generally be the VMs.
• Resource selection—on the basis of the obtained facts about the status of the resources and the queued jobs, the cloud scheduler makes decisions relating to the creation and deletion of cloud nodes (VMs) so as to be well suited to the set of jobs to be run.
• Job submission—in this stage, the job is submitted to the selected resource.

2 Literature Review

Static scheduling versus dynamic scheduling: in the case of static scheduling, all information on the status of the resources available in the cloud and on the needs of the jobs is known in advance. In the case of dynamic scheduling of task allocation, the jobs enter dynamically, and the scheduler works hard on the decision making for allocating resources within a stipulated time. The main improvement of dynamic scheduling over static scheduling is that the system does not need to know the runtime behavior of the application before running it. In centralized scheduling, there is a centralized scheduler (or a set of distributed schedulers) to make global scheduling decisions. Therefore, there is more resource control: the scheduler can continuously monitor all the available resources with some ease of implementation. The disadvantage, however, is the lack of scalability, performance, and fault-tolerance.
In decentralized scheduling, there is no central entity that controls the resources. Here, the lower-level schedulers, called the local resource machines (LRM), manage and maintain the various job queues. Energy-aware scheduling has seen rapid enhancement in cloud computing, and large-scale data-centers play a vital role in cloud computing. The depletion of energy in these distributed systems (DS) is now a prominent issue that has been receiving much attention. Most of the application scheduling approaches have not considered the cost of energy for the network devices; they also have not considered the cost of energy of the devices that account for a large portion of the power consumption in the enormous data-centers. Models have therefore been developed to minimize the energy consumption of the servers and the devices of the network. Gang scheduling is an efficient time-sharing job scheduling approach applied in parallel and distributed systems; in this way, every job needs many processors for a certain amount of parallelism, and the tasks are executed based on arrival and dispatch. In the cloud-computing setup, using job migration with mutable workloads, job sizes and job types fits high-performance computing in the cloud. The different methodologies proposed by different authors are arranged in Table 1.

3 Problem Statement

The scheduling algorithms that are prevalent in clouds include time-based and cost-based scheduling algorithms. A novel compromised time-cost scheduling algorithm has been proposed that considers cloud-computing characteristics to accommodate instance-intensive, cost-constrained workflows by compromising between the execution time and the cost, with user input enabled on the fly. Particle swarm optimization (PSO)-based heuristics for the scheduling of workflow applications: there is a PSO-based heuristic for scheduling applications onto cloud resources that considers both the computation cost and the data transmission cost. It is used for a workflow application by varying its computation as well as its communication cost. The experimental results showed that PSO was able to achieve good cost savings and a proper distribution of the workload over the resources. An improved cost-based algorithm for task scheduling: its aim is an efficient mapping of the available tasks in the cloud. There is a need to improve the traditional activity-based costing, which has been proposed as a task scheduling strategy in a cloud environment. This algorithm divides the user tasks into three lists based on the priority of the tasks. It measures the resource cost and the computation performance, thus improving the computation-to-communication ratio.

4 Proposed Energy-Efficient PSO (EPSO) Based Cloud


Optimal Scheduling

Several strategies and algorithms for cloud scheduling have recently been proposed. The benefits of cloud-computing technology depend heavily on the cloud scheduling method or algorithm. Fuzzy genetic-based cloud scheduling compares favorably with older fuzzy-based cloud scheduling. Here, cloud scheduling based on an optimization algorithm is implemented by an efficient particle swarm in order to improve on fuzzy genetic-based task scheduling. Efficient particle swarm optimization is used to make cloud scheduling more effective; the PSO algorithm flow is shown in Fig. 2.

Table 1 Various scheduling algorithms with parameters
Author (Refs.) | Methodology proposed | Parameters worked | Algorithm theme | Computing model
[4] | Pheromone strategy of updation | Completion time | self-reliant | Grid model
[5] | Pheromone strategy of updation | Reliability, completion time, execution cost | Large-scale workflows | Grid model
[6] | Initialization of methods | Deadline constraint, execution cost | Workflows | Grid simulation model
[7] | Pheromone strategy of updation | Deadline constraint, execution cost | Time-varying workflows | Grid model
[8] | Internal search | Completion time | Workflows | Cluster model
[9] | Types of ant agents | SLA constraints of throughput, power usage, response time | self-reliant | All
[10] | Basic ant colony optimization | Scheduling | self-reliant | CloudAnalyst
[11] | Historical scheduling to forecast future demand of cloud resources | Power usage | VM placement | Own Java-based simulation toolkit
[12] | Vector algebra | Resource utilization, power usage | VM placement | Cloud simulation model
[13] | Ant colony optimization | Deadline constraint, execution cost | Workflows | Grid model
[14] | Load balancing in VM | Scheduling, energy consumption, SLA violation | self-reliant | CloudSim
[3] | Ant colony optimization | Completion time of the last job | self-reliant | Grid model
[2] | Load balancing in VM | Completion time, load balancing | self-reliant | CloudSim
[15] | Pheromone strategy of updation | Completion time, load balancing | self-reliant | Grid simulation model
[16] | Pheromone nature in virtual machines | Power usage | VM placement | Cloud simulation model
[17] | Load hot spots utilization ACO | Scheduling | self-reliant | Cloud model
[18] | Pheromone strategy of updation | Completion time | self-reliant | Grid simulation model (GridSim toolkit)
[19] | Pheromone strategy of updation | Scheduling | self-reliant | Not mentioned
[20] | Online environment | Response time, throughput | VM placement | CloudSim
[21] | Basic ant colony optimization | Completion time | self-reliant | Cloud simulation model (CloudSim toolkit)
[22] | ACO and PSO (hybrid) | Resource utilization ratio, completion time | Workflows | Cloud simulation model (MatLab 7.0)
[23] | Dynamic load balancing | Scheduling | self-reliant | Java

Particle swarm optimization is a population-based adaptation strategy that mimics the social behavior of fish schools or bird flocks. In the PSO system, each candidate solution is called a particle. Each particle moves with a velocity through the search space, which it adjusts dynamically depending on its own experience and the experience of its companion particles. Mathematically, the particles are updated by the following Eqs. (1) and (2):

$C_{id}(x + 1) = w_i \cdot C_{id}(x) + p_1 q_1 \,[c_{id}(x) - m_{id}(x)] + p_2 q_2 \,[C_{gd}(x) - m_{id}(x)] \quad (1)$

$P_{id}(x + 1) = P_{id}(x) + A \cdot C_{id}(x + 1) \quad (2)$

Here, C_id and P_id denote, respectively, the velocity and position of the ith particle over the cloud resource positions at iteration x, w_i is the previous (inertia) weight, p and q are the particle's cognitive and societal factor parameters, A is a step factor, and c_id, C_gd and m_id refer to the particle's personal best, the global best, and its current position components.
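A hedged Python sketch of the velocity and position update of Eqs. (1) and (2) follows (our own illustration using standard PSO naming; the inertia weight, acceleration coefficients, step factor and the toy particle encoding are assumptions, not values taken from this paper):

import random

def pso_update(position, velocity, personal_best, global_best,
               w=0.7, c1=1.5, c2=1.5, step=1.0):
    # Eq. (1): inertia term plus cognitive (personal best) and social (global best)
    # attraction terms with random factors; Eq. (2): move by the new velocity
    # scaled by the step factor A.
    new_velocity, new_position = [], []
    for x, v, pb, gb in zip(position, velocity, personal_best, global_best):
        r1, r2 = random.random(), random.random()
        v_next = w * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)
        new_velocity.append(v_next)
        new_position.append(x + step * v_next)
    return new_position, new_velocity

# toy usage: a particle encoding, e.g., task-to-VM assignment scores
pos, vel = [0.2, 0.8, 0.5], [0.0, 0.0, 0.0]
pos, vel = pso_update(pos, vel, personal_best=[0.3, 0.7, 0.6], global_best=[0.4, 0.9, 0.4])
print(pos, vel)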

Fig. 2 The proposed PSO algorithm flow

5 Experimental Results

The performance of the proposed method was analyzed using the CloudSim simulation toolkit. The CloudSim toolkit supports researchers in the cloud-computing environment and was released by the Cloud Computing and Distributed Systems (CLOUDS) Laboratory at the University of Melbourne. Additionally, it provides features for modeling and simulating cloud-computing environments. In CloudSim, user tasks are represented as cloudlets. Each cloudlet contains file size attributes and a number of instructions to execute. The fuzzy-based genetic algorithm (FGA), standard PSO, basic genetic algorithm (GA), and fuzzy logic (FL) approaches are used for comparison with the proposed EPSO algorithm. The results are presented in Tables 2 and 3, and the corresponding bar charts are shown in Figs. 3 and 4.
The results show that the broker schedules the virtual machines associated with the cloudlets, which is the advantage of creating agent-driven processes. In CloudSim, virtual machines are created on hosts described in the VM class; the creation of hosts is based on a broker that assigns each VM to a different host. Data centers can manage many hosts, and the broker can dynamically change the host and VM

Table 2 Makespan (time taken) for various algorithms' execution (in sec)
Algorithms/no of tasks 50 100 150 200
EPSO 382 556 690 906
FGA 404 578 712 924
PSO 430 600 730 1000
GA 458 626 758 1052
FL 488 700 848 1310

Table 3 Computational cost of tasks in cloud


Algorithms/no of tasks 50 100 150 200
EPSO 6200.46 12,001.56 17,200.64 22,027.08
FGA 6409.72 12,973.84 18,091.2 24,249.5
PSO 6801.1 13,981.34 20,203.78 26,534.18
GA 7201.44 15,044.46 21,890.5 28,503.7
FL 8045.64 16,245 22,904 30,984.68

Fig. 3 Makespan (time taken, in sec) for various algorithms' execution (bar chart: makespan vs. tasks; x-axis: algorithm — EPSO, FGA, PSO, GA, FL)

Fig. 4 Computational (execution) cost vs. tasks for various algorithms (bar chart; x-axis: algorithm — EPSO, FGA, PSO, GA, FL)

system. Various parameters used to evaluate the performance of specific FGA-based


scheduling in a cloud-computing environment. These include work time (duration),
cost of implementation, percentage of resource utilization, speed and efficiency.

6 Conclusion

Since cloud computing provides resources based on demand, it is called on-demand resource provisioning based on a subscription. There is also a central remote server that maintains data and applications. Owing to its reliability, fault tolerance, effective communication and speed, cloud computing is now a fast-emerging technology. Cloud computing provides several scheduling algorithms for solving real-world computing resource provisioning. In this paper, an energy-efficient fuzzy particle swarm optimization algorithm has been developed to provide optimal scheduling in the cloud. In the scheduling process, optimization is an important step; due to the lack of optimization, scheduling processes that use only a genetic algorithm or fuzzy logic yield poorer results. The proposed approach gave good results, and the performance of the algorithm was much better than that of the other algorithms. Therefore, it is concluded here that the fuzzy particle swarm optimization algorithm is effective in cloud scheduling.

References

1. Ge, J.W., Yuan, Y.S.: Research of cloud computing task scheduling algorithm based on
improved genetic algorithm. Appl. Mech. Mater. 347, 2426–2429 (2013)
2. Li, K., Xu, G., Zhao, G., Dong, Y., Wang, D.: Cloud task scheduling based on load balancing
ant colony optimization. Sixth Annu. Chinagrid Conf. 2011, 3–9 (2011). https://doi.org/10.
1109/ChinaGrid.2011.17
3. Kousalya, K.: To improve ant algorithm’ s grid scheduling using local search. Int. J. Comput.
Cogn. 7, 47–57 (2009)

4. Bagherzadeh, J., MadadyarAdeh, M.: An improved ant algorithm for grid scheduling problem
using biased initial ants. In: 3rd International Conference on Computer Research and
Development, pp. 373–378 (2011). https://doi.org/10.1109/CSICC.2009.5349368
5. Chen, W.-N., Zhang, J.Z.J.: An ant colony optimization approach to a grid workflow scheduling
problem with various QoS requirements. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.)
39, 29–43 (2009). https://doi.org/10.1109/TSMCC.2008.2001722
6. Chen, W.-N., Zhang, J., Yu, Y.: Workflow scheduling in grids: an ant colony optimization
approach. IEEE Congr. Evol. Comput. 3308–3315 (2007)
7. Chen, W., Shi, Y., Zhang, J.: An ant colony optimization algorithm for the time-varying
workflow scheduling problem in grids. IEEE Congr. Evol. Comput. 875–880 (2009)
8. Chiang, C.-W., Lee, Y.-C., Lee, C.-N., Chou, T.-Y.: Ant colony optimisation for task matching
and scheduling. IEE Proc. Comput. Digit. Tech. 153, 373–380 (2006). https://doi.org/10.1049/
ip-cdt
9. Chimakurthi, L., Madhu Kumar, S.: Power efficient resource allocation for clouds using ant
colony framework. Available from arXiv:11022608 (2011)
10. Dam, S., Mandal, G., Dasgupta, K., Dutta, P.: An ant colony based load balancing strategy
in cloud computing. Adv. Comput. Netw. Inform. 2, 403–413 (2014). https://doi.org/10.1007/
978-3-319-073507
11. Feller, E., Rilling, L., Morin, C.: Energy-aware ant colony based workload placement in clouds.
In: Proceedings of 12th IEEE/ACM International Conference on Grid Computing, pp. 26–33
(2011). https://doi.org/10.1109/Grid.2011.13
12. Ferdaus, M.H., Murshed, M., Calheiros, R.N., Buyya, R.: Virtual machine consolidation in
cloud data centers using ACO metaheuristic. In: Euro-Par 2014 Parallel Process, pp. 306–317.
Springer (2014). https://doi.org/10.1007/978-3-319-09873-9
13. Hu, Y., Xing, L., Zhang, W., Xiao, W., Tang, D.: A knowledge-based ant colony optimization
for a grid workflow scheduling problem. In: Adv. Swarm Intell. Notes Comput. Sci. 241–248
(2010). https://doi.org/10.1007/978-3-642-38703-6
14. Khan, S., Sharma, N.: Effective scheduling algorithm for load balancing (SALB) using ant
colony optimization in cloud computing. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 4, 966–973
(2014)
15. Liu, A.L.A., Wang, Z.W.Z.: Grid task scheduling based on adaptive ant colony algorithm.
In: International Conference on Management e-Commerce eGovernment, pp. 415–418. IEEE
(2008). https://doi.org/10.1109/ICMECG.2008.50
16. Liu, X., Zhan, Z., Du, K., Chen, W.: Energy aware virtual machine placement scheduling in
cloud computing based on ant colony optimization. In: Proceedings of Conference on Genetic
and Evolution Computing, pp. 41–47. ACM (2014). https://doi.org/10.1145/2576768.2598265
17. Lu, X., Gu, Z.: A load-adapative cloud resource scheduling model based on ant colony
algorithm. In: IEEE International Conference on Cloud Computing Intelligent System 2011,
pp. 296–300. https://doi.org/10.1109/CCIS.2011.6045078
18. Mathiyalagan, P., Suriya, S., Sivanandam, S.N.: Modified ant colony algorithm for grid
scheduling. Int. J. Comput. Sci. Eng. 02, 132–139 (2010)
19. Nishant, K., Sharma, P., Krishna, V., Gupta, C., Singh, K.P., Nitin, et al.: Load balancing of
nodes in cloud using ant colony optimization. In: UKSim 14th International Conference on
Computing Model Simulation, pp. 3–8 (2012). https://doi.org/10.1109/UKSim.2012.11
20. Pacini, E., Mateos, C., Garino C.G.: Balancing throughput and response time in online scientific
clouds via ant colony optimization. Adv. Eng. Softw. 84, 31–47 (2015)
21. Tawfeek, M.A., El-Sisi, A., Keshk, A.E., Torkey, F.A.: Cloud task scheduling based on ant
colony optimization. In: 8th International Conference on Computer Engineering System,
pp. 64–69 (2013). https://doi.org/10.1109/ICCES.2013.6707172

22. Wen, X., Huang, M., Shi, J.: Study on resources scheduling based on ACO algorithm and PSO
algorithm in cloud computing. In: Proceedings of 11th International Symposium Distribution
Computing Application to Business Engineering Science, pp. 219–222 (2012). https://doi.org/
10.1109/DCABES.2012.63
23. Zhang, Z., Zhang, X.: A load balancing mechanism based on ant colony and complex network
theory in open cloud computing federation. Int. Conf. Ind. Mechatron. Autom. 2, 240–243
(2010). https://doi.org/10.1109/ICINDMA.2010.5538385
A Pronoun Replacement-Based Special
Tagging System for Bengali Language
Processing (BLP)

Busrat Jahan, Ismail Siddiqi Emon, Sharmin Akter Milu,


Mohammad Mobarak Hossain, and Sheikh Shahparan Mahtab

Abstract Natural language processing (NLP) is one of the most important components of human–machine interaction and a very important part of machine learning systems. Over 27 crore (270 million) people in the world use Bengali as their first and mother language, and it has its own writing system, so it is very important to process the Bengali language for natural language processing. In this research work, we have tried to demonstrate an upgraded part-of-speech (POS) tagging system for the Bengali language, where we have used a special tagging system along with general grammatical parts of speech based on several features, such as considering suffixes for verbs, where we get a 68% success rate. We have also added place names, occupation names, Bengali person names, Bengali repeated words, Bengali numbers in both written and digit form, English acronyms, and organization names for both cases. The success rate is 70% for general tagging and 76% for special tagging, which is the highest so far. This tagging system can be used for Bengali language processing (BLP) tasks such as sentiment analysis for Bengali, Bengali text summarization, etc.

Keywords POS tagging · Bengali POS tagging · Special tagging · BLP · NLP Bengali · Bangla POS tagging

B. Jahan · I. S. Emon
Department of CSE, Feni University, Feni, Bangladesh
e-mail: hossenbipasa980@gmail.com
I. S. Emon
e-mail: emonsahriar0@gmail.com
S. A. Milu
Department of CSTE, Noakhali Science and Technology University, Noakhali, Bangladesh
e-mail: sharminmilu7@gmail.com
M. M. Hossain
Department of CSE, Asian University of Bangladesh, Dhaka, Bangladesh
e-mail: mobarak3112@gmail.com
S. S. Mahtab (B)
Department of EEE, Feni University, Feni, Chittagong Division, Bangladesh
e-mail: mahtabshahzad@gmail.com


1 Introduction

Nowadays, we share, store, write, and publish text through the revolutionary advances of the Internet, hardware, and software. In this regard, a new era of information explosion is impending. Users often find each retrieved document very lengthy, which is very tedious and time consuming to read [1]. Therefore, automatic text summarization is needed to process the huge amount of Internet data efficiently [2]. It is a matter of fact that, unlike English, which has seen a large number of systems developed to cater to it, other languages are less fortunate [3]. So, the development of text summarization has made no mentionable progress for other languages, specifically Bangla. It should be stated that we are Bangladeshi people, and our national language is Bangla. Bangla is the fourth largest language in the Indo-European language family and the sixth largest in the world in terms of the number of native speakers. The Bangla language is also the seventh largest spoken language in the world out of 3500 languages. Bangla is the mother tongue of the Bangladeshi people and the second largest spoken language of some states in India. According to economic surveys of 2015, 62.2% of the people in Bangladesh are educated, and most of them are accustomed to the Bengali language only [4]. Summarization of scientific documents, literature, news documents, books, etc. may be required today; the online content of Bangla news documents is growing very fast, and many people are reading it regularly [6]. Bangla news documents, like the online portals of Bangla magazines, are also increasing rapidly; electronic Bangla text is expanding in the cyber world with no borders, with a great deal of text from people, and so on. Very few research works have been done on processing the expanding, large amount of Bangla text. In addition, more research work needs to be done for the community of Bengali-speaking people, especially for retrieving Bangla information [7].

2 Methodology

Research on Bangla text documents is much more difficult for the following reasons:
i. According to our study, automated tools for research are rarely available for the
Bengali language.
ii. Although a similar tool is under development, it has limited features, and there
is no lexicon-based dictionary like WordNet for Bangla.
iii. Researchers have worked in different directions, and there is very little consolidation.
iv. There is a lack of free and open-source software [8].
v. The Bengali language originated from Sanskrit and largely follows inconsistent
rules. For proper recognition of sentence structure, the subjects and objects of all
sentences need to be identified, which is not easy in Bangla compared to English.

Despite these difficulties, this method focuses on the output of Bangla information
retrieval methods. In such output, some extracted sentences may contain dangling
pronouns whose corresponding nouns are missing. The user then cannot get an
appropriate message from the summary, and these pronouns create a chance of
misunderstanding the text. Therefore, these dangling pronouns need to be replaced
by their corresponding nouns [9]. Based on an analysis of Bangla news documents,
it has been observed that the noun related to a pronoun exists in the immediately
preceding sentence or in the second preceding sentence 88.63% of the time.
The purpose of this work is to make the output of Bangla information retrieval
methods free from dangling pronouns; otherwise, a single dangling pronoun is
sufficient to convey an incorrect message and make the user misunderstand the text.
In this situation, a method to solve the problem is proposed here with the following
contributions [10, 11]:
(i) Verifying the nature of a word, which requires dependency parsing; inactive words
are tagged by using dependency parsing.
(ii) Identifying pronouns and distinguishing whether they act as subject or object.
(iii) Detecting the nature of every word of a Bangla sentence, such as noun, pronoun,
verb, subject, object, digit, acronym, organization, person, and place name; words
are tagged in two phases, general tagging and special tagging.
(iv) Identifying the nouns related to pronouns and replacing the pronouns in an
appropriate format [12].
Based on the authors' research, this approach gives a much better result for pronoun
replacement in Bangla than other approaches. A set of rules derived from analyzing
news documents is used here: special tagging, dependency parsing, and subject and
object identification are performed first, and all pronoun replacements are done after
their completion, based on these rules. To carry out this work, we selected 3000 Bangla
news documents, collected from the most popular Bangladeshi newspaper, the Daily
Prothom-Alo (February 2020).

2.1 General Tagging

All words are tagged (as noun, pronoun, adjective, verb, preposition, etc.) by using a
lexicon database [1] and SentiWordNet [2]. Using the lexicon database, words can be
tagged as “JJ” (Adjective), “NP” (Proper noun), “VM” (Verb), “NC” (Common
Noun), “PPR” (Pronoun), etc. On the other hand, SentiWordNet has a list of words
tagged as “a” (Adjective), “n” (Noun), “r” (Adverb), “v” (Verb), and “u” (Unknown).
Using these predefined word lists, we experimented on 200 Bangla news documents
and found that 70% of the words can be tagged. Although we use word stemming to
find the base form of a word, a verb that is not in its active form cannot be stemmed.
In fact, identifying verbs is very difficult because there are many suffixes in Bangla.
For instance, depending on tense and person, the English word “do” may appear as
“doing,” “did,” or “does,” but the corresponding word may
have different forms in Bangla. Consider the present continuous tense of “ ”
(kor-do): this word has three main forms, depending on whether the subject is in
the first, second, or third person. It can be “ ” (doing) for the first person, “ ”
(doing) for the second person, and “ ” (doing) for the third person, respectively.
All these meanings for the forms of verbs of “you” are
also different in Bangla, for example, “ ” (you are doing), “ ” (you
are doing), and “ ” (you are doing), where these terms are in the present
continuous tense with the second person. Thus, this word “ ” (do) may have
the next forms: “ ” (do), “ ” (do), “ ” (do), “ ” (do), “ ”
(doing), “ ” (doing), “ ” (doing), “ ” (doing), “ ” (doing), “
” (did), “ ” (did), “ ” (did), “ ” (did), “ ” (did), “
” (do), “ ” (do), “ ” (did), “ ” (did), “ ” (did), “ ”
(did), “ ” (did), “ ” (do), “ ” (did), “ ” (did), “ ”
(did), “ ” (did), ” (doing), “ ” (doing), “ ” (doing),
“ ” (doing), “ ” (doing), “ ” (doing), “ ” (doing), “
” (doing), “ ” (doing), “ ” (doing), (doing),
“ ” (doing), “ ” (doing), “ ” (do), “ ” (do), “ ”
(do), “ ” (do), “ ” (do). However, verb identification plays a vital role in
language processing because the verb is the main root of a sentence, and the
complexity of verbs in Bangla is in no way comparable to that in English. A list of
suffixes is therefore considered for a final check, including: “ ” (itechhis), “ ”
(techhis), “ ” (itis), “ ” (ile), “ ” (ibi), etc. The percentage of tagged words
increased from 68.12% (before using the list of suffixes [4]) to 70% (after using the
list of suffixes). Some tags obtained in this step are initial tags that are updated in
the next steps, where certain words are additionally given special tags such as
acronym, named entity, occupation, etc.
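As a rough illustration of this final suffix check (not the authors' exact implementation), the sketch below tags a word as a verb when it ends with a known verb suffix, trying longer suffixes first; the romanized suffix strings stand in for the Bengali suffixes listed above.

# Minimal sketch of the suffix-based verb check described above.
# The suffix strings are romanized placeholders; a real implementation would
# use the actual Bengali (Unicode) suffix list.
VERB_SUFFIXES = ["itechhis", "techhis", "itis", "ile", "ibi"]

def tag_verb_by_suffix(word, suffixes=VERB_SUFFIXES):
    """Return "VM" (verb) if the word ends with a known verb suffix, else None."""
    # Longer suffixes are tried first so that the longest match wins.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix):
            return "VM"
    return None

print(tag_verb_by_suffix("koritechhis"))  # -> "VM" (matched by a placeholder suffix)
print(tag_verb_by_suffix("boi"))          # -> None (left for the next tagging steps)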

2.2 Special Tagging

After general tagging, special tagging was introduced to identify the words as
acronym, elementary form, numerical figure, repetitive words, name of occupation,
organization, and places.
1. Examining for English acronyms: A word formed from the initials of other words
is called an acronym, such as “ ” (UNO), “ ” (OIC), “ ” (USA), etc. To
examine these kinds of words, we split them so that, for example, “ ” (UNO)
is matched letter by letter against “ ” (U), “ ”, “ ” (O). All English letters
can be written in Bangla, e.g., A as (“ ”), B as (“ ”), C as (“ ”), D as (“ ”),
… W as (“ ”), X as (“ ”), Y as (“ ”), Z as (“ ”). If these spellings are sorted
in descending order of string length, so that W (“ ”) comes first and A (“ ”)
comes last, and every letter of a word is matched against them, the descending
order always ensures the longest match; for instance, “ ” (M) does not match
with “ ” (A), but it does match with “ ” (M). This experiment shows a 98%
success rate for this case (a sketch of this longest-match check is given after
this list).
2. Studying for the Bangla elementary tag: Bangla letters written with spaces, such
as “ ” (A K M), “ ” (A B M), etc., are tagged with the Bangla elementary
(primary) tag. Based on our experiments, the accuracy of this elementary tagging
is 100%.
3. Studying for repetitive words: Repetitive words are a special form of word
combination in which the same word is placed twice consecutively, for example,
“ ” (thandathanda—cold cold), “ ” (boroboro—big big), “
” (chotochoto—small small), etc. Some words are only partially
repeated, such as “ ” (khawadawa—eat). We found 100% accuracy in
identifying repetitive words.
4. Studying for numerical form: Three conditions are examined for recognizing
numerical representations in digits and words:
(a) The first part is formed from digits, such as 0 for ( ), 1 for ( ), 2 for ( ), …,
9 for ( ), or from number words such as “ ” (one), “ ” (two), “ ” (three),
“ ” (four) up to “ ” (ninety nine). The decimal point (.) is also
considered when examining the numerical form from digits.
(b) The next part (if any) is one of: “ ” (hundred), “ ” (thousand),
etc.
(c) Finally, it can carry suffixes such as “ ” (this), “ ” (this), “ ” (en),
etc.
In the experiment on our sample test documents, 100% of the numerical forms
were found from both digits and words.
5. Studying for name of occupation: Occupation words are significant and very
helpful for identifying human named entities; if a word is recognized as an
occupation, the immediately following words can be considered to find the
named entity. We prepared a table of occupations in Bangladesh, such as
“ ” (shikkhok—master), “ ” (sangbadik—journalist), etc.,
collected from different online sources. Every word is matched against this
table, and if a match is found, the word is tagged as an occupation; here,
“ ” (shikkhok—master) can also appear as “ ”
(prodhanshikkhok—head master), and so on. This study identifies occupations
with 96% accuracy.
6. Studying for name of organization: The name of an organization is an important
factor, and any type of word may be an element of an organizational name. From
our analysis, two cases are considered:
(a) The complete name of the organization is followed by its acronym in
parentheses, for example, “ ” “Rajdhani Unnayan Kartipakkh
(RAJUK)—Anticorruption Commission (ACC).” If an acronym bounded by
parentheses is found, then the words immediately before it, equal in number
to the letters of the acronym, are tagged as the name of the organization,
provided the acronym letters match the initial letters of those words;
otherwise, this process is not applicable. Research shows that 85% of
organization names can be found from an acronym in parentheses.
(b) The last part of an organization name may contain certain words, such as
“ ” (limited—limited), “ ” (biddaloy—school), “ ”
(montronaloy—ministry), “ ” (kong—kong), etc. [5]. If any such word is
present in the text, the three words immediately before it are checked; even
if the organization name consists of more than three words, selecting three
words is considered sufficient for the purpose. When these three words are
nouns, named entities, or blocked words, they are taken together as the name
of an organization. It is found that organization names can be accepted 85%
of the time based on point (b).
7. Studying for name of place: A table of place names of Bangladesh was built with
800 names covering divisions, districts, upazilas, and municipalities. In this
area-based hierarchy, the top level is the division, the second level is the district,
and the third level is the upazila or municipality. In addition, we analyzed the
names of 230 countries and their capitals. In this way, about 91% of place names
can be identified in our experiment.
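To make two of the special-tagging checks above concrete, the following sketch illustrates the descending-length (longest-match) letter matching used for English acronyms and the consecutive-duplication test used for repetitive words. The romanized letter spellings and sample words are hypothetical placeholders for the Bengali (Unicode) strings used in the actual system.

# Hypothetical mapping from the Bangla spelling of an English letter to the
# letter itself; only a few toy entries are shown here.
LETTER_SPELLINGS = {"ei": "A", "bi": "B", "si": "C", "em": "M", "ou": "O", "en": "N"}

def match_acronym(word):
    """Greedily match letter spellings, longest first, to recover an acronym."""
    spellings = sorted(LETTER_SPELLINGS, key=len, reverse=True)  # longest match wins
    result, i = [], 0
    while i < len(word):
        for s in spellings:
            if word.startswith(s, i):
                result.append(LETTER_SPELLINGS[s])
                i += len(s)
                break
        else:
            return None  # some part of the word is not a known letter spelling
    return "".join(result)

def is_repetitive(word):
    """True if the word is the same token written twice consecutively."""
    half = len(word) // 2
    return len(word) % 2 == 0 and half > 0 and word[:half] == word[half:]

print(match_acronym("eibiem"))    # -> "ABM" with this toy mapping
print(is_repetitive("boroboro"))  # -> True  ("boro" + "boro", big big)
print(is_repetitive("boro"))      # -> False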

3 Results and Discussion

The experimental results for the word tagging success rate of each phase are given
in Table 1. The experiment was conducted on 32,143 words from 200 test documents.
In the results of special tagging (shown in Table 2), it has been found that some types
of words, namely acronyms, initials, and numerical figures from digits and words,
have been identified with 100% accuracy. The procedures follow specific patterns to
identify these

Table 1 Results of word tagging of different phases

Phases of word tagging          Number of tagged words    Percentage of tagged words (%)
General tagging                 21,896                    68.12
Considering suffixes for verb   22,500                    70.00
Special tagging                 26,098                    76.98

Table 2 Exploratory results of special tagging for different types of word

Types of word                        Success rate (%)
English acronym                      98
Repeated words                       100
Bengali name                         100
Digit                                100
Occupation                           96
Places name                          91
Organization name for both cases     85

Table 3 Results on number of pronoun replacement for 200 news documents


Total pronoun Unaffected Incorrectly exchanged/replaced Properly exchanged/replaced
301 71 15 215

(acronym, initial, numerical figure from digits and words) and are not based on any
limited set of predefined words. These specific patterns are the main reason for
achieving the 100% success rate. However, some types of words cannot be identified
completely: occupation has been identified at 96%, name of organization by
considering the acronym at 85%, and names of humans and places at 100% and 91%,
respectively. These procedures rely on lists of predefined words to identify
occupations, organization names, human names, and places.

3.1 Results on Replacement of Pronoun

For the 200 evaluated documents, we counted the pronouns manually and
cross-checked them with our program. The number of pronouns and the results of
pronoun replacement are given in Table 3.

4 Conclusion

Natural language processing (NLP) is central to human–machine interaction and to
machine learning systems. More than 27 crore people worldwide use Bengali as their
mother language, and it has its own writing system, so processing Bengali is an
important task for NLP. In this research work, we have demonstrated an upgraded
parts-of-speech (POS) tagging system for Bengali that combines a special tagging
scheme with the general grammatical parts of speech, based on several features such
as considering suffixes for verbs, which raises the success rate from 68% to 70%. We
have also added place names, occupation names, Bengali names, Bengali repeated
words, Bengali digits in both word and numeral form, English acronyms, and
organization names for both cases. The success rate is 70% for general tagging and
76% for special tagging, which is the highest reported so far. This tagging system can
be used for Bengali language processing (BLP) tasks such as sentiment analysis for
Bengali, Bengali text summarization, etc.

References

1. Azmi, A.M., Al-Thanyyan, S.: A text summarizer for arabic. J. Comput. Speech Lang. 26(4),
260–273 (2012)
2. Miller, G.A.: Wordnet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
3. Indian Statistical Institute: A Lexical Database for Bengali 2015 [Online]. Available https://
www.isical.ac.in/∼lru/wordnetnew/index.php/site/aboutus. Accessed 28 Oct 2015
4. Chakma, R. et al.: Navigation and tracking of AGV in ware house via wireless sensor network.
In: 2019 IEEE 3rd International Electrical and Energy Conference (CIEEC), pp. 1686–1690.
Beijing, China, 2019. https://doi.org/10.1109/CIEEC47146.2019.CIEEC-2019589
5. Milu, S.A., et al.: Sentiment analysis of Bengali reviews for data and knowledge engineering:
a Bengali language processing approach. In: Bindhu, V., Chen, J., Tavares, J. (eds.) Interna-
tional Conference on Communication, Computing and Electronics Systems. Lecture Notes in
Electrical Engineering, vol. 637. Springer, Singapore (2020). https://doi.org/10.1007/978-981-
15-2612-1_8
6. Notes for Students: Rule Based System, Nov 2000 [Online]. Available https://www.jpaine.org/
students/lectures/lect3/node5.html. Accessed 01 Apr 2017
7. Gpedia: Gpedia, your encyclopaedia [Online]. Available www.gpedia.com/bn. Accessed 25
June 2016
8. BdJobs.com: Occupation in Bangladesh, Name of Occupation in Largest Job Site in
Bangladesh, Feb 2016 [Online]. Available https://bdjobs.com. Accessed 25 June 2016
9. Emon, I.S., Ahmed, S.S., Milu, S.A., Mahtab, S.S.: Sentiment analysis of Bengali online
reviews written with English letter using machine learning approaches. In: Proceedings of the
6th International Conference on Networking, Systems and Security (NSysS ’19). Association
for Computing Machinery, New York, NY, USA, pp. 109–115 (2019). https://doi.org/10.1145/
3362966.3362977
10. Khan, M.F.S., Mahtab, S.S.: PLC based energy-efficient home automation system with smart
task scheduling. In: 2019 IEEE Sustainable Power and Energy Conference (iSPEC), pp. 35–38.
Beijing, China, 2019. https://doi.org/10.1109/iSPEC48194.2019.8975223
11. Ahmed, S.S., et al.: Opinion mining of bengali review written with english character using
machine learning approaches. In: Bindhu, V., Chen, J., Tavares, J. (eds.) International Confer-
ence on Communication, Computing and Electronics Systems. Lecture Notes in Electrical
Engineering, vol. 637. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-261
2-1_5
12. Mahtab, S.S., Monsur, A., Ahmed, S.S., Chakma, R., Alam, M.J.: Design and optimization of
perovskite solar cell with thin ZnO insulator layer as electron transport. In: 2018 International
Conference on Advancement in Electrical and Electronic Engineering (ICAEEE), pp. 1–4.
IEEE, Gazipur, Bangladesh (2018). https://doi.org/10.1109/ICAEEE.2018.8643012
A High Performance Pipelined Parallel
Generative Adversarial Network
(PipeGAN)

Rithvik Chandan, Niharika Pentapati, Rahul M. Koushik, and Rahul Nagpal

Abstract Generative Adversarial Networks (GANs) are gaining popularity with


applications in unsupervised, supervised as well as reinforcement learning for gen-
erating images, videos, and other artefacts. However, the inherently sequential nature
of GANs is an Achilles heel to widespread adoption. In this paper, we propose and
experimentally evaluate a novel sophisticated pipelined parallel version of GANs
by dividing the training process into different balanced pipeline stages. Our exper-
imental evaluation of the proposed technique shows significant performance gain
up to 30% and 23% with an average speed-up close to 23% and 15% as compared to
the serial implementation in the context of NumPy and Pytorch respectively when
used to accurately classify real and fake images from standard MNIST and Fashion
MNIST datasets.

Keywords Generative adversarial networks · Parallelism · Pipeline parallelism ·


Performance · MNIST

1 Introduction

Generative Adversarial Networks [1] are gaining popularity with extensive use in
the generation of realistic images of objects, people, and scenes among others. Other
applications of GANs include image-to-image translation tasks such as translating
photos of night to day and autumn to spring. A GAN is usually composed of two

R. Chandan · N. Pentapati · R. M. Koushik · R. Nagpal (B)


Department of Computer Science and Engineering, PES University, Bangalore, India
e-mail: rahulnagpal@pes.edu
R. Chandan
e-mail: rithvikchan1@gmail.com
N. Pentapati
e-mail: pniharika369@gmail.com
R. M. Koushik
e-mail: rahulkoushik1999@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 769
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_81

contending neural networks co-trained at the same time, involving forward and back-
ward propagation. The co-training, though effective, suffers from the curse of tight
dependencies posing a significant parallelism challenge and is not amenable to tra-
ditional data and task parallelization techniques. Pipeline parallelization divides the
work into different subtasks that constitute pipeline stages, thereby overlapping execu-
tion as data passes through the stages. If stages are kept sufficiently balanced, this style
of parallelism offers good speed-up even in the presence of dependencies which ren-
der data and task parallelism impotent. Pipeline parallelism is ideal for accelerating
GANs involving thousands of batches in each epoch by overlapping batches.
In this paper, we describe and experimentally evaluate our novel and sophisticated
pipelined parallel implementation of GANs that accurately classify and generate new
images with significant speed-up. Our main contributions are as follows:
1. Design and implementation of a novel and sophisticated pipelined parallel imple-
mentation of GANs.
2. Detailed Experimental Evaluation of our scheme on standard MNIST and Fashion
MNIST dataset demonstrating speed-up of up to 30% with an average speed-up
of 20%.
The rest of the paper is organized as follows. We discuss related work in Sect. 2,
followed by our methodology and pipelined parallel algorithm in Sect. 3. We discuss
our implementation in Sect. 4 followed by a hard-nosed experimental evaluation of
our technique in Sect. 5. We present the results in Sect. 6 and conclude in Sect. 7.

2 Related Work

PipeDream [2] parallelizes deep neural networks by separating model layers into various
stages, using an optimization algorithm to map each stage to a GPU. Every stage
performs a forward propagation and passes the result to the next, and the loss is
calculated at the final stage. This loss is propagated backward and the weights of the
model are updated. PipeDream pipelines the training of internal layers. However,
PipeDream does not pipeline the training process across multiple networks.
GPipe [3] proposes a batch splitting algorithm for fast training of large scale Deep
Neural Networks. During forward propagation, mini-batches are divided into smaller
batches and are pipelined over accelerators. Similarly, during backward propagation,
the gradients from each smaller batch are aggregated for every mini-batch to update
the parameters used in the model. Although this technique attempts pipelined
parallelism in deep neural networks, GPipe does not consider the adversarial
setting peculiar to GANs.
Another approach, specific to parallelizing the training of Convolutional Neural
Networks (CNNs) [4], strives to maximize the overlap of communication and
computation. It spawns a thread once the gradients are computed, so that the
communication of data generated during backpropagation proceeds concurrently with
the computations of other layers. This approach is not focused on GANs.

All the earlier techniques are specific to pipelining or parallelizing within a single
neural network and do not take into consideration an adversarial setting like that of a
GAN, which separates our work from earlier proposals. Earlier techniques, as
discussed, focus mostly on performance enhancements while training the internal
layers of a neural network, whereas our technique specifically resolves the difficult
challenge of pipelining the training process itself.

3 Methodology

3.1 Training

Figure 1 shows various stages of computation and evaluation used in the training of
the GAN with input training data of images split into multiple mini-batches of size
d. These stages with input and output are outlined as follows:
• Discriminator Real Forward Propagation: Takes a mini-batch of real images and
outputs probability Preal that differentiates real from fake images.
• Gradient Compute Discriminator Real: Takes Probability Preal and determines loss
using the function in (1). This is followed by padding the computed loss with a
cost function used to compute gradients for each discriminator layer.

L_discriminator_real(P_real) = 1 / P_real                                    (1)

• Discriminator Weight Update Real: The discriminator’s weights are updated using
gradients.
• Generator Fake Forward Propagation: Forward propagation of the generator is
performed by transforming a random noise into d images. These images are labeled
as Fake Images.
• Discriminator Fake Forward Propagation: Fake Images are passed to the discrim-
inator which emits a probability Pfake .

Fig. 1 Stages of GAN training



• Gradient Compute Discriminator Fake: Gradients are computed using the proba-
bility Pfake based on the loss function specified in (2).

L_discriminator_fake(P_fake) = 1 / P_fake                                    (2)

• Discriminator Weight Update Fake: Weights are updated based on gradients com-
puted in the Gradient Compute Discriminator Fake stage.
• Generator Forward Propagation: A new set of fake images is generated using the
same noise used to train the discriminator in order to train the generator.
• Discriminator Generator Forward Propagation: These images are passed to dis-
criminator to emit probabilities.
• Gradient Compute Generator: Probabilities are used to calculate gradients.
• Generator Weight Update: Generator weights are updated based on the gradients
computed.
A total of 40 epochs are performed on the MNIST and Fashion MNIST datasets as
the generated images reach a desirable quality at this point.

3.2 Pipelining Algorithm

To implement pipelining in the GAN training process, the training data is split into
mini-batches and the 11 stage pipeline as described in Sect. 3.1 is used for training.
These overlapped executions of stages are interconnected in a pipeline structure to
provide efficiency. Figure 2 shows the pipelined structure using a timing diagram
for the abbreviated stages. A column denotes a point in time along with the units
running at that instant. As multiple units run simultaneously, pipeline parallelism
is harnessed. At the beginning, training data consisting of the images are split into
mini-batches which are maintained in a queue called the BatchQ. A mini-batch from
this queue is passed to stage 1 and in the next time step, it is propagated to stage 2.
Simultaneously, the next mini-batch enters stage 1 and so on. The pipeline is full and
runs at maximum capacity after 10 such iterations. The first 10 iterations are marked
by the HandleStart() function in Algorithm 1. When all functional units are busy,
mini-batches are sent to the functional unit queue one after another in a pipelined
fashion. Once the mini-batch finishes all the stages of the functional unit queue, that
mini-batch is popped from the BatchQ. These sequences of steps are done repeatedly
for e number of times.
Dependency analysis is performed in the kernel part of the pipeline. The results are
shown in Fig. 3. Nodes marked in grey are Weight Updates, which cause Read After
Write (RAW) and Write After Write (WAW) dependencies. These dependencies
occur during the shared weight updates and cause conflicts, which are resolved by
explicit synchronization mechanisms such as locks or mutexes.
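As a rough illustration of this synchronization (not the authors' exact code), the sketch below guards a shared weight update with a Python threading.Lock so that the overlapping real- and fake-branch update stages of different mini-batches cannot interleave; the weight shapes, gradients, and learning rate are hypothetical.

import threading
import numpy as np

# Hypothetical shared discriminator parameters and a lock protecting them.
discriminator_weights = [np.random.randn(784, 512), np.random.randn(512, 256)]
weight_lock = threading.Lock()

def discriminator_weight_update(gradients, lr=0.001):
    """Weight-update stage: serialized with a lock to avoid RAW/WAW conflicts."""
    with weight_lock:
        for w, g in zip(discriminator_weights, gradients):
            w -= lr * g  # in-place update of the shared weights

# Two weight-update stages from overlapping mini-batches can now run as threads.
grads = [np.zeros_like(w) for w in discriminator_weights]
t1 = threading.Thread(target=discriminator_weight_update, args=(grads,))
t2 = threading.Thread(target=discriminator_weight_update, args=(grads,))
t1.start(); t2.start(); t1.join(); t2.join()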

Fig. 2 Pipelining time diagram

Fig. 3 Dependence graph for functional units

4 Implementation

In this section, we describe our implementation including the pipelined parallel


implementation as described above. The generator and discriminator networks have
architectures similar to traditional neural networks. Each layer consists of a collection
of nodes that operate on a received weighted input and transforms it with a set of
mostly non-linear functions. The layers then pass these values as output to the next
layer that in turn performs similar actions.
The generator consists of 11 layers where the activation function for 10 layers is
LeakyRelu. The output activation function used is tanh. These are standard activation
functions used in a generator model. The numbers of nodes in each layer are as
follows—100, 128, 256, 512, 1024, 784. The discriminator consists of 4 layers
where the activation function for 3 hidden layers is LeakyRelu. These are standard
activation functions used in a discriminator model. The output activation function
used is sigmoid. The numbers of nodes in each layer are as follows—784, 512, 256, 1.
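The following is a minimal PyTorch sketch of networks with the layer sizes and activations described above (LeakyReLU hidden activations, tanh generator output, sigmoid discriminator output). It is an illustrative reconstruction rather than the authors' exact model, and details such as the LeakyReLU slope are assumptions.

import torch.nn as nn

# Generator: 100 -> 128 -> 256 -> 512 -> 1024 -> 784, tanh output (a 28 x 28 image).
generator = nn.Sequential(
    nn.Linear(100, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, 1024), nn.LeakyReLU(0.2),
    nn.Linear(1024, 784), nn.Tanh(),
)

# Discriminator: 784 -> 512 -> 256 -> 1, sigmoid output (probability of "real").
discriminator = nn.Sequential(
    nn.Linear(784, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)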

Algorithm 1: Pipelining Method

Input: Queue containing mini-batches (BatchQ), functional unit queue (FuncQ),
HandleStart function, total epochs (e), number of mini-batches (num_batches),
number of functional units (n), createThread and startThread functions to
implement threading
length_pipe = n + num_batches - 1;
for epoch in range(e) do
    threads = [];
    for iter in range(length_pipe) do
        if iter < n then
            HandleStart();
        else
            if (len(BatchQ) > n) then
                BatchQ.pop();
                input = BatchQ(n);
                for (i in range(2, 9)) do
                    threads.push(threadCreate(FuncQ[i]));
                end
                threads.push(threadCreate(FuncQ[1], input));
            else
                for (leftover in range(BatchQ)) do
                    threads.push(threadCreate(FuncQ[leftover]));
                end
            end
        end
        startThreads();
    end
end

We have used the standard Modified National Institute of Standards and Technol-
ogy (MNIST) dataset having a large number of handwritten digits as well as Fashion
MNIST dataset with 28 × 28 grayscale images, from 10 classes namely apparel types
to further corroborate our results. Figures 4 and 5 depict generated images using these
datasets.
A serial, a non-pipelined parallel and pipelined parallel version of GAN is imple-
mented using Python. The non-pipelined parallel version exploited vectorization and
data parallelism wherever possible along with concurrent execution in the weight
update phases. Two pipelined versions using NumPy and PyTorch implement the
pipelined architecture as explained earlier. Pipelined NumPy version has the advan-
tage of not requiring backward compatibility, and the Pipelined PyTorch imple-
mentation generates quality MNIST and Fashion MNIST images at even higher
performance.

Fig. 4 MNIST Images generated by Pipelined Pytorch-GAN

Fig. 5 Fashion MNIST Images generated by Pipelined Pytorch-GAN

Fig. 6 Limitations in non-pipelined parallel implementation

5 Experimental Evaluation

The experimental setup consists of a GPU enabled Google Colab with 12.72GB
RAM, 68.4GB disk space, and a Python3 Google Compute Engine GPU having
multiple cores to leverage the pipeline parallel execution. The implementation of the
non-pipelined parallel version shows that the overhead of spawning threads not only
steals any possible benefit but essentially takes more time compared to the serial

Fig. 7 Execution times of pipelined and serial numpy and PyTorch versions

versions as depicted in Fig. 6 due to tight dependencies. However, pipelining can


work even in the presence of dependencies by overlapping execution leading to a
significant amount of speed-up.

6 Results and Analysis

Figure 7 depicts the speed-up obtained by our pipelined NumPy and PyTorch
implementations compared to the respective serial versions. We observe a significant
performance gain of up to 30%, with an average speed-up of 23%, for NumPy. The
serial PyTorch GAN implementation already contains some pre-existing optimizations,
resulting in a relatively smaller speed-up than NumPy; nevertheless, our pipelined
PyTorch version still gained significantly in performance, with a maximum speed-up of
23% and an average speed-up of 15% over the already-optimized serial PyTorch version.

7 Conclusion

Generative Adversarial Networks (GANs) are growing in popularity with extensive


applications in a variety of learning techniques for generating as well as differen-
tiating artefacts. However, inherent sequential nature has inhibited parallelization
of GANs in the past. We design and implement a novel and sophisticated pipeline
parallel GANs (PipeGan) by dividing the training process into different stages. Our
experimental evaluation demonstrate that the proposed pipeline parallel technique
achieves significant performance gain of 30% and 23% with average speed-up close
to 23% and 15% as compared to the serial implementation in the context of NumPy
and Pytorch respectively and accurately classifying real and fake images from stan-
dard MNIST and Fashion MNIST datasets. In the future, we intend to experiment
with various buffering and caching techniques in our pipeline parallel implementa-
tion in the quest of further performance gains.

References

1. Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A.,
Bengio, Y.: Generative adversarial nets. Adv. Neural Inf. Process. Syst. 27, 2672–2680 (2014)
2. Narayanan, D., Harlap, A., Phanishayee, A., Seshadri, V., Devanur, N.R., Ganger, G.R., Gibbons,
P.B., Zaharia, M.: PipeDream: generalized pipeline parallelism for DNN training. In: Proceed-
ings of the 27th ACM Symposium on Operating Systems Principles (SOSP ’19). Association
for Computing Machinery, New York, NY, USA, pp. 1–15 (2019)
3. Huang, Y., Cheng, Y., Chen, D., Lee, H., Ngiam, J., Le, Q.V., Chen, Z.: Efficient Training of
Giant Neural Networks using Pipeline Parallelism, GPipe (2018)
4. Lee, S., Jha, D., Agrawal, A., Choudhary, A., Liao, W.: Parallel deep convolutional neural
network training by exploiting the overlapping of computation and communication. In: 2017
IEEE 24th International Conference on High Performance Computing (HiPC), Jaipur, pp. 183–
192 (2017)
Electroencephalogram-Based
Classification of Brain Disorders Using
Artificial Intelligence

Laxmi Raja and R. Santhosh

Abstract Electroencephalogram (EEG) is a medically advanced screening tech-


nology currently used to classify various brain disorders and problems. In this paper,
we have proposed a new framework for acquiring EEG signals so that it can be
beneficial to various researchers of the field. Ag/AgCl electrodes are used to obtain
EEG signals. Depending on the requirements of a particular study, different numbers of
channels can be used. The electrodes are placed over the scalp using gel, and the signals are
obtained. The data is pre-processed to remove unwanted noise/disturbance. Dual tree
complex discrete wavelet transform (DTCWT) was used to transform the data, so
that redundancy is reduced to a minimum. Signals were classified using Gaussian
mixture model (GMM). We present this framework in detail so that it can be used for
studies involving the collection of EEG data in medical illnesses.

Keywords Electroencephalogram · DTCWT · GMM

1 Introduction

Electroencephalography (EEG) is an inexpensive and versatile tool that has been used
for the last 85 years to investigate the electrical signals generated in the brain. Advances
in digital technology have made EEG cheap and user-friendly and have provided effective
pre-processing and classification of signals that cannot easily be observed by the
naked eye [1].
EEG has a very high temporal sensitivity and is used to evaluate cerebral func-
tioning. The list of clinical uses of EEG is long including evaluation of epilepsy,

L. Raja (B) · R. Santhosh


Department of CSE, Faculty of Engineering, Karpagam Academy of Higher Education,
Coimbatore, India
e-mail: laxmirajaphd@gmail.com
R. Santhosh
e-mail: santhoshrd@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 779
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_82

sleep disorders and brain death and to monitor depth of anesthesia. Most of the EEG
signals range between 1 and 20 Hz. They have bandwidths like alpha, beta, theta and
delta [2].
In this study, EEG datasets of people with various brain disorders were obtained
and used. These EEGs were recorded by placing scalp electrodes according to the
International 10–20 system (Fig. 1). The brain lies inside the skull, which is covered
by the scalp [3]. All the cells in the human body have a resting membrane potential
and can produce electrical signals, so an EEG records not only brain electrical
activity but also extra-neuronal signals (Fig. 2). These are termed artifacts.
As a result, pre-processing methods or filtering were used to remove the unwanted
noise present in the EEG data, which lies around the range of 0.4–100 Hz. Moreover,
electrical or power line interference is usually present at around 50 Hz [4, 5]; this was
removed using a notch filter. After pre-processing, the dual tree complex wavelet
transform (DTCWT) was used to extract features from the signals, and a Gaussian
mixture model (GMM) classifier was used to classify the EEG signals and identify
brain disorders.

Fig. 1 International 10–20 system of placing EEG electrodes and composite signal of EEG rhythms

Fig. 2 International 10–20 system of placing EEG electrodes and composite signal of EEG rhythms

2 EEG Signals Pre-processing

Pre-processing of EEG signals is a very significant step for eliminating unwanted data
and interference present in the EEG signal. The unwanted data mixed with the original
data is called noise and can be of both internal and external origin. It may lead to
misinterpretation of the EEG and to wrong clinical decisions. Removing these
unwanted artifacts or noise is the first step of pre-processing.
The artifacts present in the signal can be classified into two types, namely
physiological and extra-physiological. Artifacts or noise arising anywhere in the body
other than the brain are called physiological noise. For example, the heart produces
electrical activity that appears in the EEG and is more pronounced in short-necked
individuals and people with artificial pacemakers; mechanical artifacts from the
heart, such as the pulse artifact and the ballistocardiographic artifact, also exist.
Sweat and slow roving eye movements create low-amplitude artifacts, and muscular
activities like chewing, swallowing, and facial movements can lead to noise.
Extra-physiological noise comprises all the other noise generated from the remaining
sources, such as equipment and the environment. Electrode pops are a type of artifact
caused by spontaneous discharge arising between the skin and the gel, and poor
electrode placement also produces differences in impedance. Power line interference
is caused by electrical connections present around the system and is embedded in the
EEG data at 50–100 Hz [6].
Three commonly used filters are high- and low-frequency filters and notch filters.
Low-frequency filters remove signals with low frequencies and allow high-frequency
signals so they are called high-pass filters. High-frequency filters are similarly called
low-pass filters [7]. Notch filters filter out activity at a specific frequency instead of a
range. See Fig. 3.
Power line interference was removed using a notch filter. Band-pass filters allow only
a certain range of frequencies to pass through. Wavelet transform virtual
instrumentation functions were used to remove the extra-physiological artifacts
present in the signals; here, time-domain signals were converted to the frequency
domain, and as a result, redundant data was removed and higher accuracy was
obtained. See Fig. 4.
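As a rough sketch of this filtering stage (not the authors' exact pipeline), the code below applies a 50 Hz notch filter followed by a 0.4–100 Hz band-pass filter to a raw EEG channel using SciPy; the sampling rate and filter orders are assumptions.

import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

FS = 256.0  # assumed sampling rate in Hz

def preprocess_eeg(raw, fs=FS):
    """Remove 50 Hz power-line interference, then band-pass to 0.4-100 Hz."""
    b_notch, a_notch = iirnotch(w0=50.0, Q=30.0, fs=fs)  # notch at the mains frequency
    x = filtfilt(b_notch, a_notch, raw)
    b_bp, a_bp = butter(N=4, Wn=[0.4, 100.0], btype="bandpass", fs=fs)
    return filtfilt(b_bp, a_bp, x)

# Example on a synthetic one-second signal: 10 Hz rhythm plus 50 Hz interference.
t = np.arange(0, 1.0, 1.0 / FS)
raw = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)
clean = preprocess_eeg(raw)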

Fig. 3 Effect of 5 Hz filter

3 Feature Extraction by Dual Tree Complex Wavelet


Transform (DTCWT)

The wavelet transform is the process of decomposing a signal into wavelet
(approximation and detail) coefficients. Data compression, motion estimation,
classification, and denoising are some of the problems that can be solved using the
wavelet transform, which helps to preserve the symmetry, smoothness, and shape that
are important for obtaining correct coefficients. DTCWT uses a pair of high- and
low-pass wavelet filter trees at each scale, which yields the real and imaginary parts of
complex wavelet coefficients; this property is applied in areas of pattern recognition
and signal processing [8, 9].
When the real signals need to be further processed or transformed, an inverse
transform is required, which this transform also provides. A representation of this
method is given in Fig. 5, and datasets of denoised EEG signals are given in Fig. 6a
and b.
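A minimal sketch of DTCWT-based feature extraction is given below. It assumes the open-source dtcwt Python package is available and summarizes the magnitudes of the complex sub-band coefficients per level, which is one common choice of features rather than the authors' exact feature set.

import numpy as np
import dtcwt  # assumes the open-source `dtcwt` package is installed

def dtcwt_features(signal, nlevels=4):
    """Decompose a 1-D EEG signal and return simple per-level magnitude features."""
    transform = dtcwt.Transform1d()
    pyramid = transform.forward(signal.reshape(-1, 1), nlevels=nlevels)
    feats = []
    for highpass in pyramid.highpasses:           # complex coefficients, one array per level
        mags = np.abs(highpass)
        feats.extend([mags.mean(), mags.std()])   # coarse summary of each sub-band
    feats.append(np.abs(pyramid.lowpass).mean())  # low-pass residual
    return np.array(feats)

# Example on a random signal standing in for one pre-processed EEG channel.
features = dtcwt_features(np.random.randn(1024))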

Fig. 4 Artifact removal techniques

Fig. 5 Scheme of DTCWT



Fig. 6 a and b Datasets of denoised and real-time EEG signals

4 Gaussian Mixture Model Classifier

The two basic categories of classifiers are deterministic and statistical classifiers.
Deterministic classifiers take into account the initialization of the unlabeled
parameters, and the search takes place only in the

Fig. 7 Distributed Gaussian models

search space [10]. On the other hand, statistical classifiers consider only threshold
values of density functions. Here, we deal with the Gaussian mixture model, which is
an unsupervised learning method in which patterns can be found without using class
labels [11, 12].
The Expectation Maximization (EM) technique is used to find how many data points
belong to each cluster [13–15]. The cluster means and covariances are then calculated
from these assignments, so a covariance is produced for each cluster of the signals
used. By partitioning these clusters, we can differentiate and find the classification
pattern. See Fig. 7.
As a result, a mean vector and a covariance matrix are obtained for each component,
which helps to analyze the individual Gaussian distributions. Using these patterns, we
can eventually distinguish the EEG signals of people with brain disorders from those
of normal people.
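The snippet below is a minimal sketch of this classification step using scikit-learn's GaussianMixture, which fits the component means and covariances with EM; the two-component setup and the random feature matrix are illustrative assumptions rather than the authors' exact configuration.

import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical feature matrix: one row of wavelet-domain features per EEG recording.
X = np.random.randn(200, 9)

# Two-component GMM fitted with EM; each component gets its own mean vector
# and full covariance matrix.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)

# Unsupervised cluster assignments for new recordings; which cluster corresponds
# to "disorder" and which to "normal" must be established from labelled examples.
clusters = gmm.predict(np.random.randn(5, 9))
print(clusters, gmm.means_.shape, gmm.covariances_.shape)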

5 Conclusion

The scope of this paper is the real-time analysis of people with brain disorders. With an
exponentially increasing population and a parallel increase in disorders, it is very
difficult to expect manual diagnosis and personal inspection for each person; moreover,
some minute patterns will be missed by human eyes because of the volume of data per
person. Automated analysis of the EEG signals of such people therefore seems to be the
best solution, and in this paper we conclude that the combination of DTCWT and a
Gaussian mixture model classifier is well suited for the purpose.

References

1. Teplan, M.: Fundamentals of EEG measurement. Measur. Sci. Rev. 2, Section 2 (2002)
2. Britton, J.W., Frey, L.C., Hopp, J.: Electroencephalography (EEG): an introductory text and
atlas of normal and abnormal findings in adults, children, and infants. American Epilepsy
Society. Chicago (2016)
3. Fahmie, M., Bin, I., Rodzi, M.: EEG Acquisition Using Labview. Faculty Electronics and
communication Engineering, Kolej University Teknikal Kebangsaan, Malaysia, May 2006
4. Adalarasu, K.: Detection of early onset of driver fatigue using multimodal bio signal.
Department of biotechnology, Indian institute of technology, Chennai India, February 2010
5. Arman, S.I., Ahmed, A., Syed, A.: Cost-effective EEG signal acquisition and recording system.
Int. J. Biosci. Biochem. Bioinform. 2(5) (2012)
6. Khatwani, P., Tiwari, A.: A survey on different noise removal techniques of EEG signals. Int.
J. Adv. Res. Comput. Commun. Eng. 2(2). ISSN 2319-5940 (2013)
7. Gurumurthy, S., VudiSai Mahit, Ghosh, R.: Analysis and simulation of brain signal data by
EEG signal processing technique using MATLAB. Int. J. Eng. Technol. (IJET) 5(3), ISSN
0975-4024 (2013)
8. Kingsbury, N.: The dual tree complex wavelet transform: a new technique for shift invariance
and directional filters. University of Cambridge, Cambridge CB2 1PZ
9. Slimen, I.B., Boubchir, L., Mbarki, Z., Seddik, H.: EEG epileptic seizure detection and clas-
sification based on dual-tree complex wavelet transform and machine learning algorithms. J.
Biomed. Res. 34(3), 151–161. https://doi.org/10.7555/JBR.34.20190026
10. Cao, M.: Practice on classification using gaussian mixture model course project report for
COMP-135 (2010)
11. Lakshmi, R., Prasad, T.V., Prakash, C.: Survey on EEG signal processing methods. Int. J. Adv.
Res. Comput. Sci. Softw. Eng. 4(1). ISSN 2277-128X (2014)
12. Raj, A., Deo, A., Kumari, M., Tripathi, S.: A review on automated detection, classification
and clustering of epileptic EEG using wavelet transform and soft computing techniques. Int. J.
Innov. Res. Sci. Eng. 17. ISSN 2347-320 (2016)
13. Patel, R.: A real time frequency analysis of the electroencephalogram using lab view. A Thesis
Submitted to the Faculty of New Jersey Institute of Technology in Partial Fulfillment of the
Requirements for the Degree of Master of Science in Biomedical Engineering, Department of
Biomedical Engineering, January 2002
14. Varunadikkarapatti, V.: Optimal EEG channels and rhythm selection for task classification.
A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Masters of
Science in Engineering, Madras University, India (2004)
15. Raja, L., Arunkumar, B.: A comparative study of various artificial neural network classifiers for
EEG based autism spectrum disorder diagnosis. J. Adv. Res. Dyn. Control Syst. 11(1) (2019)
Parallel Antlion Optimisation (ALO)
and Grasshopper Optimization (GOA)
for Travelling Salesman Problem (TSP)

G. R. Dheemanth, V. C. Skanda, and Rahul Nagpal

Abstract We present our parallel high-performance version of the adapted Antlion


and Grasshopper meta-heuristic algorithms to solve the Travelling Salesman prob-
lem. Our detailed experimental evaluation reveals significant improvements over
the traditional Genetic and Ant-Colony based solutions in both accuracy and speed
with a performance gain of up to 4× thereby making it possible to solve Travelling
salesman problems for a large number of cities.

Keywords ALO · GOA · ACO · TSP · Combinatorial · Optimisation ·


Parallelism

1 Introduction

The Travelling Salesman Problem (TSP) is to find the shortest path that goes through
all the cities and returns to the first city in the path, given the direct path length
between each pair of cities. This problem has no known exact polynomial-time
algorithm, and its decision version is NP-complete. Many heuristic and meta-heuristic algorithms
strive to find near-optimal solutions. Meta-heuristic algorithms iteratively generate
a vast number of random solutions and search for global optima.
Ant Colony Optimisation (ACO) [5] and Genetic Algorithm (GA) [4] have been
applied widely to solve TSP. In this paper, we propose our adapted ALO and GOA
algorithm to solve TSP. We have also developed exclusive parallel versions of these
algorithms and have evaluated our algorithm compared to earlier proposed ACO
and GA. Our experimental evaluation reveals that our proposed ALO and GOA

G. R. Dheemanth · V. C. Skanda · R. Nagpal (B)


Department of Computer Science and Engineering, PES University, Bengaluru, India
e-mail: rahulnagpal@pes.edu
G. R. Dheemanth
e-mail: dheemanthgr@gmail.com
V. C. Skanda
e-mail: skandavc18@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 787
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_83

algorithms are faster and more accurate in finding a solution to TSP for an even large
number of cities. Our main contributions are as follows:
1. We adapted ALO and GOA targeting TSP and this, to the best of our knowledge,
is the first attempt in this direction.
2. We have additionally developed and implemented parallel versions of our ALO
and GOA for TSP.
3. We performed a detailed hard-nosed experimental evaluation of the accuracy of
our algorithms as well as speedup of the parallel version of our ALO and GOA
that shows a significant gain in both accuracy and performance as compared to
the earlier state-of-the-art ACO and GA algorithms.
The rest of the paper is organized as follows. We explain related work in Sect. 2,
followed by our ALO and GOA algorithm as adapted to solve TSP in Sects. 3
and 4 respectively including parallel versions of these algorithms. We analyze
our results in Sect. 5 and conclude in Sect. 6 with mention of future directions for
this work.

2 Related Work

The Held–Karp dynamic programming solution [1, 2] brings the O(n!) time
complexity down to O(n^2 · 2^n) with a space complexity of order O(n · 2^n) and is one
of the best known exact solutions to TSP. However, the algorithm is still exponential,
with a space requirement that increases exponentially with the number of cities, so
this solution quickly becomes prohibitive even for 25–30 cities. Therefore,
meta-heuristics such as ACO [5] and GA [4] have been applied to TSP problems.
GA evolves a collection of possible solutions known as phenotype towards a better
solution by altering and rearranging solutions and selecting the fittest solution for the
next generation based on the principle of “Survival of the fittest”. However, GA tends
to frequently converge towards local optima thereby missing the optimal solution.
ACO exploits an ant’s capability to find the shortest path to a destination containing
food with each path travelled by an ant associated with a pheromone trail aiding in
path tracing. The intensity of the pheromone trail is proportional to the quality of a
sub-path, a property that is used to build better solutions while each ant decides on
which sub-path to take next. However, ACO does not scale well and thereby is not
efficient for large scale combinatorial optimization problems such as when solving
TSP for a large number of cities.
ALO mimics the interaction between antlions and their prey (ants), using various
random operators to find optimal solutions. ALO has recently been used to
successfully solve a variety of engineering problems, including training neural
networks. GOA is a population-based algorithm that mimics the movement of a swarm
of grasshoppers and their social interaction; each grasshopper in the swarm represents
a solution to the optimization problem. In contrast to earlier applications of
ALO and GOA, which are primarily in the context of continuous optimization
problems, we have adapted and parallelized ALO and GOA for discrete
optimization problems.

3 Adaptation of ALO Algorithm to TSP

Compared to other meta-heuristic algorithms, ALO makes heavy use of random
operators to avoid local optima. In this paper, we propose a novel discrete version that
models the TSP path as an array of cities, in contrast to the proposal in [3], which
targets the optimization of continuous parameters and cannot be used directly to solve TSP.
In our implementation, as outlined in Algorithm 1, each ant and antlion is a
permutation of the cities, and fitness is the total path length of the array of cities.
Ants randomly walk in a bounded area (search space) represented as an array of
numbers which are initialized to the maximum distance between two cities (upper
bound). We propose “random permutation of cities” as the random walk. Since a
random permutation can lead to the ant going out of the search space, we also
propose a new normalization function that tries to bring the maximum number of
distances between cities within the search space. The normalization function iterates
through the cities in the path. If the distance between any two cities i and j is greater
than the upper bound, then a city k is found such that the distance between i and k
is less than the distance between i and j, and following this, k and j are swapped.
This process repeats until the end of the array is reached. During the random walk,
ants can sometimes fall into the trap of antlions which is simulated using a roulette
wheel to select the antlion, which traps an ant. The roulette wheel is used to give
more preference to fitter antlions. To simulate elitism, which is a salient feature of
ALO, the antlion with the minimum fitness or minimum path length is taken as a
better solution (elitism). Since the elite antlion makes the best traps affecting all ants,
after the random walk of an ant, the position is updated taking into consideration
both the elite antlion and the antlion which was randomly selected using a roulette
wheel. This updated path can sometimes violate the TSP path by having a city more
than once in the path. So a function was developed to correct the path. To simulate
the sliding of the ants towards the antlion, the search space is gradually reduced as
the number of iterations increases.
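As a rough sketch of the representation just described (not the authors' exact code), the snippet below shows the path-length fitness of a city permutation and the normalization pass that swaps in a later city whenever two consecutive cities are farther apart than the current upper bound; the distance matrix is a small hypothetical instance.

import numpy as np

def fitness(path, dist):
    """Total length of the closed TSP tour encoded by the permutation `path`."""
    return sum(dist[path[i], path[(i + 1) % len(path)]] for i in range(len(path)))

def normalize(path, dist, upper_bound):
    """If consecutive cities i and j are farther apart than `upper_bound`, look for a
    later city k closer to i and swap it with j, as described above."""
    path = list(path)
    for i in range(len(path) - 1):
        j = i + 1
        if dist[path[i], path[j]] > upper_bound:
            for k in range(j + 1, len(path)):
                if dist[path[i], path[k]] < dist[path[i], path[j]]:
                    path[j], path[k] = path[k], path[j]
                    break
    return path

# Hypothetical 5-city instance; a "random walk" is a random permutation of cities.
rng = np.random.default_rng(0)
dist = rng.integers(1, 100, size=(5, 5))
dist = (dist + dist.T) // 2
np.fill_diagonal(dist, 0)
ant = list(rng.permutation(5))
print(fitness(ant, dist), fitness(normalize(ant, dist, upper_bound=60), dist))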

3.1 Parallel ALO for TSP

We have observed that ALO, as described in Algorithm 1, can be effectively par-


allelized to improve accuracy by increasing the number of search agents in a fixed
time or to improve speed for a fixed number of search agents as well as any suitable
combination of both based on the available budget. We have capitalized on these
observations in our implementation by parallelizing the initialization of the search
agents as well as the calculation of their fitness, both of which carry no dependencies.
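A self-contained sketch of this dependency-free fitness evaluation is shown below (it is not the authors' implementation); each agent's tour length is computed in a separate worker process, and the worker count and instance size are arbitrary.

from concurrent.futures import ProcessPoolExecutor
from functools import partial
import numpy as np

def tour_length(path, dist):
    """Closed-tour length of a city permutation (the fitness used by ALO and GOA)."""
    return sum(dist[path[i]][path[(i + 1) % len(path)]] for i in range(len(path)))

def evaluate_population(population, dist, workers=4):
    """Fitness of every search agent computed in parallel; agents are independent,
    so no synchronization is needed."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(partial(tour_length, dist=dist), population))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 50
    dist = rng.random((n, n))
    dist = (dist + dist.T) / 2.0
    population = [list(rng.permutation(n)) for _ in range(200)]
    print(min(evaluate_population(population, dist)))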

4 Adaptation of GOA Algorithm to TSP

GOA is a population-based metaheuristic algorithm where each grasshopper rep-


resents a solution. In contrast to [6], which targets continuous optimization, we have
proposed and implemented a discrete version of GOA that uses an array of cities to
represent the TSP path and initializes each grasshopper's path with a random
permutation of cities. Fitness is modelled as the total path length or the sum of the
weights of the edges in the path. The grasshopper having the minimum fitness or
minimum path length was considered to be the best agent.

Algorithm 1: Pseudocode of ALO algorithm


Input: Graph in form of Adjacency Matrix, list of lower and upper bounds
Output: Path length of optimal TSP tour
Initialise the first population of ants and antlions randomly;
for each ant in ants do
calculate fitness;
end
for each antlion in antlions do
calculate fitness;
end
repeat
for each ant in ants do
Select an antlion using Roulette wheel ;
Update upper bound ;
generate random permutation ;
update position of ant;
end
for each ant in ants do
calculate fitness;
end
if fitness(elite) is greater than fitness(maxFitness(ants)) then
update elite ;
update leastfitness(antlions) with elite ;
else
continue ;
end
until end criterion not satisfied;

The overall process, as described in Algorithm 2, is as follows. First, the grasshoppers
are initialized with a random permutation of cities. Also, a new grasshopper-update
function for the TSP problem is introduced compared to what is proposed in [6]. The
modification of the grasshopper's position depends on three criteria:
(a) the current position of the grasshopper,
(b) a random grasshopper's position, and
(c) the best grasshopper's position.
The random grasshopper's position simulates the function s, which calculates the
social forces as in [6]. The main interactions between grasshoppers are attractive and
repulsive forces. These interactions are modelled as a comfort zone around every
grasshopper where the repulsive force is greater than the attractive force. A parameter
c is used to represent this comfort zone. Initially, this parameter is high, allowing the
grasshoppers to explore large parts of the search space. Over the iterations, the value
of the parameter is reduced, leading to the movement and convergence of the grasshoppers.
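The following sketch shows one plausible discrete interpretation of this update (Python;
this is our illustrative reading, not the authors' exact update rule). Each position of
the new tour is taken from the random peer, the current tour, or the best tour, with the
shrinking parameter c moving the mix from exploration towards exploitation; duplicates
introduced by the mixing are repaired in the same way as for ALO. In [6], c is decreased
linearly from its maximum to its minimum value over the iterations.

import random

def update_grasshopper(current, random_peer, best, c):
    n = len(current)
    draft = []
    for i in range(n):
        r = random.random()
        if r < c:                          # large c: follow a random peer (social forces)
            draft.append(random_peer[i])
        elif r < c + (1.0 - c) / 2:        # otherwise mix the current position ...
            draft.append(current[i])
        else:                              # ... with the best grasshopper's position
            draft.append(best[i])
    # Repair: keep first occurrences, then append the missing cities.
    seen, tour = set(), []
    for city in draft:
        if city not in seen:
            seen.add(city)
            tour.append(city)
    tour += [city for city in range(n) if city not in seen]
    return tour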

4.1 Parallel GOA for TSP

The parallelization strategy used for GOA is similar to that of ALO, with both the
initialization of the search agents and the fitness calculation being parallelized.

Algorithm 2: Pseudocode of the GOA algorithm

Input: Graph in the form of an adjacency matrix, list of lower and upper bounds
Output: Path length of the optimal TSP tour
Initialize the first population of grasshoppers randomly;
for each grasshopper in the swarm do
    calculate fitness;
end
T = best search agent;
repeat
    for each search agent do
        update the position of the current search agent;
        calculate fitness;
    end
    update T if there is a better solution;
until end criterion not satisfied;

5 Performance Evaluation

Figure 1 compares the accuracy of ALO, GOA, ACO and GA with reference to
the Held–Karp dynamic programming algorithm for a progressively larger number of
cities. We observe that our GOA and ALO algorithms perform best in accuracy,
whereas the Genetic Algorithm (GA) performs the worst.
Figure 2 depicts the speedup of the parallel versions of ALO, GOA, ACO and GA
over the corresponding serial versions. Our ALO and GOA algorithms are on average
1.5× and 4× faster than their serial versions, respectively, on our system with a
Ryzen 7 1700 8-core CPU, 16 GB of memory and an Nvidia GTX 1050 Ti GPU.
There is little speedup for most of the algorithms on a small number of cities because
the overhead of thread spawning outweighs the benefits of parallelism.

Fig. 1 Comparison of accuracy of various algorithms

Fig. 2 Comparison of time taken by the serial and parallel versions of the algorithms on a Ryzen 7
with a GTX 1050 Ti

6 Conclusion and Future Directions

In this paper, we presented our adapted ALO and GOA algorithms for TSP, along
with specially designed and developed parallel high-performance versions of these
algorithms. We implemented serial and parallel versions of ALO and GOA and
experimentally evaluated their accuracy and performance against other well-known
algorithms. Our experimental results revealed that our ALO and GOA algorithms
perform better in terms of both accuracy and performance than the other algorithms
and are on average 1.5× and 4× faster than their serial counterparts. In the future,
we plan to implement and evaluate these algorithms in a distributed environment.

References

1. Bellman, R.: Dynamic programming treatment of the travelling salesman problem. J. ACM
(JACM) 9(1), 61–63 (1962)
2. Held, M., Karp, R.M.: A dynamic programming approach to sequencing problems. J. Soc. Ind.
Appl. Math. 10(1), 196–210 (1962)
3. Mirjalili, S.: The ant lion optimizer. Adv. Eng. Softw. 83, 80–98 (2015)
4. Mitchell, M.: An Introduction to Genetic Algorithms. MIT Press, Cambridge, MA, USA (1996)
5. Moyson, F., Manderick, B.: The Collective Behavior of Ants: An Example of Self-organization
in Massive Parallelism. Vrije Universiteit Brussel, Artificial Intelligence Laboratory (1988)
6. Saremi, S., Mirjalili, S., Lewis, A.: Grasshopper optimisation algorithm: theory and application.
Adv. Eng. Softw. 105, 30–47 (2017). https://doi.org/10.1016/j.advengsoft.2017.01.004
Design and Development of Machine
Learning Model for Osteoarthritis
Identification

Naidu Srinivas Kiran Babu, E. Madhusudhana Reddy, S. Jayanthi, and K. Rajkumar

Abstract In the human body, the calcaneus, or heel bone, is one of the strongest and
largest bones in the foot. It helps the foot move flexibly during normal walking. In
recent years, many people in the age group of 35 to 50 years have fallen victim to
osteoarthritis (calcaneal shift), which has serious and continuing impacts. Such
conditions lead to ailments of the knee. If the ache in the knee is not relieved by
physiotherapy or medication, the affected person may be confined to bed and has to
undergo a calcaneal osteotomy. This motivates the development of a user-friendly
application that loads calcaneus images into the model developed for calcaneal
osteotomy. In general, systems are available to predict the occurrence of calcaneal
shift, but they fail to predict and analyze the other subtypes of calcaneal shift in the
foot. This research work focuses on developing a model to predict and analyze the
subtypes of calcaneal shift occurring in the foot.

Keywords Calcaneal shift · Deep convolution neural network · Osteotomy ·
Osteoarthritis · Binary classification

N. S. K. Babu (B)
Department of Computer Applications, Career Point University, Kota, India
e-mail: kiranbabu.naidu16@gmail.com
E. M. Reddy
Department of CSE, Guru Nanak Institutions Technical Campus, Hyderabad, India
e-mail: e_mreddy@yahoo.com
S. Jayanthi
Department of IT, Guru Nanak Institute of Technology, Hyderabad, India
e-mail: drsjayanthicse@gmail.com
K. Rajkumar
School of Computer Science and Information Technology, DMI-St John the Baptist University,
Mangochi, Malawi
e-mail: rajkumarengg2020@gmail.com


1 Introduction

Osteoarthritis is a type of arthritis that affects a widespread number of people
worldwide. It occurs when the cartilage at the ends of the bones wears out over time.
In a few cases, osteoarthritis may also occur due to irregular foot mechanics, such as
flat feet (pes planus) or high arches. Flatfoot is a common foot condition that affects
patients from all walks of life in India. The musculotendinous component is one of the
important components of pes planus and can be aggravated by deficiencies in the
circulatory system or by increased weight loading. Flatfoot is caused by a deterioration
of the medial longitudinal arch of the foot and is usually associated with hindfoot
valgus and abduction of the forefoot, as shown in Fig. 1. No consistent method has
been developed to treat flatfoot malalignment. Conservative treatments such as
orthoses and immobilization, as well as surgical correction, are used to abate the
symptoms. The surgical procedures for addressing flatfoot malalignment may consist
of combinations of tendon transfers, medial column plantar flexor osteotomies,
stabilization of the medial column and different hindfoot osteotomies [1, 2].
There is agreement among surgeons on the surgical procedures to perform and follow.
The commonly followed hindfoot osteotomy methodologies are the medializing
calcaneal osteotomy, the Evans calcaneal osteotomy and calcaneocuboid distraction.
In recent years, the calcaneal Z-osteotomy has been adopted to address seriously
deformed flatfoot, but this method has not been reviewed and discussed in this
research work.
Fig. 1 External symptoms of pes planus: a collapse of the fallen medial arch, b malalignment of
the hindfoot and c forefoot abduction

There are diverse options available to study the deformity of flatfoot and the surgical
corrections used to treat it. The first option is to observe human subjects, but this is
time consuming and costly. Also, there are a limited number of measurements
available to treat flatfoot without harming the patient. Moreover, it is very tricky to
make direct comparisons between the different medical procedures because of the
confounding variables generated by the treatments given to specific patients [3–6].
Computational models are significantly influential, but they are restricted to a certain
degree by the robustness of their development and the rigorousness of their validation.
A number of issues, such as anatomical accuracy, tissue mechanical properties and
biofidelic boundary conditions, arise in computational models. Cadaveric models are
efficient tools that have often been adopted to investigate more invasive biomechanics.
In cadaveric models, in vitro lower limbs are loaded in a physiological simulation,
which presents an efficient way to explore foot mechanics and diverse treatment
methods [7–9].
This permits the measurement of parameters that would not be ethically feasible to
obtain from living human beings; for example, quantifying motion with bone pins is a
very difficult task in living subjects. Although various studies have been carried out
on flatfoot, most of the research is quasi-static (or static) and examines the forefoot at
midstance, or at selected specific locations in stance. Dynamic gait simulators are
recently developed simulators that allow researchers to perform fully dynamic
simulations with cadaveric specimens [6, 10].

2 Literature Review

An extensive review of the related work shows that the aim of the statistical models in
this area is to enhance the efficiency of computational foot systems by using artificial
neural networks. Human joints are modeled computationally to observe joint
deformation and kinematics and to study how joint function is affected by joint
structure. More particularly, research carried out using these computational foot
models has studied many topics, including joint motion and the positions of related
bones under simulated load, the forces exerted on joints by injury or day-to-day
activities, and the positioning of hardware for correcting defects.
A. Evaluating calcaneal load from the human footprint while standing using a
3D scanner.

This work carried out extensive research on the relationship between the footprint
load and footprint depth in the calcaneal area of a human standing upright. The
footprint depths, which represent the deformation in the calcaneal area, were obtained
by extracting z-values from 3D scans of the foot. In this study, a force-sensing resistor
was placed over the shoe in the calcaneal area, and the peak loads were then estimated
from the footprint. Twenty patients were selected to carry out these measurements
[1, 11, 12].
798 N. S. K. Babu et al.

In this study, a notable difference was observed in the calculated calcaneal loading
due to the plantar foot position of the patients. It was observed that a plantar foot
position that bends toward the front, back or side also affects the result. A 3D scanner
can be used to estimate the calcaneal loading during standing posture. The benefit of
this method is that the pressure or load is calculated at the part of the footprint in
contact with the other surface, using the desired footprint depth location instead of the
maximum footprint depth.
B. Experimental results of the calcaneal lengthening osteotomy to address pes
planovalgus and evaluate the position of the foot.

The efficiency of calcaneal lengthening in handling pes planovalgus and reinstating
the normal position of the foot is evaluated using an adapted Evans osteotomy method.
The technique was carried out using the adapted Evans procedure on 11 patients of
different age groups with pes planovalgus deformity. Five patients had cerebral palsy,
one had sequelae of myelomeningocele, one had sensorimotor polyneuropathy, and
four were assessed as idiopathic. Out of the 11 patients, ten had undergone long-term
conventional therapy preoperatively, while one patient had not undergone any surgery
to correct the deformity of the foot [6, 10, 13].
Clinical assessment was performed on ten different parameters, and radiographic
assessment on seven different parameters using typical anteroposterior and other
radiographic views. These assessments were carried out over 15 months. After
thorough assessment, it was observed that union was attained in the patients after
seven weeks. The clinical result was excellent in 17 feet, good in 3 feet, fair in 1 foot
and poor in 1 foot. Radiographically, five feet were rated excellent, 13 feet good and
four feet fair. The calcaneal lengthening performed was 7.3 mm, and no distortion or
overcorrection occurred during or after the treatment. Also, before surgery, five of the
patients were able to walk on the heel of the foot without any support [6, 9].
C. Deep CNN model for sentence classification.

The deep CNN model used for sentence classification has three filter region sizes,
2, 3 and 4, with two filters of each size, as shown in Fig. 2. The filters perform
convolutions over the sentence matrix and produce feature maps of variable length.
Next, 1-max pooling is carried out over each of the maps. The six resulting univariate
features are concatenated to form a feature vector for the penultimate layer.
Subsequently, this feature vector is fed as input to the final softmax layer to perform
classification; here, since we perform binary classification, it produces two outputs,
normal and defected [10, 14].
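As an illustrative sketch of this kind of architecture (written with the Keras API; the
vocabulary size, embedding dimension and sequence length are placeholder values, not
taken from [14]):

from tensorflow.keras import layers, Model

def build_sentence_cnn(vocab_size=5000, embed_dim=50, seq_len=60):
    # Integer-encoded sentence -> embedding matrix (the "sentence matrix").
    inputs = layers.Input(shape=(seq_len,), dtype="int32")
    x = layers.Embedding(vocab_size, embed_dim)(inputs)
    pooled = []
    for region_size in (2, 3, 4):                         # three filter region sizes
        conv = layers.Conv1D(filters=2, kernel_size=region_size,
                             activation="relu")(x)        # two filters per size
        pooled.append(layers.GlobalMaxPooling1D()(conv))  # 1-max pooling over each map
    merged = layers.Concatenate()(pooled)                 # six features -> penultimate vector
    outputs = layers.Dense(2, activation="softmax")(merged)  # normal vs. defected
    return Model(inputs, outputs)

model = build_sentence_cnn()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])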

Fig. 2 Illustration of deep CNN model for classification of sentence [14]

3 Methodology

The research work on the proposed method is partitioned into four parts, namely
data collection, data preprocessing, fusion and feature extraction, and pattern
classification. First, in the data collection phase, images are collected from
individuals. In the second phase, a deep convolution neural network is applied to the
images. In the third phase, important features are extracted from the images. During
the final phase, the softmax algorithm, a pattern recognition algorithm, is used to
classify the patterns as either normal or defected.

Fig. 3 Deep CNN softmax model for classification of calcaneal image

4 Development of a Novel Classification Model

The goal of the novel classification model is to take a collection of sample test case
inputs and provide reliable coverage at an identifiable depth of the test space. This
leads to a set of test cases that are focused on exercising the functionality of the foot
model independently of the implementation. This softmax model based on a deep CNN can
be divided into three steps.
1. Defining the operational scope of the softmax model. This step includes the data
acquisition and preprocessing stages of osteoarthritis/flatfoot identification.
2. Identifying and enumerating the attributes in the images and their values. This step
includes analyzing the preprocessed image, which is fed as input to the CNN softmax
algorithm.
3. Applying the deep CNN softmax model to the data/images and classifying the
images as either normal or defected (Fig. 3).
The detailed architectural model for the automatic detection of flatfoot images and
prediction of results is shown in Fig. 4. It is similar to a digital stain that identifies
the image regions most relevant for classifying the foot as either normal or defected.
This research work proposes a model for the classification of flatfoot using the CNN
softmax algorithm. The efficacy of this model will be tested on varying foot images to
detect foot deformity, and its efficiency parameters will also be evaluated by
comparison with competing algorithms.
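A minimal sketch of such a deep CNN with a softmax output for the binary classification
of calcaneal/foot images is given below (Keras; the input resolution, layer sizes and
class encoding are illustrative assumptions, since the paper does not specify the exact
architecture):

from tensorflow.keras import layers, Model

def build_calcaneal_classifier(img_height=224, img_width=224):
    # Grayscale foot image as input; rescaling acts as a simple preprocessing step.
    inputs = layers.Input(shape=(img_height, img_width, 1))
    x = layers.Rescaling(1.0 / 255)(inputs)
    # Convolutional feature extractor.
    for filters in (16, 32, 64):
        x = layers.Conv2D(filters, kernel_size=3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(64, activation="relu")(x)
    # Softmax output over the two classes: normal and defected.
    outputs = layers.Dense(2, activation="softmax")(x)
    return Model(inputs, outputs)

model = build_calcaneal_classifier()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])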

5 Importance of Proposed Research

The proposed research on developing a novel classification model aims to predict and
address calcaneal shift in the feet of flatfoot patients. This model helps to reduce
the time complexity, and the accuracy of the prediction is also better than that of the
conventional method of predicting or measuring the calcaneal shift. As it takes less
time and is accurate in predicting the calcaneal shift, patients can undergo preventive
measures in the early stages to avoid harmful effects on joints such as the hip, pelvis,
knee and spine. This method can therefore be used to predict the flatfoot problem
before the deformity occurs.

Fig. 4 Architectural model for classification of calcaneal images using the softmax algorithm

6 Conclusion

Calcaneal lengthening osteotomy is used for pain relief and produces notable clinical
and radiographic improvement in the forefoot and hindfoot for symptomatic pes
planovalgus. Different feeding techniques can be implemented to further enhance the
model. Further work can be carried out to detect congenital anomalies related to
calcaneal shift. The program code can also be adapted to design standards other than
the Indian Standard by incorporating the necessary modifications.

References

1. Albon, T.: Plantar force distribution for increasing heel height within women’s shoes. Physics,
The College of Wooster, Wooster, Ohio, December 2011
2. Wibowo, D.B., Gunawan, D.H., Agus, P.: Estimation of foot pressure from human footprint
depths using 3D scanner. AIP Conf. Proc. 1717 (2016)

3. Wright, R.W., Boyce, R.H., Michener, T., Shyr, Y., McCarty, E.C., Spindler, K.P.: Radiographs
are not useful in detecting arthroscopically confirmed mild chondral damage. Clin. Orthop.
Relat. Res. 245–25 (2006)
4. Urry, S., Wearing, S.: Arch indexes from ink footprints and pressure platforms are different.
Foot 15(2), 68–73 (2005)
5. Hunt, A.E., Fahey, A.J.: Static measures of calcaneal deviation and arch angle as predictors of
rearfoot motion during walking. Aust. J. Phys. 46, 9–17 (2000)
6. Hsu, T.C., et al.: Comparison of the mechanical properties of the heel pad between young and
elderly adults. Arch. Phys. Med. Rehabil. 79, 1101–1104 (1998)
7. Barrett, S.L., O’Malley, R.: Plantar fasciitis and other causes of heel pain. Am. Fam. Phys.
15:59(8), 2200–2206 (1999)
8. Nass, D., Hennig, Treek, V.: The thickness of the heel pad loaded by bodyweight in obese
and normal weight adults. Biomechanics Laboratory, University of Essen, Germany, D 45117
(2000)
9. Pinto, C.C., Marques, M., Ramos, N.V., Vaz, M.A.P.: 3D modelling for FEM simulation of an
obese foot. ResearchGate. Conference Paper, January 2010
10. Filardi, V.: Flatfoot and normal foot a comparative analysis of the stress shielding. 15(3),
820–825 (2018)
11. Baldonado, M., Chang, C.-C.K., Gravano, L., Paepcke, A.: The stanford digital library metadata
architecture. Int. J. Digit. Libr. 108–121 (1997)
12. ScanPod3D: 3D Scanner Mini and Scansoft for Foot Orthotic. Vismach Technology Ltd. www.
scanpod3d.com (2013)
13. Lee, D.G., Davis, B.L.: Assessment of the effects of diabetes on midfoot joint pressures using
a robotic gait simulator. Foot Ankle Int. 30(8), 767–772 (2009)
14. Kim, J.: Convolutional neural network (CNN) perform text classification with word embed-
dings. In: Towards Data Science, Dec 3, 2017
Author Index

A Bharti, Shubam, 153


Aathira, M., 307 Bhat, Prashant, 547, 565
Abbas, Junaid, 527 Bhat, Aruna, 161
Abimannan, Satheesh, 429 Bhatia, Rajesh, 153
Abraham, Mathews, 473 Bhattacharyya, Koushik, 737
Aditya Sai Srinivas, T., 43 Bhosale, M. S., 205
Afnan, Ayesha, 527 Budyal, Rahul Rajendrakumar, 403
Agarwal, Mohit, 1
Agrahari, Neeraj Kumar, 215
Ahire, Deepak, 341 C
Ahir, Hemal, 573 Chandan, Rithvik, 769
Akhilesh, N. S., 555 Chegaraddi, Sangeetha S., 403
Akhil, K., 63 Chhabra, Bhavya, 153
Amrita, I., 597 Choubey, Rishita, 737
Anand, Nikhil, 457 Choudhary, Mukesh, 87
Aniruddha, M. N., 555 Cross, Maria Anisha, 277
Anirudh, R. V., 317
Antonidoss, A., 483
Appalanaidu, Majji V., 515 D
Aravind, Karrothu, 43 Das, Arijit, 161
Arya, S. J., 633 Devarapalli, Danny Joel, 259
Ashok Kumar, S., 19, 27 Dheemanth, G. R., 787
Asish, A., 633 Dilli Babu, S., 749
Awasthi, Lalit Kumar, 651 Dubey, Nishita, 701

B E
Babu, Brundha Rajendra, 317 Eapen, Justin, 297
Babu, Naidu Srinivas Kiran, 795 Emon, Ismail Siddiqi, 363, 761
Balakesava Reddy, P., 53
Bala, Shashi, 493
Begum, Gousiya, 269 F
Bhandari, Smriti, 341 Faizul Huq Arif, Md., 363
Bhandigare, Shivani, 643 Febi Shine, B. S., 633
Bhardwaj, Vivek, 195, 493 Fernandes, Chelsea, 701

G Kaman, Sweta, 235


Gajbhiye, Snehal, 333 Kamble, Kiran, 341
Gajjar, Sachin, 693 Kamble, Kiran P., 643
Garg, Deepak, 215 Kamble, R. M., 205
Ghosh, Anirban, 555 Kanchan, Shohna, 701
Godwin Barnabas, S., 327 Karjole, Aditi, 333
Gopalakrishnan, E. A., 185 Karri, Sai Prashanth Reddy, 259
Gorijavolu, Harshit, 259 Kathuria, Shivam, 153
Gote, Anuja, 9 Katkuri, Pavan Kumar, 605
Govardhan, A., 465 Kaur, Harshdeep, 195
Govinda, K., 53 Kaviya, V., 709
Govindasamy, C, 483 Khan, Shahnawaz, 429
Gupta, Anmol, 429 Kodavade, Prajkta, 643
Gupta, Himanshu, 437, 447 Koushik, Rahul M., 769
Gupta, Suneet Kr., 1 Krishna Menon, Vijay, 185
Gupta, Sunny, 9 Kulkarni, Linganagouda, 79
Gurumurthi, Jaishankar, 87 Kulkarni, Tejas, 9
Kumaravelan, G., 515
Kumari, Dara Anitha, 465
H Kumari, Ruchika, 99
Hamdare, Safa, 701 Kumar, Manish, 153
Hari, Akshaya, 527 Kumar Mohanta, Bhabendu, 73
Harini, N., 185 Kumar, Rakesh, 99
Harisankar, V., 709
Hegde, Prajna, 565
Hossain, Javed, 353
L
Hossain, Mohammad Mobarak, 761
Ladge, Leena, 87
Huq, S. Zahoor Ul, 269
Lakshmi Priya, E., 411

I
Islam, Aminul, 583 M
Maan, Veerpaul Kaur, 123
Maheshwari, Sagar, 693
J Mahtab, Sheikh Shahparan, 363, 761
Jahan, Busrat, 363, 761 Maity, Soumayadev, 583
Jaimin, Patel, 241 Malaganve, Pradnya, 547
Jaya Kumar, D., 393 Malage, Rajshri N., 375
Jayanthi, S., 795 Malathi Latha, Y. L., 143
Jena, Debasish, 73 Malhotra, Ruchika, 727
Jetawat, Ashok, 385 Manivannan, S. S., 43
Jeyakumar, G., 307, 411, 539 Mantri, Archana, 605
Jha, Shikha, 9 Mate, Yash, 171
John, Jeffin, 297 Mathur, Aakansha, 503
John, Jewel Moncy, 297 Mavilla, Venkata Sai Dheeraj, 259
Joseph, Ebin, 297 Milu, Sharmin Akter, 353, 363, 761
Julfiker Raju, Md., 363 Mohapatra, Niva, 73
Juliet, D. Sujitha, 225 Moharana, Suresh Chandra, 719
Moon, Kapila, 385
Mund, Ganga Bishnu, 719
K Mupila, Francis K., 437, 447
Kadam, Aishwarya, 643 Murali Krishna, T., 135
Kadyan, Virender, 195, 493 Murthy, GRS, 285
Kalsekar, Samruddhi, 701 Myna, P., 317

N Rohit, Chatla Venkat, 285


Nagaprasad, S., 677
Nagaraj, Akash, 63
Nagdev, Nikhil, 171 S
Nagpal, Rahul, 769, 787 Sahu, Gitimayee, 33
Nair, Akhil, 87 Saidulu, Ch., 135
Narayana, M. V., 749 Saidulu, D., 53
Nayak, Jyothi S., 317 Sai Sreekari, C., 411
Nehal, Patel, 241 Sajith Variyar, V. V., 185
Nimmakuri, Sri Anjaneya, 259 Salwan, Poonam, 123
Nobi, Ashadun, 353 Sandip, Patel, 241
Nuthakki, Ramesh, 527 Sangal, Amrit Lal, 623, 657
Sankarachelliah, N., 327
Santhosh, R., 779
P Saravanan, S., 225
Padmavathi, S., 709 Sateesh Kumar, K., 135
Pandey, Mithilesh Kumar, 215 Satyavathi, K., 393
Pandey, Shivani, 727 Satyavathi, N., 115
Pandey, Sonal, 663 Selva Sundar, T., 327
Patel, Meet, 573 Sen, Protiva, 251
Pathak, Bageshree, 333 Sen, Snigdha, 597
Patil, Gayatri, 171 Senthilram, P., 327
Patil, Mithun B., 375 Shanmuganantham, T., 19, 27
Patil, S. T., 205 Shariff, Faisal Ahmed, 527
Pawar, Sanjay S., 33 Sharma, Ashu, 663
Pawar, Sonali, 333 Sharma, Hemant Kumar, 657
Pentapati, Niharika, 769 Sharma, Kapil, 419
Pranitha, B. L., 597 Sharma, Sanjay, 663
Prashamshini, Eleanor, 317 Shekar, K. Chandra, 277
Premjith, B., 615 Sindhu, K., 555
Priya, R. L., 171 Singh, Shivam, 215
Pushpatha, T., 677 Singh, Shreyanshi, 73
Sirisati, Ranga Swamy, 749
Siva Kumar, A. P., 269
R Skanda, V. C., 787
Radhakrishnan, Anisha, 539 Soman, K. P., 185, 615
Rahman, Mostafizur, 251 Somula, Ramasubbareddy, 43
Rajakarunakaran, S., 1, 327 Sowmya, V., 185
Raja Kishore, R., 393 Sreelakshmi, J. L., 633
Raja, Laxmi, 779 Sreelakshmi, K., 615
Rajkumar, K., 795 Srikanth, H. R., 63
Rama, B., 115 Srinath, Raghunandan, 403
Rama Krishna, C., 663 Srinivas, Pattlola, 143
Ramasubbareddy, Somula, 53 Srinu, Dhonvan, 393
Ramji, B., 185 Subramanyam, S., 19
Ranjan, Sudhanshu, 583 Sultana, Razia, 503
Rastogi, Abhinav, 161 Sumukh, Y. R., 403
Rastogi, Akanksha, 623 Swain, Amulya Ratna, 719
Rawat, Arun Pratap, 583 Swami Das, M., 143
Reddy, E. Madhusudhana, 795
Redekar, Neha, 643
Regi, Mathew, 473 T
Rijith Kumar, V., 327 Thakkar, Falgun, 573

Thejas, B. K., 597 Venkatesh, Akshay, 63


Thirunavukkarasu, K., 429 Venugopal Rao, K., 27
Thomas, Abraham K, 297 Verma, Harsh K., 651
Tirodkar, Gaurav, 171 Verma, Prashant, 419
Tulasi Sasidhar, T., 615 Vichore, Hrishikesh, 87
Vijayalakshmi, M., 79
Vijay Kumar, P., 135
U Vishnu Vardhana Rao, M., 749
Udaya Bhanu, P., 135 Vrindavanam, Jayavrinda, 403

V
Valai Ganesh, S., 1, 327 Y
Varghese, Elizabeth, 633 Yadav, Anupama, 651
Vasudevan, Vignesh, 277 Yashaswini, L., 403
