
Algorithms for Intelligent Systems
Series Editors: Jagdish Chand Bansal · Kusum Deep · Atulya K. Nagar

Ritu Tiwari
Mario F. Pavone
Mukesh Saraswat Editors

Proceedings
of International
Conference
on Computational
Intelligence
ICCI 2022
Algorithms for Intelligent Systems

Series Editors
Jagdish Chand Bansal, Department of Mathematics, South Asian University,
New Delhi, Delhi, India
Kusum Deep, Department of Mathematics, Indian Institute of Technology Roorkee,
Roorkee, Uttarakhand, India
Atulya K. Nagar, School of Mathematics, Computer Science and Engineering,
Liverpool Hope University, Liverpool, UK
This book series publishes research on the analysis and development of algorithms for
intelligent systems with their applications to various real world problems. It covers
research related to autonomous agents, multi-agent systems, behavioral modeling,
reinforcement learning, game theory, mechanism design, machine learning, meta-
heuristic search, optimization, planning and scheduling, artificial neural networks,
evolutionary computation, swarm intelligence and other algorithms for intelligent
systems.
The book series includes recent advancements, modifications and applications of
artificial neural networks, evolutionary computation, swarm intelligence, artificial
immune systems, fuzzy systems, autonomous and multi-agent systems, machine
learning and other intelligent systems related areas. The material will be beneficial
for graduate students, post-graduate students as well as researchers who want a
broader view of advances in algorithms for intelligent systems. The contents will
also be useful to researchers from other fields who are not familiar with the power
of intelligent systems, e.g. researchers in the field of bioinformatics, biochemists,
mechanical and chemical engineers, economists, musicians and medical
practitioners.
The series publishes monographs, edited volumes, advanced textbooks and
selected proceedings.
Indexed by zbMATH.
All books published in the series are submitted for consideration in Web of
Science.
Ritu Tiwari · Mario F. Pavone · Mukesh Saraswat
Editors

Proceedings of International
Conference
on Computational
Intelligence
ICCI 2022
Editors

Ritu Tiwari
Indian Institute of Information Technology
Pune, India

Mario F. Pavone
Department of Mathematics and Computer Science
University of Catania
Catania, Italy

Mukesh Saraswat
Jaypee Institute of Information Technology
Noida, India

ISSN 2524-7565 ISSN 2524-7573 (electronic)


Algorithms for Intelligent Systems
ISBN 978-981-99-2853-8 ISBN 978-981-99-2854-5 (eBook)
https://doi.org/10.1007/978-981-99-2854-5

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface

This book contains outstanding research papers as the proceedings of the Interna-
tional Conference on Computational Intelligence (ICCI 2022), held on December
29–30, 2022, at Indian Institute of Information Technology, Pune, India, under the
technical sponsorship of the Soft Computing Research Society, India. The conference
is conceived as a platform for disseminating and exchanging ideas, concepts, and
results of researchers from academia and industry to develop a comprehensive under-
standing of the challenges of the advancements of intelligence in computational view-
points. This book will help in strengthening congenial networking between academia
and industry. We have tried our best to enrich the quality of the ICCI 2022 through the
stringent and careful peer-review process. This book presents novel contributions to
Computational Intelligence and serves as reference material for advanced research.
We have tried our best to enrich the quality of the ICCI 2022 through a stringent
and careful peer-review process. ICCI 2022 received many technical contributed
articles from distinguished participants from home and abroad. After a very strin-
gent peer-reviewing process, only 33 high-quality papers were finally accepted for
presentation and the final proceedings. The proceedings of ICCI 2022 contains 33
research papers on Computational Intelligence-based Algorithms and applications
and serves as reference material for advanced research.

Pune, India Ritu Tiwari


Catania, Italy Mario F. Pavone
Noida, India Mukesh Saraswat

Contents

1 Entropy Measure for the Linguistic Intuitionistic Fuzzy Set . . . . . . . 1
Ritu Malik and Kamal Kumar
2 IoT-Based Smart City Architecture and Its Applications . . . . . . . . . . 11
Sree Charan Mamidi, Shadab Siddiqui, and Sheikh Fahad Ahmad
3 Principal Component Analysis and Correlation
Coefficient-Based Decision-Making Approach for Stock
Portfolio Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Garima Bisht and A. K. Pal
4 Survey on Crop Production and Crop Protection . . . . . . . . . . . . . . . . . 39
H. S. Rakshitha, Mayur S. Gowda, and Akshata S. Kori
5 Disease Detection for Grapes: A Review . . . . . . . . . . . . . . . . . . . . . . . . . 51
Priya Deshpande and Sharada Kore
6 URL Weight-Based Round Robin Load Balancing in Cloud
Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Vijay Kumar Nampally, Satarupa Mohanty,
and Prasant Kumar Pattnaik
7 Determination of Thickness and Refractive Indices of Thin
Films from Reflectivity Spectrum Using Rao-1 Optimization
Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Bhautik H. Gevariya, Sanjaykumar J. Patel, and Vipul Kheraj
8 Depth Maps-Based 3D Convolutional Neural Network and 3D
Skeleton Information with Time Sequence for HAR . . . . . . . . . . . . . . 89
Hua Guang Hui, G. Hemantha Kumar,
and V. N. Manjunath Aradhya
9 Deep Sea Debris Detection Using YOLOIncep Network . . . . . . . . . . . 101
J. Sudaroli Sandana, Sai Vignesh, R. Sharan, and S. Deivalakshmi


10 Brain Tumor Early Diagnosis Using Hybrid Fuzzy K-Means
and Convolutional Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
M. Jeyavani and M. Karuppasamy
11 Precipitation Forecasting: LSTM Modeling in Visual Analytic
Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
Sudha Govindan and Suguna Sangaiah
12 Cyclone Forecasting Before Eye Formation Using Deep
Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Aryan Khandelwal, R. S. Ramya, S. Ayushi, R. Bhumika,
P. Adhoksh, Keshav Jhawar, Ayush Shah, and K. R. Venugopal
13 Fusion of Information Acquired from Camera and Ultrasonic
Range Finders for Obstacle Detection and Depth Computation . . . . 151
Jyoti Madake, Heenakauser Pyare, Sagar Nilgar, Sagar Shedge,
Shripad Bhatlawande, Swati Shilaskar, and Rajesh Jalnekar
14 Efficient Approach for Malware Detection Using Machine
Learning Classifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
Umesh V. Nikam and Vaishali M. Deshmukh
15 Evaluation of a Hybrid Dataset for Risk Assessment of Heart
Disease . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
Indrani Mukherjee, Pratik Bhattacharjee, and Suparna Biswas
16 Distances from Fuzzy Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
Kavit Nanavati, Megha Gupta, and Balasubramaniam Jayaram
17 Real-Time Quick Fog Removal Technique for Supporting
Vehicles on Hilly Routes Amid Dense Fog . . . . . . . . . . . . . . . . . . . . . . . . 199
K. Janaki, K. Jebastin, and K. Dhinakaran
18 Deep Learning-Based Approach for Outlier Detection
in Wireless Sensor Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
Biswaranjan Sarangi and Biswajit Tripathy
19 Predicting Kidney Tumor Using Convolutional Neural
Network (CNN) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
Kajal Rai and Pawan Kumar
20 Hybrid Machine Learning Approach for Sentiment Analysis
of Amazon Products: A Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
Om Sarulkar, Rahul Pitale, Shivam Tikhe, Rohan More,
and Sumit Giri
21 Sentimentum: A Method of Detecting Fake News . . . . . . . . . . . . . . . . . 249
Vitor da Silva Souza and Leandro Augusto Silva

22 Artificial Neural Networks for Self-phase Modulation
Compensation in Unrepeated Digital Coherent Optical
Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
Grazielle Cossa, Camila Costa, Vitória Cesar, Lucas Marim,
Rafael Penchel, José Augusto de Oliveira, Mirian Santos,
Denilson Souza dos Santos, and Ivan Aldaya
23 Comparative Analysis of Cognitive Services in Popular Cloud
Platforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
Preethi Sheba Hepsiba Darius, K. Krishna Sowjanya,
V. N. Manju, Sanchari Saha, Paramita Mitra, S. Aswathi,
Bhuvanesh Bhattarai, and Shreekanth M. Prabhu
24 A Survey on Efficient Neural Network Compression
Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
Nipun Jain, Medha Wyawahare, Vivek Mankar,
and Tanmay Paratkar
25 Ortho-FLD: Analysis of Emotions Based on EEG Signals . . . . . . . . . 299
M. S. Thejaswini, G. Hemantha Kumar,
and V. N. Manjunath Aradhya
26 Implementation of Reliable Post-disaster Relief
Communication Network Using Hybrid Secure Routing
Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
G. Sabeena Gnana Selvi, A. Prasanth, D. Sandhya,
and B. Gracelin Sheena
27 Compact Metamaterial Octagonal Antenna for Wireless Body
Area Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
Goswami Siddhant Arun and Deepak C. Karia
28 Brain Tumor Detection and Segmentation Empowered
with Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
Pooja V. Kamat, Rahul Mansharamani, Pratyush Jain,
Sudhanshu Pandey, Prakhar Agarwal, Shruti Patil, and Rahul Joshi
29 Security of Electronic Voting Systems Using Blockchain
Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
Rakesh Kumar Pandey and Rakesh Kumar Tiwari
30 Go-Kart Simulation in HoloLens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
K. Paridhi, Shola Olabisi, Y. V. Srinivasa Murthy, and J. Vaishnavi
31 A Survey on Different Techniques for Anomaly Detection . . . . . . . . . 365
Priyanka P. Pawar and Anuradha C. Phadke
32 A Scholastic Comprehensive Study on 6G Wireless
Communication System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
Kavita H. Gudadhe, Warsha P. Sirskar, and Swati Gaikwad

33 A Modified LSB Steganography Algorithm to Store Images
of Large Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
Y. V. Srinivasa Murthy, Shashidhar G. Koolagudi, Saloni Parekh,
Deshpande Arnav Sunil, and J. Vaishnavi

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407


About the Editors

Prof. Ritu Tiwari is currently working as Professor in the Department of Computer
Science and Engineering at Indian Institute of Information Technology (IIIT) Pune.
Before joining IIIT Pune, she was Associate Professor in Department of Informa-
tion and Communication Technology at ABV-Indian Institute of Information Tech-
nology and Management (IIITM) Gwalior. She has 12 years of teaching and research
experience. Her field of research includes robotics, artificial intelligence, and soft
computing and applications. She has published five books and more than 80 research
papers in various national and international journals/conferences and is Reviewer for
many international journals/conferences. She has received Young Scientist Award
from Chhattisgarh Council of Science and Technology in the year 2006. She also
received Gold Medal in her postgraduation from NIT, Raipur.

Dr. Mario F. Pavone is currently working as Associate Professor in Computer
Science at the Department of Mathematics and Computer Science, University of
Catania, Italy. Professor Pavone is focused on the design and development of meta-
Catania, Italy. Professor Pavone is focused on the design and development of meta-
heuristics applied in several research areas, such as in combinatorial optimization;
computational biology; network sciences and social networks. Professor Pavone was
Visiting Professor with fellowship at the Faculty of Sciences, University of Angers,
France, in 2016. From August 2017, Prof. Pavone is Member of the IEEE Task
Force on the Ethical and Social Implications of Computational Intelligence, for the
IEEE Computational Intelligence Society (IEEE CIS). Since February 2015, Prof.
Pavone is Vice-Chair of the Task Force on Interdisciplinary Emergent Technologies
for the IEEE Computational Intelligence Society (Emergent Technologies Technical
Committee—ETTC), whose main aim is to promote the interdisciplinary study of
emergent computation in bio-informatics, bio-physics, interdisciplinary domains of
economy, medicine, and industry. Professor Pavone also served as the Chair of the
Task Force on Artificial Immune Systems for the IEEE Computational Intelligence
Society (IEEE CIS). Professor Pavone is Member of several Editorial Boards for
international journals, as well as Member of many Program Committees in interna-
tional conferences and workshops. Professor Pavone also has extensive experience
in organizing successful workshops, symposia, conferences, and summer schools.


Professor Pavone was also Invited Speaker for several international conferences and
Editor of many special issues in: artificial life, engineering applications of artificial
intelligence (EAAI), applied soft computing (ASOC), BMC immunology, natural
computing, and memetic computing. Professor Pavone is Co-founder of Tao
Science Research Center and Scientific Director of ANTs Lab—Advanced New
Technologies Research Laboratory. Professor Pavone was Visiting Professor at the
School of Computer Science, University of Nottingham, UK, and Visiting Researcher
at the IBM-KAIST Bio-Computing Research Center, Department of Bio and Brain
Engineering, at the Korea Advanced Institute of Science and Technology (KAIST)
in 2009 and 2006, respectively.

Dr. Mukesh Saraswat is Associate Professor at Jaypee Institute of Information
Technology, Noida, India. Dr. Saraswat obtained his Ph.D. in Computer Science and
Engineering from ABV-IIITM Gwalior, India. He has more than 19 years of teaching
and research experience. He has guided three Ph.D. students and presently guiding
four Ph.D. students. He has published more than 70 journal and conference papers in
the area of image processing, pattern recognition, data mining, and soft computing.
He was part of a successfully completed project funded by SERB, New Delhi, on
image analysis and currently running one project funded by CRS, RTU, Kota. He has
been Active Member of many organizing committees for various conferences and
workshops. He is also Guest Editor of the Array, Journal of Swarm Intelligence, and
Journal of Intelligent Engineering Informatics. He is one of the General Chairs of
the International Conference on Data Science and Applications. He is also Editorial
Board Member of the Journal MethodsX. He is also Series Editor of the SCRS
Book Series on Computing and Intelligent Systems (CIS). He is Active Member of
IEEE, ACM, CSI, and SCRS Professional Bodies. His research areas include image
processing, pattern recognition, data mining, and soft computing.
Chapter 1
Entropy Measure for the Linguistic
Intuitionistic Fuzzy Set

Ritu Malik and Kamal Kumar

1 Introduction

Decision-making (DM) is an important part of human life. In every area of human
life, such as business, society, medical science, and project evaluation, DM is a
common activity. During the decision-making process, decision-makers face various
uncertainties. To overcome this issue, fuzzy set (FS) theory was proposed by Zadeh
[1] in 1965. In some particular circumstances, FS theory was unable to give a proper
explanation of the provided information. Atanassov [2] then proposed an extension
of the fuzzy set, known as the intuitionistic fuzzy set (IFS). Decision-makers have
found that IF sets are more convenient for the practical representation of quantitative
fuzzy information. In the last decades, many researchers have developed various
tools and technologies in the field of IFSs. Chen et al. [3] constructed a MADM
approach using the TOPSIS technique for the IFN environment. In [4], Feng et al.
defined a MADM approach for the IFN environment based on Minkowski weighted
score functions. Dhankhar and Kumar [5] defined advanced possibility degree
measures for IFSs. Dhankhar et al. [6] defined a ranking method for comparing
the IFSs. Kumar and Chen [7] defined Heronian mean aggregation operators (AOs)
for combining intuitionistic fuzzy numbers (IFNs). In [8], Garg presented interactive
averaging AOs for IFNs. Kumar and Chen [9] defined improved Einstein AOs for
the IFNs.
To measure the uncertainty, entropy measure (EM) is an effective tool that can
depict fuzziness of the data. Szmidt and Kacprzyk [10] broadened the idea of entropy
measure for IFSs. Zhang and Jiang [11] defined the logarithmic EM for IFSs, while
[12] introduced an EM based on cosine function for IFS. Liu and Ren [13] realized
that the existing EMs did not contain the hesitance degree of IFS and also defined
an EM by including the degree of uncertainty of IFS. Garg and Kaur [14] defined a
novel (R,S)-norm EM for IFSs. Another important part of solving MADM problems
is aggregating the data provided by the decision-maker(s).

R. Malik · K. Kumar (B)
Department of Mathematics, Amity School of Applied Sciences, Amity University Haryana,
Gurugram, India
e-mail: kamalkumarrajput92@gmail.com
However, IFS is not so proficient when we work with qualitative fuzzy infor-
mation. It is much easier to express qualitative fuzzy information with linguistic
variables (LVs) [15]. For example, when the quality of some food product is assessed,
terms like "not good", "good", and "very good" are generally adopted by decision-
makers to express their choice. To handle the uncertainty of qualitative data, Chen
et al. [16] developed the linguistic intuitionistic fuzzy set (LIFS) in 2015 by combining
the characteristics of LVs and IFSs. Kumar and Chen [17] defined weighted averaging
AOs for aggregating LIFNs. Liu and Wang [18] defined improved AOs for linguistic
intuitionistic fuzzy numbers (LIFNs). Peng et al. [19] presented AOs for LIFNs
through the use of Frank Heronian operations. Set pair analysis (SPA) theory-based
AOs for LIFNs were proposed by Garg and Kumar [20]. Garg and Kumar [21]
defined the possibility degree measure for comparing LIFNs. Kumar and Chen [22]
defined distance measures for LIFSs and a group decision-making method for LIFSs.
Meng and Dong [23] defined similarity measures and a PROMETHEE method based
on them for LIFSs. Tang and Meng [24] defined Hamacher aggregation operators
for aggregating LIFNs. Liu et al. [25] defined a three-way decision method for
LIFNs. Li et al. [26] proposed an entropy measure for LIFSs and an extended VIKOR
method based on new LIFS operational laws and the proposed entropy measure. In
2021, Kumar et al. [27] defined a new entropy measure for LIFSs for solving
decision-making problems.
However, on mathematical verification, we found some inadequacies in the existing
EMs of LIFSs. To overcome these drawbacks, a different EM is required to
measure the uncertainty of LIFSs. This paper proposes a new EM for the LIFSs.
We also prove some desirable properties and a validity condition of the presented
EM of LIFSs to validate it. The proposed EM can overcome the downsides of the
current EMs of the LIFSs, and it is simple and useful for calculating the uncertainty
of LIFSs.
To achieve the above-mentioned target, the rest of the paper is organized as follows: In
Sect. 2, a brief introduction to the fundamental concepts relevant to this paper
is given. The drawbacks of the current EMs are given in Sect. 3. In Sect. 4, we
define a new EM for the LIFS environment that can defeat the disadvantages of the
current EMs of LIFSs. Finally, Sect. 5 concludes the paper.

2 Preliminaries
 
Definition 1 [28] Let a linguistic term (LT) set (LTS) be $S = \{s_t \mid t = 0, 1, 2, \ldots, h\}$
with a finite odd cardinality, where $s_t$ is a desired value for a linguistic variable (LV).
For example, when evaluating a laptop's "configuration", we can implement seven
LTs as $s_0$ ("none"), $s_1$ ("very low"), $s_2$ ("low"), $s_3$ ("medium"), $s_4$ ("high"), $s_5$ ("very
high"), and $s_6$ ("perfect").

LTS must satisfy the following properties [28]:


(i) sk ≤ st ⇔ k ≤ t;
(ii) Neg(sk ) = sh−k ;
(iii) max(sk , st ) = sk ⇔ sk ≥ st ;
(iv) min(sk , st ) = st ⇔ sk ≥ st .
Later on, the discrete LTS $S$ was extended to a continuous LTS by [29] as
$S_{[0,h]} = \{s_z \mid s_0 \le s_z \le s_h\}$.

Definition 2 [16] A linguistic intuitionistic fuzzy set (LIFS) in the universe of
discourse $U$ is defined as

$$Z = \{\langle u_i, s_{\rho(u_i)}, s_{\eta(u_i)}\rangle \mid u_i \in U\} \tag{1}$$

where $s_{\rho(u_i)} \in S_{[0,h]}$ and $s_{\eta(u_i)} \in S_{[0,h]}$ indicate the belongingness degree (BD) and
non-belongingness degree (NBD) of the element $u_i \in U$ to $Z$, respectively, with $0 \le
\rho(u_i) \le h$, $0 \le \eta(u_i) \le h$, and $0 \le \rho(u_i) + \eta(u_i) \le h$. $s_{\pi(u_i)} = s_{h-\rho(u_i)-\eta(u_i)}$ is
called the hesitance degree of $u_i$ to $Z$, where $0 \le \pi(u_i) \le h$, $u_i \in U$.
Usually, the pair $\langle s_\rho, s_\eta\rangle$ is called a linguistic intuitionistic fuzzy number (LIFN),
where $0 \le \rho \le h$, $0 \le \eta \le h$, and $0 \le \rho + \eta \le h$.
Let $\Omega_{[0,h]}$ denote the collection of all LIFSs.
Definition 3 [16] Let $\beta_1 = (s_{\rho_1}, s_{\eta_1})$ and $\beta_2 = (s_{\rho_2}, s_{\eta_2})$ be any two LIFNs; then

(1) $\beta_1 \oplus \beta_2 = \left(s_{\rho_1+\rho_2-\frac{\rho_1\rho_2}{h}},\; s_{\frac{\eta_1\eta_2}{h}}\right)$;
(2) $\beta_1 \otimes \beta_2 = \left(s_{\frac{\rho_1\rho_2}{h}},\; s_{\eta_1+\eta_2-\frac{\eta_1\eta_2}{h}}\right)$;
(3) $k\beta = k(s_\rho, s_\eta) = \left(s_{h-h\left(1-\frac{\rho}{h}\right)^k},\; s_{h\left(\frac{\eta}{h}\right)^k}\right)$;
(4) $\beta^k = (s_\rho, s_\eta)^k = \left(s_{h\left(\frac{\rho}{h}\right)^k},\; s_{h-h\left(1-\frac{\eta}{h}\right)^k}\right)$;
where $k > 0$.
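To make these operations concrete, the following is a minimal Python sketch of Definition 3 (a reading aid added here, not part of the original formulation); the function names and the representation of an LIFN as a plain (rho, eta) pair are our own choices.

```python
# Sketch of the LIFN operations of Definition 3; h is the upper index of the LTS.
# An LIFN (s_rho, s_eta) is represented simply as a pair (rho, eta).

def lifn_add(b1, b2, h):
    """(1): beta1 (+) beta2."""
    (r1, e1), (r2, e2) = b1, b2
    return (r1 + r2 - r1 * r2 / h, e1 * e2 / h)

def lifn_mul(b1, b2, h):
    """(2): beta1 (x) beta2."""
    (r1, e1), (r2, e2) = b1, b2
    return (r1 * r2 / h, e1 + e2 - e1 * e2 / h)

def lifn_scale(k, b, h):
    """(3): k * beta, for k > 0."""
    rho, eta = b
    return (h - h * (1 - rho / h) ** k, h * (eta / h) ** k)

def lifn_power(b, k, h):
    """(4): beta ** k, for k > 0."""
    rho, eta = b
    return (h * (rho / h) ** k, h - h * (1 - eta / h) ** k)
```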
Definition 4 [16] For any LIFN β = (sρ , sη ), score value S(β) and accuracy func-
tion H (β) are represented as:

S(β) = ρ − η (2)

where S(β) ∈ [−h, h]

H (β) = ρ + η (3)

where H (β) ∈ [0, h].


Definition 5 [26, 27] Let $Z = \{\langle u_i, s_{\rho(u_i)}, s_{\eta(u_i)}\rangle \mid u_i \in U\} \in \Omega_{[0,h]}$ be any LIFS;
then an entropy measure (EM) $E(Z)$ must satisfy the following properties:

(P1) $E(Z) = 0$ if and only if $Z$ is a linguistic set.
(P2) $E(Z) = 1$ if and only if $s_{\rho(u_i)} = s_{\eta(u_i)}$ for every $u_i \in U$.
(P3) $E(Z) = E(Z^c)$.
(P4) For any $Z_1, Z_2 \in \Omega_{[0,h]}$, if $Z_1$ is less fuzzy than $Z_2$, then $E(Z_1) \le E(Z_2)$; i.e.,
$\rho_1(u_i) \le \rho_2(u_i)$ and $\eta_1(u_i) \ge \eta_2(u_i)$ for $\rho_2(u_i) \le \eta_2(u_i)$, or $\rho_1(u_i) \ge \rho_2(u_i)$
and $\eta_1(u_i) \le \eta_2(u_i)$ for $\rho_2(u_i) \ge \eta_2(u_i)$, for all $u_i \in U$.

In the following, we review some existing EMs for LIFSs. Let $Z =
\{\langle u_i, s_{\rho(u_i)}, s_{\eta(u_i)}\rangle \mid u_i \in U\} \in \Omega_{[0,h]}$ be any LIFS; then

(a) Kumar et al.'s EM [27]:

$$E_1(Z) = \frac{1}{3nh}\sum_{i=1}^{n}\left[4\sqrt{\rho(u_i)\,\eta(u_i)} + \pi(u_i) + 2\sqrt{(h-\rho(u_i))(h-\eta(u_i))}\right]. \tag{4}$$

(b) Li et al.'s EM [26]:

$$E_2(Z) = \frac{1}{n}\sum_{i=1}^{n}\frac{h - |\rho(u_i)-\eta(u_i)| + \pi(u_i)}{h + \pi(u_i)}. \tag{5}$$

3 Drawbacks of the Existing Entropy Measures

Definition 6 [16] Let $Z = \{\langle u_i, s_{\rho(u_i)}, s_{\eta(u_i)}\rangle \mid u_i \in U\}$ be any LIFS and $k > 0$;
then $Z^k$ is defined as

$$Z^k = \left\{\left\langle u_i,\; s_{h\left(\frac{\rho(u_i)}{h}\right)^k},\; s_{h-h\left(1-\frac{\eta(u_i)}{h}\right)^k}\right\rangle \mid u_i \in U\right\}. \tag{6}$$

Example 1 Let a LIFS $Z$ representing "good" on $U$ be

$$Z = \{\langle u_1, s_1, s_7\rangle, \langle u_2, s_4, s_1\rangle, \langle u_3, s_2, s_6\rangle, \langle u_4, s_5, s_2\rangle, \langle u_5, s_3, s_3\rangle\} \in \Omega_{[0,8]}. \tag{7}$$

By using Eq. (6), we obtain

$Z^{1/2} = \{\langle u_1, s_{2.8284}, s_{5.1716}\rangle, \langle u_2, s_{5.6569}, s_{0.5167}\rangle, \langle u_3, s_4, s_4\rangle, \langle u_4, s_{6.3246}, s_{1.0718}\rangle, \langle u_5, s_{4.8990}, s_{1.6754}\rangle\}$, which may be treated as "not good";

$Z = \{\langle u_1, s_1, s_7\rangle, \langle u_2, s_4, s_1\rangle, \langle u_3, s_2, s_6\rangle, \langle u_4, s_5, s_2\rangle, \langle u_5, s_3, s_3\rangle\}$, which may be treated as "good";

$Z^2 = \{\langle u_1, s_{0.1250}, s_{7.8750}\rangle, \langle u_2, s_2, s_{1.8750}\rangle, \langle u_3, s_{0.5}, s_{7.5}\rangle, \langle u_4, s_{3.1250}, s_{3.5}\rangle, \langle u_5, s_{1.1250}, s_{4.8750}\rangle\}$, which may be treated as "very good";

$Z^3 = \{\langle u_1, s_{0.0156}, s_{7.9844}\rangle, \langle u_2, s_1, s_{2.6406}\rangle, \langle u_3, s_{0.1250}, s_{7.8750}\rangle, \langle u_4, s_{1.9531}, s_{4.6250}\rangle, \langle u_5, s_{0.4219}, s_{6.0469}\rangle\}$, which may be treated as "quite good";

$Z^4 = \{\langle u_1, s_{0.0020}, s_{7.9980}\rangle, \langle u_2, s_{0.5}, s_{3.3105}\rangle, \langle u_3, s_{0.0312}, s_{7.9688}\rangle, \langle u_4, s_{1.2207}, s_{5.4688}\rangle, \langle u_5, s_{0.1582}, s_{6.7793}\rangle\}$, which may be treated as "very very good".
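The dilation/concentration values above can be checked with a short Python sketch of Eq. (6) (an illustrative aid, not part of the original paper, assuming the (rho, eta) pair representation for each element):

```python
# Reproduces the Z**k values of Example 1 with h = 8.
h = 8
Z = [(1, 7), (4, 1), (2, 6), (5, 2), (3, 3)]  # (rho, eta) pairs of Z = "good"

def lifs_power(Z, k, h):
    """Element-wise Z**k as in Eq. (6)."""
    return [(h * (r / h) ** k, h - h * (1 - e / h) ** k) for r, e in Z]

for k in (0.5, 2, 3, 4):
    print(k, [(round(r, 4), round(e, 4)) for r, e in lifs_power(Z, k, h)])
# k = 0.5 yields (2.8284, 5.1716), (5.6569, 0.5167), (4, 4), ... as listed above.
```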

Now, by utilizing Eq. (4), we calculate the existing EM $E_1$ for the LIFSs $Z^{1/2}$,
$Z$, $Z^2$, $Z^3$, and $Z^4$ and get $E_1(Z^{1/2}) = 0.8630$, $E_1(Z) = 0.8698$, $E_1(Z^2) = 0.7181$,
$E_1(Z^3) = 0.5773$, and $E_1(Z^4) = 0.4689$.
For the LIFSs $Z^{1/2}$, $Z$, $Z^2$, $Z^3$, and $Z^4$, an effective EM must satisfy the following
relation [13, 14, 27]:

$$E(Z^{1/2}) > E(Z) > E(Z^2) > E(Z^3) > E(Z^4). \tag{8}$$

Based on the computed results of the existing EM [27] given in Eq. (4), we obtain
$E_1(Z) > E_1(Z^{1/2}) > E_1(Z^2) > E_1(Z^3) > E_1(Z^4)$. Thus, the existing EM $E_1$ given
in Eq. (4) does not satisfy the relation given in Eq. (8) for this example. Hence, we
require a new EM for LIFSs that overcomes the disadvantages of the existing EMs of
LIFSs.

Example 2 Let $Z_1 = \langle s_{0.6}, s_{0.5}\rangle$, $Z_2 = \langle s_{2.8}, s_3\rangle$, $Z_3 = \langle s_{2.9}, s_{3.1}\rangle$, $Z_4 = \langle s_{3.79}, s_{2.31}\rangle$,
and $Z_5 = \langle s_{2.729}, s_{4.1}\rangle$ be any five LIFNs, with $Z_t \in \Omega_{[0,8]}$ for all $t = 1, 2, 3, 4, 5$. Now, we
calculate the existing EMs $E_1$ and $E_2$ given in Eqs. (4) and (5), respectively, and
get $E_1(Z_1) = E_1(Z_2) = E_1(Z_3) = 0.9996$ and $E_2(Z_4) = E_2(Z_5) = 0.8505$. Thus,
from these results, it is clear that the existing entropy measures $E_1$ and $E_2$ given in
Eqs. (4) and (5), respectively, cannot distinguish these different LIFNs and are therefore
inconsistent. So, there is a need to enhance these measures.

4 Proposed Entropy Measure for LIFS

In this section, we propose a new entropy measure of the LIFSs.


Definition 7 Let $Z = \{\langle u_i, s_{\rho(u_i)}, s_{\eta(u_i)}\rangle \mid u_i \in U\} \in \Omega_{[0,h]}$ be any LIFS; then the
proposed entropy measure $E(Z)$ for the LIFS $Z$ is defined as:

$$E(Z) = \frac{1}{nh}\sum_{i=1}^{n}\left[h - \frac{1}{h}\,|\rho(u_i) - \eta(u_i)|\,(h - \pi(u_i))\right] \tag{9}$$
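As a reading aid (not part of the original paper), Eq. (9) can be implemented in a few lines of Python; the function name and the pair representation of each element are our own illustrative choices. Running it on the LIFS of Example 3 below reproduces the value 0.5052.

```python
# Sketch of the proposed entropy measure of Eq. (9).
def entropy(Z, h):
    """Z is a list of (rho, eta) pairs; pi = h - rho - eta."""
    total = 0.0
    for rho, eta in Z:
        pi = h - rho - eta
        total += h - abs(rho - eta) * (h - pi) / h
    return total / (len(Z) * h)

print(round(entropy([(1, 7), (4, 1), (2, 6)], h=8), 4))  # 0.5052 (Example 3)
```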

Theorem 1 The proposed entropy measure $E(Z)$ of a LIFS $Z = \{\langle u_i, s_{\rho(u_i)}, s_{\eta(u_i)}\rangle \mid
u_i \in U\} \in \Omega_{[0,h]}$ satisfies the properties given in Definition 5.

Proof Let $Z = \{\langle u_i, s_{\rho(u_i)}, s_{\eta(u_i)}\rangle \mid u_i \in U\} \in \Omega_{[0,h]}$ be a LIFS.

(P1) We have $E(Z) = 0$

$\Leftrightarrow \frac{1}{nh}\sum_{i=1}^{n}\left[h - \frac{1}{h}|\rho(u_i) - \eta(u_i)|(h - \pi(u_i))\right] = 0$
$\Leftrightarrow h - \frac{1}{h}|\rho(u_i) - \eta(u_i)|(h - \pi(u_i)) = 0$ for every $u_i \in U$
$\Leftrightarrow h^2 - |\rho(u_i) - \eta(u_i)|(h - \pi(u_i)) = 0$
$\Leftrightarrow \rho(u_i) = h, \eta(u_i) = 0$ or $\rho(u_i) = 0, \eta(u_i) = h$.

(P2) We have $E(Z) = 1$

$\Leftrightarrow \frac{1}{nh}\sum_{i=1}^{n}\left[h - \frac{1}{h}|\rho(u_i) - \eta(u_i)|(h - \pi(u_i))\right] = 1$
$\Leftrightarrow \sum_{i=1}^{n}\left[h - \frac{1}{h}|\rho(u_i) - \eta(u_i)|(h - \pi(u_i))\right] = nh$
$\Leftrightarrow \frac{1}{h}|\rho(u_i) - \eta(u_i)|(h - \pi(u_i)) = 0$ for every $u_i \in U$
$\Leftrightarrow |\rho(u_i) - \eta(u_i)|(h - \pi(u_i)) = 0$
$\Leftrightarrow \rho(u_i) = \eta(u_i)$ for every $u_i \in U$.

(P3) We have $Z^c = \{\langle u_i, s_{\eta(u_i)}, s_{\rho(u_i)}\rangle \mid u_i \in U\}$. Then

$$E(Z^c) = \frac{1}{nh}\sum_{i=1}^{n}\left[h - \frac{1}{h}|\eta(u_i) - \rho(u_i)|(h - \pi(u_i))\right]
= \frac{1}{nh}\sum_{i=1}^{n}\left[h - \frac{1}{h}|\rho(u_i) - \eta(u_i)|(h - \pi(u_i))\right] = E(Z).$$
(P4) Since $h - \pi(u_i) = \rho(u_i) + \eta(u_i)$, consider the function $f(x, y) = h - \frac{1}{h}|x - y|(x + y)$,
where $0 \le x, y \le h$ and $0 \le x + y \le h$. We must demonstrate that when $x \le y$, the function $f(x, y)$
increases with respect to $x$ and decreases with respect to $y$. For $x \le y$ we have
$f(x, y) = h - \frac{1}{h}(y^2 - x^2)$, so

$$\frac{\partial f(x, y)}{\partial x} = \frac{2x}{h} \ge 0, \qquad \frac{\partial f(x, y)}{\partial y} = -\frac{2y}{h} \le 0.$$

Thus, for $x \le y$, the function $f(x, y)$ increases with respect to $x$ and decreases with
respect to $y$. Hence, $f(\rho_1(u_i), \eta_1(u_i)) \le f(\rho_2(u_i), \eta_2(u_i))$ when $\rho_2(u_i) \le \eta_2(u_i)$
and $\rho_1(u_i) \le \rho_2(u_i)$, $\eta_1(u_i) \ge \eta_2(u_i)$.
Table 1 Values of the EMs $E_1(.)$, $E_2(.)$, and $E(.)$ for the LIFSs $Z^{1/2}$, $Z$, $Z^2$, $Z^3$, and $Z^4$

            $E_1$     $E_2$     $E$
$Z^{1/2}$   0.8630    0.6463    0.6546
$Z$         0.8698    0.6288    0.6375
$Z^2$       0.7181    0.5462    0.5517
$Z^3$       0.5773    0.4057    0.4197
$Z^4$       0.4689    0.3182    0.3358

Similarly, for $x \ge y$ we have $f(x, y) = h - \frac{1}{h}(x^2 - y^2)$, so $\frac{\partial f}{\partial x} = -\frac{2x}{h} \le 0$ and
$\frac{\partial f}{\partial y} = \frac{2y}{h} \ge 0$. Thus, for $x \ge y$, the function $f(x, y)$ decreases with respect to $x$
and increases with respect to $y$. Hence, $f(\rho_1(u_i), \eta_1(u_i)) \le f(\rho_2(u_i), \eta_2(u_i))$
when $\rho_2(u_i) \ge \eta_2(u_i)$ and $\rho_1(u_i) \ge \rho_2(u_i)$, $\eta_1(u_i) \le \eta_2(u_i)$.
Therefore, if $Z_1$ is less fuzzy than $Z_2$, then $\frac{1}{nh}\sum_{i=1}^{n} f(\rho_1(u_i), \eta_1(u_i)) \le
\frac{1}{nh}\sum_{i=1}^{n} f(\rho_2(u_i), \eta_2(u_i))$. Hence, $E(Z_1) \le E(Z_2)$.
 
Example 3 Let a LIFS $Z = \{\langle u_1, s_1, s_7\rangle, \langle u_2, s_4, s_1\rangle, \langle u_3, s_2, s_6\rangle\} \in \Omega_{[0,8]}$. By using
Eq. (9), we compute the proposed EM $E(Z)$ of the LIFS $Z$ as follows:

$$\begin{aligned}
E(Z) &= \frac{1}{3 \times 8}\left[\left(8 - \tfrac{1}{8}|1-7|(8-0)\right) + \left(8 - \tfrac{1}{8}|4-1|(8-3)\right) + \left(8 - \tfrac{1}{8}|2-6|(8-0)\right)\right]\\
&= \frac{1}{24}\left[\left(8 - \tfrac{1}{8}(6)(8)\right) + \left(8 - \tfrac{1}{8}(3)(5)\right) + \left(8 - \tfrac{1}{8}(4)(8)\right)\right]\\
&= \frac{1}{24}\left[2 + \frac{49}{8} + 4\right]\\
&= 0.5052.
\end{aligned}$$

Example 4 Consider the same LIFSs from Example 1 to calculate the proposed EM
$E(.)$ for the LIFSs $Z^{1/2}$, $Z$, $Z^2$, $Z^3$, and $Z^4$. By utilizing Eq. (9), we obtain
$E(Z^{1/2}) = 0.6546$, $E(Z) = 0.6375$, $E(Z^2) = 0.5517$, $E(Z^3) = 0.4197$, and
$E(Z^4) = 0.3358$. Hence, the proposed EM satisfies the relation $E(Z^{1/2}) > E(Z) >
E(Z^2) > E(Z^3) > E(Z^4)$ given in Eq. (8), and is therefore a valid EM for LIFSs.
We now make a comparative study for Example 4. Table 1 lists the values of the EMs
$E_1(.)$, $E_2(.)$, and $E(.)$ for the LIFSs $Z^{1/2}$, $Z$, $Z^2$, $Z^3$, and $Z^4$ given in Example 1.
From Table 1, it is visible that the EMs $E_2(.)$ and $E(.)$ behave according to the
relation given in Eq. (8), while the EM $E_1(.)$ does not.

Example 5 Consider the same LIFNs $Z_1 = \langle s_{0.6}, s_{0.5}\rangle$, $Z_2 = \langle s_{2.8}, s_3\rangle$, $Z_3 = \langle s_{2.9},
s_{3.1}\rangle$, $Z_4 = \langle s_{3.79}, s_{2.31}\rangle$, and $Z_5 = \langle s_{2.729}, s_{4.1}\rangle$ as in Example 2 to calculate the
proposed EM $E(.)$. By using Eq. (9), we calculate the proposed EM $E(.)$ for the LIFNs
$Z_1$, $Z_2$, $Z_3$, $Z_4$, and $Z_5$ as follows:

$E(Z_1) = \frac{1}{8}\left(8 - \frac{1}{8}|0.6 - 0.5|(8 - 6.9)\right) = 0.9983$,
$E(Z_2) = \frac{1}{8}\left(8 - \frac{1}{8}|2.8 - 3.0|(8 - 2.2)\right) = 0.9819$,
$E(Z_3) = \frac{1}{8}\left(8 - \frac{1}{8}|2.9 - 3.1|(8 - 2.0)\right) = 0.9812$,
$E(Z_4) = \frac{1}{8}\left(8 - \frac{1}{8}|3.79 - 2.31|(8 - 1.9)\right) = 0.8589$,
$E(Z_5) = \frac{1}{8}\left(8 - \frac{1}{8}|2.729 - 4.1|(8 - 1.171)\right) = 0.8537$.

Table 2 Values of the EMs $E_1(.)$, $E_2(.)$, and $E(.)$ for the LIFNs $Z_1$, $Z_2$, $Z_3$, $Z_4$, and $Z_5$

        $E_1(.)$   $E_2(.)$   $E(.)$
$Z_1$   0.9996     0.9933     0.9983
$Z_2$   0.9996     0.9804     0.9819
$Z_3$   0.9996     0.9800     0.9812
$Z_4$   0.9802     0.8505     0.8589
$Z_5$   0.9841     0.8505     0.8537

We make a comparative study for Example 5. Table 2 lists the values of the EMs
$E_1(.)$, $E_2(.)$, and $E(.)$ for the LIFNs $Z_1$, $Z_2$, $Z_3$, $Z_4$, and $Z_5$ given in Example 2.
From Table 2, it is visible that $E_1(Z_1) = E_1(Z_2) = E_1(Z_3) = 0.9996$ and
$E_2(Z_4) = E_2(Z_5) = 0.8505$, while $Z_1$, $Z_2$, $Z_3$, $Z_4$, and $Z_5$ are all different. Hence,
the proposed EM $E(.)$ can address the shortcomings of the existing EMs $E_1$ and $E_2$ of
the LIFSs given in Eqs. (4) and (5), respectively.

Examples 4 and 5 show that the proposed EM of LIFSs can address the flaws
of the existing EMs of LIFSs. The proposed EM is a useful tool for depicting the
uncertainty of LIFSs.
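The comparative studies of Tables 1 and 2 can be reproduced with the following Python sketch of Eqs. (4), (5), and (9) (an illustrative aid; the function names are our own, and the square roots in E1 follow the reconstruction of Eq. (4) above):

```python
import math

def E1(Z, h):  # Kumar et al.'s EM, Eq. (4)
    s = sum(4 * math.sqrt(r * e) + (h - r - e)
            + 2 * math.sqrt((h - r) * (h - e)) for r, e in Z)
    return s / (3 * len(Z) * h)

def E2(Z, h):  # Li et al.'s EM, Eq. (5)
    s = sum((h - abs(r - e) + (h - r - e)) / (h + (h - r - e)) for r, e in Z)
    return s / len(Z)

def E(Z, h):   # proposed EM, Eq. (9); note h - pi = r + e
    s = sum(h - abs(r - e) * (r + e) / h for r, e in Z)
    return s / (len(Z) * h)

# LIFNs of Example 2 / Table 2, with h = 8
for lifn in [(0.6, 0.5), (2.8, 3.0), (2.9, 3.1), (3.79, 2.31), (2.729, 4.1)]:
    print([round(f([lifn], 8), 4) for f in (E1, E2, E)])
```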

5 Conclusion

Linguistic intuitionistic fuzzy set (LIFS) is a powerful extension of the fuzzy set for
expressing and dealing with the fuzziness of qualitative information. This paper proposed
a new entropy measure (EM) for LIFSs, which takes into account not only the
belongingness and non-belongingness degrees but also the grade of uncertainty. The
proposed EM is used to measure the uncertainty of LIFSs. Certain properties of the
proposed EM have also been proved to validate it. The proposed EM can overcome
the disadvantages of the existing EMs of LIFSs and is very useful for decision-makers
to measure the uncertainty of any LIFS. In the future, we will develop decision-making
methods for the LIFS environment based on the proposed EM. By using the proposed
EM, we can also determine weights for the attributes in decision-making problems.

References

1. Zadeh LA (1965) Fuzzy sets. Inform Control 8(3):338–353
2. Atanassov KT (1986) Intuitionistic fuzzy sets. Fuzzy Sets Syst 20(1):87–96
3. Chen SM, Cheng SH, Lan TC (2016) Multicriteria decision making based on the TOPSIS
method and similarity measures between intuitionistic fuzzy values. Inform Sci 367:279–295
4. Feng F, Zheng Y, Alcantud JCR, Wang Q (2020) Minkowski weighted score functions of
intuitionistic fuzzy values. Mathematics 8(7):1143. https://doi.org/10.3390/math8071143
5. Dhankhar C, Kumar K (2022) Multi-attribute decision-making based on the advanced possi-
bility degree measure of intuitionistic fuzzy numbers. In: Granular computing, pp 1–12
6. Dhankhar C, Yadav AK, Kumar K (2022) A ranking method for q-rung orthopair fuzzy set
based on possibility degree measure. In: Soft computing: theories and applications, volume
425 of Lecture notes in networks and systems. Springer, pp 15–24. https://doi.org/10.1007/
978-981-19-0707-4_2
7. Kumar K, Chen SM (2022) Group decision making based on advanced intuitionistic fuzzy
weighted Heronian mean aggregation operator of intuitionistic fuzzy values. Inform Sci
601:306–322
8. Garg H (2016) Some series of intuitionistic fuzzy interactive averaging aggregation operators.
SpringerPlus 5(1):1–27
9. Kumar K, Chen SM (2021) Multiattribute decision making based on the improved intuitionistic
fuzzy Einstein weighted averaging operator of intuitionistic fuzzy values. Inform Sci 568:369–
383
10. Szmidt E, Kacprzyk J (2001) Entropy for intuitionistic fuzzy sets. Fuzzy Sets Syst 118(3):467–
477
11. Zhang QS, Jiang SY (2008) A note on information entropy measures for vague sets and its
applications. Inform Sci 178(21):4184–4191
12. Wei CP, Gao ZH, Guo TT (2012) An intuitionistic fuzzy entropy measure based on trigono-
metric function. Control Decis 27(4):571–574
13. Liu M, Ren H (2014) A new intuitionistic fuzzy entropy and application in multi-attribute
decision making. Information 5(4):587–601
14. Garg H, Kaur J (2018) A novel (r, s)-norm entropy measure of intuitionistic fuzzy sets and its
applications in multi-attribute decision-making. Mathematics 6(6):92
15. Zadeh LA (1975) The concept of a linguistic variable and its application to approximate
reasoning–I. Inform Sci 8(3):199–249
16. Chen Z, Liu P, Pei Z (2015) An approach to multiple attribute group decision making based on
linguistic intuitionistic fuzzy numbers. Int J Comput Intell Syst 8(4):747–760
17. Kumar K, Chen SM (2022) Multiple attribute group decision making based on advanced lin-
guistic intuitionistic fuzzy weighted averaging aggregation operator of linguistic intuitionistic
fuzzy numbers. Inform Sci 587:813–824. https://doi.org/10.1016/j.ins.2021.11.014
18. Liu P, Wang P (2017) Some improved linguistic intuitionistic fuzzy aggregation operators and
their applications to multiple-attribute decision making. Int J Inform Technol Decis Making
16(03):817–850
19. Peng H, Wang J, Cheng P (2018) A linguistic intuitionistic multi-criteria decision-making
method based on the Frank Heronian mean operator and its application in evaluating coal mine
safety. Int J Mach Learn Cybern 9:1053–1068. https://doi.org/10.1007/s13042-016-0630-z
20. Garg H, Kumar K (2018) Some aggregation operators for linguistic intuitionistic fuzzy set and
its application to group decision-making process using the set pair analysis. Arab J Sci Eng
43(6):3213–3227
21. Garg H, Kumar K (2018) Group decision making approach based on possibility degree measures
and the linguistic intuitionistic fuzzy aggregation operators using Einstein norm operations. J
Multiple-Valued Logic Soft Comput 31:175–209
22. Kumar K, Chen SM (2022) Group decision making based on weighted distance measure of
linguistic intuitionistic fuzzy sets and the TOPSIS method. Inform Sci 611:660–676

23. Meng F, Dong B (2021) Linguistic intuitionistic fuzzy PROMETHEE method based on simi-
larity measure for the selection of sustainable building materials. J Ambient Intell Humanized
Comput 1–21
24. Tang J, Meng F (2019) Linguistic intuitionistic fuzzy Hamacher aggregation operators and
their application to group decision making. Granular Comput 4(1):109–124
25. Liu J, Mai J, Li H, Huang B, Liu Y (2022) On three perspectives for deriving three-way decision
with linguistic intuitionistic fuzzy information. Inform Sci 588:350–380
26. Li Z, Liu P, Qin X (2017) An extended VIKOR method for decision making problem with
linguistic intuitionistic fuzzy numbers based on some new operational laws and entropy. J
Intell Fuzzy Syst 33(3):1919–1931
27. Kumar K, Mani N, Sharma A, Bhardwaj R (2021) A novel entropy measure for linguis-
tic intuitionistic fuzzy sets and their application in decision-making. In: Multi-criteria deci-
sion modelling: applicational techniques and case studies, p 121. https://doi.org/10.1201/
9781003125150
28. Herrera F, Martínez L (2001) A model based on linguistic 2-tuples for dealing with multigran-
ular hierarchical linguistic contexts in multi-expert decision-making. IEEE Trans Syst Man
Cybern Part B Cybern 31(2):227–234
29. Xu Z (2004) A method based on linguistic aggregation operators for group decision making
with linguistic preference relations. Inform Sci 166(1):19–30
Chapter 2
IoT-Based Smart City Architecture
and Its Applications

Sree Charan Mamidi, Shadab Siddiqui, and Sheikh Fahad Ahmad

1 Introduction

A technologically advanced urban setting known as a "smart city" uses various elec-
trical devices and sensors to gather data. The information is then used to improve city
operations.
knowledge gathered from these data. Data are gathered from people, devices, build-
ings, and assets to monitor and control traffic and transportation systems, power
plants, utilities, water supply networks, garbage, criminal detection, information
management, schools, libraries, hospitals, and other community services. Smart
cities have superior monitoring, planning, and governance mechanisms in addition
to creative technology utilization [1]. The success of a smart city depends on its
capacity to forge a solid alliance between the public and private sectors, especially
in terms of bureaucracy and regulations.

2 Literature Review

In-depth discussion and assessment of the role of enabling technologies in smart
cities are provided in this study [3]. The obstacles and restrictions facing the creation
of smart cities are also highlighted in the report, along with potential solutions.
Three categories of challenges—technical, socioeconomic, and environmental—are
specifically mentioned, with details on each. A newly defined smart city paradigm
is suggested in the form of smart tourism for the Mauritius city of Port Louis [2].
This study examines smart tourism model examples and considers how they may
be incorporated into Allam and Newman's smart city framework. The purpose of this
study's conclusions is to provide policymakers with information on alternative and
more pertinent economic opportunities for Port Louis through smart tourism.

S. C. Mamidi (B) · S. Siddiqui · S. F. Ahmad
Koneru Lakshmaiah Education Foundation, Hyderabad, India
e-mail: mamidisreecharan20@gmail.com
A concept is suggested [5] that handles a smart city’s island functioning, which
transforms it into a smart island. This work uses cloud theory in addition to smart
island modeling to quantify the uncertainties in STS and MG. Finally, the suggested
model is simulated to check for accuracy and efficacy. A methodology based on a
conceptual IoT implementation process is proposed [10], as a specific IoT applica-
tions, in a customized input–process–output model. The primary factors in the model
are the original conceptualization and definition of an IoT concept (input), which is
evaluated (process) before being deployed and potentially having an effect in practice
(output).

3 The Internet-of-Things (IoT)

The "Internet-of-Things" (IoT) is a network of physical objects, including furni-
ture, vehicles, structures, and other objects, which are connected to the internet
and equipped with sensors, electronics, software, and network connectivity [4, 8].
Through the network, IoT devices collect and exchange data from the real world.
Figure 1 depicts the primary IoT application for smart cities.
The most common use cases for IoT are:
• Smart cities—providing residents with more efficient traffic management systems
as well as more efficient lighting infrastructure [6, 7].
• Industrial automation—reducing costs by automating production processes
through sensors embedded in machinery.
• Health care—monitoring patient care via automated patient monitoring systems
(APMS).

Fig. 1 Primary IoT application for smart cities

3.1 Components Used

Arduino Uno3, Servo motors, IR Sensor, TCRT5000, LED, LCD, LDR, PIR Sensor,
Relay, Buzzer, 4H0.3 AH Battery, MQ5 Gas Sensor, Smoke Sensor.

4 Proposed Work

The proposed work consists of the following modules:

4.1 Smart Home Automation

This module demonstrates the use of automated lighting in which human involvement
is minimal: everything happens automatically, and two physical parameters, human
motion and light intensity, are monitored, as shown in Figs. 2 and 3. When a person
enters the room, the sensor detects the motion and the light turns on automatically;
when the person exits, the light turns off automatically. A minimal sketch of this
control logic is given below.
Equipment Used
LDR, PIR Sensor, Relay, Arduino Uno3.

Fig. 2 Flow diagram of home automation

Fig. 3 Circuit diagram for home automation
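As a reading aid, the control logic described above can be sketched in a few lines of Python (this is an illustrative simulation, not the authors' firmware; the threshold value is an assumption):

```python
LIGHT_THRESHOLD = 300  # assumed LDR reading below which the room counts as dark

def relay_state(pir_motion: bool, ldr_reading: int) -> bool:
    """Energize the relay (light on) only when motion is detected in a dark room."""
    return pir_motion and ldr_reading < LIGHT_THRESHOLD

print(relay_state(True, 120))   # True: person present, room dark -> light on
print(relay_state(False, 120))  # False: no motion -> light stays off
print(relay_state(True, 650))   # False: room already bright -> light stays off
```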

4.2 Smart Parking

This module demonstrates the application of automatic parking, where human
interaction is minimal and the task is performed automatically with the assistance
of sensors and other devices. The primary goal of this module is to shorten the
time required to search for parking places, hence lowering fuel usage. When a
vehicle arrives for parking, the driver sees on the display board whether the parking
is full or has free slots: the sensors report slot occupancy, and the result is shown
on the board. In this smart way, time is saved, as depicted in Fig. 4. A sketch of
the display logic is given after the equipment list.
Equipment Used
Arduino Uno3, Servo motors, IR Sensor, TCRT5000, LED, LCD.
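The slot-availability display described above can be sketched as follows (an illustrative simulation; the IR sensors are modeled as booleans, and the LCD message format is our own assumption):

```python
def lcd_message(slots: list) -> str:
    """slots[i] is True when IR sensor i reports the slot as occupied."""
    free = [i + 1 for i, occupied in enumerate(slots) if not occupied]
    if not free:
        return "PARKING FULL"
    return "FREE SLOTS: " + ",".join(map(str, free))

print(lcd_message([True, False, True, False]))  # FREE SLOTS: 2,4
print(lcd_message([True, True, True, True]))    # PARKING FULL
```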

Fig. 4 Circuit diagram for automation

Fig. 5 Flow diagram of smoke detector

4.3 Smart Water Monitoring System

In this module, we evaluate and monitor water quality factors such as pH, soil
moisture, and temperature. The sensors report the water level and quality readings
to the monitoring section. This technology conserves water by taking active
measurements in real time.

4.4 Smoke Detector Alarm

This module is critical in smart cities since it protects homes and communities, as
shown in Fig. 5. The smoke detector in this module can detect the presence of smoke,
and when smoke is detected, a buzzer immediately rings. Domestic smoke detectors
range from individual battery-powered devices to numerous interconnected units
with battery backups. A sketch of the alarm logic is given after the equipment list.
Equipment Used
Buzzer, 4H0.3 AH Battery, MQ5 Gas Sensor, Smoke Sensor.
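The alarm behavior described above can be sketched as follows (an illustrative simulation of the logic, not the authors' firmware; the sensor thresholds and the hysteresis band are assumptions):

```python
ALARM_ON = 400   # assumed smoke/gas reading above which the buzzer trips
ALARM_OFF = 350  # lower release threshold, to avoid buzzer chatter

def buzzer_state(reading: int, currently_on: bool) -> bool:
    """Trip above ALARM_ON; once on, stay on until the reading drops below ALARM_OFF."""
    if currently_on:
        return reading > ALARM_OFF
    return reading > ALARM_ON

state = False
for r in (100, 380, 420, 390, 340):
    state = buzzer_state(r, state)
    print(r, state)  # trips at 420, holds at 390, releases at 340
```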

4.5 Smart Water Harvesting

In this module of the project, we construct a harvesting system consisting of a
collection device and a delivery system. Sometimes rainfall is excessive and drains
can overflow. This module uses an Uno microcontroller, with an ultrasonic sensor
and a water sensor connected to it. When rain falls on the water sensor, a door linked
to a large pit is opened, and the ultrasonic and water level sensors measure the level
of water stored in the pit and report this information, along with all other data
generated by the sensors.

4.6 Proposed Model for IoT-Based Smart City Platform

Designing a fundamental architecture from the outset will act as a platform for
later improvements and enable the addition of new services without compromising
functional performance, which is essential for smart city deployment to scale. A
fundamental IoT solution for smart cities consists of four elements as shown in
Fig. 6.
• The network of smart objects
A smart city uses smart objects with sensors and actuators, much like any IoT
system. Data collection and transmission to a centralized cloud management plat-
form are the immediate goals of sensors. Devices can act thanks to actuators; for
example, they can change the lights or stop water from flowing into a leaky pipe.
• Gateways
Any IoT system consists of two components: a cloud component and a "phys-
ical" component made up of IoT devices and network nodes. Data cannot simply
"flow" from one component to the other; field gateways and cloud gateways are
necessary. By cleaning and filtering data before sending it to the cloud, field gateways
make data collection and compression easier. The cloud gateway enables safe data
transmission between the field gateways and the cloud component of a smart city
solution.
• Data lake
A data lake’s principal function is to store data. Data lakes maintain data in its
unprocessed form. The large data warehouse receives the extracted data when it
is required for insightful analyses.
• Big data storage

Fig. 6 Proposed model for IoT-based smart city

A big data warehouse is a single data repository. In contrast to a data lake, it
contains only structured data. Data are extracted, transformed, and loaded into
the big data warehouse once their value has been determined. Additionally, it stores
the commands that control apps send to the actuators of connected devices, as
well as contextual information about connected things, such as the dates when
sensors were installed.
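As a toy illustration of how a field gateway in this architecture might clean, filter, and compress readings before forwarding them toward the cloud gateway and the data lake, consider the following Python sketch (all names, fields, and the batching scheme are our own assumptions):

```python
import json
import statistics

def clean(readings):
    """Drop malformed readings (non-numeric or missing values)."""
    return [r for r in readings if isinstance(r.get("value"), (int, float))]

def compress(readings):
    """Summarize a burst of raw samples into one compact record per sensor."""
    by_sensor = {}
    for r in readings:
        by_sensor.setdefault(r["sensor"], []).append(r["value"])
    return [{"sensor": s, "mean": statistics.mean(v), "n": len(v)}
            for s, v in by_sensor.items()]

raw = [{"sensor": "light-42", "value": 311},
       {"sensor": "light-42", "value": 309},
       {"sensor": "light-42", "value": None}]  # malformed reading: dropped
print(json.dumps(compress(clean(raw))))  # compact payload for the cloud gateway
```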

5 A Combination of Innovative Technologies Can Transform Our Cities

Smart cities are the solution to many of the problems we face [12]. Smart cities can
be used to improve public safety, health care, and energy use—to name just a few
areas where smart technology is already being used as shown in Fig. 7. The future
of urban living looks bright with so many innovative technologies coming online in
this field.

5.1 City-Wide Information Systems for Sustainable Cities

A city-wide information system (CIS) is a network of technology and data that can
be used to improve efficiency and reduce carbon footprint [11]. These systems can
help cities to improve their sustainability, resilience, and prosperity by providing the
following:
• Information about emissions from various sources within the city.

Fig. 7 Fundamental objects of smart cities

• Information about available resources for energy generation or consumption in
different sectors of the economy.
• Data on weather patterns that affect climate change impacts at a local level.

5.2 Public Safety and Security

Smart cities will improve public safety and security.


• Smart city technologies are already being used to prevent crime. For example,
cameras and sensors at traffic intersections can detect when a vehicle is about to
hit a pedestrian; these devices record the license plate number of every vehicle
that passes through, which police can then cross-reference against a database of
wanted criminals [15].
• Smart cities will be more efficient. The same technology that helps us avoid
accidents also allows us to monitor our energy consumption: for example, lights
or air conditioning left running in an empty home can be detected and flagged,
which may also indicate an electrical fault [16]. This helps residents save money
on power bills while still enjoying all the other benefits discussed earlier.

5.3 Smart Buildings and Infrastructure

Smart buildings and infrastructure can help to improve efficiency and reduce costs,
as well as provide several other benefits. For example:
• Smart buildings can help to save energy by minimizing the amount of heat
produced during the day. This not only makes people more comfortable in the
summer, when cooling costs are high, but also decreases greenhouse gas
emissions from power plants or companies that generate heat.
• Smart buildings can help to minimize carbon dioxide emissions by improving
insulation and ventilation, and by using air-conditioning systems that consume
less electricity (and therefore produce less pollution).
• Smart buildings can also help with security by monitoring security cameras in
real time, so occupants know if property has been taken without permission or
if there is an intruder inside the building at night. Detecting a break-in as it
happens avoids the cost and disruption of discovering the damage only after
the fact.

5.4 Energy and Environment Management Systems

Smart cities are a concept that has been around for some time now. The idea is to
create a more sustainable environment through technology and innovation, which
will help to make cities more liveable for everyone [5].
Smart meters, smart grids, and smart buildings are all parts of a larger utility
management system. These technologies allow utilities to monitor their assets more
closely than ever before and make sure that they are being used as efficiently as
possible.
This can help to save money on electricity or natural gas by reducing wasted
energy and increasing production when necessary. It also helps to avoid outages
by detecting when something goes wrong with the systems (like a water pipe
breaking), so that occasional faults no longer become prolonged problems.
Smart appliances and smart city management are two more aspects of what makes
a city smart. They work together to make life easier for everyone involved, from
residents to businesses, as well as government agencies.
Smart appliances can save money on utility bills by not wasting energy or water
when they are not in use. They also monitor themselves and send a notification
if something is wrong (like a leaky pipe), so it can be fixed before it gets worse.
The smart city management aspect of what makes a city smart is how these
technologies work together with other systems such as public transportation and
emergency response teams.

5.5 Health Care and Telemedicine

Telemedicine is the delivery of medical services, diagnosis, and treatment through


telecommunications technology. It can be used to improve the quality of health care
in remote areas and reduce costs for patients who would otherwise have to travel
long distances for care.
Telemedicine is a form of telecommunication that provides access to information
and support from experts via a networked computer system or other devices such as
a smartphone or tablet computer. “This allows practitioners at any location across
an international border or within the same country (including those without internet
access) with little time investment required on their part; instead, they simply need
access through their existing equipment such as landlines or mobile phones.”

6 Smart City Initiatives and Concepts Based on ICT

The first strategy presents SC as a city that makes innovative and clever use of existing
ICTs to accomplish its objectives. This definition states that the ICT infrastructures
of the “Smart City” are what enable a smarter, more connected, and more sustainable
metropolitan system.
The "Internet-of-Things" (IoT) paradigm, which offers a system in which a large
number of devices can communicate with each other without human involvement,
supports the need for this ICT deployment [9, 10]. In this scenario, networked objects
dispersed throughout the metropolitan area push and assist the SC. By utilizing
technologies such as machine-to-machine (M2M) communication, radio-frequency
identification (RFID), and wireless sensor networks (WSN), the Internet-of-Things
is anticipated to contribute significantly to more precise and efficient resource
consumption. Moreover, by enabling access to a vast amount of data ("Big Data")
that can be assessed for potential future use with data mining techniques, the IoT is
expected to make resource usage even more precise and efficient.
The concept of a smart city in which citizens, goods, services, and so forth
are seamlessly integrated with omnipresent technology is becoming a reality,
dramatically improving the experience in twenty-first-century urban regions [13, 14].
The domains of transportation, services, and power efficiency in cities have all
been the subject of proposals created using this methodology, as have proposals
connected to big data and data mining. Many of them have also been financed,
developed, or promoted by significant ICT firms, such as Endesa-Enel and IBM
in Malaga, Spain, and IBM in Songdo City.

6.1 Citizens-Centered Smart City Initiatives

One school of thought holds that a truly smart city can only be realized through the development of intelligent residents, who are the ones to confer the “smart” quality on cities, in response to the difficulties posed by the technologically dominant SC model. These initiatives have opted for citizen-centric and participatory strategies for the co-design and creation of smart cities rather than viewing people as just another enabling component of the SC. The concept of a human smart city is emerging as a completely new and distinct sort of SC [12, 17].
Despite this, most initiatives to foster the growth of intelligent citizens have restricted public involvement to roles such as data source or tester of a pre-designed concept or service, with only a few outliers incorporating people throughout the process [11]. The notable exception has been the development of Living Labs in the field of smart cities, where the environment has allowed for the emergence of initiatives in which users have played a significant part at every stage.

Table 1 Comparison between ICT-based and citizen-based SCs

Factor                    | ICT-based SC                                                  | Citizen-based SC
Leadership                | Companies in the ICT/energy/utility sector; city policymakers | Neighborhood organizations; small groupings [11]
Assignee                  | Organizations, governments, and residents                     | Citizens and participating collectives
Base for innovation       | Based on technology                                           | Open or collaborative innovation
Priorities and objectives | Development of cities; infrastructure enhancements            | The common good in social welfare; citizen participation
Capital                   | Public assets; private capital investment                     | Crowdfunding by individuals

Table 2 Benefits and drawbacks of ICT-based and citizen-based SC

          | ICT-based SC                                                                   | Citizen-based SC
Benefits  | Safe funding for projects; massive media influence; resources for data mining | Ensured client participation; initiatives with specific goals; concentration on the common good
Drawbacks | Inadequate citizen involvement; ambiguous objectives; private advantages      | Insufficient funding; inadequate communication abilities; new tools and procedures required

Table 1 depicts the comparison between ICT-based and citizen-based SCs on several factors. Table 2 highlights the benefits and drawbacks of ICT-based and citizen-based SC in real-life scenarios.

6.2 To Realize Smart Cities, It Is Necessary to Create an Artificial Intelligence-Based Decision Support System

Smart cities are based on the use of artificial intelligence (AI) to make better decisions and better use of resources. AI is a powerful technology that can help make cities smarter and more efficient, saving money by making better use of existing infrastructure and services.
The potential benefits of smart cities include [18]:
• Better decision-making through predictive analytics.
• Reduced energy consumption due to smart meters.
• Improved efficiency through sensors measuring traffic flow.
• Optimization of transport networks with seamless integration between public
transport modes such as buses or trains.
• Better management tools for urban planning such as multi-modal planning
systems or city models that consider all factors affecting residents’ quality of
life including social connectivity, economic conditions, etc.

6.3 Smart Cities Are the Way of the Future

Smart cities are the future. They already exist and are being built all over the world, and they are fast becoming an integral part of our lives. Smart cities are a way of life [19]. When you think about smart city technology, what do you see? Many people would say “smart homes” or “smart cars,” but those are just two of the ways smart technology is helping us live better lives today.

7 Conclusion

We live in a world of exponential change, where technology is transforming our cities and our lives. We can shape a better world through smart cities, but we must do so with foresight and care. Cities must prepare for the future by investing in modern technologies to support their residents and businesses. They also need to collaborate across sectors as they develop solutions that address some of today's biggest challenges: climate change mitigation, public safety issues such as violent crime and natural disasters, and urban economic growth, attracting new residents through public infrastructure improvements such as transit systems that place employers within walking distance of residences and schools. This paper proposed a smart city assessment concept and provided a comparison of ICT-based and citizenship-based SCs along with their benefits and drawbacks.

References

1. Ahad MA, Paiva S, Tripathi G, Feroz N (2020) Enabling technologies and sustainable smart cities. Sustain Cities Soc 61:102301. https://doi.org/10.1016/j.scs.2020.102301
2. Dabeedooal YJ, Dindoyal V, Allam Z, Jones DS (2019) Smart tourism as a pillar for sustainable urban development: an alternate smart city strategy from Mauritius. Smart Cities 2:153–162. https://doi.org/10.3390/smartcities2020011
3. Darmawan AK, Siahaan D, Susanto TD et al (2019) Identifying success factors in smart city readiness using a structure equation modelling approach. In: 2019 international conference on computer science, information technology, and electrical engineering (ICOMITEE). https://doi.org/10.1109/icomitee.2019.8921312
4. Einola S, Kohtamäki M, Hietikko H (2019) Open strategy in a Smart City. Technol Innov Manag Rev 9:35–43. https://doi.org/10.22215/timreview/1267
5. Esapour K, Moazzen F, Karimi M et al (2022) A novel energy management framework incorporating multi-carrier energy hub for Smart City. IET Gener Transm Distrib. https://doi.org/10.1049/gtd2.12500
6. Gokozan H, Tastan M, Sari A (2017) Smart cities and management strategies. Chapter 8 in: Socio-Economic Strategies (2017). ISBN: 978-3-330-06982-4
7. Heidari A, Navimipour NJ, Unal M (2022) Applications of ML/DL in the management of Smart Cities and societies based on new trends in information technologies: a systematic literature review. Sustain Cities Soc 85:104089. https://doi.org/10.1016/j.scs.2022.104089
8. Internet of Things. http://www.ti.com/technologies/internet-of-things/overview.html. Accessed 01 Apr 2019
9. Khanna A, Kaur S (2019) Evolution of internet of things (IoT) and its significant impact in the field of precision agriculture. Comput Electron Agric 157:218–231. https://doi.org/10.1016/j.compag.2018.12.039
10. Korte A, Tiberius V, Brem A (2021) Internet of things (IoT) technology research in business and management literature: results from a co-citation analysis. J Theor Appl Electron Commer Res 16:2073–2090. https://doi.org/10.3390/jtaer16060116
11. Kummitha RK, Crutzen N (2019) Smart cities and the citizen-driven internet of things: a qualitative inquiry into an emerging Smart City. Technol Forecast Soc Chang 140:44–53. https://doi.org/10.1016/j.techfore.2018.12.001
12. Kuyper T (2016) Smart city strategy and upscaling: comparing Barcelona and Amsterdam. Master Thesis, MSc. IT & Strategic Management. https://doi.org/10.13140/RG.2.2.24999.14242
13. Lemphane NJ, Kotze B, Kuriakose RB (2022) A review on current IoT-based pasture management systems and applications of digital twins in farming. Adv Intell Syst Comput 173–180. https://doi.org/10.1007/978-981-16-4538-9_18
14. Mora-Sanchez OB, Lopez-Neri E, Cedillo-Elias EJ et al (2021) Validation of IoT infrastructure for the construction of Smart Cities solutions on living lab platform. IEEE Trans Eng Manage 68:899–908. https://doi.org/10.1109/tem.2020.3002250
15. Rotuna C, Gheorghita A, Zamfiroiu A, Smada D-M (2019) Smart city ecosystem using Blockchain technology. Informatica Economica 23:41–50. https://doi.org/10.12948/issn14531305/23.4.2019.04
16. Rout RR, Vemireddy S, Raul SK, Somayajulu DVLN (2020) Fuzzy logic-based emergency vehicle routing: an IoT system development for Smart City applications. Comput Electr Eng 88:106839. https://doi.org/10.1016/j.compeleceng.2020.106839
17. Saba D, Sahli Y, Berbaoui B, Maouedj R (2019) Towards smart cities: challenges, components, and architectures. In: Toward social Internet of Things (SIoT): enabling technologies, architectures and applications, pp 249–286. https://doi.org/10.1007/978-3-030-24513-9_15
18. Sharma M, Joshi S, Kannan D et al (2020) Internet of things (IoT) adoption barriers of Smart Cities' waste management: an Indian context. J Clean Prod 270:122047. https://doi.org/10.1016/j.jclepro.2020.122047
19. Toledo P, Rubino R, Musolino F, Crovetti P (2021) Re-thinking analog integrated circuits in digital terms: a new design concept for the IoT era. IEEE Trans Circuits Syst II Express Briefs 68:816–822. https://doi.org/10.1109/tcsii.2021.3049680
Chapter 3
Principal Component Analysis and Correlation Coefficient-Based Decision-Making Approach for Stock Portfolio Selection

Garima Bisht and A. K. Pal

1 Introduction

The financial market is one of the riskiest markets, yet it has always been of great interest to investors because of its capacity to raise capital; even so, investors struggle to choose the right stocks for a portfolio. Stocks should be assessed on the basis of multiple criteria. Investors always try to maximize return and minimize risk, but this is not always possible because increased return usually comes with increased risk and vice versa; therefore, stocks should be combined in such a way as to allow an acceptable compromise between risk and return. For this, investors require intricate knowledge of the financial market.
Since stock selection is a complex decision-making process with many contradictory objectives, it normally consists of two phases: (1) selection of suitable shares and (2) determining the weight of each share to be invested in. Stock selection is viewed as a multi-criteria decision-making (MCDM) problem, as it involves selecting stocks based on certain sets of criteria. MCDM is an established tool used both for determining criteria weights and for ranking alternatives. Over the past decades, many researchers have proposed numerous approaches for ranking alternatives as well as for determining criteria weights [1, 2]. The involvement of multi-criteria decision analysis (MCDA) in solving financial market problems was examined by [3]. In recent decades, too, much research work has been carried

G. Bisht (B) · A. K. Pal
Department of Mathematics, Statistics and Computer Science, G. B. Pant University of Agriculture and Technology, Pantnagar, Uttarakhand 263145, India
e-mail: garimabisht98@gmail.com
A. K. Pal
e-mail: ak.pal@gbpuat-cbsh.ac.in


out where financial decisions are made based on MCDM approaches [4, 5]. This work points to the substantial contribution of such analysis to the optimal selection of financial portfolios.
For investing in the stock exchange, [6] implemented a hybrid MCDM technique integrating the DEMATEL (Decision-Making Trial and Evaluation Laboratory) and VIKOR (VlseKriterijumska Optimizacija I Kompromisno Resenje) methods. Reference [7] introduced a novel hybrid MCDM approach based on the Spearman correlation coefficient to rank stocks. In the framework of the Tehran stock exchange (TSE), an effort was made by [8] using a DEA-TOPSIS (Data Envelopment Analysis-Technique for Order Preference by Similarity to an Ideal Solution) outline. Reference [9] developed a portfolio selection approach to rank the top stocks. Reference [10] introduced a hybrid DEA-COPRAS (Complex Proportional Assessment) approach for portfolio selection at the NSE-based risk-return interface. Another hybrid AHP-TOPSIS (Analytic Hierarchy Process) technique was developed by [11] for ranking the economic performance of particular Indian private banks. Reference [12] proposed some new mean-variance portfolio models. Almost all past research considers a hybrid approach for stock selection; however, a combined two-stage framework considering both the weights of decision criteria and the ranking of stocks is rare in the literature. Reference [13] proposed the mean-variance model that laid the foundation for modern portfolio theory. Past researchers have recognized the utility of including additional criteria beyond variance and return in the portfolio selection model [14, 15].
Assigning proper weights to criteria is one of the biggest challenges in the multi-criteria decision-making process [16]. In early studies, the easiest way to determine attribute weights was to assign equal weights [17]. But since the final ranking depends on the attribute weights, taking equal weights was never an appropriate option [18]. In later studies, numerous weight determination methods were developed, classified into subjective, objective, and hybrid methods. In subjective methods, the weights depend entirely on the decision maker's preferences, as in the SMART method [19]. In objective methods, the weights depend on the data in the decision matrix, as in the ENTROPY and CRITIC methods. Hybrid methods combine both [20, 21]. Almost all conventional weighting methods assume that the criteria are independent of each other, which is not always true in realistic problems.
A multivariate statistical procedure known as principal component analysis (PCA) is used to condense a large number of criteria into a smaller number of independent principal components, each a linear combination of the criteria. Thus, the use of PCA as a weight determination method can be more reliable than the previously defined weighting methods. It condenses data by recognizing the variables that account for a large share of the variance in a large dataset [22, 23]. PCA finds principal components as linear vectors that aim to explain the data's variability [24]. PCA is readily available in common statistical software, which has made it one of the most popular analytical methods [25]. It has been used efficiently as a multivariate analysis tool for large data in numerous sectors, such as vendor and supply chain management [25], the commercial airline industry [26], chemometrics [24], life cycle assessment [27], and decision making [23, 25]. Recently, the efficiency of transport companies was evaluated by an integrated PCA model [28], and a PCA-based tensor evaluation model was developed for group decision making [29]. Because PCA does not require prior weight assignment units for the statistics, it lessens the subjectivity arising from individual viewpoints among decision makers [25]. However, the determination of weights for the conflicting criteria of stock selection through PCA is rare in the literature.
The primary motivations of the paper are:
1. The weights of the criteria play a significant role in the ranking of alternatives. Almost all conventional weighting methods assume that the criteria are independent of each other; but in realistic decision-making problems, the hypothesis of independence of criteria is not always satisfied. Thus, the study uses PCA, which converts interdependent criteria into a set of linearly independent principal components. Also, as a dimension reduction tool, it can easily deal with large datasets. It provides a data-focused method that eliminates unnecessary subjectivity due to human requirements for normalized units [25]. Unlike traditional weighting methods, PCA also accounts for uncertainty in data [30]. Thus, PCA can be an efficient tool for the determination of criteria weights.
2. The ranking methods for stock selection developed in the past are mostly hybrid methods combining previously defined MCDM approaches. Thus, the study develops a novel two-stage approach in which the weights of the financial criteria are determined by PCA and the ranking uses the concept of a correlation coefficient. The most acceptable alternatives show a positive correlation with the positive ideal solution and a negative correlation with the negative ideal solution.
3. The two main objectives of the portfolio optimization problem for any novice investor are risk and return, but many other factors also affect the portfolio optimization decision. The present study incorporates an additional objective, the p/e ratio, which is used to gauge the valuation of a stock: it tells the investor whether the stock is undervalued or overvalued.
The rest of the paper is organized as follows: the different phases of the proposed methodology are presented in Sect. 2; an applied execution of the proposed approach for stock selection is shown in Sect. 3; results are discussed in Sect. 4, followed by conclusions in Sect. 5.

2 Proposed Methodology

The section defines a two-stage framework for ranking the stocks. In the first stage,
the weights of the financial criteria are determined by PCA. In the second stage, we
introduce a correlation coefficient-based approach to estimate the rank of stocks. The
detailed steps involved in the process are explained in the following sections.

2.1 To Determine Criteria Weights

This section presents an objective weight determination method for obtaining weights
of the criteria in a multi-criteria decision-making process based on principal compo-
nent analysis (PCA). The method assigns high weightage to those criteria which
have a positive impact on the principal components as compared to the ones that are
negatively affecting the principal components. The steps to attain criteria weights
are as follows:
1. Construct the initial decision matrix considering the evaluations of stocks with
respect to different financial criteria. If n stocks are evaluated on the basis of m
criteria, then the matrix is represented as
$$A = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{bmatrix}.$$

2. Perform PCA on the given decision matrix of the MCDM problem to obtain the proportion of variance explained by each principal component:

   PC1: $a_1$, PC2: $a_2$, PC3: $a_3$, PC4: $a_4$, ...

   such that $a_1 + a_2 + a_3 + a_4 + \cdots = 1$.
3. Form the positive and negative set of each principal component by analyzing which criteria have a positive and which a negative effect on the components:

   $$\mathrm{PC}_1^{+} = \{C_{\alpha_1}, C_{\beta_1}, \ldots\}, \qquad \mathrm{PC}_1^{-} = \{C_{\alpha_2}, C_{\beta_2}, \ldots\},$$

   where $C_{\alpha_1}, C_{\beta_1}, \ldots$ have a positive impact on PC1 and $C_{\alpha_2}, C_{\beta_2}, \ldots$ have a negative impact on PC1.
4. Find the weights of the criteria by considering the type of impact they have on the principal components. Example: if $C_1$ has a positive impact on PC1, PC2, PC4 and a negative impact on PC3, then $w_{c_1} = |a_1 + a_2 - a_3 + a_4|$.
5. Finally, find the standardized weights using Eq. (1):

   $$\omega_{c_i} = \frac{w_{c_i}}{\sum_{i=1}^{n} w_{c_i}} \quad (1)$$

   such that $\sum_{i=1}^{n} \omega_{c_i} = 1$.
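As a rough illustration of Steps 1-5, the following Python sketch computes PCA-based criteria weights from a decision matrix (a minimal example only, assuming NumPy and scikit-learn; the small matrix X reuses the first four rows of Table 1, and every variable name here is ours, not the paper's):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Decision matrix: stocks (rows) x criteria (columns); first four rows of Table 1
X = np.array([
    [39765.61, 27.06, 61.93, 0.0055, 0.420],
    [18125.48, 48.29, 17.68, 3.7105, 1.700],
    [18878.97, 25.27, 28.06, 0.0511, 0.574],
    [141358.9, 78.92, 37.15, 0.0008, 0.555],
])

# Step 2: PCA on the standardized decision matrix
pca = PCA().fit(StandardScaler().fit_transform(X))
a = pca.explained_variance_ratio_          # proportions a1, a2, ...

# Steps 3-4: the sign of each loading decides whether a component's
# proportion is added or subtracted in a criterion's raw weight
signs = np.sign(pca.components_)           # shape: (components, criteria)
w = np.abs(signs.T @ a)                    # w_ci = |sum of signed proportions|

# Step 5: standardize so the weights sum to one (Eq. (1))
omega = w / w.sum()
print(omega)
```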

2.2 To Rank the Alternatives

This section presents a ranking method based on the concept of correlating alternatives with the best and worst solutions. The steps involved in the process are:
1. Construct an initial decision matrix.
2. Since the scales of the different financial criteria differ, the next step is to normalize the decision matrix using vector normalization, as shown in Eq. (2):

   $$n_{ij} = \frac{a_{ij}}{\sqrt{\sum_{i=1}^{n} a_{ij}^2}} \quad (2)$$

   such that $n_{ij} \in [0, 1]$.


3. Now, considering the importance of the different attributes, we obtain a weighted normalized matrix using Eq. (3):

   $$c_{ij} = w_j \cdot n_{ij}, \quad (3)$$

   where $w_j$ represents the weight of the jth attribute.


4. Find the best and worst ideal solutions:

   $$A^{+} = \{(n_{i1}, n_{i2}, \ldots, n_{im}) \mid n_{ij} \text{ is the best value of the jth attribute}\},$$
   $$A^{-} = \{(n_{i1}, n_{i2}, \ldots, n_{im}) \mid n_{ij} \text{ is the worst value of the jth attribute}\}.$$

5. Find the correlation of each alternative with the best and worst ideal solutions.
6. Determine the utility value for each alternative using Eq. (4):

   $$U_{A_i} = s_i^{+} - s_i^{-}, \quad (4)$$

   where $s_i^{+}$ is the correlation coefficient of $A_i$ with the positive ideal solution and $s_i^{-}$ is the correlation coefficient of $A_i$ with the negative ideal solution.
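A compact Python sketch of the ranking stage is given below (an illustration only, assuming NumPy and a Pearson correlation coefficient, which the paper does not pin down explicitly; the data, weights, and benefit/cost flags are hypothetical):

```python
import numpy as np

def rank_by_correlation(X, weights, benefit):
    """X: alternatives x criteria; benefit[j] is True if higher is better."""
    N = X / np.sqrt((X ** 2).sum(axis=0))       # Eq. (2): vector normalization
    C = weights * N                             # Eq. (3): weighted matrix
    best = np.where(benefit, C.max(axis=0), C.min(axis=0))    # A+
    worst = np.where(benefit, C.min(axis=0), C.max(axis=0))   # A-
    s_plus = np.array([np.corrcoef(row, best)[0, 1] for row in C])
    s_minus = np.array([np.corrcoef(row, worst)[0, 1] for row in C])
    utility = s_plus - s_minus                  # Eq. (4)
    return np.argsort(-utility), utility        # indices of best-first ranking

# Hypothetical usage: 3 alternatives, 3 criteria (last flag marks a cost criterion)
X = np.array([[7.0, 9.0, 0.3], [5.0, 8.0, 0.5], [9.0, 6.0, 0.2]])
order, u = rank_by_correlation(X, np.array([0.5, 0.3, 0.2]),
                               np.array([True, True, False]))
print(order, u)
```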

3 A Real Case Study

The vital step before investing in stocks is their evaluation based on financial criteria. This section presents the application of the proposed method in ranking eight different stocks, Hindustan Unilever (I1), Bajaj Finance (I2), Asian Paints (I3), Tata Consultancy Services (I4), Pidilite (I5), Tata Steel (I6), Titan Company (I7), and Reliance Industries (I8), based on real data. Numerous decision criteria affect the performance of stocks, and considering the uncertainties, there is no definitive way to select a suitable number of financial criteria for evaluating them. In view of the literature and expert opinion, we consider five fundamental criteria for evaluating the stocks: revenue, earnings per share, return on equity, debt, and long-term beta. The first three are beneficial criteria (a higher value indicates better growth), while the last two are non-beneficial criteria (a lower value is preferred). Real data showing the evaluation of the eight alternatives based on the five criteria were retrieved from finance.yahoo.com for 1/1/2012 to 1/1/2022. The exponential moving average method is used to convert the multi-dimensional data into the single numerical values given in Table 1.
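For instance, a simple exponential moving average can be obtained as follows (a sketch assuming pandas; the series values and the smoothing span are hypothetical):

```python
import pandas as pd

# Hypothetical monthly revenue observations for one stock
revenue = pd.Series([35000.0, 36200.5, 38950.2, 39100.8, 40210.4])

# Exponential moving average; the last value condenses the series
# into a single figure per criterion
ema = revenue.ewm(span=12, adjust=False).mean()
print(ema.iloc[-1])
```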
PCA was performed on the data given in Table 1, and the results are given in Table 2. It can easily be seen that PC1 accounts for most of the variation (51.4%), followed by PC2 (29.33%). The positive and negative set of each principal component is formed by analyzing which criteria have a positive and which a negative effect on the components.

Table 1 EMA of actual data


M1 M2 M3 M4 M5
I1 39,765.61 27.06053 61.92668 0.005491 0.42
I2 18,125.48 48.28956 17.67792 3.71046 1.7
I3 18,878.97 25.26834 28.05581 0.051065 0.574
I4 141,358.9 78.91646 37.15121 0.00081 0.555
I5 6619.68 19.166 25.00322 0.041992 0.666
I6 148,210 42.13002 9.308674 1.254881 1.22
I7 18,642.31 12.61235 21.67825 0.399102 0.897
I8 479,372.9 59.82727 10.63541 0.611331 1.1

Table 2 Principal component analysis


PC1 PC2 PC3 PC4 PC5
Proportion 0.514 0.2933 0.1472 0.04413 0.00137
Cumulative 0.514 0.8073 0.9545 0.99863 1
M1 0.272571 0.67614 0.278581 −0.60737 0.148443
M2 0.277957 0.580854 −0.58917 0.481274 −0.08124
M3 −0.49675 0.016589 −0.63688 −0.54898 −0.2144
M4 0.503621 −0.37824 −0.4102 −0.23233 0.617296
M5 0.589956 −0.2492 −0.03721 −0.21005 −0.7378

$$\mathrm{PC}_1^{+} = \{M_1, M_2, M_4, M_5\}, \qquad \mathrm{PC}_1^{-} = \{M_3\}$$
$$\mathrm{PC}_2^{+} = \{M_1, M_2, M_3\}, \qquad \mathrm{PC}_2^{-} = \{M_4, M_5\}$$
$$\mathrm{PC}_3^{+} = \{M_1\}, \qquad \mathrm{PC}_3^{-} = \{M_2, M_3, M_4, M_5\}$$
$$\mathrm{PC}_4^{+} = \{M_2\}, \qquad \mathrm{PC}_4^{-} = \{M_1, M_3, M_4, M_5\}$$
$$\mathrm{PC}_5^{+} = \{M_1, M_4\}, \qquad \mathrm{PC}_5^{-} = \{M_2, M_3, M_5\}$$

Based on the given sets and the proportions of the different principal components, we can find the criteria weights given in Table 3:

$$w_1 = |0.514 + 0.2933 + 0.1472 - 0.04413 + 0.00137| = 0.91174$$
$$w_2 = |0.514 + 0.2933 - 0.1472 + 0.04413 - 0.00137| = 0.70286$$
$$w_3 = |-0.514 + 0.2933 - 0.1472 - 0.04413 - 0.00137| = 0.4134$$
$$w_4 = |0.514 - 0.2933 - 0.1472 - 0.04413 + 0.00137| = 0.03074$$
$$w_5 = |0.514 - 0.2933 - 0.1472 - 0.04413 - 0.00137| = 0.028$$
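Applying Eq. (1) to these raw weights, whose sum is $0.91174 + 0.70286 + 0.4134 + 0.03074 + 0.028 = 2.08674$, gives for example

$$\omega_1 = \frac{0.91174}{2.08674} \approx 0.436921,$$

which matches the first entry of Table 3.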

The ranking obtained by using the proposed methodology is given in Table 4.

Table 3 Weights of criteria

ω1         ω2         ω3         ω4         ω5
0.436921   0.336822   0.198108   0.014731   0.013418

Table 4 Ranking of alternatives

      U_Ai        Ranking
I1    −0.29311    4
I2    −0.69968    8
I3    −0.52991    6
I4     0.014956   3
I5    −0.65133    7
I6     0.671128   2
I7    −0.31702    5
I8     1.151763   1

Table 5 Ranking results by different MCDM models


MADM models Ranking results Optimal project
Proposed method I8 > I6 > I4 > I1 > I7 > I3 > I5 > I2 I8
TOPSIS I8 > I4 > I6 > I1 > I2 > I3 > I5 > I7 I8
VIKOR I8 > I6 > I4 > I1 > I3 > I7 > I2 > I5 I8
COPRAS I8 > I4 > I1 > I6 > I2 > I3 > I5 > I7 I8
MABAC I8 > I4 > I1 > I6 > I2 > I3 > I5 > I7 I8
WPM I8 > I4 > I6 > I1 > I2 > I3 > I7 > I5 I8

4 Results and Discussions

4.1 Comparative Analysis

In order to verify the effectiveness and validity of the proposed approach for ranking the alternatives, this section compares it with existing traditional MADM approaches. Considering the stock selection example presented in Sect. 3, we compare the ranking results obtained by our proposed approach with five MADM models, namely TOPSIS, VIKOR, COPRAS, MABAC, and WPM. The ranking results obtained by the models are given in Table 5.

4.2 Sensitivity Analysis

This section demonstrates the stability of the proposed approach toward changes in the criteria weights. For this, we change the criteria weights by 1–30% and observe the variation in the ranking of alternatives. Table 6 shows the Spearman correlation coefficient between the original ranking and the ranking observed when the criteria weights are changed by different percentages.
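Such a check can be scripted directly (a sketch assuming SciPy; the two rankings below are hypothetical, with one adjacent swap among eight items):

```python
from scipy.stats import spearmanr

original  = [1, 2, 3, 4, 5, 6, 7, 8]
perturbed = [1, 2, 3, 4, 6, 5, 7, 8]   # positions 5 and 6 swapped

rho, _ = spearmanr(original, perturbed)
print(rho)  # 0.97619, as in the 10% row of Table 6
```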

Table 6 SCC between rankings obtained with different criteria weights

% Change in weights    Ranking                                      SCC
1 I8 > I6 > I4 > I1 > I7 > I3 > I5 > I2 1
3 I8 > I6 > I4 > I1 > I7 > I3 > I5 > I2 1
5 I8 > I6 > I4 > I1 > I7 > I3 > I5 > I2 1
10 I8 > I6 > I4 > I1 > I3 > I7 > I5 > I2 0.97619
15 I8 > I6 > I4 > I1 > I3 > I5 > I7 > I2 0.928571
20 I8 > I4 > I6 > I1 > I3 > I5 > I7 > I2 0.904762
30 I8 > I4 > I6 > I1 > I3 > I5 > I7 > I2 0.904762

From Table 6, we can observe that for changes of up to 5% in the criteria weights there is no conflict in the ranking of alternatives. For larger changes, differences arise in the ranking, but the correlation coefficient of the observed ranking with the original ranking remains high. Also, the optimal solution is the same in all circumstances; hence, this verifies the stability of the proposed method toward the optimal solution.

4.3 Portfolio Analysis

On the basis of the ranking obtained above, we construct four portfolios P1, P2, P3, P4 by selecting the top four, five, six, and seven alternatives, respectively. For this, we collect historical data on the securities from 1/1/2016 to 1/1/2022. Table 7 depicts the returns of the securities.
A multi-objective genetic algorithm is employed to obtain the weights of the stocks in a portfolio. The optimization is performed on the MATLAB simulation platform, whose optimization toolbox has been used to run the multi-objective genetic algorithm (MOGA) and generate a set of Pareto-optimal solutions.
The two most important objectives considered by any investor are return and risk. An investor always faces a trade-off between maximizing return and minimizing risk. The present study considers an additional objective, namely minimizing the p/e ratio (PE) of a portfolio. The p/e ratio helps gauge the valuation of a stock: it tells us whether the stock is undervalued or overvalued. Thus, the optimization problem can be stated as F = min {Risk, -Return, PE}, subject to the constraint that the sum of the weights of all the stocks equals 1, i.e., $\sum_{i=1}^{n} x_i = 1$.
1. Risk: The risk of the portfolio is represented by the portfolio downside deviation given by Eq. (5), where $x_i$ represents the weight and $d_i$ the downside deviation of the securities:

   $$\text{Min. risk} = \sum_{i=1}^{n} x_i d_i \quad (5)$$

2. Return: The expected return of the portfolio is determined by Eq. (6), where $x_i$ and $r_i$ represent the weight and the return of the securities:

Table 7 Return of securities

                        I8       I6       I4       I1       I7       I3       I5
Average monthly return  0.02520  0.02765  0.01838  0.01590  0.03051  0.02026  0.02288
Annual return           0.30245  0.33190  0.22064  0.19080  0.36620  0.24312  0.27467


   $$\text{Max. return} = \sum_{i=1}^{n} x_i r_i \quad (6)$$

3. P/E ratio: The p/e ratio of the portfolio is determined by Eq. (7), where $x_i$, $y_i$, and $e_i$ represent the weight, share price, and EPS (earnings per share) of the securities:

   $$\text{Min. p/e} = \frac{\sum_{i=1}^{n} x_i \cdot y_i}{\sum_{i=1}^{n} x_i \cdot e_i} \quad (7)$$
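The three objectives can be evaluated compactly for a candidate weight vector; the sketch below (illustrative only, assuming NumPy; all input arrays are hypothetical) implements Eqs. (5)-(7):

```python
import numpy as np

def portfolio_objectives(x, d, r, y, e):
    """x: weights (summing to 1); d: downside deviations; r: returns;
    y: share prices; e: earnings per share."""
    risk = np.dot(x, d)                 # Eq. (5), to be minimized
    ret = np.dot(x, r)                  # Eq. (6), to be maximized
    pe = np.dot(x, y) / np.dot(x, e)    # Eq. (7), to be minimized
    return risk, -ret, pe               # packed as F = min {Risk, -Return, PE}

# Hypothetical data for three securities
print(portfolio_objectives(
    x=np.array([0.5, 0.3, 0.2]),
    d=np.array([0.04, 0.06, 0.05]),
    r=np.array([0.30, 0.33, 0.22]),
    y=np.array([2400.0, 1100.0, 3300.0]),
    e=np.array([60.0, 42.0, 79.0])))
```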

The multi-objective genetic algorithm provides a set of non-dominated optimal solutions in the form of a Pareto front. To obtain a single optimal solution, we require a decision-making technique. The present study employs the fuzzy decision-making technique [31] to pick an optimal solution from the collection of non-dominated optimal solutions. The fuzzy membership value of the ith objective is calculated as

$$X_i = \begin{cases} 1 & \text{for } F_i \le F_i^{\min} \\[4pt] \dfrac{F_i^{\max} - F_i}{F_i^{\max} - F_i^{\min}} & \text{for } F_i^{\min} \le F_i \le F_i^{\max} \\[4pt] 0 & \text{for } F_i \ge F_i^{\max} \end{cases}$$

where the maximum and minimum values of the ith objective function are represented by $F_i^{\max}$ and $F_i^{\min}$. For each non-dominated solution, the normalized function is defined as [32]

$$\chi_p = \frac{\sum_{i=1}^{n} X_i^{p}}{\sum_{p=1}^{m} \sum_{i=1}^{n} X_i^{p}},$$

where n represents the number of objective functions and m the number of non-dominated solutions. The optimal solution out of the collection of non-dominated optimal solutions on the Pareto front is the one with the maximum value of $\chi_p$.
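A small sketch of this selection step (assuming NumPy; the matrix F of objective values for the non-dominated solutions is hypothetical, with every objective expressed as minimization):

```python
import numpy as np

# Hypothetical Pareto front: m non-dominated solutions x n objectives
# columns: risk, -return, p/e (all to be minimized)
F = np.array([[0.048, -0.27, 55.0],
              [0.052, -0.29, 61.0],
              [0.045, -0.24, 50.0]])

Fmin, Fmax = F.min(axis=0), F.max(axis=0)
X = np.clip((Fmax - F) / (Fmax - Fmin), 0.0, 1.0)  # fuzzy membership per objective

chi = X.sum(axis=1) / X.sum()     # normalized membership of each solution
best = int(np.argmax(chi))        # compromise solution with maximum chi_p
print(chi, best)
```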
The weights of the securities selected by the fuzzy decision-making technique from all the solutions on the Pareto front, and the expected portfolio returns based on the proposed method, are depicted in Tables 8 and 9. From Table 9, it is observed that portfolio P3 attains the highest expected return. Hence, the combination of the top six stocks is to be selected for investment. The comparison between the proposed approach and previous studies is given in Table 10. The expected return of the proposed model is 28.205%, which is much higher than the returns of previously defined models; indeed, it is more than double that of the Thakur et al. [9] model. This indicates that the proposed model is capable of giving better results and verifies the effectiveness and robustness of the proposed approach in a multi-criteria decision-making system.

Table 8 Weights of stocks

     I8       I6       I4       I1       I7       I3       I5
P1 0.00515 0.32112 0.09570 0.57801 – – –
P2 0.00136 0.00215 0.12844 0.72955 0.1385 – –
P3 0.00908 0.03721 0.02558 0.31913 0.42543 0.18356 –
P4 0.00963 0.03335 0.0117 0.08165 0.12691 0.5061 0.12553

Table 9 Performance of different portfolios

Portfolio    Portfolio return
P1           0.23958
P2           0.21938
P3           0.28205
P4           0.2590

Table 10 Comparison of proposed approach with previous studies


Model Thakur et al. [9] Naveenan [33] Narang et al. [34] Proposed approach
Year 2016 2019 2021 2022
Expected return 0.1301 0.17 0.1672 0.28205

5 Conclusions

In the present study, a novel two-stage multi-criteria decision-making approach is proposed for stock selection, portfolio construction, and optimization for novice investors, in which the first stage demonstrates the use of PCA for finding the weights of the criteria and the second stage establishes a ranking of alternatives on the basis of their correlation coefficients with the positive and negative ideal solutions. In comparison with previously defined weight determination methods, the use of PCA makes the present approach more reliable, as PCA not only converts the correlated criteria into a set of linearly independent principal components but is also an efficient dimension reduction tool that helps deal with large datasets. It provides a data-focused method that eliminates unnecessary subjectivity due to the human requirement for normalized units. Ranking stocks by correlation coefficient presents a new approach, as in past studies the stocks were ranked using hybrid combinations of previously defined MCDM methods. From the financial point of view, risk and return are the two most important factors considered by any novice investor; the study developed a new multi-objective function with the p/e ratio, used to gauge the valuation of a stock, as an additional objective. Finally, a multi-objective genetic algorithm is employed to optimize the portfolio. The applicability of the proposed approach is shown through a real case study aiming to rank eight securities based on five criteria. A portfolio based on rank affinity is built to analyze the performance of the proposed method. The outcome shows that the portfolio is able to deliver better returns (0.28205, i.e., 28.205%). The performance of the results has been shown to be effective compared to the previous models.

References

1. Haseli G, Sheikh R, Sana SS (2019) Base-criteria on multi criteria decision making method
and its applications. Int J Manag Sci Eng Manag 15(2):79–88
2. Pamučar D, Žižović M, Biswas S, Božanić D (2021) A new logarithm methodology of additive
weights (LMAW) for multi-criteria decision-making: application in logistics. Facta Univer,
Ser: Mech Eng 19(3):361–380
3. Zopounidis C (1999) Multicriteria decision aid in financial management. Euro J Oper Res
119:404–415
4. Xidonas P, Doukas H, Hassapis C (2021) Grouped data, investment committees and multicri-
teria portfolio selection. J Bus Res 129:205–222
5. Mendonça GHM, Ferreira FGDC, Cardoso RTC, Martins FVC (2020) Multi-attribute decision
making applied to financial portfolio optimization problem. Expert Syst Appl 158:113527
6. Fazli S, Jafar H (2012) Developing a hybrid multi-criteria model for investment in stock
exchange. Manag Sci Lett 2(2):457–468
7. Poklepović T, Babić Z (2014) Stock selection using a hybrid MCDM approach. Croatian Oper
Res Rev 5:273–290
8. Mansouri A, Ebrahimi N, Ramazani M (2014) Ranking of companies based on TOPSIS-DEA
approach methods (evidence from cement industry in Tehran stock exchange). Pak J Stat Oper
Res 10(2):189–209
9. Thakur GSM, Bhattacharyya R, Sarkar S (2018) Stock portfolio selection using Dempster-
Shafer evidence theory. J King Saud Univer Comput Inf Sci 30:223–235
10. Gupta S, Bandyopadhyay G, Bhattacharjee M, Biswas S (2019) Portfolio selection using DEA-
COPRAS at risk – return interface based on NSE (India). Int J Innov Technol Explor Eng
(IJITEE) 8(10)
11. Gupta S, Mathew M, Gupta S, Dawar V (2020) Benchmarking the private sector banks in India
using MCDM approach. Wiley 21(2)
12. Dai Z, Kang J (2022) Some new efficient mean-variance portfolio selection models. Int J Financ
Econ 27(4):4784–4796
13. Markowitz HM (1990) Portfolio selection, efficient diversification of investments. Blackwell,
Cambridge MA, Oxford UK
14. Steuer RE, Qi Y, Hirschberger M (2007) Suitable-portfolio investors, nondominated frontier
sensitivity, and the effect of multiple objectives on standard portfolio selection. Ann Oper Res
152:297–317
15. Roman D, Darby-Dowman K, Mitra G (2007) Mean-risk models using two risk measures: a
multi-objective approach. Q Financ 7(4):443–458
16. Velazquez MA, Claudio D, Ravindran AR (2010) Experiments in multiple criteria selection
problems with multiple decision makers. Int J Oper Res 7(4):413–428
17. Wang JJ, Jing YY, Zhang CF, Zhao JH (2009) Review on multi-criteria decision analysis aid
in sustainable energy decision making. Renew Sustain Energy Rev 13(9):2263–2278
18. Ginevičius R (2011) A new determining method for the criteria weights in multicriteria
evaluation. Int J Inf Technol Decis Mak 10:1067–1095
19. Zardari NH, Ahmed K, Shirazi SM, Yusop ZB (2014) Weighting methods and their effects
on multi-criteria decision-making model outcomes in water resources management. Springer,
New York, NY, USA
20. Delice EK, Can GF (2020) A new approach for ergonomic risk assessment integrating
KEMIRA, best–worst and MCDM methods. Soft Comput 24:15093–15110
21. Du YW, Gao K (2020) Ecological security evaluation of marine ranching with AHP-entropy-
based TOPSIS: a case study of Yantai. China Mar Policy 122:104223
22. Adler N, Golany B (2001) Evaluation of deregulated airline networks using data envelopment
analysis combined with principal component analysis with an application to Western Europe.
Eur J Oper Res 132(2):260–273
23. Zhu J (1998) Data envelopment analysis vs. principal component analysis: an illustrative study
of economic performance of Chinese cities. Euro J Oper Res 111(1):50–61
24. Bro R, Smilde AK (2014) Principal component analysis. Anal Meth 6(9):2812–2831
25. Petroni A, Braglia M (2000) Vendor selection using principal component analysis. J Supply
Chain Manag 36(2):63–69
26. Adler N, Golany B (2002) Including principal component weights to improve discrimination
in data envelopment analysis. J Oper Res Soc 53(9):985–991
27. Balugani E, Lolli F, Pini M, Ferrari AM, Neri P, Gamberini R, Rimini B (2021) Dimensionality
reduced robust ordinal regression applied to life cycle assessment. Expert Syst Appl 178:115021
28. Stevic Z, Miskic S, Vojinovic D, Huskanovic E, Stankovic M, Pamucar D (2022) Development
of a model for evaluating the efficiency of transport companies: PCA-DEA-MCDM model.
Axioms 11(3):140
29. Singh M, Pant M, Kong L, Alijani Z, Snasel V (2023) A PCA-based fuzzy tensor evaluation
model for multi-criteria group decision making. Appl Soft Comput 132:109753
30. Ning C, You F (2018) Data-driven decision making under uncertainty integrating robust opti-
mization with principal component analysis and kernel smoothing methods. Comput Chem
Eng 112:190–210
31. Biswas PP, Suganthan PN, Qu BY, Amaratunga GAJ (2018) Multiobjective economic envi-
ronmental power dispatch with stochastic wind solar small hydro power energy. Energy
150:1039–1057
32. Brka A, Al-Abdeli YM, Kothapalli G (2015) The interplay between renewables penetration,
costing and emissions in the sizing of stand-alone hydrogen systems. Int J Hydrogen Energy
40(1):125–135
33. Naveenan RV (2019) Risk and return analysis of portfolio management services of reliance
nippon asset management limited (RNAM). Global J Manag Bus 6(1):108–117
34. Narang M, Joshi MC, Bisht K, Pal A (2022) Stock portfolio selection using a new decision-making approach based on the integration of fuzzy CoCoSo with Heronian mean operator. Decision Making: Applications in Management and Engineering
Chapter 4
Survey on Crop Production and Crop Protection

H. S. Rakshitha, Mayur S. Gowda, and Akshata S. Kori

1 Introduction

According to the Food and Agriculture Organization of the UN, the world population may increase rapidly to 9 billion by 2050. Climate change, increasing demand for organic food, rapid population growth, conversion of farmland to industrial areas, and growing market demands have posed a great challenge to crop production. The focus on sustainability also brings the challenge of protecting soil quality in the coming years. Here, growing technological advancements have shown better results, as conveyed in this paper.
Agriculture plays a major role in national economies, as it is the basic source of livelihood in many low-income and developing countries. The agriculture industry needs to grow its production levels by 70% to feed the world's growing population. Monitoring environmental factors alone is not a complete solution for increasing crop yield; several other factors reduce agricultural productivity to an extreme extent.
The US Department of Agriculture, Agricultural Research Service, is the fore-
most agricultural research organization in the world with more than 3000 scientists
conducting agricultural research in nearly 100 locations around the USA and in
three foreign countries [1]. The need for automation is suggested in agriculture to
overcome the challenges posed by human and natural resources.
This paper analyzes the application of various innovative technologies to crop production and protection. Innovative technologies achieve self-sufficiency in agriculture by introducing environmentally suitable solutions and modern agricultural techniques that are necessary for improving productivity and decreasing production costs. Embedded-based applications help farmers with many

H. S. Rakshitha · M. S. Gowda · A. S. Kori (B)
Ramaiah Institute of Technology, Bangalore, India
e-mail: kori.akshu@msrit.edu


agricultural activities like sowing seeds, watering crops, and applying fertilizers, insecticides, and pesticides. These applications help with moisture monitoring, weather monitoring, growth monitoring, and more.
These are among the most promising technologies for addressing the present-day crisis in underdeveloped and developing countries, and they can help solve hunger problems globally. Crop production is undergoing a huge transition with the use of technology in all fields, from microbiology to artificial intelligence. Management of systems for data and information clustering is a linchpin for crop production and protection. The intervention of real-time applications in agriculture has brought rapid growth in crop management. As demand for food and employment increases, artificial intelligence and machine learning help produce crops of good quality and quantity and also increase job opportunities in this field. These technologies have brought a revolution in the agriculture sector.
In this paper, recent works for better crop production and protection have been
extensively studied and noted. These act as guides for narrowing down the research
on crop protection and yield generation to have better results in a short time frame.

2 Literature Survey

This section explores recent studies that cover different aspects of innovative technology for crop production and protection.
Crop production can be increased in several ways, such as watering the plants regularly and protecting them from pests, heavy storms, and bad weather conditions. Traditionally, this requires manual effort, which can be reduced by using upcoming innovative technologies. References [2, 3] explain how drones are helpful in the agriculture sector. Drones are aerial robots, as shown in Fig. 1. Programmed with artificial intelligence, they help farmers optimize the use of inputs (seed, fertilizers, water), react very quickly to threats (weeds, pests, fungi), save crop scouting time, and roughly estimate the yield from a field.
The importance of crops during unforeseeable weather conditions and the destruction of crops by other naturally occurring phenomena are indicated by [4]. This makes crop protection a major issue, one that can be addressed using data analytics and the Internet of Things; these concepts also help increase the productivity of the crops, as shown in Fig. 2.
Several IoT concepts, as shown in Fig. 3, employ wireless sensor networks, RF identification, and cloud computing to solve these existing issues. The authors discuss how IoT and data analytics can be coupled to provide better solutions. The IoT ecosystem consists of IoT devices with sensors and actuators, which are wirelessly connected and are mainly used for sensing temperature and humidity conditions related to crops.
Communication technology is used to deliver the data extracted from the sensors to the main node, using either licensed bands or the unlicensed ISM bands.

Fig. 1 Use of drone technology in spraying pesticides

Fig. 2 Farming with data analytics

Fig. 3 Applications of IoT in farming



The communication standards that can be used include ZigBee, Bluetooth, Z-Wave, etc. For long-range communication, internet-connected devices can be used to transmit the collected data from the sensors to the main node. Combining data analytics with IoT improves crop protection, in that the data extracted through the sensors can be used to analyze the crop or field conditions. Sensors installed in storage facilities help monitor unfavorable conditions that might occur; in that case, the control center receives an alert message for further action.
Big data analytics, ML, and DL algorithms are all used in the agriculture sector. Bhat et al. [5] note that developing an algorithm is straightforward, but the algorithm must guarantee accuracy and consistency in all scenarios. Deep learning algorithms are among the most promising technologies for effective innovation, and the work also discusses the neural networks that can be implemented in these innovative technologies. Sensors can be deployed directly on or implanted in the land, robots can be developed for nurturing crops, and IoT-based weather stations can be maintained; thereby, maintenance and protection can be performed easily by farmers or companies. The authors also suggest implementing technologies like big data analytics and artificial intelligence.
The usage of farmer’s manual efforts can be reduced by utilizing the present
technologies that provide several advantages in farming mechanics which include
monitoring crops and livestock. Joseph et al. [6] All of these can be handled using AI
frameworks and ML algorithms. Also, the Unmanned Autonomous Vehicle (UAV)
as shown in Fig. 4 can be utilized in order to improve precision farming using
human skills and the currently booming technologies. In the methodology proposed,
the information of the crop field is collected by taking images of the crop field
using their computational intelligence vision sensors, and based on the information
collected, the machine learning model is trained in such a way that on the basis of
color features obtained from the images, the nutrient content in the plant is provided
as output information.

Fig. 4 UAV in precision farming



Estimating crop yield is one of the difficulties faced by farmers in the agriculture business, so various ML algorithms are utilized to estimate crop production and yield. Since the significance of agricultural yield prediction is increasing, [7] shows how ML approaches can be used to estimate crop production. Because a large amount of data is available for seed selection and yield forecasting, it is difficult for farmers to perform these tasks manually; this workload can be minimized using artificial intelligence.
The productivity of a crop also depends on the area where it grows. Therefore, [8] proposes a model trained with ML concepts that determines productivity based on the parameters moisture, rainfall, and temperature. Prediction algorithms such as logistic regression, the Naive Bayes classifier, random forest, Support Vector Machines (SVMs), k-Nearest Neighbor (KNN), Multi-Condition Filtering, and collaborative filtering are applied. After training the model on the dataset and applying each of these algorithms, a comparison is made to analyze the accuracy of the model. For the recommendation, Multi-Condition Filtering and collaborative filtering algorithms were applied: the input parameters of the collaborative filtering are compared with the system's trained data, and crops are filtered based on their cosine similarities; the Multi-Condition Filtering algorithm then categorizes crops by different combinations of low, moderate, and high ranges of the input parameters and recommends crops accordingly, as sketched below.
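As a rough illustration of the cosine-similarity step (a sketch assuming NumPy; the crop profiles, feature ordering, and query values are hypothetical):

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical trained profiles: (moisture, rainfall, temperature) per crop
crop_profiles = {"rice": np.array([0.9, 0.8, 0.7]),
                 "wheat": np.array([0.4, 0.3, 0.5]),
                 "ragi": np.array([0.5, 0.4, 0.8])}

query = np.array([0.85, 0.75, 0.65])   # conditions measured at the field
scores = {crop: cosine_similarity(query, v) for crop, v in crop_profiles.items()}
print(max(scores, key=scores.get))     # crop with the highest similarity
```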
Table 1 depicts the overview of surveys from [9–24].

3 Analysis of the Survey

Crop production is a tedious job, and protecting the crop is something every farmer must keep in mind. Many innovative technologies can be used to make this work easier.
Big data analysis, machine learning, deep learning, artificial intelligence, and similar technologies are used to improve crop quality and quantity. This survey shows that innovative technologies can be implemented in crop production and protection, as shown in Fig. 5. These technologies help at every stage, from soil analysis to harvesting.
The existing works listed above are summarized in Table 1, which gives an idea of how innovative technologies like ML and AI can operate across the complete flow of agricultural practices.
For any crop production, the very first task is to prepare the soil. This includes checking soil fertility and health and the surrounding environment, such as temperature and humidity. At this stage, a farmer can use deep learning algorithms to analyze and monitor the water level of the soil and the weather temperature, and AI-based technology like robots can be trained for soil maintenance. This makes the farmer's work easier and more efficient.
The next stage of work is seed selection and sowing. In the traditional way of seed
selection and sowing, farmers without knowledge sow every seed and this might cause

Table 1 Key points of the research papers that are referred

S. No. | Name | Key points
1 | Agricultural spraying drones: advantages and disadvantages | Chemical sprays and other nutrients are delivered to the plants through drones, which will have better battery quality for use on large hectares of land
2 | Drones support in precision agriculture for fighting against parasites | Drones are used to identify parasites, gather information on them, and alert the farmer about the diseases they cause and the precautionary measures to be taken to prevent them
3 | RFID sensing technologies for smart agriculture | Crop protection and effective production can be achieved using data analytics and the Internet of Things. Properties like humidity and temperature are analyzed and sent to the farmers through Bluetooth and other WiFi-related communication models
4 | Big data and AI revolution in precision agriculture: survey and challenges | Accuracy and consistency are essential factors in any innovation and can be achieved with technologies like ML and big data analytics. This paper gives ideas about how these technologies are used in the agriculture field
5 | Ethics of using AI and big data in agriculture: the case of large agriculture multinationals | This paper shows the use of big data and AI in the field of agriculture and their proper use to avoid problems that might arise for the farmer economy and other aspects
6 | Machine learning applications for precision agriculture | This discusses the different algorithms used in ML, DL, and AI; different algorithms in different fields of agriculture are explained well

Fig. 5 Crop yield using artificial intelligence

loss of crop in some areas due to unhealthy and infertile seeds, as shown in Figs. 6 and 7. Using machine learning and deep learning algorithms, we can design a system that differentiates healthy from unhealthy seeds, and robots that sow seeds at a proper spacing for good crop growth.
Once sowing is completed, the next task is to provide manures and fertilizers to the crops. Before providing manures, it is very important to measure the quantity of micro- and macronutrients required for a particular crop; deep learning technology can be used for this. For providing manures and fertilizers,

Fig. 6 Data analysis for water logging effect on soil

Fig. 7 Data analysis of water logging on growth of plant

we can use established technology like drip irrigation and its upgrades, or newer technology like artificial intelligence and machine learning. With these technologies, fertilizers can be delivered through automated pipeline systems or robotic technology.
When the crop plants start to grow, a farmer needs to protect them from pests and insects, usually with pesticides and insecticides. These protectants should not be applied at the roots of the plants; hence, drone robots can be used to spray them aerially, again relying on artificial intelligence and machine learning. Plant health can be monitored with GPS technology, and the resulting data can be stored and analyzed using deep learning.
Crops are protected and nurtured until they grow big enough to harvest. During this period, the farmer's work is mainly to go through the data and information obtained about the crop plants. Once the crops are ready for harvesting, robot technology and machinery can again be used. For fruits and vegetables, robots can be used for plucking; for crops like ragi, wheat, rice, etc., harvesting machinery already on the market, along with its updated versions, can be used.
As we all know, some crops must be stored before they are sold, while others must be sold as soon as they are harvested. So, it is very important to plan for storage before selling. Farmers cannot store harvested crops blindly; they can use big data analytics and IoT to monitor the temperature, humidity, and pressure of the storage room. A farmer will get a notification if the room moves outside the threshold conditions and can maintain the storeroom conditions at any time. Meanwhile, for selling crops, the farmer can use deep learning to analyze crop pricing from historical data and forecasts. This can decrease the burden of loss on the farmer's economic conditions and also yield good profit. In crop production, we can use technologies like satellite photography and imagery, geographic information systems (GIS), global positioning systems (GPS), measuring systems and weather monitoring, yield monitoring systems, and soil and plant sensing systems, all of which are part of AI and IoT.

4 Inference from the Analysis

This paper aims to gather knowledge on the usage of innovative technologies in each and every step of crop production practices, including soil analysis, seed selection, sowing, micro- and macronutrient analysis, crop growth monitoring, pest detection and alerting, yield monitoring, smart harvesting, etc. Figure 8 outlines agricultural practices from start to finish of the work, and Fig. 5 gives complete information on using innovative technology in crop production and protection.
Usually, farmers experience difficulties in finding manpower for much of the fieldwork performed in agriculture. Using robots designed with artificial intelligence and machine learning algorithms is therefore very helpful in reducing manpower and making effective use of technological strength instead of manual strength.
In areas like soil testing and seed selection, farmers should prefer an expert. But sometimes experts may not be near the farmer's land, or they cannot arrive at the right time, which can be very costly for the farmer, because soil testing is required for every single crop grown. These problems can also be reduced

Fig. 8 Overview of agriculture practices

by implementing a system that can check the pH, soil moisture, minerals, and other things needed for the better development of the crop plants.
Present technology can also be used in crop growth monitoring and harvesting, using deep learning and big data analytics to ensure proper maintenance of crop production. Technology can likewise be used in crop protection by combining drones, IoT, and big data analytics. With this, the farmer can check production activity remotely and obtain data on which pest is attacking the crop and what precautionary measures should be taken to protect it.

5 Conclusion

Crop production must increase in order to satisfy the growing demand for food and to prevent future threats that may arise. The research was conducted on crop production and land use. It was seen that, during the period of crop growth, crops are commonly affected by bad weather conditions, by insects that eat the grown crop, and by the type of soil used for growing the crop. All these factors must be taken into consideration in order to obtain good production of the crops. This can be achieved using artificial intelligence, where different ML and DL algorithms predict the required features by training models on data collected in real time.
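For illustration, the following sketch shows the kind of ML pipeline described above: training a regression model on field features to predict yield. The feature names and the synthetic data are assumptions made for the example, not results from the surveyed papers.

# Hypothetical sketch: yield prediction from weather/soil features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# columns: rainfall_mm, avg_temp_c, soil_ph, nitrogen_mgkg (synthetic stand-ins)
X = rng.uniform([300, 15, 5.0, 20], [1200, 35, 8.0, 150], size=(500, 4))
y = 0.004 * X[:, 0] + 0.1 * X[:, 1] - abs(X[:, 2] - 6.5) + 0.01 * X[:, 3]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
print("R^2 on held-out data:", round(model.score(X_test, y_test), 3))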
So, using the predictions obtained from the trained machine learning models, the required measures can be taken to improve productivity. Once the productivity of the crop has increased, the next step is the protection of the crops stored in facilities, which must be monitored in order to prevent the crop from getting damaged. The moisture content and temperature in the storage facility must be managed to prevent harm; the main cause of crop harm is insects, and this can be prevented through live monitoring. Different research papers were analyzed, the models trained in them using machine learning were studied, and it was seen that the results from those models were roughly 90–96% accurate.

6 Future Scope

The work carried out in this paper is based on theoretical information and on real-time data obtained from farmers for the purpose of understanding the scenarios that affect crop production; these were analyzed using the available theoretical models, which helped us provide a better solution using artificial intelligence combined with IoT. Since the work has been carried out only in a hypothetical manner, the advantages of implementing the system to provide a much more precise solution to the existing problems in crop production were not demonstrated. So, extending the work to deploy a proper system under real field conditions can provide more information to improve the existing methods and reduce the problems in the agricultural domain.

Chapter 5
Disease Detection for Grapes: A Review

Priya Deshpande and Sharada Kore

1 Introduction

As the world population grows, there is a huge demand for the supply of food. To satisfy this demand, agricultural productivity and yield need to be increased. This is possible when the crops grown are healthy. But because of pathogens present in the environment, crops contract various diseases, and these unhealthy crops tend to reduce productivity. It is therefore necessary to monitor crop health and growth progress, detect disease at an early stage, and provide a future prediction of the disease's spread, so that farmers can take necessary actions like spraying herbicides/pesticides to protect the crop from severe disease.
In the earlier times of crop disease detection, manual inspection by the farmers was used, and decisions to spray chemicals were taken accordingly. Over the last decade, advanced, state-of-the-art technologies like artificial intelligence, machine learning, the Internet of Things, computer vision, and image processing techniques have been used by researchers in the field of crop disease detection.
Grapes are among the most profitable and cost-effective crops. Grape fruits are used for the preparation of wine, juices, jams, and jellies, and millions of tons of grapes are exported and imported around the world. However, the grape crop is affected by many diseases which reduce its yield. The diseases with which grape crops are affected are Powdery Mildew, Anthracnose, Greeneria bitter rot, bacterial leaf spot, Alternaria blight, Black Rot, blue mold rot, botrytis bunch rot, Downy Mildew, black
mold rot, green mold rot, rhizopus rot, Rust, and foot rot; integrated pest management (IPM) is also practiced for grapes. There is a need to detect the disease, predict its severity, and suggest pesticide use so that farmers can take the required actions.
This paper has six sections. Section 2 presents a survey of methods for plant disease detection. Section 3 presents a survey of methods for grape disease detection. Section 4 presents a summary of the survey in tabular format. Section 5 discusses challenges and future directions, and Sect. 6 is the conclusion.

2 Survey of Methods for Plant Disease Detection

An elaborate review of plant disease detection based on image processing and convolutional neural networks is presented in [1]. According to the survey, when CNNs are applied to data captured in real-world environments, the accuracy tends to drop by 30–40%, and results tend to vary significantly because of the diversity of pests, diseases, crops, and environments; when the same models are applied to the PlantVillage dataset, accuracy is enhanced. Some diseases are also caused by abiotic factors which have characteristics similar to those of biotic diseases and can therefore result in a false diagnosis of the disease. From the survey, it has been observed that, among the research carried out on plant disease detection, 65.28% of studies use datasets created in controlled environments and 37.19% involve datasets from PlantVillage. It is also found that rice, corn, cucumbers, tomatoes, wheat, bananas, and grapes are the most investigated plants. The survey also highlights that multispectral and hyperspectral imaging, which contain more information than RGB imaging, can be used with CNNs in the further scope of plant disease detection. The authors also suggest the use of Unmanned Aerial Vehicle (UAV) technology for capturing high-resolution images.
A CNN technique to identify plant diseases for 10 crops with 27 diseases, using an Inception (ResNet-v2 backbone) model for training, is presented in [2]. It used the AI Challenger competition public dataset; Python, the TensorFlow deep learning framework, and Windows OS were used for implementation. The model achieved an accuracy of 86.1% AP and assisted farmers in identifying the disease. Surveys of widely used CNNs like LeNet, AlexNet, Inception, and deep residual networks are provided, and a crop disease recognition model and the processes involved in it are also presented. Future work is directed toward creating more datasets and covering more crops with a larger variety of diseases; crop image classification accuracy can be further enhanced by designing more accurate network models.
A review on the detection and classification of plant diseases in [3] discusses various current trends and techniques for plant disease detection using image processing and deep learning techniques, targeting studies on apple, tomato, rice, and cucumber. It is observed that traditional image processing methods like the Global Color Histogram (GCH), Color Coherence Vector (CCV), and principal component analysis (PCA) can give good accuracy, but they still fall short in some areas: the process is time-consuming, and it is difficult to test the performance of a disease detection model in complex environments. This calls for the design of a novel disease detection model that is more accurate, fast, and intelligent, so the author suggests the development of new deep learning models. The author has also reviewed machine learning models like the support vector machine (SVM), KNN, and K-means clustering, and deep learning models like CNNs and GANs. Labeled datasets are difficult to obtain for early plant disease detection with hyperspectral imaging, and adversarial networks can be used for data augmentation. Various research gaps are identified: there is a need for larger datasets for CNN training, and large, diverse datasets for plant disease detection have not been collected; if a large dataset is not available, there is a need to apply transfer learning with deep learning on the limited dataset. Early detection of diseases with limited sample sets is still under research, and more work can be directed toward it. There is a need to build a large dataset of plant diseases under actual real conditions; for experimentation, the PlantVillage dataset is most commonly used, but its data were created under laboratory conditions.
A review on advanced techniques for agricultural disease detection is presented in [4]. It compares the merits and demerits of machine learning methods with deep learning and transfer learning methods. Traditional ML methods like SVM and Bayesian classifiers depend on the quality of the image data, and their realization is complex and difficult when the number of training samples is large. It is concluded that deep learning with CNNs is best suited for disease detection compared to traditional machine learning methods, but there is still scope to improve the accuracy of CNNs, as datasets are limited. Transfer learning can also be used over deep learning methods, since DL requires a huge amount of data, the quality of DL models depends heavily on the datasets, and huge datasets are still scarce in agriculture. Parameter optimization is also a major concern in DL. The author explains the need for constructing image datasets and expanding current ones, as the present lack of labeled disease image data limits the quality and accuracy of DL models. From this survey, it is concluded that most crop disease studies focus on tomato, rice, cucumbers, apples, and citrus, and that there is a need to design a method to identify disease independent of a specific crop. It is suggested that DL can be integrated with current smartphone technology. Along with disease detection, it is necessary to find the severity of the disease and to relate the disease to other factors like temperature, humidity, and soil type. Constructing diverse image datasets in the actual cultivation environment, instead of collecting image datasets in a controlled laboratory environment, will help improve the accuracy of plant disease detection deep learning models. The author suggests that a heterogeneous mode of transfer learning can be employed to predict the disease based on text, image, and video data instead of image data alone.
A detailed survey about plant disease detection using image processing and ML techniques is presented in [5]. It surveys various diseases of plants like apple, corn, cherry, and grapes, and discusses the steps involved in the plant disease detection process, such as image pre-processing, feature extraction and selection, image segmentation, and disease classification; various classifiers for plant disease detection are also explored. It also summarizes, as percentages of papers, the previous research done on various crops (profit crops, mixed cultures, grains, etc.) using image processing techniques. From this survey, it is observed that a lot of research has been done on rice, tomato, cucumber, citrus, and wheat, but less research is directed toward profit crops like sugarcane and groundnut. Further gaps in plant disease detection research are discussed, such as the need to detect the disease at a particular stage: it would help farmers if stage-wise special precautions were suggested to them. Also, if a precise estimation of the infected area of the plant is done, it is possible to control and minimize the unmanaged use of pesticides by the farmers. Though a lot of researchers have provided solutions to this problem, few of the corresponding actual systems are available, so there is a need to develop mobile applications and website solutions for farmers around the world; a "Disease Analysis Report" can be generated for them. There is also a need to develop real-time applications using data from real-world conditions rather than data obtained in the controlled environment of a laboratory.
A deep convolutional network methodology with nine layers is presented in [6] for 39 different classes of plant leaf diseases. The nine-layer deep CNN's performance is compared with SVM, KNN, AlexNet, VGG16, InceptionV3, and ResNet. The image dataset was taken from PlantVillage; as training the model requires huge amounts of data, the images were augmented to create a larger number of images. The models were trained and tested using the Keras, OpenCV, and Pillow libraries with Python programming. The developed model achieved an accuracy of 96.46%, compared with SVM, KNN, logistic regression, and decision trees. The authors further suggest that an improvement in accuracy can be achieved by creating an enhanced dataset, collecting different images from different plants, cultivations, geographical areas, and image qualities. The research can be extended to fruit, flower, and stem parts of the plant, as well as to plant disease diagnosis.
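To make this kind of approach concrete, the following is a minimal Keras sketch of a small CNN for leaf-disease classification with on-the-fly augmentation. The layer sizes are illustrative assumptions; this is not the authors' exact nine-layer architecture, and only the 39-class output follows [6].

# Hypothetical sketch of a small CNN with data augmentation (Keras).
import tensorflow as tf
from tensorflow.keras import layers, models

augment = models.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
])

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),
    augment,                                 # augmentation applied during training
    layers.Rescaling(1.0 / 255),
    layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(39, activation="softmax"),  # 39 leaf-disease classes as in [6]
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(train_ds, validation_data=val_ds, epochs=20)  # datasets from PlantVillage images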
A comparative study of various deep learning models for plant disease identification and classification is presented in [7]. It provides information about image processing-based disease detection using deep convolutional neural networks. It used a plant disease dataset from the ImageNet dataset library and implemented and compared the deep learning architectures VGG 16, Inception V4, ResNet with 50, 100, and 152 layers, and DenseNet with 121 layers. DenseNet was found to give higher accuracy than the others, but some research can still be carried out to reduce the computational processing time.
Table 1 gives a tabular summary of plant disease detection methods and gaps
identified in the literature.

3 Survey of Methods for Grape Disease Detection

A novel method using image processing and a multiclass support vector machine was presented in [8], detecting grape diseases like leaf blight, Black Measles, and Black Rot.

Table 1 Plant disease detection methods

Abade et al. [1]
Methodology used: Image processing and convolutional neural networks
Limitations/future scope: Use of CNNs with multispectral and hyperspectral imaging, which carry more information than RGB images; use of Unmanned Aerial Vehicle (UAV) technology for capturing high-resolution images

Ai et al. [2]
Methodology used: CNN, Inception-ResNet-v2 model with the AI Challenger competition public dataset; Python and the TensorFlow deep learning framework
Limitations/future scope: Farmers currently rely on books, local networks, and experts to manage crop disease; the dataset can be extended to rice and wheat and their diseases, and more crops can be considered; crop image classification accuracy can be enhanced further by designing other network models

Lili et al. [3]
Methodology used: Traditional image processing (GCH, PCA), SVM, CNNs, KNN, K-means, GANs for data augmentation, integration of multiple CNN classifiers, a multiscale ResNet model, hyperspectral imaging
Limitations/future scope: Supervised DL techniques present challenges in terms of the large amounts of data required, and data labeling is a tedious process; unlabeled data with unsupervised learning may be promising; early detection of diseases with limited sample sets is still under research; intelligent, rapid, and accurate plant disease recognition is needed; larger and more diverse datasets are needed for CNN training; transfer learning with DL can be used for limited datasets; a large dataset of plant diseases in real conditions must be established, as most datasets are taken from PlantVillage and were obtained in a laboratory; image datasets should be collected under actual cultivation conditions rather than in controlled laboratory environments; traditional ML methods depend on the quality of image data and become difficult when the number of training samples is large; most crop disease studies focus on tomato, rice, cucumbers, apples, and citrus

Yuan et al. [4]
Methodology used: DL and transfer learning; CNN for image classification; homogeneous transfer learning
Limitations/future scope: Parameter optimization is a major concern in DL; DL can be integrated with current smartphone technology; it is necessary to find the severity of the disease and relate it to other factors like temperature, humidity, and soil type; a heterogeneous mode of transfer learning can be employed to predict disease based on text, image, and video data instead of image data alone

Kumar et al. [5]
Methodology used: Image processing with unsupervised and supervised classifiers
Limitations/future scope: Recognition of the infection stage and accurate classification; development of website solutions and mobile apps; reliability of detection systems

Geetharamani et al. [6]
Methodology used: Image processing, deep CNN
Limitations/future scope: Need to increase database classes and size by capturing images in real environments; research can be extended to other parts of the plant like flowers, fruits, and stems

Too et al. [7]
Methodology used: Deep CNN; VGG 16, Inception V4, ResNet with 50, 100, and 152 layers, and DenseNet with 121 layers; Keras with Theano backend for training
Limitations/future scope: Computational time needs to be improved

The authors used the gray-level co-occurrence matrix (GLCM) and principal component analysis (PCA) for extracting features and reducing feature dimensions. An accuracy of 98.71% was obtained using the GLCM method, while the PCA method achieved an accuracy of 98.97%. Deep learning algorithms, i.e., a CNN and GoogLeNet, were also used, achieving accuracies of 86.82% and 94.05%, respectively.
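This classical pipeline can be sketched roughly as below: GLCM texture features extracted from grayscale leaf images and fed to a multiclass SVM. The image loading and labels are assumed, and the chosen distances, angles, and texture properties are illustrative rather than the exact settings of [8].

# Hypothetical sketch: GLCM texture features + multiclass SVM.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC

def glcm_features(gray_img):
    """Small GLCM texture descriptor for an 8-bit grayscale leaf image."""
    glcm = graycomatrix(gray_img, distances=[1, 2], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

# images: list of 2D uint8 arrays; labels e.g. 0=Black Rot, 1=Black Measles, 2=leaf blight
def train_svm(images, labels):
    X = np.array([glcm_features(img) for img in images])
    clf = SVC(kernel="rbf", decision_function_shape="ovr")  # one-vs-rest multiclass SVM
    return clf.fit(X, labels)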
The authors of [9] proposed a deep convolutional neural network (DCNN) for the identification and classification of grape leaf diseases. The grape leaf RGB image dataset from PlantVillage was used, and the developed model obtained an accuracy of 99.34%.
A Ghost convolution and Transformer network for grape leaf disease and pest detection is proposed in [10]. A total of 8 grape conditions, namely Black Rot, leaf blight, Esca, Downy Mildew, Brown Spot, Powdery Mildew, nutrient deficiency, and viruses, were identified, and a dataset of 12,615 images was collected. An accuracy of 98.14% was achieved using this model. One listed drawback is that the proposed model works only on labeled data; enhancing the labeled dataset is suggested to improve accuracy. Further research can be directed toward segmenting the lesion area for severity grading.
A hyperspectral imaging and machine learning approach for detecting Flavescence Dorée grapevine disease is used in [11]. Auto-encoders are used for reducing the dimensionality of the hyperspectral images. The dataset consisted of 35 hyperspectral wine grape leaf images in 272 bands, but to reduce the computational complexity, the number of bands was reduced from 272 to 64. The proposed model achieved an accuracy of 83%. The authors suggest using the full 272-band data to improve the accuracy.
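The band-reduction step can be sketched as a simple fully connected autoencoder that compresses a 272-band pixel spectrum to 64 latent values. The layer sizes below are assumptions for illustration; the original paper's architecture may differ.

# Hypothetical sketch: autoencoder compressing 272 spectral bands to 64.
import tensorflow as tf
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(272,))                       # one pixel spectrum, 272 bands
encoded = layers.Dense(128, activation="relu")(inputs)
encoded = layers.Dense(64, activation="relu")(encoded)    # reduced 64-value representation
decoded = layers.Dense(128, activation="relu")(encoded)
decoded = layers.Dense(272, activation="linear")(decoded)

autoencoder = models.Model(inputs, decoded)
encoder = models.Model(inputs, encoded)                   # produces the reduced input
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(spectra, spectra, epochs=50)   # spectra: (n_pixels, 272) array
# reduced = encoder.predict(spectra)             # shape (n_pixels, 64)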
A pre-trained AlexNet DL model, i.e., transfer learning, is used in [12] for mango and grape leaf disease detection. The grape diseases addressed are Black Rot, Black Measles, and leaf blight. The model was trained and tested on 7222 grape leaf disease images taken from the PlantVillage dataset, and an accuracy of 99% was achieved. The authors used RGB images captured against a single background under uniform lighting conditions, so using a large dataset from an uncontrolled environment is suggested to increase accuracy.
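A hedged sketch of this transfer-learning setup, using the pretrained AlexNet from torchvision with its final layer replaced, is shown below. The four-class output (three diseases plus healthy) is an assumption for illustration, not the paper's confirmed configuration.

# Hypothetical sketch: fine-tuning a pretrained AlexNet head (PyTorch).
import torch.nn as nn
from torchvision import models

model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
for param in model.features.parameters():
    param.requires_grad = False              # freeze the convolutional feature extractor

model.classifier[6] = nn.Linear(4096, 4)     # Black Rot, Black Measles, leaf blight, healthy
# Only the classifier head is then trained on the grape leaf images.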
In [13], a fine-grained generative adversarial network (GAN) is used to classify five grape diseases, namely Leaf Spot, Round Spot, Downy Mildew, Anthracnose, and Sphaceloma, with limited training samples. The model achieved an accuracy of 96.27% using around 1500 images for the experimentation. The drawback of the system is that it can detect only a single main disease on a multi-diseased leaf.
The performance of AlexNet, GoogLeNet, and ResNet-18 for detecting three classes of grape diseases, namely Black Rot, Black Measles, and Isariopsis, is compared in [14]. Accuracies of 95.65% on AlexNet, 92.29% on GoogLeNet, and 89.49% on ResNet-18 were achieved. The authors used an annotated image dataset of around 1000 images from Kaggle.com for the experimentation.
In [15], real-time grape leaf disease detection using deep CNNs (Inception-ResNet-v2 and Inception V1) was implemented for four classes of grape disease, namely Black Rot, Black Measles, leaf blight, and mites of grape. The authors created a grape leaf disease dataset of 4449 images, captured in a controlled environment and in the vineyard, and used augmentation to enlarge the dataset to 62,286 images. The model achieved an accuracy of 99.47%; the authors express the need for an even larger dataset.
A united model of multiple convolutional neural networks, integrating GoogLeNet and ResNet, was used in [16] to classify three diseases, namely Black Rot, Esca, and Isariopsis Leaf Spot. The research used an image dataset of 1619 images from PlantVillage, and the united model achieved an accuracy of 98.57%. The authors express the need to enhance the dataset with images from uncontrolled environments with complex backgrounds and to extend the model to other crops.
Machine learning techniques like the support vector machine (SVM), Random Forest, and AdaBoost, combined with image processing techniques, were presented in [17] with 5675 grape leaf disease images from the PlantVillage dataset for the identification of three grape diseases: Black Rot, Esca, and leaf blight. The methodology achieved an accuracy of 93%.
In [18], a grape disease detection technique using a Back Propagation Neural Network (BPNN) and image processing is presented, using Wiener filtering along with the wavelet transform on a dataset of 300 images. Five types of grape diseases, Leaf Spot, Anthracnose, Downy Mildew, Round Spot, and Sphaceloma ampelinum de Bary, were detected with an accuracy of 80%.
Grape disease detection using Random Forest-based classification was presented in [19]. Back Propagation Neural Networks (BPNN), Probabilistic Neural Networks (PNN), support vector machines (SVM), and Random Forest were implemented and their performance compared. A dataset of 900 images captured in an uncontrolled environment was used, and the proposed model achieved an accuracy of 86%. The research targeted three fungal grape diseases, namely Anthracnose, Downy Mildew, and Powdery Mildew.

4 Summary

A tabular summary of the above grape disease detection survey is given in Table 2.

5 Challenges and Future Directions for Researchers

As far as the literature survey for plant and grape disease detection is concerned, most of the disease detection is carried out using the PlantVillage and ImageNet datasets, which are collected in a controlled environment. There is a need to design more accurate models to detect multiple diseases. For grape disease detection systems to be more accurate, a diverse dataset needs to be created by considering the real environment and not the laboratory environment; to address this issue, the dataset can be created by taking images with high-resolution smartphone RGB cameras, multispectral cameras, and hyperspectral cameras. There is a need to detect the disease stage-wise and inform farmers, and early detection of disease is important: it will be helpful to farmers if stage-wise special precautions are suggested to them. Also, if a precise estimation of the infected area of the plant is done, it is possible to control and minimize the unmanaged use of pesticides by the farmer, which calls for a precision spraying system to be implemented. According to the literature survey, most disease detection techniques work with data on diseases of the leaf section of plants; the research can be directed toward disease detection considering other parts of the plant, like the stem and fruit.

6 Conclusion

A survey of various disease detection methods for grape diseases is presented in this paper, along with a summary of existing methods and the challenges present. In the future, a more accurate disease detection model can be developed using a dataset created by capturing images in real-time scenarios under varying illumination conditions.

Table 2 Grape disease detection methods

Javidan et al. [8]
Methodology and diseases detected: Multiclass support vector machine; gray-level co-occurrence matrix (GLCM) and principal component analysis (PCA); CNN; GoogLeNet. Black Measles, Black Rot, and leaf blight
Accuracy: 98.71% (GLCM), 98.97% (PCA), 86.82% (CNN), 94.05% (GoogLeNet)
Limitations/future scope: Experimentation carried out on a limited dataset; images are taken from the PlantVillage dataset and not from the actual environment

Math et al. [9]
Methodology and diseases detected: Deep convolutional neural network
Accuracy: 99.34%
Limitations/future scope: Images are taken from the PlantVillage dataset and not from the actual environment

Lu et al. [10]
Methodology and diseases detected: Ghost convolution and Transformer network. Black Rot, leaf blight, Downy Mildew, Powdery Mildew, Brown Spot, Esca, nutrient deficiency
Accuracy: 98.14%
Limitations/future scope: Diagnosis of severity using lesion area segmentation; use of data augmentation for enhancing the diversity of the dataset

Silva et al. [11]
Methodology and diseases detected: Hyperspectral imaging and machine learning. Flavescence Dorée grapevine disease
Accuracy: 83.00%
Limitations/future scope: Experimentation with the full 272-band high-resolution image data

Sanath Rao et al. [12]
Methodology and diseases detected: Deep CNN, transfer learning (AlexNet). Black Measles, Black Rot, leaf blight
Accuracy: 99.03%
Limitations/future scope: Classify additional classes of disease and develop a recommendation system; prepare a diverse real-time dataset with varying lighting conditions

Zhou et al. [13]
Methodology and diseases detected: Fine-grained GAN. Leaf Spot, Round Spot, Downy Mildew, Anthracnose, Sphaceloma ampelinum de Bary
Accuracy: 96.27%
Limitations/future scope: Method applicable only to single-disease identification; cannot handle multi-label multiclass classification; in the future, multiple diseases on a single leaf may be detected and the model implemented on hardware

Lauguico et al. [14]
Methodology and diseases detected: AlexNet, GoogLeNet, ResNet-18. Black Rot, Black Measles, Isariopsis
Accuracy: 95.65%
Limitations/future scope: A real-time dataset is not used

Xie et al. [15]
Methodology and diseases detected: Deep CNN, Inception V1, Inception-ResNet-v2. Black Rot, leaf blight, Black Measles, mites of grape
Accuracy: 99.47%
Limitations/future scope: Classify additional classes of disease and improve accuracy

Ji et al. [16]
Methodology and diseases detected: Multiple CNNs (integration of GoogLeNet and ResNet). Black Rot, Esca, Isariopsis
Accuracy: 98.57%
Limitations/future scope: Create a real-time dataset; apply model compression to reduce computational resources

Jaisakthi et al. [17]
Methodology and diseases detected: Image processing and machine learning algorithms: SVM, AdaBoost, Random Forest. Black Rot, Esca, leaf blight
Accuracy: 93.00%
Limitations/future scope: A real-time dataset is not used

Zhu et al. [18]
Methodology and diseases detected: Image analysis (Wiener filter and wavelet transform) and a 3-stage BPNN. Anthracnose, Downy Mildew, Round Spot, Leaf Spot, Sphaceloma ampelinum de Bary
Accuracy: 80.00%
Limitations/future scope: Dataset size can be enlarged

Sandika et al. [19]
Methodology and diseases detected: PNN, BPNN, SVM, Random Forest. Anthracnose, Powdery Mildew, Downy Mildew
Accuracy: 86.00%
Limitations/future scope: Dataset size can be enlarged

The dataset can also be created by capturing multispectral and hyperspectral images and developing a model for them. It is not sufficient to classify a plant as diseased or non-diseased; it is also necessary to identify the type of disease, its severity, and the future prediction of its spread, so that the quantity of pesticides can be decided. Additional classes of diseases can be considered for model training, and a recommendation system for farmers can be developed so that they can take appropriate action to get rid of the disease.

References

1. Abade A, Ferreira PA, de Barros Vidal F (2021) Plant diseases recognition on images using convolutional neural networks: a systematic review. Comput Electron Agric, article 106125, pp 1–31
2. Ai Y, Sun C, Tie J, Cai X (2020) Research on recognition model of crop diseases and insect pests based on deep learning in harsh environments. IEEE Access 8:171686–171693
3. Lili L, Zhang S, Wang B (2021) Plant disease detection and classification by deep learning—a review. IEEE Access 9:56683–56698
4. Yuan Y, Chen L, Wu H, Li L (2021) Advanced agriculture disease image recognition technologies: a review. Inf Process Agric 9(1):48–59
5. Kumar V, Vishnoi KK, Kumar B (2021) Plant disease detection using computational intelligence and image processing. J Plant Dis Prot 128:19–53
6. Geetharamani G, Arun Pandian J (2019) Identification of plant leaf diseases using a nine-layer deep convolutional neural network. Comput Electr Eng 76:323–338
7. Too EC, Yujian L, Njuki S, Yingchun L (2019) A comparative study of fine-tuning deep learning models for plant disease identification. Comput Electron Agric 161:272–279
8. Javidan SM, Banakar A, Vakilian KA, Ampatzidis Y (2023) Diagnosis of grape leaf diseases using automatic K-means clustering and machine learning. Smart Agric Technol 3:100081
9. Math RM, Dharwadkar NV (2022) Early detection and identification of grape diseases using convolutional neural networks. J Plant Dis Prot 129:521–532
10. Lu X, Yang R, Zhou J, Jiao J, Liu F, Liu Y, Su B, Gu P (2022) A hybrid model of ghost-convolution enlightened transformer for effective diagnosis of grape leaf disease and pest. J King Saud Univ Comput Inf Sci 1–13
11. Silva DM, Bernardin T, Fanton K, Nepaul R, Joaquim LP, Sousa J, Cunha A (2022) Automatic detection of Flavescence Dorée grapevine disease in hyperspectral images using machine learning. Procedia Comput Sci 196:125–132. https://doi.org/10.1016/j.procs.2021.11.081
12. Sanath Rao U, Swathi R, Sanjana V, Arpitha L, Chandrasekhara K, Chinmayi P, Naik K (2021) Deep learning precision farming: grapes and mango leaf disease detection by transfer learning. Glob Transit Proc 2(2):535–544
13. Zhou C, Zhang Z, Zhou S, Xing J, Wu Q, Song J (2021) Grape leaf spot identification under limited samples by fine-grained GAN. IEEE Access 9:100480–100489
14. Lauguico S, Concepcion R, Tobias RR, Bandala A, Vicerra RR, Dadios E (2020) Grape leaf multi-disease detection with confidence value using transfer learning integrated to regions with convolutional neural networks. In: 2020 IEEE region 10 conference (TENCON), pp 767–772
15. Xie X, Ma Y, Liu B, He J, Li S, Wang H (2020) A deep learning-based real-time detector for grape leaf diseases using improved convolutional neural networks. Front Plant Sci 1–14
16. Ji M, Zhang L, Wu Q (2020) Automatic grape leaf diseases identification via united model based on multiple convolutional neural networks. Inf Process Agric 7(3):418–426
17. Jaisakthi SM, Mirunalini P, Thenmozhi D, Vatsala (2019) Grape leaf disease identification using machine learning techniques. In: 2019 international conference on computational intelligence in data science (ICCIDS), pp 21–23
18. Zhu J, Wu A, Wang X, Zhang H (2020) Identification of grape diseases using image analysis and BP neural networks. Multimed Tools Appl 79:14539–14551
19. Sandika B, Avil S, Sanat S, Srinivasu P (2016) Random forest based classification of diseases in grapes from images captured in uncontrolled environments. In: 2016 IEEE 13th international conference on signal processing (ICSP), pp 1775–1780
Chapter 6
URL Weight-Based Round Robin Load Balancing in Cloud Environment

Vijay Kumar Nampally, Satarupa Mohanty, and Prasant Kumar Pattnaik

1 Introduction

Cloud computing is an abstraction used to access networks, storage, servers, services, and applications shared by multiple users through the Internet. Patidar et al. [1] presented a detailed survey of cloud computing architecture and its uses. Figure 1 shows the cloud model and the set of services offered by the cloud.

1.1 Cloud Load Balancing

Zhou et al. [2] explained cloud load balancing as the process of distributing workloads in a cloud computing environment across different computing resources by balancing network traffic based on an assessment of the cloud resources. Cloud load balancing is used to meet an organization's needs by routing incoming traffic to multiple servers, networks, or other resources, improving performance and protecting against service disturbances. It can distribute workloads across two or more geographic locations, and a configuration policy routes requests to targets based on the traffic the load balancer receives.
Rahman et al. [3] and AlKhatib et al. [4] explained, respectively, how a load balancer as a service can be used in the cloud and the different load balancing techniques. The load balancer looks at all the individual nodes/targets, which should be fully operational.


Fig. 1 Cloud model

For balancing the load in the cloud, many algorithms exist, such as the Static Algorithm, Dynamic Algorithm, Round Robin Algorithm, Weighted Round Robin, Opportunistic Load Balancing, Minimum-to-Minimum (Min-Min) Load Balancing, Maximum-to-Minimum (Max-Min) Load Balancing, Least Connection, Weighted Least Connection, Resource-based, Request-based, and Response-Time Load Balancing algorithms. Jaiswal and Jain [5] showed that load balancing should be optimal in order to achieve better performance and better utilization of the cloud's resources. In general, load balancing is done with software rather than hardware because hardware costs more than software. The working of load balancers is shown in Fig. 2.
Normally, user requests interact with the cloud load balancer over the Internet to access cloud resources. The purpose of the cloud load balancer is to distribute user requests/traffic across resources, reducing the risk of performance issues in your applications. Generally, the resources are compute engines, computational servers, or virtual machine instances, and the servers in the cloud store their data in cloud storage buckets.

Fig. 2 Cloud load balancer working



The data in the database is likewise stored in the form of cloud storage buckets. Cloud load balancers can address traffic types like HTTP/HTTPS/TCP/UDP/ESP/GRE/ICMP and ICMPv6. The data is backed up in the backend, which can span multiple regions.

1.2 Cloud Computing Service Model Types

Islam and Hasan [6] explained different computing service and model types. To serve client requests, there are four service models available in the market. They are
1. On-Premise Environment
2. Infrastructure as a Service—IAAS
3. Platform as a Service—PAAS
4. Software as a Service—SAAS

On-Premise Environment
Here, everything from networking to applications must be taken care of by the user, not the cloud provider.
Infrastructure as a Service—IAAS
Infrastructure as a Service provides access to resources (virtual and physical machines, virtual storage, etc.) in the cloud environment. Examples are AWS, VMware, and Rackspace.
Platform as a Service—PAAS
PAAS provides a runtime environment for applications as well as development and deployment tools. Examples are Azure, Force.com, and Google App Engine.
Software as a Service—SAAS
Software as a Service allows end users to use software applications as a service. Examples are Google Docs, MS Office, and Gmail.

1.3 Cloud Load Balancing Features

Cloud load balancing features are used to create and configure the cloud environment as required by the user [7]. Each feature of the cloud is used for a specific purpose, as mentioned in Table 1.

Table 1 Cloud load balancing features

Single Anycast IP Address: A single Anycast IP address is a unique frontend IP address for all other backend instance regions worldwide
Autoscaling: Scaling is a feature in the cloud used to increase or decrease the resources of the cloud as required by the user on a pay-per-use basis
System type: The cloud load balancer system type can be software or hardware
Traffic type: The cloud load balancer can handle traffic types like HTTP/HTTPS/TCP/UDP/ESP/GRE/ICMP and ICMPv6
CDN integration: The cloud load balancer can be integrated with a CDN (content delivery network), a group of geographically distributed servers that work together to provide fast delivery of Internet content; a CDN caches content from the origin server on geographically distributed cache servers to reach users faster. It allows the fast transfer of the assets required for loading Internet content, like HTML pages, JavaScript files, CSS files, images, and videos. It is set up in two ways, a peer-to-peer (P2P) network or a peering/private model, and it dynamically performs routing using the Domain Name System (DNS)
Load distribution: Cloud load balancing distributes your load-balanced resources in single or multiple regions
Load balancing type: Load balancing can be external or internal. External load balancing is used when users reach your applications from the Internet; internal load balancing is used when clients are inside the cloud provider
Security: The CDN is integrated with a cloud armor or cloud security kit to secure your infrastructure from distributed denial-of-service (DDoS) attacks and attacks on your targeted applications
Advanced support: IPv6, WebSockets, user-defined request headers, source-IP-based traffic steering, and protocol forwarding for private VIPs
Network service tier: Premium/standard
OSI layers for load balancing: Balancing directs traffic based on data from network- and transport-layer protocols
Logging and metrics: Cloud load balancing can be analyzed using the concept called logging
Content authentication: Verified requests from the clients are served by the servers or VMs behind the cloud load balancer
Cost: A cloud load balancer costs less if it is implemented in software; it is costly if implemented in hardware

1.4 Cloud Load Balancing Approaches

A cloud load balancer distributes network traffic across resources using a software- or hardware-based approach [6]. When the two are compared on cost and performance, the software-based approach is the best.

Software-Based Approach
Here, software is used for balancing the load in the cloud environment.
Hardware-Based Approach
Here, hardware is used for balancing the load in the cloud environment.
Primary Cloud Platform Providers:
Many providers offer cloud load balancing services, including three major platforms: AWS, Azure, and GCP.
Company Name: Amazon.
Cloud Platform Name: Amazon Web Services (AWS).
Load Balancing: Rodge et al. [8] explained that Elastic Load Balancing distributes incoming traffic to targets (EC2 instances); Elastic Load Balancing in AWS comes in Application, Network, Gateway, and Classic variants.
Company Name: Google.
Cloud Platform Name: Google Cloud Platform (GCP) [9].
Load Balancing: Mishra et al. [9] showed how load balancing is rendered in Google Cloud; it is built on Google's front-end server infrastructure.
Company Name: Microsoft.
Cloud Platform Name: Azure [10].
Load Balancing: Load balancing uses Azure Traffic Manager to distribute incoming traffic to targets; Carutasu et al. [10] used the concept of VMs to distribute incoming traffic to targets.

1.5 Cloud Load Balancing Benefits

Joshi and Kumari [11] explained in their paper how cloud load balancing is used to control traffic; increase resource utilization, resource availability, throughput, performance, response time, and so on; and reduce infrastructure cost, latency, fault-tolerance overhead, and migration time. Cloud load balancing is used to scale resources (add/scale up and remove/scale down). It is used to meet client demands by keeping a high number of client connections and serving distributed workloads with the resources fully operational.

1.6 Challenges of Cloud Load Balancing

Cloud creation and management are very easy, but the cloud has some challenges that must be managed by highly skilled employees, users, or customers when dealing with sensitive areas of the cloud like task migration, cloud interoperability, and security. The major challenges of the cloud are mentioned in Table 2.

Table 2 Challenges of cloud load balancing (Sreenivas et al. [7] showcased the different challenges posed in cloud load balancing)

Task migration: The purpose of task migration is to move tasks from an overloaded virtual machine to a non-overloaded virtual machine
Energy management: Energy management in the cloud should be good to achieve better performance
Stored data management: Data in the cloud should be appropriately distributed for fast access and storage
Use of small, different datacenters: Small, different datacenters are always used for optimal resource utilization and cloud computing in case of emergency
Cloud node distribution: All the nodes should be distributed spatially in the cloud at accessible locations
Cloud interoperability: Cloud interoperability is the ability of one cloud service to interact with other cloud services by exchanging information
Storage efficiency: Storage efficiency comes from using the concept of data replication across different nodes in the cloud
Load balancing algorithm complexity: The complexity of the load balancing algorithm should always be low for operation and execution
Fault tolerance/controller failure: Another controller must handle load balancing if the primary controller fails
Security: The load balancing algorithm has to look after data security before, during, and after processing

1.7 Applications of Cloud Load Balancing

Cloud load balancing can be used in various real-time applications, some of which are mentioned in Table 3.

2 Literature Survey

Many researchers have contributed their work to cloud load balancing. The different research papers and their methods are given in Table 4.

Table 3 Applications of cloud load balancing

1. Art applications: Used to design applications like cards, booklets, and images. Examples: Adobe Creative Cloud, Moo, Vistaprint
2. Business applications: Ensure that business applications are available to users 24×7. Examples: Salesforce, MailChimp and Chatter, Bitrix24 and PayPal, Slack and QuickBooks
3. Data storage and backup applications: Used to store information. Examples: Box.com, Mozy, Joukuu, Google G Suite
4. Education applications: Used by students to improve their skills. Examples: Google (web-based email, calendar, documents, and collaborative study)
5. Entertainment applications: Used in online games and video-conferencing apps
6. Management applications: Used by cloud administrators to manage cloud activities. Examples: Toggl (tracks the period allocated to a particular project), Evernote (saves notes), Outright (manages user accounts), GoToMeeting (video conferencing)
7. Social applications: Used by users to connect. Examples: Facebook, Twitter, Yammer, LinkedIn

3 Proposed Methodology: URL Weight-Based Round Robin Cloud Load Balancing in Cloud Servers

In the URL weight-based Round Robin cloud load balancing algorithm, every requested URL is assigned a specific weight (1 or 2) by the load balancer as a time slice: a weight of 1 for a standard page request and a weight of 2 for a database request.
The load balancer forwards the tasks to a particular server, and the server assigns the tasks to particular VMs, which process or redirect them until all tasks are completed. Here, the load balancer sends tasks to servers, and servers send tasks to VMs; VMs can send tasks to other VMs or servers, which is called task redirection/migration. Task migration is done until all tasks are completed. The flowchart of the proposed algorithm is shown in Fig. 3.

Table 4 Research papers and research area (all of the following target balancing load in a cloud environment)

1. Static algorithm (2017): The total traffic is distributed equally across all the servers, and the load-shifting decision does not depend on the system's present state [12]
2. Dynamic algorithm (2018): A server with less load in the complete network is given higher preference, and the load balancing decision depends on the system's present state; processes are moved in real time from a machine with many loads/tasks to a machine with fewer loads/tasks [13]
3. Round robin algorithm (2018): Each task is assigned a specified time slice for completion by the load balancer, and tasks are handed to the servers in round robin fashion (circularly) [14]
4. Weighted round robin load balancing algorithm (2014): A type of round robin algorithm where tasks/jobs are assigned specific weights; based on these weights, they are assigned to servers, and usually higher-weighted servers are assigned more tasks [15]
5. Opportunistic load balancing algorithm (2020): Does not consider the system's current workload; it considers the workload of every node and randomly distributes uncompleted tasks to these nodes [16]
6. Min-Min load balancing algorithm (2018): Tasks that take less time to complete are scheduled first, i.e., tasks are arranged by completion time (minimum to minimum) and assigned to servers for processing by the load balancer [17]
7. Max-Min load balancing algorithm (2014): Tasks that take more time to complete are scheduled first, i.e., tasks are arranged by completion time (maximum to minimum) and assigned to servers for processing by the load balancer [18]
8. Least connections (2017): Workload is routed to the instances (the least busy instances) with fewer connections [19]
9. Weighted least connection (2015): Every node is assigned a value by administrators, and least-connection traffic distribution is done based on the assigned value [20]
10. Resource-based (2018): A software agent at each node sends complete details to the load balancer, which takes dynamic traffic-routing decisions with that information [6]
11. Request-based (2020): The load balancer distributes traffic based on fields in query parameters, header data, and source and destination IP addresses, which helps move traffic from particular sources to intended destinations and maintain sessions [21]
12. Response-time load balancing algorithm (2014): Tasks are assigned based on the response time of previously completed tasks, i.e., the task with the least response time is given to the cloud load balancer [22]

Fig. 3 Flowchart for the proposed algorithm



Algorithm for URL weight-based Round Robin Cloud Load Balancing in Cloud Servers
1. Initialize datacenters with VMs, cloudlets, and a broker
   a. Create VMs with specifications
      i. Assign each VM a capacity of 100, placed at a unique datacenter.
   b. Create cloudlets with specifications
      i. Assign a load of 2 for database requests and a load of 1 for HTTP requests.
   c. Create a broker to transfer cloudlets to datacenters.
   d. Broker_0: cloud resource list received with n resource(s)
      i. Create VM(s) in datacenter(s)
         1. VM #0 has been allocated to host #0 in Datacenter_0
         2. VM #1 has been allocated to host #0 in Datacenter_1
         3. VM #n-1 has been allocated to host #0 in Datacenter_n-1
         4. VM #n has been allocated to host #0 in Datacenter_n
2. Invoke the scheduler and load balancer
   a. Specify the scheduler policy and call the load balancer
      i. Get the datacenter ID list
      ii. Distribute requests for new VMs across datacenters using round robin
         1. Initialize number of VMs allocated = 0;
         2. Initialize available datacenters;
         3. If datacenter capacity is not full:
            a. For each VM, get the datacenter ID in round robin fashion
               // datacenterId = availableDatacenters.get(i++ % availableDatacenters.size());
            b. Increment number of VMs allocated;
            c. Send acknowledgment to the broker
3. The broker sends cloudlets in round robin fashion
   a. Cloudlet 0 to VM #0
   b. Cloudlet 1 to VM #1
   c. Cloudlet n-1 to VM #n-1
   d. Cloudlet n to VM #n
4. The broker receives the completed cloudlets
5. The broker destroys the VMs
6. Shut down the datacenters and the broker
In the above algorithm, cloudlets are the small data centers with which the VMs are associated. The workload requests are assigned to these VMs in round robin fashion based on URL weight. The weights are assigned based on the waiting time required for each VM: after calculating the waiting time for all the VMs, a weight is assigned to each VM, and the VMs are sorted in ascending order. A load of 2 is assigned for database requests and a load of 1 for HTTP requests. A broker is then created to transfer cloudlets to the datacenters, and it sends the cloudlets in round robin fashion. If a datacenter's capacity is not full, new workloads are assigned to it; otherwise, workloads are assigned to new cloudlets until all requests are completed. Every cloudlet, VM, and task assignment is done automatically using the GridSim tool.
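To convey the dispatch policy outside the simulator, the following self-contained Python sketch illustrates the core idea; it is a simplification, not the GridSim/CloudSim implementation described above. Each URL gets weight 2 for a database request or 1 for a standard page, and a round-robin pointer skips servers that lack spare capacity for that weight. The capacity value and the URL-classification rule are illustrative assumptions.

# Hypothetical sketch of URL weight-based round robin dispatch.
from itertools import cycle

class Server:
    def __init__(self, name, capacity=100):
        self.name, self.capacity, self.load = name, capacity, 0

def url_weight(url):
    return 2 if "/db/" in url else 1        # illustrative classification rule

def dispatch(servers, urls):
    rr = cycle(servers)                     # round-robin pointer over servers
    placement = {}
    for url in urls:
        w = url_weight(url)
        for _ in range(len(servers)):       # at most one full round-robin pass
            s = next(rr)
            if s.load + w <= s.capacity:
                s.load += w
                placement[url] = s.name
                break
        else:
            placement[url] = None           # all servers full: task waits/migrates
    return placement

servers = [Server(f"vm{i}") for i in range(3)]
print(dispatch(servers, ["/index.html", "/db/orders", "/about.html"]))
# -> {'/index.html': 'vm0', '/db/orders': 'vm1', '/about.html': 'vm2'}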

4 Results

We ran the simulation for more than one hour (approximately 100 runs) on different numbers of tasks with random-length cloudlets (tasks) and calculated the results using the space-shared policy in CloudSim. We consider 5 virtual machines with a bandwidth of 1000 Mbps and 1 CPU per virtual machine, with the number of tasks ranging from 100 to 500 per virtual machine and task lengths varying from 10,000 MI to 200,000 MI. The computational results show that the proposed algorithm reduces the makespan compared to the FCFS, SJF, and Min-Min algorithms, as shown in Table 5; Fig. 4 shows the comparison between tasks and makespan.
Here, the main data center consists of cloudlets, and each cloudlet contains VMs used to receive the requests/tasks from the users. Each VM processes a different set of tasks, and each task's completion time and makespan differ based on the workload of the task and its URL weight (1 for a lower-weight URL, 2 for a higher-weight URL).

Table 5 Task distribution to VMs and their makespans

No. of VMs   Makespan (100 tasks)   Makespan (300 tasks)   Makespan (500 tasks)
1            796.91                 2390.54                3984.16
2            403.33                 1209.78                2016.23
3            271.02                 806.55                 1346.87
4            203.30                 609.76                 1016.23
5            162.66                 487.84                 812.96

Fig. 4 Comparison of tasks versus makespan



Makespan
It is the total time taken by a set of jobs to execute completely; hence, minimizing the makespan is important when allotting tasks to the VMs with any algorithm. Every task in the cloud can be compared with another on the parameters number of VMs, number of tasks, and makespan [23, 24].
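Concretely, if each VM executes its assigned tasks back to back, the makespan is the completion time of the slowest VM. A minimal sketch follows; the task lengths (MI) and VM speed (MIPS) are illustrative values, not taken from the experiments.

```python
def makespan(task_lengths_per_vm, vm_mips):
    """Makespan = completion time of the last-finishing VM, assuming each VM
    executes its assigned tasks sequentially."""
    return max(sum(lengths) / mips
               for lengths, mips in zip(task_lengths_per_vm, vm_mips))

# Two equally fast VMs, one with a heavier queue:
print(makespan([[10000, 20000], [15000]], [100, 100]))  # -> 300.0
```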

5 Conclusion

In general, for balancing load in the cloud, any one of the following algorithms can be used: static, dynamic, round robin, weighted round robin, opportunistic load balancing, minimum-to-minimum, maximum-to-minimum, least connection, weighted least connection, resource-based, request-based, and response-time load balancing. In this paper, however, we use weights with round robin. In URL weight-based round robin cloud load balancing, every request is classified into one of two categories and handed to the load balancer. The load balancer forwards the tasks to particular VMs for processing, or redirects the tasks until all of them are completed, using the values assigned to the URL as a time slice. The main parameters used to evaluate the algorithm's performance are the number of virtual machines, the number of tasks, and the makespan. URL weight-based round robin cloud load balancing should be implemented using a software-based approach for better performance and utilization of resources in a cloud environment.

References

1. Patidar S, Rane D, Jain P, A survey paper on cloud computing. In: 2012 second international
conference on advanced computing and communication technologies
2. Zhou M, Zhang R, Zeng D, Qian W, Services in the cloud computing era: a survey. 978-1-
4244-7820-0/10/$26.00 ©2010 IEEE IUCS2010
3. Rahman M, Iqbal S, Gao J (2014) Load balancer as a service in cloud computing. In: 2014
IEEE 8th international symposium on service oriented system engineering
4. AlKhatib AAA, Sawalha T, AlZu’bi S (2020) Load balancing techniques in software-defined
cloud computing: an overview. In: 2020 seventh international conference on software defined
systems (SDS)
5. Jaiswal AA, Jain S (2014) An approach towards the dynamic load management techniques in
cloud computing environment. 978-1-4799-7169-5/14/$31.00 ©2014 IEEE
6. Islam T, Hasan MS (2017) A performance comparison of load balancing algorithms for cloud
computing. 978-1-5386-3148-5/17/$31.00 © 2017 IEEE
7. Sreenivas V, Prathap M, Kemae M, Load balancing techniques: major challenge in cloud
computing – a systematic review
8. Rodge AS, Pramanik C, Bose J, Soni SK (2014) Multicast routing with load balancing using
amazon web service. In: 2014 annual IEEE India conference (INDICON)

9. Mishra SK, Sahoo B, Parida PP (2018) Load balancing in cloud computing: a big picture.
Preprint Submitted J LATEX Templates
10. Carutasu G, Botezatu MA, Botezatu C (2017) Cloud computing and windows azure
11. Joshi S, Kumari U (2016) Load balancing in cloud computing: challenges & issues. 978-1-
5090-5256-1/16/$31.00_c 2016 IEEE
12. A survey on load balancing algorithms in cloud computing (2017). Int J Autonomic Comput
13. Patel KD, Bhalodia TM, An efficient dynamic load balancing algorithm for virtual machine in
cloud computing. IEEE Xplore Part Number: CFP19K34-ART; ISBN: 978-1-5386-8113-8
14. Ghosh S, Banerjee C (2018) Dynamic time quantum priority based round robin for load
balancing in cloud environment. In: 2018 fourth international conference on research in
computational intelligence and communication networks (ICRCICN)
15. Wang W, Casale G (2014) Evaluating weighted round robin load balancing for cloud web
services. In: 2014 16th international symposium on symbolic and numeric algorithms for
scientific computing
16. Ojha SK, Rai H, Nazarov A (2020) Optimal load balancing in three level cloud computing
using osmotic hybrid and firefly algorithm. In: 2020 international conference engineering and
telecommunication (En& T) | 978-1-7281-8829-4/20/$31.00 ©2020 IEEE | https://doi.org/10.
1109/ENT50437.2020.9431250
17. Vishalika, Malhotra D (2018) LD_ASG: load balancing algorithm in cloud computing. In:
5th IEEE international conference on parallel, distributed and grid computing (PDGC-2018),
Solan, India
18. Li X, Mao Y, Xiao X, Zhuang Y (2014) An improved max-min task-scheduling algorithm for
elastic cloud. In: 2014 international symposium on computer, consumer and control
19. Islam T, Hasan MS (2017) A performance comparison of load balancing algorithms for cloud
computing. 978-1-5386-3148-5/17/$31.00 © 2017 IEEE 130
20. Kang L, Ting X (2015) Application of adaptive load balancing algorithm based on minimum
traffic in cloud computing architecture. 978-1-4799-1891-1/15/$31.00 ©2015 IEEE
21. Mohammed MA, Hasan RA, Ahmed MA, Tapus N, Shanan MA, Khaleel MK, Ali AH (2018)
A focal load balancer based algorithm for task assignment in a cloud environment. 978-1-5386-
4901-5/18/$31.00 ©2018 IEEE
22. Swarnakar S, Kumar N, Kumar A (2020) Modified genetic based algorithm for load balancing
in cloud computing. 978-1-7281-7340-5/20/$31.00 ©2020 IEEE
23. Sharma A, Peddoju SK (2014) Response time based load balancing in cloud computing. 978-
1-4799-4190-2/14/$31.00 ©2014 IEEE
24. Al-Maytami BA, Fan P, Hussain A, Baker T, Liatsis P, A task scheduling algorithm with
improved makespan based on prediction of tasks computation time algorithm for cloud
computing. Digital object identifier. https://doi.org/10.1109/ACCESS.2019.2948704
Chapter 7
Determination of Thickness
and Refractive Indices of Thin Films
from Reflectivity Spectrum Using Rao-1
Optimization Algorithm

Bhautik H. Gevariya, Sanjaykumar J. Patel, and Vipul Kheraj

1 Introduction

Anti-reflective coatings (ARC) are commonly utilized to reduce undesired reflection from the surface and improve the performance of many optoelectronic devices. Because of the abrupt change in refractive index at the boundary between the surrounding medium, generally air, and the semiconductor active layer, materials used for optoelectronic devices (e.g., Si, GaAs, and InP) often display high reflectivity in the region of 30–40%. By applying ARC at the boundary between the semiconductor active
layer and the surrounding medium, photon collection in solar cells and emission
of photons in laser diode (LD) and superluminescent LED could be enhanced. For
narrow wavelength ranges, a single or double-layer AR optical coating with quarter
wave optical thickness of dielectric materials can effectively reduce reflectance at
certain incident angles. Monochromatic devices, such as laser diodes, can benefit
from such designs [1]. However, various devices, such as solar cells [2–4] and detec-
tors [5, 6], require a very low reflectivity throughout a broader wavelength band
to enhance the efficiency of light collection and to enhance the efficiency of light
emitting from light-emitting diodes (LED) [7] and superluminescent LED [8, 9]. AR
coatings with more advanced thin film designs with many layers or a continuously
graded refractive index layer are required for that type of application. The graded

B. H. Gevariya · V. Kheraj (B)


Department of Physics, S. V. National Institute of Technology, Surat 395007, India
e-mail: vk@phy.svnit.ac.in
S. J. Patel
Department of Physics, School of Science and Technology, Vanita Vishram Women’s University,
Vanita Vishram, Surat, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 77
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_7

refractive index layer design has been proven to be effective in providing the neces-
sary AR coating performance for a variety of applications [10, 11]. For such a design,
the required spectral response, namely the reflectance and transmittance spectrum,
may be achieved by adjusting the refractive index (n) and thickness (t) of chosen
ARC materials. Hence, precise knowledge of the thickness and refractive index of anti-reflective thin films is always required for designing optical coatings, through which the performance of optoelectronic devices may be enhanced; frequent measurements are therefore essential, and an easy and fast technique for determining the thickness and refractive index of optical thin films is of significant importance.
Spectroscopic ellipsometry (SE) [12, 13] and spectrophotometry [14, 15] are
commonly used methods for the determination of the n and t of thin films. Though the former method is more robust and reliable, it is significantly costlier; with that in mind, the latter gives comparatively good results. It can be used with a
multi-wavelength spectrum fitting technique, in which the experimentally measured
reflectance and/or transmittance spectrum are fitted with the theoretically calcu-
lated results using any optimization algorithm to determine the film’s thickness and
refractive index for a required wavelength domain. In the multi-wavelength technique, the refractive index is closely related to the wavelength, and this relationship can be described by certain optical dispersion equations, which yield excellent results for a wide range of materials over a wide wavelength range. Several
global optimization algorithms have been effectively employed to determine n and
t of thin films, including particle swarm optimization [16], genetic algorithm [17,
18], pattern search [19], artificial neural network [20], simulated annealing [21] and
TLBO [22, 23]. However, in order to find the best solution, they need some algorithm-
specific parameters. As an example, PSO uses inertia weight, social, and cognitive
parameters. Similarly, GA utilizes mutation probability, crossover rate, and selection
operator. Furthermore, these factors are problem-specific, and determining optimal
values for these parameters is challenging. Improper parameter selection for these
algorithm-specific parameters may even increase calculation time or result in a local
optimum instead of a global one.
R. Venkata Rao introduced a new optimization algorithm, the Rao-1 optimization
algorithm [24], which significantly reduces the above-mentioned limitations. The
beauty of this algorithm is that no algorithm-specific parameters are needed. It simply needs very few input parameters, such as the number of iterations and the population size, which are common to every nature-inspired optimization algorithm. Until now, to our
knowledge, the Rao-1 optimization algorithm has not been used in the literature to
determine ARC thin film thickness and refractive index.
In this paper, the reflectivity of optical ARC thin films is measured using a spec-
trophotometric reflectometry method. This procedure is quite straightforward, non-
destructive, and relatively very simple to set up in the laboratory. The Rao-1 algorithm
is then used to fit the experimentally measured reflectivity spectra to theoretical ones.
PyCharm software is used to implement the algorithm, which is written in Python
(version 3.9).

2 Rao-1 Algorithm

The Rao-1 algorithm is a simple metaheuristic population-based algorithm that, like other optimization algorithms, only depends on the results obtained by the population to proceed toward the global optimum [24].
Suppose f(x) is the objective function to be minimized or maximized. Consider 'v' unknown parameters (i.e., m = 1, 2, …, v) and 'p' possible solutions or population members (i.e., n = 1, 2, …, p). The best member of the population is denoted f(x)_best and the worst is denoted f(x)_worst. If X_{l,m,n} is the mth parameter value of the nth population member at the lth iteration, then the updated value is given by Eq. (1) below:

X'_{l,m,n} = X_{l,m,n} + r_{l,m,1} (X_{l,m,best} − X_{l,m,worst})    (1)

where X_{l,m,best} is the mth parameter value of the best solution, X_{l,m,worst} is the mth parameter value of the worst solution, X'_{l,m,n} is the updated value of X_{l,m,n}, and r_{l,m,1} is a random number in the range 0–1 generated for the mth parameter at the lth iteration. X'_{l,m,n} is accepted if it improves the objective function's value; otherwise the old solution is retained. All the accepted objective function values at the end of the iteration are kept and used as the input for the next iteration.
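Purely as an illustration of Eq. (1), a minimal NumPy sketch of one Rao-1 iteration for a minimization problem is given below. The greedy acceptance follows the description above; the bounds clipping and the sphere objective in the usage lines are our own additions.

```python
import numpy as np

def rao1_step(pop, objective, lower, upper, rng):
    """One Rao-1 iteration (Eq. 1): move every candidate toward the current best
    and away from the current worst, keeping a move only if it improves the
    objective value (greedy acceptance)."""
    fitness = np.array([objective(x) for x in pop])
    best = pop[np.argmin(fitness)]             # X_{l,m,best} for minimization
    worst = pop[np.argmax(fitness)]            # X_{l,m,worst}
    r = rng.random(pop.shape)                  # r_{l,m,1}, uniform in [0, 1]
    cand = np.clip(pop + r * (best - worst), lower, upper)  # clipping: our addition
    cand_fitness = np.array([objective(x) for x in cand])
    keep = cand_fitness < fitness              # accept only improving moves
    pop[keep] = cand[keep]
    return pop

# Toy usage: minimize the sphere function with 20 candidates in 5 dimensions.
rng = np.random.default_rng(0)
population = rng.uniform(-5.0, 5.0, size=(20, 5))
for _ in range(1000):
    population = rao1_step(population, lambda x: float(np.sum(x**2)),
                           -5.0, 5.0, rng)
```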

3 Application of Rao-1 Algorithm for Determination


of Thickness and Refractive Index of ARC Thin Film

The thickness and refractive index are determined by utilizing the experimentally obtained reflectivity data for the optical AR thin film. The Sellmeier dispersion relation [25] up to two terms is used in this study to determine the refractive index over the considered wavelength range. Eq. (2) gives the Sellmeier equation that is utilized:

n^2(λ) = 1 + B_1 λ^2 / (λ^2 − C_1^2) + B_2 λ^2 / (λ^2 − C_2^2)    (2)

Hence, the four Sellmeier coefficients B_1, B_2, C_1, and C_2 and the thickness (t) form the population, which contains all the unknown parameters or variables, depicted as P_i = (B_{1i}, C_{1i}, B_{2i}, C_{2i}, t_i), where i = 1, 2, 3, …, N and N is the population size. The quality of an individual population member (P_i) is decided by calculating the value of the specified fitness function, which is given by
F(P) = (1/s) Σ_{k=1}^{s} [ R_exp(λ_k) − R_cal(λ_k, B_{1i}, C_{1i}, B_{2i}, C_{2i}, t_i) ]²    (3)

Table 1 Terminology of the Rao-1 algorithm with respect to the present problem

Rao-1 algorithm term          Equivalent parameter in present problem
Unknown variables             Thickness of the layer and the Sellmeier coefficients
Population                    All four Sellmeier coefficients and the thickness
X_{i,j,best}, X_{i,j,worst}   Values of the unknown variables from the best and worst solutions in the population, i.e., those with minimum and maximum fitness function values
Search space                  Lower and upper bound values for all Sellmeier coefficients and the film thickness

where R_exp(λ_k) is the value of reflectivity measured experimentally at wavelength λ_k and R_cal(λ_k, B_{1i}, C_{1i}, B_{2i}, C_{2i}, t_i) is the value of reflectivity calculated theoretically at wavelength λ_k using the transfer matrix method [26] with the help of the five unknown parameters, namely B_{1i}, C_{1i}, B_{2i}, C_{2i}, t_i. Here s is the total number of points at which the reflectivity is measured. The values of the variables are optimized such that the fitness function value of P_i calculated as per Eq. (3) improves iteration by iteration using the Rao-1 algorithm. By doing this iteratively, the best match of the theoretically calculated reflectivity values with the experimentally observed ones is found across the considered wavelength range. For the execution of the code, three control parameters are required: the number of unknown parameters, the number of iterations, and the initial search range, which must be provided to the code initially as input parameters. The unknown variables are optimized iteratively for the considered problem by using the Rao-1 algorithm. The terminology of the Rao-1 algorithm in respect of the considered problem is given in Table 1. The population size was varied from 20 to 100 in steps of 20 with 50 runs for each population size, and the number of iterations was kept constant at 1000 for the whole exercise.
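To make the objective concrete, the sketch below evaluates Eq. (2) and Eq. (3) for a single non-absorbing layer. The chapter computes R_cal with the transfer matrix method [26]; here we substitute the equivalent closed-form normal-incidence reflectance of one film on a substrate, and the constant real substrate index n_sub is a simplifying assumption (the actual InP index is dispersive and complex).

```python
import numpy as np

def sellmeier_n(lam_um, B1, C1, B2, C2):
    """Two-term Sellmeier refractive index, Eq. (2); wavelengths in micrometres."""
    lam2 = lam_um ** 2
    return np.sqrt(1 + B1 * lam2 / (lam2 - C1**2) + B2 * lam2 / (lam2 - C2**2))

def reflectance_single_layer(lam_um, n_film, t_um, n_sub, n0=1.0):
    """Normal-incidence reflectance of one non-absorbing film on a substrate
    (the closed form that the transfer matrix method reduces to for one layer)."""
    r01 = (n0 - n_film) / (n0 + n_film)         # air/film Fresnel coefficient
    r12 = (n_film - n_sub) / (n_film + n_sub)   # film/substrate Fresnel coefficient
    beta = 2 * np.pi * n_film * t_um / lam_um   # phase thickness of the film
    r = (r01 + r12 * np.exp(-2j * beta)) / (1 + r01 * r12 * np.exp(-2j * beta))
    return np.abs(r) ** 2

def fitness(params, lam_um, R_exp, n_sub=3.2):
    """Eq. (3): mean squared error between measured and modelled reflectivity.
    n_sub = 3.2 is an assumed constant InP index, for illustration only."""
    B1, C1, B2, C2, t_um = params
    n = sellmeier_n(lam_um, B1, C1, B2, C2)
    return float(np.mean((R_exp
                          - reflectance_single_layer(lam_um, n, t_um, n_sub))**2))
```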

4 Thin Film Deposition and Spectrophotometric


Reflectivity Measurement

A thin film of each of the frequently utilized optical ARC materials, namely magnesium fluoride (MgF2), aluminum oxide (Al2O3), and silicon dioxide (SiO2), is grown separately on an Indium Phosphide (InP) substrate at 100 °C under high-vacuum conditions (10⁻⁶ mbar). The deposition is carried out with the help of a 3 kW electron beam
evaporation unit provided with a 180° bend electron beam gun facility. The film’s
deposition rate and thickness were monitored by a quartz crystal oscillator integrated
within the chamber as it grew. The radiant heater mounted within the chamber is used
to heat the substrate.

In this paper, a reflectometry experimental setup established in our laboratory


was used to measure the reflectivity spectrum of the prepared samples. The setup contains an optical chopper (SR-540), a monochromator (Oriel Cornerstone 260, 1/4 m), a light source (Ocean Optics Model No. HL-2000), a silicon detector (Edmund Optics NT53-373), and a lock-in amplifier (SR-830). The reflectance measurement was carried out over the considered wavelength range of 450–850 nm in steps of 5 nm at a low angle of incidence for all samples deposited on the InP substrate.

5 Results and Interpretation

To our knowledge, this is the first time that the Rao-1 algorithm has been used to evaluate the refractive index and thickness of a thin film. However, before using an algorithm in a practical application, it is critical to assess its efficiency. Keeping this in mind, standard ellipsometric measurements are utilized as an experimental verification tool. The experimentally obtained reflectivity spectra of the thin films are matched with theoretically calculated ones with the help of the transfer matrix method, using the self-developed program that employs the Rao-1 algorithm to determine the thickness and refractive index of all thin films. The thickness and refractive index values estimated by Rao-1 are compared with ellipsometric measurements of the same samples. The results obtained for the various thin film samples are analyzed and discussed in the subsequent subsections.

5.1 A Single-Layer MgF2 Deposited on InP Substrate

We validated our self-developed program on an MgF2 film deposited on an InP substrate. The experimental curve (blue dashed line) and its fitted calculated reflectivity curve (black solid line) optimized by the Rao-1 algorithm over the wavelength range 450–850 nm are shown in Fig. 1a. It is clearly observed that the optimized curve obtained from the algorithm is precisely superimposed on the experimental curve. Figure 1b shows the wavelength-dependent refractive index of the MgF2 film optimized by the Rao-1 algorithm (blue dashed line) and the corresponding values obtained for the same sample by ellipsometry (black solid line). The optimized refractive index values for the MgF2 film over the considered wavelength range 450–850 nm are in close agreement with the results acquired by ellipsometry, as shown in the figure. The thickness values obtained by ellipsometry measurement and optimized by our algorithm are shown in Table 2. Figure 1c shows the evolution of the fitness function with iterations, with enlarged sections.

[Figure 1 shows, for a single-layer MgF2 film on InP: (a) reflectivity (%) versus wavelength λ (μm), experimental versus fitted curves; (b) refractive index (n) versus wavelength λ (μm), ellipsometry versus optimization; (c) fitness value versus number of iterations.]

Fig. 1 a Fitted reflectivity spectrum for MgF2 . b Wavelength-dependent refractive index for MgF2 .
c Fitness function evaluation with iteration for MgF2

Table 2 Comparison between optimized and ellipsometrically measured thickness

Types of coating   Measured thickness using ellipsometry (Å)   Calculated thickness (Å) (% relative error)        Average % relative error in refractive index
                                                               Rao-1                 TLBO [22]                    Rao-1       TLBO [22]
MgF2 on InP        1327                                        1326.006 (0.074906)   1326 (0.075358)              0.247394    0.199119
Al2O3 on InP       2520                                        2518.423 (0.062579)   2519 (0.039682)              1.071486    0.586557
SiO2 on InP        2384                                        2305.503 (3.292659)   2317 (2.8104)                0.514212    0.601356

5.2 A Single-Layer Al2 O3 Deposited on InP Substrate

To validate the credibility of the self-developed program, we applied it to an Al2O3 film coated on an InP substrate. The experimental curve (blue dashed line) and its fitted calculated reflectivity curve (black solid line) optimized by the Rao-1 algorithm over the wavelength range 450–850 nm are shown in Fig. 2a. It is observed that the optimized curve obtained from the algorithm is nearly superimposed on the experimental curve, with some minor deviation at the end, for the optimized thickness and refractive index values. Table 2 shows the thickness values obtained by ellipsometry measurement and optimized by our algorithm. Even in the case of the Al2O3 thin film, we observed that the thickness value obtained by the algorithmic approach is in excellent agreement with the ellipsometry measurement.

[Figure 2 shows, for a single-layer Al2O3 film on InP: (a) reflectivity (%) versus wavelength λ (μm), experimental versus fitted curves; (b) refractive index (n) versus wavelength λ (μm), ellipsometry versus optimization; (c) fitness value versus number of iterations.]

Fig. 2 a Fitted reflectivity spectrum of Al2 O3 . b Wavelength-dependent refractive index for Al2 O3 .
c Fitness function evaluation with iteration for Al2 O3

The graphical representation of the wavelength-dependent refractive index obtained by the Rao-1 algorithm (black solid line) and by ellipsometry measurement (blue dashed line) for the Al2O3 film is shown in Fig. 2b. Although the difference between the refractive index values produced by Rao-1 and those attained by ellipsometry is considerably larger for Al2O3 than for MgF2, it may still be within an acceptable tolerance for most optical coating applications. Figure 2c shows the evolution of the fitness function with iterations, with enlarged sections.

5.3 A Single-Layer SiO2 Deposited on InP Substrate

To demonstrate the flexibility of our self-developed program across a variety of ARC materials, we applied it to SiO2 deposited on an InP substrate. Figure 3a depicts the experimental curve (blue dashed line) and its fitted calculated reflectivity curve (black solid line) optimized by the Rao-1 algorithm over the wavelength range 450–850 nm. It is clearly observed from the figure that the optimized curve obtained from the algorithm is nearly superimposed on the experimental curve with a slight deviation. However, in this case, the thickness determined by the Rao-1 algorithm deviates more from the value obtained by ellipsometry than it does for MgF2 and Al2O3.
Figure 3b shows the wavelength-dependent refractive index for the SiO2 thin film as optimized by the Rao-1 algorithm (black solid line) and the corresponding values obtained for the same sample by ellipsometry (blue dashed line). Figure 3c shows the evolution of the fitness function with iterations, with an enlarged section.
For all three AR coating materials examined in this study, Table 2 compares the algorithmically generated thickness values with those obtained by ellipsometry. We also show thickness values obtained by the TLBO algorithm implemented in the past [22] in the table, to give a comparison between different algorithms. As is apparent from the table, for all three materials the thicknesses obtained by the Rao-1 algorithm are in good agreement with the thickness values evaluated by ellipsometry measurement and the TLBO algorithm. These results indicate the capability of the presented approach.
We also computed the average percentage relative error of the wavelength-dependent refractive index and the percentage relative errors of the thickness values for each sample, which are given in Table 2. The thickness error values for the single-layer MgF2 and Al2O3 films were found to be low. However, in the case of single-layer SiO2, the error was considerably larger, at roughly 3%. This larger error might be due to non-uniformity introduced into the film during the growth stage. In the case of the refractive index, the mean percentage relative error for single-layer MgF2 was measured to be less than 0.25%, while the errors for single-layer Al2O3 and SiO2 were comparatively higher, at around 1.1% and 0.5%, respectively. We can also spot somewhat higher variation in the refractive index values computed by Rao-1 compared to those attained by ellipsometry in the graphical representation of the refractive index.
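For reference, the percentage relative errors quoted in Table 2 follow the usual definition; as a worked check (ours, not from the paper) against the MgF2 row:

```latex
\varepsilon_t = \frac{|t_{\text{Rao-1}} - t_{\text{ellip}}|}{t_{\text{ellip}}} \times 100
              = \frac{|1326.006 - 1327|}{1327} \times 100 \approx 0.0749\%,
```

which reproduces the 0.074906% listed for MgF2 on InP.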
We think that these variations in the dispersive refractive index values are related to measurement limitations to some extent.

[Figure 3 shows, for a single-layer SiO2 film on InP: (a) reflectivity (%) versus wavelength λ (μm), experimental versus fitted curves; (b) refractive index (n) versus wavelength λ (μm), ellipsometry versus optimization; (c) fitness value versus number of iterations.]

Fig. 3 a Fitted reflectivity spectrum of SiO2 . b Wavelength-dependent refractive index for SiO2 .
c Fitness function evaluation with iteration for SiO2

The data used for the reflectivity spectra were collected at a low angle of incidence, roughly 5° off normal in our case, while the transfer matrix method used to compute reflectivity assumed perfectly normal incidence. Owing to the mechanical limitations of the setup in the real world, it is challenging to set the incidence angle perfectly normal. This contributes a minor error to the fitting, particularly when calculating the dispersive refractive index. This error may be minimized further in two ways: firstly, by improving the reflectivity measurement setup where practically possible, and secondly, by adopting the transfer matrix method for a non-normal angle of incidence. In addition to the above-mentioned measurement issues, the transfer matrix approach also presumes that the films are completely homogeneous and that the interfaces are sharp.

In the real world, however, sharp interfaces are nearly impossible to achieve with most of the available deposition techniques, which may also add variation to the computed reflectivity values. Eventually, this may have a greater impact on the refractive index values than on the thickness value because of the intrinsic nonlinearity and the omission of higher-order terms from the Sellmeier dispersion equation.

6 Conclusion

We were able to effectively demonstrate a straightforward technique to determine


thickness and refractive index as a function of wavelength from experimental reflec-
tivity spectra of various ARC thin film samples with the help of the Rao-1 algorithm.
The theoretically calculated reflectivity spectrum of thin films was successfully fitted
on the experimental one using the Rao-1 algorithm. For verification of the credibility
of the new Rao-1 algorithm approach, thickness and wavelength-dependent refrac-
tive index were measured for the same films by ellipsometry technique. By carrying
out a comparison between optimized thickness and refractive index values with the
corresponding results obtained for the considered thin film samples using standard
ellipsometry measurements, it was concluded that the thickness values obtained by
the Rao-1 algorithm are in close agreement with standard ellipsometry measurements
for different thin films considered in this study. The refractive index estimated by
the Rao-1 algorithm agrees well with the ellipsometry results. Hence, we can say
that the thickness and dispersive refractive index may be determined with consider-
able consistency for many optical coating applications from a simple experimentally
measured reflectivity spectrum. It is observed that the thicknesses and dispersive
refractive index profiles over the wavelength range 450–850 nm, as obtained by
Rao-1 approach, are in good agreement with the ellipsometry results. However, the convergence speed is somewhat low in the case of the Rao-1 algorithm, and further study and more optimization may be required to improve the convergence rate, particularly for anti-reflection coating design and diagnosis. More work is also required to extend this approach to multilayer ARC films, which are used to reduce reflection over a broad wavelength range. This can be taken
successfully implemented the Rao-1 algorithm approach for practical estimation of
the thickness and the refractive index with relatively better precision for several
materials.

References

1. Kheraj VA, Panchal CJ, Patel PK, Arora BM, Sharma TK (2007) Optimization of facet coating
for highly strained InGaAs quantum well lasers operating at 1200 nm. Opt Laser Technol
39:1395–1399
2. Han L, Zhao H (2014) Simulation analysis of GaN microdomes with broadband omnidirectional
antireflection for concentrator photovoltaics. J Appl Phys 115:133102
3. Young NG, Perl EE, Farrell RM, Iza M, Keller S, Bowers JE, Nakamura S, DenBaars SP,
Speck JS (2014) High-performance broadband optical coatings on InGaN/GaN solar cells for
multijunction device integration. Appl Phys Lett 104:163902
4. Perl EE, McMahon WE, Bowers JE, Friedman DJ (2014) Design of anti-reflective nanostruc-
tures and optical coatings for next-generation multijunction photovoltaic devices. Opt Exp OE.
22:A1243–A1256
5. Hamden ET, Greer F, Hoenk ME, Blacksberg J, Dickie MR, Nikzad S, Christopher Martin D,
Schiminovich D (2011) Ultraviolet antireflection coatings for use in silicon detector design.
Appl Opt AO 50:4180–4188
6. Mancuso M, Beeman JW, Giuliani A, Dumoulin L, Olivieri E, Pessina G, Plantevin O, Rusconi
C, Tenconi M (2014) An experimental study of anti-reflective coatings in Ge light detectors
for scintillating bolometers. EPJ Web Conf 65:04003
7. Cho J-Y, Byeon K-J, Lee H (2011) Forming the graded-refractive-index antireflection layers
on light-emitting diodes to enhance the light extraction. Opt Lett OL 36:3203–3205
8. Zibik EA, Ng WH, Revin DG, Wilson LR, Cockburn JW, Groom KM, Hopkinson M (2006)
Broadband 6μm<λ<8μm superluminescent quantum cascade light-emitting diodes. Appl Phys
Lett 88:121109
9. Wang J, Li LT, Xu W, Yu R, Ramalingam J, Wu Z, Zhu W, Li X (2005) Ultrabroad-bandwidth
and high-power superluminescent light emitting diodes. In: Coherence domain optical methods
and optical coherence tomography in biomedicine IX. SPIE, pp 531–539
10. Deng C, Ki H (2016) Pulsed laser deposition of refractive-index-graded broadband antireflec-
tion coatings for silicon solar cells. Sol Energy Mater Sol Cells 147:37–45
11. Zhang J-C, Xiong L-M, Fang M, He H-B (2013) Wide-angle and broadband graded-refractive-
index antireflection coatings. Chinese Phys B 22:044201
12. Tompkins HG, Baker JH, Smith S, Convey D (2000) Spectroscopic ellipsometry and
reflectometry: a user’s perspective
13. Vedam K, Kim SY (1989) Simultaneous determination of refractive index, its dispersion
and depth-profile of magnesium oxide thin film by spectroscopic ellipsometry. Appl Opt AO
28:2691–2694
14. Dobrowolski JA, Ho FC, Waldorf A (1983) Determination of optical constants of thin film
coating materials based on inverse synthesis. Appl Opt AO 22:3191–3200
15. Caliendo C, Verona E, Saggio G (1997) An integrated optical method for measuring the
thickness and refractive index of birefringent thin films. Thin Solid Films 292:255–259
16. Salvi J, Barchiesi D (2014) Measurement of thicknesses and optical properties of thin films
from surface plasmon resonance (SPR). Appl Phys A 115:245–255
17. Torres-Costa V, Martín-Palma RJ, Martínez-Duart JM (2004) Optical constants of porous
silicon films and multilayers determined by genetic algorithms. J Appl Phys 96:4197–4203
18. Patel SJ, Kheraj V (2013) Determination of refractive index and thickness of thin-film from
reflectivity spectrum using genetic algorithm. AIP Conf Proc 1536:509–510
19. Miloua R, Kebbab Z, Chiker F, Sahraoui K, Khadraoui M, Benramdane N (2012) Determination
of layer thickness and optical constants of thin films by using a modified pattern search method.
Opt Lett OL 37:449–451
20. Tabet MF, McGahan WA (1999) Thickness and index measurement of transparent thin films
using neural network processed reflectance data. J Vac Sci Technol, A 17:1836–1839
21. Gao L, Lemarchand F, Lequime M (2011) Application of global optimization algorithms for
optical thin film index determination from spectro-photometric analysis. In: Advances in optical
thin films IV. SPIE, pp 65–81

22. Patel SJ, Jariwala A, Panchal CJ, Kheraj V (2020) Determination of thickness and optical
parameters of thin films from reflectivity spectra using teaching-learning based optimization
algorithm. J Nano Electron Phys
23. Patel SJ et al (2017) A novel teaching-learning based optimization approach for design of
broad-band anti-reflection coatings. Swarm Evol Comput 34:68–74
24. Rao R (2020) Rao algorithms: three metaphor-less simple algorithms for solving optimization
problems. Int J Ind Eng Comput 11:107–130
25. Tatian B (1984) Fitting refractive-index data with the Sellmeier dispersion formula. Appl Opt
AO 23:4477–4485
26. Kheraj VA, Panchal CJ, Desai MS, Potbhare V (2009) Simulation of reflectivity spectrum for
non-absorbing multilayer optical thin films. Pramana J Phys 72:1011–1022
Chapter 8
Depth Maps-Based 3D Convolutional
Neural Network and 3D Skeleton
Information with Time Sequence
for HAR

Hua Guang Hui, G. Hemantha Kumar, and V. N. Manjunath Aradhya

1 Introduction

The basic aim of video-based human activity recognition is to automatically classify human activity. Human activity recognition (HAR) has been widely applied to various real-world areas, such as automatic surveillance, remote monitoring, smart homes, human–machine interaction and abnormal activity detection in public areas. Aggarwal et al. [1] and Ali et al. [2] divided human activities into four groups: gestures, actions, interactions and group activities. Gestures are considered an element of action performed by parts of the human body. Actions are movements that describe a motion or motivation of a person. Human–human or human–object interactions are activities that involve two or more persons or objects. Group activities are the more complex case, with more subjects or objects in a scene.
The human activity recognition system consists of data pre-processing, object detection/segmentation, feature extraction/representation, feature analysis and classification. Regarding the dataset, a traditional RGB camera records only 2D information about the object and background, whereas an RGB-D camera collects a depth map sequence in addition to the two-dimensional colour video sequence. Depth data are more robust to illumination changes and more discriminative than colour or texture features. After dataset selection, the next operation is data pre-processing to make the data suitable for the feature extraction system. The different ways of feature extraction and representation can be divided into two groups: the handcrafted

H. G. Hui (B) · G. Hemantha Kumar


Department of Studies in Computer Science, University of Mysore, Mysuru
570006, Karnataka, India
e-mail: glory_hua@yahoo.com
V. N. Manjunath Aradhya
Department of Computer Applications, JSS Science and Technology University, Mysuru 570006,
Karnataka, India
e-mail: aradhya@sjce.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 89
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_8

Fig. 1 Structure based on 3D convolutional neural network and skeleton joint information for
human activity recognition

method and the automatic deep learning features-based method. In the traditional method, the human activity recognition system is built around handcrafted features designed to represent human activity. Handcrafted methods such as Space-Time Interest Points (STIP), optical flow, trajectories, silhouettes and Histograms of Oriented Gradients (HOG), which extract relevant activity information from video sequences, are the well-known techniques in our research topic. The automatic learning features-based method feeds either a 2D colour video sequence or a 3D depth video sequence as a raw sequence into a network for automatic feature extraction. Liu et al. [3] used a three-dimensional convolutional neural network (3DCNN) to extract a high-level feature map fed into an SVM to classify different classes with very promising accuracy.
In this paper, we apply a 3DCNN to extract a high-level feature map and then merge skeleton point geometric information, where distances, angles and time-sequence information serve as feature vectors that are fed into an SVM to classify different activities. The structure of the human activity recognition system with 3DCNN and skeleton information is shown in Fig. 1. Section 2 reviews recent related work on human activity recognition, Sect. 3 presents the proposed methodology and implementation, and Sect. 4 describes the benchmark MSR-Action3D dataset and the experimental results, including an analysis of the experiments based on the hybrid feature vector. Finally, Sect. 5 gives the conclusion and future work.

2 Related Work

Nowadays, with the development of computing methodology and advances in computing devices, high-performance deep learning approaches have become a hot topic in several research domains such as speech recognition, autonomous driving, computer vision, natural language processing and so on.
In traditional methods, global- or local-feature-based extraction and representation still have great significance in human activity recognition. A well-designed

feature vector can still distinguish the types of activities with very promising accuracy. Tripathi et al. [4] described the system structure and its challenges in human activity recognition, such as illumination changes, object shadows, partial or full object occlusions and image noise, and discussed all the steps of human activity recognition from dataset to classification techniques. Wu et al.
[5] reviewed current state-of-the-art deep learning approaches on RGB-D, single- and multi-view datasets; the different viewpoint feature representations and deep learning approaches give us new ideas for the feature extraction steps. Zhang et al. [6] highlighted advances in human activity recognition systems: global and local feature extraction, representation and classification methods. Boualia et al. [7] discussed human activity recognition methodologies with their advantages and disadvantages, in particular dividing feature representation into local (depth maps-based, skeleton-based), global (space-time volume, frequency) and model-based (simple blob, 2D model, 3D model) categories. Dhiman et al. [8] summarized the various existing handcrafted and deep learning approaches in human activity recognition for the two types of 2D and 3D datasets. Pareek et al. [9] discussed the types of machine learning approaches, the deep learning techniques and the characteristics of public datasets used for human activity recognition systems. Hbali et al. [10] presented a novel skeleton-based technique to describe the spatiotemporal features of the human activity system. Ghazal et al. [11] extracted skeleton information (motion and shape features) using the OpenPose library for human activity classification. Dwivedi et al. [12] proposed new skeleton-based features (Orientation Invariant Skeleton Feature) for human activity recognition. Yadav et al. [13] proposed a novel deep learning network, a long short-term memory network, for skeleton-based activity recognition and fall detection. Jalal et al. [19] proposed novel multi-fused spatiotemporal features from continuous sequences of depth maps together with spatiotemporal skeleton point information in a human activity recognition system. Fakhredanesh et al. [20] put forward an unsupervised activity change detection approach and used it to detect action changes in time sequences. Since the activity in a video always changes across frames, we consider time sequences to extract relevant information on frame changes.
In deep learning methods, a deep neural network (DNN) automatically extracts features from the video sequences or images and then feeds the feature map into fully connected layers or a traditional machine learning classifier. Pham et al. [14] presented an overview of current up-to-date deep learning methods for human activity recognition systems, covering the branching of the well-known deep learning models and their advantages and limitations. Aradhya et al. [31] proposed a well-known technique combining principal component analysis (PCA) with a probabilistic neural network (PNN) for a character recognition system. Khan et al. [15] offered a hybrid feature based on the silhouette (body shape) and deep learning feature maps. Khan et al. [16] developed a hybrid model that combines a convolutional neural network (CNN) with a long short-term memory network. Tran et al. [17, 18] proposed a suitable, simple and efficient deep learning method, the 3D CNN, to learn spatiotemporal features on large-scale supervised video data. Compared with 2D convolutional neural networks, the 3D CNN considers one more dimension of information, in which the time sequences capture the relevant actions across frames.

Fig. 2 a 2D convolution on video, b 3D convolution on video

Figure 2 illustrates 2DCNN and 3DCNN convolutional processing on images and frame sequences. Based on the automatic 3DCNN convolutional feature map and 3D skeleton information, we hybridize the two types of feature representation to classify different activities on the MSR-Action3D benchmark dataset.

3 Methodology and Implementation

3.1 The Architecture of Human Activity Recognition System

Compared with traditional RGB video datasets, RGB-D depth-map videos have more advantages, especially regarding lighting changes and dimensional information collection. Depth maps include one more dimension, the depth distance from the camera to the object, and require less computation than RGB video images. Compared to single-feature vectors, multi-feature fusion approaches extract more facets of information about the activities in a video. Among traditional methods, Kumar et al. [21] hybridized optical flow and texture information to extract feature vectors for activity classification. In this paper, we propose an automatic deep learning feature-based method in which 3D convolutional networks extract feature maps from video sequences. We then consider 3D skeleton geometric information, namely key point distances, relevant point angle changes and novel time-sequence difference information across activity frames. The hybrid feature vector (feature maps and 3D skeleton information) is fed into a multi-class support vector machine [22] to discriminate different activities.
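A minimal scikit-learn sketch of this final classification stage follows, assuming the hybrid descriptors have already been concatenated into one vector per clip; the feature dimensions and random data are placeholders, and scikit-learn's built-in multi-class SVM stands in for the Weston–Watkins formulation cited as [22].

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Placeholder hybrid descriptors: a 128-D 3D CNN feature map concatenated
# with 3D skeleton geometry (distances, angles, temporal differences).
rng = np.random.default_rng(0)
X = rng.random((200, 128 + 40))      # illustrative feature matrix
y = rng.integers(0, 20, size=200)    # 20 MSR-Action3D activity classes

# Random one-third testing split, mirroring the protocol of Sect. 4.2.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)

clf = SVC(kernel="rbf")              # multi-class SVM (one-vs-one internally)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```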

3.2 Data Pre-processing and Feature Representation

In the HAR system, we aim at a system that automatically extracts precise information from video data. In data pre-processing, we process the data to be well suited for the feature extraction model, for example by removing noise, detecting the foreground, segmenting images and applying transforms. In this paper, we focus on feature extraction and feature representation to obtain well-designed feature vectors. First, our proposed method uses a two-layer 3D CNN framework to extract spatiotemporal feature maps from the raw data automatically. Before feeding the network, we resize each frame to 128*128*38 (height * width * time) and crop the foreground object from the full video frame. The raw data pass through a convolution and max-pooling layer, followed by a second convolution and max-pooling layer. Finally, three fully connected layers produce a high-level feature map. The kernel size of the first convolution layer (CL1) is 7*6*6 (6*6 is the spatial dimension and 7 the temporal dimension) and that of the second layer is 5*5*5. In the 3D max-pooling layers that follow, the kernel size is 2*2*2, down-sampling to reduce the dimensionality and redundant information. Finally, the fully connected layers have 63,488, 1024 and 128 units, and the 128-dimensional feature map is produced. The second type of feature representation is Kinect-based 3D skeleton information. The distances, angles and time-sequence changes across frames are calculated from skeleton joint key point 7 (hip centre) to the others. The distance feature vector consists of the distances from key point 7 to key points (1, 2, 8, 9, 10, 11, 5, 6, 14, 15, 16, 17). The angle feature vectors are 3–(8, 9, 10, 11), 7–(14, 15, 16, 17) and 8–10, 9–11, 14–16, 15–17. Finally, we consider the skeleton point changes every three frames as time sequences. Figures 3 and 4 show how the feature values are calculated from the skeleton key points.

Fig. 3 Skeleton joint details and distance feature calculation



Fig. 4 Skeleton joint angle information and time sequences key point changes on frame difference

The feature vector, comprising the deep learning feature maps and the 3D skeleton joint information from the depth maps and the corresponding skeleton data, is fed into the SVM classifier.
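A PyTorch sketch of the two-layer 3D CNN described above is given below. The text does not specify channel counts or padding, so those values are our assumptions, and nn.LazyLinear is used to absorb the flattened size in place of the 63,488-unit layer stated in the paper.

```python
import torch
import torch.nn as nn

class HAR3DCNN(nn.Module):
    """Two 3D convolution + max-pooling stages followed by fully connected
    layers producing the 128-dimensional feature map fed to the SVM."""
    def __init__(self, in_channels=1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=(7, 6, 6)),  # CL1: 7 temporal, 6*6 spatial
            nn.ReLU(),
            nn.MaxPool3d(2),                                    # 2*2*2 down-sampling
            nn.Conv3d(32, 64, kernel_size=(5, 5, 5)),           # CL2: 5*5*5
            nn.ReLU(),
            nn.MaxPool3d(2),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(1024),     # stands in for the first (63,488-unit) FC layer
            nn.ReLU(),
            nn.Linear(1024, 128),    # 128-D feature map for the SVM
        )

    def forward(self, x):            # x: (batch, channels, 38, 128, 128) = (N, C, T, H, W)
        return self.fc(self.features(x))

features = HAR3DCNN()(torch.randn(2, 1, 38, 128, 128))  # -> shape (2, 128)
```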

4 Experimental Result and Analysis

4.1 The MSR-Action3D Datasets

We evaluate the accuracy of the hybrid feature vector system on a publicly available benchmark dataset. The MSR-Action3D dataset [23, 24] is an RGB-D action dataset captured by a depth camera. It is composed of 10 different people performing 20 actions two or three times, covering 20 activity classes: bend, draw circle, draw tick, draw x, forward kick, forward punch, golf swing, hammer, hand catch, hand clap, high arm wave, high throw, horizontal arm wave, jogging, pick up and throw, side kick, side-boxing, tennis serve, tennis swing and two hand wave. It is a challenging dataset because of very similar activities such as forward punch–hammer and bend–pick up and throw. All video sequences were recorded from a fixed-viewpoint camera with the subjects facing the camera while performing the actions, and the background was removed in post-processing. In our experiments, we divide the activities into three groups, AS1, AS2 and AS3 (Table 1). The skeleton joint data correspond to the MSR-Action3D depth maps, captured by the same device. Each frame has 20 key points in (x, y, z, c) form. Figure 3 gives the details of the skeleton coordinates.
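Given that 20-joint (x, y, z, c) layout, a NumPy sketch of the geometric features of Sect. 3.2 follows: distances from the hip centre (key point 7), a joint angle, and key-point differences every three frames. The joint indices come from the paper; the helper names, the dropped confidence channel and the single sample angle are ours.

```python
import numpy as np

HIP = 7                                        # hip-centre joint index (paper)
DIST_TARGETS = [1, 2, 8, 9, 10, 11, 5, 6, 14, 15, 16, 17]

def distance_features(frame_xyz):
    """Euclidean distances from the hip centre to the listed joints.
    frame_xyz: (20, 3) joint coordinates of one frame (confidence c dropped)."""
    return np.linalg.norm(frame_xyz[DIST_TARGETS] - frame_xyz[HIP], axis=1)

def angle(a, b, c):
    """Angle (radians) at joint b formed by the segments b->a and b->c."""
    u, v = a - b, c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def temporal_diffs(clip_xyz, step=3):
    """Key-point changes every `step` frames (the time-sequence feature).
    clip_xyz: (n_frames, 20, 3) array for one activity clip."""
    return (clip_xyz[step:] - clip_xyz[:-step])[::step].reshape(-1)

clip = np.random.rand(38, 20, 3)               # placeholder skeleton clip
fv = np.concatenate([distance_features(clip[0]),
                     [angle(clip[0][8], clip[0][3], clip[0][10])],  # one sample angle
                     temporal_diffs(clip)])
```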

Table 1 MSR-Action3D was divided into three subsets (action subset 1, 2, 3) in the experiment
AS1 AS2 AS3
Bend Draw tick Forward kick
Hammer Draw X Golf swing
Hand clap Forward tick High throw
High arm wave Hand catch Jogging
High throw High arm wave Side kick
Pick up and throw Side-box Tennis serve
Tennis serve Two hand wave Tennis swing

Table 2 Accuracy of the proposed method on subset AS1


Human activity       Accuracy rate (%)
Bend                 100
Hammer               87
Hand clap            100
High arm wave        100
High throw           100
Pick up and throw    100
Tennis serve         100
Average              98.1

4.2 The Result of Experiment and Comparison Analysis

For validation of the proposed method, we randomly select one-third of the samples of the MSR-Action3D dataset as testing data and the rest as training data. The MSR-Action3D dataset is divided into AS1, AS2 and AS3, and Tables 2, 3 and 4 give the classification results on these subsets. The accuracy for the activity Hammer is 87%, while the activities Bend, Hand clap, High arm wave, High throw, Pick up and throw, and Tennis serve all reach 100%. The average accuracies are 98.1, 92 and 94.7%. For subsets AS1, AS2 and AS3 of the dataset, Table 5 gives an accuracy comparison of human activity recognition systems on the MSR-Action3D dataset. Kao et al. [27] put forward a skeleton-based graph structure feature representation for human activity recognition. Chen et al. [29] presented a novel structure of depth motion map features from depth sequences. Liu et al. [3] used a 3D CNN to extract spatial–temporal features from depth sequences. Our method hybridizes 3D CNN feature maps and skeleton joint information with time sequences. Even though we have already obtained good results on this dataset, we still need to apply more complex datasets and up-to-date deep learning techniques to our method (Table 6).

Table 3 Accuracy of the proposed method on subset AS2


Human activity    Accuracy rate (%)
Draw tick         90
Draw X            100
Forward tick      100
Hand catch        80
High arm wave     100
Side-box          90
Two hand wave     90
Average           92

Table 4 Accuracy of the proposed method on subset AS3


Human activity   Accuracy rate (%)
Forward kick     100
Golf swing       90
High throw       90
Jogging          100
Side kick        83
Tennis serve     100
Tennis swing     100
Average          94.7

Table 5 An accuracy comparison of HAR systems on the MSR-Action3D dataset


Method Activity accuracy rate (%)
Kao et al. [27] 74
Chen et al. [29] 75.8
Zhao et al. [26] 86.1
Paoletti et al. [25] 88.51
Liu et al. [3] 89.29
Bulbul et al. [30] 93.0
Li et al. [23] 94.2
Proposed method 95.2

Table 6 Accuracy comparison of the HAR system on the MSR-Action3D dataset, randomly selecting one-third of the samples as testing and the rest as training

Subset   [23]    [28]    [3]     Ours
AS1 93.4 98.61 92.78 98.1
AS2 92.9 97.92 97.06 92.8
AS3 96.3 94.93 98.59 94.7
Avg. 94.2 97.15 98.14 95.2

5 Conclusion

In this paper, we have proposed a 3D convolutional neural network for depth-map video sequences that extracts high-level feature maps automatically through two 3DCNN layers and three fully connected layers. At the same time, we calculate the corresponding skeleton joint distances from the hip centre to the other joints (excluding foot, hand and head) and then calculate action-performance angles at six central points, namely the shoulder centre, hip centre, two elbows and two knees, to capture angle changes. Finally, we propose a novel 3D skeleton feature vector based on the time-sequence changes of skeleton joints across frames. The hybrid of deep learning automatic feature maps and skeleton joint information is applied to the MSR-Action3D dataset. The experimental results show that our proposed method achieves better results for classifying different activities when compared to other currently existing approaches; several actions reached 100% accuracy. In future work, we first wish to validate the proposed method with different validation approaches, such as leave-one-out (LOO) and cross-validation, and then apply the model to more complex datasets. Meanwhile, we will also study deep learning techniques for time-sequence features to classify human activity.

References

1. Aggarwal JK, Ryoo MS (2011) Human activity analysis: a review. ACM Comput Surv (CSUR)
43(3):1–43
2. Ali HH, Moftah HM, Youssif AA (2018) Depth-based human activity recognition: a compar-
ative perspective study on feature extraction. Future Comput Inform J 3(1):51–67
3. Liu Z, Zhang C, Tian Y (2016) 3D-based deep convolutional neural network for action recog-
nition with depth sequences. Image Vis Comput 55:93–100
4. Tripathi RK, Jalal AS, Agrawal SC (2018) Suspicious human activity recognition: a review.
Artif Intell Rev 50(2):283–339
5. Wu D, Sharma N, Blumenstein M (2017, May) Recent advances in video-based human action
recognition using deep learning: a review. In: 2017 international joint conference on neural
networks (IJCNN). IEEE, pp 2865–2872
6. Zhang S, Wei Z, Nie J, Huang L, Wang S, Li Z (2017) A review on human activity recognition
using vision-based method. J Healthcare Eng

7. Boualia SN, Amara NEB (2019, June) Pose-based human activity recognition: a review. In:
2019 15th international wireless communications and mobile computing conference (IWCMC).
IEEE, pp 1468–1475
8. Dhiman C, Vishwakarma DK (2019) A review of state-of-the-art techniques for abnormal
human activity recognition. Eng Appl Artif Intell 77:21–45
9. Pareek P, Thakkar A (2021) A survey on video-based human action recognition: recent updates,
datasets, challenges, and applications. Artif Intell Rev 54(3):2259–2322
10. Hbali Y, Hbali S, Ballihi L, Sadgal M (2018) Skeleton-based human activity recognition for
elderly monitoring systems. IET Comput Vis 12(1):16–26
11. Ghazal S, Khan US, Mubasher Saleem M, Rashid N, Iqbal J (2019) Human activity recognition
using 2D skeleton data and supervised machine learning. IET Image Process 13(13):2572–2578
12. Dwivedi N, Singh DK, Kushwaha DS (2020) Orientation invariant skeleton feature (OISF): a
new feature for human activity recognition. Multimedia Tools Appl 79(29):21037–21072
13. Yadav SK, Tiwari K, Pandey HM, Akbar SA (2022) Skeleton-based human activity recognition
using ConvLSTM and guided feature learning. Soft Comput 26(2):877–890
14. Pham HH, Khoudour L, Crouzil A, Zegers P, Velastin SA (2022) Video-based human action
recognition using deep learning: a review. arXiv preprint arXiv:2208.03775
15. Khan MA, Zhang YD, Allison M, Kadry S, Wang SH, Saba T, Iqbal T (2021) A fused het-
erogeneous deep neural network and robust feature selection framework for human actions
recognition. Arab J Sci Eng 1–16
16. Khan IU, Afzal S, Lee JW (2022) Human activity recognition via hybrid deep learning based
model. Sensors 22(1):323
17. Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal fea-
tures with 3d convolutional networks. In: Proceedings of the IEEE international conference on
computer vision, pp 4489–4497
18. Ji S, Xu W, Yang M, Yu K (2012) 3D convolutional neural networks for human action recog-
nition. IEEE Trans Pattern Anal Mach Intell 35(1):221–231
19. Jalal A, Kim YH, Kim YJ, Kamal S, Kim D (2017) Robust human activity recognition from
depth video using spatiotemporal multi-fused features. Pattern Recogn 61:295–308
20. Fakhredanesh M, Roostaie S (2020) Action change detection in video based on HOG. J Electr
Comput Eng Innov (JECEI) 8(1):135–144
21. Kumar SS, John M (2016, October) Human activity recognition using optical flow based feature
set. In: 2016 IEEE international Carnahan conference on security technology (ICCST). IEEE,
pp 1–5
22. Weston J, Watkins C (1998) Multi-class support vector machines. Technical Report CSD-TR-
98-04, Department of Computer Science, Royal Hol-loway, University of London, May, pp
98–04
23. Li W, Zhang Z, Liu Z (2010, June) Action recognition based on a bag of 3d points. In: 2010
IEEE computer society conference on computer vision and pattern recognition-workshops.
IEEE, pp 9–14
24. Zhang J, Li W, Ogunbona PO, Wang P, Tang C (2016) RGB-D-based action recognition datasets:
a survey. Pattern Recogn 60:86–105
25. Paoletti G, Cavazza J, Beyan C, Del Bue A (2021, January) Subspace clustering for action
recognition with covariance representations and temporal pruning. In: 2020 25th international
conference on pattern recognition (ICPR). IEEE, pp 6035–6042
26. Zhao R, Xu W, Su H, Ji Q (2019) Bayesian hierarchical dynamic model for human action
recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern
recognition, pp 7733–7742
27. Kao JY, Ortega A, Tian D, Mansour H, Vetro A (2019, September) Graph based skeleton mod-
eling for human activity analysis. In: 2019 IEEE international conference on image processing
(ICIP). IEEE, pp 2025–2029
28. Ni B, Pei Y, Moulin P, Yan S (2013) Multilevel depth and image fusion for human activity
detection. IEEE Trans Cybern 43(5):1383–1394
8 Depth Maps-Based 3D Convolutional Neural Network . . . 99

29. Chen C, Zhang B, Hou Z, Jiang J, Liu M, Yang Y (2017) Action recognition from depth
sequences using weighted fusion of 2D and 3D auto-correlation of gradients features. Multim
Tools Appl 76(3):4651–4669
30. Bulbul MF, Islam S, Azme Z, Pareek P, Kabir M, Ali H (2022) Enhancing the performance of
3D auto-correlation gradient features in depth action classification. Int J Multimedia Inform
Retrieval 11(1):61–76
31. Aradhya VM, Niranjan SK, Kumar GH (2010) Probabilistic neural network based approach
for handwritten character recognition. Special Issue of IJCCT 1(2):3
Chapter 9
Deep Sea Debris Detection Using
YOLOIncep Network

J. Sudaroli Sandana, Sai Vignesh, R. Sharan, and S. Deivalakshmi

1 Introduction

Marine environments everywhere, from shallow waters to the deep sea, are increas-
ingly clogged with debris. This issue exists in rivers and other bodies of water as
well. Marine debris is typically composed of difficult-to-degrade components, which
persist in the ecosystem. This debris negatively impacts and causes serious water
pollution issues over time. As a result, detecting and addressing them as soon as
possible is essential. The main problem with detecting marine debris is that it loses its original shape underwater due to the high pressure and temperature of its surroundings. Therefore, it is difficult to obtain a detailed underwater debris data
set as only limited images are available, and there has not been much emphasis on
the deep sea floor debris detection field. Further, there is a lot of similarity between
classes and lots of diversity within a single class in deep sea debris data sets. This
poses yet another challenge that needs to be overcome in order to get better perfor-
mance. Currently, with the alarming increase of sea water pollution, many unmanned
vehicles [1] are being sent to clean the polluted water. But without effective detection algorithms, an unmanned vehicle may misclassify a bio-organism as debris or vice versa. With the development of computer vision technology, it has become possible
to augment the available data sets, and results obtained previously can be improved.
Recent attempts have used skip connections in their models to improve the perfor-
mance metrics. This is clear in networks such as ResNet and DenseNet. Rather than
using a typical convolution layer, [2] proposed DenseNet with DeepResidual channel

J. Sudaroli Sandana · S. Vignesh · R. Sharan · S. Deivalakshmi (B)


Department of Electronics and Communication Engineering, National Institute of Technology,
Tiruchirappalli, India
e-mail: deiva@nitt.edu
J. Sudaroli Sandana
e-mail: sandhana3000@gmail.com


connection (DRCA) by creating dense connections between residual groups. Recent


papers have used models such as FasterRCNN and single shot detector (SSD) for
object detection. The FasterRCNN detector adds a region proposal network (RPN) to generate anchor boxes for object detection instead of edge boxes [3]. By utilizing a grid to partition the image, SSD assigns responsibility for object detection to each grid cell, which then predicts the class and placement of any object inside that area.
An SSD consists of two parts: a head and a backbone model. A pre-trained image
classification network serves as the backbone model. Usually, a network like ResNet
trained on ImageNet has been modified by removing the last fully connected classi-
fication layer to serve as the backbone. Thus, the deep neural network can preserve
the spatial structure of the image at a lesser resolution while still extracting semantic
information from the input image. In recent times, YOLO architecture has become
extremely popular, and many researchers tend to take advantage of the simple and
versatile nature of YOLO. In [4], debris and undersea life were recognized with 69.6% and 77.2% accuracy, respectively, using YOLOv3 for waste identification and for removing debris floating on the ocean surface.
The major contribution of the proposed work is as follows:
• The proposed YOLOIncep model combines the benefits of the YOLO model and the multiple receptive field scales of the inception model for classifying and locating deep sea debris.
• Four popular YOLO models are used to compare the suggested work with the
state-of-the-art networks. The outcomes of the effective evaluation show that the
proposed method has good data fitting capabilities and can produce promising
debris detection outcomes.
• The three categories of debris and non-debris deep sea images used for experimentation, obtained by the Japan Agency for Marine-Earth Science and Technology (JAMSTEC), are authentically derived from deep sea conditions. All the experiments in this work are carried out using images from this data set, which lends the results a degree of reliability and applicability for practical applications.
The rest of this paper is organized as follows: The related works are given in
Sect. 2. Then, the proposed method is discussed in Sect. 3. Section 4 discusses the
data set, comparative models, and training settings. Next, in Sect. 5, the results and
justification are discussed. Finally, Sect. 6 contains a conclusion of the proposed
work.

2 Related Works

One of the recent works in deep sea debris detection is a novel network called
Shuffle-Xception [5]. Considering the interclass similarity and intraclass diversity [6] found in deep
sea debris, this network was proposed based on Xception architecture. The architec-
ture mentioned above initially uses separable convolution procedures to extract more

features and sophisticated characteristics from the deep sea data sets. The convolu-
tional layer learns both the spatial and channel dimensions of the feature map in
classical convolution.
Compared to other models, YOLO is a highly lightweight object detection and
localization model that recognizes objects with higher accuracy and recall rates. The
initial You Only Look Once (YOLO) [7] model was proposed by Joseph Redmon
et al. in 2015. Until then, RCNN models were the most widely used object detection
models. Despite being accurate, the RCNN family of models was slow since they
needed a multi-step process to find the ideal region for the bounding box, categorize
these regions, and then refine the outcome using postprocessing. YOLO was created
to replace multistage object detection with a single stage to enhance performance. The
unified detection strategy used by the YOLO model, which unifies several elements
of object identification into a single feed-forward neural network, is the basis of its primary operation.
The YOLO model divides the input image into several grids and evaluates the
chances that each grid contains the object in every image. Next, the algorithm creates
a single object by combining nearby high-value probability grids. YOLO employs the non-max suppression (NMS) method to eliminate low-value predictions: overlapping lower-probability bounding boxes are suppressed in favor of the higher-probability ones.
The model is also trained by comparing the center of each identified object to the
ground truth. Because bounding boxes learn entirely from data, YOLO V1 has defi-
cient performance in localizing boxes. YOLOv1 received a few upgrades, which
prompted the release of YOLOv2 [8]. The second version had anchor boxes. As seen
in Fig. 1, anchor boxes are predetermined areas representing the idealized place-
ment of the objects to be detected. The intersection over union (IoU) ratio between the predicted bounding box and the pre-defined anchor box is calculated. Whether the confidence in the detected item is strong enough to generate a forecast is determined by the IoU value.
The upgraded YOLOv3 [9] is an enhanced version of YOLO. Instead of fully
connected or pooling layers, YOLOv3 used 75 convolutional layers to produce a
far more compact and lightweight model. It learned a variety of features swiftly
and effectively by combining residual models from the ResNet model with feature
pyramid networks (FPN). An image feature extractor that extracts features of various
sizes, shapes, and kinds is known as a feature pyramid network.
Features at all scales are combined so that the model learns both local and global features. Using logistic classifiers and activations, the YOLOv3 [10] class predictions beat RetinaNet-50 and 101 in terms of accuracy.
YOLOv3 idea is the DarkNet53 architecture. The base YOLO architecture used for
deep sea debris detection is YOLOv5, the latest version of YOLO. Though many
researchers have attempted to improve object detection performance using YOLO,
the inception network with multiple kernel sizes motivates to modify the backbone
of the YOLO for deep sea debris detection.

Fig. 1 Working of YOLOv1 [7]

3 Proposed Work

The inception network [11, 12] is one of the most efficient models, achieving strong performance while reducing computing resources as the network's depth and breadth expand. Because each convolution of the inception network uses kernels at multiple scales, it removes the need to search for a single optimal kernel size. As
explained before, YOLO is a very versatile model. It can be used with any other
existing models as the backbone. Many have attempted to use models such as ResNet
and VGG16 as their backbone. The advantages mentioned above of YOLO and the
inception network motivate us to implement Inception-Net as its backbone for deep
sea debris detection.

3.1 Methodology

Inspired by super resolution [13] techniques for improving classification and localization performance, super resolution is applied as a preprocessing step to improve the resolution of deep sea debris images. The super resolution model FuNIEGan [14] is used, as it was designed for deep sea applications. As a result,
FuNIEGan is used to improve the resolution of the images in the data set, as shown
in Fig. 2. Then, the super-resolved images are fed into the proposed YOLOIncep
model for classification and localization of debris in the images.
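The overall two-stage flow can be sketched as follows; the loader functions are stand-ins, since the trained FuNIEGan and YOLOIncep weights are not published with this chapter, and only the pipeline order (super-resolve, then detect) is taken from the text.

```python
# Illustrative sketch of the two-stage pipeline described above; both model
# functions are placeholders for the authors' actual (unpublished) models.
import numpy as np

def enhance_with_funie_gan(image):           # placeholder for FuNIE-GAN inference
    return image                             # would return a super-resolved image

def detect_with_yoloincep(image):            # placeholder for YOLOIncep inference
    return [{"class": "plastic", "box": (0, 0, 10, 10), "score": 0.9}]

def debris_pipeline(image):
    sr_image = enhance_with_funie_gan(image)     # step 1: super resolution
    return detect_with_yoloincep(sr_image)       # step 2: classify and localize

detections = debris_pipeline(np.zeros((256, 256, 3), dtype=np.uint8))
```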

Fig. 2 Before (upper row) and after super resolution (below row)

3.2 Network Architecture

The proposed YOLOIncep architecture for deep sea debris detection is shown in
Fig. 3. YOLO with inception modules in its backbone is shown in Fig. 3a. The
proposed YOLOIncep network consists of three parts: backbone, neck, and head. The
backbone part of the proposed architecture includes the inception block, bottleneck
layer block, and spatial pyramid pooling (SPP) block.
The inception network used in the proposed work is shown in Fig. 3b. Since it has
multiple kernels with different scales, it relieves us from choosing the optimal kernel
size and can detect target objects of all sizes. The major advantage of the inception
network is its receptive field because of the different scales of filter sizes. The multiple
scaling kernels in the inception block extract distinctive features, and the concate-
nation of these features increases the performance of the model. A bottleneck CSP
layer has fewer nodes than the preceding layers. A reduced-dimensional
representation of the input can be obtained using the bottleneck layer. Each bottle-
neck layer in YOLOv5 consists of three convolutions, as shown in Fig. 4. In addition
to this, the number of output channels from the bottleneck layer is determined by the
expansion factor, which reduces the number of channels in the successive layer of
the model.
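A minimal PyTorch sketch of an inception-style block with parallel kernel scales is given below; the branch layout and channel counts are illustrative assumptions rather than the exact configuration of the YOLOIncep backbone.

```python
# Inception-style block: parallel 1x1, 3x3, 5x5 convolutions plus a pooled
# branch, concatenated along the channel axis (illustrative configuration).
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    def __init__(self, c_in, c_branch):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, c_branch, kernel_size=1)
        self.b3 = nn.Conv2d(c_in, c_branch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(c_in, c_branch, kernel_size=5, padding=2)
        self.pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(c_in, c_branch, kernel_size=1),
        )

    def forward(self, x):
        # concatenate features from multiple receptive fields along channels
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.pool(x)], dim=1)

x = torch.randn(1, 64, 32, 32)
y = InceptionBlock(64, 32)(x)   # -> shape (1, 128, 32, 32)
```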
Spatial Pyramid Pooling (SPP) [15] is a pooling layer that enables a CNN to
function without a fixed-size input constraint. Typically, the SPP layer is used on top
of a convolution layer. The fully connected layers get fixed-length outputs from the
SPP layer. To minimize the requirement for initial cropping or warping, information

Fig. 3 Our proposed architecture: a YOLO with inception backbone architecture; b the inception block

aggregation at a higher level of the network structure (between convolutional and


fully connected layers) is performed by SPP.
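The fixed-length property of SPP [15] can be sketched with adaptive max pooling at several grid sizes, as below; the pyramid levels (1, 2, 4) are an illustrative choice, not the exact levels used in the proposed network.

```python
# Spatial pyramid pooling sketch: pool the feature map to several fixed grid
# sizes and concatenate, yielding an input-size-independent vector.
import torch
import torch.nn.functional as F

def spp(feature_map, levels=(1, 2, 4)):
    n, c = feature_map.shape[:2]
    pooled = [F.adaptive_max_pool2d(feature_map, s).view(n, -1) for s in levels]
    return torch.cat(pooled, dim=1)   # length c * (1 + 4 + 16), regardless of H, W

print(spp(torch.randn(1, 256, 13, 13)).shape)   # torch.Size([1, 5376])
print(spp(torch.randn(1, 256, 20, 20)).shape)   # torch.Size([1, 5376])
```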
Path aggregation network (PANet) is the neck part of the proposed architecture.
The PANet consists of the standard convolutional layer with kernel size 1 × 1 and 3
× 3, bottleneck layers, up samplers, and concatenation layers. The next block is the
head part which consists of the standard convolutional layer with kernel size 1 × 1.

Fig. 4 Bottleneck CSP architecture

4 Experiments

4.1 Data Set Description

The images used in the proposed work are obtained from the Japan Agency for
Marine-Earth Science and Technology (JAMSTEC) data set [16] (https://www.
godac.jamstec.go.jp/jedi/e/). The sample images from the data set are shown in
Fig. 5. The data set is an open source; it includes images of marine trash captured by
the deep sea submersibles “SHINKAI 6500” and “HYPER-DOLPHIN”. From this
publicly available data set, three classes of debris and non-debris are used for the
experimentation as defined below:

Fig. 5 Data set sample images



Table 1 Data set size

Category   Number
Bio        1362
Plastic    3931
ROV        428

Plastic: Consists of plastic marine debris.
ROV: Consists of all artificial objects, such as ROVs and permanent sensors, that have been purposefully inserted into the environment.
Bio: Fish, plants, and other naturally occurring biological matter.

4.2 Training Settings

The super resolution technique was applied to the images of size 256 × 256 used for the proposed work. Input sizes that are either too small or too large could result in data loss, memory overflow, and more complicated calculations. Additionally, because
of memory constraints, if the input image scale is too large, the batch size is limited
(batch size = 1, 2), which could lead to dubious classification accuracy from the
network. After being super-resolved, the image is fed to the proposed YOLOIncep
model. The number of images in each category before data augmentation is given in
Table 1. The code was implemented in Python in Google’s Colab, which runs Python
3.7.13.

4.3 Optimizers and Losses

An optimizer is an algorithm that alters neural network properties like weights and
learning rates. SGD stands for stochastic gradient descent, an iterative technique for optimizing an objective function with sufficient smoothness properties. The loss
function used by YOLOv5 is GIoU. GIoU stands for generalized intersection over
union [17]. GIoU is an improved version of the IoU algorithm. The IoU algorithm
does not tell us if two shapes, A and B, are in the vicinity of one another. In the GIoU
algorithm, an object C is introduced such that C is the smallest object enclosing A
and B. GIoU (Eq. 1) is defined as

GIoU = IoU − |C \ (A ∪ B)| / |C|.   (1)
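A direct translation of Eq. 1 for axis-aligned boxes, written as a small sketch; the (x1, y1, x2, y2) box convention is an assumption for illustration.

```python
# Sketch of Eq. 1 for two axis-aligned boxes a and b, each (x1, y1, x2, y2).
def giou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union
    # C: the smallest box enclosing both a and b
    c_area = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return iou - (c_area - union) / c_area   # Eq. 1

print(giou((0, 0, 2, 2), (1, 1, 3, 3)))   # ~ -0.0794 for these partly overlapping boxes
```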

4.4 Performance Metrics

Precision is defined as the number of true positives divided by the total number
of positive predictions (i.e., the number of true positives plus the number of false
positives), as given in Eq. 2.

Precision = true positive / (true positive + false positive).   (2)

The recall is a fraction of a class rightly identified as the target object, given in
Eq. 3.

Recall = true positive / (true positive + false negative).   (3)

F1 score is a single statistic that combines both precision and recall by taking their harmonic mean, as given in Eq. 4. This is done because, in machine learning models, there is usually a trade-off between precision and recall.

F1 Score = 2 × (Precision × Recall) / (Precision + Recall).   (4)
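Equations 2-4 reduce to a few lines of code once the true positive, false positive, and false negative counts are available; the counts in the example call are invented for illustration.

```python
# Computing the metrics of Eqs. 2-4 from raw counts.
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

print(precision_recall_f1(tp=90, fp=10, fn=5))  # illustrative counts
```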

5 Results and Discussion

The detection and localization performance of the proposed model is shown in Fig. 6.
The model locates the object of interest in the image and encircles it with a bounding
box. The label indicates the class to which the object belongs. Multiple objects can
be detected from a single image. The existing YOLO models are compared with the
proposed YOLO with the inception backbone model (YOLOIncep). YOLO has many variants based on network depth, ranging from YOLOv5s (shallow YOLO) to YOLOv5l (deep YOLO). The results obtained for the different YOLO versions
are given in Table 2. Hence, it is observed that modifying the YOLO backbone by
adding inception modules improves the performance of YOLO models. As mentioned
earlier, the various scaling filter sizes used in the inception module enhance the
performance of the network because of its receptive field. Furthermore, the YOLOv5
version performs better than YOLOv3 and YOLOv2. Debris and undersea life were
identified with 69.6% and 77.2% accuracy, respectively, by Watanabe et al. [4] using
YOLOV3, while YOLOv5 and YOLOIncep give much better results predictably.
Further, YOLO models can identify and localize the target in a single pass, a feature
not shown in other models [5]. Multiple objects of different classes are identified in
the same frame, as shown in Fig. 6.

Fig. 6 YOLOIncep debris detection results

Table 2 Comparison of existing YOLO models with proposed YOLOIncep

Model       Plastic   BIO    ROV
Precision
YOLOv5x     97.3      95.7   93
YOLOv5m     84.6      91.4   78
YOLOv5s     83.3      74.1   59.7
YOLOIncep   98.9      91.7   93.4
Recall
YOLOv5x     96.9      95     87.5
YOLOv5m     88.2      72.7   66.5
YOLOv5s     85.9      74.1   59.7
YOLOIncep   98.2      95.7   93.1
F1 score
YOLOv5x     97.1      95.3   90.2
YOLOv5m     86.4      81     71.8
YOLOv5s     84.6      78     67.1
YOLOIncep   98.5      93.7   93.2

6 Conclusion

The proposed work uses deep convolutional neural networks to examine the clas-
sification and localization of deep sea debris. The YOLO network model with an
inception backbone is proposed in this paper and compared against other existing
YOLO models. YOLOv5x performs better than the earlier versions of YOLO, as

expected. YOLOIncep, YOLOv5 with a backbone having inception modules with


different receptive fields, gives a high mAP @0.5 of 0.979 on the JAMSTEC open-
source deep sea data set. Further, the models can perform both classification and
localization of the images with high precision.

References

1. Fulton M, Hong J, Islam MdJ, Sattar J (2019) Robotic detection of marine litter using deep
visual detection models. In: 2019 international conference on robotics and automation (ICRA).
IEEE, pp 5752–5758
2. Jang D-W, Park R-H (2019) Densenet with deep residual channel-attention blocks for single
image super resolution. In: Proceedings of the IEEE/CVF conference on computer vision and
pattern recognition workshops, pp 0–0
3. Xue B, Huang B, Wei W, Ge C, Li H, Zhao N, Zhang H (2021) An efficient deep-sea debris
detection method using deep neural networks. IEEE J Sel Topics Appl Earth Observ Remote Sens PP:1–1. https://doi.org/10.1109/JSTARS.2021.3130238
4. Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with
region proposal networks. In: Advances in neural information processing systems, vol 28
5. Kamat J, Gupta R (2021) Inception SN: an inception based convolutional neural network
for hyperspectral image classification. In: 2021 2nd global conference for advancement in
technology (GCAT), pp 1–4. https://doi.org/10.1109/GCAT52182.2021.9587504
6. Yin H, Cheng C (2010) Monitoring methods study on the great Pacific Ocean garbage patch.
In: 2010 international conference on management and service science. IEEE, pp 1–4
7. Watanabe J-I, Shao Y, Miura N (2019) Underwater and airborne monitoring of marine
ecosystems and debris. J Appl Rem Sens 13:1. https://doi.org/10.1117/1.JRS.13.044509
8. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object
detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition,
pp 779–788
9. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE
conference on computer vision and pattern recognition, pp 7263–7271
10. Lu Z, Lu J, Ge Q, Zhan T (2019) Multi-object detection method based on YOLO and
ResNet hybrid networks. In: 2019 IEEE 4th international conference on advanced robotics
and mechatronics (ICARM), pp 827–832. https://doi.org/10.1109/ICARM.2019.8833671
11. Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767
12. Chen J, Chen W, Zeb A, Yang S, Zhang D (2022) Lightweight inception networks for the
recognition and detection of rice plant diseases. IEEE Sens J 22(14):14628–14638. https://doi.
org/10.1109/JSEN.2022.3182304
13. Islam MdJ, Xia Y, Sattar J (2020) Fast underwater image enhancement for improved visual
perception. IEEE Robot Autom Lett 5(2):3227–3234. https://doi.org/10.1109/LRA.2020.2974710
14. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich
A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer
vision and pattern recognition, pp 1–9
15. He K, Zhang X, Ren S, Sun J (2014) Spatial pyramid pooling in deep convolutional networks
for visual recognition. In: Computer vision—ECCV 2014. Springer International Publishing,
pp 346–361. https://doi.org/10.1007/978-3-319-10578-9_23

16. JAMSTEC (2009) JAMSTEC OFES (Ocean General Circulation Model for the Earth
Simulator) Dataset. JAMSTEC. https://doi.org/10.17596/0002029
17. Rezatofighi H, Tsoi N, Gwak JY, Sadeghian A, Reid I, Savarese S (2019) Generalized intersec-
tion over union: a metric and a loss for bounding box regression. In: Proceedings of the IEEE/
CVF conference on computer vision and pattern recognition, pp 658–666
Chapter 10
Brain Tumor Early Diagnosis Using
Hybrid Fuzzy K-Means
and Convolutional Neural Networks

M. Jeyavani and M. Karuppasamy

1 Introduction

A primary brain tumor is a common tumor associated with the brain, and a secondary
brain tumor starts from other carcinomas including lung, melanoma, breast, and
kidney [1]. The American Cancer Society estimated that in 2022 in the USA, 25,050 adults would be diagnosed with primary malignant tumors of the brain and spinal cord (14,170 men and 10,880 women). Human brains are encased in a fluid
called cerebrospinal fluid (CSF). From one ventricle to the next, this cerebrospinal
fluid flows. It aids with spine and brain protection. The subarachnoid space and the
central nervous system are encircled by the fluid in these ventricles; the nerves that
are located above the brain absorb the extra fluid that is released. Pressure builds up
in other areas of the brain when this waste fluid is not absorbed by the neurons [2].
Various symptoms result from this. Senior citizens typically have normal pressure; however, a slow secretion of extra fluid and an increase in pressure cause hydrocephalus in people over 60. Medical data mining can be used to investigate hidden information, and various methodologies have been applied to MRI datasets to address this classification issue. Here, a deep learning strategy is used to improve accuracy and reduce processing time: in data mining applications, deep learning suppresses unrelated features and emphasizes the relevant ones. The new approach's goal is to identify appropriate subgroups within complex and ambiguous data.

M. Jeyavani (B) · M. Karuppasamy


Department of Computer Applications, Kalasalingam Academy of Research and Education,
Krishnankoil, Tamil Nadu 626126, India
e-mail: jeyavanim@gmail.com
M. Karuppasamy
e-mail: karuppasamy.m1987@gmail.com


A simple CT scan method was employed to diagnose the illness. But an advanced
scan technology called an MRI is used to assess, identify, and keep track of a variety
of medical disorders affecting the anatomy of the skull from various perspectives. The
most recent method, known as 3D MRI, shows brain activity and aids in visualizing
the brain to determine whether there is a blockage brought on by fluid in the brain.
In the new approach, real-time MRI scan datasets were utilized for fuzzy logic pattern recognition, applying the fuzzy K-means membership function to diagnose the human brain from many perspectives. To avoid fuzziness or overlap between the clusters, CNN-based forward and backward propagation are utilized, and the region-growing approach extracts brain regions to determine whether a tumor is benign or malignant.

2 Related Work

In one of the major contributions, Rajiga [3] used segmentation for automatic detection and analysis to predict hydrocephalus-related tumors and health issues in young patients using MRI scans. The prediction is supported by the Digital Imaging and Communications in Medicine (DICOM) standard. Also, the region-growing approach is used to extract brain regions, and a hydrocephalus segmentation method is employed. Compared with the prior method, it requires much less time, but prediction accuracy is low.
Chen et al. [4] used a fuzzy linear softmax model and rough reduction to predict fundamental emotions, assessing the depth feature's capability with a fuzzy rough set loss function for detection. The final stage extracts and classifies features using convolutional neural networks (CNNs). A GPU model, forward propagation, backward propagation, and AlexNet were employed, but the time consumption is high.
In [5], a fuzzy-based K-means clustering technique is utilized to generate unique features, and an artificial neural network is used for feature extraction. A gray-level co-occurrence matrix, which strengthens the ability to diagnose efficiently, is also used to identify the affected part of the brain tumor region. However, noise is high and accuracy is low.
The new approach indicated that fuzzy rule-based classification was utilized for
the fuzzy partition and merged with CNN to remove noise from the dataset, enhance
accuracy, and decrease the processing time. The fuzzy partition method, a rule-
based approach with a membership function, is used to present a graphical represen-
tation of the patient’s health conditions that is simple to comprehend. The CNN-based
classifier is utilized to more accurately update the membership function and is also
employed to speed up the processing of the enormous data collection.

3 Proposed Work

The brain tumor machine learning real-time databases are gathered from a popular MRI scan center. The following phases were added to reduce noise, increase accuracy, and reduce processing time. The hybrid fuzzy-based K-means methodology, fully
connected network-based feed-forward, and backpropagation were applied to illus-
trate the specific area of the affected brain tumor region. Phase 1: To illustrate
the patient’s health problems graphically, fuzzy rule-based membership functions
were employed. Phase 2: Convolutional neural networks are utilized to modify the membership function, avoid the overlapping problem, and classify the clusters. However, as there will be noise in the data during categorization, noise in the enormous data collection is filtered using a fuzzy rule-based method, and the classification of the statistical data makes good use of feed-forward and backpropagation.

4 Methodology

The new technique utilizes a hybrid approach to compare the performance of the fuzzy
K-means approach and convolutional neural networks. To display the performance
outcomes, the real-time datasets are evaluated and contrasted with the validation
data. The patient reports for brain tumors are gathered from a well-known MRI scan
center, developed as a machine learning database, and utilized. However, a fuzzy set
cannot, by itself, remove the noise data for the huge data in dimensionality reduction.
So, to determine the dimensionality reduction of the huge data, a fuzzy K-means
clustering rule-based algorithm was presented. To evaluate the accuracy performance,
all correctly classified values were counted and divided by the total number of objects. Figure 1 shows the fuzzy classifications that make use of
deep learning. Additionally, the fuzzy datasets were applied to convolutional neural
networks based on fully connected neural networks to obtain more accurate results.
For the diagnosis of hydrocephalus, four common MRI modalities FLAIR, T1, T1c,
and T2 are employed. The Fluid Attenuated Inversion Recovery (FLAIR) modality
is employed to detect the entire tumor component among these four modalities. The
core tumor area is seen in the T2 modality of the MRI datasets. The enhancing tumor
portion of the core tumor region is identified by the T1c modality.
The data are classified by utilizing fuzzy K-means and CNN-based fully connected
networks built based on convolutions. Effective outcomes have been employed in
fuzzy to deal with ambiguity. In the fuzzy set theories, the traditional bivalent set
is referred to as a crisp set. Everything operates according to a rule of true or false.
An object either lies fully within a set or partially within a set, under operations such as union and intersection. In order to identify groupings that have not been explicitly
classified in the data, the K-means clustering technique was utilized. The K-means
algorithm optimizes similarity between data points inside clusters while minimizing

Fig. 1 Fully connected network

similarity between points in various clusters. The afflicted area was successfully iden-
tified using a CNN-based fully connected network that used feed-forward and back-
propagation to illustrate brain tumor detection. Similar features can be programmed
in these networks. A number of hidden layers are present in networks that have been
properly trained and set up to map the knowledge of the input and output training
pairs.

5 Preprocessing

Data preprocessing is the initial processing applied to data before analysis. Data cleaning: inconsistencies in the data are resolved to restore missing values and reduce noise. Data integration: by merging data from several sources, conflicts in the data are resolved. Data transformation: the collected data are normalized and generalized. Data reduction: this stage aims to give the datasets a reduced representation. Data discretization: continuous features are binned to narrow the range of characteristic components. The data collection includes features, numerous classes, and real-time data. The dataset to be evaluated was separated into a training dataset and a testing dataset. Irrelevant features have been removed from the dataset, which was collected from the MRI scan machine learning database; it is now in a format that makes the original data understandable. Data preprocessing was used to tackle such issues.
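A minimal sketch of such a preprocessing flow is shown below, assuming a pandas/scikit-learn environment; the column names mirror attributes of Table 1 but the values are illustrative, not the actual patient data.

```python
# Illustrative preprocessing sketch: cleaning, normalization, train/test split.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({"Age": [30, 40, 50, 60],
                   "T1": [0.0, 2.7, 4.7, 6.5],
                   "label": [0, 0, 1, 1]})      # hypothetical toy records
df = df.dropna()                                 # data cleaning
X, y = df[["Age", "T1"]], df["label"]
X = MinMaxScaler().fit_transform(X)              # data transformation (normalize)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
```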

6 Fuzzy K-Means Approach

To find a crisp set and a fuzzy classification function, the following steps are applied in order: (1) fuzzy set construction, (2) the fuzzy K-means algorithm, and (3) prediction and instance selection.

6.1 Fuzzy Set

A fuzzy set is a category of objects having continuous grades of membership. Each object is assigned a membership grade ranging from zero to one, and the collection is defined by these membership properties. Concepts like inclusion, union, intersection, complement, relatedness, and convexity apply to such sets. Fuzzy sets are created from two CSV files containing sample datasets, after which the decision attributes and condition attributes are applied to the decision tables. Applying union and intersection to the decision table yields a crisp set, and the frequency range of the in-universe set is shifted from 0. Finally, a fuzzy membership application was made employing the fuzzy partition
variables and fuzzy rules. Systems of information or decision-making consist of an
ill-defined, approximate group of objects with attributes and rows. The choices have
been given in the columns along with the conditions in each row [6]. A crisp universe
partition and a family of crisp equivalence classes were constructed using a crisp
equivalence relation. In accordance with this, a number of fuzzy equivalence classes
are generated by a fuzzy equivalence relation, commonly known as “fuzzy knowledge
granules,” “fuzzy partition of the universe,” “fuzzy decision characteristics,” and
“fuzzy condition characteristics” that are all possible. Dimensionality reduction is
handled in a crude manner, but it cannot eliminate noise from the huge data. In
order to handle the enormous dataset and eliminate noise, fuzzy is used jointly to
improve the great majority of the dataset’s real-valued properties. The ambiguous,
hard surface fuzzy equivalence classes are established as follows:

μ_R(x, y) = μ(x, y), ∀ y ∈ X,   (1)

where X = (x_1, x_2, …, x_n) signifies the complete sample set, x and y denote n-dimensional feature vectors, and x, y ∈ X denotes two samples from the whole set. ∀ y ∈ X means that the relation holds for every y belonging to X. μ_R(x, y) denotes the membership of the fuzzy set on X. Two attributes, the decision attribute and the condition attribute, are suggested with the rules in the search for the relation. A kernel-based Gaussian function was developed to determine the imperceptible relationship between the decision attribute and the condition attribute, denoted by

μ_R(x, y) = exp(−‖x − y‖² / s),   (2)

where μ_R(x, y) is a fuzzy similarity relation that can be built from any distance function or kernel; −‖x − y‖² specifies the expression for the negative values, and exp(−‖x − y‖²/s) denotes the expression for the positive values.

6.2 Improving Fuzzy K-Means Algorithm

Unsupervised learning is accomplished using the K-means algorithm. The number


of groups or clusters needed to categorize the objects is indicated by the letter “K”
in the algorithm’s name. The items will be divided up into K groups or clusters of
resemblance by the algorithm. Euclidean distance is used as the measure of that similarity. The algorithm first initializes K points, then assigns each object to its nearest cluster centroid and updates the centroid coordinates as the averages of the items assigned to that cluster so far; the process is repeated for a given number of iterations until the clusters are obtained [7].
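The fuzzy variant replaces these hard assignments with graded memberships. The NumPy sketch below follows the standard fuzzy c-means update rules as one plausible reading of the fuzzy K-means step; the fuzzifier m = 2 and the iteration count are illustrative choices, not values from the chapter.

```python
# Minimal fuzzy K-means (fuzzy c-means style) sketch.
import numpy as np

def fuzzy_kmeans(X, k, m=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, k))
    U /= U.sum(axis=1, keepdims=True)            # fuzzy memberships sum to 1
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]   # membership-weighted centroids
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-9
        U = 1.0 / (d ** (2 / (m - 1)))            # closer centroid -> higher membership
        U /= U.sum(axis=1, keepdims=True)         # renormalize memberships
    return centers, U

centers, U = fuzzy_kmeans(np.random.rand(50, 2), k=3)
```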

6.3 Fuzzy Instance Selection

The rules that have been created are used to forecast the decision values and classifications of brand-new data. With a large amount of data, they can filter out the noise and identify reliable data. The vast amount of data makes it exceedingly challenging to determine its level of dependence; therefore, data reduction is crucial for accuracy objectives. The fuzzy degree of membership is evaluated during instance selection.
The two fuzzy positive regions determine Fuzzy Instance Selection (FIS). Threshold:
The amount that determines whether or not an object can be eliminated. Alpha is
a variable used to gauge fuzzy similarity [8]. A crucial idea that is also frequently
misinterpreted is probability. One area where this is crucial is in the understanding
of risk and relative risk. The model is created using the training and testing sets of
data. Based on the individuals’ ages and genders, the importance was determined
(male or female). Analysis has been done on the features to see if they can accurately
predict values in the histogram and plot after the data has been transformed and
before reduction is widely used.

7 Brain Tumor Detection Based on Convolutional Neural


Networks (CNNs)

A deep neural network (DNN) is organized analogously to the brain, which is the inspiration for how a neural network works: its more than 100 billion neurons enable it to process and compute massive volumes of data in a sophisticated way. There are three
different kinds of neural networks: convolutional neural networks (CNNs), recur-
rent neural networks (RNN), and artificial neural networks (ANN). Convolutional
neural networks are designed with several layers to extract features from raw input.
CNN’s primary layer is broken up into three sections, for instance, convolution,
pooling, and fully connected layers. In this new approach, a fully connected network
is suggested. There are numerous ways to alter CNN’s architecture. Every neuron in
a layer communicates with every other neuron in the layer above it. The input layer
is the first layer in the hierarchy and links the inputs and pixels directly. Datasets
are therefore collected and supplied to the input layer. The convolution layer then
receives these data samples from the input layer and performs feature extraction on
them [9].
The convolutional neural networks are employed in the most recent classification
technique. The dataset consists of two sets: a training dataset and a dataset for clas-
sifier testing. The process of dividing a dataset into two halves so that the training and test datasets have equal sample sizes is called holdout cross-validation. In our analysis, each set contains 127 samples. After the dataset has been split into two
sets, the CNN classifier is constructed using the training dataset. The test dataset is
then used to gauge the accuracy of the created classifier.

7.1 Convolution Layer

When the extraction process according to the kernel size is complete, each neuron represents a certain region of the preceding layer. The 3 × 3 kernels, which are ingrained in the architecture and fast in testing, strengthen and normalize connections across the models. Three different FCN kinds and six different convolution layer types make up the advanced brain tumor prediction. A CNN makes patch-wise predictions of the probability distribution, whereas FCN models predict pixel-level probability distributions in great depth; each pixel is informed by its neighborhood pixels. Post-processing is incorporated with the hydrocephalus datasets to increase accuracy and forecast value.
 

v_j^l = f( Σ_{i∈M} d_i^{l−1} ∗ k_{ij}^l + b_j^l )   (3)

The input datasets d_i are convolved with learnable kernels k_ij and passed through the activation function f(·) to construct the feature map v_j^l produced in the convolution layer; d_i^{l−1} serves as the input channel's representation, and v_j^l denotes the result of the convolution layer. A bias term b_j^l is then added to each feature map after the input dataset is convolved with the learnable kernels. A
bias vector is a set of neural network weights that is equivalent to the output of a
zero-input artificial neural network and does not require any input. Pre-output layers
each contain an additional neuron termed bias.

7.2 Pooling Layer

According to the design, the pooling layer comes after the convolution layer and aids in reducing the dimensionality of the convolution outputs. Pooling layers often come in two varieties: the average pooling layer and the max pooling layer. With average pooling, the results are averaged, whereas max pooling selects the brighter pixels. Max pooling is used when there are sharp features that would otherwise not be recognized [10]; against a black background, the pixels of interest are lighter. Max pooling layer:
Following the application of kernels to the input datasets, a subsampling layer is
used to generate geographical and configuration invariance. It might result in a 50%
reduction in the computation time for feature maps. The maximum pooling layer is
achieved in the non-overlapping neighborhoods by Eq. (4).
 
z_ijk = max_{(n,m)} v_{i, j+n, k+m},   (4)

where i and j index the rows and columns updated by max pooling and k is the centroid point. Max pooling is utilized to find the greatest value within the n × m neighborhood.
The fully connected network’s input vector receives the output of the subsampling
layer after that.
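A small numeric sketch of Eq. 4 with non-overlapping 2 × 2 windows, which halves each spatial dimension of the feature map:

```python
# Non-overlapping max pooling: each output value is the max of an n x m block.
import numpy as np

def max_pool(v, n=2, m=2):
    h, w = v.shape[0] // n, v.shape[1] // m
    return v[:h * n, :w * m].reshape(h, n, w, m).max(axis=(1, 3))

v = np.arange(16).reshape(4, 4)
print(max_pool(v))   # [[ 5  7] [13 15]]
```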

7.3 Fully Connected Network (FCN)

A fully connected network (FCN) was developed by Long et al.; a completely linked network representing the fully connected layer is returned, and the extraction of high-level structures takes place in these fully integrated layers [11]. The weights of the neural connections and kernels are optimized continually throughout the backpropagation process; this layer improved the prediction value by implementing dense pixel-based prediction. Feature maps are gathered by the convolution layers along with sample output to obtain high accuracy. All
of the neurons connected to the neurons of the layer below are connected to the fully
connected network layers in Fig. 1. The attributes Age, Sex, T1, T2, FLAIR, V1,

V2, V3, and V4 make up the FCN. The neurons of each layer are connected to those of the next layer in the figure. All of the layers carry error signals. Through 115 stages,
869.107694 error rates are calculated. The accuracy of the general set of data and
the pixel-wise prediction from the specific datasets were both improved by a fully
connected network. Any size of input from datasets is obtained using dense-based
pixels. After that, the values of the fuzzy rough set dimensionality reduction are
mapped using FCN. In completely connected neural networks, which are artificial
neural networks, all of the nodes, or neurons, in one layer are connected to the neurons
in the subsequent layer. The fully connected layer is processed as a feed-forward and
back-forward neural network.

7.4 Feature Extraction

To extract distinctive characteristics from datasets that cannot be altered by prepro-


cessing, transformation, reduction, or prediction, a trainable feature extractor is
utilized. The fully connected layer computation is divided into two sections: forward propagation and backpropagation. The input data is forwarded from the input layer through many hidden levels to produce an output signal. The forward propagation computation for each layer, the fully connected layer (FCL), is covered in this section. The weight connection multiplies the input signal,
which is subsequently combined with a bias. Equation 5 can be used to build a layer
that is completely connected.


x_j = Σ_{i=0}^{m} w_ij × y_i^{l−1} + b_j   (5)

y_i^{l−1} is the input of the M-FCL from the previous layer's output, where m is the total
number of inputs that nerve cell j has received. Biased values are added to the sums
produced at each node during the forward phase (excluding Input nodes). In other
words, the bias related to a particular node is added to the score before utilizing the
activation function at that node. The M-FCL output signal y_j^l is defined as follows:
 
y_j^l = f(x_j^l),   (6)
 
where f(x_j^l) stands for the fully connected layer's activation function. The output
signal is then compared to the desired output, which results in an error that is returned
and relayed layer by layer back into the network, as detailed in the accompanying picture.
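Equations 5 and 6 amount to a weighted sum plus bias followed by an activation, as in the short NumPy sketch below; the layer sizes and the tanh activation are illustrative assumptions.

```python
# Sketch of the fully connected forward pass of Eqs. 5-6.
import numpy as np

def dense_forward(y_prev, W, b, f=np.tanh):
    x = W @ y_prev + b        # Eq. 5: weighted inputs plus bias
    return f(x)               # Eq. 6: activation of the summed input

y_prev = np.array([0.2, -0.4, 0.9])            # outputs of the previous layer
W = np.random.default_rng(0).normal(size=(2, 3))
b = np.zeros(2)
print(dense_forward(y_prev, W, b))
```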
The deep convolutional neural network uses back and forward propagation in
Fig. 2. The attributes Age, Sex, T1, T2, FLAIR, V1, V2, V3, and V4 make up the propagation. The neurons are coupled for forward and backward propagation

Fig. 2 Forward propagation and backpropagation

in the illustration [12]. All of the layers carry error signals. Through 189 steps,
870.492308 error rates are calculated. Backpropagation involves calculating an error
signal by contrasting the network’s output with the desired output. The ensuing erro-
neous signal travels layer by layer backward through the network. The convolutional
forward network’s weight connections are updated in this study using the stochastic
gradient descent backpropagation method. When the forward propagation parameters
are altered, the error signal is sent back to the convolutional layer and subsampling
layer. The subsampling layer multiplies the local gradient. The local gradients of the
convolutional layer are defined as follows for kernel updates:
   
δ_j^l = f′(u_j) · up(δ_j^{l+1})   (7)
 
f′(u_j) is the derivative of the activation function, u_j is the input prior to activation,
and up(.) is the subsampling of neighborhood gradients from layer l + 1, which is
the subsampling layer. The gradient of the bias b_j is determined by summing each component of δ_j^l as follows:

∂E/∂b_j = Σ_{q,r} (δ_j^l)_{qr},   (8)

∂E/∂k_{i,j}^l = Σ_{q,r} (δ_j^l)_{qr} (p_i^{l−1})_{qr}.   (9)

Finally, Eq. 9 computes the gradients for updating the kernels, where (p_i^{l−1})_{qr} is the patch in u_i^{l−1} that is multiplied element-wise with k_{i,j}^l when computing the element at (q, r) in the convolutional map's output u_i^l [13].

7.5 Feature Selection

The features were used to obtain optimal brain features from the collection of the
selection approach. Multiple feature subsets are integrated into group-based feature

selection by picking the best subset of features using a combination of classical


rankings to increase categorization and accuracy. Diverse datasets, feature subsets,
and taxonomies can be used to achieve diversity. To increase the performance of the
classification, the feature selection procedure removes irrelevant features from the
dataset. The analysis has strengthened dimensionality reduction. Therefore, feature
selection is used to improve the overall brain tumor recognition process and to
identify the region either as a normal region or an abnormal region and type of
tumor [14]. Using the R tool, a density plot illustrates the kernel density estimate
for a numeric variable that is used to display the probability function. The T1, T2,
and FLAIR probabilities are taken and explained in the list below. The distance
between the two observations is handled by CNN. Depending on the measured
and unmeasured distances, the likelihood of these observations varies. Although
the wrapper approaches offer superior categorization, they are quite sophisticated
and have problems with data over-fitting. The benefits of both the filter and wrapper
approaches are combined with higher performance and reduced complexity in the
hybrid approaches. When compared to existing techniques, a feature selection-based
classifier was used to achieve greater classification accuracy [15]. The presented
result shows that the suggested strategy can be successfully applied for the early
detection of brain tumors. It is more accurate in classifying things than the methods
now in use. CNN categorization is also effective, and adopting the provided method
for classifying data would work better. Recent advancements in data analysis and
computational technologies have significantly improved the classification of brain
tumors. Early detection improves treatment outcomes [16].

8 Implementation of Result

Techniques for hybrid fuzzy K-means cannot manage noisy data on their own. To
implement fuzzy in the membership function, fully connected network-based feed-
forward and backpropagation were integrated. Table 1 contains a summary of the
sample dataset.
Table 1 shows each dataset including the number of features (real-time dataset
attributes). To illustrate the prediction value, the fitted value between 0 and 1 is also
referred to as the predicted value 0, 1, or near 1. With this prediction, the value was
found to be accurate.

Table 1 MRI scan real-time datasets

Age  M/F  T1   T2   FLAIR  V1  V2  V3  V4
30   F    0    0    0      0   0   1   1
40   F    2.7  2.7  2.5    0   0   0   1
50   F    4.7  4.5  3.5    0   0   0   4
60   M    6.5  5.4  5      0   0   1   1

Finally, Table 2 shows the overall performance of fuzzy K-means (FKM) with and without reduction (RED) and integration with convolutional neural networks (CNNs). The accuracies attained by FKM, FKM-RED, FKM-CNN, and FKM-RED-CNN are 84, 86, 88, and 90%, respectively, showing that the fuzzy K-means reduction integrated with CNN has the highest value, 90%.
Figure 3 compares the performance of our fuzzy K-means clustering integrated with a fully connected network. In the end, the overall classification analysis demonstrates that FKM-CNN attained high accuracy.
To evaluate, the results are compared with those from the previous method to
demonstrate the suggested method’s effectiveness. The accuracy of several classi-
fications of diagnoses of brain tumors with and without the suggested strategy is
shown in the datasets [4, 6, 7, 16].
Without a prediction strategy, the accuracy of the different classifications on the
brain tumor dataset would be substantially lower. Therefore, after the prediction
strategy the accuracy has been taken for the brain tumor dataset. Table 3 compares
the accuracy of various classifications with and without the suggested approach. The
aforementioned table shows that, in terms of classification accuracy, the suggested method outperforms all other classifications on the dataset.

Table 2 Classification accuracy results

FKM      FKM-RED   FKM-CNN   FKM-RED-CNN
0.83673  0.85714   0.87755   0.89795

Fig. 3 Accuracy results of FKM-CNN (bar chart comparing the accuracy of FKM, FKM-RED, FKM-CNN, and FKM-RED-CNN)

Table 3 Classification accuracy of comparative study

Accuracy-based methodology     Accuracy
Fuzzy rough optimization HOG   84.70
KNN FNPME-FS                   87.38
DEFRS SVM                      87.03
Neural network FRQR            80.06
SegNet Max DT                  85.00
Fuzzy K-means FKM-CNN          89.795

Bold in Table 3 denotes that, when compared to the current state of the art, our new method, FKM-CNN, provides better accuracy.

9 Conclusion

To explore the new approach discussed, soft computing tools were considered, analyzed, and compared on the training dataset and the test dataset; the test data was found to have the highest prediction value. This work breaks the entire process into phases and provides facts
on the most common types of brain tumors. Fuzzy K-means and convolutional neural
network outcomes were compared, and the data was assessed for how to separate
brain tumors, a particular type of brain tissue, from real-time MRI data. The suggested
method uses less time and reduces noise. Also, the FKM, FKM-RED, FKM-CNN, and FKM-RED-CNN accuracies are 84, 86, 88, and 90%, respectively, showing that FKM-RED-CNN accuracy is better than the other solutions for early brain tumor detection. The
development of each field related to FKM-RED-CNN has been promoted but is still
in its infancy, and a variety of problems are still unresolved.

References

1. Wong T-T, Liang M-L, Chen H-H, Chang F-C (2011) Hydrocephalus with brain tumors in
children. In: Child’s nervous system, vol 27(10). Springer, pp 1723–1734
2. Bulat M (1993) Dynamics and statics of the cerebrospinal fluid: the classical and a new
hypothesis. In: Intracranial pressure VIII. Springer, pp 726–730
3. Rajiga SV, Gunasekaran M (2021) Techniques of image processing and segmentation in
predicting hydrocephalus using magnetic resonance image. In: 7th international conference
on advanced computing and communication systems (ICACCS) (1). IEEE, pp 1942–1945
4. Chen X, Li D, Wang P, Yang X (2020) A deep convolutional neural network with fuzzy rough
sets for FER. IEEE Access 8:2772–2779
5. Sharma M, Purohit GN, Mukherjee S (2018) Information retrieves from brain MRI images for
tumor detection using hybrid technique K-means and artificial neural network (KMANN). In:
Networking communication and data knowledge engineering, vol 14. Springer, pp 145–157
6. Jeyavani M, Karuppasamy M (2022) EEG in optic nerves disorder based on FSVM using kernel
membership function. In: ICT with intelligent applications, vol 1(16). Springer, pp 144–154
7. Vijay J, Subhashini J (2013) An efficient brain tumor detection methodology using K-means
clustering algorithm. In: International conference on communications and signal processing
(ICCSP). IEEE Xplore, pp 653–657
8. Wei J, Chang Z, Mao L (2021) Matrix-based optimistic multigranulation fuzzy covering rough
sets. In: 2nd international conference on big data. IEEE, pp 838–841
9. Akkus Z, Galimzianova A, Hoogi A, Rubin DL, Erickson BJ (2017) Deep learning for brain
MRI segmentation: state of the art and future directions. J Dig Imaging 30(4):449–459. Springer
10. Naceur B, Mostefa MA, Saouli R, Kachouri R (2020) Deep convolutional neural networks
for brain tumor segmentation: boosting performance using deep transfer learning: preliminary
results. In: International MICCAI brain lesion workshop. Springer, pp 303–315
11. Hesamian MH, Jia W, He X, Kennedy P (2019) Deep learning techniques for medical image
segmentation: achievements and challenges. J Dig Imaging 32(4):582–596
12. Gupta TK, Raza K (2020) Optimizing deep feedforward neural network architecture. In: A
tabu search based approach. Springer (Neural Process Lett 51(3):2855–2870)
13. Lau MM, Phang JTS, Lim KH (2019) Convolutional deep feedforward network for image clas-
sification. In: 7th international conference on smart computing and communications (ICSCC).
IEEE, pp 1–4

14. Jansi Rani M, Karuppasamy M (2022) Cloud computing-based parallel mutual information
for gene selection and support vector machine classification for brain tumor microarray data.
NeuroQuantology 20(6):6223–6233
15. Jansi Rani M, Karuppasamy M, Prabha M (2021) Bacterial foraging optimization algorithm
based feature selection for microarray data classification. Mater Today Proc. Elsevier
16. Alqazzaz S, Sun X, Yang X, Nokes L (2019) Automated brain tumor segmentation on multi-
modal MR image using SegNet. In: Computational visual media, vol 5(2). Springer, pp 209–219
Chapter 11
Precipitation Forecasting: LSTM
Modeling in Visual Analytic Framework

Sudha Govindan and Suguna Sangaiah

1 Introduction

1.1 Visual Analytics Approach

Visual analytics is the broad branch of visualization that deals with systematic investigation of the input, implementation of a suitable data/analytical model, investigation of the outputs, and assessment of the model through visualizations constructed alongside each stage of application development. It is well known that 'a picture is worth a thousand words': visualizations make the understanding and assessment of underlying models easier. This research work uses LSTM as the modelling component in a visual analytic framework; the performance of the LSTM is examined through visualizations generated by TensorFlow depicting the losses.

1.2 LSTM

Long short-term memory (LSTM) is a recurrent neural network in which learning is controlled relative to a 'specified number of past days'; the weightage of trend and seasonality is learnt here. This specialty of LSTM benefits time series-based applications, which are naturally nonlinear. LSTM networks contain padded sequences of LSTM memory cells capable of removing

S. Govindan (B)
Madurai Kamaraj University, Madurai, Tamilnadu, India
e-mail: g.sudha79@rediff.com
S. Sangaiah
PG and Research Department of Computer Science, Sri Meenakshi Government Arts College
for Women (A), Madurai, Tamilnadu, India

128 S. Govindan and S. Sangaiah

or adding information. The input layer of an LSTM network has a number of LSTM
cells to accept the input vector, followed by an LSTM hidden layer consisting of a
forget unit, which performs weightage-based recent learning with respect to past days,
and an output unit, which delivers the learning outcome to the next LSTM/dense/dropout layer.
LSTM requires input features (represented by 'x') arranged in an associated manner,
accumulated as batches, and fed in sequence. The input-output association is learnt
by the LSTM during training, which runs over a number of epochs. Cells are connected
to the cells of the next layer through weighted links; the weight matrix 'w' denotes the
assigned weights, the cell state at time t is given by 'c_t', the bias is denoted by 'b',
'o' represents the cell output, and 'f' represents the forget state of the memory cell.
Each LSTM cell contains (a) an input gate, whose sigmoid layer (Eq. 1) controls which
values are to be updated, and a tanh layer (Eq. 2) (a different function may be used as
demanded by the underlying application) assigning weights to the values to be added
to the cell state.

i_t = σ(W_i · [h_{t−1}, x_t] + b_i)    (1)

ĉ_t = tanh(W_c · [h_{t−1}, x_t] + b_c)    (2)

(b) a forget gate, which decides the amount of the past value to be forgotten (Eq. 3).

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)    (3)

(c) an output gate, which decides what part of the current cell is to be delivered as
output through the layers. The sigmoid layer (Eq. 4) decides which part of the cell
state is selected for output; the next layer, tanh, renders the weights using Eq. 5.

o_t = σ(W_o · [h_{t−1}, x_t] + b_o)    (4)

h_t = o_t ∗ tanh(c_t)    (5)

A wide variety of activation functions can be used depending on the domain of the
attributes. The widely used sigmoid function squeezes values between 0 and 1 to
enforce the amount of forgetting, and tanh squashes the output to the range −1 to +1
to limit gradient diminishing. Bias and dropout layers are optionally added to the
network in order to prevent over-penalization and overfitting of the model.

1.3 Background Study

Several studies have shown that LSTM-based forecasting captures recent trends and
seasonality. Lee et al. [1] summarized research on guiding traffic using LSTM and
suggested that a 'tanh' activation at the hidden layer and a 'softmax' activation at the
output layer, run for 50 epochs, yield better prediction. They concluded that LSTM is
capable of learning nonlinearity.
Anandharajan et al. [2] presented weather prediction based on regression in an
artificial intelligence approach, illustrated with a cost function aiming to minimize the
mean square error (MSE); the cost function was minimized using gradient descent.
Patlakas et al. [3] predicted wind gust speed through a polynomial Kalman-filtering
local adaptation model. The research works presented in [4, 6, 8] prove that LSTM
effectively learns past-day observations and predicts better than other machine
learning and neural network models; the results were highly correlated with actual
values since the prediction gives high relevance to the recent-past-days scenario.
Zhao et al. [5] presented work predicting the air quality index through a data mining
technique based on temporal correlation; an LSTM does the same meaningfully and
additionally learns the trend. Shah et al. [7] presented a feed-forward neural network
with a 'logsig' activation function for the hidden layers and a 'puresig' activation
function for the output layer, and used analytical equations to predict pollution
features. Khairudin et al. [10] concluded
that LSTM outperforms the decision tree, support vector machine, and random forest
algorithms in weather forecasting applications. Sudha et al. [9, 11, 12] presented the
way in which visual analytics improves insight procurement while adopting linear
regression and autoregressive moving average models for weather and pollution
feature predictions.

1.4 Proposed Methodology

Precipitation is forecasted by an LSTM after it learns the past year's recordings of
weather and weather-influencing features. The underlying real-time dataset (recorded
at multiple weather and pollution monitoring stations of Chennai city, Tamilnadu,
India) [13-18] used in this work has three weather-related attributes (temperature,
humidity, and visibility) and two weather-influencing attributes (particulate matter of
size 2.5 µm (PM2.5) and nitrogen dioxide), recorded in 2018. One-year observations
are fed as input, and a 15-day forecast is estimated; the entire work is coded in Python
using the TensorFlow framework. The works [9, 11, 12] elaborated missing data
procurement through Pearson correlation and the Autoregressive Integrated Moving
Average (ARIMA) model.

Preprocessing
Preprocessing of the dataset is mandatory and is achieved through either normalization
or standardization. Normalization squeezes the attribute values between 0 and 1;
standardization transforms the values into certain lower discrete levels, generally
about mean zero. Such preprocessing aids the LSTM by reducing payload, easing
convergence, and improving training time. Algorithm 1, PreProcess, applies various
normalization and standardization techniques to the given dataset; the respective
steps are summarized as comment lines preceded with '//'. The goodness of a
transformation is assessed by plotting the attribute values in a histogram visualization:
if the visualization of the residuals tends to a bell-shaped curve about zero or some
constant, then standardization is achieved. Table 1 summarizes the application of
different scaler functions on the underlying dataset and their impacts.
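Algorithm 1 itself is not reproduced here, but its spirit can be sketched with scikit-learn's scalers, which correspond to the 'Scalar transformation' column of Table 1 (MinMax, Robust, Yeo-Johnson). The helper below is an illustrative stand-in, not the authors' code; the histogram plot implements the bell-curve check described above.

import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler, RobustScaler, PowerTransformer

def preprocess(data, method="minmax"):
    # Illustrative stand-in for Algorithm 1: scale each attribute column and
    # return both the transformed data and the fitted scaler (needed later
    # to invert the forecasts).
    scalers = {
        "minmax": MinMaxScaler(feature_range=(0, 1)),    # squeeze to [0, 1]
        "robust": RobustScaler(),                        # median/IQR scaling
        "yeo-johnson": PowerTransformer(method="yeo-johnson"),
    }
    scaler = scalers[method]
    transformed = scaler.fit_transform(data)
    # Histogram check: a roughly bell-shaped plot about zero (or a constant)
    # suggests standardization was achieved for this attribute.
    plt.hist(transformed[:, 0], bins=30)
    plt.show()
    return transformed, scaler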

Training the LSTM Network


The underlying dataset is split into training, validation, and testing datasets in the
ratio 80:10:10. The training and validation losses of the LSTM are monitored as per
the visual analytic approach, with in-parallel generation of visualizations depicting the
learning processes happening in the underlying LSTM structure. Various LSTM cell
activation functions are applied, and a summary of the LSTMs' performance is
presented in Table 1. This research work adopted MSE and mean absolute error
(MAE) as loss functions at various LSTM network layers and used 'adam' as the
optimization function.
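As a concrete example, a hedged Keras sketch of the configuration listed for Model #7 in Table 1 (three 64-unit LSTM layers with swish activation, a Dropout(0.2) layer, dense layers toward the output, MSE loss, and the adam optimizer) follows; the width of the intermediate dense layer is an assumption, since the paper does not state it.

import tensorflow as tf
from tensorflow.keras import layers, models

PAST_DAYS, N_FEATURES = 15, 5   # 15-day lookback over the 5 recorded attributes

def build_model_7():
    # Sketch of Model #7 from Table 1: a 64-64-64 LSTM stack with swish
    # activation, a Dropout(0.2) layer, then dense layers to the output.
    model = models.Sequential([
        layers.Input(shape=(PAST_DAYS, N_FEATURES)),
        layers.LSTM(64, activation="swish", return_sequences=True),
        layers.LSTM(64, activation="swish", return_sequences=True),
        layers.LSTM(64, activation="swish"),
        layers.Dropout(0.2),
        layers.Dense(16, activation="relu"),   # assumed intermediate width
        layers.Dense(1),                        # next-day precipitation
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# 80:10:10 split and loss monitoring, as in the visual analytic approach:
# history = build_model_7().fit(x_train, y_train,
#                               validation_data=(x_val, y_val),
#                               epochs=50, batch_size=4)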
Table 1 LSTM model specification and its performance

Model # | LSTM structure | Scalar transformation | Activation function | Error measured in | Past days | Max. training loss | Max. validation loss
1 | 3 layers with (64-64-64) structure → Dropout (0.2) layer → Dense layer → o/p layer | MinMax | Linear | MAE | 15 | 0.0112676575 | 0.184092675
2 | 4 layers with (64-32-32-64) structure → Dropout (0.2) layer → Dense layer → o/p layer | MinMax | Linear | MSE | 15 | 0.0177675657 | 0.1572338656
3 | 2 layers with (64-64) structure → Dropout (0.2) layer → Dense layer → o/p layer | MinMax | Elu | MSE | 15 | 0.0230985708 | 0.078932875309
4 | 3 layers with (64-64-64) structure → Dropout (0.2) layer → Dense layer → o/p layer | MinMax | Tanh | MSE | 15 | 0.011926722712814808 | 0.24254712462425232
5 | 3 layers with (64-64-64) structure → Dropout (0.2) layer → Dense layer → o/p layer | Robust | Tanh | MSE | 15 | 0.011006745509803295 | 0.09347487777471542
6 | 3 layers with (64-128-64) structure → Dropout (0.2) layer → Dense layer → o/p layer | Robust | Swish | MSE | 7 | 0.0392828970 | 0.988347272014
7 | 3 layers with (64-64-64) structure → Dropout (0.2) layer → Dense layer → o/p layer | Robust | Swish | MSE | 15 | 0.019537563 | 0.07888357444
8 | 4 layers with (64-64-64-64) structure → Dropout (0.1) layer → Dense layer → o/p layer | Yeo-Johnson | Swish | MSE | 5 | 0.0187637065 | 0.8726627225603

Fig. 1 [Model #7]: visualization representing the loss occurred at the training and validation phase
of LSTM. X-axis represents the number of epochs and the Y-axis represents the loss

The quality of learning acquired by the underlying LSTM is plotted as a visualization
(refer to Fig. 1), which further helps the researcher choose the right activation function,
loss function, and LSTM structure for the dataset and application. Algorithm 2 presents
the methodology used by the proposed work to train and validate the LSTM. Algorithm
2 is recursive and invokes Algorithm 1 to normalize the input dataset.

This work exercised activation functions such as the exponential linear unit (elu),
exponential activation, selu (scaled exponential linear unit), gelu (Gaussian error
linear unit), linear (input unmodified), sigmoid, softplus, softsign, swish, and tanh.
Since precipitation is the forecast feature (with naturally minimal values), this work
used the MSE and MAE loss functions in order to assess the learning capacity of the
LSTM network.

1.5 Results and Discussion

Table 1 summarizes the training and validation loss observed in various LSTM models,
taking into account the standardization of the input, the activation function, the number
of past days for remembrance, and the error measurement adopted. Several models
with different structures and parameters were built and assessed; the eight best
models are tabulated in Table 1. Even though some of the models' losses are low,
they are not capable of producing precipitation predictions that correlate highly with
the actual observations. It is learnt that LSTM prediction cannot be judged by the
training and validation losses alone.
The best prediction is yielded by Model #7, having 17,920, 33,024, and 65 parameters
at the input, hidden, and dense layers (prior to the output layer), respectively.
Figure 1 represents the training and validation loss observed for 50 epochs of
Model #7. The training dataset input has nine months of observations with a batch
size of four. Figure 2 and Table 2 show the forecast made by Model #7 for the next
15 days, which is more than 90% close to the actual recordings; it also predicts rainy
and non-rainy days perfectly.
Differencing the input values improved results by 1-3%. Differencing the input yields
better forecasts than log or exponential transformations; in either case, the inverse
of the respective transformation must be applied to the outputs.
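A small sketch of this differencing step and its inverse (the series values are illustrative):

import numpy as np

series = np.array([0.0, 0.5150, 0.1266, 0.1948, 0.1276])  # sample precipitation values

diffed = np.diff(series)                 # first-order differencing fed to the LSTM
# ... the model forecasts differenced values; invert before reporting:
restored = np.concatenate([[series[0]], series[0] + np.cumsum(diffed)])
assert np.allclose(restored, series)     # the inverse transform recovers the original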

Fig. 2 [Model #7]: precipitation forecasting (shown in orange) for the next 15 days; the
prediction matches the real observations for both rainy and non-rainy days

Table 2 Predicted precipitation compared with actual recording

Day | Actual recording | Predicted precipitation
1 | 0 | 0.000001
2 | 0.515044 | 0.505088
3 | 0.126652 | 0.20049
4 | 0.194881 | 0.2011
5 | 0.127578 | 0.1189
6 | 0.052117 | 0.038
7 | 0.015975 | 0.03
8 | 0 | 0.0001
9 | 0 | 0.00009
10 | 0.3 | 0.3198
11 | 0.5 | 0.44876
12 | 1.2493 | 0.9872
13 | 0.94142 | 0.92
14 | 0.854374 | 0.83
15 | 0.673 | 0.5812

1.6 Conclusion

LSTM is capable of learning nonlinearity and of forecasting with respect to recent past
days (set by the domain expert). Improved forecasting was attained when custom
transformation and normalization/standardization were applied before feeding the
input to the LSTM. The accuracy of forecasts produced by the LSTM is higher than
that of traditional forecasting methods. Using different activation functions at the
hidden and output layers of the LSTM works well in both long-term and short-term
forecasting. The MinMax transformation with the elu or gelu activation functions
yields doubled prediction values. Use of the MaxAbsolute scaler failed to predict
non-rainy days, and application of the Quantile scaler failed to predict rainy days.

References

1. Lee C et al (2020) A visual analytics system for exploring, monitoring, and forecasting road
traffic congestion. IEEE Trans Vis Comput Graph 26(11):3133–3146. https://doi.org/10.1109/
TVCG.2019.2922597
2. Anandharajan TRV, Hariharan GA, Vignajeth KK, Jijendiran R, Kushmita (2016) Weather
monitoring using artificial intelligence. In: 2016 2nd international conference on computational
intelligence and networks (CINE), pp 106–111. https://doi.org/10.1109/CINE.2016.26
3. Patlakas P, Drakaki E, Galanis G, Spyrou C, Kallos G (2017) Wind gust estimation by combining
a numerical weather prediction model and statistical post-processing. Energ Procedia 125:190–
198

4. Korunoski M, Stojkoska BR, Trivodaliev K (2019) Internet of things solution for intelligent air
pollution prediction and visualization, pp 1–6. https://doi.org/10.1109/EUROCON.2019.8861609
5. Zhao G, Huang G, He H, Wang Q (2019) Innovative spatial-temporal network modeling and
analysis method of air quality. IEEE Access 7:26241–26254. https://doi.org/10.1109/ACCESS.
2019.2900997
6. Zhang B, Zhang H, Zhao G, Lian J (2020) Constructing a PM2.5 concentration prediction
model by combining auto-encoder with Bi-LSTM neural networks. Environm Model Softw
124:104600. ISSN 1364-8152. https://doi.org/10.1016/j.envsoft.2019.104600
7. Shah J, Mishra B (2020) Analytical equations based prediction approach for PM2.5 using
artificial neural network. SN Appl Sci 2:1516. https://doi.org/10.1007/s42452-020-03294-w
8. Askari B, Le Quy T, Ntoutsi E (2020) Taxi demand prediction using an LSTM-based deep
sequence model and points of interest. In: 2020 IEEE 44th annual computers, software, and
applications conference (COMPSAC), pp 1719–1724. https://doi.org/10.1109/COMPSAC48688.2020.000-7
9. Sudha G, Thangaraj M, Sangaiah S (2020) Numerical weather analysis using statistical
modelling as visual analytics technique. In: Venkata Krishna P, Obaidat M (eds) Emerging
research in data engineering systems and computer communications. Advances in intelligent
systems and computing, vol 1054. Springer, Singapore. https://doi.org/10.1007/978-981-15-0135-7_9
10. Khairudin NBM, Mustapha NB, Aris TNBM, Zolkepli MB (2020) Comparison of machine
learning models for rainfall forecasting. In: 2020 international conference on computer science
and its application in agriculture (ICOSICA), pp 1–5. https://doi.org/10.1109/ICOSICA49951.2020.9243275
11. Sudha G, Sangaiah S Insights through visualizations of air quality at Chennai city. In: Proceed-
ings of international conference on data science and information ecosystem’21 (ICDIE’21),
pp 135–138. ISBN 978-93-91373-04-7
12. Sudha G, Suguna S (2022) Health hazard: PM2.5 forecast—a visual analytic framework using
ARIMA. Int J Health Sci 6(S2):630–642. https://doi.org/10.53730/ijhs.v6nS2.5066
13. https://mausam.imd.gov.in/chennai/. Last accessed 2019/04/01
14. http://www.tnpcb.gov.in/air-quality.php. Last accessed 2019/09/22
15. https://smartcities.data.gov.in/. Last accessed 2019/10/18
16. https://www.tn.gov.in/. Last accessed 2019/11/21
17. https://app.cpcbccr.com/. Last accessed 2019/05/21
18. http://bhuvan.nrsc.gov.in/home/index.php. Last accessed 2019/01/09
19. https://www.noaa.gov/. Last accessed 2019/02/06
Chapter 12
Cyclone Forecasting Before Eye
Formation Using Deep Learning

Aryan Khandelwal, R. S. Ramya, S. Ayushi, R. Bhumika, P. Adhoksh,


Keshav Jhawar, Ayush Shah, and K. R. Venugopal

1 Introduction

A cyclone is a fast-moving storm that arises over the oceans and absorbs energy to
grow. Tropical cyclones are among the greatest hazards to property and life, even in
their initial phases of formation. Hence, the detection and prediction of cyclones in
the early stages, that is, before the formation of the eye, are important. Detection
and prediction of cyclones can be done using conventional means, i.e., practices that
involve specific parameters like temperature, wind speed, etc. Deep learning
techniques such as segmentation and time series forecasting are being used in recent
advances. The solution proposed in this paper elaborates on the aforementioned
neural network-based method. Furthermore, the cyclone is detected using K-means,
which is compared with other segmentation techniques like Detectron2 and the Mask
region-based convolutional neural network (Mask R-CNN). The proposed model has
applications pertaining to the detection and prediction of cyclones from satellite data;
prediction of forest fires is another application of this model.
Tropical cyclones are one of the biggest threats to life and property even in the
formative stages of their development. Once the eye of the cyclone is formed, it
reaches the shore rapidly and can cause a number of different hazards with significant
individual impacts, such as storm surge, flooding, and extreme winds. Hence, it is
crucial to detect the eye and predict the occurrence of the cyclone as early as possible.
Our model aims to solve this problem by providing prior predictions of the formation
of cyclones by analyzing changes in the surrounding parameters.


2 Related Works

Wen et al. [1] proposed deep learning-based image processing to categorize and
detect faults (cracks and pores) in SEM images of metallic AM components. Almost
any single imperfection may be classified as a fracture or a pore using the adjusted
CNN model. However, the model does not give accurate results with large data.
Nair et al. [2] noted that tropical cyclones are among the most dangerous weather
systems originating over the tropical oceans, with roughly 90 storms forming
annually throughout the globe. Quick identification and tracking of tropical cyclones
are crucial for advance warning to sensitive locations. As these storms occur over
open oceans distant from the landmass, remote sensing is required to detect them.
The model faces difficulty in detecting the edges of the domain.
Raza et al. [3] suggested the IR-MSDNET architecture to merge infrared and
visible images. This is done to ensure that the fused image retains significant elements
from both IR and visible images. Object detection trials on the IR-MSDNET model
revealed that the model did an excellent job of enhancing the features of the fused
image. This model is slower than other deep learning methods.
Lu et al. [4] proposed a semi-supervised approach for discovering the 2D param-
eters of extratropical cyclones (ET) in the Northern Hemisphere in this work. By
comparison, the new method effectively supports the old method by increasing the
number of recognized cyclones by 8.29%. The Mask R-CNN model is also excellent
in detecting horizontal characteristics in tropical cyclones. However, the labeling
process is not suitable for a large-scale labeled database.
Zhao et al. [5] presented the DeepGlobe building extraction challenge, which asks
participants to locate all the building polygons in the satellite photos provided. In the
studies, the Mask R-CNN approach achieves nearly comparable accuracy and
completeness. In comparison with Mask R-CNN, this method produces more
regularized polygons, which is advantageous in a range of applications. The
challenge for this model is detecting small objects and closely located buildings.
Rau et al. [6] designed the backpropagation trained artificial neural network
(BP ANN) to classify diverse sea surface conditions using information from high-
resolution AVHRR visible and infrared images. A neural network was further pro-
posed to forecast the movement of ice coverage using time series analysis. The
implementation of a motion-detection neural net for this model is still in progress.
Gupta et al. [7] used vast rainfall data to discover helpful storm patterns. Three
categories of storms (local severe storms, hourly storms, and overall storms) were
identified using MapReduce-based algorithms. Local severe storms, on the whole,
have the temporal features of storms that take place in a localized place, while
storms that occur at a specific hour have spatial characteristics. The paper uses
K-means clustering to identify distinct sorts of hourly storms based on their shapes
and sizes. However, the actual shapes of the cluster centroids in the experiment
screenshots are not adequately handled.

Emre Celebi et al. [8] noted that color quantization is a helpful technique in graphics
and digital image processing. The effectiveness of K-means as a color quantizer is
investigated in this work. Their study developed fast and precise K-means with
several initialization techniques and compared the quantizers to some of the most
prominent quantizers in the literature. All the proposed variants of K-means, each
with a different initialization scheme, involve randomness.
Pham et al. [9] suggest that the most important job in allowing for timely road
damage repair is to promptly and accurately classify and detect the damage. The
paper's tests show Detectron2 performing better than the Faster R-CNN
implementation because it uses various base models and parameters. When tested
on the X101-FPN base model, the F1-scores for Faster R-CNN and Detectron2 were
51.0% and 51.4%, respectively. The labeling process for this dataset still needs
improvement.
Liu et al. [10] note that wind energy is among the most rapidly expanding energy
sources on the planet and is ecologically viable. Wind speed prediction through time
series forecasting is critical for an accurate and efficient appraisal of offshore wind
energy, benefiting wind farm owners, grid operators, and end-users. One of the most
used models for predicting hourly wind speed in Scotland's offshore/coastal region is
the SARIMA model. The constructed prediction model was compared to the recently
developed deep learning-based algorithms LSTM and GRU, and a quantified
performance measure was generated. Among the three evaluated prediction models,
SARIMA had the best accuracy and robustness, though it has to be trained further
on more parameters to get accurate results.
Lim et al. [11] discussed how encoder and decoder designs have been applied in
one-step-ahead and multi-horizon time series forecasting. The paper also outlines a
hybrid approach to deep learning in which statistical methods are combined with
neural network elements to boost efficiency, and gives an overview of how deep
learning techniques can aid decision-making with time series data.
Karevan et al.'s [12] study shows that LSTM's capacity to recognize long-term
correlations has been widely used. The paper discusses how LSTM is applied to time
series data for weather forecasting. Here, a variant of LSTM, T-LSTM, is used.
T-LSTM, also known as transductive LSTM, uses local information for prediction, so
the test points have a bigger influence on making a prediction. Two different
weighting schemes were used, and the experiments were conducted at two different
times of the year to get better predictions. It is seen that T-LSTM works better in
terms of predictions. To prevent commonality between two consecutive days, only
the latest two samples in the dataset are taken into account, thereby making the
dataset smaller.
Geetha et al. [13] note that ARIMA, the auto-regressive integrated moving average
model, is well known for time series forecasting; it helps to predict unknown data in
a series. The model was used to forecast tropical cyclones, which cause a lot of
damage to humankind, so accurate prediction could avert many cyclone-related
disasters. The model is built on ARIMA-based time series modeling.

Table 1 Comparison of cyclone detection techniques

SN | Author | Year | Concept/algo | Advantage | Disadvantage
1 | Wen [1] | 2021 | Recognize defects in 3D-printed parts using Mask R-CNN | Accurately classifies all cracks and pores that exist in the original video | Further, the model must be improved with large data
2 | Nair [2] | 2021 | An automated TC detection from satellite images based on a novel deep learning technique | Hyperparameters of the entire pipeline are optimized to showcase the best performance | Sometimes, the model does not detect the edges of the domain
3 | Raza [3] | 2021 | Fusion of IR and visible images | Higher and low average miss rate | Slower than other deep learning methods like IFCNN, DeepFuse, FusionGAN
4 | Lu [4] | 2020 | Identifying extratropical cyclones in a quasi-supervised manner by using Mask R-CNN | Shows good performance in identifying the horizontal structures of tropical cyclones | Improve the labeling process by constructing a large-scale labeled database
5 | Zhao [5] | 2018 | Localizing all building polygons in the given satellite images | Mask R-CNN achieves almost equivalent performance in terms of accuracy and completeness | Extracting small buildings and closely located buildings is challenging

Table 1 compares the surveyed techniques by illustrating their advantages and
disadvantages, while Table 2 elaborates the various attributes, concepts, and
approaches of the referred research papers.

3 Hexagon Framework

The proposed model involves a semantic segmentation approach to detect cyclones


from satellite images. In this study, K-means and Detectron2 have been used for iden-
tifying cyclone regions in images. The images are further processed for time series
forecasting using bidirectional GRU. Figure 1 represents the proposed architecture.
Table 2 Summary of different models for cyclone detection

Author | Year | Dataset | Approach | Preprocessing | Efficiency | Activation | Optimizer
Wen [1] | 2021 | SEM dataset | X101-FPN, R50-FPN, R101-DC5 | Training Detectron2 model using Labelme | 90% (training set) | BCE | Adam
Nair [2] | 2021 | Geospatial dataset | CNN+LSTM | Image segmentation using Mask R-CNN | 86.55% (training set) | Softmax layer | Bayesian
Raza [3] | 2021 | Aerial image dataset, TNO dataset | IR-MSDNet | Feature extraction | Log average miss rate (MR) of 27.45 (longest among other models) | ReLU | Gaussian smoothing, IR feature suppression ratio
Lu [4] | 2020 | ERA-Interim dataset | SPP-Net and RoIAlign | CAA is used to initially generate the cyclone's regime (mask) | 90.87% (training set) | ReLU | Binary-classification
Zhao [5] | 2018 | COCO dataset | ResNet-101-FPN | Polygon regularization | 71.7% training set (test 1) | – | MDL

Fig. 1 Hexagon framework

3.1 Preprocessing

The INSAT-3D satellite records visible (VIS) and infrared (TIR-1) channel photos
that are included in the Dataset collection. Images are converted from .tif to.jpg format
during preprocessing, resulting in grayscale images with a resolution of 1074 * 984
pixels. This paper describes a method for improving cyclone image segmentation. It
consists of two steps: segmentation and batch normalization. The image intensities
are first standardized using pixel histograms during preprocessing. Morphological
processing is then used to remove the non-cyclone portions. During the segmenta-
tion process, Detectron2 was presented as a method for detecting cyclone zones in
images and compared to another popular method, K-means clustering. The batch
normalization inputs the shape of the image as 150 * 150 and adds it to the proposed
model. Figure 2 depicts a manually annotated image to identify the cyclone portion.
Figure 3 is treated as a series of clusters when processed under K-means.
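A minimal OpenCV sketch of the K-means clustering step is given below. The file names and the cluster count k are illustrative assumptions, and the histogram standardization and morphological cleanup described above are omitted for brevity.

import cv2
import numpy as np

img = cv2.imread("insat3d_frame.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file
img = cv2.resize(img, (1074, 984))

# Treat pixel intensities as samples and cluster them, so the bright cyclone
# cloud mass separates from the darker background.
pixels = img.reshape(-1, 1).astype(np.float32)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
k = 3  # illustrative cluster count
_, labels, centers = cv2.kmeans(pixels, k, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)

# Keep only the brightest cluster as the candidate cyclone region.
bright = np.argmax(centers)
mask = (labels.reshape(img.shape) == bright).astype(np.uint8) * 255
cv2.imwrite("cyclone_mask.jpg", mask)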

Fig. 2 Cyclone detection using Detectron2

Fig. 3 Cyclone detection using K-means clustering

3.2 CNN-Bi-GRU Model

There are two considerations when developing a model for sequential time series
forecasting. The convolutional neural network (CNN) covers spatial applications,
while the bidirectional gated recurrent unit (Bi-GRU) monitors temporal applica-
tions. Convolutional neural network (CNN) is a subset of neural networks used to
process images and perform functions like classification, prediction, and segmenta-
tion. CNN is a form of multilayer perceptron where every neuron in the current layer
is connected to the neurons in the next layer. CNN assists in changing the images into
a format that is simpler to analyze without compromising details that are essential
for making accurate predictions. The model’s preprocessing phase consists of three
local feature learning block (LFLB) layers, each led by two Bi-GRU-CNN layers.
The grayscale images fed into the model had 400 * 400 pixel dimensions. A series of
four images were used to train the model that produced a tensor with the dimension
(4, 150, 150, 1), and the final prediction was compared to the next image in the
series. The dataset contains 45 images, with four images in each set of five. Tanh is
the activation function used in the Bi-GRU-CNN layers, with a dropout of 0.3. The
prediction proceeds in the following manner: First, images at time sequences 1, 2,
3, and 4 are extracted from the INSAT-3D dataset, with each image occurring at a
30-minute interval, that the model uses as input for training. Using the four images,
the model predicts the image for the fifth time sequence and compares it to the actual
image.
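A sketch of this windowing step, assuming the preprocessed frames are already loaded as an array of shape (45, 150, 150, 1); the helper name is hypothetical.

import numpy as np

def make_sequences(frames, seq_len=4):
    # frames: array of shape (N, 150, 150, 1), ordered at 30-minute intervals.
    # Returns (inputs, targets) where each input stacks seq_len consecutive
    # frames and the target is the frame that immediately follows.
    xs, ys = [], []
    for start in range(len(frames) - seq_len):
        xs.append(frames[start:start + seq_len])   # shape (4, 150, 150, 1)
        ys.append(frames[start + seq_len])         # the fifth image
    return np.stack(xs), np.stack(ys)

# inputs, targets = make_sequences(frames)   # frames loaded from the dataset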

3.2.1 Bi-GRU

A bidirectional gated recurrent unit, often known as Bi-GRU, is a sequence processing
model made up of two GRUs running in opposite directions: it receives input from
both the forward and backward directions. A GRU, or gated recurrent unit, is a type
of recurrent neural network that is similar to an LSTM but has only two gates, a reset
gate and an update gate.
144 A. Khandelwal et al.

Fig. 4 A GRU unit

Figure 4 shows a single GRU unit, formed by combining the update gate X_t and the
reset gate Z_t. The equation for the reset gate is given in Eq. 1 and that of the update
gate in Eq. 2. The output k_t is controlled by both the current input V_t and the
previous state k_{t−1} while these two gates operate; Eq. 3 denotes the candidate
output formed by combining the two gates, and Eq. 4 shows the outcome computed
from the current input V_t and the previous state k_{t−1}. The gate outputs are
computed using the logistic sigmoid function, and the candidate output of the unit
uses the hyperbolic tangent. The weight matrices W_x, U_x, W_y, U_y, W_z, and U_z
are employed in this process, ⊙ denotes the Hadamard product, and C_x, C_y, and
C_z are the bias vectors applied to the input V_t and prior state k_{t−1}.

Z_t = σ(W_z ∗ V_t + U_z ∗ k_{t−1} + C_z)    (1)

X_t = σ(W_x ∗ V_t + U_x ∗ k_{t−1} + C_x)    (2)

Y_t = tanh[W_y ∗ V_t + U_y ∗ (Z_t ⊙ k_{t−1}) + C_y]    (3)

When working with the present data, models with a bidirectional structure have the
capacity to learn information from both past and subsequent data. The first GRU goes
forward, starting at the beginning of the data sequence, while the second GRU moves
backward, starting at the end of the data sequence. This enables knowledge from the
past as well as the future to affect the conditions of the present.

k_t = (1 − X_t) ⊙ k_{t−1} + X_t ⊙ Y_t    (4)

→k_t = GRU_Fwd(V_t, h_{t−1})    (5)



Table 3 Symbol definition table

Symbol | Description
X_t | The update gate
Y_t | Combined output of the two gates
Z_t | The reset gate
V_t | The current input
k_t | The final output
k_{t−1} | Output of the previous state
W_x, W_y, W_z | Weight matrices for the current state
U_x, U_y, U_z | Weight matrices for the previous state
C_x, C_y, C_z | Bias vectors
→k | State of the forward GRU
←k | State of the backward GRU
h_{t−1} | Previous hidden state
h_{t+1} | Next hidden state



←k_t = GRU_Bkw(V_t, h_{t+1})    (6)

k_t = →k_t ⊕ ←k_t    (7)

where →k denotes the state of the forward GRU that forms Eq. 5, ←k denotes the
state of the backward GRU shown in Eq. 6, and ⊕ indicates the operation of
concatenating two vectors. The final output k_t is given by Eq. 7. Table 3 lists the
symbol definitions.
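Equations (1)-(7) can be traced with a short NumPy sketch. It is illustrative only (the parameter container P is hypothetical, not the authors' code), and the candidate output Y_t uses tanh, following the prose description of the unit.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(v_t, k_prev, P):
    # One GRU step following Eqs. (1)-(4); P bundles the weight matrices
    # W*, U* and bias vectors C* of Table 3.
    z_t = sigmoid(P["Wz"] @ v_t + P["Uz"] @ k_prev + P["Cz"])          # Eq. (1)
    x_t = sigmoid(P["Wx"] @ v_t + P["Ux"] @ k_prev + P["Cx"])          # Eq. (2)
    y_t = np.tanh(P["Wy"] @ v_t + P["Uy"] @ (z_t * k_prev) + P["Cy"])  # Eq. (3)
    return (1 - x_t) * k_prev + x_t * y_t                              # Eq. (4)

def bigru(sequence, P_fwd, P_bkw, units):
    # Bidirectional pass: run the sequence forward and backward, then
    # concatenate the two final states (Eqs. (5)-(7)).
    k_f = np.zeros(units)
    for v_t in sequence:                 # forward GRU
        k_f = gru_step(v_t, k_f, P_fwd)
    k_b = np.zeros(units)
    for v_t in reversed(sequence):       # backward GRU
        k_b = gru_step(v_t, k_b, P_bkw)
    return np.concatenate([k_f, k_b])    # Eq. (7)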

4 Experiments and Results

For cyclone detection, two methods were compared, namely Detectron2 and K-means;
the comparison is shown in Table 4, and Fig. 5 shows the results of both. For cyclone
prediction, different architectural models (CNN-LSTM, CNN-BiLSTM, CNN-GRU, and
CNN-Bi-GRU) were compared, and CNN-Bi-GRU showed the best results. The
comparison of the models based on their evaluation metrics is shown in Table 5.
Many hyperparameters, such as the number of epochs, pool size, dropout probability,
number of units, activation layer, and optimizer, were experimented with and
compared. The model was built with different input sequences to analyze at which
configuration it generates the lowest error, measured in terms of MSE; Table 6
depicts this comparison on different input specifications. From the input specification,
it was observed that the base model with the lowest MSE fed four (150 × 150) images
to the network and predicted the fifth image.

Table 4 Comparison between K-means and Detectron2

Criterion | K-means | Detectron2
Hyperparameter | The choice of K in K-means | Batch size and number of epochs
Feature vector | Four feature vectors are computed per pixel | No separate feature vector
Annotation | Not required | Required
Nature of mask | Binary and pixel level | Cyclone level with confidence
Training time | Observed to be lower than Detectron2 | Takes more time than K-means
Use case | Used for color quantization of the images | Used to detect and highlight the areas of interest

Fig. 5 Results: K-means and Detectron2

Table 5 Comparison between models on the basis of evaluation metrics

Model | MSE | SSIM
CNN-Bi-GRU | 1611.27591 | (0.99836, 0.99978)
CNN-GRU | 1886.89365 | (0.99752, 0.99978)
CNN-BiLSTM | 1950.40867 | (0.99738, 0.99978)
CNN-LSTM | 1954.50245 | (0.99739, 0.99979)
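A minimal sketch of computing the two metrics of Table 5 with NumPy and scikit-image follows; note that the paper reports SSIM as a pair for the output and input images, whereas this helper returns a single score per image pair.

import numpy as np
from skimage.metrics import structural_similarity

def evaluate(pred, actual):
    # Score a predicted frame against the actual fifth image using the two
    # metrics of Table 5; both images are 2D grayscale arrays.
    mse = np.mean((pred.astype(float) - actual.astype(float)) ** 2)
    ssim = structural_similarity(pred, actual,
                                 data_range=actual.max() - actual.min())
    return mse, ssim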

Table 6 Model with different input sequences

Model | Time (s) | MSE | SSIM
(150 × 150) 4-InputSequence | 217 | 1611.27591 | (0.99836, 0.99978)
(300 × 300) 4-InputSequence | 368.4 | 1791.54913 | (0.99722, 0.99986)

Fig. 6 Predicted result

The final CNN-Bi-GRU model constructed after all the experiments had an input size
of 150 × 150, 150 epochs, a pool size of 4 × 4, a dropout probability of 0.4, 1024
GRU units, and nadam as the optimizer. The model gave an MSE of 1611.27591 and
an SSIM of (0.99836, 0.99975) when trained for 150 epochs; SSIM is reported in the
format (Output_Image, Input_Image). The parameters used for plotting the loss
function graph consist of nadam as the optimizer, softplus as the activation function,
and mean squared error as the loss function. This set works best for regression or
prediction models when compared to other sets; the loss was plotted after model
training as shown in Fig. 7, and the predicted result is shown in Fig. 6. Figure 8
shows a training overview as a loss versus input size plot.
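Putting the reported hyperparameters together, a hedged Keras sketch of such a CNN-Bi-GRU follows. The exact LFLB layout is not fully specified in the paper, so the convolutional block here is an assumption.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_bigru(seq_len=4, size=150):
    # Sketch using the reported settings: per-frame CNN feature extraction,
    # a 1024-unit Bi-GRU over the four-frame sequence, dropout 0.4, 4x4
    # pooling, softplus activation, nadam optimizer, and MSE loss.
    model = models.Sequential([
        layers.Input(shape=(seq_len, size, size, 1)),
        layers.TimeDistributed(layers.Conv2D(32, 3, activation="softplus",
                                             padding="same")),
        layers.TimeDistributed(layers.MaxPooling2D(pool_size=(4, 4))),
        layers.TimeDistributed(layers.Conv2D(64, 3, activation="softplus",
                                             padding="same")),
        layers.TimeDistributed(layers.MaxPooling2D(pool_size=(4, 4))),
        layers.TimeDistributed(layers.Flatten()),
        layers.Bidirectional(layers.GRU(1024)),
        layers.Dropout(0.4),
        layers.Dense(size * size, activation="softplus"),
        layers.Reshape((size, size, 1)),      # predicted fifth frame
    ])
    model.compile(optimizer="nadam", loss="mse")
    return model

# model = build_cnn_bigru()
# model.fit(inputs, targets, epochs=150)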

Fig. 7 Loss function

Fig. 8 Input size comparison

5 Conclusion

The focus of this paper is to detect cyclones and predict them before eye formation,
using the proposed model and algorithms, from INSAT-3D satellite images. Cyclone
detection is accomplished by K-means and Detectron2, which segment the image
and pinpoint the cyclone region, and the CNN-Bi-GRU model is used for cyclone
forecasting. The performance of the suggested CNN-Bi-GRU model is assessed
using two metrics. One is the mean squared error, defined as the mean of the
squared differences in pixel intensities of the compared images. The structural
difference between the two images is determined with the other metric, the structural
similarity index (SSIM). SSIM is a more robust measure since it compares two
images over windows of size N × N rather than comparing the complete images as
MSE does. According to the output obtained, the SSIM of the proposed model is
(0.99836, 0.99978), and the MSE value is 1611.27591. In future, the wind speed and
temperature data from the satellite could be analyzed to predict the intensity of the
cyclone and achieve better results.

Acknowledgements This project was created under India Space Research Organization’s problem
statement SS591 at Smart India Hackathon 2022. The authors would like to thank SIH, AICTE,
and ISRO for providing us with this opportunity.

References

1. Wen H, Huang C, Guo S (2021) The application of convolutional neural networks (CNNs) to
recognize defects in 3D-printed parts. Materials 14(10):2575
2. Nair A, Sai Srujan KSS, Kulkarni SR, Alwadhi K, Jain N, Kodamana H, Sandeep S, John VO
(2021) A deep learning framework for the detection of tropical cyclones from satellite images.
IEEE Geosci Remote Sens Lett 19
3. Raza A, Liu J, Liu Y, Liu J, Li Z, Chen X, Huo H, Fang T (2021) IR-MSDNet: infrared and
visible image fusion based on infrared features and multiscale dense network. IEEE J Sel Topics
Appl Earth Obs Remote Sens 14:3426–3437
4. Lu C, Kong Y, Guan Z (2020) A mask R-CNN model for reidentifying extratropical cyclones
based on quasi-supervised thought. Sci Rep 10(1):1–9
5. Zhao K, Kang J, Jung J, Sohn G (2018) Building extraction from satellite images using mask
R-CNN with building boundary regularization. In: Proceedings of the IEEE conference on
computer vision and pattern recognition workshops, pp 247–251
6. Rau Y-C, Comiso JC, Lure FYM (1994) Application of neural networks for identification of
sea ice coverage and movements from satellite imagery. In: Proceedings of IEEE international
geoscience and remote sensing symposium IGARSS, vol 3, pp 1407–1409
7. Gupta U, Jitkajornwanich K, Elmasri R, Fegaras L (2016) Adapting K-means clustering to
identify spatial patterns in storms. IEEE, pp 2646–2654
8. Emre Celebi M (2009) Effective Initialization of K-means for color quantization. IEEE, pp
1649–1652
9. Pham V, Pham C, Dang T (2020) Road damage detection and classification with Detectron2
and faster R-CNN. IEEE, pp 5592–5601
10. Liu X, Lin Z, Feng Z (2021) Short-term offshore wind speed forecast by seasonal ARIMA—a
comparison against GRU and LSTM. Energy 227:120492
11. Lim B, Zohren S (2021) Time-series forecasting with deep learning: a survey. Philos Trans
Roy Soc A 379(2194):20200209
12. Karevan Z, Suykens JAK (2020) Transductive LSTM for time-series prediction: an application
to weather forecasting. Neural Netw 125:1–9
13. Geetha A, Nasira GM (2016) Time series modeling and forecasting: tropical cyclone prediction
using ARIMA model. In: Proceedings of 2016 3rd international conference on computing for
sustainable global development (INDIACom), pp 3080–3086
Chapter 13
Fusion of Information Acquired
from Camera and Ultrasonic Range
Finders for Obstacle Detection
and Depth Computation

Jyoti Madake, Heenakauser Pyare, Sagar Nilgar, Sagar Shedge,


Shripad Bhatlawande, Swati Shilaskar, and Rajesh Jalnekar

1 Introduction

The most essential sense for understanding the reality around us is eyesight. Playing
a key role in sensor integration, it provides a means of feedback to balance
interaction with the environment. The loss of vision makes it difficult to live a normal
life; the way one performs socially, psychologically, physically, and independently
can be adversely affected by it. According to the World Health Organization [1],
284 million persons are visually impaired worldwide, and 39 million of those are
blind. Visual information can apprehend objects within the surroundings. However,
judging the distance of an object from an image alone is computationally expensive.
Our task is to fuse sensors, i.e., a digital camera and an ultrasonic sensor, to detect
an object and calculate its distance; this is cheaper in terms of both computation and
cost. According to the 2019 Road Accident Report [2], 449,002 accidents occurred
throughout the country, resulting in 151,113 deaths and 451,361 injuries. Data fusion
[3] is a technique for combining various data and expertise to give a consistent,
accurate, and comprehensive depiction of an environment or process. Applications
include information systems, process management, autonomous systems, military
systems, and civilian surveillance and monitoring duties. Combining various sensors,
such as sound, vision, and pressure, leads to more accurate and comprehensive
information than can be obtained from any single sensor alone.
Sensor fusion lets autonomous vehicles avoid obstacles. Autonomous navigation
drives a car without human intervention. Existing object categorization and detection
methods use photos and videos, but image and video data alone cannot accurately
measure distance. Data fusion has involved combining multiple sensors, but the
hardware and computational expenses are high; fusion ensures reliable object
detection data. Ningthoujam et al. [4] identified objects with a camera and an
ultrasonic sensor: image segmentation determines object edges, while ultrasonic
sensors measure the sizes of the objects in the image. Bai et al. [5] located objects
in images for wayfinding. Ultrasonic range finders and depth cameras reduce
inaccuracy when measuring distance. Ultrasonic sensors were extensively evaluated
in the laboratory [6]; multiple sensors were recommended instead of one, since a
sensor may malfunction if another sensor blocks it. Fusion-based detection with a
depth camera and ultrasonic sensor was employed in [7, 8]: the ultrasonic sensor
offers 2.61% accuracy for solid, hard surfaces but many inaccuracies for sponge or
mesh surfaces, while the Kinect depth sensor has 0.89% accuracy and works with
more objects. SURF can detect image patch size changes even without optical flow
[9]. Valipoor et al. [10] found that sensors were used for low-range detection and
cameras for mid-to-high range.
The fusion of GPS and a magnetic compass was used in [11] to get the location. The
obstacles present in a real-time environment were well understood in [12] by using
stereo vision and ultrasonic sensors. A heat-detecting infrared camera was installed
in [13], allowing the front car and pedestrians to be spotted in advance. The moving
item was tracked from a static observation point in [14] using a method that combines
stereo vision with the Kanade-Lucas-Tomasi feature tracker; a self-integrated low-cost
stereo vision system was utilized to map the three-dimensional environment utilizing
MATLAB-based point cloud creation. A dynamic sub-goal selection approach used in
[15] directs individuals and supports them in avoiding obstacles; this method was a
key component of a whole navigation system for blind people's daily walks. A fusion
of color-detecting sensors and obstacle sensors was used in [16], together with a
voice-based support system, to make a person aware of the path they travel as well
as the obstacles in their way; static barriers were extracted from a series of pictures
using depth maps. Using monocular fisheye cameras in [17], a much broader field of
vision was covered and objects closer to the vehicle were detected. Xu et al.
employed two types of security measures [18]: single-sensor-based physical shift
authentication, in which signals are verified at the physical level, and a multiple-sensor
consistency check, which verifies signals at the system level utilizing several sensors;
ultrasonic sensors and self-driving automobiles will be safer as a result. A multi-sensor
framework was suggested in [19] for the identification of items on railway tracks, such
as small impediments and approaching trains, using camera and LiDAR data. For
people who are blind or visually challenged, a low-cost wearable gadget was created
in [20] that enables them to recognize and locate items in their environment; a laser
pointer and an Android smartphone comprise the system's two main hardware
building blocks, making it relatively affordable and widely available.
Data fusion from numerous sensors was used in [21] to create a self-driving
assistance system: a 360° picture of the environment was gathered using six fisheye
cameras and 12 ultrasonic radars, while a forward view of the surroundings was
provided by combining a low-light camera with a LiDAR. Panoramic mosaic,
surround-view data fusion, and forward-view data fusion were the data fusion
techniques used. A real-time obstacle detection system was created in [22] to warn
persons with vision impairments: interest points based on an image grid were utilized
to detect both stationary and moving objects, and the multiscale Lucas-Kanade
approach was employed to track them. The Bag of Visual Words retrieval system
was extended with the Histogram of Oriented Gradients descriptor, used to categorize
obstacles in video streams. A sensor fusion system was capable of determining the
size and placement of obstacles; the Kalman filter was used in [23] to minimize
systematic errors in encoder data. LookTel designed two smartphone applications in
[24], Money Reader and Recognizer. The Money Reader reads a bill's value using a
voice synthesizer, which aids in the identification of paper money; the Recognizer
compares an image of an object to an internal database generated by the user, and
after the system recognizes the object, it replays the user's previously recorded
object description. A multimodal sensor array allowing omnidirectional obstacle
perception, consisting of a 3D laser scanner, was used for obstacle identification in
[25]; a multi-layered technique was used to produce trajectories, from mission
planning through global and local trajectory planning for reactive obstacle avoidance.
Software modules designed for Android cellphones, aimed toward blind people, were
developed in [26]: the primary module recognizes scanned objects and compares
them to a database of objects, and the two remaining modules recognize significant
colors and locate the brightest regions of acquired photos. The pre-trained deep
neural network YOLOv2 was used in [27, 28] to recognize objects; the output of the
object detector and an ultrasonic sensor mounted on an Arduino Uno were combined
to determine the range. The SIFT algorithm and key point matching were used in
[29] to detect and find objects in photos.

2 Methodology

This paper presents an object detection and distance calculation system using a
monocular camera and ultrasonic sensors as shown in Fig. 1.
This system uses a single monocular image as its input, and, once it has identified
an object, it determines the distance between itself and the object. In addition, an
ultrasonic sensor is used to determine the distance. When the distances are the same,
a voice message is produced by the device.

2.1 Binary Thresholding

The proposed system is implemented using a monocular camera. The image of the
obstacle is captured and resized to 1200 × 600 pixels. The threshold value is 0.5;
based on this threshold, the gray image is converted to a binary image. The standard
threshold value of 0.5 is used if the item is lighter than the background, and inverse
thresholding is used when the item is darker than the background.

2.2 Contours Detection

Typically, a contour refers to pixels with the same color and intensity along a border.
We utilized the simple chain approximation contour detection method, which returns
only the endpoints required to draw the contour line. The 'chain approx none' mode
stores all contour points, while 'chain approx simple' stores only corner points and is
therefore memory efficient: it compresses horizontal, vertical, and diagonal segments
along a contour to their endpoints, so any points along straight lines are deleted,
leaving only the end points.

Fig. 1 Multimodal object detection block diagram
Consider the shape of a curved rectangle: with the exception of the four corner points,
all other contour points will be rejected. This method does not save all of the points,
requires less memory, and therefore executes faster. In the initial phase, the image
is read and converted to grayscale. Converting to grayscale is vital since it prepares
the image for the subsequent step: for the contour detection technique to function
properly, the image must be transformed into a single-channel grayscale image
before thresholding. Binary thresholding or Canny edge detection should always be
applied to the grayscale image before searching for contours; in this instance, binary
thresholding was utilized. It turns the image black and white, highlighting regions of
interest and making the task of the contour detection algorithm easier. Thresholding
makes the border of the object in the image consistently white, with the same
intensity for each pixel; based on these white pixels, the software can infer the
object's edges and find contours in the image.

2.3 Object Detection Using Contours

In order to create an exact boundary across a detected object with well-defined
edges, contours are used, as described in Algorithm 1. A contour is simply a curve
joining all of the continuous points (along the border) that have the same color or
intensity: an ordered list of 2D vertices (control points) connected by straight line
segments. Contours are a valuable tool for object detection and recognition as well
as shape analysis. Binary thresholding is applied before extracting contours. The
method can create a maximum of ten bounding boxes for ten objects in an image.
A complete process diagram of the proposed system is shown in Fig. 2.

Algorithm 1 Object detection using contours


Input: Monocular image.
Output: Object detection and distance estimation.
1. Read the image and convert it to grayscale.
2. Apply binary thresholding.
3. Find contours in the threshed image.
4. Draw contours for 10 objects with a maximum area.
5. Draw bounding boxes for 10 objects with a max area.
6. Find the focal length.
7. Calculate the distance.

The monocular image is resized and converted to grayscale to extract the descriptors.
To increase accuracy, a binary threshold is applied before contour extraction to detect
the objects, and bounding boxes are created for the ten objects with the largest areas.
After finding the bounding boxes, we compute the focal length and calculate the distance.
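Assuming an OpenCV-based implementation (the input file name is illustrative), a minimal sketch of Algorithm 1's detection steps is:

import cv2

img = cv2.imread("scene.jpg")                     # hypothetical input frame
img = cv2.resize(img, (1200, 600))
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Binary thresholding; 0.5 of the 8-bit range is interpreted as 127 here.
_, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

# Contours with endpoint-only storage (the memory-efficient chain approximation).
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)

# Keep the ten largest contours and draw their bounding boxes.
largest = sorted(contours, key=cv2.contourArea, reverse=True)[:10]
for c in largest:
    x, y, w, h = cv2.boundingRect(c)
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("detected.jpg", img)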

Fig. 2 Block diagram of object detection

We compared the distances calculated by the camera and the ultrasonic sensor, and
finally, voice output was given for the matched distance.

2.4 Distance Calculation Using a Camera

Consider an object of height 'h' kept at a distance 'd' from the lens, as shown in
Fig. 3, subtending the angle θ₁; when we move the object by a distance 'm', it
subtends the angle θ₂. Here f is the focal length and h is the height of the object.

Fig. 3 Distance calculation



Consider that 'OBJ' is the position of the original object, 'h' is the height of the object,
and 'f' is the distance between the lens and the CMOS sensor. At the position 'OBJ',
the height of the projected image is 'a'; when we move the object a distance 'm'
toward the lens, the height of the projected image is 'b', which is greater than 'a'.
For θ₁,

tan θ₁ = h/d = a/f (opposite side/adjacent side)    (1)

For θ₂,

tan θ₂ = h/(d − m) = b/f (opposite side/adjacent side)    (2)

Dividing Eq. (1) by Eq. (2):

a/b = (h/d) ∗ ((d − m)/h)
a/b = (d − m)/d = 1 − m/d
m/d = 1 − a/b
d = m/(1 − (a/b))

When an object gets closer to the lens, the size of the reflected image increases.
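The resulting formula is easy to check numerically; the pixel heights and the displacement below are illustrative values:

def camera_distance(a, b, m):
    # Distance from Sect. 2.4: a and b are the projected object heights
    # (in pixels) before and after moving the object a known distance m
    # toward the lens; returns d = m / (1 - a/b).
    return m / (1.0 - a / b)

# Example: the projected height grows from 40 px to 50 px after moving
# the object 30 cm closer, so d = 30 / (1 - 40/50) = 150 cm.
print(camera_distance(40, 50, 30))   # -> 150.0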

2.5 Distance Estimation Using an Ultrasonic Sensor

The HC-SR04 is a popular non-contact distance measurement module with a
2–400 cm range. It uses sound to measure distance accurately, as bats and
dolphins do. The module consists of an ultrasonic transmitter, a receiver, and a
control circuit. The transmitter emits short bursts of 40 kHz airborne ultrasound; if an
object lies in the module's path, the pulse collides with it and is reflected back to the
receiver.
Distance is calculated using the formula:

Distance = Speed ∗ Time

The measured time is that taken by the pulse to travel to the object and back, but we
only need the one-way duration. As a result, the time is divided by two:

Distance = Speed ∗ Time/2
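A small sketch of this computation, using an approximate speed of sound in air:

SPEED_OF_SOUND = 34300          # cm/s in air (approximate, at about 20 °C)

def ultrasonic_distance(echo_time_s):
    # HC-SR04 reading: echo_time_s is the round-trip time of the pulse,
    # so the one-way distance is speed * time / 2.
    return SPEED_OF_SOUND * echo_time_s / 2

print(ultrasonic_distance(0.01))   # 0.01 s round trip -> 171.5 cm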

An ultrasonic sensor and a monocular camera are connected through Arduino as


shown in Fig. 4. The angle of detection is the same for both sensors.

Fig. 4 Hardware
implementation

2.6 Multiple Objects Detection Using YOLOv3 Algorithm

As explained in Algorithm 2, the YOLOv3 algorithm is utilized to accomplish real-time
object detection; it is both accurate and efficient in its operation. YOLO's ability to
perform object detection in real time is made possible by the use of convolutional
neural networks (CNNs). Through the monocular camera, the system is able to
recognize a wide variety of objects, including humans, automobiles, bicycles, and
many others, and the algorithm can identify multiple objects at once.

Algorithm 2 Object detection using YOLOv3


Input: Monocular image.
Output: Object detection and distance estimation.
1. Load the YOLOv3 model.
2. Take a reference image with a known distance and width.
3. Read the image and resize it.
4. Find objects using YOLOv3.
5. Find the focal length.
6. Find the distance.
7. Display the distance on the screen.

Before processing, YOLOv3 divides the image into a grid. Bounding boxes, derived
from anchor boxes, surround items with high classification scores; each bounding
box identifies one object, and its confidence score reflects the accuracy of the
prediction. The anchor boxes are constructed from the most common shapes and
sizes in the initial data set: the ground-truth box dimensions are clustered to find the
most common ones.
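A hedged sketch of running a pretrained YOLOv3 through OpenCV's DNN module follows. The Darknet config/weight file names are the standard public releases, the input file name and 0.5 confidence threshold are assumptions, and the distance-estimation steps of Algorithm 2 are omitted.

import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
layer_names = net.getUnconnectedOutLayersNames()

img = cv2.imread("road_scene.jpg")                # hypothetical frame
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416),
                             swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(layer_names)                # the three YOLO scales

h, w = img.shape[:2]
for out in outputs:
    for det in out:                               # [cx, cy, bw, bh, obj, classes...]
        scores = det[5:]
        class_id = int(np.argmax(scores))
        if scores[class_id] > 0.5:                # assumed confidence threshold
            cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
            x, y = int(cx - bw / 2), int(cy - bh / 2)
            cv2.rectangle(img, (x, y), (x + int(bw), y + int(bh)),
                          (0, 255, 0), 2)
cv2.imwrite("yolo_out.jpg", img)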

3 Results and Discussion

The first column of Table 1 gives the distance between the camera and the detected
object, the second column gives the distance between the ultrasonic sensor and the
detected object, and the third column gives the difference between these two
distances, listed as the error. An ultrasonic sensor and a monocular camera were
used to detect objects at a distance, and seven different readings were taken for
cars, bikes, and people, as given in Table 1.
The monocular camera captures the objects along with their distance, and
measurements from the ultrasonic sensor also give the distance. The analysis was
made by observing the distances and the error between the monocular camera and
the ultrasonic sensor. Table 1 compares the observations across the samples: in
some cases, the difference is as small as 2 cm and, in others, as large as 35 cm.
The system achieved 90% accuracy after testing on different objects at different
distances. The results of detecting multiple objects using the YOLOv3 algorithm are
shown in Fig. 5: YOLOv3 identified several objects. YOLOv3's model backbone is
DarkNet-53, which has residual blocks

Table 1 Samples and observations

Sample no. | Object type | Distance from camera (cm) | Distance from ultrasonic sensor (cm) | Error/difference (cm)
1 | Car | 160 | 162 | ±2
2 | Car | 200 | 165 | ±35
3 | Car | 175 | 209 | ±34
4 | Car | 202 | 210 | ±8
5 | Bike | 170 | 161 | ±9
6 | Bike | 161 | 168 | ±7
7 | Person | 174 | 141 | ±33

Fig. 5 Multiple object detection

and up-sampling networks. This architecture lets YOLOv3 predict at three scales, using the feature maps of layers 82, 94, and 106. Recognizing features at three scales makes up for the weakness of YOLOv2 and YOLO in recognizing smaller objects. The technique preserves fine-grained features by concatenating up-sampled layer outputs with feature maps from earlier layers, which helps identify smaller objects. YOLOv3 predicts three bounding boxes for each cell, compared to five in YOLOv2, but it does so at three scales, bringing the total to nine.
Figure 6 displays the output images that were acquired by a monocular camera
while it was being used to calculate distance.

Fig. 6 Object detection with a monocular camera

4 Conclusion

We built a system in which a camera served as one modality and an ultrasonic sensor as the second. Vision has been combined with distance estimation based on ultrasonic waves. The system can calculate the distance with a small, acceptable error, as shown in Table 1, and accurate object classification is achieved using YOLO version 3. The system has an accuracy of 90% both in detecting objects and in estimating their distances. It is able to function in real time and can determine the distance even when the object is moving. The proposed system can be used in a number of potential applications, including route planning, obstacle recognition, and obstacle-avoidance algorithms.

References

1. WHO releases new global estimates on visual impairment. http://www.emro.who.int/control-and-preventions-of-blindness-and-deafness/announcements/global-estimates-on-visual-impairment.html. Accessed 20 May 2022
2. Ministry of Road Transport and Highways, Government of India. https://morth.nic.in. Accessed
22 May 2022
3. Data fusion—Wikipedia. https://en.wikipedia.org/wiki/Data_fusion. Accessed 26 May 2022
4. Ningthoujam B, Ningthoujam JS, Namram RS, Nongmeikapam K (2016) Image and ultra-
sonic sensor fusion for object size detection. In: 2019 fifth international conference on image
information processing, pp 137–140
5. Bai J, Lian S, Liu Z, Wang K, Liu D (2017) Smart guiding glasses for visually impaired people
in indoor environment. IEEE Trans Consum Electron 63(3):258–266
6. Lim BS, Keoh SL, Thing VLL (2018) Autonomous vehicle ultrasonic sensor vulnerability and
impact assessment. In: IEEE 4th world forum on internet of things, pp 231–236
7. Forouher D, Besselmann MG, Maehle E (2016) Sensor fusion of depth camera and ultrasound
data for obstacle detection and robot navigation. In: 2016 14th international conference on
control, automation, robotics and vision, pp 1–6
8. Adhikary A, Vatsa R, Burnwal A, Samanta J (2020) Performance evaluation of low-cost RGB-
depth camera and ultrasonic sensors. In: Proceedings of the 2nd international conference on
communication, devices, and computing. Springer, Singapore, pp 331–341
9. Mori T, Scherer S (2013) First results in detecting and avoiding frontal obstacles from a monoc-
ular camera for micro unmanned aerial vehicles. In: 2013 IEEE international conference on
robotics and automation, pp 1750–1757
10. Valipoor MM, de Antonio A (2022) Recent trends in computer vision-driven scene under-
standing for VI/blind users: a systematic mapping. Univers Access Inform Soc 1–23
11. Panicker M, Mitha T, Oak K, Deshpande AM, Ganguly C (2016) Multisensor data fusion for an
autonomous ground vehicle. In: 2016 conference on advances in signal processing, pp 507–512
12. Bansal V, Balasubramanian K, Natarajan P (2020) Obstacle avoidance using stereo vision and
depth maps for visual aid devices. SN Appl Sci 2:1–17
13. Cho M (2019) A study on the obstacle recognition for autonomous driving RC car using Lidar
and thermal infrared camera. In: 2019 eleventh international conference on ubiquitous and
future networks. IEEE, pp 544–546
14. Sharma KS, Sahoo SR, Manivannan PV (2018) A hybrid vision system for dynamic obstacle
detection. Procedia Comput Sci 133:153–160
15. Bai J, Lian S, Liu Z, Wang K, Liu D (2018) Virtual-blind-road following-based wearable
navigation device for blind people. IEEE Trans Consum Electron 64:136–143
16. Lakde CK, Prasad PS (2015) Navigation system for visually impaired people. In: 2015 inter-
national conference on computation of power, energy, information and communication, pp
0093–0098
17. Häne C, Sattler T, Pollefeys M (2015) Obstacle detection for self-driving cars using only monoc-
ular cameras and wheel odometry. In: 2015 IEEE/RSJ international conference on intelligent
robots and systems, pp 5101–5108
18. Xu W, Yan C, Jia W, Ji X, Liu J (2018) Analyzing and enhancing the security of ultrasonic
sensors for autonomous vehicles. IEEE Internet Things J 5:5015–5029
19. Zhangyu W, Guizhen Y, Xinkai W, Haoran L, Da L (2021) A camera and LiDAR data fusion
method for railway object detection. IEEE Sens J 13442–13454
20. Saffoury R, Blank P, Sessner J, Groh BH, Martindale CF, Dorschky E, Franke J, Eskofier BM
(2016) Blind path obstacle detector using smartphone camera and line laser emitter. In: 2016
1st international conference on technology and innovation in sports, health and wellbeing, pp
1–7
21. Yang J, Liu S, Su H, Tian Y (2021) Driving assistance system based on data fusion of
multisource sensors for autonomous unmanned ground vehicles. Comput Netw 192:108053

22. Tapu R, Mocanu B, Zaharia T (2013) A computer vision system that ensure the autonomous
navigation of blind people. In: E-health and bioengineering conference, pp 1–4
23. Aman MdS, Mahmud MdA, Jiang H, Abdelgawad A, Yelamarthi K (2016) A sensor fusion
methodology for obstacle avoidance robot. In: 2016 IEEE international conference on electro
information technology. IEEE, pp 0458–0463
24. Terven JR, Salas J, Raducanu B (2013) New opportunities for computer vision-based assistive
technology systems for the visually impaired. Computer 47(4):52–58
25. Nieuwenhuisen M, Droeschel D, Beul M, Behnke S (2014) Obstacle detection and navigation
planning for autonomous micro aerial vehicles. In: 2014 international conference on unmanned
aircraft systems, pp 1040–1047
26. Matusiak K, Skulimowski P, Strumiłło P (2013) Object recognition in a mobile phone appli-
cation for visually impaired users. In: 2013 6th international conference on human system
interactions, pp 479–484
27. Shahira KC, Tripathy S, Lijiya A (2019) Obstacle detection, depth estimation and warning
system for visually impaired people. In: TENCON 2019—2019 IEEE region 10 conference,
pp 863–868
28. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object
detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition,
pp 779–788
29. Jabnoun H, Benzarti F, Amiri H (2015) Object detection and identification for blind people
in video scene. In: 2015 15th international conference on intelligent systems design and
applications, pp 363–367
Chapter 14
Efficient Approach for Malware
Detection Using Machine Learning
Classifier

Umesh V. Nikam and Vaishali M. Deshmukh

1 Introduction

In today's generation, everyone, from the young to the elderly, is attached to smart gadgets, and each of us spends many hours on them daily. The use of smart gadgets has increased rapidly in recent years because every device is Internet-enabled and everything is available on the Internet nowadays [1]. At the same time, the risk has also increased, because these devices, knowingly or unknowingly, collect sensitive user information. There are many incidents in which attackers have stolen people's money online by sending them fake links and malware [2]. Recent studies also show that Android smartphones are the easiest targets for attackers and the most vulnerable to malware attacks [3]. In this digital era, we can perform money transactions on a smartphone while sitting comfortably at home, but if the device is infected with malware, there is a serious risk that the account may be hacked and misused by an attacker. In the past, traditional malware detection techniques, such as signature-based detection, were widely used; over the years, however, malware has evolved, these traditional techniques have become inefficient, and it is now very difficult to detect the latest malware with them.
So, a more modern approach to malware detection is needed. Various research works and studies have shown that machine learning is the most suitable technique for efficiently detecting malware [4]. So, exploring,

U. V. Nikam (B) · V. M. Deshmukh


Department of Computer Science and Engineering, P. R. M. I. T and R, Badnera-Amravati, MS,
India
e-mail: umeshnikam3@gmail.com
V. M. Deshmukh
e-mail: vmdeshmukh@mitra.ac.in


implementing, and working with machine learning techniques can provide effective solutions for malware detection. With this in mind, many machine learning techniques were studied, and an efficient technique was developed; that technique is explained in this paper. The approach evaluates ten different machine learning algorithms using parameters such as accuracy, AUC, precision, and recall. The accuracy and efficiency achieved by each of these algorithms under our technique are presented in the sections below.
This paper provides an efficient technique for detecting malware using machine learning classifiers. The research work is implemented using ten machine learning classifiers. The model is built and trained on the Drebin-215 Kaggle dataset, which includes 5560 malware and 9476 benign samples [5]. The performance of the various classifiers is measured using parameters such as AUC, accuracy, precision, recall, and F1 measure.
The paper is organized as follows: Sect. 2 covers related work on malware detection techniques. The methodology of the proposed technique is explained in Sect. 3. The criteria used for measuring the performance of the machine learning classifiers, and a discussion of the obtained results, are in Sect. 4. The conclusion is given in Sect. 5.

2 Related Works

Dhalaria et al. [6] have used a hybrid approach with static and dynamic features
for malware detection. They have created their two datasets and applied different
machine learning algorithms to their methodology. Results obtained show that
selecting hybrid features can be effective in detection of malware. Accuracy obtained
here is about 96.65%.
Darem et al. [7] have created an adaptive behavioral-based deep learning model
for the detection of malware. They worked on the concept drift problem and made
efforts to reduce malware’s evasive behavioral impact by naturalizing the operating
system to look like an actual machine and shading the virtual environment.
Gao et al. [8] have used a graph convolutional network for detecting malware. They used API usage patterns to model the relevance of APIs and thereby perform classification. The obtained results show that the GDroid technique outperformed existing approaches, and the work yields insight into API usage patterns for malware detection and classification. GDroid achieved an average accuracy of 97%.
Roseline et al. [9] have suggested a deep forest model for detecting malware. The distinctive features of the proposed model are its deeply layered ensemble and low model complexity. This technique has outperformed others in malware detection, with a reported detection rate of 98.65%.
McGiff et al. [10] have combined hardware features with permission data features for detecting malware. The model's performance and accuracy improve as a result of combining these features, so working with a combination of diverse features can be effective.

In [11], Noor Azleen Anuar et al. have made use of the opcode analysis method for detecting malware. With their experiments and the results obtained, they showed that the frequency of occurrence of an opcode is higher in malicious applications than in benign ones. From their findings, we can conclude that the opcode feature may be critical in distinguishing malware from benign applications.
Shhadat et al. [12] have experimented with some machine learning techniques for malware detection. To reduce the number of features, their work presented an enhanced feature set built with random forest. They experimented with several machine learning algorithms on a benchmark dataset. Decision trees achieved the highest accuracy, 98.2%, among the machine learning algorithms used, while Naive Bayes achieved the lowest accuracy of 91%.
The study [17] offers MAPAS, a malware detection system that uses computational resources adaptively while maintaining high accuracy. MAPAS uses a convolutional neural network (CNN) to analyze the behavior of malicious programs based on their API call graphs; the CNN is used only to discover common features in the API call graphs of malware. MAPAS then employs a lightweight classifier that detects malware by measuring how similar the API call graph of an application to be classified is to the graphs associated with harmful activity. The evaluation results show that MAPAS can classify applications 145.8% faster and uses about ten times less memory than MaMaDroid.

2.1 Tools and Technology Used

This technique is implemented using Spyder, a Python IDE available in the Anaconda distribution, which has built-in support for many useful machine learning packages used in both supervised and unsupervised work. The results were generated on a computer with an Intel i5-3317U processor running at 3.20 GHz, 8 GB of RAM, and the Windows 10 operating system.

3 Methodology

The methodology implemented has three steps: collection of data, extraction and
selection of features, and performance measurement of machine learning algorithms.
All three steps are depicted in Fig. 1.

3.1 Step I: Collection of Data

In this first step of the methodology, the main focus is on collecting the required data. In any malware detection approach, dataset selection plays a very important role.

Fig. 1 Implemented
technique

The data collected in this step consist of samples of benign as well as malicious applications. The Drebin-215 dataset, available on the Kaggle website [5], is used to measure the performance of the classifiers in this technique. The dataset has 5560 malicious samples and 9476 benign samples and consists of 215 features. The majority of the features are manifest permissions, which account for 53% of the total, followed by API call signatures (33%) and other permissions (14%). The presence of each permission in a benign or malicious application is indicated in the dataset by a value of 0 or 1: a value of 0 for a particular permission means that the permission is not needed by that application, and a value of 1 means it is. The dataset has one more column that indicates whether the application is benign or malicious.

3.2 Step II: Preprocessing of Dataset, Extracting, and Selecting Features

In the second step of the methodology, the dataset is first preprocessed, and then the required features are extracted from the applications and selected for further tasks, as discussed below:
Preprocessing of Dataset: A dataset may contain irrelevant data as well as many columns with a lot of missing values. Because of inappropriate data and missing values, many errors can arise while training a model; the model may then not be trained properly and hence be inefficient for malware detection. So it is very important to carefully preprocess a dataset before using it for any operation.
Python has many libraries, such as NumPy, Pandas, and scikit-learn [12], whose features can be used for preprocessing data. This approach uses the SimpleImputer class with the mean-value strategy from the scikit-learn library for preprocessing the data [13].
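
For concreteness, the following is a minimal sketch of this preprocessing step with scikit-learn's SimpleImputer; the small matrix is a made-up stand-in for the dataset's 0/1 permission columns.

import numpy as np
from sklearn.impute import SimpleImputer

# Toy feature matrix with a missing entry (np.nan) standing in for a gap
# in the 0/1 permission columns of the dataset.
X = np.array([[1.0, 0.0],
              [np.nan, 1.0],
              [0.0, 1.0]])

# Replace missing values with the column mean, as used in this approach.
imputer = SimpleImputer(strategy="mean")
X_clean = imputer.fit_transform(X)
print(X_clean)  # the nan becomes the mean of its column, here 0.5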
Extraction of Features: The majority of Android applications are built with Java code. Compiling this Java code produces bytecode files with the .class extension. All the .class files are combined into a single .dex (DEX bytecode) file using the dx tool. Finally, the contents are packed into an APK, the distributable form of an Android application. To analyze these APKs, one needs to unpack them; many reverse engineering tools, such as dex2jar, apktool, and jadx, are available for unpacking and analyzing applications.
Any Android application is essentially an archive bundle. It includes a file named AndroidManifest.xml, which is very important in any APK and contains many features required for static analysis. To extract these features, the file must be unpacked using reverse engineering and extraction software.
As shown in Fig. 2 (feature extraction and selection), every Android APK contains the AndroidManifest.xml and classes.dex files. These two files provide the features, namely permission requests and API calls, that are important in distinguishing malware. These features are extracted using AndroPyTool: permission requests from the AndroidManifest.xml file and API calls from the classes.dex file. Many other required features can be extracted and used similarly, but this approach focuses only on the permission request and API call features.
Once the features are extracted, the 15 features that are most vital in differentiating between malicious and benign apps are selected, based on their score values, using the feature importance technique of machine learning. The following section explains the feature selection technique.
Selection of Features: For achieving accurate results, it is very important to select relevant features that are effective in distinguishing between malware and benign apps.

Fig. 2 Feature extraction and selection

Once preprocessing is completed, features are extracted from each application, and then, out of the 215 features available, the 15 with the highest score values are selected using the feature importance technique; these significant features are chosen based on their score rankings. All the selected features, with their score values, are shown in Fig. 3.

Fig. 3 Selected features
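
The following sketch illustrates one common way to realize such a feature importance ranking, using the feature_importances_ attribute of a tree ensemble in scikit-learn; the classifier choice and the synthetic data are placeholders, not the exact setup of this paper.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Placeholder data standing in for the 215 binary features of Drebin-215.
X, y = make_classification(n_samples=500, n_features=215, random_state=0)

# Fit a tree ensemble and rank the features by their importance scores.
model = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
top15 = np.argsort(model.feature_importances_)[::-1][:15]
print("Selected feature indices:", top15)

X_selected = X[:, top15]  # reduced matrix used for training and testing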



3.3 Step III: Applying Machine Learning Algorithm

Malware detection now relies heavily on modern machine learning technology. The selected features are used for training and testing the machine learning models. These features are given as inputs to a model, which then detects whether an application is benign or malicious by applying different machine learning classifiers.
For better results, the dataset is divided in a 70:30 ratio: 70% of the dataset is used for training and 30% for testing. K-fold cross-validation with K = 10 is used to estimate the effectiveness of the machine learning models.
Naïve Bayes, KNN, SVM, random forest, XGBoost, decision tree, logistic regression, gradient boost, kernel SVM, and AdaBoost are the machine learning algorithms evaluated on the Drebin-215 dataset. An analysis of the obtained results is discussed below.
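
A minimal sketch of this evaluation protocol with scikit-learn is shown below; the synthetic data are placeholders, and random forest stands in for any of the ten classifiers listed above.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=1000, n_features=15, random_state=0)

# 70:30 train/test split, as in the methodology.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("Test accuracy:", clf.score(X_te, y_te))

# 10-fold cross-validation (K = 10) to estimate model effectiveness.
print("10-fold CV accuracy:", cross_val_score(clf, X, y, cv=10).mean())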

4 Results Obtained

4.1 Evaluation Criteria

The following parameters are used to evaluate the performance and effectiveness of
machine learning algorithms:
Confusion Matrix: It is a table that displays the performance of machine learning classifiers using various parameters. It can be used to visualize the performance of a model as well as to determine the usefulness of a machine learning model [14]. The parameters of the confusion matrix are given in Table 1.
• False Positive Rate: the ratio of incorrectly predicted positive samples to all actual negative samples. The formula for FPR is

FPR = FP/(FP + TN).

Table 1 Info of confusion matrix

 | Actual positive (1) | Actual negative (0)
Positive prediction (1) | TP (true positive values) | FP (false positive values)
Negative prediction (0) | FN (false negative values) | TN (true negative values)

• False Negative Rate: a false negative means the model predicted a sample as negative when it is actually positive. The formula for FNR is

FNR = FN/(FN + TP).

• Accuracy: the ratio of correctly classified samples to the entire sample set.

Accuracy = (TP + TN)/(TP + FP + FN + TN).

• Recall: the ratio of correct positive predictions to the total of true positives and false negatives. A value of 0.0 means no recall, and 1.0 reflects perfect recall.

Recall = TP/(TP + FN).

• Precision: the ratio of correct positive predictions to the total of true positives and false positives. A value of 0.0 means no precision, and 1.0 reflects perfect precision.

Precision = TP/(TP + FP).

• F1 Measure: used to measure the accuracy of a model; it is the harmonic mean of precision and recall. Its value lies between 0 and 1: a value of 1 means perfect precision and recall, and 0 means either precision or recall is zero.

F1 Measure = 2 · (Precision · Recall)/(Precision + Recall).

Area under ROC Curve: The AUC-ROC curve helps in visualizing the performance of machine learning classifiers more clearly. When the AUC is 1, the classifier perfectly differentiates between malicious and benign samples; when it is 0.5, it amounts to random prediction. The higher the AUC value, the better the performance of the classifier [15] (Fig. 4).
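
The sketch below shows how these evaluation criteria can be computed with scikit-learn from vectors of true labels, hard predictions, and scores; the label vectors here are illustrative.

from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # ground truth (1 = malicious)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # hard predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3]  # class-1 probabilities

# Note: scikit-learn lays the matrix out as [[TN, FP], [FN, TP]].
print(confusion_matrix(y_true, y_pred))
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_score))  # needs scores, not labels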

4.2 Performance of Algorithm

Table 2 shows the performance values obtained for each machine learning algorithm:

Fig. 4 ROC curves

Table 2 Performance of algorithms

Name of algorithm | Accuracy | AUC | Precision | Recall | F1 measure
Logistic regression | 97.66 | 0.9773 | 0.973304 | 0.973304 | 0.97330433
SVM | 97.73 | 0.9775 | 0.967574 | 0.974275 | 0.97091303
Random forest | 98.33 | 0.9781 | 0.987902 | 0.964563 | 0.97609324
Naïve Bayes | 69.39 | 0.49 | 1 | 0.5 | 0.66666667
KNN | 97.75 | 0.9746 | 0.969389 | 0.968432 | 0.96891038
Kernel SVM | 97.96 | 0.9768 | 0.986893 | 0.962628 | 0.97460918
Decision tree | 97.53 | 0.9709 | 0.958016 | 0.967462 | 0.96271586
XGBoost | 98.71 | 0.9899 | 0.993007 | 0.986111 | 0.98954766
AdaBoost | 97.11 | 0.9699 | 0.967323 | 0.959721 | 0.96350725
Gradient boost | 97.23 | 0.9711 | 0.975993 | 0.95685 | 0.96632663

4.3 Discussion

From the results shown in Table 2 and the graphs plotted in Fig. 5, it can be concluded that all the algorithms performed very well. In terms of accuracy, XGBoost, random forest, and kernel SVM are the top three algorithms, and the accuracy of almost all the algorithms is above 97%, except for Naive Bayes.
In terms of precision and recall, the same three algorithms again have the highest performance, which is reflected in the F1 measure score as well.

Fig. 5 Graph for accuracy of algorithms

In terms of accuracy, the performance of XGBoost is the highest of all the classifiers used, at 98.71%, and it has also performed best with respect to precision, recall, and the F1 measure. From the results shown in Table 2, XGBoost achieves the top accuracy, followed by the random forest algorithm.
The ROC curves in Fig. 4 show the performance of the algorithms used. The false positive rate is plotted along the x-axis and the true positive rate along the y-axis, with values between 0 and 1. From the ROC curves in Fig. 4, the area under the curve for the XGBoost algorithm is 0.990, which is very close to a perfectly accurate curve; so, in terms of the ROC curve, the XGBoost algorithm's performance is found to be the best of all. Most of the other algorithms have an AUC value greater than 0.97, indicating good performance. Only the Naive Bayes AUC value, 0.49 in Table 2, indicates poor model performance [16].
Considering the performance evaluation of the various classifiers in Table 2 and the accuracy graph in Fig. 5, XGBoost has the highest accuracy of all the algorithms evaluated, 98.71%, with an AUC value of 0.9899, so it can be claimed to be the best classifier among all of them. Readers are therefore recommended to use this algorithm in their malware detection techniques to achieve significant results.

5 Conclusion

The Drebin-215 dataset was used in this paper to assess the performance of ten machine learning classifiers. The dataset consists of data from 15,036 malicious and benign applications. It was divided in a 70:30 ratio, with 70% of the data used for training and 30% for testing the model. The parameters accuracy, AUC, precision, recall, and F1 measure were used to evaluate the performance of the ten machine learning algorithms.
The results show that the accuracy of the XGBoost algorithm, 98.71%, is the highest among the algorithms compared, with an almost perfect AUC value of 0.99. With respect to the other parameters as well, namely precision, recall, and F1 measure, the performance of XGBoost is superior to the other algorithms.
In the future, in the search for even more accurate malware detection techniques, deep learning algorithms can also be evaluated, and their performance measured with respect to these parameters.

Acknowledgements We are indebted to the Computer Science and Engineering Department of PRMIT & R, Badnera-Amravati, for their assistance and support during the course of this research.

References

1. Naseer M, Rusdi J, Shanono N, Salam S, Zulkiflee M, Abu N, Abadi I (2021) Malware detection:
issues and challenges. J Phys Conf Ser 1807:012011. https://doi.org/10.1088/1742-6596/1807/
1/012011
2. Nikam UV, Deshmuh VM (2022) Performance evaluation of machine learning classifiers in
malware detection. In: 2022 IEEE international conference on distributed computing and elec-
trical circuits and electronics (ICDCECE), pp 1–5.https://doi.org/10.1109/ICDCECE53908.
2022.9793102
3. Arslan RS (2021) Identify type of android malware with machine learning based ensemble
model. In: 2021 5th international symposium on multidisciplinary studies and innovative
technologies (ISMSIT), pp 628–632. https://doi.org/10.1109/ISMSIT52890.2021.9604661
4. Ali R, Ali A, Iqbal F, Hussain M, Ullah F (2022) Deep learning methods for malware and
intrusion detection: a systematic literature review. Sec Commun Netw 2022:31. Article ID
2959222. https://doi.org/10.1155/2022/2959222
5. Miranda TC et al (2022) Debiasing android malware datasets: how can I trust your results if
your dataset is biased? IEEE Trans Inform Forensics Sec 17:2182–2197
6. Dhalaria M, Gandotra E (2020) A hybrid approach for android malware detection and family
classification. Int J Interact Multimedia Artif Intell In Press. 1. https://doi.org/10.9781/ijimai.
2020.09.001
7. Darem A, Ghaleb F, Al-Hashmi A, Abawajy J, Alanazi S, AL-Rezami A (2021) An adaptive
behavioral-based incremental batch learning malware variants detection model using concept
drift detection and sequential deep learning. IEEE Access 1–1. https://doi.org/10.1109/ACC
ESS.2021.3093366
8. Gao H, Cheng S, Zhang W (2021) GDroid: android malware detection and classification with
graph convolutional network. Comput Secur 106:102264. https://doi.org/10.1016/j.cose.2021.
102264

9. Roseline A, Subbiah G, Kadry S, Nam Y (2020) Intelligent vision-based malware detection and classification using deep random forest paradigm. IEEE Access 8:206303–206324. https://doi.org/10.1109/ACCESS.2020.3036491
10. McGiff J, Hatcher WG (2019) Towards multimodal learning for android malware detec-
tion. In: International conference on computing, networking and communications (ICNC):
communications and information security symposium, pp 432–436
11. Anuar NA, Masud MZ (2020) Analysis of machine learning classifier in android malware detec-
tion through opcode. In: IEEE conference on application, information and network security
(AINS), IEEE. https://doi.org/10.1109/AINS50155.2020.9315060
12. Shhadat I, Al-bataineh B, Hayajneh A, Al-Sharif Z (2020) The use of machine learning tech-
niques to advance the detection and classification of unknown malware. Procedia Comput Sci
170:917–922.https://doi.org/10.1016/j.procs.2020.03.110
13. Zhu J et al (2022) abess: a fast best-subset selection library in python and r. J Mach Learn Res
23(202):1–7
14. Chaganti R, Ravi V, Pham TD (2022) Deep learning based cross architecture internet of things
malware detection and classification. Comput Secur 30:102779
15. Zhang X-L, Xu M (2022) AUC optimization for deep learning-based voice activity detection.
EURASIP J Audio Speech Music Process 2022(1):1–12
16. Xing X, Jin X, Elahi H, Jiang H, Wang G (2022) A malware detection approach using
autoencoder in deep learning. IEEE Access 10:1–1. https://doi.org/10.1109/ACCESS.2022.
3155695
17. Kim J, Ban Y, Ko E, Cho H, Yi J (2022) MAPAS: a practical deep learning-based android
malware detection system. Int J Inf Secur 21:1–14. https://doi.org/10.1007/s10207-022-005
79-6
Chapter 15
Evaluation of a Hybrid Dataset for Risk
Assessment of Heart Disease

Indrani Mukherjee, Pratik Bhattacharjee, and Suparna Biswas

1 Introduction

Machine learning has a wide range of uses, and healthcare is one of them. For example, machine learning is used in identifying diseases and making diagnoses, smart health records, drug discovery and manufacturing, medical image diagnosis, and machine learning-based behavior modification, among other things. People over 60, and men in particular, have a very high risk of heart disease; nevertheless, even people younger than 60 can face a similar risk. Obesity is a prominent determinant. This research focuses on whether a person who is obese also has heart disease, by examining the criteria for obesity. In the future, the researchers want to find a way to use Human Activity Recognition to detect obesity and, based on the results, determine how likely someone is to have heart disease. Some of the known contributors to obesity-related risk are high blood pressure and an increase in LDL. People who are overweight tend to eat too much, so the proportion of calories they burn goes down. For girls, PCOS can be one of the pathways to heart disease when it is caused by being overweight or obese.
The main objectives of the research are as follows:
1. First, identify the factors that play an important role in making a person obese; these can be medical factors, lifestyle, or daily habits.

I. Mukherjee (B)
OmDayal Group of Institution, Uluberia, West Bengal, India
e-mail: indranim849@gmail.com
P. Bhattacharjee
Sister Nivedita University, Newtown, Kolkata, West Bengal, India
e-mail: pratikb@ieee.org
S. Biswas
Maulana Abdul Kalam Azad University of Technology, Haringhata, West Bengal, India
e-mail: mailtosuparna@gmail.com


2. Find the thresholds of each relevant factor that indicate abnormality in a person's health and habits leading to obesity, and form classes accordingly.
3. Find the relations of those factors with heart disease.
4. Compare the experimental results of supervised learning algorithms (Support Vector Machine, Decision Tree, and Logistic Regression).
5. Try to improve the accuracy if it is not satisfactory.
6. Develop a methodology for making hybrid datasets (combining two or more datasets) and measure the performance.
7. Finally, compare the accuracy of the two methods.
Details of the methodology for combining two or more similar kinds of databases are discussed in Sect. 3.2.

2 Earlier Works

An FCN-based DenseNet framework has been proposed to automatically detect and classify skin lesions in dermoscopy images, with an accuracy of 98% on seven-category experimental data [1]. Another work predicted heart disease using supervised learning methods, namely Naïve Bayes, Decision Tree, K-Nearest Neighbor, and Random Forest, and compared their accuracy levels: Naïve Bayes obtained 84.16%, SVM-RFE 83.49%, Decision Tree 71.43%, and K-NN 83.16% [2]. A few works were carried out during the COVID period on remote health care [3, 4]. Another comparison of machine learning algorithms, namely Gaussian NB, Support Vector Machine, Random Forest, Hoeffding Tree, and Logistic Model Tree, has been done for the prediction of heart disease [5]. A method for applying unsupervised learning to weight-loss categorization has been proposed that aims to identify the thresholds for weight-loss categories required for supervised learning, by applying a dietary intervention program and analyzing each patient's profile at entry to the program and on a weekly basis [6]. Another work compares the deep learning methods of Recurrent Neural Networks and Convolutional Neural Networks for Human Activity Recognition [7]. A methodology for diagnosing skin diseases using Convolutional Neural Networks has been introduced by another research group; the system uses a computational technique to analyze an image, process it, and predict various features of the image [8]. Yet another comparison evaluates the machine learning algorithms Naïve Bayes, Classification Tree, K-NN, Logistic Regression, SVM, and ANN to determine which performs best [9]. A two-branch CNN has been used to learn a classification of glaucoma from retinal images by examining the whole image and a local region separately: in the first branch, a CNN analyzes the whole image, and in the second branch, a Faster R-CNN analyzes the disc region. For data, 3554 images were collected from 2000 patients; the procedure was reported successful on 1391 cases and unsuccessful on 2163 images, with 81.69% accuracy [10]. Three sparsity-learning-based regression models have been presented and evaluated for the automated prediction of Mini-Mental State Examination (MMSE) scores in Alzheimer's disease using T1-weighted magnetic resonance images (MRIs) of 678 subjects, including 190 healthy control (HC) subjects, 331 mild cognitive impairment (MCI) subjects, and 157 AD subjects; the authors used ridge, lasso, and elastic net as regression algorithms, with five granularity levels of whole-brain segmentation as independent variables and the MMSE score as the dependent variable, using tenfold cross-validation to measure the prediction performance and another tenfold to estimate the optimal parameters [11]. Another study describes a methodology to help diagnose sepsis using machine learning techniques: Backpropagation Artificial Neural Network (ANN), Support Vector Machine (SVM), and Random Forest (RF) classifiers were trained and tested using electronic health record (EHR) data for 185 critically ill patients, among whom 13 were diagnosed with heatstroke, 27 with trauma, 9 with severe pancreatitis, and 15 as post-operative; meanwhile, 102 of those cases were diagnosed with bacterial sepsis by a physician through the medical records [12]. A method of monitoring human activity without smartphones has also been proposed, performing activity recognition through posture identification [13].

3 Proposed Methodology

3.1 Dataset

The "Cardiovascular Disease Dataset," a publicly available dataset containing 70,000 instances and 11 attributes, was used as the basis for this study's first findings. It was then combined with the publicly available Heart Failure Prediction Dataset to generate a hybrid dataset. This latter dataset is itself a combination of four separate databases and has 918 observations with 11 attributes (including one target attribute). The resulting hybrid database contains 303 instances with all the attributes of the first dataset, with modified Cholesterol values. The merging procedure is discussed later in this paper. The stepwise workflow is shown in Fig. 1.

Fig. 1 Step wise procedure

3.2 Formation of Hybrid Dataset

Several steps and conditions are followed for making the hybrid dataset; a small sketch of the merging procedure is given after this list.
i. In the first dataset, the Cholesterol values are pre-classified into 3 classes: 1: normal, 2: above normal, 3: well above normal.
ii. In general, the Cholesterol attribute can be categorized into 4 classes with the following threshold values:
a. Cholesterol < 130 (Optimal/Normal) → Class 1
b. Cholesterol >= 130 and Cholesterol <= 159 (Borderline) → Class 2
c. Cholesterol >= 160 and Cholesterol <= 189 (High) → Class 3
d. Cholesterol >= 190 (Very High) → Class 4
iii. Now, in the first dataset, consider several factors such as age, gender, BMI (calculated from height and weight), and systolic and diastolic blood pressure, and check the level of abnormality each patient has.
iv. In the next step, check whether that particular patient has heart disease or not, as recorded in the target attribute of dataset 1.
v. If yes, then check the patient's cholesterol level (in dataset 1) and replace it with a close cholesterol value from dataset 2 belonging to a patient who has heart disease and whose other factors closely match.
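
A compact pandas sketch of steps iii-v follows; the column names (age, bmi, ap_hi, cholesterol, target) and the matching tolerance are assumptions for illustration, not the exact fields of the two public datasets.

import pandas as pd

def merge_cholesterol(df1: pd.DataFrame, df2: pd.DataFrame,
                      tol: float = 5.0) -> pd.DataFrame:
    """Replace dataset-1 cholesterol values, for heart-disease patients,
    with close values from matching dataset-2 heart-disease patients."""
    hybrid = df1.copy()
    sick2 = df2[df2["target"] == 1]  # dataset-2 patients with heart disease
    for i, row in hybrid[hybrid["target"] == 1].iterrows():
        # Steps iii and v: candidates whose age, BMI, and systolic blood
        # pressure are all within the chosen tolerance of this patient's.
        cand = sick2[(sick2["age"].sub(row["age"]).abs() <= tol)
                     & (sick2["bmi"].sub(row["bmi"]).abs() <= tol)
                     & (sick2["ap_hi"].sub(row["ap_hi"]).abs() <= tol)]
        if not cand.empty:
            hybrid.at[i, "cholesterol"] = cand.iloc[0]["cholesterol"]
    return hybrid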

3.3 Data Preprocessing and Feature Selection

Here, it is necessary to check whether any null values exist; however, there are none in this dataset. Feature selection focused on the attributes with a high correlation between being overweight or obese and having heart disease. The relationship between the chosen attributes and the target variable has been double-checked using the information gain technique.

3.4 Making Classes and Data Classification

Attribute thresholds must be determined before classes can be made for each attribute. Before that, everyone's BMI was computed from their height and weight. The classes for each attribute, with their thresholds, are displayed in Table 1; a small classing helper based on these thresholds is sketched below.
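
The thresholds of Table 1 translate directly into small classing helpers; the sketch below shows the cholesterol case (the BMI and blood-pressure attributes follow the same pattern).

def cholesterol_class(chol: float) -> int:
    """Map a cholesterol reading to the classes of Table 1."""
    if chol < 130:
        return 1  # Optimal / Normal
    if chol <= 159:
        return 2  # Borderline
    if chol <= 189:
        return 3  # High
    return 4      # Very High

print([cholesterol_class(c) for c in (120, 140, 170, 210)])  # [1, 2, 3, 4]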
The binary classification problem is studied using three machine learning
algorithms: the Support Vector Machine, the Decision Tree, and the Logistic
Regression.
Support Vector Machines (SVMs) are a supervised learning technique used in both classification and regression to determine the optimal maximum-margin separating hyperplane between the two classes [14]. An SVM works in a finite-dimensional vector space in which each dimension represents a characteristic of a given sample [15]. The hyperplane (and thus the kernel) can be linear if there are only two features; otherwise, it might be a polynomial or radial basis function.
Classification and regression can both benefit from the supervised learning approach known as a Decision Tree. The input to a Decision Tree is an item or situation characterized by a collection of attributes, and the output is a binary "yes" or "no."

Table 1 Attribute threshold values

Attribute | Classes with thresholds
BMI | BMI < 30 (Not Obese): Class 1; 30 <= BMI <= 34.9 (Obese): Class 2; 35 <= BMI <= 39.9 (Medium Obese): Class 3; BMI >= 40 (Severe Obese): Class 4
ap_hi (Systolic Blood Pressure) | ap_hi < 120: Class 1; 120 <= ap_hi <= 139: Class 2; 140 <= ap_hi <= 159: Class 3; ap_hi >= 160: Class 4
ap_lo (Diastolic Blood Pressure) | ap_lo < 80: Class 1; 80 <= ap_lo <= 89: Class 2; 90 <= ap_lo <= 99: Class 3; ap_lo >= 100: Class 4
Cholesterol | Cholesterol < 130: Class 1; 130 <= Cholesterol <= 159: Class 2; 160 <= Cholesterol <= 189: Class 3; Cholesterol >= 190: Class 4

Both continuous and discrete input values are acceptable. The leaf nodes return class
labels or probability scores [10]. Information gain and entropy calculations are the
basis of Decision Tree.
The categorical dependent variable can be predicted from a collection of independent factors using the supervised learning process known as Logistic Regression. The dependent variable is assigned a probabilistic value between 0 and 1. Logistic Regression can be modeled as

log(p(x) / (1 − p(x))) = β0 + xβ,

where p(x) is the probability for some input x, and β0, β are the regression coefficients.

3.5 Performance Measure

In the first experiment, that is, before making the hybrid dataset, the dataset is divided into two parts by the train-test split validation technique, 80% and 20%, respectively; Table 2 shows the resulting accuracy.

Table 2 Accuracy measurement for ML algorithms

Sl. No. | Name of the algorithm | Accuracy (%)
1 | SVM (rbf kernel) | 72.15
2 | Decision tree | 73.078
3 | Logistic regression | 71.85

Table 3 Confusion matrix

 | Actual positive value | Actual negative value
Predicted positive value | TP | FP
Predicted negative value | FN | TN

Again, to gauge performance with the hybrid dataset, it has been split into train and test data at a ratio of 80:20 using the train-test split validation approach. Performance may be evaluated using several measures, including accuracy, the confusion matrix, precision, and recall. The accuracy of the three classifiers on the hybrid dataset is shown in Table 4.
Confusion matrix–The confusion matrix is shown in Table 3, where
TP = True Positive (The actual value is positive and the model predicted value is
also a positive value.)
FP = False Positive (The actual value is negative and the model predicted value
a positive value.)
TN = True Negative (The actual value is negative and the model predicted value
is also a negative value.)
FN = False Negative (The actual value is positive and the model predicted value
a negative value.)

Precision = TP/(FP + TP)

Accuracy = (TN + TP)/(TN + FN + TP + FP)

Recall = TP/(FN + TP)

Table 4 shows the accuracy values, Table 5 the recall values, and Table 6 the precision values of SVM, DT, and LR on the hybrid dataset; the corresponding confusion matrices are shown in Figs. 2, 3 and 4 for SVM, DT, and LR, respectively.

Table 4 Accuracy under different algorithms with proposed hybrid dataset

Sl. No. | Name of the algorithm | Accuracy (%)
1 | SVM (rbf kernel) | 79.85
2 | Decision tree | 98.8
3 | Logistic regression | 85.45

Table 5 Recall value under different algorithms with proposed hybrid dataset

Sl. No. | Name of the algorithm | Recall (%)
1 | SVM (rbf kernel) | 87.5
2 | Decision tree | 98.8
3 | Logistic regression | 84.88

Table 6 Precision value under different algorithms with proposed hybrid dataset

Sl. No. | Name of the algorithm | Precision (%)
1 | SVM (rbf kernel) | 72.095
2 | Decision tree | 98.8
3 | Logistic regression | 77.79

Fig. 2 Confusion matrix for SVM (Proposed method)

Fig. 3 Confusion matrix for DT (Proposed method)

Fig. 4 Confusion matrix for LR (Proposed method)



The results and accuracy of the proposed methodology are compared with state-of-the-art methods in Table 7. It is observed that the proposed methodology performed much better, with an accuracy of up to 98.8%.
The accuracies of the two proposed methods are compared graphically in Fig. 5. It is found that the accuracy improved significantly after applying the hybrid database.

Table 7 Accuracy comparison with state-of-the-art methods

Reference | Authors | Techniques and accuracy
[14] | Halima El Hamdaoui, Saïd Boujraf, Nour El Houda Chaoui, Mustapha Maaroufi | NB 84.28%; K-NN 81.31%; SVM 77.42%; RF 77.14%; DT 82.28%
[5] | Pranav Motarwar, Ankita Duraphe, G Suganya, M Premalatha | Gaussian NB 93.44%; Support Vector Machine 77.16%; Random Forest 95.08%; Hoeffding Tree 81.24%; Logistic Model Tree 80.69%
[2] | Devansh Shah, Samir Patel, Santosh Kumar Bharti | Naïve Bayes 88.157%; K-NN 90.789%; Decision tree 80.263%; Random forest 86.84%
[9] | Kumar Dwivedi | Naïve Bayes 83%; Classification tree 77%; K-NN 80%; Logistic regression 80%; SVM 76%; ANN 84%
– | First model (without hybrid dataset) | SVM (rbf kernel) 72.15%; Decision tree 73.078%; Logistic regression 71.85%
– | Proposed method (hybrid dataset) | SVM (rbf kernel) 77.85%; DT 98.8%; LR 83.45%

Fig. 5 The accuracy comparison of two methods (Before and after applying hybrid database)

4 Conclusion

This study's primary objective is to devise a method for increasing the accuracy of classification algorithms (Support Vector Machine, Decision Tree, and Logistic Regression) by combining existing databases to create a new one. Using this idea, the results displayed in Table 4 show that the Decision Tree achieved the largest improvement in accuracy, while the remaining two algorithms also made some progress. Table 7 compares the accuracy of this research with other existing research, and Fig. 5 presents a bar graph illustrating how the accuracy of our first method for detecting heart disease with respect to obesity increased after applying the hybrid database method.
In the long run, this research could be combined with Human Activity Recognition so that further progress can be made.

References

1. Adegun AA, Viriri S (2020) FCN-based DenseNet framework for automated detection and
classification of skin lesions in dermoscopy images. IEEE Access 8:150377–150396
2. Shah D, Patel S, Bharti SK (2020) Heart disease prediction using machine learning techniques.
SN Comput Sci 1–6, Springer Nature Journal
3. Bhattacharjee P, Biswas S, Roy S (2022) Design of an optimised, low cost, contactless ther-
mometer with distance compensation for rapid body temperature scanning. Int Conf Electr
Electron Eng 503–511. https://doi.org/10.1007/978-981-19-1677-945
4. Bhattacharjee P, Biswas S (2021) Smart walking assistant (swa) for elderly care using an
intelligent realtime hybrid model. Evolving Syst 1–15 (2021). https://doi.org/10.1007/s12530-
021-09382-5

5. Motarwar P, Duraphe A, Suganya G, Premalatha M (2020) Cognitive approach for heart disease
prediction using machine learning. In: 2020 international conference on emerging trends in
information technology and engineering (ic-ETITE), pp 1–5
6. Babajide O, Tawfik H, Palczewska A, Gorbenko A, Astrup A, Martinez JA, Oppert J-M,
Sorensen TIA (2019) Application of unsupervised learning in weight-loss categorisation for
weight management programs. In: The 10h IEEE international conference on dependable
systems, services and technologies, DESSERT’2019. IEEE, pp 94–101
7. Roobini S, Fenila Naomi J (2019) Smartphone sensor based human activity recognition using
deep learning models. Int J Recent Technol Eng (IJRTE), 8(1), 2019
8. Rathod J, Waghmode V, Sodha A, Bhavathankar P (2018) Diagnosis of skin diseases
using convolutional neural networks. In: Proceedings of the 2nd international conference on
electronics, communication and aerospace technology (ICECA 2018), pp 1048–1051, IEEE
9. Dwivedi AK (2018) Performance evaluation of diferent machine learning techniques for
prediction of heart disease. Neural Comput Appl 29(10):685–693
10. Chai Y, He L, Mei Q, Liu H, Xu L (2017) Deep learning through two-branch convolutional
neuron network for glaucoma diagnosis. In: Proceedings of international conference on smart
health. Springer, Berlin, pp 191–201
11. Zhang J, Luo Y, Jiang Z, Tang X (2017) Regression analysis and prediction of mini-mental state
examination score in Alzheimer’s disease using multi-granularity whole-brain segmentations.
In: Proceedings of international conference on smart health. Springer, Berlin, pp 202–213
12. Liu Y, Choi KS (2017) Using machine learning to diagnose bacterial sepsis in the critically
Ill patients. In: Proceedings of international conference on smart health. Springer, Berlin, pp
223–233
13. Saha J, Chowdhury C, Biswas S (2017) Device independent activity monitoring using smart
handles. In: 7th International conference of cloud computing data science and engineering, pp
1–6
14. Hamdaoui HE, Boujraf S, Chaoui NEH, Maaroufi M (2020) A clinical support system for
prediction of heart disease using machine learning techniques. In: 5th International conference
on advanced technologies for signal and image processing, ATSIP’ 2020. pp 1–5
15. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
Chapter 16
Distances from Fuzzy Implications

Kavit Nanavati, Megha Gupta, and Balasubramaniam Jayaram

1 Introduction

In the literature, a few works have dealt with the construction of distance functions using t-norms, copulas, quasi-copulas, and t-conorms, all of which are either commutative, associative, or monotonic fuzzy logic connectives; see [1, 2, 10]. Recently [7], the construction of distance functions from a non-commutative, non-associative, and mixed-monotonic fuzzy logic connective, viz., a fuzzy implication, has been proposed. The necessary and sufficient condition for the proposed distance function to be a metric leads to a functional inequality, which has been studied for different families of fuzzy implications; see [7–9].
Recently, pseudo-monometrics w.r.t. a ternary relation, called the betweenness
relation, have garnered a lot of attention for their essential role in penalty-based
data aggregation, ranking rules, and binary classification [4, 11, 12]. These are a
few applications showcasing the importance of construction of monometrics on a set
equipped with different relational structures. In [9], it was shown that the distance
function proposed through fuzzy implications turns out to be a pseudo-monometric
on a partially ordered set X . In [5], the authors have proposed yet another construction
of distance functions from fuzzy implications on a lattice, which turns out to be a
pseudo-monometric w.r.t. the lattice betweenness relation.

K. Nanavati (B) · M. Gupta · B. Jayaram


Department of Mathematics, Indian Institute of Technology Hyderabad, Hyderabad, Telangana
502285, India
e-mail: ma20resch01004@iith.ac.in


1.1 Motivation for and Contribution of this work

In this work, we generalise the distance function from fuzzy implications proposed in [9] using t-conorms. We show sufficient conditions under which the proposed distance yields a metric for different t-conorms, along with examples and counter-examples. In this quest, we also give a characterisation of the fuzzy implications I for which the sum of I (x, y) and I (y, x) is constant.
In our work, we also show if and when the proposed distance function from fuzzy implications turns out to be a pseudo-monometric on ([0, 1], ≤).

2 Preliminaries

In this section, we take a look at some definitions and examples that will be useful
in the sequel.

Definition 1 (cf. [6]) A commutative, associative, and increasing function S : [0, 1]² → [0, 1] is called a t-conorm if S(0, x) = x, for all x ∈ [0, 1].

Table 1 lists a few examples of t-conorms.

Definition 2 (cf. [3]) A function I : [0, 1]² → [0, 1] is said to be a fuzzy implication if I is decreasing in the first variable, increasing in the second variable, and satisfies I (0, 0) = 1, I (1, 1) = 1, and I (1, 0) = 0.

Table 2 lists a few examples of fuzzy implications; for more examples, see [3].

Definition 3 A symmetric function d : X × X → [0, ∞) is called a distance function on X if it satisfies the following property for any x, y ∈ X :

x = y =⇒ d(x, y) = 0. (P1)

Further, it is called a metric if the converse of (P1) holds and it also satisfies the
following property for any x, y, z ∈ X :

Table 1 Some examples of t-conorms

Name | Formula
Maximum | SM (x, y) = max(x, y)
Probabilistic sum | SP (x, y) = x + y − xy
Łukasiewicz | SLK (x, y) = min(x + y, 1)
Drastic sum | SD (x, y) = max(x, y), if min(x, y) = 0; 1, otherwise

Table 2 Some examples of fuzzy implications

Name | Formula
Reichenbach | IRC (x, y) = 1 − x + xy
Rescher | IRS (x, y) = 1, if x ≤ y; 0, otherwise
I1 | I1 (x, y) = 0, if (x, y) = (1, 0); 1, otherwise
I0 | I0 (x, y) = 1, if x = 0 or y = 1; 0, otherwise

d(x, z) ≤ d(x, y) + d(y, z). (P2)

Definition 4 (cf. [12]) Let (X , ≤) be a partially ordered set. A function d : X × X → [0, ∞) is called a pseudo-monometric on (X , ≤) if it satisfies (P1), and its converse, along with the following property for any x, y, z ∈ X :

x ≤ y ≤ z =⇒ max(d(x, y), d(y, z)) ≤ d(x, z). (1)

In the sequel, when we refer to the term pseudo-monometric, we mean a pseudo-monometric on ([0, 1], ≤), where ≤ is the usual order on [0, 1].

Definition 5 (cf. [9]) Given a t-conorm S and a fuzzy implication I on [0, 1], the
pair (S, I ) is said to satisfy (S, I )-transitivity if

S(I (x, y), I (y, z)) ≥ I (x, z), for all x, y, z ∈ [0, 1]. (SIT)

Definition 6 [7] Let I be a fuzzy implication. Define d I : [0, 1] × [0, 1] → [0, 1] as

d I (x, y) = 0, if x = y; I (min(x, y), max(x, y)), otherwise.

Theorem 1 (cf. Theorem 1 [7]) d I is a metric iff I satisfies (SLK , I )-transitivity and
satisfies the following condition:

I (x, y) > 0, whenever x < y, x, y ∈ [0, 1]. (2)

Theorem 2 (cf. Lemma 12 [9]) d I is a pseudo-monometric for any fuzzy implication I .

3 Distance Functions using Fuzzy Implications

In this section, we shall generalise the distance function given in Definition 6 using
any t-conorm S. We shall then present some sufficient conditions under which our
proposed distance function yields a metric or a pseudo-monometric for the major
t-conorms given in Table 1. We shall provide examples and counter-examples for the
same.
Note that the distance function d I defined in Definition 6 is equivalent to

d I (x, y) = max(I (x, y), I (y, x)) = SM (I (x, y), I (y, x)) for x ≠ y, and d I (x, y) = 0 for x = y.

Taking a cue from the above definition, we can generalise $d_I$ for any t-conorm $S$.

Definition 7 Let $I$ be a fuzzy implication. Define $d_{I,S} : [0, 1] \times [0, 1] \to [0, 1]$ as

$$d_{I,S}(x, y) = \begin{cases} 0, & \text{if } x = y, \\ S(I(x, y), I(y, x)), & \text{otherwise.} \end{cases}$$

Note that $d_I$ is a particular case of $d_{I,S}$ with $S = S_M$, i.e., $d_I = d_{I,S_M}$.
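As a concrete illustration (ours, not part of the paper), the following minimal Python sketch implements $d_{I,S}$ for the four t-conorms of Table 1, using the Reichenbach implication as an example:

```python
# Illustrative sketch (ours): d_{I,S} of Definition 7 for the
# t-conorms of Table 1, using the Reichenbach implication I_RC.

def I_RC(x, y):
    """Reichenbach implication: I_RC(x, y) = 1 - x + x*y."""
    return 1 - x + x * y

def S_M(a, b):                         # maximum
    return max(a, b)

def S_P(a, b):                         # probabilistic sum
    return a + b - a * b

def S_LK(a, b):                        # Lukasiewicz
    return min(a + b, 1)

def S_D(a, b):                         # drastic sum
    return max(a, b) if min(a, b) == 0 else 1

def d(I, S, x, y):
    """d_{I,S}(x, y): 0 on the diagonal, S(I(x, y), I(y, x)) otherwise."""
    return 0.0 if x == y else S(I(x, y), I(y, x))

for name, S in [("S_M", S_M), ("S_LK", S_LK), ("S_P", S_P), ("S_D", S_D)]:
    print(name, d(I_RC, S, 0.2, 0.7))
```

Running the loop makes the ordering $d_{I,S_M} \le d_{I,S_P} \le d_{I,S_{LK}} \le d_{I,S_D}$ of the underlying t-conorms directly visible on a sample pair.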


Lemma 1 Given any fuzzy implication $I$ and a t-conorm $S$, there exists a fuzzy implication $I'$ such that $d_{I,S_M} = d_{I',S}$. Note that such a fuzzy implication $I'$ can be defined as:

$$I'(x, y) = \begin{cases} 0, & \text{if } x > y, \\ I(x, y), & \text{otherwise.} \end{cases} \tag{3}$$

Note that $d_{I,S}$ is always a distance function, and it satisfies the converse of (P1) only if $I$ satisfies (2). Also, it need not always satisfy the triangle inequality, as can be seen from the following result.

Lemma 2 Let $I'$ be a fuzzy implication as defined in (3), where $I$ does not satisfy $(S_{LK}, I)$-transitivity. Then $d_{I',S}$ is not a metric w.r.t. any t-conorm $S$.

The following lemma provides a sufficient condition under which $d_{I,S}$ yields a pseudo-monometric.

Lemma 3 $d_{I,S}$ is a pseudo-monometric if $I(x, y) = 0$ whenever $x > y$.
Now, we take a look at the behaviour of $d_{I,S}$ for the major t-conorms given in Table 1. Recall that for $S = S_M$, $d_I = d_{I,S_M}$, and the results pertaining to $d_I$ have been discussed in Sect. 2 (for more details, see [9]). Thus, we shall discuss the remaining t-conorms in the sequel.

3.1 $S = S_{LK}$

In this section, we study the sufficient conditions under which the distance function $d_{I,S}$ yields a metric and a pseudo-monometric when $S$ is the Łukasiewicz t-conorm. We also give examples and counter-examples for the same. For $S = S_{LK}$, we get the following definition for $d_{I,S}$:

Definition 8 Let $I$ be a fuzzy implication. Define $d_{I,S_{LK}} : [0, 1] \times [0, 1] \to [0, 1]$ as

$$d_{I,S_{LK}}(x, y) = \begin{cases} 0, & \text{if } x = y, \\ \min(I(x, y) + I(y, x), 1), & \text{otherwise.} \end{cases}$$

Theorem 3 $d_{I,S_{LK}}$ is a metric if $I$ satisfies $(S_{LK}, I)$-transitivity.

Corollary 1 If $d_I$ is a metric, then $d_{I,S_{LK}}$ is also a metric.
Note that the converse of the above result need not be true. Consider the fuzzy implication $I$ defined as follows:

$$I(x, y) = \begin{cases} 1, & \text{if } x = 0, \\ \min\left(\dfrac{1 + 4y}{3},\, 1\right), & \text{if } x < 0.11, \\ y, & \text{otherwise.} \end{cases} \tag{4}$$

Then, $d_{I,S_{LK}}$ is a metric but $d_I$ is not, since

$$d_I(0.1, 0.11) + d_I(0.11, 0.45) = 0.48 + 0.45 = 0.93 \not\ge 0.933 = d_I(0.1, 0.45).$$
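This counter-example can be verified numerically; the following short sketch (ours, not part of the paper) encodes the implication (4) and the two distances:

```python
# Numerical check (ours) of the counter-example built on Eq. (4).

def I(x, y):
    if x == 0:
        return 1.0
    if x < 0.11:
        return min((1 + 4 * y) / 3, 1.0)
    return y

def d_I(x, y):                 # d_I = d_{I,S_M} of Definition 6
    return 0.0 if x == y else I(min(x, y), max(x, y))

def d_I_SLK(x, y):             # d_{I,S_LK} of Definition 8
    return 0.0 if x == y else min(I(x, y) + I(y, x), 1.0)

# Triangle inequality fails for d_I on (0.1, 0.11, 0.45): 0.93 < 0.933...
print(d_I(0.1, 0.11) + d_I(0.11, 0.45), d_I(0.1, 0.45))
# ...but holds for d_{I,S_LK} on the same triplet: 1.14 >= 1.0
print(d_I_SLK(0.1, 0.11) + d_I_SLK(0.11, 0.45), d_I_SLK(0.1, 0.45))
```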

We thus see from Corollary 1 and the example above that $d_{I,S_{LK}}$ is a richer source of metrics than $d_I$. Note, however, that $d_{I,S_{LK}}$ need not always be a metric, as can be seen from the remark below.

Remark 1 Using the fuzzy implication $I$ given in (4), one can construct a fuzzy implication $I'$ as given in (3). From Lemma 2, we see that $d_{I',S_{LK}}$ would not be a metric, since $I$ does not satisfy $(S_{LK}, I)$-transitivity.

Remark 2 From Lemma 3, it is clear that $d_{I,S_{LK}}$ is a pseudo-monometric if $I(x, y) = 0$ whenever $x > y$. However, it need not always be a pseudo-monometric; see the example below.

Example 1 Consider the fuzzy implication $I$ defined as in (4). Then $d_{I,S_{LK}}$ is not a pseudo-monometric, since for the triplet $(0.2, 0.3, 0.4)$ we have

$$d_{I,S_{LK}}(0.3, 0.4) = 0.7 \not\le 0.6 = d_{I,S_{LK}}(0.2, 0.4).$$



3.2 $S = S_P$

In this section, we study the sufficient conditions under which the distance function $d_{I,S}$ yields a metric and a pseudo-monometric when $S$ is the probabilistic sum t-conorm. We also give some examples and counter-examples for the same. For $S = S_P$, we get the following definition for $d_{I,S}$:

Definition 9 Let $I$ be a fuzzy implication. Define $d_{I,S_P} : [0, 1] \times [0, 1] \to [0, 1]$ as

$$d_{I,S_P}(x, y) = \begin{cases} 0, & \text{if } x = y, \\ I(x, y) + I(y, x) - I(x, y) \cdot I(y, x), & \text{otherwise.} \end{cases}$$

Example 2 Consider the fuzzy implication $I$ defined as in (4). Then $d_{I,S_P}$ is a metric. Note that $d_{I,S_P}$ need not always be a metric; see the remark below.

Remark 3 Using the fuzzy implication $I$ given in (4), one can construct a fuzzy implication $I'$ as given in (3). From Lemma 2, we see that $d_{I',S_P}$ would not be a metric, since $I$ does not satisfy $(S_{LK}, I)$-transitivity.

Remark 4 From Lemma 3, it is clear that $d_{I,S_P}$ is a pseudo-monometric if $I(x, y) = 0$ whenever $x > y$. However, it need not always be a pseudo-monometric; see the example below.

Example 3 Consider the fuzzy implication $I$ defined as in (4). Then $d_{I,S_P}$ is not a pseudo-monometric, since for the triplet $(0.2, 0.3, 0.4)$ we have

$$d_{I,S_P}(0.3, 0.4) = 0.58 \not\le 0.52 = d_{I,S_P}(0.2, 0.4).$$

Theorem 4 Let $I$ be a fuzzy implication such that

$$I(x, y) + I(y, x) = k, \quad \text{for all } (x, y) \in (0, 1)^2, \text{ where } k \in [0, 2]. \tag{5}$$

Then $d_{I,S_P}$ is both a metric and a pseudo-monometric.


In the following theorem, we provide a complete characterisation of fuzzy implications satisfying (5).

Theorem 5 Let $I$ be a fuzzy implication. Then (5) is true if and only if there exists a fuzzy implication $I'$ such that, for all $(x, y) \in (0, 1)^2$,

$$I(x, y) = \begin{cases} \dfrac{k}{2}, & \text{if } x = y, \\[4pt] \min\left(k,\, \max\left(\dfrac{k}{2},\, I'(x, y)\right)\right), & \text{if } x < y, \\[4pt] k - \min\left(k,\, \max\left(\dfrac{k}{2},\, I'(y, x)\right)\right), & \text{if } x > y. \end{cases} \tag{6}$$

We shall denote the fuzzy implication $I$ defined in (6) as $I = \langle I', k \rangle$.

Example 4 $I_1 = \langle I' = I_{RS},\, k = 2 \rangle$, and $I_0 = \langle I',\, k = 0 \rangle$ for any fuzzy implication $I'$.

3.3 $S = S_D$

In this section, we study the sufficient conditions under which the distance function $d_{I,S}$ yields a metric and a pseudo-monometric when $S$ is the drastic sum t-conorm. We also give some examples and counter-examples for the same. For $S = S_D$, we get the following definition for $d_{I,S}$:

Definition 10 Let $I$ be a fuzzy implication. Define $d_{I,S_D} : [0, 1] \times [0, 1] \to [0, 1]$ as

$$d_{I,S_D}(x, y) = \begin{cases} 0, & \text{if } x = y, \\ I(\min(x, y), \max(x, y)), & \text{if } I(\max(x, y), \min(x, y)) = 0, \\ 1, & \text{otherwise.} \end{cases}$$

It is clear that for fuzzy implications satisfying $I(\max(x, y), \min(x, y)) = 0$ for all $x, y \in [0, 1]$ with $x \ne y$, we have $d_{I,S_D} = d_{I,S_M}$; for instance, the fuzzy implication defined in (3).

Lemma 4 $d_{I,S_D}$ is a discrete metric if $I(x, y) > 0$ whenever $x > y$, except when $(x, y) = (1, 0)$.

Note that the converse of the above lemma need not be true. Consider, for example, the Rescher implication $I_{RS}$ given in Table 2. While $I_{RS}(x, y) = 0$ whenever $x > y$, it still yields a discrete metric. In fact, for any fuzzy implication $I$ satisfying $I(x, y) + I(y, x) = 1$, $d_{I,S_D}$ yields a discrete metric.
Note that $d_{I,S_D}$ need not always be a metric or a pseudo-monometric; see the example below.

Example 5 Consider the fuzzy implication $I$ defined as follows:

$$I(x, y) = \begin{cases} 1, & \text{if } x = 0 \text{ or } y = 1, \\ 0, & \text{if } (x, y) \in [0.5, 1] \times [0, 0.3], \\ 0.1, & \text{otherwise.} \end{cases}$$

Then, $d_{I,S_D}$ is not a metric, since

$$d_{I,S_D}(0.3, 0.5) + d_{I,S_D}(0.5, 0.2) = 0.1 + 0.1 = 0.2 \not\ge 1 = d_{I,S_D}(0.3, 0.2).$$

Also, it is not a pseudo-monometric, since for the triplet $(0.2, 0.3, 0.5)$ we have

$$d_{I,S_D}(0.2, 0.3) = 1 \not\le 0.1 = d_{I,S_D}(0.2, 0.5).$$

Remark 5 One can easily lift the distance function $d_{I,S}$ on $[0, 1]$ to any $X \ne \emptyset$ as follows. Consider a mapping $f : X \to [0, 1]$. Define $d^*_{I,S} : X \times X \to [0, 1]$ as follows: for any $x, y \in X$,

$$d^*_{I,S}(x, y) = d_{I,S}(f(x), f(y)) = \begin{cases} 0, & \text{if } x = y, \\ S(I(f(x), f(y)),\, I(f(y), f(x))), & \text{otherwise.} \end{cases}$$

Clearly, $d^*_{I,S}$ is a distance function, and it is a metric if $d_{I,S}$ is a metric.

4 Concluding Remarks

In [9], the authors proposed a distance function $d_I$ using a fuzzy implication $I$ that turns out to be a metric if $I$ satisfies $(S_{LK}, I)$-transitivity and is always a pseudo-monometric on $([0, 1], \le)$. In this work, we generalise $d_I$ using a t-conorm, showcasing the applicational value of fuzzy logic connectives (FLCs). The paper studies the sufficient conditions under which our proposed distance function $d_{I,S}$ is a metric for the major t-conorms. Towards this end, we show that $d_{I,S_{LK}}$ is a richer source of metrics than $d_I$. In this quest, our work also offers a characterisation of fuzzy implications satisfying the functional equality $I(x, y) + I(y, x) = k$, when $(x, y) \in (0, 1)^2$ and $k \in [0, 2]$. Also, while $d_I$ always yields a pseudo-monometric on $([0, 1], \le)$, we see that $d_{I,S}$ does not; we therefore study the conditions under which a pseudo-monometric on $([0, 1], \le)$ can be obtained from $d_{I,S}$, which gives yet another construction of pseudo-monometrics. It has also been shown that any metric or pseudo-monometric obtained from $d_I$ can be obtained from $d_{I,S}$ for any t-conorm $S$, so that we now have more examples of pseudo-monometrics and metrics.

Acknowledgements The third author would like to acknowledge the support obtained from SERB
under the project MTR/2020/000506 for the work contained in this submission.

References

1. Aguiló I, Martín J, Mayor G, Suñer J (2015) On distances derived from t-norms. Fuzzy Sets
Syst 278:40–47
2. Alsina C (1984) On some metrics induced by copulas. In: General inequalities 4. Springer, pp 397–397
3. Baczyński M, Jayaram B (2008) Fuzzy implications. Studies in fuzziness and soft computing,
vol 231. Springer, Berlin, Heidelberg
4. Gupta M, Jayaram B (manuscript under preparation) On the role of monometrics in nearest
neighbor classification
5. Gupta M, Nanavati K, Jayaram B (submitted) Pseudo-monometrics on lattice betweenness using fuzzy implications
6. Klement EP, Mesiar R, Pap E (2000) Triangular norms. Trends in logic, vol 8. Kluwer Academic
Publishers, Dordrecht
7. Nanavati K, Gupta M, Jayaram B (2021) Metrics from fuzzy implications and their application. In: 9th international conference on pattern recognition and machine intelligence (PREMI)
8. Nanavati K, Gupta M, Jayaram B (2022) Monodistances from fuzzy implications. In: Informa-
tion processing and management of uncertainty in knowledge-based systems. Springer, Cham,
pp 169–181
9. Nanavati K, Gupta M, Jayaram B (2022) Pseudo-monometrics from fuzzy implications. Fuzzy
Sets Syst
10. Ouyang Y (2012) A note on metrics induced by copulas. Fuzzy Sets Syst 191:122–125
11. Pérez-Fernández R, Baets BD (2017) The role of betweenness relations, monometrics and
penalty functions in data aggregation. In: Proceedings of IFSA-SCIS 2017. IEEE, pp 1–6
12. Pérez-Fernández R, Rademaker M, De Baets B (2017) Monometrics and their role in the
rationalisation of ranking rules. Inf Fusion 34:16–27
Chapter 17
Real-Time Quick Fog Removal
Technique for Supporting Vehicles
on Hilly Routes Amid Dense Fog

K. Janaki, K. Jebastin, and K. Dhinakaran

1 Introduction

Around 1.4 million individuals worldwide lose their precious lives to traffic accidents each year, with 3287 people dying on average each day. Road accidents result in 20–50 million additional injuries worldwide every year. One death is predicted to occur globally every 25 s. Every year, more than 0.147 million individuals in India die in road accidents, and more than 0.47 million suffer injuries. A media article claims that more than 11,000 lives are lost annually in traffic accidents because of fog. Each year, fog causes over 24,000 injuries, or 16% of all traffic accidents (Organization [1], Transport [2]). Fog is a collection of extremely fine moisture from tiny water droplets near the earth's surface. Due to a drastic drop in temperature, moisture in the air is suspended and creates fog. Water droplets with a radius of 1 to 10 µm make up fog. Every time light penetrates the fog, it disperses and lessens contrast in the area. Fog hence creates a thick, white veil over the scene, and driving becomes exceedingly difficult for a motorist because of this reduced visibility. The high altitude in hilly terrain causes a faster rate of temperature decline than in the plain zone. As a result of moisture suspension, thick fog accumulates in the hilly terrain. Because mountainous roads are riskier to
K. Janaki (B)
M.E-Applied Electronics, PSN College of Engineering and Technology, Melathediyoor,
Tirunelveli, Tamilnadu, India
e-mail: janakik905@gmail.com
K. Jebastin
Deparment of Electronics and Communication Engineering, PSN College of Engineering and
Technology, Melathediyoor, Tirunelveli, Tamilnadu, India
e-mail: jebastin@psncet.ac.in
K. Dhinakaran
Senior Tech Lead HCL Technologies, Bangalore, India
e-mail: dhinakarank@hcl.com


drive on than flat ones, they are considered in this situation. On a mountainous road, dense fog affects how drivers perceive their surroundings, making it difficult to see nearby objects, pedestrians, and even other cars. Too much fog obscures the road view, and driving at high speed becomes impossible; as a result, driving becomes extremely dangerous. The likelihood of an accident increases in two ways: first, the likelihood of a collision increases, and second, the likelihood of falling down the slope increases. Figure 1 depicts a mountainous route covered in dense fog. Some established techniques exist for defogging images; however, there is still much to learn about how to remove thick fog on uphill routes. To aid drivers in seeing clearly while driving uphill in heavy fog, this article proposes a rapid, real-time fog removal technique. The suggested method would be helpful for a safe drive on a heavily fogged mountainous route with poor visibility (below 100 m). The main contributions of this paper are summarized as follows. For defogging thick video frames, a least-filtering approach based on an atmospheric scattering model is paired with separate histogram equalization on each color channel; compared to cutting-edge approaches, these integrated techniques offer a clear, fog-free output in real time. Rather than estimating ambient light at every frame, it is done at intervals of 6000 frames to cut down the lengthy processing time. A dynamic patch is used to implement frame inversion, providing smaller patches for darker pixels and larger patches for brighter pixels to reduce computation time without compromising the final frame's fog-free quality; the dynamic patch approach solves the issue of frame improvement for the dark and sky regions. The literature study is included in Sect. 2, and the suggested technique and execution are explained in Sect. 3. The comparison, time delay analysis, and experimental and simulation results are presented in Sect. 4.

Fig. 1 Flowchart of the proposed real-time fog removal approach



2 Field of Study

There are several existing fog-dispersal algorithms available. All of these, nevertheless, are relevant to a single image and a certain context, such as daytime, nighttime, or sea view. The following list includes a handful of the current methods. With the use of a guided filter, a fog removal technique is shown for both pictures and videos in Lin and Wang [3]. Attenuation, which decreases the contrast, is restored after the filter analyses the light from the atmosphere. A dark channel prior was introduced to remove mist from pictures; a dark pixel may therefore be used to determine the haze transmission. A haze-free image may be reconstructed by combining soft matting with a haze-imaging model, where the light in the cloudy input frame is used to estimate the optical transmission (Fattal [4]). A scene view without any fog is possible thanks to the depth map, which also allows for a fast approximation of the transmission map. An optimum transmission map for removing fog from a single image is created in He et al. [5]; a boundary prior is added to the initial transmission map after carefully analyzing the visual model. For nighttime frames, a super-pixel-based fog reduction approach has been proposed: by utilizing virtual smoothness, the input frames are separated into glow-free and glow-foggy frames. For visual marine surveillance, Hu et al. [6] offer a single-picture fog removal technique; a scattering model and the radiance decomposition approach remove the fog layer and the glow effect on the air light, respectively, and the transmission map is then projected. The suggested radiance compensation approach also makes it possible to create a frame that is free of fog. A gamma correction prior-based dehazing technique is provided to restore hazy images.

3 Theory and Proposed Approach

In this paper, a quick and creative method for removing fog from a driver's field of vision in dense fog in mountainous terrain is provided. The temporal complexity and a clear, fog-free output are two of the biggest hurdles. The processing time for each frame is relatively brief thanks to the distinctive architecture of the suggested technique. The suggested method combines frame inversion, transmission map estimation, and recovery of a clear image using the atmospheric light scattering model. All frames are subjected to separate equalization of each color channel for significant contrast modification. The initial frame's pixel intensity is used to determine atmospheric light, which is adjusted every 6000 frames via a dynamic patch. The complete structure is represented step by step in Fig. 1. Real-time video acquisition is the first stage in frame enhancement. Real-time video recording is done with a high-definition web camera. The camera is positioned inside the windshield glass at the driver's eye level to give a sense of the road. This

camera can capture 31 color frames per second at up to 1280 × 720 pixels.

3.1 Frame Extraction

As the real-time acquired video is 30 fps, 30 frames are extracted per second for processing. Since there are not many changes between consecutive frames, every alternate frame is taken for fog detection. Because the given approach accesses every pixel when implementing the equations mentioned in the other sections, input frames are resized to half of their original scale, which helps process each frame twice as fast.

3.2 Atmospheric Light Estimation Using Least-Filtering


Technique with Dynamic Patch

The scattering model of Badhe and Ramteke [7], Bai et al. [8], Tian et al. [9], Toka et al. [10], and Maa et al. [11] is widely used to describe any hazy/foggy image frame by the equation

$$I(x) = J(x)\,t(x) + A\,(1 - t(x)) \tag{1}$$

where $I(x)$ indicates the input foggy frame, $A$ specifies the atmospheric light, and $J(x)$ signifies the fog-free output frame. Also, $t(x)$ denotes the transmission (the inverted frame) and is given as

$$t(x) = e^{-\beta d(x)} \tag{2}$$

where $d(x)$ denotes the depth in the image and $\beta$ denotes the fog factor (He et al. [12], Zhu et al. [13]). For a picture taken in perfect conditions, $\beta \approx 0$ and hence $I \approx J$. Similarly, when an image is taken under heavily foggy conditions, $\beta > 0$ becomes a non-negligible value. In (1), $J(x)t(x)$ is the linear attenuation and $A(1 - t(x))$ is the light of the atmosphere. A full frame is divided into numerous small patches $\Omega$. As part of the fog elimination process, $t$, $A$, and $J$ are to be computed from $I$ (Tufail et al. [14]). The dynamic local patch $\Omega(x)$ is computed at run time from the intensity of the local pixels as follows:

$$\Omega(x) = \begin{cases} 5 & \forall\ \text{pixels with intensity} \le 100, \\ 10 & \forall\ \text{pixels with } 101 \le \text{intensity} \le 200, \\ 15 & \forall\ \text{pixels with intensity} > 200. \end{cases} \tag{3}$$

Each patch has at least one RGB value that is the lowest among the color channels. A least (minimum) filter is applied to every local patch of the three RGB channels; this produces a frame with very little intensity. The lowest intensity $I^{\text{lowest}}(x)$ of any pixel is estimated as

$$I^{\text{lowest}}(x) = \min_{c \in \{r, g, b\}} \left( \min_{y \in \Omega(x)} I^c(y) \right) \tag{4}$$

where $I$ represents a sample input frame, $I^c$ is $I$'s color channel $c$, $I^{\text{lowest}}$ is $I$'s lowest intensity, which is nearly 0 for a frame with the fog or haze removed (Tian et al. [9], He et al. [12], Tufail et al. [14]), and $\Omega(x)$ denotes the local patch at location $x$. The two least operators, $\min_{y \in \Omega(x)}$ and $\min_{c \in \{r, g, b\}}$, together yield the lowest intensity (Fig. 2b, c); these least operators are commutative. By examining the lowest RGB intensity $I^{\text{lowest}}$, the atmospheric light $A$ is calculated: the brightest 0.1% of all the pixels are selected as having the highest intensity values, the coordinate positions of these brightest (0.1%) pixels are chosen, and the peak value of intensity in each RGB color channel is distinctly determined from these pixel locations (Yawale and Kapse [15], He et al. [12]). These three RGB channel intensity values are regarded as the final value of the atmospheric light $A$. Thus, $A$ is a $3 \times 1$ vector in which each entry is the maximum intensity of the corresponding channel:

$$A^c = \max_{x \in P} I^c(x), \quad c \in \{r, g, b\}, \tag{5}$$

where $P$ is the set of the brightest $0.1\% \times h \times w$ pixels of $I^{\text{lowest}}$. The light in the atmosphere $A$ is brought on by sunlight, and sunshine does not fluctuate quickly from frame to frame. As a result, the atmospheric light $A$ is determined for the first frame and then re-estimated after every 6000 frames. The pixels of the input frame that are brightest in the dark channel are thus used to determine the ambient light $A$: the intensity of these pixels is examined per channel, and the ambient light is thereby obtained.
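As a concrete illustration of this pipeline (ours, not the authors' code; the function and variable names are our own), the least filtering with the dynamic patch of (3) and the atmospheric light estimate of (5) can be sketched in NumPy/SciPy as follows:

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel_dynamic(I):
    """Least-filtered lowest intensity (Eq. (4)) with the dynamic
    patch sizes of Eq. (3). I: H x W x 3 array with values in [0, 255]."""
    low = I.min(axis=2)                            # per-pixel min over R, G, B
    filtered = {p: minimum_filter(low, size=p) for p in (5, 10, 15)}
    return np.select([low <= 100, low <= 200],     # patch 5, then 10, else 15
                     [filtered[5], filtered[10]],
                     default=filtered[15])

def atmospheric_light(I, dark):
    """Eq. (5): per-channel peak intensity over the brightest 0.1%
    of the dark-channel pixels."""
    n = max(1, int(0.001 * dark.size))             # 0.1% * h * w pixels
    idx = np.argsort(dark.ravel())[-n:]            # brightest locations
    return I.reshape(-1, 3)[idx].max(axis=0)       # the 3 x 1 vector A
```

Pre-computing the three filtered maps and selecting per pixel keeps the dynamic-patch rule simple, at the cost of three filter passes.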

Fig. 2 Calculation of the lowest intensity of pixels. a An arbitrary frame I. b Calculated lowest of RGB values. c Least filter computed from b, i.e., the lowest intensity of J with a 15 × 15 patch size (Ω)

3.3 Estimation of the Frame Inversion and Transmission Map

The inversion of a frame is computed for each real-time frame using the atmospheric light $A^c$. Every pixel of the input frame is divided by the corresponding constant value in $A$ to compute the normalized RGB channels (Yawale and Kapse [15], He et al. [12]). Normalization of Eq. (1) for a hazy frame is done as follows:

$$\frac{I^c(x)}{A^c} = t(x)\,\frac{J^c(x)}{A^c} + 1 - t(x) \tag{6}$$

By inserting the minimum operator on each side of Eq. (6), the lowest intensity is calculated as

$$\min_{y \in \Omega(x)} \min_{c} \frac{I^c(y)}{A^c} = t(x) \min_{y \in \Omega(x)} \min_{c} \frac{J^c(y)}{A^c} + 1 - t(x) \tag{7}$$

The transmission is denoted here by $t(x)$, and the atmospheric light $A^c$ is a constant positive value. Since $J$ is a fog-free output frame, $J$'s lowest intensity $J^{\text{lowest}}$ is almost 0, meaning

$$J^{\text{lowest}}(x) = \min_{y \in \Omega(x)} \min_{c} J^c(y) = 0 \tag{8}$$

As the atmospheric light $A^c$ is always positive,

$$\min_{y \in \Omega(x)} \min_{c} \frac{J^c(y)}{A^c} = 0 \tag{9}$$

Putting (9) into (7), the transmission $t(x)$ is assessed by

$$t(x) = 1 - \min_{y \in \Omega(x)} \min_{c} \frac{I^c(y)}{A^c} \tag{10}$$

The frame is inverted in this transmission $t(x)$. Equation (10) can be applied to both sky and non-sky locations, even if the transmission is almost nil; the sky region does not need to be segmented separately (Fig. 3). There is no need to add any constant parameter to purposefully keep even a tiny amount of fog present, because fog remains dense in hilly locations. Figure 4b displays an inversion of the input hazy frame.

3.4 Fog-Free Scene Recovery

The fog-free scene brightness is restored in accordance with (1), using the computed inverted frame and the atmospheric light. Thus, even without an inversion, the linear attenuation $J(x)t(x)$ can be zero. As the fog is so dense, it is purposefully not retained here in

Fig. 3 Computation time on two different CPUs (CPU1: Intel(R) Core(TM) i5-8250U @ 1.60–1.80 GHz with 8 GB RAM; CPU2: Intel(R) Core(TM) i7-8550U @ 4.00 GHz with 12 GB RAM and 128 GB SSD)


Fig. 4 a Original frame. b Enhanced frame

any small amount; instead, it is removed as much as possible. In order to reconstruct the ideal fog-free scene radiance $J(x)$,

$$J(x) = \frac{I(x) - A}{t(x)} + A \tag{11}$$

As the brightness of the scene is not as bright as the atmospheric light $A$, the frame after fog removal appears dim. As a result, the exposure of $J(x)$ is increased as in He et al. [12].
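Combining (10) and (11), the transmission estimate and scene recovery can be sketched as follows (our illustration; a fixed patch size is used here for simplicity, whereas the paper selects it dynamically via (3)):

```python
import numpy as np
from scipy.ndimage import minimum_filter

def transmission(I, A, patch=15):
    """Eq. (10): t(x) = 1 - min_{y in Omega(x)} min_c I^c(y) / A^c."""
    norm = I / A.reshape(1, 1, 3)                  # normalize each channel by A^c
    return 1.0 - minimum_filter(norm.min(axis=2), size=patch)

def recover(I, A, t, t_min=1e-3):
    """Eq. (11): J(x) = (I(x) - A) / t(x) + A, clipped to the valid range."""
    t = np.maximum(t, t_min)[..., None]            # guard against t(x) ~ 0
    J = (I - A.reshape(1, 1, 3)) / t + A.reshape(1, 1, 3)
    return np.clip(J, 0, 255)
```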

3.5 Color-Based Independent Histogram Equalization

Some haziness caused by intense fog is still noticeable even after fog-free recovery. To make the frame even more practical and prominent, there is still room for improvement through contrast adjustment. To get the final prominent view, each frame is subjected to channel-wise independent histogram equalization. Independent histogram equalization, a type of image processing, redistributes pixels based on the value of the color channels to increase visual contrast. It was chosen because it is a quick procedure that, after clearing away heavy fog, makes noticeable contrast improvements. The histogram shows how each frame's tonal values are distributed across all pixels, and all of the RGB color channels are balanced. In the case of an 8-bit image, the accessible color levels are 0 to 255; in general, the potential color levels $i$ range from 0 to $L - 1$, where $i$ stands for a pixel's color intensity. The transformation is as follows, starting with (12):

$$s = T(i), \quad 0 \le i \le L - 1 \tag{12}$$

The cumulative distribution function of the intensity levels is

$$\operatorname{cdf}(i \le t) = \sum_{k=0}^{t} p_k \tag{13, 14}$$

where $p_k$ is the probability of intensity level $k$. The equalized level is then

$$s_k = T(i) = \operatorname{floor}\!\left( (L - 1) \sum_{k=0}^{i} p_k \right) \tag{15}$$

$s_k$ is entered into the equalized array. The final step is the reconstruction of a video from the processed frames, which involves putting the frames in chronological order while maintaining a constant pace. All frames are finished after histogram equalization, and to recreate a new video, the processed frames are arranged in the camera's original, chronological acquisition sequence. In the suggested approach, the video is freshly rebuilt at a fixed pace of 31 frames per second for display on the screen, so the driver can enjoy comfortable live streaming in real time. The newly rebuilt footage is shown as a live video broadcast with a resolution of 1920 × 1200. In the automobile, the monitor is positioned immediately above the dashboard and below the windscreen. As can be seen, the majority of the frames are defogged using the suggested method, and the experimental outcomes of this suggested strategy are displayed. Estimation errors occur when a road turns and when a tunnel is entered; however, these errors only last for a limited number of frames before the ambient light is estimated once more. It has been reported that the average visibility distance during severe fog increases by more than 92% after defogging. Additionally, when there is less fog, visibility is greatly increased.
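A minimal NumPy sketch (ours, not the authors' implementation) of the channel-wise equalization of Eqs. (12)–(15):

```python
import numpy as np

def equalize_channel(ch, L=256):
    """Map an 8-bit channel through s_k = floor((L - 1) * cdf(k)), Eq. (15)."""
    hist = np.bincount(ch.ravel(), minlength=L)
    p = hist / ch.size                         # p_k: probability of level k
    cdf = np.cumsum(p)                         # Eqs. (13)-(14)
    T = np.floor((L - 1) * cdf).astype(np.uint8)
    return T[ch]                               # apply the transformation T(i)

def equalize_rgb(frame):
    """Independent histogram equalization of each RGB channel (Sect. 3.5)."""
    return np.stack([equalize_channel(frame[..., c]) for c in range(3)],
                    axis=-1)
```

Treating each channel independently, rather than equalizing a joint luminance, is what the section's "color-based independent" equalization refers to.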

Table 1 Comparison of computation time (in milliseconds) with popular state-of-the-art methods for various frame sizes

Method                           1024 × 786   600 × 450   441 × 450
DPC (He et al. [12])             36,896       12,228      9866
CAP (Zhu et al. [13])            4278         2219        1420
FAMED-Net (Zhang et al. [16])    1800         889         508
IDGCP (Ju et al. [17])           1106         500         341
CCR (Wang et al. [18])           2563         850         368
DPCMR (Colores et al. [19])      125.98       48.35       21.36
SSIM (Li et al. [20])            4563         2865        1023
CCEMDCP (Liu et al. [21])        550          318         150
Histogram scattering model       94.82        35.54       18.83
Proposed method                  60.10        20.6        9.7

4 Results and Discussion

4.1 Run-time Examination

The most important component while driving is timing. A major accident is likely if a motorist cannot see the live road view immediately and without delay, so a single frame's overall processing time should not be excessive: between the live captured input frame and the output video display, there should be a negligible time difference. For each frame in the proposed method, the total computation time for the whole operation is only a few milliseconds, so a motorist sees the processed footage as authentic real-time live video. In the suggested method, the lowest intensity of pixels, which is needed to estimate the atmospheric light $A$, is computed only for the first frame; it is refreshed every 6000 frames, which reduces the amount of work required. Following that, frame inversion and channel-wise independent histogram equalization (for the final contrast adjustment) are performed for each frame. The total computation time of the proposed technique is shown in Fig. 3 for varying CPU speeds. Table 1 shows that the computation times per frame using the suggested technique are much shorter than those of other well-known current methods.

4.2 Qualitative Contrast with Current Approaches

The quality of the images is compared with widely used existing methods using several densely foggy frames of mountainous routes. Figure 4a displays a captured thick-fog input frame, and Fig. 4b displays the outcome

of the suggested strategy. The majority of the fog is cleared, as seen in Fig. 4b, but the frame darkens due to an unbalanced contrast; comparatively with Xu's study, the recommended approach displays the defogged output in Fig. 4b. Three trustworthy assessment methodologies, covering contrast distortion, are utilized to evaluate the quantitative performance of our suggested strategy against cutting-edge approaches. The associated MSE is given in (17), where $x$, $w$, and $h$ denote the image's pixel positions, width, and height, respectively. The higher the PSNR value, the better the approximated image. The SSIM index, which is used to measure the similarity between two restored images, takes three factors into account: lighting $l(x)$, contrast $c(x)$, and structure $s(x)$. The decimal value of the SSIM index falls between $-1$ and $1$; only when comparing two identical photos with equal pieces of data does SSIM $= 1$. NIQMC determines an image's quality based on its local details and global histogram as in (19), where $W_{HC}$ is a constant weight used to regulate the respective significance of the local and global techniques, and the local and global quality measurements are denoted by $Q_L$ and $Q_G$, respectively. The restored images here are quite comparable, as seen by the high PSNR and SSIM values; similarly, NIQMC prefers such images.

$$\text{PSNR} = 10 \log_{10} \frac{\text{MAX}^2_{I_{HF}}}{\text{MSE}} \tag{16}$$

$$\text{MSE} = \frac{1}{w \times h} \sum_{x=1}^{w} \sum_{y=1}^{h} \left( J(x) - I_{HF}(x) \right)^2 \tag{17}$$

$$\text{SSIM}(x) = f(l(x), c(x), s(x)) \tag{18}$$

$$\text{NIQMC} = \frac{Q_L + W_{HC}\, Q_G}{1 + W_{HC}} \tag{19}$$

Therefore, higher NIQMC values imply stronger visual contrast. The greatest, second-best, and third-best performances are denoted by the colors red, green, and blue, respectively. It is observed that one of the compared methods performs worse than other approaches across all assessment procedures; the reason for its poor performance is that it struggles to work well when the hazy input photos have a large number of dark patches. The proposed method beats most existing strategies in terms of quantitative performance.

The whole processing time is only a few milliseconds, as can be seen in Fig. 3. As a result, there will not be much of a delay between the camera capturing a real-time frame and the monitor showing the processed frame.

Several cutting-edge techniques for single-image fog removal are taken into consideration for comparison, and each method's overall computing time is assessed. For various frame resolutions (1024 × 786, 600 × 450, and 441 × 450), the proposed method is compared against the most recent state-of-the-art methods; Table 1 shows the comparison of computation times.

Additionally, the cloud and sky regions of the recovered photographs look genuine, and the texture details of the targets have been amplified. It has also been noted that some approaches perform less well for sky areas. Particularly, Wu et al. [22] performed worse than the majority of more recent approaches, as evidenced by the PSNR value: the method's greater patch size proved useless when used for images where the ambient air light was uneven, and it was discovered that this method performs less well when the picture is affected by severe haze. It is observed that certain current approaches produce superior results for a small number of frames; in contrast, the values for the remaining frames are similar to those of the suggested approach. The quantitative results demonstrate, however, that our proposed approach beats previously known frame-defogging restoration techniques (highest mean value).

Several real-time tests are performed in actual driving encounters and responses, and the drivers benefit from a pleasant driving experience. Drivers turn to the display screen only when they perceive that the front view is completely obscured by severe fog and that there is little to no visibility left. The suggested system lengthens the visibility distance; as a result, through the display screen, drivers may see obstacles on the road (such as potholes, speed bumps, or pedestrians) that are far away. Even in extremely deep fog, drivers report perceiving no fog.

5 Conclusion

This paper describes a quick, efficient defogging method to clear severe fog from the driver's field of view while driving. By employing the suggested method, a motorist may navigate any heavily foggy route (such as a road in mountainous terrain) while maintaining a clear field of view. This method can deliver a crystal-clear, fog-free result in real time with maximum visibility in the shortest calculation time. Compared to the current approaches, the dynamic patch size for predicting transmission maps reduces the issue of dark and sky regions. Both low and dense fog may be effectively eliminated using this method. A variety of real-time scenarios are evaluated by driving in deep fog. Any vehicle can apply the suggested method when traveling in heavily foggy conditions, and any motorist may safely travel through dense fog, such as on a steep foggy road. The suggested strategy allows for a safe voyage for passengers, and pedestrians can cross the road safely if the suggested method is adopted. There will be fewer traffic collisions, fatalities, injuries, and fog-caused delays in reaching the destination. The suggested strategy can be improved in the future by streamlining the defogging procedure. One or more dynamic strategies can solve the issue of varying sunshine. The vision distance may be increased even further, enabling drivers to operate any vehicle or railway safely in deep fog and assisting fighter jets with takeoff and landing maneuvers.
210 K. Janaki et al.

References

1. Organization WH (2018) Violence and injury prevention and World Health Organization: global
status report on road safety 2018: Supporting a decade of action. Global Status Report on Road
Safety 2018: Supporting a Decade of Action, Geneve
2. Transport Research Wing M R T H: Government of India (2017) Road accidents in India 2017.
New Delhi
3. Lin Z, Wang X (2012) Dehazing for image and video using guided filter. Open J Appl Sci
2(4B):123–127
4. Fattal R (2008) Single image dehazing. In: Proceeding of the ACM SIGGRAPH 08, Los
Angeles, California
5. He L, Zhao J, Zheng N, Bi D (2017) Haze removal using the difference-structure-preservation
prior. IEEE Trans Image Process 26(3):1063–1075
6. Hu HM, Guo Q, Zheng J, Wang H, Li B (2019) Single image defogging based on illumination decomposition for visual maritime surveillance. IEEE Trans Image Process 28(6):2882–2897
7. Badhe MV, Ramteke PL (2016) A survey on haze removal using image visibility restoration
technique. Int J Comput Sci Mobile Comput 5(2):96–101
8. Bai L, Wu Y, Xie J, Wen P (2015) Real time image haze removal on multi-core DSP. In:
Asia-Pacific international symposium on aerospace technology, China
9. Tian Y, Xiao C, Chen X, Yang D, Chen Z (2016) Haze removal of single remote sensing image
by combining dark channel prior with superpixel. In: International symposium on electronic
imaging 2016: visual information processing and communication VII, California, USA
10. Toka V, Sankaramurthy NH, Kini RPM, Avanigadda PK, Kar S (2016) A fast method of fog
and haze removal. In: International conference on acoustics, speech, and signal processing,
Lujiazui, Shanghai, China
11. Maa N, Xu J, Li H (2018) A fast video haze removal algorithm via dark channel prior. In: 8th
international congress of information and communication technology, Xiamen, China
12. He K, Sun J, Tang X (2011) Single image haze removal using dark channel prior. IEEE Trans
Pattern Anal Mach Intell 33(12):2341–2353
13. Zhu Q, Mai J, Shao L (2015) A fast single image haze removal algorithm using color attenuation
prior. IEEE Trans Image Process 24(11):3522–3533
14. Tufail Z, Khurshid K, Salman A, Nizami IF, Khurshid K, Jeon B (2018) Improved dark channel
prior for image defogging using RGB and YCbCr color space. IEEE Access 6:32576–32587
15. Yawale RP, Kapse AS (2016) Digital image defogging using dark channel prior and histogram
stretching method. Int J Adv Res Comput Commun Eng 5(4):889–894
16. Zhang J, Tao D (2020) FAMED-Net: a fast and accurate multi-scale end-to-end dehazing
network. IEEE Trans Image Process 29:72–84
17. Ju M, Ding C, Guo YJ, Zhang D (2019) IDGCP: image dehazing based on gamma correction
prior. IEEE Trans Image Process 29:3104–3118
18. Wang W, Li Z, Wu S, Zeng L (2020) Hazy image decolorization with color contrast restoration.
IEEE Trans Image Process 29:1776–1787
19. Colores SS, Yepez EC, Arreguin JMR, Botella G, Carrillo LML, Ledesma S (2019) A fast
image dehazing algorithm using morphological reconstruction. IEEE Trans Image Process
28(5):2357–2366
20. Li L et al (2020) Semi-supervised image dehazing. IEEE Trans Image Process 29:2766–2779

21. Liu P, Horng S, Lin J, Li T (2019) Contrast in haze removal: configurable contrast enhancement
model based on dark channel prior. IEEE Trans Image Process 28(5):2212–2227
22. Wu Q, Ren W, Cao X (2020) Learning interleaved cascade of shrinkage fields for joint image
dehazing and denoising. IEEE Trans Image Process 29:1788–1801
Chapter 18
Deep Learning-Based Approach
for Outlier Detection in Wireless Sensor
Network

Biswaranjan Sarangi and Biswajit Tripathy

1 Introduction

Outliers are considered a significant deviation from the usual pattern of sensed data due to faults in sensors. Faults in a WSN may occur unexpectedly due to many constraints such as low-power transmitters, limited energy resources, and environmental impact. As outlier data are unreliable and inaccurate, they may lead to life-threatening events, since WSNs are heavily used in safety-critical applications. The primary goal of outlier identification in WSNs is to locate outliers in distributed streaming data online with high detection accuracy while limiting the network's resource consumption [1].

To our knowledge, the majority of the existing outlier identification techniques are inapplicable in real-time applications. Following the successful identification of outliers in real-time data, it is possible to stop the entry of the outlier data into the network, avoiding the relay nodes' unnecessary involvement in the transmission of the outlier data to the sink node.

In this paper, we suggest an unsupervised learning technique based on a generative adversarial network (GAN). The architecture is suggested here by using robust continuous clustering, where the cluster heads use the proposed detection algorithm to detect outliers locally.

B. Sarangi (B)
Biju Patnaik University of Technology, Rourkela, Odisha 769015, India
e-mail: biswaranjan.sarangi@gmail.com
B. Tripathy
GITA Autonomous College, Bhubaneswar, Odisha 752054, India


2 Related Work

Zhang et al. [2] and Ayadi et al. [3] give comprehensive literature reviews on outlier detection methods in WSNs. The criteria used by the authors of [2] to categorize outlier identification approaches include input sensor data, outlier type (local and global), outlier identity, outlier degree, and availability of pre-defined data. They have divided outlier identification methods into approaches based on nearest neighbors, statistics, classification, and spectral decomposition.

In order to solve the problem of outlier detection, statisticians employed statistical approaches as the first algorithms in the early nineteenth century [4]. Statistical methods can be divided into parametric and non-parametric categories. A time-series analysis and geostatistics method that locates outliers and distinguishes between errors and events in a distributed and online mode has been proposed in [4]; in order to define normal behavior, this method makes use of the spatiotemporal correlations in WSN data. Strategies based on parametric techniques are not appropriate in real-world settings because there is no prior knowledge of the data distribution.
A parameter-free outlier detection algorithm is suggested in [5] for calculating the ordered outlier distance difference factor; the difference in the ordered distances is taken into account when calculating the outlier score for each data point. The data nearest for outlier detection (DNOD) method is recommended in [6] for unsupervised outlier detection; this approach seeks to find outlier measurements by analyzing the learning data that sensors have gathered. Non-parametric methods have a significant computing cost for handling multivariate data, making them unsuitable for real-time applications.
To find outliers in sensor nodes, Rajasegarar et al. [7] suggest a global outlier identification technique based on clustering. Each node clusters the measured data and reports the cluster summaries rather than sending the measured data to its parent; the parent then sends the sink the merged cluster summaries compiled from its entire offspring. An abnormal cluster can be discovered if the average intercluster distance of a cluster in the sink node exceeds a defined threshold on the intercluster distances. In WSN applications, the choice of cluster width is crucial. Computing distance measurements over all data patterns is computationally demanding and inappropriate for sensors with minimal resources.
Systematic classification approaches are crucial in the field of machine learning [2]. They develop a classification model using a collection of data instances (training) and classify an ambiguous occurrence into one of the learnt classes (testing). Unsupervised categorization does not require any prior knowledge of labeled training data; the classification model, which fits the majority of the data instances, is learned during training. Depending on the type of classification model being used, the outlier identification techniques for WSNs are based on Bayesian networks, support vector machines (SVMs), or deep learning. Although this resolves the multivariate data issue, the model must be trained on newly arrived normal datasets.

Using SVM, Rajasegarar et al. [8] suggest an approach for outlier detection in sensor data. This method makes use of a one-class quarter-sphere SVM to reduce the computational complexity and locally locate the outliers of each node; an anomaly in the sensor data is known to exist outside of the quarter-sphere. In [9], the authors suggested two distributed and online outlier detection algorithms based on a one-class hyper-ellipsoidal SVM, considering the correlation between the sensor data attributes.
For the purpose of detecting outliers and events in WSNs, a thorough analysis
of several one-class SVMs, including the hyper-plane, hyper-sphere, quarter-sphere,
and hyper-ellipsoid, is provided in [10].
In [11], a method for detecting outliers called the support vector data description
based on spatiotemporal and attribute correlations (STASVDD) is proposed. This
method assumes that once the collected data vectors are independently and uniformly
distributed in WSNs, outliers can independently occur in every attribute.
In [12], the autoencoder neural networks are used to solve the outlier detection
problem in WSN. The authors have developed a two-part algorithm, which resides
respectively on sensor nodes and the cloud. The anomalies are detected in a distributed
manner at sensor nodes without having to communicate with any other sensor nodes
or the cloud. Time-series-based recurrent autoencoder ensembles are proposed to detect outliers in [13]. The two proposed solutions exploit sparsely connected recurrent neural networks (S-RNNs), which enable the design of multiple autoencoders with different neural network connection structures.

3 Proposed Approach

When clustering is expressed as the optimization of a continuous objective based on robust estimation, it is called Robust Continuous Clustering (RCC) [14]. Even though the number of clusters is unknown, it is non-parametric and achieves good clustering accuracy.

Consider the problem of clustering a set of $n$ data points, where the input is given by $X = [x_1, x_2, \ldots, x_n]$ with $x_i \in \mathbb{R}^D$, and the method operates on a set of representatives $U = [u_1, u_2, \ldots, u_n]$ with $u_i \in \mathbb{R}^D$. Each data point $x_i$ has a corresponding representative $u_i$. The optimization over $U$ reveals the cluster structure latent in $X$; hence, it is not necessary to know the number of clusters in advance. RCC first creates a reliable connection structure $E_u$ based on mutual k-nearest-neighbor connectivity, where $E$ is the collection of graph edges that connect the data points. The graph is constructed automatically from the data.

The RCC objective formula is

$$C(U) = \frac{1}{2} \sum_{i=1}^{n} \|x_i - u_i\|_2^2 + \frac{\lambda}{2} \sum_{(x_p, x_q) \in E_u} w_{p,q}\, \rho\!\left( \|u_p - u_q\|_2 \right) \tag{1}$$
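For illustration (ours, not the authors' code), the mutual k-nearest-neighbor graph $E_u$ and the objective (1), here with the scaled Geman–McClure penalty used in [14], can be sketched as:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mutual_knn_edges(X, k=10):
    """Edges (p, q) such that p and q are in each other's k nearest neighbors."""
    idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)[1]
    neigh = [set(row[1:]) for row in idx]          # drop the point itself
    return [(p, q) for p in range(len(X)) for q in neigh[p]
            if p < q and p in neigh[q]]

def rcc_objective(X, U, edges, w, lam, mu=1.0):
    """Eq. (1) with rho(d) = mu * d^2 / (mu + d^2) (Geman-McClure);
    w is an array of weights aligned with `edges` (an assumption here)."""
    data_term = 0.5 * np.sum((X - U) ** 2)
    d = np.array([np.linalg.norm(U[p] - U[q]) for p, q in edges])
    pair_term = 0.5 * lam * np.sum(w * mu * d ** 2 / (mu + d ** 2))
    return data_term + pair_term
```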

Fig. 1 GAN framework [16]

Here, the weights $w_{p,q}$ balance the contribution of each data point to the pairwise terms, and $\lambda$ balances the strength of the data terms and the pairwise terms, while an appropriate robust penalty function $\rho(\cdot)$ is important for the regularization terms. A graph $G_u$ is constructed on the optimized value of $U$, in which a pair $x_p$ and $x_q$ is connected if $\|u_p - u_q\|_2 < \delta$. The outputs, the $k_u$ and $k_a$ subsets, are created from the unlabeled data and the discovered anomalies. Compared to subsets separated by similar outputs, the subsets are partitioned in a way that faithfully captures the latent cluster structure of the complex data.
A GAN, as suggested by Goodfellow et al. [15], is a method for estimating generative models through an adversarial mechanism in which two models compete: a discriminator (D) distinguishes between real and generated data, while a generator (G) creates data to fool the discriminator, as shown in Fig. 1.

As suggested in [15], D and G play a two-player minimax game with respect to the joint value function $V(G, D)$, given by

$$V(D, G) = \mathbb{E}_{x \sim P_{\text{data}}(x)}\left[ \log D(x) \right] + \mathbb{E}_{z \sim P_z(z)}\left[ \log(1 - D(G(z))) \right]. \tag{2}$$

For generated samples $G_{auto}(z_i)$, where $z$ follows a latent-space distribution, the generator $G$ implicitly determines the probability distribution. The discriminator is then trained so that the average negative cross-entropy between its predictions and the sequence labels is as low as possible. Thus, the discriminator loss is given by

$$D_{loss} = \frac{1}{M} \sum_{i=1}^{M} \left[ \log D_{auto}(x_i) + \log\left(1 - D_{auto}(G_{auto}(z_i))\right) \right]. \tag{3}$$

The discriminator is optimized so that it recognizes $x_i$ as real and $G_{auto}(z_i)$ as fake. The generator, in turn, is trained to confuse the discriminator so that the discriminator recognizes as many of the generated samples as real as possible. The generator loss is given by

$$G_{loss} = \frac{1}{M} \sum_{i=1}^{M} \log\left(1 - D_{auto}(G_{auto}(z_i))\right). \tag{4}$$
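Under these definitions, one adversarial update can be sketched in PyTorch as follows (our illustration; the networks, optimizers, and latent dimension are placeholders, and, following standard GAN practice, the discriminator maximizes (3) by minimizing its negative):

```python
import torch

def gan_step(D, G, x_real, opt_d, opt_g, z_dim=32):
    """One training step for Eqs. (3)-(4); D outputs probabilities in (0, 1)."""
    M = x_real.size(0)
    z = torch.randn(M, z_dim)

    # Discriminator: maximize Eq. (3), i.e., minimize its negative.
    d_loss = -(torch.log(D(x_real)).mean()
               + torch.log(1 - D(G(z).detach())).mean())
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: minimize Eq. (4), pushing D(G(z)) towards 1.
    g_loss = torch.log(1 - D(G(z))).mean()
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```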

At the end of module training, the detection threshold is evaluated using precision and recall. The trained module is then deployed to all the cluster heads, and the updated $W$, $b$, and threshold are scheduled to be sent periodically from the sink or cloud to the cluster heads.

Clusters closer to the base station have a smaller cluster size, which reduces the energy spent on data processing in the cluster. As shown in Fig. 2, the cluster size increases with the distance from the sink node. Each cluster head runs a copy of the GAN. All sensor readings are taken from individual cluster heads in the cloud. For each cluster head in the network, the sink node or the cloud keeps one copy of the GAN, i.e., $n$ copies of the GAN, assuming that there are $n$ cluster heads in the network. Each copy of the GAN represents a cluster and is periodically trained in the cloud using the sensor data received from the respective cluster head.

Fig. 2 Overview of clusters and spanning tree [16]



4 Experimental Results

In order to evaluate the effectiveness of the suggested method, experiments are carried out on synthetic data using the Python library Pymote 2.0. In this experiment, both the discriminator and the generator are trained, and the threshold is obtained experimentally. From the synthetic dataset, 80% of the data is used for training and 20% for testing. The following metrics are considered for performance evaluation:

$$\text{Accuracy rate} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{5}$$

$$\text{Precision } (P) = \frac{TP}{TP + FP}, \tag{6}$$

$$\text{True Positive Rate (Recall) } TPR = \frac{TP}{TP + FN}, \tag{7}$$

$$\text{False Positive Rate } FPR = \frac{FP}{FP + TN}, \tag{8}$$

$$F1 = \frac{2\,(\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}}. \tag{9}$$
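These metrics follow directly from the confusion-matrix counts, as the small sketch below (ours) shows:

```python
def evaluation_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall (TPR), FPR, and F1 per Eqs. (5)-(9)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # true positive rate
    fpr = fp / (fp + tn)           # false positive rate
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, fpr, f1
```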

Different precision and recall values for different threshold values are shown in Fig. 3. The threshold at which the precision curve intersects the recall curve is called the optimum threshold, and its value is found to be 0.9. Figure 4 shows the reconstruction error for various test data points; the outliers are the data points above the threshold line.

Fig. 3 Precision and recall for different threshold values



Fig. 4 Outlier detection using threshold

Fig. 5 Illustration of dataset

Our model displays a decision boundary surrounding the normal data, identifying partially identified group outliers and all discrete outliers from the synthetic dataset, as shown in Fig. 5. A confusion matrix is a frequently used table to assess a classification model's performance on a test dataset where the true values are known; the confusion matrix for the suggested strategy is shown in Fig. 6. Table 1 compares the suggested method's performance with that of state-of-the-art solutions.

Fig. 6 Confusion matrix

Table 1 Comparison with state-of-the-art solutions

Method             AR      TPR     FPR     P       F1
N-STASVDDc [11]    90.65   92.78   29.74   96.76   94.72
DADA [7]           86.94   89.45   37.11   95.85   92.54
Proposed           93.11   94.81   28.42   96.62   95.7

5 Conclusion

The main goal of the outlier detection method is to spot misbehaving nodes and prevent the outlier data that these nodes report from entering the network. In this research, we develop an online outlier identification approach based on a GAN integrated with robust continuous clustering. An optimal threshold for outlier detection is experimentally determined. The performance in regard to accuracy, TPR, FPR, precision, and F1 is compared with state-of-the-art techniques. Our model shows an accuracy of 93.11% and an F1 score of 95.7%, with a low FPR of 28.42%.

References

1. Sarangi B, Mahapatro A, Tripathy B (2021) Outlier detection using convolutional neural


network for wireless sensor network. Int J Bus Data Commun Netw (IJBDCN) 17(2):91–106.
https://doi.org/10.4018/IJBDCN.286705
2. Zhang Y, Meratnia N, Havinga P (2010) Outlier detection techniques for wireless sensor
networks: a survey. In: IEEE Communications Surveys & Tutorials, vol. 12, no. 2. Second
Quarter, pp 159–170
3. Ayadi A, Ghorbel O, Obeid AFM, Abid M (2017) Outlier detection approaches for wireless
sensor networks: a survey. Comput Netw 129(1):319–333
4. Zhang Y, Hamm NAS, Meratnia N, Stein A, van de Voort M, Havinga PJM (2012) Statistics-based outlier detection for wireless sensor networks. Int J Geogr Inf Sci 1373–1392
5. Buthong N, Luangsodsai A, Sinapiromsaran K (2013) Outlier detection score based on ordered distance difference. In: International computer science and engineering conference (ICSEC), pp 157–162
6. Abid A, Kachouri A, Mahfoudhi A (2016) Anomaly detection through outlier and neighborhood
data in wireless sensor networks. In: Advanced technologies for signal and image processing
(ATSIP), 2nd international conference, pp 26–30
7. Rajasegarar S, Leckie C, Palaniswami M, Bezdek JC (2006) Distributed anomaly detection in
wireless sensor networks. Proc IEEE ICCS
8. Rajasegarar S, Leckie C, Palaniswami M, Bezdek JC (2007) Quarter sphere based distributed
anomaly detection in wireless sensor networks. In: Proceeding of the IEEE international
conference on communications, pp 3864–3869
9. Zhang Y, Meratnia N, Havinga PJM (2013) Distributed online outlier detection in wireless
sensor networks using ellipsoidal support vector machine. Ad Hoc Netw 11(3):1062–1074
10. Shahid N, Naqvi IH, Qaisar SB (2015) One-class support vector machines: analysis of outlier detection for wireless sensor networks in harsh environments. Artif Intell Rev 43:515–563
11. Chen Y, Li S (2019) A lightweight anomaly detection method based on SVDD for wireless
sensor networks. Wireless Pers Commun 105:1235–1256
12. Luo T, Nagarajan SG (2018) Distributed anomaly detection using autoencoder neural networks
in WSN for IoT. In: IEEE International conference on communications (ICC). Kansas City,
MO, pp 1–6
13. Kieu T et al (2019) Outlier detection for time series with recurrent autoencoder ensembles.
In: Proceeding of the 28th international joint conference artificial intelligence (IJCAI), pp
2725–2732
14. Shah SA, Koltun V (2017) Robust continuous clustering. In: Proceedings of the national
academy of sciences, vol. 114, no. 37, pp 9814–9819
15. Goodfellow I et al (2014) Generative adversarial nets. In: Proceeding of the advance neural
information processing systems, pp 2672–2680
16. Sarangi B, Tripathy B (2023) Outlier detection technique for wireless sensor network using
GAN with Autoencoder to increase the network lifetime. Int J Comput Netw Inf Secur (IJCNIS)
15(1):26–38. https://doi.org/10.5815/ijcnis.2023.01.03
Chapter 19
Predicting Kidney Tumor Using
Convolutional Neural Network (CNN)

Kajal Rai and Pawan Kumar

1 Introduction

According to the survey, one in six deaths worldwide is caused by cancer, which is the second most prominent cause of mortality [1]. Renal cell carcinoma (RCC), which accounts for almost 90% of all cases of kidney cancer, is by far the most prevalent category of kidney cancer [2]. Cancer prediction places a greater emphasis on predisposition, reappearance, and diagnosis of cancer. The main aim of cancer identification is to classify tumor categories and associated indicators that help build a classifier to recognize a particular advanced cancer kind or discover cancer at its initial phase.
"Deep learning" (DL), a series of multilayer neural network models, is a branch of machine learning, which is itself a subset of artificial intelligence. It excels at the challenge of learning from large amounts of data, so-called "big data" [3]. Similar to various machine learning approaches, deep learning has two stages: a training stage, in which network parameters are estimated using a specified training dataset, and a testing stage, in which the trained network is used to forecast the results of new input data. The gathering of entire transcriptomic data of tumor specimens made possible the development of DL models with enhanced precision and creative interoperability for cancer category forecasting.

CNNs have recently become the de facto standard for segmenting kidney tumors due to their par-excellence performance when compared to other models in conventional computer vision and medical image evaluation. CNN models can be trained to generate 3D feature hierarchies using internal data.

K. Rai (B)
G.L. Bajaj Institute of Technology and Management, Greater Noida, India
e-mail: kajal.rai@glbitm.ac
P. Kumar
School of Computer Applications, Lovely Professional University, Punjab, India
e-mail: pawan.11522@lpu.co.in


Fig. 1 Illustrative diagram of convolutional neural network (CNN) structure [4]

The first layer of a convolutional neural network is the convolutional layer. A CNN
can be extended by adding extra layers, but the final layer is fully connected. With
each subsequent layer, the CNN grows more complex, recognizing larger portions of
the image, as is evident in Fig. 1. By computing the dot product between the weights
of the neurons linked to local areas of the input and the corresponding input regions,
the convolutional layer produces the output of the neurons linked to those local parts
of the input. A set of convolutional layers is typically used in conjunction with a
pooling layer, whose goal is to retain important features while reducing the size of
the working representation. Based on the features gathered by the prior layers and
the multiple filters within them, the final layer accomplishes the categorization task:
the fully connected layers classify the features that were extracted by the preliminary
convolutional layer and the supplementary pooling layers [5, 6].
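
As a concrete illustration of this conv-pool-dense pattern, the following is a minimal Keras sketch; the 128 x 128 grayscale input and four output classes are illustrative assumptions, not the exact configuration used in this chapter:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 128, 1)),               # assumed grayscale input size
    layers.Conv2D(32, (3, 3), activation="relu"),    # dot products over local regions
    layers.MaxPooling2D((2, 2)),                     # keep salient features, shrink size
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(4, activation="softmax"),           # final, fully connected layer
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])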

2 Related Work

In recent times, deep learning models built on CNNs have shown promising results
on a number of medical image analysis tasks. CNNs contain many layers and have
been under development by Fukushima since the end of the 1970s [7]; in 1995, they
were also utilized to examine medical images. The segmentation of computed
tomography (CT) images was done by the authors in [8] using a 2D CNN. Researchers
have employed a variety of pattern analysis methods, including ResNet50,
ResNet50V2, modified CNN, InceptionV3, 3D U-Net, V-Net, ReLU, and GoogleNet,
in their work. In the study by Myronenko et al. [9], the authors introduced an
end-to-end edge-aware pipeline using a well-known CNN for accurate semantic
segmentation of kidney tumors using arterial-phase abdominal 3D CT images.

Fig. 2 Methodology used for research

In early prediction of chronic kidney disease using predictive analytics and machine
learning, the capabilities of several machine learning approaches for timely
identification of chronic kidney disease were examined. This problem has been
considered broadly: the association among the given data factors and the selected
final target class feature has been examined, and machine learning approaches have
generated very fruitful outcomes for timely assessment [10].

To classify glomeruli and glomerular segmentation on whole-slide images of frozen
sections, two deep learning models were fine-tuned using a previously created CNN.
The normalized confusion matrix for the patch-based model has an average success
rate of 0.865, with a mean of 0.879 across these models. According to reports, this
work is essential for the timely assessment of donor kidneys before transplantation.
The findings of this study led to the consensus that it plays a significant role in
transplant assessment in clinical scenarios [11].

3 Research Methodology

The research methodology used in this paper consists of several phases, which are
depicted in Fig. 2.

3.1 Data Collection

We gathered kidney data from the Picture Archiving and Communication System
(PACS) of different hospitals in Bangladesh. Table 1 shows the dataset used, with
the number of instances of each type.

Table 1 Dataset used


Type No. of instances
Cyst 3709
Normal 5077
Stone 1377
Tumor 2283
Total 12,446

3.2 Preprocessing

Data preprocessing is a technique used to transform raw data into a desirable format
that can be used for model construction. Images were cropped to remove unnecessary
portions, and the patients' information was removed from the images. The images
were then converted into JPEG format. After the conversion, each image finding was
again confirmed by a radiologist and a medical technologist to reconfirm the
correctness of the data. This research work also includes preprocessing tasks such as
attribute selection, cleaning missing values, and splitting the dataset into training
and testing sets. Attributes such as the serial number are removed, as they do not
contribute to classification.
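
A hypothetical sketch of the image side of this step is shown below; the directory names and the fixed 10% border crop are placeholder assumptions, since the chapter does not specify the exact crop regions:

from pathlib import Path
from PIL import Image

SRC, DST = Path("raw_scans"), Path("preprocessed")
DST.mkdir(exist_ok=True)
for path in SRC.glob("*.png"):
    img = Image.open(path).convert("RGB")
    w, h = img.size
    # crop away a 10% border assumed to hold no anatomy or patient text
    img = img.crop((w // 10, h // 10, 9 * w // 10, 9 * h // 10))
    img.save(DST / (path.stem + ".jpg"), "JPEG")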

3.3 Model Generation

In this research, a convolutional neural network (CNN) model is presented that
categorizes tumor and non-tumor instances into their appropriate categories based
on unstructured gene expression.

3.4 Classification

Classification is done to predict which images contain cancer and to which category
they belong (Cyst, Stone, etc.). Accuracy is one of the significant methods for
evaluating classification models. Accuracy is the fraction of predictions that the
generated model got correct; it is equal to the ratio of correct forecasts to the total
number of forecasts, as given in Eq. (1).

Accuracy = Number of Correct Predictions / Total Number of Predictions    (1)

Accuracy can also be measured in terms of positives and negatives for binary
classification as follows:

Accuracy = (True Positive + True Negative) / (True Positive + True Negative + False Positive + False Negative)    (2)

3.5 Result Analysis

The results obtained from CNN are analyzed and summarized based on accuracy.

4 Experimentation

The Python language, a widely used machine learning language for building models
and making predictions, is used for the experimentation. For the experiments, the
dataset is downloaded from Kaggle [12]. All the data are in image format (JPEG).
Various Python libraries, such as Seaborn and Keras, are used for the training and
testing of the CNN models.

First, the dataset is uploaded. Figure 3 shows a glimpse of the image dataset.
Figure 4 displays the total number of instances in the four different classes. We then
split the dataset randomly into training, testing, and validation sets. The training
dataset comprised 11,200 images, with 621 images for testing and 1249 images for
validation of the results. A CNN 2D sequential model was used for the experiments.
Figure 5 shows the model generation details.
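
As a rough sketch of this split-and-train step, the following uses the Keras utility for loading labelled image folders; the directory name, image size, and the single validation split (the chapter additionally holds out a separate test set) are assumptions for illustration:

import tensorflow as tf

# Load JPEGs from class-named subfolders; "ct_kidney_dataset" is a placeholder path.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "ct_kidney_dataset", validation_split=0.1, subset="training",
    seed=42, color_mode="grayscale", image_size=(128, 128), batch_size=32)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "ct_kidney_dataset", validation_split=0.1, subset="validation",
    seed=42, color_mode="grayscale", image_size=(128, 128), batch_size=32)

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255, input_shape=(128, 128, 1)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4, activation="softmax"),  # the four classes of Table 1
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
history = model.fit(train_ds, validation_data=val_ds, epochs=5)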

Fig. 3 Images of kidney tumor dataset



Fig. 4 Distribution of number of instances in each class

Table 2 Report on classification of trained data


Precision Recall F1-score Support
Cyst 1 1 1 372
Normal 1 1 1 509
Stone 1 1 1 139
Tumor 1 1 1 229
Accuracy 1 1249
Macro avg 1 1 1 1249
Weighted avg 1 1 1 1249

After model generation, training, testing, and validation of the model were carried
out, and the results are based on parameters such as precision, recall, accuracy,
and loss. Figures 6 and 7 present the graphs of the training and validation results
with different numbers of epochs. It can be clearly seen from both figures that as
the number of training epochs increases, the accuracy of the model also increases.

We also performed kidney tumor prediction on the test dataset, obtaining on
average a 99% result on the given dataset, with an 80:20 split between the training
and test data. Figure 8 shows the confusion matrix as a heat map on the trained
data, and Fig. 9 shows the confusion matrix on the test data.

Tables 2 and 3 show the classification reports of the predicted results on the
trained data and test data, respectively.

Fig. 5 Model generation using CNN



Fig. 6 Model validation outcomes with three epochs

Fig. 7 Model validation outcomes with five epochs

Fig. 8 Confusion matrix on trained data



Fig. 9 Confusion matrix on test data

Table 3 Report on classification of test data


Precision Recall F1-score Support
Cyst 1 1 1 186
Normal 1 1 1 255
Stone 0.9940 0.9835 0.9885 66
Tumor 0.9940 0.9835 0.9885 114
Accuracy 0.992 621
Macro avg 0.9970 0.9917 0.9942 621
Weighted avg 0.9982 0.9975 0.9966 621

5 Conclusion and Future Scope

Prompt and accurate identification is crucial for the timely diagnosis of cancer,
given its excessive death rate. In particular, some types of kidney cancer may not
exhibit symptoms until the very end and may remain localized in the kidneys without
spreading to other body organs. Therefore, it is tremendously important to increase
prediction accuracy by using updated and advanced techniques when treating cancer.
Numerous studies have been conducted recently on various cancer types, especially
using machine learning and deep learning approaches. In this paper, a CNN model
is developed that categorizes tumor and non-tumor instances into their designated
cancer categories or as normal based on unstructured gene expression. CT data are
used to train and test the model, comprising 12,446 unique data points, including
3709 cysts, 5077 normals, 1377 stones, and 2283 tumors. The model was 100%
accurate on the trained data due to over-fitting, but on test data the result is not
100% accurate; it lies between 96 and 100%, i.e., 98.6-99% on average. As the
number of training epochs increases, the accuracy and precision increase and,
as a result, the model loss decreases. To a large extent, segmentation of kidneys
and renal malignancies has been met with great success as a foundation for further
development, although applying these technologies to test data outside of the
sampled population would be challenging.

References

1. Siegel RL, Miller KD, Jemal A (2018) Cancer statistics, 2018. CA: Cancer J Clin 68(1):7–30.
https://doi.org/10.3322/caac.21442
2. American Cancer Society. About kidney cancer. www.cancer.org/cancer/kidney-cancer/about.
html
3. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444. https://doi.org/10.
1038/nature14539
4. Mu G, Lin Z, Han M, Yao G, Gao Y (2019) Segmentation of kidney tumor by multi-resolution
VB-Nets. Univ. Minn. Libr., pp 1–5
5. Magadza T, Viriri S (2021) Deep learning for brain tumor segmentation: a survey of state-of-
the-art. J Imaging 7–19
6. Kumar P, Sharma M (2021) Feature-importance feature-interactions (FIFI) graph: a graph-
based novel visualization for interpretable machine learning. In: 2021 international conference
on intelligent technologies (CONIT). IEEE, pp 1–7
7. Lo S-CB, Lou S-LA, Lin J-S, Freedman MT, Chien MV, Mun SK (1995) Applications for lung
nodule detection. IEEE Trans Med Imaging 14:711–718
8. Thong W, Kadoury S, Piché N, Pal CJ (2018) Convolutional networks for kidney segmentation
in contrast-enhanced CT scans. Comput Methods Biomech Biomed Eng Imaging Vis 6:277–
282
9. Myronenko A, Hatamizadeh A (2019) Edge-aware network for kidneys and kidney tumor
semantic segmentation. University of Minnesota Libraries Publishing, Mankato, MN, USA
10. Aljaaf AJ et al (2018) Early prediction of chronic kidney disease using machine learning
supported by predictive analytics. IEEE Evrimsel Hesaplama Kongresi (CEC) 1–9

11. Marsh JN, Matlock MK, Kudose S, Liu T-C, Stappenbeck TS, Gaut JP, Swamidass SJ (2018)
Deep learning global glomerulosclerosis in transplant kidney frozen sections
12. Kaggle: Data Science Community. https://www.kaggle.com/datasets/nazmul0087/ct-kidney-
dataset-normal-cyst-tumor-and-stone
Chapter 20
Hybrid Machine Learning Approach
for Sentiment Analysis of Amazon
Products: A Survey

Om Sarulkar, Rahul Pitale, Shivam Tikhe, Rohan More, and Sumit Giri

1 Introduction

In the modern world, media platforms, online retail, and e-commerce play a signif-
icant part in forming an online community and allowing people to voice their views
and ideas on any topic. For instance, Amazon Retail, a subsidiary of Amazon Inc., is a
well-known online store these days. It gives users the option to post and discuss their
opinions about any item available on the platform, generating a huge amount of data
that is classified as semi-structured data. In order to uncover crucial information
about the items that have reviews posted about them and to understand people's
sentiment, sentiment analysis is utilised to explore and assess these data. Sentiment
analysis (SA), often known as opinion mining or text classification, is an integral
branch of natural language processing (NLP), the branch of machine learning
concerned with understanding human language. In this study, we look at different
machine learning algorithms used by researchers to gain insights into the sentiments
of Amazon/retail website product reviews. We evaluate recent supervised
classification algorithms, and combinations thereof, that have been used for sentiment
analysis of Amazon product evaluations, in order to locate the best one that can deliver
trustworthy and accurate findings. This method may then be used as a starting point
for Amazon reviews, categorization jobs, recommendation systems, and so on. An
accurate and reliable system to deduce product sentiments can broaden the spectrum
of its application to movie reviews, service reviews, etc.

O. Sarulkar (B) · R. Pitale · S. Tikhe · R. More · S. Giri


Department of Computer Engineering, Pimpri Chinchwad College of Engineering,
Pimpri-Chinchwad, India
e-mail: om.sarulkar19@pccoepune.org
R. Pitale
e-mail: rahul.pitale@pccoepune.org


2 Amazon Product E-Commerce

Amazon is one of the biggest internet merchants in the world. It has expanded since
its inception as an online platform in 1994. It now offers over 12 million goods and
has 200 million active users accessing the store from their PCs or phones, making
it a microcosm of user-supplied evaluations. Amazon offers a variety of things
such as books, phone applications, movies, apparel, gadgets, and toys, uses a
star-based rating system ranging from 1 to 5 stars (1 = least, 5 = most), and
provides an option to write a review. An example of the system is shown in Fig. 1.

This scoring system comes with no instructions on how to use it, and product
evaluations are subjective and personal. As a result, a user might give an excellent
product a "1" because of a bad user experience, such as dissatisfaction with the
quality or a delivery problem, and vice versa. The lack of rules makes identifying the
user's feelings regarding various product elements and components of a purchasing
experience challenging. Moreover, a "5" rating does not always correspond to the
review text for an item. To gain more information about the product review,
sentiment analysis is done.

Fig. 1 Example reviews of an Amazon product



3 Sentiment Analysis

Opinion mining, also known as sentiment analysis, is one of the areas of NLP
research. To investigate people's opinions, it leverages textual data that are readily
available on e-commerce sites like Amazon. It focuses on the thematic parts of the
text (a word or a sentence) that point in a positive or negative direction. By offering
businesses a thorough understanding of how customers feel about their products, SA
plays a vital role in the commercial sphere. As a result, businesses may modify their
strategies to meet customer expectations and requests and avoid losses. On the other
hand, it can help potential buyers choose the items they want to purchase.

3.1 Sentiment Analysis: Degree

Sentiment analysis is often researched at three different levels, depending on the
text unit: a document, which is a collection of sentences; a single sentence; and,
lastly, the feature level. At the document level, the goal is to determine whether the
overall tone of the language conveys a favourable or unfavourable emotion towards
a certain entity. The sentence level of analysis, in contrast, is concerned with
determining whether each sentence in the text carries a positive, negative, or neutral
attitude. Item- and aspect-level analyses, unlike the other levels, focus on identifying
whether or not consumers like certain qualities of a product. This level is also known
as feature-level analysis or phrase-level sentiment analysis and is used when doing
sentiment analysis on evaluations of electrical devices and movies.

3.2 Approach

In practice, two primary traditional methodologies are applied in tackling sentiment
analysis difficulties: machine learning and lexicon-based. Figure 2 demonstrates the
methodologies used on a collection of simple sentences based on customer reviews
or remarks, to discern whether negative and positive comments are present in that
material. To improve the results, a hybrid approach is used which combines two or
more machine learning techniques.
Machine Learning
These methods deal with the problem of how text analysis may teach a computer
programme to recognise intricate patterns and draw sound conclusions from data.
They comprise mainly supervised and unsupervised learning techniques. While
supervised techniques use ML classification algorithms, unsupervised methods
make use of clusters together with lexicon approaches.

Fig. 2 Approaches in sentiment analysis

Supervised Machine Learning Method


We focus largely on data classification and categorization in supervised learning. An
algorithm typically requires a large labelled training dataset in order to be trained on
the relationship between each word (or sequence) in a text and the overall conclusion
of the sentence in a supervised way. Among other common supervised methods are
Classification Tree DT, Naïve Bayesian NB, Maximum Entropy ME, and support
vector machine SVM. This method calls for manually labelling the data, which is
usually time-consuming and not always practical.
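
As a minimal sketch of this supervised route, the following trains a linear SVM on TF-IDF vectors; the four inline reviews and their labels are toy placeholders, not data from any surveyed paper:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

reviews = ["great product, works perfectly",
           "terrible quality, broke in a day",
           "love it, fast delivery",
           "waste of money, very disappointed"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative (toy labels)

# TF-IDF turns each review into a weighted word vector; the SVM learns a separating hyperplane.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(reviews, labels)
print(clf.predict(["broke after a day, very disappointed"]))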
Unsupervised Machine Learning Method
In contrast, in the unsupervised approach, we concentrate on grouping unordered
data based on commonalities or differences without providing the computer with any
training data. It makes it possible to analyse the data without the requirement for
human involvement, using traditional unsupervised clustering methods including
hierarchical clustering, K-means, K-Nearest Neighbours (KNN), Principal Component
Analysis (PCA), and others. This strategy is helpful when there is a paucity of tagged
data. When hybrid learning or semi-supervised learning is used, these methods need
some supervision of the output.
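
A corresponding unsupervised sketch, clustering unlabelled reviews by similarity; the texts and the choice of two clusters are illustrative assumptions:

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

texts = ["great phone, love the camera", "battery died fast, bad phone",
         "excellent camera quality", "awful battery, waste of money"]
X = TfidfVectorizer().fit_transform(texts)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # cluster ids only; polarity must still be assigned afterwards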
Lexicon-Based Method
This approach looks for the vocabulary that expresses the viewpoint and then
evaluates it, for instance by using a dictionary of words and phrases that express
opinions, together with their synonyms and antonyms and the associated emotion
scales. It is further separated into dictionary-based and corpus-based approaches.

Dictionary Based
WordNet, SentiWordNet, and online dictionaries are just a few examples of opinion
dictionaries that typically feature both positive and negative opinion words. This
approach looks for words carrying opinionated meaning in the text, compares them
to terms from the dictionary, and then calculates the appropriate scores. This
approach cannot find views that are domain- or context-specific.
Corpus Based
In order to find domain- or context-specific views that dictionary-based techniques
are unable to find, this approach finds opinionated keywords in the corpus and
assigns polarity to all of these words. It calls for an English dictionary, or a
dictionary with a sizeable word definition database, that the algorithm is able to
access and retrieve.
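
A toy dictionary-based scorer in the spirit described above; the two word lists are invented placeholders, whereas real systems draw on resources such as WordNet or SentiWordNet:

POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "broken"}

def lexicon_score(text):
    # positive dictionary hits add 1, negative hits subtract 1
    words = text.lower().split()
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

print(lexicon_score("great camera but awful battery"))  # 0, i.e. neutral overall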
Hybrid Machine Learning
Hybrid machine learning is a method in which two or more machine learning
algorithms are used together to obtain better results. The results of one model are
used to augment the input to another model. This kind of ensemble learning
improves the quality of the data fed to the classification model.
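
A minimal sketch of this idea using scikit-learn's stacking, in which the predictions of the base models (here NB and a linear SVM, chosen as plausible examples) become input features for a final estimator:

from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Base-model outputs are generated by internal cross-validation and fed,
# as augmented features, to the final logistic-regression estimator.
hybrid = make_pipeline(
    TfidfVectorizer(),
    StackingClassifier(
        estimators=[("nb", MultinomialNB()), ("svm", LinearSVC())],
        final_estimator=LogisticRegression(),
    ),
)
# hybrid.fit(texts, labels) then trains the whole ensemble like any classifier.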

4 Literature Review

4.1 Roadmap for the Literature Survey

The literature survey was conducted based on recent developments in the field of
sentiment analysis, primarily on Amazon product reviews. Figure 3 demonstrates the
process followed for the survey. First, the application of supervised classification
algorithms to Amazon product reviews was studied. After surveying the recent studies,
papers containing combinations of the best-performing classification algorithms
were surveyed. To improve the accuracy of existing algorithms, researchers have
implemented artificial neural networks (ANNs) for the classification process; lastly,
the application of ANNs was surveyed.

Fig. 3 Roadmap for the literature survey



4.2 Previous Work

We begin by looking at related work that uses traditional supervised learning
algorithms to measure the performance of machine learning models. The algorithms
in focus are support vector machines (SVM), Naive Bayes (NB), and Decision Trees.

The authors in [1] compared three classification algorithms: SVM, NB, and
Maximum Entropy (ME). As the number of data points in training increased, the
performance of SVM improved relative to NB, and the poorest performer was ME;
however, SVM suffered when unigrams were used in preprocessing. In [2], the
authors used six different classification models along with five- and tenfold cross-
validation. SVM performed best with tenfold cross-validation, the limitation being
that tenfold validation takes a large amount of time. The paper surveyed in [3],
however, tried the NB and OneR classification methods; OneR performed better but
took a very large amount of time, while NB was faster with similar results. The
Ensemble Classifier beat the aforementioned machine learning algorithms when it
was compared in [4] to others including logistic regression, SVM, Naive Bayes,
Decision Tree, and Multinomial NB. In [5], the authors used a combination of a
bigram model with SVM, and this hybrid algorithm gave the highest accuracy of
85%. In [6], the authors compared two machine learning approaches, SVM and NB,
for analysing the sentiment of customers' reviews on Amazon products; SVM
offered much greater accuracy and precision-recall. The authors of [7] analysed a
dataset of Amazon reviews and investigated sentiment categorization using several
machine learning techniques. The reviews were first converted into word vectors
using a variety of methods, including GloVe, TF-IDF, and bag-of-words. They then
trained many machine learning algorithms, including BERT, Naive Bayes,
bidirectional long short-term memory, random forest, and logistic regression. The
models were then assessed using cross-entropy gradient descent, precision, F1-score,
accuracy, and recall. In [8], the authors examine preprocessing procedures on the
dataset, such as stemming, tokenization, casing, and stop word removal, and
eventually offer a rating for its categorization as negative or positive. In [9], we see
a rise in accuracy scores when using unstructured data; the model achieves an
accuracy of 98% with the Naive Bayes algorithm and 93% with SVM.

In [10], the authors performed context-based analysis for Amazon products. The
data were collected from the Amazon product site and preprocessed accordingly for
analysis. They used Naive Bayes and Support Vector Machine models to classify
the reviews and then performed the context-based analysis. Performance measures,
i.e. precision, recall, and F1-scores, were calculated, and the models were compared
on that basis. The aim of the work was to improve sales based on the sentiments
delineated, and every product was considered as having positively or negatively
inclined reviews. In [11], the authors performed sentiment analysis of products using
machine learning. They gathered data from the Amazon product site for the following
products: cameras, laptops, tablets, and televisions. The data were treated with the
bag-of-words (BOW) preprocessing technique and then used to train Naive Bayes
and support vector machine classifiers. The Naive Bayes classifier achieved
accuracies of 90% and above for each product, whereas the support vector machine
classifier performed worse, with accuracies below 90%; thus, Naive Bayes was
superior to SVM in this sentiment analysis. The authors of [12] conducted a sentiment
analysis of user reviews for Amazon items. They gathered the information from the
Amazon product page, performed some rudimentary preprocessing on it, and then
used it directly for model training; Decision Tree, Naive Bayes, and Support Vector
Machine were the algorithms used in the study. The authors of [13] collected
information from the Amazon goods page. The data were then analysed using
review-level and sentence-level classifications, with a part-of-speech-based
categorization method, and were then supplied to model training; the Naive Bayes
and support vector machine classification algorithms were trained. The paper [14]
describes a categorization method that the authors developed for a dataset of music
CDs and Microsoft goods that were scraped using a Python crawler. They looked at
five different categories (most negative, negative, neutral, positive, and most
positive). The paper used three different types of adverbs as features, namely adverbs
(RB), comparative adverbs (RBR), and superlative adverbs (RBS), as well as a
mixture of them, to achieve review-level classification. The classifiers used included
RF, DT, NB, SVM, GB, and LSTM. The analyses show that a single RBR feature is
adequate for most classifiers, with the exception of LSTM and NB, and that a
combination of RBR-RBS features is more effective for all classifiers. In [15], the
authors made use of the Amazon polarity dataset for their study. They used the
models LSTM, CNN, SVM, and logistic regression, and each model was tested on a
sizable dataset. The optimal combination was found to apply stemming rather than
lemmatization and to exclude spell checking. They investigated and analysed several
preprocessing strategies that increase accuracy, using a variety of feature techniques,
including TF-IDF, bag-of-words, and n-grams.
Moving on to hybrid machine learning approaches, techniques such as ensemble
learning are used to change NLP rules or augment input data. Researchers have
tried to improve the input data fed to the classifier models.
In [16], SVM and NB are used as classification models, but their input data are
enriched using reputation scores. This method uses previous data for the assignment
of weights, introducing a dependency on the previous data. In [17], the authors
categorised the training dataset using SVM and later applied k-means for clustering;
this model outperformed the individual classifiers. The authors in [18] used KNN
for grouping data and NB and LSTM for classification; LSTM provided better
accuracy but suffered when the dataset was large. In [19], the authors performed
ensemble learning compared with Naive Bayes and SVM; the ensemble method
gave much better results, while the other two suffered. In [20], the technologies
used are data cleaning and preprocessing, and the results on the dataset are
presented as graphs; this approach achieved the highest accuracy, almost 95.7%. In
[21], the authors tried a hybrid rule-based approach and compared it against
algorithms such as SVM, RF, and NB; the hybrid rule-based approach got better
results. In [22], the authors used RF to form an ensemble of decision trees; the tree
data structure was used with SVM to form a classifier model, and the hybrid model
showed a 2% rise in accuracy. In [23], the authors revisited the RF ensemble method
paired with SVM and achieved a greater accuracy than [22] with the same dataset;
the bootstrap method was used as an extension of Random Forest. In [24], the
authors employed an ensemble learning method in data preprocessing, where
unigrams, bigrams, and trigrams with and without stop word removal were used;
RF with unigrams and stop word removal showed the best results. In [25], the
researchers applied natural language processing to Arabic-language product
reviews. They built a recurrent neural network for sentiment analysis of those
reviews and constructed a dataset of Arabic-language reviews. The model performs
with a considerable efficiency of 85% on the given dataset, which consists of 7480
test items, and would behave more precisely when trained with larger data.
Tables 1 and 2 show the comparison between different research approaches based
on the literature review.
From the comparison tables, we can deduce that conventional supervised learning
algorithms perform worse than hybrid methods. In [8, 10, 11], we observe that
enhancing the preprocessed data improves the accuracy significantly. The use of
hybrid methods, i.e. ensemble learning, helps the classifier algorithm and improves
its performance.

Table 1 Comparison of conventional supervised algorithms


References No. Tools used Dataset Accuracy (%)
[1] SVM, ME, NB Amazon.com 81.2, 70.3, 77.42
[2] SVM, NB, GD, RF, LR, DT Amazon.com 93, 90, 91, 92, 88, 91
[3] NB, OneR Amazon, Twitter 85, 87
[4] SVM, NB Amazon.com 82,38
[5] SVM Amazon.com 85
[6] SVM, NB Amazon.com 84, 82.875
[7] NB Amazon.com 82
[8] SVM Amazon.com 83
[9] NB, SVM Amazon.com 98, 93
[10] NB, SVM Amazon.com 84, 81
[11] SVM, NB Amazon.com <90, >90
[12] NB, DT, SVM Amazon.com 66, 74, 81
[13] SVM, NB Amazon.com 90, 96
[14] RF, DT, NB, SVM Amazon.com 95, 95, 91, 94
[15] LR, SVM, NB Amazon.com 83, 91, 90

Table 2 Comparison of hybrid methods


References No. Tools used Dataset Accuracy (%)
[16] Enriched SVM, NB Amazon.com 86.4, 84.2
[17] SVM, k-means Twitter, Amazon 88.32
[18] NB, KNN and NB, LSTM Amazon.com 87, 92
[19] NB, SVM Amazon.com 78.68
[20] Ensemble learning Amazon.com 95.7
[21] KNN, RF Amazon.com 83
[22] SVM, RF Amazon.com 83.4
[23] SVM, RF Amazon.com 84.7
[24] RF, unigram Amazon.com 89.87
[25] RNN, NLP Arabic dataset 85

5 Literature Survey Conclusion

Figure 4 shows the steps and the workflow researchers have followed to come up
with the conclusions of their sentiment analysis research.

Fig. 4 Sentiment analysis workflow

5.1 Data Collection

A suitable dataset must be established before the text can be analysed and classified.
The goal of this stage is to import the data, eliminate unneeded columns, deal with
missing values, and so on, to prepare the data for further processing; the pandas
Python library may help a lot in this step.
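
A typical sketch of this stage with pandas; the file name and column choices are assumptions for illustration:

import pandas as pd

df = pd.read_csv("amazon_reviews.csv")            # hypothetical export of reviews
df = df[["reviewText", "overall"]].dropna()       # keep needed columns, drop missing rows
df["label"] = (df["overall"] >= 4).astype(int)    # e.g. 4-5 stars taken as positive
print(df.shape)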

5.2 Data Preprocessing

Data Preparation
After obtaining the text, the data must be prepared for use in subsequent machine
learning procedures. Preprocessing is used to remove data that are useless for text
categorization, such as grammar, digits, accent marks, stop words, sparse terms,
white spaces, and specific words. Other components of this step include conversion
of words to lower case, tokenization, stemming, lemmatization, part-of-speech
labelling, and so on. Such noisy data may have an impact on the classifier's
accuracy. In this stage, it is preferable to use the natural language toolkit (NLTK).
Feature Extraction and Selection
Features must describe the data in the format needed by the machine learning
algorithm for it to find a solution. By combining and reformatting the initial
characteristics using a number of approaches (such as TF-IDF, POS, N-grams,
word embeddings, BOW), feature extraction creates a new collection of features
that may be used by machine learning models. Feature selection then dismisses
everything except the important, helpful, and illuminating components. By removing
redundancy, or by retaining a predetermined number of features, it avoids overfitting
and the curse of dimensionality, which occurs when there are too many features to
properly represent insufficient data. The extraction and selection of features have a
significant impact on the classifier's accuracy. As a consequence, the best technique
for acquiring the attributes must be selected. The Scikit-learn package has a number
of built-in algorithms that can be quite useful in this situation.
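
The following sketch combines the preparation steps named above (lower-casing, tokenization, stop-word removal, stemming) with n-gram TF-IDF feature extraction; the two sample sentences are placeholders:

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer

nltk.download("punkt")
nltk.download("stopwords")
stemmer, stops = PorterStemmer(), set(stopwords.words("english"))

def clean(text):
    # lower-case, tokenize, keep alphabetic non-stop words, stem each token
    tokens = nltk.word_tokenize(text.lower())
    return " ".join(stemmer.stem(t) for t in tokens if t.isalpha() and t not in stops)

docs = [clean(t) for t in ["The delivery was quick!", "Worst purchase ever..."]]
X = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(docs)  # unigrams and bigrams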

5.3 Sentiment Categorization

This stage involves determining the polarity of the review documents using a number
of sentiment classification techniques; in SA, supervised learning techniques are
often used to apply the sentiment label to a specific text. SA problems are best
described by one of two types: binary problems, with positive and negative labels,
or multi-class problems, which specify more than two labels (most positive, positive,
neutral, negative, and most negative). The Scikit-learn library, a Python library for
machine learning and data preparation, contains a number of classes that assist in
this process.

5.4 Evaluating Results

In this last step, the success of the machine learning techniques used is evaluated to
establish the overall accuracy of the sentiment analysis. The models generate labels
of 1 and 0 as their result. A confusion matrix is then created by evaluating these
labels, yielding true positives (TP), false positives (FP), true negatives (TN), and false
negatives (FN). True positives and true negatives are cases in which the model
correctly predicts the genuine labels, while false positives and false negatives are
cases that the model got wrong. The performance metrics obtained from the confusion
matrix, computed with the statistical metric functions in the Scikit-learn toolkit to
assess the performance of each algorithm, are accuracy (1), precision (2), recall (3),
and F1-score (4).

Accuracy = (TP + TN) / (TP + TN + FP + FN),    (1)

Precision = TP / (TP + FP),    (2)

Recall = TP / (TP + FN),    (3)

F1 Score = 2 * TP / (2 * TP + FP + FN).    (4)
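
With scikit-learn, the four metrics of Eqs. (1)-(4) can be read off the confusion matrix as follows; y_true and y_pred are placeholder labels:

from sklearn.metrics import (accuracy_score, confusion_matrix,
                             f1_score, precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, FP, TN, FN:", tp, fp, tn, fn)
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))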

6 Proposed Work

We saw that the supervised machine learning algorithms on their own achieve only
limited accuracy, and that, paired with ensemble machine learning methods, the
accuracy increases by at most 2%. The methodology proposed in this paper aims to
improve the existing random forest ensemble method by removing the covariance
in data preprocessing. This method will form better random forest ensembles and
will try to improve the accuracy of the supervised machine learning algorithms.
Figure 5 illustrates the proposed methodology.

The support vector machine model will receive input data that has already been
broken down into decision trees, which should improve the performance metrics
of the SVM classifier.
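
A sketch of one plausible reading of this block diagram is given below: the leaf indices reached in each tree of a random forest are one-hot encoded and used as the input features of an SVM. This is our illustrative interpretation, not code from the surveyed papers:

from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import LinearSVC

def fit_rf_svm(X, y, n_trees=100):
    # The random forest decomposes the data into per-tree leaf assignments.
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=0).fit(X, y)
    enc = OneHotEncoder(handle_unknown="ignore")
    leaves = enc.fit_transform(rf.apply(X))  # (n_samples, n_trees) leaf ids -> one-hot
    svm = LinearSVC().fit(leaves, y)         # SVM trained on the tree-derived features
    return rf, enc, svm

def predict_rf_svm(model, X):
    rf, enc, svm = model
    return svm.predict(enc.transform(rf.apply(X)))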

Fig. 5 Block diagram of proposed methodology

7 Conclusion and Future Work

Sentiment analysis is the computational study of opinionated textual expressions
that represent users' views about things on microblogging and social media sites.
Researchers are working to find very precise solutions to these problems. This study
compared several criteria, including features, approaches, and accuracy, to uncover
the sentiment opinion concealed in Amazon review data, using classic supervised
learning methods as well as the hybrid methods that are often utilised by researchers.
The importance of supervised ensemble learning in improving established techniques
like RF, LR, SVM, and NB is discussed in detail. This paper provides a starting
point for further study on the use of sophisticated hybrid machine learning techniques
and unsupervised algorithms. These strategies work equally well on other e-commerce
platforms.

References

1. Rathor AS, Agarwal A, Dimri P (2018) Comparative study of machine learning approaches
for Amazon reviews. Procedia Comput Sci 132:1552–1561
2. Haque, TUl, Saber NN, Shah FM (2018) Sentiment analysis on large scale Amazon product
reviews. In: 2018 IEEE international conference on innovative research and development
(ICIRD). IEEE
3. Singh J, Singh G, Singh R (2017) Optimization of sentiment analysis using machine learning
classifiers. HCIS 7(1):1–12
4. Brownfield S, Zhou J (2020) Sentiment analysis of Amazon product reviews. In: Proceedings
of the computational methods in systems and software. Springer, Cham
5. Maurya S, Pratap V. (2022) Sentiment analysis on amazon product reviews. In: 2022 interna-
tional conference on machine learning, big data, cloud and parallel computing (COM-IT-CON),
pp 236–240. https://doi.org/10.1109/COM-IT-CON54601.2022.9850758
6. Dey S, Wasif S, Tonmoy DS, Sultana S, Sarkar J, Dey M (2020) A comparative study of support
vector machine and naive bayes classifier for sentiment analysis on Amazon product reviews.
In: 2020 international conference on contemporary computing and applications (IC3A), pp
217–220. https://doi.org/10.1109/IC3A48958.2020.233300
7. AlQahtani, ASM (2021) Product sentiment analysis for amazon reviews. Int. J. Comput. Sci.
Inf. Technol. (IJCSIT) 13(3), June 2021, Available at SSRN: https://ssrn.com/abstract=388
6135
8. Nandal N, Tanwar R, Pruthi J (2020) Machine learning based aspect level sentiment analysis
for Amazon products. Spat Inf Res 28:601–607. https://doi.org/10.1007/s41324-020-00320-2
9. Jagdale RS, Shirsat VS, Deshmukh SN (2019) Sentiment analysis on product reviews using
machine learning techniques. In: Mallick P, Balas V, Bhoi A, Zobaa A (eds) Cognitive infor-
matics and soft computing. Advances in intelligent systems and computing, vol 768. Springer,
Singapore. https://doi.org/10.1007/978-981-13-0617-4_61
10. Sindhu C, Rajkakati D, Shelukar C, Chandra Sekharan S (2020) Context-based sentiment anal-
ysis on Amazon Product customer feedback data. https://doi.org/10.1007/978-981-15-5329-5_
48
11. Jagdale R, Shirsath V, Deshmukh S (2019) Sentiment analysis on product reviews using
machine learning techniques: proceeding of CISC 2017. https://doi.org/10.1007/978-981-13-
0617-4_61
12. Singla Z, Randhawa S, Jain S (2017) Sentiment analysis of customer product reviews using
machine learning. In: 2017 international conference on intelligent computing and control
(I2C2). IEEE
13. Fang X, Zhan J (2015) Sentiment analysis using product review data. J Big Data 2:5. https://
doi.org/10.1186/s40537-015-0015-2
14. Kausar S, Huahu X, Ahmad W, Shabir MY, Ahmad W (2020) A sentiment polarity categoriza-
tion technique for online product reviews. IEEE Access 8:3594–3605. https://doi.org/10.1109/
ACCESS.2019.2963020
15. Katić T, Milićević N (2018) Comparing sentiment analysis and document representation
methods of amazon reviews. In: 2018 IEEE 16th international symposium on intelligent systems
and informatics (SISY), pp 000283–000286, https://doi.org/10.1109/SISY.2018.8524814
16. Benlahbib A, Nfaoui EH (2020) A hybrid approach for generating reputation based on opinions
fusion and sentiment analysis. J Organ Comput Electron Commer 30(1):9–27
17. Korovkinas K, Danėnas P, Garšva G (2019) SVM and k-means hybrid method for textual data
sentiment analysis. Baltic J Mod Comput 7(1):47–60
18. Budhwar MJ, Singh S (2021) Sentiment analysis based method for Amazon product reviews.
Int J Eng Res Technol (IJERT) ICACT 9(08)
19. Sadhasivam J, Babu R (2019) Sentiment analysis of Amazon products using ensemble machine
learning algorithm. Inter J Math Eng Manage Sci 4:508–520. https://doi.org/10.33889/IJM
EMS.2019.4.2-041

20. Iqbal F et al (2019) A hybrid framework for sentiment analysis using genetic algorithm based
feature reduction. IEEE Access 7:14637–14652. https://doi.org/10.1109/ACCESS.2019.289
2852
21. Dadhich A, Thankachan B (2022) Sentiment analysis of amazon product reviews using hybrid
rule-based approach. In: Smart systems: innovations in computing. Springer, Singapore, pp
173–193
22. Al Amrani Y, Lazaar M, El Kadiri KE (2018) Random forest and support vector machine-based
hybrid approach to sentiment analysis. Procedia Comput Sci 127:511–520
23. Al Amrani Y, Lazaar M, El Kadiri KE (2018) A novel hybrid classification approach for
sentiment analysis of text document. Int J Electr Comput Eng 8(6), 2088–8708 (2018)
24. Alrehili A, Albalawi K (2019) Sentiment analysis of customer reviews using ensemble method.
Int Conf Comput Inf Sci (ICCIS) 2019:1–6. https://doi.org/10.1109/ICCISci.2019.8716454
25. Alroobaea R (2022) Sentiment analysis on Amazon product reviews using the recurrent neural
network (RNN). Int J Adv Comput Sci Appl 13(4)
Chapter 21
Sentimentum: A Method of Detecting
Fake News

Vitor da Silva Souza and Leandro Augusto Silva

1 Introduction

In recent years, the topic of fake news has experienced a growth of interest in society.
Events like Brexit [2], the 2016 US presidential election, and, more recently, the
COVID-19 pandemic have contributed to this growing interest. On social media,
fake news achieves wide dissemination compared with traditional media like TV,
radio, and newspapers. Social media gives any user the possibility of spreading news
in a few seconds and, by the same token, the possibility of spreading large amounts
of fake news in seconds.

There is no universal definition of fake news, but there are concepts that always
come up when talking about fake news; these definitions, although imprecise, help
us to understand the topic and the research problems related to it [3].

The authors of [3] argue that some concepts related to fake news are biased news
and deceptive discourse. However, what distinguishes these concepts from fake news
is that the author of the false information also has the intention of obtaining an
advantage, whether economic or political, from its dissemination; in addition, fake
news spreads quickly on the network, often with the help of bots.
Given this scenario and the significant results obtained by machine learning
approaches on other problems [3–8], this paper proposes to adapt the method
presented in the paper Detecting Deceptive Discussions in Conference Calls (D3C2)
to the context of fake news detection, utilizing machine learning algorithms and
natural language processing techniques.

V. da Silva Souza (B)


Natural Computing Laboratory, Mackenzie Presbyterian University, São Paulo, SP, Brazil
e-mail: vitorsouza7512@gmail.com
L. A. Silva
Postgraduate Program in Electrical and Computer Engineering (PPGEC), Mackenzie Presbyterian
University, São Paulo, Brazil
e-mail: leandroaugusto.silva@mackenzie.br


This paper is organized as follows. Section 2 describes the key concepts applied
in this study. Section 3 presents the key concepts from the paper Detecting Deceptive
Discussions in Conference Calls, which inspired our method, called Sentimentum,
for detecting fake news statements [9]. The proposed approach is detailed in
Sect. 4. Finally, Sect. 5 presents the final considerations, as well as possibilities
for future research.

2 Fake News Detection

To understand fake news detection, we first need to define fake news. According
to [3], we do not have a universal definition of fake news, but we have some concepts
that help us to understand it. Fake news can be understood as the intentional
distribution of unreliable news, disseminated in media like newspapers, television,
radio, and social media, seeking political, economic, or social benefits [10]. Fake
news detection is the task of evaluating news claims and classifying them as true
news or fake news. According to [11], there are seven types of fake news: satire or
parody; false connection; misleading content; false context; imposter content;
manipulated content; and fabricated content. Figure 1 details the meaning of each
of these seven types of fake news.

The automatic detection of fake news is the task of evaluating statements in news
and classifying them as true or false (true news or fake news) [4]. With the
dissemination of fake news on social media, traditional document-structuring
techniques from natural language processing (NLP), such as bag-of-words or
n-grams, are used, but they have the following limitations [9]:
• As they are based on word counts, they do not consider the context in which a
word is used.
• In the case of n-grams, processing presents a high computational cost for larger
values of n.

Fig. 1 Seven kinds of fake news. Adapted from [11]

3 Detecting Deceptive Discussions in Conference Calls

In this paper, we adapted the method used in the paper Detecting Deceptive
Discussions in Conference Calls (D3C2) to the context of fake news detection. In
D3C2, the authors perform a linguistic and syntactic analysis of texts extracted from
conference calls accompanying companies' quarterly financial statements [9].

The set of calls from these conferences was transcribed into text and served as a
basis for building a model to predict the probability of an error in the disclosure
of quarterly reports. The set of conferences analyzed covers the period from
September 2003 to May 2007. The purpose of this method was to identify misleading
speech by the CEOs and CFOs of these companies at quarterly income statement
conferences.

The authors argue that CEOs and CFOs often have real knowledge of the data but,
for economic reasons, may intentionally present false information. This type of
analysis interests researchers, investors, creditors, and financial market regulatory
bodies, as it manages to capture misleading disclosures more accurately [9].

To carry out the linguistic and syntactic analysis, the authors base themselves on
a literature review grounded in [12], which takes four perspectives from psychology
as its premise: emotions, cognitive effort, attempted control, and lack of embracement.

To extract the linguistic and syntactic features from the text, the authors use the
Linguistic Inquiry and Word Count (LIWC) software, extracting from the text the
words associated with the LIWC categories, on the premise that these categories are
the ones best suited to the detection of deceptive speech [9].

The LIWC software reads the text, compares each word with its internal
dictionary's word list, and calculates the percentage of the total words in the text
that match each of the dictionary's categories. Internally, LIWC applies the
bag-of-words model, which represents the text through a vector of words, counting
how many times a given word appears in the text. The difference between LIWC and
the plain bag-of-words model is that with LIWC only the words found within the
LIWC category dictionary are counted. This dictionary has specific categories
associated with psychology, and in this way the dictionary counts the number of
words that occur for each LIWC category [9].
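
LIWC itself is proprietary, but the counting scheme can be illustrated with a toy two-category dictionary (the word lists below are invented solely for illustration):

CATEGORIES = {
    "negemo": {"hate", "kill", "annoyed", "worried"},
    "certain": {"always", "never", "absolutely"},
}

def category_percentages(text):
    # percentage of the text's words that fall in each dictionary category
    words = text.lower().split()
    return {cat: 100.0 * sum(w in vocab for w in words) / len(words)
            for cat, vocab in CATEGORIES.items()}

print(category_percentages("they always hate waiting and never feel annoyed"))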

4 Evaluation

In this section, to empirically validate our developed system, called Sentimentum,
we apply the method presented in the D3C2 section [9] to perform fake news
detection on internet news. We first introduce the study setup of our experiments.

4.1 Study Setup

4.1.1 Datasets

We utilize an open fake news dataset from Kaggle, "Fake News—Build a system
to identify unreliable news articles", which was prepared by students at the University
of Tennessee [13]. The database has 20,800 news items organized into five attributes:
id, title, author, text, and label. The id attribute is a unique identifier, the title
attribute is the title of the text, the author attribute contains the name of the
author of the news, the text attribute contains the text of the news, and the label
attribute represents the classification of the news (zero (0) means true news and one
(1) means fake news) [13]. The database has a random distribution of 50% fake news
and 50% true news, and the texts in the text attribute are in English.
To evaluate the performance of the method, we use metrics such as accuracy,
precision, and recall, together with the confusion matrix [14].

4.1.2 Experimental Setting

The first step applied the LIWC software to our dataset [13], on the text attribute,
which holds the English text of the news. LIWC calculates the degree of presence of
different categories of words using its internal dictionary, also called LIWC. The
dictionary has categories such as anxiety, anger, affectivity, positive, negative, etc.
The software performs tokenization, stemming, and removal of stop words in order
to count the words associated with its internal dictionary, and then calculates the
percentage of words in the text belonging to each category.

After this, we found that there were texts in which all attributes had a value of
zero, that is, texts for which, after applying LIWC, no information was obtained for
any attribute of the LIWC internal dictionary; these texts with missing values were
removed from the dataset. We also removed texts that had 100% of their words in
just one attribute. A second treatment was performed to remove outliers that had
more than 20% of the text in a single attribute. After the preprocessing, the dataset
went from 20,800 texts to 20,552 texts.
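
A pandas sketch of these cleaning rules, read literally from the description above; the file and column names are assumptions:

import pandas as pd

df = pd.read_csv("liwc_features.csv")              # hypothetical LIWC output table
feature_cols = [c for c in df.columns if c != "label"]

df = df[(df[feature_cols] != 0).any(axis=1)]       # drop texts with all-zero attributes
df = df[(df[feature_cols] < 100).all(axis=1)]      # drop texts 100% in one attribute
df = df[(df[feature_cols] <= 20).all(axis=1)]      # drop outliers above 20% in one attribute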
Table 1 shows a sample of the dataset after the preprocessing performed with LIWC;
this sample has only five rows out of a total of 20,552 records and ten attributes out
of a total of 28, counting the label attribute, our target attribute. In the table, we can
see the percentage of words belonging to each attribute previously chosen in LIWC.

To adapt the D3C2 method in this study, we use as a basis the LIWC categories
that were used in the D3C2 paper, together with categories that fit the premises listed
by its authors, that is, the four perspectives of psychology: emotions, cognitive
effort, attempted control, and lack of embracement. The 28 categories selected for
the application of the method are listed in Table 2.

Table 1 Sample of a dataset after preprocessing with LIWC


label pronoun ppron i we prep negate affect posemo negemo
0 1 19.75 13.03 4.50 1.87 9.71 0.93 5.31 3.78 1.28
1 0 19.40 11.80 5.65 2.21 12.13 2.41 5.11 4.09 0.97
2 1 10.39 5.44 1.65 1.17 13.77 1.68 4.89 3.15 1.66
3 1 13.23 7.56 2.00 1.03 12.73 1.82 4.60 2.71 1.82
4 1 12.56 7.01 2.03 1.01 13.79 1.55 4.83 2.99 1.82


4.2 Classification

After the preprocessing of the dataset, we applied machine learning algorithms.
The first algorithm applied was the support vector machine (SVM). We defined the
target attribute as label; this attribute assumes 0 when we have true news and 1 when
we have fake news.

We applied cross-validation, with ten folds, to divide the data into test and training
sets for classification. After applying the algorithm, we observed an accuracy of
0.996. In Fig. 2, we show the confusion matrix generated after running the algorithm;
we can verify that there is balance in the database and in the classification algorithm,
because the false negative and true positive counts presented close values, that is,
the algorithm did not exhibit classification bias by concentrating the largest share
in a single quadrant.
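
A sketch of this classification step with scikit-learn; the feature file is the hypothetical LIWC table from the earlier sketch:

import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

df = pd.read_csv("liwc_features.csv")            # hypothetical cleaned LIWC table
X, y = df.drop(columns=["label"]), df["label"]   # 0 = true news, 1 = fake news
scores = cross_val_score(SVC(), X, y, cv=10, scoring="accuracy")
print(scores.mean())                             # the text above reports 0.996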
The second algorithm used to classify the database was a decision tree. The data
were divided into 20% for testing and 80% for training. In Fig. 3, we show the result
of applying the algorithm with the max depth parameter, which represents the
maximum depth of the tree that will be generated, set to three. As the depth
increases, it becomes increasingly difficult to visualize the generated tree.

The algorithm was applied because its visualization allows us to identify which
attributes exert the greatest influence on the classification of true news or fake
news. From the visualization generated in Fig. 3, it can be seen that, for the decision
tree algorithm, the attribute power is the most influential factor for the detection of
fake news in the sample that was used.

It is noted that, of the 27 LIWC categories used, five have greater relevance for the
classification of fake news: power, certain, negate, prep, and i (personal pronoun).
This result corroborates the four perspectives taken as a basis for the D3C2 study,
that is, liars tend to be more negative, seek to avoid the first person because it brings
details that may compromise the veracity of the account, and tend to lack conviction
because they have not experienced the fact they are narrating, even in news, where
there is more time to prepare the lie.
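
The shallow, interpretable tree can be reproduced along these lines; max_depth=3 matches Fig. 3, while the file name is again a hypothetical placeholder:

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

df = pd.read_csv("liwc_features.csv")
X, y = df.drop(columns=["label"]), df["label"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

tree = DecisionTreeClassifier(max_depth=3).fit(X_tr, y_tr)   # depth capped for readability
plot_tree(tree, feature_names=list(X.columns), class_names=["true", "fake"], filled=True)
plt.show()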

Table 2 LIWC attributes


Column used in preprocessing Examples Data type
label 0 true news and 1 fake news Binary
pronoun I, them, itself Numeric
ppron Personal pronoun Numeric
i I, me, mine, etc. Numeric
we We, us, our, etc. Numeric
prep Preposition Numeric
negate No, not, never, etc. Numeric
affect Love, like, etc. Numeric
posemo Positive emotions Numeric
negemo Negative emotions Numeric
anx Worried, fearful, nervous, etc. Numeric
anger Hate, kill, annoyed, etc. Numeric
sad Sadness Numeric
social Social Numeric
family Family Numeric
friend Friend Numeric
cause cause Numeric
certain Always, never, etc. Numeric
feel Love, touch, etc. Numeric
power Power Numeric
risk Danger, accident, etc. Numeric
relativ Related, dependent, etc. Numeric
money Money, cash, capital, etc. Numeric
relig Faith Numeric
death Death, kill, etc. Numeric
informal informal Numeric
swear Screw, hell, etc. Numeric
assent Agree, ok, yes Numeric

The first attribute with great influence in determining whether we have fake news
is power: the model identifies that for values smaller than 0.745 there is a set of 959
samples with a high probability that the text is true, whereas for values smaller
than 1.395 there is a greater probability that we are dealing with fake news.

Fig. 2 Confusion matrix SVM

Fig. 3 Decision tree fake news

5 Conclusion

People are increasingly producing and consuming news through social media
instead of traditional media like newspapers, magazines, and TV. The dissemination
of fake news has intensified in recent years around events like Brexit and the 2016
presidential election of Donald Trump [2]. The study of fake news identification is
fundamental to detecting and combating disinformation, which represents political,
economic, and social risks.

To overcome this problem of dissemination of fake news, this paper presented


a method of automatic detection of fake news based on sentiment analysis called
Sentimentum. The method uses LIWC to extract categories of words in news and
calculates the percentage of each category in text. Through the extraction of these
categories applied a preprocessing removing outliers and noise and finally applied
algorithms of machine learning like SVM and Decision Trees to the classification of
a dataset in true news and fake news.
This method was based on an article called D3C2, that is based on psychology
perspectives to choose some categories of LIWC to identify deceptive discourses in
conference calls.
The paper presents satisfactory results for fake news detection when compared to other studies [15]: the best accuracy reported there was 0.920, whereas in this article we reached an accuracy of 0.996 with the SVM algorithm in the fake news detection context. A second aspect of the research worth mentioning is the relevance of the LIWC attributes identified here in comparison with the assumptions used by the authors of D3C2 to select LIWC attributes. From the result obtained with the Decision Tree, we verified that the premises presented in D3C2 also hold in the context of fake news detection, namely for the negate and i attributes. The negate attribute represents negative words, and the i attribute (first-person singular pronouns) proved extremely relevant for separating fake news from true news.
As a suggestion for future work, the Sentimentum method could be applied to a database in Portuguese using the Portuguese LIWC dictionary made available by Aluisio et al. [16]. This would involve carrying out the tokenization, lemmatization, and word counting that the LIWC software performs, but with the Portuguese dictionary. The method could also be tested with deep learning algorithms such as convolutional and recurrent neural networks instead of the decision trees and SVM used here.
Another suggestion for future work is to use techniques that capture part-of-speech (POS) information, because one weakness of the bag-of-words technique is that it performs no semantic analysis of the text, which can introduce inaccuracies when words are analyzed individually.

References

1. Hootsuite Digital (2021) Available 7 Dec 2021, from Hootsuite Inc: https://hootsuite.widen.
net/s/zcdrtxwczn/digital2021_globalreport_en
2. Bastos MT, Mercea D (2019) The Brexit botnet and user-generated hyperpartisan news. Social
science computer review
3. Zhou X, Zafarani R (2020) A survey of fake news: fundamental theories, detection methods,
and opportunities. ACM Comput Surv (CSUR) 1–40
4. Oshikawa R, Qian J, Wang WY (2018) A survey on natural language processing for fake news
detection. arXiv preprint arXiv:1811.00770
5. Parikh SB, Atrey PK (2018) Media-rich fake news detection: A survey. IEEE Conf Multimedia
Inf Process Retrieval (MIPR) 2018:436–441

6. Lillie AE, Middelboe ER (2019) Fake news detection using stance classification: a survey.
arXiv preprint arXiv:1907.00181
7. Cardoso Durier da Silva F, Vieira R, Garcia AC (2019) Can machines learn to detect fake news?
a survey focused on social media. In: Proceedings of the 52nd Hawaii international conference
on system sciences
8. Shu KE (2017) Fake news detection on social media: a data mining perspective. ACM SIGKDD
Explor Newsl 22–36
9. Larcker DF, Zakolyukina AA (2012) Detecting deceptive discussions in conference calls. J
Account Res 50(2):495–540
10. Kaplan A (2020) Artificial intelligence, social media, and fake news: is this the end of
democracy? Media Soc 149
11. Wardle C, Derakhshan H (2017) Information disorder: toward an interdisciplinary framework
for research and policy making. Counc Europe
12. Vrij A (2008) Detecting lies and deceit: Pitfalls and opportunities. Wiley
13. Kaggle BA (2017) Build a system to identify unreliable news articles. Available 4 Nov 2021,
from Kaggle: https://www.kaggle.com/c/fake-news/data
14. de Castro LN, Ferrari DG (2016) Introduction to data mining, 1ª. Saraiva Educação SA, São
Paulo
15. Medeiros FD, Braga RB (2020) Fake news detection in social media: a systematic review. A
systematic review. In: XVI Brazilian symposium on information systems, pp 1–8
16. Aluisio S, Checchia R, Chishman R (2022) PortLex. Source: LIWC: http://143.107.183.175:21380/portlex/index.php/pt/projetos/liwc
Chapter 22
Artificial Neural Networks for Self-phase
Modulation Compensation in Unrepeated
Digital Coherent Optical Systems

Grazielle Cossa, Camila Costa, Vitória Cesar, Lucas Marim, Rafael Penchel,
José Augusto de Oliveira, Mirian Santos, Denilson Souza dos Santos,
and Ivan Aldaya

1 Introduction

The popularization of multimedia applications and the migration to cloud storage and
computing services are forcing Internet service providers to increase their transmis-
sion rates [1]. To meet these capacity requirements, optical communication systems
have undergone a silent revolution, migrating from traditional intensity-modulated
with direct detection systems to digital coherent systems [2]. Thus, the traditional
communications systems where information was transmitted just by modulating the
intensity of a lightwave have been progressively substituted by more sophisticated
systems in which not only the amplitude but also the phase and polarization diver-
sity are exploited to achieve higher spectral efficiency [3]. Digital coherent systems
were initially adopted in long-distance systems but, as electronics evolved, they became competitive at shorter ranges. As an example, in May 2020, the 400ZR
communication standard for connection between data centers was released [4]. This
standard aims to support up to four multiplexed 100G Ethernet connections, employ-
ing dual polarization 16-ary quadrature amplitude modulation (DP-16QAM). This
standard considers two operating modes: an unamplified single-channel system and
an amplified system with wavelength channel multiplexing. In both cases, the system
is limited by the combination of additive noise and nonlinear distortion induced by
the fiber Kerr effect. The Kerr effect is the nonlinear optical effect by which the refractive index of the medium varies in the presence of high-intensity electromagnetic waves [5]. In fiber transmission systems, this effect gives rise to three well-known
signal distortions denominated self-phase modulation (SPM), cross-phase modula-
tion (XPM), and four-wave mixing (FWM). Which of these distortions dominates depends on the system configuration [5].

G. Cossa · C. Costa · V. Cesar · L. Marim · R. Penchel · J. A. de Oliveira · M. Santos ·


D. Souza dos Santos · I. Aldaya (B)
School of Engineering of São João da Boa Vista, Center for Advanced and Sustainable
Technologies (CAST), São Paulo State University (UNESP), São Paulo, Brazil
e-mail: ivan.aldaya@unesp.br

Due to the stochastic nature of additive noise, it is difficult to compensate for its effect via digital signal processing (DSP). On the other hand, nonlinear distor-
tion is deterministic, and its effect can be mitigated, at least partially, in the digital
domain after the photodetection of the signal at the receiver. The first methods of
compensating nonlinearities were based on model inversion. In particular, digital
backpropagation (DBP)-based equalizers [6] and the inverse Volterra series transfer
function (IVSTF) have been extensively studied [7, 8]. Unfortunately, the computa-
tional cost of these algorithms, as well as the resulting latency, limit their adoption in
real-time applications. In this context, equalizers based on artificial intelligence have
emerged as a compromise between performance and computational cost. Artificial
neural networks (ANNs) have attracted increasing attention due to their flexibility
and adaptability to different problems [9, 10]. Among the different ANN topologies,
one of the most used is the multilayer perceptron (MLP) due to its flexibility and
efficient training [11].
In this paper, we use MLPs to compensate for the nonlinear distortion in 175 km
unrepeated optical links based on digital coherent technology and employing polar-
ization multiplexing. We consider two approaches: (i) processing of each polarization
independently and (ii) processing of both polarizations at the same time. By adopting
this method, we achieve a reduction of the bit-error ratio (BER) from 0.8 × 10−4 to
0.4 × 10−4 and 0.2 × 10−4 for approaches (i) and (ii), respectively. We also analyze
the training process and optimize the number of neurons for different launched opti-
cal power levels, that is, for different strengths of the nonlinear distortion. Numerical
results also reveal that the MLP that processes the two polarizations simultaneously requires a larger number of neurons to achieve optimum performance. The rest of
the paper is organized as follows: In Sect. 2, we briefly present the theoretical back-
ground, including the Manakov equations that govern signal propagation through an
optical fiber and the basic concepts of MLPs. The simulation setup is described in
Sect. 3, whereas the results are presented in Sect. 4. Finally, the main conclusions are
drawn in Sect. 5.

2 Nonlinear Distortion Compensation Based on MLPs

In the present section, we introduce the Manakov equations and discuss the benefits
of processing both polarizations simultaneously. Afterward, the MLP architecture is
presented, describing the adopted configuration.

2.1 Propagation of Signals Through Optical Fibers

Signal propagation through an optical fiber is a complex process in which diverse transmission effects interact. Among the linear effects affecting the propagation,
we can mention chromatic dispersion (CD), polarization mode dispersion (PMD),

attenuation, and polarization rotation, whereas the nonlinear mechanisms can be split
into the Kerr effect and stimulated scattering of light, which can be further classified
as stimulated Brillouin scattering (SBS) and stimulated Raman scattering (SRS) [5].
For the particular case of digital coherent systems, the lack of an optical car-
rier increases the SBS and SRS power thresholds, and therefore, these effects can
be neglected for typical launched optical transmission power levels. On the other
hand, the high baud rate makes the PMD have a significant effect. In addition, the
interferometric nature of the receiver in digital coherent systems and the adoption of
polarization multiplexing lead to a critical sensitivity to the fluctuations of the state
of polarization (SoP) of the incident optical signal. Consequently, it is important to
consider both polarizations. Thus, employing the Jones formalism, the vectorial phasor associated with the optical signal can be written as follows:

$$\mathbf{E}(t) = \begin{bmatrix} \hat{x} & \hat{y} \end{bmatrix} \begin{bmatrix} A_x \\ A_y \end{bmatrix} \exp(j\omega_0 t), \qquad (1)$$

where $A_x$ and $A_y$ are the complex amplitudes of the x and y polarizations, respectively, $\hat{x}$ and $\hat{y}$ are the unit-norm vectors indicating the directions of the x and y polarizations, and $\omega_0$ is the central angular frequency of the signal. By setting a suitable spatiotemporal framework, the evolution of $A_x$ and $A_y$ can then be described by the following set of partial differential equations [5]:

$$\frac{\partial A_x}{\partial z} + \beta_{1x}\frac{\partial A_x}{\partial t} + \frac{j\beta_2}{2}\frac{\partial^2 A_x}{\partial t^2} + \frac{\alpha}{2}A_x = j\gamma\left(|A_x|^2 + \frac{2}{3}|A_y|^2\right)A_x + \frac{j\gamma}{3}A_x^{*}A_y^{2}\exp(-2j\Delta\beta z)$$

$$\frac{\partial A_y}{\partial z} + \beta_{1y}\frac{\partial A_y}{\partial t} + \frac{j\beta_2}{2}\frac{\partial^2 A_y}{\partial t^2} + \frac{\alpha}{2}A_y = j\gamma\left(|A_y|^2 + \frac{2}{3}|A_x|^2\right)A_y + \frac{j\gamma}{3}A_y^{*}A_x^{2}\exp(+2j\Delta\beta z). \qquad (2)$$

with $\Delta\beta = \beta_{0x} - \beta_{0y}$. In this set of equations, $z$ is the propagation coordinate, and $\beta_{1x}$ and $\beta_{1y}$ are related to the inverse of the group velocity in the x and y polarizations, which differ due to the birefringence caused by the core ellipticity. $\beta_2$ is the second-order dispersion parameter (assumed not to be significantly affected by the aforementioned ellipticity), and $\alpha$ is the intensity attenuation coefficient. The right-hand side of both equations represents the Kerr effect, which can be split into two contributions. Both of them depend on the nonlinear coefficient $\gamma$, which is related to the nonlinear refractive index $n_2$ through $\gamma = k_0 n_2 / A_{\text{eff}}$, with $k_0 = 2\pi/\lambda_0$ ($\lambda_0$ is the operating wavelength) and $A_{\text{eff}}$ the effective modal area. Nevertheless, these two
nonlinear terms present different effects on the transmitted signal: the first term causes a nonlinear phase rotation that depends on $|A_x|^2$ and $|A_y|^2$, while the second term represents an additive interference. The interpretation of the contributions of the nonlinear effects depends on the criterion adopted to define the signal. If we

consider that each polarization constitutes a signal, then the first term represents
intra-polarization SPM, the second term corresponds to the inter-polarization XPM,
and the third term represents the FWM between the two polarizations.
It is important to note that the nonlinear term couples the two polarizations.
This is not merely a curiosity; it has profound implications for the architecture of the nonlinear compensation MLP. If each polarization is processed individually, the only nonlinear term that can be compensated is the one we identified as SPM. The information of the other two terms is disregarded and appears as a noise contribution. If both polarizations are processed simultaneously, on the other hand, the inter-polarization nonlinear distortion can be partially mitigated.
In the particular case of DP-16QAM, that is, in systems where each polarization
is modulated with a 16QAM signal, the variation of the intensity in each polarization
leads to XPM between polarizations. This nonlinear polarization crosstalk has a
significant impact on the system performance, as can be concluded from the analysis
presented in [12]. The mitigation of this effect is far from trivial due to the interaction
between the chromatic dispersion and nonlinear effects described in Eq. 2.
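To make the interplay between dispersion and the Kerr terms of Eq. (2) more tangible, the following is a minimal split-step Fourier sketch for a single polarization, keeping only attenuation, second-order dispersion, and the SPM term; all fiber parameters are typical illustrative values, not the settings of the chapter's simulations.

```python
# Split-step Fourier sketch of single-polarization propagation with
# attenuation, second-order dispersion, and the SPM term of Eq. (2).
# Parameter values are illustrative, typical SMF numbers.
import numpy as np

def propagate(A, dt, L, dz,
              alpha=0.2e-3 * np.log(10) / 10,   # 0.2 dB/km expressed in 1/m
              beta2=-21.7e-27,                  # s^2/m
              gamma=1.3e-3):                    # 1/(W m)
    n = A.size
    w = 2 * np.pi * np.fft.fftfreq(n, d=dt)     # angular frequency grid
    # linear operator over one step: dispersion + attenuation
    lin = np.exp((1j * beta2 / 2 * w**2 - alpha / 2) * dz)
    for _ in range(int(L / dz)):
        A = np.fft.ifft(np.fft.fft(A) * lin)    # linear half of the step
        A = A * np.exp(1j * gamma * np.abs(A)**2 * dz)  # SPM phase rotation
    return A

# Example: a ~1 mW-peak Gaussian pulse over 10 km in 100 m steps
t = np.linspace(-100e-12, 100e-12, 1024)
A0 = np.sqrt(1e-3) * np.exp(-(t / 20e-12)**2)
A = propagate(A0, t[1] - t[0], L=10e3, dz=100.0)
```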

2.2 MLPs as Adaptive Model Inverter

Among the different ANN architectures, feed-forward densely connected networks, also denominated MLPs, are particularly interesting due to their flexibility to adapt to
a broad variety of problems and the efficient training process. In the present work, we
employed MLPs in supervised regression mode, fed with in-phase and quadrature
components of the transmitted ideal constellation and the received distorted con-
stellations. As we mentioned, we adopted two approaches: the first processed the
components of each polarization independently, and the second approach operated
considering the in-phase and quadrature components of the X and Y polarizations,
as shown in Fig. 1. In both cases, the inputs and outputs were normalized to ensure zero mean and unit variance, and the logistic function was chosen as the activation function. Regarding the training process, 70% of the data were used for training. The well-known backpropagation method was combined with the Adam optimizer, configured with an exponential decay rate of 0.9 for the first-moment estimates and 0.999 for the second-moment estimates. Training was set to stop when the loss function does not decrease by 10^{-4} over the last 10 iterations or when it reaches 100 iterations. A configuration sketch is given below.
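A minimal sketch of this configuration, assuming scikit-learn as the implementation and random placeholder data in place of the simulated symbols; the described settings (50 hidden neurons, logistic activation, Adam with decay rates 0.9 and 0.999, tolerance 1e-4 over 10 iterations, at most 100 iterations, 70% training data) map directly onto MLPRegressor parameters.

```python
# Sketch of the dual-polarization MLP equalizer configuration described
# above, using scikit-learn. rx and tx are placeholders for the received
# (distorted) and transmitted (ideal) symbols; each row holds the in-phase
# and quadrature components of the X and Y polarizations.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
tx = rng.choice([-3.0, -1.0, 1.0, 3.0], size=(10000, 4))  # [Ix, Qx, Iy, Qy]
rx = tx + 0.1 * rng.standard_normal(tx.shape)             # stand-in channel

# zero mean / unit variance normalization, as in the text
sx, sy = StandardScaler(), StandardScaler()
X, Y = sx.fit_transform(rx), sy.fit_transform(tx)

mlp = MLPRegressor(hidden_layer_sizes=(50,), activation="logistic",
                   solver="adam", beta_1=0.9, beta_2=0.999,
                   tol=1e-4, n_iter_no_change=10, max_iter=100)

n_train = int(0.7 * len(X))                   # 70% of the data for training
mlp.fit(X[:n_train], Y[:n_train])
equalized = sy.inverse_transform(mlp.predict(X[n_train:]))
```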

3 Simulation Setup

In order to obtain the data for the MLPs’ training and validation, we used the com-
mercial software VPIphotonics Transmission Maker. This tool offers a broad variety
of modules to simulate not only optical devices such as fiber and optical amplifiers
but also the associated electronics and digital processing blocks. The bit rate of the system was configured to 112 Gbps, that is, 56 Gbps per polarization, and the number of simulated symbols was set to 262,144.

Fig. 1 General block diagram of a digital optical coherent link, including the two MLPs employed for nonlinear distortion compensation. In addition, the transmitted constellation and the distorted constellation are shown
The simulation setup is shown in Fig. 1. Two independent pseudorandom bit
sequences were mapped into 16QAM constellations, converted to the continuous
time, and filtered using Nyquist filters with 20% roll-off factors. These electrical sig-
nals modulated the in-phase and quadrature components of the two orthogonal polar-
izations of a continuous-wave laser. The two modulated polarizations were joined in
a polarization beam combiner and amplified using an erbium-doped fiber amplifier,
whose output power was swept from 4 to 12 dBm. The optical signal was then trans-
mitted through a fiber span of 175 km. At the receiver, the orthogonal polarizations
of the received signal were separated and combined with the corresponding polariza-
tion components of the receiver laser in 90-degree hybrid networks. The four outputs
of each 90-degree hybrid network were digitized and fed into the DSP, where the signals were orthogonalized and equalized, and time and phase synchronization were performed. Afterward, the frequency offset and phase noise were corrected
using frequency-domain shift and blind-phase search. A detailed description of the
setup and its parameters are given in [12]. Once the phase and time synchronizations
were performed and the phase noise and frequency offset corrected, the nonlinear
distortion was mitigated using the MLPs.

4 Results

In this section, we first analyze the training curves and optimize the MLP in terms of
the number of neurons for different launched optical power levels (6, 8, and 10 dBm)
considering the process of each polarization individually and both polarizations at
the same time. Afterward, the BER for launched optical power levels ranging from
4 to 12 dBm is analyzed when the different approaches are applied. Finally, the
complexity of the proposed MLP-based equalizers is briefly discussed.

4.1 Analysis of Training Curves and the Impact of Neuron Numbers

In Fig. 2, we show the evolution of the loss function as the MLP is trained, together with the obtained BER for different numbers of neurons in the hidden layer, for the two proposed approaches, that is, processing the polarizations independently and simultaneously.
Regarding the training curves, we considered a hidden layer with 50 neurons. For
this configuration, the curves obtained for launched optical power levels of 6 dBm,
Fig. 2a, 8 dBm, Fig. 2b, and 10 dBm, Fig. 2c, present a pronounced initial drop fol-
lowed by a slower convergence stage. However, there are some differences as the
launched optical power is increased. The first difference is that for the lowest con-
templated launched power, 6 dBm, the training curves for the MLPs processing each
polarization independently and simultaneously almost overlap. Indeed, the two train-
ing curves converge to very similar values. When we increase the launched power
to 8 dBm, the two curves converge to slightly different values, and, following the
tendency, the difference between the final values of loss functions for single and
dual polarization processing increases for 10 dBm. The second remarkable difference as the launched optical power increases is the number of epochs required to achieve convergence. Thus, when each polarization is individually processed, the
required epochs remain almost constant at around 25. When processing the two
polarizations simultaneously, on the other hand, it can be observed that the number
of required epochs increases from 28 for 6 dBm to 39 for 10 dBm. The comparison
between the required epoch numbers indicates that the MLP for processing the two
polarizations simultaneously is more complex than for a single polarization, which
was expected as the former processes more information. In addition, the fact that
the number of required epochs increases significantly for simultaneous processing
suggests that the higher launched optical power levels lead to more complex systems
that need to be trained for a longer time.
Regarding the effect of the number of neurons in the hidden layer, in Fig. 2d–f, we show the BER of the validation symbols for power levels of 6 dBm, 8 dBm, and
10 dBm, respectively. For each power level, the number of neurons in the hidden
layer was swept from 5 to 50, and the BER obtained employing maximum likeli-
hood (ML) is included as a reference. At first glance, the main difference between the

Fig. 2 Evolution of the loss function during the training for MLPs processing each polarization separately (SP) and the two polarizations together (DP) for different powers launched in the fiber: a 6 dBm, b 8 dBm, and c 10 dBm. BER in terms of the number of neurons in the hidden layer considering individual and joint processing of the polarizations for different powers launched in the fiber: d 6 dBm, e 8 dBm, and f 10 dBm. The BER for ML detection has been included as a reference

subfigures corresponding to different launched optical power levels is that the performance gap widens as the launched power increases. Thus, for 6 dBm, the performance when ML is adopted is similar to that achieved when the MLP is used and,
therefore, the use of MLP does not seem to represent a significant advantage over ML.
For 8 dBm launched optical power, it is possible to identify some differences between
the BER values obtained using ML and MLP. Furthermore, processing each polar-
ization individually and both polarizations simultaneously present slightly different
behavior. For instance, the performance when each polarization is processed inde-
pendently is virtually independent of the number of neurons, whereas when the two
polarizations are simultaneously processed, the performance slightly improves as the

number of neurons increases. Indeed, it is interesting to note that for very low neuron
numbers, the MLP for processing the two polarizations is outperformed by the MLP
processing each polarization but as the number of neurons increases and the MLP
becomes more complex, processing both polarizations leads to lower BER values.

4.2 Performance Analysis

Once the training and the effect of the number of neurons are analyzed, we set the
number of neurons to 50 and swept the launched optical power from 4 to 12 dBm
(outside this range of launched optical power, the signal quality was not enough to
allow the synchronization of the signal at the receiver side). The calculated BER
obtained using ML and MLP with single and dual polarization processing is shown
in Fig. 3a. Comparing the curves, it is possible to observe that for launched optical
power levels up to 6 dBm, the BER curves for the different approaches overlap. As
the launched optical power level increases, the curves separate, and the performance enhancement provided by MLP-based nonlinear compensation becomes more significant. When
we contrast the performance of processing each polarization and both polarizations,
the enhancement is more significant for higher power levels, particularly for lev-
els above 8 dBm. This indicates that processing both polarizations simultaneously
improves the performance because, in addition to the intra-polarization SPM, the
inter-polarization XPM can be partially compensated.
The effect of the nonlinear compensation using the MLP can be visualized in Fig. 3b,
where the received constellation is presented alongside the output of the MLP in two
configurations: processing each polarization individually and the two polarizations
simultaneously. Looking at the different obtained polarizations, we can observe the
characteristic spiral-like shape of the constellation when SPM and XPM are present
and the partial mitigation when MLP is employed. In fact, it is possible to perceive
a reduction in the point dispersion when both polarizations are processed together.

4.3 Complexity Analysis

The complexity analysis will be performed by counting the number of floating-point operations required to process each received symbol in the test stage. Typically, the test stage is considered because, due to the long coherence time of the nonlinear effects, the training process needs to be repeated only rarely. In addition, it is commonly assumed that the activation function is implemented using a look-up table and, consequently, does not contribute to the operation count. The number of operations is then governed by the products with the synaptic weights and the sums to compute the activation potential. Generally speaking, the number of operations required by an MLP with an input layer of $N_i$ neurons, a single hidden layer with $N_h$ neurons, and an output layer with $N_o$ neurons can be calculated layer by layer:

Fig. 3 a BER in terms of the launched optical power considering maximum likelihood and MLP-based equalization operating on each polarization independently and on both polarizations simultaneously. b Constellation diagrams without MLP equalization (ML detection) and when the MLP is applied to each polarization and to both polarizations. The color code is the same as in (a)

1. Input layer: in the input layer, no operation is performed.
2. Hidden layer: in the hidden layer, each neuron needs to compute its activation potential, that is, multiply each input by its corresponding weight and then sum all the elements. Therefore, the number of operations in each neuron of the hidden layer is given by

$$N_{\mathrm{op\_neuron}}^{\mathrm{hidden}} = N_i + (N_i - 1), \qquad (3)$$

and the total number of operations in the hidden layer is

$$N_{\mathrm{op\_hidden}} = N_{\mathrm{op\_neuron}}^{\mathrm{hidden}} \cdot N_h = \left[ N_i + (N_i - 1) \right] \cdot N_h. \qquad (4)$$

3. Output layer: the number of operations in each neuron of the output layer is calculated similarly, obtaining

$$N_{\mathrm{op\_neuron}}^{\mathrm{output}} = N_h + (N_h - 1), \qquad (5)$$

and, therefore, for the whole output layer, the number of operations is

$$N_{\mathrm{op\_output}} = N_{\mathrm{op\_neuron}}^{\mathrm{output}} \cdot N_o = \left[ N_h + (N_h - 1) \right] \cdot N_o. \qquad (6)$$

The total operation count is the sum of the previous counts, giving as a result:

$$N_{\mathrm{oper}} = N_{\mathrm{op\_hidden}} + N_{\mathrm{op\_output}} = \left[ N_i + (N_i - 1) \right] \cdot N_h + \left[ N_h + (N_h - 1) \right] \cdot N_o. \qquad (7)$$

Fig. 4 Complexity of the MLP-based nonlinearities compensation in terms of the number of neurons in the hidden layer. The configurations of the MLP operating on a single polarization (SP) with 10 neurons (136 operations) and operating on dual polarizations (DP) with 40 neurons (596 operations) are also identified

The previous expression can be particularized for the two contemplated cases, that is, the MLPs processing each polarization independently and the MLP processing the two polarizations together:

$$N_{\mathrm{oper}} = \begin{cases} 2\,(7N_h - 2), & \text{each polarization processed individually}\\ 15N_h - 4, & \text{both polarizations processed simultaneously.} \end{cases} \qquad (8)$$

Note that we included a factor of 2 in the single-polarization case to account for the processing of the two polarizations (even if they are corrected independently).
In Fig. 4, we show graphically the per-symbol complexity of the MLP-based compensator in terms of the number of neurons in the hidden layer when each polarization is processed individually and when both are processed together. As expected from Eq. 8, the complexity of both approaches is similar when analyzed as a function of the hidden layer size; however, we should recall that processing each polarization independently requires a lower number of neurons (according to Fig. 2e, around 10) than processing the two polarizations together (around 40). For the sake of visibility, we have identified the complexity of single polarization processing with 10 neurons and of dual polarization processing with 40 neurons, which correspond to 136 and 596 operations, respectively; these figures can be checked with the short sketch below.
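Equations (7) and (8) are easy to verify numerically; the following short sketch reproduces the two highlighted operation counts.

```python
# Operation count of Eq. (7), specialized to the two equalizers of Eq. (8).
def n_oper(n_i, n_h, n_o):
    """Floating-point operations per symbol for a single-hidden-layer MLP."""
    return (n_i + (n_i - 1)) * n_h + (n_h + (n_h - 1)) * n_o

single_pol = 2 * n_oper(2, 10, 2)   # two 2-in/2-out MLPs, 10 hidden neurons
dual_pol = n_oper(4, 40, 4)         # one 4-in/4-out MLP, 40 hidden neurons
print(single_pol, dual_pol)         # prints: 136 596
```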

5 Conclusions

In this paper, we have employed MLPs to compensate for the nonlinear distor-
tion in 175 km-long unrepeated digital coherent systems employing DP-16QAM.
In particular, we used two different MLPs, one of them operating on each polarization independently and another that processed both polarizations at the same
time. Simulation results reveal that, indeed, MLPs are able to mitigate the nonlinear
distortion partially. Furthermore, we could observe that the MLP that operated on
the two polarizations simultaneously outperforms the MLP that only processed one
polarization because, in addition to SPM, it can also mitigate the XPM caused by the
orthogonal polarization. This performance enhancement, however, is achieved at the expense of a higher computational cost. Therefore, the network designer can choose between superior performance at a higher cost or poorer performance at a reduced cost.

Acknowledgements The authors thank the Sao Paulo Research Foundation (grant number
15/24517-8) and The National Council for Scientific and Technological Development.

References

1. Cisco forecast (2016) Technical report. Cisco
2. El-Nahal FI (2018) Coherent quadrature phase shift keying optical communication systems.
Optoelectron Lett 14(5):372–375
3. Kikuchi K (2011) Digital coherent optical communication systems: fundamentals and future
prospects. IEICE Electron Express 8(20):1642–1662
4. Implementation agreement 400ZR (2020) Technical report. OIF
5. Agrawal G (2000) Nonlinear fiber optics. Springer, Berlin
6. Ip E, Kahn J (2008) Compensation of dispersion and nonlinear impairments using digital
backpropagation. J Lightw Technol 26(20):3416–3425
7. Gao G, Zhang J, Gu W (2013) Analytical evaluation of practical DBP-based intra-channel
nonlinearity compensators. Photon Technol Lett 25(8):717–720
8. Giacoumidis E, Aldaya I, Jaraajreh M, Tsokanos A, Le S, Farjady F, Jaouen Y, Ellis A, Doran N
(2014) Volterra-based reconfigurable nonlinear equalizer for coherent OFDM. Photon Technol
Lett 26(14):1383–1386
9. Aldaya I, Giacoumidis E, Tsokanos A, Jarajreh M, Wen Y, Wei J, Barry L (2020) Compensation
of nonlinear distortion in coherent optical OFDM systems using a MIMO deep neural network-
based equalizer. Opt Lett 45(20):5820–5823
10. Kurokawa Y, Kyono T, Nakamura M (2020) Polarization tracking and optical nonlinearity
compensation using artificial neural networks. In: Opto-electronics and communications con-
ference (OECC). IEEE Press, pp 1–3
11. Da Silva LM, De Paula R, De Oliveira JA, Santos M, Penchel R, Perez GG, Aldaya I (2021)
Nonlinear phase noise compensation in single-span digital coherent optical systems employing
artificial neural networks. In: International optics and photonics conference (SBFoton). IEEE,
pp 1–4
12. Aldaya I, Marim L, Borges L, Costa C, Abbade M (2020) Fiber-induced nonlinear limitation
in 400-Gbps single-channel coherent optical interconnects. In: Brazilian symposium of signal
processing and telecommunications, pp 1–4
Chapter 23
Comparative Analysis of Cognitive
Services in Popular Cloud Platforms

Preethi Sheba Hepsiba Darius, K. Krishna Sowjanya, V. N. Manju, Sanchari Saha, Paramita Mitra, S. Aswathi, Bhuvanesh Bhattarai, and Shreekanth M. Prabhu

1 Introduction

The root word for 'cognition' in Latin, 'cognoscere', translates to learn, to recognize, to be acquainted with, to know, to find to be, and to inquire or examine. Cognitive computing helps human experts by delving into the complexity of big data, providing support beyond what either humans or machines can offer on their own [1]. Prabhu [2] explains that it works with reality (data) and knowledge (information) and turns models into reality by perception, induction, conception, and deduction. Leading

P. S. H. Darius (B) · K. Krishna Sowjanya · V. N. Manju · S. Saha · P. Mitra · S. Aswathi · B. Bhattarai · S. M. Prabhu
CMR Institute of Technology, Bengaluru, Karnataka 560037, India
e-mail: preethisheba.h@cmrit.ac.in
K. Krishna Sowjanya
e-mail: sowjanya.k@cmrit.ac.in
V. N. Manju
e-mail: manju.vn@cmrit.ac.in
S. Saha
e-mail: sanchari.s@cmrit.ac.in
P. Mitra
e-mail: paramita.m@cmrit.ac.in
S. Aswathi
e-mail: aswathi.s@cmrit.ac.in
B. Bhattarai
e-mail: bhbh19cs@cmrit.ac.in
S. M. Prabhu
e-mail: shreekanth.p@cmrit.ac.in


cloud providers capitalize on offering cognitive APIs to developers and the global
cognitive market share is set to reach USD 15.28 Billion by 2023 [3]. Among the
key market players in cognitive services are IBM, Microsoft, AWS, and Google [4].
Microsoft Azure categorizes cognitive API as speech, language, vision, and deci-
sion. Table 1 presents the counterparts in Google, Amazon, and IBM Watson. Figure 1
shows the various end-user applications that use these cognitive services.
This paper presents a comparative analysis of the features, pros, and cons of cognitive APIs for speech, language, vision, and decision among the key players in cloud platforms. The scope of application and limitations of these APIs are examined through case studies, and the tools and techniques required to develop custom APIs are demonstrated.

Table 1 Comparison of cognitive APIs in Azure, Google, Amazon, and IBM Watson

| Category | Azure | Google | Amazon | IBM Watson |
|---|---|---|---|---|
| Speech | Speech to text | Speech to text | Amazon Transcribe | Speech to text |
| | Text to speech | Text to speech | Amazon Polly | Text to speech |
| | Speech translation | Translation AI | Amazon Translate | Language Translator |
| | Speaker recognition | Speech to text (includes speaker diarization) | Amazon Transcribe | Speech to text |
| Language | Entity recognition | Cloud Natural Language | Amazon Kendra | Natural Language Understanding |
| | Sentiment analysis | Cloud Natural Language | Amazon Comprehend | Natural Language Understanding |
| | Question answering | DialogFlow | Amazon Mechanical Turk | IBM Watson Assistant |
| | Conversational language understanding | Media Translation | Amazon Lex | Natural Language Understanding |
| | Translator | Translation AI | Amazon Translate | Language Translator |
| Vision | Computer vision | Video AI/Vision AI | Amazon Rekognition/Amazon Lookout | Watson Visual Recognition |
| | Custom vision | Video AI/Vision AI | AWS Panorama | Watson Visual Recognition |
| | Face API | Video AI/Vision AI | Amazon Rekognition | Watson Visual Recognition |
| Decision | Anomaly detector | Timeseries Insights API | Amazon Lookout for Metrics/Amazon Fraud Detector | Anomaly detection |
| | Content moderator | Perspective API | Amazon Rekognition | |
| | Personalizer | | Amazon Personalize | |

Fig. 1 End-user applications of cognitive services APIs: diagnosis and treatment, quality management, safety and security management, supply chain maintenance, marketing analysis, and predictive maintenance

2 Cognitive APIs

Cloud cognitive APIs are the enablers of smart cities, Industry 5.0, smart homes,
and digital transformation in the economy and ecosystem among many others. The
broad classes of speech, language, vision, and decision APIs are discussed below.

2.1 Speech API

Virtual assistants for the visually impaired by Sultan et al. [5], real-time conversion of speech to sign language by Jadhav et al. [6], instruction delivery in industrial augmented reality environments described by Tseng [7], AI chatbots implemented by Prasad et al. [8], home automation, video narration, and voice-overs all rely heavily on the efficiency of speech APIs.
Azure offers options to create natural voices that can express emotions and
create custom models. The speech SDK is available in multiple programming
languages and works with local devices or Azure Blob storage. These capabilities are
enabled through speech-to-text, speech translation, and text-to-speech with speaker
recognition APIs.
IBM Watson offers speech-to-text services where users can customize the audio's language, format, and sampling rate. In text-to-speech, voices are smooth, with dialect
and language-appropriate rhythm and phrasing. When used with IBM Assistant, call
centers at MRS BPO report a 20% increase in revenue. Google’s Speech-to-text
provides customization and domain-specific trained models (voice control, phone
call, video transcription) for both public and private clouds. HSBC is one of the clients
that use this solution in every Cantonese-English call center that presents terms and
conditions [9]. Another speech-to-text service, Amazon Transcribe, adds punctuation and number normalization, recognizes multiple speakers, and attributes their speech in the transcript. It has been used successfully for transcribing conversations between health-care providers and patients, subtitling videos, and generating call analytics. Amazon Polly turns text into realistic speech, with its neural and standard voices priced differently. A sketch of a Transcribe request is shown below, and Table 2 presents the comparison of the various speech APIs.
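As an illustration of how such a service is invoked, the following is a hedged sketch of a batch transcription job with Amazon Transcribe through boto3; the job name, bucket URI, and speaker settings are placeholders, and valid AWS credentials are assumed.

```python
# Minimal sketch of a batch transcription request with Amazon Transcribe
# via boto3. The job name and the S3 URI are placeholders; an existing
# audio object and valid AWS credentials are assumed.
import time
import boto3

transcribe = boto3.client("transcribe", region_name="us-east-1")
transcribe.start_transcription_job(
    TranscriptionJobName="demo-call-001",
    Media={"MediaFileUri": "s3://my-bucket/call.mp3"},
    MediaFormat="mp3",
    LanguageCode="en-US",
    Settings={"ShowSpeakerLabels": True, "MaxSpeakerLabels": 2},
)

while True:  # poll until the job leaves the IN_PROGRESS state
    job = transcribe.get_transcription_job(TranscriptionJobName="demo-call-001")
    status = job["TranscriptionJob"]["TranscriptionJobStatus"]
    if status in ("COMPLETED", "FAILED"):
        break
    time.sleep(5)
print(status)
```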

Table 2 Comparison of various speech APIs

| | AWS [10] | Azure [11] | Google Cloud [12] | IBM Watson [12] |
|---|---|---|---|---|
| Models | Neural | Neural, custom neural | Neural, custom | Neural, custom |
| Language support | 27+ dialects | Neural voice with emotion expression, 78+ dialects | 120 languages | 16 languages |
| SDK | C++, Go, Java, JS, Kotlin, .NET, PHP, Ruby, Rust, Swift | C#, C++, Go, Java, JS, Python, etc. | Java, JS, Python, Node.js, Ruby, Go, .NET, PHP | Java, Node.js, Python, .NET |
| Pricing | Amazon Transcribe: varies depending on domain, region, real-time, batch. Amazon Polly: "A Christmas Carol" by Charles Dickens (~165k characters, 64 pages) costs $0.66 with standard voices and $2.64 with neural voices | Speech to text: Rs. 82 per audio hour; Text to speech: approx. Rs. 1300 for real-time synthesis using neural nets; Speech translation: Rs. 204 per audio hour; Speaker recognition: Rs. 820 per 1000 transactions for identification | $0.006 per 15 s beyond 60 min; monthly usage is capped at 1 million minutes per month | Text to speech: USD 0.02 per thousand characters; Speech to text: USD 0.01/minute beyond 1,000,000 minutes |
| Free tier | Available on the free tier for 12 months within usage quotas | Text to speech: 20 transactions per second; Transcription: 1 concurrent request; Speech to text: not available; Model customization: 2 datasets, up to 300 requests per minute | $300 in free credits to spend on Speech-to-Text; 60 min of transcribing and analyzing audio per month | Text to speech: 10,000 characters per month, 35 neural voices; Speech to text: 500 min per month, 38 pre-trained speech models |

2.2 Language API

Natural language processing (NLP) comprises methods for speech and text processing that enable the automatic analysis and representation of human language, as described by Cambria and White [13]. The recognition of entities, sentiment anal-
ysis, conversational language understanding, and translation services are important
features in language APIs. Dale [14] states that basic tasks such as morphological and syntactic analysis are provided by standard cloud APIs.
The various features of Language API are given below.
• Sentiment analysis determines whether the emotional opinion of a text is positive, negative, or neutral.
• Entity analysis identifies proper nouns such as public figures or landmarks and common nouns such as schools and buildings.
• Entity sentiment analysis identifies the emotional opinion about that entity.
• Syntactic analysis extracts linguistic information and provides this information
in tokens.
• Content classification analyses text content and assigns it to one of several content
categories.
Straightforward deployment of pipelines, easy upload and storage of information,
parallelization independent of the algorithm, load balancing, security and fault toler-
ance were listed as the technological blueprint required for providing NLP as a cloud
service by Pais et al. [15].
Popular NLP APIs include Amazon Comprehend [16], Microsoft Azure Cognitive
Services [17], and Google Cloud Natural Language [18].
The Amazon Comprehend service identifies entities and targeted sentiment with a confidence level, and detects the dominant language of a text from among hundreds of languages. Syntax analysis and topic modeling are also supported. If a customer comment is to be analyzed, assuming 500 characters and 6 units per request charged at $0.0001 per unit, the cost will be $6.00. A sample output from Amazon Comprehend is shown in Table 3 for the consumer reviews in the tutorial [19]. The overall sentiment is Positive, as it has the highest sentiment score compared with the Negative, Neutral, and Mixed scores.
Microsoft Azure Cognitive Services has several applications that can analyze
sentiment and identify the language of a given text by using Azure Text Analytics

Table 3 Sentiment analysis of consumer reviews using Amazon Comprehend [19]

'Sentiment': 'POSITIVE'
'SentimentScore':
  'Positive': 0.762
  'Negative': 0.066
  'Neutral': 0.147
  'Mixed': 0.024

API. Azure Language Understanding service can understand things like user intent.
Google Cloud Natural Language works on emails, chat, and social media to iden-
tify entities, and perform sentiment and syntax analysis and categorization. Google
AutoML Natural Language allows users to provide training data to create custom
machine-learning models for users with more specialized needs. Another notable API is Diffbot [20], which precisely extracts data from websites, while MonkeyLearn [21] automates workflows on unstructured data.

2.3 Vision API

Computer Vision (CV) is a technology that allows a machine to detect and recognize people, places, and things in a given image with human-like accuracy at higher speed and efficiency. Often, with the help of machine learning models, it analyses images, identifies and classifies their features, and provides useful insights to the user. It is used mostly in domains such as autonomous robots, medical imaging analysis, and identifying people on social media.
AWS provides a service called Amazon Rekognition, which offers deep-learning-based visual search and image classification. AWS computer vision offers content moderation, face comparison and search, labeling, celebrity recognition, video segment detection, face detection and analysis, and text detection. It can be used to detect inappropriate content in videos and images, verify the identity of a celebrity online, and analyze and streamline media content. It supports JPEG and PNG image formats, and the resolution should be 320 × 240 to 640 × 480 or higher.
The computer vision service in Microsoft Azure analyses the content of images or videos, extracts information, and provides useful insights to the user. The Azure cloud platform provides various computer vision services, consisting of text extraction, image understanding, and spatial analysis, with flexible deployment models on the cloud. Azure can identify around 10,000 objects in an image.
Azure provides a cloud-based Computer Vision API with the flexibility of
choosing the inputs and the algorithms based on the user’s choice. The prominent
services provided are Optical Character Recognition (OCR), image analysis, face
detection and recognition, and spatial analysis. A sample of the vision API is shown
in Fig. 2.
Vision Studio by the Microsoft Azure platform lets the user explore, build and
integrate the features from Azure Computer vision. This tool uses REST APIs to
embed the services into the applications.
Fig. 2 Image captioning provided by Azure's vision API: "a yellow car on the street" with 55% confidence [22]

Google Cloud Platform (GCP) provides a computer vision environment, Vision AI, that allows the user to create CV applications or derive insights from images and videos. It supports these operations with the help of pre-trained APIs, AutoML, or custom models built by the users. It is accessible through REST and Remote Procedure Call (RPC) APIs. It can detect objects, read printed and handwritten text, and build valuable metadata in the image catalog. GCP also supports the Vertex AI Vision environment, which can be used to build CV applications with custom ML models for unique customer needs, optimized for accuracy, latency, and size; it takes input only through Streams to ingest real-time video data. Table 4 provides a comparative view of the different features of computer vision services provided by popular cloud computing platforms.

Table 4 Computer vision API features in popular cloud computing platforms

| Feature | AWS | Azure | Google Cloud |
|---|---|---|---|
| Supported APIs | Amazon Rekognition API | REST API | REST and RPC API |
| Tool used | Amazon Rekognition | Vision Studio | Vision AI/Vertex AI |
| Billing | $0.01 per 1000 face vectors per month | 0–1M transactions: $1 per 1000 transactions | Based on user specifications |
| Free credits | 5000 images per month | 5000 transactions free per month | $300 in free credits |
| Image formats supported | JPEG, PNG | JPEG, PNG, GIF, BMP | JPEG, PNG, GIF, BMP, ICO |
| Maximum image size | 4096 × 4096 | 10,000 × 10,000 | 1024 × 1024 |
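As a concrete example of one of these services, the following is a minimal label-detection sketch with the google-cloud-vision client library; the image path is a placeholder and application default credentials are assumed.

```python
# Hedged sketch of label detection with the Google Cloud Vision API
# (google-cloud-vision client library); "street.jpg" is a placeholder.
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("street.jpg", "rb") as f:
    image = vision.Image(content=f.read())

response = client.label_detection(image=image)
for label in response.label_annotations:   # e.g. "Car", "Street", ...
    print(label.description, round(label.score, 2))
```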

2.4 Decision API

Anomaly APIs
Anomaly detection is a process in machine learning which identifies events, data
points, and observations that deviate from a dataset’s normal behavior. In indus-
trial applications, Lima et al. [23] state that it is very challenging to find anomalies
from unlabeled time series data. In supervised anomaly detection, labeled data that represents previous failures or anomalies is used to learn the model. In unsupervised detection, no labeled data is provided. In semi-supervised anomaly detection, a small amount of labeled data is provided to validate the model and select the best-performing model trained on normal data (data with no anomalies). A sample
output for a univariate dataset using the IBM Watson API [24] is shown in Table 5, using the PredAD algorithm and the Chi-square labeling method. The anomaly score refers to the level at which a data point deviates from the normal data. If the anomaly score is high, a label of −1 is returned; a returned label of 1 means the point is normal. A minimal sketch with the same labeling convention is shown after Table 5.
A comparison of various features is presented in Table 6.

Table 5 Sample output for anomaly detection on a univariate dataset [24]

Anomaly detection algorithm: PredAD (unsupervised time series prediction model)
Labeling method: Chi-square

Normal:
{"timestamp": "2017-01-01 05:45:00",
 "value": {"anomaly_label": [1.0], "anomaly_score": [2.9599127858341574]}}

Anomaly:
{"timestamp": "2017-01-01 21:45:00",
 "value": {"anomaly_label": [-1.0], "anomaly_score": [4.011492546951829]}}
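For readers who want to reproduce the 1/−1 labeling convention of Table 5 locally, the following is a minimal unsupervised sketch using scikit-learn's IsolationForest on a synthetic univariate series; it is an illustrative stand-in, not the PredAD algorithm used by the IBM Watson API.

```python
# Unsupervised anomaly detection sketch with scikit-learn's IsolationForest,
# mirroring the 1 / -1 labeling convention of Table 5. The synthetic
# univariate series is an illustrative stand-in for real telemetry.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
values = rng.normal(loc=10.0, scale=1.0, size=500)
values[250] = 25.0                               # inject one anomaly

X = values.reshape(-1, 1)
model = IsolationForest(contamination=0.01, random_state=0).fit(X)
labels = model.predict(X)                        # 1 = normal, -1 = anomaly
scores = -model.score_samples(X)                 # higher = more anomalous
print(np.where(labels == -1)[0])                 # should include index 250
```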

Table 6 Comparison of features in anomaly detection

| IBM Z® Anomaly Analytics with Watson 5.1 [25] | Microsoft Azure Cognitive Services Anomaly Detector | AWS Cost Anomaly Detection |
|---|---|---|
| Metric-based anomaly detection and visualization | Powerful inference engine | Create pre-built or custom monitors |
| Integrated log anomaly detection [26] | Automatic detection | Set alert subscriptions |
| Topology service and hybrid correlation | Customizable settings [27] | Receive alerts when anomalous spend is detected [28] |
| Univariate and multivariate anomaly detector | | |

Content Moderator API

Nowadays, user-generated content (UGC) such as social media posts and content published on the web in the form of text, images, or video needs to be routinely checked for offensive or undesirable material, as pointed out by Kharb [29]. A content moderator API provides these services and flags content; the application then enforces appropriate measures on the flagged content.
Content moderation APIs use AI models to detect sensitive content in bodies of text, including those shared via online platforms or social media. Azure Content Moderator follows a freemium model: a free instance allows 1 transaction per second, while standard instances allow 10 transactions per second. Use cases of content moderator APIs are smart media monitoring, protecting advertisers, protecting brand reputation, and increasing brand loyalty and engagement. Some limitations of content moderation APIs are that the moderation process is not fully automated, mistakes occur in the identification of harmful content, and there are contextual variations in speech, images, and cultural norms. A request sketch against the Azure endpoint is given below.
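The sketch below shows the general shape of a text-screening request against the Azure Content Moderator REST endpoint; the resource host and subscription key are placeholders, and the exact response schema should be checked against the current service documentation.

```python
# Hedged sketch of a text-screening request to the Azure Content Moderator
# REST endpoint; endpoint host and subscription key are placeholders for a
# provisioned resource.
import requests

endpoint = "https://<your-resource>.cognitiveservices.azure.com"
url = endpoint + "/contentmoderator/moderate/v1.0/ProcessText/Screen"
headers = {
    "Ocp-Apim-Subscription-Key": "<your-key>",
    "Content-Type": "text/plain",
}
resp = requests.post(url, params={"classify": "True"},
                     headers=headers, data="Is this a crude or offensive post?")
print(resp.json())   # classification scores and any flagged terms
```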
Personalizer API
The future of the digital experience is personalization, which harnesses customer data to increase engagement, loyalty, and advocacy. Al-Zoube [30] discusses assessment-based personalized learning in the cloud. Some personalizer APIs are the Microsoft Azure Cognitive Services Personalizer API and Amazon Personalize. In Microsoft's freemium tier, 50,000 free transactions per month are allowed and a 10 GB storage quota is available; in standard instances, a charge per thousand transactions applies. In the Amazon Personalize free trial, data processing and storage of up to 20 GB per month per eligible AWS Region may be used; in paid services, prices are per 100,000 users. Uses of a personalizer include intent clarification and disambiguation, default suggestions for menus and options, and bot traits and tone. Some drawbacks of personalizer APIs are that the setup process is complex, the documentation is limited, and the pricing plans are not developer or customer friendly. A retrieval sketch is shown below.
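A minimal sketch of retrieving recommendations from a trained Amazon Personalize campaign via boto3; the campaign ARN and user id are placeholders, and a deployed campaign is assumed.

```python
# Sketch of retrieving recommendations from an Amazon Personalize campaign
# via boto3; the ARN and user id below are placeholders.
import boto3

runtime = boto3.client("personalize-runtime", region_name="us-east-1")
response = runtime.get_recommendations(
    campaignArn="arn:aws:personalize:us-east-1:123456789012:campaign/demo",
    userId="user-42",
    numResults=5,
)
for item in response["itemList"]:
    print(item["itemId"], item.get("score"))
```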

3 Case Studies

There are numerous case studies of success stories in using Cognitive APIs. Two
user stories are presented here.

3.1 Equadox Uses Cognitive Services to Help People with Language Disorders

Equadox, a French company, developed an application called Helpicto to help children with autism communicate using pictograms and associated keywords. It was developed with .NET, an Azure SQL database, and Microsoft Cognitive Services. Figure 3 shows the workflow: an image is uploaded, Helpicto keeps results with over 95% accuracy, and the most accurate keywords are chosen for display.
A screenshot of Helpicto is depicted in Fig. 4, showing the pictogram for the question "Do you want to draw?", which caregivers can repeat to the child, who can respond with "Yes" or "No".

Fig. 3 Workflow of the Helpicto application [31]: an Azure function analyzes the image, downloads the blob from Azure Storage, and translates the keywords via the Cognitive Services Computer Vision and Translator APIs, returning JSON for results with accuracy above 0.95

Fig. 4 Screenshot of the Helpicto application in action [31]

3.2 IBM’s Cognitive Assistant for Siemens

Siemens and IBM created CARL [24], a Human Resources (HR) agent powered by IBM Watson Discovery and IBM Watson Assistant. The Siemens HR division serves a workforce of around 400,000. CARL was developed as a single point of contact for all HR-related questions, as shown in Fig. 5.

Fig. 5 CARL-your cognitive HR Assistant (created by SIEMENS) in action [24]



It initially addressed the most common topics like sick leave or vacations, but it is now customizable, which allows CARL to meet employees' unique needs. It is deployed in over 20 countries, is conversational in more than 200 topics, and responds to 1 million employee queries a month. It has made life easier for employees at Siemens, including the human resources department, and continues to evolve based on improvements and suggestions from HR staff.

4 Conclusion

Cognitive services APIs are revolutionizing the applications we use in day-to-day life: issuing voice commands, subtitling, recognizing the speaker, translating between languages, conversing with a chatbot in natural language to answer queries, captioning images, recognizing objects in images, moderating user content, and personalization are all made possible.
The use of these APIs in intelligent applications has also produced applications with social and economic impact in the healthcare, finance, automotive, and information technology industries, with significant increases in revenue and reductions in manpower and effort. Those suffering from various debilitating effects on their cognitive functions also benefit from applications that aid in the visual, auditory, and language processing areas.
The variety of APIs and varying pricing schemes in the various cloud platforms
are to be deliberated and considered before using an API for a particular problem. The
challenges ahead for developers and organizations in using cloud APIs for cognitive
services are as follows:
• Identifying the API that best suits their domain and needs.
• Comparing the prices of the API and quotas and determining which API to use.
• Developing custom models to improve performance for domain-specific needs.
• Analyzing which API can be used for the existing system (especially if it is not a
cloud-native application).
The success stories evident in several organizations that have adopted an intelligent
cloud solution do tip the scales in favor of cognitive services.

References

1. Cognitive Human-Computer Interaction - IBM (2022). https://researcher.watson.ibm.com/researcher/view_group.php?id=5695. Accessed 23 Oct 2022
2. Prabhu SM (2019) Making sense of AI and ML. https://www.researchgate.net/publication/336
994436_Making_Sense_of_AI_and_ML. Accessed 23 Oct 2022
3. Cognitive Services Market Size, Share and Global Market Forecast to 2023 | Markets and
Markets (2018). https://www.marketsandmarkets.com/Market-Reports/cognitive-services-mar
ket-155826417.html. Accessed 23 Oct 2022

Chapter 24
A Survey on Efficient Neural Network
Compression Techniques

Nipun Jain, Medha Wyawahare, Vivek Mankar, and Tanmay Paratkar

1 Introduction

The increase in computational capabilities of devices has tremendously impacted deep learning research, resulting in highly accurate models that can even surpass human-level performance. However, such models tend to have many parameters, which results in large sizes and high computational requirements for inference. Most systems that require real-time inference (such as IoT and robotics systems) and constrained systems on the cloud possess only limited computational resources. Neural network compression techniques therefore play a crucial role in deploying highly accurate deep learning models onto such resource-constrained systems. The objective of these techniques is to shrink the size of neural networks without compromising performance.
The implementation of deep learning in embedded devices is the key to intelligent
automation systems, for example, self-driving cars. Such applications would require
the use of more sophisticated neural network architectures like convolutional neural
networks (CNN) [1, 2], recurrent neural networks (RNN) [3, 4], and transformers.
This contributes to the increase in the system’s memory and storage requirements.
By using the right compression techniques, the size of these models can be reduced.

N. Jain (B) · M. Wyawahare · V. Mankar · T. Paratkar


Vishwakarma Institute of Technology, Bibwewadi, Pune 411037, India
e-mail: nipun.jain18@vit.edu
M. Wyawahare
e-mail: medha.wyawahare@vit.edu
V. Mankar
e-mail: vivek.mankar18@vit.edu
T. Paratkar
e-mail: tanmay.paratkar18@vit.edu


Considering the example of computer vision tasks, convolutional neural network (CNN) [1, 2]-based model architectures are proven to provide highly accurate solutions for research challenges like image classification, object detection, image segmentation, and regression. Table 1 shows the Top-1 accuracy for various CNN-based architectures. We can see that accuracy increases with model size; thus, the size of a neural network model has some positive correlation with its accuracy.
According to Moore's law [12], hardware capacity doubles every 2 years, while the cost of a semiconductor fabrication plant doubles every 4 years. Figure 1 shows that the amount of compute required by major AI systems doubled roughly every 3.4 months [13].
The increase in hardware capacity has its own limitations, while model architectures can reach any level of sophistication. This clearly shows that, along with the growing research effort toward better model accuracy, increased research into better NN compression techniques is a must.

Table 1 Accuracy and number of parameters of image classification models


Architecture Year Top-1 accuracy (%) Parameters (M)
DenseNet-169 [5] 2017 76.2 14
Inception-V3 [6] 2016 78.8 24
Inception–ResNetV2 [7] 2017 80.1 56
PolyNet [8] 2017 81.3 92
SENet [9] 2018 82.7 146
GPipe [10] 2018 84.3 557
ResNeXt-101 32 × 48d [11] 2019 85.4 892

Fig. 1 Standard neural network model architectures by year and the number of petaflops required
(for training) [13]

We know that the main goal in most real-world applications involving deep learning inference is to attain maximum accuracy with the shortest possible run time. As a model architecture grows in complexity, the number of floating-point operations (FLOPs) also increases, raising demands on the storage and processing capacities of a system. Thus, smaller models with better or similar accuracy/performance are the key to the future.
Consider the example use case of image captioning: image captioning [14, 15] is a technique that produces human-readable textual descriptions of images using techniques like natural language processing and deep learning. Remote sensing images [16, 17] are captured from high altitude, e.g., by satellites, where arguably the detection task involves a higher degree of complexity than default object detection/classification settings. To deal with this, various neural techniques [18, 19] are employed to achieve a good success rate while keeping the model lightweight. Similarly, there are various ways to reduce the size of models and thereby decrease the running time at inference.
From the model point of view, techniques like quantization, pruning, knowledge distillation, and efficient model architectures can be used. These techniques aim to shrink the size of the neural network model by making suitable architectural changes. Quantization is the process of approximating the high-bit floating-point numbers used in a neural network with lower-precision numbers; for example, if we change the learned weight parameters from FP32 to FP16, the overall size of the model is reduced.
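As a minimal, hedged illustration of this idea (our own Python/PyTorch sketch with an arbitrary toy architecture, not a model from the works surveyed), casting parameters from FP32 to FP16 roughly halves the parameter storage:

# Hedged sketch: shrinking a trained model's storage by casting FP32
# parameters to FP16 (post-training). The layer sizes are illustrative.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

def param_bytes(m: nn.Module) -> int:
    # Total storage consumed by the learned parameters.
    return sum(p.numel() * p.element_size() for p in m.parameters())

print(f"FP32 size: {param_bytes(model)} bytes")
model_fp16 = model.half()  # cast every parameter from FP32 to FP16
print(f"FP16 size: {param_bytes(model_fp16)} bytes")  # roughly half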
Pruning is the process of selectively eliminating redundant connections between the neurons in a neural network. This decreases the model size and the number of computations required during inference. Knowledge distillation is the process of training a smaller model by using a larger model; the goal is to achieve similar accuracy with the smaller model so that it can be used for inference rather than the original larger model. Efficient model architecture design aims to create smaller and more efficient models that produce results similar to larger and more sophisticated model architectures.
The remaining paper is outlined as follows: In Sect. 2, we summarize and discuss quantization along with its implementation and results on various deep learning tasks such as object detection, speech recognition, and machine translation. In Sect. 3, we discuss the pruning method for NN compression, including an analysis of its performance on various tasks with respect to standard datasets like CIFAR-10 [20] and ImageNet [21]. In Sect. 4, we summarize the knowledge distillation method and analyze its performance and applications on various tasks. In Sect. 5, we discuss various efficient neural architectures and summarize their applications across deep learning tasks. In Sect. 6, we compare and analyze the observed results of all the mentioned methods, and finally, in Sect. 7, we provide a conclusion and our recommendations on the above-discussed compression techniques for deep learning architectures.

2 Quantization

The purpose of quantization is to minimize the size of a model by converting a neural network's weights and activations into lower-precision numbers. Thus, all the internal computations are performed with lower-bit values and the trained parameters are stored in a smaller precision format. This helps in reducing the computation and memory requirements of a model.
Different techniques like mixed-precision training [22], BinaryConnect [23], near-lognormal gradients [24], adaptive gradient quantization [25], etc., can be used to achieve this goal. Sharan Narang et al. [22] presented the mixed-precision training approach, which utilizes mixed 16-bit and 32-bit floating-point values in a model during training to make it execute faster and consume less memory while retaining high accuracy and shrinking the model's overall size. Matthieu Courbariaux et al. [23] proposed a methodology in which, during forward and backward propagation, a deep neural network is trained with binary weights, while the precision of the stored weights in which gradients are accumulated is preserved.
The method described by Brian Chmiel et al. [24] focuses on approximating neural gradients, which have statistical properties significantly different from those of typical weights and activations. The goal of adaptive gradient quantization [25] is to update compression schemes in parallel by efficiently computing sufficient statistics of a parametric distribution. Of these strategies, mixed-precision training is the most extensively used in deep learning applications.
Neural networks typically use 32-bit floating-point values (FP32) to store the weights and activation gradients during forward and backward propagation. Reduced precision describes the idea of using 8- or 16-bit values (INT8, FP16, etc.) instead of 32-bit floating-point values. In the mixed-precision method, we switch between 16- and 32-bit precision across the layers of a neural network.
Sharan Narang et al. [22] showed that this method is useful for reducing the model size without actually losing accuracy or changing hyperparameters. Here, the loss of any critical information is prevented by strategically accumulating FP16 products into FP32, using single-precision master weights and loss scaling. Because the neural network parameters are stored as FP16 in mixed precision, an FP32 master copy of the weights is kept and updated with the weight gradients during the optimizer step. This is done in order to match the accuracy of standard FP32 networks. Figure 2 shows the training iteration for a layer maintaining an FP32 master copy of weights.

Fig. 2 Training iteration for a layer in mixed precision



Compared to FP16, FP32 has a much higher dynamic range, making it possible to avoid numeric overflow and underflow. In FP16, any value above 65,504 becomes infinity (overflow) and any value below about 6.0 × 10^−8 becomes zero (underflow). The idea of loss scaling is to multiply the loss value by a suitable factor so that overflow and underflow issues are avoided. Finally, single-precision outputs are converted to half precision before being stored in memory, retaining model correctness.
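For concreteness, a hedged sketch of this training loop using PyTorch's automatic mixed precision follows; the model, data, and hyperparameters are placeholders rather than the setup of [22], but autocast and GradScaler implement the same FP16-forward, scaled-loss, FP32-master-weight recipe:

# Hedged sketch of mixed-precision training with loss scaling, in the spirit
# of [22]. Model, data, and hyperparameters are illustrative placeholders.
import torch
import torch.nn as nn

device = "cuda"  # autocast to FP16 targets CUDA devices
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()  # handles dynamic loss scaling

for step in range(100):
    x = torch.randn(32, 128, device=device)         # dummy batch
    y = torch.randint(0, 10, (32,), device=device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                 # FP16 forward pass
        loss = criterion(model(x), y)
    scaler.scale(loss).backward()  # scale loss to avoid FP16 gradient underflow
    scaler.step(optimizer)         # unscale gradients, update FP32 master weights
    scaler.update()                # adjust the scale factor for the next step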
The mixed-precision training methodology works across a wide range of advanced tasks, such as object detection, speech recognition, and machine translation. Sharan Narang et al. [22] trained the Faster R-CNN model [26] using mixed precision with loss scaling and found that the model outperformed the baseline of 69.1% mAP on the Pascal VOC 2007 test set. Similarly, the Deep Speech 2 model for speech recognition, trained using mixed precision on an English dataset, achieved a 1.99 Character Error Rate (CER), close to the original baseline of 2.20 CER.
Along with object detection and speech recognition, mixed-precision training has also shown good results for machine translation. Figure 3a and b shows the training perplexity of a 3 × 1024 LSTM [27] model for the English-to-French translation task without and with the mixed-precision technique; three separate FP32 training runs are represented by ref1, ref2, and ref3. This suggests that, during training, the half-precision storage format may act as a regularizer.
BinaryConnect [23] is another popular quantization technique that has shown good results in test-time inference of DNN models trained on standard benchmark datasets like MNIST [28] and CIFAR-10 [20].
The stochastic version of BinaryConnect achieved an 8.27% error rate for DNNs trained on the CIFAR-10 [20] dataset. This shows that, despite using only a single bit per weight during propagation, performance is not only comparable to that of ordinary (non-regularized) DNNs but actually better, implying that BinaryConnect can be considered a regularizer.
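A hedged sketch of the BinaryConnect idea follows (our illustration, not the authors' code): weights are binarized to ±1 for propagation via a straight-through estimator, while the optimizer continues to update the full-precision master weights.

# Hedged sketch of BinaryConnect [23]: binary weights are used in the forward
# pass; gradients flow straight through to the real-valued master weights.
import torch

class BinarizeWeight(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        return torch.sign(w)   # deterministic binarization to +/-1

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output     # straight-through: gradient reaches real weights

class BinaryLinear(torch.nn.Linear):
    def forward(self, x):
        wb = BinarizeWeight.apply(self.weight)  # binary weights for propagation
        return torch.nn.functional.linear(x, wb, self.bias)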

3 Pruning

Over-parameterized networks are generally large networks that contain redundancies; pruning is used to remove these redundancies, which reduces the size of the model and increases its speed. Pruning can thus be defined as the removal of unused parameters from an over-parameterized network.
Works in structured pruning like data-driven sparse structure selection [29] and Hessian-aware pruning (HAP) [30] have contributed immensely to reducing the size and computational complexity of models.
A similar approach is proposed in AMC [31], which uses reinforcement learning to learn a model compression policy; the learned policy achieves a higher compression ratio and better accuracy than conventional rule-based compression policies.

Fig. 3 English to French translation network training perplexity (a without and b with mixed-precision training)
Similarly, in HAP [30], instead of pruning all components, only those components that are insensitive are pruned.
Pruning revolves around the idea of cutting away extra weights in order to reduce computational and memory expense [29]. The basic principle of pruning is to remove unnecessary weights using second-derivative information, which yields better results, much-improved processing speed, and a significant reduction in size. The importance of a weight is decided by ranking the neurons of the neural network, as explained in Optimal Brain Damage. Pruning is an iterative process so as to avoid removing important neurons; as neural networks are black boxes, this also ensures that a significant part of the network is not lost.
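As a concrete, hedged example of the simplest variant (magnitude pruning, our illustration rather than the method of any single work above), PyTorch's pruning utilities can zero out the smallest weights of a layer:

# Hedged sketch of magnitude-based pruning; the layer and sparsity level
# are arbitrary examples, not settings from the works surveyed here.
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(256, 128)
# Zero out the 30% of weights with the smallest L1 magnitude.
prune.l1_unstructured(layer, name="weight", amount=0.3)
print(float((layer.weight == 0).float().mean()))  # ~0.30 sparsity
# Make the pruning permanent by removing the reparameterization mask.
prune.remove(layer, "weight")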
AutoML for Model Compression (AMC) [31] finds the irrelevant weights and biases for each layer on the basis of sparsity. It uses reinforcement learning for an efficient search over the action space; the authors introduce a detailed reinforcement learning framework built on three components:
• The state space
• The action space
• Deep Deterministic Policy Gradient (DDPG)

Fig. 4 Architecture of the AutoML for model compression engine
As shown in Fig. 4, on the left, AMC replaces manual effort and makes model compression fully automated; on the right, framed as a reinforcement learning problem, it processes a pre-trained network (e.g., MobileNet [32]) layer by layer. To achieve both accuracy and low latency, AMC's engine uses a single non-RNN controller, which not only aids exploration using fewer GPU hours but also supports a continuous action space.
On VGG-16 [33], AMC [31] outperformed all heuristic methods by more than 0.9% and beat human experts by 0.6%, without manual effort. Even for MobileNet V2 [34], one of the best-designed models, accuracy can still be improved by 1% using AMC. AMC also raised the compression ratio of ResNet-50 [35] on ImageNet [36] from 3.4× to 5× without loss of performance (the pruned model's top-5 accuracy was 92.89%).
Guo et al. [37] used dynamic network surgery to prune parameters during training, but the irregular sparsity of the resulting weights limited them to yielding compression without faster inference in terms of wall-clock time.
$$\mathrm{Loss} = \frac{1}{N}\sum_{i}\Bigl(y_i - Q\bigl(s_i, a_i \mid \theta^{Q}\bigr)\Bigr)^2 \quad (1)$$

$$y_i = r_i - b + \gamma\, Q\bigl(s_{i+1}, a_{i+1} \mid \theta^{Q}\bigr).$$

In Eq. (1), γ is the discount factor; it is set to 1 so that short-term rewards are not over-prioritized.

4 Knowledge Distillation

Knowledge distillation is a technique used for transferring knowledge between two models. A larger model, the teacher, is first trained on the dataset, and a smaller model, the student, is trained to behave the same way as the teacher. Knowledge distillation has various applications, such as natural language processing, speech recognition, and object detection.
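A hedged sketch of the standard soft-target distillation loss is given below (a generic Hinton-style formulation for illustration; the temperature T and mixing weight alpha are placeholder choices, not values from the works discussed here):

# Hedged sketch of a soft-target distillation loss: the student matches the
# teacher's softened output distribution plus the ordinary hard-label loss.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft targets: KL divergence between softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients by T^2, as is conventional
    # Hard targets: cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard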
Various works in knowledge distillation, such as model compression via distillation and quantization, learning from noisy labels with distillation, and data-free knowledge transfer via DeepInversion, have contributed to the reduction and compression of neural networks.
Quantized distillation [38] uses a distillation loss from a trained teacher network during the training of the (quantized) student network.
Similarly, in learning from noisy labels with distillation [39], there are two datasets, a noisy one and a clean one; the objective is to train on the large noisy dataset with guidance distilled from the small clean dataset.
Data-free knowledge transfer via DeepInversion [40] involves two techniques: (a) DeepInversion and (b) Adaptive DeepInversion. DeepInversion is a knowledge distillation method used to synthesize images from networks trained on datasets like CIFAR-10 [20] and ImageNet [36]. To increase the diversity of these images, Adaptive DeepInversion is used; it avoids repetition of images, which helps maintain diversity.
As shown in Fig. 5, DeepInversion is applied to a ResNet50v1.5 model trained on ImageNet to synthesize images. These synthesized images, produced from the teacher network, are then used to train another ResNet50v1.5 model (the student) from scratch.
Images generated by DeepInversion are also applicable to data-free continual learning, a setting in which a model learns sequentially by acquiring knowledge from previous data. When images are generated by DeepInversion from a ResNet-50 pre-trained on the ImageNet dataset, the generated images are found to have high resolution.

Fig. 5 Images obtained by DeepInversion are used to train the ResNet-50 classifier



Compared across different networks, the images generated preserve ResNet-50's classification accuracy and are of high resolution with detailed features and textures. Also, when the Inception Score of DeepInversion is compared to that of DeepDream, DeepInversion performs better by 54.4 points.
The quality of DeepDream images can be enhanced by extending the image regularization with a new image feature distribution regularization term, which can be evaluated as

$$R(\hat{x}) = \sum_{l}\bigl\|\mu_l(\hat{x}) - \mathbb{E}\bigl(\mu_l(x) \mid X\bigr)\bigr\|_2 + \sum_{l}\bigl\|\sigma_l^2(\hat{x}) - \mathbb{E}\bigl(\sigma_l^2(x) \mid X\bigr)\bigr\|_2. \quad (2)$$

In Eq. (2), $\mu_l(\hat{x})$ and $\sigma_l^2(\hat{x})$ are the batch-wise mean and variance of the feature maps at layer $l$, and $\mathbb{E}(\cdot)$ and $\|\cdot\|_2$ denote the expected value and the $l_2$ norm, respectively.
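As a hedged sketch of how the term in Eq. (2) can be computed in practice (our PyTorch illustration, following the DeepInversion idea of using BatchNorm running statistics as the stored estimates of E(·|X); the hook registration and layer choices are our assumptions, not the authors' code):

# Hedged sketch: penalize the gap between the batch statistics of the
# synthesized images' feature maps and the BatchNorm running statistics.
import torch
import torch.nn as nn

def register_bn_hooks(model, activations):
    # Capture the input feature map of every BatchNorm2d via forward pre-hooks.
    for name, module in model.named_modules():
        if isinstance(module, nn.BatchNorm2d):
            module.register_forward_pre_hook(
                lambda m, inp, key=name: activations.__setitem__(key, inp[0])
            )

def feature_stat_loss(model, activations):
    # Eq. (2): match batch-wise mean/variance to the stored BN statistics.
    loss = 0.0
    for name, module in model.named_modules():
        if isinstance(module, nn.BatchNorm2d) and name in activations:
            feat = activations[name]                        # (N, C, H, W)
            mu = feat.mean(dim=(0, 2, 3))                   # batch-wise mean
            var = feat.var(dim=(0, 2, 3), unbiased=False)   # batch-wise variance
            loss = loss + torch.norm(mu - module.running_mean, 2) \
                        + torch.norm(var - module.running_var, 2)
    return loss

During synthesis, a total loss would combine the classification loss for a target label, image priors, and this feature term, optimizing the input pixels rather than the network weights.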
Along with the quality of images, diversity is also important to avoid redundancy. For this, an additional loss $R_{\text{compete}}$, based on the Jensen–Shannon (JS) divergence, is introduced for image generation:

$$R_{\text{compete}} = 1 - \mathrm{JS}\bigl(P_T(\hat{x}), P_S(\hat{x})\bigr), \quad (3)$$

$$\mathrm{JS}\bigl(P_T(\hat{x}), P_S(\hat{x})\bigr) = \frac{1}{2}\Bigl[\mathrm{KL}\bigl(P_T(\hat{x}), M\bigr) + \mathrm{KL}\bigl(P_S(\hat{x}), M\bigr)\Bigr].$$

In Eq. (3), $M = \frac{1}{2}\bigl(P_T(\hat{x}) + P_S(\hat{x})\bigr)$ is the average of the teacher and student distributions.
The top-1 accuracy of DeepInversion surpasses that of DeepDream by a significant margin for models like ResNet-18, Inception-V3, MobileNet V2 [34], and VGG-11. After adding the feature distribution regularization, accuracy improves by 40–69%. Upon using competition-based inversion, a further improvement in accuracy of 1–10% is observed, which brings the accuracy of the student close to that of a teacher trained on the CIFAR-10 dataset.
Quantized distillation [38] achieves better accuracy across an array of bit widths and architectures. It outperforms post-mortem quantization at 2-bit and 4-bit precision, and its accuracy is within 0.2% of the teacher at 8 bits on the larger student model, with only a small accuracy loss at 4-bit quantization.

5 Efficient Model Architecture

In recent times, there has been high demand for space-efficient neural networks. Various approaches like [41–45] are categorized as either compressing pre-trained networks or directly training small networks.

Fig. 6 Convolutional layer using batch norm and ReLU

MobileNet [46] is a type of network architecture that gives the model developer the freedom to choose a small network that matches the resource requirements of the application. Andrew G. Howard et al. primarily optimize for latency while designing small networks. Another efficient network is SqueezeNet [42], which makes use of a bottleneck approach to design an efficient architecture.
Figure 6 shows, on the left, a standard convolutional layer with batch norm (BN) and a rectified linear unit (ReLU); on the right, depth-wise and pointwise convolution layers, each also followed by BN and ReLU.
Depth-wise separable convolution is the basis of the MobileNet architecture. All its layers are followed by batch norm and ReLU nonlinearity, and the final spatial resolution is reduced to 1 using average pooling before the fully connected layers; in total, MobileNet has 28 layers. Even though MobileNet is already a very space-efficient and low-latency network, a simple width-multiplier parameter can be used to make it even smaller by uniformly shrinking the network at each layer.
Expression (4) gives the computational cost of a depth-wise separable convolution:

$$D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F \quad (4)$$

Expression (5) gives the cost when the width multiplier $\alpha$ is taken into consideration:

$$D_K \cdot D_K \cdot \alpha M \cdot D_F \cdot D_F + \alpha M \cdot \alpha N \cdot D_F \cdot D_F, \quad (5)$$

where $D_K$ is the spatial dimension of the (assumed square) kernel, $D_F$ is the spatial dimension of the feature map, $M$ is the number of input channels, and $N$ is the number of output channels.
Expression (6) gives the cost with both the width multiplier $\alpha$ and a resolution multiplier $\rho$:

$$D_K \cdot D_K \cdot \alpha M \cdot \rho D_F \cdot \rho D_F + \alpha M \cdot \alpha N \cdot \rho D_F \cdot \rho D_F. \quad (6)$$
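To make these expressions concrete, here is a small Python helper (our own illustration; the kernel size, feature-map size, and channel counts in the example are arbitrary) that evaluates the multiply-accumulate counts of Expressions (4)–(6):

# Hedged sketch evaluating Expressions (4)-(6); symbol names follow the text.
def depthwise_separable_cost(D_K, D_F, M, N, alpha=1.0, rho=1.0):
    M_a, N_a, D_Fr = alpha * M, alpha * N, rho * D_F
    depthwise = D_K * D_K * M_a * D_Fr * D_Fr   # per-channel spatial filtering
    pointwise = M_a * N_a * D_Fr * D_Fr         # 1x1 convolution combining channels
    return depthwise + pointwise

# Example: 3x3 kernel, 14x14 feature map, 512 -> 512 channels.
print(depthwise_separable_cost(3, 14, 512, 512))                       # Expression (4)
print(depthwise_separable_cost(3, 14, 512, 512, alpha=0.75))           # Expression (5)
print(depthwise_separable_cost(3, 14, 512, 512, alpha=0.75, rho=0.5))  # Expression (6)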



When depth-wise separable convolutions are compared to full convolutions, a reduction in accuracy of 1% is observed on the ImageNet dataset. When thin models are compared to shallow models, thin MobileNets outperform shallow MobileNets by 3%. When MobileNet is shrunk using a width multiplier, accuracy drops off gradually until the width multiplier reaches 0.25.
MobileNet is as accurate as VGG-16 even though it is 32 times smaller and requires 27 times less computation, and it is more accurate than GoogleNet while being smaller and requiring more than 2.5 times less computation.
When MobileNet is reduced with a width multiplier of 0.5 and the image resolution is reduced to 160 × 160, MobileNet is better than AlexNet in terms of size and computation: it is 45 times smaller with 9.4 times less computation. At the same size, it is better than SqueezeNet by 4% with 22 times less computation.
Deep roots [47] achieved a reduction in CPU and GPU run time at the best-performing settings without compromising accuracy. Compared with its counterparts, ShuffleNet V2 [48] recorded better accuracy, but the GPU speed of MobileNet V1 is significantly greater than that of ShuffleNet. With the evolution of CNNs, RNNs, and LSTMs in image classification and NLP tasks, deep learning models have become more complex and harder to manage. The increase in size is usually associated with improved accuracy and precision, but it comes with undesirable costs like longer training time, longer inference time, and larger memory usage. The four methods discussed here have emerged as crucial strategies for compressing these models.

6 Discussion

We have shown that, alongside the growing research effort toward better model accuracy, increased research into better NN compression techniques is a must. Pruning involves the removal of unnecessary weights and biases to obtain a small and efficient model. Quantization, on the other hand, reduces the number of bits in which weights are stored to achieve a smaller size, while knowledge distillation trains a deep teacher network on the dataset and then trains a small student network to learn from the teacher, with the aspiration that the smaller network will achieve performance similar to the bigger one.
Pruning connections, however, leads to sparse matrices, which cause computational difficulty; since a complex network has so many connections, pruning them is not computationally cheap and can cause its own problems. Alternatively, a simple approach using quantization techniques may sometimes lead to a substantial loss in accuracy: with binarization, for example, 32× model compression can be achieved, but this has shown poor accuracy on LSTM and RNN models, since its simplicity aggravates vanishing/exploding gradients. Loss-aware quantization can be considered a better approach than simple static quantization, as it quantizes weights with respect to the loss and shows superior performance to static quantization methods.
Efficient neural architecture design focuses on managing the data flow of a neural network architecture in order to achieve the best accuracy with the least memory usage. A plethora of research still needs to be done on designing efficient neural architectures to make them useful for various use cases in image classification and segmentation.

7 Conclusion

In this paper, we have reviewed the main neural network compression techniques, namely quantization, pruning, knowledge distillation, and efficient model architecture design. We have analyzed the implementation of these methods and discussed the pros and cons of each. Before applying a compression technique, it is important to understand how each method works and what impact it will have on the performance of the model. From our comparative analysis, knowledge distillation may be a particularly attractive family of model compression methods, as it requires less human effort. We also believe that, depending on the use case, each of these methods can prove helpful when it comes to reducing model size. With AI technology spreading its roots to resource-constrained edge devices, the development of advanced neural network compression techniques is a must. This will also play a vital role in increasing the usability of NN models in resource-constrained systems such as IoT and space applications.

References

1. Kim P, Convolutional neural network. In: MATLAB deep learning. Apress, Berkeley, CA
2. O’Shea K, Nash R, An introduction to convolutional neural networks
3. Mandic D, Chambers J, Recurrent neural networks for prediction: learning algorithms,
architectures, and stability. Wiley
4. Schuster M, Paliwal KK (1997) Bidirectional recurrent neural networks. IEEE Trans Sig Proc
45(11):2673–2681. https://doi.org/10.1109/78.650093
5. Huang G, Liu Z, van der Maaten L, Weinberger KQ, Densely connected convolutional networks.
arXiv:1608.06993 [cs.CV]
6. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z, Rethinking the inception architecture for
computer vision. University College London
7. Szegedy C, Ioffe S, Vanhoucke V, Alemi A, Inception-ResNet and the impact of residual
connections on learning
8. Zhang X, Li Z, Loy CC, Lin D, PolyNet: a pursuit of structural diversity in very deep networks
9. Hu J, Shen L, Albanie S, Sun G, Wu E, Squeeze-and-excitation networks. Comput Vis Pattern
Recog (cs.CV)

10. Huang Y, Cheng Y, Bapna A, Firat O, Chen MX, Chen D, Lee H, Ngiam J, Le QV, Wu Y, Chen
Z, GPipe: efficient training of giant neural networks using pipeline parallelism. Comput Vis
Pattern Recog (cs.CV)
11. Xie S, Girshick R, Dollár P, Tu Z, He K, Aggregated residual transformations for deep neural networks. Facebook AI Research, UC San Diego
12. Schaller R (1997) Moore’s law: past, present, and future. IEEE Spectrum 52–59
13. Amodei D, Hernandez D (2018) AI and Compute. Open-ai Research
14. Stefanini M, Cornia M, Baraldi L, Cascianelli S, Fiameni G, Cucchiara R (2021) From show
to tell: a survey on image captioning. arXiv preprint arXiv:2107.06912
15. Hossain Z, Sohel F, Shiratuddin MF, Laga H (2019) A comprehensive survey of deep learning
for image captioning. ACM Comput Surv 51(6) Article 118 36 pp
16. Li K, Wan G, Cheng G, Meng L, Han J (2020) Object detection in optical remote sensing
images: A survey and a new benchmark. ISPRS J Photogramm Remote Sens 159:296–307
17. Yuan Q, Shen H, Li T, Li Z, Li S, Jiang Y, Xu H, Tan W, Yang Q, Wang J, Gao J (2020) Deep
learning in environmental remote sensing: achievements and challenges. Remote Sens Environ
241:111716
18. Li Z, Liu F, Yang W, Peng S, Zhou J (2021) A survey of convolutional neural networks: analysis,
applications, and prospects. IEEE Trans Neural Netw Learn Syst
19. Kiranyaz S, Avci O, Abdeljaber O, Ince T, Gabbouj M, Inman DJ (2021) 1D convolutional
neural networks and applications: a survey. Mech Syst Signal Process 151:107398
20. Ho-Phuoc T, CIFAR10 to compare visual recognition performance between deep neural networks and humans. The University of Danang – University of Science and Technology
21. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical
image database. In: IEEE conference on computer vision and pattern recognition, pp 248–255.
https://doi.org/10.1109/CVPR.2009.5206848
22. Narang S, Micikevicius P, Diamos G, Elsen E, Alben J, Garcia D, Ginsburg B, Houston M,
Kuchaiev O, Venkatesh G, Wu H (2018) Mixed precision training. ICLR
23. Courbariaux M, Bengio Y, David J-P (2016) BinaryConnect: training deep neural networks
with binary weights during propagations. CS.LG
24. Chmiel B, Ben-Uri L, Shkolnik M, Hoffer E, Banner R, Soudry D, Neural gradients are
near-lognormal: improved quantized and sparse training. In: Habana labs—an intel company.
Caesarea, Israel, Department of Electrical Engineering - Technion, Haifa, Israel
25. Faghri F, Tabrizian I, Markov I, Alistarh D, Roy DM, Ramezani-Kebrya A, Adaptive gradient
quantization for data-parallel SGD. University of Toronto, Vector Institute, IST Austria and
Neural Magic
26. Ren S, He K, Girshick R, Sun J, Faster R-CNN: towards real-time object detection with region
proposal networks. Comput Vis Pattern Recog. arXiv:1506.01497 [cs.CV]
27. Hochreiter S, Schmidhuber J, Long short-term memory, Neural Comput 9:1735–80. https://
doi.org/10.1162/neco.1997.9.8.1735
28. LeCun Y, The mnist database of handwritten digits. Courant Institute, NYU, Corinna Cortes,
Google Labs, New York, Christopher J.C. Burges, Microsoft Research, Redmond
29. Huang Z, Wang N, Data-driven sparse structure selection for deep neural networks
30. Yu S, Yao Z, Gholami A, Dong Z, Kim S, Mahoney MW, Keutzer K, Hessian-aware pruning
and optimal neural implant. Peking University, University of California, Berkeley
31. He Y, Lin J, Liu Z, Wang H, Li L-J, Han S, AMC: AutoML for model compression and acceleration on mobile devices. Massachusetts Institute of Technology, Carnegie Mellon University, Google
32. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H,
MobileNets: efficient convolutional neural networks for mobile vision applications. Comput
Vis Pattern Recog. arXiv:1704.04861 [cs.CV]
33. Simonyan K, Zisserman A, Very deep convolutional networks for large-scale image recognition.
Comput Vis Pattern Recog. arXiv:1409.1556 [cs.CV]
34. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen, L-C, MobileNetV2: inverted residuals
and linear bottlenecks. Comput Vis Pattern Recog. arXiv:1801.04381 [cs.CV]

35. He K, Zhang X, Ren S, Sun J, Deep residual learning for image recognition. Comput Vis Pattern
Recog. arXiv:1512.03385 [cs.CV]
36. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A,
Bernstein M, Berg A, Fei-Fei L (2014) ImageNet large scale visual recognition challenge. Int
J Comput Vis. 115. https://doi.org/10.1007/s11263-015-0816-y
37. Guo Y, Yao A, Chen Y, Dynamic network surgery for efficient DNNS. In: NIPS
38. Polino A, Pascanu R, Alistarh D, Model compression via distillation and quantization. ETH Zurich, Google DeepMind, IST Austria
39. Li Y, Yang J, Song Y, Cao L, Luo J, Li L-J (2017) Learning from noisy labels with distillation. cs.CV
40. Yin H, Molchanov P, Li Z, Alvarez JM, Mallya A, Hoiem D, Jha NK, Kautz J, Dreaming to
distill: data-free knowledge transfer via DeepInversion. In: NVIDIA. Princeton University, the
University of Illinois at Urbana-Champaign
41. Jin J, Dundar A, Culurciello E (2014) Flattened convolutional neural networks for feedforward
acceleration
42. Iandola FN, Moskewicz MW, Ashraf K, Han S, Dally WJ, Keutzer K (2016) Squeezenet:
Alexnet-level accuracy with 50x fewer parameters and 1MB model size
43. Rastegari M, Ordonez V, Redmon J, Farhadi A (2016) Xnornet: Imagenet classification using
binary convolutional neural networks. arXiv preprint
44. Wang M, Liu B, Foroosh H (2016) Factorized convolutional neural networks
45. Yang Z, Moczulski M, Denil M, de Freitas N, Smola A, Song L, Wang Z (2015) Deep-
fried convnets. In: Proceedings of the IEEE international conference on computer vision, pp
1476–1483
46. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H, MobileNets: efficient convolutional neural networks for mobile vision applications. Google Inc.
47. Ioannou Y, Robertson D, Cipolla R, Criminisi A, Deep roots: improving CNN efficiency with
hierarchical filter groups. University of Cambridge and Microsoft Research
48. Ma N, Zhang X, Zheng H-T, Sun J, ShuffleNet V2: practical guidelines for efficient CNN
architecture design. Megvii Inc (Face++) and Tsinghua University
Chapter 25
Ortho-FLD: Analysis of Emotions Based
on EEG Signals

M. S. Thejaswini, G. Hemantha Kumar, and V. N. Manjunath Aradhya

1 Introduction

Everyday interactions of human beings with the external environment depend on several emotional states, ranging from basic to complex ones. In recent years, rapid advances in the development of machine learning and information technology have made it feasible to empower machine intelligence in the analysis of emotions from various perspectives. Emotion is a physiological condition that serves as a representation of individual moods, and emotions are a powerful force in shaping how we feel about particular events around us. Involving affective, cognitive, expressive, and motivational components, they are considered multi-component phenomena [1]. Unhappy circumstances in humans, and the core of mental illness, are caused by emotional imbalance. Therefore, analyzing different emotional states and developing emotionally intelligent systems is a crucial task in the field of affective computing. Recent literature demonstrates that audiovisual and physiological signals [2] are two kinds of emotional reflections used in eliciting emotions in various applications. In general, the reference points of audiovisual research studies are drawn from facial expressions [3], speech [4], and body movements/gestures [5, 6]. On the other hand, these modes of emotional reflection can be controlled and can vary with internal and external sources; hence, academic research in this field may be negatively impacted by this complexity, inter-individual variability, and situational heterogeneity. Physiological signals, in contrast, are true in nature, may not be under the control of humans, and are difficult to fake.

M. S. Thejaswini (B) · G. Hemantha Kumar


Department of Studies in Computer Science, University of Mysore, Mysuru, Karnataka, 570006,
India
e-mail: thejaswini@compsci.uni-mysore.ac.in
V. N. Manjunath Aradhya
Department of Computer Applications, JSS Science and Technology University, Mysuru,
Karnataka 570006, India
e-mail: aradhya@sjce.ac.in

In this regard, experiments based on physiological signals are more efficient and accurate than audiovisual studies in detecting emotions. Recent studies suggest that the relationship between physiological signals and emotions can be extensively studied through electroencephalography (EEG) signals [7, 8]; EEG is a non-invasive, fast, and affordable brain–computer interface (BCI) technology. Activities in the brain directly reflect the functions of the central nervous system, which is where EEG signals originate, and they are well suited to addressing human emotional states. Consequently, the motive behind the proposed study is to build an automated emotion recognition system that employs EEG signal information.
Given the significance of and interest in affect recognition, research on extracting predictive features from EEG waves to deliberately recognize emotions remains firmly entrenched to date. Gupta et al. [9] evaluated cross-subject classification of emotion on the SEED dataset, which contains data for three emotions (neutral, happy, and sad), using information potential features computed from EEG sub-bands. The emotional sensitivity of different people across different brain regions, which is channel-specific in nature, can be learned through a cross-subject classification approach when the same stimuli are presented to all subjects; the retrieved feature values are smoothed before being input to random forest and support vector machine classifiers, achieving good performance. Arjun et al. [10] present a novel deep learning architecture that can recognize emotions across subjects using the DEAP, SEED, and CHB-MIT datasets. To obtain a subject-invariant latent representation of EEG data, a novel long short-term memory (LSTM) autoencoder with channel attention is employed, concentrating on subject-independent tasks to address inter-subject variability; based on the latent vectors from the autoencoder, classification is performed by a CNN with an attention framework. Automatic feature extraction and classification using various convolutional neural networks (CNNs) were proposed by Khare et al. [11] for identifying four emotional states: happiness, fear, sadness, and relaxation. The smoothed pseudo-Wigner–Ville distribution is adopted as a time–frequency representation to create images from the filtered EEG signals; these images are fed to pre-designed AlexNet, ResNet50, and VGG16 models as well as a configurable CNN. Evaluating the four CNNs shows that the configurable CNN requires far fewer learnable parameters than the others while achieving better accuracy. Tuncer et al. [12] explored a multilevel handcrafted feature generation model for automatic emotion categorization from EEG signals using three databases (DREAMER, GAMEEMO, and DEAP). That work proposes Tetromino, a novel approach for representing textural patterns that draws inspiration from the Tetris video game: the discrete wavelet transform (DWT) is applied to decompose the EEG signals into various levels, the Tetromino approach is then used to create unique features from the decomposed DWT sub-bands, the most discriminating features are selected with the maximum relevance minimum redundancy (mRMR) feature selection approach, and a support vector machine classifies the emotions. The study by Yin et al. [13] suggests a unique deep learning model for emotion identification (ERDL). EEG data are separated into segments with a 6-second time window and calibrated using 3-second baseline data; each segment's differential entropy is then extracted to create a feature cube, which is input to a fused deep learning model combining graph convolutional neural networks (GCNNs) and long short-term memory (LSTM) networks. Multiple GCNNs are employed in the fusion model to extract graph-domain information, LSTM cells extract temporal features by memorizing how the relationship between channels changes over time, and a dense layer produces the emotion classification results on the DEAP dataset.
At each stage of investigating emotions, multiple issues frequently arise in the machine learning pipeline, leading to complex problems from different perspectives, and every stage is an essential factor in analyzing the various emotional states. One bottom-line factor, however, is the quantity of input features at the classification stage: generally, most features are correlated, which leads to redundancy. It is therefore important to explore new concepts for representing features with reduced dimension, and doing so without losing crucial information is a challenging task.

2 Proposed Methodology

To categorize four different kinds of emotions from EEG signals, our proposed research study establishes a classification model for recognizing emotion. For the performance of any machine learning system, one of the most essential requirements is a small set of relevant features extracted from the high-dimensional database; unquestionably, the history of EEG-based emotion recognition models shows that the originally generated features carry good information, suitable as classifier input for reaching better results [14]. However, feeding this humongous number of features to the classifier may decrease efficiency and hurt performance at the classification phase. In this scenario, to minimize the amount of required data and the computational time, we adopted dimensionality reduction as our prime objective; hence, we designed a feature selection algorithm for reducing the huge-dimensional EEG data into a lower dimension. To accomplish this task, a new pyramidal structured dimensionality reduction algorithm is employed that generates features by a differencing approach, inspired by a mathematical technique of numerical analysis called forward interpolation. Assuming that a function f(x) is single-valued and continuous, and that the values of f(x) correspond to a set of fixed values x_0, x_1, x_2, ..., x_n of X, it is possible to compute and tabulate values satisfying conditions at different levels of iteration, where the output for the previous x value in each iteration is treated as the input to the next level; such a process in numerical analysis is called interpolation. For more information, refer to [15].

2.1 Feature Representation Through the Pyramidal Structured Technique

The pyramidal structured forward interpolation technique is used to extract and represent relevant features from high-dimensional time-domain EEG signals. It involves differencing discrete samples of the given database discontinuously (samples differ from one another) over closed intervals (consecutive pairs of even- and odd-indexed samples) across different levels of iteration, reducing the high-dimensional data to half of its samples at each level; the obtained results are also discrete because the input samples are discrete. In general, the notation of our proposed work at each level of interpolation is given by $\Delta(x_n) = x_n - x_{n+1}$, where $n = 1, 2, 3, \ldots$ (over the 38,000 samples of each subject from the four classes), $\Delta$ denotes the forward difference operation, and $x_n$ and $x_{n+1}$ are consecutive sample values.
The five levels of forward-difference iterations for dimensionality reduction are as follows:

$\Delta^1(x_n) = x_n - x_{n+1}$ : first level of forward difference.
$\Delta^2(x_n) = \Delta^1(x_n) - \Delta^1(x_{n+1})$ : second level of forward difference.
$\Delta^3(x_n) = \Delta^2(x_n) - \Delta^2(x_{n+1})$ : third level of forward difference.
$\Delta^4(x_n) = \Delta^3(x_n) - \Delta^3(x_{n+1})$ : fourth level of forward difference.
$\Delta^5(x_n) = \Delta^4(x_n) - \Delta^4(x_{n+1})$ : fifth level of forward difference.
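Under our reading of this scheme (an assumption: each level differences consecutive even/odd sample pairs, halving the length per level), a minimal NumPy sketch of the pyramid is given below; note that the chapter's own implementation was in MATLAB, and five levels take 38,000 samples to 1187, consistent with the feature count reported in Sect. 3.1.

# Hedged NumPy sketch of the pyramidal forward-difference reduction.
import numpy as np

def forward_difference_pyramid(x: np.ndarray, levels: int = 5) -> np.ndarray:
    for _ in range(levels):
        n = len(x) - (len(x) % 2)    # drop a trailing sample if length is odd
        x = x[0:n:2] - x[1:n:2]      # Delta(x_n) = x_n - x_{n+1} over pairs
    return x

eeg = np.random.randn(38000)         # placeholder single-channel EEG signal
features = forward_difference_pyramid(eeg)
print(features.shape)                # (1187,)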

2.2 Ortho-Fisher Linear Discriminant Analysis

A frequently used dimensionality reduction method is PCA [16, 17], which makes use of principal components computed through singular value decomposition. But the directions of the principal components maximize variation in the projected data pattern (PCA is an unsupervised learning approach), whereas linear discriminant analysis (LDA) takes label information into account where PCA does not. LDA is a popular method for reducing the dimension of data, built on the criterion of the Fisher ratio. To optimize the separation between classes, LDA makes use of Fisher linear discriminant analysis (FLD), which reduces the data dimension by minimizing the variance within each class and maximizing the gap between the class means. It is a supervised, scatter-matrix-based method: if labeled data are given as input, it can determine a set of weights that draws a decision boundary and thus classifies the data. It aims to find the vectors that maximize between-class separation of the projected data (maximizing separation alone can be ambiguous). The important criterion followed by FLD is to maximize the distance between the projected means while minimizing the within-class variance of the projections. More formally, given several independent feature matrices with their label data, FLD generates the linear combination of features that produces the greatest differences between the related classes [18].

Suppose there are $M$ training samples $A_k$ ($k = 1, 2, \ldots, M$), each an $m \times n$ matrix, belonging to $C$ classes, where the $i$-th class $C_i$ contains $n_i$ samples. For each training sample, the corresponding EEG features are defined as follows. The within-class scatter matrix $S_w$ is built from the per-class scatter matrices $S_i$, each computed as the sum of covariance matrices of the mean-centered EEG features in that class:

$$S_i = \sum_{x \in X_i} (x - m_i)(x - m_i)^T,$$

where $m_i$ is the mean of the EEG features in class $i$. The sum of all within-class scatter matrices is

$$S_w = \sum_{i=1}^{C} S_i.$$

The between-class scatter matrix $S_b$ is computed as the sum of covariance matrices formed from the difference between the total mean $m$ and the mean of each particular class of EEG features:

$$S_b = \sum_{i=1}^{C} n_i (m_i - m)(m_i - m)^T.$$

Let $u_1, u_2, u_3, \ldots$ be the set of discriminant vectors that optimize the objective function; they are determined by FLD along with the transformation matrix $U$:

$$U = \arg\max_U \frac{\left|U^T S_b U\right|}{\left|U^T S_w U\right|},$$

where $U$ consists of the eigenvectors corresponding to the $d$ largest eigenvalues of the matrix $S_w^{-1} S_b$.
Some drawbacks are observed when using FLD for feature extraction: it performs well only when a small number of projection vectors is used as features, and selecting the right number of projection vectors is an individual effort in producing improved accuracy. Additionally, classification performance deteriorates greatly when features are extracted using many or all of the projection vectors. These difficulties with FLD arise from the adoption of large numbers of projection vectors; it is therefore highly desirable to integrate into such algorithms a strategy that enables them to perform better when a larger number of projection vectors is employed. To this end, we introduce an equivalent method, orthogonalization, to eliminate the dependency among the vectors preferred by FLD, which addresses the drawbacks explained above. To incorporate this approach, we use the Gram–Schmidt decomposition process (it provides a way to construct, from a non-orthogonal set of linearly independent functions, an orthogonal basis over an arbitrary interval with an arbitrary weighting function).
Let $u_1, u_2, u_3, \ldots$ be the Fisher linear discriminant vectors and $v_1, v_2, v_3, \ldots$ the corresponding orthogonalized projection vectors. Take $v_1 = u_1$ and assume the first $k$ vectors $v_1, \ldots, v_k$ ($1 \le k \le d-1$) have already been estimated; the next orthogonalized Fisher discriminant vector is

$$v_{k+1} = u_{k+1} - \sum_{i=1}^{k} \frac{v_i^T u_{k+1}}{v_i^T v_i}\, v_i.$$

It should also be noted that the orthogonal discriminant vectors $v_1, v_2, v_3, \ldots$ are used for extracting features rather than the original discriminant vectors $u_1, u_2, u_3, \ldots$. Finally, it is worth noting that PCA-based methods are by definition orthogonal in nature, whereas in the case of FLD the matrix $S_w^{-1} S_b$ is not symmetric. For such a non-symmetric matrix, the eigenvectors obtained can be linearly independent yet correlated, which increases the likelihood of redundant information among the Fisher discriminant vectors. This is the rationale behind FLD's poor performance when many or all projection vectors are considered [19].
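For illustration only, the following NumPy sketch combines the scatter matrices, the eigen-decomposition of $S_w^{-1} S_b$, and the Gram–Schmidt step described above; the regularization term and dimensions are our additions for numerical stability, not part of the original formulation, and the chapter's own implementation was in MATLAB.

# Hedged NumPy sketch of Ortho-FLD: FLD directions then Gram-Schmidt.
import numpy as np

def ortho_fld(X, y, n_proj=10, reg=1e-6):
    # X: (samples, features), y: integer class labels.
    d = X.shape[1]
    m = X.mean(axis=0)
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)            # within-class scatter
        diff = (mc - m).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)          # between-class scatter
    # Eigenvectors of inv(Sw) Sb; a small ridge keeps Sw invertible.
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw + reg * np.eye(d), Sb))
    U = np.real(vecs[:, np.argsort(-np.real(vals))[:n_proj]])  # FLD directions
    # Gram-Schmidt orthogonalization of the discriminant vectors.
    V = np.zeros_like(U)
    for k in range(U.shape[1]):
        v = U[:, k].copy()
        for i in range(k):
            v -= (V[:, i] @ U[:, k]) / (V[:, i] @ V[:, i]) * V[:, i]
        V[:, k] = v
    return V  # project features as X @ V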

2.3 GRNN for Classification

Artificial neural networks (ANNs), explicitly influenced by biological neural networks and mimicking the human brain through sets of algorithms, are considered one of the important branches of AI, with various applications [20–23]. In this proposed study, we utilized a supervised artificial neural network, the generalized regression neural network (GRNN), which is good at time series prediction tasks compared with other neural-network-based classifiers; it enhanced the reliability of the results and improved performance in classifying EEG signals into 4 distinct emotional states together with the dimensionality reduction approach. Through interpolation, we extracted and represented relevant EEG features with reduced dimension; to these extracted features, we applied ortho-FLD (OFLD) to produce the ten most discriminating projections, and these projections were given as input to the GRNN for the further classification of emotional states. GRNN, as a memory-based neural network, has exhibited strong performance on real-world problems across a diverse range of applications. Its principal advantages are that, given a sufficient number of input samples, it is computationally light, single-pass, and quick to train, performs well in noisy environments, and produces a fast response. The topology of a GRNN is made up of an input layer, hidden layers (with a Gaussian kernel as the activation function), and an output layer. The observed attributes form the input layer, the feature matrix I; the input layer feeds the pattern-layer neurons, which encode the training patterns, and their outputs go to the summation layer, which normalizes the resulting output set (the pattern and summation layers constitute the hidden layers) [24, 25]. The prediction is calculated using the following equation:
$$F(I) = \frac{\sum_{i=1}^{n} T_i W_i}{\sum_{i=1}^{n} W_i}, \qquad W_i = \exp\!\left(-\frac{\|I - I_i\|^2}{2h^2}\right) \quad (1)$$
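A minimal NumPy sketch of the prediction rule in Eq. (1) follows (our illustration; the function name and the smoothing parameter h are placeholders, and the chapter's own implementation was in MATLAB):

# Hedged sketch of the GRNN prediction rule in Eq. (1): a kernel-weighted
# average of the training targets, with Gaussian pattern-layer weights.
import numpy as np

def grnn_predict(I, train_X, train_T, h=0.5):
    # I: (features,) query; train_X: (n, features); train_T: (n,) targets.
    d2 = np.sum((train_X - I) ** 2, axis=1)   # squared Euclidean distances
    W = np.exp(-d2 / (2.0 * h ** 2))          # Gaussian pattern-layer weights
    return np.sum(train_T * W) / np.sum(W)    # normalized summation layer

For multi-class emotion labels, the targets can be encoded one-hot, each class scored by its own weighted sum, and the predicted class taken as the argmax of those scores.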

3 Experimental Results and Performance Analysis

The proposed study employs the publicly available benchmark EEG-based GAMEEMO dataset, which is composed of audiovisual stimuli for eliciting emotions.

Table 1 Comparison of different dimensionality reduction methods

Study | Methods | Dataset | Accuracy (%)
Yu Chen [27] | Linear discriminant analysis + AdaBoost | DEAP | 88.70
Qiang Gao [28] | Principal component analysis + SVM | Own dataset | 89.17
DongKoo [29] | Genetic algorithm | DEAP | 71.76
Proposed method | OFLD approach | GAMEEMO | 100

EEG signal information was recorded from 28 healthy subjects aged 20–27 while they played four different computer games for 20 min (each game lasted five minutes; the games were labeled G1: boring, G2: calm, G3: funny, and G4: horror). A 14-channel (AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, AF4) wireless EMOTIV EPOC EEG device was used for recording; the native sampling rate of the device is 2048 Hz, but at the time of experimentation the recordings were down-sampled to 128 Hz. The dataset holds two folders (raw and preprocessed signals); we considered only the preprocessed EEG signals for our proposed experiments. To remove artifacts caused by hand, head, and arm movements, the authors applied the fifth-order sinc filter built into the EEG device itself. The dataset contains 1568 (4 × 14 × 28) EEG recordings, where 4 is the number of games played, 14 is the number of EEG channels, and 28 is the number of subjects who participated in the recording, and the sample length of the EEG signal for a single subject in each emotion is 38,252. For more detailed technical knowledge of the dataset, refer to [26].

3.1 Experimental Procedure

This section details the experimental design carried out using the video game-based
EEG GAMEEMO dataset, and we considered 28 subjects’ preprocessed EEG fea-
tures from all four emotional classes recorded using 14 channel EEG device for
implementing the proposed work. To start with the implementation of MATLAB
2018 on a PC with an Intel I5 processor and 8GB ram was preferred. In the dataset,
the sample length of a single subject in each different class of emotion is 38,000. The
earlier section details how extraction and representation of EEG feature from the time
domain along with reduction of dimension using pyramidal structure interpolation
and OFLD technique is achieved. The pyramidal approach represents EEG features
by differencing the larger set of EEG features with a sample length of 38,000 to 1187
length of samples in performing interpolations in five different levels of iterations.
Then obtained 1187 features were divided into training and testing in an 80:20 ratio,
we selected 22 subjects’ EEG features as training and 6 subjects’ EEG features as
testing for classifying 4 different emotions, the same procedure was followed for all

Table 2 Comparison of accuracy (%) on the GAMEEMO dataset for all 14 EEG channels

Method | AF3 | AF4 | F3 | F4 | F7 | F8 | FC5
Alakus et al. + KNN [26] | 61 | 75 | 59 | 67 | 67 | 75 | 64
Alakus et al. + SVM [26] | 81 | 88 | 63 | 72 | 84 | 80 | 66
Alakus et al. + MLPN [26] | 86 | 87 | 79 | 83 | 84 | 84 | 79
Tuncer et al. + LEDPatNet19 [30] | 98.75 | 98.57 | 99.11 | 98.39 | 98.21 | 98.75 | 98.57
Tuncer et al. + SVM [12] | 99.33 | 99.55 | 98.66 | 98.21 | 98.66 | 99.78 | 99.88
Proposed method + GRNN | 100 | 100 | 100 | 100 | 100 | 100 | 100

Method | FC6 | O1 | O2 | P7 | P8 | T7 | T8
Alakus et al. + KNN [26] | 68 | 65 | 65 | 61 | 73 | 61 | 64
Alakus et al. + SVM [26] | 68 | 57 | 70 | 59 | 81 | 65 | 81
Alakus et al. + MLPN [26] | 85 | 79 | 83 | 79 | 77 | 75 | 79
Tuncer et al. + LEDPatNet19 [30] | 99.29 | 99.11 | 98.39 | 98.57 | 98.57 | 98.04 | 98.57
Tuncer et al. + SVM [12] | 98.66 | 97.32 | 99.33 | 99.78 | 98.88 | 98.88 | 100
Proposed method + GRNN | 100 | 100 | 100 | 100 | 100 | 100 | 100

14 channels of EEG features. The 1187 EEG features obtained from the interpolation technique were then projected through OFLD, which gave us the ten most discriminative patterns of EEG features. OFLD projects the multi-dimensional input space onto projection vectors that maximize the ratio of the between-class scatter matrix to the within-class scatter matrix. The ten projections were then applied to a GRNN to test the data. GRNNs were chosen as classifiers because of their high-quality performance in time-series prediction tasks across a wide range of applications. The tabulated results show that the combination of the dimensionality reduction approach and GRNN performed well compared with existing state-of-the-art methods. The results obtained for all 14 channels with the proposed method are given in Tables 1 and 2. It is noticeable from the tabulated comparison that the combination of dimensionality reduction and GRNN outperforms the other existing methods. A compact sketch of this reduction pipeline is given below.
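The following is a minimal NumPy sketch of the two reduction stages (our own Python illustration, not the authors' MATLAB implementation; the pairwise averaging and the QR orthogonalization are simplifying assumptions standing in for the exact interpolation and OFLD formulations):

```python
import numpy as np

def pyramid_reduce(signal, levels=5):
    """Halve the sample length per level: 38,000 samples -> 1187 after 5 levels."""
    x = np.asarray(signal, dtype=float)
    for _ in range(levels):
        n = len(x) // 2 * 2                 # drop a trailing odd sample
        x = 0.5 * (x[0:n:2] + x[1:n:2])     # interpolate adjacent samples
    return x

def ortho_fld(X, y, k=10):
    """Top-k orthogonalized Fisher directions maximizing the Sb/Sw scatter ratio."""
    classes, mu, d = np.unique(y), X.mean(axis=0), X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)                   # within-class scatter
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * (diff @ diff.T)                 # between-class scatter
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb) # pinv: Sw is singular here
    W = vecs[:, np.argsort(-vals.real)[:k]].real
    Q, _ = np.linalg.qr(W)                              # orthogonalize the basis
    return Q
```

With 1187-dimensional features and only 22 training subjects per channel, the within-class scatter matrix is rank-deficient, which is why the sketch resorts to a pseudo-inverse.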

4 Conclusion

In this research study, a new method for classifying four different emotions using EEG signals was presented. The prime intention of this study was to extract and represent EEG features and to reduce their huge dimension to a smaller one without information loss. To accomplish this, we adopted a combination of interpolation for feature representation and the OFLD approach for dimensionality reduction. Through the interpolation technique, a larger set of EEG features was reduced while extracting and representing the relevant features; the reduced features were then supplied to OFLD, which represents the given patterns in a distinctive way with high discrimination. Working with an orthogonal system rather than a non-orthogonal one is always worthwhile for numerical precision and ease of calculation. Hence, this study explored the behavioral characteristics of OFLD in the classification of EEG signals. Ortho-FLD reached better classification performance even when a smaller number of training samples was utilized. Empirically, the proposed study compares positively with other state-of-the-art methods because the feature representation step selects relevant features directly from the time domain of the one-dimensional EEG data; no data transformation is carried out, unlike other traditional methods. Finally, a conventional neural network, the GRNN, was used to classify the four emotions. The developed model was examined on the GAMEEMO dataset, and the observed results are promising for all 14 channels of EEG signals. To the best of our knowledge, the proposed combination is the first of its kind for emotion recognition from EEG signals. In the future, we wish to explore new dimensionality reduction methods for detecting various emotions in the field of affect recognition.

References

1. Padhmashree V, Bhattacharyya A (2022) Human emotion recognition based on time-frequency analysis of multivariate EEG signal. Knowledge-Based Syst 238:107867
2. Aslan M (2022) CNN based efficient approach for emotion recognition. J King Saud Univ Comput Inf Sci 34(9):7335–7346
3. Li S, Deng W (2020) Deep facial expression recognition: a survey. IEEE Trans Affect Comput
4. Ramakrishnan S (2012) Recognition of emotion from speech: a review. Speech Enhanc Model
Recogn-Algor Appl 7:121–137
5. Sogon S, Masutani M (1989) Identification of emotion from body movements: a cross-cultural study of Americans and Japanese. Psychol Rep 65(1):35–46
6. Castellano G, Kessous L, Caridakis G (2007) Multimodal emotion recognition from expressive faces, body gestures and speech. In: Doctoral consortium of ACII, Lisbon
7. Sanei S, Chambers JA (2013) EEG signal processing. John Wiley & Sons
8. Sanei S, Chambers JA (2021) EEG signal processing and machine learning. John Wiley & Sons

9. Gupta V, Chopda MD, Pachori RB (2018) Cross-subject emotion recognition using flexible
analytic wavelet transform from EEG signals. IEEE Sens J 19(6):2266–2274
10. Arjun A, Rajpoot AS, Panicker MR (2021) Introducing attention mechanism for EEG signals:
emotion recognition with vision transformers. In: 2021 43rd annual international conference
of the IEEE engineering in medicine and biology society (EMBC). IEEE, pp 5723–5726
11. Khare SK, Bajaj V (2020) Time-frequency representation and convolutional neural network-
based emotion recognition. IEEE Trans Neural Networks Learn Syst 32(7):2901–2909
12. Tuncer T, Dogan S, Baygin M, Acharya UR (2022) Tetromino pattern based accurate EEG
emotion classification model. Artific Intell Med 123:102210
13. Yin Y, Zheng X, Hu B, Zhang Y, Cui X (2021) EEG emotion recognition using fusion model
of graph convolutional neural networks and LSTM. Applied Soft Comput 100:106954
14. Liu J, Meng H, Li M, Zhang F, Qin R, Nandi AK (2018) Emotion detection from EEG record-
ings based on supervised and unsupervised dimension reduction. Concurr Comput Pract Exp
30(23):e4446
15. Thejaswini MS, Hemantha Kumar G, Manjunatha Aradhya VN (2022) A pyramidal approach
for emotion recognition from EEG signals. In: 2nd international conference on applied intel-
ligence and informatics. Springer Cham. (Paper accepted and article in Press)
16. Bazgir O, Mohammadi Z, Habibi SAH (2018) Emotion recognition with machine learning using
EEG signals. In: 2018 25th national and 3rd international iranian conference on biomedical
engineering (ICBME). IEEE, pp 1–5
17. Chen J, Ro T, Zhu Z (2022) Emotion recognition with audio, video, EEG, and EMG: a dataset
and baseline approaches. IEEE Access 10:13229–13242
18. Aradhya VM, Niranjan SK, Hamsaveni L (2013) A robust analysis of FLD and orthogonal
FLD on handwritten characters. In: 2013 international conference on communication systems
and network technologies. IEEE, pp 105–108
19. Strang G (2007) Linear algebra and its applications. Thomson
20. Aradhya VNM, Niranjan SK, Hemantha Kumar G (2010) Probabilistic neural network based
approach for handwritten character recognition. Special Issue of IJCCT 1(2):3
21. Aradhya VNM, Pavithra MS, Naveena C (2012) A robust multilingual text detection approach
based on transforms and wavelet entropy. Procedia Technol 4:232–237
22. Aradhya VN, Mahmud M, Guru DS, Agarwal B, Kaiser MS (2021) One-shot cluster-based
approach for the detection of COVID-19 from chest X-ray images. Cognit Comput 13(4):873–
881
23. Aradhya VNM, Niranjan SK, Hemantha Kumar G (2010) Probabilistic neural network based
approach for handwritten character recognition. Special Issue of IJCCT 1(2)
24. Prakash BV, Ajay DV, Ashoka, Manjunath Aradhya VN (2015) An exploration of PNN and GRNN models for efficient software development effort estimation
25. Aradhya VNM, et al (2020) Learning through one shot: a phase by phase approach for COVID-
19 chest X-ray classification. In: 2020 IEEE-EMBS conference on biomedical engineering and
sciences (IECBES). IEEE
26. Alakus TB, Gonen M, Turkoglu I (2020) Database for an emotion recognition system based on
EEG signals and various computer games-GAMEEMO. Biomed Sig Proc Control 60:101951
27. Chen Y, Chang R, Guo J (2021) Emotion recognition of EEG signals based on the ensemble
learning method: Adaboost. Math Prob Eng
28. Gao Q, Wang CH, Wang Z, Song XL, Dong EZ, Song Y (2020) EEG based emotion recognition
using fusion feature extraction method. Multimedia Tools Appl 79(37):27057–27074
29. Shon D, Im K, Park JH, Lim DS, Jang B, Kim JM (2018) Emotional stress state detection using
genetic algorithm-based feature selection on EEG signals. Int J Environ Res Public Health
15(11):2461
30. Tuncer T, Dogan S, Subasi A (2022) LEDPatNet19: automated emotion recognition model based on nonlinear LED pattern feature extraction function using EEG signals. Cognitive Neurodyn 16(4):779–790
Chapter 26
Implementation of Reliable Post-disaster
Relief Communication Network Using
Hybrid Secure Routing Protocol

G. Sabeena Gnana Selvi, A. Prasanth, D. Sandhya, and B. Gracelin Sheena

1 Introduction

Communication systems can go down fully or partially as a result of disasters. To preserve lives in such a situation, relief efforts require a communication system that can be quickly deployed. Making critical decisions requires the rescue team to exchange information. Whether a tragedy is man-made or natural, there will always be a need for food, medical care, and relief efforts.
The term "ad hoc network" refers to peer-to-peer communication between any number of devices that is infrastructure-free, self-organizing, self-configuring, wireless, and emerges spontaneously [1]. A Wireless Sensor Network (WSN) is a category of ad hoc network made up of numerous small, inexpensive, and straightforward sensor nodes dispersed over a huge area [2–5]. It collects environmental data and transmits it in a multi-hop fashion to a static sink. Once the data is received, the sink processes and analyzes the sensed data [6].
The idea behind a Vehicular Ad Hoc Network (VANET) is to set up a network of moving vehicles and stationary roadside units for a particular requirement or circumstance [7]. Besides, MANET is another category of ad hoc network in which the nodes spontaneously take on the roles of routers and end-system nodes.


As a self-configured network, MANET is self-organized, requires no outside network configuration, enables the creation of ephemeral networks, and allows nodes to communicate with each other effortlessly. Both the military and civilian realms make major use of MANET technology for communication purposes [8–10].
Everyone would agree that early warning of a disaster is more crucial than treatment and damage restoration afterward [11]. The largest natural disaster to hit India since the 2004 tsunami occurred in June 2013, when a multi-day cloudburst centered on the state of Uttarakhand in the north of the country caused devastating floods. Aside from natural catastrophes, some Indian metropolises are also at risk from industrial, chemical, and man-made disasters [12]. In the last few decades, security threats have increased in the MANET environment because the processed information is stored on cloud platforms. Thus, security issues are of greater concern while utilizing MANET for disaster management. To alleviate this, a new HSR framework has been proposed to provide secure routing among multiple devices. The secure routing also mitigates the control-packet overhead problem in MANET.

2 Related Work

The primary feature of a MANET is the ongoing mobility of nodes, which may lead to frequent topology changes and other difficulties, including how to route data packets between nodes. MANET can be used in a variety of settings to quickly and easily construct a network; these settings include disaster situations, WSN, and VANET [13]. Each environment differs from the others in certain ways. A collection of wireless mobile computers cooperates by forwarding packets for one another, so they can communicate beyond the range of direct wireless signals. A MANET is an independent group of mobile users that communicate over wireless links of only moderate speed. Since the nodes are mobile, the network architecture may change rapidly and unpredictably over time. Owing to their self-configuring and decentralized nature, the mobile nodes need to provide the routing functions themselves [14].
The routers and participating nodes form the wireless topology of the network, which may vary quickly and unpredictably because routers are free to move and govern themselves at will. Such a network may function independently, or it may be linked to the wider Internet. A number of nodes can directly connect with nodes within radio transmission range of each other, while other nodes require the assistance of intermediary nodes to transmit their packets. These networks can function everywhere without the help of any infrastructure because they are completely distributed, and this characteristic makes them quite robust [15]. The wireless connectivity between the nodes at any given time depends on the placement of the nodes, their transmitter and receiver coverage patterns, their transmission power levels, and co-channel interference levels. Since users are not constrained to a single physical location, as is the case with traditional wireline networks, the MANET permits a more transparent communication architecture. It is a brand-new, unique form of connectivity without any fixed cable communication infrastructure or additional network hardware [16].

Since MANET nodes vary in communication range and have limited energy resources that typically cannot be recharged or replaced, they face numerous challenges, including low bandwidth, high energy consumption, limited memory, processing limitations, and changes in mobility patterns [17]. Examples of such devices include mobile phones, PDAs, digital cameras, earphones, wristwatches, iPads, and laptops. The difficulty with mobility patterns causes periodic reorganizations of the network topology. The wireless network is unique compared to wired networks because of issues with interference, intra-flow and inter-flow contention, and fading. In the absence of a centralized node, nodes interact with one another through peer-to-peer queries. As a result, data must be transmitted through intermediary nodes, making routing a significant problem in a MANET [18]. The following sub-sections summarize the various existing routing protocols utilized in the MANET environment. The Hybrid Algorithm for Secured MANET Environment (HASME) is one suggested model: its implementation evicts problematic nodes, and it has been contrasted with three existing procedures. The HASME algorithm enables self-starting, multi-hop, and dynamic routing among all the network participants that seek to construct and maintain a network connecting all existing nodes. As covered above, MANETs are subjected to a variety of network attacks, including gray hole and black hole attacks. The HASME technique is primarily designed to counter these attacks and offer safe message transmission in the network; additionally, the method enables all mobile nodes to swiftly find new paths to their endpoints [19]. The EENS-DA model was proposed to achieve network slicing and data aggregation in WSN. It allocates the needed resources to specific applications clearly and efficiently, employing Conv-LSTM-based network slicing and tree-based aggregation techniques. The EENS-DA technique enhances the efficacy of data slicing, improves accuracy, and ensures privacy preservation in the network [20].

2.1 Routing Protocols in MANET

In a MANET without infrastructure support, as is the case with wireless connections, a recipient may be outside the range of a sender node's transceiver, so a forwarding process has always been necessary to find a path so that packets are sent appropriately between the source and the destination [21, 22]. Forwarding is the process of creating a path from the transmitting node to the destination node.
Proactive routing protocols (Table-driven)
These protocols maintain routing information even before it is required. Each node in the system keeps track of its routing information to all other nodes [23]. Routing tables typically contain route information, which is updated periodically as the network architecture changes. Depending on how frequently the routing data is updated in each routing table, the protocols that fall under this category differ from one another.

Additionally, the number of tables maintained by these routing methods varies. Because they require maintaining information for every single node in each node's routing table, proactive routing protocols are not suitable for bigger networks. As a result, the routing database incurs greater overhead, which consumes more bandwidth. Routing tables are used to manage and retain the routing information in proactive protocols [24, 25].
Optimized Link State Routing (OLSR) Protocol
The OLSR protocol periodically shares topological information through some chosen nodes known as multipoint relay (MPR) nodes [26]. Three distinct kinds of control signals are used to provide optimal routes based on the hop-count metric: first, HELLO messages, which carry out local link detection and neighbor recognition up to two-hop neighbors; second, Topology Control (TC) messages, which are employed to conduct the topology declaration task; and finally, for nodes with multiple interfaces, Multiple Interface Declaration (MID) messages. Only the MPR nodes, which are spread around the network, can forward TC messages.

2.2 Distributed or Reactive (On-Demand) Routing Protocols

Reactive protocols establish routes only when they are required, which is why they are known as on-demand protocols. As the name implies, the source creates the need: when a source node needs a route to a destination, it starts the network's route discovery process. The process finishes once a route is discovered or all potential route variations have been examined. Following that, a route maintenance technique maintains the legitimate routes and eliminates the invalid ones [27].
Ad hoc On-demand Distance Vector routing (AODV)
AODV, a distance vector routing protocol, was launched for MANET in 2003 [28]. AODV is built to operate at a variety of node speeds and in high-density network topologies. To overcome the counting-to-infinity problem that plagues traditional distance vector protocols, it was created to work loop-free in a trusted network that contains no malicious nodes. The AODV routing protocol has two operational modes: route discovery and route management. Route Request, Route Reply, Route Error, and Route Reply Acknowledgment are the types of AODV control messages; only requests are issued to start the routing process.
Dynamic Source Routing (DSR)
The DSR on-demand protocol made its debut in 1994. Like AODV, it has two stages: route discovery and route maintenance. A network with up to 200 nodes and high rates of mobility can still guarantee loop-free routing by employing a variety of strategies that allow many paths to any endpoint. DSR enables unidirectional links, in contrast to AODV. Because the header of each data packet encloses all the routing information necessary to reach the target node, this protocol is known as source routing. Again, unlike AODV, connectivity among neighbors does not need to be periodically updated [29].

3 Proposed Methodology

When a disaster occurs, many people start looking for disaster-relevant information. This can cause congestion issues, which greatly reduce network performance and increase end-to-end delay. Most routing protocols choose the least number of hops between source and destination pairs when routing traffic. Battery life at the path's nodes will be quickly depleted if the same path is used repeatedly. Furthermore, shortest-path routing does not accomplish load balancing in the network. A path break results in data loss, and network reconfiguration takes longer. A node that transmits at maximum power is likely to run out of battery quickly. Battery life must be managed wisely to extend the lifetime of the network, because it is a resource crucial to the network's durability. Therefore, the HSR protocol has been recommended as the optimal solution to alleviate the aforementioned issues in a MANET utilized for effective post-disaster communication. Figure 1 illustrates the proposed disaster relief communication model and how post-disaster communication may take place in a real-world context.

Fig. 1 Proposed disaster relief communication model


314 G. Sabeena Gnana Selvi et al.

3.1 Route Discovery

The route request packet (Rq) is employed to discover the path to the destination when the source node cannot find a path in its route cache. The route discovery process is necessary in order to transmit the packets across the network. As the packet moves from the source to the destination, each intermediate mobile node adds its own address to the list of IP addresses in the route request. Consequently, when the request packet arrives at the destination node, it contains the whole path from source to destination, a process known as path building. After receiving the request from the source node, the destination node restarts the path discovery process in the reverse direction to deliver the route reply packet back to the source node. A minimal sketch of this path-accumulation idea is given below.
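The following toy sketch illustrates the path-building idea (a hypothetical Python illustration with an assumed adjacency-list topology, not the protocol implementation): a flooded request accumulates the addresses of the nodes it traverses, so the copy that reaches the destination carries the complete source-to-destination path.

```python
from collections import deque

def discover_route(graph, src, dst):
    """Flood a route request; each hop appends its address to the packet."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path                    # full accumulated path
        for nbr in graph.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(path + [nbr])
    return None

# Assumed toy topology for demonstration:
graph = {"S": ["A", "B"], "A": ["D"], "B": ["A"], "D": []}
print(discover_route(graph, "S", "D"))     # ['S', 'A', 'D']
```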

Algorithm 1: Proposed HSR algorithm

1. Begin
2. if Rq is received from a legitimate node, then do
3.     The RSA technique is utilized to decrypt the content of the cipher at the receiver node;
4.     Create a UPD packet using the QUE message as a foundation;
5.     Encrypt the UPD as in step 2;
6.     Send the UPD to the source node;
7. end if
8. if a malevolent node receives a QUE packet, then do
9.     Construct a UPD message and send it to the source;
10.    Decrypt the obtained UPD using the RSA algorithm;
11.    The UPD will be successfully decrypted and the hash code (signature) will be identical if the UPD came from a trustworthy node;
12.    To show that the UPD has originated from a legitimate node, set a flag to 0;
13. else
14.    Set the flag to 1 to show that the UPD originated from a malicious node;
15. end if
16. Calculate the trust value of the link through Eq. (1);
17. Accomplish the link from source to sink;
18. Choose the accurate path according to the TS value;
19. Exclude the malicious node from the transmission path;
20. Mitigate the links with low TS value;
21. if a node becomes attacked during the packet communication, then do
22.    Repeat steps 4 to 7;
23. else
24.    A secured connection is created;
25. end if
26. end

3.2 Secure Routing

The secure routing phase constructs the secured route from source to destination. Algorithm 1 gives a detailed description of the proposed HSR algorithm. Primarily, the proposed HSR protocol utilizes RSA to encrypt the query (QUE) packet, and the SHA-512/256 method is implemented to provide a signature for the QUE packet. Afterward, the QUE packet is sent to nearby nodes, which transfer the hashes and the ciphertext from the source to the target node. If a malevolent node receives a QUE packet, it constructs a Unified Path Discovery (UPD) message and sends it to the source. The source node then decrypts the obtained UPD using the RSA algorithm. The UPD will be successfully decrypted, and the hash code (signature) will be identical, if the UPD came from a trustworthy node. A flag is set to 0 to show that the UPD originated from a legitimate node, whereas the flag is set to 1 to indicate that the UPD originated from a malicious node. The trust (TS) value of the link is evaluated as

TS = Tc / Tt, (1)

where Tc indicates the number of correct transmissions and Tt the total number of transmissions. The link from source to sink is then established, and the accurate path is chosen according to the TS value; at the same time, malicious nodes are excluded from the transmission path by discarding the links with low TS values. If a node is attacked during the packet communication, the transmission steps are repeated; otherwise, the secured connection from source to destination is created. A small illustration of this trust filtering is given below.
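The following is a minimal sketch of the trust evaluation in Eq. (1) (our own illustration; the Link structure and the 0.8 acceptance threshold are assumptions, since the chapter does not fix a threshold):

```python
from dataclasses import dataclass

@dataclass
class Link:
    src: str
    dst: str
    correct: int   # Tc: correct transmissions over this link
    total: int     # Tt: total transmissions over this link

    @property
    def trust(self) -> float:
        """TS = Tc / Tt, as in Eq. (1)."""
        return self.correct / self.total if self.total else 0.0

def trusted_links(links, threshold=0.8):
    """Keep only links whose trust score meets the (assumed) threshold."""
    return [l for l in links if l.trust >= threshold]

links = [Link("S", "A", 48, 50), Link("A", "D", 20, 50), Link("S", "B", 45, 50)]
print([(l.src, l.dst, round(l.trust, 2)) for l in trusted_links(links)])
# [('S', 'A', 0.96), ('S', 'B', 0.9)]  -> low-trust link A-D is mitigated
```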

3.3 Route Maintenance

The proposed HSR protocol does not include the AODV protocol or proactive routing techniques. The responsibility of the route maintenance phase is to maintain the secure routing among the multiple nodes deployed within the network. Broken paths are detected by the MAC layer or by software acknowledgments, a mechanism exclusive to HSR. A route reply packet is used to notify the source node of the specific route path and to restart the route discovery mechanism when a connection between two locations is lost. Since HSR is built on the idea of multiple paths, when a source receives a packet containing a route error, it can immediately use an alternative route stored in the source route cache; this reduces the routing overhead within the network. Following the concept of datagram pick-up, if any intermediate node between the source and the destination detects a broken next-hop link and has an additional route to the destination in its route cache, it can immediately use that route to forward the packet to the terminus.

4 Results and Discussion

The NS2 platform has been utilized to evaluate the performance of the proposed and existing routing protocols. NS2, a discrete-event simulator, is generally preferred for networking research. By employing TCP, UDP, IP, and CBR message patterns, NS2 offers simulation and investigation support for wired and wireless networks. The two main components of NS2 are NS, which stands for network simulator, and NAM, which stands for network animator. The network conditions taken into account for simulation are listed in Table 1. To study how network size affects protocol performance, the number of nodes in the system is varied.
Four essential metrics, Average Energy Utilization (AEU), Throughput (THR), Packet Delivery Ratio (PDR), and Average End-to-End Delay (AEED), are applied to analyze the performance of the proposed model; their conventional definitions are sketched below. These metrics are assessed by varying the number of nodes from 50 to 300. The comparative methods are AODV [26], DSR [27], and OLSR [24].
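The following helper functions show how these four metrics are conventionally computed (a hedged sketch based on the standard definitions; the actual NS2 trace parsing is omitted, and the packet records are assumed inputs):

```python
def pdr(received, sent):
    """Packet Delivery Ratio: packets received at the sink / packets sent."""
    return received / sent

def throughput_kbps(bytes_received, duration_s):
    """THR: delivered payload bits per second, reported in kbps."""
    return bytes_received * 8 / duration_s / 1000

def aeed(delays_s):
    """Average End-to-End Delay over all delivered packets (seconds)."""
    return sum(delays_s) / len(delays_s)

def aeu(initial_j, residual_j):
    """Average Energy Utilization: mean energy consumed per node (J)."""
    return sum(initial_j - r for r in residual_j) / len(residual_j)
```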

4.1 AEU Analysis

The AEU analysis of the various routing protocols is shown in Fig. 2. It is observed that the proposed protocol incurs less energy depletion than the existing protocols. A detailed statistical analysis is as follows: the AEU of the proposed protocol shows improvements of 72, 58, and 45% compared with the AODV, DSR, and OLSR protocols, respectively. These better results are owing to the proper route discovery and route maintenance strategies accomplished during the route formation phase; the proposed protocol attains the optimal path, which yields a lower AEU value.

Table 1 Parameters for simulation

| Parameter | Value |
|---|---|
| Terrain area | 1200 m × 1200 m |
| No. of nodes | 50–300 |
| Propagation | Two-ray model |
| Simulation time | 100 ms |
| Platform | Ubuntu 12.04 |
| Channel | Wireless |
| MAC type | 802.11 |
| Initial energy | 1 J |
| Application traffic | CBR |
| Data | 512 bytes/packet |

Fig. 2 Comparison of AEU under varying number of nodes

In contrast, the existing protocols fail to establish the optimal route between the nodes, which leads to higher AEU values of 0.26, 0.19, and 0.14 J, respectively.

4.2 THR Analysis

Throughput is defined as the ratio of packets generated at the source to transmissions received at the endpoint. The THR results for the different protocols are depicted in Table 2. From Table 2, it is noticed that the THR is almost the same under regular routing conditions and under catastrophe-protection conditions. The network THR of the existing protocols is reduced when a disaster condition is imposed, owing to larger control-packet overhead arising in the packet exchange stage; bandwidth is wasted entirely once control-packet overhead occurs in the network. This can be alleviated by deploying the proposed HSR protocol in the MANET environment. It uses the UPD packet to notify the presence of malicious nodes, and owing to this the exchange of control packets between two nodes is predominantly reduced. This achieves greater stability and a better THR value of 376 kbps for the dense network (Table 2).

Table 2 Computation of THR (kbps) for different routing protocols

| Method / No. of nodes | 50 | 100 | 150 | 200 | 250 | 300 |
|---|---|---|---|---|---|---|
| AODV [26] | 200 | 218 | 236 | 242 | 268 | 286 |
| DSR [27] | 226 | 250 | 260 | 282 | 294 | 310 |
| OLSR [24] | 224 | 248 | 262 | 288 | 300 | 314 |
| Proposed | 250 | 265 | 290 | 321 | 334 | 376 |

Fig. 3 Evaluation of PDR over different routing protocols

4.3 PDR Analysis

PDR is the proportion of packets delivered by the different CBR sources that are accepted by the recipients; it refers to the ratio of the total number of data packets received at the destination to the total number of data packets transmitted by the source node. This indicator shows how many data packets effectively arrive at their intended locations. The PDR comparison for the various protocols is shown in Fig. 3. It is apparent that the proposed HSR model achieves a superior PDR value of 95% for a larger network, while the AODV, DSR, and OLSR protocols sustain PDRs of 65%, 78%, and 80%, respectively. The higher PDR of the proposed HSR model is due to its proper path optimization: it finds the secure route from source to destination with less energy, and the attacker is not able to crash the routing path, which increases the packet delivery at the destination node.

4.4 AEED Analysis

AEED is the amount of time it takes a packet to travel through the system from its origin to its destination. According to Fig. 4, the proposed HSR model incurs a lower AEED of 0.3 s compared with the AODV, DSR, and OLSR protocols. This is because of the quick route formation and query response of the proposed HSR model: route formation requires minimal time for packet transmission from source to destination, and the utilization of QUE messages enables packet transmission without delay. This allows the proposed model to maintain an AEED that is 40, 18, and 17% lower than that of the AODV, DSR, and OLSR protocols, respectively. The existing models are vulnerable to numerous attacks in which the attacker can easily change the routing path between two nodes; hence, their packets travel a longer route to reach the destination.

Fig. 4 Analysis of AEED under varying number of nodes

5 Conclusion

An independent cluster of mobile users can communicate information related to a disaster. Because the nodes are mobile, the network architecture may change quickly and unpredictably over time. In this self-configuring manner, all network events, including determining the topology and distributing messages, must be carried out by the nodes themselves. Therefore, routing capabilities are built into the mobile nodes in order to provide appropriate communication during disaster conditions. The proposed HSR model was introduced to offer an optimal path and secure communication among multiple nodes. This secure communication enables the proposed model to acquire better performance than the conventional routing protocols. Furthermore, the AEU of the proposed protocol shows improvements of 72%, 58%, and 45% compared with the AODV, DSR, and OLSR protocols, respectively. This better performance makes the proposed HSR protocol a more efficient post-disaster communication model. In future work, other security parameters can be considered in the proposed HSR protocol to increase the overall effectiveness of the MANET environment.

References

1. Angueira P, Val I, Montalbán J (2022) A survey of physical layer techniques for secure wireless
communications in industry. IEEE Commun Surv Tutorials 24(2):810–838
2. Prasanth A, Pavalarajan S (2019) Zone-based sink mobility in wireless sensor networks. Sens
Rev 39:874–880
3. Sekar J, Aruchamy P (2022) An efficient clinical support system for heart disease prediction
using TANFIS classifier. Comput Intell 38:610–640
4. Shantha R, Mahender K, Jenifer A (2022) Security analysis of hybrid one time password
generation algorithm for IoT data. AIP Conf Proc 2418:1–10
5. Prasanth A, Jayachitra S (2020) A novel multi-objective optimization strategy for enhancing
quality of service in IoT-enabled WSN applications. Peer-to-Peer Netw Appl 13:1905–1920
6. Bhaskar KB, Aruchamy P, Saranya P (2022) An energy-efficient blockchain approach for
secure communication in IoT-enabled electric vehicles. Int J Commun Syst 35:1–27
7. Kaur G, Kakkar D (2022) Hybrid optimization enabled trust-based secure routing with deep
learning-based attack detection in VANET. Ad Hoc Netw 136:1–22
8. Prasanth A, Ganeshkumar P (2015) Zone based gateway patrolling in wireless sensor networks.
In: Proceedings in IEEE international conference on engineering and technology, pp 1–6
9. Kaur G, Chanak P, Bhattacharya M (2020) Memetic algorithm-based data gathering scheme
for IoT-enabled wireless sensor networks. IEEE Sens J 20(19):11725–11734
10. Prasanth A, Pavalarajan S (2020) Implementation of efficient intra and inter-zone routing for
extending network consistency in wireless sensor networks. J Circ Syst Comput 29:1–19
11. Rezapour S, Farahani R (2020) Impact of timing in post-warning prepositioning decisions
on performance measures of disaster management: a real-life application. Eur J Oper Res
293:312–335
12. Milanez B, Ali S (2021) Mapping industrial disaster recovery: lessons from mining dam failures
in Brazil. Extr Ind Soc 8:1–21
13. Vazhuthi P, Manikandan SP (2022) An energy-efficient auto clustering framework for enlarging
quality of service in internet of things-enabled wireless sensor networks using fuzzy logic
system. In: Concurrency and computation: practice and experience, pp 1–28
14. Prasanth A, Pavalarajan S, Karthihadevi M (2019) Particle swarm optimization algorithm based
zone head selection in wireless sensor networks. Int J Sci Technol Res 8:1594–1597
15. Srividya P, Devi L (2022) An optimal cluster and trusted path for routing formation and classifi-
cation of intrusion using the machine learning classification approach in WSN. Glob Transitions
Proc 3:317–325
16. Jim L, Islam N (2022) Enhanced MANET security using artificial immune system based danger
theory to detect selfish nodes. Comput Sec 113:1–18
17. Sangeetha A, Rajendran T (2022) Supervised vector machine learning with brown boost energy
efficient data delivery in MANET. Sustain Comput Inform Syst 35:1–10
18. Singh S (2022) A cryptographic approach to prevent network incursion for enhancement of
QoS in sustainable smart city using MANET. Sustain Cities Soc 79:1–19
19. Sabeena Gnanaselvi G, Ananthan TV, Eswaran S (2019) Secured packet transfer using HASME
for AODV protocol to detect black hole and gray hole attack. J Adv Res Dyn Control Syst
11(2):168–177
20. Sheena G, Snehalatha N (2021) An energy efficient network slicing with data aggregation
technique for wireless sensor networks. ICICV, 9388536 (IEEE Explore Digital Library), pp
13–18
21. Feng Y, Zhang B, Chai S (2017) An optimized AODV protocol based on clustering for WSNs.
In: Proceedings in 6th international conference on computer science and network technology,
pp 1–6
22. Subha R, Anandakumar H (2022) Adaptive fuzzy logic inspired path longevity factor-based
forecasting model reliable routing in MANETs. Sens Int 3:1–9
23. Satish Kumar G, Rama Devi P (2021) A novel proactive routing strategy to defend node
isolation attack in MANETS. Mater Today Proc 1–10

24. Abid M, Belghith A (2015) SARP: a dynamically readjustable period size proactive routing
protocol for MANETs. J Comput Syst Sci 81:496–515
25. Jagdale BN (2012) Analysis and comparison of distance vector, DSDV and AODV protocol
of MANET. Int J Distrib Parallel Syst 3:1–11
26. Semchedine F, Moussaoui A (2016) CRY OLSR: crypto optimized link state routing for
MANET. In: Proceedings in 5th international conference on multimedia computing and systems
(ICMCS), pp 1–6
27. Brar G, Thakur P (2019) Routing protocols in MANET: an overview. In: Proceedings in 2nd
international conference on intelligent computing, instrumentation and control technologies
(ICICICT), pp 1–6
28. Reddy P, Reddy B (2022) The AODV routing protocol with built-in security to counter blackhole
attack in MANET. Mater Today Proc 50:1152–1158
29. Ramya T, Mathana JM (2022) Exploration on enhanced Quality of Services for MANET
through modified Lumer and Fai-eta algorithm with modified AODV and DSR protocol. Mater
Today Proc 50:1152–1158
Chapter 27
Compact Metamaterial Octagonal
Antenna for Wireless Body Area Network

Goswami Siddhant Arun and Deepak C. Karia

1 Introduction

In the current era, monumental growth is seen in the areas of sports, real-time monitoring, and pre- and post-health monitoring checkups. The biomedical industry has seen continuous growth in the last few years. Body area networks are used in blood pressure monitoring, heart rate monitoring, and other healthcare parameters. In addition to the healthcare world, body area network applications are emerging in search and rescue (civilian and military applications). Along the same lines, ambitious projects like the Google smartwatch have an endlessly promising future [1, 2]. The IEEE 802.15.6 band is allocated for wireless body area networks [3, 4].
Designing an antenna that satisfies the above requirements of small size with better performance is a challenge [5, 6]. To obtain size reduction, we have used a metamaterial split ring resonator geometry [7, 8]. The proposed research aims at size reduction with an operating frequency of 2.4 GHz [9, 10]. The detailed miniaturization comparison is given in Table 1.
A 70% size reduction is achieved by using the SRR-inspired MTM structure, and the bandwidth of the proposed antenna is improved up to 3 times compared with a traditional octagonal antenna without SRR. The following are the primary contributions of this paper.
• Antenna-1 uses an FR4 substrate, which is low priced and easily accessible. It has dimensions of 58 × 55 mm2, a dielectric constant (εr) of 4.4, and a loss tangent tan(δ) of 0.02.


Table 1 Comparison between antenna-1 and antenna-2

| Parameter | Size (mm2) | Volume (mm3) | Miniaturization (%) | Bandwidth (MHz) |
|---|---|---|---|---|
| Antenna 1 | 58 × 55 | 910 | – | 40 |
| Antenna 2 | 30 × 26 | 270 | 70 | 125 |

Table 2 Dimensions of antennas 1 and 2

| Parameter | Dimension (mm) | Parameter | Dimension (mm) |
|---|---|---|---|
| W | 58 | L | 55 |
| F | 30 | R | 17 |
| K | 2 | W1 | 26 |
| L1 | 30 | F1 | 17 |
| K1 | 1.8 | Q | 8.36 |
| P | 10 | T | 10 |
| K2 | 26 | P2 | 30 |
| X | 6 | Z | 9.46 |
| Y | 8.6 | T2 | 2.58 |

• Antenna-2 also uses an FR4 substrate. It has dimensions of 30 × 26 mm2, a dielectric constant (εr) of 4.4, and a loss tangent tan(δ) of 0.02. The size reduction is obtained using a metamaterial split ring resonator.
• The bandwidth of antenna-2 is increased by 3 times as compared with antenna-1.
• The antenna's resonant frequency is 2.4 GHz, so it can be used for wireless body area network applications.

2 Stepwise Analysis

2.1 Step 1 (Antenna-1)

We have designed a compact octagonal-shaped antenna with dimensions of 58 × 55 mm2. Figures 1 and 2 depict the antenna's top and bottom views. As illustrated in Fig. 7, the obtained simulated return loss is around −12 dB, and the obtained VSWR of about 1.60 is shown in Fig. 8. Figure 9a and b shows the E-plane and H-plane radiation patterns of antenna-1. Table 2 lists the dimensions of antenna-1 and antenna-2.
and Antenna-2.

Fig. 1 Antenna top view

Fig. 2 Antenna top and bottom view

2.2 Step 2 (Antenna-2)

In this step, we have used a metamaterial complementary split ring resonator in the bottom and top patches of the antenna, and the antenna's size is reduced compared with Step 1: the dimensions of antenna-2 are 30 × 26 mm2. Figure 3 shows the fabricated model of the antenna. The parametric dimensions are shown in Figs. 4 and 5, and Fig. 6 depicts the antenna's top and bottom views. According to Fig. 7, the resulting simulated return loss is around −30 dB at 2.46 GHz, while the measured return loss on the VNA is around −25 dB at 2.4 GHz. The obtained VSWR of about 1.07 is shown in Fig. 8. Figure 10a and b depicts the radiation patterns for the E-plane and the H-plane, respectively.

Fig. 3 Fabricated antenna top and bottom view

Fig. 4 Antenna top view

3 Simulation Results

See Figs. 7 and 8.



Fig. 5 Antenna bottom view

Fig. 6 Antenna top and bottom orientations

Fig. 7 Simulated return loss for antennas 1 and 2, along with measured return loss for antenna-2

Fig. 8 Voltage standing wave ratio for antennas 1 and 2

Fig. 9 a Antenna-1: E-plane radiation pattern. b Antenna-1: H-plane radiation pattern

Fig. 10 a Antenna-2: E-plane radiation pattern. b Antenna-2: H-plane radiation pattern

Table 3 Parameter comparison between antenna-1 and antenna-2

| Type of antenna | Without SRR octagonal antenna (Ant-1) | With SRR octagonal antenna (Ant-2) |
|---|---|---|
| Freq (GHz) | 2.45 | 2.46 |
| Return loss (dB) | −12.68 | −30.52 |
| VSWR | 1.60 | 1.07 |
| BW (MHz) | 40 | 125 |
| Area reduction (mm2) | 910 | 210 |
| Overall size (mm2) | 58 × 55 | 30 × 26 |

Fig. 11 Antenna-2 with muscle model with air gap

4 Wireless Body Area Network Analysis

Table 3 shows the comparison between the simulated antenna without metamaterial and with the metamaterial SRR; the consistency between the tabulated return loss and VSWR values can be checked with the short calculation below.
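Return loss and VSWR are tied by the standard transmission-line relations |Γ| = 10^(−RL/20) and VSWR = (1 + |Γ|)/(1 − |Γ|). The small sketch below (a worked check, not antenna-specific code) reproduces the Table 3 values to within rounding:

```python
def vswr_from_return_loss(rl_db):
    """VSWR from return-loss magnitude in dB: |Gamma| = 10^(-RL/20)."""
    gamma = 10 ** (-abs(rl_db) / 20)
    return (1 + gamma) / (1 - gamma)

print(vswr_from_return_loss(12.68))  # ~1.61, close to the 1.60 reported for Ant-1
print(vswr_from_return_loss(30.52))  # ~1.06, close to the 1.07 reported for Ant-2
```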

5 Body Area Network (BAN)

The antenna is put to the test on a human body using a muscle phantom, with the gap between the antenna and the phantom ranging from 4 to 10 mm. Figure 11 shows the muscle model kept at the bottom of the antenna. The simulated change in return loss is shown in Fig. 12. Looking at the return loss, for a gap of 4 mm a higher frequency shift beyond 2.5 GHz is observed; as the distance increases, the effect on the return loss is reduced and the frequency shifts below 2.5 GHz.

Fig. 12 Simulated return loss of antenna-2 obtained by changing the gap between the antenna and the muscle model from 4 to 10 mm

6 Conclusion

The use of a metamaterial split ring resonator at the top and bottom of the patch has helped achieve a size reduction of up to 70% and improves the bandwidth of the antenna. The antenna is also simulated with a muscle model, and the effect on the return loss of varying the distance g is analyzed. There is good agreement between the measured and simulated findings.

References

1. Zhang K, Soh PJ, Yan S (2020) Meta-wearable antennas-a review of metamaterial based anten-
nas in wireless body area networks. Materials 14(1):149
2. Chaturvedi D, Raghavan S (2019) A compact metamaterial-inspired antenna for WBAN appli-
cation. Wireless Personal Commun 105(4):1449–1460
3. Abbas SM, Esselle KP, Ranga Y (2014) An armband-wearable printed antenna with a full
ground plane for body area networks. In: 2014 IEEE antennas and propagation society inter-
national symposium (APSURSI). IEEE
4. Sabban A (2017) Novel wearable antennas for communication and medical systems. CRC
Press
5. Sarkar SB, Impact of metamaterial in antenna design: a review
6. Bala, Bashir D, et al (2012) Design and analysis of metamaterial antenna using triangular
resonator. In: 2012 Asia Pacific microwave conference proceedings. IEEE
7. Chen ZN, et al. (2014) Metamaterials-based antennas: from concepts to technology. In: 5th
international conference on metamaterials, photonic crystals and plasmonics (META’14)
8. Yılmaz HÖ, Yaman F (2019) Metamaterial antenna designs for a 5.8-GHz Doppler radar. IEEE
Trans Instrum Measur 69(4):1775–1782
9. Rani R, Kaur P, Verma N (2015) Metamaterials and their applications in patch antenna: a review. Int J Hybrid Inf Technol 8(11):199–212
10. Ali T, et al (2017) A miniaturized metamaterial slot antenna for wireless applications. AEU-Int
J Electron Commun 82:368–382
Chapter 28
Brain Tumor Detection
and Segmentation Empowered with Deep
Learning

Pooja V. Kamat, Rahul Mansharamani, Pratyush Jain, Sudhanshu Pandey, Prakhar Agarwal, Shruti Patil, and Rahul Joshi

1 Introduction

A brain tumor is a potentially fatal condition that impairs the normal functioning of the human body. For an appropriate diagnosis and therapeutic planning, a brain tumor must be recognized in its early stages. Medical image analysis relies heavily on digital image processing. Brain tumor segmentation entails separating aberrant brain tissues from normal brain tissues. Several researchers have presented semi-automated and fully automatic approaches for detecting and segmenting brain tumors in the past.
Brain tumors rank tenth among the most prevalent forms of tumor in India. Magnetic resonance imaging (MRI) scanning can identify the existence of a tumor. The problem arises when these MRIs have to be studied by medical practitioners: this is not only time consuming, but MRIs often lack detail, and it is difficult to locate the region of tumor spread in the brain MRIs.
Deep learning models have become very efficient at finding and locating hidden structures, as Ranjbarzadeh et al. [1] presented for brain tumor segmentation. Lately, especially using computer vision, many difficult image-based tasks have been automated.


In our approach, we have used conditional GANs, which not only exploit the great ability of U-Nets to segment images but also use a PatchGAN to make the model learn the minute details of the mapping from the brain MRIs to the ground truth.
In this research work, we have tried to leverage the power of deep learning and artificial intelligence not only to detect whether a tumor exists but also to segment the exact regions where the tumor has spread. Similarly, as stated by Arif et al. [2], the objective of the research is to assist medical practitioners in quickly and accurately identifying the tumor spread.
After experimenting with several deep learning models, as in Siddique et al. [3], such as VAE, U-Net, and Pix2Pix, we found that Pix2Pix gave us the best results. We have utilized 250 × 250 brain MRI images of 110 patients. For evaluating the agreement of the model prediction with the ground truth, we have used two different metrics, MAE (L1 loss) and SSIM loss (Structural Similarity Index), according to Brindha et al. [4].

2 Overview

The deep learning model is supposed to learn a function that maps brain MRI images to the ground truth; that is, the model has to learn to convert the brain MRI information into the segmented image. Brain tumors are among the most lethal cancers in the world. Glioma, the most frequent kind of primary brain tumor, is caused by glial cell carcinogenesis in the spinal cord and brain. We will therefore segment the region of tumor spread in the brain MRIs using deep learning methods. This can be understood better by observing the images in Fig. 1.
The key objectives of this study are as follows:
1. To detect brain tumors.
2. To segment them using deep learning in order to provide better assistance.

3. To improve the segmentation performance in comparison with previous works.

Fig. 1 Phases of image transformation: a input MRI image of a brain tumor, b ground truth image of the tumor to be found using the model, c overlapped image

3 Methodology

Figure 2 gives a broader overview of the steps we have followed. It starts with acquiring and storing the data in the required format. The converted data then goes through a data preprocessing pipeline containing steps like image normalization, in which we normalize the image pixels, followed by a center crop, where we crop the region of interest of the image, and random rotations to make the model robust. The data preprocessing step is followed by training the model using the Pix2Pix architecture. After that, we evaluate the model using metrics like MAE and SSIM loss. A minimal sketch of such a preprocessing pipeline follows.
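The torchvision sketch below illustrates the described pipeline (the crop size, rotation range, and normalization constants are our assumptions for illustration; the paper does not specify them):

```python
from torchvision import transforms

# Assumed values, for illustration only: crop size, rotation range, and
# normalization constants are not stated in the paper.
preprocess = transforms.Compose([
    transforms.CenterCrop(224),                   # keep the region of interest
    transforms.RandomRotation(degrees=10),        # small random rotations
    transforms.ToTensor(),                        # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.5], std=[0.5]),  # single-channel (grayscale) MRI
])
```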

3.1 Method 1: VAE

Autoencoders are a form of neural network that learns how to encode unstructured input from a dataset. Except for the last layer, the initial section is an encoder, which is similar to a convolutional neural network. Yousef et al. [5] proposed that the encoder's purpose is to use the dataset to learn an effective data encoding and then transmit it through a bottleneck design.
Fig. 2 System design diagram

A variational autoencoder differs from an autoencoder in that it gives a statistic for characterizing the dataset's samples in latent space. As a result, with a variational autoencoder, the encoder produces a probability distribution rather than a single output value at the bottleneck layer. A minimal sketch of this idea follows.
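The usual reparameterization step captures this bottleneck behavior (a generic VAE idiom, not code from this work):

```python
import torch

# Generic VAE idiom: the encoder emits a mean and a log-variance at the
# bottleneck, and a latent code is sampled from that distribution.
def reparameterize(mu, logvar):
    std = torch.exp(0.5 * logvar)             # sigma = exp(log(sigma^2) / 2)
    return mu + std * torch.randn_like(std)   # z ~ N(mu, sigma^2)
```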

3.2 Method 2: U-Net

The U-Net architecture, which was initially released in 2015, has caused a revolution in the field of deep learning, as highlighted by Hossain et al. [6].

Fig. 3 U-Net model architecture

According to the design (Fig. 3), an input picture is transmitted through the model, followed by pairs of convolutional layers using the ReLU activation function. The skip link is an important notion for preserving information from prior layers so that it reflects more strongly in the overall output. As suggested by Rehman et al. [7], skip links have also been scientifically demonstrated to offer superior results and to accelerate model convergence. In the final convolution block, we have a handful of convolutional layers followed by the last convolution layer. A minimal sketch of the skip-connection idea is given below.
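The toy PyTorch module below illustrates a skip connection (channel counts are assumptions; encoder features are concatenated with upsampled decoder features of the same spatial size):

```python
import torch
import torch.nn as nn

# Toy U-Net fragment; channel widths are assumed for illustration only.
class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.down = nn.Conv2d(16, 32, 4, stride=2, padding=1)
        self.up = nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1)
        self.dec = nn.Conv2d(32, 1, 3, padding=1)   # 16 + 16 channels in

    def forward(self, x):
        e = self.enc(x)                        # encoder features
        d = self.up(self.down(e))              # bottleneck and upsample
        return self.dec(torch.cat([e, d], 1))  # skip connection via concat
```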

3.3 Method 3: Pix2Pix (Proposed Model)

Another type of segmentation model is Pix2Pix, a Generative Adversarial Network (GAN) designed for general-purpose image-to-image translation, as set out by Creswell et al. [8]. A cGAN creates pictures utilizing real data, noise, and labels, as opposed to a vanilla GAN, which uses just real data and noise to train and produce images.
The Pix2Pix concept depends on the training dataset: there is a relationship between the training examples {x, y} in this paired image translation. Put differently, as proposed by Lata et al. [9], it simply trains a conditional GAN, or cGAN, to map a function so that the output picture depends on the input (in this case, the input image).

Pix2Pix has two significant architectures, U-Net and PatchGAN, one for the generator and the other for the discriminator. The discriminator model determines, from both the input/source image and the target image, whether the target image is a feasible transformation of the input image. The generator alters the input image in order to create the output picture.
In 2015, Ronneberger et al. created U-Net particularly for biomedical picture segmentation. The two primary parts of U-Net are as follows:
• A contraction path (left side) using convolutional layers that downsamples the data while extracting information.
• An expansion path (right side) consisting of transpose-convolution layers that upsample the information, according to Saha et al. [10].
On the other hand, instead of discriminating a complete image all at once, PatchGAN uses smaller patches of N × N size to determine whether a generated image is real or fake.
In addition, Pix2Pix, a paired image translation technique, has an extra loss that is meant exclusively for the generator, allowing it to generate images that are more realistic and truer to life; a sketch of this objective is given below. Besides Pix2Pix, as examined by Navidan et al. [11], there are other GANs that may be compared to it, such as CycleGAN, which is similar to Pix2Pix except for the data part: instead of paired image translation, unpaired translation is employed.
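The following sketch shows this combined objective (a generic Pix2Pix-style formulation; the λ = 100 weighting follows the original Pix2Pix paper by Isola et al. and is an assumption here, since this work does not state its value):

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()
LAMBDA = 100.0  # assumed L1 weight, as in the original Pix2Pix paper

def generator_loss(disc_fake_logits, fake, target):
    """Adversarial term plus the generator-only L1 term."""
    adv = bce(disc_fake_logits, torch.ones_like(disc_fake_logits))
    return adv + LAMBDA * l1(fake, target)

def discriminator_loss(disc_real_logits, disc_fake_logits):
    """Real pairs should be classified 1, generated pairs 0."""
    real = bce(disc_real_logits, torch.ones_like(disc_real_logits))
    fake = bce(disc_fake_logits, torch.zeros_like(disc_fake_logits))
    return 0.5 * (real + fake)
```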
The Generator Architecture. Our generator architecture, as depicted in Fig. 4, is based upon U-Nets, hyper-tuned to our case study.

Fig. 4 Generator architecture model



Table 1 Generator architecture model hyperparameters

| Hyperparameter | Value |
|---|---|
| Kernel size | 4 |
| Strides | 2 |
| Padding | 1 |
| Output padding | 0 |

It can be broadly classified into four parts (a combined sketch follows the list):

A. Convblocks. Convblocks are represented by the blue cuboids in Fig. 4. Each contains one convolutional layer with the hyperparameter values given in Table 1.
B. Down-convolution block. In Fig. 4, this block is indicated by red down arrows. This block's task is to halve the size of the input picture with each down-convolution operation, resulting in a 2× reduction in size.
C. Skip connections. To mitigate the loss of information during the compression in the encoder, we provide skip connections, which transfer data from the encoder to the decoder, as brought out by Vy et al. [12].
D. Up-convolution layer. The up-convolution operation is represented by the green up arrows in the decoder network, as put forward by Wang et al. [13]. The purpose of this layer is to increase the size of the image by a factor of 2, i.e., the image becomes twice its size on every up-convolution operation. Table 1 describes the hyperparameters used in the generator architecture model.
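A minimal PyTorch sketch of the down- and up-convolution blocks with the Table 1 hyperparameters is given below (channel widths and the normalization/activation choices are assumptions for illustration):

```python
import torch.nn as nn

def down_block(in_ch, out_ch):
    """Halves spatial size: Conv2d with kernel 4, stride 2, padding 1 (Table 1)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),   # assumed normalization choice
        nn.LeakyReLU(0.2),        # assumed activation choice
    )

def up_block(in_ch, out_ch):
    """Doubles spatial size: ConvTranspose2d, kernel 4, stride 2, padding 1,
    output padding 0 (Table 1)."""
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2,
                           padding=1, output_padding=0),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
    )
```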
The Discriminator Architecture (PatchGAN). PatchGAN is a discriminator for Generative Adversarial Networks that penalizes structure only at the scale of local image patches, according to Fan et al. [14]. The PatchGAN discriminator aims to determine the authenticity of each N × N patch in an image. Applying this discriminator convolutionally across the image and averaging all responses yields the result D. If pixels separated by more than a patch diameter are considered independent, the discriminator effectively models the image as a Markov random field; as brought forward by Pereira et al. [15], it can thus be understood as a form of texture or style loss. A minimal sketch follows.
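The sketch below gives a toy PatchGAN-style discriminator (layer counts and channel widths are assumptions, not the authors' exact network); it outputs a grid of logits, one per receptive-field patch, rather than a single scalar per image:

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    # in_ch=2 assumes a grayscale MRI and its mask concatenated channel-wise.
    def __init__(self, in_ch=2):
        super().__init__()
        layers, ch = [nn.Conv2d(in_ch, 64, 4, 2, 1), nn.LeakyReLU(0.2)], 64
        for _ in range(2):                      # assumed depth
            layers += [nn.Conv2d(ch, ch * 2, 4, 2, 1),
                       nn.BatchNorm2d(ch * 2), nn.LeakyReLU(0.2)]
            ch *= 2
        layers += [nn.Conv2d(ch, 1, 4, 1, 1)]   # per-patch logit map
        self.net = nn.Sequential(*layers)

    def forward(self, src, tgt):
        return self.net(torch.cat([src, tgt], dim=1))
```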

4 About the Dataset

The dataset utilized in this investigation includes brain MRI images together with manual FLAIR abnormality segmentation masks. The images were provided by The Cancer Imaging Archive (TCIA) and correspond to 110 patients from The Cancer Genome Atlas (TCGA) lower-grade glioma collection who had at least one FLAIR sequence and genomic cluster data. To make our model more robust, we used a variety of data augmentation techniques such as gray scaling, rotation, and so on.

Table 2 Dataset parameters

Parameter           Value
Dataset source      The Cancer Genome Atlas (TCGA)
No. of patients     110
Image dimensions    250 × 250
Description         MR brain images with manual FLAIR abnormality segmentation masks

Both patient information and tumor genetic classifications are included in the accompanying data .csv file. The images used in this model have dimensions of 250 × 250 pixels, as detailed in Table 2.
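As a small illustration of the augmentation step mentioned above (gray scaling, rotation, and so on), here is a hedged OpenCV sketch; applying the same rotation to the image and the mask keeps the FLAIR segmentation aligned. The rotation range and intensity-jitter factors are illustrative assumptions, not values taken from the paper.

```python
import cv2
import numpy as np

def augment(image, mask):
    # Random rotation, applied identically to image and mask so the
    # FLAIR segmentation stays aligned with the MRI slice
    angle = np.random.uniform(-15, 15)
    h, w = image.shape[:2]
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    image = cv2.warpAffine(image, rot, (w, h))
    mask = cv2.warpAffine(mask, rot, (w, h))
    # Random intensity jitter as a simple form of gray scaling
    if np.random.rand() < 0.5:
        image = np.clip(image.astype(np.float32) * np.random.uniform(0.8, 1.2),
                        0, 255).astype(np.uint8)
    return image, mask
```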

5 Result

5.1 Performance Metrics

From the performance graphs given in Fig. 5, we can assess the generator and discriminator performance.
Figure 5 shows the performance of the generator model. The training loss is shown by the blue line, while the validation loss is shown by the pink line. Both lines drop with each epoch, as can be seen in the figure (the x-axis represents epochs, the y-axis represents loss value); such curves are also discussed in the tumor segmentation quality assessment proposed by Hoebel et al. [16].

Fig. 5 Performance metrics



We use two different quality assessment metrics, as presented in Table 3:


• L1 Loss (MAE).
• SSIM Loss.

L1 Loss (MAE). Mean absolute error, often known as L1 loss and plotted in Fig. 6, is one of the most fundamental loss functions and a straightforward evaluation metric. According to Zaini et al. [17], it is determined by taking the absolute difference between predicted and actual values and averaging it over the whole dataset.
We may use this metric to compare predicted tumor segmentation to ground truth at the pixel level. MAE lowers the average error, whereas MSE does not; instead, MSE is very susceptible to outliers. For image enhancement, MAE will most likely produce an image that looks to be of greater quality to a human viewer, whereas MSE typically produces fuzzy output.
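The contrast between the two losses is easy to see numerically; in this short sketch the single outlying pixel contributes linearly to L1/MAE but quadratically to MSE.

```python
import numpy as np

def l1_loss(pred, target):
    # MAE: average absolute pixel difference
    return np.mean(np.abs(pred - target))

def l2_loss(pred, target):
    # MSE: squares the differences, so outliers dominate
    return np.mean((pred - target) ** 2)

pred = np.array([0.0, 0.1, 0.9])     # one outlying pixel (0.9)
target = np.zeros(3)
print(l1_loss(pred, target))          # 0.333...: the outlier counts linearly
print(l2_loss(pred, target))          # 0.273...: the outlier counts quadratically
```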
SSIM Loss. The Structural Similarity Index (SSIM) is a perceptual metric used to compare the similarity of two pictures, as shown in Fig. 7.
The SSIM measure captures three key elements of an image, as similarly proposed by Khan et al. [18]:
• Luminance. Averaging the pixel values yields the brightness. It is usually denoted by μ (mu).
• Contrast. The standard deviation (square root of variance) of all pixel values is used to compute it. It is denoted by σ (sigma), as stated by Kermi et al. [19].

Table 3 Comparison of different models

Models used    MAE (L1 loss)    SSIM loss
UNET           0.016            0.028
Pix2Pix        0.001            0.013

Fig. 6 L1 loss graph



Fig. 7 SSIM loss graph

• Structure. To obtain an output with unit standard deviation, which enables a more accurate comparison, we essentially divide the input signal by its standard deviation. The structural comparison is folded into the consolidated formula:

$$\mathrm{SSIM}(x, y) = \frac{\left(2\mu_x \mu_y + C_1\right)\left(2\sigma_{xy} + C_2\right)}{\left(\mu_x^2 + \mu_y^2 + C_1\right)\left(\sigma_x^2 + \sigma_y^2 + C_2\right)}$$
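A direct transcription of the formula above into a global (whole-image) computation is given below; library implementations such as scikit-image use a sliding window instead, and the constants C1 and C2 follow the standard choices K1 = 0.01 and K2 = 0.03 with dynamic range L = 255.

```python
import numpy as np

def ssim_global(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mu_x, mu_y = x.mean(), y.mean()             # luminance terms
    var_x, var_y = x.var(), y.var()             # contrast terms (sigma^2)
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()   # structure term (sigma_xy)
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```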

In Fig. 8, the left column represents the input MRI images as proposed by Thaha
et al. [20], the central column represents the target CT images, and the right column
represents the generated CT images produced by the model.

Fig. 8 Results of MRI brain tumor images



6 Conclusion

In this report, we presented and analyzed a few approaches for detecting brain tumors and segmenting them into different types. This study was conducted using a publicly available dataset: the LGG segmentation dataset. We compared the Pix2Pix model against the VAE and UNET models. The UNET model gave us an accuracy (1-MAE) of 92%, while the Pix2Pix model gave us an accuracy of 99%. Hence, we found that a conditional GAN, i.e., Pix2Pix, would be the best option for segmenting brain tumors.
Moreover, the model's performance could be enhanced by integrating additional parameters of the dataset. Making research like this ready to use and accessible to everyone is difficult, because of reasons like the lack of a sufficient amount of data to train a custom model, or the effort of keeping a custom model updated with new upcoming technologies.
To overcome such problems, this model can be built and deployed in production, accessible from a website, so that medical practitioners can easily get assistance and it can be made available to every medical official. This can be further extended to hospitals and medical agencies, helping them in better assessment and detection of brain tumors and their segmentation. Another element that can be added is categorizing the type of brain tumor, mainly into primary and secondary brain tumors, and further classifying it into different types.

References

1. Ranjbarzadeh R, Bagherian KA, Jafarzadeh GS, Anari S, Naseri M, Bendechache M (2021) Brain tumor segmentation based on deep learning and an attention mechanism using MRI multi-modalities brain images. Sci Rep 11(1):10930
2. Arif M, Ajesh F, Shamsudheenl S, Geman O, Izdrui D-R, Vicoveanu D (2022) Brain tumor
detection and classification by MRI using biologically inspired orthogonal wavelet transform
and deep learning techniques. J Healthcare Eng
3. Siddique N, Paheding S, Elkin CP, Devabhaktuni V (2021) U-Net and its variants for medical
image segmentation: a review of theory and applications. IEEE Access 9:82031–82057
4. Gokila Brindha P, Kavinraj M, Manivasakam P, Prasanth P (2021) Brain tumor detection from
MRI images using deep learning techniques. IOP Conf Ser Mater Sci Eng 1055:012115
5. Yousef R, Gupta G, Vanipriya CH, Yousef N (2021) A comparative study of different machine
learning techniques for brain tumor analysis. Mater Today Proc. https://doi.org/10.1016/j.matpr.
2021.03.303
6. Hossain T, Shishir FS, Ashraf M, Al Nasim MA, Shah FM (2019) Brain tumor detection
using convolutional neural network. In: 1st international conference on advances in science,
engineering and robotics technology (ICASERT), Dhaka, Bangladesh
7. Rehman M, Cho SB, Kim J, Chong K (2020) BU-Net: brain tumor segmentation using modified
U-Net architecture. Electronics
8. Creswell A, White T, Dumoulin V, Arulkumaran K, Sengupta B, Bharath AA (2018) Generative
adversarial networks: an overview. IEEE Sig Process Mag 35(1):53–65
9. Lata K, Dave M, Nishanth KN (2019) Image-to-image translation using generative adversarial
network. In: 2019 3rd international conference on electronics, communication and aerospace
technology (ICECA)

10. Saha A, Zhang YD, Satapathy SC (2021) Brain tumour segmentation with a multi-pathway
ResNet based UNet. J Grid Comput 19:43
11. Navidan H, Moshiri PF, Nabati M et al (2021) Generative adversarial networks (GANS) in
networking: a comprehensive survey and evaluation. Comput Netw
12. Vy NHA, Uyen LTT, Linh HQ (2022) Segmentation of brain tumour using UNET architecture.
In: Van Toi V, Nguyen TH, Long VB, Huong HTT (eds) 8th international conference on the
development of biomedical engineering in Vietnam. BME 2020. IFMBE Proceedings, vol 85.
Springer, Cham
13. Wang S, Dai C, Mo Y, Angelini E, Guo Y, Bai W (2020) Automatic brain tumour segmentation
and biophysics-guided survival prediction. In: Crimi A, Bakas S (eds) Brainlesion: glioma,
multiple sclerosis, stroke and traumatic brain injuries. BrainLes 2019. Lecture notes in computer
science, vol 11993. Springer, Cham
14. Fan C, Lin H, Qiu Y (2022) U-Patch GAN: a medical image fusion method based on GAN. J
Digit Imaging
15. Pereira S, Pinto A, Alves V, Silva CA (2016) Brain tumor segmentation using convolutional
neural networks in MRI images. IEEE Trans Med Imaging 35(5):1240–1251
16. Hoebel K, Andrearczyk V, Beers A, Patel J, Chang K, Depeursinge A, Müller H, Kalpathy-
Cramer J (2020) An exploration of uncertainty information for segmentation quality assess-
ment. Proc SPIE 11313. Medical Imaging
17. Syed Zaini SZ, Sofia NN, Marzuki M, Abdullah MF, Ahmad KA, Isa IS, Sulaiman SN (2019)
Image quality assessment for image segmentation algorithms: qualitative and quantitative anal-
yses. In: 2019 9th IEEE international conference on control system, computing and engineering
(ICCSCE)
18. Khan AH, Abbas S, Khan MA, Farooq U, Khan WA, Siddiqui SY, Ahmad A (2022) Intelligent
model for brain tumor identification using deep learning. Appl Comput Intell Soft Comput
2022:8104054
19. Kermi A, Mahmoudi I, Khadir MT (2019) Deep convolutional neural networks using U-Net
for automatic brain tumor segmentation in multimodal MRI volumes. In: Lecture notes in
computer science, pp 37–48
20. Thaha MM, Kumar KPM, Murugan BS, Dhanasekeran S, Vijayakarthick P, Selvi AS (2019)
Brain tumor segmentation using convolutional neural networks in MRI images. J Med Syst
43(9)
Chapter 29
Security of Electronic Voting Systems
Using Blockchain Technology

Rakesh Kumar Pandey and Rakesh Kumar Tiwari

1 Introduction

The nation, as well as the voters and their trust, depends on the integrity of an electronic voting system. The government also believes that electronic voting increases voter trust while also increasing interest in voting. With the deployment of these electronic voting systems, two key objectives can be accomplished, as described by the authors Anita et al. [1]: first, the expense of holding a presidential election is greatly reduced, and second, voting locations are made more secure. Secure electronic voting is a component of multiparty computation, in which a group of people makes decisions that are kept hidden from one another.
A safe and reliable bulletin board is necessary to provide voters with a unified
viewpoint, but it is unclear to the administration whether or not this board (the public
bulletin) can be relied upon. Blockchain is regarded as a reliable option for building
secure message boards that the general public can trust. A safe and decentralized
platform for users is provided by the emerging field of blockchain technology.
Election security may be a subject of national security in every democracy. To reduce the cost of organizing a national election while meeting and strengthening its security criteria, the possibility of electronic voting systems has been studied for 10 years in the field of computer security. Pen-and-paper voting has been a part of the legal system ever since elections were conducted democratically. The use of a substitute election technology alongside the conventional pen-and-paper method is essential to reduce fraud and make the voting process traceable and verifiable. Security experts view electronic voting equipment as defective based purely on worries about physical security. Such a device will be sabotaged by anyone who

R. K. Pandey (B) · R. K. Tiwari


TIT Science, Bhopal, India
e-mail: rp174202@gmail.com


has physical access to it, leading all votes cast through it to be altered. Blockchain, by contrast, is a distributed, irrefutable, immutable public ledger.
This article evaluates the usage of blockchain technology to create an electronic voting system.

1.1 Background Details

The authors Aste et al. and Roehrs et al. [2, 3] defined the term blockchain as a collection of data (a "block") that is protected by the widely used SHA-256 algorithm. A blockchain functions as a list of linked records, where a new block is always added at the end and contains the hash of the block before it (Lauf et al. and Khan et al. [4, 5]). Figure 1 shows the block details of the blockchain-applied electronic voting system.
In the blockchain, each block is stored in a decentralized manner, often known as a peer-to-peer network, with no central authority. Each node uses two keys (public and private), one for rendering data unreadable and the other for rendering it readable once more (Aumasson et al. [6]). Data encrypted with a public key can be decrypted only with the matching private key. According to the author Zheng et al. [7], asymmetric cryptography is what gives blockchain its irreversible and stable character. The characteristics of blockchain technology are shown in Fig. 2.
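A minimal sketch of the public/private key relationship just described, using the Python cryptography package with RSA and OAEP padding; the message content is purely illustrative and not part of any real voting protocol.

```python
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Anyone can encrypt with the public key...
ciphertext = public_key.encrypt(b"vote payload", oaep)
# ...but only the matching private key can decrypt
plaintext = private_key.decrypt(ciphertext, oaep)
assert plaintext == b"vote payload"
```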
It offers a database that is decentralized and doesn't require a trusted third party. Each node in this system keeps the block of data values locally. Blockchain was initially developed to offer secure peer-to-peer money transfers, but it is now being utilized in a variety of other industries, including healthcare, e-voting, and IoT devices, as described by the author Mathur et al. [8]. The standard SHA-256 algorithm can be better understood with the help of Fig. 3.
• The SHA-256 method produces a fixed-length output (256 bits) from an input of any arbitrary length.

Fig. 1 Voting blocks of blockchain-applied electronic voting system



Fig. 2 Architecture of a blockchain with its specific traits (immutability, anonymity, cryptography, provenance, transparency, decentralization)

Fig. 3 Steps of SHA-256 algorithm

• No matter how large or small the input is, the SHA-256 algorithm produces an output of constant length (256 bits).

The following are characteristics of a cryptographic hash function.


1. Deterministic: If we enter the same information multiple times, the outcome will always be the same.
2. Quick computation: The outcome is produced rapidly, which raises the effectiveness of the system.
3. Pre-image resistance: Assume that when we roll a die (1–6), the result we see is the hash value rather than the particular number rolled. We could compute each number's hash value and compare it with the outcome. Breaking pre-image resistance through such a brute-force approach is conceivable for bigger datasets, but the time required makes the strategy useless.
4. Sensitivity to small modifications: Sufficiently small input changes have a significant influence on the complete output (the avalanche effect).
5. Resistance to collisions: Each input will have a distinct hash value.
6. Suitable for puzzles: The hash value of a new variable is determined by the combination of two values.
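A few of these properties (determinism, fixed output length, and the avalanche effect of property 4) can be demonstrated directly with Python's hashlib; the vote strings below are illustrative only.

```python
import hashlib

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Deterministic: the same input always yields the same 256-bit digest
assert h(b"vote:alice->candidate1") == h(b"vote:alice->candidate1")

# Fixed-length output regardless of input size (64 hex chars = 256 bits)
print(len(h(b"x")), len(h(b"x" * 10_000)))

# Avalanche effect: a one-character change flips most of the digest
print(h(b"vote:alice->candidate1"))
print(h(b"vote:alice->candidate2"))
```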

1.2 Motivation

A chain of blocks containing data makes up the blockchain. Each block holds a hash reference pointing to the information in the block preceding it. As a result, any modification made to a single block by a hacker will have an impact over the entire chain, which makes this concept unique.
1. The distributed ledger has multiple locations, with no single point of failure.
2. Any proposed "new block" to the ledger should refer to the prior version of the ledger, without compromising the correctness of earlier entries, to construct the immutable chain from which the blockchain derives its name.
3. A newly proposed block of entries cannot be made a regular part of the ledger until it is approved by a majority of network nodes.
This work makes the following unique contributions:
The first step is to look into blockchain frameworks that can already be utilized to create smart contracts and electronic voting platforms. The second step is to suggest a blockchain-applied electronic voting system that modifies liquid democracy by using a "permissioned blockchain" (Chaum et al. [9]).

1.3 Objectives

The voting system here must therefore fulfill the following requirements:
1. The voting process must be openly auditable and transparent.
2. The electoral process must ensure that each voter’s vote was recorded.
3. Only eligible electors may cast ballots.
4. Voting procedures must be unbreakable.
5. Election influencing and rigging should not be permitted by any group seeking
power.
The most crucial requirements are met by a blockchain:
• Authenticity: Only registered voters will be permitted to cast a ballot.
• Anonymity: The system forbids any connection to be made between the identity
of the voters and the votes they cast.

• Accuracy: Once cast, votes are irrevocably recorded and cannot, under any
circumstances, be reversed.
• Verifiability: The system should be able to be checked to ensure that all votes
were cast.

2 Related Work

2.1 Literature Review on Existing Work

The underlying justification for the security model, together with its evaluation metrics, is presented by Adida et al. [10] in this study. Additionally, it describes the Pretty Understandable Democracy web voting scheme, which is more understandable than Pretty Good Democracy, the only other scheme that currently fits both the adequate security model and the intended security model.
Scantegrity, described by the authors Chaum et al. [9], has negligible impact on election operations and represents the first standalone E2E verification technique that preserves optical scanning as the underlying voting mechanism while allowing for a revote.
To assure fairness, the article's authors Dalia et al. [11] advise adding a commitment round and, if voters abort, a recovery round that would still allow the election results to be announced. They also offered a computational proof of ballot secrecy.
The author Bell et al. [12] of the article, discusses the STAR-Vote design, which
might serve as Travis County’s and possibly other places’ preferred next-generation
electoral system.
McCorry [13] introduced the open vote network (OVN) on Ethereum, the first implementation of an online voting system that is transparent, self-tallying, and self-reporting. The voting size in OVN was constrained by the framework to 50–60 electors. The OVN is powerless to halt the systemic corruption caused by dishonest miners, and by sending an invalid ballot, a dishonest voter can also disrupt the voting process. The election administrator has to be trusted, and the protocol makes no provisions for guaranteeing coercion resistance, as stated by the authors Zhang et al. and Chaieb et al. [14, 15].
Additionally, an extra library was needed to complete the task, because Solidity does not natively support elliptic curve cryptography (Woda et al. [16]). Once the library was implemented, the generated voting contracts became too big for storage on the blockchain. Because of previous instances of denial-of-service attacks on the Bitcoin network, OVN is susceptible to them as well (Hjálmarsson et al. [17]).
Lai et al. [18] presented DATE, which stands for "a decentralized, anonymous, transparent e-voting system" and requires minimal trust in participants. They believe that massive electronic elections can be conducted using the DATE voting system as it is currently set up. However, their proposed methodology lacks a third authority in charge of auditing the vote after the election process, hence it is ineffective at preventing DoS assaults. The approach is only suitable for small-scale elections due to the constraints of the platform.
Shahzad et al. [19] recommended the BSJC proof of completeness as a reliable electronic voting process. They used a process model to describe the framework of the whole system, and made a smaller-scale effort to address issues with election security, privacy, and anonymity. Yet numerous difficulties remain. For instance, the mathematical work required for the proof of work is significant, challenging, and labor-intensive. When a third party is engaged, there is also an issue, because there is a high possibility of data manipulation, leaks, and unfair outcomes that could affect end-to-end verification. On a wide scale, block generation and sealing could prolong the polling procedure.
An anti-quantum electronic voting mechanism based on blockchain and equipped with an audit function has been proposed by Zheng et al. [20]. Moreover, modifications have been made to the code-based Niederreiter algorithm to strengthen its resistance against quantum attacks. The key generation center (KGC) is a certification authority for certificate-less cryptography. In addition to protecting the voter's anonymity, it significantly streamlines the auditing procedure. Yet a closer examination of their approach reveals that its security and efficiency benefits hold for small-scale elections with a modest voter turnout; to preserve security, some efficiency may have to be sacrificed when the number of voters is high, as described by the author Fernández-Caramés et al. [21].
Yi [22] provided ideas for strengthening the electronic voting system's security in a peer-to-peer network in his description of a blockchain-applied e-voting scheme. A BES based on distributed ledger technology (DLT) might be used to stop voter fraud. The system was developed and tested on Linux machines connected to a P2P network. The main issue with this method concerns counter-measures against attacks. The method necessitates the involvement of reliable third parties and is not ideal for centralized application in a system with several agents. A distributed approach, such as the usage of secure multi-party computation, may be used to resolve the problem. The cost of computing could become unaffordable in this case, though, if the computation function is complex and there are too many participants (Torra et al. and Khan et al. [23, 24]).

2.2 Research Gap

One of the most recent and important technical difficulties facing e-voting systems is secure digital identity management. Before the elections, every citizen who wants to vote should be registered, and their information ought to be in a digitally processable format.

In addition, any information that involves them should keep their identity private. The outdated e-voting system has the following issues:
• Voting anonymously: After casting a ballot through the system, which may or may not include a choice for each candidate, voters should remain anonymous to everyone, including the system administrators.
• Customized voting procedures: It is still up for debate how votes are represented in the relevant databases or web apps. A hashed token is more likely to provide obscurity and integrity than a transparent text message, which is the worst possible strategy. At the same time, the vote should remain non-repudiable, since a token-based resolution alone cannot secure it.
• Voter-verifiable ballot casting: Voters should be able to see and confirm their vote at the time the ballot is cast. This is important in order to stop, or at the very least be aware of, any potentially hostile conduct. In addition to offering non-repudiation, this counter-measure can significantly increase the voters' sense of trust. Some modern applications partially address these concerns. Evidence reveals that numerous nations, like Brazil, the UK, Japan, and the Republic of Estonia, are currently using electronic voting. The Republic of Estonia should be rated differently from the others because it offers a complete e-voting system that is comparable to traditional paper-based elections.
• Expensive initial deployments, especially for businesses: While operating and
maintaining online voting systems are much less expensive than conducting
traditional elections, early deployments can be expensive.
• Growing security issues: Public opinion polls are seriously threatened by cyber-
attacks. If an election is compromised by malicious hacking, nobody would accept
the blame.
DDoS assaults are well documented and rarely occur during elections. The United States Citizen Integrity Commission has provided an affidavit regarding the state of the country's elections. Ronald Rivest made it clear that "hackers have a variety of approaches in which to attack voting machines." As an illustration, a hacking technique may make use of the barcodes on ballots and smartphones at
specific locations. He explicitly states that we shouldn't dismiss the fact that computers can be hacked and that any proof can be easily erased. Double voting and voters from opposing regions are other frequent problems.

3 The Impact of Blockchain on Electronic Voting Systems

By making voting clear and simple to use, preventing voting fraud, boosting data security, and confirming the results, blockchain technology addresses problems with the current electoral system. The electronic computerized voting procedure can be implemented on the blockchain (Xiao et al. [25]). Yet there are also significant security concerns with electronic voting, such as the potential for vote fraud and abuse if

Fig. 4 Blockchain-based versus traditional voting

a voting system is compromised. Despite all of its potential advantages, nationwide adoption of electronic voting is still lacking. Blockchain technology provides a workable workaround for the risks associated with today's electronic voting. Figure 4 illustrates the main distinction between the two systems.
It is a digitally decentralized, secured, and transparent platform where manipulation or fraud cannot be carried out without defeating the underlying technology. Because of the blockchain's decentralized architecture, an electronic voting system based on blockchain reduces the risks associated with online voting while also making the voting process tamper-proof. Figure 5 depicts the fully distributed voting infrastructure required for a blockchain-based electronic voting system. The author of this study, Imperial [26], emphasized that blockchain-based electronic voting will only be viable in settings where no single organization, not even the government, has entire authority over the system for voting online.
In conclusion, free and fair elections can only occur in a society if the legitimacy of those in positions of power is widely accepted. Polling can be strengthened in terms of administration and engagement by drawing on expertise in the fields above as a starting point; blockchain technology, moreover, provides a novel method for electronic voting.
Despite being decentralized and entirely transparent, the blockchain voting mechanism protects voters. This means that anyone can use blockchain electronic voting to count the votes, but no one will know who cast a vote for whom. The block detail of the e-voting system using blockchain technology is shown in Fig. 6. Conventional e-voting and electronic voting powered by blockchain rest on very different conceptions.

Fig. 5 Blockchain-based electronic voting system

Fig. 6 Block detail of the e-voting system (fields: voter's ID, vote, vote's signature, timestamp, hash of the previous block)
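To tie Fig. 6 to the hash chaining described in Sect. 1.1, here is a minimal, hedged Python sketch of a vote block whose fields mirror Fig. 6; the signature is a placeholder string rather than a real cryptographic signature, and the field types are assumptions.

```python
import hashlib
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class VoteBlock:
    voter_id: str
    vote: str
    signature: str   # placeholder; a real system would sign cryptographically
    timestamp: float
    prev_hash: str

    def hash(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

genesis = VoteBlock("-", "-", "-", time.time(), "0" * 64)
block1 = VoteBlock("V001", "candidate-A", "sig", time.time(), genesis.hash())
# Each block embeds the previous block's hash, so tampering with the
# genesis block would make this check fail for every later block:
assert block1.prev_hash == genesis.hash()
```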

4 Conclusion

A secure electronic voting system is part of a multi-party computation in which a group of persons makes choices that are kept secret from one another. To provide a consistent view for the voters there is a need for a secure and trusted bulletin board, yet it is not clear to the administration whether this board (the public bulletin) can be trusted or not. Blockchain is considered a trusted solution for creating a secure bulletin board that can be trusted publicly. Blockchain is a new, growing technology that provides a secure and peer-to-peer platform for users. Therefore, this paper surveyed the usage of blockchain in electronic voting, showing how the existing electronic voting system can be replaced.

References

1. Lahane, A.A., Patel, J., Patha, T., Potdar, P.: Blockchain technology based e-voting system.
ITM Web Conf. 32, 1–8 (2020)
2. Aste T, Tasca P, Di Matteo T (2017) Blockchain technologies: the foreseeable impact on society
and industry. Computer 50:18–28
3. Roehrs A, da Costa CA, da Rosa Righi R, Alex R, Costa CA, Righi RR (2017) OmniPHR: a
distributed architecture model to integrate personal health records. J. Biomed. Inform. 71:70–81
4. Sleiman, M.D., Lauf, A.P., Yampolskiy, R.: Bitcoin message: data insertion on a proof-of-work
cryptocurrency system. In: Proceedings of the 2015 International Conference on Cyberworlds
(CW), Visby, Sweden, 7–9 Oct. 2015, pp. 332–336
5. Khan, M.A., Salah, K.: IoT security: review, blockchain solutions, and open challenges. Future
Gener. Comput. Syst. 82, 395–411 (2018)
6. Aumasson, J.: Serious Cryptography: A Practical Introduction to Modern Encryption. No
Starch Press, San Francisco, CA, USA (2017)
7. Zheng, Z., Xie, S., Dai, H., Chen, X., Wang, H.: An overview of blockchain technology:
architecture, consensus, and future trends. In: Proceedings of the 2017 IEEE International
Congress on Big Data (BigData Congress), Boston, MA, USA, 11–14 Dec. 2017, pp. 557–564
8. Mathur, G., Pandey, A., Goyal, S.: Immutable DNA sequence data transmission for next gener-
ation bioinformatics using blockchain technology. In: 2nd International Conference on Data,
Engineering and Applications (IDEA), Bhopal, India, pp. 1–6 (2020). https://doi.org/10.1109/
IDEA49133.2020.9170715
9. Chaum, D., Essex, A., Carback, R., Clark, J., Popoveniuc, S., Sherman, A., Vora, P.: Scantegrity: end-to-end voter-verifiable optical-scan voting. IEEE Secur. Privacy 6(3), 40–46 (2008)
10. Adida, B.: Helios: web-based open-audit voting. In: Proceedings of the 17th Conference on
Security Symposium, ser. SS'08. USENIX Association, Berkeley, CA, USA, pp. 335–348 (2008)
11. Dalia, K., Ben, R., Peter, Y.A., Feng, H.: A fair and robust voting system. by broadcast. In: 5th
International Conference on E-voting (2012)
12. Bell, S., Benaloh, J., Byrne, M.D., Debeauvoir, D., Eakin, B., Kortum, P., McBurnett, N.,
Pereira, O., Stark, P.B., Wallach, D.S., Fisher, G., Montoya, J., Parker, M., Winn, M.:
Star-vote: a secure, transparent, auditable, and reliable voting system. In: 2013 Electronic
Voting Technology Workshop/Workshop on Trustworthy Elections (EVT/WOTE 13). USENIX
Association, Washington, DC (2013)
13. McCorry, P., Shahandashti, S.F., Hao, F.: A smart contract for boardroom voting with maximum
voter privacy. In: Proceedings of the International Conference on Financial Cryptography and
Data Security, Sliema, Malta, 3–7 Apr. 2017
14. Zhang, S., Wang, L., Xiong, H.: Chaintegrity: blockchain-enabled large-scale e-voting system
with robustness and universal verifiability. Int. J. Inf. Sec. 19, 323–341 (2019)
15. Chaieb, M., Koscina, M., Yousfi, S., Lafourcade, P., Robbana, R.: DABSTERS: distributed
authorities using blind signature to effect robust security in e-voting. Available online https://
hal.archives-ouvertes.fr/hal-02145809/document. Accessed on 28 July 2020

16. Woda, M., Huzaini, Z.: A proposal to use elliptical curves to secure the block in e-voting
system based on blockchain mechanism. In: Proceedings of the International Conference on
Dependability and Complex Systems, Wrocław, Poland, 28 June–2 July 2021
17. Hjálmarsson, F.Þ., Hreiðarsson, G.K., Hamdaqa, M., Hjálmtýsson, G.: Blockchain-based e-
voting system. In: Proceedings of the 2018 IEEE 11th International Conference on Cloud
Computing (CLOUD), San Francisco, CA, USA, 2–7 July 2018
18. Lai, W.J., Hsieh, Y.C., Hsueh, C.W., Wu, J.L.: Date: a decentralized, anonymous, and trans-
parent e-voting system. In: Proceedings of the 2018 1st IEEE International Conference on Hot
Information-Centric Networking (HotICN), Shenzhen, China, 15–17 Aug. 2018
19. Shahzad B, Crowcroft J (2019) Trustworthy electronic voting using adjusted blockchain
technology. IEEE Access 7:24477–24488
20. Gao, S., Zheng, D., Guo, R., Jing, C., Hu, C.: An anti-quantum e-voting protocol in blockchain
with audit function. IEEE Access (2019)
21. Fernández-Caramés, T.M., Fraga-Lamas, P.: Towards post-quantum blockchain: a review on
blockchain cryptography resistant to quantum computing attacks. IEEE Access 8, 21091–21116 (2020)
22. Yi, H.: Securing e-voting based on blockchain in P2P network. EURASIP J. Wirel. Commun.
Netw. 2019, 137 (2019)
23. Torra V (2019) Random dictatorship for privacy-preserving social choice. Int. J. Inf. Sec.
19:537–543
24. Khan KM, Arshad J, Khan MM (2020) Investigating performance constraints for blockchain
based secure e-voting system. Future Gener. Comput. Syst. 105:13–26
25. Xiao, S., Wang, X.A., Wang, W., Wang, H.: Survey on blockchain-based electronic voting.
In: Proceedings of the International Conference on Intelligent Networking and Collaborative
Systems, Oita, Japan, 5–7 Sept. 2019
26. Imperial, M.: The democracy to come? An enquiry into the vision of blockchain-powered
e-voting start-ups
Chapter 30
Go-Kart Simulation in HoloLens

K. Paridhi, Shola Olabisi, Y.V. Srinivasa Murthy, and J. Vaishnavi

1 Introduction

There is a huge gap between enterprise applications and gaming applications for HoloLens [1]. People tend to use the HoloLens either for industrial or for entertainment purposes. However, many daily-life problems can be solved through HoloLens application development because of the HoloLens's feature set. Hence, an effort has been made to develop an application with a creative enterprise solution that can be useful for the car industry, in both the manufacturing and sales departments [2, 3].
We drew motivation from the major car companies working to deliver mixed reality experiences that enhance a car's features. Here are some quotes from these companies.
Make Way for Holograms: New Mixed Reality Technology incorporates with Car Design
as Ford Tests Microsoft HoloLens Globally. - FORD [4]

HoloLens: Peering into the soul of a Volvo. - Volvo [5]

Volvo engineers use Microsoft HoloLens to design cars digitally. Since simulation plays an important role in designing cars, the Swedish engineers were the first to use HoloLens mixed reality to interact with virtual parts. Around 165 million dollars

K. Paridhi · Y.V. Srinivasa Murthy (B) · J. Vaishnavi


Vellore Institute of Technology (VIT), Vellore, Tamil Nadu 632 014, India
e-mail: vishnu.murthy@vit.ac.in
J. Vaishnavi
e-mail: jvaishnavi.2019@vitstudent.ac.in
URL: http://www.vit.ac.in
S. Olabisi
College of Engineering Technology, Rochester Institute of Technology (RIT), Rochester, NY
14623, US
e-mail: sooiee@rit.edu
URL: https://www.rit.edu/


was spent on an autonomous vehicle test facility to start Phase-II construction work [6]. Apart from various engineering visualizations and remote diagnosis, the HoloLens could deliver further advantages to a race team, offering notable benefits to drivers. With several emerging head-mounted display technologies, it is essential to comprehend what makes the HoloLens "mixed reality" approach different [7].
Virtual reality (VR) devices like the Oculus Rift deliver immersive experiences that substitute for the real world. This prevents you from seeing the interior of the actual car. Furthermore, the Oculus Rift is a fully tethered appliance, necessitating a large gaming computer and several wires running from it. This creates a fun gaming environment but has no practical usage in the real world for racing cars or for simulation in next-generation training systems.
The HoloLens is a wearable computer which adds various other innovations such as AI and speech engines, gaze, gestures, spatial understanding, spatial audio, and several other sensors [8]. The HoloLens provides several innovations that can carefully and safely keep the driver well informed, competitive, and in control, both in the race world and in the simulation. The HoloLens also delivers many advantages over conventional immersive headset technologies, the most significant being that it can identify what the driver is looking at, a feature Microsoft HoloLens delivers as "Gaze" [9].
The HoloLens uses mixed reality (MR) technology to interact with the real world. MR blends VR and augmented reality (AR) technologies to create an environment where both physical and virtual objects can be interacted with at the same instant [10]. This ability to interact with both physical and digital objects gives MR an immense number of potential applications. The HoloLens could potentially become common in schools, colleges, and hospitals and be used in a variety of other professions. MR will also be seen in retail departments such as e-commerce and fashion.
Holographic technologies are also being used in the education and healthcare industries, both to enhance students' ability to learn and to be interactive [11]. The following are simple ways in which MR can help in the classroom.
i. Interacting with objects in the environment in an immersive experience.
ii. Touching and manipulating 3D objects in a real-world environment.
iii. It is an interactive and fun way of learning.
iv. MR can also be used to teach different subjects to specially abled students.
A majority of fields, including civil and mechanical engineering, architecture, and others, have been using MR to design things like buildings and cars as digital prototypes of the real world. Companies have invested in cave automatic virtual environment (CAVE) technology, where developer teams can view objects projected on the floor and modify designs in real time, reshaping and removing or adding different elements, saving money on physical models and speeding up design. This can also be helpful for working remotely: engineers can view the objects remotely through an immersive headset to connect, interact, and identify problems, or collaborate with workers on site in real time. Using MR technology, engineers in other disciplines will also work differently as the tools upend the design process.

In this paper, a conventional clay car model is transformed into digital objects embedded in the real world [12]. To embed fully functional digital objects into the real world, we need to make the real-world environment work with mixed reality technologies. We have also implemented an ML-based self-driving car model in the Go-Kart system so that we can have an automated car. This results in a very interactive, less time-consuming system with a one-time investment: people no longer have to rely on a conventional clay model to show the specifications, and both producers and consumers can modify and communicate about the system and the cars at any time. Not only do they get an interactive 3D system in the real world, they can also see the automated version of a car on a track with a self-driving deep learning car model in it [13]. The application's examined capabilities are to:
i. Provide the consumer with suitable car features and information based on where they look and the interactions performed.
ii. Provide cars with suitable real-time information in response to the gesture and speech commands given through MR technology and the self-driving car model.
iii. Track the position of the car and give out the best projection.
iv. Provide drivers with relevant car-related information through the object detection system built into the self-driving training model, e.g., speed and steering angle.
v. Allow the car to be configured by the person through voice, gaze, and other HoloLens interaction facilities.
The rest of the paper is organized as follows: Sect. 2 briefly reviews research in the field of Go-Kart simulation. The proposed methodology is explained in Sect. 3. Section 4 presents the simulated images and observations. The paper is concluded in Sect. 5 with remarks.

2 Literature Review

There is very little literature available related to this work. Moreover, there are no sufficient data links available for such systems, because people mostly treat them as fun gaming environments. No dataset is directly available for developing a self-driving car model; hence, we have to develop a dataset first and then train the model for Go-Kart simulation. Tasks such as object detection and classification are burdensome with such a dataset.
It is possible to use the popular convolutional neural networks (CNNs) for object detection and tracking. A real-time CNN contains many interconnections and complicated mathematical computations, which require plenty of processing power and computation time [14]. The precision on the image dataset is directly dependent on the computation time; however, for a real-time model, some accuracy must be traded off for better computation time. A categorized dataset cannot be reused across several detection approaches because they need distinctive preprocessing and clustering functions. A lot of research is still left for developing mixed reality applications, especially in the car manufacturing industry

[15]. The design process and the creation of a new dataset are quite difficult and tedious. In this work, we made an effort to use the mixed reality HoloLens concept for Go-Kart simulation.

3 Proposed Methodology

An effort has been made to develop a Go-Kart simulation system in HoloLens through a mixed reality application in which we can see a detailed version of a chosen car, a chosen race track, and a simulation of an automated driving car scene. These features were built through the following three modules:
i. Developing the deep learning self-driving car model.
ii. Developing the mixed reality application.
iii. Configuring the CNN model to drive the car in the mixed reality application's racetrack scene.

3.1 Mixed Reality Application Development

The development of the application required building a race track, which we created, and a car model, which we took from the standard assets provided by Unity. Further, elements like buttons, panels, and scenes were developed to add functions and specific features. The code was written using Visual Studio (VS) 2019. Later, we deployed the application to the remote machine (HoloLens). The system consists of a mixed reality application which serves as a platform to see the 3D car model, the 3D track, and their specifications. Car simulation with a self-driving deep learning model has been implemented in HoloLens, along with a hardware prototype listing all software and hardware specifications needed to build such a model.

3.2 Deep Learning Self-driving Car Model

In this paper, we first implemented the task of detecting lane lines to give the car its direction, and further focused on implementing number detection and traffic signal detection. After configuring the code for these algorithms, recording was started through the left, centre, and right cameras, respectively. Nearly 13,000 images were collected to form the dataset. The collected images were pre-processed to train the model with different techniques such as zoomed images (focusing only on the track), other augmentation techniques, and panned images [16]. Considering behavioural cloning (the Nvidia model architecture) for training, the pre-processed images were used to train a neural network model. In this case, we implemented a CNN model with backpropagation to minimize the error function of the chosen task. The information related to the training model architecture is explained below.
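Before moving to the architecture, here is a hedged sketch of the pre-processing and panning steps just described, in the usual OpenCV form for an Nvidia-style pipeline; the crop rows, blur kernel, and pan ranges are illustrative assumptions rather than the exact values used.

```python
import cv2
import numpy as np

def preprocess(img):
    # Crop sky/hood rows, convert to YUV (as in the Nvidia pipeline),
    # blur, resize to 200x66 and normalize to [0, 1]
    img = img[60:135, :, :]
    img = cv2.cvtColor(img, cv2.COLOR_RGB2YUV)
    img = cv2.GaussianBlur(img, (3, 3), 0)
    img = cv2.resize(img, (200, 66))
    return img / 255.0

def pan(img):
    # Random horizontal/vertical pan used as augmentation
    tx, ty = np.random.uniform(-30, 30), np.random.uniform(-10, 10)
    m = np.float32([[1, 0, tx], [0, 1, ty]])
    return cv2.warpAffine(img, m, (img.shape[1], img.shape[0]))
```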
Training model architecture. After pre-processing all the data, we started designing our model architecture. Handling such large datasets was a challenge: there were about 35,000 images of size 32 × 32 for traffic sign detection, and 13,000 images of size 200 × 66 taken from the centre, left, and right cameras to train the car model. A suitable model for behavioural cloning in this case is the Nvidia model.
The Nvidia model is an end-to-end learning architecture for self-driving cars that has been deployed in real-life autonomous vehicles. The architecture starts with an input plane consisting of our 66 × 200 YUV images, which are normalized and pre-processed in code. This data is then passed to the convolutional layers; after importing the Conv2D libraries, the convolutional network is added layer by layer.
The first layer consists of 24 filters with a kernel of size 5 × 5. The kernel is slid across the image according to the stride setting (the strides argument refers to the stride length), aggregating each local patch of pixels into a single output value. We use the ReLU activation function when adding such layers to our CNN. The next layer is a 2D convolution consisting of 36 filters with a kernel size of 5 × 5. Similarly, all remaining layers are added to the convolutional stack, keeping their kernel sizes and image dimensions in mind. Finally, we combine all the layers to obtain our training model; the error metric is squared error, so the loss is the mean squared error (MSE), and we use the Adam optimizer.
Keeping the learning rate low helps improve accuracy when training this architecture. To overcome over-fitting, we also used dropout layers in between; this helps generalize the training data, since the network learns from varying combinations of nodes. In the end, we collected the parameter details to get an in-depth summary of all the parameters inside our model. To train the data we used 30 epochs, which is fairly high but results in an efficiently trained model. The network architecture diagram is given in Fig. 1.
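A minimal Keras sketch of the Nvidia-style architecture described above. The 24- and 36-filter 5 × 5 layers, the 66 × 200 YUV input, the MSE loss, the Adam optimizer with a low learning rate, and the dropout layer follow the text; the remaining layer sizes follow the published Nvidia model and are assumptions here, not values confirmed by the chapter.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Flatten, Dense, Dropout
from tensorflow.keras.optimizers import Adam

def nvidia_model():
    model = Sequential([
        Conv2D(24, (5, 5), strides=(2, 2), activation='relu',
               input_shape=(66, 200, 3)),
        Conv2D(36, (5, 5), strides=(2, 2), activation='relu'),
        Conv2D(48, (5, 5), strides=(2, 2), activation='relu'),
        Conv2D(64, (3, 3), activation='relu'),
        Conv2D(64, (3, 3), activation='relu'),
        Dropout(0.5),                      # dropout to reduce over-fitting
        Flatten(),
        Dense(100, activation='relu'),
        Dense(50, activation='relu'),
        Dense(10, activation='relu'),
        Dense(1),                          # steering angle (regression)
    ])
    model.compile(loss='mse', optimizer=Adam(learning_rate=1e-4))
    return model
```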

3.3 Configuration of Trained Model with Application

The deep learning self-driving car model is configured to the car in the mixed reality application's racetrack scene. It was observed that the model is efficient for our car model. We configured the model via the Python code run from the command prompt against the application's Unity racetrack; it is then ready to use, and it displays the automated simulation of the car.

Fig. 1 NVIDIA self-driving car architecture considered for the experimentation

To configure the deep learning model, we built a client-server setup: the client side holds the training images, while the server side holds the trained deep learning model and code. We created a virtual environment and imported all the libraries and packages required to run the model; this became the server side. The server runs the model and, taking the stream of images and the corresponding values of steering angle, throttle, and speed as references, learns to drive the car autonomously.
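A heavily hedged sketch of what such a server side could look like, assuming a python-socketio based simulator client in the style of the common Udacity setup; the event names, the telemetry fields, the port, the model filename, and the throttle rule are all assumptions rather than details from the paper, and preprocess refers to the earlier pre-processing sketch.

```python
import base64
import io

import eventlet
import numpy as np
import socketio
from PIL import Image
from tensorflow.keras.models import load_model

sio = socketio.Server()
model = load_model('model.h5')  # hypothetical trained-model file

@sio.on('telemetry')
def telemetry(sid, data):
    # Decode the centre-camera frame sent by the simulator client
    img = np.asarray(Image.open(io.BytesIO(base64.b64decode(data['image']))))
    img = preprocess(img)  # the pre-processing sketch from Sect. 3.2
    steering = float(model.predict(img[None, ...], verbose=0)[0][0])
    throttle = 1.0 - float(data['speed']) / 25.0  # crude speed governor
    sio.emit('steer', data={'steering_angle': str(steering),
                            'throttle': str(throttle)})

app = socketio.WSGIApp(sio)
eventlet.wsgi.server(eventlet.listen(('', 4567)), app)
```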

4 Results and Observations

When the application is connected, it lands us in the main menu scene named HoloMenu, which contains four blocks; each block is a page containing information about one element of the menu, as shown in Fig. 2a.
The first block, named car information, contains information about the car and its specifications, along with the 3D model, which we can interact with. The model can be rotated and resized so that we can analyse and inspect the designed model for any defects. The same has been depicted in Fig. 2b–d.
The second block, named track information, contains information about the track and the assets present in the track scene, along with a 3D model of the track which can likewise be interacted with through a bounding box. The track can be resized and rotated to analyse its details. The pink colour shown on the car info page indicates that that block was pressed earlier, as shown in Fig. 2e–g. The third block of the application contains the hardware requirements: the hardware and software configurations needed to make a real-life model.
The fourth block directs us to the main scene where we simulate the automation of the car using the self-driving deep learning model. Before we can actually start simulating the car, we need to start the server which runs the deep learning model; it takes the image data and values from the client side and generates the steering angle, throttle, and car speed for every image instance, given the previous data available at the client side. This is depicted in Fig. 2h, with the virtual-environment server-side connection running the model and producing the steering angle, throttle, and speed of the car.

5 Conclusion and Future Work

This application can be used by Go-Kart/race car drivers as well as car industries. Go-Kart or race car drivers can analyse the track on which they are racing and deploy their car into the mixed reality environment. They can see their car's maximum potential, or how the car would be driven on the race track model loaded into the HoloLens application. Since the car uses behavioural cloning, it can be trained according to the driver's capabilities. The application can also be used by car industries: instead of building a clay model to inspect a car, they can load the model into the HoloLens application, analyse the car in the mixed reality environment, and make changes to it accordingly. The future goal of this work is to run the 3D model of the car in the real-world environment, so that we do not need to import a virtual world to simulate the automated car, and to give users a feel for the look of their pre-ordered vehicle.

Fig. 2 Sample screenshots obtained from the proposed Go-Kart simulation model: a application's main menu page for Go-Kart simulation; b gaze interaction performed to interact with the car info page; c car information page after gesture tap of the block; d bounding box function to zoom or rotate the car; e track information page gaze interaction; f track information page after gesture tap; g hardware information page; h automated race track scene

References

1. Taylor AG (2016) Develop Microsoft HoloLens apps now. Springer


2. Juraschek M, Büth L, Posselt G, Herrmann C (2018) Mixed reality in learning factories. Procedia Manuf 23:153–158
3. Srivastava JP, Readdy GG, Moizuddin M, Theja KS, Sambasiva Rao N (2020) Case study on
different go kart engine transmission systems. In: IOP conference series: materials science and
engineering, vol 981. IOP Publishing, p 042026
4. Jones C (2019) Mixed reality’s ability to craft and establish an experience of space
5. Jana A, Sharma M, Rao M (2017) HoloLens blueprints. Packt Publishing Ltd
6. Hussain R, Zeadally S (2018) Autonomous cars: research results, issues, and future challenges. IEEE Commun Surv Tutor 21(2):1275–1313
7. Strzys MP, Kapp S, Thees M, Kuhn J, Lukowicz P, Knierim P, Schmidt A (2017) Augmenting
the thermal flux experiment: a mixed reality approach with the hololens. Phys Teach 55(6):376–
377
8. Bahri H, Krčmařík D, Kočí J (2019) Accurate object detection system on hololens using yolo
algorithm. In: 2019 international conference on control, artificial intelligence, robotics and
optimization (ICCAIRO). IEEE, pp 219–224
9. van der Meulen H, Kun AL, Shaer O (2017) What are we missing? adding eye-tracking to the
hololens to improve gaze estimation accuracy. In: Proceedings of the 2017 ACM international
conference on interactive surfaces and spaces, pp 396–400
10. Park S, Bokijonov S, Choi Y (2021) Review of Microsoft HoloLens applications over the past five years. Appl Sci 11(16):7259
11. Paredes SG, Vázquez NR (2020) Is holographic teaching an educational innovation? Int J
Interact Des Manuf (IJIDeM) 14(4):1321–1336
12. Lakshmanasamy J, et al (2017) Optimization of the support frame for clay model cars
13. Wang W, Wu X, Chen G, Chen Z (2018) Holo3DGIS: leveraging Microsoft HoloLens in 3D geographic information. ISPRS Int J Geo-Inf 7(2):60
14. Naritomi S, Tanno R, Ege T, Yanai K (2018) Foodchangelens: Cnn-based food transformation
on hololens. In: 2018 IEEE international conference on artificial intelligence and virtual reality
(AIVR). IEEE, pp 197–199
15. Blanco-Novoa Ó, Fraga-Lamas P, Vilar-Montesinos MA, Fernández-Caramés TM (2020) Cre-
ating the internet of augmented things: an open-source framework to make iot devices and
augmented and mixed reality systems talk to each other. Sensors 20(11):3328
16. Chy MKA, Masum AKM, Sayeed KAM, Uddin MZ (2021) Delicar: a smart deep learning
based self driving product delivery car in perspective of Bangladesh. Sensors 22(1):126
Chapter 31
A Survey on Different Techniques
for Anomaly Detection

Priyanka P. Pawar and Anuradha C. Phadke

1 Introduction

Both the governmental and private sectors employ video surveillance equipment. It has far-reaching ramifications in the fight against crime and terrorism. Understanding human behavior from video is an important branch of computer vision research that has become a major focus of recent work. Recent advances in computer vision, the availability of affordable equipment such as video cameras, and a wide stream of new applications such as personal identification and visual surveillance are all driving interest in human motion analysis. It can analyze the motion of a human or a body part from monocular or multi-view video pictures with no need for human involvement.
Virtual reality, medical diagnostics, physical performance assessment, and human–machine interaction have all been fascinating applications of human body movement analysis research. Tracking and estimating motion characteristics, studying the human body structure, and detecting motion activities are the three general research directions taken into account while analyzing human body motion. Intelligent vision analysis is one of the essential technologies in intelligent environments, security monitoring, and human–computer interaction. This method is based on the detection of moving objects; its main purpose is to detect moving objects in relation to the entire picture. Other sophisticated applications, such as target tracking, target categorization, and target behavior comprehension, are built on the basis of detecting moving objects.

P. P. Pawar (B) · A. C. Phadke


School of Electronics and Communication Engineering, Dr. Vishwanath Karad MIT World Peace
University, Pune, India
e-mail: priyankashitole.16@gmail.com
A. C. Phadke
e-mail: anuradha.phadke@mitwpu.edu.in


The frame subtraction approach, the background subtraction method, and the optical flow method are the most commonly used methods in moving object detection today. The frame difference (frame subtraction) technique detects moving objects by computing the changes between pixels in successive frames of a video sequence and extracting motion areas using a time-difference threshold between adjacent frames' pixels. Although frame subtraction techniques adapt well to scenes with abrupt lighting changes, certain crucial pixels cannot be retrieved, resulting in gaps inside moving objects. The optical flow technique calculates the image optical flow field and performs clustering based on the optical flow distribution features of the picture. This approach can obtain comprehensive activity statistics and better distinguish the moving object from the background, but it is not suited for demanding real-time situations due to the large number of calculations required, susceptibility to noise, and poor anti-noise performance.
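To make the frame-subtraction approach concrete, the following is a minimal Python sketch using OpenCV, not drawn from any of the surveyed systems; the video path, the per-pixel difference threshold, and the minimum count of changed pixels are illustrative assumptions.

import cv2

VIDEO_PATH = "surveillance.avi"  # hypothetical input video
DIFF_THRESHOLD = 25              # per-pixel intensity-change threshold
MIN_MOTION_PIXELS = 500          # changed pixels needed to flag motion

cap = cv2.VideoCapture(VIDEO_PATH)
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev_gray)          # change between adjacent frames
    _, motion = cv2.threshold(diff, DIFF_THRESHOLD, 255, cv2.THRESH_BINARY)
    if cv2.countNonZero(motion) > MIN_MOTION_PIXELS:
        print("moving object detected in this frame")
    prev_gray = gray
cap.release()

As the text notes, such a detector reacts only to sufficiently large pixel changes, so gaps can appear inside slowly moving objects.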

2 Survey Details

Despite the significant advancement achieved using deep learning methods in many machine learning tasks, deep learning approaches remain relatively rare in anomaly detection. A number of authors have surveyed deep learning algorithms based on their intended use, for example, fraud detection, cyber-intrusion detection, the medical domain, IoT, big data anomaly detection, etc. The deep neural network design is chosen based on the nature of the input data, which is classified as sequential or non-sequential. Architectures such as CNN, RNN, and LSTM are used for sequential inputs, while CNN, AE, and its variants are used for non-sequential inputs. The availability of labels is also a factor in deep learning detection algorithms: labels indicate whether a given data point is typical or an outlier. Based on these labels, methods are classified as supervised, semi-supervised, and unsupervised deep anomaly detection. Some new techniques have been employed depending upon the training objectives, namely deep hybrid models and one-class neural networks. This paper surveys the various methods and algorithms used in various applications.

2.1 Supervised Anomaly Detection

Supervised Deep Anomaly Detection (DAD) involves utilizing labels of normal and abnormal data samples to train a deep supervised classifier, which can be binary or multi-class. Despite their better effectiveness, supervised DAD approaches are not as common as semi-supervised or unsupervised methods due to the scarcity of labeled training data. Furthermore, the performance of a deep supervised classifier used as an anomaly detector is sub-optimal owing to class imbalance (the total number of positive-class instances far exceeds the total number of negative-class instances) [6]. The most often used supervised algorithms are decision trees, support vector machines (SVMs), supervised neural networks, k-nearest neighbors, and Bayesian networks.
2.1.1. k-NN estimates the approximate distances between various points in the input vectors and then assigns the unlabeled point to the class of its k nearest neighbors. Shailendra and Sanjay [6] proposed a hybrid feature selection strategy that combines a two-phase filter and a wrapper. The filter phase chooses the features with the largest information gain and sends them to the wrapper phase, which generates the final feature subset. To classify attacks, the final feature subsets are fed into the k-nearest neighbor classifier. The usefulness of this approach is demonstrated using the DARPA KDDCUP99 cyberattack dataset. A minimal sketch of the classification step appears below.
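The following scikit-learn sketch shows only the k-NN classification step; the synthetic two-class data and k = 5 are illustrative assumptions and do not reproduce the feature-selection pipeline of [6].

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (500, 10)),   # stand-in "normal" records
               rng.normal(3.0, 1.0, (50, 10))])   # stand-in "attack" records
y = np.array([0] * 500 + [1] * 50)                # 0 = normal, 1 = attack

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5)         # label by the 5 nearest neighbors
knn.fit(X_tr, y_tr)
print("test accuracy:", knn.score(X_te, y_te))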
2.1.2. The Bayesian network approach is commonly used for intrusion detection in conjunction with statistical systems. According to Johansen and Lee [7], a Bayesian network approach provides a sufficient mathematical basis for making a seemingly tough problem simple. They suggest that Bayesian network-based intrusion detection systems discern between attacks and regular network activity by comparing metrics from each network traffic sample. Moore and Zuev [8] employed a supervised Naive Bayes classifier using 248 flow characteristics, in addition to various TCP-header-derived features, to discern between different types of applications. Correlation-based feature selection was utilized to create stronger features, and it revealed that good classification requires just a small subset of fewer than 20 characteristics.
2.1.3. Supervised neural network (NN). If correctly designed and trained, an NN has the potential to solve many of the difficulties experienced by rule-based techniques. The most widely utilized supervised neural networks are the multi-layer perceptron (MLP) and the radial basis function (RBF) network. Moradi and Zulkernine [9] and Sammany et al. [11] employed a three-layer MLP (two hidden layers) to not only detect normal and attack connections but also to identify the attack type. Jiang et al. [10] proposed a novel method for detecting misuse and anomalies with a hierarchical RBF network. In the first layer, an RBF anomaly detector determines whether an event is normal or abnormal. Anomalous events are then sent through an RBF misuse detector chain, with each detector detecting a different sort of attack. Any anomalous occurrences that were not categorized by any misuse detector were recorded in a database. If enough anomalous events were recorded, they were categorized into distinct categories by a C-means clustering technique, which was then used to train a misuse RBF detector that was added to the misuse detector chain. This method automatically detects and labels all intrusion occurrences.
2.1.4. A decision tree has nodes, arcs, and leaves as its main components. Decision trees for DoS attacks, R2L attacks, U2R attacks, and scan attacks were constructed by Lee et al. [12]. The ID3 method is utilized as the learning algorithm to automatically create the decision tree.

2.1.5. A support vector machine (SVM) first translates the input vector into a higher-dimensional feature space and then finds the best separating hyperplane in that space. Furthermore, the separating hyperplane, which is defined by support vectors rather than the entire training sample, is particularly resilient against outliers. The PSO–SVM model suggested by Wang et al. [13] is applied to the intrusion detection problem, with the standard PSO used to select the parameters of the support vector machine and the binary PSO utilized to acquire the best feature subset when building the intrusion detection system. Mukkamala et al. [14] created a model to detect network anomalies by “applying kernel classifiers and classifier construction approaches to network anomaly detection challenges.” They investigated the effect of kernel type and parameter values on the accuracy of intrusion categorization performed by a support vector machine (SVM). A simplified sketch of such a classifier is given below.
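A minimal sketch of SVM-based intrusion classification follows, with the caveat that C and gamma are fixed illustrative assumptions here, whereas in [13] they would be chosen by particle swarm optimization.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (500, 10)), rng.normal(3, 1, (50, 10))])
y = np.array([0] * 500 + [1] * 50)             # 0 = normal, 1 = attack

model = make_pipeline(
    StandardScaler(),                          # SVMs are scale-sensitive
    SVC(kernel="rbf", C=1.0, gamma="scale"),   # kernel lifts data to a richer space
)
model.fit(X, y)
print("training accuracy:", model.score(X, y))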

2.2 Semi-Supervised Anomaly Detection

Because labels for normal examples are much easier to get than labels for anomalies, semi-supervised DAD approaches have become more popular; they employ existing labels of one class (usually the positive class) to differentiate anomalies. Deep autoencoders are commonly used in outlier detection by training them semi-supervised on data samples with no abnormalities [7]. Semi-supervised (or one-class classification) DAD approaches presume that all training cases have just one class label. Because computer networks are becoming more complex, network intrusion detection systems (NIDSs) are becoming increasingly important. Machine learning-based detection systems have received a lot of interest because of their capacity to detect new attacks [16]. However, training an efficient model requires a sufficient amount of labeled training data, which is tough to gather and not affordable. To that end, it is necessary to develop models that can learn from unlabeled or partially labeled data [16]. Min et al. [16] provide SU-IDS, an autoencoder-based system for semi-supervised and unsupervised network anomaly detection. The methodology improves performance by supplementing the standard clustering loss of an autoencoder. The experimental findings on the traditional NSL-KDD dataset and the contemporary CICIDS2017 dataset suggest that the proposed models are superior. A minimal reconstruction-error sketch follows.
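The core idea of training an autoencoder on normal-only samples and thresholding the reconstruction error can be sketched as follows in Keras; the layer sizes, training schedule, and 95th-percentile threshold are illustrative assumptions, not the SU-IDS configuration.

import numpy as np
import tensorflow as tf

n_features = 10
X_normal = np.random.default_rng(0).normal(size=(1000, n_features)).astype("float32")

autoencoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(3, activation="relu"),   # compressed representation
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(n_features),             # reconstruction
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X_normal, X_normal, epochs=20, batch_size=64, verbose=0)

recon = autoencoder.predict(X_normal, verbose=0)
errors = np.mean((X_normal - recon) ** 2, axis=1)
threshold = np.percentile(errors, 95)              # flag errors above this value

def is_anomaly(x):
    r = autoencoder.predict(x[None, :], verbose=0)[0]
    return float(np.mean((x - r) ** 2)) > threshold

Because training sees only normal samples, anomalous inputs tend to reconstruct poorly and exceed the threshold.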
For surveillance applications, videos are the major source of information. Although video content is frequently available in vast amounts, it typically has little or no annotation for supervised learning. Kiran et al. [15] examine and categorize state-of-the-art deep learning-based approaches for video anomaly detection based on model type and detection criteria; they also give assessment criteria for spatiotemporal anomaly identification. Perera and Patel [17] offer a unique deep learning-based strategy for one-class transfer learning that uses labeled data from an unrelated task for feature learning in one-class classification. The suggested technique works on top of a convolutional neural network (CNN) of choice to generate descriptive features with low intra-class variation in the feature space for the given class. Two loss functions, compactness loss and descriptiveness loss, are presented for this purpose, coupled with a parallel CNN architecture.

2.3 Unsupervised Anomaly Detection

Unsupervised anomaly detection methods do not require any training data. Instead, they rely on two fundamental assumptions. First, they assume that most network connections are normal and that only a tiny amount of traffic is anomalous. Second, they expect hostile traffic to be statistically different from normal traffic. “According to these two assumptions, data groups of similar instances that appear frequently are deemed to be regular traffic, whereas instances that differ significantly from the bulk of the instances are considered malicious” [18]. K-means, self-organizing maps (SOM), C-means, the Expectation–Maximization meta-algorithm (EM), adaptive resonance theory (ART), unsupervised niche clustering (UNC), and the one-class support vector machine are the most often used unsupervised algorithms.
2.3.1. Clustering techniques—Clustering algorithms group observed data into clusters based on a specific similarity or distance metric. There are at least two methods for detecting anomalies using clustering. In the first technique, the anomaly detection model is trained with unlabeled data that includes both normal and attack traffic; in the second, the model is trained using just normal data, and a profile of normal activity is constructed [18]. The first strategy assumes that aberrant or attack data constitutes a tiny fraction of the total data. If this assumption is correct, cluster sizes can be used to detect abnormalities and attacks: large clusters represent typical data, whereas the remaining data points, which are outliers, represent attacks.
2.3.1.1. K-means separates the data into k clusters and ensures that data within the same cluster are similar, while data in other clusters have low similarity. “The K-means method first chooses K data points at random as the initial cluster centers, then adds the rest of the data to the cluster with the highest similarity based on its distance to the cluster center, and finally recalculates the cluster center of each cluster. This process is repeated until no cluster centers change; as a result, the data is separated into K clusters. Unfortunately, K-means clustering is susceptible to outliers, and a group of objects closer to a centroid may be empty, preventing centroids from being updated” [19]. Li [20] proposes an intrusion detection method based on data mining. To begin, a method for reducing noise and isolated points in the dataset was developed. An approach for calculating the number of cluster centroids was provided by splitting and merging clusters and utilizing the density radius of a super-sphere. An anomaly detection model was provided to achieve a better detection result using a more precise way of locating k-clustering centers. The cluster-size strategy described above is sketched below.
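A minimal sketch of the cluster-size heuristic follows; K = 5 and the 5% small-cluster cutoff are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (950, 2)),     # bulk: normal traffic
               rng.normal(6, 1, (50, 2))])     # small cluster: attacks

km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
sizes = np.bincount(km.labels_, minlength=5)
small = sizes < 0.05 * len(X)                  # clusters holding <5% of the data
anomalous = small[km.labels_]                  # True for points in small clusters
print("flagged", int(anomalous.sum()), "points as anomalous")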
2.3.1.2. Unsupervised neural networks—Self-organizing maps and adaptive resonance theory are two examples of unsupervised neural networks. Qu et al. [21] offer a targeted literature review of self-organizing maps (SOMs) for intrusion detection. SOM architectures may be classified into two types: static-layered and dynamic-layered architectures. The former, Hierarchical Self-Organizing Maps (HSOMs), may effectively decrease computational overhead while efficiently representing data hierarchy. Growing Hierarchical Self-Organizing Maps (GHSOMs) are very successful for online intrusion detection due to their low processing latency, dynamic self-adaptability, and self-learning. The ultimate purpose of SOM design is to precisely depict data topology in order to detect any unusual attack. The overarching purpose of that investigation is to compare the fundamental components and features of SOM-based intrusion detection in detail; by comparing the two kinds of SOM-based intrusion detection systems, the present problems can be easily comprehended and future research paths identified [21]. Lotfi Shahreza et al. [22] describe the SOM for anomaly detection, with its advantages and disadvantages, along with particle swarm optimization. Amini and Jalili [23] introduce the Unsupervised Neural Net-Based Intrusion Detector (UNNID) system, which uses unsupervised neural networks to identify network-based intrusions and attacks. The system includes tools for training, testing, and tuning unsupervised networks for use in intrusion detection. It was used to evaluate two types of unsupervised Adaptive Resonance Theory (ART) nets (ART-1 and ART-2). Based on the findings, such networks can efficiently categorize network traffic as normal or intrusive. Because the system employs a combination of misuse and anomaly detection methodologies, it is capable of identifying both known and unknown attack types as anomalies.
2.3.1.3. Unsupervised Niche Clustering (UNC)—Leon et al. [24] describe an unsupervised niche clustering-based technique for anomaly identification (UNC). The UNC is a genetic niching clustering algorithm that manages interference and automatically determines the number of clusters. The UNC generates a profile of the normal space using the normal samples (clusters). A fuzzy membership function that follows a Gaussian shape, given by the evolving cluster centers and radii, can later be used to describe each cluster. Experiments are carried out on real datasets, including a network intrusion detection dataset, and the findings are examined and published.
2.3.1.4. The Fuzzy C-Means (FCM) method was developed by Dunn [25]; it allows a single piece of data to be assigned to two or more clusters. Bezdek [26] improved this method, which is used in areas where hard data categorization is ineffective or impossible to achieve (e.g., pattern recognition). “The C-Means method is similar to the K-Means algorithm, except that each point's membership is specified by a fuzzy function, and all points contribute to the re-location of a cluster centroid depending on their fuzzy membership to that cluster” [18]. Mabu et al. [27] offer a unique fuzzy class-association-rule mining approach for detecting network intrusions based on genetic network programming (GNP). GNP is an evolutionary optimization approach that uses directed graph structures rather than the strings of genetic algorithms or the trees of genetic programming, resulting in improved representation ability with compact programs obtained from the reusability of nodes in a graph structure. The suggested technique, which combines fuzzy set theory with GNP, can cope with a mixed database that comprises both discrete and continuous characteristics, and can extract numerous key class-association rules that contribute to improving detection capabilities. Shang et al. [28] proposed an intrusion detection method based on clustering and SVM to counter viruses and Trojans attacking the application-layer network protocols of industrial control systems. To compute the distance between industrial control network communication data and the cluster center, the approach combines unsupervised fuzzy C-means clustering (FCM) with a supervised support vector machine (SVM). Chen et al. [29] proposed a hybrid KH-FCM algorithm that has strong global search capability and a simple optimization function structure. A bare-bones membership update is sketched below.
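The following NumPy sketch implements the standard fuzzy C-means membership update; the fuzzifier m = 2, the iteration count, and the 0.5 membership cutoff are illustrative assumptions.

import numpy as np

def fcm(X, c=3, m=2.0, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)             # memberships sum to 1 per point
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / d ** (2.0 / (m - 1.0))          # closer centers get more weight
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

X = np.random.default_rng(0).normal(size=(200, 2))
centers, U = fcm(X)
suspicious = U.max(axis=1) < 0.5   # low maximal membership = far from all centers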
2.3.1.5. Expectation–Maximization meta-algorithm (EM)—Dempster et al. [30] developed another soft clustering approach, EM, which is based on the Expectation–Maximization meta-algorithm. The Expectation–Maximization technique is used to determine the best probability estimates of parameters in probabilistic models. “The expectation (E) step of the EM clustering method computes an estimation of likelihood using current model parameters (as if they were known), and the maximization (M) step computes the maximum probability estimates of model parameters. The model parameters' revised estimates contribute to the following iteration's expectation step” [18].
Zong et al. [31] presented a Deep Autoencoding Gaussian Mixture Model (DAGMM) for unsupervised anomaly detection. For each input data point, the model employs a deep autoencoder to create a low-dimensional representation and reconstruction error, which is then fed into a Gaussian Mixture Model (GMM). “Instead of using decoupled two-stage training and the standard Expectation–Maximization (EM) algorithm, DAGMM simultaneously optimizes the parameters of the deep autoencoder and the mixture model in an end-to-end fashion, leveraging a separate estimation network to facilitate mixture model parameter learning” [31]. A decoupled GMM baseline is sketched below for contrast.
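For contrast with DAGMM's joint training, the classic decoupled approach can be sketched with scikit-learn's GaussianMixture, which runs the E and M steps internally; the component count and the 2% likelihood cutoff are illustrative assumptions.

import numpy as np
from sklearn.mixture import GaussianMixture

X_train = np.random.default_rng(0).normal(size=(1000, 4))           # mostly normal data
gmm = GaussianMixture(n_components=3, random_state=0).fit(X_train)  # EM fitting

scores = gmm.score_samples(X_train)        # per-point log-likelihood
threshold = np.percentile(scores, 2)       # lowest 2% treated as anomalies
anomalies = scores < threshold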
2.3.2. One-class support vector machine (OC-SVM)—Li et al. [32] conducted a thorough examination of attack and abuse trends in log files before proposing a solution for anomaly identification based on support vector machines. It is a one-class SVM-based technique that was developed using data from the 1999 DARPA user audit logs. Due to their versatility in fitting complicated nonlinear boundaries between normal and novel data, one-class support vector machines (OC-SVMs) are among the state-of-the-art algorithms for novelty identification (or anomaly detection) in machine learning. Erfani et al. [33] use a combination of a one-class SVM and deep learning for high-dimensional and large-scale anomaly detection. The proposed method uses a linear kernel instead of a nonlinear one without any loss of accuracy, which makes the model scalable and computationally efficient. Wang et al. [34] tackle the problem of anomaly detection in power systems, which is critical in guaranteeing their safe and stable functioning. Because the proportion of aberrant data in power system operation is quite small, a one-class support vector machine (OC-SVM) is used for the classification of imbalanced data. However, the OC-SVM's performance is sensitive to its settings, and an inappropriate choice reduces its classification accuracy and generalization capacity. Wang et al. [34] therefore optimized the parameters of the OC-SVM using particle swarm optimization (PSO). The original PSO method is sluggish to converge and quickly slips into a local optimum; to address this issue, they suggested an enhanced PSO method for parameter optimization, in which adaptive speed weighting and adaptive population splitting boost the algorithm's convergence speed and help it break out of local optima, thus addressing the anomaly detection problem. A minimal OC-SVM usage sketch follows.
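Basic OC-SVM usage is sketched below; nu and gamma are fixed illustrative assumptions here, whereas in [34] they would be tuned by the improved PSO.

import numpy as np
from sklearn.svm import OneClassSVM

X_normal = np.random.default_rng(0).normal(size=(500, 6))   # normal data only
oc = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_normal)

X_new = np.vstack([np.zeros((1, 6)),        # typical point
                   np.full((1, 6), 8.0)])   # far-away point
print(oc.predict(X_new))                    # +1 = normal region, -1 = anomaly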

2.4 Anomaly Detection Techniques Based on Training Objectives

2.4.1. Deep hybrid models (DHMs)—A deep hybrid model for detecting aberrant flights is presented by Wang et al. [35]. Deep hybrid models for anomaly detection employ deep neural networks, primarily autoencoders, as feature extractors; the features learned inside the autoencoder's hidden representations are then fed into a clustering algorithm, which detects aberrant flights. Without preset criteria or domain expertise, the model can detect flight irregularities and related dangers. DHM for intrusion detection employs deep neural networks as feature extractors, feeding features learned in hidden representations of autoencoders into classic anomaly detection algorithms such as the one-class SVM (OC-SVM) to detect intrusions (Andrews et al. [36]). Ergen et al. [37] suggested a hybrid model variant that incorporates joint training of the feature extractor together with the OC-SVM (or SVDD) objective to enhance detection performance. The lack of a trainable objective tailored for anomaly detection is a key weakness of these hybrid techniques, since such models are unable to extract rich differential features to detect intrusions. Hence, specialized anomaly detection methods such as deep one-class classification and one-class neural networks have been implemented.
2.4.2. One-class neural network (OC-NN)—The OC-NN classification methods of Chalapathy et al. [38] are inspired by kernel-based one-class classification; they combine the capability of deep neural networks to extract an increasingly rich representation of data with the one-class goal of developing a tight envelope around normal data. The OC-NN technique is novel for one important reason: data representation in the hidden layer is driven by the OC-NN objective and is therefore tailored for anomaly detection. Deep Support Vector Data Description (Deep SVDD) (Ruff et al. [39]) is another type of one-class neural network algorithm, which trains deep neural networks to capture common factors of variation by closely mapping normal data instances to the center of a hypersphere. OC-NN outperformed conventional shallow methods in some scenarios. A minimal Deep SVDD-style sketch is given below.
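The following PyTorch sketch illustrates the Deep SVDD idea: map normal data near a fixed center c and score new points by their distance to it. The tiny bias-free network (Deep SVDD removes bias terms to avoid a trivial constant mapping), the training schedule, and the random stand-in data are illustrative assumptions.

import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(10, 16, bias=False), nn.ReLU(),
    nn.Linear(16, 4, bias=False),
)
X_normal = torch.randn(1000, 10)            # stand-in normal data

with torch.no_grad():
    c = net(X_normal).mean(dim=0)           # fix the hypersphere center

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = ((net(X_normal) - c) ** 2).sum(dim=1).mean()   # pull embeddings to c
    loss.backward()
    opt.step()

def anomaly_score(x):
    with torch.no_grad():
        return ((net(x) - c) ** 2).sum(dim=1)             # distance from center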

2.5 Survey on Anomaly Detection Techniques Based on Various Algorithms

2.5.1. Restricted Boltzmann Machine (RBM)—Fiore et al. [40] investigated the efficacy of a machine learning-based detection approach employing the Discriminative Restricted Boltzmann Machine (RBM), combining the expressive power of generative models with high classification accuracy to infer part of its knowledge from incomplete training data. A self-learning system is required because network traffic is exceedingly complicated and unpredictable, and the model is vulnerable to changes over time as anomalies evolve; as a result, previously acquired knowledge on how to distinguish them from normal traffic may no longer be applicable. This issue has been overcome by the method explained in [40], based on a machine learning approach using a discriminative RBM. Equation (1) shows the Boltzmann
approach using discriminative RBM. Equation (1) shows the Boltzmann
distribution function where the probability of state P(υ,h) depends only on
energy of the state s(υ,h)
 
(υ, h) = exp(−s(υ, h))/ _(υ, h)exp(−s(υ, h)) , (1)

whereas structure of RBM, with no intralayer dependence, enables to write


 
P(v|h) = p(v i |h) and P(h|v) = p(h j |v). (2)
i j
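For a Bernoulli RBM with the usual energy s(v, h) = -a'v - b'h - v'Wh, the factorized conditionals of Eq. (2) reduce to logistic sigmoids, as the following NumPy sketch shows; the layer sizes and random weights are illustrative assumptions.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 4
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
a = np.zeros(n_visible)                       # visible biases
b = np.zeros(n_hidden)                        # hidden biases

v = rng.integers(0, 2, size=n_visible)        # one binary visible vector
p_h_given_v = sigmoid(b + v @ W)              # vector of p(h_j = 1 | v)
h = (rng.random(n_hidden) < p_h_given_v).astype(float)   # sample hidden units
p_v_given_h = sigmoid(a + W @ h)              # vector of p(v_i = 1 | h)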

Researchers have sought to address the issues raised by increasingly complex technological systems exposing possible holes, which invite hostile individuals to investigate and breach their security, by developing outlier detection systems: security layers that aim to identify harmful attempts.
De Rosa et al. [41] present a unique strategy for dealing with anomaly detection in this setting, in which the problem's raw features are projected through a restricted Boltzmann machine rather than used directly. Anomaly detection is also critical in the process of product quality inspection, because product data with large dimensions and a highly uneven distribution present certain obstacles. To address these issues, a novel anomaly detection approach based on the Gaussian Restricted Boltzmann Machine (GRBM) is suggested by Zang et al. [49]. The investigation was conducted using two real-world cases: wine quality and cigarette product testing.
2.5.2. Deep Belief Network—Deep Belief Networks (DBNs) are a type of deep neural network that consists of numerous layers of Restricted Boltzmann Machine graphical models (RBMs) [18]. DBNs are utilized as a directed encoder–decoder network using a backpropagation method (Werbos [42]). DBNs are incapable of capturing the typical fluctuations of anomalous samples, resulting in a large reconstruction error for them. DBNs have been found to scale well to massive data and to increase interpretability (Wulsin et al. [43]).
2.5.3. Generalized denoising autoencoder—The Convolutional Autoencoder (CAE) is an intriguing candidate for anomaly detection, as it captures the 2D structure of image sequences during the learning process. The work of Ribeiro et al. [44] employs a CAE in the context of outlier identification by utilizing the reconstruction error of each frame as an anomaly score.
They present a method for combining high-level spatial and temporal characteristics with the input instances and analyze the resulting impact on the CAE's ability while exploring the CAE architecture. A simple parameter of video spatial complexity was developed and associated with the CAE's classification ability. Guo et al. [45] offer AEKNN, an unsupervised anomaly detection framework that incorporates the benefits of representations learned autonomously by deep neural networks to improve anomaly detection performance. The system combines autoencoder training with a k-nearest neighbor outlier identification algorithm. Jia et al. [46] suggested a stacked denoising autoencoder-based intelligent rolling-bearing fault diagnosis system. The dimension of the original data was reduced using Principal Component Analysis, and superfluous information was removed. The bearing data is then trained using three denoising autoencoders. The learned DAEs are then layered into a stacked denoising autoencoder with three hidden layers for backward optimization. Finally, the characteristics are fed into a softmax classifier to detect faults. A frame-wise reconstruction-error sketch is given below.
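Frame-wise reconstruction-error scoring with a convolutional autoencoder, in the spirit of [44], can be sketched as follows in Keras; the 64×64 grayscale frames, the layer sizes, and the random stand-in data are illustrative assumptions.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

cae = tf.keras.Sequential([
    layers.Input(shape=(64, 64, 1)),
    layers.Conv2D(16, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2D(8, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(1, 3, strides=2, padding="same"),
])
cae.compile(optimizer="adam", loss="mse")

frames_normal = np.random.rand(500, 64, 64, 1).astype("float32")  # stand-in frames
cae.fit(frames_normal, frames_normal, epochs=5, batch_size=32, verbose=0)

def frame_scores(frames):
    recon = cae.predict(frames, verbose=0)
    return np.mean((frames - recon) ** 2, axis=(1, 2, 3))  # one score per frame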
2.5.4. Recurrent neural network (RNN)—Nanduri et al. [47] describe the application of “Recurrent Neural Networks (RNN) with Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) architectures to overcome the limitations of dimensionality reduction, poor sensitivity to short-term anomalies, and inability to detect anomalies in latent features in machine learning algorithms” [47].
2.5.5. Long Short-Term Memory network—Ergen and Kozat [48] use highly effective gradient- and quadratic-programming-based training approaches for training and tuning the parameters of the LSTM architecture and the OC-SVM (or SVDD) algorithm. To use the gradient-based training approach, they modify the main objective criteria of the OC-SVM and SVDD algorithms, and the convergence of the modified criteria to the original ones is demonstrated [48]. They obtain anomaly detection methods capable of processing variable-length data sequences while maintaining excellent performance, particularly for continuous sequences of data. The overall structure of this approach is summarized in Fig. 1.

Fig. 1 Approach for anomaly detection used in [48]

Elsayed et al. [49] presented a novel method that relies on a Long Short-Term Memory (LSTM) autoencoder and a one-class support vector machine (OC-SVM) to identify anomalous attacks in imbalanced data by training the system with instances from normal classes only.
“The LSTM-autoencoder is trained to learn the typical traffic pattern as well as the compressed representation of the input data (i.e. latent features), after which it is fed into an OC-SVM method. The hybrid model solves the drawbacks of the individual OC-SVM” [49]. Malhotra et al. [50] introduced an encoder–decoder technique for anomaly identification (EncDec-AD) that relies on Long Short-Term Memory networks, which learn to rebuild “normal” time-series behavior and then use reconstruction error to detect abnormalities. They test three publicly available time-series datasets (power demand, space shuttle, and ECG) and two real-world engine datasets with predictable and unpredictable behaviors. It has been demonstrated that EncDec-AD is resilient and can identify anomalies in time series that are predictable, unpredictable, periodic, aperiodic, and quasi-periodic, and it can detect abnormalities in both short and long time series (lengths as short as 30 and as large as 500) [50]. LSTM networks work well for classification, processing, and making predictions that rely on time-series data. An EncDec-AD-style sketch follows.
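An EncDec-AD-style reconstruction can be sketched as follows in Keras; the synthetic sine-wave series, the window length of 30, and the layer sizes are illustrative assumptions rather than the setup of [50].

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

T = 30                                         # window length
t = np.arange(5000, dtype="float32")
signal = np.sin(0.1 * t)                       # stand-in "normal" periodic series
windows = np.stack([signal[i:i + T] for i in range(len(signal) - T)])[..., None]

model = tf.keras.Sequential([
    layers.Input(shape=(T, 1)),
    layers.LSTM(32),                           # encoder: compress the window
    layers.RepeatVector(T),                    # hand the encoding to each decoder step
    layers.LSTM(32, return_sequences=True),    # decoder
    layers.TimeDistributed(layers.Dense(1)),   # per-step reconstruction
])
model.compile(optimizer="adam", loss="mse")
model.fit(windows, windows, epochs=3, batch_size=64, verbose=0)

recon = model.predict(windows, verbose=0)
scores = np.mean((windows - recon) ** 2, axis=(1, 2))   # per-window anomaly score

Windows drawn from anomalous behavior reconstruct poorly and therefore receive high scores.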

2.6 Survey on Application-Based Anomaly Detection Techniques

2.6.1. Suspicious activity detection network for video surveillance using machine learning—Shivthare et al. [1] proposed employing neural networks to detect suspicious human activity in real-time CCTV data. It is extremely difficult to continuously monitor public spaces; consequently, intelligent video surveillance is necessary that can monitor human actions in real time, classify them as ordinary or exceptional, and raise an alarm. The article demonstrates how to create a real-time application for detecting anomalous activities of persons in public settings. Figure 2 shows the process flow for anomaly detection using real-time video as input.

2.6.2. Real-time anomaly detection and localization in crowded scenes—Sabokrou et al. [2] propose an approach for the detection and localization of anomalies in congested scenes in real time. Each video is treated as a group of non-overlapping cubic patches and is characterized using two descriptors, local and global, which capture video features from different perspectives. The local and global features rely on structural similarity between neighboring patches and on unsupervised learning with a sparse autoencoder. Experimental findings demonstrate that the technique is comparable to state-of-the-art procedures but significantly more time-efficient [2].
2.6.3. Cascading 3D deep neural networks for fast anomaly identification and localization in crowded scenes—Sabokrou et al. [5] present a rapid and accurate approach for detecting and localizing anomalies in video data depicting crowded settings. The topic of this work is the persistent difficulty of time-efficient anomaly localization. They present a cubic-patch-based technique with a cascade of classifiers that uses an advanced feature-learning methodology. The paper describes and employs a unique DNN design for the hierarchical representation of normal patches using partial features. “A cascade classifier is suggested and a one-class Gaussian classifier is used in the intermediary layers of the DNNs” [5].
2.6.4. A survey on credit card fraud detection techniques—Zojaji et al. [3] explored the problems of detecting credit card fraud and sought to assess the state of the art in credit card fraud detection algorithms, datasets, and evaluation criteria. The benefits and drawbacks of various fraud detection technologies are listed and contrasted. The mentioned methodologies are classified into two basic fraud detection approaches, namely misuse (supervised) and anomaly detection (unsupervised). Different datasets utilized in the literature are then characterized and categorized as genuine and synthetic data, and the effective and common attributes are extracted for further use. Figure 3 classifies fraud detection techniques.

Fig. 2 Algorithm flow for anomaly detection

Fig. 3 Fraud detection techniques

2.6.5. Deep learning in bioinformatics—Min et al. [4] offer examples of current deep learning research in bioinformatics. They categorize research in biomedical imaging and in bioinformatics signal processing along with deep neural networks, convolutional neural networks, and recurrent neural networks, and briefly describe each work to provide concrete information. They also examine the difficulties encountered in using deep learning in bioinformatics and make recommendations for further study [4].
2.6.6. Other areas for the application of various techniques to detect abnormalities include intrusion detection, fraud detection (in banking, telecommunications, and insurance), malware detection, medical anomaly detection, anomaly detection in social networks, IoT big data anomaly detection, industrial anomaly detection, and video surveillance. In bioinformatics, deep learning is anticipated to produce promising results.

3 Discussion and Conclusion

It has been observed that real-time data availability is hard to achieve and requires a long process to access data from running systems. There is a considerable gap in developing techniques to access data logs, as well as in building systems and validating them in real-time situations. Machine learning algorithms are being developed to cope with data of large dimensionality and to detect abnormal system behavior. Deep learning, a subset of machine learning, shows considerable success in many domains (such as computer vision and audio processing) in producing more accurate outcomes for challenging problems. There is a need to apply novel models and analyze their ability in the anomaly detection sector, particularly for intelligent transportation, industrial, and smart object-based systems.
The lack of real-time data makes it difficult for systems to access data; hence, there is a need for large, balanced datasets to build models and validate them in real-time systems. It has been found that while analyzing data, the vast majority of observations correspond to normal behavior, so finding an abnormality requires training the system with huge amounts of data; hence, more robust systems need to be developed to achieve maximum accuracy and deal with complex real-time scenarios. The majority of recent research has focused on the identification of abnormalities. Anomaly prediction and prevention remain an area of study that needs to be explored, as predicting anomalies can be very helpful. New ways of proactively preventing system failures and performing root cause analysis must be found and pursued.
The emergence of new methodologies and techniques to process the different data streams provided by IoT devices, healthcare systems, intelligent surroundings, and complicated industrial systems has been seen.

Conflict of Interest The authors declare that there is no conflict of interest in this paper.

References

1. Shivthare, K.V., Bhujbal, P.D., Darekar, A.P.: Suspicious activity detection network for video
surveillance using machine learning. Int. J. Adv. Sci. Res. Eng. Trends 6(4) (2021)
2. Sabokrou, M., Fathy, M., Hoseini, M., Klette, R.: Real-time anomaly detection and localization
in crowded scenes. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition
Workshops (CVPRW), pp. 56–62 (2015)
3. Zojaji, Z., Atani, R.E., Monadjemi, A.H.: A survey of credit card fraud detection techniques:
data and technique oriented perspective. arXiv preprint arXiv:1611.06439 (2016)
4. Min, S., Lee, B., Yoon, S.S.: Deep learning in bioinformatics. Briefings Bioinform. 18(5),
851–869 (2017)
5. Sabokrou, M., Fayyaz, M., Fathy, M., Klette, R.: Deep-cascade: cascading 3D deep neural
networks for fast anomaly detection and localization in crowded scenes. IEEE Trans. Image
Process. 26(4), 1992–2004 (2017)
6. Singh, S., Silakari, S.: An ensemble approach for feature selection of Cyber Attack Dataset.
arXiv preprint arXiv:0912.1014 (2009)
7. Johansen, K., Lee, S.: CS424 network security: Bayesian network intrusion detection (BINDS)
(2003)
8. Moore, A.W., Zuev, D.: Internet traffic classification using Bayesian analysis techniques. In:
Proceedings of the 2005 ACM SIGMETRICS International Conference on Measurement and
Modeling of Computer Systems (2005)
9. Moradi, M., Zulkernine, M.: A neural network based system for intrusion detection and
classification of attacks. In: Proceedings of the IEEE International Conference on Advances
in Intelligent Systems-Theory and Applications. IEEE Luxembourg-Kirchberg, Luxembourg
(2004)
10. Jiang, J., Zhang, C., Kamel, M.: RBF-based real-time hierarchical intrusion detection systems.
In: Proceedings of the International Joint Conference on Neural Networks, vol. 2. IEEE (2003)

11. Sammany, M., et al.: Artificial neural networks architecture for intrusion detection systems and
classification of attacks. In: The 5th International Conference INFO2007 (2007)
12. Lee, J., Lee, J., Sohn, S., Ryu, J., Chung, T.: Effective value of decision tree with KDD 99 intru-
sion detection datasets for intrusion detection system. In: 2008 10th International Conference
on Advanced Communication Technology, pp. 1170–1175 (2008)
13. Wang, J., et al.: A real-time intrusion detection system based on PSO-SVM. In: Proceed-
ings of 2009 International Workshop on Information Security and Application (IWISA 2009).
Academy Publisher (2009)
14. Mukkamala, S., Sung, A.H., Ribeiro, B.M.: Model selection for kernel based intrusion detection
systems. In: Adaptive and Natural Computing Algorithms, pp. 458–461. Springer, Vienna
(2005)
15. Kiran, B.R., Thomas, D.M., Parakkal, R.: An overview of deep learning based methods for
unsupervised and semi-supervised anomaly detection in videos. J. Imaging 4, 36 (2018)
16. Min, E., et al.: Su-ids: a semi-supervised and unsupervised framework for network intrusion
detection. In: International Conference on Cloud Computing and Security. Springer, Cham
(2018)
17. Perera, P., Patel, V.M.: Learning deep features for one-class classification. IEEE Trans. Image
Process. 28(11), 5450–5463 (2019)
18. Omar, S., Ngadi, A., Jebur, H.H.: Machine learning techniques for anomaly detection: an
overview. Int. J. Comput. Appl. 79(2) (2013)
19. Han, J., Kamber, M.: Data Mining: Concept and Techniques, 1st ed. Morgan Kaufmann
Publishers (2001)
20. Li, H.: Research and implementation of an anomaly detection model based on clustering
analysis. In: International Symposium on Intelligent Information Processing and Trusted
Computing (2010)
21. Qu, X., Yang, L., Guo, K., et al.: A survey on the development of self-organizing maps for
unsupervised intrusion detection. Mob. Netw. Appl. 26, 808–829 (2021)
22. Lotfi Shahreza, M., Moazzami, D., Moshiri, B., Delavar, M.R.: Anomaly detection using a
self-organizing map and particle swarm optimization, Scientia Iranica 18(6) (2011)
23. Amini, M., Jalili, R.: Network-based intrusion detection using unsupervised adaptive resonance
theory (ART). In: Proceedings of the 4th Conference on Engineering of Intelligent Systems
(EIS 2004), Madeira, Portugal (2004)
24. Leon, E., Nasraoui, O., Gomez, J.: Anomaly detection based on unsupervised niche clustering
with application to network intrusion detection. In: Proceedings of the 2004 Congress on
Evolutionary Computation (IEEE Cat. No. 04TH8753), vol. 1. IEEE (2004)
25. Dunn, J.C.: A fuzzy relative of the ISODATA process and its use in detecting compact well-
separated clusters. J. Cybern. 3(3), 32–57 (1973)
26. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Springer
Science & Business Media (2013)
27. Mabu, S., et al.: An intrusion-detection model based on fuzzy class-association-rule mining
using genetic network programming. IEEE Trans. Syst. Man Cybern. Part C (Applications and
Reviews) 41(1), 130–139 (2010)
28. Shang, W., Cui, J., Song, C., Zhao, J., Zeng, P.: Research on industrial control anomaly detection
based on FCM and SVM. In: 2018 17th IEEE International Conference on Trust, Security and
Privacy in Computing and Communications/12th IEEE International Conference on Big Data
Science and Engineering (Trust-Com/BigDataSE), pp. 218–222 (2018)
29. Chen, R., Zhang, F., Xi, L.: Anomaly detection algorithm based on FCM with improved Krill
Herd. J. Phys. Conf. Ser. 1187(4) (2019). IOP Publishing
30. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the
EM algorithm. J. Roy. Stat. Soc. Ser. B (Methodol.) 39(1), 1–22 (1977)
31. Zong, B., et al.: Deep autoencoding Gaussian mixture model for unsupervised anomaly
detection. In: International Conference on Learning Representations (2018)
32. Li, K.-L., Huang, H.-K., Tian, S.-F., Xu, W.: Improving one-class SVM for anomaly detection.
In: Proceedings of the 2003 International Conference on Machine Learning and Cybernetics

(IEEE Cat. No. 03EX693), vol. 5, pp. 3077–3081 (2003). https://doi.org/10.1109/ICMLC.2003.1260106
33. Erfani, S.M., et al.: High-dimensional and large-scale anomaly detection using a linear one-class
SVM with deep learning. Pattern Recogn. 58, 121–134 (2016)
34. Wang, Z., et al.: Power system anomaly detection based on OCSVM optimized by improved
particle swarm optimization. IEEE Access 7, 181580–181588 (2019)
35. Wang, Q., Qin, K., Lu, B.: Flight anomaly detection based on deep hybrid model. In: 2020 IEEE
2nd International Conference on Civil Aviation Safety and Information Technology (ICCASIT,
2020), pp. 959–962 (2020)
36. Jerone, T.A.A., Morton, E.J., Griffin, L.D.: Detecting anomalous data using auto-encoders. Int.
J. Mach. Learn. Comput. 6(1), 21–26 (2016)
37. Tolga, E., Kozat, S.S.: Unsupervised anomaly detection with LSTM neural networks. IEEE
Trans. Neural Netw. Learn. Syst. 31(8), 3127–3141 (2019)
38. Chalapathy, R., Menon, A.K., Chawla, S.: Anomaly detection using one-class neural networks.
arXiv preprint arXiv:1802.06360 (2018)
39. Ruff, L., et al.: Deep one-class classification. In: International Conference on Machine Learning.
PMLR (2018)
40. Fiore, U., Palmieri, F., Castiglione, A., De Santis, A.: Network anomaly detection with the
restricted Boltzmann machine. Neurocomputing 122, 13–23 (2013). ISSN 0925-2312
41. de Rosa, G.H., Roder, M., Santos, D.F.S., et al.: Enhancing anomaly detection through restricted
Boltzmann machine features projection. Int. J. Inf. Tecnol. 13, 49–57 (2021)
42. Werbos, P.J.: Backpropagation through time: what it does and how to do it. Proc. IEEE 78(10),
1550–1560 (1990)
43. Wulsin, D., Blanco, J., Mani, R., Litt, B.: Semi-supervised anomaly detection for EEG wave-
forms using deep belief nets. In: 2010 Ninth International Conference on Machine Learning
and Applications (ICMLA), pp. 436–441. IEEE (2010)
44. Ribeiro, M., Lazzaretti, A.E., Lopes, H.S.: A study of deep convolutional auto-encoders for
anomaly detection in videos. Pattern Recogn. Lett. 105, 13–22 (2018). ISSN 0167-8655
45. Guo, J., Liu, G., Zuo, Y., Wu, J.: An anomaly detection framework based on auto-encoder
and nearest neighbor. In: 2018 15th International Conference on Service Systems and Service
Management (ICSSSM), pp. 1–6 (2018). https://doi.org/10.1109/ICSSSM.2018.8464983
46. Jia, L., Du, X.: Rolling bearing fault classification based on stacked denoising auto encoders.
IOP Conf. Ser. Earth Environ. Sci. 769(4) (2021). IOP Publishing
47. Nanduri, A., Sherry, L.: Anomaly detection in aircraft data using Recurrent Neural Networks
(RNN). In: 2016 Integrated Communications Navigation and Surveillance (ICNS), pp. 5C2-
1–5C2-8 (2016)
48. Ergen, T., Kozat, S.S.: Unsupervised anomaly detection with LSTM neural networks. IEEE
Trans. Neural Netw. Learn. Syst. 31(8), 3127–3141 (2020). https://doi.org/10.1109/TNNLS.2019.2935975
49. Elsayed, M.S., et al.: Network anomaly detection using LSTM based autoencoder. In: Proceed-
ings of the 16th ACM Symposium on QoS and Security for Wireless and Mobile Networks
(2020)
50. Malhotra, P., Ramakrishnan, A., Anand, G., Vig, L., Agarwal, P., Shroff, G.: LSTM-based
encoder-decoder for multi-sensor anomaly detection. arXiv preprint arXiv:1607.00148 (2016)
Chapter 32
A Scholastic Comprehensive Study on 6G
Wireless Communication System

Kavita H. Gudadhe, Warsha P. Sirskar, and Swati Gaikwad

1 Introduction

The volume of mobile data traffic throughout the globe has skyrocketed in recent years. According to estimates from the International Telecommunication Union (ITU), monthly global mobile data traffic will grow from its present level to 607 exabytes (EB) by 2025, and then to 5016 EB by 2030 [1]. In 2025, a total of around 39 EB is predicted, and by 2030 a total of about 257 EB is anticipated. Projections show that by 2025, more than 70% of the world's population will subscribe to a mobile service, and more than half of these subscribers are also likely to have access to the Internet through mobile devices. The vast data flow necessitates an increase in a variety of services, including full-coverage, ultra-reliable, low-latency wireless communications with a focus on throughput rather than protocol overhead. Personal computers, portable media players, tablets, smartphones, sensors, and the Internet itself have all played a role in the exponential growth of data traffic. The term "Internet of Everything" (IoE) refers to the interconnectivity and interoperability of all devices, systems, and applications that may be linked to the web. These gadgets are data-driven (especially in terms of video) and have a low call volume. The number of Internet and mobile users, as well as M2M and linked devices, is growing exponentially: there will be around twice as many M2M and connected devices in use by 2023, according to projections, with the total number of connected devices, counted in billions, rising steadily across the six years from 2018 to 2023 [2]. It is worth noting that 13.5 billion gadgets are expected to be connected in

K. H. Gudadhe (B) · W. P. Sirskar


Department of Information Technology, Yeshwantrao Chavan College of Engineering, Nagpur,
Maharashtra, India
e-mail: sukekavita@gmail.com
S. Gaikwad
Department of Pharmacy, Nagpur College of Pharmacy, Nagpur, Maharastra, India


APAC countries by 2020. These figures highlight the growing significance of wire-
less broadband connectivity across a wide range of sectors, from transportation and
health care to infrastructure and even home and military applications.

1.1 Evolution of Cellular Networks from 1G to 6G

It is important to present a short history of mobile communications networks, from the first generation (1G) to the fifth generation (5G), in order to provide a clear picture of what 6G networks may bring. There have been five major generations of mobile communications systems to date, each having its own quirks and prerequisites, and there has been a notable shift to a new generation of mobile communication networks about every ten years. The introduction of voice services in the 1980s marked the beginning of the first generation of cellular technology, sometimes known as 1G. With 1G networks, the average transfer rate was 2.4 kbps. Due to its dependence on analog transmission, 1G had low capacity, inconsistent delivery, and security flaws [3]. The problems with the first-generation (1G) mobile communication system were addressed by developing second-generation (2G) networks in
nication system were addressed by developing second-generation (2G) networks in
the 1990s using digital modulation techniques. For some time now, 2G networks
have also been able to provide the delivery of encrypted data services like the short
message service (SMS) [4]. Data transmission speeds of up to 64 kbps were possible
on the second-generation network that was based on GSM technology. The public switched telephone network (PSTN) is the foundation of both 1G and 2G mobile communication technologies. This system comprises many different types of
telecommunications infrastructure, such as copper phone lines and switching hubs,
optical fiber networks, wireless networks, and even satellite networks. At the turn of
the millennium, the need for a variety of data services led to the introduction of third-
generation (3G) mobile communications systems. High-speed packet access (HSPA)
makes it feasible for 3G networks to achieve rates of up to 2 Mb/s [5]. In 2009, the fourth-generation network, based on long-term evolution (LTE) and sometimes called 4G mobile technology, followed [6]. LTE-capable networks are
adaptable in that they can run in either time division duplex (TDD) or frequency
division duplex (FDD) modes. Some of the technologies used by LTE networks
include multiple-input and multiple-output (MIMO), coordinated multiple transmis-
sion/reception (COMP), and orthogonal frequency division multiplexing (OFDM).
Increases in data rates, transmission bandwidth, and the availability of mobile broad-
band connections are all made feasible by these innovations. With the advent of the
LTE-Advanced network in 2011, the LTE mobile communication technology became
capable of running on unlicensed airwaves. Comparatively, a 4G LTE network with
2 × 2 MIMO offers up to 150 Mb/s, while an LTE-Advanced network with MIMO
may attain a maximum data throughput of up to 1 Gb/s using a 100 MHz aggregated
bandwidth [7]. The commercial rollout of 5G networks is now underway. As well as wireless personal area network (PAN), LAN, and MAN radios, a 5G network may also handle wide area network (WAN) bands. The 5G network might combine spectrum, allowing for

the streaming of HD movies and other data-intensive applications. Additionally, the


maximum data rate of a 5G network is about 20 Gb/s, making it far faster than the LTE
system. As an added bonus, beam division multiple access (BDMA), a cutting-edge
method of multiple access, may be used in a 5G network to boost system capacity.
Users may be assigned to an orthogonal beam in this multiplexing method based
on their physical locations [8]. Thanks to recent developments in smart devices and
software, IoE networks have become more commonplace. Autonomous and aerial
cars, healthcare software, smart services, and other time-critical services all make
use of IoE networks. Such an IoE network would need pervasive sensing and com-
putation capabilities, which could be too much to ask of 5G networks. The present
tremendous data traffic growth is also likely to exceed the data rate capabilities of
even the 5G networks. This has inspired research into how to improve the state of
the wireless industry so that it can support the expanding number of devices and
services that make up the Internet of Things. Therefore, research into 6G wireless
data networks for communication has become a priority. This is due to the fact that
unlike current networks, it is expected that future 6G wireless communication sys-
tems would support IoE applications and services. 6G makes it easier to connect
various networks, both on Earth and beyond it. For example, 6G’s compatibility
with satellite communication may increase connection speeds and guarantee ubiqui-
tous coverage. The seamless interoperability of 6G networks is enabled by network
slicing and multi-access edge computing through software-defined networks. It is
anticipated that 6G networks, which are more sophisticated than their predecessors,
would make substantial use of AI and machine learning (ML). We foresee a very
dense network design as a consequence of the rollout of 6G networks. Although large
MIMO was a key component of 5G, transmission in 6G systems would revolve around strategically placed reflecting surfaces. In contrast to older networks, 6G will fully enable Internet connections that need a latency of less than 1 millisecond to support minimally invasive applications.

2 6G Vision

Sixth-generation networks aim to be even more advanced than the current generation
of wireless communication systems in order to better serve the needs of users and
handle enormous amounts of data traffic. Sixth-generation wireless networks aim to
improve data transfer speeds while reducing power consumption, expand broadband
access and coverage, reinforce communication security and trustworthiness, boost
connection dependability, reduce latency, and realize intelligent communication. In theory, 6G networks might enable data rates in excess of 100 Gb/s with an end-to-end latency of less than 1 millisecond. If 6G networks are extensively deployed, ensuring the security of user communications in them will be crucial. The goal of 6G networks is to deliver reliable, low-latency wireless communications; Fig. 1 shows this ultra era of 6G networks. Next-generation, high-performance 6G networks rely heavily on very high mobility to be successful.

Fig. 1 Ultra era in 6G networks

Extremely rapid wireless data transfer is expected to be possible thanks to the integration of ultra-massive multiple-input multiple-output (MIMO) technology and extremely high frequencies in the advent of 6G networks [9]. In addition, 6G networks plan to allow for 4K video streaming and lightning-fast data transfers.
Using cutting-edge methods of communication, 6G is theoretically feasible. Using methods like ultra-large MIMO, new spectrum, holographic radio communications, full-duplex wireless communications, multiple access, and modulation, it is possible to achieve the greatest data speeds imaginable. For this field to make considerable headway, energy harvesting and backscatter transmission will be essential. Improving connectivity and worldwide coverage calls for cell-free massive MIMO systems that integrate terrestrial and non-terrestrial communications. Both quantum communication and the blockchain have proven effective in protecting the privacy of digital currency exchanges. The potential for ultra-reliable and low-latency communication may be facilitated by integrating holographic teleportation (telepresence) with edge computing. To sum up, it is feasible that AI and ML might be highly useful in the advancement of genuine intelligence. The ultimate goal of sixth-generation wireless technology is to allow for the simultaneous operation of all wireless networks. Part of the goal is to make it possible for wireless networks to reach previously unserved locations, such as the surface of the ocean or the upper atmosphere. The Internet will be accessible from any location on the planet because of the streamlined data-exchange capabilities provided by 6G networks. Delay-sensitive applications will

need to be supported by 6G wireless communications networks. The Tactile Internet, holographic teleportation (telepresence), the Internet of Sensing Things, and multi-sensory extended reality (XR), including augmented reality (AR), mixed reality (MR), and virtual reality (VR), are all examples of such applications. Smart-city applications may be broken up into radio environments, health care, the grid, transportation, manufacturing, farming, and the household. It is anticipated that 6G wireless communications networks will fully enable all these intelligent applications.

3 Related Works and Paper Contribution

The probable future of the 6G network has been the subject of several studies, such as [10]. The results of an inquiry into the availability of various approaches are described in [11]. In [12], the authors investigate how quantum communication and machine learning may be used to improve future 6G networks. Additional evidence that AI will play a vital role in the architecture of future 6G networks is provided by the research presented in [13]. Two sources that compare and contrast satellite and terrestrial networks for data transfer are [14, 15]. The use of random access methods in the Internet of Things is investigated in [16]. For more on how 6G networks and blockchain technologies could combine to provide intelligent healthcare solutions, see [16]. The paper [17] investigates the potential of employing mmWave frequencies in upcoming 6G networks for satellite communications. The research in [18] demonstrates the importance of confidentiality, privacy, and safety in the future generation of 6G networks. Previous research neglected to take into consideration the superior capabilities and characteristics of 6G wireless networks; because of this, prior survey research has not done a good job of establishing which technologies are necessary to satisfy specific long-term 6G ambitions. This article includes a comprehensive overview of the current status of the subject as well as an in-depth examination of the technologies that will form the foundation of future 6G networks. This study investigates these measures because of their possible relevance to the design of future 6G networks. The study extends beyond previous surveys by investigating any and all technologies that have even a passing resemblance to the foundational technologies required to achieve the bare minimum in performance standards. This review starts by naming the technologies in question and then goes on to explain how they operate, list their key basic advantages, discuss their predicted prospective applications, present the current state-of-the-art research, and illuminate the research problems they face. Many emerging technologies, including holographic teleportation (telepresence), multi-sensory extended reality, and the Internet of Smart Things, are discussed in this study as potential applications of 6G networks. The results of this study might be useful for both business leaders and academic researchers. The writers of this review article also provide some recommendations for further investigation.

4 Vision for 6G Networks and Key Enabler Technologies

This article describes the essential enabling technologies that will be required to satisfy the needs of future 6G networks and concentrates on the important performance features and criteria for such networks. Here, we describe each technology's core operating concept, potential uses, current status of research, and technical challenges.

5 Maximizing the Data Rate/Spectral Efficiency

Most people feel that the data rate is the most crucial indicator of a mobile phone’s
performance. To increase the data rate of future 6G networks, the following sections
describe the major basic technologies that will be deployed.

5.1 Multiple Antenna Technology

Investigation into multiple antenna technology has increased substantially in recent years due to its great potential to increase data speed and communication dependability. Beam shaping, diversity, and spatial multiplexing all work together to achieve this. Point-to-point (single-user) MIMO communication, in which each transceiver has multiple antennas, was the primary focus of early research on multiple antenna technology. Multiuser MIMO systems, which are now featured in several communication protocols including IEEE 802.11 (Wi-Fi), IEEE 802.16 (WiMAX), LTE, and LTE-A, have since become the main topic of discussion. Multiuser MIMO systems, which utilize a transmission method termed spatial multiplexing, have an advantage over point-to-point MIMO when it comes to serving a large number of customers simultaneously: the N transmit antenna elements at each BS can serve K non-cooperative users who each have their own antenna.

5.2 Key Points of Using a Large Number of Antenna Elements

The array gain and the number of degrees of freedom that may be achieved both increase when the transmit base station uses a large number of antenna elements. Simple signal processing techniques may be used in both the uplink (UL) and the downlink (DL) of MIMO. Using linear precoding techniques in the DL and linear combining methods in the UL, the broadcast may be directed toward particular receivers, and a combination of broadcasts from several users can be created. A large MIMO system optimizes performance by focusing the BS's outgoing signal power in the directions where the majority of users are located. Consequently, large-scale MIMO systems may be able to reduce their total energy consumption by decreasing the power delivered by each individual node. With so many positives, it is clear that massive MIMO should be at the heart of the future generation of wireless communication systems. Massive MIMO, the technology behind most mobile broadband services, may have applications outside of networking. As a low-power communications technology, massive MIMO might be useful in a variety of situations; radar, sensors, and complex machine-based networks are all examples. Important applications like joint radar and MIMO communication have received a lot of attention from academics in recent years. Multiple-input multiple-output (MIMO) communication and radar work well together because of (a) increased spectrum efficiency, (b) decreased hardware costs, (c) an increase in the number of targets that can be detected with high specificity, (d) enhanced spatial signal resolution, (e) decreased power consumption, and (f) enhanced interference rejection. Massive MIMO has come a long way over the last several years. Researchers have provided achievable sum-rate analyses, addressed the pilot contamination issue, investigated the role of correlation in massive MIMO systems, and looked into energy efficiency and power optimization, as well as FDD operation in massive MIMO systems with two-stage and single-stage precoding. Researchers have proposed an extra-large-scale massive MIMO system (XL-massive MIMO or ultra-massive MIMO) to apply the advantages of massive MIMO to higher-frequency bands and to make high-speed wireless communication universal. Utilizing plasmonic nanoantennas to direct and concentrate transmitted beams in the spatial and frequency domains, XL-massive MIMO aims to significantly increase signal strength, culminating in an array size of 1024 elements. Due to its ability to accommodate a high user density and enhance communication range, XL-massive MIMO may be deployed in a scattered fashion. A building's exterior, an airport terminal, the framework of a sports stadium, or the separators of a shopping mall are all examples of such areas.
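
To make the array-gain claim above concrete, here is a minimal numerical sketch in Python (the antenna count, channel model, and the maximum-ratio precoder are our illustrative choices, not details taken from this chapter): the received power achieved by linear precoding toward one user grows in proportion to the number N of BS antenna elements.

import numpy as np

# Minimal illustration (assumed setup, not from the chapter): one single-antenna
# user, an N-element BS, i.i.d. Rayleigh fading, maximum-ratio (linear) precoding.
rng = np.random.default_rng(0)
N = 64
h = (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)  # channel vector

w = h.conj() / np.linalg.norm(h)        # unit-power maximum-ratio precoder
rx_power = abs(h @ w) ** 2              # received signal power with precoding
print(rx_power / np.mean(abs(h) ** 2))  # exactly N: the array gain scales with N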

5.3 Reconfigurable Intelligent Reflecting Surfaces

6G faces significant hurdles as a result of its transition to higher-frequency bands: lower coverage areas due to shorter-range communications, fewer physical channel degrees of freedom owing to fewer scattering objects, and greater signal attenuation, which impacts the dependability of transmissions between the transmitter and receiver. Since the proliferation of Internet-connected home appliances and sensors, there has been a push toward the deployment of wireless networking technologies that are implemented entirely in software (SDNs), so that programmable software allows for the remote management of wireless networks. Extending the coverage area of future 6G networks, enhancing their communication dependability, and optimizing their spectrum and energy efficiency all depend on finding solutions that are both flexible and realistic without breaking the bank. To accomplish this, software directs the configuration of metasurfaces to realize an intelligent surface: a low-thickness, two-dimensional planar surface capable of manipulating the properties of propagating electromagnetic waves. The flexible electromagnetic material consists of planar integrated electrical circuits and software that may alter the propagation of electromagnetic waves. Low-cost passive scattering components are used to build these surfaces, and their amplitudes and phase shifts may be modified digitally to redirect incoming signals to new receivers.

5.4 Key Points of the IRSs

IRSs could use less energy than other wireless communication methods. The reason for this is that IRSs may operate very well even without the use of advanced techniques such as interference control methods, complicated signal processing, or power amplifiers with RF chains. Low production costs have made mass production of IRSs possible; they can be deployed indoors on walls and ceilings or in exhibition halls, and outdoors on irregularly shaped surfaces such as buildings, roads, walls, shopping malls, and airports. In places with weak multipath propagation, a widespread deployment like this has the ability to bring the network closer to more consumers. IRSs may be useful in communications systems that operate in the millimeter wave (mmWave) or terahertz (THz) frequency ranges. This is because it is generally accepted that signals at higher frequencies are more susceptible to distortion caused by transmission fluctuations. IRSs may expand wireless communications' channel options beyond what is currently achievable using the LoS approach. Several studies have examined the potential of IRSs in smart radio communications. In systems based on simultaneous wireless information and power transfer (SWIPT), for instance, IRSs are considered a means of mitigating propagating-signal attenuation, allowing for appropriate energy harvesting at the receivers; evidence for this may be found in several scientific investigations. Based on the findings presented, it is suggested that IRSs be used in mobile edge computing to increase communication reliability and decrease offloading wait times. Mobile edge computing is a new paradigm in edge computing that makes it possible to run computation-intensive Internet of Things applications on mobile devices. Additionally, in Section VI-E, we explore mobile edge computing in greater depth. Some authors investigate the potential of installing IRSs at the cell's edge in multi-cellular networks to boost the signal of the serving BS and mitigate interference from surrounding cells. There has also been a lot of research toward integrating IRSs into cognitive radio networks. IRSs may help secondary users, who repurpose the spectrum initially granted to primary users, by increasing the transmission intensity between the transmitter and receiver. IRSs may also be used to increase physical layer security: a number of IRS configurations have been studied as possible approaches to reducing data loss to snoopers and increasing received signal strength for authorized users.

6 Holographic Radio Communications

Using holograms to dynamically shape and reroute electromagnetic waves, holographic radio communication is a form of IRS-based communication. Holographic beamforming is often referred to as "holographic MIMO surfaces." The low cost and low circuit power consumption of software-defined electromagnetic wave modulators make it possible to execute dynamic beamforming cheaply. Whereas IRSs are passive surfaces that can only reflect RF signals coming from neighboring transmitters, holographic MIMO can employ large active surfaces. The unification of electromagnetics and communications is at the heart of the novel theory of holographic MIMO. By projecting an array of an effectively infinite number of antennas onto a finite area or surface, holographic MIMO enables a spatially continuous aperture in electromagnetic transceivers. Such surfaces must allow for the propagation of electromagnetic radiation and also act as barriers to it.

6.1 Key Points of Holographic Communications

Holographic MIMO has the potential to impact virtually all real-world settings. Because of the hologram's continuous electromagnetic aperture, wireless communications systems may achieve unprecedented densities and granularities in terms of both data and location. In addition, it would enable the generation and detection of electromagnetic waves at any spatial frequency, free from the interference caused by side-lobe components. By virtue of its superior spatial resolution, holographic MIMO should be able to significantly cut power consumption while significantly boosting spatial multiplexing. The considerable propagation loss encountered in the mmWave and THz bands may be reduced or eliminated by using holographic MIMO to produce super-narrow beams. Holographic MIMO also has the potential to enhance spectrum efficiency and network capacity since it combines visual and wireless communication technologies.

6.2 Radio Design Paradigms

The amount of data sent between Internet-connected devices, and the number of such devices, have both increased dramatically during the last several years. The Internet of Things (IoT) is driving the demand for faster data transfer speeds, and developers are creating more apps that rely heavily on data. Therefore, there may soon be an extremely severe scarcity of network capacity. As a result, efforts have been made to make better use of the spectrum below 10 GHz and to investigate operational frequency ranges such as mmWave and THz. It is evident that in order to address the wide range of needs associated with the Internet of Things, many frequency bands must coexist inside a single system. By doing this, we may alleviate strain on the current radio-frequency infrastructure while simultaneously decreasing the potential for interference in wireless communications. Furthermore, future 6G networks may benefit greatly from using higher-frequency bands, since this opens the door to faster peak data rates, more reliable communications, and ultra-low latency. Furthermore, 6G networks are predicted to provide a unified wireless interface by combining technologies from the higher bands (above 10 GHz) and the lower bands (below 10 GHz). However, expanding a system that uses exclusively digital precoding from the sub-10 GHz ranges to higher bands presents a variety of design and implementation issues and may even require significant modifications to the physical layer.

7 Related Challenges and Future Research

However, further study is required before the system's benefits can be fully realized. The spatial correlation structure must be studied, for example, in order to enable accurate channel modeling and channel estimation methods. Furthermore, for holographic MIMO systems, it is crucial to identify practical pilot designs that use either a purely digital or a mixed analog and digital beamforming architecture and need minimal coherence time. Holographic MIMO systems need cutting-edge signal processing technology and networking strategies before they can be employed in the real world. Similarly important is the design of protocols and algorithms for fast reconfiguration of the reflected electromagnetic signals.

8 Conclusion

Our research provides a comprehensive examination of the path forward for the 6G wireless communications networks of the future. This has allowed us to gain an understanding of the key metrics to consider when assessing a 6G network. Some of the most important measures of a 6G network's performance have been given, and the technology that will be needed to achieve them has also been described. Each technology's basic idea and underlying functioning principle have been detailed, along with its primary practical benefits and future potential uses. Research at the forefront of each technology's field has been highlighted. This paper also listed some open research problems and proposed some interesting new research directions. The research also offered helpful insights and recommendations for implementing the technologies under discussion. This article further describes potential new uses for 6G networks and the applications they may support. Finally, this paper has provided both the corporate and academic communities with a detailed image of what 6G wireless communications networks should include.

References

1. IMT traffic estimates for the years 2020 to 2030, Report ITU-R M.2370-0 (2015)
2. Cisco (2020) Cisco annual internet report (2018–2023). White Paper. https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.html
3. Gupta A, Jha RK (2015) A survey of 5G network: architecture and emerging technologies. IEEE Access 3:1206–1232
4. David K, Berndt H (2018) 6G vision and requirements: is there any need for beyond 5G? IEEE
Veh Technol Mag 13(3):72–80
5. Sharma P (2013) Evolution of mobile wireless communication networks-1G to 5G as well
as future prospective of next generation communication network. Int J Comput Sci Mobile
Comput 2(8):47–53
6. Akyildiz IF, Gutierrez-Estevez DM, Balakrishnan R, Chavarria-Reyes E (2014) LTE-advanced
and the evolution to beyond 4G (B4G) systems. Phys Commun 10:31–60
7. Wang C-X, Haider F, Gao X, You X-H, Yang Y, Yuan D, Aggoune HM, Haas H, Fletcher S,
Hepsaydir E (2014) Cellular architecture and key technologies for 5G wireless communication
networks. IEEE Commun Mag 52(2):122–130
8. Al-Eryani Y, Hossain E (2019) The D-OMA method for massive multiple access in 6G: per-
formance, security, and challenges. IEEE Veh Technol Mag 14(3):92–99
9. Huang T, Yang W, Wu J, Ma J, Zhang X, Zhang D (2019) A survey on green 6G network: architecture and technologies. IEEE Access 7:175758–175768
10. Dang S, Amin O, Shihada B, Alouini MS (2020) What should 6G be? Nat Electron 3(1):20–29
11. Letaief KB, Chen W, Shi Y, Zhang J, Zhang YJA (2019) The roadmap to 6G: AI empowered
wireless networks. IEEE Commun Mag 57(8):84–90
12. Zhang S, Xiang C, Xu S (2020) 6G: connecting everything by 1000 times price reduction.
IEEE Open J Veh Technol 1:107–115
13. Shafin R, Liu L, Chandrasekhar V, Chen H, Reed J, Zhang J (2019) Artificial intelligence-
enabled cellular networks: a critical path to beyond-5G and 6G. IEEE Wirel Commun
27(2):212–217
14. Chen S, Liang Y, Sun S, Kang S, Cheng W, Peng M (2020) Vision, requirements, and technology
trend of 6G: how to tackle the challenges of system coverage, capacity, user data-rate and
movement speed. IEEE Wirel Commun 27(2):218–228
15. Clazzer F, Munari A, Liva G, Lazaro F, Stefanovic C, Popovski P (2019) From 5G to 6G: has
the time for modern random access come?. arXiv:1903.03063
16. Nayak S, Patgiri R (2021) 6G communication technology: a vision on intelligent healthcare. In:
Health informatics: a computational Perspective in healthcare. Springer, Singapore, pp 1–18
17. Zhang D, Zhou Z, Xu C, Zhang Y, Rodriguez J, Sato T (2017) Capacity analysis of NOMA with mmWave massive MIMO systems. IEEE J Sel Areas Commun 35(7):1606
18. Huang X, Zhang JA, Liu RP, Guo YJ, Hanzo L (2019) Airplane-aided integrated networking for 6G wireless: will it work? IEEE Veh Technol Mag 14(3):84–91
Chapter 33
A Modified LSB Steganography
Algorithm to Store Images of Large Size

Y. V. Srinivasa Murthy, Shashidhar G. Koolagudi, Saloni Parekh,


Deshpande Arnav Sunil, and J. Vaishnavi

1 Introduction

Steganography can take many forms: physical, digital, in puzzles, and so on. Digital steganography itself can be categorized into image, audio, and video steganography. This paper focuses on image steganography. Image steganography involves hiding a piece of information within an image. This can be done by directly manipulating the values of the pixels of an image (spatial domain image steganography) or by modifying an orthogonal transform of the image rather than the image itself (transform domain image steganography). There are a variety of algorithms for doing so. The least significant bit (LSB) algorithm falls under the spatial domain; algorithms like discrete wavelet transform (DWT) and discrete cosine transform (DCT) come under the transform domain, and there are many other algorithms as well. Once the data is

Y. V. Srinivasa Murthy (B) · S. Parekh · D. A. Sunil · J. Vaishnavi


School of Computer Science and Engineering (SCOPE), Vellore Institute of Technology (VIT),
Vellore, Tamil Nadu 632 014, India
e-mail: vishnu.murthy@vit.ac.in
S. Parekh
e-mail: saloni.parekh2019@vitstudent.ac.in
D. A. Sunil
e-mail: deshpandearnav.sunil2019@vitstudent.ac.in
J. Vaishnavi
e-mail: jvaishnavi.2019@vitstudent.ac.in
URL: http://www.vit.ac.in
S. G. Koolagudi
Department of Computer Science and Engineering, National Institute of Technology Karnataka
(NITK), Karnataka 575 025, India
e-mail: koolagudi@nitk.edu.in
URL: https://www.nitk.ac.in/


hidden within the image, the image can be transmitted as usual over the communication channel, after which the receiver extracts the hidden message by applying a process that reverses the data-hiding process [1].
Steganography is not to be confused with cryptography. Steganography differs from cryptography in the sense that cryptography deals with protecting the contents of a message in such a way that an eavesdropper will not be able to understand the original message when looking at the encrypted message, whereas steganography deals with hiding the fact that a message has been sent in the first place. Usually, cryptographic encryption is followed by steganography. This adds an extra layer of security to the communication system.
Although the least significant bit (LSB) image steganography algorithm is simple and easy to implement, it has inherent drawbacks that need to be addressed. One of these drawbacks is that it is easily detectable by common steganalysis tools, because the LSB method is one of the default techniques that is checked for. Another drawback is low data-hiding capacity, due to the fact that only one bit per pixel can be used for storage. These drawbacks provide scope for enhancement of the LSB algorithm.
The computational simplicity of the LSB algorithm accounts for its usefulness. The LSB algorithm provides for quick hiding of data within an image, albeit with weaker security as compared to other image steganography methods. One example where LSB image steganography is used is the storage of one-time passwords (OTPs) in images on mobile phones. This algorithm is particularly useful in this case because of the limited processing power of a mobile device.
The biggest challenge in designing any image steganography algorithm is to preserve the appearance of the image, without any noticeable visual deformation as compared to the actual image, while at the same time not compromising on the amount of data to be stored and maintaining the security aspect. It should be computationally hard to detect the data hidden in an image given just the image alone. Care should also be taken to minimize the data loss during the data extraction from the image. Novel LSB designs should ensure the above-mentioned qualities [2].
This paper proposes a modified LSB algorithm. The algorithm has been tested based on several parameters. The rest of the paper is organized as follows: Sect. 2 details the literature work that has been done in the field of image steganography.

2 Literature Review

LSB steganography is the most widely used steganography technique for hiding secret data in images, mainly for its simplicity of use [3]. The distortion in the resulting image is also quite low [4]. There have been various modifications to these algorithms in the past, starting from the basic algorithm as a sequential procedure.
The basic LSB algorithm replaces the LSB of each pixel (in each channel) sequentially from left to right, top to bottom. This algorithm is really simple to implement but at the same time is highly susceptible to attacks. Since data is stored
sequentially, data can also be easily extracted from the image, and hence it is not very secure. Since then, several modifications have been proposed.
Additional perceptual transparency is achieved by embedding the data at the edges of objects [5]. Such algorithms make it difficult to extract the data from the images as it is hard to find the locations where the pixels have been modified. Adnan et al. have used a technique where one of the RGB channels of the cover image is selected and two LSBs of secret data are embedded in it [6]. However, hiding in a single RGB channel decreases the amount of data that can be hidden significantly.
Mehdi and Mureed improved the LSB method, increasing the embedding capacity while retaining the quality of the stego image, by changing up to five LSBs of pixels having low-intensity values. The message bits are also XORed before embedding for higher security. However, such techniques are susceptible to detection by the human eye [7]. There have also been some random position-based algorithms built on sequences such as the famous Fibonacci sequence. However, these put a very low limit on the amount of data that can be hidden in the image [8]. Some RGB channel-based algorithms propose that the data be hidden in the blue channel of the image, as changes to this channel do not cause much distortion to the human eye. There are several other image steganography algorithms based on discrete cosine transform (DCT) and discrete wavelet transform (DWT). These are, however, harder to implement and not worth the effort for most use cases. An improvisation on the Naive LSB algorithm tailored to one's needs is often sufficient.
The need to improve LSB also arises due to the advancement of steganalysis tools, that is, tools that detect if any data is hidden in images or any other carrier. With the advent of machine learning approaches, these tools have become stronger and are not only able to detect hidden data but also extract the content in its meaningful form. Hence, the data is no longer securely hidden. LSB, as a simple and basic algorithm, is easily detected by such tools, and hence data security is a major issue in LSB-based algorithms.
Our proposed method tries to improve data security in the case of LSB by hiding the data bits at randomly generated positions in the image. Such an implementation does not allow a third-party attacker to find out the order of the data bits of the secret data, keeping the data secure even if the attacker is able to extract it. We also combined RGB-based approaches with our method to keep the distortion at a minimum.

3 Proposed Methodology

3.1 Naive LSB Algorithm

The naive LSB algorithm (given in Algorithm 1) is one of the earliest and most used algorithms in steganography. LSB stands for least significant bit. The LSB algorithm involves altering the least significant bit plane of the image. Altering is done sequentially to enable extraction of the hidden data. Data can be hidden beginning at the start of the cover image, the middle, or the end of the image. The receiver must, however, be aware of the exact nature of the concealing algorithm used in order to extract the hidden data.
The LSB algorithm and its variants can be used in any type of steganography. When deployed for image steganography, the image compression algorithm used must be lossless. Lossy compression may alter the least significant bit plane, which makes it impossible to extract the concealed data.
The LSB algorithm is, however, subject to the threat of easy detection. The hidden data might not be very visible to the naked eye, but after subjecting the image to steganalysis techniques, the concealed data becomes evident. One of the attacks that the LSB algorithm is not immune to is bit-plane analysis. When the image is analyzed bit plane by bit plane, the pattern of data concealed in the least significant bit becomes evident. Similarly, there are many steganalysis algorithms that can detect images with concealed data. These analysis techniques mainly rely on the localization of data in the LSB algorithm.
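
To ground the description above, the following is a minimal Python sketch of the naive sequential scheme, assuming an 8-bit image flattened to a 1-D numpy array; the 32-bit size header mirrors Algorithm 1, while function and variable names are illustrative rather than the authors' exact implementation.

import numpy as np

def hide_naive(flat: np.ndarray, data: bytes) -> np.ndarray:
    """Sequentially embed a 32-bit size header and the payload, one LSB per byte."""
    bits = [(len(data) >> (31 - i)) & 1 for i in range(32)]       # size, MSB first
    bits += [(b >> (7 - k)) & 1 for b in data for k in range(8)]  # payload bits
    if len(bits) > flat.size:
        raise ValueError("Data size too large for cover image")
    out = flat.copy()                                             # uint8 array
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit                            # overwrite the LSB
    return out

def extract_naive(flat: np.ndarray) -> bytes:
    """Reverse of hide_naive: read the size header, then the payload bits."""
    size = 0
    for i in range(32):
        size = (size << 1) | int(flat[i] & 1)
    payload = bytearray()
    for j in range(size):
        byte = 0
        for k in range(8):
            byte = (byte << 1) | int(flat[32 + j * 8 + k] & 1)
        payload.append(byte)
    return bytes(payload)

An H × W × 3 uint8 image can be passed as img.reshape(-1); as noted above, the stego image must then be saved in a lossless format such as PNG.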

3.2 Random Number-Based LSB Algorithm

Steganalysis methods generally rely on the localization of data in the LSB algorithm to detect an underlying concealed pattern in the image. The random LSB algorithm randomizes the distribution of data across the image so that data is no longer localized and steganalysis tools are no longer able to detect the concealed data.
In this algorithm, instead of selecting sequential bytes for concealing data, thereby localizing it and bringing about a pattern in the least significant bit plane, the cover image bytes chosen for concealment are randomized. The concealing algorithm selects some cover image pixels at random and hides the data bits in the least significant bit of the chosen cover image pixels. The extracting algorithm must then select the same pixels as the concealer did, and also in the same order.
The random numbers generated must therefore be reproducible. The extractor must be able to reproduce the same series of random numbers that the concealer produced. However, the series produced for different messages must be different; otherwise, an eavesdropper could easily extract data and defeat the purpose of steganography. Therefore, a pseudo-random generator is used. A pseudo-random generator takes as input a seed value. If the seed value supplied is the same, at any point of time, the series of random numbers generated is the same. Thus, the user of the algorithm can input a randomly generated seed value. The seed value can then be encrypted using any symmetric cryptographic algorithm based on the required strength and performance. The seed value is also stored in the image in a certain pixel (say, the first pixel), which is required during the extraction process to generate the set of random positions again.
When random numbers are generated, there is a high possibility of a collision. In case of collisions, the previously saved data bit may be overwritten, resulting in data loss. To avoid collisions, the random numbers generated are inserted into

Algorithm 1 Hide and Extract procedures for Naive LSB Algorithm

1: function hide_data(image, data)
2: size_data = size of the data to be concealed (in bytes)
3: size_cover = size of the cover image in bytes
4: if size_data × 8 > size_cover - 32 then
5: printError - 'Data size too large for cover image'
6: Exit program
7: end if
8: Represent image as a linearized array. For every i-th pixel, the Red, Green and Blue channels are at indices i × 3, (i × 3) + 1 and (i × 3) + 2 in the linearized array
9: image_current_index = 0
10: for i in 1, 32 do
11: image[image_current_index] = (image[image_current_index] | 1) & (254 + ((size_data >> (32 - i)) & 1))
12: image_current_index = image_current_index + 1
13: end for
14: for j in 1, size_data do
15: current_byte = data[j]
16: current_bitset = bitset(current_byte)
17: bitC = 7
18: while bitC ≥ 0 do
19: image[image_current_index] = (image[image_current_index] | 1) & (254 + current_bitset[bitC])
20: image_current_index = image_current_index + 1
21: bitC = bitC - 1
22: end while
23: end for
24: end function
25: function extract_data(image)
26: Receive the image
27: Linearize the image. The Red, Green and Blue channels of every i-th pixel will now be at indices i × 3, (i × 3) + 1 and (i × 3) + 2 in the linearized array
28: sizeData = 0
29: imageCurrentIndex = 0
30: dataCurrentIndex = 0
31: for i in 1, 32 do
32: sizeData = sizeData + ((image[imageCurrentIndex] & 1) << (32 - i))
33: imageCurrentIndex = imageCurrentIndex + 1
34: end for
35: for j in 1, sizeData do
36: currentBitset = bitset(0)
37: bitC = 7
38: while bitC ≥ 0 do
39: currentBitset[bitC] = image[imageCurrentIndex] & 1
40: bitC = bitC - 1
41: imageCurrentIndex = imageCurrentIndex + 1
42: end while
43: extractedData[dataCurrentIndex] = currentBitset
44: dataCurrentIndex = dataCurrentIndex + 1
45: end for
46: end function

a set implemented as a balanced binary search tree. Whenever a random number is generated, a lookup on the tree is performed. If a collision is detected, the next closest pixel that is not already in the tree is chosen.
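
The position selection can be sketched as follows in Python, assuming random.Random as the pseudo-random generator and an ordinary set standing in for the balanced binary search tree; the linear probe to "the next closest pixel" is our reading of the collision rule, so treat it as illustrative.

import random

def random_positions(seed: int, count: int, cover_size: int) -> list:
    """Generate `count` unique pixel indices, reproducible from `seed`."""
    assert count <= cover_size, "more bits than cover pixels"
    rng = random.Random(seed)     # same seed gives the same series at both ends
    used = set()                  # stands in for the balanced BST lookups
    positions = []
    for _ in range(count):
        idx = rng.randrange(cover_size)
        while idx in used:        # collision: probe the next free pixel
            idx = (idx + 1) % cover_size
        used.add(idx)
        positions.append(idx)
    return positions

# Concealer and extractor derive identical positions from the shared seed:
assert random_positions(42, 100, 10_000) == random_positions(42, 100, 10_000)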

3.3 RGB-Based LSB Algorithm

This algorithm follows the naive LSB method to a great extent; however, it carefully chooses its pixels by hiding the data in only those pixels which have a value greater than a certain threshold (e.g., 100). This ensures the percentage change in the pixel value is not very high and hence lowers the chance of detection.
This method has been inspired by the several RGB plane-based LSB methods in which authors have chosen the RGB plane that causes the least distortion to the human eye. However, by carefully changing only those pixels that have intensity greater than a certain threshold value, we ensure that the percentage change in value is not very high, hence avoiding detection.
Data security is still a problem with the RGB-based method, and hence this algorithm has to be coupled with the random number-based LSB algorithm to improve data security. In our results, we have combined both algorithms and compared them with the Naive LSB-based algorithm; a short sketch of the threshold test follows below.
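
A minimal sketch of the threshold test, assuming an H × W × 3 uint8 image and the example threshold of 100 mentioned above (the helper name and the vectorized form are ours):

import numpy as np

def eligible_pixels(img: np.ndarray, threshold: int = 100) -> np.ndarray:
    """Return flat indices of pixels whose max(R, G, B) exceeds the threshold."""
    # A brighter pixel absorbs an LSB flip with a smaller percentage change
    # in intensity, which is why such pixels are preferred for hiding.
    mask = img.max(axis=2) > threshold
    return np.flatnonzero(mask)

img = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
idx = eligible_pixels(img)
print(f"{idx.size} of {img.shape[0] * img.shape[1]} pixels are usable")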

4 Results and Observations

4.1 Image Similarity Metric Analysis

We have experimented with our algorithm as well as the Naive LSB substitution algorithm on images of various sizes ranging from 100 × 100 to 500 × 500. The image similarity metric has been calculated for all the images. The amount of data to be hidden was increased as the size of the image increased; it was kept at around 80% of the total amount of data that can be hidden in the image using the current methodology.
We began by experimenting with single-LSB substitution, which is the most basic form of LSB substitution algorithm. The results in Table 1 show that for most images the similarity is around 99%. Such an implementation barely causes any distortion in the images, as can be observed in the example for the 500 × 500 image.
Our next set of experiments was with two-LSB substitution. With this, we can start seeing significant differences between the Naive LSB method and the proposed Random + RGB method. Clearly, the proposed method outperforms the Naive LSB method in this case.

Algorithm 2 Hide and Extract procedures for Random Number-Based LSB Algorithm

1: function hide_data(image, data, seed)
2: size_data = size of the data to be concealed (in bytes)
3: size_cover = size of the cover image in bytes
4: if size_data × 8 > size_cover - 32 then
5: printError - 'Data size too large for cover image'
6: Exit program
7: end if
8: Represent image as a linearized array. For every i-th pixel, the Red, Green and Blue channels are at indices i × 3, (i × 3) + 1 and (i × 3) + 2 in the linearized array
9: image_current_index = 0
10: seed = encrypt(seed, user_passkey)
11: image[image_current_index], image[image_current_index + 1], image[image_current_index + 2], image[image_current_index + 3] = seed
12: image_current_index = image_current_index + 4
13: for i in 1, 32 do
14: image[image_current_index] = (image[image_current_index] | 1) & (254 + ((size_data >> (32 - i)) & 1))
15: image_current_index = image_current_index + 1
16: end for
17: Generate random_array containing size_data × 8 unique random numbers (they can be generated as described above)
18: for j in 1, size_data do
19: current_byte = data[j]
20: current_bitset = bitset(current_byte)
21: bitC = 7
22: k = 0
23: while bitC ≥ 0 do
24: image_current_index = random_array[j × 8 + k]
25: image[image_current_index] = (image[image_current_index] | 1) & (254 + current_bitset[bitC])
26: bitC = bitC - 1
27: k = k + 1
28: end while
29: end for
30: end function
31: function extract_data(image)
32: Receive the image
33: Linearize the image. The Red, Green and Blue channels of every i-th pixel will now be at indices i × 3, (i × 3) + 1 and (i × 3) + 2 in the linearized array
34: sizeData = 0
35: imageCurrentIndex = 0
36: dataCurrentIndex = 0
37: seed = image[imageCurrentIndex], image[imageCurrentIndex + 1], image[imageCurrentIndex + 2], image[imageCurrentIndex + 3]
38: seed = decrypt(seed, user_passkey)
39: imageCurrentIndex = imageCurrentIndex + 4
40: for i in 1, 32 do
41: sizeData = sizeData + ((image[imageCurrentIndex] & 1) << (32 - i))
42: imageCurrentIndex = imageCurrentIndex + 1
43: end for
44: Generate random_array containing sizeData × 8 unique random numbers (they can be generated as described above)
45: for j in 1, sizeData do
46: currentBitset = bitset(0)
47: bitC = 7
48: k = 0
49: while bitC ≥ 0 do
50: imageCurrentIndex = random_array[j × 8 + k]
51: currentBitset[bitC] = image[imageCurrentIndex] & 1
52: bitC = bitC - 1
53: k = k + 1
54: end while
55: extractedData[dataCurrentIndex] = currentBitset
56: dataCurrentIndex = dataCurrentIndex + 1
57: end for
58: end function
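
For concreteness, the hide path of Algorithm 2 might be rendered in Python roughly as follows, reusing the random_positions helper sketched in Sect. 3.2; storing the seed unencrypted and offsetting the random positions past the 36-byte header are simplifications of ours, not the exact published procedure.

import numpy as np

def hide_random(flat: np.ndarray, data: bytes, seed: int) -> np.ndarray:
    out = flat.copy()
    # Store the 32-bit seed in the first 4 bytes (unencrypted here for brevity).
    for i in range(4):
        out[i] = (seed >> (24 - 8 * i)) & 0xFF
    # Store the 32-bit payload size in the LSBs of the next 32 bytes.
    for i in range(32):
        bit = (len(data) >> (31 - i)) & 1
        out[4 + i] = (out[4 + i] & 0xFE) | bit
    # Scatter the payload bits over seeded random positions past the header.
    positions = random_positions(seed, len(data) * 8, out.size - 36)
    for n, pos in enumerate(positions):
        bit = (data[n // 8] >> (7 - n % 8)) & 1
        out[36 + pos] = (out[36 + pos] & 0xFE) | bit
    return out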

Algorithm 3 Hide and Extract procedures for RGB-Based LSB Algorithm

1: initialize threshold = 100
2: function hide_data(image, data)
3: Follow steps 1-18 as shown in the Naive LSB algorithm
4: While storing each bit, let maxc = max(R, G, B), where R, G, B stand for the pixel intensities in the Red, Green and Blue channels
5: if maxc > threshold then
6: Store the bit at that pixel location
7: else
8: Move on to the next pixel
9: end if
10: Continue until all secret bits are hidden
11: end function
12: function extract_data(image)
13: Extract individual bits as in the Naive LSB algorithm
14: However, at each pixel position, ensure the pixel intensity is greater than the threshold value
15: end function

Fig. 1 Outcome of random and sequential LSB steganography with the consideration of one LSB per pixel (panels: Random Pixel, Sequential)

With three-LSB substitution, we finally start to see differences across image sizes. As the size of the image decreases, the image similarity metric clearly reduces, which supports the proposition that the image similarity metric takes into account the spatial positioning of the data bits. More bits are hidden closer to each other in smaller images, and hence there is a higher chance of detection.
With four- and five-LSB substitution, we start seeing noticeable changes in the image. In the case of the Naive LSB algorithm, we can see a series of darker dots, indicating that the pixels have become darker. In the case of the proposed Random + RGB method, looking closely we can see certain distortions (dark spots) in the image. The image similarity metric for such cases has also reduced significantly.

Fig. 2 Outcome of random and sequential LSB steganography with the consideration of two LSBs per pixel (panels: Random Pixel, Sequential)

Fig. 3 Outcome of random and sequential LSB steganography with the consideration of three LSBs per pixel (panels: Random Pixel, Sequential)

Five-LSB substitution, as expected, clearly shows high levels of distortion. At the same time, it shows how much better the proposed algorithm is compared to the Naive LSB substitution method: distortions in the image generated by the Random + RGB method can barely be spotted unless one is very close to the image.
We have provided five figures (Figs. 1, 2, 3, 4, and 5) and five tables (Tables 1, 2, 3, 4, and 5) that explain the change in the image pixels after applying one-, two-, three-, four-, and five-LSB steganography algorithms. Both the sequential and random images are displayed.

Fig. 4 Outcome of random and sequential LSB steganography with the consideration of four LSBs per pixel (panels: Random Pixel, Sequential)

Fig. 5 Outcome of random and sequential LSB steganography with the consideration of five LSBs per pixel (panels: Random Pixel, Sequential)

4.2 Algorithms Comparison

The algorithms we have proposed have their own merits and demerits. We display here tables comparing the properties of the proposed algorithms against the Naive substitution algorithm. Table 6 gives details about the efficiency of the proposed approach relative to the other algorithms, and Table 7 gives details about its performance.

Table 1 Results obtained using Naive and proposed methodology (1-LSB steganography)
Image size | Naive | Random + RGB
100 × 100 | 99.378 | 99.453
200 × 200 | 99.619 | 99.689
300 × 300 | 99.700 | 99.754
400 × 400 | 99.773 | 99.780
500 × 500 | 99.806 | 99.817

Table 2 Results obtained using Naive and proposed methodology (2-LSB steganography)
Image size | Naive | Proposed
100 × 100 | 89.178 | 98.787
200 × 200 | 90.839 | 98.910
300 × 300 | 91.461 | 99.011
400 × 400 | 92.109 | 99.291
500 × 500 | 92.647 | 99.339

Table 3 Results obtained using Naive and proposed methodology (3-LSB steganography)
Image size | Naive | Proposed
100 × 100 | 88.120 | 96.261
200 × 200 | 89.219 | 97.359
300 × 300 | 90.410 | 98.107
400 × 400 | 91.081 | 98.671
500 × 500 | 92.887 | 99.051

Table 4 Results obtained using Naive and proposed methodology (4-LSB steganography)
Image size | Naive | Proposed
100 × 100 | 66.799 | 92.117
200 × 200 | 68.014 | 93.399
300 × 300 | 69.102 | 94.651
400 × 400 | 60.221 | 95.301
500 × 500 | 61.114 | 96.101

Table 5 Results obtained using Naive and proposed methodology (5-LSB steganography)
Image size | Naive | Proposed
100 × 100 | 57.087 | 92.017
200 × 200 | 59.211 | 93.399
300 × 300 | 60.205 | 94.781
400 × 400 | 61.121 | 95.401
500 × 500 | 61.9076 | 96.044

Table 6 Comparative analysis of the parameters security, amount of data, and spatial noise of the proposed approach against the other algorithms
Algorithm | Bits/pixel (bpp) | Data security | Amount of data | Spatial noise
Naive LSB | 1–3 | Low | High | High
RGB-based LSB | 1 | Low | Low | Low
Random number | <1 | High | High | Moderate
Random + RGB | <1 | High | Low | Very low

Table 7 Performance of the proposed approach over the other approaches: amount of data discovered by steganalysis (in bytes) for a given amount of hidden data
Amount of data to hide (bytes) | Naive LSB | RGB-based LSB | Random number-based LSB
89280 | 111307 | 0 | 0
110565 | 122188 | 0 | 103723
151802 | 148196 | 0 | 131142
212048 | 142539 | (data too big) | 171646

4.3 Steganalysis Comparison

We also checked the images generated by the above algorithms against an available state-of-the-art steganalysis tool called StegExpose; the results of this experiment are reported in Table 7 (amount of data discovered).
We have performed a comparative analysis of our proposed algorithms and studied their performance with respect to the following parameters:

• Noise induced in the image after the steganography
• Loss of data after extracting the data from the image
• Result of steganalysis techniques performed on the image after steganography (amount of data discovered)
• Image similarity.

Table 8 Image similarity metric used to compare the original image with the image carrying the secret information
#bits used | Image similarity (Sequential) | Image similarity (Random)
1 | 99.8065 | 99.8817
2 | 92.6473 | 99.3399
3 | 92.8875 | 99.0510
4 | 61.1147 | 96.1010
5 | 61.9076 | 96.0441

4.4 Image Similarity Metric

For comparing the similarity of two images, the pixel values cannot be compared directly. The images formed using Naive and random LSB have the same bit differences with respect to the original image.
Humans, however, will not notice the same degree of difference between the two in spite of the same amount of difference in pixel values. We therefore came up with our own metric, called the image similarity metric. Table 8 gives the details of the similarities obtained relative to the original images.

Algorithm 4 Calculation of Image Similarity Metric

1: Divide the image into n × n blocks of pixels.
2: Find the average of the Red, Green and Blue channels in each block.
3: Find the percentage difference in the average values of each channel between the two corresponding blocks of the two images being compared. Let these values be R_p, B_p, G_p.
4: The average difference between the two corresponding blocks is calculated as a weighted average of the above values as follows: Average = R_p × 0.299 + B_p × 0.114 + G_p × 0.587
5: The average visual difference between the two images is calculated as the average of the differences between all the blocks.
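
A direct Python rendering of Algorithm 4 might look as follows; the block size n = 8, the handling of non-divisible image edges, and reporting similarity as 100 minus the average difference are our assumptions, since the algorithm text leaves them open.

import numpy as np

def image_similarity(img_a: np.ndarray, img_b: np.ndarray, n: int = 8) -> float:
    """Block-averaged, luma-weighted similarity (in percent) of two RGB images."""
    h, w = (img_a.shape[0] // n) * n, (img_a.shape[1] // n) * n  # crop to n-multiples
    a = img_a[:h, :w].astype(np.float64)
    b = img_b[:h, :w].astype(np.float64)
    # Step 2: average each channel over n x n blocks.
    blk_a = a.reshape(h // n, n, w // n, n, 3).mean(axis=(1, 3))
    blk_b = b.reshape(h // n, n, w // n, n, 3).mean(axis=(1, 3))
    # Step 3: per-channel percentage difference between corresponding blocks.
    diff = 100.0 * np.abs(blk_a - blk_b) / (blk_a + 1e-9)
    # Step 4: weighted average with weights R = 0.299, G = 0.587, B = 0.114.
    weighted = diff[..., 0] * 0.299 + diff[..., 1] * 0.587 + diff[..., 2] * 0.114
    # Step 5: similarity = 100 - average visual difference over all blocks.
    return 100.0 - float(weighted.mean())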

5 Conclusion

The results we have collected from our extensive set of experiments were shown in the previous section. From these results, we can clearly see that the proposed RGB + Random-based method improves on the Naive LSB algorithm to a great extent. Our purpose of developing a modified and enhanced LSB steganography algorithm is thus fulfilled.

This paper has certain limitations. The modified algorithms work better than the Naive LSB algorithm; however, more experiments need to be performed to compare them with the other modified LSB algorithms that exist today. The algorithm has also not yet been compared against more advanced image steganography algorithms. Future work should compare the results of this algorithm with other algorithms in the domain of LSB steganography.

References

1. Li B, He J, Huang J, Shi YQ (2011) A survey on image steganography and steganalysis. J Inf Hiding Multim Sign Process 2(2):142–172
2. Gutub AAA et al (2010) Pixel indicator technique for RGB image steganography. J Emerg Technol Web Intell 2(1):56–64
3. Hashim MM, Rahim MSM, Johi FA, Taha MS, Hamad HS (2018) Performance evaluation measurement of image steganography techniques with analysis of LSB based on variation image formats. Int J Eng Technol 7(4):3505–3514
4. Neeta D, Snehal K, Jacobs D (2006) Implementation of LSB steganography and its evaluation for various bits. In: 2006 1st international conference on digital information management, pp 173–178. IEEE
5. Hempstalk K (2006) Hiding behind corners: using edges in images for better steganography. In: Proceedings of the computing women's congress, Hamilton, New Zealand, pp 11–19
6. Karim SMM, Rahman MS, Hossain MI (2011) A new approach for LSB based image steganography using secret key. In: 14th international conference on computer and information technology (ICCIT 2011)
7. Al-Shatnawi AM, AlFawwaz BM (2013) An integrated image steganography system with improved image quality. Appl Math Sci 7(71):3545–3553
8. Rehman A, Saba T, Mahmood T, Mehmood Z, Shah M, Anjum A (2019) Data hiding technique in steganography for information security using number theory. J Inf Sci 45(6):767–778
Author Index

A
Adhoksh, P., 137
Akshata S. Kori, 39
Aldaya, Ivan, 259
Anuradha C. Phadke, 365
Aryan Khandelwal, 137
Aswathi, S., 271
Ayushi, S., 137
Ayush Shah, 137

B
Balasubramaniam Jayaram, 189
Bhautik H. Gevariya, 77
Bhumika, R., 137
Bhuvanesh Bhattarai, 271
Biswajit Tripathy, 213
Biswaranjan Sarangi, 213

C
Cesar, Vitória, 259
Cossa, Grazielle, 259
Costa, Camila, 259

D
Deepak C Karia, 323
Deivalakshmi, S., 101
de Oliveira, José Augusto, 259
Deshpande Arnav Sunil, 393
Dhinakaran, K., 199

G
Garima Bisht, 25
Goswami Siddhant Arun, 323
Gracelin Sheena, B., 309

H
Heenakauser Pyare, 151
Hemantha Kumar, G., 89, 299
Hua Guang Hui, 89

I
Indrani Mukherjee, 177

J
Janaki, K., 199
Jebastin, K., 199
Jeyavani, M., 113
Jyoti Madake, 151

K
Kajal Rai, 223
Kamal Kumar, 1
Karuppasamy, M., 113
Kavita H. Gudadhe, 381
Kavit Nanavati, 189
Keshav Jhawar, 137
Krishna Sowjanya, K., 271

M
Manjunath Aradhya, V. N., 89, 299
Manju, V. N., 271
Marim, Lucas, 259
Mayur S. Gowda, 39
Medha Wyawahare, 285
Megha Gupta, 189

N
Nipun Jain, 285

O
Om Sarulkar, 235

P
Pal, A. K., 25
Paramita Mitra, 271
Paridhi, K., 355
Pawan Kumar, 223
Penchel, Rafael, 259
Pooja V. Kamat, 331
Prakhar Agarwal, 331
Prasanth, A., 309
Prasant Kumar Pattnaik, 63
Pratik Bhattacharjee, 177
Pratyush Jain, 331
Preethi Sheba Hepsiba Darius, 271
Priya Deshpande, 51
Priyanka P. Pawar, 365

R
Rahul Joshi, 331
Rahul Mansharamani, 331
Rahul Pitale, 235
Rajesh Jalnekar, 151
Rakesh Kumar Pandey, 343
Rakesh Kumar Tiwari, 343
Rakshitha, H. S., 39
Ramya, R. S., 137
Ritu Malik, 1
Rohan More, 235

S
Sabeena Gnana Selvi, G., 309
Sagar Nilgar, 151
Sagar Shedge, 151
Sai Vignesh, 101
Saloni Parekh, 393
Sanchari Saha, 271
Sandhya, D., 309
Sanjaykumar J. Patel, 77
Santos, Mirian, 259
Satarupa Mohanty, 63
Shadab Siddiqui, 11
Sharada Kore, 51
Sharan, R., 101
Shashidhar G. Koolagudi, 393
Sheikh Fahad Ahmad, 11
Shivam Tikhe, 235
Shola Olabisi, 355
Shreekanth M. Prabhu, 271
Shripad Bhatlawande, 151
Shruti Patil, 331
Silva, Leandro Augusto, 249
Silva Souza da, Vitor, 249
Souza dos Santos, Denilson, 259
Sree Charan Mamidi, 11
Srinivasa Murthy, Y. V., 355, 393
Sudaroli Sandana, J., 101
Sudha Govindan, 127
Sudhanshu Pandey, 331
Suguna Sangaiah, 127
Sumit Giri, 235
Suparna Biswas, 177
Swati Gaikwad, 381
Swati Shilaskar, 151

T
Tanmay Paratkar, 285
Thejaswini, M. S., 299

U
Umesh V. Nikam, 165

V
Vaishali M. Deshmukh, 165
Vaishnavi, J., 355, 393
Venugopal, K. R., 137
Vijay Kumar Nampally, 63
Vipul Kheraj, 77
Vivek Mankar, 285

W
Warsha P. Sirskar, 381