All content following this page was uploaded by Guang HUI Hua on 02 February 2024.
Ritu Tiwari · Mario F. Pavone · Mukesh Saraswat, Editors
Proceedings of International Conference on Computational Intelligence
ICCI 2022
Algorithms for Intelligent Systems
Series Editors
Jagdish Chand Bansal, Department of Mathematics, South Asian University,
New Delhi, Delhi, India
Kusum Deep, Department of Mathematics, Indian Institute of Technology Roorkee,
Roorkee, Uttarakhand, India
Atulya K. Nagar, School of Mathematics, Computer Science and Engineering,
Liverpool Hope University, Liverpool, UK
This book series publishes research on the analysis and development of algorithms for
intelligent systems with their applications to various real world problems. It covers
research related to autonomous agents, multi-agent systems, behavioral modeling,
reinforcement learning, game theory, mechanism design, machine learning, meta-
heuristic search, optimization, planning and scheduling, artificial neural networks,
evolutionary computation, swarm intelligence and other algorithms for intelligent
systems.
The book series includes recent advancements, modifications, and applications of
artificial neural networks, evolutionary computation, swarm intelligence, artificial
immune systems, fuzzy systems, autonomous and multi-agent systems, machine
learning, and other areas related to intelligent systems. The material will be beneficial
for graduate students, post-graduate students, and researchers who
want a broader view of advances in algorithms for intelligent systems. The contents
will also be useful to researchers from other fields who have no knowledge of
the power of intelligent systems, e.g. researchers in bioinformatics,
biochemists, mechanical and chemical engineers, economists, musicians, and medical
practitioners.
The series publishes monographs, edited volumes, advanced textbooks and
selected proceedings.
Indexed by zbMATH.
All books published in the series are submitted for consideration in Web of
Science.
Editors
Ritu Tiwari, Indian Institute of Information Technology, Pune, India
Mario F. Pavone, Department of Mathematics and Computer Science, University of Catania, Catania, Italy
Mukesh Saraswat, Jaypee Institute of Information Technology, Noida, India
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface
This book contains outstanding research papers as the proceedings of the International
Conference on Computational Intelligence (ICCI 2022), held on December
29–30, 2022, at the Indian Institute of Information Technology, Pune, India, under the
technical sponsorship of the Soft Computing Research Society, India. The conference
is conceived as a platform for disseminating and exchanging ideas, concepts, and
results among researchers from academia and industry, with the aim of developing a
comprehensive understanding of the challenges of advancing intelligence from a
computational viewpoint. This book will help in strengthening congenial networking
between academia and industry. It presents novel contributions to
Computational Intelligence and serves as reference material for advanced research.
We have tried our best to enrich the quality of ICCI 2022 through a stringent
and careful peer-review process. ICCI 2022 received many technical contributions
from distinguished participants from home and abroad. After this peer-review
process, only 33 high-quality papers were accepted for
presentation and for the final proceedings. The proceedings of ICCI 2022 contain 33
research papers on Computational Intelligence-based algorithms and applications.
About the Editors
Professor Pavone was also an invited speaker at several international conferences and
editor of many special issues in Artificial Life, Engineering Applications of Artificial
Intelligence (EAAI), Applied Soft Computing (ASOC), BMC Immunology, Natural
Computing, and Memetic Computing. Professor Pavone is Co-founder of Tao
Science Research Center and Scientific Director of ANTs Lab—Advanced New
Technologies Research Laboratory. Professor Pavone was Visiting Professor at the
School of Computer Science, University of Nottingham, UK, and Visiting Researcher
at the IBM-KAIST Bio-Computing Research Center, Department of Bio and Brain
Engineering, at the Korea Advanced Institute of Science and Technology (KAIST)
in 2009 and 2006, respectively.
1 Introduction
novel (R,S)-norm EM for IFS. Another important part of solving MADM
problems is aggregating the data provided by the decision-makers.
However, IFS is not well suited to qualitative fuzzy information, which is much
easier to express with linguistic variables (LVs) [15]. For example, when the quality
of a food product is assessed, decision-makers generally adopt terms
like “not good”, “good”, and “very good” to support their choice. To handle the
uncertainty of qualitative data, the linguistic intuitionistic fuzzy set (LIFS) was
developed in 2015 [16] by combining the characteristics of LVs and IFSs. Kumar and
Chen [17] defined weighted averaging AOs for aggregating linguistic intuitionistic
fuzzy numbers (LIFNs). Liu and Wang [18] defined improved AOs for LIFNs.
Peng et al. [19] presented AOs for LIFNs based on Frank Heronian operations.
Garg and Kumar proposed AOs for LIFNs based on set pair analysis (SPA) theory [20]
and defined a possibility degree measure for comparing LIFNs [21]. Kumar and Chen
[22] defined distance measures for LIFSs and a group decision-making method
for LIFSs. Meng and Dong [23] defined similarity measures for LIFSs and a
PROMETHEE method based on them. Tang and Meng [24] defined Hamacher
aggregation operators for aggregating LIFNs. Liu et al. [25] defined a three-way
decision method for LIFNs. Li et al. [26] proposed an entropy measure for LIFSs
and extended the VIKOR method based on new LIFS operational laws and the proposed
entropy measure. In 2021, Kumar et al. [27] defined a new entropy measure for LIFSs
for solving decision-making problems.
However, on mathematical verification, we found some inadequacies in the existing
EMs of LIFSs. To overcome these drawbacks, a distinct EM is required
to measure the uncertainty of LIFSs. This paper proposes a new EM for LIFSs.
To validate it, we prove that the presented EM satisfies some desirable properties
and the validity condition. The proposed EM overcomes the drawbacks
of the current EMs of LIFSs, and it is easy to compute and useful for measuring
the uncertainty of LIFSs.
The rest of the paper is organized as follows. Section 2 gives a brief
introduction to the fundamental concepts relevant to this paper.
The drawbacks of the current EMs are given in Sect. 3. In Sect. 4, we
define a new EM for the LIFS environment that overcomes the disadvantages of the
current EMs of LIFSs. Finally, Sect. 5 concludes the paper.
2 Preliminaries
Definition 1 [28] Let $S = \{s_t \mid t = 0, 1, 2, \ldots, h\}$ be a linguistic term (LT) set (LTS)
with a finite odd cardinality, where $s_t$ is a possible value for a linguistic variable (LV).
For example, when evaluating a laptop’s “configuration”, we can use seven
LTs: $s_0$ (“none”), $s_1$ (“very low”), $s_2$ (“low”), $s_3$ (“medium”), $s_4$ (“high”), $s_5$ (“very
high”), and $s_6$ (“perfect”).
1 Entropy Measure for the Linguistic Intuitionistic Fuzzy Set 3
Definition 2 [16] A linguistic intuitionistic fuzzy set (LIFS) $Z$ in the universe of
discourse $U$ is defined as
$$Z = \{\langle u_i, s_{\rho(u_i)}, s_{\eta(u_i)} \rangle \mid u_i \in U\}, \tag{1}$$
where $s_{\rho(u_i)} \in s_{[0,h]}$ and $s_{\eta(u_i)} \in s_{[0,h]}$ indicate the belongingness degree (BD) and
non-belongingness degree (NBD) of the element $u_i \in U$ to $Z$, respectively, with
$0 \le \rho(u_i) \le h$, $0 \le \eta(u_i) \le h$, and $0 \le \rho(u_i) + \eta(u_i) \le h$. The term
$s_{\pi(u_i)} = s_{h - \rho(u_i) - \eta(u_i)}$ is called the hesitance degree of $u_i$ to $Z$, where $0 \le \pi(u_i) \le h$, $u_i \in U$.
Usually, the pair $(s_\rho, s_\eta)$ is called a linguistic intuitionistic fuzzy number (LIFN),
where $0 \le \rho \le h$, $0 \le \eta \le h$, and $0 \le \rho + \eta \le h$.
Let $\Gamma_{[0,h]}$ be the collection of the LIFSs.
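Definition 3 [16] Let $\beta_1 = (s_{\rho_1}, s_{\eta_1})$ and $\beta_2 = (s_{\rho_2}, s_{\eta_2})$ be any two LIFNs, then
(1) $\beta_1 \oplus \beta_2 = \left(s_{\rho_1 + \rho_2 - \frac{\rho_1 \rho_2}{h}},\; s_{\frac{\eta_1 \eta_2}{h}}\right)$;
(2) $\beta_1 \otimes \beta_2 = \left(s_{\frac{\rho_1 \rho_2}{h}},\; s_{\eta_1 + \eta_2 - \frac{\eta_1 \eta_2}{h}}\right)$;
(3) $k\beta = k(s_\rho, s_\eta) = \left(s_{h - h\left(1 - \frac{\rho}{h}\right)^k},\; s_{h\left(\frac{\eta}{h}\right)^k}\right)$;
(4) $\beta^k = (s_\rho, s_\eta)^k = \left(s_{h\left(\frac{\rho}{h}\right)^k},\; s_{h - h\left(1 - \frac{\eta}{h}\right)^k}\right)$;
where $k > 0$.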
Definition 4 [16] For any LIFN β = (sρ , sη ), score value S(β) and accuracy func-
tion H (β) are represented as:
S(β) = ρ − η (2)
H (β) = ρ + η (3)
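The operational laws of Definition 3 and the score and accuracy functions of Definition 4 can be sketched in Python as follows. This is only an illustrative sketch: the class name `LIFN` and its method names are assumptions introduced here, and an LIFN is represented simply by the subscripts ρ and η together with the scale bound h.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LIFN:
    """A linguistic intuitionistic fuzzy number (s_rho, s_eta) on the scale [0, h]."""
    rho: float  # subscript of the belongingness term
    eta: float  # subscript of the non-belongingness term
    h: float    # largest subscript of the linguistic term set

    def __post_init__(self):
        eps = 1e-12  # small tolerance for floating-point round-off
        if min(self.rho, self.eta) < -eps or self.rho + self.eta > self.h + eps:
            raise ValueError("need rho, eta >= 0 and rho + eta <= h")

    def add(self, other):
        """beta1 (+) beta2, Definition 3 (1)."""
        h = self.h
        return LIFN(self.rho + other.rho - self.rho * other.rho / h,
                    self.eta * other.eta / h, h)

    def mul(self, other):
        """beta1 (x) beta2, Definition 3 (2)."""
        h = self.h
        return LIFN(self.rho * other.rho / h,
                    self.eta + other.eta - self.eta * other.eta / h, h)

    def scale(self, k):
        """k * beta for k > 0, Definition 3 (3)."""
        h = self.h
        return LIFN(h - h * (1 - self.rho / h) ** k, h * (self.eta / h) ** k, h)

    def power(self, k):
        """beta ** k for k > 0, Definition 3 (4)."""
        h = self.h
        return LIFN(h * (self.rho / h) ** k, h - h * (1 - self.eta / h) ** k, h)

    def score(self):
        """Score value S(beta) = rho - eta, Eq. (2)."""
        return self.rho - self.eta

    def accuracy(self):
        """Accuracy function H(beta) = rho + eta, Eq. (3)."""
        return self.rho + self.eta


a = LIFN(4, 2, 8)
b = LIFN(2, 4, 8)
print(a.add(b), a.mul(b), a.scale(2), a.power(2), a.score(), a.accuracy())
```

For the sample values $(s_4, s_2)$ and $(s_2, s_4)$ with $h = 8$, the sum is $(s_5, s_1)$ and the product is $(s_1, s_5)$, so both results remain valid LIFNs.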
In the following, we review some existing EMs for LIFSs. Let $Z =
\{\langle u_i, s_{\rho(u_i)}, s_{\eta(u_i)} \rangle \mid u_i \in U\} \in \Gamma_{[0,h]}$ be any LIFS, then
$$E_1(Z) = \frac{1}{3nh} \sum_{i=1}^{n} \left[ 4\sqrt{\rho(u_i)\,\eta(u_i)} + \pi(u_i) + 2\sqrt{(h - \rho(u_i))(h - \eta(u_i))} \right]. \tag{4}$$
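A minimal Python sketch of the existing measure $E_1$ of Eq. (4) is given below, assuming an LIFS is passed as a list of $(\rho, \eta)$ subscript pairs. The function name `entropy_e1` is an assumption made for illustration. The sample inputs are the five single-element LIFNs whose $(\rho, \eta)$ values appear in the $E(Z_1), \ldots, E(Z_5)$ calculations later in this chapter, with $h = 8$.

```python
import math


def entropy_e1(lifs, h):
    """Existing entropy measure E1 of Eq. (4).

    lifs: list of (rho, eta) subscript pairs, one per element u_i.
    h:    largest subscript of the linguistic term set.
    """
    total = 0.0
    for rho, eta in lifs:
        pi = h - rho - eta  # hesitance degree
        total += (4 * math.sqrt(rho * eta) + pi
                  + 2 * math.sqrt((h - rho) * (h - eta)))
    return total / (3 * len(lifs) * h)


samples = [(0.6, 0.5), (2.8, 3.0), (2.9, 3.1), (3.79, 2.31), (2.729, 4.1)]
e1_values = [round(entropy_e1([pair], 8), 4) for pair in samples]
print(e1_values)
```

Rounded to four decimals, the first three inputs give the same value, which is the tie reported in the $E_1$ column of Table 2.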
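Definition 6 [16] Let $Z = \{\langle u_i, s_{\rho(u_i)}, s_{\eta(u_i)} \rangle \mid u_i \in U\}$ be any LIFS and $k > 0$, then
$Z^k$ is defined as
$$Z^k = \left\{ \left\langle u_i,\; s_{h\left(\frac{\rho(u_i)}{h}\right)^k},\; s_{h\left(1 - \left(1 - \frac{\eta(u_i)}{h}\right)^k\right)} \right\rangle \;\middle|\; u_i \in U \right\}. \tag{6}$$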
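The power operation of Definition 6 can be sketched as a small helper that applies Eq. (6) to every element of an LIFS. The function name `lifs_power` and the list-of-pairs representation are assumptions made for this sketch.

```python
def lifs_power(lifs, k, h):
    """Return Z**k per Eq. (6) for an LIFS given as (rho, eta) subscript pairs."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [(h * (rho / h) ** k, h * (1 - (1 - eta / h) ** k))
            for rho, eta in lifs]


Z = [(4, 2)]
print(lifs_power(Z, 2, 8))    # k > 1 lowers rho and raises eta
print(lifs_power(Z, 0.5, 8))  # k < 1 raises rho and lowers eta
```

With $k = 1$ the set is returned unchanged, which is a quick sanity check on the formula.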
Now, by utilizing Eq. (4), we calculate the existing EM $E_1$ for the LIFSs $Z^{1/2}$,
$Z$, $Z^2$, $Z^3$, and $Z^4$ and get $E_1(Z^{1/2}) = 0.8630$, $E_1(Z) = 0.8698$, $E_1(Z^2) = 0.7181$,
$E_1(Z^3) = 0.5773$, and $E_1(Z^4) = 0.4689$.
For the LIFSs $Z^{1/2}$, $Z$, $Z^2$, $Z^3$, and $Z^4$, an effective EM must satisfy the following
relation [13, 14, 27]:
$$E(Z^{1/2}) > E(Z) > E(Z^2) > E(Z^3) > E(Z^4). \tag{8}$$
Based on the computed result of the existing EM [27] given in Eq. (4), we obtain
$E_1(Z) > E_1(Z^{1/2}) > E_1(Z^2) > E_1(Z^3) > E_1(Z^4)$. Thus the existing EM $E_1$ given
in Eq. (4) does not satisfy the relation given in Eq. (8) for this example. Hence, we
require a new EM for LIFSs that overcomes the disadvantages of the existing EM of
LIFSs.
$$E(Z) = \frac{1}{nh} \sum_{i=1}^{n} \left( h - \frac{1}{h}\,|\rho(u_i) - \eta(u_i)|\,(h - \pi(u_i)) \right) \tag{9}$$
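Equation (9) translates directly into a few lines of Python. The sketch below assumes an LIFS is represented as a list of $(\rho, \eta)$ subscript pairs; the function name `entropy_proposed` is an assumption made for illustration.

```python
def entropy_proposed(lifs, h):
    """Proposed entropy measure E of Eq. (9).

    lifs: list of (rho, eta) subscript pairs, one per element u_i.
    h:    largest subscript of the linguistic term set.
    """
    total = 0.0
    for rho, eta in lifs:
        pi = h - rho - eta  # hesitance degree
        total += h - abs(rho - eta) * (h - pi) / h
    return total / (len(lifs) * h)


print(entropy_proposed([(8, 0)], 8))  # crisp element: no uncertainty
print(entropy_proposed([(4, 4)], 8))  # rho == eta: maximal uncertainty
print(entropy_proposed([(6, 1)], 8))  # an intermediate case
```

The two extreme inputs give 0 and 1, respectively, which is the behavior that the proof of Theorem 1 establishes in general.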
Theorem 1 The proposed entropy measure $E(Z)$ of the LIFS $Z = \{\langle u_i, s_{\rho(u_i)}, s_{\eta(u_i)} \rangle \mid
u_i \in U\} \in \Gamma_{[0,h]}$ satisfies the properties given in Definition 5.
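(P1) Since every summand in Eq. (9) is non-negative, for all $u_i \in U$ we have
$$\begin{aligned}
E(Z) = 0 &\Leftrightarrow \frac{1}{nh} \sum_{i=1}^{n} \left( h - \frac{1}{h}|\rho(u_i) - \eta(u_i)|(h - \pi(u_i)) \right) = 0 \\
&\Leftrightarrow h - \frac{1}{h}|\rho(u_i) - \eta(u_i)|(h - \pi(u_i)) = 0 \\
&\Leftrightarrow h^2 - |\rho(u_i) - \eta(u_i)|(h - \pi(u_i)) = 0 \\
&\Leftrightarrow \rho(u_i) = h,\ \eta(u_i) = 0 \ \text{ or } \ \rho(u_i) = 0,\ \eta(u_i) = h.
\end{aligned}$$
(P2) Since every summand in Eq. (9) is at most $h$, for all $u_i \in U$ we have
$$\begin{aligned}
E(Z) = 1 &\Leftrightarrow \frac{1}{nh} \sum_{i=1}^{n} \left( h - \frac{1}{h}|\rho(u_i) - \eta(u_i)|(h - \pi(u_i)) \right) = 1 \\
&\Leftrightarrow h - \frac{1}{h}|\rho(u_i) - \eta(u_i)|(h - \pi(u_i)) = h \\
&\Leftrightarrow \frac{1}{h}|\rho(u_i) - \eta(u_i)|(h - \pi(u_i)) = 0 \\
&\Leftrightarrow |\rho(u_i) - \eta(u_i)|(h - \pi(u_i)) = 0 \\
&\Leftrightarrow \rho(u_i) = \eta(u_i).
\end{aligned}$$
(P3)
$$\begin{aligned}
E(Z) &= \frac{1}{nh} \sum_{i=1}^{n} \left( h - \frac{1}{h}|\rho(u_i) - \eta(u_i)|(h - \pi(u_i)) \right) \\
&= \frac{1}{nh} \sum_{i=1}^{n} \left( h - \frac{1}{h}|\eta(u_i) - \rho(u_i)|(h - \pi(u_i)) \right) \\
&= E(Z^c).
\end{aligned}$$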
(P4) Consider the function $f(x, y) = h - \frac{1}{h}|x - y|(x + y)$, where $0 \le x, y \le h$
and $0 \le x + y \le h$. We must demonstrate that when $x \le y$, the function $f(x, y)$
increases with respect to $x$ and decreases with respect to $y$. For $x \le y$ we have
$|x - y| = y - x$, so $f(x, y) = h - \frac{1}{h}(y^2 - x^2)$ and
$$\frac{\partial f(x, y)}{\partial x} = \frac{2x}{h} \ge 0, \qquad \frac{\partial f(x, y)}{\partial y} = -\frac{2y}{h} \le 0.$$
Thus, for $x \le y$, the function $f(x, y)$
increases with respect to $x$ and decreases with respect to $y$. Hence, $f(\rho_1(u_i), \eta_1(u_i)) \le
f(\rho_2(u_i), \eta_2(u_i))$ when $\rho_2(u_i) \le \eta_2(u_i)$ and $\rho_1(u_i) \le \rho_2(u_i)$, $\eta_1(u_i) \ge \eta_2(u_i)$.
Table 1 Values of the EMs $E_1(\cdot)$, $E_2(\cdot)$, and $E(\cdot)$ for the LIFSs $Z^{1/2}$, $Z$, $Z^2$, $Z^3$, and $Z^4$
LIFS        E_1       E_2       E
Z^{1/2}     0.8630    0.6463    0.6546
Z           0.8698    0.6288    0.6375
Z^2         0.7181    0.5462    0.5517
Z^3         0.5773    0.4057    0.4197
Z^4         0.4689    0.3182    0.3358
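Similarly, for $x \ge y$ we have $f(x, y) = h - \frac{1}{h}(x^2 - y^2)$, so
$\frac{\partial f(x, y)}{\partial x} = -\frac{2x}{h} \le 0$ and $\frac{\partial f(x, y)}{\partial y} = \frac{2y}{h} \ge 0$.
Thus, for $x \ge y$, the function $f(x, y)$ decreases with respect to $x$ and increases
with respect to $y$. Hence,
$f(\rho_1(u_i), \eta_1(u_i)) \le f(\rho_2(u_i), \eta_2(u_i))$ when $\rho_2(u_i) \ge \eta_2(u_i)$ and $\rho_1(u_i) \ge \rho_2(u_i)$,
$\eta_1(u_i) \le \eta_2(u_i)$.
Therefore, if $H_1$ is less fuzzy than $H_2$, then
$$\frac{1}{n} \sum_{i=1}^{n} f(\rho_1(u_i), \eta_1(u_i)) \le \frac{1}{n} \sum_{i=1}^{n} f(\rho_2(u_i), \eta_2(u_i)).$$
Hence, $E(H_1) \le E(H_2)$.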
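As an informal complement to the proof, the four properties can be spot-checked numerically on a grid of admissible $(\rho, \eta)$ pairs. The sketch below is only a sanity check under assumed settings ($h = 8$, grid step 0.5) and is not a substitute for the proof. The names `f`, `entropy`, and `check_properties` are introduced here for illustration.

```python
import itertools


def f(x, y, h):
    """Summand of Eq. (9): h - |x - y| (x + y) / h, using h - pi = x + y."""
    return h - abs(x - y) * (x + y) / h


def entropy(pairs, h):
    """Proposed entropy measure E of Eq. (9) for (rho, eta) pairs."""
    return sum(f(r, e, h) for r, e in pairs) / (len(pairs) * h)


def check_properties(h=8, steps=16, tol=1e-12):
    grid = [h * i / steps for i in range(steps + 1)]
    pts = [(x, y) for x in grid for y in grid if x + y <= h]
    for x, y in pts:
        e = entropy([(x, y)], h)
        assert -tol <= e <= 1 + tol
        # (P1): E = 0 exactly for the crisp cases (h, 0) and (0, h)
        assert (abs(e) <= tol) == ((x, y) in [(h, 0), (0, h)])
        # (P2): E = 1 exactly when rho == eta
        assert (abs(e - 1) <= tol) == (x == y)
        # (P3): swapping rho and eta (the complement) leaves E unchanged
        assert abs(e - entropy([(y, x)], h)) <= tol
    # (P4): the less fuzzy pair never has the larger summand
    for (x1, y1), (x2, y2) in itertools.product(pts, repeat=2):
        less_fuzzy = ((x2 <= y2 and x1 <= x2 and y1 >= y2)
                      or (x2 >= y2 and x1 >= x2 and y1 <= y2))
        if less_fuzzy:
            assert f(x1, y1, h) <= f(x2, y2, h) + tol
    return True


print(check_properties())
```

The grid uses multiples of 0.5 so that every value is exactly representable in floating point, which keeps the equality tests in (P1) and (P2) reliable.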
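Example 3 Let an LIFS $Z = \{\langle u_1, s_1, s_7 \rangle, \langle u_2, s_4, s_1 \rangle, \langle u_3, s_2, s_6 \rangle\} \in \Gamma_{[0,8]}$. By using
Eq. (9), we compute the proposed EM $E(Z)$ of the LIFS $Z$ as follows:
$$\begin{aligned}
E(Z) &= \frac{1}{nh} \sum_{i=1}^{n} \left( h - \frac{1}{h}|\rho(u_i) - \eta(u_i)|(h - \pi(u_i)) \right) \\
&= \frac{1}{3 \times 8} \left[ \left(8 - \tfrac{1}{8}|1 - 7|(8 - 0)\right) + \left(8 - \tfrac{1}{8}|4 - 1|(8 - 3)\right) + \left(8 - \tfrac{1}{8}|2 - 6|(8 - 0)\right) \right] \\
&= \frac{1}{3 \times 8} \left[ \left(8 - \tfrac{1}{8}(6)(8)\right) + \left(8 - \tfrac{1}{8}(3)(5)\right) + \left(8 - \tfrac{1}{8}(4)(8)\right) \right] \\
&= \frac{1}{24} \left[ 2 + \frac{49}{8} + 4 \right] \\
&= 0.5052.
\end{aligned}$$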
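The arithmetic of Example 3 can be reproduced exactly with rational numbers. The following sketch uses Python's standard `fractions` module; the function name `entropy_exact` is an assumption made for illustration.

```python
from fractions import Fraction


def entropy_exact(pairs, h):
    """Eq. (9) evaluated in exact rational arithmetic."""
    h = Fraction(h)
    total = Fraction(0)
    for rho, eta in pairs:
        rho, eta = Fraction(rho), Fraction(eta)
        pi = h - rho - eta  # hesitance degree
        total += h - abs(rho - eta) * (h - pi) / h
    return total / (len(pairs) * h)


Z = [(1, 7), (4, 1), (2, 6)]  # (rho, eta) for u1, u2, u3 in Example 3
E = entropy_exact(Z, 8)
print(E, round(float(E), 4))
```

The three summands are 2, 49/8, and 4, so the exact value is 97/192, which rounds to the 0.5052 reported above.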
Example 4 Consider the same LIFSs as in Example 1. By utilizing Eq. (9), we calculate
the proposed EM $E(\cdot)$ for the LIFSs $Z^{1/2}$, $Z$, $Z^2$, $Z^3$, and $Z^4$ and obtain $E(Z^{1/2}) = 0.6546$,
$E(Z) = 0.6375$, $E(Z^2) = 0.5517$, $E(Z^3) = 0.4197$, and $E(Z^4) = 0.3358$. Hence,
the proposed EM satisfies the relation $E(Z^{1/2}) > E(Z) > E(Z^2) > E(Z^3) >
E(Z^4)$, so the proposed EM of LIFSs is a valid EM.
We make a comparative study for Example 4. Table 1 lists the values of the EMs
$E_1(\cdot)$, $E_2(\cdot)$, and $E(\cdot)$ for the LIFSs $Z^{1/2}$, $Z$, $Z^2$, $Z^3$, and $Z^4$ given in Example 1.
From Table 1, it is visible that the EMs $E_2(\cdot)$ and $E(\cdot)$ satisfy
the relation given in Eq. (8), while the EM $E_1(\cdot)$ does not.
Table 2 Values of the EMs $E_1(\cdot)$, $E_2(\cdot)$, and $E(\cdot)$ for the LIFNs $Z_1$, $Z_2$, $Z_3$, $Z_4$, and $Z_5$
LIFN    E_1(.)    E_2(.)    E(.)
Z_1     0.9996    0.9933    0.9983
Z_2     0.9996    0.9804    0.9819
Z_3     0.9996    0.9800    0.9812
Z_4     0.9802    0.8505    0.8589
Z_5     0.9841    0.8505    0.8537
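posed EM $E(\cdot)$. By using Eq. (9), we calculate the proposed EM $E(\cdot)$ for the LIFNs
$Z_1$, $Z_2$, $Z_3$, $Z_4$, and $Z_5$ as follows:
$$\begin{aligned}
E(Z_1) &= \tfrac{1}{8}\left(8 - \tfrac{1}{8}|0.6 - 0.5|(8 - 6.9)\right) = 0.9983, \\
E(Z_2) &= \tfrac{1}{8}\left(8 - \tfrac{1}{8}|2.8 - 3.0|(8 - 2.2)\right) = 0.9819, \\
E(Z_3) &= \tfrac{1}{8}\left(8 - \tfrac{1}{8}|2.9 - 3.1|(8 - 2.0)\right) = 0.9812, \\
E(Z_4) &= \tfrac{1}{8}\left(8 - \tfrac{1}{8}|3.79 - 2.31|(8 - 1.9)\right) = 0.8589, \\
E(Z_5) &= \tfrac{1}{8}\left(8 - \tfrac{1}{8}|2.729 - 4.1|(8 - 1.1710)\right) = 0.8537.
\end{aligned}$$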
We make a comparative study for Example 5. Table 2 lists the values of the EMs
$E_1(\cdot)$, $E_2(\cdot)$, and $E(\cdot)$ for the LIFNs $Z_1$, $Z_2$, $Z_3$, $Z_4$, and $Z_5$ given in Example 2.
From Table 2, it is visible that $E_1(Z_1) = E_1(Z_2) = E_1(Z_3) = 0.9996$ and
$E_2(Z_4) = E_2(Z_5) = 0.8505$ although $Z_1$, $Z_2$, $Z_3$, $Z_4$, and $Z_5$ are all different,
whereas the proposed EM $E(\cdot)$ assigns a distinct value to each of them. Hence, the
proposed EM $E(\cdot)$ can address the shortcomings of the existing EMs $E_1$ and $E_2$ of
the LIFSs given in Eqs. (4) and (5), respectively.
Examples 4 and 5 show that the proposed EM of LIFSs can address the flaws
of the existing EMs of LIFSs. The proposed EM is a useful tool for depicting the
uncertainty of LIFSs.
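The $E(\cdot)$ values computed for $Z_1$ to $Z_5$ can be re-derived with a short script. The sketch below assumes $h = 8$ and the $(\rho, \eta)$ pairs that appear in those calculations. The helper name `entropy_lifn` and the dictionary layout are assumptions made for illustration.

```python
def entropy_lifn(rho, eta, h=8):
    """Eq. (9) for a single LIFN (s_rho, s_eta), i.e. n = 1."""
    pi = h - rho - eta  # hesitance degree
    return (h - abs(rho - eta) * (h - pi) / h) / h


lifns = {
    "Z1": (0.6, 0.5),
    "Z2": (2.8, 3.0),
    "Z3": (2.9, 3.1),
    "Z4": (3.79, 2.31),
    "Z5": (2.729, 4.1),
}
values = {name: entropy_lifn(rho, eta) for name, (rho, eta) in lifns.items()}

for name, value in values.items():
    print(f"E({name}) = {value:.6f}")

# The proposed measure separates all five LIFNs, even after rounding.
all_distinct = len({round(v, 3) for v in values.values()}) == len(values)
print("all distinct:", all_distinct)
```

The printed values agree with the $E(\cdot)$ column of Table 2 to within rounding in the fourth decimal place.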
5 Conclusion
The linguistic intuitionistic fuzzy set (LIFS) is an extension of the fuzzy set for
expressing and handling the fuzziness of qualitative information. This paper proposed a new
entropy measure (EM) for LIFSs that accounts not only for the belongingness degree and
non-belongingness degree but also for the grade of uncertainty. The proposed EM
measures the uncertainty of LIFSs. Certain properties of the proposed
EM have also been discussed to validate it. The proposed EM
overcomes the disadvantages of the existing EMs of LIFSs and is
useful for decision-makers who need to measure the uncertainty of any LIFS. In the
future, we will develop decision-making methods for the LIFS environment
based on the proposed EM. For instance, the proposed EM can be used to determine the
weights of the attributes in decision-making problems.
References
23. Meng F, Dong B (2021) Linguistic intuitionistic fuzzy PROMETHEE method based on similarity measure for the selection of sustainable building materials. J Ambient Intell Humanized Comput 1–21
24. Tang J, Meng F (2019) Linguistic intuitionistic fuzzy Hamacher aggregation operators and their application to group decision making. Granular Comput 4(1):109–124
25. Liu J, Mai J, Li H, Huang B, Liu Y (2022) On three perspectives for deriving three-way decision with linguistic intuitionistic fuzzy information. Inform Sci 588:350–380
26. Li Z, Liu P, Qin X (2017) An extended VIKOR method for decision making problem with linguistic intuitionistic fuzzy numbers based on some new operational laws and entropy. J Intell Fuzzy Syst 33(3):1919–1931
27. Kumar K, Mani N, Sharma A, Bhardwaj R (2021) A novel entropy measure for linguistic intuitionistic fuzzy sets and their application in decision-making. In: Multi-criteria decision modelling: applicational techniques and case studies, p 121. https://doi.org/10.1201/9781003125150
28. Herrera F, Martínez L (2001) A model based on linguistic 2-tuples for dealing with multigranular hierarchical linguistic contexts in multi-expert decision-making. IEEE Trans Syst Man Cybern Part B Cybern 31(2):227–234
29. Xu Z (2004) A method based on linguistic aggregation operators for group decision making
with linguistic preference relations. Inform Sci 166(1):19–30
Chapter 2
IoT-Based Smart City Architecture
and Its Applications
1 Introduction
A “smart city” is a technologically advanced urban setting that uses various
electrical devices and sensors to gather data. The information is then used to improve city
operations: the knowledge gained from these data helps to manage assets, resources,
and services effectively. Data are gathered from people, devices,
buildings, and assets to monitor and control traffic and transportation systems, power
plants, utilities, water supply networks, waste, crime detection, information
management, schools, libraries, hospitals, and other community services. Smart
cities have superior monitoring, planning, and governance mechanisms in addition
to creative use of technology [1]. The success of a smart city depends on its
capacity to forge a solid alliance between the public and private sectors, especially
in terms of bureaucracy and regulations.
2 Literature Review
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 11
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_2
12 S. C. Mamidi et al.
be included in Allam and Newman’s smart city framework. The study’s conclusions
are intended to inform policymakers about alternative and
more pertinent economic opportunities for Port Louis through smart tourism.
A concept is suggested in [5] that handles the island operation of a smart city, which
transforms it into a smart island. This work uses cloud theory in addition to smart
island modeling to quantify the uncertainties in STS and MG. Finally, the suggested
model is simulated to check its accuracy and efficacy. A methodology based on a
conceptual IoT implementation process is proposed in [10] for specific IoT
applications, in the form of a customized input–process–output model. The primary
factors in the model are the original conceptualization and definition of an IoT
concept (input), which is evaluated (process) before being deployed and potentially
having an effect in practice (output).
Arduino Uno3, Servo motors, IR Sensor, TCRT5000, LED, LCD, LDR, PIR Sensor,
Relay, Buzzer, 4H0.3 AH Battery, MQ5 Gas Sensor, Smoke Sensor.
4 Proposed Work
This module demonstrates automated lighting that requires minimal human
intervention: all operations are performed automatically, and two physical parameters,
human movement and light intensity, are monitored, as shown in Figs. 2 and 3. When a person
enters the room, the sensor detects the movement and the light turns on automatically; when
the person leaves, the light turns off automatically.
Equipment Used
LDR, PIR Sensor, Relay, Arduino Uno3.
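The switching behavior of this module can be sketched as a small decision function. This is a host-side Python simulation and not the authors' Arduino firmware. The function names, the darkness threshold of 300, and the rule that the lamp is lit only when motion is detected and the ambient light is low are all assumptions made for illustration.

```python
def light_should_be_on(motion_detected: bool, ambient_light: int,
                       dark_threshold: int = 300) -> bool:
    """Decide whether the relay should switch the lamp on.

    motion_detected: state reported by the PIR sensor.
    ambient_light:   LDR reading (higher means brighter); the scale is hypothetical.
    dark_threshold:  hypothetical reading below which the room counts as dark.
    """
    return motion_detected and ambient_light < dark_threshold


def simulate(samples, dark_threshold=300):
    """Apply the rule to a sequence of (motion, light) sensor samples."""
    return [light_should_be_on(m, l, dark_threshold) for m, l in samples]


# A person enters a dark room, daylight arrives, then the person leaves.
print(simulate([(False, 100), (True, 100), (True, 800), (False, 800)]))
```

Keeping the rule in a pure function makes it easy to test without hardware; on a real board the same condition would drive the relay pin.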
In this module, we evaluate and monitor water quality factors such as pH, soil
moisture, and temperature. The sensor also reports information about the water level
and communicates with the monitoring section. This technology conserves
water by using a real-time system to take active measurements.
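A monitoring step of this kind can be sketched as a range check over the measured factors. The acceptable ranges below are placeholder values chosen for illustration only, and the names `Reading`, `LIMITS`, and `out_of_range` are likewise assumptions rather than part of the described system.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    ph: float
    temperature_c: float
    moisture_pct: float


# Hypothetical acceptable ranges (low, high) for each monitored factor.
LIMITS = {
    "ph": (6.5, 8.5),
    "temperature_c": (5.0, 35.0),
    "moisture_pct": (20.0, 80.0),
}


def out_of_range(reading, limits=LIMITS):
    """Return the names of the factors whose value falls outside its range."""
    alerts = []
    for name, (low, high) in limits.items():
        value = getattr(reading, name)
        if not low <= value <= high:
            alerts.append(name)
    return alerts


print(out_of_range(Reading(ph=7.0, temperature_c=22.0, moisture_pct=45.0)))
print(out_of_range(Reading(ph=9.1, temperature_c=22.0, moisture_pct=10.0)))
```

In a deployment, the returned list could be forwarded to the monitoring section so that only abnormal readings trigger a notification.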
This module is critical in smart cities since it protects homes and communities
as well, as shown in Fig. 5. The smoke detector in this module detects
the presence of smoke, and when smoke is detected, a buzzer rings immediately.
Domestic smoke detectors range from individual battery-powered devices to multiple
interconnected units with battery backup.
Equipment Used
Buzzer, 4H0.3 AH Battery, MQ5 Gas Sensor, Smoke Sensor.
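The alarm rule of this module can be sketched as a small state machine. The sketch below is a Python simulation with hypothetical threshold values. The use of two thresholds (hysteresis), so that the buzzer does not chatter when the reading hovers near the limit, is a design choice added here for illustration and is not stated in the text.

```python
class SmokeAlarm:
    """Buzzer control with hysteresis; the threshold values are hypothetical."""

    def __init__(self, on_threshold=400, off_threshold=300):
        if off_threshold > on_threshold:
            raise ValueError("off_threshold must not exceed on_threshold")
        self.on_threshold = on_threshold
        self.off_threshold = off_threshold
        self.buzzer_on = False

    def update(self, gas_level):
        """Feed one sensor reading and return the resulting buzzer state."""
        if gas_level >= self.on_threshold:
            self.buzzer_on = True       # smoke detected: ring immediately
        elif gas_level <= self.off_threshold:
            self.buzzer_on = False      # air is clear again: silence
        return self.buzzer_on           # between thresholds: keep state


alarm = SmokeAlarm()
states = [alarm.update(level) for level in [100, 450, 350, 250]]
print(states)
```

The third reading (350) lies between the two thresholds, so the buzzer keeps ringing until the level drops to 300 or below.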
Designing a fundamental architecture from the outset will act as a platform for
later improvements and enable the addition of new services without compromising
functional performance, which is essential for smart city deployment to scale. A
fundamental IoT solution for smart cities consists of four elements as shown in
Fig. 6.
• The network of smart objects
A smart city uses smart objects with sensors and actuators, much like any IoT
system. Data collection and transmission to a centralized cloud management plat-
form are the immediate goals of sensors. Devices can act thanks to actuators; for
example, they can change the lights or stop water from flowing into a leaky pipe.
• Gateways
Any IoT system consists of two components: a cloud component and a “physical”
component made up of IoT devices and network nodes. Data cannot simply
“flow” from one component to the other, so gateways are necessary.
Field gateways make data collection and compression easier by cleaning and
filtering data before sending it to the cloud. The cloud gateway enables secure
data transmission between the field gateways and the cloud component of a
smart city solution.
• Data lake
A data lake’s principal function is to store data in its unprocessed form.
When data are required for insightful analyses, they are extracted and
passed to the big data warehouse.
• Big data warehouse
A big data warehouse is a single data repository. In contrast to a data lake,
it contains only structured data. Data are extracted, transformed, and loaded into
the big data warehouse once their value has been determined. It also stores
contextual information about connected things, such as the date on which
sensors were installed, as well as the commands that control applications send
to the actuators of connected devices.
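The data path through the four elements above can be sketched as a toy pipeline. All class names, function names, and record fields below are hypothetical, the valid range used by the gateway is a placeholder, and a real deployment would use networked services rather than in-memory objects.

```python
from statistics import mean


def field_gateway(raw_readings, valid_range=(-40.0, 85.0)):
    """Clean and filter sensor readings before they are sent to the cloud."""
    low, high = valid_range
    return [r for r in raw_readings
            if r["value"] is not None and low <= r["value"] <= high]


class DataLake:
    """Stores incoming records as received, without further processing."""

    def __init__(self):
        self.records = []

    def store(self, readings):
        self.records.extend(readings)


class BigDataWarehouse:
    """Holds structured data produced by an extract-transform-load step."""

    def __init__(self):
        self.table = {}

    def load_from(self, lake):
        by_sensor = {}
        for rec in lake.records:                      # extract
            by_sensor.setdefault(rec["sensor"], []).append(rec["value"])
        self.table = {sensor: {"count": len(vals), "mean": mean(vals)}
                      for sensor, vals in by_sensor.items()}  # transform, load


raw = [
    {"sensor": "temp-1", "value": 20.0},
    {"sensor": "temp-1", "value": None},    # dropped reading
    {"sensor": "temp-1", "value": 22.0},
    {"sensor": "temp-2", "value": 999.0},   # outside the valid range
]
lake = DataLake()
lake.store(field_gateway(raw))
warehouse = BigDataWarehouse()
warehouse.load_from(lake)
print(warehouse.table)
```

Separating the raw store from the structured store mirrors the distinction drawn above: the lake keeps everything that passes the gateway, while the warehouse keeps only the summarized form needed for analysis.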
Smart cities are the solution to many of the problems we face [12]. Smart cities can
be used to improve public safety, health care, and energy use—to name just a few
areas where smart technology is already being used as shown in Fig. 7. The future
of urban living looks bright with so many innovative technologies coming online in
this field.
A city-wide information system (CIS) is a network of technology and data that can
be used to improve efficiency and reduce carbon footprint [11]. These systems can
help cities to improve their sustainability, resilience, and prosperity by providing the
following:
• Information about emissions from various sources within the city.
Smart buildings and infrastructure can help to improve efficiency and reduce costs,
as well as provide several other benefits. For example:
• Smart buildings can help to save energy by minimizing the amount of heat
produced during the day. This makes occupants more comfortable in the
summer, when energy expenditures are high, and it also decreases greenhouse gas
emissions from the power plants or companies that supply the energy.
• Smart buildings can help to minimize carbon dioxide emissions by improving
insulation and ventilation.
• Smart buildings can use air-conditioning systems that consume less electricity
(and therefore produce less pollution).
• Smart buildings can also help with security by monitoring security cameras in
real time, so that occupants know if property has been taken without permission or
if there is an intruder inside the building at night. Detecting a break-in as it
happens, rather than discovering the damage and missing valuables afterward,
avoids expensive repairs and losses later.
Smart cities are a concept that has been around for some time now. The idea is to
create a more sustainable environment through technology and innovation, which
will help to make cities more liveable for everyone [5].
Smart meters, smart grids, and smart buildings are all parts of a larger utility
management system. These technologies allow utilities to monitor their assets more
closely than ever before and make sure that they are being used as efficiently as
possible.
This can help to save money on electricity or natural gas
by reducing wasted energy or increasing production when necessary. It also helps
to avoid outages by detecting when something goes wrong with a system
(such as a water pipe breaking), so that occasional faults can be handled before they
become a serious problem.
Smart appliances and smart city management are two more aspects of what makes
a city smart. They work together to make life easier for everyone involved, from
residents to businesses, as well as government agencies.
Smart appliances can save money on utility bills by not wasting energy
or water when they are not in use. They also monitor themselves and notify the user
if something is wrong (such as a leaky pipe), so that the fault can be fixed
before it gets worse. The smart city management aspect concerns
how these technologies work together with other systems, such as public transportation
and emergency response teams.
The first strategy presents the smart city (SC) as a city that makes innovative and clever
use of existing ICTs to accomplish its objectives. Under this definition, the ICT
infrastructure of the “Smart City” is what enables a smarter, more connected, and more
sustainable metropolitan system.
The “Internet-of-Things” (IoT) paradigm, in which a large number of devices
communicate with each other without human involvement, supports the need for
this ICT deployment [9, 10]. In
this scenario, networked objects dispersed throughout the metropolitan region drive and
support the SC. By utilizing technologies such as modern wireless sensing, machine-to-
machine (M2M) communication, radio-frequency identification (RFID), and wireless sensor
networks (WSN), the Internet-of-Things is anticipated to contribute significantly to more
precise and efficient resource consumption. It also enables access to a vast amount of data
(“big data”) that can be assessed for potential future use with data mining techniques.
The concept of a smart city in which citizens, goods, services, and so forth
are seamlessly integrated with omnipresent technology is becoming a reality,
dramatically improving the experience in twenty-first-century urban regions [13, 14].
Proposals in the domains of transportation, services, and power efficiency in cities have all
been created using this methodology, as have all proposals connected
to big data and data mining. Many of them have also been
financed, developed, or promoted by significant ICT firms, such as Endesa-Enel and IBM
in Malaga, Spain, and IBM in Songdo City.
In response to the difficulties posed by the technology-dominated SC model, one school
of thought holds that a truly smart city can only be built by
developing intelligent residents, who are the ones that
confer the “smart” quality on cities. These initiatives have opted for citizen-centric
and participatory strategies for the co-design and creation of smart cities rather than
viewing people as just another enabling component of the SC. The concept of a
human smart city is emerging as a completely new and unique sort of SC [12, 17].
Despite this, most initiatives to foster the growth of intelligent citizens have
restricted public involvement to functions like data source or tester of a pre-designed
concept or service, with only a few outliers incorporating people throughout the
process [11]. The notable exception has been the development of Living Labs in
the field of smart cities, where the environment has allowed for the emergence of
initiatives in which users have played a significant part at every stage.
Smart cities are based on the use of artificial intelligence (AI) to make better decisions
and better use of resources. AI is a powerful technology that can help to make cities
smarter and more efficient and save money by making better use of their existing
infrastructure and services.
The potential benefits of smart cities include [18]:
Smart cities are the future. Smart cities already exist, and they are being built all over
the world. They are such an integral part of our lives that we cannot imagine living
without them.
Smart cities are a way of life [19]. When you think about smart city technology,
what do you see? A lot of people would say “smart homes” or “smart cars,” but those
are just two ways that smart technology is helping us to live better lives today.
7 Conclusion
References
1. Ahad MA, Paiva S, Tripathi G, Feroz N (2020) Enabling technologies and sustainable smart
cities. Sustain Cities Soc 61:102301. https://doi.org/10.1016/j.scs.2020.102301
2. Dabeedooal YJ, Dindoyal V, Allam Z, Jones DS (2019) Smart tourism as a pillar for sustainable
urban development: an alternate smart city strategy from Mauritius. Smart Cities 2:153–162.
https://doi.org/10.3390/smartcities2020011
3. Darmawan AK, Siahaan D, Susanto TD, et al (2019) Identifying success factors in smart city
readiness using a structure equation modelling approach. In: 2019 international conference on
computer science, information technology, and electrical engineering (ICOMITEE). https://
doi.org/10.1109/icomitee.2019.8921312
4. Einola S, Kohtamäki M, Hietikko H (2019) Open strategy in a Smart City. Technol Innov
Manag Rev 9:35–43. https://doi.org/10.22215/timreview/1267
5. Esapour K, Moazzen F, Karimi M et al (2022) A novel energy management framework incor-
porating multi-carrier energy hub for Smart City. IET Gener Transm Distrib. https://doi.org/
10.1049/gtd2.12500
6. Gokozan H, Tastan M, Sari A (2017) Smart cities and management strategies. Chapter 8 in
Book: 2017 Socio-Economic Strategies. ISBN: 978-3-330-06982-4
7. Heidari A, Navimipour NJ, Unal M (2022) Applications of ML/DL in the management of Smart
Cities and societies based on new trends in information technologies: a systematic literature
review. Sustain Cities Soc 85:104089. https://doi.org/10.1016/j.scs.2022.104089
8. Internet of Things. http://www.ti.com/technologies/internet-of-things/overview.html.
Accessed 01 Apr 2019
9. Khanna A, Kaur S (2019) Evolution of internet of things (IOT) and its significant impact in the
field of precision agriculture. Comput Electron Agric 157:218–231. https://doi.org/10.1016/j.
compag.2018.12.039
10. Korte A, Tiberius V, Brem A (2021) Internet of things (IOT) technology research in business
and management literature: results from a co-citation analysis. J Theor Appl Electron Commer
Res 16:2073–2090. https://doi.org/10.3390/jtaer16060116
11. Kummitha RK, Crutzen N (2019) Smart cities and the citizen-driven internet of things: a
qualitative inquiry into an emerging Smart City. Technol Forecast Soc Chang 140:44–53.
https://doi.org/10.1016/j.techfore.2018.12.001
12. Kuyper T (2016) Smart city strategy and upscaling: comparing Barcelona and Amsterdam.
Master Thesis, MSc. IT & Strategic Management. https://doi.org/10.13140/RG.2.2.24999.14242
13. Lemphane NJ, Kotze B, Kuriakose RB (2022) A review on current IOT-based pasture manage-
ment systems and applications of digital twins in farming. Adv Intell Syst Comput 173–180.
https://doi.org/10.1007/978-981-16-4538-9_18
14. Mora-Sanchez OB, Lopez-Neri E, Cedillo-Elias EJ et al (2021) Validation of IOT infrastructure
for the construction of Smart Cities solutions on living lab platform. IEEE Trans Eng Manage
68:899–908. https://doi.org/10.1109/tem.2020.3002250
15. Rotuna C, Gheorghita A, Zamfiroiu A, Smada D-M (2019) Smart city ecosystem using
Blockchain technology. Informatica Economica 23:41–50. https://doi.org/10.12948/issn14531
305/23.4.2019.04
16. Rout RR, Vemireddy S, Raul SK, Somayajulu DVLN (2020) Fuzzy logic-based emergency
vehicle routing: an IOT system development for Smart City applications. Comput Electr Eng
88:106839. https://doi.org/10.1016/j.compeleceng.2020.106839
17. Saba D, Sahli Y, Berbaoui B, Maouedj R (2019) Towards smart cities: challenges, compo-
nents, and architectures. In: Toward social Internet of Things (SIoT): enabling technologies,
architectures and applications, pp 249–286. https://doi.org/10.1007/978-3-030-24513-9_15
18. Sharma M, Joshi S, Kannan D et al (2020) Internet of things (IOT) adoption barriers of Smart
Cities’ waste management: an Indian context. J Clean Prod 270:122047. https://doi.org/10.
1016/j.jclepro.2020.122047
19. Toledo P, Rubino R, Musolino F, Crovetti P (2021) Re-thinking analog integrated circuits in
digital terms: a new design concept for the IOT ERA. IEEE Trans Circuits Syst II Express
Briefs 68:816–822. https://doi.org/10.1109/tcsii.2021.3049680
Chapter 3
Principal Component Analysis
and Correlation Coefficient-Based
Decision-Making Approach for Stock
Portfolio Selection
1 Introduction
The financial market is one of the riskiest markets, yet it has always attracted
great interest from investors because of its potential to grow capital substantially.
Investors, however, still struggle to choose the right stocks for a portfolio. Stocks
should be assessed on the basis of multiple criteria. Investors always try to maximize
return and minimize risk, but this is not always possible because an increase in return
usually comes with an increase in risk, and vice versa; therefore, stocks should be
combined in such a way as to allow an acceptable compromise between risk and return.
For this, investors require detailed knowledge of the financial market.
Since stock selection is a complex decision-making process with many
conflicting objectives, it normally consists of two phases: (1) selection of suitable
shares and (2) determination of the weight of each share to be invested in. Stock
selection is viewed as a multi-criteria decision-making (MCDM) problem, as it involves
selecting stocks on the basis of a set of criteria. MCDM is a systematic tool used both
for determining the criteria weights and for ranking the alternatives. Over the past
decades, many researchers have proposed numerous approaches for ranking the
alternatives as well as for determining the weights of criteria [1, 2]. The use
of multi-criteria decision analysis (MCDA) to solve problems of the financial market
was examined by [3]. In recent decades, too, much research work has been carried
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 25
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_3
26 G. Bisht and A. K. Pal
out where financial decisions are made based on MCDM approaches [4, 5]. The work
points out the substantial involvement of this type of analysis on the optimal selection
problem of financial portfolios.
For investing in the stock exchange, [6] implemented a hybrid MCDM technique
integrating the DEMATEL (Decision-Making Trial and Evaluation Laboratory) and
VIKOR (VlseKriterijumska Optimizacija I Kompromisno Resenje) methods. Reference
[7] introduced a novel hybrid MCDM approach based on the Spearman correlation
coefficient to rank stocks. In the context of the Tehran Stock Exchange
(TSE), [8] applied a DEA-TOPSIS (Data Envelopment Analysis and Technique for
Order Preference by Similarity to an Ideal Solution) framework. Reference [9]
developed a portfolio selection method based on highly ranked stocks. Reference
[10] introduced a hybrid DEA-COPRAS (Complex Proportional Assessment)
approach for portfolio selection at the risk-return interface based on the NSE.
Another hybrid technique, AHP-TOPSIS (AHP: Analytic Hierarchy Process), was
developed by [11] for ranking the performance of private-sector Indian banks.
Reference [12] proposed some new mean–variance portfolio models. Almost all past
research considers a hybrid approach for stock selection; however, a combined
two-stage framework that considers both the weights of decision criteria and the
ranking of stocks is rare in the literature. Reference [13] proposed the mean–variance
model that laid the foundation for modern portfolio theory. Past researchers have
recognized the utility of including additional criteria beyond variance and return in
the portfolio selection model [14, 15].
Assigning proper weights to criteria is one of the biggest challenges in the
multi-criteria decision-making process [16]. In early studies, the easiest way
to determine the attribute weights was to assign equal weights [17]. However, the
final ranking depends on the weights of the attributes, so equal weights are rarely
an appropriate choice [18]. Later studies developed numerous weight determination
methods, which are classified into subjective, objective, and hybrid methods. In
subjective methods, such as the SMART method, the weights depend entirely on the
decision maker's (DM's) preferences [19]. In objective methods, such as the ENTROPY
and CRITIC methods, the weights depend on the data in the decision matrix. Hybrid
methods combine both [20, 21]. Almost all conventional weighting methods
assume that the criteria are independent of each other, which is not always true in
realistic problems.
A multivariate statistical procedure known as principal component analysis (PCA)
is used to condense a large number of criteria into a smaller number of independent
(uncorrelated) principal components, each of which is a linear combination of the
criteria. Thus, PCA can be a more reliable weight determination method than the
weighting methods described above. It condenses data by identifying the variables that
account for a large share of the variance in a large dataset [22, 23]. PCA finds
principal components as linear combinations of the variables that explain the
variability of the data [24]. PCA is available in common statistical software, which
has made it one of the most popular analytical methods [25]. It has been used
effectively as a multivariate analysis tool for large data in numerous sectors, such as
vendor and supply chain management [25], the commercial airline industry [26],
chemometrics [24], life cycle assessment [27], and decision making [23, 25]. Recently,
the efficiency of transport companies was evaluated by an integrated PCA model [28],
and a PCA-based tensor evaluation model was developed for group decision making [29].
Because it does not require weights to be assigned in advance, PCA reduces the
subjectivity arising from the differing viewpoints of decision makers [25]. However,
the determination of weights of conflicting stock selection criteria through PCA is
rare in the literature.
The primary motivations of the paper are
1. The weights of the criteria play a significant role in the ranking of alternatives.
Almost all the conventional weighing methods assume that the criteria are inde-
pendent of each other; but considering the realistic decision-making problems,
the hypothesis of independence of criteria is not always satisfied. Thus, the study
uses the concept of PCA which converts the interdependent criteria into a set of
linearly independent principal components. Also, as a dimension reduction tool,
it can easily deal with large datasets. It provides a data-focused method that elim-
inates unnecessary subjectivity due to human requirements for normalized units
[25]. Unlike traditional weighing methods, PCA also accounts for uncertainty
in data [30]. Thus, PCA can be an efficient tool for the determination of criteria
weights.
2. The ranking methods for stock selection developed in the past are mostly hybrid
methods which are a combination of previously defined MCDM approaches.
Thus, the study develops a novel two-stage approach where weights of the finan-
cial criteria are determined by PCA and ranking using the concept of a correla-
tion coefficient. The most acceptable alternatives show positive correlation with
positive ideal solution and negative correlation with negative ideal solution.
3. The two main objectives of the portfolio optimization problem for any novice
investor are risk and return, but there exist many other factors which affect the
decision of portfolio optimization. The present study incorporates an additional
objective p/e ratio which is used to gauge the valuation of a stock. It expresses
to the investor whether the stock is undervalued or overvalued.
The rest of the paper is organized as follows: the phases of the
proposed methodology are presented in Sect. 2. An application of the proposed
approach to stock selection is shown in Sect. 3. Results are discussed in Sect. 4,
followed by conclusions in Sect. 5.
2 Proposed Methodology
The section defines a two-stage framework for ranking the stocks. In the first stage,
the weights of the financial criteria are determined by PCA. In the second stage, we
introduce a correlation coefficient-based approach to estimate the rank of stocks. The
detailed steps involved in the process are explained in the following sections.
This section presents an objective weight determination method for obtaining weights
of the criteria in a multi-criteria decision-making process based on principal compo-
nent analysis (PCA). The method assigns high weightage to those criteria which
have a positive impact on the principal components as compared to the ones that are
negatively affecting the principal components. The steps to attain criteria weights
are as follows:
1. Construct the initial decision matrix considering the evaluations of stocks with
respect to different financial criteria. If n stocks are evaluated on the basis of m
criteria, then the matrix is represented as
$$A = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{bmatrix}.$$
2. Perform PCA on the decision matrix of the MCDM problem to obtain the proportion
of variance explained by each principal component. Let $a_k$ denote the proportion
for the kth principal component, such that $a_1 + a_2 + a_3 + a_4 + \cdots = 1$.
3. Form the positive and negative sets of each principal component by identifying the
criteria that have a positive or negative effect on the component:
$$PC_1^{+} = \{C_{\alpha_1}, C_{\beta_1}, \ldots\}, \qquad PC_1^{-} = \{C_{\alpha_2}, C_{\beta_2}, \ldots\},$$
where $C_{\alpha_1}, C_{\beta_1}, \ldots$ have a positive impact on $PC_1$ and
$C_{\alpha_2}, C_{\beta_2}, \ldots$ have a negative impact on $PC_1$.
4. Find the weights of the criteria by considering the type of impact they have on the
principal components.
Example: if $C_1$ has a positive impact on $PC_1$, $PC_2$, $PC_4$ and a negative impact
on $PC_3$, then $w_{c_1} = |a_1 + a_2 - a_3 + a_4|$.
5. Finally, find the standardized weights using Eq. (1):
$$\omega_{c_i} = \frac{w_{c_i}}{\sum_{i=1}^{n} w_{c_i}} \tag{1}$$
such that $\sum_{i=1}^{n} \omega_{c_i} = 1$.
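The weight determination steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: PCA is taken on the correlation matrix of the criteria, a criterion is treated as having a "positive impact" on a component when its loading is non-negative, and the function name `pca_criteria_weights` is hypothetical. Because the sign of an eigenvector is arbitrary, the resulting weights depend on the sign convention of the PCA routine used.

```python
import numpy as np

def pca_criteria_weights(A):
    """Sketch of PCA-based criteria weights for a decision matrix A (n stocks x m criteria)."""
    A = np.asarray(A, dtype=float)
    # Assumption: PCA on the correlation matrix of the criteria (columns).
    corr = np.corrcoef(A, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]              # sort components by explained variance
    eigvals = np.clip(eigvals[order], 0.0, None)   # guard against tiny negative values
    eigvecs = eigvecs[:, order]
    a = eigvals / eigvals.sum()                    # proportions a_1, a_2, ... (sum to 1)
    # Assumption: the sign of the loading decides positive (+1) or negative (-1) impact.
    signs = np.where(eigvecs >= 0, 1.0, -1.0)
    w = np.abs(signs @ a)                          # w_cj = |sum_k (+/-) a_k|
    return w / w.sum(), a                          # Eq. (1) normalization

# Hypothetical data: 8 stocks evaluated on 5 criteria.
rng = np.random.default_rng(0)
weights, proportions = pca_criteria_weights(rng.normal(size=(8, 5)))
print(weights, proportions)
```

The function returns both the normalized weights and the component proportions so that the intermediate quantities of steps 2 and 4 can be inspected.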
$$c_{ij} = w_j \cdot n_{ij}, \tag{3}$$
$$A^{-} = \{(n_{i1}, n_{i2}, \ldots, n_{im}) \mid n_{ij} \text{ is the worst value of the } j\text{th attribute}\}.$$
5. Find the correlation of each alternative from the best and worst ideal solution.
6. Determine the utility value for each alternative by using Eq. (4)
where $s_i^{+}$ is the correlation coefficient of $A_i$ from the positive ideal solution and
$s_i^{-}$ is the correlation coefficient of $A_i$ from the negative ideal solution.
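Step 5 can be illustrated with a short Python sketch. It rests on several assumptions that the surviving text does not settle: the Pearson correlation is used, the ideal solutions are taken column-wise from the weighted matrix of Eq. (3), and the function and argument names are hypothetical. The utility value of Eq. (4) is deliberately not computed, because its definition is not available here.

```python
import numpy as np

def ideal_solution_correlations(N, w, beneficial):
    """Correlate each alternative with the positive and negative ideal solutions.

    N: normalized decision matrix (alternatives x criteria)
    w: criteria weights
    beneficial: True where a higher value of the criterion is better
    """
    N = np.asarray(N, dtype=float)
    w = np.asarray(w, dtype=float)
    beneficial = np.asarray(beneficial, dtype=bool)
    C = N * w                                                  # Eq. (3): c_ij = w_j * n_ij
    best = np.where(beneficial, C.max(axis=0), C.min(axis=0))  # positive ideal solution
    worst = np.where(beneficial, C.min(axis=0), C.max(axis=0)) # negative ideal solution
    # Assumption: Pearson correlation between each row and the ideal vectors.
    s_plus = np.array([np.corrcoef(row, best)[0, 1] for row in C])
    s_minus = np.array([np.corrcoef(row, worst)[0, 1] for row in C])
    return s_plus, s_minus

# Hypothetical example with three alternatives and three criteria.
N = [[0.9, 0.2, 0.5], [0.4, 0.8, 0.1], [0.6, 0.5, 0.9]]
s_plus, s_minus = ideal_solution_correlations(N, [0.5, 0.3, 0.2], [True, True, False])
print(s_plus, s_minus)
```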
The vital step before investing in the stocks is their evaluation based on the financial
criteria. This section presents the application of the proposed method in ranking eight
different stocks, Hindustan Unilever (I 1 ), Bajaj Finance (I 2 ), Asian Paints (I 3 ), Tata
Consultancy Services (I 4 ), Pidilite (I 5 ), Tata Steel (I 6 ), Titan Company (I 7 ), Reliance
Industries (I 8 ) based on the real data. There exist numerous decision criteria which
affect the performance of the stocks. Considering the uncertainties, there is no way
to select a suitable number of financial criteria for evaluating the stocks. In view
of the literature and expert opinion, we consider five fundamental criteria for
evaluating the stocks. These criteria are revenue, earnings per share, return on equity,
debt, and long-term beta. The first three are beneficial criteria, for which higher
values are better, while the last two are non-beneficial criteria, for which lower
values are better. Real data on the eight alternatives for the five criteria were
retrieved from finance.yahoo.com for the period 1/1/2012 to 1/1/2022. The exponential
moving average method is used to convert the multi-dimensional data into the single
numerical values given in Table 1.
PCA was performed on the data given in Table 1, and the results are given in Table
2. PC1 accounts for most of the variation (51.4%), followed by PC2 (29.33%).
The positive and negative sets of each principal component are formed by identifying
the criteria that have a positive or negative effect on the components.
$$PC_1^{+} = \{M_1, M_2, M_4, M_5\}, \quad PC_1^{-} = \{M_3\}$$
$$PC_2^{+} = \{M_1, M_2, M_3\}, \quad PC_2^{-} = \{M_4, M_5\}$$
$$PC_3^{+} = \{M_1\}, \quad PC_3^{-} = \{M_2, M_3, M_4, M_5\}$$
$$PC_4^{+} = \{M_2\}, \quad PC_4^{-} = \{M_1, M_3, M_4, M_5\}$$
$$PC_5^{+} = \{M_1, M_4\}, \quad PC_5^{-} = \{M_2, M_3, M_5\}$$
Based on these sets and the proportions of the principal components, we can find the
criteria weights given in Table 3.
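As a worked illustration of step 4 of the weighting method, the unnormalized weights can be written out by reading the sign of each criterion from the positive and negative sets of the five components. Here $a_k$ denotes the proportion of $PC_k$; only $a_1 = 0.514$ and $a_2 = 0.2933$ are quoted in the text, so the expressions are left symbolic rather than evaluated.

```latex
\begin{align*}
w_{M_1} &= |a_1 + a_2 + a_3 - a_4 + a_5| \\
w_{M_2} &= |a_1 + a_2 - a_3 + a_4 - a_5| \\
w_{M_3} &= |-a_1 + a_2 - a_3 - a_4 - a_5| \\
w_{M_4} &= |a_1 - a_2 - a_3 - a_4 + a_5| \\
w_{M_5} &= |a_1 - a_2 - a_3 - a_4 - a_5|
\end{align*}
```

Dividing each $w_{M_j}$ by the sum of all five, as in Eq. (1), gives the standardized weights.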
Table 4 Ranking of alternatives
Alternative   U_Ai        Ranking
I1            −0.29311    4
I2            −0.69968    8
I3            −0.52991    6
I4            0.014956    3
I5            −0.65133    7
I6            0.671128    2
I7            −0.31702    5
I8            1.151763    1
In order to verify the effectiveness and validity of the proposed approach for ranking
the alternatives, this section compares the proposed approach with other existing
traditional MADM approaches. Considering the example of stock selection presented
in Sect. 3, we compare the ranking results obtained by our proposed approach with
five MADM models, namely TOPSIS, VIKOR, COPRAS, MABAC, and WPM,
respectively. The ranking results obtained by the models are given in Table 5.
This section demonstrates the stability of the proposed approach with respect to
changes in the criteria weights. For this, we change the criteria weights by 1–30%
and observe the variation in the ranking of alternatives. Table 6 shows the Spearman
correlation coefficient between the original ranking and the ranking observed when
the criteria weights are changed by different percentages.
From Table 6, we can observe that for changes of less than 5% in the criteria weights
there is no change in the ranking of alternatives. For changes greater than 5%,
differences in the ranking arise, but the correlation coefficient between the observed
ranking and the original ranking remains high. Also, the optimal solution is the same
in all circumstances, which verifies the stability of the proposed method with respect
to the optimal solution.
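The sensitivity check can be sketched as follows. The source does not state how the weights are perturbed, so `perturb_weights` (random scaling within the given percentage, followed by renormalization) is only one plausible, hypothetical scheme. The Spearman coefficient uses the standard formula for rankings without ties, and the second ranking in the example is invented purely to illustrate the computation.

```python
import numpy as np

def spearman_rank_correlation(rank_a, rank_b):
    """Spearman's rho for two rankings without ties: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    a = np.asarray(rank_a, dtype=float)
    b = np.asarray(rank_b, dtype=float)
    n = a.size
    d2 = np.sum((a - b) ** 2)
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))

def perturb_weights(w, pct, rng):
    """Assumed scheme: scale each weight by a random factor within +/- pct percent, then renormalize."""
    w = np.asarray(w, dtype=float)
    factors = 1.0 + rng.uniform(-pct, pct, size=w.size) / 100.0
    w_new = w * factors
    return w_new / w_new.sum()

original = [4, 8, 6, 3, 7, 2, 5, 1]   # ranking of I1..I8 from Table 4
swapped = [5, 8, 6, 3, 7, 2, 4, 1]    # hypothetical: I1 and I7 exchange ranks
print(round(spearman_rank_correlation(original, swapped), 4))  # 0.9762
```

In a full experiment, the perturbed weights would be passed through the ranking procedure and the resulting ranking compared with the original one for each percentage level.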
$$\text{Min. risk} = \sum_{i=1}^{n} x_i d_i \tag{5}$$
2. Return: The expected return of the portfolio is determined by Eq. (6), where $x_i$
and $r_i$ represent the weight and the return of the securities.
$$\text{Max. return} = \sum_{i=1}^{n} x_i r_i \tag{6}$$
3. P/E ratio: The P/E ratio of the portfolio is determined by Eq. (7), where $x_i$, $y_i$,
and $e_i$ represent the weight, share price, and EPS (earnings per share) of the
securities.
$$\text{Min. p/e} = \frac{\sum_{i=1}^{n} x_i \cdot y_i}{\sum_{i=1}^{n} x_i \cdot e_i} \tag{7}$$
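The three objectives of Eqs. (5) to (7) can be evaluated for a candidate portfolio with a few lines of Python. This is a sketch only: the function name and the numerical values are hypothetical, and $d_i$ is assumed to be a per-security risk measure, since its definition does not survive in the text above.

```python
import numpy as np

def portfolio_objectives(x, d, r, y, e):
    """Evaluate risk, return, and P/E ratio for portfolio weights x.

    d: per-security risk measure (assumed), r: returns,
    y: share prices, e: earnings per share.
    """
    x, d, r, y, e = (np.asarray(v, dtype=float) for v in (x, d, r, y, e))
    risk = np.sum(x * d)                    # Eq. (5), to be minimized
    ret = np.sum(x * r)                     # Eq. (6), to be maximized
    pe = np.sum(x * y) / np.sum(x * e)      # Eq. (7), to be minimized
    return risk, ret, pe

# Hypothetical three-security portfolio.
risk, ret, pe = portfolio_objectives(
    x=[0.5, 0.3, 0.2],
    d=[0.2, 0.1, 0.3],
    r=[0.1, 0.2, 0.3],
    y=[100.0, 200.0, 50.0],
    e=[5.0, 10.0, 2.0],
)
print(risk, ret, pe)
```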
⎪ −Fi̇min
⎩ i̇
0 forFi ≥ Fi̇max
where the maximum and minimum values of the ith objective function are represented
by $F_i^{\max}$ and $F_i^{\min}$. For each non-dominated solution, the normalized
function is defined as [32]
$$\chi^{p} = \frac{\sum_{i=1}^{n} X_i^{p}}{\sum_{p=1}^{m} \sum_{i=1}^{n} X_i^{p}},$$
where n represents the number of objective functions and m represents the number of
non-dominated solutions. The optimal solution out of the collection of non-dominated
solutions on the Pareto front is the one with the maximum value of $\chi^{p}$.
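The selection of a compromise solution can be sketched in Python. Since the membership function is only partly legible above, the sketch assumes the commonly used linear form: 1 at the minimum of an objective, 0 at its maximum, and linear in between. It also assumes that every objective is expressed as a minimization, so return would be negated first, and that the front contains at least two distinct solutions. The function name and sample values are hypothetical.

```python
import numpy as np

def best_compromise(F):
    """Pick the solution with the largest normalized membership.

    F: array of shape (m solutions, n objectives), all objectives minimized.
    """
    F = np.asarray(F, dtype=float)
    f_min, f_max = F.min(axis=0), F.max(axis=0)
    span = np.where(f_max > f_min, f_max - f_min, 1.0)   # avoid division by zero
    # Assumed linear membership: 1 at f_min, 0 at f_max.
    X = np.clip((f_max - F) / span, 0.0, 1.0)
    chi = X.sum(axis=1) / X.sum()                        # normalized membership per solution
    return int(np.argmax(chi)), chi

# Hypothetical Pareto front: columns are risk, negated return, P/E ratio.
front = [[0.10, -0.30, 20.0],
         [0.20, -0.10, 25.0],
         [0.15, -0.25, 18.0]]
index, chi = best_compromise(front)
print(index, chi)
```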
The weights of the securities obtained by the fuzzy decision-making technique
from all the solutions on the Pareto front and the expected portfolio return based on
the proposed method are given in Tables 8 and 9. From Table 9, it is observed
that portfolio P3 attains the highest expected return. Hence, the combination of the
top six stocks is selected for investment. The comparison between the proposed
approach and previous studies is given in Table 10. The expected return of the
proposed model is 28.205%, which is considerably higher than the returns of previously
defined models and almost double that of the Thakur et al. [9] model. This indicates
that the proposed model is capable of giving better results. Thus, it supports the
effectiveness and robustness of the proposed approach in a multi-criteria
decision-making system.
Table 9 Performance of different portfolios
Portfolio   Portfolio return
P1          0.23958
P2          0.21938
P3          0.28205
P4          0.2590
5 Conclusions
on rank affinity is built to analyze the performance of the proposed method. The
outcome specifies that the portfolio is proficient to deliver better returns (0.28205 or
28.205%). The performance of the results has been shown to be effective compared
to the previous models.
References
1. Haseli G, Sheikh R, Sana SS (2019) Base-criteria on multi criteria decision making method
and its applications. Int J Manag Sci Eng Manag 15(2):79–88
2. Pamučar D, Žižović M, Biswas S, Božanić D (2021) A new logarithm methodology of additive
weights (LMAW) for multi-criteria decision-making: application in logistics. Facta Univer,
Ser: Mech Eng 19(3):361–380
3. Zopounidis C (1999) Multicriteria decision aid in financial management. Euro J Oper Res
119:404–415
4. Xidonas P, Doukas H, Hassapis C (2021) Grouped data, investment committees and multicri-
teria portfolio selection. J Bus Res 129:205–222
5. Mendonça GHM, Ferreira FGDC, Cardoso RTC, Martins FVC (2020) Multi-attribute decision
making applied to financial portfolio optimization problem. Expert Syst Appl 158:113527
6. Fazli S, Jafar H (2012) Developing a hybrid multi-criteria model for investment in stock
exchange. Manag Sci Lett 2(2):457–468
7. Poklepović T, Babić Z (2014) Stock selection using a hybrid MCDM approach. Croatian Oper
Res Rev 5:273–290
8. Mansouri A, Ebrahimi N, Ramazani M (2014) Ranking of companies based on TOPSIS-DEA
approach methods (evidence from cement industry in Tehran stock exchange). Pak J Stat Oper
Res 10(2):189–209
9. Thakur GSM, Bhattacharyya R, Sarkar S (2018) Stock portfolio selection using Dempster-
Shafer evidence theory. J King Saud Univer Comput Inf Sci 30:223–235
10. Gupta S, Bandyopadhyay G, Bhattacharjee M, Biswas S (2019) Portfolio selection using DEA-
COPRAS at risk – return interface based on NSE (India). Int J Innov Technol Explor Eng
(IJITEE) 8(10)
11. Gupta S, Mathew M, Gupta S, Dawar V (2020) Benchmarking the private sector banks in India
using MCDM approach. Wiley 21(2)
12. Dai Z, Kang J (2022) Some new efficient mean-variance portfolio selection models. Int J Financ
Econ 27(4):4784–4796
13. Markowitz HM (1990) Portfolio selection, efficient diversification of investments. Blackwell,
Cambridge MA, Oxford UK
14. Steuer RE, Qi Y, Hirschberger M (2007) Suitable-portfolio investors, nondominated frontier
sensitivity, and the effect of multiple objectives on standard portfolio selection. Ann Oper Res
152:297–317
15. Roman D, Darby-Dowman K, Mitra G (2007) Mean-risk models using two risk measures: a
multi-objective approach. Q Financ 7(4):443–458
16. Velazquez MA, Claudio D, Ravindran AR (2010) Experiments in multiple criteria selection
problems with multiple decision makers. Int J Oper Res 7(4):413–428
17. Wang JJ, Jing YY, Zhang CF, Zhao JH (2009) Review on multi-criteria decision analysis aid
in sustainable energy decision making. Renew Sustain Energy Rev 13(9):2263–2278
18. Ginevičius R (2011) A new determining method for the criteria weights in multicriteria
evaluation. Int J Inf Technol Decis Mak 10:1067–1095
19. Zardari NH, Ahmed K, Shirazi SM, Yusop ZB (2014) Weighting methods and their effects
on multi-criteria decision-making model outcomes in water resources management. Springer,
New York, NY, USA
20. Delice EK, Can GF (2020) A new approach for ergonomic risk assessment integrating
KEMIRA, best–worst and MCDM methods. Soft Comput 24:15093–15110
21. Du YW, Gao K (2020) Ecological security evaluation of marine ranching with AHP-entropy-
based TOPSIS: a case study of Yantai, China. Mar Policy 122:104223
22. Adler N, Golany B (2001) Evaluation of deregulated airline networks using data envelopment
analysis combined with principal component analysis with an application to Western Europe.
Eur J Oper Res 132(2):260–273
23. Zhu J (1998) Data envelopment analysis vs. principal component analysis: an illustrative study
of economic performance of Chinese cities. Euro J Oper Res 111(1):50–61
24. Bro R, Smilde AK (2014) Principal component analysis. Anal Meth 6(9):2812–2831
25. Petroni A, Braglia M (2000) Vendor selection using principal component analysis. J Supply
Chain Manag 36(2):63–69
26. Adler N, Golany B (2002) Including principal component weights to improve discrimination
in data envelopment analysis. J Oper Res Soc 53(9):985–991
27. Balugani E, Lolli F, Pini M, Ferrari AM, Neri P, Gamberini R, Rimini B (2021) Dimensionality
reduced robust ordinal regression applied to life cycle assessment. Expert Syst Appl 178:115021
28. Stevic Z, Miskic S, Vojinovic D, Huskanovic E, Stankovic M, Pamucar D (2022) Development
of a model for evaluating the efficiency of transport companies: PCA-DEA-MCDM model.
Axioms 11(3):140
29. Singh M, Pant M, Kong L, Alijani Z, Snasel V (2023) A PCA-based fuzzy tensor evaluation
model for multi-criteria group decision making. Appl Soft Comput 132:109753
30. Ning C, You F (2018) Data-driven decision making under uncertainty integrating robust opti-
mization with principal component analysis and kernel smoothing methods. Comput Chem
Eng 112:190–210
31. Biswas PP, Suganthan PN, Qu BY, Amaratunga GAJ (2018) Multiobjective economic envi-
ronmental power dispatch with stochastic wind solar small hydro power energy. Energy
150:1039–1057
32. Brka A, Al-Abdeli YM, Kothapalli G (2015) The interplay between renewables penetration,
costing and emissions in the sizing of stand-alone hydrogen systems. Int J Hydrogen Energy
40(1):125–135
33. Naveenan RV (2019) Risk and return analysis of portfolio management services of reliance
nippon asset management limited (RNAM). Global J Manag Bus 6(1):108–117
34. Narang M, Joshi MC, Bisht K, Pal A (2022) Stock portfolio selection using a new decision-
making approach based on the integration of fuzzy CoCoSo with Heronian mean operator. In:
Decision making: applications in management and engineering
Chapter 4
Survey on Crop Production and Crop
Protection
1 Introduction
According to the Food and Agriculture Organization of the UN, the world population
may grow to 9 billion by 2050. Climate change, increasing demand
for organic food, rapid population growth, conversion of farmland to industrial areas,
and growing market demands pose a great challenge for crop production. The focus
on sustainability adds the challenge of protecting soil quality in the coming years.
Growing technological advancements have shown promising results in addressing these
challenges, as discussed in this paper.
Agriculture plays a major role in the economy of the country as it is the basic
source of livelihood for many low-income and developing countries. The agriculture
industry needs to grow its production levels by 70% to feed the world’s growing
population. To increase the yield of crops, monitoring the environmental factors is
not a complete solution. There are several other factors that reduce productivity in
agriculture to an extreme extent.
The US Department of Agriculture, Agricultural Research Service, is the fore-
most agricultural research organization in the world with more than 3000 scientists
conducting agricultural research in nearly 100 locations around the USA and in
three foreign countries [1]. The need for automation is suggested in agriculture to
overcome the challenges posed by human and natural resources.
This paper analyzes the application of various innovative technologies for
crop production and protection. Innovative technologies achieve self-sufficiency
in agriculture by introducing innovative environmentally suitable solutions and
modern agricultural technologies that are necessary for improving productivity and
decreasing production costs. Embedded-based applications help farmers with many
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 39
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_4
40 H. S. Rakshitha et al.
agricultural activities like sowing seeds, watering crops, applying fertilizers, insec-
ticides, pesticides, etc. These applications will help in moisture monitoring, weather
monitoring, growth monitoring, etc.
These are the most promising technologies for solving the present-day crisis in
underdeveloped and developing countries. This kind of technology solves hunger
problems globally. Crop production is undergoing a huge transition with the use of
technology in all fields from microbiology to artificial intelligence. Management of
systems for data and information clustering is a linchpin for crop production and
protection. The intervention of real-time applications in agriculture has made rapid
growth in crop management. As demand for food and employment is increasing, arti-
ficial intelligence and machine learning help in good quality and quantity production
of crops and also increase job opportunity in this field. These technologies have made
a revolution in the agriculture sector.
In this paper, recent works for better crop production and protection have been
extensively studied and noted. These act as guides for narrowing down the research
on crop protection and yield generation to have better results in a short time frame.
2 Literature Survey
This section explores recent event studies that cover different aspects of innovative
technology for crop production and protection.
Crop production can be increased in several ways, such as watering the plants
regularly, protecting them from pests, and protecting them from heavy storms
and bad weather conditions. Performing these tasks requires manual effort, which
can be reduced by using emerging innovative technologies. References [2, 3] describe
how drones are helpful in the agriculture sector.
Drones are aerial robots, as shown in Fig. 1. They are programmed with artificial
intelligence that helps farmers optimize the use of inputs (seed, fertilizers,
water), react quickly to threats (weeds, pests, fungi), save crop scouting time,
and roughly estimate the yield from a field.
Reference [4] highlights the importance of crops during unforeseeable weather
conditions and their destruction by many other natural phenomena, which makes crop
protection a major issue. It can be addressed using data analytics and the Internet
of Things (IoT), and these concepts also help to increase crop productivity, as shown
in Fig. 2.
Several concepts of IoT as shown in Fig. 3 use various wireless sensor networks,
RF identification, and cloud computing which have been used to solve these existing
issues. The authors discuss how IoT and data analytics can be coupled to provide
better solutions. The IoT ecosystem consists of IoT devices that consist of sensors
and actuators, which are wirelessly connected and are mainly used for sensing
temperature and humidity conditions related to crops.
The communication technology is used to deliver the related data extracted from
the sensors toward the main node either using the unlicensed or licensed ISM bands.
The communication standards that can be used include ZigBee, Bluetooth, Z-wave,
etc. For long-range communication, internet-connected devices can be used for trans-
mitting the collected data from the sensors to the main node. The inclusion of data
analytics with IoT helps in improving crop protection in such a way that the data
extracted through the sensors can be used to analyze the crop or the field conditions.
The sensors installed in storage facilities help to monitor unfavorable conditions that
might occur. In that case, the control center will receive an alert message for further
actions.
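The alerting behaviour described for storage facilities can be illustrated with a small Python sketch. All names and threshold values are hypothetical and chosen only for illustration; a real deployment would take its limits from the crop being stored and would send the alert to the control center over one of the communication standards mentioned above.

```python
from dataclasses import dataclass

@dataclass
class Reading:
    sensor_id: str
    temperature_c: float
    humidity_pct: float

# Hypothetical safe ranges for a storage facility.
TEMP_RANGE = (5.0, 25.0)
HUMIDITY_RANGE = (40.0, 70.0)

def check_reading(reading):
    """Return a list of alert messages for values outside the safe ranges."""
    alerts = []
    if not TEMP_RANGE[0] <= reading.temperature_c <= TEMP_RANGE[1]:
        alerts.append(f"{reading.sensor_id}: temperature {reading.temperature_c} C out of range")
    if not HUMIDITY_RANGE[0] <= reading.humidity_pct <= HUMIDITY_RANGE[1]:
        alerts.append(f"{reading.sensor_id}: humidity {reading.humidity_pct} % out of range")
    return alerts

# A reading that is too warm produces one alert for the control center.
for message in check_reading(Reading("storage-1", 30.0, 55.0)):
    print(message)
```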
Big data analytics, ML, and DL algorithms are used in the agriculture sector.
Bhat et al. [5] note that developing an algorithm is easy, but
the algorithm must guarantee accuracy and consistency in all scenarios. Deep
learning algorithms are the most promising technologies that give more effectiveness
in innovation. The authors also discuss the neural networks that can be implemented in
these innovative technologies. Sensors can be directly deployed or implanted
on the land, robots can be developed for nurturing crops, and weather stations can be
maintained through IoT. In this way, maintenance and protection can be easily
performed by farmers or companies. The authors also suggest implementing technologies
such as big data analytics and artificial intelligence.
Farmers' manual effort can be reduced by using present-day technologies, which
offer several advantages in farming, including the monitoring of crops and
livestock. According to Joseph et al. [6], all of these tasks can be handled using
AI frameworks and ML algorithms. In addition, an Unmanned Aerial Vehicle (UAV), as
shown in Fig. 4, can be used to improve precision farming by combining human skills
with current technologies. In the proposed methodology, information about the crop
field is collected by capturing images of the field with computational-intelligence
vision sensors. A machine learning model is then trained on the color features
obtained from the images so that it outputs the nutrient content of the plant.
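To make the idea of mapping color features to a nutrient estimate concrete, the following is a minimal sketch. It assumes the color feature is the mean RGB value of a leaf image, and it uses a one-nearest-neighbour lookup as a stand-in for the trained model; the model actually used in [6] is not specified here. The function names, the training pairs, and the nutrient labels are hypothetical and purely illustrative.

```python
import math


def mean_rgb(pixels):
    """Average the red, green, and blue channels over a list of (r, g, b) pixels."""
    n = len(pixels)
    return tuple(sum(p[channel] for p in pixels) / n for channel in range(3))


def nearest_label(features, training):
    """1-nearest-neighbour: return the label of the closest training feature vector."""
    closest = min(training, key=lambda item: math.dist(features, item[0]))
    return closest[1]


# Hypothetical training set of (mean RGB, nutrient label); values are illustrative only.
TRAINING = [
    ((60.0, 140.0, 50.0), "adequate nitrogen"),
    ((150.0, 160.0, 60.0), "low nitrogen"),
]

if __name__ == "__main__":
    leaf_pixels = [(58, 138, 52), (62, 142, 48)]  # a tiny stand-in for a leaf image
    features = mean_rgb(leaf_pixels)
    print(features, "->", nearest_label(features, TRAINING))
```

A real pipeline would first segment the leaf from the background and would be trained on laboratory-measured nutrient values rather than hand-written examples.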
Estimating crop yield is one of the difficulties that farmers face in the
agriculture business, so various ML algorithms are used to estimate crop production
and yield. As the significance of agricultural yield prediction increases, [7]
shows how ML approaches can be used to estimate crop production. Because a large
amount of data is available for seed selection and yield forecasting, it is
difficult for farmers to perform these tasks manually. Artificial intelligence can
reduce this workload.
The productivity of a crop also depends on the area where it grows. Accordingly,
[8] proposes a model trained with ML techniques that determines productivity based
on the parameters moisture, rainfall, and temperature. Prediction algorithms such
as logistic regression, the Naive Bayes classifier, random forest, Support Vector
Machines (SVMs), and k-Nearest Neighbor (KNN) are applied. After the model is
trained with each of these algorithms, the algorithms are compared to analyze the
accuracy of the model. For the recommendation, Multi-Condition Filtering and
collaborative filtering algorithms are applied. Collaborative filtering compares
the input parameters with the trained data of the system and filters the crops
based on their cosine similarities. The Multi-Condition Filtering algorithm
categorizes the crops by different combinations of the low, moderate, and high
ranges of the input parameters and shows the crops accordingly.
Table 1 depicts the overview of surveys from [9–24].
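The cosine-similarity filtering and the low/moderate/high bucketing described for [8] can be sketched as follows. This is an illustration under assumptions, not the implementation from [8]: the crop profiles, the `rank_crops` and `level` helpers, and the cut-off values are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical crop profiles: (moisture %, rainfall mm, temperature C).
CROP_PROFILES = {
    "rice": (80.0, 200.0, 27.0),
    "wheat": (50.0, 60.0, 18.0),
    "millet": (30.0, 40.0, 30.0),
}


def rank_crops(conditions, profiles=CROP_PROFILES):
    """Order crops from most to least similar to the observed field conditions."""
    return sorted(
        profiles,
        key=lambda crop: cosine_similarity(conditions, profiles[crop]),
        reverse=True,
    )


def level(value, low, high):
    """Bucket a measurement into 'low', 'moderate', or 'high' (cut-offs are assumed)."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "moderate"


if __name__ == "__main__":
    field = (78.0, 190.0, 26.0)
    print(rank_crops(field))
    print(level(field[0], 40.0, 70.0))
```

Because cosine similarity ignores vector magnitude, features with large numeric ranges (such as rainfall in millimetres) dominate the score; a practical system would normalize each parameter first.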
Crop production is a tedious job, and protecting the crop is very important for
every farmer. Many innovative technologies can be used to make this work easier.
Big data analysis, machine learning, deep learning, and artificial intelligence are
among the technologies used to improve crop quality and quantity. This survey
indicates that innovative technologies can be implemented in crop production and
protection, as shown in Fig. 5. These technologies help at every stage, from the
analysis of the soil to the harvesting of the crop.
The existing works listed above are summarized in Table 1, which gives an idea of
how innovative technologies such as ML and AI can operate across the complete flow
of agricultural practices.
For any crop, the first task is to prepare the soil. This includes checking soil
fertility and health, as well as the surrounding environment, such as temperature
and humidity. At this stage, a farmer can use deep learning algorithms to analyze
and monitor the water level of the soil and the ambient temperature, and AI-based
systems such as robots can be trained to maintain the soil. This makes the farmer's
work easier and more efficient.
The next stage is seed selection and sowing. In the traditional approach, farmers
sow every seed without assessing its quality, which might cause loss of crop in
some areas due to unhealthy and infertile seeds, as shown in Figs. 6 and 7. Using
machine learning and deep learning algorithms, we can design a system that
differentiates healthy seeds from unhealthy ones, as well as robots that sow seeds
at a proper spacing for good crop growth.
Once sowing is completed, the next task is to provide manures and fertilizers to
the crops. Before doing so, it is very important to measure the quantity of micro-
and macronutrients required for the particular crop; deep learning technology can
be used for this. To deliver manures and fertilizers, we can use established
technology such as drip irrigation and its upgrades, or technologies such as
artificial intelligence and machine learning. With these technologies, fertilizers
can be supplied through automated pipeline systems or robots.
When the crop plants start to grow, the farmer needs to protect them from pests and
insects. Pesticides and insecticides are usually used for this purpose. These
protectants should not be applied to the roots of the plants; hence, drones can be
used to spray them aerially. This again relies on artificial intelligence and
machine learning. Plant health can be monitored with the help of GPS technology,
and the resulting data can be stored and analyzed using deep learning.
Crops are protected and nurtured until they are ready to harvest. During this
period, the farmer only needs to review the data and information obtained about the
crop plants. Once the crops are ready for harvesting, robots and machinery can
again be used. For fruit and vegetable harvesting, robots can pluck the produce.
For crops such as ragi, wheat, and rice, harvesting machinery that is already on
the market can be used, along with updated versions of that machinery.
Some crops must be stored before they are sold, while others must be sold as soon
as they are harvested. Storage therefore needs to be planned before selling, and
farmers cannot store harvested crops blindly. They can use big data analytics and
IoT to analyze the temperature, humidity, and pressure of the storage room. The
farmer receives a notification if the room is outside the threshold conditions and
can then restore the storage conditions at any time. Meanwhile, when selling crops,
the farmer can use deep learning technology to analyze crop pricing from historical
data and forecasts. This can reduce the farmer's economic losses and improve
profit. In crop production, we can use technologies such as satellite photography
and imagery, geographic information systems (GIS), global positioning systems
(GPS), measuring systems and weather monitoring, yield monitoring systems, and
soil and plant sensing systems; these systems form part of AI and IoT solutions.
This paper aims to build knowledge about the use of innovative technologies in
every step of crop production practice, including soil analysis, seed selection,
sowing, micro- and macronutrient analysis, crop growth monitoring, pest detection
and alerting, yield monitoring, and smart harvesting. Figure 8 gives an overview of
the agricultural practices from start to finish. Figure 5 summarizes the use of
innovative technology in crop production and protection.
Farmers usually experience difficulties in finding manpower for many of the field
tasks performed in agriculture. Robots designed with artificial intelligence and
machine learning algorithms can therefore help by reducing the manpower required
and substituting technological capability for manual labor.
In areas such as soil testing and seed selection, farmers should preferably consult
an expert. However, experts may not be located near the farmer's land, or may not
be able to arrive at the right time, and this can be costly for the farmer because
soil testing is required for every crop the farmer grows. These problems can be
reduced by implementing a system that checks the pH, soil moisture, minerals, and
other properties needed for the better development of the crop plants.
Fig. 8 Overview of agriculture practices
Present-day technology can also be used in crop growth monitoring and harvesting,
where deep learning and big data analytics help ensure proper maintenance of crop
production. For crop protection, drones, IoT, and big data analytics can be
combined. The farmer can then check the production activity remotely and obtain
data on which pest is attacking the crop and which precautionary measures should be
taken to protect it.
5 Conclusion
Crop production must increase in order to satisfy the increasing demand for food
and to prevent future threats that may arise. This survey examined research on crop
production and land use. During the period of crop growth, crops are commonly
affected by bad weather conditions, insects that eat the grown crop, and the type
of soil used. All these factors must be taken into consideration to obtain good
crop production. This can be achieved using artificial intelligence: different ML
and DL algorithms can be used to predict the required features by training a model
on data collected in real time.
Using the predictions obtained from the trained machine learning models, the
required measures can be taken to improve productivity. Once the productivity of
the crop has increased, the next step is to protect the crops stored in facilities,
which must be monitored to prevent the crop from getting damaged. The moisture
content and temperature of the storage facility must be managed in order to prevent
harm. The main cause of crop harm is insects, and this can be prevented through
live monitoring. Different research papers were analyzed, and the models trained
using machine learning were studied; the reported results from those models were
approximately 90%–96% accurate.
6 Future Scope
The work carried out in this paper is based on theoretical information and on
real-time data obtained from farmers in order to understand the scenarios that
affect crop production. These were analyzed using the available theoretical models,
which helped us to propose a better solution that combines artificial intelligence
with IoT. Since the work has been carried out only in a hypothetical manner, the
advantages of implementing the system to provide a more precise solution to the
existing problems in crop production have not been demonstrated. Extending the work
by deploying the system in actual field conditions can therefore provide more
information for improving the existing methods and reducing the problems in the
agricultural domain.
References
1. Liu SY (2020) Artificial intelligence (AI) in agriculture. In: IT professional, vol 22, no 3, pp
14–15. https://doi.org/10.1109/MITP.2020.2986121
2. Shahrooz M, Talaeizadeh A, Alasty A (2020) Agricultural spraying drones: advantages and
disadvantages. Virtual Sympos Plant Omics Sci (OMICAS) 2020:1–5
3. Potrino G, Palmieri N, Antonello V, Serianni A (2018) Drones support in precision agriculture
for fighting against parasites. In: 2018 26th telecommunications forum (TELFOR), pp 1–4
4. Rayhana R, Xiao G, Liu Z (2021) RFID sensing technologies for smart agriculture. IEEE
Instrum Meas Mag 24(3):50–60
5. Bhat SA, Huang N-F (2021) Big data and AI revolution in precision agriculture: survey and
challenges. IEEE
6. Joseph RB, Lakshmi MB, Suresh S, Sunder R (2020) Innovative analysis of precision farming
techniques with artificial intelligence. In: 2020 2nd international conference on innovative
mechanisms for industry applications (ICIMIA), pp 353–358.
https://doi.org/10.1109/ICIMIA48430.2020.9074937
7. Sharma SK, Sharma DP, Verma JK (2021) Study on machine learning algorithms in crop yield
predictions specific to Indian agricultural contexts. In: 2021 international conference on
computational performance evaluation (ComPE), pp 155–166.
https://doi.org/10.1109/ComPE53109.2021.9752260
8. Talukder S, Jannat H, Sengupta K, Saha S, Hossain MI (2020) Enhancing crops production based
on environmental status using machine learning techniques. In: 2020 international conference
on computer science and its application in agriculture (ICOSICA), pp 1–5.
https://doi.org/10.1109/ICOSICA49951.2020.9243
9. Junior CRG, Gomes PH, Mano LY, de Oliveira RB, de Carvalho ACPLF, Faiçal BS (2017)
A machine learning-based approach for prediction of plant protection product deposition. In:
2017 Brazilian conference on intelligent systems (BRACIS), pp 234–239.
https://doi.org/10.1109/BRACIS.2017.26
10. JR, HD, PB (2022) A machine learning-based approach for crop yield prediction and fertilizer
recommendation. In: 2022 6th international conference on trends in electronics and informatics
(ICOEI), pp 1330–1334. https://doi.org/10.1109/ICOEI53556.2022.9777230
11. Kumar R, Singh MP, Kumar P, Singh JP (2015) Crop selection method to maximize crop yield
rate using machine learning technique. In: 2015 international conference on smart technologies
and management for computing, communication, controls, energy and materials (ICSTM), pp
138–145. https://doi.org/10.1109/ICSTM.2015.7225403
12. Dwivedi P, Kumar S, Vijh S, Chaturvedi Y (2021) Study of machine learning techniques
for plant disease recognition in agriculture. In: 2021 11th international conference on cloud
computing, data science and engineering (confluence), pp 752–756.
https://doi.org/10.1109/Confluence51648.2021.9377186
13. Alam M, Alam MS, Roman M, Tufail M, Khan MU, Khan MT (2020) Real-time machine-
learning based crop/weed detection and classification for variable-rate spraying in precision
agriculture. In: 2020 7th international conference on electrical and electronics engineering
(ICEEE), pp 273–280. https://doi.org/10.1109/ICEEE49618.2020.9102505
14. Kavita M, Mathur P (2020) Crop yield estimation in India using machine learning. In: 2020
IEEE 5th international conference on computing communication and automation (ICCCA), pp
220–224. https://doi.org/10.1109/ICCCA49541.2020.9250915
15. Gandhi N, Petkar O, Armstrong LJ (2016) Rice crop yield prediction using artificial neural
networks. In: 2016 IEEE technological innovations in ICT for agriculture and rural development
(TIAR). Chennai, India, pp 105–110
16. Khaki S, Wang L (2019) Crop yield prediction using deep neural networks. Front Plant Sci
10:621
17. Crane-Droesch A (2018) Machine learning methods for crop yield prediction and climate
change impact assessment in agriculture. Environ Res Lett 13(11):114003
18. Khosla E, Dharavath R, Priya R (2019) Crop yield prediction using aggregated rainfall-based
modular artificial neural networks and support vector regression. Environ Dev Sustain
19. Maya Gopal PS, Bhargavi R (2019) Optimum feature subset for optimizing crop yield prediction
using filter and wrapper approaches. Appl Eng Agric 35(1):9–14
20. Kim N, Lee Y-W (2016) Machine learning approaches to corn yield estimation using satellite
images and climate data: a case of Iowa state, vol 34, no 4, pp 383–390
21. Xiaoxue L, Xuesong B, Longhe W, Bingyuan R, Shuhan L, Lin L (2021) Review and trend
analysis of knowledge graphs for crop pest and diseases. IEEE Access 7:62251–62264
22. Wolfert S, Ge L, Verdouw C, Bogaardt M-J (2017) Big data in smart farming – a review. Agric
Syst 153. ISSN 0308-521X
23. Manik SMN, Pengilley G, Dean G, Field B, Shabala S, Zhou M (2019) Soil and crop
management practices to minimize the impact of waterlogging on crop productivity. Front Plant Sci
12(10):140. https://doi.org/10.3389/fpls.2019.00140. PMID: 30809241; PMCID: PMC6379354
24. Quy VK, Hau NV, Anh DV, Quy NM, Ban NT, Lanza S, Randazzo G, Muzirafuti A (2022)
IoT-enabled smart agriculture: architecture, applications, and challenges. Appl Sci.
https://doi.org/10.3390/app12073396
Chapter 5
Disease Detection for Grapes: A Review
1 Introduction
As the world population grows, there is a huge demand for food. To satisfy this
demand, agricultural productivity and yield need to increase, which is possible
only when the crops grown are healthy. However, pathogens present in the
environment cause various crop diseases, and unhealthy crops reduce productivity.
It is therefore necessary to monitor crop health and growth, detect disease at an
early stage, and predict the future spread of the disease, so that farmers can take
necessary actions such as spraying herbicides/pesticides to protect the crop from
severe disease.
In the past, crop disease detection relied on manual inspection by farmers, and
decisions to spray chemicals were taken accordingly. Over the last decade,
researchers have applied advanced, state-of-the-art technologies such as artificial
intelligence, machine learning, the Internet of Things, computer vision, and image
processing to crop disease detection.
Grapes are a profitable and cost-effective crop. The fruit is used to prepare wine,
juices, jams, and jellies, and millions of tons of grapes are exported and imported
worldwide. However, the grape crop is affected by many diseases that reduce its
yield. These include Powdery Mildew, Anthracnose, Greeneria bitter rot, bacterial
leaf spot, Alternaria blight, Black Rot, blue mold rot, Botrytis bunch rot, Downy
Mildew, black mold rot, green mold rot, Rhizopus rot, Rust, and foot rot. There is
a need to detect the disease, predict its severity, and suggest pesticide use so
that farmers can take the required actions.
P. Deshpande (✉)
VIIT, SPPU, Pune, India
e-mail: priya.221p0049@viit.ac.in
PVG's COET, SPPU, Pune, India
S. Kore
BVCOEW, SPPU, Pune, India
e-mail: sharda.kore@bharatividyapeeth.edu
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_5
The remainder of this paper has five sections. Section 2 presents a survey of
methods for plant disease detection. Section 3 presents a survey of methods for
grape disease detection. Section 4 presents a summary of the survey in tabular
format. Section 5 discusses challenges and future directions, and Sect. 6 concludes
the paper.
of the disease detection model in complex environments. This calls for the design
of a novel disease detection model that is more accurate, faster, and more
intelligent, so the author suggests developing new deep learning models. The author
also reviews machine learning models such as support vector machine (SVM), KNN, and
K-means clustering, and deep learning models such as CNNs and GANs. Labeled datasets are
difficult to obtain for early plant detection using HSIte Adversarial networks (for data
augmentation) techniques. Various research gaps are identified. Larger datasets are
needed for CNN training, but large and diverse datasets have not been collected for
plant disease detection. If a large dataset is not available, transfer learning
needs to be combined with deep learning on the limited dataset. Early detection of
diseases with limited sample sets is still under research, and more research can be
directed toward it. There is also a need to build a large dataset of plant diseases
under real field conditions: the PlantVillage dataset is the one most commonly used
for experimentation, but its data were created under laboratory conditions.
A review of advanced techniques for agricultural disease detection is presented
in [4]. It compares the merits and demerits of machine learning methods with those
of deep learning and transfer learning methods. Traditional ML methods such as SVM
and Bayesian classifiers depend on the quality of the image data, and their
realization is complex and difficult when the number of training samples is large.
The review concludes that deep learning with CNNs is better suited to disease
detection than traditional machine learning methods, but there is still scope to
improve the accuracy of CNNs because the datasets are limited. Transfer learning
can be used on top of deep learning methods, because DL requires a huge amount of
data, the quality of deep learning models depends heavily on the datasets, and huge
datasets are still scarce in agriculture. Parameter optimization is also a major
concern in DL. The authors explain the need to construct image datasets and expand
current ones, because the present lack of labeled disease image data limits the
quality and accuracy of DL models. The survey concludes that most crop disease
studies focus on tomato, rice, cucumber, apple, and citrus, so there is a need to
design a method that identifies disease independently of the specific crop. It
suggests that DL can be integrated with current smartphone technology. Along with
disease detection, it is necessary to determine the severity of the disease and to
relate the disease to other factors such as temperature, humidity, and soil type.
Diverse image datasets constructed in the actual cultivation environment are
needed, instead of image datasets collected in a controlled laboratory environment,
to improve the accuracy of deep learning models for plant disease detection. The
authors suggest that a heterogeneous mode of transfer learning can be employed to
predict disease based on text, image, and video data instead of image data alone.
A detailed survey of plant disease detection using image processing and ML
techniques is presented in [5]. It surveys various diseases of plants such as
apple, corn, cherry, and grapes. It also discusses the steps involved in the plant
disease detection process, namely image pre-processing, feature extraction and
selection, image segmentation, and disease classification, and explores various
classifiers for plant disease detection. In addition, it summarizes previous
research on various crop categories such as profit crops, mixed culture, and grains
using image processing techniques, in terms of the percentage of papers. The survey
observes that a lot of research addresses rice, tomato, cucumber, citrus, and
wheat, but less is directed toward profit crops such as sugarcane and groundnut. It
then discusses research gaps in plant disease detection, such as the need to detect
the disease at a particular stage. It would help farmers if stage-wise precautions
were suggested to them. Also, if the infected area of the plant is estimated
precisely, the unmanaged use of pesticides by farmers can be controlled and
minimized. Although many researchers have proposed solutions to this problem, few
corresponding systems are actually available, so there is a need to develop mobile
applications and web-based solutions for farmers worldwide. A “Disease Analysis
Report” can be generated for the farmers. There is also a need to develop real-time
applications using data from real field conditions rather than data obtained from a
controlled laboratory environment.
A nine-layer deep convolutional neural network is presented in [6] for 39 different
classes of plant leaf diseases. The performance of the nine-layer deep CNN is
compared with SVM, KNN, AlexNet, VGG16, InceptionV3, and ResNet. The image dataset
was taken from PlantVillage. Because training the model requires a large amount of
data, the images were augmented to create many additional images. The models were
trained and tested in Python using the Keras, OpenCV, and Pillow libraries. The
developed model achieved an accuracy of 96.46% when compared with SVM, KNN,
logistic regression, and decision tree classifiers. The authors further suggest
that accuracy can be improved by creating an enhanced dataset, which can be built
by collecting images that differ in plant, cultivation, geographical area, and
image quality. The research can be extended to the fruits, flowers, and stems of
the plant, and also to plant disease diagnosis.
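The augmentation step mentioned above can be illustrated with a minimal sketch. The specific transformations used in [6] are not stated here, so the flips and rotation below are assumed, commonly used examples; the image is represented as a plain list of pixel rows so that the sketch needs no external library, and all function names are hypothetical.

```python
def flip_horizontal(image):
    """Mirror the image left to right (each row is reversed)."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror the image top to bottom (row order is reversed)."""
    return [row[:] for row in image[::-1]]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the original image plus three transformed copies."""
    return [
        image,
        flip_horizontal(image),
        flip_vertical(image),
        rotate_90_clockwise(image),
    ]


if __name__ == "__main__":
    tiny_image = [[1, 2], [3, 4]]  # stand-in for a grid of pixel values
    for variant in augment(tiny_image):
        print(variant)
```

Each labeled image therefore yields four training samples with the same label; libraries such as Keras and OpenCV provide equivalent operations on real image arrays.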
A comparative study of various deep learning models for plant disease
identification and classification is presented in [7]. It provides information
about image processing-based disease detection techniques using deep convolutional
neural networks. It used a plant disease dataset from the ImageNet Dataset Library
and implemented the deep learning architectures VGG 16, Inception V4, ResNet with
50, 101, and 152 layers, and DenseNet with 121 layers, and compared their
performance. DenseNet was found to give higher accuracy than the others, but
further research can still be carried out to reduce the computational processing
time.
Table 1 gives a tabular summary of plant disease detection methods and gaps
identified in the literature.
Table 1 (continued)

Reference No.: Yuan et al. [4]
Methodology used: DL and transfer learning; CNN for image classification;
homogeneous transfer learning
Limitations/future scope: Parameter optimization is a major concern in DL. DL can
be integrated with current smartphone technology. It is necessary to find the
severity of the disease and to relate the disease to other factors such as
temperature, humidity, and soil type. A heterogeneous mode of transfer learning can
be employed to predict the disease based on text, image, and video data instead of
only image data.

Reference No.: Kumar et al. [5]
Methodology used: Image processing; unsupervised and supervised classifiers
Limitations/future scope: Recognition of the stage of infection; accurate
classification; development of a website solution and mobile app; reliability of
detection systems.

Reference No.: Geetharamani et al. [6]
Methodology used: Image processing; deep CNN
Limitations/future scope: Need to increase database classes and size by capturing
images in the real environment. Research can be extended to other parts of the
plant such as flowers, fruits, and stems.

Reference No.: Too et al. [7]
Methodology used: Deep CNN; VGG 16, Inception V4, ResNet with 50, 101, and 152
layers, DenseNet with 121 layers; Keras with Theano backend for training
Limitations/future scope: Computational time needs to be improved.

A novel method combining image processing and a multiclass support vector machine
was used in [8]. Grape diseases such as leaf blight, Black Measles, and Black Rot
were detected. The authors used the gray-level co-occurrence matrix (GLCM) and
principal component analysis (PCA) for extracting features and reducing feature
dimensions. An accuracy of 98.71% was obtained using the GLCM method, while the PCA
method achieved an accuracy of 98.97%. Deep learning algorithms, namely CNN and
GoogLeNet, were also used, and accuracies of 86.82% and 94.05% were achieved,
respectively.
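As an illustration of GLCM-based texture features, the following minimal sketch builds a normalized, non-symmetric co-occurrence matrix for one pixel offset and derives two common descriptors, contrast and energy. It is an assumption-laden example rather than the feature pipeline of [8]: the offset, the number of gray levels, the choice of descriptors, and the function names are all illustrative.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix for pixel pairs separated by (dx, dy).

    `image` is a list of rows of integer gray levels in range(levels).
    The image must contain at least one valid pixel pair for the offset.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]


def contrast(p):
    """Sum of p[i][j] * (i - j)^2; large when neighbouring gray levels differ."""
    size = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(size) for j in range(size))


def energy(p):
    """Sum of squared matrix entries; large for uniform textures."""
    return sum(value * value for row in p for value in row)


if __name__ == "__main__":
    patch = [
        [0, 0, 1],
        [0, 1, 1],
        [2, 2, 2],
    ]
    matrix = glcm(patch, levels=3)
    print(contrast(matrix), energy(matrix))
```

Feature vectors built from such descriptors (often over several offsets and angles) can then be passed to a classifier such as an SVM; scikit-image offers a tested implementation for real images.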
Authors in [9] proposed a deep convolutional neural network (DCNN) for the
identification and classification of grape leaf diseases. The grape leaf RGB image
dataset from PlantVillage was used. The developed model obtained an accuracy of
99.34%.
A Ghost convolution and Transformer network for grape leaf disease and pest
detection is proposed in [10]. A total of eight grape diseases were identified,
namely Black Rot, leaf blight, Esca, Downy Mildew, Brown Spot, Powdery Mildew,
Nutrient Deficiency, and viruses. A dataset of 12,615 images was collected, and the
model achieved an accuracy of 98.14%. One listed drawback is that the proposed
model works only on labeled data; enlarging the labeled dataset is suggested as a
way to improve accuracy. Further research can be directed toward segmenting the
lesion area for severity grading.
A hyperspectral imaging and machine learning approach for detecting the Flavescence
dorée grapevine disease is used in [11]. The auto-encoders are used for reducing
diseases Leaf Spot, Anthracnose, Downy Mildew, Round Spot, and Sphaceloma
ampelinum de Bary were detected with an accuracy of 80%.
Grape disease detection using Random Forest-based classification was presented in
[19]. Back Propagation Neural Networks (BPNN), Probabilistic Neural Networks (PNN),
support vector machine (SVM), and Random Forest were implemented and their
performance was compared. A dataset of 900 images captured in an uncontrolled
environment was used, and the proposed model achieved an accuracy of 86%. The
research targeted three fungal grape diseases, namely Anthracnose, Downy Mildew,
and Powdery Mildew.
4 Summary
The literature survey on plant and grape disease detection shows that most disease
detection work uses the PlantVillage and ImageNet datasets, which were collected in
a controlled environment. There is a need to design more accurate models that
detect multiple diseases. For grape disease detection systems to be more accurate,
a diverse dataset must be created in the real environment rather than the
laboratory. To address this issue, the dataset can be created by taking images with
high-resolution smartphone RGB cameras, multispectral cameras, and hyperspectral
cameras. There is also a need to detect the disease stage by stage and inform
farmers, since early detection is important; it will help farmers if stage-wise
precautions are suggested to them. In addition, if the infected area of the plant
is estimated precisely, the unmanaged use of pesticides by farmers can be
controlled and minimized, which calls for a precision spraying system. According to
the literature survey, most disease detection techniques work with data on diseases
of the leaves. Research can be directed toward disease detection on other parts of
the plant, such as stems and fruits.
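The idea of estimating the infected area and reporting a stage can be sketched as follows. The sketch assumes that a segmentation step has already produced two binary masks of equal size, one marking leaf pixels and one marking lesion pixels; the function names and the stage cut-offs are hypothetical and are not drawn from the surveyed papers.

```python
def infected_fraction(leaf_mask, lesion_mask):
    """Fraction of leaf pixels that are also marked as lesion (0.0 if no leaf pixels)."""
    leaf_pixels = 0
    lesion_pixels = 0
    for leaf_row, lesion_row in zip(leaf_mask, lesion_mask):
        for is_leaf, is_lesion in zip(leaf_row, lesion_row):
            if is_leaf:
                leaf_pixels += 1
                if is_lesion:
                    lesion_pixels += 1
    return lesion_pixels / leaf_pixels if leaf_pixels else 0.0


def severity_stage(fraction):
    """Map an infected fraction to a coarse stage; the cut-offs are assumed values."""
    if fraction < 0.05:
        return "early"
    if fraction < 0.25:
        return "moderate"
    return "severe"


if __name__ == "__main__":
    leaf = [
        [1, 1, 1, 1],
        [1, 1, 1, 1],
        [0, 0, 0, 0],
    ]
    lesion = [
        [1, 0, 0, 0],
        [0, 0, 0, 0],
        [1, 0, 0, 0],  # lesion pixel outside the leaf is ignored
    ]
    fraction = infected_fraction(leaf, lesion)
    print(fraction, severity_stage(fraction))
```

A spraying controller could use the reported stage to decide whether and how much to spray, which is the kind of precision spraying the summary calls for.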
6 Conclusion
This paper presents a survey of various detection methods for grape diseases,
summarizing the existing methods and the challenges present. In the future, a more
accurate disease detection model can be developed using a dataset created by
capturing images in real-time scenarios and varying
Table 2 (continued)

Reference No.: Jaisakthi et al. [17]
Methodology used and diseases detected: Image processing, machine learning
algorithms, SVM, AdaBoost, Random Forest; Black Rot, Esca, leaf blight
Accuracy (%): 93.00
Limitations/future scope: Real-time dataset is not used

Reference No.: Zhu et al. [18]
Methodology used and diseases detected: Image analysis (Wiener filter and Wavelet
transform) and 3-stage BPNN; Anthracnose, Downy Mildew, Round Spot, Leaf Spot,
Sphaceloma ampelinum de Bary
Accuracy (%): 80.00
Limitations/future scope: Dataset size can be enlarged

Reference No.: Sandika et al. [19]
Methodology used and diseases detected: PNN, BPNN, SVM, Random Forest; Anthracnose,
Powdery Mildew, Downy Mildew
Accuracy (%): 86.00
Limitations/future scope: Dataset size can be enlarged
References
1. Abade A, Afonso P, Ferreira Flavio de Barros V (2021) Plant diseases recognition on images
using convolutional neural networks: a systematic review. Comput Electron Agric 106125:1–31
2. Yong AI, Sun C, Tie J, Cai X (2020) Research on recognition model of crop diseases and insect
pests based on deep learning in harsh environments. IEEE Access 8:171686–171693
3. Lili LI, Zhang S, Wang B (2021) Plant disease detection and classification by deep learning a
review. IEEE Access 9:56683–56698
4. Yuan Y, Chen L, Lit HWL (2021) Advanced agriculture disease image recognition technologies:
a review. J Inf Proc Agric 9(1):48–59
5. Kumar V, Vishnoi KK, Kumar B (2021) Plant disease detection using computational
intelligence and image processing. J Plant Diseases Protect 128:19–53
6. Geetharamani G, Arun Pandian J (2019) Identification of plant leaf diseases using a nine-layer
deep convolutional neural networks. Comput Electric Eng 323–338
7. Too EC, Yujian L, Njukia S, Yingchun L (2019) A comparative study of fine tuning deep
learning models for plant disease identification. Elsevier J Comput Electron Agric 61:272–279
8. Javidan SM, Banakar A, Vakilian KA, Ampatzidis Y (2023) Diagnosis of grape leaf diseases
using automatic K-means clustering and machine learning. Smart Agric Technol 3:100081
9. Math RM, Dharwadkar NV (2022) Early detection and identification of grape diseases using
convolutional neural networks. J Plant Dis Prot 129:521–532
10. Yang XLR, Zhou J, Jiao J, Liu F, Liu Y, Su B, Gu P (2022) A hybrid model of ghost-convolution
enlightened transformer for effective diagnosis of grape leaf disease and pest. J King Saud
Univer Comput Inf Sci 1–13
11. Silvaa DM, Bernardinc T, Fanton K, Nepaul R, Joaquim LP, Sousaab J, Cunhaab A (2022)
Automatic detection of Flavescense Dorée grapevine disease in hyperspectral images using
machine learning. Procedia Comput Sci 196:125–132. https://doi.org/10.1016/j.procs.2021.
11.081
12. Sanath Rao U, Swathia R, Sanjanaa V, Arpitha L, Chandrasekhara K, Chinmayi P, Naik K
(2021) Deep learning precision farming: grapes and mango leaf disease detection by transfer
learning. Glob Trans Proc 2(2):535–544
13. Zhou C, Zhang Z, Zhou S, Xing J, Wu Q, Song J (2021) Grape leaf spot identification under
limited samples by fine grained-GAN. IEEE Access 9:100480–100489
14. Lauguico S, Concepcion R, Tobias RR, Bandala A, Vicerra RR, Dadios E (2020) Grape Leaf
multi-disease detection with confidence value using transfer learning integrated to regions with
convolutional neural network. In: 2020 IEEE region 10 conference (TENCON), pp 767–772
15. Xie X, Ma Y, Liu B, He J, Li S, Wang H (2020) A deep learning-based real-time detector for
grape leaf diseases using improved convolutional neural networks. Front Plant Sci 1–14
16. Ji M, Zhang L, Qiufeng W (2020) Automatic grape leaf diseases identification via united model
based on multiple convolutional neural networks. Inf Proc Agric 7(3):418–426
17. Jaisakthi SM, Mirunalini P, Thenmozhi D, Vatsala (2019) Grape leaf disease identification using
machine learning techniques. In: 2019 international conference on computational intelligence
in data science (ICCIDS), pp 21–23
18. Zhu J, Wu A, Wang X, Zhang H (2020) Identification of grape diseases using image analysis
and BP neural networks. Multimedia Tools Applications 79(21,2):14539–14551
19. Sandika B, Avil S, Sanat S, Srinivasu P (2016) Random forest based classification of diseases in
grapes from images captured in uncontrolled environments. In: 2016 IEEE 13TH international
conference on signal processing (ICSP), pp 1775–1780
Chapter 6
URL Weight-Based Round Robin Load
Balancing in Cloud Environment
1 Introduction
Zhou et al. [2] explained that cloud load balancing is the process of distributing
workloads across the computing resources of a cloud environment, balancing network
traffic based on an assessment of those resources. Cloud
load balancing is used to meet an organization’s needs by routing incoming traffic
to multiple servers, networks, or other resources, which improves performance and
protects against service disruptions. Cloud load balancing can distribute workloads
across two or more geographic locations. The load balancer receives the incoming traffic
and routes requests to targets according to its configuration policy.
Rahman et al. [3] and AlKhatib et al. [4] explained how a load
balancer can be offered as a service in the cloud and surveyed different load balancing
techniques, respectively. The load balancer monitors all the individual nodes/targets,
which should be fully operational. For balancing the load in the cloud, many
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 63
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_6
64 V. K. Nampally et al.
algorithms are available, such as Static Algorithm, Dynamic Algorithm, Round Robin
Algorithm, Weighted Round Robin, Opportunistic Load Balancing Algorithm, Load
Balancing Min-Min Algorithm, Load Balancing Max-Min Algorithm,
Least Connection, Weighted Least Connection, Resource-based,
Request-based, and Response Time Load Balancing Algorithm. Jaiswal and Jain [5]
showed that load balancing should be optimal to achieve better
performance and utilization of cloud resources. In general, load balancing is
implemented in software rather than hardware, because hardware costs more than software.
The working of load balancers is shown in Fig. 2.
Normally, a user’s request reaches the cloud load balancer over the
Internet in order to access cloud resources. The purpose of the cloud load balancer is to
distribute user requests/traffic across resources, which reduces the
risk of performance issues in applications. Generally, the resources are
compute engines, computational servers, or virtual machine instances. The servers in
the cloud store their data, including database data, in cloud storage buckets.
Cloud load balancers can handle
traffic types such as HTTP, HTTPS, TCP, UDP, ESP, GRE, ICMP, and ICMPv6. The data
is backed up in the backend, which can span multiple regions.
Islam and Hasan [6] explained different computing service and model types. To serve
client requests, four service models are available in the market. They
are:
1. On-Premise Environment.
2. Infrastructure as a Service—IAAS.
3. Platform as a Service—PAAS.
4. Software as a Service—SAAS.
On-Premise Environment
Here all resources, from networking to applications, must be managed
by the user rather than the cloud provider.
Infrastructure as a Service—IAAS
Infrastructure as a Service provides access to resources (virtual, physical machines,
virtual storage, etc.) in the cloud environment. Examples are AWS, VMware, and
Rackspace.
Platform as a Service—PAAS
PAAS provides a runtime environment for applications, together with development and
deployment tools.
Examples are Azure, Force.com, and Google App Engine.
Software as a Service—SAAS
Software as a Service delivers software applications to end users as a service.
Examples are Google Docs, MS Office, and Gmail.
Cloud load balancing features are used to create and configure the cloud environment
as required by the user [7]. Each feature is used for a specific
purpose, as mentioned in Table 1.
A cloud load balancer distributes network traffic across resources using a software-based or
hardware-based approach [6]. When the two are compared for cost and performance,
the software-based approach is the better choice.
Software-Based Approach
Here the software is used for balancing the load in the cloud environment.
Hardware-Based Approach
Here hardware is used for balancing load in the cloud environment.
Primary Cloud Platform Providers List:
Many providers offer cloud load balancing services which include three major
platforms: AWS, Azure, and GCP.
Company Name: Amazon.
Cloud Platform Name: Amazon Web Services (AWS).
Load Balancing: Rodge et al. [8] explained that Elastic Load Balancing
distributes incoming traffic to targets (EC2 instances). Elastic Load Balancing
in AWS comes in four types: Application, Network, Gateway, and Classic.
Company Name: Google.
Cloud Platform Name: Google Cloud Platform (GCP) [9]
Load Balancing: Mishra et al. [9] showed how load balancing is rendered in
Google cloud. It is built on the front-end server infrastructure of Google.
Company Name: Microsoft.
Cloud Platform Name: Azure [10]
Load Balancing: Load Balancing uses Azure Traffic Manager to distribute
incoming traffic to targets. Carutasu et al. [10] used the concept of VMs to
distribute incoming traffic to targets.
Joshi and Kumari [11] described how cloud load balancing is used to
control traffic, to improve resource utilization, resource availability, throughput,
performance, response time, etc., and to reduce infrastructure cost, latency, fault
tolerance, and migration time. Cloud load balancing is also used to scale the resources
(add/scale up and remove/scale down). It helps meet client demand by sustaining
a high number of client connections and serves distributed workloads
while keeping the resources in use fully operational.
Cloud creation and management are very easy, but the cloud poses some challenges that
must be handled by highly skilled employees, users, or customers when
dealing with sensitive areas such as task migration, cloud interoperability,
and security. The major challenges are listed in Table 2.
Table 2 Challenges of cloud load balancing. Sreenivas et al. [7] showcased different challenges
posed in cloud load balancing
| Challenges of cloud load balancing | Details |
|---|---|
| Tasks migration | The purpose of task migration is to move tasks from an overloaded virtual machine to a non-overloaded virtual machine |
| Energy management | Energy management in the cloud should be good to get better performance |
| Stored data management | Data in the cloud should be appropriately distributed for fast access and storage |
| Use of small different datacenters | Small, separate data centers are always used for optimal resource utilization and for cloud computing in case of emergency |
| Cloud nodes distribution | All the nodes should be distributed spatially in the cloud for accessible locations |
| Cloud interoperability | Cloud interoperability is the ability of one cloud service to interact with other cloud services by exchanging information |
| Storage efficiency | Storage efficiency is achieved by replicating data to different nodes in the cloud |
| Load balancing algorithm complexity | The complexity of the load balancing algorithm should always be low for operations and execution |
| Fault tolerance/controller failure | Another controller must take over load balancing if the primary controller fails |
| Security | The load balancing algorithm has to look after data security before, during, and after processing data |
Cloud load balancing can be used in various real-time applications, some of
which are listed in Table 3.
2 Literature Survey
Many researchers have contributed work on cloud load balancing. The different
research papers and their methods are given in Table 4.
In the URL weight-based Round Robin Cloud Load Balancing algorithm, the load balancer
assigns every requested URL a specific weight (1 or 2) that serves as its time
slice: a weight of 1 for a standard page request and a weight of 2 for a database request.
The load balancer forwards the tasks to a particular server, and the server assigns
the tasks to particular VMs, which process or redirect them. VMs can send tasks to
other VMs or servers; this is called task
redirection/migration. Task migration continues until all the tasks are completed.
The flowchart of the proposed algorithm is shown in Fig. 3.
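The weighting step described above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' implementation: the paper does not say how a database request is recognized, so the rule used here (a hypothetical `/db/` path prefix or a `query` parameter) and the function name `url_weight` are invented for the example.

```python
from urllib.parse import urlparse, parse_qs

# Weights from the text: 1 for a standard page request, 2 for a database request.
PAGE_WEIGHT = 1
DATABASE_WEIGHT = 2


def url_weight(url: str) -> int:
    """Return the weight (time slice) for a requested URL.

    Assumption for illustration only: a request counts as a database request
    if its path starts with /db/ or it carries a 'query' parameter.
    """
    parsed = urlparse(url)
    is_database = parsed.path.startswith("/db/") or "query" in parse_qs(parsed.query)
    return DATABASE_WEIGHT if is_database else PAGE_WEIGHT


requests = [
    "https://example.com/index.html",
    "https://example.com/db/orders",
    "https://example.com/search?query=grapes",
]
weights = [url_weight(u) for u in requests]
print(weights)  # [1, 2, 2]
```

Keeping the classification in one small function means the rule that separates page requests from database requests can be replaced without touching the dispatch logic.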
Table 4 (continued)
| S. No | Research papers | Details | Research area | Year |
|---|---|---|---|---|
| 9 | Weighted least connection | Every node is assigned a value by administrators. Traffic is distributed by least connection activity, based on the assigned value [20] | Balancing load in cloud environment | 2015 |
| 10 | Resource-based | A software agent at each node sends the complete details to the load balancer. The load balancer takes dynamic traffic routing decisions with that information [6] | Balancing load in cloud environment | 2018 |
| 11 | Request-based | The load balancer distributes the traffic based on fields in query parameters, header data, and source and destination IP addresses, which helps to move traffic from particular sources to intended destinations and to maintain sessions [21] | Balancing load in cloud environment | 2020 |
| 12 | Response time load balancing algorithm | The response time of previously completed tasks is used to assign tasks to the cloud load balancer; i.e., the tasks with the least response time are given to the cloud load balancer [22] | Balancing load in cloud environment | 2014 |
Algorithm for URL weight-based Round Robin Cloud Load Balancing in Cloud
Servers
1. Initialize DataCentres with VMs, cloudlets, and Broker
   a. Create VMs with specifications
      i. Assign VM specification with capacity 100, placed at a unique data center.
   b. Create cloudlets with specifications
      i. Assign a Load of 2 for Database requests and a Load of 1 for HTTP
         requests.
   c. Create a Broker to transfer cloudlets to Datacenters.
   d. Broker_0: Cloud Resource List received with n resource(s)
      i. Create VM(s) in Datacenter(s)
         1. VM #0 has been allocated to the host#0 Datacenter_0
         2. VM #1 has been allocated to the host#0 Datacenter_1
         3. VM #n-1 has been allocated to the host#0 Datacenter_n-1
         4. VM #n has been allocated to the host#0 Datacenter_n
2. Invoke the Scheduler and Load Balancer
   a. Specify the Scheduler policy and Call the Load Balancer
      i. Get Datacenter Ids List
      ii. Distribute Requests For New VMs Across Data Centers Using Round
          Robin
         1. Initialize number of VMs allocated = 0;
         2. Initialize available Datacenters;
         3. If data center capacity is not Full
            a. For each VM, get the data center Id in Round Robin Fashion
               // Datacenter ID = availableDatacenters.get(i++ % availableDatacenters.size());
            b. Increment number of VMs Allocated;
            c. Send Acknowledgment to Broker
3. Broker Sends cloudlets in Round Robin Fashion
   a. 0 cloudlet to VM #0
   b. 1 cloudlet to VM #1
   c. n-1 cloudlet to VM #n-1
   d. n cloudlet to VM #n
4. Broker receives cloudlets
5. Broker Destroys VMs
6. Shutdown DataCentres and Broker
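Steps 2 and 3 of the listing can be sketched in plain Python without a simulator. The sketch below is an assumption-laden illustration rather than the authors' CloudSim code: the function names, the per-data-center capacity argument, and the choice to treat a cloudlet as a weighted unit of work are all made up for the example. Only the round-robin index rule (`i++ % size`) and the weights 1 and 2 come from the listing.

```python
def place_vms_round_robin(num_vms, datacenter_ids, capacity_per_datacenter):
    """Assign each VM to a data center in round-robin order, skipping full ones."""
    allocated = {dc: 0 for dc in datacenter_ids}
    placement = {}
    i = 0  # round-robin cursor, as in availableDatacenters.get(i++ % size)
    for vm in range(num_vms):
        # Try each data center at most once, starting from the cursor.
        for _ in range(len(datacenter_ids)):
            dc = datacenter_ids[i % len(datacenter_ids)]
            i += 1
            if allocated[dc] < capacity_per_datacenter:
                allocated[dc] += 1
                placement[vm] = dc
                break
        else:
            raise RuntimeError("all data centers are full")
    return placement


def dispatch_cloudlets_round_robin(cloudlet_weights, num_vms):
    """Send cloudlet k to VM k mod num_vms and return the total weight per VM."""
    load = [0] * num_vms
    for k, weight in enumerate(cloudlet_weights):
        load[k % num_vms] += weight
    return load


placement = place_vms_round_robin(4, ["Datacenter_0", "Datacenter_1"], 2)
print(placement)
# Weights: 1 = HTTP request, 2 = database request.
load = dispatch_cloudlets_round_robin([1, 2, 1, 1, 2], num_vms=2)
print(load)  # [4, 3]
```

The per-VM totals make it easy to see how uneven the weighted load becomes under plain round robin, which is the quantity a weight-aware dispatcher would try to even out.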
In the above algorithm, cloudlets are the small data centers with which the VMs
are associated. The workload requests are assigned to these VMs in round
robin fashion using URL weights. The weights are assigned based on the
waiting time required for each VM: after the waiting time has been calculated for all the VMs,
a weight is assigned to each VM and the VMs are sorted in ascending order. Now, assign a load of 2 for
Database requests and a load of 1 for HTTP requests. A broker is then created to
transfer cloudlets to data centers. In the next step, the broker sends the cloudlets in
round robin fashion. If the data center capacity is not full, new workloads are
assigned; otherwise, workloads are assigned to new cloudlets until all the requests are
completed. The creation of cloudlets and VMs and the assignment of tasks are done
automatically using the GridSim Tool.
4 Results
We ran the simulation for more than one hour (approximately 100 runs) on different
numbers of tasks with random-length cloudlets (tasks) and calculated the results using the
space-shared policy in CloudSim. We consider 5 virtual machines with a bandwidth of 1000
Mbps and 1 CPU each. The number of tasks
ranges from 100 to 300 for each virtual machine, and the task length varies
from 10,000 MI to 200,000 MI. Computational results show that the proposed algorithm
reduces the makespan compared to the FCFS, SJF, and Min-Min algorithms, as shown
in Table 5; Fig. 4 shows the comparison between tasks and makespan.
Here the main data center consists of cloudlets. Each cloudlet contains VMs that
receive the requests/tasks from the users. Each VM therefore processes a different set of
tasks, and the completion time of each task and the makespan differ with the workload
of the task and its URL weight (0 for a lower-weight URL, 1 for a higher-weight URL).
Makespan
Makespan is the total time taken by a set of jobs to execute completely. Minimizing the
makespan is therefore important when allotting tasks to the VMs with any algorithm.
Tasks in the cloud can be compared with one another in terms of the parameters number
of VMs, number of tasks, and makespan [23, 24].
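The definition of makespan can be made concrete with a small Python sketch. It assumes that tasks placed on the same VM run one after another and that each VM has a fixed speed in MIPS; the speeds, the four task lengths, and the function name `makespan` are illustrative values chosen for the example, not figures from the experiments.

```python
def makespan(task_lengths_mi, assignment, vm_mips):
    """Makespan = finish time of the VM that finishes last.

    task_lengths_mi[k] is the length of task k in million instructions (MI),
    assignment[k] is the index of the VM that runs task k, and
    vm_mips[v] is the speed of VM v in MIPS. Tasks on one VM run sequentially.
    """
    finish = [0.0] * len(vm_mips)
    for length, vm in zip(task_lengths_mi, assignment):
        finish[vm] += length / vm_mips[vm]
    return max(finish)


lengths = [10_000, 200_000, 50_000, 40_000]          # MI, within the 10,000-200,000 range
round_robin = [k % 2 for k in range(len(lengths))]   # tasks alternate between 2 VMs
result = makespan(lengths, round_robin, vm_mips=[1000, 1000])
print(result)  # 240.0 seconds: VM 1 runs the 200,000 MI and 40,000 MI tasks
```

In this toy case plain round robin leaves VM 0 idle after 60 s while VM 1 runs for 240 s, which illustrates why the assignment policy matters for makespan.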
5 Conclusion
In general, the load in the cloud can be balanced with any one of the following algorithms:
Static Algorithm, Dynamic Algorithm, Round Robin Algorithm, Weighted
Round Robin, Opportunistic Load Balancing Algorithm, Load Balancing Min-Min
Algorithm, Load Balancing Max-Min Algorithm,
Least Connection, Weighted Least Connection, Resource-based, Request-based, and
Response Time Load Balancing Algorithm. In this paper, however, we
use weights for the round robin. In URL weight-based Round Robin Cloud Load
Balancing, every request is classified into one of two categories and assigned to
the load balancer. The load balancer forwards the tasks to particular VMs, which process or
redirect them until all the tasks are completed, using the value assigned to each
URL as a time slice. The main parameters used
to evaluate the algorithm’s performance are the number of virtual machines, the number
of tasks, and the makespan. URL weight-based Round Robin Cloud
Load Balancing should be implemented using the software-based approach for better
performance and utilization of resources in a cloud environment.
References
1. Patidar S, Rane D, Jain P, A survey paper on cloud computing. In: 2012 second international
conference on advanced computing and communication technologies
2. Zhou M, Zhang R, Zeng D, Qian W, Services in the cloud computing era: a survey. 978-1-
4244-7820-0/10/$26.00 ©2010 IEEE IUCS2010
3. Rahman M, Iqbal S, Gao J (2014) Load balancer as a service in cloud computing. In: 2014
IEEE 8th international symposium on service oriented system engineering
4. AlKhatib AAA, Sawalha T, AlZu’bi S (2020) Load balancing techniques in software-defined
cloud computing: an overview. In: 2020 seventh international conference on software defined
systems (SDS)
5. Jaiswal AA, Jain S (2014) An approach towards the dynamic load management techniques in
cloud computing environment. 978-1-4799-7169-5/14/$31.00 ©2014 IEEE
6. Islam T, Hasan MS (2017) A performance comparison of load balancing algorithms for cloud
computing. 978-1-5386-3148-5/17/$31.00 © 2017 IEEE
7. Sreenivas V, Prathap M, Kemae M, Load balancing techniques: major challenge in cloud
computing – a systematic review
8. Rodge AS, Pramanik C, Bose J, Soni SK (2014) Multicast routing with load balancing using
amazon web service. In: 2014 annual IEEE India conference (INDICON)
9. Mishra SK, Sahoo B, Parida PP (2018) Load balancing in cloud computing: a big picture.
Preprint Submitted J LATEX Templates
10. Carutasu G, Botezatu MA, Botezatu C (2017) Cloud computing and windows azure
11. Joshi S, Kumari U (2016) Load balancing in cloud computing: challenges & issues. 978-1-
5090-5256-1/16/$31.00 ©2016 IEEE
12. Aligarh Muslim University, Aligarh Muslim University (2017) A survey on load balancing
algorithms in cloud computing. Article Int J Autonomic Comput
13. Patel KD, Bhalodia TM, An efficient dynamic load balancing algorithm for virtual machine in
cloud computing. IEEE Xplore Part Number: CFP19K34-ART; ISBN: 978-1-5386-8113-8
14. Ghosh S, Banerjee C (2018) Dynamic time quantum priority based round robin for load
balancing in cloud environment. In: 2018 fourth international conference on research in
computational intelligence and communication networks (ICRCICN)
15. Wang W, Casale G (2014) Evaluating weighted round robin load balancing for cloud web
services. In: 2014 16th international symposium on symbolic and numeric algorithms for
scientific computing
16. Ojha SK, Rai H, Nazarov A (2020) Optimal load balancing in three level cloud computing
using osmotic hybrid and firefly algorithm. In: 2020 international conference engineering and
telecommunication (En& T) | 978-1-7281-8829-4/20/$31.00 ©2020 IEEE | https://doi.org/10.
1109/ENT50437.2020.9431250
17. Vishalika, Malhotra D (2018) LD_ASG: load balancing algorithm in cloud computing. In:
5th IEEE international conference on parallel, distributed and grid computing (PDGC-2018).
Solan, India 978–1
18. Li X, Mao Y, Xiao X, Zhuang Y (2014) An improved max-min task-scheduling algorithm for
elastic cloud. In: 2014 international symposium on computer, consumer and control
19. Islam T, Hasan MS (2017) A performance comparison of load balancing algorithms for cloud
computing. 978-1-5386-3148-5/17/$31.00 © 2017 IEEE 130
20. Kang L, Ting X (2015) Application of adaptive load balancing algorithm based on minimum
traffic in cloud computing architecture. 978-1-4799-1891-1/15/$31.00 ©2015 IEEE
21. Mohammed MA, Hasan RA, Ahmed MA, Tapus N, Shanan MA, Khaleel MK, Ali AH (2018)
A focal load balancer based algorithm for task assignment in a cloud environment. 978-1-5386-
4901-5/18/$31.00 ©2018 IEEE
22. Swarnakar S, Kumar N, Kumar A (2020) Modified genetic based algorithm for load balancing
in cloud computing. 978-1-7281-7340-5/20/$31.00 ©2020 IEEE
23. Sharma A, Peddoju SK (2014) Response time based load balancing in cloud computing. 978-
1-4799-4190-2/14/$31.00 ©2014 IEEE
24. Al-Maytami BA, Fan P, Hussain A, Baker T, Liatsis P, A task scheduling algorithm with
improved makespan based on prediction of tasks computation time algorithm for cloud
computing. Digital object identifier. https://doi.org/10.1109/ACCESS.2019.2948704
Chapter 7
Determination of Thickness
and Refractive Indices of Thin Films
from Reflectivity Spectrum Using Rao-1
Optimization Algorithm
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 77
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_7
78 B. H. Gevariya et al.
refractive index layer design has been proven to be effective in providing the neces-
sary AR coating performance for a variety of applications [10, 11]. For such a design,
the required spectral response, namely the reflectance and transmittance spectrum,
may be achieved by adjusting the refractive index (n) and thickness (t) of chosen
ARC materials. Hence, precise knowledge of the thickness and refractive index of
anti-reflective thin films is always required for designing optical coatings, and as a
result, the performance of optoelectronic devices may be enhanced. Hence, frequent
measurements are essential. Thus, an easy and fast technique for determining the
thickness and refractive index of optical thin films has significant importance.
Spectroscopic ellipsometry (SE) [12, 13] and spectrophotometry [14, 15] are
commonly used methods for the determination of the n and t of thin films. Though
the former method is more robust and reliable, it is significantly costlier. Keeping
that view in mind, the latter gives a comparatively good result. It can be used with a
multi-wavelength spectrum fitting technique, in which the experimentally measured
reflectance and/or transmittance spectrum are fitted with the theoretically calcu-
lated results using any optimization algorithm to determine the film’s thickness and
refractive index for a required wavelength domain. The refractive index is closely
related to the wavelength, in the multi-wavelength technique, and this relationship
can be described by certain optical dispersion equations, which can yield excellent
results for a wide range of materials and over a wide wavelength range. Several
global optimization algorithms have been effectively employed to determine n and
t of thin films, including particle swarm optimization [16], genetic algorithm [17,
18], pattern search [19], artificial neural network [20], simulated annealing [21] and
TLBO [22, 23]. However, in order to find the best solution, they need some algorithm-
specific parameters. As an example, PSO uses inertia weight, social, and cognitive
parameters. Similarly, GA utilizes mutation probability, crossover rate, and selection
operator. Furthermore, these factors are problem-specific, and determining optimal
values for these parameters is challenging. Improper parameter selection for these
algorithm-specific parameters may even increase calculation time or result in a local
optimum instead of a global one.
R. Venkata Rao introduced a new optimization algorithm, the Rao-1 optimization
algorithm [24], which significantly reduces the above-mentioned limitations. The
beauty of this algorithm is that no algorithm-specific parameters are needed. It needs
only a few input parameters, such as the number of iterations and the population size, which
are common to every nature-inspired optimization algorithm. Until now, to our
knowledge, the Rao-1 optimization algorithm has not been used in the literature to
determine ARC thin film thickness and refractive index.
In this paper, the reflectivity of optical ARC thin films is measured using a
spectrophotometric reflectometry method. This procedure is straightforward,
non-destructive, and relatively simple to set up in the laboratory. The Rao-1 algorithm
is then used to fit the experimentally measured reflectivity spectra to theoretical ones.
PyCharm software is used to implement the algorithm, which is written in Python
(version 3.9).
2 Rao-1 Algorithm
where $X_{l,m,\mathrm{best}}$ is the $m$th parameter value for the best solution and $X_{l,m,\mathrm{worst}}$ is the
$m$th parameter value for the worst solution. $X'_{l,m,n}$ is the updated value of $X_{l,m,n}$, and
$r_{l,m,1}$ is a randomly generated number in the range 0–1 for the $m$th parameter in the $l$th iteration.
$X'_{l,m,n}$ is accepted if it improves the objective function’s value;
otherwise the old solution is retained. All the accepted objective function
values at the end of the iteration are kept and used as the input for the next iteration.
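The update and acceptance steps described here can be sketched in Python. Rao-1, as published by Rao, moves each candidate by a random fraction of the difference between the best and worst solutions, $X'_{l,m,n} = X_{l,m,n} + r_{l,m,1}\,(X_{l,m,\mathrm{best}} - X_{l,m,\mathrm{worst}})$. The sketch is a minimal illustration and not the authors' program: the function name, the clipping of candidates to the search range, and the sphere test function are assumptions added for the example.

```python
import random


def rao1_minimize(objective, bounds, pop_size=20, iterations=200, seed=0):
    """Minimize `objective` over the box `bounds` with the Rao-1 update rule."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [objective(x) for x in pop]
    for _ in range(iterations):
        # Best and worst solutions are fixed at the start of each iteration.
        best = pop[cost.index(min(cost))][:]
        worst = pop[cost.index(max(cost))][:]
        for n in range(pop_size):
            # Rao-1 update: X' = X + r * (X_best - X_worst), one r per parameter.
            candidate = [
                min(max(x + rng.random() * (b - w), lo), hi)  # assumption: clip to range
                for x, b, w, (lo, hi) in zip(pop[n], best, worst, bounds)
            ]
            candidate_cost = objective(candidate)
            # Greedy acceptance: keep the candidate only if it improves the objective.
            if candidate_cost < cost[n]:
                pop[n], cost[n] = candidate, candidate_cost
    k = cost.index(min(cost))
    return pop[k], cost[k]


def sphere(x):
    """Simple test objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


x_best, f_best = rao1_minimize(sphere, [(-5.0, 5.0)] * 3)
print(x_best, f_best)
```

Apart from the population size and the number of iterations, the routine has no tuning parameters, which is the property the text highlights.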
The thickness and refractive index are determined from the
experimentally obtained reflectivity data for the optical AR thin film. The Sellmeier
dispersion relation [25] with two terms is used in this study to determine the refractive
index over the considered wavelength range. Eq. (2) gives the
Sellmeier equation that is used.
$$n^2(\lambda) = 1 + \frac{B_1 \lambda^2}{\lambda^2 - C_1^2} + \frac{B_2 \lambda^2}{\lambda^2 - C_2^2} \qquad (2)$$
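As a quick illustration of Eq. (2), the two-term Sellmeier relation can be evaluated in a few lines of Python. The coefficient values below are arbitrary placeholders chosen only to give a plausible index, not fitted values from this study, and the function name `sellmeier_index` is likewise an assumption.

```python
import math


def sellmeier_index(wavelength_um, b1, c1, b2, c2):
    """Refractive index n from the two-term Sellmeier relation of Eq. (2).

    The wavelength and the C coefficients must share one length unit
    (micrometres here); the B coefficients are dimensionless.
    """
    lam2 = wavelength_um ** 2
    n_squared = 1.0 + b1 * lam2 / (lam2 - c1 ** 2) + b2 * lam2 / (lam2 - c2 ** 2)
    return math.sqrt(n_squared)


# Placeholder coefficients (B1, C1, B2, C2), for illustration only.
coeffs = (0.9, 0.1, 0.1, 0.05)
for lam in (0.45, 0.60, 0.85):
    print(lam, round(sellmeier_index(lam, *coeffs), 4))
```

With resonance wavelengths (C1, C2) below the measured range, the index falls slowly as the wavelength increases, which is the normal-dispersion behaviour expected of transparent dielectric films.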
Hence, the four Sellmeier coefficients $B_1$, $B_2$, $C_1$, and $C_2$ and the thickness ($t$)
form the population, which contains all the unknown parameters or variables, written
as $P_i = (B_{1i}, C_{1i}, B_{2i}, C_{2i}, t_i)$, where $i = 1, 2, 3, \ldots, N$ and $N$ is the
population size. The quality of an individual ($P_i$) is decided by calculating
the value of the specified fitness function, which is given by
$$F(P) = \frac{\sum_{k=1}^{s} \left[ R_{\exp}(\lambda_k) - R_{\mathrm{cal}}(\lambda_k, B_{1i}, C_{1i}, B_{2i}, C_{2i}, t_i) \right]^2}{s}, \qquad (3)$$
where $R_{\exp}(\lambda_k)$ is the reflectivity measured experimentally
at wavelength $\lambda_k$ and $R_{\mathrm{cal}}(\lambda_k, B_{1i}, C_{1i}, B_{2i}, C_{2i}, t_i)$ is the
reflectivity calculated theoretically at wavelength $\lambda_k$ using the
transfer matrix method [26] from the five unknown parameters
$B_{1i}, C_{1i}, B_{2i}, C_{2i}, t_i$, and $s$ is the total number of points at which reflectivity
is measured. The variables are optimized so that
the fitness function value of $P_i$ from Eq. (3) improves from iteration to
iteration under the Rao-1 algorithm. In this way, the best match between the
theoretically calculated and the experimentally observed reflectivity values is
found across the considered wavelength range. To execute the code, three control
parameters must be provided as input: the number of unknown parameters, the number of
iterations, and the initial search range. The terminology of the Rao-1 algorithm with
respect to the considered problem is given in Table 1. The population size was varied from
20 to 100 in steps of 20 with 50 runs for each population size, and the number of iterations
was kept constant at 1000 for the whole exercise.
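The fitting objective can be sketched end to end in Python for the simplest case: one non-absorbing film on a substrate at normal incidence, using the 2x2 characteristic (transfer) matrix of a single layer. This is a sketch under stated assumptions rather than the authors' code: the substrate index is taken as a constant real number (3.5, a placeholder, whereas a real InP substrate is dispersive and absorbing), reflectance is expressed as a fraction rather than a percentage, the misfit is a mean squared difference in the spirit of Eq. (3), and all function names and coefficient values are illustrative.

```python
import cmath
import math


def sellmeier_index(wavelength_um, b1, c1, b2, c2):
    """Two-term Sellmeier refractive index, Eq. (2)."""
    lam2 = wavelength_um ** 2
    return math.sqrt(1.0 + b1 * lam2 / (lam2 - c1 ** 2) + b2 * lam2 / (lam2 - c2 ** 2))


def single_layer_reflectance(wavelength_um, n_film, thickness_um, n_substrate, n_ambient=1.0):
    """Reflectance (0..1) of one film on a substrate at normal incidence."""
    delta = 2.0 * math.pi * n_film * thickness_um / wavelength_um  # phase thickness
    # [B, C] = M [1, n_substrate], M = [[cos d, i sin d / n], [i n sin d, cos d]]
    b = cmath.cos(delta) + 1j * cmath.sin(delta) * n_substrate / n_film
    c = 1j * n_film * cmath.sin(delta) + cmath.cos(delta) * n_substrate
    r = (n_ambient * b - c) / (n_ambient * b + c)
    return abs(r) ** 2


def fitness(params, wavelengths_um, measured, n_substrate):
    """Mean squared difference between measured and calculated reflectance."""
    b1, c1, b2, c2, thickness_um = params  # same order as P_i in the text
    total = 0.0
    for lam, r_exp in zip(wavelengths_um, measured):
        n_film = sellmeier_index(lam, b1, c1, b2, c2)
        r_cal = single_layer_reflectance(lam, n_film, thickness_um, n_substrate)
        total += (r_exp - r_cal) ** 2
    return total / len(wavelengths_um)


# Synthetic "measurement" generated from known placeholder parameters.
N_SUBSTRATE = 3.5  # assumption: constant, real substrate index
wavelengths = [0.45 + 0.05 * k for k in range(9)]  # 0.45 to 0.85 micrometres
true_params = (0.9, 0.1, 0.1, 0.05, 0.10)
measured = [
    single_layer_reflectance(lam, sellmeier_index(lam, *true_params[:4]),
                             true_params[4], N_SUBSTRATE)
    for lam in wavelengths
]
print(fitness(true_params, wavelengths, measured, N_SUBSTRATE))  # 0.0
print(fitness((0.9, 0.1, 0.1, 0.05, 0.12), wavelengths, measured, N_SUBSTRATE) > 0.0)  # True
```

A function of this shape, taking the five parameters and returning a scalar misfit, is what an optimizer such as Rao-1 would be asked to minimize within the chosen search range.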
Thin films of frequently used optical ARC materials, namely magnesium fluoride
(MgF2), aluminum oxide (Al2O3), and silicon dioxide (SiO2), are grown separately
on an indium phosphide (InP) substrate at 100 °C under high vacuum conditions
($10^{-6}$ mbar). The deposition is carried out with a 3 kW electron beam
evaporation unit equipped with a 180° bend electron beam gun. The film’s
deposition rate and thickness are monitored during growth by a quartz crystal oscillator
integrated within the chamber. A radiant heater mounted within the chamber is used
to heat the substrate.
To our knowledge, it is the first time that the Rao-1 algorithm has been used to
evaluate the refractive index and thickness of a thin film. However, before using an
algorithm in a practical application, it is critical to assess its efficiency. Keeping this
in mind, standard ellipsometric measurements are utilized as an experimental verifi-
cation tool. The experimentally obtained reflectivity spectra of thin films are matched
with theoretically calculated ones with the help of the transfer matrix method using
the self-developed program which uses Rao-1 algorithm to determine the thickness
and refractive index for all thin films. The thickness and refractive index values esti-
mated by Rao-1 are compared with ellipsometric measurements of the same samples.
The results obtained for various thin film samples are analyzed and discussed in the
subsequent subsections.
[Figure: single-layer MgF2 on InP substrate. Panels (a) and (b) compare the experimental/ellipsometry curves with the curves from the optimization algorithm over wavelength λ from 0.45 to 0.85 μm; panel (c) shows fitness value evolution with iteration.]
Fig. 1 a Fitted reflectivity spectrum for MgF2. b Wavelength-dependent refractive index for MgF2.
c Fitness function evaluation with iteration for MgF2
[Figure: single-layer Al2O3 on InP substrate. Panel (a): reflectivity (%) versus wavelength λ (μm), experimental curve and fitted curve using the optimization algorithm. Panel (b): refractive index (n) versus wavelength λ (μm), by ellipsometry and by the optimization algorithm. Panel (c): fitness value versus number of iterations (0 to 1000).]
Fig. 2 a Fitted reflectivity spectrum of Al2O3. b Wavelength-dependent refractive index for Al2O3.
c Fitness function evaluation with iteration for Al2O3
measurement (blue dashed line) for the Al2O3 film is shown in Fig. 2b. Although the
difference between the refractive index values produced by Rao-1 and those obtained by
ellipsometry is considerably larger for Al2O3 than for MgF2, it may still be
within an acceptable tolerance for most optical coating applications. Figure 2c shows
the evolution of the fitness function with iterations, with enlarged sections.
[Figure: a single layer SiO2 on InP substrate. Panel (a): experimental curve and fitted
curve using optimization algorithm versus wavelength λ (μm). Panel (b): refractive index
by ellipsometry and by optimization algorithm versus wavelength λ (μm). Panel (c): fitness
value evolution with iteration, fitness value versus no. of iterations.]
Fig. 3 a Fitted reflectivity spectrum of SiO2. b Wavelength-dependent refractive index for SiO2.
c Fitness function evaluation with iteration for SiO2
were collected at a low angle of incidence, roughly 5° off normal in our case, while
the transfer matrix method used to compute reflectivity assumed perfectly
normal incidence. Due to the mechanical limitations of a real setup,
it is challenging to make the incidence angle perfectly normal. This contributes
a minor error to the fitting, particularly when calculating the dispersive refractive
index. This error may be minimized further in two ways: firstly, by improving the
reflectivity measurement setup if practically possible and, secondly, by using
the transfer matrix method formulated for a non-normal angle of incidence. In addition
to the above-mentioned measurement issues, the transfer matrix approach also
presumes that the films are completely homogeneous and that the interfaces are sharp.
In the real world, however, sharp interfaces are nearly impossible to achieve with most
of the available deposition techniques, which may also add variation to the computed
reflectivity values. Ultimately, this may have a greater impact on the refractive index
values than on the thickness value because of the intrinsic nonlinearity and the omission
of higher-order terms from the Sellmeier dispersion equation.
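The size of the error introduced by a small off-normal angle can be gauged with a minimal sketch of the single-layer reflectance of a non-absorbing film, which is the one-layer case of the transfer matrix calculation. The function name, the s-polarization default and the numerical values in the usage note (film index 1.38, real substrate index 3.5) are illustrative assumptions and are not taken from the measurements above.

```python
import cmath
import math


def single_layer_reflectance(wavelength_um, thickness_um, n_film, n_sub,
                             theta_deg=0.0, n_amb=1.0, pol="s"):
    """Reflectance of one non-absorbing film on a substrate (hypothetical helper).

    Uses the Airy summation, equivalent to a one-layer transfer matrix.
    All refractive indices are assumed real and wavelength-independent here.
    """
    theta0 = math.radians(theta_deg)
    sin0 = n_amb * math.sin(theta0)           # Snell invariant n*sin(theta)
    cos0 = math.cos(theta0)
    cos1 = cmath.sqrt(1 - (sin0 / n_film) ** 2)
    cos2 = cmath.sqrt(1 - (sin0 / n_sub) ** 2)

    def fresnel(na, ca, nb, cb):
        # Fresnel amplitude coefficient for the interface a -> b
        if pol == "s":
            return (na * ca - nb * cb) / (na * ca + nb * cb)
        return (nb * ca - na * cb) / (nb * ca + na * cb)

    r01 = fresnel(n_amb, cos0, n_film, cos1)
    r12 = fresnel(n_film, cos1, n_sub, cos2)
    # Phase thickness of the film at this angle
    beta = 2 * math.pi * n_film * thickness_um * cos1 / wavelength_um
    phase = cmath.exp(-2j * beta)
    r = (r01 + r12 * phase) / (1 + r01 * r12 * phase)
    return abs(r) ** 2
```

Calling `single_layer_reflectance(0.6, 0.1, 1.38, 3.5, theta_deg=0.0)` and again with `theta_deg=5.0` gives two values whose difference indicates how much the normal-incidence assumption matters for a given film; a fit that passes the measured angle instead of zero removes that part of the mismatch.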
6 Conclusion
References
1. Kheraj VA, Panchal CJ, Patel PK, Arora BM, Sharma TK (2007) Optimization of facet coating
for highly strained InGaAs quantum well lasers operating at 1200 nm. Opt Laser Technol
39:1395–1399
2. Han L, Zhao H (2014) Simulation analysis of GaN microdomes with broadband omnidirectional
antireflection for concentrator photovoltaics. J Appl Phys 115:133102
3. Young NG, Perl EE, Farrell RM, Iza M, Keller S, Bowers JE, Nakamura S, DenBaars SP,
Speck JS (2014) High-performance broadband optical coatings on InGaN/GaN solar cells for
multijunction device integration. Appl Phys Lett 104:163902
4. Perl EE, McMahon WE, Bowers JE, Friedman DJ (2014) Design of anti-reflective nanostruc-
tures and optical coatings for next-generation multijunction photovoltaic devices. Opt Exp OE.
22:A1243–A1256
5. Hamden ET, Greer F, Hoenk ME, Blacksberg J, Dickie MR, Nikzad S, Christopher Martin D,
Schiminovich D (2011) Ultraviolet antireflection coatings for use in silicon detector design.
Appl Opt AO 50:4180–4188
6. Mancuso M, Beeman JW, Giuliani A, Dumoulin L, Olivieri E, Pessina G, Plantevin O, Rusconi
C, Tenconi M (2014) An experimental study of anti-reflective coatings in Ge light detectors
for scintillating bolometers. EPJ Web Conf 65:04003
7. Cho J-Y, Byeon K-J, Lee H (2011) Forming the graded-refractive-index antireflection layers
on light-emitting diodes to enhance the light extraction. Opt Lett OL 36:3203–3205
8. Zibik EA, Ng WH, Revin DG, Wilson LR, Cockburn JW, Groom KM, Hopkinson M (2006)
Broadband 6μm<λ<8μm superluminescent quantum cascade light-emitting diodes. Appl Phys
Lett 88:121109
9. Wang J, Li LT, Xu W, Yu R, Ramalingam J, Wu Z, Zhu W, Li X (2005) Ultrabroad-bandwidth
and high-power superluminescent light emitting diodes. In: Coherence domain optical methods
and optical coherence tomography in biomedicine IX. SPIE, pp 531–539
10. Deng C, Ki H (2016) Pulsed laser deposition of refractive-index-graded broadband antireflec-
tion coatings for silicon solar cells. Sol Energy Mater Sol Cells 147:37–45
11. Zhang J-C, Xiong L-M, Fang M, He H-B (2013) Wide-angle and broadband graded-refractive-
index antireflection coatings. Chinese Phys B 22:044201
12. Tompkins HG, Baker JH, Smith S, Convey D (2000) Spectroscopic ellipsometry and
reflectometry: a user’s perspective
13. Vedam K, Kim SY (1989) Simultaneous determination of refractive index, its dispersion
and depth-profile of magnesium oxide thin film by spectroscopic ellipsometry. Appl Opt AO
28:2691–2694
14. Dobrowolski JA, Ho FC, Waldorf A (1983) Determination of optical constants of thin film
coating materials based on inverse synthesis. Appl Opt AO 22:3191–3200
15. Caliendo C, Verona E, Saggio G (1997) An integrated optical method for measuring the
thickness and refractive index of birefringent thin films. Thin Solid Films 292:255–259
16. Salvi J, Barchiesi D (2014) Measurement of thicknesses and optical properties of thin films
from surface plasmon resonance (SPR). Appl Phys A 115:245–255
17. Torres-Costa V, Martín-Palma RJ, Martínez-Duart JM (2004) Optical constants of porous
silicon films and multilayers determined by genetic algorithms. J Appl Phys 96:4197–4203
18. Patel SJ, Kheraj V (2013) Determination of refractive index and thickness of thin-film from
reflectivity spectrum using genetic algorithm. AIP Conf Proc 1536:509–510
19. Miloua R, Kebbab Z, Chiker F, Sahraoui K, Khadraoui M, Benramdane N (2012) Determination
of layer thickness and optical constants of thin films by using a modified pattern search method.
Opt Lett OL 37:449–451
20. Tabet MF, McGahan WA (1999) Thickness and index measurement of transparent thin films
using neural network processed reflectance data. J Vac Sci Technol, A 17:1836–1839
21. Gao L, Lemarchand F, Lequime M (2011) Application of global optimization algorithms for
optical thin film index determination from spectro-photometric analysis. In: Advances in optical
thin films IV. SPIE, pp 65–81
88 B. H. Gevariya et al.
22. Patel SJ, Jariwala A, Panchal CJ, Kheraj V (2020) Determination of thickness and optical
parameters of thin films from reflectivity spectra using teaching-learning based optimization
algorithm. J Nano Electron Phys
23. Patel SJ et al (2017) A novel teaching-learning based optimization approach for design of
broad-band anti-reflection coatings. Swarm Evol Comput 34:68–74
24. Rao R (2020) Rao algorithms: three metaphor-less simple algorithms for solving optimization
problems. Int J Ind Eng Comput 11:107–130
25. Tatian B (1984) Fitting refractive-index data with the sellmeier dispersion formula. Appl Opt
AO 23:4477–4485
26. Kheraj VA, Panchal CJ, Desai MS, Potbhare V (2009) Simulation of reflectivity spectrum for
non-absorbing multilayer optical thin films. Pramana J Phys 72:1011–1022
Chapter 8
Depth Maps-Based 3D Convolutional
Neural Network and 3D Skeleton
Information with Time Sequence
for HAR
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 89
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_8
90 H. G. Hui et al.
Fig. 1 Structure based on 3D convolutional neural network and skeleton joint information for
human activity recognition
method and the automatic deep learning feature-based method. In the traditional
method, the human activity recognition system relies on handcrafted features designed to
represent human activity. Handcrafted methods such as Space-Time Interest
Point (STIP), Optical Flow, Trajectory, Silhouette and Histogram of Oriented
Gradients (HOG), which extract relevant activity information from video sequences, are the
well-known techniques in this research topic. The automatic feature-learning
method feeds either a 2D colour video sequence or a 3D depth video sequence as a
raw sequence into a network that extracts features automatically. Liu et al. [3] used
a three-dimensional convolutional neural network (3DCNN) to extract a high-level
feature map, which was put into an SVM to classify different classes with very promising
accuracy. In this paper, we apply a 3DCNN to extract a high-level feature map and then merge
it with skeleton point geometric information, where distances, angles and time sequence
information form feature vectors that are put into an SVM to classify different activities. The
structure of the human activity recognition system with 3DCNN and skeleton
information is shown in Fig. 1. Section 2 reviews recent related work on human
activity recognition, Sect. 3 presents the proposed methodology and its implementation,
and Sect. 4 describes the benchmark dataset MSR-Action3D and the experimental results.
Moreover, we also analyse the experimental results based on the hybrid feature
vector on the MSR-Action3D dataset. Finally, Sect. 5 gives the conclusion and future
work.
2 Related Work
With advances in computing methods and computing hardware, deep learning with
high-performance approaches has become a hot topic in
several research domains such as speech recognition, autopilot, computer
vision, natural language processing and so on.
In the traditional method, global or local feature extraction and representation
still have great significance in human activity recognition. Well-designed
feature vectors can still distinguish the types of activities with very promising
accuracy. Tripathi et al. [4] described the system structure and its challenges in human
activity recognition, such as illumination changes, the shadows of objects, partial or
full object occlusions and noise in the image. They also discussed all the steps of human
activity recognition, from dataset to classification techniques. Wu et al.
[5] reviewed current state-of-the-art deep learning approaches
on RGB-D, single-view and multi-view datasets. The different viewpoint
feature representations and deep learning approaches give us new ideas for the feature
extraction steps. Zhang et al. [6] highlighted the advances in human
activity recognition systems: the global and local feature extraction, representation and
classification methods. Boualia et al. [7] discussed human activity recognition
methodologies and their advantages and disadvantages. In particular, they divided feature
representation into local (depth maps-based, skeleton-based), global (space-time
volume, frequency) and modelling (simple blob, 2D model, 3D model) categories. Dhiman et al.
[8] summarized the various existing handcrafted and deep learning approaches in
human activity recognition with the two types of 2D and 3D datasets. Pareek et al.
[9] discussed the types of machine learning approaches, the deep learning
techniques and the characteristics of public datasets used for human activity recognition
systems. Hbali et al. [10] presented a novel skeleton-based technique to describe the
spatiotemporal features of human activity. Ghazal et al. [11] extracted
skeleton information (motion and shape features) using the OpenPose library for
human activity classification. Dwivedi et al. [12] proposed a new skeleton-based
feature (Orientation Invariant Skeleton Feature) for human activity recognition. Yadav
et al. [13] proposed a deep learning network based on a convolutional long short-term
memory (ConvLSTM) network for skeleton-based activity recognition and fall detection. Jalal et al.
[19] proposed novel multi-fused spatiotemporal features from continuous sequences
of depth maps and spatiotemporal skeleton point information for human activity
recognition. Fakhredanesh et al. [20] put forward an unsupervised activity
change detection approach to detect action changes in time sequences.
Because the activity in a video dataset always changes across frames, we
consider the time sequences to extract relevant information on frame changes.
In deep learning methods, the deep neural network (DNN) automatically extracts
features from video sequences or images and then passes the feature map into fully
connected layers or a traditional machine learning classifier. Pham et al. [14]
presented an overview of current deep learning methods for human activity
recognition systems, covering the branches of the well-known deep learning
models and their advantages and limitations. Aradhya et al.
[31] combined the well-known techniques of principal component analysis (PCA) and a
probabilistic neural network (PNN) in a character recognition system. Khan et al. [15]
offered a hybrid feature based on the silhouette (body shape) and deep learning
feature maps. Khan et al. [16] developed a hybrid model that combines a convolutional
neural network (CNN) with a long short-term memory network. Tran et al. [17] and Ji et al. [18]
proposed 3D CNNs, a simple and efficient deep learning approach for learning
spatiotemporal features from supervised video data. Compared with 2D
convolutional neural networks, a 3D CNN considers one more dimension of
information, the time sequence, from which it extracts the relevant action across frames.
Figure 2 illustrates 2DCNN and 3DCNN convolution on images or frame sequences. Based
on the 3DCNN automatic convolutional feature map and 3D skeleton information,
we combine the two types of feature representation to classify different activities on
the MSR-Action3D benchmark dataset.
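The difference between 2D and 3D convolution can be made concrete with a minimal, unoptimized sketch. It assumes stride 1, no padding and a single channel, none of which is stated in the text, and the function name `conv3d_valid` is hypothetical. As in most CNN libraries, the operation is written as a cross-correlation.

```python
def conv3d_valid(volume, kernel):
    """Naive single-channel 3D convolution (stride 1, no padding).

    volume is indexed as volume[t][y][x]; kernel as kernel[dt][dy][dx].
    The kernel slides along time as well as along the two spatial axes,
    so each output value mixes information from kt consecutive frames.
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out
```

A kernel with temporal extent 1 reduces this to an ordinary 2D convolution applied to every frame independently, which is why a 2D CNN cannot capture motion between frames without extra machinery.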
Compared with the traditional RGB video dataset, RGB-D video depth maps have
more advantages, especially with respect to lighting changes and the collection of
dimension information. Depth maps include one more dimension, the depth distance from the
camera to the object, and require less computation than RGB video images. Compared to
single-feature vectors, multi-feature fusion approaches extract information about more
aspects of the activities in a video. Among traditional methods, Kumar et al. [21] combined
optical flow and texture information to extract feature vectors for activity
classification. In our paper, we propose an automatic deep learning feature-based method
in which 3D convolutional networks extract feature maps from video sequences. We then
consider 3D skeleton geometric information: key point distances, relevant
point angle changes and novel time sequence difference information across activity
frames. The hybrid feature vector (feature maps and 3D skeleton information) is
put into a multi-class support vector machine [22] to discriminate different activities.
In the HAR system, we aim for a system that automatically extracts
precise information from video data. In data pre-processing, we process the
data to suit the feature extraction model, using steps such as noise removal,
foreground detection, image segmentation and transform approaches. In this paper,
we focus on feature extraction and feature representation to obtain well-designed
feature vectors. First, our proposed method uses a two-layer 3D CNN framework
to extract the spatiotemporal feature maps from the raw data automatically. Before
running the network, we resize the frames to 128*128*38 (height * width *
time) and crop the foreground object from the full video frame. The raw data is
put into the first convolution layer and maxpooling layer, followed by the second
convolution layer and maxpooling layer. Finally, three fully connected layers produce
a high-level feature map. The kernel size of the first convolution layer (CL1)
is 7*6*6 (6*6 is the spatial dimension and 7 is the temporal dimension) and that of the
second layer is 5*5*5. In the following 3D maxpooling layers, the kernel size is
2*2*2, which down-samples to reduce the dimension and redundant information. Finally,
the fully connected layers have 63,488, 1024 and 128 units, and a 128-dimensional feature
map is produced. The second type of feature representation is Kinect-based 3D
skeleton information. The distance, angle and time sequence changes across
frames are calculated from skeleton joint key point 7 (hip centre) to the others.
The distance feature vector consists of the 7–(1, 2, 8, 9, 10, 11, 5, 6, 14, 15, 16, 17)
key point distance information. The angle feature vectors are 3–(8, 9, 10, 11), 7–
(14, 15, 16, 17) and 8–10, 9–11, 14–16, 15–17. Finally, we consider the skeleton
point changes in every three frames with time sequences. Figures 3 and 4 show the
feature value calculation of the skeleton key points. The feature vector, which combines
the deep learning feature maps from the depth maps with the corresponding 3D skeleton
joint information, is put into the SVM classifier.
Fig. 4 Skeleton joint angle information and time sequences key point changes on frame difference
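The three kinds of skeleton features described above (joint distances, joint angles and changes over every three frames) can be sketched as follows. This is a minimal illustration under stated assumptions: joints are given as (x, y, z) tuples keyed by joint index, the angle is measured at a vertex joint between the directions to two other joints, and the temporal feature is taken to be the displacement of a joint between frame i and frame i + 3. The function names are hypothetical and the exact definitions used in the paper may differ.

```python
import math


def joint_distance(a, b):
    """Euclidean distance between two 3D joints given as (x, y, z)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def joint_angle(vertex, p, q):
    """Angle in degrees at `vertex` between the directions to p and q."""
    v1 = [pi - vi for pi, vi in zip(p, vertex)]
    v2 = [qi - vi for qi, vi in zip(q, vertex)]
    n1 = math.sqrt(sum(c * c for c in v1))
    n2 = math.sqrt(sum(c * c for c in v2))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0  # degenerate case: a joint coincides with the vertex
    cos_angle = sum(a * b for a, b in zip(v1, v2)) / (n1 * n2)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle))


def frame_difference(frames, joint, step=3):
    """Displacement of one joint between frame i and frame i + step.

    `frames` is a list of dicts mapping joint index -> (x, y, z).
    step=3 is an assumed reading of "every three frames".
    """
    return [joint_distance(frames[i + step][joint], frames[i][joint])
            for i in range(len(frames) - step)]
```

Concatenating such values for the listed joint pairs with the 128-dimensional 3D CNN output would give one hybrid vector per sample, which is the form a multi-class SVM expects.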
We evaluated the accuracy of the hybrid feature vector system on a publicly available
benchmark dataset. The MSR-Action3D dataset [23, 24] is an RGB-D action dataset
captured by a depth camera. It is composed of 10 different people performing
20 actions two or three times. The 20 activity classes are bend, draw a circle, draw tick, draw
x, forward kick, forward punch, golf swing, hammer, hand catch, hand clap, high
arm wave, high throw, horizontal arm wave, jogging, pick up and throw, side kick,
side-boxing, tennis serve, tennis swing and two hand wave. It
is a challenging dataset because it contains very similar activities, such as forward punch
and hammer, or bend and pick up and throw. All video sequences were recorded from a
fixed viewpoint camera, and subjects face the camera while performing actions.
The background is removed in post-processing. In our experiments, we
divide the activities into three groups AS1, AS2 and AS3 (Table 1). The skeleton joint data
correspond to the MSR-Action3D depth maps and are captured by the same device.
Each frame has 20 key points in (x, y, z, c). Figure 3 gives the details of the skeleton
coordinates.
Table 1 MSR-Action3D was divided into three subsets (action subset 1, 2, 3) in the experiment
AS1                  AS2              AS3
Bend                 Draw tick        Forward kick
Hammer               Draw X           Golf swing
Hand clap            Forward tick     High throw
High arm wave        Hand catch       Jogging
High throw           High arm wave    Side kick
Pick up and throw    Side-box         Tennis serve
Tennis serve         Two hand wave    Tennis swing
We randomly select 1/3 of the samples for testing and use the rest for training on the
MSR-Action3D dataset to validate the proposed method. The MSR-Action3D dataset was
divided into AS1, AS2 and AS3. Tables 2, 3 and 4 give the classification results on
AS1, AS2 and AS3. On AS1, the accuracy of the activity Hammer is 87%, and the accuracies
of the activities Bend, Hand clap, High arm wave, High throw, Pick up and throw, and
Tennis serve are all 100%. The average accuracies on the subsets AS1, AS2 and AS3 are
98.1, 92.8 and 94.7%, respectively. Table 5 gives an accuracy comparison of human
activity recognition systems on the MSR-Action3D dataset. Kao et al. [27] put
forward a skeleton-based graph structure feature representation for human activity
recognition. Chen et al. [29] presented a novel structure of depth motion
map features from depth sequences. Liu et al. [3] extracted spatial–temporal
features from depth sequences using 3D2CNN. Our method combines 3D CNN feature maps
and skeleton joint information with time sequences. Even though we have already
obtained good results on the dataset, we still need to apply our method to more complex
datasets and up-to-date deep learning techniques (Table 6).
Table 6 Accuracy comparison of the HAR system on the MSR-Action3D dataset, randomly selecting
one-third of the samples for testing and the rest for training
Random 1/3 as testing data
        [23]    [28]     [3]      Ours
AS1     93.4    98.61    92.78    98.1
AS2     92.9    97.92    97.06    92.8
AS3     96.3    94.93    98.59    94.7
Avg.    94.2    97.15    98.14    95.2
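The evaluation protocol behind the table (a random one-third of the samples held out for testing, accuracy averaged over the three subsets) can be sketched as below. The fixed seed and the helper names are assumptions made for reproducibility of the illustration; the paper does not state how the random selection was implemented.

```python
import random


def split_one_third(samples, seed=0):
    """Shuffle the samples and hold out one third of them for testing.

    Returns (train, test). The seed is an illustrative assumption.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = len(shuffled) // 3
    return shuffled[n_test:], shuffled[:n_test]


def mean_accuracy(per_subset):
    """Unweighted mean of per-subset accuracies given as a dict."""
    values = list(per_subset.values())
    return sum(values) / len(values)


# Per-subset accuracies of the proposed method, as listed in the table.
ours = {"AS1": 98.1, "AS2": 92.8, "AS3": 94.7}
average = round(mean_accuracy(ours), 1)
```

Because a single random split can be favourable or unfavourable by chance, repeating the split with several seeds and reporting the spread would make the comparison more robust.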
5 Conclusion
In this paper, we have proposed a 3D convolutional neural network for depth map
video sequences that extracts high-level feature maps automatically through two
3DCNN layers and three fully connected layers. At the same time, we calculate the
corresponding skeleton joint distances from the hip centre to the other joints, excluding
the feet, hands and head, and then calculate action performance angles from six centre
points, namely the shoulder centre, the hip centre, the two elbows and the two knees,
to compute angle changes. Finally, we proposed a novel 3D skeleton feature vector,
namely the time sequence of skeleton joint changes across frames. The hybrid of
automatic deep learning feature maps and skeleton joint information is applied to the
MSR-Action3D dataset. The experimental results show that our proposed method achieves
competitive results for classifying different activities when compared with other
existing approaches. Several actions reached 100% accuracy. In future work, we first
wish to validate the proposed method with different validation approaches, such as
leave-one-out (LOO) and cross-validation, and then apply the model to a more
complex dataset. Meanwhile, we will also study deep learning techniques for time
sequence features to classify human activity.
References
1. Aggarwal JK, Ryoo MS (2011) Human activity analysis: a review. ACM Comput Surv (CSUR)
43(3):1–43
2. Ali HH, Moftah HM, Youssif AA (2018) Depth-based human activity recognition: a compar-
ative perspective study on feature extraction. Future Comput Inform J 3(1):51–67
3. Liu Z, Zhang C, Tian Y (2016) 3D-based deep convolutional neural network for action recog-
nition with depth sequences. Image Vis Comput 55:93–100
4. Tripathi RK, Jalal AS, Agrawal SC (2018) Suspicious human activity recognition: a review.
Artif Intell Rev 50(2):283–339
5. Wu D, Sharma N, Blumenstein M (2017, May) Recent advances in video-based human action
recognition using deep learning: a review. In: 2017 international joint conference on neural
networks (IJCNN). IEEE, pp 2865–2872
6. Zhang S, Wei Z, Nie J, Huang L, Wang S, Li Z (2017) A review on human activity recognition
using vision-based method. J Healthcare Eng
7. Boualia SN, Amara NEB (2019, June) Pose-based human activity recognition: a review. In:
2019 15th international wireless communications and mobile computing conference (IWCMC).
IEEE, pp 1468–1475
8. Dhiman C, Vishwakarma DK (2019) A review of state-of-the-art techniques for abnormal
human activity recognition. Eng Appl Artif Intell 77:21–45
9. Pareek P, Thakkar A (2021) A survey on video-based human action recognition: recent updates,
datasets, challenges, and applications. Artif Intell Rev 54(3):2259–2322
10. Hbali Y, Hbali S, Ballihi L, Sadgal M (2018) Skeleton-based human activity recognition for
elderly monitoring systems. IET Comput Vis 12(1):16–26
11. Ghazal S, Khan US, Mubasher Saleem M, Rashid N, Iqbal J (2019) Human activity recognition
using 2D skeleton data and supervised machine learning. IET Image Process 13(13):2572–2578
12. Dwivedi N, Singh DK, Kushwaha DS (2020) Orientation invariant skeleton feature (OISF): a
new feature for human activity recognition. Multimedia Tools Appl 79(29):21037–21072
13. Yadav SK, Tiwari K, Pandey HM, Akbar SA (2022) Skeleton-based human activity recognition
using ConvLSTM and guided feature learning. Soft Comput 26(2):877–890
14. Pham HH, Khoudour L, Crouzil A, Zegers P, Velastin SA (2022) Video-based human action
recognition using deep learning: a review. arXiv preprint arXiv:2208.03775
15. Khan MA, Zhang YD, Allison M, Kadry S, Wang SH, Saba T, Iqbal T (2021) A fused het-
erogeneous deep neural network and robust feature selection framework for human actions
recognition. Arab J Sci Eng 1–16
16. Khan IU, Afzal S, Lee JW (2022) Human activity recognition via hybrid deep learning based
model. Sensors 22(1):323
17. Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal fea-
tures with 3d convolutional networks. In: Proceedings of the IEEE international conference on
computer vision, pp 4489–4497
18. Ji S, Xu W, Yang M, Yu K (2012) 3D convolutional neural networks for human action recog-
nition. IEEE Trans Pattern Anal Mach Intell 35(1):221–231
19. Jalal A, Kim YH, Kim YJ, Kamal S, Kim D (2017) Robust human activity recognition from
depth video using spatiotemporal multi-fused features. Pattern Recogn 61:295–308
20. Fakhredanesh M, Roostaie S (2020) Action change detection in video based on HOG. J Electr
Comput Eng Innov (JECEI) 8(1):135–144
21. Kumar SS, John M (2016, October) Human activity recognition using optical flow based feature
set. In: 2016 IEEE international Carnahan conference on security technology (ICCST). IEEE,
pp 1–5
22. Weston J, Watkins C (1998) Multi-class support vector machines. Technical Report CSD-TR-
98-04, Department of Computer Science, Royal Holloway, University of London, May, pp
98–04
23. Li W, Zhang Z, Liu Z (2010, June) Action recognition based on a bag of 3d points. In: 2010
IEEE computer society conference on computer vision and pattern recognition-workshops.
IEEE, pp 9–14
24. Zhang J, Li W, Ogunbona PO, Wang P, Tang C (2016) RGB-D-based action recognition datasets:
a survey. Pattern Recogn 60:86–105
25. Paoletti G, Cavazza J, Beyan C, Del Bue A (2021, January) Subspace clustering for action
recognition with covariance representations and temporal pruning. In: 2020 25th international
conference on pattern recognition (ICPR). IEEE, pp 6035–6042
26. Zhao R, Xu W, Su H, Ji Q (2019) Bayesian hierarchical dynamic model for human action
recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern
recognition, pp 7733–7742
27. Kao JY, Ortega A, Tian D, Mansour H, Vetro A (2019, September) Graph based skeleton mod-
eling for human activity analysis. In: 2019 IEEE international conference on image processing
(ICIP). IEEE, pp 2025–2029
28. Ni B, Pei Y, Moulin P, Yan S (2013) Multilevel depth and image fusion for human activity
detection. IEEE Trans Cybern 43(5):1383–1394
29. Chen C, Zhang B, Hou Z, Jiang J, Liu M, Yang Y (2017) Action recognition from depth
sequences using weighted fusion of 2D and 3D auto-correlation of gradients features. Multim
Tools Appl 76(3):4651–4669
30. Bulbul MF, Islam S, Azme Z, Pareek P, Kabir M, Ali H (2022) Enhancing the performance of
3D auto-correlation gradient features in depth action classification. Int J Multimedia Inform
Retrieval 11(1):61–76
31. Aradhya VM, Niranjan SK, Kumar GH (2010) Probabilistic neural network based approach
for handwritten character recognition. Special Issue of IJCCT 1(2):3
Chapter 9
Deep Sea Debris Detection Using
YOLOIncep Network
1 Introduction
Marine environments everywhere, from shallow waters to the deep sea, are increasingly
clogged with debris. This issue exists in rivers and other bodies of water as
well. Marine debris is typically composed of difficult-to-degrade components, which
persist in the ecosystem. This debris harms the ecosystem and causes serious water
pollution over time. As a result, detecting and addressing it as soon as
possible is essential. The main problem with detecting marine debris is that it loses
its original shape underwater due to the high pressure and temperature of its
surroundings. It is also difficult to obtain a detailed underwater debris data
set, as only limited images are available and there has not been much emphasis on
the deep sea floor debris detection field. Further, there is a lot of similarity between
classes and a lot of diversity within a single class in deep sea debris data sets. This
poses yet another challenge that needs to be overcome in order to get better
performance. Currently, with the alarming increase of sea water pollution, many unmanned
vehicles [1] are being sent to clean the polluted water. But without effective
detection algorithms, an unmanned vehicle may mistake a bio-organism for debris or vice
versa. With the development of computer vision technology, it has become possible
to augment the available data sets and to improve on previously obtained results.
Recent attempts have used skip connections in their models to improve the
performance metrics, as seen in networks such as ResNet and DenseNet. Rather than
using a typical convolution layer, [2] proposed DenseNet with DeepResidual channel
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 101
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_9
102 J. Sudaroli Sandana et al.
2 Related Works
One of the recent works in deep sea debris detection is a novel network called
Shuffle-Xception [5]. Considering the intra-class diversity and inter-class similarity [6]
found in deep sea debris, this network was proposed based on the Xception architecture.
This architecture initially uses separable convolution procedures to extract more
features and sophisticated characteristics from the deep sea data sets. In classical
convolution, by contrast, the convolutional layer learns the spatial and channel
dimensions of the feature map jointly.
Compared to other models, YOLO is a highly lightweight object detection and
localization model that recognizes objects with higher accuracy and recall rates. The
initial You Only Look Once (YOLO) [7] model was proposed by Joseph Redmon
et al. in 2015. Until then, RCNN models were the most widely used object detection
models. Despite being accurate, the RCNN family of models was slow since they
needed a multi-step process to find the ideal region for the bounding box, categorize
these regions, and then refine the outcome using postprocessing. YOLO was created
to replace multistage object detection with a single stage to enhance performance. The
unified detection strategy used by the YOLO model, which merges several elements
of object detection into a single feed-forward neural network, is the basis of its
primary operation.
The YOLO model divides the input image into a grid of cells and evaluates the
probability that each cell contains an object. Next, the algorithm forms
a single object by combining nearby cells with high probability. YOLO employs
non-max suppression (NMS) to eliminate low-value predictions:
lower-probability bounding boxes that overlap a higher-probability box are suppressed
in favor of the higher-probability one.
The model is also trained by comparing the center of each identified object to the
ground truth. Because bounding boxes are learned entirely from data, YOLOv1 performs
poorly at localizing boxes. YOLOv1 received a few upgrades, which
prompted the release of YOLOv2 [8]. The second version introduced anchor boxes. As seen
in Fig. 1, anchor boxes are predetermined areas representing the idealized
placement of the objects to be detected. The intersection over union (IoU) ratio between the
predicted bounding box and the pre-defined anchor box is calculated. The IoU value
determines whether the likelihood of the detected object is high enough to generate
a prediction.
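The two operations named in this paragraph, IoU and non-max suppression, can be sketched in a few lines. This is a generic greedy NMS, not the exact routine of any particular YOLO release; boxes are assumed to be (x1, y1, x2, y2) corner tuples and the 0.5 threshold is an illustrative default.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-max suppression; returns indices of the boxes that are kept.

    Boxes are visited from highest to lowest score. A box is dropped when it
    overlaps an already kept (higher-scoring) box by more than the threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

For example, two heavily overlapping detections of the same piece of debris collapse to the one with the higher score, while a distant detection is unaffected.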
YOLOv3 [9] is an enhanced version of YOLO. Instead of fully
connected or pooling layers, YOLOv3 uses 75 convolutional layers to produce a
far more compact and lightweight model. It learns a variety of features swiftly
and effectively by combining residual blocks from the ResNet model with feature
pyramid networks (FPN). A feature pyramid network is an image feature extractor
that extracts features of various sizes, shapes, and kinds.
The inputs to the model are combined to help the model learn both local and
global features. Using logistic classifiers and activations, the YOLOv3 [10] class
predictions beat RetinaNet-50 and 101 in terms of accuracy. The foundation of
YOLOv3 is the DarkNet53 architecture. The base YOLO architecture used for
deep sea debris detection is YOLOv5, the latest version of YOLO. Although many
researchers have attempted to improve object detection performance using YOLO,
the inception network with its multiple kernel sizes motivates us to modify the backbone
of YOLO for deep sea debris detection.
104 J. Sudaroli Sandana et al.
3 Proposed Work
The inception network [11, 12] is one of the most efficient models and offers strong
performance. It reduces the computing resources required while expanding the
network’s depth and breadth. Because each convolution stage of the inception network
applies kernels of multiple scales, it removes the need to test repeatedly for the
optimal kernel size. As explained before, YOLO is a very versatile model and can be
combined with other existing models as its backbone. Many have attempted to use
models such as ResNet and VGG16 as the backbone. The advantages of YOLO and the
inception network mentioned above motivate us to implement Inception-Net as the
YOLO backbone for deep sea debris detection.
3.1 Methodology
Inspired by super resolution [13] techniques for improving classification and
localization performance, super resolution is applied as a preprocessing step to
improve the resolution of the deep sea debris images. The super resolution model
FuNIEGan [14] is used because it was designed for deep sea applications; its effect
on the images in the data set is shown in Fig. 2. The super-resolved images are then
fed into the proposed YOLOIncep model for classification and localization of debris
in the images.
9 Deep Sea Debris Detection Using YOLOIncep Network 105
Fig. 2 Images before (upper row) and after (lower row) super resolution
The proposed YOLOIncep architecture for deep sea debris detection is shown in
Fig. 3. YOLO with inception modules in its backbone is shown in Fig. 3a. The
proposed YOLOIncep network consists of three parts: backbone, neck, and head. The
backbone part of the proposed architecture includes the inception block, bottleneck
layer block, and spatial pyramid pooling (SPP) block.
The inception network used in the proposed work is shown in Fig. 3b. Since it has
multiple kernels with different scales, it relieves us from choosing the optimal kernel
size and can detect target objects of all sizes. The major advantage of the inception
network is its receptive field, which results from the different filter sizes. The
multi-scale kernels in the inception block extract distinctive features, and the
concatenation of these features increases the performance of the model. A bottleneck
CSP layer has fewer nodes than the preceding layers, so it yields a
reduced-dimensional representation of the input. Each bottleneck layer in YOLOv5
consists of three convolutions, as shown in Fig. 4. In addition, the number of output
channels from the bottleneck layer is determined by the expansion factor, which
reduces the number of channels in the successive layer of the model.
Spatial Pyramid Pooling (SPP) [15] is a pooling layer that enables a CNN to
function without a fixed-size input constraint. Typically, the SPP layer is used on top
of a convolution layer. The fully connected layers get fixed-length outputs from the
SPP layer. To minimize the requirement for initial cropping or warping, information
4 Experiments
The images used in the proposed work are obtained from the Japan Agency for
Marine-Earth Science and Technology (JAMSTEC) data set [16]
(https://www.godac.jamstec.go.jp/jedi/e/). Sample images from the data set are shown
in Fig. 5. The data set is open source and includes images of marine trash captured by
the deep sea submersibles “SHINKAI 6500” and “HYPER-DOLPHIN”. From this
publicly available data set, three classes of debris and non-debris are used for the
experimentation as defined below:
The super resolution technique was applied to the 256 × 256 images used for the
proposed work. Input sizes that are either too small or too large could result in data
loss, memory overflow, and more complicated calculations. Additionally, because
of memory constraints, if the input image scale is too large, the batch size is limited
(batch size = 1 or 2), which could lead to unreliable classification accuracy from the
network. After being super-resolved, each image is fed to the proposed YOLOIncep
model. The number of images in each category before data augmentation is given in
Table 1. The code was implemented in Python on Google Colab, which runs Python
3.7.13.
An optimizer is an algorithm that adjusts neural network properties such as weights
and learning rates. SGD stands for stochastic gradient descent; it is an iterative
technique for optimizing (here, minimizing) an objective function with sufficient
smoothness properties. The loss function used by YOLOv5 is GIoU, which stands for
generalized intersection over union [17]. GIoU is an improved version of IoU. IoU
does not tell us whether two non-overlapping shapes, A and B, are in the vicinity of
one another. In GIoU, an object C is introduced such that C is the smallest object
enclosing A and B. GIoU (Eq. 1) is defined as
\[ \mathrm{GIoU} = \mathrm{IoU} - \frac{|C \setminus (A \cup B)|}{|C|}. \tag{1} \]
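As a hedged illustration of Eq. 1, the sketch below evaluates GIoU for axis-aligned
boxes. The (x1, y1, x2, y2) box format and the function name are assumptions made for
this example, and it is not the loss code used by YOLOv5. Because A ∪ B lies inside
C, the term |C \ (A ∪ B)| is computed as |C| minus the union area.

```python
def giou(a, b):
    """Generalized IoU of two non-degenerate boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    # C is the smallest axis-aligned box enclosing both A and B.
    c_area = ((max(a[2], b[2]) - min(a[0], b[0]))
              * (max(a[3], b[3]) - min(a[1], b[1])))
    return inter / union - (c_area - union) / c_area


same = giou((0, 0, 2, 2), (0, 0, 2, 2))   # identical boxes
apart = giou((0, 0, 1, 1), (2, 0, 3, 1))  # disjoint boxes, IoU is zero
```

Unlike IoU, the value for the disjoint pair is negative and becomes more negative as
the boxes move further apart, which is what gives the loss a useful gradient.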
Precision is defined as the number of true positives divided by the total number
of positive predictions (i.e., the number of true positives plus the number of false
positives), as given in Eq. 2.
\[ \mathrm{Precision} = \frac{\text{true positive}}{\text{true positive} + \text{false positive}}. \tag{2} \]
Recall is the fraction of the actual instances of a class that are correctly identified
as the target object, as given in Eq. 3.
\[ \mathrm{Recall} = \frac{\text{true positive}}{\text{true positive} + \text{false negative}}. \tag{3} \]
The F1 score is a single statistic that combines precision and recall. It is the
harmonic mean of a classifier’s precision and recall, as given in Eq. 4. It is used
because, in machine learning models, there is usually a trade-off between precision
and recall.
\[ \mathrm{F1\ Score} = \frac{2(\mathrm{Precision} \times \mathrm{Recall})}{\mathrm{Precision} + \mathrm{Recall}}. \tag{4} \]
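Equations 2 to 4 can be checked with a few lines of Python. The counts below are
hypothetical and serve only to exercise the formulas; the last line applies Eq. 4 to
the precision and recall reported for YOLOIncep on the Plastic class in Table 2.

```python
def precision(tp, fp):
    """Eq. 2: true positives over all positive predictions."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Eq. 3: true positives over all actual positives."""
    return tp / (tp + fn)


def f1_score(p, r):
    """Eq. 4: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical counts for one class.
tp, fp, fn = 90, 10, 30
p = precision(tp, fp)
r = recall(tp, fn)
f1 = f1_score(p, r)

# Plastic class of YOLOIncep in Table 2: precision 98.9, recall 98.2.
plastic_f1 = round(f1_score(98.9, 98.2), 1)
```

The rounded value for the Plastic class agrees with the F1 score of 98.5 listed in
Table 2.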
The detection and localization performance of the proposed model is shown in Fig. 6.
The model locates each object of interest in the image and encloses it in a bounding
box. The label indicates the class to which the object belongs. Multiple objects can
be detected in a single image. The existing YOLO models are compared with the
proposed YOLO model with the inception backbone (YOLOIncep). YOLOv5 comes in
several variants that differ in depth, ranging from YOLOv5S (shallow YOLO) to
YOLOv5L (deep YOLO). The results obtained for the different YOLO versions are
given in Table 2. It is observed that modifying the YOLO backbone by adding
inception modules improves the performance of YOLO models. As mentioned earlier,
the filters of various scales used in the inception module enhance the performance of
the network because of its receptive field. Furthermore, YOLOv5 performs better
than YOLOv3 and YOLOv2. Debris and undersea life were identified with 69.6% and
77.2% accuracy, respectively, by Watanabe et al. [4] using YOLOv3, while YOLOv5
and YOLOIncep give much better results, as expected. Further, YOLO models can
identify and localize the target in a single pass, a feature not shown in other
models [5]. Multiple objects of different classes are identified in the same frame,
as shown in Fig. 6.
Table 2 Comparison of existing YOLO models with proposed YOLOIncep
Metric      Model       Plastic   BIO    ROV
Precision   YOLOv5x     97.3      95.7   93
Precision   YOLOv5m     84.6      91.4   78
Precision   YOLOv5s     83.3      74.1   59.7
Precision   YOLOIncep   98.9      91.7   93.4
Recall      YOLOv5x     96.9      95     87.5
Recall      YOLOv5m     88.2      72.7   66.5
Recall      YOLOv5s     85.9      74.1   59.7
Recall      YOLOIncep   98.2      95.7   93.1
F1 score    YOLOv5x     97.1      95.3   90.2
F1 score    YOLOv5m     86.4      81     71.8
F1 score    YOLOv5s     84.6      78     67.1
F1 score    YOLOIncep   98.5      93.7   93.2
6 Conclusion
The proposed work uses deep convolutional neural networks to examine the
classification and localization of deep sea debris. The YOLO network model with an
inception backbone is proposed in this paper and compared against other existing
YOLO models. YOLOv5x performs better than the earlier versions of YOLO, as
References
1. Fulton M, Hong J, Islam MdJ, Sattar J (2019) Robotic detection of marine litter using deep
visual detection models. In: 2019 international conference on robotics and automation (ICRA).
IEEE, pp 5752–5758
2. Jang D-W, Park R-H (2019) Densenet with deep residual channel-attention blocks for single
image super resolution. In: Proceedings of the IEEE/CVF conference on computer vision and
pattern recognition workshops, pp 0–0
3. Xue B, Huang B, Wei W, Ge C, Li H, Zhao N, Zhang H (2021) An efficient deep-sea debris
detection method using deep neural networks. IEEE J Sel Topics Appl Earth Observ Remote
Sens PP:1–1. https://doi.org/10.1109/JSTARS.2021.3130238
4. Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with
region proposal networks. In: Advances in neural information processing systems, vol 28
5. Kamat J, Gupta R (2021) Inception SN: an inception based convolutional neural network
for hyperspectral image classification. In: 2021 2nd global conference for advancement in
technology (GCAT), pp 1–4. https://doi.org/10.1109/GCAT52182.2021.9587504
6. Yin H, Cheng C (2010) Monitoring methods study on the great Pacific Ocean garbage patch.
In: 2010 international conference on management and service science. IEEE, pp 1–4
7. Watanabe J-I, Shao Y, Miura N (2019) Underwater and airborne monitoring of marine
ecosystems and debris. J Appl Rem Sens 13:1. https://doi.org/10.1117/1.JRS.13.044509
8. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object
detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition,
pp 779–788
9. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE
conference on computer vision and pattern recognition, pp 7263–7271
10. Lu Z, Lu J, Ge Q, Zhan T (2019) Multi-object detection method based on YOLO and
ResNet hybrid networks. In: 2019 IEEE 4th international conference on advanced robotics
and mechatronics (ICARM), pp 827–832. https://doi.org/10.1109/ICARM.2019.8833671
11. Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv preprint
arXiv:1804.02767
12. Chen J, Chen W, Zeb A, Yang S, Zhang D (2022) Lightweight inception networks for the
recognition and detection of rice plant diseases. IEEE Sens J 22(14):14628–14638.
https://doi.org/10.1109/JSEN.2022.3182304
13. Islam MdJ, Xia Y, Sattar J (2020) Fast underwater image enhancement for improved visual
perception. IEEE Robot Autom Lett 5(2):3227–3234.
https://doi.org/10.1109/LRA.2020.2974710
14. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich
A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer
vision and pattern recognition, pp 1–9
15. He K, Zhang X, Ren S, Sun J (2014) Spatial pyramid pooling in deep convolutional networks
for visual recognition. In: Computer vision—ECCV 2014. Springer International Publishing,
pp 346–361. https://doi.org/10.1007/978-3-319-10578-9_23
16. JAMSTEC (2009) JAMSTEC OFES (Ocean General Circulation Model for the Earth
Simulator) Dataset. JAMSTEC. https://doi.org/10.17596/0002029
17. Rezatofighi H, Tsoi N, Gwak JY, Sadeghian A, Reid I, Savarese S (2019) Generalized
intersection over union: a metric and a loss for bounding box regression. In: Proceedings of
the IEEE/CVF conference on computer vision and pattern recognition, pp 658–666
Chapter 10
Brain Tumor Early Diagnosis Using
Hybrid Fuzzy K-Means
and Convolutional Neural Networks
1 Introduction
A primary brain tumor originates in the brain itself, whereas a secondary brain tumor
spreads from other carcinomas, including lung, melanoma, breast, and kidney [1].
The American Cancer Society estimated that in 2022 in the USA, 25,050 adults
(14,170 men and 10,880 women) would be diagnosed with primary malignant tumors
of the brain and spinal cord. The human brain is encased in a fluid called
cerebrospinal fluid (CSF). This fluid flows from one ventricle to the next and helps
protect the spine and brain. The subarachnoid space and the central nervous system
are surrounded by the fluid from these ventricles; the nerves located above the brain
absorb the extra fluid that is released. When this waste fluid is not absorbed,
pressure builds up in other areas of the brain [2], and symptoms result. Senior
citizens have regular stress. However, a slow secretion of extra fluid and an increase
in pressure are the causes of hydrocephalus in people over 60. Medical data mining
can be used to uncover hidden information. Various methodologies have been applied
to MRI datasets to address this classification problem. In this work, a deep learning
strategy is used to improve accuracy and reduce processing time. In data mining
applications, deep learning discards unrelated features and increases the number of
relevant features. The goal of the new approach is to identify appropriate subgroups
in complex and ambiguous data.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 113
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_10
114 M. Jeyavani and M. Karuppasamy
A simple CT scan was previously employed to diagnose the illness. MRI, a more
advanced scan technology, is now used to assess, identify, and keep track of a variety
of medical disorders affecting the anatomy of the skull from various perspectives. The
most recent method, known as 3D MRI, shows brain activity and aids in visualizing
the brain to determine whether there is a blockage caused by fluid in the brain.
In the new approach, real-time MRI scan datasets were used for fuzzy logic pattern
recognition by applying the fuzzy K-means membership function, which is used to
examine the human brain from many perspectives. To avoid fuzziness or overlap
between the clusters, CNN-based forward propagation and backward propagation
were utilized, and the region-growing approach was used to extract brain regions and
determine whether a tumor is benign or malignant.
2 Related Work
Rejiga [3] used segmentation for the automatic detection and analysis of MRI scans
to predict hydrocephalus tumors in young patients. The method is supported by the
Digital Imaging and Communications in Medicine (DICOM) standard. In addition, the
region-growing approach is used to extract brain regions, and a hydrocephalus
segmentation method is employed. Compared to the prior method, the processing
time is much lower, but the prediction accuracy is low.
Chen et al. [4] used a fuzzy linear softmax model and rough reduction to predict the
fundamental emotions. The approach assesses the capability of the depth features
and uses a fuzzy rough set loss function for detection. The final stage extracts and
classifies features using convolutional neural networks (CNNs). A GPU model,
forward propagation, backward propagation, and AlexNet were employed, but the
time consumption is high.
Perez et al. [5] used a fuzzy-based K-means clustering technique to generate unique
features and an artificial neural network for feature extraction. A gray-level
co-occurrence matrix was also used, which improves the ability to diagnose
efficiently. The method identifies the affected part of the brain tumor region, but the
noise is high and the accuracy is low.
The new approach uses fuzzy rule-based classification for the fuzzy partition and
merges it with a CNN to remove noise from the dataset, enhance accuracy, and
decrease the processing time. The fuzzy partition method, a rule-based approach
with a membership function, is used to present a graphical representation of the
patient’s health conditions that is simple to comprehend. The CNN-based classifier
is used to update the membership function more accurately and to speed up the
processing of the enormous data collection.
3 Proposed Work
The real-time brain tumor databases used for machine learning were gathered from a
popular MRI scan center. The following phases were added to reduce noise, increase
accuracy, and reduce processing time. The hybrid fuzzy-based K-means methodology
and fully connected network-based feed-forward and backpropagation were applied
to identify the specific area of the affected brain tumor region. Phase 1: Fuzzy
rule-based membership functions were employed to illustrate the patient’s health
problems graphically. Phase 2: Convolutional neural networks were used to modify
the membership function, avoid the overlapping problem, and classify the clusters.
However, as there will be noise in the data during categorization, the noise in the
enormous data collection is filtered using a fuzzy rule-based method, and the
classification of the statistical data makes good use of feed-forward and
backpropagation.
4 Methodology
The new technique utilizes a hybrid approach and compares the performance of the
fuzzy K-means approach and convolutional neural networks. To report the
performance, the real-time datasets are evaluated and compared with the validation
data. The patient reports for brain tumors were gathered from a well-known MRI scan
center and compiled into a machine learning database. However, a fuzzy set cannot,
by itself, remove the noisy data when reducing the dimensionality of huge data.
Therefore, a fuzzy K-means clustering rule-based algorithm was used for the
dimensionality reduction of the huge data. To evaluate accuracy, the number of
correctly classified objects was divided by the total number of objects. Figure 1
shows the fuzzy classification that makes use of deep learning. Additionally, the
fuzzy datasets were fed to convolutional neural networks based on fully connected
neural networks to obtain more accurate results. For the diagnosis of hydrocephalus,
four common MRI modalities, FLAIR, T1, T1c, and T2, are employed. Among these
four modalities, the Fluid Attenuated Inversion Recovery (FLAIR) modality is
employed to detect the entire tumor. The core tumor area is seen in the T2 modality
of the MRI datasets. The enhancing tumor portion of the core tumor region is
identified by the T1c modality.
The data are classified using fuzzy K-means and CNN-based fully connected
networks built on convolutions. Fuzzy methods have been used effectively to deal
with ambiguity. In fuzzy set theory, the traditional bivalent set is referred to as a
crisp set, in which everything operates according to a rule of true or false: an object
is either fully within the set or fully outside it. A fuzzy set also allows an object to be
partially within a set and supports operations such as union and intersection. To
identify groupings that have not been explicitly labeled in the data, the K-means
clustering technique was utilized. The K-means algorithm maximizes similarity
between data points inside a cluster while minimizing similarity between points in
different clusters. The afflicted area was successfully identified using a CNN-based
fully connected network with feed-forward and backpropagation, which illustrates
brain tumor detection. Similar features can be learned by these networks. Networks
that have been properly trained and set up contain a number of hidden layers that
map the knowledge of the input and output training pairs.
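The text does not give the update rules of the fuzzy K-means step, so the sketch below
assumes the standard fuzzy c-means formulation: memberships are inversely related to
the distance to each center, and centers are membership-weighted means. The fuzzifier
m = 2, the one-dimensional toy data, the starting centers, and all function names are
assumptions made for illustration only.

```python
def fuzzy_memberships(points, centers, m=2.0):
    """Membership of every 1-D point in every cluster; each row sums to 1."""
    power = 2.0 / (m - 1.0)
    table = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        zeros = dists.count(0.0)
        if zeros:
            # The point coincides with a center: full membership there.
            row = [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
        else:
            row = [1.0 / sum((d / other) ** power for other in dists)
                   for d in dists]
        table.append(row)
    return table


def update_centers(points, memberships, m=2.0):
    """Each center is the mean of the points weighted by membership ** m."""
    centers = []
    for j in range(len(memberships[0])):
        weights = [row[j] ** m for row in memberships]
        centers.append(sum(w * x for w, x in zip(weights, points)) / sum(weights))
    return centers


def fuzzy_k_means(points, centers, m=2.0, iterations=20):
    """Alternate the membership and center updates a fixed number of times."""
    for _ in range(iterations):
        memberships = fuzzy_memberships(points, centers, m)
        centers = update_centers(points, memberships, m)
    return centers, fuzzy_memberships(points, centers, m)


# Toy data with two obvious groups, around 1 and around 5.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, memberships = fuzzy_k_means(points, centers=[0.0, 6.0])
```

Unlike crisp K-means, every point keeps a degree of membership in both clusters, which
is the overlap that the CNN stage is later used to resolve.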
5 Preprocessing
Data preprocessing is the initial processing applied to data before it undergoes
further analysis. Data cleaning resolves inconsistencies in the data, restores missing
values, and reduces noise. Data integration merges data from several sources and
resolves the conflicts between them. Data transformation collects, normalizes, and
generalizes the data. Data reduction gives the datasets a reduced representation.
Data discretization narrows the range of values of continuous features. The data
collection includes features, numerous classes, and real-time data. The dataset to be
evaluated was separated into a training dataset and a testing dataset. Irrelevant
features were removed from the dataset, which was collected from the MRI scan
machine learning database, so that it is now in a format that makes the original data
understandable. Data preprocessing was used to tackle such issues.
To find a crisp set and a fuzzy classification function, the following steps are
carried out in order: (1) fuzzy set, (2) the fuzzy K-means algorithm, and
(3) prediction and instance selection.
\[ \mu_R(x, y) = \exp\left(-\frac{\|x - y\|^2}{s}\right), \tag{2} \]
where \(\mu_R(x, y)\) is a fuzzy similarity relation and can be any distance function
or kernel, whereas \(\mu_R(x, y)\) specifies the expression for the negative values and
\(\exp\left(-\|x - y\|^2 / s\right)\) denotes the expression for the positive values.
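A minimal sketch of the similarity relation in Eq. (2) follows. It assumes that x and
y are numeric feature vectors of equal length and that s is a positive scale
parameter; the function name and the sample vectors are illustrative only.

```python
import math


def fuzzy_similarity(x, y, s=1.0):
    """Eq. (2): exp(-||x - y||^2 / s) for two equal-length feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / s)


identical = fuzzy_similarity((1.0, 2.0), (1.0, 2.0))
unit_apart = fuzzy_similarity((0.0, 0.0), (1.0, 0.0))
far_apart = fuzzy_similarity((0.0, 0.0), (3.0, 4.0))
```

The relation equals 1 for identical objects and decays toward 0 as the objects move
apart, with a larger s making the decay slower.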
The rules were created to forecast the decision values and classifications of new
data. With a large amount of data, the method can filter out the noise and identify
reliable data. The vast amount of data makes it exceedingly challenging to determine
its level of dependence. Therefore, it is crucial to eliminate unreliable data through
data reduction for the sake of accuracy. The fuzzy degree of membership is evaluated
during instance selection. The two fuzzy positive regions determine Fuzzy Instance
Selection (FIS). The threshold is the amount that determines whether or not an object
can be eliminated, and alpha is a variable used to gauge fuzzy similarity [8].
Probability is a crucial idea that is also frequently misinterpreted; one area where it
is crucial is the understanding of risk and relative risk. The model is created using
the training and testing sets of data. The importance was determined based on the
individuals’ ages and genders (male or female). After the data had been transformed
and before reduction was applied, the features were analyzed with histograms and
plots to see whether they can accurately predict values.
A deep neural network (DNN) is loosely modeled on the way the brain is organized,
and the brain is the inspiration for how a neural network works. The brain’s tens of
billions of neurons enable it to process and compute massive volumes of data in a
sophisticated way. There are three different kinds of neural networks: convolutional
neural networks (CNNs), recurrent neural networks (RNNs), and artificial neural
networks (ANNs). Convolutional neural networks are designed with several layers to
extract features from raw input. A CNN has three main types of layers: convolution,
pooling, and fully connected layers. In this new approach, a fully connected network
is suggested. There are numerous ways to alter a CNN’s architecture. Every neuron in
a layer communicates with every neuron in the layer above it. The input layer is the
first layer in the hierarchy and is linked directly to the inputs and pixels. Datasets
are therefore collected and supplied to the input layer. The convolution layer then
receives these data samples from the input layer and performs feature extraction on
them [9].
Convolutional neural networks are employed in the most recent classification
technique. The dataset consists of two sets: a training dataset and a dataset for
testing the classifier. Dividing a dataset into two halves so that the training and
test datasets have equal sample sizes is called holdout cross-validation. In our
analysis, each set has 127 samples. After the dataset has been split into two sets,
the CNN classifier is constructed using the training dataset. The test dataset is
then used to gauge the accuracy of the resulting classifier.
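The holdout split can be sketched as follows. The total of 254 samples is an
assumption derived from two sets of 127 samples each, the integer sample identifiers
are placeholders for the real records, and the fixed seed is used only to make the
example reproducible.

```python
import random


def holdout_split(samples, seed=0):
    """Shuffle the samples and split them into two halves (train, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Placeholder sample identifiers; the real records are not reproduced here.
samples = list(range(254))
train, test = holdout_split(samples)
```

Shuffling before the split avoids any ordering in the source data leaking into one of
the two sets.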
Once the extraction process in accordance with the kernel size is complete, each
neuron represents a certain region of the preceding layer. Typically, layers apply and
act without a direct map to activate this convolution layer. The 3×3 kernels, which
are ingrained in the architecture and fast in testing, not only intensify and normalize
connections but also multiply data based on connection rotation because they were
built to be included in two models for all standards. Three different kinds of FCN and
six different types of convolution layer make up advanced brain tumor prediction.
A CNN makes patch-wise predictions of the probability distribution, whereas FCN
models predict pixel-level probability distributions in greater depth, using the
information from the neighboring pixels around each pixel. Post-processing is
applied to the hydrocephalus datasets to increase the accuracy and the predicted
value.
\[ v_j^l = f\left( \sum_{i} d_i^{l-1} * k_{ij}^l + b_j^l \right) \tag{3} \]
In the convolution layer, the input \(d_i^{l-1}\), which represents input channel i, is
convolved with the learnable kernels \(k_{ij}^l\). A bias \(b_j^l\) is then added for each
feature map, and the result is passed through the activation function \(f(\cdot)\) to
produce the feature map \(v_j^l\), which is the output of the convolution layer. The
sum runs over the input channels i. A bias vector is a set of neural network weights
that does not require any input and is equivalent to the output of a zero-input
artificial neural network. Each layer before the output layer contains an additional
neuron termed the bias.
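Equation (3) can be made concrete for a single output map j with a short sketch. It
assumes stride 1 and no padding, uses ReLU as a stand-in for the unspecified
activation f, and slides the kernel without flipping it, as most deep learning
libraries do; the function names and the tiny input are illustrative only.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + a][c + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(x):
    """Assumed activation function f."""
    return max(0.0, x)


def conv_layer(channels, kernels, bias, activation=relu):
    """Eq. (3) for one output map: f(sum_i d_i * k_ij + b_j)."""
    maps = [conv2d_valid(d, k) for d, k in zip(channels, kernels)]
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[activation(sum(m[r][c] for m in maps) + bias)
             for c in range(cols)]
            for r in range(rows)]


d = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]  # one 3x3 input channel
k = [[[-1, 0], [0, 1]]]                  # one 2x2 kernel for that channel
v = conv_layer(d, k, bias=0.5)
```

With several input channels, the per-channel results are summed before the bias and
activation are applied, which is the role of the sum in Eq. (3).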
According to the design, the pooling layer follows the convolution layer and reduces
the dimensionality of the convolution output. Pooling layers come in two common
varieties: the average pooling layer and the max pooling layer. With average pooling,
the result is smoothed, whereas max pooling selects the brighter pixels. Max pooling
is used when there are sharp features that would otherwise not be recognized [10],
for example when the background is dark and the pixels of interest are lighter. Max
pooling layer: Following the application of kernels to the input datasets, a
subsampling layer is used to provide spatial and configuration invariance. It might
result in a 50% reduction in the computation time for feature maps. Max pooling
over non-overlapping neighborhoods is given by Eq. (4).
\[ z_{ijk} = \max_{n,\,m} v_{i,\,j+n,\,k+m}, \tag{4} \]
where i indexes the feature map, j and k give the row and column at which the
pooling window starts, and n and m range over the offsets inside the window, so
that max pooling returns the greatest value in each n × m neighborhood. The output
of the subsampling layer is then passed to the input vector of the fully connected
network.
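The non-overlapping max pooling of Eq. (4) can be sketched for a single feature map
as follows. The 2 × 2 window and the sample values are assumptions for illustration.

```python
def max_pool(feature_map, size=2):
    """Max over non-overlapping size x size windows of a 2-D feature map."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [[max(feature_map[r + n][c + m]
                 for n in range(size) for m in range(size))
             for c in range(0, cols - size + 1, size)]
            for r in range(0, rows - size + 1, size)]


v = [[1, 3, 2, 4],
     [5, 6, 1, 0],
     [7, 2, 9, 8],
     [1, 0, 3, 4]]
z = max_pool(v)  # [[6, 4], [7, 9]]
```

A 4 × 4 map becomes a 2 × 2 map, so the next layer processes a quarter of the
values while the strongest response in each neighborhood is preserved.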
A fully connected network (FCN) was developed by Long et al. The fully connected
layer is represented by a completely linked network, and the extraction of high-level
structures takes place in the fully connected layers [11].
Throughout the backpropagation process, the weights of the neural connections and
kernels are optimized continuously; this layer improved the prediction value by
implementing dense pixel-wise prediction. The feature maps produced by the
convolution layers are gathered along with the sample output to obtain high
accuracy. In Fig. 1, all of the neurons of the fully connected network layers are
connected to the neurons of the layer below. The attributes Age, Sex, T1, T2, FLAIR,
V1, V2, V3, and V4 make up the inputs of the FCN. In the figure, the neurons of
each layer are connected to those of the next layer, and all of the layers carry error
signals. After 115 steps, the computed error is 869.107694. The fully connected
network improved both the accuracy on the general set of data and the pixel-wise
prediction on the specific datasets. Dense pixel-wise prediction accepts input of any
size from the datasets. After that, the values of the fuzzy rough set dimensionality
reduction are mapped using the FCN. In fully connected neural networks, which are
artificial neural networks, all of the nodes, or neurons, in one layer are connected to
the neurons in the subsequent layer. The fully connected layer is processed as a
feed-forward and backpropagation neural network.
\[ x_j^l = \sum_{i=0}^{m} w_{ij}\, y_i^{l-1} + b_j \tag{5} \]
Here \(y_i^{l-1}\) is the input of the M-FCL, taken from the previous layer’s output, and
m is the total number of inputs that neuron j receives. During the forward phase,
bias values are added to the sums produced at each node (excluding input nodes). In
other words, the bias related to a particular node is added to the score before the
activation function is applied at that node. The M-FCL output signal \(y_j^l\) is defined
as follows:
\[ y_j^l = f\left(x_j^l\right), \tag{6} \]
where \(f(\cdot)\) stands for the fully connected layer’s activation function. The output
signal is then compared to the desired output, which produces an error that is
propagated back layer by layer through the network, as detailed in the accompanying
figure.
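The forward pass of Eqs. (5) and (6) can be sketched for one layer as follows. The
sigmoid activation, the weight layout w[i][j] (input i to neuron j), and the numeric
values are assumptions for illustration, since the text does not specify them.

```python
import math


def sigmoid(x):
    """Assumed activation function f."""
    return 1.0 / (1.0 + math.exp(-x))


def fully_connected(inputs, weights, biases, activation=sigmoid):
    """Eqs. (5) and (6): y_j = f(sum_i w[i][j] * y_i + b_j) for every neuron j."""
    outputs = []
    for j, bias in enumerate(biases):
        x_j = sum(weights[i][j] * y_i for i, y_i in enumerate(inputs)) + bias
        outputs.append(activation(x_j))
    return outputs


y_prev = [1.0, 2.0]        # outputs of the previous layer
w = [[0.5, -1.0],          # weights from input 0 to neurons 0 and 1
     [0.25, 1.0]]          # weights from input 1 to neurons 0 and 1
b = [-1.0, 0.0]            # one bias per neuron
y = fully_connected(y_prev, w, b)
```

For neuron 0 the weighted sum is 0.5 + 0.5 - 1.0 = 0, so its output is f(0) = 0.5;
for neuron 1 the sum is -1.0 + 2.0 + 0.0 = 1.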
The deep convolutional neural network with forward and backward propagation is
shown in Fig. 2. The attributes Age, Sex, T1, T2, FLAIR, V1, V2, V3, and V4 are
propagated through the network. The neurons are coupled for forward and backward
propagation in the illustration [12], and all of the layers carry error signals. After
189 steps, the computed error is 870.492308. Backpropagation involves calculating
an error signal by comparing the network’s output with the desired output. The
resulting error signal travels backward through the network layer by layer. In this
study, the weight connections of the convolutional forward network are updated
using the stochastic gradient descent backpropagation method. When the forward
propagation parameters are altered, the error signal is sent back to the convolutional
layer and the subsampling layer. The subsampling layer multiplies the local gradient.
For kernel updates, the local gradients of the convolutional layer are defined as
follows:
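\[ \delta_j^l = f'\left(u_j^l\right) \circ \mathrm{up}\left(\delta_j^{l+1}\right), \tag{7} \]
where \(f'(u_j^l)\) is the derivative of the activation function, \(u_j^l\) is the input
prior to activation, and \(\mathrm{up}(\cdot)\) upsamples the local gradients from layer
l + 1, which is the subsampling layer, to the size of the convolutional map. The
gradient with respect to the bias \(b_j\) is obtained by summing all the components of
\(\delta_j^l\) as follows,
\[ \frac{\partial E}{\partial b_j} = \sum_{q,r} \left(\delta_j^l\right)_{qr}, \tag{8} \]
\[ \frac{\partial E}{\partial k_{ij}^l} = \sum_{q,r} \left(\delta_j^l\right)_{qr} \left(p_i^{l-1}\right)_{qr}. \tag{9} \]
Finally, Eq. 9 computes the gradients for updating the kernels, where
\(\left(p_i^{l-1}\right)_{qr}\) is the patch in \(u_i^{l-1}\) that is multiplied element-wise by
\(k_{ij}^l\) to compute the element at (q, r) in the convolutional map’s output,
\(u_j^l\) [13].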
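Equations 8 and 9 can be illustrated for one input map and one output map. The sketch
assumes a stride-1 convolution without padding, so the patch for output element
(q, r) starts at row q and column r of the input; the function name and the small
arrays are hypothetical.

```python
def kernel_and_bias_gradients(inp, delta, kh, kw):
    """Eq. 8 (bias) and Eq. 9 (kernel) for one input map and one output map."""
    # Eq. 8: the bias gradient is the sum of all local gradients.
    bias_grad = sum(sum(row) for row in delta)
    # Eq. 9: entry (a, b) sums delta[q][r] times the patch element it touched.
    kernel_grad = [[sum(delta[q][r] * inp[q + a][r + b]
                        for q in range(len(delta))
                        for r in range(len(delta[0])))
                    for b in range(kw)]
                   for a in range(kh)]
    return kernel_grad, bias_grad


u_prev = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]     # input map of the convolutional layer
delta = [[1, 0],
         [0, 1]]         # local gradients of the 2x2 output map
k_grad, b_grad = kernel_and_bias_gradients(u_prev, delta, 2, 2)
```

Each kernel weight accumulates the product of a local gradient and the input element
that the weight multiplied during the forward pass.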
The features were used to obtain optimal brain features from the collection of the
selection approach. Multiple feature subsets are integrated into group-based feature
8 Implementation of Result
Hybrid fuzzy K-means techniques cannot manage noisy data on their own. To
implement the fuzzy membership function, fully connected network-based
feed-forward and backpropagation were integrated. Table 1 contains a summary of
the sample dataset.
Table 1 lists each dataset together with its number of features (real-time dataset
attributes). The fitted value, which lies between 0 and 1, is used as the predicted
value (0, 1, or near 1). With this prediction, the value was found to be accurate.
Finally, Table 2 gives the overall performance of fuzzy K-means (FKM) and
convolutional neural network (CNN)-based reduction (RED). The accuracies attained
by FKM, FKM-RED, FKM-CNN, and FKM-RED-CNN are approximately 84, 86, 88, and
90%, respectively, showing that fuzzy K-means reduction integrated with CNN
achieves the highest value, 90%.
Figure 3 compares the performance of our fuzzy K-means clustering integrated with
a fully connected network. Overall, the classification analysis demonstrates that
FKM-CNN attained high accuracy.
To demonstrate the suggested method's effectiveness, its results are compared with those from previous methods. The accuracy of several classifications of brain tumor diagnoses, with and without the suggested strategy, is shown for the datasets in [4, 6, 7, 16].
Without a prediction strategy, the accuracy of the different classifications on the brain tumor dataset would be substantially lower. Therefore, the accuracy on the brain tumor dataset was measured after applying the prediction strategy. Table 3 compares the accuracy of various classifications with and without the suggested approach. The table shows that, in terms of classification accuracy, the suggested method outperforms all other classifications on the dataset.
Table 2 Classification accuracy results
FKM        FKM-RED    FKM-CNN    FKM-RED-CNN
0.83673    0.85714    0.87755    0.89795
Table 3 Classification accuracy of comparative study
Accuracy based methodology        Accuracy (%)
Fuzzy rough optimization HOG      84.70
KNN FNPME-FS                      87.38
DEFRS SVM                         87.03
Neural network FRQR               80.06
SegNet Max DT                     85.00
Fuzzy K-means FKM-CNN             89.795
The [bold] designation in Table 3 denotes that, compared to the current state of the art, our new method, FKM-CNN, provides better accuracy
9 Conclusion
To explore the new approach discussed here, soft computing tools were analyzed and compared on the training dataset and the test dataset. Of the two, the test data was found to have the highest prediction value. The work is organized in phases and provides facts on the most common types of brain tumors. Fuzzy K-means and convolutional neural network outcomes were compared, and the data was assessed for how to separate brain tumors, a particular type of brain tissue, from real-time MRI data. The suggested method uses less time and reduces noise. The accuracies of FKM, FKM-RED, FKM-CNN, and FKM-RED-CNN are approximately 84, 86, 88, and 90%, respectively, which shows that FKM-RED-CNN is more accurate than the other solutions for early brain tumor detection. The fields related to FKM-RED-CNN have advanced but are still in their infancy, and a variety of problems are still unresolved.
References
1. Wong T-T, Liang M-L, Chen H-H, Chang F-C (2011) Hydrocephalus with brain tumors in
children. In: Child’s nervous system, vol 27(10). Springer, pp 1723–1734
2. Bulat M (1993) Dynamics and statics of the cerebrospinal fluid: the classical and a new
hypothesis. In: Intracranial pressure VIII. Springer, pp 726–730
3. Rajiga SV, Gunasekaran M (2021) Techniques of image processing and segmentation in
predicting hydrocephalus using magnetic resonance image. In: 7th international conference
on advanced computing and communication systems (ICACCS) (1). IEEE, pp 1942–1945
4. Chen X, Li D, Wang P, Yang X (2020) A deep convolutional neural network with fuzzy rough
sets for FER. IEEE Access 8:2772–2779
5. Sharma M, Purohit GN, Mukherjee S (2018) Information retrieves from brain MRI images for
tumor detection using hybrid technique K-means and artificial neural network (KMANN). In:
Networking communication and data knowledge engineering, vol 14. Springer, pp 145–157
6. Jeyavani M, Karuppasamy M (2022) EEG in optic nerves disorder based on FSVM using kernel
membership function. In: ICT with intelligent applications, vol 1(16). Springer, pp 144–154
7. Vijay J, Subhashini J (2013) An efficient brain tumor detection methodology using K-means
clustering algorithm. In: International conference on communications and signal processing
(ICCSP). IEEE Xplore, pp 653–657
8. Wei J, Chang Z, Mao L (2021) Matrix-based optimistic multigranulation fuzzy covering rough
sets. In: 2nd international conference on big data. IEEE, pp 838–841
9. Akkus Z, Galimzianova A, Hoogi A, Rubin DL, Erickson BJ (2017) Deep learning for brain
MRI segmentation: state of the art and future directions. J Dig Imaging 30(4):449–459. Springer
10. Naceur B, Mostefa MA, Saouli R, Kachouri R (2020) Deep convolutional neural networks
for brain tumor segmentation: boosting performance using deep transfer learning: preliminary
results. In: International MICCAI brain lesion workshop. Springer, pp 303–315
11. Hesamian MH, Jia W, He X, Kennedy P (2019) Deep learning techniques for medical image
segmentation: achievements and challenges. J Dig Imaging 32(4):582–596
12. Gupta TK, Raza K (2020) Optimizing deep feedforward neural network architecture. In: A
tabu search based approach. Springer (Neural Process Lett 51(3):2855–2870)
13. Lau MM, Phang JTS, Lim KH (2019) Convolutional deep feedforward network for image clas-
sification. In: 7th international conference on smart computing and communications (ICSCC).
IEEE, pp 1–4
126 M. Jeyavani and M. Karuppasamy
14. Jansi Rani M, Karuppasamy M (2022) Cloud computing-based parallel mutual information
for gene selection and support vector machine classification for brain tumor microarray data.
NeuroQuantology 20(6):6223–6233
15. Jansi Rani M, Karuppasamy M, Prabha M (2021) Bacterial foraging optimization algorithm
based feature selection for microarray data classification. Mater Today Proc. Elsevier
16. Alqazzaz S, Sun X, Yang X, Nokes L (2019) Automated brain tumor segmentation on multi-
modal MR image using SegNet. In: Computational visual media, vol 5(2). Springer, pp 209–219
Chapter 11
Precipitation Forecasting: LSTM
Modeling in Visual Analytic Framework
1 Introduction
Visual analytics is the broad branch of visualization that deals with the systematic investigation of input, the implementation of a suitable data/analytical model, the investigation of outputs, and the assessment of the model through visualizations constructed on par with each stage of application development. It is well known that 'a picture is worth a thousand words'. Visualizations make the underlying models easier to understand and assess. This research work uses LSTM as the modelling component in a visual analytic framework; the performance of the LSTM is examined through the visualizations generated by TensorFlow depicting the losses.
1.2 LSTM
Long short-term memory (LSTM) is a recurrent neural network in which learning happens in a controlled manner relative to a 'specified number of past days'; the weightage of trend and seasonality is learnt here. This specialty of LSTM benefits time series-based applications, which are naturally nonlinear. LSTM networks contain padded sequences of LSTM memory cells that are capable of removing or adding information. The input layer of the LSTM network has a number of LSTM cells to accept the input vector, followed by an LSTM hidden layer that consists of a forget unit, performing weightage-based recent learning on par with past days, and an output unit that delivers the learning outcome to the next LSTM/dense/dropout layer.
S. Govindan (B)
Madurai Kamaraj University, Madurai, Tamilnadu, India
e-mail: g.sudha79@rediff.com
S. Sangaiah
PG and Research Department of Computer Science, Sri Meenakshi Government Arts College for Women (A), Madurai, Tamilnadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_11
LSTM requires the input features (represented by 'x') to be arranged in an associated manner, accumulated as batches, and fed in sequence. The input–output association is learnt by the LSTM during its training, which takes a number of epochs. Cells are connected to the cells of the next layer through weighted connections: the weight matrix 'w' denotes the weights assigned; the 'cell state at time t' is given by '$c_t$'; the bias is denoted by 'b'; 'o' represents the cell output; and 'f' represents the forget state of the memory cell.
Each LSTM cell contains (a) an input gate, in which a sigmoid layer (Eq. 1) controls which values are to be updated, and a tanh layer (or a different function, as demanded by the underlying application) (Eq. 2) assigns weights to the values to be added to the cell state.
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \qquad (1)$$
$$\hat{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \qquad (2)$$
(b) a forget gate, which determines the amount of past value to be forgotten (Eq. 3).
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \qquad (3)$$
(c) an output gate, which decides 'what part of the current cell' is to be delivered as output through the layers. The sigmoid layer (Eq. 4) decides 'which part of the cell state' is selected for output. The next layer, tanh, weights this selection using Eq. 5.
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \qquad (4)$$
$$h_t = o_t * \tanh(c_t) \qquad (5)$$
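The gate equations can be traced in a minimal NumPy sketch of one LSTM time step. The weight shapes, the dictionary layout, and the random inputs are assumptions made for illustration. The cell-state update $c_t = f_t * c_{t-1} + i_t * \hat{c}_t$ is the standard LSTM form and is not printed in the text, so it is also marked as an assumption in the code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W and b are dicts keyed by 'i', 'c', 'f', 'o' (hypothetical layout);
    each W[k] has shape (hidden, hidden + inputs), each b[k] shape (hidden,).
    """
    z = np.concatenate([h_prev, x_t])       # [h_{t-1}, x_t]
    i_t = sigmoid(W["i"] @ z + b["i"])      # Eq. 1: input gate
    c_hat = np.tanh(W["c"] @ z + b["c"])    # Eq. 2: candidate values
    f_t = sigmoid(W["f"] @ z + b["f"])      # Eq. 3: forget gate
    o_t = sigmoid(W["o"] @ z + b["o"])      # Eq. 4: output gate
    c_t = f_t * c_prev + i_t * c_hat        # standard state update (assumed)
    h_t = o_t * np.tanh(c_t)                # Eq. 5: cell output
    return h_t, c_t

rng = np.random.default_rng(0)
hidden, inputs = 4, 5                       # illustrative sizes
W = {k: rng.normal(scale=0.1, size=(hidden, hidden + inputs)) for k in "icfo"}
b = {k: np.zeros(hidden) for k in "icfo"}
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(3, inputs)):    # three made-up time steps
    h, c = lstm_cell_step(x_t, h, c, W, b)
```

Because $h_t$ is the product of a sigmoid output and a tanh output, every component of the hidden state stays strictly between -1 and 1.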
Several studies on the use of LSTM for forecasting show that it retains recent trends and seasonality. Lee et al. [1] summarized research on guiding traffic using LSTM and suggested that using 'tanh' at the hidden layer and 'softmax' activation functions at the output layer, with a run of 50 epochs, yields better prediction. They concluded that LSTM is capable of learning nonlinearity. Anandharajan et al. [2] presented weather prediction based on regression in an artificial intelligence approach, illustrated with a cost function aiming to minimize the mean square error (MSE); the cost function was minimized using gradient descent. Patlakas et al. [3] predicted wind gust speed through a polynomial Kalman filtering local adaptation model. The research works presented in [4, 6, 8] indicate that LSTM effectively learns from past-day observations and is able to predict better than other machine learning and neural network models. The results were more correlated with actual values since the prediction gives high relevance to the scenario of the recent past days. Zhao et al. [5] presented research work to predict the air quality index through a data mining technique based on temporal correlation; the same is meaningfully done by LSTM, which additionally learns the trend. Shah et al. [7] presented a feed-forward neural network with the 'logsig' activation function for the hidden layers and the 'puresig' activation function for the output layer, and used analytical equations to predict pollution features. Khairudin et al. [10] concluded that LSTM outperforms decision tree, support vector machine, and random forest algorithms in a weather forecasting application. Sudha et al. [9, 11, 12] presented the way in which visual analytics improves the procurement of insights when adopting linear regression and autoregressive moving average models in weather and pollution feature predictions.
Precipitation is forecast by the LSTM after it learns from the past year's recordings of weather and weather-related features. The underlying real-time dataset (recorded at multiple weather and pollution monitoring stations of 'Chennai' city, Tamilnadu, India) [13–18] used in this work has three weather-related attributes (temperature, humidity, and visibility) and two weather-influencing attributes (particulate matter of size 2.5 µm (PM2.5) and nitrogen dioxide), recorded in 2018. One year of observations is fed as input, and a 15-day forecast is estimated; the entire work is coded in Python using the TensorFlow framework. The works [9, 11, 12] elaborated on missing data procurement through Pearson correlation and the Autoregressive Integrated Moving Average (ARIMA).
Preprocessing
Preprocessing of the dataset is mandatory and is achieved through either normalization or standardization. Normalization squeezes the attribute values between '0 and 1'. Standardization transforms the values to a smaller scale, generally centered about a mean of zero. Such preprocessing helps the LSTM by reducing payloads, aiding convergence, and improving training time. Algorithm 1, PreProcess, applies various normalization and standardization techniques on the given dataset; the respective steps are summarized as comment lines preceded by '//'. The goodness of the transformation is assessed by plotting the attribute values in a histogram visualization. If the visualization of the residuals tends to have a bell-shaped curve about zero or some constant, then standardization is achieved. Table 1 summarizes the application of different scaler functions on the underlying dataset and their impacts.
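The two preprocessing options can be sketched in a few lines of NumPy. This is only an illustration of min-max normalization and zero-mean, unit-variance standardization, not the PreProcess algorithm itself. The function names and the temperature readings are made up.

```python
import numpy as np

def min_max_normalize(x):
    """Rescale values linearly into the range [0, 1]."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)       # constant column: nothing to scale
    return (x - x.min()) / span

def standardize(x):
    """Shift to zero mean and scale to unit standard deviation."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    if std == 0:
        return np.zeros_like(x)
    return (x - x.mean()) / std

temperature = np.array([24.0, 27.5, 31.0, 29.5, 26.0])  # made-up readings
normalized = min_max_normalize(temperature)
standardized = standardize(temperature)
```

A histogram of `standardized` for a real attribute would be the visualization used to judge whether the values form a bell-shaped curve about zero.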
Fig. 1 [Model #7]: visualization representing the loss occurred at the training and validation phase
of LSTM. X-axis represents the number of epochs and the Y-axis represents the loss
This work experimented with the following activation functions: exponential linear unit (elu), exponential, scaled exponential linear unit (selu), Gaussian error linear unit (gelu), linear (input is unmodified), sigmoid, softplus, softsign, swish, and tanh. Since precipitation is the forecasting feature (its values are typically small), this work used the loss functions MSE and MAE to assess the learning capacity of the LSTM network.
Table 1 summarizes the training and validation losses observed in various LSTM models, taking into account the standardization of the input, the activation function, the number of past days remembered, and the error measurement adopted. Several models with different structures and parameters were built and assessed; the seven better models are tabulated in Table 1. Even though some of the models have lower losses, they are not capable of predicting precipitation that is highly correlated with the actual observations. It follows that an LSTM's predictions cannot be judged by the training and validation losses alone.
The best prediction is yielded by Model #7, which has 17,920, 33,024, and 65 parameters at the input, hidden, and dense (prior to the output layer) layers, respectively. Figure 1 represents the training and validation loss observed for 50 epochs of Model #7. The training input consists of nine months of observations with a batch size of four. Figure 2 and Table 2 show the forecast made by Model #7 for the next 15 days, which is more than 90% close to the actual recordings; the model also distinguishes rainy and non-rainy days correctly.
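The quoted parameter counts can be reproduced with the usual LSTM and dense-layer counting formulas. The sketch below assumes an architecture of five input features feeding two stacked 64-unit LSTM layers and a single-unit dense layer. This is an inference from the numbers (and from the five attributes in the dataset), not something the text states.

```python
def lstm_param_count(units, input_dim):
    """Trainable parameters of one LSTM layer: four gates, each with
    input weights, recurrent weights, and a bias vector."""
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(units, input_dim):
    """Trainable parameters of a dense layer: weights plus biases."""
    return units * input_dim + units

# Assumed architecture: 5 features -> LSTM(64) -> LSTM(64) -> Dense(1)
first_lstm = lstm_param_count(units=64, input_dim=5)
second_lstm = lstm_param_count(units=64, input_dim=64)
dense = dense_param_count(units=1, input_dim=64)
print(first_lstm, second_lstm, dense)  # 17920 33024 65
```

Other layer sizes could in principle give the same totals, so the match supports the assumed architecture without confirming it.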
Differencing the input values improved the results by 1–3%. Differencing the input yields better forecasts than log or exponential transformations; in either case, the inverse of the respective transformation must be applied to the forecasts.
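A first-order differencing transform and its inverse can be sketched as follows. The sketch assumes that the transformation meant here is first-order differencing of consecutive observations. The series values and function names are made up for illustration.

```python
import numpy as np

def difference(series):
    """First-order differencing: d[t] = x[t + 1] - x[t]."""
    return np.diff(np.asarray(series, dtype=float))

def inverse_difference(first_value, diffs):
    """Rebuild the original series from its first value and its differences."""
    return np.concatenate([[first_value], first_value + np.cumsum(diffs)])

series = np.array([0.0, 0.5, 0.1, 0.2, 0.1])  # made-up observations
diffs = difference(series)
restored = inverse_difference(series[0], diffs)
```

When the model is trained on differenced inputs, its forecasts are differences as well, so the cumulative sum has to be applied before the forecasts can be compared with actual recordings.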
Fig. 2 [Model #7]: precipitation forecasting (shown in orange color) for next 15 days and the
prediction matches with real observation for both rainy days, non-rainy days
Table 2 Predicted precipitation compared with actual recording
Day    Actual recording    Predicted precipitation
1      0                   0.000001
2      0.515044            0.505088
3      0.126652            0.20049
4      0.194881            0.2011
5      0.127578            0.1189
6      0.052117            0.038
7      0.015975            0.03
8      0                   0.0001
9      0                   0.00009
10     0.3                 0.3198
11     0.5                 0.44876
12     1.2493              0.9872
13     0.94142             0.92
14     0.854374            0.83
15     0.673               0.5812
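The agreement between the two columns of Table 2 can be quantified with the MSE and MAE losses used in this work. The sketch below copies the values from Table 2. The rain threshold of 0.001 used to separate rainy from non-rainy days is a hypothetical choice, not one given in the text.

```python
import numpy as np

# Values copied from Table 2 (days 1 to 15).
actual = np.array([0, 0.515044, 0.126652, 0.194881, 0.127578, 0.052117,
                   0.015975, 0, 0, 0.3, 0.5, 1.2493, 0.94142, 0.854374, 0.673])
predicted = np.array([0.000001, 0.505088, 0.20049, 0.2011, 0.1189, 0.038,
                      0.03, 0.0001, 0.00009, 0.3198, 0.44876, 0.9872,
                      0.92, 0.83, 0.5812])

mse = np.mean((actual - predicted) ** 2)      # mean squared error
mae = np.mean(np.abs(actual - predicted))     # mean absolute error

# Rainy / non-rainy agreement under an assumed threshold of 0.001.
rain_threshold = 0.001
rain_match = (actual > rain_threshold) == (predicted > rain_threshold)

print(f"MSE={mse:.5f}  MAE={mae:.5f}  rainy-day agreement={rain_match.mean():.0%}")
```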
1.6 Conclusion
References
1. Lee C et al (2020) A visual analytics system for exploring, monitoring, and forecasting road
traffic congestion. IEEE Trans Vis Comput Graph 26(11):3133–3146. https://doi.org/10.1109/
TVCG.2019.2922597
2. Anandharajan TRV, Hariharan GA, Vignajeth KK, Jijendiran R, Kushmita (2016) Weather
monitoring using artificial intelligence. In: 2016 2nd international conference on computational
intelligence and networks (CINE), pp 106–111. https://doi.org/10.1109/CINE.2016.26
3. Patlakas P, Drakaki E, Galanis G, Spyrou C, Kallos G (2017) Wind gust estimation by combining
a numerical weather prediction model and statistical post-processing. Energ Procedia 125:190–
198
4. Korunoski M, Stojkoska BR, Trivodaliev K (2019) Internet of things solution for intelligent air
pollution prediction and visualization, pp 1–6. https://doi.org/10.1109/EUROCON.2019.886
1609
5. Zhao G, Huang G, He H, Wang Q (2019) Innovative spatial-temporal network modeling and
analysis method of air quality. IEEE Access 7:26241–26254. https://doi.org/10.1109/ACCESS.
2019.2900997
6. Zhang B, Zhang H, Zhao G, Lian J (2020) Constructing a PM2.5 concentration prediction
model by combining auto-encoder with Bi-LSTM neural networks. Environm Model Softw
124:104600. ISSN 1364-8152,https://doi.org/10.1016/j.envsoft.2019.104600
7. Shah J, Mishra B (2020) Analytical equations based prediction approach for PM2.5 using
artificial neural network. SN Appl Sci 2:1516. https://doi.org/10.1007/s42452-020-03294-w
8. Askari B, Le Quy T, Ntoutsi E (2020) Taxi demand prediction using an LSTM-based deep
sequence model and points of interest. In: 2020 IEEE 44th annual computers, software, and
applications conference (COMPSAC), pp 1719–1724. https://doi.org/10.1109/COMPSAC48
688.2020.000-7
9. Sudha G, Thangaraj M, Sangaiah S (2020) Numerical weather analysis using statistical
modelling as visual analytics technique. In: Venkata Krishna P, Obaidat M (eds) Emerging
research in data engineering systems and computer communications. Advances in intelligent
systems and computing, vol 1054. Springer, Singapore. https://doi.org/10.1007/978-981-15-
0135-7_9
10. Khairudin NBM, Mustapha NB, Aris TNBM, Zolkepli MB (2020) Comparison of machine
learning models for rainfall forecasting. In: 2020 international conference on computer science
and its application in agriculture (ICOSICA), pp 1–5.https://doi.org/10.1109/ICOSICA49951.
2020.9243275
11. Sudha G, Sangaiah S Insights through visualizations of air quality at Chennai city. In: Proceed-
ings of international conference on data science and information ecosystem’21 (ICDIE’21),
pp 135–138. ISBN 978-93-91373-04-7
12. Sudha G, Suguna S (2022) Health hazard: PM2.5 forecast—a visual analytic framework using
ARIMA. Int J Health Sci 6(S2):630–642. https://doi.org/10.53730/ijhs.v6nS2.5066
13. https://mausam.imd.gov.in/chennai/. Last accessed 2019/04/01
14. http://www.tnpcb.gov.in/air-quality.php. Last accessed 2019/09/22
15. https://smartcities.data.gov.in/. Last accessed 2019/10/18
16. https://www.tn.gov.in/. Last accessed 2019/11/21
17. https://app.cpcbccr.com/. Last accessed 2019/05/21
18. http://bhuvan.nrsc.gov.in/home/index.php. Last accessed 2019/01/09
19. https://www.noaa.gov/. Last accessed 2019/02/06
Chapter 12
Cyclone Forecasting Before Eye
Formation Using Deep Learning
1 Introduction
A cyclone is a fast-moving storm that arises over the oceans and absorbs energy to grow. Tropical cyclones are among the greatest hazards to property and life, even in their initial phases of formation. Hence, the detection and prediction of cyclones in the early stages, that is, before the formation of the eye, are important. Cyclones can be detected and predicted by conventional means, such as practices that involve specific parameters like temperature and wind speed. Recent advances use deep learning techniques such as segmentation and time series forecasting. The solution proposed in this paper elaborates on this neural network-based method. Furthermore, the cyclone is detected using K-means, which is compared with other segmentation techniques such as Detectron2 and the Mask region-based convolutional neural network (Mask R-CNN). The proposed model has applications pertaining to the detection and prediction of cyclones using satellite images. Prediction of forest fires is another application of this model.
Tropical cyclones are one of the biggest threats to life and property even in the formative stages of their development. Once the eye of the cyclone is formed, it reaches the shore rapidly and can cause a number of different hazards, each of which can individually cause significant impacts, such as storm surge, flooding, and extreme winds. Hence, it is crucial to detect the eye and predict the occurrence of the cyclone as early as possible. Our model aims to solve this problem by providing prior predictions of the formation of cyclones by analyzing the change in the surrounding parameters.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 137
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_12
138 A. Khandelwal et al.
2 Related Works
Wen et al. [1] proposed deep learning-based image processing that is utilized to
categorize and detect faults using SEM images of metallic AM components (cracks
and pores). Almost any single imperfection may be classified as a fracture or a pore
using the adjusted CNN model. The model does not give accurate results with large
data.
Nair et al. [2] noted that tropical cyclones are among the most dangerous weather systems that originate over the tropical oceans, with roughly 90 storms forming annually around the globe. Quick identification and tracking of tropical cyclones are crucial for advance warning to sensitive locations. As these storms occur over open oceans distant from the landmass, remote sensing is required to detect them. The model has difficulty detecting the edges of the domain.
Raza et al. [3] suggested the IR-MSDNET architecture to merge infrared and
visible images. This is done to ensure that the fused image retains significant elements
from both IR and visible images. Object detection trials on the IR-MSDNET model
revealed that the model did an excellent job of enhancing the features of the fused
image. This model is slower than other deep learning methods.
Lu et al. [4] proposed a semi-supervised approach for discovering the 2D parameters of extratropical cyclones (ETs) in the Northern Hemisphere. By comparison, the new method effectively supports the old method by increasing the number of recognized cyclones by 8.29%. The Mask R-CNN model is also excellent at detecting horizontal characteristics in tropical cyclones. The labeling process is not suitable for large-scale labeled databases.
Zhao et al. [5] presented the DeepGlobe building extraction challenge, which asks participants to locate all the building polygons in the provided satellite photos. In the studies, the Mask R-CNN approach achieves nearly comparable accuracy and completeness. In comparison with Mask R-CNN, this method produces more regularized polygons, which is advantageous in a range of applications. The challenging problem for this model is detecting small objects and closely located buildings.
Rau et al. [6] designed a backpropagation-trained artificial neural network (BP ANN) to classify diverse sea surface conditions using information from high-resolution AVHRR visible and infrared images. A neural network was further proposed to forecast the movement of ice coverage using time series analysis. The implementation of a motion detection neural network for this model is still in progress.
Gupta et al. [7] used vast rainfall data to create helpful storm patterns. The paper describes three categories of storms (local severe storms, hourly storms, and overall storms) discovered using MapReduce-based algorithms. Local severe storms, on the whole, have the temporal features of storms that take place in a localized area. Storms that occur at a specific hour have spatial characteristics. The paper uses K-means clustering to identify distinct sorts of hourly storms based on their shapes and sizes. The actual shapes of the cluster centroids in the experiment screenshot are not adequately handled.
Emre Celebi et al. [8] studied color quantization, a helpful technique in graphics and digital image processing. The effectiveness of k-means as a color quantizer is investigated in this work. Their study developed fast and precise k-means variants with several initialization techniques and compared these quantizers to some of the most prominent quantizers in the literature. All the proposed variants of k-means, each with a different initialization scheme, involve randomness.
Pham et al. [9] suggest that the most important task in enabling timely road damage repair is to promptly and accurately classify and detect the damage. The paper evaluates Detectron2 against the Faster R-CNN implementation using various base models and parameters. When tested on the X101-FPN base model, the F1-scores for Faster R-CNN and Detectron2 were 51.0% and 51.4%, respectively. The labeling process for this dataset should be improved.
Liu et al. [10] noted that wind is the most rapidly expanding energy source on the planet and is ecologically viable. Wind speed prediction through time series forecasting is critical for an accurate and efficient appraisal of offshore wind energy, which benefits wind farm owners, grid operators, and end-users. One of the most widely used models for predicting hourly wind speed in Scotland's offshore/coastal region is the SARIMA model. The constructed prediction model was then compared to the recently developed deep learning-based algorithms LSTM and GRU, and a quantified performance measure was generated. Among the three evaluated prediction models, SARIMA had the best accuracy and robustness. The SARIMA model has to be trained further on more parameters to get accurate results.
Lim et al. [11] have discussed how encoder and decoder design have been applied
in one-step-ahead and multi-horizon time series forecasting. The paper also outlines
an approach for deep learning that is hybrid where statistical methods are combined
with neural network elements to boost efficiency. The paper gives an overview of
how to use the deep learning technique that can aid decision-making with time series
data.
Karevan et al. [12] show that LSTM's capacity to recognize long-term correlations has been widely used. The paper discusses how LSTM is used on time series data for weather forecasting. Here, a different version of LSTM, transductive LSTM (T-LSTM), is used. T-LSTM uses local information for prediction. The test points have a bigger influence on making a prediction in T-LSTM. Two different weighting schemes were used, and the experiments were conducted at two different times of the year to get better predictions. T-LSTM is seen to work better in terms of predictions. To prevent commonality between two days in a row, the latest two samples in the dataset are taken into account. The dataset is thereby made smaller.
Geetha et al. [13] have proposed that ARIMA, also known as the auto-regressive
integrated moving average model, is well known for time series forecasting. It helps to
predict unknown data in the series. The model was used to forecast tropical cyclones.
Tropical cyclones cause a lot of damage to humankind, so predicting accurately could
avert a lot of cyclone-related disasters. The model is built on the ARIMA of TSM.
3 Hexagon Framework
3.1 Preprocessing
The dataset consists of visible (VIS) and infrared (TIR-1) channel images recorded by the INSAT-3D satellite. During preprocessing, images are converted from .tif to .jpg format, resulting in grayscale images with a resolution of 1074 × 984 pixels. This paper describes a method for improving cyclone image segmentation. It consists of two steps: segmentation and batch normalization. The image intensities are first standardized using pixel histograms during preprocessing. Morphological processing is then used to remove the non-cyclone portions. During the segmentation process, Detectron2 is used as a method for detecting cyclone zones in images and is compared to another popular method, K-means clustering. Batch normalization takes an input image shape of 150 × 150 and passes it to the proposed model. Figure 2 depicts an image manually annotated to identify the cyclone portion. Figure 3 shows the image treated as a series of clusters when processed with K-means.
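Treating an image as a set of clusters can be illustrated with a minimal NumPy version of K-means on pixel intensities. This is a sketch under stated assumptions, not the segmentation pipeline used in this chapter: it clusters gray levels only, uses evenly spaced initial centroids, and runs on a synthetic frame. The function name `kmeans_segment` is hypothetical.

```python
import numpy as np

def kmeans_segment(image, n_clusters=3, n_iter=20):
    """Cluster pixel intensities with Lloyd's algorithm; return a label map."""
    pixels = image.reshape(-1).astype(float)
    # Spread the initial centroids evenly over the intensity range.
    centroids = np.linspace(pixels.min(), pixels.max(), n_clusters)
    for _ in range(n_iter):
        # Assign each pixel to its nearest centroid.
        labels = np.argmin(np.abs(pixels[:, None] - centroids[None, :]), axis=1)
        # Move each centroid to the mean of its assigned pixels.
        for c in range(n_clusters):
            members = pixels[labels == c]
            if members.size:
                centroids[c] = members.mean()
    return labels.reshape(image.shape), centroids

# Synthetic 150 x 150 grayscale frame: dark background with a bright blob
# standing in for a cloud mass.
frame = np.full((150, 150), 30.0)
frame[50:100, 50:100] = 220.0
label_map, centers = kmeans_segment(frame, n_clusters=2)
```

On a real satellite frame, the cluster with the highest centroid would correspond to the brightest cloud regions, which could then be cleaned up with morphological processing.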
There are two considerations when developing a model for sequential time series
forecasting. The convolutional neural network (CNN) covers spatial applications,
while the bidirectional gated recurrent unit (Bi-GRU) monitors temporal applica-
tions. Convolutional neural network (CNN) is a subset of neural networks used to
process images and perform functions like classification, prediction, and segmenta-
tion. CNN is a form of multilayer perceptron where every neuron in the current layer
is connected to the neurons in the next layer. CNN assists in changing the images into
a format that is simpler to analyze without compromising details that are essential
for making accurate predictions. The model’s preprocessing phase consists of three
local feature learning block (LFLB) layers, each led by two Bi-GRU-CNN layers.
The grayscale images fed into the model had 400 × 400 pixel dimensions. A series of four images, forming a tensor with the dimension (4, 150, 150, 1), was used to train the model, and the final prediction was compared to the next image in the series. The dataset contains 45 images, grouped into sets of five with four input images in each set. Tanh is the activation function used in the Bi-GRU-CNN layers, with a dropout of 0.3. The prediction proceeds in the following manner: first, images at time sequences 1, 2, 3, and 4, taken at 30-minute intervals, are extracted from the INSAT-3D dataset and used by the model as input for training. Using the four images, the model predicts the image for the fifth time sequence and compares it to the actual image.
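The grouping of frames into input sequences and targets can be sketched with NumPy. The sketch assumes non-overlapping sets of five consecutive frames, which is one reading of "four images in each set of five". The frames are placeholder zeros, and `make_sequences` is a hypothetical helper.

```python
import numpy as np

def make_sequences(frames, set_size=5):
    """Split frames into consecutive sets; the first set_size - 1 frames of
    each set form the input and the last frame is the prediction target."""
    n_sets = len(frames) // set_size
    sets = frames[: n_sets * set_size].reshape(n_sets, set_size, *frames.shape[1:])
    return sets[:, :-1], sets[:, -1]

# 45 placeholder frames of 150 x 150 pixels with one grayscale channel.
frames = np.zeros((45, 150, 150, 1), dtype=np.float32)
inputs, targets = make_sequences(frames)
print(inputs.shape, targets.shape)  # (9, 4, 150, 150, 1) (9, 150, 150, 1)
```

Each input sample then has the (4, 150, 150, 1) shape quoted in the text. A sliding window with overlap would give more training samples from the same 45 frames, at the cost of correlated samples.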
3.2.1 Bi-GRU
Figure 4 shows a single GRU unit. The update gate $X_t$ and the reset gate $Z_t$ are combined to form a GRU unit. The update gate is given by Eq. 1 and the reset gate by Eq. 2. While these two gates are operating, the output $k_t$ is controlled by both the current input $V_t$ and the previous state $k_{t-1}$. Equation 3 denotes the output of the combination of the update gate and the reset gate, and Eq. 4 shows the outcome of the current input $V_t$ and the previous state $k_{t-1}$. The outputs of the gates and of the GRU unit are computed using the logistic sigmoid function and the hyperbolic tangent. The weight matrices $W_x$, $U_x$, $W_y$, $U_y$, $W_z$, and $U_z$ are employed in this process, $\odot$ denotes the Hadamard product, and $C_x$, $C_y$, and $C_z$ are the bias vectors for the input $V_t$ and the prior state $k_{t-1}$.
$$Y_t = \sigma[W_y * V_t + U_y * (Z_t \odot k_{t-1}) + C_x] \qquad (3)$$
$$k_t = (1 - X_t) \odot k_{t-1} + X_t \odot Y_t \qquad (4)$$
When working with the present data, models with a bi-directional structure have the capacity to learn information from both past and subsequent data. The first GRU goes forward, starting at the beginning of the data sequence, while the second GRU moves backward, starting at the end of the data sequence. This enables knowledge from the past as well as the future to affect the conditions of the present.
$$\overleftarrow{k}_t = \mathrm{GRU}_{\mathrm{Bkw}}(V_t, h_{t+1}) \qquad (6)$$
$$k_t = \overrightarrow{k}_t \oplus \overleftarrow{k}_t \qquad (7)$$
where $\overrightarrow{k}_t$ is the forward GRU state given by Eq. 5, $\overleftarrow{k}_t$ is the backward GRU state given by Eq. 6, and $\oplus$ indicates the operation of concatenating two vectors. The final output $k_t$ is given by Eq. 7. Table 3 shows the symbol definitions.
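The forward and backward passes and their concatenation can be sketched in NumPy. The gate computations follow the standard GRU formulation (sigmoid update and reset gates, and a tanh candidate with its own bias $C_y$). This is an assumption wherever the printed equations are missing or differ. Parameter names mirror the symbols of this section, while the sizes and random inputs are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(v_t, k_prev, p):
    """One GRU step in the notation of this section (standard form assumed)."""
    x_t = sigmoid(p["Wx"] @ v_t + p["Ux"] @ k_prev + p["Cx"])          # update gate
    z_t = sigmoid(p["Wz"] @ v_t + p["Uz"] @ k_prev + p["Cz"])          # reset gate
    y_t = np.tanh(p["Wy"] @ v_t + p["Uy"] @ (z_t * k_prev) + p["Cy"])  # candidate
    return (1.0 - x_t) * k_prev + x_t * y_t                            # Eq. 4

def run_gru(seq, p, hidden):
    k = np.zeros(hidden)
    states = []
    for v_t in seq:
        k = gru_step(v_t, k, p)
        states.append(k)
    return np.stack(states)

def bi_gru(seq, p_fwd, p_bwd, hidden):
    fwd = run_gru(seq, p_fwd, hidden)                 # forward pass
    bwd = run_gru(seq[::-1], p_bwd, hidden)[::-1]     # backward pass, re-aligned
    return np.concatenate([fwd, bwd], axis=1)         # Eq. 7: concatenation

def init_params(rng, hidden, inputs):
    p = {}
    for g in "xyz":
        p["W" + g] = rng.normal(scale=0.1, size=(hidden, inputs))
        p["U" + g] = rng.normal(scale=0.1, size=(hidden, hidden))
        p["C" + g] = np.zeros(hidden)
    return p

rng = np.random.default_rng(0)
hidden, inputs, steps = 3, 6, 4                       # illustrative sizes
seq = rng.normal(size=(steps, inputs))
out = bi_gru(seq, init_params(rng, hidden, inputs),
             init_params(rng, hidden, inputs), hidden)
```

The output has twice the hidden size per time step because the forward and backward states are concatenated, so each time step carries context from both directions of the sequence.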
For cyclone detection, two methods were compared, namely Detectron2 and K-
means. The comparison is shown in Table 4. Figure 5 shows the results of Detectron2
and K-means.
For cyclone prediction, different architectures, namely CNN-LSTM, CNN-BiLSTM,
CNN-GRU, and CNN-Bi-GRU, were compared, and CNN-Bi-GRU showed the best
results. The comparison of the models based on their evaluation metrics is shown in
Table 5. Several hyperparameters, including the number of epochs, pool size, drop-out
probability, number of units, activation layer, and optimizer, were varied and
compared. The model was built on different input sequences to determine which
configuration yields the lowest error, measured in terms of MSE. Table 6 shows this
comparison for the different input specifications. The lowest MSE was obtained by
the base model that takes four 150 × 150 images as input and predicts the fifth
image.
146 A. Khandelwal et al.
The final CNN-Bi-GRU model constructed after all experiments used an input size of
150 × 150, 150 epochs, a pool size of 4 × 4, a drop-out probability of 0.4, 1024 GRU
units, and nadam as the optimizer. After training for 150 epochs, the model gave an
MSE of 1611.27591 and an SSIM of (0.99836, 0.99975), where SSIM is reported in the
format (Output_Image, Input_Image).
The loss function graph was plotted with nadam as the optimizer, softplus as the
activation function, and mean squared error as the loss function. This combination
worked best for regression or prediction models compared with the other combinations
tried. The loss curve plotted after model training is shown in Fig. 7, and the
predicted result is shown in Fig. 6. Figure 8 shows the training overview as a plot
of loss versus input image.
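The two reported metrics can be illustrated with a short NumPy sketch. It is not
the authors' evaluation code: the function names are hypothetical, and the SSIM here
is computed over a single window covering the whole image with the commonly used
constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2, whereas a windowed SSIM averages the
same expression over local N × N windows.

```python
import numpy as np


def mse(img_a, img_b):
    """Mean of the squared pixel-intensity differences between two images."""
    a = np.asarray(img_a, dtype=np.float64)
    b = np.asarray(img_b, dtype=np.float64)
    return float(np.mean((a - b) ** 2))


def ssim_single_window(img_a, img_b, data_range=255.0):
    """SSIM evaluated once over the full image (a simplification of windowed SSIM)."""
    a = np.asarray(img_a, dtype=np.float64)
    b = np.asarray(img_b, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return float(numerator / denominator)


reference = np.arange(16, dtype=np.float64).reshape(4, 4)
shifted = reference + 2.0
print(mse(reference, shifted), ssim_single_window(reference, shifted))
```

MSE grows with any intensity offset, while SSIM stays near 1 when the structure of
the two images is unchanged, which is why the two metrics are reported together.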
5 Conclusion
This paper focuses on detecting cyclones and predicting them before eye formation
from INSAT-3D satellite images, using the proposed model and algorithms. Cyclone
detection is accomplished with K-means and Detectron2, which segment the image
and pinpoint the cyclone region. The CNN-Bi-GRU model is used for cyclone
forecasting, and its performance is assessed using two metrics. The first is the mean
squared error (MSE), defined as the mean of the squared differences in pixel
intensities between the two compared images. The second is the structural similarity
index (SSIM), which measures the structural difference between two images. SSIM is
the more robust measure because it compares the two images over windows of size
N × N rather than pixel by pixel over the complete images, as MSE does. According to
the output obtained, the SSIM of the proposed model is (0.99836, 0.99978), and the
MSE value is 1611.27591. In future, wind speed and temperature data from the
satellite could be analyzed to predict the intensity of the cyclone and achieve
better results.
Acknowledgements This project was created under the Indian Space Research Organisation’s
problem statement SS591 at Smart India Hackathon 2022. The authors would like to thank SIH,
AICTE, and ISRO for providing this opportunity.
12 Cyclone Forecasting Before Eye Formation … 149
References
1. Wen H, Huang C, Guo S (2021) The application of convolutional neural networks (CNNs) to
recognize defects in 3D-printed parts. Materials 14(10):2575
2. Nair A, Sai Srujan KSS, Kulkarni SR, Alwadhi K, Jain N, Kodamana H, Sandeep S, John VO
(2021) A deep learning framework for the detection of tropical cyclones from satellite images.
IEEE Geosci Remote Sens Lett 19
3. Raza A, Liu J, Liu Y, Liu J, Li Z, Chen X, Huo H, Fang T (2021) IR-MSDNet: infrared and
visible image fusion based on infrared features and multiscale dense network. IEEE J Sel Topics
Appl Earth Obs Remote Sens 14:3426–3437
4. Lu C, Kong Y, Guan Z (2020) A mask R-CNN model for reidentifying extratropical cyclones
based on quasi-supervised thought. Sci Rep 10(1):1–9
5. Zhao K, Kang J, Jung J, Sohn G (2018) Building extraction from satellite images using mask
R-CNN with building boundary regularization. In: Proceedings of the IEEE conference on
computer vision and pattern recognition workshops, pp 247–251
6. Rau Y-C, Comiso JC, Lure FYM (1994) Application of neural networks for identification of
sea ice coverage and movements from satellite imagery. In: Proceedings of IEEE international
geoscience and remote sensing symposium IGARSS, vol 3, pp 1407–1409
7. Gupta U, Jitkajornwanich K, Elmasri R, Fegaras L (2016) Adapting K-means clustering to
identify spatial patterns in storms. IEEE, pp 2646–2654
8. Emre Celebi M (2009) Effective Initialization of K-means for color quantization. IEEE, pp
1649–1652
9. Pham V, Pham C, Dang T (2020) Road damage detection and classification with Detectron2
and faster R-CNN. IEEE, pp 5592–5601
10. Liu X, Lin Z, Feng Z (2021) Short-term offshore wind speed forecast by seasonal ARIMA—a
comparison against GRU and LSTM. Energy 227:120492
11. Lim B, Zohren S (2021) Time-series forecasting with deep learning: a survey. Philos Trans
Roy Soc A 379(2194):20200209
12. Karevan Z, Suykens JAK (2020) Transductive LSTM for time-series prediction: an application
to weather forecasting. Neural Netw 125:1–9
13. Geetha A, Nasira GM (2016) Time series modeling and forecasting: tropical cyclone prediction
using ARIMA model. In: Proceedings of 2016 3rd international conference on computing for
sustainable global development (INDIACom), pp 3080–3086
Chapter 13
Fusion of Information Acquired
from Camera and Ultrasonic Range
Finders for Obstacle Detection
and Depth Computation
1 Introduction
Eyesight is the most essential sense for understanding the reality around us. Playing
a key role in sensory integration, it provides feedback that balances our interaction
with the environment. The loss of vision makes it difficult to live a normal life: it
can adversely affect how a person functions socially, psychologically, physically,
and independently. According to the World Health Organization [1], 284 million
people are visually impaired worldwide, and 39 million of those are blind. Visual
information can be used to perceive objects in the surroundings. However,
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 151
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_13
152 J. Madake et al.
estimating the distance to an object from an image alone is computationally expensive.
Our task is to fuse two sensors, a digital camera and an ultrasonic sensor, to detect
an object and calculate its distance, which is cheaper in terms of both computation
and cost. According to the 2019 Road Accident Report [2], 449,002 accidents occurred
throughout the country, resulting in 151,113 deaths and 451,361 injuries. Data fusion
[3] is a technique for combining various data and knowledge to give a consistent,
accurate, and comprehensive description of an environment or process. Applications
include information systems, process management, autonomous systems, military
systems, and civilian surveillance and monitoring duties. Combining various sensors,
such as sound, vision, and pressure, yields more accurate and comprehensive
information than can be obtained from a single sensor alone.
Sensor fusion lets autonomous vehicles avoid obstacles. Autonomous navigation
drives a car without human intervention. Existing object categorization and
detection methods use photos and videos, but image and video data alone cannot
accurately measure distance. Earlier data fusion work has fused multiple sensors, but
the hardware and computational expenses are high. Fusion ensures reliable object
detection data. Ningthoujam et al. [4] identified objects with a camera and an
ultrasonic sensor: image segmentation determined the object edges, and ultrasonic
sensors measured the sizes of the objects in the image. Bai et al. [5] located the
object in the picture using wayfinding. Ultrasonic range finders and depth cameras
eliminate inaccuracy while measuring distance. Ultrasonic sensors were extensively
evaluated in the laboratory [6], and multiple sensors were recommended instead of
one. A sensor may malfunction if another sensor blocks it. Fusion-based detection
with a depth camera and an ultrasonic sensor was employed in [7, 8]. The ultrasonic
sensor offers 2.61% accuracy for solid, hard surfaces but produces many inaccuracies
for sponge or mesh surfaces. The Kinect depth sensor has 0.89% accuracy and works
with more objects. SURF can detect changes in picture patch size even without
optical flow [9]. Valipoor et al. [10] found that sensors were used for low-range
detection and cameras for mid-to-high range.
The fusion of GPS and a magnetic compass was used in [11] to obtain the location.
The obstacles present in a real-time environment were analyzed in [12] using stereo
vision and ultrasonic sensors. A heat-detecting infrared camera was installed in [13],
allowing the car in front and pedestrians to be spotted in advance. A moving object
was tracked from a static observation point in [14] using a method that combines
stereo vision and the Kanade-Lucas-Tomasi feature tracker. A self-integrated,
low-cost stereo vision system was used to map the three-dimensional environment
through MATLAB-based point cloud creation. A dynamic sub-goal selection approach
used in [15] guides individuals and helps them avoid obstacles. This method was a
key component of a complete navigation system for blind people’s daily walks. A
fusion of color-detecting sensors and obstacle sensors was used in [16], together
with a voice-based support system, to make a person aware of the path they travel as
well as the obstacles in their way. Static barriers were extracted from a series of
pictures using depth maps. Using monocular fisheye cameras in [17], a broader field
of vision was covered and objects closer to the vehicle were detected. Xu et al.
employed two types of security measures [18]: Single_sensor_
2 Methodology
This paper presents an object detection and distance calculation system using a
monocular camera and ultrasonic sensors as shown in Fig. 1.
This system uses a single monocular image as its input, and, once it has identified
an object, it determines the distance between itself and the object. In addition, an
ultrasonic sensor is used to determine the distance. When the distances are the same,
a voice message is produced by the device.
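The decision step described above can be sketched as follows. Two real sensor
readings rarely agree exactly, so this sketch treats the distances as matching when
they differ by no more than a tolerance; the tolerance value, the averaging of the
two readings, and the function names are assumptions for illustration only.

```python
def distances_match(camera_cm, ultrasonic_cm, tolerance_cm=5.0):
    """True when the two estimates agree within the (assumed) tolerance."""
    return abs(camera_cm - ultrasonic_cm) <= tolerance_cm


def fuse_and_announce(camera_cm, ultrasonic_cm, tolerance_cm=5.0):
    """Return the text for the voice message, or None when the readings disagree."""
    if not distances_match(camera_cm, ultrasonic_cm, tolerance_cm):
        return None
    fused_cm = (camera_cm + ultrasonic_cm) / 2.0  # assumed fusion rule: simple average
    return f"Obstacle at {fused_cm:.0f} cm"


print(fuse_and_announce(100.0, 102.0))
print(fuse_and_announce(100.0, 140.0))
```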
The proposed system is implemented using a monocular camera. The image of the
obstacle is captured and resized to 1200 × 600 pixels. The grayscale picture is then
converted to a binary image using a threshold value of 0.5. Standard thresholding is
used if the object is lighter than the background, and inverse thresholding is used
when the object is darker than the background.
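A minimal sketch of the two thresholding modes, assuming a grayscale image stored as
rows of intensities normalized to the range 0 to 1 (the function name and the data
layout are illustrative, not taken from the implementation):

```python
def binary_threshold(gray, thresh=0.5, inverse=False):
    """Map each pixel to 1 (white) or 0 (black) around the threshold.

    inverse=False: pixels brighter than the threshold become white
                   (object lighter than the background).
    inverse=True:  pixels at or below the threshold become white
                   (object darker than the background).
    """
    result = []
    for row in gray:
        if inverse:
            result.append([1 if value <= thresh else 0 for value in row])
        else:
            result.append([1 if value > thresh else 0 for value in row])
    return result


image = [[0.1, 0.9, 0.8],
         [0.2, 0.7, 0.3]]
print(binary_threshold(image))
print(binary_threshold(image, inverse=True))
```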
A contour typically refers to the pixels along a border that share the same color and
intensity. We used the simple chain approximation contour detection algorithm, which
returns only the endpoints required to draw the contour line. Chain approx. none
stores all contour points, whereas chain approx. simple stores only the corner points
and is therefore more memory efficient. Chain approx. simple compresses the
horizontal, vertical, and diagonal segments along a contour and keeps only their
endpoints. Thus, any points along straight lines are removed, leaving only the end
points. Consider a rectangular contour: with the exception of the four corner points,
all contour points are discarded. Because this method does not save all of the
points, it requires less memory and executes faster than chain approx. none. In the
initial phase, the image was read and converted to grayscale. This conversion is
vital because it prepares the image for the subsequent step: for the contour
detection technique to function properly, the image must be a single-channel
grayscale image before thresholding. Binary thresholding or Canny edge detection
should always be applied to the grayscale image before searching for contours. In
this instance, binary thresholding was used. It turns the image black and white,
highlighting the areas of interest and making the task of the contour detection
algorithm easier. Thresholding makes the border of the object in the image uniformly
white, with the same intensity for each pixel. Based on these white pixels, the
software can infer the object’s edges, and the contours in the image are then found.
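The effect of simple chain approximation can be illustrated without any imaging
library. The sketch below assumes a closed contour given as an ordered list of
neighboring pixel coordinates and drops every point at which the direction of travel
does not change. It illustrates the idea only and is not the library routine used in
the system.

```python
def simplify_contour(points):
    """Keep only the points of a closed contour where the step direction changes."""
    n = len(points)
    if n < 3:
        return list(points)
    kept = []
    for i in range(n):
        prev_x, prev_y = points[i - 1]          # wraps around for i == 0
        cur_x, cur_y = points[i]
        next_x, next_y = points[(i + 1) % n]
        step_in = (cur_x - prev_x, cur_y - prev_y)
        step_out = (next_x - cur_x, next_y - cur_y)
        if step_in != step_out:                 # a corner: direction changes here
            kept.append((cur_x, cur_y))
    return kept


# Every boundary pixel of a 4 x 3 rectangle, listed in order around the border.
rectangle = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1),
             (3, 2), (2, 2), (1, 2), (0, 2), (0, 1)]
print(simplify_contour(rectangle))
```

For the rectangle, ten stored points reduce to the four corners, which is the memory
saving the paragraph above describes.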
Contours are used, as described in Algorithm 1, to create an exact boundary around a
detected object that has well-defined edges. A contour is simply a curve that
connects all of the continuous points along the border that have the same color or
intensity. It is represented as an ordered list of 2D vertices (control points)
connected by straight lines of fixed length. Contours are a valuable tool for object
detection and recognition as well as for shape analysis. Binary thresholding is
applied before the contours are extracted. The system creates at most ten bounding
boxes, one for each of up to ten objects in an image.
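Limiting the output to ten boxes can be sketched as ranking the contours by enclosed
area and boxing the ten largest. The area is computed here with the shoelace formula
and the box format is (x, y, width, height); both choices, like the function names,
are assumptions made for illustration.

```python
def polygon_area(contour):
    """Area enclosed by a closed contour (shoelace formula)."""
    twice_area = 0
    n = len(contour)
    for i in range(n):
        x1, y1 = contour[i]
        x2, y2 = contour[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


def bounding_box(contour):
    """Axis-aligned box around a contour as (x, y, width, height)."""
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    return (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))


def largest_boxes(contours, max_boxes=10):
    """Bounding boxes of the max_boxes contours with the largest area."""
    ranked = sorted(contours, key=polygon_area, reverse=True)
    return [bounding_box(contour) for contour in ranked[:max_boxes]]


# Twelve square contours with side lengths 1..12; only the ten largest are boxed.
squares = [[(0, 0), (s, 0), (s, s), (0, s)] for s in range(1, 13)]
print(largest_boxes(squares))
```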
A complete process diagram of the proposed system is shown in Fig. 2.
The monocular image is resized and converted to grayscale to extract the
descriptors. To increase the accuracy, binary thresholding is applied before contour
extraction, and bounding boxes are created for the ten objects with the largest
area. After finding the bounding boxes, we found the focal length and calculated the
distance.
We then compared the distance calculated from the camera with the distance from the
ultrasonic sensor. Finally, voice output was given for the matched distance.
Consider an object of height ‘h’ placed at a distance ‘d’ from the lens, as shown in
Fig. 3, which subtends the angle θ1; when the object is moved by a distance ‘m’, it
subtends the angle θ2.
f = focal length, h = height of the object.
Let ‘OBJ’ be the original position of the object, ‘h’ the height of the object, and
‘f’ the distance between the lens and the CMOS sensor. At ‘OBJ’, the height of the
image formed on the sensor is ‘a’; when the object is moved a distance ‘m’ toward
the lens, the height of the image is ‘b’, which is greater than ‘a’.
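For $\theta_1$,
$$\tan\theta_1 = \frac{h}{d} = \frac{a}{f}$$
For $\theta_2$,
$$\tan\theta_2 = \frac{h}{d - m} = \frac{b}{f}$$
Dividing the first relation by the second gives
$$\frac{a}{b} = \frac{h}{d} \cdot \frac{d - m}{h} = \frac{d - m}{d} = 1 - \frac{m}{d}$$
$$\frac{m}{d} = 1 - \frac{a}{b}$$
$$d = \frac{m}{1 - (a/b)}$$
When an object gets closer to the lens, the size of its image on the sensor increases.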
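The final relation d = m / (1 − a/b) can be checked numerically. In the sketch below
the image heights are generated from an assumed pinhole model (a = f·h/d and
b = f·h/(d − m)); the function name and the sample values are illustrative.

```python
def distance_from_image_heights(a, b, m):
    """Original object distance d from image heights a (before) and b (after
    moving the object a distance m toward the lens): d = m / (1 - a/b)."""
    if not 0 < a < b:
        raise ValueError("expected 0 < a < b: the image must grow as the object approaches")
    return m / (1.0 - a / b)


# Synthetic check with assumed values: focal length 4 mm, object height 1.7 m,
# true distance 10 m, object moved 2 m toward the lens.
f, h, d_true, m = 0.004, 1.7, 10.0, 2.0
a = f * h / d_true
b = f * h / (d_true - m)
print(distance_from_image_heights(a, b, m))  # close to 10.0
```

Note that neither the focal length nor the object height appears in the final
formula; only the two image heights and the displacement are needed.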
The ultrasonic sensor measures the time a pulse takes to travel to the object and
back, but only the one-way travel time is needed. The measured time is therefore
divided by two.
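As a small worked example of this time-of-flight calculation, the sketch below
converts a round-trip echo time into a distance. The speed of sound (343 m/s, dry
air at about 20 °C) and the function name are assumptions.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air at about 20 degrees Celsius


def ultrasonic_distance_m(round_trip_s, speed=SPEED_OF_SOUND_M_PER_S):
    """Distance to the object from the round-trip echo time.

    The pulse travels to the object and back, so only half of the
    measured time corresponds to the one-way distance.
    """
    if round_trip_s < 0:
        raise ValueError("echo time cannot be negative")
    return speed * (round_trip_s / 2.0)


print(ultrasonic_distance_m(0.01))  # about 1.715 m for a 10 ms echo
```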
Fig. 4 Hardware implementation
Before processing, YOLOv3 divides an image into a grid. Bounding boxes, also known
as anchor boxes, surround objects with high classification scores. Each bounding box
is used to identify one object, and its confidence score reflects the accuracy of
the prediction. The anchor boxes are constructed from the most common shapes and
sizes in the initial data set: the ground truth box dimensions are clustered to find
the most common ones.
The first column of Table 1 gives the distance between the camera and the detected
object, and the second column gives the distance between the ultrasonic sensor and
the detected object. The third column gives the difference between these two
distances, reported as the error.
An ultrasonic sensor and a monocular camera were used to detect objects at a
distance. Seven readings were taken for cars, bikes, and people, as given in
Table 1.
The monocular camera detects the objects and estimates their distance, and the
ultrasonic sensor provides a second distance measurement. The analysis compares the
two distances and examines the error between the monocular camera and the ultrasonic
sensor. Table 1 compares the observations across the samples. In some cases, the
difference is as small as 2 cm and, in others, as large as 35 cm. The system achieved
90% accuracy after testing on different objects at different distances. Figure 5
shows the results of detecting multiple objects with the YOLOv3 algorithm.
YOLOv3’s model backbone is DarkNet-53, which has residual blocks and up-sampling
networks. Because of this architecture, YOLOv3 can make predictions at three scales,
using the feature maps of layers 82, 94, and 106. Recognizing features at three
scales makes up for the weaknesses of YOLOv2 and YOLO in recognizing smaller
objects. The technique preserves fine-grained features by concatenating up-sampled
layer outputs with feature maps from earlier layers, which helps identify smaller
objects. YOLOv3 predicts three bounding boxes for each cell, compared to five in
YOLOv2, but it does so at three scales, bringing the total number of anchor boxes to
nine.
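The three-scale prediction scheme can be made concrete by counting the boxes YOLOv3
proposes for one image. The sketch assumes a square 416 × 416 network input and the
usual strides of 32, 16, and 8 for the three detection layers; the input size is an
assumption and differs from the 1200 × 600 resize used elsewhere in the system.

```python
def yolo_v3_box_count(input_size=416, strides=(32, 16, 8), boxes_per_cell=3):
    """Total number of candidate boxes over the three detection scales."""
    total = 0
    for stride in strides:
        cells_per_side = input_size // stride   # 13, 26, and 52 for a 416 input
        total += cells_per_side * cells_per_side * boxes_per_cell
    return total


print(yolo_v3_box_count())  # (13*13 + 26*26 + 52*52) * 3 = 10647
```

The finest grid (stride 8) contributes most of the boxes, which is what improves the
detection of small objects.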
Figure 6 displays the output images that were acquired by a monocular camera
while it was being used to calculate distance.
4 Conclusion
The system we built uses a camera as one modality and an ultrasonic sensor as the
second, combining vision with distance estimation based on ultrasonic waves. The
system calculates the distance with a small, acceptable error, as shown in Table 1.
Accurate object classification is achieved with YOLO version 3. The system has an
accuracy of 90% both in detecting objects and in estimating their distances. It is
able to function in real time and can determine the distance between two points even
if one of them is moving. The proposed system can be used in a number of potential
applications, including route planning, obstacle recognition, and obstacle avoidance
algorithms.
162 J. Madake et al.
References
22. Tapu R, Mocanu B, Zaharia T (2013) A computer vision system that ensure the autonomous
navigation of blind people. In: E-health and bioengineering conference, pp 1–4
23. Aman MdS, Mahmud MdA, Jiang H, Abdelgawad A, Yelamarthi K (2016) A sensor fusion
methodology for obstacle avoidance robot. In: 2016 IEEE international conference on electro
information technology. IEEE, pp 0458–0463
24. Terven JR, Salas J, Raducanu B (2013) New opportunities for computer vision-based assistive
technology systems for the visually impaired. Computer 47(4):52–58
25. Nieuwenhuisen M, Droeschel D, Beul M, Behnke S (2014) Obstacle detection and navigation
planning for autonomous micro aerial vehicles. In: 2014 international conference on unmanned
aircraft systems, pp 1040–1047
26. Matusiak K, Skulimowski P, Strurniłło P (2013) Object recognition in a mobile phone appli-
cation for visually impaired users. In: 2013 6th international conference on human system
interactions, pp 479–484
27. Shahira KC, Tripathy S, Lijiya A (2019) Obstacle detection, depth estimation and warning
system for visually impaired people. In: TENCON 2019—2019 IEEE region 10 conference,
pp 863–868
28. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object
detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition,
pp 779–788
29. Jabnoun H, Benzarti F, Amiri H (2015) Object detection and identification for blind people
in video scene. In: 2015 15th international conference on intelligent systems design and
applications, pp 363–367
Chapter 14
Efficient Approach for Malware
Detection Using Machine Learning
Classifier
1 Introduction
Today, people of every age, from the youngest to the oldest, depend heavily on smart
gadgets, and many of us spend hours every day using them. The use of smart gadgets
has increased rapidly in recent years because every device is Internet enabled and
almost everything is now available on the Internet [1]. At the same time, the risk
has also increased because these devices, knowingly or unknowingly, collect
sensitive information about the user. There have been many incidents in which
attackers stole people’s money online by sending them fake links and malware [2]. A
recent study also shows that Android smartphones are the easiest target for
attackers and are the most vulnerable to virus attacks [3]. In this digital era, we
can carry out money transactions on a smartphone from the comfort of home. But if a
device is infected with malware, there is a serious risk that the account may be
hacked and misused by an attacker. In the past, traditional malware detection
techniques, such as signature-based detection, were widely used, but malware has
evolved over the years, so these traditional techniques have become inefficient and
struggle to detect the latest malware.
We therefore need a newer approach to malware detection. Various research works and
studies have shown that machine learning is a highly suitable technique for
efficiently detecting malware [4]. So, exploring,
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 165
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_14
166 U. V. Nikam and V. M. Deshmukh
implementing, and working with machine learning techniques can provide effective
solutions for malware detection. With this in mind, many machine learning techniques
were studied, and an efficient technique was then developed, which is explained in
this paper. The approach evaluates ten different machine learning algorithms using
metrics such as accuracy, AUC, precision, and recall. The accuracy and efficiency
achieved by each algorithm with our technique are presented in the following
sections of this paper.
This paper provides an efficient technique to detect malware using machine learning
classifiers. The work is implemented using ten machine learning classifiers. The
model is built and trained using the Drebin-215 dataset from Kaggle, which includes
5560 malware and 9476 benign samples [5]. The performance of the classifiers is
measured using AUC, accuracy, precision, recall, and the F1 measure.
The paper is organized as follows. Section 2 reviews related work on malware
detection techniques. Section 3 explains the methodology of the technique used.
Section 4 presents the criteria used to measure the performance of the machine
learning classifiers and discusses the results obtained. Section 5 concludes the
paper.
2 Related Works
Dhalaria et al. [6] used a hybrid approach with static and dynamic features for
malware detection. They created two datasets of their own and applied different
machine learning algorithms to their methodology. The results show that selecting
hybrid features can be effective in detecting malware. The accuracy obtained is
about 96.65%.
Darem et al. [7] created an adaptive behavior-based deep learning model for the
detection of malware. They worked on the concept drift problem and made efforts to
reduce the impact of malware’s evasive behavior by naturalizing the operating system
to look like an actual machine and shading the virtual environment.
Gao et al. [8] used a graph convolutional network for detecting malware. They used
API usage patterns to model the relevance of APIs and classified applications on
that basis. The results show that their GDroid technique outperformed the
alternatives. Their work yields insight into API usage patterns for malware
detection and classification. GDroid achieved an average accuracy of 97%.
Roseline et al. [9] suggested a deep forest model for detecting malware. The
distinguishing features of the proposed model are its deep layering of ensembles and
its low model complexity. This technique outperformed others in malware detection,
with a reported detection rate of 98.65%.
McGiff et al. [10] combined hardware features with some permission data features for
detecting malware. Combining these features improved the model’s performance as well
as its accuracy, so working with a combination of a variety of features can be
effective.
In [11], Noor Azleen Anuar et al. used the opcode analysis method for detecting
malware. Their experiments showed that the frequency of occurrence of an opcode is
higher in malicious applications than in benign ones. From their findings, we can
conclude that the opcode feature may be critical in distinguishing malware from
benign applications.
Shhadat et al. [12] experimented with several machine learning techniques for
malware detection. To reduce the number of features, their work presented an
enhanced feature set obtained with random forest. They evaluated several machine
learning algorithms on a benchmark dataset. Decision trees achieved the highest
accuracy of 98.2% among the algorithms used, while Naive Bayes achieved the lowest
accuracy of 91%.
The study in [17] presents MAPAS, a malware detection system that uses computational
resources adaptably and achieves high accuracy. MAPAS uses convolutional neural
networks (CNN) to analyze the behavior of malicious programs based on API call
graphs. It uses the CNN only to find the common patterns in the API call graphs of
malware. To detect malware efficiently, MAPAS then uses a lightweight classifier
that compares the API call graphs of the applications to be classified with those
used for malicious activity and determines how similar they are. The evaluation
results show that MAPAS can classify applications 145.8% faster than MaMaDroid and
uses about ten times less memory.
This technique is implemented using Spyder, a Python IDE available in the Anaconda
distribution. It has built-in support for many useful machine learning packages that
are used in both supervised and unsupervised work. The results were generated on a
computer with an Intel i5-3317U processor running at 3.20 GHz, 8 GB of RAM, and the
Windows 10 operating system.
3 Methodology
The methodology has three steps: collection of data, extraction and selection of
features, and performance measurement of the machine learning algorithms. All three
steps are depicted in Fig. 1.
The first step of the methodology focuses on collecting the required data. In any
malware detection approach, the choice of dataset plays a very important role.
Fig. 1 Implemented technique
The data collected in this step consist of samples of benign as well as malicious
applications. The Drebin-215 dataset available on the Kaggle website [5] is used to
measure the performance of the classifiers used in this technique. This dataset has
5560 malicious samples and 9476 benign samples and consists of 215 features. The
majority of the features are manifest permissions, which account for 53% of the
total, followed by API call signatures, which account for 33%, and other
permissions, which account for 14%. The presence of each permission in a benign or
malicious application is recorded as a value of 0 or 1. A value of 0 for a
particular permission means that the permission is not needed by that application,
and a value of 1 means it is needed. The dataset has one more column that indicates
whether the application is benign or malicious.
In the second step of the methodology, the dataset is first preprocessed, and the
required features are then extracted from the applications and selected for further
tasks, as discussed below:
Preprocessing of Dataset: A dataset may contain irrelevant data as well as columns
with many missing values. Because of unsuitable data and missing values, errors can
arise when training a model, so the model may not be trained properly and may be
inefficient for malware detection. It is therefore important to preprocess a dataset
carefully before it is used for any type of operation.
Python has many libraries, such as NumPy, Pandas, and scikit-learn [12], whose
features can be used for preprocessing data. This approach uses the SimpleImputer
class of the scikit-learn library with the mean (average value) strategy for
preprocessing the data [13].
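A minimal sketch of mean imputation with scikit-learn's `SimpleImputer` is shown
below. The small array is made-up example data, not part of the dataset used in this
work.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy feature matrix with missing entries marked as NaN (illustrative data only).
X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 6.0]])

# Replace each missing value with the mean of the observed values in its column.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imputer.fit_transform(X)
print(X_filled)
```

Here the first column has observed values 1 and 3, so its gap is filled with 2, and
the second column's gap is filled with 5.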
Extraction of Features: The majority of Android applications are built with Java
code. Compiling this Java code produces byte code files with the .class extension,
and this byte code is then converted into DEX byte code: all the .class files are
combined into a single .dex file using the dx tool. Finally, the Android application
is packed to form the APK. To analyze an APK, its contents need to be extracted and
separated. Many reverse engineering tools, such as dex2jar, Apktool, and jadx, are
available for unpacking and analyzing these applications.
An Android application is essentially an archive bundle. It includes the manifest
file (AndroidManifest.xml), which is very important in any APK and contains many
features required for static analysis. To extract these features, the file must
first be decoded using reverse engineering and extraction software.
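Once the manifest has been decoded to plain XML, the requested permissions can be
read with the Python standard library and turned into the 0/1 feature values
described earlier. The sketch below assumes an already decoded manifest; the sample
manifest, the feature list, and the function names are illustrative and do not
reproduce the interface of any extraction tool.

```python
import xml.etree.ElementTree as ET

# XML namespace used for android:* attributes in a decoded manifest.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.app">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
</manifest>"""


def requested_permissions(manifest_xml):
    """Set of permission names declared in <uses-permission> elements."""
    root = ET.fromstring(manifest_xml)
    names = set()
    for element in root.iter("uses-permission"):
        name = element.get(ANDROID_NS + "name")
        if name:
            names.add(name)
    return names


def to_feature_vector(permissions, feature_names):
    """1 when the app requests the permission, 0 otherwise."""
    return [1 if name in permissions else 0 for name in feature_names]


FEATURES = ["android.permission.INTERNET",
            "android.permission.READ_CONTACTS",
            "android.permission.SEND_SMS"]
print(to_feature_vector(requested_permissions(SAMPLE_MANIFEST), FEATURES))
```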
As can be seen in Fig. 2 (feature extraction and selection), every Android APK file
contains the AndroidManifest.xml and classes.dex files. These two files hold the
permission requests and API calls, which are important features for distinguishing
malware. These features are extracted using AndroPyTool: permission requests are
extracted from the manifest files and API calls from the classes.dex files for
analysis. Many other features can be extracted and used in a similar way, but this
approach focuses only on the permission request and API call features.
Once the features are extracted, the 15 features with the highest importance scores,
which are vital in differentiating between malicious and benign apps, are selected
using the feature importance technique of machine learning. The following section
explains the feature selection technique.
Selection of Features: To achieve accurate results, it is very important to select
relevant features that are effective in distinguishing between malware and benign
apps. In this phase, the dataset is preprocessed with the SimpleImputer class of the
scikit-learn library.
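Selecting the 15 highest-scoring features can be sketched with the
`feature_importances_` attribute of a tree ensemble in scikit-learn. The choice of
`RandomForestClassifier` as the scoring model, the synthetic data, and the function
name are assumptions for illustration; the estimator behind the importance scores
used in this work is not specified here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def top_k_features(X, y, k=15, seed=0):
    """Indices of the k features with the highest importance scores."""
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X, y)
    order = np.argsort(model.feature_importances_)[::-1]  # highest score first
    return order[:k]


# Synthetic 0/1 feature matrix: the label simply copies feature 3,
# so that feature should receive the highest score.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 2, size=(300, 40))
y_demo = X_demo[:, 3]
selected = top_k_features(X_demo, y_demo)
print(selected)
```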
4 Results Obtained
The following parameters are used to evaluate the performance and effectiveness of
machine learning algorithms:
Confusion Matrix: It is a table that summarizes the performance of a machine learning
classifier by comparing predicted and actual classes. This information can be used to
visualize the performance of a model as well as to determine the usefulness of a machine
learning model [14]. The parameters of the confusion matrix are given in Table 1.
• False Positive: A negative sample that the model has detected as positive. The false
positive rate (FPR) is the ratio of false positive samples to all negative samples:
$\mathrm{FPR} = FP / (FP + TN)$.
• False Negative: A positive sample that the model has detected as negative. The
corresponding false negative rate (FNR) is $\mathrm{FNR} = FN / (FN + TP)$.
• Recall: The ratio of correctly detected positive samples to the sum of true positive
and false negative samples. A value of 0.0 means no recall, and 1.0 reflects perfect
recall.
• Precision: The ratio of correctly detected positive samples to the sum of true positive
and false positive samples. A value of 0.0 means no precision, and 1.0 reflects perfect
precision.
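The ratios listed above can be computed directly from the four confusion-matrix counts. The following is a minimal sketch, not the authors' code; the function name `confusion_metrics` and the example counts are hypothetical and are not results from this paper.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return FPR, FNR, recall, and precision from confusion-matrix counts."""
    def ratio(num, den):
        # Guard against empty classes; returning 0.0 here is an assumption.
        return num / den if den else 0.0

    return {
        "fpr": ratio(fp, fp + tn),        # false positives over all negatives
        "fnr": ratio(fn, fn + tp),        # false negatives over all positives
        "recall": ratio(tp, tp + fn),     # true positives over all positives
        "precision": ratio(tp, tp + fp),  # true positives over predicted positives
    }


# Hypothetical counts, for illustration only.
metrics = confusion_metrics(tp=90, fp=5, tn=95, fn=10)
print(metrics)
```

Note that recall and FNR always sum to 1, since both are taken over the same set of actual positive samples.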
Area under ROC Curve: The ROC curve and the area under it (AUC) help in visualizing the
performance of machine learning classifiers more clearly. An AUC of 1 means the
classifier has perfectly differentiated between malicious and benign samples, and
an AUC of 0.5 corresponds to a random prediction. The higher the AUC value, the better
the performance of a classifier [15] (Fig. 4).
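One way to make the AUC values concrete is the ranking interpretation: the AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below uses this pairwise definition; it is an illustration under that assumption, the function name `auc_score` is hypothetical, and the scores are made up.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up scores: a perfectly separating classifier and an uninformative one.
perfect = auc_score([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
chance = auc_score([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5])
```

Under this definition a value below 0.5 means the classifier ranks negatives above positives more often than not.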
Table 2 shows the values obtained for the performance of each machine learning
algorithm:
14 Efficient Approach for Malware Detection Using Machine Learning … 173
4.3 Discussion
From the results shown in Table 2 and the graphs plotted in Fig. 5, it can be
concluded that the algorithms, with the exception of Naive Bayes, have performed very
well. In terms of accuracy, XGBoost, random forest, and kernel SVM are the top three
algorithms. The accuracy of almost all the algorithms is above 97%, except for Naive
Bayes.
In terms of precision and recall, the same three algorithms again have the highest
performance, which is reflected in the F1 measure score as well.
In terms of accuracy, the performance of XGBoost is the highest of all the classifiers
used, i.e., 98.71%, and it has also performed best with respect to precision, recall,
and the F1 measure.
From the results shown in Table 2, XGBoost has achieved the highest accuracy, followed
by the random forest algorithm.
Figure 4 shows the ROC curves of the algorithms used. The false positive rate is
plotted along the x-axis, and the true positive rate is plotted along the y-axis; both
values fall between 0 and 1. In Fig. 4, the area under the curve for the XGBoost
algorithm is 0.990, which is close to the ideal value of 1. So, in terms of the ROC
curve, the XGBoost algorithm's performance is found to be the best of all. Most of the
other algorithms have an AUC value greater than 0.97, indicating good performance.
Only the Naive Bayes AUC value is 0.47, which is below the 0.5 of a random prediction
and is considered poor performance of the model [16].
From the performance results of the various classifiers shown in Table 2 and the
accuracy graph plotted in Fig. 5, it can be stated that XGBoost has the highest
accuracy of all the algorithms evaluated, i.e., 98.71%, and an AUC value of 0.9899. So
it can be claimed that it is the best classifier among those evaluated.
It can therefore be recommended that readers use this algorithm in their malware
detection techniques to achieve significant results.
5 Conclusion
The Drebin-215 dataset was used in this paper to assess the performance of ten
machine learning classifiers. This dataset consists of data from 15,036 malicious and
benign applications. The dataset was divided in a 70:30 ratio, where 70% of the data
was used for training and 30% was used for testing the model. Accuracy, AUC,
precision, recall, and the F1 measure are used for evaluating the performance
of the ten machine learning algorithms.
The results show that the accuracy of the XGBoost algorithm, i.e., 98.71%, is the
highest compared to the other algorithms, and that it has achieved an almost perfect
AUC value of 0.99. With respect to the other parameters, such as precision, recall,
and the F1 measure, the performance of XGBoost is also superior to that of the other
algorithms.
In the future, in the search for more accurate techniques for malware detection,
deep learning algorithms can also be evaluated, and their performance can
be measured with respect to a few parameters.
References
1. Naseer M, Rusdi J, Shanono N, Salam S, Zulkiflee M, Abu N, Abadi I (2021) Malware detection:
issues and challenges. J Phys Conf Ser 1807:012011.
https://doi.org/10.1088/1742-6596/1807/1/012011
2. Nikam UV, Deshmukh VM (2022) Performance evaluation of machine learning classifiers in
malware detection. In: 2022 IEEE international conference on distributed computing and
electrical circuits and electronics (ICDCECE), pp 1–5.
https://doi.org/10.1109/ICDCECE53908.2022.9793102
3. Arslan RS (2021) Identify type of android malware with machine learning based ensemble
model. In: 2021 5th international symposium on multidisciplinary studies and innovative
technologies (ISMSIT), pp 628–632. https://doi.org/10.1109/ISMSIT52890.2021.9604661
4. Ali R, Ali A, Iqbal F, Hussain M, Ullah F (2022) Deep learning methods for malware and
intrusion detection: a systematic literature review. Sec Commun Netw 2022:31. Article ID
2959222. https://doi.org/10.1155/2022/2959222
5. Miranda TC et al (2022) Debiasing android malware datasets: how can I trust your results if
your dataset is biased? IEEE Trans Inform Forensics Sec 17:2182–2197
6. Dhalaria M, Gandotra E (2020) A hybrid approach for android malware detection and family
classification. Int J Interact Multimedia Artif Intell In Press. 1.
https://doi.org/10.9781/ijimai.2020.09.001
7. Darem A, Ghaleb F, Al-Hashmi A, Abawajy J, Alanazi S, AL-Rezami A (2021) An adaptive
behavioral-based incremental batch learning malware variants detection model using concept
drift detection and sequential deep learning. IEEE Access 1–1.
https://doi.org/10.1109/ACCESS.2021.3093366
8. Gao H, Cheng S, Zhang W (2021) GDroid: android malware detection and classification with
graph convolutional network. Comput Secur 106:102264.
https://doi.org/10.1016/j.cose.2021.102264
1 Introduction
Machine learning has a wide range of uses, and healthcare is one of them. For
example, machine learning is used in identifying diseases and making diagnoses,
smart health records, drug discovery and manufacturing, medical image diagnosis,
and machine learning-based behavior modification, among other things. People over
60 and men have a very high risk of heart disease. Nevertheless, people younger than
60 can also be at risk. Obesity is a prominent determinant.
This research focuses on whether a person who is obese also has heart disease,
which is examined by looking at the criteria for obesity. In the future, the researchers
want to find a way to use Human Activity Recognition to detect obesity and, based
on the results, determine how likely someone is to have heart disease. Some of the
conditions known so far to be associated with obesity are high blood pressure and an
increase in LDL. People who are overweight tend to eat too much, while the number of
calories they burn goes down. For girls, PCOS can be one of the reasons
they get heart disease if it is linked to being overweight or obese.
The main objectives of the research are as follows:
1. First, identify the factors that play an important role in making a person obese;
these can be medical factors, lifestyle, or daily habits.
I. Mukherjee (B)
OmDayal Group of Institution, Uluberia, West Bengal, India
e-mail: indranim849@gmail.com
P. Bhattacharjee
Sister Nivedita University, Newtown, Kolkata, West Bengal, India
e-mail: pratikb@ieee.org
S. Biswas
Maulana Abdul Kalam Azad University of Technology, Haringhata, West Bengal, India
e-mail: mailtosuparna@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 177
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_15
178 I. Mukherjee et al.
2. Find the thresholds of each relevant factor that highlight abnormality in a
person's health and habits and make them obese, and define classes accordingly.
3. Find the relations of those factors with heart disease.
4. Compare the experimental results of supervised learning algorithms
(Support Vector Machine, Decision Tree, and Logistic Regression).
5. Try to improve the accuracy if it is not satisfactory.
6. Find a methodology for making hybrid datasets (combining two or more datasets)
and measure the performance.
7. Finally, compare the accuracy of the two methods.
Details of the methodology for combining two or more similar kinds of databases
are discussed in Sect. 3.2.
2 Earlier Works
images are collected from 2000 patients; the procedure succeeds on 1391 cases and is
unsuccessful for 2163 images, with 81.69% accuracy [10]. Three sparsity
learning-based regression models are presented and evaluated for the automated
prediction of Mini-Mental State Examination (MMSE) scores for Alzheimer's disease
using T1-weighted magnetic resonance images (MRIs) of 678 subjects, including 190
healthy control (HC) subjects, 331 mild cognitive impairment (MCI) subjects, and 157
AD subjects. Ridge, lasso, and elastic net are used as regression algorithms, with
five levels of whole-brain volume and the MMSE score as independent variables.
Tenfold cross-validation is used to measure the prediction performance, and another
tenfold cross-validation is used to estimate the optimal parameter [11]. To help in
the diagnosis of sepsis, machine learning techniques such as Backpropagation,
Artificial Neural Network (ANN), Support Vector Machine (SVM), and Random Forest
(RF) classifiers were trained and tested using electronic health record (EHR) data
for 185 critically ill patients, among whom 13 patients were diagnosed as having
heatstroke, 27 with trauma, 9 with severe pancreatitis, and 15 as post-operative.
Meanwhile, 102 of those cases were diagnosed with bacterial sepsis by a
physician through the medical records [12]. A method of monitoring human activity
without smartphones, in which activity recognition is done through posture
identification, has also been proposed [13].
3 Proposed Methodology
3.1 Dataset
Several steps and conditions are followed for making a hybrid dataset.
i. In the first dataset, the values of Cholesterol are pre-classified into
3 classes: 1: normal, 2: above normal, 3: well above normal.
15 Evaluation of a Hybrid Dataset for Risk Assessment of Heart Disease 181
ii. In general, the attribute Cholesterol can be categorized into 4 classes with the
respective threshold values:
a. Cholesterol < 130 (Optimal/Normal) → 1
b. Cholesterol >= 130 and Cholesterol <= 159 (Borderline) → 2
c. Cholesterol >= 160 and Cholesterol <= 189 (High) → 3
d. Cholesterol >= 190 (Very High) → 4
iii. Next, in the first dataset, consider several factors such as age, gender, BMI
(calculated from height and weight), and systolic and diastolic blood pressure, and
check the level of abnormality a patient has.
iv. In the next step, check whether that particular patient has heart disease,
using the target attribute of dataset 1.
v. If yes, check the patient's cholesterol level in dataset 1 and replace it with a
close cholesterol value from dataset 2, taken from a patient who has heart disease
and whose other factors are closely matched.
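The four-class cholesterol thresholds of step ii can be sketched as a small function. This is only an illustration of the thresholds listed above: the function name `cholesterol_class` is hypothetical, the values are taken to be in the units of the source dataset, and the handling of non-integer values that fall between two listed ranges (for example 159.5) is an assumption.

```python
def cholesterol_class(value):
    """Map a cholesterol reading to classes 1-4 using the listed thresholds."""
    if value < 130:
        return 1  # Optimal/Normal
    if value <= 159:
        return 2  # Borderline
    if value <= 189:
        # Assumption: readings strictly between 159 and 160 are placed here.
        return 3  # High
    return 4      # Very High


# Hypothetical readings, one per class.
classes = [cholesterol_class(v) for v in (120, 145, 175, 210)]
```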
It is necessary to check whether any null value exists; however, there is no null value
in this dataset. The selection of features focused on attributes with a high
correlation between being overweight or obese and having heart disease. The relationship
between the chosen attributes and the target variable has been double-checked using
the information gain technique.
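Information gain measures how much the entropy of the target variable drops once a feature is known. The sketch below shows the standard definition for a categorical feature; it is not the authors' code, the names `entropy` and `information_gain` are hypothetical, and the tiny data lists are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Made-up example: one feature that separates the target perfectly, one that does not.
heart_disease = [1, 1, 0, 0]
gain_informative = information_gain(["obese", "obese", "normal", "normal"], heart_disease)
gain_uninformative = information_gain(["a", "b", "a", "b"], heart_disease)
```

A feature with a gain close to the entropy of the target is highly informative, while a gain near zero suggests the feature carries little information about the target.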
Attribute thresholds must be determined before classes can be made for an
attribute. Before that, everyone's BMI was determined using their height and weight.
The classes of the attributes, with their thresholds, are displayed in Table 1.
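The BMI computation from height and weight can be sketched as follows, using the usual definition of BMI as weight divided by the square of height. The function name `bmi` and the assumption that weight is in kilograms and height in centimetres are illustrative; the units of the source dataset may differ.

```python
def bmi(weight_kg, height_cm):
    """Body mass index: weight (kg) divided by the square of height (m)."""
    height_m = height_cm / 100.0  # assumption: height is recorded in centimetres
    return weight_kg / (height_m ** 2)


# Hypothetical person: 70 kg, 175 cm.
example_bmi = bmi(70, 175)
```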
The binary classification problem is studied using three machine learning
algorithms: the Support Vector Machine, the Decision Tree, and Logistic
Regression.
The Support Vector Machine (SVM) is a supervised learning technique used in both
classification and regression; it determines the optimal maximum-margin separating
hyperplane between the two classes [14]. An SVM treats each sample as a point in a
finite-dimensional vector space in which each dimension represents a characteristic
of the sample [15]. The kernel can be linear when the classes are linearly separable;
otherwise, a polynomial or radial basis function kernel might be used.
Classification and regression can benefit from the supervised learning approach
known as a Decision Tree. The input to a Decision Tree is an item or circumstance
characterized by a collection of qualities, and the output is a binary “yes” or “no.”
Both continuous and discrete input values are acceptable. The leaf nodes return class
labels or probability scores [10]. Information gain and entropy calculations are the
basis of Decision Tree.
The categorical dependent variable may be predicted from a collection of inde-
pendent factors using the supervised learning process known as Logistic Regression.
Dependent variables are assigned probabilistic values between 0 and 1, which can
be interpreted as a range. Logistic Regression can be modeled as:
\[ \log \frac{p(x)}{1 - p(x)} = \beta_0 + x\beta \]
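The log-odds form above can be inverted to obtain the predicted probability, p(x) = 1 / (1 + exp(−(β0 + xβ))). The sketch below illustrates this relationship; the function names and the coefficient values are hypothetical and are not fitted values from this study.

```python
from math import exp, log


def predict_proba(x, beta0, beta):
    """Probability p(x) implied by the log-odds model beta0 + x . beta."""
    z = beta0 + sum(xi * bi for xi, bi in zip(x, beta))
    return 1.0 / (1.0 + exp(-z))


def log_odds(p):
    """Left-hand side of the model: log(p / (1 - p))."""
    return log(p / (1.0 - p))


# Hypothetical coefficients chosen so that beta0 + x . beta = 0.
p_half = predict_proba([1.0, 2.0], beta0=-1.0, beta=[0.5, 0.25])
# A linear score of 2 should be recovered by taking the log-odds of p.
p_two = predict_proba([2.0], beta0=0.0, beta=[1.0])
```

A linear score of zero corresponds to a probability of 0.5, which is the usual decision boundary for a binary classifier.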
In the first experiment, i.e., before making the hybrid dataset, the dataset is divided
into training and test parts of 80% and 20%, respectively, by the train-test split
validation technique, and Table 2 shows the resulting accuracy.
Table 2 Accuracy measurement for ML algorithms
Sl. No.   Name of the algorithm   Accuracy (%)
1         SVM (RBF kernel)        72.15
2         Decision tree           73.078
3         Logistic regression     71.85
To gauge performance using the hybrid dataset, the dataset has again been split into
train and test data at a ratio of 80:20 using the train-test split validation approach.
The performance may be evaluated using several measures, including accuracy, the
confusion matrix, precision, and recall. The accuracy under the
three classifiers is shown in Table 2.
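An 80:20 train-test split can be sketched in a few lines. This is a minimal illustration of the idea, not the authors' implementation; the function name `split_80_20`, the fixed seed, and the use of integers in place of dataset rows are all assumptions made for the example.

```python
import random


def split_80_20(rows, test_ratio=0.2, seed=0):
    """Shuffle the rows and return (train, test) with the given test fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


# Integers 0..99 stand in for 100 dataset rows.
train_rows, test_rows = split_80_20(range(100))
```

Shuffling before splitting avoids a biased test set when the rows of the dataset are ordered, for example by class.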
Confusion matrix: The confusion matrix is shown in Table 3, where
TP = True Positive (the actual value is positive and the value predicted by the model
is also positive).
FP = False Positive (the actual value is negative and the value predicted by the model
is positive).
TN = True Negative (the actual value is negative and the value predicted by the model
is also negative).
FN = False Negative (the actual value is positive and the value predicted by the model
is negative).
\[ \text{Precision} = \frac{TP}{FP + TP} \]
\[ \text{Accuracy} = \frac{TN + TP}{TN + FN + TP + FP} \]
\[ \text{Recall value} = \frac{TP}{FN + TP} \]
Table 4 shows the accuracy values, Table 5 shows the recall values, and Table 6
shows the precision values of SVM, DT, and LR on the hybrid dataset, and the
corresponding confusion matrices are shown in Figs. 2, 3 and 4 for SVM, DT, and LR,
respectively.
The accuracy of the proposed methodology is compared with state-of-the-art
methods in Table 7. It is observed that the proposed methodology performed
much better, with an accuracy of up to 98.8%.
The accuracy of the two proposed methods is compared graphically in Fig. 5. It
is found that the accuracy improved significantly after applying the hybrid database.
Fig. 5 The accuracy comparison of two methods (Before and after applying hybrid database)
4 Conclusion
This study's primary objective is to devise a method for increasing the accuracy
of classification algorithms (Support Vector Machine, Decision Tree, and Logistic
Regression) by combining existing databases to create a new one. The results obtained
using this idea are displayed in Table 4; the Decision Tree shows the largest
improvement in accuracy, and the remaining two classifiers improved as well.
Table 5 shows the accuracy comparison of this research with other
existing research, and Fig. 5 shows a bar graph illustrating how the accuracy of our
first method for detecting heart disease with respect to obesity increased after
applying the hybrid database method.
In the long run, this research could be combined with Human Activity
Recognition so that further progress can be made.
References
1. Adegun AA, Viriri S (2020) FCN-based DenseNet framework for automated detection and
classification of skin lesions in dermoscopy images. IEEE Access 8:150377–150396
2. Shah D, Patel S, Bharti SK (2020) Heart disease prediction using machine learning techniques.
SN Comput Sci 1–6, Springer Nature Journal
3. Bhattacharjee P, Biswas S, Roy S (2022) Design of an optimised, low cost, contactless ther-
mometer with distance compensation for rapid body temperature scanning. Int Conf Electr
Electron Eng 503–511. https://doi.org/10.1007/978-981-19-1677-945
4. Bhattacharjee P, Biswas S (2021) Smart walking assistant (swa) for elderly care using an
intelligent realtime hybrid model. Evolving Syst 1–15.
https://doi.org/10.1007/s12530-021-09382-5
5. Motarwar P, Duraphe A, Suganya G, Premalatha M (2020) Cognitive approach for heart disease
prediction using machine learning. In: 2020 international conference on emerging trends in
information technology and engineering (ic-ETITE), pp 1–5
6. Babajide O, Tawfik H, Palczewska A, Gorbenko A, Astrup A, Martinez JA, Oppert J-M,
Sorensen TIA (2019) Application of unsupervised learning in weight-loss categorisation for
weight management programs. In: The 10th IEEE international conference on dependable
systems, services and technologies, DESSERT’2019. IEEE, pp 94–101
7. Roobini S, Fenila Naomi J (2019) Smartphone sensor based human activity recognition using
deep learning models. Int J Recent Technol Eng (IJRTE), 8(1), 2019
8. Rathod J, Waghmode V, Sodha A, Bhavathankar P (2018) Diagnosis of skin diseases
using convolutional neural networks. In: Proceedings of the 2nd international conference on
electronics, communication and aerospace technology (ICECA 2018), pp 1048–1051, IEEE
9. Dwivedi AK (2018) Performance evaluation of different machine learning techniques for
prediction of heart disease. Neural Comput Appl 29(10):685–693
10. Chai Y, He L, Mei Q, Liu H, Xu L (2017) Deep learning through two-branch convolutional
neuron network for glaucoma diagnosis. In: Proceedings of international conference on smart
health. Springer, Berlin, pp 191–201
11. Zhang J, Luo Y, Jiang Z, Tang X (2017) Regression analysis and prediction of mini-mental state
examination score in Alzheimer’s disease using multi-granularity whole-brain segmentations.
In: Proceedings of international conference on smart health. Springer, Berlin, pp 202–213
12. Liu Y, Choi KS (2017) Using machine learning to diagnose bacterial sepsis in the critically
ill patients. In: Proceedings of international conference on smart health. Springer, Berlin, pp
223–233
13. Saha J, Chowdhury C, Biswas S (2017) Device independent activity monitoring using smart
handles. In: 7th International conference of cloud computing data science and engineering, pp
1–6
14. Hamdaoui HE, Boujraf S, Chaoui NEH, Maaroufi M (2020) A clinical support system for
prediction of heart disease using machine learning techniques. In: 5th International conference
on advanced technologies for signal and image processing, ATSIP’ 2020. pp 1–5
15. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
Chapter 16
Distances from Fuzzy Implications
1 Introduction
In the literature, a few works have dealt with the construction of distance functions
using t-norms, copulas, quasi-copulas, and t-conorms, all of which are either
commutative, associative, or monotonic fuzzy logic connectives; see [1, 2, 10]. Recently
[7], the construction of distance functions using a non-commutative, non-associative,
and mixed-monotonic fuzzy logic connective, viz., a fuzzy implication, has been
proposed. The necessary and sufficient condition for the proposed distance function
to be a metric leads to a functional inequality, which has been studied for different
families of fuzzy implications; see [7–9].
Recently, pseudo-monometrics w.r.t. a ternary relation, called the betweenness
relation, have garnered a lot of attention for their essential role in penalty-based
data aggregation, ranking rules, and binary classification [4, 11, 12]. These are a
few applications showcasing the importance of construction of monometrics on a set
equipped with different relational structures. In [9], it was shown that the distance
function proposed through fuzzy implications turns out to be a pseudo-monometric
on a partially ordered set X . In [5], the authors have proposed yet another construction
of distance functions from fuzzy implications on a lattice, which turns out to be a
pseudo-monometric w.r.t. the lattice betweenness relation.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 189
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_16
190 K. Nanavati et al.
In this work, we generalise the distance from fuzzy implications that have been
proposed in [9] using t-conorms. We show the sufficient conditions under which the
proposed distance yields a metric for different t-conorms along with examples and
counter-examples. In this quest, we also give a characterisation of fuzzy implications
I for which the sum of I (x, y) and I (y, x) is constant.
In our work, we show if and when the proposed distance function from fuzzy
implications turns out to be a pseudo-monometric on ([0, 1], ≤).
2 Preliminaries
In this section, we take a look at some definitions and examples that will be useful
in the sequel.
Definition 2 (cf. [3]) A function $I : [0, 1]^2 \to [0, 1]$ is said to be a fuzzy
implication if I is decreasing in the first variable, increasing in the second variable
and satisfies I (0, 0) = 1, I (1, 1) = 1 and I (1, 0) = 0.
Table 2 lists a few examples of fuzzy implications. For more examples; see [3].
x = y =⇒ d(x, y) = 0. (P1)
Further, it is called a metric if the converse of (P1) holds and it also satisfies the
following property for any x, y, z ∈ X :
Definition 5 (cf. [9]) Given a t-conorm S and a fuzzy implication I on [0, 1], the
pair (S, I ) is said to satisfy (S, I )-transitivity if
S(I (x, y), I (y, z)) ≥ I (x, z), for all x, y, z ∈ [0, 1]. (SIT)
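The inequality (SIT) can be checked numerically on a finite grid as a quick sanity test; such a check can expose a counter-example but cannot prove the property on all of [0, 1]. The sketch below assumes the standard forms of the Łukasiewicz t-conorm, S_LK(a, b) = min(1, a + b), and the Łukasiewicz implication, I_LK(x, y) = min(1, 1 − x + y); the function names, the grid size, and the tolerance are hypothetical choices.

```python
def s_lk(a, b):
    """Lukasiewicz t-conorm (standard form, assumed here)."""
    return min(1.0, a + b)


def i_lk(x, y):
    """Lukasiewicz implication (standard form, assumed here)."""
    return min(1.0, 1.0 - x + y)


def satisfies_sit(S, I, steps=20, tol=1e-9):
    """Check S(I(x, y), I(y, z)) >= I(x, z) on a uniform grid of [0, 1]^3."""
    grid = [k / steps for k in range(steps + 1)]
    return all(
        S(I(x, y), I(y, z)) >= I(x, z) - tol  # tolerance absorbs rounding error
        for x in grid for y in grid for z in grid
    )


holds_on_grid = satisfies_sit(s_lk, i_lk)
```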
Theorem 1 (cf. Theorem 1 [7]) d I is a metric iff I satisfies (SLK , I )-transitivity and
satisfies the following condition:
In this section, we shall generalise the distance function given in Definition 6 using
any t-conorm S. We shall then present some sufficient conditions under which our
proposed distance function yields a metric or a pseudo-monometric for the major
t-conorms given in Table 1. We shall provide examples and counter-examples for the
same.
Note that the distance function d I defined in Definition 6 is equivalent to
\[ d_I(x, y) = \begin{cases} 0, & \text{if } x = y,\\ \max(I(x, y), I(y, x)), & \text{otherwise} \end{cases}
= \begin{cases} 0, & \text{if } x = y,\\ S_M(I(x, y), I(y, x)), & \text{otherwise.} \end{cases} \]
Taking a cue from the above definition, we can generalise d I for any t-conorm S.
Definition 7 Let I be a fuzzy implication. Define $d_{I,S} : [0, 1] \times [0, 1] \to [0, 1]$ as
\[ d_{I,S}(x, y) = \begin{cases} 0, & \text{if } x = y,\\ S(I(x, y), I(y, x)), & \text{otherwise.} \end{cases} \]
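Definition 7 translates directly into code once a t-conorm and a fuzzy implication are fixed. The sketch below uses the standard forms of the maximum, Łukasiewicz, probabilistic sum, and drastic t-conorms, and the Reichenbach implication I(x, y) = 1 − x + xy as an illustrative fuzzy implication; whether these coincide exactly with the entries of Tables 1 and 2 is an assumption, and all function names are hypothetical.

```python
def s_m(a, b):
    """Maximum t-conorm."""
    return max(a, b)


def s_lk(a, b):
    """Lukasiewicz t-conorm."""
    return min(1.0, a + b)


def s_p(a, b):
    """Probabilistic sum t-conorm."""
    return a + b - a * b


def s_d(a, b):
    """Drastic t-conorm: max(a, b) if one argument is 0, else 1."""
    return max(a, b) if min(a, b) == 0 else 1.0


def i_rc(x, y):
    """Reichenbach implication (illustrative choice, assumed here)."""
    return 1.0 - x + x * y


def distance(I, S, x, y):
    """d_{I,S}(x, y) as in Definition 7."""
    if x == y:
        return 0.0
    return S(I(x, y), I(y, x))


# I(0.2, 0.5) = 0.9 and I(0.5, 0.2) = 0.6 for the Reichenbach implication.
values = {name: distance(i_rc, S, 0.2, 0.5)
          for name, S in [("S_M", s_m), ("S_LK", s_lk), ("S_P", s_p), ("S_D", s_d)]}
```

Because every t-conorm is commutative, the resulting distance is symmetric in x and y even though the implication itself is not.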
Note that d I,S is always a distance function and satisfies the converse of (P1) only
if I satisfies (2). Also, it need not always satisfy the triangle inequality which can be
seen from the following result.
Lemma 2 Let I be a fuzzy implication as defined in (3), where I does not satisfy
(SLK , I )-transitivity. Then d I ,S is not a metric w.r.t. any t-conorm S.
The following lemma provides a sufficient condition under which d I,S yields a
pseudo-monometric .
Lemma 3 d I,S is a pseudo-monometric if I (x, y) = 0 whenever x > y.
Now, we take a look at the behaviour of d I,S for the major t-conorms given in
Table 1. Recall that for S = SM , d I = d I,SM , and the results pertaining to d I have been
discussed in Sect. 2 (for more details; see [9]). Thus, we shall discuss the remaining
t-conorms in the sequel.
16 Distances from Fuzzy Implications 193
3.1 S = SLK
In this section, we study the sufficient conditions under which the distance function
d I,S yields a metric and a pseudo-monometric when S is the Łukasiewicz t-conorm.
We also give examples and counter-examples for the same. For S = SLK , we get the
following definition for d I,S :
Definition 8 Let I be a fuzzy implication. Define $d_{I,S_{LK}} : [0, 1] \times [0, 1] \to [0, 1]$
as
\[ d_{I,S_{LK}}(x, y) = \begin{cases} 0, & \text{if } x = y,\\ \min(I(x, y) + I(y, x), 1), & \text{otherwise.} \end{cases} \]
\[ d_I(0.1, 0.11) + d_I(0.11, 0.45) = 0.48 + 0.45 = 0.93 < 0.933 = d_I(0.1, 0.45). \]
We thus see from Corollary 1 and the example above that d I,SLK is a richer source
of metrics than d I .
Note that d I,SLK need not be always a metric, as can be seen from the remark
below.
Remark 1 Using the fuzzy implication I given in (4), one can construct a fuzzy
implication I as given in (3). From Lemma 2, we see that d I ,SLK would not be a
metric since I does not satisfy (SLK , I )-transitivity.
Remark 2 From Lemma 3, it is clear that d I,SLK is a pseudo-monometric if I (x, y) =
0 whenever x > y. However, it need not always be a pseudo-monometric, see the
example below.
Example 1 Consider the fuzzy implication I defined as in (4). Then d I,SLK is not a
pseudo-monometric since for the triplet (0.2, 0.3, 0.4), we have
3.2 S = SP
In this section, we study the sufficient conditions under which the distance function
d I,S yields a metric and a pseudo-monometric when S is the probabilistic sum t-
conorm. We also give some examples and counter-examples for the same. For S = SP ,
we get the following definition for d I,S :
Definition 9 Let I be a fuzzy implication. Define $d_{I,S_P} : [0, 1] \times [0, 1] \to [0, 1]$ as
\[ d_{I,S_P}(x, y) = \begin{cases} 0, & \text{if } x = y,\\ I(x, y) + I(y, x) - I(x, y) \cdot I(y, x), & \text{otherwise.} \end{cases} \]
Example 2 Consider the fuzzy implication I defined as in (4). Then d I,SP is a metric.
Note that d I,SP need not always be a metric, see the remark below.
Remark 3 Using the fuzzy implication I given in (4), one can construct a fuzzy
implication I as given in (3). From Lemma 2, we see that d I ,SP would not be a
metric since I does not satisfy (SLK , I )-transitivity.
Remark 4 From Lemma 3, it is clear that d I,SP is a pseudo-monometric if I (x, y) =
0 whenever x > y. However, it need not always be a pseudo-monometric, see the
example below.
Example 3 Consider the fuzzy implication I defined as in (4). Then d I,SP is not a
pseudo-monometric since for the triplet (0.2, 0.3, 0.4), we have
$I(x, y) + I(y, x) = k$, for all $(x, y) \in (0, 1)^2$ where $k \in [0, 2]$. (5)
3.3 S = SD
In this section, we study the sufficient conditions under which the distance function
d I,S yields a metric and a pseudo-monometric when S is the drastic t-conorm. We
also give some examples and counter-examples for the same. For S = SD , we get the
following definition for d I,S :
Definition 10 Let I be a fuzzy implication. Define $d_{I,S_D} : [0, 1] \times [0, 1] \to [0, 1]$
as
\[ d_{I,S_D}(x, y) = \begin{cases} 0, & \text{if } x = y,\\ I(\min(x, y), \max(x, y)), & \text{if } I(\max(x, y), \min(x, y)) = 0,\\ 1, & \text{otherwise.} \end{cases} \]
It is clear that for the fuzzy implications for which I (max(x, y), min(x, y)) = 0 for
all x, y ∈ [0, 1], d I,SD = d I,SM . An instance is the fuzzy implication defined in (3).
Lemma 4 d I,SD is a discrete metric if I (x, y) > 0 whenever x > y, except when
(x, y) = (1, 0).
Note that the converse of the above lemma need not be true. Consider, for example,
the Rescher implication IRS given in Table 2. While IRS (x, y) = 0 whenever x >
y, it still yields a discrete metric. In fact, for any fuzzy implication I , satisfying
I (x, y) + I (y, x) = 1, d I,SD yields a discrete metric.
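The discrete-metric behaviour described above can be checked numerically on a grid. The sketch below assumes the standard forms of the drastic t-conorm, the Łukasiewicz implication (which is positive whenever x > y except at (1, 0)), and the Rescher implication (1 if x ≤ y, 0 otherwise); the least fuzzy implication is included as a contrasting case. All function names are hypothetical, and a grid check is an illustration rather than a proof.

```python
def s_d(a, b):
    """Drastic t-conorm: max(a, b) if one argument is 0, else 1."""
    return max(a, b) if min(a, b) == 0 else 1.0


def i_lk(x, y):
    """Lukasiewicz implication (standard form, assumed here)."""
    return min(1.0, 1.0 - x + y)


def i_rs(x, y):
    """Rescher implication (standard form, assumed here)."""
    return 1.0 if x <= y else 0.0


def i_least(x, y):
    """Least fuzzy implication: 1 if x = 0 or y = 1, else 0."""
    return 1.0 if x == 0 or y == 1 else 0.0


def d_sd(I, x, y):
    """d_{I,S_D}(x, y), written through the drastic t-conorm."""
    return 0.0 if x == y else s_d(I(x, y), I(y, x))


def is_discrete_metric(I, steps=10):
    """True if d_{I,S_D} equals the discrete metric on a uniform grid."""
    grid = [k / steps for k in range(steps + 1)]
    return all(d_sd(I, x, y) == (0.0 if x == y else 1.0)
               for x in grid for y in grid)


results = {
    "lukasiewicz": is_discrete_metric(i_lk),
    "rescher": is_discrete_metric(i_rs),
    "least": is_discrete_metric(i_least),
}
```

For the least fuzzy implication, distinct interior points such as 0.3 and 0.5 are at distance 0, so the converse of (P1) fails and the discrete metric is not obtained.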
Note that d I,SD need not always be a metric or a pseudo-monometric, see the
example below.
\[ d_{I,S_D}(0.3, 0.5) + d_{I,S_D}(0.5, 0.2) = 0.1 + 0.1 = 0.2 < 1 = d_{I,S_D}(0.3, 0.2). \]
Also, it is not a pseudo-monometric since for the triplet (0.2, 0.3, 0.5), we have
Remark 5 One can easily lift the distance function d I,S on [0, 1] to any X ≠ ∅ as
follows:
Consider a mapping f : X → [0, 1]. Define $d^*_{I,S} : X \times X \to [0, 1]$ as follows:
for any x, y ∈ X ,
\[ d^*_{I,S}(x, y) = d_{I,S}(f(x), f(y)) = \begin{cases} 0, & \text{if } x = y,\\ S(I(f(x), f(y)), I(f(y), f(x))), & \text{otherwise.} \end{cases} \]
Clearly, $d^*_{I,S}$ is a distance function and it is a metric if $d_{I,S}$ is a metric.
4 Concluding Remarks
Acknowledgements The third author would like to acknowledge the support obtained from SERB
under the project MTR/2020/000506 for the work contained in this submission.
References
1. Aguiló I, Martín J, Mayor G, Suñer J (2015) On distances derived from t-norms. Fuzzy Sets
Syst 278:40–47
2. Alsina C (1984) On some metrics induced by copulas. In: General inequalities 4. Springer,
pp 397–397
3. Baczyński M, Jayaram B (2008) Fuzzy implications. Studies in fuzziness and soft computing,
vol 231. Springer, Berlin, Heidelberg
4. Gupta M, Jayaram B (manuscript under preparation) On the role of monometrics in nearest
neighbor classification
1 Introduction
Around 1.4 million individuals worldwide lose their lives to traffic accidents
each year, with 3287 people dying on average each day. Road accidents result in
20–50 million additional injuries worldwide every year. One death is estimated to occur
globally every 25 s. Every year, more than 0.147 million individuals in India die,
and more than 0.47 million suffer injuries. A media article claims that more
than 11,000 lives are lost annually in traffic accidents because of fog. Each year,
fog causes over 24,000 injuries, or 16% of all traffic accidents (Organization [1],
Transport [2]). Fog is a collection of extremely fine water droplets suspended near
the earth's surface. Due to a drastic drop in temperature, moisture in the air
is suspended and creates fog. Water droplets with a radius of 1 to 10 µm make up
fog. Whenever light penetrates the fog, it is scattered, which lessens the contrast in
the area. Fog hence creates a thick white veil that reduces visibility, and driving
becomes exceedingly difficult for a motorist. The high altitude in hilly terrain causes
a faster rate of temperature decline than in the plains. As a result of moisture
suspension, thick fog accumulates in the hilly terrain. Because mountainous roads are riskier to
K. Janaki (B)
M.E-Applied Electronics, PSN College of Engineering and Technology, Melathediyoor,
Tirunelveli, Tamilnadu, India
e-mail: janakik905@gmail.com
K. Jebastin
Deparment of Electronics and Communication Engineering, PSN College of Engineering and
Technology, Melathediyoor, Tirunelveli, Tamilnadu, India
e-mail: jebastin@psncet.ac.in
K. Dhinakaran
Senior Tech Lead HCL Technologies, Bangalore, India
e-mail: dhinakarank@hcl.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 199
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_17
200 K. Janaki et al.
drive on than flat ones, they are considered in this work. On a mountainous road,
dense fog affects how drivers perceive their surroundings, making it difficult to see
nearby objects, pedestrians, and even other cars. Too much fog obscures the road
view, and drivers cannot drive at high speeds. As a result, driving becomes
extremely dangerous. The likelihood of an accident increases in two ways: first,
the likelihood of a collision increases, and second, the likelihood of falling down
the slope increases. Figure 1 depicts a mountainous route covered
in dense fog. Some established techniques exist for defogging photos, such as those
taken while driving on the road. However, there is still much to learn about how to
remove the thick fog on uphill routes. To aid drivers in seeing clearly while driving
uphill in heavy fog, this article proposes a rapid, real-time fog removal technique.
The suggested method would be helpful for a safe drive on a mountainous route with poor
visibility (below 100 m). The main contributions of this paper are summarized as follows:
For defogging thick-fog video frames, a least-squares approach based on an atmospheric
scattering model is paired with separate histogram equalization applied to each
color channel. Compared to cutting-edge approaches, these integrated techniques
offer a clear, fog-free output in real time. Rather than estimating ambient light at
every frame, it is estimated at intervals of 6000 frames to cut down on the lengthy
processing time. A dynamic patch is proposed to implement frame
inversion, providing smaller patches for darker pixels and larger patches for brighter
pixels, to reduce computation time significantly without compromising the final frame's
fog-free quality. The dynamic patch approach solves the issue of frame improvement
for the dark and sky regions. The literature study is included in Sect. 2, and the
suggested technique and execution are explained in Sect. 3. The comparison, time
delay analysis, and experimental and simulation results are presented in Sect. 4.
2 Field of Study
There are several existing fog removal algorithms. All of these, nevertheless,
apply to a single image and a particular context, such as daytime, nighttime, or sea
view. A handful of the current methods are listed below. A fog removal technique
based on a guided filter is presented for both images and videos in Lin
and Wang [3].
The filter analyzes the atmospheric light, and the attenuation (which decreases the
contrast) is then restored. Introducing a dark channel prior removes the haze from
images: a dark pixel may be used to determine the haze transmission.
A haze-free image may then be reconstructed by combining soft matting with a
haze-imaging model. In Fattal [4], the light in the hazy input frame is used to estimate the
optical transmission. A scene view without any fog is possible thanks to
the depth map, which also allows for a fast approximation of the transmission map.
An optimum transmission map for removing fog from a single image is created in He
et al. [5]. A boundary prior is added to the initial transmission map after carefully
analyzing the visual model. For nighttime frames, a super-pixel-based fog reduction
approach has been proposed. By utilizing virtual smoothness, the input frames are
separated into glow-free and glow-foggy frames. For visual marine surveillance, Hu
et al. [6] offer a single-image fog removal technique. A scattering model and the
radiance decomposition approach remove the fog layer and the glow effect on the air
light, respectively. The transmission map is then estimated. The suggested radiance
compensation approach also makes it possible to create a frame that is free of fog.
A gamma-correction-prior-based dehazing technique is provided to restore hazy
images.
In this paper, a quick method for removing fog from a driver's field
of vision in dense fog in mountainous terrain is presented. Time complexity
and a clear, fog-free output are the two biggest challenges. The processing time for
each frame is kept relatively brief by the architecture of the
suggested technique. The suggested method combines frame inversion, transmission
map estimation, and recovery of a clear image using the atmospheric light scattering
model. All frames are subjected to separate equalization of each color
channel for significant contrast enhancement. The initial frame's pixel intensity is
used to determine the atmospheric light, which is re-estimated every 6000 frames via a
dynamic patch. The complete structure is organized as a sequence of steps. Real-time
video acquisition is the first stage in frame enhancement. Real-time
video recording is done with a high-definition web camera. The camera is positioned
behind the windshield glass at the driver's eye level to capture the driver's view of
the road. This camera can capture color frames at 31 frames per second at a resolution
of up to 1280 × 720 pixels.
The atmospheric scattering model (Badhe and Ramteke [7], Bai et al. [8], Tian et al. [9],
Toka et al. [10], Maa et al. [11]) is widely used to describe the formation of any
hazy or foggy image frame. It is given by the equation:
$$I(x) = J(x)\,t(x) + A\,(1 - t(x)) \tag{1}$$
where I(x) indicates the input foggy frame, A specifies the atmospheric light, and J(x)
signifies the fog-free output frame. Also, t(x) denotes the transmission (referred to
here as the inverted frame) and is given as:
$$t(x) = e^{-\beta d(x)} \tag{2}$$
where d(x) denotes the depth in the image and β denotes the fog factor (He et al. [12],
Zhu et al. [13]). For a picture taken in clear conditions, β ≈ 0, the fog has no
appreciable impact, and I ≈ J. Conversely, when an image is
taken under heavily foggy conditions, β > 0 and its effect is non-negligible.
In (1), J(x)t(x) is the linear attenuation and A(1 − t(x)) is the atmospheric light term.
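The scattering model in (1) and the transmission in (2) can be illustrated with a short NumPy sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function name `add_fog`, the toy depth map, and the values of `beta` and `A` are hypothetical, and intensities are assumed to be scaled to [0, 1].

```python
import numpy as np

def add_fog(J, depth, A, beta):
    """Synthesize a foggy frame I from a clear frame J using Eqs. (1) and (2)."""
    t = np.exp(-beta * depth)        # Eq. (2): per-pixel transmission, shape (H, W)
    t3 = t[..., None]                # add a channel axis so t broadcasts over RGB
    I = J * t3 + A * (1.0 - t3)      # Eq. (1): attenuated scene plus atmospheric light
    return I, t

# Hypothetical toy inputs: a dark 2 x 2 clear frame and a depth map.
J = np.full((2, 2, 3), 0.2)
depth = np.array([[0.0, 1.0], [5.0, 50.0]])
A = np.array([0.9, 0.9, 0.9])

I_clear, _ = add_fog(J, depth, A, beta=0.0)   # beta = 0: no fog, so I equals J
I_foggy, t = add_fog(J, depth, A, beta=1.0)   # beta > 0: distant pixels approach A
```

With beta set to zero the output equals the clear frame, while for a positive beta the pixels at large depth converge to the atmospheric light, which matches the two regimes described above.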
A full frame is divided into numerous small patches. The quantities t, A, and J are
to be computed from I as part of the fog elimination process (Tufail et al. [14]).
The intensity of the local pixels affects the run-time calculation of the dynamic
local patch Ω(x) as follows:
Each patch has at least one RGB value that is the lowest among the color channels.
A minimum filter is applied to every local patch of the three RGB channels.
This produces a frame with very low intensity. The lowest intensity
$I^{\text{lowest}}(x)$ of any pixel is estimated as:
$$I^{\text{lowest}}(x) = \min_{y \in \Omega(x)} \Big( \min_{c \in \{r, g, b\}} I^{c}(y) \Big) \tag{4}$$
where I represents a sample input frame (Tian et al. [9], He et al. [12], Tufail et al.
[14]), $I^{c}$ is a color channel of I, $I^{\text{lowest}}$ is the lowest intensity of I,
which is nearly 0 for a frame without fog or haze, and Ω(x) denotes the local patch at
location x. Two minimum operators, the minimum over the patch and the minimum over
c ∈ {r, g, b}, together yield the lowest intensity (Fig. 2b, c). The two minimum
operators commute. The atmospheric light (A) is calculated by examining the lowest
intensity $I^{\text{lowest}}$ of the RGB channels. The brightest 0.1%
of its pixels are selected as having the highest
intensity value. The coordinates of these brightest (0.1%) pixels are recorded,
and the peak value of intensity in each RGB color channel is determined separately at
these pixel locations (Yawale and Kapse [15], He et al. [12]). These three RGB
channel intensity values are regarded as the final value for atmospheric light (A).
Thus, A is a 3 × 1 vector in which each entry is the maximum intensity
value of the R, G, or B channel, as follows:
$$A = \Big[\, I^{c}\Big( \arg\max_{x \in (0.1\% \,\ast\, h \,\ast\, w)} I^{\text{lowest}}(x) \Big) \Big]_{c=1}^{3} \tag{5}$$
The atmospheric light (A) is caused by sunlight, which does not change
appreciably from one frame to the next. As a result, atmospheric light (A) is
determined for the first frame and then again every 6000 frames. The pixels in the
input frame with the highest intensity value can be used to determine the ambient
light (A): the average intensity of these pixels (taken from the lowest-intensity map)
is computed, which gives the ambient light.
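The lowest-intensity map of (4) and the selection of the brightest 0.1% of its pixels can be sketched as follows. This is a minimal illustration under assumptions: a fixed odd patch size replaces the paper's dynamic patch (whose rule is not reproduced here), the function names are hypothetical, and the per-channel peak at the selected locations is used, although the text also mentions averaging.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def lowest_intensity(I, patch=3):
    """Eq. (4): minimum over colour channels, then over a patch x patch window.

    Assumes an odd, fixed patch size (an assumption; the paper uses a dynamic patch).
    """
    channel_min = I.min(axis=2)                       # min over c in {r, g, b}
    pad = patch // 2
    padded = np.pad(channel_min, pad, mode="edge")    # replicate borders
    windows = sliding_window_view(padded, (patch, patch))
    return windows.min(axis=(-2, -1))                 # min over y in the patch

def atmospheric_light(I, patch=3, fraction=0.001):
    """Pick the brightest 0.1% of the lowest-intensity map and take the
    per-channel peak of the input frame at those locations."""
    low = lowest_intensity(I, patch)
    n = max(1, int(round(fraction * low.size)))       # at least one pixel
    idx = np.argsort(low.ravel())[-n:]                # flat indices of brightest pixels
    candidates = I.reshape(-1, 3)[idx]                # RGB values at those pixels
    return candidates.max(axis=0)                     # 3-vector A
```

For example, on a dim frame that contains one bright, slightly coloured region, `atmospheric_light` returns the RGB value of that region.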
The inversion of each real-time frame is computed using the atmospheric
light ($A^{c}$). Every pixel of the input frame is divided by the corresponding
constant value in A for each RGB channel (Yawale and Kapse [15], He et al. [12]).
Normalizing Eq. (1) of a hazy frame gives:
$$\frac{I^{c}(x)}{A^{c}} = t(x)\,\frac{J^{c}(x)}{A^{c}} + 1 - t(x) \tag{6}$$
Applying the minimum operators to both sides of Eq. (6), the lowest intensity
is calculated as
$$\min_{y \in \Omega(x)} \Big( \min_{c} \frac{I^{c}(y)}{A^{c}} \Big) = t(x)\, \min_{y \in \Omega(x)} \Big( \min_{c} \frac{J^{c}(y)}{A^{c}} \Big) + 1 - t(x) \tag{7}$$
The transmission is denoted here by t(x). The atmospheric light $A^{c}$ is a positive
constant, and since J is a fog-free output frame, its lowest intensity
$J^{\text{lowest}}$ is almost 0, meaning
$$\min_{y \in \Omega(x)} \Big( \min_{c} \frac{J^{c}(y)}{A^{c}} \Big) = 0 \tag{9}$$
Putting (9) into (7), the transmission t(x) is estimated by
$$t(x) = 1 - \min_{y \in \Omega(x)} \Big( \min_{c} \frac{I^{c}(y)}{A^{c}} \Big) \tag{10}$$
This transmission t(x) is the inverted frame. Equation (10) can be applied to both sky
and non-sky regions, even where the transmission is almost nil. The sky region
does not need to be segmented (Fig. 3).
There is no need to add any constant parameter to purposefully keep even a tiny
amount of fog, because the fog remains dense in hilly locations. Figure 4b displays
an inversion of the input hazy frame.
Fig. 3 Computation time on two different CPUs (CPU1: Intel(R) Core(TM) i5 8250U CPU @
1.60–1.80 GHz with 8 GB RAM, CPU2: Intel(R) Core(TM) i7 8550U @ 4.00 GHz with 12 GB
RAM and 128 GB SSD)
The fog-free scene radiance is restored according to (1), using the computed inverted
frame and the atmospheric light. When the transmission t(x) is close to zero, the
linear attenuation J(x)t(x) can also be close to zero. As the fog is so dense, none of
it is purposefully retained here:
$$J(x) = \frac{I(x) - A}{t(x)} + A \tag{11}$$
As the scene radiance is not as bright as the atmospheric light (A), the frame
after fog removal appears dim. As a result, the exposure of J(x) is increased, as in
He et al. [12].
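The transmission estimate of (10) and the recovery step of (11) can be sketched together as below. The sketch rests on assumptions: a fixed odd patch size stands in for the dynamic patch, the function names are hypothetical, and the small floor `eps` on t(x) is added here only to avoid division by zero; it is not a parameter taken from the paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def patch_min(channel_min, patch=3):
    """Minimum over a patch x patch window (odd patch size assumed)."""
    pad = patch // 2
    padded = np.pad(channel_min, pad, mode="edge")
    return sliding_window_view(padded, (patch, patch)).min(axis=(-2, -1))

def transmission(I, A, patch=3):
    """Eq. (10): t(x) = 1 - min over patch of min over c of I^c(y) / A^c."""
    normalized = I / A                                # per-channel division, Eq. (6)
    return 1.0 - patch_min(normalized.min(axis=2), patch)

def recover(I, A, t, eps=1e-3):
    """Eq. (11): J = (I - A) / t + A, with a hypothetical numerical floor on t."""
    t_safe = np.maximum(t, eps)[..., None]            # guard against t close to 0
    return (I - A) / t_safe + A
```

On a synthetic frame built with (1) from a clear frame that has one zero-valued channel, `transmission` returns the true t and `recover` returns the clear frame, which is the case the derivation in (6) to (10) assumes.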
The haziness of a frame caused by intense fog is practically eliminated after fog-free
recovery. To make the frame clearer and more vivid, however, there is
still room for improvement through contrast adjustment. To obtain the final
view, each frame is subjected to independent histogram equalization of each color channel.
Independent histogram equalization, an image processing technique, redistributes pixel
values within each color channel to increase visual contrast. It was chosen
because it is a quick procedure that, after heavy fog has been cleared, makes noticeable
contrast improvements. The histogram shows how each frame's tonal values are
distributed across all pixels. All of the RGB color channels are equalized. The
possible color levels range from 0 to L − 1; in the case of an 8-bit image, L = 256
and the levels are 0 to 255. The number i stands for a pixel's color
level. The transformation is as follows:
$$s = T(i), \quad 0 \le i \le L - 1 \tag{12}$$
The cumulative distribution function is
$$\mathrm{cdf}(i \le t) = \sum_{k=0}^{t} p_k \tag{13, 14}$$
where $p_k$ is the probability of level k, and
$$s_k = T(i) = \mathrm{floor}\Big( (L - 1) \sum_{k=0}^{i} p_k \Big) \tag{15}$$
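The mapping in (15) can be sketched per color channel as follows. This is an illustrative sketch, assuming 8-bit unsigned integer frames; the function names are hypothetical. Integer arithmetic is used so that the floor in (15) is computed exactly.

```python
import numpy as np

def equalize_channel(channel, levels=256):
    """Apply Eq. (15) to one 8-bit channel: s = floor((L - 1) * cdf(i))."""
    hist = np.bincount(channel.ravel(), minlength=levels)   # pixel count per level
    cum = np.cumsum(hist)                                   # cumulative counts
    # (L - 1) * cum / N, floored with integer division to avoid rounding error
    mapping = ((levels - 1) * cum // channel.size).astype(np.uint8)
    return mapping[channel]                                 # look up each pixel

def equalize_rgb(frame):
    """Equalize the R, G and B channels independently."""
    return np.stack([equalize_channel(frame[..., c]) for c in range(3)], axis=-1)
```

A channel whose values are crowded into a narrow range, for example 100 to 103, is spread over the full range up to 255 by this mapping, which is the contrast gain the text describes.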
Table 1 Comparison of computation time (in milliseconds) with popular state-of-the-art
methods for various frame sizes
Method                             Frame size
                                   1024 × 786    600 × 450    441 × 450
DPC (He et al. [12])               36,896        12,228       9866
CAP (Zhu et al. [13])              4278          2219         1420
FAMED-Net (Zhang et al. [16])      1800          889          508
IDGCP (Ju et al. [17])             1106          500          341
CCR (Wang et al. [18])             2563          850          368
DPCMR (Colores et al. [19])        125.98        48.35        21.36
SSIM (Li et al. [20])              4563          2865         1023
CCEMDCP (Liu et al. [21])          550           318          150
Histogram scattering model         94.82         35.54        18.83
Proposed method                    60.10         20.6         9.7
The most important factor while driving is timing. A major accident is likely
if a motorist cannot see the live road view immediately and without delay. A single
frame's overall processing time should not be excessively long. Between the live
captured input frame and the output video display, there should be a negligible time
difference. For each frame in the proposed method, the total computation time for the
whole operation is only a few milliseconds. A motorist will therefore perceive the
processed footage as real-time live video. In the suggested method, the atmospheric
light (A) is estimated only from the lowest-intensity pixels of the first frame.
It is refreshed every 6000 frames, which reduces the amount of work required.
Following that, frame inversion and independent histogram equalization of each
color channel (for the final contrast adjustment) are performed for each frame. The
total computation time of the proposed technique is shown in Fig. 3 for two different
CPUs. Table 1 shows that the computation times for each frame using the suggested
technique are much shorter than those of other popular current methods.
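The idea of estimating the atmospheric light on the first frame and refreshing it only every 6000 frames can be sketched as a small cache. The class name and the `estimator` callback are hypothetical; any function that maps a frame to an estimate of A could be plugged in.

```python
class AtmosphericLightCache:
    """Re-estimate A only once every `refresh_interval` frames."""

    def __init__(self, estimator, refresh_interval=6000):
        self.estimator = estimator              # hypothetical callback: frame -> A
        self.refresh_interval = refresh_interval
        self.value = None
        self.frames_seen = 0

    def get(self, frame):
        # Frames 0, 6000, 12000, ... trigger a fresh estimate; others reuse it.
        if self.frames_seen % self.refresh_interval == 0:
            self.value = self.estimator(frame)
        self.frames_seen += 1
        return self.value
```

In a per-frame loop, `cache.get(frame)` would replace a direct call to the estimator, so the costly estimate runs once per 6000 frames while transmission estimation and equalization still run on every frame.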
The quality of the images is compared with widely used existing methods using
several densely foggy frames of mountainous routes. Figure 4a displays the captured
densely foggy input frame, and Fig. 4b displays the corresponding outcome.
The majority of the fog is cleared, as seen in
Fig. 4b, but the frame darkens due to an unbalanced contrast. In comparison with Xu's
study, the defogged output of the recommended approach is displayed in Fig. 4b.
Three trustworthy assessment metrics, PSNR, SSIM, and NIQMC (a quality metric for
contrast distortion), are utilized to evaluate the quantitative performance of our
suggested strategy against cutting-edge approaches. The associated MSE is
$$\mathrm{MSE} = \frac{1}{w \times h} \sum_{x=1}^{w} \sum_{y=1}^{h} \big( J(x) - I_{HF}(x) \big)^2$$
where x, w, and h denote the image's pixel positions, width,
and height, respectively. The higher the PSNR value, the better the restored image.
The SSIM index, which is used to measure the similarity between two images, takes
three factors into account in restored images: luminance l(x), contrast
c(x), and structure s(x):
$$\mathrm{SSIM}(x) = f\big( l(x), c(x), s(x) \big) \tag{18}$$
The value of the SSIM index falls between −1 and
1. SSIM = 1 only when two identical images are compared.
NIQMC determines an image's quality based on
its local details and global histogram, with a constant weight used to regulate the
respective significance of the local and global measures. The local and global quality
measurements are denoted here by $Q_L$ and $Q_G$, respectively. The results are quite
comparable in this case, as seen by the high PSNR and SSIM values. NIQMC, similarly,
favors images with stronger contrast.
Therefore, higher NIQMC values imply stronger visual contrast. The best,
second-best, and third-best performances are denoted by the colors red, green, and
blue, respectively. Table 1 shows that one of the compared methods performs worse than
the other approaches across all assessment procedures. The reason for its poor
performance is that it struggles when the hazy input images contain a large number
of dark patches. The proposed method beats most existing strategies in terms of
quantitative performance.
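The MSE above and the PSNR derived from it can be sketched as follows. The PSNR expression used here, 10 log10(MAX² / MSE), is the standard definition and is an assumption in the sense that the paper's own PSNR equation is not reproduced in the text; the function names are hypothetical and 8-bit frames (MAX = 255) are assumed.

```python
import numpy as np

def mse(J, I_ref):
    """Mean squared error between a restored frame J and a reference frame."""
    diff = J.astype(np.float64) - I_ref.astype(np.float64)  # avoid uint8 wrap-around
    return np.mean(diff ** 2)

def psnr(J, I_ref, max_value=255.0):
    """Standard PSNR in dB; identical frames give infinity."""
    err = mse(J, I_ref)
    if err == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / err)
```

A lower MSE yields a higher PSNR, which is why a higher PSNR value indicates a better restored image.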
The whole processing time is only a few milliseconds, as can be seen in Fig. 3.
As a result, there won’t be much of a delay between the camera capturing a real-time
frame and the monitor showing the processed frame.
Several cutting-edge techniques for single image fog removal are taken into
consideration for comparison. Each method’s overall computing time is assessed.
For various frame resolutions (1024 × 786, 600 × 450, and 441 × 450), the proposed
method is here compared against the most recent state-of-the-art methods. Table 1
shows a comparison of computation times.
Additionally, the cloud and sky regions of the recovered images look natural, and
the texture details of the targets have been enhanced. It has also been noted that
some of the compared methods perform less well for sky areas. In particular, Wu et al.
[22] performed worse than the majority of more recent approaches, as evidenced by the
PSNR value. When used for images where the ambient air light was uneven, the method's
larger patch size proved ineffective, and the method was found to perform less
well when the picture is affected by severe haze. It is observed that certain current
approaches produce superior results for a small number of frames, whereas
the values for the remaining frames are similar to those of the suggested method. The
results quantitatively demonstrate, however, that our proposed approach beats previously
known frame-defogging restoration techniques (highest mean value).
Several real-time tests are performed to observe actual driving experiences and
responses. The drivers report a comfortable driving experience. Only
when they perceive that the front view is completely obscured by severe fog and
that there is little to no visibility left do they turn to the display screen. The
suggested system lengthens the visibility distance. As a result, through the display
screen, drivers may see distant obstacles on the road (such as potholes, speed bumps,
or pedestrians). Even in extremely dense fog, drivers report that the displayed view
appears free of fog.
5 Conclusion
This paper describes a quick, efficient defogging method to clear severe fog
from the driver's field of view while driving. By employing the suggested method, a
motorist may navigate any heavily foggy route (such as a road in mountainous terrain)
while maintaining a clear field of view. This method can deliver a clear, fog-free
result in real time with good visibility and a short computation time.
Compared to the current approaches, the dynamic patch size used for estimating
transmission maps reduces the issue of dark and sky regions. Both light and dense fog
may be effectively removed using this method. A variety of real-time driving scenarios
in dense fog were used to evaluate the method. Any vehicle can apply the suggested
method when traveling in heavily foggy conditions, so that a motorist may drive safely
through dense fog, such as on a steep foggy road. The suggested strategy allows for a
safer journey for passengers. If the suggested approach is widely adopted, pedestrians
can cross the road more safely, and there will be fewer traffic collisions, fatalities,
injuries, and delays caused by fog in reaching the destination. The suggested strategy
can be improved in the future by streamlining the defogging procedure. One or more
dynamic strategies can solve the issue of varying sunlight. The visibility distance may
be increased even further, enabling drivers to operate any vehicle or train safely in
dense fog and assisting fighter jets with takeoff and landing maneuvers.
References
1. World Health Organization (2018) Global status report on road safety 2018: supporting
a decade of action. Geneva
2. Transport Research Wing M R T H: Government of India (2017) Road accidents in India 2017.
New Delhi
3. Lin Z, Wang X (2012) Dehazing for image and video using guided filter. Open J Appl Sci
2(4B):123–127
4. Fattal R (2008) Single image dehazing. In: Proceeding of the ACM SIGGRAPH 08, Los
Angeles, California
5. He L, Zhao J, Zheng N, Bi D (2017) Haze removal using the difference-structure-preservation
prior. IEEE Trans Image Process 26(3):1063–1075
6. Hu HM, Guo Q, Zheng J, Wang H, Li B (2019) Single image defogging based on illumination
decomposition for visual maritime surveillance. IEEE Trans Image Process 28(6):2882–2897
7. Badhe MV, Ramteke PL (2016) A survey on haze removal using image visibility restoration
technique. Int J Comput Sci Mobile Comput 5(2):96–101
8. Bai L, Wu Y, Xie J, Wen P (2015) Real time image haze removal on multi-core DSP. In:
Asia-Pacific international symposium on aerospace technology, China
9. Tian Y, Xiao C, Chen X, Yang D, Chen Z (2016) Haze removal of single remote sensing image
by combining dark channel prior with superpixel. In: International symposium on electronic
imaging 2016: visual information processing and communication VII, California, USA
10. Toka V, Sankaramurthy NH, Kini RPM, Avanigadda PK, Kar S (2016) A fast method of fog
and haze removal. In: International conference on acoustics, speech, and signal processing,
Lujiazui, Shanghai, China
11. Maa N, Xu J, Li H (2018) A fast video haze removal algorithm via dark channel prior. In: 8th
international congress of information and communication technology, Xiamen, China
12. He K, Sun J, Tang X (2011) Single image haze removal using dark channel prior. IEEE Trans
Pattern Anal Mach Intell 33(12):2341–2353
13. Zhu Q, Mai J, Shao L (2015) A fast single image haze removal algorithm using color attenuation
prior. IEEE Trans Image Process 24(11):3522–3533
14. Tufail Z, Khurshid K, Salman A, Nizami IF, Khurshid K, Jeon B (2018) Improved dark channel
prior for image defogging using RGB and YCbCr color space. IEEE Access 6:32576–32587
15. Yawale RP, Kapse AS (2016) Digital image defogging using dark channel prior and histogram
stretching method. Int J Adv Res Comput Commun Eng 5(4):889–894
16. Zhang J, Tao D (2020) FAMED-Net: a fast and accurate multi-scale end-to-end dehazing
network. IEEE Trans Image Process 29:72–84
17. Ju M, Ding C, Guo YJ, Zhang D (2019) IDGCP: image dehazing based on gamma correction
prior. IEEE Trans Image Process 29:3104–3118
18. Wang W, Li Z, Wu S, Zeng L (2020) Hazy image decolorization with color contrast restoration.
IEEE Trans Image Process 29:1776–1787
19. Colores SS, Yepez EC, Arreguin JMR, Botella G, Carrillo LML, Ledesma S (2019) A fast
image dehazing algorithm using morphological reconstruction. IEEE Trans Image Process
28(5):2357–2366
20. Li L et al (2020) Semi-supervised image dehazing. IEEE Trans Image Process 29:2766–2779
21. Liu P, Horng S, Lin J, Li T (2019) Contrast in haze removal: configurable contrast enhancement
model based on dark channel prior. IEEE Trans Image Process 28(5):2212–2227
22. Wu Q, Ren W, Cao X (2020) Learning interleaved cascade of shrinkage fields for joint image
dehazing and denoising. IEEE Trans Image Process 29:1788–1801
Chapter 18
Deep Learning-Based Approach
for Outlier Detection in Wireless Sensor
Network
1 Introduction
Outliers are significant deviations from the usual pattern of sensed
data, typically caused by faults in sensors. Faults in a wireless sensor network (WSN)
may occur unexpectedly due to many constraints such as low-power transmitters, limited
energy resources, and environmental impact. As outlier data are unreliable and
inaccurate, they may lead to life-threatening events, since WSNs are widely used in
safety-critical applications. The primary goal of outlier identification in WSNs is to
locate outliers in distributed streaming data online with high detection accuracy while
limiting the network's resource consumption [1].
To our knowledge, the majority of the existing outlier identification techniques
are inapplicable in real-time applications. Once outliers are successfully identified
in real-time data, it is possible to stop the outlier data from entering the
network, which avoids the unnecessary involvement of relay nodes in the transmission of
the outlier data to the sink node.
In this paper, we suggest an unsupervised learning technique based on a generative
adversarial network (GAN). The architecture uses robust continuous clustering, where
the cluster heads use the proposed detection algorithm to detect outliers locally.
B. Sarangi (B)
Biju Patnaik University of Technology, Rourkela, Odisha 769015, India
e-mail: biswaranjan.sarangi@gmail.com
B. Tripathy
GITA Autonomous College, Bhubaneswar, Odisha 752054, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 213
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_18
2 Related Work
Zhang et al. [2] and Ayadi et al. [3] give comprehensive literature reviews on
outlier detection methods in WSN. The criteria used by the authors to categorize the
outlier identification approaches in [2] include input sensor data, outlier type (local
and global), outlier identity, outlier degree, and availability of pre-defined data. They
have divided outlier identification methods into approaches based on nearest neighbors,
statistics, classification, and spectral decomposition.
In order to solve the problem of outlier detection, statisticians employed statis-
tical approaches as their first algorithms in the early nineteenth century [4]. Statis-
tical methods can also be divided into parametric and non-parametric categories. A
time-series analysis and geostatistics method that locates outliers and distinguishes
between errors and events in a distributed and online mode have been proposed in
[4]. In order to define normal behavior, their method makes use of the spatiotem-
poral correlations in WSN data. The strategy based on parametric techniques is
not appropriate in real-world settings because there is no prior knowledge of data
distribution.
A parameter-free outlier detection algorithm is suggested in [5] that calculates the
ordered outlier distance difference factor. The difference in the ordered distances is
taken into account when calculating the outlier score for each data point. Data nearest
for outlier detection (DNOD) is proposed in [6] for unsupervised
outlier detection. This approach seeks to find outlier measurements by analyzing the
learning data that sensors have gathered. Non-parametric methods have a significant
computing cost for handling multivariate data, making them unsuitable for real-time
applications.
To find outliers in sensor nodes, Rajasegarar et al. [7] suggest a global outlier
identification technique based on clustering. Each node clusters the measured data
and reports the cluster summaries rather than sending the measured data to its parent.
The parent then merges the cluster summaries collected from all of its
children and sends them toward the sink. If the average intercluster distance of a
cluster at the sink node exceeds a defined threshold on the intercluster distances, the
cluster is identified as abnormal. In WSN applications, the choice of cluster width is
crucial. Computing distance measurements between all data patterns is computationally
demanding and inappropriate for sensors with minimal resources.
In the fields of machine learning, systematic classification approaches are crucial
[2]. They develop a classification model using the collection of data instances
(training) and classify an ambiguous occurrence into one of the learnt classes (tests).
Unsupervised-based categorization does not require any prior knowledge of labeled
training data. The classification model, which fits the majority of the data examples,
is learned during training. The outlier identification techniques for WSN are based on
Bayesian networks, support vector machines (SVMs), and deep learning, depending
on the type of classification model being used. Although it resolves the multivariate
data issue, it must train on the newly arrived normal dataset.
Using SVM, Rajasegarar et al. [8] suggest an approach for outlier detection in
sensor data. This method makes use of a one-class quarter-sphere SVM to reduce
computational complexity and locate outliers locally at each node. Sensor data that
fall outside of the quarter-sphere are considered anomalous. In [9], the
authors suggested two distributed and online outlier detection algorithms based on a
one-class hyper-ellipsoidal SVM. They have considered the correlation between the
sensor data attributes.
For the purpose of detecting outliers and events in WSNs, a thorough analysis
of several one-class SVMs, including the hyper-plane, hyper-sphere, quarter-sphere,
and hyper-ellipsoid, is provided in [10].
In [11], a method for detecting outliers called the support vector data description
based on spatiotemporal and attribute correlations (STASVDD) is proposed. This
method assumes that once the collected data vectors are independently and uniformly
distributed in WSNs, outliers can independently occur in every attribute.
In [12], autoencoder neural networks are used to solve the outlier detection
problem in WSN. The authors have developed a two-part algorithm, whose parts reside
on the sensor nodes and in the cloud, respectively. The anomalies are detected in a
distributed manner at the sensor nodes without having to communicate with any other
sensor nodes or the cloud. Time-series-based recurrent autoencoder ensembles are
proposed to detect outliers in [13]. The two proposed solutions exploit sparsely
connected recurrent neural networks (S-RNNs), which enable the design of multiple
autoencoders with different neural network connection structures.
3 Proposed Approach
$$C(U) = \frac{1}{2} \sum_{i=1}^{n} \lVert x_i - u_i \rVert_2^2 + \frac{\lambda}{2} \sum_{(x_p, x_q) \in E_u} w_{p,q}\, \rho\big( \lVert u_p - u_q \rVert_2 \big) \tag{1}$$
Here, the weights $w_{p,q}$ balance the contribution of each data point to the pairwise
terms, and λ balances the strength of the data terms and pairwise terms, whereas
an appropriate robust penalty function ρ(·) is applied to the regularization terms.
A graph $G_u$ is constructed on the optimized value of U in which a pair $x_p$ and
$x_q$ is connected if $\lVert u_p - u_q \rVert_2 < \delta$. The outputs, the subsets
$k_u$ and $k_a$, are created from the unlabeled data and discovered anomalies. When
compared to the subsets separated by similar outputs, the subsets are partitioned in a
way that faithfully captures the
latent cluster structure of the complex data structure.
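The clustering objective in (1) and the graph construction with threshold δ can be sketched as below. Several choices are assumptions made only for illustration: the Geman-McClure function is used as one possible robust penalty ρ (the text does not name a specific penalty), clusters are read off as connected components of the thresholded graph, and all function names are hypothetical.

```python
import numpy as np

def geman_mcclure(y, mu=1.0):
    """One possible robust penalty rho (an assumption, not named in the text)."""
    return mu * y ** 2 / (mu + y ** 2)

def rcc_objective(X, U, edges, weights, lam, rho=geman_mcclure):
    """Evaluate C(U) from Eq. (1) for data X, representatives U and an edge list."""
    data_term = 0.5 * np.sum((X - U) ** 2)
    p, q = edges[:, 0], edges[:, 1]
    dists = np.linalg.norm(U[p] - U[q], axis=1)       # ||u_p - u_q||_2 per edge
    pairwise_term = 0.5 * lam * np.sum(weights * rho(dists))
    return data_term + pairwise_term

def clusters_from_representatives(U, edges, delta):
    """Connect x_p and x_q when ||u_p - u_q||_2 < delta; return a cluster id per point."""
    parent = list(range(len(U)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]             # path compression
            i = parent[i]
        return i

    for p, q in edges:
        if np.linalg.norm(U[p] - U[q]) < delta:
            parent[find(p)] = find(q)                 # merge the two components
    return [find(i) for i in range(len(U))]
```

Points whose representatives end up closer than δ share a cluster id, while a representative that stays far from all others forms its own small component, which is the kind of subset that could be treated as anomalous.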
GAN, as suggested by Goodfellow et al. [15], is a method for estimating generative
models through an adversarial mechanism with two models: a discriminator (D), which
distinguishes between real and generated data, and a generator (G), which creates data
to fool the discriminator, as shown in Fig. 1.
As suggested in [15], D and G play a two-player minimax game with respect to a
joint loss function V(D, G), which is given by
$$V(D, G) = \mathbb{E}_{x \sim P_{\text{data}}(x)}\big[ \log D(x) \big] + \mathbb{E}_{z \sim P_z(z)}\big[ \log(1 - D(G(z))) \big] \tag{2}$$
For generated samples $G_{\text{auto}}(z_i)$, where z is drawn from a latent space
distribution, the generator G implicitly determines the probability distribution. The
discriminator is then trained so that the average negative cross-entropy between its
predictions and their sequence labels is as low as possible. Thus, the discriminator
loss is given by
$$D_{\text{loss}} = \frac{1}{M} \sum_{i=1}^{M} \big[ \log D_{\text{auto}}(x_i) + \log(1 - D_{\text{auto}}(G_{\text{auto}}(z_i))) \big] \tag{3}$$
The discriminator loss must be minimized to recognize that xi is real and Gauto(zi)
is false. The generator is trained to confuse the discriminator so that the discriminator
recognizes as many of the generated samples as real as possible.
The generator loss is given by
$$G_{\text{loss}} = \frac{1}{M} \sum_{i=1}^{M} \log(1 - D_{\text{auto}}(G_{\text{auto}}(z_i))) \tag{4}$$
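The two batch quantities in (3) and (4) can be evaluated from discriminator outputs as in the sketch below. The expressions follow the equations exactly as printed, including their sign; the function names and the small clipping constant `eps` (added only to avoid log(0)) are assumptions, and no network or training loop is implied.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Eq. (3): mean of log D(x_i) + log(1 - D(G(z_i))) over a batch of size M."""
    d_real = np.clip(d_real, eps, 1.0 - eps)   # hypothetical guard against log(0)
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    return np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss(d_fake, eps=1e-12):
    """Eq. (4): mean of log(1 - D(G(z_i))) over a batch of size M."""
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    return np.mean(np.log(1.0 - d_fake))
```

Here `d_real` holds the discriminator outputs on real samples and `d_fake` its outputs on generated samples, both as probabilities in (0, 1). The generator quantity decreases as the discriminator assigns higher probabilities to generated samples.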
At the end of module training, the threshold is evaluated using precision and recall.
The trained module will then be deployed to all the cluster heads. Updated W, b, and
threshold values are scheduled to be sent periodically from the sink or cloud to the
cluster heads. Clusters closer to the base station are smaller, which reduces the
energy spent on data processing in those clusters. As shown in Fig. 2, the cluster size
increases with the distance from the sink node. Each cluster head runs a
copy of the GAN. The cloud collects all sensor readings from the individual cluster
heads. For each cluster head in the network, the sink node or the cloud maintains one
copy of the GAN, i.e., n copies of the GAN assuming that there are n cluster heads
in the network. Each copy of the GAN represents a cluster and is periodically trained in
the cloud using the sensor data received from the respective cluster head.
4 Experimental Results
In order to evaluate the effectiveness of the suggested method, experiments are carried
out on synthetic data using the Python library Pymote 2.0. In this experiment, both the
discriminator and generator are trained and the threshold is obtained experimentally.
For training, 80% of data and for testing 20% of data from synthetic dataset are used.
The following metrics are considered for performance evaluation
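\[
\text{Accuracy rate} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{5}
\]
\[
\text{Precision } (P) = \frac{TP}{TP + FP}, \tag{6}
\]
\[
\text{True Positive Rate (Recall) } TPR = \frac{TP}{TP + FN}, \tag{7}
\]
\[
\text{False Positive Rate } FPR = \frac{FP}{FP + TN}, \tag{8}
\]
\[
F1 = \frac{2\,(\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}}. \tag{9}
\]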
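The metrics in Eqs. (5) to (9) follow directly from the four confusion-matrix counts. The sketch below is illustrative only: the function name and the sample counts are assumptions, and a ratio with a zero denominator is reported as 0.0 by convention.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Eqs. (5)-(9) from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Convention chosen for this sketch: an undefined ratio is 0.0.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)                      # Eq. (6)
    recall = ratio(tp, tp + fn)                         # Eq. (7), TPR
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),  # Eq. (5)
        "precision": precision,
        "recall": recall,
        "fpr": ratio(fp, fp + tn),                      # Eq. (8)
        "f1": ratio(2 * precision * recall, precision + recall),  # Eq. (9)
    }


# Hypothetical counts, not taken from the paper's experiments.
metrics = classification_metrics(tp=40, tn=50, fp=5, fn=5)
```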
Different precision and recall values for different threshold values are shown
in Fig. 3. The threshold at which the precision curve intersects the recall curve is
called the optimum threshold, and its value is found to be 0.9. Figure 4 shows the
reconstruction error for various test data points. The outliers are the data points above
the threshold line.
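The threshold selection described above can be sketched as a sweep over candidate thresholds that keeps the one where precision and recall are closest. This is a hedged illustration: the reconstruction errors, labels, and candidate grid are made up, and treating precision as 1.0 when nothing is flagged is a convention chosen for this sketch.

```python
def precision_recall(errors, labels, threshold):
    """Flag a point as an outlier when its reconstruction error exceeds threshold."""
    tp = sum(1 for e, y in zip(errors, labels) if e > threshold and y == 1)
    fp = sum(1 for e, y in zip(errors, labels) if e > threshold and y == 0)
    fn = sum(1 for e, y in zip(errors, labels) if e <= threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0  # nothing flagged: no false alarms
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def optimum_threshold(errors, labels, candidates):
    """Return the candidate at which the precision and recall curves are closest."""

    def gap(threshold):
        precision, recall = precision_recall(errors, labels, threshold)
        return abs(precision - recall)

    return min(candidates, key=gap)


# Hypothetical reconstruction errors; label 1 marks a true outlier.
errors = [0.2, 0.4, 0.5, 0.7, 0.95, 1.3, 1.8, 0.85]
labels = [0, 0, 0, 0, 1, 1, 1, 0]
best = optimum_threshold(errors, labels, candidates=[0.3, 0.6, 0.9, 1.2])
outliers = [e for e in errors if e > best]  # points above the threshold line
```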
Our model draws a decision boundary around the normal data; it identifies all
discrete outliers and part of the group outliers in the synthetic dataset,
as shown in Fig. 5.
A confusion matrix is a frequently used table to assess a classification model’s
performance on a test dataset where the true values are known. The confusion matrix
for the suggested strategy is shown in Fig. 6.
Table 1 compares the suggested method’s performance with those of state-of-the-
art solutions.
220 B. Sarangi and B. Tripathy
5 Conclusion
The main goal of the outlier detection method is to spot misbehaving nodes and
prevent the outlier data that these nodes report from entering the network. In this
research, we develop a robust continuous clustering-integrated online outlier
identification approach based on GAN. An optimal threshold for outlier detection
is determined experimentally. The performance with regard to accuracy, TPR, FPR,
precision, and F1 is compared with state-of-the-art techniques. Our model shows
an accuracy of 95.7% with a low FPR of 28.42%.
References
1 Introduction
Cancer is the second leading cause of death worldwide and is responsible for about
one in six deaths [1]. Renal cell carcinoma (RCC), which accounts for almost 90% of
all cases of kidney cancer, is by far the most prevalent type of kidney cancer [2].
Cancer prediction places emphasis on the predisposition to, recurrence of, and
diagnosis of cancer. The main aim of cancer identification is to classify tumor
categories and find associated indicators that help build a classifier able to
recognize a particular type of advanced cancer or to discover cancer at an early stage.
Deep learning (DL), a family of multilayer neural network models, is a branch of
machine learning, which in turn is a subset of artificial intelligence. It excels at
learning from large amounts of data, often called "big data" [3]. Like other machine
learning approaches, deep learning has two stages: a training stage, in which the
network parameters are estimated from a specified training dataset, and a testing
stage, in which the trained network is used to predict the results for new input
data. The development of DL models with enhanced precision and interoperability for
cancer type prediction was made possible by the collection of whole-transcriptome
data from tumor specimens.
CNNs have recently become the de facto standard for segmenting kidney tumors owing
to their superior performance compared with other models in conventional computer
vision and medical image analysis. CNN models can be trained to generate 3D feature
hierarchies using internal data.
K. Rai (B)
G.L. Bajaj Institute of Technology and Management, Greater Noida, India
e-mail: kajal.rai@glbitm.ac
P. Kumar
School of Computer Applications, Lovely Professional University, Punjab, India
e-mail: pawan.11522@lpu.co.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 223
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_19
224 K. Rai and P. Kumar
2 Related Work
In recent times, deep learning models built on CNNs have shown promising results on
a number of medical image analysis tasks. CNNs, which consist of many layers, have
been under development since Fukushima's work at the end of the 1970s, and by 1995
they were also being used to examine medical images [7]. The authors of [8]
segmented computed tomography (CT) images using a 2D CNN. Researchers have employed
a variety of pattern analysis methods in their work, including Resnet50, Resnet50V2,
Modified CNN, InceptionV3, 3D U-Net, V-Net, ReLU, and GoogleNet. Myronenko et al. [9]
introduced an end-to-end, boundary-aware CNN for accurate semantic segmentation of
kidney tumors in arterial-phase abdominal 3D CT images.
19 Predicting Kidney Tumor Using Convolutional Neural Network (CNN) 225
3 Research Methodology
The research methodology used in this paper consists of several phases, which are
depicted in Fig. 2.
The kidney data were gathered from the Picture Archiving and Communication System
(PACS) of different hospitals in Bangladesh. Table 1 shows the dataset used, with
the number of instances of each type.
3.2 Preprocessing
Data preprocessing transforms raw data into a format that can be used for model
construction. Images were cropped to remove unnecessary portions, and the patients'
information was removed from the images. The images were then converted into JPEG
format. After the conversion, each image finding was confirmed again by a
radiologist and a medical technologist to verify the correctness of the data. This
research work also includes preprocessing tasks such as attribute selection,
cleaning of missing values, and splitting the dataset into training and testing
sets. Some attributes, such as the serial number, are removed because they do not
contribute to classification.
In this research, a convolutional neural network (CNN) model is presented that
categorizes tumor and non-tumor instances into their appropriate categories based
on unstructured gene expression.
3.4 Classification
Classification is done to predict which images show cancer and to which category
each image belongs (Cyst, Stone, etc.). Accuracy is one of the most important
measures for evaluating classification models. It is the fraction of predictions
that the model got correct, i.e., the ratio of correct predictions to the total
number of predictions, as given in Eq. (1).
\[
\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}} \tag{1}
\]
Accuracy can also be measured in terms of positives and negatives for binary
classification as follows:
\[
\text{Accuracy} = \frac{\text{True Positive} + \text{True Negative}}{\text{True Positive} + \text{True Negative} + \text{False Positive} + \text{False Negative}} \tag{2}
\]
The results obtained from CNN are analyzed and summarized based on accuracy.
4 Experimentation
Python, a language widely used in machine learning to build models and make
predictions, is used for the experiments. The dataset is downloaded from
Kaggle [12]. All the data are in image format (JPEG). Python libraries such as
Seaborn and Keras are used for training and testing the CNN models.
First, the dataset is loaded. Figure 3 shows a glimpse of the image dataset.
Figure 4 displays the total number of instances in the four classes. We then
split the dataset randomly into training, testing, and validation sets: 11,200
images for training, 621 images for testing, and 1249 images for validation of
the results. A 2D CNN sequential model was used for the experiments.
Figure 5 shows the model generation details.
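A random three-way split with the set sizes reported above can be sketched as follows. The function name, the fixed seed, and the use of integer identifiers in place of image files are assumptions made for illustration; the authors' actual splitting code is not given in the paper.

```python
import random


def split_dataset(items, n_test, n_val, seed=0):
    """Shuffle items and split them into training, validation, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Stand-in identifiers for 11,200 + 621 + 1249 images.
image_ids = range(11200 + 621 + 1249)
train, val, test = split_dataset(image_ids, n_test=621, n_val=1249)
```

Shuffling once and slicing guarantees that the three sets are disjoint and together cover every image.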
After model generation, the model was trained, tested, and validated, and the
results are reported in terms of precision, recall, accuracy, and loss. Figures 6
and 7 present the training and validation results for different numbers of epochs.
Both figures show clearly that the accuracy of the model increases with the number
of training epochs.
We also predicted kidney tumors on the test dataset and obtained an accuracy of
about 99% on average on the given dataset. The training and test data follow an
80:20 split. Figure 8 shows the confusion matrix as a heat map for the training
data, and Fig. 9 shows the confusion matrix for the test data.
Tables 2 and 3 show the classification reports of the predicted results on the
training data and test data, respectively.
Prompt and accurate identification is crucial for the timely diagnosis of cancer
and for lowering its high death rate. In particular, some types of kidney cancer
may not exhibit symptoms until a very late stage and may remain localized in the
kidneys without spreading to other organs. It is therefore essential to increase
prediction accuracy by using up-to-date, advanced techniques when treating cancer.
Numerous studies on various cancer types have been conducted recently, especially
using machine learning and deep learning approaches. In this paper, a CNN model is
developed that categorizes tumor and non-tumor instances into their designated
cancer categories or as normal based on unstructured gene expression. CT data are
used to train and test the model; the dataset has 12,446 unique data points,
including 3709 cysts, 5077 normals, 1377 stones, and 2283 tumors. The model was
100% accurate on the training data owing to over-fitting, whereas on the test data
the accuracy lies between 96 and 100%, i.e., about 98.6% to 99% on average. As the
number of training epochs increases, accuracy and precision increase and the model
loss decreases. Segmentation of kidneys and renal malignancies has largely been met
with great success and serves as a foundation for further development, although
applying these techniques to test data from outside the sampled population would
be challenging.
References
1. Siegel RL, Miller KD, Jemal A (2018) Cancer statistics, 2018. CA: Cancer J Clin 68(1):7–30.
https://doi.org/10.3322/caac.21442
2. American Cancer Society. About kidney cancer. www.cancer.org/cancer/kidney-cancer/about.html
3. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444. https://doi.org/10.1038/nature14539
4. Mu G, Lin Z, Han M, Yao G, Gao Y (2019) Segmentation of kidney tumor by multi-resolution
VB-Nets. Univ. Minn. Libr., pp 1–5
5. Magadza T, Viriri S (2021) Deep learning for brain tumor segmentation: a survey of state-of-
the-art. J Imaging 7–19
6. Kumar P, Sharma M (2021) Feature-importance feature-interactions (FIFI) graph: a graph-
based novel visualization for interpretable machine learning. In: 2021 international conference
on intelligent technologies (CONIT). IEEE, pp 1–7
7. Lo S-CB, Lou S-LA, Lin J-S, Freedman MT, Chien MV, Mun SK (1995) Applications for lung
nodule detection. IEEE Trans Med Imaging 14:711–718
8. Thong W, Kadoury S, Piché N, Pal CJ (2018) Convolutional networks for kidney segmentation
in contrast-enhanced CT scans. Comput Methods Biomech Biomed Eng Imaging Vis 6:277–
282
9. Myronenko A, Hatamizadeh A (2019) Edge-aware network for kidneys and kidney tumor
semantic segmentation. University of Minnesota Libraries Publishing, Mankato, MN, USA
10. Aljaaf AJ et al (2018) Early prediction of chronic kidney disease using machine learning
supported by predictive analytics. IEEE Congress on Evolutionary Computation (CEC) 1–9
11. Marsh JN, Matlock MK, Kudose S, Liu T-C, Stappenbeck TS, Gaut JP, Swamidass SJ (2018)
Deep learning global glomerulosclerosis in transplant kidney frozen sections
12. Kaggle: Data Science Community. https://www.kaggle.com/datasets/nazmul0087/ct-kidney-dataset-normal-cyst-tumor-and-stone
Chapter 20
Hybrid Machine Learning Approach
for Sentiment Analysis of Amazon
Products: A Survey
Om Sarulkar, Rahul Pitale, Shivam Tikhe, Rohan More, and Sumit Giri
1 Introduction
In the modern world, media platforms, online retail, and e-commerce play a
significant part in forming online communities and allowing people to voice their
views and ideas on any topic. For instance, Amazon retail, a subsidiary of Amazon
Inc., is a well-known online store. It gives users the option to post and discuss
their opinions about any item available on the platform, which generates a huge
amount of data classified as semi-structured. Sentiment analysis is used to explore
and assess these data in order to uncover crucial information about the reviewed
items and to understand people's sentiment. Sentiment analysis (SA), also known as
opinion mining, is an integral branch of natural language processing (NLP), the
branch of machine learning concerned with understanding human language. In this
study, we look at different machine learning algorithms used by researchers to gain
insight into the sentiment of product reviews on the Amazon retail website. We
evaluate recent supervised classification algorithms, and combinations of them,
that have been used to identify sentiment in Amazon product reviews, in order to
find the one that can deliver trustworthy and accurate results. This method may
then be used as a starting point for Amazon review analysis, categorization tasks,
recommendation systems, and so on. An accurate and reliable system for deducing
product sentiment can broaden its range of application to movie reviews, service
reviews, etc.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 235
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_20
236 O. Sarulkar et al.
Amazon is one of the biggest internet merchants in the world and has expanded
steadily since its inception as an online platform in 1994. It now offers over 12
million goods and has 200 million active users accessing the store from their PCs
or phones, making it a rich source of user-supplied reviews. Amazon offers a
variety of items such as books, phone applications, movies, apparel, gadgets, toys,
and so on, and uses a star-based rating system ranging from 1 to 5 stars (1 = least,
5 = most) together with an option to write a review. An example of the system is
shown in Fig. 1.
This rating system comes with no instructions on how to use it, and product reviews
are subjective and personal. As a result, a user might give an excellent product a
"1" because of a bad user experience, such as dissatisfaction with the quality or a
compromised delivery, and vice versa. The lack of rules makes it challenging to
identify the user's feelings about individual product aspects and components of the
purchasing experience. Moreover, a "5" rating does not always correspond to the
written review of an item. Sentiment analysis is performed to gain more information
from the product reviews.
3 Sentiment Analysis
Opinion mining, also known as sentiment analysis, is one of the research areas
within NLP. It investigates people's opinions by leveraging textual data that are
readily available on e-commerce sites like Amazon. It focuses on the portion of the
text, a word or a sentence, that points in a positive or negative direction. By
offering businesses a thorough understanding of how customers feel about their
products, SA plays a vital role in the commercial sphere. As a result, businesses
may modify their strategies to meet customer expectations and requests and avoid
losses. SA can also help potential buyers choose the items they want to purchase.
3.2 Approach
Dictionary Based
WordNet, SentiWordNet, and online dictionaries are just a few examples of opinion
dictionaries that often feature both positive and negative opinions. This approach
looks for words with ambiguous meaning in the text, compares them to terms from
the dictionary, and then calculates the appropriate scores. This approach cannot find
views that are domain- or context-specific.
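A minimal sketch of dictionary-based scoring is given below. The six-word lexicon and its scores are hypothetical stand-ins for a real resource such as SentiWordNet, and the function names are illustrative.

```python
import re

# Hypothetical miniature opinion dictionary: word -> polarity score.
LEXICON = {"good": 1, "great": 2, "excellent": 2,
           "bad": -1, "poor": -1, "terrible": -2}


def lexicon_score(text, lexicon=LEXICON):
    """Sum the dictionary scores of all words found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


def lexicon_label(text, lexicon=LEXICON):
    """Map the summed score to a polarity label."""
    score = lexicon_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_label("Great phone, excellent battery"))       # positive
print(lexicon_label("Terrible packaging and poor quality"))  # negative
print(lexicon_label("The battery died quickly"))             # neutral
```

The last review is clearly negative in the product domain, yet it contains no dictionary word and is scored as neutral, which illustrates the domain- and context-specific limitation noted above.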
Corpus Based
To find the domain- or context-specific views that dictionary-based techniques are
unable to find, the corpus-based approach identifies opinionated keywords in the
corpus and assigns a polarity to each of these words. It calls for an English
dictionary or a dictionary with a sizable word definition database, which the
algorithm must be able to access and retrieve.
Hybrid Machine Learning
Hybrid machine learning is a method where two or more machine learning algorithms
are used together to obtain better results. Results of one model are used to augment
the input to another model. This kind of ensemble learning improves the quality of
data when it is fed to the classification model.
4 Literature Review
The literature survey was conducted on recent developments in the field of
sentiment analysis, primarily on Amazon product reviews. Figure 3 shows the process
followed for the survey. First, the application of supervised classification
algorithms to Amazon product reviews was studied. After surveying these recent
studies, papers that combine the best-performing classification algorithms were
surveyed. To improve the accuracy of existing algorithms, researchers have
implemented artificial neural networks (ANN) for the classification process.
Lastly, the application of ANN was surveyed.
We begin by looking at related work that uses traditional supervised learning
algorithms and reports the performance of the resulting machine learning models.
The algorithms in focus are support vector machines (SVM), Naive Bayes (NB), and
Decision Trees.
The authors in [1] compared three classification algorithms: SVM, NB, and Maximum
Entropy (ME). As the number of training data points increased, the performance of
SVM improved more than that of NB, and ME was the poorest performer. However, SVM
suffered when unigrams were used in preprocessing. In [2], the authors used six
different classification models along with fivefold and tenfold cross-validation.
SVM performed best with tenfold cross-validation, the limitation being that tenfold
cross-validation takes a large amount of time. The paper surveyed in [3], however,
tried the NB and OneR classification methods. OneR performed better but took a very
large amount of time, while NB was faster with similar results. In [4], an ensemble
classifier beat the other machine learning algorithms it was compared with,
including logistic regression, SVM, Naive Bayes, Decision Tree, and Multinomial.
In [5], the authors combined a bigram model with SVM, and this hybrid algorithm
gave the highest accuracy, 85%. In [6], the authors compare two machine learning
approaches, SVM and NB, for analysing the sentiment of customers' reviews of Amazon
products; SVM offers much greater accuracy, precision, and recall.
The authors of [7] analyse a dataset of Amazon reviews and investigate sentiment
categorization using several machine learning techniques. The reviews were first
converted into word vectors using a variety of methods, including GloVe, TF-IDF,
and bag-of-words. Then, they trained several machine learning models, including
BERT, Naive Bayes, bidirectional long short-term memory (BiLSTM), random forest,
and logistic regression. The models were then assessed using cross-entropy loss,
precision, F1-score, accuracy, and recall. In [8], the authors examine
preprocessing procedures on the dataset, such as stemming, tokenization, casing,
and stop word removal, and finally assign each review a score that categorizes it
as negative or positive. In [9], we see a rise in accuracy when unstructured data
are used. The model achieves an accuracy of 98% with the Naive Bayes algorithm and
93% with SVM.
In [10], the authors performed a context-based analysis for Amazon products. The
data were collected from the Amazon product site and preprocessed for analysis.
They used the Naive Bayes and Support Vector Machine models to classify the reviews
and then performed the context-based analysis. Performance measures, i.e.,
precision, recall, and F1-scores, were calculated, and the models were compared on
that basis. The aim of the work was to improve sales based on the sentiments
identified, with every product assessed for whether its reviews lean positive or
negative. In [11], the authors performed sentiment analysis of products using
machine learning. They gathered the data from the Amazon product site for the
following products: Cameras, Laptops, Tablets, and Televisions. The data are
treated with a preprocessing technique.
The preprocessing technique used is bag-of-words (BOW). The data are then used to
train Naive Bayes and support vector machine classifiers. The Naive Bayes
classifier achieved accuracies of 90% and above for each product, whereas the
support vector machine classifier performed worse, with accuracies below 90%.
Thus, Naive Bayes was superior to SVM in sentiment analysis. The authors of [12]
conducted a sentiment analysis of user reviews for Amazon items. They gathered the
information from the Amazon product page, performed some rudimentary preprocessing
on it, and then used it directly for model training. Decision Tree, Naive Bayes,
and Support Vector Machine were the algorithms used for the study. The authors of
[13] collected the information from the Amazon goods page. The data were then
analysed using review-level and sentence-level classifications. The categorisation
relied on part-of-speech (POS) tagging. These data were then supplied for model
training, and the classification algorithms Naive Bayes and support vector machine
were trained.
[14] describes a categorization method that the authors developed for a dataset of
music CDs and Microsoft goods collected using a Python crawler. They looked at five
different categories (most negative, negative, neutral, positive, and most
positive). The paper used three different types of adverbs as features, namely
adverbs (RB), comparative adverbs (RBR), and superlative adverbs (RBS), as well as
a mixture of them, to achieve review-level classification. The classifiers used
were RF, DT, NB, SVM, GB, and LSTM. The analyses show that a single RBR feature is
adequate for most classifiers, with the exception of LSTM and NB, and that a
combination of RBR-RBS features is more effective for all classifiers.
The authors of [15] made use of the Amazon polarity dataset for their study. They
used the deep learning models LSTM and CNN together with SVM and logistic
regression. Each model was tested on a sizable dataset. The best combination was
found to use stemming rather than lemmatization and to omit spell checking. They
investigated and analysed several preprocessing strategies that increase accuracy.
They used a variety of feature techniques, including TF-IDF, bag-of-words, and
n-grams.
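Several of the surveyed studies rely on k-fold cross-validation, for example the fivefold and tenfold runs in [2]. The sketch below shows how the folds can be formed; the function name is an illustrative assumption, and contiguous (unshuffled) folds are used only to keep the example short.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    # The first n_samples % k folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Ten samples in five folds: every sample is tested exactly once.
folds = list(k_fold_indices(n_samples=10, k=5))
```

Because a separate model is trained for every fold, tenfold cross-validation requires ten training runs, which is consistent with the time cost reported in [2].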
We now move on to hybrid machine learning approaches, in which techniques such as
ensemble learning are used to change NLP rules or augment the input data.
Researchers have tried to improve the input data fed to the classifier models.
In [16], SVM and NB are used as classification models, but their input data are
enriched using reputation scores. This method uses previous data to assign weights,
which introduces a dependency on those data. In [17], the authors categorise the
training dataset using SVM and then apply k-means for clustering. This model
outperformed the individual classifiers. The authors in [18] used KNN for grouping
data and NB and LSTM for classification. LSTM provided better accuracy but suffered
when the dataset was large. In [19], the authors compared ensemble learning with
Naive Bayes and SVM. The ensemble method gave much better results than the other
two. In [20], the techniques used are data cleaning and preprocessing, and the
results on the dataset are presented as graphs. This work reports the highest
accuracy, almost 95.7%. In [21], the authors tried a hybrid rule-based approach and
compared it with algorithms such as SVM, RF, and NB. The hybrid rule-based approach
obtained better results. In [22], the authors used RF to form an ensemble of
decision trees. The tree data structure was used with SVM to form a classifier model.
The hybrid model showed a 2% rise in accuracy. The authors of [23] revisited the RF
ensemble method paired with SVM. They achieved a greater accuracy than [24] with
the same dataset. The bootstrap method was used as an extension of Random Forest.
The authors of [24] employed an ensemble learning method in data preprocessing, in
which unigrams, bigrams, and trigrams with and without stop word removal were used.
RF with unigrams and stop word removal showed the best results. In [25], the
researchers applied natural language processing to Arabic-language product reviews.
They built a dataset of Arabic-language reviews and a recurrent neural network for
sentiment analysis of those reviews. The model performs at a considerable 85% on
the given dataset, which contains 7480 test items, and is expected to behave more
precisely when trained with more data.
Tables 1 and 2 show the comparison between different research approaches based
on the literature review.
From the comparison tables, we can deduce that conventional supervised learning
algorithms perform worse than hybrid methods. In [8, 10, 11], we observe that
enhancing the data preprocessing improves the accuracy significantly. The use of
hybrid methods, i.e., ensemble learning, helps the classifier algorithm and
improves its performance.
Figure 4 shows the steps and the workflow researchers have followed to come up
with the conclusions of their sentiment analysis research.
The goal of this stage is to import the data, eliminate unneeded columns, deal with
missing values, and so on, to prepare the data for further processing; the Pandas
Python library may help a lot in this step. A suitable dataset must be established
before the text can be analysed and classified.
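The import step can be sketched with the standard library alone, which keeps the example free of dependencies; the Pandas library mentioned above offers the same operations on data frames. The column names and the four sample rows are hypothetical.

```python
import csv
import io

# Hypothetical review export; the column names are assumptions.
RAW = """review_id,product,rating,review_text
1,Camera,5,Sharp pictures and easy to use
2,Laptop,,Battery drains fast
3,Tablet,1,
4,Camera,4,Good value for the price
"""

KEEP = ("rating", "review_text")  # all other columns are eliminated


def load_reviews(handle):
    """Read reviews, keep the needed columns, and drop rows with missing values."""
    rows = []
    for row in csv.DictReader(handle):
        record = {name: (row.get(name) or "").strip() for name in KEEP}
        if all(record.values()):  # skip rows where any kept field is empty
            record["rating"] = int(record["rating"])
            rows.append(record)
    return rows


reviews = load_reviews(io.StringIO(RAW))
```

Rows 2 and 3 are discarded because the rating or the review text is missing, leaving two complete records for the later stages.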
Data Preparation
After the text has been obtained, the data must be prepared for use in the
subsequent machine learning procedures. Preprocessing removes data that are useless
for text categorization, such as punctuation, digits, accent marks, stop words,
sparse terms, white spaces, and specific words, because such noisy data may affect
the classifier's accuracy. Other components of this stage include conversion of
words to lower case, tokenization, stemming, lemmatization, part-of-speech tagging,
and so on. The Natural Language Toolkit (NLTK) is the preferred tool for this stage.
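The preparation steps listed above can be sketched in a few lines. To stay self-contained, the example uses a tiny hypothetical stop-word list and a crude suffix stripper instead of the NLTK stop-word corpus and stemmers; both are stand-ins and would be replaced in practice.

```python
import re

# Tiny hypothetical stop-word list; a real one is much longer.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "it", "this", "of", "to", "in"}
SUFFIXES = ("ing", "ed", "ly", "s")  # crude stand-in for a real stemmer


def crude_stem(word):
    """Strip the first matching suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def preprocess(text):
    """Lower-case, drop digits and punctuation, tokenize, filter, and stem."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # removes digits, punctuation, accents
    tokens = text.split()                  # whitespace tokenization
    return [crude_stem(token) for token in tokens if token not in STOP_WORDS]


tokens = preprocess("The camera WAS working perfectly, and it arrived in 2 days!")
# tokens == ['camera', 'work', 'perfect', 'arriv', 'day']
```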
Feature Extraction and Selection
Features must describe the data in the format the machine learning algorithm needs
in order to find a solution. Feature extraction combines and reformats the initial
characteristics using a number of approaches (such as TF-IDF, POS, N-grams, Word
Embedding, BOW) to create a new collection of features that can be used by machine
learning models. Feature selection then dismisses everything except the important,
helpful, and informative components. By removing redundancy or keeping a
predetermined number of features, it avoids overfitting and the curse of
dimensionality, which occurs when there are too many features relative to the
amount of data available. The extraction and selection of features have a
significant impact on the classifier's accuracy, so the best technique for
obtaining the features must be selected. The Scikit-learn package has a number of
built-in algorithms that can be quite useful in this situation.
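As an illustration of one of the feature extraction approaches named above, the sketch below computes TF-IDF weights for tokenized reviews. It uses one common textbook variant (relative term frequency times the logarithm of N over document frequency); library implementations such as those in Scikit-learn use smoothed variants, so the numbers differ. The three sample documents are made up.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [["good", "camera"], ["bad", "camera"], ["good", "good", "lens"]]
vectors = tf_idf(docs)
```

In the third document, "lens" receives a higher weight than "good" even though "good" occurs twice, because "lens" appears in no other document and is therefore more distinctive.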
This stage involves determining the polarity of the review documents using a number
of sentiment classification techniques; in SA, supervised learning techniques are
often used to assign the sentiment label to a specific text. SA problems are of one
of two types: binary problems, with positive and negative labels, and multi-class
problems, which specify more than two labels (most positive, positive,
neutral, negative, and most negative). Python library for machine learning and data
In this last step, the machine learning techniques are evaluated to establish the
overall accuracy of the sentiment analysis. The models generate labels of 1 and 0
as their result. A confusion matrix is then created by comparing these labels with
the true labels, yielding true positives (TP), false positives (FP), true negatives
(TN), and false negatives (FN). True positives and true negatives are cases in
which the model predicts the true label correctly, while false positives and false
negatives are cases the model got wrong. The performance metrics obtained from the
confusion matrix, computed with the statistical metric functions of the
Scikit-learn toolkit to assess each algorithm, are accuracy (1), precision (2),
recall (3), and F1-score (4).
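How the confusion-matrix counts arise from the 1/0 labels can be sketched as follows. The function names and the eight sample labels are illustrative assumptions; in practice the metric functions of Scikit-learn would be applied to the same two label lists.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, and FN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def evaluate(y_true, y_pred):
    """Return accuracy, precision, recall, and F1-score for binary labels."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


# Hypothetical true labels and model outputs for eight reviews.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
scores = evaluate(y_true, y_pred)
```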
6 Proposed Work
We saw that the supervised machine learning algorithms could achieve an accu-
racy of only. Paired with ensemble machine learning methods, the accuracy only
increases by at most 2%. The proposed methodology in this paper aims to improve
the already existing random forest ensemble method by removing the covariance in
data preprocessing. This method will form better random forest ensembles and will
try to improve the accuracy of supervised machine learning algorithms. Figure 5
illustrates the proposed methodology.
The support vector machine model will receive input data that have already been
broken down into decision trees, which is intended to improve the performance
metrics of the SVM classifier.
References
1. Rathor AS, Agarwal A, Dimri P (2018) Comparative study of machine learning approaches
for Amazon reviews. Procedia Comput Sci 132:1552–1561
2. Haque TU, Saber NN, Shah FM (2018) Sentiment analysis on large scale Amazon product
reviews. In: 2018 IEEE international conference on innovative research and development
(ICIRD). IEEE
3. Singh J, Singh G, Singh R (2017) Optimization of sentiment analysis using machine learning
classifiers. HCIS 7(1):1–12
4. Brownfield S, Zhou J (2020) Sentiment analysis of Amazon product reviews. In: Proceedings
of the computational methods in systems and software. Springer, Cham
5. Maurya S, Pratap V (2022) Sentiment analysis on amazon product reviews. In: 2022
international conference on machine learning, big data, cloud and parallel computing
(COM-IT-CON), pp 236–240. https://doi.org/10.1109/COM-IT-CON54601.2022.9850758
6. Dey S, Wasif S, Tonmoy DS, Sultana S, Sarkar J, Dey M (2020) A comparative study of support
vector machine and naive bayes classifier for sentiment analysis on Amazon product reviews.
In: 2020 international conference on contemporary computing and applications (IC3A), pp
217–220. https://doi.org/10.1109/IC3A48958.2020.233300
7. AlQahtani ASM (2021) Product sentiment analysis for amazon reviews. Int J Comput Sci
Inf Technol (IJCSIT) 13(3). Available at SSRN: https://ssrn.com/abstract=3886135
8. Nandal N, Tanwar R, Pruthi J (2020) Machine learning based aspect level sentiment analysis
for Amazon products. Spat Inf Res 28:601–607. https://doi.org/10.1007/s41324-020-00320-2
9. Jagdale RS, Shirsat VS, Deshmukh SN (2019) Sentiment analysis on product reviews using
machine learning techniques. In: Mallick P, Balas V, Bhoi A, Zobaa A (eds) Cognitive infor-
matics and soft computing. Advances in intelligent systems and computing, vol 768. Springer,
Singapore. https://doi.org/10.1007/978-981-13-0617-4_61
10. Sindhu C, Rajkakati D, Shelukar C, Chandra Sekharan S (2020) Context-based sentiment
analysis on Amazon Product customer feedback data. https://doi.org/10.1007/978-981-15-5329-5_48
11. Jagdale R, Shirsath V, Deshmukh S (2019) Sentiment analysis on product reviews using
machine learning techniques: proceeding of CISC 2017. https://doi.org/10.1007/978-981-13-0617-4_61
12. Singla Z, Randhawa S, Jain S (2017) Sentiment analysis of customer product reviews using
machine learning. In: 2017 international conference on intelligent computing and control
(I2C2). IEEE
13. Fang X, Zhan J (2015) Sentiment analysis using product review data. J Big Data 2:5. https://
doi.org/10.1186/s40537-015-0015-2
14. Kausar S, Huahu X, Ahmad W, Shabir MY, Ahmad W (2020) A sentiment polarity categoriza-
tion technique for online product reviews. IEEE Access 8:3594–3605. https://doi.org/10.1109/
ACCESS.2019.2963020
15. Katić T, Milićević N (2018) Comparing sentiment analysis and document representation
methods of amazon reviews. In: 2018 IEEE 16th international symposium on intelligent systems
and informatics (SISY), pp 000283–000286, https://doi.org/10.1109/SISY.2018.8524814
16. Benlahbib A, Nfaoui EH (2020) A hybrid approach for generating reputation based on opinions
fusion and sentiment analysis. J Organ Comput Electron Commer 30(1):9–27 (2020)
17. Korovkinas K, Danėnas P, Garšva G (2019) SVM and k-means hybrid method for textual data
sentiment analysis. Baltic J Mod Comput 7(1):47–60
18. Budhwar MJ, Singh S (2021) Sentiment analysis based method for Amazon product reviews.
Int J Eng Res Technol (Ijert) Icact 9(08) (2021)
19. Sadhasivam J, Babu R (2019) Sentiment analysis of Amazon products using ensemble machine
learning algorithm. Inter J Math Eng Manage Sci 4:508–520. https://doi.org/10.33889/IJM
EMS.2019.4.2-041
20. Iqbal F et al (2019) A hybrid framework for sentiment analysis using genetic algorithm based
feature reduction. IEEE Access 7:14637–14652. https://doi.org/10.1109/ACCESS.2019.289
2852
21. Dadhich A, Thankachan B (2022) Sentiment analysis of amazon product reviews using hybrid
rule-based approach. In: Smart systems: innovations in computing. Springer, Singapore, pp
173–193
22. Al Amrani Y, Lazaar M, El Kadiri KE (2018) Random forest and support vector machine-based
hybrid approach to sentiment analysis. Procedia Comput Sci 127:511–520
23. Al Amrani Y, Lazaar M, El Kadiri KE (2018) A novel hybrid classification approach for
sentiment analysis of text document. Int J Electr Comput Eng 8(6), 2088–8708 (2018)
24. Alrehili A, Albalawi K (2019) Sentiment analysis of customer reviews using ensemble method.
Int Conf Comput Inf Sci (ICCIS) 2019:1–6. https://doi.org/10.1109/ICCISci.2019.8716454
25. Alroobaea R (2022) Sentiment analysis on Amazon product reviews using the recurrent neural
network (RNN). Int J Adv Comput Sci Appl 13(4) (2022)
Chapter 21
Sentimentum: A Method of Detecting
Fake News
1 Introduction
In recent years, fake news has attracted growing interest in society. Events like
Brexit [2], the 2016 US presidential election, and, more recently, the COVID-19
pandemic contributed to this growth. In social media, fake news spreads more widely
than in traditional media like TV, radio, and newspapers. Social media allows any
user to spread news in a few seconds, which also allows any user to spread large
amounts of fake news in seconds.
There is no universal definition of fake news, but some concepts are always invoked
when discussing it. Although imprecise, these definitions help us to understand the
topic and the research problems related to it [3].
The authors of [3] argue that some concepts are related to fake news, such as biased
news and deceptive discourse. However, what distinguishes fake news from these
concepts is that the author of the false information intends to obtain an advantage,
whether economic or political, from its dissemination. In addition, fake news spreads
fast on the network, often with the help of bots.
Given this scenario and the significant results obtained by machine learning
approaches in other problems [3–8], this paper adapts the method presented in the
paper Detecting Deceptive Discussions in Conference Calls (D3C2) to the detection of
fake news, using machine learning algorithms and natural language processing
techniques.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_21
V. da Silva Souza and L. A. Silva
This paper is organized as follows. Section 2 describes the key concepts applied
in this study. Section 3 presents the key concepts of the paper Detecting Deceptive
Discussions in Conference Calls [9], which inspired our method for detecting fake
news statements, called Sentimentum. The proposed approach is detailed in
Sect. 4. Finally, Sect. 5 presents the final considerations, as well as possibilities
for future research.
To understand fake news detection, we first need to define what fake news is.
According to [3], there is no universal definition of fake news, but some concepts
help us to understand it. Fake news can be understood as the intentional distribution
of unreliable news, disseminated in media like newspapers, television, radio, and
social media, that seeks political, economic, or social benefits [10]. Fake news
detection is the task of evaluating news claims and classifying them as true news or
fake news. According to [11], there are seven types of fake news: satire or parody;
false connection; misleading content; false context; imposter content; manipulated
content; and fabricated content. Fig. 1 details the meaning of each of these seven
types.
The automatic detection of fake news is the task of evaluating statements in news
and classifying them as true or false (true news or fake news) [4]. With the
dissemination of fake news in social media, traditional document structuring
techniques of natural language processing (NLP), such as bag-of-words or n-grams, are
used, but they have the following limitations [9]:
• As they are based on word counts, they do not consider the context in which a
word is used.
• In the case of n-grams, the processing has a high computational cost for greater
values of n.
In this paper, we adapt the method used in the paper Detecting Deceptive Discussions
in Conference Calls (D3C2) to the context of fake news detection. In D3C2, the
authors perform a linguistic and syntactic analysis of texts extracted from the
conference calls in which companies present their quarterly financial statements [9].
The calls from these conferences were transcribed into text and served as the basis
for a model that predicts the probability of an error in the disclosure of quarterly
reports. The conferences analyzed span the period from September 2003 to May 2007.
The purpose of this method was to identify misleading statements made by the CEOs
and CFOs of these companies at quarterly income statement conferences.
The authors argue that CEOs and CFOs often have real knowledge of the data, but
for economic reasons, they may present intentionally false information. This type
of analysis interests researchers, investors, creditors, and financial market regulatory
bodies as it manages to capture misleading disclosures more accurately [9].
To carry out the linguistic and syntactic analysis, the authors rely on the
literature review in [12], which takes four perspectives from psychology as a
premise: emotions, cognitive effort, attempted control, and lack of embracement.
To extract the linguistic and syntactic features from the text, the authors use the
Linguistic Inquiry and Word Count (LIWC) software, extracting words associated with
the LIWC categories from the text, on the premise that these categories are the
ones best suited to the detection of deceptive speech [9].
The LIWC software reads the text, compares each word with the word list of its
internal dictionary, and calculates the percentage of the total words in the text
that match each of the dictionary’s categories. Internally, LIWC applies the “bag
of words” model, which represents the text as a vector of words, counting how
many times a given word appears in the text. The difference between LIWC and the
plain “bag of words” model is that LIWC counts only the words found in its
category dictionary. This dictionary has specific categories associated with
psychology, so the software counts the number of words that occur in each LIWC
category [9].
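The dictionary-based counting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the real LIWC dictionary is proprietary, so the three categories and their word lists below are hypothetical, and the tokenizer is a simple regular expression rather than the processing LIWC actually performs.

```python
import re
from collections import Counter

# Hypothetical toy dictionary; the real LIWC category word lists are not reproduced here.
TOY_DICTIONARY = {
    "anger": {"hate", "furious", "attack"},
    "anxiety": {"afraid", "worried", "nervous"},
    "power": {"president", "control", "command"},
}


def category_percentages(text, dictionary=TOY_DICTIONARY):
    """Return, per category, the percentage of tokens found in that category's word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in dictionary.items():
            if token in words:
                counts[category] += 1
    if not tokens:
        return {category: 0.0 for category in dictionary}
    return {category: 100.0 * counts[category] / len(tokens) for category in dictionary}


features = category_percentages("The president is worried about the attack")
```

Each text is thus reduced to one percentage per category, which is the kind of feature vector a classifier can consume in place of raw word counts.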
4 Evaluation
4.1.1 Datasets
We use an open fake news dataset from Kaggle, “Fake News: Build a system
to identify unreliable news articles”, which was prepared by students at the
University of Tennessee [13]. The database has 20,800 news articles organized into
five attributes: id, title, author, text, and label. The id attribute represents a
unique identifier, the title attribute represents the title of the text, the author
attribute contains the name of the author of the news, the text attribute contains
the text of the news, and the label attribute represents the classification of the
news (0 means true news and 1 means fake news) [13]. The database has a random
distribution of 50% fake news and 50% true news, and the texts of the text attribute
are in English.
To evaluate the performance of the method, we use metrics such as accuracy,
precision, and recall, together with the confusion matrix [14].
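As a reminder of how these metrics relate to one another, the following minimal sketch derives accuracy, precision, and recall from the four cells of a binary confusion matrix. It assumes that fake news (label 1) is the positive class, which follows the dataset encoding but is not stated as the evaluation convention in the text; the example labels are made up.

```python
def evaluate(y_true, y_pred):
    """Compute confusion-matrix cells and derived metrics, with label 1 (fake) as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    total = tp + fp + fn + tn
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels for illustration only (1 = fake news, 0 = true news).
report = evaluate([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```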
In the first step, we applied the LIWC software to the text attribute of our
dataset [13], which contains the text of the news in English. LIWC calculates the
degree of different categories of words using its internal dictionary, also called
LIWC. LIWC has different categories, such as anxiety, anger, affectivity, positive,
and negative. The software performs tokenization, stemming, and stop-word removal in
order to count the words associated with its internal dictionary, and then
calculates the percentage of words in the text associated with each dictionary
category.
Once LIWC had calculated the percentage of words belonging to each category, we
found texts in which all attributes had a value of zero, that is, after applying
LIWC, no information was obtained for any attribute associated with the internal
LIWC dictionary. These texts with missing values were removed from the dataset. We
also removed some texts that had 100% of the text in just one attribute. A second
treatment removed outliers that had more than 20% of the text in a single attribute.
After the preprocessing, the dataset went from 20,800 texts to 20,552 texts.
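The two filtering rules above can be expressed as a short sketch. The row layout (one dictionary per text) and the attribute names are hypothetical; only the all-zero rule and the 20% threshold come from the text. Note that the 20% rule also covers the texts with 100% in a single attribute.

```python
def filter_liwc_rows(rows, feature_names, max_share=20.0):
    """Drop rows with no LIWC information and rows dominated by a single attribute."""
    kept = []
    for row in rows:
        values = [row[name] for name in feature_names]
        if all(value == 0 for value in values):
            continue  # no word matched the dictionary: treated as a missing value
        if max(values) > max_share:
            continue  # outlier: more than max_share percent of the text in one attribute
        kept.append(row)
    return kept


# Hypothetical rows; the values are percentages produced by a LIWC-style count.
sample = [
    {"id": 1, "anger": 0.0, "power": 0.0},
    {"id": 2, "anger": 100.0, "power": 0.0},
    {"id": 3, "anger": 25.0, "power": 1.0},
    {"id": 4, "anger": 2.5, "power": 1.2},
]
clean = filter_liwc_rows(sample, ["anger", "power"])
```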
Table 1 shows a sample of the dataset after the preprocessing performed with LIWC.
This sample has only five rows out of a total of 20,552 records and ten attributes
out of a total of 28, including the label attribute, our target attribute. Each
value is the percentage of words belonging to an attribute that we previously chose
in LIWC.
To adapt the D3C2 method in this study, we use as a basis the LIWC categories that
were used in the D3C2 paper, together with categories that fit the premises listed
by its authors, that is, the four perspectives of psychology: emotions, cognitive
effort, attempted control, and lack of embracement. The 28 categories selected for
the application of the method are listed in Table 2.
4.2 Classification
The attribute with the greatest influence in determining whether a text is fake
news is power. The model identifies that, for values of power smaller than 0.745,
there is a set of 959 samples with a high probability of being true news, whereas
for values less than 1.395 there is a greater probability that we are dealing with
fake news.
5 Conclusion
People are increasingly producing and consuming news through social media instead
of traditional media like newspapers, magazines, and TV. The dissemination of
fake news has intensified in recent years in events like Brexit and the 2016
presidential election of Donald Trump [2]. The study of fake news identification is
fundamental to identifying and combating disinformation, which represents political,
economic, and social risks.
References
1. Hootsuite Digital (2021) Available 7 Dec 2021, from Hootsuite Inc: https://hootsuite.widen.
net/s/zcdrtxwczn/digital2021_globalreport_en
2. Bastos MT, Mercea D (2019) The Brexit botnet and user-generated hyperpartisan news. Social
science computer review
3. Zhou X, Zafarani R (2020) A survey of fake news: fundamental theories, detection methods,
and opportunities. ACM Comput Surv (CSUR) 1–40
4. Oshikawa R, Qian J, Wang WY (2018) A survey on natural language processing for fake news
detection. arXiv preprint arXiv:1811.00770
5. Parikh SB, Atrey PK (2018) Media-rich fake news detection: A survey. IEEE Conf Multimedia
Inf Process Retrieval (MIPR) 2018:436–441
6. Lillie AE, Middelboe ER (2019) Fake news detection using stance classification: a survey.
arXiv preprint arXiv:1907.00181
7. Cardoso Durier da Silva F, Vieira R, Garcia AC (2019) Can machines learn to detect fake news?
a survey focused on social media. In: Proceedings of the 52nd Hawaii international conference
on system sciences
8. Shu KE (2017) Fake news detection on social media: a data mining perspective. ACM SIGKDD
Explor Newsl 22–36
9. Larcker DF, Zakolyukina AA (2012) Detecting deceptive discussions in conference calls. J
Account Res 50(2):495–540
10. Kaplan A (2020) Artificial intelligence, social media, and fake news: is this the end of
democracy? Media Soc 149
11. Wardle C, Derakhshan H (2017) Information disorder: toward an interdisciplinary framework
for research and policy making. Counc Europe
12. Vrij A (2008) Detecting lies and deceit: Pitfalls and opportunities. Wiley
13. Kaggle BA (2017) Build a system to identify unreliable news articles. Available 4 Nov 2021,
from Kaggle: https://www.kaggle.com/c/fake-news/data
14. de Castro LN, Ferrari DG (2016) Introduction to data mining, 1ª. Saraiva Educação SA, São
Paulo
15. Medeiros FD, Braga RB (2020) Fake news detection in social media: a systematic review.
In: XVI Brazilian symposium on information systems, pp 1–8
16. Aluisio S, Checchia R, Chishman R (2022) PortLex. Source: LIWC:
http://143.107.183.175:21380/portlex/index.php/pt/projetos/liwc
Chapter 22
Artificial Neural Networks for Self-phase
Modulation Compensation in Unrepeated
Digital Coherent Optical Systems
Grazielle Cossa, Camila Costa, Vitória Cesar, Lucas Marim, Rafael Penchel,
José Augusto de Oliveira, Mirian Santos, Denilson Souza dos Santos,
and Ivan Aldaya
1 Introduction
The popularization of multimedia applications and the migration to cloud storage and
computing services are forcing Internet service providers to increase their transmis-
sion rates [1]. To meet these capacity requirements, optical communication systems
have undergone a silent revolution, migrating from traditional intensity-modulated
with direct detection systems to digital coherent systems [2]. Thus, the traditional
communications systems where information was transmitted just by modulating the
intensity of a lightwave have been progressively substituted by more sophisticated
systems in which not only the amplitude but also the phase and polarization diversity
are exploited to achieve higher spectral efficiency [3]. Digital coherent systems
were initially adopted in long-distance links but, as electronics evolved, they
became competitive at shorter ranges. As an example, in May 2020, the 400ZR
communication standard for connection between data centers was released [4]. This
standard aims to support up to four multiplexed 100G Ethernet connections, employ-
ing dual polarization 16-ary quadrature amplitude modulation (DP-16QAM). This
standard considers two operating modes: an unamplified single-channel system and
an amplified system with wavelength channel multiplexing. In both cases, the system
is limited by the combination of additive noise and nonlinear distortion induced by
the fiber Kerr effect. The Kerr effect is the nonlinear optical effect by which the
refractive index of the medium varies in the presence of high-intensity
electromagnetic waves [5]. In fiber transmission systems, this effect gives rise to
three well-known signal distortions: self-phase modulation (SPM), cross-phase
modulation (XPM), and four-wave mixing (FWM). Which of these distortions dominates
depends on the system configuration [5].
In the present section, we introduce the Manakov equations and discuss the benefits
of processing both polarizations simultaneously. Afterward, the MLP architecture is
presented, describing the adopted configuration.
attenuation, and polarization rotation, whereas the nonlinear mechanisms can be split
into the Kerr effect and stimulated scattering of light, which can be further classified
as stimulated Brillouin scattering (SBS) and stimulated Raman scattering (SRS) [5].
For the particular case of digital coherent systems, the lack of an optical car-
rier increases the SBS and SRS power thresholds, and therefore, these effects can
be neglected for typical launched optical transmission power levels. On the other
hand, the high baud rate makes the PMD have a significant effect. In addition, the
interferometric nature of the receiver in digital coherent systems and the adoption of
polarization multiplexing lead to a critical sensitivity to the fluctuations of the state
of polarization (SoP) of the incident optical signal. Consequently, it is important to
consider both polarizations. Thus, employing the Jones formalism, the vectorial phasor
associated with the optical signal can be written as follows:
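$$
\mathbf{E}(t) = \begin{bmatrix} \hat{x} & \hat{y} \end{bmatrix}
\begin{bmatrix} A_x \\ A_y \end{bmatrix} \exp(j\omega_0 t), \tag{1}
$$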
where $A_x$ and $A_y$ are the complex amplitudes of the x and y polarizations,
respectively, $\hat{x}$ and $\hat{y}$ are the unit-norm vectors indicating the
directions of the x and y polarizations, and $\omega_0$ is the central angular
frequency of the signal. By setting a suitable spatiotemporal framework, the
evolution of $A_x$ and $A_y$ can then be described by the following set of partial
differential equations [5]:
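$$
\frac{\partial A_x}{\partial z} + \beta_{1x}\frac{\partial A_x}{\partial t}
+ \frac{j\beta_2}{2}\frac{\partial^2 A_x}{\partial t^2} + \frac{\alpha}{2}A_x
= j\gamma\left(|A_x|^2 + \frac{2}{3}|A_y|^2\right)A_x
+ \frac{j\gamma}{3}A_x^{*}A_y^{2}\exp(-2j\Delta\beta z)
$$
$$
\frac{\partial A_y}{\partial z} + \beta_{1y}\frac{\partial A_y}{\partial t}
+ \frac{j\beta_2}{2}\frac{\partial^2 A_y}{\partial t^2} + \frac{\alpha}{2}A_y
= j\gamma\left(|A_y|^2 + \frac{2}{3}|A_x|^2\right)A_y
+ \frac{j\gamma}{3}A_y^{*}A_x^{2}\exp(+2j\Delta\beta z), \tag{2}
$$
with $\Delta\beta = \beta_{0x} - \beta_{0y}$. In this set of equations, z is the propagation coordinate, and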
$\beta_{1x}$ and $\beta_{1y}$ are related to the inverse of the group velocity in the
x and y polarizations, which differ due to the birefringence caused by the core
ellipticity. $\beta_2$ is the second-order dispersion parameter (assumed not to be
significantly affected by the aforementioned ellipticity), and $\alpha$ is the
intensity attenuation coefficient. The right-hand side of both equations represents
the Kerr effect, which can be split into two contributions. Both of them depend on
the nonlinear coefficient $\gamma$, which is related to the nonlinear refractive
index $n_2$ through $\gamma = k_0 n_2/A_{\mathrm{eff}}$, where $k_0 = 2\pi/\lambda_0$
($\lambda_0$ is the operation wavelength) and $A_{\mathrm{eff}}$ is the effective
modal area. Nevertheless, these two contributions have different effects on the
transmitted signal: the first causes a nonlinear phase rotation that depends on
$|A_x|^2$ and $|A_y|^2$, while the second represents an additive interference. The
interpretation of the contributions of nonlinear effects depends on the criterion
adopted to define a signal. If we consider that each polarization constitutes a
signal, then the right-hand side contains three terms: the first represents
intra-polarization SPM, the second corresponds to the inter-polarization XPM, and
the third represents the FWM between the two polarizations.
It is important to note that the nonlinear terms couple the two polarizations.
This is not merely a curiosity: it has profound implications for the architecture
of the nonlinear compensation MLP. If each polarization is processed individually,
the only nonlinear term that is compensated is the one that we identified as SPM.
The information of the other two terms is disregarded, and they appear as a noise
contribution. If both polarizations are simultaneously processed, on the other hand,
the inter-polarization nonlinear distortion can be partially mitigated.
In the particular case of DP-16QAM, that is, in systems where each polarization
is modulated with a 16QAM signal, the variation of the intensity in each polarization
leads to XPM between polarizations. This nonlinear polarization crosstalk has a
significant impact on the system performance, as can be concluded from the analysis
presented in [12]. The mitigation of this effect is far from trivial due to the interaction
between the chromatic dispersion and nonlinear effects described in Eq. 2.
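To give a feeling for the size of the SPM and XPM contributions, the sketch below evaluates the nonlinear phase rotation of one polarization when dispersion and the coherent coupling term of Eq. 2 are neglected, so that the phase is $\gamma(|A_x|^2 + \tfrac{2}{3}|A_y|^2)L_{\mathrm{eff}}$ with the standard effective length $L_{\mathrm{eff}} = (1 - e^{-\alpha L})/\alpha$. This simplification is an assumption made for illustration only. The 175 km span comes from the text, whereas the attenuation (0.2 dB/km), the nonlinear coefficient (1.3 W⁻¹ km⁻¹), and the equal power split between polarizations are assumed typical values, not parameters reported by the authors.

```python
import math


def dbm_to_w(p_dbm):
    """Convert optical power from dBm to watts."""
    return 1e-3 * 10 ** (p_dbm / 10)


def effective_length_km(alpha_db_per_km, length_km):
    """Effective length (1 - exp(-alpha L)) / alpha, with alpha converted to 1/km."""
    alpha = alpha_db_per_km * math.log(10) / 10
    return (1 - math.exp(-alpha * length_km)) / alpha


def nonlinear_phase_rad(p_own_w, p_other_w, gamma_per_w_km, l_eff_km):
    """Phase rotation from SPM (own power) plus XPM (2/3 of the orthogonal power)."""
    return gamma_per_w_km * (p_own_w + (2 / 3) * p_other_w) * l_eff_km


# Assumed fiber parameters (not taken from the paper): 0.2 dB/km and 1.3 1/(W km).
l_eff = effective_length_km(alpha_db_per_km=0.2, length_km=175)
p_pol = dbm_to_w(10) / 2  # assume the launched power is split equally between polarizations
phase = nonlinear_phase_rad(p_pol, p_pol, gamma_per_w_km=1.3, l_eff_km=l_eff)
```

Under these assumptions, two fifths of the phase rotation comes from the orthogonal polarization, which is consistent with the observation that processing each polarization alone leaves a sizeable part of the distortion uncompensated.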
3 Simulation Setup
In order to obtain the data for the MLPs’ training and validation, we used the com-
mercial software VPIphotonics Transmission Maker. This tool offers a broad variety
of modules to simulate not only optical devices such as fiber and optical amplifiers
but also the associated electronics and digital processing blocks. The bit rate of the
system was configured to 112 Gbps, that is, 56 Gbps per polarization, and the number
of the simulated symbols was set to 262,144.
Fig. 1 General block diagram of a digital optical coherent link, including the two MLPs employed
for nonlinear distortion compensation. In addition, the transmitted constellation and the distorted
constellation are shown
The simulation setup is shown in Fig. 1. Two independent pseudorandom bit
sequences were mapped into 16QAM constellations, converted to the continuous
time, and filtered using Nyquist filters with 20% roll-off factors. These electrical sig-
nals modulated the in-phase and quadrature components of the two orthogonal polar-
izations of a continuous-wave laser. The two modulated polarizations were joined in
a polarization beam combiner and amplified using an erbium-doped fiber amplifier,
whose output power was swept from 4 to 12 dBm. The optical signal was then trans-
mitted through a fiber span of 175 km. At the receiver, the orthogonal polarizations
of the received signal were separated and combined with the corresponding
polarization components of the receiver laser in 90-degree hybrid networks. The four
outputs of each 90-degree hybrid network were digitized and fed into the DSP, where
the signals were orthogonalized and equalized before time and phase synchronization
were performed. Afterward, the frequency offset and phase noise were corrected
using frequency-domain shift and blind-phase search. A detailed description of the
setup and its parameters is given in [12]. Once the phase and time synchronizations
were performed and the phase noise and frequency offset corrected, the nonlinear
distortion was mitigated using the MLPs.
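The first step of the transmitter, mapping two independent pseudorandom bit sequences into 16QAM symbols, can be sketched as follows. The Gray labeling of the amplitude levels and the seeds are assumptions made for this example; the text does not specify the bit-to-symbol mapping used in the simulation tool, and a short sequence is generated here instead of the 262,144 symbols of the actual setup.

```python
import random

# Assumed Gray labeling of the four amplitude levels per quadrature (not stated in the text).
GRAY_LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}


def random_bits(n_bits, seed):
    """Generate a reproducible pseudorandom bit sequence."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(n_bits)]


def map_16qam(bits):
    """Map groups of four bits to 16QAM symbols: two bits for I, two bits for Q."""
    if len(bits) % 4 != 0:
        raise ValueError("bit sequence length must be a multiple of 4")
    symbols = []
    for k in range(0, len(bits), 4):
        i_level = GRAY_LEVELS[(bits[k], bits[k + 1])]
        q_level = GRAY_LEVELS[(bits[k + 2], bits[k + 3])]
        symbols.append(complex(i_level, q_level))
    return symbols


# One independent sequence per polarization, as in the described setup.
x_pol = map_16qam(random_bits(4 * 16, seed=1))
y_pol = map_16qam(random_bits(4 * 16, seed=2))
```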
4 Results
In this section, we first analyze the training curves and optimize the MLP in terms of
the number of neurons for different launched optical power levels (6, 8, and 10 dBm),
considering the processing of each polarization individually and of both
polarizations at the same time. Afterward, the BER for launched optical power levels
ranging from 4 to 12 dBm is analyzed when the different approaches are applied.
Finally, the complexity of the proposed MLP-based equalizers is briefly discussed.
In Fig. 2, we show the evolution of the loss function as the MLP is trained and
the BER obtained for different numbers of neurons in the hidden layer for the two
proposed approaches, that is, processing polarizations independently and
simultaneously.
Regarding the training curves, we considered a hidden layer with 50 neurons. For
this configuration, the curves obtained for launched optical power levels of 6 dBm,
Fig. 2a, 8 dBm, Fig. 2b, and 10 dBm, Fig. 2c, present a pronounced initial drop fol-
lowed by a slower convergence stage. However, there are some differences as the
launched optical power is increased. The first difference is that for the lowest con-
templated launched power, 6 dBm, the training curves for the MLPs processing each
polarization independently and simultaneously almost overlap. Indeed, the two train-
ing curves converge to very similar values. When we increase the launched power
to 8 dBm, the two curves converge to slightly different values, and, following the
tendency, the difference between the final values of loss functions for single and
dual polarization processing increases for 10 dBm. The second remarkable difference
when we increase the launched optical power is the number of epochs required
to achieve convergence. When each polarization is individually processed, the
required epochs remain almost constant at around 25. When processing the two
polarizations simultaneously, on the other hand, it can be observed that the number
of required epochs increases from 28 for 6 dBm to 39 for 10 dBm. The comparison
between the required epoch numbers indicates that the MLP for processing the two
polarizations simultaneously is more complex than for a single polarization, which
was expected as the former processes more information. In addition, the fact that
the number of required epochs increases significantly for simultaneous processing
suggests that the higher launched optical power levels lead to more complex systems
that need to be trained for a longer time.
Regarding the effect of the number of neurons in the hidden layer, in Fig. 2d–f,
we show the BER of the validation symbols for power levels of 6 dBm, 8 dBm, and
10 dBm, respectively. For each power level, the number of neurons in the hidden
layer was swept from 5 to 50, and the BER obtained employing maximum likelihood
(ML) is included as a reference. At first glance, the main difference between the
subfigures corresponding to different launched optical power levels is that the
performance gap widens as the launched optical power increases. Thus, for 6 dBm, the
performance when ML is adopted is similar to that achieved when the MLP is used and,
therefore, the MLP does not seem to represent a significant advantage over ML.
For 8 dBm launched optical power, it is possible to identify some differences between
the BER values obtained using ML and MLP. Furthermore, processing each polarization
individually and both polarizations simultaneously presents slightly different
behavior. For instance, the performance when each polarization is processed
independently is virtually independent of the number of neurons, whereas when the two
polarizations are simultaneously processed, the performance slightly improves as the
number of neurons increases. Indeed, it is interesting to note that for very low
neuron numbers, the MLP processing the two polarizations is outperformed by the MLP
processing each polarization individually but, as the number of neurons increases and
the MLP becomes more complex, processing both polarizations leads to lower BER values.
Fig. 2 Evolution of the loss function during the training for MLPs processing each polarization
separately (SP) and the two polarizations together (DP) for different powers launched in the fiber:
a 6 dBm, b 8 dBm and c 10 dBm (axes: loss versus epochs). BER in terms of the number of neurons
in the hidden layer considering individual processing and joint processing of the polarizations for
different powers launched in the fiber: d 6 dBm, e 8 dBm and f 10 dBm (axes: log10(BER) versus
number of neurons). The BER for ML detection has been included as a reference
Having analyzed the training and the effect of the number of neurons, we set the
number of neurons to 50 and swept the launched optical power from 4 to 12 dBm
(outside this range of launched optical power, the signal quality was not enough to
allow the synchronization of the signal at the receiver side). The BER obtained
using ML and MLP with single and dual polarization processing is shown
in Fig. 3a. Comparing the curves, it is possible to observe that for launched optical
power levels up to 6 dBm, the BER curves for the different approaches overlap. As
the launched optical power level increases, the curves separate, and the performance
enhancement obtained with MLP-based nonlinear compensation becomes more significant.
When we contrast the performance of processing each polarization individually and
both polarizations simultaneously, the enhancement is more significant for higher
power levels, particularly for levels above 8 dBm. This indicates that processing
both polarizations simultaneously improves the performance because, in addition to
the intra-polarization SPM, the inter-polarization XPM can be partially compensated.
The effect of the nonlinear compensation using the MLP can be visualized in Fig. 3b,
where the received constellation is presented alongside the output of the MLP in two
configurations: processing each polarization individually and the two polarizations
simultaneously. Looking at the different constellations obtained, we can observe the
characteristic spiral-like shape of the constellation when SPM and XPM are present
and the partial mitigation when the MLP is employed. In fact, it is possible to
perceive a reduction in the point dispersion when both polarizations are processed
together.
Fig. 3 a BER in terms of the launched optical power considering maximum likelihood and
MLP-based equalization operating on each polarization independently and on both polarizations
simultaneously (axes: log10(BER) versus transmitted optical power [dBm]; curves: ML, SP, DP).
b Constellation diagrams (quadrature versus in-phase) without the MLP (maximum likelihood
detection) and when the MLP is applied to each polarization and to both polarizations. The color
code is the same as in (a)
$$
N_{\mathrm{op\_hidden}} = N_{\mathrm{op\_neuron}}^{\mathrm{hidden}} \cdot N_h = [N_i + (N_i - 1)] \cdot N_h \tag{4}
$$
3. Output layer: the number of operations in each neuron of the output layer is
calculated similarly, obtaining:
$$
N_{\mathrm{op\_neuron}}^{\mathrm{output}} = N_h + (N_h - 1), \tag{5}
$$
and, therefore, for the whole output layer, the number of operations is:
$$
N_{\mathrm{op\_output}} = N_{\mathrm{op\_neuron}}^{\mathrm{output}} \cdot N_o = [N_h + (N_h - 1)] \cdot N_o. \tag{6}
$$
The total operation count is the sum of the previous counts, giving as a result:
$$
N_{\mathrm{oper}} = [N_i + (N_i - 1)] \cdot N_h + [N_h + (N_h - 1)] \cdot N_o. \tag{7}
$$
[Figure: horizontal axis, number of neurons in the hidden layer, N_h, from 0 to 50]
The previous expression can be particularized for the two contemplated cases, that is,
the MLP processing each polarization independently and processing the two
polarizations. Therefore, we have:
$$
N_{\mathrm{oper}} =
\begin{cases}
2 \cdot (7N_h - 2) & \text{for each polarization processed individually} \\
15N_h - 4 & \text{for both polarizations processed simultaneously.}
\end{cases} \tag{8}
$$
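The per-layer counts of Eqs. 4 and 6 can be checked numerically against the two particular cases of Eq. 8. The sketch below assumes two inputs and two outputs (in-phase and quadrature) for the single-polarization MLP and four inputs and four outputs for the dual-polarization MLP; these sizes are inferred from Eq. 8 rather than quoted from the text, and the function name is illustrative.

```python
def mlp_operations(n_inputs, n_hidden, n_outputs):
    """Operation count of a single-hidden-layer MLP following Eqs. 4 and 6."""
    hidden_ops = (n_inputs + (n_inputs - 1)) * n_hidden    # Eq. 4
    output_ops = (n_hidden + (n_hidden - 1)) * n_outputs   # Eq. 6
    return hidden_ops + output_ops


n_h = 50
# Two independent MLPs (one per polarization), each with I/Q inputs and outputs.
single_pol_total = 2 * mlp_operations(2, n_h, 2)
# One MLP fed with the I/Q components of both polarizations.
dual_pol_total = mlp_operations(4, n_h, 4)
```

With 50 hidden neurons, the two counts differ by only a few percent under these assumptions, so the number of hidden neurons that each approach needs has a larger impact on complexity than the choice of approach itself.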
5 Conclusions
In this paper, we have employed MLPs to compensate for the nonlinear distortion
in 175 km-long unrepeated digital coherent systems employing DP-16QAM.
In particular, we used two different MLPs, one operating on each polarization
independently and another that processed both polarizations at the same
time. Simulation results reveal that MLPs are indeed able to partially mitigate the
nonlinear distortion. Furthermore, we observed that the MLP that operated on
the two polarizations simultaneously outperforms the MLP that only processed one
polarization because, in addition to SPM, it can also mitigate the XPM caused by the
orthogonal polarization. This performance enhancement, however, is achieved at the
expense of a higher computational cost. Therefore, the network designer can choose
between superior performance at a high cost or poorer performance at a reduced
cost.
Acknowledgements The authors thank the Sao Paulo Research Foundation (grant number
15/24517-8) and The National Council for Scientific and Technological Development.
References
1 Introduction
The root word for ‘cognition’ in Latin, ‘cognoscere’ translates to learn, to recognize,
to be acquainted with, to know, to find to be, and to inquire or examine. Cognitive
computing helps human experts by delving into the complexity of big data and
providing support beyond what either humans or machines can do on their own [1]. Prabhu
[2] explains that it works with reality (data) and knowledge (information) and turns
models into reality by perception, induction, conception, and deduction. Leading
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 271
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_23
272 P. S. H. Darius et al.
cloud providers capitalize on offering cognitive APIs to developers and the global
cognitive market share is set to reach USD 15.28 Billion by 2023 [3]. Among the
key market players in cognitive services are IBM, Microsoft, AWS, and Google [4].
Microsoft Azure categorizes cognitive API as speech, language, vision, and deci-
sion. Table 1 presents the counterparts in Google, Amazon, and IBM Watson. Figure 1
shows the various end-user applications that use these cognitive services.
This chapter presents a comparative analysis of the features, pros, and cons of
cognitive APIs for speech, language, vision, and decision among the key players in
cloud platforms. The scope of application and the limitations of these APIs are
examined through case studies. The tools and techniques required to develop custom
APIs are demonstrated.
Table 1 Comparison of cognitive APIs in Azure, Google, Amazon, and IBM Watson
Category | Azure | Google | Amazon | IBM Watson
Speech | Speech to text | Speech to text | Amazon Transcribe | Speech to text
Speech | Text to speech | Text to speech | Amazon Polly | Text to speech
Speech | Speech translation | Translation AI | Amazon Translate | Language translator
Speech | Speaker recognition | Speech to text (includes speaker diarization) | Amazon Transcribe | Speech to text
Language | Entity recognition | Cloud Natural Language | Amazon Kendra | Natural language understanding
Language | Sentiment analysis | Cloud Natural Language | Amazon Comprehend | Natural language understanding
Language | Question answering | DialogFlow | Amazon Mechanical Turk | IBM Watson Assistant
Language | Conversational language understanding | Media translation | Amazon Lex | Natural language understanding
Language | Translator | Translation AI | Amazon Translate | Language translator
Vision | Computer vision | Video AI/Vision AI | Amazon Rekognition/Amazon Lookout | Watson visual recognition
Vision | Custom vision | Video AI/Vision AI | AWS Panorama | Watson visual recognition
Vision | Face API | Video AI/Vision AI | Amazon Rekognition | Watson visual recognition
Decision | Anomaly detector | Timeseries Insights API | Amazon Lookout for Metrics/Amazon Fraud Detector | Anomaly detection
Decision | Content moderator | Perspective API | Amazon Rekognition |
Decision | Personalizer | | Amazon Personalize |
23 Comparative Analysis of Cognitive Services in Popular Cloud Platforms 273
[Fig. 1: end-user applications of cognitive services; recoverable labels: "Diagnosis and Treatment", "Quality management".]
2 Cognitive APIs
Cloud cognitive APIs are the enablers of smart cities, Industry 5.0, smart homes,
and digital transformation in the economy and ecosystem among many others. The
broad classes of speech, language, vision, and decision APIs are discussed below.
Virtual assistants for the visually disabled by Sultan et al. [5], real-time conversion
of speech to sign language by Jadhav et al. [6], giving instructions in an augmented
reality environment in industries described in Tseng [7], AI chatbots implemented
by Prasad et al. [8], home automation, video narration, voice-overs all rely heavily
on the efficiency of the speech API.
Azure offers options to create natural voices that can express emotions and
create custom models. The speech SDK is available in multiple programming
languages and works with local devices or Azure Blob storage. These capabilities are
enabled through speech-to-text, speech translation, and text-to-speech with speaker
recognition APIs.
IBM Watson offers speech-to-text services where users can customize audio’s
language, format, and sampling rate. In text-to-speech, voices are smooth with dialect
and language-appropriate rhythm and phrasing. When used with IBM Assistant, call
centers at MRS BPO report a 20% increase in revenue. Google’s Speech-to-text
provides customization and domain-specific trained models (voice control, phone
call, video transcription) for both public and private clouds. HSBC is one of the clients
that use this solution in every Cantonese-English call center that presents terms and
conditions [9]. Another speech-to-text service, Amazon Transcribe adds punctuation,
Natural language processing (NLP) contains methods for speech and text processing
for automatic analysis and presentation in human language representation as
described by Cambria and White [13]. The recognition of entities, sentiment anal-
ysis, conversational language understanding, and translation services are important
features in language APIs. Dale [14] states that basic tasks about morphological and
syntactic analysis are provided by standard cloud APIs.
The various features of Language API are given below.
• Sentiment analysis determines whether the emotional opinion expressed in a text
is positive, negative, or neutral.
• Entity analysis identifies proper nouns like public figures or landmarks and common
nouns like schools and buildings.
• Entity sentiment analysis identifies the emotional opinion about that entity.
• Syntactic analysis extracts linguistic information and provides this information
in tokens.
• Content classification analyses text content and assigns it to one of several content
categories.
Straightforward deployment of pipelines, easy upload and storage of information,
parallelization independent of the algorithm, load balancing, security and fault toler-
ance were listed as the technological blueprint required for providing NLP as a cloud
service by Pais et al. [15].
Popular NLP APIs include Amazon Comprehend [16], Microsoft Azure Cognitive
Services [17], and Google Cloud Natural Language [18].
The Amazon Comprehend service identifies entities and targeted sentiment, each with a
confidence level, and detects the dominant language of a text from among hundreds of
languages. Syntax analysis and topic modeling are also supported. As a pricing
illustration, if a customer comment of 500 characters is analyzed at 6 units per
request and each unit is charged at $0.0001, one request costs $0.0006; the quoted
total of $6.00 then corresponds to 10,000 such requests. A sample output from Amazon
Comprehend is shown in Table 3 for consumer reviews in the tutorial [19]. The overall
sentiment is Positive because its score is higher than the Negative, Neutral, and
Mixed scores.
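The rule used above to read the overall sentiment, namely picking the class with the highest score, can be sketched in a few lines of Python. The score values below are hypothetical and only mimic the four classes named in the text (Positive, Negative, Neutral, Mixed); the sketch does not call the Amazon Comprehend service.

```python
from typing import Mapping


def dominant_sentiment(scores: Mapping[str, float]) -> str:
    """Return the sentiment class with the highest score."""
    return max(scores, key=lambda label: scores[label])


# Hypothetical scores for one consumer review (not taken from the cited tutorial).
example_scores = {"Positive": 0.93, "Negative": 0.01, "Neutral": 0.05, "Mixed": 0.01}

if __name__ == "__main__":
    print(dominant_sentiment(example_scores))
```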
Microsoft Azure Cognitive Services has several applications that can analyze
sentiment and identify the language of a given text by using Azure Text Analytics
API. Azure Language Understanding service can understand things like user intent.
Google Cloud Natural Language works on emails, chat, and social media to iden-
tify entities, and perform sentiment and syntax analysis and categorization. Google
AutoML Natural Language allows users to provide training data to create custom
machine-learning models for users with more specialized needs. Another notable
API is Diffbot [20], which precisely extracts data from websites. MonkeyLearn [21]
automates workflows on unstructured data.
Computer Vision (CV) is a technology that allows the machine to detect and recognize
people, places, and things in a given image with a human-like accuracy at higher
speed and efficiency. Often, with the help of machine learning models, it analyses
the images, identifies the features and classifies them, and provides useful insight to
the user. It is used mostly in the domains of autonomous robots, analysis of medical
imaging, identifying people on social media, etc.
AWS provides a service called Amazon Rekognition, which offers deep-learning-based
visual search and image classification. AWS Computer Vision offers content
moderation, face comparison and search, label detection, celebrity recognition, video
segment detection, face detection and analysis, and text detection. It can be used to
detect inappropriate content in videos and images, verify the identity of a celebrity
online, and analyze and streamline media content. It supports JPEG and PNG image
formats, and the resolution should be between 320 × 240 and 640 × 480 or higher.
Computer vision in Microsoft Azure service analyses the content in the images or
videos and extracts the information, and provides useful insights to the user. Various
services are provided by the Azure cloud platform on computer vision consisting of
text extraction, image understanding, and spatial analysis with flexible deployment
models on the cloud. Azure could identify around 10,000 objects from an image.
Azure provides a cloud-based Computer Vision API with the flexibility of
choosing the inputs and the algorithms based on the user’s choice. The prominent
services provided are Optical Character Recognition (OCR), image analysis, face
detection and recognition, and spatial analysis. A sample of the vision API is shown
in Fig. 2.
Vision Studio by the Microsoft Azure platform lets the user explore, build and
integrate the features from Azure Computer vision. This tool uses REST APIs to
embed the services into the applications.
Google Cloud Platform (GCP) provides a computer vision environment, Vision
AI, that allows the user to create CV applications or derive insights from the images
and Videos. It supports these operations with the help of pre-trained APIs, Auto ML,
or custom models done by the users. It is accessible through REST and Remote
Procedure Call (RPC) APIs. It can detect objects, read printed and handwritten text
and build valuable metadata in the image catalog. It also supports the environment
Vertex AI Vision, which can be used to build CV applications with custom ML models
for unique customer needs, optimized for accuracy, latency, and size. It can take
the input only through Streams to ingest real-time video data. Table 4 provides a
comparative view of the different features of computer vision services provided by
popular cloud computing platforms.
Fig. 2 Image captioning provided by Azure’s vision API: “a yellow car on the street” with 55%
confidence [22]
Anomaly APIs
Anomaly detection is a process in machine learning which identifies events, data
points, and observations that deviate from a dataset’s normal behavior. In indus-
trial applications, Lima et al. [23] state that it is very challenging to find anomalies
from unlabelled time series data. In supervised anomaly detection, labelled data that
represent previous failures or anomalies are used to learn the model. In unsupervised
detection, no labelled data is provided. In semi-supervised anomaly detection,
a small amount of labelled data is provided to validate the model and select the best-
performing model trained on normal data (or data with no anomalies). A sample
output for a univariate dataset using IBM Watson API [24] is shown in Table 5 using
PredAD and Chi-square labeling method. The anomaly score refers to the level at
which a data point deviates from the normal data. If the anomaly score is high, a
label of −1 is returned. If the label returned is 1 that means it is normal.
A comparison of various features is presented in Table 6.
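The labeling convention described above (a label of 1 for normal points and −1 for anomalies) can be applied programmatically. The sketch below parses two records written in the same shape as the sample output in Table 5, with the quotation marks normalized to plain JSON; it only illustrates how such output might be post-processed and does not call the IBM Watson API. The function names are hypothetical.

```python
import json

# Records in the shape of the Table 5 sample output (quotes normalized to valid JSON).
RAW_RECORDS = [
    '{"timestamp": "2017-01-01 05:45:00", "value": '
    '{"anomaly_label": [1.0], "anomaly_score": [2.9599127858341574]}}',
    '{"timestamp": "2017-01-01 21:45:00", "value": '
    '{"anomaly_label": [-1.0], "anomaly_score": [4.011492546951829]}}',
]


def is_anomaly(record: dict) -> bool:
    """A label of -1 marks an anomaly; a label of 1 marks a normal point."""
    return record["value"]["anomaly_label"][0] == -1.0


def anomalous_timestamps(raw_records):
    """Return the timestamps of all records labeled as anomalies."""
    records = [json.loads(line) for line in raw_records]
    return [r["timestamp"] for r in records if is_anomaly(r)]


if __name__ == "__main__":
    print(anomalous_timestamps(RAW_RECORDS))
```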
Content Moderator API
Nowadays, User Generated Content (UGC) such as social media posts and content
published on the web in the form of text, image, or video needs to be routinely checked
for offensive or undesirable material as pointed out by Kharb [29]. Content Moderator
API provides these services and flags content. The application then proceeds to
enforce appropriate measures on the flagged content.
Content Moderation APIs use AI models to detect sensitive content in bodies
of text, including those shared via online platforms or social media. Azure Content
Moderator offers freemium services: a free instance allows 1 transaction per second,
and a standard instance allows 10 transactions per second. Use cases of content
moderator APIs are smart media monitoring, protecting advertisers, protecting brand
reputation, increasing brand loyalty, and increasing brand engagement. Some
limitations of Content Moderation APIs are that the moderation process is not fully
automated, that harmful content can be misidentified, and that speech, images, and
cultural norms vary with context.
Table 5 Sample output for anomaly detection for univariate dataset [24]
Anomaly detection algorithm: PredAD (unsupervised time series prediction model)
Labelling method: Chi-Square
Normal:
{"timestamp":"2017-01-01 05:45:00","value":{"anomaly_label":[1.0],"anomaly_score":[2.9599127858341574]}}
Anomaly:
{"timestamp":"2017-01-01 21:45:00","value":{"anomaly_label":[-1.0],"anomaly_score":[4.011492546951829]}}
Personalizer API
The future of the digital experience is personalization, which harnesses the power of
customer data to increase engagement, loyalty, and advocacy. Al-Zoube [30] discusses
assessment-based personalized learning in the cloud. Some personalizer APIs are
Microsoft Azure Cognitive Services Personalizer API and Amazon Personalize. In
Microsoft, the freemium tier allows 50,000 free transactions per month and a 10 GB
storage quota; standard instances are charged per thousand transactions. The Amazon
Personalize free trial covers data processing and storage of up to 20 GB per month
per eligible AWS Region; paid services are priced per 100,000 users. Uses of
personalizer are intent clarification and disambiguation, default suggestions for
menus and options, bot traits and tone, etc. Some drawbacks of Personalizer APIs are
that the setup process is complex, the documentation is poor, and the pricing plans
are not developer or customer friendly.
3 Case Studies
There are numerous case studies of success stories in using Cognitive APIs. Two
user stories are presented here.
Siemens and IBM created CARL [24], a Human Resource (HR) agent powered by
IBM Watson Discovery and IBM Watson Assistant. The Siemens HR division has a
workforce of around 4 lakh (400,000). CARL was developed as a single point of contact
for all HR-related questions as shown in Fig. 5.
It initially addressed the most common topics like sick leaves or vacations. But
it is now customizable which allows CARL to meet employees’ unique needs. It
is deployed in over 20 countries. It is conversational in more than 200 topics and
responds to 1 million employee queries a month. It has made life easier for employees
at Siemens including the human resource department. It continues to evolve based
on improvements and suggestions by HR staff.
4 Conclusion
References
4. Cognitive Computing Market Size, Share | Global Industry Growth [2027](2020). https://www.
fortunebusinessinsights.com/cognitive-computing-market-103377. Accessed 23 Oct 2022
5. Sultan MR, Hoque MM, Heeya FU, Ahmed I, Ferdouse MR, Mubin SMA (2021) A bangla
virtual assistant for visually impaired. In: 2021 2nd international conference on robotics,
electrical and signal processing techniques (ICREST), pp 597–602
6. Jadhav S, Kumar S, Chauhan H, Negi S, Singh V (2018) Real-time conversion of speech to
sign language and hand gesture recognition. In: Application of communication computational
intelligence and learning. Routledge, pp 269–278
7. Tseng JL (2021) Intelligent augmented reality system based on speech recognition. Int J Circuits
Syst Sig Proc 15:178–186
8. Prasad PVKV, Krishna NV, Jacob TP (2022) AI CHATBOT using web speech API and Node.js.
In: 2022 international conference on sustainable computing and data communication systems
(ICSCDS). IEEE, pp 360–362
9. Case Study | Google Cloud. https://cloud.google.com/customers/hsbc. Accessed 30 Oct 2022
10. Amazon Transcribe – Speech to Text - AWS. https://aws.amazon.com/transcribe/?nc=sn&
loc=1. Accessed 30 Oct 2022
11. What is the Speech service? - Azure Cognitive Services | Microsoft Learn. https://learn.micros
oft.com/en-us/azure/cognitive-services/speech-service/overview. Accessed 30 Oct 2022
12. Speech to Text | IBM Cloud API Docs. https://cloud.ibm.com/apidocs/speech-to-text. Accessed
30 Oct 2022
13. Cambria E, White B (2014) Jumping NLP curves: a review of natural language processing
research. IEEE Comput Intell Mag 9(2):48–57
14. Dale R (2015) NLP meets the cloud. In: Natural language engineering, vol 21, no 4. Cambridge
University Press, pp 653–659
15. Pais S, Cordeiro J, Jamil ML (2022) NLP-based platform as a service: a brief review. J Big
Data 9(1)
16. Natural Language Processing – Amazon Comprehend – Amazon Web Services. https://aws.
amazon.com/comprehend/. Accessed 30 Oct 2022
17. Cognitive Services—APIs for AI Solutions | Microsoft Azure. https://azure.microsoft.com/en-
us/products/cognitive-services/. Accessed 30 Oct 2022
18. Cloud Natural Language | Google Cloud. https://cloud.google.com/natural-language. Accessed
30 Oct 2022
19. Get better insight from reviews using Amazon Comprehend | AWS Machine
Learning Blog. https://aws.amazon.com/blogs/machine-learning/get-better-insight-from-rev
iews-using-amazon-comprehend/. Accessed 30 Oct 2022
20. diffbot. https://docs.diffbot.com/docs/what-diffbot-product-do-i-need. Accessed 30 Oct 2022
21. MonkeyLearn - Text Analytics. https://monkeylearn.com/. Accessed 30 Oct 2022
22. AI Demos. https://aidemos.microsoft.com/computer-vision. Accessed 30 Oct 2022
23. Lima J, Salles R, Porto F, Coutinho R, Alpis P, Escobrar L, Pacitti E, Ogasawara E (2022)
Forward and backward inertial anomaly detector: a novel time series event detection method.
In: International joint conference on neural networks (IJCNN), pp 1–8
24. Siemens | CARL: Your Cognitive HR Assistant | The One Club. https://www.oneclub.org/por
tfolio/view/-8285/carl-your-cognitive-hr-assistant. Accessed 30 Oct 2022
25. Nawrocki P, Sus W (2022) Anomaly detection in the context of long-term cloud resource usage
planning. Knowl Inf Syst 64(10):2689–2711
26. An L, Tu A-J, Liu X, Akkiraju R (2022) Real-time statistical log anomaly detection with
continuous AIOps learning. In: Proceedings of the 12th international conference on cloud
computing and services science, pp 223–230
27. Hrusto A, Engstrom E, Runeson P (2022) Optimization of anomaly detection in a microservice
system through continuous feedback from development. In: IEEE/ACM 10th international
workshop on software engineering for systems-of-systems and software ecosystems (SESoS),
pp 13–20
28. Givnan S, Chalmers C, Fergus P, Ortega-Martorell S, Whalley T (2022) Anomaly detection
using autoencoder reconstruction upon industrial motors. Sensors (Basel) 22(9)
29. Kharb DL (2017) Embedding intelligence through cognitive services. Int J Res Appl Sci Eng
Technol V(XI):533–537
30. Al-Zoube M (2009) E-learning on the cloud. Int J Arab e-Technol 1(2)
31. How Equadex used Cognitive Services to help people with language disorders | Microsoft
Technical Case Studies (2017) https://microsoft.github.io/techcasestudies/cognitiveservices/
2017/08/04/equadexcognitives.html. Accessed 30 Oct 2022
Chapter 24
A Survey on Efficient Neural Network
Compression Techniques
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 285
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_24
286 N. Jain et al.
Fig. 1 Standard neural network model architectures by year and the number of petaflops required
(for training) [13]
We know that the main goal in most real-world applications involving deep
learning inference is to attain maximum accuracy with the shortest possible run
time. As a model architecture grows in complexity, the number of floating-point
operations (FLOPs) also increases, which in turn demands more storage and
processing capacity from the system. Thus, smaller models with better or similar
accuracy/performance are the key to the future.
Consider an example use case of image captioning; image captioning [14, 15]
is a technique that produces human-readable textual descriptions of images using
various techniques like natural language processing and deep learning methodolo-
gies. Remote sensing images [16, 17] give an account of images captured from a high
altitude like satellites where arguably the task of detection involves a higher degree
of complexity as compared to default object detection/classification techniques. To
overcome this, various neural techniques [18, 19] are employed to achieve a success
rate for the model while keeping the model lightweight. Similarly, there are various
ways to reduce the size of models thereby decreasing the running time at inference.
From the model point of view, techniques like quantization, pruning, knowledge
distillation, and efficient model architecture can be used. These techniques aim to
shrink the size of the neural network model by making some suitable architectural
changes. Quantization is the process of approximating the high-bit-width
floating-point numbers used in a neural network with lower-bit-width numbers. For
example, if we change the storage format of the learned weight parameters from FP32
to FP16, the overall size of the model is reduced.
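The FP32-to-FP16 example can be made concrete with a small Python sketch that uses only the standard `struct` module, whose `f` and `e` format codes pack IEEE 754 single- and half-precision values. The weight values are hypothetical; the sketch shows that storage is halved and that some values are only approximated after the conversion.

```python
import struct


def packed_size(weights, fmt: str) -> int:
    """Number of bytes needed to store the weights in the given struct format."""
    return len(struct.pack(f"<{len(weights)}{fmt}", *weights))


def roundtrip_fp16(x: float) -> float:
    """Store x as a half-precision value and read it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical learned weights.
weights = [0.1, -0.25, 0.5, 1.5]

fp32_bytes = packed_size(weights, "f")   # 4 bytes per weight
fp16_bytes = packed_size(weights, "e")   # 2 bytes per weight

if __name__ == "__main__":
    print(fp32_bytes, fp16_bytes)
    for w in weights:
        print(w, roundtrip_fp16(w))
```

Values such as 0.5 survive the conversion exactly, while 0.1 is stored with a small rounding error, which is the accuracy cost that quantization trades for the smaller size.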
Pruning is the process of selectively eliminating redundant connections between
the neurons in a neural network. This decreases the model size and number of compu-
tations required during inference. Knowledge distillation is the process of training a
smaller model by using a larger model; the goal is to achieve similar accuracy with
the smaller model so that it can be used for inference rather than the original larger
model. The efficient model architecture is a technique that aims for creating smaller
and more efficient models which can produce similar results compared to larger and
more sophisticated model architectures.
The remainder of the paper is outlined as follows: In Sect. 2, we summarize and
discuss quantization along with its implementation and results with respect to various
deep learning tasks such as CNN-based detection, speech recognition, and machine
translation. In Sect. 3, we discuss the pruning method for NN compression, including
an analysis of its performance on various tasks with respect to standard datasets like
CIFAR-10 [20] and ImageNet [21]. In Sect. 4, we summarize the knowledge
distillation method and analyze its performance and applications on various tasks.
In Sect. 5, we discuss various efficient neural architectures and summarize their
applications with respect to various deep learning tasks. In Sect. 6, we compare
and analyze the observed results of all the mentioned techniques, and finally, in
Sect. 7, we provide a conclusion and our recommendations on the above-discussed
compression techniques for deep learning architectures.
2 Quantization
Compared to FP16 precision, the FP32 precision has a much higher dynamic
range making it possible to avoid numeric overflow and underflow. However, in
FP16 precision, any value above 65,504 will become infinity (overflow) and any
value below 6.0 × 10^−8 will become zero (underflow). The idea of loss scaling is
to multiply the loss value with a suitable multiplication factor so that the overflow and
underflow issues can be avoided. Finally, single-precision outputs are transformed
to half-precision before being stored in memory to retain model correctness.
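The underflow problem and the effect of loss scaling can be illustrated with the standard `struct` module, which can round a Python float to half precision. This is a minimal sketch: the gradient value and the scale factor of 1024 are hypothetical, and in real mixed-precision training the scale is applied to the loss so that every gradient is scaled through backpropagation.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest half-precision value and return it as a float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


LOSS_SCALE = 1024.0   # hypothetical scale factor
grad = 1e-8           # hypothetical gradient, below the FP16 underflow threshold

# Without scaling, the gradient underflows to zero in FP16.
unscaled = to_fp16(grad)

# With scaling, the value is representable in FP16; it is unscaled in higher precision.
scaled = to_fp16(grad * LOSS_SCALE)
recovered = scaled / LOSS_SCALE

if __name__ == "__main__":
    print(unscaled, scaled, recovered)
```

The recovered gradient is not bit-exact, but it is non-zero and close to the original value, so the corresponding weight update is no longer lost.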
The mixed-precision training methodology works across a wide range of advanced
tasks, such as object detection, speech recognition, and machine translation. Sharan
Narang et al. [22] trained the Faster-RCNN model [26] using mixed precision with
loss scaling and found that the model outperformed the baseline of 69.1% mean average
precision (mAP) on the Pascal VOC 2007 test set. Similarly, the Deep Speech 2 model
for speech recognition, trained using mixed precision on the English dataset,
achieved a Character Error Rate (CER) of 1.99, close to the original baseline of
2.20 CER.
Along with the object detection and speech recognition tasks, mixed-precision
training has also shown good results for machine translation tasks. Figure 3a and b
shows the training perplexity of the 3 × 1024 LSTM [27] model for the English to
French translation task without and with the mixed-precision technique. Three separate
FP32 training runs are represented by ref1, ref2, and ref3. This suggests that during
training, the half-precision storage format may operate as a regularizer.
Binary connect [23] is another popular technique for quantization that has shown
good results in the test-time inference of DNN models trained on standard benchmark
datasets like permutation-invariant MNIST [28] and CIFAR-10 [20].
The stochastic version of the binary connect technique has shown an error rate of
8.27% for DNNs trained on the CIFAR-10 [20] dataset. This shows that, despite using
only a single bit per weight during propagation, performance is not only comparable
to that of ordinary (non-regularized) DNNs, but actually better, implying that binary
connect can be considered a regularizer.
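To make the idea of one bit per weight concrete, the following Python sketch binarizes real-valued weights to −1 or +1 in two ways. It follows one common formulation of binary connect, which is an assumption here because the text does not spell it out: the deterministic variant takes the sign of the weight, and the stochastic variant draws +1 with a probability given by a hard sigmoid of the weight. All names are illustrative.

```python
import random


def hard_sigmoid(w: float) -> float:
    """Clip (w + 1) / 2 to the interval [0, 1]."""
    return min(1.0, max(0.0, (w + 1.0) / 2.0))


def binarize_deterministic(weights):
    """Map each weight to +1 if it is non-negative, otherwise to -1."""
    return [1.0 if w >= 0 else -1.0 for w in weights]


def binarize_stochastic(weights, rng: random.Random):
    """Map each weight to +1 with probability hard_sigmoid(w), otherwise to -1."""
    return [1.0 if rng.random() < hard_sigmoid(w) else -1.0 for w in weights]


if __name__ == "__main__":
    rng = random.Random(0)
    real_weights = [0.3, -0.7, 0.0, 0.9]   # hypothetical real-valued weights
    print(binarize_deterministic(real_weights))
    print(binarize_stochastic(real_weights, rng))
```

The randomness of the stochastic variant acts as noise on the weights during training, which is one intuition for the regularizing effect mentioned above.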
3 Pruning
Over-parameterized networks are generally large networks that contain redundancies,
and pruning is used to remove them. Removing these redundancies reduces the size of
the model and increases its speed. Pruning can also be defined as the removal of
unused parameters from an over-parameterized network.
Similar works in structured pruning like data-driven sparse structure selection [29]
and HAP [30] have an immense contribution to reducing the size and computational
complexity of applications.
A similar approach is proposed in AMC [31], which uses reinforcement learning to
provide the model compression policy. This learning-based policy performs much better
than the conventional rule-based compression policy: it achieves a higher compression
ratio and better accuracy.
Fig. 3 English to French translation network training perplexity
Similarly, in HAP [30], instead of pruning all the components, only the components
which are not sensitive are pruned.
Pruning revolves around the idea of cutting down additional weights in order to
reduce computational and memory expenses [29]. Its basic principle is to remove
unnecessary weights using second-derivative information, which leads to better
results, much faster processing, and a significant reduction in size. Deciding which
weights are important is done by ranking the neurons of the neural network, as
explained in Optimal Brain Damage. Pruning is carried out as an iterative process in
order to avoid removing neurons that are needed. As neural networks are black boxes,
this also ensures that a significant part of the network is not lost.
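The rank-then-remove loop described above can be sketched as follows. Note the swapped-in criterion: for simplicity this sketch ranks weights by absolute magnitude rather than by the second-derivative information used in Optimal Brain Damage, and the fine-tuning step between rounds is only indicated by a comment. Function names and weight values are hypothetical.

```python
def prune_smallest(weights, fraction: float):
    """Zero out the given fraction of the remaining non-zero weights,
    choosing those with the smallest absolute value."""
    alive = sorted((abs(w), i) for i, w in enumerate(weights) if w != 0.0)
    n_prune = int(len(alive) * fraction)
    pruned = list(weights)
    for _, index in alive[:n_prune]:
        pruned[index] = 0.0
    return pruned


def iterative_prune(weights, fraction: float, rounds: int):
    """Prune a little at a time instead of all at once."""
    for _ in range(rounds):
        weights = prune_smallest(weights, fraction)
        # A real pipeline would fine-tune the remaining weights here.
    return weights


if __name__ == "__main__":
    layer = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2, -0.3, 0.6]   # hypothetical weights
    print(iterative_prune(layer, fraction=0.5, rounds=2))
```

Pruning in several small rounds gives the network a chance to recover between rounds, which is the motivation for the iterative process.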
AutoML for model compression (AMC) [31] aims to find the redundant weights and
biases of each layer on the basis of sparsity. It leverages reinforcement learning to
search the action space efficiently, and the authors describe the detailed setting
of the reinforcement learning framework using three components.
• The State Space
• The Action Space
• Deep Deterministic Policy Gradient (DDPG).
As shown in Fig. 4, on the left, AMC replaces manual effort and makes model
compression fully automated. On the right, AMC is formulated as a reinforcement
learning problem that processes a pre-trained network (e.g., MobileNet [32]) layer by
layer.
To balance accuracy and latency, the AMC engine uses a single non-RNN controller,
which not only assists exploration using fewer GPU hours but also supports a
continuous action space.
On VGG-16 [33], AMC [31] outperformed all heuristic methods by more than
0.9% and beat human experts by 0.6% without manual effort. Even for MobileNet
V2 [34], which is already a compact, well-designed model, accuracy can still be
improved by 1% using AMC.
AMC increased the compression ratio of ResNet-50 [35] on ImageNet [36] from 3.4
times to 5 times without loss of performance (the top-5 accuracy of AMC’s pruned
model came out to be 92.89%).
Guo et al. [37] proposed dynamic network surgery to prune parameters during
training, but the irregular nature of the resulting sparse weights means that it
yields compression without faster inference in terms of wall-clock time.
$$\mathrm{Loss} = \frac{1}{N} \sum_{i} \left( y_i - Q(s_i, a_i \mid \theta^{Q}) \right)^2 \tag{1}$$
$$y_i = r_i - b + \gamma\, Q(s_{i+1} \mid \theta^{Q}).$$
In Eq. 1, γ is the discount factor, and it is set to 1 so that there is no
over-prioritizing of short-term reward.
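The loss in Eq. 1 is a mean squared error between the target values and the critic's estimates. The Python sketch below evaluates it for a small batch of transitions. The critic functions `q` and `q_next`, the baseline value, and the transitions are all hypothetical stand-ins; a real DDPG agent would use neural networks for the critic and would minimize this loss by gradient descent.

```python
def critic_loss(batch, q, q_next, baseline: float, gamma: float = 1.0) -> float:
    """Mean squared error between targets y_i and estimates Q(s_i, a_i).

    batch:  list of (state, action, reward, next_state) tuples
    q:      callable estimating Q(s_i, a_i)
    q_next: callable estimating the Q term for the next state s_{i+1}
    """
    total = 0.0
    for state, action, reward, next_state in batch:
        target = reward - baseline + gamma * q_next(next_state)   # y_i
        total += (target - q(state, action)) ** 2
    return total / len(batch)


# Hypothetical toy critic and transitions, for illustration only.
def toy_q(state, action):
    return 0.5 * state + action


def toy_q_next(next_state):
    return 0.5 * next_state


toy_batch = [(0.0, 0.2, 1.0, 1.0), (1.0, 0.1, 0.5, 2.0)]

if __name__ == "__main__":
    print(critic_loss(toy_batch, toy_q, toy_q_next, baseline=0.5, gamma=1.0))
```

With the discount factor set to 1, as in the text, the value of the next state counts as much as the immediate reward in the target.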
4 Knowledge Distillation
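$$\mathrm{JS}\left(p_T(\hat{x}), p_S(\hat{x})\right) = \frac{1}{2}\left[ \mathrm{KL}\left(p_T(\hat{x}), M\right) + \mathrm{KL}\left(p_S(\hat{x}), M\right) \right]. \tag{3}$$
In Eq. 3, $M = \frac{1}{2}\left[p_T(\hat{x}) + p_S(\hat{x})\right]$ is the average of the teacher and student distributions.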
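The Jensen–Shannon divergence between the teacher and student output distributions can be computed directly from its definition as the mean of two Kullback–Leibler divergences to the averaged distribution M. The sketch below is a plain-Python illustration on hypothetical class probabilities; it uses natural logarithms, so the result is bounded by ln 2.

```python
import math


def kl(p, q) -> float:
    """Kullback-Leibler divergence KL(p, q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js(p_teacher, p_student) -> float:
    """Jensen-Shannon divergence between teacher and student distributions."""
    m = [(t + s) / 2 for t, s in zip(p_teacher, p_student)]   # averaged distribution M
    return 0.5 * (kl(p_teacher, m) + kl(p_student, m))


if __name__ == "__main__":
    teacher = [0.7, 0.2, 0.1]   # hypothetical teacher probabilities
    student = [0.4, 0.4, 0.2]   # hypothetical student probabilities
    print(js(teacher, student))
```

The divergence is zero when teacher and student agree and grows as they disagree, so it can serve as a measure of how far the student's predictions are from the teacher's.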
The top-1 accuracy of DeepInversion surpasses that of DeepDream by a significant
margin when considering models like ResNet-18, Inception-V3, MobileNet V2
[34], and VGG-11.
After adding feature distribution regularization, there is an improvement in
accuracy by 40–69%.
Upon using competition-based inversion, it is observed that there is an improve-
ment in accuracy by 1–10% which brings the accuracy of the student to that of a
teacher who is trained on the CIFAR-10 dataset.
Quantized distillation [38] achieves better accuracy across an array of bit widths
and architectures. It outperforms post-mortem quantization for 2-bit and 4-bit
quantization. At 8 bits, the larger student model comes within 0.2% of the teacher's
accuracy, and there is only a small accuracy loss at 4-bit quantization.
In recent times, there has been high demand to make space-efficient neural networks.
Various approaches like [41–45] are categorized as either compressing pre-trained
networks or simultaneously training small networks.
MobileNet [46] is a type of network architecture that gives the model developer
the freedom to choose a small network that matches the resource requirements of
their application. Andrew G. Howard et al. primarily focus on improving latency
while working on small networks.
Another efficient network is the SqueezeNet [42], which makes use of the
bottleneck approach to design an efficient network.
Figure 6 shows, on the left, a standard convolutional layer followed by batch norm
(BN) and rectified linear unit (ReLU) and, on the right, a depth-wise separable
convolution, in which the depth-wise and pointwise layers are each followed by batch
norm and ReLU.
Depth-wise separable convolution is the base of the MobileNet architecture. All its
layers are followed by batch norm and ReLU nonlinearity. The final spatial
resolution is reduced to 1 using average pooling before the fully connected layers. In
total, MobileNet has 28 layers. Even though the MobileNet architecture is already a
very space-efficient and low-latency network, it can be made even smaller by using a
simple parameter called the width multiplier, which uniformly shrinks the network at
each layer.
Expression 4 represents the formula for the calculation of the computational cost
of a depth-wise separable convolution.
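For reference, the MobileNets paper [46] gives the cost of a depth-wise separable convolution as
D_K · D_K · M · D_F · D_F + M · N · D_F · D_F, against D_K · D_K · M · N · D_F · D_F for a standard
convolution, where D_K is the kernel size, M and N are the input and output channel counts, and
D_F is the feature map size. The sketch below evaluates both costs; the function names are
illustrative, and treating the width multiplier as a simple scaling of M and N is the assumption
used here.

```python
def standard_conv_cost(dk, m, n, df):
    """Multiply-adds of a standard convolution: DK*DK*M*N*DF*DF."""
    return dk * dk * m * n * df * df

def depthwise_separable_cost(dk, m, n, df, alpha=1.0):
    """Multiply-adds of a depth-wise separable convolution.

    alpha is the width multiplier; it thins both input and output channels.
    """
    m_thin = int(alpha * m)
    n_thin = int(alpha * n)
    depthwise = dk * dk * m_thin * df * df   # one DK x DK filter per input channel
    pointwise = m_thin * n_thin * df * df    # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

# Example: 3x3 kernel, 512 -> 512 channels, 14x14 feature map
ratio = depthwise_separable_cost(3, 512, 512, 14) / standard_conv_cost(3, 512, 512, 14)
```

The ratio equals 1/N + 1/D_K², so with 3 × 3 kernels the separable form needs roughly eight to
nine times fewer multiply-adds than the standard convolution.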
6 Discussion
We have shown that, as research pushes toward better model accuracy, research into
better NN compression techniques becomes equally necessary. Pruning removes
unnecessary weights and biases in order to obtain a small and efficient model.
Quantization, on the other hand, reduces the number of bits in which weights are
stored to achieve a smaller size, while knowledge distillation trains a deep teacher
network on the dataset and then trains a small student network to learn from the
teacher, in the expectation that the smaller network will achieve performance
similar to that of the bigger network.
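As a concrete illustration of pruning, the sketch below applies magnitude-based pruning, one
common criterion in which the weights with the smallest absolute values are set to zero. The
function name and the `sparsity` parameter are assumptions for this example; the methods
surveyed in this chapter use a variety of other criteria.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out roughly the `sparsity` fraction of weights with the smallest magnitude.

    Returns the pruned weights and the boolean mask of kept weights.
    Ties at the threshold are pruned too, so slightly more weights may be removed.
    """
    w = np.asarray(weights, dtype=float)
    k = int(sparsity * w.size)              # number of weights to remove
    if k == 0:
        return w.copy(), np.ones(w.shape, dtype=bool)
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    mask = np.abs(w) > threshold            # keep only weights above the threshold
    return w * mask, mask

pruned, mask = magnitude_prune([[0.1, -0.5], [0.05, 2.0]], sparsity=0.5)
```

The resulting matrix is sparse, which is why pruned models usually need sparse storage formats
or structured pruning to turn the smaller parameter count into real speedups.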
Pruning connections, however, leads to sparse matrices, which are computationally
difficult to handle. Since a complex network has so many connections, pruning
them is not computationally cheap and can cause problems of its own. Alternatively,
a simple quantization approach may sometimes lead to a substantial loss in accuracy.
For example, binarization can achieve 32× model compression, but it has shown poor
accuracy on LSTM and RNN models, since its simplicity aggravates the
vanishing/exploding gradient problem. Loss-aware quantization techniques can be
considered a better approach than simple static quantization, as they quantize
weights with respect to the loss and show superior performance to static
quantization methods.
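The trade-off between bit width and accuracy can be seen in a small sketch of static
quantization. Both functions below are illustrative assumptions: `uniform_quantize` maps weights
onto 2^bits evenly spaced levels, and `binarize` keeps only the sign of each weight with a single
scaling factor (the mean absolute value, one common choice). Neither is loss-aware.

```python
import numpy as np

def uniform_quantize(weights, bits):
    """Round weights to 2**bits evenly spaced levels between their min and max."""
    w = np.asarray(weights, dtype=float)
    w_min, w_max = w.min(), w.max()
    if w_max == w_min:
        return w.copy()
    scale = (w_max - w_min) / (2 ** bits - 1)   # distance between adjacent levels
    return np.round((w - w_min) / scale) * scale + w_min

def binarize(weights):
    """1-bit quantization: sign of each weight times a shared scaling factor."""
    w = np.asarray(weights, dtype=float)
    alpha = np.mean(np.abs(w))
    return alpha * np.where(w >= 0, 1.0, -1.0)

w = np.array([-1.0, -0.2, 0.3, 1.0])
w_2bit = uniform_quantize(w, bits=2)
w_1bit = binarize(w)
```

Storing one bit instead of a 32-bit float per weight gives the 32× compression mentioned above,
at the cost of the largest rounding error of any bit width.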
Efficient neural architecture design focuses on managing the data flow of a neural
network so as to obtain the best accuracy with the least memory usage. Much research
still needs to be done on designing efficient neural architectures that are useful
for various use cases in image classification and segmentation.
7 Conclusion
In this paper, we have reviewed the main neural network compression techniques,
namely quantization, pruning, knowledge distillation, and efficient model
architecture. We have analyzed the implementation of these methods and discussed
the pros and cons of each. Before implementing a compression technique, it is
important to understand how each method works and what impact it will have on the
performance of the model. From our comparative analysis, knowledge distillation
appears to be a particularly attractive model compression method, as it requires
less human effort. We also believe that, depending on the use case, each of these
methods can prove helpful in reducing the size of the model. With AI technology
spreading to resource-constrained edge devices, the development of advanced neural
network compression techniques is a must. This will also play a vital role in
increasing the usability of NN models in resource-constrained systems such as IoT
and space applications.
References
1. Kim P, Convolutional neural network. In: MATLAB deep learning. Apress, Berkeley, CA
2. O’Shea K, Nash R, An introduction to convolutional neural networks
3. Mandic D, Chambers J, Recurrent neural networks for prediction: learning algorithms,
architectures, and stability. Wiley
4. Schuster M, Paliwal KK (1997) Bidirectional recurrent neural networks. IEEE Trans Sig Proc
45(11):2673–2681. https://doi.org/10.1109/78.650093
5. Huang G, Liu Z, van der Maaten L, Weinberger KQ, Densely connected convolutional networks.
arXiv:1608.06993 [cs.CV]
6. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z, Rethinking the inception architecture for
computer vision. University College London
7. Szegedy C, Ioffe S, Vanhoucke V, Alemi A, Inception-ResNet and the impact of residual
connections on learning
8. Zhang X, Li Z, Loy CC, Lin D, PolyNet: a pursuit of structural diversity in very deep
networks
9. Hu J, Shen L, Albanie S, Sun G, Wu E, Squeeze-and-excitation networks. Comput Vis Pattern
Recog (cs.CV)
24 A Survey on Efficient Neural Network Compression Techniques 297
10. Huang Y, Cheng Y, Bapna A, Firat O, Chen MX, Chen D, Lee H, Ngiam J, Le QV, Wu Y, Chen
Z, GPipe: efficient training of giant neural networks using pipeline parallelism. Comput Vis
Pattern Recog (cs.CV)
11. Xie S, Girshick R, Dollár P, Tu Z, He K, Aggregated residual transformations for deep
neural networks. Facebook AI Research, UC San Diego
12. Schaller R (1997) Moore’s law: past, present, and future. IEEE Spectrum 52–59
13. Amodei D, Hernandez D (2018) AI and Compute. Open-ai Research
14. Stefanini M, Cornia M, Baraldi L, Cascianelli S, Fiameni G, Cucchiara R (2021) From show
to tell: a survey on image captioning. arXiv preprint arXiv:2107.06912
15. Hossain Z, Sohel F, Shiratuddin MF, Laga H (2019) A comprehensive survey of deep learning
for image captioning. ACM Comput Surv 51(6) Article 118 36 pp
16. Li K, Wan G, Cheng G, Meng L, Han J (2020) Object detection in optical remote sensing
images: A survey and a new benchmark. ISPRS J Photogramm Remote Sens 159:296–307
17. Yuan Q, Shen H, Li T, Li Z, Li S, Jiang Y, Xu H, Tan W, Yang Q, Wang J, Gao J (2020) Deep
learning in environmental remote sensing: achievements and challenges. Remote Sens Environ
241:111716
18. Li Z, Liu F, Yang W, Peng S, Zhou J (2021) A survey of convolutional neural networks: analysis,
applications, and prospects. IEEE Trans Neural Netw Learn Syst
19. Kiranyaz S, Avci O, Abdeljaber O, Ince T, Gabbouj M, Inman DJ (2021) 1D convolutional
neural networks and applications: a survey. Mech Syst Signal Process 151:107398
20. Ho-Phuoc T, CIFAR10 to compare visual recognition performance between deep neural
networks and humans. The University of Danang – University of Science and Technology
21. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical
image database. In: IEEE conference on computer vision and pattern recognition, pp 248–255.
https://doi.org/10.1109/CVPR.2009.5206848
22. Narang S, Micikevicius P, Diamos G, Elsen E, Alben J, Garcia D, Ginsburg B, Houston M,
Kuchaiev O, Venkatesh G, Wu H (2018) Mixed precision training. ICLR
23. Courbariaux M, Bengio Y, David J-P (2016) BinaryConnect: training deep neural networks
with binary weights during propagations. CS.LG
24. Chmiel B, Ben-Uri L, Shkolnik M, Hoffer E, Banner R, Soudry D, Neural gradients are
near-lognormal: improved quantized and sparse training. In: Habana labs—an intel company.
Caesarea, Israel, Department of Electrical Engineering - Technion, Haifa, Israel
25. Faghri F, Tabrizian I, Markov I, Alistarh D, Roy DM, Ramezani-Kebrya A, Adaptive gradient
quantization for data-parallel SGD. University of Toronto, Vector Institute, IST Austria and
Neural Magic
26. Ren S, He K, Girshick R, Sun J, Faster R-CNN: towards real-time object detection with region
proposal networks. Comput Vis Pattern Recog. arXiv:1506.01497 [cs.CV]
27. Hochreiter S, Schmidhuber J, Long short-term memory, Neural Comput 9:1735–80. https://
doi.org/10.1162/neco.1997.9.8.1735
28. LeCun Y, Cortes C, Burges CJC, The MNIST database of handwritten digits. Courant
Institute, NYU; Google Labs, New York; Microsoft Research, Redmond
29. Huang Z, Wang N, Data-driven sparse structure selection for deep neural networks. TuSimple
30. Yu S, Yao Z, Gholami A, Dong Z, Kim S, Mahoney MW, Keutzer K, Hessian-aware pruning
and optimal neural implant. Peking University, University of California, Berkeley
31. He Y, Lin J, Liu Z, Wang H, Li L-J, Han S, AMC: AutoML for model compression and
acceleration on mobile devices. Massachusetts Institute of Technology, Carnegie Mellon
University, Google
32. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H,
MobileNets: efficient convolutional neural networks for mobile vision applications. Comput
Vis Pattern Recog. arXiv:1704.04861 [cs.CV]
33. Simonyan K, Zisserman A, Very deep convolutional networks for large-scale image recognition.
Comput Vis Pattern Recog. arXiv:1409.1556 [cs.CV]
34. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen, L-C, MobileNetV2: inverted residuals
and linear bottlenecks. Comput Vis Pattern Recog. arXiv:1801.04381 [cs.CV]
35. He K, Zhang X, Ren S, Sun J, Deep residual learning for image recognition. Comput Vis Pattern
Recog. arXiv:1512.03385 [cs.CV]
36. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A,
Bernstein M, Berg A, Fei-Fei L (2014) ImageNet large scale visual recognition challenge. Int
J Comput Vis. 115. https://doi.org/10.1007/s11263-015-0816-y
37. Guo Y, Yao A, Chen Y, Dynamic network surgery for efficient DNNS. In: NIPS
38. Polino A, Pascanu R, Alistarh D, Model compression via distillation and quantization.
ETH Zurich, Google DeepMind, IST Austria
39. Li Y, Yang J, Song Y, Cao L, Luo J, Li, L-J (2017) Learning from noisy labels with distillation.
CS>CV
40. Yin H, Molchanov P, Li Z, Alvarez JM, Mallya A, Hoiem D, Jha NK, Kautz J, Dreaming to
distill: data-free knowledge transfer via DeepInversion. In: NVIDIA. Princeton University, the
University of Illinois at Urbana-Champaign
41. Jin J, Dundar A, Culurciello E (2014) Flattened convolutional neural networks for feedforward
acceleration
42. Iandola FN, Moskewicz MW, Ashraf K, Han S, Dally WJ, Keutzer K (2016) Squeezenet:
Alexnet-level accuracy with 50x fewer parameters and 1MB model size
43. Rastegari M, Ordonez V, Redmon J, Farhadi A (2016) Xnornet: Imagenet classification using
binary convolutional neural networks. arXiv preprint
44. Wang M, Liu B, Foroosh H (2016) Factorized convolutional neural networks
45. Yang Z, Moczulski M, Denil M, de Freitas N, Smola A, Song L, Wang Z (2015) Deep-
fried convnets. In: Proceedings of the IEEE international conference on computer vision, pp
1476–1483
46. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H,
MobileNets: efficient convolutional neural networks for mobile vision applications. Google
Inc.
47. Ioannou Y, Robertson D, Cipolla R, Criminisi A, Deep roots: improving CNN efficiency with
hierarchical filter groups. University of Cambridge and Microsoft Research
48. Ma N, Zhang X, Zheng H-T, Sun J, ShuffleNet V2: practical guidelines for efficient CNN
architecture design. Megvii Inc (Face++) and Tsinghua University
Chapter 25
Ortho-FLD: Analysis of Emotions Based
on EEG Signals
1 Introduction
Everyday interactions between human beings and the external environment depend
on several emotional states, ranging from basic to complex ones. In recent years,
rapid advances in machine learning and information technology have made it feasible
to empower machine intelligence to analyze emotions from various perspectives.
Emotion is a physiological condition that represents an individual's mood, and
emotions are a powerful influence on how we feel about particular events around
us. Involving affective, cognitive, expressive, and motivational components, they are
considered multi-component phenomena [1]. Emotional imbalance causes unhappy
circumstances in humans and lies at the core of mental illness. Therefore, analyzing
different emotional states and developing an emotionally intelligent system is a
crucial task in the field of affective computing. Recent literature demonstrates
that audiovisual and physiological signals [2] are the two kinds of emotional
reflections used to infer emotions in various applications. In general, audiovisual
research studies draw on facial expression [3], speech [4], and body
movements/gestures [5, 6]. However, these modes of emotional reflection may be
controlled and may vary with the internal and external sources around. Hence,
academic research in this field may be negatively impacted by this complexity, by
variability between individuals, and by situational heterogeneity.
Physiological signals are true in nature and may not be under the control of humans,
into segments with a 6-second time window and calibrated using 3-second baseline
data. Each segment's differential entropy is then extracted to create a feature cube,
which is the input to a deep learning model that combines a graph convolutional
neural network (GCNN) and a long short-term memory (LSTM) neural network.
Multiple GCNNs are employed in the fusion model to extract graph-domain
information, LSTM cells extract temporal features by memorizing how the
relationship between two channels changes over time, and a dense layer produces
the emotion classification results on the DEAP dataset.
At each stage of emotion analysis, multiple issues arise from a machine learning
point of view, leading to complex problems from different perspectives. Ultimately,
all the stages are essential in analyzing various emotional states. One of the key
factors, however, is the number of input features at the classification stage.
Most of the features are correlated, which leads to redundancy. It is therefore
important, and challenging, to explore new feature representations with reduced
dimension that do not lose crucial information.
2 Proposed Methodology
The pyramidal structured forward interpolation technique is used to extract and
represent relevant features from high-dimensional time-domain EEG signals. It
takes differences between the discrete samples of a given database (the samples
differ from one another) over closed intervals (consecutive even and odd terms) at
successive levels of iteration, reducing the high-dimensional data to half of its
samples at each level. The results obtained are also discrete because the input set
of samples is discrete. In general, the notation of our proposed work at each level
of interpolation is given by $\Delta(x) : x(n) - x(n+1)$, where $n = 1, 2, 3, \ldots, N$
(38,000 samples of each subject from four different classes), $\Delta(x)$ denotes the
different levels of forward difference operations, and $x(n)$ and $x(n+1)$ are the
values of consecutive samples.
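The halving behavior described above can be sketched as follows. This is an illustrative reading,
not the authors' code: it assumes that each level differences non-overlapping pairs of
consecutive samples and drops a trailing unpaired sample, which reproduces the reduction from
38,000 to 1187 samples over five levels reported later in this chapter. The function name is
hypothetical.

```python
import numpy as np

def pyramid_forward_difference(signal, levels=5):
    """Apply pairwise forward differences x(n) - x(n+1), halving the length per level."""
    x = np.asarray(signal, dtype=float)
    for _ in range(levels):
        usable = x.size - (x.size % 2)          # drop a trailing unpaired sample (assumption)
        x = x[0:usable:2] - x[1:usable:2]       # difference of each consecutive pair
    return x

# Lengths per level for a 38,000-sample signal: 19000, 9500, 4750, 2375, 1187
features = pyramid_forward_difference(np.arange(38000), levels=5)
```

Each level is a cheap linear operation, so the whole reduction stays in the time domain and
needs no transform such as a wavelet or Fourier decomposition.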
Five different levels of forward difference interpolation iterations for dimensional
reduction are as follows:
A frequently used dimension reduction method is PCA [16, 17], which makes use
of principal components computed through singular value decomposition. The
directions of the principal components maximize variation in the projected data,
but PCA is an unsupervised learning approach and ignores class labels; linear
discriminant analysis (LDA), in contrast, takes the label data into account. LDA is
a widely used method for reducing the dimension of data, built on the Fisher ratio
criterion. To optimize the separation between classes, LDA makes use of Fisher
linear discriminant analysis (FLD), which reduces the data dimension by reducing
the variance within each class and increasing the gap between the calculated
class means. It is a supervised, scatter-matrix-based classifier: if labeled data
is given as input, it can determine a set of weights to draw a decision boundary
and thus classify the data. It aims to find the vector that maximizes the
between-class separation of the projected data (maximizing separation can
be ambiguous). The key criterion followed by FLD is to maximize the distance
between the projected means while minimizing the within-class variance of the
projections. More formally, given several independent feature matrices related
to the label data, FLD generates a linear combination of these that produces the
greatest mean differences between the classes [18].
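The Fisher criterion described above can be sketched with the within-class scatter matrix S_w
and the between-class scatter matrix S_b: the discriminant vectors are the leading eigenvectors
of S_w^{-1} S_b. The code below is a minimal illustration under that standard formulation; the
function name is hypothetical, and the use of a pseudo-inverse is an assumption made here for
numerical safety, not a detail taken from the chapter.

```python
import numpy as np

def fisher_discriminant_vectors(X, y, n_components):
    """Return the top eigenvectors of pinv(S_w) @ S_b as columns."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_w = np.zeros((d, d))                       # within-class scatter
    S_b = np.zeros((d, d))                       # between-class scatter
    for label in np.unique(y):
        Xc = X[y == label]
        class_mean = Xc.mean(axis=0)
        centered = Xc - class_mean
        S_w += centered.T @ centered
        diff = (class_mean - overall_mean)[:, None]
        S_b += Xc.shape[0] * (diff @ diff.T)
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(-eigvals.real)            # largest Fisher ratio first
    return eigvecs.real[:, order[:n_components]]

# Two classes separated along the first axis, with more spread along the second
X = [[0, 0], [0, 1], [0, -1], [0.1, 0], [-0.1, 0],
     [5, 0], [5, 1], [5, -1], [5.1, 0], [4.9, 0]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
W = fisher_discriminant_vectors(X, y, n_components=1)
```

In this toy example the discriminant vector points along the first axis, the direction that
separates the class means, even though the data vary more along the second axis.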
$$v_{k+1} = u_{k+1} - \sum_{i=1}^{k} \frac{v_i^{T} u_{k+1}}{v_i^{T} v_i}\, v_i$$
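The orthogonalization step is the classical Gram-Schmidt procedure: each new discriminant
vector has its projections onto the previously accepted orthogonal vectors removed. The sketch
below assumes the discriminant vectors u_1, u_2, ... are stored as the columns of a matrix and
are linearly independent; the function name is illustrative.

```python
import numpy as np

def orthogonalize(U):
    """Gram-Schmidt on the columns of U; returns mutually orthogonal columns V."""
    U = np.asarray(U, dtype=float)
    V = np.zeros_like(U)
    for k in range(U.shape[1]):
        v = U[:, k].copy()
        for i in range(k):
            # remove the component of u_k that lies along the earlier vector v_i
            v -= (V[:, i] @ U[:, k]) / (V[:, i] @ V[:, i]) * V[:, i]
        V[:, k] = v
    return V

U = np.array([[1.0, 1.0],
              [0.0, 1.0]])       # columns u1 = (1, 0) and u2 = (1, 1)
V = orthogonalize(U)             # columns v1 = (1, 0) and v2 = (0, 1)
```

After this step the off-diagonal entries of V.T @ V are zero, so each projection captures
information that the earlier projections did not.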
Note also that the orthogonal discriminant vectors $v_1, v_2, v_3$ are used for
extracting features rather than the original discriminant vectors
$u_1, u_2, u_3$. Finally, it is worth knowing that PCA-based methods are by definition
304 M. S. Thejaswini et al.
orthogonal in nature; in the case of FLD, however, the transformation matrix
$S_w^{-1} S_b$ is not symmetric. For a non-symmetric matrix, the eigenvectors
obtained can be linearly independent yet correlated, which increases the
likelihood of redundant information among the Fisher discriminant vectors. This is
why FLD performs poorly when most or all projection vectors are considered [19].
tions, EEG signal information was recorded from 28 healthy subjects aged 20–27
while they played four different computer games for 20 min in total (each game
lasted five minutes, and the games were named G1: Boring, G2: Calm, G3: Funny,
G4: Horror). The EEG signals were recorded with a 14-channel (AF3, F7, F3, FC5, T7,
P7, O1, O2, P8, T8, FC6, F4, F8, AF4) wireless Emotiv EPOC EEG device. The native
sampling rate of the device was 2048 Hz; however, the signals were down-sampled to
128 Hz at the time of experimentation. The dataset holds two folders (raw and
preprocessed signals); we considered only the preprocessed EEG signals for our
proposed experiments. To remove artifacts caused by hand, head, and arm movements,
the dataset authors used the fifth-order sinc filter built into the EEG device
itself. The dataset contains 1568 (4 × 14 × 28) EEG recordings, where 4 is the
number of games played, 14 is the number of EEG channels, and 28 is the number of
subjects who participated in the recording. The sample length of the EEG signal
for a single subject in each emotion is 38,252. For more detailed technical
information on the dataset, refer to [26].
This section details the experimental design carried out using the video game-based
GAMEEMO EEG dataset. We considered the preprocessed EEG features of 28 subjects
from all four emotional classes, recorded using a 14-channel EEG device, for
implementing the proposed work. The implementation used MATLAB 2018 on a PC with an
Intel i5 processor and 8 GB of RAM. In the dataset, the sample length of a single
subject in each class of emotion is 38,000. The earlier section details how EEG
features are extracted and represented in the time domain, and how their dimension
is reduced using pyramidal structure interpolation and the OFLD technique. The
pyramidal approach represents EEG features by differencing, reducing the larger set
of EEG features with a sample length of 38,000 to 1187 samples over five levels of
interpolation. The resulting 1187 features were then divided into training and
testing sets in an 80:20 ratio: we selected the EEG features of 22 subjects for
training and those of 6 subjects for testing to classify 4 different emotions.
The same procedure was followed for all
Table 2 Comparison table on accuracy (percentage) using GAMEEMO dataset for all 14 channel
EEG signals

Method                                      AF3    AF4    F3     F4     F7     F8     FC5
Alakus et al. method + KNN [26]             61     75     59     67     67     75     64
Alakus et al. method + SVM [26]             81     88     63     72     84     80     66
Alakus et al. method + MLPN [26]            86     87     79     83     84     84     79
Tuncer et al. method + LEDPatNet19 [30]     98.75  98.57  99.11  98.39  98.21  98.75  98.57
Tuncer et al. method + SVM [12]             99.33  99.55  98.66  98.21  98.66  99.78  99.88
Our proposed method + GRNN                  100    100    100    100    100    100    100

Method                                      FC6    O1     O2     P7     P8     T7     T8
Alakus et al. method + KNN [26]             68     65     65     61     73     61     64
Alakus et al. method + SVM [26]             68     57     70     59     81     65     81
Alakus et al. method + MLPN [26]            85     79     83     79     77     75     79
Tuncer et al. method + LEDPatNet19 [30]     99.29  99.11  98.39  98.57  98.57  98.04  98.57
Tuncer et al. method + SVM [12]             98.66  97.32  99.33  99.78  98.88  98.88  100
Our proposed method + GRNN                  100    100    100    100    100    100    100
14 channel EEG features. The 1187 EEG features from the interpolation technique
were then projected with OFLD, which gave us the ten most discriminative patterns
of EEG features. OFLD projects the multi-dimensional input space onto the
projection vectors that maximize the ratio of between-class scatter to
within-class scatter. These 10 projections were then applied to a GRNN to test the
data. GRNNs were chosen as classifiers because of their high performance in time
series prediction tasks across a wide range of applications. The results obtained
for all 14 channels with the proposed method are given in Tables 1 and 2. The
tabulated results show that the combination of the dimensional reduction approach
and GRNN outperforms other existing state-of-the-art methods.
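To make the classifier concrete, the sketch below implements a general regression neural
network in its usual form: a Gaussian kernel-weighted average of the training targets, applied
here to one-hot class labels. It is a minimal illustration, not the MATLAB implementation used
in the experiments; the function name and the smoothing parameter `sigma` are assumptions.

```python
import numpy as np

def grnn_classify(X_train, y_train, X_test, sigma=1.0):
    """GRNN prediction: kernel-weighted average of one-hot targets, then argmax."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)
    one_hot = (y_train[:, None] == classes[None, :]).astype(float)
    # squared Euclidean distance between every test and training pattern
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # subtracting the row minimum leaves the weighted average unchanged but avoids underflow
    d2 = d2 - d2.min(axis=1, keepdims=True)
    weights = np.exp(-d2 / (2.0 * sigma ** 2))
    scores = weights @ one_hot / weights.sum(axis=1, keepdims=True)
    return classes[np.argmax(scores, axis=1)]

X_train = [[0, 0], [0, 1], [5, 5], [5, 6]]
y_train = [0, 0, 1, 1]
predicted = grnn_classify(X_train, y_train, [[0.2, 0.3], [4.8, 5.5]], sigma=1.0)
```

A GRNN has no iterative training phase: the training patterns are stored, and `sigma` is the
only parameter to tune, which keeps the classifier simple once the features are reduced to a
handful of projections.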
4 Conclusion
In this research study, a new method for classifying four different emotions using
EEG signals was presented. The prime intention of this study was to extract and
represent EEG features and to reduce their huge dimension to a smaller one without
information loss. To accomplish this, we adopted a combination of interpolation for
the representation of features and OFLD for dimensional reduction. Interpolation
was applied to the larger set of EEG features to extract and represent relevant
features, and the resulting reduced-dimension features were passed to OFLD, which
represents the given patterns in a distinctive, highly discriminative way. Working
with an orthogonal system rather than a non-orthogonal one is always worthwhile in
terms of precision and computation. Hence, in this study the behavioral
characteristics of OFLD were explored in the classification of EEG signals.
Ortho-FLD reached better classification performance even with a smaller number of
training samples. The proposed study compares favorably with other state-of-the-art
methods because the feature representation selects relevant features directly from
the time domain of one-dimensional EEG data; no data transformation was carried
out, unlike other traditional methods. Finally, a conventional neural network,
GRNN, was used to classify four different emotions. The developed model was
examined on the GAMEEMO dataset, and the observed results are promising for all
14 EEG channels. The proposed study is unique and the first of its kind in emotion
recognition from EEG signals. In the future, we wish to explore new dimension
reduction methods for detecting various emotions in the field of affect
recognition.
References
9. Gupta V, Chopda MD, Pachori RB (2018) Cross-subject emotion recognition using flexible
analytic wavelet transform from EEG signals. IEEE Sens J 19(6):2266–2274
10. Arjun A, Rajpoot AS, Panicker MR (2021) Introducing attention mechanism for EEG signals:
emotion recognition with vision transformers. In: 2021 43rd annual international conference
of the IEEE engineering in medicine and biology society (EMBC). IEEE, pp 5723–5726
11. Khare SK, Bajaj V (2020) Time-frequency representation and convolutional neural network-
based emotion recognition. IEEE Trans Neural Networks Learn Syst 32(7):2901–2909
12. Tuncer T, Dogan S, Baygin M, Acharya UR (2022) Tetromino pattern based accurate EEG
emotion classification model. Artific Intell Med 123:102210
13. Yin Y, Zheng X, Hu B, Zhang Y, Cui X (2021) EEG emotion recognition using fusion model
of graph convolutional neural networks and LSTM. Applied Soft Comput 100:106954
14. Liu J, Meng H, Li M, Zhang F, Qin R, Nandi AK (2018) Emotion detection from EEG record-
ings based on supervised and unsupervised dimension reduction. Concurr Comput Pract Exp
30(23):e4446
15. Thejaswini MS, Hemantha Kumar G, Manjunatha Aradhya VN (2022) A pyramidal approach
for emotion recognition from EEG signals. In: 2nd international conference on applied intel-
ligence and informatics. Springer Cham. (Paper accepted and article in Press)
16. Bazgir O, Mohammadi Z, Habibi SAH (2018) Emotion recognition with machine learning using
EEG signals. In: 2018 25th national and 3rd international iranian conference on biomedical
engineering (ICBME). IEEE, pp 1–5
17. Chen J, Ro T, Zhu Z (2022) Emotion recognition with audio, video, EEG, and EMG: a dataset
and baseline approaches. IEEE Access 10:13229–13242
18. Aradhya VM, Niranjan SK, Hamsaveni L (2013) A robust analysis of FLD and orthogonal
FLD on handwritten characters. In: 2013 international conference on communication systems
and network technologies. IEEE, pp 105–108
19. Gilbert S (2007) Linear algebra and its applications. Thomson
20. Aradhya VNM, Niranjan SK, Hemantha Kumar G (2010) Probabilistic neural network based
approach for handwritten character recognition. Special Issue of IJCCT 1(2):3
21. Aradhya VNM, Pavithra MS, Naveena C (2012) A robust multilingual text detection approach
based on transforms and wavelet entropy. Procedia Technol 4:232–237
22. Aradhya VN, Mahmud M, Guru DS, Agarwal B, Kaiser MS (2021) One-shot cluster-based
approach for the detection of COVID-19 from chest X-ray images. Cognit Comput 13(4):873–
881
23. Aradhya VNM, Niranjan SK, Hemantha Kumar G (2010) Probabilistic neural network based
approach for handwritten character recognition. Special Issue of IJCCT 1(2)
24. Prakash BV, Ajay DV, Ashoka, Manjunath Aradhya VN (2015) An exploration of PNN and
GRNN models For efficient software development effort estimation
25. Aradhya VNM, et al (2020) Learning through one shot: a phase by phase approach for COVID-
19 chest X-ray classification. In: 2020 IEEE-EMBS conference on biomedical engineering and
sciences (IECBES). IEEE
26. Alakus TB, Gonen M, Turkoglu I (2020) Database for an emotion recognition system based on
EEG signals and various computer games-GAMEEMO. Biomed Sig Proc Control 60:101951
27. Chen Y, Chang R, Guo J (2021) Emotion recognition of EEG signals based on the ensemble
learning method: Adaboost. Math Prob Eng
28. Gao Q, Wang CH, Wang Z, Song XL, Dong EZ, Song Y (2020) EEG based emotion recognition
using fusion feature extraction method. Multimedia Tools Appl 79(37):27057–27074
29. Shon D, Im K, Park JH, Lim DS, Jang B, Kim JM (2018) Emotional stress state detection using
genetic algorithm-based feature selection on EEG signals. Int J Environ Res Public Health
15(11):2461
30. Tuncer T, Dogan S, Subasi A (2022) LEDPatNet19:automated emotion recognition model
based on nonlinear LED pattern feature extraction function using EEG signals. Cognitive
Neurodyn 16(4):779–790
Chapter 26
Implementation of Reliable Post-disaster
Relief Communication Network Using
Hybrid Secure Routing Protocol
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 309
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_26
310 G. Sabeena Gnana Selvi et al.
2 Related Work
The primary feature of a MANET is the continuous mobility of its nodes, which may
lead to frequent topology changes and other difficulties, including how to route
data packets between nodes. A MANET can be used in a variety of settings to
construct a network quickly and easily; these settings include disaster situations,
WSN, and VANET [13]. Each environment differs from the others in certain ways. A
collection of wireless mobile computers work together by forwarding packets for one
another, so that they can communicate beyond the range of direct wireless
transmission. A MANET is an autonomous group of mobile users that communicate over
relatively bandwidth-constrained wireless links. Since the nodes are mobile, the
network topology may change rapidly and unpredictably over time. Because the
network is self-configuring and decentralized, the mobile nodes themselves need to
provide routing functions [14].
The routers and participating nodes form the wireless topology of the network,
which may change quickly and unpredictably because the routers are free to move and
organize themselves at will. Such a network might function independently or it
might be linked to the wider Internet. Nodes that are within radio transmission
range of each other can communicate directly, while other nodes require the
assistance of intermediary nodes to forward their packets. These networks can
function anywhere without the help of any infrastructure because they are
completely distributed, which also makes them quite robust [15]. The wireless
connectivity that exists between the nodes at any given time depends on the
positions of the nodes, their transmitter and receiver coverage patterns,
transmission power levels, and co-channel interference levels. Since users are not
constrained to a single physical location, as is the case with traditional wireline
networks, the MANET permits a more transparent communication architecture. It is a
new, distinctive kind of network without any fixed wired communication
infrastructure or additional network hardware [16].
Since MANET nodes vary in communication range and have limited energy resources
that typically cannot be recharged or replaced, they face numerous challenges,
including low bandwidth, high energy consumption, limited memory, processing
limitations, and changing mobility patterns [17]. Examples of these devices include
mobile phones, PDAs, digital cameras, earphones, wristwatches, iPads, and laptops.
Changing mobility patterns cause periodic reorganizations of the network topology.
Wireless networks also differ from wired networks because of intra-flow and
inter-flow interference and fading. In the absence of a centralized node, nodes
interact with one another through peer-to-peer queries. As a result, data must be
transmitted through intermediary nodes, making routing a significant problem in a
MANET [18]. The following sub-section summarizes the various existing routing
protocols used in the MANET environment.
The suggested model is the Hybrid Algorithm for Secured MANET Environment (HASME).
The HASME implementation evicts problematic nodes from the MANET, and HASME is
compared with three existing procedures. The HASME algorithm presented in this
research study enables self-starting, multi-hop, dynamic routing among all the
network participants who seek to construct and maintain a network connecting all
the existing nodes. As covered in the previous section, MANETs are subject to a
variety of network attacks, including gray hole and black hole attacks. The
technique suggested in this research study is primarily designed to counter these
attacks and to offer secure message transmission in the network. Additionally, the
method enables all mobile nodes to quickly find new paths to their destination [19].
A new EENS-DA model was proposed to achieve network slicing and data aggregation in
WSN. The EENS-DA model allocates the needed resources to specific applications
clearly and efficiently. Moreover, it employs Conv-LSTM-based network slicing and
tree-based aggregation techniques. The EENS-DA technique enhances the efficacy of
data slicing, improves accuracy, and ensures privacy preservation in the
network [20].
Because reactive protocols establish routes only when those routes are required,
they are known as on-demand protocols. As the name implies, the demand
originates at the source. When a source node needs a route to a destination, it
starts the network's route discovery process. The process is finished once a
route is discovered or all potential route variations have been examined.
Following that, a route maintenance technique is used to keep the valid routes
and eliminate the invalid routes [27].
Ad hoc On-demand Distance Vector routing (AODV)
A distance vector routing protocol called AODV was launched for MANET in 2003
[28]. AODV is built to operate at a variety of node speeds and in high-density
network topologies. It has been designed to work in a loop-free manner, which
overcomes the count-to-infinity problem that plagues traditional distance vector
protocols, and it assumes a trusted network that does not contain malicious
nodes. The AODV routing protocol has two operational modes: route discovery and
route maintenance. Route Request, Route Reply, Route Error, and Route Reply
Acknowledgment are the types of AODV control messages. The routing process is
started only when a route is requested.
Dynamic Source Routing (DSR)
In 1994, the DSR on-demand protocol made its debut. Like AODV, it has two
stages: route discovery and route maintenance. Even in a system with a density
of up to 200 nodes and high rates of mobility, it can guarantee loop-free
routing by employing a variety of strategies that allow multiple paths to be
followed to any endpoint.
DSR enables unidirectional links, in contrast to AODV operation. Due to the fact
26 Implementation of Reliable Post-disaster Relief Communication … 313
that the header of each data packet carries all the routing information needed
to reach the target node, this protocol is known as source routing. Again,
unlike AODV, connectivity among neighbors is not required to be periodically
updated [29].
3 Proposed Methodology
When a disaster occurs, a lot of people start looking for disaster-relevant information.
This could cause congestion issues, which would greatly reduce network performance
and increase end-to-end delay. Most routing protocols choose the path with the
fewest hops between each source and destination pair when routing traffic. If
the same path is used repeatedly, the batteries of the nodes on that path are
quickly depleted. Furthermore, shortest-path routing does not balance the load
in the network. A path break results in data loss, and network reconfiguration
takes longer. A node that transmits at maximum power is likely to run out of
battery quickly. Because battery life is a resource that is crucial to the
network's durability, it must be managed wisely to extend the lifetime of the
network. Therefore, the HSR protocol is proposed to alleviate the aforementioned
issues in a MANET used for effective post-disaster communication. Figure 1
illustrates the proposed disaster relief communication model. The model illustrates
how post-disaster communication may take place in a real-world context.
The route request packet (Rq) is employed to find the path to the destination
when the source node is not able to find a path in its route cache. The route
discovery process is required in order to transmit packets across the network.
As the packet moves from the source to the destination, each intermediate mobile
node adds its own IP address to the list of addresses in the route request. As a
result, when the request packet arrives at the destination node, it contains the
whole path from source to destination, a process known as path building. After
receiving the request from the source node, the target node starts a path
discovery process of its own in order to deliver the route response packet back
to the source node.
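The path-building step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `discover_route`, the adjacency dictionary `topology`, and the node labels are hypothetical, and the flooding is modeled as a simple breadth-first search in which every forwarding node appends its own address to the request.

```python
from collections import deque


def discover_route(neighbors, source, destination):
    """Flood a route request and return the first complete path found.

    `neighbors` maps each node address to the addresses it can reach
    directly (a hypothetical stand-in for radio connectivity).
    """
    # Each queue entry is the address list carried by one copy of the request.
    queue = deque([[source]])
    seen = {source}  # nodes that have already forwarded this request
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == destination:
            # The request now holds the whole source-to-destination path.
            return path
        for nxt in neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                # The forwarding node appends its own address to the list.
                queue.append(path + [nxt])
    return None  # every reachable node was visited without success


# Hypothetical five-node topology used only for illustration.
topology = {"S": ["A", "B"], "A": ["C"], "B": ["C"], "C": ["D"], "D": []}
route = discover_route(topology, "S", "D")
# One simple option for the reply is to traverse the recorded path backwards.
reply_route = list(reversed(route))
```

Reversing the recorded path for the reply assumes bidirectional links; where links are unidirectional, the destination would need a discovery of its own, as the text notes.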
1. Begin
2. if Rq is received from a legitimate node, then do
3. RSA technique is utilized to decrypt the content of the cipher at the
receiver node;
4. Create a UPD packet using the QUE message as a foundation;
5. Encrypt the UPD packet using the RSA technique;
6. Send UPD to the source node;
7. end if
8. if a malevolent node receives a QUE packet, then do
9. Construct a UPD message and send it to the source;
10. Decrypt the obtained UPD using the RSA algorithm;
11. UPD will be successfully decrypted and the hash code (signa-
ture) will be identical if UPD is a trustworthy node;
12. To show that UPD has originated from a legitimate node, set a flag to 0;
13. else
14. Set the flag to 1 to show that the UPD originated from a malicious node;
15. end if
16. Calculate the trust value of the link through Eq. (1);
17. Accomplish the link from source to sink;
18. Choose the accurate path according to the TS value;
19. Exclude the malicious node from the transmission path;
20. Mitigate the links with low TS value;
21. if a node becomes attacked during the packet communication, then do
22. Repeat steps from 4 to 7;
23. else
24. Secured connection is created;
25. end if
26. end
The secure routing phase will construct the secured route from source to destina-
tion. Algorithm 1 depicts the detailed description of the proposed HSR algorithm.
Primarily, the proposed HSR protocol utilizes the RSA to encrypt the query (QUE)
packet. The SHA-512/256 method is implemented to provide a signature for the QUE
packet. Afterward, it sends a QUE packet to nearby nodes which transfers the hashes
and the ciphertext from the source to the target node. If a malevolent node receives a
QUE packet, then it constructs a Unified Path Discovery (UPD) message and sends
it to the source. The source node then decrypts the obtained UPD using the RSA
algorithm. If the UPD comes from a trustworthy node, it is decrypted
successfully and the hash code (signature) is identical. A flag is set to 0 to
show that the UPD originated from a legitimate node, whereas the flag is set to
1 to indicate that the UPD originated from a malicious node. The trust (TS)
value of the link can be evaluated as,
$$TS = \frac{T_c}{T_t}, \tag{1}$$
where $T_c$ indicates the correct transmissions and $T_t$ implies the total transmissions.
The link from source to sink is then established, and the path is chosen
according to the TS value. At the same time, malicious nodes are excluded from
the transmission path, which mitigates the links with low TS values. If a node
is attacked during packet communication, the transmission steps are repeated;
otherwise, the secured connection is created from the source to the destination.
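To make Eq. (1) and the path choice concrete, the following minimal Python sketch computes the TS value of each link and picks a path. Several details are assumptions that the text does not specify: a path is scored by its weakest link, links below a hypothetical `threshold` of 0.5 are rejected, and the names `LinkStats`, `trust_score`, `path_trust`, and `select_path` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class LinkStats:
    correct: int  # T_c: correct transmissions observed on the link
    total: int    # T_t: total transmissions attempted on the link


def trust_score(stats):
    """TS = T_c / T_t as in Eq. (1); an unused link gets a score of 0."""
    if stats.total == 0:
        return 0.0
    return stats.correct / stats.total


def path_trust(path, links):
    """Score a path by its weakest link (an assumption, not from the text)."""
    return min(trust_score(links[(a, b)]) for a, b in zip(path, path[1:]))


def select_path(paths, links, threshold=0.5):
    """Return the most trusted path, ignoring paths with a low-TS link."""
    scored = [(path_trust(p, links), p) for p in paths]
    trusted = [item for item in scored if item[0] >= threshold]
    if not trusted:
        return None
    return max(trusted, key=lambda item: item[0])[1]


# Hypothetical link statistics for two candidate paths from S to D.
links = {
    ("S", "A"): LinkStats(9, 10),
    ("A", "D"): LinkStats(8, 10),
    ("S", "B"): LinkStats(10, 10),
    ("B", "D"): LinkStats(2, 10),  # behaves like a packet-dropping link
}
candidates = [["S", "A", "D"], ["S", "B", "D"]]
chosen = select_path(candidates, links)
```

With these made-up counts, the path through B is rejected because its second link has TS = 0.2, even though its first link is perfect.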
The proposed HSR protocol does not include the AODV protocol or proactive routing
techniques. The responsibility of the route maintenance phase is to maintain the
secure routing protocol among multiple deployed nodes within the network. This
path is discovered by the MAC layer or software acknowledgment which is exclusive
to HSR. A source route reply packet is used to notify the source node of the specific
route path and restart the route discovery mechanism when a connection between
two nodes is lost. Since HSR is built on the idea of multiple paths, when a
source receives a packet containing a route error, it can immediately use an
alternative route that is stored in the source route cache. This reduces the
routing overhead within the network. According to the concept of datagram
pick-up, if any intermediate node between the source and the destination detects
a broken next-hop link and has an alternative route to the destination in its
route cache, it can immediately use that alternative route to forward the packet
to the destination.
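The cache-based fallback described in this paragraph can be illustrated with a small Python sketch. It is a hedged illustration rather than the HSR implementation: the class `RouteCache`, its method names, and the rule of preferring the shortest remaining route are all assumptions made for the example.

```python
class RouteCache:
    """Keeps several known routes per destination (illustrative only)."""

    def __init__(self):
        self._routes = {}  # destination -> list of routes (lists of addresses)

    def add(self, route):
        """Store a route; its last element is taken as the destination."""
        self._routes.setdefault(route[-1], []).append(list(route))

    def on_link_break(self, a, b):
        """Drop every cached route that traverses the broken link a -> b."""
        for dest in list(self._routes):
            self._routes[dest] = [
                r for r in self._routes[dest] if (a, b) not in zip(r, r[1:])
            ]

    def best(self, destination):
        """Return the shortest cached route, or None if none is left."""
        routes = self._routes.get(destination, [])
        return min(routes, key=len) if routes else None


cache = RouteCache()
cache.add(["S", "A", "D"])
cache.add(["S", "B", "C", "D"])
first_choice = cache.best("D")
cache.on_link_break("A", "D")   # a route error reports this broken hop
fallback = cache.best("D")      # alternative route, no new discovery needed
cache.on_link_break("C", "D")
exhausted = cache.best("D")     # None: route discovery must be restarted
```

Only when the cache holds no usable route does the node need to fall back to a fresh route discovery, which is where the saving in routing overhead comes from.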
316 G. Sabeena Gnana Selvi et al.
The NS2 platform has been utilized to evaluate the performance of the proposed
as well as the existing routing protocols. NS2 is a widely preferred
discrete-event simulator for networking research. It offers simulation and
analysis support for wired and wireless networks using TCP, UDP, IP, and CBR
traffic patterns. The two main components of NS2 are NS, which stands for
network simulator, and NAM, which stands for network animator. The network
circumstances taken into account for simulation are listed in Table 1. To study how
network size affects protocol performance, the number of nodes in the system is
varied.
Four essential metrics such as Average Energy Utilization (AEU), Throughput
(THR), Packet Delivery Ratio (PDR), and Average End-to-End Delay (AEED) are
applied to analyze the performance of the proposed model. These metrics are assessed
by varying the number of nodes from 50 to 300. The comparative methods are AODV
[26], DSR [27], and OLSR [24].
The AEU analysis of the various routing protocols is shown in Fig. 2. It is
observed that the proposed protocol consumes less energy than the existing
protocols. In detail, the AEU of the proposed protocol is 72, 58, and 45% lower
than that of the AODV, DSR, and OLSR protocols, respectively. These better
results are owing to proper route discovery and route maintenance strategies
during the route formation phase, which let the proposed protocol attain the
optimal path with a lower AEU value. In contrast, the existing protocols fail to
establish the optimal route between the nodes, which leads to higher AEU values
of 0.26, 0.19, and 0.14 J.
Estimated PDR is the proportion of packets delivered by the different CBR sources
that were accepted by the recipients. It also refers to the ratio of the entire quantity of
data packets received by the destination side to the total number of data packets trans-
mitted by the source node. This indicator shows how many data packets effectively
arrive at their intended locations. The PDR comparison for the various protocols is
shown in Fig. 3. It is apparent that the proposed HSR model achieves a superior
PDR value of 95% for a larger network, whereas the AODV, DSR, and OLSR protocols
sustain PDRs of 65%, 78%, and 80%, respectively. The higher PDR of the proposed
HSR model results from proper path optimization: it finds a secure route from
source to destination with lower energy consumption. Thus, an attacker is not
able to break the routing path, which increases the number of packets delivered
at the destination node.
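As a worked illustration of the PDR definition given above, the short Python sketch below computes the ratio from packet identifiers. The function name and the idea of tracking packets by ID sets are assumptions made for the example; the numbers are invented and are not taken from the simulation.

```python
def packet_delivery_ratio(sent_ids, received_ids):
    """PDR in percent: delivered data packets over transmitted data packets."""
    sent = set(sent_ids)
    if not sent:
        raise ValueError("no packets were sent")
    # Count only packets that were actually sent, ignoring duplicates.
    delivered = sent & set(received_ids)
    return 100.0 * len(delivered) / len(sent)


# Hypothetical trace: 20 packets sent, packet 7 lost on the way.
sent_packets = range(1, 21)
received_packets = [p for p in range(1, 21) if p != 7]
pdr = packet_delivery_ratio(sent_packets, received_packets)
```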
AEED is the amount of time it takes a packet to travel along a system from its
beginning to its destination. According to Fig. 4, it is noticed that the proposed HSR
model takes a lesser AEED of 0.3 s than the AODV, DSR, and OLSR protocols.
This is because of quick route formation and query response from the proposed HSR
model. The route formation requires minimal time for packet transmission from
source to destination. At the same time, the utilization of QUE messages enhances
the packet transmission without any delay. This lesser delay allows the proposed
model to maintain lower AEED of 40, 18, and 17% when compared with the AODV,
DSR, and OLSR protocols. The existing models are vulnerable to numerous attacks
where the attacker can easily change the routing path between two nodes. Henceforth,
the packet will be transmitted in a longer route to reach the destination.
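The AEED metric discussed above can be computed from per-packet timestamps, as in the following minimal Python sketch. The function name and the dictionary-based trace format are hypothetical, and averaging over delivered packets only is an assumption, since the text does not say how lost packets are treated.

```python
def average_end_to_end_delay(send_times, recv_times):
    """Mean delay in seconds over packets that reached the destination.

    Both arguments map a packet ID to a timestamp in seconds. Packets that
    were never received are skipped (an assumption made for this sketch).
    """
    delays = [
        recv_times[pid] - send_times[pid]
        for pid in recv_times
        if pid in send_times
    ]
    if not delays:
        return None
    return sum(delays) / len(delays)


# Hypothetical trace: packet 3 is lost, so only two delays are averaged.
send_log = {1: 0.0, 2: 1.0, 3: 2.0}
recv_log = {1: 0.25, 2: 1.5}
aeed = average_end_to_end_delay(send_log, recv_log)
```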
5 Conclusion
References
1. Angueira P, Val I, Montalbán J (2022) A survey of physical layer techniques for secure wireless
communications in industry. IEEE Commun Surv Tutorials 24(2):810–838
2. Prasanth A, Pavalarajan S (2019) Zone-based sink mobility in wireless sensor networks. Sens
Rev 39:874–880
3. Sekar J, Aruchamy P (2022) An efficient clinical support system for heart disease prediction
using TANFIS classifier. Comput Intell 38:610–640
4. Shantha R, Mahender K, Jenifer A (2022) Security analysis of hybrid one time password
generation algorithm for IoT data. AIP Conf Proc 2418:1–10
5. Prasanth A, Jayachitra S (2020) A novel multi-objective optimization strategy for enhancing
quality of service in IoT-enabled WSN applications. Peer-to-Peer Netw Appl 13:1905–1920
6. Bhaskar KB, Aruchamy P, Saranya P (2022) An energy-efficient blockchain approach for
secure communication in IoT-enabled electric vehicles. Int J Commun Syst 35:1–27
7. Kaur G, Kakkar D (2022) Hybrid optimization enabled trust-based secure routing with deep
learning-based attack detection in VANET. Ad Hoc Netw 136:1–22
8. Prasanth A, Ganeshkumar P (2015) Zone based gateway patrolling in wireless sensor networks.
In: Proceedings in IEEE international conference on engineering and technology, pp 1–6
9. Kaur G, Chanak P, Bhattacharya M (2020) Memetic algorithm-based data gathering scheme
for IoT-enabled wireless sensor networks. IEEE Sens J 20(19):11725–11734
10. Prasanth A, Pavalarajan S (2020) Implementation of efficient intra and inter-zone routing for
extending network consistency in wireless sensor networks. J Circ Syst Comput 29:1–19
11. Rezapour S, Farahani R (2020) Impact of timing in post-warning prepositioning decisions
on performance measures of disaster management: a real-life application. Eur J Oper Res
293:312–335
12. Milanez B, Ali S (2021) Mapping industrial disaster recovery: lessons from mining dam failures
in Brazil. Extr Ind Soc 8:1–21
13. Vazhuthi P, Manikandan SP (2022) An energy-efficient auto clustering framework for enlarging
quality of service in internet of things-enabled wireless sensor networks using fuzzy logic
system. In: Concurrency and computation: practice and experience, pp 1–28
14. Prasanth A, Pavalarajan S, Karthihadevi M (2019) Particle swarm optimization algorithm based
zone head selection in wireless sensor networks. Int J Sci Technol Res 8:1594–1597
15. Srividya P, Devi L (2022) An optimal cluster and trusted path for routing formation and classifi-
cation of intrusion using the machine learning classification approach in WSN. Glob Transitions
Proc 3:317–325
16. Jim L, Islam N (2022) Enhanced MANET security using artificial immune system based danger
theory to detect selfish nodes. Comput Sec 113:1–18
17. Sangeetha A, Rajendran T (2022) Supervised vector machine learning with brown boost energy
efficient data delivery in MANET. Sustain Comput Inform Syst 35:1–10
18. Singh S (2022) A cryptographic approach to prevent network incursion for enhancement of
QoS in sustainable smart city using MANET. Sustain Cities Soc 79:1–19
19. Sabeena Gnanaselvi G, Ananthan TV, Eswaran S (2019) Secured packet transfer using HASME
for AODV protocol to detect black hole and gray hole attack. J Adv Res Dyn Control Syst
11(2):168–177
20. Sheena G, Snehalatha N (2021) An energy efficient network slicing with data aggregation
technique for wireless sensor networks. ICICV, 9388536 (IEEE Explore Digital Library), pp
13–18
21. Feng Y, Zhang B, Chai S (2017) An optimized AODV protocol based on clustering for WSNs.
In: Proceedings in 6th international conference on computer science and network technology,
pp 1–6
22. Subha R, Anandakumar H (2022) Adaptive fuzzy logic inspired path longevity factor-based
forecasting model reliable routing in MANETs. Sens Int 3:1–9
23. Satish Kumar G, Rama Devi P (2021) A novel proactive routing strategy to defend node
isolation attack in MANETS. Mater Today Proc 1–10
24. Abid M, Belghith A (2015) SARP: a dynamically readjustable period size proactive routing
protocol for MANETs. J Comput Syst Sci 81:496–515
25. Jagdale BN (2012) Analysis and comparison of distance vector, DSDV and AODV protocol
of MANET. Int J Distrib Parallel Syst 3:1–11
26. Semchedine F, Moussaoui A (2016) CRY OLSR: crypto optimized link state routing for
MANET. In: Proceedings in 5th international conference on multimedia computing and systems
(ICMCS), pp 1–6
27. Brar G, Thakur P (2019) Routing protocols in MANET: an overview. In: Proceedings in 2nd
international conference on intelligent computing, instrumentation and control technologies
(ICICICT), pp 1–6
28. Reddy P, Reddy B (2022) The AODV routing protocol with built-in security to counter blackhole
attack in MANET. Mater Today Proc 50:1152–1158
29. Ramya T, Mathana JM (2022) Exploration on enhanced Quality of Services for MANET
through modified Lumer and Fai-eta algorithm with modified AODV and DSR protocol. Mater
Today Proc 50:1152–1158
Chapter 27
Compact Metamaterial Octagonal
Antenna for Wireless Body Area Network
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 323
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_27
324 G. Siddhant Arun and D. C. Karia
• Antenna-2 uses an FR4 substrate, which is low priced and easily accessible. It
has dimensions of 30 × 26 mm², a dielectric constant (εr) of 4.4, and a loss
tangent (tan δ) of 0.02. The size reduction is obtained using a metamaterial
split ring resonator.
• The bandwidth of antenna-2 is increased by 3 times as compared with antenna-1.
• The antenna's resonant frequency is 2.4 GHz. It can be used for wireless body
area network applications.
2 Stepwise Analysis
In this step, a metamaterial complementary split ring resonator is used in the
bottom and top patch of the antenna. The antenna's size is reduced as compared
with Step 1. The dimensions of Antenna-2 are 30 × 26 mm². Figure 3 shows the
fabricated model of the antenna. The parametric dimensions are shown in Figs. 4
and 5. Figure 6 depicts the antenna's top and bottom view. According to Fig. 7,
the simulated return loss is around −30 dB at 2.46 GHz. At 2.4 GHz, the return
loss measured on the VNA is around −25 dB. The obtained VSWR is about 1.074, as
shown in Fig. 8. Figures 10a and b depict the radiation pattern for the E-plane
and the H-plane, respectively (Fig. 9).
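The relation between the quoted return loss and VSWR values can be checked with the standard transmission-line formulas |Γ| = 10^(S11/20) and VSWR = (1 + |Γ|)/(1 − |Γ|). The Python sketch below is only a sanity check under the assumption that the return loss figures are S11 magnitudes in dB; the function names are illustrative.

```python
def reflection_magnitude(s11_db):
    """Magnitude of the reflection coefficient from S11 in dB (negative)."""
    return 10 ** (s11_db / 20.0)


def vswr(s11_db):
    """Voltage standing wave ratio for a given S11 in dB."""
    gamma = reflection_magnitude(s11_db)
    if gamma >= 1.0:
        raise ValueError("S11 must be below 0 dB for a finite VSWR")
    return (1 + gamma) / (1 - gamma)


vswr_simulated = vswr(-30.0)  # about 1.065 for the simulated -30 dB
vswr_measured = vswr(-25.0)   # about 1.119 for the measured -25 dB
```

For a passive antenna the VSWR is never below 1, and a value near 1.07 corresponds to a return loss of roughly 29 dB, which is in line with the simulated figure.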
3 Simulation Results
Fig. 10 a Antenna-2: E-plane radiation pattern. b Antenna-2: H-plane radiation pattern
Table 3 shows the comparison between the simulated antenna without metamaterial
and with metamaterial SRR.
The antenna is tested on a human body model using a muscle phantom ranging in
thickness from 4 to 10 mm. Figure 11 shows the muscle model placed at the bottom
of the antenna. The simulated change in return loss is shown in Fig. 12. For a
gap of 4 mm, the resonant frequency shifts upward, beyond 2.5 GHz. As the
distance increases, the effect on return loss is reduced and the frequency
shifts below 2.5 GHz.
6 Conclusion
Use of a metamaterial split ring resonator at the top and bottom of the patch
has helped in size reduction of up to 70% and improves the bandwidth of the
antenna. The antenna is also simulated with a muscle model, and the effect on
return loss of varying the distance g is analyzed. There is good agreement
between the measured and simulated findings.
References
1. Zhang K, Soh PJ, Yan S (2020) Meta-wearable antennas-a review of metamaterial based anten-
nas in wireless body area networks. Materials 14(1):149
2. Chaturvedi D, Raghavan S (2019) A compact metamaterial-inspired antenna for WBAN appli-
cation. Wireless Personal Commun 105(4):1449–1460
3. Abbas SM, Esselle KP, Ranga Y (2014) An armband-wearable printed antenna with a full
ground plane for body area networks. In: 2014 IEEE antennas and propagation society inter-
national symposium (APSURSI). IEEE
4. Sabban A (2017) Novel wearable antennas for communication and medical systems. CRC
Press
5. Sarkar SB, Impact of metamaterial in antenna design: a review
6. Bala, Bashir D, et al (2012) Design and analysis of metamaterial antenna using triangular
resonator. In: 2012 Asia Pacific microwave conference proceedings. IEEE
7. Chen ZN, et al. (2014) Metamaterials-based antennas: from concepts to technology. In: 5th
international conference on metamaterials, photonic crystals and plasmonics (META’14)
8. Yılmaz HÖ, Yaman F (2019) Metamaterial antenna designs for a 5.8-GHz Doppler radar. IEEE
Trans Instrum Measur 69(4):1775–1782
9. Rani Rakhi, Kaur Preet, Verma Neha (2015) Metamaterials and their applications in patch
antenna: A. Int J Hybrid Inf Technol 8(11):199–212
10. Ali T, et al (2017) A miniaturized metamaterial slot antenna for wireless applications. AEU-Int
J Electron Commun 82:368–382
Chapter 28
Brain Tumor Detection
and Segmentation Empowered with Deep
Learning
1 Introduction
A brain tumor is a potentially fatal condition that impairs the normal functioning of
the human body. For an appropriate diagnosis and therapeutic planning, the brain
tumor must be recognized in its early stages. Medical image analysis relies heavily
on digital image processing. Brain tumor segmentation entails separating aberrant
brain tissues from normal brain tissues. Several researchers have presented semi-
automated and completely automatic approaches for detecting and segmenting brain
tumors in the past.
Brain tumors rank as the tenth most prevalent form of tumor in India.
Magnetic resonance imaging (MRI) scanning can identify the existence of a tumor.
The problem arises because these MRIs have to be studied by medical
practitioners. This is not only time consuming, but the MRIs also often lack
detail, which makes it difficult to locate the region of spread of the tumor.
Deep learning models have become very efficient at finding and locating hidden
structures as Ranjbarzadeh et al. [1] presented in brain tumor segmentation theory.
Especially lately using computer vision, a lot of image-based difficult tasks have
P. V. Kamat (B)
Department of AI and ML, Symbiosis International (Deemed University), Symbiosis Institute of
Technology, Pune, Maharashtra, India
e-mail: pooja.kamat@sitpune.edu.in
R. Mansharamani · P. Jain · S. Pandey · P. Agarwal · R. Joshi
Department of CSE, Symbiosis International (Deemed University), Symbiosis Institute of
Technology, Pune, Maharashtra, India
S. Patil
Symbiosis Centre for Applied Artificial Intelligence, Symbiosis International (Deemed
University), Pune, Maharashtra, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 331
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_28
332 P. V. Kamat et al.
been automated. In our approach, we have used the conditional GANs which not
only use the great ability of U-Nets to segment the image but also use patch GAN to
make the models to learn the minute detail of mapping from the brain MRIs to the
ground truth.
In this research work, we have tried to leverage the power of deep learning and
artificial intelligence to not only detect whether a tumor exists or not but also segment
the exact regions where the tumor is spread. Similarly, as stated by Arif et al. [2], the
objective of research is to assist medical practitioners quickly and accurately identify
the tumor spread.
After experimenting with several deep learning models, as in Siddique et al.
[3], such as VAE, U-Net, and Pix2Pix, we found that Pix2Pix gave us the best
results. We have utilized 250 × 250 brain MRI images of 110 patients. For
evaluating the accuracy of the model prediction against the ground truth, we
have used two different metrics, MAE (L1 Loss) and SSIM Loss (Structural
Similarity Index), according to Brindha et al. [4].
2 Overview
The deep learning model is supposed to learn a function which can map the relation
between brain MRI images with the ground truth.
The model has to learn to convert the brain MRI info to the segmented image.
Brain tumors are among the most lethal cancers in the world. Glioma, the most
frequent kind of primary brain tumor, is caused by glial cell carcinogenesis in the
spinal cord and brain. So, we will be segmenting the region of tumor spread in the
brain MRIs using deep learning methods. This can be understood better by observing
the images in Fig. 1.
The key objectives of this study are as follows:
1. To detect brain tumors.
2. To segment them using deep learning in order to provide better assistance.
Fig. 1 a Input MRI of brain tumor image. b Ground truth image of tumor to be found using model. c Overlapped image
3 Methodology
Figure 2 gives us a broader overview of the steps which we have followed. It starts
with getting and storing the data in the required format. Then, this converted data goes
through a data preprocessing pipeline which contains steps like Image Normalization
in which we normalize the image pixels. It is then followed by a center crop,
where we crop the region of interest of the image, and then by random rotations
of the image to make the model robust. This data preprocessing step is followed
by model training using the Pix2Pix architecture. After that, we evaluate the
model using metrics such as MAE and SSIM Loss.
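The preprocessing steps listed above (normalization, center crop, random rotation) can be sketched in plain Python on nested lists. This is a simplified illustration, not the authors' pipeline: the helper names are hypothetical, pixels are assumed to be 8-bit values, rotations are restricted to multiples of 90 degrees, and the same rotation is applied to the mask so that image and ground truth stay aligned.

```python
import random


def normalize(image, max_value=255.0):
    """Scale pixel values to the range [0, 1] (8-bit input assumed)."""
    return [[pixel / max_value for pixel in row] for row in image]


def center_crop(image, size):
    """Keep the central size x size region of a 2D image."""
    height, width = len(image), len(image[0])
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def rotate90(image, turns):
    """Rotate clockwise by 90 degrees, `turns` times."""
    for _ in range(turns % 4):
        image = [list(row) for row in zip(*image[::-1])]
    return image


def preprocess(image, mask, crop_size, rng=random):
    """Normalize, crop, and rotate an image together with its mask."""
    turns = rng.randrange(4)  # one random rotation shared by image and mask
    image = rotate90(center_crop(normalize(image), crop_size), turns)
    mask = rotate90(center_crop(mask, crop_size), turns)
    return image, mask


# Tiny 4 x 4 example standing in for a brain MRI slice and its mask.
image = [[(r * 4 + c) * 17 for c in range(4)] for r in range(4)]
mask = [[1 if 1 <= r <= 2 and 1 <= c <= 2 else 0 for c in range(4)] for r in range(4)]
out_image, out_mask = preprocess(image, mask, 2, rng=random.Random(0))
```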
Autoencoders are a form of neural network that learns from a dataset how to encode
unstructured input. Except for the last layer, the initial section is an encoder, which is
similar to a convolution neural network. Yousef et al. [5] proposed that the encoder’s
purpose is to use the dataset to learn effective data encoding and then transmit it
using a bottleneck design.
Variational autoencoder differs from autoencoder in that it gives a statistic for
characterizing the dataset’s samples in latent space. As a result, with a variational
The U-Net architecture, which was initially released in 2015, has caused a revolution
in the field of deep learning as highlighted by Hossain et al. [6].
According to the design (Fig. 3), an input picture is transmitted through the model,
followed by a pair of convolutional layers using the ReLU activation function.
This skip link is an important notion for preserving loss from prior layers so that
it reflects more strongly on the total values. Suggested by Rehman et al. [7], they
have also been scientifically demonstrated to offer superior results and accelerate
model convergence. We have a handful of convolutional layers followed by the last
convolution layer in the final convolution block.
Pix2Pix combines two significant architectures: U-Net for the generator and
PatchGAN for the discriminator. The generator transforms the input image to
create the output image. The discriminator receives both the input (source)
image and a target image and determines whether the target is a plausible
transformation of the input.
In 2015, Ronneberger et al. created U-Net particularly for biomedical picture
segmentation.
The two primary parts of U-Net are as follows:
• A contraction path (left side) using convolutional layers that downsample the
data while extracting information.
• An expansion path (right side) consisting of transposed convolution layers
that upsample the information, according to Saha et al. [10].
On the other hand, instead of discriminating a complete image all at once,
PatchGAN uses smaller patches of N×N size to determine if a generated image
is real or fake.
In addition, Pix2Pix, a paired image translation technique, has an extra
loss that is exclusively meant for the generator, allowing it to generate images
that are more realistic and truer to life. Besides Pix2Pix, as examined by
Navidan et al. [11], there are other comparable GANs, such as CycleGAN, which is
similar to Pix2Pix except for the data: it performs unpaired rather than paired
image translation.
The Generator Architecture. Our generator architecture, as depicted in Fig. 4,
is based on U-Net, with hyperparameters tuned to our case study.
Table 1 Generator architecture model
Hyperparameters    Value
Kernel size        4
Strides            2
Padding            1
Output padding     0
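The effect of the Table 1 hyperparameters on feature-map size can be worked out with the usual output-size formulas for convolution and transposed convolution. The Python sketch below assumes a dilation of 1 and square feature maps, and the function names are illustrative; it is not taken from the authors' code.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial size after a strided convolution (dilation 1 assumed)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel=4, stride=2, padding=1, output_padding=0):
    """Spatial size after a transposed convolution (dilation 1 assumed)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


down_1 = conv_out(250)            # 250 -> 125: the encoder halves the size
up_1 = conv_transpose_out(125)    # 125 -> 250: the decoder doubles it back
down_2 = conv_out(125)            # 125 -> 62: odd sizes are rounded down
up_2 = conv_transpose_out(62)     # 62 -> 124, one pixel short of 125
```

With these settings each encoder block halves the size and each decoder block doubles it. For a 250-pixel input the second level is odd (125), so the decoder returns 124 rather than 125, and a skip connection at that level would need cropping or padding (or an input resized to a power of two) before concatenation.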
The dataset utilized in this investigation includes brain MR images together
with manual FLAIR abnormality segmentation masks. The images were provided by
The Cancer Imaging Archive (TCIA). They correspond to 110 patients with
lower-grade gliomas from The Cancer Genome Atlas (TCGA) collection who had at
least one FLAIR sequence and genomic cluster data available. Both patient
information and tumor genetic classifications are included in the data .csv
file. To make our model more robust, we used a variety of data augmentation
techniques such as gray scaling and rotation. The images used in this model have
a dimension of 250 by 250 pixels, as discussed in Table 2.
5 Result
Apart from the performance graphs given in Fig. 5, we can find the generator and
discriminator performances.
Figure 5 represents the performance of the generator model. The training loss is
shown by the blue line, while the validation loss is shown by the pink line.
Both lines drop with each epoch (the x-axis represents epochs and the y-axis the
loss value), a behavior also mentioned in the tumor segmentation quality
assessment proposed by Hoebel et al. [16].
L1 Loss (MAE). Mean absolute error, often known as L1 Loss as stated in Fig. 6, is
one of the most fundamental loss functions and a straightforward evaluation metric.
According to Zaini et al. [17], it is determined by taking the absolute difference
between anticipated and actual values and averaging them throughout the whole
dataset.
We may use this metric to compare the predicted tumor segmentation with the
ground truth at the pixel level. MAE weights all errors linearly, whereas MSE
squares them and is therefore very susceptible to outliers. For image
enhancement, MAE will most likely provide an image that looks to be of greater
quality to a human viewer, whereas MSE typically produces blurry output.
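The difference between the two losses can be seen on a small example. The following Python sketch computes MAE and MSE over flattened pixel lists; the function names and the four-pixel example are hypothetical and only meant to show how a single large error dominates MSE.

```python
def mae(pred, target):
    """Mean absolute error (L1 loss) over two equally long pixel lists."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse(pred, target):
    """Mean squared error (L2 loss) over two equally long pixel lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Hypothetical flattened mask: one small error (0.5) and one large error (1.0).
predicted = [0.0, 0.5, 1.0, 1.0]
ground_truth = [0.0, 0.0, 1.0, 0.0]
l1 = mae(predicted, ground_truth)
l2 = mse(predicted, ground_truth)
```

Here the large error accounts for two thirds of the MAE but for four fifths of the MSE, which is the outlier sensitivity referred to above.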
SSIM Loss. The Structural Similarity Index (SSIM) is a perceptual metric used to
compare the similarity of two pictures as shown in Fig. 7.
The Structural Similarity Index (SSIM) measure captures three key elements from
an image similarly proposed by Khan et al. [18]:
• Luminance. Averaging the pixel values yields the brightness. It is usually denoted
by (Mu), and the formula is as follows.
• Contrast. The standard deviation (square root of variance) of all pixel values is
used to compute it. The formula below symbolizes and represents it (sigma) as
stated by Kermiv et al. [19].
Table 3 Comparison of different models
Models used    MAE (L1_LOSS)    SSIM loss
UNET           0.016            0.028
Pix2Pix        0.001            0.013
• Structure. To get an output with a unit standard deviation, which enables a more
accurate comparison, we fundamentally divide the input signal by its standard
deviation. With the help of a consolidated formula, the structural comparison is
performed (more on that later).
$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}.$$
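The SSIM expression above can be evaluated directly from the means, variances, and covariance of two images. The Python sketch below computes a single global SSIM value over flattened pixel lists, which is a simplification of the usual sliding-window computation. The constants C1 = (0.01 L)² and C2 = (0.03 L)² are the commonly used defaults rather than values stated in the text, and defining the loss as 1 − SSIM is likewise an assumption.

```python
def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Global SSIM of two equally long pixel lists (no sliding window)."""
    n = len(x)
    mu_x = sum(x) / n                      # luminance of x
    mu_y = sum(y) / n                      # luminance of y
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)   # contrast term (squared)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    c1 = (k1 * data_range) ** 2            # stabilizing constants (assumed)
    c2 = (k2 * data_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def ssim_loss(x, y):
    """Loss form assumed here: 1 - SSIM, so identical images give 0."""
    return 1.0 - ssim_global(x, y)


# Hypothetical flattened images with values in [0, 1].
reference = [0.0, 0.25, 0.5, 1.0]
inverted = [1.0 - v for v in reference]
same_score = ssim_global(reference, reference)
inverted_score = ssim_global(reference, inverted)
```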
In Fig. 8, the left column represents the input MRI images as proposed by Thaha
et al. [20], the central column represents the target CT images, and the right column
represents the generated CT images produced by the model.
6 Conclusion
In this report, we presented and analyzed a few approaches for detecting brain
tumors and segmenting them. This study was conducted using a publicly available
dataset, the LGG segmentation dataset. We compared the Pix2Pix model against the
VAE and UNET models. The UNET model gave an accuracy (1-MAE) of 92%, while the
Pix2Pix model gave an accuracy of 99%. Hence, we found that conditional GANs,
i.e., Pix2Pix, would be the best option for segmenting the brain tumor.
Moreover, the model’s performance could be enhanced by incorporating additional
parameters of the dataset. Making research like this ready to use and accessible to
everyone is difficult, for reasons such as the lack of a sufficient amount of data to
train a custom model and the effort of creating a custom model that stays up to date
with new technologies.
To overcome such problems, the model can be built and deployed in production
behind a website through which medical practitioners can easily get assistance. This
can be further extended to hospitals and medical agencies, helping them to better
assess, detect, and segment brain tumors. A further component could categorize a
tumor as a primary or secondary brain tumor and then classify it into more specific
types.
References
10. Saha A, Zhang YD, Satapathy SC (2021) Brain tumour segmentation with a multi-pathway
ResNet based UNet. J Grid Comput 19:43
11. Navidan H, Moshiri PF, Nabati M et al (2021) Generative adversarial networks (GANS) in
networking: a comprehensive survey and evaluation. Comput Netw
12. Vy NHA, Uyen LTT, Linh HQ (2022) Segmentation of brain tumour using UNET architecture.
In: Van Toi V, Nguyen TH, Long VB, Huong HTT (eds) 8th international conference on the
development of biomedical engineering in Vietnam. BME 2020. IFMBE Proceedings, vol 85.
Springer, Cham
13. Wang S, Dai C, Mo Y, Angelini E, Guo Y, Bai W (2020) Automatic brain tumour segmentation
and biophysics-guided survival prediction. In: Crimi A, Bakas S (eds) Brainlesion: glioma,
multiple sclerosis, stroke and traumatic brain injuries. BrainLes 2019. Lecture notes in computer
science, vol 11993. Springer, Cham
14. Fan C, Lin H, Qiu Y (2022) U-Patch GAN: a medical image fusion method based on GAN. J
Digit Imaging
15. Pereira S, Pinto A, Alves V, Silva CA (2016) Brain tumor segmentation using convolutional
neural networks in MRI images. IEEE Trans Med Imaging 35(5):1240–1251
16. Hoebel K, Andrearczyk V, Beers A, Patel J, Chang K, Depeursinge A, Müller H,
Kalpathy-Cramer J (2020) An exploration of uncertainty information for segmentation
quality assessment. Proc SPIE 11313. Medical Imaging
17. Syed Zaini SZ, Sofia NN, Marzuki M, Abdullah MF, Ahmad KA, Isa IS, Sulaiman SN (2019)
Image quality assessment for image segmentation algorithms: qualitative and quantitative
analyses. In: 2019 9th IEEE international conference on control system, computing and
engineering (ICCSCE)
18. Khan AH, Abbas S, Khan MA, Farooq U, Khan WA, Siddiqui SY, Ahmad A (2022) Intelligent
model for brain tumor identification using deep learning. Appl Comput Intell Soft Comput
2022:8104054
19. Kermi A, Mahmoudi I, Khadir MT (2019) Deep convolutional neural networks using U-Net
for automatic brain tumor segmentation in multimodal MRI volumes. In: Lecture notes in
computer science, pp 37–48
20. Thaha MM, Kumar KPM, Murugan BS, Dhanasekeran S, Vijayakarthick P, Selvi AS (2019)
Brain tumor segmentation using convolutional neural networks in MRI images. J Med Syst
43(9)
Chapter 29
Security of Electronic Voting Systems
Using Blockchain Technology
1 Introduction
The nation, as well as the voters and their trust, depend on the integrity of an electronic
voting system. Governments also expect electronic voting to increase voter trust
and interest in voting. With the deployment of electronic voting systems, two key
objectives can be accomplished, as described by Lahane et al. [1]: first, the expense
of holding a presidential election is greatly reduced, and second, voting locations are
made more secure. Secure electronic voting is an instance of multiparty computation,
in which a group of people reaches a decision while individual inputs are kept hidden
from one another.
A safe and reliable bulletin board is necessary to provide voters with a unified
viewpoint, but it is unclear to the administration whether or not this board (the public
bulletin) can be relied upon. Blockchain is regarded as a reliable option for building
secure message boards that the general public can trust. A safe and decentralized
platform for users is provided by the emerging field of blockchain technology.
Election security is a matter of national security in every democracy. To
reduce the cost of organizing a national election while meeting and strengthening the
security requirements of an election, the feasibility of electronic voting systems has
been studied for a decade in the field of computer security. Voting with pen and paper
has been part of the electoral process ever since elections have been conducted
democratically. Replacing the conventional pen-and-paper method with a new election
technology is essential to reduce fraud and make the voting process traceable and
verifiable. Security experts view electronic voting equipment as flawed, primarily
because of concerns about physical security. Such a device can be sabotaged by anyone who
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 343
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_29
344 R. K. Pandey and R. K. Tiwari
has physical access to it, thereby altering all votes cast through it. A blockchain
is a distributed, irrefutable, immutable public ledger.
This article evaluates the use of blockchain technology to create an electronic
voting system.
Aste et al. and Roehrs et al. [2, 3] define a blockchain as a collection of data
(“blocks”) protected by the widely used SHA-256 algorithm. A blockchain functions
as a list of linked records in which each new block is always appended at the end and
contains the hash of the block before it (Lauf et al. and Khan et al.) [4, 5]. Figure 1
shows the block details of the blockchain-applied electronic voting system.
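The linking of blocks through hashes can be sketched as follows. This is a minimal illustration and not the system evaluated here; the field names `index`, `data` and `previous_hash` and the placeholder payloads are assumptions made for the example.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 digest of a block's contents, serialized deterministically."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, data):
    """Append a new block that stores the hash of the block before it."""
    previous_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "data": data, "previous_hash": previous_hash})


def is_valid(chain):
    """Check that every block still references the hash of its predecessor."""
    return all(
        chain[i]["previous_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = []
for payload in ["vote-A", "vote-B", "vote-C"]:   # placeholder payloads
    append_block(chain, payload)

print(is_valid(chain))        # the untouched chain is consistent
chain[1]["data"] = "vote-X"   # tamper with a block in the middle
print(is_valid(chain))        # the hash stored in the next block no longer matches
```

Because each block stores the hash of its predecessor, changing an earlier block invalidates the stored reference in the block that follows it.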
In a blockchain, each block is stored in a decentralized manner on a peer-to-peer
network with no central authority. Each node uses two keys (public and private),
one for rendering data unreadable and the other for rendering it readable once
more (Aumasson) [6]. Data encrypted with a public key can be decrypted only with
the matching private key. According to Zheng et al. [7], asymmetric cryptography
is what gives blockchain its irreversible and stable character. The characteristics
of blockchain technology are shown in Fig. 2.
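The relationship between the two keys can be shown with a toy RSA example. The tiny primes below are textbook values chosen for readability and provide no security; real systems use vetted cryptographic libraries and much larger keys, and the chapter does not state which asymmetric scheme is used.

```python
# Toy RSA key pair built from tiny textbook primes; NOT secure.
p, q = 61, 53
n = p * q                     # public modulus
phi = (p - 1) * (q - 1)
e = 17                        # public exponent, coprime to phi
d = pow(e, -1, phi)           # private exponent (modular inverse, Python 3.8+)


def encrypt(message, public_key):
    """Render an integer message unreadable with the public key."""
    exponent, modulus = public_key
    return pow(message, exponent, modulus)


def decrypt(ciphertext, private_key):
    """Render the ciphertext readable again with the matching private key."""
    exponent, modulus = private_key
    return pow(ciphertext, exponent, modulus)


public_key, private_key = (e, n), (d, n)
message = 65                  # must be an integer smaller than n
ciphertext = encrypt(message, public_key)
print(ciphertext != message, decrypt(ciphertext, private_key) == message)
```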
It offers a decentralized database that does not require a trusted third party.
Each node in this system keeps the blocks of data locally. Blockchain was initially
developed to offer secure peer-to-peer money transfers, but it is now used
in a variety of other fields, including healthcare, e-voting, and IoT devices, as
described by Mathur et al. [8]. The standard SHA-256 algorithm is illustrated
in Fig. 3.
• The SHA-256 method produces an output of a fixed length (256 bits) from an
input of arbitrary length.
[Figure labels: Immutable, Decentralization]
• No matter how large or small the input is, the output of the SHA-256 algorithm
has a constant length (256 bits).
resistance through a brute force approach is conceivable for bigger datasets, but
the time required makes this strategy useless.
4. Small modifications have a significant influence on the complete output: even a
minor change to the input changes the overall hash substantially.
5. Resistance to collisions: In practice, distinct inputs yield distinct hash values,
because finding two inputs with the same hash is computationally infeasible.
6. Suitable for puzzles: The hash value is determined by the combination of two
values, so an input producing a chosen output is hard to find.
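The fixed output length and the sensitivity to small input changes can be checked with Python's standard `hashlib` module. The input strings are arbitrary placeholders.

```python
import hashlib


def sha256_hex(text):
    """Hexadecimal SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a, hex_b):
    """Number of bit positions in which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


short_digest = sha256_hex("a")
long_digest = sha256_hex("a" * 10_000)
# Each hex character encodes 4 bits, so both digests are 256 bits long.
print(len(short_digest) * 4, len(long_digest) * 4)

# Changing a single character of the input changes many of the output bits.
print(differing_bits(sha256_hex("ballot-1"), sha256_hex("ballot-2")))
```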
1.2 Motivation
A blockchain is a chain of blocks that contain data. Each block holds a hash reference
to the contents of the block preceding it. As a result, any modification made to a
single block by an attacker affects the entire chain that follows, which is what makes
the concept distinctive.
1. The distributed ledger is kept in multiple locations, with no single point of failure.
2. Any proposed “new block” must refer to the prior version of the ledger without
compromising the correctness of earlier entries, which creates the immutable
chain from which the blockchain derives its name.
3. A newly proposed block of entries does not become a permanent part of the ledger
until it is approved by a majority of network nodes.
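Points 2 and 3 can be combined in a small sketch of majority approval. The approval rule used here (a node approves a block only if the block refers to the node's current ledger head) and the five-node network are simplifying assumptions; real consensus protocols involve considerably more than this check.

```python
import hashlib


def digest(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def node_approves(local_head_hash, proposed_block):
    """A node approves a block only if it refers to the node's current ledger head."""
    return proposed_block["previous_hash"] == local_head_hash


def majority_approves(local_heads, proposed_block):
    """Accept the block only if a strict majority of all nodes approve it."""
    approvals = sum(node_approves(head, proposed_block) for head in local_heads)
    return approvals > len(local_heads) / 2


head = digest("block-41")                       # hash of the latest accepted block
stale = digest("block-40")                      # one hypothetical node lags behind
local_heads = [head, head, head, stale, head]   # five hypothetical nodes

honest_block = {"data": "entries", "previous_hash": head}
forged_block = {"data": "entries", "previous_hash": digest("forged history")}
print(majority_approves(local_heads, honest_block))
print(majority_approves(local_heads, forged_block))
```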
This work makes the following contributions:
The first is a review of blockchain frameworks that can already be used to
create smart contracts and electronic voting platforms. The second is a proposal for
a blockchain-applied electronic voting system that enables liquid democracy by
using a “permissioned blockchain” (Chaum et al. [9]).
1.3 Objectives
The voting system here must therefore fulfill the following requirements:
1. The voting process must be openly auditable and transparent.
2. The electoral process must ensure that each voter’s vote was recorded.
3. Only eligible electors may cast ballots.
4. Voting procedures must be tamper-proof.
5. No group seeking power should be able to influence or rig the election.
The most crucial requirements are met by a blockchain:
• Authenticity: Only registered voters are permitted to cast a ballot.
• Anonymity: The system prevents any link from being made between the identity
of the voters and the votes they cast.
• Accuracy: Once cast, votes are permanently recorded and cannot, under any
circumstances, be altered.
• Verifiability: The system can be checked to ensure that all votes
were counted.
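The authenticity requirement, together with protection against double voting, can be illustrated with the sketch below. The class name `BallotBox`, the salted-hash tokens, and the voter identifiers are hypothetical; storing ballots separately from voter tokens only hints at anonymity, and a production system would need stronger cryptographic techniques.

```python
import hashlib
import secrets


class BallotBox:
    """Accepts one ballot per registered voter and stores no plain voter IDs."""

    def __init__(self, registered_voter_ids):
        self._salt = secrets.token_hex(16)
        # Only salted hashes of voter IDs are kept, not the IDs themselves.
        self._eligible = {self._token(v) for v in registered_voter_ids}
        self._already_voted = set()
        self.ballots = []   # recorded choices, kept separate from voter tokens

    def _token(self, voter_id):
        return hashlib.sha256((self._salt + voter_id).encode("utf-8")).hexdigest()

    def cast(self, voter_id, choice):
        token = self._token(voter_id)
        if token not in self._eligible:
            return "rejected: not registered"
        if token in self._already_voted:
            return "rejected: already voted"
        self._already_voted.add(token)
        self.ballots.append(choice)
        return "accepted"


box = BallotBox(["voter-001", "voter-002"])   # hypothetical voter identifiers
print(box.cast("voter-001", "candidate-A"))
print(box.cast("voter-001", "candidate-B"))   # second attempt by the same voter
print(box.cast("voter-999", "candidate-A"))   # not on the register
```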
2 Related Work
The underlying justification for the security model, together with its evaluation
metrics, is presented by Adida et al. [10]. The study also describes the Pretty
Understandable Democracy web voting scheme, which is easier to understand than
Pretty Good Democracy, the only other scheme that currently fits both the adequate
security model and the intended security model.
Scantegrity, described by Chaum et al. [9], has negligible impact on election
operations and is the first independent end-to-end (E2E) verification technique
that keeps optical scanning as the underlying voting mechanism while still allowing
for a recount.
To ensure fairness, Dalia et al. [11] advise adding a commitment round and, if
voters abort, a recovery round that allows the election results to be announced.
They also offer a computational security proof of ballot secrecy.
Bell et al. [12] discuss the STAR-Vote design, which might serve as the preferred
next-generation electoral system for Travis County and possibly other places.
McCorry et al. [13] introduced the Open Vote Network (OVN) on Ethereum, the first
implementation of an online voting system that is transparent, self-tallying, and
self-reporting. The framework constrained the number of voters in OVN to 50–60
electors. The OVN is powerless to halt the systemic corruption caused by dishonest
miners. By sending an invalid ballot, a dishonest voter can also disrupt the voting
process. The election administrator has to be trusted, and the protocol makes no
provision for coercion resistance, as stated by Zhang et al. and Chaieb et al. [14, 15].
Additionally, an external library was needed to complete the task because
Solidity does not natively support elliptic curve cryptography (Woda et al.) [16].
Once the library was included, the resulting voting contracts became too big to be
stored on the blockchain. Because the Bitcoin network has previously suffered
denial-of-service attacks, OVN is susceptible to them as well (Hjálmarsson et al.) [17].
Lai et al. [18] presented DATE, “a decentralized, anonymous, and transparent
e-voting system”, which requires only a minimal degree of trust among participants.
They consider the DATE voting system, as currently set up, suitable for large-scale
electronic elections. However, their proposed methodology lacks a third-party
authority in charge of auditing the vote after the election process, and it is
ineffective at preventing DoS attacks. This approach is only suitable for small-scale
elections due to the constraints of the platform.
Shahzad et al. [19] recommended the BSJC proof of completeness as a reliable
electronic voting process. They used a process model to describe the framework of
the whole system and made a smaller-scale effort to address issues of election
security, privacy, and anonymity. Yet numerous difficulties remain. For instance,
the mathematical puzzle required for proof of work is significant, challenging,
and computationally intensive. Engaging a third party is also problematic because
there is a high possibility of data manipulation, leaks, and unfair outcomes that
could affect end-to-end verification. On a large scale, the generation and sealing of
blocks could prolong the polling procedure.
An anti-quantum electronic voting mechanism based on blockchain and equipped
with an audit function has been proposed by Zheng et al. [20]. The code-based
Niederreiter algorithm was modified to strengthen its resistance against quantum
attacks. The key generation center (KGC) acts as a certification authority for
certificate-less cryptography. In addition to preserving voter anonymity, it
significantly streamlines the auditing procedure. A closer examination of their
approach reveals considerable security and efficiency benefits when the number of
voters is small, i.e., for small-scale elections.
If the number of voters is high, some efficiency may be sacrificed to improve
security, as described by Fernández-Caramés et al. [21].
Yi [22] provided ideas for strengthening the security of electronic voting in
a peer-to-peer network in his description of a blockchain-applied e-voting scheme (BES).
A BES based on distributed ledger technology (DLT) might be used to stop voter
fraud. The system was developed and tested on Linux machines connected to a P2P
network. The main issue with this method is countermeasure attacks. The method
requires the involvement of trusted third parties and is not ideal for
centralized use in a system with several agents. A distributed approach, such
as secure multiparty computation, may resolve the problem. In that case, however,
the cost of computing could become unaffordable if the computed function is complex
and there are too many participants (Torra et al. and Khan et al.)
[23, 24].
One of the most recent and important technical difficulties facing e-voting systems
is secure digital identity management. Before the elections, every prospective
voter should be registered in the voting system. Their information ought to be in a
digitally processable format.
In addition, any information that involves them should keep their identity
private. The following issues arise with existing e-voting systems:
• Voting anonymously: After casting a ballot through the system, which may
or may not include a choice for each candidate, voters should remain
anonymous, even to the system administrators.
• Customized voting procedures: How votes are represented in the relevant
databases or web applications is still an open question. A clear-text message is
the worst possible strategy, whereas a hashed token can provide anonymity and
integrity. At the same time, the vote should be non-repudiable, which a token-based
solution cannot guarantee.
• Voter-verifiable ballot casting: The voter should be able to see and confirm
his or her vote once the ballot has been cast. This is important in order to
prevent, or at the very least to notice, any potential malicious activity.
In addition to providing a means of non-repudiation, this countermeasure can
significantly increase the voters’ sense of trust. Some modern applications
partially address these concerns. Evidence shows that numerous nations,
such as Brazil, the UK, Japan, and the Republic of Estonia, are currently using
electronic voting. The Republic of Estonia should be rated differently from the
others because it offers a complete e-voting system that is considered equivalent
to traditional paper-based elections.
• Expensive initial deployments, especially for businesses: While operating and
maintaining online voting systems is much less expensive than conducting
traditional elections, early deployments can be expensive.
• Growing security issues: Public polls are seriously threatened by cyber-attacks.
If an election were compromised by malicious hacking, nobody would accept
the responsibility.
DDoS attacks are well documented but rarely occur during elections. The election
integrity commission in the United States has received testimony regarding the
state of the country’s elections. Ronald Rivest made it clear that “hackers have a
variety of approaches in which to attack voting machines”. As an illustration, an
attack may exploit the barcodes on ballots and smartphones at specific locations.
Appel explicitly stated that we should not dismiss the fact that computers can be
hacked and that any evidence can easily be erased. Double voting and voters from
other regions are further frequent problems.
By making voting clear and simple to use, preventing voting fraud, boosting data
security, and verifying the results, blockchain technology addresses problems with
the current electoral system. The blockchain must implement the electronic
computerized voting procedure (Xiao et al. [25]). Yet, there are also significant security
concerns with electronic voting, such as the potential for vote fraud and abuse if
[Figure labels: Voter’s ID, Vote, Vote’s Signature, TimeStamp]
4 Conclusion
bulletin board, it is also not clear to the administration whether this board (the
public bulletin) can be trusted. Blockchain is considered a trusted solution for
creating a secure bulletin board that the public can trust. Blockchain is an emerging
technology that provides a secure peer-to-peer platform for users. Therefore,
this paper surveyed the use of blockchain in electronic voting, showing how it can
replace existing electronic voting systems.
References
1. Lahane, A.A., Patel, J., Patha, T., Potdar, P.: Blockchain technology based e-voting system.
ITM Web Conf. 32, 1–8 (2020)
2. Aste T, Tasca P, Di Matteo T (2017) Blockchain technologies: the foreseeable impact on society
and industry. Computer 50:18–28
3. Roehrs A, da Costa CA, da Rosa Righi R, Alex R, Costa CA, Righi RR (2017) OmniPHR: a
distributed architecture model to integrate personal health records. J. Biomed. Inform. 71:70–81
4. Sleiman, M.D., Lauf, A.P., Yampolskiy, R.: Bitcoin message: data insertion on a proof-of-work
cryptocurrency system. In: Proceedings of the 2015 International Conference on Cyberworlds
(CW), Visby, Sweden, 7–9 Oct. 2015, pp. 332–336
5. Khan, M.A., Salah, K.: IoT security: review, blockchain solutions, and open challenges. Future
Gener. Comput. Syst. 82, 395–411 (2018)
6. Aumasson, J.: Serious Cryptography: A Practical Introduction to Modern Encryption. No
Starch Press, San Francisco, CA, USA (2017)
7. Zheng, Z., Xie, S., Dai, H., Chen, X., Wang, H.: An overview of blockchain technology:
architecture, consensus, and future trends. In: Proceedings of the 2017 IEEE International
Congress on Big Data (BigData Congress), Boston, MA, USA, 11–14 Dec. 2017, pp. 557–564
8. Mathur, G., Pandey, A., Goyal, S.: Immutable DNA sequence data transmission for next gener-
ation bioinformatics using blockchain technology. In: 2nd International Conference on Data,
Engineering and Applications (IDEA), Bhopal, India, pp. 1–6 (2020). https://doi.org/10.1109/
IDEA49133.2020.9170715
9. Chaum, D., Essex, A., Carback, R., Clark, J., Popoveniuc, S., Sherman, A., Vora, P.: Scantegrity:
end-to-end voter-verifiable optical-scan voting. IEEE Sec. Privacy 6(3), 40–46 (2008)
10. Adida, B.: Helios: web-based open-audit voting. In: Proceedings of the 17th Conference on
Security Symposium, ser. SS’08. USENIX Association, Berkeley, CA, USA, pp. 335–348 (2008)
11. Dalia, K., Ben, R., Peter, Y.A., Feng, H.: A fair and robust voting system by broadcast. In: 5th
International Conference on E-voting (2012)
12. Bell, S., Benaloh, J., Byrne, M.D., Debeauvoir, D., Eakin, B., Kortum, P., McBurnett, N.,
Pereira, O., Stark, P.B., Wallach, D.S., Fisher, G., Montoya, J., Parker, M., Winn, M.:
Star-vote: a secure, transparent, auditable, and reliable voting system. In: 2013 Electronic
Voting Technology Workshop/Workshop on Trustworthy Elections (EVT/WOTE 13). USENIX
Association, Washington, DC (2013)
13. McCorry, P., Shahandashti, S.F., Hao, F.: A smart contract for boardroom voting with maximum
voter privacy. In: Proceedings of the International Conference on Financial Cryptography and
Data Security, Sliema, Malta, 3–7 Apr. 2017
14. Zhang, S., Wang, L., Xiong, H.: Chaintegrity: blockchain-enabled large-scale e-voting system
with robustness and universal verifiability. Int. J. Inf. Sec. 19, 323–341 (2019)
15. Chaieb, M., Koscina, M., Yousfi, S., Lafourcade, P., Robbana, R.: DABSTERS: distributed
authorities using blind signature to effect robust security in e-voting. Available online https://
hal.archives-ouvertes.fr/hal-02145809/document. Accessed on 28 July 2020
16. Woda, M., Huzaini, Z.: A proposal to use elliptical curves to secure the block in e-voting
system based on blockchain mechanism. In: Proceedings of the International Conference on
Dependability and Complex Systems, Wrocław, Poland, 28 June–2 July 2021
17. Hjálmarsson, F.Þ., Hreiðarsson, G.K., Hamdaqa, M., Hjálmtýsson, G.: Blockchain-based
e-voting system. In: Proceedings of the 2018 IEEE 11th International Conference on Cloud
Computing (CLOUD), San Francisco, CA, USA, 2–7 July 2018
18. Lai, W.J., Hsieh, Y.C., Hsueh, C.W., Wu, J.L.: Date: a decentralized, anonymous, and trans-
parent e-voting system. In: Proceedings of the 2018 1st IEEE International Conference on Hot
Information-Centric Networking (HotICN), Shenzhen, China, 15–17 Aug. 2018
19. Shahzad B, Crowcroft J (2019) Trustworthy electronic voting using adjusted blockchain
technology. IEEE Access 7:24477–24488
20. Gao, S., Zheng, D., Guo, R., Jing, C., Hu, C.: An anti-quantum e-voting protocol in blockchain
with audit function. IEEE Access (2019)
21. Fernández-Caramés, T.M., Fraga-Lamas, P.: Towards post-quantum blockchain: a review on
blockchain cryptography resistant to quantum computing attacks. IEEE Access 8, 21091–21116
(2020)
22. Yi, H.: Securing e-voting based on blockchain in P2P network. EURASIP J. Wirel. Commun.
Netw. 2019, 137 (2019)
23. Torra V (2019) Random dictatorship for privacy-preserving social choice. Int. J. Inf. Sec.
19:537–543
24. Khan KM, Arshad J, Khan MM (2020) Investigating performance constraints for blockchain
based secure e-voting system. Future Gener. Comput. Syst. 105:13–26
25. Xiao, S., Wang, X.A., Wang, W., Wang, H.: Survey on blockchain-based electronic voting.
In: Proceedings of the International Conference on Intelligent Networking and Collaborative
Systems, Oita, Japan, 5–7 Sept. 2019
26. Imperial, M.: The democracy to come? An enquiry into the vision of blockchain-powered
e-voting start-ups
Chapter 30
Go-Kart Simulation in HoloLens
1 Introduction
There is a wide gap between enterprise applications and gaming applications for
HoloLens [1]. People tend to use the HoloLens either for industrial purposes or for fun.
However, many everyday problems can be solved through HoloLens application
development because of the device’s features. Hence, an effort has been made to
develop an application offering a creative enterprise solution that can be useful to
both the manufacturing and the sales departments of the car industry [2, 3].
Our motivation comes from major car companies that are working to deliver
mixed reality experiences to enhance their cars’ features. Here are some quotes from
these companies.
Make Way for Holograms: New Mixed Reality Technology incorporates with Car Design
as Ford Tests Microsoft HoloLens Globally. - FORD [4]
Volvo engineers use Microsoft HoloLens to design cars digitally. Since simulation
plays an important role in car design, the Swedish engineers are the first to use
HoloLens mixed reality to interact with virtual parts. Around 165 million dollars
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 355
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_30
356 K. Paridhi et al.
is being spent on an autonomous vehicle test facility to start Phase-II construction
work [6]. Apart from engineering visualization and remote diagnosis, the HoloLens
could deliver other advantages to a race team, including notable benefits for the
driver. With several head-mounted display technologies emerging, it is essential to
understand what makes the HoloLens ‘Mixed Reality’ approach different [7].
Virtual reality (VR) devices like the Oculus Rift deliver immersive experiences
that replace the real world, which prevents the user from seeing the interior of the
actual car. Furthermore, the Oculus Rift is a fully tethered device, requiring a large
gaming computer and several wires running from it. This creates a fun gaming
environment but has no practical use in the real world for racing cars
or for next-generation simulation and training systems.
HoloLens is a wearable computer that adds further innovations such as AI and
speech engines, gaze, gestures, spatial understanding, spatial audio
and several other sensors [8]. These innovations can safely keep the driver
well informed, competitive and in control, both on the race track and in the
simulation. The HoloLens also offers many advantages over conventional immersive
headset technologies. The most significant is that it can identify what the driver
is looking at, a feature Microsoft HoloLens delivers as ‘Gaze’ [9].
HoloLens uses mixed reality (MR) technology to interact with the real world.
MR blends VR and augmented reality (AR) technologies to create an environment
in which physical and virtual objects can interact in real time [10]. This ability
to interact with both physical and digital objects gives MR an immense number of
potential applications. HoloLens could become common in schools, colleges and
hospitals and be used in a variety of other professions. MR will also be seen in
retail sectors such as e-commerce and fashion.
Holographic technologies are also being used in the education and healthcare
industries to enhance students’ ability to learn and to make learning interactive [11].
The following are simple ways in which MR can help in the classroom.
i. Objects interact with the environment in an immersive experience.
ii. Students can touch and manipulate 3D objects in a real-world environment.
iii. It is an interactive and fun way of learning.
iv. MR can also be used to teach different subjects to specially abled students.
Many fields, including civil engineering, mechanical engineering and architecture,
have been using MR to design things like buildings and cars as digital prototypes of
real-world objects. Companies have invested in cave automatic virtual environment
(CAVE) technology, in which development teams can view objects projected on the
floor and modify designs at the same time by reshaping, removing or adding elements,
saving money on physical models and speeding up design. This is also helpful for
remote work: engineers can view the objects remotely through an immersive
headset to connect, interact, identify problems and collaborate with workers on
site in real time. With MR technology, engineers in other disciplines will also work
differently as the tools upend the design process.
In this paper, a conventional clay car model is transformed into digital objects
embedded in the real world [12]. To embed fully functional digital objects into the
real world, the real-world environment must be made to work with mixed reality
technologies. We have also implemented an ML-based self-driving car
model in the Go-Kart system to obtain an automated car. The result is
a highly interactive, less time-consuming system that needs only a one-time
investment, because people no longer have to rely on a conventional clay model to
show the specifications; both producers and consumers can modify the system and
the cars and communicate about them at any time. Not only do they get an interactive
3D system in the real world, they can also see the automated version of a car on a
track, driven by a self-driving deep learning model [13]. The application was
examined for its capability to:
i. Provide consumers with suitable car features and information based on what they
look at and how they interact.
ii. Provide cars with suitable real-time information in response to the gestures
and speech commands given through MR technology and the self-driving car model.
iii. Track the position of the car and give the best projection.
iv. Provide drivers with relevant car-related information, e.g. speed and steering
angle, through the built-in object detection system of the self-driving model.
v. Allow the car to be configured by the person through voice, gaze and other
HoloLens interaction facilities.
The rest of the paper is organized as follows: Sect. 2 briefly reviews research
in the field of Go-Kart simulation. The proposed methodology is explained in
Sect. 3. Section 4 presents the simulated images and observations. Section 5
concludes the paper with remarks.
2 Literature Review
Very little literature related to this work is available. Moreover, few data sources
are available for such systems, because most people treat them as a fun gaming
environment. No dataset is directly available for developing a self-driving car
model. Hence, we first had to build a dataset and then train the
model for Go-Kart simulation. Tasks such as object detection and classification
are burdensome on such a dataset.
The popular convolutional neural networks (CNNs) can be used for object
detection and tracking. A CNN contains many interconnections and complicated
mathematical computations, so real-time processing requires plenty of processing
power and computation time [14]. The precision achieved on the image dataset
depends directly on the computation time; for the model to run in real time, some
accuracy must be traded for shorter computation time. A categorized
dataset cannot be reused across several detection approaches because each needs
distinctive preprocessing and clustering functions. Much research remains to be
done on mixed reality applications, especially in the car manufacturing industry
[15]. Designing and creating a new dataset is difficult and
tedious. In this work, we made an effort to use the mixed reality HoloLens concept
for Go-Kart simulation.
3 Proposed Methodology
We developed a Go-Kart simulation system for HoloLens as a mixed reality
application in which the user can see a detailed version of a chosen car, a chosen
race track and a simulated scene of an automated driving car. These features
were built through the following three modules.
i. Developing the deep learning self-driving car model.
ii. Developing the mixed reality application.
iii. Configuring the CNN model for the mixed reality application car in the racetrack scene.
Developing the application required a race track, which we created, and a car model,
which we took from the standard assets provided by Unity and imported into Unity.
We then developed elements such as buttons, panels and scenes and attached the
required functions and features to them. The code was written using Visual Studio
(VS) 2019. Later, we deployed the application to the remote machine (HoloLens).
The system consists of a mixed reality application, which is a platform for viewing
the 3D car model, the 3D track and their specifications; a car simulation with the
self-driving deep learning model implemented in HoloLens; and a hardware prototype
with all the software and hardware specifications needed to build such a model.
In this paper, we first implemented lane line detection to give the car its
direction and then implemented number detection and traffic signal detection.
After configuring the code of these algorithms, we recorded driving images through
the left, centre and right cameras. Nearly 13,000 images were collected to form
the dataset. The collected images were pre-processed for training with techniques
such as zooming (focusing only on the track), augmentation and panning [16].
Following the behavioural cloning approach (Nvidia model architecture), a neural
network model was trained on the pre-processed images. In this case, we
implemented a CNN model trained with backpropagation to minimize the error
function of the chosen task. The training model architecture is explained below.
Training model architecture After pre-processing all the data, we started designing
our model architecture to train on it. The size of the datasets was a challenge:
the traffic sign detection set contained about 35,000 images of size 32 × 32, and
the driving set contained 13,000 images of size 200 × 66 (width × height) taken
from the centre, left and right cameras. A suitable model for behavioural cloning
on such data is the Nvidia model.
The Nvidia model is an end-to-end learning architecture for self-driving cars that
has been deployed on real-life self-driving cars. The architecture begins with an
input plane of 66 × 200 (height × width) YUV images, which are pre-processed in
code and then normalized within the architecture. This data is then passed to the
convolutional layers, which we added layer by layer after importing the Conv2D
layer class.
The first layer consists of 24 filters with a kernel of size 5 × 5. The kernel
is slid across the image in steps given by the stride length, and each kernel
position is reduced to a single output pixel, which keeps larger images with many
more pixels tractable. A ReLU activation function is applied to each such layer
in our CNN. The next layer is a 2D convolutional layer with 36 filters and a kernel
size of 5 × 5. The remaining convolutional layers are added similarly, keeping
their kernel sizes and the image dimensions in mind. Finally, we combine all the
layers into our training model; the loss is the mean squared error (MSE), which is
minimized with the Adam optimizer.
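The effect of kernel size and stride on the feature-map dimensions can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes no padding and a stride of 2 for both layers (the stride value is not stated in the text and is borrowed from the published Nvidia architecture), and the function names are hypothetical.

```python
def conv_output_size(size, kernel, stride):
    """Length of one output dimension for a convolution without padding."""
    return (size - kernel) // stride + 1


def conv_stack_shapes(height, width, layers):
    """Return (height, width, filters) after each (filters, kernel, stride) layer."""
    shapes = []
    for filters, kernel, stride in layers:
        height = conv_output_size(height, kernel, stride)
        width = conv_output_size(width, kernel, stride)
        shapes.append((height, width, filters))
    return shapes


# First two layers from the text: 24 and 36 filters, both with 5 x 5 kernels.
# The stride of 2 is an assumption, not a value given in the paper.
LAYERS = [(24, 5, 2), (36, 5, 2)]
shapes = conv_stack_shapes(66, 200, LAYERS)
print(shapes)  # [(31, 98, 24), (14, 47, 36)]
```

Under these assumptions the 66 × 200 input shrinks to 31 × 98 after the first layer and to 14 × 47 after the second, which shows how striding reduces the number of positions the later layers have to process.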
A low learning rate was used to help improve accuracy while training this
architecture. To mitigate over-fitting, we also placed dropout layers in between;
these help the model generalize beyond the training data by forcing it to learn
from combinations of different nodes. Finally, we printed the model summary to
inspect all the parameters of the model in detail. We trained for 30 epochs, which
is fairly high but yields an efficiently trained model. The network architecture
diagram is given in Fig. 1.
Configuring deep learning self-driving car model to the car in mixed reality
application racetrack scene. The trained model was found to work well for our car
model. We connected the model, run as Python code from the command prompt, to the
application (the Unity racetrack); once connected, the car is ready to use and
drives in automated simulation. To configure the deep learning model, we built a
client-server setup in which the client side supplies the images, and the deep
learning model trained on images and code runs on the other side. We created a
virtual environment and imported all the libraries and packages required to run
the model; this becomes the server side. The server thus runs the model and, taking
the incoming images together with the corresponding steering angle, throttle and
speed values, drives the car autonomously.
When the application is connected, it lands in the main menu scene, named
HoloMenu, which contains four blocks; each block opens a page with information
about the corresponding menu element, as shown in Fig. 2a.
The first block, named car information, contains information about the car and
its specifications along with a 3D model that the user can interact with.
The model can be rotated and resized so that the designed model can be analysed
and inspected for defects. This is depicted in Fig. 2b–d.
The second block, named track information, contains information about the
track and the assets present in the track scene, along with a 3D model of the
track that can also be manipulated through its bounding box. The track can be
resized and rotated to analyse its details. The pink colour of the car info block
shows that it was pressed earlier, as shown in Fig. 2e–g. The third block
of the application contains the hardware requirements, i.e. the hardware and
software configurations needed to build a real-life model.
The fourth block leads to the main scene, where the automated driving of the car
is simulated using the self-driving deep learning model. Before simulating the car,
we need to start the server that runs the deep learning model. The server takes the
image data and values from the client side and generates the steering angle,
throttle and car speed for every image instance, using the previous data available
at the client side. This is depicted in Fig. 2h, which shows the virtual environment
server-side connection for running the model and obtaining the steering angle,
throttle and speed of the car.
This application can be used by Go-Kart/race car drivers as well as by the car
industry. Go-Kart or race car drivers can analyse the track they race on and deploy
their car into the mixed reality environment. They can see their car's maximum
potential and how the car will drive on the race track model loaded into the
HoloLens application. Since the car uses behavioural cloning, it can be trained
according to the driver's capabilities. Car manufacturers would not need to build
a clay model to inspect a car; instead, they can load the model into the HoloLens
application, analyse the car in the mixed reality environment and make changes
accordingly. The future goal of this work is to run the 3D model of the car in the
real-world environment, so that a virtual world need not be imported to simulate
the automated car, and to give users a feel for the look of their pre-ordered
vehicle.
Fig. 2 Sample screenshots obtained from the proposed Go-Kart simulation model.
(e) Track information page gaze interaction. (f) Track information page after gesture tap.
References
1 Introduction
Both the governmental and private sectors employ video surveillance equipment,
which has far-reaching ramifications in the fight against crime and terrorism.
Understanding human behavior from video is an important branch of computer vision
research that has gained considerable importance recently. Recent advances in
computer vision, the availability of affordable equipment such as video cameras, and
a wide stream of new applications such as person identification and visual
surveillance are all driving interest in human motion analysis, which analyzes the
motion of a human or body part from monocular or multi-view video without
human involvement.
Virtual reality, medical diagnostics, physical performance, human–machine
interaction, and assessment have all been fascinating applications of human body
motion analysis research. Tracking and estimating motion characteristics, studying
the human body structure, and detecting motion activities are the three general
research directions considered when analyzing human body motion. Intelligent vision
analysis is one of the essential technologies in intelligent environments, security
monitoring, and human–computer interaction. It is based on the detection of moving
objects, whose main purpose is to separate moving objects from the rest of the
picture. More sophisticated applications, including target tracking, target
categorization, and target behavior understanding, are built on top of moving
object detection.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_31
366 P. P. Pawar and A. C. Phadke
Frame subtraction, background subtraction, and optical flow are the most widely
used methods for moving object detection today. The frame difference (frame
subtraction) technique detects moving objects by computing the pixel changes
between successive frames of a video sequence and extracting motion regions by
thresholding the difference between pixels of adjacent frames. Although frame
subtraction adapts to scenes with abrupt lighting changes, some relevant pixels
cannot be retrieved, which leaves holes inside moving objects. The optical flow
technique computes the optical flow field of the image and clusters it according to
the optical flow distribution features of the picture. This approach obtains
comprehensive motion information and separates the moving object from the
background better, but it is not suited to real-time applications because of its
high computational cost and poor noise robustness.
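The frame difference idea can be illustrated with a minimal sketch. The frames below are small hypothetical grayscale grids rather than real video data, and the function name and threshold value are assumptions chosen for illustration only.

```python
def frame_difference_mask(prev_frame, curr_frame, threshold):
    """Mark pixels whose absolute intensity change exceeds the threshold."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


# Hypothetical 4 x 4 frames: a bright 2 x 2 object moves one pixel to the right.
prev_frame = [
    [10, 10, 10, 10],
    [10, 200, 200, 10],
    [10, 200, 200, 10],
    [10, 10, 10, 10],
]
curr_frame = [
    [10, 10, 10, 10],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 10, 10],
]

mask = frame_difference_mask(prev_frame, curr_frame, threshold=30)
for row in mask:
    print(row)
```

In this toy case the column where the object overlaps itself in both frames is not marked as moving, which is the kind of hole inside a moving object that the text attributes to frame subtraction.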
2 Survey Details
Despite the significant advances achieved with deep learning in many machine
learning tasks, deep learning approaches are still relatively rare in anomaly
detection. A number of authors have surveyed deep learning algorithms by intended
use, for example fraud detection, cyber intrusion detection, the medical domain,
IoT, and big data anomaly detection. The deep neural network design is chosen based
on the nature of the input data, which is classified as sequential or
non-sequential. Architectures such as CNN, RNN, and LSTM are used for sequential
input, whereas CNN, AE, and their variants are used for non-sequential input. The
availability of labels is another factor in deep learning detection algorithms.
Labels indicate whether a data point is normal or an outlier. Based on label
availability, methods are classified as supervised, semi-supervised, and
unsupervised deep anomaly detection. Depending on the training objective, some
newer techniques have also been employed, namely deep hybrid models and one-class
neural networks. This paper surveys the methods and algorithms used in various
applications.
Supervised Deep Anomaly Detection (DAD) uses labels of normal and abnormal data
samples to train a deep supervised classifier, which can be binary or multi-class.
Despite their better effectiveness, supervised DAD approaches are less common than
semi-supervised or unsupervised methods because labeled training data is scarce.
Furthermore, the performance of a deep supervised classifier used as an anomaly
detector is sub-optimal owing to class imbalance (the total number of positive
class instances is far greater than the total number of negative class instances)
31 A Survey on Different Techniques for Anomaly Detection 367
[6]. The most often used supervised algorithms are decision tree, support vector
machines (SVMs), supervised neural networks, k-nearest neighbors, and Bayesian
networks.
2.1.1. k-NN computes the distances between an unlabeled point and the labeled
input vectors and then assigns the unlabeled point to the majority class among
its k nearest neighbors. Singh and Silakari [6] proposed a two-phase hybrid
feature selection strategy that combines a filter and a wrapper. The filter
phase chooses the features with the largest information gain and sends them to
the wrapper phase, which generates the final feature subset. To classify attacks,
the final feature subset is fed into the k-nearest neighbor classifier. The
usefulness of this approach is demonstrated on the DARPA KDDCUP99 cyberattack
dataset.
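The k-NN decision rule can be sketched in a few lines. This is a minimal illustration, not the pipeline of [6]: the two-dimensional feature vectors, the labels and the function name are hypothetical, and Euclidean distance is assumed as the distance measure.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k):
    """Return the majority label among the k training points nearest to query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical connection records described by two numeric features.
train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
train_labels = ["normal", "normal", "normal", "attack", "attack", "attack"]

print(knn_predict(train_points, train_labels, (2, 2), k=3))      # normal
print(knn_predict(train_points, train_labels, (8.5, 8.5), k=3))  # attack
```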
2.1.2. The Bayesian network approach is commonly used for intrusion detection
in conjunction with statistical schemes. According to Johansen and Lee [7],
a Bayesian network provides a sound mathematical basis for simplifying a
seemingly difficult problem. They suggest that Bayesian network-based
intrusion detection systems distinguish attacks from regular network activity
by comparing metrics from each network traffic sample. Moore and Zuev [8]
employed a supervised Naive Bayes classifier using 248 flow characteristics,
in addition to various TCP header-derived features, to distinguish between
different types of applications. Correlation-based feature selection was used
to identify the stronger features, and it revealed that good classification
requires only a small subset of fewer than 20 characteristics.
2.1.3. Supervised neural network (NN). If correctly designed and implemented, an
NN has the potential to solve many of the difficulties experienced by rule-based
techniques. The most widely used supervised neural networks are the multi-layer
perceptron (MLP) and the radial basis function (RBF) network. Moradi and
Zulkernine [9] and Sammany et al. [11] employed a three-layer MLP (two hidden
layers) not only to distinguish normal from attack connections, but also to
identify the attack type. Jiang et al. [10] proposed a novel method for detecting
misuse and anomalies with a hierarchical RBF network. In the first layer, an RBF
anomaly detector determines whether an event is normal or abnormal. Anomalous
events are then passed through a chain of RBF misuse detectors, each of which
detects a different type of attack. Anomalous events that are not classified by
any misuse detector are recorded in a database. Once enough such events have been
recorded, a C-means clustering technique groups them into distinct categories,
which are then used to train a new misuse RBF detector that is added to the
misuse detector chain. In this way, the method automatically detects and labels
all intrusion events.
2.1.4. A decision tree has nodes, arcs, and leaves as its main components. Lee
et al. [12] constructed decision trees for DoS, R2L, U2R, and scan attacks.
The ID3 method is used as the learning algorithm to create the decision tree
automatically.
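ID3 grows the tree by repeatedly splitting on the attribute with the highest information gain. The sketch below shows only that selection step on a handful of hypothetical connection records; the attribute names, values and labels are invented for illustration and are not taken from [12].

```python
from collections import Counter
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    total = len(labels)
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Attribute that ID3 would choose for the next split."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical records: the "flag" attribute separates the classes perfectly.
rows = [
    {"protocol": "tcp", "flag": "S0"},
    {"protocol": "tcp", "flag": "SF"},
    {"protocol": "udp", "flag": "SF"},
    {"protocol": "tcp", "flag": "S0"},
    {"protocol": "udp", "flag": "SF"},
    {"protocol": "udp", "flag": "S0"},
]
labels = ["attack", "normal", "normal", "attack", "normal", "attack"]

print(best_attribute(rows, labels, ["protocol", "flag"]))  # flag
```

A full ID3 implementation would apply this choice recursively to each resulting subset until the leaves are pure or no attributes remain.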
2.1.5. A support vector machine (SVM) first maps the input vector into a
higher-dimensional feature space and then finds the best separating hyperplane
in that space. Furthermore, the separating hyperplane, which is defined by the
support vectors rather than the entire training sample, is particularly robust
against outliers. The PSO–SVM model proposed by Wang et al. [13] is applied to
intrusion detection, with standard PSO used to select the parameters of the
support vector machine and binary PSO used to obtain the best feature subset
when building the intrusion detection system. Mukkamala et al. [14] created a
model to detect network anomalies by “applying kernel classifiers and classifier
construction approaches to network anomaly detection challenges.” They
investigated the effect of kernel type and parameter values on the accuracy of
intrusion classification performed by an SVM.
Because labels for normal examples are much easier to obtain than labels for
anomalies, semi-supervised DAD approaches have become more popular; they use the
existing labels of a single class (usually the positive class) to separate out
anomalies. Deep autoencoders are commonly used in outlier detection by training
them in a semi-supervised manner on data samples with no abnormalities [7].
Semi-supervised (or one-class classification) DAD approaches presume that all
training cases carry a single class label. Because computer networks are becoming
more complex, network intrusion detection systems (NIDSs) are becoming
increasingly important. Machine learning-based detection systems have received a
lot of interest because of their capacity to detect new attacks [16]. However,
training an efficient model requires a sufficient amount of labeled training
data, which is difficult and costly to gather. It is therefore necessary to
develop models that can learn from unlabeled or partially labeled data [16]. Min
et al. [16] provide SU-IDS, an autoencoder-based system for semi-supervised and
unsupervised network anomaly detection. The methodology improves performance by
supplementing the standard clustering loss with an autoencoder loss. The
experimental findings on the traditional NSL-KDD dataset and the contemporary
CICIDS2017 dataset suggest that the proposed models are superior.
For surveillance applications, videos are the major source of information.
Although video content is frequently available in vast amounts, it typically has
little or no annotation for supervised learning. Kiran et al. [15] review and
categorize state-of-the-art deep learning-based approaches for video anomaly
detection by model type and detection criteria. They also conduct basic research
to better understand the various methodologies and give assessment criteria for
spatiotemporal anomaly detection. Perera and Patel [17] offer a deep
learning-based strategy for one-class transfer learning that uses labeled data
from an unrelated task for feature learning in one-class classification. The
technique works on top of a convolutional neural network (CNN) of choice to
generate descriptive features with low intraclass variation in the feature space
for the given class. Two loss functions, compactness loss and descriptiveness
loss, are presented for this purpose, coupled with a parallel CNN architecture.
Unsupervised anomaly detection methods do not require labeled training data.
Instead, they rely on two fundamental assumptions. First, they assume that most
network connections are normal and that only a tiny amount of traffic is
malicious. Second, they expect malicious traffic to be statistically different
from normal traffic. “According to these two assumptions, data groups of similar
instances that appear frequently are deemed to be regular traffic, whereas
instances that differ significantly from the bulk of the instances are considered
malicious” (Jebur et al. [18]). K-means, self-organizing maps (SOM), C-means, the
Expectation–Maximization meta-algorithm (EM), adaptive resonance theory (ART),
unsupervised niche clustering (UNC), and the one-class support vector machine are
the most often used unsupervised algorithms.
2.3.1. Clustering techniques. Clustering algorithms group observed data into
clusters based on a given similarity or distance metric. There are at least
two ways of detecting anomalies with clustering. In the first technique, the
anomaly detection model is trained on unlabeled data that includes both
normal and attack traffic. In the second technique, the model is trained on
normal data only, and a profile of normal activity is constructed [18]. The
first strategy assumes that anomalous or attack data is a tiny fraction of the
total data. If this assumption holds, cluster sizes can be used to detect
anomalies and attacks: large clusters represent normal data, whereas the
remaining data points, which are outliers, represent attacks.
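The cluster-size rule can be sketched as follows. This is an illustrative toy, not a method from the surveyed papers: it uses a plain K-means loop with a deterministic farthest-point initialisation (a simplification, so the result is reproducible), and the data points, function names and the `min_fraction` threshold are assumptions.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain K-means; returns the cluster index assigned to each point."""
    # Deterministic farthest-point initialisation (a simplification; the
    # classic method picks the initial centres at random).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the centres are stable
        assignment = new_assignment
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # keep the old centre if a cluster becomes empty
                centers[i] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment


def small_cluster_anomalies(points, k, min_fraction):
    """Flag every point that falls in a cluster holding too small a share of the data."""
    assignment = kmeans(points, k)
    sizes = [assignment.count(i) for i in range(k)]
    return [
        p for p, a in zip(points, assignment) if sizes[a] / len(points) < min_fraction
    ]


# Hypothetical traffic features: six normal records and two outlying ones.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8), (10, 10), (10, 11)]
anomalies = small_cluster_anomalies(points, k=2, min_fraction=0.3)
print(anomalies)  # [(10, 10), (10, 11)]
```

The rule only works when the stated assumption holds; if attacks made up a large share of the data, their cluster would no longer be small.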
2.3.1.1. K-means separates the data into k clusters such that data inside the
same cluster is similar, while data in different clusters has low similarity. “The
K-means method first chooses K data at random as the initial cluster center,
then adds the rest of the data to the cluster with the highest similarity based
on its distance to the cluster center, and finally recalculates the cluster center
of each cluster. Repeat this process until no cluster centers change. As a
result, the data is separated into K clusters. Unfortunately, K-means
clustering is susceptible to outliers, and a group of objects closer to a centroid
may be empty, preventing centroids from being updated” (Han [19]). Li [20]
proposes an intrusion detection method based on data mining. First, a method
for removing noise and isolated points from the dataset was developed. An
approach for determining the number of cluster centroids was provided by
splitting and merging clusters and using the density radius
of a super sphere. An anomaly detection model was provided to achieve
2.4.1. Deep hybrid models (DHMs). Wang et al. [35] present a deep hybrid model
for detecting anomalous flights. Deep hybrid models for anomaly detection
employ deep neural networks, primarily autoencoders, as feature extractors;
the features learned in the autoencoder's hidden representations are then fed
into a clustering algorithm, which detects anomalous flights. Without preset
criteria or domain expertise, the model can detect flight irregularities and
related risks. DHMs for intrusion detection likewise employ deep neural networks
as feature extractors, feeding the features learned in the hidden
representations of autoencoders into classic anomaly detection algorithms such
as the one-class SVM (OC-SVM) to detect intrusions (Andrews et al. [36]). Ergen
et al. [37] suggested a hybrid model variant that trains the feature extractor
jointly with the OC-SVM (or SVDD) objective to enhance detection performance.
A key weakness of these hybrid techniques is the lack of a trainable objective
tailored for anomaly detection, so such models are unable to extract rich
differential features for detecting intrusions. Hence, specialized anomaly
detection methods such as deep learning algorithms, deep one-class
classification, and one-class neural networks are implemented.
2.4.2. One-class Neural Network (OC-NN)—Chalapathy et al. [38] methods for
one-class neural network (OC-NN) classification are inspired by kernel-
based one-class classification, that includes the capability of deep neural
dimensions and highly uneven distribution present certain obstacles. To address these
issues, a novel anomaly detection approach based on Gaussian Restricted Boltzmann
Machine (GRBM) is suggested by Zang et al. [49]. The investigation was conducted
using two real-world cases: wine quality and cigarette product testing.
2.5.2. Deep Belief Network. Deep Belief Networks (DBNs) are a type of deep
neural network that consists of multiple layers of Restricted Boltzmann
Machine (RBM) graphical models [18]. The underlying hypothesis is that a DBN,
used as a directed encoder–decoder network trained with backpropagation
(Werbos [42]), cannot capture the characteristic variations of anomalous
samples, which results in a large reconstruction error for them. DBNs have
been found to scale well to massive data and to improve interpretability
(Wulsin et al. [43]).
2.5.3. Generalized denoising autoencoder. The Convolutional Autoencoder
(CAE) is an intriguing candidate for anomaly detection because it captures the
2D structure of image sequences during the learning process. Ribeiro et al. [44]
employ a CAE for outlier identification by using the reconstruction error of
each frame as an anomaly score.
They present a method for combining high-level spatial and temporal features
with the input frames and analyze the resulting impact on CAE performance while
exploring the CAE architecture. A simple measure of video spatial complexity was
developed and correlated with the CAE's classification ability. Guo et al. [45]
offer AEKNN, an unsupervised anomaly detection framework that exploits the
representations learned automatically by deep neural networks to improve
anomaly detection performance. The system combines autoencoder training with
a k-th nearest neighbor outlier detection algorithm. Jia et al. [46] suggested
an intelligent rolling bearing fault diagnosis system based on a stacked
denoising autoencoder. The dimension of the original data was reduced using
Principal Component Analysis, which removed superfluous information. The bearing
data is then used to train three denoising autoencoders (DAEs). The learned DAEs
are then stacked into a stacked denoising autoencoder with three hidden layers
for backward optimization. Finally, the features are fed into a softmax
classifier to detect faults.
2.5.4. Recurrent neural network (RNN)—Nanduri et al. [47] describe the appli-
cation of “Recurrent Neural Networks (RNN) with Long Term Short-Term
Memory (LTSM) and Gated Recurrent Units (GRU) architectures to over-
come the limitations of dimensionality reduction, poor sensitivity to short-
term anomalies, and inability to detect anomalies in latent features in machine
learning algorithms” [47].
2.5.5. Long Short-Term Memory Network. Ergen and Kozat [48] use highly
effective gradient-based and quadratic programming-based training approaches
for training and tuning the parameters of the LSTM architecture and the OC-SVM
(or SVDD) algorithm. To use the gradient-based training approach, they modify
the original objective criteria of the OC-SVM and SVDD algorithms, and the
convergence of the modified criteria to the original criteria is demonstrated
[48]. They obtain anomaly detection methods capable of processing
variable-length data sequences while maintaining excellent performance,
particularly for continuous series of data. The overall structure of this
approach is summarized in Fig. 1.
Elsayed et al. [49] presented a novel method that relies on a Long Short-Term
Memory (LSTM) autoencoder and a one-class support vector machine (OC-SVM) to
identify anomalous attacks in imbalanced data by training the system with instances
from the normal class only.
“The LSTM-autoencoder is trained to learn the typical traffic pattern as well as
the compressed representation of the input data (i.e. latent features), after which it
is fed into an OC-SVM method. The hybrid model solves the drawbacks of the
individual OC-SVM” [49]. Malhotra et al. [50] introduced an encoder–decoder
technique for anomaly detection (EncDec-AD) based on Long Short-Term Memory
networks, which learns to reconstruct “normal” time-series behavior and then uses
the reconstruction error to detect anomalies. They evaluate it on three publicly
available time-series datasets (power demand, space shuttle, and ECG) and two
real-world engine datasets with predictable and unpredictable behavior. EncDec-AD
is shown to be robust and able to detect anomalies in predictable, unpredictable,
periodic, aperiodic, and quasi-periodic time series. It can detect anomalies in
both short and long time series (lengths as short as 30 and as long as 500) [50].
LSTM networks work well for classification, processing, and prediction tasks
that rely on time-series data.
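Scoring by reconstruction error can be illustrated independently of the network that produces the reconstructions. In the sketch below the reconstructed windows are hypothetical stand-ins for the output of a trained encoder–decoder, and a fixed threshold on the mean squared error replaces the scoring scheme of the cited works, so it should be read as a simplified illustration only.

```python
def reconstruction_error(original, reconstructed):
    """Mean squared error between a window and its reconstruction."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def flag_anomalies(windows, reconstructions, threshold):
    """Return per-window anomaly flags and the underlying error scores."""
    errors = [reconstruction_error(w, r) for w, r in zip(windows, reconstructions)]
    return [e > threshold for e in errors], errors


# Hypothetical time-series windows and the reconstructions a model trained on
# normal data might produce: the second window contains a spike that the
# model fails to reproduce.
windows = [[1.0, 2.0, 3.0], [1.0, 2.0, 9.0]]
reconstructions = [[1.1, 1.9, 3.0], [1.0, 2.0, 3.0]]

flags, errors = flag_anomalies(windows, reconstructions, threshold=1.0)
print(flags)  # [False, True]
```

In practice the threshold would be chosen on held-out data rather than fixed by hand.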
2.6.1. Suspicious activity detection network for video surveillance using machine
learning—Shivtare et al. [1] proposed employing neural networks to detect
suspicious human activity in real-time CCTV data. It is extremely difficult
It has been observed that real-time data is difficult to obtain and that accessing
data from a running system is a lengthy process. There is a considerable gap in
developing techniques to access data logs, as well as in building a system and
validating it in real-time situations. Machine learning algorithms are being
developed to cope with high-dimensional data and to detect abnormal system
behavior. Deep learning, a subset of machine learning, shows considerable success
in many domains (such as computer vision and audio processing) in producing more
accurate results on challenging problems. There is a need to apply novel models and
analyze their capability in the anomaly detection field, particularly for
intelligent transportation, industrial, and smart object-based systems.
The lack of real-time data makes it difficult to build such systems; hence, large
balanced datasets are needed to build models and validate them in real-time systems.
When analyzing data, most of the data observed corresponds to normal behavior, so
finding an abnormality requires training the system on a large amount of data, and
more robust systems need to be developed to achieve maximum accuracy and deal with
complex real-time scenarios. The majority of recent research has focused on the
detection of anomalies. Anomaly prediction and prevention remain areas of study
that need to be explored, as predicting anomalies ahead of time can be very
helpful. New ways of proactively preventing system failures and performing root
cause analysis must be found and pursued.
New methodologies and techniques have emerged for processing the different data
streams provided by IoT devices, healthcare systems, intelligent environments, and
complex industrial systems.
Conflict of Interest The authors declare that there is no conflict of interest in this paper.
References
1. Shivthare, K.V., Bhujbal, P.D., Darekar, A.P.: Suspicious activity detection network for video
surveillance using machine learning. Int. J. Adv. Sci. Res. Eng. Trends 6(4) (2021)
2. Sabokrou, M., Fathy, M., Hoseini, M., Klette, R.: Real-time anomaly detection and localization
in crowded scenes. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition
Workshops (CVPRW), pp. 56–62 (2015)
3. Zojaji, Z., Atani, R.E., Monadjemi, A.H.: A survey of credit card fraud detection techniques:
data and technique oriented perspective. arXiv preprint arXiv:1611.06439 (2016)
4. Min, S., Lee, B., Yoon, S.S.: Deep learning in bioinformatics. Briefings Bioinform. 18(5),
851–869 (2017)
5. Sabokrou M, Fayyaz M, Fathy M, Klette R (2017) Deep-cascade: cascading 3D deep neural
networks for fast anomaly detection and localization in crowded scenes. IEEE Trans. Image
Process. 26(4):1992–2004
6. Singh, S., Silakari, S.: An ensemble approach for feature selection of Cyber Attack Dataset.
arXiv preprint arXiv:0912.1014 (2009)
7. Johansen, K., Lee, S.: CS424 network security: Bayesian network intrusion detection (BINDS)
(2003)
8. Moore, A.W., Zuev, D.: Internet traffic classification using Bayesian analysis techniques. In:
Proceedings of the 2005 ACM SIGMETRICS International Conference on Measurement and
Modeling of Computer Systems (2005)
9. Moradi, M., Zulkernine, M.: A neural network based system for intrusion detection and
classification of attacks. In: Proceedings of the IEEE International Conference on Advances
in Intelligent Systems-Theory and Applications. IEEE Luxembourg-Kirchberg, Luxembourg
(2004)
10. Jiang, J., Zhang, C., Kamel, M.: RBF-based real-time hierarchical intrusion detection systems.
In: Proceedings of the International Joint Conference on Neural Networks, vol. 2. IEEE (2003)
31 A Survey on Different Techniques for Anomaly Detection 379
11. Sammany, M., et al.: Artificial neural networks architecture for intrusion detection systems and
classification of attacks. In: The 5th International Conference INFO2007 (2007)
12. Lee, J., Lee, J., Sohn, S., Ryu, J., Chung, T.: Effective value of decision tree with KDD 99 intru-
sion detection datasets for intrusion detection system. In: 2008 10th International Conference
on Advanced Communication Technology, pp. 1170–1175 (2008)
13. Wang, J., et al.: A real-time intrusion detection system based on PSO-SVM. In: Proceed-
ings of 2009 International Workshop on Information Security and Application (IWISA 2009).
Academy Publisher (2009)
14. Mukkamala, S., Sung, A.H., Ribeiro, B.M.: Model selection for kernel based intrusion detection
systems. In: Adaptive and Natural Computing Algorithms, pp. 458–461. Springer, Vienna
(2005)
15. Kiran BR, Thomas DM, Parakkal R (2018) An overview of deep learning based methods for
unsupervised and semi-supervised anomaly detection in videos. J. Imaging 4:36
16. Min, E., et al.: Su-ids: a semi-supervised and unsupervised framework for network intrusion
detection. In: International Conference on Cloud Computing and Security. Springer, Cham
(2018)
17. Perera P, Patel VM (2019) Learning deep features for one-class classification. IEEE Trans.
Image Process. 28(11):5450–5463
18. Omar, S., Ngadi, A., Jebur, H.H.: Machine learning techniques for anomaly detection: an
overview. Int. J. Comput. Appl. 79(2) (2013)
19. Han, J., Kamber, M.: Data Mining: Concept and Techniques, 1st ed. Morgan Kaufmann
Publishers (2001)
20. Li, H.: Research and implementation of an anomaly detection model based on clustering
analysis. In: International Symposium on Intelligent Information Processing and Trusted
Computing (2010)
21. Qu X, Yang L, Guo K et al (2021) A survey on the development of self-organizing maps for
unsupervised intrusion detection. Mob. Netw. Appl. 26:808–829
22. Lotfi Shahreza, M., Moazzami, D., Moshiri, B., Delavar, M.R.: Anomaly detection using a
self-organizing map and particle swarm optimization, Scientia Iranica 18(6) (2011)
23. Amini, M., Jalili, R.: Network-based intrusion detection using unsupervised adaptive resonance
theory (ART). In: Proceedings of the 4th Conference on Engineering of Intelligent Systems
(EIS 2004), Madeira, Portugal (2004)
24. Leon, E., Nasraoui, O., Gomez, J.: Anomaly detection based on unsupervised niche clustering
with application to network intrusion detection. In: Proceedings of the 2004 Congress on
Evolutionary Computation (IEEE Cat. No. 04TH8753), vol. 1. IEEE (2004)
25. Dunn, J.C.: A fuzzy relative of the ISODATA process and its use in detecting compact well-
separated clusters. 32–57 (1973)
26. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Springer
Science & Business Media (2013)
27. Mabu, S., et al.: An intrusion-detection model based on fuzzy class-association-rule mining
using genetic network programming. IEEE Trans. Syst. Man Cybern. Part C (Applications and
Reviews) 41(1), 130–139 (2010)
28. Shang, W., Cui, J., Song, C., Zhao, J., Zeng, P.: Research on industrial control anomaly detection
based on FCM and SVM. In: 2018 17th IEEE International Conference on Trust, Security and
Privacy in Computing and Communications/12th IEEE International Conference on Big Data
Science and Engineering (Trust-Com/BigDataSE), pp. 218–222 (2018)
29. Chen, R., Zhang, F., Xi, L.: Anomaly detection algorithm based on FCM with improved Krill
Herd. J. Phys. Conf. Ser. 1187(4) (2019). IOP Publishing
30. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the
EM algorithm. J. Roy. Stat. Soc. Ser. B (Methodol.) 39(1):1–22
31. Zong, B., et al.: Deep autoencoding Gaussian mixture model for unsupervised anomaly
detection. In: International Conference on Learning Representations (2018)
32. Li, K.-L., Huang, H.-K., Tian, S.-F., Xu, W.: Improving one-class SVM for anomaly detection.
In: Proceedings of the 2003 International Conference on Machine Learning and Cybernetics
380 P. P. Pawar and A. C. Phadke
1 Introduction
The volume of mobile data traffic throughout the globe has skyrocketed in recent
years. According to estimates from the International Telecommunication Union
(ITU), monthly global mobile data traffic will grow from its present level to 607
exabytes (EB) by 2025, and then to 5016 EB by 2030 [1]. The corresponding traffic
per mobile subscription is estimated at around 39 GB per month in 2025 and about
257 GB per month by 2030. Projections show that by 2025, more than 70% of the
world’s population will subscribe to a mobile service, and more than half of these
subscribers are also likely to have access to the Internet through mobile devices.
This vast data flow calls for a variety of improved services, including full coverage
and ultra-reliable, low-latency wireless communications with a focus on throughput
rather than protocol overhead. Personal computers, portable media players, tablets,
smart phones, sensors, and the Internet itself have all played a role in the
exponential growth of data traffic. The term “Internet of Everything” (IoE) refers to
the interconnectivity and interoperability of all devices, systems, and applications
that may be linked to the web. These devices are data-driven (especially in terms of
video) and have a low call volume. The numbers of Internet and mobile users, as well
as of M2M and connected devices, are growing exponentially. Projections of the number
of Internet users worldwide and of the total number of connected devices, in billions,
are available for each year from 2018 to 2023 [2]; according to these projections,
around twice as many M2M and connected devices will be in use by 2023. It’s worth
noting that 13.5 billion gadgets are expected to be connected in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 381
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_32
382 K. H. Gudadhe et al.
APAC countries by 2020. These figures highlight the growing significance of wireless
broadband connectivity across a wide range of sectors, from transportation and
health care to infrastructure and even home and military applications.
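As a rough illustration of what the ITU estimates quoted above imply, the short Python sketch below computes the compound annual growth rate between the 2025 and 2030 traffic figures. The function name is our own, and constant year-on-year growth is an assumption made only for this illustration.

```python
def compound_annual_growth_rate(start_value, end_value, years):
    """Return the constant yearly growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# ITU estimates quoted in the text: 607 EB per month (2025) and 5016 EB per month (2030).
cagr = compound_annual_growth_rate(607.0, 5016.0, years=5)
print(f"Implied traffic growth: {cagr:.1%} per year")
```

Under this simplifying assumption, the two estimates correspond to traffic growing by roughly half again every year.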
2 6G Vision
Sixth-generation networks aim to be even more advanced than the current generation
of wireless communication systems in order to better serve the needs of users and
handle enormous amounts of data traffic. Sixth-generation wireless networks aim to
improve data transfer speeds while reducing power consumption, expand broadband
access and coverage, reinforce communication security and trustworthiness, boost
connection dependability, reduce latency, and realize intelligent communication. In
theory, 6G networks might enable data rates in excess of 100 Gbps, assuming an
end-to-end latency of less than 1 millisecond. This is why ensuring the security of
user communications in 6G networks is crucial. It’s possible that if 6G networks are
extensively installed. The goal of 6G networks is to deliver reliable, low-latency
wireless communications. Figure 1 shows the golden era of the 6G network.
Next-generation, high-performance 6G networks must also support very high mobility
to be successful.
Extremely rapid wireless data transfer is expected to become possible with the
integration of massive multiple-input multiple-output (MIMO) technology and
extremely high frequencies in 6G networks [9]. In addition, 6G networks plan to allow
for 4K video streaming and lightning-fast data transfers.
6G is expected to become feasible through cutting-edge communication methods. Using
methods like ultra-large MIMO, new spectrum, holographic radio communications,
full-duplex wireless communications, multiple access, and modulation, it is possible
to achieve very high data rates. For this field to make considerable headway, energy
harvesting and backscatter communication will be essential. Improving connectivity
and worldwide coverage calls for cell-free massive MIMO systems and the integration
of terrestrial and non-terrestrial communications. Both quantum communication and
the blockchain have proven effective in protecting the privacy of digital
currency exchanges. Ultra-reliable and low-latency communication may be facilitated
by integrating holographic teleportation (telepresence) with edge computing. Finally,
AI and ML may prove highly useful for realizing intelligent communication. The
ultimate goal of sixth-generation wireless technology is to allow for simultaneous
operation of all wireless networks. Part of the goal is to make it possible for
wireless networks to reach more remote locations, such as the surface of the ocean or
the upper atmosphere, so that the Internet is accessible from any location on the
planet thanks to the streamlined data-exchange capabilities provided by these
networks. Delay-sensitive applications will
32 A Scholastic Comprehensive Study… 385
The probable future of the 6G network has been the subject of several studies, such
as [10]. The results of an investigation into the available approaches are described
in [11]. In [12], the authors investigate how quantum communication and
machine learning may be used to improve future 6G networks. Additional evidence
that AI will play a vital role in the architecture of future 6G networks is provided
by the research presented in [13]. Satellite and terrestrial networks for data
transfer are compared and contrasted in [14, 15]. The use of random access methods
in the Internet of Things is investigated in [16]. For more on how 6G networks
and blockchain technologies could combine to provide intelligent healthcare
solutions, see [16]. The paper [17] investigates the potential of employing mmWave
frequencies in upcoming 6G networks for satellite communications. The research in
[18] demonstrates the importance of confidentiality, privacy, and safety in the future
generation of 6G networks. Previous research neglected to take into consideration
the superior capabilities and characteristics of 6G wireless networks. Because of
this, prior surveys have not done a good job of establishing which technologies are
necessary to satisfy specific long-term 6G ambitions. This article includes a
comprehensive overview of the current status of the subject as well as an in-depth
examination of the technologies that will form the foundation of future 6G networks.
These technologies are investigated because of their possible relevance to the design
of future 6G networks. The study extends beyond previous surveys by investigating
all technologies that bear even a passing resemblance to the foundational
technologies required to achieve the minimum performance standards. For each
technology, this review explains how it operates, lists its key advantages, discusses
its expected applications, presents the current state-of-the-art research, and
highlights the research problems it faces. Many emerging technologies, including
holographic teleportation (telepresence), multi-sensory extended reality, and the
Internet of Smart Things, are discussed in this study as potential applications of 6G
networks. The results of this study might be useful for both business leaders and
academic researchers. The authors of this review article also provide some
recommendations for further investigation.
This article describes the essential enabling technologies that will be required to
satisfy the needs of future 6G networks and concentrates on the important performance
features and criteria for such networks. For each technology, we describe its core
operating concept, potential uses, current status of research, and technical challenges.
The data rate is widely regarded as the most crucial indicator of performance in
mobile communications. To increase the data rate of future 6G networks, the following
sections describe the major basic technologies that will be deployed.
When the transmitting base station uses a large number of antenna elements, both the
array gain and the number of achievable degrees of freedom increase. Simple signal
processing techniques may be used in both the uplink (UL) and the downlink (DL) of
massive MIMO. Using linear precoding techniques in the DL and linear combining methods
in the UL, a transmission may be directed toward particular receivers, and the
transmissions from several users can be combined.
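The growth of the array gain with the number of antennas can be illustrated with a minimal Python sketch of maximum-ratio transmission (MRT), one common linear precoder. The unit-magnitude, random-phase channel model and the function name are assumptions chosen for illustration; they are not taken from the works surveyed here.

```python
import cmath
import math
import random


def mrt_array_gain(num_antennas, seed=0):
    """Power gain of an MRT precoder serving one single-antenna user."""
    rng = random.Random(seed)
    # Assumed channel: one unit-magnitude coefficient with a random phase per antenna.
    h = [cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi)) for _ in range(num_antennas)]
    norm = math.sqrt(sum(abs(c) ** 2 for c in h))
    # MRT precoder: conjugate of the channel, scaled to unit total transmit power.
    w = [c.conjugate() / norm for c in h]
    effective_channel = sum(hc * wc for hc, wc in zip(h, w))
    return abs(effective_channel) ** 2


for m in (1, 8, 64):
    print(m, round(mrt_array_gain(m), 6))
```

Under this idealized model the received power gain equals the number of antennas, which is the array gain referred to above; practical gains depend on channel knowledge and hardware impairments.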
As a result of its transition to higher-frequency bands, 6G faces significant hurdles,
including smaller coverage areas due to shorter-range communications, fewer physical
channel degrees of freedom owing to fewer scattering objects, and greater signal
attenuation, which impacts the dependability of transmissions between the transmitter
and receiver. Since the proliferation of Internet-connected home appliances
and sensors, there has been a push toward the deployment of software-defined networks
(SDNs), that is, wireless networking technologies that are implemented entirely in
software. So, programmable
software allows for the remote management of wireless networks. Extending the cov-
IRSs could use less energy than other wireless communication methods. The reason
for this is that IRSs may operate very well even without the use of advanced
techniques such as interference control methods, complicated signal processing, or
power amplifiers with RF chains. Low production costs for IRSs have made mass
production possible. IRSs can be installed indoors, on walls and ceilings or in
exhibition halls, and outdoors, on irregularly shaped surfaces such as buildings,
roads, walls, shopping malls, and airports. In places with weak multipath propagation,
a widespread deployment like this has the ability to bring the network closer to more
consumers. IRSs may be useful in communications systems that operate in the millimeter
wave (mmWave) or terahertz (THz) frequency ranges. This is because it is generally
accepted that signals at higher frequencies are more susceptible to distortion caused
by transmission fluctuations. IRSs may expand wireless communications’ channel options
beyond what is currently achievable using the line-of-sight (LoS) approach. Several
studies have examined the potential of IRSs in smart radio communications. In systems
based on simultaneous wireless information and power transfer (SWIPT), for instance,
IRSs are considered as a way to compensate for the attenuation of the propagating
signal, allowing for appropriate energy harvesting at the receivers. Evidence
for this may be found in several scientific investigations. Based on the findings
presented there, it is suggested that IRSs be used in mobile edge
computing to increase communication reliability and decrease offloading wait times.
Mobile edge computing is a new paradigm in edge computing that makes it possible
to run computation-intensive Internet of Things applications on mobile devices.
Additionally, in Section VI-E, we explore mobile edge computing in greater depth.
Other authors investigate the potential of installing IRSs at the cell’s edge in
multi-cellular networks to boost the signal of the serving BS and mitigate interference
from surrounding cells. There has been a lot of research toward integrating IRSs into
cognitive radio networks. IRSs may help secondary users, who reuse the spectrum
initially granted to primary users, by increasing the signal strength between
the transmitter and receiver. IRSs may also be used to increase physical layer
security. A number of IRS-based approaches have been studied for reducing information
leakage to eavesdroppers and increasing received signal strength for authorized users.
Holographic MIMO has the potential to be deployed in virtually any real-world setting.
Because of the hologram’s continuous electromagnetic aperture, wireless communications
systems may achieve unprecedented densities and granularities in terms of
both data and location. In addition, it would enable the generation and detection of
electromagnetic waves at any spatial frequency, free from the interference caused by
side-lobe components. By virtue of its superior spatial resolution, holographic MIMO
should be able to significantly cut power consumption while significantly boosting
spatial multiplexing. The considerable propagation loss encountered in the mmWave
and THz bands may be mitigated by using holographic MIMO to produce super-narrow
beams. Holographic MIMO has the potential to enhance spectrum
efficiency and network capacity since it combines visual and wireless communication
technologies.
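To give a sense of the propagation loss mentioned above, the following Python sketch evaluates the standard free-space path loss (Friis) formula at a few carrier frequencies. The distance and the frequencies are illustrative assumptions, and real mmWave and THz links also suffer from atmospheric absorption and blockage, which this formula ignores.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def free_space_path_loss_db(distance_m, frequency_hz):
    """Free-space path loss in dB between two isotropic antennas."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * frequency_hz / SPEED_OF_LIGHT)


# Illustrative carriers: sub-10 GHz, mmWave, and THz-range, all at 100 m.
for frequency in (3.5e9, 28e9, 300e9):
    loss = free_space_path_loss_db(100.0, frequency)
    print(f"{frequency / 1e9:.1f} GHz at 100 m: {loss:.1f} dB")
```

Every tenfold increase in carrier frequency adds 20 dB of free-space loss at a fixed distance, which is one reason narrow, high-gain beams are attractive in these bands.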
The amount of data sent between Internet-connected devices, and the number of
such devices, have both increased dramatically during the last several years. The
Internet of Things (IoT) is driving the demand for faster data transfer speeds, and
developers are creating more apps that rely heavily on data. Therefore, there may
soon be an extremely severe scarcity of network capacity. As a result, efforts have
been made to make better use of the spectrum below 10 GHz and to investigate
operational frequency ranges such as mmWave and THz. It is evident that in order
to address the wide range of needs associated with the Internet of Things, many
frequency bands must coexist inside a single system. By doing this, we may alleviate
strain on the current radio-frequency infrastructure while simultaneously decreasing
the potential for interference in wireless communications. Furthermore, future 6G
networks may benefit greatly from using higher-frequency bands, since this opens the
door to the prospect of getting faster peak data rates, more reliable communications,
and ultra-low latency. Furthermore, 6G networks are predicted to provide a unified
wireless interface by combining technologies from higher bands (above 10 GHz) and
the lower bands (below 10 GHz). However, expanding a system that uses exclusively
digital precoding from the sub-10 GHz ranges to higher bands presents a variety of
design and implementation issues and may even need significant modifications to
the physical layer.
However, further study is required before the system’s benefits can be fully
appreciated. The spatial correlation structure must be studied, for example, in order
to obtain accurate channel models and channel estimation methods. Furthermore,
for holographic MIMO systems, it is crucial to identify practical pilot designs that
use either a purely digital or a hybrid analog and digital beamforming architecture
and need minimum coherence time. Holographic MIMO systems need cutting-edge
signal processing technology and networking strategies before they can be employed
in the real world. The design of protocols and algorithms for fast reconfiguration
of the reflected electromagnetic signals is similarly important.
8 Conclusion
References
1. IMT traffic estimates for the years 2020 to 2030, Report ITU-R M.2370-0 (2015)
2. Cisco (2020) Cisco annual internet report (2018–2023). White Paper.
https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.html
3. Gupta A, Jha ERK (2015) A survey of 5G network: architecture and emerging technologies.
IEEE Access 3:1206–1232
4. David K, Berndt H (2018) 6G vision and requirements: is there any need for beyond 5G? IEEE
Veh Technol Mag 13(3):72–80
5. Sharma P (2013) Evolution of mobile wireless communication networks-1G to 5G as well
as future prospective of next generation communication network. Int J Comput Sci Mobile
Comput 2(8):47–53
6. Akyildiz IF, Gutierrez-Estevez DM, Balakrishnan R, Chavarria-Reyes E (2014) LTE-advanced
and the evolution to beyond 4G (B4G) systems. Phys Commun 10:31–60
7. Wang C-X, Haider F, Gao X, You X-H, Yang Y, Yuan D, Aggoune HM, Haas H, Fletcher S,
Hepsaydir E (2014) Cellular architecture and key technologies for 5G wireless communication
networks. IEEE Commun Mag 52(2):122–130
8. Al-Eryani Y, Hossain E (2019) The D-OMA method for massive multiple access in 6G:
performance, security, and challenges. IEEE Veh Technol Mag 14(3):92–99
9. Huang T, Yang W, Wu J, Ma J, Zhang X, Zhang D (2019) A survey on green 6G network:
architecture and technologies. IEEE Access 7:175758–175768
10. Dang S, Amin O, Shihada B, Alouini MS (2020) What should 6G be? Nat Electron 3(1):20–29
11. Letaief KB, Chen W, Shi Y, Zhang J, Zhang YJA (2019) The roadmap to 6G: AI empowered
wireless networks. IEEE Commun Mag 57(8):84–90
12. Zhang S, Xiang C, Xu S (2020) 6G: connecting everything by 1000 times price reduction.
IEEE Open J Veh Technol 1:107–115
13. Shafin R, Liu L, Chandrasekhar V, Chen H, Reed J, Zhang J (2019) Artificial intelligence-
enabled cellular networks: a critical path to beyond-5G and 6G. IEEE Wirel Commun
27(2):212–217
14. Chen S, Liang Y, Sun S, Kang S, Cheng W, Peng M (2020) Vision, requirements, and technology
trend of 6G: how to tackle the challenges of system coverage, capacity, user data-rate and
movement speed. IEEE Wirel Commun 27(2):218–228
15. Clazzer F, Munari A, Liva G, Lazaro F, Stefanovic C, Popovski P (2019) From 5G to 6G: has
the time for modern random access come? arXiv:1903.03063
16. Nayak S, Patgiri R (2021) 6G communication technology: a vision on intelligent healthcare. In:
Health informatics: a computational perspective in healthcare. Springer, Singapore, pp 1–18
17. Zhang D, Zhou Z, Xu C, Zhang Y, Rodriguez J, Sato T (2017) Capacity analysis of NOMA
with mmWave massive MIMO systems. IEEE J Sel Areas Commun 35(7):1606
18. Huang X, Zhang JA, Liu RP, Guo YJ, Hanzo L (2019) Airplane-aided integrated networking
for 6G wireless: will it work? IEEE Veh Technol Mag 14(3):84–91
Chapter 33
A Modified LSB Steganography
Algorithm to Store Images of Large Size
1 Introduction
Steganography can take many forms: physical, digital, in puzzles, and so on. Digital
steganography itself can be categorized into image, audio, and video steganography.
This paper focuses on image steganography. Image steganography involves hiding
a piece of information within an image. This can be done by directly manipulating
the values of the pixels of an image (spatial domain image steganography) or by
modifying an orthogonal transform of the image as opposed to the image itself
(transform domain image steganography). There are a variety of algorithms for each
approach. The least significant bit (LSB) algorithm comes under the spatial domain.
Algorithms based on the discrete wavelet transform (DWT) and the discrete cosine
transform (DCT) come under the transform domain, and there are many other algorithms
as well. Once the data is
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 393
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5_33
394 Y. V. Srinivasa Murthy et al.
hidden within the image, the image can be transmitted as usual over the communication
channel, after which the receiver extracts the hidden message by applying the
reverse of the data-hiding process [1].
Steganography is not to be confused with cryptography. Cryptography protects the
contents of a message in such a way that an eavesdropper will not be able to understand
the original message when looking at the encrypted message, whereas steganography
hides the fact that a message has been sent in the first place. Usually,
cryptographic encryption is followed by steganography, which adds an extra layer of
security to the communication system.
Although the least significant bit (LSB) image steganography algorithm is simple
and easy to implement, it has inherent drawbacks that need to be addressed. One
of these drawbacks is that it is easily detected by common steganalysis tools,
because the LSB method is one of the default techniques that is checked for. Another
drawback is its low data-hiding capacity, due to the fact that only one bit per pixel
can be used for storage. These drawbacks provide scope for enhancement of the LSB
algorithm.
The computational simplicity of the LSB algorithm accounts for its usefulness.
The LSB algorithm provides for quick hiding of data within an image, albeit with
weaker security than other image steganography algorithms. One example where
LSB image steganography is used is the storage of one-time passwords (OTPs) in
images on mobile phones. The algorithm is particularly useful in this case because of
the limited processing power of a mobile device.
The biggest challenge in designing any image steganography algorithm is to preserve
the appearance of the image without any noticeable visual deformation as
compared to the actual image, while at the same time not compromising on the amount
of data to be stored and maintaining security. It should be computationally hard
to detect the data hidden in an image given just the image alone. Care should also
be taken to minimize the data loss during the data extraction from the image. Novel
LSB designs should ensure the above-mentioned qualities [2].
The paper proposes a modified LSB algorithm. The algorithm has been tested
based on the parameters. The rest of the paper is organized as follows: Sect. 2 reviews
the literature in the field of image steganography.
2 Literature Review
LSB steganography is the most widely used steganography technique for hiding secret
data in images, mainly because of its simplicity [3]. The distortion in the resulting
image is also quite low [4]. Various modifications of the algorithm have been
proposed, starting from its basic form as a sequential procedure.
The basic LSB algorithm replaces the LSB of each pixel (in each channel)
sequentially from left to right, top to bottom. This algorithm is really simple to
implement but at the same time is highly susceptible to attacks. Since data is stored
33 A Modified LSB Steganography Algorithm… 395
sequentially, it can also be easily extracted from the image, and hence the method is
not very secure. Since then, several modifications have been proposed.
Additional perceptual transparency is achieved by embedding the data at the edges
of the object [5]. Such algorithms make it difficult to extract the data from the images
as it is hard to find the locations where the pixels have been modified. Adnan et al.
have used a technique where one of the RGB channels of the cover image is selected and
two LSBs of secret data are embedded in it [6]. However, hiding in a single RGB
channel significantly decreases the amount of data that can be hidden.
Mehdi and Mureed improved the LSB method, increasing the embedding capacity
while retaining the quality of the stego image, by changing up to five LSBs of pixels
having low-intensity values. The message bits are also XORed before embedding
for higher security. However, such techniques are susceptible to detection by the
human eye [7]. Some random position-based algorithms have also been developed
using sequences such as the famous Fibonacci sequence. However, these put
a very low limit on the amount of data that can be hidden in the image [8]. Some
RGB channel-based algorithms propose that the data be hidden in the blue channel
of the image, as changes to this channel do not cause much distortion visible to the
human eye. There are several other image steganography algorithms based on the discrete
cosine transform (DCT) and the discrete wavelet transform (DWT). These are, however,
harder to implement and not worth the effort for most use cases. An improvement on the
naive LSB algorithm tailored to our needs is sufficient for most practical purposes.
The need to improve LSB also arises due to the advancement of steganalysis tools,
that is, tools that detect if any data is hidden in images or any other carrier. With the
advent of machine learning approaches, these tools have become stronger and are not
only able to detect hidden data but also extract the content in its meaningful form.
Hence, the data is no longer securely hidden. LSB as a simple and basic algorithm
is easily detected by such tools, and hence, data security is a major issue now in
LSB-based algorithms.
Our proposed method tries to improve data security in the case of LSB by
hiding the data bits at randomly generated positions in the image. Such an
implementation does not allow a third-party attacker to find out the order of the data
bits of the secret data, keeping the data secure even if the attacker is able to extract
the bits. We also combined RGB-based approaches with our method to keep the distortion
to a minimum.
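The idea of hiding bits at randomly generated positions can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes a flat list of 8-bit pixel values and a seed shared by sender and receiver (both hypothetical choices), and it uses Python's `random` module, which is not cryptographically secure.

```python
import random


def embed_random(pixels, bits, seed):
    """Hide bits in the LSBs of pseudo-randomly chosen pixel positions."""
    if len(bits) > len(pixels):
        raise ValueError("message does not fit in the cover")
    # The seed plays the role of a shared secret that fixes the position order.
    positions = random.Random(seed).sample(range(len(pixels)), len(bits))
    stego = list(pixels)
    for position, bit in zip(positions, bits):
        stego[position] = (stego[position] & ~1) | bit
    return stego


def extract_random(stego, num_bits, seed):
    """Regenerate the same positions from the seed and read the LSBs in order."""
    positions = random.Random(seed).sample(range(len(stego)), num_bits)
    return [stego[position] & 1 for position in positions]


cover = [120, 37, 200, 255, 0, 64, 199, 18]
secret = [1, 0, 1, 1]
stego = embed_random(cover, secret, seed=2022)
print(extract_random(stego, len(secret), seed=2022))
```

Without the seed, an attacker who reads every LSB still does not know which positions carry message bits or in what order they were written.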
3 Proposed Methodology
The naive LSB algorithm (given in Algorithm 1) is one of the earliest and most used
algorithms in steganography. LSB stands for least significant bit. The LSB algorithm
involves altering the least significant bit plane in the image. Altering is done
sequentially to enable extracting the hidden data. Data can be hidden beginning at
the start of the cover image, the middle, or the end of the image. The receiver must,
however, be aware of the exact nature of the concealing algorithm used in order to
extract the hidden data.
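The sequential procedure described above can be sketched in a few lines of Python. This is an illustrative sketch rather than a reproduction of Algorithm 1: it assumes the image has already been flattened into a list of 8-bit channel values, embedding starts at the beginning of the cover, and all function names are our own.

```python
def text_to_bits(message):
    """Convert a string to a list of bits, most significant bit of each byte first."""
    return [(byte >> shift) & 1
            for byte in message.encode("utf-8")
            for shift in range(7, -1, -1)]


def bits_to_text(bits):
    """Inverse of text_to_bits for a bit list whose length is a multiple of 8."""
    data = bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[start:start + 8]))
        for start in range(0, len(bits), 8)
    )
    return data.decode("utf-8")


def embed_sequential(pixels, bits):
    """Overwrite the LSB of the first len(bits) values, in order."""
    if len(bits) > len(pixels):
        raise ValueError("message does not fit in the cover")
    stego = list(pixels)
    for index, bit in enumerate(bits):
        stego[index] = (stego[index] & ~1) | bit
    return stego


def extract_sequential(stego, num_bits):
    """Read the LSBs back in the same order."""
    return [value & 1 for value in stego[:num_bits]]


cover = list(range(50, 66))            # 16 toy channel values
payload = text_to_bits("Hi")           # 16 bits
stego = embed_sequential(cover, payload)
print(bits_to_text(extract_sequential(stego, len(payload))))
```

Because the receiver only needs to know where embedding starts and how many bits were written, extraction is trivial, which is also why the scheme offers little protection against an attacker.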
The LSB algorithm and its variants can be used in any type of steganography.
When deployed for image steganography, the image compression algorithm used
must be lossless. Lossy compression may alter the least significant bit
plane, which makes it impossible to extract the concealed data.
This LSB algorithm is, however, subject to the threat of easy detection. The changes
might not be visible to the naked eye, but after subjecting the image to steganalysis
techniques, the concealed data becomes evident. One of the attacks that the LSB
algorithm is not immune to is bit-plane analysis. When the image is analysed bit plane
by bit plane, the pattern of data concealed in the least significant bit becomes evident.
Similarly, many steganalysis algorithms have been suggested that can detect images
with concealed data. These analysis techniques mainly rely on the localization of
data in the LSB algorithm.
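A toy Python example makes the bit-plane attack concrete. The flat gray cover and the helper name `bit_plane` are assumptions made for illustration; in a real photograph the LSB plane is noisy, but sequentially embedded data still tends to stand out as a localized block.

```python
def bit_plane(image, plane):
    """Return one bit plane (0 = least significant) of a 2-D list of 8-bit values."""
    return [[(value >> plane) & 1 for value in row] for row in image]


# Toy 4 x 8 cover with a single flat gray level.
cover = [[128] * 8 for _ in range(4)]
message_bits = [1, 0, 1, 1, 0, 0, 1, 0]

# Naive sequential embedding: the message fills the first row only.
stego = [row[:] for row in cover]
for column, bit in enumerate(message_bits):
    stego[0][column] = (stego[0][column] & ~1) | bit

# Inspecting the LSB plane reveals where the data sits.
for row in bit_plane(stego, 0):
    print("".join(str(bit) for bit in row))
```

In this example the message appears in the first row of the LSB plane while the remaining rows stay at zero, so both the presence and the location of the payload are exposed.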
This algorithm follows the naive LSB method to a great extent; however, it
carefully chooses its pixels by hiding the data in only those pixels which have a
value greater than a certain threshold (e.g., 100). This ensures that the percentage
change in the pixel value is not very high and hence lowers the chance of detection.
This method is inspired by several RGB plane-based LSB methods
in which the authors chose the RGB plane that causes the least distortion to the
human eye. Here, by changing only those pixels whose intensity is greater than
a certain threshold value, we ensure that the percentage change in value is not very
high, which helps avoid detection.
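The threshold rule can be sketched as below, under one assumption that the text does not state: eligibility is tested on the sample value with its LSB cleared, so that embedding can never push a sample across the threshold and the receiver selects the same samples as the sender. The threshold of 100 is the example value from the text; the function names are hypothetical.

```python
THRESHOLD = 100  # example value from the text


def is_eligible(value, threshold=THRESHOLD):
    # Assumption: compare with the LSB cleared so the decision is the same
    # before and after embedding.
    return (value & ~1) > threshold


def embed_thresholded(pixels, bits, threshold=THRESHOLD):
    """Write one bit into the LSB of each eligible sample, in scan order."""
    stego = list(pixels)
    written = 0
    for i, value in enumerate(stego):
        if written == len(bits):
            break
        if is_eligible(value, threshold):
            stego[i] = (value & ~1) | bits[written]
            written += 1
    if written < len(bits):
        raise ValueError("not enough samples above the threshold")
    return stego


def extract_thresholded(pixels, n_bits, threshold=THRESHOLD):
    """Read the LSB of the first `n_bits` eligible samples, in scan order."""
    out = []
    for value in pixels:
        if len(out) == n_bits:
            break
        if is_eligible(value, threshold):
            out.append(value & 1)
    return out


cover = [12, 101, 250, 100, 180, 99, 102, 140, 7, 133]
bits = [1, 0, 1, 1, 0]
stego = embed_thresholded(cover, bits)
recovered = extract_thresholded(stego, len(bits))
```

A change of 1 on a sample above 100 is a relative change of less than 1%, whereas the same change on a sample of value 4 would be 25%, which is the percentage argument made above.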
Data security is still a problem with the RGB-based method, and hence this
algorithm has to be coupled with the random number-based LSB algorithm to improve
data security. In our results, we have combined both algorithms and compared
the combination with the naive LSB algorithm.
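One way to couple the two ideas is sketched below. It is an assumption-laden illustration rather than the authors' implementation: the shared secret is modelled as an integer `key` that seeds Python's `random.Random`, embedding positions are drawn without repetition from the samples that pass the threshold test, and all function names are hypothetical. Python's default generator is not cryptographically secure, so a real system would need a keyed cryptographic generator.

```python
import random


def candidate_indices(pixels, threshold):
    """Indices of samples above the threshold (tested with the LSB cleared)."""
    return [i for i, value in enumerate(pixels) if (value & ~1) > threshold]


def keyed_positions(pixels, n_bits, key, threshold):
    """Draw `n_bits` distinct candidate positions in a key-dependent order."""
    candidates = candidate_indices(pixels, threshold)
    if n_bits > len(candidates):
        raise ValueError("payload does not fit in the eligible samples")
    return random.Random(key).sample(candidates, n_bits)


def embed(pixels, bits, key, threshold=100):
    stego = list(pixels)
    for pos, bit in zip(keyed_positions(pixels, len(bits), key, threshold), bits):
        stego[pos] = (stego[pos] & ~1) | bit
    return stego


def extract(pixels, n_bits, key, threshold=100):
    return [pixels[pos] & 1 for pos in keyed_positions(pixels, n_bits, key, threshold)]


cover = [(37 * i + 11) % 256 for i in range(200)]   # hypothetical sample values
bits = [1, 0, 1, 1, 0, 0, 1, 0] * 4                  # 32 secret bits
stego = embed(cover, bits, key=2024)
recovered = extract(stego, len(bits), key=2024)
```

Because the candidate set is unchanged by embedding, a receiver holding the same key regenerates the same positions; without the key, the extracted LSBs cannot be put back in order.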
We experimented with our algorithm as well as the naive LSB substitution
algorithm on images of various sizes, ranging from 100 × 100 to 500 × 500 pixels.
The image similarity metric was calculated for all the images. The amount
of data to be hidden was increased as the size of the image increased; it was kept
at around 80% of the total amount of data that can be hidden in the image using the
current methodology.
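As a rough worked example of the 80% rule, the sketch below computes an upper bound on capacity. It assumes three colour channels and that every sample is usable; with threshold-based selection the true capacity is lower and depends on the image. The function names and defaults are hypothetical.

```python
def capacity_bits(width, height, channels=3, bits_per_sample=1):
    """Upper bound on embeddable bits if every sample carries `bits_per_sample`."""
    return width * height * channels * bits_per_sample


def payload_bits(capacity, fill_percent=80):
    """Payload size when the cover is filled to `fill_percent` of its capacity."""
    return capacity * fill_percent // 100


for side in (100, 200, 300, 400, 500):
    cap = capacity_bits(side, side)
    print(side, cap, payload_bits(cap), payload_bits(cap) // 8)
```

Under these assumptions a 100 × 100 image holds at most 30,000 bits with one LSB per sample, so an 80% payload is 24,000 bits, or 3,000 bytes.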
We begin with single-LSB substitution, which is the most
basic form of LSB substitution. The results in Table 1 show that
the image similarity metric is around 99% for all image sizes tested. Such an
implementation causes barely any distortion in the images, as can be observed in
the example in Fig. 1 for a 500 × 500 image.
Our next set of experiments used two-LSB substitution. Here we start
to see significant differences between the naive LSB method and the proposed
Random+RGB method (Table 2). The proposed method clearly outperforms the naive
LSB method in this case.
33 A Modified LSB Steganography Algorithm… 399
Fig. 1 Outcome of random and sequential LSB steganography using one LSB per pixel
With three-LSB substitution, we finally start to see the effect of
image size. As the size of the image decreases, the image similarity
metric clearly decreases, which supports the proposition that the metric takes into
account the spatial positioning of the data bits. In smaller images, more bits are
hidden close to each other, and hence the chance of detection is higher.
With four- and five-LSB substitution, we start seeing noticeable changes in the
image. In the case of the naive LSB algorithm, we can see a series of darker dots,
indicating that the pixels have become darker. In the case of the proposed
Random+RGB method, a closer look reveals certain distortions (dark spots) in the
image. The image similarity metric in these cases is also significantly lower.
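The growth in distortion with the number of substituted bits follows from simple arithmetic: replacing the k lowest bits can change a sample by at most 2^k - 1. A minimal sketch, with hypothetical function names:

```python
def embed_k_lsb(value, payload, k):
    """Replace the k least significant bits of `value` with those of `payload`."""
    mask = (1 << k) - 1
    return (value & ~mask) | (payload & mask)


def extract_k_lsb(value, k):
    """Read back the k least significant bits of `value`."""
    return value & ((1 << k) - 1)


def max_change(k):
    """Largest possible change to a sample when k LSBs are replaced."""
    return (1 << k) - 1


for k in range(1, 6):
    print(k, max_change(k))
```

With five bits, a sample of 255 that receives an all-zero payload drops to 224, a change of 31 grey levels out of 255, which is consistent with the visible darkening described above.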
Fig. 2 Outcome of random and sequential LSB steganography using two LSBs per pixel
Fig. 3 Outcome of random and sequential LSB steganography using three LSBs per pixel
Fig. 4 Outcome of random and sequential LSB steganography using four LSBs per pixel
Fig. 5 Outcome of random and sequential LSB steganography using five LSBs per pixel
The algorithms we have proposed have their own merits and demerits. Table 6
compares the properties of the proposed algorithms with those of
the naive substitution algorithm, and Table 7 gives details of the
performance of the proposed approach relative to the other algorithms.
Table 1 Results obtained using Naive and proposed methodology (1-LSB Steganography)
Image size    Naive     Random + RGB
100 × 100     99.378    99.453
200 × 200     99.619    99.689
300 × 300     99.700    99.754
400 × 400     99.773    99.780
500 × 500     99.806    99.817
Table 2 Results obtained using Naive and proposed methodology (2-LSB Steganography)
Image size    Naive     Proposed
100 × 100     89.178    98.787
200 × 200     90.839    98.910
300 × 300     91.461    99.011
400 × 400     92.109    99.291
500 × 500     92.647    99.339
Table 3 Results obtained using Naive and proposed methodology (3-LSB Steganography)
Image size    Naive     Proposed
100 × 100     88.120    96.261
200 × 200     89.219    97.359
300 × 300     90.410    98.107
400 × 400     91.081    98.671
500 × 500     92.887    99.051
Table 4 Results obtained using Naive and proposed methodology (4-LSB Steganography)
Image size    Naive     Proposed
100 × 100     66.799    92.117
200 × 200     68.014    93.399
300 × 300     69.102    94.651
400 × 400     60.221    95.301
500 × 500     61.114    96.101
Table 5 Results obtained using Naive and proposed methodology (5-LSB Steganography)
Image size    Naive      Proposed
100 × 100     57.087     92.017
200 × 200     59.211     93.399
300 × 300     60.205     94.781
400 × 400     61.121     95.401
500 × 500     61.9076    96.044
Table 6 Comparative analysis of the parameters security, amount of data, and spatial noise of
proposed approach over the other algorithms
Algorithm        Bits/pixel (bpp)    Data security    Amount of data    Spatial noise
Naive LSB        1–3                 Low              High              High
RGB based LSB    1                   Low              Low               Low
Random number    <1                  High             High              Moderate
Random+RGB       <1                  High             Low               Very low
We also checked the images generated by the above algorithms against an available
and state-of-the-art steganalysis tool called StegExpose. The results of our experiment
are as follows:
We have performed a comparative analysis on our proposed algorithms and studied
their performance with respect to the following parameters:
Table 8 Image similarity metric that is considered to compare original with the image of secret
information
#bits used    Image similarity
              Sequential    Random
1             99.8065       99.8817
2             92.6473       99.3399
3             92.8875       99.0510
4             61.1147       96.1010
5             61.9076       96.0441
To compare the similarity of two images, the pixel values cannot simply be compared
directly. The images formed using naive and random LSB have the same number of bit
differences with respect to the original image.
Humans, however, will not perceive the same degree of difference in the two images
despite the same amount of difference in pixel values. We therefore devised
our own metric, called the image similarity metric. Table 8 gives the
similarities obtained with respect to the original images.
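The point that pixel-wise comparison ignores where the changes occur can be checked with a conventional measure such as PSNR. The sketch below uses PSNR only for illustration; it is not the authors' image similarity metric, which this section does not define. The flat cover and the two change patterns are hypothetical.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized sample lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    err = mse(a, b)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)


cover = [128] * 100

# Twenty changed samples packed together at the start of the image.
clustered = [129 if i < 20 else 128 for i in range(100)]

# Twenty changed samples spread evenly over the image.
scattered = [129 if i % 5 == 0 else 128 for i in range(100)]

print(psnr(cover, clustered), psnr(cover, scattered))
```

Both stego versions differ from the cover by the same twenty unit changes, so the two PSNR values are identical even though the changes are clustered in one case and spread out in the other. A metric meant to reflect what a viewer notices has to take spatial placement into account.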
5 Conclusion
The results of our experiments are presented
in the previous section. They show that the proposed
RGB+Random-based method improves substantially on the naive LSB algorithm: the
image similarity of the proposed method stays above 92% in every configuration
tested, whereas that of the naive algorithm falls below 70% with four and five
LSBs (Tables 4 and 5). Our goal of developing a modified and enhanced LSB
steganography algorithm is thus met.
This work has certain limitations. The modified algorithms perform better than the
naive LSB algorithm; however, more experiments are needed to compare them
with the other modified LSB algorithms available today. The algorithm has also
not been compared against the more advanced image steganography algorithms
available today. Future work should compare the results of this algorithm with
those of other algorithms in the domain of LSB steganography.
© The Editor(s) (if applicable) and The Author(s), under exclusive license 407
to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational
Intelligence, Algorithms for Intelligent Systems,
https://doi.org/10.1007/978-981-99-2854-5