H. S. Saini
Rishi Sayal
A. Govardhan
Rajkumar Buyya Editors
Innovations
in Computer
Science and
Engineering
Proceedings of 8th ICICSE
Lecture Notes in Networks and Systems
Volume 171
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland
Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA,
School of Electrical and Computer Engineering—FEEC, University of Campinas—
UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering,
Bogazici University, Istanbul, Turkey
Derong Liu, Department of Electrical and Computer Engineering, University
of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy
of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering
University of Alberta, Alberta, Canada; Systems Research Institute
Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering,
KIOS Research Center for Intelligent Systems and Networks, University of Cyprus,
Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong,
Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest
developments in Networks and Systems—quickly, informally and with high quality.
Original research reported in proceedings and post-proceedings represents the core
of LNNS.
Volumes published in LNNS embrace all aspects and subfields of, as well as new
challenges in, Networks and Systems.
The series contains proceedings and edited volumes in systems and networks,
spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor
Networks, Control Systems, Energy Systems, Automotive Systems, Biological
Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems,
Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems,
Robotics, Social Systems, Economic Systems and others. Of particular value to both
the contributors and the readership are the short publication timeframe and the
world-wide distribution and exposure which enable both a wide and rapid
dissemination of research output.
The series covers the theory, applications, and perspectives on the state of the art
and future developments relevant to systems and networks, decision making, control,
complex processes and related areas, as embedded in the fields of interdisciplinary
and applied sciences, engineering, computer science, physics, economics, social, and
life sciences, as well as the paradigms and methodologies behind them.
Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.
Innovations in Computer
Science and Engineering
Proceedings of 8th ICICSE
Editors
H. S. Saini, Guru Nanak Institutions, Ibrahimpatnam, Telangana, India
Rishi Sayal, Guru Nanak Institutions, Ibrahimpatnam, Telangana, India
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore
Pte Ltd. 2021
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface
This volume contains 84 papers that were presented at the Eighth International
Conference on Innovations in Computer Science and Engineering (ICICSE-2020)
held during August 28–29, 2020, at Guru Nanak Institutions, Hyderabad, India, in
collaboration with the Computer Society of India (CSI) and funding from the All India
Council for Technical Education (AICTE).
The aim of this conference is to provide a vibrant virtual international forum
that brings together researchers, scientists, academicians, corporate professionals
and technically sound students under one roof for a phenomenal, informative
and interactive session, which is acutely needed to pave the way for research
advancements in the field of computer science and engineering.
ICICSE-2020 received more than 400 research papers from various sub-fields of
computer science and engineering. Each submitted paper was meticulously reviewed
by our review committee consisting of senior academicians, industry professionals
and professors from premier institutions and universities.
• This conference was inaugurated and attended by top dignitaries such as Mr. Srini
Santhanam, Vice President, S2 Integrators LLC, Atlanta, Georgia, USA; Dr. A.
Govardhan, Professor and Rector, JNTU, Hyderabad; Dr. M. Manzoor Hussain,
Professor and Registrar, JNTU, Hyderabad; and Mr. Aninda Bose, Senior Editor,
Springer India Pvt. Ltd, India.
• This conference featured keynote sessions, webinars by eminent speakers, and
paper presentation sessions showcasing the latest outcomes related to
advancements in computing technologies.
• The keynote and webinar sessions covered cutting-edge topics such as
advancements in artificial intelligence, advanced machine learning techniques,
cybersecurity, and data science case studies. The invited speakers were
Dr. Sujala Deepak Shetty, Professor, BITS Pilani, Dubai Campus, UAE;
Mr. Kiran Naidu, Data Scientist, AW Rostamani, Dubai, UAE; Dr. G.
Shanmugarathinam, Professor and CISCO Certified Ethical Hacker, Presidency
University, Bengaluru, India; and Dr. B. Sateesh Kumar, Professor, JNTUH,
Hyderabad, India.
Dr. H. S. Saini, Managing Director of Guru Nanak Institutions, obtained his Ph.D.
in the field of computer science. He has over 30 years of experience at the
university/college level in teaching UG/PG students and has guided several B.Tech.
and M.Tech. projects and six Ph.D. scholars. He has published/presented over 90
high-quality research papers in international and national journals and proceedings
of international conferences. He has published six books with Springer. He is an
advocate of innovation and an advisor for the NBA/NAAC accreditation process at
many institutions in India and abroad. He is chief editor of many innovative
journals and has chaired various international conferences.
Dr. Rishi Sayal, Associate Director, Guru Nanak Institute of Technical Campus,
has completed his B.E. (CSE), M.Tech. (IT) and Ph.D. (CSE). He obtained his
Ph.D. in computer science and engineering in the field of data mining from the
prestigious Mysore University of Karnataka State. He has over 28 years of
experience in training, consultancy, teaching and placements. His current areas of
research interest include data mining, network security and databases. He has
published numerous research papers in international conferences and journals.
He has guided many UG and PG research projects, and he is a recipient of
many research grants from government funding agencies. He is co-editor of various
innovative journals and has convened international conferences.
1 M.Phil. and 135 M.Tech. projects. He has published 555 research papers in
international/national journals/conferences including IEEE, ACM, Springer,
Elsevier and Inderscience. He has delivered more than 100 keynote speeches and
invited lectures. He has chaired 22 sessions at international/national
conferences in India and abroad. He has research projects (completed/ongoing)
worth Rs. 1.159 crores.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_1
2 S. Valai Ganesh et al.
1 Introduction
Human behavior detection from static and dynamic motions is an emerging
technology for identifying human activities through computer and smartphone
systems. A typical human behavior detection dataset is the ‘Activity Recognition
Using Smart Phones Dataset’ available on the internet. Input data can be taken
from several sorts of devices, such as sensors for capturing images, recording audio,
and monitoring pressure, orientation and acceleration. The rapid development
of communication between humans and computers and between humans and smart
phones has made it possible to identify human activities in every aspect of daily life.
More importantly, the recent introduction of GPUs and deep learning algorithms
[1, 2] has enabled human behavior detection applications in areas such as athletic
competition, smart home automation and health care or monitoring for the elderly.
In the current scenario, two types of methods are available for human behavior
detection: the first uses live images of human behavior and the second uses
wearable sensors [3, 9]. Using the sensors (gyroscope and accelerometer) in
a smartphone [5], data such as acceleration and orientation are recorded with
several variations. Accelerometer [4, 6] and gyroscope readings are taken from
30 volunteers (referred to as subjects) while performing static activities such as
sitting, standing or laying and dynamic activities such as walking, walking upstairs
and walking downstairs. Accelerometer readings are divided into gravity
acceleration and body acceleration readings, each of which is three-dimensional
in nature. Each sensor signal is preprocessed using noise filters.
The remainder of the article is organized as follows. Section 2 discusses research
work completed in the past by the research community. Dataset particulars for the
proposed work are provided in Sect. 3. Sections 4 and 5 discuss the machine
learning and LSTM models. The experimental results are discussed in Sect. 6.
Section 7 concludes the article by comparing the accuracy of the machine learning
models with that of the deep learning model, along with possible extensions of the
work introducing new deep learning models in the future.
2 Related Work
Bayat et al. [4] developed two different models for detecting human activities, one
named “in-hand” and the other “in-pocket”. Six different activities are detected:
fast walk, slow walk, running, stairs-up, stairs-down and dancing. A tri-axial
accelerometer is used to detect the human activities. Six different classification
methods are adopted for this work and their results are compared. A testing
accuracy of up to 91.15% is achieved on everyday activities using the
accelerometer.
Static and Dynamic Activities Prediction of Human … 3
Bulbul et al. [5] predicted human behavior using smartphones by adopting deep
learning models. Sensors such as the accelerometer and gyroscope are used to
predict human behaviors. The dataset contains information on nine individuals
performing three different dynamic activities (walking, climbing up the stairs,
climbing down the stairs) and three different static activities (sitting, standing and
laying). Input data are monitored at a frequency of 50 Hz. The signals are received
and saved. The designed models are first trained with 80% of the total dataset and
tested with the remaining 20%. Models are developed, observed and tested using
fivefold cross-validation. Various conventional machine learning classification
models such as decision trees, SVM, etc. were used in this work.
Attal et al. [3] provide an overview of various techniques to detect human behavior
from wearable inertial sensing units. Sensors were located on different parts of
the human body; importantly, the detecting devices were placed at the lumbar
region for detecting various static and dynamic activities. A sequence of forty
randomized tasks was chosen for the study. Among the conventional machine
learning classification techniques used in their study, k-Nearest Neighbors
produced the best accuracy.
3 HAR Dataset
Accelerometer and gyroscope readings are taken from 30 human beings (referred
to as subjects) while performing six classes (labels) of activities: static classes
such as standing, sitting and laying, and dynamic classes such as walking,
walking upstairs and walking downstairs.
Accelerometer outputs are separated into two parameters, namely gravity accel-
eration and body acceleration readings, each of which has three components x, y
and z. Gyroscope readings represent the angular velocities of the three dynamic
activities. Jerk signals are derived from the body acceleration readings. Fourier
transforms are applied to the above time-domain readings to compute frequency-
domain readings. There are 561 features available in the dataset; each window of
readings is a data point of 561 features for a subject. The data of the thirty subjects
(human beings) are randomly split into 70% (21 persons) training data and the rest
test data. Each data point corresponds to one of the six classes of activities.
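The subject-wise split can be sketched as follows; the per-subject window count and random features below are placeholders for the real data, which the sketch does not load:

```python
import numpy as np

rng = np.random.default_rng(42)

# 30 subjects, each contributing several 561-feature windows
# (10 windows per subject is illustrative, not the real count)
subjects = np.repeat(np.arange(30), 10)        # subject id per data point
X = rng.normal(size=(len(subjects), 561))      # placeholder feature windows

# Subject-wise split: 21 subjects for training, the remaining 9 for
# testing, so no person appears in both partitions
train_subj = rng.choice(30, size=21, replace=False)
train_mask = np.isin(subjects, train_subj)

X_train, X_test = X[train_mask], X[~train_mask]
```

Splitting by subject rather than by window prevents windows from the same person leaking into both partitions, which would inflate test accuracy.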
The HAR dataset was initially tested by developing models using conventional
machine learning algorithms: logistic regression, Linear SVC, SVM, random forest,
decision tree and gradient boosting. The results of these models are shown in the
experimental results section.
The precision results of all machine learning models are compared in Fig. 1. The
Linear SVC and SVM models produced almost the same value, whereas the
decision tree model produced the lowest value. The recall comparison shows that
Linear SVC produced a somewhat better value than the other machine learning
models.
Similarly, the other two parameters, accuracy and F1 score, are compared in the
bar chart of Fig. 2. There, the Linear SVC model provides better accuracy than the
other five machine learning models, and the decision tree provides the least
accuracy. Linear SVC likewise shows a good F1 score compared with all the
others. Across the six machine learning models, Linear SVC produced the best
results, the decision tree model produced the worst, and SVM provided decent
results.
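A comparison of this kind can be sketched with scikit-learn; the synthetic six-class data below stands in for the real 561-feature HAR windows, and the model list is a subset of the six used in the paper:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC, SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

# Stand-in data with six classes mimicking the HAR label set
X, y = make_classification(n_samples=600, n_features=60, n_informative=20,
                           n_classes=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear SVC": LinearSVC(max_iter=5000),
    "SVM (RBF)": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Fit each model and record test accuracy and weighted F1 score
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    scores[name] = (accuracy_score(y_te, pred),
                    f1_score(y_te, pred, average="weighted"))
```

The resulting `scores` dictionary is what a bar chart like Figs. 1 and 2 would be drawn from.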
The Long Short-Term Memory (LSTM) model is selected in this work. LSTMs are
able to learn long-term dependencies and work extremely well for sequential
modeling, i.e., the ability to predict what comes next in a sequence. Problems such
as vanishing and exploding gradients normally occur during backpropagation
through time (BPTT); the LSTM largely overcomes these problems. LSTMs have a
chain-like structure, but the repeating module has a different structure from that of
a standard recurrent network: there are four interacting layers in an LSTM [7].
The interacting layers are equipped with pointwise operations and activation
functions such as the sigmoid or hyperbolic tangent. Line merging indicates a
concatenation process, and line forking denotes a copy process, where the copied
content is fed to various portions of the interacting layers. The LSTM is able to
forget information or append incoming data to the cell state by means of structures
called gates (Fig. 3).
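The four interacting layers and the gating described above can be written out directly. This is a minimal NumPy sketch of a single LSTM step, not the Keras implementation used in the paper; the sizes and random parameters are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """One LSTM step showing the four interacting layers. W, U, b hold
    the stacked parameters of the forget, input, candidate and output
    layers, each of size n (the hidden width)."""
    z = W @ x + U @ h_prev + b            # all four pre-activations at once
    n = len(c_prev)
    f = sigmoid(z[0:n])                   # forget gate: what to drop from c
    i = sigmoid(z[n:2*n])                 # input gate: what to write to c
    g = np.tanh(z[2*n:3*n])               # candidate values for the cell state
    o = sigmoid(z[3*n:4*n])               # output gate: what to expose as h
    c = f * c_prev + i * g                # cell state: forget, then append
    h = o * np.tanh(c)                    # new hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in))
U = rng.normal(scale=0.1, size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = lstm_cell(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)
```

The `f * c_prev + i * g` line is exactly the "forget or append to the cell state" behavior the gates provide.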
Two packages are added alongside the LSTM: Hyperopt and Hyperas. Hyperopt is
an open-source Python library for performing optimization over serial and parallel
search spaces, which may include real-valued, discrete, and conditional dimensions.
Hyperas is a wrapper around Hyperopt for performing optimization on Keras models.
The softmax activation function is used here to predict the six different classes.
Softmax provides a probability distribution over the outputs and is used to predict
the output when multiple classes are involved. In this work there are six class
labels, and the softmax function provides a probability for each class; the output is
predicted from the highest probability. Normally, the softmax function is located in
the final layer of a classification problem.
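A minimal sketch of the softmax layer's behavior; the six logit values are made up for illustration:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: shift by the max before exponentiating."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical final-layer scores for the six activity classes
logits = np.array([2.0, 1.0, 0.1, 3.5, 0.5, 1.2])
probs = softmax(logits)
predicted_class = int(np.argmax(probs))   # class with the highest probability
```

The probabilities sum to one, and the predicted class is simply the index of the largest one, as described above.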
6 Experimental Results
In this work, human activities are detected based on movements. The experiment
is performed with Python version 3. Initially, the work started with a single LSTM
layer and was then expanded to a two-layer LSTM and to an LSTM with
hyperparameter tuning over 15 evaluations. The outcomes of the LSTM models are
shown in Tables 1, 2 and 3 (Fig. 4).
The same HAR dataset is trained and tested with deep learning models: single- and
two-layer LSTMs, and then an LSTM with tuned hyperparameters. The single-layer
LSTM produced around 92.40% validation accuracy, the two-layer LSTM
produced slightly improved results of around 92.43%, and the LSTM with
hyperparameters tuned using the Hyperopt modules provided around 95.40%
accuracy. The Linear SVC model and the LSTM with tuned hyperparameters
produce almost the same validation results. In upcoming work, a Convolutional
Neural Network based model and a ResNet-based model will be developed and
checked for their accuracy. By adopting new deep learning models, many more
labels of human activities can be predicted during normal and abnormal conditions.

Fig. 4 a LSTM validation, single layer; b LSTM validation, two layers; c LSTM validation, above
two layers
7 Conclusion
Smartphone applications are not limited to communications and networking.
Various deep learning models are slowly being incorporated into smartphones in
order to collect data for various activities, and HAR is one of the important
outcomes of this capability. The LSTM produces 95.40% testing accuracy. In the
future, we plan to add more classes, develop other models such as CNN and
ResNet models, and check their validation accuracy.
References
1. Agarwal, M., Kaliyar, R.K., Singal, G., Gupta, S.K.: FCNN-LDA: a faster convolution neural
network model for leaf disease identification on apple’s leaf dataset. In: 2019 12th International
Conference on Information & Communication Technology and System (ICTS), pp. 246–251.
IEEE (2019)
2. Agarwal, M., Sinha, A., Gupta, S.K., Mishra, D., Mishra, R.: Potato crop disease classification
using convolutional neural network. In: Smart Systems and IoT: Innovations in Computing,
pp. 391–400. Springer (2020)
3. Attal, F., Mohammed, S., Dedabrishvili, M., Chamroukhi, F., Oukhellou, L., Amirat, Y.: Phys-
ical human activity recognition using wearable sensors. Sensors 15(12), 31314–31338 (2015)
4. Bayat, A., Pomplun, M., Tran, D.A.: A study on human activity recognition using accelerometer
data from smartphones. Procedia Comput. Sci. 34, 450–457 (2014)
5. Bulbul, E., Cetin, A., Dogru, I.A.: Human activity recognition using smartphones. In: 2018 2nd
International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT),
pp. 1–6. IEEE (2018)
6. Kwapisz, J.R., Weiss, G.M., Moore, S.A.: Activity recognition using cell phone accelerometers.
ACM SigKDD Expl. Newslett. 12(2), 74–82 (2011)
7. Kwon, M.C., Choi, S.: Recognition of daily human activity using an artificial neural network
and smartwatch. Wirel. Commun. Mob. Comput. 2018 (2018)
8. Polu, S.K., Polu, S.K.: Human activity recognition on smartphones using machine learning
algorithms. Int. J. Innov. Res. Sci. Technol. 5(6), 31–37 (2018)
9. Sousa Lima, W., Souto, E., El-Khatib, K., Jalali, R., Gama, J.: Human activity recognition
using inertial sensors in a smartphone: an overview. Sensors 19(14), 3213 (2019)
10. Sun, J., Fu, Y., Li, S., He, J., Xu, C., Tan, L.: Sequential human activity recognition based on
deep convolutional network and extreme learning machine using wearable sensors. J. Sens.
2018 (2018)
Implementation of Braille Tab
1 Introduction
The digital revolution has brought tremendous changes to the field of education:
one can access a pool of knowledge at one's fingertips. This revolution has made
our society more educated and liberal, and the overall standard of living has
improved as a result. But for the visually impaired, acquiring knowledge is not
easy even in 2020. Efforts
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_2
10 T. Kulkarni et al.
have been made to provide them knowledge using audiobooks and Braille printed
books. However, research suggests that dependence on audiobooks reduces cortical
plasticity, an important factor in cognitive development [1]. Part of the socio-
economic strata cannot afford information sources for the blind because only a
limited number of books are available in Braille script. Often, these books are
bulky and expensive, which limits their accessibility in social circles. In recent
years, refreshable tactile displays have been developed, allowing the blind to
access information available online. In most of these projects, more emphasis was
given to the hardware aspect, making them accurate but not focused on ease of use.
Hence, the need to develop a system ensuring easy access to Braille tabs was
realized and worked upon.
2 Related Work
After conducting a thorough survey of existing systems, it was learnt that diverse
research has been performed on the hardware aspect of tactile displays. Actuators
composed of piezoelectric materials, shape memory alloys, and solenoids have
been used to raise individual pins of a Braille character [2]. The bending
characteristics of electroactive polymers are utilized to provide the hydraulic
action of a Braille dot [3]. Pneumatic signals were used to raise the dots of several
Braille cells arranged in a row [4]. An MCU was designed to convert Chinese or
English text into Braille, play music, and provide keyboard and other display
features [5]. Character recognition was used for system development [6]. An
Arduino-based tab was created for Devanagari-to-Braille conversion [7].
Solidification of liquid-state alloys was used to lock Braille dots in place [8].
However, a few drawbacks have been noted in the above systems: (a) they are
commercially not viable, (b) the actuator mechanism can break when excessive
pressure is applied, (c) the one-actuator-per-dot mechanism can make the device
bulky, and (d) the refresh rate is not satisfactory in any of the available tabs [9].
3 Proposed System
• The Braille tab is, in effect, an e-book reader for the visually impaired; special
emphasis has been given to its usage in educational institutions. For testing
purposes, LEDs are used, which can easily be replaced with solenoid actuators
to make a tactile display [10] (see Fig. 1).
• The device uses a hierarchical architecture consisting of shift registers and
multiplexers for accessing each character individually. This makes it energy
efficient and reduces maintenance costs.
• Android applications help make a device user-friendly: many inbuilt features
help users access the app and help developers create such apps without losing
focus on the main objective of the application.
• “Braille companion” acts as a mediator between the user and the Braille tab,
helping the user utilize the system effectively by providing a smooth flow. The
Android application is designed keeping in mind that most users will be visually
impaired; hence, TalkBack descriptions and large buttons are provided to
improve the user experience.
• For a secure system and hassle-free usage, a biometric lock is enabled for the
app. Only fingerprints that are already registered for unlocking the device and
saved in the TEE (Trusted Execution Environment) are valid; an authenticated
user is prompted for the same (Fig. 3).
• The application uses the Android ID, which is unique, assigned during initial
boot-up, and unchanged unless a factory reset is performed. It is used to
uniquely identify the user in the Firebase database.
• The absence of a user profile indicates that the user is not registered, and the
user is then redirected to the registration page (Fig. 4a).
• Users enrolled in the institution must fill in their Institute ID, after which they
are redirected to another login page (Fig. 4b). After successful registration, the
data is uploaded to Firebase. To control accessibility, such users need approval
from the admin; newly arrived requests are listed as shown in Fig. 7.
• Unless the admin takes action, the user must wait, and the user is prompted
accordingly if a login attempt is made.
• If the admin accepts the request, a push notification is sent using Cloud
Messaging and Volley. Depending on the admin's action, the user profile is
updated on Firebase (Fig. 6).
• A teacher and a student can each choose the purpose of using the tab through
two modes: “Classroom” and “Self-Learning” (Fig. 8).
• After joining a classroom, the student can access the data available for that
particular classroom.
• For a non-institutional user, classroom mode is not available; only self-learning
mode is accessible.
• After the mode is selected, the user has to choose the type of data to be
uploaded (Fig. 9).
• In document mode, .txt files can be selected, which are then uploaded to
Firebase Storage; the download link is uploaded to Firebase for the tab to
access the file. Once the operation completes successfully, the user is prompted
accordingly (Figs. 5 and 10).

Fig. 5 Firestorage
Fig. 6 Notification
• “Dictate” mode uses Google's speech-to-text API (see Fig. 11) to perform
speech-to-text conversion. The output string is confirmed using the TalkBack
system and then uploaded to Firebase (Table 1).
Table 1 maps the flag values set by the application to the download location for
the tab in Firebase. For example, when a teacher uses the dictate feature in
self-learning mode, the flag values of “role-setup-mode” will be “1-2-2,”
respectively. The tab can download the string from androidID//Text//my. The flag
values help the tab identify which mode was used most recently by the user,
thereby indicating what data to display on the tab (Fig. 10).
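The flag-to-path convention above can be sketched as a small lookup table. The dictionary keys, the second entry, and the single-slash path format are illustrative assumptions, not the app's actual Firebase layout:

```python
# Hypothetical mapping from ("role", "setup", "mode") flag values to the
# Firebase location the tab reads from. The "1-2-2" entry follows the
# teacher/self-learning/dictate example in the text; the second entry is
# invented for illustration.
FLAG_TO_PATH = {
    ("1", "2", "2"): "{android_id}/Text/my",
    ("2", "1", "1"): "{android_id}/Docs/classroom",
}

def download_path(role, setup, mode, android_id):
    """Resolve the flags most recently written by the app to a download
    location for the tab; unknown combinations are rejected."""
    template = FLAG_TO_PATH.get((role, setup, mode))
    if template is None:
        raise KeyError("unknown flag combination")
    return template.format(android_id=android_id)

path = download_path("1", "2", "2", "abc123")
```

A lookup table like this keeps the tab firmware free of mode logic: it only needs to read the three flags and fetch one path.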
• Once a mux is enabled, its select lines are used to light an LED as required.
Here, only one LED glows at a time, but due to the high processing speed of the
micro-controller and persistence of vision, it appears that all the LEDs glow
simultaneously.
• For flawless communication, the Android ID of the user's device is
pre-registered in the tab.
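The character-to-cell conversion that would drive the actuators can be sketched as follows. The (partial) dot map uses the standard Grade-1 Braille letter patterns, but the pin-vector format is an assumed interface to the shift-register chain, not the device's actual protocol:

```python
# Partial Grade-1 Braille map: letter -> raised dots, using the standard
# 6-dot numbering (1-2-3 down the left column, 4-5-6 down the right)
BRAILLE_DOTS = {
    "a": {1}, "b": {1, 2}, "c": {1, 4}, "d": {1, 4, 5},
    "e": {1, 5}, "f": {1, 2, 4}, "t": {2, 3, 4, 5},
}

def char_to_pins(ch):
    """Return a 6-element pin vector (1 = raise actuator / light LED)
    for one Braille cell; unmapped characters give a blank cell."""
    dots = BRAILLE_DOTS.get(ch.lower(), set())
    return [1 if d in dots else 0 for d in range(1, 7)]

def text_to_pins(text):
    """One pin vector per character, ready to shift out cell by cell."""
    return [char_to_pins(ch) for ch in text]

cells = text_to_pins("cab")
```

Each 6-bit vector corresponds to one cell of the display, which the shift-register/mux hierarchy would then address one character at a time.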
6 Results
7 Conclusion
In this paper, a system for creating a user-friendly Braille tab was proposed. An
implementation based on the proposed system was carried out, and the software
and hardware aspects of the system were discussed. Special emphasis was put on
the application side in order to improve the user experience. Future adaptations of
the Braille tab are possible.
The tab, which currently supports pre-defined classrooms, can be further devel-
oped to create customized classrooms. A keyboard can also be added to the tab
depending on user requirements. The tab can be extended to support formats other
than .txt files, and the system can be adapted for use as a book reader by designing
a web portal.
References
1. Hamilton, R.H., Pascual-Leone, A.: Cortical Plasticity Associated with Braille Learning (1998)
2. Schmidt, R.N., Lisy, F.J., Prince, T.S., Shaw, G.S.: US Patent number: US6743021B2. Retrieved
from https://patents.google.com/patent/US6743021B2/en (2002)
3. Yang, P.: US Patent number: US6881063B2. Retrieved from https://patents.google.com/patent/
US6881063B2/en (2005)
4. Sutherland, N.B.: US Patent number: US3659354A. Retrieved from https://patents.google.
com/patent/US3659354A/en (1972)
5. Xiaoli, H., Tao, L., Bing, H., Qiang, C., Qiang, X., Qiang, H.: Electronic reader for the blind
based on MCU. In: 2010 International Conference on Electrical and Control Engineering,
pp. 888–890. Wuhan (2010)
6. Wajid, M., Kumar, V.: E-Braille documents: novel method for error free generation. Image
Process. Commun. 19(4), 21–26 (2014)
7. Gupta, R., Singh, P.K., Bhanot, S.: Design and implementation of Arduino based refreshable
braille display controller. Indian J. Sci. Technol. 9, 33 (2016)
8. Soule, C.W., Lazarus, N.: Reconfigurable Braille display with phase change locking. Smart
Mater. Struct. 25(7), 075040 (2016)
9. Gote, A., Kulkarni, T., Jha, S., Gupta, S.: A review of literature on braille tab and the underlying
technology. In: 2020 5th International Conference on Devices, Circuits and Systems (ICDCS),
pp. 333–335. Coimbatore, India (2020)
10. Yang, T.-H., Lee, J.-S., Lee, S.S., Kim, S.-Y., Kwon, D.-S.: Conceptual design of new micro-
actuator for tactile display. In: 2007 International Conference on Control, Automation and
Systems, pp. 1306–1309, Seoul (2007)
Resized MIMO Antenna for 5G Mobile
Antenna Applications
1 Introduction
User equipment can gain many advantages from fifth generation (5G) systems,
such as higher transmission rates and lower latency than the present 4G system.
MIMO antenna systems with multiple antennas (more than three) are capable of
achieving higher transmission rates from a 5G antenna. In this paper, the MIMO
antenna achieves high isolation by using a larger number of antennas [1];
however, due to the restricted space in mobile phones, it is not possible to use a large number
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_3
20 S. Subramanyam et al.
of antennas. A few techniques address this, such as decoupling elements,
orthogonal polarization, ground structures, and neutralization lines [2–4].
Most conventional systems consist of only two antennas, but the signal strength of
two antennas is weak. Antenna isolation was improved by using a four-antenna
MIMO system [5–7]. When more antennas are present in a single system, the
device achieves higher data rates and upload speeds. Usually the size of the
antenna is large, but in this work the size of the antenna is decreased by around
30% by using vertical stubs [8–10]. The U-structured antenna element can be used
both as a decoupling element and as a radiating element; this unique feature gives
the MIMO antenna very good isolation.
The proposed antenna operates at about 2.4 GHz (2.3–2.5 GHz) as a four-port MIMO antenna, and the single antenna uses the same frequency. The single antenna and the four-antenna array share the same substrate parameters: dielectric constant εr = 3.4, loss tangent tan δ = 0.02 and substrate thickness (t) = 1.524 mm. The values of H and L are the same for both types of antenna, but their total lengths differ: the four-antenna array is 130.3 mm long. Here, the antenna size is decreased by about 40% from the actual height. A small gap should be kept between the U-structured component and the T-shaped component; the space between them gives good impedance matching in the final antenna system. MIMO technology leverages multipath behaviour by using multiple smart transmitters and receivers; it is a wireless technique used to increase channel capacity and can also be called spatial multiplexing. Here q = 11.9 mm. Very good isolation is obtained from one antenna to the next.
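The capacity benefit of spatial multiplexing mentioned above can be illustrated numerically. The following Python sketch averages the standard equal-power MIMO capacity expression C = log₂ det(I + (SNR/n)·HHᴴ) over random i.i.d. Rayleigh channels; the channel model, SNR value and trial count are illustrative assumptions, not measurements of this antenna.

```python
import math
import random

def det(M):
    """Determinant of a complex square matrix via Gaussian elimination."""
    n = len(M)
    A = [row[:] for row in M]
    d = complex(1.0)
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))  # partial pivot
        if abs(A[p][c]) == 0.0:
            return complex(0.0)
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return d

def ergodic_capacity(n, snr_db, trials=200, seed=7):
    """Average log2 det(I + (SNR/n) * H H^H) over n x n Rayleigh channels."""
    random.seed(seed)
    snr = 10 ** (snr_db / 10)
    total = 0.0
    for _ in range(trials):
        # i.i.d. complex Gaussian entries with unit variance
        H = [[complex(random.gauss(0, 0.7071), random.gauss(0, 0.7071))
              for _ in range(n)] for _ in range(n)]
        M = [[(1 if i == j else 0) + (snr / n) *
              sum(H[i][k] * H[j][k].conjugate() for k in range(n))
              for j in range(n)] for i in range(n)]
        total += math.log2(abs(det(M)))
    return total / trials

c1 = ergodic_capacity(1, 20)  # single antenna at 20 dB SNR
c4 = ergodic_capacity(4, 20)  # 4 x 4 MIMO at the same SNR
```

In this idealized model the 4 × 4 configuration averages roughly four times the single-antenna capacity, which is the spatial-multiplexing gain the text refers to.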
The antenna radiation pattern, current distribution and total efficiency are good. For the 2.4 GHz frequency band, the antenna efficiency is more than 90%. The current distribution over the antenna and the ground plane illustrates how the MIMO isolation is achieved.
The channel capacity achieved with a pair of antennas in the 2.4 GHz band exceeds 34 bps/Hz. The MIMO antenna return loss is influenced by the factor t, with a good match to the 50 Ω impedance at the resonant frequency. The operating frequency depends on the various stub lengths. For better performance and proper gain, the antenna structure is built with the specified measurements and its behaviour is verified at the 2.4 GHz resonant frequency, with a reflection coefficient of −30 dB as exhibited in Fig. 3.
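The quoted −30 dB reflection coefficient can be translated into matching quality with the standard identities |Γ| = 10^(S11/20), reflected power fraction = |Γ|², and VSWR = (1 + |Γ|)/(1 − |Γ|). A minimal sketch:

```python
def reflection_magnitude(s11_db):
    """|Gamma| from a reflection coefficient quoted in dB (S11)."""
    return 10 ** (s11_db / 20)

def reflected_power_fraction(s11_db):
    """Fraction of incident power reflected at the feed."""
    return reflection_magnitude(s11_db) ** 2

def vswr(s11_db):
    """Voltage standing-wave ratio implied by the match."""
    g = reflection_magnitude(s11_db)
    return (1 + g) / (1 - g)

refl = reflected_power_fraction(-30.0)  # fraction reflected at -30 dB
ratio = vswr(-30.0)                     # corresponding VSWR
```

At −30 dB only about 0.1% of the incident power is reflected (VSWR ≈ 1.07), an excellent match to the 50 Ω feed.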
The 2D radiation patterns of this four-port MIMO antenna are well behaved across different angles, as shown in Figs. 4 and 5. The figures show a maximum gain of 1 dBi and an efficiency of 90% at the operating frequency.
The figure also shows the gain and directivity lying between 4.5 and 5 dBi. A 4 × 4 MIMO system consists of four antennas. Generally, a device with more antennas costs more because of the extra hardware, and it consumes some additional power for the extra wireless hardware.
4 Conclusion
A resized four-port antenna system for 5G mobile antenna applications has been presented. The system is based on a compact, self-isolated antenna element. The MIMO antenna is validated by simulation and analysis, and it achieves good isolation without any dedicated decoupling or isolation components: without any loss of efficiency, the radiating element of the implemented antenna itself acts as the decoupling element. Because of its reduced size, the suggested MIMO system is a good choice for 5G mobile portable systems.
Acknowledgements The authors would like to thank JNTUH for supporting this project. This
research was supported by the TEQIP III Collaborative Research Scheme, JNTUH.
References
1. Teja, R., Kumar, S.A., Thangavelu, S.: CPW-fed inverted six shaped antenna design for internet of things (IoT) applications. In: IEEE IMICPW 2019, NIT Trichy, 22–24 May 2019
2. Sahithya, V., Kumar, S.A., Thangavelu, S.: Design of CPW fed antenna for WiMAX applications. In: IEEE IMICPW 2019, NIT Trichy, 22–24 May 2019
3. Ravali, S., Kumar, S.A., Thangavelu, S.: Design of a CPW fed detective solid bowtie antenna
for satellite applications. In: IEEE IMICPW 2019, NIT Trichy, 22–24 May 2019
4. Kumar, S.A., Thangavelu, S.: Design of clover slot antenna for biomedical applications.
Alexandria Eng. J. 56, 313–317 (2016)
5. Andrews, J.G., et al.: What will 5G be? IEEE J. Sel. Areas Commun. 32(6), 1065–1082 (2014)
6. Kumar, S.A., Thangavelu, S.: CPW fed monopole implantable antenna for 2.45 GHz ISM band
applications. IJEL 3(3), 152–159 (2015)
7. Kumar, S.A., Thangavelu, S.: CPW fed implantable Z-monopole antennas for ISM band
biomedical applications. IJMWT 7, 529–533 (2015)
8. Kumar, S.A., Thangavelu, S.: Implantable CPW fed rectangular patch antenna for ISM band
biomedical applications. IJMWT 6(1), 101–107 (2014)
9. Kumar, S.A., Shanmuganantham, T.: Design of CPW-fed inverted six shaped antenna for IoT
applications. TEEM, Springer (2020)
10. Kumar, S.A., Thangavelu, S.: Design and performance of textile antenna for wearable
applications. TEEM 19(5), 352–355 (2018)
Molecule Four-Port Antenna is Utilizing
Detachment Progress of MIMO
1 Introduction
In modern devices, MIMO systems with multiple antennas are designed to make equipment smaller in size. One implication of such designs is that the space between the MIMO antenna elements must be kept as short as possible, and this small spacing leads to mutual coupling problems. A number of
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 27
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_4
28 K. Venugopal Rao et al.
novel methods have been devised, such as orthogonal placement of the elements, placing a neutralization strip, partially parallel designs, and further decoupling structures. The radiating elements are arranged perpendicular to one another and fed by coplanar waveguide (CPW) feeds to enhance isolation, although the CPW occupies a large area. Isolation is also enhanced by placing a neutralization line between the antenna elements to cancel the coupling current on the ground plane. Mutual coupling is reduced by considering L-shaped slots in an elliptical radiator together with an elliptical slot in the ground plane between perpendicularly placed, fractal-shaped antennas [1].
To enhance isolation, a T-shaped stub is used inside the radiating element in combination with a meander-line feed. To obtain still more isolation, I-shaped stubs on the surface and F-shaped slots in a shared ground can be used instead of the T-stub. Most conventional systems consist of only two antennas, but the signal strength of two antennas is weak. Antenna isolation was improved by using a four-antenna MIMO system [5–7]. When more antennas are present in a single system, the device achieves higher data rates and upload speeds. The antenna is usually large, but in this work its size is reduced by about 30% by using vertical stubs [8–10]. The U-structured antenna element serves as both a decoupling element and a radiating element, a unique feature of this MIMO antenna that yields very good isolation.
Figure 1 shows the diagrammatic representation of the UWB MIMO antenna. The multi-input multi-output system consists of four monopoles, each element fed by a 50 Ω microstrip line. Good isolation is achieved by the perpendicular orientation of adjacent elements. A hexagonal molecule fractal is applied at the edges of the geometry to achieve the wideband behaviour.
The channel capacity achieved with a pair of antennas in the 3.7 GHz band exceeds 34 bps/Hz. The MIMO antenna return loss is influenced by the factor t, with a good match to the 50 Ω impedance at the resonant frequency. The operating frequency depends on the various stub lengths. For better performance and proper gain, the antenna structure is built with the specified measurements and the result is verified at the 3.7 GHz resonant frequency, with a return loss below −22 dB as shown in Fig. 2.
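A 50 Ω feed line like the one described can be dimensioned with the standard Hammerstad synthesis formulas. The sketch below assumes the substrate values given in the companion chapter (εr = 3.4, h = 1.524 mm) as an illustration; it is a first-order estimate, not the authors' design procedure.

```python
import math

def microstrip_width(z0, eps_r, h_mm):
    """Hammerstad synthesis of microstrip width (mm) for a target
    characteristic impedance z0 on a substrate of height h_mm."""
    A = (z0 / 60) * math.sqrt((eps_r + 1) / 2) \
        + (eps_r - 1) / (eps_r + 1) * (0.23 + 0.11 / eps_r)
    w_h = 8 * math.exp(A) / (math.exp(2 * A) - 2)   # valid for W/h < 2
    if w_h > 2:                                     # wide-line branch
        B = 377 * math.pi / (2 * z0 * math.sqrt(eps_r))
        w_h = (2 / math.pi) * (B - 1 - math.log(2 * B - 1)
              + (eps_r - 1) / (2 * eps_r)
              * (math.log(B - 1) + 0.39 - 0.61 / eps_r))
    return w_h * h_mm

width_mm = microstrip_width(50.0, 3.4, 1.524)  # 50-ohm line on this substrate
```

For these assumed substrate values the formula gives a trace roughly 3.5 mm wide; a higher target impedance yields a narrower trace.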
The 2D radiation patterns of this four-port MIMO antenna are well behaved across different angles, as shown in Figs. 3 and 4. Figure 5 shows a maximum gain of 1 dBi and an efficiency of 90% at the operating frequency.
4 Conclusion
In this paper, a four-port antenna system for 5G mobile antenna applications has been presented. The system is based on a compact, self-isolated antenna element. The MIMO antenna is validated by simulation and analysis, and it achieves good isolation without any dedicated decoupling or isolation components: without any loss of efficiency, the radiating element of the implemented antenna itself acts as the decoupling element. Because of its reduced size, the suggested MIMO system is a good choice for 5G mobile portable systems.
Acknowledgements The authors would like to thank JNTUH for supporting this project. This
research was supported by the TEQIP III Collaborative Research Scheme, JNTUH.
References
1. Sahithya, V., Kumar, S.A., Thangavelu, S.: Design of CPW fed antenna for WiMAX applications. In: IEEE IMICPW 2019, NIT Trichy, 22–24 May 2019
2. Ravali, S., Kumar, S.A., Thangavelu, S.: Design of a CPW fed detective solid bowtie antenna
for satellite applications. In: IEEE IMICPW 2019, NIT Trichy, 22–24 May 2019
3. Kumar, S.A., Thangavelu, S.: Design of clover slot antenna for biomedical applications.
Alexandria Eng. J. 56, 313–317 (2016)
4. Kumar, S.A., Thangavelu, S.: Design of CPW-fed inverted six shaped antenna for IoT
applications. TEEM, Springer (2020)
5. Kumar, S.A., et al.: CPW fed monopole implantable antenna for 2.45 GHz ISM band
applications. IJEL 3(3), 152–159 (2015)
6. Kumar, S.A., Thangavelu, S.: CPW fed implantable Z-monopole antennas for ISM band
biomedical applications. IJMWT 7, 529–533 (2015)
7. Kumar, S.A., et al.: Implantable CPW fed rectangular patch antenna for ISM band biomedical
applications. IJMWT 6(1), 101–107 (2014)
8. Kumar, S.A., et al.: Design of implantable CPW fed monopole antenna for ISM band
applications. TEEM 15(2), 55–59 (2014)
9. Kumar, S.A., et al.: Design and performance of textile antenna for wearable applications. TEEM
19(5), 352–355 (2018)
10. Teja, R., Kumar, S.A., Shanmuganantham, T.: CPW fed inverted six shaped antenna design for
internet of things (IoT) applications. In: IEEE IMICPW 2019, NIT Trichy, 22–24 May 2019
IOT-Based Underwater Wireless
Communication
1 Introduction
Three-fourths of the Earth's surface is covered with water, in the aqua-structure of oceans, rivers and seas. This unexplored underwater environment needs to be examined, and the path to fruitful experimentation has always depended on the available technologies. The latest improvements in technology have paved the way for underwater exploration using different sensors at each level. Hence, underwater
G. Sahu (B)
Department of EXTC, UMIT, Juhu, Mumbai, India
e-mail: giti.sahoo@gmail.com
S. S. Pawar
Department of EXTC, UMIT, SNDT Women’s University, Mumbai, India
e-mail: drsanjayspawar@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 33
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_5
34 G. Sahu and S. S. Pawar
2 System Model
Figure 3 shows the transceiver block diagram. The terrestrial module consists of four main components: (i) Arduino, (ii) WiFi (ESP) module, (iii) HC12 and (iv) NodeMCU. The Arduino works as the controller for all these serially connected devices. First, the Arduino Uno triggers the ESP WiFi module, which functions as a hotspot and operates in the ISM band. After successful interfacing of the hotspot with the ESP WiFi module, a coded message is displayed on the 16 × 2 LCD. After connecting to the nearby RF network, the module works as a hotspot to which many cellular devices in the geographical area can connect, over a range of approximately 1.5 km.
The data rate is nearly 10 kbps for terrestrial communication. The interfaced HC12 module transmits and receives data within the specified frequency range. It supports half-duplex transmission, with its receiver sensitivity specified at a 5 kbps data rate. The paired transceiver antennas can communicate beyond a 1 km range, providing adequate coverage distance.
The NodeMCU (ESP8266) is a server-based communicating device interconnected with the on-chip Arduino, the HC12 and the WiFi module. When all these devices are ready, a ping message is transmitted from the underwater module and the NodeMCU acts as the transmitter to the IoT cloud via the ThingSpeak website. This module also measures temperature via a sensor and checks the salinity level of the water.
The transceiver consists of an on-chip Arduino, an ESP WiFi module, the HC12 and a piezoelectric sensor. The HC12 module connects to the upper layer, which has half the range, i.e. up to a depth of 5 ft. When any cellular device comes within 0.5 km of the HC12 under the water, it attaches to the HC12, and the location of the lost cellular device can be traced. The HC12 communicates with the Arduino to update the location to the upper HC12 module, and the upper module then posts the information to www.thingspeak.com (the IoT cloud).
The HC-12 is a half-duplex module with a 20 dBm (100 mW) transmitter paired with a receiver of −117 dBm (2 × 10⁻¹⁵ W) sensitivity at 5000 bps, used with an external antenna. These transceivers can communicate up to 1 km over an open-air interface and provide adequate coverage and high throughput.
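The dBm figures above follow from the definition P(W) = 10^((P(dBm) − 30)/10); a one-line check:

```python
def dbm_to_watts(dbm):
    """Convert a power level in dBm (dB relative to 1 mW) to watts."""
    return 10 ** ((dbm - 30) / 10)

tx_watts = dbm_to_watts(20)    # HC-12 transmit power
rx_watts = dbm_to_watts(-117)  # HC-12 receiver sensitivity
```

20 dBm is 0.1 W and −117 dBm is about 2 × 10⁻¹⁵ W, matching the HC-12 numbers quoted above.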
The research work provides an underwater wireless network for short communication distances, i.e. near-field communication (NFC). It can also be used to establish a wireless network under the surface of sea water. This network can connect cellular devices, track them, and localize them, for example by finding the GPS position and storing the location in the IoT cloud. It can also be used for precision monitoring and for controlling pollution of the water arriving from neighbouring localities and industries.
Figure 4 shows that the RF terrestrial module is switched on first and obtains broadband service from the nearby base station, i.e. the primary server. The server pings the NodeMCU; the HC12 transmitter and receiver then turn on, indicated by blinking. This triggers the ESP8266 WiFi module, which operates in the ISM band, i.e. 2.4 GHz. As the whole system starts to respond, the WiFi system transmits and connects to module II, the underwater module shown in Fig. 5. Module II activates, communicates with the terrestrial module and tracks the nearby devices within its range. The HC12, interfaced with the NodeMCU, updates the location of the tracked device on www.thingspeak.com. The sensors connected to module II are a DS18B20 temperature sensor, a piezoelectric ceramic ring transducer (SMR3515T55) and a Waspmote Smart Water quality-monitoring sensor. Together they measure temperature, vibration and water quality and update the readings in the IoT cloud. The Waspmote is a portable smart water-quality sensor that detects whether there is any chemical leakage into the water, checking parameters such as pH level, dissolved oxygen (DO), oxidation–reduction potential (ORP) and the salinity level of the water. Figure 6 displays the location, i.e. latitude and longitude, of the submerged device.
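Updating a ThingSpeak channel is done with a simple HTTP GET against its update endpoint. The sketch below only builds the request URL; the write key and field assignments are hypothetical placeholders, not values from this project.

```python
from urllib.parse import urlencode

def thingspeak_update_url(api_key, **fields):
    """Build a ThingSpeak channel-update URL (GET form).
    `fields` maps field1..field8 parameter names to sensor readings."""
    params = {"api_key": api_key, **fields}
    return "https://api.thingspeak.com/update?" + urlencode(params)

url = thingspeak_update_url("MY_WRITE_KEY",  # hypothetical write key
                            field1=27.5,     # e.g. temperature reading
                            field2=35.0)     # e.g. salinity reading
```

Issuing a GET on the resulting URL would store the readings in the channel; the field-to-sensor mapping is chosen per channel.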
The various use cases include (i) military applications, (ii) monitoring marine activities, (iii) industrial applications, for example fish farming, and (iv) reducing waste deposition on the sea bed. The main challenge for an underwater wireless network is that water is a conducting, lossy medium, unlike the air interface. Hence, the coverage range of the network is small, and more base stations (BS) need to be deployed for proper and adequate coverage. Deploying a BS inside or above the surface of the water is also a major issue: because water flows, fixed deployment of a BS is not possible. Instead, buoys, drones or short-range waterproof BSs can be placed on the surface of the sea water to establish wireless communication.
This research is significant because it establishes a wireless network under water. The system can communicate between devices and can track and locate devices under water, updating the GPS location in the IoT cloud for further reference. It supports marine monitoring, and sensing and controlling the quality of the saline water: it measures temperature and vibration and monitors water quality, i.e. pH level, dissolved oxygen (DO) and salinity level, updating the readings in the IoT cloud. It also helps reduce the deposition of organic waste on the sea bed. The developed system provides coverage over a range of 1.2 km in diameter and 50 m in depth, which is adequate in a lossy medium like water compared with other related literature.
Further optimization can expand the range, increase the number of attached devices and enhance the depth of the established network, while improving signal quality and reducing losses due to backscattering and reflections.
Pattern Prediction Using Binary Trees
Abstract In this busy world, no one has spare time, and technology is developed every day to increase efficiency. On this front, a word predictor is a small step that increases our efficiency many times over. Word predictors have applications in areas such as texting and search engines. To develop the word predictor program, this project uses the trie data structure: the program uses a stored file of words to predict the words the user may be thinking of, which helps a lot. The project compares an implementation of word completion using binary trees with one using binary tries. The proposed method is word prediction using binary trees, and the experiments show that the binary-trie implementation takes longer than the proposed approach. Auto-complete is a feature that helps the user find what one wants to search for by predicting the value in the search box; it starts predicting searches related to the first few letters or words being typed by the user. This feature works best when the words typed are common, such as when addressing an email.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 43
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_6
44 T. Aditya Sai Srinivas et al.
1 Introduction
The "auto-complete" feature starts predicting words when the user enters the first few letters of the word to be searched. When the user enters the first letter, auto-complete displays all words beginning with that letter, so the writer can select a word from the predicted values instead of fully typing the text. This saves users a lot of time [1, 2].
Sometimes the predicted words are the ones recently searched by the user. Language modeling and Augmentative and Alternative Communication (AAC) devices are used in the word-prediction process to predict the most frequently and commonly used words. The user can also enter words into the prediction dictionaries using the word-prediction software [3, 4].
• To understand the dynamic tree data structure used in developing the program.
• To understand the "trie" data structure being used in the program.
• To construct a robust and efficient algorithm that is editable and can later be used as a module in larger software.
• To develop an efficient real-time program with fast processing and an industrial application.
In this program, the trie data structure is used to search the data in an ordered fashion. This data structure is also known as a radix, prefix or digital tree. It helps store the data as a dynamic set in which the keys are strings.
A node in the tree does not store its key; instead, the position of the node defines the key. All the descendants of a node share a common prefix of the string associated with that node, and the root is associated with the empty string. Values tend to be associated only with leaves, and with some inner nodes that correspond to keys of interest (Fig. 1).
A compact prefix tree is used for space optimization. In the example shown, the predictions at the nodes are made from the information in the first node, and the final nodes carry the prefix of the earlier node [5–10].
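The prefix-tree behaviour described above can be sketched directly: each node stores a map from letters to children, and completion walks down to the prefix node and collects every word below it. A minimal Python sketch (the inserted word list is illustrative):

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # letter -> child TrieNode
        self.is_word = False  # True if a stored word ends at this node

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def complete(self, prefix):
        """All stored words beginning with `prefix`, in sorted order."""
        node = self.root
        for ch in prefix:              # walk down to the prefix node
            if ch not in node.children:
                return []
            node = node.children[ch]
        words = []
        def walk(n, suffix):           # collect every word below it
            if n.is_word:
                words.append(prefix + suffix)
            for ch in sorted(n.children):
                walk(n.children[ch], suffix + ch)
        walk(node, "")
        return words

trie = Trie()
for w in ["apple", "apply", "appeal", "gang"]:
    trie.insert(w)
```

Completing "appl" returns apple and apply, while appeal is excluded at the fourth letter.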
Binary Trie
A binary tree is a data structure in which each node has two links, known as the left child and the right child. When a general tree is converted to a binary tree, the left-most child of a parent becomes its left child, and all remaining children become right children of their siblings (Fig. 2).
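The left-most-child/right-sibling conversion just described can be sketched as follows; the tuple encoding of the general tree is an illustrative choice.

```python
class BNode:
    def __init__(self, value):
        self.value, self.left, self.right = value, None, None

def to_lcrs(value, children=()):
    """General tree node (value, children...) converted to binary form:
    left-most child -> left child, later siblings chained via right links."""
    node = BNode(value)
    prev = None
    for child in children:
        b = to_lcrs(*child) if isinstance(child, tuple) else to_lcrs(child)
        if prev is None:
            node.left = b    # left-most child becomes the left child
        else:
            prev.right = b   # remaining children become right siblings
        prev = b
    return node

# A with children B (which has children D, E) and C
t = to_lcrs("A", (("B", ("D", "E")), "C"))
```

Here B becomes A's left child, C hangs off B's right link, and D and E chain below B in the same way.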
To traverse the binary tree, this project uses three different types of traversal:
• Post-order
• In-order
• Pre-order
Post-order: first traverse the left subtree, then the right subtree, and finally visit the root (LRV).
In-order: first traverse the left subtree, then visit the root, and finally traverse the right subtree (LVR).
Pre-order: first visit the root, then traverse the left subtree, and finally the right subtree (VLR).
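The three traversal orders can be stated compactly in Python; the tiny tree here reuses the letters A, N and R from the ARE example that follows.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def postorder(n):  # L R V
    return postorder(n.left) + postorder(n.right) + [n.value] if n else []

def inorder(n):    # L V R
    return inorder(n.left) + [n.value] + inorder(n.right) if n else []

def preorder(n):   # V L R
    return [n.value] + preorder(n.left) + preorder(n.right) if n else []

# Root A with left child N and right child R
root = Node("A", Node("N"), Node("R"))
```

Pre-order visits A first, in-order visits it between its children, and post-order visits it last.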
In the auto-complete binary tree, the traversal first visits the root, then moves to the left child and along its right children until the second letter is found; it then continues through the left child of the found node.
To traverse for the word ARE, first visit the root node A and move to its left child N. Compare N with the second letter of the word. Since it is not the same, move to its right child R and compare. It is the same; therefore the traversal continues to its left child (Fig. 3).
2 Background
The main aim of this study was to investigate whether word processing was helpful to people, especially disabled people who find writing difficult. It mainly focused on teaching children to use word processing.
An activity was conducted among children. In the first case, the children wrote their own stories by hand, while in the other case the children used word processing and word prediction for their writing. Differences were noted in spelling, grammatical errors, and the use of legible words. The results varied, and the differences clearly showed the importance of word processing and word prediction [11–15].
Word processing with word prediction improves the legibility and spelling of written assignments completed by some children with learning disabilities and handwriting difficulties. Many students with physical disabilities find it difficult to write fluently. One type of assistive technology developed to improve accuracy in writing is word-prediction software, although there is a lack of research supporting its use for individuals with physical disabilities [16–20].
This study researched word prediction and word processing to examine the accuracy of draft papers written by physically disabled people. The results indicated no effect on writing speed, but the approach shows promise in decreasing spelling and typographical errors [21–25].
Writing is a medium of human communication that involves interaction between physical and cognitive skills. Physically disabled people find writing difficult, and so they have to overcome a few barriers in order to overcome the difficulties in writing.
Most of the opportunities individuals gain are based on their writing skills, and so there must be technological development in the field of writing. One such technology is assistive technology, which helps increase typing fluency. The main motive of this study is to improve the writing skills of physically disabled people. An alternating-treatment design was used, in which diverse physically disabled people were recruited. The words correct per minute (WCPM) and the grammatical errors were noted down for further investigation [26–30].
The recruited people were allowed to type for three minutes using the word processor and word prediction, to check which of the two supports more fluent writing. The most widely used websites include library websites and the online search tools used by young people to look things up, so providing automated search features is important for obtaining relevant results in academic research scenarios; the feature helps the user end up with the best hit.
Auto-completion has been very productive and affordable for people who have disabilities and find writing difficult, and it is more readily available than speech-to-text technology or special input devices.
The disadvantage of this feature concerns its cost on a document collection: given a word set A and an alphabetical range B of words, computing the set of all word-in-document pairs (a, b) from the collection such that a belongs to A and b belongs to B scales with the size of the underlying document collection. Python is the most frequently used language for testing advanced features, compared with, for example, AutoHotkey scripting [31–37].
3 Proposed Method
void parseTree(Node*, string, vector<string>&, bool&)
This module traverses the binary tree and appends each completed word to the vector of strings passed by reference to the parseTree function. It uses the following algorithm:
1. If a left child exists, go left.
2. If a right child exists, go right; otherwise return to the previous node.
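The parseTree module can be sketched in Python. The node layout (left child = next letter of the current word, right child = an alternative letter at the same position) and the example words are assumptions made for illustration; the original is a C++ function.

```python
class TNode:
    def __init__(self, letter, is_word=False, left=None, right=None):
        self.letter, self.is_word = letter, is_word
        self.left, self.right = left, right

def parse_tree(node, prefix, words):
    """Walk the auto-complete binary tree, appending completed words."""
    if node is None:
        return
    word = prefix + node.letter
    if node.is_word:
        words.append(word)
    parse_tree(node.left, word, words)     # 1. if a left child exists, go left
    parse_tree(node.right, prefix, words)  # 2. then try the right sibling

# Tree encoding the words "an" and "are": a -> n (word), n's sibling r -> e (word)
root = TNode("a", left=TNode("n", True,
                             right=TNode("r", left=TNode("e", True))))
words = []
parse_tree(root, "", words)
```

Running the sketch collects both stored words from the shared first letter.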
4 Result Analysis
In Fig. 4, the y-axis represents the time in seconds and the x-axis the number of iterations for the binary-tree and trie structures with the prefix "appl". The binary-tree approach is better for the prefix "appl" (Tables 1 and 2).
In Fig. 5, the y-axis again represents the time in seconds and the x-axis the number of iterations, this time for the prefix "Gan". The binary-tree approach is better for the prefix "Gan".
5 Conclusion
Word predictors have applications in messaging applications such as WhatsApp, web search engines, word processors, command-line interpreters, etc. The original need for word-prediction software was to help people with physical disabilities: it increases their typing speed and reduces the number of keystrokes needed to complete a word or a sentence. Thus, on this front, this project has developed a word-prediction program using the binary-tree data structure, which increases the user's efficiency by at least 10%.
References
1. Sturm, J.M., Rankin-Erickson, J.L.: Effects of hand-drawn and computer-generated concept mapping on the expository writing of middle school students with learning disabilities. Learn. Disabilities Res. Practice 17, 124–139 (2002)
2. Todman, J., Dugard, P.: Single-Case and Small-N Experimental Designs: A Practical Guide to Randomization Tests. Lawrence Erlbaum Associates, Mahwah, NJ (2001)
3. Tumlin, J., Heller, K.: Using word prediction software to increase typing fluency with students with physical disabilities. J. Special Educ. Technol. 19(3) (2004). https://jset.unlv.edu/19.3/tumlin/first.html
4. Weller, H.G.: Evaluating the effect of computer-based methods to support science teaching. J.
Res. Comput. Educ. 28, 461–485 (1996)
5. Zhang, Y.: Technology and the writing skills of students with learning disabilities. J. Res.
Comput. Educ. 32, 467–478 (2000)
6. Basu, S., Kannayaram, G., Ramasubbareddy, S., Venkatasubbaiah, C.: Improved genetic algo-
rithm for monitoring of virtual machines in cloud environment. In: Smart Intelligent Computing
and Applications, pp. 319–326. Springer, Singapore (2019)
7. Somula, R., Sasikala, R.: Round robin with load degree: an algorithm for optimal cloudlet
discovery in mobile cloud computing. Scal. Comput. Practice Exper. 19(1), 39–52 (2018)
8. Somula, R., Anilkumar, C., Venkatesh, B., Karrothu, A., Kumar, C. P., Sasikala, R.: Cloudlet
services for healthcare applications in mobile cloud computing. In: Proceedings of the 2nd
International Conference on Data Engineering and Communication Technology, pp. 535–543.
Springer, Singapore (2019)
9. Somula, R.S., Sasikala, R.: A survey on mobile cloud computing: mobile computing + cloud computing (MCC = MC + CC). Scal. Comput. Pract. Experi. 19(4), 309–337 (2018)
10. Somula, R., Sasikala, R.: A load and distance aware cloudlet selection strategy in multi-cloudlet
environment. Int. J. Grid High Perform. Comput. (IJGHPC) 11(2), 85–102 (2019)
11. Somula, R., Sasikala, R.: A honey bee inspired cloudlet selection for resource allocation. In:
Smart Intelligent Computing and Applications, pp. 335–343. Springer, Singapore (2019)
12. Nalluri, S., Ramasubbareddy, S., Kannayaram, G.: Weather prediction using clustering
strategies in machine learning. J. Comput. Theor. Nanosci. 16(5–6), 1977–1981 (2019)
13. Sahoo, K.S., Tiwary, M., Mishra, P., Reddy, S.R.S., Balusamy, B., Gandomi, A.H.: Improving
end-users utility in software-defined wide area network systems. In: IEEE Transactions on
Network and Service Management (2019)
14. Sahoo, K.S., Tiwary, M., Sahoo, B., Mishra, B.K., RamaSubbaReddy, S., Luhach, A.K.: RTSM:
response time optimisation during switch migration in software-defined wide area network. In:
IET Wireless Sensor Systems (2019)
15. Somula, R., Kumar, K.D., Aravindharamanan, S., Govinda, K.: Twitter sentiment analysis
based on US presidential election 2016. In: Smart Intelligent Computing and Applications,
pp. 363–373. Springer, Singapore (2020)
16. Sai, K.B.K., Subbareddy, S.R., Luhach, A.K.: IOT based air quality monitoring system using
MQ135 and MQ7 with machine learning analysis. Scal. Comput. Practice Experi. 20(4), 599–
606 (2019)
17. Somula, R., Narayana, Y., Nalluri, S., Chunduru, A., Sree, K.V.: POUPR: properly utilizing
user-provided recourses for energy saving in mobile cloud computing. In: Proceedings of the
2nd International Conference on Data Engineering and Communication Technology, pp. 585–
595. Springer, Singapore (2019)
18. Vaishali, R., Sasikala, R., Ramasubbareddy, S., Remya, S., Nalluri, S.: Genetic algorithm based
feature selection and MOE Fuzzy classification algorithm on Pima Indians Diabetes dataset. In:
2017 International Conference on Computing Networking and Informatics (ICCNI), pp. 1–5.
IEEE (2017, Oct)
19. Somula, R., Sasikala, R.: A research review on energy consumption of different frameworks in
mobile cloud computing. In: Innovations in Computer Science and Engineering, pp. 129–142.
Springer, Singapore (2019); Kumar, I.P., Sambangi, S., Somukoa, R., Nalluri, S., Govinda, K.:
Server security in cloud computing using block-chaining technique. In: Data Engineering and
Communication Technology, pp. 913–920. Springer, Singapore (2020)
20. Kumar, I.P., Gopal, V.H., Ramasubbareddy, S., Nalluri, S., Govinda, K.: Dominant color palette
extraction by K-means clustering algorithm and reconstruction of image. In: Data Engineering
and Communication Technology, pp. 921–929. Springer, Singapore (2020)
21. Nalluri, S., Saraswathi, R.V., Ramasubbareddy, S., Govinda, K., Swetha, E.: Chronic heart
disease prediction using data mining techniques. In: Data Engineering and Communication
Technology, pp. 903–912. Springer, Singapore (2020)
22. Krishna, A.V., Ramasubbareddy, S., Govinda, K.: Task scheduling based on hybrid algo-
rithm for cloud computing. In: International Conference on Intelligent Computing and Smart
Communication 2019, pp. 415–421. Springer, Singapore (2020)
23. Srinivas, T.A.S., Ramasubbareddy, S., Govinda, K., Manivannan, S.S.: Web image authenti-
cation using embedding invisible watermarking. In: International Conference on Intelligent
Computing and Smart Communication 2019, pp. 207–218. Springer, Singapore (2020)
24. Krishna, A.V., Ramasubbareddy, S., Govinda, K.: A unified platform for crisis mapping using
web enabled crowdsourcing powered by knowledge management. In: International Conference
on Intelligent Computing and Smart Communication 2019, pp. 195–205. Springer, Singapore
(2020)
25. Saraswathi, R.V., Nalluri, S., Ramasubbareddy, S., Govinda, K., Swetha, E.: Brilliant corp yield
prediction utilizing internet of things. In: Data Engineering and Communication Technology,
pp. 893–902. Springer, Singapore (2020)
26. Baliarsingh, S.K., Vipsita, S., Gandomi, A.H., Panda, A., Bakshi, S., Ramasubbareddy, S.:
Analysis of high-dimensional genomic data using map reduce based probabilistic neural
network. Comput. Methods Progr. Biomed. 105625 (2020)
27. Lavanya, V., Ramasubbareddy, S., Govinda, K.: Fuzzy keyword matching using N-gram and
cryptographic approach over encrypted data in cloud. In: Embedded Systems and Artificial
Intelligence, pp. 551–558. Springer, Singapore (2020)
28. Revathi, A., Kalyani, D., Ramasubbareddy, S., Govinda, K.: Critical review on course recom-
mendation system with various similarities. In: Embedded Systems and Artificial Intelligence,
pp. 843–852. Springer, Singapore (2020)
29. Mahesh, B., Kumar, K.P., Ramasubbareddy, S., Swetha, E.: A review on data deduplication
techniques in cloud. In: Embedded Systems and Artificial Intelligence, pp. 825–833. Springer,
Singapore (2020)
30. Sathish, K., Ramasubbareddy, S., Govinda, K.: Detection and localization of multiple objects
using VGGNet and single shot detection. In: Emerging Research in Data Engineering Systems
and Computer Communications, pp. 427–439. Springer, Singapore (2020)
31. Pradeepthi, C., Geetha, V.V., Ramasubbareddy, S., Govinda, K.: Prediction of real estate price
using clustering techniques. In: Emerging Research in Data Engineering Systems and Computer
Communications, pp. 281–289. Springer, Singapore (2020)
32. Maddila, S., Ramasubbareddy, S., Govinda, K.: Crime and fraud detection using clustering
techniques. In: Innovations in Computer Science and Engineering, pp. 135–143. Springer,
Singapore (2020)
33. Rakshitha, K., Rao, A.S., Sagar, Y., Ramasubbareddy, S.: Demonstrating broadcast aggregate
keys for data sharing in cloud. In: Innovations in Computer Science and Engineering, pp. 185–
193. Springer, Singapore (2020)
34. Ramasubbareddy, S., Srinivas, T.A.S., Govinda, K., Manivannan, S.S.: Comparative study
of clustering techniques in market segmentation. In: Innovations in Computer Science and
Engineering, pp. 117–125. Springer, Singapore (2020)
35. Ramasubbareddy, S., Srinivas, T.A.S., Govinda, K., Manivannan, S.S.: Crime prediction
system. In: Innovations in Computer Science and Engineering, pp. 127–134. Springer,
Singapore (2020)
52 T. Aditya Sai Srinivas et al.
36. Sahoo, K.S., Tiwary, M., Sahoo, S., Nambiar, R., Sahoo, B., Dash, R.: A learning automata-
based DDoS attack defense mechanism in software defined networks. In: Proceedings of the
24th Annual International Conference on Mobile Computing and Networking, pp. 795–797
(2018, Oct)
37. Sahoo, K.S., Sahoo, S., Sarkar, A., Sahoo, B., Dash, R.: On the placement of controllers for
designing a wide area software defined networks. In: TENCON 2017–2017 IEEE Region 10
Conference, pp. 3123–3128. IEEE (2017, Nov)
Fruit Recognition Using Deep Learning
Abstract This paper discusses fruit classification using data collected from the Fruits_360 dataset. Using this data, a neural network is trained to identify fruits, applying deep learning and image-processing concepts to form the network. The proposed work uses convolutional neural networks to build the model and also uses ResNet to obtain image-classification results. To meet the resource requirements of the proposed work, the Google Cloud Vision API is used, which provides the GPU needed to analyse the data from the images; the paper also discusses in depth how image classification is done using deep learning concepts. The resulting deep learning model classifies a given image into one of nine categories: Apple, Avocado, Banana, Cherry, Cocos, Kiwi, Mango, Orange, Lemon. The model can also be implemented as a mobile version.
1 Introduction
Convolutional neural networks are designed based on the neural networks of the human brain. The human brain needs many real-life incidents so that it can recognize those incidents again and provide the appropriate action [1]. Similarly, here, data is given to the convolutional network and used to train it for the subsequent validation of images [2]. So, this
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 53
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_7
54 P. Balakesava Reddy et al.
convolutional network concept is similar to human neural networks and is designed in the same manner. In the testing process, the given image is read and compared with many other images to find the nearest image based on probabilities. The network processes the whole image set up to the last image even if it has already found an accurate match; ever larger inputs therefore form a more complicated network, and achieving highly accurate output becomes more difficult and sometimes not even possible [3].
In convolution, images fall into two categories: black-and-white images, which form a 2D array, and colored images, which form a 3D array. Since they differ, the values assigned to the pixels differ when given to the CNN [4]. For a black-and-white image, each pixel is assigned a value between 0 and 255 representing its intensity. A colored image is a combination of red, green, and blue, with a separate layer per color; each color channel likewise has a range of 0–255, so a pixel might have (255, 105, 180) as its value, which defines a pink pixel in the image [5]. From this, we know the color of the given image, which is taken as the input of the network.
First, the boundaries of the image are set by detecting its edges; the data inside the boundaries is sent as 1's, and the remaining part of the image as 0's. The location of the fruit boundaries can then be marked.
Next, a 3 × 3 matrix is used as a feature detector, i.e., a kernel that scans the image and extracts the data. The kernel is placed so that its first-row, first-column cell fits inside the boundary of the selected image; it is then moved toward the other end of the row, and afterwards down to the next row, which results in the feature map. This feature map reduces the number of pixels, shrinking the input image so that detecting the image takes less time in further stages. The larger the stride, the smaller the feature map; and the smaller the feature map, the lower the accuracy. There is also a loss of information in the data, which might reduce accuracy; however, the kernel's function is to keep only the main content of the image and remove the extra parts, which can also improve accuracy. Reducing the input image means concentrating only on the main features of the input; this helps the detection process focus on the main data instead of the useless data in the image, which would otherwise decrease the accuracy of the system [2].
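The sliding of the 3 × 3 feature detector and the effect of the stride can be sketched in plain Python (a minimal illustration, not the paper's implementation; the input mask and the all-ones kernel are made up):

```python
def convolve2d(image, kernel, stride=1):
    """Slide a k x k kernel over a 2D image and return the feature map."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    feature_map = []
    for i in range(0, h - k + 1, stride):
        row = []
        for j in range(0, w - k + 1, stride):
            # Element-wise multiply the kernel with the image patch and sum.
            s = sum(kernel[a][b] * image[i + a][j + b]
                    for a in range(k) for b in range(k))
            row.append(s)
        feature_map.append(row)
    return feature_map

# A 5 x 5 binary mask (1's inside the fruit boundary, 0's outside).
image = [[0, 0, 0, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 0, 0, 0]]
kernel = [[1, 1, 1],
          [1, 1, 1],
          [1, 1, 1]]  # made-up 3 x 3 feature detector

fmap = convolve2d(image, kernel)        # 3 x 3 feature map with stride 1
fmap_s2 = convolve2d(image, kernel, 2)  # 2 x 2 map: a larger stride shrinks the map
```

As the text notes, increasing the stride from 1 to 2 shrinks the feature map (here from 3 × 3 to 2 × 2) at the cost of discarded detail.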
Non-linearity is introduced through a rectified linear unit (ReLU) applied while the image is under the convolution operation. It removes all the black (negative) elements, keeping only positive values from the data [3].
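ReLU itself is a single clamp, illustrated here on a small feature map with made-up values:

```python
def relu(feature_map):
    """Replace every negative value with 0; keep positive values unchanged."""
    return [[max(0, v) for v in row] for row in feature_map]

activated = relu([[-3, 5], [0, -1]])  # negatives become 0
```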
In general, a regular feature map is used, but there is another kind, the pooled feature map, whose process differs. Take a 2 × 2 box and place it at the left corner, just as the 3 × 3 kernel was placed, then move it along toward the opposite end of the row. With a stride of 2 pixels, a 6 × 6 input results in a 3 × 3 pooled feature map; strides of 2 are very common. The whole point of pooling is to keep the values that carry the key information while tolerating small distortions. Using these techniques, the model can be created [6–10].
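The 2 × 2 max pooling with stride 2 described above can be sketched as follows (illustrative values only):

```python
def max_pool(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size window, moving by `stride`."""
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, h - size + 1, stride):
        pooled.append([max(feature_map[i + a][j + b]
                           for a in range(size) for b in range(size))
                       for j in range(0, w - size + 1, stride)])
    return pooled

fmap = [[1, 3, 2, 0, 5, 1],
        [4, 2, 1, 1, 0, 2],
        [0, 1, 6, 2, 3, 3],
        [2, 2, 1, 0, 4, 0],
        [1, 0, 2, 7, 1, 1],
        [3, 1, 0, 2, 2, 5]]
pooled = max_pool(fmap)  # 6 x 6 map -> 3 x 3 pooled map
```

Each output cell keeps only the strongest activation in its window, which is what makes the pooled map tolerant of small distortions in the input.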
2 Literature Survey
Previous work has applied neural networks and deep learning concepts to image classification. The following papers discuss counting fruits of various kinds from a given bunch. One aims at locating and counting red and green pepper fruits within large bunches; around 28,000 images of various plants were used for training and validation. The process involves two steps: processing a single image, and integrating all the views to obtain accuracy. Neural network concepts, including convolutional networks, are used; the network is trained on RGB images, which are further normalized into two dimensions. Another paper discusses apple production prediction, from which we learn how to obtain the edges and cross-sectional area of fruits, including the cross-sectional area of the ripened part. The damaged part of a fruit is detected based on its texture and color and compared with other testing data using the k-nearest neighbor algorithm, which predicts the accuracy. Similar techniques are also used in face detection and vehicle detection based on linear projections and image analysis [1–4].
The present model uses the concepts of a convolutional neural network (CNN). It has five layers: a convolution layer, followed by a rectified linear unit (ReLU) layer, a pooling layer, fully connected layers, and a loss layer. RGB images with a pixel size of 100 × 100 are used [5].
Convolution is an operation over two functions that produces a third function; the result derives from the two input functions and carries the characteristics of both. The convolution layer does the same: the input image data is convolved to form a resulting function that predicts the output [11–15].
The ReLU layer increases the non-linear properties of the input data. Pooling layers reduce the dimensions and the number of computations; here a 2 × 2 pooling filter with a stride of 2 is used, which reduces the input to one fourth of its size. The layers from the regular neural network are called fully connected layers; they connect the neurons of one layer to those of the next [15–20].
3 Proposed Method
All the images in the dataset were pre-processed using the TensorFlow image data generator. Since there are some null entries for age, they are filled with the mean age in the dataset. To avoid overfitting and exploding or vanishing gradients, all the images in the dataset were normalized by subtracting their mean and dividing by their standard deviation. Since deep learning models can only be fed numeric data, raw images cannot be given to the model directly; they are converted into numeric tensors, i.e., multi-dimensional arrays that are mostly three-dimensional. Grayscale images are generally two-dimensional, with pixel values between 0 and 255, whereas colored images are three-dimensional; the third dimension is the color dimension, which generally has three channels, red, green, and blue, called the RGB channels. As with grayscale images, the pixel values in these channels vary between 0 and 255. Accordingly, all the images in the dataset are converted to numeric tensors with multiple channels. The training and testing data were split in an 80:20 ratio. In the present paper, the size of the training set is increased by data augmentation: parameters such as rotation range, vertical flip, horizontal flip, height shift range, width shift range, and zoom range are applied to the images. With these parameters, each image is randomly cropped, rotated, zoomed, and flipped at different angles. In this way, the model does not see repeated copies of the base image but learns better by observing it in different ways, so data augmentation does not lead to overfitting.
The values of the parameters used for this paper are:
Rotation range = 0.5,
Zoom range = 0.5,
Width shift range = 0.5,
Height shift range = 0.5,
Horizontal flip = False,
Vertical flip = False.
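These settings can be collected into one configuration mapping. The keys below follow the Keras/TensorFlow `ImageDataGenerator` argument names the paper's parameter list corresponds to; this is a sketch, not the authors' code:

```python
# Augmentation settings as reported in the paper, keyed by the
# (assumed) Keras ImageDataGenerator keyword-argument names.
augmentation = {
    "rotation_range": 0.5,
    "zoom_range": 0.5,
    "width_shift_range": 0.5,
    "height_shift_range": 0.5,
    "horizontal_flip": False,
    "vertical_flip": False,
}
```

In Keras, this mapping would typically be expanded into the generator, e.g. `ImageDataGenerator(**augmentation)`.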
Normalizing the data is very important in deep learning because the training data contains features with very different value distributions, so the update caused by the learning rate in each dimension differs from that in other dimensions: the correction in one weight dimension may be greatly increased while another is decreased. Training also takes a long time without normalization. In this paper, the training and test data are normalized by subtracting their theoretical mean values and then dividing by their standard deviation; the resulting values lie roughly between zero and one. The speed of training also increases, since gradient updates become easier, and the accuracy gradually improves [21–25].
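The normalization step (subtract the mean, divide by the standard deviation) can be sketched on a handful of made-up pixel intensities:

```python
import math

def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

pixels = [0, 64, 128, 192, 255]   # raw 8-bit intensities
normalized = standardize(pixels)  # zero mean, unit variance
```

After this transform the features share a common scale, so a single learning rate behaves comparably across dimensions.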
The proposed method uses convolutional neural networks, which play a major role in solving computer-vision problems. Convolutional neural networks are similar to general neural networks, consisting of input, hidden layers, and output. The activation function used in this work is the rectified linear unit (ReLU). The layers following the convolution layers are max pooling layers, which reduce the dimensions, preserve spatial invariance, and output the high-level features of the input. To avoid overfitting and gradient explosion or vanishing, dropout layers are added within the model. Since the model is mainly built from convolutional layers and an output over seven classes is required, the layers are flattened at the end of the model and some dense layers are added along with further dropout layers; the final layer is a softmax layer, since the data has multiple classes. With seven classes, the final layer is built with seven neurons. In the output, seven probability values are obtained; each neuron outputs the probability of the input image belonging to its class, and the values of the seven neurons sum to one (Fig. 1).
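The softmax output layer described above maps seven raw class scores to probabilities that sum to one (a sketch with arbitrary scores):

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities summing to 1."""
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1, 3.5, 0.0, 1.2, 0.3]  # one score per class (7 classes)
probs = softmax(scores)
predicted_class = probs.index(max(probs))     # class with the highest probability
```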
The proposed method uses the TensorFlow framework for designing, training, and evaluating the deep learning model. The model is built mainly by transfer learning, i.e., by using a pre-trained model previously trained on a large dataset. The pre-trained model used here is ResNet, trained on the ImageNet dataset of 14 million images in more than 1000 classes (categories); the weights are saved after that training. At the start of training, these ImageNet weights are assigned to the pre-trained model, and on top of it some customized convolutional and max pooling layers are added. At first, the model is trained with the weights of the pre-trained model frozen, meaning the pre-trained model keeps its initial weights and they are not updated; during this stage, only the parameters of the added layers are updated. The performance of the model improves after unfreezing the weights, because the whole model is then trained on the particular dataset provided. The parameters of the model are updated by back-propagation: after calculating the loss between the actual and predicted output, the parameters are updated with respect to the loss value [26–30].
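The freeze-then-unfreeze schedule can be illustrated with a toy gradient step; the layers, weights, and gradients below are invented for illustration and are not taken from the paper's model:

```python
class Layer:
    def __init__(self, weight, trainable=True):
        self.weight = weight
        self.trainable = trainable  # frozen layers skip their update

def sgd_step(layers, gradients, lr=0.1):
    """Apply one gradient-descent update, skipping frozen layers."""
    for layer, grad in zip(layers, gradients):
        if layer.trainable:
            layer.weight -= lr * grad

pretrained = Layer(weight=0.8, trainable=False)  # frozen ResNet-like layer
added = Layer(weight=0.1, trainable=True)        # newly added custom layer

sgd_step([pretrained, added], gradients=[0.5, 0.5])
# First stage: pretrained.weight stays 0.8; only the added layer moves.

pretrained.trainable = True                      # unfreeze for fine-tuning
sgd_step([pretrained, added], gradients=[0.5, 0.5])
# Second stage: both weights are now updated.
```

The first call mirrors the initial training stage (only added layers learn); unfreezing then lets the whole model adapt to the fruit dataset.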
The loss between the actual and predicted output is calculated using the categorical cross-entropy function. The optimizer used to update the parameters is the RMSprop optimizer, with an initial learning rate of 0.0001. As the pre-trained model is joined to the added layers, artificial neural networks and convolutional neural networks are combined, and the model becomes more complex.
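For a single sample, categorical cross-entropy reduces to the negative log of the probability assigned to the true class (a sketch with made-up probabilities):

```python
import math

def categorical_cross_entropy(y_true, y_pred):
    """Loss for one sample: -sum(t * log(p)) over the classes."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

y_true = [0, 0, 1, 0, 0, 0, 0]                       # one-hot: true class is index 2
y_pred = [0.05, 0.05, 0.70, 0.05, 0.05, 0.05, 0.05]  # model's softmax output

loss = categorical_cross_entropy(y_true, y_pred)  # -ln(0.70), about 0.357
```

The optimizer (RMSprop here, with learning rate 0.0001) then moves the parameters to reduce this value.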
58 P. Balakesava Reddy et al.
4 Result
Epoch 31/40
95/95 [==============================] - 3s 31ms/step - loss: 0.0558 - acc: 0.9784 - val_loss: 0.0126 - val_acc: 0.9959
Epoch 32/40
95/95 [==============================] - 3s 31ms/step - loss: 0.0602 - acc: 0.9764 - val_loss: 0.0034 - val_acc: 0.9995
Epoch 33/40
95/95 [==============================] - 3s 30ms/step - loss: 0.0603 - acc: 0.9763 - val_loss: 0.0618 - val_acc: 0.9808
Epoch 34/40
95/95 [==============================] - 3s 31ms/step - loss: 0.0544 - acc: 0.9785 - val_loss: 0.0254 - val_acc: 0.9906
Epoch 35/40
95/95 [==============================] - 3s 31ms/step - loss: 0.0703 - acc: 0.9720 - val_loss: 0.0472 - val_acc: 0.9837
Epoch 36/40
95/95 [==============================] - 3s 31ms/step - loss: 0.0551 - acc: 0.9788 - val_loss: 0.0671 - val_acc: 0.9787
Epoch 37/40
95/95 [==============================] - 3s 30ms/step - loss: 0.0510 - acc: 0.9800 - val_loss: 0.0124 - val_acc: 0.9925
Epoch 38/40
95/95 [==============================] - 3s 32ms/step - loss: 0.0558 - acc: 0.9781 - val_loss: 0.0173 - val_acc: 0.9934
Epoch 39/40
95/95 [==============================] - 3s 30ms/step - loss: 0.0481 - acc: 0.9810 - val_loss: 0.0038 - val_acc: 0.9979
Epoch 40/40
95/95 [==============================] - 3s 31ms/step - loss: 0.0407 - acc: 0.9839 - val_loss: 0.0417 - val_acc: 0.9889 (Figs. 2 and 3).
5 Conclusion
An effective algorithm for detecting and tracking objects has been explained, together with its drawbacks and its efficiency, in order to overcome the problem of recognition and of tracking related to movement and appearance. A major application of fruit detection is in vision-based AI, where identification and tracking play a major role. For any fruit-tracking algorithm, the initial step is to locate the fruit in the respective frame; though there are numerous algorithms, choosing the accurate location of the fruit has been a difficult task. CNNs are widely used by researchers for fruit detection, and tracking follows detection. There are many algorithms to track objects. For future work, the same algorithm can be applied to other objects, with more advanced filters for noise reduction.
References
1. O’Shea, K., Nash, R.: An Introduction to Convolutional Neural Networks. ArXiv e-prints
(2015)
2. Albawi, S., Abed Mohammed, T., Alzawi, S.: Understanding of a Convolutional Neural
Network (2017). https://doi.org/10.1109/ICEngTechnol.2017.8308186
3. Khan, A., Sohail, A., Zahoora, U., Saeed, A.: A Survey of the Recent Architectures of Deep
Convolutional Neural Networks (2019)
4. Zhang, F., Hu, M.: Memristor-Based Deep Convolution Neural Network: A Case Study (2018)
5. Bambharolia, P.: Overview of convolutional neural networks (2017)
6. Basu, S., Kannayaram, G., Ramasubbareddy, S., Venkatasubbaiah, C.: Improved genetic algo-
rithm for monitoring of virtual machines in cloud environment. In: Smart Intelligent Computing
and Applications, pp. 319–326. Springer, Singapore (2019)
7. Somula, R., Sasikala, R.: Round robin with load degree: an algorithm for optimal cloudlet
discovery in mobile cloud computing. Scal. Comput. Practice Exper. 19(1), 39–52 (2018)
8. Somula, R., Anilkumar, C., Venkatesh, B., Karrothu, A., Kumar, C.P., Sasikala, R.: Cloudlet
services for healthcare applications in mobile cloud computing. In: Proceedings of the 2nd
International Conference on Data Engineering and Communication Technology, pp. 535–543.
Springer, Singapore (2019)
9. Somula, R.S., Sasikala, R.: A survey on mobile cloud computing: mobile computing + cloud
computing (MCC= MC + CC). Scal. Comput. Practice Experi. 19(4), 309–337 (2018)
10. Somula, R., Sasikala, R.: A load and distance aware cloudlet selection strategy in multi-cloudlet
environment. Int. J. Grid High Perform. Comput. (IJGHPC) 11(2), 85–102 (2019)
11. Somula, R., Sasikala, R.: A honey bee inspired cloudlet selection for resource allocation. In:
Smart Intelligent Computing and Applications, pp. 335–343. Springer, Singapore (2019)
12. Nalluri, S., Ramasubbareddy, S., Kannayaram, G.: Weather prediction using clustering
strategies in machine learning. J. Comput. Theor. Nanosci. 16(5–6), 1977–1981 (2019)
13. Sahoo, K.S., Tiwary, M., Mishra, P., Reddy, S.R.S., Balusamy, B., Gandomi, A.H.: Improving
end-users utility in software-defined wide area network systems. In: IEEE Transactions on
Network and Service Management (2019)
14. Sahoo, K.S., Tiwary, M., Sahoo, B., Mishra, B.K., RamaSubbaReddy, S., Luhach, A.K.: RTSM:
response time optimisation during switch migration in software-defined wide area network. In:
IET Wireless Sensor Systems (2019)
15. Somula, R., Kumar, K.D., Aravindharamanan, S., Govinda, K.: Twitter sentiment analysis
based on US presidential election 2016. In: Smart Intelligent Computing and Applications,
pp. 363–373. Springer, Singapore (2020)
16. Sai, K.B.K., Subbareddy, S.R., Luhach, A.K.: IOT based air quality monitoring system using
MQ135 and MQ7 with machine learning analysis. Scal. Comput. Practice Experi. 20(4), 599–
606 (2019)
17. Somula, R., Narayana, Y., Nalluri, S., Chunduru, A., Sree, K.V.: POUPR: properly utilizing
user-provided recourses for energy saving in mobile cloud computing. In: Proceedings of the
2nd International Conference on Data Engineering and Communication Technology, pp. 585–
595. Springer, Singapore (2019)
18. Vaishali, R., Sasikala, R., Ramasubbareddy, S., Remya, S., Nalluri, S.: Genetic algorithm based
feature selection and MOE Fuzzy classification algorithm on Pima Indians diabetes dataset. In:
2017 International Conference on Computing Networking and Informatics (ICCNI), pp. 1–5.
IEEE (2017, Oct)
19. Somula, R., Sasikala, R.: A research review on energy consumption of different frameworks in
mobile cloud computing. In: Innovations in Computer Science and Engineering, pp. 129–142.
Springer, Singapore (2019); Kumar, I.P., Sambangi, S., Somukoa, R., Nalluri, S., Govinda, K.:
Server security in cloud computing using block-chaining technique. In: Data Engineering and
Communication Technology, pp. 913–920. Springer, Singapore (2020)
20. Kumar, I.P., Gopal, V.H., Ramasubbareddy, S., Nalluri, S., Govinda, K.: Dominant color palette
extraction by K-means clustering algorithm and reconstruction of image. In: Data Engineering
and Communication Technology, pp. 921–929. Springer, Singapore (2020)
21. Nalluri, S., Saraswathi, R.V., Ramasubbareddy, S., Govinda, K., Swetha, E.: Chronic heart
disease prediction using data mining techniques. In: Data Engineering and Communication
Technology, pp. 903–912. Springer, Singapore (2020)
22. Krishna, A.V., Ramasubbareddy, S., Govinda, K.: Task scheduling based on hybrid algo-
rithm for cloud computing. In: International Conference on Intelligent Computing and Smart
Communication 2019, pp. 415–421. Springer, Singapore (2020)
23. Srinivas, T.A.S., Ramasubbareddy, S., Govinda, K., Manivannan, S.S.: Web image authenti-
cation using embedding invisible watermarking. In: International Conference on Intelligent
Computing and Smart Communication 2019, pp. 207–218. Springer, Singapore (2020)
24. Krishna, A.V., Ramasubbareddy, S., Govinda, K.: A unified platform for crisis mapping using
web enabled crowdsourcing powered by knowledge management. In: International Conference
on Intelligent Computing and Smart Communication 2019, pp. 195–205. Springer, Singapore
(2020)
25. Saraswathi, R.V., Nalluri, S., Ramasubbareddy, S., Govinda, K., Swetha, E.: Brilliant corp yield
prediction utilizing internet of things. In: Data Engineering and Communication Technology,
pp. 893–902. Springer, Singapore (2020)
26. Baliarsingh, S.K., Vipsita, S., Gandomi, A.H., Panda, A., Bakshi, S., Ramasubbareddy,
S.: Analysis of high-dimensional genomic data using mapreduce based probabilistic neural
network. Comput. Methods Progr. Biomed. 105625 (2020)
27. Lavanya, V., Ramasubbareddy, S., Govinda, K.: Fuzzy keyword matching using N-gram and
cryptographic approach over encrypted data in cloud. In: Embedded Systems and Artificial
Intelligence, pp. 551–558. Springer, Singapore (2020)
28. Revathi, A., Kalyani, D., Ramasubbareddy, S., Govinda, K.: Critical review on course recom-
mendation system with various similarities. In: Embedded Systems and Artificial Intelligence,
pp. 843–852. Springer, Singapore (2020)
29. Mahesh, B., Kumar, K.P., Ramasubbareddy, S., Swetha, E.: A review on data deduplication
techniques in cloud. In: Embedded Systems and Artificial Intelligence, pp. 825–833. Springer,
Singapore (2020)
30. Sathish, K., Ramasubbareddy, S., Govinda, K.: Detection and localization of multiple objects
using VGGNet and single shot detection. In: Emerging Research in Data Engineering Systems
and Computer Communications, pp. 427–439. Springer, Singapore (2020)
31. Pradeepthi, C., Geetha, V.V., Ramasubbareddy, S., Govinda, K.: Prediction of real estate price
using clustering techniques. In: Emerging Research in Data Engineering Systems and Computer
Communications, pp. 281–289. Springer, Singapore (2020)
32. Maddila, S., Ramasubbareddy, S., Govinda, K.: Crime and fraud detection using clustering
techniques. In: Innovations in Computer Science and Engineering, pp. 135–143. Springer,
Singapore (2020)
33. Rakshitha, K., Rao, A.S., Sagar, Y., Ramasubbareddy, S.: Demonstrating broadcast aggregate
keys for data sharing in cloud. In: Innovations in Computer Science and Engineering, pp. 185–
193. Springer, Singapore (2020)
34. Ramasubbareddy, S., Srinivas, T.A.S., Govinda, K., Manivannan, S.S.: Comparative study
of clustering techniques in market segmentation. In: Innovations in Computer Science and
Engineering, pp. 117–125. Springer, Singapore (2020)
35. Ramasubbareddy, S., Srinivas, T.A.S., Govinda, K., Manivannan, S.S.: Crime prediction
system. In: Innovations in Computer Science and Engineering, pp. 127–134. Springer,
Singapore (2020)
36. Sahoo, K.S., Tiwary, M., Sahoo, S., Nambiar, R., Sahoo, B., Dash, R.: A learning automata-
based DDoS attack defense mechanism in software defined networks. In: Proceedings of the
24th Annual International Conference on Mobile Computing and Networking, pp. 795–797
(2018, Oct)
37. Sahoo, K.S., Sahoo, S., Sarkar, A., Sahoo, B., Dash, R.: On the placement of controllers for
designing a wide area software defined networks. In: TENCON 2017–2017 IEEE Region 10
Conference, pp. 3123–3128. IEEE (2017, Nov)
Cross-Domain Variational Capsules for
Information Extraction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 63
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_8
64 A. Nagaraj et al.
1 Introduction
The machine reasoning domain [1], a part of the machine learning umbrella, deals
with extracting information from latent data, decoding it, and reasoning out the decisions made by machine learning systems. Machine reasoning is a two-step process: the generation of information and the generation of reasoning from this information.
We extract information by training the model on a few domains and testing the
model on a new domain. In doing so, the model discovers information from the new
domain. Though this might not seem like machine reasoning in the truest sense, it does
generate information from latent data. With this paper, we aim to solve a small prob-
lem in this vast domain: Simulate the way a human brain classifies cross-domain
information and generates insight, by identifying prominent characteristics in
data and use this identification mechanism to auto-generate insight from data
in unseen domains.
A part of machine reasoning is transfer learning [2]. It stores the knowledge gained
from tackling one problem and applies it to another problem which is related to the
previous problem solved. Our model incorporates transfer learning to transfer latent
information across domains, known as Domain Adaptation [3].
Domain adaptation is a field that involves machine learning as well as transfer learning. It can be used when the goal is to learn from one source distribution and apply the learning to a different target distribution related to the source. Scenarios in which multiple source distributions are present are called multi-source domain adaptation. Research in this field addresses a major issue: the need to determine a model's capacity to accurately accept data from a given target domain and label that data accordingly. The challenge arises because the model is trained on a different source domain. Unsupervised learning algorithms [4] implemented without domain adaptation assume that the examples are independent and identically distributed.
2 Dataset
2.1 Introduction
The dataset introduced in this paper, the Multi-domain Image Characteristic Dataset
[5], consists of thousands of images sourced from the internet. Each image falls
under one of three domains—animals, birds or furniture. There are five types under
each domain. There are 200 images of each type, summing up the total dataset to
3000 images. The master file consists of two columns: the image name and the visible characteristics in that image. Every image was manually analysed and the characteristics for each image were generated, ensuring accuracy.
Images falling under the same domain have a similar set of characteristics. For
example, pictures under the Birds domain will have a common set of characteristics
such as the color of the bird, the presence of a beak, wing, eye, legs, etc. Care has been
taken to ensure that each image is as unique as possible by including pictures that
have different combinations of visible characteristics present. This includes pictures
having variations in the capture angle, etc.
At the time of our research, there was a dearth of publicly available datasets that
contain visible characteristics in images belonging to various domains. The proposed
dataset [5] addresses this, as it has the following features:
• describes the visible characteristics present in every picture.
• contains hundreds of pictures belonging to multiple domains, with multiple types
within each domain. This is crucial to train our model accurately.
• contains unique pictures within each type of each domain. This is accomplished
by collecting pictures with different combinations of visible characteristics,
different angles at which the object was captured, etc.
We recommend a test-train split of 600 samples (20%) and 2400 samples (80%). A
.txt file with the images to be included in the test and train splits is included, with
no overlap between the sets. Following the train-test split as mentioned would help
ensure consistency of experiments reported on the Multi-domain Image Character-
istics Dataset.
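The recommended split above can be applied programmatically. The sketch below loads a split file and the master file; the file names (`train.txt`, `master.csv`) and the `;`-delimited characteristics column are assumptions, since the dataset's exact file layout is not spelled out here.

```python
# Minimal sketch of loading the recommended 80/20 split of the
# Multi-domain Image Characteristic Dataset. File names and the
# characteristics delimiter are assumptions, not the dataset's spec.
import csv

def load_split(split_file, master_file="master.csv"):
    """Return {image_name: [characteristics]} for images listed in split_file."""
    with open(split_file) as f:
        wanted = {line.strip() for line in f if line.strip()}
    rows = {}
    with open(master_file) as f:
        for name, chars in csv.reader(f):
            if name in wanted:
                rows[name] = chars.split(";")  # delimiter is an assumption
    return rows
```

Keeping the provided split files, rather than re-splitting randomly, is what makes results comparable across experiments.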
3 Approach
Capsule networks [6] are a natural fit for the model presented in this paper, as they
provide a rich representation of image data and are robust to small variations in the
decoupled features of the image.
Let w[lower, higher] be a matrix where (lower, higher) are the dimensions of lower-
level and higher-level capsules respectively. The depth of the vector (dimensions)
is achieved by stacking m feature maps together. The vector output of the 32 lower
capsules is sent to all the higher-level capsules.
Essentially, from the squash function it can be inferred that each lower-level capsule
sends information only to the higher-level capsule whose centroid is closest to it,
reinforcing that connection. This enforces a level of agreement or disagreement
between the capsules in different layers. The squash function is:
v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \frac{s_j}{\|s_j\|}    (1)
A prediction vector û_{j|i} is the prediction from capsule i for the output of capsule
j. If the activity vector v_j is in close agreement with the prediction vector û_{j|i},
we strengthen the connection b_ij. This is the routing algorithm introduced in capsule
networks. The "agreement" coefficient is the scalar product û_{j|i} · v_j, which is
added to b_ij.
The Routing algorithm works on inner epochs/iterations which specify the number
of times it needs to be run. This is a hyper-parameter to the capsule network model.
An epoch starts with bij = 0 for all capsules i in the lower level and corresponding
connection capsules j in the higher level.
A normalization function is added to bij . We define
v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \, \hat{s}_j    (5)
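The routing loop described above (initialize b_ij = 0, iterate a fixed hyper-parameter number of times, update b_ij by the agreement between prediction and output) can be sketched as follows. The softmax normalization of b and the tensor shapes follow the dynamic routing of Sabour et al. [6] and are assumptions insofar as this paper's variant differs.

```python
# Sketch of routing-by-agreement (Sabour et al. [6]).
# u_hat has shape (num_lower, num_higher, dim): one prediction vector
# from each lower capsule i to each higher capsule j.
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def squash(s, axis=-1, eps=1e-8):
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def route(u_hat, iters=3):
    n_low, n_high, dim = u_hat.shape
    b = np.zeros((n_low, n_high))           # agreement logits, start at 0
    for _ in range(iters):                  # "inner epochs" hyper-parameter
        c = softmax(b, axis=1)              # coupling coefficients per lower capsule
        s = (c[..., None] * u_hat).sum(0)   # weighted sum into each higher capsule
        v = squash(s)                       # Eq. (5)
        b = b + (u_hat * v[None]).sum(-1)   # strengthen agreeing connections
    return v
```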
3.5 Losses
[8] Let Z be a latent variable, X the real distribution, Q(Z|X) the encoder network,
P(X|Z) the decoder network and E the expectation.
log P(X) − D_KL(Q(Z|X) ‖ P(Z|X)) = E[log P(X|Z)] − D_KL(Q(Z|X) ‖ P(Z))    (10)
Equation (10) is the variational autoencoder objective function. The left-hand side
can be interpreted as a lower bound on log P(X), which describes our data; the KL
divergence term is the error that lowers this bound. The maximum likelihood estimate
[9] (MLE) is approached by maximizing log P(X|Z) and minimizing the difference
between the true latent distribution P(Z) and the simple Gaussian distribution
Q(Z|X).
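With the usual diagonal-Gaussian encoder and standard-normal prior (the standard VAE choices [7], assumed here), both terms of Eq. (10) become computable in a few lines; the squared-error reconstruction term stands in for −log P(X|Z) up to a constant.

```python
# Sketch of the VAE objective of Eq. (10), assuming Q(Z|X) = N(mu, diag(exp(log_var)))
# and prior P(Z) = N(0, I); the KL term then has a closed form.
import numpy as np

def kl_to_standard_normal(mu, log_var):
    # D_KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)

def vae_loss(x, x_recon, mu, log_var):
    recon = np.sum((x - x_recon) ** 2, axis=-1)   # stands in for -E[log P(X|Z)]
    return float(np.mean(recon + kl_to_standard_normal(mu, log_var)))
```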
where k is the dimension of the Gaussian distribution, trace(X) is the trace (the sum
of the diagonal entries of X) and det(X) is the determinant of the matrix X.
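The closed-form KL divergence between two k-dimensional Gaussians, which uses exactly the trace and determinant defined above, can be sketched as:

```python
# D_KL( N(m0, S0) || N(m1, S1) )
#   = 0.5 * [ tr(S1^-1 S0) + (m1-m0)^T S1^-1 (m1-m0) - k + ln(det S1 / det S0) ]
import numpy as np

def gaussian_kl(m0, S0, m1, S1):
    k = m0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = m1 - m0
    return 0.5 * (np.trace(S1_inv @ S0)
                  + diff @ S1_inv @ diff
                  - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))
```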
4.1.1 Metrics
The model's objective dictates that it be tolerant of noisy characteristics but not of
missing ones. Because of this unequal weighting of false positives and false
negatives, accuracy is a poor evaluation metric; the model is therefore evaluated with
recall and precision instead. To achieve the objective, recall must be high, while
precision may be low.
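Treating each image's characteristics as a binary indicator vector (an assumed encoding) makes the asymmetry concrete: false negatives (missed characteristics) hurt recall, which must stay high, while false positives (noisy extras) only lower precision.

```python
# Precision and recall over binary characteristic indicators.
# A false negative (missed characteristic) lowers recall, which the
# model's objective cannot tolerate; a false positive only lowers precision.
def precision_recall(y_true, y_pred):
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```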
4.1.2 Evaluation
4.2 Results
5 Conclusion
References
1. Bottou, L.: From machine learning to machine reasoning. Mach. Learn. 94(2), 133–149 (2014)
2. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10),
1345–1359 (2009)
3. Sun, B., Feng, J., Saenko, K.: Return of frustratingly easy domain adaptation. In: Thirtieth
AAAI Conference on Artificial Intelligence (2016)
4. Barlow, H.B.: Unsupervised learning. Neural Comput. 1(3), 295–311 (1989)
5. Nagaraj, A.K.A., Venkatesh, A.: Multi-domain Image Characteristic Dataset. https://www.
kaggle.com/grassknoted/multidomain-image-characteristics-dataset (2020)
6. Sabour, S., Frosst, N., Hinton, G.E.: Dynamic routing between capsules. In: Advances in Neural
Information Processing Systems, pp. 3856–3866 (2017)
7. Doersch, C.: Tutorial on variational autoencoders. Stat 1050, 13 (2016)
8. Hershey, J.R., Olsen, P.A.: Approximating the Kullback–Leibler divergence between Gaussian
mixture models. In: 2007 IEEE International Conference on Acoustics, Speech and Signal
Processing—ICASSP’07, vol. 4, pp. IV–317. IEEE (2007)
72 A. Nagaraj et al.
9. Myung, I.J.: Tutorial on maximum likelihood estimation. J. Math. Psychol. 47(1), 90–100
(2003)
10. Nagaraj, A., Sood, M., Srinivasa, G.: Real-time automated answer scoring. In: 2018 IEEE 18th
International Conference on Advanced Learning Technologies (ICALT), pp. 231–232. IEEE
(2018)
Automotive Accident Severity Prediction
Using Machine Learning
Abstract Prediction of automotive accident severity plays a crucial role in the smart
transportation system. The main motive behind our research is to find the specific
features that affect vehicle accident severity. In this paper, several classification
models, specifically logistic regression, artificial neural network, decision tree,
k-nearest neighbors and random forest, have been implemented for predicting
accident severity. All the models have been verified, and the experimental results
show that these classification models attain considerable accuracy. The results of this
research can be used in the smart transportation system to predict whether a road
accident will be slight, severe or fatal, in accordance with the top three features
identified by the machine learning model.
1 Introduction
Road accidents are an increasing cause of concern in today’s world. These accidents
result in injuries, damage to properties and even death. These accidents also cause
heavy monetary losses. Many researchers have tried to examine the significant
features that can affect automotive accident severity [1, 2]. The main aim of this
paper is to identify those features and to compare classification models for predicting
severity.
The rest of the paper is arranged as follows. In Sect. 2, some of the related works on
the prediction of crash severity are described briefly. The proposed model is presented
in Sect. 3. It is then followed by implementation of the model and results analysis in
Sect. 4. Further, our paper sums up with a conclusion and any possible future work
that can be extended from our research in Sect. 5.
2 Literature Review
Predicting the severity of road accidents has been a major challenge globally.
Iranitalab et al. [4] showed that omitting crash costs leads to errors when selecting
the correct prediction algorithm. They developed a crash-cost-based approach to
compare accident severity prediction models and studied various clustering
algorithms. Alkheder et al. [5] proposed an artificial neural network algorithm to
predict the severity of injuries in road accidents. For better accuracy of the ANN
classifier, the dataset was split into three clusters using the K-means clustering (KC)
algorithm. The outcomes after clustering revealed a remarkable improvement in the
accuracy of the ANN classifier.
Zong et al. [6] compared two ML modeling algorithms, Bayesian network as well
as regression models and concluded that the Bayesian network is more efficient than
regression models for predicting accident severity. Hashmienejad et al. [7] contrived
a multi-objective Genetic Algorithm (GA) for optimizing and identifying protocols
in accordance with the metrics (confidence, comprehensibility and support). Kunt
et al. [8] predicted the accident severity by implementing twelve crash related fea-
tures in GA, pattern search and ANN algorithms. They concluded that the ANN
algorithm obtained the highest R-value, leading to the result that ANN provided the
best prediction. Security and privacy issues of the Internet of Things (IoT) are
surveyed in detail in [9, 10], which note that machine learning can be used to address
security issues. Although most of the previous works presented the effects of various
classification models, there has been no specific contribution that compares the
accuracy of five classification models taken together. Therefore, we have collectively
applied these models and compared the accuracy of all the above-mentioned
algorithms, so that we can find the most efficient algorithm for predicting accident
severity.
Feature scaling is applied to prevent features with higher or wider ranges from
illegitimately dominating those with low variance. K-fold cross-validation is used to
produce a less biased result; here, ten splits have been chosen. The random forest
model [11] can handle datasets with high dimensionality. The decision tree algorithm
is used since it is quite resistant to outliers [12]. The artificial neural network [13]
algorithm is used as it needs less statistical training. The KNN algorithm classifies a
new data point based on the similarity between the new and available data points
[14]. In our research, we have also used multinomial logistic regression, which can
deal with three or more classes [15]. The mean accuracy and standard deviation of
these five classification models are compared. The selected features are then
clustered using k-means clustering. The ML model with the highest accuracy is
trained on the clusters and predicts the severity of an impending accident.
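The pipeline above (scale, 10-fold cross-validate, compare mean accuracy and standard deviation across the five classifiers) can be sketched with scikit-learn. Synthetic data stands in for the real accident records, and the model hyper-parameters here are illustrative defaults, not the paper's settings.

```python
# Sketch of the five-classifier comparison with scaling and 10-fold CV.
# make_classification is a stand-in for the real accident dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_classes=3, n_informative=5,
                           random_state=0)  # slight / severe / fatal stand-in
models = {
    "LogReg": LogisticRegression(max_iter=1000),        # multinomial for 3 classes
    "ANN": MLPClassifier(hidden_layer_sizes=(32,), max_iter=300),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "RandomForest": RandomForestClassifier(random_state=0),
}
cv = KFold(n_splits=10, shuffle=True, random_state=0)   # the ten splits chosen above
for name, model in models.items():
    # StandardScaler keeps wide-range features from dominating low-variance ones
    scores = cross_val_score(make_pipeline(StandardScaler(), model), X, y, cv=cv)
    print(f"{name}: mean={scores.mean():.3f} std={scores.std():.3f}")
```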
Here, the authors have used a 64-bit operating system, x64-based processor with
an installed memory (RAM) of 8.00 GB. The system has an Intel(R) Core(TM)
i5-8250U CPU. A laptop manufactured by HP (Hewlett-Packard) is used, where
Windows 10 OS is booted by default. Python (version 3.6.10) is the programming
language used in this project. The front-end/UI technology used here is Flask (version
1.1.2). The integrated development environments (IDEs) used are Jupyter Notebook
and PyCharm. Way2sms API is used for sending mobile alerts to the drivers or
doctors of the nearby hospitals.
The comparison of all five ML models, after implementing all the machine learning
classifiers, is summed up in Fig. 3. The best ML model as per our research is the
artificial neural network (ANN), with a mean accuracy of 73.98%. We also concluded
that the three most important features affecting automotive accident severity are the
age of the casualty, the number of vehicles and the casualty class (pedestrian).
References
1. Chong, M., Abraham, A., Paprzycki, M.: Traffic accident data mining using machine learning
paradigms. In: Fourth International Conference on Intelligent Systems Design and Applications
(ISDA’04), Hungary, pp. 415–420 (2004)
2. Chong, M.M., Abraham, A., Paprzycki, M.: Traffic accident analysis using decision trees and
neural networks. arXiv preprint cs/0405050 (2004)
3. Dey, M.R., Satapathy, U., Bhanse, P., Mohanta, B.K., Jena, D.: MagTrack: detecting road
surface condition using smartphone sensors and machine learning. In: TENCON 2019—2019
IEEE Region 10 Conference (TENCON), pp. 2485–2489. IEEE (2019)
4. Iranitalab, A., Khattak, A.: Comparison of four statistical and machine learning methods for
crash severity prediction. Accid. Anal. Prev. 108, 27–36 (2017)
5. Alkheder, S., Taamneh, M., Taamneh, S.: Severity prediction of traffic accident using an arti-
ficial neural network. J. Forecast. 36(1), 100–108 (2017)
6. Zong, F., Xu, H., Zhang, H.: Prediction for traffic accident severity: comparing the Bayesian
network and regression models. Math. Probl. Eng. 2013 (2013)
7. Hashmienejad, S.H.A., Hasheminejad, S.M.H.: Traffic accident severity prediction using a
novel multi-objective genetic algorithm. Int. J. Crashworth. 22(4), 425–440 (2017)
8. Kunt, M.M., Aghayan, I., Noii, N.: Prediction for traffic accident severity: comparing the
artificial neural network, genetic algorithm, combined genetic algorithm and pattern search
methods. Transport 26(4), 353–366 (2011)
9. Mohanta, B.K., Jena, D., Satapathy, U., Patnaik, S.: Survey on IoT security: challenges and
solution using machine learning. Artificial Intelligence and Blockchain Technology, Internet
of Things, p. 100227 (2020)
10. Mohanta, B.K., Satapathy, U., Jena, D.: Addressing security and computation challenges in
IoT using machine learning. In: Advances in Distributed Computing and Machine Learning,
pp. 67–74. Springer, Singapore (2020)
11. Mohapatra, N., Shreya, K., Chinmay, A.: Optimization of the random forest algorithm. In:
Advances in Data Science and Management, pp. 201–208. Springer, Singapore (2020)
12. Tanha, J., van Someren, M., Afsarmanesh, H.: Semi-supervised self-training for decision tree
classifiers. Int. J. Mach. Learn. Cybern. 8(1), 355–370 (2017)
13. Da Silva, I.N., Spatti, D.H., Flauzino, R.A., Liboni, L.H.B., dos Reis Alves, S.F.: Artificial
neural networks, p. 39. Springer, Cham (2017)
14. Yu, B., Song, X., Guan, F., Yang, Z., Yao, B.: k-Nearest neighbor model for multiple-time-step
prediction of short-term traffic condition. J. Transp. Eng. 142(6), 04016018 (2016)
15. Yin, M., Zeng, D., Gao, J., Wu, Z., Xie, S.: Robust multinomial logistic regression based on
RPCA. IEEE J. Sel. Top. Signal Process. 12(6), 1144–1154 (2018)
Analysis of Quality of Experience (QoE)
in Video Streaming Over Wi-Fi in Real
Time
Abstract Over the years, in wireless and mobile networks, video traffic is becoming
more dominant. In order to assess the users’ satisfaction of the services, a measure
has to be considered which depicts the delight or annoyance of the users’ experience
with the services. Quality of Experience is one such measure which focuses on the
experience of the users with the services delivered, unlike quality of service (QoS)
which focuses on the media or network itself. In addition to video transmission,
Quality of Experience introduces a user experience-driven strategy that focuses on
the contextual and human factors. This is helpful because it expresses user experience
both objectively and subjectively. Hence, in order to enhance viewers’ experience,
measuring the Quality of Experience of the services along with network and system
factors proves to be beneficial. We aim to analyze the Quality of Experience of users
in the university. The data gives insight about the various parameters that affect trans-
mission of video or any data in that regard. The quality of the transferred videos is
assessed by the end users by rating their experience. We aim to provide objective
and subjective measure of Quality of Experience by analyzing the factors affecting
Quality of Experience and the users’ experience, respectively.
1 Introduction
User satisfaction is important for any service provider, since it is decisive in deter-
mining the success of the service. Hence, quality of service based on user’s perception
plays an important role. Quality of Experience (QoE) is one such measure that reflects
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 79
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_10
80 M. Vijayalakshmi and L. Kulkarni
this user’s perception. It shows how satisfied the customer is with a certain service
and represents how well the service fulfills the user’s expectations [1].
In video streaming and related applications, user viewing experience plays a major
role in determining whether the user wants to continue using the services or discard
them forever. A user will continue to avail the services of the same network provider
depending on the experience of the services offered, be it video buffering and loading
time,
or the quality of transmission. With the increasing demand of multimedia services
such as video transmission, there is a need to develop performance-based evaluation
metrics to evaluate video services/applications.
Although there are many video quality metrics, such as peak signal-to-noise ratio
(PSNR), jitter and bandwidth, that objectively measure the quality of video between
clients, users’ views are not considered in such evaluation; hence, these metrics are
incapable of representing the true experience of users. Quality of Experience
(QoE) is a user centric quality strategy that overcomes the shortcomings of the above
quality metrics. QoE is the degree of satisfaction or dissatisfaction of the user with an
application or service. There are various factors which drive the Quality of Experience
(QoE) for video consumption which in turn plays a key role in the perception of
quality of the service.
This paper aims to analyze the various factors affecting the quality of video
transmission over Wi-Fi of the university and thereby analyze the Quality of Expe-
rience (QoE) of the users with respect to these videos sent over the network. The
quality of the videos is assessed subjectively by the collective ratings given by the
users. These ratings constitute the mean opinion score (MOS). MOS in this context is
a numerical measure of the human-judged overall quality of an experience. The
rating scale ranges from 1 to 5, with 1 indicating bad and 5 indicating excellent
experience. The quality of the video is evaluated objectively by objective video quality
models, where several independent variables such as bit rate, length, PSNR are fit
against the results obtained in a subjective quality evaluation using regression tech-
niques. Finally, the objectively predicted values are compared with subjective scores
available as mean opinion score (MOS).
One of the most popular online services today is video streaming. It occupies more
than 77% of all consumer Internet traffic [2], as per the Cisco Visual Networking
Index. Users demand high Quality of Experience (QoE) while using these video
services on wireless networks such as Wi-Fi. This poses a challenge for network
administrators in environments such as university campuses, and also for service
providers. Guaranteeing the best possible QoE becomes consequential, which leads
to challenges in optimizing network resources while providing a better experience to
end users. Hence, QoE becomes a key metric for both network providers and
end users.
Setup In Fig. 1, we show the proposed framework for the analysis of QoE. The
analysis of QoE of videos is done over the university Wi-Fi network. The network
condition at a place depends on the health of the Access Point the user is connected
to. For our analysis, places in the campus are selected for the transfer of the videos
under study based on three network conditions, i.e., good, medium and poor,
determined by the performance and health of the Access Points. The Access Points
(APs) are Aruba AP-135 [3]. Videos are sent from a sender (Client 1) to a receiver
(Client 2) over the selected places of the campus. Data and the changes during the
transfer of the videos are extracted from the Aruba controller software [4], where
MAC address and IP address identify the devices involved in the transfer. The
FFmpeg [5] tool is used to extract the required video characteristics of the received
videos. FFmpeg is a video framework with a large collection of coding libraries; it is
also used to calculate the PSNR of the received videos. PSNR is a video quality
metric or performance indicator. Receivers rate their experience, and the MOS from
all receivers is tabulated and compared with the MOS obtained from the QoE
metrics.
Test videos A total of 22 videos are transferred from Client 1 to Client 2 at different
network conditions. The videos are of variable length, resolution and in MP4 format.
PBR It is the number of bits per second. It determines the size and quality of the
video: the higher the bit rate, the better the quality. Hence, a higher bit rate tends to
provide better video quality.
Dropped frames When the connection to the server is unstable, or there are problems
such as random disconnections caused by firewall/anti-virus/security software,
routers, etc., frames will be dropped. Some video frames are dropped in order to
lower the traffic; this may also lead to disconnection from the streaming server. Due
to congestion during the transfer of videos, the dropped frames are resent, and these
constitute the retried frames.
PSNR Peak signal-to-noise ratio is the ratio between the maximum power of signal
and the power of corrupting noise. Logarithmic decibel scale is used to express
PSNR, as many signals have a wide dynamic range. PSNR is used in detecting the
presence of dropped frames and the location of dropped frames in a video.
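PSNR as defined above can be computed per frame pair; for 8-bit video the maximum signal value is 255. This is, in essence, the per-frame comparison against the reference that FFmpeg's psnr filter performs.

```python
# PSNR between a reference frame and a degraded frame (8-bit, MAX = 255),
# expressed on the logarithmic decibel scale.
import numpy as np

def psnr(ref, deg, max_val=255.0):
    mse = np.mean((ref.astype(np.float64) - deg.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical frames: no corrupting noise
    return 10.0 * np.log10(max_val ** 2 / mse)
```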
3 Related Work
Former works [6] have introduced a machine learning technique that explains the
QoE ground truth for all video applications. In contrast to the above work, we focus
on analyzing the factors affecting QoE for video streaming over the university Wi-Fi
network and providing a comparison between subjective and objective MOS, which
depicts the Quality of Experience of the users. Some works [7] show the analysis of
video streaming over a mobile network by considering MOS metrics; the models
proposed there were a combination of clustering and logistic regression methods on a
realistic data set. Compared to this work, our model uses different models, such as
random forest, ridge, linear and Lasso regression, for analysis on a Wi-Fi network in
real time in a university scenario. The authors of [8] proposed an analysis of QoE
over video streaming by considering MOS metrics; their model is a combination of
the k-means clustering method and logistic regression, and experiments conducted on
realistic datasets achieved precisions of 96.94%, 97.13% and 97.54% on dataset 1,
dataset 2 and dataset 3, respectively. The authors of [9] developed a model based on
Markov chains for user experience using adaptive streaming in a dynamic
environment and buffer-based DASH clients for switching frequency. The authors of
[10] proposed an SDN control-plane approach to the multimedia transmission
problem, employing video encoding based on the latest standard. In paper [11], the
author explains the importance of video streaming in the Wi-Fi environment and how
helpful it is to stream video over Wi-Fi.
4 Data Analysis
The dataset comprises 22 videos of variable length, resolution, codec and size, in
MP4 format. These videos are sent over the Wi-Fi network and received by the
receiver (Client 2). The receiver is then asked to rate their experience based on the
quality of the received videos on a scale of 1–5. All the ratings are collected, and this
constitutes the subjective MOS. Subjective MOS gives the user’s perception of video
quality.
Subjective MOS is taken from users’ ratings, while objective MOS is calculated from
the characteristics of the received video and different parameters depicting network
conditions. Train and test data are split in the ratio 7:3. QoE metrics, subjective MOS
and different characteristics are taken as the input x, and the objective MOS is
predicted for the test data. The subjective and objective MOS are then calculated and
compared (Fig. 2).
Machine learning techniques are used to predict the objective mean opinion score.
Different models like linear regression, Lasso regression, ridge regression, AdaBoost,
random forest have been implemented and their accuracy and mean absolute error
have been calculated. Linear regression performs the task to predict the value of
dependent variable (Objective MOS) based on a given independent variable (QoE
metrics). By applying this model, mean absolute error of 5.023 was obtained. To
reduce the over-fitting caused by simple linear regression and to reduce complexities,
some of the simple techniques like ridge and Lasso regression are used. By applying
ridge and Lasso regression, mean absolute errors of 0.867 and 0.327 were obtained,
respectively. By applying the random forest model, an accuracy of 0.483 was obtained. The
predicted MOS and actual MOS have been compared.
Table 1 Mean absolute error of the regression models

Model name     | Mean absolute error
Linear         | 5.023
Lasso          | 0.327
Random forest  | 48.3
Ridge          | 0.867
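The comparison above can be sketched end to end: fit linear, ridge, Lasso and random-forest regressors on QoE features and compare mean absolute error on a 7:3 split. The synthetic features below (bit rate, length, PSNR, dropped frames) are stand-ins for the real video and network measurements, so the printed errors will not match Table 1.

```python
# Sketch of the objective-MOS regression comparison on a 7:3 split.
# Synthetic QoE features replace the real 22-video measurements.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(22, 4))   # e.g. bit rate, length, PSNR, dropped frames
mos = np.clip(3 + X @ [0.5, 0.2, 0.8, -0.4] + rng.normal(scale=0.2, size=22), 1, 5)

X_tr, X_te, y_tr, y_te = train_test_split(X, mos, test_size=0.3, random_state=0)
for name, model in [("linear", LinearRegression()), ("ridge", Ridge()),
                    ("lasso", Lasso(alpha=0.01)),
                    ("random forest", RandomForestRegressor(random_state=0))]:
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(name, round(mean_absolute_error(y_te, pred), 3))
```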
5 Conclusion
Measuring the Quality of Experience plays a major role in determining users’ satis-
faction with the services. The videos sent and received over the network based on
different performance states of the access points like good, medium, and low shows
how the experience of the users is affected based on these parameters. The
comparison between the subjective and objective mean opinion score (MOS) depicts
the users’ experience based on their perception of quality, and the quality of the
received videos based on the different parameters that contribute to it, respectively.
This helps in understanding how the users perceive the quality of the videos, as well
as the different network and video parameters that determine the quality of the
transferred videos.
6 Future Work
References
1. Dai, Q.: A survey of quality of experience. In: Lehnert, R. (ed.) Energy-Aware Communications.
Springer, Berlin Heidelberg, pp. 146–156 (2011)
2. Pepper, R.: Cisco visual networking index (VNI) global mobile data traffic forecast update.
Tech. Rep. (2013)
3. Aruba AP-135. Available at http://content.etilize.com/user-manual/1023377357.pdf
Abstract Soldiers of any nation are engaged in close combat against terrorists, where
human life is at stake. Unmanned Ground Vehicles (UGVs) are used to reduce the
loss of human life, as it may be impossible to have a human operator present at the
location. The vehicle has a set of sensors to observe the environment, mainly four
cameras, and a gun loaded on top of the UGV acting as a turret. The bot has two
modes of operation. For autonomous driving, the accuracy of the self-driving model
is about 74%, and it makes decisions from real-time camera feeds using an image
processing algorithm with an accuracy of about 95%. For manual driving, a human
operator controls it from a remote control centre over the Internet, with security
provided against one of the biggest threats to remote-controlled vehicles, the
man-in-the-middle (MITM) attack.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 87
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_11
88 H. Vichore et al.
1 Introduction
Carrying out military affairs requires a lot of manpower these days. Thus, the secu-
rity of life becomes the most important and prevailing question. To solve this a
concept known as Unmanned Ground Vehicle or in short UGVs was introduced. It
is a mechatronics robot that is used in place of humans to carry out life-threatening
tasks such as surveillance, disposal of bombs and shooting on spot. Albeit a military
robot, the application of it does not limit to defence systems only. It can also be used
for domestic purposes such as a toy car, cleaning bot or a payload carrying bot etc.
People initially used to criticize this idea, but lately they have become more receptive
to it. Critical operations such as rescue operations are a decisive use case for this
technology. To reduce the latency and time of decisions, the bot is also self-driven.
With 74% accuracy, the bot is able to avoid obstacles and correct
its location which is very helpful in rescue operations. The vehicle will use various
components such as cameras, servo motor, stepper motors, geared brush-less DC
motors. The cameras will observe the environment and detect human location and
movements. The vehicle will be controlled autonomously; if needed, a user in a
remote location will be able to override the decision of the autonomous
bot. The vehicle would be armed with a weapon mounted on a turret which will auto-
matically change its direction and follow the target. A person remotely operating the
vehicle will be getting the live video feed from the camera and will be able to trigger
the weapon according to his own decision. Security is a major issue in this type of
vehicle. Spoofing and man-in-the-middle (MITM) attacks are common attacks used
in security breaches. To secure the vehicle, such attacks are averted using encryption
techniques such as two-way authentication protocols and Strict Transport Layer
Security (STLS).
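The two-way authentication idea can be sketched with Python's ssl module: both the control centre and the vehicle present certificates, so a man in the middle cannot impersonate either side. The paper does not specify its protocol stack, so the TLS-1.2 floor and the certificate file names below are placeholders, not the authors' implementation.

```python
# Sketch of mutual (two-way) TLS authentication between the UGV and the
# remote control centre. Certificate/key file names are placeholders.
import ssl

def make_server_context():
    """TLS context for the vehicle, which also verifies the operator's cert."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2     # refuse legacy TLS versions
    ctx.load_cert_chain("vehicle_cert.pem", "vehicle_key.pem")
    ctx.verify_mode = ssl.CERT_REQUIRED              # reject clients without a cert
    ctx.load_verify_locations("control_centre_ca.pem")
    return ctx

def make_client_context():
    """TLS context for the operator; TLS_CLIENT verifies the server by default."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.load_cert_chain("operator_cert.pem", "operator_key.pem")
    ctx.load_verify_locations("vehicle_ca.pem")
    return ctx
```

Because each side validates the other's certificate against a trusted CA, an attacker on the path cannot complete either handshake without holding a valid private key.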
2 Literature Survey
Thomas et al. [1] describe the design and implementation of an unmanned ground
vehicle (UGV) for surveillance and bomb defusal. This paper gives a general idea of
surveillance, although the feedback is a haptic glove system. Following this idea,
Noor et al. [2] describe the development of a remote-operated multi-direction
Unmanned Ground Vehicle (UGV). An Xbee Pro and a PIC microcontroller were
used to achieve this: a 16-bit Microchip microcontroller in the UGV's system
interfaces with the Xbee Pro at a variable baud rate via the UART protocol and
controls the direction of the wheels. For securing the communication, Sandhya and
Devi [3] provide countermeasures for thwarting the MITM attack. Along with the
existing approaches, a new way was discussed in the paper; it includes the generation
of security overheads and the allocation of a separate channel for security.
Self Driven UGV for Military Requirements 89
Yong et al. [4] provide methods for performing a MITM attack along with defences;
handicapping the attack was also analyzed. Quantum key distribution was used to
distribute the cryptographic key, but correspondents had no way to verify each
other's keys, which created problems and noticeably delayed communication. For
increasing the speed of object detection while keeping the accuracy the same,
Kanimozhi et al. [5] provide the solution of a lightweight model to reduce the
computational overhead. For this purpose, MobileNet was used, and Single Shot
Multi-Box Detection was used to increase accuracy and identify household items in
real time. The TensorFlow Object Detection API is also used in this process. The
insights of Anping Gu and Xu et al. [6] proved very helpful in making object
detection fast. In that paper, a vision-based real-time circular ground marker
detection method is presented, which is used to guide a small autonomous robotic
UAV (RUAV) to pick up a ground target object. The method is based on the RHT
algorithm and runs entirely on the GPU via the Microsoft DirectX 11 API, since the
on-board CPU does not have sufficient computing power.
To further improve the performance of object detection, Talukdar et al. [7] apply
transfer learning: the use of synthetic images and pre-trained convolutional neural
networks offers a promising approach to improving the object detection performance
of deep neural networks. The object detection performance of various deep CNN
architectures is also studied, with Faster-RCNN proving to be the most suitable
choice, achieving the highest mAP of 70.67. Transfer learning was used to increase
the accuracy of Google's TensorFlow object detection API and extend it to
synthesized data. For a self-driving car and its testing in the CARLA environment,
the research of Dworak et al. [8] proved very resourceful; although that paper uses
LiDAR for object detection, an RGB camera sensor of the environment can also be
used. The data generated from CARLA was used to train a CNN model to create a
self-driving experience. For navigation of self-driven cars using CNNs, Duong et al.
[9] provided a very innovative way of dealing with the complications of using
Markov models, eliminating the need for generative models; such a colossal task was
reduced to one simple model.
3 Existing System
A UGV allows soldiers to spot enemies on patrol or lying in ambush, which can help save lives. It can adjust strategies based on the surroundings. UGVs can be used to detonate explosives, help in firefights and combat, and supply ammunition. They can spot explosives or human opposition before soldiers are harmed in combat. Existing systems have several drawbacks. Bandwidth is always a problem with wireless solutions and even some wired ones. Batteries can discharge during a mission. An autonomous vehicle can misfire at someone other than the enemy. They are expensive: the big disadvantage is the cost at which these vehicles come to our military. They require specific programming, with many engineers spending countless hours on testing and design. Finally, they can be destroyed before they have benefited any of our soldiers.
90 H. Vichore et al.
4 Network Architecture
Most experiments done for self-driving cars have involved some variation of generative models, viz. hidden Markov models. That method is computationally very expensive, so a simpler solution was needed; a CNN is one such solution. The camera fitted on the vehicle provides the data on which training happens. Accompanying the camera feed is one more feature, viz. the steering angle. All this information is fed to the CNN model for training, and the error is corrected using the back-propagation algorithm. The training files are created by recording the CARLA environment, capturing the driving direction and steering angle. The data is recorded and stored in .npy files; each file was 185 MB, and there were 106 files in total, making the entire dataset around 19.1 GB. Although recording was done at 1280 × 720, the frames were later resized to 480 × 270 to make the images more CNN-friendly. Recording was done at 25 frames per second. The architecture used for training was the Inception v3 model, chosen because Inception focuses mainly on computational cost. All the previous weights of the Inception v3 model were reused during training.
There are 5 epochs overall, i.e., the entire training is run 5 times, but each file is fitted just once per epoch with a batch size of 15. The learning rate for this training was 1e-3, and the average accuracy came out to 74%. To avoid overfitting, a dropout rate of 0.3 was set. Training took around 8 hours on an Nvidia GTX 1070 GPU (Fig. 1).
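The setup above (Inception v3 backbone, dropout 0.3, learning rate 1e-3, and the nine output classes listed in Sect. 4.3) can be sketched in Keras. This is a minimal illustration under stated assumptions, not the authors' code; `weights=None` stands in for the reused Inception v3 weights so the sketch builds without downloading them.

```python
import tensorflow as tf

def build_model(num_classes=9):  # 9 steering choices listed in the paper
    """Sketch of the described setup: Inception v3 on 480x270 frames,
    dropout 0.3, Adam with learning rate 1e-3."""
    base = tf.keras.applications.InceptionV3(
        weights=None, include_top=False, input_shape=(270, 480, 3))
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    x = tf.keras.layers.Dropout(0.3)(x)  # dropout rate from the paper
    out = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(base.input, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

model = build_model()
# model.fit(frames, labels, epochs=5, batch_size=15)  # 5 epochs, batch size 15
```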
4.2 Equations

Equation of the CNN (discrete 2-D convolution):

G[m, n] = (f ∗ h)[m, n] = Σ_j Σ_k h[j, k] f[m − j, n − k]   (1)

For a (2k + 1) × (2k + 1) convolution kernel, the indices satisfy

1 ≤ i, j ≤ (2k + 1)   (6)
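Equation (1) is the standard discrete 2-D convolution; a direct, unoptimized implementation (a sketch, not the authors' code) makes the index arithmetic concrete:

```python
import numpy as np

def conv2d(f, h):
    """Full discrete 2-D convolution:
    G[m, n] = sum_j sum_k h[j, k] * f[m - j, n - k],
    with out-of-range values of f taken as zero."""
    fh, fw = f.shape
    kh, kw = h.shape
    out = np.zeros((fh + kh - 1, fw + kw - 1))
    for m in range(out.shape[0]):
        for n in range(out.shape[1]):
            total = 0.0
            for j in range(kh):
                for k in range(kw):
                    if 0 <= m - j < fh and 0 <= n - k < fw:
                        total += h[j, k] * f[m - j, n - k]
            out[m, n] = total
    return out
```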
4.3 CARLA
CARLA is an open-source simulator for researching self-driving cars. Its dynamic, free-roam capacity, along with different weather patterns and game mechanics that change with them, makes it the front runner in any consideration for self-driving experiments. As of the writing of this paper, the latest stable version available for Windows 10 is 0.9.5. Along with the standard RGB camera, CARLA has 7 other sensors that can be used to make the self-driving experience even more realistic.
In the experiments, every frame from CARLA was taken and Gaussian blur applied (Fig. 2) to bring out the outlines of the lane and suppress every other component that does not matter. To zero in on the lanes and avoid crossing them, this is a must-do step. Before the Gaussian filter, Canny edge detection was also applied, with a lower threshold of 150 and the upper end capped at 220; after a bit of experimentation, it was found that the algorithm works best at these values. A 5 × 5 mask was applied for the Gaussian filter. To extract the shape of the lanes, the Hough line transform was used, a major reason being that it can recognize a shape even if it is broken or distorted to some extent. Finally, the Gaussian-blurred image was superimposed on the original, and we got back the original image with two green lines indicating the lanes (Fig. 2b). The car must always stay between these lanes; crossing them counts as an error. This image, with the lanes imposed on it along with the vehicle, is used for training the model. The prediction part of the model outputs an image of size 480 × 270, which is then scaled up to 1280 × 720. The prediction choices are straight, left, right, reverse, forward left, forward right, reverse left, reverse right, and, lastly, no keys pressed.
Fig. 2 (a) Gaussian filter applied on CARLA; (b) lane detection for CARLA using the Gaussian filter
5 Proposed System
5.1 Objectives
• It identifies the person as well as the gun with the help of the object detection API from Google.
• It increases security.
• The bot can be triggered from a remote location.
• It can identify persons at night as well.
• The bot is made self-driven.
5.2 Advantages
• Increases mobility.
• Loss of human life is reduced.
• Voice or autonomous control of the bot.
• Invulnerable to spoofing and MITM attacks.
6 Methodology
We are using the TensorFlow object detection API, a pre-trained deep learning model, to detect the number of persons along with the weapons they are carrying. The current version of the object detection API is 9.0.2. MobileNet V3 and the COCO database are used for training the model, for which ResNet is used. A protocol buffer library, Protobuf version 3.0.0, is used, which helps maintain the kernel-based interaction of the API for job scheduling. For the vehicle, iron is used instead of aluminium to maintain ground clearance and stability. Two brushless DC planetary motors are used, and two 12 V 9 A batteries are connected in parallel. To handle such a heavy load, an RKI 1341 is used, providing safety to the micro-controller. The micro-controller used here is a Raspberry Pi 3B+, present on the vehicle to process all inputs, which includes controlling the hardware as well as managing the video feed and providing it to the server. As a safety precaution, a 100 F capacitor along with a 1000k resistor acts as a shield. End-stop switches are used to make the vehicle move forward, backward, left, and right. The vehicle is armed with a turret; originally an air-pressure gun was used, but reloading it was very difficult, so a gear-based gun is used in its place. A NEMA 17 stepper motor turns the turret in 30 steps. Security is a major concern in this type of system, and one of the most tangible attacks in this situation is the man-in-the-middle (MITM) attack. To prevent it, two-step authentication is used, with the service provided by NGROK; SHA-512 hashing is also used. NGROK creates a VPN tunnel through its servers, providing proxy servers for safety. The connection is a TCP/TLS connection, which maintains security throughout. The bot is contacted through the internet; no hotspot module is connected, but a GSM module is used.
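Whatever detector runs on the feed, its output reduces to boxes, class ids, and scores, from which detections of interest are kept above a confidence threshold. The sketch below shows only this post-processing step; the label map is an illustrative assumption (COCO assigns 1 to "person", while a "gun" class would come from custom training):

```python
import numpy as np

# Hypothetical label map: COCO's "person" is id 1; the "gun" id shown here
# is an assumption standing in for a custom-trained class.
LABELS = {1: "person", 2: "gun"}

def filter_detections(boxes, classes, scores, threshold=0.5):
    """Keep detections whose class is of interest and whose score passes the
    threshold. boxes: (N, 4) [ymin, xmin, ymax, xmax]; classes, scores: (N,)."""
    hits = []
    for box, cls, score in zip(boxes, classes, scores):
        if score >= threshold and int(cls) in LABELS:
            hits.append((LABELS[int(cls)], float(score), box.tolist()))
    return hits
```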
7 Experimental Setup
Figure 4a shows the bot after complete assembly with all the cameras and the gun. Scapy (Fig. 4) was used with Python to perform a man-in-the-middle attack in Kali Linux, and Metasploit (Fig. 4b) on Kali Linux to perform the attack using different types of sniffers. A Raspberry Pi (Fig. 4c) provides internet to the bot and transfers the camera feeds over it. Figure 4d shows an example of how the TensorFlow object detection API works and classifies different types of objects, and Fig. 4e diagrams how a CNN is used to make self-driven cars from camera feeds and steering angles.
Fig. 4 (e) Object detection API; (f) flowchart for the self-driving car
8 Results
To make the system secure and avoid MITM attacks, NGROK (Fig. 5a) was used. It provides tunnels from a public endpoint to locally running services; the NGROK tunnel is shown below in working condition. The training accuracy (Fig. 5b) of the model caps out just a hair below 80% after more than 30 thousand training iterations. The validation accuracy (Fig. 5c) caps at around an average of 74%, with clear signs of over-fitting here and there.
9 Conclusion
Thus, with respect to the scope decided for the project, we have successfully implemented an unmanned ground vehicle for threat detection and elimination: a prototype bot that can be used in war-prone areas to detect threats remotely, using a classification model that shows detected objects in-camera, and that can be remotely controlled over the internet along with the camera feed. CARLA is a continuous, dynamic environment, so the data collected from it provides a very realistic approach toward the self-driving car. The CNN proves to be a better method than hidden Markov models, which are generative and computationally expensive. The accuracy of the model is around 74%.

Fig. 6 Testing of the model on CARLA (panels a–f)
10 Future Scope
Due to limited computational resources, the data on which the model was trained was limited, so more data can be collected, and different optimizers and loss functions can be tried. For the vehicle, a more lightweight metal can be used to reduce its weight, and the object detection API can be made faster and more efficient. The reliability of the model also needs to be improved, because it still runs out of bounds, which can be a very serious threat, and over-fitting should be reduced.
Acknowledgements Every aspect of this idea brought to fruition wouldn't have been possible without the tremendous family support of all the authors. We would also like to thank SIES Graduate School of Technology and the HOD of the IT Dept., Dr. Lakshmisudha Kondaka, for allowing us to work on the project and supporting us throughout.
References
1. Thomas, S., Devi, A.: Design and implementation of unmanned ground vehicle (UGV) for
surveillance and bomb detection using haptic arm technology. In: 2017 International Conference
on Innovations in Green Energy and Healthcare Technologies (IGEHT), Coimbatore, pp. 1–5
(2017)
2. Noor, M.Z.H., Zain, S.A.S.M., Mazalan, L.: Design and development of remote-operated multi-direction unmanned ground vehicle (UGV). In: 2013 IEEE 3rd International Conference on System Engineering and Technology, Shah Alam, pp. 188–192 (2013)
3. Sandhya, S., Devi, K.A.S.: Contention for man-in-the-middle attacks in bluetooth networks.
In: 2012 Fourth International Conference on Computational Intelligence and Communication
Networks, Mathura, pp. 700–703 (2012)
4. Wang, Y., Wang, H., Li, Z., Huang, J.: Man-in-the-middle attack on BB84 protocol and its
defence. In: 2009 2nd IEEE International Conference on Computer Science and Information
Technology, Beijing, pp. 438–439 (2009)
5. Kanimozhi, S., Gayathri, G., Mala, T.: Multiple real-time object identification using single shot
multi-box detection. In: 2019 International Conference on Computational Intelligence in Data
Science (ICCIDS), Chennai, India, pp. 1–5 (2019)
6. Gu, A., Xu, J.: Vision based ground marker fast detection for small robotic UAV. In: 2014
IEEE 5th International Conference on Software Engineering and Service Science, Beijing, pp.
975–978 (2014)
7. Talukdar, J., Gupta, S., Rajpura, P.S., Hegde, R.S.: Transfer learning for object detection using
state-of-the-art deep neural networks. In: 2018 5th International Conference on Signal Processing
and Integrated Networks (SPIN), Noida, pp. 78–83 (2018)
8. Dworak, D., Ciepiela, F., Derbisz, J., Izzat, I., Komorkiewicz, M., Wójcik, M.: Performance of
LiDAR object detection deep learning architectures based on artificially generated point cloud
data from CARLA simulator. In: 2019 24th International Conference on Methods and Models
in Automation and Robotics (MMAR), Miedzyzdroje, Poland, pp. 600–605 (2019)
9. Duong, M., Do, T., Le, M.: Navigating self-driving vehicles using convolutional neural network. In: 2018 4th International Conference on Green Technology and Sustainable Development (GTSD), Ho Chi Minh City, pp. 607–610 (2018)
Vehicular Ant Lion Optimization
Algorithm (VALOA) for Urban Traffic
Management
Abstract In past years, various routing methods have been advanced for VANETs. Routing protocols that utilize multiple parameters have been found most suitable for vehicular networks because of their efficiency in dealing with the changes caused by vehicular network mobility; such parameters include link stability, network speed, and environmental conditions. This research article presents a traffic-management-based routing protocol for VANETs that is satisfactory for an urban city background. The novel method is an improved version of the dynamic source routing (DSR) energy-based protocol. The developed protocol, termed effective DSR, uses an ant-based method to search for a path with optimized network connectivity. It is assumed that every vehicle node has the vehicle IDs of paths such as roads and streets. Using the data carried in small control network packets called ANTLIONs, the vehicle nodes evaluate the distance and energy for roadside or street network connections. ANTLION data packets are issued by the vehicles in particular areas. To find the most valuable and perfect route between source and sink node, the source vehicle selects the route over roads with minimum total distance and energy for the complete path. The fitness function of the planned routing protocol has been identified, and its performance has been calculated over the simulation parameters. The experimental outcomes show that the PDR improved by 10% compared with the existing protocol (VACO: vehicle ant colony optimization) when using the vehicle ant lion optimization (VALO) method; in addition, the end-to-end delay (E2D) and network overhead (NO) are also mitigated.
R. Kumari (B)
Department of CSE, NITTTR, Chandigarh, India
e-mail: ruchika.katoch91@gmail.com
R. Kumar
Department of CSE, CUH Mahendergarh, Mahendergarh, Haryana, India
e-mail: raakeshdhiman@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 99
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_12
100 R. Kumari and R. Kumar
1 Introduction
In the past few years, enhancements in ITS have been motivated by the need to reduce traffic jams, mitigate time complexity and overhead, and improve transportation management and traffic safety. To attain the major communication needs of both safety and non-safety uses in a VANET situation, there is a requirement to advance vehicle communication (VC) and smart communications (SCs). A VANET is a sub-type of a MANET, in which vehicles communicate with each other and with nearby static roadside devices (RSs). Its communications include various models such as V2V and V2I. VANET is a developing technology aimed at providing wireless communication (WC) between moving vehicles and also between vehicles and infrastructure stations [1]. The main goal of a VANET is to deliver safety-related data to vehicles. Vehicles exchange status data, such as speed and location, in periodic messages known as beacons, to generate awareness among neighboring vehicles, improve safety, and decrease the rate of accidents.
Figure 1 shows a typical vehicular network setup, where vehicle-to-infrastructure communication can be used to access position services or obtain traffic data, and vehicle-to-vehicle communication can be employed to warn about hazards or reach out-of-coverage vehicle nodes (CNs) through multihop messaging.
Vehicular communication is used in several applications with highly varied requirements. The probable applications of VANETs are safety-oriented, convenience, and commercial-oriented; some of the uses of VANET networks are safety, weather conditions, and traffic management [2].
The existing routing protocol is described as a hybrid approach (VACO), a combination of two algorithms: GSR and the ACO optimization algorithm.
Fig. 1 Illustration of VANET scenes: accident information (V2V), and V2I used to send data about difficulties to services [18]
Vehicular Ant Lion Optimization Algorithm (VALOA) … 101
Flowchart: network initialization; searching the vehicle nodes for data transmission; stop.
data, flooding can be prevented at the time of communication in this approach. Ant colony optimization is an algorithm inspired by the foraging behavior of ants. Pheromone is a hormone that ants can detect; it appeals to ants, and ants follow the maximum pheromone concentrations. The approach includes the concepts of specialization of the group, regulation, and selective rules. The algorithm is based on swarm intelligence, an approach grounded in natural evolution and the shared behavior of animals, and it depends on the genetic swarm of specific insect sequences. This leads to complex and seemingly intelligent behavior through the interaction of thousands of automated swarm members [4]. The communication is based on nature, with no supervision.
In existing research work, the authors [5] present a traffic-aware, location-dependent routing method for VANETs: an improved version of the geographical source routing (GSR) method. The VACO method finds an optimal path between a source and a sink by searching route connectivity in the network. VACO searched routes to improve network performance, but the maintenance cost was high, and packet delivery improved by only 10%.
The rest of this research article is organized as follows. Section 2 defines the classification of the numerous routing protocols of vehicular ad hoc networks. Section 3 then reviews previous routing protocols, which cannot adequately satisfy the routing requirements in these vehicular networks due to the dynamic behavior of VANETs, a consequence of traffic situations and problems. Section 4 describes the research policies using the dynamic source routing (DSR) protocol, the ant lion optimization (ALO) algorithm, and the research methodology. Sections 5 and 6 present the experimental result analysis, the mathematical formulas, and a comparison between the proposed and existing routing protocols (VACO, DSR, and VALO), followed by the conclusion and future scope.
2 Routing Scenarios
The routing process is responsible for finding and maintaining routes between origin and destination hops and is essential for network operation. Routing protocols are used for exchanging node data among the nodes in the network with the least network overhead. VANET routing protocols may be distinguished by different factors such as the routing algorithm used, the routing information, protocol similarities, and network protocols. VANET protocols can be categorized as follows [6].
This class of protocols uses the connection data present in the vehicle network to perform packet forwarding; these protocols are further categorized as in [7].
In the first type, routes to destinations such as the nearest node are maintained in the background rather than on transmission demand. Data packets are regularly broadcast and disseminated among the hops to maintain the routes, and a routing table is built at each hop that identifies the next node along the path to the receiver hop. The main benefit of this routing protocol is that no path investigation is required, since the route to the receiver hop is maintained in the background, and the protocol has minimum latency. It is known as a table-driven routing protocol; it runs periodically because topology information is exchanged among the hops in the system.
This class contains a route investigation stage, in which query packets are flooded into the network to search for a route and complete the task. These routing protocols are known as on-demand routing protocols because the route is discovered only when information is to be transferred.
This class is developed to decrease the data overhead of proactive routing protocols and, at the same time, reduce the initial path investigation delay of reactive routing protocols [8].
This class contains the group of geographic routing approaches. Geographical positioning data is shared to select the nearest forwarding nodes, and a data packet is forwarded, in the absence of mapping data, to the one neighbor node that is nearest to the receiver hop. This is essentially stateless routing: no globalized path from sender hop to receiver hop needs to be generated and managed. Examples of these protocols are the position-based greedy vehicle-to-vehicle protocol and the delay-tolerant protocol.
In this class, vehicles placed close to each other form clusters. Every cluster has a unique cluster head (CH) that is accountable for intra- and inter-cluster maintenance. Intra-cluster hops interconnect using directional links, while inter-cluster communication is handled by the cluster head.
This is mainly a position-based multicast routing protocol. Its main goal is to transmit data packets from a source to all connected hops in a unified geographical area. Vehicles placed outside the zone are not notified, to prevent an unexpected dangerous response. This protocol provides the multicast service within a geographical area [9].
3 Prior Work
This section elaborates a survey of various research articles on VANETs. As described already, routing procedures are a major problem in vehicular ad hoc networks. Goudarzi et al. [10] presented a traffic-aware position-based routing protocol for VANETs that is appropriate for the city scenario. The routing protocol is an improved version of the geographic source routing (GSR) protocol, which is combined with an ant-based approach to search for the path that has the optimum connection. It is assumed that each vehicle has a digital map of the paths, consisting of the distributed routes. The data are carried in small control packets known as ants, and each vehicle computes the weight of each route segment related to that connection. The ant data packets are issued by the vehicles in street areas. The optimum path is searched for between sender and receiver: the sender vehicle recognizes the mapped street route with the least weight over the whole path. The fitness function of the planned protocol was identified, and performance was evaluated through simulation. Simulation outcomes showed that PDR was enhanced by more than 10% for speeds up to 70 km/h compared to the VACO-based VANET routing protocol. Mejdoubi et al. [11] presented a segmented probabilistic road traffic maintenance scheme for VANETs. It aims at recognizing traffic on the road and regularly adapting the path at every junction to decrease driving time and prevent congestion; the communication between vehicles and roadside units determines the traffic prediction, which is acquired by the segmented method. Nawaz and Sattar [12] analyzed traffic in rural and urban areas using vehicular ad hoc network routing protocols. In this research, the protocols studied were AODV, DSDV, and DSR. The exploration covered both rural and urban zones, and the examination was performed on the basis of packet drop, vehicle density, throughput, and end-to-end delay. The obtained outcomes, in the form of low packet drop and maximum throughput, showed that DSR gives better results than AODV and DSDV in rural regions, while AODV performs better than DSR under conditions of low density. Saha et al. [13] proposed research through simulation parameters of different cities, showing a comparative trial of different mobility scenarios of vehicular ad hoc networks in three well-known Indian metros. The AODV routing protocol was utilized for the simulation results, and the comparison among protocols was based on packet drop, throughput, and the complete time taken by the test system to simulate the given network. Durga et al. [14] studied reliable information dissemination in vehicular ad hoc networks; collision avoidance and traffic advancement are prominent areas of investigation in the intelligent vehicle framework, and the widespread, proficient exchange of data between vehicles is a significant part of a considerable number of ITS applications. Guo et al. [15] implemented a real-time application for monitoring the traffic environment. In this research, they initially proposed a compelling real-time traffic data sharing mechanism based on a distributed transportation framework with RSUs, which has lower computing complexity and less redundancy.
4 Research Policies
This section addresses mitigation of the existing issues; the major purpose of this proposed work is to develop a routing technique for vehicular networks that improves:
• PDR (packet delivery ratio)
• E2D (end-to-end delay)
• RO (routing overhead).
The research technique combines the dynamic source routing (DSR) protocol with the ant lion optimization (ALO) approach (VALO), hence improving the delivery rate, network overhead, and end-to-end delay (E2D).
DSR is a basic routing protocol in which the source places the series of intermediate hops in the routing header of the data packet. The header is copied into the query packet at each intermediate hop it traverses; the receiver then retrieves the route from the query and uses it to reply to the sender hop. If the receiver hop forwards multiple paths, the source hop will receive the data and store multiple paths from the receiver hop, and the other hops can reuse the same links of the present path [13]. DSR is a reactive protocol that depends on the source-route method; it is mainly reliant on the link-state convention, in which the sender initiates the route request on an on-demand basis [16].
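The route-discovery behavior described above can be illustrated with a toy flood, where each route request carries the hop list it has traversed; BFS order stands in for "the first request to arrive travelled the shortest path" (a sketch, not a full DSR implementation):

```python
from collections import deque

def dsr_route_discovery(adjacency, src, dst):
    """Toy DSR route discovery: flood route requests, each carrying the hop
    list traversed so far; the first request reaching dst yields the route,
    which the route reply would carry back to the source."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        route = queue.popleft()
        node = route[-1]
        if node == dst:
            return route  # route reply carries this hop list back
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(route + [nxt])
    return None  # route error: no path to dst
```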
The ant lion optimization algorithm is an evolutionary approach that searches the space through the establishment of randomized outputs [13]. A group of candidates searches for the true global optimum output rather than a single random variable, and the method is used for resolving problems with interior and exterior results [14]. Hence, the required optimal value is established through randomized alterations of the output value; the method maximizes the chance of getting the desired optimum compared with the local optima [17].
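A minimal numeric sketch of this idea, bounded random walks around the best solution found so far with a shrinking radius, is shown below. It is a simplified illustration, not the full ALO roulette-wheel/trap mechanics, and the bounds and agent counts are arbitrary assumptions:

```python
import numpy as np

def alo_minimize(fitness, dim, n_agents=20, iters=100, lb=-5.0, ub=5.0, seed=0):
    """Simplified ant-lion-style search: ants take bounded random walks around
    the elite antlion, the walk radius shrinks over time, and the elite keeps
    the best position found so far (elitism)."""
    rng = np.random.default_rng(seed)
    antlions = rng.uniform(lb, ub, (n_agents, dim))
    fits = np.array([fitness(a) for a in antlions])
    elite = antlions[fits.argmin()].copy()
    elite_fit = float(fits.min())
    for t in range(iters):
        radius = (ub - lb) * (1.0 - t / iters)  # shrinking walk radius
        ants = np.clip(elite + rng.uniform(-radius, radius, (n_agents, dim)),
                       lb, ub)
        for ant in ants:
            f = float(fitness(ant))
            if f < elite_fit:
                elite_fit, elite = f, ant.copy()
    return elite, elite_fit
```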
Initially, a vehicular ad hoc network is created with x-coordinates (network length) and y-coordinates (network width), and the simulation parameters, such as vehicle nodes, energy, vehicle IDs, and data packet rates, are defined. The source node and the destination node are then located in the VANET, after which a coverage set is created, and the coverage-set distance, range, and matrix of the VANET are calculated. The dynamic source routing (DSR) algorithm is developed to send a request from one node to an intermediate node; the route request is sent from one vehicular node to another for data packet transmission, and if the request is accepted by the intermediate node, it replies back. In case a route error occurs, the third DSR phase (route maintenance) is used. After that, the performance parameters, such as PDR, routing overhead (RO), and E2D, are evaluated. In the proposed algorithm, the ALO optimization algorithm, an evolutionary approach that searches the area through the establishment of randomized outputs, optimizes the routes; the performance metrics (PDR, RO, and E2D) are then evaluated and compared with the existing VACO traffic-control protocol.
5 Simulation Result
The simulation tool used is MATLAB, a high-level programming language and interactive environment for numerical, alphanumerical, and mathematical programming, developed by MathWorks. Tables 1 and 2 show the network simulation parameters: a network area of 2000 m × 2000 m; a data communication range of 300 m; the DSR protocol used for data communication in a sequential manner; vehicle node counts of 0, 5, 10, 15, 20, …, 30 used to send data from one hop to another; 5 RSUs acting as receivers; a coverage distance of 100; and network performance calculated based on PDR, network overhead, and delay.
The mathematical formulas used in this research work are as follows.
Packet delivery ratio:
It is the proportion of the packets received to the packets transmitted; the performance of the network improves with an increase in PDR. Mathematically, it is given as:
Table 2 Simulation parameters

Parameter       Value
Network area    2000 m × 2000 m
vnode           5, 10, 15, 20, 25, 30, …
Range           300 m
Energy          Random
Parameters      PDR, E2D, and RO
Data packets    Random
RSU             5
PDR = (number of packets received / number of packets transmitted) × 100
Network overhead:
It is the proportion of control packets generated to the total number of packets created.
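Both metrics are simple ratios; a sketch of how they might be computed from simulation counters (the names are illustrative, not from the paper's code):

```python
def packet_delivery_ratio(packets_received, packets_transmitted):
    """PDR (%) = packets received / packets transmitted * 100."""
    return 100.0 * packets_received / packets_transmitted

def routing_overhead(control_packets, total_packets):
    """Overhead: fraction of all packets that are routing/control packets."""
    return control_packets / total_packets
```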
Figure 4 shows the deployment of the VANET receiver. The vehicular ad hoc network area is calculated from the network length and network width. The start node for packet transmission and the destination node are searched for in the vehicular ad hoc network, the coverage set is formed, and the distance between the source node and the destination node is calculated, along with the network distance from the coverage range and matrix of the VANET. This network defines the vehicle node ID: when a user sends data from one node to another, a unique ID between 100 and 500 is assigned; when the unique IDs are exceeded, overload increases and delays occur in the VANET.
Figure 5 shows the path-maintenance process in the VANET. In particular, route maintenance in the DSR protocol needs no periodic data packets at any level in the network. For instance, DSR does not need any route broadcasting or neighbor-detection data packets and does not depend on such functional data. The entirely on-demand behavior and the absence of periodic operation allow the number of overhead data packets produced by DSR to scale with the measured data. When a hop starts to transfer, the pattern of communication changes, and the route data packet identifies the route. In contrast to route searching, a hop may recognize multiple paths to the receiver hop, which permits a rapid response to route modifications.
Figure 6 shows that the sender forwards a path request message (PREQ). Every
node that receives the path request forwards it again to its neighbouring
nodes. When the sink (receiver) node gets the path request, it responds to the
sender with a route reply message (PREP). The start node thereby obtains the
shortest route and forwards data packets along that specific route. Path
maintenance is accountable for connection failures: if an intermediate hop
detects a path breakage, it forwards a route error message to the source node.
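The discovery cycle above can be approximated with a breadth-first flood: each node forwards the first PREQ it sees, and the destination answers with a PREP carrying the discovered route. The following is a simplified simulation sketch (node names and the adjacency representation are illustrative), not the actual DSR implementation:

```python
from collections import deque

def discover_route(adjacency, source, destination):
    """Flood a path request (PREQ) hop by hop; the first copy to reach the
    destination defines the shortest route, returned to the sender as the
    route reply (PREP)."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            route = []
            while node is not None:          # walk parents back to the source
                route.append(node)
                node = parent[node]
            return route[::-1]               # the PREP carries this route
        for neighbour in adjacency.get(node, []):
            if neighbour not in parent:      # duplicate PREQs are dropped
                parent[neighbour] = node
                queue.append(neighbour)
    return None                              # link failure: error to source
```

Because nodes drop duplicate PREQs, each node is visited once and the first PREQ to arrive at the destination has travelled the fewest hops.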
Figure 7 shows a comparative analysis of several protocols: VALO, DSR, and the
VACO algorithm. VALO is the proposed routing protocol, which optimizes the
selected route and the network performance. VACO is a traffic-aware routing
protocol in which ants are transferred through a well-organized broadcasting
mechanism to control network issues. The DSR routing protocol handles route
broadcasting, route replies, and route maintenance when problems arise in the
network. VALO reduces the end-to-end delay compared with VACO and the dynamic
source routing method.
Figure 8 demonstrates the comparison between the proposed and existing routing
protocols: DSR, VACO, and the VALO optimization algorithm. VALO provides fast
signal broadcasting and high data transmission from the source to the
destination vehicle nodes. VACO uses traffic-aware routing to manage route
errors and recover lost information over a valid route in the network. DSR
handles route searching, maintenance, and replies in an accurate manner. The
PDR of VALO is higher than that of the VACO and DSR routing protocols.
Vehicular Ant Lion Optimization Algorithm (VALOA) … 111
Fig. 7 Comparison—end to end delay (ms)
Fig. 8 Comparison—packet delivery ratio (%)
Fig. 9 Comparison—routing overhead (Byte)
Table 4 Performance parameters with DSR and DSR_ALO proposed work

Parameters       DSR protocol   DSR_ALO
PDR (%)          0.60–60        0.707–70.7
Delay (ms)       0.3            0.2
Overhead (Byte)  0.0013         0.00057
Table 5 Comparison between proposed and existing protocol

Parameters       VACO     DSR protocol   VALO
PDR (%)          0.55–55  0.60–60        0.707–70.7
Delay (ms)       0.8      0.3            0.2
Overhead (Byte)  0.150    0.0013         0.00057
Dynamic and Incremental Update of Mined Association Rules Against Changes in Dataset
Abstract Association rule mining (ARM) in data mining provides quality associa-
tion rules based on support and confidence measures. These rules are interpreted
by domain experts for making well-informed decisions. However, ARM faces an
issue when the dataset is subjected to changes from time to time. Rediscovering
the rules from scratch, in other words scanning the entire dataset every time,
consumes more memory, processing power, and time. This is still an open problem
owing to the proliferation of different data structures used for extracting
frequent itemsets. We propose an algorithm for updating mined association rules
when dataset changes occur. The algorithm, known as FIN_INCRE, exploits the
preorder coded (POC) tree used by the FIN algorithm for fast itemset mining.
The proposed algorithm outperforms the traditional approach because it mines
association rules incrementally and dynamically updates the mined rules.
1 Introduction
Association rule mining (ARM) has numerous applications, such as sales analysis
and discovering latent relationships among attributes in medical datasets, to
mention a few. ARM has two important phases: discovering frequent itemsets and
producing association rules from the results of the first phase. Different
association rule mining algorithms have been evaluated [1, 2]; they differ
mainly in the data structures they use. For instance, the Node-set data
structure is used in [3] to reduce time and space complexity, making fast
itemset mining possible. However,
N. Satyavathi (B)
Department of CSE, JNTUH, Hyderabad, Telangana, India
e-mail: Satyanadendla15@gmail.com
B. Rama
Department of CS, Kakatiya University, Warangal, Telangana, India
e-mail: rama.abbidi@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 115
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_13
there was a need for incremental association rule mining algorithms that
generate association rules incrementally, without rescanning the entire
database whenever a database update occurs.
We found that the dataset representation used by the FIN algorithm [3], and
the POC-Tree associated with it, provides faster and more efficient mining of
incremental association rules. FIN_INCRE is the algorithm proposed here for
discovering association rules incrementally; it exploits the POC-Tree and the
underlying data structure of the FIN algorithm.
The paper is structured as follows. Section 2 presents related work on mining
association rules. Section 3 presents the proposed methodology for incremental
and fast mining of frequent itemsets. Section 4 provides the procedure for
generating interesting association rules. The conclusion and future scope of
the research are provided in Sect. 5.
2 Related Work
This section reviews the literature on association rule mining. ARM has been a
persistent topic in the data mining domain for a number of years; plentiful
research on ARM has proved its utility.
Many algorithms have been developed for incremental association rule mining:
DB-tree and PotFP-tree [4], AFPIM [5], IFP-Growth [6], CAN-tree [7], EFPIM [8],
CP-tree [9], FUFP-tree [10], and BIT-FP-Growth [11]. A new approach called
IRARM [12] was developed for mining relational association rules. A system for
incremental mining is developed in [13]: the original database is represented
as a COMVAN tree, from which frequent itemsets are mined. Although many
approaches exist for mining incremental association rules, these algorithms
still suffer from the following drawbacks:
• They require scanning the original database many times.
• Some work only in the case of insertions.
• Some work only in the case of deletions.
• They do not work when the support threshold changes.
• The data structure used for mining is not efficient in terms of time
complexity or space complexity.
Hence, an algorithm for ARM that overcomes the drawbacks of the existing
algorithms must be developed. Such an algorithm is proposed here by enhancing
the FIN algorithm for efficient mining of incremental association rules.
3 Proposed Methodology
The proposed algorithm works well (i) when the original database is extended
with new transactions, (ii) when some of the old transactions are deleted, and
(iii) when the user-specified support threshold changes. The FIN algorithm is
used for discovering frequent itemsets from database D. When D is subjected to
new records, removal of existing records, or a change in the user-specified
support, itemsets that were frequent may become infrequent and infrequent
itemsets may become frequent. The proposed incremental mining algorithm
FIN_INCRE (shown in Fig. 1) finds the items that became infrequent after
adding new transactions and deletes them from the original POC-Tree; it also
finds the items that became frequent after adding new transactions and adds
them to the original POC-Tree. In this way, the POC-Tree is updated to
leverage the performance of FIN_INCRE significantly. Algorithms UPOCinINS
(shown in Fig. 2), UPOCinDEL (shown in Fig. 3), and UPOCinSup (shown in
Fig. 4) are used to update the POC-Tree in the case of insertions, deletions,
and support change, respectively.
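To illustrate the incremental idea, where counts are adjusted for inserted or deleted transactions instead of rescanning the whole database, here is a minimal sketch using plain itemset counters rather than the actual POC-Tree; all names are ours, not the paper's:

```python
from collections import Counter
from itertools import combinations

def itemset_counts(transactions, max_size=2):
    """Count every itemset of up to max_size items in the transactions."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return counts

class IncrementalMiner:
    """Maintain running itemset counts so that insertions, deletions, and
    support changes adjust the counts instead of rescanning the database."""

    def __init__(self, transactions, min_support):
        self.n = len(transactions)
        self.min_support = min_support      # fraction of transactions
        self.counts = itemset_counts(transactions)

    def insert(self, new_transactions):
        self.n += len(new_transactions)
        self.counts.update(itemset_counts(new_transactions))

    def delete(self, old_transactions):
        self.n -= len(old_transactions)
        self.counts.subtract(itemset_counts(old_transactions))

    def frequent(self):
        """Itemsets meeting the (possibly updated) support threshold."""
        threshold = self.min_support * self.n
        return {s: c for s, c in self.counts.items() if c >= threshold}
```

Only the inserted or deleted transactions are scanned, and a support change merely re-applies the threshold to the stored counts, which is the behaviour FIN_INCRE achieves on the POC-Tree.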
Association rules can be generated from frequent itemsets by using the
“confidence” measure. However, an enormous number of rules may be generated,
and not all of them are of interest to the user, so post-processing of the
rules is required. Several evaluation measures can be used to find the
interesting rules [14, 15].
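A minimal sketch of confidence-based rule generation from a table of frequent itemset supports (the dictionary representation is illustrative, not the paper's):

```python
from itertools import combinations

def generate_rules(frequent, min_confidence):
    """Split each frequent itemset into antecedent X and consequent Y,
    keeping rules with confidence = support(X ∪ Y) / support(X)."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue                          # no rule from a single item
        for size in range(1, len(itemset)):
            for antecedent in combinations(itemset, size):
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    consequent = tuple(i for i in itemset
                                       if i not in antecedent)
                    rules.append((antecedent, consequent, confidence))
    return rules
```

Every subset of a frequent itemset is itself frequent (downward closure), so the antecedent lookup is always present in the table; the min_confidence cut is the first, coarse filter before the interestingness measures of [14, 15] are applied.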
References
1. Satyavathi, N., Rama, B., Nagaraju, A.: Present State-of-the-Art of association rule mining
algorithms. Int. J. Eng. Adv. Technol. (IJEAT) 9(1). ISSN 2249–8958 (2019)
2. Satyavathi, N., Rama, B., Nagaraju, A.: Present State-of-the-art of dynamic association rule
mining algorithms. Int. J. Innov. Technol. Explor. Eng. (IJITEE) 9(1). ISSN 2278-3075 (2019)
3. Deng, Z.H., Lv, S.L.: Fast mining frequent itemsets using Nodesets. Exp.
Syst. Appl. 41(10), 4505–4512 (2014)
4. Ezeife, C.I., Su, Y.: Mining incremental association rules with generalized FP-tree. In: Advances
in Artificial Intelligence, Lecture Notes in Computer Science, vol. 2338. Springer, Berlin,
Heidelberg (2002)
5. Koh, J.L., Shieh, S.F.: An efficient approach for maintaining association rules based on adjusting
FP-tree structures. In: Proceedings of the DASFAA, pp. 417–424. Springer, Berlin Heidelberg,
New York (2004)
6. Tong, Y., Baowen, X., Fangjun, W.: A FP-tree based incremental updating algorithm for mining
association rules. 5, 703–710 (2004)
7. Leung, C.K., Khan, Q.I., Hoque, T.: CanTree: a tree structure for efficient incremental mining of
frequent patterns. In: Proceedings of the Fifth IEEE International Conference on Data Mining
(ICDM’05) (2005)
8. Li, X., Deng, X., Tang, S.: A fast algorithm for maintenance of association rules in incre-
mental databases. In: Proceeding of International Conference on Advance Data Mining and
Applications, pp. 56–63 (2006)
9. Tanbeer, S.K., Ahmed, C.F., Jeong, B.S., Lee, YK.: CP-tree: a tree structure for single-pass
frequent pattern mining. In: Advances in Knowledge Discovery and Data Mining, Lecture
Notes in Computer Science, vol. 5012. Springer, Berlin, Heidelberg (2008)
10. Hong, T.P., Lin, J.W., We, Y.L.: Incrementally fast updated frequent pattern trees. Exp. Syst.
Appl. 34, 2424–2435 (2008)
11. Totad, S.G., Geeta, R.B., Prasad Reddy, P.V.G.D.: Batch incremental processing for FP-tree
construction using FP-growth algorithm. Knowl. Inform. Syst. 33(2), 475–490 (2012)
12. Diana-Lucia, M., Gabriela, C., Liana, C.: A new incremental Relational association rules mining
approach. In: International Conference on Knowledge Based and Intelligent Information and
Engineering Systems, KES2018, Belgrade, Serbia (2018)
13. Gupta, A., Tiwari, A., Jain, S.: A system for incremental association rule mining without
candidate generation. Int. J. Comput. Sci. Inform. Sec. (IJCSIS) 17(7) (2019)
14. Tan, P.N., Kumar, V., Srivastava, J.: Selecting the right objective measure for association
analysis. Knowl. Discov. Data Min. 29(4), 293–313 (2004)
15. Liu, B., Hsu, W., Chen, S., Ma, Y.: Analyzing the subjective interestingness of association
rules. Intell. Syst. Appl. 15, 47–55 (2000). https://doi.org/10.1109/5254.889106. IEEE
E-Governance Using Big Data
Abstract The continuous advancements in the field of ICT and the constant efforts
from the Central and State governments have been the foremost forces for the
successful launch and reinforcement of e-governance in India. With the help of public
and private sectors, governments are encouraging organizations for interoperability
to store and process data from a central location that further enhances decision-
making. This fast-growing data is turning into big data. The tools used to study
and analyse big data at great speed and accuracy are known as big data analytics.
These big datasets can be text/audio/video/picture, etc. As the use of e-governance
datasets is increasing, the citizens expect to analyse and process datasets at greater
speed and accuracy. This paper shows the relationship between e-governance and
big data, its implementation around the globe, initiatives taken by India to estab-
lish e-governance, and some challenges in implementing big data with e-governance
projects.
1 Introduction
P. Salwan (B)
I.K. Gujral Punjab Technical University, Jalandhar, Punjab, India
e-mail: poonam12_sharma@yahoo.com
V. K. Maan
Giani Zail Singh Punjab Technical University, Bathinda, Punjab, India
e-mail: veerpalkaur1@rediffmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 123
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_14
data has been increasing so fast that traditional database management systems
cannot be used to deal with such exponentially growing data. This also affects
decision-making, as more than half of the data remains unprocessed owing to
changes in its type. In this paper, Sect. 1 discusses how to manage this
continuously growing data related to e-governance. Section 2 discusses big data
in e-governance and its features. Section 3 discusses the role of big data in
e-governance projects across the globe and India's initiatives to adopt big
data. Section 4 discusses some challenges that may occur while using big data
analytics in e-governance.
Basically, all datasets that satisfy the characteristics of the 3Vs (Volume,
Velocity, and Variety) are considered big data (Fig. 1). The technique used to
study and process mixed-type datasets at a faster speed is called big data
analytics. Big data analytics processes big data by dividing datasets into
equal-sized chunks [7] and storing them on different computers, known as nodes,
in a cluster of computers. In this way, big data analytics makes the processing
faster and more accurate.
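The divide-and-distribute idea can be sketched as follows: the dataset is split into near-equal chunks, each "node" processes its chunk independently, and the partial results are combined. This toy sketch (all names are ours) runs the chunks sequentially; a real cluster would execute them in parallel:

```python
def split_into_chunks(records, n_nodes):
    """Divide the dataset into near-equal chunks, one per cluster node."""
    size = -(-len(records) // n_nodes)            # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]

def distributed_aggregate(records, n_nodes, map_fn, reduce_fn, initial):
    """Each 'node' processes its own chunk; partial results are combined.
    Sequential here for clarity; a cluster runs the chunks in parallel."""
    partials = [map_fn(chunk) for chunk in split_into_chunks(records, n_nodes)]
    result = initial
    for partial in partials:
        result = reduce_fn(result, partial)
    return result
```

The same map-then-combine shape underlies frameworks such as Hadoop MapReduce, where the chunks live on different machines.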
The different phases through which data becomes big data [7] are as follows
(Fig. 2):
• Big data generation: This phase refers to the different sources generating
huge amounts of data at great speed.
• Big data acquisition: This phase refers to collecting the data that turns
into big data from different resources, distributing data among resources,
and pre-processing the data.
• Big data storage: This phase refers to the management skills to store big data in
such a way that it could enhance the accessibility and availability of big data.
• Big data analytics: This phase refers to the analysis of struc-
tured/unstructured/mixed datasets to forecast future trends or predictions.
Some notable features of big data analytics are as follows:
• It is capable of managing dynamic types of data; i.e. it can easily manage
structured, semi-structured, and unstructured data.
• It can easily manage a great volume of datasets produced at great velocity.
• It is scalable in nature; i.e. its setup can be modified as and when required.
• It has very vast analytic techniques for different types of data that help to
study different patterns or trends from processed/unprocessed data.
• It helps to take important decisions on the basis of current trend analysis.
Earlier, when the digital form of data was not available, the veteran leaders
of the government were expected to use their wisdom and past experience to
make decisions [8]. In the present era, big data analytics helps in
decision-making using digitized datasets. Almost 90% of the datasets generated
through different resources are of an unstructured type. Big data analytic
techniques give us the facility to explore unknown or hidden facts through the
dissemination and processing of data in different phases. Figure 3 shows how
different types of datasets are collected, refined, and synthesized to get the
required data from the datasets [9].
The private sector has started using big data analytics to maximize their profit
by studying market trends, consumer behaviour, expectations, etc. The government
departments are using it for the growth and development of their citizens. The govern-
ments are also making laws and implementing policies to ensure security and privacy,
at all the phases of big data processing, for the validity of the information. Many
countries of the world like the US, UK, and Japan have already started projects using
big data analytic techniques to make future predictions [10].
Here is an analysis of various countries running e-governance projects based
on big data analytics [10, 11].
• The Australian government has been using big data analytics to provide better
services to their citizens. The Australian Customs and Border Protection Service
(ACBPS) is using big data analytics to ensure the security of their borders.
• The UK government had allotted £189 million for big data research, and major
emphasis was given to the agriculture industry.
• The government of France has allocated €11.5 million to proposals related to
7 big data processing projects.
• The Norway government has been using big data analytics for the health care of
its citizens.
• The Indian government has invested Rs. 5630 crores on the UID project to provide
a unique ID to its citizens.
The United Nations Department of Economic and Social Affairs (UN DESA)
conducts the E-Government Development Survey [12–14] every two years
(biennially). This survey helps to find out the e-readiness of different
countries and calculates the E-Government Development Index (EGDI) using human
development-related parameters. The details of these parameters are as follows:
1. Online Service Index (OSI): It checks whether the countries are following the
minimum level of Web Content Accessibility Guidelines or not.
2. Telecommunication Infrastructure Index (TII): It checks communication-related
aspects of the nation, such as total computer users per 100 people, total
telephone connections per 100 people, total Internet connections per 100
people, total mobile users per 100 people, and total broadband users per 100
people.
3. Human Capital Index (HCI): This parameter checks the literacy rate, enrolment,
and level of education at the primary and secondary levels, and skill development.
After calculating the above parameters, the EGDI composite index is computed as
the weighted average of these parameters. The possible values of this index lie
between zero (minimum) and 1 (maximum).
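Assuming equal weights for the three sub-indices (the survey's exact weighting is not reproduced in the text), the composite index can be sketched as:

```python
def egdi(osi: float, tii: float, hci: float,
         weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Composite EGDI as a weighted average of the three normalised
    sub-indices (OSI, TII, HCI); the result stays in [0, 1]."""
    w_osi, w_tii, w_hci = weights
    return w_osi * osi + w_tii * tii + w_hci * hci
```

With equal weights, sub-index values of 0.9, 0.9, and 0.945 would yield a composite of 0.915, the order of Denmark's 2018 figure quoted below.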
The EGDI index report 2018 (Table 1) shows Denmark at the top rank with 0.9150
index value. India, through its constant efforts, has made it possible to achieve 96th
global rank in the EGDI report with 0.5669 index value [12, 15].
Now the obvious question that comes to mind is: are the ranks scored by
different countries the result of continuous efforts [16] or the result of
efforts invested in just two years? The answer can be understood with the help
of Table 2, which shows the consolidated status of different countries on the
basis of the EGDI biennial reports of 2014, 2016, and 2018.
• Agriculture: The main objective of this MMP is to inform the farmers [24] about
seeds, type of soil and matching crops, fertilizers, pesticides, government schemes,
weather forecasts, etc.
• Commercial taxes: The main objectives taken care of by this project are
e-filing of returns [25], refunds, e-payment of taxes, an online dealer
ledger, etc.
• Education: Education is the common concern of both the Central and State
governments [26]. Thus, the Ministry of Human Resource Development (MHRD)
established a centralized structure that will be implemented by State governments.
• E-municipalities: Digitization of the state-level municipalities is another very
important initiative taken by the Central government [27] under the e-governance
plan.
• Digitization of land records: The main objective of this project is to digitize the
existing land records to avoid the chances of human mistakes [28].
• Employment Exchange: This project helps employers and employees to match
their requirements and find the best fit using online resources [29].
Integrated e-governance projects: Other than the projects mentioned above,
there are many projects seeking Central and the State governments’ coordination for
the welfare of the citizens, for example, land records, education, entertainment, etc.
Some of the integrated projects and their objectives are as follows.
• Road transport: This project created a unified scheme (states and union territo-
ries) to computerize their transport offices for efficient and quick management of
driving licences and certificates [30].
• E-Procurement: This project helps to make the procurement processes simple,
transparent, and result-oriented [31] using the Internet.
• EDI for eTrade: The electronic data interchange (EDI) for online trade provides
deliveries of services (24 * 7) electronically, increased transparency, reduced time,
cost, etc. [32].
• E-Biz: This project provides services in Government-to-Business (G2B) [33] by
sharing updated online information, easy to access the website, etc.
Big data analytic techniques have proved their worth in e-governance-based
projects. Still, there remain some challenges or gaps to overcome for the
successful use of big data in e-governance [34, 3].
• Threat to privacy: Big data analytic techniques need to process personal
details of citizens, such as UIDs, bank details, health details, and sale or
purchase information, for analysis. If this personal information is not used
appropriately, it may lead to a privacy threat.
• Ethical versus unethical: As the end-users (citizens) are neither aware nor
informed that their personal details have been shared for future analysis, this act
inclines towards the unethical use of power for accessing sensitive information.
• Security of data: The e-governance project’s datasets, placed on the distant servers
may lead to intentional or unintentional threats to sensitive datasets.
• Lack of skilled resources: There is a deficiency of skilled resources to maximize
the utilization of big data analytics by finding out hidden patterns or detail.
• Reliability of information: The reliability of these reports mainly depends on the
capabilities and intentions of the enabled resource generating that report.
5 Conclusion
E-governance has been transforming the whole world. Paper files have now been
turned into computerized files, stored and maintained in repositories placed at
distant locations. Big data analytic techniques have been adding sophistication
to e-governance by providing detailed insights into hidden patterns in
datasets. They have also been overcoming the traditional DBMS problems of
storing, sharing, and processing huge volumes and high velocities of datasets
at greater speed. Big data analytics also carries some issues or risks related
to the safety, security, and accessing of datasets. Technocrats are
continuously working to provide safeguards
against all the odds faced while using big data analytic techniques. The Indian
government is also working to make India a Digital India. Various e-governance
projects have been implemented at the central and state levels for the welfare
of the citizens. The most popular project, i.e. UID, has been using big data
analytic techniques to store and process huge amounts of data. Thus, the
integration of e-governance and big data should be encouraged to make Indian
cities smart cities and India a Digital India. This will also help the Indian
government in decision-making, better planning, and management of resources
for the welfare of citizens.
Abstract The water dispenser is a system that can be used to dispense drinking
water at various work and commercial places. Owing to extensive usage among the
public, the demand for these water dispensers is increasing day by day. Users,
whether health-conscious or simply based on their interest, prefer hot or cold
water. Even though a plethora of water dispensers is available in the market,
there is still scope to improve their performance. The existing
microcontroller-unit-based choice water dispensers face common problems: they
are button operated, and water is wasted through overflow when no glass or
container is present. In this paper, these problems are addressed with a
hardware design. A novel voice-controlled water dispenser is proposed that
maintains choice-based dispensing under voice control using an Arduino Nano.
This system also avoids the wastage of water.
1 Introduction
The water dispenser is a system that can be used to dispense drinking water at
various work and commercial places, from schools to corporate workplaces,
including hospitals. Owing to extensive usage among the public, the demand for
these
K. Sateesh Kumar · P. Udaya Bhanu (B) · T. Murali Krishna · P. Vijay Kumar · Ch. Saidulu
Department of ECE, Vignan’s Lara Institute of Technology and Science, Vadlamudi, AP, India
e-mail: udayabhanu.potu@gmail.com
K. Sateesh Kumar
e-mail: sateeshkumarkanagala@gmail.com
T. Murali Krishna
e-mail: murali22061999@gmail.com
P. Vijay Kumar
e-mail: pusuluriv99@gmail.com
Ch. Saidulu
e-mail: saidulu.ch786@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 135
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_15
water dispensers is increasing day by day [1, 2]. These dispensers dispense
water at low (cool), normal (room temperature), and high (hot) temperatures
using different microcontroller-based circuits and designs. Even soft drinks
are offered with this technology. Particularly in pandemic situations, people
all over the world are very conscious about their health, and particularly
about their drinking water. On the other side, embedded systems are rapidly
developing to address various real-time issues in our day-to-day life [2].
This creates a large space for marketing innovative and smart water dispensers
on a global scale as an application of an embedded system. With this
motivation, we propose an Arduino Nano-based voice-controlled hot and cold
water dispenser system.
This paper is organized as follows. Section 1 presents the introduction to the
research problem, followed by the background in Sect. 2. A detailed discussion
of the proposed method is presented in Sect. 3, and the algorithm of the
proposed system in Sect. 4. Results are presented in Sect. 5, and the
conclusion follows.
2 Background
water in the dispenser. A temperature sensor can be used to sense the
temperature of the water tank, and the power supply given to the
heating-section coils heats the water.
Some of the drawbacks of the existing system are as follows.
• Water overflow occurs at the dispenser when no glass is present.
• The buttons of the dispenser must be operated continuously until the
required water level is reached.
These drawbacks are addressed and resolved in the proposed voice-based water
dispenser system, which has more advanced hardware and higher computational
power than the existing one.
The block diagram of the proposed Arduino Nano-based water dispenser system is
represented in Fig. 1.
The proposed voice-choice-based water dispenser system works with an Arduino
Nano board, which is more advanced than the Arduino Uno [5, 10, 11]. The
system is an improved version of the existing 89S52-based system in both
hardware design and functionality (software). The novelty of the proposed
system over the existing one is the feedback between the user (input) and the
outlet (output), which reduces the wastage of water and improves the
performance of the system. The proposed water dispenser is categorized into
two parts: (i) hardware and (ii) software.
3.1 Hardware
The heart of the hardware section of the water dispenser is the Arduino Nano,
which replaces the AT89S52, whose performance lags behind the Arduino.
Arduino Nano: The Arduino Nano is a product of Arduino. It is a flexible,
advanced microcontroller board with broadband support, offering the
capabilities of its sibling, the Uno, in a smaller size. The Arduino Nano can
produce analog and digital outputs to control the peripherals.
Power supply: The power supply required for this water dispenser is a maximum
of 12 V.
Voice recognition module (VR-3): In this voice recognition module, the user's
voice commands are recorded and stored. When the user later gives a command,
the module compares it with the stored database and responds with either hot
or cold water, as opted by the user.
Crystal oscillator: The crystal oscillator provides the clock to the Arduino board and
is assembled on the board itself. The crystal frequency is 16 MHz.
Relay: The relay is an electromechanical device that acts as an automatic switch,
driving the pump motor drivers (L298N) to dispense the chosen water through the
outlets. Separate relays are used for the hot and cold outlets.
IR sensor: The IR sensor identifies the presence of the container or glass at the outlet
by transmitting and receiving infrared signals. The water level is also measured using
an IR sensor.
LCD display: In the proposed method, a 16 × 2 LCD displays the container status,
the dispensing status, and any error messages. The LCD outputs are discussed in the
results section.
Temperature sensor: The temperature sensor senses the temperature of the hot and
cold water tanks. Mixing of the water is also possible based on the user's choice.
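The interaction between the IR sensor, the voice recognition module, and the relays described above can be sketched as a simple decision routine. The following Python snippet is an illustrative simulation only, not the actual firmware, which runs on the Arduino Nano; the function name and sensor inputs are hypothetical stand-ins, while the returned strings mirror the LCD messages reported in the results section.

```python
# Illustrative simulation of the dispenser's decision logic.  The real
# system runs as firmware on the Arduino Nano; the sensor inputs here are
# hypothetical stand-ins, and the returned strings mirror the LCD
# messages reported in the results section.

def dispense(glass_present: bool, voice_command: str) -> str:
    """Choose the LCD message (and, implicitly, the relay to energize)."""
    if not glass_present:
        return "Please place Glass"           # IR sensor found no container
    if voice_command == "hot":
        return "HOT water"                    # energize the hot-outlet relay
    if voice_command == "cold":
        return "Cold water"                   # energize the cold-outlet relay
    return "Glass detected give voice input"  # wait for a valid command
```

In the firmware, each branch would additionally drive the corresponding relay pin rather than only selecting a message.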
5 Results
This section provides a brief discussion of the results with the help of corresponding
screenshots. Figure 2 shows the overall design of the proposed voice-based smart
water dispenser using Arduino, with the message "Please place Glass". Figure 3
represents the message "Glass detected give voice input," and Fig. 4 shows the
display of "HOT water" and "Cold water" on the LCD after the user's voice command
is accepted.
6 Conclusion
The voice-controlled water dispenser using the Arduino Nano is proposed with more
user-friendly accessibility. The system improves on the existing 89S52 model in both
hardware and functionality. Avoidance of water wastage is an added advantage, which
makes the system well suited for homes and commercial areas. It needs less mainte-
nance than power-free water dispensers. The proposed Arduino-based system
conserves water by implementing an IR sensor-based container detection mechanism
and overflow detection.
Implementation of Voice Controlled Hot and Cold Water … 141
References
1. Huang, P.P.: The effect of different temperature water intake to the resting heart rate variability.
Department of Physical Education, Fu Jen Catholic University, Magisterial Thesis (2005)
2. Reverter, F., Gasulla, M., Pallàs-Areny, R.: Analysis of power-supply interference effects on
direct sensor-to-microcontroller interfaces. IEEE Trans. Instrum. Meas. 56(1), 171–177 (2007)
3. Jinxiong, X., Dong, Z., Yuying, W., et al.: A design of energy-saving drinking dispenser based
on fuzzy memory control. J. Inspection Quarantine 20(3), 30–33 (2010)
4. Huang, J., Xie, J.: Intelligent water dispenser system based on embedded systems. In: Proceed-
ings of 2010 IEEE/ASME International Conference on Mechatronic and Embedded Systems
and Applications, pp 279–282. Qingdao (2010)
5. Cheng, W.Z., Cheng, R.Z., Chou, S.-Y.: Power saving for IoT enabled water dispenser
system. In: 2019 42nd International Conference on Telecommunications and Signal Processing
(TSP) (2019)
6. Huang, C.J., Tsai, F.T.: Research and development of a practical water dispenser. In:
International Conference on Applied System Innovation (ICASI), pp. 1225–1228. Sapporo
(2017)
7. Zhongren, C., Fangjing, C., Yanfeng, Z.: Development and application of an external intelligent
power saver for drinking water dispenser. Shan Xi Electronic Technology (2012)
8. Smart Systems and IoT: Innovations in Computing. Springer Science and Business Media LLC
(2020)
9. Ariffin, S.H.S., Baharuddin, M.A., Fauzi, M.H.M., Latiff, N.M.A., Yusof, S.K.S., Latiff,
N.A.A.: Wireless water quality cloud monitoring system with self-healing algorithm. In:
2017 IEEE 13th Malaysia International Conference on Communications (MICC), pp. 218–223
(2017)
10. Yen, Y., Chou, Z., Hou, M., Wang, X.: The design of intelligent water supply device based on
MCU. In: 2015 IEEE 5th International Conference on Electronics Information and Emergency
Communication, pp. 388–391. Beijing (2015)
11. Aisuwarya, R., Hidyathi, Y.: Implementation of Ziegler-Nichols PID tuning method on stabi-
lizing temperature of hot-water dispenser. In: 2019 16th International Conference on Quality
in Research (QIR): International Symposium on Electrical and Computer Engineering (2019)
Future Smart Home Appliances Using
IoT
Abstract The Internet of Things (IoT) is a network of physical devices and objects
that collect, store, and analyze data. Home appliances have traditionally been oper-
ated manually. IoT-enabled products are rapidly changing how home appliances are
used in society. IoT systems are designed, developed, controlled, and monitored in
various applications such as health, transport, agriculture, and home appliances. We
propose a framework model for future smart home appliances using IoT, which helps
developers build home automation infrastructure according to user specifications
and requirements. The proposed model supports smart home applications through
sensors, communication, smart home operations, and control via mobile apps and
Arduino. The system provides security and smart home automation. In the future, it
will be extended into an intelligent smart application with an integrated environment
and reporting applications.
1 Introduction
The Internet of Things is a network of physical devices, objects, and sensors with
network connectivity, used to collect, exchange, store, and analyze data. IoT appli-
cations in home automation offer energy, protection, and safety features. In the 1990s,
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 143
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_16
144 P. Srinivas et al.
home automation relied on the Internet and connected devices. In the 2000s, home
network systems used smartphone apps for remote monitoring; in the 2010s, smart
home applications used IoT and AI technologies for context-aware systems; and now,
in 2020, intelligent smart home appliances use IoT, AI, and machine learning to
record, store, and analyze data and to provide remote control and access according
to contextual information. The IoT market is projected to reach US $137.9 billion by
2023. Home appliances are controlled by smartphone, with Wi-Fi as the communi-
cation protocol.
The Internet of Things is a new technology used in various important applications
such as smart homes, health, energy servers, defense, monitoring, transport, traffic
management, infrastructure management, water, and the built environment. An IoT
system consists of components, networks, and sensors that can be integrated to read,
store, and analyze information. Essential IoT technologies include WSN, RFID,
middleware, cloud computing, and IoT software applications.
IoT services are required in various applications worldwide. According to world
statistics, there were 20.4 billion IoT devices in 2020, and 64 billion IoT devices are
expected to be in use by 2025. As home systems grow in popularity, security is most
important, and IoT-based security helps guarantee the availability of services. Today,
advancements in smart home technology are mostly used for general human aid
through intelligent smart home IoT services. IoT is changing human life; home
appliances are used at home and in the office, in every domestic space: lighting,
dishwashers, gardening, air conditioning, etc. Sensor-controlled smart devices use
smartphones or tablets with a Wi-Fi connection to collect the sensor data, which can
then be read, stored, and analyzed. Gardening uses automatic sprinklers within the
smart automation infrastructure of the house, with Wi-Fi or Bluetooth as the commu-
nication medium. In 2020, 20 billion devices were connected in healthcare, and
advanced IoT products are used in home appliances and house automation. IoT is
rapidly changing society by expanding the scope of connected devices. Future home
appliances include TV, lighting, heating, and refrigerator operations built on IoT
systems, with devices communicating efficiently.
The paper is organized as follows: Sect. 2 describes the literature survey, Sect. 3
describes the proposed model, Sect. 4 presents the discussion, and Sect. 5 gives the
conclusion and future scope.
2 Literature Survey
Lee et al. [1] emphasize the essential elements, products, and services of IoT tech-
nology. Mandula et al. [2] discussed IoT-based applications such as health care and
home automation using a microcontroller and mobile apps. Alaa et al. [3] reviewed
229 articles on IoT and technological advancements in smart home applications,
apps, and IoT databases, and classified the papers into an IoT smart home survey.
Vignesh et al. [4] proposed a home automation model that controls devices remotely
via smartphones using WSN and cloud networking from remote locations.
Malche et al. [5] proposed an architecture that uses environmental sensors for
alerting, monitoring, and intelligent control in smart home applications, based on
the Frugal Laboratories IoT (FLIP) architecture. Swetha et al. [6] studied systems
that monitor electrical appliances such as lights and fans in smart homes using
sensors and the Internet. According to Li et al. [7], smart home applications are an
important part of smart grid usage, with users responding to services when designing
a smart home with electricity service. Petnik et al. [8] proposed a cloud-based home
care service with an integration layer. Yang et al. [9] studied smart home service
functions; the authors collected 216 samples from Korea and analyzed personal
characteristics based on behavior. Mao et al. [10] studied IoT functionality and
security with machine learning algorithms, which play a significant role in smart
home systems. Jo et al. [11] studied smart home and IoT-related technology with
integrated devices organized in a network to perform activities. Al-Kuwari et al.
[12] proposed smart home automation using an IoT-based sensing and monitoring
platform, controlling the home with intelligent automation through design, sensing,
and monitoring. Somani et al. [13] proposed IoT-based smart security that provides
home automation using software, sensors, and actuators. Ahmed et al. [14] studied
IoT quality assurance; IoT applications are growing in domains such as security,
e-health, smart cities, and defense. Batalla et al. [15] proposed an architecture that
provides security and availability. Khalaf et al. [16] controlled smart home activities
using IoT sensors, processing, and applications.
We design and develop a model for an IoT-based smart home appliance system with
automation activities based on sensors, data processing, and control and monitoring
in the smart home environment.
3 Proposed Model
[Figure: Proposed model. Users, through a smartphone and smart mobile app,
connect via Arduinos for control and monitoring of sensor networks and IoT home
appliances: smart light, refrigerator, smart door, security, smart speaker, smart TV,
temperature, and others.]
meters depend on the smart home IoT home automation gateway. Communication
and control run through the smart mobile device, which communicates with the
Arduino for control and monitoring.
Home automation provides functions according to stakeholder specifications, with
characteristic features such as house automation functions and security services.
IoT is emerging in personal, home, and enterprise settings, utilizing mobile opera-
tions for data collection, tracking, home maintenance application services, and opti-
mization. Home automation offers the flexibility to use IoT home appliances. Open-
source IoT platforms such as Home Assistant, Domoticz, and openHAB provide
IoT device security with message queuing, device administration, data collection,
analysis, visualization, and integration with services. Other IoT applications include
transport, agriculture, production, environment, industry, safety, and retail. Network-
connected objects provide security and effective utilization of applications.
The proposed architecture ensures the availability of services through efficient use
of technology such as CCTV, door sensors, and smart lock sensors, with a gateway
collecting data over communication protocols and sending alert or alarm warnings
to the users. Security, availability, and response according to events are provided;
privacy is an important task in smart homes [19].
The proposed smart IoT home system provides various functions according to user
specifications, using smart control via mobile phones: IoT operations such as electric
light on/off, refrigerator on/off, door open/closed, security on/off, smart speaker
on/off, smart TV on/off, and temperature control on/off. In addition to home appliance
control using a smartphone, notifications are sent through email, SMS, and other
applications, such as a solar power system and smart parking applications, according
to the user specifications.
Home IoT applications provide home automation through network service providers
with quality of service. Standard traffic network management and security protocols
such as Bluetooth are used, optimizing data transmission over channels like ZigBee
and 5G networks to automate responses in home automation by sending signals
through unidirectional sensors, interfaces, and controllers, which send commands to
actuators (outputs).
It uses residential gateway sensors (light, temperature) with a smartphone providing
interface and functional control of actuators (e.g., light home automation based on
WSN technology, ZigBee, and Wi-Fi) and an Android-based smartphone imple-
menting the functionalities of smart home systems [20]. House automation systems
are connected to the Internet, which communicates with the user. Sensors help
provide security for the user: security terminals, alarms, and records with sensor
data such as video and door interfaces, along with efficient use of energy. The secu-
rity system uses automatic doors, safety features, and alarm systems [21]. The home
assistant process and home control include collecting, storing, and analyzing data.
Home automation systems trigger commands based on configuration. Smart home
triggers are based on a user's past behavior, and apps are used to control devices from
mobile phones and tablets. The model provides a scenario to design, control, and
monitor the smart home system [22, 23].
Home automation uses IoT-based sensing and monitoring platforms with sensors
and signals in the smart home, communicating via Wi-Fi, Bluetooth, and other chan-
nels. The functional principles follow Algorithm 1. The sensors and signal-reading
devices cover home security, smart home interior design, intelligent lighting, hard-
ware (Arduino board), IoT software design, residential connectivity, and the home
ecosystem of devices, components, software, and sensors that monitor and set up
the functions of smart home technology, beginning with choosing a site for smart
home construction. The architecture provides all amenities using IoT-based smart
home technology according to user specifications; some of the functions are energy
monitoring, health, smart parking, and smart gardening. Operations and applications
are controlled according to the sensor data through mobile app functions, for example,
switching a light on or off.
Step 4. Control the signals: using the sensors, control the home appliances with the
required on/off operations.
Step 5. Report the information to the user.
End
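Only Steps 4 and 5 of Algorithm 1 survive in the text above; the sensing steps before them are implied. The following Python sketch illustrates the resulting sense-control-report loop under that assumption; the rule set, sensor names, and thresholds are invented for illustration and are not part of the paper's algorithm.

```python
# A minimal sense-control-report sketch of Algorithm 1.  Steps 1-3 (setup
# and sensing) are assumed; only Steps 4-5 appear in the text.  All rule
# names and thresholds below are illustrative, not from the paper.

def control_appliances(sensor_data, rules):
    """Step 4: derive on/off commands for each appliance from sensor data."""
    return {name: ("on" if decide(sensor_data) else "off")
            for name, decide in rules.items()}

def report(commands):
    """Step 5: summarize the actions for the user (e.g., via SMS or email)."""
    return ", ".join(f"{name}: {state}"
                     for name, state in sorted(commands.items()))

# Example rules: light on when it is dark, air conditioning on when hot.
rules = {
    "light": lambda d: d["lux"] < 50,
    "ac": lambda d: d["temp_c"] > 28,
}
commands = control_appliances({"lux": 20, "temp_c": 30}, rules)
```

In a deployment, the sensor dictionary would be filled from the gateway's sensor readings and the report string sent through the notification channels mentioned above.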
Developing a smart home involves home safety, monitoring of required functions,
smart parking, and video surveillance. The smart home uses hardware, software,
and sensors, and utilizes household applications that optimize home application
systems through low-cost maintenance and energy savings (i.e., remote adjustments
that improve efficiency). Example applications include lighting (lights on/off), stove
on/off, water on/off, and video monitoring (video conferencing with family and
friends, and security through video surveillance control with automatic alerts, SMS,
and detection). Preventive decision systems such as healthcare applications use Blue-
tooth technology to measure blood pressure and temperature and provide safety,
alarms, activity monitoring, guided exercise, diet, food, and preventive measures,
along with daily-life activity functions, SMS, alerts, and functional operations for
the stakeholders [24, 25].
4 Discussion
Sensor networks are used to measure and understand the environment, natural
resources, and urban environments; with IoT, the Internet extends into future user
applications. Pervasive communication, IoT, and smart connectivity enable computer
systems to interact with objects and sensors through smart devices (smartphones,
smart watches). The Internet of Things helps humans in home automation and house-
hold operations that respond with actions. The proposed model will help developers
of smart home appliances using IoT, covering sensors, hardware, RF transmitters
and receivers, user interfaces, processors, data collection, analysis and reporting
systems, and effective utilization of the required functional services.
In the future, smart tags will be used for logistics and vehicles in heterogeneous
systems, and smart traffic systems will automatically apply intelligent traffic appli-
cations. The future Internet, a worldwide Internet of Things, will connect many
objects, network resources, smartphones, devices, and intelligent environments, so
that smart home appliances with integrated systems can identify suspicious activity
through video surveillance and report events under critical conditions.
References
1. Lee, I., Lee, K.: The Internet of Things (IoT): applications, investments, and challenges for
enterprises. Bus. Horiz. 58, 431–440 (2015)
2. Mandula, K., Parupalli, R., Murty, C.H.A.S., Magesh, E., Lunagariya, R.: Mobile based home
automation using Internet of Things (IoT). In: 2015 International Conference on Control,
Instrumentation, Communication and Computational Technologies (ICCICCT), pp. 340–343.
IEEE (2015)
3. Alaa, M., Zaidan, A.A., Zaidan, B.B., Talal, M., Kiah, M.L.M.: A review of smart home
applications based on Internet of Things. J. Netw. Comput. Appl. 1–36 (2017). Elsevier
4. Vignesh, G., Sathiya Narayanan, M., Abubakar, B.: Customary Homes to Smart Homes Using
Internet of Things (IoT) and Mobile Application. IEEE, pp. 1059–1063 (2017)
5. Malche, T., Maheshwary, P.: Internet of Things (IoT) for building smart home system. In:
International Conference on I-SMAC (IoT in Social, Mobile, Analytics, and Cloud) (I-SMAC
2017). IEEE, pp. 65–70 (2017)
6. Swetha, S., Suprajah, S., Vaishnavi Kanna, S., Dhanalakshmi, R.: An intelligent monitor system
for home appliances using IoT. In: International Conference on Technical Advancements in
Computers and Communications. IEEE, pp. 106–109 (2017)
7. Li, M., Gu, W., Chen, W., He, Y., Wu, Y., Zhang, Y.: Smart home: architecture, technologies
and systems. In: ICICT-2018. Elsevier, pp. 393–400 (2018)
8. Petnik, J., Vanus, J.: Design of Smart Home Implementation Within IoT with Natural Language
Interface. Elsevier, pp. 174–179 (2018)
9. Yang, H., Lee, W., Lee, H.: IoT smart home adoption: the importance of proper level automation.
Hindawi. J. Sens. 1–11 (2018)
10. Mao, J., Lin, Q., Bian, J.: Application of learning algorithms in smart home IoT system security.
Math. Found. Comput. 63–76 (2018)
11. Jo, H., Yoon, Y.I.: Intelligent smart home energy efficiency model using artificial TensorFlow
engine. Hum. Cent. Comput. Inf. Sci. 8(9), 1–8 (2018)
12. Al-Kuwari, M., Ramadan, A., Ismael, Y., Al-Sughair, L., Gastli, A., Benammar, M.: Smart-
home automation using IoT-based sensing and monitoring platform. In: 2018 IEEE 12th (CPE-
POWERENG 2018), Doha, pp. 1–6 (2018)
Future Smart Home Appliances Using IoT 151
13. Somani, S., Solunke, P., Oke, S., Medhi, P., Laturkar, P.P.: IoT based smart security and home
automation. In: 2018 Fourth International Conference on Computing Communication Control
and Automation (ICCUBEA), Pune, India, pp. 1–4 (2018)
14. Ahmed, B.S., Bures, M., Frajtak, K., Cerny, T.: Aspects of quality in the Internet of Things
(IoT) solutions: a systematic mapping study. IEEE Access 7, 13758–13780 (2019)
15. Batalla, J.M., Gonciarz, F.: Deployment of the smart home management system at the edge:
mechanisms and protocols. Neural Comput. Appl. 31, 1301–1315 (2019)
16. Khalaf, R., Mohammed, A., Essa, E., Ali, H.: Controlling smart home activities using IoT. In:
2019 International Conference on Computing and Information Science and Technology and
Their Applications (ICCISTA), Kirkuk, Iraq, pp. 1–6 (2019)
17. Bhat, O., Bhat, S., Gokhale, P.: Implementation of IoT in smart homes. Int. J. Adv. Res. Comput.
Commun. Eng. 6(12), 149–154 (2017)
18. Shah, H.: Home Automation Using IoT. https://www.simform.com
19. Batalla, J.M., Gonciarz, F.: Deployment of the smart home management system at the edge:
mechanisms and protocols. Neural Comput. Appl. 31, 1301–1315 (2019). https://doi.org/10.
1007/s00521-018-3545-7
20. https://www.slideshare.net/shohin/iot-home-automation-using-arduino-cayenne
21. https://www.businesswire.com/news/home/20200102005197/en/Prominent-IoT-Technology-
Leader-Showcase-Newest-Must-Have
22. Cheruvu, S., Kumar, A., Smith, N., Wheeler, D.M.: Demystifying Internet of Things security
successful IoT Device/Edge and Platform Security Deployment. Springer, pp. 347–411 (2020)
23. Linskell, J., Dewsbury, G.: Home automation system. In: Handbook of Electronic Assistive
Technology. Elsevier (2019)
24. Kadima, M.N., Jafari, F.: A customized design of smart home using Internet of Things. In:
ICIME2017. ACM, pp. 83–86 (2017)
25. Gubbi, J., Buyya, R., Marusic, S., Palaniswami, M.: Internet of Things (IoT): a vision, architec-
tural elements, and future directions. Future Gener. Comput. Syst. 29(7), 1645–1660 (2013)
Multilingual Crawling Strategies
for Information Retrieval from BRICS
Academic Websites
Abstract This paper proposes a web crawler for finding details of Indian origin
academicians working in foreign academic institutions. While collecting data on
Indian origin academicians, we examined the BRICS nations. In BRICS, all countries
except South Africa have university websites in native languages. Even when an
English version is available, it carries less data, which is not enough to decide
whether an academician is of Indian origin. This paper proposes a method for trans-
lating the data from the main website in the native language into English. It is to
be noted that Google translation of such websites does not give output in the desired
manner. We explore the area of translation using various APIs as well as other avail-
able techniques such as UNL, NER (which plays a supportive role in translation),
and NMT. We also explore the Stanford NER and segmenter for these operations.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 153
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_17
154 S. Bharti et al.
1 Introduction
The heart of any search engine is the data collected by its web crawler. A web
crawler can be defined as a program that browses the Internet, storing links to
and information about the pages it visits. One type of web crawler focuses only
on web pages that are relevant to a pre-defined set of keywords/topics; these are
called focused crawlers [1].
To find information about Indian origin academicians [2], various constraints
were devised and different educational domains were studied. A relevancy-checking
mechanism was then developed to guide the focused crawling. Our crawler ran
successfully for all countries in which the academic websites are in English, but a
problem arose when we applied the same approach to non-English websites.
This paper discusses the problems encountered while crawling the BRICS nations,
which host a high number of Indian origin academicians. These websites are mostly
in their native languages and have to be translated via the Google Translate API; this
is handled automatically by browsers but not by web crawlers, which visit the seed
URL directly. This process yields very poor results, as the keywords used to train
the crawler were in English.
2 Problem Statement
Required data needs to be extracted from the converted texts derived from the
webpages. The data to be fetched includes anything that might prove useful for the
project, such as name, department, specializations, contact details, etc. For Chinese
websites, recognizing the above-mentioned data from a translated webpage is diffi-
cult; hence, we need to handle this challenge as well.
Hence, we aim to enhance the current crawler [2] with the capability to handle these
different languages and convert them to English based on the translation mechanism
needed for each particular instance.
3 Techniques Used
UNL is a technique in which the given words/sentences are converted into Universal
words and vice-versa. Thus, it focuses on taking a source language and converting
it into a language which is independent of the given languages, whether source or
target. As shown in Fig. 1, the system of UNL consists of the two core processes,
namely UNLization and NLization, which are explained below.
These two processes are explained in detail.
(1) UNLization (using IAN): Interactive analyser (IAN) is a tool based on JAVA.
It is a web application used for the process of UNLization. Its input is a natural
language (source language), and it converts the given words into UNL form
which is language independent.
(2) NLization(using EUGENE): This is an online software tool. It was developed
by UNDL organization. It is similar to IAN and was released in 2012 [3]. The
UNL created in UNLization process is given as input to it.
UNL Components: The various components of UNL are discussed next.
(a) Universal Words (UWs): Universal expressions consist of nodes, which are
formed, or represented, by the Universal Words. Two other components, namely
relations and attributes, are combined to represent these words. The format
given in Eq. (1) is used to represent a Universal Word in UNL.
<uw> = <headword>[<constraint list>] (1)
To demonstrate this process, we provide the following English expression in
(2). English sentence: Man drives car. (2)
The Universal Words here are
man(icl>person)@singular, drive(icl>travel>do, agt>thing),
car(icl>object)@singular
(b) Relations: Relations are the links that exist between two UWs. The relation
names, which are then used to build UNL expressions, come from a pre-decided
set of names.
(c) Attributes: Attributes depict the subjective nature of a Universal Word in a
sentence.
NER is an initial step in data extraction. The major objective of NER is to locate and
classify the named entities in the provided text, allocating them to pre-defined cate-
gories. These categories can be persons, organizations, expressions of time, locations,
monetary values, symbols, percentages, quantities, etc.
One example of such a mapping is shown in Fig. 2, where a sentence is used to
classify a person and an organization.
Many direct APIs, such as pytrans, googletrans, and the Java translation library, were
used for the translation of Spanish and Portuguese. Direct use of googletrans is
enough for these languages and gives high accuracy for names.
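As a hedged sketch of this direct-API route, the translator object can be injected, which keeps the example runnable without network access; the commented lines show one way to plug in the real googletrans library. The example text is illustrative.

```python
# Sketch of direct API translation for Spanish/Portuguese pages.  The
# translator object is passed in, so any object exposing the same
# translate(text, src=..., dest=...) interface works; googletrans is one
# such option (commented out below because it needs network access).

def to_english(text, src, translator):
    """Translate text from the given source language into English."""
    return translator.translate(text, src=src, dest="en").text

# With the real library (requires network access):
# from googletrans import Translator
# print(to_english("Universidade de São Paulo", "pt", Translator()))
```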
Neural machine translation (NMT) [3] has proved to be one of the most powerful
and efficient approaches to natural language translation. Statistical translation uses
only data for translation, which raises issues of wrong sentence translation, as the
results often make no sense after the process.
One NMT model is the encoder-decoder structure. As shown in Fig. 3, this archi-
tecture comprises two recurrent neural networks (RNNs) used together in tandem to
create a translation model. Coupled with attention mechanisms, this architecture can
achieve impressive results [4].
4 Methodology
For BRICS, we modified this approach to bring in translation as well, which makes
the final approach as follows:
1. A list of University URLs is fed to crawler which visits them.
2. It extracts those URLs which might have faculty member names in the pages.
3. Applying NER on those pages, we can find proper nouns, which are mostly
names.
4. These names, in case of BRICS nations, are translated to the English language.
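The four steps above can be sketched as a single pipeline function. All helper names below (fetching, link extraction, NER, translation) are hypothetical stand-ins injected as parameters, since the paper does not fix those interfaces.

```python
# Sketch of the four-step BRICS crawling pipeline.  Every helper is
# injected, so the skeleton stays independent of the concrete crawler,
# NER tool, and translation service used in the project.

def crawl_universities(seed_urls, fetch, extract_faculty_links,
                       ner_persons, translate_to_english):
    """Visit seed URLs, follow faculty pages, and return translated names."""
    names = []
    for url in seed_urls:                                    # Step 1
        for link in extract_faculty_links(fetch(url)):       # Step 2
            page = fetch(link)
            for person in ner_persons(page):                 # Step 3
                names.append(translate_to_english(person))   # Step 4
    return names
```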
5 Data Gathering
The list of Indian names required for matching has been taken from Kaggle [5].
We also used the list of names from our ongoing project's database of Indian origin
academicians in the US and UK. Further names were extracted from Lok Sabha and
parliamentary elections [6], since a candidate in these elections must be an Indian
citizen.
To start the crawling, we need URLs of the academic websites to act as seed
URLs from which the crawler reaches the academicians. The following approaches
were used to obtain the list of seed home-page URLs for the universities in the
BRICS nations:
1. Higher education boards and sources [7]
2. Seed URLs using Google Maps crawling [8].
6 Counter-Intuitive Approach
The proposed approach to deal with the change in names caused by literal word-by-
word translation in Google Translate is as follows. We compare the English version
of a name with its Chinese translation, so that we can map the Chinese translation
back to the original Indian name [9].
The first part of this is creating a mapping of Indian to Chinese names. For this,
we used Google Translate to find the Chinese translation of each Indian name, then
translated that Chinese name back to English to check the consistency of the trans-
lation, repeating the process one more time. This was done for 33,000 Indian names
derived from the datasets mentioned above.
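The back-and-forth chain just described can be written as a small helper; the translate function is injected (in practice a Google Translate call), and the default hop sequence mirrors the English to Chinese to English chain. Names and language codes in the example are illustrative.

```python
# Sketch of the round-trip consistency check applied to the name list.
# translate(text, src, dest) is supplied by the caller (e.g. a Google
# Translate wrapper); it is kept abstract here so the logic is clear.

def round_trip(name, translate, hops=("zh-cn", "en", "zh-cn", "en")):
    """Return [original, t1, t2, ...] after successive translations."""
    chain, src = [name], "en"
    for dest in hops:
        chain.append(translate(chain[-1], src, dest))
        src = dest
    return chain

def is_consistent(chain):
    """True when the final English form still matches the original name."""
    return chain[0].lower() == chain[-1].lower()
```

Names for which `is_consistent` fails are exactly those whose literal translation drifts, as Fig. 4 shows for columns A through E.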
As shown in Fig. 4, the first column is the original name, and each consecutive
column is the Google Translate output for the previous column. The reader can
clearly observe the total change in the name from column A to column E after these
translations. So, the set of translations is as follows:
English -> Chinese -> English -> Chinese -> English
In the first case, we look for Indian names in an English article that contains
Indian names, counting the Indian names manually from the NER-labeled PERSON
entities in the text and also making a count by comparison with the dataset above.
Next, we convert the English article to a foreign language, say Chinese, using Google
Translate, and then search for the Indian names in the translated Chinese text using
the lists of names in columns B and D. Surprisingly, the counts in the two cases come
out different. Here, in the translated Chinese text file, we use a regex to remove any
English words present in the file.
Next, the translated Chinese text is converted back to English and tested for names
using NER, as in the first case.
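In both cases the counting reduces to intersecting the NER PERSON entities with the name dataset. A minimal case-insensitive sketch (all entity and name values in the test are illustrative, not the paper's data):

```python
# Sketch of the match count between NER PERSON entities and the Indian
# name dataset; comparison is case-insensitive.

def count_matches(person_entities, indian_names):
    """Count NER PERSON entities that appear in the name dataset."""
    known = {name.lower() for name in indian_names}
    return sum(1 for entity in person_entities if entity.lower() in known)
```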
7 Results
Figure 5 shows the results from translation of entities of Chinese Websites. First row
shows the length of entities we get after running NER on the sites—1818. In this
corpus, the entities designated as PERSON were 2869, and it matched 2239 names
from our dataset of names, whose size is 30,000. Also, total matches for overall
entities is 35,821 (includes names, as well as other text on the website pages).
Now, as shown in Fig. 4, we translated the 30,000 names back and forth between English and Chinese. Using the final translated names from those results, we match them again against the NER entities. This time, the matches for the English text were reduced to 450, and the Chinese matches were reduced to 317.
Rows 8 and 9 similarly show the matches for the Chinese versions of the words. After applying all these translations to the text, the final number of matches with the dataset, shown in Row 11, is 7814, a decrease of 78.18% from Row 4, where the original text produced 35,821 matches.
8 Conclusion
This paper discussed various techniques and the results obtained after applying the viable ones to the corpus of data. The results can be divided into three categories for the BRICS nations:
A. South Africa: The data is already in English and hence needs no translation; direct extraction of data is possible.
B. Brazil: The websites are in Portuguese and Spanish. Neither language changes the names of people, and hence the names need not be translated.
C. Russia and China: The Russian and Chinese character sets differ from English; some letters, such as M and V, are not present in one or both of these scripts.
Hence, to generalize, any language whose character set is the same as that of English is easily translatable using currently available methods, but other languages need different methods to achieve the same result. A combination of NER and NMT-based translators seems a viable option.
References
1. Kumar, M., Bhatia, R., Rattan, D.: A survey of Web crawlers for information retrieval. Wiley Interdisc. Rev. Data Min. Knowl. Discov. 7(6), e1218 (2017)
2. Kumar, M., Bindal, A., Gautam, R., Bhatia, R.: Keyword query based focused Web crawler.
Procedia Comput. Sci. 125, 584–590 (2018)
3. Neural Machine Translation (online article). https://towardsdatascience.com/neural-machine-
translation-15ecf6b0b. Last accessed 1 March 2020
4. Wang, X., Zhu, C., Li, S., Zhao, T., Zheng, D.: Neural machine translation research based on the semantic vector of the tri-lingual parallel corpus. In: 2016 International Conference on Machine Learning and Cybernetics (ICMLC), Jeju, pp. 69–74. https://doi.org/10.1109/icmlc.2016.7860879
5. https://www.kaggle.com/chaitanyapatil7/indian-names/version/1 [online dataset]
6. https://github.com/datameet [a community of Data Science enthusiasts.]
7. https://www.ugc.ac.in/oldpdf/Consolidated%20list%20of%20All%20Universities.pdf
8. https://github.com/shivamkathuria/Google-Maps-Crawler [to get code of developed crawler]
9. Creekmore, L.: Named entity recognition and classification for entity extraction. District Data
Labs
Missing Phone Activity Detection Using
LSTM Classifier
Abstract We propose a smart phone application that helps a user find a lost phone. In this application, we try to identify the cases in which a mobile phone can be separated from its user by training a classifier. The application identifies specific events in which a mobile phone goes away from the user and records cumulative sensor data at that moment. The recorded sensor data can be used for further analysis of the phone's surroundings, which narrows down the search domain. A simple example: GPS cannot help when you have forgotten where the phone was placed. The application, however, can gather information about the surroundings at the last recognized event and make the search more effective.
1 Introduction
Science and technology in today's world have advanced to a great extent, in an effort to make human life easier and, importantly, more comfortable. A lot of information that we might need can be stored in a smart phone, so a mobile phone acts as our secondary brain: it not only remembers but also reminds, sees, and listens, covering basically all the senses humans have. With all these abilities, a smart phone has become a helpful companion in our daily life; what if you lost it or forgot it at some place, and it was low on battery, causing it to turn off eventually? We will be left
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 161
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_18
162 A. Rastogi et al.
clueless. There aren’t many efficient ways to find the phone; the existing solutions
give you a GPS location, which could be helpful only when outside. And the mobile
need to be alive when the location request is triggered.
What if you were doing some task at home and forgot your phone, such as placing it on a bookshelf while picking out books? GPS can only say that the phone is in your home, nothing more. As noted, a smart phone has sensing capabilities like those of humans, and we want to use this to our advantage. A mobile phone can be made to sense its surroundings, such as lighting, coordinates (indoor positioning where available), and sound signals (to identify whether the environment is silent or noisy), along with GPS, at the instant the phone gets separated from the user [5]. On sensing, that is, on recording the sensor information, the phone can upload it to a server, so that the surrounding information is known in all cases when the phone identifies the trained instances. In this paper, we propose a mobile phone
monitoring application. The application serves as an aid, helping narrow down the details of the surroundings when a phone gets separated from the user. Existing solutions make the user query for the location after realizing that the phone is lost; the application can then return the GPS location by contacting the phone, provided the phone is on. Our application, in contrast, addresses the problem of the phone being separated from the user through the question, "How or when can a phone be separated from the user?" We thus identify trigger events, which indicate when a phone gets separated from the user. With this approach, the user can get the latest information about the phone's location before it is dead (turned off). The trigger events can be a phone being dropped from a pocket or hand unnoticed, being placed somewhere and forgotten due to some other important distraction, or being stolen from the user [2, 6, 7].
2 Related Work
Losing a phone can be considered one of the most common yet hardest-to-recover problems a mobile user can face. On losing a phone, a user would like to know where the phone currently is and does not want the phone to be misused. Mobile operating systems like Android and iOS have apps, Find your phone and Find My iPhone, that can query the device for its location. The disadvantage of this approach is that the phone needs to be on when the query is made; otherwise, the phone cannot respond. A smart phone can learn the location of a neighborhood not only through physical coordinates but also in a logical way. That is, a smart phone can sense the light ambiance and sound in a neighborhood, describing logically whether the location is a quiet or noisy place, a dark or bright area. Research is also being done in this field, where a mobile phone can locate users in adjacent stores, solving the problem through logical localization [5]. Adding this to the application reduces the search domain, making the work easier for users. The other approach is finding the trigger event of the phone being lost and alerting the user. The authors of [1] proposed a framework called iGuard which triggers an alarm when it identifies a theft. Their paper addresses the problem by analyzing the activity during the moment of theft. That is, they described
how the sensor signatures of a mobile phone differ when it is taken out by a thief versus by the owner, thus alerting the user instantly on theft. However, the disadvantage of this approach is that feature extraction is done manually by the researchers, which is a very cumbersome task. For the activity of a missing phone, it is possible to miss the actual factors responsible for differentiating two different actions. Our application aligns with their paper in asking how or when a mobile phone could be separated from the user. In this paper, we use a deep learning approach to classify our sensor data for different activities.
3 System Design
The application at a high level can be divided into communication among three components: (i) the mobile phone, which monitors the user's activity; (ii) a classifier, which takes in the sensor information from the mobile phone and identifies the trigger events; and (iii) an online platform, where the sensor information about the neighborhood is logged automatically for further logical analysis when a trigger event is identified.
The assumptions for the app to be functional are as follows: (i) the application needs to be started manually and continuously monitors the user activity for triggers; (ii) the mobile phone must be on when the trigger event occurs; (iii) the mobile phone must have an Internet connection so that the sensor information can be logged to the online platform. Figure 1 gives a high-level overview of the system. The most important and challenging component of the system is identifying the trigger event. The classifier must differentiate short-duration activities, like taking the phone out of a pocket, using the features extracted from the sensor signatures of the various activities. Once a trigger has been identified, the system can log sensor information onto an online platform for further analysis.
Fig. 2 Algorithm
4 Algorithm/Method Design
5 Experimental Setup
5.1 Preliminary
In this section, we first develop various scenarios in which a person can be separated from his phone. Specifically, we take three different situations into consideration: (i) the user places the phone on a surface and forgets about it; (ii) the user drops the phone while walking or standing; (iii) the user's phone is stolen from his pocket by a thief or perpetrator. The implementation in this paper focuses on creating distinct signatures for each of these activities and then training a classifier to demonstrate how mobile sensor data can be used to warn the user about the missing phone problem in real time.
We collect 50 samples per volunteer per activity, leading to a total of (50 × 7 × 4) 1400 samples. In addition, all experiments are video recorded, and the data collected from the filters is labeled manually by comparing the sensor data to the video frame by frame.
The collected sensor data was passed through a low-pass filter to remove noise. Next, the data collected from each experiment per volunteer was sampled with a window size of 3 s with 50% overlap. Since the frequency of data collection is 10 Hz, we end up with 30 samples of data per input. Each sample is represented by 12 features, comprising the x, y, z values of each of the four sensors involved, namely the accelerometer, gyroscope, linear acceleration, and gravity sensors.
Data processing has been done using Python’s SciPy and NumPy libraries.
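The preprocessing described above can be sketched with SciPy and NumPy; the filter order and cutoff frequency below are illustrative choices, not values reported in the paper:

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 10                      # sampling rate (Hz)
WINDOW = FS * 3              # 3 s window -> 30 samples
STEP = WINDOW // 2           # 50% overlap -> 15-sample step

def low_pass(raw, cutoff_hz=3.0, order=4):
    """Low-pass filter each of the 12 sensor channels to remove noise."""
    b, a = butter(order, cutoff_hz / (FS / 2), btype="low")
    return filtfilt(b, a, raw, axis=0)

def segment(data):
    """Yield (30, 12) windows with 50% overlap."""
    for start in range(0, len(data) - WINDOW + 1, STEP):
        yield data[start:start + WINDOW]

raw = np.random.randn(300, 12)       # e.g. 30 s of 12-channel sensor data
segments = list(segment(low_pass(raw)))
```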
5.4 Training
In most previous models for theft detection, feature extraction is done manually by the researchers. For example, [1] proposed a framework called iGuard, where the authors explicitly look for a specific signature in two activities: the phone being taken out by the user himself, and the phone being stolen by a perpetrator.
Their paper mentions how, in the event of the user taking out the phone himself, the speed of the user first decreases; then the phone is taken out, and then normal speed is resumed.
In the other case, of the perpetrator taking the user's phone, the phone is first taken out, and then the speed of the perpetrator increases. These scenarios, though true for most cases, are highly specific, and in an actual theft the user or the perpetrator may not act like the model figures they are represented as in the application. Feature extraction is a highly cumbersome task and requires precise feature engineering [4]. For the activity of a missing phone, it is possible to miss the actual factors responsible for differentiating two different actions.
Also, as we consider more and more sensors in our application, it becomes challenging to analyze the effect of each sensor on different activities. To counter the above scenarios, we use a deep learning approach to classify our data for different activities, employing an LSTM recurrent neural network to classify the mobile sensor data. The advantage of using an LSTM is that, while giving accurate results, it does the feature engineering for us [4]. We can also avoid the hassle of extensive signal processing before using the sensor data in our model. We chose an RNN for our activity identification model because we were dealing with a sequence
of sensor data, and because we wanted the neural network to learn the hidden features that differentiate two different activities.
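To make the choice concrete, here is a minimal NumPy sketch of a single LSTM cell unrolled over one 3 s sensor window. The random weights stand in for trained ones; in the actual application the weights are learned, and the final hidden state would feed a softmax over the activity classes:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_last_hidden(x_seq, W, U, b, hidden):
    """Run one LSTM layer over a (T, F) window; return the final hidden state."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in x_seq:
        z = W @ x_t + U @ h + b              # four stacked gate pre-activations
        i, f, o, g = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)           # cell state: long-term memory
        h = o * np.tanh(c)                   # hidden state: window summary
    return h

rng = np.random.default_rng(0)
T, F, H = 30, 12, 8                          # 3 s at 10 Hz, 12 channels, 8 units
W = 0.1 * rng.normal(size=(4 * H, F))
U = 0.1 * rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h_last = lstm_last_hidden(rng.normal(size=(T, F)), W, U, b, H)
```

The hidden state `h_last` is the learned summary of the window that replaces hand-crafted features.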
5.5 Implementation
The classifier has been implemented in Python using TensorFlow. All sensor processing, data preprocessing, and data analysis have been done using the scikit-learn, NumPy, matplotlib, and pandas libraries in Python, along with Jupyter Notebooks.
6 Performance Evaluation
In this section, we evaluate the performance of our model under various scenarios to show its accuracy and robustness. We have tested our model on Android phones (Pixel 2, Samsung S7, Samsung Note 3, Samsung S8). The sampling rate is set to 10 Hz. This sampling rate is conducive to our model because each activity of taking the phone out, or of the phone being stolen by a perpetrator, takes roughly 3 s. A sampling rate of 10 Hz gives us 30 readings per sensor per time window, which is enough to train our model. All experiments were conducted in four different settings: (i) a library, (ii) an open area, (iii) an open area with people around, and (iv) an office-like setting. To evaluate the accuracy of our model, we randomly segment our dataset into training and test sets in the ratio 4:1. Figure 4 highlights the accuracy of our model, 95.52%, which is fairly good. We also report the precision (96.03%), recall (95.51%), and F1 score (95.50%) of our model. Figure 3 shows how the loss over both training and testing data decreases, and the accuracy of the model increases, with the number of training iterations.
In the confusion matrix depicted below, we find that while the activities of sitting, standing, and placing the phone on the table are detected without a miss, certain segments (25) of walking are misclassified as the phone being dropped or as taking the phone out of a pocket. Apart from that, only four segments of the phone being taken out by the user were classified as it being stolen by a thief, and only eight segments of the phone being stolen by a thief were classified as the phone being taken out by the user (Figs. 5 and 6).
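The 4:1 split and the reported metrics can be reproduced in outline with scikit-learn. The data below is synthetic and a random forest stands in for the paper's LSTM, purely to show the evaluation pipeline, not to reproduce the reported numbers:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split

# synthetic stand-in for the 1400 labeled windows over 7 activity classes
rng = np.random.default_rng(0)
X = rng.normal(size=(1400, 30 * 12))     # each window flattened to 360 values
y = rng.integers(0, 7, size=1400)

# 4:1 train/test split, as in the paper
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

acc = accuracy_score(y_te, y_pred)
prec = precision_score(y_te, y_pred, average="weighted", zero_division=0)
rec = recall_score(y_te, y_pred, average="weighted", zero_division=0)
f1 = f1_score(y_te, y_pred, average="weighted", zero_division=0)
cm = confusion_matrix(y_te, y_pred, labels=list(range(7)))
# cm rows: true class, cols: predicted class
```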
7 Conclusion
The proposed solution can successfully identify the triggers with around 95.5% accuracy and can thus send the log information needed for an effective search. The application, however, has a few drawbacks. It is of no use when the trigger event happens while the mobile phone is turned off. One possible remedy is to transmit a signal from the phone even when it is turned off, using the bios battery, so that minor log information can be embedded in the signal and sent for recovery. The app must also be opened manually in order to check for triggers. Moreover, the phone being dropped and the user falling down along with the phone can give the same results; the classifier can be further trained for such scenarios. Overall, the proposed application can successfully recognize the trigger events, which play a vital role in automated sensor logging. Through automation, this removes the window between a user recognizing that the phone is missing and responding to it, thus helping in finding the phone.
References
1. Jin, M., He, Y., Fang, D., Chen, X., Meng, X., Xing, T.: iGuard: a real-time anti-theft system for
smartphones. IEEE Trans. Mob. Comput. 17(10), 2307–2320 (2018). https://doi.org/10.1109/
tmc.2018.2798618
2. Liu, X., Wagner, D., Egelman, S.: Detecting phone theft using machine learning, pp. 30–36
(2018). https://doi.org/10.1145/3209914.3209923
3. Chang, S., Lu, T., Song, H.: SmartDog: real-time detection of smartphone theft. In: IEEE Interna-
tional Conference on Internet of Things (iThings) and IEEE Green Computing and Communica-
tions (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart
Data (SmartData), Chengdu, pp. 223–228 (2016). https://doi.org/10.1109/ithings-greencom-
CPSCom-SmartData.2016.61
4. Pulver, A., Lyu, S.: LSTM with working memory. In: International Joint Conference on Neural
Networks (IJCNN), Anchorage, AK, pp. 845–851 (2017). https://doi.org/10.1109/ijcnn.2017.
7965940
5. Uddin, M.P., Nitu, A.: A tracking application for lost/stolen android phones using face detection
(2015)
6. Carrara, F., Elias, P., Sedmidubsky, J., Zezula, P.: LSTM-based real-time action detection and
prediction in human motion streams. Multimedia Tools Appl. 78, 27309–27331 (2019)
7. Senyurek, V.Y., Imtiaz, M.H., Belsare, P., Tiffany, S., Sazonov, E.: A CNN-LSTM neural network
for recognition of puffing in smoking episodes using wearable sensors. Biomed. Eng. Lett. 10,
195–203 (2020)
Suvarga: Promoting a Healthy Society
R. L. Priya, Gayatri Patil, Gaurav Tirodkar, Yash Mate, and Nikhil Nagdev
Abstract In India, over 22% of the population is below the poverty line. This poverty pushes people onto the streets, which in the future transforms into slums. These slums, being unplanned, lack certain necessities like electricity, sanitary services, and basic hygiene resources, making them a hub for the spread of diseases. The primary aim of this paper is to identify the leading causes of diseases in the slum areas of Mumbai using data collected from IoT modules, health checkup drives, and various government authorities. With this information, the concerned civic authorities and slum residents will be alerted regarding the danger so that necessary action can be taken. This, in turn, promotes a healthier society in the various slum regions of India.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 171
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_19
172 R. L. Priya et al.
1 Introduction
According to United Nations estimates (UN 2009), only 4% of the terrestrial surface is occupied by cities [1]. Though this percentage is low, more than half the world's population lives in these cities, which generates a huge imbalance in world resources, as this section consumes three-quarters of the world's natural resources.
For upgrading these slums, the first action commonly taken is to demolish the slums and relocate the residents, but since 1970 there have been multiple recommendations by authors such as Turner (1972) which suggest otherwise. This gave birth to the concept of upgrading slums and their residents to a better standard of habitation [2]. This paper uses data analysis and deep learning to better understand this approach and to provide solutions for implementing it.
2 Literature Survey
Slum management has always been a major issue in cities like Mumbai [3]. Current research by the SRA maps information about residents on a website with the help of drones, and the government has gathered demographic data using this method [4]. Although this method covers many data fields, major fields like health factors and pollution measures are ignored.
Improved infrastructure can prove to be a major catalyst for achieving major sustainable development goals. Taking this into consideration, the UN-Habitat Opinion Survey method, which was based on the nature of social reality and the perspective of the researchers, was applied to slum residents in Africa. The analysis showed that infrastructure in Africa can be developed primarily through proper water supply, road networks, and telecommunication [4].
To understand the positive and negative implications of upgrading slums, a case study was conducted in the Moravia neighborhood of Medellin. The principles of urban design strategies and urban rehabilitation programs were identified through technical documents and through qualitative and quantitative data collected via surveys at the community level [5].
A non-integrated framework was adopted to evaluate the suitability of the interior design of a low-income multipurpose apartment for providing enhanced IEQ. Here, an expert opinion survey was taken, AHP-TOPSIS was performed, and the final optimized solution was generated [6]. This research covered only the interior design, not other parameters like pollution, health, and geographical aspects.
3 Proposed System
3.1 Overview
To aid the health situation in the country, the proposed model takes a novel approach by building an Internet of Things (IoT)-based intelligence system. The model provides regular updates about the chances of epidemics, symptoms to spot those diseases, and emergency contacts of concerned doctors along with a few home remedies. It also alerts the government authorities, with the aim of creating the necessary awareness among both the authorities and the slum residents.
The system is composed of various modules: data collection from various sources (IoT, BMC health data, water data, sanitation, survey data), preprocessing, feature extraction, training and testing of models, and display of the final output on the web and mobile applications, as listed below and shown in Fig. 1.
The detailed workflow of Suvarga, as shown in Fig. 2, describes the data collection from heterogeneous sources and the data preprocessing used to build a better prediction model.
To calculate the air quality index of specific regions, we have built an IoT-based module. Each module is composed of MQ-series sensors (MQ135, MQ2, MQ3) to measure the air quality index, mounted on an ESP8266 (Table 1). Poor water quality is a big issue, especially in slum regions. To test the water quality, the model uses the BMC water quality monitoring module. The data gives the pH, dissolved oxygen, BOD, COD, etc. Over ten years of data has been collected.
4.3 Sanitation
Sanitation data from BMC gives the distribution of toilets for men and women in the
Chembur region. It provides information on the number of toilets with respect to the
number of people.
The model was trained using algorithms such as long short-term memory (LSTM) and decision trees; the algorithms were later compared to choose the best one.
The proposed system aims to build an intelligent system for promoting healthy living in various slum regions of India. It consists of two main components: the prediction or analysis model and the data visualization model. The latter displays the analysis obtained from the prediction model in graphical formats via the web application.
Using the data collected from the IoT module for air quality parameters, and other data collected from government authorities such as the BMC and MPCB, an analysis algorithm is run to predict air and water quality values and the correlation among various features of the dataset. The LSTM algorithm is applied to the data, and the correlation among features is found using Pearson's correlation as available in the pandas library.
The final outcome of all the analysis needs to be presented to the layman in understandable terms; hence, a user-friendly web app is built for the government authorities as well as the slum residents. Each can access multiple features of the web app, such as predictive analysis of future air and water quality, basic care, and home remedies to protect oneself and loved ones from epidemics.
5 Implementation
The aggregated data provided by the government is useful for data analysis over a long time, but there can be mishaps. To address this problem, Suvarga has developed a network of IoT devices to be installed in the slums. These devices act as a network, continuously monitoring the various air quality parameters. Three IoT devices are fitted at three corners of the slum area. The devices act in unison, forming a mesh and transferring data to a common device acting as the source, which forms a server.
The data received from the IoT module is sent to a centralized Cayenne data visualization server, where it can be visualized in real time and plotted on a live graph. A trigger is activated when a sensor gives a value that crosses a threshold, indicating that an accident has taken place. An instant notification, in the form of an SMS and email alert, is sent to the concerned government authority. With this, the government can send instant relief or take the necessary actions to pacify the toxic environment. Figure 3 shows the experimental setup of the real-time air quality monitoring device.
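The trigger logic can be sketched as below. The threshold value and the `notify` callback, which would wrap the SMS/email gateway, are illustrative assumptions rather than details from the paper:

```python
AQI_THRESHOLD = 300   # assumed alert level; the deployed value would be calibrated

def check_reading(sensor_id, value, notify):
    """Send an alert when an air-quality reading crosses the threshold."""
    if value > AQI_THRESHOLD:
        notify(f"ALERT: sensor {sensor_id} reported AQI {value} "
               f"(threshold {AQI_THRESHOLD})")
        return True
    return False

alerts = []                                   # stand-in for the SMS/email gateway
check_reading("mq135-corner-1", 412, alerts.append)   # above threshold -> alert
check_reading("mq135-corner-2", 85, alerts.append)    # normal reading -> no alert
```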
A slum health drive was conducted for the residents of the slum adjoining VESIT on 25 January 2020. The parameters obtained from the health drive are name, gender, age, weight, height, poverty status, toilet, drainage linked to the house, waste collection system, compost pit, source of water, washing of clothes and utensils, alcohol, diabetes, hypertension, cholesterol, level of education, Aadhaar card, authorized electricity connection, bank account, computer literacy, and source of income. Figure 4 shows the distribution of blood pressure among the residents. Figure 5 shows the percentages of people who were healthy by weight, overweight, or underweight.
Open defecation has been an onerous issue for a while, causing a variety of health-related issues. The team decided to collect sanitation data for the Chembur region from the government authorities through BMC offices. The dataset obtained was a CSV file encompassing various parameters, including hypertension, fever, asthma, communicable diseases, etc., and comprises 172 rows (records) and 7 columns (attributes).
A ward-by-ward analysis is done to ensure that proper sanitation facilities exist in every ward so as not to strain the resources. The groupby method in pandas (a Python library for data frames) is used to group the data by ward. To find out whether all wards have commensurate numbers of toilets, a pie chart has been plotted showing the distribution of toilets in the region, and a discrepancy has been observed: while the number of toilets in ward number 154 soars as high as 32, the number in ward number 149 is a meager 3. The total number of toilets in the region is 2669. According to research, the ideal number of people per toilet is 100, but using the population data obtained, the actual number is close to 457 (Fig. 6).
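The ward-level aggregation can be reproduced with a small pandas sketch. The toy numbers, column names, and population figure below are ours, chosen only to mirror the reported extremes, not the BMC data:

```python
import pandas as pd

# toy ward-level sanitation records (assumed column names)
records = pd.DataFrame({
    "ward":    [149, 149, 154, 154],
    "toilets": [1, 2, 16, 16],
})

per_ward = records.groupby("ward")["toilets"].sum()   # toilets per ward
total_toilets = int(per_ward.sum())

population = 16_000                                   # hypothetical population
people_per_toilet = population / total_toilets
```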
The health data was obtained from hospitals in the vicinity of the area, where cases of malaria and other respiratory infections occurred sporadically. The dataset obtained from BMC offices was month-wise historical data for the years 2017 and 2018 and contained the name of the dispensary, the month under consideration, and the average levels of health-related parameters such as the total number of people suffering from asthma, malaria, URTIs, and heart diseases, to name a few. The air and water quality data encompassed historic data for the past four years, whereas the health data procured from the BMC authorities covered 2017 to 2018. To avoid any aberrations, the air and water quality data of 2017 and 2018 is taken into account along with the health data of the same two years.
Correlation between all the parameters is obtained, and the results of which are
stored in a correlation matrix. All the correlations are stored in the matrix and then
sorted in ascending order using quick sort. The variables having the most correlation
are shown in Fig. 7.
The correlations above, however, relate attributes within the same table. A correlation between air quality parameters and URTIs is therefore established, and in the same way, malaria is correlated with the biological oxygen demand of the water.
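The correlation step can be sketched as follows; the column names and numbers are illustrative stand-ins, not the BMC data:

```python
import pandas as pd

# hypothetical merged monthly records
df = pd.DataFrame({
    "rspm":    [98, 110, 125, 140, 131, 150],
    "urti":    [40, 44, 52, 60, 55, 63],
    "bod":     [3.1, 3.4, 2.9, 3.8, 3.6, 4.0],
    "malaria": [12, 15, 11, 18, 16, 20],
})

corr = df.corr(method="pearson")        # full Pearson correlation matrix

# flatten to (feature pair, r), drop self/duplicate pairs, sort ascending
pairs = corr.stack().reset_index(name="r")
pairs = pairs[pairs["level_0"] < pairs["level_1"]].sort_values("r")
```

The sorted `pairs` table plays the role of the sorted correlation matrix described above, making the strongest cross-feature relations easy to read off.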
According to research, there is an association between upper respiratory tract infection and respirable suspended particulate matter (RSPM) [7]. An increase in particulates is detrimental to health, as it causes a variety of conditions related to the respiratory tract. The analysis conducted (Fig. 8) also shows a direct positive association between the two factors.
The comparative study between the regression algorithms suggests that Decision Tree Regression achieves the lowest error rate when evaluated with regression error metrics comprising mean squared error (MSE), mean absolute error (MAE), and the R2 score, as compared to the other regression algorithms tested on these metrics: Lasso Regression, Lasso Lars Regression, Bayesian Regression, and Random Forest Regression (Fig. 9).
Mean squared error is a metric that tells how close the predicted points are to the
actual points on the regression line.
L = (1/N) Σ (Ŷ − Y)²    (1)
Mean absolute error measures the difference between the actual and predicted values.
MAE = (1/n) Σᵢ₌₁ⁿ |xᵢ − x|    (2)
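These metrics, together with the R2 score, can be computed directly with scikit-learn. The synthetic data below is only a stand-in for the air/water quality features, used to show the evaluation of a decision tree regressor:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                         # stand-in feature matrix
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

reg = DecisionTreeRegressor(random_state=0).fit(X[:160], y[:160])
pred = reg.predict(X[160:])

mse = mean_squared_error(y[160:], pred)               # Eq. (1)
mae = mean_absolute_error(y[160:], pred)              # Eq. (2)
r2 = r2_score(y[160:], pred)                          # closer to 1 is better
```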
The research undertaken particularly focuses on improving the health and sanitation facilities in the slums of the Chembur, Mumbai region. Harnessing the potential of artificial intelligence, data analysis, and the Internet of Things (IoT), the proposed system predicts the patterns in air quality, water quality, and sanitary facilities and establishes a strong interdependence between these environmental and sanitary factors and the health of the individuals residing there.
Through the sanitation data, it was found that the number of people per toilet was 456.955, a lot higher than the ideal ratio of 100 people per toilet. A correlation is also established between the number of malaria patients in the hospitals in the vicinity of the slums and the water quality index. It was concluded that URTI and RSPM have a correlation of 0.547853, which is significant.
Data from the NGO health drive indicated that 57.15% of residents were either overweight or underweight. Moreover, the R2 score obtained from decision trees has a value of 0.99, indicating an almost perfect prediction. The government could use the findings of this research to take appropriate actions to assuage the detrimental effects of poor well-being, unclean surroundings, and a polluted environment on the dwellers of the slums.
References
1. United Nations: World Population Prospects: 2009 revision, Population and Development
Division, Department of Economics, and Social affairs (2009)
2. Ahmed, I.: Building resilience of urban slums in Dhaka, Bangladesh. Procedia-Soc. Behav. Sci. 218 (2016)
3. Dikhle, S., Lakhena, R.: GIS-Based slum information management system. In: 17th Esri India
User Conference (2017)
4. Arimah, B.: Infrastructure as a catalyst for the prosperity of African cities. In: Urban Transitions
Conference, Shanghai (2016)
5. Vilar, K., Cartes, I.: Urban design, and social capital in slums. Case Study_ Moravia’s
Neighborhood, Medellin (2004–2014)
6. Sarkar, A., Bardhan, R.: Improved indoor environment through an optimised ventilator and
furniture positioning: a case of slum rehabilitation housing, Mumbai, India. Accepted 1 Dec
2019
7. Li, Y.R., Xiao, C.C., Li, J., Tang, J., Geng, X.Y., Cui, L.J., Zhai, J.X.: Association between air
pollution and upper respiratory tract infection in hospital outpatients aged 0–14 years in Hefei,
China: a time series study. Public Health 156, 92–100 (2018)
Multi-task Data Driven Modelling Based
on Transfer Learned Features in Deep
Learning for Biomedical Application
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 185
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_20
186 N. Harini et al.
data samples in deep learning, adapting them for biomedical application. We also extend the model to classify the gender and predict the age of patients using features extracted from spine CT volumes. This extension might help in analysing spine diseases for a particular time or place even when no metadata is available. The proposed method uses transfer learning to extract a feature descriptor for each spine scan image in each spine sequence. These feature descriptors are then fed to an LSTM network combined with dense layers to localize the spine vertebrae for each sequence. The final dense-layer features are used to classify the gender and to predict the age of the patient using a random forest classifier and regressor, respectively. We also report a comparative study of our results against previous techniques that used the same dataset. Furthermore, age prediction and gender classification are evaluated using validation splits of 10, 20 and 30% of the available training set. The major contributions and novelty of the proposed method are as follows:
1. Limited availability of the data is handled with transfer learned feature extraction
for spine CT volumes in the MICCAI 2014 challenge dataset.
2. The LSTM network is utilized to extract continuity information from feature
descriptors of the spine volumes, where each feature descriptor is handled as
each instance of the spine scan.
3. A novel extension to identify the age and gender of the patient.
2 Literature Survey
A large number of spine scan images is necessary to model a machine learning problem yielding high localization accuracy. This creates challenges where the availability of data is scarce due to legal and medical constraints. Furthermore, performing segmentation or localization tasks requires annotating large volumes of data (by domain experts), which is a tedious and challenging task. Scarce data is commonly handled using traditional augmentation techniques such as translation and rotation [9, 10] and GANs [7], as proposed for classification [5] and segmentation [2]. Though data generation helps to improve many computer vision applications, generating spine scans comparable to the originals is risky and highly challenging. We have therefore used another approach: transfer learning a feature extraction network, which leverages existing experience without any synthetic data generation [13]. Transfer learning approaches using pre-trained CNN networks have improved results in various medical imaging applications, as shown in [8, 13]. Among other methods for the localization of spine vertebrae, Glocker et al.'s random forest (RF) regression and Hidden Markov Model achieved benchmark results on the MICCAI 2014 Computational Challenge on Vertebrae Localization and Identification [6] dataset, using hand-crafted features from CT spine volumes. Chen et al. proposed three stages: coarse vertebra candidate localization, vertebrae identification using a JCNN, and localization refinement with a shape regression model [3]. This method employs a binary
Multi-task Data Driven Modelling Based on Transfer … 187
3 Methodology
The overall architecture of the proposed work is shown in Fig. 1. It has three stages,
where each stage serves as a feature extractor for the next stage.
Dataset Description: MICCAI 2014 conducted a challenge titled "Vertebrae Localization and Identification", which consists of 242 training and 60 testing scans of CT spine volumes. Each scan in the training and testing sets is manually annotated with three-dimensional vertebrae centroids with labels; the metadata, such as the age and gender of the patient, is available only for the training set. Within each scan in the dataset there is a varying number of CT images, between 31 and 511 grayscale images of size 512 × 512. The vertebrae centroids available total 26: 7 cervical, 12 thoracic, 5 lumbar and 2 sacrum.
Fig. 1 Overall architecture proposed for the identification and localization of vertebrae centroids and estimation of age and gender
Stage 1—Transfer Learned Feature Descriptors: The extraction of feature descriptors from the CT volumes is an essential step in training an automatic localization model. Each scan is converted to a feature vector using a pre-trained DenseNet. This network is trained on the ImageNet dataset, and its final Global Average Pooling (GAP) layer is used to extract the feature for each image in the scan volumes. Each scan has N images, represented as {I_n}_{n=1}^{N}, where each image I is converted to f(I), a transfer-learned feature descriptor of vector length 1664, which is the output size of the final GAP layer of the DenseNet.
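To illustrate what the GAP layer does, here is a minimal numpy sketch: each channel of a convolutional feature map is collapsed to its spatial mean. The spatial size (16 × 16) is an arbitrary stand-in; the 1664-channel width matches the final DenseNet-169 block, whose GAP output is the descriptor f(I) used here.

```python
import numpy as np

def global_average_pooling(feature_maps):
    """Collapse each channel of a conv feature map to its spatial mean.
    feature_maps: array of shape (H, W, C) -> vector of shape (C,)."""
    return feature_maps.mean(axis=(0, 1))

# Stand-in for a DenseNet-169 final conv output: 1664 channels, so GAP
# yields the 1664-dimensional descriptor f(I) described in the text.
fmap = np.random.rand(16, 16, 1664)
descriptor = global_average_pooling(fmap)
assert descriptor.shape == (1664,)
```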
Stage 2—Vertebrae Localization using an LSTM layer: In the converted training set, each scan, represented as f(I_n) for n in the range 1 to N, has vertebrae centroids V = {C1, C2, C3, ..., S1, S2}, where each element is a 3D point (x, y, z). There are in total 26 vertebrae centroids, resulting in a vector of length 78 (26 × 3); missing centroids are set to zero. The feature descriptors (vectors) for the training set are used to train an LSTM layer combined with a dense layer; the final fully connected regression layer outputs the centroid points.
Stage 3—Age and Gender Identification: In the trained LSTM network, the resultant feature before the regression layer serves as the feature descriptor for each scan. In this module, each scan is represented as f(s), where s is the CT spine volume and f(s) is the feature extracted from a dense layer with 256 neural nodes. These features are used to train a random forest regressor to estimate the age and a classifier to identify the gender. Random forest is an ensemble machine learning algorithm that can handle both regression and classification [4].
The feature descriptors obtained by transfer learning are embeddings, which are used for vertebrae centroid identification and age and gender prediction, so a relevant metric should be chosen to evaluate the extracted features. Multiple pre-trained CNN networks are evaluated based on the cosine distance between these embeddings (which are our feature descriptors). The premise is that each scan maps to a different target vector V, so the cosine distance between images belonging to different scans should indicate divergence, while all images of the same scan are expected to show proximity. The cosine distance between the images of a scan is computed as follows:
D_same = (1 / (N(N − 1))) Σ_{i=1}^{N} Σ_{j=1, j≠i}^{N} (A_i · B_j) / (|A_i| × |B_j|)    (1)
where D_same represents the distance within a scan with N images. The same formula is used to calculate the distance between two scans, in which A is the feature vector of an image belonging to one scan and B is the feature vector of an image belonging to the other scan. The cosine distance between two scans of N1 and N2 images, respectively, is computed as follows:
D_diff = (1 / (N1 (N2 − 1))) Σ_{i=1}^{N1} Σ_{j=1, j≠i}^{N2} (A_i · B_j) / (|A_i| × |B_j|)    (2)
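A minimal numpy sketch of this metric, under the assumption that each scan is given as a matrix of row-wise feature descriptors. Note that Eqs. (1) and (2) average pairwise cosine similarities; the paper refers to this quantity as a cosine distance.

```python
import numpy as np

def mean_pairwise_cosine(A, B, exclude_diagonal=True):
    """Average pairwise cosine similarity between rows of A and B, as in
    Eqs. (1) and (2). Pass the same matrix twice to compute D_same.
    A: (N1, d) descriptors of one scan; B: (N2, d) of another."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    sims = A @ B.T                       # (N1, N2) matrix of cosines
    total, n_terms = sims.sum(), sims.size
    if exclude_diagonal:                 # drop the i == j terms
        total -= np.trace(sims)
        n_terms -= min(sims.shape)
    return total / n_terms

rng = np.random.default_rng(0)
scan = rng.normal(size=(5, 8))                                 # toy scan
d_same = mean_pairwise_cosine(scan, scan)                      # D_same
d_diff = mean_pairwise_cosine(scan, rng.normal(size=(7, 8)))   # D_diff
```

For identical descriptors D_same equals 1; a larger gap between D_diff and D_same (as in the last column of Table 1) indicates more discriminative features.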
Table 1 Comparison between pre-trained networks for feature extraction based on the cosine distance

Model          Cosine distance (same)   Cosine distance (diff)   Difference in distance
Inception V3   0.246                    0.308                    0.061
Densenet 169   0.230                    0.319                    0.089
VGG 19         0.016                    0.021                    0.004
Xception       0.320                    0.406                    0.086
Resnet 50      0.036                    0.054                    0.050
V′ = (V − min(V)) / (max(V) − min(V))    (3)
The LSTM network without a GAP layer returns only the output of the final LSTM cell, not that of every cell. Extracting the information provided by every LSTM cell and averaging the features vertically (GAP) before the regression layer (Dense(78)) yields a lower localization error. Furthermore, the results obtained by the best LSTM network are compared with the previous benchmark results that used the same test-set evaluation, as shown in Table 3. Though the network fails to provide localization with lower mean and standard deviation error for the cervical and lumbar centroids, it gives better and more uniform identification accuracy across all types of centroids, as shown in Fig. 2a. The method provides better results than Glocker [6] without any classification of images as vertebrae or background, as presumed by the JCNN [3]. The information on whether an image in a CT spine volume is a vertebra or background is not available in the challenge dataset, and obtaining it requires domain-expert knowledge. Chen et al. reported the variation of the identification rate across all vertebrae centroids [3], where the identification rate ranges between 30 and 100%, with the thoracic region experiencing a lower identification rate.
In Fig. 2a and b, the identification rate and the localization error for all the centroids are shown. The proposed method has an identification rate varying between 70 and 90%, resulting in a balanced estimation of vertebrae centroids.
Fig. 2 a Identification rate of the predicted vertebrae centroids. b Localization error of the predicted vertebrae centroids
Stage 3 of the proposed model is a novel approach that extends the use of features
extracted from the stage 2 network. The MICCAI 2014 challenge provided the meta information for every scan in the training set, and it was not utilized in any previous approach, as the main goal was to identify and localize the vertebrae centroids [1]. This extension is evaluated using different validation splits of the training set, since the metadata is available only for the training set. The features obtained from the GAP layer of the LSTM network are used to estimate the gender and age. In the training set, the distribution of the data between the two genders is imbalanced, so a class-weighted model with more weight on the female class is trained on different validation splits, as shown in Table 4.
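The class-weighted training described above can be sketched with scikit-learn; the 256-dimensional features and the labels here are synthetic stand-ins, not the paper's actual stage-2 features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the 256-dim stage-2 scan features; labels are
# imbalanced, mimicking the gender skew described in the text.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 256))
y = (rng.random(300) < 0.3).astype(int)   # 1 = female (minority class)
X[y == 1] += 0.5                          # inject a weak class signal

X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)  # 20% validation split

# class_weight="balanced" up-weights the minority class, analogous to
# the extra weight on the female class described in the text.
clf = RandomForestClassifier(n_estimators=100, class_weight="balanced",
                             random_state=0).fit(X_tr, y_tr)
print("F1:", round(f1_score(y_val, clf.predict(X_val)), 2))
```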
For all three splits, the accuracy, F1 score and confusion matrix obtained on the validation data are compared. The model is evidently capable of estimating the gender of the patient from the spine scans, with an F1 score of 0.70. The same features used to train the gender classifier are utilized to identify the age of the patients.
The age range of the patients in the training data varies between 10 and 100, which is a wide space, so the target variable (age) is transformed into the range 0 to 1 and a random forest regressor is trained on it. The results obtained on evaluating the different validation splits, as experimented with in gender identification, are compared based on Mean Absolute Error (MAE) in Table 5. As the estimation of the
Table 5 Mean absolute error (MAE) between the age ranges among different validation splits

Validation   Age ranges                                                        MAE on
split (%)    10–20  20–30  30–40  40–50  50–60  60–70  70–80  80–90  90–100    validation
10           0      6.22   0      2.84   1.31   5.70   4.90   0      0         5.74
20           0      6.54   1.13   2.33   4.44   3.70   4.14   6.25   0.7       5.97
30           0      7.74   1.23   2.46   4.60   3.64   4.22   5.98   1.10      5.59
age is within a mean absolute error of no more than 6, the model evidently estimates the age from spine scans with little difference from the actual age of the patient. The error obtained for the age ranges in the validation data is tabulated in Table 5. The error rate is lower for scans belonging to patients younger than 80 and older than 30, likely because the training set contains more data for these age ranges.
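The scaling-and-regression step above can be sketched as follows; the features and ages are synthetic placeholders, with the age target min-max scaled to [0, 1] as in Eq. (3) and the error reported back in years.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 256))         # stand-in scan features
age = rng.uniform(10, 100, size=300)    # ages in the paper's 10-100 range

# Eq. (3): min-max scale the target into [0, 1] before regression.
lo, hi = age.min(), age.max()
t = (age - lo) / (hi - lo)

X_tr, X_val, t_tr, t_val = train_test_split(X, t, test_size=0.1,
                                            random_state=0)
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, t_tr)

# Undo the scaling before computing MAE in years, as in Table 5.
pred_age = reg.predict(X_val) * (hi - lo) + lo
true_age = t_val * (hi - lo) + lo
print("MAE (years):", round(float(np.mean(np.abs(pred_age - true_age))), 2))
```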
Clinical spine diagnosis and spine disease trend analysis can be assisted by a multi-task data-driven model that localizes spine vertebrae centroids and identifies the age and gender from spine CT volumes. The proposed model handles the disadvantage of a limited-sample dataset through transfer-learned feature extraction, using a novel performance analysis to find the right transfer learning network. The model also performs uniformly across all types of spine vertebrae, producing identification rates between 70 and 90%. The novel extension of the model is not limited to identifying age and gender; it can also be extended to cluster the scans belonging to the same patient. This is tabled for future research, since the metadata in the challenge dataset also includes the annotation of scans belonging to the same patient. Though the proposed algorithm outperforms the benchmark results in identification rate of the vertebrae centroids, the model can be hyperparameter-tuned to further reduce the localization error.
References
1. http://csi-workshop.weebly.com/challenges.html
2. Bowles, C., Chen, L., Guerrero, R., Bentley, P., Gunn, R., Hammers, A., Dickie, D.A., Hernández, M.V., Wardlaw, J., Rueckert, D.: GAN augmentation: augmenting training data using generative adversarial networks. arXiv preprint arXiv:1810.10863 (2018)
3. Chen, H., Shen, C., Qin, J., Ni, D., Shi, L., Cheng, J.C., Heng, P.A.: Automatic localization and
identification of vertebrae in spine CT via a joint learning model with deep neural networks. In:
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 195
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_21
196 H. Kaur et al.
Speech recognition systems are usually based on adult data. While the latest speech recognition technology is not yet ideal even for adults, the task of building children's spoken dialog applications faces far greater challenges [1]. Children are an important segment of users that will benefit from advances in multimedia technology. In multimedia games and computer instructional material, children are among the primary potential users of computers for conversational interaction. Children are generally comfortable and happy using spoken language interfaces. To make ASR interfaces more interesting to interact with, it is important that they understand and adapt the language of the user to match or complement the user's speech [2].
The topic of children's speech recognition has been gaining attention in recent years [3]. Wilpon's study [4] shows that recognizing children's speech is more difficult than recognizing that of adults. He also noticed that some formant information is missing in children's speech over telecommunications bandwidth, and he lowered the number of Linear Predictive Coding (LPC) coefficients in his recognizer to account for this trend. Another study [1] explored the utility of frequency warping to account for shifts in the frequency spectrum owing to the smaller vocal tracts of children. Some of the mistakes were considered to be related to grammar problems. Further focus had to be given to the features relevant to children's ASR. It is well recognized that young children's speech habits differ significantly from those of adults. Differences between the auditory properties of children's and adults' speech have been studied in [5]. For digit and phrase recognition tasks, this form of degradation is examined in [6].
In this paper, we discuss some aspects of our work on a Punjabi children's continuous speech recognition system developed under mismatched acoustic conditions. We used discriminative techniques in our work; discriminative training techniques are key components of current technology and a major area of speech recognition research [7]. These techniques achieve substantial improvements for small-vocabulary tasks on small datasets.
Standard databases such as TIMIT and ATIS are available for foreign languages such as English, but the key obstacle in Punjabi speech research, or that of any other Indian language, is the lack of resources such as speech and text corpora. In this paper, the confusion among word patterns is handled on a small child training dataset using discriminative methods to address the imbalance between correct and incorrect word sequences.
Apart from the introduction in Sect. 1, the rest of the paper is organized as follows. The relevant work is reviewed in Sect. 2. Section 3 describes the theoretical background. The experimental setup is given in Sect. 4, and the experimental results are provided in Sect. 5. The work is summarized in Sect. 6.
2 Related Work
Li and Huang [8] proposed an auditory-based feature extraction algorithm. The authors applied Cochlear Filter Cepstral Coefficient (CFCC) features to speaker
Punjabi Children Speech Recognition System Under … 197
identification to address acoustic mismatch between training and test environments. Typically, a system's performance drops significantly when it is trained on clean speech but examined on noisy data. In this situation, CFCC outperforms the baseline Mel Frequency Cepstral Coefficients (MFCC) under three mismatched conditions: car noise, white noise, and babble noise. Both MFCC and CFCC work well when the data is clean, but the precision of MFCC decreases at a signal-to-noise ratio of 6 decibels, where CFCC can still achieve higher precision than MFCC. CFCC does better than PLP and RASTA under white noise, and performs similarly to PLP and RASTA under car and babble noise.
Giuliani and Gerosa [5] examined children's speech recognition in the context of phoneme identification. Comparing two experimental configurations, they found that children obtain lower phone recognition accuracy, 77.30%, against 79.43% for adult phone recognition. The outcomes when many child speakers were heard were as strong as for adults. With respect to the reference system under mismatch conditions, Vocal Tract Length Normalization (VTLN) yields relative reductions of 10.5 and 5.3% for adults and children, respectively. When recognizing children's speech with the baseline system, they obtained a low recognition performance of 58.11%; applying VTLN with the system trained on children improved recognition performance on the same dataset to 66.43%.
Das et al. [1] conducted several experiments with children's data to develop a speech recognition system for children. Using certain command-and-control data, they found a gain from frequency warping. They designed the acoustic and language models and analyzed word recognition results in different configurations, where a WER of 10.4 was achieved by the constructed system.
Li and Russell [9] studied speech recognition quality in a small children's community. Grammar is proposed to be a major influence on the quality of speech recognition. Using a personalized dictionary improves ASR performance, but the change is small. Quality deterioration due to poor speech, combined with degradation due to the use of telecommunications bandwidth, is proposed to account for most of the recorded differences in performance between adults and children.
Lee et al. [3] reported on a collection of temporal and auditory parameters calculated from a recently compiled speech sample of 436 participants between 5 and 18 years of age and 56 adults. Their findings indicated that a major trend correlated with speech development in normal children is the decrease, in both magnitude and subject-to-subject variation, of temporal and spectral acoustic parameters with age.
Arunachalam et al. [2] studied a discourse analysis of child–machine interactions in spoken language. Their results indicate that, with no obvious gender differences, younger children are less likely to use polite markers and make more direct requests for information compared with older ones.
Narayanan and Potamianos [10] reported results on the feasibility of creating a children's conversational framework. Speaker normalization and model adaptation were used to improve speech recognition performance. Overall, the prototype was a positive first effort to build a children's multimodal program with an emphasis on conversational language.
Kathania et al. [11] investigated deliberately adjusting the pitch of children's voices to minimize the reported pitch differences between the two speaker classes. Such an explicit pitch reduction yields a significant improvement in recognition quality. The feasibility of the suggested methods was tested on ASR models trained on adults using different acoustic modelling strategies, i.e., Gaussian Mixture Models (GMM), subspace GMM, and Deep Neural Networks (DNN). The suggested approaches were highly effective in all the modelling paradigms studied.
Shahnawazuddin et al. [12] introduced their efforts to improve the quality of a keyword spotting system for children's speech under a limited-data scenario. They addressed two different ways of implementing prosody modification effectively. Data augmentation based on prosody modification of adult speech often helps to improve quality: the pitch of the children's test utterances is lowered and the speaking rate adjusted, yielding a significant improvement. The performance attained by prosody adjustment is much higher than that obtained through VTLN. It is also observed that data augmentation is very effective in improving KWS performance on children's speech.
3 Theoretical Background
F_bMMI(λ) = Σ_{r=1}^{R} log [ p_λ({x_t}_r | H_{s_r})^K p_L(s_r) / ( Σ_s p_λ({x_t}_r | H_s)^K p_L(s) e^{−b A(s, s_r)} ) ]    (2)

where A(s, s_r) is the phone accuracy of hypothesis s with respect to the reference s_r, and b (b > 0) controls the boosting of high-error paths. We compare MMI and bMMI performance with ML performance.
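For intuition, the criterion for a single utterance can be sketched in numpy with toy log-scores (the function name and the score values are illustrative, not the actual Kaldi lattice computation):

```python
import numpy as np

def bmmi_objective(log_p_ref, log_prior_ref, log_p_hyps, log_prior_hyps,
                   accuracy, K=1.0, b=0.5):
    """Toy evaluation of the bMMI criterion of Eq. (2) for one utterance.
    log_p_ref / log_prior_ref: acoustic and language-model log-scores of
    the reference s_r; log_p_hyps / log_prior_hyps: the same scores over
    competing hypotheses s; accuracy[s] = A(s, s_r)."""
    num = K * log_p_ref + log_prior_ref
    den_terms = (K * np.asarray(log_p_hyps, float)
                 + np.asarray(log_prior_hyps, float)
                 - b * np.asarray(accuracy, float))
    den = np.logaddexp.reduce(den_terms)   # log of the denominator sum
    return num - den

# b = 0 recovers plain MMI; b > 0 raises the denominator weight of
# low-accuracy (high-error) competing paths via the e^{-b A} factor.
val_mmi = bmmi_objective(-10.0, 0.0, [-10.0, -12.0], [0.0, 0.0],
                         [1.0, 0.6], b=0.0)
val_bmmi = bmmi_objective(-10.0, 0.0, [-10.0, -12.0], [0.0, 0.0],
                          [1.0, 0.6], b=0.5)
```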
y_t = x_t + M h_t    (3)
where M is the feature transformation matrix and h_t is a high-dimensional vector computed for each frame t. The objective function of fbMMI is constructed similarly. The optimal matrix M is obtained by gradient descent. The N Gaussian components are obtained by accumulating the Gaussians of the original triphone acoustic models into N components and restating their criterion to form the functionality. The non-aligned properties h_t are computed as
h_t = [ p_{t,n} (x_{t,1} − μ_{n,1})/σ_{n,1}, …, p_{t,n} (x_{t,K} − μ_{n,K})/σ_{n,K}, α p_{t,n} ]^T    (6)
where μ_{n,i} and σ_{n,i} are the mean and variance of the nth Gaussian component in dimension i, and α is a scaling factor. For each frame, p_{t,n} is the measured posterior of Gaussian component n, with all posteriors except the Q best set to zero. This is done to minimize the computational cost by ensuring that h_t is sparse.
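The construction of h_t with Q-best posterior truncation can be sketched as follows (a simplified illustration under assumed diagonal-covariance Gaussians; names and dimensions are illustrative):

```python
import numpy as np

def ht_features(x, means, sigmas, posteriors, alpha=1.0, q_best=2):
    """Sketch of the h_t vector of Eq. (6). For each of the N Gaussians,
    stack the posterior-weighted normalized offsets (x - mu)/sigma over
    all K dimensions plus the scaled posterior alpha * p. Posteriors
    outside the Q best are zeroed, which keeps h_t sparse."""
    p = np.asarray(posteriors, dtype=float)
    keep = np.argsort(p)[-q_best:]                 # indices of the Q best
    p_sparse = np.zeros_like(p)
    p_sparse[keep] = p[keep]
    blocks = []
    for n, pn in enumerate(p_sparse):
        offsets = pn * (x - means[n]) / sigmas[n]  # K weighted offsets
        blocks.append(np.concatenate([offsets, [alpha * pn]]))
    return np.concatenate(blocks)                  # length N * (K + 1)

K, N = 3, 5
x = np.zeros(K)
means, sigmas = np.zeros((N, K)), np.ones((N, K))
h = ht_features(x, means, sigmas, [0.5, 0.3, 0.1, 0.07, 0.03], q_best=2)
assert h.shape == (N * (K + 1),)
```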
4 Experimental Setup
5 Experimental Results
Discriminative training is performed in both feature and model space using a variant of the MMI criterion called boosted (or margin-based) MMI (bMMI). The objective function used is a modification of bMMI that uses a frame-based, state-level loss function instead of a phone-based accuracy measure. Result analysis was conducted in this section utilizing four variants of the MMI discriminative approach, MMI, bMMI, fMMI, and fbMMI, to obtain the following results:
Table 1 Word error rate (WER %) using MMI, boosted MMI at boost value 0.25, and fMMI, fbMMI at iteration value 3

Dataset (train/test)   MMI     bMMI    fMMI    fbMMI
Adult/child            63.72   61.62   55.25   53.49
Adult, child/child     21.32   20.23   17.93   16.06
The primary step in the discriminative training sequence is the generation of lattices for the numerator and denominator. It has been found that the number of lattices also decreases with forced alignment of the transcription. We adapted Kaldi's MMI recipes to fit our data.
To decode for the MMI process, four iterations (by default) of stochastic gradient descent are used. Further, we used the boosting factor with MMI to robustly raise the likelihood of paths with more errors. bMMI performs better than MMI due to the refined constant learning rate of 0.00001 and I-smoothing as regularization, as shown in Table 1. Moreover, the boosting factor was investigated with values varying over [0.25, 0.5, 0.75, 1.0, 2.0], and the system obtained the lowest WER at a bMMI boost value of 0.25.
Before starting feature-space discriminative training, 250 Gaussians with a silence weight of 0.5 were employed to train the diagonal mixture of the implemented Gaussians. A total of eight iterations with a boost value of 0.25 and a learning rate of 0.001 were employed for training feature-space bMMI. The denominator states are thus used, and the lattice is re-scored on all eight iterations, leading to the transformation of the features needed for robust discriminative training in the feature space. The obtained WER shows that the system reached its best output at fbMMI iteration value 3, as shown in Table 1.
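Since all the results above are reported as WER, it may help to recall how the metric itself is computed: the standard word-level Levenshtein distance divided by the reference length. A minimal self-contained sketch (independent of Kaldi's scoring tools):

```python
def word_error_rate(reference, hypothesis):
    """Standard WER: word-level edit distance (substitutions + deletions
    + insertions) divided by the reference length, as a percentage."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution/match
    return 100.0 * dp[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("one two three four", "one too three"))  # 50.0
```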
This paper presents a Punjabi-language speech recognition system for children's speech under mismatch conditions. The experiments were repeated for the child and adult speech corpora described in Sect. 5. Discriminative techniques were explored for the training and testing conditions of both matched and mismatched data. The framework presented was developed using one family of discriminative techniques and its variants, i.e., MMI, bMMI, fMMI, and fbMMI. Significant improvements were made by using discriminative training methods for limited-vocabulary tasks on small datasets. Recognition efficiency declines significantly between matched and mismatched conditions: the WER for mismatched conditions is substantially higher than for matched conditions. The primary cause of the loss of results is the inconsistency of the mismatched speech data. The boosting factor was investigated with different values, and the system obtained the lowest WER at a bMMI boost value of 0.25. The obtained WER shows that the system reached its best output at fbMMI iteration value 3. From the results presented in this paper, fbMMI appears to be a more promising technique than MMI, bMMI, and fMMI.
In the mismatched speech recognition process, acoustic properties of the speech signal such as pitch, formant frequencies, fundamental frequency, and speaking rate play an important role in achieving good performance. There are many differences between the acoustic properties of children's and adults' speech signals. In the future, by enhancing the pitch and acoustic properties of the children's speech signal, the performance of the Punjabi children's speech recognition system can be increased.
References
1. Das, S., Nix, D., Picheny, M.: Improvements in children’s speech recognition performance.
In: Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal
Processing, ICASSP’98 (Cat. No. 98CH36181), vol. 1, pp. 433–436. IEEE (1998, May)
2. Arunachalam, S., Gould, D., Andersen, E., Byrd, D., Narayanan, S.: Politeness and frustration language in child-machine interactions. In: Seventh European Conference on Speech Communication and Technology (2001)
3. Lee, S., Potamianos, A., Narayanan, S.: Acoustics of children’s speech: developmental changes
of temporal and spectral parameters. J. Acoust. Soc. Am. 105(3), 1455–1468 (1999)
4. Wilpon, J.G., Jacobsen, C.N.: A study of speech recognition for children and the elderly. In:
1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference
Proceedings, vol. 1, pp. 349–352. IEEE (1996, May)
5. Giuliani, D., Gerosa, M.: Investigating recognition of children’s speech. In: Proceedings
of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003.
(ICASSP’03), vol. 2, pp. II-137. IEEE (2003, April)
6. Potamianos, A., Narayanan, S., Lee, S.: Automatic speech recognition for children. In: Fifth
European Conference on Speech Communication and Technology (1997)
7. Heigold, G., Ney, H., Schluter, R., Wiesler, S.: Discriminative training for automatic speech
recognition: modeling, criteria, optimization, implementation, and performance. IEEE Signal
Process. Mag. 29(6), 58–69 (2012)
8. Li, Q., Huang, Y.: An auditory-based feature extraction algorithm for robust speaker identification under mismatched conditions. IEEE Trans. Audio Speech Lang. Process. 19(6), 1791–1801 (2010)
9. Li, Q., Russell, M.J.: An analysis of the causes of increased error rates in children’s speech
recognition. In: Seventh International Conference on Spoken Language Processing (2002)
10. Narayanan, S., Potamianos, A.: Creating conversational interfaces for children. IEEE Trans.
Speech Audio Process. 10(2), 65–78 (2002)
11. Kathania, H.K., Ahmad, W., Shahnawazuddin, S., Samaddar, A.B.: Explicit pitch mapping for
improved children’s speech recognition. Circ. Syst. Sig. Process. 37(5), 2021–2044 (2018)
12. Shahnawazuddin, S., Maity, K., Pradhan, G.: Improving the performance of keyword spotting
system for children’s speech through prosody modification. Digit. Signal Proc. 86, 11–18 (2019)
13. Kaur, H., Kadyan, V.: Feature space discriminatively trained Punjabi children speech
recognition system using Kaldi toolkit. Available at SSRN 3565906 (2020)
14. Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Silovsky, J.: The
Kaldi speech recognition toolkit. In: IEEE 2011 Workshop on Automatic Speech Recognition
and Understanding (No. CONF). IEEE Signal Processing Society (2011)
Effective Irrigation Management System
for Agriculture Using Machine Learning
Abstract Farming of different crops is a major income source for farmers in India, but many factors affect the farming business. One of the important factors is an efficient water supply for the crop. The work in this paper proposes an effective irrigation system that helps increase crop productivity by regulating the water requirement of the crop with a machine learning approach. Images of different farmland are studied and classified according to soil type and its properties, such as the water requirements in different conditions. Image processing is applied to the land images to understand the current soil condition. This phase is followed by the application of a decision tree and a random forest to decide whether water is required. If the answer is yes, then linear regression is used to calculate the time period of water flow.
1 Introduction
S. T. Patil (B)
Department of CSE, Sanjay Ghodawat University, Kolhapur, India
e-mail: sangram.patil@sanjayghodawatuniversity.ac.in
M. S. Bhosale
Department of Computer Science and Engineering, TKIET, Warnanagar, Kolhapur, India
e-mail: msbhosale@tkietwarana.ac.in
R. M. Kamble
Department of Computer Science and Engineering, ADCET, ASTHA, Ashta, Kolhapur, India
e-mail: rmk_cse@adcet.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 205
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_22
206 S. T. Patil et al.
explained that good seeds and fertilizers fail to achieve their full potential if crops are not watered optimally. The fresh water requirement of industry and agriculture is increasing rapidly, but uncertain rainfall, limited reservoir capacity, and low groundwater levels are a serious threat to the farming/agriculture sector. To cope with this situation, and to make better and more efficient use of existing water, we need an effective irrigation management system.
Sugarcane is the major crop in southern Maharashtra, and a proper percentage of water and moisture in the soil/land yields better productivity. Sugarcane is a very water-intensive crop, as it requires a large quantity of water. Currently, sugarcane farmers do not have any economical resource or equipment to guide them toward effective and efficient watering. They guess the water requirement from past experience and flood the farm with water, which leads to huge wastage. Some farmers use sprinklers to minimize water wastage. In recent years, some low-cost sensors have been developed to detect the water content in soil [10]. In [11], Praveen et al. designed IoT-based sensors for an irrigation system, but they do not suggest what water quantity is required to maintain the required moisture. The work presented in this paper estimates the water requirement of the crop from a given image of the soil using a machine learning approach.
The next section gives the literature survey. The third section explains the methods used for data collection. The fourth section explains the methodology. The fifth section presents the result analysis, including numerical comparisons of these methods. The last section gives concluding comments.
2 Literature Survey
dos Santos [2] presented the relation between soil and moisture with the help of images taken by a digital camera, with the images adjusted for white balance. In his study, he derived an equation that relates the moisture present in the soil to the image taken. He referred to data provided by the Federal University of Viçosa.
Fitton [3] studied farming in Africa, China, Europe, and Asia. The study observed that proper watering can increase productivity, and it explained how water affects the productive capacity of land/soil: too much or too little water decreases productivity, while proper watering increases it.
Ashok [4] studied how captured images can be used to detect and treat different diseases. After capturing an image, some preprocessing is done, from which a generated histogram classifies or detects the disease on the plant. This study is helpful for detecting disease from an image.
Khan [5] used a decision tree method to estimate how much water is available in a region and also tried to estimate the moisture in the soil. He showed how data mining can be used for estimating water availability in a region; this estimation helps to obtain a good crop.
Effective Irrigation Management System … 207
Dhawan [1] presented a detailed analysis of water and agriculture in India. One outcome of the research is that water efficiency in the country should be increased by making the best use of available technologies.
3 Dataset
For our study, we selected eight different pieces of land of the same size, chosen according to water holding capacity. The information for each land is collected as four digital images, one from each corner of the land, along with physical samples of the soil. The collected soil samples are studied for the moisture content of the soil and to obtain other properties. The wind flow rate and the current temperature of the day are also important factors affecting the moisture content of the soil; therefore, while collecting soil samples, the wind flow rate and temperature of the region are also recorded and maintained with the images taken. We then studied the total quantity of water irrigated for sugarcane over the period and measured the final product gain of the sugarcane.
For the experimental study, we prepared a dataset of images. The images were taken twice a day from every piece of land for six months, excluding the rainy season, over the sugarcane crop cycle. In this way we collected nearly 11,520 images, 5760 soil samples, and 1440 temperature and wind flow readings. We considered the standard moisture content that is expected to be present in the land/soil to increase productivity at every stage of the crop.
4 Methodology
The practical analysis of our work is represented in Fig. 1. First, digital images were captured at a height of 1 foot from the surface of the land/soil. The captured images are then taken as input to our system. Since the final analysis depends on the captured images, we preprocessed the images to remove unwanted parts. The images are stored in RGB format at 256 × 256 size, and also in an 8 × 8 representation for further processing.
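The 8 × 8 representation can be produced, for example, by block averaging; the sketch below is an assumption about the exact reduction method, which the paper does not specify, and simply averages 32 × 32 pixel blocks of a 256 × 256 RGB image with NumPy:

```python
import numpy as np

def downsample_to_blocks(img, out_size=8):
    """Reduce an RGB image to out_size x out_size by averaging equal blocks.

    img: array of shape (H, W, 3) with H and W divisible by out_size.
    """
    h, w, c = img.shape
    bh, bw = h // out_size, w // out_size
    # Split into (out_size, bh, out_size, bw, c) blocks and average each block.
    blocks = img.reshape(out_size, bh, out_size, bw, c)
    return blocks.mean(axis=(1, 3))

# Example: a synthetic 256 x 256 RGB image.
img = np.random.randint(0, 256, size=(256, 256, 3)).astype(float)
small = downsample_to_blocks(img)
print(small.shape)  # (8, 8, 3)
```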
The images are classified into three classes. This classification depends on the water requirement of the soil, i.e., low, moderate, and high water requirement. From this classification [5], we can also obtain the current moisture in the land/soil. The images are then provided to decision trees [4]. The output of the decision trees is provided as input to a random forest [4], which decides whether water is required for the land/soil. If irrigation is required, then linear regression decides how much water is required.
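The decision pipeline described above can be sketched with scikit-learn; the feature set, the 20% moisture threshold, and the synthetic data below are illustrative assumptions, not the authors' trained models:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Toy features per land image: [current moisture %, temperature, wind flow].
X = rng.uniform([5, 20, 0], [40, 45, 15], size=(200, 3))
# Label: irrigation needed when current moisture is below an assumed 20%.
y = (X[:, 0] < 20).astype(int)

# Stages 1-2: the decision tree's output feeds a random forest for the
# irrigation-required / not-required decision.
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
X_aug = np.column_stack([X, tree.predict(X)])  # tree output as extra feature
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_aug, y)

# Stage 3: for lands needing water, regress irrigation duration on the
# moisture deficit (synthetic target: minutes proportional to the deficit).
need = forest.predict(X_aug).astype(bool)
deficit = (20 - X[need, 0]).reshape(-1, 1)
minutes = 3.0 * deficit.ravel() + rng.normal(0, 0.5, deficit.shape[0])
reg = LinearRegression().fit(deficit, minutes)
print(round(reg.coef_[0], 1))  # close to the true slope of 3.0
```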
The following equation is used to calculate the water need of the crop:
W = EM − CM (1)
where W is the water need, EM is the expected (standard) moisture content, and CM is the current moisture content of the soil.
Fig. 1 Workflow of the proposed system: dataset → image classification → random forest decision (irrigation required or not) → linear regression (how much water is required)
T = W/F (2)
where T is the time in minutes and F is the flow rate of water from the irrigation system installed in the land/soil.
We also calculate the total water consumption (TWC) of the month using the formula
TWC = F × T (3)
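A worked example of Eqs. (1)–(3); the numbers below are illustrative assumptions, and the units depend on the installed irrigation system:

```python
def irrigation_plan(expected_moisture, current_moisture, flow_rate):
    """Return (water need W, run time T in minutes, total water TWC)."""
    W = expected_moisture - current_moisture   # Eq. (1): W = EM - CM
    T = W / flow_rate                          # Eq. (2): T = W / F
    TWC = flow_rate * T                        # Eq. (3): TWC = F * T
    return W, T, TWC

W, T, TWC = irrigation_plan(expected_moisture=30.0,
                            current_moisture=18.0,
                            flow_rate=4.0)
print(W, T, TWC)  # 12.0 3.0 12.0
```

Note that by construction TWC equals W: Eq. (3) simply recovers the water need from the flow rate and run time.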
5 Experimental Results
Initially, we checked the performance of the system which we have developed. For
this, the collected data is divided into two parts training data and test data. Training
data is used to train the model, and then, model is tested on testing data, and results
are shown below. Average accuracy of the model is approximately 84% (Fig. 2).
For the experimental results, we primarily focused on the number of units of electricity and the water consumption for one crop cycle of sugarcane, and we also recorded the total production. We studied the sugarcane crop cycle for three years: for the first two years, we observed normal farming of the crop, and for the third-year crop cycle, we implemented our proposed system. The experimental results for electricity consumption follow. We also considered the distance from the water resource and combined the results according to this distance.
Considering the electrical consumption, it is clear that after the system was implemented, the electrical consumption was reduced compared with the previous two years. This is because we now give a time duration for allowing water to flow, which was not previously considered (Figs. 3, 4, 5, and 6).
The water consumption likewise clearly shows less water consumed than in previous years after implementation of the system (Fig. 7).
The effective irrigation system provides the proper amount of water for the land, which keeps the moisture in the land/soil as required and increases its productivity (Fig. 8).
6 Conclusion
In India, a highly water-intensive crop like sugarcane is the major crop grown by farmers. Water plays an important role in the growth of sugarcane, and its optimal use gives better growth and better productivity. To achieve this, machine learning techniques can be used effectively. From the results, it is clear that our system decreases water consumption and electricity consumption and increases the productivity of the crop.
In the future, we can implement an IoT system for starting and stopping the irrigation system. We can also use this system to guide farmers in selecting the type of sugarcane and other crops.
References
1. Dhawan, V.: Water and agriculture in India. In: Background paper for the South Asia expert
panel during the Global Forum for Food and Agriculture (GFFA) (2017)
2. dos Santos, J.F.C.: Use of digital images to estimate soil moisture. Sci. Direct (2016)
3. Fitton, N.: Global Environment Change. Elsevier, Amsterdam (2019)
4. Ashok, J.M.: Agricultural Plant Disease Detection and its Treatment using Image Processing.
IJSRD (2015)
5. Khan, S.A.: An Approach to Predict Soil Nutrients and Efficient Irrigation for Agriculture with
Spatial Data Mining, IJSRD (2015)
6. Aruna, D.D.: A Survey on Different Disease and Image Processing Techniques in Sugarcane
Crops. IJSRD (2016)
7. Balew, A.: The Egyptian Journal of Remote Sensing and Space Science. Elsevier, Amsterdam
(2020)
8. Sneht, S.H.: Land Use Land Cover Change Detection of Gulbarga City Using Remote Sensing
and GIS. IJSRD (2014)
9. BenDor, E.: Using imaging spectroscopy to study soil properties. Remote Sens. Environ. https://doi.org/10.1016/j.rse.2008.09.019
10. Tomar, M.: Development of Low Cost Soil Moisture Sensor. IEEE, ViTECoN (2019)
11. Barapatre, P., Patel, J.: Determination of soil moisture using various sensors for irrigation water
management. IJITEE, 8 (2019)
12. Wang, W., Liu, K.: Remote sensing image-based analysis of the urban heat island effect in
Shenzhen, China. Elsevier Book 110 (2019)
13. Peng, J., Jia, J.: Seasonal contrast of the dominant factors for spatial distribution of land surface
temperature in urban areas. Elsevier Book, 215 (2018)
IoT-Based Smart Irrigation System
1 Introduction
The Internet of Things is the concept of connecting any device (so long as it has an on/off switch) to the Internet and to other connected devices. The IoT is a huge network of connected things and people, all of which gather and share data about the way they are used and about the environment around them.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 215
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_23
216 M. K. Pandey et al.
Irrigation is the process of providing water to plants at the required intervals. Irrigation allows us to grow crops and re-vegetate disturbed soils in parched areas. It delivers water at the proper time, in the proper amount, and at the right location in the field, which plays an essential role in the plant's growth. Controlling water remotely is likewise a hard task, and the control becomes even harder during a shortage of water, which may otherwise damage the crop. By using sensors such as moisture and rain sensors, the water delivered for irrigation can be managed effectively by analyzing the condition of the soil and the weather. Soil moisture sensors detect the level of soil moisture, and based on that data the field is irrigated automatically with much less human intervention. Smart irrigation is the idea of irrigating in a modern manner; a variety of strategies can be adopted in irrigation so that the yield grows and production increases.
2 Related Work
In paper [1], cloud computing, wireless sensors, UAVs, and communication technologies are used, and diverse IoT-based systems are presented with respect to farm applications. In paper [2], Message Queuing Telemetry Transport (MQTT) is used to communicate with devices; it saves power, which makes it low cost, and it requires much less human intervention and less maintenance for agricultural fields. In paper [3], a programmed irrigation system monitors and maintains a specified soil moisture level through automatic watering. It uses an ATMEGA328P microcontroller and soil moisture sensors, and the sensor readings are transmitted to a ThingSpeak channel to produce graphs for analysis. Paper [4] tests the threshold values for climatic conditions such as humidity, temperature, and moisture, senses the invasion of animals, and delivers alerts via SMS directly to the farmer's mobile using a GSM module.
4 Benefits
5 Different Features
6 Literature Review
In this paper, several of these factors are taken into consideration, and the role of various technologies, particularly IoT, is presented to make the smart farm more efficient and able to satisfy future expectations. Toward this aim, cloud computing, UAVs, wireless sensors, and communication technologies are explained fully [1]. Requiring much less human intervention and minimal upkeep for agricultural land, the system can be used without difficulty by all farmers. Additionally, timing is taken care of, and water is used adequately without waste. MQTT is used, which is capable of talking with various devices; its low bandwidth requirement and low power consumption make the proposed device cost effective [2]. The framework also enables real-time remote tracking of the current environmental state of the location, and modern technology may be incorporated to lower the cost [3]. The device generates an irrigation timetable based on the sensed real-time data from the field and data from a climate repository; this system can advise farmers on whether or not there is a need for irrigation [4]. Farms irrigated appropriately and optimally give better crop yield, so this paper designed a smart farming tool based entirely on IoT together with a moisture sensor [6]. The threshold voltages for calibrating the sensors are chosen from past records of temperature and soil moisture values; threshold values can differ depending on the crop and plantation. In the future, a machine learning algorithm can be introduced to process the data and reduce the complexity of the hardware [7]. A channel created with an open-source IoT platform is used to store and display the soil moisture data and also to control the irrigation over the Internet [8]. Traditional techniques take longer and waste the available water at higher rates, leading to the use of more water than required [9]. The system proposed in this paper pursues the combination of such systems with the strengths provided by cloud computing, and it can be applied to rural applications [5]. The overall system features a sensor layout designed for power efficiency, cost efficiency, maintainability, and ease of use [10].
IoT-Based Smart Irrigation System 219
Fig. 2 Raspberry Pi 4
7 Hardware Description
Raspberry Pi is like a small PC; it is lightweight and has an ARM processor. It has an HDMI port, a Wi-Fi module, USB ports, and an Ethernet port. Raspberry Pi runs operating systems such as Raspbian, Kali Linux, Snappy Ubuntu, Arch Linux ARM, and other Linux variants. It has no HDD or SSD, but a micro SD card can be inserted so that the Raspberry Pi's operating system can be booted (Fig. 2).
An advanced data acquisition circuit is chosen on a single chip that integrates analog and digital hardware. The MCP3208, shown in Fig. 4, is an ADC IC that converts analog signals to 12-bit digital values. Its inputs are programmable as either single-ended or differential pairs. The differential non-linearity is ±1 LSB, and the integral non-linearity is ±1 LSB. It uses a successive-approximation register (SAR) architecture. The sample-and-hold capacitor acquires the input for 1.5 clock cycles starting at the fourth rising edge of the serial clock, and the chip resolves the held value into 12 bits at a rate of up to 100 ksps.
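Converting a 12-bit MCP3208 reading to a voltage is plain arithmetic, as sketched below; the SPI transaction shown in the comments assumes the common spidev wiring on a Raspberry Pi and is only a sketch:

```python
def adc_to_voltage(raw, vref=3.3):
    """Map a 12-bit ADC count (0..4095) to volts."""
    if not 0 <= raw <= 4095:
        raise ValueError("MCP3208 returns 12-bit values (0..4095)")
    return raw * vref / 4095.0

# On real hardware one would read `raw` over SPI, e.g. (sketch, untested):
#   import spidev
#   spi = spidev.SpiDev(); spi.open(0, 0); spi.max_speed_hz = 1_000_000
#   r = spi.xfer2([0x06, 0x00, 0x00])        # start bit + single-ended ch0
#   raw = ((r[1] & 0x0F) << 8) | r[2]        # assemble the 12-bit result
print(round(adc_to_voltage(2048), 3))  # 1.65 (about half of Vref)
```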
The precision soil moisture sensor shown in Fig. 5 incorporates probes that are inserted into the earth. When current passes through the probes, soil containing more moisture offers less resistance and passes more current, while dry soil offers higher resistance and passes less current. This variable resistance is the criterion used to determine the fraction of soil moisture.
7.7 Buzzer
Fig. 7 Buzzer
8 Block Diagram
See Fig. 8.
9 Result
• The image above describes the output of the MQTT clients, which receive parameter values from different sensors (Fig. 9).
• Using the MQTT protocol, all sensor parameters are transmitted to the clients.
• If “Crop/node” is used as the MQTT node, then multiple clients on the same node can receive multiple pieces of information from different clients placed in different areas of the fields.
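The multi-client behavior on a shared "Crop/node" topic can be illustrated without a broker; the payload fields below are assumptions, and a real deployment would use an MQTT client library such as paho-mqtt against an actual broker:

```python
import json

class TinyBus:
    """In-memory stand-in for an MQTT broker (exact-match topics only)."""
    def __init__(self):
        self.subscribers = {}          # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, payload):
        for cb in self.subscribers.get(topic, []):
            cb(topic, payload)

bus = TinyBus()
received = []
# Two clients in different areas of the field share the node's topic.
bus.subscribe("Crop/node", lambda t, p: received.append(("client-A", json.loads(p))))
bus.subscribe("Crop/node", lambda t, p: received.append(("client-B", json.loads(p))))

payload = json.dumps({"moisture": 41.2, "temperature": 29.5, "rain": False})
bus.publish("Crop/node", payload)
print(len(received))  # 2 -- both clients got the sensor data
```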
10 Conclusion
An IoT-based smart irrigation system reduces the human supervision, water utilization, and labor associated with conventional processes. By using simple electronic parts, this smart irrigation system can be built at low cost. To avoid wasting water and to use it efficiently, the smart irrigation system is very important. It can also increase the production of fruits or vegetables and helps agricultural land reduce water waste. In all the processes of a smart irrigation system, the MQTT protocol plays the most important role; with it, the smart irrigation system becomes independent, with fast transmission of information. A benefit of the MQTT protocol is that whenever clients are out of range of the node network, the information is still sent, and whenever clients come into range and connect with that node network, they can see the information that was sent before.
References
1. Ayaz, M., Ammad-Uddin, M., Sharif, Z., Mansour, A., Aggoune, E.-H. M.: Internet-of-Things
(IoT)-based smart agriculture: toward making the fields talk. IEEE Access (2019)
2. Islam, M.M., Hossain, M.S., Reza, R.K., Nath, A.: IOT based automated solar irrigation system
using MQTT protocol in Charandeep Chakaria. IEEE (2019)
3. Dokhande, A., Bomble, C., Patil, R., Khandekar, P., Dhone, N., Gode, C.: A review paper on
IOT based smart irrigation system. IJSRCSEIT (2019)
4. Sushanth, G., Sujatha, S.: IOT based smart agriculture system. IEEE (2018)
5. Saraf, S.B., Gawali, D.H.: IOT based smart irrigation monitoring and controlling system. IEEE
(2017)
6. Mishra, D., Khan, A., Tiwari, R., Upadhay, S.: Automated irrigation system-IOT based
approach. IEEE (2018)
7. Nageswara Rao, R., Sridhar, B.: IoT based smart crop-field monitoring and automation
irrigation system. IEEE (2018)
8. Benyezza, H., Bouhedda, M., Djellou, K.: Smart irrigation system based Thingspeak and
Arduino. IEEE (2018)
9. Pernapati, K.: IOT based low-cost smart irrigation system. IEEE (2018)
10. Vaishali, S., Suraj, S., Vignesh, G., Dhivya, S., Udhayakumar, S.: Mobile integrated smart
irrigation management and monitoring system using IOT. IEEE (2017)
A Hybrid Approach for Region-Based
Medical Image Compression
with Nature-Inspired Optimization
Algorithm
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 225
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_24
226 S. Saravanan and D. S. Juliet
data to attain rapid interactivity, context-oriented images, and quantitative analysis [1]. Lossy, lossless, and hybrid are the categories of compression techniques. Representing the medical information source without significantly degrading its quality is achieved through lossless compression. Two crucial factors in image compression are redundancy and irrelevance: removing duplicate information from the image is called redundancy reduction, whereas discarding unnoticed image information is irrelevance reduction. A hybrid method of compression applies two or more algorithms to an image to achieve the best visual quality output. JPEG and JPEG2000 [2] are well-known algorithms; in their lossless modes they produce an output image with the same quality as the input image. DCT and DWT are the most extensively used transforms in image compression models. Transform coding and predictive coding play a significant role in lossless compression. Predictive coders such as JPEG-LS and CALIC use a single predictor to achieve lossless compression.
In this article, we propose a region-based medical image compression model in Sect. 3, where the region of interest and non-region of interest are segmented using the region-based active contour method. The BAT optimization algorithm further drives the region-based segmentation. The region-of-interest part is processed with the integer-based KL transform (IKLT), which realizes integer-to-integer mapping by factorization to achieve lossless compression. The non-region area is compressed with the KL transform. Results are analyzed in Sect. 4 against the existing image compression methods IDTT [3] and IDWT [4].
2 Related Works
energy minimization and can achieve a smooth, closed contour result. Among active contour methods, the edge-based model [9] and the region-based model [10] have been studied; the region-based active contour method proved efficient compared with the Chan-Vese method [11]. Metaheuristic algorithms are further combined with a segmentation algorithm to achieve an optimized output selection. A particle swarm optimization algorithm applied to segmentation was shown to be efficient in compressing the image while preserving high quality. Bat optimization with a fuzzy encoding method proves efficient when compared with traditional transform-based compression models. Using an optimization technique to tune the active contour method can bring out efficient segmented image output. An integer-based transform [12] tends to achieve a lossless image compression model, as proved in [3]; when the transformed outputs are rounded to nearby values, the result is a lossy compression technique. Integer-to-integer conversion is achieved through the integer-based transform model to produce a lossless compression technique.
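The reversibility idea behind integer-to-integer transforms can be seen in a much simpler example than the paper's IKLT: the S-transform lifting pair below (an illustrative stand-in, not the proposed method) maps integers to integers and inverts exactly:

```python
def s_transform(x, y):
    """Forward integer transform: floor-average and difference."""
    d = x - y
    a = y + (d >> 1)        # == floor((x + y) / 2), using arithmetic shift
    return a, d

def inverse_s_transform(a, d):
    """Exact inverse: recovers (x, y) with no rounding loss."""
    y = a - (d >> 1)
    x = y + d
    return x, y

pair = (200, 57)
a, d = s_transform(*pair)
assert inverse_s_transform(a, d) == pair   # lossless round trip
print(a, d)  # 128 143
```

Because every lifting step rounds and then stores the exact residual, the rounding is undone on inversion, which is what makes such factorized transforms suitable for lossless coding.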
3 Proposed Method
The findings of the survey indicate that a region-based compression model can be an efficient methodology, preserving an uncompressed, detailed image of the region-of-interest area. In this section, image segmentation is performed with a region-based active contour method, which is optimized using the BAT algorithm. The segmented region-of-interest areas are fed into the integer-based KL transform for decorrelation, and the non-region-of-interest area is compressed using the KL transform, as illustrated in Fig. 1.
To segment the region of interest and non-region of interest from the medical database, an active contour method of segmentation is utilized. Compared with classic segmentation methods like edge detection [13], thresholding, and region growing, the active contour method attains sub-pixel accuracy of the area boundaries, is easy to formulate in the energy minimization context, and, as a result, achieves smooth and closed contours.
Fig. 1 Proposed model: the input medical image is segmented by a region-based active contour method driven by the BAT algorithm; the ROI is compressed with IKLT and the non-ROI with KLT, yielding the compressed image
The region-based active contour method implemented in this article aims to identify each region of interest using a certain region descriptor to guide the movement of the active contour. It works based on intensity homogeneity [14]. The article [14] overcomes an assumption problem in region-based active contours, and that solution is implemented in the proposed method.
ε = Σ_{i=1}^{N} ∫∫ K(y − x) |I(x) − b(y)cᵢ|² dx dy (1)
The notation describes the true image (J) and observed image (I) on the image domain Ω, with N distinct constant values c₁, …, c_N on disjoint regions Ω₁, …, Ω_N; segmentation results from minimizing the energy function in Eq. (1). Here b denotes the component accounting for intensity inhomogeneity, and K is a kernel function chosen as a truncated Gaussian, as in [14]. A metaheuristic algorithm is combined with the active contour method to achieve the optimal segmented region of the medical data. The BAT algorithm [15] drives the region-based active contour method to adaptively select the external energy weights and escape the local minima that trap the classical contour method.
The BAT optimization is a metaheuristic algorithm based on the echolocation behavior of natural bats. In notation, each bat's frequency (fᵢ), position (Xᵢ), and velocity (vᵢ) are initialized along with its loudness (Aᵢ) and pulse rate (rᵢ). Over the iterations, the velocity and position of the bats are updated.
fᵢ = f_min + (f_max − f_min) β (2)
where f_min is the minimum frequency, f_max is the maximum frequency, and β is randomly selected in the interval [0, 1]. Random walks are created using Eq. (5), and Eqs. (6) and (7) are used to update the loudness and pulse rates.
Aᵢ(t + 1) = α Aᵢ(t) (6)
rᵢ(t + 1) = rᵢ(0) [1 − e^(−γt)] (7)
A Hybrid Approach for Region-Based Medical Image Compression … 229
where α and γ are constants; the bats' average loudness at time t is denoted by A(t), and the strength and direction of the random walk are controlled by the variable ε ∈ [−1, 1]. Figure 2 compares the region-based active contour method with the existing Chan-Vese active contour method, showing that the region-based segmentation works efficiently. Table 1 presents the BAT optimization algorithm.
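A compact sketch of the bat algorithm's main loop (frequency, loudness, and pulse-rate updates as in Eqs. (2), (6), and (7)) on a toy sphere function; the parameter values and the simplified greedy acceptance rule are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def bat_optimize(fn, dim=2, n_bats=20, iters=200,
                 f_min=0.0, f_max=2.0, alpha=0.97, gamma=0.9, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_bats, dim))            # bat positions
    v = np.zeros((n_bats, dim))                      # bat velocities
    loud, r0 = 1.0, 0.5                              # loudness A, pulse rate r(0)
    fit = np.array([fn(p) for p in x])
    best = x[fit.argmin()].copy()
    for t in range(1, iters + 1):
        beta = rng.random(n_bats)
        f = f_min + (f_max - f_min) * beta           # Eq. (2)
        v += (x - best) * f[:, None]
        cand = x + v
        r = r0 * (1 - np.exp(-gamma * t))            # Eq. (7)
        walk = rng.random(n_bats) > r                # some bats walk near best
        cand[walk] = best + 0.1 * loud * rng.normal(size=(int(walk.sum()), dim))
        cand_fit = np.array([fn(p) for p in cand])
        better = cand_fit < fit                      # simplified greedy accept
        x[better], fit[better] = cand[better], cand_fit[better]
        best = x[fit.argmin()].copy()
        loud *= alpha                                # Eq. (6)
    return best, float(fit.min())

best, val = bat_optimize(lambda p: float(np.sum(p ** 2)))
print(val)  # best sphere value found; non-increasing over iterations
```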
Fig. 2 Segmentation results using Chan-Vese contour method and region-based active contour
method
à : Z N → Z N , à = P L̃ Ũ S̃ (8)
After combining the region of interest and non-region of interest, performance metrics such as peak signal-to-noise ratio, mean square error, compression ratio, and SSIM are used for evaluation. Sample input images considered for evaluation and the compressed output images obtained are illustrated in Fig. 3.
PSNR = 10 × log₁₀(255² / MSE) (9)
Peak signal-to-noise ratio is a parameter for assessing the quality of the compressed image; it is defined in Eq. (9). MSE defines the mean square error using Eq. (10). The compression ratio is the size of the input image divided by the size of the compressed image, as given in Eq. (11).
MSE = (1/N) Σᵢ Σⱼ (f(x, y) − F(x, y))² (10)
CR = (size of input image) / (size of compressed image) (11)
SSIM = [(2 μ_x μ_y + C₁)(2 σ_xy + C₂)] / [(μ_x² + μ_y² + C₁)(σ_x² + σ_y² + C₂)] (12)
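Equations (9)–(11) can be computed directly with NumPy, as sketched below (SSIM, Eq. (12), requires windowed statistics and is best taken from a full implementation such as scikit-image's):

```python
import numpy as np

def mse(f, F):
    """Eq. (10): mean squared pixel difference."""
    return float(np.mean((f.astype(float) - F.astype(float)) ** 2))

def psnr(f, F):
    """Eq. (9) for 8-bit images; infinite for identical images."""
    m = mse(f, F)
    return float("inf") if m == 0 else 10 * np.log10(255.0 ** 2 / m)

def compression_ratio(original_bytes, compressed_bytes):
    """Eq. (11): input size over compressed size."""
    return original_bytes / compressed_bytes

f = np.full((64, 64), 120, dtype=np.uint8)
F = f.copy()
F[0, 0] = 130                 # one pixel off by 10
print(round(mse(f, F), 4))    # 100 / 4096 = 0.0244
print(round(psnr(f, F), 1))
```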
The proposed method is compared with the existing algorithms, integer-based DWT [4] and integer-based DTT [3], and the results are analyzed in Table 2. The proposed method outperforms the existing compression algorithms. The SSIM values in Table 2 reflect that the proposed method regains a high-quality output image, with a highest similarity of 0.9998. Bolded values in Table 2 indicate the highest value achieved compared with the other algorithms, proving that the region-based compression technique works efficiently with medical images in terms of separating the region of interest and non-region of interest. It also achieves a high-quality compressed image with a PSNR value of up to about 43 dB. Figure 4 describes the PSNR value analysis.
Table 2 Comparison of the proposed method with other existing algorithms for image compression

Images  | Methodology     | PSNR  | CR   | MSE  | SSIM
Image 1 | IDWT            | 40.14 | 4.1  | 2.19 | 0.9971
        | IDTT            | 42.01 | 4.49 | 2.53 | 0.9975
        | IKLT (proposed) | 42.39 | 4.58 | 2.03 | 0.9998
Image 2 | IDWT            | 40.64 | 3.91 | 2.16 | 0.9972
        | IDTT            | 40.16 | 4.18 | 2.20 | 0.9979
        | IKLT (proposed) | 40.86 | 4.29 | 2.09 | 0.9996
Image 3 | IDWT            | 40.26 | 4.40 | 2.17 | 0.9991
        | IDTT            | 41.10 | 4.8  | 2.09 | 0.9996
        | IKLT (proposed) | 41.4  | 4.93 | 1.9  | 0.9997
Image 4 | IDWT            | 41.92 | 4.25 | 2.48 | 0.9971
        | IDTT            | 42.07 | 4.72 | 2.60 | 0.9979
        | IKLT (proposed) | 43.21 | 4.91 | 2.32 | 0.9997
Image 5 | IDWT            | 41.2  | 3.72 | 2.40 | 0.9989
        | IDTT            | 42.81 | 4.04 | 2.15 | 0.9994
        | IKLT (proposed) | 42.13 | 4.12 | 2.23 | 0.9998
5 Conclusion
References
1. Gonzalez, R.C., Woods, R.E., Masters, B.R.: Digital image processing, Third Edition. J.
Biomed. Opt. 14(2), 029901 (2009). https://doi.org/10.1117/1.3115362
2. Skodras, A., Christopoulos, C., Ebrahimi, T.: The JPEG 2000 still image compression standard.
IEEE Signal Process. Mag. 18(5), 36–58 (2001). https://doi.org/10.1109/79.952804
3. Xiao, B., Lu, G., Zhang, Y., Li, W., Wang, G.: Lossless image compression based on integer
Discrete Tchebichef Transform. Neurocomputing 214, 587–593 (2016). https://doi.org/10.
1016/j.neucom.2016.06.050
4. Nagendran, R., Vasuki, A.: Hyperspectral image compression using hybrid transform with
different wavelet-based transform coding. Int. J. Wavelets Multiresolut. Inf. Process. 17(2),
1–21 2019. https://doi.org/10.1142/s021969131941008x
5. Chen, X., Zhou, Y., Luo, Q.: A hybrid monkey search algorithm for clustering analysis. Sci.
World J. 2014 (2014). https://doi.org/10.1155/2014/938239
6. Vincent, C.S., Janet, J.: An enhanced N-pattern hybrid technique for medical images in
telemedicine. Procedia Comput. Sci. 79, 305–313 (2016). https://doi.org/10.1016/j.procs.2016.
03.040
7. Palanivelu, L.M., Vijayakumar, P.: Effective image segmentation using Particle Swarm Opti-
mization for image compression in multi application smart cards. In: Proceedings of the World
Congress Information and Communication Technologies WICT 2011, pp. 535–539 (2011).
https://doi.org/10.1109/wict.2011.6141302
8. Horng, M.H.: Multilevel thresholding selection based on the artificial bee colony algorithm for
image segmentation. Expert Syst. Appl. 38(11), 13785–13791 (2011). https://doi.org/10.1016/
j.eswa.2011.04.180
9. Xie, W., Li, Y., Ma, Y.: PCNN-based level set method of automatic mammographic image
segmentation. Optik (Stuttg) 127(4), 1644–1650 (2016). https://doi.org/10.1016/j.ijleo.2015.
09.250
10. Zuo, Z., Lan, X., Deng, L., Yao, S., Wang, X.: Optik an improved medical image compression
technique with lossless region of interest. Optik—Int. J. Light Electron Opt. 126(21), 2825–
2831 (2015). https://doi.org/10.1016/j.ijleo.2015.07.005
11. Mandal, D., Chatterjee, A., Maitra, M.: Robust medical image segmentation using particle
swarm optimization aided level set based global fitting energy active contour approach. Eng.
Appl. Artif. Intell. 35, 199–214 (2014). https://doi.org/10.1016/j.engappai.2014.07.001
12. Hao, P., Shi, Q.: Matrix factorizations for reversible integer mapping. IEEE Trans. Signal
Process. 49(10), 2314–2324 (2001). https://doi.org/10.1109/78.950787
13. Kiran, R., Kamargaonkar, C.: Region separation techniques for medical. 1314–1325 (2016)
https://doi.org/10.15680/ijirset.2016.0502021
14. Li, C., Huang, R., Ding, Z., Gatenby, J.C., Metaxas, D.N., Gore, J.C.: A level set method for
image segmentation in the presence of intensity inhomogeneities with application to MRI.
IEEE Trans. Image Process. 20(7), 2007–2016 (2011). https://doi.org/10.1109/TIP.2011.214
6190
15. Yang, X.S.: A new metaheuristic bat-inspired algorithm. Stud. Comput. Intell. 284, 65–74
(2010). https://doi.org/10.1007/978-3-642-12538-6_6
Attention Mechanism-Based News
Sentiment Analyzer
Sweta Kaman
1 Introduction
The global news media, a very important source of information linked with all of our lives, is associated with numerous biased and unbiased press groups according to [1]. We still want to believe that these news sources, which influence each of us in many different ways, deliver true stories to the public and not sugar-coated ones. However, some news channels, press groups, and websites are corrupt and dedicated to political parties, and they intentionally publish fake news, hate speech, etc., transmitting an alarming and disturbing environment. They are so engaged in publishing negative stories that they have forgotten their role in society, i.e., enlightening the citizens of a country with reality and spreading positivity and hope. An attention mechanism [2] allows a neural network to pay attention to only a specific part of an input sentence while generating a translation, much like a human translator. The task of sentiment analysis [3] along with the attention mechanism will help us to identify those websites and news sources by
S. Kaman (B)
Department of Science of Intelligence, IIT Jodhpur, Karwar, India
e-mail: kaman.1@iitj.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 235
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_25
paying attention to each of the negative words present in an article and inferring its
intention, which will ultimately prevent us from following such sources and sustain
an environment filled with positivity. The goal of the model proposed in this paper is
to analyze the sentiment of an article and classify it as negative, positive, or neutral
with high accuracy.
The structure of this paper is as follows. Section 2 discusses the pros and cons of
the existing methods; Sect. 3 describes the dataset used in the project and the method
for preparing the train and test datasets; Sect. 4 elaborates the proposed methodology
step by step; Sect. 5 presents the experimental results and predictions of the proposed
model; Sects. 6 and 7 conclude the paper with some research accomplishments and
the future work of this project.
2 Related Work
Analyzing the sentiments of news articles has been a trend for the last few years, and
numerous models have been proposed to perform this task with high accuracy. One
such method was proposed by Lim et al. [4], who used a machine learning technique
to predict the opinion of articles with business headlines. The task of analyzing
sentiments has also been modified by classifying text into two categories, i.e., "good
news and bad news," as proposed by Alvarez et al. [5], who focused only on positive
and negative sentiments. In my opinion, however, both sentence- and document-level
analysis covering all three sentiments are important when classifying an article; this
is not included in the existing models, and the corresponding methodology is
described in the next sections of this paper.
3 Data Preparation
SemEval-2016 Task 4 [6] has been used as the training data for my model, consisting
of a combination of training and additional datasets. The training dataset alone
lacked the necessary amount of data to train my model, so I assembled all the
additional data files and train data files into a single file of 53,368 sentences, since
"the larger the corpus, the higher the accuracy." The training data contains three
basic sentiment columns which I explicitly constructed for each of the train
sentences, named negative, positive, and neutral. The values in these columns denote
the presence or absence of the corresponding sentiment tag, i.e., if the value under
the neutral column is 1, then the sentiment of the sentence is neutral, and if the value
is 0, it is not, and so on (see Fig. 1).
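The label layout just described can be sketched as follows; the column names match those in the text, while the helper name is illustrative:

```python
def to_sentiment_columns(label):
    """Map a sentiment label to the three indicator columns
    (negative, positive, neutral) described in the text."""
    columns = ("negative", "positive", "neutral")
    if label not in columns:
        raise ValueError(f"unknown sentiment: {label}")
    # exactly one column is 1, the other two stay 0
    return {name: int(name == label) for name in columns}

row = to_sentiment_columns("neutral")
# returns {'negative': 0, 'positive': 0, 'neutral': 1}
```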
BeautifulSoup [7], a Python library that parses XML and HTML files, has been used
in the proposed model to create a news crawler. The input to the crawler is a news
article from any news source, and the output is the article's preprocessed, clean,
tokenized sentences. These sentences are compiled together with the whole article
itself to construct the test data, i.e., if the number of sentences in the article is "n,"
then the "n + 1"th row of the file contains the article itself for the document-level
analysis. Three columns are generated explicitly as described in Sect. 3.1, but here
the values are initialized to 0.
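A minimal sketch of the crawler's cleaning step, assuming the article HTML has already been fetched; the function name, the paragraph-based extraction, and the naive regex sentence splitter are illustrative assumptions, not the paper's actual implementation:

```python
import re
from bs4 import BeautifulSoup  # the library named in the text

def article_to_sentences(html):
    """Strip markup from a news article and return its clean sentences,
    with the full article text appended as the final "n + 1"th row."""
    soup = BeautifulSoup(html, "html.parser")
    # gather visible paragraph text, collapsing whitespace
    text = " ".join(p.get_text(" ", strip=True) for p in soup.find_all("p"))
    text = re.sub(r"\s+", " ", text).strip()
    # naive sentence split on terminal punctuation
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    return sentences + [text]
```

Each returned row would then receive the three zero-initialized sentiment columns described above.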
4 Proposed Methodology
After performing some primitive preprocessing on the train and test datasets, GloVe
embeddings [8], which stands for global vector embeddings and was developed by a
group of Stanford researchers, are used. Unlike other word embeddings such as
Word2Vec [9], which consider only the local properties of the dataset, GloVe takes
its global properties into account. It makes use of the co-occurrence matrix, or simply
the count matrix, which helps in extracting the semantic relationships between words
and in predicting words that are semantically sound with respect to the words around
them; it also reduces the dimension of the word vector embedding. The GloVe
embeddings I have used in this model are 300-dimensional vectors. Then text-to-sequence
conversion is done and padding is added, after which the shape of
the train data is (53,368, 150) and the shape of the final test data is (310,
150). After splitting the train data into train and validation sets with a ratio of 80:20,
it is ready to be fed into the neural network, which uses an attention mechanism and
an LSTM network that helps decide what type of output to generate.
The activation function used is "relu," the optimizer used to minimize the loss
is "rmsprop," and the loss function is "binary_crossentropy." A brief
summary of the network is displayed in Fig. 2.
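The attention step can be illustrated with a minimal numpy sketch, assuming soft attention over the LSTM's per-time-step outputs (the function and variable names are illustrative, not the model's actual layer names): score each time step with a learned vector, normalize the scores with a softmax, and take the weighted sum as the sentence representation.

```python
import numpy as np

def attention_pool(hidden_states, score_vector):
    """Soft attention over time steps.
    hidden_states: (timesteps, units) array of LSTM outputs
    score_vector:  (units,) learned scoring weights
    Returns the attention-weighted context vector of shape (units,)."""
    scores = hidden_states @ score_vector            # (timesteps,)
    scores = scores - scores.max()                   # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over time steps
    return weights @ hidden_states                   # (units,)

# 150 padded time steps (the sequence length above); 64 hidden units assumed
h = np.random.default_rng(0).normal(size=(150, 64))
v = np.random.default_rng(1).normal(size=64)
context = attention_pool(h, v)  # shape (64,), fed to the final classifier
```

In the actual model this pooling would sit between the LSTM layer and the three-way output layer summarized in Fig. 2.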
5 Results
The model depicted in Sect. 4 is now ready for training, for which I set the batch
size to 256 and the number of epochs to 25. After successful training, the model
achieved an accuracy of 0.9186 and a loss of 0.1885 (Fig. 3). The validation loss
and accuracy are 0.1948 and 0.9257, respectively.
The final output consists of predicted scores for the probable sentiments corresponding
to each sentence of the test dataset. There are 310 rows in total in the final result,
of which the last row, i.e., ID 309, represents the whole article itself, while the
remaining rows are the sentences of the article. The three sentiment tag columns
contain predicted scores, and the highest of the three decides the final sentiment of
the sentence. The last row of the output is the document-level prediction of the news
article; its highest score is 0.999970, under the "neutral" column, which affirms that
the document-level sentiment is neutral. The rest of the rows are the sentence-level
predictions for the test dataset, as laid out in Fig. 4.
Fig. 4 Final predictions at sentence and document level (Source Jupyter Notebook)
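The decision rule above (the highest of the three predicted scores determines the final sentiment) can be sketched as:

```python
def decide_sentiment(scores):
    """Pick the sentiment whose predicted score is highest.
    scores: dict mapping 'negative'/'positive'/'neutral' to model outputs."""
    return max(scores, key=scores.get)

# the document-level row reported above; the two other scores are assumed
decide_sentiment({"negative": 0.01, "positive": 0.02, "neutral": 0.999970})
# returns 'neutral'
```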
6 Conclusions
This project can successfully contribute to the task of analyzing the sentiment of
news sources. The three elementary sentiments, i.e., neutral, negative and positive,
are successfully predicted with an accuracy of 0.9186 by using the attention mechanism
at the sentence and document level. This model is not limited to predicting the
sentiments of news articles; it can be modified further and applied in numerous other
fields of natural language processing.
7 Future Work
The task of sentiment analysis has an ample number of application areas, and in the
forthcoming years this number will only increase. This project can be further modified
for much better performance by using XLNet [10] or BERT [11], which are pre-trained
deep learning models. The news articles I have used can be replaced by people's
conversations so that the task of deception detection can be applied to them, for
example to determine the emotions and mental health of people during a pandemic
like COVID-19 [12], or to detect fake news on multiple platforms, since it spreads
chaos among people.
Acknowledgements This project was fortunately executed thanks to the inspirational ideas and
teachings I received from the numerous remarkable projects of Dr. L. Dey, Chief Scientist at TCS
Research and Innovation, India.
References
1. Eveland, J.W.P., Shah, D.V.: The impact of individual and interpersonal factors on perceived
news media bias. Polit. Psychol. 24(1), 101–17 (2003)
2. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Polosukhin, I.:
Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–
6008 (2017)
3. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retrieval 2(1–2),
1–135 (2008)
4. Lim, S.L.O., Lim, H.M., Tan, E.K., Tan, T.P.: Examining machine learning techniques
in business news headline sentiment analysis. In: Computational Science and Technology,
pp. 363–372. Springer, Singapore (2020)
5. Alvarez, G., Choi, J., Strover, S.: Good news, bad news: a sentiment analysis of the 2016
Election Russian Facebook Ads. Int. J. Commun. 14, 27 (2020)
6. Nakov, P., Ritter, A., Rosenthal, S., Sebastiani, F., Stoyanov, V.: SemEval-2016 task 4: sentiment
analysis in Twitter. arXiv preprint arXiv:1912.01973 (2019)
7. Chandrika, G.N., Ramasubbareddy, S., Govinda, K., Swetha, E.: Web scraping for unstruc-
tured data over web. In: Embedded Systems and Artificial Intelligence, pp. 853–859. Springer,
Singapore (2020)
8. Pennington, J., Socher, R., Manning, C.D.: Glove: global vectors for word representation. In:
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing
(EMNLP), pp. 1532–1543 (2014)
9. Rong, X.: Word2vec parameter learning explained. arXiv preprint arXiv:1411.2738 (2014)
10. Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R.R., Le, Q.V.: Xlnet: generalized
autoregressive pretraining for language understanding. In: Advances in Neural Information
Processing Systems, pp. 5754–5764 (2019)
11. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: pre-training of deep bidirectional
transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
12. Mehta, P., McAuley, D.F., Brown, M., Sanchez, E., Tattersall, R.S., Manson, J.J.: COVID-19:
consider cytokine storm syndromes and immunosuppression. Lancet 395(10229), 1033–1034
(2020)
Interactive Chatbot for COVID-19 Using
Cloud and Natural Language Processing
1 Introduction
A new disease that began spreading rapidly in China in late December 2019 has
been named "coronavirus disease 2019," also known as "COVID-19" [1]. Within a
few weeks, COVID-19 had spread rapidly outside China, all over the world. On
March 11, the World Health Organization declared it a pandemic.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 241
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_26
242 P. Jaimin et al.
2 Literature Review
Table 1 lists the 17 chatbots regarding COVID-19 that we surveyed. Among these
bots, we found only one conversational bot, Providence St. Joseph. Some bots provide
primary information about COVID-19 in question–answer form, while others
check the sign or severity of COVID-19 based on the user's symptoms.
3 Proposed Model
Figure 1 depicts the COVID-19 assessment bot flowchart. A total of three services
are used in this system: (i) the Messenger application [12], which is responsible
for the user interface; (ii) IBM Watson Assistant [13], a natural language
processing service from IBM, which detects the intent of a text message and
delivers an appropriate response back to the Messenger application; and (iii) AWS
Lambda [14], a cloud service where custom code can be executed, which is
responsible for predicting the severity and sign of COVID-19. The flow of this system
starts with the text the user enters in the Messenger application, which is sent to IBM
Watson Assistant to recognize its meaning. The assistant identifies the intent of the
text message and sends back the response defined in the dialog of that particular
intent. If the assistant identifies the intent as the assessment test intent, a POST
request webhook is called to AWS Lambda, where custom code is executed and
returns the prediction of COVID-19 sign and severity as the webhook response,
which the assistant sends back to the Messenger application as a dialog response.
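The webhook step can be sketched as an AWS Lambda handler; the symptom fields, weights, and severity thresholds below are illustrative assumptions, since the paper does not specify the actual assessment logic or payload schema:

```python
import json

# illustrative symptom weights: the real assessment rule is not
# specified in the paper, so this scoring scheme is an assumption
SYMPTOM_WEIGHTS = {"fever": 2, "dry_cough": 2, "breathlessness": 3, "fatigue": 1}

def lambda_handler(event, context):
    """Handle the Watson Assistant webhook POST: read the reported
    symptoms and return a sign/severity prediction as JSON."""
    body = json.loads(event.get("body", "{}"))
    reported = [s for s in SYMPTOM_WEIGHTS if body.get(s)]
    score = sum(SYMPTOM_WEIGHTS[s] for s in reported)
    severity = "high" if score >= 5 else "moderate" if score >= 3 else "low"
    return {
        "statusCode": 200,
        "body": json.dumps({"sign": bool(reported), "severity": severity}),
    }
```

Watson Assistant would forward this JSON back to the Messenger application as the dialog response.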
4 Implementation
We used the AWS Lambda service to predict COVID-19 sign and severity based on the
user's inputted symptoms. Lastly, the Facebook Messenger application needs to be linked
with Watson Assistant to make the bot conversational. When a user enters some text,
the Messenger application sends it to IBM Watson Assistant, which performs natural
language processing and returns a response based on the meaning of the user's text.
In this way, the user gets a personalized experience with the bot. It saves users time
and effort in getting the required information, as they can directly ask their query
instead of going through the structured design of websites. Figure 2 depicts the
chatbot profile, Fig. 3 the welcome message, Fig. 4 the question–answer dialog, and
Fig. 5 the symptoms checker.
5 Conclusion
In this demo bot, we are successfully able to give information to the user based on their
message. This type of healthcare bot does not require a dedicated server or storage;
it can easily be implemented in the cloud with the help of AI services such
as IBM Watson Assistant [13], Google Dialogflow [29], and Microsoft Azure [30].
In addition, it can easily be integrated into a website or any other messaging platform.
Thus, a chatbot is very useful, especially in this type of pandemic situation, as the
healthcare department can lower its burden by solving primary user queries with the
help of conversational chatbots.
Fig. 4 Question–answer
References
1. Wu, Y.-C., Chen, C.-S., Chan, Y.-J.: The outbreak of COVID-19. J. Chin. Med. Assoc.
83(3), 217–220 (2020). https://doi.org/10.1097/jcma.0000000000000270
2. https://www.cdc.gov/mmwr/volumes/69/wr/mm6918e2.htm?s_cid=mm6918e2_w
3. https://www.cdc.gov/coronavirus/2019-ncov/symptoms-testing/symptoms.html
4. https://www.indiatoday.in/newsmo/video/what-are-asymptomatic-covid-19-cases-1670422-2020-04-24
5. https://www.cdc.gov/nonpharmaceutical-interventions/index.html
6. https://en.wikipedia.org/wiki/Chatbot
7. https://en.wikipedia.org/wiki/Natural_language_processing
8. https://www.nlpworld.co.uk/nlp-glossary/i/intent/
9. Valtolina, S., Barricelli, B.R., Di Gaetano, S.: Communicability of traditional interfaces VS
chatbots in healthcare and smart home domains. Behav. Inf. Technol. 39(1), 108–132 (2020)
10. Fadhil, A.: Beyond patient monitoring: conversational agents role in telemedicine and
healthcare support for home-living elderly individuals. arXiv preprint arXiv:1803.06000 (2018)
11. https://en.wikipedia.org/wiki/Artificial_intelligence
12. https://developers.facebook.com/docs/messenger-platform/
13. https://cloud.ibm.com/docs/services/assistant?topic=assistant-getting-started#getting-started
14. https://docs.aws.amazon.com/lambda/latest/dg/welcome.html
15. https://my.clevelandclinic.org/landing/preparing-for-coronavirus
16. https://www.cdc.gov/coronavirus/2019-ncov/testing/diagnostic-testing.html
17. https://www.messenger.com/t/COVID19.MOHW.BW
18. https://hellotars.com/chatbot-templates/coronavirus-covid19-fight/NkBd08/covid-19-assessment-chatbot-template
19. https://hellotars.com/chatbot-templates/coronavirus-covid19-fight/VJXnAu/covid-19-cases-tracker
20. https://hellotars.com/chatbot-templates/coronavirus-covid19-fight/NJJrH-/covid-19-faq-chatbot
21. https://www.facebook.com/MyGovIndia/
22. https://www.projectbaseline.com/study/covid-19/
23. https://coronavirus.providence.org/
24. https://www.buoyhealth.com/symptom-checker/
25. https://nodejs.org/en/docs/
26. https://developers.facebook.com/docs
27. https://cloud.ibm.com/docs/assistant?topic=assistant-intents
28. https://cloud.ibm.com/docs/assistant?topic=assistant-entities
29. https://dialogflow.com/
30. https://azure.microsoft.com/en-in/
Investigating the Performance
of MANET Routing Protocols Under
Jamming Attack
Abstract Mobile ad hoc networks are a genre of wireless networks whose nodes can
act as both routers and hosts and have the ability to organize themselves dynamically
without using static infrastructure. Because of the absence of central administration
and the rapid topological changes, they are strongly affected by various security
attacks. The jamming attack is a physical-layer attack that decreases network
performance by cutting off communication with neighboring nodes. This paper aims to
determine the network performance under a jamming attack for three routing protocols:
the geographical routing protocol (GRP), the optimized link state routing protocol
(OLSR) and the ad hoc on-demand distance vector (AODV) protocol. The simulation of
these protocols considers the performance parameters network load, throughput and
delay, using the Riverbed simulator. Finally, the outcomes of the different scenarios
are compared to find the better-performing protocols in case of a jamming attack.
1 Introduction
In recent years, the mobile ad hoc network (MANET) has gained popularity because
of its dynamic characteristics and mobility, and because it can handle any kind of
change that happens within the network without requiring any central management
[1]. Every node in the network plays a role in discovering routes and maintaining
connections with the other nodes around it. A great benefit of a MANET is that
it can be created in any place, at any time and under any natural conditions without
the need for pre-installed infrastructure [2]. Note that wireless networks face
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 251
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_27
252 P. Sen and M. Rahman
more security challenges than wired networks because of the mobility of the nodes
[3, 4]. The pulse jamming attack is one of the most serious denial of service
(DoS) attacks, preventing information transmission between a genuine sender and
receiver; sometimes a malicious node can detect the original signal and disrupt the
communication [5]. In this paper, we present the performance of three routing
protocols, AODV, OLSR and GRP, using medium FTP, medium email and
low database traffic, with respect to the performance parameters throughput and delay.
The same setups are simulated under a pulse jamming attack to find out which
protocol performs best with and without the attack.
2 Related Works
Singh and Gupta [1] analyzed the performance of a MANET with and without a
jamming attack. The AODV routing protocol was selected for simulation, and the
network performance was surveyed through simulation results for delay, data dropped
and network load; from the comparison, it was concluded that a jamming attack
decreases network performance. Rao et al. [2] investigated the performance of the
MANET routing protocols AODV, DSR, GRP and OLSR; eight performance metrics
were used to compare the simulation results, on the basis of which OLSR proved
better than the others. Popli and Raj [3] configured a network with high mobility and
the AODV routing protocol, and the network in normal condition was compared with
the network under a jamming attack in terms of end-to-end delay and throughput.
Jassim [8] presented the effect of a jamming attack on a WLAN: the attack decreased
throughput and increased delay, and to mitigate it, PCF was enabled on the guard
nodes.
3 Routing Protocols
The ad hoc on-demand distance vector routing protocol is used for mobile nodes in a
MANET and can handle thousands of nodes at a time. It performs route table
management to find a single destination route instead of multiple routes, and no
other nodes are required to maintain it. The main features of this protocol are that it
can adapt quickly and requires less processing and lower utilization of the network
[9]. Three message formats are used in this protocol: route request (RREQ), a
broadcast message used to discover a new route to the receiver node; route reply
(RREP), a unicast message sent in reply to an RREQ flood; and route error (RERR),
a re-broadcast message used only when at least one unreachable destination is found [10].
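The three AODV message types described above can be sketched as simple data structures; the fields shown are a simplified illustrative subset of those defined for AODV (RFC 3561), not the full packet formats:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RREQ:
    """Broadcast route request: floods the network to discover
    a route to the destination."""
    rreq_id: int
    originator: str
    destination: str
    hop_count: int = 0

@dataclass
class RREP:
    """Unicast route reply: travels back along the reverse path
    set up by the RREQ flood."""
    originator: str
    destination: str
    hop_count: int
    lifetime_ms: int

@dataclass
class RERR:
    """Route error: re-broadcast when at least one destination
    becomes unreachable."""
    unreachable: List[str] = field(default_factory=list)
```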
GRP is a familiar routing protocol for mobile networks that deals with the position
of the source node. Because it combines the robustness of proactive and reactive routing
protocols, it is called a hybrid routing protocol [2]. The source node is responsible for
collecting the information needed to find the best route, according to which data
packets are transmitted. A great disadvantage of this protocol is its complexity
and overhead [10].
A suitable protocol for random traffic and large networks is the optimized link state
routing protocol. It uses the multipoint relays (MPRs) of a node to forward packets
rather than flooding them [2]. It executes hop-by-hop routing and successfully delivers
its packets to the destination along the shortest route. Its proactive behavior makes
available routes immediately accessible when needed [11]. The distributed character
of its design ensures that no central entity is required. OLSR is suited to any dense
network where a considerable number of nodes communicate frequently [10].
4 Experimental Details
In this section, the required simulation tool along with the simulation setup will be
described. In this paper, Riverbed Modeler 17.5 is used for network simulation. It was
previously known as the OPNET simulator; Riverbed is the modified version of OPNET,
specialized for network research and development. It enables the user to design
communication networks with different devices and protocols, test security, and
simulate the network with different performance parameters and applications [7].
Several experiments in wireless technologies concerning development problems and
their solutions have been set up with it.
The simulation setup describes the MANET's performance for three different protocols.
For each protocol, two scenarios are designed to compare its performance with and
without a jamming attack. A campus network is implemented with 20 mobile nodes
in a 10 × 10 km area. Each scenario is set to run for 250 s of simulation with a seed
value of 128. For traffic generation on the network, the applications used are email
(medium load), FTP (medium load) and database (low load) (Tables 1 and 2).
5 Results
The simulation outcomes are compared and studied in this section. Jammer nodes are
placed inside the network to compare the performance of the jammed network with
its normal condition. The comparison is made by observing throughput, delay,
traffic sent, traffic received, etc.
In the first scenario, the AODV protocol is configured without any jammer node.
This scenario is then modified by introducing jammer nodes, and the performance of
the new scenario is compared with that of the previous one.
From the simulation result, it is clear that the jammer nodes decrease the performance
of the network by creating unwanted traffic: throughput falls from 4.5 megabits
to 3.0 megabits, and delay increases from 3.6 to 4.8 s. Figure 2 shows the performance
of AODV in the normal condition and under attack.
The same networks are configured with the OLSR protocol with and without the jamming
attack. Figure 3 shows the performance parameters of the OLSR protocol in both cases.
Comparing the routing traffic sent, it is almost 114,000 bits without the jammer,
whereas this value falls to almost 72,000 bits in the jammer's presence.
The results of both simulations are then compared using the throughput and
delay of the network: after introducing jammer nodes, throughput falls from
4.0 megabits to 2.52 megabits and delay rises from 3.9 to 5.75 s, owing to congestion
in the network. As a result, the overall performance degrades.
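The relative degradation implied by the reported figures can be checked with a quick calculation (units as in the text):

```python
def pct_change(before, after):
    """Percentage change from the no-jammer to the jammed scenario."""
    return 100.0 * (after - before) / before

# AODV under jamming
aodv_throughput = pct_change(4.5, 3.0)   # about -33% throughput
aodv_delay = pct_change(3.6, 4.8)        # about +33% delay

# OLSR under jamming
olsr_throughput = pct_change(4.0, 2.52)  # -37% throughput
olsr_delay = pct_change(3.9, 5.75)       # about +47% delay
```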
In this case, two scenarios are again simulated for the GRP protocol, one with jammer
nodes and one without. The performance parameters of the GRP protocol for these two
scenarios are compared in Fig. 4. This comparison gives clear evidence of the
degradation of network performance during a jamming attack: the throughput and
delay of the network under the GRP protocol are also affected, since the attack
decreases the number of packets reaching the destination during run time.
6 Conclusion
Due to the nature of the wireless medium between the sender node and the receiver
node, a MANET is more susceptible to different attacks, which degrade network
performance. The objective of this research was to find a reliable wireless routing
protocol in the face of a jamming attack. The networks under the AODV, OLSR and
GRP protocols are all severely affected by the jamming attack. Among these three
protocols, OLSR shows the worst performance in terms of traffic sent, delay,
throughput, traffic received and network load. From the observed results, it can be
concluded that the GRP and OLSR protocols are more vulnerable under a jamming
attack; on the contrary, AODV is verified as the best performer among the three
aforementioned protocols. For this reason, configuring the network with the AODV
protocol is the best choice to withstand a jamming attack. Security in wireless
networks is now a great concern, and this research work could be further expanded
to other security attacks such as the wormhole attack and the Byzantine attack, along
with prevention mechanisms against them.
References
1. Singh, J., Gupta, S.: Impact of jamming attack in performance of mobile ad hoc networks. Int.
J. Comput. Sci. Trends Technol. (IJCST) 5(3), 184–190 (2017)
2. Rao, Y.C., Kishore, P., Prasad, S.R.: Riverbed modeler simulation-based performance analysis
of routing protocols in mobile ad hoc networks. Int. J. Recent Technol. Eng. (IJRTE) 7(6S),
350–354 (2019)
3. Popli, P., Raj, P.: Effect of jamming attack in mobile ad hoc environment. Int. J. Sci. Eng.
Technol. Res. (IJSETR), 5(5), 1521–1526 (2016)
4. Yadav, N., Dr. Kumar, V.: Securing ad hoc network by mitigating jamming attack. Int. J. Adv.
Res. Comput. Eng. Technol. (IJARCET) 4(6), 2502–2506 (2015)
5. Bandaru, S.: Investigating the effect of jamming attacks on wireless LANS. Int. J. Comput.
Appl. (0975–8887) 99(14), 5–9 (2014)
6. Manickam, P., Baskar, T.G., Girija, M., Dr. Manimegalai, D.: Performance comparisons of
routing protocols in mobile ad hoc networks. Int. J. Wirel. Mob. Netw. (IJWMN) 3(1), 98–106
(2011)
7. Jasim, S.I.: PCF investigation to improve the performance of TORA—based manet against
jamming attacks. Int. J. Comput. Sci. Eng. Survey (IJCSES) 5(3), 17–28 (2014)
8. Jassim, S.I.: Investigate the integration of PCF in WLAN to improve its performance against
attackers. J. Univ. Babylon Pure Appl. Sci. 26(5), 241–255 (2018)
9. Modi, S., Dr. Singh, P., Dr. Rani, S.: Performance improvement of mobile ad hoc networks
under jamming attack. Int. J. Comput. Sci. Inf. Technol. 5(4), 5197–5200 (2014)
10. Baxla, S., Nema, R.: Performance analysis of AODV, OLSR, DSR and GRP routing protocols
of ad hoc networks. Int. J. Innovative Res. Dev. 2(5), 888–900 (2013)
11. Jacquet, P., Muhlethaler, P., Clausen, T., Laouiti, A., Qayyum, A., Viennot, L.: Optimized link
state routing protocol for ad hoc networks. In: Proceedings, IEEE International Multi Topic
Conference, 2001, IEEE INMIC 2001, Technology for the 21st Century, pp. 62–68. IEEE
(2001)
Classification of Skin Cancer Lesions
Using Deep Neural Networks
and Transfer Learning
Abstract Skin cancer is among the life-threatening cancers, but unlike most cancers,
it is observable and can be detected in its early stages, yet not many people are aware
of its detectability. There are mainly three types of skin cancer, namely basal
cell carcinoma, squamous cell carcinoma, and melanoma, of which melanoma is the
most dangerous, with a very low survival rate. Skin cancers are not
painful most of the time, yet they appear visibly distressing, which makes
them easily detectable, as cancer is nothing but the abnormal growth of skin cells.
A person can check whether a skin lesion is cancerous by taking a picture of it, and
deep neural networks can be used to classify the type of cancer. This is done by
collecting several clinical images of cancerous skin lesions, applying segmentation,
removing noise, etc., and feeding them to a deep neural network to train on before
detecting cancerous lesions. Our data was collected from the HAM10000 dataset,
the ISIC Archive, and images scraped from the Web. Every class has 3552 images,
for a total of 10,656 images; image augmentation was used to generate images so
that all classes have an equal number of images. The first model was a basic CNN,
trained several times while changing the hyperparameter values to fine-tune the
model, which gave us 86.5% accuracy; we then implemented transfer learning with
the ImageNet weights of different ImageNet
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 259
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_28
260 D. J. Devarapalli et al.
models, where ResNet50 gave us the highest accuracy of 95.6%. We have deployed
this as a Web application using JavaScript and TensorFlow.js.
1 Introduction
There are many types of skin diseases and skin cancers, and some of them can be
fatal. Skin cancers must be treated in the earlier stages, or they prove deadly in the
long run. Many people neglect skin lesions, which can cost them their lives. In rural
areas, where hospitals are not well equipped, there is a struggle to detect these diseases
in the early stages, so the need for automatic skin disease prediction is increasing for
patients as well as dermatologists. The currently available diagnosis procedure
consists of long laboratory processes, and it takes two weeks for a patient to get
their biopsy results, whereas this system enables users to predict skin disease using
image processing. With a predictor system, a patient or dermatologist can easily check
whether a lesion is malignant, so that the cancer can be treated in the early
stages. We collected images of the three most common skin cancers, and the prediction
system is used to predict these three, namely melanoma, squamous cell
carcinoma (SCC), and basal cell carcinoma (BCC). Early detection of melanoma can
potentially improve the survival rate, yet nearly 30,000 people die of it yearly in the
USA alone. Skin cancers do not cause any pain most of the time, but they are visible, as
the cancer is nothing but the abnormal growth of skin cells. In this paper, we
discuss our prior study in the Literature Review section, covering published work
with similar objectives; in the Proposed Work section we discuss our study and the
workflow of its implementation; the Results section shows promising results
when transfer learning is used; and the Future Scope section discusses the
possible real-world implementations of this work.
1.1 Objective
Primary Objective: Our primary objective is to classify skin cancer lesions into their types by building the best model using convolutional neural networks and transfer learning.
Secondary Objective: Our secondary objective is to learn how convolutional neural networks work on sample data like this, to carefully study other architectures such as VGG16/19, InceptionV2/V3, and ResNet50/101, which have excelled in the ILSVRC over the years, and to study the results with and without transfer learning.
Classification of Skin Cancer Lesions Using Deep Neural … 261
2 Literature Review
There has been much research not only on skin cancer detection/classification but also on many other skin diseases; some previously published papers are reviewed here. Ansari and Sarode [1] proposed a technique to classify melanoma lesions only. This careful study of skin cancers and their anatomy performed three preprocessing techniques (grayscale conversion, noise removal, and image enhancement) and used a support vector machine classifier; the image is segmented and then fed into the fit function for training. As melanoma is tested based on shape, and a very small sample size was used, the approach cannot cover the variability of real-world images. He et al. [2] proposed computer-aided clinical skin disease diagnosis using CNN and object detection models, showing the influence of the object detection technique in this approach: it can increase accuracy and decrease computation cost by removing unwanted background learning. They used an ensemble learning approach to get the final output and two datasets, Skin-10 (which contains images belonging to 10 classes of skin diseases, with a total of 10,218 images) and Skin-100 (which contains 100 classes of common skin diseases, with a total of 19,807 images). The best accuracy achieved is 79.01% for Skin-10 and 53.54% for Skin-100; we believe the reason for the low accuracy is class imbalance, after checking the dataset they mention, and the model was sensitive toward the high-volume classes, which may have led to the poor performance. The importance of object detection and ensemble learning methods is emphasized. Jana et al. [3] researched skin cancer cell detection using image processing, showing the role of segmentation and feature extraction, and proposed a technique to remove unwanted features in the area of interest, such as hair. Ambad [4] presented an image analysis system to detect skin diseases whose workflow comprises basic steps and uses a two-level classifier, which is a great idea: the first classifier detects whether a lesion is normal or defective and, if it is infected, the second classifier classifies whether the lesion is melanoma, psoriasis, or dermo. A two-way classifier is a good approach.
3 Methodology
This project aims to determine the correct class of a skin cancer lesion, achieved by training a deep convolutional neural network along with other segmentation methods. The major area of concentration and evaluation is defining the number of epoch cycles and batch sizes well enough to fit the data non-linearly.
The workflow of this research follows the steps below; a clear explanation and description are given for every step.
262 D. J. Devarapalli et al.
Data cleaning is done by removing unwanted images and cropping the images so that the lesion is in focus. After data collection, the images are uploaded to Google Drive in different folders, that is, HAM10000, Web, and social media. Some images contained watermarks and clinical markings on lesions, which might confuse the model; these pictures have been either discarded or edited using the tool GIMP. The images are resized to 224 × 224 × 3, since most of the ImageNet models use this image format.
This step produces the final image data that is ready to be fed to a deep neural network. Our dataset is imbalanced, meaning our classes have different numbers of images: for melanoma we found many images, since it is very common, but fewer for squamous and basal cell carcinoma; melanoma had 1500+ images, SCC 700+, and BCC 900+. With class-imbalanced data, there is a greater chance that the model will overfit to the class with the highest number of images, or be sensitive only to that class. To avoid this vulnerability to overfitting, we used the ImageDataGenerator class, which generates images with the specified image augmentations.
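Concretely, the balancing step works out as simple arithmetic. The following sketch uses the approximate per-class counts reported above (the exact raw counts are not stated precisely in the text):

```python
# Sketch of the class-balancing arithmetic: every class is augmented up
# to a common target so the dataset ends up with equal class sizes.
# The raw counts below are the approximate figures reported above.
raw_counts = {"melanoma": 1500, "scc": 700, "bcc": 900}
TARGET = 3552  # images per class after augmentation

# How many augmented images must be generated per class.
to_generate = {cls: TARGET - n for cls, n in raw_counts.items()}

total = TARGET * len(raw_counts)
print(to_generate)  # {'melanoma': 2052, 'scc': 2852, 'bcc': 2652}
print(total)        # 10656, matching the dataset size reported earlier
```

In practice the extra images come from Keras's ImageDataGenerator, using transformations such as rotations and flips.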
In deep learning, model training is the most time-consuming job, since it takes a long time to train. To overcome this, we trained our models on Google Colab, which gives users free access to a compute backend with integrated GPUs and TPUs and almost 12 GB of RAM. We started by building a simple CNN architecture to train on our data.
A CNN architecture starts with feature extraction, followed by pooling layers and fully connected layers, and finishes with classification. Feature extraction is performed by alternating convolution layers with subsampling or pooling layers. Classification is performed with dense or fully connected layers followed by a final softmax layer. For image classification, a CNN architecture performs better than a fully connected feedforward neural network. According to Deotte [6], a basic CNN architecture contains the following.
• Filters: the number of desired feature maps.
• Kernel size: the size of the convolution kernel. A single number 5 means a 5 × 5 convolution.
• Padding: either ‘same’ or ‘valid’. Leaving this blank results in padding = ‘valid’. If padding is ‘valid’, then the size of the new layer’s maps is reduced by kernel_size − 1. For example, if you perform a 5 × 5 convolution on a 28 × 28 image (map) with padding = ‘valid’, then the next layer has maps of size 24 × 24. If padding is ‘same’, then the size is not reduced.
• Activation: applied during forward propagation. Leaving this blank results in no activation.
• We used ‘ReLU’ activation for every layer and softmax at the end; we also used a dropout of 0.4 (40%) to generalize, thereby avoiding overfitting. Our architecture takes an input_size of (28, 28, 1), a ‘grayscale’ image, and consists of:
• Two convolutional layers with 32 feature maps and kernel_size 3 × 3, with activation ‘ReLU’, and one convolutional layer with 32 feature maps, kernel_size 5 × 5, and stride 2.
• Two convolutional layers with 64 feature maps and kernel_size 3 × 3, with activation ‘ReLU’, and one convolutional layer with 64 feature maps, kernel_size 5 × 5, and stride 2.
• A flatten layer followed by a fully connected layer, dense 128, a dropout of 0.4, and dense 3 (the number of classes present).
• The model is trained with batch_size = 32 and epochs = 100. We added batch normalization after every layer, as well as two dropouts: one of 0.4 after two layers and one of 0.5 before the final fully connected layer. We used kernel_regularizer = l2(0.001) and bias_regularizer = l2(0.001). The validation accuracy after 100 epochs was 0.8647 and the training accuracy was 0.9947, but this was not our best model. We next trained with ILSVRC models that were trained on the ImageNet dataset. We first trained our data without ImageNet weights, solely on their architectures, to see the results, and then used the transfer learning method to fine-tune the models with ImageNet pretrained weights.
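The baseline architecture described in the bullets above can be sketched in Keras roughly as follows. This is our sketch, not the authors' exact code: the padding choices and the exact placement of batch normalization and regularizers are assumptions.

```python
# Sketch of the baseline CNN described above: two 3x3 convs followed by
# a strided 5x5 conv per block (in place of pooling), batch norm after
# each layer, dropout, and a 3-way softmax head.
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

def build_baseline_cnn(input_shape=(28, 28, 1), num_classes=3):
    reg = regularizers.l2(0.001)
    model = models.Sequential([
        tf.keras.Input(shape=input_shape),
        # Block 1: 32 feature maps.
        layers.Conv2D(32, 3, activation="relu", padding="same",
                      kernel_regularizer=reg, bias_regularizer=reg),
        layers.BatchNormalization(),
        layers.Conv2D(32, 3, activation="relu", padding="same",
                      kernel_regularizer=reg, bias_regularizer=reg),
        layers.BatchNormalization(),
        layers.Conv2D(32, 5, strides=2, activation="relu", padding="same"),
        layers.BatchNormalization(),
        layers.Dropout(0.4),
        # Block 2: 64 feature maps.
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.BatchNormalization(),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.BatchNormalization(),
        layers.Conv2D(64, 5, strides=2, activation="relu", padding="same"),
        layers.BatchNormalization(),
        # Classifier head.
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_baseline_cnn()
# model.fit(x, y, batch_size=32, epochs=100) as described above.
```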
We used ResNet101, VGG, and InceptionResNetV2 with and without transfer learning, that is, with weights = None and, for transfer learning, with weights = ‘imagenet’. We created a pickle object file of our prepared image data in our drive to avoid re-extracting the data on every run, simply loading it from Google Drive instead.
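The caching idea can be sketched with a small helper. The file name and the stand-in builder below are illustrative, not the authors' code:

```python
# Cache prepared data with pickle so extraction happens once and later
# runs just reload the file (which, in the paper, lived on Google Drive).
import os
import pickle

def load_or_build(build_fn, cache_path):
    """Return cached data if the pickle exists; otherwise build and cache."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = build_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)
    return data

# Stand-in builder; the real one decoded and resized the images.
data = load_or_build(lambda: {"num_images": 10656, "classes": 3},
                     cache_path="image_data.pkl")
```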
264 D. J. Devarapalli et al.
VGG16. Tewari [8] describes VGG16, a CNN model that achieved 92.7% top-5 test accuracy on the ImageNet dataset. The network has a very simple architecture, using only 3 × 3 convolutional layers stacked on top of each other in increasing depth.
• We used the Adam optimizer with a learning rate of 0.0001.
• With VGG16’s architecture: total params 3,588,003, trainable params 3,588,003, non-trainable params 0, with a batch_size of 32, 100 epochs, and a validation_split of 0.2. TRAINING ACCURACY = 0.9981 and VALIDATION ACCURACY = 0.8918.
According to Marcelino [10], with transfer learning, instead of starting the learning process from scratch, we start from patterns learned when solving a different problem, such as ImageNet in our case. This way we leverage previous learning and avoid starting from scratch, which saves a lot of time. There are three strategies to implement transfer learning:
1. Train the entire model.
2. Train some layers and leave the others frozen.
3. Freeze the convolutional base.
Of the three strategies, the second is used when we have a small dataset, as our dataset of 10k images is small compared to the 14 million images of ImageNet.
We used keras.applications for the VGG16, InceptionResNetV2, and ResNet101 architectures, setting the weights parameter to ‘imagenet’ as described in the Keras applications documentation. The input images were extracted and resized to 224 × 224 × 3, since most ImageNet models use this image format, with include_top = False. All the models use the loss function ‘sparse_categorical_crossentropy’, since one image belongs to only one type of cancer, and metrics = [‘accuracy’] for all.
VGG16-ImageNet Weights
• We added a convolutional layer with 64 feature maps and kernel_size = (3, 3), a max pooling 2D layer with pool_size = 2, a flatten layer, a fully connected layer of 256, a dropout of 0.5, and a fully connected layer of 3 with ‘softmax’ activation.
• We used the Adam optimizer with a learning rate of 0.0001.
• With VGG16’s architecture: total params 12,278,915, trainable params 12,278,915, non-trainable params 0, with a batch_size of 32, 100 epochs, and a validation_split of 0.2.
• TRAINING ACCURACY = 0.9785 and VALIDATION ACCURACY = 0.9009, which is an acceptable result, but we cannot call it an optimal model, as the difference between the accuracies shows evidence of overfitting.
InceptionResNetV2—ImageNet weights
• We added a flatten layer to the loaded model, a dropout of 0.4, and finally a dense layer of 3 with ‘softmax’ activation.
• We used the Adam optimizer with a learning rate of 0.0001.
• With InceptionResNetV2’s architecture: total params 54,451,939, trainable params 54,391,395, non-trainable params 60,544, with a batch_size of 32, 10 epochs, and a validation_split of 0.2.
• TRAINING ACCURACY = 0.9927 and VALIDATION ACCURACY = 0.9314, which is an acceptable result, though we cannot certainly call it an optimal model; the difference, however, does not show much evidence of overfitting.
ResNet101—with ImageNet Weights
• For ResNet101, we added a global average pooling 2D layer, a dropout of 0.4, and finally a dense layer of 3 with ‘softmax’ activation.
• We used the Adam optimizer with a learning rate of 0.0001, as for the rest of the models; the slow learning rate gave us better results.
• With ResNet101’s architecture: total params 42,664,323, trainable params 42,558,979, non-trainable params 105,344, with a batch_size of 32, 100 epochs, and a validation_split of 0.2.
• TRAINING ACCURACY = 0.9992 and VALIDATION ACCURACY = 0.9563, which is an acceptable result and almost an optimal model. This has been the best result so far, with training and validation accuracies differing very little, which means the model trained well on the training data of 8516 images, generalized well without overfitting, and predicts new data with 95.63% accuracy. Compared with the ResNet101 model without any pretrained weights, the transfer learning method outperformed the traditional approach.
• The learning curves (in Fig. 1) also do not show much evidence of overfitting, compared with ResNet101 without pretrained weights.
4 Results
The experimental results for the input images of skin cancers, obtained by means of the transfer learning approach, are shown below.
Hence, from the results obtained with the deep learning algorithms (in Table 1), it can be concluded that ResNet101 is the best algorithm for predicting the class of skin lesions.
Table 1 Accuracies of different deep learning algorithms

Algorithm (with ImageNet weights) | Training accuracy | Validation accuracy
VGG16 | 0.9785 | 0.9001
InceptionResNetV2 | 0.9927 | 0.9317
ResNet101 | 0.9992 | 0.9563
5 Conclusion
The project’s key goal is to predict the type of a skin lesion from an image with the highest possible accuracy, by means of the transfer learning approach. Several architectures were trained with different learning rates, epochs, and batch sizes; however, the ResNet101 architecture with ImageNet weights gave us the best accuracy for identifying the type of a skin cancer lesion, 95.63%, with a training accuracy of 99.92%, and we do not see the model overfitting in this case. Also, an ensemble approach, which has been reported to give better results, is implemented using a basic voting mechanism written in Python. We tried to deploy this model as a Web application, but ran into a few errors with the Express server and the tensorflow.js version. Our second goal, of understanding how these deep neural networks work and knowing how to implement and fine-tune them for better results, is achieved.
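The basic voting mechanism mentioned above can be sketched as follows (our illustration, not the authors' exact script):

```python
# Majority-vote ensembling: each model votes a class label for an image
# and the most common label wins (Counter breaks ties by first seen).
from collections import Counter

def majority_vote(predictions):
    """predictions: one predicted class label per model."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical per-model predictions for a single lesion image.
votes = ["melanoma", "bcc", "melanoma"]
print(majority_vote(votes))  # melanoma
```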
6 Future Scope
Since the project identifies the cancer lesion type, it can be used by both dermatologists and patients. Before sending a clinical image for the biopsy test, dermatologists can run the lesion through the model and, based on the results, focus on validating whether the lesion belongs to the class that the model has specified. This would cut down the 2 to 3 week delay for the biopsy results. If the model predicts inaccurately in certain conditions, it can be retrained with the wrongly classified images to better learn the features it missed the first time. For better reliability, because we cannot solely trust a machine for the final prediction and take the machine’s result as the final answer, we can compare the performance of dermatologists and the machine by providing both the dermatologists and the model with a set of images to classify, validating their predictions, and evaluating the performance of the model against that of an experienced doctor.
References
1. Ansari, U.B., Sarode, T.: Skin cancer detection using image processing. Int. Res. J. Eng. Technol. (IRJET) (2017), Mumbai, India. Available at: https://www.irjet.net/archives/V4/i4/IRJET-V4I4702.pdf
2. He, X., Wang, S., Shi, S., Tang, Z.: Computer-aided clinical skin disease diagnosis using CNN and object detection models, Nov 2019, China. Available at: https://www.researchgate.net/publication/337413270_Computer-Aided_Clinical_Skin_Disease_Diagnosis_Using_CNN_and_Object_Detection_Models
3. Jana, E., Subban, R., Saraswathi, S.: Research on skin cancer cell detection using image processing. In: IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), Dec 2017, India. Available at: https://ieeexplore.ieee.org/document/8524554
4. Ambad, P.S.: An image analysis system to detect skin diseases. IOSR J. VLSI Sig. Process. (IOSR-JVSP) (2016), India. Available at: https://pdfs.semanticscholar.org/014e/75f75274d4b8a75ae3e2356556f7450fdb5a.pdf
5. Brownlee, J.: How to configure image data augmentation? (2019). Available at: https://machinelearningmastery.com/how-to-configure-image-data-augmentation-when-training-deep-learning-neural-networks/
6. Deotte, C.: Basic CNN architecture (2018). Available at: https://www.kaggle.com/cdeotte/how-to-choose-cnn-architecture-mnist
7. Fung, V.: An overview of ResNet and its variants (2017). Available at: https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035
8. Tewari, S.: CNN architecture series—VGG16 with implementation (Part I) (2019). Available at: https://medium.com/datadriveninvestor/cnn-architecture-series-vgg-16-with-implementation-part-i-bca79e7db415
9. Raj, B.: A simple guide to the versions of Inception networks (2018). Available at: https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
10. Marcelino, P.: Transfer learning from pre-trained models (2018). Available at: https://towardsdatascience.com/transfer-learning-from-pre-trained-models-f2393f124751
Security Features in Hadoop—A Survey
1 Introduction
The large amounts of data collected are known as Big Data. The data is collected from various sources like social media, databases, etc. Initially, Big Data characteristics were specified by the 3 V’s: variety, velocity, and volume. Volume specifies the size of the data to be stored. A recent study forecast that 1.8 zettabytes of data were created in 2011 alone [1]. Around 2.5 quintillion bytes of data are created every day, and every
G. Begum (B)
CSE Department, JNTU, Ananthapuramu, India
e-mail: gousiyabegum@gmail.com
S. Z. U. Huq
CSE Department, GPREC, Kurnool, India
e-mail: szahoor@gmail.com
A. P. Siva Kumar
MGIT, Hyderabad, India
e-mail: sivakumar.ap@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 269
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_29
270 G. Begum et al.
2 Hadoop Architecture
machine there are some splits, and for each split we run a mapper; that is parallelism inside parallelism, called massive parallelism. The client stores a file on HDFS to perform some operation on it, so it sends a request to the NameNode to store the data. The NameNode stores the file’s metadata and tells the client the locations of the DataNodes where the files in HDFS are to be stored [7]. The client then stores the files on those DataNodes. After successfully storing a file, the DataNode sends an acknowledgment to the client. The data is replicated on a minimum of 3 systems to avoid data loss. A heartbeat is sent from the DataNodes to the NameNode to deliver the block report and to tell the NameNode that the DataNode is still running. If the client wants to run processing, the JobTracker will handle it: it asks the NameNode where the data is stored, and the TaskTrackers perform the required functions. During processing, the two functions performed are Map and Reduce. Once the two functions are completed, the outputs from all DataNodes are taken and reduced to a single output. To report its current functioning, each TaskTracker also sends a heartbeat to the JobTracker. The main distributors of Hadoop software [8, 9] are 1. MapR, 2. Cloudera, 3. Hortonworks, 4. Microsoft HDInsight, 5. IBM InfoSphere BigInsights, and 6. Pivotal HD.
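The Map and Reduce flow described above can be illustrated with a toy word count in plain Python (this is not the Hadoop API):

```python
# Toy illustration of the Map/Reduce flow described above: each "split"
# is mapped independently (in Hadoop, in parallel across DataNodes),
# then the per-split outputs are combined and reduced to one result.
from collections import defaultdict

def map_phase(split):
    # Emit (key, 1) pairs, like a word-count mapper.
    return [(word, 1) for word in split.split()]

def reduce_phase(pairs):
    # Sum the values for each key, like a word-count reducer.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

splits = ["big data big", "data big"]  # file blocks on different nodes
mapped = [pair for s in splits for pair in map_phase(s)]
result = reduce_phase(mapped)
print(result)  # {'big': 3, 'data': 2}
```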
3 Literature Survey
Hadoop was originally developed with little security; security implementation for Hadoop started in 2009. Hadoop distributors like Cloudera, Hortonworks, and MapR offer proprietary security features, but these are not present in the Apache releases of Hadoop. According to Cloudera [10], Hadoop security is based on four levels:
• Authentication: defines which users can be authenticated.
• Authorization: defines which data, and how much of it, each authenticated user can access.
• Audit: monitors when the data is accessed, where it is accessed from, and how it is accessed.
• Encryption: defines how the data can be protected at rest or in motion.
Kerberos-based authentication in Hadoop has several limitations:
• If the user presents a TGT for every MapReduce job, the Key Distribution Center (KDC) quickly becomes a bottleneck, traffic increases, and there is a chance of a Distributed Denial of Service attack.
• Kerberos tickets are not renewed frequently, so hackers may steal the tickets and use the system.
• Deployed code must be compliant with Kerberos, so separate planning and testing are needed to add authentication to the code.
• If the KDC fails, HDFS and MapReduce will not work.
• The KDC has no strategy for identifying authentication breaches.
also supports authorization at the service level. MapReduce and YARN do not control access to data; they only control access to cluster resources such as memory, disk, CPU, and network I/O.
C. Auditing means keeping track of what users and services are doing in the cluster [18]. In HDFS, audit.log is used for auditing user activities such as creating a file or changing file permissions. To audit at the service level, SecurityAuth-hdfs.audit is used. The log4j settings used for auditing in HDFS are log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit and log4j.category.SecurityLogger. Auditing in MapReduce focuses on end-user queries and jobs, and mapred-audit.log is used for this. To audit authorization at the service level, SecurityAuth-mapred.audit is used. The log4j settings used for auditing in MapReduce are log4j.logger.org.apache.hadoop.mapred.AuditLogger and log4j.category.SecurityLogger.
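In a typical Hadoop log4j.properties file, the audit settings named above appear roughly as follows. This is a sketch based on common Hadoop defaults; the appender names (RFAAUDIT, RFAS, MRAUDIT) and levels may differ per distribution:

```properties
# HDFS audit log: user activities recorded by FSNamesystem.audit
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=INFO,RFAAUDIT
log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=false

# Service-level security events (written to SecurityAuth-*.audit)
log4j.category.SecurityLogger=INFO,RFAS

# MapReduce audit log for end-user queries and jobs (mapred-audit.log)
log4j.logger.org.apache.hadoop.mapred.AuditLogger=INFO,MRAUDIT
```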
D. Encryption: encryption can be done as Data-in-Transit encryption and Data-at-Rest encryption.
1. HDFS Data-at-Rest Encryption
Data-at-Rest encryption encrypts data at the application layer before it is sent in transit and reaches storage. This type of encryption runs above the operating system layer and requires only Hadoop system packages or hardware. Within HDFS, the directory paths to be encrypted are broken down into encryption zones. A unique data encryption key (DEK) [19] is used to encrypt each file in an encryption zone. A zone-level key, known as the encryption zone key (EZK), is used to encrypt the DEK into an encrypted DEK (EDEK), so that plain-text DEKs do not persist. The EZKs should not be stored in HDFS, because if they were, decryption would become easy; instead, EZKs must be accessed through a secure key server. In big enterprises, the actual key storage is handled by a dedicated hardware security module (HSM). The Hadoop Key Management Server (KMS) sits between HDFS clients and the key server. The KMS handles both EZKs and DEKs, communicating with the key server and decrypting EDEKs. The KMS communicates with the key server through a Java API called the KeyProvider.
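The DEK/EZK/EDEK flow can be sketched with a toy cipher. This is purely illustrative: real HDFS uses AES through the KMS, and the XOR-with-SHA-256-keystream cipher below is not secure, it only demonstrates the envelope-encryption idea.

```python
# Toy sketch of HDFS envelope encryption: a per-file DEK encrypts the
# data, the zone key (EZK) wraps the DEK into an EDEK, and only the
# EDEK is stored alongside the file. XOR with a SHA-256 keystream is
# used purely for illustration; real HDFS uses AES via the KMS.
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """XOR `data` with a keystream derived from `key` (symmetric)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

ezk = secrets.token_bytes(32)   # zone key, held by the secure key server
dek = secrets.token_bytes(32)   # per-file data encryption key

ciphertext = keystream_xor(dek, b"patient record")  # file encrypted with DEK
edek = keystream_xor(ezk, dek)                      # DEK wrapped by EZK

# To read: unwrap the EDEK with the EZK, then decrypt the file.
recovered_dek = keystream_xor(ezk, edek)
plaintext = keystream_xor(recovered_dek, ciphertext)
```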
2. Data-in-Transit Encryption
Transport Layer Security: SSL/TLS are the protocols used for securing data that moves through the network; they can secure any socket connection. SSL/TLS rely on a certificate authority (CA) for providing trust. Hadoop Data-in-Transit Encryption: Hadoop uses RPC, TCP/IP, and HTTP [20, 21] to communicate over the network. RPC calls are used by API clients of MapReduce, the JobTracker, the TaskTrackers, the NameNode, and the DataNodes. TCP/IP sockets are used by HDFS for data transfer, and MapReduce shuffles use the HTTP protocol.
(a) Hadoop RPC Encryption: Hadoop’s RPC implementation supports SASL, which provides integrity, confidentiality, and authentication using different mechanisms.
Data Leakage Prevention (DLP) technology [22] is used to secure data against leakage. This technology was introduced in the year 2000. The disadvantage of DLP is that once data has been removed, it is unable to protect it.
Verizon released a white paper [23] on cloud security. The model is divided into four security layers:
• Base: takes care of physical security.
• Logical: checks the integrity, availability, and confidentiality of data and resources in the network. It has network, compute, management, and storage sublayers.
• Value-Added: provides private IP network, firewall, and VPN capabilities.
• Governance, Risk and Compliance: ensures all security measures in the above three layers.
Based on Verizon’s model, a security architecture was introduced by the Twilio company. Twilio is a cloud company that implements Hadoop reliably using Amazon S3 services, making use of S3 policies and ACLs.
According to [24], the computation of MapReduce is distributed in nature, and there is a chance of a variety of attacks, such as:
• Impersonation attack: an illegitimate user acts like a legitimate user, e.g., via a brute-force attack, and runs MapReduce jobs, resulting in data leakage.
• Denial of Service attack: an attacker stops the functioning and accessing of a mapper or reducer using various undesirable tasks.
• Replay attack: an attacker resends previous tasks to the DataNodes and keeps them continuously busy.
• Eavesdropping attack: an attacker observes input data and the intermediate and final outputs without taking part in the MapReduce computations.
• Man-in-the-middle attack [25]: an attacker modifies or corrupts computing code exchanged between two legitimate users.
Proper authorization, authentication, restricted access, confidentiality, and input control for the mapper and reducer classes are required for secure MapReduce computation.
According to the author in [11], a Bull Eye Algorithm is proposed for Hadoop. This algorithm allows data to be read or written only by authorized persons, and when implemented it checks that the data is encrypted for better protection. Only highly confidential data stored in the DataNode is checked.
One more approach given in [11] is the NameNode approach: to increase the security of the available data, two NameNodes are used, one master and one slave. Name Node Security Enhance (NNSE) provides the two redundant NameNodes, and these NameNodes use the Bull Eye Algorithm.
Apache Knox [26, 27] is a framework for supporting security on Hadoop clusters. It is a REST (Representational State Transfer) API gateway; the REST API interacts with clusters through a single access point. Authentication using LDAP [28] and Active Directory is managed by system administrators. Through Knox, they conduct HTTP header-based federated identity management and audit hardware on clusters.
Apache Ranger [17] is a centralized framework used to manage policies at the resource level; it has developed various tools and techniques to standardize security across Hadoop clusters. It also provides authorization in Hadoop.
Apache Rhino provides a security solution for the Hadoop ecosystem. It is a framework based on a crypto codec and offers block-level encryption of data in Hadoop. It also provides token-based authentication and an SSO solution. Various key distribution and management functions for the MapReduce jobs that encrypt data blocks are also supported, along with an audit logging framework for auditing [29].
References
1. Tang, Y., Yang, J.: Secure Deduplications of General Computations. Columbia University
2. Geczy, P.: Big data characteristics. Macrotheme Rev. 3(6), 94–104 (2014)
3. Khan, N., Naim, A., Hussain, M.R., Ahmad, N., Qamar, S.: The 51 V’s of big data: survey, technologies, characteristics, opportunities, issues and challenges. In: Proceedings of the ACM Omni-Layer Intelligent Systems Conference (COINS’19), ACM, Heraklion, Crete, May (2019)
4. Martis, M., Pai, N.V., Pragathi, R.S., Rakshatha, S., Dixit, S.: Comprehensive survey on Hadoop security. Springer Nature Singapore Pte Ltd. (2019)
5. Horwitz, J., Nugent, A., Halper, F., Kaufman, M.: Big Data for Dummies. Wiley (2013)
6. Akshata, Chandrashekhar, B.S.: An execution for security scheme in Hadoop. MAT J. Comput. Sci. Eng. Softw. Testing 4(2)
7. Das, D., O’Malley, O., Radia, S., Zhang, K.: Adding Security to Apache Hadoop. Hortonworks (2011)
8. Erraissi, A., Belangour, A., Tragha, A.: A big data Hadoop building blocks comparative study. Int. J. Comput. Trends Technol. (IJCTT) 48(1), 336 (2017). ISSN: 2231-2803, http://www.ijcttjournal.org
9. Securosis: Securing Hadoop: Security Recommendations for Hadoop Environments. Securosis White Paper, Mar 2016. Knox Gateway (2014-06-13). Available: http://knox.apache.org/
10. Bhatal, G.S., Singh, A.: Big data: Hadoop framework vulnerabilities, security issues and attacks. Elsevier
11. Saraladevi, B., Pazhaniraja, N., Paul, P.V., Saleem Basha, M.S., Dhavachelvan, P.: Big data and Hadoop—a study in security perspective. In: 2nd International Symposium on Big Data and Cloud Computing (ISBCC’2015). Elsevier
12. Kohl, J., Neuman, C.: The Kerberos network authentication service (V5) (2017)
13. O’Malley, O., Zhang, K., Radia, S., Marti, R., Harrell, C.: Hadoop security design. Yahoo, Inc., Tech. Rep. (2009)
14. Dou, Z., Khalil, I., Khreishah, A., Al-Fuqaha, A.: Robust insider attacks countermeasure for Hadoop: design and implementation. IEEE Syst. J. (2017)
15. Shetty, M.M., Manjaiah, D.H., Hemdan, E.E.D.: Policy-based access control scheme for securing Hadoop ecosystem. Springer Nature Singapore Pte Ltd. (2019)
16. Narayana, S.: Securing Hadoop: Implement Robust End-to-End Security for Your Hadoop Ecosystem. Packt Publishing
17. Gupta, M., Patwa, F., Sandhu, R.: An attribute-based access control model for secure big data processing in Hadoop ecosystem. In: ABAC’18, Mar 21 (2018), Tempe, AZ, USA
18. Spivey, B., Echeverria, J.: Hadoop Security: Protecting Your Big Data Platform. O’Reilly Publishers (2015)
19. Cloudera Security Report, Cloudera Enterprise version 5.5x (2016)
20. Perwej, Y.: The Hadoop security in big data: a technological viewpoint and analysis. Int. J. Sci. Res. Comput. Sci. Eng. 7(3), 1–14 (2019)
21. Parmar, R.R., Roy, S., Bhattacharyya, D., Bandopadhyay, S.K., Kim, T.-H.: Large-scale encryption in the Hadoop environment: challenges and solutions. IEEE (2017)
22. Security and Privacy in the Era of Big Data. The SMW, a Technological Solution to the Challenge of Data Leakage. Arenci/National Consortium for Data Science White Paper
23. Sharif, A., Cooney, S., Gong, S.: Current security threats and prevention measures relating to cloud services, Hadoop concurrent processing, and big data. In: IEEE International Conference on Big Data, Washington, DC, USA (2015)
24. Derbeko, P., et al.: Security and privacy aspects in MapReduce on clouds: a survey. Comput. Sci. Rev. 1–28 (2016)
25. Butt, K.K., Li, G., Rehman, M.O.U.: Comparative analysis of Hadoop security add-ons. In: IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC) (2019)
26. Sharma, P.P., Navdeti, C.P.: Securing big data Hadoop: a review of security issues, threats and solution. Int. J. Comput. Sci. Inf. Technol. (IJCSIT) 5(2), 2126–2131 (2014)
27. Vavilapalli, V.K., Murthy, A.C., Douglas, C., Agarwal, S., Konar, M., Evans, R., Graves, T., Lowe, J., Shah, H., Seth, S., Saha, B., Curino, C., O’Malley, O., Radia, S., Reed, B., Baldeschwieler, E.: Apache Hadoop YARN: yet another resource negotiator. SoCC’13, Santa Clara, California, USA, Oct (2013)
28. Priyadharshini, M., Baskaran, R., Srinivasan, M.K., Rodriques, P.: A framework for securing web services by formulating a collaborative security standard among prevailing WS-* security standards. Springer CCIS, vol. 193, pp. 269–283, Springer, Heidelberg, Sep 2012. https://doi.org/10.1007/978-3-642-22726-4_29
29. Kim, S.-H., Lee, I.-Y.: Data block management scheme based on secret sharing for HDFS. In: 10th International Conference on Broadband and Wireless Computing, Communication and Applications (2015)
Optical Character Recognition
and Neural Machine Translation Using
Deep Learning Techniques
Abstract Over the years, the applications of text detection and text translation have expanded across various fields. Many researchers have applied deep learning algorithms to text detection and text translation separately. We propose a hybrid methodology that combines NMT with OCR to improve text detection and translation from an image. In this paper, we present techniques to detect and recognize Hindi text in a given image and translate it into English, and vice versa. To achieve this, we combine two concepts: optical character recognition (OCR) and neural machine translation (NMT). The output of this hybrid scheme yields optimized features.
1 Introduction
K. C. Shekar (B)
JNTUH, Hyderabad, Telangana, India
e-mail: chandhra2k7@gmail.com
M. A. Cross
GNITC, Hyderabad, Telangana, India
e-mail: mariaanishacross@gmail.com
V. Vasudevan
NIT, Trichy, Tamil Nadu, India
e-mail: vigyvasu4937@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 277
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_30
278 K. C. Shekar et al.
2 Related Work
Vijaya Kumar Reddy et al. [5] proposed an alternate neural network approach for the recognition of handwritten Hindi characters. G. Vamvakas et al. [6] proposed a complete OCR methodology for identifying text in historical documents, applicable to either handwritten or printed documents. Shashi Pal Singh et al. [7] found that RNN and RAE provide better outcomes in text processing than other neural networks. Sarkhel, R. et al. [8] proposed a multi-scale deep quad-tree based feature extraction technique for the recognition of offline handwritten characters of popular Indic scripts. Shahnawaz et al. [9] proposed a neural network-based methodology for machine translation from English to Hindi.
3 Proposed Methodology
Different models and functions have been used in the research reviewed so far for text detection and text translation. In this study, we propose a model capable of performing both tasks, using OCR and NMT in the following way:
Step 1: Image preprocessing
• Removal of noise present in the image.
• Removal of the ambient background.
• Handling of the various lighting conditions.
Step 2: Using an LSTM cell as a component of a CRNN to divide the image into columns, identify relationships between characters, and then generate the text.
• An established convolutional neural network (CNN): this first layer breaks the image into features and divides it into feature columns.
• These columns are fed into a deep bidirectional long short-term memory (LSTM) cell, which provides a pattern to identify relationships between the characters.
• The output of the LSTM cell is then given to a transcription layer, which takes the character sequence, including expendable characters, and adopts a probabilistic approach to clean the output.
• As shown in Fig. 1, two RNNs are placed back to back: the first RNN generates an encoding that represents the recognized sentence, and the second RNN takes that encoding and applies the same logic inversely to decode the original sentence. By training the second RNN on parallel corpora, it can be made to decode the sentence into Hindi or any other language.
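The transcription layer's cleanup of expendable characters amounts to CTC-style greedy decoding: merge adjacent repeated labels, then drop the blank symbol. A minimal sketch (the `-` blank symbol and the per-column labels below are illustrative assumptions, not the paper's actual alphabet):

```python
def ctc_greedy_decode(frame_labels, blank="-"):
    """Collapse a per-column label sequence into text: merge adjacent
    repeats, then remove the blank ('expendable') symbol."""
    decoded = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return "".join(decoded)

# Per-column argmax labels emitted by the LSTM, one per feature column:
print(ctc_greedy_decode(list("hh-e-ll-ll-oo")))  # -> hello
```

The blank between the two `ll` runs is what lets a double letter survive the repeat-merging step.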
For our experiments, we used the Devanagri character dataset and the Street View Text dataset to train our model to locate and recognize text in Hindi and English.
Street View Text Dataset Dealing with images that involve ambient noise, lighting issues, and image artifacts is a highly demanding and arduous OCR task; legacy OCR algorithms normally cannot process the images in this dataset. A sample image from the dataset is shown in Fig. 2. The dataset has only word-level annotations (no character bounding boxes) and can be used only for the
• recognition of cropped lexicon-driven words,
• detection and recognition of lexicon-driven words in the full image.
Devanagri Character Dataset This dataset contains 1800 samples of 36 character classes obtained from 25 different writers in the Devanagri script. Each character is stored in a distinct file of comma-separated text values, estimated at around 4 KB per character. The organized dataset mirrors the 36 classes as folders, with 50 samples inside each class folder. A pattern of coordinates (pen-tip positions) from a pen-down to the following pen-up movement is considered one stroke, as shown in Fig. 3; the digitizer captures the pattern of strokes made in a pen movement.
IIT Bombay English Hindi Parallel Corpus We used the IIT Bombay English-Hindi Parallel Corpus [10] to train our model to translate the detected and recognized text from Hindi to English and vice versa. The approach proposed in this study generates enhanced, optimized features by using an LSTM-based CRNN in an encoder-decoder model. The statistics on the number of sentences, tokens, and types for the different types of data are given in Table 1.
The model was then able to detect the text and translate it, as shown in Fig. 4.
5 Conclusion
References
Abstract The world is suffering from the COVID pandemic, and out of empathy we want to serve the planet in whatever way we can. We came up with the idea of using technology to maintain physical distancing and to track the sanitizing of each person at public places, including ATMs and supermarkets. As monitoring each person manually is very difficult, we have used detection models to solve the problem. Our model can be deployed easily and integrated into a network of existing CC cameras so that society gets the best use of it. Our model runs at close to 60 FPS on average, with a maximum of 92 FPS, as a trade-off between speed and accuracy, and it detects a hand touching a door, chair, or any nearby object, recording those details, with an accuracy score of 96%.
1 Introduction
The COVID-19 pandemic has spread like wildfire all over the world due to unwitting physical contact between people. Keeping in mind that public places are visited daily, we came up with a model that tracks the contacts made by hands with different objects in the vicinity of the camera and alerts whenever a hand is brought close to the face without washing/sanitizing. In our work, we considered ATMs and shopping markets to be the places that need this model for regulating the spread of COVID-19. Hand movements and the sanitizing of hands can be detected. Every detail of touching, timestamps, and skeleton information is stored in a database, and
C. V. Rohit (B)
Department of School of Computing, Sastra University, Thanjavur, Tamil Nadu 613401, India
e-mail: rohitchatla@gmail.com
G. Murthy
Department of Computer Science and Engineering, Avanthi Institute of Engineering and
Technology, Vizianagaram, Andhra Pradesh 531162, India
e-mail: murthy.grs@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 285
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_31
286 C. V. Rohit and G. Murthy
using cloud functions, we trigger notifications to citizens whenever they touch their face without washing/sanitizing their hands. Our model also senses whether a person is wearing a mask and gloves before giving access to public places. Continuous camera monitoring can be achieved by installing the model at places like ATMs and shopping malls, where doors are touched frequently, and supermarkets, where people come into contact with different objects (fruits, vegetables, etc.). By integrating our model with existing CC cameras, officials need only sanitize the particular touched area instead of the whole market, avoiding the cost and time of excessive cleansing. This will also help officials track the travel and engagement history of COVID-19 positive patients.
2 Related Works
There are many projects that adopt YOLOv3 for different purposes, but as far as tracking/detecting for COVID cases and alerting clients through the detection of human movements and actions is concerned, none is available. Recently, India launched its main contact-tracing technology, the Aarogya Setu app [1], which works on the principle of tracking nearby clients and alerting a client if a COVID-positive patient is reported in the vicinity. However, this is fully fruitful only when all clients keep their Bluetooth and GPS turned on continuously, allowing uninterrupted tracking. Our model, by contrast, can be integrated into shopping malls and ATMs seamlessly, as a simple software plug-in to existing CC cameras, without the installation of any sensors. Aarogya Setu also has privacy issues, for example that anyone can alter snapshots of the internal database, which we hope will be solved.
3 Methodology
If we are able to monitor people who make contact with different objects around them, then we can prevent these diseases from spreading. Our project used YOLOv3 for object detection, to recognize the touches made by people near ATMs, a place we visit frequently (Fig. 1).
The YOLO model is fast and accurate enough to recognize different objects and uses a regression mechanism. The model recognizes the images at runtime in each frame and updates the table when required. It uses a convolutional neural network (CNN) for real-time object detection [2]. Each frame is divided into regions, and bounding boxes and probabilities are predicted for each region.
COVID-19 Touch Project Using Deep Learning and Computer Vision 287
We preprocessed images of the human hand, doors, and common objects in context (COCO). We used the Open Images Dataset v6 [4] and the OIDv4_Toolkit to download all the images, and a custom script to convert the Protobuf-formatted labels to the YOLOv3 coordinate format [5, 6]. We used the darknet.conv.53 pre-trained weights and classes to train YOLOv3. We trained the model on some custom objects (chair, door, human hand) and used transfer learning for the common objects in context (COCO objects). As we use this setup with a custom model, we needed to write Python scripts that act as a wrapper around the original darknet repository; those scripts are used to detect and track (deep sort) human hands touching the objects. To make the dataset serve our needs, we cropped out the unnecessary portions of the background and focused on the required foreground objects.
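The label conversion step can be sketched as follows. This is a hypothetical reconstruction of such a script, assuming the downloaded labels give absolute corner coordinates (xmin, ymin, xmax, ymax) in pixels; the paper's actual custom script is not shown:

```python
def to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a corner-format box (pixels) to the YOLO coordinate format:
    (x_center, y_center, width, height), all normalized to [0, 1].
    A full label line would prepend the class index."""
    xc = (xmin + xmax) / 2.0 / img_w
    yc = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return xc, yc, w, h

# A 200x100-pixel box with top-left corner (100, 200) in a 640x480 image:
print(to_yolo(100, 200, 300, 300, 640, 480))
```

Normalizing by the image size is what lets YOLOv3 train on images of mixed resolutions.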
3.3 Darknet
It is an open-source neural network framework written in C and CUDA [7]. It can be installed along with two dependencies: 1. OpenCV and 2. CUDA. The computation can be shifted from the CPU to the GPU using darknet, increasing throughput. OpenCV together with darknet gives the model more freedom in detecting different images and videos. This flexibility makes darknet a favorite framework.
3.4 OpenCV
In our project, we used darknet/YOLOv3 for custom training, and because darknet is written in C, we needed to build OpenCV from scratch as a binding layer. After the custom training, we used a custom Python wrapper function for tracking (deep sort) and detecting objects. We used OpenCV for image/video manipulations, such as opening the stream; the contours (rectangular boxes) are also constructed using OpenCV [8]. OpenCV is a computer vision library with 2500+ optimized algorithms for computer vision and machine learning [9]. These algorithms can track camera coordinates or movements, extract 3D and 2D models from images, and manipulate camera, video, and image streams [10]. We used a box-overlap method to detect the touch and hand-coming-to-face features of our project: for example, if the red box represents a door (Fig. 2) and a white box represents hands, then whenever the white box in the output touches, overlaps, or comes inside the red box, the hand is touching the door [6].
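The box-overlap test described above can be sketched as a minimal axis-aligned rectangle check; the coordinates below are illustrative values, not real detector output:

```python
def boxes_overlap(a, b):
    """Axis-aligned overlap test; boxes are (x1, y1, x2, y2) contours
    as drawn with OpenCV. True when the hand box touches, overlaps,
    or lies inside the object box."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2

door = (50, 0, 150, 200)    # red box (door)
hand = (140, 90, 180, 130)  # white box (hand)
print(boxes_overlap(hand, door))  # -> True: the boxes intersect
```

The same predicate covers all three cases in the text (touching edges, partial overlap, full containment), since containment and edge contact are both intersections.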
3.5 Posenet
We used Posenet for detecting the hand wash/sanitization action in our project. Our function returns pose information (left/right wrist points, etc.), skeleton information (elbow and arm lines, etc.), key points, confidence, and an accuracy score for all properties. PoseNet is a robust, real-time monocular six-degree-of-freedom relocalization system, trained as a convolutional neural network to regress the 6-DOF camera pose from a single RGB image in an end-to-end manner, with no need for additional engineering or graph optimization [11]. Posenet's poses and skeletons helped us build a model
Fig. 3 Detecting
wash/sanitize action
3.6 Tensorflow.js
We used Tensorflow.js for hand and face detection, which in turn is used for the hand-brought-close-to-face action (Fig. 4).
We used deep sort for frame-by-frame tracking, giving distinct IDs to similar objects, which enables every object to be tracked throughout the duration of a video feed once it has been detected by the object detector in the first frame; if detection of its presence is lost, tracking is stopped for a static (fixed camera position) video feed. To reach our goal, we tested a few models, gathered meta-information on which outperforms the others, and collected the results. We mainly used YOLOv3 and the deep sort algorithm [2], with the customization of Posenet and OpenCV mentioned above, for our use case, as shown in the section below.
What this model basically does is: whenever you touch any item in your house, say a chair, with your hand, it tracks this through the CC camera context (24 × 7) [6], and that info (metadata) is sent to the database. In the database, the metadata record has three important attributes: (1) Touch, (2) Wash, and (3) Detect. All are preset to 0; they are binary-valued classes (logic high represented by 1).
How cloud functions and pub/sub are used:
Whenever a touch of a human hand on a chair/door is detected (Fig. 5), a cloud function is activated to change the Touch variable to 1 and store the metadata (object coordinates, timestamp, contours, etc.) in the database. Whenever you wash your hands (Fig. 6), a cloud function uses the pose and skeleton information from Posenet/ml5.js to change the Wash value to 1 and store the Wash metadata in the database [12], changing Touch to 0; otherwise Touch remains 1 and Wash remains 0 (Fig. 7). Whenever a hand coming to the face is detected (Fig. 8), a cloud function changes the Detect value to 1, and when it is no longer detected the value changes back to 0 in real time. The alarm is later buzzed, or mobile notifications are shot, only when both the Touch and Detect attributes are 1, or of course at any stage, depending on the attribute states (high/low) [6].
Fig. 6 Posenet's coordinates, poses, and skeletons info; (2) Wash == 1: wash details
not cause an alarm to buzz; otherwise, as in an in-home scenario while working in front of a PC/laptop, whenever a person brings his hands close to his face, the notifications and alarm will buzz. The same goes for supermarkets: we can use the existing CC cameras to monitor the touches of people and send that information on the fly to the inventory department, which can later sanitize those areas during non-working hours. Using session schemes, the touches of individuals can be sent to the particular customers concerned.
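The Touch/Wash/Detect bookkeeping described above can be sketched as a small state machine. This is a hypothetical minimal sketch using the attribute names from the text; the cloud-function triggers and database writes are omitted:

```python
class TouchMonitor:
    """Tracks the three binary attributes; an alert fires only when an
    unwashed touch is followed by hand-to-face detection."""
    def __init__(self):
        self.touch = 0
        self.wash = 0
        self.detect = 0

    def on_touch(self):           # hand box overlaps a chair/door box
        self.touch, self.wash = 1, 0

    def on_wash(self):            # Posenet wash/sanitize pose observed
        self.wash, self.touch = 1, 0

    def on_face(self, detected):  # hand close to face (Tensorflow.js)
        self.detect = 1 if detected else 0

    def should_alert(self):
        return self.touch == 1 and self.detect == 1

m = TouchMonitor()
m.on_touch()
m.on_face(True)
print(m.should_alert())  # -> True: touched, not washed, hand near face
m.on_wash()
print(m.should_alert())  # -> False: washing reset the Touch attribute
```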
Fig. 12 Graphical Representation of: (i) Accuracy score (dynamic stream), (ii) GPU (training) time,
(iii) memory used, and (iv) speed (FPS) of mask_R-CNN, SSD, CNN, and YOLOv3 algorithms,
respectively
5 Results
All these graphs (Fig. 12) are plotted by averaging over 500 images / a 2-min video used as validation/testing data, with an 80/20 split into training and testing sets. Observing average accuracy on the dynamic stream, YOLOv3 is ahead; R-CNN is ahead in average accuracy on the static stream; and mask_R-CNN detects noise. The FPS graph shows that YOLOv3 is powerful for dynamic motion. As YOLOv3 is a sparse algorithm (macro items), it fits our need for dynamic motion, so we opted for YOLOv3 for its faster predictions (once we detect, the result suffices for the next 10 s). The average FPS is well below the maximum FPS shown in
Fig. 13 Testing images of (i) mask_R-CNN, (ii) SSD, (iii) R-CNN, and (iv) YOLOv3
the graph, but YOLOv3's average of 40 is still a good score. Although GPU time and memory usage are higher for YOLOv3 during training, training is usually a one-time process, so this drawback is not a real problem in our case. As we are using macro-particles (Fig. 13), the resolution is decreased using a cloud-function trigger (on the cloud) before frames are sent to the algorithm, so the FPS increases: our average is now bumped to close to 60, with a maximum of 92 FPS. With this trade-off between speed and accuracy, we can achieve our project goal of detecting a hand touching a door, chair, or any nearby object, with those details, while accuracy remains good at close to 96%; the graphs are evidence of all this. Reducing the resolution by an order of magnitude also lowers the GPU computation (cost-effective on GPU/CPU) [6].
We used the following configuration for training:
GPU: 1×Tesla K80, compute capability 3.7, 2496 CUDA cores, 12 GB GDDR5 VRAM.
References
Abstract A drone and rover integrated setup is used for the rapid recovery of people affected by a natural disaster. The rover has an onboard camera, and the visuals are relayed to a remote operator for control. The rover moves around on its four wheels, and the operator manually moves both the rover and the drone using the system cameras. The rover also has an inbuilt GPS for reporting the exact location and a PIR sensor for helping in the accurate detection of victims. Using a face recognition algorithm, the operator identifies the victim; the recognition is done against a database containing a list of people in the disaster-prone area. The medical staff can also retrieve a patient's medical information using the facial recognition algorithm for rapid medical support and recovery. A fleet of these systems can be deployed so that search and rescue can be done efficiently and lives can be saved.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 297
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_32
298 J. M. John et al.
1 Introduction
2 Drone Design
The drone provides the lift to reach the affected area quickly [2]. The drone is integrated with the rover. The parts of the drone include the following [3].
Frame: The frame supports the essential control electronics and motors of the quadcopter. The design should be light, should be compatible with the rover design, and should hold and carry the rover with ease. It also supports the four motors that provide lift to the design. The design should be collision free, with adequate separation between the motor blades. The Hobbyking SK450 frame is used; it should provide enough holes for screwing the other parts to the frame.
Brushless Motors: The required lift and thrust for the drone are provided by four brushless motors fixed on the four arms of the quad frame. Brushless DC motors are used instead of brushed DC motors, as they provide a greater thrust-to-weight ratio. The motors are controlled with the help of the ESCs and other sensors so that the correct rpm is maintained and the motion of the motors can be controlled for the 6 DOF. The Kv rating indicates the motor speed per volt applied; this motor is rated at 1000 Kv. A rating of 12 A for 60 s indicates the maximum current that can be drawn. The weight of the motor is approximately 275 g.
Propellers: A carbon fiber propeller of length 12 in, pitch 4.5 in, weight 36.5 g, and shaft diameter 6 mm is used. Four propellers are required for the four quadcopter motors to provide thrust. Each propeller needs to be light and balanced to reduce vibrations, and should not overheat its motor.
ESC: Each motor needs its own electronic speed controller (ESC). The ESC accepts commands in the form of a pulse-width-modulated control signal and outputs the corresponding motor speed. The current rating of the ESC is the maximum current it can deliver without causing the motor to overheat. The ESCs power the motors so that the required rpm is maintained to keep the drone stable. A 25 A four-in-one controller can be used to deliver power to the motors.
Transmitter and Receiver: The radio transmitter is what the pilot or remote operator controls. The transmitter sends out signals, mostly in the 2.4 GHz range, to the receiver, which processes these signals and sends them to the flight control unit. An increased power rating can extend the range, since the operator in this case works with the help of cameras rather than line-of-sight communication. A drone army or swarm of drones can act as repeaters to extend the range. The transmitter can be used to control all six DOF.
LiPo Battery: Lithium polymer batteries are used to provide high torque to the motors of the drone. Since LiPo batteries have a very high discharge capacity, they are ideal for the drone, unlike Li-ion batteries, which have a lower discharge capacity. They also have a high energy density relative to the weight of the battery itself. A LiPo cell typically has an output voltage of 3.7 V.
Attitude Sensor: The orientation and attitude are controlled via the attitude sensors, which essentially comprise a gyroscope and an accelerometer. A six-axis inertial measurement unit (IMU) is used, combining an accelerometer and a gyroscope in the same unit. Without the attitude sensors, the drone would simply tip over and could not fly. The drone must be able to control all six degrees of freedom with as few motors as possible. The six degrees of freedom comprise movement along, and rotation about, each of the x, y, and z axes, as shown in Fig. 1. All aspects of flight can be manipulated by applying different thrust values to the individual motors. When all the motors run at the same rpm, the drone lifts upwards. To move in a particular direction, for example to the left, the rpm of the two motors on the left can be reduced while the rpm of the motors on the other side is increased. The six DOF are translation in X, Y, and Z, and the rotational DOF of roll, pitch, and yaw.
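The per-motor thrust allocation described above can be sketched as a simple mixer. This is an illustrative sketch only; the sign conventions and motor layout below (an X configuration) are assumptions, not the authors' flight controller:

```python
def quad_mix(throttle, roll, pitch, yaw):
    """Map the four control inputs to per-motor commands for an X-quad.
    Motor order: front-left, front-right, rear-left, rear-right.
    Assumed signs: positive roll rolls right, positive pitch noses up."""
    return (
        throttle + roll + pitch - yaw,  # front-left  (CW prop)
        throttle - roll + pitch + yaw,  # front-right (CCW prop)
        throttle + roll - pitch + yaw,  # rear-left   (CCW prop)
        throttle - roll - pitch - yaw,  # rear-right  (CW prop)
    )

hover = quad_mix(0.5, 0.0, 0.0, 0.0)   # equal rpm on all motors: lift upwards
left = quad_mix(0.5, -0.1, 0.0, 0.0)   # lower left-side rpm, raise right: roll left
print(hover, left)
```

With all inputs equal the drone climbs; a negative roll input lowers the left motors and raises the right ones, which is exactly the "move to the left" manoeuvre described in the text.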
3 Rover Fabrication
The design of the rover was originally planned to be done using 3D printing, but the overall design proved to be fragile when 3D printed, so laser cutting was used instead. The design was made using SolidWorks. As can be seen from the side view in Fig. 2a, a belt drive was also designed, but during the fabrication stage it proved difficult to laser cut. Figure 2b shows the 3D view of the rover. In order to laser cut the model, each frame was resolved from 3D to 2D, that is, an outline was made in 2D and printed using the laser cutter. An acrylic sheet of 5 mm thickness is used for the structure and 8 mm for the wheels. However, the wheels were found to be insufficiently strong, so the thickness of both front wheels was increased to 10 mm. To make the design lighter, materials other than acrylic, such as carbon fiber or plastic, can be used.
forward motion, M1A can be set high and M1B low; the direction is reversed when M1A is low and M1B is high. An HC-05 Bluetooth module is used, with the TX of the Bluetooth module connected to the RX of the Arduino and the RX to the TX.
The GPS module is used to get the position of the victim on Google Maps [4]. The GPS fix uses three satellites, which send the longitude, latitude, and time to the GPS receiver. The GPS module is connected to the Raspberry Pi [5]. It has four pins: VCC, GND, TX, and RX. The TX of the GPS module is connected to the RX of the Raspberry Pi and the RX to the TX, and the VCC and GND pins are connected to the respective pins on the Raspberry Pi. The data is received in the form of NMEA sentences and is converted to coordinates on the Raspberry Pi. The converted coordinates are sent to an online server; in this case, we use ThingSpeak since it is free. The latitude [6], as in Fig. 4a, and the longitude, as in Fig. 4b, are uploaded to the server using its upload keys. The server plots graphs of the received data, as shown in Fig. 7a, b. With the help of Google API keys under the developer options, we can plot the coordinates of the victim's location on Google Maps as shown in Fig. 5.
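GPS modules typically emit NMEA sentences, so the conversion to decimal coordinates performed on the Raspberry Pi can be sketched as follows; the `$GPGGA` sentence below is a textbook example, not data from the rover:

```python
def nmea_to_decimal(value, hemisphere):
    """Convert an NMEA ddmm.mmmm (or dddmm.mmmm for longitude) field
    to signed decimal degrees."""
    dot = value.index(".")
    degrees = float(value[:dot - 2])   # everything before the minutes
    minutes = float(value[dot - 2:])   # mm.mmmm
    decimal = degrees + minutes / 60.0
    return -decimal if hemisphere in ("S", "W") else decimal

# Fields 2-5 of a $GPGGA sentence hold latitude, N/S, longitude, E/W:
sentence = "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,,,,"
fields = sentence.split(",")
lat = nmea_to_decimal(fields[2], fields[3])
lon = nmea_to_decimal(fields[4], fields[5])
print(round(lat, 6), round(lon, 6))
```

The decimal pair produced here is the form that can be pushed to ThingSpeak and plotted on Google Maps.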
Flood Relief Rover for Air and Land Deployment (FRRALD) 303
The passive infrared (PIR) sensor detects people at a range. The human body emits radiation at wavelengths of 0.7–300 µm. The PIR has two slots, each made of a material sensitive to infrared (IR) radiation. When the sensor is idle, both slots receive the same amount of radiation, for instance from the room or walls. A positive differential change is created when a human body passes in front of the first half of the sensor; when the body leaves, a negative differential change between the two halves is detected. These pulse changes are detected and communicated to the Raspberry Pi.
The PIR module has three pins: VCC, GND, and control. The module detects people up to about 8 m away, enabling the quick detection of humans [2]. The remote operator can then detect people with ease and can also learn the location of the victims with the help of the GPS module. On detection of a human body, the PIR sensor outputs a high signal to the microcontroller. Figure 6 shows the influence of a human or animal body on the PIR sensor. The PIR module is connected to the Raspberry Pi, and its data is uploaded to the online server ThingSpeak in the same way as for the GPS module. If a human is detected, the operator gets a notification. Figure 7 shows the data received on the ThingSpeak server: a value of “2” indicates the presence of a human, while “1” conveys the absence of a human being. However, this system works in continuous integration with the camera module for accurate results.
The Pi cam is connected to the Raspberry Pi. The camera gives about 5-megapixel resolution. Instead of Raspbian OS, we install MotionEye OS, an OS used for surveillance with the Raspberry Pi. The data from the Raspberry Pi is sent to the operator via the Internet [7]; the operator can get the visuals by entering the Raspberry Pi's IP address. This IP address is integrated into MATLAB for visualization. We use a database to store the relevant pictures of all the people, and in the case of an emergency this dataset can be used for face recognition. The system is trained with AlexNet, and the visuals received from the Pi are used for matching. The datasets in Fig. 8a, b are trained using MATLAB and consist of more than 120 pictures each. The training is done with stochastic gradient descent with momentum (SGDM) over 20 epochs, with 90% of the data used for training and the remaining 10% for testing. The testing is done in MATLAB with a test set of persons 1 and 2; as shown in Fig. 9a, b, persons 2 and 1 are detected successfully. The system is modified to retrieve critical information such as the patient's blood group and medical history along with the name, which can be used for a rapid response from the rescue and medical teams.
8 Conclusion
FRRALD can help in the effortless and rapid rescue of victims affected by floods and other disasters, so that they can get immediate medical attention. The system can assist rescue teams in assessing the effects of the catastrophe. The rescue team can rescue trapped people by receiving their locations. The GPS accuracy was good: tests carried out using the GPS and Google Maps gave accurate results, and in further tests the coordinates from the system were compared against known coordinates. With the help of the camera, the operator gets a clear-cut view of the scenario and the effects of the disaster. The PIR and camera can assist in detecting and identifying trapped people, and the face recognition is useful for getting important information about the patient for rapid medical support. For this research, a free ThingSpeak server is used, but for real-time communication, operation, and transmission of the acquired data, a stronger backhaul network and server could be used.
The challenges FRRALD faces include the accurate detection of victims trapped deep within mud. Improving the deep learning and MATLAB algorithms to better visualize the face of a victim remains a major drawback for FRRALD. The cost of constructing, or integrating, a drone that can carry the payload weight of the rover can make the overall cost high. Increasing the accuracy of the tracked location and relaying it to the operator can also be a problem. The system can be fitted with a thermal camera for better overall visualization, and it can be upgraded by deploying a FRRALD army or fleet [8] to reduce the time taken to detect victims and to act as repeaters to increase the range.
References
1. Pedersen, J.: Use of UAVs in the NGO world. In: CRS Conference, ICT4 Development, Nairobi, Kenya, Mar 25–28 (2014)
2. Rivera, A.J.A., Villalobos, A.D.C., Monje, J.C.N., Mariñas, J.A.G., Oppus, C.M.: Post-disaster rescue facility: human detection and geolocation using aerial drones. In: 2016 IEEE Region 10 Conference (TENCON)
3. Alwateer, M., Loke, S.W.: On-drone decision making for service delivery: concept and simulation. In: 2019 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops)
4. Tariq, R., Rahim, M., Aslam, N., Bawany, N., Faseeha, U.: DronAID: a smart human detection drone for rescue. In: 2018 15th International Conference on Smart Cities: Improving Quality of Life Using ICT & IoT (HONET-ICT)
5. Parvu, P., et al.: Autonomous system for image geo-tagging and target recognition. In: Aerospace Conference, in press, May 2014, pp. 1–26
6. Câmara, D.: Cavalry to the rescue: drones fleet to help rescuers operations over disasters scenarios. In: 2014 IEEE Conference on Antenna Measurements & Applications (CAMA)
7. Gaszczak, A., Breckon, T.P., Han, J.: Real-time people and vehicle detection from UAV imagery. In: Proceedings of SPIE: Intelligent Robots and Computer Vision XXVIII: Algorithms and Techniques, San Francisco, California (2011)
8. Besada, J.A., Bernardos, A.M., Bergesio, L., Vaquero, D., Campaña, I., Casar, J.R.: Drones-as-a-service: a management architecture to provide mission planning, resource brokerage and operation support for fleets of drones. In: 2019 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops)
An Enhanced Differential Evolution
Algorithm with Sorted Dual Range
Mutation Operator to Solve Key Frame
Extraction Problem
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 307
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_33
308 M. Aathira and G. Jeyakumar
2 Related Works
The use of the Structural Similarity Index Method (SSIM) with the classical DE algorithm to
solve the key frame extraction problem was first introduced in [21]. An extensive
comparative study of the conventional SSIM, entropy and Euclidean methods, and of their
integration with DE, was presented in [22]. It was reported that the DE-unified
algorithms showed high accuracy. Following the DE_SSIM approach proposed in [21],
this paper integrates the proposed mutation-modified DE algorithm with SSIM to detect
the key frames in a set of traffic surveillance videos. The
proposed mutation method is described in the next section.
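Although the operator itself is detailed in the following section, the trial settings reported later (different F ranges for a promising and a non-promising region of the sorted population) suggest a rough sketch along these lines. Every name below, as well as the half-and-half region split, is an assumption made for illustration and not the authors' exact formulation.

```python
import numpy as np

def sorted_dual_range_mutation(pop, fitness, f_promising=(0.0, 0.5),
                               f_nonpromising=(0.5, 1.0), rng=None):
    """Illustrative sketch of a sorted dual-range mutation (assumed form).

    The population is sorted by fitness (minimization assumed) and split
    into a promising half and a non-promising half; DE/rand/1 mutation is
    then applied with a scale factor F drawn from a different range per half.
    """
    rng = rng if rng is not None else np.random.default_rng()
    order = np.argsort(fitness)            # best (lowest) fitness first
    sorted_pop = pop[order]
    ps = len(sorted_pop)
    mutants = np.empty_like(sorted_pop)
    for i in range(ps):
        # promising region draws F from one range, non-promising from another
        lo, hi = f_promising if i < ps // 2 else f_nonpromising
        F = rng.uniform(lo, hi)
        # three mutually distinct donors, all different from i (DE/rand/1 style)
        r1, r2, r3 = rng.choice([j for j in range(ps) if j != i],
                                size=3, replace=False)
        mutants[i] = sorted_pop[r1] + F * (sorted_pop[r2] - sorted_pop[r3])
    return mutants
```

The crossover and selection steps of classical DE would follow unchanged after this mutation step.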
4 Design of Experiments
The classical DE (cDE) and the proposed DE (DEsdrm) were implemented to solve
the benchmarking functions chosen in the experimental setup. The AOS, nFE and
ExeTime measured for cDE and DEsdrm, for ps = 60, are presented in Table 1. The
results indicate that the proposed DEsdrm outperformed cDE on all the performance
metrics only for f2. DEsdrm outperformed cDE on two metrics together (AOS
and ExeTime) for three functions: f3, f8 and f13. DEsdrm outperformed cDE
only on nFE for two functions, f4 and f10, and only
on ExeTime for five functions, f1, f7, f9, f11 and f14. Except for f5, DEsdrm
outperformed cDE on at least one of the metrics for every function. In
summary, cDE was good in AOS, and DEsdrm was good in speed, in both ExeTime
and nFE.
The same experiments were repeated for cDE and DEsdrm on the 14 benchmarking
problems, but with ps = 200. The values measured for the performance
metrics of cDE and DEsdrm are presented in Table 2. The superiority of DEsdrm
was clearly evident: DEsdrm outperformed cDE in 11, 9 and 7 of the 14 function cases
by AOS, ExeTime and nFE, respectively. DEsdrm outperformed cDE on all
three metrics for five benchmarking functions (f3, f5, f8, f9 and f13), on speed (both
ExeTime and nFE) for two functions (f4 and f10), and on AOS and ExeTime for
one function (f6).
Thus, the experiments on the benchmarking functions showed that the
proposed DEsdrm performs better than the classical DE in both
solution quality and speed.
To further validate the superiority of DEsdrm, its performance was assessed on
the problem of extracting key frames from video. The experimental details
and the observations gathered are presented in the next section.
Numerous evolutionary-algorithm-based frameworks have been proposed in the literature
for extracting key frames from given videos. In this experiment, cDE
and DEsdrm were implemented for key frame extraction, to demonstrate the efficiency
of DEsdrm. The video was first converted into a set of 75 frames. The objective
of this experiment was to extract 10 key frames from these 75 frames.
The DE parameters were set as ps = 10, D = 10, F = random (or
constant 0.9), Cr = 0.6, MaxGen = 10 and Mtr = 3/50. A population with 10
candidates was initialized; each candidate in the population was a set of 10 random
frames. The fitness of a candidate was measured as the ASSIM (Average Structural
Similarity Index) value of the frames in the set. The experiments were repeated for
3 trials, each with different independent runs, to enable a better comparative
analysis of the algorithms.
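As a hedged sketch of this fitness evaluation, a single-window SSIM averaged over consecutive pairs of the selected frames might look as follows. Note that the standard SSIM is computed over local sliding windows, so the global variant here is a simplification; the pairing scheme and function names are assumptions for illustration.

```python
import numpy as np

def ssim_global(a, b, L=255.0):
    """Simplified single-window SSIM between two grayscale frames.

    Real SSIM aggregates a local windowed map; this global variant only
    sketches the similarity term used inside the fitness."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2   # standard stabilizers
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def assim_fitness(frames, candidate):
    """ASSIM of a candidate: average SSIM over consecutive frames in the
    selected (sorted) index set. A lower value means the chosen key
    frames are more dissimilar, i.e. more informative."""
    idx = sorted(candidate)
    pairs = zip(idx[:-1], idx[1:])
    return float(np.mean([ssim_global(frames[i], frames[j])
                          for i, j in pairs]))
```

In a full DE run, each candidate (a set of 10 frame indices) would be scored with `assim_fitness` and candidates with lower ASSIM retained.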
In Trial 1, Mtr was set to 3. The F values were chosen in the ranges
[0, 0.5] and [0.5, 1] for the promising and non-promising regions, respectively. The proposed
DEsdrm failed to outperform cDE: the average ASSIM value of DEsdrm was higher
than that of cDE.
In Trial 2, the F values were set differently for each region of the population:
0.5 and 0.9 for the promising and non-promising regions, respectively. Mtr was set
to 5. DEsdrm outperformed cDE with a marginal difference of 0.0039, although
cDE still outperformed DEsdrm in 3 out of 5 runs. This showed the
comparable performance of DEsdrm and cDE. It is worth noting that the average
performance of DEsdrm in Trial 2 improved over its performance in
Trial 1.
In Trial 3, the F value was set constant at 0.9 for both the promising and the
non-promising regions, and Mtr was set to 50. It was found that the
proposed DEsdrm generated key frames with lower ASSIM values than cDE.
The experimental results recorded are presented in Table 3. The cDE and DEsdrm
algorithms were compared using different metrics measured on the ASSIM values obtained
over the 50 runs. The best, worst and average ASSIM values of the 50 runs were lower
for DEsdrm than the corresponding values of cDE. On comparing the
corresponding runs of cDE and DEsdrm, DEsdrm
significantly outperformed cDE in all 50 runs. The pairwise difference between
the ASSIM values attained by the algorithms in each run is also reported in the results.
The average difference was 0.0608, showing the reasonable performance
enhancement achieved by the proposed DEsdrm algorithm. The key frames extracted
by cDE and DEsdrm are depicted in Fig. 1a, b, respectively, for reference.
Thus, the superiority of the proposed DEsdrm algorithm was demonstrated on a set of
14 benchmarking problems and on a video analytics problem.
7 Conclusions
This paper proposed a novel mutation strategy, named 'sorted dual range mutation
(sdrm)', for the Differential Evolution (DE) algorithm. The DE variant in which the classical
mutation operator is replaced with sdrm is named DEsdrm. To demonstrate the
value of sdrm, the classical DE and DEsdrm were implemented to solve a set of
14 benchmarking problems and a key frame extraction problem. The results of the
benchmarking experiments showed that DEsdrm outperformed cDE significantly
for higher dimensions. For the key frame extraction problem, three trials were
conducted with different F values. The results revealed a trend of performance enhancement
of DEsdrm from Trial 1 to Trial 3 and proved the superiority of the
proposed DEsdrm algorithm in the key frame extraction problem. The superiority of
DEsdrm was well evident in the chosen video.
References
1. Storn, R.: Differential evolution—a simple and efficient adaptive scheme for global optimization
over continuous spaces. Technical Report, International Computer Science Institute (1995)
2. Attia, M., Arafa, M., Sallam, E.A., Fahmy, M.M.: An enhanced differential evolution algorithm
with multi-mutation strategies and self-adapting control parameters. Int. J. Intell. Syst. Appl.
11(4), 26–38 (2019)
3. Zhou, Y., Li, X., Gao, L.: Adaptive differential evolution with intersect mutation and repaired
crossover rate. Appl. Soft Comput. 13(1), 390–401 (2013)
4. Duan, M., Yang, H., Liu, H., Chen, J., Duan, M., et al.: A differential evolution algorithm with
dual preferred learning mutation. Appl. Intell. 49, 605–627 (2019)
5. Ramadas, M., Abraham, A.: Revised mutation strategy for differential evolution algorithm.
In: Metaheuristics for Data Clustering and Image Segmentation-Intelligent Systems Reference
Library, vol. 152, pp 57–65 (2019)
6. Gokul, K., Pooja, R., Gowtham, K., Jeyakumar, G.: A self-switching base vector selection
mechanism for differential mutation of differential evolution algorithm. In: International
Conference on Communication and Signal Processing (2017)
7. Gokul, K., Pooja, R., Jeyakumar, G.: Empirical evidences to validate the performance of self-
switching base vector based mutation of differential evolution algorithm. In: Proceedings of the
7th International Conference on Advances in Computing, Communications and Informatics,
pp. 2213–2218 (2018)
8. Salehinejad, H., Rahnamayan, S., Tizhoosh, H.R.: CenDE: centroid-based differential evolu-
tion. In: Proceedings of IEEE Canadian Conference on Electrical & Computer Engineering
(CCECE)
9. Ali, M., Pant, M., Nagar, A.: Two new approaches incorporating centroid-based
mutation operators for differential evolution. World J. Model. Simul. 7(1), 16–28 (2011)
10. Prabha, S., Yadav, R.: Differential evolution with biological-based mutation operator.
Eng. Sci. Technol. Int. J. 23(2), 253–263 (2020)
11. Jing, S.-Y.: Set-Based differential evolution algorithm based on guided local exploration for
automated process discovery. In: Foundations and Applications of Process-based Modeling of
Complex Systems, Complexity, vol. 2020, (2020)
12. Jeyakumar, G., Shunmuga Velayutham, C.: Differential evolution and dynamic differential
evolution variants—an empirical comparative performance analysis. Int. J. Comput. Appl.
(IJCA) 34(2), 135–144 (2012)
13. Jeyakumar, G., Shunmuga Velayutham, C.: Distributed mixed variant differential evolution
algorithms for unconstrained global optimization. Memetic Comput. 5(4), 275–293 (2013)
14. Jeyakumar, G., Shunmuga Velayutham, C.: Distributed heterogeneous mixing of differential
and dynamic differential evolution variants for unconstrained global optimization. Soft Comput.
18(10), 1949–1965 (2014). Springer
15. Wang, L., Zhang, Y., Feng, J.: On the Euclidean distance of images. IEEE Trans. Pattern Anal.
Mach. Intell. 27(8) (2005)
16. Algur, S.P., Vivek, R.: Video key frame extraction using entropy value as global and local
feature. arXiv:1605.08857 [cs.CV] (2016)
17. Liu, G., Zhao, J.: Key frame extraction from MPEG video stream. In: Proceedings of Second
Symposium International Computer Science and Computational Technology (2009)
18. Liu, H., Meng, W., Liu, Z.: Key frame extraction of online video based on optimized frame
difference. In: Proceedings of the 9th International Conference on Fuzzy Systems and Knowledge
Discovery (2012)
19. Ramender, G., Pavani, M., Kishore Kumar, G.: Evolving optimized video processing and
wireless transmission system based on ARM Cortex-A8 and GSM. Int. J. Comput. Netw. Wirel.
Mobile Commun. 3(5) (2013)
20. Liu, H., Pan, L., Meng, W.: Key frame extraction from online video based on improved frame
difference optimization. In: Proceedings of 14th International Conference on Communication
Technology (ICCT) (2012)
21. Abraham, K.T., Ashwin, M., Sundar, D., Ashoor, T., Jeyakumar, G.: An evolutionary computing
approach for solving key frame extraction problem in video analytics. In: Proceedings of
ICCSP-2017—International Conference on Communication and Signal Processing (2017)
22. Abraham, K.T., Ashwin, M., Sundar, D., Ashoor, T., Jeyakumar, G.: Empirical comparison
of different key frame extraction approaches with differential evolution based algorithms. In:
Intelligent Systems Technologies and Applications, ISTA 2017 Advances in Intelligent Systems
and Computing, vol. 683, pp. 317–326 (2018)
Annotation for Object Detection
1 Introduction
Video and image processing are highly researched fields and are predicted to
continue expanding for a significant period. Improvements in computing capabilities
and easy access to video and image recording devices have enabled the development
of computer vision applications in surveillance, disease detection, autonomous
vehicle design, etc. Since most real-world applications are highly sensitive, it is
imperative to train and test machine learning algorithms on huge datasets.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 317
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_34
318 P. Myna et al.
Niche applications, such as those in biology and astronomy, often do not have
annotated datasets or easily accessible high-quality images. Thus, manual image
collection and annotation become the only option [1]. Another important application
of this tool is to compare the accuracy of algorithm-based object detection to the
accuracy of detection by the human eye.
The prime focus of this paper is to discuss the design of a manual annotation tool
and check the accuracy of the same with respect to algorithm-based annotation.
Annotation of an image means associating critical extra information with the
image/diagram. In this tool, all persons and objects in the image are identified and
assigned the correct labels. YOLO9000 is a real-time object detection algorithm
used for classifying objects in the annotation tool.
Intersection over Union (IoU) is a popularly used evaluation metric for checking
object detection accuracy. This tool provides a feature to check the IoU between the
human-annotated image and the annotation done by the tool [2].
2 Related Work
Over the years, many automatic or semi-automatic annotation tools have been
developed [3]. Most of them work using pre-trained weights or targets. Thus, for
applications where no targets exist, manual annotation becomes a necessity.
The object detection algorithm tested for accuracy in this paper
is a version of YOLO [5]. YOLO detects objects and provides a confidence score
indicating how accurate each detection is.
YOLO employs regression and compacts the whole detection pipeline into one
network. A single head iterates through sections of the image and processes them
through a few convolutional layers to obtain a feature map. Offsets are then calculated to
obtain an anchor box; this system of anchors and offsets is reported to decrease training
time. A threshold confidence score of 30% is generally used while generating object
detection outputs.
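The thresholding step just described can be sketched as a simple filter. The detection record layout below (label/confidence/box dictionaries) is an assumption for illustration, not YOLO's actual output format; only the 0.30 cut-off comes from the text.

```python
def filter_detections(detections, threshold=0.30):
    """Keep only detections whose confidence meets the 30% cut-off."""
    return [d for d in detections if d["confidence"] >= threshold]

# Hypothetical raw detections for one frame
dets = [
    {"label": "person", "confidence": 0.91, "box": (10, 10, 50, 90)},
    {"label": "person", "confidence": 0.12, "box": (60, 15, 80, 70)},
]
kept = filter_detections(dets)   # only the 0.91 detection survives
```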
YOLO9000 [6] is a more optimized version and is a better fit here. The use of
Siamese Networks [7] has helped to train with the limited annotated surveillance
data that is available.
As computer vision applications expand, newer annotated datasets for specific needs
are required. To provide context, some commonly used datasets are discussed briefly:
1. Common Objects in Context (COCO) dataset [4]: This dataset comprises 328,000
images and 91 classes of objects in their natural surroundings. It has
labels for commonly seen objects such as cat, car and eyeglasses. This dataset
was annotated with a tool called coco-annotator.
2. ImageNet dataset [8]: This mammoth dataset contains 12 subtrees with 5247
synsets and 3.2 million images. It contains more detailed labels such as
Egyptian cat, freight car, passenger car and sunglasses. This
dataset was hand-annotated.
3. SUN dataset [9]: This dataset focuses on scene categorization with 397 categories
and 130,519 images. This contains images with object labels such as door, car
and tree, as well as scene labels such as cafeteria, farm and elevator. This dataset
was hand-annotated.
2.4 Metric
The metric chosen to measure object detection capabilities in this paper is Intersection
over Union (IoU) [10].
Intersection over Union calculation requires:
1. The actual hand-labelled bounding boxes, referred to as ground-truth bounding
boxes.
2. The bounding boxes predicted as output by the object detection model.
Figure 1 explains how Intersection over Union is calculated. An IoU score greater
than 0.5 usually indicates good detection.
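The calculation Fig. 1 illustrates (area of overlap divided by area of union) can be written down directly for boxes in the (X1, Y1, X2, Y2) corner format described in the experimental setup; the function name here is illustrative.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2),
    with the top-left corner of the image as the origin."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # corners of the intersection rectangle
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Identical boxes give an IoU of 1.0, disjoint boxes give 0.0, and partial overlap falls in between, so the 0.5 "good detection" threshold is easy to apply.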
3 Implementation
The input to the system was videos of busy streets. A script was run to extract
frames from the video. Then, frames were run through YOLO9000 and also annotated
manually. The accuracy of detection was calculated using IoU. The frontend for the
application was implemented using ReactJS. MongoDB was used for its ability to
store semi-structured and unstructured data.
Images are uploaded to the tool, where each is displayed with two layers: the actual
image and a transparent layer above it on which annotation is done. Up to 100 images
can currently be uploaded at once.
Humans annotate each of these images manually by drawing boxes around each
person in the image. If required, YOLO9000 can be run on the images as well to
detect objects classified as 'people'. On saving, a file with the original images, human
annotation details and YOLO9000 annotation coordinates is stored on the local
system.
Input to the system is uploaded as images or as a video through a script to extract
frames. As inputting images individually would be cumbersome during testing, short
videos were inputted to the system. Frames were extracted from these videos at
random intervals and processed (Fig. 2).
A single page application, using ReactJS, has been created to provide access to the
annotation tool. The user flow has been crafted to be simple and intuitive for users.
The application has a drawable area, where the image to be annotated is layered
with a canvas. The user can proceed to manually annotate the displayed image by
drawing boxes around persons, using their mouse. When the user saves the annota-
tions, all coordinates are stored at the backend. Further, the user can run a comparison
with the YOLO9000 for the annotated images and download all the results.
3.3 Design
The frontend is built with ReactJS as a single page application (SPA). The application
is created using create-react-app, and each of the page components is dynamic. HTTP
requests are made from the ReactJS app, to the backend. The backend consists of an
Application Programming Interface (API) written in Go language and NodeJS. The
annotated images are stored using Mongo Atlas cloud services.
A Docker image of the application is created which is used to create a Docker
container. The container is hosted on AWS cloud services thus ensuring security and
scalability (Fig. 3).
4 Experimental Setup
Suitable images of people were collected. Then boxes, called bounding boxes, were
drawn around the object of interest using two sets of coordinates. The coordinates
are denoted by (X1, Y1) and (X2, Y2) such that X1 and Y1 are the coordinates of the
top-left corner of the object, and X2 and Y2 are the coordinates of the bottom-right
Fig. 3 Design
corner of the object. Coordinates are measured with the top-left corner of the image
as the origin.
Using the two sets of coordinates, all the four corners of the object section of the
image can be represented as:
(X1, Y1)—Top-left coordinate of object
(X2, Y1)—Top-right coordinate of object
(X1, Y2)—Bottom-left coordinate of object
(X2, Y2)—Bottom-right coordinate of object
A two-part experiment was set up to record coordinates as follows:
(a) Human Annotation: The authors of this paper manually recorded the coordinates
of boxes around people in the images.
(b) Machine Learning Algorithm: The images are annotated by the chosen ML
algorithm, where emphasis is laid on a particular label/class of objects. Addi-
tionally, most algorithms give a confidence score for these detected objects in the
images. This paper explores using the YOLO9000 object detection algorithm
in the tool.
YOLO9000 works well on images with abundant noise and is thus selected for the
accuracy comparison in this paper. YOLO9000 has been tested for object detection
on the ImageNet detection validation set, receiving a score of 19.7 mAP
(mean Average Precision). On testing with the COCO dataset, YOLO9000 scored
16.0 mAP on the 156 classes not in COCO [6].
Images of Indian urban and rural locations were used. A good mixture of images of
busy streets, markets and other public spaces was used. Four hundred images were
used to check the versatility of YOLO9000, especially to check its application in
monitoring crowded Indian public spaces.
YOLO9000 [6] has been chosen as the object detecting algorithm for its capa-
bility to provide coordinates of bounding boxes around 9000 objects and to provide
confidence scores. This case study focuses on detection of the ‘people’ label.
The images were uploaded to the tool, where each image was manually annotated
using the annotation tool.
Finally, the coordinates for people detected by YOLO9000 were evaluated with
respect to those detected by humans, using IoU. It is unrealistic to expect the model
to predict the exact coordinates of any detected object. By considering the area of
overlap between the ground-truth bounding boxes and the predicted coordinates, the
closeness of values generated by the model and by hand labelling can be measured
(Fig. 4).
In the accuracy analysis, the IoU is computed for each object (person) detected.
Further, the detected bounding boxes are looped over, and the IoU for each is
computed.
In order to measure IoU, each bounding box labelled as ‘people’ is checked with
all the possible ground truths. Then, the maximum IoU is considered for that specific
bounding box (non-max suppression). The above is repeated for each bounding box
detected by the YOLO9000 algorithm.
To tackle the possibility that people detected in the ground truth are completely
ignored by the YOLO9000 algorithm, the difference in the number of people detected
by YOLO9000 and the number of people annotated for ground truth is calculated.
Fig. 5 Case study results: annotation by YOLO9000. Red bounding boxes represent annotation by
humans, and blue ones represent annotation by YOLO9000
These unaccounted detections of people are added with zero values to the list of the
people detected for IoU calculation. Then, IoU scores are averaged over images and
finally over the entire dataset (Table 1).
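The matching and averaging steps above can be sketched as follows, assuming axis-aligned boxes in (x1, y1, x2, y2) form: each predicted 'people' box keeps its maximum IoU over all ground truths, ground truths the detector missed entirely are padded in as zeros, and the scores are averaged per image. Function names are illustrative, and the real suppression details may differ.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def image_iou_score(predicted, ground_truth):
    """Average IoU for one image: every predicted box is scored by its
    best IoU over all ground truths, and ground truths with no matching
    prediction contribute zeros to the average."""
    scores = [max((iou(p, g) for g in ground_truth), default=0.0)
              for p in predicted]
    missed = max(0, len(ground_truth) - len(predicted))
    scores.extend([0.0] * missed)
    return sum(scores) / len(scores) if scores else 0.0
```

Averaging `image_iou_score` over all images would then give the dataset-level score reported in the results.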
5.4 Results
Efficient image annotation is possible using this tool. Also, this tool enables stream-
lined, convenient testing of object detection and text detection algorithms. This
would be a significant convenience during the development of algorithms related
to computer vision.
On comparing manual annotation to YOLO9000 annotation, an IoU score of
0.3005834618310959 was obtained for our dataset. Assuming that the human eye
has 100% accuracy in detecting people, YOLO9000 scored about 30%. This
shows that human annotation might be more reliable for a variety of sensitive needs.
This tool helps to conveniently annotate large sets of images. Functionality to allow
annotation by YOLO9000 has also been implemented. The use of open-source
software has made the tool inexpensive and thus accessible.
The YOLO9000 feature currently processes an image in approximately 4 s on a
1.6 GHz dual-core system. The use of higher-capacity processors would greatly
improve the speed of the YOLO9000 feature. Additionally, the use of improved
metrics might help better evaluate and compare human annotation to machine
annotation.
References
1. Russell, B.C., Torralba, A., Murphy, K.P., et al.: LabelMe: a database and web-based tool for
image annotation. Int. J. Comput. Vis. 77, 157–173 (2008)
2. Cheng, Q., Zhang, Q., Fu, P., Tu, C., Li, S.: A survey and analysis on automatic image
annotation. Pattern Recogn. 79, 242–259 (2018)
3. Zhang, D., Islam, M.M., Lu, G.: A review on automatic image annotation techniques. Pattern
Recogn. 45(1), 346–362 (2012)
4. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: European Conference on
Computer Vision. Springer, Cham (2014)
5. Redmon, J., et al.: You only look once: Unified, real-time object detection. In: Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition (2016)
6. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition (2017)
7. Koch, G., Zemel, R., Salakhutdinov, S.: Siamese neural networks for one-shot image
recognition. In: ICML Deep Learning Workshop, vol. 2 (2015)
8. Deng, J., et al.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference
on Computer Vision and Pattern Recognition. IEEE (2009)
9. Xiao, J., et al.: Sun database: large-scale scene recognition from abbey to zoo. In: 2010 IEEE
Computer Society Conference on Computer Vision and Pattern Recognition. IEEE (2010)
10. Rosebrock, A.: Intersection over Union (IoU) for object detection. PyImageSearch:
Machine Learning, Object Detection, Tutorials (2016)
Development of Self Governed Flashing
System in Automotives Using AI
Technique
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 327
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_35
328 N. Sankarachelliah et al.
1 Introduction
New technologies are nothing but continual development in the technologies already
exist with the further updating. Today most part of our surroundings are equipped
with an intelligence system, which play a vital role in our day to day life. They are
playing major role in the every activity of our day to day life to further improve the
performances and to enhance the human ability. System which is programmed to
think like humans and mimics their actions [1]. The Intelligent System (IS) can be
described as a device that integrates information into machine handling applications.
Intelligent devices often perform complex automatic processes that are not feasible
under the conventional programming model. Human-machine interface helps the
driver to perform various tasks, for example intelligence system can be used to
control the turn signals in automotives. The current turn signaling system requires
the driver to turn on/off the signal for the required turn.
The remainder of the article is structured as follows. Section 2 describes the need
statement, and the present study is covered in Sect. 3. The existing systems are discussed in Sect. 4
and our proposed system in Sect. 5. The experimental findings are
discussed in Sect. 6, followed by the conclusion and future work in Sect. 7.
2 Need Statement
The testing is carried out in a four-wheeler car. For making turns, changing directions
or passing a car, drivers normally use the indicators [2]. Many drivers either refuse to
signal or do not turn off the indicator while changing from one lane to another [3, 4].
Though refusing to signal a change looks like a minor violation, a lot of car crashes
occur when a vehicle turns without notice or switches lanes.
3 Problem Identification
The present study conducted by automotive engineers shows that nearly 48% of drivers
failed to turn the indicator off while changing lanes or making a
turn, and similarly 25% failed to turn the indicator on while making a turn [4]. Further
study shows that drivers fail to use the turn signals nearly 20 crore times
a day, which comes to nearly 7500 crore times a year. This creates more problems than
mere disturbance while driving.
These numbers show an alarming rate of increase in this problem, and it is
happening all around the world. No solution has been made to date
to address this issue, and the whole present system is dependent on driver input.
A driver's mistake on the road threatens not only the safety of the driver but also that
of the following cars. A single act of neglect quickly impacts a variety of individuals.
4 Existing Systems
The conventional turn indicator is a fully manually controlled system which requires the
driver to turn the signal on/off for the required turn.
Often, this system may delay a driver's response in triggering the turn signal. Some
drivers do not trigger the turn signal because their hands must leave
the steering wheel to turn the light on. The approach is even more challenging for
less experienced drivers.
4.2 ORVM
ORVM stands for Outside Rear View Mirror, shown in Fig. 1. Indicators are mounted on
the rear view mirror to make sure that the driver of another vehicle can quickly locate the
signal and respond correctly, particularly when that vehicle is driving parallel to the car
and has already passed the traditional indicator mounted on the windshield. This
arrangement is also very suitable for a U-turn, because these signals are clearly
visible from a perpendicular perspective.
Fig. 1 ORVM
The system actuates the indicator by recognizing the driver's speech, which is done
with the help of the Google Maps voice assistant [5].
5 Proposed Solution
The proposed system is designed to automatically turn the indicator on/off and totally
eliminate the manual operation during overtaking and lane changes. Currently, the
proposed system focuses only on two-way traffic. The system draws input from
various devices and sensors.
The framework comprises three segments:
1. Camera data source
2. Steering angle data source
3. Ranging sensor.
When the vehicle (A) leaves its current lane, it crosses a lane line, with or without
a second vehicle (B) in proximity. Automatic activation may occur when the
system processes data from the device on the first vehicle to determine whether or
not it crosses the lane [6]. If a lane crossing by the first vehicle (A) is detected, the
turn signal will be activated.
In the case of an unmarked lane
Automatic activation may occur when the system processes data from the
device on the first vehicle to determine whether any vehicle (B) is in front of it and
whether the distance between the two vehicles has decreased; if both are determined,
the turn signal will be activated.
For Prior Indication
The velocity difference is inversely proportional to the distance. For example, for a
relative velocity between A and B of
VB − VA = 40 km/h,
the distance between A and B decreases, and there is a greater chance of an overtake
(Fig. 2).
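A minimal sketch of the prior-indication logic suggested by this example might look like the following. The function name, the 50 m gap threshold, and the exact activation rule are assumptions for illustration, not part of the described system.

```python
def overtake_likely(v_b_kmph, v_a_kmph, gap_m, min_gap_m=50.0):
    """Flag a likely overtake: vehicle B closes on A (positive relative
    velocity) while the gap has shrunk below a threshold, so the turn
    signal can be primed in advance. The threshold is illustrative."""
    closing = v_b_kmph - v_a_kmph      # e.g. 40 km/h in the example above
    return closing > 0 and gap_m < min_gap_m
```

A real controller would fuse this with the camera, steering-angle, and ranging-sensor inputs listed earlier before actuating the blinker.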
The introduction of an Autopilot [7] mode enhances the safety and comfort features of the
vehicle. Autopilot is designed to support the driver with the most burdensome parts
of driving. Autopilot adds new features to make the Tesla safer and more reliable over
time and improves current functionality: it allows the car to automatically
steer, accelerate and brake within its lane. Present Autopilot functions require active
supervision by the driver.
Block Diagram
Figure 3 shows a flowchart of an exemplary SGF embodiment.
6 Experimental Results
A system block diagram was proposed, and software for lane detection and
tracking was designed using OpenCV (an AI tool) [6]. The blinkers are automatically switched on/off
by the controller; a program (code) was also designed for the
implementation (Fig. 4).
The system helps avoid the almost 50% of drivers who fail to use indicators while
changing lanes or overtaking another vehicle.
types of vehicles and for all-way traffic, and switch to alternate vision solutions. It will
then be practically tested.
Acknowledgements We would like to express our sincere thanks and gratitude to our college
management for providing excellent infrastructure, laboratory and computing facilities to complete
this research work successfully.
References
1. https://www.igi-global.com/dictionary/intelligent-system/15045
2. Yusuf, M.M., Karim, T., Saif, A.S.: A robust method for lane detection under adverse weather
and illumination conditions using convolutional neural network. In: Proceedings of the Inter-
national Conference on Computing Advancements, pp. 1–8 (2020)
3. http://www.foxbusiness.com/features/2012/05/04/half-drivers-dont-use-turn-signals
4. Ponziani, R.: Turn signal usage rate results: A comprehensive field study of 12,000 observed
turning vehicles. In: SAE Technical Paper. SAE International (2012). https://doi.org/10.4271/
2012-01-0261
5. Divakar, A., Krishnakumar, S., et al.: Automatic vehicle turn indicator using speech recognition.
Int. J. Recent Technol. Eng. (IJRTE) 8, 6697–6700 (2019)
6. https://towardsdatascience.com/tutorial-build-a-lane-detector-679fd8953132
7. https://www.tesla.com/autopilot
Comparison Between CNN and RNN
Techniques for Stress Detection Using
Speech
Abstract The profession of maintaining law and order is not an easy task; it is an inherently stressful job. Due to an increase in crime, policemen's working hours have also increased, resulting in poor psychological health and an increased risk of suicide. Hence, we are building software for the detection of stressed and non-stressed speech of policemen. We propose to develop a system for Central Police Research (CPR) using machine learning techniques, implemented in the Python language, that identifies whether a person is in a stressed or non-stressed condition. We use two techniques, Recurrent Neural Network (RNN) and Convolutional Neural Network (CNN), to detect stress in speech.
1 Introduction
Speech is an expression of ideas and thoughts using articulate vocal sounds. Stress is a mental, physical, or emotional factor which causes mental or bodily tension. In this research work, we use machine learning techniques to determine whether an individual is in a stressed or non-stressed condition, given an audio recording. The database for this research work is generated in two ways: samples recorded at the
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 333
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_36
334 B. Pathak et al.
CPR department and voice samples collected from the Internet. In the training phase, the recorded samples are converted into an appropriate format and provided to the preprocessor, which applies processing techniques such as noise reduction, silenced-voice removal, etc.
The preprocessing output is given to the feature-extraction stage, for which we have used Mel-Frequency Cepstral Coefficients (MFCC); the Mel-Frequency Cepstrum (MFC) is a representation of the power spectrum of speech. MFCC is a highly efficient technique for feature extraction, and the extracted features are further given to the supervised learning algorithms, namely CNN and RNN. CNN generates fixed-size output from fixed-size input. RNN, on the other hand, can handle arbitrary input/output lengths, but typically requires much more data than CNN because it is more complex.
2 Literature Survey
Kouzani and Kipli [1] present depression detection using MRI. To determine whether a person is in a depressed or normal condition, the brain's structural MRI and its volumetric features were investigated, so as to identify the features that contribute most to accurate depression detection. It gives accuracy of up to 80%, but the drawback of this existing system is its high cost.
Lee and Kan [2] have researched depression detection using EEG. They stated that a recent study on this topic used electroencephalography (EEG) to analyze brain waves; EEG measures the brain's electrical activity. They obtained an accuracy of about 70%, but as a drawback, people usually do not prefer to undergo EEG for the detection of depression, even though it gives high accuracy.
In [3], the main target of the review was to discover the occurrence of stress, anxiety, or depression in patients having common pathologies affecting voice. The pathologies focused on were MTD and PVFMD, because of their presumed connection to the mental condition of patients.
In [4], feature extraction uses the impulse response of the vocal tract, which is convolved with the glottal source of excitation signals; IAIF removes vocal-tract effects from the fundamental frequency, and the SWIPE algorithm is used as the feature selection method. Classification is performed using stressed and neutral PDF curves. The features extracted are formants, BFCC, PLP, MFCC, and energy. If this existing system used neural network algorithms, it would surely add to the accuracy of the system.
Alghowinem [5] uses a speech-signal dataset, extracting linguistic and acoustic features such as energy, intensity, loudness, jitter, HNR, and MFCC, followed by a support vector machine classification technique. The databases used are the Berlin database, the eNTERFACE database, and an expressive speech database. The database used for developing this system is highly efficient.
In [6], the ORI-DB database is used, with spectral features extracted and a Support Vector Machine (SVM) classifier. The
Comparison Between CNN and RNN Techniques for Stress … 335
accuracy obtained is between 80 and 84.5%, owing to prior knowledge of the emotions considered during speech processing. In this system, higher accuracy is achieved because the dataset samples are known. Hence, the drawback of this research work is that only known samples of the dataset can be processed; the system is not efficient for unsupervised techniques.
3 Database Generation
We have collected stressed and non-stressed speech samples from media coverage and YouTube videos. Considering recent incidents such as the nationwide COVID-19 pandemic, acid-attack survivors' speeches, and other judicial victims of various cases, the stressed speech samples were collected from such situational videos. The non-stressed speech samples were taken from family members and relatives while they were in a non-stressed phase.
4 Methodology
See Fig. 1.
Audio channels, sample rate, and bit depth are the audio properties that need preprocessing. The Librosa package in Python is used for audio processing. For preprocessing in this research work, we used Librosa's load() function, which has a default sampling rate of 22.05 kHz, normalizes the data, and flattens the audio channels to mono. Figure 2 shows the speech signal after preprocessing. The duration of each audio sample is set to 3 s.
The extracted MFCC features are further given to the supervised learning algorithms, CNN and RNN. Figure 3 shows the plot of the MFCC data. We have taken 13 MFCC coefficients per frame for our dataset, and there are 259 frames in total for each audio sample.
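The frame count quoted above follows from simple framing arithmetic. A small sketch, in which the hop length of 256 samples is our assumption (it is the value that reproduces the reported 259 frames per 3-s clip); in practice the feature matrix would come from Librosa:

```python
SR = 22050        # Librosa's default sampling rate (22.05 kHz)
DURATION = 3      # every clip is fixed to 3 s
HOP = 256         # assumed hop length, chosen to reproduce 259 frames
N_MFCC = 13       # MFCC coefficients kept per frame

def n_frames(n_samples: int, hop: int) -> int:
    """Centred framing (Librosa's default) yields 1 + floor(n/hop) frames."""
    return 1 + n_samples // hop

samples = SR * DURATION          # 66,150 mono samples per clip
print(n_frames(samples, HOP))    # -> 259
# Each clip then becomes an (N_MFCC, 259) feature matrix, e.g. via
# librosa.feature.mfcc(y=y, sr=SR, n_mfcc=N_MFCC, hop_length=HOP).
```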
4.3 Classification
The classification of stressed data and non-stressed data has been done using two
classifiers that are RNN and CNN.
4.3.1 RNN
RNN is a supervised machine learning technique and one of the types of artificial neural networks. Derived from feedforward neural networks, an RNN uses its internal state to process variable-length sequences of inputs. The term RNN is used to denote two classes of networks with a similar structure, one having an infinite impulse response and the other a finite impulse response.
Here, we have used Long Short-Term Memory (LSTM), which is an RNN architecture with feedback connections. It can process single data points as well as complete sequences of data, such as video or speech. An LSTM unit commonly comprises a cell, an input gate, an output gate, and a forget gate. The cell can remember values over arbitrary time intervals, and the flow of data into and out of the cell is controlled by the three gates. LSTM networks are well suited to processing, classifying, and making predictions on time-series data, as there can be unknown delays between significant
time-series events. When training a traditional RNN, a vanishing gradient problem is encountered; LSTM was developed to deal with this problem.
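The gate structure described above can be sketched as a single NumPy LSTM step. This is a toy forward pass with random weights, not the trained model from the paper; the hidden size of 8 is arbitrary, and the input size of 13 mirrors the 13 MFCC coefficients per frame:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step: the input, forget, and output gates control
    what enters, stays in, and leaves the cell state c."""
    n = h_prev.size
    z = W @ x + U @ h_prev + b          # all four gate pre-activations
    i = sigmoid(z[0:n])                 # input gate
    f = sigmoid(z[n:2 * n])             # forget gate
    o = sigmoid(z[2 * n:3 * n])         # output gate
    g = np.tanh(z[3 * n:4 * n])         # candidate cell values
    c = f * c_prev + i * g              # cell remembers across time steps
    h = o * np.tanh(c)                  # hidden state (the cell's output)
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 13, 8                     # 13 MFCCs in, 8 hidden units (toy)
W = rng.standard_normal((4 * n_hid, n_in)) * 0.1
U = rng.standard_normal((4 * n_hid, n_hid)) * 0.1
b = np.zeros(4 * n_hid)
h = c = np.zeros(n_hid)
for frame in rng.standard_normal((259, n_in)):   # one 3-s utterance
    h, c = lstm_step(frame, h, c, W, U, b)
print(h.shape)  # (8,)
```

Stacking such a step over the 259 frames of one clip and feeding the final hidden state `h` to a dense softmax layer yields a stressed/non-stressed decision.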
4.3.2 CNN
CNN is a supervised machine learning technique. A CNN involves an input layer, an output layer, and multiple hidden layers. The hidden layers contain a series of convolutional layers; the ReLU layer is normally used as an activation layer, and additional layers such as fully connected, pooling, and normalization layers are also present. The activation function and final convolution mask the inputs and outputs of the hidden layers. In a convolutional layer, stride, depth, and zero padding are the three hyperparameters which control the size of the output.
The formula to calculate the number of neurons that fit in a given volume is (I − k + 2p)/s + 1, where
I input size,
k kernel (receptive field) size of the convolutional layer neurons,
p zero padding, and
s stride.
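As a quick check of the formula, a small helper (a sketch for illustration; the function name is ours):

```python
def conv_output_size(I: int, k: int, p: int, s: int) -> int:
    """Neurons that fit along one spatial dimension: (I - k + 2p)/s + 1.
    The division must be exact, otherwise the hyperparameters are invalid."""
    span = I - k + 2 * p
    if span % s != 0:
        raise ValueError("stride does not tile the padded input")
    return span // s + 1

print(conv_output_size(7, 3, 1, 1))     # -> 7 (padding of 1 preserves size)
print(conv_output_size(227, 11, 0, 4))  # -> 55
```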
5 Results
A loss value indicates how bad the model's prediction is on a single epoch. In the case of RNN, we trained for 10 epochs, and for CNN, 50 epochs. If the model's predictions are correct on the validation data, the loss is close to zero; otherwise, the loss is higher. The model loss on training and testing data for CNN and RNN is shown in Figs. 4 and 5, respectively, with epoch on the X-axis and loss on the Y-axis. We used 104 speech samples to train the model. For RNN, we obtained 85.58% accuracy; for CNN, 81.73% accuracy.
The confusion matrix is a table that describes the performance of a classifier using the results on validation data. The confusion matrices for the validation data, shown in Tables 1 and 2, are obtained from the RNN and CNN classifiers, respectively. The models' accuracies can be verified from the confusion matrix tables.
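The accuracy read off a confusion matrix is the diagonal mass divided by the total number of validation samples. A sketch with an illustrative 2 x 2 matrix (the entries are hypothetical, not the actual values of Tables 1 and 2):

```python
import numpy as np

def accuracy_from_confusion(cm):
    """Accuracy = correctly classified samples (diagonal) / all samples."""
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum()

# Hypothetical matrix: rows = true class, cols = predicted class
# (stressed / non-stressed).
cm = [[45, 7],
      [8, 44]]
print(round(accuracy_from_confusion(cm), 4))
```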
6 Conclusion
Acknowledgements We would like to express special thanks and gratitude to the Central Police Research (CPR) Department, which gave us the opportunity to carry out this project and contribute toward the betterment of the health of police officials.
References
1. Kouzani, A.Z., Kipli, K.: Evaluation of feature selection algorithms for detection of depression
from brain SMRI scans. Adv. Comput. Sci. Appl. Technol. (ACSAT) (2013)
2. Lee, P.F., Kan, D.P.X.: Decrease alpha waves in depression: an electroencephalogram (EEG)
study. In: International Conference on Biosignal Analysis, Processing and Systems (ICBAPS)
(2015)
3. Dietrich, M., Abbott, K.V., Gartner-Schmidt, J., Rosen, C.A.: The frequency of perceived stress,
anxiety, and depression in patients with common pathologies affecting voice. J. Voice 22(4)
(2008)
4. Simantiraki, O., Giannakakis, G., Pampouchidou, A.: Stress detection from speech using spectral
slope measurement. Pervasive Comput. Paradig. Mental Health (2016)
5. Alghowinem, S.: A comparative study of different classifiers for detecting depression in speech:
multi classifier system. In: IEEE International Conference on Acoustics, Speech and Signal
Processing (2013)
6. Stolar, M.N., Lech, M., Allen, N.B., Stolar, S.J.: Detection of Adolescent depression from speech
using optimized spectral roll-off parameters. Biomed. J. 2, 10 (2018)
7. Fung, P., Zuo, X., Li, T.: A multilingual database of natural stress emotion. In: Proceeding of
the 8th International Conference on Language Resources and Evaluation (LREC’12) (2012)
8. Hawila, S., Tomba, K., Dumoulin, J., Khaled, O.A., Mugellini, E.: Stress detection through
speech analysis. In: Proceeding of the 15th International Joint Conference on e-Business and
Telecommunication (ICETE) (2018)
Finding the Kth Max Sum Pair
in an Array of Distinct Elements Using
Search Space Optimization
Abstract The algorithm aims to find the Kth max sum pair of two indices of an array A of N (N ≥ 2) distinct elements [a1, a2, a3, …, an]. If the sum of the values represented by the two indices of a pair in array A is the same as that of any other pair, i.e., if P(i, j) and P(m, n) are two distinct pairs and (A[i] + A[j] = A[m] + A[n]), then the pair containing the index which represents the maximum of all four values represented by the indices of the two pairs obtains the higher priority, i.e., if (A[m] > A[i] > A[n] > A[j]), then the pair containing the index m obtains the higher priority. The purpose of this algorithm is to optimize the computation of recommendations on real-time platforms. At the time of making a purchase on an e-commerce platform, with millions of options available in the product catalog, the algorithm can be used to recommend the best complementary product that can be bought as a pair with the main product, or two altogether different products of the same type as the main product which can be bought as a combo or a pair. Not only the top recommendations, but random recommendations are also necessary, so that customers get a good breadth or variety of the available products in the catalog. In this paper, we propose an algorithm which can be used to address both scenarios in real time; conclusively, it is evident that its time and space complexities are independent of K.
D. Ahire (B)
Walchand College of Engineering, Sangli, Maharashtra, India
e-mail: ahiredeepak20@gmail.com
S. Bhandari
Department of Computer Science and Engineering, Annasaheb Dange College of Engineering
and Technology, Ashta, Maharashtra, India
e-mail: smriti_bhandari@yahoo.com
K. Kamble
Department of Computer Science and Engineering, Walchand College of Engineering,
Sangli, Maharashtra, India
e-mail: kirankamble5065@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 341
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_37
342 D. Ahire et al.
1 Introduction
Searching is presently one of the most tedious tasks. With increases in the amount
of data every second, searching can consume a substantial amount of CPU time
depending upon the data organization and searching mechanisms used. Not only
the processors, but it also taxes the users performing online search [1, 2]. More
importantly, regarding the retrieval of data in real time, for example, in the case
of an e-commerce platform, Amazon found that a delay of a fraction of a second
can cost several percentage points of sales, as discussed in [3]. A Harvard business
review discusses how the design of the product page also affect the online sales [4].
In addition to processing time and page design, the consumer demands are also
affected by the availability of substitutes and complements, as discussed in [5]. Not
only swiftly rendering customers’ requirements but, sales promotion is also important
to change their perception and purchasing behaviour [6]. One of the most important
incentives of sales promotion is product bundling or combo [7]. Customers generally
tend to buy combos instead of one main product if offered at the same price or lower
price. According to the study discussed in [8], it was found that the customers placed
a perceived value on combo meals, even if it would cost the same when choosing
items a la carte. People also prefer combo meals even there is no discount [8]. Results
reported in [9], on basis of experiments provided empirical evidence that customer
preferred bundles in circumstances when the searching cost was reduced by the
availability of the combos as a choice. As customers expect the swift and graceful
experience because of the fact that humans are generally bad at choosing when plenty
of options are available, described in [10], loading all possible recommendations on
the page is not a feasible option as it would also consume a lot of time. Instead, the
top K matching combos can be recommended to the customer, which is analogous to
the “Top N Video Ranker” technique used by Netflix as discussed in [11]. Not only
the top K matching recommendations, but completely random recommendations are
also useful for customers so that they get a good breadth or variety of the available
products in the catalog as discussed in [11, 12]. The random recommendations, not
only provide a good breadth of available products, but act as a choice for customers
in terms of price, brand, current situations, publicity, and many more. These can also
be used to promote new, popular, non-recent and non-popular items which would
have not been found out by the users as described in [13]. A customer may like the
recommended combos, but may reject it on the basis of price as discussed in [14].
Customers also buy combos with high cost than their preferred price limit if they
get a better product quality, brand or for something extra which they might not have
considered while buying in the first place [15]. Therefore, computing and suggesting
random recommendations are also crucial with the top K recommendations taking
swiftness into account. In this paper, we propose an algorithm which can be used
Finding the Kth Max Sum Pair in an Array of Distinct Elements … 343
2 Problem Statement
2.1 Abbreviations
2.2 Example
Consider an array of distinct elements, A = {1, 2, 3, 4}, which represents the list of commonality indices of the products. The maximum value, 4, belongs to the main product and the other three belong to the related products.
Thus, the possible set of pairs (representing pairs of indices of the array A, sorted according to PRI-1 and PRI-2, as mentioned in Table 1) is {(2, 3), (1, 3), (0, 3), (1, 2), (0, 2), (0, 1)}.
Notice that the 3rd pair and the 4th pair in the above ordering represent an equal sum, i.e., (A[0] + A[3]) = (A[1] + A[2]), but the 3rd pair obtains the higher priority since A[3] > A[2] > A[1] > A[0]. Therefore, if K = 3, then the answer is P(0, 3), which means that the combo of the main product and the product having a commonality index of 1 (as A[0] = 1) stands 3rd in the list of combos with respect to PRI-1 and PRI-2.
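The ordering in this example can be checked by brute force. Sorting by pair sum and, on ties, by the larger of the two values implements PRI-1 and PRI-2 for distinct elements (a sketch for the toy array only; the naive enumeration is impractical for large N):

```python
from itertools import combinations

A = [1, 2, 3, 4]  # commonality indices from the example

# All index pairs, ordered by PRI-1 (larger pair sum first) and PRI-2
# (on ties, the pair containing the larger value wins).
pairs = sorted(combinations(range(len(A)), 2),
               key=lambda p: (A[p[0]] + A[p[1]],
                              max(A[p[0]], A[p[1]])),
               reverse=True)
print(pairs)         # [(2, 3), (1, 3), (0, 3), (1, 2), (0, 2), (0, 1)]
print(pairs[3 - 1])  # K = 3 -> (0, 3)
```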
The algorithm devised in this article is inspired by a similar use case based on two arrays. That use case aims to find the first K maximum sum pairs among all the possible sum pairs formed using the two given arrays, as discussed in [17–20]. For our scenario, we need an approach which works for a single array. The naive approach is to compute the set of all possible pairs P(i, j) and sort them according to PRI-1 and PRI-2. After sorting, the first K maximum sum pairs are returned. There are C(N, 2) = N(N − 1)/2 distinct pairs that can be formed from a list of N elements. Therefore, sorting takes O(C(N, 2) * log(C(N, 2))) = O(N^2 * log(N)) time complexity and O(C(N, 2)) = O(N^2) space complexity. A more optimised approach is to limit the search space, as discussed in [17–20]. An identical approach was used to devise an optimised algorithm which works for a single array, provided in Algorithm 1.
Algorithm 1 Find_Kth_Max_Sum_Pair(A, K)
Finds the Kth Max Sum Pair in an array of distinct elements
Pre A is the array containing distinct elements, K is a constant
Post Array A is sorted
Return The TARGET_PAIR
1: Sort the array A in a non-decreasing order.
2: Enqueue the pair having MAX_SUM, i.e., P(N−2, N−1) into P-QUE.
3: Initialise temporary variable dequeue_count = 0.
4: Initialise S to an empty set (to avoid insertion of duplicate pair in the P-QUE ).
5: Loop( P-QUE is not empty and dequeue_count ≠ K−1 ) do
5.1: Dequeue the P-QUE front item. (let it be P(i, j)).
5.2: Increment dequeue_count by 1.
5.3: Insert the dequeued pair P(i, j) into set S.
5.4: Enqueue new pair P(i−1, j), if not present in the set S, if i−1 ≥ 0 and (i−1) ≠ j.
5.5: Enqueue new pair P(i, j−1), if not present in the set S, if j−1 ≥ 0 and (j−1) ≠ i.
End Loop
6: Return P-QUE front (front item is the required TARGET_PAIR).
End Algorithm 1
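A direct Python translation of Algorithm 1 using heapq is sketched below. Two small liberties are ours: the tie-breaking key (larger sum first, then the pair holding the larger index, which on the sorted array is the larger value) encodes PRI-1/PRI-2, and pairs are marked as seen when enqueued rather than when dequeued, which avoids duplicate pushes without changing the result:

```python
import heapq

def kth_max_sum_pair(A, K):
    """Find the Kth max sum pair of indices in an array of distinct
    elements, via best-first search of the pair lattice (Algorithm 1)."""
    A = sorted(A)                       # step 1: non-decreasing order
    n = len(A)

    def key(i, j):
        # Larger sum first; ties broken by the pair holding the larger
        # index (i.e., the larger value, since A is sorted).
        return (-(A[i] + A[j]), -j, -i)

    heap = [(key(n - 2, n - 1), (n - 2, n - 1))]   # step 2: MAX_SUM pair
    seen = {(n - 2, n - 1)}
    dequeue_count = 0                               # step 3
    while heap and dequeue_count != K - 1:          # step 5
        _, (i, j) = heapq.heappop(heap)
        dequeue_count += 1
        for ni, nj in ((i - 1, j), (i, j - 1)):     # steps 5.4-5.5
            if ni >= 0 and nj >= 0 and ni != nj and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (key(ni, nj), (ni, nj)))
    return heap[0][1] if heap else None             # step 6

print(kth_max_sum_pair([1, 2, 3, 4], 3))  # -> (0, 3)
```

Because the array is sorted in place (the algorithm's postcondition), the returned indices refer to the sorted order of A.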
Rather than computing all the possible pairs, the focus is to generate only the first K max sum pairs. Each pair is enqueued into and dequeued from the P-QUE at most once, so for each pair there are two operations. Therefore, for K pairs, the time complexity is O(K * 2 * log(K)), that is, O(K * log(K)); the maximum number of pairs possible is of the squared order of the input size (K_max = C(N, 2) = N(N − 1)/2). The factor of log(K) arises from the max-heap operations. Gerald Paul's O(1)-time priority queue [21] is significant in reducing the factor of log(K), thus finally reducing the time complexity to O(K). A. Mirzaian and E. Arjomandi devised an O(N)-time algorithm for a similar use case: selecting the Kth smallest element in a matrix which is the Cartesian sum of two sorted vectors of real numbers, each of size N [22]. For our scenario, we have to compute the Kth max sum pair using a single array. For K = 1, we can simply find the maximum sum pair in the given array, which can be computed in O(N) time complexity and O(1) space complexity, as discussed in [23]. The case K = C(N, 2) is equivalent to finding the minimum sum pair in the given array, which can likewise be computed in O(N) time complexity and O(1) space complexity, as discussed in [24].
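For the K = 1 case mentioned above, the maximum sum pair can be found in one pass by tracking the two largest values. This is our own illustration of the idea, not the code of [23]:

```python
def max_sum_pair(A):
    """Indices of the two largest values in a single pass (K = 1 case)."""
    first, second = (0, 1) if A[0] > A[1] else (1, 0)
    for i in range(2, len(A)):
        if A[i] > A[first]:
            first, second = i, first    # new largest; old one is runner-up
        elif A[i] > A[second]:
            second = i                  # new runner-up
    return tuple(sorted((first, second)))

print(max_sum_pair([3, 1, 4, 2]))  # -> (0, 2): values 3 and 4
```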
Table 2 All pairs and corresponding pair sums from the example in Sect. 2.2

Pair    Corresponding pair sum    Number of pairs having pair sum ≥ corresponding pair sum
(2, 3)  7                         1
(1, 3)  6                         2
(0, 3)  5                         4
(1, 2)  5                         4
(0, 2)  4                         5
(0, 1)  3                         6

The pair sum is the sum of the values in array A at the indices represented by the pair, and the corresponding pair sum is the pair sum of the pair mentioned in the respective row
3 Proposed Approach
Having computed the greatest TARGET_SUM, we know that there are at most N/2 pairs that have a pair sum equal to the greatest computed TARGET_SUM.
Thus, our new search space is reduced to a size of at most N/2. The TARGET_PAIR lies in the newly generated search space.
We need an offset to find the Kth pair. We cannot directly return the Kth pair in the new search space; we first need to subtract the count of pairs having a pair sum greater than the greatest computed TARGET_SUM.
Therefore, let us define a function F(givenSum) which returns the number of pairs having a pair sum ≥ givenSum:

F(givenSum) = (Number of pairs with pair sum ≥ givenSum) (1)

Then,

K_New = K − F(greatest computed TARGET_SUM + Δ) (2)

Note that, in Eq. 2, a very small value (Δ) is added to the greatest computed TARGET_SUM and then passed to the function F, as we want the count of pairs having a pair sum strictly greater than the greatest computed TARGET_SUM.
Note: Δ can take any value appropriate to the datatype of the input array elements. For example, if the array elements are integers, then Δ = 1; if they are floating-point, Δ = 0.001, say. Finally, the K_New-th pair in the new search space is the required TARGET_PAIR.
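The counting function F(givenSum) can be realized in O(N) on the sorted array with two pointers. A sketch (the function name is ours):

```python
def count_pairs_at_least(A, given_sum):
    """F(givenSum): number of index pairs with pair sum >= given_sum,
    counted in O(N) on a sorted array using two pointers."""
    A = sorted(A)
    lo, hi, count = 0, len(A) - 1, 0
    while lo < hi:
        if A[lo] + A[hi] >= given_sum:
            # A[hi] reaches the sum with every element in [lo, hi - 1],
            # since the array is sorted.
            count += hi - lo
            hi -= 1
        else:
            lo += 1
    return count

print(count_pairs_at_least([1, 2, 3, 4], 5))  # -> 4, matching Table 2
```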
Time Complexity: Sorting the array takes O(N * log(N)), and the binary search for the greatest TARGET_SUM takes O(N * log(MAX_SUM − MIN_SUM)). Therefore, Time Complexity = O(N * log(N)) + O(N * log(MAX_SUM − MIN_SUM)) = O(N * max(log(N), log(MAX_SUM − MIN_SUM))).
Space Complexity: For the generation of New Search Space: O(N/2). Therefore,
Space Complexity = O(N).
Both algorithms were implemented, and multiple tests were performed using the environment mentioned in Table 3. Tables 4, 5, 6, 7, 8 and 9 describe the results of the tests carried out on an unsorted array containing N distinct elements having values in the range (1, N). Test 1 is a comparison of the average runtimes of the two algorithms:

Average Runtime = (Σ_{i=1}^{K} Runtime to compute the ith pair) / K (3)
For Algorithm 1,

Σ_{i=1}^{K} Runtime to compute the ith pair = 2 * (1 * log(1) + 2 * log(2) + 3 * log(3) + … + K * log(K)) ≈ K^2 * log(K) ≈ O(N^4) (4)
From Eqs. 3 and 4, it is evident that for Algorithm 1 the time to compute the average runtime is O(N^4); therefore, we limited the input size for this comparison to 200, so that the average runtime could be computed in a reasonable amount of time. For Test 2, the average runtime for greater values of N was computed by limiting K, that is, 1 ≤ K ≤ N. In Test 3, the average runtime was computed for the first half of the range of values of K, that is, 1 ≤ K ≤ N(N − 1)/4, whereas Test 4 covered the second half, that is, N(N − 1)/4 ≤ K ≤ N(N − 1)/2. Test 5 was carried out for a constant input size of N = 10^4, so that the average runtime could be computed for greater values of K in a reasonable amount of time. Test 6 was carried out solely on Algorithm 2, for greater values of both N and K.
From Table 4, it is evident that, for N ≤ 200, there was more than a 90% reduction in average runtime compared to Algorithm 1. From Test 2 and Test 5, it is evident that Algorithm 1 performs better than Algorithm 2 when 1 ≤ K ≤ N. From Tables 6 and 7, it is evident that the percentage reduction in average runtime for higher values of K is greater than for lower values; for K ≥ N, Algorithm 2 outperforms Algorithm 1. Table 8 depicts that as K increases, the percentage reduction in runtime for Algorithm 2 increases. Test 5 and Test 6 act as proof that it is possible to calculate
the pairs with lower priorities (higher in rank) in real time using Algorithm 2, giving them a fair chance to be recommended to the customer while saving more than 89% of average CPU time compared to Algorithm 1.
5 Conclusion
In this manuscript, we addressed the importance of both top K and random recommendations. We discussed why the existing algorithms have a high response time and cannot provide a fair chance to the pairs with lower priority. We proposed an optimized algorithm that finds the Kth max sum pair with time and space complexities independent of K, and we showed why it is a feasible real-time searching option by carrying out various tests for different values of N and K on an input array of commonality indices, thus supporting a catalog of a million products.
References
1. Sohail, S., et al.: Product recommendation techniques for ecommerce—past, present and
future. Int. J. Adv. Res. Comput. Eng. Technol. 1(9), 219–225 (2012)
2. Gayle, L.: How Product Bundling Can Boost Your E-Commerce Sales. https://returnonnow.
com/2018/08/how-product-bundling-boost-ecommerce/ (2018)
3. Einav, Y.: Amazon Found Every 100ms of Latency Cost them 1% in Sales. https://www.
gigaspaces.com/blog/amazon-found-every-100ms-of-latency-cost-them-1-in-sales (2019)
4. Harmeling, C. et al.: How to Design Product Pages that Increase Online Sales. https://hbr.
org/2019/11/how-to-design-product-pages-that-increase-online-sales
5. Rousu, M., et al.: The effects of selling complements and substitutes on consumer willing-
ness to pay: evidence from a laboratory experiment. Can. J. Agric. Econ. Revue canadienne
d’agroeconomie. 56(2), 179–194 (2008)
6. Ai, W., Yazdanifard, R.: The review of how sales promotion change the consumer’s perception
and their purchasing behavior of a product. Glob. J. Manage. Bus. Res. E Mark. 15(5), 32–37
(2015)
7. Foubert, B.: Product Bundling: Theory and Application. University of Antwerp, Faculty of
Applied Economics, Working Papers (1999)
8. Sharpe, K., Staelin, R.: Consumption effects of bundling: consumer perceptions, firm actions,
and public policy implications. J. Pub. Policy Mark. 29(2), 170–188 (2010)
9. Harris, J., Blair, E.: Consumer preference for product bundles: the role of reduced search
costs. J. Acad. Mark. Sci. 34(4), 506–513 (2006)
10. Schwartz, B.: The Paradox of Choice. Harper Perennial, New York (2004)
11. Gomez-Uribe, C., Hunt, N.: The netflix recommender system. ACM Trans. Manage. Inf. Syst.
6(4), 1–19 (2016)
12. What the difference between global and random recommendations?. https://support.
shippingeasy.com/hc/en-us/articles/115005400683-What-the-difference-between-global-
and-random-recommendations
13. Hopfgartner, F.: News recommendation in real-time. In: Smart Information Systems: Com-
putational Intelligence for Real-Life Applications, pp. 169–170. Springer International Pub-
lishing (2015)
14. Zhao, Q., et al.: E-commerce recommendation with personalized promotion. In: Proceedings of the 9th ACM Conference on Recommender Systems (RecSys '15), pp. 219–226 (2015)
15. Shanthi, R.: Customer Relationship Management. MJP Publisher (2019)
16. Linden, G., et al.: Collaborative Recommendations Using Item-to-Item Similarity Mappings
(2020)
17. Agrawal, N., Sharma, S.: K maximum sum combinations from two arrays—GeeksforGeeks.
https://www.geeksforgeeks.org/k-maximum-sum-combinations-two-arrays/
18. Gangwar, A.: N Max Sum Pairs. https://discuss.codechef.com/t/n-max-sum-pairs/14769
19. Liu, S.: N Max Pair Combinations. https://shengqianliu.me/heaps-and-maps/n-max-pair-
combinations
20. K maximum sum combinations from two arrays—Tutorialspoint.dev—TutorialsPoint.dev.
https://tutorialspoint.dev/data-structure/heap-data-structure/k-maximum-sum-
combinations-two-arrays
21. Paul, G.: A complexity O(1) priority queue for event driven molecular dynamics simulations.
J. Comput. Phys. 221(2), 615–625 (2007)
22. Mirzaian, A., Arjomandi, E.: Selection in X + Y and matrices with sorted rows and columns.
Inf. Process. Lett. 20(1), 13–17 (1985)
23. Mittal, N.: Find the Largest Pair Sum in an Unsorted Array—GeeksforGeeks. https://www.
geeksforgeeks.org/find-the-largest-pair-sum-in-an-unsorted-array/
24. Ojha, D.: Smallest Pair Sum in an array—GeeksforGeeks. https://www.geeksforgeeks.org/
smallest-pair-sum-in-an-array/
25. Mittal, N.: Count Pairs in a Sorted Array Whose Sum is Less than x—GeeksforGeeks. https://
www.geeksforgeeks.org/count-pairs-array-whose-sum-less-x/
Dynamic Trade Flow of Selected
Commodities Using Entropy Technique
Abstract The global entropy and uniformity of the three major commodity groups most exported worldwide, and of all commodities combined, were observed from 1995 to 2018. It is found that the global entropy and uniformity of manufactured goods chiefly classified by material are higher than those of the two other product groups, machinery and transport equipment and crude materials, inedible, except fuels, in both of which they fluctuate. In 2018, they fall remarkably for manufactured goods and for crude materials, inedible, except fuels. Further, the local entropy and number of trade partners of the world's two most influential countries, China and the USA, were investigated and compared with the world's average local entropy and number of trade partners. It is seen that the local entropy and trade partners of the two countries are much higher than the world's average, except in some early years for the local entropy of China. It is also observed that when the local entropy and number of partners of both countries declined together, the world's average values fell significantly.
1 Introduction
Economic transactions are made among the countries for the purpose of providing a
nation with commodities it lacks in exchange for those commodities that it produces
in abundance. This is called the export-import relationship or worldwide trade [1–4]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 353
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_38
354 S. A. Milu et al.
2 Data Analysis
The trade data used here is from the United Nations (UN) COMTRADE database. We studied the whole period from 1995 to 2018, about 24 years, for 168 countries. To build the international trade network, we constructed a matrix in which each cell contains the trade value that a country exported to a partner country. If there is a trade relationship between any two countries (e.g., France, India, Japan, China, USA), we set the corresponding trade value; otherwise, the cell is 0.
3 Methods
We have worked with export values for measuring the global entropy. We took the values Y_ij(t) ≥ 0, where Y_ij(t) denotes the trade flow from country i to country j in year t.
Dynamic Trade Flow of Selected Commodities Using … 355
We used the normalized exported volume, calculated as y_ij(t) = Y_ij(t)/Y(t), where the total trade value is Y(t) = Σ_{k,l} Y_{k,l}(t).
Then, the global entropy for each commodity in each year is determined as [16–18]:

S(t) = − Σ_{i,j} y_ij(t) log2 y_ij(t)
This equation provides information about which pairs of countries form trading
partnerships for a product. If the global entropy increases, the total number of
trading country pairs is increasing; if the entropy decreases, trade is
concentrated in only a few specific pairs of countries.
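A minimal sketch of the global entropy computation under the normalization defined above; the 3 × 3 matrix here is hypothetical:

```python
import numpy as np

def global_entropy(Y):
    """Shannon entropy (bits) of the normalized trade flows y_ij = Y_ij / Y."""
    total = Y.sum()
    y = Y[Y > 0] / total          # skip zero flows: 0 * log2(0) is taken as 0
    return float(-np.sum(y * np.log2(y)))

# Four equal flows give entropy log2(4) = 2 bits.
Y = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
print(global_entropy(Y))  # → 2.0
```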
3.2 Uniformity
Uniformity indicates how heterogeneous or homogeneous the trade was. To calculate
uniformity, we need the total number of pairs involved in trading a product,
P(t) = Σ_{i,j: y_ij(t) > 0} 1. The uniformity is then the ratio U(t) = S(t)/log2 P(t).
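Uniformity can then be computed as the ratio of the entropy to its maximum value for P trading pairs; a sketch with hypothetical matrices:

```python
import numpy as np

def uniformity(Y):
    """U(t) = S(t) / log2(P(t)): entropy over its maximum for P trading pairs."""
    total = Y.sum()
    y = Y[Y > 0] / total
    S = -np.sum(y * np.log2(y))
    P = np.count_nonzero(Y)      # number of pairs with positive trade flow
    return float(S / np.log2(P))

# Perfectly even trade among the P pairs reaches the maximum U = 1.
Y_even = np.array([[0.0, 5.0], [5.0, 0.0]])
Y_skew = np.array([[0.0, 9.0], [1.0, 0.0]])
print(uniformity(Y_even))  # → 1.0
print(uniformity(Y_skew))  # < 1: trade concentrated in one pair
```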
The trade flux is normalized locally as f_ij(t) = Y_ij(t)/Y_i(t), where Y_i(t) = Σ_j Y_ij(t); the local trade entropy of country i in year t is then calculated as s_i(t) = −Σ_j f_ij(t) log2 f_ij(t).
The world's average local entropy is calculated as

s_avg(t) = (1/N) Σ_i s_i(t)

where N = 168 is the total number of countries considered.
We also counted the total number of countries with which a country has a trade
partnership. We investigated the number of partners, based on export value, for
some specific countries:

p_i(t) = Σ_{j: y_ij(t) > 0} 1
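The three country-level quantities above (local entropy s_i, partner count p_i and the world average s_avg) can be sketched together; the matrix here is hypothetical:

```python
import numpy as np

def local_entropy(Y, i):
    """s_i(t): entropy of country i's export shares f_ij = Y_ij / Y_i."""
    row = Y[i]
    f = row[row > 0] / row.sum()
    return float(-np.sum(f * np.log2(f)))

def n_partners(Y, i):
    """p_i(t): number of countries i exports to."""
    return int(np.count_nonzero(Y[i]))

def world_avg_local_entropy(Y):
    """s_avg(t) = (1/N) * sum over i of s_i(t)."""
    N = Y.shape[0]
    return sum(local_entropy(Y, i) for i in range(N)) / N

Y = np.array([[0.0, 4.0, 4.0],
              [2.0, 0.0, 6.0],
              [8.0, 0.0, 0.0]])
print(local_entropy(Y, 0))          # two equal shares → 1.0 bit
print(n_partners(Y, 2))             # → 1
print(world_avg_local_entropy(Y))
```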
4 Results
We calculated global entropy and uniformity for four exported trade categories:
manufactured goods chiefly classified by material; machinery and transport equipment;
crude materials, inedible, except fuels; and all commodities. In Fig. 1a, we can see
that the global entropy of all commodities and of manufactured goods is higher than
that of the two other products during the whole time period. This means that
manufactured goods are the most exported products, involving more trade partners and,
obviously, a higher trade value. The global entropy gradually increased until 2007
and was almost constant afterward, except for a sharp fall in the global entropy of
manufactured goods in 2018. The global entropy of crude materials is higher than that
of machinery and transport equipment until 2006; afterward, the two interchange with
some fluctuations. A sharp fall in crude materials is also seen in 2018, as in
manufactured goods.
In Fig. 1b, we explain the uniformity of trade all over the world. Higher
uniformity means that trade was homogeneous across the world; in other words, trade
was evenly distributed and the influence of any specific country was small. Lower
uniformity, on the other hand, means heterogeneous or unevenly distributed trade.
As with global entropy, a transition is seen in the uniformity of both manufactured
goods and crude materials in 2018.
Fig. 1 a Global entropy for all commodities; manufactured goods chiefly categorized by material;
machinery and transport equipment; crude materials, inedible, except fuels. b Uniformity for all
commodities; manufactured goods chiefly categorized by material; machinery and transport equipment;
crude materials, inedible, except fuels
We calculated the local entropy for the two most influential countries in world
trade, China and the USA, and compared it with the world's average local entropy.
In Fig. 2a, the local entropy of manufactured goods chiefly classified by material
for China is at its lowest value in 1995; it then increased gradually over time and
takes the highest value after 2010. A sharp fall is seen in 2018. In the case of
the USA, the entropy decreased until 2000 and then increased up to 2017. In 2018,
the entropy of the USA fell, as did China's. The world's average local entropy
remains almost the same over the whole time period, but in 2018 it declined due to
the fall of both China and the USA.
For machinery and transport equipment (Fig. 2b), the local entropy of China
increases with small fluctuations, with a rapid fall in 2018, while the entropy of
the USA increased from 2000 to 2006 and declined slightly from 2007. For this
product, we see no significant change in the world's average local entropy.
As with the two other products, China's local entropy in crude materials fell in
2018 (Fig. 2c). It was increasing, with some significant fluctuations, up to 2017
and then fell steeply in 2018. This caused a fall in the world's average value, as
for manufactured products. The USA, on the other hand, shows an opposite change in
2018, with an upward transition. This suggests that the influence of China is
higher than that of the USA for this product. Finally, considering all products
combined (all commodities), we see that the local entropy of China also fell
drastically in 2018, which slightly lowered the average local entropy of the world
(Fig. 2d).
Fig. 2 Local entropy for a manufactured goods chiefly classified by material, b machinery and
transport equipment, c crude materials, inedible, except fuels, d all commodity
For the trade partnership analysis, we again took the two most influential
countries in world trade, China and the USA, examined their numbers of trade
partners, and compared them with the world's average number of partners, in the
same manner as for local entropy.
In Fig. 3a, we see that the partners of China and the USA are almost the same over
the whole time period, and there is a sharp fall to 75 partners for both countries
for manufactured goods. This affected the average partnership of world trade, as we
see a downward transition in the world average partnership, which means that the
trade partnerships of China and the USA have a very strong effect on world trade.
For machinery and transport equipment (Fig. 3b), China's partnership fell sharply
in 2018. As a result, the average world partnership decreased slightly, meaning
that it affected world trade in these products only a little.
Fig. 3 Trade partnership for a manufactured goods chiefly categorized by material, b machinery
and transport equipment, c crude materials, inedible, except fuels, d all commodity
As with the two other products, China's number of partners in crude materials fell
in 2018 (Fig. 3c). The number of partners of China was gradually increasing, with
little fluctuation, and then fell steeply in 2018.
For all commodities, we see that China's partnership also fell drastically, which
had a small impact on the world's average partnership (Fig. 3d). We can see a
common change across every product for China in 2018, which affected world trade
slightly. But when the partnerships of both China and the USA fell, we see a great
impact on world trade, as in manufactured goods. We also found that China's fall in
2018 was consistent across all products, whereas the fall of the USA in trade
partnership occurred only in manufactured goods.
5 Conclusion
The global entropy describes the trade relationships among countries, and the
uniformity represents how uniform the trade was for the individual products
considered and for all commodities. Of the three products, we found that
manufactured goods were involved in the highest global entropy and uniformity, with
the two other products showing fluctuating values. From the local entropy and trade
partnership analysis, we found that the local entropy and partnerships of China and
the USA are much higher than the world's average, except for some early years of
China's local entropy. Therefore, we can say that the two countries have a strong
influence on world trade. We also noticed that China's local entropy and trade
partnership fell drastically in 2018 in almost all products, which had a small
effect on world trade; but when the local entropy and trade partnership of China
and the USA fell together, world trade was affected significantly, as we have seen
for manufactured goods in both local entropy and trade partnership.
Acknowledgement This work is fully funded and supported by ICT Division, Ministry of Posts,
Telecommunications and Information Technology, Bangladesh under the ICT fellowship scheme.
References
1. Eaton, J., Kortum, S.: Technology, geography and trade. Econometrica 70(5), 1741–1779 (2002)
2. Helpman, E., Melitz, M., Rubinstein, Y.: Estimating trade flows: trading partners and trading
volumes. Quart. J. Econ. 123(2), 441–487 (2008)
3. Rose, A.K.: Do we really know that the WTO increases trade? Am. Econ. Rev. 94(1), 98–114
(2004)
4. Foschi, R., Riccaboni, M., Schiavo, S.: Preferential attachment in multiple trade networks.
Phys. Rev. E 90, 022817 (2014)
5. Serrano, M.A., Boguná, M.: Topology of the world trade web. Phys. Rev. E 68, 015101 (2003)
6. Garlaschelli, D., Loffredo, M.I.: Structure and evolution of the world trade network. Physica
A 355, 138–144 (2005)
7. Fagiolo, G., Reyes, J., Schiavo, S.: World-trade web: topological properties, dynamics, and
evolution. Phys. Rev. E 79, 036115 (2009)
8. Riccaboni, M., Schiavo, S.: Structure and growth of weighted networks. New J. Phys. 12,
023003 (2010)
9. De Benedictis, L., Tajoli, L.: The world trade network. World Econ. 34, 1417–1454 (2011)
10. Riccaboni, M., Rossi, A., Schiavo, S.: Global networks of trade and bits. J. Econ. Interac.
Coord. 8, 33–56 (2013)
11. Riccaboni, M., Schiavo, S.: Stochastic trade networks. J. Complex Netw. forthcoming (2014)
12. Fagiolo, G., Reyes, J., Schiavo, S.: The evolution of the world trade web: a weighted–network
analysis. J. Evol. Econ. 20, 479–514 (2010)
13. Bhattacharya, K., Mukherjee, G., Saramäki, J., Kaski, K., Manna, S.S.: The international trade
network: weighted network analysis and modelling. J. Stat. Mech. P02002 (2008)
14. Cha, M.-Y., Lee, J.W., Lee, D.-S.: Complex networks and minimal spanning trees in
international trade network. J. Korean Phys. Soc. 56, 998 (2010)
15. Oh, C.-Y., Lee, D.-S.: Entropy of international trades. Phys. Rev. E 95, 052319 (2017)
16. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379 (1948)
17. Pulliainen, K.: Entropy measures for international trade. Swedish J. Econ. 72, 40 (1970)
18. Lei, H., Chen, Y., Li, R., He, D., Zhang, J.: Maximum entropy for the international division of
labor. PLoS ONE 10, e0129955 (2015)
An Automated Bengali Text
Summarization Technique Using
Lexicon-Based Approach
Abstract There are enough resources for English to process documents and obtain
summaries, but this does not carry over directly to Bengali, which differs greatly
from English in grammar and sentence structure. Summarization for Bengali is also
harder because there is no established tool to facilitate research work, yet it is
necessary, as about 26 crore (260 million) people use this language. We have
therefore developed a new approach to Bengali document summarization. Here, the
system design proceeds by preprocessing the input document, tagging the words,
replacing pronouns and ranking sentences, respectively. Pronoun replacement is
included to minimize the rate of dangling pronouns in the output summary. After
pronoun replacement, we rank sentences according to sentence frequency, numerical
figures (both in digit and word form) and the document title; whether a sentence
contains any word that also appears in the title is taken into account. The
similarity between two sentences is checked so that one of them can be removed,
which reduces redundancy. Numerical figures also make an impact, so they were
identified as well. Over 3000 newspaper and book documents were taken, and their
words trained according to grammar. Two
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 363
H. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_39
364 B. Jahan et al.
documents were checked by the designed system to evaluate the efficiency of the
summarizer. From the evaluation, the recall, precision and F-score were found to be
0.70 (70%), 0.82 (82%) and 0.74 (74%), respectively.
1 Introduction
2 Literature Review
have (i) preprocessing, (ii) scoring/ranking sentences and (iii) generating the summary.
It also uses term frequency (TF), inverse document frequency (IDF) and positional
value (PV).
The method presented by Haque et al. [5] summarized Bangla documents using an
extraction-based summarization technique. The four major steps of their method are:
(i) preprocessing, (ii) scoring/ranking sentences, (iii) sentence clustering and
(iv) generating the summary.
Efat et al. [6] suggested an extraction-based summarization method that acts on
Bangla documents and is capable of summarizing a single document. Their proposed
method has two major steps: (i) preprocessing and (ii) scoring/ranking sentences
and summarization.
The method of Das and Bandyopadhyay [7] identifies sentiment in the text, combines
it and finally produces the text summarization. They used a sentiment model to
restore and integrate sentiment. The integration is based on theme clustering
(K-means) and document-level theme relational graph algorithms, and the final
summary is selected by the standard PageRank algorithm for data retrieval.
3 Suggested Method
To tag words successfully, we have employed two tagging systems: a general tagging
system and a special tagging system. The special tagging system refines and updates
the general tags.
Every word is tagged (as noun, pronoun, adjective, verb, preposition, etc.) using a
lexicon database [2] and SentiWordNet [3]. The lexicon database and SentiWordNet
have a limited number of predefined words. Using the lexicon database, words can be
tagged as “JJ” (adjective), “NP” (proper noun), “VM” (verb), “NC” (common noun),
“PPR” (pronoun), etc. SentiWordNet, on the other hand, has a list of words tagged
as “a” (adjective), “n” (noun), “r” (adverb), “v” (verb) or “u” (unknown). Based on
these predefined word lists, we experimented on 200 Bangla news documents and found
that 70% of words could be tagged. Bangla words (especially verbs) are particularly
interesting [1]. Although we use word stemming to identify the original form of a
word, not all inflected verbs can be stemmed. In fact, identifying verbs is very
difficult because Bangla has many suffixes. For example, depending on tense and
person, the English word “do” may become “doing”, “did” or “does”, but the
corresponding Bangla word has far more forms. Consider the present continuous tense
of “কর” (kor, “do”): its three main forms depend only on the first, second and
third person.
It can be “করছি” (doing) for first person, “করছ” (doing) for second person and
“করছেন” (doing) for third person, respectively. The forms of the verb for all the
Bangla equivalents of “you” also differ: “আপনি করছেন” (you are doing), “তুমি করছ”
(you are doing) and “তুই করছিস” (you are doing) are all present continuous, second
person. Thus, the word “কর” (do) may have
the given forms: “করে” (do), “করেন” (do), “করিস” (do), “করি” (do), “করছে”
(doing), “করছেন” (doing), “করছ” (doing), “করছিস” (doing), “করছি” (doing),
“করেছে” (did), “করেছেন” (did), “করেছ” (did), “করেছিস” (did), “করেছি” (did),
“করুক” (do), “করুন” (do), “করল” (did), “করলেন” (did), “করলে” (did), “করলি”
(did), “করলাম” (did), “করত” (do), “করতেন” (did), “করতে” (did), “করতিস” (did),
“করতাম” (did), “করতেছি” (doing), “করতেছ” (doing), “করতেছেন” (doing),
“করছিল” (doing), “করছিলেন” (doing), “করছিলে” (doing), “করছিলি” (doing),
“করছিলাম” (doing), “করেছিল” (doing), “করেছিলেন” (doing), “করেছিলে” (do-
ing), “করেছিলি” (doing), “করেছিলাম” (doing), “করবে” (do), “করবেন” (do),
“করবি” (do), “করব” (do), “করো” (do). Thus, the complexity of the verb in Bangla
cannot be compared with English. Verb identification, however, is very important
for language processing, because the verb is the main word of a sentence. A list of
suffixes is considered for a final check, for example: “ইতেছিস” (itechhis),
“তেছিস” (techhis), “ইতিস” (itis), “ইলে” (ile), “ইবি” (ibi), etc. If a word has
such a suffix, it is tagged as a verb. The accuracy of word tagging improved from
68.12% (before using the list of suffixes [4]) to 70% (after using it). This step
produces preliminary tags, which may be updated in the next steps; certain words
will also be specifically tagged as acronym, named entity, occupation, etc., in the
next step [8–11].
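A minimal sketch of this final suffix check, assuming the short suffix excerpt quoted above (the real system uses a much longer list) and the “VM” verb tag from the lexicon tag set:

```python
# Suffix excerpt from the text; a word left untagged by the lexicon lookup
# is tagged as a verb ("VM") when it ends with one of these verbal suffixes.
VERB_SUFFIXES = ("ইতেছিস", "তেছিস", "ইতিস", "ইলে", "ইবি")

def tag_if_verb(word, current_tag=None):
    """Return the existing tag, or 'VM' when a verbal suffix matches."""
    if current_tag is not None:
        return current_tag          # a tag from the lexicon lookup wins
    if word.endswith(VERB_SUFFIXES):
        return "VM"
    return None                     # still untagged

print(tag_if_verb("গাইলে"))   # ends with "ইলে" → VM
print(tag_if_verb("বই"))      # no verbal suffix → None
```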
After general tagging, special tagging was introduced to identify words as
acronyms, elementary forms, numerical figures, repetitive words, and names of
occupations, organizations and places.
1. Examining for English acronyms: When a word is formed from the initials of
other words, it is called an acronym, such as “ইউএনও” (UNO), “ওআইসি”
(OIC), “ইউএসএ” (USA). To examine such words, we separate them into letters,
e.g., “ইউএনও” (UNO) into “ইউ” (U), “এন” (N), “ও” (O), and match every letter
of the word. In fact, all English letters can be written in Bangla, like: A for
(“এ”), B for (“বি”), C for (“সি”), D for
4 Experimental Results
Sample input
Title: দুই ভাই-বোনের ময়না তদন্ত হয়েছে, মামলা হয়নি
Text: রাজধানীরবনশ্রীতেদুইভাইবোনেররহস্যজনকমৃত্যুরঘটনায়এখনোমা
মলাহয়নি।শিশুদেরবাবামামলাকরবেনবলেজানিয়েছেপরিবার।দুইশিশুরলাশেরময়ন
াতদন্তহয়েছে।তাঁদেরগ্রামেরবাড়িজামালপুরেলাশদাফনকরাহবে।খাবারেরনমুনা
পরীক্ষারফলাফলএখনোপাওয়াযায়নি।শিশুদের বাবা আমানউল্লাহর বন্ধু জাহিদুল
ইসলাম আজ মঙ্গলবার বেলা সোয়া ১১ টার দিকে প্রথম আলোকে এসব কথা
জানিয়েছেন।রামপুরাথানারভারপ্রাপ্তকর্মকর্তা (ওসি) রফিকুল ইসলাম বলেন,
এখনো মামলা হয়নি।পরিবারের পক্ষ থেকে আজ মামলা হতেপারে।জিজ্ঞাসা বাদের
জন্য চায়নিজ রেস্তোরাঁর ব্যবস্থাপক, কর্মচারী, পাচককে থানায় নেওয়া
হয়েছে।চায়নিজ রেস্তোরাঁ থেকে আগের দিন আনা খাবার গতকাল সোমবার দুপুরে
গরম করে খেয়ে ঘুমিয়ে পড়ে নুসরাত আমান (১২) ও আলভী আমান (৬)। এরপর তারা
আর জেগে ওঠেনি। অচেতন অবস্থায় হাসপাতালে নেওয়া হলে চিকিৎসকেরা তাদের
মৃত ঘোষণা করেন।পরিবারের অভিযোগের ভিত্তিতে পুলিশ জিজ্ঞাসাবাদের জন্য
ওই রেস্তোরাঁর মালিককে থানায় নিয়ে গেছে। নুসরাত ভিকারুননিসা নূন স্কুল অ্যান্ড
কলেজের পঞ্চম ও আলভী হলিক্রিসেন্ট স্কুলে নার্সারি শ্রেণির শিক্ষার্থী। তাদের
বাবা মো. আমান উল্লাহ ব্যবসায়ী ও মা জেসমিন আক্তার গৃহিণী। এই দম্পতির এই
দুটি সন্তানই ছিল। চায়নিজ রেস্তোরাঁ থেকে আগের দিন আনা খাবার গতকাল
সোমবার দুপুরে গরম করে খেয়ে ঘুমিয়ে পড়ে নুসরাত আমান(১২) ও আলভী আমান(৬)।
এরপর তারা আর জেগে ওঠেনি। অচেতন অবস্থায় হাসপাতালে নেওয়া হলে
চিকিৎসকেরা তাদের মৃত ঘোষণা করেন। পরিবারের অভিযোগের ভিত্তিতে পুলিশ
জিজ্ঞাসাবাদের জন্য ওই রেস্তোরাঁর মালিককে ওই দিনই থানায় নিয়ে গেছে।
Precision(P) = (A ∩ B)/A

where “A” denotes the number of sentences obtained by the summarizer and “B”
denotes the number of relevant sentences compared to the target set.

(ii) Recall (R):
It is the number of sentences occurring in both the system-generated summary and
the ideal summary, divided by the number of sentences in the ideal summary.

Recall(R) = (A ∩ B)/B

(iii) F-measure:
The integrated measure that incorporates both precision and recall is the F-measure:

F-Score = (2 · P · R)/(P + R)
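Treating the summaries as sets of sentences, the three measures can be sketched as follows; the sentence IDs here are hypothetical:

```python
def evaluate_summary(system_sentences, ideal_sentences):
    """Precision, recall and F-score over sets of summary sentences."""
    A = set(system_sentences)   # sentences produced by the summarizer
    B = set(ideal_sentences)    # sentences in the ideal (reference) summary
    overlap = len(A & B)
    precision = overlap / len(A)
    recall = overlap / len(B)
    f_score = (2 * precision * recall) / (precision + recall)
    return precision, recall, f_score

# System picked 4 sentences, the ideal summary has 5, and 3 are shared.
p, r, f = evaluate_summary({1, 2, 3, 4}, {1, 2, 3, 5, 6})
print(round(p, 2), round(r, 2), round(f, 2))  # 0.75 0.6 0.67
```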
The evaluation results for the first ten documents are given in Table 1.
Table 1 Results of precision, recall and F-score

Document No.  | Precision (P) | Recall (R) | F-score
1             | 0.84          | 0.71       | 0.76
2             | 0.79          | 0.72       | 0.75
3             | 0.82          | 0.69       | 0.74
4             | 0.82          | 0.68       | 0.74
5             | 0.79          | 0.71       | 0.74
6             | 0.82          | 0.73       | 0.75
7             | 0.78          | 0.72       | 0.73
8             | 0.85          | 0.70       | 0.75
9             | 0.85          | 0.71       | 0.76
10            | 0.84          | 0.71       | 0.76
Average score | 0.82          | 0.70       | 0.74
5 Conclusion
References
1. Radev, D.R., Hovy, E., McKeown, K.: Introduction to the special issue on summarization.
J. Comput. Linguist. 28(4), 399–408 (2002)
2. Hamou-Lhadj, A., Lethbridge, T.: Summarizing the content of large traces to facilitate the
understanding of the behaviour of a software system. In: Proceedings of the 14th IEEE
International Conference on Program Comprehension (ICPC), pp. 181–190. IEEE, (2006)
3. Hovy, E.: Automated text summarization. In: Mitkov, R. (ed.) The Oxford Handbook of
Computational Linguistics, pp. 583–598. Oxford University Press (2005)
4. Jones, K.S.: Automatic summarizing: factors and directions. In: Advances in Automatic Text
Summarization, pp. 1–12 (1999)
5. https://blog.frase.io/
6. Dongmei, A., Yuchao, Z., Dezheng, Z.: Automatic text summarization based on latent
semantic indexing. J. Artif. Life Robot. 15(1), 25–29 (2010)
7. Kunder, M.D.: The size of the world wide web. Online. Available. http://www.
worldwidewebsize.com. Accessed 15 Feb 2015
8. Chakma, R., et al.: Navigation and tracking of AGV in ware house via wireless sensor
network. In: 2019 IEEE 3rd International Electrical and Energy Conference (CIEEC), Beijing,
China, pp. 1686–1690 (2019). https://doi.org/10.1109/cieec47146.2019.cieec-2019589
9. Emon, I.S., Ahmed, S.S., Milu, S.A., Mahtab, S.S.: Sentiment analysis of bengali online
reviews written with english letter using machine learning approaches. In: Proceedings of the
6th International Conference on Networking, Systems and Security (NSysS ’19). Association
for Computing Machinery, New York, pp. 109–115 (2019). doi: https://doi.org/10.1145/
3362966.3362977
10. Ahmed, S.S., et al.: Opinion mining of Bengali review written with English character using
machine learning approaches. In: Bindhu V., Chen J., Tavares J. (eds.) International
Conference on Communication, Computing and Electronics Systems. Lecture Notes in
Electrical Engineering, vol. 637. Springer, Singapore (2020). https://doi.org/10.1007/978-
981-15-2612-1_5
11. Milu, S.A., et al.: Sentiment Analysis of Bengali reviews for data and knowledge engineering:
a Bengali language processing approach. In: Bindhu V., Chen J., Tavares J. (eds.)
International Conference on Communication, Computing and Electronics Systems. Lecture
Notes in Electrical Engineering, vol. 637. Springer, Singapore (2020). https://doi.org/10.1007/
978-981-15-2612-1_8
12. Munir, C., Ibrahim, K., Mofazzal, H.C.: Bangla VasarByakaran. Ideal publication, Dhaka
(2000)
13. Ferreira, R., de Souza Cabral, L., Freitas, F., Lins, R.D., de Frana Silva, G., Simske, S.J.,
Favaro, L.: A multi-document summarization system based on statistics and linguistic
treatment. Expert Syst. Appl. 41(13), 5780–5787 (2014)
14. Luhn, H.P.: The automatic creation of literature abstracts. IBM J. Res. Dev. 2(2), 159–165
(1958)
15. Foong, O.M., Oxley, A., Sulaiman, S.: Challenges and trends of automatic text summariza-
tion. Int. J. Inf. Telecommun. Technol. 1(1), 34–39 (2010)
16. Azmi, A.M., Al-Thanyyan, S.: A text summarizer for arabic. J. Comput. Speech Lang. 26(4),
260–273 (2012)
17. Karim, M.A., Kaykobad, M., Murshed, M.: Technical challenges and design issues in bangla
language processing. Published in the United States of America by Information Science
Reference (an imprint of IGI Global) (2013)
18. Islam, M.T., Masum, S.: Bhasa: a corpus based information retrieval and summarizer for
bengali text. In: Proceedings of the 7th International Conference on Computer and
Information Technology (2004)
19. Uddin, M.N., Khan, S.A.: A study on text summarization techniques and implement few of
them for bangla language. In: Proceedings of the 10th International Conference on Computer
and Information Technology (ICCIT-2012), pp. 1–4. IEEE (2007)
20. Sarkar, K.: Bengali text summarization by sentence extraction. In: Proceedings of International
Conference on Business and Information Management (ICBIM-2012), pp. 233–245. NIT
Durgapur (2012)
Location-Based Pomegranate Diseases
Prediction Using GPS
Abstract In India, agriculture is a most important and essential field. It plays a
major role in our economy and daily life. Different types of crops and fruits are
cultivated in our country. Pomegranate is one of the major commercial fruits grown
here, but it is prone to many diseases driven by uneven climatic conditions.
Weather forecasting technology using GPS is very important, effective and
beneficial for pomegranate farmers, helping them protect the plants from different
diseases and maintain their immunity. In this paper, we have designed a system that
predicts the weather of a given location in order to detect and forecast
pomegranate diseases and provide prevention tips. It sends the cultivator an alert
message on which he can base his decisions.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 375
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_40
376 R. N. Malage and M. B. Patil
that farm, and according to the weather situation we forecast and predict the
occurrence of diseases and the solution for each disease, which in turn prevents
the plant from being affected. Pomegranate cultivators can use the
system/application to get information on the daily weather and changes in
environmental conditions; the application mostly gives information about
weather-driven diseases and appropriate solutions for them. Generally, pomegranate
plants need a spray treatment based on humidity and temperature, so in our
application we have designed a system for forecasting diseases based on the
available weather information. Variation in weather conditions requires the spray
treatment shown in Fig. 2; disease outbreaks caused by a changeable climate can
affect the whole pomegranate plant, which would be a huge loss for the cultivator.
According to pomegranate researchers, the plant requires 16 months for healthy
growth, and if it is prone to any disease, it can be destroyed within a few days.
In pomegranate cultivation, exact information for predicting diseases is an
important issue. The pomegranate is prone to many diseases, such as bacterial
blight (Fig. 1) and Mars disease, which reduce the yield and its medical importance.

Fig. 1 Bacterial blight on pomegranate fruit
2 Literature Review
is required for enhancing the image. Lamani et al. [7] predicted plant diseases
from weather forecasting using data mining. Plant disease determination is both an
art and a science. Plant diseases are an essential problem that lowers the quantity
and reduces the quality of agricultural production. The proposed system uses a
segmentation technique, k-means clustering, together with deep neural network
learning to predict diseases of the orange plant from weather features.
3 Proposed System
3.1 Processing
In this phase, weather information such as humidity, temperature, wind speed,
probability of rain, and timing of sun rays is collected, since these are the major
parameters for forecasting diseases of the pomegranate plant.
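As a hedged sketch of how these parameters might drive an alert, the rule thresholds below are illustrative assumptions, not values from the paper; bacterial blight on pomegranate is favored by warm, humid conditions, so a simple rule flags high-risk days:

```python
def blight_risk(temperature_c, humidity_pct, rain_probability):
    """Return an alert level from the day's weather parameters (hypothetical rules)."""
    if humidity_pct > 80 and 25 <= temperature_c <= 35:
        return "HIGH: spray treatment recommended"
    if humidity_pct > 60 or rain_probability > 0.5:
        return "MODERATE: monitor the orchard"
    return "LOW"

print(blight_risk(30, 85, 0.7))  # warm and very humid → HIGH alert
print(blight_risk(22, 40, 0.1))  # cool and dry → LOW
```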
3.2 Segmentation
In this phase, the processed data are grouped together based on parameters, which
enables efficient calculation and analysis of the stored information, and it also takes
3.4 Preprocessing
In this phase, the data are normalized and the relevant data are selected for
processing; missing data are corrected so that unreliable data, noise and
irrelevant data are ignored during processing.
3.5 Testing
In general, testing means finding out how well something works: examining something
carefully to determine whether it is working properly. In this phase, testing of
the pomegranate predictions based on weather conditions is performed.
3.6 Classification
This is the phase of organizing things into groups based on their type: a
systematic arrangement into groups or categories according to established criteria.
3.7 Detection
Plant detection is the process of matching a specimen plant to a known taxon. The
ability to identify plants allows us to assess many important pasture variables
that are critical to management, such as range condition, proper stocking rate and
wildlife habitat quality.
Figure 6 shows the temperature graph; temperature is a primary factor affecting the
rate of plant development, and a rise in temperature may affect the plant and its
productivity. Temperature information helps the farmer protect the pomegranate
fruit from sunburn by covering the farm. Figure 7 shows the wind speed; this result
guides the farmer's daily spraying, so he can save money and spray. Wind direction
and velocity have a significant influence on crop growth. Figure 8 shows humidity,
the amount of moisture or water vapor in the air, which can also help predict
rainfall.
5 Conclusion
References
1. Pawara, S., Navalem, D., Patil, K., Mahajan, R.: Detection of pomegranate disease using
machine learning and internet of things. In: IEEE 3rd International Conference for Convergence
in Technology (I2CT) (2018)
2. Dubey, S.R., Jalal, A.S.: Detection and classification of tomato vegetable diseases using
complete local binary patterns. In: IEEE Third International Conference on Computer and
Communication Technology, vol. 3, pp. 247–251 (2012)
3. Islam, M., Dinh, A., Wahid, K., Bhowmik, P.: Detection of potato diseases using image segmen-
tation and multiclass support vector machine. In: IEEE 30th Canadian Conference on Electrical
and Computer Engineering (CCECE) (2017)
4. Bhange, M., Hingoliwala, H.A.: Pomegranate disease detection using image processing.
Procedia Comput. Sci. 280–288 (2015)
5. Dhakate, M., Ingole, A.: Diagnosis of pomegranate plant diseases using neural network. In:
Fifth National Conference on Computer Vision, Pattern Recognition, Image Processing and
Graphics (NCVPRIPG) (2015)
6. Gaikwad, D.S., Karande, K.J.: Image processing approach for grading and identification of
diseases on pomegranate fruit: an overview. Int. J. Comput. Sci. Inf. Technol. 7(2), 519–522
(2016)
7. Lamani, S.B., Ravikumar, K., Jamal, A.: Pomegranate fruits disease classification with fuzzy
c mean clustering. Int. J. Adv. Eng. Res. Dev. 5(2) (2018)
8. Kaur, K., Kaur, M.: Prediction of plant disease from weather forecasting using data mining.
Int. J. Future Revolution Comput. Sci. Commun. Eng. 4(4) (2018)
9. Sowmya, G.M., Chandan, V., Kin, S.: Disease detection in pomegranate leaf using image
processing technique. Int. J. Sci. Eng. Technol. Res. (IJSETR) 6(3) (2017)
10. Li, Q., Wang, M., Gu, W.: Computer vision based system for tomato surface defect detection.
Comput. Electron. Agric. 36, 215–223 (2002)
11. Mehl, P.M., Chao, K., Kim, M., Chen, Y.R.: Detection of defects on selected tomato cultivars
using hyperspectral and multispectral image analysis. Appl. Eng. Agric. 18, 219–226 (2002)
12. Wang, Y., Cui, Y., Huang, G.Q., Zhang, P., Chen, S.: Study on vegetable quality inspection
based on its surface color in produce logistics. In: International Conference on Manufacturing
Automation (2010)
13. Chaerle, L., Lenk, S., Hagenbeek, D., Buschmann, C., Straeten, D.V.D.: Multicolor fluores-
cence imaging for early detection of the hypersensitive reaction to tobacco mosaic virus. J.
Plant Physiol. 164(3), 253–262 (2007)
14. Singh, V., Varsha, A.K.: Detection of unhealthy region of plant leaves using image processing
and genetic algorithm. In: 2015 International Conference on Advances in Computer Engi-
neering and Applications (ICACEA) IMS Engineering College, Ghaziabad, India
15. Chaudhary, M., Chavan, R., Durgawali, S., Ghodeswar, A.: Smart agriculture: detection of
disease in plants using image processing. In: International Conference on Innovative and
Advanced Technologies in Engineering
16. Mithun, P., Aishwarya, K., Nikita, S., Aishwarya, G.: Android based application for fruit quality
analysis. Int. J. Innovative Res. Sci. Eng. Technol. 12(6) (2016)
17. Doddaraju, P., Kumar, P., Gunnaiah, R., Gowda, A.A., Lokesh, V., Pujer, P., Manjunatha, G.: Reliable and early diagnosis of bacterial blight in pomegranate caused by Xanthomonas axonopodis pv. punicae by sensitive PCR technique
18. Sharma, J., Sharma, K.K., Kumar, A., Mondal, K.K., Thalor, S., Maity, A., Gharate, R., Chinchur, S., Jadhav, V.T.: Pomegranate bacterial blight: symptomatology and rapid inoculation technique for Xanthomonas axonopodis pv. punicae. J. Plant Pathol.
19. Jain, K., Desai, N.: Pomegranate, the cash crop of India: a comprehensive review on agricultural practices and diseases. Int. Res. Health Sci. Res.
Medical Image Enhancement Technique Using Multiresolution Gabor Wavelet Transform
Abstract Medical images are used for the analysis and diagnosis of particular medical disorders or diseases. Hence, medical image enhancement is a necessary and challenging step for further processing through computer vision systems. It assists subsequent processing of medical images for segmentation, detection and prediction of diseases such as cancer, tumors and other disorders. Most medical images obtained from various sources are dark and noisy, which calls for an efficient image enhancement technique that preserves the content of the images. In this paper, an image enhancement technique based on the multiresolution Gabor wavelet transform is presented. The Gabor wavelet transform has demonstrated multiresolution capabilities with better texture enhancement, which helps improve the quality of medical images. Experiments on a public dataset reveal better performance in both qualitative and quantitative analysis. Experimental results on several low-illuminated medical images demonstrate the best results in terms of enhancement parameters and visual inspection. Finally, the obtained outcomes are compared with prominent methods published in the literature.
K. Moon (B)
Department of Electronics Engineering, Ramrao Adik Institute of Technology, Navi Mumbai,
India
e-mail: kapila.moon@gmail.com
A. Jetawat
Faculty of Engineering, Pacific Academy of Higher Education and Research University, Udaipur,
India
e-mail: drashokjetawat@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 385
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_41
386 K. Moon and A. Jetawat
1 Introduction
Medical images are mostly captured in low-light environments and are affected by noise and low contrast. Therefore, it is essential to apply image enhancement techniques to make them suitable for further processing through computer vision systems. These systems process medical images for segmentation, detection and prediction of diseases such as cancer, tumors and other disorders. During daytime or under better illumination, the image quality may be sufficient for its intended application, but during night and low-illumination circumstances, the image quality may be worse and affect the correct diagnosis of the disease [1–3]. Some images obtained in low illumination are depicted in Fig. 1. Many image enhancement techniques based on the spatial and transform domains have been proposed by researchers and developers. Most spatial domain techniques are based on histogram equalization for contrast enhancement and on adaptive mean filtering, which uses statistical methods and models for removing noise, whereas transform domain techniques apply frequency-domain tools such as the Fourier transform and the wavelet transform for image enhancement and contrast stretching. Nowadays, machine learning techniques are being explored to enhance the region of interest and for further processing of medical images.
A technique based on the multiresolution Gabor wavelet transform for medical image enhancement is presented in this paper. Our contribution can be summarized as follows: first, a fixed Gabor wavelet transform is applied; second, a variable Gabor wavelet transform is applied at several resolutions; finally, the results are summed to obtain a denoised, contrast-stretched image. The paper is organized as follows: the introduction is in Sect. 1, related work is discussed in Sect. 2, the presented methodology is explained in Sect. 3, experimental results are elaborated in Sect. 4 and the conclusion is discussed in Sect. 5.
2 Related Work
The Fourier transform is one of the best tools to obtain the frequency response of audio, image and video signals. However, the Fourier transform loses spatial localization information, which makes it inappropriate for image restoration and further processing, especially through convolutional neural networks (CNNs). The Gabor wavelet transform offers a better multiresolution approach that represents the texture of an image. We
apply Gabor filters to extract global features from the whole medical image. The 2-D Gabor function can be specified by the frequency of the sinusoid w and the standard deviations σ_x and σ_y of the Gaussian envelope, as shown in Eq. (1):

g(x, y) = \frac{1}{2\pi \sigma_x \sigma_y} \exp\left[ -\frac{1}{2}\left( \frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} \right) + 2\pi j w x \right]    (1)
Let g(x, y) be the mother Gabor wavelet; the filter coefficients can be obtained by appropriate dilations and rotations of g(x, y), as depicted in Eq. (2), where m specifies the scale and n the orientation of the wavelets; m and n are integers with m = 0, 1, 2, …, M − 1 and n = 0, 1, 2, …, N − 1. The integers M and N represent the total number of scales and orientations applied in the wavelet transform and can be represented as in Eqs. (3) and (4), where a > 1 and θ = 2π/N. Let I(x, y) be the gray level of an input medical image; the convolution of this image I with a Gabor kernel g_mn is given by Eq. (5):

G_{mn}(x, y) = \sum_{s} \sum_{t} I(x - s, y - t)\, g^{*}_{mn}(s, t)    (5)
where s and t are the filter mask size variables and g^{*}_{mn} is the complex conjugate of the Gabor function g_mn. Applying Gabor filters with different orientations and scales to the whole medical image yields an array of magnitudes. These magnitudes at different scales and orientations of the image are finally summed to obtain a denoised, contrast-stretched image.
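As an illustration, the filter-bank enhancement described above can be sketched in Python with NumPy: the kernel follows Eq. (1), the wavelet family is formed by rotating and dilating the coordinate frame of the mother wavelet, and the response magnitudes over all scales and orientations are summed as in Eq. (5). The scale factor a, the numbers of scales M and orientations N, the kernel size and the final 8-bit normalization are illustrative assumptions, not the authors' exact parameters.

```python
import numpy as np

def gabor_kernel(w, sigma_x, sigma_y, theta=0.0, scale=1.0, size=15):
    """Complex 2-D Gabor kernel following Eq. (1), with the coordinate frame
    rotated by theta and dilated by scale to form the wavelet family."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = scale * (x * np.cos(theta) + y * np.sin(theta))
    yr = scale * (-x * np.sin(theta) + y * np.cos(theta))
    envelope = np.exp(-0.5 * (xr**2 / sigma_x**2 + yr**2 / sigma_y**2))
    return envelope * np.exp(2j * np.pi * w * xr) / (2 * np.pi * sigma_x * sigma_y)

def conv2d_same(img, kernel):
    """'Same'-size 2-D convolution via FFT (Eq. (5) evaluated at every pixel)."""
    H, W = img.shape
    kh, kw = kernel.shape
    shape = (H + kh - 1, W + kw - 1)
    full = np.fft.ifft2(np.fft.fft2(img, s=shape) * np.fft.fft2(kernel, s=shape))
    return full[kh // 2:kh // 2 + H, kw // 2:kw // 2 + W]

def enhance(img, M=3, N=4, w=0.3, sigma_x=2.0, sigma_y=2.0, a=2.0):
    """Sum the Gabor response magnitudes over M scales and N orientations,
    then rescale the result to the 8-bit range (an assumed normalization)."""
    img = np.asarray(img, dtype=float)
    out = np.zeros_like(img)
    for m in range(M):                      # scales m = 0, ..., M-1
        for n in range(N):                  # orientations n = 0, ..., N-1
            g = gabor_kernel(w, sigma_x, sigma_y,
                             theta=2 * np.pi * n / N, scale=a**(-m))
            # convolve with the complex conjugate, as in Eq. (5)
            out += np.abs(conv2d_same(img, np.conj(g)))
    return out * (255.0 / out.max())
```

The FFT-based convolution is only a speed convenience; a direct double sum over s and t gives the same result.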
4 Experimental Results
Experimental results are obtained on an established public dataset [19]: a brain tumor dataset containing 3064 images from 233 patients with three kinds of brain tumor, namely meningioma (708 slices), glioma (1426 slices) and pituitary tumor (930 slices). Low-light images from this dataset are enhanced through the multiresolution Gabor wavelet transform. To appropriately quantify the results, three parameters were explored: peak signal-to-noise ratio (PSNR), mean absolute error (MAE) and image enhancement factor (IEF), defined in Eqs. (6), (8) and (9), respectively. All three metrics require a base image for evaluation.
Medical Image Enhancement Technique Using Multiresolution … 389
PSNR = 10 \log_{10} \frac{255^2}{\frac{1}{m n} \, se}    (6)

se = \sum_{x=1}^{m} \sum_{y=1}^{n} |I(x, y) - O(x, y)|^2    (7)

where I(x, y) and O(x, y) are the base input image and the output/restored image, respectively.

MAE = \frac{1}{m n} \sum_{x=1}^{m} \sum_{y=1}^{n} |I(x, y) - O(x, y)|    (8)

IEF = \frac{\sum_{x=1}^{m} \sum_{y=1}^{n} |I_n(x, y) - I(x, y)|^2}{\sum_{x=1}^{m} \sum_{y=1}^{n} |I(x, y) - O(x, y)|^2}    (9)

where I_n(x, y) is the noisy input image.
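The three metrics follow directly from their definitions. The sketch below assumes 8-bit images (peak value 255) and normalizes MAE by the pixel count m·n, as the name "mean" implies:

```python
import numpy as np

def mae(base, restored):
    """Mean absolute error between base image I and restored image O (Eq. 8)."""
    base, restored = np.asarray(base, float), np.asarray(restored, float)
    return np.abs(base - restored).mean()

def psnr(base, restored):
    """Peak signal-to-noise ratio for 8-bit images (Eqs. 6 and 7)."""
    mse = ((np.asarray(base, float) - np.asarray(restored, float)) ** 2).mean()
    return float('inf') if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

def ief(noisy, base, restored):
    """Image enhancement factor (Eq. 9): noise energy before enhancement
    divided by residual error energy after enhancement."""
    noisy, base, restored = (np.asarray(a, float) for a in (noisy, base, restored))
    return ((noisy - base) ** 2).sum() / ((base - restored) ** 2).sum()
```

An IEF greater than 1 indicates that enhancement reduced the error energy relative to the noisy input.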
5 Conclusion
The presented multiresolution Gabor wavelet transform achieved superior enhancement as compared with DWT, DFT and histogram equalization, especially for low-illuminated images. Thus, it is preferable to apply the multiresolution Gabor wavelet transform for preprocessing of medical images, which assists further diagnosis and analysis.
Fig. 3 Experimental results obtained on the public dataset through multiresolution Gabor wavelet transform (our work): input image captured under low illumination and output image (our work)
References
1. Kadir, T., Gleeson, F.: Lung cancer prediction using machine learning and advanced imaging
techniques. Transl. Lung Cancer Res. 7(3), 304–312 (2018)
2. Makaju, S., Prasad, P.W.C., Alsadoon, A., Singh, A.K., Elchouemi, A.: Lung cancer detection
using CT scan images. Procedia Comput. Sci. 125, 107–114 (2018)
3. Zhang, G., Jiang, S., Yang, Z., Gong, L., Ma, X., Zhou, Z., Bao, C., Liu, Q.: Automatic nodule
detection for lung cancer in CT images: a review. Comput. Biol. Med. 103, 287–300 (2018)
4. Zhang, J., Xia, Y., Cuia, H., Zhang, Y.: Pulmonary nodule detection in medical images: a survey.
Biomed. Signal Process. Control 43, 138–147 (2018)
5. Uzelaltinbulat, S., Ugurb, B.: Lung tumor segmentation algorithm. Procedia Comput. Sci. 120,
140–147 (2017)
6. Nithila, E.E., Kumar, S.S.: Segmentation of lung nodule in CT data using active contour model
and Fuzzy C-mean clustering. Alexandria Eng. J. 55, 2583–2588 (2016)
7. Abdullah-Al-Wadud, M., Kabir, M.H., Dewan, M.A.A., Chae, O.: A dynamic histogram equal-
ization for image contrast enhancement. IEEE Trans. Consum. Electron. 53(2), 593–600
(2007)
8. Jobson, D.J., Rahman, Z.-U., Woodell, G.A.: Properties and performance of a center/surround
retinex. IEEE Trans. Image Process. 6(3), 451–462 (1997)
9. Rahman, Z., Jobson, D.J., Woodell, G.A.: Multi-scale retinex for color image enhancement.
In: Proceedings of 3rd IEEE International Conference on Image Processing, pp. 1003–1006
(1996)
10. Jobson, D.J., Rahman, Z.-U., Woodell, G.A.: A multiscale retinex for bridging the gap between
color images and the human observation of scenes. IEEE Trans. Image Process. 6(7), 965–976
(1997)
11. Sharma, D., Jindal, G.: Identifying lung cancer using image processing techniques. In:
International Conference on Computational Techniques and Artificial Intelligence (ICCTAI),
pp. 115–120 (2011)
12. Chaudhary, A., Singh, S.S.: Lung cancer detection on CT images by using image processing.
In: IEEE International Conference on Computing Sciences (ICCS), pp. 142–146 (2012)
13. Gupta, A., et al.: Methods for increased sensitivity and scope in automatic segmentation and
detection of lung nodules in CT image. In: IEEE International Symposium on Signal Processing
and Information Technology (ISSPIT), pp. 375–380 (2015)
14. Shen, L., Yue, Z., Feng, F., Chen, Q., Liu, S., Ma, J.: MSRnet: low-light image enhancement
using deep convolutional network. Available https://arxiv.org/abs/1711.02488 (2017)
15. Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional
networks. IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 295–307 (2016)
16. Jakimovski, G., Davcev, D.: Using double convolution neural network for lung cancer stage
detection. Appl. Sci. 9(427), 1–12 (2019)
17. Chapaliuk, B., Zaychenko, Y.: Deep learning approach in computer-aided detection system for
lung cancer. In: IEEE International Conference on System Analysis and Intelligent Computing
(SAIC), Ukraine (2018)
18. Li, Z., Li, L.: A novel method for lung masses detection and location based on deep learning. In:
IEEE International Conference on Bioinformatics and Biomedicine (BIBM), America (2017)
19. https://figshare.com/articles/brain_tumor_dataset/1512427/5
HOMER-Based DES for Techno-Economic Optimization of Grid
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 393
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_42
394 R. Raja Kishore et al.
1 Introduction
2 HOMER Software
HOMER Pro is microgrid design software, and HOMER Energy is the worldwide standard for optimizing microgrid design in all sectors, from village power and island utilities to grid-connected campuses and military bases [3]. It was initially created at the National Renewable Energy Laboratory and is now improved and distributed by HOMER Energy. The Hybrid Optimization Model for Multiple Energy Resources (HOMER) nests three controlling tools in one software product, so that the engineering and economic sides work side by side, as shown in Fig. 1.
HOMER-Based DES for Techno-Economic Optimization of Grid 395
2.1 Simulation
At its core, HOMER is a simulation model. It attempts to simulate a viable system for all possible combinations of the equipment that you wish to consider. Depending on how you structure your problem, HOMER can simulate hundreds or even thousands of systems. HOMER simulates the operation of a hybrid microgrid for an entire year, in time steps from one minute to 60 min.
2.2 Optimization
HOMER examines every possible combination of system types in a single run and then ranks the systems according to the optimization variable of interest. HOMER Pro features a unique optimization process that considerably simplifies the task of identifying least-cost options for microgrids and other distributed generation electrical power systems. The HOMER Optimizer is a patented "derivative-free" optimization process designed specifically to work in HOMER, as shown in Fig. 2.
3 Case Study
These case studies were developed to test the ability of the multilevel optimization
method to analyse remote communities with different climate conditions. For this, we
The solar resource input data for HOMER consists of the monthly averaged daily insolation incident on a horizontal surface (kWh/m²/day) from the NASA Surface Meteorology and Solar Energy (SSE) website. NASA gives monthly averaged values from 10 years of data. Due to the close distance, the location data of the city of Hyderabad is used as the location data of the college, Marri Laxman Reddy Institute of Technology and Management, in this study. The following location data is used to find the solar radiation data:
The proposed energy system is expected to meet the electricity demand of the community, which also includes classrooms. The renewable energy sources considered here are mainly solar and wind; due to the unstable nature of renewable energy, a battery bank is employed as the storage system. In this configuration, a two-way (bidirectional) converter is inserted. It converts AC power to DC for charging the battery and supplies AC power back from the battery to the AC loads of the consumers. Since AC power is required by all the consumers, part of the input values into the software are given according to size and quantity. The other components are the solar PV array and the converter; these two also vary in size. The simulated model of the hybrid architecture considered in this paper is presented in Fig. 5.
Immediately after selecting the component technologies from the HOMER library, the electrical load is entered into the modeling tool. The primary load input is entered on a 24-h basis, and from this the software models a peak load. It additionally synthesizes the monthly load from the 24-h input data. This paper describes a primary power load and its data inputs: HOMER generates a weekend load and loads for August, January and the remaining months from the 24-h input. The generated 24-h load information portrays the diurnal variation of the primary load profile of the college. Figure 5 presents the primary load demands and shows that the load profile changes during the day. The load drops to zero from midnight to 6:00 AM. Demand rises from 6:00 to 9:00 AM. Around lunch time, i.e., from 12:00 PM to 2:00 PM, there is greater demand for power. There is also greater demand around dinner time; the peak hours are from 6:00 PM to midnight. This clearly demonstrates that electricity is consumed mostly for lighting purposes.
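The diurnal profile described above can be sketched as follows. The hourly kilowatt values and the ±10% day-to-day variability are illustrative assumptions, not the actual measured load of the college; HOMER's own load synthesis works in a similar spirit from the 24-h input:

```python
import random

# Illustrative 24-h primary load profile (kW), following the pattern in the text:
# zero from midnight to 6:00, a morning rise, a lunch-time peak, an evening peak.
BASE_PROFILE_KW = (
    [0.0] * 6 +                  # 00:00-06:00, load is zero
    [5.0, 10.0, 15.0] +          # 06:00-09:00, demand rises
    [12.0, 12.0, 12.0] +         # 09:00-12:00
    [20.0, 20.0] +               # 12:00-14:00, lunch-time peak
    [12.0, 12.0, 12.0, 12.0] +   # 14:00-18:00
    [25.0] * 6                   # 18:00-24:00, evening peak hours
)

def synthesize_day(profile=BASE_PROFILE_KW, variability=0.1, seed=None):
    """Apply random day-to-day variability to the hourly profile."""
    rng = random.Random(seed)
    return [p * (1.0 + rng.uniform(-variability, variability)) for p in profile]

def peak_load(days):
    """Peak load over a set of synthesized days (HOMER derives this from the 24-h input)."""
    return max(max(day) for day in days)
```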
The chief purpose of this work is to investigate the best power system configuration that meets the requirements with minimum NPC and COE; these are the basic criteria for the selection of the power system components in this study. The cost of equipment was estimated on the basis of current market prices.
Initial capital cost:
The total installed cost of purchasing and installing the components at the beginning of the project.
O&M cost:
The cost of maintaining and operating the system. All components of this scheme are considered in this project under variable operation and maintenance cost. Miscellaneous O&M costs mentioned by HOMER are emission damages, capacity shortage penalty and fixed operation and maintenance cost.
Replacement cost:
Worn-out components must be replaced at the end of their lifetime. The replacement cost differs from the initial capital cost because not all parts of a component need to be replaced at the end of its life cycle, and costs borne by donors may offset or reduce the starting cost. However, travel costs may not be considered as part of this cost.
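The NPC and COE criteria can be computed from the three cost categories above using HOMER's standard economics: capital is annualized through the capital recovery factor (CRF), the total annualized cost divided by the CRF gives the NPC, and the annualized cost divided by the annual energy served gives the COE. A minimal sketch, where the discount rate, lifetime and cost inputs are assumed values:

```python
def crf(i, n):
    """Capital recovery factor for annual real discount rate i over n years."""
    return i * (1 + i) ** n / ((1 + i) ** n - 1)

def annualized_cost(capital, om_per_year, replacement_per_year, i, n):
    """Total annualized cost: annualized capital plus yearly O&M and replacement."""
    return capital * crf(i, n) + om_per_year + replacement_per_year

def total_npc(annualized, i, n):
    """Total net present cost: total annualized cost divided by the CRF."""
    return annualized / crf(i, n)

def coe(annualized, energy_served_kwh):
    """Levelized cost of energy: annualized cost per kWh of primary load served."""
    return annualized / energy_served_kwh
```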
After considering different products with regard to cost, the panel was chosen from the stated company because of its low cost; it is expected to give efficient service for a considerably long time. We considered a 50 kW solar array made up of 250 W panels delivered by the Generic PV Company. The panel, known as Generic Flat Plate PV, is built with mono-crystalline silicon, has an efficiency of 20.4% and a price ranging from $1.16 to $1.31/W. The installation cost is taken as 60% of the PV price. The operation and maintenance cost is expected to be 1% per year; other details are found in Table 2.
Optimization results are presented in overall and categorized form, showing the most workable power system structures suitable for the load; the feasible solutions appear in increasing order of net present cost. The categorized table gives the least-cost configuration from each type of unit setup, while the overall optimization results present all feasible system combinations ranked by NPC. Net present cost and cost of energy were the basic criteria for selecting the power systems. Parameters such as low excess electricity generation, low capacity shortage and high renewable fraction are used for illustration of the power generation schemes in order to test their technical feasibility. Optimization results for a selected hybrid power system are shown in Fig. 6.
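The two result views described above, an overall ranking by NPC and a categorized table of the least-cost configuration per system type, can be sketched as follows; the candidate dictionaries and field names are illustrative assumptions, not HOMER's internal data structures:

```python
def rank_by_npc(candidates):
    """Overall results: sort feasible configurations in increasing order of NPC."""
    return sorted(candidates, key=lambda c: c["npc"])

def categorized(candidates):
    """Categorized results: the least-cost configuration for each system type."""
    best = {}
    for c in candidates:
        t = c["type"]
        if t not in best or c["npc"] < best[t]["npc"]:
            best[t] = c
    return best
```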
7 Conclusion
Under the conditions considered in this study, the relatively low NPC of the system depends strongly on the price at which power can be sold to the grid. Since selling electricity plays an important role in the system's economics, such an agreement would make the energy system much more economically viable for the college, which could supply surplus power to the grid, reduce CO2 emissions and contribute to increased renewable energy use and improved availability of power supply.
References
1. Vendoti, S., Muralidhar, M., Kiranmayi, R.: Optimization of hybrid renewable energy systems
for sustainable and economical power supply at SVCET Chittoor. i-manager’s J. Power Syst.
Eng. 1(1), 26–34 (2017)
2. Boqtob, O., El Moussaoui, H.: Optimal sizing of grid connected micro grid in Morocco using
Homer Pro. In: IEEE Conference Proceedings (2019)
3. Vendoti, S., Muralidhar, M., Kiranmayi, R.: HOMER based optimization of solar-wind-diesel hybrid system for electrification in a rural village. In: IEEE Xplore Digital Library, pp. 1–6 (2018)
4. Vendoti, S., Muralidhar, M., Kiranmayi, R.: Techno-economic analysis of off-grid
solar/wind/biogas/biomass/fuelcell/battery based system for electrification in a cluster of villages
by HOMER software. Environ. Dev. Sustain. (2020)
5. Fernando, W., Gupta, N., Kamya, G., Ozveren Suheyl, C.: Feasibility study of small scale battery
storage systems integrated with renewable generation technologies for Sri Lankan domestic
applications. IEEE Conference Proceedings (2019)
6. Khasawneh, H.J., Mustafa, M.B., Al-Salaymeh, A., Saidan, M.: Techno-economic evaluation
of on-grid battery energy storage system in Jordan using Homer Pro. AEIT (2019)
7. Marais, S., Kusakana, K., Koko, S.P.: Techno-economic feasibility analysis of a grid-interactive
solar PV system for South African residential. In: 2019 Proceedings of the 27th Domestic Use
of Energy Conference, pp. 163–168 (2019)
Odor and Air Quality Detection and Mapping in a Dynamic Environment
R. Srinath
SenZopt Technologies, Bengaluru, India
e-mail: raghu@senzopt.com
J. Vrindavanam (B) · R. R. Budyal · Y. R. Sumukh · L. Yashaswini · S. S. Chegaraddi
Department of ECE, Nitte Meenakshi Institute of Technology, Bengaluru, India
e-mail: jayavrinda.v@nmit.ac.in
R. R. Budyal
e-mail: rahulrbud99@gmail.com
Y. R. Sumukh
e-mail: yrsumukh@gmail.com
L. Yashaswini
e-mail: yashaswina428@gmail.com
S. S. Chegaraddi
e-mail: sangeethaschegaraddi@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 403
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_43
404 R. Srinath et al.
1 Introduction
The waste disposal management systems of cities around the world have been facing challenges on account of ever-increasing urbanization and the concomitant rising volume of waste generated and littering. Public waste bins are filling up faster than ever, and inevitably many bins overflow before collection, causing not only bad odors and garbage spillage near the dumping area but also health hazards and environmental pollution, as overflowing and uncollected waste bins are a perfect location for the growth of bacteria, insects and vermin. Flies that feed on the rubbish can spread diseases like typhoid, gastroenteritis and other major illnesses. The smell of overflowing garbage causes various respiratory diseases and a fall in air quality, with adverse health effects as disease-causing pathogens enter the human body through breathing and contact. Though there can be quite a few substances in the waste, in general the air gets polluted with gases like carbon dioxide, nitrous oxide, ammonia and methane. In daily life, we identify polluted air by smelling odors, which are usually caused by the decomposition of biodegradable items. To ensure timely collection of food and other perishable wastes at risk of degeneration, municipal authorities in certain localities have introduced static sensors in areas like markets and other specific locations. Since wastes are generated at wider locations within cities, timely collection of biodegradable wastes assumes importance, as delays can pollute the air and the nearby environment. Instead of installing a few sensors at specific locations, this paper proposes placing sensors on moving objects like vehicles and tracking all sorts of odors so that city-wide odors can be sensed and appropriate action can be initiated. When the sensors are fitted to a large number of vehicles, inputs on air odor can be obtained from multiple locations, which is found to be a better and more effective approach that can support a cleaner environment.
The odor sensing device proposed in this paper continuously detects, measures and monitors odorous gaseous contaminants. The solution incorporates Odor Atmospheric Dispersion Modeling (OADM) for predicting the odor impact on the surrounding area depending on meteorological conditions. With the help of meteorological data, the odor sensing device can trace the odorant dispersion plume driven by conditions like wind speed and wind direction. The device uses LoRa, a low-power wide-area network (LPWAN) technology, which is one of the most cost-effective approaches in such conditions. The odor sensor is implemented using chemical sensors (MQ-2, MQ-3, MQ-9, MQ-135, etc.) and air quality sensors. Whenever the threshold of a chemical sensor is reached, the sensor data is sent to the LoRa gateway together with the location (longitude and latitude) of the vehicle. A LoRa gateway is placed within every 3–5 km radius. The LoRa receiver receives this data and sends it to the cloud, from which the municipality can take action to clean that area. All this data is sent to the municipal corporation for the upkeep of cleanliness and keeping the environment clean.
Odor and Air Quality Detection and Mapping … 405
In the present scenario, in most countries, the sensors introduced by municipal authorities are static and accordingly placed only in a few select locations. Given the increased attention to cleanliness among cities, for example through the Swachh Bharat Abhiyan (Clean India Mission) and the ranking of cities based on cleanliness in India, the proposed system can support the relevant authorities in ensuring better living conditions by reducing hazardous odors and enabling timely waste collection: a moving sensor that can detect foul smells in any part of a city.
Section 1 above is the introduction, and Sect. 2 presents the literature review. The proposed system is discussed in Sect. 3. Results are analyzed in Sect. 4, and Sect. 5 is the conclusion.
2 Literature Review
In 2004, a study introduced a detection instrument to identify odor pollution in the environment [1]. Keeping in view the criticality of waste management for public health, a number of studies were thereafter carried out on the organic components and chemical compositions present in odors. A measurement method was introduced for odor emission capacity to describe the number of odorants present. Odor compounds can also be recognized by means of detection instruments like gas chromatography. Yet another method is the E-nose, an instrument developed to approximate the biological olfaction system; it comprises electronic chemical sensors with partial specificity and an appropriate pattern recognition system capable of recognizing simple or complex odors. Another study [2] on odor detection methods, such as olfactometry and chemical sensors, examined the state of both human and instrumental sensing currently used for the detection of odors. The olfactometric techniques employing a panel of trained experts were discussed, and the strong and weak points of odor assessment through human detection were highlighted. Further, the paper discussed the merits and demerits of instrumental sensory methods and chemical sensors. The limitation of dynamic olfactometry is that it provides point odor concentration data, which is not sufficient for a full evaluation. There are also studies that have attempted comparisons and integrations between olfactometry and the E-nose, and their outcomes were listed. Another paper argued that using more than one approach is required for a better understanding of olfactory nuisance cases. Monitoring household garbage odors in urban areas through distribution maps was proposed in [3], which introduced a bicycle-mounted e-nose with sensors such as the MQ series (2, 9) and TGS series (2620, 2602a, 2602b). The limitation is the use of a bicycle for monitoring the waste; furthermore, the bicycle is equipped with expensive devices such as a laptop, a GPS module and the e-nose. Other studies carried out by Deepak and Neeta [4] surveyed odor detection and sensors in 2017. The paper reviewed various odor detection systems and sensors that can be employed in the real world for detection, identification and classification of the various odors
present in the air. Surface acoustic wave sensors were created to detect multiple volatile organic compounds. Metal oxide sensors also possess the capability of detecting volatile compounds and molecules within a range. New sensors mimicking the biological olfactory system, called biosensors, were introduced, along with gas chromatography systems for the detection of VOCs. These various odor detection sensors give in-depth knowledge about the aspects of odor used during detection and classification, and also increase the efficiency of odor detectors. Dabholkar et al. [5] proposed a method for illegal dumping detection. The authors demonstrated that a deep learning approach can facilitate automatic detection of illegal dumping, exploring multiple design trade-offs to achieve better accuracy with a smaller memory footprint. Okokpujie [6] and others introduced the idea of a smart air pollution monitoring system to continuously track air quality, with the measured air quality indicators displayed on a screen. The authors developed a platform named "ThingSpeak", where the collected indicators were displayed. The purpose of the system was to enhance public awareness of air quality; the monitoring device was capable of delivering real-time measurements of air quality.
A smart air quality monitoring system with LoRa (Long Range) WAN was proposed by Thu [7] and others in 2018. The system, described in the paper as end-to-end, was implemented in Yangon, the business capital of Myanmar. According to the paper, the system allowed users to access an online dashboard to monitor the real-time status of the air quality and also to retrieve past data by themselves. The system additionally supports adding sensor nodes and gateways should the implementation team decide to extend the monitored area. Yet another intervention is the IoT-based E-tracking system [8] that enables monitoring of garbage. The paper observed that the proposed application is economical as well as a long-range automated garbage monitoring system. The system is capable of generating real-time data and analysis, supported through a web portal and an Android application. Through the Android application, the system sends a notification to the garbage collector about the garbage level along with its ID and location using a GPS module; this feature also supports route optimization. In addition, the proposed system makes use of a machine learning model to predict the impact of air pollution based on the air quality parameters collected over a given period of time. This enables proactive management and deployment of resources and also contributes to enhanced efficiency in terms of time and cost. Unlike the above studies, the present implementation is novel and cost-effective, as it uses vehicles as a platform for fixing the sensors, and online connectivity is achieved through LoRa.
3 Proposed System
The proposed system uses the MQ-2 gas sensor, which is highly sensitive to LPG,
i-butane, propane, methane, alcohol, hydrogen, and smoke, and the MQ-135 gas sensor, which is
sensitive to ammonia, sulfide, and benzene vapor as well as to smoke and other
harmful gases. The Heltec ESP32 module is an integrated system consisting of an ESP32,
LoRa, and an OLED display. Initially, the device senses the odor values using the
MQ-series sensors placed within the device. The smell is classified as bad or good
based on the threshold being set. If the detected smell is bad, the sensor sends
the measured value over LoRa and also fetches the location of the vehicle.
LoRa, as already explained in the literature, is a wireless technology that provides
long-range, low-power, and secure data transmission for M2M and Internet of Things
(IoT) applications. LoRa is based on chirp spread spectrum modulation, which
has low-power characteristics like Frequency Shift Keying (FSK) modulation but can
be used for long-range communication. The sensor sends the sensor data together
with the location of the vehicle (latitude and longitude) to the LoRa gateway, provided
that the threshold value is reached. The location is fetched using the GPS module.
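The node-side decision logic described above, sending the readings and GPS coordinates only when a threshold is crossed, can be sketched as follows. The function name, payload field names, and threshold value are illustrative assumptions, not taken from the paper:

```python
# Hypothetical sketch of the node's decision logic; the threshold and
# payload layout are assumptions for illustration only.
ODOR_THRESHOLD = 400  # assumed raw sensor reading above which smell is "bad"

def build_packet(mq2_value, mq135_value, latitude, longitude,
                 threshold=ODOR_THRESHOLD):
    """Return a payload for the LoRa gateway if either sensor exceeds
    the threshold; otherwise return None (nothing is transmitted)."""
    if mq2_value > threshold or mq135_value > threshold:
        return {
            "data1": mq2_value,    # MQ-2 reading
            "data2": mq135_value,  # MQ-135 reading
            "lat": latitude,       # from the GPS module
            "lon": longitude,
        }
    return None
```

On the real device this check would run on the ESP32 before handing the payload to the LoRa radio.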
The LoRa receiver receives the information and sends it to the cloud, from where
everyone can access the map. LoRa is a low-power module and can be
connected to the vehicle battery using voltage regulators. The block diagram of the
proposed device is shown in Fig. 1.
Advantages of the proposed model are as follows:
• The quality of the air can be measured.
• It prevents the spreading of disease by detecting the foul smell of dead animals.
• Municipal authorities will get the locations to be cleaned.
Fig. 1 Block diagram of the proposed system. Firebase—to fetch database, MapBox—for maps,
PyQt5—for creating GUI, MQ series sensors—odor detection, LoRa—to receive and send the data,
ESP32—microcontroller
408 R. Srinath et al.
In the proposed system, as shown in Fig. 3, the sensor is embedded on vehicles plying
in the city, supporting city-wide tracking of odor across multiple locations. This
ensures that considerable data points are gathered, and the municipal authorities
can initiate actions depending upon the intensity of the odor.
The GUI shown in Fig. 4 is a representative screen which provides the data where
garbage is detected by a vehicle. The data provided are latitudinal and
longitudinal coordinates that enable us to spot the location of the garbage. Data 1 and
data 2 correspond to the values of the MQ sensors, i.e., MQ-2 and MQ-135, respectively.
If the sensor data exceed the threshold value, a foul smell is indicated.
Live data are highlighted on the map whenever the sensor values surpass the
threshold, as shown in Fig. 5. The bar graph on the right indicates the intensity
of the foul smell, depicted with different colors. The foul smell with the highest
intensity is represented in red, and the lowest-intensity values in blue.
The color code enables the municipal authorities to give priority to the red hotspot
areas over the remaining colors.
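The color coding can be sketched as a simple mapping from odor intensity to a hotspot color; the cut points below are assumptions for illustration, not values from the paper:

```python
# Illustrative intensity-to-color mapping; the cut points are assumed.
def hotspot_color(intensity, low=300, high=700):
    """Map a sensor intensity to a map color: red for the highest
    readings (top cleanup priority), blue for the lowest."""
    if intensity >= high:
        return "red"     # highest priority for municipal cleanup
    if intensity <= low:
        return "blue"    # lowest priority
    return "yellow"      # intermediate intensity
```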
5 Conclusion
This paper has introduced a vehicle-mounted odor detection and tracking system
supported by a network and a GUI. The results on the map identify the areas
where the odor intensity is higher, and accordingly a waste collection plan
can be initiated. Further, the system can also be used as a base indicator for
placing right-sized bins, keeping in view the volumes of waste generated or
repeated triggers.
Fig. 5 Snapshot of the map showing the locations to be cleaned, as detected from the vehicles
References
1. Yuwono, A., Lammers, P.S.: Odor pollution in the environment and the detection instrumentation.
Agric. Eng. Int. CIGR J. Sci. Res. Dev. 6 (2004)
2. Brattoli, M., Gennaro, G., Pinto, V., Loiotile, A.D., Lovascio, S., Michele, P.: Odor detection
methods: olfactometry and chemical sensors. Proc. J. Sens. 11(5), 5290–5322 (2011)
3. Monroy, G., Gonzalez, J.J., Sanchez-Garrido, C.: Monitoring Household Garbage Odors in
Urban Areas Through Distribution Maps. Department of System Engineering and Automation
IEEE Sensors. Valencia, November (2014)
4. Aeloor, D., Patil, N.: A Survey on Odor Detection and Sensors. Department of Computer
Engineering (2011)
5. Dabholkar, A., Muthiyan, B., Shilpa, S., Swetha, R., Jeon, H., Gao, J.: Smart illegal dumping
detection. In: 2017 IEEE Third International Conference on Big Data Computing Service and
Applications (2017)
6. Okokpujie, K., Noma-Osaghae, E., Modupe, O., John, S., Oluwatosin, O.: Smart air pollution
monitoring system. Int. J. Civil Eng. Technol. (IJCIET) 9(9), 799–809 (2018). ISSN: 0976-6308
and ISSN: 0976-6316
7. Thu, M.Y., Htun, W., Aung, Y.L., Shwe, P., Tun, N.M.: Smart Air Quality Monitoring System
With LoRaWAN (2018)
8. Gokhale, M., Chaudhari, P., Jadhav, N., Wagh, R., Smita, K.: IOT based E-tracking system for
waste management. In: The IEEE International Conference on Internet of Things and Intelligence
System (2018)
A Comparative Study
on the Performance of Bio-inspired
Algorithms on Benchmarking
and Real-World Optimization Problems
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 411
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_44
412 E. Lakshmi Priya et al.
2 Related Works
A comparative study of results of five algorithms: GA, PSO, artificial bee colony
(ABC) algorithm, invasive weed optimization (IWO) algorithm, and artificial immune
(AI) algorithm to solve some standard benchmark multivariable functions was
presented in [1]. The comparison of ant colony optimization (ACO) and PSO on
optimizing the membership functions of a fuzzy logic controller was presented in
[2]. Another comprehensive review and comparative study of bio-inspired algorithms was
presented in [3]. A comprehensive review of the significant bio-inspired algorithms
that are popularly applied in sentiment analysis is presented in [4]. A comparative
study of four bio-inspired algorithms (GA, PSO, DBDE, and BSO) in finding optimal
energy saving pattern for an intelligent Internet of things-based system was presented
in [5].
The authors of [6] proposed a hybrid bio-inspired algorithm for load balancing
and scheduling among the cloudlets. The proposed algorithm was compared with
firefly algorithm, ACO, and ABC. Aswanth et al. [7] presented a comparison of the firefly
algorithm, symbiotic organism search algorithm, harmony search algorithm, and the
k-means algorithms for clustering the sensor nodes in wireless sensor networks.
A similar study comparing the algorithms for their performance in solving the
autonomous landing problem of unmanned aerial vehicles was presented in [8].
Following the above-mentioned trend of research, this paper proposes to compare
the performance of GA [9], PSO [10] and SA [11] on a set of four benchmarking
problems and a medical image segmentation problem.
A Comparative Study on the Performance … 413
The experimental study was performed in two phases. The Phase I compared
the performance of the algorithms on four benchmarking functions chosen from
CEC2005 [12]. The Phase II used the algorithms to solve a medical image
segmentation problem and compared their performance.
Phase I—The GA, PSO and SA were used to solve the benchmark functions
with varying dimensions. The dimensions (d) used for this study were (d = 2, d
= 5 and d = 10). The benchmark functions [12] taken for this phase were Ackley,
Rastrigin, Griewank, and Bent Cigar. These functions differ from each other
in their basic properties. The performance metrics measured (solution obtained (So),
number of generations (NoG), and the execution time (ExeTime)) for GA, PSO and
SA on solving the above benchmarking functions are presented in Table 1. As shown
in Table 1, for the Ackley function, PSO gives the best solutions at all dimensions. GA
takes less execution time as the dimension increases, while PSO takes more.
SA takes more generations to solve the function. It was also
noticed that the performance of the algorithms decays as the dimension increases. For
the Rastrigin function, PSO gives the best solutions at lower dimensions and SA gives
the best solutions at higher dimensions. The execution time taken by GA was less than that of the other
algorithms at higher dimensions. PSO takes more execution time as the dimension
increases. The results for the Griewank function show that PSO consistently performs
best across all dimensions in producing the best solutions. The ExeTime of GA was
less than that of the other algorithms for higher dimensions. For the Bent Cigar function, for d
= 2 and d = 5, PSO outperformed the others in So. At d = 10, SA gave the best So.
For d = 2, the ExeTime was also less for PSO. However, for higher dimensions, GA took
less ExeTime compared to other algorithms.
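For reference, the four benchmark functions can be sketched in their standard textbook forms; the CEC suite additionally applies shifts and rotations, which are omitted in this illustration:

```python
import math

# Standard (unshifted, unrotated) forms of the four benchmark functions.
def ackley(x):
    d = len(x)
    s1 = sum(xi * xi for xi in x) / d
    s2 = sum(math.cos(2 * math.pi * xi) for xi in x) / d
    return -20 * math.exp(-0.2 * math.sqrt(s1)) - math.exp(s2) + 20 + math.e

def rastrigin(x):
    return sum(xi * xi - 10 * math.cos(2 * math.pi * xi) + 10 for xi in x)

def griewank(x):
    s = sum(xi * xi for xi in x) / 4000
    p = 1.0
    for i, xi in enumerate(x, start=1):
        p *= math.cos(xi / math.sqrt(i))
    return s - p + 1

def bent_cigar(x):
    # Ill-conditioned: all but the first coordinate are scaled by 1e6.
    return x[0] * x[0] + 1e6 * sum(xi * xi for xi in x[1:])
```

All four attain their minimum value 0 at the origin in these unshifted forms.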
For the unimodal functions (Griewank and Bent Cigar), PSO gave the best So
consistently in all dimensions. GA was able to obtain its solutions faster, i.e., with
less execution time. SA could produce good results in a few higher-dimensional cases,
however, with a larger number of generations. For the multimodal functions (Ackley and
Rastrigin), PSO was found superior in producing the best results
across dimensions. GA was found to take less execution time at higher dimensions,
and PSO took more execution time as the dimension increased. SA was found to
take more generations to solve the multimodal functions as well.
The Phase I of the comparative study revealed that PSO is good at solving all
the functions irrespective of the problem characteristics. However, PSO takes more
execution time as the dimension increases, while GA takes less time as the dimension increases.
SA gives better results at higher dimensions (but not consistently), and it takes more
generations to solve the benchmarking functions.
Phase II—The Phase II of the comparative study solves a medical image
segmentation problem using GA, PSO, and SA and compares their performance. Image
segmentation is the process of partitioning an image into multiple segments and
is primarily used for locating objects and boundaries in an image.
Medical image segmentation plays an essential role in computer-aided diagnosis
systems which are used in various applications such as microscopy, X-rays, and MRI
scans. Image segmentation is considered one of the most essential medical imaging
processes as it extracts the region of interest. The input image taken for segmentation
is depicted in Fig. 1. The steps followed for GA, PSO, and SA to solve the image
segmentation problems are described below. The output images obtained using GA,
PSO and SA are shown in Fig. 1.
The image is divided into subimages, and GA is applied to each subimage starting with
a random initial population. Each individual is evaluated using an arbitrary fitness
function. The best-fit individuals are selected and mated to produce offspring, forming
the next generation. A morphological operation is used to create the new generation with
the help of the crossover and mutation operators. The algorithm finally terminates
to give the final segmented subimage. The segmented subimages are combined to
form the final image. The execution time taken by GA is 37.16 s.
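The per-subimage GA loop described above can be sketched as follows. Using a scalar segmentation threshold as the individual, and all parameter values, are illustrative assumptions rather than the authors' choices:

```python
import random

# Minimal GA sketch of the per-subimage loop; the fitness function is
# supplied by the caller and all parameters are placeholders.
def ga_threshold(fitness, pop_size=20, generations=50, seed=0):
    """Evolve an 8-bit segmentation threshold that maximizes `fitness`."""
    rng = random.Random(seed)
    pop = [rng.randrange(256) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)  # best-fit individuals first
        parents = pop[: pop_size // 2]       # selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) // 2             # arithmetic crossover
            if rng.random() < 0.1:           # occasional mutation
                child = max(0, min(255, child + rng.randrange(-16, 17)))
            children.append(child)
        pop = parents + children             # next generation
    return max(pop, key=fitness)
```

In the segmentation setting, `fitness` would score how well a threshold separates the subimage into foreground and background.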
Set a particular threshold level. For each particle in the population: update the particle's
fitness in the search space, update the particle's best fitness in the search space, and move the
particle in the population. For each particle: if the swarm improves, reward the
swarm and extend the particle's and the swarm's life; otherwise, remove the particle and decrease
the swarm's life. The extended swarm breeds and is considered for the next iteration.
Delete the failed swarms and reset the threshold counter. The execution time taken by PSO
is 33.39 s.
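Setting aside the particle-lifetime bookkeeping described above, the core PSO update can be sketched in its standard global-best form; all parameter values here are assumptions for illustration:

```python
import random

# Minimal standard (global-best) PSO sketch for a 1-D objective; the
# lifetime/breeding mechanics described in the text are omitted.
def pso_minimize(f, lo, hi, n=15, iters=60, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n)]  # particle positions
    vs = [0.0] * n                                # particle velocities
    pbest = xs[:]                                 # per-particle bests
    gbest = min(xs, key=f)                        # swarm best
    for _ in range(iters):
        for i in range(n):
            r1, r2 = rng.random(), rng.random()
            vs[i] = (w * vs[i] + c1 * r1 * (pbest[i] - xs[i])
                     + c2 * r2 * (gbest - xs[i]))
            xs[i] = min(hi, max(lo, xs[i] + vs[i]))
            if f(xs[i]) < f(pbest[i]):
                pbest[i] = xs[i]
            if f(xs[i]) < f(gbest):
                gbest = xs[i]
    return gbest
```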
The image is divided into subimages, and SA is applied to each subimage. Initialize
the temperature to T. Calculate the energy U of the conformation. Alter the system using
an appropriate Gaussian perturbation. Calculate the new energy U1 of the altered system and
the change in energy of the system, det(U) = U − U1. If det(U) >
0, accept the altered system as the new conformation; otherwise, accept the altered system
as the new conformation with probability exp[det(U)/KT]. Reduce the temperature
according to the cooling schedule. Repeat the above steps until the system cools to a
considerably low value. SA has now been applied to one subimage. Repeat the above
steps for each subimage and combine all the subimages to get the final segmented
image. The execution time taken by SA is 40.98 s.
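The annealing loop described above (Gaussian perturbation, Metropolis acceptance with probability exp[det(U)/KT], geometric cooling) can be sketched on a one-dimensional energy function. The energy function, cooling rate, and all constants below are illustrative assumptions:

```python
import math
import random

# Minimal SA sketch of the loop described in the text; parameters are
# illustrative, not the authors' values.
def simulated_annealing(energy, x0, t0=1.0, cooling=0.95, t_min=1e-3,
                        step=1.0, seed=0):
    rng = random.Random(seed)
    x, t = x0, t0
    while t > t_min:
        x_new = x + rng.gauss(0.0, step)   # Gaussian perturbation
        d = energy(x) - energy(x_new)      # det(U) = U - U1
        # Metropolis rule: always accept improvements (d > 0); accept
        # worse states with probability exp(d / t).
        if d > 0 or rng.random() < math.exp(d / t):
            x = x_new
        t *= cooling                       # cooling schedule
    return x
```

As the temperature falls, uphill moves are accepted less and less often, which is what lets the search settle into a low-energy configuration.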
On comparing the resultant images and the execution time taken by GA, PSO and
SA, the following inferences are recorded from the Phase II comparative study.
(1) GA segmented the image best, followed by SA and PSO.
(2) PSO takes the least execution time but with poor quality of result.
(3) The best-to-worst order of algorithms based on the clarity of the output image is
GA, SA, and PSO.
(4) The best-to-worst order of algorithms based on the execution time taken is PSO,
GA, and SA.
4 Conclusions
This paper analyzed the working and the performance of three widely used bio-
inspired algorithms, namely GA, PSO, and SA. An elaborate comparative study
was performed in two phases. In Phase I, four benchmarking functions with different
characteristics were solved by GA, PSO, and SA, and their performance was compared
using three performance metrics. The experiments were done with different problem
dimensions. This phase identified that PSO consistently outperformed the
other algorithms in producing optimal solutions at all dimensions. GA was found
to take less execution time, but without good solutions. A few interesting higher-
dimensional cases were observed where SA was able to perform better than GA and
PSO. In Phase II, a medical image segmentation problem was solved by GA, PSO,
and SA, and their performances were compared based on the solution quality and
the execution time. The observations were: GA was good at producing good
solutions, PSO was good at solving the problem faster, and there was no remarkable
performance by SA.
This contradictory performance of the algorithms on the benchmarking problems
and the real-world problem needs to be investigated further with a more extensive
experimental setup and different optimization problems.
References
1. Krishnanand, K.R., Nayak, S.K., Panigrahi, B.K., Rout, P.K.: Comparative study of five bio-
inspired evolutionary optimization techniques. In: Proceedings of 2009 World Congress on
Nature and Biologically Inspired Computing (NaBIC), pp. 1231–1236 (2009)
2. Castillo, O., Martinez-Marroquin, R., Melin, P., Veldez, F., Soria, J.: Comparative study of
bio-inspired algorithms applied to the optimization of type-1 and type-2 fuzzy controllers for
an autonomous mobile robot. Inf. Sci. 192, 19–38 (2012)
3. Kalaiarasi, S., Sirramya, P., Edreena, P.: A review and comparative study of bio-inspired
algorithms. Int. J. Appl. Eng. Res. 9(23), 23435–23448 (2014)
4. Yadav, A., Vishwakarma, D.K.: A comparative study on bio-inspired algorithms for sentiment
analysis. In: Cluster Computing (2020)
5. Romero-Rodriguez, W.J.G., Baltazar, R., Zamudio, V., Casillas, M., Alaniz, A.: Comparative
study of bio-inspired algorithms applied to illumination optimization in an ambient intelligent
environment. Smart Innov. Syst. Technol. 148 (2020)
6. Shobana, S., Radhika, N.: Efficient cloudlet provisioning using bio-inspired hybrid algorithm
in mobile cloud computing. J. Adv. Res. Dyn. Control Syst. 10(5), 1672–1678 (2018)
7. Aswanth, S.S., Gokulakannan, A., Sibi, C.S., Ramanathan, R.: Performance study of bio-
inspired approach to clustering in wireless sensor networks. In: Proceedings of 3rd International
Conference on Trends in Electronics and Informatics (2019)
8. Harun Surej, I., Ramanathan, R.: A Performance study of bio-inspired algorithms in
autonomous landing of unmanned aerial vehicle. In: Proceedings of Third International
Conference on Computing and Network Communications (2019)
9. Holland, J.H.: Adaptation in Natural and Artificial System. MIT press, Cambridge, USA (1975)
10. Kennedy, J., Eberhart, R.: Particle swarm optimization. Proc. IEEE Int. Conf. Neural Netw. 4,
1942–1948 (1995)
11. Van Laarhoven, P.J.M., Aarts, E.H.L.: Simulated Annealing: Theory and Applications, pp. 7–15
(1987)
12. Chen, Q., Liu, B., Zhang, Q., Liang, J., Suganthan, P., Qu, B.: Problem definitions and
evaluation criteria for CEC 2015. In: Proceedings of Special Session on Bound Constrained
Single-Objective Computationally Expensive Numerical Optimization (2015)
A Study on Optimization of Sparse
and Dense Linear System Solver Over
GF(2) on GPUs
Abstract There are various crypt-analytic techniques where solving a large dense
or sparse system of linear equations over a finite field becomes a challenge due to
the high computation involved. For instance, problems like NFS for the factorization of large
integers, cryptanalysis of symmetric ciphers, the discrete log problem, and
algebraic attacks involve solving large sparse or dense linear systems over a finite
field. Here, we consider the finite field GF(2). Gaussian elimination is the popular and
relevant method for solving large dense systems, while the Block Lanczos and Block
Wiedemann algorithms are well known for solving large sparse systems. However,
the time complexity of these popular methods makes them impractical; hence, parallelism
is essential for such methods. In addition, the availability
of high-end parallel processors and accelerators such as general-purpose graphics
processing units (GPGPUs) enables solving computationally intensive problems in reasonable
time. The accelerators with thousands of cores available today exploit the memory
bandwidth and take advantage of multi-level parallelism on multi-node and multi-
GPU units. Here, we consider Nvidia GPUs like Kepler, Pascal, and Volta along
with CUDA and MPI. Also, CUDA-aware MPI leverages GPUDirect RDMA and P2P
for inter- and intranode communication.
1 Introduction
In today's world, digital information has grown rapidly; therefore,
information security is imperative for the security requirements of the digital
world. There are various crypt-analytic techniques where solving a large system of
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 419
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_45
420 P. Verma and K. Sharma
linear equations over a finite field becomes a challenge due to the high computation involved. The
system can be either dense or sparse depending on the algorithms and the problem defined
in cryptography. For instance, the integer factorization problem using the number field
sieve (NFS) [1] algorithm, the discrete log problem, the cryptanalysis of symmetric ciphers,
and the algorithms used in algebraic attacks involve handling large systems of linear
equations (either dense or sparse) over the finite field GF(2). Gaussian elimination is
the popular and relevant method for handling large dense systems. To achieve
results in a short span of time, it is hard to identify hotspots that can be parallelized to a
large extent. Additionally, such huge systems cannot fit into the memory of a
single node; therefore, an effective solver based on the latest parallel hardware platforms
and the Gaussian elimination approach is needed. Block Lanczos [2, 3] and
Block Wiedemann [4–6] are the popular methods to solve such compute-intensive
problems, but the time complexity of these methods is cubic; therefore, they
are computationally slow and practically not feasible. To solve compute-intensive
problems in a reasonable amount of time, accelerated units such as general-purpose
graphics processing units (GPGPUs) are employed. It is now very popular to
build supercomputers as clusters where each node hosts multiple GPUs.
If we look at the Top500 supercomputer list [7], we find that most of them are GPU
based. Thus, it is necessary to develop applications so that they can be efficiently
scaled over multiple GPGPUs and nodes. The original method for the Block Lanczos
algorithm [8, 9] is roughly split into three steps: preprocessing, Lanczos
iterations, and post-processing. At densities greater than 10%, Block
Lanczos is quite costly in terms of performance.
This paper describes the research work on optimizations carried out on an existing
GPU-enabled code for Gaussian elimination and the Block Lanczos algorithm. The optimization
exercise started with understanding and performance profiling of the existing
methods. The next section gives the details of the literature review. Section 3 explains
the parallel methodology of Gaussian elimination and Block Lanczos over GF(2) [3,
10]. The optimization on multiple hardware platforms is explained in Sect. 4.
Section 5 shows the results over different hardware platforms for the performance and
scalability of Gaussian elimination and Block Lanczos for dense and sparse systems
over GF(2), and finally, Sect. 6 concludes the paper.
2 Literature Review
The methods available for solving dense and sparse systems of linear equations over GF(2)
were originally implemented serially [11, 12]. Parallel implementations
are also available [13], but they are not optimized for the latest hardware platforms
and hence do not fully utilize the available hardware resources.
Nvidia has introduced a series of accelerator cards for researchers to parallelize their
applications and solve bigger problems in a reasonable amount of time [14].
Figure 1 shows the architecture of an Nvidia GPU, where grids, blocks, and threads are
arranged.
A Study on Optimization of Sparse and Dense … 421
For solving large systems of dense linear equations, Gaussian elimination
is a prominent area for researchers, but research on optimizing its parallel version
has received less attention.
Koc and Arachchige [15] proposed a Gaussian elimination algorithm over the finite
field GF(2) and implemented it on the geometric arithmetic parallel processor
known as GAPP. Parkinson and Wunderlich [16] proposed parallel Gaussian
elimination for the finite field GF(2) and deployed it on the parallel array
processor named ICL-DAP. Bogdanov et al. [17] used parallel hardware
to solve Gaussian elimination over the finite field GF(2) quickly; this
architecture was implemented on a field-programmable gate array (FPGA). In addition,
the authors also evaluated a possible ASIC-based implementation.
All these solutions can solve only small dense systems over the finite
field GF(2) and are very costly, requiring special kinds of hardware platforms. Albrecht
and Pernet [18] proposed a solution for dense systems of linear equations over the finite
field GF(2). The solution uses multicore architectures, is very efficient, and is
part of the Method of Four Russians (M4RI) library [19]. This solution shows
performance results for 64 × 64 K linear systems of equations and shows that
their method is as good as the implementation by Allan Steel [20] for solving
Gaussian elimination over GF(2) using the MAGMA library. Solving
Gaussian elimination over GF(2) on general-purpose processors
has not yet been the focus of research. This is the first work to solve Gaussian elimination over GF(2) on GPGPUs.
The challenge with sparse matrices is to reduce the substantial memory requirements
by storing only the nonzero elements. Depending on the sparsity factor,
distinct data structures can be utilized to save an enormous amount of memory. Formats
that store only the nonzero elements can be divided into two main groups. The first
group comprises formats that support efficient modification; for instance, the dictionary of
keys (DOK), list of lists (LOL), and coordinate list (COO) formats come under this category
and are typically used for constructing the matrices. The second group comprises formats that
support efficient access and matrix operations, such as compressed sparse column (CSC) and
compressed sparse row (CSR) [21, 22]. Figure 3 shows the storage representation of
dense and sparse matrix formats.
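The CSR layout can be sketched as follows (an illustration, not the paper's code); note that over GF(2) all stored values are 1, so in practice only the column indices and row pointers need to be kept:

```python
# Sketch of CSR construction from a dense matrix.
def dense_to_csr(matrix):
    """Return (values, col_indices, row_ptr) for a dense matrix.
    row_ptr[i]..row_ptr[i+1] delimits row i's nonzeros."""
    values, col_indices, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)       # nonzero entry
                col_indices.append(j)  # its column
        row_ptr.append(len(values))    # cumulative count closes the row
    return values, col_indices, row_ptr
```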
The systems have on the order of hundreds of thousands of unknowns [23, 24]. Therefore,
an efficient optimized Block Lanczos solver for large sparse systems should be
available that can run on the multiple instruction, multiple data (MIMD) architecture
shown in Fig. 2, a cluster of multiple nodes, each consisting of one or more graphics
processing units. This study shows how Gaussian elimination for dense systems
and Block Lanczos for sparse systems leverage parallel hardware and scale efficiently
over a MIMD architecture with hybrid technology [25].
Fig. 2 Multiple GPU devices across multiple nodes using MPI and CUDA
Given a system of linear equations over GF(2), the task is to find the equations
that are linearly dependent on others and remove them. Consider a system of equations
where the number of equations equals the number of variables and the order is O(10^5) or higher.
The system is of the form A * x = B (mod 2), where matrix A is dense, with about 50% of
its elements nonzero, and the number of rows is greater than the number of columns. All arithmetic
operations are over GF(2), which means that addition and multiplication are equivalent
to logical XOR and logical AND, respectively. Gaussian elimination to solve the
large dense system of equations has the following steps:
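As a minimal illustration of these GF(2) semantics (not the paper's GPU implementation), the elimination can be written with each row bit-packed into a Python integer, so that adding one row to another is a single XOR and a pivot test is a bit test:

```python
# Illustrative GF(2) Gaussian elimination with bit-packed rows; rows at
# index >= rank after reduction correspond to linearly dependent equations.
def gf2_rank(rows, ncols):
    """Row-reduce bit-packed rows over GF(2) and return the rank."""
    rank = 0
    rows = list(rows)
    for col in range(ncols - 1, -1, -1):
        # Find a pivot row with a 1 in this column.
        pivot = next((i for i in range(rank, len(rows))
                      if rows[i] >> col & 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Eliminate the column from every other row: addition is XOR.
        for i in range(len(rows)):
            if i != rank and rows[i] >> col & 1:
                rows[i] ^= rows[rank]
        rank += 1
    return rank
```

For example, the rows 101, 011, 110 have rank 2, since the third row is the XOR of the first two and is therefore linearly dependent.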
4.4 Optimization
The initial method for the Block Lanczos algorithm is roughly split into three steps:
preprocessing, Lanczos iterations, and post-processing, as shown in Fig. 4.
In the preprocessing step, operations such as memory allocation, initialization, and
loading of the linear system data are done. The Lanczos step involves the iterative
part of the code that computes the solution, and finally, in the post-processing step, the solution
is written to a file. The optimization work that has been explored is as follows.
The method requires sparse linear systems as input for benchmarking the performance.
A new data-generation module should be present that is faster and can
generate arbitrary relations between the columns of the matrix.
The Lanczos step involves repeated calls to two GPU kernels, sparse matrix–
vector multiplication (SpMV) and sparse matrix transpose–vector multiplication
(SpMTV). The high percentage share of these two kernels makes them the primary candidates
for optimization. The performance of both kernels is improved with the following
techniques. The SpMV and the SpMTV are both matrix–vector multiplications. A
matrix–vector multiplication is composed of multiple dot products. Multiple dot
products can be executed in parallel, and a warp (a vector of 32 threads) is dedicated to
computing one dot product.
The dot product operation involves two steps: first, pointwise multiplication, and
second, adding all the multiplication results together. The pointwise multiplication can
be done in parallel by each thread of the warp. Adding the multiplication
results together, however, is a reduction operation, and thus the threads need to cooperate. The Kepler
architecture introduced four shuffle instructions: __shfl(), __shfl_down(), __shfl_up(),
and __shfl_xor(). Figure 5 shows the shuffle-down operation on 8 threads. Shuffle instructions
allow faster cooperation between threads of the same warp. Effectively, threads
can read the registers of other threads in the same warp. The reduction operation in the
new version of SpMV is implemented using shuffle instructions. The shuffle-based
reduction performs better than even the shared-memory atomics-based implementation.
This modification leads to better work distribution among the threads of a warp
and reduces warp divergence significantly. The warp-level approach also results in more
coalesced memory access.
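The shuffle-down tree reduction can be illustrated in plain Python by simulating one warp's lanes; on real hardware each round is a single shuffle-down instruction across the warp rather than a loop, and the warp width is 32:

```python
# Simulation of a shuffle-down tree reduction over one warp's registers.
# `lanes` must have a power-of-two length (8 in Fig. 5, 32 on hardware).
def shuffle_down_reduce(lanes):
    """Sum the per-lane values in log2(width) shuffle-down rounds."""
    lanes = list(lanes)
    offset = len(lanes) // 2
    while offset > 0:
        for i in range(offset):          # lane i reads lane i + offset
            lanes[i] += lanes[i + offset]
        offset //= 2                     # halve the active width each round
    return lanes[0]                      # lane 0 ends up with the dot product
```

For 8 lanes this takes 3 rounds (offsets 4, 2, 1), matching the 8-thread example of Fig. 5, and lane 0 accumulates the full sum.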
6 Conclusions
This paper presents a study on the optimization of a scalable solution for solving large
sparse and dense systems of linear equations over the binary finite (Galois) field,
i.e., GF(2). These solvers are utilized as a library for various cryptography and crypt-
analysis applications like the integer factorization problem using NFS, cryptanalysis of
ciphers, DLP, algebraic attacks, etc. The research work explored CUDA and MPI
to leverage the multi-level parallelism available in multi-socket, multi-GPU systems. Many
optimization techniques for solving large dense and sparse systems are
discussed, demonstrating the capabilities of the device kernels and excellent scalability
on multi-GPU architectures. At higher densities (>10%), Block Lanczos is quite
costly in terms of performance; for such cases, even a dense solver such as Gaussian
elimination can be tried. The SpMV and SpMTV are essentially matrix–vector
operations; in SpMV the matrix is in normal format, while in SpMTV the matrix is
in transposed format. This difference leads to a large difference in the performance of the code:
the transpose multiply is 3–4× slower than the normal multiply. The overhead of
this approach is, in terms of execution time, the time needed for transposing the matrix,
and, in terms of memory, the doubling of the matrix storage space. Future research
will explore hotspots in a program that are massively parallel and offload them to
the GPGPUs. We also plan to address the out-of-memory case, where the system of linear
equations exceeds the memory space of an individual GPU.
References
1. Wang, Q., Fan, X., Zang, H., Wang, Y.: The space complexity analysis in the general NFS
integer factorization. Theor. Comput. Sci. 630, 76–94, (2016). ISSN: 0304–3975, https://doi.
org/10.1016/j.tcs.2016.03.028
2. Sengupta, B., Das, A.: Use of SIMD-based data parallelism to speed up sieving in integer-
factoring algorithms. IACR Cryptol. 44 (2015)
3. Intel Corp.: Technical Report. https://en.wikipedia.org/wiki/Lanczos_algorithm (2009)
4. Giorgi, P., Lebreton, R.: Online order basis algorithm and its impact on the block Wiede-
mann algorithm. In: Proceedings of 39th International Symposium on Symbolic and Algebraic
Computation (ISSAC’14), pp. 202–209. ACM (2014)
5. Huang, A.G.: Parallel Block Wiedemann-Based GNFS Algorithm for Integer Factorization.
Master thesis, St. Francis Xavier University, Canada (2010)
6. Zhou, T., Jiang, J.: Performance modeling of hyper-scale custom machine for the principal
steps in block Wiedemann algorithm. J. Supercomput. 1–23 (2016)
7. Top 500 list—Nov 2017. https://www.top500.org/list/2019/11/
8. Summit: Oak Ridge National Laboratory’s Next High-Performance Supercomputer. https://
www.olcf.ornl.gov/olcfresources/computesystems/summit
9. Flesch, I.: A new parallel approach to the Block Lanczos algorithm for finding null spaces over
GF (2). Master thesis, Utrecht University, The Netherlands (2006)
10. Thomé, E.: A Modified Block Lanczos Algorithm with Fewer Vectors. arXiv:1604.02277
11. Yang, L.T., Huang, Y., Feng, J., Pan, Q., Zhu, C.: An improved parallel block Lanczos algorithm
over GF (2) for integer factorization. Inf. Sci. 379, 257–273 (2017). ISSN 0020-0255, https://
doi.org/10.1016/j.ins.2016.09.052
12. Xu, T.L.: Block Lanczos-Based Parallel GNFS Algorithm for Integer Factorization. Master
thesis, St. Francis Xavier University, Canada (2007)
13. Yang, L.T., Xu, L., Yeo, S.S., Hussain, S.: An integrated parallel GNFS algorithm for integer
factorization based on Linbox Montgomery block Lanczos method over GF (2). Comput. Math.
Appl. 60(2), 338–346 (2010)
14. Reaño, C., Silla, F.: Performance evaluation of the NVIDIA pascal GPU architecture. In:
2016 IEEE 18th International Conference on High Performance Computing and Communi-
cations, pp. 1234–1235. Sydney, NSW (2016). https://doi.org/10.1109/HPCCSmartCity-DSS.
2016.0173
15. Koc, K., Arachchige, S.N.: A fast algorithm for gaussian elimination over GF (2) and its
implementation on the GAPP. J. Parallel Distrib. Comput. 13(1), 118–122 (1991)
16. Parkinson, D., Wunderlich, M.: A compact algorithm for gaussian elimination over GF (2)
implemented on highly parallel computers. Parallel Comput. 1(1), 65–73 (1984)
17. Bogdanov, A., Mertens, M.C., Paar, C., Pelzl, J., Rupp, A.: A parallel hardware architecture
for fast gaussian elimination over GF (2). In: 14th IEEE Symposium on Field-Programmable
Custom Computing Machines, pp. 237–248 (2006)
18. Albrecht, M.R., Bard, G.V., Pernet, C.: Efficient dense gaussian elimination over the finite field
with two elements. CoRR, abs/1111.6549, 2011
19. M4ri library. https://github.com/malb/m4ri
20. Bosma, W., Cannon, J., Playoust, C.: The magma algebra system I: the user language. J. Symbol.
Comput. 24(3–4), 235–265 (1997)
21. Buluç, A., Fineman, J.T., Frigo, M., Gilbert, J.R., Leiserson, C.E.: Parallel sparse matrix-vector
and matrix-transpose-vector multiplication using compressed sparse blocks. In: Proceedings of
the Twenty-First Annual Symposium on Parallelism in Algorithms and Architectures (SPAA
’09). Association for Computing Machinery, New York, NY, USA, pp. 233–244. https://doi.
org/10.1145/1583991.1584053
22. Vastenhouw, B., Bisseling, R.H.: A two-dimensional data distribution method for parallel sparse
matrix-vector multiplication. SIAM Rev. 47(1), 67–95 (2004)
23. Zamarashkin, N.L., Zheltkov, D.A.: GPU based acceleration of parallel block Lancoz solver.
Lobachevskii J. Math. 39(4), 596–602 (2018)
24. GPU acceleration of dense matrix and block operations for lanczos method for systems over
large prime finite field. Supercomput. RuSCDays Ser. Commun. Comput. Inf. Sci. 793, 14–26
(2017)
25. Gupta, I., Verma, P., Deshpande, V., Vydyanathan, N., Sharma, B.: GPU-accelerated scalable
solver for large linear systems over finite fields. In: 2018 Fifth International Conference on
Parallel, Distributed and Grid Computing (PDGC), Solan Himachal Pradesh, India, pp. 324–329
(2018). https://doi.org/10.1109/PDGC.2018.8745743
Intracranial Hemorrhage Detection
Using Deep Convolutional Neural
Network
K. Thirunavukkarasu, Anmol Gupta, Satheesh Abimannan,
and Shahnawaz Khan
1 Introduction
Intracranial hemorrhage (ICH) is classified as a debilitating illness [1]. It is one
of the leading causes of death and injury and can cause a stroke. Intracranial
hemorrhage is bleeding inside the skull. Traumatic brain injury (TBI) is among
the leading causes of death and disability in the USA, representing nearly 30% of
all deaths in 2013. There is a high risk of TBI transforming into a secondary brain
injury that can lead to insensitivity. If it remains untreated, it may
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 429
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_46
The paper proposes a method using deep CNN and weighted multilabel focal loss
for the classification of intracranial hemorrhages into epidural hemorrhage, intra-
parenchymal hemorrhage, intraventricular hemorrhage, subarachnoid [15] hemor-
rhage, and subdural hemorrhage.
Dataset
The paper used the RSNA intracranial hemorrhage dataset [16] for the analysis of
intracranial hemorrhage. The dataset contains 4,516,818 DICOM-format images of
five different types of intracranial hemorrhage, together with associated metadata,
labelled with the help of 60 volunteers. Figure 2 shows the distribution of the
training data. Since a standard training and validation split was not provided for
the dataset, we split the data 70:30.
Image augmentation techniques such as rotation, zoom, scale, and translate were
applied before splitting the dataset. Also, adversarial cross-validation was
performed while evaluating model performance to avoid any data leakage.
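One way to respect the leakage concern above, since augmentation happens before the 70:30 split, is to partition by source scan rather than by individual file, so all augmented variants of one image land in the same split. The sketch below assumes a hypothetical `ID_<source>_<aug>` naming convention for illustration; it is not the RSNA dataset's actual scheme.

```python
import random

def split_by_source(image_ids, train_frac=0.70, seed=42):
    """70:30 split that keeps every augmented variant of one source scan
    in the same partition, so augmenting before splitting cannot leak
    training pixels into the validation set."""
    # "ID_<source>_<aug>" is a hypothetical naming scheme for illustration.
    def source_of(i):
        return i.rsplit("_", 1)[0]
    sources = sorted({source_of(i) for i in image_ids})
    random.Random(seed).shuffle(sources)          # deterministic shuffle
    cut = int(len(sources) * train_frac)
    train_sources = set(sources[:cut])
    train = [i for i in image_ids if source_of(i) in train_sources]
    valid = [i for i in image_ids if source_of(i) not in train_sources]
    return train, valid
```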
Proposed Deep CNN Architecture
A deep CNN with regularizing layers [17] such as max pooling and dropout is used
to obtain embedded features from the CT scans, which are then classified into the
different hemorrhage classes using a fully connected neural network.
The architecture of the model, with its corresponding input and output shapes, is
shown in Fig. 3. Since the total number of trainable parameters is 5,147,716,
which could result in overfitting during training, batch normalization, max
pooling, and dropout layers are applied to the model, which increased the
generalizability of the proposed model.
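The parameter count that motivates this regularization can be tallied layer by layer. The stack below is a hypothetical, much smaller example (the paper's full layer list is not reproduced here), showing how such totals are computed:

```python
def conv2d_params(kh, kw, c_in, c_out):
    # Each of the c_out filters has a kh x kw x c_in kernel plus one bias.
    return (kh * kw * c_in + 1) * c_out

def dense_params(n_in, n_out):
    # Full weight matrix plus one bias per output unit.
    return (n_in + 1) * n_out

def batchnorm_params(channels):
    # Trainable scale (gamma) and shift (beta) per channel; max pooling
    # and dropout contribute no trainable parameters at all.
    return 2 * channels

# Hypothetical small stack: 28x28 single-channel input, two conv + pool
# stages down to 7x7 feature maps, then a classification head (5 classes).
total = (conv2d_params(3, 3, 1, 32)
         + batchnorm_params(32)
         + conv2d_params(3, 3, 32, 64)
         + batchnorm_params(64)
         + dense_params(64 * 7 * 7, 128)
         + dense_params(128, 5))
print(total)  # 421,189 for this sketch; the paper's model totals 5,147,716
```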
Loss Function
Our dataset was highly imbalanced, with very few images of epidural hemorrhage,
while the other forms of hemorrhage had roughly the same distribution. Because of
this imbalance, loss functions such as categorical cross-entropy did not reach the
global minimum. We therefore tried a weighted-class approach and a weighted
multilabel focal loss, and found that the weighted multilabel focal loss handled
the class imbalance problem shown in Table 1 very well.
Fig. 3 Architecture of the proposed method with layer type and output shape

The weighted multilabel focal loss used in our methodology is given as

L = (1/N) Σ_{n=1}^{N} Σ_{m=1}^{M} w_m · [a + b]    (1)

where a = (1 − α) · (1 − y_{n,m})^γ · t_{n,m} · ln(y_{n,m}) and
b = α · y_{n,m}^γ · (1 − t_{n,m}) · ln(1 − y_{n,m}), which can be written
compactly as

L = (1/N) Σ_{n=1}^{N} Σ_{m=1}^{M} w_m · [c · ln(y_{n,m,t})]    (2)

where c = (1 − α_t) · (1 − y_{n,m})^γ.
Here, w_m is the class weight, α is the weighting factor, and γ is the focusing
parameter, tuned in the range [0, 5]. It was observed that, on moving from γ = 0
to γ = 5, the evaluated loss [18] received higher contributions from wrongly
classified samples of the imbalanced classes.
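A direct pure-Python transcription of Eq. (1) is sketched below. The α and γ defaults are common focal-loss choices, not the values the authors tuned, and the final negation reflects the usual convention that the loss to be minimised is the negated weighted log-likelihood:

```python
import math

def weighted_multilabel_focal_loss(y, t, w, alpha=0.25, gamma=2.0):
    """Weighted multilabel focal loss, following Eq. (1).

    y: predicted probabilities, shape [N][M]
    t: binary targets, shape [N][M]
    w: per-class weights, length M
    """
    eps = 1e-7  # clip probabilities to avoid log(0)
    N = len(y)
    total = 0.0
    for n in range(N):
        for m in range(len(w)):
            p = min(max(y[n][m], eps), 1 - eps)
            a = (1 - alpha) * (1 - p) ** gamma * t[n][m] * math.log(p)
            b = alpha * p ** gamma * (1 - t[n][m]) * math.log(1 - p)
            total += w[m] * (a + b)
    # The log terms are negative, so negate to obtain a loss to minimise.
    return -total / N
```

With γ = 0 and α = 0.5 this reduces (up to scaling) to weighted binary cross-entropy; raising γ down-weights well-classified examples.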
After careful experimentation with different methodologies, and evaluation using
the log-loss metric, we found that our proposed method with a deep CNN
architecture and weighted focal loss [19] performs very well in comparison with
other methodologies, achieving 97% accuracy. The key takeaway is that
regularization techniques and proper augmentation were the key methods that
helped achieve top accuracy. Figure 4 shows the training and validation accuracy
of our proposed model trained for 40 epochs, while Fig. 5 shows its training and
validation losses.
Although the number of hemorrhages in the test sets is low, especially when broken
down by type, the findings provide important insights. Intraparenchymal
hemorrhages were recognised at the highest rate; typically, they were
hyperattenuating and enclosed by normal tissue. Epidural hemorrhage was evident
straight away.
References
1. Mandybur, T.I.: Intracranial hemorrhage caused by metastatic tumors. Neurology 27(7), 650–
650 (1977)
2. https://www.pnas.org/content/116/45/22737
3. Rao, A.A., Patel, M.D.: Deep 3D convolution neural network for CT brain hemorrhage classi-
fication. In: Proc. SPIE 10575, Medical Imaging 2018: Computer-Aided Diagnosis, 105751C
(27 Feb 2018). https://doi.org/10.1117/12.2293725
4. Khan, S.N., Usman, I.: A model for English to Urdu and Hindi machine translation system using
translation rules and artificial neural network. Int. Arab J. Inf. Technol. 16(1), 125–131 (2019)
5. https://stats.stackexchange.com/questions/362988/in-cnn-do-we-have-learn-kernel-values-at-
every-convolution-layer
6. https://datascience.stackexchange.com/questions/26755/cnn-how-does-backpropagation-
with-weight-sharing-work-exactly
7. Bashir, T., Usman, I., Khan, S., Rehman, J.U.: Intelligent reorganized discrete cosine transform
for reduced reference image quality assessment. Turkish J. Electr. Eng. Comput. Sci. 25(4),
2660–2673 (2017)
8. https://stats.stackexchange.com/questions/121703/what-does-shift-invariant-mean-in-convol
utional-neural-network
9. Khan, A., Sohail, A., Zahoora, U., Qureshi, A.S.: A survey of the recent architectures of
deep convolutional neural networks. Artif. Intell. Rev. (2019). https://doi.org/10.1007/s10462-
020-09825-6
10. Shahnawaz, Mishra, R.B.: An English to Urdu translation model based on CBR, ANN and
translation rules. Int. J. Adv. Intell. Paradig. 7(1), 1–23 (2015)
11. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V.,
Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1–9
12. Kotsiantis, S.B.K.: Supervised machine learning: a review of classification techniques.
Informatica 31, 249–268 (2007)
13. Khan, S., Kannapiran, T.: Indexing issues in spatial big data management. In: International
Conference on Advances in Engineering Science Management and Technology (ICAESMT)-
2019, Uttaranchal University, Dehradun, India (Mar, 2019)
14. Lázaro-Gredilla, M., Liu, Y., Phoenix, D.S., George, D.: Hierarchical compositional feature
learning. arXiv:1611.02252 [Online], https://arxiv.org/pdf/1611.02252.pdf
15. Thorogood, M., Adam, S.A., Mann, J.: Fatal subarachnoid haemorrhage in young women: role
of oral contraceptives. Brit. Med. J. 283, 762 (1981)
16. https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/data
17. https://machinelearningmastery.com/dropout-for-regularizing-deep-neural-networks/
18. https://machinelearningmastery.com/loss-and-loss-functions-for-training-deep-learning-neu
ral-networks/
19. Multi-class classification with focal loss for imbalanced datasets [Online]. https://www.dlo
logy.com/blog/multi-class-classification-with-focal-loss-for-imbalanced-datasets/
A Multi-factor Approach for Cloud
Security
Abstract Cloud computing is known for its complexity regarding the different
models of deployment and services that it offers. However, security remains a
massive hindrance to its development. Hence, a multi-factor approach to securing
the cloud environment is proposed in this paper, relying on authentication and
auditing as the fundamental elements for sustaining the privacy of information in
the cloud environment. These are necessary assets to counter various threats and
attacks at the cloud service provider as well as at the user end. The proposed
multi-factor approach verifies a user's identity securely and provides a means to
build trust between the client and the cloud service provider by allowing proper
visibility of the user's activities.
1 Introduction
Security in cloud computing is a broad topic, since cloud computing operates in
terms of the services it provides to users; its security should therefore be
applied proportionately. Following the architecture of cloud computing, namely
SaaS, PaaS, and IaaS, each level requires particular attention to its security,
as the levels do not face the same threats. Moreover, the cloud service provider
should dedicate and implement the appropriate security role for the needed
application or resources so as not to slow its performance.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_47
F. K. Mupila and H. Gupta
The security of the cloud environment aims to ensure the user's trust in their
data and to prevent vulnerabilities from being exploited. It also aims to prevent
threats that cause errors in the infrastructure and to reduce the likelihood of
attacks.
Cloud computing is virtual by nature, and its complexities arise from the
services and techniques running in the background, such as its elasticity,
scalability, and ubiquity. It therefore becomes hard to secure and easy to breach.
Data protection encompasses a wide variety of laws, technologies, and policies
to protect data, applications, and the ever-expanding cloud computing
infrastructure. The Cloud Security Alliance (CSA) published a detailed study on
the top 12 information security risks [1], which are listed below.
• Data Breaches
• Insufficient Identity, Credential, and Access Management
• Insecure Interfaces and APIs
• System Vulnerabilities
• Account Hijacking
• Malicious Insiders
• Advanced Persistent Threats
• Data Loss
• Inadequate Due Diligence
• Abuse and Nefarious Use of Cloud Services
• Denial of Service
• Shared Technology Vulnerabilities.
There are various approaches to addressing these threats, such as strong
end-to-end encryption, better isolation of resources, strong authentication, and
monitoring and auditing, among others. In this regard, this paper proposes a
multi-factor approach focused on authentication, auditing, and monitoring,
composed of steps between the cloud service provider and the end user to manage
trust and the integrity of the user's data.
By definition, authentication and authorization are two different terms, although
they have similar implementations. Authentication is performed to verify the user
before any resources are accessed, while authorization focuses on providing the
right access to the right user. Multi-factor authentication (MFA) is defined as
an authentication scheme in which a computer user gains access only after
successfully presenting two or more pieces of evidence (or factors) to an
authentication mechanism, for example: knowledge (what the user and only the user
knows), ownership (what the user and only the user possesses), and inherence
(something the user and only the user is) [2]. The multi-factor authentication
approach preserves the confidentiality and integrity of users. It deploys
multiple manners of authentication, which guards against attacks such as the
man-in-the-cloud attack, man-in-the-middle attack, phishing attack, and so on,
which may result in the modification of data.
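As one concrete illustration of combining a knowledge factor with an ownership factor, a time-based one-time password in the style of RFC 6238 can be computed from a shared secret using only the standard library. The `verify_two_factors` helper is a hypothetical composition for illustration, not part of the paper's scheme:

```python
import hashlib
import hmac
import struct
import time

def totp(secret: bytes, at: float, step: int = 30, digits: int = 6) -> str:
    """RFC 6238-style one-time password: an 'ownership' factor, since only
    the holder of the shared secret can compute the current code."""
    counter = int(at // step)                       # 30-second time window
    msg = struct.pack(">Q", counter)
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                      # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def verify_two_factors(password_ok: bool, secret: bytes,
                       submitted_otp: str, at: float = None) -> bool:
    # Knowledge factor (password check, done elsewhere) AND ownership
    # factor (OTP) must both pass; compare in constant time.
    now = time.time() if at is None else at
    return password_ok and hmac.compare_digest(totp(secret, now), submitted_otp)
```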
The end users' concerns are about the storage and location of their data, since
they have no physical access to the data centre; this creates a significant trust
problem with the service provider to which they are subscribed. One of the
schemes to build trust between the user and the cloud service provider is by
2 Related Work
In view of the proposed work, the findings of other researchers regarding the
security of cloud computing are highlighted here. Ganesh V. Gujar proposed STEP-2
user authentication, in which a dynamic token from a hash table is sent to the
user's email ID; the token value is then required for the step-2 authentication
at the user interface. An additional feature is session management, in which the
dynamic token generated from the hash table remains valid for a particular
session only: once the user logs out from the cloud environment, the token
expires [3].
Prachi Soni proposed a multi-factor authentication security framework for cloud
computing. Here, an elliptic curve point algorithm is implemented, executing ten
steps to assure authentication and authorization, while data confidentiality,
integrity, and access control are based on attribute certificates. This technique
gives power and control to the client through a combination of cryptographic
methods, and the access control keeps the data safe from vulnerabilities [4].
Sabout Nagaraju suggested SecAuthn, a secure multi-factor authentication scheme
for cloud computing environments. Four factors, namely a key credential, username
and password, biometric fingerprint, and OTP, are used. A station-to-station
Diffie-Hellman key exchange is used to prepare, encrypt, and share one-time
session keys, after which the hashed credentials are checked in the
authentication servers using only the original credentials. The proposed
authentication scheme offers true security to the cloud user's credentials with
the aid of GNY logic [5].
Kashif Munir stated that an in-depth security strategy must be enforced to
protect against threats to the integrity and safety of applications and data.
This security line includes firewalls, intrusion detection and prevention,
reputation management, log review, and malware protection. Such protection can be
used by prudent organisations and service providers in their cloud infrastructure
to provide data protection so that they can gain leverage in cloud computing
before their competitors [6].
3 Proposed Model
This work aims to reinforce the security of cloud computing. Considering that a
single factor is not enough to secure a cloud-based environment, it is proposed
that a multi-factor approach can provide more security to the environment and
establish trust between the user and the cloud service provider. The proposed
work offers a secure environment, presenting a technique to keep track of all
activity and a reliable authentication process that gives power to the client so
that the latter has control over their activities and data. These processes
verify the user's credentials and monitor their access control over the
resources.
As previously mentioned, authentication is crucial for information privacy; the
first part of this work therefore concerns authentication, in order to mitigate
various cyber-attacks on cloud computing. Monitoring the user's activities and
their access control forms the second part, where the focus is on granting the
user visibility over their log records and the ability to track their activities.
Both follow the six steps of the proposed framework, in which security
responsibilities are shared between the client and the cloud service provider to
avoid a lengthy process on only one side of the communication.
Consequently, one of the most important difficulties in integrating cloud-based
security is to ensure unified access and accountability across the various
domains; a mixture of public, private, and even hybrid cloud-based services makes
the integration of security services a key task, despite several established
networks [7]. The lack of visibility creates gaps in the overall safety of an
organisation's network, making it difficult to see attacks. In the old network
architecture, all structures were inside the wall of an organisation, that is,
under the control of the organisation, so full visibility in the network was not
an essential challenge. However, when the cloud is adopted, some control is lost,
and consequently full visibility is no longer achievable. Visibility is the main
concern, since devices that cannot be seen cannot be secured [8].
The concerns of this work apply at the cloud service provider side as well as at
the user side. Of the six steps that make up this process, steps one and four are
executed at the client side, whereas steps two, three, five, and six are executed
at the service provider side.
It is known that a private cloud is more secure, since all data are within the
boundary (firewall) and are not available to the general public; in a public
cloud, by contrast, the subscriber or client does not know the structure of the
data centre, in particular which server processes the data, how the network is
implemented, or how secure the environment is.
As a matter of fact, there is an urgent need to focus more on preventing breaches
of confidence than on post-service accountability, so as to diminish the concerns
that hinder progress and to benefit fully from the unprecedented advantages that
cloud computing has to offer. An effective, standardised trust management system
is required for individuals and organisations to utilise the potential benefits
of cloud computing technology adequately [9].
Figure 1 illustrates the basic interaction between the client and the cloud service
provider. It is well known that authentication is the first operation to take place in
order to authenticate the user’s credentials.
(1) Step 1
Step one of the proposed model is a standard login process: the user enters
their credentials to access the requested resource in the cloud. However, to
mitigate identity theft, a unique verification method needs to be set to
confirm the email address through which the user registered for the cloud
service, and to establish some other parameters to be used in further steps.
(2) Step 2
After the user has gained access to the cloud services, it is the responsibility
of the cloud service provider to monitor all the activities performed by the
user. The first task of the cloud service provider is to send a login report to
the client, through the email address given at registration time, containing
details of the login such as:
• Location
• Device IP address
• Device type
(3) Step 3
In this method, a session time is introduced not only to establish trust between
the user and the service provider but also to maintain the confidentiality and
integrity of data.
At the time of registration or subscription to the cloud, the user must
determine the duration of their daily session, which allows the service provider
to convey a new password to the user's email address once the scheduled session
expires. The session is suspended until the user logs in again.
The user needs to log in again with a new password, shared by the service
provider through the registered email address, to keep the session active. The
email sent to the user contains an encrypted password of at least 12 characters.
The user must determine their decryption method at registration time in order to
decrypt the new password and resume the session.
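A small sketch of the password-rotation part of this step follows. The names and session API are illustrative, and the encryption of the emailed password, which depends on the decryption method the user chose at registration, is out of scope here:

```python
import secrets
import string
import time

MIN_LENGTH = 12  # step 3 mandates a password of at least 12 characters
ALPHABET = string.ascii_letters + string.digits + "!@#$%^&*"

def new_session_password(length: int = MIN_LENGTH) -> str:
    # secrets (rather than random) provides cryptographically strong choices.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

class TimedSession:
    """Session that suspends on expiry until the user returns with the
    freshly issued password (which the provider would email, encrypted)."""

    def __init__(self, duration_s: float):
        self.expires_at = time.monotonic() + duration_s
        self.password = new_session_password()

    def expired(self) -> bool:
        return time.monotonic() >= self.expires_at

    def rotate(self, duration_s: float) -> str:
        # Issue a replacement password and re-arm the session timer.
        self.password = new_session_password()
        self.expires_at = time.monotonic() + duration_s
        return self.password
```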
(4) Step 4
Once the user logs in again with the new password, the session resumes unless
there is any mismatch with the password sent. Accordingly, if an attacker is
present, they are automatically logged out of the session. Most importantly, all
connected devices that fail to reconnect using the new password are removed from
the session.
(5) Step 5
The cloud service provider ensures the security of the client over the remote
network, considering that the security methods applied by the client or
subscriber can no longer be supervised by them. This step provides more
visibility of the user's activity. After the client has entered the new password,
the cloud service provider performs another action to maintain visibility of the
user's activities: another login report is sent to the user's registered email
address, considering:
• Current image of the client
• Screenshot of the current page
• Monitored report of the previous session
• IP address confirmation
• Location confirmation
(6) Step 6
The final step consists of a final report sent from the cloud service provider to
notify the client about the completion of the current session.
The challenges of cloud security are not insurmountable. With the right partners,
technology, and foresight, companies can leverage the benefits of cloud
technology.
A trusted administration service can be cloud independent, but trust techniques
and evaluation features must be consistent with the IaaS, PaaS, or SaaS cloud
model underlying this approach [10]. We argue that a multi-factor strategy has
the capability to establish trust.
The management viewpoint and techniques are crucial. Out of several possible
steps, it is believed that these steps are essential to the secure functioning of
the cloud. The six steps are quick, reliable, robust, and user friendly because
of the shared-security method they deploy between the user side and the service
provider.
4 Research Analysis
The proposed model overcomes some of the threats and attacks cited in Table 1.
Besides granting authority to users, control is another critical question that
builds trust; in fact, we trust a system less when we do not have much control
over our assets.
There is, of course, no way to ensure that the cloud is fully safe for customers.
The importance of trust differs between organisations, depending on the nature of
the data. Therefore, the less confidence a company places in the cloud provider,
the more it needs technology to monitor its data [11].
Table 1 indicates some attacks and threats that challenge the cloud environment
and prevent the consumer from placing complete trust in the service provider.
5 Future Work
Despite its limitations, this work provides reliable authentication and proper
visibility of activities, which give authority to the user and establish trust
between the user and the cloud service provider. Further research should focus on
securing the email address in such a way as to prevent an intruder or attacker
from gaining access to the new password. Besides this, the implementation of a
secure authentication method is also favoured.
6 Conclusion
In summary, this paper argued that visibility of activities to the user is a
valuable asset because it brings the trust needed to utilise the potential
benefits of cloud computing technology adequately. This work identifies the types
of cloud services that this technique supports and develops a suitable trust
management system. In addition to authentication and authorization procedures, we
believe that using an audit monitoring system to record all successful and
unsuccessful authentication and access attempts is a genuine way to build trust
and to assess attacks. For this reason, both the client and the cloud service
provider are responsible for maintaining security.
References
1. Walker, K.: 'The Treacherous Twelve' cloud computing top threats. In: RSA Conference, Booth
#S2614, San Francisco. Cloud Security Alliance, 29 Feb 2016
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 447
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_48
448 F. K. Mupila and H. Gupta
An Innovative Authentication Model for the Enhancement … 449
2 Importance of Security
3 Related Work
Several techniques and methods have been deployed to provide sustainable authen-
tication; some of these techniques and work based on authentication are listed
below:
(1) Certificate-based authentication is commonly used in industry today: a
digital certificate is incorporated to authenticate the customer. The first
time a user uses a service, they install a unique certificate on their
device; when the user accesses the service, the server asks the device for
that specific certificate, and access is given only if the certificate is
valid [10].
(2) Server authentication with location verification: the aim of this work is to
strengthen Web authentication. The author leverages the server location as a
second factor of authenticity by introducing location-based server
authentication, preventing server impersonation at any cost even if the
victim server's secret is known by the attacker [11].
(3) Security algorithms for cloud computing have been reviewed as symmetric
algorithms for different encryption and encoding strategies, concluding that
AES is a good choice for key encryption and that MD5 is faster for encoding.
In addition, security can be improved by using 1024-bit RSA and 128-bit keys
with a combined RSA-AES encryption algorithm. This supports the data
protection of cloud-based applications: the private key cannot be derived
even if the attacker is provided with the public keys [12].
The authentication breach is identified as the root cause of data losses in the
cloud environment. Once it is addressed, customers can be assured that the
integrity of their data stored in the cloud infrastructure is secure, just as a
tree in the soil is secured by its roots.
4 Proposed Model
This work aims to provide a secure procedure through which the user's credentials
are entered and verified to gain access to the cloud environment. For this
purpose, the conceptual framework proposes an encrypted certificate and a token
build-up deployed to provide trusted authentication and authorization between the
clients and the cloud service provider by using the client's geographical
location. The HTML5 Geolocation API, available in Google Chrome 55 and other
modern Web browsers, is used to gather the user's geographical location. Although
this could compromise the user's privacy, as mentioned in the new privacy
regulations, the position is not obtainable unless the user consents to it [13].
Using this new feature allows the Web browser to notify the Web server of the
user's accurate location. There are, however, a large number of factors, from
technological and geographical to physical, that influence how precisely this
feature works in the real world [14]. This work takes place in three phases, each
performing a particular task, so as to enhance the security of the process of
identifying users before they gain access to their resources.
The API gateway involved in this work acts as a reverse proxy that collects the
users' requests and redirects them to the microservice in charge. It reduces
security breaches, as only one public IP address is exposed publicly [15].
Figure 2 shows the cycle that authentication follows in this proposed work,
consolidating the user's verification operation into three phases.
1. Phase 1
It incorporates the communication between the client service in the Web browser
and the API gateway. The client's requests for a Web page from the Web server are
handled by the API gateway, which is hosted inside the Web server and acts as an
entry point. Generally, the API gateway accommodates SSL certificates,
authentication, authorization, and many more microservices according to the Web
server's configuration. Additionally, the API gateway is configured to receive
and manage all static requests from clients.
It receives the client's request and then forwards the login page. The API
gateway is also configured to share the cryptographic hash function and the
cryptographic key with the client service, so that the HMAC algorithm can be
executed at the client-service side. A hash function is preferred here because of
its speed of computation and its ability to minimize duplication of output
values.
2. Phase 2
After the exchange is established, the client service sends the user’s credentials
along with the users’ locations by dint of the Geolocation API of the browser
to the authentication server. Basically, the content sent through the network to
the authentication server is a hash-based message authentication code. From this
452 F. K. Mupila and H. Gupta
point, the integrity and confidentiality of the user's credentials and location
are guaranteed. The cryptographic hash function used in this
model is SHA3-512. The authentication server stores the result if it is correct,
so that a subsequent request cannot reuse the same details (user name and password)
to request access to the resources. This procedure averts one of
the most challenging threats faced by cloud service providers: the replay
attack.
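The hashing step described here can be sketched with Python's standard library; the field layout and the shared key below are illustrative placeholders, not values from the paper.

```python
import hashlib
import hmac
import json

def credentials_mac(username: str, password: str, latitude: float,
                    longitude: float, shared_key: bytes) -> str:
    """Compute an HMAC over the credentials and the browser-reported
    location using SHA3-512, as this phase requires."""
    # Serialise the fields deterministically before hashing.
    message = json.dumps(
        {"user": username, "pwd": password, "lat": latitude, "lon": longitude},
        sort_keys=True,
    ).encode("utf-8")
    return hmac.new(shared_key, message, hashlib.sha3_512).hexdigest()

key = b"shared-secret-from-the-gateway"  # placeholder for the shared key
mac = credentials_mac("alice", "s3cret", 28.61, 77.20, key)
# A SHA3-512 HMAC is 64 bytes, i.e. 128 hex characters.
```

Because the key is shared in advance between the gateway and the client service, the server can recompute the same MAC and detect any tampering with the credentials or the reported location.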
The sub-processes of this phase are as follows. First, the authentication server
generates the JWT to be sent to the client service; the token contains a new claim,
adding the user's location to the payload. Next, the authentication server and the
Web server execute a key-distribution exchange in order to use the RSA cryptosystem.
Finally, an encrypted certificate containing the user's location is shared between
the authentication server and the Web server, so that the authenticity of the
received token can be verified despite the additional claim. Beyond being a
public-key cryptosystem, RSA is used in this proposed model to sign the certificate
and counter any alteration of the message.
Figure 3 shows how the exchange of the token, the key and the certificate takes
place. An important aspect here is the refresh token: every new request sent from
the client service has to update its location with the authentication server, after
which the operation to obtain the JWT takes place again.
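Token issuance can be illustrated with a minimal hand-rolled JWT built from the standard library; the `loc` claim name, the secret, and the HS256 signature below are assumptions for this sketch, not the paper's exact implementation (which additionally encrypts and certifies the token via RSA).

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding for each segment.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def make_jwt(payload: dict, secret: bytes) -> str:
    """Build an HS256-signed JWT of the form header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = b64url(json.dumps(header).encode()) + "." + \
                    b64url(json.dumps(payload).encode())
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)

# Hypothetical claims: "loc" stands in for the extra location claim
# this model adds to the payload.
token = make_jwt({"sub": "1234567890", "name": "John Doe",
                  "loc": {"lat": 28.61, "lon": 77.20}}, b"server-secret")
header_b64, payload_b64, sig_b64 = token.split(".")
```

The Web server can validate such a token by recomputing the signature over `header.payload` with the shared secret and comparing it to the third segment.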
3. Phase 3
The encrypted token reaches the client service and is then sent to the Web server
to validate the JWT. The pieces of information held in the certificate are then
verified with the private key of the RSA cryptosystem before access to the
resources is granted.
Such security techniques ensure secure data transfer, the security of the user
interface, the separation of data, the storage of data and user access control
[16]. This research uses a token to make security decisions and to store
tamper-proof information about a device individually. Although a token usually
reflects only cryptographic details, it can also carry additional free-form data
added when the token is produced. Lack of good authentication can result in
illegal disclosure of cloud users' accounts, which can contribute to breaches of
privacy. Similarly, the absence of authorization in cloud computing leads to
infringements of privacy when unauthorized parties enter the user's database
[17].
A sample JWT (header.payload.signature):
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c
5 Future Work
Future research should investigate whether a request really comes from an
authorized user. This requires efficient and secure data transactions in order to
guarantee the integrity and confidentiality of the user's data by implementing a
secure authentication technique. Most users prefer easy passwords, which are easy
for an attacker to guess, but even the best password can be stolen through
brute-force and dictionary attacks. Taking this into consideration, future work
should include a more in-depth analysis of the complexity present in the cloud
environment, and a hybrid encryption system is recommended to strengthen security.
5.1 Conclusion
The main conclusion drawn is that, even though no technique is robust or stringent
enough to fully secure the Web environment, this paper presents a model that
authenticates the user with the help of the user's geolocation. A cloud security
architecture works effectively only when the correct defensive implementations are
in place, and it is considered efficient only when it can recognize the questions
that arise in security management. To this end, the user's access to the resource
is granted after the token and the certificate are validated.
References
1. Turner, D.M.: Digital authentication: the basics. In: Cryptomathic. Archived from the Original
on 14 Aug 2016. Retrieved 9 Aug 2016
2. Gharami, S., Dinakaran, M.: Sequential mathematical solution for authentication and autho-
rization technique implementing encryption methodology creating a secure transaction using
various methods also at the quantum level. In: IOP Conference Series: Materials Science and
Engineering 2017
3. From Wikipedia, the free encyclopedia, https://en.wikipedia.org/wiki/cloud_computing_sec
urity
4. Bhardwaj, A., Goundar, S.: A framework to define the relationship between cybersecurity and
cloud performance. Comput. Fraud Secur. (2019)
5. Indu, I.A., Rubesh Bhaskar, P.M., Vidhyacharan: Identity and access management in a cloud
environment. Mech. Challenges Eng. Sci. Technol. Int. J. 21, 574–588 (2018)
6. Wueest, C., Barcena, M.B., O’Brien, L.: Mistakes in the IAAS Could Put Your Data
At Risk. https://www.symantec.com/content/en/us/enterprise/media/security_response/Whi
tepapers/mistakes-in-the-iaas-cloud-could-put-your-data-at-risk.pdf. May 2015
7. Subramanian, N., Jeyaraj, A.: Recent security challenges in cloud computing. J. Comput.
Electr. Eng. 71, 28–42 (2018)
8. Farooq, H., Lokhande, T.S., Rajeshri, R.: A review on cloud computing security using
authentication techniques. Int. J. Adv. Res. Comput. Sci. 8(2) (2017)
9. Kshetri, N.: “Privacy and security issues in cloud computing” The role of institutions and
institutional evolution. Telecommun. Policy 37, 372–386 (2013)
10. From Wikipedia, the free encyclopaedia. https://en.wikipedia.org/wiki/basic_access_authentic
ation
11. Yu, D.-Y., Ranganathan, A., Masti, R.J.: Salve: server authentication with location verification.
In: International Conference on Mobile Computing and Networking, Mobicom 2016
12. Bhardwaj, A., Subrahmanyam, G.V.B., Avasthi, V., Sastry, H.: Security algorithms for cloud
computing. In: International Conference on Computational Modelling and Security CMS
(2016)
13. W3Schools: HTML5 Geolocation. https://www.w3schools.com/html/html5_geolocation.asp
14. Rich, B.: Everything You Ever Wanted to Know About Html5 Geolocation Accuracy. Feb
2018. https://www.storelocatorwidgets.com/blogpost/20453/everything_you_ever_wanted_
to_know_about_html5_geolocation_accuracy
15. Bush, T.: API Gateway. 11 June 2019. https://nordicapis.com/what-is-an-api-gateway/
16. Gonzalez, N., Miers, C., Redigolo, F., Simplicio, M., Carvalho, T., Naslund, M., Pourzandi,
M.: A Quantitative Analysis of Current Security Concerns and Solutions for Cloud Computing.
Springer (2012)
17. Raju, B., Swarna, P., Rao, M.: Privacy and security issues of cloud computing. Int. J. (2016)
Substituting Phrases with Idioms:
A Sequence-to-Sequence Learning
Approach
Nikhil Anand
1 Introduction
Communication has evolved over thousands of years. Humans have covered a very long
journey, from cave paintings to modern language. Cave paintings, ideograms,
petroglyphs, pictograms, and writing all share the same common idea of conveying
meaning from one individual or group to another. Language is an ordered system of
communication that has emerged over the past thousands of years and is continuously
evolving. New words, phrases, and proverbs are continuously being added to
languages. Apart from changing over time, language is also reshaped within
communities and groups of people. This leads to variation in the same language
across different periods and within different groups.
Idioms are parts of a language whose metaphorical meaning differs from the literal
meaning of the words comprising them. These phrases amplify the sentence
N. Anand (B)
Internshala, Gurugram, India
e-mail: nikhil@internshala.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 457
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_49
458 N. Anand
when they are used. New idioms are added from time to time, gaining popularity
and becoming part of daily use. Semantic and syntactic rules specific to each
language make linguistic processing difficult, and irregularities also emerge due
to different writing styles.
This paper explores the possibility of augmenting natural text by substituting
phrases with idioms. This is a development of previous work on an idiom
recommendation system using POS tagging and sentence parsing [1], which was
entirely based on handcrafted rules. In the field of NLP, the irregularities of
natural language restrict the effectiveness of rule-based methods for any task.
The impressive results of neural networks in various areas of natural language
processing have influenced this work: from sentiment analysis to anomaly
detection, and from text generation to image captioning, deep learning has shown
unmatched capabilities [2–5].
In this work, a sequence-to-sequence model is proposed that translates sentences
without idioms into sentences with idiomatic phrases based on the context.
Different variations of RNN encoder–decoder models are used for the experiment,
without explicitly defining any syntactic rules. The paper is framed as follows:
Sect. 2 presents the literature survey, Sect. 3 the methodology, Sect. 4 the
experimental setup, Sect. 5 the results, and Sect. 6 the conclusions.
2 Literature Survey
2.1 Encoder–Decoder
The growing popularity of deep learning has given rise to various architectures
for different applications in machine learning. From image recognition to machine
translation, such tasks have specialized deep learning architectures [6, 7].
Encoder–decoder is one such neural network architecture; it is used for image
compression, neural machine translation, anomaly detection, etc. [3, 7, 8].
The encoder–decoder architecture contains two connected components, known as the
encoder and the decoder. When the encoder receives a source sequence, it reads the
sequence and converts it into a low-dimensional hidden-state feature vector; this
process is called encoding. The decoder reverses the process by transforming the
low-dimensional vector back into a sequence; this process is called decoding.
Since the encoder–decoder architecture is an end-to-end machine learning model,
the intermediate results are not directly visible, and it can be seen as mapping
the source sequence to the target sequence via an intermediate hidden layer acting
as a feature extractor.
In machine translation, the encoder–decoder architecture has been implemented
successfully before, with different variations over the past few years.
An RNN encoder–decoder was proposed for statistical machine translation [7].
Another similar approach, with LSTM layers as encoder and decoder, was proposed
Substituting Phrases with Idioms: A Sequence-to-Sequence … 459
for machine translation and achieved higher BLEU scores even for much longer
sentences [9].
Recurrent neural networks (RNNs) are specialized neural networks that give
importance to the order of input in long sequential inputs [10]. Gated variants of
recurrent neural networks such as LSTM and GRU have gained popularity in recent
years due to their capability of capturing sequential regularities. Two major
problems are associated with RNNs: vanishing gradients and exploding gradients
[11, 12]. Long short-term memory, popularly known as LSTM, is a solution to the
vanishing-gradient problem in recurrent neural networks [13]. It solves the
problem by using a gated architecture: gates at each input step decide how much of
the new input should be written to the memory cell and how much of the current
memory cell content should be forgotten. The LSTM architecture is defined as:
$s_j = R_{\mathrm{LSTM}}(s_{j-1}, x_j) = [c_j ; h_j]$   (1)
$c_j = f \odot c_{j-1} + i \odot z$   (2)
$h_j = o \odot \tanh(c_j)$   (3)
$i = \sigma(x_j W^{xi} + h_{j-1} W^{hi})$   (4)
$f = \sigma(x_j W^{xf} + h_{j-1} W^{hf})$   (5)
$o = \sigma(x_j W^{xo} + h_{j-1} W^{ho})$   (6)
$z = \tanh(x_j W^{xz} + h_{j-1} W^{hz})$   (7)
$y_j = O_{\mathrm{LSTM}}(s_j) = h_j$   (8)
$s_j \in \mathbb{R}^{2 d_h},\; x_j \in \mathbb{R}^{d_x},\; c_j, h_j, i, f, o, z \in \mathbb{R}^{d_h},\; W^{x\circ} \in \mathbb{R}^{d_x \times d_h},\; W^{h\circ} \in \mathbb{R}^{d_h \times d_h}$   (9)
Here, $c_j$ and $h_j$ are the memory and hidden-state components, respectively.
There are three gates, $i$, $f$, and $o$, standing for the input, forget, and
output gates.
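The gated update of Eqs. (2)–(7) can be sketched directly in NumPy; the toy dimensions and random weights below are purely illustrative, and NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_h = 4, 3  # toy input and hidden sizes

# Parameter matrices, shaped as in Eq. (9).
W = {n: rng.standard_normal((d_x, d_h)) * 0.1 for n in ("xi", "xf", "xo", "xz")}
W.update({n: rng.standard_normal((d_h, d_h)) * 0.1 for n in ("hi", "hf", "ho", "hz")})

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(c_prev, h_prev, x):
    """One application of R_LSTM, following Eqs. (2)-(7)."""
    i = sigmoid(x @ W["xi"] + h_prev @ W["hi"])  # input gate, Eq. (4)
    f = sigmoid(x @ W["xf"] + h_prev @ W["hf"])  # forget gate, Eq. (5)
    o = sigmoid(x @ W["xo"] + h_prev @ W["ho"])  # output gate, Eq. (6)
    z = np.tanh(x @ W["xz"] + h_prev @ W["hz"])  # candidate update, Eq. (7)
    c = f * c_prev + i * z                       # memory cell, Eq. (2)
    h = o * np.tanh(c)                           # hidden state, Eq. (3)
    return c, h

c, h = np.zeros(d_h), np.zeros(d_h)
for x in rng.standard_normal((5, d_x)):          # a length-5 input sequence
    c, h = lstm_step(c, h, x)
```

Note that the gates enter only through elementwise products, which is what lets the cell decide, per dimension, how much to write and how much to forget.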
The gated recurrent unit, popularly known as GRU, is an alternative to LSTM. The
LSTM architecture is hard to explain, and its complexity makes it hard to analyze
[14]. There are computational constraints with LSTM networks as well. The GRU
architecture overcomes these shortcomings: it has fewer gates than LSTM and does
not have a separate memory cell. The GRU architecture is defined as:
$s_j = R_{\mathrm{GRU}}(s_{j-1}, x_j) = (1 - z) \odot s_{j-1} + z \odot \tilde{s}_j$   (10)
$z = \sigma(x_j W^{xz} + s_{j-1} W^{sz})$   (11)
$r = \sigma(x_j W^{xr} + s_{j-1} W^{sr})$   (12)
$\tilde{s}_j = \tanh(x_j W^{xs} + (r \odot s_{j-1}) W^{sg})$   (13)
$y_j = O_{\mathrm{GRU}}(s_j) = s_j$   (14)
$s_j, \tilde{s}_j \in \mathbb{R}^{d_s},\; x_j \in \mathbb{R}^{d_x},\; z, r \in \mathbb{R}^{d_s},\; W^{x\circ} \in \mathbb{R}^{d_x \times d_s},\; W^{s\circ} \in \mathbb{R}^{d_s \times d_s}$   (15)
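The GRU update of Eqs. (10)–(13) admits the same kind of NumPy sketch; again, the dimensions and weights are toy values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
d_x, d_s = 4, 3  # toy input and state sizes

# Parameter matrices, shaped as in Eq. (15).
W = {n: rng.standard_normal((d_x, d_s)) * 0.1 for n in ("xz", "xr", "xs")}
W.update({n: rng.standard_normal((d_s, d_s)) * 0.1 for n in ("sz", "sr", "sg")})

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(s_prev, x):
    """One application of R_GRU, following Eqs. (10)-(13)."""
    z = sigmoid(x @ W["xz"] + s_prev @ W["sz"])              # update gate, Eq. (11)
    r = sigmoid(x @ W["xr"] + s_prev @ W["sr"])              # reset gate, Eq. (12)
    s_tilde = np.tanh(x @ W["xs"] + (r * s_prev) @ W["sg"])  # candidate, Eq. (13)
    return (1 - z) * s_prev + z * s_tilde                    # interpolation, Eq. (10)

s = np.zeros(d_s)
for x in rng.standard_normal((5, d_x)):  # a length-5 input sequence
    s = gru_step(s, x)
```

Unlike the LSTM, the single state vector doubles as both memory and output, and the update gate interpolates directly between the old state and the candidate.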
In a bidirectional RNN, each element of the sequence is encoded based on both past
and future contexts [15]. Two different RNNs, one processing the sequence from
left to right and the other from right to left, are concatenated together. These
networks are effective when features are extracted from the context window around
a word. A bidirectional RNN is defined as:
$\mathrm{biRNN}(x_{1:n}, i) = y_i = [\mathrm{RNN}_{\mathrm{forward}}(x_{1:i}) ; \mathrm{RNN}_{\mathrm{backward}}(x_{n:i})]$   (16)
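Equation (16) can be illustrated with a plain tanh RNN standing in for both the forward and backward networks; all names and sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
d_x, d_h = 4, 3  # toy input and hidden sizes
Wx_f, Wx_b = rng.standard_normal((2, d_x, d_h)) * 0.1
Ws_f, Ws_b = rng.standard_normal((2, d_h, d_h)) * 0.1

def simple_rnn(xs, Wx, Ws):
    """A plain tanh RNN run over a sequence, returning the final state."""
    s = np.zeros(Ws.shape[0])
    for x in xs:
        s = np.tanh(x @ Wx + s @ Ws)
    return s

def birnn_state(xs, i):
    """Eq. (16): concatenate a forward pass over x_1..x_i with a
    backward pass over x_n..x_i (i is a 1-based position)."""
    forward = simple_rnn(xs[:i], Wx_f, Ws_f)
    backward = simple_rnn(xs[i - 1:][::-1], Wx_b, Ws_b)
    return np.concatenate([forward, backward])

xs = rng.standard_normal((6, d_x))  # a length-6 input sequence
y3 = birnn_state(xs, 3)             # representation of position 3
```

The resulting vector has twice the hidden size, since the two directional states are simply concatenated.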
Parts of speech are word categories in a language, including nouns, verbs,
adjectives, determiners, adverbs, etc. POS tagging is a technique in which POS
tags are assigned to words and word sequences. POS tagging techniques are
classified into two categories: supervised and unsupervised.
Supervised POS tagging techniques use probability for assigning POS tags. A model
is trained on a large tagged corpus, and a probabilistic approach is used while
tagging, considering unigrams, bigrams, trigrams, hidden Markov models, etc. Due
to the sequential training of the POS tagger, these show best results for
sequential data only [16].
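A minimal supervised tagger in this spirit assigns each word its most frequent tag from a tagged corpus; the toy corpus below is invented for illustration, and real taggers use far larger corpora and sequence models.

```python
from collections import Counter, defaultdict

# A toy tagged corpus; a real tagger is trained on a large corpus.
tagged = [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB"),
          ("the", "DET"), ("run", "NOUN"), ("dog", "NOUN"),
          ("runs", "VERB"), ("run", "VERB"), ("run", "VERB")]

# Count how often each tag is observed for each word.
counts = defaultdict(Counter)
for word, tag in tagged:
    counts[word][tag] += 1

def unigram_tag(word: str) -> str:
    """Assign each word its most frequent tag; unknown words default to NOUN."""
    if word in counts:
        return counts[word].most_common(1)[0][0]
    return "NOUN"

tags = [unigram_tag(w) for w in ["the", "dog", "run"]]
# "run" was seen twice as VERB and once as NOUN, so it is tagged VERB.
```

Bigram, trigram, and HMM taggers refine this by conditioning each tag on the surrounding tag context rather than on the word alone.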
Rule-based taggers utilize grammatical information and handcrafted sets of rules
for assigning POS tags; these are among the earliest tagging practices. The
unsupervised approach is not as accurate as the supervised approach, although some
recent work has closed the gap between unsupervised and supervised approaches for
POS tagging by using bilingual graph-based projections [17].
3 Methodology
4 Experimental Setup
4.1 Dataset
For the experiment, 1275 sentences were collected from different Web sources. The
dataset contains pairs of sentences: a sentence without an idiom and the same
sentence with an idiomatic phrase. The sentences can be classified between two
idioms, in a while and for a while/awhile, based on the context; the second idiom
can take two forms, for a while or awhile, based on the semantic rules of the
language. Sample data are shown in Table 1: the left column has the input
sentences, while the right column has the same sentences with idiomatic
expressions replacing phrases.
Fig. 1 Proposed architecture with concatenated word embedding and part-of-speech embedding
as the input for the encoder–decoder framework
The BLEU score is a machine translation evaluation metric that compares the
n-grams in the machine-generated text to the n-grams in the reference text. For
this model, we have used a cumulative score from 1-gram to 4-gram, also called
BLEU-4 [19].
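The cumulative score can be sketched as the brevity penalty times the geometric mean of clipped n-gram precisions; this is a simplified single-reference version without smoothing, not the exact evaluation script used here.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Cumulative BLEU up to max_n with a brevity penalty, for a single
    reference; smoothing is omitted, so any zero precision gives 0."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        # Clipped (modified) n-gram precision.
        overlap = sum(min(count, ref[g]) for g, count in cand.items())
        total = max(sum(cand.values()), 1)
        if overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalise candidates shorter than the reference.
    bp = 1.0 if len(candidate) > len(reference) else \
        math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)

score = bleu("wait for a while before leaving".split(),
             "wait for a while before leaving".split())
```

An exact match yields a score of 1.0, while a candidate that omits reference material is penalised both through the clipped precisions and the brevity penalty.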
5 Results
The model is evaluated on 10% of the total data. To evaluate the machine-generated
text, we have used BLEU-4 score. The score for different models is shown in Table 2.
From the above results, we observe that concatenated POS tag sequences along with
word sequences performed comparatively better than word sequences alone. Except
for the GRU, every other RNN architecture performed significantly better with the
concatenated layers, although for the GRU the difference between the two versions
is insignificant. The qualitative evaluation has also shown that models with
concatenated POS tags predicted idioms more accurately.
6 Conclusions
References
1. Anand, N.: Idiom recommendation using POS tagging and sentence parsing. In: Kumar, A.,
Paprzycki, M., Gunjan, V. (eds.) ICDSMLA 2019. Lecture Notes in Electrical Engineering,
vol. 601. Springer, Singapore (2020)
2. Tang, D., Qin, B., Liu, T.: Document modeling with gated recurrent neural network for senti-
ment classification. In: Proceedings of Conference on Empirical Methods in Natural Language
Processing—EMNLP 2015, pp. 1422–1432 (2015, September). https://doi.org/10.18653/v1/
d15-1167
3. Sakurada, M., Yairi, T.: Anomaly detection using autoencoders with nonlinear dimension-
ality reduction. In: ACM International Conference Proceeding Series, vol. 2, pp. 4–11 (2014,
December). https://doi.org/10.1145/2689746.2689747
4. Marcheggiani, D., Perez-Beltrachini, L.: Deep graph convolutional encoders for structured data
to text generation, pp. 1–9 (2018). https://doi.org/10.18653/v1/w18-6501
5. You, Q., Jin, H., Wang, Z., Fang, C., Luo, J.: Image captioning with semantic attention. In:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 2016,
pp. 4651–4659 (2016, December). https://doi.org/10.1109/cvpr.2016.503
6. Calderon, A., Roa, S., Victorino, J.: Handwritten Digit Recognition using Convolutional
Neural Networks and Gabor filters. In: Proceedings of the 2003 International Congress on
Computational Intelligence (CIIC), pp. 1–8 (2003)
7. Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical
machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural
Language Processing (EMNLP), pp. 1724–1734, 2014. https://doi.org/10.3115/v1/d14-1179
8. Cheng, Z., Sun, H., Takeuchi, M., Katto, J.: Deep convolutional autoencoder-based lossy image
compression. In: Proceedings of 2018 Picture Coding Symposium (PCS 2018), pp. 253–257
(2018). https://doi.org/10.1109/pcs.2018.8456308
9. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. Adv.
Neural. Inf. Process. Syst. 4(January), 3104–3112 (2014)
10. Elman, J.L.: Finding structure in time. Cogn. Sci. 14(2), 179–211 (1990). https://doi.org/10.
1207/s15516709cog1402_1
11. Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. In:
30th International Conference on Machine Learning (ICML 2013), no. PART 3, pp. 2347–2355
(2013)
12. Pascanu, R., Mikolov, T., Bengio, Y.: Understanding the exploding gradient problem. In: 30th
International Conference on Machine Learning (ICML 2013), no. PART 3, pp. 2347–2355
(2013)
13. Hochreiter, S., Schmidhuber, J.: Long Short-term memory. Neural Comput. 9(8), 1735–1780
(1997). https://doi.org/10.1162/neco.1997.9.8.1735
14. Dey, R., Salemt, F.M.: Gate-variants of gated recurrent unit (GRU) neural networks. In: Midwest
Symposium Circuits System, vol. 2017, pp. 1597–1600 (2017, August). https://doi.org/10.
1109/mwscas.2017.8053243
15. Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans. Sig. Process.
45(11), 2673–2681 (1997). https://doi.org/10.1109/78.650093
16. Ratnaparkhi, A.: A maximum entropy model for part-of-speech tagging. In: Proceedings of
the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 133–142 (1996)
17. Das, D., Petrov, S.: Unsupervised part-of-speech tagging with bilingual graph-based projec-
tions. In: ACL-HLT 2011—Proceedings of 49th Annual Meeting of the Association for
Computational Linguistics: Human Language Technologies, vol. 1, pp. 600–609 (2011)
18. Tomás, J., Mas, J.À., Casacuberta, F.: A quantitative method for machine translation evaluation,
pp. 27–34 (2003). https://doi.org/10.3115/1641396.1641401
19. Papineni, K., Roukos, S., Ward, T., Zhu, W.-J.: BLEU: a method for automatic evaluation of
machine translation. In: Proceedings of the 40th Annual Meeting of the Association for
Computational Linguistics (ACL), pp. 311–318 (2002). https://doi.org/10.3115/1073083.1073135
A Composite Framework
for Implementation of ICT Enabled
Road Accident Prediction Using Spatial
Data Analysis
Abstract Matching the growth of nations, road surface transportation in each
country has increased to its maximum capacity. The increased traffic has left very
little room for maintaining the roads and keeping them up to the mark for handling
these higher volumes of traffic. Moreover, the traditional method of road
maintenance is manual and highly time consuming. Due to this, many developed and
underdeveloped nations face the problem of under-maintained roads, which in turn
leads to road accidents. Accidents on the roads not only reduce a country's
ability to match the growth of industrial development but also threaten human
life, which is highly unacceptable. Thus, automated accident prediction is a need
of current research. Many parallel research outcomes have aimed to solve this
problem by analysing road traffic volume. However, many researchers, cited further
in this work, have shown that accidents on the road surface are caused by road
conditions rather than by traffic volume. Hence, this work proposes a novel
framework demonstrating the use of ICT-enabled methods for predicting
accident-prone zones by analysing road conditions. This work demonstrates nearly
90% accuracy for noise reduction, nearly 98% accuracy for road surface defect
detection and nearly 98% accuracy for predicting accident-prone zones, making road
surface transportation a much safer option.
D. A. Kumari (B)
Department of Computer Science, JNTUH, Hyderabad, India
e-mail: anithakumaridara@gmail.com
A. Govardhan
JNTUH, Hyderabad, India
e-mail: govardhan_cse@jntuh.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 465
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_50
466 D. A. Kumari and A. Govardhan
1 Introduction
Indian cities are among the fastest developing metropolises in the world. With the
growth of the economy, the population is continuously increasing in big and
medium-sized cities, along with living standards. The future strength of India
lies in its urban territories; therefore, it is crucial to develop these
environments. Smart city solutions best serve the needs of citizens to live a
safe, convenient and happy life, and the mission to develop smart cities in India
consists of many diverse tasks. One of the paths towards a smart city is better
road conditions. The road network acts as the principal network for smoothing the
progress of trade, transport, social assimilation and financial development. It
provides better accessibility, flexibility and reliability, and is thereby the
greatest advantage for economies of scale. According to NHAI, 60% of goods and 80%
of passenger traffic are carried by road. Among all modes of transport, road
transport is preferable for short-distance connectivity. Traffic in Indian
megacities is increasing day by day due to the increase in population, and the
number of vehicles has been growing at an average rate of 10.16% per annum over
the decade.
Cartesian coordinates in the plane can be used intuitively relative to one's
present location, in which case the x-axis points to the local north. More
formally, such coordinates can be obtained from three-dimensional coordinates
using the artifice of a map projection. It is impossible to map the curved surface
of the Earth onto a flat map surface without distortion. The compromise most often
chosen, called a conformal projection, preserves angles and length ratios, so that
small circles are mapped as small circles and small squares as squares.
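For small areas, such planar coordinates can be approximated around a reference point with a simple equirectangular projection; the Earth-radius constant is a standard mean value, the reference point is illustrative, and the north-pointing x-axis follows the convention described above.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def to_local_xy(lat, lon, ref_lat, ref_lon):
    """Project (lat, lon) to planar metres around a reference point using an
    equirectangular approximation; adequate only over small areas, where
    x points to local north and y to local east."""
    x = math.radians(lat - ref_lat) * EARTH_RADIUS_M  # northward distance
    y = math.radians(lon - ref_lon) * EARTH_RADIUS_M * \
        math.cos(math.radians(ref_lat))               # eastward distance
    return x, y

# One degree of latitude is roughly 111 km everywhere on the globe.
x, y = to_local_xy(18.0, 77.0, 17.0, 77.0)
```

Over the extent of a city this approximation is usually sufficient for mapping accident-prone zones; larger regions would require a proper conformal projection.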
The rest of the paper is organized as follows: in Sect. 2, the problem is
identified and listed for finding the solution in the next phase; in Sect. 2.1,
the proposed architecture is furnished; in Sect. 3, the comparative benefits are
listed against parallel research outcomes; and finally, Sect. 4 presents the
research conclusion.
2 Problem Identification
In this section of the work, the problem is identified and presented. Based on
recommendations collected from various research attempts, several approaches have
been implemented and studied in order to establish a process to automate the
maintenance cycle of roads. Nevertheless, the complete cost propositions of those
models are not fully justified. Adding to this complexity, road conditions are
captured under various lighting conditions and with a variety of capture devices,
so the images vary in quality and method of capture. A number of research attempts
have been carried out to detect road conditions based on potholes. Nonetheless,
the detection process is highly time-complex and delays the maintenance process
[4–6]. Further, the majority of parallel research outcomes fail to measure
A Composite Framework for Implementation of ICT Enabled Road … 467
multiple potholes in a single image and cannot distinguish potholes based on the
urgency of repair. Thus, this work defines a newer dimension of pothole detection
for road images containing a higher number of potholes in a single image, and
makes the process faster by reducing the chance of false detection. It has been
observed that preventive measures in road repair can make the road surface last
longer and can significantly reduce the time needed for heavy rebuild operations.
Nonetheless, while the potholes causing major problems on road surfaces are easily
visible, the cracks that will eventually become potholes cannot always be seen by
human eyes.
Having understood the problem, the proposed architecture is presented in the next
section of this paper.
• This work extracts the parameters for determining the existence of potholes as
the major outcome.
• Yet another outcome of this work is to classify the potholes based on the
urgency of repair.
• A further outcome of the work is to automate the detection facility to provide
timely maintenance alerts and deliver better road conditions in India.
• In addition, maintenance tasks demand suitable weather conditions, which are
difficult to predict. Situations have shown that maintenance work started without
knowledge of the weather had to be aborted, causing further delay and further
decay of the road conditions. Thus, recent research demands prediction of the road
condition in order to detect the potholes to be given higher priority, the cracks
to be considered for immediate repair and the patch works to be ignored during the
automation.
• The major outcome of this work is to build an automated framework to analyse and
predict road damage and recommend scheduled maintenance tasks with 100% accuracy,
in order to make the world more capable of better surface transport.
The proposed algorithms have already been discussed in other works by the same
author [1–3].
Henceforth, in the next section of this work, the proposed framework is compared
with the other parallel research outcomes.
Henceforth, in the next section of this work, the proposed framework is compared
with the other parallel research outcomes.
3 Comparative Analysis
As this work has already been demonstrated in various parts of the previous
sections, the final comparative analysis is carried out in this section [1].
First, for research objective 1, the noise reduction comparisons are furnished in
Table 1, and the results are visualized graphically in Fig. 2 [2, 6–9]. Second,
for research objective 2, the clustering comparisons are furnished in Table 2, and
the results are visualized graphically in Fig. 3. Finally, for research objective
3, the prediction accuracy comparisons are furnished in Table 3 [3, 9–12], and the
results are visualized graphically in Fig. 4.
4 Conclusion
In order to match the current trend of research, this work proposes a novel
framework for predicting road accident-prone zones on a live map. The work maps
the zones to coordinates, namely longitude and latitude, on the map. To achieve
higher prediction accuracy, in the first phase of the research this work deploys
three algorithms: the Adaptive Moment-Based Spatial Image Noise Detection and
Removal Algorithm (AMBSI-NDR) for reducing noise in the image data, which is
separated from the spatial data, and the Adaptive Logistic Correlation-Based
Missing Value Identification and Replacement Algorithm (ALC-MVIR) for
missing-value reduction in the textual data extracted from
References
1. Dara, A.K., Govardhan, A.: Noise reduction in spatial data using machine learning methods for
road condition data. Int. J. Adv. Comput. Sci. Appl. 11. https://doi.org/10.14569/ijacsa.2020.
0110120
2. Dara, A.K., Govardhan, A.: Parametric extraction of the road conditions spatial data and detec-
tion of defeats using pragmatic clustering method. Int. J. Eng. Adv. Technol. (IJEAT) 9(3)
(2020). ISSN 2249 – 8958
3. Dara, A.K., Govardhan, A.: Detection of coordinate based accident-prone areas on road surface
using machine learning methods. Int. J. Comput. Eng. Inf. Technol. (IJCEIT) 12(3) (2013).
E-ISSN 2412-8856
4. Ertürk, A., Çeşmeci, D., Güllü, M.K., Gerçek, D., Ertürk, S.: Integrating anomaly detection to
spatial preprocessing for endmember extraction of hyperspectral images. In: Proceedings of
IEEE Geoscience and Remote Sensing Symposium (IGARSS), pp. 1087–1090 (2013)
5. Ertürk, A., Çeşmeci, D., Güllü, M.K., Gerçek, D., Ertürk, S.: Endmember extraction guided
by anomalies and homogeneous regions for hyperspectral images, IEEE J. Sel. Topics Appl.
Earth Observ. Remote Sens. 7(8), 3630–3639 (2014)
6. Kanarachos, S., Christopoulos, S.R.G., Chroneos, A., Fitzpatrick, M.E.: Detecting anomalies
in time series data via a deep learning algorithm combining wavelets neural networks and
hilbert transform. Expert Syst. Appl. 85, 292–304 (2017)
7. Bello-Salau, H, Aibinu, A.M., Onumanyi, A.J., Onwuka, E.N., Dukiya, J.J., Ohize, H.: New
road anomaly detection and characterization algorithm for autonomous vehicles. Appl. Comput.
Inf. (2018). [online] Available https://doi.org/10.1016/j.aci.2018.05.002
8. Bayer, F.M., Kozakevicius, A.J., Cintra, R.J.: An iterative wavelet threshold for signal
denoising. Sig. Process. 162, 10–20 (2019)
9. Azhar, K., Murtaza, F., Yousaf, M.H., Habib, H.A.: Computer vision based detection and
localization of potholes in asphalt pavement images. In: 2016 IEEE Canadian Conference on
Electrical and Computer Engineering (CCECE), pp. 1–5 (2016, May)
10. Cai, Y., Wang, H., Chen, X., et al.: Trajectory-based anomalous behaviour detection for
intelligent traffic surveillance. IET Intell. Transp. Syst. 9(8), 810–816 (2015)
11. Wang, H., Klaser, A., Schmid, C., et al.: Action recognition by dense trajectories. In: Proceed-
ings of IEEE International Conference on Computer Vision and Pattern Recognition, Colorado
Springs, CO, USA, pp. 3169–3176 (2011)
12. Mahmood, Z., Haneef, O., Muhammad, N., et al.: Towards a fully automated car parking
system. IET Intell. Transp. Syst. 13, 293–302 (2019)
VISION AID: Scene Recognition
Through Caption Generation Using Deep
Learning
Abstract Visually impaired individuals rely heavily on their alternative senses,
such as hearing and touch, to comprehend the outside world. It is incredibly tough
for a visually handicapped individual to perceive objects without feeling them,
yet there can be times when physical contact between the individual and the object
is risky or deadly. This paper presents a real-time object recognition application
to aid the visually impaired. A camera-linked mobile phone with systematised
orientation provides input to a computing device for real-time object detection.
The proposed project utilises a convolutional neural network (CNN) to recognise
pre-trained objects in captured imagery and uses a recurrent neural network (RNN)
with LSTM for caption generation. A caption dataset is utilised for training the
captioning model; after training, these neural models can generate captions for
objects. The network output can then be analysed and conveyed to those with visual
impairment in audio format, by converting the generated captions to audio.
Exploratory outcomes on the MS-COCO dataset show that our design outperforms the
state of the art.
1 Introduction
In this age, where most applications benefit only the able-bodied, it is essential to create a device for guiding the visually challenged. Generally, these impaired individuals depend on the assistance of others to guide them. Unfortunately, there are scenarios where help may not be readily available or where the blind may be misled.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 473
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_51
474 M. Regi and M. Abraham
Taking these issues into consideration, we propose a modern application for the visually impaired. In this technological era, with strides towards progress in every sphere, the blind must not be left behind. This application aims to provide them with a better understanding of the world around them. Currently, aids such as spectacles, Braille and the walking stick help them tide over the impairment and go on with their lives.
The proposed project utilises a convolutional neural network model for recognising objects along with a recurrent neural network for generating captions. The captured imagery is described automatically to the blind person by converting the generated text to speech, without external help.
2 Related Works
Various methods have been designed for generating captions from images. This
section includes some of the important works done in the area of caption generation
using deep learning techniques.
Khademi et al. [1] propose a contextual, attention-based deep architecture for image caption generation. The architecture uses a bidirectional grid LSTM, which takes the visual aspects of an RGB image as input and learns its complex spatial patterns based on a two-dimensional context, by selecting or ignoring parts of its input. Region-grounded versions often elucidate features of the detected entities and their relationships in the images. For caption generation, the model integrates features from the grid LSTM with these region-grounded versions, using two bidirectional layers.
A region-based deep learning method [2] has been recommended to generate captions for imagery. It consists of a recurrent neural network (RNN) attribute predictor, a region-based object detector and an encoder-decoder language generator fitted with dual RNNs to create meaningful descriptions of the given imagery. It uses the R-CNN architecture to detect objects and an encoder-decoder RNN model to generate sentences. The IAPR TC-12 dataset enables the evaluation process.
In paper [3], a multilayer dense attention architecture is proposed for generating image captions. Faster R-CNN is used to obtain imagery features, and an LSTM decodes the multilayer dense attention architecture to produce the caption text. The model's overall architecture follows the encoder-decoder format and is split into two levels: bottom-up attention and top-down attention. The first mechanism extracts image regions, and the second produces the relevant caption word at each time step. The model is evaluated on various datasets such as MS-COCO, Flickr and Chinese-AI.
The method proposed in paper [4] uses a cascade recurrent neural network (CRNN) to generate image captions. CRNN uses a cascade network to generate the captions of images and can exploit in-depth semantic context present in the imagery. Unlike the conventional MRNN, CRNN comprises a front-end and a back-end network, linked to obtain visual-language interactions from two sides. Here, a stacked gated recurrent unit is built with dual hidden layers that extend the RNN vertically and thus capture meaningful correlations between images and sentences. The back-end network extracts semantic context in both forward and backward directions to predict words: it transfers the knowledge acquired in the front-end as the initial state and feeds sentences in reverse into the back-end system. The efficacy of CRNN is confirmed on the MS-COCO dataset.
3 Proposed System
This proposed system is a real-time scene-capturing application that guides visually impaired individuals. The application captures the image of a scene and delivers its description in an audible format. Users can thereby understand what objects surround them through a camera-aligned smartphone, reducing the risk of accidents.
The layout of the recommended device is explained in Fig. 1. Initially, shaking the mobile activates the application, and the camera starts taking pictures. The picture is then sent to the server, where the weight file is stored for predicting the caption. The MS-COCO dataset [5] is employed to train the network.
Generating captions requires a large amount of image data. Varied image datasets such as Flickr30k, Flickr8k, MS-COCO, SBU and Pascal are easily accessible. MS-COCO is the most recent and possibly the most popular systematised dataset. It has 82,783 images for training and 40,504 for both testing and validation, and each image comes with five captions. The current model is trained with the MS-COCO dataset, which is used extensively for network training and testing.
Initially, a pre-trained convolutional neural network (CNN) with the VGG19 architecture preprocesses the image, and the output is given to an RNN to generate descriptions. The generated captions are then saved in a text file and handed over to the mobile. Next, the caption text is converted into speech using a text-to-speech API and given back to the visually impaired user. The major steps of the proposed system are described below.
The VGG19 network begins with two sets of two convolution layers having 64 and 128 filters, respectively, followed by a set of 4 convolution layers having 256 filters. Next, 2 sets have 4 convolution layers each, with 512 filters. The max pooling layers between each set of convolution layers have 2 × 2 filters with a stride of 2 pixels. The output of the last pooling layer is flattened and sent to a fully connected layer, whose output is fed to another similar layer with 1000 neurons. All these layers are ReLU activated. Finally, a softmax layer outputs a vector representing the probability distribution over a list of outcomes. The convolution layers and fully connected layers have trainable weights, the max pooling layers decrease the size of the input imagery, and the softmax layer makes the final decision.
The system takes a (224, 224, 3) RGB image from the MS-COCO dataset as input for training. After training, the network is capable of detecting objects in the scene.
Here, we utilise RNN-based attribute classification [7]. Research shows that RNNs benefit varied spheres of machine learning, including caption generation and machine translation. RNNs are employed here because of their capability to effectively predict the attributes that must be reported for the prescribed set of features.
The RNNs used in this work are word based and use the LSTM [8] architecture. At test time, the CNN obtains the image features, which are then used to predict attributes one at a time. Each prediction depends on the extracted image features in combination with previously generated terms. The model keeps producing attributes until the assigned STOP token is generated, i.e., when the RNN concludes that no further attribute can describe the image given its features and the previously produced attributes (Table 1) [2].
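The word-by-word generation loop described above can be sketched as follows; `predict_next` is a toy stand-in for the trained CNN + LSTM (here a purely hypothetical lookup table), and the STOP token ends the caption:

```python
# Toy stand-in for the trained model: maps the words generated so far to
# the next word. A real model conditions on the extracted image features.
def predict_next(image_features, generated):
    table = {
        (): "a",
        ("a",): "dog",
        ("a", "dog"): "running",
        ("a", "dog", "running"): "STOP",
    }
    return table.get(tuple(generated), "STOP")

def generate_caption(image_features, max_len=20):
    words = []
    while len(words) < max_len:
        word = predict_next(image_features, words)
        if word == "STOP":   # RNN decides nothing more describes the image
            break
        words.append(word)
    return " ".join(words)

caption = generate_caption(image_features=None)
```

Each prediction is conditioned on the previously generated words, exactly as the loop above illustrates.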
The captions generated thereby are more explanatory, in terms of features and object recognition details, than those produced by other prevailing studies. Hence, the MS-COCO dataset is ideally suited for training and evaluation; owing to the richness of its descriptions, it is superior to other popular databases for this system.
Both the CNN and the RNN are utilised to generate captions [9, 10]. The network is trained on the MS-COCO dataset, where each image is paired with five captions. To hasten training, each image is pre-encoded into its feature sequence. Since the captions may contain a large number of unique terms, one-hot word encoding is not used; instead, a trained embedding layer maps each word into a vector of shape (1, 128). The LSTM architecture is used to generate captions. During training, the network learns how to develop descriptions for images through analysis of the provided dataset.
After training, a weight model is formed which contains all the learned weights of the network. The test image, in vector form, is fed as input to the weight model to create the captions.
Overall, the CNN- and RNN-based object and attribute estimations are very effective in generating highly meaningful sentences.
The proposed caption generation mobile app is developed with React Native, a JavaScript framework used to write genuine, native mobile apps for Android and iOS devices. Based on React, Facebook's JavaScript library for building user interfaces, it targets mobile platforms rather than web browsers. React Native easily enables simultaneous development for both Android and iOS.
Our application is created to work on Android devices. As a blind aid, the user can activate the application with a gesture such as shaking the mobile. The sensor event listener contains the sensor manager, which notifies the application whenever it receives sensor data. The two components used to detect whether a shake has occurred are the accelerometer sensor and the sensor manager. The camera activity commences if the measured acceleration exceeds the set threshold value; otherwise, it is not triggered. Camera activation enables the user to capture images.
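A minimal sketch of this shake-detection logic follows, assuming a hypothetical threshold on the acceleration magnitude (the real app reads the Android accelerometer through the sensor manager; the threshold value here is illustrative only):

```python
import math

SHAKE_THRESHOLD = 12.0   # assumed threshold (m/s^2); the real app tunes this

def is_shake(ax, ay, az):
    """Return True when the acceleration magnitude exceeds the threshold."""
    magnitude = math.sqrt(ax**2 + ay**2 + az**2)
    return magnitude > SHAKE_THRESHOLD

def on_sensor_event(ax, ay, az, start_camera):
    # Mirrors the sensor-event listener: trigger the camera only on a shake.
    if is_shake(ax, ay, az):
        start_camera()
```

Gravity alone (about 9.8 m/s²) stays below the threshold, so the camera activity is only triggered by a deliberate shake.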
The saved mobile imagery is fed as input to the trained system, which generates captions for the images. The caption is saved into a text document, directed to the mobile and converted into speech. The text-to-speech API converts the generated captions into an audible format, similar to human speech, for the blind user.
4 Evaluation Results
For evaluating the performance of this system in generating image captions, we use the BLEU score to compare our model with other existing models. Different evaluation metrics exist for evaluating description generation, such as BLEU, ROUGE and METEOR.
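As an illustration of how such scores are computed, the following is a minimal clipped unigram precision, the 1-gram core of BLEU (full BLEU also uses higher-order n-grams and a brevity penalty, so this is a simplified sketch, not the complete metric):

```python
from collections import Counter

def bleu1(candidate, reference):
    """Clipped unigram precision, the 1-gram core of BLEU."""
    cand = candidate.split()
    ref_counts = Counter(reference.split())
    # Each candidate word is credited at most as often as it appears
    # in the reference ("clipping").
    clipped = sum(min(c, ref_counts[w]) for w, c in Counter(cand).items())
    return clipped / len(cand)

score = bleu1("a dog runs on grass", "a dog is running on the grass")
```

Here four of the five candidate words appear in the reference, giving a score of 0.8; library implementations extend this to 2-, 3- and 4-grams.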
5 Conclusion
Today, modern technology has grown by leaps and bounds. It can be harnessed efficiently to create a device that helps the blind live fuller lives. The proposed system will provide them with a better understanding of their surroundings and make them more independent. Our project thus aims to develop a user-friendly application that can guide the visually impaired in our society. The proposed system focuses on generating captions for varied images: the Android application generates meaningful sentences for images captured by the blind user's camera-aligned smartphone and then speaks out the captions formed, for the benefit of the visually impaired.
As future work, the system could generate captions more precisely if it were transferred to video input. Issues such as out-of-focus or blurred imagery could be solved by utilising video as the input to the system. The efficiency of the network helps tide over any delays in generating captions for images fed to the server. The accuracy of prediction can be increased further through high-quality datasets and efficient training.
References
1. Khademi, M., Schulte, O.: Image caption generation with hierarchical contextual visual spatial attention. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, pp. 2024–20248 (2018). https://doi.org/10.1109/CVPRW.2018.00260
2. Kinghorn, P., Zhang, L., Shao, L.: A region-based image caption generator with refined descriptions. Neurocomputing (2018). https://doi.org/10.1016/j.neucom.2017.07.014
3. Wang, E.K., Zhang, X., Wang, F., Wu, T., Chen, C.: Multilayer dense attention model for image caption. IEEE Access 7, 66358–66368 (2019). https://doi.org/10.1109/ACCESS.2019.2917771
4. Wu, J., Hu, H.: Cascade recurrent neural network for image caption generation. Electron. Lett. 53(25), 1642–1643 (2017)
5. Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision – ECCV 2014. Lecture Notes in Computer Science, vol. 8693. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
6. ResearchGate, Fig. 8: Illustration of the network architecture of VGG-19 model: conv means convolution, FC means fully connected. https://www.researchgate.net/figure/llustration-of-the-network-architecture-of-VGG-19-model-conv-means-convolution-FC-means_fig2_325137356
7. Wu, Q., Shen, C., Wang, P., Dick, A., van den Hengel, A.: Image captioning and visual question answering based on attributes and external knowledge. IEEE Trans. Pattern Anal. Mach. Intell. (2017). https://doi.org/10.1109/tpami.2017.2708709
8. Poghosyan, A., Sarukhanyan, H.: Long short-term memory with read only unit in neural image caption generator. IEEE Comput. Sci. Inf. Technol. (2017). https://doi.org/10.1109/csitechnol.2017.8312163
9. Kumar, N.K., Vigneswari, D., Mohan, A., Laxman, K., Yuvaraj, J.: Detection and recognition of objects in image caption generator system: a deep learning approach. In: 2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS), Coimbatore, India, pp. 107–109 (2019). https://doi.org/10.1109/ICACCS.2019.8728516
10. Luo, R.C., Hsu, Y., Wen, Y., Ye, H.: Visual image caption generation for service robotics and industrial applications. In: 2019 IEEE International Conference on Industrial Cyber Physical Systems (ICPS), Taipei, Taiwan, pp. 827–832 (2019). https://doi.org/10.1109/ICPHYS.2019.8780171
11. Ren, Z., Wang, X., Zhang, N., Lv, X., Li, L.-J.: Deep reinforcement learning based image captioning with embedding reward. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017). https://doi.org/10.1109/cvpr.2017.128
12. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., Zemel, R., Bengio, Y.: Show, attend and tell: neural image caption generation with visual attention. In: Proceedings of the International Conference on Machine Learning, pp. 2048–2057 (2015)
13. Lu, J., Xiong, C., Parikh, D., Socher, R.: Knowing when to look: adaptive attention via a visual sentinel for image captioning. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, pp. 3242–3250 (2017). https://doi.org/10.1109/CVPR.2017.345
Effect of Hybrid Multi-Verse with Whale
Optimization Algorithm on Optimal
Inventory Management in Block Chain
Technology with Cloud
Abstract One of the important tasks of supply chain management is optimal inventory control. Optimal inventory control techniques aim to minimize the supply chain cost by efficiently managing the inventory. This paper analyzes the influence of a hybrid of Multi-Verse Optimization (MVO) and the Whale Optimization Algorithm (WOA), termed the Whale-based Multi-Verse Optimization Algorithm (W-MVO), on optimal inventory management in block chain under the cloud sector. Costs such as transaction cost, inventory holding cost, shortage cost, transportation cost, time cost, setup cost, back-ordering cost and quality improvement cost are considered in deriving the multi-objective model. The effectiveness of the proposed hybrid algorithm is analyzed by varying the Travelling Distance Rate (TDR) from 0.2 to 1.2, and the model is evaluated with the assistance of block chain under the cloud sector.
1 Introduction
The survival of organizations in this competitive world depends on how well they control their inventories. Most manufacturing organizations hold several kinds of inventory: work-in-process material that is not yet completed, raw materials that are waiting to be processed through production, and finished goods held for sale by the organization. The best inventory control strategies drive the enhancement of the organization [1].
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 483
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_52
484 C. Govindasamy and A. Antonidoss
2 Literature Review
Although there were several inventory management models, still there exist various
challenges that have to be resolved in the future. Integrated research framework
[2] enhances the behaviour of inventory control systems, and it also reduces the
inventory-related costs. But, it does not consider the inconsistent and changing dates
of expiry in various groups of received orders, and it also does not consider the up-to-
date behavioural factors in healthcare. The two-stage stochastic programming model
[3] is sufficiently adaptable, and it also reduces the present target levels to minimize
the total cost and wastage. Still, it does not involve the feasibility regarding the
hospital in a network. Holistic Mixed Integer Linear Programming (MILP) model
[4] permits the dynamic inventory management, and interacting pumping runs, and it
also quickly proves the optimality. However, it does not manually assign the starting
products inside the pipeline, and it influences both the Central Processing Unit (CPU)
time and the solution quality. Simultaneous Equation Modelling (SEM) [5] provides
the concurrent and associated relationship among the demand simulation impact and
the sales impact, and it also provides various methods for producing the resulted
goods. Yet, it has limitations of data availability, and it also does not differentiate
and compute the predicting accuracy of every system. Continuous-time scheduling
model [6] accomplishes the optimization of depot inventory management and multi-
product pipeline transportation. But there exist few challenges among the realistic
applications and the developed work. These challenges motivated to analyze the
influence W-MVO on optimal inventory management in block chain under the cloud
sector.
Parameters of the inventory cost: The term ho^2_{1j} represents the holding cost of the final product j at the manufacturer (index 1 denotes the manufacturer). tc^1_{n1k} indicates the transportation cost of raw material k from supplier n to the manufacturer. sc_{oj} denotes the shortage cost of product j for distributor o. oc^2_{1oj} denotes the fixed order cost of the final product j from distributor o to the manufacturer. ho^1_{nk} represents the holding cost of raw materials at supplier n. ho^3_{oj} denotes the holding cost of the final product at distributor o. tc^2_{1oj} represents the transportation cost of the final product j from the manufacturer to distributor o. oc^1_{n1k} represents the fixed order cost of raw material k from the manufacturer to supplier n. de_{oj}(tim) represents the demand for raw materials from the manufacturer during time tim.
In^2_{1j}(tim) denotes the real-time inventory of the finished product during time tim. In^1_{nk}(tim) represents the real-time inventory of raw material k at supplier n during time tim. In^3_{oj}(tim) represents the real-time inventory of the finished product at distributor o during time tim (In^1_{nk}(tim), In^2_{1j}(tim) and In^3_{oj}(tim) are non-negative integers).
Parameters of the time cost: ve^2_{1oj} denotes the delayed transportation cost of the final product j from the manufacturer to distributor o. tr^2_{1oj} denotes the delayed transit time of the final product j from the manufacturer to distributor o. ve^1_{n1k} represents the delayed transportation cost of raw material k from supplier n to the manufacturer. tr^1_{n1k} represents the delayed transit time of raw material k from supplier n to the manufacturer. The indices are: k = 1, 2, ..., K, the index of raw materials; n = 1, 2, ..., N, the index of supplier inventories; j = 1, 2, ..., J, the index of final products; u = 1, 2, ..., U, the index of time periods; and o = 1, 2, ..., O, the index of distributor inventories.
Parameter initialization of remaining costs: Let the cost of each item be represented as Ic_1, Ic_2, ..., Ic_j, where j indexes the finished products. The additional cost to improve quality is represented as Ac_1, Ac_2, ..., Ac_j, where j again indexes the finished products. The supplier setup cost is represented as As_1, As_2, ..., As_n, where n denotes the number of suppliers. The manufacturer setup cost is represented as Am. The distributor setup cost is represented as Ad_1, Ad_2, ..., Ad_n, where n represents the number of distributors.
(Figure: overall architecture of the proposed model — supplier, manufacturer and distributor linked through block chain in the cloud, with a cost function comprising transaction cost, shortage cost, transportation cost, time cost and setup cost.)
costs are minimized, and the finally obtained optimal solution is linked to every distributor and stored in the cloud with the help of block chain. The completed optimal solution of each distributor is safeguarded and is not visible to the other distributors.
The usage of optimization algorithms has gained high attention among researchers [8]. Multi-Verse Optimization (MVO) [9] is inspired by the multi-verse theory of cosmology, in which multiple big bangs give rise to multiple universes. Although it has several advantages, it suffers from various disadvantages, such as the absence of binary and multi-objective versions. Therefore, to overcome these disadvantages, the Whale Optimization Algorithm (WOA) is integrated into it, and the resulting algorithm is known as W-MVO. WOA [10, 11] is a nature-inspired meta-heuristic technique with the capacity to handle different problems; compared with other optimization algorithms, WOA has advantages such as strong exploration and exploitation capabilities. The two optimization techniques are integrated to generate a hybrid optimization algorithm. Generally, in the conventional MVO, if ran2 < WEP the solution is updated using Eq. (1), and if ran2 ≥ WEP the same solution is retained. In the proposed W-MVO, if ran3 < 0.5 the solution is updated using Eq. (1) of MVO; otherwise, if ran2 ≥ WEP, the location of the individual is updated using the WOA based on Eq. (2).
c_g^p = C_g + TDR × ((ub_g − lb_g) × ran4 + lb_g),   if ran3 < 0.5 and ran2 < WEP
c_g^p = C_g − TDR × ((ub_g − lb_g) × ran4 + lb_g),   if ran3 ≥ 0.5 and ran2 < WEP
c_g^p = c_g^p (unchanged),                           if ran2 ≥ WEP        (1)
In the above equations, r represents the iteration, c the solution, C a coefficient, c* the location of the prey, H = |c*(r) − c(r)| the distance of the whale to the prey, ran a random number in the interval [−1, 1], and · element-by-element multiplication.
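A one-dimensional sketch of this hybrid update rule is given below. Since Eq. (2) is not reproduced in the text, the WOA branch here assumes the standard encircling move c ← c* − A·|C·c* − c| from the original WOA; the coefficient values and the `wmvo_update` name are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

def wmvo_update(c, best, lb, ub, TDR, WEP, A=0.5, C=1.0):
    """One-dimensional W-MVO position update (sketch).

    If ran2 < WEP, apply the MVO travelling-distance move around the best
    solution per Eq. (1); otherwise apply a WOA-style encircling move
    (assumed form of Eq. 2).
    """
    ran2, ran3, ran4 = rng.random(3)
    if ran2 < WEP:
        step = TDR * ((ub - lb) * ran4 + lb)
        return best + step if ran3 < 0.5 else best - step
    H = abs(C * best - c)            # distance of whale to prey
    return best - A * H

new = wmvo_update(c=0.3, best=0.7, lb=0.0, ub=1.0, TDR=0.6, WEP=0.8)
```

Raising TDR widens the MVO step around the best solution, which is exactly the effect studied in the TDR-variation experiments.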
1. Transaction cost: It is the total fixed order cost of the raw materials and final products as in Eq. (3).

O = Σ_{n=1}^{N} Σ_{k=1}^{K} oc^1_{n1k} + Σ_{o=1}^{O} Σ_{j=1}^{J} oc^2_{1oj}    (3)
2. Inventory holding cost: It involves the cost of the manufacturer, suppliers and distributors as in Eq. (4).

H = Σ_{n=1}^{N} Σ_{k=1}^{K} ho^1_{nk} · In^1_{nk}(tim) + Σ_{j=1}^{J} ho^2_{1j} · In^2_{1j}(tim) + Σ_{o=1}^{O} Σ_{j=1}^{J} ho^3_{oj} · In^3_{oj}(tim)    (4)
3. Shortage cost: It is defined by the demand for raw materials from the manufacturer de_{oj}(tim), the shortage cost sc_{oj} and the real-time inventory of the final product In^2_{oj}(tim), as in Eq. (5).

S = Σ_{o=1}^{O} Σ_{j=1}^{J} sc_{oj} · (de_{oj}(tim) − In^2_{oj}(tim))    (5)
4. Transportation cost: It involves the transportation costs between the suppliers and the manufacturer and between the manufacturer and the distributors, as in Eq. (6).

Tr = Σ_{n=1}^{N} Σ_{k=1}^{K} tc^1_{n1k} · In^1_{nk}(tim) + Σ_{o=1}^{O} Σ_{j=1}^{J} tc^2_{1oj} · In^2_{1j}(tim)    (6)
5. Time cost: It involves the time cost between distributors and the manufacturer and between the manufacturer and the suppliers as in Eq. (7).

T = Σ_{n=1}^{N} Σ_{k=1}^{K} ve^1_{n1k} · tr^1_{n1k} + Σ_{o=1}^{O} Σ_{j=1}^{J} ve^2_{1oj} · tr^2_{1oj}    (7)
6. Setup cost: It is defined as the cost sustained to get equipment prepared to process a divergent quantity of goods as in Eq. (8).

Sec = Σ_{n=1}^{N} Σ_{k=1}^{K} As_n · tc^1_{n1k} · In^1_{nk}(tim) + Am + Σ_{o=1}^{O} Ad_o · [Σ_{j=1}^{J} tc_{ok} · In^2_{kj}(tim)]    (8)
7. Quality improvement cost: It is given in Eq. (9).

QIC = Σ_{k=1}^{K} P_k · Ac_j    (9)

8. Back-ordering cost: It is given in Eq. (10).

BOC = Σ_{k=1}^{K} Ic_j · b_j    (10)
Here, Ic represents the item cost, and b denotes the number of backorders.
The multi-echelon supply chain inventory model’s objective function is given in
Eq. (11).
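Since Eq. (11) itself is not reproduced above, the sketch below only illustrates the natural reading of the objective as the sum of the eight cost components listed in the abstract (an assumed form, not the paper's exact formula; the component values are made up for illustration):

```python
def total_cost(costs):
    """Sum of the eight cost components (assumed form of the objective)."""
    components = ("transaction", "holding", "shortage", "transportation",
                  "time", "setup", "back_ordering", "quality_improvement")
    return sum(costs[name] for name in components)

# Illustrative (made-up) component values for one candidate solution.
example = {
    "transaction": 120.0, "holding": 80.0, "shortage": 40.0,
    "transportation": 150.0, "time": 30.0, "setup": 60.0,
    "back_ordering": 25.0, "quality_improvement": 15.0,
}
cost = total_cost(example)
```

In the optimization loop, this total cost is the fitness value that W-MVO minimizes for each candidate inventory plan.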
The developed inventory management in block chain technology under the cloud sector was implemented in MATLAB 2018a, and the analysis was executed. The characteristics of the developed technique were analyzed over three test cases. The total population size was set to 10, and a maximum of 1000 iterations was performed. The behaviour of the developed W-MVO was assessed through algorithmic and statistical analysis while varying the travelling distance rate over 0.2, 0.4, 0.6, 0.8, 1.0 and 1.2.
6 Conclusion
This paper analyzed the influence of hybrid W-MVO on optimal inventory manage-
ment in block chain under the cloud sector. The costs like transaction cost, inventory
holding cost, shortage cost, transportation cost, time cost, setup cost, back-ordering
cost and quality improvement cost were considered for deriving the multi-objective
model. The effectiveness of the proposed hybrid algorithm was analyzed by varying
the TDR value, and the model was evaluated with the assistance of block chain under
the cloud sector. Moreover, from the analysis, the cost function of the proposed W-MVO is maximum at TDR = 1.2. Thus, it can be concluded that the proposed W-MVO-based block chain under the cloud sector performed effectively when analyzed with various TDR values.
Fig. 3 Algorithmic analysis of the proposed W-MVO for inventory management in block chain under the cloud sector by varying the TDR for 'a test case 1, b test case 2 and c test case 3'
References
6. Yu, L., Chen, M., Xu, Q.: Simultaneous scheduling of multi-product pipeline distribution and
depot inventory management for petroleum refineries. 220 (2020)
7. Wang, Y., Geng, X., Zhang, F., Ruan, J.: An immune genetic algorithm for multi-echelon inventory cost control of IoT based supply chains. IEEE Access 6, 8547–8555 (2017)
8. Rajakumar, B.R.: Impact of static and adaptive mutation techniques on genetic algorithm. Int.
J. Hybrid Intell. Syst. 10(1), 11–22 (2013)
9. Mirjalili, S., Mirjalili, S.M., Hatamlou, A.: Multi-verse optimizer: a nature-inspired algorithm
for global optimization. Neural Comput. Appl. 27, 495–513 (2016)
10. Mirjalili, S., Lewis, A.: The whale optimization algorithm. Adv. Eng. Softw. 95, 51–67 (2016)
11. Beno, M.M., Valarmathi, I.R., Swamy, S.M., Rajakumar, B.R.: Threshold prediction for
segmenting tumour from brain MRI scans. Int. J. Imaging Syst. Technol. 24(2), 129–137
(2014)
Bottleneck Feature Extraction in Punjabi
Adult Speech Recognition System
Abstract In this paper, the bottleneck feature extraction technique with an MLP is applied to Punjabi adult speech recognition. Nowadays, neural networks are among the most widely used approaches for training and testing such systems; they help to estimate the posterior probabilities over the phoneme set. This input information can at some point become entangled, making it difficult to train on a Hidden Markov Model (HMM) based state-of-the-art system. Here, a context-based model is trained on a Deep Neural Network (DNN) and after that on a Bottleneck Neural Network (BN-NN) system with the use of a Multi-Layer Perceptron (MLP). The baseline ASR is performed under different environment conditions on different modelling systems. To improve the performance of the system, an MLP-based supervised learning method is utilized on data from adjoining voice frames to change the design of the deep neural network by extracting the bottleneck features. Finally, the MLP features are used as input for the DNN-HMM and BN-NN state-of-the-art systems. This paper presents the large improvement obtained by applying the MLP feature vector: a relative improvement of 4.03% is achieved on the Punjabi ASR while varying several attributes associated with the BN-NN and DNN-HMM modelling approaches.
S. Bala · V. Bhardwaj
Chitkara University Institute of Engineering & Technology, Chitkara University, Rajpura, Punjab,
India
e-mail: Shashi.bala@chitkara.edu.in
V. Bhardwaj
e-mail: vivek.bhardwaj@outlook.in
V. Kadyan (B)
Department of Informatics, School of Computer Science, University of Petroleum and Energy
Studies, Dehradun, India
e-mail: ervirenderkadyan@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 493
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_53
494 S. Bala et al.
1 Introduction
Neural networks have become a part of day-to-day human life over the past few years as we move rigorously towards human-machine interaction. However, how to recognize the patterns of a speech process has remained a major concern for many researchers. The need for pattern recognition gave rise to many techniques such as HMM and GMM [1–3]. Besides these, probabilistic bottleneck approaches [4–6] (e.g. MLP based) to HMM-GMM [1–3] acoustic modelling have additionally been investigated as an alternative methodology to the Hidden Markov Model system [1–3]. With regard to estimating posterior likelihoods, NN-based feature extraction with two hidden layers can be considered a process of non-linear feature transformation, while the BN-NN approach is utilized as a non-linear discriminative analysis that can be interpreted as a dimensionality reduction method in the state-of-the-art framework. The BN-NN features are basically concatenated with MFCC, which serves as the basis for the posterior features. In the light of the ongoing success of deep neural systems in hybrid acoustic modelling, the initial step is the calculation of BN features, as taken already []. In order to fuse state-of-the-art information into HMM-GMM, as [4] showed, combining BN-NN with various related ideas resulted in better performance of the system. Additionally, an MLP-NN is trained on MFCC features with a five-layer BN structure, where the neural network output is evaluated as phoneme posteriors. Consequently, we have built the BN-NN features on MFCC features. To begin with, the output layer is optimized; second, the resulting structure is used for MLP training for the output of the DNN-HMM and BN-NN linear transformation for Punjabi speech recognition.
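A bottleneck MLP of the kind described can be sketched in NumPy as follows; the layer sizes (39-dimensional MFCC input, 512-unit hidden layer, 40-unit bottleneck) are assumptions for illustration, and the random weights stand in for a trained network:

```python
import numpy as np

rng = np.random.default_rng(1)

def layer(x, w, b):
    return np.tanh(x @ w + b)        # non-linear feature transformation

# Hypothetical sizes: 39-dim MFCC input, a 512-unit hidden layer, and a
# 40-unit bottleneck layer (the small internal layer).
W1, b1 = rng.normal(size=(39, 512)), np.zeros(512)
W2, b2 = rng.normal(size=(512, 40)), np.zeros(40)   # bottleneck layer

def bottleneck_features(mfcc):
    h1 = layer(mfcc, W1, b1)
    bn = layer(h1, W2, b2)             # narrow layer: dimensionality reduction
    return np.concatenate([mfcc, bn])  # BN features concatenated with MFCC

feat = bottleneck_features(np.zeros(39))
```

The narrow layer forces a compact representation of each frame, and the concatenated 79-dimensional vector is what would feed the downstream DNN-HMM or BN-NN system.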
The rest of this paper is organized as follows: Sect. 2 reviews related work on BN features, and Sect. 3 gives a complete description of the BN-NN. In Sect. 4, the whole system overview for BN feature extraction is summarized. Section 5 describes the experimental setup with the corpus table. Results and analysis are reported in Sect. 6, and conclusions are drawn in Sect. 7.
2 Related Work
In [5], the authors generated bottleneck features using an ANN structure, obtaining
a 33.3% WER, a 2.5% improvement over the HLDA-PLP baseline of the state-of-the-art
system. [7] presented bottleneck features for an LVCSR dataset with some reduction
in the system's WER, while [8] explored deep neural nets with rectified linear units
for LVCSR, utilizing a Bayesian optimization approach for a relative improvement of
the system. In [4], the authors presented a system for training multilingual MLPs
and, likewise, characterized the use of a language-dependent layer on top of the
traditional three layers used to derive phoneme posteriors. This methodology allows
sharing assets across languages without needing to develop a common phoneme set.
Likewise, Morgan [9] proposed a novel method to
Bottleneck Feature Extraction in Punjabi Adult Speech … 495
manage multi-layer perceptron factor analysis: a five-layer MLP with a normalized
linear bottleneck layer can outperform a three-layer MLP system using MFCC alone.
Turning to bottlenecks in noisy conditions, Seltzer et al. [1] presented DNN-based
acoustic modelling, finding that DNNs can match ASR system performance on Aurora
without explicit noise modelling. Kadyan et al. [3] described various normalization
approaches, applying RASTA channel normalization to the features before input to
the MLP and obtaining an 18% relative improvement in WER.
3 Theoretical Background
3.1 Bottleneck
This approach was presented by Grezl et al. [10] and can be interpreted as a
non-linear dimensionality-reduction method. It is fundamentally based on the MLP
approach, where one internal layer has a small number of hidden units relative to
the other hidden layers. This layer imposes a constraint on the network: it must be
able to produce compressed features after forcing the dimensionality reduction.
Bottleneck features can be derived using both unsupervised and supervised methods
[11]. In supervised training, a decoder is used to train the acoustic model in
several languages and conditions [4–6, 10]. The system comprises an encoder and a
decoder, as shown in Fig. 1.
The input vector x is encoded to a hidden layer h, which calculates the posterior
probability over HMM states. x is encoded to h by a non-linear activation function
σ, using a learned weight matrix W^(1) and bias vector b^(1) as follows:
Fig. 1 Structure of bottleneck feature with decoder
h = σ(W^(1) x + b^(1))  (1)
After that, the input is decoded from the hidden layer to produce a reconstructed
layer y, using a learned weight matrix W^(2) and bias vector b^(2) as follows:
y = W^(2) h + b^(2)  (2)
The autoencoder parameters θ = {(W^(1), b^(1)), (W^(2), b^(2))} are learned using
the back-propagation algorithm by minimizing the mean square error (MSE) loss
m_MSE, defined as:

M_MSE(θ) = (1/d) m_MSE(x, y)  (3)
The learning process attempts to minimize the prediction error L(x, y) with respect
to the parameters θ = {(W^(1), b^(1)), (W^(2), b^(2)), …, (W^(L), b^(L))}.
Typically, the loss function in an MLP is the cross-entropy error function [12].
Bottleneck features provide more effective information while preserving enough
information from the original input features.
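As a concrete illustration of Eqs. (1)–(3), the forward pass of such a bottleneck autoencoder can be sketched in a few lines of NumPy; the layer sizes, random weights and sigmoid activation below are illustrative choices, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d_in, d_bn = 39, 13                                  # input dim and bottleneck size (illustrative)
W1 = rng.normal(0, 0.1, (d_bn, d_in)); b1 = np.zeros(d_bn)   # encoder parameters
W2 = rng.normal(0, 0.1, (d_in, d_bn)); b2 = np.zeros(d_in)   # decoder parameters

x = rng.normal(size=d_in)                            # one input feature vector

# Eq. (1): bottleneck code h = sigma(W1 x + b1)
h = sigmoid(W1 @ x + b1)

# Eq. (2): reconstruction y = W2 h + b2
y = W2 @ h + b2

# Eq. (3): mean square error, averaged over the input dimension d
mse = np.mean((x - y) ** 2)
print(h.shape, y.shape)                              # (13,) (39,)
```

Training would back-propagate the gradient of this MSE through both layers; the 13-unit bottleneck activations h are what get concatenated with MFCCs as features.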
4 System Overview
The bottleneck-NN based Punjabi ASR system is described in Fig. 2. First, the system
is trained and tested on the bottleneck NN. For evaluating accuracy, the front-end
MFCC feature extraction technique is used in the BN-NN based ASR solution. So, to
improve its performance, the BN-NN-based ASR is trained on MLP features using the
KALDI toolkit [13].
Fig. 2 Block diagram summarizing the BN-NN features for enhancing Punjabi ASR system
In the training and testing phases, the input speech is processed using a 20 ms
window with a frame rate of 100 Hz and a pre-emphasis factor of 0.97. The extracted
input frames are converted into the frequency domain by the DFT, which removes the
phase information from the short-term spectrum. The Fourier signal is then passed
through 25 filter banks. Finally, a DCT is applied to the Mel-frequency spectrum,
which efficiently delivers sets of de-correlated cepstral coefficients; these
coefficients are used to discard higher-order information. The output obtained is
therefore 13 default coefficients with a splicing factor: a context of 9 frames in
total, 4 on the left and 4 on the right, is analysed. The main output, the 13
default MFCCs, is processed with HMM modelling; following feature extraction, the
monophones and the Δ + ΔΔ (delta and delta-delta) triphone features are computed on
tri2 models. These Δ + ΔΔ features are then joined with the static cepstra into a
39-dimensional feature vector.
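The Δ + ΔΔ construction and the 9-frame context splicing described above can be sketched as follows; the regression-based delta computation is a common convention and stands in for whatever exact window the toolkit uses.

```python
import numpy as np

def deltas(feats, N=2):
    """Delta regression over +/-N frames (edge frames repeated at the boundaries)."""
    T, D = feats.shape
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    return sum(n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
               for n in range(1, N + 1)) / denom

T = 100
mfcc = np.random.randn(T, 13)          # 13 static cepstra per frame (synthetic stand-in)
d1 = deltas(mfcc)                       # delta
d2 = deltas(d1)                         # delta-delta
feats39 = np.hstack([mfcc, d1, d2])     # 39-dimensional feature vector per frame

# Context splicing of the statics: 4 left + centre + 4 right frames -> 9 x 13 = 117 dims
pad = np.pad(mfcc, ((4, 4), (0, 0)), mode="edge")
spliced = np.hstack([pad[i:i + T] for i in range(9)])
print(feats39.shape, spliced.shape)     # (100, 39) (100, 117)
```

The 117-dimensional spliced vectors are what the later LDA step reduces to 40 dimensions.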
To further improve the performance of the framework, the LDA and MLLT techniques
have been applied to produce 40-dimensional spliced, reduced features. These
features are then processed with the fMLLR approach in the tri4 model, where
speaker-adaptive training is used to handle the tri4 features. The output is
obtained with triphone modelling, where the model is given to the baseline GMM-HMM
and DNN-BN approaches. For GMM-HMM, three-state HMMs with a mixture of eight
diagonal-covariance Gaussians per state are used; a total of 2500 leaves and
30,000 Gaussians are selected. Further, the DNN system is trained with a tanh
non-linearity and a varying number of hidden layers. To improve the DNN-HMM
system, the learning rate and number of epochs are tuned with a mini-batch size
of 512.
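As a rough sketch of the LDA step that maps the 117-dimensional spliced features down to 40 dimensions, the following NumPy-only Fisher LDA is illustrative; the synthetic features and class labels stand in for real spliced MFCCs and HMM-state alignments, and Kaldi's actual LDA+MLLT estimation is more involved than this.

```python
import numpy as np

rng = np.random.default_rng(1)

def lda_transform(X, y, n_out):
    """Fisher LDA: project onto the top eigenvectors of pinv(Sw) @ Sb."""
    mean = X.mean(axis=0)
    D = X.shape[1]
    Sw = np.zeros((D, D))                 # within-class scatter
    Sb = np.zeros((D, D))                 # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-vals.real)[:n_out]
    return vecs.real[:, order]

# Synthetic stand-in: 117-dim spliced features with 50 pseudo-state classes
X = rng.normal(size=(2000, 117))
y = rng.integers(0, 50, size=2000)
X += y[:, None] * 0.05                    # give the classes some separation
W = lda_transform(X, y, 40)
X40 = X @ W                               # reduced 40-dimensional features
print(X40.shape)                           # (2000, 40)
```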
5 Experimental Setup
For the experiments, we employ two sets of corpora: 422 phonetically rich sentences
and connected words generated from the 5000 most frequent words. These sets were
later combined to form a single set. The unique sentences and words were further
reproduced by 20 different speakers. A Roman transcription of the audio was
prepared, preserving the linguistic characteristics of the Punjabi language. There
are 7 male and 13 female speakers in the synthetically created dataset. The
collected dataset is divided into two sets: 70% for training and the remaining 30%
for testing. Table 1 represents the training and testing partitions of the combined
dataset. To analyse performance on the obtained dataset, two parameters were
employed, i.e. word error rate (WER) and relative improvement (RI).
6 Result and Analysis
This section reports the accuracy of the system. The performance of the ASR
utilizing the BN-NN and DNN-HMM is obtained with metrics such as word accuracy
(WA) = 100 × ((TW − S − D − I)/TW) and word error rate (WER) = (S + D + I)/TW,
where TW is the total number of words and S, D and I are the numbers of
substitutions, deletions and insertions, respectively.
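The WA and WER metrics above require the substitution (S), deletion (D) and insertion (I) counts from a minimum-edit-distance alignment of the reference and hypothesis word sequences; a minimal sketch:

```python
def wer_counts(ref, hyp):
    """Return (S, D, I) from the minimum-edit-distance alignment of two word lists."""
    R, H = len(ref), len(hyp)
    # dp[i][j] = (cost, S, D, I) for aligning ref[:i] with hyp[:j]
    dp = [[None] * (H + 1) for _ in range(R + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, R + 1):
        dp[i][0] = (i, 0, i, 0)                      # all deletions
    for j in range(1, H + 1):
        dp[0][j] = (j, 0, 0, j)                      # all insertions
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            c, s, d, ins = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                best = (c, s, d, ins)                # match
            else:
                best = (c + 1, s + 1, d, ins)        # substitution
            c, s, d, ins = dp[i - 1][j]
            if c + 1 < best[0]:
                best = (c + 1, s, d + 1, ins)        # deletion
            c, s, d, ins = dp[i][j - 1]
            if c + 1 < best[0]:
                best = (c + 1, s, d, ins + 1)        # insertion
            dp[i][j] = best
    return dp[R][H][1:]

ref = "one two three four".split()
hyp = "one too three four five".split()
S, D, I = wer_counts(ref, hyp)
TW = len(ref)
wer = (S + D + I) / TW
wa = 100 * (TW - S - D - I) / TW
print(S, D, I, wer, wa)   # 1 substitution, 1 insertion -> WER 0.5, WA 50.0
```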
For the entire dataset, training and testing were done using the system's corpus,
i.e. speaker independent in the training phase and speaker dependent in the testing
phase. The acoustic models were trained on these datasets, and a comparable language
model of size 5 k was utilized. An input speech signal was processed to generate
acoustic features using 13 static MFCC + delta + double-delta coefficients. It was
likewise seen that linear discriminant analysis (LDA) later reprojects these
extracted acoustic features and substantially improved training on the
small-vocabulary dataset. The initial 13 MFCC features, in combination with nine
frames, resulted in 117 dimensions, which were further reduced to 40 dimensions
through the LDA approach. Apart from this, these features were additionally applied
to HMM state alignments using triphone models (Table 2).
Passing the entire dataset through the neural network is not enough to find the
best results. Fundamentally, an impulse in an input signal is caused by the closing
of the vocal folds, and these pitch-related instants are called 'epochs'. The major
excitation of the vocal tract is due to the glottal pulse, with significant
excitation taking place at each epoch location. The calculation simply involves
working out the difference between the observed output for each unit, then adding
up all these squared differences for each output unit and for each input signal
[21]. Therefore, in Table 4, the epoch values for the input audio files range over
15, 20, 25 and 30, where the system achieved its finest result at 30 epochs for
the BN-NN, with no change in the DNN-HMM system.
7 Conclusion
The work proposed here focuses on the effect of the feature vector on the Punjabi
language with the BN-NN. To further extend the effectiveness, these variations were
projected onto the DNN-HMM system. Prior to model training, optimal values of the
learning rate and number of epochs were selected to produce effective results.
Overall, the system was evaluated on original and synthetic speech corpora, where a
gain was obtained through fMLLR speaker-adaptive training of the network, which
gives the finest performance of the system. The output of the system with the BN-NN
achieved a relative improvement of 3.33% over the conventional GMM-HMM and DNN-HMM
systems.
References
1. Seltzer, M. L., Yu, D., Wang, Y.: An investigation of deep neural networks for noise robust
speech recognition. In: 2013 IEEE International Conference on Acoustics, Speech and Signal
Processing, pp. 7398–7402 (2013)
2. Patel, I., Rao, Y.S.: Speech recognition using HMM with MFCC: an analysis using frequency
spectral decomposition technique. Sig. Image Process. Int. J. (SIPIJ) 1(2), 101–110 (2010)
3. Kadyan, V., Mantri, A., Aggarwal, R.K.: Improved filter bank on multitaper framework for
robust Punjabi-ASR system. Int. J. Speech Technol. 1–14 (2019)
4. Yu, D., Seltzer, M.L.: Improved bottleneck features using pretrained deep neural networks. In:
Twelfth Annual Conference of the International Speech Communication Association (2011)
5. Grézl, F., Karafiat, M., Burget, L.: Investigation into bottle-neck features for meeting
speech recognition. In: Tenth Annual Conference of the International Speech Communication
Association (2009)
6. Grézl, F., & Karafiát, M.: Hierarchical neural net architectures for feature extraction in ASR. In:
Eleventh Annual Conference of the International Speech Communication Association (2010)
7. Veselý, K., Karafiát, M., Grézl, F.: Convolutive bottleneck network features for LVCSR. In:
2011 IEEE Workshop on Automatic Speech Recognition & Understanding, pp. 42–47 (2011)
8. Dahl, G.E., Sainath, T.N., Hinton, G.E.: Improving deep neural networks for LVCSR using
rectified linear units and dropout. In: 2013 IEEE International Conference on Acoustics, Speech
and Signal Processing, pp. 8609–8613. IEEE (2013)
9. Morgan, N.: Deep and wide: multiple layers in automatic speech recognition. IEEE Trans.
Audio Speech Lang. Process. 20(1), 7–13 (2011)
10. Grézl, F., Karafiát, M., Kontár, S., Cernocky, J.: Probabilistic and bottle-neck features for
LVCSR of meetings. In: 2007 IEEE International Conference on Acoustics, Speech and Signal
Processing-ICASSP’07, vol. 4, pp. IV-757. IEEE (2007)
11. Valente, F., Magimai-Doss, M., Wang, W.: Analysis and comparison of recent mlp features
for lvcsr systems. In: Twelfth Annual Conference of the International Speech Communication
Association (2011)
12. UK Essays: Speech recognition using epochwise back propagation (November 2018).
Retrieved from https://www.ukessays.com/essays/computer-
science/speech-recognition-using-epochwise-8817.php?vref=
13. Yegnanarayana, B., Murty, K.S.R.: Event-based instantaneous fundamental frequency esti-
mation from speech signals. IEEE Trans. Audio Speech Lang. Process. 17(4), 614–624
(2009)
14. Dahl, G.E., Yu, D., Deng, L., Acero, A.: Context-dependent pre-trained deep neural networks
for large-vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 20(1),
30–42 (2011)
15. Bourlard, H., Morgan, N.: Continuous speech recognition by connectionist statistical methods.
IEEE Trans. Neural Netw. 4(6), 893–909 (1993)
16. Grézl, F., Karafiat, M., Janda, M.: Study of probabilistic and bottle-neck features in multilingual
environment. In: 2011 IEEE Workshop on Automatic Speech Recognition & Understanding,
pp. 359–364. IEEE (2011, December)
17. Grezl, F., Fousek, P.: Optimizing bottle-neck features for LVCSR. In: 2008 IEEE International
Conference on Acoustics, Speech and Signal Processing, pp. 4729–4732. IEEE (2008)
18. Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M.,
Motlicek, P., Qian, Y., Schwarz, P., Silovsky, J.: The Kaldi speech recognition toolkit. In: IEEE
2011 Workshop on Automatic Speech Recognition and Understanding (No. CONF). IEEE Sig.
Process. Soc. (2011)
19. Rodríguez, L.J., Torres, I.: Comparative study of the baum-welch and viterbi training algo-
rithms applied to read and spontaneous speech recognition. In: Iberian Conference on Pattern
Recognition and Image Analysis, pp. 847–857. Springer, Berlin, Heidelberg (2003)
20. Senior, A., Heigold, G., Ranzato, M. A., Yang, K.: An empirical study of learning rates in deep
neural networks for speech recognition. In: 2013 IEEE International Conference on Acoustics,
Speech and Signal Processing, pp. 6724–6728
A Study of Machine Learning
Algorithms in Speech Recognition
and Language Identification System
Abstract Speech recognition is a broad topic that primarily involves sub-topics like
language identification, speaker identification, speech emotion recognition,
speech-to-text systems, text-to-speech systems, dialogue systems and much more.
Human beings are quickly able to recognize or identify a language because of the
corpus of knowledge built over the years; however, it is a challenging task to have
a machine identify a spoken language. Building a system that can correctly identify
multiple languages irrespective of the dialect and speaker characteristics is
therefore an interesting area of research. One benefit of such a LID system is that
the barrier between people caused by language differences will be broken; such a
system will further the progress of globalization. The latest developments of
machine learning in speech and language are described as a detailed state of the
art in this paper.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_54
504 A. Mathur and R. Sultana
1.1 Preprocessing
1.1.1 Pre-emphasis
1.1.2 Framing
Framing is the method of dividing a speech signal into frames. Each frame is usually
20–30 milliseconds long, and consecutive frames overlap each other by some
milliseconds; a common overlap time is 20 milliseconds.
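Assuming, purely for illustration, 30 ms frames with a 20 ms overlap at a 16 kHz sampling rate, the framing step can be sketched as:

```python
import numpy as np

def frame_signal(x, fs, frame_ms=30, overlap_ms=20):
    """Split signal x into overlapping frames (any trailing partial frame is dropped)."""
    frame_len = int(fs * frame_ms / 1000)
    step = int(fs * (frame_ms - overlap_ms) / 1000)   # hop between frame starts
    n_frames = 1 + (len(x) - frame_len) // step
    return np.stack([x[i * step:i * step + frame_len] for i in range(n_frames)])

fs = 16000
x = np.random.randn(fs)                 # 1 second of synthetic "audio"
frames = frame_signal(x, fs)            # 30 ms frames, one every 10 ms
print(frames.shape)                      # (98, 480)
```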
1.1.3 Windowing
The classification stage involves selecting a classifier, passing the extracted
features to the classifier and identifying the language. Earlier research involved
using only audio signal processing techniques for language identification; in other
words, only signal processing techniques were used for both stages in a LID system.
A common signal processing method used for classification purposes is the vector
quantization method. However, as research progressed, many researchers began using
machine learning classification algorithms like Gaussian mixture models (GMM),
decision trees (DT), K-nearest neighbours (K-NN), support vector machines (SVM),
artificial neural networks (ANN) and deep neural networks (DNN). These machine
learning classifiers performed very well in LID systems, which also highlights the
application of machine learning in speech recognition. While the classification
stage of a LID system has used machine learning techniques, the feature extraction
stage predominantly uses signal processing techniques.
There are various speech signal features that are extracted for discriminating
between languages. There are two types of features: low-level and high-level speech
features. The low-level features are acoustic, phonotactic and prosodic features,
and the high-level features are lexical and syntactic features. Most of the research
focuses on using low-level features for language identification.
Acoustic features are obtained through two kinds of techniques: linear prediction
based and mel frequency cepstral coefficient (MFCC) based. The linear prediction
techniques give linear predictive coding (LPC), linear prediction cepstral
coefficient (LPCC) and perceptual linear predictive (PLP) features. The MFCC
features are widely used in research because of their robustness and their ability
to eliminate speaker-dependent characteristics.
Furthermore, MFCC, PLP and RASTA-PLP are auditory features, while MFCC and LPCC are
static features. Auditory features use the filter bank method for extraction and
are inspired by the human hearing system. Static features involve dividing speech
signals into frames to obtain static characteristics; these characteristics vary
with time. Another notable feature type in speech processing is the phonotactic
feature, involving the study of phonemes and their system. The phoneme is the
smallest unit in a language and is used to construct meaningful parts of the
language, though a phoneme by itself does not have a meaning. Phonology is the
field concerned with the functioning of the sounds in a language; the objective of
these sound functions is to make the speech meaningful. Prosodic features like
melody (pitch), stress, intonation, duration of speech and rhythm are extracted by
researchers.
Lexical features are a type of high-level features and deal with a language’s word
structure. The research with lexical features primarily involves extracting words from
the speech and building word-level LID systems.
Syntactic features are concerned with the order of words and sentence structure in
a language; not many LID systems have been constructed using syntactic features.
Researchers have used the extracted features individually and in combination in
LID systems. Often researchers compensate for noise in the input speech signal to
improve the performance of the LID system, although whether noise should be
compensated is up to the researcher. Recent research has also attempted to identify
sub-languages. For instance, the Indian subcontinent consists of many sub-languages,
and researchers have tried to identify languages such as Tamil, Hindi, Punjabi or
Assamese. The data sets used by researchers primarily consist of speeches from
local news and radio broadcasts; for instance, researchers using Indian languages
derived their data sets from All India Radio broadcasts or the Doordarshan
Television Network. Moreover, the data sets consist of male and female speakers,
and speakers with different dialects, to add variability. The extracted features
are inputted to the classification algorithm. The classifier first trains itself on
these features and then recognizes the language in an unknown audio signal, so
researchers have further split the classification stage into a learning phase and a
recognition phase. Various studies have been conducted to improve the learning
phase, which in turn improves the performance of the LID system. Let us now look at
the chronological development of research in language identification in recent
years: the next four sections explain a few of the LID models, followed by the
conclusion.
The objective of the research in [1] was to build a LID for three Indonesian
languages. The research extracted high-level speech features and phonotactic
features, using two phonotactic feature extraction methods:
1. Phone recognition followed by language modelling (PRLM)
2. Parallel phone recognition followed by language modelling (PPRLM).
2.1 Methodology
The research analysed and compared the performance of the two phonotactic methods.
The input to PRLM is a speech signal; the method first performs phone recognition
and then classifies the phone sequence into the target languages. The PRLM system
consists of a single universal phone recognizer, created using an n-gram statistical
model; that is, the likelihood of a sequence of phones appearing in a certain
language is calculated. From the phones recognized in a speech signal, a log
likelihood is tabulated for each language.
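A toy sketch of this PRLM scoring idea: one smoothed bigram model per language over phone sequences, with a recognized sequence assigned to the language of highest log likelihood. The phone strings and add-alpha smoothing below are invented for illustration.

```python
import math
from collections import Counter

def bigram_model(phone_seqs, alpha=1.0):
    """Build an add-alpha smoothed bigram log-probability function over phones."""
    big, uni, vocab = Counter(), Counter(), set()
    for seq in phone_seqs:
        seq = ["<s>"] + seq
        vocab.update(seq)
        for a, b in zip(seq, seq[1:]):
            big[(a, b)] += 1
            uni[a] += 1
    V = len(vocab)
    return lambda a, b: math.log((big[(a, b)] + alpha) / (uni[a] + alpha * V))

def loglik(model, seq):
    """Log likelihood of a phone sequence under a bigram model."""
    seq = ["<s>"] + seq
    return sum(model(a, b) for a, b in zip(seq, seq[1:]))

# Toy "recognized" phone sequences per language
models = {
    "lang_A": bigram_model([list("abab"), list("abba")]),
    "lang_B": bigram_model([list("ccdc"), list("cdcd")]),
}
test_seq = list("abab")
scores = {lang: loglik(m, test_seq) for lang, m in models.items()}
print(max(scores, key=scores.get))      # lang_A
```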
The data set is divided into training, development and test subsets. The research
experimented with the n-gram models for the PRLM and PPRLM methods. In the PRLM
experimentation, four spoken language identification systems were trained using
Czech, English, Hungarian and Russian, and then tested on the three Indonesian
languages with different n-gram statistical models, the value of n ranging from 3
to 10. Confusion matrices for the PRLM experiments were derived; it was observed
that the English and Russian phone recognizers gave the highest accuracies of
77.42% and 75.94%, respectively. The PPRLM experiments consist of two language
identification systems. The first system creates interpolated models by using all
phone recognizers for Czech, English, Hungarian and Russian, and then tokenizes the
phones. The second system uses the two phone recognizers that gave the highest
accuracy in the PRLM experiments; the research selected the English and Russian
recognizers. The two language identification systems are likewise tested on the
three Indonesian languages and with the different n-gram statistical models.
The research in [2] proposed a language identification model that identifies the
following five languages: Arabic, Chinese, English, Korean and Malay. The data set
consisted of ten speakers, each of whom spoke each of the languages mentioned, so
the total number of recordings was 50 (ten speakers × five languages).
3.1 Methodology
Preprocessing is the first step in the LID system. It consists of amplifying the
speech signal, which was too weak to be used as input directly, as well as removing
the silence in the speech recordings and removing the background noise. The
pre-emphasis stage performs noise removal and emphasizes the higher frequencies of
the speech signal. There are two ways to implement the pre-emphasis stage: as a
fixed-coefficient filter or as an adaptive-coefficient filter, where the coefficient
is adjusted over time according to the speech's autocorrelation value. Pre-emphasis
causes spectral flattening, which makes the signal less vulnerable to finite
precision effects in subsequent signal processing.
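The fixed-coefficient variant is typically the first-order filter y[n] = x[n] − α·x[n−1]; the coefficient α = 0.97 below is a common choice, not necessarily the one used in [2].

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """Fixed-coefficient pre-emphasis: boosts high frequencies, flattens the spectrum."""
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]
    y[1:] = x[1:] - alpha * x[:-1]
    return y

x = np.array([1.0, 1.0, 1.0, 1.0])       # constant (low-frequency) signal
y = pre_emphasis(x)
print(y)                                  # ~[1.0, 0.03, 0.03, 0.03]: DC largely removed
```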
The speech was divided into frames of 50 milliseconds, overlapping every 20
milliseconds, under the assumption that the speech signal is stationary over each
frame. To decrease the discontinuity between the beginning and end of each frame,
and thereby increase the correlation of the linear predictive coding (LPC), each
frame was windowed; the research used a Hamming window. The proposed system then
passed the windowed frames through the fast Fourier transform and mel frequency
warping, yielding the mel spectrum, and the logarithm of the mel spectrum gave the
MFCCs. The model derived these features because MFCC features are robust. Once the
MFCC features were extracted, they were passed to the classification stage. The
research used the vector quantization (VQ) method as the classifier, a classic
technique in audio processing. Quantization is the process of approximating a
feature vector by one of a set of discrete values. The research created a codebook
used by the VQ; its objective is to work as a descriptor for the vector quantizer.
The codebook contains a set of fixed prototype vectors, each called a codeword. The
VQ process matches the input vector to a codeword in the codebook; to perform this
task, the VQ method needs a distortion measure. The input vector is replaced by the
index of the codeword with the smallest distortion in the codebook, so minimization
of distortion is the goal of the VQ technique.
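The codeword matching step can be sketched as follows, using Euclidean distance as the distortion measure and a random codebook standing in for a trained one.

```python
import numpy as np

rng = np.random.default_rng(3)
codebook = rng.normal(size=(16, 13))     # 16 codewords: 13-dim MFCC prototypes (synthetic)

def quantize(vec, codebook):
    """Return the index of the codeword with the smallest distortion (L2 distance)."""
    distortions = np.linalg.norm(codebook - vec, axis=1)
    return int(np.argmin(distortions))

vec = codebook[5] + 0.01 * rng.normal(size=13)   # a vector very close to codeword 5
idx = quantize(vec, codebook)
print(idx)                                        # 5
```

In a real system the codebook would be trained (e.g. by k-means) on MFCC vectors of each language, and the distribution of winning indices used for classification.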
The research divided the data set into testing and training sets. The training set
consisted of speech recordings from four males and one female, and the testing set
from two males and three females. The research optimized the sampling frequency and
codebook size parameters and observed their effects on the recognition rate. The
audio files were set to two sampling frequencies: 8 and 16 kHz. The recognition
rate, i.e. the ability of the classifier to correctly classify audio signals into
the different languages, was higher for all five languages at 16 kHz than at 8 kHz.
The average recognition rate for the five languages was 78%. A limitation of the
research was the lack of experimentation with machine learning classifiers such as
SVM, K-NN, ANN and K-means clustering.
The next research proposal [3] applied a machine learning procedure to build a LID
that uses MFCC and K-NN, a machine learning classifier, to identify Arunachal
languages. The data set consisted of speech files from five Arunachal languages,
with recordings taken from All India Radio local news broadcasts; each speech file
is of duration 4 min.
The first stage of the system is the feature extraction stage. The research
extracted MFCC features because MFCC models the production and perception of speech
similarly to a human being: the logarithmic perception of loudness and pitch is
imitated by the MFCC, and MFCC features do not include speaker-dependent
characteristics. The MFCC feature extraction technique involves the following
steps: framing, windowing, discrete Fourier transformation, mel filter bank and
discrete cosine transformation (DCT). Framing is a process of dividing the speech
signal into frames, and these frames overlap each other. One minute of a speech
signal yields a sequence of 5000 13-dimensional feature vectors. The discrete
Fourier transformation converts the speech signal from the time domain to the
frequency domain.
The classification stage follows the feature extraction stage. The research chose
the K-NN algorithm for the classification task: the extracted MFCC features are
passed to the K-NN algorithm for language identification of a speech signal. The
training data set consisted of 20 min of speech, and the testing data set contained
speech samples of 20 s.
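A minimal sketch of the K-NN classification step; the synthetic 13-dimensional vectors and the five language labels stand in for the real MFCC features and data set.

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

rng = np.random.default_rng(4)
langs = ["Adi", "Apatani", "Galo", "Idu", "Tagin"]
# Each language gets a cluster of 13-dim "MFCC" vectors around its own centre
centres = {l: rng.normal(scale=5.0, size=13) for l in langs}
train_X = np.vstack([centres[l] + rng.normal(size=(20, 13)) for l in langs])
train_y = [l for l in langs for _ in range(20)]

query = centres["Galo"] + rng.normal(size=13)    # a test vector near the Galo cluster
pred = knn_predict(train_X, train_y, query)
print(pred)                                       # Galo
```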
The research further experimented by changing the test data from 20 s speech
signals to 10 s speech signals. It was observed that the correct prediction
accuracies of Adi, Apatani, Galo, Idu and Tagin were 77%, 65.8%, 94%, 97% and 83%,
respectively. The Adi language was misclassified into Apatani, Galo, Idu and Tagin
with misclassification rates of 1.5%, 20%, 1.5% and 0.5%, respectively; Apatani
into Adi, Galo, Idu and Tagin at 14.6%, 14.1%, 1.4% and 3.9%; Galo into Adi,
Apatani, Idu and Tagin at 3.6%, 1.2%, 0.2% and 0.9%; Idu into Adi, Apatani, Galo
and Tagin at 0.6%, 0.4%, 0.8% and 0.2%; and Tagin into Adi, Apatani, Galo and Idu
at 5.3%, 5.1%, 4.5% and 1.8%, respectively. The research did not explore other
classification algorithms with the MFCC features.
Another similar work [4] constructs a language identification system that
identifies four different Indian languages, Tamil, Telugu, Hindi and Kannada,
using machine learning algorithms.
5.1 Methodology
The language identification system takes a speech signal as input and classifies it
into one of the four Indian languages by performing computations on the signal. The
classifiers that the research work uses are decision tree and SVM. The proposed
language identification system consists of several steps: MFCC feature generation,
feature vectors, training data and classification. The data set consists of audio
files in waveform audio file format (WAV); the speech recordings are obtained from
news broadcasts of the Doordarshan Television Network, with 5 h of speech
recordings for each language. The data set is divided into testing and training
data. Before feature extraction, the speech files are preprocessed; the
preprocessing step involves removing silence in the speech recordings using a
short-term energy function.
The system's feature extraction step extracts MFCC features, which remove the
harmonics from speech signals, thereby eliminating speaker-dependent
characteristics. The MFCC extraction technique involves the following steps. The
first step is framing the signal into short frames of length 20 milliseconds with a
frame shift of 10 milliseconds. The next step is computing the periodogram estimate
of the power spectrum. Following this, the mel filter bank is applied to the power
spectra and the energy in each filter is summed; this step is also called mel scale
filtering, and its output is the mel frequency spectrum. Next, the logarithm of all
the filter bank energies is tabulated, and the DCT is applied to it; DCT
coefficients 2–13 are kept and the others are discarded. From the discrete cosine
transformation, we get the mel cepstral coefficients. Once the MFCCs are obtained,
the feature values are saved in a comma-separated value (CSV) file.
These optimal hyperplanes perform data separation with minimal to no errors;
support vectors are the training points closest to the optimal hyperplanes. Each
training sample is stored as a row in the CSV file, which is passed to the support
vector machine and decision tree classifiers for training. The research assessed
the performance of the classifiers, which classify the unknown test speech signals,
by calculating the detection rate.
The detection rate is the ratio of the number of correctly classified keywords to the
sum of the numbers of correctly classified, incorrectly classified, and rejected
keywords. The accuracies for Tamil, Telugu, Hindi, and Kannada when SVM was used
as the classifier were 0.4, 0.2, 0.28, and 0.33, respectively; when a decision tree was
used, they were 0.8, 0.67, 0.2, and 0.22, respectively. The research obtained accuracies
of 76% and 73% when using support vector machines and decision trees, respectively.
The research used only one type of spectral characteristic, MFCC, and did not explore
other features such as prosodic features.
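As a concrete check of the definition above, the detection rate can be computed directly; the keyword counts in the example are made-up numbers, not the paper's:

```python
def detection_rate(correct, incorrect, rejected):
    # Correctly classified keywords divided by the sum of correctly
    # classified, incorrectly classified, and rejected keywords.
    return correct / (correct + incorrect + rejected)

# e.g., 8 correct, 1 incorrect, 1 rejected keyword
print(detection_rate(8, 1, 1))  # 0.8
```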
6 Conclusion
The current report surveys the significant research in speech recognition that took
place between 2016 and 2020. From this report, it can be observed that machine
learning has an important application in speech recognition, and the interaction
between machine learning and audio signal processing is also significant. A typical
language identification system consists of a feature extraction stage and a classification
stage. The data sets used by the researchers predominantly consisted of speech
recordings from local news broadcasts, and the researchers brought variability to their
data sets by incorporating speeches by male and female speakers. Various feature
extraction techniques have been applied, and machine learning classifiers such as
SVM, DT, and NN have been used. Some types of neural networks that have been
used are artificial neural networks, probabilistic neural networks [5], deep belief NN,
and FFBPNN. Some researchers have broken the classification stage down into a
learning phase and a recognition phase and have tried to optimize the learning phase
by building learning models, for instance with the extreme learning machine approach.
A Study of Machine Learning Algorithms in Speech Recognition … 513
References
1. Safitri, N. E., Zahra, A., Adriani, M.: Spoken language identification with phonotactics methods
on Minangkabau, Sundanese, and Javanese Languages. In: SLTU, pp. 182–187 (2016, January)
2. Gunawan, T.S., Husain, R., Kartiwi, M.: Development of language identification system
using MFCC and vector quantization. In: 2017 IEEE 4th International Conference on Smart
Instrumentation, Measurement and Application (ICSIMA), pp. 1–4 (2017, November). IEEE
3. Nyodu, K., Sambyo, K.: Automatic identification of arunachal language using K-nearest
neighbor algorithm. In: 2018 International Conference on Advances in Computing, Communi-
cation Control and Networking (ICACCCN), pp. 213–216 (2018, October). IEEE
4. Venkatesan, H., Venkatasubramanian, T.V., Sangeetha, J.: Automatic language identification
using machine learning techniques. In: 2018 3rd International Conference on Communication
and Electronics Systems (ICCES), pp. 583–588 (2018, October). IEEE
5. Sulthana, A.R., Gupta, M., Subramanian, S., Mirza, S.: Improvising the performance of image-
based recommendation system using convolution neural networks and deep learning. Soft
Comput., 1–14 (2020)
Plant Leaf Disease Detection
and Classification Using Machine
Learning Approaches: A Review
Abstract Early detection of plant diseases will certainly increase the productivity of
agricultural products. In addition, identifying the type of disease by which plant
leaves are affected is a cumbersome task for human beings. Hence, in recent years,
image processing techniques combined with machine learning algorithms have
provided an accurate and reliable mechanism to detect and classify the types of
diseases in plants. We deliver a comprehensive study on the identification and
classification of plant leaf diseases using image processing and machine learning
techniques. We present a discussion of common infections and survey lines of
investigation in the various phases of the plant disease detection system. Finally, the
problems and future developments in this area are explored and identified. This review
will help investigators learn about image processing and machine learning applications
in the field of plant disease detection and classification.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 515
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_55
516 M. V. Appalanaidu and G. Kumaravelan
these diseases. In these cases, farmers fail to identify those diseases. Therefore, the
need for an automated system to detect the type of plant disease and its severity level
becomes more critical.
Recently, image processing techniques with machine learning (ML) algorithms have
proved to be prominent for automatic plant leaf disease recognition and categorization.
Figure 1 shows the overall architecture of an automated plant disease detection and
classification system. It typically involves a two-step practice. The first step consists
of image processing routines: image acquisition, a method to capture images of the
infected parts of the plant leaf through an RGB camera; image pre-processing, a
method that removes noise in the captured image through filters; image segmentation,
a method that extracts the diseased portion from the chosen image; and finally feature
extraction, a method to set up the derived values from the segmented image. The
second step consists of a classification process through an ML algorithm, which
predicts the healthy and the infected plant leaves.
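The two-step practice can be illustrated end to end with a minimal NumPy sketch. The mean filter, global threshold, color-moment features, and nearest-neighbour classifier below are simple stand-ins chosen for brevity, and the function name is hypothetical; they are not the specific methods of any surveyed work:

```python
import numpy as np

def classify_leaf(rgb, train_feats, train_labels):
    # Step 1a: pre-processing - a 3x3 mean filter suppresses noise.
    gray = rgb.mean(axis=2)
    pad = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    smooth = sum(pad[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    # Step 1b: segmentation - a global threshold isolates the darker,
    # diseased-looking region of the leaf.
    mask = smooth < smooth.mean()
    # Step 1c: feature extraction - color moments (mean and standard
    # deviation per channel) of the segmented pixels.
    seg = rgb[mask].astype(float)
    feat = np.concatenate([seg.mean(axis=0), seg.std(axis=0)])
    # Step 2: classification - a 1-nearest-neighbour stand-in for SVM/ANN/KNN.
    dists = np.linalg.norm(np.asarray(train_feats, float) - feat, axis=1)
    return train_labels[int(np.argmin(dists))]
```

In practice each stage would be replaced by one of the concrete techniques surveyed in Sects. 3 and 4 (e.g., K-means segmentation, GLCM texture features, an SVM classifier).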
The organization of this paper is as follows: Section 2 presents a categorization of
plant diseases. Section 3 describes the various modules involved in the process of
plant leaf disease detection and classification systems. Section 4 discusses the results
of previous research works. Section 5 concludes this paper along with future work
directions.
Fig. 1 Overall architecture of an automated plant disease detection and classification
system: image acquisition (smartphone, camera), image preprocessing (noise removal,
image enhancement), image segmentation (K-means, thresholding), feature extraction
(color, shape, texture), and classification (e.g., SVM, ANN, KNN) into healthy and
diseased leaves
Generally, plant diseases are caused by biotic and abiotic factors. Biotic factors are
living organisms such as bacteria, fungi, and viruses, while abiotic factors are
non-living causes such as excess temperature, insufficient sunlight, and chemical
substances from industrial outlets. Nevertheless, abiotic factors are mostly avoidable
because they are less dangerous, non-infectious, and non-transmissible. Thus, farmers
are more worried about the biotic factors than the abiotic factors that affect the
agricultural farm in terms of the quality and quantity of crop products.
Figure 2 shows the various types of biotic factors, such as bacteria, fungi, and
viruses, through which plant leaves are affected. Soft rot, spot, and wilt are examples
of bacterial diseases that normally affect plants like potato, corn, and bean. Mildew,
rots, and spots are examples of fungal diseases that normally affect plants like carrots,
beetroot, and beans. Dwarfing, distortion, and mosaic are examples of viral diseases
that normally affect plants like tobacco, pepper, potato, tomato, eggplant, cucumber,
and petunia. Figure 3 shows the leaves of plants affected by various types of biotic
factors.
Fig. 2 Types of plant leaf diseases by biotic factor: bacterial, fungal, and viral
diseases (e.g., dwarfing, mottling, distortion)
This section elaborates on the various processing modules used in the development
of the leaf disease detection and classification system.
In this phase, the investigators have used well-known datasets, namely the plant
village dataset, integrated pest management (IPM) images, and American
Phytopathological Society (APS) images. Most of the works have observed a single
crop instead of using a full-fledged dataset [1–5]. Some experimenters use scanned
images [6, 7], and a few have used self-collected images. A powerful leaf disease
detection system depends on images captured under realistic environmental
conditions. The list of datasets used by the various researchers is shown in Table 1.
This method maintains all images at a fixed size using a resize function. Various
filters are applied to the image for noise removal and enhancement; images free of
noise give better results in the subsequent steps. Mean and median filters are used to
eliminate unwanted information such as dust, dewdrops, water drops, insects, and
shadows that appear on the image. Wiener filters are used to remove the blurring
effect from the leaf image. The list of preprocessing functions is shown in Table 2.
Segmentation splits the image into different segments and extracts the unhealthy
portion of the leaf image. A list of the segmentation techniques used by the
researchers is shown in Table 3.
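K-means clustering over pixel colors, one of the commonly used segmentation techniques, can be sketched as follows; the plain-NumPy implementation and the deterministic initialization (evenly spaced pixels, chosen for reproducibility) are assumptions of this sketch:

```python
import numpy as np

def kmeans_segment(pixels, k=2, iters=20):
    # Initialize centers from evenly spaced pixels (deterministic sketch).
    centers = pixels[np.linspace(0, len(pixels) - 1, k).astype(int)].astype(float)
    for _ in range(iters):
        # Assignment step: each pixel joins its nearest center.
        d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned pixels.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    return labels, centers
```

With k = 2, one cluster typically captures the healthy (green) leaf area and the other the discolored, diseased area, whose pixels are then passed to feature extraction.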
The final method of image processing is feature extraction, which reduces the image
data and helps identify the disease. Color features are extracted by the color moment
method (CMM) and the color co-occurrence matrix (CCM); mean and standard
deviation are examples of color features. Shape features are extracted by the minimum
enclosing rectangle (MER); area, perimeter, and diameter are a few examples of shape
features. The gray-level co-occurrence matrix (GLCM) extracts texture features;
contrast, entropy, and homogeneity are examples of texture features. The CCM is also
used to extract combinations of color and texture. A list of the features used by the
researchers is shown in Table 4.
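The GLCM texture features named above (contrast, entropy, homogeneity) can be computed from a co-occurrence matrix of adjacent pixel pairs; the 8-level quantization and the horizontal (0, 1) offset below are illustrative choices, not prescribed by the surveyed works:

```python
import numpy as np

def glcm_features(gray, levels=8):
    # Quantize the 8-bit image into a small number of gray levels.
    q = np.clip((gray.astype(int) * levels) // 256, 0, levels - 1)
    # Count horizontally adjacent pixel pairs (offset (0, 1)).
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1)
    p = glcm / glcm.sum()                     # normalize to probabilities
    i, j = np.indices(p.shape)
    contrast = float(np.sum(p * (i - j) ** 2))
    entropy = float(-np.sum(p[p > 0] * np.log2(p[p > 0])))
    homogeneity = float(np.sum(p / (1.0 + (i - j) ** 2)))
    return contrast, entropy, homogeneity
```

A perfectly uniform region yields contrast 0, entropy 0, and homogeneity 1; rough, lesioned textures push contrast and entropy up and homogeneity down.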
Support Vector Machine (SVM): The authors in [8] classify five diseases of the
banana leaf. They collected a total of 106 images with a digital camera; 60 images
were used for training and 46 for testing. SVM performs the classification with 95.7%
accuracy. The authors in [9] classify soybean leaves with three diseases, using 3341
images for training and 1434 for testing. They divide the whole dataset into three
models: model1, model2, and model3. Model1 uses 50% of the images for training
and 50% for testing; model2 uses 60% and 40%; and model3 uses 70% and 30%.
Among the three models, the highest classification accuracy of 62.53% is achieved by
model3. The authors in [10] classify grape leaves with two diseases. A total of 400
images were collected in JPEG format from the well-known plant village benchmark
dataset; 225 were used for training and 175 for testing. SVM performs the
classification with 97.3% accuracy. The authors compared the proposed model with
NN, ANN, and fuzzy set theory algorithms and concluded that the proposed model
obtains the best precision. The authors in [11] classify two diseases of potato leaves,
with images collected from the plant village benchmark dataset; 180 images were
used for training and 120 for testing. A multiclass SVM classifier performs the
classification with an accuracy of 95%. The authors in [12] classify four diseases of
the wheat leaf, using 150 images for training and 50 for testing. The proposed
multiple classifier system (MCS) performs the classification with 95.1% accuracy.
K-Nearest Neighbor (KNN): The authors in [13] categorize two diseases of the paddy
leaf. The image is first segmented by the global threshold method to separate the
unhealthy region; geometric features are then extracted from the segmented images
and submitted to the KNN classifier. Training used 198 images and testing used 132;
KNN classifies the diseases of the paddy leaf with 76.59% accuracy. The authors in
[14] classify five kinds of diseases of corn leaves, captured with a digital camera;
KNN classifies the diseases with an accuracy of 90%. The authors in [15] classify two
diseases of soybean leaves, with 100 images for training and 44 for testing, reaching
75% accuracy. The authors in [16] classify two diseases of the paddy leaf; the SVM
and KNN classifiers perform the classification with accuracies of 93.3% and 91.10%,
respectively, using 90 images for training and 30 for testing. The authors conclude
that, of the two classifiers, KNN gives the best performance. A summary of all the
machine learning classification algorithms is shown in Table 5.
Naïve Bayes (NB): The authors in [17] present an efficient technique to classify
healthy and diseased okra leaves. They test the proposed method on 79 leaf images,
with 49 used for training and 30 for testing. The Naïve Bayes classifier categorizes
healthy and unhealthy leaves with 87% accuracy.
Neural Network (NN): The authors in [18] proposed Enhanced Particle Swarm
Optimization (EPSO) to classify root rot, leaf blight, bacterial blight, micronutrient
deficiency, and wilt of the cotton leaf. They apply the reduced features to SVM and
BPNN classifiers, using 270 images for training and 120 for testing, and classify the
various diseases of cotton leaves with 94% accuracy. The authors conclude that BPNN
is the better of the two classifiers. The authors in [19] developed a new system to
identify white rot, anthracnose, rust, ascochyta spot, and witches' broom of the jujube
leaf. Eleven shape, four texture, and nine color features are extracted from the
segmented images and applied to a neural network classifier as input. The classifier
identifies the various diseases of the jujube plant leaf with an accuracy of 85.33%.
The authors in [20] segment the pictures by the Otsu thresholding technique to
separate the diseased part of the leaf. Ten color features are then extracted from the
segmented image and stored in a feature vector. Finally, these extracted features are
submitted to a decision tree, which performs the classification with 78% accuracy.
4 Discussions
Image processing has proven an effective method for identifying and diagnosing
plant disease, replacing the human eye with the digital camera and the brain with
learning optimization algorithms. The above review clarified different methods for
identifying and classifying various plant leaf infections, and several facts support
this inference. Table 1 lists the datasets used by the various authors and shows that
the maximum number of images was taken from the plant village dataset. Table 2
describes the different preprocessing functions applied by the various authors.
Table 3 lists the segmentation methods and indicates that K-means and thresholding
perform best among all the segmentation techniques. Table 4 presents the different
types of features and combinations of features used by the various authors; color
features alone and combinations of features perform better than the other features.
The results in Table 5 show that the NN classifier performs best among all the
classifiers with respect to all the classification performance measures, with a
classification accuracy of 97.41%. The SVM and DT classifiers perform next best,
both with a classification accuracy of 97.3%. The KNN classifier yields the next
classification performance, with an accuracy of 93.3%. Finally, NB shows the lowest
classification performance, with 87%.
5 Conclusions
This review paper describes the various image processing and machine learning
strategies used in the detection and classification of diseases of different plants. A
detailed list of image processing methods has been explained individually, and a
comparison of different classification approaches has been clearly described. From
the above review, researchers can implement new algorithms and develop an
understanding of methods to achieve better outcomes. A mixture of unexplored
methods of preprocessing, feature selection, and training may also improve detection
and classification methods. Through mobile applications, immediate solutions can be
made available to farmers, and web portals may provide online solutions for plant
disease.
References
1. Mohanty, S.P., Hughes, D., Salathe, M.: Using deep learning for image-based plant disease
detection. Front. Plant Sci. 7, 1–10 (2016)
2. Ipm images. https://www.ipmimages.org/about/. Accessed 15 May 2017
3. APS Image database. https://imagedatabase.apsnet.org/search.aspx. Accessed 16 May 2017
4. Pujari, J.D., Yakkundimath, R.S., Jahagirdar, S., Byadgi, A.M.: Quantitative detection of
soybean rust using image processing techniques. J. Crop Prot. 5(1), 75–87 (2015)
5. Rumpf, T., Mahlein, A.K., Steiner, U., Oerke, E.C., Dehne, H.W., Plumer, L.: Early detec-
tion and classification of plant diseases with support vector machines based on hyperspectral
reflectance. Comput. Electron. Agric. 74(1), 91–99 (2010)
6. Pires, R.D.L., Goncalves, D.N., Orue, J.P.M., Kanashiro, W.E.S., Rodrigues, J.F., Machado,
B.B., Gonçalves, W.N.: Local descriptors for soybean disease recognition. Comput. Electron.
Agric. 125, 48–55 (2016)
7. Phadikar, S., Sil, J., Das, A.K.: Rice diseases classification using feature selection and rule
generation techniques. Comput. Electron. Agric. 90, 76–85 (2013)
8. Singh, V., Misra, A.K.: Detection of plant leaf diseases using image segmentation and soft
computing techniques. Inf. Process. Agric. 4(1), 41–49 (2017). (Elsevier)
9. Kaur, S., Pandey, S., Goel, S.: Semi-automatic leaf disease detection and classification system
for soybean culture. IET Image Process. 12(6), 1038–1048 (2018)
10. Kaur, P., Pannu, H.S., Malhi, A.K.: Plant disease recognition using fractional-order Zernike
moments and SVM classifier. Neural Comput. Appl., 1–20 (2019). (Springer)
11. Islam, M., Dinh, A., Wahid, K., Bhowmik, P.: Detection of potato diseases using image segmen-
tation and multiclass support vector machine. In: 2017 IEEE 30th Canadian Conference on
Electrical and Computer Engineering (CCECE), pp. 1–4. IEEE (2017)
12. Tian, Y., Zhao, C., Lu, S., Guo, X.: SVM-based Multiple classifier system for recognition of
wheat leaf diseases. In: Proceedings of 2010 Conference on Dependable Computing (CDC
‘2010), pp. 2–6 (2010)
13. Suresha, M., Shreekanth, K.N., Thirumalesh, B.V.: Recognition of diseases in paddy leaves
using KNN classifier. In: 2nd IEEE International Conference for Convergence in Technology
(I2CT 2017), pp. 663–666 (2017)
14. Zhang, S.W., Shang, Y.J., Wang, L.: Plant disease recognition based on plant leaf image. J
Anim Plant Sci 25(Suppl. 1), 42–45 (2015)
15. Shrivastava, S., Hooda, D.S.: Automatic brown spot and frog eye detection from the image
captured in the field. Am. J. Intell. Syst. 4(4), 131–134 (2014)
16. Mohan, K.J., Balasubramanian, M., Palanivel, S.: Detection and recognition of diseases from
paddy plant leaf images. Int. J. Comput. Appl. 144(12), 34–41 (2016)
17. Mondal, D., Kole, D.K.: Detection and classification technique of yellow vein mosaic virus
disease in okra leaf images using leaf vein extraction and Naive Bayesian classifier. In: IEEE
International Conference on Soft Computing Techniques and Implementations (ICSCTI) (2015,
October)
18. Revathi, P., Hemalatha, M.: Cotton leaf spot disease detection utilizing feature selection with
skew divergence method. Int. J. Sci. Eng. Technol. 3(1), 22–30 (2014)
19. Zhang, W., Guifa, T., Chunshan, W.: Identification of jujube trees diseases using a neural
network. Int. J. Light Electron. Opt. 124(11), 1034–1037 (2013)
20. Ramakrishnan, M.: Groundnut leaf disease detection and classification by using a backpropaga-
tion algorithm. In: IEEE International Conference on Communications and Signal Processing
(ICCSP), pp. 0964–0968 (2015, April)
21. Sabrol, H., Kumar, S.: Tomato plant disease classification in digital images using classification
tree. In: International Conference on Communication and Signal Processing, IEEE, pp. 1242–
1246 (2016)
22. Sabrol, H., Kumar, S.: Intensity-based feature extraction for tomato plant disease recognition
by classification using a decision tree. Int. J. Comput. Sci. Inf. Secur. 14(9), 622–626 (2016)
Single-Channel Speech Enhancement
Based on Signal-to-Residual Selection
Criterion
Abstract Over the last 40 years, researchers and engineers have proposed a great
many speech enhancement algorithms to reduce noise, but little effort has been made
to improve speech comprehensibility. The prime aim of this paper is to improve
speech quality and comprehensibility by examining the application of a binary mask
under conditions unfavorable to hearing-impaired or normal listeners who find the
speech incomprehensible. Gain functions such as Wiener and spectral subtraction aim
to attenuate the signal when speech is absent or the estimated SNR is low and to
retain the signal when speech is present and the estimated SNR is high. This approach
requires access to accurate SNR estimates and estimates of the background noise
spectrum. Even in extremely low SNR conditions (SNR < −5 dB), this aim is
attainable. The method is applicable in real time in hearing aids, mobile phones, and
speech-activated machines.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 527
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_56
528 R. Nuthakki et al.
1 Introduction
2 Proposed Method
Consider a clean speech signal z(n) corrupted by additive noise d(n) that is
uncorrelated with z(n). The corrupted speech y(n) is then

y(n) = z(n) + d(n)   (1)
Figure 1 shows the blocks used to build the mask in the magnitude domain. The noisy
signal is partitioned into frames of 20 ms, with 50% overlap between adjoining
frames. A Hanning window is applied to each speech frame, which is then short-time
Fourier transformed. Multiplying the noisy spectrum Y(k, mi) by the gain G(k, mi)
gives an estimate of the speech spectrum. Here G(k, mi) is given with respect to the
a priori SNR, and Ẑ(k, mi) denotes the estimate of the clean speech spectrum, where
mi is the frame index and k denotes the frequency bin [9, 11].
After computing the estimated speech spectrum, the binary mask is formulated to
restrict the anomalies caused by inaccuracies in the estimation of the noise spectrum.
Specifically, if Ẑ(k, mi) ≤ 2Z(k, mi), the binary mask lets the spectrum pass through,
and it masks the spectrum otherwise. The processed speech usually contains both
noise underestimation and overestimation. The spectrum estimate is compared against
the true spectrum for every T-F unit; the units that satisfy the constraint are retained,
and those that do not are eliminated. Applying the ISTFT then yields the enhanced
speech in the time domain [12].
A parametric Wiener gain filter is used as the gain function. This algorithm was
chosen for its low computational complexity, easy implementation, and efficiency
with respect to speech comprehensibility, unlike other sophisticated noise-reduction
algorithms [2]. The following equation gives the parametric Wiener filter function:
G(k, m_i) = \left( \frac{\mathrm{SNR}_{\mathrm{prio}}(k, m_i)}{\delta + \mathrm{SNR}_{\mathrm{prio}}(k, m_i)} \right)^{\omega}   (3)
α is a smoothing constant which controls SNRprio; its value is 0.98. The background
noise variance estimate is represented by λD̂ [3, 5, 13].
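The STFT, gain, and mask steps described above can be sketched in NumPy. For illustration, the a priori SNR is computed from the known clean signal (an oracle setting, in line with the paper's use of the true spectrum in the mask rule); the function name, frame length, and the δ and ω defaults are assumptions of this sketch:

```python
import numpy as np

def enhance_with_mask(noisy, clean, sr=16000, delta=1.0, omega=1.0):
    flen = int(0.02 * sr)          # 20 ms frames ...
    hop = flen // 2                # ... with 50% overlap
    win = np.hanning(flen)
    def stft(x):
        n = 1 + (len(x) - flen) // hop
        return np.stack([np.fft.rfft(win * x[m * hop:m * hop + flen])
                         for m in range(n)])
    Y, Z = stft(noisy), stft(clean)
    # A priori SNR from an oracle noise variance estimate per frequency bin.
    noise_var = np.mean(np.abs(Y - Z) ** 2, axis=0) + 1e-12
    snr_prio = np.abs(Z) ** 2 / noise_var
    # Parametric Wiener gain, then the binary mask rule |Z_hat| <= 2|Z|.
    G = (snr_prio / (delta + snr_prio)) ** omega
    Z_hat = G * Y
    mask = np.abs(Z_hat) <= 2 * np.abs(Z)
    return Z_hat * mask, mask
```

Applying the ISTFT to the masked spectrum would then give the enhanced time-domain speech; in a practical system the noise variance would come from a noise-estimation algorithm rather than the clean reference.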
The block diagram of the process involved in the aforementioned SNRESI-based
algorithm is shown in Fig. 2. Unlike the SNR rule, the SNRESI rule selects channels
from the enhanced (noise-suppressed) spectrum rather than from the noise-corrupted
spectrum. The noise-reduction block shown may include any conventional noise
reduction algorithm; the choice of algorithm does not influence performance, at least
in terms of intelligibility [11]. If Ẑ(k, mi) > Z(k, mi), noise overestimation is
indicated, and Ẑ(k, mi) < Z(k, mi) indicates noise underestimation distortion;
normally, both are present in the processed speech.
In Region I, SNRENH(k) ≤ SNR(k), which leads to Ẑ(k, mi) ≤ Z(k, mi) and gives rise
to the condition in this region. In Region II, SNR(k) < SNRENH(k) ≤ SNR(k) + 6 dB,
which gives rise to the condition in that region. Lastly, the Region III constraint is
obtained because, in this region, SNRENH(k) > SNR(k) + 6 dB. From the definitions
of these three regions, it is clear that, to maximize SNRESI (and hence maximize
comprehensibility), the approximated magnitude spectra Ẑ(k, mi) must be retained in
Regions I and II [3, 4].
3 Objective Measures
The initial speech signal and the enhanced speech signal are generally used to
calculate the objective quality measures. In either the frequency or the time domain,
the average of the distortion measure over all speech frames is taken to evaluate
speech distortion. The objective measures computed in our work are the segmental
signal-to-noise ratio (SSNR) and short-time objective intelligibility (STOI).
3.1 SSNR
The SSNR can be estimated in both the time and the frequency domain; the
time-domain approach is the simplest measure used to evaluate a speech enhancement
algorithm. The original and processed signals are time aligned, with phase errors
rectified. The SSNR is defined as
\mathrm{SSNR} = \frac{10}{M} \sum_{m=0}^{M-1} \log_{10} \left( \frac{\sum_{i=Nm}^{Nm+N-1} Z(i)^2}{\sum_{i=Nm}^{Nm+N-1} \left( Z(i) - \hat{Z}(i) \right)^2} \right)   (6)
where Z(i) is the initial (clean) signal and Ẑ(i) is the enhanced signal; M denotes the
number of signal frames, and N indicates the frame length (20 ms).
The SSNR is based on the geometric mean of the SNRs over all frames of the signal.
A probable issue with this approximation is that, during the periods of silence in a
speech signal (which are plentiful in all human conversation), the signal energy is
very low, which results in highly negative SSNR values that bias the overall
assessment. To resolve this, the silent frames are excluded by comparing short-time
energy computations against a threshold, and the per-frame SSNR values are
restricted to a span of (−10, 35) dB, which avoids the use of a speech silence
detector. The SSNR is based on the clean and processed signals after they have been
passed through perceptual weighting filters; the segmental SNR is then computed
from the outputs of these filters [14].
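A time-domain implementation of Eq. (6) with the (−10, 35) dB frame limits described above might look as follows; the frame length of 320 samples (20 ms at 16 kHz) and the small floor constant guarding against division by zero are assumptions:

```python
import numpy as np

def ssnr(clean, enhanced, frame=320, lo=-10.0, hi=35.0):
    n_frames = min(len(clean), len(enhanced)) // frame
    vals = []
    for m in range(n_frames):
        z = clean[m * frame:(m + 1) * frame]
        err = z - enhanced[m * frame:(m + 1) * frame]
        # Per-frame SNR in dB, clamped to (lo, hi) instead of using a
        # speech silence detector.
        snr = 10 * np.log10((np.sum(z ** 2) + 1e-12) / (np.sum(err ** 2) + 1e-12))
        vals.append(float(np.clip(snr, lo, hi)))
    return float(np.mean(vals))
```

The clamping means a perfectly reconstructed frame contributes 35 dB rather than an unbounded value, and a silent frame contributes at worst −10 dB.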
3.2 STOI
Here, k1 and k2 are the 1/3rd-octave band edges rounded to the closest DFT channel.
The T-F representation of the processed signal is obtained in the same manner. The
intermediate comprehensibility measure for a single T-F unit, denoted dj(mi),
depends on N successive T-F units from both Zj(mi) and Ẑj(mi), with mi ∈ M, where
the processed units are first scaled by a factor α such that their energy equals the
clean signal energy inside that T-F region. Then, so that the signal-to-distortion ratio
(SDR) is lower-bounded, αẐj(mi) is clipped. The SDR is defined as
\mathrm{SDR}_j(m_i) = 10 \log_{10} \left( \frac{\left\| Z_j(m_i) \right\|^2}{\left\| \alpha \hat{Z}_j(m_i) - Z_j(m_i) \right\|^2} \right)   (10)
Hence,

\hat{Z}'_j(m_i) = \max\left( \min\left( \alpha \hat{Z}_j(m_i),\; Z_j(m_i) + 10^{-\beta/20} Z_j(m_i) \right),\; Z_j(m_i) - 10^{-\beta/20} Z_j(m_i) \right)   (11)
Here, Ẑ′ denotes the clipped and normalized T-F unit, and β indicates the SDR lower
bound. The intermediate comprehensibility measure is given by the correlation
coefficient between the processed and unprocessed T-F units,
d_j(m_i) = \frac{\sum_{m_i} \left( Z_j(m_i) - \frac{1}{N} \sum_l Z_j(l) \right) \left( \hat{Z}_j(m_i) - \frac{1}{N} \sum_l \hat{Z}_j(l) \right)}{\sqrt{\sum_{m_i} \left( Z_j(m_i) - \frac{1}{N} \sum_l Z_j(l) \right)^2} \sqrt{\sum_{m_i} \left( \hat{Z}_j(m_i) - \frac{1}{N} \sum_l \hat{Z}_j(l) \right)^2}}   (12)

d = \frac{1}{JM} \sum_{j, m_i} d_j(m_i)   (13)
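For one band, the correlation of Eq. (12) reduces to a few NumPy lines; the small epsilon guarding against numerically silent bands is an assumption of this sketch, and the final STOI score would average these values over all bands and frames as in Eq. (13):

```python
import numpy as np

def band_correlation(Z, Z_hat):
    # Correlation coefficient between the clean and processed T-F
    # envelopes of one 1/3-octave band (Eq. 12).
    zc = Z - Z.mean()
    ec = Z_hat - Z_hat.mean()
    denom = np.linalg.norm(zc) * np.linalg.norm(ec) + 1e-12
    return float(np.sum(zc * ec) / denom)
```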
It is clear from Table 2 that there is a significant improvement in SSNR and STOI
for speech signals corrupted by random, babble, helicopter, and car noises for the
δ and ω values of the parametric Wiener gain filter shown. The values were chosen
to obtain a trade-off between overall signal quality and comprehensibility.
4 Subjective Measures
Fig. 4 Mean comprehensibility score
Figure 4 indicates the mean percentage of words identified by listeners with normal
hearing. It is evident from the figure that intelligibility improved when the noise
distortion constraints were applied in the magnitude domain and degraded with the
Wiener-processed and unprocessed stimuli [3]. The scores, obtained as the mean
percentage of words recognized by listeners, are shown in Fig. 4, where UN
represents the values derived from unprocessed speech. There was a significant
improvement in performance when the proposed binary mask was applied, as
depicted in Fig. 3. At −5 dB, performance increased from 25% with unprocessed
stimuli (UN) to 97% with the proposed binary mask (Ẑ(k, mi) ≤ 2Z(k, mi)), and at
0 dB, performance increased from 65% with unprocessed stimuli (UN) to 99% with
the proposed binary mask (Ẑ(k, mi) ≤ 2Z(k, mi)).
6 Spectral Analysis
The parametric Wiener gain filter was used to implement the new binary mask
approach in MATLAB. Different subjective and objective tests were performed.
For the objective tests, the parameters calculated were SSNR and STOI in the
Spectrograms (frequency in Hz versus time in s) of (a) clean speech, (b) noisy
speech, (c) Wiener filter processed speech, and (d) the enhanced speech signal
time domain. The tests were run for different combinations of δ and ω of the
parametric Wiener gain filter for different background noises at 0 and −5 dB SNR
levels. The objective scores show an obvious improvement in the SSNR values for
sentences degraded by helicopter, car, random, and babble noises at 0 and −5 dB
SNR. The subjective tests also show improvement in overall speech enhancement
quality and speech comprehensibility, and the mean comprehensibility scores likewise
suggest improved intelligibility for the proposed binary mask channel selection
criterion.
8 Future Scope
References
1. Naik, D.C., Sreenivasa Murthy, A., Nuthakki, R.: A literature survey on single channel speech enhancement techniques. Int. J. Sci. Technol. Res. 9(3). ISSN 2277-8616
2. Rangachari, S., Loizou, P.C.: A noise-estimation algorithm for highly non-stationary environments. Speech Commun. 48, 220–231 (2006)
3. Kim, P., Loizou, P.C.: Gain-induced speech distortions and the absence of intelligibility benefit with existing noise-reduction algorithms. J. Acoust. Soc. Am. 1581–1596 (2011). https://doi.org/10.1121/1.3619790
4. Kim, G., Loizou, P.C.: Why do speech-enhancement algorithms not improve speech intelligibility? In: Proceedings of ICASSP 2010. IEEE (2010)
5. Nuthakki, R., Sreenivasa Murthy, A., Naik, D.C.: Single channel speech enhancement using a new binary mask in power spectral domain. In: Proceedings of the 2nd International Conference on Electronics, Communication and Aerospace Technology (ICECA 2018). IEEE (2018). ISBN 978-1-5386-0965-1
6. Nuthakki, R., Sreenivasa Murthy, A., Naik, D.C.: Modified magnitude spectral subtraction methods for speech enhancement. In: 2017 International Conference on Electrical, Electronics, Communication, Computer and Optimization Techniques (ICEECCOT). IEEE (2017)
7. Nuthakki, R.: Speech enhancement techniques. Int. J. Adv. Res. Sci. Eng. 6(8) (2017)
8. Kim, G.: Binary mask estimation for noise reduction based on instantaneous SNR estimation using Bayes risk minimisation. Electron. Lett. 51(6), 526–528 (2015)
9. Nuthakki, R., Sreenivasa Murthy, A.: Enhancement of speech intelligibility using binary mask based on noise constraints. Int. J. Recent Technol. Eng. (IJRTE) 8(3) (2019). ISSN 2277-3878
10. Li, N., Loizou, P.C.: Factors influencing intelligibility of ideal binary-masked speech: implications for noise reduction. J. Acoust. Soc. Am. (2008). https://doi.org/10.1121/1.2832617
11. Nuthakki, R., Sreenivasa Murthy, A., Naik, D.C.: Enhancement of speech intelligibility using binary mask based on channel selection criteria. Int. J. Recent Technol. Eng. (IJRTE) 8(5) (2020). ISSN 2277-3878
12. Chen, F., Loizou, P.C.: Impact of SNR and gain-function over- and under-estimation on speech intelligibility (2011)
13. Kim, G., Loizou, P.C.: A new binary mask based on noise constraints for improved speech intelligibility. In: INTERSPEECH 2010, Makuhari, Chiba, Japan, 26–30 Sept 2010
14. Ma, J., Hu, Y., Loizou, P.C.: Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions. J. Acoust. Soc. Am. (2009). https://doi.org/10.1121/1.3097493
15. Hu, Y., Loizou, P.C.: Subjective comparison and evaluation of speech enhancement algorithms. Elsevier B.V. (2006)
Evolutionary Algorithm for Solving
Combinatorial Optimization—A Review
A. Radhakrishnan
Department of Computer Science and Engineering, Amrita School of Engineering, Coimbatore, India
e-mail: r_anisha@cb.amrita.edu

G. Jeyakumar (B)
Amrita Vishwa Vidyapeetham, Ettimadai, India
e-mail: g_jeyakumar@cb.amrita.edu

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_57

1 Introduction
A combinatorial optimization problem (COP) is one in which the optimal solution is found within a finite set of possible solutions. Over the last few decades, researchers have extensively explored the use of EAs for solving complex COPs. Although several approaches are available for solving COPs, EAs have outperformed them in solving COPs and finding solutions in polynomial run time [1].
Practical applications of combinatorial optimization can be seen in virtually every real-world domain. As the "No Free Lunch" theorem [2] states, there is no single globally optimal algorithm that can solve all problems. This survey emphasizes real-world COP applications to which EAs are applied. It reviews the approaches that embed EAs to solve COPs. An analysis of how EAs solve real-world and benchmark COPs is presented along with performance measurements.
The rest of this article is structured as follows: Sect. 2 introduces combinatorial optimization problems (COPs), Sect. 3 discusses how EAs are used for solving COPs and summarizes the work by application area, and Sect. 4 concludes the article.
Problems around us that have multiple feasible solutions but one or more best solutions are called optimization problems. The process of searching for the best solution among the feasible solutions is termed optimization. Optimization problems are categorized into different types. COPs are those whose optimal solutions lie within a finite set of possible solutions. This set is defined by a set of conditions, and it is too large to search exhaustively. The mathematical techniques for finding optimal solutions to COPs involve finding an ordering of a finite set of objects (solution components) that satisfies the given conditions. COPs are harder to solve than continuous optimization problems. However, advancements in algorithm design methodologies and computing technologies have made solving COPs easier. There are two categories of approaches for formulating algorithms to solve COPs: (1) exact approaches and (2) heuristic approaches. The exact approach follows a brute-force strategy, but the complexity involved in generating all possible solutions is high. Hence, the idea of finding approximate solutions that are good enough was brought into the picture; the heuristic approaches follow this idea. They do not guarantee that the exact solution will be found but instead find approximate solutions that are good enough for the problem at hand [3]. This has led to the availability of numerous general-purpose heuristics for solving complex COPs in reasonable time. They are classified as constructive heuristics, meta-heuristics, approximate algorithms, and hyper-heuristics.
The constructive heuristics start the process by generating an "empty solution," which is then extended to obtain a complete solution. Meta-heuristics [4] are problem-independent algorithmic frameworks that provide guidelines for constructing optimization algorithms to solve problems [5]. The most popular meta-heuristics are evolutionary algorithms (EAs) [6], Tabu search [7], simulated annealing [8], and ant colony optimization [9]. The approximate algorithms are a special class of heuristics that guarantee near-optimal solutions within a specified error threshold from the global optimum. Integrating operations research and artificial intelligence techniques, the hyper-heuristic approaches aim at developing general algorithms able to generate problem-specific algorithms. The objective of this paper is to present an overview of how evolutionary algorithms (EAs) are used to solve COPs.
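To make the meta-heuristic idea concrete, the following is a minimal evolutionary algorithm sketch for a toy COP (OneMax: maximize the number of 1-bits in a fixed-length bit string). The problem, parameter values, and operator choices are illustrative assumptions, not drawn from any of the surveyed papers.

```python
import random

# Minimal EA sketch for a toy COP (OneMax: maximize the count of 1-bits).
# Population size, mutation rate, etc. are illustrative assumptions.

def one_max(bits):
    return sum(bits)

def evolve(n_bits=20, pop_size=30, generations=100, p_mut=0.05, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Tournament selection: keep the fitter of two random individuals.
        def select():
            a, b = rng.sample(pop, 2)
            return a if one_max(a) >= one_max(b) else b
        nxt = []
        for _ in range(pop_size):
            p1, p2 = select(), select()
            cut = rng.randrange(1, n_bits)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=one_max)

best = evolve()
print(one_max(best))  # close to the optimum of 20 for this toy instance
```

The same selection/crossover/mutation loop underlies the EA variants surveyed below; what changes per COP is the solution encoding and the operators applied to it.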
Table 1 Summary of algorithms for COPs (electrical power systems)
Reference | Algorithm used | Technique
Optimal power flow with ecological emission [13] | Enhanced ACO | Modified structure
Optimal chiller loading using minimum power consumption [14] | Fish algorithm | Hybridization
Optimal reactive power dispatch problem [15] | Ant lion optimizer | Global optimizer
Optimal integration of renewable energy sources [16] | PSO | Modified PSO with operators of DE
Optimal reactive power dispatch problems [17–19] | Enhanced firefly algorithm, teaching-learning-based algorithm, gravitational search algorithm | Hybridization of GA and LS
Table 2 Summary of algorithms for COPs (routing, traveling salesman, scheduling, and planning)
Reference | Algorithm used | Technique
University timetable scheduling [20] | Simulated annealing + GA | Hybridization
Parallel machines manufacturing scheduling [21] | Symbiotic organisms search, simulated annealing | Hybridization
Job scheduling [22] | PSO + simulated annealing | Hybridization
Vehicle routing problem [23] | Tabu search | Modified structure
Vehicle routing problem [24] | Modified PSO | Hybridization
Manufacturing scheduling [25] | Hybrid EDA (Markov network-based EDA) | Hybridization
Job scheduling [26] | EDA | Hybridization
Hybrid dynamic berth allocation planning problem [27] | Chemical reaction optimization | Hybridization
Flexible job scheduling [28] | PSO | Hybridization
Constrained shortest path problem [29] | PSO + VNS (variable neighborhood search) | Hybridization (GA + LS)
Traveling salesman problem [30] | ACO + 3-Opt algorithm | Hybridization
4 Conclusion
This paper presented a survey on using EA and other meta-heuristics for solving
combinatorial optimization problems (COPs).

Table 3 Summary of algorithms for COPs (pattern recognition: feature selection, classification, and clustering)
Reference | Algorithm used | Technique
Feature selection and classification [31] | ACO + BCO | Hybridization
Feature selection [32] | Artificial bee colony and gradient boosting decision tree | Hybridization
High-dimensional classification [33] | Competitive swarm optimizer (CSO), a PSO variant | Modified structure
Handwritten signature verification [34] | Artificial immune systems | Modified structure
Feature selection in big data [35] | Fish swarm optimization | Modified structure

As many real-world COPs are NP-hard, adapting EAs to find approximate solutions is an attractive possibility. EAs were originally designed for solving continuous-parameter problems and are not directly adaptable to discrete domains; a proper mapping method is needed to represent discrete solutions, since applying genetic operators yields real values. With suitable mapping and search techniques, an effective search for the global solution becomes possible. Studies have shown that EAs perform better when they are modified for solving COPs. This paper summarized several COPs solved by EAs with suitable changes made to the algorithmic structure and through hybridization.
References
1. Puchinger, J., Raidl, G.R.: Combining metaheuristics and exact algorithms in combinatorial
optimization: a survey and classification. In: Mira, J., Álvarez, J.R. (eds.) Artificial Intelligence
and Knowledge Engineering Applications: A Bioinspired Approach. IWINAC 2005. Lecture
Notes in Computer Science, vol. 3562, pp 41–53. Springer, Berlin (2005)
2. Wolpert, D.H., Macready, W.G.: No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1(1), 67–82 (1997)
3. Montiel, O., Díaz Delgadillo, F.J.: Reducing the size of combinatorial optimization problems
using the operator vaccine by fuzzy selector with adaptive heuristics. Math. Prob. Eng. (2015)
4. Osman, I.H., Kelly, J.P.: Meta-Heuristics: An Overview. Meta-Heuristics, pp. 1–21. Springer,
Boston (1996)
5. Glover, F., Sörensen, K.: Metaheuristics. Scholarpedia 10(4), 6532 (2015)
6. Back, T.: Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary
Programming, Genetic Algorithms. Oxford University Press, Oxford (1996)
7. Glover, F., Laguna, M.: Tabu search. Handbook of Combinatorial Optimization, pp. 2093–2229.
Springer, Boston (1998)
8. Van Laarhoven, P.J.M., Aarts, E.H.L.: Simulated annealing. In: Simulated Annealing: Theory and Applications, pp. 7–15. Springer, Dordrecht (1987)
9. Dorigo, M., Di Caro, G.: Ant colony optimization: a new meta-heuristic. In: Proceedings of
the 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406), vol. 2. IEEE
(1999)
10. Kazimipour, B., Li, X., Qin, A.K.: A review of population initialization techniques for evolutionary algorithms. In: 2014 IEEE Congress on Evolutionary Computation (CEC). IEEE
(2014)
11. Jin, Y.: A comprehensive survey of fitness approximation in evolutionary computation. Soft.
Comput. 9(1), 3–12 (2005)
12. Bartz-Beielstein, T., Zaefferer, M.: Model-based methods for continuous and discrete global
optimization. Appl. Soft Comput. 55, 154–167 (2017)
13. Raviprabakaran, V., Subramanian, R.C.: Enhanced ant colony optimization to solve the optimal
power flow with ecological emission. Int. J. Syst. Assur. Eng. Manag. 9(1), 58–65 (2018)
14. Zheng, Z., Li, J.: Optimal chiller loading by improved invasive weed optimization algorithm
for reducing energy consumption. Energy Build. 161, 80–88 (2018)
15. Mouassa, S., Bouktir, T., Salhi, A.: Ant lion optimizer for solving optimal reactive power
dispatch problem in power systems. Int. J. Eng. Sci. Technol. 20(3), 885–895 (2017)
16. Lorestani, A., Ardehali, M.M.: Optimal integration of renewable energy sources for
autonomous tri-generation combined cooling, heating and power system based on evolutionary
particle swarm optimization algorithm. Energy 145, 839–855 (2018)
17. Liang, R.-H., et al.: An enhanced firefly algorithm to multi-objective optimal active/reactive
power dispatch with uncertainties consideration. Int. J. Electr. Power Energy Syst. 64, 1088–
1097 (2015)
18. Ghasemi, M., et al.: Solving optimal reactive power dispatch problem using a novel teaching–
learning-based optimization algorithm. Eng. Appl. Artif. Intell. 39, 100–108 (2015)
19. Chen, G., et al.: Optimal reactive power dispatch by improved GSA-based algorithm with the
novel strategies to handle constraints. Appl. Soft Comput. 50, 58–70 (2017)
20. Fredrikson, R., Dahl, J.: A comparative study between a simulated annealing and a genetic
algorithm for solving a university timetabling problem (2016)
21. Ezugwu, A.E., Prayogo, D.: Symbiotic organisms search algorithm: theory, recent advances
and applications. Expert Syst. Appl. 119, 184–209 (2019)
22. Tang, H., et al.: Flexible job-shop scheduling with tolerated time interval and limited starting
time interval based on hybrid discrete PSO-SA: An application from a casting workshop. Appl.
Soft Comput. 78, 176–194 (2019)
23. Archetti, C., et al.: An iterated local search for the traveling salesman problem with release
dates and completion time minimization. Comput. Oper. Res. 98, 24–37 (2018)
24. Norouzi, N., Sadegh-Amalnick, M., Tavakkoli-Moghaddam, R.: Modified particle swarm opti-
mization in a time-dependent vehicle routing problem: minimizing fuel consumption. Optim.
Lett. 11(1), 121–134 (2017)
25. Gen, M., et al.: Advances in hybrid EDA for manufacturing scheduling with uncertainty: part I.
In: International Conference on Management Science and Engineering Management. Springer,
Cham (2018)
26. Hao, X., et al.: Effective multiobjective EDA for bi-criteria stochastic job-shop scheduling
problem. J. Intell. Manuf. 28(3), 833–845 (2017)
27. De, A., et al.: A hybrid dynamic berth allocation planning problem with fuel costs considerations for container terminal port using chemical reaction optimization approach. Ann. Oper. Res. 1–29 (2018)
28. Nouiri, M., et al.: An effective and distributed particle swarm optimization algorithm for flexible
job-shop scheduling problem. J. Intell. Manuf. 29(3), 603–615 (2018)
29. Marinakis, Y., Migdalas, A., Sifaleras, A.: A hybrid particle swarm optimization–variable
neighborhood search algorithm for constrained shortest path problems. Eur. J. Oper. Res.
261(3), 819–834 (2017)
30. Mahi, M., Baykan, O.K., Kodaz, H.: A new hybrid method based on particle swarm optimiza-
tion, ant colony optimization and 3-opt algorithms for traveling salesman problem. Appl. Soft
Comput. 30, 484–490 (2015)
31. Shunmugapriya, P., Kanmani, S.: A hybrid algorithm using ant and bee colony optimization
for feature selection and classification (AC-ABC Hybrid). Swarm Evol. Comput. 36, 27–36
(2017)
32. Rao, H., et al.: Feature selection based on artificial bee colony and gradient boosting decision
tree. Appl. Soft Comput. 74, 634–642 (2019)
33. Gu, S., Cheng, R., Jin, Y.: Feature selection for high-dimensional classification using a
competitive swarm optimizer. Soft. Comput. 22(3), 811–822 (2018)
34. Parmar, M., et al.: State of art survey signature verification techniques 2019. Asian J.
Convergence Technol. (AJCT) 5(3), 91–96 (2020)
35. Manikandan, R.P.S., Kalpana, A.M.: Feature selection using fish swarm optimization in big
data. Cluster Comput. 22(5), 10825–10837 (2019)
Effect of J48 and LMT Algorithms
to Classify Movies
in the Web—A Comparative Approach
Abstract Social media websites such as Facebook, YouTube, Twitter, etc., are convenient platforms for sharing one's views about multimedia. Millions of videos are uploaded to YouTube every day, spanning different categories such as comedy, sports, news, advertisements, movie trailers, etc. Nowadays, data mining researchers are drawn to different classification techniques to discover hidden information and knowledge in huge volumes of video data. The goal of this research is to classify and predict movie trailer videos as poor, good, very good, or excellent movies based on metadata such as likes, dislikes, comments, ratings, budget, etc. The present work attempts to provide an effective mining result for classifying social media movies. These movies are labelled based on a particular class and other related attributes of the same dataset. A 10-fold cross-validation test is applied to the J48 and LMT decision tree algorithms, and a comparative analysis is made based on the confusion matrices and accuracy rates.
Keywords Classification · J48 Decision tree · LMT Decision tree · Social Media
1 Introduction
Social media data is very large, and the data is often noisy, fuzzy, incomplete, and unstructured in nature. At the same time, it is an essential task to handle such data and discover useful information or knowledge from it. Every day, about a million gigabytes of movie videos are uploaded to social media websites such as Facebook, YouTube, and Instagram [1].
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 547
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_58
548 P. Bhat and P. Malaganve
In the proposed work, using the rating metadata, every movie is classified as poor, good, very good, or excellent to watch. Ratings are marks in the range 0–10 given to a movie trailer on YouTube to convey a viewer's opinion about it; the average of the ratings from different viewers is assigned to the movie. It gives the user a hint when deciding whether to watch that movie. The present dataset contains several attributes related to YouTube and Twitter. Another metadata attribute, budget [2], is also considered for comparison. The budget attribute is divided into three labels: high budget movie, average budget movie, and low budget movie [3, 4].

In this work, the WEKA tool [5] is used to pre-process and classify the dataset considering the class and other related metadata. The decision tree algorithms J48 [6] and LMT are used for training and testing the dataset with a 10-fold cross-validation test [7], and a comparative analysis is made between J48 and LMT [8].

The rest of the paper is organised as follows: literature review, proposed methodology (including the sample dataset), table of contents, confusion matrices of both the J48 and LMT decision tree algorithms, findings, and conclusion.
2 Proposed Model
Figure 1 represents the proposed model for the comparative analysis of the J48 and LMT decision trees based on confusion matrix and accuracy rate. The data is extracted from the social media platforms YouTube and Twitter and stored in a .CSV file for pre-processing. The pre-processed data is classified into classes such as poor movie, good movie, very good movie, and excellent movie based on the metadata using the LMT and J48 data mining algorithms. A 10-fold cross-validation test is applied to the pre-processed dataset to divide it into training and testing data and obtain an efficient classification result. Finally, the accuracy rate and confusion matrix of both the LMT and J48 algorithms are generated using WEKA; the results of the two algorithms are then compared and analysed to determine which provides the better accuracy on the movie trailer dataset.
The J48 decision tree is an advanced version of C4.5. The algorithm uses a divide-and-conquer method and applies pruning while constructing the tree [9]. It commonly uses the information gain (entropy) measure to choose split attributes [10]. The result is a tree structure with a root node, intermediate nodes, and leaf nodes; each node holds a decision and helps acquire the result [11].
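For illustration, the entropy measure that underlies information gain can be computed as follows (a generic sketch, not WEKA's actual implementation):

```python
import math
from collections import Counter

# Sketch of the entropy (information) measure that C4.5/J48-style trees
# use when choosing split attributes.

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    # Sum of -p * log2(p) over the class proportions at a node.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

# A pure node has entropy 0; a 50/50 split has entropy 1 bit.
print(entropy(["good"] * 4))          # → 0.0
print(entropy(["good", "poor"] * 2))  # → 1.0
```

A split's information gain is the parent node's entropy minus the weighted entropy of its children; J48 prefers splits that maximize this gain.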
Fig. 1 Proposed model for comparison analysis of the J48 and LMT decision trees (data selection → .CSV file → data pre-processing → classification algorithms (LMT, J48) → training/testing → knowledge discovery → comparison analysis)
Movie: The dataset contains 232 different movie names, stored under the Movie attribute column.
Year: The year in which each movie was released.
Ratings: The rating the movie has received, based on which the movie can be classified.
Genre: The genre or category of the movie.
Gross: Gross collection of the movie after its release.
Budget: Total budget required to make the movie.
Screens: Number of screens in the USA on which the movie was released.
Sequels: Number of sequels made after the movie.
Sentiment: Sentiment score of the movie.
Views: Number of views of the movie trailer on YouTube.
Likes: Number of likes of the movie trailer on YouTube.
Dislikes: Number of dislikes of the movie trailer on YouTube.
Comments: Number of comments on the movie trailer on YouTube.
Aggregate followers: Aggregate actor followers on Twitter.
Table 1 contains the real values of the dataset used, with the respective data types. To label the class as poor movie, good movie, very good movie, or excellent movie, we considered the "Ratings" attribute and converted its numeric values to nominal values using the ranges shown in Table 2.
Table 2 Classification of rating attribute
Rating range | Label
0–3 | Poor
3.1–5 | Good
5.1–8 | Very good
8.1–10 | Excellent
Table 3 Classification of budget attribute
Budget range | Label
Budget < 800,000 | Low budget movie
800,000 ≤ Budget < 16,000,000 | Average budget movie
Budget ≥ 16,000,000 | High budget movie
The attribute "Budget," which carried numeric values, was likewise converted to nominal values to classify the dataset in a better way. The nominal values of the "Budget" attribute are labelled as shown in Table 3.
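The numeric-to-nominal conversion of the two attributes can be sketched as follows, using the thresholds from Tables 2 and 3 (the function names are illustrative, not part of the paper's WEKA workflow):

```python
# Sketch of the numeric-to-nominal conversion applied to the Ratings
# and Budget attributes (thresholds from Tables 2 and 3).

def rating_label(rating):
    if rating <= 3:
        return "Poor"
    elif rating <= 5:
        return "Good"
    elif rating <= 8:
        return "Very good"
    return "Excellent"

def budget_label(budget):
    if budget < 800_000:
        return "Low budget movie"
    elif budget < 16_000_000:
        return "Average budget movie"
    return "High budget movie"

print(rating_label(7.2))        # → Very good
print(budget_label(5_000_000))  # → Average budget movie
```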
Here, the dataset can be divided into different numbers of folds. If we consider 10 folds, the dataset is divided into 10 subsets. In the first iteration, the first subset is used as the testing dataset and the remaining 9 subsets as the training dataset; in the second iteration, the second subset is used for testing and the remaining 9 for training, and so on. Hence, the entire dataset is eventually used as both training and testing data.
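The fold-splitting scheme described above can be sketched as follows (a standalone illustration; the paper itself performs this inside WEKA):

```python
# Illustrative sketch of 10-fold cross-validation: each fold serves
# exactly once as the test set while the remaining folds train.

def k_fold_splits(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of the k iterations."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

# Example: 232 movies (the dataset size reported above), 10 folds.
folds = list(k_fold_splits(232, k=10))
print(len(folds))  # → 10
```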
Figure 2 shows the accuracy of the J48 decision tree as 85.71% and that of the LMT decision tree as 86.58%; therefore, LMT gives a better result than J48. Observing the confusion matrices in Fig. 3, J48 correctly classified 193 instances as very good movies, 5 instances as good movies, and 0 instances as excellent movies; since the dataset contains no poor movies (ratings below 3), the confusion matrix has only three classes. LMT, in turn, correctly classified 199 instances as very good movies, 1 instance as a good movie, and 0 instances as excellent movies [13, 14].
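For illustration, the accuracy rate follows from a confusion matrix as the fraction of diagonal (correctly classified) entries; the matrix values below are made up for the example, not the exact WEKA output:

```python
# Sketch: computing accuracy from a confusion matrix
# (rows = actual class, columns = predicted class).
# The 3x3 matrix below is invented for illustration only.

def accuracy(confusion):
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

m = [[193, 5, 2],   # very good
     [10, 5, 0],    # good
     [3, 0, 0]]     # excellent
print(round(accuracy(m), 4))
```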
4 Conclusion
As seen in Sect. 3.2, the comparative analysis of the J48 and LMT decision trees, the accuracy of LMT is better than that of J48, as the LMT decision tree correctly classified a greater number of instances. Based on the analysis of the confusion matrices, the classification accuracy, and the other measures shown in Fig. 2 (kappa statistic, mean absolute error, root mean squared error, relative absolute error, and root relative squared error), which play a vital role in judging how correctly the instances are classified, we conclude that the LMT decision tree is the most suitable classification method for classifying the movie trailer dataset with good efficiency and accuracy.
5 Future Work
In the future, we will attempt to check whether all high budget movies are rated excellent or very good, and whether low budget and average budget movies can also carry good ratings.
References
1. Sharma, A.K., Sahni, S.: A comparative study of classification algorithms for spam email data
analysis. Int. J. Comput. Sci. Eng. (IJCSE). 3(5) (2011). ISSN 0975-3397
2. Rangaswamy, S., Ghosh, S., Jha, S., Ramalingam, S.: Metadata extraction and classification of
YouTube videos using sentiment analysis. In: 2016 IEEE International Carnahan Conference
on Security Technology (ICCST)
3. Algur, S.P., Bhat, P., Kulkarni, N.: Educational data mining: classification techniques for recruitment analysis. Int. J. Modern Educ. Comput. Sci. 2, 59–65 (2016). https://doi.org/10.5815/ijmecs.2016.02.08
4. Bansal, A., Gupta, C.L., Muralidhar, A.: A sentimental analysis for YouTube data using supervised learning approach. Int. J. Eng. Adv. Technol. (IJEAT) 8(5) (2019). ISSN 2249-8958
5. Weka—Data Mining Machine Learning Software. Available at http://www.cs.waikato.ac.nz/
ml/weka/
6. Kalmegh, S.R.: Comparative analysis of WEKA data mining algorithm random forest,
Randomtree and LADTree for classification of indigenous news data. Int. J. Emerg. Technol.
Adv. Eng. www.ijetae.com. 5(1) (2015, January). ISSN 2250-2459, ISO 9001:2008 Certified
7. Bhat, P., Malaganve, P., Hegde, P.: A new framework for social media content mining and
knowledge discovery. Int. J. Comput. Appl. (0975 – 8887) 182(36) (2019, January)
8. Kalmegh, S.: Analysis of WEKA data mining algorithm REPTree, simple cart and randomtree
for classification of Indian News. Int. J. Innov. Sci. Eng. Technol. (IJISET) 2(2) (2015, February)
9. Nahar, N., Ara, F.: Liver disease prediction by using different decision tree techniques. Int. J.
Data Mining Knowl. Manag. Process (IJDKP) 8(2) (2018, March)
10. Algur, S.P., Bhat, P.: Web video mining: metadata predictive analysis using classification tech-
niques. Int. J. Inf. Technol. Comput. Sci. 2, 68–76 (2016). (Published Online February 2016 in
MECS)
11. Algur, S.P., Bhat, P.: Abnormal web video prediction using RT and J48 classification techniques.
Int. J. Comput. Sci. Eng. 4(6), 101–107 (2016, June). E-ISSN 2347-2693
12. Malik, H., Tian, Z.: A framework for collecting YouTube meta-data. Procedia Comput. Sci. (2017). https://doi.org/10.1016/j.procs.2017.08.347
13. Algur, S.P., Bhat, P., Ayachit, N.H.: Educational data mining: RT and RF classification models for higher education professional courses. Int. J. Inf. Eng. Electron. Bus. 2, 59–65 (2016). https://doi.org/10.5815/ijieeb.2016.02.07
14. Vadhanam, B.R.J., Mohan, S., Ramalingam, V.V., Sugumaran, V.: Performance comparison
of various decision tree algorithms for classification of advertisement and non advertisement
videos. Indian J. Sci. Technol. 9(48) (2016, December). https://doi.org/10.17485/ijst/2016/
v9i48/102098
A System to Create Automated
Development Environments
Using Docker
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 555
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_59
556 N. S. Akhilesh et al.
1 Introduction
Modern applications are often quite complex. Generally, they are composed of a number of software components, each playing a vital role in the application (e.g., a MEAN stack application uses MongoDB as its database, Express for routing, AngularJS for its frontend, and NodeJS for its back-end). Setting up and managing each of these software dependencies (as well as each of their own internal dependencies) can be quite cumbersome, especially in a team of people working on the application. This is where Docker comes in. Docker is a tool that (among other things) allows one to define all the software dependencies of an application in a configuration file called a Dockerfile (or a docker-compose.yml file) and feed that file to Docker, which then uses it to create a development environment with all the listed dependencies automatically installed and set up. In a team, the configuration file can easily be shared via Git to ensure uniformity across the team over the application's dependencies.
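For example, a minimal docker-compose.yml of the kind described above might look like this (the service names, image tags, and ports are illustrative assumptions, not taken from any particular project):

```yaml
# Hypothetical compose file for a MEAN-style stack: one shared file
# that recreates the same environment on every team member's machine.
version: "3"
services:
  db:
    image: mongo:4.4          # database dependency, pinned to a version
    volumes:
      - db-data:/data/db
  web:
    build: .                  # built from the project's own Dockerfile
    ports:
      - "3000:3000"
    depends_on:
      - db
volumes:
  db-data:
```

Checking this file into Git and running `docker-compose up` is then enough to recreate the whole environment on another machine.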
By automating much of the dependency setup process, Docker has not only solved a great many problems related to dependency management (such as version clashes, complications involved in version updates, and OS-level interference) but has also made the actual process of development easier [1]. Needless to say, Docker is an incredibly useful piece of software, and its wide-scale adoption in the industry reflects this. In this paper, we focus primarily on Docker's ability to automate dependency management, as we believe a great many programmers (particularly novice programmers and students) can benefit from this feature. Novice programmers in particular often face difficulty setting up the dependencies for a language, tool, or framework before they can start using it (e.g., setting up Ruby on Rails on Windows or setting up a MEAN stack application). Docker can be useful in this situation, but it does have an associated learning curve, and people who are new to programming or unfamiliar with the topic may need to understand concepts such as virtualization before they are able to understand Docker.
In this paper, we propose a system that takes the form of an integrated development environment (IDE) and uses Docker under the hood to set up environments for any language, tool, or framework, which people can immediately start working with. The end product should be a code editor similar to VS Code, but one where a developer can additionally type in a few pieces of information (such as a language and its version) and the editor will then automatically set up an environment based on that information in which the developer can start coding. In essence, this is a system that abstracts on top of Docker to allow for its use without having to know how to write a Dockerfile or docker-compose.yml file. Such a system would be useful to novice programmers, to students (who want to learn various languages, tools, and frameworks without having to worry about setting them up), to computer labs (since this one system can replace a number of languages, tools, and frameworks that would otherwise need to be set up individually), and to developers interested in leveraging the power of multiple languages side by side in a Jupyter-notebook-style operating environment.
2 Literature Survey
2.1 Docker
Sample work and implementation are essential in many aspects of scientific research; being able to reproduce the work of specific research has become vital to its verification and validation by researchers and domain experts. Though reproducing computer software seems significantly simpler than replicating the physical environments of some experiments, the ever-changing nature of software today and the challenge of interoperating dependencies can make this task a serious one. This is where Docker can prove extremely useful, as it stands as a far superior alternative to existing solutions such as workflow systems and virtual machines. Carl Boettiger illustrates this in his paper, where he sets up an R statistical environment under various conditions (including Docker) and compares them [11]. Additionally, Docker is useful in automating the various tasks of workflow systems such as Makeflow and Work Queue. Containers can be connected to various points of a workflow's infrastructure, and several methods have been produced to manage the container images that need to be shared for the execution of tasks [12]. All of this points to Docker's extensive use in the field of research, an area of interest for this paper, since replicating environments to test and review peer research is a vital aspect of the field.
2.3 Electron JS
Electron is an open-source framework that can be used for desktop app development.
It is created and maintained by GitHub who used it to build the Atom editor. Electron
uses a combination of the Chromium browser and the NodeJS runtime to create fully
functioning desktop applications. Because of this, it allows for the UI development
of the application to be done using standard HTML, CSS and JS, while the core logic
of the application is done via NodeJS. The majority of Electron's APIs are written
in C++ and Objective-C and are then exposed to the core logic via NodeJS bindings
[13].
3 Existing Solutions
Automation in development is not a new concept; tools for it existed even before
Docker. So, as a developer, how can you tackle the common issues that arise when
working on a coding project, such as dependency hell, poor documentation (which can
often make it difficult to set up, initiate and work on existing projects) and code
rot (code changing behaviour due to external circumstances such as OS updates or bug
fixes in the languages used by the software) [11]?
4 Proposed System
As we have discussed in the previous section, Docker alone already achieves a
great deal of automation. Its only flaw is its learning curve, which can
deter people who are new to programming. Therefore, in this paper, we propose a
solution which abstracts the features of Docker and provides them to a user through
an easy-to-use and simplified user interface, allowing the user to leverage Docker
without knowing how to use it.
Our proposed system is an IDE in which a user would enter a few details in a
form and the IDE will then use that information to set up a development environment
by creating the required Dockerfiles and docker-compose.yml files as well as the
required language files for that specific language, tool or framework that the user
wishes to work with.
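As a rough illustration of how such generation could work, the sketch below emits a minimal Dockerfile and docker-compose.yml from form input. The function names, template text and file contents are assumptions for illustration, not the system's actual templates.

```python
# Hypothetical sketch: turning IDE form input into a Dockerfile and a
# docker-compose.yml. Template strings here are illustrative assumptions.

def render_dockerfile(language: str, version: str, entry: str) -> str:
    """Produce a minimal Dockerfile for the chosen language image."""
    return (
        f"FROM {language}:{version}\n"
        "WORKDIR /app\n"
        "COPY . /app\n"
        f'CMD ["{language}", "{entry}"]\n'
    )

def render_compose(service: str) -> str:
    """Produce a minimal docker-compose.yml wiring the generated image."""
    return (
        "version: '3'\n"
        "services:\n"
        f"  {service}:\n"
        "    build: .\n"
        "    volumes:\n"
        "      - .:/app\n"
    )

print(render_dockerfile("node", "14", "index.js"))
```

In a real implementation, these strings would come from the recipe's handlebars templates rather than being hard-coded.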
The proposed system will also allow for these development environments to be
shareable. This is done by allowing for each development environment to be created
using a minimal configuration file. Adding this configuration file to any normal
project will make it compatible with the system, and we refer to such a project as
a recipe. These projects (recipes) can then be shared and managed via Git allowing
for them to be community driven and customizable.
The team behind the proposed system will maintain official recipes for various
languages that will act as both stable defaults and base recipes. Any individual can
then build on these official recipes to create customized and personalized recipes, and
this would be especially useful in private organizations where development teams
might have their own custom setups for products.
5 Mechanism
Assuming that all the OS and software requirements are fulfilled, the application
works in the following way (let us assume that a user of the application wishes to
execute some code in NodeJS). First, the application pulls a recipe template for
NodeJS (the official one by default, but a custom one can be specified by the user)
from GitHub/GitLab/BitBucket; recall that a recipe is merely a project with a special
configuration file, and template here refers to a handlebars template. The template
is stored in a special directory reserved for the application by the OS, which is
usually:
• Windows XP—C:/Documents and Settings/USERNAME/Application Data
• Windows 7 and above—C:/Users/USERNAME/AppData/Roaming
• MacOS—/Users/USERNAME/Library/Preferences
• Linux—/home/USERNAME/.local/share.
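The per-OS paths listed above can be resolved with a small helper; this is a sketch of the lookup only, and the function name and signature are assumptions rather than the system's actual code.

```python
# Sketch: resolve the per-OS application-data directory listed in the text.
# Platform keys and the username parameter are illustrative assumptions.

def app_data_dir(platform: str, username: str) -> str:
    """Return the conventional application-data directory for a platform."""
    if platform == "windows":
        return f"C:/Users/{username}/AppData/Roaming"
    if platform == "darwin":  # macOS
        return f"/Users/{username}/Library/Preferences"
    # Default to the Linux convention used by the application.
    return f"/home/{username}/.local/share"

print(app_data_dir("linux", "alice"))  # /home/alice/.local/share
```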
Then, the application reads any inputs that were specified by the creator of the
recipe and renders them as a form to the user. The application then takes the output
of the form, uses it to fill out the recipe template, and places the recipe in a
directory local to the project in which the user is working (usually a directory
called "judip_recipes").
After the recipe is added, it appears on the frontend as a codeblock to the user
where the user can then enter any type of (in this case NodeJS) code, and the entered
code gets saved to the locally stored recipe.
Finally, the application checks the newly installed recipe’s configuration file which
contains “execute” and “execute_background” keys that the application can use to
execute the recipe.
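The final step above can be sketched as follows. The configuration is assumed here to be JSON with "execute" and "execute_background" keys holding shell commands; the paper does not specify the real format, and `run_recipe` is a hypothetical helper, not the system's actual API.

```python
import json
import subprocess

# Hedged sketch of executing a recipe via the command declared in its
# configuration file. The JSON format is an assumption for illustration.

def run_recipe(config_path: str, background: bool = False):
    """Run the command stored under "execute" or "execute_background"."""
    with open(config_path) as f:
        config = json.load(f)
    key = "execute_background" if background else "execute"
    if background:
        # Fire and forget: the process keeps running behind the codeblock.
        return subprocess.Popen(config[key], shell=True)
    # Foreground: wait for the command and capture its output for the UI.
    return subprocess.run(config[key], shell=True, capture_output=True, text=True)
```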
6 Results
7 Conclusion
References
10. Zhang, Q., Liu, L., Pu, C., Dou, Q., Wu, L., Zhou, W.: A Comparative Study of Containers and
Virtual Machines in Big Data Environment (2018). https://arxiv.org/pdf/1807.01842.pdf
11. Boettiger, C.: An Introduction to Docker for Reproducible Research, with Examples from the
R Environment (2014)
12. Zheng, C., Thain, D.: Integrating Containers into Workflows: A Case Study Using Makeflow,
Work Queue, and Docker (2015)
13. Electron. https://www.electronjs.org/
Novel Methodologies for Processing
Structured Big Data Using Hadoop
Framework
Abstract There are many tools and techniques to store and process data, but
traditional systems fail to handle big data because of its structure, size, and
other characteristics. For this reason many new tools have been developed, and
Hadoop is one of them. Hadoop is a framework that contains many tools to manage big
data. Apache Hadoop has a tool called Hive which can be used to process big data in
structured form. There are many ways in which big data can be processed, but if a
user is not well-versed in programming yet knows a query language like SQL, the
required information can be retrieved using Hive. Using Apache Hive not only
reduces the number of lines of code but also saves the programmer's time. This paper
explains the working of Hive along with an illustration of how useful data can be
retrieved using HiveQL, and presents an effective way of achieving big data
analytics using Hadoop Hive.
1 Introduction
The data obtained in huge volumes from different sources can be in any form:
structured, unstructured, or semi-structured. Based on which of these a dataset is,
an appropriate tool can be selected to extract useful data. The challenge is getting
something of value for the user out of the huge amount of data gathered from various
sources [1]. Analysis of big data [2] yields information that users can apply to
implement new ideas in their business, thereby increasing its efficiency. Extracting
information from huge datasets can also help in monitoring financial transactions. Information
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 565
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_60
566 P. Bhat and P. Hegde
retrieval can be done from big data [3] for use in healthcare, crime, and
other fields. The data that flows into a system needs to be analyzed effectively and
quickly to get useful results [4]. Hadoop enables storing and processing of big data
across a cluster of computers using a simple programming model. Hadoop MapReduce is
one data processing technique that can be applied to perform big data analytics.
Another Hadoop tool, Apache Hive, can be used to process data quickly. Hive is a
data warehouse system which allows a user to write queries similar to SQL and get
appropriate answers. Hive [5] is used to analyze huge amounts of data stored in
HDFS. This paper gives insight into working with big data using Hive, explains the
flow of data in the Hive system and the way a query is processed [6], and also
focuses on the characteristics of the Hive system.
2 Related Work
Kan et al. [7] presented a paper noting that Electronic Health Records (EHR) store
information in digital format. Due to technological innovations, the data in EHRs is
increasing, and effective techniques are needed to store, analyze, and interpret
this heterogeneous data. The authors focused on techniques by which EHR data can be
analyzed and the required information retrieved; their Hive queries are executed on
HDFS. The paper also shows the use of Tableau as a data analysis technique to get
meaningful information from visual graphs.
Potharaju et al. [8, 9] describe Hadoop not as a single piece of software that can
simply be downloaded onto a computer but as a framework containing many tools. Hive
facilitates an SQL-like language called HiveQL and uses ad hoc queries to analyze
huge datasets stored in Hadoop. The authors presented simple examples of using Hive
on Hadoop, explained how to create tables, store data in them, and retrieve data
when required, and also reported cumulative CPU time and the time required for
fetching records from files.
Another paper observes that Apache Hive provides analytical power to users and
organizations and has become the de facto standard for SQL on Hadoop. Hive was
created at Facebook in 2008. The paper compares Apache Hive with Impala, Shark, and
HAWQ, and explains the strengths that have made Hive an enterprise SQL data
warehouse.
Thusoo et al. [10] presented a paper observing that warehouse systems are becoming
expensive as the datasets to be analyzed grow rapidly. The MapReduce programming
model requires users to write custom programs, which makes it a low-level model.
The paper explains the Hive architecture, Hadoop, and the HiveQL language. HiveQL
allows users to add custom MapReduce scripts to queries and supports tables, data
types, arrays, maps, etc. Hive also has a metastore containing schemas and
statistics that can be used for data exploration and query optimization.
3 Proposed Methodology
Data growing exponentially with time can be stored and processed using a framework
called Hadoop, which provides tools such as MapReduce, Hive, and Tez. MapReduce is a
programming model which processes data in two steps, map and reduce, and requires
the user to write lengthy programs. Rather than writing long code, it is easier to
query a dataset to retrieve information. Hive is built to work with structured data
much as SQL does and can be used to query data of huge volume. Hive is a warehouse
infrastructure developed on top of Hadoop; its architecture is one solution for
managing big data, and it works on data stored in HDFS. Hive uses a language called
HiveQL to query data and works on the principle of write once, read many times.
Hive processes data using MapReduce, but the user need not write long code: a Hive
query is converted into a MapReduce program, and the retrieved data is returned to
the user. Hive is essentially a translator which makes the user's work much easier
(Fig. 1).
3.1 UI
The user interface is the interface between the user and Hive, allowing the user to
communicate with Hive. Hive provides a command line interface, a web interface, and
a Thrift server through which users submit their queries.
3.2 Metastore
[Fig. 1: Hive architecture. The driver asks the compiler for a plan (step 2), the
compiler gets metadata from the metastore (steps 3 and 4) and sends the plan back to
the driver (step 5), which passes it to the execution engine (step 6).]
3.3 Execution Engine
Its work is to execute the plan developed by compiler. To execute the work plan, it
interacts with name node, resource manager. It communicates with data node, where
actual data is stored. It also communicates bidirectionally with metastore to perform
data definition language operations. After communication with Hadoop daemons like
data node, name node, and job tracker execution engine executes query on HDFS.
The result generated is sent to user interface via driver.
4 Characteristics of Hive
Hive is a data warehouse infrastructure which resides on the top of Hadoop and is
used to analyze big data. Some of the characteristics of Hive are as follows (Fig. 2):
• Large data: Hive is a tool that can be used to process data with huge volume.
• Language: Hive uses a query language called HiveQL.
• Table structure: Hive stores data in table format. That is, it stores data in terms of
rows and columns.
• Data analysis: Hive is used to retrieve useful information from large dataset.
Hence, it helps in data analysis.
• Storage: Hive works on data stored in Hadoop distributed file system.
• Multi-user: More than one user can query data stored in Hadoop distributed file
system at the same time using HiveQL language provided by Hive.
[Fig. 2: characteristics of Hive: large data, table structure, storage, and data
analysis.]
Hive can be used to query huge databases that cannot practically be queried with a
conventional structured query language (SQL) system. Queries are converted into a
series of MapReduce jobs, so the user need not write long MapReduce code. Hive uses
a query language called HiveQL to get useful work done (Fig. 3).
Consider a dataset of Zomato restaurants in India. It is a huge dataset which can
be effectively queried using Hive. It contains following attributes:
• Res_id: It represents restaurants id.
• Name: It represents name of the restaurant.
• Establishment: It gives details of restaurant whether it is dhaba, quick bites, casual,
etc.
• Url: It represents url address.
• City: It represents the name of the city where the restaurant is located.
• City_id: It represents city id number.
• Locality: It represents locality of restaurant.
• Latitude: It gives latitude coordinate of restaurant.
• Longitude: It gives longitude coordinate of restaurant.
Once the dataset has been added to the table, it can be queried as required by the
user. The dataset has now been placed in the Hadoop distributed file system, and
useful information can be retrieved from it by writing queries in the HiveQL
language.
A. Display the number of restaurants in Panaji.
SELECT COUNT(*) from zomato1 WHERE locality = “Panaji”;
This query gives the total number of restaurants which provide Zomato service
in the Panaji locality.
B. Display different names of cities which are mentioned in the dataset.
SELECT DISTINCT(city) FROM zomato1;
This query returns the distinct city names given in the dataset.
C. Display the names and the average cost for two people of restaurants located in Amritsar.
SELECT name, average_cost_per_two from zomato1 where city = "Amritsar";
This query returns the list of restaurant names with the respective average cost for two people.
D. How many restaurants are providing Zomato service in Udupi?
SELECT COUNT(*) from zomato1 WHERE city = "Udupi";
This query gives the total number of restaurants in Udupi city (Fig. 4).
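The query logic above, including the city filter needed in query D, can be checked on a toy table with Python's built-in sqlite3 module. HiveQL and SQLite differ in syntax details and execution (Hive compiles to MapReduce jobs), so this is only an illustration of the query semantics; the sample rows are invented.

```python
import sqlite3

# Toy in-memory stand-in for the Zomato table; rows are made up for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zomato1 (res_id INT, name TEXT, city TEXT, "
             "locality TEXT, average_cost_per_two INT)")
conn.executemany("INSERT INTO zomato1 VALUES (?,?,?,?,?)", [
    (1, "A", "Panaji",   "Panaji",     400),
    (2, "B", "Amritsar", "Hall Bazar", 300),
    (3, "C", "Udupi",    "Car Street", 200),
    (4, "D", "Udupi",    "Car Street", 250),
])

# A: restaurants in the Panaji locality
print(conn.execute(
    "SELECT COUNT(*) FROM zomato1 WHERE locality='Panaji'").fetchone()[0])  # 1
# B: distinct city names
print([r[0] for r in conn.execute("SELECT DISTINCT city FROM zomato1")])
# D: restaurants in Udupi (the WHERE clause makes the count per-city)
print(conn.execute(
    "SELECT COUNT(*) FROM zomato1 WHERE city='Udupi'").fetchone()[0])  # 2
```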
5 Conclusion
In this paper, we discuss how data flows in Hive and how it processes data. Along
with characteristics of Hive, this paper explains some of the novel examples for
creating, storing, and retrieving useful information using HiveQL commands. Big
data is mainly characterized by volume, velocity, and variety, and it can be
structured, unstructured, or semi-structured. Big data can be analyzed using
techniques like MapReduce, but MapReduce expects the user to write code to extract
useful information from stored data. If the stored data is in structured format, it
can instead be analyzed using Hadoop Hive, which requires the user to write a query
rather than long program code. This not only saves the user's time but also helps
users who do not have much expertise in coding. Hive processes the data by storing
it in the form of rows and columns, i.e., in table format, and converts queries
written in HiveQL into MapReduce tasks.
References
1. Peng, X., Liu, L., Zhang, L.: A hive-based retrieval optimization scheme for long-term storage
of massive call detail records. IEEE Access. https://doi.org/10.1109/Access.2019.2961692
2. Shakhovska, N., Veres, O., Mariia, H.: Generalized formal model of big data. ECON-
TECHCHMOD Int. Q. J. 5(2), 33–38
3. Kapil, G., Agrawal, A., Khan, R.A.: Big data security issues. Asian J. Comput. Sci. Technol.
7(2), 128–133
4. Pandey, P., Satsangi, C.S.: Comparative performance using Hadoop ecosystem-PIG and HIVE
through rendering of duplicates. ICANI2018. https://doi.org/10.1007/978-981-13-2673-8_11
5. Krishna Mohan, K.V.N.: Query optimization in big data Hadoop using hive 4(1), 2347–9272
(2016)
6. Pushpalatha, N., Sudheer, P.: Data processing in big data by using hive interface 3(4), 2321–
7782 (2015)
7. Kan, K., Cheng, X., Kim, S.H., Jin, Y.: Apache hive-based big data analysis of health care data.
Int. J. Pure Appl. Math. 119(18), 237–259 (2018)
8. Potharaju, S.P., Shanmuk Srinivas, A., Tirandasu, R.K.: Case study of hive using Hadoop. Int.
J. Eng. Res. Technol. 3(11) (2014). ISSN: 2278–0181
9. Pushpa, S.K., Manjunath, T.N.: Analysis of airport data using Hadoop-hive: a case study. Int.
J. Comput. Appl. 0975–8887 (2016)
10. Thusoo, A., Sarma, J.S., Jain, N., Shao, Z., Chakka, P., Zhang, N., Antony, S., Liu, H., Murthy,
R.: Hive a petabyte scale data warehouse using Hadoop. In: Proceedings of 26th International
Conference on Data Engineering, California, USA, pp. 996–1005. https://doi.org/10.1109/
ICDE.2010.5447738
11. Gupta, A.: HIVE-processing structured data in Hadoop. Int. J. Sci. Eng. Res. 8(6), 2229–5518
(2017)
12. Patel, N.: Analyzing of vehicle registration trend in NY using HBase, pig, hive and MapReduce.
https://doi.org/10.13140/RG.2.2.18574.92488
13. Amiripalli, S.S., Tirandasu, R.K.: Case study of hive using hive. Int. J. Curr. Eng. Sci. Res.
1(3), 2393–8374 (2014)
14. Dubey, A.: Big data. Int. J. Eng. Serv. Manage. Res. 5, 9–12 (2018). https://doi.org/10.29121/
ijetmr.v5.i2.2018.606
15. Manike, C., Nanda, A.K., Gajulagudem, T.: Hadoop Scalability and Performance Testing in
Homogeneous Clusters. https://doi.org/10.1007/978-3-030-30577-2_81
Intelligent Cane for Assistant to Blind
and Visual Impairment People
1 Introduction
According to the latest research conducted by World Health Organization, there are
at least 2.2 billion people suffering from vision impairment or blindness, of whom
around 1 billion people have a vision impairment that could have been prevented
or has yet to be addressed [1]. In India, there are 40 million blind people, of
whom 1.6 million are children [2]. The major causes are infections, diabetic retinopathy,
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 573
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_61
574 M. Patel et al.
age-related macular degeneration, cataract, and glaucoma [1]; among these,
cataract is the most common cause of blindness.
Blind people may require assistance in certain circumstances but do not always
receive it, so for their convenience this assistive aid is meant to guide them in
place of the old, traditional white cane or guide dog. Adopting assistive technology
helps make their lives more comfortable. There is some remarkable work in the field
of electronic mobility aids, discussed in the section below; a smart stick is one
such electronic travel aid, but existing designs have shortcomings that the updated
system proposed here seeks to overcome. In this system, navigation of the user is
possible, and the user's guardian receives alert emails and calls whenever the user
tumbles. The major problem with existing systems is that they are bulky and
complicated to use and understand; to avoid this, all the modules in this system are
implemented within the stick, which is also foldable, making it convenient in terms
of mobility.
2 Related Works
There is already much significant work in this domain: several researchers have
proposed good ideas and built projects, ameliorating traditional mobility aids by
adding various electronic sensors. Different ETA designs are discussed below along
with their functionality.
In one system, all the electronics are implemented in a jacket mounted with five
ultrasonic sensors: one detects potholes or stairs, another detects obstacles near
the head, and the remaining three cover the right, left, and front [3]. A salient
feature is that the microcontroller finds the minimum value from the latter three
sensors and notifies the user about obstacles through voice commands pre-installed
on a micro SD card [3]. The downside of this system is that people will not find it
comfortable to wear all the time.
Another device was developed by Mukesh Agarwal and Atma Ram Gupta. This version of
the stick includes ultrasonic and water sensors [4], which help in obstacle
detection and water detection, as their names suggest. The stick is integrated with
the SIM808 module, which supports the global system for mobile communication and
adds GPS technology for satellite navigation [4]. A SIM card enables communication
like a regular cell phone. The drawback of this gadget is that its design is highly
complex, and the modules do not fit inside the stick.
3 Proposed System
Ultrasonic sensor. The working principles of the ultrasonic sensor and a radar
system are similar. The basic difference between sonar and ultrasonic sensing is
that sonar is used underwater with both high and low frequencies, while ultrasonic
sensing is used on terrain surfaces and only uses high frequencies. Electrical
signals supplied to the ultrasonic sensor are converted to acoustic waves and vice
versa. Sound above about 18 kHz is considered ultrasonic; the HC-SR04 sensor
generates acoustic waves at 40 kHz [5]. It provides a range from 2 to 400 cm: a
10 us pulse sent to the trigger pin by the microcontroller makes the sensor emit a
burst of eight acoustic waves, and a timer starts at the same moment [5]. The timer
stops immediately upon receiving the reflected acoustic waves. The primary purpose
of this sensor in a blind stick is obstacle detection (potholes, staircases, and
more) by measuring the time difference between transmitting and receiving the
signal; the distance can then be calculated from the time of flight:
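The standard time-of-flight calculation can be sketched as follows; the echo travels to the obstacle and back, so the one-way distance is half the product of the elapsed time and the speed of sound (about 343 m/s in air at room temperature).

```python
# HC-SR04 style distance calculation. The round trip covers twice the
# obstacle distance, hence the division by two.

SPEED_OF_SOUND_CM_PER_US = 0.0343  # centimetres per microsecond (~343 m/s)

def distance_cm(echo_time_us: float) -> float:
    """Convert an echo round-trip time in microseconds to distance in cm."""
    return (echo_time_us * SPEED_OF_SOUND_CM_PER_US) / 2

print(distance_cm(1000))  # a 1 ms round trip corresponds to 17.15 cm
```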
them so that they can get help, and as the distance between the obstacle and the
user decreases, the intensity of the vibration motor increases.
GPS Neo-6M. GPS (global positioning system) works on the basic mathematical
principle of trilateration: the position is determined by calculating the distance
between the receiver and several satellites. The NEO-6 module series is popular
nowadays for its cost-effectiveness, on-board memory chip, miniature packaging, and
high performance; it also has a ceramic patch antenna and a backup battery [7]. The
series is based on the u-blox NEO-6M GPS engine and works well with a DC input in
the 3.3–5 V range [7]. UART and USB are among the well-known communication protocols
it supports. The GPS module sends raw data in the form of NMEA messages [7]. Using
at least three satellites, the receiver can determine the user's 2D location
(latitude and longitude) and track movement; with four or more satellites in view,
it can determine the user's 3D position (latitude, longitude, and altitude).
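As an illustration of the raw data mentioned above, the sketch below decodes latitude and longitude from an NMEA GGA sentence, the kind of message a NEO-6M streams over UART. Field positions follow the standard GGA layout; error handling and checksum validation are omitted, and the parser is a minimal assumption-laden sketch rather than production firmware.

```python
# Minimal NMEA GGA parser: field 2/3 hold latitude (ddmm.mmmm) and its
# hemisphere, field 4/5 hold longitude (dddmm.mmmm) and its hemisphere.

def parse_gga(sentence: str):
    fields = sentence.split(",")

    def to_degrees(value: str, hemisphere: str) -> float:
        raw = float(value)
        degrees = int(raw // 100)        # ddmm.mmmm -> dd
        minutes = raw - degrees * 100    # remaining arc-minutes
        decimal = degrees + minutes / 60
        return -decimal if hemisphere in ("S", "W") else decimal

    return to_degrees(fields[2], fields[3]), to_degrees(fields[4], fields[5])

lat, lon = parse_gga(
    "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47")
print(round(lat, 4), round(lon, 4))  # 48.1173 11.5167
```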
Node MCU. In this project, we have used a NodeMCU DevKit board with the ESP8266 as
the microcontroller. The ESP8266 is a Wi-Fi SoC with low power consumption that
operates between 3 and 3.6 V [8]. The NodeMCU module has a total of 30 pins, of
which 17 are GPIO pins covering all the peripheral duties: ADC channels, a UART
interface, PWM output, and SPI, I2C, and I2S interfaces [8]. It includes four power
pins: three 3.3 V pins and one Vin. This microcontroller connects the stick to the
Internet and makes it part of the IoT, so it can be accessed from anywhere in the
world. The board is shown in Fig. 2.
Because the stick is connected to the Internet, it offers features like emailing and
calling: a person with access to the account linked to the stick on the server can
see the user's location on Google Maps. All the sensors discussed above are
installed inside the stick for the user's comfort; the modules used in this device
are smaller than 4 cm. Since blind people conventionally use sticks with a rigid
design, we have given this stick a foldable mechanism. In addition, a panic button
is included for emergencies: it automatically calls the user's guardian or relative
and at the same time sends an email that includes the user's latitude and longitude.
The ultrasonic sensor is arranged on the outer surface of the cane so that it covers
the obstacles that come from its direction and alerts the user about them.
As the NodeMCU is the microcontroller of this system, it is connected to all the
modules: to the MPU6050 (via the I2C protocol) and to the GPS module. Two
general-purpose input/output (GPIO) pins of the NodeMCU are connected to the echo
and trigger pins of the ultrasonic sensor (HC-SR04), which lets the system calculate
the distance between an obstacle and the stick. Two other GPIO pins function as
outputs that warn the user about an obstacle in their way: the first is connected to
the speaker and the second to the vibration motor. One more pin acts as an input and
is connected to the panic button for emergencies. Most of the sensors are mounted
inside the stick, making it more convenient for the user to hold, as shown in
Fig. 4.
4 Working
Fig. 5 Flowchart
pitch, and yaw; by monitoring the roll parameter, the system analyzes whether the
user has fallen or not. If the user tumbles, the stick falls too, so to locate them,
data from the GPS module is fetched, and at the same time the IFTTT server triggers
links tied to the Webhooks service, which calls and sends emails to the user's
guardian or relatives. The call simply tells them to check the email, in which the
user's location is one click away: through the Thingspeak server, the guardian or
relative is shown the user's latitude and longitude pinned on Google Maps. The whole
process keeps checking all aspects of the system until it is switched off.
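The IFTTT trigger described above can be sketched as follows. The URL shape follows IFTTT's Maker Webhooks convention; the event name "stick_fell" and the key are placeholders, and the actual HTTP request is left out so the sketch stays offline.

```python
from urllib.parse import quote

# Sketch of building the IFTTT Webhooks trigger URL fired on a fall.
# "stick_fell" and the key are hypothetical placeholders.

def ifttt_trigger_url(event: str, key: str) -> str:
    """Return the Maker Webhooks URL that triggers the given event."""
    return f"https://maker.ifttt.com/trigger/{quote(event)}/with/key/{key}"

url = ifttt_trigger_url("stick_fell", "MY_KEY")
print(url)
# A real device would now request this URL, e.g. urllib.request.urlopen(url),
# causing IFTTT to place the call and send the location email.
```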
5 Results
The system was checked outdoors under normal conditions, and the results met
expectations. When the stick falls, the microcontroller triggers IoT platform
services such as IFTTT and Thingspeak. A call is received telling the guardian to
check their email, which contains a link as shown in Fig. 6; clicking the link opens
a web page with the pinned location on Google Maps, as shown in Fig. 7.
Two ultrasonic sensors are connected in this system; their readings are taken as
input, as shown in Table 1. The output of the vibration motor varies with the range
reported by the ultrasonic sensor: when the obstacle is between 40 and 60 in. away,
vibration occurs at an interval of 1000 ms; when it is in the range 20–40 in., the
interval becomes 100 ms; and when the object is closer than 20 in., the vibration is
constant and a pre-installed tune is played.
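The thresholds of Table 1 can be expressed as a simple lookup; the function name and the convention that 0 ms stands for constant vibration are assumptions for illustration.

```python
# Table 1 as code: map obstacle distance (inches) to vibration interval (ms).
# 0 means constant vibration plus the pre-installed tune; -1 means no alert.

def vibration_interval_ms(distance_in: float) -> int:
    if distance_in < 20:
        return 0        # constant vibration, tune plays
    if distance_in < 40:
        return 100      # fast pulsing
    if distance_in <= 60:
        return 1000     # slow pulsing
    return -1           # obstacle out of alert range

print(vibration_interval_ms(50), vibration_interval_ms(30),
      vibration_interval_ms(10))  # 1000 100 0
```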
Table 2 contains the time taken by the system to connect to Wi-Fi and the trigger
times from system to server and from server to the user's relatives when the user
tumbles. These figures depend mostly on the Internet speed of the user and of the
user's relatives; the readings were taken with a 25–30 Mbps user connection and a
35–40 Mbps connection on the relatives' side.
The model is a simple foldable blind stick with many features that is easy for users
to operate. The system is designed to replace the old, traditional blind stick that
visually impaired people have used for a long time. The main motive is to provide
visually impaired people with affordable assistive technology costing around
3000–4000. It does have limitations: the user must carry a smartphone with a 24/7
Internet connection. In future, artificial intelligence can be integrated so that
the user can operate the stick through voice commands and receive feedback as
voice.
References
Abstract The increasing demand for improving road traffic and driver safety has
drawn attention to the Intelligent Transportation System (ITS), later termed the
Vehicular Ad-hoc Network (VANET), whose main goal is to enhance roadway efficiency
and traffic safety. Many issues arise in VANET while implementing privacy and
security measures; since the network is vulnerable to security attacks, numerous
security requirements need to be fulfilled. In this survey we have emphasized
finding the limitations of the existing papers in the field. After covering the
fundamentals of VANET, we illustrate its communication methods, then discuss the
application areas and security services in the contiguous sections. Finally,
possible attacks on VANET are thoroughly discussed.
1 Introduction
A large number of vehicles can be seen running on the roads of a city. Road traffic
controllers manually direct vehicles to reduce congestion and prevent road
accidents, but without wireless communication technology this is a hectic job:
controllers may unknowingly direct traffic onto an already busy road, or be unaware
of emergency vehicles stuck in traffic beyond their eyesight. To solve all these
kinds of problems, the Intelligent Transportation System (ITS) was introduced, which
provides two types of communication: vehicle to vehicle (V2V) and vehicle to
infrastructure (V2I). Here the infrastructure involves two basic components
installed alongside the road, the Road Side Unit (RSU) and the Trusted Authority
(TA). Later, ITS was termed the Vehicular Ad-hoc Network (VANET), which uses the
functionalities of the Mobile Ad-hoc Network (MANET). The network architecture of
VANET consists of three major components, i.e. the On-Board Unit (OBU), the Roadside
Unit (RSU) and the Trusted Authority (TA), as shown in Fig. 1. In VANET, every
vehicle is assumed to be equipped with an OBU device that comprises different
components, e.g. a Global Positioning System (GPS) receiver, micro-sensors, etc. The
OBU takes advantage of the Dedicated Short Range Communication (DSRC) protocol,
based on IEEE 802.11p (5.9 GHz) radio technology, to communicate among vehicles. It
also uses a Tamper-Proof Device (TPD) to store the vehicle's secret information; the
TPD is assumed to be secure, since it is considered infeasible for a malicious node
to access the stored data. The OBU periodically warns the driver about
traffic-related information such as speed, location, direction and road conditions
to avoid traffic jams and road accidents. Further, this information is sent to the
RSU, which verifies all the received information and rebroadcasts it with warnings
to other vehicles. Moreover, the RSU is responsible for all the authentication work,
to lighten the burden on the TA, whereas the TA plays the major role of registering
all OBUs and RSUs. The TA has high computational and storage capabilities compared
to the other components and also maintains a database of the vehicles, so that a
malicious node can be removed from the network by tracing messages back to their
origin.
Every vehicle in a VANET broadcasts safety messages that may contain the vehicle's information (e.g., speed, position, etc.), which needs to be processed before transmission to other vehicles, because a malicious vehicle may deliberately send misleading messages that can disrupt the VANET. It is also required to secure the personal information of a vehicle (e.g., ID, car number, etc.) and prevent other nodes in the network from accessing it. This gives rise to the requirement for security services. Verifying every message sequentially at the RSU may not satisfy the timing requirement of the VANET. Suppose there are 200 vehicles and each sends a signed message every 300 ms; consequently, an RSU would have to verify approximately 650 messages per second, which is not feasible. Moreover, storing and managing public key certificates also imposes a communication overhead. To overcome this, ID-based group verification schemes were suggested, in which a batch of messages is verified at a time, significantly reducing the time overhead. However, such schemes also have some drawbacks.
The rest of the paper is structured as follows: Sect. 2 briefly provides the application areas of VANET. The required security services of VANET are described in Sect. 3. Possible attack types are mentioned in Sect. 4. Section 5 presents a detailed discussion of existing papers. Finally, Sect. 6 presents the concluding remarks of the paper.
2 Application Areas
The interaction of the OBU with the RSU has helped solve the traffic congestion problem. VANET has a number of real-life applications, some of which are as follows.
One class of applications observes the traffic pattern and manages it accordingly. These applications enhance the delivery of traffic information by improving the efficiency and accuracy of traffic detection.
586 A. Islam et al.
Another class of applications directly relates to the comfort of passengers and drivers. These applications keep them updated about their vicinity, such as the locations of the nearest fuel station, ATMs, food courts and restaurants along with their price lists. Through the interface with the RSUs, they also provide entertainment-related applications such as online games, the nearest cinema hall's location, etc.
3 Security Services
Security is a basic requirement in VANET that enhances the network by protecting its users, data and services. To make a vehicular network trustworthy and efficient, its security services should be effective. Some basic security services are as follows.
3.1 Availability
In VANET, availability ensures that all the required resources and services are available to all legitimate vehicles during wireless communication. Since all other services depend on the availability of resources, it is one of the most crucial security services.
3.2 Integrity
Integrity ensures that the data vehicles share during communication is not altered or modified in transit. It is an important security service because, in its absence, an attacker can modify the data, which may cause traffic congestion or, in some cases, accidents.
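One common way to provide this integrity guarantee is to attach a message authentication code to every safety message. The following is a minimal stdlib sketch, assuming a symmetric key already shared between sender and verifier (key distribution is out of scope here, and the function names are ours, not from any cited scheme):

```python
import hmac
import hashlib

def tag_message(key: bytes, payload: bytes) -> bytes:
    """Attach an HMAC-SHA256 tag so the receiver can detect tampering."""
    return hmac.new(key, payload, hashlib.sha256).digest()

def verify_message(key: bytes, payload: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(tag_message(key, payload), tag)
```

Any modification of the payload in transit changes the recomputed tag, so the receiver rejects the message.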
3.3 Authentication
The authentication service makes sure that the vehicle sending a safety message to the RSU is an authorized user. In addition, the receiver can be assured of the legitimacy of the sender via a pseudonym. Thus, only legitimate vehicles are allowed to communicate in the VANET.
A Comprehensive Survey on Attacks and Security … 587
3.4 Confidentiality
Confidentiality assures the vehicles or users that their messages will not be read by any illegitimate user in the network. It is achieved by encrypting the transmitted message.
3.5 Non-repudiation
With this service, a vehicle that has sent a message cannot deny that fact. It thus works as proof of the sender for the receiver of the message in the VANET and can also help in tracing an unauthorized vehicle.
4 Possible Attack Types
So far, we have discussed the vehicular ad hoc network, and we know that VANET is vulnerable to attacks. In VANET, an attack can be defined as stealing or manipulating a vehicle's information and using it for malicious purposes. In this section, we enunciate the major possible attack types in VANET.
Spams are unsolicited messages (e.g., advertisements). They are of no use to the vehicle's driver or the traveller; they only consume bandwidth, which may cause high latency. Due to the lack of a central administration, they are difficult to control.
This attack refers to modifying or manipulating the information. Hackers may add new messages or hide precautions, which may result in road jams and, in some cases, accidents.
If an attacker jams the communication medium and restricts legitimate users from accessing the network resources, this comes under a Denial of Service (DoS) attack. This attack is performed by flooding the RSUs with requests.
Malware is malicious software carried mostly by insiders. It is intended to steal relevant information. Malware (e.g., a worm or virus) could be installed in vehicles during the installation of an update.
A replay attack is one in which a fraudulent vehicle captures another vehicle's safety message and replays the manipulated message for its own benefit, which may cause traffic congestion. This attack may incur significant failures in the network.
In a Sybil attack, multiple fake vehicle identities are created to send fake safety messages, which may force other vehicles to change their route and result in traffic jams.
An impersonation attack involves forging an identity for unauthorized access to the VANET. It is intended to gain the personal information of some authorized vehicle, and the attacker may send wrong messages in the network.
Global positioning is used to locate a vehicle in real time. A GPS spoofing attack involves providing a wrong location to the other vehicles.
A message tampering attack involves altering useful information. In this attack, a malicious vehicle may discard, alter or drop the information shared by an authorized vehicle in the VANET.
An attacker performs a tunneling attack with the intention of analysing the traffic by linking two parts of the vehicular network with the help of a tunnel, where the tunnel refers to a communication medium.
Non-repudiation provides assurance about the sender of a message, meaning the sender cannot deny it [12]. Every user should be identified uniquely. An attack on non-repudiation happens when two or more users share the same key for communication; in such cases, tracing the unauthorized user is difficult.
5 Discussion on Existing Protocols
Road traffic is increasing day by day, and unresponsive behaviour of drivers may cause traffic jams and sometimes even accidents. To overcome this, many researchers have proposed different security protocols, but these protocols have various vulnerabilities. In this section, we classify those protocols into three categories, i.e., Public Key Infrastructure (PKI) based schemes, Elliptic Curve Cryptography (ECC) based schemes and Identity-Based Signature (IBS) schemes. We then compare those protocols and analyse them briefly. This classification is shown in Table 1.
Elliptic curve cryptography (ECC) was first proposed by Neal Koblitz and Victor S. Miller in 1985. It provides a high level of security in VANET at low cost. Considering points on an elliptic curve, it generates the public and private keys. ECC uses smaller keys than RSA (Rivest-Shamir-Adleman) and DSA (Digital Signature Algorithm), due to which it takes less computational power. In addition, it requires less space and less bandwidth, and it takes less time to generate keys and to encrypt or decrypt data.
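The key-generation step above can be sketched in a few lines. This is a toy illustration over the well-known textbook curve y^2 = x^3 + 2x + 2 over F_17 with generator (5, 1) of order 19; it shows the mechanics (point addition, double-and-add scalar multiplication, private key d, public key Q = dG) but is in no way secure. Real VANET schemes use standardized curves with roughly 256-bit primes.

```python
import secrets

# Toy curve y^2 = x^3 + 2x + 2 over F_17 (textbook example, NOT secure).
P_MOD, A, B = 17, 2, 2
G = (5, 1)   # generator point on the curve
N = 19       # order of G

def point_add(p, q):
    """Add two curve points; None represents the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p + (-p) = infinity
    if p == q:
        s = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:
        s = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    return (x3, (s * (x1 - x3) - y1) % P_MOD)

def scalar_mult(k, p):
    """Double-and-add computation of k*p."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, p)
        p = point_add(p, p)
        k >>= 1
    return result

def keygen():
    """Private key d in [1, N-1]; public key Q = d*G."""
    d = secrets.randbelow(N - 1) + 1
    return d, scalar_mult(d, G)
```

Because d is a small integer while Q is a curve point, recovering d from Q requires solving the elliptic-curve discrete logarithm problem, which is what gives ECC its short keys.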
In 2019, Cui et al. [3] analysed group-based and pseudonym-based protocols and found a weakness. The authors showed that these schemes lack many functionalities; for instance, they need to manage a certificate revocation list (CRL) and distribute certificates to vehicles. With such schemes, the vehicles need to store the certificates and key pairs, which is very bulky, and managing certificate revocation lists requires large computational and storage capabilities. For this reason, many schemes rely on a TA, but this is very difficult to implement in the real world. The authors therefore suggested a new semi-trusted ECC-based authentication scheme in which the receiver does not need to worry about the CRL and the vehicle does not need to store it either.
In the same year, Ming et al. [2] suggested an ECC-based scheme for V2I communication. According to the scheme, the RSU can handle a vast number of messages in very little time. The suggested scheme also fulfils all the security requirements and is provably secure in the random oracle model. This scheme uses neither Bilinear Pairing (BP) nor the map-to-point operation; thus, it reduces the computation delay of signing and verifying messages, and the scheme is appropriate for real-life applications.
As noted, ECC protocols have the benefit of lower computational cost compared to other encryption schemes. After analysing the protocols, we found that for batch verification the Ming et al. [2] protocol's computation cost is higher than that of the Cui et al. [3] protocol: the Ming et al. [2] protocol takes (2n + 2) scalar multiplication operations in ECC, whereas the Cui et al. [3] protocol takes (n + 2) scalar multiplication operations, n small-scale multiplication operations and 2n point addition operations in ECC, along with 2n one-way hash function operations. Besides, in both protocols, every user authenticated once is assumed not to be maliciously affected in the near future.
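The operation counts above can be turned into a simple cost model. The per-operation timings below are illustrative placeholders we chose for the sketch (the real figures depend on hardware and curve parameters and are not taken from either paper); what matters is that full scalar multiplications dominate, so the scheme with (2n + 2) of them grows faster with batch size n:

```python
# Illustrative per-operation costs in milliseconds (assumed values).
T_SM = 0.442    # full ECC scalar multiplication
T_SSM = 0.018   # small-scale scalar multiplication
T_ADD = 0.004   # point addition
T_HASH = 0.001  # one-way hash

def batch_cost_ming(n):
    """Ming et al. [2]: (2n + 2) scalar multiplications."""
    return (2 * n + 2) * T_SM

def batch_cost_cui(n):
    """Cui et al. [3]: (n + 2) scalar multiplications, n small-scale
    multiplications, 2n point additions and 2n hash operations."""
    return (n + 2) * T_SM + n * T_SSM + 2 * n * T_ADD + 2 * n * T_HASH
```

With these placeholder timings, a batch of n = 100 messages costs about 89 ms under the first model and about 48 ms under the second, and the gap widens linearly in n.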
In identity-based signature (IBS) schemes, private keys are generated with the help of a Private Key Generator (PKG). In IBS, private keys are used to sign the safety messages. The scheme can be defined in four phases:
• Setup phase: In the first phase of an ID-based signature scheme, the PKG generates the system parameters and the master key; the system parameters are distributed across the vehicles.
• Key extraction phase: In this phase, a private key is generated from the vehicle's unique ID and the master key for communication purposes.
• Signing phase: In this phase, the message is signed using a timestamp and the previously derived private key.
• Verification phase: Finally, the signed message is verified using a verification algorithm.
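The four phases above can be sketched structurally as follows. This is only a data-flow illustration using HMAC as a stand-in: a real IBS scheme uses bilinear pairings so that anyone can verify a signature from the vehicle's ID and the public parameters alone, whereas in this toy version only a party holding the master key can re-derive keys. All function names are ours.

```python
import hmac
import hashlib
import os

def setup():
    """Setup phase: the PKG generates the master secret key."""
    return os.urandom(32)

def extract(master_key: bytes, vehicle_id: str) -> bytes:
    """Key extraction: derive the vehicle's private key from its unique ID."""
    return hmac.new(master_key, vehicle_id.encode(), hashlib.sha256).digest()

def sign(private_key: bytes, message: bytes, timestamp: float) -> bytes:
    """Signing phase: bind the message and timestamp with the private key."""
    payload = message + str(timestamp).encode()
    return hmac.new(private_key, payload, hashlib.sha256).digest()

def verify(master_key: bytes, vehicle_id: str, message: bytes,
           timestamp: float, signature: bytes) -> bool:
    """Verification phase: re-derive the key and recompute the tag.
    (A real IBS scheme verifies publicly, without the master key.)"""
    expected = sign(extract(master_key, vehicle_id), message, timestamp)
    return hmac.compare_digest(expected, signature)
```

Note how no certificate is exchanged: the vehicle's identity string itself plays the role of the public key, which is exactly the property the ID-based protocols below exploit to remove certificate management.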
In 2008, Zhang et al. [10] addressed an issue in OBU-to-RSU communication: when the RSU receives a large number of signatures, it cannot, due to storage constraints, verify them within the 300 ms interval, so there is a delay in verifying all the signatures. The authors proposed an ID-based protocol to overcome this issue. Being identity-based, the protocol needs no certificates, which reduces the transmission overhead. In this protocol, the authors used a batch verification technique based on bilinear pairing to overcome the delay in verifying a huge number of signatures.
In 2010, Chim et al. [7] raised issues with the Zhang et al. [10] protocol, stating that it is vulnerable to impersonation attacks and depends heavily on the TPD: if the TPD is compromised, the whole network suffers. To overcome this, they suggested the first software-based group communication protocol using a Bloom Filter (BF) and Binary Search Techniques (BST). In this scheme, there is no need for an RSU to share information within a batch. It also takes advantage of BP and reduces the number of operations to improve efficiency.
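The Bloom filter used in such schemes is a compact probabilistic set: membership tests may return false positives but never false negatives, which lets an RSU advertise a large set of valid signatures (or revoked pseudonyms) in a few kilobytes. A minimal sketch, not tied to the Chim et al. construction:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions per item over an m-bit array.
    Membership tests may yield false positives but never false negatives."""

    def __init__(self, m_bits: int = 10_000, k_hashes: int = 4):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits)

    def _positions(self, item: str):
        # Derive k independent positions by salting the hash with an index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))
```

With 100 items in a 10,000-bit filter and 4 hashes, the false-positive probability is on the order of 10^-6, so a vehicle can check membership locally without the RSU transmitting the full list.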
In 2011, Horng et al. [5] addressed an issue in the earlier Chim et al. [7] protocol and found that it is still vulnerable to an impersonation attack, so they suggested a new authentication scheme to overcome it. Being a software-based protocol, it does not rely on hardware. In this protocol, a vehicle can generate a pseudo-identity to transfer a message to another vehicle so that the real identities of vehicles are not revealed. Only the TA can disclose the real identity of a vehicle whenever required.
In 2018, Li et al. [4] suggested an ID-based message authentication scheme that takes advantage of ID-based signatures and ring signatures along with BP. After analysing its security, they showed that this protocol can defend against key exposure attacks and forgery attacks.
In April 2020, Ali et al. [1] suggested an identity-based conditional privacy-preserving authentication (ID-CPPA) scheme for V2I communication that relies on BP. It uses a one-way hash function, due to which the processing of messages at the RSU can be done efficiently. It also allows batch verification and ensures unforgeability, based on the Inverse Computational Diffie-Hellman problem, in the random oracle model.
The above protocols use the Bilinear Pairing (BP) approach, which requires heavy operations and therefore incurs high computational cost. VANET needs to be more secure and capable of restraining attackers from accessing the network. For that reason, we analysed the aforementioned papers and found that the Zhang et al. [10] and Chim et al. [7] protocols are still unable to overcome impersonation attacks. Moreover, the Zhang et al. [10] and Horng et al. [5] protocols still need to provide security against traceability attacks. We noticed that the Li et al. [4] protocol uses bilinear pairing, which increases the computational delay; along with this, it makes use of ring signatures and ID-based signatures, all of which are heavy operations that make message signing and verification inefficient. Finally, we observed that the Ali et al. [1] protocol still faces high communication overhead due to the PKG.
6 Conclusion
The Vehicular Ad hoc Network fulfils the emerging requirements of vehicles for building the Intelligent Transportation System, and in the past few years researchers have concentrated on improving the security and privacy of VANET. The rationale behind VANET is to implement it in the real world and to provide a better traffic system. In this paper, we have discussed the security services, possible attack types and communication methods in VANET. Finally, we have illustrated the benefits and drawbacks of the existing papers. It is expected that this paper will give a clear overview of already suggested protocols for VANET and will open a door for researchers to extend security in VANET.
References
1. Ali, I., Li, F.: An efficient conditional privacy-preserving authentication scheme for vehicle-
to-infrastructure communication in VANETs. Veh. Commun. 22, 100228 (2020). https://www.
sciencedirect.com/science/article/abs/pii/S221420961930275X
2. Ming, Y., Cheng, H.: Efficient certificateless conditional privacy-preserving authentication
scheme in VANETs. In: Mobile Information Systems 2019 (2019). https://www.hindawi.com/
journals/misy/2019/7593138/
3. Cui, J., Wu, D., Zhang, J., Xu, Y., Zhong, H.: An efficient authentication scheme based on semi-
trusted authority in VANETs. IEEE Trans. Veh. Technol. 68(3), 2972–2986 (2019). https://
ieeexplore.ieee.org/document/8629275
4. Li, J., Liu, Y., Zhang, Z., Li, B., Liu, H., Cheng, J.: Efficient ID-based message authentication
with enhanced privacy in wireless ad-hoc networks. In: 2018 International Conference on
Computing, Networking and Communications (ICNC), Maui, HI, pp. 322–326 (2018). https://
ieeexplore.ieee.org/document/8390287
5. Horng, S., et al.: b-SPECS+: batch verification for secure pseudonymous authentication in
VANET. IEEE Trans. Inf. Forensics Secur. 8(11), 1860–1875 (2013). https://ieeexplore.ieee.
org/document/6576161
6. Wasef, A., Shen, X.: EMAP: expedite message authentication protocol for vehicular ad hoc net-
works. IEEE Trans. Mob. Comput. 12(1), 78–89 (2013). https://ieeexplore.ieee.org/document/
6081877
7. Chim, T.W., et al.: SPECS: secure and privacy enhancing communications schemes for
VANETs. Ad Hoc Netw. 9(2), 189–203 (2011). https://www.sciencedirect.com/science/article/
abs/pii/S1570870510000648
8. Lu, R., Lin, X., Zhu, H., Ho, P., Shen, X.: ECPP: efficient conditional privacy preservation pro-
tocol for secure vehicular communications. In: IEEE INFOCOM 2008—The 27th Conference
on Computer Communications, Phoenix, AZ, pp. 1229–1237 (2008). https://ieeexplore.ieee.
org/document/4509774
9. Zhang, C., Lin, X., Lu, R., Ho, P.: RAISE: an efficient RSU-aided message authentica-
tion scheme in vehicular communication networks. In: 2008 IEEE International Conference
on Communications, Beijing, pp. 1451–1457 (2008). https://ieeexplore.ieee.org/document/
4533317
10. Zhang, C., et al.: An efficient identity-based batch verification scheme for vehicular
sensor networks. In: IEEE INFOCOM 2008-The 27th Conference on Computer Com-
munications. IEEE (2008). https://www.researchgate.net/publication/4334277_An_Efficient_
Identity-Based_Batch_Verification_Scheme_for_Vehicular_Sensor_Networks
11. Raya, M., Hubaux, J.-P.: Securing vehicular ad hoc networks. J. Comput. Secur. 15(1),
39–68 (2007). https://www.researchgate.net/publication/37439204_Securing_Vehicular_Ad_
Hoc_Networks
12. Khan, S., Pathan, A.-S.K.: Wireless Networks and Security, vol. 10. Springer (2013). https://link.springer.com/book/10.1007%2F978-3-642-36169-2
Analysis, Visualization and Prediction
of COVID-19 Pandemic Spread Using
Machine Learning
Abstract Over the years, human beings have faced several health issues related to the spread of viruses. After the Spanish flu, Nipah, and Ebola, COVID-19 has now thrown a serious threat to society all over the world. As the infection rate is increasing exponentially, prevention, proper measurement and strategic action are the need of the hour to combat this pandemic. This paper focuses on analyzing a COVID-19 dataset using numerous machine learning (ML) algorithms, visualizing the results and evaluating the performance of the best algorithm. The virus outbreak has caused thousands of deaths across the world and is considered a pandemic according to WHO reports. There are a number of methods for reducing the risk of infection, such as predicting the risk of infection, screening patients, using chatbots to analyze the risk of infection, identifying and speeding up drug development, etc. In this paper, we mainly experimented with KNN, ANN, SVM, linear regression (LR) and polynomial regression (PR) methods to learn about and analyze the pandemic spread. To achieve this, we considered the COVID-19 dataset of Karnataka state. Mostly, district-wise confirmed, active and death cases have been considered for this work. In addition, we have also analyzed gender-wise infection spread and presented a cumulative dashboard for the overall district-wise active, confirmed and recovered cases of Karnataka.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 597
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_63
598 S. Sen et al.
1 Introduction
2 Literature Survey
In the last few years, AI, machine learning and deep learning have been setting notable footprints in data analysis across every sector. Due to COVID-19, researchers all over the world are seeking help from data scientists to analyze and predict the spread so that the situation can be handled in a better and more organized way. Benvenuto et al. [2] discussed the ARIMA (autoregressive integrated moving average) model, which is useful for predicting COVID-19 spread and forecasting disease prevalence. Narinder et al. [3] evaluated and compared the performance of support vector machines, polynomial regression, deep neural networks and recurrent neural networks using long short-term memory (LSTM) on COVID-19 data from Johns Hopkins University, and finally reported that PR offers the best prediction results, with a low root mean square error (RMSE) compared to the other approaches for confirmed, death and recovered cases. The time series method described by Deb et al. [4] helps in estimating the reproduction rate of COVID-19. The authors also concentrated on the usage of various data analysis and statistical tools to find patterns of the virus outbreak so that, based on those patterns, early precautions can be taken. While working in the same research orientation,
Analysis, Visualization and Prediction of COVID-19 Pandemic … 599
Mainly, we used data from Kaggle and Johns Hopkins University. The first reported confirmed case in Karnataka was on March 9, 2020, so our dataset contains data from that day until June 5, 2020, for analysis.
Data analysis was done using Python in a Jupyter notebook, with the Matplotlib and Seaborn libraries for visualization. In Fig. 1, we report district-wise active versus confirmed versus recovered cases captured from March 2020 until June 8, 2020.
We built a dashboard using Microsoft Excel for the number of COVID-19 cases in Karnataka as on May 21, 2020. Techniques such as conditional formatting, pivot tables and a few basic formulas were used in Excel to obtain the desired dashboard. A provision for users to compare districts has also been incorporated. Users can also visualize the districts that are above a certain limit; here, the input for this type of formatting must be given by the user (Fig. 2).
The day-wise rise in confirmed cases has been plotted here, keeping the daily increase as the target variable. The dataset for Karnataka has been used until June 5, 2020. For SVR, an RBF kernel is used; KNN is used with k = 3; and an ANN with a single hidden layer with the ReLU activation function and a linear activation function in the last layer is considered, using Keras. The network was trained for 10 epochs with a batch size of 10; the reported MAE was 797.3837.
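As a concrete illustration of the KNN setting above (k = 3), the prediction step reduces to averaging the targets of the three nearest training points. A pure-Python sketch on synthetic day/case pairs (the function name and data are illustrative, not from the paper's experiments):

```python
def knn_predict(train_x, train_y, query, k=3):
    """Predict by averaging the targets of the k nearest training points
    under 1-D Euclidean distance, as in a KNN regressor with k = 3."""
    ranked = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - query))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / k

# Illustrative day-index vs. daily-cases data.
days = [1, 2, 3, 4, 5]
cases = [10, 20, 30, 40, 50]
```

One design consequence visible even in this sketch: for a query beyond the last training day, the nearest neighbours are all at the end of the series, so KNN extrapolates flatly, which is one reason it struggles against PR on a rising epidemic curve.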
Table 2 Forecasting confirmed cases

Date      Predicted value (confirmed case)   Actual value (confirmed case)
1/6/20    3130                               3221
2/6/20    3337                               3408
3/6/20    3556                               3796
4/6/20    3788                               4063
5/6/20    4033                               4320
MAE and RMSE are the evaluation metrics for the regression models. Linear regression does not fit the COVID-19 data well, whereas polynomial regression with degree 5 works best (Fig. 3).
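A polynomial fit of this kind can be obtained from the least-squares normal equations. The sketch below is stdlib-only and not the authors' implementation (which presumably used NumPy or scikit-learn); it is numerically fine for the small degrees used here, and the intuition it captures is that higher-degree terms let the curve bend upward with an accelerating case count in a way a straight line cannot:

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations,
    solved with Gaussian elimination and partial pivoting."""
    n = degree + 1
    # Build A^T A and A^T y for the Vandermonde matrix A.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for row in range(col + 1, n):
            factor = ata[row][col] / ata[col][col]
            for j in range(col, n):
                ata[row][j] -= factor * ata[col][j]
            aty[row] -= factor * aty[col]
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = aty[row] - sum(ata[row][j] * coeffs[j] for j in range(row + 1, n))
        coeffs[row] = s / ata[row][row]
    return coeffs  # coeffs[i] multiplies x**i

def predict(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))
```

Fitting synthetic data drawn from a known quadratic recovers its coefficients, and the fitted polynomial then extrapolates to unseen day indices.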
From Table 1, it is visible that PR works best among all the algorithms, with the least RMSE. So, we used PR for forecasting the day-wise confirmed cases depicted in Table 2.
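Both metrics can be computed directly from the predicted and actual values in Table 2 (the function names are ours):

```python
import math

def mae(pred, actual):
    """Mean absolute error."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)

def rmse(pred, actual):
    """Root mean squared error: penalizes large misses more than MAE."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred))

# Predicted vs. actual confirmed cases from Table 2 (June 1-5, 2020).
predicted = [3130, 3337, 3556, 3788, 4033]
actual = [3221, 3408, 3796, 4063, 4320]
```

For these five forecast days the PR forecast's MAE works out to 192.8 confirmed cases and its RMSE to about 213.97; the RMSE exceeding the MAE reflects the larger misses on the last three days.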
In Fig. 4, we show an interactive pie chart and a KDE plot to find the percentage of confirmed, recovered, active and deceased cases in each district of Karnataka. The brighter region consists of safer districts, and the darker region consists of districts that are more prone to COVID-19. The date-wise numbers of male and female confirmed cases, and the categorization of districts based on infection spread, are analyzed using KNN. 17 districts are in the critical zone, where the critical zone is calculated from the percentage of recovered victims with respect to the total affected victims.
5 Conclusion
To save the world from the jaws of this pandemic, more collaboration between the medical fraternity and data scientists should be promoted. Through this paper, we have tried to highlight the impact and potential of machine learning tools to fight this disease quickly. Collecting more datasets and exploring other ML algorithms can be part of further research for even better prediction. Considering the Indian population, the early lockdown helped to reduce the number of infected cases and the death rate too. Still, there is a long way to go: maintaining social distance, avoiding crowded places and using sanitizer and masks. So, stay healthy, stay safe.
Analysis, Visualization and Prediction of COVID-19 Pandemic … 603
References
1. WHO corona viruses (COVID-19). Retrieved June 10, 2020 from https://www.who.int/emergencies/diseases/novel-coronavirus-2019
2. Benvenuto, D., Giovanetti, M., Vassallo, L., Angeletti, S., Ciccozzi, M.: Application of the
ARIMA model on the COVID-2019 epidemic dataset. Data Brief 105340 (2020)
3. Punn, N.S., Sonbhadra, S.K., Agarwal, S.: COVID-19 epidemic analysis using machine
learning and deep learning algorithms. Preprint https://doi.org/10.1101/2020.04.08.20057679
(2020)
4. Deb, S., Majumdar, M.: A time series method to analyze incidence pattern and estimate
reproduction number of COVID-19. arXiv preprint arXiv:2003.10655 (2020)
5. Kucharski, A.J., Russell, T.W., Diamond, C., Liu, Y., Edmunds, J., Funk, S., Eggo, R.M., et al.:
Early dynamics of transmission and control of COVID-19: a mathematical modelling study.
Lancet Infect. Dis. (2020)
6. Lauer, S.A., Grantz, K.H., Bi, Q., Jones, F.K., Zheng, Q., Meredith, H.R., Azman, A.S., Reich,
N.G., Lessler, J.: The incubation period of coronavirus disease 2019 (COVID-19) from publicly
reported confirmed cases: estimation and application. Ann. Intern. Med. (2020)
7. Narin, A., Kaya, C., Pamuk, Z.: Automatic detection of coronavirus disease (COVID-19) using
x-ray images and deep convolutional neural networks. https://arxiv.org/ftp/arxiv/papers/2003/
2003.10849.pdf
8. Kent, J.: Data scientists use machine learning to discover COVID-19 treatments. https://healthitanalytics.com/news/data-scientists-use-machine-learning-to-discover-covid-19-treatments (as on June 10, 2020)
9. Gallagher, M.B.: Model quantifies the impact of quarantine measures on COVID-19's spread. http://news.mit.edu/2020/new-model-quantifies-impact-quarantine-measures-covid-19-spread-0416 (as on June 10, 2020)
10. Nguyen, T.T.: Artificial intelligence in the battle against coronavirus (COVID-19): a survey
and future research directions. Preprint https://doi.org/10.13140/rg.2.2.36491.23846 (2020)
Study of Behavioral Changes
and Depression Control Mechanism
Using IoT and VR
Abstract The Internet of Things (IoT) has cemented its place as one of the critical technologies for providing solutions to present-day issues. Though there is massive advancement in technology, a person still dies every 40 s due to depression or mental health-related issues. Mental health disorder, deterioration, or depression is a key issue that needs to be addressed in the healthcare domain. This paper is a study of analyzing behavioral changes and how IoT and virtual reality (VR) can be used to identify mental disorders or depression-related issues and to help users avoid reaching the critical stages of depression.
1 Introduction
In this millennium generation, we have seen many technologies being used to address issues in various domains such as health care, transport, and public or civil services. Health care has progressed with many innovations, including tracking health and fitness and providing interventions to maintain one's health condition. However, as per a report by the World Health Organization (WHO), one person dies every 40 s [1]. Though there is vast research on various diseases and health conditions, there is limited research focused on diagnosing and healing depression [2]. As per the WHO's estimation, depression will be one of the top three pandemic diseases by the year 2030 [1]. Hence, there is a tremendous necessity to address mental health-related issues, which cause depression and emotional imbalance.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 605
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_64
606 P. K. Katkuri and A. Mantri
Depression and anxiety patients need to be treated with the utmost care and support to cure them and bring them back to normal, but most of the time this end result is not achieved. Depression leads to low mood and a lack of interest in activities, which can be identified by changes in behavior or actions. If these patients are not identified at an early stage, their disorder could blow out of proportion and could lead them to suicidal thoughts.
Most patients fail to recognize that they are suffering from a disorder tendency. If identified on time, some could be cured depending on the stage of their depression; some might end up committing suicide, and some could undergo treatment for a lifetime at regular intervals. So, it is essential for the patient to be diagnosed at an early stage and treated in time, thus avoiding the critical stages of depression [3]. As tracking human emotions and health is a key element in curing depression patients, IoT devices are deployed for tracking and monitoring emotional health. For this, a suitable mechanism should be able to analyze the data generated by IoT devices and VR so as to identify the level of depression and then generate recommended actions to help the patient recover.
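A minimal sketch of such an analysis step is a rule that maps an aggregated screening score to one of the three stages (mild, moderate, severe) and to an escalation action. All thresholds, score ranges and actions below are illustrative placeholders, not clinical guidance and not from the paper:

```python
def classify_stage(score: float) -> str:
    """Map an aggregated screening score to a depression stage.
    Thresholds are illustrative placeholders, not clinical cutoffs."""
    if score < 10:
        return "mild"
    if score < 20:
        return "moderate"
    return "severe"

def escalation(stage: str) -> str:
    """Illustrative escalation policy: family is kept informed in the
    mild stage, while the severe stage is reported to the doctor."""
    return {
        "mild": "inform family or frequent contacts",
        "moderate": "inform family and schedule a follow-up",
        "severe": "share the report with the doctor",
    }[stage]
```

In a real system the score would be aggregated from the IoT sensor streams and VR session data, and the thresholds would have to come from validated clinical instruments.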
Depression and mental disorders can be classified into three stages: mild, moderate and severe. To cure depression, we have adequate psychiatrists, hospitals, medical treatments, etc., but depression is still considered a critical problem because it is not identified or medicated in the early stages. If patients are identified at an early stage, given the utmost care and shown empathy, most cases could be cured. However, most patients cannot even know on their own that they are suffering from a mental disorder. Patients with strong psychological and physical health can recover even from the severe stage, but the percentage of such cases is very low. Once the stage of the mental disorder is identified based on the score, the IoT system can be used to (1) provide interventions between doctors and patients, (2) enable individuals to engage in their health care actively, (3) support
Fig. 2 Recommendations based on depression disorder
Study of Behavioral Changes and Depression Control Mechanism … 609
gardening, etc. These activities can keep the patient engaged and active, both physically and mentally. Recommendations should vary from person to person and be categorized based on the age and health condition of the user.
The IoT system should generate recommendations by taking the following into consideration:
(i) Gender of the user.
(ii) Age group, varying from children to old age.
(iii) Emotions or symptoms measured.
(iv) The context in which the abnormal behavior is recorded.
(v) The location of the user (Fig. 3).
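The factors listed above can be sketched as inputs to a rule-based recommendation generator. Every rule, category and activity name below is an illustrative placeholder (the gender rule is omitted for brevity), intended only to show how the factors combine:

```python
def recommend(gender: str, age_group: str, emotion: str,
              context: str, location: str) -> list:
    """Rule-based recommendation sketch over the five listed factors.
    All rules and activity names are illustrative placeholders."""
    suggestions = []
    if emotion in ("sad", "anxious"):
        suggestions.append("breathing exercise")
    # Age-appropriate physical activity.
    if age_group == "child":
        suggestions.append("outdoor play")
    elif age_group == "senior":
        suggestions.append("gardening")
    else:
        suggestions.append("brisk walk")
    # Context and location adjust the social and environmental suggestions.
    if context == "alone":
        suggestions.append("call a frequent contact")
    if location == "home":
        suggestions.append("guided VR relaxation session")
    return suggestions
```

In a deployed system these rules would be personalized over time; the point of the sketch is only that the same measured emotion yields different actions for different age groups, contexts and locations.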
Based on the above factors, the recommendation is to be generated by the system. The sensors used for measurement play a vital role in generating the data. These recommendations need to change based on the state of the mental disorder. Simply having an automated mechanism will not cure patients; it can only help identify the problem and suggest recommendations. It is always necessary and beneficial to have people who can take care of, and show love and empathy towards, the patient. If the patient is suffering from mild depression, this is the stage where the state and progress of the patient are regularly reported to family, friends, or whomever the patient maintains frequent contact with. This contact data can be taken from the patient's phone and analyzed to measure the emotional connection of the patient with those contacts. When the patient is about to reach the severe stage, the report has to be shared with the doctor. These recommendations, the continuous monitoring of the patient and the collected data will greatly help in identifying depression or mental disorder. Data thus has a significant impact on analyzing depression or mental disorder.
A vulnerable segment of those with a mental illness suffers from what is termed severe
mental illness (SMI), a condition associated with psychosis and other extreme states.
These patients may resist treatment, particularly medication; this is where sensors and
IoT systems will be beneficial, and it is one reason for the growing penetration of
wearable devices for health monitoring. The sensors in these devices can help doctors
and family members ensure that patients take their medication regularly as prescribed.
Various devices like Mimo, Sproutling, Withings Home, EmoSpark, Apple Watch,
Jawbone, Fitbit, etc., have proved that people are not only attracted to wearable
devices but also inclined to improve a few parameters of fitness. Some of the
advanced devices can monitor blood pressure, sugar levels, heart rate, sleep, oxygen
level, water consumption, etc. Apple's new cognitive kit can also help patients
share their live moods with their doctors. The best thing about the cognitive kit is that
it includes games which record the reactions of the patients and therefore act as a
psychometric test. This shows that IoT can be a better solution for diagnosing and
monitoring patients.
4 Diagnostic Algorithms
Is it vital to diagnose patients faster? Does this have a greater impact on the
rate of recovery? The latest research suggests that IoT systems combined with VR
technologies help in diagnosing patients faster compared to traditional methodology,
and combining these with AI algorithms further improves and speeds up the diagnosis
and the subsequent treatment. IoT combined with VR/AI algorithms can detect signs of
clinical depression three months earlier than a medical provider's diagnosis. With the
help of these advanced systems, changes can be monitored and analyzed accurately.
During a recent test, the patients underwent an examination of their mood and
thought patterns. This information was needed to guide the patients through cognitive
behavioral therapy (CBT) skills, and the treatment showed the AI-based approach to be
better than other mechanisms. As mental health is difficult to measure, we need better
wearable tools to measure our vital statistics. IoT-driven smart concepts supported by
algorithms are a win-win for both the rural masses, who get cured, and their doctors,
who can quickly predict illness before it develops. These applications and systems
have high potential to save countless lives. AI and machine learning, for instance,
could learn from the symptoms, treatments and outcomes for a specific condition and
provide insights to the physician which would be difficult for an individual to
predict (Fig. 4).
Some of the major changes in behavior that may lead to depression or mental
disorder are phobia and panic attacks. VR technologies provide a great means of
support in assessing and treating clinical disorder patients [9]. Many past reviews
discussed the clinical implications and findings of VR on disorder patients and
assessed their quality [10]. Various pilot studies, open trials and randomized
controlled trials (RCTs) that implemented VR treatment were reviewed. These compared
the effectiveness of VR treatment with other treatments or no treatment, and showed
that VR treatment provided better outcomes.
VR-enabled treatment can provide better outcomes for disorders such as "fear of
heights," "fear of flying," "spider phobia," "social phobia," "obesity," "fear of public
speaking," etc. (Table 1).
VR technology supports various treatments like "VR-assisted cognitive behavior
therapy," "VR-based cognitive treatment," "VR exposure," "VR therapy," etc. These
treatments provided better results in treating clinical attendees and referred
patients who were facing depression and behavioral changes, but the evidence for
the efficacy of VR treatment still needs to be established.
6 Research
Research shows that one in every 20 persons suffers from depression at one
stage or another of their life. Many researchers are working on designing various
IoT systems to detect depression and track emotions in diverse age groups. Multiple
mechanisms like speech features, facial expressions, text patterns, algorithms and
energy levels have been used to identify depression and provide recommendations
for patients. Wearable IoT technology, smart healthcare, virtual reality [11],
artificial intelligence (AI), EEG signal processing and so on are all undergoing
extended trials for optimization in this field.
References
1. https://www.who.int/news-room/detail/09-09-2019-suicide-one-person-dies-every-40-seconds
2. Anumala, H., Busetty, S.M., Bharti, V.: Leveraging IoT device data for emotional health. In:
International Internet of Things Summit, pp. 487–501. Springer, Cham (2015)
3. Deepika Mathuvanthi, P., Suresh, V., Pradeep, C.: IoT powered wearable to assist individuals
facing depression symptoms (2019)
4. Beck, A.T., Ward, C.H., Mendelson, M., Mock, J., Erbaugh, J.: An inventory for measuring
depression. Arch. Gen. Psychiatry 4, 561–571 (1961)
5. Zois, D.S.: Sequential decision-making in healthcare IoT: real-time health monitoring, treat-
ments and interventions. In: 2016 IEEE 3rd World Forum on Internet of Things (WF-IoT),
pp. 24–29. IEEE (2016)
6. Ali, S., Kibria, M.G., Jarwar, M.A., Kumar, S., Chong, I.: Microservices model in WoO
based IoT platform for depressive disorder assistance. In: 2017 International Conference on
Information and Communication Technology Convergence (ICTC), pp. 864–866. IEEE (2017)
7. Vaseem, A., Sharma, S.: Depression: a survey on the Indian scenario and the technological
work done. Int. J. Eng. Res. Technol. (IJERT) 08(03) (2019)
8. https://www.c-sharpcorner.com/UploadFile/f88748/internet-of-things-applications/
9. Gregg, L., Tarrier, N.: Virtual reality in mental health. Soc. Psychiatry Psychiatr. Epidemiol.
42(5), 343–354 (2007)
10. Glantz, K., Rizzo, A., Graap, K.: Virtual reality for psychotherapy: current reality and future
possibilities. Psychother. Theor. Res. Pract. Training 40, 55–67 (2003)
11. Katkuri, P.K., Mantri, A., Anireddy, S.: Innovations in tourism industry and development using
Augmented Reality (AR), Virtual Reality (VR). In: TENCON 2019–2019 IEEE Region 10
Conference (TENCON), Kochi, India, pp. 2578–2581 (2019). https://doi.org/10.1109/tencon.
2019.8929478
Sentiment Analysis on Hindi–English
Code-Mixed Social Media Text
Abstract Social media has been experiencing an enormous amount of activity from
millions of people across the globe over the last few years. This has resulted in the
accumulation of a substantial amount of textual data and opened up several
opportunities for analysis. Sentiment analysis and classification is one such task,
where the opinion expressed in a text is identified and classified accordingly. This
becomes even trickier in code-mixed text due to a free style of writing that does not
follow a proper syntactic structure. In this paper, we work on such Hindi–English
code-mixed texts obtained from the SentiMix shared task of SemEval-2020. We created a
novel customized embedding model for feature generation from Hindi–English code-mixed
texts to classify them into sentiments like positive, neutral and negative using
deep learning techniques. It is observed that the attention-based CNN-Bi-LSTM model
achieved the best performance of all models, with a 70.32% F1-score.
1 Introduction
Social media platforms like Facebook, Twitter and Instagram have seen a phenomenal
range of interactions across the world. These platforms are flooded with all sorts of
data like texts, videos, images, and among all, textual communication is a prime
source of research due to its abundant usage.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 615
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_65
616 T. Tulasi Sasidhar et al.
The number of people engaging in social networking is increasing exponentially
each day, across an enormous variety of aspects.
This has opened a huge scope for analyzing and understanding the behavioral patterns
of people and leveraging them for improvement in several fields, e.g., getting feedback
on a product, reviewing public opinion on a new government policy, obtaining the
verdict on a movie and so on. The set of methods and techniques for understanding
human polarity by extracting the relevant information from textual data is termed
sentiment analysis [1]. Traditionally, sentiment analysis and classification means
analyzing the polarity of the expressed opinion and categorizing it as positive,
neutral or negative.
Sentiment analysis is one of the subcategories and a prime area of research
within natural language processing. Many advanced models have achieved
state-of-the-art classification results on texts expressed in a monolingual form
such as English, Spanish, Chinese and so on. But people from multilingual
societies like India tend to use a different style of writing, as they are likely
to be influenced by at least two languages, and code-mixed [2] writing is
one such text pattern. Code mixing is the phenomenon of transliterating and mixing
a native and a foreign language. An example of such text is illustrated below.
• Sarkar ne corona time pe lockdown shuru karke spread ko control kar diya.
The above example contains a Hindi–English code-mixed text, in which Hindi
phrases like "Sarkar ne", "shuru karke" and "kar diya" are written in Roman
script. Analyzing these types of sentences and classifying them based on the
sentiment expressed in them is still an active research area and is difficult
compared to traditional text classification. The lack of pretrained models and
quality annotated corpora makes the task even trickier. In this paper, we conduct
experiments on such data: we chose Hindi–English code-mixed texts and attempted to
classify them into buckets of positive, negative or neutral. A novel way of creating
a customized embedding model is proposed for better feature generation, and
deep learning models are used to classify the sentences.
The paper structure is as follows. Section 2 provides a detailed description about
the existing works done in the field of sentiment classification. The details of the
dataset used for experimentation are given in Sect. 3. In Sect. 4, a detailed flow along
with description of each step followed while conducting experiments is provided,
and the paper is concluded in Sect. 5.
2 Related Works
while preserving the context among them. Cha et al. [3] proposed encapsulating
models through the formation of word embedding clusters for text evaluation.
They used Bag-of-Words, word2vec, fastText and Doc2Vec to fabricate semantic
embedding features, which are highly beneficial for text readability, and among
them, fastText gave better performance. In the context of code-mixed text, Patra
et al. [4] presented a summary of a task in which texts are classified based on the
sentiment expressed in them; two different code-mixed corpora (Hindi–English
and Bengali–English) were used for experimentation. A brief summary of each team's
approach in terms of the features and models used is provided. Among all, the top two
performing teams used GloVe and fastText word embeddings. fastText along with
a CNN layer to grab sub-word features and a bi-directional long short-term memory
network (Bi-LSTM) to capture sequential information gave the top classification
performance. Shalini et al. [5] proposed an approach for classifying code-mixed
texts in Indian languages based on the sentiment embedded in them. They introduced
the first Kannada–English annotated corpus by grabbing Facebook comments using
an API. The proposed model used Doc2Vec and fastText for feature vector generation.
Machine learning models like SVM and deep learning networks like the convolutional
neural network (CNN) and Bi-LSTM are used for classification. The proposed method
was validated on Bengali–English and Hindi–English corpora acquired from a
shared task. They achieved 60.22% and 72.20% using Bi-LSTM on EN-HI and EN-BE, and
71.50% using CNN on the EN-KA dataset. In many cases of code-mixed texts, a subset
of words constitutes the entire context of a sentence. In order to achieve better
classification, it is important to weigh each word. This can be carried out by a
neural network that incorporates an attention mechanism [6]. Zhou et al. introduced an LSTM
model with attention mechanism for classifying the sentiments in cross-language
texts [7]. A word2vec model trained on both English and Chinese is used to generate
feature vectors. They used a neural network model with an attention mechanism,
trained in combination with bilingual bi-directional LSTMs to model the word
sequences, and achieved 82.4%, 84.1% and 81.3% accuracy on the NLP&CC datasets. The
main challenge of distinguishing emotions in code-mixed texts is exploring mono-
lingual and bilingual content of each text and identifying the useful words from the
context. Wang et al. [8] addressed these challenges by proposing a bilingual attention
network (BAN) model, which accumulates the important word features from both
languages to construct feature vectors and integrates the vectors with high attention
weights to predict the emotion.
The related works cited for this task support the fact that sequential
models along with an attention mechanism enhance classification, but the lack of
state-of-the-art pretrained embedding models [9, 10], especially in the Hindi–English
code-mixed domain, has resulted in sub-par accuracy values. In this work, we propose
to fabricate a customized embedding model that gives a better numerical representation
of texts so that better classification can be achieved.
3 Dataset Description
The dataset used for the sentiment analysis experiments is obtained from the SentiMix
shared task organized at SemEval-2020 [11].
The detailed distribution of data is portrayed in Table 1. The task is to classify the
Hindi–English code-mixed text based on the sentiment expressed in it. The sentiment
labels considered are positive, neutral and negative. The dataset contains 14,000
sentences for training, 3000 sentences for validation and 3000 sentences for testing.
Each text is annotated with its respective sentiment, and word-level
language labels are also provided.
4 Experiments

This section is organized as follows. In Sect. 4.1, the preprocessing steps used for
cleaning the data are illustrated. The first phase of experiments, results and output
analysis is provided in Sect. 4.2. In Sect. 4.3, the rationale and procedure for
fabricating a customized embedding model are described, and the experiments
conducted with feature vectors from the customized embedding model are provided
in Sect. 4.4.
4.1 Preprocessing
The texts present in the data are extracted from social media platforms, and they are
filled with information like usernames, URLs, hashtags and special characters. Prior
preprocessing is required to remove irrelevant information from the sentences. As
each data point is split into words, the first step is to concatenate them to form
sentences. After that, all the usernames (which in general start with @) and hashtags
(#) are removed from the sentences. All special characters like multiple dots and
smileys, along with additional spaces, are removed, and each sentence is converted to
lower case. These preprocessed sentences are utilized for all the experiments carried
out in this work.
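A minimal sketch of these preprocessing steps in Python; the function name and the exact regular expressions are illustrative assumptions, not the authors' code.

```python
import re

def preprocess(tokens):
    """Join word-split data points into a sentence, then strip usernames,
    hashtags, URLs, special characters and extra spaces, and lowercase."""
    sentence = " ".join(tokens)                          # step 1: concatenate words
    sentence = re.sub(r"@\w+|#\w+", " ", sentence)       # usernames (@...) and hashtags (#...)
    sentence = re.sub(r"https?://\S+", " ", sentence)    # URLs
    sentence = re.sub(r"[^a-zA-Z0-9\s]", " ", sentence)  # smileys, multiple dots, symbols
    sentence = re.sub(r"\s+", " ", sentence).strip()     # collapse additional spaces
    return sentence.lower()

print(preprocess(["@user", "Sarkar", "ne", "lockdown", "shuru", "karke", "!!"]))
# → "sarkar ne lockdown shuru karke"
```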
4.2 Experiments-1
It is clear from the literature survey that pretrained bilingual embedding models
generate better feature vectors for code-mixed sentence classification. As they were
already trained on similar patterns of sentences, they establish a relatively better
semantic relation between words and generate a better numerical representation for
them. Hence, as the first level of experiments, a domain-specific pretrained model is
utilized [12].
Initially, every preprocessed sentence is tokenized. Word2vec from the gensim library
is used to load the pretrained model and retrain it with the tokenized sentences. The
skip-gram method is used, and the model is retrained for 10 epochs. Sequential models
like LSTM and Bi-LSTM are used along with CNN-headed models. Each model is trained for
15 epochs, and the test results are tabulated in Table 2. It is evident from the
results that there is large scope for improvement. Hence, we performed a retrospective
analysis and found that a huge number of unique words are introduced by this data to
the pretrained embedding model. As many new words are present in the dataset, it is
tough to generate relevant embeddings with the available model. Hence, we propose to
fabricate a customized word embedding model.
4.3 Customized Embedding Model

As the retrained word2vec model contained many newly introduced words, we decided
to create a customized embedding model for this dataset. It is also evident from the
literature survey that fastText is one of the word embedding models producing better
feature vectors. All these considerations directed us to use a fastText embedding
model for the further set of experiments. The first stage of this work was to collect
tweets similar to the texts in the experimental data. The Python library tweepy was
used to scrape more code-mixed texts. Initially, all the words in the data were
tokenized, and n-grams were collected from them, with n ranging from 1 to 5. The
collected n-grams were used as keywords and given as input to the tweepy library,
which collected all the tweets containing those n-grams. All the collected tweets were
manually refined, and tweets with relevant information that were code-mixed in nature
were retained. In total, 110,000 code-mixed texts were collected from social media and
other sources. The gensim fastText library was utilized to create the embedding model.
The skip-gram mechanism was selected, and a fastText model was fabricated by training
it on the collected data for 10 epochs.
4.4 Experiments-2
In this phase, the customized fastText model is used to generate feature vectors,
and the experiments are conducted using the same deep learning models as in
Experiment-1. There is a significant rise in the classification results. At this stage,
the misclassified sentences were examined retrospectively, and it was observed that
most of them were either lengthy or short. This highly arbitrary nature of sentence
lengths is responsible for false classification. So, an attention model is adopted in
the architecture to identify and capture the relevant information according to context.
This resulted in better performance than the previously experimented models.
The architecture of the top performing model is shown in Fig. 1. The results of
each of the experimented models are illustrated in Table 3. The metrics for measuring
the quality of classification are accuracy, recall, precision and F1-score.
The confusion matrix of the test data results for the best performing model is shown
in Fig. 2, in which the model's class-wise classification performance can be seen. In
comparison with the Experiments-1 results, a surge in the accuracy and F1-score
of the deep learning models can be seen in the Experiments-2 phase. It is evident that
the customized fastText bilingual embedding model gave better feature vectors, and the
attention mechanism helped in handling sentences whose lengths are highly
arbitrary in nature.
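The attention-based CNN-Bi-LSTM can be sketched in Keras as below. The convolution filter count, sequence length and the use of `tf.keras.layers.Attention` as self-attention are assumptions; only the 350 Bi-LSTM units, Tanh activation and 300-dimensional embeddings come from the reported hyperparameters.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

MAX_LEN, EMB_DIM, N_CLASSES = 50, 300, 3  # sequence length is an assumption

# fastText feature vectors enter as a (MAX_LEN, EMB_DIM) sequence per sentence
inputs = layers.Input(shape=(MAX_LEN, EMB_DIM))
x = layers.Conv1D(64, 3, padding="same", activation="tanh")(inputs)   # CNN head (filter count assumed)
x = layers.Bidirectional(layers.LSTM(350, return_sequences=True))(x)  # 350 Bi-LSTM units (Table 4)
x = layers.Attention()([x, x])                 # self-attention over the time steps
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(N_CLASSES, activation="softmax")(x)  # positive / neutral / negative

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
print(model.output_shape)  # → (None, 3)
```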
The optimal hyperparameters of the best performing model are given in Table 4.
Initially, we started with an embedding vector size of 100, and to observe the change
in the results, we experimented by varying the vector size from 200 to 400. There
was improvement in results up to 300, but at 400 the results started decreasing, so
the optimal embedding vector size was fixed at 300. Various activation functions like
ReLU and Tanh were used, and on observing the results, Tanh gave the best performance.
We experimented by varying the number of epochs from 5 to 15 and found that after
10 epochs the models were overfitting, so we stopped at 10 epochs. The number of
units in the Bi-LSTM was varied from 100 to 400, and at 350, we observed better
classification. In summary, all the hyperparameters were selected by trial and
error.
5 Conclusion
vectors are generated. The CNN-headed Bi-LSTM sequential model gave better
performance with a 57% F1-score. In order to improve the classification, a customized
fastText bilingual embedding model was fabricated, and an attention mechanism was
utilized to deal with arbitrary sentence lengths. It is observed that out of all the
experimented models, the attention-based CNN-Bi-LSTM gave the best performance in
terms of F1-score. It is evident from the confusion matrix that it also gave better
class-wise performance.
References
1. Mäntylä, M.V., Graziotin, D., Kuutila, M.: The evolution of sentiment analysis–a review of
research topics, venues, and top cited papers. Comput. Sci. Rev. 27, 16–32 (2018)
2. Sreelakshmi, K., Premjith, B., Soman, K.P.: Detection of hate speech text in Hindi–English
code-mixed data. Procedia Comput. Sci. 171, 737–744 (2020)
3. Cha, M., Gwon, Y., Kung, H.T.: Language modeling by clustering with word embeddings for
text readability assessment. In: Proceedings of the 2017 ACM on Conference on Information
and Knowledge Management, pp. 2003–2006. ACM (2017)
4. Patra, B.G., Das, D., Das, A.: Sentiment Analysis of Code-Mixed Indian Languages: An
Overview of SAIL-Code-Mixed Shared Task@ ICON-2017. arXiv preprint arXiv:1803.06745
(2018)
5. Shalini, K., Ganesh, H.B., Kumar, M.A., Soman, K.P.: Sentiment analysis for code-mixed
Indian social media text with distributed representation. In: 2018 International Conference on
Advances in Computing, Communications and Informatics (ICACCI), pp. 1126–1131. IEEE
(2018)
6. Chen, H., Sun, M., Tu, C., Lin, Y., Liu, Z.: Neural sentiment classification with user and product
attention. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language
Processing, pp. 1650–1659 (2016)
7. Zhou, X., Wan, X., Xiao, J.: Attention-based LSTM network for cross-lingual sentiment clas-
sification. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language
Processing, pp. 247–256 (2016)
8. Wang, Z., Zhang, Y., Lee, S., Li, S., Zhou, G.: A bilingual attention network for code-switched
emotion prediction. In: Proceedings of COLING 2016, the 26th International Conference on
Computational Linguistics: Technical Papers, pp. 1624–1634 (2016)
9. Kamble, S., Joshi, A.: Hate Speech Detection from Code-mixed Hindi-English Tweets Using
Deep Learning Models. arXiv preprint arXiv:1811.05145 (2018)
10. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword infor-
mation. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017)
11. Patwa, P., Aguilar, G., Kar, S., Pandey, S., PYKL, S., Gambäck, B., Chakraborty, T., Solorio,
T., Das, A.: SemEval-2020 task 9: overview of sentiment analysis of code-mixed tweets. arXiv
e-prints (2020)
12. Sasidhar, T.T., Premjith, B., Soman, K.P.: Emotion detection in Hinglish (Hindi + English)
code-mixed social media text. Procedia Comput. Sci. 171, 1346–1352 (2020)
Accident Risk Rating of Streets Using
Ensemble Techniques of Machine
Learning
Abstract Increased vehicular traffic and a lack of expert drivers on the street,
coupled with adverse conditions and poor maintenance of streets, are responsible for
the increase in traffic accidents. Hence, prediction of traffic collisions is of
paramount importance for their mitigation. Street traffic analysis and prediction can
be a dedicated approach to ensure safe and reliable street networks. The primary
objective of this research is to assign an accurate accident risk factor to each
street using machine learning models on the identified dataset. For automated and
accurate prediction, various ensemble models of machine learning are applied, and
their performance is compared with naive models.
1 Introduction
In recent times, increased urbanization has resulted in a much higher count of
vehicles on streets, which has given rise to numerous troubles such as traffic
congestion, accidents and air pollution. These issues have caused immense physical and
economic loss as well as human casualties. The Global Status Report on Road Safety
2015, representing statistics from 180 nations, reveals that traffic fatalities
worldwide amount to about 1.25 million annually, with the maximum traffic mortality
rates in lower-income nations. According to the National Vital Statistics Reports
2017, traffic accidents account for around 36,000 deaths in the USA. The urgent need
of the moment is to improve traffic safety and reduce the number of deaths.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 623
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_66
624 A. Rastogi and A. L. Sangal
Some collisions have been found to occur because of road structures, whereas
others can be attributed to human error. Although progress has been made in
strengthening the regulations for on-road protection and making cars safer, studies
indicate that the pace of reform remains very slow. With the aid of traffic
information and deep learning, traffic flow forecasting has enabled individuals to
avoid large traffic jams and accidents by choosing routes with lower congestion. Big
traffic data and machine learning may likewise provide an encouraging solution to
predict and diminish the danger of traffic casualties.
One significant task in car accident avoidance is to develop a successful traffic
safety score prediction system. If a traffic safety score for a specific area can be
predicted, we can forward this report to nearby drivers to make them careful or,
alternatively, help them select a safer street. In any case, the accurate forecast of
car accidents is difficult, as many related causes can influence a car crash.
Ensemble models provide an upper hand in machine learning in that they
combine the results of various models and allow better predictive performance
compared to a single model. This potential makes it beneficial to apply ensemble
models of machine learning to the problem mentioned above. The main objective
is to investigate the risk levels of roads using traffic accident data. The purpose of
this paper is to investigate accident data, classify streets into various accident
risk levels, and apply machine learning models to accurately predict the risk levels
of the streets, so that future research can focus on a device that would aid
in avoiding collision-prone areas and advise alternative approaches to alleviate
accident recurrence and severity.
The remaining part of the paper is structured in the following way: Sect. 2 discusses
the reviewed literature. This is followed by our proposed accident risk
assignment methods in Sect. 3. Model evaluation is discussed in Sect. 4. Section 5
concludes the paper with a discussion of future directions.
2 Literature Survey
Immense effort has been committed to the identification of main factors or distinct
road patterns that could cause traffic collisions. For instance, Oh suggested
that disrupted traffic flow is one of the factors provoking accidents [1]. Based on a
loop detector dataset and a crash dataset, they found that the 5-min standard
deviation of vehicle speeds immediately before a car accident is a notable indicator
of a crash. Even though a few accident indicators have been proposed, they could not
address the issue of exact accident prediction, because many components have
complicated relations with car accidents.
Spatio-temporal dependence is an evolving aspect of traffic; the dependence of traffic
movement on space and time was assessed by Yue using cross-correlation reasoning,
which shows its importance in traffic prediction assessment [2]. Dauwels proposed
unsupervised learning approaches to infer the spatio-temporal patterns in large-scale
traffic speed forecasting [3]. Another model was devised to forecast the
spatio-temporal effect of incidents on neighboring traffic, conditioned on real-time
traffic data [4]. A spatio-temporal recurrent convolutional neural network that
considers the spatial interdependencies and temporal behavior of network-wide traffic
was proposed by Yu [5].
The progress of AI technology has triggered researchers to target real-time traffic
accident prediction. Lv considered factors based on Euclidean distance
and utilized a k-nearest neighbor approach to predict car accidents [6]. Park
assembled a large amount of highway car crash data from Seoul
and built a prediction workflow based on the k-means clustering approach
and logistic regression [7]. Chen gathered a human mobility dataset in Japan and
used a stacked denoising autoencoder model to infer real-time traffic accident
risk. One drawback of these studies is that they did not incorporate the temporal
patterns of the traffic impact itself into the models. Without this data, the
predictive power of the models could be reduced.
In general, the literature reviewed on the severity of collision injury found
that serious thought had been given to modeling crash severity, but prediction of
injury outcome was not a core concern. Statistical models are more often utilized in
crash severity modeling compared to AI techniques, while AI strategies were, for the
most part, utilized as prediction tools. DT, NB, SVM and RF are used in crash severity
modeling with varying popularity.
The proposed process for the risk assignment to each street is shown in Fig. 1. The
detailed description of all the steps is given in the subsections following the figure.
The two datasets used in this research are a vehicle dataset and a vehicular accident
dataset obtained from the Chicago Data Portal of the City of Chicago, covering the
years 2015 to 2019.
Based on the common report number attribute, we merged both datasets. We also cleaned
the data to include only on-road vehicles and passenger vehicle types, while excluding
unknown values for the traffic type, lighting condition and weather columns. The
attribute 'num-passengers' does not include the driver of the vehicle. Hence, if
'num-units' = 1, implying that only one driver was involved in the crash, we add 1 to
get the total number of people in the vehicle. If 'num-units' > 1, meaning more than
one vehicle was involved in the crash, we add that value to the total number of
passengers.
From the 1610 samples, 30% were kept for testing and the remaining 70%
were used to train the model.
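A sketch of the merge, passenger-count logic and split in pandas, with toy rows standing in for the Chicago datasets; the column names are assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy rows standing in for the Chicago vehicle and accident datasets
vehicles = pd.DataFrame({"report_no": [1, 2], "num_passengers": [2, 0],
                         "vehicle_type": ["passenger", "passenger"]})
accidents = pd.DataFrame({"report_no": [1, 2], "num_units": [1, 2]})

# Merge on the common report number attribute
df = vehicles.merge(accidents, on="report_no")

# 'num_passengers' excludes drivers: add 1 for a single-unit crash,
# otherwise add one driver per involved unit
df["total_people"] = df.apply(
    lambda r: r["num_passengers"] + (1 if r["num_units"] == 1 else r["num_units"]),
    axis=1)

# 70/30 train-test split, as applied to the 1610 samples
train, test = train_test_split(df, test_size=0.3, random_state=42)
print(df["total_people"].tolist())  # → [3, 2]
```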
Feature selection is the process of choosing the attributes that can make the
predicted variable more accurate, or of eliminating those attributes that are
insignificant and can diminish the model's precision and quality. Correlation is an
approach to understanding the relationship between multiple factors and attributes in
the dataset: it depicts whether one or more attributes depend on another attribute,
cause another attribute, or are related to other features. Correlation analysis shows
that accident severity can be determined based on the number of injuries and the
physical damage in the accident data.
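The correlation analysis can be sketched with pandas; the toy columns below are placeholders for the merged accident attributes, not the actual data.

```python
import pandas as pd

# Toy stand-ins for attributes of the merged accident data (names assumed)
df = pd.DataFrame({
    "injuries_per_person": [0.0, 0.5, 1.0, 0.2, 0.8],
    "damage_rating":       [1,   2,   3,   1,   3],
    "num_units":           [1,   2,   3,   2,   1],
})

# The Pearson correlation matrix reveals which attributes move together;
# the severity-related columns show the strongest correlation in the study
corr = df.corr()
print(corr["injuries_per_person"].round(2))
```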
We assign the danger score according to the number of injuries per person in the vehicle and the physical damage to the vehicle, as these two factors have the maximum correlation. The number of injuries per person involved in the accident is found by dividing the total number of injuries in the accident by the total number of people involved in the accident. We chose to assign four danger score ratings to cover all accidents, depending on the number of injuries.
Accident Risk Rating of Streets Using Ensemble … 627
We then examined the unique values of the damage attribute and decided what weight to give it in the computation of the danger score. There are three unique values of the damage in monetary terms (<500, 500–1500, and >1500), so we assign rating values 1, 2, and 3, respectively, and then apply a weight 'w' in the multiplication. With a weight of 0.5, we obtain eight different scores to bin into four categories: the danger score is 1 if the current score is 0.5 or 1; 2 if the current score is 1.5 or 2; 3 if the current score is 3 or 4; and 4 if the current score is 4.5 or 5. Then, we assign the accidents into three bins as a combined danger score.
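One plausible reading of the scoring scheme above is a weighted product of the two ratings, binned exactly as the text enumerates; the combination function is our interpretation, not the authors' published code.

```python
def combined_score(injury_rating, damage_rating, w=0.5):
    """One plausible reading: weighted product of injury and damage ratings."""
    return injury_rating * damage_rating * w

def danger_score(score):
    """Bin the eight possible scores into the four danger categories
    exactly as enumerated in the text."""
    if score in (0.5, 1.0):
        return 1
    if score in (1.5, 2.0):
        return 2
    if score in (3.0, 4.0):
        return 3
    if score in (4.5, 5.0):
        return 4
    raise ValueError(f"unbinned score: {score}")
```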
After assigning a danger score to each accident, we applied machine learning models to estimate the accuracy of the assigned danger scores. We applied basic machine learning models such as logistic regression, SVM, KNN, and decision tree, and ensemble models such as random forest and gradient boosting. The implementation results of these models are described in the next section.
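Fitting the listed models can be sketched with scikit-learn on a toy, linearly separable dataset (not the real accident data); the model set mirrors the paper's list.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Toy, linearly separable stand-in for the accident feature matrix.
X = [[0], [1], [2], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

models = {
    "logistic_regression": LogisticRegression(),
    "svm": SVC(),
    "knn": KNeighborsClassifier(n_neighbors=3),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}
# Train each model and record its training accuracy.
scores = {name: m.fit(X, y).score(X, y) for name, m in models.items()}
```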
4 Model Evaluation
We implemented basic machine learning models and ensemble models and compared the results in Table 1, based on metrics such as precision, recall, and F1 score, with the other methods used in the study for prediction, such as the SVM, KNN, and DT models. We also show the performance of the models using ROC curves.
Figures 2, 3, 4, 5, 6 and 7 provide the ROC curve for comparison of each model’s
performance on the given dataset for the three danger score classes.
The results obtained show that gradient boosting outperforms all the other models: the gradient boosting model has the smallest RMSE across the different prediction models for all the classes.
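The RMSE used for this comparison, between assigned and predicted danger scores, is:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error between assigned and predicted danger scores."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```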
In the last decade, the analysis and prediction of street traffic have become a subject of continuous research in various sub-fields of computer science. In this paper, we enlisted and discussed various approaches proposed for assigning accident risk ratings to streets, including machine learning and ensemble techniques. From the results obtained, we conclude that Gradient Boost, an ensemble machine learning model, performed better than the simple random forest in terms of accuracy and other parameters.
Future work can be extended to develop a more successful model that would outperform the accuracies achieved by gradient boosting. We can also use hyperparameter tuning to optimize our models with the parameters giving the best results. Moreover, the system could be extended to include more accident-related features that would help drivers choose the safest path from the various available routes, and the model should also be able to report the risk of all the available streets.
630 A. Rastogi and A. L. Sangal
Future research will focus on confirming the superiority of DNNs as traffic accident severity classification/prediction models using the existing datasets of the relevant literature, and on putting forward a set of independent parameters that are not only salient, but also sufficient, for traffic accident severity prediction.
Skin Detection Using YCbCr Colour
Space for UAV-Based Disaster
Management
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 633
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_67
634 S. J. Arya et al.
1 Introduction
2 Methodology
This paper describes an experimental study of skin detection in the YCbCr colour space. MATLAB is used for the image processing, and the results demonstrate the feasibility of the selected skin detection method. This will help in building an effective system for disaster victim surveillance using a UAV.
Once the sample image is loaded, it is converted from normal RGB values into the YCbCr colour space. Different morphological operations are applied to develop efficient skin detection. Alongside this colour segmentation model, the paper aims to check the efficiency of skin detection on the greyscale value distribution; for this purpose, the MATLAB code is extended to plot the histogram and to detect skin colour in the greyscale range as well. In YCbCr, the effectiveness of the procedure is analysed by plotting the histogram of each component.
A UAV mounted with a camera captures pictures of the disaster site, which are received at the ground station. Processing the pictures taken by the UAV during flight from a height in MATLAB for skin identification opens an approach to finding victims and ensures a quick recovery. Figure 1 shows the block diagram of the system, which has a ground-board system and an air-board system.
3 Image Processing
Image processing is an emerging field with a wide variety of applications and plays a huge role in surveillance. It is a technique of modifying the characteristics of an image and changing its attributes to get an ideal output according to the interest of the user. The fundamental steps in every kind of image processing remain the same: images are imported using an image acquisition process and analysed through software, the captured images are manipulated to produce the desired features, and different filter techniques are applied to remove noise. Skin detection is done by separating skin-coloured pixels from non-skin-coloured pixels.
YCbCr colour space is an encoded form of the RGB colour space used for video streaming and compression. Since this representation makes it easy to discard some redundant colour information, it finds application in image and video compression standards such as JPEG, MPEG-1, MPEG-2, and MPEG-4. The simplicity of the transformation and the explicit separation of the luminance and chrominance components are the strengths of the YCbCr colour model. In this arrangement, luminance information is stored as a single component ('Y'), and chrominance information is stored as two colour-difference components ('Cb' and 'Cr'). 'Cb' represents the difference between the blue component and a reference value, and 'Cr' represents the difference between the red component and a reference value. YCbCr is a linear conversion of RGB [4].
YCbCr values can be obtained from the RGB colour space as per Eqs. 1–3 [4]:

Y = 0.299R + 0.587G + 0.114B (1)

Cr = R − Y (2)

Cb = B − Y (3)
YCbCr colour model representation is shown in Fig. 2. All three components are
represented in the image.
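Equations 1–3 translate directly into a small NumPy routine. Note that this is the unscaled colour-difference form of the conversion; broadcast standards such as BT.601 additionally scale and offset Cb and Cr.

```python
import numpy as np

def rgb_to_ycbcr_basic(rgb):
    """Convert an RGB image array of shape (H, W, 3) to basic YCbCr
    using the unscaled colour-difference form of Eqs. 1-3."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b   # Eq. (1): luminance
    cr = r - y                              # Eq. (2): red colour difference
    cb = b - y                              # Eq. (3): blue colour difference
    return np.stack([y, cb, cr], axis=-1)
```

For a pure white pixel the chrominance components vanish, since Y equals each channel.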
The skin tone selected for the study is in the moderate colour range: 0 < H < 50 and 23 < S < 68 [2].
RGB-level to grey-level conversion takes place as per the code, and the converted range is y > 0.2; 0.3 < cb < 0.44; cr > 0.47.
Skin detection of both an indoor image and an outdoor image which is captured
during the flight of UAV is done in YCbCr colour space.
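The thresholds quoted above can be applied as a simple boolean mask; this sketch assumes the Y, Cb, and Cr components have been normalised to [0, 1], which is our reading of the quoted ranges rather than a detail stated in the paper.

```python
import numpy as np

def skin_mask(ycbcr_norm):
    """Return a boolean skin mask for a YCbCr image of shape (H, W, 3)
    whose components are normalised to [0, 1] (assumption)."""
    y = ycbcr_norm[..., 0]
    cb = ycbcr_norm[..., 1]
    cr = ycbcr_norm[..., 2]
    # Thresholds quoted in the text: y > 0.2; 0.3 < cb < 0.44; cr > 0.47.
    return (y > 0.2) & (cb > 0.3) & (cb < 0.44) & (cr > 0.47)
```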
Fig. 8 Histogram of Y image
Fig. 9 Histogram of Cb image
Fig. 10 Histogram of Cr image
4.2 Results
The RGB colour space is not commonly used because of its non-uniform nature. The YCbCr colour space shows better results and transformation. As a disaster site contains more complex situations, the YCbCr colour space will be effective for skin detection. Figure 3 is an indoor sample; its skin pixels are extracted and shown in Fig. 4, and the skin and non-skin parts are marked in Figs. 5 and 6, respectively. The histogram of the image can be used for better analysis of the picture. The original colour image obtained by processing the sample using the code is shown in Fig. 7, and the histograms of Y, Cb, and Cr are shown in Figs. 8, 9, and 10, respectively. Figure 11 is the outdoor sample, taken using the UAV from an approximate height of 15 metres. Skin pixel identification in greyscale is shown in Fig. 12, and Figs. 13 and 14 show the skin pixels and non-skin pixels in colour, respectively. The results were perfect for the samples that were used in the test.
5 Conclusion
Skin detection is a leading edge in human body identification and analysis and is applied in many emerging technologies such as face detection, where human skin colour acts as an elementary cue for detection. The segmentation of human skin colour depends upon the colour space selected, as the skin colour distribution varies greatly across colour spaces. Images under various conditions of orientation, illumination, shadow, pose, and in-plane rotation can be distinguished, which is useful in diverse applications such as video compression and recognition technology, where computer vision otherwise finds this difficult.
Considering the various limitations in this field, we can say that there has not yet been a hundred per cent solution for skin detection, and it is still under development.
References
1. Mbaitiga, Z., Fuji, S., Minori, S.: Rapid human body detection in disaster sites using image
processing from unmanned aerial vehicle (UAV) cameras. In: ICIIBMS 2018, Track 2: Artificial
Intelligent, Robotics, and Human-Computer Interaction, Bangkok, Thailand
2. Lei, Y., Hui, L., Xiaoyu, W., Dewei, Z., Jun, Z.: An algorithm of skin detection based on texture.
In: 4th International Congress on Image and Signal Processing 2011, pp. 1822–1825 (2011)
3. Shaik, K.B., Ganesan, P., Kalist, V., Sathish, B.S., Jenitha, J.M.M.: Comparative study of skin
color detection and segmentation in HSV and YCbCr color space. Procedia Comput. Sci. 57,
41–48 (2015)
4. Ahmed, E., Crystal, M., Dunxu, H.: Skin detection—a short tutorial. Encyclopedia of Biometrics,
pp. 1218–1224. Springer, Berlin, Heidelberg (2009)
5. Kolkur, S.: Human skin detection using RGB, HSV and YCbCr colour models. Adv. Intell. Syst.
Res. 137, 324–332 (2017)
Lie Detection Using Thermal Imaging
Feature Extraction from Periorbital
Tissue and Cutaneous Muscle
1 Introduction
According to several studies, the differentiation between liars and non-liars is detected very poorly by ordinary people as well as experts. The well-known method, polygraphy, includes different sensors that measure a person's blood pressure, respiration activity, etc. The polygraph technique mostly succeeds in detecting lies with an accuracy of 90%, but its major drawbacks are the time taken and the dependence on the expertise of the people conducting the test. These drawbacks can be overcome by using an automated deception detector that detects lies from facial and behavioural changes. The lie detector uses a thermal image as its input: it uses the skin temperature captured by a thermal camera, which varies with blood flow as a result of varying emotions. This technique is promising because it is hard to control one's emotions [1]. The change in blood flow is mostly observed in the forehead and periorbital regions of the face, which tells us that the relevant patterns for distinguishing lies from truth are to be found in these regions.
2 Literature Survey
Rajoub et al. [2] presented a lie detection approach based on observing thermal features in a region of interest, the periorbital region; the approach uses machine learning on these features. Bhowmik et al. [3] proposed a solution for facial feature detection: features such as the eyes, nose, and mouth, which do not vary with rotation, scale, or image noise, are detected using the Harris Interest Point Detection algorithm. Wu et al. [4] presented thermal face recognition that also learns important features such as the nose, eyes, and mouth from the raw dataset using a CNN architecture; the recognition rate is still affected by conditions such as head rotation, expression variation, and illumination variation. Kyal et al. [5] presented a way to identify a human face from a thermal image efficiently; the feature extraction task was done using a histogram plot technique, and, to detect the face efficiently, techniques such as object boundary analysis and thresholding are applied to the images. George et al. [2] analyzed the eye-blink count and blink duration of truthful and deceptive responses. Analysis of 50 people over some sample questions showed that both the count and the duration are higher in the case of deception. Responses were grouped on the basis of maximum blink duration and maximum blink count for both lie and truth responses, and responses with no observed blinking were categorized as a no-blink category.
3 Methodology
The methodology used for the present research includes recording facial skin temperature using a thermal camera and processing the captured images on a mobile device using deep learning techniques.
3.1 Architecture
Figure 1 demonstrates the top-level architecture of the project model. A thermal camera is plugged into a smartphone, and video is recorded. The recorded video in the smartphone's gallery is then transferred to the application. This video is fetched on a local machine where the trained model is already located; the model is downloaded from a deep learning server after training with a sufficiently large dataset. The processed input image is then given as input to the script, and the respective response is generated using the pre-trained model, which is ultimately sent back to the user via the application.
Data will be acquired by using a thermal camera, the Seek Compact Thermal Imager for Android, with a temperature range of −20 to 1000 °C. During each interview session, thermal measurements of the participant's face will be obtained using the Android smartphone with the thermal camera attached. The dataset can be obtained by using the following methods:
1. Surveillance of participants and interviewing them.
2. Asking a subject to describe another person.
3. Interviewing different people, who can have different base body temperatures under normal circumstances.
These varying temperatures can change the heat maps and can also worsen the accuracy of lie detection. To get over this issue, the initial few seconds of every recording will be used as the baseline for each person; during this period, the person sits normally without engaging in any activity or answering.
The dataset is based on two profile case studies and a particular mock crime. A total of ten participants are considered for the demo interview, with each interview further divided into four parts: (1) baseline, (2) true, (3) direct lie, and (4) indirect lie. For testing purposes, we will be using the mock crime video.
The data processing begins with identifying and cropping the subject's face [6]. This is followed by detecting the maximum-intensity point in the image (the nasal tip, which is closest to the thermal camera). Taking this as the reference point, the locations of the forehead and eye regions are calculated [7]. These regions are then cropped from the image to create the dataset, which thus consists of the forehead and periorbital regions obtained by refining the previously cropped images. The cropped images are stored as the dataset. A sample of the processing is shown in Figs. 2 and 3.
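The reference-point and region-cropping steps can be sketched with NumPy, assuming the thermal frame is available as a 2-D array; the function names and region geometry here are illustrative, not the authors' implementation.

```python
import numpy as np

def reference_point(thermal):
    """Locate the hottest pixel (assumed to be the nasal tip, the point
    closest to the thermal camera) in a cropped face image."""
    return np.unravel_index(np.argmax(thermal), thermal.shape)  # (row, col)

def crop_region(img, top_left, size):
    """Crop a rectangular region (e.g. forehead or periorbital area)
    located relative to the reference point."""
    r, c = top_left
    h, w = size
    return img[r:r + h, c:c + w]
```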
3.4 Experimenting
The data acquired in the above step are used for experiments on a deep learning (DL) machine. The dataset of four parts (baseline, true, direct lie, and indirect lie) is uploaded to the DL server. The server has two NVIDIA V100-architecture GPU cards and 128 GB of DDR4 ECC RAM. The model is trained using AlexNet [8] and different algorithms available on the server [9]. We varied the epochs and other parameters to improve accuracy. The model is trained on a processed dataset containing 22,000 images for baseline, 33,912 for truth, and 11,000 each for direct and indirect lie. The test dataset contains 6752 images for baseline, 7892 for truth, and 3375 for the direct and indirect lie classes.
4 Dataset
The dataset is acquired using a thermal camera, the Seek Compact Thermal Imager for Android, with a temperature range of −20 to 1000 °C (evening). The thermal camera features are as follows: an image resolution of 640 × 480 pixels, a frame rate of 12 FPS, a 256 × 156 thermal sensor, and a 36-degree field of view that works day and night. During each interview session, thermal measurements of the participants' faces were obtained using the Android smartphone with the thermal camera attached. The dataset was obtained using the following methods:
1. Surveillance of people and interviewing them.
2. Asking a subject to describe another person.
3. Interviewing different subjects, who may have different base temperatures under usual circumstances.
These differences can distort the heat maps and worsen the accuracy of lie detection. To get over this issue, the initial few seconds of every recording are used as the baseline for each person; during this period, the person sits normally without engaging in any activity or answering.
The dataset is based on two profile case studies and a particular mock crime. A total of ten participants were considered for the demo interview, with each interview further divided into four parts: true, direct lie, indirect lie, and baseline. For testing purposes, we used a mock crime video.
648 P. Kodavade et al.
The subject was given a character profile to learn for 10 min. Four sessions of each subject were conducted: a baseline session, a truth session, a direct lie session, and an indirect lie session. Figure 4
gives the overall distribution of the dataset used in the project, totalling 41 videos. Using the videos of various participants narrows the error zone, thereby widening the scope of the project. Table 1 gives the overall distribution of the questions. The baseline questions consist of general questions such as 'What is your name?' and 'Which city do you belong to?'. The true session consisted of questions where the participant gave genuine answers. The direct lie session consisted of questions where the person lied directly, without hiding the truth or making up a story; this was possible because the person was given the story plot before the session. The indirect lie session consisted of questions where the participants made up a story and lied.
5 Experimental Setup
• Sessions are recorded using the thermal Android app provided with the camera.
• The camera should be placed approximately 20 cm away from the subject's face.
• Sessions are to be conducted in the evening so as to get sharp edges.
6 Result Analysis
The trained model is now able to distinguish correctly between truth and lie, whereas finer distinctions, such as between direct and indirect lies, are not detected.
Tables 2 and 3 give the confusion matrices for classification into class true or class lie and for classification into four classes (baseline, true, direct lie and indirect lie), respectively. Table 2 shows that the overall accuracy of the model when classifying an image as true or lie is 100%. Table 3 shows the F1 scores for baseline, truth, direct lie and indirect lie as 54%, 46%, 67% and 77%, respectively; the overall accuracy based on this is 60.53%.
Table 3 For classification in four classes, i.e., baseline, true, direct lie and indirect lie

               Baseline   Truth    Direct lie   Indirect lie   Precision   F1 score
Baseline       8000       8000     0            0              50%         54%
Truth          5390       5610     0            0              51%         46%
Direct lie     0          0        6050         4950           55%         67%
Indirect lie   0          0        1000         10,000         90.90%      77%
Overall truth  13,390     13,610   7050         14,950         –           –
Recall         59.74%     41.22%   85.816%      66.89%         –           –
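The per-class figures in Table 3 can be reproduced from the raw counts. The sketch below assumes rows are predicted classes and columns are actual classes, which is the orientation that matches the paper's precision and recall values.

```python
import numpy as np

# Table 3 confusion matrix: rows = predicted class, columns = actual class.
cm = np.array([
    [8000, 8000,    0,     0],   # predicted baseline
    [5390, 5610,    0,     0],   # predicted truth
    [   0,    0, 6050,  4950],   # predicted direct lie
    [   0,    0, 1000, 10000],   # predicted indirect lie
])
diag = np.diag(cm)
precision = diag / cm.sum(axis=1)   # correct predictions per predicted row
recall = diag / cm.sum(axis=0)      # correct predictions per actual column
f1 = 2 * precision * recall / (precision + recall)
accuracy = diag.sum() / cm.sum()    # overall accuracy = 60.53%
```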
7 Conclusion
References
1. Marzec, M., Koprowski, R., Wrobel, Z.: Method of face localization in thermograms. Biocy-
bern. Biomed. Eng. (2014)
2. Rajoub, B.A., Zwiggelaar, R.: Thermal facial analysis for deception detection. IEEE Trans.
Inf. Forensics Secur. 9(6), 1015–1023 (2014)
3. Bhowmik, M.K., Shil, S., Saha, P.: Feature points extraction of thermal face using harris
interest point detection. In: International Conference on Computational Intelligence: Modeling
Techniques and Applications (CIMTA) (2013)
4. Wu, Z., Peng, M., Chen, T.: Thermal face recognition using convolutional neural network. In:
2016 International Conference on Optoelectronics and Image Processing
5. Kyal, C.K., Poddar, H., Reza, M.: Detection of human face by thermal infrared camera using
MPI model and feature extraction method. In: 2018 4th International Conference on Computing
Communication and Automation (ICCCA)
6. Latif, M.H., Md. Yusof, H., Sidek, S.N., Rusli, N.: Texture descriptors based affective states
recognition: frontal face thermal image. In: 2016 IEEE EMBS Conference on Biomedical
Engineering and Sciences (IECBES)
7. Abd Latif, M. H, Md. Yusof, H, Sidek, S.N, Rusli, N.: Implementation of GLCM features in
thermal imaging for human affective state detection. In: 2015 IEEE International Symposium
on Robotics and Intelligent Sensors
8. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional
neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105,
Curran Associates, Inc (2012)
9. Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional
networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition
Voting Classification Method with PCA
and K-Means for Diabetic Prediction
Abstract Data mining can be defined as a technology with which valuable information can be extracted from massive volumes of data. Large patterns can be explored and analyzed using statistics and artificial intelligence in big databases. The goal of this research work is to predict diabetes accurately with machine learning algorithms such as PCA, K-means, random forest, multilayer perceptron (MLP), and naive Bayes. The diabetes prediction model has various steps, such as data preprocessing, feature extraction with the help of PCA, and classification with a voting classifier. The fundamental focus of this research is to improve prediction accuracy, and to this end a voting classifier is introduced for diabetes prediction.
1 Introduction
Diabetes is a common chronic malady that severely affects human health. The main feature of this disease is an increase in the blood sugar level beyond the normal range, and its main causes are imperfect insulin secretion or impaired genetic effects. In this disease, the human body either does not generate sufficient insulin or becomes inefficient at using the generated insulin properly. If the disease is not treated in time, it can harm a person's nerves, eyes, kidneys, and other organs. The first type generally affects youngsters below
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 651
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_69
652 A. Yadav et al.
thirty years of age. Some medical indications of this disease are increased thirst, frequent urination, high blood sugar levels, etc. It is not possible to cure this disease with oral medicines alone; in many cases, insulin is given to the body through injection. The second type of this disease mainly occurs in middle-aged and old people, in whom it mainly arises from obesity, high blood pressure, dyslipidemia, arteriosclerosis, and other maladies.
Future trends can be predicted, and hidden patterns discovered, using data mining. There are various data mining methods with which relevant information can be extracted, including classification, clustering, association rules, regression, and outlier detection. Data mining technology is gaining a lot of popularity in the healthcare sector and is a leading tool set for clinical databases. Nowadays, the use of data mining algorithms for generating clinical predictions has become quite common. Over the past few years, many researchers have theorized that medically assistive supports and prediction patterns can be acquired from the crucial data of a patient. Most of the research in the area of disease prediction analysis is focused on increasing the accuracy rate. The data should be in an understandable format for carrying out the analysis.
2 Literature Survey
A database of diabetes patients was employed in one system to provide diabetes malady analysis. The use of the KNN and Bayesian algorithms was suggested in this system and carried out on the dataset of diabetes patients. Several diabetes features were extracted for the analysis of these algorithms to predict diabetes [1].
A risk prediction model for type-2 diabetes, designed on the basis of an ensemble learning technique, has also been recommended. The selection of optimal attributes was done by weighted feature selection of random forest (RF-WFS), with XGBoost (extreme gradient boosting) as the ensemble classifier. Many performance parameters were compared in this work to validate the efficiency of the recommended classifiers, which showed more accurate prediction results compared with other existing classifiers [2].
A medical case was considered using electronic health records from different sources related to diabetes patients. The naive Bayes and SVM data mining classification algorithms were employed to implement the analysis, which focused on diabetes prediction from health records. The superior algorithm for predicting diabetes was identified by comparing the precision of the two algorithms [3].
GMM, support vector machine, artificial neural network, ELM, and logistic regression are various data mining techniques that have been applied to diagnose diabetes in its early phase. The outcomes obtained from the experiments demonstrated that better accuracy was achieved with the ANN than with the other methods [4].
An experiment on the WEKA tool was conducted using four classifiers, SVM, random forest, simple CART, and naive Bayes, for diabetes prediction. These classifiers were compared in terms of exactness value and training and testing time; the classifier accuracy measure was another performance technique used for evaluation. The SVM classifier performed better than naive Bayes, RF, and simple CART for predicting diabetes, and the results acquired after testing demonstrated the efficiency of the suggested model [5].
Predictive models were built from diagnostic medical datasets to extract knowledge, and this extracted knowledge proved efficient for diabetic prediction in patients. Diabetes mellitus was predicted using the SVM, naive Bayes, KNN, and decision tree (C4.5) machine learning algorithms on data related to youngsters; the greatest accuracy was obtained from the decision tree (C4.5) [6].
The prediction of diabetes has also been carried out using ANN, K-means, and RF methods. The highest accuracy, evaluated at 75.7%, was achieved by the artificial neural network. Such models are helpful aids for medical professionals when making treatment decisions [7].
A new data mining model to diagnose and predict diabetes at an early stage has also been given; the algorithm can be utilized for different types of data. K-means is simple but very sensitive to the initial locations of the cluster centers, and this phenomenon determines the final clustering result. An adequately clustered dataset was obtained from it for the logistic regression model. The main motive of this work was to improve the accuracy rate of k-means combined with logistic regression. It was evaluated from the results that the accuracy of both algorithms was improved by principal component analysis. The recommended model comprised three algorithms: principal component analysis (PCA), k-means for clustering, and logistic regression for classification. The tested outcomes showed that the PCA algorithm improved the k-means approach: the k-means algorithm showed a 25% improvement in accuracy, while logistic regression showed a 1.98% higher accuracy rate [8].
Following are the various phases for the diabetic prediction (Fig. 1):
1. Dataset input: The diabetes dataset obtained from the UCI database is used for this prediction. The dataset comprises 768 sample female patients from the Arizona, USA population who were examined for diabetes. It has a total of 8 attributes, namely pregnancies (preg), glucose (plas), blood pressure (pres), skin thickness (skin), insulin, BMI, diabetes pedigree function, and age, with one target class (0 or 1).
2. Attribute selection: In this phase, the PCA technique is utilized to reduce the dimensionality of the data; PCA selects the most relevant attributes from the large number of attributes. The selection
Step 6: Repeat the previous two steps iteratively until the cluster centroids stop changing their positions.
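Step 6 describes the standard k-means stopping rule. A minimal NumPy sketch of the whole loop (a generic illustration, not the authors' implementation) is:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Minimal k-means: assign points to the nearest centroid, move each
    centroid to its cluster mean, and repeat until the centroids stop
    changing their positions (Step 6)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = np.argmin(((X[:, None] - centroids) ** 2).sum(-1), axis=1)
        # Recompute centroids; keep the old one if a cluster is empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```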
This work focuses on diabetic prediction. The data is taken from the UCI database; the dataset has 8 attributes and is of multivariate type for the prediction analysis. Different methods are implemented and compared in terms of parameters such as accuracy, precision, and recall. In the proposed method, the PCA, k-means, and voting classification approaches are implemented for diabetic prediction; the voting classification method is a combination of the multilayer perceptron (MLP), random forest, and naive Bayes classifiers. We applied the following models to the dataset, with the results given in Tables 1 and 2.
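The voting combination of the three base classifiers can be sketched as plain hard (majority) voting over their label predictions; this is a generic sketch of the technique, not the authors' code.

```python
from collections import Counter

def majority_vote(predictions):
    """Hard-voting ensemble: each row of `predictions` holds one base
    classifier's labels (e.g. from MLP, random forest, and naive Bayes);
    the ensemble output for each sample is the most common label."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]
```

For example, with three classifiers voting on three samples, each sample's final label is the one at least two classifiers agree on.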
In this paper, it is inferred that various steps are involved in diabetes prediction. The PCA technique is used for feature reduction, and the k-means clustering algorithm is used to cluster alike and diverse types of data. In the last step, the voting classification
References
1. Shetty, D., Rit, K., Shaikh, S., Patil, N.: Diabetes disease prediction using data mining. In:
2017 International Conference on Innovations in Information, Embedded and Communication
Systems (ICIIECS), pp. 1–5, Coimbatore (2017)
2. Xu, Z., Wang, Z.: A risk prediction model for type 2 diabetes based on weighted feature selection
of random forest and XGBoost ensemble classifier. In: 2019 Eleventh International Conference
on Advanced Computational Intelligence (ICACI), pp. 278–283, Guilin, China (2019)
3. Raj, R.S., Sanjay, D.S., Kusuma, M., Sampath, S.: Comparison of support vector machine
and Naïve Bayes classifiers for predicting diabetes. In: 2019 1st International Conference on
Advanced Technologies in Intelligent Control, Environment, Computing and Communication
Engineering (ICATIECE), pp. 41–45, Bangalore, India (2019)
4. Komi, M., Li, J., Zhai, Y., Zhang, X.: Application of data mining methods in diabetes prediction.
In: 2017 2nd International Conference on Image, Vision and Computing (ICIVC), pp. 1006–
1010, Chengdu (2017)
5. Mir, A., Dhage, S.N.: Diabetes disease prediction using machine learning on big data of health-
care. In: 2018 Fourth International Conference on Computing Communication Control and
Automation (ICCUBEA), pp. 1–6, Pune, India (2018)
6. Faruque, M.F., Asaduzzaman, Sarker, I.H.: Performance analysis of machine learning techniques
to predict diabetes Mellitus. In: 2019 International Conference on Electrical, Computer and
Communication Engineering (ECCE), pp. 1–4, Cox's Bazar, Bangladesh (2019)
7. Alam, T.M., Iqbal, M.A., Ali, Y., Wahab, A., Abbas, Z.: A model for early prediction of diabetes.
Inform. Med. Unlocked 16, 100204 (2019)
8. Zhu, C., Uwa Idemudia, C., Feng, W.: Improved logistic regression model for diabetes prediction
by integrating PCA and K-means techniques. Inform. Med. Unlocked 17 (2019)
Hybrid Model for Heart Disease
Prediction Using Random Forest
and Logistic Regression
Abstract Data mining is a method by which valuable data is mined from raw data.
Future outcomes are forecast from current information in prediction analysis. It
facilitates more useful, efficient, and economical management of health resources
through the recognition of risks, the prediction of disease in people, and the
prediction of the length of hospital admission. This research work deals with the
prediction of heart disease. Several steps are included in heart disease prediction;
preprocessing, feature selection, and classification are some of these steps. A
hybrid scheme based on Random Forest (RF) and Logistic Regression (LR) is introduced.
The features are selected using RF, and LR is implemented for classification. The
performance of the recommended model is analyzed in terms of accuracy, precision,
and recall. The accuracy obtained in predicting heart disease with this model is
evaluated at 95.08%.
1 Introduction
The use of data mining technology in the healthcare sector has revolutionized the
task of disease prediction. The role of this technology in heart disease prediction
is quite significant. At present, many data mining techniques are being used to
detect and extract valuable information from medical datasets with minimal user
input and effort. Over time, researchers have found several methods for implementing
data mining in the medical domain so that different types of heart diseases can
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 657
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_70
658 H. K. Sharma and A. L. Sangal
2 Literature Survey
Several factors are responsible for any sort of coronary illness. The Naive
Bayes (NB) algorithm is considered to structure the Smart Heart Disease
Prediction (SHDP) system. A precision of around 89% is shown by the proposed approach
[3].
To resolve heart disease prediction related issues, ensemble techniques are
used. An accuracy of 85.48% is achieved by the proposed technique [4].
To improve the accuracy of predicting cardiovascular diseases, a hybrid model of
random forest with a linear model was used; an improved performance accuracy level of
around 88.7% was achieved in this research [5].
To predict heart disease, this research focused on adapting the SVM and apriori
algorithms. The medical profiles based on various factors were collected and used
here. The patients that were more likely to get heart disease were predicted here [6].
For the medical fraternity and patients, the usage of appropriate technology
support proved to be highly beneficial. Data mining techniques could be used to
resolve such an issue. The accuracies of naive Bayes and decision tree were compared
in this research [7].
To identify the risk in a highly accurate manner, a heart disease prediction system
was proposed in this research. A new system was designed using data mining
techniques. Frequent pattern growth association mining was applied on the dataset of
patients to provide strong association rules. The data could be explored, and heart
disease could be predicted accurately by doctors using this proposed method [8].
See Fig. 1.
The Cleveland dataset has been widely used for the heart disease prediction. This
dataset has 14 attributes.
Data preprocessing is performed so that data mining techniques can be applied,
completeness can be introduced, and a meaningful analysis can be achieved on the data.
The performance of the training model is improved by providing clean, noise-free
data to the feature selection process.
3.4 Classification
To categorize the given features for disease prediction, the selected
features are mapped to the training model. Here, each separate class represents
a kind of heart disease. The logistic regression model is applied for classification,
taking the k-means output as input. In this research work, two
classes are defined, heart disease and no heart disease, that is, which
persons have a probability of heart disease and which do not.
A variety of models such as decision tree, Naive Bayes (NB), Multilayer Perceptron
(MLP), and an ensemble of Random Forest (RF), NB, and MLP are applied on the dataset.
The results of the above models are compared in terms of accuracy, precision, and
recall. It is analyzed that the accuracy of the proposed model is 95.08%, which is
the maximum compared to the other models for heart disease prediction. The dataset
is divided in a ratio of 60:40: 60% of the dataset is used for training, and the
remaining 40% is used for testing (Fig. 2; Tables 1 and 2).
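The hybrid RF-plus-LR scheme can be sketched as follows. This is our illustrative reconstruction with synthetic data standing in for the Cleveland set; the top-8 feature cut-off and all hyperparameters are assumptions, not the authors' settings:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 14-attribute Cleveland heart-disease data.
X, y = make_classification(n_samples=303, n_features=14, n_informative=6,
                           random_state=0)
# 60:40 train/test split, as described in the text.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=0)

# Step 1: rank features with Random Forest and keep the most important ones.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
keep = np.argsort(rf.feature_importances_)[-8:]  # top-8 cut-off is an assumption

# Step 2: classify with Logistic Regression on the selected features only.
lr = LogisticRegression(max_iter=1000).fit(X_tr[:, keep], y_tr)
pred = lr.predict(X_te[:, keep])
acc = accuracy_score(y_te, pred)
prec = precision_score(y_te, pred)
rec = recall_score(y_te, pred)
```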
Heart disease is a term that covers any disorder related to the heart. Problems
involving the blood vessels, circulatory system, and the heart are defined as
cardiovascular disease. It is analyzed in this work that heart disease prediction is
very challenging because of the large number of features involved. Various models are
tested for heart disease prediction, such as decision tree, naive Bayes, multilayer
perceptron, and an ensemble classifier. A novel model in which random forest and
logistic regression are integrated is introduced for the prediction. The selection of
features is performed using random forest, and logistic regression is carried out to
perform the classification.
The recall, accuracy, and precision obtained from the proposed model are computed
at about 95%. In future, the proposed model can be further improved using deep
learning methods.
References
1. Duff, F.L., Muntean, C., Cuggia, M., Mabo, P.: Predicting survival causes after out of hospital
cardiac arrest using data mining method. In: Medinfo, pp. 1256–1259 (2004)
2. Frawley, W.J., Piatetsky-Shapiro, G., Matheus, C.J.: Knowledge discovery in databases: an
overview. AI Mag. 13(3), 57–70 (1992)
3. Repaka, A.K., Ravikanti, S.D., Franklin, R.G.: Design and Implementing Heart Disease
Prediction Using Naives Bayesian. IEEE (2019)
4. Latha, C.B.C., Carolin Jeeva, S.: Improving the accuracy of prediction of heart disease risk based
on ensemble classification techniques. Inform. Med. Unlocked 16, 100203 (2019)
5. Mohan, S., Thirumalai, C., Srivastava, G.: Effective heart disease prediction using hybrid
machine learning techniques. IEEE Access 7, 81542–81554 (2019)
6. Sowmiya, C., Sumitra, P.: Analytical Study of Heart Disease Diagnosis Using Classification
Techniques. IEEE (2017)
7. Priyanka, N., Kumar, P.R.: Usage of data mining techniques in predicting the heart diseases—
Naïve Bayes and decision tree. In: 2017 International Conference on Circuit, Power and
Computing Technologies (ICCPCT), pp. 1–7, Kollam (2017)
8. Chauhan, A., Jain, A., Sharma, P., Deep, V.: Heart disease prediction using evolutionary
rule learning. In: 2018 4th International Conference on Computational Intelligence and
Communication Technology (CICT), pp. 1–4 (2018)
Detection of Android Malware Using
Machine Learning Techniques
Abstract With the increase in popularity of the internet and the android operating
system, the number of active internet users and their daily activity on android
devices is also increasing. That is the reason malware writers are targeting android
devices more and more. Rapidly evolving malware is a major issue, and there is a
requirement for the detection of android malware to secure the framework.
Signature-based technologies work efficiently for known malware but fail to detect
unknown or new malware. Academia is continuously working on machine learning and deep
learning techniques to detect the advanced malware seen in today's scenario. For
machine learning, the feature vector and a sufficient dataset are very important. In
this paper, we develop and implement an approach for the detection of unknown malware
with a high detection rate.
1 Introduction
In recent years, android has overtaken many other mobile operating systems to
become one of the most popular and versatile mobile platforms in the world. Inter-
national Data Corporation (IDC) shared a report on global market share [1] for the
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 663
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_71
664 S. Pandey et al.
smartphone operating system; this report shows that in the third quarter of 2018,
86.8% of the total market was held by the android operating system. Developers
prefer android over other smartphone operating systems for developing applications
because it is completely open source. On the other hand, users are inclined to opt for
android smartphones due to the availability of low- to high-end models, ease of use,
customization, a high level of multitasking, custom ROMs, support for a large number
of applications, etc.
Hybrid Analysis: It includes both static and dynamic approaches for malware
analysis. It first inspects the malware code by static examination, followed by a
dynamic analysis approach, to improve the completeness of the examination [7].
The purpose of detection methods is to study a program's behavior and verify
whether it is malicious or benign. Robust malware detection relies upon the capability
of handling obfuscated malware efficiently [2]. Two commonly used obfuscation
techniques in the generation of second-generation malware are polymorphism and
metamorphism. To battle threats and attacks from malware, antimalware software is
created, which primarily relies on the presumption that the formation of malware does
not change considerably. The following are the techniques for malware detection:
1. Signature-Based: The signature-based technique is an easy and effective way of
detecting known malware [8]. To combat threats and attacks from malware,
antivirus companies use signature-based techniques. Unique byte sequences are
extracted when the malware is classified, and these are used as a signature.
2. Heuristic-Based: In the heuristic-based detection technique, artificial
intelligence is used together with signature-based detection to enhance
efficiency [9].
3. Malware Normalization: In malware normalization, a normalizer recognizes
the obfuscated form of malware, undoes the obfuscation applied to the
program, and creates a normalized executable.
4. Machine Learning: In the last couple of years, machine learning techniques have
been gaining popularity for malware detection. Tom Mitchell describes machine
learning as the study of computer algorithms that improve automatically through
experience [10].
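The byte-sequence matching behind the signature-based technique (item 1 above) can be illustrated with a toy scanner; the signature bytes and malware names below are entirely hypothetical, not real signatures:

```python
# Toy illustration of signature-based detection: scan a binary blob for known
# byte sequences previously extracted from classified malware samples.
SIGNATURES = {
    b"\xde\xad\xbe\xef\x13\x37": "Example.TrojanA",   # hypothetical signatures
    b"\x90\x90\x90\xcc\xcc\xcc": "Example.DropperB",
}

def scan(blob: bytes):
    """Return the names of all known signatures found in the given byte blob."""
    return [name for sig, name in SIGNATURES.items() if sig in blob]

benign = b"\x00" * 64
infected = b"\x00" * 16 + b"\xde\xad\xbe\xef\x13\x37" + b"\x00" * 16
print(scan(benign))    # []
print(scan(infected))  # ['Example.TrojanA']
```

As the surrounding text notes, this approach detects only malware whose signature is already in the database, which is why unknown samples slip through.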
2 Background Study
Allix et al. [12] introduced a novel approach in 2014 to extract the control flow
graph from the application program, which is a more expressive representation than
n-grams. The authors used a sizeable dataset (over 50,000) of android applications and
evaluated machine learning classifiers, viz. random forest, J48, LibSVM,
and JRip, using ten-fold cross-validation.
Feizollah et al. [13] showed the effectiveness of explicit and implicit Intents
for android malware detection. The evaluation was done on 5560 malware
samples and 1846 benign samples. They achieved 91% accuracy by utilizing android
Intents, 83% utilizing android permissions, and by combining the two, they obtained
a detection rate of 95.5% [13].
Sun et al. [14] introduced SigPID, which is based on permission analysis,
to detect malicious applications. The authors extracted 135 permissions from the
dataset but used only 34 permissions (25% of the total) to differentiate
between malicious and benign applications [6]. They used a support vector
machine (SVM) for model training and declared an accuracy of 93.62% within the
dataset and 91.4% for unknown malware [14].
Tao et al. [15] studied hidden patterns of malware in real-world android
applications. The authors extracted sensitive APIs that are utilized in
malware and implemented an automatic malware recognition system to
detect unknown android malware [15]. They conducted a comprehensive study
using 31,185 benign and 15,336 malware samples and obtained an F1 score of 98.24%.
Rashidi et al. [9] introduced an android resource utilization risk assessment system
called XDroid. They utilized the Drebin malware dataset and affirmed that their
methodology could achieve up to 82% precision [9].
Zhu et al. [16] proposed a highly efficient and low-cost approach to extract
permissions and sensitive APIs as features and used an ensemble rotation forest for
model training. The authors used 2130 samples to train the model and got 88.26%
detection accuracy with 88.40% sensitivity at a precision of 88.16%. Opcodes play an
important role in malware detection; Sanjay et al. [17] used opcode frequency for
malware detection in their approach. They used the Fisher score feature selection
algorithm for relevant feature selection. The authors used several classifiers
available in the Weka machine learning tool and got almost 100% detection accuracy.
Recently, Ashu et al. [17] examined five classifiers on the Drebin dataset using
opcode occurrence as a feature and got an accuracy of 79.27% with a functional
tree classifier for malicious application detection.
Sahin et al. [18] proposed a permission-based android malware framework to
recognize malicious applications. In contrast to other investigations, the authors
proposed a permission-weight approach. K-nearest neighbor (KNN)
and Naïve Bayes (NB) algorithms were then used, achieving 90.76% accuracy. As
indicated by the authors, the proposed approach yields better outcomes than the
others.
3 Proposed Methodology
In this section, we explain our methodology for the detection of malicious android
applications utilizing machine learning. Figure 1 displays the structure of our
offered strategy, in which we perform the following steps:
• Dataset collection
• Feature extraction
• Feature selection
• Classification of malware and benign
We collect malicious and benign APKs for android applications from AndroZoo [7],
which is a growing repository of android applications. AndroZoo contains
applications collected from various sources, including the Google Play store
marketplace. The dataset we use for the analysis contains 15,000 malware and 15,000
benign android application packages (APKs). We also check the secure hash algorithm
(SHA) value of the applications to ensure unique samples for analysis.
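The uniqueness check can be sketched as follows, assuming SHA-256 digests (the text says only "SHA", so the exact variant is our assumption, and the byte strings stand in for real APK contents):

```python
import hashlib

def dedupe_by_sha256(apk_blobs):
    """Keep only one copy of each unique APK, keyed by its SHA-256 digest."""
    seen, unique = set(), []
    for blob in apk_blobs:
        digest = hashlib.sha256(blob).hexdigest()
        if digest not in seen:  # first time we see this exact file content
            seen.add(digest)
            unique.append(blob)
    return unique

samples = [b"apk-A", b"apk-B", b"apk-A"]  # third sample is a duplicate
print(len(dedupe_by_sha256(samples)))  # 2
```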
In this phase, we extract the features from the dataset that we have collected. In
this work, we have performed static analysis of android applications to identify
malicious applications. For the extraction of features, we use the Apktool and
Androguard [19] reverse engineering tools. We extract seven categories of features,
namely requested android open source project (AOSP) permissions, requested
third-party permissions, providers, activities, receivers, services, and opcodes. We
extract all categories of features based on their occurrence in the application.
During the initial stage of feature extraction, we extract a huge number of features
in each category. Then, we first filter the features with the top frequency of
occurrence in the dataset. Figures 2, 3, 4, 5, 6, 7 and 8 show the top 20 most
frequent features in our dataset for each category. The information extracted as
features is as follows:
1. Permissions: Permissions are used to protect the privacy of an android user, and
a few applications also need permission to access users' sensitive data like short
message service (SMS), contacts, etc. Some applications also request third-party
permissions which are not mentioned in the android open-source project [20]. The
combination of permissions sometimes reflects malicious behavior. Therefore,
we extract two types of permission as features:
• AOSP and third-party permissions. Figure 2 shows a comparison of the top 20
android open-source project permissions in malware and benign applications.
Figure 3 shows a comparison of the top 20 third-party permissions in malware
and benign applications.
During the feature extraction phase, we extract a total of 696 features. However, we
understand that the efficiency of the machine learning model decreases with a
large number of features, which also increases the time to train and test the model.
We apply information gain and the correlation coefficient as feature selection
algorithms for the reduction of irrelevant features. During the literature review, we
found that researchers widely use these feature reduction algorithms [21, 22]. During
the feature selection process, we remove those features which either do not
contribute to enhancing the model performance or make the performance of the model
worse. In the case of correlation-coefficient feature selection, we select 180
features whose ranking score is greater than 0.1189, and in the case of information
gain, 106 features are selected whose score is greater than 0.103665.
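The two selection passes can be sketched with scikit-learn's mutual information (a common stand-in for information gain) and Pearson correlation with the label. The synthetic data and the thresholds below are illustrative only; they are not the paper's 0.103665/0.1189 cut-offs applied to its real feature matrix:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

# Synthetic stand-in for the 696-column feature matrix (50 columns here).
X, y = make_classification(n_samples=400, n_features=50, n_informative=8,
                           random_state=0)

# Information-gain-style pass: score each feature, keep those above a threshold.
ig = mutual_info_classif(X, y, random_state=0)
ig_keep = np.where(ig > 0.01)[0]          # threshold is illustrative

# Correlation pass: absolute Pearson correlation of each feature with the label.
corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
corr_keep = np.where(corr > 0.1)[0]       # threshold is illustrative
```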
3.4 Classification
This section discusses the machine learning classifiers which are utilized in our
work. For the classification, we use four different supervised machine learning
classifiers, namely random forest [4], eXtreme Gradient Boosting (XGBoost) [23],
decision tree [24], and k-nearest neighbors (KNN) [25]. These ML classifiers are
widely used in the malware detection domain, and one reason to select tree-based
classifiers is that they are very robust [20, 26–29]. They perform well on a
large variety of problems and capture dependencies in ways linear models
cannot. We split the dataset in a 70:30 ratio for training and testing the models,
respectively. We also perform parameter tuning while training and testing for
higher classifier performance. To analyze the performance of our models,
we execute experiments using ten-fold cross-validation.
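The evaluation protocol can be sketched as below with synthetic data; XGBoost is omitted so the sketch needs only scikit-learn, and all classifier settings are defaults rather than the tuned parameters the text mentions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the extracted feature matrix.
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           random_state=0)

# Ten-fold cross-validation for each classifier, as described in the text.
classifiers = {
    "random forest": RandomForestClassifier(random_state=0),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(),
}
results = {name: cross_val_score(clf, X, y, cv=10).mean()
           for name, clf in classifiers.items()}
```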
4 Experimental Results
Firstly, we applied ten-fold cross-validation on data set with a single feature vector,
i.e. (opcodes, receivers, providers, etc.), after that on combined features. Finally, we
applied ten-fold validation on the dataset with selected features using information
gain and correlation. Table 1 summarized the accuracy of each classifier with a
single feature vector, without feature selection, and finally with selected features.
In the case of random forest using the information gain feature selection algorithm
with combined features, we got the maximum accuracy, i.e., 99.19%.
References
4. Dogru, N., Subasi, A.: Traffic accident detection using random forest classifier. In: 2018 15th
Learning and Technology Conference (L&T), pp. 40–45. IEEE (2018)
5. Sharma, S., Krishna, C.R., Sahay, S.K.: Detection of advanced malware by machine learning
techniques. In: Soft Computing: Theories and Applications, pp. 333–342. Springer (2019)
6. Shabtai, A., Moskovitch, R., Elovici, Y., Glezer, C.: Detection of malicious code by applying
machine learning classifiers on static features: a state-of-the-art survey. Inf. Secur. Tech. Rep.
14(1), 16–29 (2009)
7. Allix, K., Bissyandé, T.F., Klein, J., Le Traon, Y.: Androzoo: collecting millions of android
apps for the research community. In: Proceedings of the 13th International Conference on
Mining Software Repositories, pp. 468–471. MSR '16, ACM, New York, NY, USA (2016).
https://doi.org/10.1145/2901739.2903508
8. Griffin, K., Schneider, S., Hu, X., Chiueh, T.C.: Automatic generation of string signatures for
malware detection. In: International Workshop on Recent Advances in Intrusion Detection,
pp. 101–120. Springer (2009)
9. Rashidi, B., Fung, C., Bertino, E.: Android resource usage risk assessment using hidden Markov
model and online learning. Comput. Secur. 65, 90–107 (2017)
10. Dietterich, T.G.: Machine learning in ecosystem informatics and sustainability. In: Twenty-First
International Joint Conference on Artificial Intelligence (2009)
11. Schultz, M.G., Eskin, E., Zadok, F., Stolfo, S.J.: Data mining methods for detection of new
malicious executables. In: Proceedings 2001 IEEE Symposium on Security and Privacy. S&P
2001, pp. 38–49. IEEE (2000)
12. Allix, K., Bissyandé, T.F., Jérome, Q., Klein, J., State, R., Le Traon, Y.: Large-scale machine
learning-based malware detection: confronting the “10-fold cross validation” scheme with
reality. In: Proceedings of the 4th ACM Conference on Data and Application Security and
Privacy, pp. 163–166 (2014)
13. Narudin, F.A., Feizollah, A., Anuar, N.B., Gani, A.: Evaluation of machine learning classifiers
for mobile malware detection. Soft. Comput. 20(1), 343–357 (2016)
14. Li, J., Sun, L., Yan, Q., Li, Z., Srisa-an, W., Ye, H.: Significant permission identification for
machine-learning-based android malware detection. IEEE Trans. Industr. Inf. 14(7), 3216–3225
(2018)
15. Tao, G., Zheng, Z., Guo, Z., Lyu, M.R.: Malpat: mining patterns of malicious and benign
android apps via permission-related Apis. IEEE Trans. Reliab. 67(1), 355–369 (2017)
16. Zhu, H.J., You, Z.H., Zhu, Z.X., Shi, W.L., Chen, X., Cheng, L.: Droiddet: effective and
robust detection of android malware using static analysis along with rotation forest model.
Neurocomputing 272, 638–646 (2018)
17. Sharma, A., Sahay, S.K.: An investigation of the classifiers to detect android malicious apps.
In: Information and Communication Technology, pp. 207–217. Springer (2018)
18. Şahin, D.Ö., Kural, O.E., Akleylek, S., Kiliç, E.: New results on permission based static analysis
for android malware. In: 2018 6th International Symposium on Digital Forensic and Security
(ISDFS), pp. 1–4. IEEE (2018)
19. Androguard. https://androguard.readthedocs.io/en/latest/ (Dec, 2019)
20. Milosevic, N., Dehghantanha, A., Choo, K.K.R.: Machine learning aided android malware
classification. Comput. Electr. Eng. 61, 266–274 (2017)
21. Jimenez, J.H., Goseva-Popstojanova, K.: Malware detection using power consumption and
network traffic data. In: 2019 2nd International Conference on Data Intelligence and Security
(ICDIS), pp. 53–59. IEEE (2019)
22. Zhang, Z., Chang, C., Han, P., Zhang, H.: Packed malware variants detection using deep belief
networks. MATEC Web Conf. 309, 02002 (2020)
23. Zhang, Y., Huang, Q., Ma, X., Yang, Z., Jiang, J.: Using multi-features and ensemble learning
method for imbalanced malware classification. In: 2016 IEEE Trustcom/BigDataSE/ISPA,
pp. 965–973. IEEE (2016)
24. Gunnarsdottir, K.M., Gamaldo, C.E., Salas, R.M., Ewen, J.B., Allen, R.P., Sarma, S.V.: A
novel sleep stage scoring system: combining expert-based rules with a decision tree classifier.
In: 2018 40th Annual International Conference of the IEEE Engineering in Medicine and
Biology Society (EMBC), pp. 3240–3243. IEEE (2018)
25. Cunningham, P., Delany, S.J.: k-nearest neighbour classifiers. Multiple Classifier Syst. 34(8),
1–17 (2007)
26. Alam, M.S., Vuong, S.T.: Random forest classification for detecting android malware. In: 2013
IEEE International Conference on Green Computing and Communications and IEEE Internet
of Things and IEEE Cyber, Physical and Social Computing, pp. 663–669. IEEE (2013)
27. Firdausi, I., Erwin, A., Nugroho, A.S., et al.: Analysis of machine learning techniques used in
behavior-based malware detection. In: 2010 Second International Conference on Advances in
Computing, Control, and Telecommunication Technologies, pp. 201–203. IEEE (2010)
28. Kruczkowski, M., Niewiadomska-Szynkiewicz, E.: Comparative study of supervised learning
methods for malware analysis. J. Telecommun. Inf. Technol. (2014)
29. Wang, J., Li, B., Zeng, Y.: Xgboost-based android malware detection. In: 2017 13th Inter-
national Conference on Computational Intelligence and Security (CIS), pp. 268–272. IEEE
(2017)
The Predictive Genetic Algorithm (GA)
Load Management Mechanism
for Artificial Intelligence System
Implementation (AI)
Abstract The next generation of cloud infrastructure will make the network more
versatile and use resources more effectively. Load balancing is one of the key issues
of cloud computing; it distributes tasks over many nodes to ensure that no single
node becomes overwhelmed while others are underused. For optimal performance of
cloud-dependent applications, which are used almost every day, the user must be able
to guarantee that all criteria are fulfilled in a limited span of time. A genetic
algorithm (GA) approach for cloud load balancing is given in this article. The
urgency of a request is considered at population initialization time. The emphasis
is on modeling the environment in question; for real-life situations, systems have
other targets with which our algorithm must be combined. The suggested method is
modeled using a cloud analyst tool, which makes a simulation of cloud infrastructure
feasible. The end result reveals the viability of a quantitative workload management
approach that helps manage working loads with improved use of computational
resources. This article offers a new approach to genetic algorithm (GA) load control.
To minimize the difficulty of a single task, the algorithm handles the cloud storage
load. The proposed load balancing strategy was evaluated by a program-analyst model.
The findings of simulations for a typical sample system show that the suggested
algorithm exceeded existing methods like first come, first served (FCFS), round
robin (RR), and the stochastic hill climbing (SHC) local search algorithm.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 677
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_72
678 T. Pushpatha and S. Nagaprasad
1 Introduction
2 Literature Survey
For VMs in cloud storage, a common load balancing technique was adopted. This
requires knowledge of regional policies to make load management decisions. Normal
performance is similarly improved by load balancing, but fault reduction is not
taken into account. A card approach that involves load balancing and a
distributed rate control system has been introduced by Hamilton et al. and acts as an
integrated tool for cloud management. Brazilian et al.
$Y_i = f(X_i, \beta) + e_i$ (1)
$f(X_i, \beta) = \beta_0 + \beta_1 X_i$ (2)
For data centers with automated cloud virtualization and computing, a vector dot
methodology has been implemented. The dot product is used to distinguish nodes
by utility requirements. The algorithm illustrated aims to resolve the load balance
problem for resource delivery.
$Y_i = \beta_0 + \beta_1 X_i + e_i$ (3)
$\sum_i \left( Y_i - f(X_i, \beta) \right)^2$ (4)
Nevertheless, the approach does not tackle cost reduction, that is, the
expense of load allocation, which may take longer than the actual measurement
time (Fig. 1).
Other work aims to mitigate storage costs and benefit from decreased data
transmission. However, in order to optimize the distribution and migration of data
using a linear algorithm, such algorithms require simultaneous applications
for data processing and migration, implementing master-slave load balancing.
$\sum_i e_i^2 = \sum_i \left( Y_i - (\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}) \right)^2 = 0$ (5)
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + e_i$ (6)
$\sum_i e_i^2 = \sum_i \left( Y_i - (\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}) \right)^2 = 0$ (7)
Nevertheless, this method addresses only static load balancing: the Lagrange
multiplier is estimated for the transmitted weight, giving an efficient functional
weight balance converting algorithm in Euclidean form (Fig. 2).
The development of a hybrid grid and cloud infrastructure [8] reduces the
operating system's runtime and overall management overhead.
$\beta_1 = \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$ (8)
$y_i = \beta_0 + \beta_1 x_i$ (9)
$\sigma_{\beta_0} = \sigma_\varepsilon \sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}} = \sigma_{\beta_1} \sqrt{\dfrac{\sum_i x_i^2}{n}}$ (10)
This approach addresses the expenditure and the time period of a query and yields
better outcomes in a shorter time. Similar types of topics have been considered.
Distributed ML typically uses the BSP model, as in, for example, Spark. The BSP
calculation method contains a set of T supersteps divided by synchronization
barriers. A superstep defines a series of operations between two synchronization
points. In each superstep, all computation nodes conduct iterative calculations
simultaneously.
$\sum_{i=1}^{n} \sum_{k=1}^{p} x_{ij} x_{ik} \beta_k = \sum_{i=1}^{n} x_{ij} y_i, \quad j = 1, \ldots, p$ (11)
$X^\top X \beta = X^\top Y$ (12)
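Equations (11) and (12) are the ordinary least-squares normal equations; they can be checked numerically with a small NumPy sketch (the synthetic data and coefficient values are our own placeholder choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.5, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)  # linear model plus small noise

# Solve the normal equations X^T X beta = X^T y for the OLS estimate.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

With noise this small, `beta_hat` recovers the generating coefficients closely.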
Each node then enters a synchronization barrier and waits. The parameters are
updated and the regional parameters are passed to all computation nodes once all
the computation nodes have finished their calculations and the actions are agreed
upon.
$\rho(y \mid X, \beta, \sigma^2) \propto (\sigma^2)^{-n/2} \exp\left( -\dfrac{1}{2\sigma^2} (y - X\beta)^\top (y - X\beta) \right)$ (14)
All computation nodes then travel together through the next superstep once the
synchronization limits are released. This sync function allows the parallel ML
algorithm under the BSP model to be serialized, ensuring that changes in parameters
are globally consistent and that execution of the algorithm is precise.
If there is an unbalanced cluster load, the efficacy of the BSP system model will
decrease considerably. For example, the Ho analysis shows that, if the LDA model is
run on 32 BSP devices, the synchronization barrier overhead is six times that of the
iterations [9]. In some cases this removes the straggler problem. The styling
threshold of DSP is low, but the cluster load balance is implied. The straggler
problem cannot
The Predictive Genetic Algorithm (GA) Load Management … 683
be solved because when the system nodes are added, the DSP does not fully fit the
cluster charge. The following threshold limits analyses are provided.
As all computation nodes are updated synchronously, the iteration count of each node can be calculated from the output model, and the controller obtains the performance of each node via the Ganglia machine-monitoring unit. A-DSP provides a mechanism for adjusting the load balance on top of DSP.
The main components of the parameter method are the centralized management framework, the performance-assurance unit, the centralized synchronization controller, and the task-redistribution framework. A standardized parameter-control architecture is applied to the global model parameters (Fig. 3).
More computation is delegated to stronger nodes by converting the estimated iteration count of each node into the actual iteration period between nodes, effectively balancing the cluster load and the training configuration. The transformation gives slower nodes less work and faster nodes more of the measurement.
\[ \rho(\boldsymbol{\beta}, \sigma^2) = \rho(\sigma^2)\,\rho(\boldsymbol{\beta}\mid\sigma^2), \tag{17} \]
\[ \rho(\sigma^2) \propto (\sigma^2)^{-\frac{v_0}{2}-1} \exp\!\left(-\frac{v_0 s_0^2}{2\sigma^2}\right). \tag{18} \]
\[ \rho(\boldsymbol{\beta}\mid\sigma^2) \propto (\sigma^2)^{-\frac{k}{2}} \exp\!\left(-\frac{1}{2\sigma^2}(\boldsymbol{\beta}-\boldsymbol{\mu}_0)^{T}\boldsymbol{\Lambda}_0(\boldsymbol{\beta}-\boldsymbol{\mu}_0)\right). \tag{19} \]
The workload array records the number of training iterations computed for the next iteration of the algorithm on each node. At the site level, three machines with one system and its database transfers are included, communicating only through the flow links between them. The case where not all worker pairs communicate is represented in the following traffic matrix (Fig. 4).
The job is passed to the dependency manager for an unbiased evaluation of the operation. The manager receives the job and tests whether it is completely independent or involves multiple jobs.
\[ \boldsymbol{\mu}_n = (\mathbf{X}^{T}\mathbf{X} + \boldsymbol{\Lambda}_0)^{-1}\left(\mathbf{X}^{T}\mathbf{X}\,\hat{\boldsymbol{\beta}} + \boldsymbol{\Lambda}_0\boldsymbol{\mu}_0\right). \tag{20} \]
If several activities are involved, the manager explores the relations between them, taking the independent job queue and the dependency job queue into consideration. The tasks are then directed to the scheduler, which schedules the child tasks one after another.
The dependency job list contains tasks that depend on other VM tasks. Once all the child tasks in this set are completed, the parent task is delegated to a VM, while the independent queue holds the remaining tasks. The scheduler thus maintains a separate work queue and a dependency queue.
\[ \rho(\boldsymbol{\beta}, \sigma^2 \mid \mathbf{y}, \mathbf{X}) \propto \rho(\boldsymbol{\beta}\mid\sigma^2, \mathbf{y}, \mathbf{X})\,\rho(\sigma^2\mid\mathbf{y}, \mathbf{X}), \tag{21} \]
\[ \boldsymbol{\Lambda}_n = \mathbf{X}^{T}\mathbf{X} + \boldsymbol{\Lambda}_0, \quad \boldsymbol{\mu}_n = \boldsymbol{\Lambda}_n^{-1}\left(\mathbf{X}^{T}\mathbf{X}\,\hat{\boldsymbol{\beta}} + \boldsymbol{\Lambda}_0\boldsymbol{\mu}_0\right), \tag{22} \]
The scheduler selects the appropriate machine based on the IWRR algorithm, gathering resource details from the resource planner.
\[ a_n = a_0 + \frac{n}{2}, \quad b_n = b_0 + \frac{1}{2}\left(\mathbf{y}^{T}\mathbf{y} + \boldsymbol{\mu}_0^{T}\boldsymbol{\Lambda}_0\boldsymbol{\mu}_0 - \boldsymbol{\mu}_n^{T}\boldsymbol{\Lambda}_n\boldsymbol{\mu}_n\right). \tag{23} \]
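Equations (20)–(23) form the standard conjugate update for Bayesian linear regression; the following is a minimal NumPy sketch of that update (toy data and illustrative prior values):

```python
import numpy as np

def posterior_update(X, y, mu0, Lam0, a0, b0):
    """Conjugate Normal-inverse-Gamma update, per Eqs. (20)-(23)."""
    XtX = X.T @ X
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]             # OLS estimate
    Lam_n = XtX + Lam0                                          # Eq. (22)
    mu_n = np.linalg.solve(Lam_n, XtX @ beta_hat + Lam0 @ mu0)  # Eqs. (20), (22)
    a_n = a0 + len(y) / 2.0                                     # Eq. (23)
    b_n = b0 + 0.5 * (y @ y + mu0 @ Lam0 @ mu0 - mu_n @ Lam_n @ mu_n)
    return mu_n, Lam_n, a_n, b_n

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=50)
mu_n, Lam_n, a_n, b_n = posterior_update(
    X, y, mu0=np.zeros(2), Lam0=np.eye(2), a0=1.0, b0=1.0)
print(np.round(mu_n, 1))   # posterior mean close to the true coefficients
```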
It tests the processing power of the VMs and then uses the suggested algorithm to
determine the right VM for the particular task. Every VM provides comprehensive
details on the task execution list, task split list, and job custody.
The load balancer checks the ratio of assigned jobs to each VM's capacity. The load on each VM is calculated from the VM's job execution list; if the ratio is less than 1, the scheduler marks that VM as a candidate for the task.
\[ b_n = b_0 + \frac{1}{2}\left(\mathbf{y}^{T}\mathbf{y} + \boldsymbol{\mu}_0^{T}\boldsymbol{\Lambda}_0\boldsymbol{\mu}_0 - \boldsymbol{\mu}_n^{T}\boldsymbol{\Lambda}_n\boldsymbol{\mu}_n\right). \tag{24} \]
\[ p(\mathbf{y}\mid m) = \iint p(\mathbf{y}\mid\mathbf{X}, \boldsymbol{\beta}, \sigma)\, p(\boldsymbol{\beta}, \sigma)\, d\boldsymbol{\beta}\, d\sigma \tag{25} \]
\[ p(\mathbf{y}\mid m) = \frac{1}{(2\pi)^{n/2}} \cdot \sqrt{\frac{\det(\boldsymbol{\Lambda}_0)}{\det(\boldsymbol{\Lambda}_n)}} \cdot \frac{b_0^{a_0}}{b_n^{a_n}} \cdot \frac{\Gamma(a_n)}{\Gamma(a_0)} \tag{26} \]
\[ p(\mathbf{y}\mid m) = \frac{p(\boldsymbol{\beta}, \sigma\mid m)\, p(\mathbf{y}\mid\mathbf{X}, \boldsymbol{\beta}, \sigma, m)}{p(\boldsymbol{\beta}, \sigma\mid\mathbf{y}, \mathbf{X}, m)} \tag{27} \]
The least-used VM is allocated when its utilization falls below 20 percent; the scheduler is informed of the right VM for the job, and the job is assigned to that machine.
\[ P(Z\mid X) = \frac{P(X\mid Z)P(Z)}{P(X)} = \frac{P(X\mid Z)P(Z)}{\int_{Z} P(X, Z)\, dZ} \tag{28} \]
The configured data centers contain hosts and VMs with their corresponding resources. These resources are checked for idleness and heavy load so that worker demands can be moved efficiently to an acceptable location.
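A minimal sketch of the VM-selection rule described above (load ratio derived from the job execution list, with the 20% idle threshold); the class layout and names are illustrative assumptions, not the paper's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class VM:
    name: str
    capacity: int                               # jobs the VM can run concurrently
    jobs: list = field(default_factory=list)    # the VM's job execution list

    @property
    def load_ratio(self):
        return len(self.jobs) / self.capacity

def select_vm(vms):
    """Prefer the least-used VM when its utilization is below 20%;
    otherwise pick the least-loaded VM whose load ratio is still below 1."""
    least = min(vms, key=lambda v: v.load_ratio)
    if least.load_ratio < 0.20:
        return least
    candidates = [v for v in vms if v.load_ratio < 1.0]
    return min(candidates, key=lambda v: v.load_ratio) if candidates else None

vms = [VM("vm1", 10, ["j"] * 9), VM("vm2", 10, ["j"] * 1), VM("vm3", 5, ["j"] * 5)]
chosen = select_vm(vms)
print(chosen.name)   # vm2: 10% utilized, below the 20% threshold
```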
5 Results
The following ordering takes the computing power of the heterogeneous VMs into account, from highest to lowest efficiency. For homogeneous workloads in heterogeneous settings, more work is allocated to the higher-capacity VMs.
\[ D_{\mathrm{KL}}(Q \,\|\, P) = \sum_{Z} Q(Z)\left[\log Q(Z) - \log P(Z, X)\right] + \log P(X) \tag{29} \]
The WRR considers the ratio of each VM's capacity to the overall VM resources and assigns a proportionate amount of work:
\[ D_{\mathrm{KL}}(Q \,\|\, P) = \mathbb{E}_{Z}\left[\log Q(Z) - \log P(Z, X)\right] + \log P(X) \tag{30} \]
The next step checks whether the least-loaded VM can complete all of the work currently on the most heavily loaded worker in the shortest possible period. Based on the previous equation, the lightly loaded VMs are allocated the long tasks, so their execution period is deferred (Fig. 5).
The scheduler then checks the estimated completion time for each of the loaded VMs and compares the estimated period for each VM with the actual completion time of the set. Consequently, the VM with the least expected completion time is identified from the above measurements, and the task is then allocated to this VM. At the end of each task, the IWRR rebalances the load according to the work time. This method is also ideal for heterogeneous data centers (Fig. 6).
Stable diurnal task weights are produced by the Cicada prediction algorithm; nearly all weights are produced by the algorithm 24 h in advance (Fig. 7).
The following modules are used. The data storage module stores each computation node's training subsets, and each node reads the existing workload distribution. The goal of this paper is to adapt to the dynamic workload distribution (Fig. 8).
Figure 9 shows the output of the Cicada estimation algorithm versus its history. As the history grows, the amount of past data to be considered increases. For all but one case, Cicada requires fewer than 10 ms per prediction (Fig. 9).
The SSP model allows multiple training iterations to be performed by each node before the slowest node completes its current iteration. The global model parameters are then synchronized in order to adjust the local model parameters (Fig. 10).
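The SSP behavior described above, where fast workers may run ahead of the slowest one by a bounded number of iterations before synchronizing, can be sketched as follows; the staleness bound of 3 is an illustrative assumption:

```python
def may_proceed(worker_clock, all_clocks, staleness=3):
    """Stale synchronous parallel rule: a worker may continue iterating
    only while it is at most `staleness` iterations ahead of the slowest."""
    return worker_clock - min(all_clocks) <= staleness

clocks = [7, 5, 4, 9]            # iteration counters of four workers
print(may_proceed(9, clocks))    # worker at 9 is 5 ahead of the slowest (4): blocked
print(may_proceed(7, clocks))    # 3 ahead: allowed to proceed
```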
This lowers the frequency of synchronization and reduces the cost of the SSP model. At the same time, if the number of tasks exceeds the number of processing elements (PEs), a free CPU/PE may only execute one operation at any time. The WRR and RR algorithms perform more job migrations, and this number of migrations remains significant even with a smaller number of resources (Fig. 11).
Figure 12 displays this type of reservation. Rather, a virtual oversubscribed cluster (VOC) model is well suited when the authors recognize that there are no application traffic trends in this scenario [37, 61]. This model forms groups with oversubscribed virtual machine interaction, as seen in Figs. 3, 4 and 5b. The VOC model needs two additional parameters (Fig. 12).
The estimate shows that the SLA violation improvement is less than 0.286. Figure 6 shows the effects of the JCR with and without SVM: the red line shows the JCR with SVM, while the blue line reflects the JCR without SVM. The figure shows that the JCR increases to above 0.538 (Fig. 13).
Fig. 12 Error
Improvements range from 55 to 75%. The average increase is 11–18% in total and 25–26% in the number of applications that have been modified by Cicada. These figures are close to the published numbers.
The improved weighted round robin algorithm helps assign jobs to the most suitable VMs. Across the three different situations of the environment cycle, there are three distinct phases. Initial placement uses the improved weighted round robin algorithm to assign job requirements to participating VMs, based on the capabilities of each VM and the required working time. The dynamic scheduler tracks the load and completion time of all configured VMs. The minimum completion time for a given task is then determined for one of the VMs based on the above calculations. The weighted round robin balancing runs at the end of each round; when complete, the load is spread consistently across the participating VMs, minimizing idle periods. Performance analysis and tests with this algorithm show that the improved weighted round robin algorithm handles heterogeneous work on heterogeneous devices better than the plain round robin and weighted round robin algorithms. The algorithm treats response time as the key QoS parameter.
References
1. Simar, P.S., Anju, S., Rajesh, K.: Analysis of load balancing algorithms using cloud analyst.
Int. J. Grid Distrib. Comput. 9(9), 11–24 (2016)
2. Maguluri, S.T., Srikant, R., Ying, L.: Stochastic models of load balancing and scheduling in
cloud computing clusters. In: INFOCOM Proceedings IEEE, pp. 702–710 (2012)
3. Desyatirikova, E.N., Kuripta, O.V.: Quality management in IT service management based on
statistical aggregation and decomposition approach. In: 2017 International Conference “Quality
Management, Transport and Information Security, Information Technologies” (IT&QM&IS),
pp. 500–505. https://doi.org/10.1109/ITMQIS.2017.8085871
4. Cheng, D., Rao, J., Guo, Y., Jiang, C., Zhou, X.: Improving performance of heterogeneous map
reduce clusters with adaptive task tuning. IEEE Trans. Parallel Distrib. Syst. 28(3), 774–786
(2016)
5. Chiang, M.L., Luo, J.A., Lin, C.B.: High-reliable dispatching mechanisms for tasks in
cloud computing. In: BAI2013 International Conference on Business and Information, Bali,
Indonesia, p. 73, 7–9 July 2013
6. Mohapatra, S., Smruti Rekha, K., Mohanty, S.: A comparison of four popular heuristics for load balancing of virtual machines in cloud computing
7. Kundu, S., Rangaswami, R., Dutta, K., Zhao, M.: Application performance modeling in a virtualized environment. In: Proceedings of IEEE HPCA, Jan 2010
8. Chiang, M.-L., Hsieh, H.-C., Tsai, W.-C., Ke, M.-C.: An improved task scheduling and load
balancing algorithm under the heterogeneous cloud computing network. In: 2017 IEEE 8th
International Conference on Awareness Science and Technology (iCAST). https://doi.org/10.
1109/icawst.2017.8256465
9. von Laszewski, G., Wang, L., Younge, A.J., He, X.: Power-aware scheduling of virtual machines
in DVFS-enabled clusters. In: IEEE International Conference on Cluster Computing and
Workshops, New Orleans, LA, pp. 1–10 (2009)
10. Kaneria, O., Banyal, R.K.: Analysis and improvement of load balancing in cloud computing.
In: International Conference on ICT in Business Industry and Government (ICTBIG), Jan 2016
11. Ajila, S.A., Bankole, A.A.: Cloud client prediction models using machine learning techniques.
In: 37th Annual International Computer Software and Applications Conference, Kyoto, Japan
(2013)
12. Lyu, H., Li, P., Yan, R., Luo, Y.: Load forecast of resource scheduler in cloud architecture. In:
2016 International Conference on Progress in Informatics and Computing (PIC)
13. Shakir, M.S., Razzaque, A.: Performance comparison of load balancing algorithms using cloud
analyst in cloud computing. In: 2017 IEEE 8th Annual Ubiquitous Computing, Electronics
and Mobile Communication Conference (UEMCON). https://doi.org/10.1109/uemcon.2017.
8249108
14. Kumar, M., Sharma, S.C.: Dynamic load balancing algorithm for balancing the workload
among virtual machine in cloud computing. In: 7th International Conference on Advances in
Computing and Communications, ICACC-2017, 22–24 Aug 2017, Cochin, India
15. Volkova, V.N., Chemenkaya, L.V., Desyatirikova, E.N., Hajali, M., Khodar, A., Osama, A.:
Load balancing in cloud computing. In: 2018 IEEE Conference of Russian Young Researchers
in Electrical and Electronic Engineering (EIConRus). https://doi.org/10.1109/eiconrus.2018.
8317113
16. Wang, Y., Ren, Z., Zhang, H., Hou, X., Xiao, Y.: “Combat Cloud-Fog” network architecture for internet of battlefield things and load balancing technology. In: 2018 IEEE International Conference on Smart Internet of Things (SmartIoT). https://doi.org/10.1109/smartiot.2018.00054
17. Li, J., Qiu, M., Niu, J.-W., Chen, Y., Ming, Z.: Adaptive resource allocation for preemptable
jobs in cloud systems. In: 10th International Conference on Intelligent System Design and
Application, pp. 31–36 (2011)
18. Shi, J.Y., Taifi, M., Khreishah, A.: Resource planning for parallel processing in the cloud. In:
IEEE 13th International Conference on High Performance and Computing, pp. 828–833 (2011)
19. Goudarzi, H., Pedram, M.: Multi-dimensional SLA-based resource allocation for multi-tier
cloud computing systems. In: IEEE International Conference on Cloud Computing, pp. 324–
331 (2011)
20. Dhiman, G., Marchetti, G., Rosing, T.: vGreen: a system for energy efficient computing in virtualized environments. In: Conference of ISLPED 2009, San Francisco, California, USA, pp. 19–21 (2009)
21. Jin, H., Deng, L., Wu, S., Shi, X., Pan, X.: Live virtual machine migration with adaptive, memory
compression. In: IEEE International Conference on Cluster Computing and Workshops, New
Orleans, LA, pp. 1–10 (2009)
22. Pattanaik, P.A., Roy, S., Pattnaik, P.K.: Performance study of some dynamic load balancing
algorithms in cloud computing environment. In: 2015 2nd International Conference on Signal
Processing and Integrated Networks (SPIN)
23. Li, B., Li, J., Huai, J., Wo, T., Li, Q., Zhong, L.: EnaCloud: an energy-saving application live placement approach for cloud computing environments. In: IEEE International Conference on Cloud Computing, Bangalore, pp. 17–24 (2009)
Continuous Recognition of 3D Space
Handwriting Using Deep Learning
Abstract In this paper, we present novel input methods that enable a hands-free interface through the recognition of 3D handwriting. The motion is detected wirelessly using the inertial measurement unit (IMU) of the Arduino 101 board. Two different approaches are discussed. One approach uses the pattern matching engine (PME) of the Intel® Curie™ module on an Arduino 101 mounted on the back of the hand. The second approach feeds the IMU input to a well-structured recurrent neural network, with the spotting of handwriting segments done by a support vector machine. The former approach, being short of memory, is less preferable than the latter. The deep learning approach can continuously recognize random sentences. The model was trained on a freely definable vocabulary of 1000 words and was tested by a single person, achieving a word error rate as low as 2%.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 693
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_73
694 S. Maheshwari and S. Gajjar
that combine the intuition gathered from gestures to express it in the form of hand-
writing, specifically as a text output. Several challenges arise. First, in everyday life,
the gestures are not limited to specific handwriting segments, but also include the
normal day-to-day activities, introducing a lot of irrelevance in the text input inter-
face. The handwriting segments should be identified beforehand in the continuous
stream of data. Secondly, as the accelerometer data is noisy, it should be filtered
before sending it to the recognition stage. Third, the actual text input must be recog-
nized in the whole data stream. For continuous recognition, we use two approaches.
The first approach involves the use of Arduino 101 [3]. The Intel® Curie™ module
embedded on the Arduino 101 provides us a pattern matching engine that can be used
to recognize the gestures. The second approach is to use the 3-axes accelerometer
of Arduino 101 and divide the process into 2 stages. The first stage is the spotting
stage, which involves the use of a support vector machine [4] to classify between the
writing and non-writing segments. The second stage uses recurrent neural networks
for recognition of the gestures [5]. While the existing proposed scheme is based on
recognition of text, this can be utilized as a base for any type of gesture recogni-
tion scheme which is built on a primeval alphabet of freely definable gestures. The
first approach lacks suitable memory for large datasets; i.e., it is limited to only 128
bytes of memory per neuron for 128 neurons. The second approach, however, can be
applied to larger definable vocabularies, beyond V1K. The rest of the paper is organized as follows: Sect. 2 discusses related work; recognition of gestures using Arduino 101 and deep learning is discussed in Sects. 3 and 4, respectively, followed by the conclusion.
2 Related Work
Recent research suggests a paradigm shift toward mobile device computing by facilitating hands-free interaction. Gestures foster an interface that is independent of any handheld tool, allowing seamless incorporation into day-to-day activities.
Mini-projectors portray the display on a rigid exterior in front of the subject and the
gesture is tracked via a camera or any other medium [6]. However, the approach
depends on sensory input and hence would perform poorly in case of continuous
stream recognition. Other researchers propose that 3D communication is doable
without any sort of graphical output. The operator needs to imagine a blank surface
that serves the purpose of the screen [7]. Handwriting can be predicted as text lacking
any optical or sensory feedback, a method that is used here. In any accelerometer data,
the spotting of relevant signal segments is necessary. This is possible by employing
a binary classifier to detect probable segments and then classify the gesture after-
ward [7]. This approach, however, introduces latency, and therefore, the overhead
involved reduces the efficiency of the recognition system. The other method is to
sort the input constantly and eliminate any irrelevant outputs. Gesture recognition using accelerometer data has been studied extensively, where numerous isolated gestures are typically performed and classified [8]. Many researchers propose
Arduino 101 is a development kit built around the Intel® Curie™ module, intended to combine the low power usage of the core with high ease of use [3]. The board supports Bluetooth Low Energy and has an onboard 6-axis accelerometer/gyroscope. It contains two small cores, a 32-bit ARC architecture core and an x86 (Quark) core, both clocked at 32 MHz. The real-time operating system (RTOS) and the associated framework designed by Intel are both open-source [3].
The pattern matching engine (PME) of the Curie™ module works as an engine for concurrent data recognition, with 128 parallel processing elements (PEs), each with a 128-byte input vector, 128 bytes of model memory, and 8-bit arithmetic units. It supports two classification techniques, radial basis function and k-nearest neighbors, and up to 127 contexts. Arduino 101 provides the CuriePME API, which can be used to train on and classify gestures. Additionally, the module provides an inertial measurement unit with 6 degrees of freedom, and each sensor sample (s_o) can be represented as a 6-dimensional vector of the corresponding accelerometer/gyroscope values:
\[ s_o = (\mathbf{a}, \mathbf{g}) = \big((a_x, a_y, a_z), (g_x, g_y, g_z)\big) \tag{1} \]
As stated earlier, the QuarkSE core on Curie module comes with 128 neurons, with
128 bytes of memory per neuron. And hence, there is a trade-off between memory
and the data that can be classified. Figure 1 shows the glove for gesture recognition.
We propose a system, shown in Fig. 2, which is user-dependent and gives comparatively poor word error rates in a person-independent setup. The system performs better when the dataset comprises words of at most 3 syllables, and it gives 100% accuracy when single letters are to be classified. Continuous recognition of words is also possible with this setup but is not recommended due to the memory limitations.
For instance, drawing the letter A takes almost 2 s, which is 200 samples at
100 Hz. Now, the 3-axes accelerometer values in ‘int’ are 4 bytes each which makes
up 2400 bytes per letter. But for 128 neurons, our pattern can be no larger than 128
bytes. So, to throw off at least 95% of data without affecting the results requires
the use of under-sampling, after which the max size of 128 bytes per letter can be
achieved. Also, to remove noise from the data, we use an averaging filter. From the above discussion, it is clear that memory management is not efficient with this system, which leads to a lot of data being wasted. The CuriePME library is mainly used for (1) learning patterns, (2) classifying and recognizing patterns, and (3) storing and retrieving pattern-matching knowledge. The CurieBLE library provides the wireless functionality [9].
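The under-sampling and averaging steps above (cutting roughly 200 raw samples per letter down to a pattern of at most 128 bytes, mapped to 0–255) can be sketched as follows; the filter width and per-axis sample count are illustrative assumptions, since the paper does not give them:

```python
import numpy as np

def moving_average(samples, k=5):
    """Simple averaging filter to suppress accelerometer noise."""
    kernel = np.ones(k) / k
    return np.convolve(samples, kernel, mode="same")

def undersample_to_bytes(ax, ay, az, n_out=42):
    """Average-filter each axis, keep n_out evenly spaced samples per axis,
    and map values to 0..255 so the 3 axes fit within 128 bytes."""
    pattern = []
    for axis in (ax, ay, az):
        smooth = moving_average(np.asarray(axis, float))
        idx = np.linspace(0, len(smooth) - 1, n_out).astype(int)
        picked = smooth[idx]
        lo, hi = picked.min(), picked.max()
        scaled = np.zeros(n_out) if hi == lo else (picked - lo) / (hi - lo) * 255
        pattern.extend(int(v) for v in scaled)
    return bytes(pattern)

t = np.linspace(0, 2, 200)                    # ~2 s of data at 100 Hz
pattern = undersample_to_bytes(np.sin(t), np.cos(t), t)
print(len(pattern))                           # 126 bytes, under the 128-byte limit
```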
Figure 3 shows the raw and noisy accelerometer data. Figure 4 shows the
accelerometer data under-sampled to a total of 45 samples and mapped from 0 to 255.
Due to the memory constraints, i.e., only 128 bytes per pattern and poor memory
management, we propose a new method for gesture recognition with the use of deep
learning.
This approach is more robust for gesture recognition. It can be divided into a spotting stage and a recognition stage. The combination of the two stages introduces no overhead, and there is no effect on the accuracy of word detection. The process can be seamlessly pipelined, so real-time detection of gestures is possible.
The role of the spotting stage is to classify the writing and non-writing segments in the accelerometer data. The intuition for the spotting stage is derived from Amma et al. [4]. The segments correctly recognized as writing segments are then carried forward to the recognition stage. The stage uses a binary support vector machine (SVM) classifier with an RBF kernel (C = 126, γ = 2). For usage
on continuous data streams and in real-time, the approach of the sliding window is
more suitable. The overlapping sliding windows are classified and accumulated to
send to the recognition stage. The window of length 0.9 s and shifting width of 0.1 s
is used in the approach. Figure 5 depicts the architecture of the spotting stage. In the
figure, green and red segments show writing and non-writing segments, respectively.
Visual inspection shows that the handwriting part has higher frequency and amplitude than the non-writing part. For each window w_t, the SVM classifier C(w_t) returns 1 when a handwriting segment is detected and 0 otherwise. A sensor sample s_t is categorized as a handwriting motion if at least one window containing s_t is classified as a handwriting segment [4].
This system is biased toward the detection of writing motion, so minute pauses while writing do not produce gaps in the detected writing segments. All real-time experiment results show that the chosen values are suitable for the model. Because the system is biased, a high recall of 98.2% and a low precision of 32% are attained. Compared with the results in [4], these values are reasonable.
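The sliding-window spotting logic above (0.9 s windows shifted by 0.1 s, with a sample marked as writing if at least one window containing it is classified as writing) can be sketched as follows; for brevity, a simple amplitude threshold stands in for the trained SVM C(w_t):

```python
import numpy as np

FS = 100                      # sampling rate (Hz)
WIN = int(0.9 * FS)           # 0.9 s window length
SHIFT = int(0.1 * FS)         # 0.1 s shift width

def classify_window(w, thresh=0.5):
    """Stand-in for the binary SVM C(w_t): writing windows show higher
    amplitude/frequency than non-writing windows."""
    return 1 if np.std(w) > thresh else 0

def spot_segments(signal):
    """Label a sample as handwriting (1) if at least one window
    containing it is classified as a writing window."""
    labels = np.zeros(len(signal), dtype=int)
    for start in range(0, len(signal) - WIN + 1, SHIFT):
        if classify_window(signal[start:start + WIN]):
            labels[start:start + WIN] = 1
    return labels

rng = np.random.default_rng(1)
quiet = 0.05 * rng.normal(size=300)                               # non-writing
writing = np.sin(np.linspace(0, 40, 300)) + 0.05 * rng.normal(size=300)
labels = spot_segments(np.concatenate([quiet, writing, quiet]))
```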
The purpose of the gesture-detection stage is to build a strong classifier, and hence several deep learning models that exhibit temporal dynamic behavior come into play. These state-of-the-art models include gated recurrent units (GRU) [5], long short-term memory (LSTM) [5], and recurrent neural networks (RNN) [5]. We discuss the RNN for this stage, as RNNs are suited to processing time-sequence data. From the input layer to
the output layer of a conventional neural network, the layers are fully connected, which is not appropriate for time-series data. Hence, in an RNN, the present output also depends on past outputs: the network remembers the previous output and applies this information when calculating the present one. Theoretically, an RNN can manage infinite time-series data; in practice, to reduce complexity, the present state is associated only with the past few states, as needed [5]. Averaged 3D acceleration and averaged 3D angular-rate features are extracted from the inertial measurement unit of the Arduino 101. Figure 6 shows the RNN structure that is used.
The network is defined by Eqs. 3 and 4:
\[ h_t = f\left(w_h h_{t-1} + w_i x_t\right) \tag{3} \]
\[ y_t = f\left(w_o h_t\right) \tag{4} \]
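Eqs. 3 and 4 can be sketched directly; the weight shapes, hidden size, and output dimension below are illustrative assumptions, with the 6-D IMU sample of Eq. 1 as input:

```python
import numpy as np

def rnn_forward(x_seq, w_i, w_h, w_o, f=np.tanh):
    """Vanilla RNN per Eqs. (3) and (4):
    h_t = f(w_h h_{t-1} + w_i x_t),  y_t = f(w_o h_t)."""
    h = np.zeros(w_h.shape[0])
    outputs = []
    for x_t in x_seq:
        h = f(w_h @ h + w_i @ x_t)     # Eq. (3): recurrent state update
        outputs.append(f(w_o @ h))     # Eq. (4): output at step t
    return np.array(outputs)

rng = np.random.default_rng(0)
w_i = rng.normal(size=(8, 6)) * 0.3    # input weights: 6-D IMU sample -> 8 hidden
w_h = rng.normal(size=(8, 8)) * 0.3    # recurrent weights
w_o = rng.normal(size=(4, 8)) * 0.3    # output weights: 8 hidden -> 4 classes
x_seq = rng.normal(size=(20, 6))       # 20 time steps of (ax, ay, az, gx, gy, gz)
y = rnn_forward(x_seq, w_i, w_h, w_o)
print(y.shape)                         # (20, 4): one output vector per time step
```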
5 Conclusion
In the initial part of the work, a wearable gesture-input system capable of recognizing text written in the air, centered on the IMU of the Arduino 101, is presented. Using CuriePME, the system works well at detecting gestures
containing words of at most 3 syllables and achieves 100% accuracy when a single syllable is input. However, the dataset is limited by the lack of memory, and more memory is required to expand the vocabulary. To avoid these memory constraints, a new method using deep learning was employed. During the spotting stage, 98% recall and 32% precision were achieved. The network was trained on a small vocabulary (V1K). Experiments were conducted on a dataset of approximately 300 words, and a WER of 2% was attained. In the future, the proposed system will be tested on a versatile large-vocabulary dataset of V8K and above.
Acknowledgements The work is funded by IDEA LAB Program at Institute of Technology, Nirma
University, India under contract IDEA-2019-EC-02.
References
1. Cheng, H., Yang, L., Liu, Z.: Survey on 3D hand gesture recognition. IEEE Trans. Circuits Syst.
Video Technol. 26(9), 1659–1673 (2016)
2. Amma, C., Schultz, T.: Airwriting: demonstrating mobile text input by 3D-space handwriting.
In: Proceedings of the ACM International Conference on Intelligent User Interfaces (IUI’12)
(2012)
3. “Arduino -Arduino101”, Arduino.cc, 2020 [Online]. Available https://www.arduino.cc/en/
guide/arduino101. Accessed: 05 Apr 2020
4. Amma, C., Georgi, M., Schultz, T.: Airwriting: hands-free mobile text input by spotting and
continuous recognition of 3D-space handwriting with inertial sensors. In: 2012 16th International
Symposium on Wearable Computers, Newcastle, pp. 52–59 (2012)
5. Du, T., Ren, X., Li, H.: Gesture recognition method based on deep learning. In: 2018 33rd
Youth Academic Annual Conference of Chinese Association of Automation (YAC), Nanjing,
pp. 782–787 (2018)
6. Chen, F., et al.: WristCam: a wearable sensor for hand trajectory gesture recognition and
intelligent human–robot interaction. IEEE Sens. J. 19(19), 8441–8451 (2019)
7. Gustafson, S., Bierwirth, D., Baudisch, P.: Imaginary interfaces: spatial interaction with empty
hands and without visual feedback. In: Proceedings of the 23rd Annual ACM Symposium on
User Interface Software and Technology (UIST’10) (2010)
8. Elmezain, M., Al-Hamadi, A., Michaelis, B.: Hand trajectory-based gesture spotting and recog-
nition using HMM. In: 2009 16th IEEE International Conference on Image Processing (ICIP),
Cairo, pp. 3577–3580 (2009)
9. “Support for Intel® Curie™ Modules”, Intel, 2020 [Online]. Available https://www.intel.com/con
tent/www/us/en/support/products/94036/boards-and-kits/intel-curie-modules.html. Accessed:
05 Apr 2020
Automated SQL Grading System
1 Introduction
An automated grading system provides an efficient means for tutors to check students' understanding of certain concepts. Considering the increasing number of students, automating this process greatly enhances overall efficiency. The student queries are graded by comparing
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 701
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_74
702 S. Kanchan et al.
each of its components with the components of the correct query [1, 2]. An initial approach is to award full marks if the query under consideration is correct; i.e., the correctness of the student's SQL query is evaluated by comparing both its result and its text with those of the instructor's query. However, there are cases, as given in [3], wherein the queries look similar but give different results.
In such a scenario our system allocates partial marks based on the matching
of student query with the correct query. A student may write some parts of the
query correctly, and in such cases, partial marks should be allotted by taking the
weighted attributes and predicates under consideration. Partial marking, thus, incor-
porates various sub techniques under query pre-processing for awarding partial
marks to incorrect student queries. Canonicalization of both the student query and the instructor query is required so that different syntactic variations can be compared in the same form. The canonicalized queries are then broken into components and
these components are compared. Moreover, canonicalization may not guarantee an
optimal result due to deviations in the form in which the student query and instructor
query are canonicalized, even though they are equivalent. As a result, various pre-
processing techniques are required. Dividing a given query in different attributes and
performing initial pre-processing is an integral step toward awarding partial scores.
Depending upon the syntactic variations, techniques such as attribute disambiguation, WITH clause elimination, BETWEEN predicate elimination, normalization of relational predicates, and join processing are performed. In this paper, we summarize the pre-processing techniques required to allot partial marks to student queries.
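As an illustration of one of the pre-processing steps named above, the following is a minimal sketch of BETWEEN predicate elimination; a real grader would use a SQL parser, and this regex handles only the simple case:

```python
import re

def eliminate_between(sql):
    """Rewrite `col BETWEEN a AND b` as `col >= a AND col <= b`
    so that syntactic variants canonicalize to the same predicate form.
    (Handles only the simple single-token case; no nested expressions.)"""
    pattern = re.compile(r"(\w+)\s+BETWEEN\s+(\S+)\s+AND\s+(\S+)", re.IGNORECASE)
    return pattern.sub(r"\1 >= \2 AND \1 <= \3", sql)

student = "SELECT name FROM emp WHERE salary BETWEEN 1000 AND 2000"
print(eliminate_between(student))
# -> SELECT name FROM emp WHERE salary >= 1000 AND salary <= 2000
```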
2 Literature Review
The techniques for data generation in the X-Data system were further extended
in order to include a much larger variety of queries and mutations. This data was
used in building a SQL grading system. The testing for the accuracy of the datasets
generated by this X-Data system was conducted by using SQL queries that students
submitted as a part of their DBMS course. This system did not support, or supported
only partially, some SQL features which included sub-queries in a query, queries
containing arithmetic operations, and identifier replacement mutations. It also did
not support the functionality of assigning partial marks to examine the extent of
correctness of the student query [3, 4].
A system was presented that took a database application program as input. It
generated datasets and using these datasets, unit tests were carried out to test the
accuracy of the functions with queries in the application. The techniques that were
used were on the basis of mutation testing and static program analysis. Java appli-
cations that used JDBC or Hibernate APIs were examined. The system could not
handle all areas of SQL query mutations. It would not suggest correct queries based
on the datasets [2, 5].
An automated SQL assignment grading system was developed using object-
oriented design (OOD) technique and model-view-controller (MVC) framework. The
system consisted of two main parts: assignment management and automated SQL
grader. Instructors could manage their assignment and student information conve-
niently anytime and anywhere via internet network. The automated SQL grader was
designed to support four DBMSs: MariaDB, MySQL, PostgreSQL, and Microsoft
SQL server. In this system, grading on SQL outputs was not applicable for the SQL
with comparison operators. The partial marking system was absent [6].
The scope of the XData system was increased by including functionalities of
assigning partial marks to student queries. In the comparison of student and instructor
query, the system was able to check many more syntactic features. Due to this, the
system was able to be fully automated and scalable to huge class sizes such as
those of massive open online courses (MOOCs). Canonicalization of sub-queries
was not taken into account in this system. Canonicalization of DISTINCT placement
in FROM clause sub-queries versus outer queries was another area of future work
[1, 7].
3 Challenges Identified
4 Problem Definition
Step 1 An instructor can create SQL assessment tests and can provide model
answers for the same in the instructor mode.
Step 2 Instructor will enter the required keywords and assign corresponding weights
to entities of the model answer query.
Step 3 In the student mode, the student attempts the assessment test and submits
answers for each question.
Step 4 Once the query is submitted, the student query and the tutor query as well
as their outputs are evaluated by the matching criteria, and if all mentioned
conditions are satisfied, the student is awarded full marks.
Step 5 If the query is incorrect, the student gets the justification for the same in the
learning mode along with appropriate partial scores (Fig. 1).
5.2 Algorithm
In Fig. 2, both the instructor and student queries are segregated into elements inclu-
sive of the basic selection clause, where clause predicates, from clause, operators,
etc. These elements are further divided into sub-parts like Predicates, Projections,
Relations, Group By, and Having Clauses. For each component of the instructor
query, the sub-parts from the instructor query are matched with the corresponding
sub-parts from the student query. Missing sub-parts are penalized by giving marks
for that component in proportion to the number of instructor query sub-parts that
are actually present. Extraneous sub-parts in the student query are not penalized. Marks are computed in this manner for each sub-part and added up to give a mark for each component [1].
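As a concrete illustration, the proportional marking just described can be sketched in Python. The component names, weights, and dictionary representation below are assumptions made for illustration; the actual system derives the sub-parts by parsing the SQL itself.

```python
# Sketch of proportional partial marking (assumed component structure;
# the real system parses SQL into these sub-parts automatically).

def component_score(instructor_parts, student_parts, weight):
    """Award marks for one component in proportion to the instructor
    sub-parts that the student query actually contains. Extraneous
    student sub-parts are not penalized."""
    if not instructor_parts:
        return weight
    matched = len(set(instructor_parts) & set(student_parts))
    return weight * matched / len(instructor_parts)

def grade(instructor_query, student_query, weights):
    """Queries are dicts mapping a component name (e.g. 'projection',
    'relations', 'order_by') to its list of sub-parts."""
    return sum(
        component_score(instructor_query.get(c, []), student_query.get(c, []), w)
        for c, w in weights.items()
    )

instructor = {
    "projection": ["Department.Dept_Id", "Employee.Name"],
    "relations": ["Department", "Employee"],
    "order_by": ["Department.Dept_Id"],
}
student = {
    "projection": ["Department.Dept_Id", "Employee.Name"],
    "relations": ["Department", "Employee"],
    "order_by": ["Employee.Employee_Id"],   # wrong ORDER BY column
}
weights = {"projection": 3.0, "relations": 3.0, "order_by": 1.0}
print(grade(instructor, student, weights))  # 6.0 (full marks except ORDER BY)
```

The weights here are hypothetical; in the described system the instructor assigns them per keyword/entity.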
Following are the parameters based on which the performance will be evaluated:
706 S. Kanchan et al.
Instructor query:
SELECT Department.Dept_Id, Employee.Name
FROM Department
RIGHT JOIN Employee
ON Department.Employee_Id = Employee.Employee_Id
ORDER BY Department.Dept_Id;
Student query:
SELECT Department.Dept_Id, Employee.Name
FROM Department
RIGHT JOIN Employee
ON Department.Employee_Id = Employee.Employee_Id
ORDER BY Employee.Employee_Id;
Marks: 6.5
Partial marking details (student query sub-part | instructor query sub-part):
  Join predicate: Department.Employee_Id = Employee.Employee_Id | Department.Employee_Id = Employee.Employee_Id
  Projection: Department.Dept_Id, Employee.Name | Department.Dept_Id, Employee.Name
  Relation: Department | Department
  Relation: Employee | Employee
  Relation: Employee | Employee
  Order By: Employee.Employee_Id | Department.Dept_Id
1. User Friendly Interface: The interface must be easily accessible and comprehen-
sible by the user (student and tutor).
2. Easy integration with existing systems: The system should be flexible with respect to installation and upgrades.
3. Processing time: Time taken in evaluation of queries and displaying aggregate
marks.
The accuracy of the system will be evaluated based on the required execution time
and how well the awarded partial marks correlate with the marks assigned by the
tutor. Unit testing of each program will be performed using black-box testing, followed by integration, system, and acceptance testing.
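One way to quantify how well the awarded partial marks track the tutor's marks is a Pearson correlation over a set of graded queries. The mark vectors below are purely hypothetical placeholders.

```python
# Pearson correlation between automated and tutor-assigned marks
# (illustrative, hypothetical data).
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

auto_marks  = [6.5, 10.0, 4.0, 8.0, 0.0]   # hypothetical system scores
tutor_marks = [7.0, 10.0, 5.0, 7.5, 1.0]   # hypothetical tutor scores
print(pearson(auto_marks, tutor_marks))
```

A correlation close to 1 would indicate that the automated partial marks rank student answers the same way the tutor does.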
7 Experimental Setup
experience. The two corresponding modes will maintain a user specific profile for
every student and teacher, also providing authorization for the same. Furthermore,
the student mode is classified into two profiles—Learning and Assessment.
In the instructor mode, the questions and respective solutions will be provided.
The instructor also defines the keywords and assigns the weights. For example, in the sample GUI given in Fig. 3, the instructor has provided keywords such as INNER JOIN and GROUP BY. Then, in the student mode, the student can view the
marks allotted to them and the optimal query execution time taken.
8 Conclusion
In the course of our research, we have studied the various approaches to SQL query evaluation, with particular focus on syntactic mutations. We have tried to address the challenges posed by conventional grading systems through our developed system. In this work, we are
primarily focusing on standardization and canonicalization techniques to process and
evaluate SQL queries for assignment evaluation, as well as enhancing the learning
environment by improving efficiency. The system clearly indicates correctly and incorrectly assessed queries, up to a specific efficiency rate, for the result instances tested against it. Over 50 queries, comparing the expected and actual results, the system achieves an efficiency rate of 76%. We have also
successfully studied the flow of the entire system regarding the query processing and
its various techniques involved.
References
1. Chandra, B., Joseph, M., Radhakrishnan, B., Acharya, S., Sudarshan, S.: Automated grading of
SQL queries. In: 2019 IEEE 35th International Conference on Data Engineering (ICDE)
2. Neumann, T., Moerkotte, G.: A combined framework for grouping and order optimization. In:
VLDB, pp. 960–971 (2004)
3. Chandra, B., Chawda, B., Kar, B., Maheshwara Reddy, K.V., Shah, S., Sudarshan, S.: Data
generation for testing and grading SQL queries. VLDB J. 24(6), 731–755 (2015)
4. Paulley, G.N., Larson, P.-A.: Exploiting uniqueness in query optimization. In: CASCON,
pp. 804–822 (1993)
5. Agrawal, P., Chandra, B., Venkatesh Emani, K., Garg, N., Sudarshan, S.: Test data generation
for database applications. In: IEEE 34th International Conference on Data Engineering (ICDE)
(2018)
6. Singporn, P., Vichianroj, P., Trongratsameethong, A.: ASQLAG—automated SQL assignment grading system for multiple DBMSs. J. Technol. Innov. Tertiary Educ. 1(1), 41–59 (2018)
7. Silberschatz, A., Korth, H.F., Sudarshan, S.: Database System Concepts, 6th edn. McGraw Hill
(2010)
Error Analysis with Customer
Retention Data
1 Introduction
In the digital era, vast amounts of data are generated from various sources like healthcare, retail, telecommunications, banking, social networking sites, etc. Due to the sharp growth of data, researchers and decision-makers often find it difficult to analyse the data efficiently and obtain beneficial and worthy conclusions. For
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 709
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_75
710 V. Kaviya et al.
any business, customer retention is the primary key to its long-term survival. Customer retention is the act of retaining customers by undertaking activities to prevent them from defecting to peer companies. According to the Harvard Business Review, a company can raise its profits by 25–85% if its customer retention rate is increased by 5% [1]. Therefore, companies need accurate analytical models that can identify the fruitful customers based on their personal, behavioral and demographic data. The analysis in this paper is carried out on a dataset of transactions of bank clients [2]. This paper intends to analyse and inspect the working of numerous machine learning approaches on the dataset using different evaluation metrics, and to focus on error analysis of the wrong predictions to give a clear understanding of the reasons for deviation from the general trend. This paper begins with Sect. 2, where a detailed assessment of the already
existing methodologies is presented. This section also discusses the limitations of
these methodologies. In Sect. 3, a description of the overall working mechanism of
the study with results of all the approaches along with the error analysis is presented.
Finally, the paper ends with Sect. 4 where conclusions are presented.
2 Literature Review
Most of the studies conducted by the machine learning community on churn prediction have used datasets from the telecom industry [3–6], and very few studies have taken datasets from the banking industry [7, 8]. In [2], the study compared the results of the
Decision tree algorithm with both the Spark ML package and the Spark MLlib pack-
age in handling enormous data and found that Spark ML package performed better.
In data pre-processing, it can be seen that feature selection, random under-sampling or over-sampling, data cleaning, feature extraction, standardization and encoding of categorical and continuous attributes have a significant impact on the prediction of the model. Prediction techniques like CART, SVM, Random Forest, MLP (neural networks), Naive Bayes and DT [3, 4, 9] are used to a large extent, and it has been found that traditional models like DT and SVM perform better compared to neural network and clustering models.
However, most of the studies in the literature have not covered error analysis for the wrong predictions. There are various types of errors related to machine learning and data analytics. Both training and testing error samples matter for error analysis, and testing error samples are considered the most important since they help assess the likely performance of a given predictive model on fresh, unseen data. Therefore, the current study is unique in combining all the widely used pre-processing techniques into a single study and then establishing a solid ground for the classification errors and the deviation from the general trend by performing error analysis using distance and similarity metrics.
3 Methodology
3.1 Dataset
The dataset taken in this analysis contains details of bank clients and is freely accessible on Kaggle. It consists of transactional details of 10,000 customers. The features of the dataset include Row Number, Customer ID, Surname, Credit Score (a three-digit number that quantifies a person's capacity to pay back the acquired amount), Geography (the locality of the clients across the three nations where the bank is working), Gender, Age, Balance, IsActiveMember, Estimated Salary, Tenure (the time of having the account in months), NumOfProducts (the number of accounts the individual has), HasCrCard (a binary variable indicating whether the client has a credit card) and Exited (a binary variable denoting whether the client has left the bank).
Studies have demonstrated that preprocessing has a remarkable impact on the prediction of the model. First, the insignificant attributes (Row Number, Customer ID and Surname) were dropped. Categorical variables like Gender and Geography are encoded using one-hot encoding, and min-max normalization is adopted for feature value normalization.
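The preprocessing steps above can be sketched with pandas. The column names follow the Kaggle bank-churn file and are assumptions here; the study's actual code is not shown.

```python
# Sketch of the preprocessing pipeline: drop identifiers, one-hot encode
# categoricals, min-max normalize numeric features (assumed column names).
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Drop insignificant identifier attributes.
    df = df.drop(columns=["RowNumber", "CustomerId", "Surname"], errors="ignore")
    # One-hot encode the categorical variables.
    df = pd.get_dummies(df, columns=["Gender", "Geography"])
    # Min-max normalize the numeric columns (excluding the target) to [0, 1].
    num = df.select_dtypes("number").columns.drop("Exited")
    df[num] = (df[num] - df[num].min()) / (df[num].max() - df[num].min())
    return df

sample = pd.DataFrame({
    "RowNumber": [1, 2], "CustomerId": [101, 102], "Surname": ["A", "B"],
    "CreditScore": [600, 700], "Geography": ["France", "Spain"],
    "Gender": ["Male", "Female"], "Age": [40, 30], "Exited": [1, 0],
})
print(preprocess(sample).columns.tolist())
```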
One of the main problems faced by any classification model is class imbalance, where there is an unequal distribution between the classes of the target variable. The SMOTE (synthetic minority oversampling technique) algorithm is utilized; it generates synthetic minority-class samples to shift the classifier's learning bias towards the minority class. ROC scores improved from 74.62 to 75.55% for Random Forest after applying SMOTE.
The performance of the model is assessed using accuracy and ROC-AUC. Accuracy is the ratio of the number of correctly predicted samples to the total number of samples. The ROC-AUC score reveals the ability of the model to differentiate between classes: the larger the score, the better the model is at distinguishing classes.
3.4 Observation
The accuracy and ROC-AUC scores for all algorithms are tabulated in Table 1.
From the table, it can be seen that Random Forest and XGBoost gave better accuracy
and ROC-AUC score compared to other algorithms. Random Forest is an ensemble of decision trees. Each node within the decision trees is a condition on one feature, used to group similar values from the dataset. For classification, the condition is based on Gini impurity. The feature importance is computed based on the amount of influence that each feature has in decreasing the weighted impurity. In Random Forest, for example, the final feature importance is the average of the values across all the trees. Feature importance for Random Forest and XGBoost is shown in Fig. 1, and it can be observed that ‘Age’, ‘NumOfProducts’ and ‘Balance’ are the most dominant features in both algorithms.
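The across-trees averaging can be checked directly with scikit-learn. The data below is synthetic (only feature 0 is informative); the study's actual features are Age, Balance, etc.

```python
# Forest-level feature importance as the average of per-tree importances.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)          # only feature 0 determines the label

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# sklearn's feature_importances_ is the normalized mean of the
# Gini-impurity-based importances of the individual trees.
manual = np.mean([t.feature_importances_ for t in rf.estimators_], axis=0)
print(int(np.argmax(rf.feature_importances_)))  # 0: the informative feature
```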
Figure 2 gives the feature importance based on Random Forest algorithm when (a)
wrongly classified samples alone are taken and (b) when an equal number of correct
and wrongly classified samples are used. It can be observed that ‘CreditScore’, ‘Age’,
‘Balance’ are the most dominant features in the first scenario. When features were
analysed individually, customers with balance less than 20,000 and customers with
age greater than forty had high retention.
Table 3 shows the number of samples incorrectly classified and the count of
common incorrectly classified samples between the two algorithms. These samples
are considered for error analysis.
Fig. 2 Feature importance using wrongly classified and an equal number of correct and wrong
samples
In order to identify the common features that caused the errors, we have analyzed
the similarity and distance of the features belonging to the error samples.
Mahalanobis distance is a measure which considers the unequal variances and corre-
lations between features to find the distance between two data elements in the space
defined by features. This algorithm has been used for object classification in [10].
Equation explains how to compute Mahalanobis distance which was first introduced
in [11].
D² = (x − m)ᵀ C⁻¹ (x − m)    (1)
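Equation (1) can be computed directly with NumPy; in this sketch (illustrative data), m and C are estimated from the sample itself.

```python
# Mahalanobis distance of Eq. (1): D^2 = (x - m)^T C^{-1} (x - m).
import numpy as np

def mahalanobis_sq(x: np.ndarray, data: np.ndarray) -> float:
    m = data.mean(axis=0)                 # mean vector m
    C = np.cov(data, rowvar=False)        # covariance matrix C
    diff = x - m
    return float(diff @ np.linalg.inv(C) @ diff)

data = np.array([[2.0, 2.0], [2.0, 5.0], [6.0, 5.0], [7.0, 3.0],
                 [4.0, 7.0], [6.0, 4.0], [5.0, 3.0], [4.0, 6.0]])
print(mahalanobis_sq(np.array([5.0, 5.0]), data))
```

Unlike Euclidean distance, this accounts for the unequal variances and correlations between features, as noted above.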
Hamming distance is the minimum number of substitutions required to turn one string into another. Since Hamming distance works on binary data, all continuous values were binned and converted into the minimum number of categories possible. In [12], Hamming distance has been used for fault analysis of various circuits.
The Jaccard similarity estimates likeness between limited sample sets and is charac-
terized as the cardinality of the intersection of sets divided by the cardinality of the
union of the sample sets. In [13], Jaccard similarity has been used to find dissimilar-
ity between the frames of a video to detect motion wherein pixels are tokenized and
hashed.
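Both measures are straightforward to compute; a minimal sketch with illustrative inputs:

```python
# Hamming distance (on equal-length categorical vectors) and Jaccard
# similarity (|A ∩ B| / |A ∪ B|) as used for the between-class analysis.
def hamming(a, b):
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)

print(hamming([1, 0, 1, 1], [1, 1, 1, 0]))                 # 2
print(jaccard({"Age", "Balance"}, {"Balance", "Gender"}))  # 1/3
```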
Mahalanobis distance is calculated between each feature and the target variable
to find the within-class variance. The feature with the highest Mahalanobis distance
is the most confusing one. The feature with the maximum between-class variance also has the minimum within-class variance. The between-class variance of each feature is calculated using Hamming distance and the Jaccard similarity score. The error analysis sample set is split into two classes: (a) samples wrongly predicted as Exited, and (b) samples wrongly predicted as Not-Exited.
A distance measure can be considered the inverse of a similarity measure. The features that contribute to wrong prediction are the ones with smaller distance values and larger similarity values in the error samples. These are the confusing features that result in wrong prediction. The following details can be concluded from the observations in Table 4. Concerning Mahalanobis distance, ‘Gender’, ‘Geography’ and ‘NumOfProducts’ are the confusing features, as they have the highest Mahalanobis distance. With respect to Hamming distance, the similarity in the dominant features ‘Credit Score’ and ‘Balance’ has resulted in the wrong classification.
It can be observed that ‘Balance’ and ‘NumOfProducts’ are the most dominant features based on Fig. 1. However, ‘Balance’ and ‘NumOfProducts’ were found to be confusing features based on error analysis. To quantify the importance of these features, they were removed and the respective results are tabulated in Table 5. There is a drop in accuracy from 87.1 to 83.95% and in ROC score from 74.62 to 66.35% when ‘NumOfProducts’ alone is removed.
The features in the dataset were ranked based on their importance for both Random Forest and XGBoost, for the whole sample set and when wrongly classified samples alone are taken. When the whole sample set was taken, Age, Credit Score, Balance and NumOfProducts were the top-ranked features for both algorithms. When wrongly classified samples alone were taken, the same features (Age, Credit Score and Balance) are the most dominant.
4 Conclusion
Error analysis and identifying the features causing errors is very important in this
machine learning age. This paper has considered the Customer retention data for
error analysis. The data is tested and analysed with various classifiers and it has been
observed that Random Forest and XGboost algorithms perform very well for the data
set under consideration. The misclassified data of these two classifiers is considered for error analysis. The features are ranked based on Gini impurity, which is used in the default Scikit-Learn implementation.
Based on the observations, features that were found to be the confusing features
were ranked higher in the feature importance ranking with respect to both Random
Forest and XGBoost. When these particular features alone were removed, accuracy
dropped to a great extent. This affirms the fact that an algorithm gives more signifi-
cance to an attribute that is not capable of separating a data sample into two classes.
The classifier performance is analyzed and top-ranking features are listed under three circumstances: actual data with less bias, with 50% error samples, and with 100% error samples. This paper lists the possible confusing features that are responsible for misclassification and compares them with the actual data.
The study can be widened by assessing and performing error analysis with datasets
from various sources to identify a general pattern and to check whether this stratified
error analysis can be generalized. Feature ranking for various sampling techniques
and diverse machine learning algorithms needs to be explored to get a clearer under-
standing of their influence on feature ranking.
References
1. Reichheld, F.F., Sasser, E.: Zero defections: quality comes to services. Harvard Bus. Rev. 68(5),
105–111 (1990)
2. Sayed, H., Abdel-Fattah, M.A., Kholief, S.: Predicting potential banking customer churn using
apache spark ML and MLlib packages: a comparative study. Int. J. Adv. Comput. Sci. Appl. 9,
674–677 (2018). https://doi.org/10.14569/IJACSA.2018.091196
3. Sahar, F.: Machine-learning techniques for customer retention: a comparative study. Int. J. Adv.
Comput. Sci. Appl. 9 (2018). https://doi.org/10.14569/IJACSA.2018.090238
4. Au, T., Ma, G., Li, S.: Applying and evaluating models to predict customer attrition using data
mining techniques. J. Comp. Int. Manage. 6(1), 10 (2003)
5. Qureshi, S.A., Rehman, A.S., Qamar, A.M., Kamal, A., Rehman, A.: Telecommunication sub-
scribers’ churn prediction model using machine learning. In: Eighth International Conference
on Digital Information Management (ICDIM 2013), Islamabad, pp. 131–136 (2013). https://
doi.org/10.1109/ICDIM.2013.6693977
6. Umayaparvathi, V., Iyakutti, K.: Applications of data mining techniques in telecom churn
prediction. Int. J. Comput. Appl. 42, 5–9 (2012). https://doi.org/10.5120/5814-8122
7. He, B., Shi, Y., Wan, Q., Zhao, X.: Prediction of customer attrition of commercial banks based
on SVM model. Procedia Comput. Sci. 31, 423–430 (2014). https://doi.org/10.1016/j.procs.
2014.05.286
8. Devi Prasad, U., Madhavi, S.: Prediction of churn behaviour of bank customers using data
mining tools. Indian J. Mark. 42(9), 25–30 (2012)
9. Xia, G., Jin, W.: Model of customer churn prediction on support vector machine. Syst. Eng.
Theor. Pract. 28, 71–77 (2008). https://doi.org/10.1016/S1874-8651(09)60003-X
10. Natarajan, V., Bharadwaj, L.A., Krishna, K.H., Aravinth, J.: Urban objects classification from
HSR -HTIR data using gaussian and mahalanobis distance classifiers. In: Proceedings of the
2018 IEEE International Conference on Communication and Signal Processing (ICCSP 2018),
Chennai, pp. 1041–1045 (2018)
11. Mahalanobis, P.C.: On the generalized distance in statistics. Proc. Natl. Inst. Sci. India 2(1),
49–55 (1936)
12. Chandini, B., Nirmala Devi, M.: Analysis of circuits for security using logic encryption. In:
Thampi S., Madria S., Wang G., Rawat D., Alcaraz Calero J. (eds.) Security in Computing and
Communications. SSCC, Communications in Computer and Information Science, vol. 969.
Springer, Singapore (2018)
13. Srenithi, M., Kumar, P.: Motion detection algorithm for surveillance videos. In: Pandian, D.,
Fernando, X., Baig, Z., Shi, F. (eds.) Proceedings of the International Conference on ISMAC
in Computational Vision and Bio-Engineering 2018 (ISMAC-CVB). ISMAC 2018. Lecture
Notes in Computational Vision and Biomechanics, vol. 30, pp. 955–964. Springer Netherlands
(2019)
Prediction Based Task Scheduling for
Load Balancing in Cloud Environment
Suresh Chandra Moharana, Amulya Ratna Swain, and Ganga Bishnu Mund
Abstract The exponential growth in demand for computing resources laid the foundation of cloud computing technology. Cloud computing enables the provisioning of virtual resources in terms of Virtual Machines (VMs) to service user requests. The user tasks are scheduled on these VMs for their accomplishment. However, the services of cloud computing are web-based, and hence the workload on the VMs changes dynamically. In order to handle the dynamic workload, smarter task scheduling heuristics need to be incorporated into cloud models. The absence of a proper task scheduling scheme may result in uneven load distribution across VMs, leading to inefficient utilization of resources. In this work, a prediction based task scheduling scheme is proposed that handles the dynamically changing workload efficiently. It has been seen that the proposed model lessens the load imbalance level across VMs as compared to contemporary task scheduling models.
1 Introduction
The increasing demand for computing resources enabled cloud computing to provide
unlimited resources to the end-user. Cloud computing is based on distributed com-
puting concepts and it offers services to users over the web. It arranges infrastructure,
platform, and software as services as per the pay-per-use model [1, 2]. Besides, cloud
computing also reduces the cost of building and managing infrastructure by provid-
ing scalable virtualized resources to the end-user [3]. Virtualization [4] helps cloud
computing to provide scalable resources in terms of the Virtual Machine (VM) to the
end-user. The user tasks are allocated to these VMs for their accomplishment. However, the massive growth in demand for these resources is motivating cloud service providers to use their resources towards fulfilling more user service requests [5, 6]. This may lead to uneven distribution of workload across the VMs, and thus to inefficient usage of computing resources. Hence, there is a necessity of distributing the workload evenly across the VMs, i.e. load balancing. It will not only use the resources efficiently but also satisfy the QoS requirements effectively. Further, it is also observed that resource allocation to tasks in the cloud is NP-complete [7]. So, it is a challenging task to build a task scheduling model for cloud resources.
The above-mentioned challenge can be addressed by designing appropriate task
scheduling heuristic that balances the load across VMs leading to the effective usage
of resources. Further, in the cloud model the workload is always unpredictable and
hence requires a dynamic scheduling model to handle the unpredictability. In this
work, the objective is to design a prediction based task scheduling model that leads to
load balancing across the available VMs. At first, the overloaded VMs get selected
by using an upper threshold. The upper threshold is chosen dynamically using a
popular statistical method considering the historic CPU usage information of VMs.
Then, the tasks that need to be remapped to VMs other than their current VM are selected using a pre-defined heuristic. Finally, the identified tasks are rescheduled from the overloaded VMs to the non-overloaded ones in order to achieve load balancing. The load
imbalance level parameter is taken into account to compare the proposed scheme
with the contemporary models. After performing extensive experimentation, it has
been seen that the proposed model decreases the load imbalance level marginally compared with the current methodologies. So, the proposed model not only achieves
load balancing but also handles the dynamic workload efficiently.
The remainder of the paper is organized as follows. The next section presents a summary of the literature closely linked to the current work. Section 3 provides the details of the proposed system model. In Sect. 4, the assessment of the proposed model is presented. Finally, the last section highlights the concluding remarks and future directions.
2 Related Work
The assignment of tasks to VMs, popularly known as task scheduling in the cloud model, has been widely studied in the literature. Among these studies, scheduling of tasks for balancing the load in the cloud has taken a prominent place. As per the literature [8], the objective of load balancing schemes is not only to distribute the load evenly across VMs but also to maximize the utilization of computing resources. Milani and
Navimipour [9] have discussed the different load balancing schemes applicable to
the cloud environment. Besides, they have also mentioned the challenges faced by
the load balancing algorithms. Patel et al. [10] also studied the varied load balancing
schemes in the cloud environment. They have classified the mentioned load balancing
schemes into various categories and highlighted the pros as well as cons of each
method.
The resource allocation problem is NP-complete in nature. So, there is a need for heuristic and meta-heuristic schemes to address this problem. Freund et al. [11] discussed the max–min and min–min heuristics for scheduling tasks in a distributed environment. He et al. [12] suggested a modification of the min–min heuristic that takes QoS into consideration. The literature [13, 14] has focused on task scheduling schemes in the cloud environment that take QoS into account. Umarani et al. [15] have presented an ant colony optimization based meta-heuristic model for scheduling of tasks; however, it suffers from prolonged waiting times. Cho et al. [16] proposed a hybrid meta-heuristic approach towards addressing the scheduling problem, taking both ant colony optimization and particle swarm optimization into consideration. Task scheduling heuristics termed round-robin and random are presented in the literature [17, 18]. The work presented by Rimal et al. [18] used round-robin for even distribution of load across computing resources, but it does not take the loads of VMs into consideration.
It has been observed that the load balancing schemes can also be either migration
based or prediction based in the cloud environment. Task scheduling schemes based
on migration transfers the running tasks from overloaded VMs to the non-overloaded
ones without service disruption. A particle swarm optimization based task mapping
scheme is proposed in [19] in which rather than migrating the overburdened VM
the tasks on the over-burdened VM are moved to achieve load balancing. Wu et al.
[20] have presented a prediction based task mapping scheme relying upon previous
data. The focus of the authors is to predict the VM needs in advance and schedule
the task accordingly to accomplish load balancing. Bala and Chana [21] presented a
predictive approach to identify the overloaded and underloaded VMs and highlighted
a migration based scheme of task scheduling from the over-burdened VMs to the
under-burdened ones to balance the load across VMs. After reviewing the literature, the following research gaps are identified. Prediction based task scheduling schemes built on statistical techniques are missing. Alongside this, most works have considered only a single parameter for taking decisions in their models, and only a few take underutilized virtual machines into account. The authors have addressed some of these gaps in the proposed scheme. In the next section, the details of the proposed task scheduling scheme are discussed.
3 System Design
In the proposed model, the authors have presented a prediction based task mapping
scheme for achieving load balancing.
For this model, the tasks are assumed to be independent of one another. At first,
the underloaded hosts are detected by considering the lower threshold as 15% CPU
usage. The tasks available on these hosts are randomly placed over the available VMs. The next job is to identify the overloaded VMs. In order to detect an overloaded VM, an upper threshold is computed based on a statistical method. As suggested by Beloglazov and Buyya [22], the median absolute deviation (MAD) has been employed to decide upon the upper threshold. The MAD value is computed from the previous CPU usage values of the available VMs and is used to predict the upper threshold value dynamically. The technique used for computing the MAD value is given below.
where n represents the count of currently active VMs, v_MAD represents the MAD
value, h_CPUi represents the previous CPU usage value of ith CPU, and h_CPU
represents the mean CPU usage value of n active VMs. Then, the upper threshold
value (u_THR) gets predicted using the rule,
If the value of k moves towards 0, then u_THR takes a smaller value; otherwise, it takes a larger value. The value of k decides the aggressiveness of the VM in the accomplishment of tasks assigned to it. In the proposed scheme, the k value is assumed to be the standard value of 0.7. The complete proposed system model is
provided in Fig. 1 for reference. In the second phase of the presented model, the task
selection is taken into account after the selection of the overloaded VMs. For each
task running on the available VMs, a matrix is maintained that will keep the record
of workload (in MIPS) and the priority of each one.
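As a hedged sketch of the computation described above, the MAD value and a Beloglazov and Buyya style adaptive threshold (u_THR = 1 − k · MAD, from [22]) can be written as follows; the paper's exact threshold rule may differ, since the text indicates u_THR increases with k. The CPU usage history below is hypothetical.

```python
# MAD-based adaptive upper threshold for overload detection (sketch;
# the threshold formula follows Beloglazov and Buyya, an assumption here).
import statistics

def mad(history):
    """Median absolute deviation of a VM's historical CPU usage."""
    med = statistics.median(history)
    return statistics.median(abs(u - med) for u in history)

def upper_threshold(history, k=0.7):
    # A widely used adaptive rule: more variable history -> lower threshold.
    return 1.0 - k * mad(history)

h_cpu = [0.30, 0.35, 0.40, 0.80, 0.38]   # hypothetical CPU usage fractions
print(round(upper_threshold(h_cpu), 3))  # 0.979
```

A VM whose current CPU usage exceeds u_THR would be flagged as overloaded and become a source for task migration.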
The task with the highest value is selected for migration from the previously selected VMs. A random strategy is applied to break ties in task selection. The α value is meant to keep the balance between the task load (t_Load) and its priority (t_PTR), and it must be chosen according to the environmental requirements. In the proposed model, the α value is chosen as 0.6, as it leads to the best results. It is worth mentioning that, at the VM selection phase, the VMs are segregated into overloaded and non-overloaded ones. After a task is selected from an overloaded VM, it is migrated to a non-overloaded VM. This process continues in iterations in order to achieve load balancing across the VMs. As a result, the computing resources are also utilized efficiently. The next section highlights the assessment of the presented model and compares the findings with the existing approaches.
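The weighted selection score is not shown explicitly; a plausible form consistent with the description (an assumption, not the paper's exact equation) is score = α · t_Load + (1 − α) · t_PTR, with random tie-breaking:

```python
# Task selection sketch from an overloaded VM (assumed scoring rule).
import random

def select_task(tasks, alpha=0.6, seed=0):
    """tasks: list of (name, t_Load in MIPS, t_PTR priority) tuples."""
    score = lambda t: alpha * t[1] + (1 - alpha) * t[2]
    best = max(score(t) for t in tasks)
    candidates = [t for t in tasks if score(t) == best]
    return random.Random(seed).choice(candidates)  # random tie-break

tasks = [("t1", 500.0, 2.0), ("t2", 800.0, 1.0), ("t3", 300.0, 3.0)]
print(select_task(tasks)[0])  # t2: highest weighted score
```

Note that the authors implemented their system in Java; this Python sketch only illustrates the selection logic.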
4 Performance Evaluation
The experimental environment is an HP ProBook system with an 8th-generation Core i5 processor, 8 GB of RAM, and Ubuntu 18.04. The proposed model is implemented in the Java programming environment.
The Java collections framework has been used for realizing the proposed scheme. One list is created for each VM. The CPU requirements of the user tasks are represented as random values in a pre-defined range, and these values are stored in the lists; this simulates the scheduling of tasks to VMs. The proposed model is then executed over many iterations. In the first experiment, 20 VMs are considered, whereas the second experiment considers 40 VMs, for analyzing the performance of the presented scheme against contemporary approaches. The performance
metric Load_Imbalance_Level has been used to measure the performance of the pre-
sented model with the existing schemes. The Load_Imbalance_Level parameter can
be mathematically defined as,
Load_Imbalance_Level = (1/n) · Σ_{i=1}^{n} (c_LOAD_i − c_LOAD_{i+1})    (4)
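Note that the i = n term of Eq. (4) would reference c_LOAD_{n+1}, so the sketch below sums over consecutive pairs only; taking absolute differences (so the metric is nonnegative and measures spread) is an assumption on the intended reading:

```python
def load_imbalance_level(c_load):
    """Eq. (4): mean difference between consecutive per-VM CPU loads.

    c_load: list of CPU loads of the active VMs after applying the scheme.
    A smaller value means the loads are closer together (better balance).
    Absolute differences are assumed so the metric is nonnegative.
    """
    n = len(c_load)
    return sum(abs(c_load[i] - c_load[i + 1]) for i in range(n - 1)) / n
```

Perfectly balanced VMs yield 0; e.g. loads [60, 40] give |60 − 40| / 2 = 10.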
where c_LOAD_i represents the CPU load after applying the proposed scheme. The
values of c_LOAD_i have been recorded at intervals of five iterations to analyze the
effectiveness of the presented scheme. As shown in Fig. 2, the presented scheme
outperforms the existing model in terms of Load_Imbalance_Level in both
experiments. As Load_Imbalance_Level is reduced in the presented scheme, the
CPU load difference among the active VMs is minimized as well, leading to load
balancing. The next section presents the concluding remarks and future directions.
724 S. C. Moharana et al.
5 Conclusion
The scheduling of user service requests to virtual machines in the cloud environment
to balance the load has been studied extensively in the literature; however,
prediction-based task scheduling using statistical approaches remains a useful
addition. In this work, a prediction-based task scheduling scheme built on the median
absolute deviation is presented. The work aims to reduce the CPU load difference
among the active VMs and thereby achieve load balancing. The experimental results
suggest that the presented model outperforms contemporary scheduling schemes in
terms of load imbalance level. In the future, the plan is to incorporate multiple
parameters, such as memory usage and network bandwidth, into the proposed model
and analyze its performance.
Prediction Based Task Scheduling for Load Balancing … 725
Test Case Generation Using
Adequacy-Based Genetic Algorithm
Abstract Generating test cases is one of the most time- and effort-consuming
problems in software testing. Many efforts have been made to automate it so as to
make software testing more efficient, most of them based on evolutionary
techniques. Genetic algorithms have been associated with automated test case
generation since the early 1990s. This paper presents an alternative way of using a
genetic algorithm for test case generation: an adequacy-based approach in which
mutants are incorporated into the source code while the test cases are being
generated. This approach not only helps produce efficient results but also saves a
substantial amount of time in the process. The results show that the proposed
approach yields an effective decline in the number of test cases obtained when
compared to the path testing approach.
1 Introduction
In the conventional software development life cycle, the software testing procedure
takes nearly half of the development budget, more than half of the total development
time, and the maximum effort compared to all other phases [1]. The software testing
process comprises three main phases centered on test cases: (i) generation, (ii)
execution, and (iii) evaluation. Test case generation is the process of developing the
relevant test cases for a particular software system. The test cases are then executed
to verify the software's functionality in a process called test case execution. The
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 727
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_77
728 R. Malhotra and S. Pandey
third phase, known as test case evaluation, involves recording which test cases were
useful and provided value to the software testing process [2].
The most crucial of these phases is test case generation, as it takes the maximum
cost, effort, and time of the three [1], and its execution requires a certain level of
expert knowledge. Automating test case generation can greatly reduce the time and
cost of the software testing procedure, which in turn reduces the time and cost of the
overall software development process [3].
Much research has been done on the automation of software testing, and most
work on automated test case generation involves evolutionary techniques. The
genetic algorithm is the most widely used evolutionary technique for automatic test
case generation [4]. It imitates natural biological evolution, built on Charles
Darwin's notion of 'survival of the fittest'. An initial population evolves into a better
population in each generation by allowing individuals of high fitness to reproduce
and discarding individuals with low fitness values: the initial population is checked
for fitness, parents with high fitness values are selected, and a new population of
offspring is generated through crossover and mutation of genes. The new population
is then evaluated, and the algorithm iterates until a pre-decided stopping criterion is
met [5].
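The loop just described can be sketched generically; the OneMax fitness function and all parameter values below are illustrative placeholders, not the paper's settings:

```python
import random

def genetic_algorithm(fitness, pop_size=20, genome_len=8, generations=50,
                      mutation_rate=0.1, seed=0):
    """Minimal GA: evaluate fitness, select fit parents, crossover, mutate,
    repeat until a pre-decided stopping criterion (generation count) is met."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]           # survival of the fittest
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)      # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation with a small probability per gene
            child = [g ^ 1 if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Example: maximize the number of 1-bits in the genome (OneMax).
best = genetic_algorithm(fitness=sum)
```

The elitist step (carrying parents over unchanged) guarantees the best individual never worsens between generations.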
Software testing is broadly categorized into functional testing and structural
testing. Functional testing checks the functionality of the software system [6], while
structural testing emphasizes the correctness of the structure or hierarchy of the
system [7]. Structural testing is further classified into reliability-based and
adequacy-based criteria. The reliability-based criteria output a reliable set of test
cases that demonstrates the correctness of the program, while the adequacy-based
criteria bring out the fault-finding capacity of the generated test suite [8]. This paper
uses adequacy-based testing criteria and incorporates mutation analysis alongside
test case generation, which saves a substantial amount of time in the whole process.
It extends the work in [9], applying the concept with better technology and an
enriched dataset to obtain efficient results that demonstrate the fidelity of the
technique.
The rest of the paper is organized as follows: Sect. 2 covers the methodology
employed in the paper. Section 3 discusses the experimental studies, technologies,
parametric settings, and the results obtained. Section 4 presents the conclusions and
possibilities for future work.
2 Methodology
This paper proposes an adequacy-based criterion for generating test cases. The
most common practice for examining test case adequacy is 'mutation analysis',
which is usually performed after the test cases have been generated. This paper proposes a
Test Case Generation Using Adequacy-Based Genetic Algorithm 729
[Fig. 1 flowchart: fitness evaluation → selection procedure → mutation/crossover → stopping condition (N: loop back; Y: end)]
technique that uses mutation analysis alongside test case generation, which saves a
substantial amount of time and automatically produces adequate test cases as output.
The typical genetic algorithm process is shown as a flowchart in Fig. 1.
In the proposed method, we first generate mutants in the program by making
slight variations in the source code. We then record the difference between the
original and mutated statements of the source code and generate the respective
constraints accordingly; the solutions to these constraints represent the test cases.
Subsequently, using the rules given in [10], we construct a fitness function for the
source code, which is then fed to the genetic algorithm to generate the test cases.
This process may also kill other mutants, along with the current one, recorded at the
initial level, so we then examine the status of the remaining mutants. If any mutant
is still alive, we repeat the process until all mutants are killed. Figure 2 shows the
steps of the proposed process.
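The iteration above can be summarized as follows; `build_fitness`, `run_ga`, and `kills` are hypothetical stand-ins for the constraint-to-fitness construction, the GA module, and the mutant-executing test oracle:

```python
def generate_adequate_tests(mutants, build_fitness, run_ga, kills):
    """Sketch of the proposed loop (Fig. 2): for each live mutant, build a
    fitness function from its constraints, run the GA to obtain a test case,
    then remove every mutant that the new test case kills.

    kills(test, mutant) -> bool is assumed to execute the program and report
    whether the mutant is detected; run_ga is assumed to return a test case
    that kills the targeted mutant (otherwise the loop would not terminate).
    """
    live, tests = list(mutants), []
    while live:
        target = live[0]
        test = run_ga(build_fitness(target))
        tests.append(test)
        # one test case may kill several mutants at once, saving GA runs
        live = [m for m in live if not kills(test, m)]
    return tests
```

This is exactly where the time saving comes from: mutants killed as a side effect never need their own fitness-function construction and GA run.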
3 Experimental Studies
3.1 Dataset
In this work, we have used the source code of eight real-time programs, ranging
from 564 to 1092 lines of code. All the programs are developed in C/C++. All the
selected source codes are fully functional, game-based programs; their details are
shown in Table 1.
Once the dataset is finalized, we consider each program's source code individually
and apply the process described in Sect. 2 until the desired results are obtained.
Mutant generation. A mutant is introduced into a source code by intentionally
altering it [11]. For each program considered, we have chosen five mutants, each
introducing a slight variation in the source code. While choosing the mutants, we
must ensure that executing each mutant leads the program down a different path than
expected; this is how the mutant is identified and killed. We have carefully recorded
each of these mutants along with its processing status during execution. For each
mutant, a total of 10 runs are iterated in each program.
Constraint generation. This is the crucial step that ensures the correctness of the
obtained test data [8]. After generating the mutants, we record the differences
between each mutated statement and the original statement of the source code in a
specific format to generate the constraints for the mutant. The solutions to these
constraints give us the required test cases for the specific program.
Fitness function. The main element of a genetic algorithm is its fitness function.
Based on the fitness value, the algorithm decides the goodness of each individual in
the population [12], and the performance of the algorithm depends entirely on how
effective the associated fitness function is. Hence, constructing the fitness function
is the most crucial step in executing the genetic algorithm. The procedure adopted
for constructing the fitness function follows [9]. It generates the fitness function for
the mutant under consideration, which is then fed to the genetic algorithm module
of Python to obtain the desired results.
For each program in Table 1, we have computed test cases for the proposed
technique by taking five mutants in each source code, generating their respective
constraints and fitness functions, and feeding them to the genetic algorithm module
of Python. To compare the efficiency of this adequacy-based technique, we have
used the most common reliability-based technique, the 'path testing technique' [13].
For this technique, we have constructed control flow graphs for
Table 2 Total test cases generated as per adequacy-based and reliability-based testing techniques

S. No. | Program source code               | Adequacy-based technique (proposed) | Reliability-based technique (path testing)
P1     | snake.c (Snake and bits game)     | 43  | 111
P2     | bike.c (Bike racing game)         | 64  | 104
P3     | pacman.cpp (Pacman game)          | 117 | 199
P4     | sal.c (Snake and ladder game)     | 33  | 415
P5     | heli.cpp (Helicopter attack game) | 124 | 172
P6     | helilearn.cpp (Helicopter driving game) | 107 | 179
P7     | fortell.cpp (Fortune teller game) | 100 | 373
P8     | tank.c (Tank game)                | 165 | 241
each of the program source codes mentioned in Table 1, and then chose five unique
paths from those CFGs [14]. We have followed the method given in [9] for
constructing the fitness values for each of these paths. These fitness functions are
then fed to our genetic algorithm module as for the proposed technique, and in the
same manner this technique is also iterated for 10 runs.
Table 2 shows the total number of test cases generated by both techniques,
namely the adequacy-based technique (proposed) and the reliability-based technique
(path testing). For each of the five mutants or paths selected in each program, the
total number of unique test cases is recorded for each of the 10 runs. We then take
the average of these values for each mutant or path separately; any decimal value is
rounded down to the floor to obtain an integer. Finally, we take the sum of these
approximated values. These sums are the values shown in Table 2.
The comparison between both techniques is done on the basis of two measures:
(i) total number of test cases generated and (ii) time taken for generating the test
cases.
The proposed method generates considerably fewer test cases than the path
testing technique. The main reason is that the proposed technique generates only
adequate test cases, while the path testing technique generates non-adequate test
cases as well. DeMillo and Offutt have stated that an adequate test case set is
responsible for the failure of all faulty versions of the considered program [6].
Adequacy mainly focuses on detection of faults by the
[Fig. 3 bar chart: % reduction in test cases (y-axis, 0–100) vs. program number 1–8]
test case set rather than focusing on proving the correctness; this makes it a better
alternative. We compare both techniques on the basis of the percentage reduction in
test cases, calculated by the formula used in [9] and shown as Eq. 1 below:
percentage reduction = (TPT − TOT) / TPT    (1)
Here, TPT denotes the number of test cases generated by the path testing technique
and TOT denotes the number of test cases generated by our proposed technique.
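As a check on Eq. (1), the percentage reductions can be recomputed directly from the Table 2 figures (the formula is scaled by 100 here to express a percentage, as the text reports):

```python
def percentage_reduction(tpt, tot):
    """Eq. (1): relative drop in test-case count, expressed as a percentage."""
    return (tpt - tot) / tpt * 100

# (adequacy-based, path-testing) test-case counts for P1..P8 from Table 2.
table2 = [(43, 111), (64, 104), (117, 199), (33, 415),
          (124, 172), (107, 179), (100, 373), (165, 241)]
reductions = [percentage_reduction(tpt, tot) for tot, tpt in table2]
```

The recomputed values span roughly 27.9% (P5) to 92.0% (P4), matching the 27.9–92.04% range reported for Fig. 3.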
The bar chart in Fig. 3 compares the two techniques using the above formula: a
reduction of 27.9–92.04% is observed in the number of test cases yielded by the
proposed technique relative to the reliability-based technique.
In the path testing method, five unique paths are considered for each program,
and a fitness function is constructed for each of these five paths separately before
feeding them to the genetic algorithm module. In the proposed technique, however,
we choose five mutants, each of which may, upon execution, kill one or more of the
other mutants. Therefore, we only have to construct fitness functions for those
mutants that are not killed by their fellow mutants. For instance, if 3 out of 5 mutants
are killed in the process, we construct fitness functions for only two mutants, saving
60% of the time compared with the path testing method. Hence, a substantial amount
of time is saved in the proposed technique compared with path testing. The bar chart
in Fig. 4 shows the percentage reduction in time of the proposed method on the
considered dataset relative to the path testing technique.
Figure 4 shows that the time taken to generate the test cases is reduced by
20–60% in the proposed technique compared to the reliability-based technique.
[Fig. 4 bar chart: % reduction in time (y-axis, 0–70) vs. program number 1–8]
4 Conclusion
In this work, a genetic algorithm has been used to generate adequate test cases.
We adopt the concept of generating test cases with simultaneous mutation analysis
so as to automatically produce adequate test cases and save additional time. We used
a rich dataset of eight real-time programs ranging up to 1092 lines of source code,
implemented the technique on this dataset, and obtained promising results. The
results were compared against the most widely used reliability-based technique,
path testing. In terms of the number of generated test cases, we recorded a reduction
of up to 92.04% compared with path testing, which is substantially better than the
parent research; in terms of generation time, we recorded savings of up to 60%.
Hence, on both comparison criteria, the proposed technique shows significantly
better results than reliability-based testing, and we can say it is a promising
technique in the area of automatic test case generation.
Future work includes implementing this adequacy-based concept with other
heuristic algorithms, such as particle swarm optimization, the bat algorithm, and the
artificial bee colony algorithm, to verify the efficiency of the technique
independently of the algorithm. Further, an automatic tool can be developed to
implement the proposed technique, saving the time and effort of developing the
code for each attempt.
References
3. McMinn, P.: Search based software test data generation: a survey. Softw. Test. Verif. Reliab.
14(2), 105–156 (2004)
4. Chuaychoo, N., Kansomkeat, S.: Path coverage test case generation using genetic algorithms.
J. Telecommun. Electron. Comput. Eng. (JTEC) 9(2–2), 115–119 (2017)
5. Korel, B.: Automated software test generation. IEEE Trans. Softw. Eng. 16(8), 870–879 (1990)
6. Duran, J.W., Ntafos, S.C.: An evaluation of random testing. IEEE Trans. Softw. Eng. 10(4),
438–443 (1984)
7. Jones, B.F., Sthamer, H.H., Eyres, D.E.: Automatic structural testing using genetic algorithms.
Softw. Eng. J. 299–306 (1996)
8. DeMillo, R., Offutt, A.J.: Constraint-based automatic test data generation. IEEE Trans. Softw.
Eng. 17(9), 900–910 (1991)
9. Malhotra, R., Garg, M.: An adequacy based test data generation technique using genetic
algorithms. J. Inf. Process. Syst. 7(2), 363–384 (2011)
10. Chen, Y., Zhong, Y.: Automatic path-oriented test data generation using a multi-population
genetic algorithm. In: Fourth International Conference on Natural Computation, pp. 566–570.
IEEE (2008)
11. Haga, H., Suehiro, A.: Automatic test case generation based on genetic algorithm and mutation
analysis. In: IEEE International Conference on Control System, Computing and Engineering,
pp. 119–123. IEEE (2012)
12. Xanthakis, S., Ellis, C.: Application of genetic algorithm to software testing. In: Proceedings
of 5th International Conference on Software Engineering and its Applications, pp. 625–636.
Toulouse, France (1992)
13. Nirpal, P.B., Kale, K.V.: Using genetic algorithm for automated efficient software test case
generation for path testing. Int. J. Adv. Netw. Appl. 2(6), 911–915 (2011)
14. Dahal, K., Hossain, A.: Test data generation from UML state machine diagrams using GAs.
In: International Conference on Software Engineering Advances, pp. 834–840. IEEE (2007)
Performance Analysis of π, AL and CT
for Consistency Regularization Using
Semi-Supervised Learning
1 Introduction
Deep learning models [1] yield better results when trained with an ample amount of
supervised data. In real-life scenarios, obtaining a large-scale labeled dataset can be
challenging, since the construction of such datasets is usually costly; here
semi-supervised learning [2, 3] comes into play. By virtue of standard development
procedures like reinforcement learning models or generative adversarial networks
(GANs) [4, 5], large-scale labeled datasets can be composed, and their potency can
be further improved by implementing consistency-enforcing models. These models
are trained using unlabeled data and aim at stabilizing predictions when subjected to
input perturbations. They are widely used for training audio
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 737
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_78
738 R. Choubey and K. Bhattacharyya
2 Literature Review
The demand for consistency-based models is rapidly increasing due to the wide use
of semi-supervised learning. One such model was proposed by Samuli Laine and
Timo Aila [4], which introduced self-ensembling [9]: concurrent predictions for
unknown labels are made using the outputs of the network at various time intervals.
The proposed theory relies heavily on dropout regularization and input
augmentation [10]. The drawback is that dropout, when increased beyond a certain
threshold, results in overfitting or underfitting of the model; intuitively, a higher
dropout rate introduces variance into some of the layers, which eventually degrades
the overall performance, so heavy dropout is rarely used nowadays. Input
augmentation, on the other hand, is computationally expensive (only rotation and
scaling are cheap). Another model, grounded on training consistency-based methods
using stochastic weight averaging (SWA), was proposed by Ben Athiwaratkun,
Marc Finzi, Pavel Izmailov, and Andrew Gordon Wilson. This recent approach uses
a modified learning rate and evaluates the mean of the weights along the arc of
stochastic gradient descent (SGD) [7]. But SWA does not give optimum predictions
and can often be slow for large datasets, depending on the learning rate of the SGD.
Further research includes the concept of active learning (AL) [11], authored by
Mingfei Gao, Zizhao Zhang, Guo Sercan O. Arik, Larry S. Davis, and Tomas
Pfister, which combines data labeling and model training. The labeling cost is
minimized by primarily selecting data of higher value. In pool-based AL models,
easily accessible unlabeled data are used for selection purposes, but not for training
the model. The AL model performs better than the π model, but it cannot be
considered the optimum model, since its error rate is also considerably elevated.
Performance Analysis of π, AL and CT … 739
3 Consistency-Based Models
In the semi-supervised setting, we have access to labeled data D_L = {(x_i^L, y_i^L)}_{i=1}^{N_L}
and unlabeled data D_U = {x_i^U}_{i=1}^{N_U}. Given two perturbed versions x′, x″ of an input x
and perturbed weights ω′_f and ω″_g, the consistency loss penalizes the difference between
the predicted probabilities f(x′; ω′_f) and g(x″; ω″_g).
This loss is typically the mean squared error or the KL divergence:

l_cons^MSE(ω_f, x) = || f(x′; ω′_f) − g(x″; ω″_g) ||²
l_cons^KL(ω_f, x) = KL( f(x′; ω′_f) || g(x″; ω″_g) )    (1)

The total loss used to train the model can be written as

L(ω_f) = Σ_{(x,y)∈D_L} l_CE(ω_f, x, y) + λ · Σ_{x∈D_L∪D_U} l_cons(ω_f, x)    (2)

where, for classification, l_CE is the cross-entropy between the model predictions and the
supervised training labels. The parameter λ > 0 controls the relative importance of the
consistency term in the overall loss.
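A minimal pure-Python rendering of the MSE variant of Eq. (1) and the total loss of Eq. (2); `p_f` and `p_g` below stand in for the two perturbed prediction vectors, and the per-example losses are assumed precomputed:

```python
def mse_consistency(p_f, p_g):
    """l_cons^MSE: squared distance between two predicted class-probability
    vectors for the same input under different perturbations/dropout."""
    return sum((a - b) ** 2 for a, b in zip(p_f, p_g))

def total_loss(ce_losses, cons_losses, lam):
    """Eq. (2): supervised cross-entropy terms plus the lambda-weighted
    sum of consistency terms over labeled and unlabeled examples."""
    return sum(ce_losses) + lam * sum(cons_losses)
```

Identical predictions incur zero consistency loss, so the term only pushes the model toward perturbation-invariant outputs.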
3.1 Π-model
The Π-model can also be seen as a simplification of the Γ-model of the ladder
network by Rasmus et al. [2, 12, 13], a previously presented network architecture
for semi-supervised learning.
Algorithm (Π-model pseudocode):
Require: x_i = training stimuli.
Require: L = set of training input indices with known labels.
Require: y_i = labels for labeled inputs i ∈ L.
Require: w(t) = unsupervised weight ramp-up function.
Require: f_θ(x) = stochastic neural network with trainable parameters θ.
Require: g(x) = stochastic input augmentation function.
for t in [1, num_epochs] do
  for each minibatch B do
    z_{i∈B} ← f_θ(g(x_{i∈B}))    // evaluate network outputs for augmented inputs
    z̃_{i∈B} ← f_θ(g(x_{i∈B}))    // again, with different dropout and augmentation
    loss ← −(1/|B|) Σ_{i∈(B∩L)} log z_i[y_i]    // supervised loss component
           + w(t) · (1/(C|B|)) Σ_{i∈B} ||z_i − z̃_i||²    // unsupervised loss component
    update θ using, e.g., Adam    // update network parameters
  end for
end for
return θ
The network is evaluated twice for each training input x_i, resulting in prediction
vectors z_i and z̃_i. The loss function consists of two components. The first is the
standard cross-entropy loss, evaluated for labeled inputs only. The second, evaluated
for all inputs, penalizes different predictions for the same training input x_i by
taking the mean squared difference between the prediction vectors z_i and z̃_i. To
combine the supervised and unsupervised loss terms, the latter is scaled by the
time-dependent weighting function w(t).
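The two loss components can be combined as in the pseudocode; the sketch below uses plain Python lists as stand-in prediction vectors, and the Gaussian ramp-up shape for w(t) is an assumption borrowed from common Π-model implementations:

```python
import math

def ramp_up(t, ramp_length=80):
    """Ramp-up for the unsupervised weight w(t); the Gaussian schedule
    exp(-5*(1 - t/T)^2) is an assumed, commonly used choice."""
    if t >= ramp_length:
        return 1.0
    phase = 1.0 - t / ramp_length
    return math.exp(-5.0 * phase * phase)

def pi_model_loss(z, z_tilde, labels, w_t, eps=1e-12):
    """Pseudocode loss: cross-entropy over labeled inputs, normalized by
    |B|, plus w(t) times the squared difference normalized by C*|B|.
    labels[i] is None for unlabeled inputs."""
    n = len(z)            # minibatch size |B|
    c = len(z[0])         # number of classes C
    supervised = -sum(math.log(zi[yi] + eps)
                      for zi, yi in zip(z, labels) if yi is not None) / n
    unsupervised = sum(sum((a - b) ** 2 for a, b in zip(zi, zti))
                       for zi, zti in zip(z, z_tilde)) / (c * n)
    return supervised + w_t * unsupervised
```

Early in training w(t) is near zero, so the supervised term dominates until the network's predictions are stable enough for the consistency term to be informative.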
ε(x, M) = Σ_{l=1}^{J} Var( P(Ŷ = l | x, M), P(Ŷ = l | x̃_1, M), …, P(Ŷ = l | x̃_N, M) )    (4)

where J is the number of response classes and N is the number of perturbed samples
{x̃_1, …, x̃_N} of the original input x.
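Eq. (4) can be computed directly from the model's class-probability vectors; population variance is assumed here, since the equation does not specify the estimator:

```python
def variance_score(prob_rows):
    """Eq. (4): sum over the J classes of the variance of the class-l
    probability across the original and N perturbed inputs.

    prob_rows: one probability vector per input (original first, then the
    perturbed versions); higher scores flag inputs with unstable predictions.
    """
    n_rows = len(prob_rows)
    n_classes = len(prob_rows[0])
    score = 0.0
    for l in range(n_classes):
        col = [row[l] for row in prob_rows]
        mean = sum(col) / n_rows
        score += sum((p - mean) ** 2 for p in col) / n_rows  # population variance
    return score
```

Inputs whose predictions stay constant under perturbation score 0, so the metric ranks unlabeled samples by prediction instability.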
Algorithm (AL-model pseudocode):
and correct predictions for labeled examples xi . Given a mixup [8, 17] operation:
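The mixup equation itself was lost in extraction; the standard mixup operation from the cited work [8] is Mix_λ(a, b) = λ·a + (1 − λ)·b, applied elementwise to inputs or predictions, which a one-line sketch captures:

```python
def mixup(a, b, lam):
    """Standard mixup: Mix_lam(a, b) = lam*a + (1-lam)*b, elementwise over
    two inputs (or two prediction vectors), with lam in [0, 1]."""
    return [lam * x + (1 - lam) * y for x, y in zip(a, b)]
```

In interpolation consistency training, the model is encouraged to predict mixup(pred_j, pred_k, λ) on the mixed input mixup(u_j, u_k, λ).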
4 Experiments
4.1 Implementation
Fig. 1 Proposed block diagram for consistency regularization using semi-supervised learning: a labeled image Img(i) contributes a supervised loss; unlabeled images Img(j) and Img(k) are mixed into m(j, k) and contribute a consistency loss; the total objective is the supervised loss plus the consistency loss. Here l_i → correct predictions for labeled images; u_j, u_m, u_k → low-margin unlabeled sample images
of the main dataset) of θ. By virtue of stochastic gradient descent (SGD), for each
iteration t the parameters θ are updated so as to minimize L = L_S + w(t) · L_US,
where L_S is the usual cross-entropy loss on the supervised samples D_L and L_US
is the interpolation consistency regularization term. These consistency losses are
computed over minibatches, which can be either supervised or unsupervised, and
the importance of L_US is amplified over successive iterations through the ramp-up
function w(t). The unlabeled samples u_j and u_k form a pair of minibatches from
which L_US is evaluated, and fake (pseudo) labels for them are assessed accordingly.
Table 1 Outcomes for various models on CIFAR10 (5000 labels) and SVHN (1000 labels) datasets

S. No. | Model | CIFAR10 (5000 labeled, 60,000 unlabeled) test error % | SVHN (1000 labeled, 76,277 unlabeled) test error %
1 | π   | 29.66 ± 2.34 | 17.21 ± 3.01
2 | AL  | 18.67 ± 1.23 | 9.01 ± 1.01
3 | ICT | 6.79 ± 0.12  | 2.54 ± 0.04
determined by computing the sum of squared distances between our target variable
and the predicted values (mean squared error).
Table 1 shows the outcomes of various well-known consistency regularization
models on the CIFAR10 (5000 labels) and SVHN (1000 labels) datasets. For both
CIFAR10 and SVHN, CT attains better results than the other models. An SSL
algorithm can be evaluated by comparing its performance against a supervised-
learning baseline; hence, this article compares three sophisticated algorithms,
denoted π, CT, and AL, in Table 1. The experiment shows that the CT method
outperforms the other models, giving a twofold reduction in error on CIFAR10 and
a drastic fourfold reduction on the SVHN dataset.
Additionally, in Table 1, it is perceived that CT considerably cuts down the
test error as compared to robust SSL approaches. For example, for 5000 samples
(labeled), it brings down the error percentage of the best-affirmed approach by almost
25%. In general, it is noticed that for a handful of data having labels, lesser the values
of the max-consistency coefficient and α, better the validation errors were obtained.
For SVHN, CT obtains test errors are competent concerning other well-known SSL
methods (Table 1). SSL algorithm, which uses the WRN-28-2, brings out the least
error percentage obtained for either of these algorithms. To find the actual efficiency
of CT contrary to these semi-supervised learning algorithms, the experiments were
conducted on Wide ResNet-28-2 architecture. The outcomes are jotted down in Table
1. CT proves to be more efficient on CIFAR10 and SVHN datasets as compared to
other models.
5 Conclusion
Machine learning [1] has a radical influence in various domains, yet its application is often constrained by the high cost of labeled data. Advances in SSL techniques [18] bridge the gap for those implementations where obtaining labeled data is costly. In this article, we have conducted a performance analysis of the best-known consistency-based models, namely π, AL, and CT, using CIFAR10 and SVHN, and observed that CT yields the optimal result (the least prediction error). CT has two benefits when compared to other methods using
746 R. Choubey and K. Bhattacharyya
References
1. Cohn, D., Atlas, L., Ladner, R.: Improving generalization with active learning. Mach. Learn.
15(2), 201–221 (1994)
2. Rasmus, A., Berglund, M., Honkala, M., Valpola, H., Raiko, T.: Semi-supervised learning with
ladder networks. In: Advances in Neural Information Processing Systems (2015)
3. Sajjadi, M., Javanmardi, M., Tasdizen, T.: Regularization with stochastic transformations and perturbations for deep semi-supervised learning. In: Proceedings of the International Conference on Neural Information Processing Systems, NIPS’16, pp. 1171–1179, USA, 2016. Curran Associates Inc. ISBN: 978-1-5108-3881-9
4. Laine, S., Aila, T.: Temporal ensembling for semi-supervised learning. In: International Conference on Learning Representations (2017)
5. Berthelot, D., Carlini, N., Goodfellow, I., Papernot, N., Oliver, A., Raffel, C.: Mixmatch: A
Holistic Approach to Semi-Supervised Learning (2019)
6. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
7. Athiwaratkun, B., Finzi, M., Izmailov, P., Wilson, A.G.: There are many consistent explanations of unlabeled data: why you should average. In: International Conference on Learning Representations (2019)
8. Luo, J., Zhu, M., Li, Y.R., Zhang, B.: Smooth neighbors on teacher graphs for semi-supervised
learning. In: CVPR (2018)
9. French, G., Mackiewicz, M., Fisher, M.: Self-ensembling for visual domain adaptation. In: International Conference on Learning Representations (2018)
10. Mohammadi, M., Al-Fuqaha, A., Guizani, M., Oh, J.: Semisupervised deep reinforcement learning in support of IoT and smart city services. IEEE Internet Things J. 5(2), 624–635 (2018). https://doi.org/10.1109/JIOT.2017.2712560
11. Chapelle, O., Schölkopf, B., Zien, A.: Semi-Supervised Learning, 1st edn. The MIT Press (2010). ISBN 0262514125, 9780262514125
12. Yazıcı, Y., Foo, C.-S., Winkler, S., Yap, K.-H., Piliouras, G., Chandrasekhar, V.: The Unusual Effectiveness of Averaging in GAN Training (2018)
13. Gao, M., Zhang, Z., Yu, G., Arik, S.O., Davis, L.S., Pfister, T.: Consistency-Based Semi-
Supervised Active Learning: Towards Minimizing Labeling Cost (2019). arXiv:1910.07153
14. Berthelot, D., Raffel, C., Roy, A., Goodfellow, I.: Understanding and Improving Interpolation
in Autoencoders Via an Adversarial Regularizer (2019)
15. Bachman, P., Alsharif, O., Precup, D.: Learning with pseudo-ensembles. In: Advances in Neural
Information Processing Systems, pp. 3365–3373 (2014)
16. Goodfellow, I.J., Vinyals, O., Saxe, A.M.: Qualitatively characterizing neural network optimization problems. In: International Conference on Learning Representations (2015)
17. Clanuwat, T., Bober-Irizar, M., Kitamoto, A., Lamb, A., Yamamoto, K., Ha, D.: Deep Learning
for Classical Japanese Literature. arXiv:1812.01718 (2018)
18. Oliver, A., Odena, A., Raffel, C., Cubuk, E.D., Goodfellow, I.: Realistic evaluation of deep semi-supervised learning algorithms. In: ICLR Workshop (2018)
Performance Analysis of π, AL and CT … 747
19. Park, S., Park, J., Shin, S.-J., Moon, I.-C.: Adversarial dropout for supervised and semi-
supervised learning. In: AAAI (2018)
20. Balcan, M.-F., Broder, A., Zhang, T.: Margin based active learning. In: international Conference
on Computational Learning Theory, pp. 35–50. Springer (2007)
An Energy-Efficient PSO-Based Cloud
Scheduling Strategy
Abstract Cloud computing provides useful services to users through extensive, scalable resources that are virtualized over the Internet. It can be defined as a collection of the communication and computing resources located in a data-center. The on-demand service is subject to QoS, load balancing, and certain other constraints that directly affect the user's consumption of the resources controlled by the cloud infrastructure. Cloud computing is a popular model because of the several advantages a cloud infrastructure provides. A cloud scheduling algorithm's primary goal is to bring down the completion time (the execution cost) of the task graph. The start time and finish time of each task node influence the task graph's completion time (cost), and the sort order of the task nodes is an essential aspect influencing the start and finish times of every task node. In a hybrid cloud, efficient particle-swarm-based cloud scheduling matters because users need to maintain the security of the hybrid cloud. Researchers have suggested many different scheduling algorithms for the cloud. This paper proposes particle swarm optimization (PSO)-based optimal cloud scheduling; effective results are obtained with an efficient fuzzy PSO-based cloud scheduling.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 749
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_79
750 R. S. Sirisati et al.
1 Introduction
Cloud computing delivers computing services and resources that include user applications, processing power, networks, specialized corporate services, and data storage space. It permits users to make use of software and hardware managed by cloud service providers without any knowledge of the underlying servers. The main advantage of moving to the cloud is application scalability. Unlike grids, cloud resource scalability permits real-time provisioning of resources to meet an application's requirements. The various cloud services, such as storage and data transfer capacity, are used to bring down expenses. New scheduling strategies have been proposed to overcome the properties of the network between clients and their resources; they may reuse some ideas of traditional scheduling and combine them with new techniques to ensure efficient scheduling [1]. Typically, these tasks are scheduled according to the client's needs. Scheduling algorithms were first executed in grid networks; the reduced performance faced in grids also creates a need for realizing scheduling in clouds. Further, scheduling enables workflow management systems to meet the QoS requirements of applications, as opposed to the conventional approach needed earlier in common multi-client grid conditions. The various cloud services, such as data transfer capacity, resources, processing, and storage, have to be accessible at a low cost. Such environments are not easy to build on grid resources: each framework site has a different setup, which can result in additional effort each time the application is ported to another site [2].
VMs further permit an application developer to create a completely customized and portable environment for their application, as shown in Fig. 1. A traditional way to do this is to use all clients' immediate tasks as the base of overhead applications. The main issue is the association between the overhead application and the unlike ways in which overhead costs arise among resources found in cloud systems. With a significant number of these straightforward assignments, the cost is decreased compared to what it would be for complex tasks. The on-demand service provided by the cloud results in the necessity of newer scheduling strategies, which combine traditional scheduling concepts with new schedule parameters such as cost-efficient scheduling, job migration, energy consumption, and bandwidth [3].
Resource Cost: Determined in a cloud by the capacity of the resources and the time they are occupied; more powerful resources result in a higher cost.
Scalability: The capacity to handle and perform under an increased workload, with the capability to enhance the resources effectively.
Reliability: The ability of the system to continue working even in a failure situation.
Resource Utilization: A parameter defining how effectively the system utilizes its resources.
2 Literature Review
Static scheduling versus dynamic scheduling: In static scheduling, all information on the status of the resources available in the cloud and on the needs of the jobs is known in advance. In dynamic scheduling, jobs enter the system dynamically, and the scheduler must decide on resource allocation within a stipulated time. The main improvement of dynamic scheduling over static scheduling is that the system does not need to know the runtime behavior of an application before running it. In centralized scheduling, a centralized scheduler (or a coordinated set of schedulers) makes global scheduling decisions. This provides more resource control: the scheduler can continuously monitor all the available resources, and implementation is easier. The disadvantage, however, is a lack of scalability, performance, and fault-tolerance.
In decentralized scheduling, no central entity controls the resources. Here, the lower-level schedulers, called local resource machines (LRMs), manage and maintain the various job queues. Energy-aware scheduling has seen rapid advancement in cloud computing, where large-scale data-centers play a vital role. The energy consumption of these distributed systems (DS) is now a prominent issue that has been getting much attention. Most application scheduling approaches have not considered the energy cost of the network devices, which account for a large portion of the power consumption in enormous data-centers; models have therefore been developed to minimize the energy consumption of servers and network devices. Gang scheduling is an efficient time-sharing scheduling technique applied in parallel and distributed systems, in which every job needs many processors up to a certain degree of parallelism, and tasks are executed based on arrival and dispatch. In cloud-computing setups, using job migration with mutable workloads, the sizes and types of jobs fit high-performance computing in the cloud. The different methodologies proposed by different authors are arranged in Table 1.
3 Problem Statement
The scheduling algorithms prevalent in clouds include time-based and cost-based scheduling algorithms. A previously proposed compromised-time-cost scheduling algorithm considers cloud-computing characteristics to accommodate instance-intensive, cost-constrained workflows by compromising between execution time and cost, with user input enabled on the fly. Particle swarm optimization (PSO)-based heuristics for scheduling workflow applications map applications to cloud resources while considering both the computation cost and the transmission cost; applied to a workflow application by varying its computation and communication costs, experimental results showed that PSO achieves good cost savings and a proper distribution of the workload over the resources. An improved cost-based algorithm for task scheduling makes an efficient mapping of the available tasks in the cloud; it improvises on traditional activity-based costing, as proposed for a task-scheduling strategy in a cloud environment. This algorithm divides the user tasks into three lists based on task priority, and it measures the resource cost as well as the computation performance, thus improving the computation-to-communication ratio.
Several strategies and algorithms for cloud scheduling have recently been proposed, and the benefits of cloud-computing technology depend heavily on the scheduling method or algorithm. Fuzzy-inheritance-based cloud scheduling compares favorably with older fuzzy-based cloud scheduling. Here, optimization-based cloud scheduling is implemented with a set of efficient fuzzy particles to improve on fuzzy gene-based task scheduling, making cloud scheduling more efficient with the PSO algorithm shown in Fig. 2.
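To make the scheduling idea concrete, the following is a minimal sketch of PSO applied to task-to-VM mapping (the function names and encoding are our own illustration, not the paper's implementation; a real cloud scheduler would also weigh transmission cost, energy, and QoS as discussed above). Each particle holds one real value per task, decoded to a VM index by rounding, and fitness is the makespan:

```python
import random

def makespan(assign, lengths, speeds):
    """Completion time of the slowest VM: each VM's time is the sum of
    its assigned tasks' runtimes (task length / VM speed)."""
    times = [0.0] * len(speeds)
    for task, vm in enumerate(assign):
        times[vm] += lengths[task] / speeds[vm]
    return max(times)

def pso_schedule(lengths, speeds, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    n_tasks, n_vms = len(lengths), len(speeds)

    def decode(pos):
        # round each coordinate to a valid VM index
        return [min(n_vms - 1, max(0, int(round(x)))) for x in pos]

    pos = [[rng.uniform(0, n_vms - 1) for _ in range(n_tasks)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_tasks for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [makespan(decode(p), lengths, speeds) for p in pos]
    g = pbest_f.index(min(pbest_f))
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_tasks):
                r1, r2 = rng.random(), rng.random()
                # standard velocity update: inertia + cognitive + social terms
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = makespan(decode(pos[i]), lengths, speeds)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return decode(gbest), gbest_f
```

The swarm converges toward assignments that balance load between fast and slow VMs, which is exactly the makespan objective described in the problem statement.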
Table 1 (continued)

Author (Refs.) | Methodology proposed | Parameters worked | Algorithm theme | Computing model
[16] | Pheromone nature in virtual machines | Power usage | VM placement | Cloud simulation model
[17] | Load hot spots | Scheduling utilization | Self-reliant ACO | Cloud model
[18] | Pheromone updation strategy | Completion time | Self-reliant | Grid simulation model (GridSim toolkit)
[19] | Pheromone updation strategy | Scheduling | Self-reliant | Not mentioned
[20] | Online environment | Response time, throughput | VM placement | CloudSim
[21] | Basic ant colony optimization | Completion time | Self-reliant | Cloud simulation model (CloudSim toolkit)
[22] | ACO and PSO (hybrid) | Resource utilization ratio, completion time | Workflows | Cloud simulation model (MATLAB 7.0)
[23] | Dynamic load balancing | Scheduling | Self-reliant | Java
Particle swarm optimization is a population-based adaptive strategy that mimics the social behavior of bird flocking or fish schooling. In the PSO system, each candidate solution is called a particle. Each particle moves through the search space with a velocity that is adjusted dynamically depending on its own experience and that of its neighboring particles. Mathematically, the particles are governed by the following Eqs. (1) and (2):
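Equations (1) and (2) do not survive legibly in the source; the standard PSO updates they conventionally denote (velocity, then position, with inertia weight w, acceleration coefficients c1, c2, and random factors r1, r2) are, as an assumed reconstruction:

```latex
v_i(t+1) = w\,v_i(t) + c_1 r_1 \left(p_i^{\text{best}} - x_i(t)\right)
         + c_2 r_2 \left(g^{\text{best}} - x_i(t)\right) \tag{1}
```

```latex
x_i(t+1) = x_i(t) + v_i(t+1) \tag{2}
```

Here p_i^best is particle i's best-known position and g^best the swarm's best-known position.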
5 Experimental Results
The performance of the proposed method was analyzed using the CloudSim simulation toolkit. CloudSim supports researchers in the cloud-computing environment and was developed by the Cloud Computing and Distributed Systems (CLOUDS) Laboratory at the University of Melbourne.
Table 2 Makespan for various algorithms (in seconds)

Algorithm / No. of tasks | 50 | 100 | 150 | 200
EPSO | 382 | 556 | 690 | 906
FGA | 404 | 578 | 712 | 924
PSO | 430 | 600 | 730 | 1000
GA | 458 | 626 | 758 | 1052
FL | 488 | 700 | 848 | 1310
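The gain in Table 2 can be quantified directly; a quick check (values copied from the table; the helper name is ours):

```python
# Table 2 makespans (seconds) at 200 tasks, as reported above
makespan_200 = {"EPSO": 906, "FGA": 924, "PSO": 1000, "GA": 1052, "FL": 1310}

def reduction_percent(better, worse, data):
    """Percentage reduction in makespan achieved by `better` relative to `worse`."""
    return round((data[worse] - data[better]) / data[worse] * 100, 1)
```

At 200 tasks, EPSO thus cuts makespan by about 9% versus plain PSO and by about 31% versus fuzzy logic.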
[Figure: Makespan (in seconds) for the EFPSO, FGA, PSO, GA, and FL algorithms]
[Figure: Computational cost of task execution (in seconds) for the EFPSO, FGA, PSO, GA, and FL algorithms]
6 Conclusion
Since cloud computing provides resources based on demand, it is called subscription-based on-demand resource provisioning. A central remote server maintains data and applications. Owing to its reliability, fault tolerance, effective communication, and speed, cloud computing is now a fast-emerging technology, and it provides several scheduling algorithms for solving real-world computing-resource provisioning. In this paper, an energy-efficient fuzzy particle swarm optimization algorithm was developed to provide optimal scheduling in the cloud. Optimization is an important step in the scheduling process; for lack of it, some scheduling processes yield poorer results than cloud scheduling using only a genetic algorithm or fuzzy logic. The proposed algorithm gave good results, and its performance was much better than that of the other algorithms. It is therefore concluded that the fuzzy particle swarm optimization algorithm is effective for cloud scheduling.
References
1. Ge, J.W., Yuan, Y.S.: Research of cloud computing task scheduling algorithm based on
improved genetic algorithm. Appl. Mech. Mater. 347, 2426–2429 (2013)
2. Li, K., Xu, G., Zhao, G., Dong, Y., Wang, D.: Cloud task scheduling based on load balancing
ant colony optimization. Sixth Annu. Chinagrid Conf. 2011, 3–9 (2011). https://doi.org/10.
1109/ChinaGrid.2011.17
3. Kousalya, K.: To improve ant algorithm's grid scheduling using local search. Int. J. Comput. Cogn. 7, 47–57 (2009)
4. Bagherzadeh, J., MadadyarAdeh, M.: An improved ant algorithm for grid scheduling problem
using biased initial ants. In: 3rd International Conference on Computer Research and
Development, pp. 373–378 (2011). https://doi.org/10.1109/CSICC.2009.5349368
5. Chen, W.-N., Zhang, J.Z.J.: An ant colony optimization approach to a grid workflow scheduling
problem with various QoS requirements. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.)
39, 29–43 (2009). https://doi.org/10.1109/TSMCC.2008.2001722
6. Chen, W.-N., Zhang, J., Yu, Y.: Workflow scheduling in grids: an ant colony optimization
approach. IEEE Congr. Evol. Comput. 3308–3315 (2007)
7. Chen, W., Shi, Y., Zhang, J.: An ant colony optimization algorithm for the time-varying
workflow scheduling problem in grids. IEEE Congr. Evol. Comput. 875–880 (2009)
8. Chiang, C.-W., Lee, Y.-C., Lee, C.-N., Chou, T.-Y.: Ant colony optimisation for task matching
and scheduling. IEE Proc. Comput. Digit. Tech. 153, 373–380 (2006). https://doi.org/10.1049/
ip-cdt
9. Chimakurthi, L., Madhu Kumar, S.: Power efficient resource allocation for clouds using ant
colony framework. Available from arXiv:11022608 (2011)
10. Dam, S., Mandal, G., Dasgupta, K., Dutta, P.: An ant colony based load balancing strategy
in cloud computing. Adv. Comput. Netw. Inform. 2, 403–413 (2014). https://doi.org/10.1007/
978-3-319-073507
11. Feller, E., Rilling, L., Morin, C.: Energy-aware ant colony based workload placement in clouds.
In: Proceedings of 12th IEEE/ACM International Conference on Grid Computing, pp. 26–33
(2011). https://doi.org/10.1109/Grid.2011.13
12. Ferdaus, M.H., Murshed, M., Calheiros, R.N., Buyya, R.: Virtual machine consolidation in
cloud data centers using ACO metaheuristic. In: Euro-Par 2014 Parallel Process, pp. 306–317.
Springer (2014). https://doi.org/10.1007/978-3-319-09873-9
13. Hu, Y., Xing, L., Zhang, W., Xiao, W., Tang, D.: A knowledge-based ant colony optimization
for a grid workflow scheduling problem. In: Adv. Swarm Intell. Notes Comput. Sci. 241–248
(2010). https://doi.org/10.1007/978-3-642-38703-6
14. Khan, S., Sharma, N.: Effective scheduling algorithm for load balancing (SALB) using ant
colony optimization in cloud computing. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 4, 966–973
(2014)
15. Liu, A.L.A., Wang, Z.W.Z.: Grid task scheduling based on adaptive ant colony algorithm.
In: International Conference on Management e-Commerce eGovernment, pp. 415–418. IEEE
(2008). https://doi.org/10.1109/ICMECG.2008.50
16. Liu, X., Zhan, Z., Du, K., Chen, W.: Energy aware virtual machine placement scheduling in
cloud computing based on ant colony optimization. In: Proceedings of Conference on Genetic
and Evolution Computing, pp. 41–47. ACM (2014). https://doi.org/10.1145/2576768.2598265
17. Lu, X., Gu, Z.: A load-adaptive cloud resource scheduling model based on ant colony
algorithm. In: IEEE International Conference on Cloud Computing Intelligent System 2011,
pp. 296–300. https://doi.org/10.1109/CCIS.2011.6045078
18. Mathiyalagan, P., Suriya, S., Sivanandam, S.N.: Modified ant colony algorithm for grid
scheduling. Int. J. Comput. Sci. Eng. 02, 132–139 (2010)
19. Nishant, K., Sharma, P., Krishna, V., Gupta, C., Singh, K.P., Nitin, et al.: Load balancing of
nodes in cloud using ant colony optimization. In: UKSim 14th International Conference on
Computing Model Simulation, pp. 3–8 (2012). https://doi.org/10.1109/UKSim.2012.11
20. Pacini, E., Mateos, C., Garino C.G.: Balancing throughput and response time in online scientific
clouds via ant colony optimization. Adv. Eng. Softw. 84, 31–47 (2015)
21. Tawfeek, M.A., El-Sisi, A., Keshk, A.E., Torkey, F.A.: Cloud task scheduling based on ant
colony optimization. In: 8th International Conference on Computer Engineering System,
pp. 64–69 (2013). https://doi.org/10.1109/ICCES.2013.6707172
22. Wen, X., Huang, M., Shi, J.: Study on resources scheduling based on ACO algorithm and PSO
algorithm in cloud computing. In: Proceedings of 11th International Symposium Distribution
Computing Application to Business Engineering Science, pp. 219–222 (2012). https://doi.org/
10.1109/DCABES.2012.63
23. Zhang, Z., Zhang, X.: A load balancing mechanism based on ant colony and complex network
theory in open cloud computing federation. Int. Conf. Ind. Mechatron. Autom. 2, 240–243
(2010). https://doi.org/10.1109/ICINDMA.2010.5538385
A Pronoun Replacement-Based Special
Tagging System for Bengali Language
Processing (BLP)
Abstract Natural language processing (NLP) is one of the most important elements of human–machine interaction and a very important element of machine learning systems. Over 27 crore (270 million) people worldwide use Bengali as their first and mother language, and it has its own writing system, so processing the Bengali language is very important for natural language processing. In this research work, we demonstrate an upgraded parts-of-speech (POS) tagging system for the Bengali language, in which we combine a special tagging system with general grammatical parts of speech based on many different features, such as considering suffixes for verbs, where we get a 68% success rate. We have also added place names, occupation names, Bengali personal names, Bengali repeated words, Bengali digits in both written and numeric form, English acronyms, and organization names. The success rate of tagging is 70% for general tagging and 76% for special tagging, which is the highest reported so far. This tagging system can be used for Bengali language processing (BLP) tasks such as sentiment analysis for Bengali, Bengali text summarization, etc.

Keywords POS tagging · Bengali POS tagging · Special tagging · BLP · NLP · Bengali · Bangla POS tagging
B. Jahan · I. S. Emon
Department of CSE, Feni University, Feni, Bangladesh
e-mail: hossenbipasa980@gmail.com
I. S. Emon
e-mail: emonsahriar0@gmail.com
S. A. Milu
Department of CSTE, Noakhali Science and Technology University, Noakhali, Bangladesh
e-mail: sharminmilu7@gmail.com
M. M. Hossain
Department of CSE, Asian University of Bangladesh, Dhaka, Bangladesh
e-mail: mobarak3112@gmail.com
S. S. Mahtab (B)
Department of EEE, Feni University, Feni, Chittagong Division, Bangladesh
e-mail: mahtabshahzad@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 761
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_80
762 B. Jahan et al.
1 Introduction
Nowadays, we share, store, write, and publish text through the revolutionary advances of the Internet, hardware, and software, ushering in a new era of information explosion. Users often find each retrieved document very lengthy, tedious, and time-consuming to read [1]. Therefore, automatic text summarization is needed to process the huge amount of Internet data efficiently [2]. Unlike English, which has seen a large number of systems developed to cater to it, other languages are less fortunate [3], and the development of text summarization has seen no mentionable progress for other languages, specifically Bangla. We are Bangladeshi people, and our national language is Bangla. Bangla is the fourth largest language in the Indo-European language family and the sixth largest in the world in terms of the number of native speakers; it is also the seventh largest spoken language in the world out of 3500 languages. Bangla is the mother tongue of the Bangladeshi people and the second largest spoken language of some states in India. According to a 2015 economic survey, 62.2% of the people in Bangladesh are educated, and most of them are accustomed to the Bengali language only [4]. Summarization of scientific documents, literature, news documents, books, etc. may be required today; the online content of Bangla news documents is growing very fast, and many people read it regularly [6]. Bangla news portals and the online portals of Bangla magazines are also increasing rapidly, electronic Bangla text is expanding in the borderless cyber world, and there are many great writings from people, and so on. Very few research works have been done on the expanding large volume of Bangla text. In addition, more research work needs to be done for the community of Bengali-speaking people, especially for retrieving Bangla information [7].
2 Methodology
Research on Bangla text documents is much more difficult for the following reasons:
i. According to our study, automated tools for research work are rarely available for the Bengali language.
ii. Though a similar tool is under development, it has limited features, and there is no lexicon-based dictionary like WordNet for Bangla.
iii. Researchers have worked in different directions, and there is very little consolidation.
iv. Lack of free and open-source software [8].
v. The Bengali language originated from Sanskrit and largely retains its inconsistent rules. For proper recognition of sentence structure, the subjects and objects of all sentences need to be identified, which is not as easy in Bangla as it is in English.
Despite these difficulties, this method focuses on improving the output of Bangla data retrieval methods. In retrieval output, where the corresponding nouns are missing, some sentences may be extracted with dangling pronouns. The user then does not get the appropriate message from the summary, and these pronouns create chances of misunderstanding the text. So, these dangling pronouns need to be replaced by their corresponding nouns [9]. Based on an analysis of Bangla news documents, it has been observed that the noun related to any pronoun exists in the immediately preceding sentence or in the second immediately preceding sentence 88.63% of the time.
The purpose of this work is to make Bangla information retrieval output free from dangling pronouns; otherwise, a user may misunderstand the text, because a single dangling pronoun is sufficient to convey an incorrect message. A method to solve this problem is proposed here, with the following contributions [10, 11]:
(i) Verify the nature of a word using dependency parsing, in which the inactive words are tagged; (ii) identify pronouns and distinguish subject or object; (iii) detect the nature of every word of a Bangla text sentence, such as noun, pronoun, verb, subject, object, digit, acronym, organization, person, and place name, in two phases in which words are tagged with general and special tags; (iv) identify the nouns related to pronouns and replace the pronouns in the appropriate format [12].
Based on the authors' research, this yields a much better result for pronoun replacement in Bangla than other approaches. Some rules are used here after analyzing a set of news documents; special tagging, dependency parsing, and subject and object identification are completed first, and all pronoun replacements are then done based on these rules. To carry out this work, we selected 3000 Bangla news documents [collected from the most popular Bangladeshi newspaper, the Daily Prothom-Alo (February 2020)].
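The pronoun-to-noun replacement step, combined with the two-preceding-sentence observation above, can be sketched as follows (the function name and data layout are our own illustration; the tag names follow the lexicon tags "NP", "NC", and "PPR" used in this work):

```python
def replace_dangling_pronouns(sentences, noun_tags=("NP", "NC"),
                              pronoun_tag="PPR"):
    """sentences: list of sentences, each a list of (word, tag) pairs.
    Each pronoun is replaced by the most recently mentioned noun found
    in up to two immediately preceding sentences, if any exists."""
    out = []
    for i, sent in enumerate(sentences):
        new_sent = []
        for word, tag in sent:
            if tag == pronoun_tag:
                repl = None
                # search the two immediately preceding sentences, nearest first
                for j in range(i - 1, max(-1, i - 3), -1):
                    nouns = [w for w, t in sentences[j] if t in noun_tags]
                    if nouns:
                        repl = nouns[-1]  # last-mentioned noun
                        break
                new_sent.append((repl or word, tag))
            else:
                new_sent.append((word, tag))
        out.append(new_sent)
    return out
```

A fuller system would also use the subject/object distinction from contribution (ii) to pick the right antecedent; this sketch simply takes the nearest noun.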
All words are tagged (as noun, pronoun, adjective, verb, preposition, etc.) using a lexicon database [1] and SentiWordNet [2]. With the lexicon database, words can be tagged as "JJ" (adjective), "NP" (proper noun), "VM" (verb), "NC" (common noun), "PPR" (pronoun), etc. SentiWordNet, on the other hand, has a list of words tagged as "a" (adjective), "n" (noun), "r" (adverb), "v" (verb), or "u" (unknown). Based on these predefined word lists, we experimented on 200 Bangla news documents and found that 70% of the words can be tagged. Though we use word stemming to find the base form of a word, a verb that is not in its active form cannot be stemmed. In fact, identifying verbs is very difficult because there are many suffixes in Bangla. For instance, depending on the tense and person, the English word "do" may become "doing," "did," or "does," whereas the corresponding word may have far more forms in Bangla. Consider the present continuous tense of “ ”
(kor-do): depending on the first, second, and third person, this word can take three main forms. It can be “ ” (doing) for the first person, “ ” (doing) for the second person, and “ ” (doing)
for the third person, respectively. All the meanings of these verb forms with "you" are also different in Bangla, as in “ ” (you are doing), “ ” (you are doing), and “ ” (you are doing), where those terms are in the present continuous tense and the second person. Thus, the word “ ” (do) may have
the next forms: “ ” (do), “ ” (do), “ ” (do), “ ” (do), “ ”
(doing), “ ” (doing), “ ” (doing), “ ” (doing), “ ” (doing), “
” (did), “ ” (did), “ ” (did), “ ” (did), “ ” (did), “
” (do), “ ” (do), “ ” (did), “ ” (did), “ ” (did), “ ”
(did), “ ” (did), “ ” (do), “ ” (did), “ ” (did), “ ”
(did), “ ” (did), ” (doing), “ ” (doing), “ ” (doing),
“ ” (doing), “ ” (doing), “ ” (doing), “ ” (doing), “
” (doing), “ ” (doing), “ ” (doing), (doing),
“ ” (doing), “ ” (doing), “ ” (do), “ ” (do), “ ”
(do), “ ” (do), “ ” (do). However, verb identification plays a vital role in language processing because the verb is the main root of a sentence; the complexity of verbs in Bangla is thus not even comparable to English. A list of suffixes is considered for the final check, as follows: “ ” (itechhis), “ ” (techhis), “ ” (itis), “ ” (ile), “ ” (ibi), etc. The word-tagging success rate was amplified from 68.12% (before using the suffix list [4]) to 70% (after using it). Some tags obtained in this step are initial tags that may be updated in the next steps; certain words will also be specifically tagged as acronyms, named entities, occupations, etc., in the next step.
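The suffix-based fallback described above can be sketched like this (using the transliterated suffixes from the list above, since the Bangla script is not reproduced here; "VM" is the lexicon's verb tag, and the function and variable names are illustrative):

```python
# Transliterations of the suffix list used for the final verb check
VERB_SUFFIXES = ("itechhis", "techhis", "itis", "ile", "ibi")

def tag_word(word, lexicon_tags, suffixes=VERB_SUFFIXES):
    """Tag via the lexicon first; fall back to suffix matching for verbs."""
    if word in lexicon_tags:
        return lexicon_tags[word]
    if any(word.endswith(s) for s in suffixes):
        return "VM"  # verb, identified purely by its suffix
    return None  # left untagged for special tagging in the next step
```

The lexicon lookup explains the 68.12% baseline; the suffix fallback accounts for the improvement to 70%.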
After general tagging, special tagging was introduced to identify words as acronyms, elementary forms, numerical figures, repetitive words, and names of occupations, organizations, and places.
1. Examining for English acronyms: A word formed from the initial letters of other words is called an acronym, such as “ ” (UNO), “ ” (OIC), “ ” (USA), etc. To examine these kinds of words, a word such as “ ” (UNO) is separated so that “ ” (U), “ ” (N), “ ” (O) can be matched letter by letter. All English letters can be written in Bangla, as A for (“ ”), B for (“ ”), C for (“ ”), D for (“ ”), … W for (“ ”), X for (“ ”), Y for (“ ”), Z for (“ ”), and they are sorted in descending order depending on their string
A Pronoun Replacement-Based Special … 765
length. For example, “ ” “Rajdhani Unnayan Kartipakkh (RAJUK)-Anticorruption Commission (ACC).” If an acronym is bounded by parentheses, then, depending on the total number of letters in the acronym, the same number of words immediately before the acronym are tagged as the name of an organization. In this case, the acronym must match the initial letters of the words immediately preceding it; otherwise, this process is not applicable. Research shows that the name of an organization can be found before an acronym in parentheses about 85% of the time.
(b) The last part of an organization name may contain certain words, such as “ ” (limited-limited), “ ” (biddaloy-school), “ ” (montronaloy-ministry), “ ” (kong–kong), etc. [5]. If any such word is present in the text, the three words immediately preceding it are checked. Even when the organization name consists of more than three words, selecting three words is considered sufficient for the purpose. When the three words are found to be a noun, a named entity, or any blocked word, they are taken as the name of an organization. It is found that organization names are accepted correctly about 85% of the time based on point (b).
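The point-(b) heuristic can be sketched as below; the marker words are the romanized glosses quoted above, and the tag names are invented for illustration:

```python
# Sketch of the point-(b) heuristic: when an organization-marker word appears,
# inspect up to three preceding words; if they are nouns, named entities, or
# blocked words, tag the span as an organization name.
ORG_MARKERS = {"limited", "biddaloy", "montronaloy", "kong"}
ACCEPTED_TAGS = {"NOUN", "NE", "BLOCKED"}

def tag_organizations(tokens, tags):
    """tokens/tags are parallel lists; returns an updated tag list."""
    tags = list(tags)
    for i, tok in enumerate(tokens):
        if tok in ORG_MARKERS:
            start = max(0, i - 3)  # at most three preceding words
            if all(t in ACCEPTED_TAGS for t in tags[start:i]):
                for j in range(start, i + 1):
                    tags[j] = "ORG"
    return tags

tokens = ["dhaka", "uddyokta", "biddaloy"]   # romanized stand-ins
tags = ["NE", "NOUN", "MARKER"]
print(tag_organizations(tokens, tags))  # ['ORG', 'ORG', 'ORG']
```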
7. Studying for names of places: A table of place names of Bangladesh was built with 800 names covering divisions, districts, upazilas, and municipalities. In this area-based hierarchy, the top level is the division, the second level is the district, and the third level is the upazila or municipality. In addition, we have analyzed the names of 230 countries and their capitals. In this way, about 91% of place names can be identified in our experiment.
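The table lookup can be sketched as a small dictionary; the entries below are a tiny illustrative subset, not the 800-name table used in the paper:

```python
# Minimal sketch of the hierarchical place-name table:
# division -> district -> upazila/municipality, plus countries and capitals.
PLACES = {
    "division": {"Dhaka", "Chattogram"},
    "district": {"Gazipur", "Comilla"},
    "upazila": {"Savar", "Daudkandi"},
}
COUNTRIES = {"Bangladesh": "Dhaka", "India": "New Delhi"}

def tag_place(word):
    """Return the level of the place name, or None if it is not in the table."""
    for level, names in PLACES.items():
        if word in names:
            return level
    if word in COUNTRIES or word in COUNTRIES.values():
        return "country_or_capital"
    return None

print(tag_place("Savar"))  # 'upazila'
```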
The experimental result for the word-tagging success rate is given in Table 1 for each phase. The experiment has been conducted on 32,143 words from 200 test documents.
In the results of special tagging (shown in Table 2), some kinds of words, namely acronyms, initials, and numerical figures from digits and words, have been identified with 100% accuracy. The procedures follow specific patterns to identify these rather than relying on a limited number of predefined words, and these patterns are the main reason for the 100% success rate. Some kinds of words, however, cannot be identified completely: occupations have been identified at 96%, organization names (by considering acronyms) at 85%, and names of humans and places at 100% and 91%, respectively. The procedures utilize lists of predefined words to identify occupations, organization names, and names of humans and places. From the evaluated 200 documents, we counted the pronouns manually and cross-checked them with our program. The numbers of pronouns and the results of pronoun replacement are given in Table 3.
4 Conclusion
Natural language processing (NLP) is one of the most important elements of human–machine interaction and of machine learning systems. Over 27 crore (270 million) people worldwide use Bengali as their first and mother language, and it has its own writing system, so it is very important to process Bengali for natural language processing. In this research work, we have demonstrated an upgraded parts-of-speech (POS) tagging system for the Bengali language, in which a special tagging system is used alongside the general grammatical parts of speech based on
768 B. Jahan et al.
many different features: suffixes for verbs, where we obtain a 68% success rate, as well as place names, occupation names, Bengali person names, Bengali repeated words, Bengali digits in both written and numeric form, English acronyms, and organization names in both cases. The success rate of tagging is 70% for general tagging and 76% for special tagging, which is the highest reported so far. This tagging system can be used for Bengali language processing (BLP) tasks such as Bengali sentiment analysis and Bengali text summarization.
References
1. Azmi, A.M., Al-Thanyyan, S.: A text summarizer for Arabic. J. Comput. Speech Lang. 26(4), 260–273 (2012)
2. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
3. Indian Statistical Institute: A Lexical Database for Bengali 2015 [Online]. Available https://
www.isical.ac.in/∼lru/wordnetnew/index.php/site/aboutus. Accessed 28 Oct 2015
4. Chakma, R. et al.: Navigation and tracking of AGV in warehouse via wireless sensor network.
In: 2019 IEEE 3rd International Electrical and Energy Conference (CIEEC), pp. 1686–1690.
Beijing, China, 2019. https://doi.org/10.1109/CIEEC47146.2019.CIEEC-2019589
5. Milu, S.A., et al.: Sentiment analysis of Bengali reviews for data and knowledge engineering:
a Bengali language processing approach. In: Bindhu, V., Chen, J., Tavares, J. (eds.) Interna-
tional Conference on Communication, Computing and Electronics Systems. Lecture Notes in
Electrical Engineering, vol. 637. Springer, Singapore (2020). https://doi.org/10.1007/978-981-
15-2612-1_8
6. Notes for Students: Rule Based System, Nov 2000 [Online]. Available https://www.jpaine.org/
students/lectures/lect3/node5.html. Accessed 01 Apr 2017
7. Gpedia: Gpedia, your encyclopaedia [Online]. Available www.gpedia.com/bn. Accessed 25
June 2016
8. BdJobs.com: Occupation in Bangladesh, Name of Occupation in Largest Job Site in
Bangladesh, Feb 2016 [Online]. Available https://bdjobs.com. Accessed 25 June 2016
9. Emon, I.S., Ahmed, S.S., Milu, S.A., Mahtab, S.S.: Sentiment analysis of Bengali online
reviews written with English letter using machine learning approaches. In: Proceedings of the
6th International Conference on Networking, Systems and Security (NSysS ’19). Association
for Computing Machinery, New York, NY, USA, pp. 109–115 (2019). https://doi.org/10.1145/
3362966.3362977
10. Khan, M.F.S., Mahtab, S.S.: PLC based energy-efficient home automation system with smart
task scheduling. In: 2019 IEEE Sustainable Power and Energy Conference (iSPEC), pp. 35–38.
Beijing, China, 2019. https://doi.org/10.1109/iSPEC48194.2019.8975223
11. Ahmed, S.S., et al.: Opinion mining of Bengali review written with English character using
machine learning approaches. In: Bindhu, V., Chen, J., Tavares, J. (eds.) International Confer-
ence on Communication, Computing and Electronics Systems. Lecture Notes in Electrical
Engineering, vol. 637. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-261
2-1_5
12. Mahtab, S.S., Monsur, A., Ahmed, S.S., Chakma, R., Alam, M.J.: Design and optimization of
perovskite solar cell with thin ZnO insulator layer as electron transport. In: 2018 International
Conference on Advancement in Electrical and Electronic Engineering (ICAEEE), pp. 1–4.
IEEE, Gazipur, Bangladesh (2018). https://doi.org/10.1109/ICAEEE.2018.8643012
A High Performance Pipelined Parallel
Generative Adversarial Network
(PipeGAN)
1 Introduction
Generative Adversarial Networks [1] are gaining popularity with extensive use in
the generation of realistic images of objects, people, and scenes among others. Other
applications of GANs include image-to-image translation tasks such as translating
photos of night to day and autumn to spring. A GAN is usually composed of two
contending neural networks co-trained at the same time, involving forward and back-
ward propagation. The co-training, though effective, suffers from the curse of tight
dependencies posing a significant parallelism challenge and is not amenable to tra-
ditional data and task parallelization techniques. Pipeline parallelization divides the work into different subtasks that constitute pipeline stages, thereby overlapping execution as data passes through the stages. If the stages are kept sufficiently balanced, this style of parallelism offers good speed-up even in the presence of dependencies that render data and task parallelism ineffective. Pipeline parallelism is ideal for accelerating GANs, which involve thousands of batches in each epoch, by overlapping batches.
In this paper, we describe and experimentally evaluate our novel and sophisticated
pipelined parallel implementation of GANs that accurately classify and generate new
images with significant speed-up. Our main contributions are as follows:
1. Design and implementation of a novel and sophisticated pipelined parallel imple-
mentation of GANs.
2. Detailed experimental evaluation of our scheme on the standard MNIST and Fashion MNIST datasets, demonstrating speed-ups of up to 30% with an average speed-up of 20%.
The rest of the paper is organized as follows. We discuss related work in Sect. 2,
followed by our methodology and pipelined parallel algorithm in Sect. 3. We discuss
our implementation in Sect. 4 followed by a hard-nosed experimental evaluation of
our technique in Sect. 5. We present the results in Sect. 6 and conclude in Sect. 7.
2 Related Work
PipeDream [2] parallelizes deep neural networks by separating model layers into stages, using an optimization algorithm to map each stage to a GPU. Every stage performs forward propagation and passes the result to the next stage, and the loss is calculated at the final stage. This loss is propagated backward, and the weights of the model are updated. PipeDream pipelines the training of internal layers; however, it does not pipeline the training process across multiple networks.
GPipe [3] proposes a batch splitting algorithm for fast training of large scale Deep
Neural Networks. During forward propagation, mini-batches are divided into smaller
batches and are pipelined over accelerators. Similarly, during backward propagation,
the gradients from each smaller batch are aggregated for every mini-batch to update
the parameters used in the model. Although this technique attempts pipelined parallelism in deep neural networks, GPipe does not consider the adversarial settings peculiar to GANs.
Another approach specifically parallelizes the training of Convolutional Neural Networks (CNNs) [4] and strives to overlap computation and communication. It spawns a thread once the gradients are computed, followed by concurrent communication of data generated during backpropagation and computation from different layers. This approach is not focused on GANs.
A High Performance Pipelined Parallel Generative … 771
All the earlier techniques are specific to pipelining or parallelizing within a single neural network and do not take into consideration adversarial settings like that of a GAN, which separates our work from earlier proposals. Earlier techniques, as discussed, focus mostly on performance enhancements while training the internal layers of the neural network. Our technique specifically resolves the difficult challenge of pipelining the training process itself.
3 Methodology
3.1 Training
Figure 1 shows various stages of computation and evaluation used in the training of
the GAN with input training data of images split into multiple mini-batches of size
d. These stages with input and output are outlined as follows:
• Discriminator Real Forward Propagation: Takes a mini-batch of real images and
outputs probability Preal that differentiates real from fake images.
• Gradient Compute Discriminator Real: Takes the probability Preal and determines the loss using the function in (1). The computed loss is then passed to a cost function used to compute gradients for each discriminator layer.
• Discriminator Weight Update Real: The discriminator’s weights are updated using
gradients.
• Generator Fake Forward Propagation: Forward propagation of the generator is
performed by transforming a random noise into d images. These images are labeled
as Fake Images.
• Discriminator Fake Forward Propagation: Fake Images are passed to the discrim-
inator which emits a probability Pfake .
• Gradient Compute Discriminator Fake: Gradients are computed using the proba-
bility Pfake based on the loss function specified in (2).
• Discriminator Weight Update Fake: Weights are updated based on gradients com-
puted in the Gradient Compute Discriminator Fake stage.
• Generator Forward Propagation: A new set of fake images is generated using the
same noise used to train the discriminator in order to train the generator.
• Discriminator Generator Forward Propagation: These images are passed to dis-
criminator to emit probabilities.
• Gradient Compute Generator: Probabilities are used to calculate gradients.
• Generator Weight Update: Generator weights are updated based on the gradients
computed.
A total of 40 epochs are performed on the MNIST and Fashion MNIST datasets as
the generated images reach a desirable quality at this point.
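The loss functions (1) and (2) referenced in the stages above are not reproduced in this excerpt; assuming they are the standard GAN binary cross-entropy terms, the three gradient-compute stages can be sketched as follows (all names are illustrative):

```python
import numpy as np

def d_loss_real(p_real):
    """Discriminator loss on a mini-batch of real images."""
    return -np.mean(np.log(p_real + 1e-12))

def d_loss_fake(p_fake):
    """Discriminator loss on generated (fake) images."""
    return -np.mean(np.log(1.0 - p_fake + 1e-12))

def g_loss(p_fake):
    """Generator loss: push the discriminator's output on fakes towards 1."""
    return -np.mean(np.log(p_fake + 1e-12))

p_real = np.array([0.9, 0.8])  # discriminator confident reals are real
p_fake = np.array([0.1, 0.2])  # discriminator confident fakes are fake
print(round(float(d_loss_real(p_real)), 3),
      round(float(d_loss_fake(p_fake)), 3),
      round(float(g_loss(p_fake)), 3))
```

In the pipeline, each of these losses feeds the corresponding gradient-compute stage, whose output drives the matching weight-update stage.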
To implement pipelining in the GAN training process, the training data is split into mini-batches, and the 11-stage pipeline described in Sect. 3.1 is used for training. The overlapped executions of the stages are interconnected in a pipeline structure to provide efficiency. Figure 2 shows the pipelined structure using a timing diagram
for the abbreviated stages. A column denotes a point in time along with the units
running at that instant. As multiple units run simultaneously, pipeline parallelism
is harnessed. At the beginning, training data consisting of the images are split into
mini-batches which are maintained in a queue called the BatchQ. A mini-batch from
this queue is passed to stage 1 and in the next time step, it is propagated to stage 2.
Simultaneously, the next mini-batch enters stage 1 and so on. The pipeline is full and
runs at maximum capacity after 10 such iterations. The first 10 iterations are marked
by the HandleStart() function in Algorithm 1. When all functional units are busy,
mini-batches are sent to the functional unit queue one after another in a pipelined
fashion. Once the mini-batch finishes all the stages of the functional unit queue, that
mini-batch is popped from the BatchQ. This sequence of steps is repeated e times.
Dependency analysis is performed on the kernel part of the pipeline; the results are shown in Fig. 3. Nodes marked in grey are weight updates, which cause Read-After-Write (RAW) and Write-After-Write (WAW) dependencies. These dependencies arise during shared weight updates and cause conflicts, which are resolved by explicit synchronization mechanisms such as locks or mutexes.
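The queue-driven pipeline and the lock-based resolution of the RAW/WAW conflicts can be sketched as a toy three-stage pipeline; the stage bodies, names, and depth are simplifications of the 11-stage design, not the authors' code:

```python
import queue
import threading

weights = {"w": 0.0}
weight_lock = threading.Lock()

def forward(batch):   # stand-in for a forward-propagation stage
    return batch * 2

def grad(batch):      # stand-in for a gradient-compute stage
    return batch + 1

def update(batch):    # weight update guarded against RAW/WAW conflicts
    with weight_lock:
        weights["w"] += batch
    return batch

STAGES = [forward, grad, update]

def run_pipeline(batches):
    qs = [queue.Queue() for _ in range(len(STAGES) + 1)]
    def worker(stage, qin, qout):
        while True:
            item = qin.get()
            if item is None:   # sentinel: propagate and stop
                qout.put(None)
                return
            qout.put(stage(item))
    threads = [threading.Thread(target=worker, args=(s, qs[i], qs[i + 1]))
               for i, s in enumerate(STAGES)]
    for t in threads:
        t.start()
    for b in batches:          # BatchQ feeding stage 1
        qs[0].put(b)
    qs[0].put(None)
    done = []
    while (item := qs[-1].get()) is not None:
        done.append(item)
    for t in threads:
        t.join()
    return done

result = run_pipeline([1, 2, 3])
print(result)  # each mini-batch has passed through all three stages in order
```

Because each stage runs in its own thread and pulls from its own queue, stages overlap on different mini-batches, while the lock serializes the conflicting shared weight updates.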
4 Implementation
We have used the standard Modified National Institute of Standards and Technology (MNIST) dataset, which contains a large number of handwritten digits, as well as the Fashion MNIST dataset of 28 × 28 grayscale images from 10 classes of apparel types, to further corroborate our results. Figures 4 and 5 depict images generated using these datasets.
Serial, non-pipelined parallel, and pipelined parallel versions of the GAN are implemented in Python. The non-pipelined parallel version exploits vectorization and data parallelism wherever possible, along with concurrent execution in the weight-update phases. Two pipelined versions, using NumPy and PyTorch, implement the pipelined architecture explained earlier. The pipelined NumPy version has the advantage of not requiring backward compatibility, and the pipelined PyTorch implementation generates quality MNIST and Fashion MNIST images at even higher performance.
5 Experimental Evaluation
The experimental setup consists of a GPU-enabled Google Colab instance with 12.72 GB RAM, 68.4 GB disk space, and a Python 3 Google Compute Engine backend whose GPU has multiple cores to leverage pipelined parallel execution. The implementation of the non-pipelined parallel version shows that the overhead of spawning threads not only negates any possible benefit but makes it take more time than the serial version.
776 R. Chandan et al.
Fig. 7 Execution times of pipelined and serial NumPy and PyTorch versions
Figure 7 depicts the speed-up obtained by our pipelined NumPy and PyTorch implementations compared to the respective serial versions. We observe significant performance gains of up to 30%, with an average speed-up of 23%, for NumPy. The serial PyTorch GAN implementation has some pre-existing optimizations, resulting in a relatively lower speed-up compared to NumPy. However, our pipelined PyTorch version still gained significantly in performance, with a maximum speed-up of 23% and an average speed-up of 15% compared to the already-optimized serial PyTorch version.
7 Conclusion
References
1. Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A.,
Bengio, Y.: Generative adversarial nets. Adv. Neural Inf. Process. Syst. 27, 2672–2680 (2014)
2. Narayanan, D., Harlap, A., Phanishayee, A., Seshadri, V., Devanur, N.R., Ganger, G.R., Gibbons,
P.B., Zaharia, M.: PipeDream: generalized pipeline parallelism for DNN training. In: Proceed-
ings of the 27th ACM Symposium on Operating Systems Principles (SOSP ’19). Association
for Computing Machinery, New York, NY, USA, pp. 1–15 (2019)
3. Huang, Y., Cheng, Y., Chen, D., Lee, H., Ngiam, J., Le, Q.V., Chen, Z.: GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (2018)
4. Lee, S., Jha, D., Agrawal, A., Choudhary, A., Liao, W.: Parallel deep convolutional neural
network training by exploiting the overlapping of computation and communication. In: 2017
IEEE 24th International Conference on High Performance Computing (HiPC), Jaipur, pp. 183–
192 (2017)
Electroencephalogram-Based
Classification of Brain Disorders Using
Artificial Intelligence
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 779
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_82
780 L. Raja and R. Santhosh
sleep disorders and brain death, and to monitor the depth of anesthesia. Most EEG signals range between 1 and 20 Hz and comprise frequency bands known as alpha, beta, theta, and delta [2].
In this study, EEG datasets of people with various brain disorders were obtained and used. The EEG recordings were acquired by placing scalp electrodes according to the International 10–20 system (Fig. 1). The brain lies inside the skull, which is covered by the scalp [3]. All cells in the human body have a resting membrane potential and can produce electrical signals, so an EEG records not only the brain's electrical activity but also extra-neuronal signals (Fig. 2); these are termed artifacts.
As a result, pre-processing (filtering) was used to remove unwanted noise from the EEG data, which lies in the range of about 0.4–100 Hz. Moreover, electrical or power-line interference is usually present at around 50 Hz [4, 5]; this was removed using a notch filter. After pre-processing, the dual-tree complex wavelet transform (DTCWT) was used to extract features from the signals, and a Gaussian mixture model (GMM) classifier was used to classify the EEG signals and identify the brain disorders.
Fig. 1 International 10–20 system of placing EEG electrodes and composite signal of EEG rhythms
Fig. 2 International 10–20 system of placing EEG electrodes and composite signal of EEG rhythms
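The 50 Hz notch-filtering step can be sketched with SciPy's `iirnotch` design; the sampling rate, quality factor, and synthetic signal below are illustrative assumptions, not values from the paper:

```python
import numpy as np
from scipy import signal

fs = 256.0          # assumed sampling rate in Hz
f0, Q = 50.0, 30.0  # notch frequency and quality factor

# Design the 50 Hz notch and apply it with zero-phase filtering.
b, a = signal.iirnotch(f0, Q, fs)

t = np.arange(0, 2.0, 1.0 / fs)
eeg = np.sin(2 * np.pi * 10 * t)                 # stand-in 10 Hz rhythm
noisy = eeg + 0.5 * np.sin(2 * np.pi * f0 * t)   # add power-line interference
clean = signal.filtfilt(b, a, noisy)

def band_power(x, freq):
    """Magnitude of the FFT bin closest to `freq`."""
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    return spectrum[np.argmin(np.abs(freqs - freq))]

# Power at 50 Hz should drop sharply while the 10 Hz component survives.
print(band_power(noisy, 50.0), band_power(clean, 50.0))
```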
Electroencephalogram-Based Classification … 781
The wavelet transform converts a signal into approximation and detail coefficients. Data compression, motion estimation, classification, and denoising are some of the problems that can be addressed using wavelet transforms. The wavelet transform helps preserve the symmetry, smoothness, and shape of the signal, which is important for obtaining correct coefficients. The DTCWT applies a pair of wavelet filters (high-pass and low-pass) at each scale, yielding the real and imaginary parts of complex wavelet coefficients, a property applied in pattern recognition and signal processing [8, 9]. When the real signals need to be further processed or transformed, an inverse transform of the same length is required, and this transform provides it. A representation of this method is given in Fig. 5, and the denoised EEG signals are given in Fig. 6a and b.
The two basic categories of classifiers are deterministic and statistical classifiers. Deterministic classifiers initialize the unlabeled parameters, and the search takes place only in the
search space [10]. Statistical classifiers, on the other hand, consider only threshold values of density functions. Here, we deal with the Gaussian mixture model, an unsupervised learning method with which we can find patterns without using class labels [11, 12].
The Expectation-Maximization (EM) technique is used to estimate how many data points belong to each cluster [13–15]; the cluster means and covariances are then calculated from these assignments, so a covariance is produced for each cluster of the signals used. By partitioning these clusters, we can differentiate the signals and find the classification pattern (see Fig. 7). As a result, the mean and covariance matrix are developed, which helps to analyze the independent Gaussian distributions. Using these patterns, we can eventually distinguish the EEG signals of people with brain disorders from those of normal people.
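The EM procedure described above can be sketched as a minimal one-dimensional, two-component Gaussian mixture fitted in NumPy; the data here is synthetic, not EEG features, and in practice a library such as scikit-learn's `GaussianMixture` would be used:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated synthetic clusters standing in for EEG feature values.
x = np.concatenate([rng.normal(-4, 1, 300), rng.normal(4, 1, 300)])

def fit_gmm(x, iters=50):
    mu = np.array([x.min(), x.max()])   # crude initialization
    var = np.array([1.0, 1.0])
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        dens = (pi / np.sqrt(2 * np.pi * var) *
                np.exp(-(x[:, None] - mu) ** 2 / (2 * var)))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and variances from responsibilities
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var

pi, mu, var = fit_gmm(x)
print(np.round(mu, 1))  # the two means should land near -4 and 4
```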
5 Conclusion
The scope of this paper is the real-time analysis of people with brain disorders. With an exponentially increasing population and correspondingly increasing disorders, it is very
difficult to expect human diagnosis and personal inspection for each person. Moreover, some minute patterns will be missed by the human eye owing to the extent of the data for each person. As a result, analyzing the EEG signals of such people appears to be the best solution. In this paper, we have concluded that the combination of DTCWT and a Gaussian mixture model classifier is best suited for this purpose.
References
1. Teplan, M.: Fundamentals of EEG measurement. Measur. Sci. Rev. 2, Section 2 (2002)
2. Britton, J.W., Frey, L.C., Hopp, J.: Electroencephalography (EEG): an introductory text and
atlas of normal and abnormal findings in adults, children, and infants. American Epilepsy
Society. Chicago (2016)
3. Fahmie, M., Bin, I., Rodzi, M.: EEG Acquisition Using LabVIEW. Faculty of Electronics and Communication Engineering, Kolej University Teknikal Kebangsaan, Malaysia, May 2006
4. Adalarasu, K.: Detection of early onset of driver fatigue using multimodal bio signal. Department of Biotechnology, Indian Institute of Technology, Chennai, India, February 2010
5. Arman, S.I., Ahmed, A., Syed, A.: Cost-effective EEG signal acquisition and recording system.
Int. J. Biosci. Biochem. Bioinform. 2(5) (2012)
6. Khatwani, P., Tiwari, A.: A survey on different noise removal techniques of EEG signals. Int.
J. Adv. Res. Comput. Commun. Eng. 2(2). ISSN 2319-5940 (2013)
7. Gurumurthy, S., VudiSai Mahit, Ghosh, R.: Analysis and simulation of brain signal data by
EEG signal processing technique using MATLAB. Int. J. Eng. Technol. (IJET) 5(3), ISSN
0975-4024 (2013)
8. Kingsbury, N.: The dual tree complex wavelet transform: a new technique for shift invariance
and directional filters. University of Cambridge, Cambridge CB2 1PZ
9. Slimen, I.B., Boubchir, L., Mbarki, Z., Seddik, H.: EEG epileptic seizure detection and clas-
sification based on dual-tree complex wavelet transform and machine learning algorithms. J.
Biomed. Res. 34(3), 151–161. https://doi.org/10.7555/JBR.34.20190026
10. Cao, M.: Practice on classification using Gaussian mixture model. Course project report for COMP-135 (2010)
11. Lakshmi, R., Prasad, T.V., Prakash, C.: Survey on EEG signal processing methods. Int. J. Adv.
Res. Comput. Sci. Softw. Eng. 4(1). ISSN 2277-128X (2014)
12. Raj, A., Deo, A., Kumari, M., Tripathi, S.: A review on automated detection, classification
and clustering of epileptic EEG using wavelet transform and soft computing techniques. Int. J.
Innov. Res. Sci. Eng. 17. ISSN 2347-320 (2016)
13. Patel, R.: A real-time frequency analysis of the electroencephalogram using LabVIEW. Master of Science Thesis, Department of Biomedical Engineering, New Jersey Institute of Technology, January 2002
14. Varunadikkarapatti, V.: Optimal EEG channels and rhythm selection for task classification. Master of Science Thesis, Madras University, India (2004)
15. Raja, L., Arunkumar, B.: A comparative study of various artificial neural network classifiers for
EEG based autism spectrum disorder diagnosis. J. Adv. Res. Dyn. Control Syst. 11(1) (2019)
Parallel Antlion Optimisation (ALO)
and Grasshopper Optimization (GOA)
for Travelling Salesman Problem (TSP)
1 Introduction
The Travelling Salesman Problem (TSP) is to find the shortest path that goes through
all the cities and returns to the first city in the path, given the direct path length
between each pair of cities. This problem is NP-hard, and no exact polynomial-time algorithm is known for it, so many heuristic and meta-heuristic algorithms strive to find near-optimal solutions. Meta-heuristic algorithms iteratively generate a vast number of random solutions and search for the global optimum.
Ant Colony Optimisation (ACO) [5] and Genetic Algorithm (GA) [4] have been
applied widely to solve TSP. In this paper, we propose our adapted ALO and GOA
algorithm to solve TSP. We have also developed exclusive parallel versions of these
algorithms and have evaluated our algorithm compared to earlier proposed ACO
and GA. Our experimental evaluation reveals that our proposed ALO and GOA algorithms are faster and more accurate in finding a solution to TSP, even for a large number of cities. Our main contributions are as follows:
1. We adapted ALO and GOA targeting TSP and this, to the best of our knowledge,
is the first attempt in this direction.
2. We have additionally developed and implemented parallel versions of our ALO
and GOA for TSP.
3. We performed a detailed hard-nosed experimental evaluation of the accuracy of
our algorithms as well as speedup of the parallel version of our ALO and GOA
that shows a significant gain in both accuracy and performance as compared to
the earlier state-of-the-art ACO and GA algorithms.
The rest of the paper is organized as follows. We explain related work in Sect. 2,
followed by our ALO and GOA algorithm as adapted to solve TSP in Sects. 3
and 4 respectively including parallel versions of these algorithms. We analyze
our results in Sect. 5 and conclude in Sect. 6 with mention of future directions for
this work.
2 Related Work
The Held-Karp dynamic programming solution [1, 2] brings the O(n!) time complexity down to O(n^2 2^n) with space complexity of order O(n 2^n) and is one of the best-known exact solutions to TSP. However, the algorithm is still exponential, with space requirements that grow exponentially in the number of cities, so this solution quickly becomes prohibitive even for 25–30 cities. Therefore, meta-heuristics such as ACO [5] and GA [4] have been applied to TSP.
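For reference, the Held-Karp recurrence can be sketched in a few lines of Python; the distance matrix below is illustrative:

```python
from itertools import combinations

def held_karp(dist):
    """Held-Karp DP for TSP: O(n^2 * 2^n) time, O(n * 2^n) space.
    dist is a distance matrix; city 0 is the fixed start/end of the tour."""
    n = len(dist)
    # dp[(S, j)] = shortest path visiting the set S (bitmask, always
    # containing city 0 and j) and ending at city j.
    dp = {(1 | (1 << j), j): dist[0][j] for j in range(1, n)}
    for size in range(3, n + 1):
        for subset in combinations(range(1, n), size - 1):
            S = 1
            for j in subset:
                S |= 1 << j
            for j in subset:
                prev = S ^ (1 << j)
                dp[(S, j)] = min(dp[(prev, k)] + dist[k][j]
                                 for k in subset if k != j)
    full = (1 << n) - 1
    return min(dp[(full, j)] + dist[j][0] for j in range(1, n))

dist = [[0, 1, 15, 6],
        [2, 0, 7, 3],
        [9, 6, 0, 12],
        [10, 4, 8, 0]]
print(held_karp(dist))  # -> 21 (tour 0-1-3-2-0)
```

The dictionary keyed by (bitmask, last city) makes the exponential space cost visible: it holds an entry for every subset/end-city pair, which is what makes the method prohibitive beyond roughly 25–30 cities.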
GA evolves a collection of possible solutions, known as phenotypes, towards a better solution by altering and rearranging solutions and selecting the fittest solutions for the next generation based on the principle of "survival of the fittest". However, GA tends to converge prematurely towards local optima, thereby missing the optimal solution. ACO exploits an ant's capability to find the shortest path to a destination containing food, with each path travelled by an ant associated with a pheromone trail that aids in path tracing. The intensity of the pheromone trail is proportional to the quality of a sub-path, a property that is used to build better solutions as each ant decides which sub-path to take next. However, ACO does not scale well and is therefore not efficient for large-scale combinatorial optimization problems such as solving TSP for a large number of cities.
ALO mimics the interaction between antlions and their prey, ants, using various random operators to find optimal solutions. ALO has recently been used to successfully solve a variety of engineering problems, including training neural networks. GOA is a population-based algorithm that mimics the movement of a swarm of grasshoppers and their social interaction. Each grasshopper in the swarm represents a solution to the optimization problem. In contrast to earlier applications of
Parallel Antlion Optimisation (ALO) and Grasshopper … 789
ALO and GOA, which are primarily in the context of continuous optimization problems, we have adapted and parallelized ALO and GOA targeting discrete optimization problems.
Compared to other meta-heuristic algorithms, ALO makes use of random operators to avoid local optima. In this paper, we propose a novel discrete version that uses an array of cities to model the TSP path, in contrast to the proposal in [3], which targets the optimization of continuous variables and cannot be used to solve TSP.
In our implementation, as outlined in Algorithm 1, each ant and antlion is a
permutation of the cities, and fitness is the total path length of the array of cities.
Ants randomly walk in a bounded area (search space) represented as an array of
numbers which are initialized to the maximum distance between two cities (upper
bound). We propose “random permutation of cities” as the random walk. Since a
random permutation can lead to the ant going out of the search space, we also
propose a new normalization function that tries to bring the maximum number of
distances between cities within the search space. The normalization function iterates
through the cities in the path. If the distance between any two cities i and j is greater
than the upper bound, then a city k is found such that the distance between i and k
is less than the distance between i and j, and following this, k and j are swapped.
This process repeats until the end of the array is reached. During the random walk, ants can sometimes fall into an antlion's trap; this is simulated using a roulette wheel to select the antlion that traps the ant, with more preference given to fitter antlions. To simulate elitism, a salient feature of ALO, the antlion with the minimum fitness (minimum path length) is taken as the best solution. Since the elite antlion makes the best traps, affecting all ants,
after the random walk of an ant, the position is updated taking into consideration
both the elite antlion and the antlion which was randomly selected using a roulette
wheel. This updated path can sometimes violate the TSP path by having a city more
than once in the path. So a function was developed to correct the path. To simulate
the sliding of the ants towards the antlion, the search space is gradually reduced as
the number of iterations increases.
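The random walk and normalization function described above can be sketched as follows; where exactly the replacement city k is searched for is an assumption, since Algorithm 1 is not reproduced in this excerpt:

```python
import random

def random_walk(n):
    """The 'random permutation of cities' used as the ant's random walk."""
    path = list(range(n))
    random.shuffle(path)
    return path

def normalize(path, dist, upper_bound):
    """When consecutive cities i, j are farther apart than the current upper
    bound, swap in a later city k that is closer to i."""
    path = list(path)
    for p in range(len(path) - 1):
        i, j = path[p], path[p + 1]
        if dist[i][j] > upper_bound:
            for q in range(p + 2, len(path)):   # search later positions
                k = path[q]
                if dist[i][k] < dist[i][j]:
                    path[p + 1], path[q] = path[q], path[p + 1]
                    break
    return path

def fitness(path, dist):
    """Total tour length, including the return edge."""
    return sum(dist[path[p]][path[(p + 1) % len(path)]]
               for p in range(len(path)))

dist = [[0, 2, 9, 4], [2, 0, 6, 3], [9, 6, 0, 5], [4, 3, 5, 0]]
path = normalize([0, 2, 1, 3], dist, upper_bound=5)
print(path, fitness(path, dist))
```

Because the normalization only swaps cities, the result is always a valid permutation, so every walk remains a legal TSP tour.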
The overall process, as described in Algorithm 2, is as follows. First, the grasshoppers are initialized with a random permutation of cities. In addition, a new grasshopper-update function for TSP is introduced, modifying the one proposed in [6]. The update of a grasshopper's position depends on three criteria:
(a) the current position of the grasshopper
(b) a random grasshopper’s position, and
(c) the best grasshopper’s position.
The random grasshopper’s position simulates the function s, which calculates the
social forces as in [6]. The main interactions between grasshoppers are attractive and
repulsive forces. These interactions are modelled as a comfort zone around every
grasshopper where the repulsive force is greater than the attractive force. A parameter
c is used to represent this comfort zone. Initially, this parameter is high, allowing the
grasshoppers to explore large parts of the search space. Over the iterations, its value
is reduced, leading to the movement and convergence of the grasshoppers.
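A minimal sketch of the three-criteria update and the shrinking comfort-zone parameter c might look as follows. The city-by-city probabilistic mixing and the linear schedule are illustrative assumptions, not the authors' exact operators.

```python
import random

def c_schedule(t, max_iter, c_max=1.0, c_min=1e-4):
    """Linearly shrink the comfort-zone parameter c over the iterations
    (an assumed schedule; the paper only states that c is reduced)."""
    return c_max - t * (c_max - c_min) / max_iter

def update_grasshopper(current, rand_gh, best, c):
    """Combine the three criteria: with probability c the next city is taken
    from a random grasshopper (exploration), otherwise from the best one
    (exploitation). Cities already placed are replaced by the first unused
    city so the result stays a valid permutation."""
    n = len(current)
    new_path, used = [], set()
    for i in range(n):
        guide = rand_gh if random.random() < c else best
        city = guide[i]
        if city in used:  # keep the path a valid TSP tour
            city = next(x for x in current if x not in used)
        new_path.append(city)
        used.add(city)
    return new_path
```

With a high c the update copies mostly from the random grasshopper (wide exploration); as c shrinks, positions are pulled toward the best grasshopper, mirroring the convergence behaviour described above.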
The parallelization strategy used for GOA is similar to that of ALO, parallelizing both the initialization of the search agents and the fitness calculation.
5 Performance Evaluation
Figure 1 compares the accuracy of ALO, GOA, ACO and GA against the
Held-Karp dynamic programming algorithm for an increasing number of
cities. We observe that our GOA and ALO algorithms achieve the best accuracy,
whereas the genetic algorithm (GA) performs the worst.
Figure 2 depicts the speedup of the parallel versions of ALO, GOA, ACO and GA
over the corresponding serial versions. Our parallel ALO and GOA algorithms are
on average 1.5× and 4× faster, respectively, than their serial versions on our
system with a Ryzen 7 1700 8-core CPU, 16 GB of memory and an Nvidia GTX
1050 Ti GPU. There is little speedup for most of the algorithms on a small
number of cities because the overhead of thread spawning cancels out the
benefits of parallelism.
In this paper, we presented our adapted ALO and GOA algorithms for TSP, along
with purpose-built, high-performance parallel versions of these algorithms. We
implemented serial and parallel versions of ALO and GOA and experimentally
evaluated their accuracy and performance against other well-known
algorithms. Our experimental results revealed that our ALO and GOA algorithms per-
Fig. 2 Comparison of time taken by serial and parallel version of algorithms on Ryzen 7 with GTX
1050Ti
References
1. Bellman, R.: Dynamic programming treatment of the travelling salesman problem. J. ACM
(JACM) 9(1), 61–63 (1962)
2. Held, M., Karp, R.M.: A dynamic programming approach to sequencing problems. J. Soc. Ind.
Appl. Math. 10(1), 196–210 (1962)
3. Mirjalili, S.: The ant lion optimizer. Adv. Eng. Softw. 83, 80–98 (2015)
4. Mitchell, M.: An Introduction to Genetic Algorithms. MIT Press, Cambridge, MA, USA (1996)
5. Moyson, F., Manderick, B.: The Collective Behavior of Ants: An Example of Self-organization
in Massive Parallelism. Vrije Universiteit Brussel, Artificial Intelligence Laboratory (1988)
6. Saremi, S., Mirjalili, S., Lewis, A.: Grasshopper optimisation algorithm: theory and application.
Adv. Eng. Softw. 105, 30–47 (2017). https://doi.org/10.1016/j.advengsoft.2017.01.004
Design and Development of Machine
Learning Model for Osteoarthritis
Identification
Abstract In the human body, the calcaneus, or heel bone, is one of the
strongest and largest bones of the foot. It gives the foot the flexibility needed for
normal walking movements. In recent years, many people in the age group of 35
to 50 years have fallen victim to osteoarthritis (calcaneal shift), which has serious
and continuing impacts. Such diseases lead to ailments of the knee. If the
ache in the knee is not relieved by physiotherapy or medication, the affected person
may be confined to bed and has to undergo a calcaneal osteotomy. This motivates the
development of a user-friendly application to load and run calcaneus images in
the application model developed for calcaneal osteotomy. In general, machines are
available to predict the occurrence of a calcaneal shift, but they fail to predict and analyze
the other subtypes of the calcaneal shift in the foot. This research work focuses on
developing a model to predict and analyze the subtypes of calcaneal shift occurring
in the foot.
N. S. K. Babu (B)
Department of Computer Applications, Career Point University, Kota, India
e-mail: kiranbabu.naidu16@gmail.com
E. M. Reddy
Department of CSE, Guru Nanak Institutions Technical Campus, Hyderabad, India
e-mail: e_mreddy@yahoo.com
S. Jayanthi
Department of IT, Guru Nanak Institute of Technology, Hyderabad, India
e-mail: drsjayanthicse@gmail.com
K. Rajkumar
School of Computer Science and Information Technology, DMI-St John the Baptist University,
Mangochi, Malawi
e-mail: rajkumarengg2020@gmail.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering,
Lecture Notes in Networks and Systems 171,
https://doi.org/10.1007/978-981-33-4543-0_84
1 Introduction
Fig. 1 External symptoms of pes planus: a collapse of the medial arch, b malalignment of
the hindfoot and c forefoot abduction
available to treat flatfoot without harming the patient. Moreover, it is very tricky
to compare the different medical procedures directly, because of the
confounding variables generated by the treatments given to specific
patients [3–6].
However influential computational models are, they are
restricted to a certain degree by the robustness of their development and the rigor-
ousness of their validation. A number of issues, such as anatomical accu-
racy, tissue mechanical properties and biofidelic boundary conditions, arise
in computational models. Cadaveric models are efficient tools that have often been
adopted to investigate more invasive biomechanics. In cadaveric models,
in vitro lower limbs are loaded in physiological simulations, presenting an
efficient way to explore foot mechanics and diverse treatment methods [7–9].
They permit measuring parameters that would not be ethically
feasible to measure on living human beings. For example, quantifying motion with bone pins
is a very difficult task with living human beings. Although various studies have been
carried out on flatfoot, most of the research is quasi-static (or static),
examining the forefoot at midstance or at selected specific locations in stance.
Dynamic gait simulators are recently developed simulators that enable researchers to
perform completely dynamic simulations with cadaveric specimens [6, 10].
2 Literature Review
After an extensive review of the related work, it is observed that the aim of
the statistical models used is to enhance the efficiency of a computational foot
system by using artificial neural networks. Human joints are used in computational
foot modeling to observe joint deformation and kinematics and also to study
how joint function is affected by joint structure. More particularly, research works
carried out using these computational foot models have studied many topics,
including joint motion and the positions of related bones under simulated
load, the forces exerted on joints by injury or day-to-day routine activities, and
the hardware positioning for correcting the defect.
A. Evaluating calcaneal load from the footprint of a standing human using
a 3D scanner.
This work carried out extensive research into the relationship between the
footprint load and its depths in the calcaneal area of a human standing upright.
Footprint depths, i.e., the deformations in the calcaneal area, were obtained
through z-value extraction from 3D scans of the foot. In this obser-
vation, a force-sensing resistor was placed over the shoe in the calcaneal
area. Then, the peak loads were estimated from the footprint. To carry out these findings,
20 patients were selected [1, 11, 12].
The deep CNN model used for sentence classification has three filter region
sizes, 2, 3 and 4, with two filters each, as shown in Fig. 2. The filters in this model
perform convolution over the sentence matrix and produce feature maps of variable
length. Following this step, 1-max pooling is carried out over each of the
maps. From the six maps, univariate features are created and merged to formulate
the feature vector of the penultimate layer. Subsequently, this feature vector is sent as the
input to the final softmax layer to perform classification of the sentence;
here, as we perform binary classification, it produces two outputs, normal and
defected [10, 14].
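The convolution, 1-max pooling and softmax steps described above can be sketched in NumPy. The toy sentence matrix, the random filters and the softmax weights are placeholders introduced for illustration, not the trained model from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n_words, d = 7, 5                 # toy sentence matrix: 7 words, 5-dim embeddings
sentence = rng.standard_normal((n_words, d))

region_sizes = (2, 3, 4)          # three filter region sizes, as in the text
filters_per_size = 2              # two filters per size -> six feature maps

def conv_feature_map(sent, filt):
    """Slide an (h, d) filter over the sentence matrix one word at a time,
    producing a variable-length feature map."""
    h = filt.shape[0]
    return np.array([np.sum(sent[i:i + h] * filt)
                     for i in range(sent.shape[0] - h + 1)])

# convolution followed by 1-max pooling over each of the six maps
pooled = []
for h in region_sizes:
    for _ in range(filters_per_size):
        filt = rng.standard_normal((h, d))
        fmap = conv_feature_map(sentence, filt)
        pooled.append(fmap.max())          # 1-max pooling keeps one value per map
features = np.array(pooled)                # 6-dim penultimate feature vector

# final softmax layer for the binary decision (normal vs. defected)
W = rng.standard_normal((2, 6))
logits = W @ features
probs = np.exp(logits - logits.max())
probs /= probs.sum()
```

Note that each map has length `n_words - h + 1`, so the maps differ in length; 1-max pooling is what reduces them all to a fixed-size six-element vector for the softmax layer.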
3 Methodology
The research work on the proposed method is partitioned into four parts, namely
data collection, data preprocessing, fusion and feature extraction, and pattern
classification. First, in the data collection phase, images are collected from
individuals. In the second phase, a deep convolutional neural network is applied to
preprocess the images. In the third phase, important features are extracted from the images.
During the final phase, the softmax algorithm, a pattern recognition algorithm, is applied
to recognize patterns as either normal or defected.
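The four phases can be outlined as a minimal pipeline. Every function body here is a stand-in (random arrays instead of calcaneus images, toy statistics instead of CNN features, untrained softmax weights), since the actual data and trained network are not given in the text.

```python
import numpy as np

def collect(n_images=4, size=(8, 8), seed=0):
    """Phase 1: stand-in for loading calcaneus images (random arrays here)."""
    rng = np.random.default_rng(seed)
    return [rng.random(size) for _ in range(n_images)]

def preprocess(img):
    """Phase 2: normalize pixel intensities to zero mean, unit variance."""
    return (img - img.mean()) / (img.std() + 1e-8)

def extract_features(img):
    """Phase 3: toy feature vector (mean, std, max) standing in for CNN features."""
    return np.array([img.mean(), img.std(), img.max()])

def classify(features, w, b):
    """Phase 4: softmax over the two classes, normal vs. defected."""
    logits = w @ features + b
    e = np.exp(logits - logits.max())
    return e / e.sum()

images = collect()
rng = np.random.default_rng(1)
w, b = rng.standard_normal((2, 3)), rng.standard_normal(2)
probs = [classify(extract_features(preprocess(im)), w, b) for im in images]
```

Each image flows through the four phases in order; the output per image is a two-element probability vector, matching the binary decision described in the methodology.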
The goal is to develop a novel classification model which takes a collection of
sample test case inputs and provides reliable coverage at an identifiable depth in the test
space. This leads to a set of test cases focused on triggering the functionality
of the foot, independent of the model used for the implementation. This softmax model
based on a deep CNN can be divided into three steps.
1. Defining the operational scope of the softmax model. This step includes the data
acquisition and preprocessing stages of osteoarthritis/flatfoot identification.
2. Identifying and enumerating the attributes in the images and their values, respectively.
This step includes analyzing the preprocessed image, which is fed as input to the
CNN softmax algorithm.
3. Applying the deep CNN softmax model to the data/images and classifying the
images as either normal or defected (Fig. 3).
The detailed architectural model for the automatic detection of flatfoot images and the
prediction of results is shown in Fig. 4. It is similar to a digital stain, in that it identifies
the image region that is required and most relevant for classifying a foot as normal or
defected with respect to flatfoot. This research work proposes a model for the classifica-
tion of flatfoot using the CNN softmax algorithm. The efficacy of this model will be
tested on varying foot images to detect distortion. Its efficiency parameters will also
be evaluated by comparison with those of comparable algorithms.
Fig. 4 Architectural model for classification of calcaneal image using softmax algorithm
6 Conclusion
Calcaneal lengthening osteotomy is used for pain relief and for notable clinical and
radiographic modification of the forefoot and hindfoot in symptomatic pes planovalgus.
Different feeding techniques can be implemented to further enhance the model.
Further work can be carried out to detect the congenital anomalies related to this
calcaneal shift. The program can also be made to work with design
standards other than the Indian Standard by incorporating the necessary modifications.
References
1. Albon, T.: Plantar force distribution for increasing heel height within women’s shoes. Physics,
The College of Wooster, Wooster, Ohio, December 2011
2. Wibowo, D.B., Gunawan, D.H., Agus, P.: Estimation of foot pressure from human footprint
depths using 3D scanner. AIP Conf. Proc. 1717 (2016)
3. Wright, R.W., Boyce, R.H., Michener, T., Shyr, Y., McCarty, E.C., Spindler, K.P.: Radiographs
are not useful in detecting arthroscopically confirmed mild chondral damage. Clin. Orthop.
Relat. Res. 245–25 (2006)
4. Urry, S., Wearing, S.: Arch indexes from ink footprints and pressure platforms are different.
Foot 15(2), 68–73 (2005)
5. Hunt, A.E., Fahey, A.J.: Static measures of calcaneal deviation and arch angle as predictors of
rearfoot motion during walking. Aust. J. Phys. 46, 9–17 (2000)
6. Hsu, T.C., et al.: Comparison of the mechanical properties of the heel pad between young and
elderly adults. Arch. Phys. Med. Rehabil. 79, 1101–1104 (1998)
7. Barrett, S.L., O’Malley, R.: Plantar fasciitis and other causes of heel pain. Am. Fam. Phys.
15:59(8), 2200–2206 (1999)
8. Nass, D., Hennig, Treek, V.: The thickness of the heel pad loaded by bodyweight in obese
and normal weight adults. Biomechanics Laboratory, University of Essen, Germany, D 45117
(2000)
9. Pinto, C.C., Marques, M., Ramos, N.V., Vaz, M.A.P.: 3D modelling for FEM simulation of an
obese foot. ResearchGate. Conference Paper, January 2010
10. Filardi, V.: Flatfoot and normal foot a comparative analysis of the stress shielding. 15(3),
820–825 (2018)
11. Baldonado, M., Chang, C.-C.K., Gravano, L., Paepcke, A.: The stanford digital library metadata
architecture. Int. J. Digit. Libr. 108–121 (1997)
12. ScanPod3D: 3D Scanner Mini and Scansoft for Foot Orthotic. Vismach Technology Ltd. www.scanpod3d.com (2013)
13. Lee, D.G., Davis, B.L.: Assessment of the effects of diabetes on midfoot joint pressures using
a robotic gait simulator. Foot Ankle Int. 30(8), 767–772 (2009)
14. Kim, J.: Convolutional neural network (CNN) perform text classification with word embeddings. In: Towards Data Science, Dec 3, 2017
Author Index
B E
Babu, Brundha Rajendra, 317 Eapen, Justin, 297
Babu, Naidu Srinivas Kiran, 795 Emon, Ismail Siddiqi, 363, 761
Balakesava Reddy, P., 53
Bala, Shashi, 493
Begum, Gousiya, 269 F
Bhandari, Smriti, 341 Faizul Huq Arif, Md., 363
Bhandigare, Shivani, 643 Febi Shine, B. S., 633
Bhardwaj, Vivek, 195, 493 Fernandes, Chelsea, 701
I
Islam, Aminul, 583 M
Maan, Veerpaul Kaur, 123
Maheshwari, Sagar, 693
J Mahtab, Sheikh Shahparan, 363, 761
Jahan, Busrat, 363, 761 Maity, Soumayadev, 583
Jaimin, Patel, 241 Malaganve, Pradnya, 547
Jaya Kumar, D., 393 Malage, Rajshri N., 375
Jayanthi, S., 795 Malathi Latha, Y. L., 143
Jena, Debasish, 73 Malhotra, Ruchika, 727
Jetawat, Ashok, 385 Manivannan, S. S., 43
Jeyakumar, G., 307, 411, 539 Mantri, Archana, 605
Jha, Shikha, 9 Mate, Yash, 171
John, Jeffin, 297 Mathur, Aakansha, 503
John, Jewel Moncy, 297 Mavilla, Venkata Sai Dheeraj, 259
Joseph, Ebin, 297 Milu, Sharmin Akter, 353, 363, 761
Julfiker Raju, Md., 363 Mohapatra, Niva, 73
Juliet, D. Sujitha, 225 Moharana, Suresh Chandra, 719
Moon, Kapila, 385
Mund, Ganga Bishnu, 719
K Mupila, Francis K., 437, 447
Kadam, Aishwarya, 643 Murali Krishna, T., 135
Kadyan, Virender, 195, 493 Murthy, GRS, 285
Kalsekar, Samruddhi, 701 Myna, P., 317
V
Valai Ganesh, S., 1, 327 Y
Varghese, Elizabeth, 633 Yadav, Anupama, 651
Vasudevan, Vignesh, 277 Yashaswini, L., 403